This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Data simulation is widely used in remote sensing to produce imagery for a new sensor in the design stage, to address scale issues in special applications, or to test novel algorithms. Hyperspectral data can provide more abundant information than traditional multispectral data and thus greatly extend the range of remote sensing applications. Unfortunately, hyperspectral data are much more difficult and expensive to acquire and were not available prior to the development of operational hyperspectral instruments, whereas large amounts of multispectral data have been collected around the world over the past several decades. Therefore, it is reasonable to examine means of using these multispectral data to simulate or construct hyperspectral data, especially in situations where hyperspectral data are necessary but hard to acquire. Here, a method based on spectral reconstruction is proposed to simulate hyperspectral data (Hyperion data) from multispectral Advanced Land Imager data (ALI data). This method involves extracting the inherent information of the source data and reassigning it to the newly simulated data. A total of 106 bands of Hyperion data were simulated from ALI data covering the same area. To evaluate this method, we compared the simulated and original Hyperion data by visual interpretation, statistical comparison, and classification. The results generally showed good performance of this method and indicated that most bands were well simulated and that the information was both preserved and presented well. This makes it possible to simulate hyperspectral data from multispectral data for testing the performance of algorithms, extending the use of multispectral data, and aiding the design of a virtual sensor.

Remote sensing is playing an increasingly important role in earth science research and environmental problem solving. A number of earth satellites have been launched to advance our understanding of Earth’s environment. Satellite sensors, both active and passive, capture data from the visible to the microwave regions of the electromagnetic spectrum. A wide range of multispectral and hyperspectral satellite data, such as Landsat 5 Thematic Mapper and Landsat 7 Enhanced Thematic Mapper Plus (TM/ETM+); Global Imager (GLI); Moderate Resolution Imaging Spectroradiometer (MODIS); and Advanced Land Imager (ALI) and Hyperion, are frequently used in oceanography, hydrology, geology, forestry, and meteorology studies. Different studies and applications require data with different spatial, spectral, radiometric, and temporal resolutions [

Data simulation is widely used in remote sensing. It is often utilized to produce imagery for virtual or new sensors that are in the design stage. Simulated data can be used to assess or evaluate the spectral and spatial characteristics of the sensor, which are critical in the planning of a project [

The universal pattern decomposition method (UPDM) is a sensor-independent method which can be considered as a spectral reconstruction approach, in which each satellite pixel is expressed as the linear sum of fixed, standard spectral patterns for water, vegetation, and soil, and the same normalized spectral patterns can be used for different solar-reflected spectral satellite sensors [

The spectral reconstruction approach is based on the UPDM, a sensor-independent method derived from the pattern decomposition method (PDM) that has been successfully applied in many studies [

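In UPDM, the reflectance of band i is decomposed as R_{i} = C_{w}P_{iw} + C_{v}P_{iv} + C_{s}P_{is} + C_{4}P_{i4} + r_{i}. Here, R_{i} is the reflectance of band i; C_{w}, C_{v}, C_{s}, and C_{4} are the decomposition coefficients; P_{w}, P_{v}, P_{s}, and P_{4} are the standard spectral patterns of water, vegetation, soil, and a supplementary pattern, respectively; and r_{i} is the residual.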

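For each sensor band i, the standard spectral patterns P_{iw}, P_{iv}, and P_{is} are obtained by averaging the continuous standard patterns over the band: P_{ik} = ∫_{λ_{s}}^{λ_{e}} P_{k}(λ) dλ / ∫_{λ_{s}}^{λ_{e}} dλ (k = w, v, s), where λ_{s} and λ_{e} are the start and end wavelengths of band i and P_{k}(λ) is the continuous standard pattern of class k.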

As the supplementary pattern is not fixed, it can be chosen according to the purpose of the study. As an example, we used a yellow-leaf spectrum to briefly show how a supplementary pattern is added. Because of multicollinearity, the yellow-leaf pattern cannot be added directly. Instead, a residual yellow-leaf pattern, denoted P_{4} with band values P_{i4}, is used as the supplementary spectral pattern (see [

For simplicity, we express UPDM in matrix form as R = PC + r, where R = [R_{1}, R_{2}, ⋯, R_{n}]^{T} is the column vector of observations; n is the number of spectral bands of a sensor; P = [P_{w}, P_{v}, P_{s}, P_{4}] is the n × 4 standard pattern matrix, whose columns are the standard spectral patterns; C = [C_{w}, C_{v}, C_{s}, C_{4}]^{T} is the column vector of UPDM coefficients; and r is the residual column vector. C can be obtained by minimizing the sum-of-squared-error criterion function J(C) = ‖R − PC‖^{2}.
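As a worked step, the minimization can be written out under two assumptions that are not stated in the text: the coefficients are unconstrained, and the n × 4 pattern matrix P has full column rank. With the matrix form R = PC + r, the sum of squared residuals and its minimizer are then:

```latex
\begin{aligned}
J(C) &= \lVert R - PC \rVert^{2} = (R - PC)^{T}(R - PC), \\
\frac{\partial J}{\partial C} &= -2\,P^{T}(R - PC) = 0, \\
P^{T}P\,C &= P^{T}R, \\
\hat{C} &= (P^{T}P)^{-1}P^{T}R .
\end{aligned}
```

If non-negativity or other constraints were imposed on the coefficients, the closed form above would no longer apply and a constrained solver would be needed.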

The reduced χ^{2} employed to evaluate the precision of fitting is defined as follows [

Spectral sensitivity is an important parameter of any sensor and is normally expressed as the spectral response function (SRF), which is the relative responsivity of the sensor to monochromatic radiation of different wavelengths. Various studies have indicated the effects of sensor SRFs on analysis results in a variety of applications, and it is very important to take SRFs into account when comparing data from different sensors and applying physical models [

The study area was located in Yueyang in northeastern Hunan province, P.R. China, where Dongting Lake, China’s second-largest freshwater lake, connects to the Yangtze River. The climate of the Dongting Lake area is transitional between the middle and northern subtropical zones. The annual mean temperature is about 16.4°C–17°C; the mean temperature in January is 3.8°C–4.5°C and in July is about 29°C. The annual precipitation is about 1,100 mm–1,400 mm, with more than half of the rainfall occurring between April and June.

The remote sensing data used in this study were ALI and Hyperion images. Hyperion and ALI, which collect data over the same area simultaneously, are two of the three sensors onboard the NASA EO-1 satellite, which flies in a sun-synchronous orbit at an altitude of 705 km. The cross-track widths of an ALI scene and a Hyperion scene are 37 and 7.7 km, respectively. The along-track scene length for both ALI and Hyperion is generally either 42 km or 185 km, depending on the dimensions specified when the scene was scheduled. ALI was built to provide vital information for the next Landsat mission, with 1 panchromatic band and 9 multispectral bands, most of which are comparable to ETM+ bands. The Hyperion sensor collects a total of 242 bands, and its final L1R product provides a total of 198 bands representing 427–2,395 nm continuous spectra with 10-nm spectral resolution.

ALI and Hyperion data covering the study area were acquired on September 2, 2002. The center of the images is 29.38° N, 113.06° E. Frequent torrential rain in May and June 2002 led to localized flooding, which damaged crops and infrastructure along the shores of Dongting Lake. ALI data served as source data from which we attempted to simulate hyperspectral data, whereas Hyperion data served as test data, i.e., real data, to test and evaluate the final results.

Vertical streaks between columns in the along-track direction of image data in a push-broom system are quite common. Such effects are evident in Hyperion data, especially in shortwave infrared (SWIR) channels. To remove the striping, we adopted a statistical balancing method that calculates the mean and standard deviation of a local selectable neighborhood of columns as the reference values to adjust the column data [
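The balancing idea can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name `destripe_band`, the `half_width` parameter, and the choice to average column statistics over a window that includes the column itself are all assumptions.

```python
import numpy as np

def destripe_band(band, half_width=5):
    """Rescale each column so its mean/std match those of nearby columns.

    band: 2-D array (rows = along-track lines, columns = detectors).
    half_width: number of columns on each side used as the reference
                neighborhood (a hypothetical, user-selectable value).
    """
    band = np.asarray(band, dtype=float)
    n_cols = band.shape[1]
    col_mean = band.mean(axis=0)
    col_std = band.std(axis=0)
    out = np.empty_like(band)
    for j in range(n_cols):
        lo, hi = max(0, j - half_width), min(n_cols, j + half_width + 1)
        ref_mean = col_mean[lo:hi].mean()   # local reference mean
        ref_std = col_std[lo:hi].mean()     # local reference spread
        gain = ref_std / col_std[j] if col_std[j] > 0 else 1.0
        out[:, j] = (band[:, j] - col_mean[j]) * gain + ref_mean
    return out

# Synthetic demo: one column with a wrong gain and offset.
rng = np.random.default_rng(0)
image = rng.normal(0.3, 0.05, size=(200, 20))
striped = image.copy()
striped[:, 7] = 0.3 + 2.0 * (image[:, 7] - 0.3) + 0.1
cleaned = destripe_band(striped, half_width=5)
```

A wider window smooths more aggressively but can also suppress genuine across-track gradients, so the neighborhood size is a trade-off.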

After destriping, Hyperion L1R data were atmospherically corrected using ACORN5.0 in mode 1.5 to obtain surface reflectance data. As ALI data are of very good quality, atmospheric correction was performed directly using ACORN5.0 in mode 5.0 without similar preprocessing.

Although both sensors are onboard the same satellite, the areas covered by the pixels of their images were not strictly identical. Hyperion data were geometrically corrected to ALI data using first-order polynomial interpolation and bilinear resampling.

As described in Section 2, it is necessary to calculate standard pattern matrices of ALI and Hyperion to apply UPDM. According to (12), SRFs of both sensors are also needed. The SRFs of ALI are available from the website of CSIRO (

The Gaussian function g(λ̄_{i}, σ_{i}) can be represented by the central wavelength λ̄_{i} and the bandwidth σ_{i}, which is a function of Full Width at Half Maximum (FWHM) (14). With the assumption that the peak of the Gaussian function corresponding to the central wavelength is 1, the formula of g(λ̄_{i}, σ_{i}) is given as:
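A unit-peak Gaussian response can be sketched as below, assuming the standard relation between FWHM and the Gaussian width, σ = FWHM / (2√(2 ln 2)). The function names and the example band values are illustrative only.

```python
import math

def fwhm_to_sigma(fwhm):
    """Convert Full Width at Half Maximum to the Gaussian sigma."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def gaussian_srf(wavelength, center, fwhm):
    """Gaussian spectral response with a peak value of 1 at the center."""
    sigma = fwhm_to_sigma(fwhm)
    return math.exp(-((wavelength - center) ** 2) / (2.0 * sigma ** 2))

# Illustrative band: 711 nm center, 10 nm FWHM (hypothetical values).
center, fwhm = 711.0, 10.0
peak = gaussian_srf(center, center, fwhm)
half = gaussian_srf(center + fwhm / 2.0, center, fwhm)
```

By construction the response equals 1 at the central wavelength and falls to one half at a distance of FWHM/2 on either side.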

After obtaining the SRFs of both sensors, the standard pattern matrices are calculated using (12). The standard spectral patterns used here were the same as those used previously. The standard pattern matrix of ALI, denoted P_{A}, has an order of 9 × 4, and the matrix of Hyperion, denoted P_{H}, has an order of 106 × 4. That is, information from all ALI multispectral bands is used to simulate Hyperion data. Because of strong water vapor absorption, low SNR, and the limited valid wavelength region of the standard spectral patterns, a subset of 106 Hyperion bands was used in this study (
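One plausible way to build a band-level pattern matrix is an SRF-weighted average of each continuous pattern. The sketch below assumes that form, which may differ from the equation referred to as (12). The patterns and boxcar SRFs are synthetic placeholders, and `band_pattern_matrix` is a hypothetical helper name.

```python
import numpy as np

def band_pattern_matrix(patterns, srfs):
    """SRF-weighted average of continuous patterns for each band.

    patterns: (n_wavelengths, n_patterns), sampled on a uniform grid.
    srfs:     (n_bands, n_wavelengths), relative response per band.
    Returns:  (n_bands, n_patterns) standard pattern matrix.
    """
    weights = srfs / srfs.sum(axis=1, keepdims=True)
    return weights @ patterns

wavelengths = np.arange(400.0, 2501.0, 1.0)
# Placeholder patterns: a constant and a linear ramp (not real spectra).
patterns = np.column_stack([
    np.ones_like(wavelengths),
    (wavelengths - 400.0) / 2100.0,
])
# Placeholder boxcar SRFs, 11 nm wide, at hypothetical band centers.
centers = np.array([480.0, 540.0, 1084.0, 1629.0])
srfs = (np.abs(wavelengths[None, :] - centers[:, None]) <= 5.0).astype(float)
P_demo = band_pattern_matrix(patterns, srfs)
```

With measured or Gaussian SRFs in place of the boxcars, the same function would yield a 9 × 4 matrix for ALI and a 106 × 4 matrix for the Hyperion subset.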

UPDM is applied to ALI data to acquire the decomposition coefficient vector C_{A}, which is considered to be sensor-independent, i.e., it holds the same value when UPDM is applied to Hyperion data:

To construct Hyperion data, we substitute C_{A} for C_{H} in the following equation:
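The two steps (decompose the ALI pixels, then rebuild with the Hyperion pattern matrix) can be sketched numerically as follows. It assumes an unconstrained least-squares fit and omits the residual term from the reconstruction. The matrices are random stand-ins, and `simulate_hyperspectral` is a hypothetical name.

```python
import numpy as np

def simulate_hyperspectral(R_A, P_A, P_H):
    """Simulate hyperspectral pixels from multispectral pixels.

    R_A: (n_pixels, 9)  multispectral reflectance.
    P_A: (9, 4)         multispectral standard pattern matrix.
    P_H: (106, 4)       hyperspectral standard pattern matrix.
    Returns (n_pixels, 106) simulated reflectance, P_H @ C_A per pixel.
    """
    # Least-squares coefficients for all pixels at once: shape (4, n_pixels).
    C_A, *_ = np.linalg.lstsq(P_A, R_A.T, rcond=None)
    return (P_H @ C_A).T

# Synthetic check with random (not physical) pattern matrices.
rng = np.random.default_rng(0)
P_A = rng.uniform(0.1, 1.0, size=(9, 4))
P_H = rng.uniform(0.1, 1.0, size=(106, 4))
C_true = rng.uniform(0.0, 0.5, size=(4, 5))   # 5 pixels
R_A = (P_A @ C_true).T
R_H_sim = simulate_hyperspectral(R_A, P_A, P_H)
```

Because only four coefficients per pixel carry over, the simulated spectra always lie in the four-dimensional space spanned by the columns of the Hyperion pattern matrix.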

Following the process flow discussed above, 106 bands were simulated from ALI data based on UPDM. Hereafter, “source data” refers to the ALI data, “simulated data” refers to the newly generated Hyperion data, and “original data” or “real data” refers to the real Hyperion data. We evaluated the simulation method by comparing simulated and original data in three respects. First, the general appearances of both types of data were compared by visual interpretation to determine whether they have similar visual effects. Second, the statistical characteristics of both types of data were compared to determine whether they show a good correlation. Finally, we classified both types of data to evaluate how well information is preserved in application.

Based on UPDM, we used all nine bands of ALI data to simulate the final set of 106 bands of Hyperion data. According to the correlation coefficients between each pair of original and simulated bands, we selected four bands, band 13 (average central wavelength 477.69 nm), band 19 (538.74 nm), band 94 (1,083.99 nm), and band 148 (1,628.81 nm), to show their general appearance by visual interpretation (See Section 4.2 and

By interpreting each pair of images in

For comparison of detailed regions, we selected a small area covering the border of the pond circled by the rectangle in bands 94 and 148. This small region had a great deal of variety and many details and therefore served as a good test region (

The reflectance spectra of the original 9-band ALI multispectral data and the 106-band Hyperion data simulated from ALI are obviously different. However, the reflectance of the original Hyperion data and that of the simulated Hyperion data are very similar, although the simulated spectral curve is slightly smoother than the original one.

For simple statistical comparison, we used the mean and standard deviation of each band from both datasets (

We also calculated the correlation coefficients between each band of original data and the counterpart in the simulated data (
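A per-band Pearson correlation between two co-registered data cubes can be computed as in the sketch below. The arrays are synthetic, and the layout (pixels in rows, bands in columns) is an assumption.

```python
import numpy as np

def bandwise_correlation(original, simulated):
    """Pearson correlation per band for arrays shaped (n_pixels, n_bands)."""
    o = original - original.mean(axis=0)
    s = simulated - simulated.mean(axis=0)
    return (o * s).sum(axis=0) / np.sqrt(
        (o ** 2).sum(axis=0) * (s ** 2).sum(axis=0)
    )

rng = np.random.default_rng(1)
original = rng.uniform(0.0, 0.5, size=(1000, 3))
simulated = original.copy()
simulated[:, 1] = 0.8 * original[:, 1] + 0.02      # linear distortion
simulated[:, 2] = rng.uniform(0.0, 0.5, size=1000)  # unrelated band
r = bandwise_correlation(original, simulated)
```

The second band illustrates that a high correlation does not by itself imply equal values, which is why a fixed-slope regression is a useful complement.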

To further evaluate this method, we sampled 1,000 pixels at random to perform linear regression with a fixed slope of 1 on a subset of bands, which corresponded to different levels of correlation. The first group consisted of bands 19 and 36, whose R^{2} values (0.789 and 0.472, respectively) were much lower than those of other bands. As indicated by its lowest correlation coefficient, band 36 showed the poorest performance, with the greatest RMS of 0.123 and the lowest R^{2} of 0.472. The wavelength of band 36 was 711 nm, corresponding to the red edge. The rapid change in vegetation reflectance around the red edge may degrade model performance and explain this result. This may also have been responsible for the low correlation coefficients of bands 35 and 37, as they were also around the red edge. Adding supplementary spectral patterns that account for this rapid change into UPDM, or replacing the vegetation standard spectral pattern with ground-measured vegetation spectra from the study area, may improve the results. The second group consisted of bands 52, 94, 113, 148, and 208, all of which had correlation coefficients >0.9. The data points for these bands clustered closely around the fitted line, with quite high R^{2} values (0.973, 0.949, 0.936, 0.934, and 0.879, respectively), and their fitted lines were very close to the 1:1 line. These observations indicated that these bands were well simulated and highly similar to the original bands. The best performance was observed for band 52; its fitted line was y = x + 0.00209 (R^{2} = 0.973), indicating that the simulated band was almost the same as the real data. Band 13 alone was considered a separate group, for which the intercept of the fitted line (0.0286) relative to the dynamic range of the data (0.06–0.13) was much greater than that for the second group, causing it to move away from the line y = x. However, the data points of band 13 fit the line quite well. The high values of R^{2} (0.956) and correlation coefficient (0.98) also suggested good linearity.
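A regression with the slope fixed at 1 reduces to estimating a single intercept, as in the sketch below. The definitions of RMS and R^{2} used here (residual root mean square, and one minus the ratio of residual to total sum of squares) are assumptions, and the data are synthetic values loosely styled after band 13.

```python
import numpy as np

def fixed_slope_fit(x, y):
    """Fit y = x + b with the slope fixed at 1; return (b, rms, r2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = np.mean(y - x)                    # least-squares intercept
    resid = y - (x + b)
    rms = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return b, rms, r2

# Synthetic sample of 1,000 pixels with a constant offset plus noise.
rng = np.random.default_rng(2)
x = rng.uniform(0.06, 0.13, size=1000)
y = x + 0.0286 + rng.normal(0.0, 0.002, size=1000)
b, rms, r2 = fixed_slope_fit(x, y)
```

Under this definition R^{2} can be high even when the intercept is large, because a constant offset is absorbed by b. This matches the situation described for band 13.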

To evaluate the general results of the whole dataset, the vector angles between simulated and original data of each pixel were calculated. The images were displayed as cosine values of the angle, and a higher value corresponded to a smaller angle (
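The per-pixel cosine can be computed directly from the two spectra, as in this sketch. It assumes the band axis is last, and the two-pixel example is synthetic.

```python
import numpy as np

def cosine_of_vector_angle(a, b):
    """Cosine of the angle between spectra a and b along the last axis.

    a, b: arrays shaped (..., n_bands). Pixels with an all-zero spectrum
    would give a division by zero and are not handled here.
    """
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / den

original = np.array([[[0.1, 0.2, 0.3], [0.0, 1.0, 0.0]]])
simulated = np.array([[[0.2, 0.4, 0.6], [1.0, 0.0, 0.0]]])
cos_img = cosine_of_vector_angle(original, simulated)
```

The measure is insensitive to a uniform scaling of the spectrum (first pixel), so it captures similarity of spectral shape rather than of absolute brightness.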

Combining the above discussion and analysis, most simulated bands, with the exception of a small fraction with quite low correlation coefficients, showed strong correlations and high linearity with the original bands. The vector angle image also showed a high degree of similarity and good simulation for most areas. These observations indicated that our method is valid for simulating Hyperion bands from the viewpoint of statistics.

We also performed classification using the spectral angle mapper method on the whole original Hyperion data (106 bands), simulated Hyperion data (106 bands), and ALI data (9 bands) to evaluate the general performance of our simulation method in classification application (
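In its simplest form, the spectral angle mapper assigns each pixel to the reference spectrum with which it forms the smallest angle. The sketch below assumes that form with no angle threshold. The reference spectra are made up, and `sam_classify` is a hypothetical name.

```python
import numpy as np

def sam_classify(pixels, references):
    """Assign each pixel to the reference with the smallest spectral angle.

    pixels:     (n_pixels, n_bands)
    references: (n_classes, n_bands), e.g. mean spectra of training areas.
    Returns (labels, angles) with angles in radians, (n_pixels, n_classes).
    """
    p = pixels / np.linalg.norm(pixels, axis=1, keepdims=True)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    cos = np.clip(p @ r.T, -1.0, 1.0)   # guard against rounding overshoot
    angles = np.arccos(cos)
    return np.argmin(angles, axis=1), angles

references = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
pixels = np.array([[2.0, 0.1, 0.0], [0.1, 3.0, 0.0]])
labels, angles = sam_classify(pixels, references)
```

The same function applies unchanged to 9-band and 106-band inputs, provided the reference spectra have the matching number of bands.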

The aim of our method is to simulate hyperspectral data from multispectral data so that they are comparable to real hyperspectral data. Our method transforms the information of the multispectral data and presents it in the newly simulated hyperspectral data; it does not create or add new information. This is reasonable, as new information cannot be created using mathematical techniques alone. The situation is similar to the PCA method, which never creates any new information but reassigns inherent information.

From the classification images, we can see that the classification results on simulated data are similar to those on ALI data, indicating that the information of ALI data is well preserved in the simulated data using our method. That is, the inherent information of ALI data is not lost after being reassigned. In addition, the classification of the original Hyperion data is similar to that of the simulated Hyperion data, as shown by the overall classification accuracies, which were calculated using the classification results on the original Hyperion data as the reference image to evaluate the classification results of the simulated Hyperion data and ALI data (

The classification results showed that our method successfully preserved the inherent information and presented it in the new data.

We have proposed a method to simulate hyperspectral data from multispectral data based on the spectral reconstruction method UPDM. A total of 106 bands of Hyperion were simulated from ALI data covering the same area. Visual comparison showed that the simulated data successfully presented the information of ground features and objects described by the original data for interpretation. To further evaluate our method, we compared the simulated and original data by statistical methods. The results indicated that most bands had very high correlation coefficients, suggesting a high degree of similarity and good consistency of the simulated bands with the original bands. The detailed results of linear regression analyses further verified that, for the bands with high correlation coefficients, the data points were generally clustered very tightly and fit the 1:1 line very well. These observations indicated that most bands showed good linearity and similarity to the original data. The high cosine values of the vector angle between the simulated and original data of each pixel also demonstrated the generally good performance of our method.

However, a small fraction of bands showed lower correlation coefficients, corresponding to poor simulation. This may have been because our standard spectral patterns were not collected in the study area. It may be possible to improve the results by adding supplementary patterns and replacing the standard spectral patterns with those derived from ground-measured spectra.

The aim of our method is to simulate hyperspectral data from multispectral data and make them comparable to the real hyperspectral data. Similar to PCA, our method attempts to preserve and reassign the inherent information of multispectral data to the simulated data and to make full use of it, but not to add or create extra new information. The similarity between classification results derived from ALI data and simulated Hyperion data showed that the inherent information of ALI data was well preserved by reassignment to simulated Hyperion data.

Simulated data could serve as a powerful tool in algorithm testing and assessment and could act as a potential surrogate when real hyperspectral data are unavailable. Validation and evaluation of such algorithms should be conducted using hyperspectral images covering a wide range of spatial complexities, but acquiring enough hyperspectral data to meet this need can be difficult. Our method can provide simulated hyperspectral imagery with the spatial complexity of real-world imagery, thus allowing for extensive yet lower-cost testing of algorithms over a wide variety of environmental conditions. In addition to algorithm development and testing, our method can also be applied to simulate the imagery of new sensors still in the design stage.

Although this pilot study demonstrated the good general performance of our method from the viewpoint of visual interpretation, statistical comparison, and classification application, further studies of both the theory and applications should be performed to improve this method. For example, we may add some supplementary spectral patterns or consider the variability of standard spectral patterns, attempt to derive standard spectral patterns from ground-measured spectra, and select different standard spectral patterns for different applications.

This work was supported under the 863 Project of the People’s Republic of China (Project Number 2007AA12Z111, 2008AA121100), the National Scientific and Technological Support Scheme (Project Number 2006BAI09B02, 2008BAC34B03), the National Natural Science Foundation of China (Project Number 30772890), and the Open Research Fund of State Key Laboratory for Information Engineering in Surveying, Mapping, and Remote Sensing (Project Number WKL(06)0102).

Visual comparison of simulated Hyperion data and original Hyperion data: (a) and (b) are band 13 of original Hyperion data and simulated Hyperion data, respectively; (c) and (d) are band 19; (e) and (f) are band 94; (g) and (h) are band 148. By visual interpretation, there is no significant difference between simulated and original data for bands 13, 94, and 148, whereas band 19 shows some obvious differences.

Visual comparison of the magnification of a detailed region circled by the white rectangle in

Mean and standard deviation of 106 bands of original and simulated data: (a) shows the mean and standard deviation of original bands, and (b) shows the mean and standard deviation of simulated bands (for convenience in plotting, we used the sequence numbers 1–106 to refer to bands. The corresponding bands are listed in

Correlation coefficients between simulated and original Hyperion data of 106 bands.

Linear regression analysis of simulated and original data. Charts from (a) to (h) are for bands 13, 19, 36, 52, 94, 113, 148, and 208, respectively.

Cosine values of vector angles between simulated and original data of each pixel.

Classification results of original Hyperion data, simulated Hyperion data, and ALI data: (a) shows the classification image of original Hyperion data, (b) shows the classification image of simulated Hyperion data, and (c) shows the classification image of ALI data. (Class label “pond” refers to water that has the characteristics of pond water; label “plant 1” refers to sparse plant area; and label “plant 2” refers to dense plant area.)

Hyperion 106-band subset used in this study.

Sequence number | Hyperion band number | Wavelength range (nm) |
---|---|---|
1–46 | 8–53 | 427–885 |
47–54 | 87–94 | 1,013–1,084 |
55–61 | 107–113 | 1,215–1,276 |
62–81 | 139–158 | 1,538–1,730 |
82–106 | 195–219 | 2,103–2,345 |

Classification accuracies of simulated Hyperion data and ALI data using the classification results of original data as reference image.

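Class | Simulated Hyperion: correct/reference pixels (producer's accuracy) | Simulated Hyperion: correct/classified pixels (user's accuracy) | ALI: correct/reference pixels (producer's accuracy) | ALI: correct/classified pixels (user's accuracy) |
---|---|---|---|---|
River | 4,957/5,092 | 4,957/5,156 | 4,965/5,092 | 4,965/5,195 |
Pond | 2,838/3,473 | 2,838/3,479 | 2,883/3,473 | 2,883/3,584 |
Plant1 | 24,713/28,839 | 24,713/27,835 | 23,896/28,839 | 23,896/26,669 |
Plant2 | 21,448/23,997 | 21,448/24,366 | 21,769/23,997 | 21,769/25,681 |
Bare Land | 2,076/2,599 | 2,076/3,164 | 2,042/2,599 | 2,042/2,871 |
Kappa | 0.808 | | 0.797 | |
Overall | 87.6% | | 86.8% | |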
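The overall accuracy and kappa values can be recomputed from the per-class counts in the table, as in the sketch below. It assumes that the first pair of columns belongs to the simulated Hyperion data and the second pair to ALI (caption order). It also assumes that the first denominator in each pair is the number of reference pixels and the second the number of pixels assigned to the class. The function name is hypothetical.

```python
def accuracy_from_counts(correct, reference_totals, classified_totals):
    """Overall accuracy and Cohen's kappa from per-class counts."""
    n = sum(reference_totals)
    p_o = sum(correct) / n                      # observed agreement
    p_e = sum(r * c for r, c in zip(reference_totals, classified_totals)) / n ** 2
    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa

# Per-class counts in table order: River, Pond, Plant1, Plant2, Bare Land.
reference = [5092, 3473, 28839, 23997, 2599]

sim_correct = [4957, 2838, 24713, 21448, 2076]
sim_classified = [5156, 3479, 27835, 24366, 3164]
sim_oa, sim_kappa = accuracy_from_counts(sim_correct, reference, sim_classified)

ali_correct = [4965, 2883, 23896, 21769, 2042]
ali_classified = [5195, 3584, 26669, 25681, 2871]
ali_oa, ali_kappa = accuracy_from_counts(ali_correct, reference, ali_classified)
```

Under these assumptions the recomputed values agree with the tabulated kappa (0.808 and 0.797) and overall accuracy (87.6% and 86.8%) to the reported precision.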