1. Introduction
The continuous monitoring of plants at high spatial and temporal resolution is a crucial component of making agriculture more efficient and thereby preparing it for an increasing world population. Currently, such information is mainly collected by remote sensing or when agricultural machines drive on the field. However, while offering high spatial resolution, these methods are not suitable for continuous monitoring at high temporal resolution.
An element for continuous monitoring could be added in the form of a Wireless Sensor Network (WSN) with nodes that include multiple sensors. One very versatile sensor for such a node would be an optical spectrometer or hyperspectral sensor. The versatility stems from the fact that a multitude of information on the plant condition can be derived from different ranges of the spectrum. Naturally, more spectral bands allow for the extraction of more information. We argue that obtaining spectral data may even be superior to adding many specialized sensors, as this reduces the cost of individual sensor nodes, allows for the re-use of the many models built around remote sensing data, and enables the calibration of remote sensing data that contain similar information.
In previous works [
1,
2], it was determined that multispectral sensors with fewer than 10 bands are more likely to be affordable for such a network than actual spectrometers with significantly more bands. While spectrometers are also making significant strides in miniaturization and cost reduction, we expect that multispectral sensors will always stay ahead due to their lower complexity. Smart dust is an idea for the more distant future. In agriculture, it encompasses large numbers of sensor nodes being "planted" together with the plants. If development actually moves in that direction, we expect that multispectral sensors may reach a sufficiently small size sooner than hyperspectral sensors and many more specialized sensors. Due to the huge number of devices, reducing the complexity and cost of the individual sensor nodes becomes even more important in that case.
When deriving vegetation information, naturally, high-resolution spectra as acquired by hyperspectral sensors allow for the calculation of more information than multispectral sensors with lower spectral resolution. When using multispectral sensors, there are multiple ways to overcome these limitations: firstly, one may simply accept having less information available; secondly, one may research new algorithms and metrics based on the limited number of bands, yielding similar information. However, this needs to be repeated for every new set of bands; thirdly, one may add a step in between consisting of calculating a high-resolution spectrum based on the low-resolution spectrum and deriving information from it. This allows for re-using all the algorithms and metrics designed for hyperspectral data, making the sensors on the one hand and the algorithms and metrics on the other hand more replaceable. This third approach also facilitates the calibration of remote sensing data as matching bands can be constructed.
In this paper, we further investigated the third way which we called Multi- to Hyperspectral Sensor Network (M2HSN) in [
2]. The structure of an M2HSN is shown in
Figure 1. It consists of multiple sensor nodes with each node being equipped with an array of light sensors whose readings are digitized and transmitted to a fusion center. These sensor nodes are turned into multispectral sensors by adding different optical filters in front of the sensor array. Thus, the sensor nodes are very simple in comparison to high-resolution spectrometers. Note that this also lowers the amount of data transmitted to the fusion center in comparison to actual high-resolution spectrometers.
There are two possibilities for choosing the filters: these may either be homogeneously chosen with the same set of filters in every node or heterogeneously with varying filter sets at different positions.
In our previous work [
2], we focused on the heterogeneous case. In this paper, we investigated the homogeneous case. As no customization of the hardware with respect to the location is required, the mass fabrication of such sensors is more feasible. They are in fact similar to sensors that are already cheaply available today. In contrast to the heterogeneous case, the M2HSN with homogeneous filter sets does not gather any information on the bands not included in the band set. This information must instead be obtained from a different source. One may argue that this is a weakness of the approach, because obtaining hyperspectral data is not fully avoided. Therefore, we evaluated the feasibility of supplying this information by learning it from remote sensing data. The remote sensing data for training needs to be hyperspectral; however, no hyperspectral measurements are required on the ground. As the M2HSN is mainly intended as an addition to remote sensing, the less frequently obtained remote sensing data can be used as training data for the M2HSN and is therefore relatively easy to obtain. The algorithm for learning this information is K Singular Value Decomposition (K-SVD) [
3].
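To make the learning step concrete, the following is a minimal NumPy sketch of K-SVD, alternating sparse coding (here via Orthogonal Matching Pursuit) with SVD-based rank-1 atom updates. It illustrates the general algorithm from [3]; it is not the implementation used in this paper, and all function names and defaults are our own.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: approximate y with at most k atoms of D."""
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares fit of y on the selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

def ksvd(Y, n_atoms, sparsity, n_iter=10, seed=0):
    """Learn a dictionary D (bands x atoms) with Y ~ D @ X and sparse columns in X."""
    rng = np.random.default_rng(seed)
    # initialize atoms with randomly chosen training spectra, normalized
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # sparse coding stage
        X = np.column_stack([omp(D, y, sparsity) for y in Y.T])
        # dictionary update stage: refit each atom via a rank-1 SVD
        for j in range(n_atoms):
            users = np.nonzero(X[j, :])[0]
            if users.size == 0:
                continue
            X[j, users] = 0.0
            E = Y[:, users] - D @ X[:, users]   # error that atom j must explain
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            X[j, users] = s[0] * Vt[0, :]
    return D, X
```

Training spectra enter as the columns of Y; the learned columns of D correspond to the base spectra (atoms) discussed in the parametrization section.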
K-SVD was previously used for estimating hyperspectral images from RGB images as acquired by smartphone cameras or DSLRs [
4]. However, to the best of our knowledge, it has never been investigated how well this approach applies to remote sensing data and to data collected on the ground in agricultural fields. Agriculture is one of the main areas investigated using spectral remote sensing and, we believe, deserves a dedicated investigation. Furthermore, we investigated how the approach benefits from adding more bands across a broader range of wavelengths; this seems a logical choice, as bands in the near infrared have proved essential for vegetation.
Our core contributions in this paper are: (1) the first evaluation of the suitability of increasing spectral resolution with K-SVD for in situ and remote sensing data; (2) introducing the idea of the homogeneous M2HSN together with its simulation-based evaluation; and (3) offering a guide for choosing the correct kind of M2HSN, data-processing algorithm, and parametrization depending on the scenario.
The remaining part of this paper is structured as follows: in
Section 2, we give an overview of the underlying methods used and evaluated in this paper and outline the embedding of our work into current research;
Section 3 contains a description of how the methods are used and modified; in
Section 4, we evaluate the configuration of K-SVD in-depth; the comparative evaluations with all methods follow in
Section 5; and finally, we draw conclusions in
Section 6.
4. Parametrization
In this section, we focused on the evaluation of K-SVD, as it is the method newly introduced to this kind of data in this paper. The goal was to develop an understanding of how K-SVD copes with the data and to find an appropriate parametrization. In a first step, we determined the size of the dictionary and the appropriate value for the sparsity target. We used a value range similar to the plot in [
4] in order to allow for a comparison. The result is shown in
Figure 2. The sparsity target was varied from 5 to 50 and the dictionary size from 100 to 400. The training was performed on
Air_M1, the best bands were selected using
Air_M2 and the evaluation was performed on
Air_M3. The color shows the median Root Mean Square Error (RMSE), which we determined by first calculating the RMSE for the spectrum of each pixel in the evaluation area. We then took the median across all pixels. Finally, we repeated the process 20 times with different seeds for the band selection and obtained the median of the 20 medians. Note that we deferred a closer investigation of the distribution of the RMSEs to
Section 5.
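The evaluation metric described above can be sketched in a few lines of NumPy (an illustration under our own naming, not the paper's evaluation code):

```python
import numpy as np

def median_rmse(truth, estimate):
    """Per-pixel spectral RMSE, then the median across all pixels.
    truth, estimate: arrays of shape (n_pixels, n_bands)."""
    rmse = np.sqrt(np.mean((truth - estimate) ** 2, axis=1))
    return np.median(rmse)

def replicated_median_rmse(truth, estimates):
    """Median of the per-replication medians (here: 20 band-selection seeds).
    estimates: array of shape (n_replications, n_pixels, n_bands)."""
    return np.median([median_rmse(truth, e) for e in estimates])
```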
Curiously, the result drastically differs from that in [
4]: the error increases when increasing the sparsity target. Increasing the dictionary size further increases the error, but only for higher sparsity targets. Consequently, the best results are achieved with a small dictionary and a low sparsity target. We attribute this to the spectra varying less across pixels: the comparably low spatial resolution in remote sensing averages out exceptional spectra, and the agricultural environment is relatively homogeneous.
Having found that a very small dictionary suffices, we expected such a small dictionary to be trainable from a smaller training dataset. Therefore, we evaluated the training dataset size in the next step by simply selecting a limited number of pixels at random from the training dataset. The result is shown in
Figure 3 which displays the median RMSE of the reflectance. Instead of the sparsity target, we now vary the number of training pixels. The sparsity target is set to 10 percent of the number of bands in the hyperspectral version of the spectrum. The 10 percent rule is the default rule of the K-SVD implementation in use. The resulting sparsity target is 4, which is close to the optimum in
Figure 2. Furthermore, the sparsity target is limited by the dictionary size, as it is impossible to choose more atoms than there are available. From
Figure 3, it becomes clear that, surprisingly, a small number of training pixels suffices for a reflectance RMSE of less than approximately 0.03. Increasing the number of training pixels mainly improves the result for larger dictionaries. However, as already seen in
Figure 2, a large dictionary leads to a lower reconstruction quality. For dictionary sizes below approximately 16, the results are quite good. Dictionary sizes of 2 and 8 are slightly worse. At the value of 2, we attribute this to the dictionary simply being too small. At the value of 8, we found that this happens because SL0 performs poorly when the dictionary size is equal to the number of bands. The effect does not occur with OMP, but we still stick with SL0 because of its overall better reconstruction quality. Curiously, the reconstruction quality becomes more variable with increasing training set size. We attribute this to an increasing chance of anomalous pixels appearing in the training set. Just a few of these suffice to create a transformation that tries to cover the anomalous pixels as well. For smaller training sets, this can happen in rare cases and will have an even worse effect. However, these cases are not reflected in this evaluation plot as they are rejected when the median is calculated.
An advantage of such a small dictionary is that it can be visualized for qualitative investigation. Some samples are shown in
Figure 4. Each plot contains the elements of a trained transform, also called atoms, as line plots. The number of atoms is increased from left to right by re-training with a different dictionary size. The atoms have different colors merely for visualization, and the order is arbitrary. Clearly, most atoms are dominated by the red edge and adding more atoms mainly helps refine the representation of the red edge. A comparison of
Figure 4a,b shows that this effect may be observed for both datasets. Note that this model of the spectra, a linear combination of a few base spectra, is very similar to the model in UPDM, and the dimension of this basis is comparable to the three base spectra assumed there. Due to the small number of atoms, which are usually all present in the solution, even using a plain ℓ2-solver becomes viable, turning the approach into a simpler one. However, we still use SL0, which starts with the ℓ2-solution anyway and can thereby be considered a more general solver.
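With such a small dictionary whose atoms are typically all active, reconstructing a hyperspectral spectrum from a few measured bands reduces to a least-squares problem. The following sketch shows the core step; it is our own simplification using a plain ℓ2 solve in place of SL0, with illustrative names:

```python
import numpy as np

def reconstruct_spectrum(D, band_idx, y_multi):
    """Estimate a hyperspectral spectrum from multispectral readings y_multi
    taken at the bands band_idx, using a trained dictionary D (bands x atoms)."""
    D_sub = D[band_idx, :]                        # dictionary restricted to measured bands
    coef, *_ = np.linalg.lstsq(D_sub, y_multi, rcond=None)
    return D @ coef                               # full-resolution estimate
```

Here, D would be the trained dictionary and band_idx the indices of the bands realized by the node's optical filters.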
In a last step before the main evaluation, we considered the role of band selection. Rather than developing a sophisticated algorithm and tuning its parameters as in [
19], we concentrated on evaluating how well a band optimization performed on one dataset transfers to another. The result is shown in
Figure 5: the dictionary was trained on
Air_MA and evaluated for 20 random band sets on the datasets
Air_MB and
Ground_Full. The figure shows the resulting RMSE for all band sets sorted according to the RMSE. Each band set is denoted by a different color and the same band sets are connected with straight lines to visualize how the order of set quality correlates between datasets. Firstly, as found in [
19], there are a few very badly performing band sets. The remaining band sets show similar performance. Now, taking the corresponding positions of the sets between datasets into consideration, the high-RMSE sets are rejected quite effectively by choosing some of the low-RMSE filter sets. However, in the plateau, there are many non-parallel lines, indicating that a low-RMSE set chosen on one dataset is likely to be far from optimal on the other. Therefore, we refrained from using a more sophisticated algorithm and simply selected some of the good bands in the following evaluations, as this brings a major part of the improvement with far less effort.
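The lightweight selection used here, drawing random band sets and keeping the few best according to a selection sub-dataset, can be sketched as follows (all names and defaults are illustrative):

```python
import numpy as np

def select_good_band_sets(n_bands_total, set_size, n_candidates, evaluate, keep=3, seed=0):
    """Draw random band sets, score each with an evaluation function
    (e.g. the median reconstruction RMSE on a selection sub-dataset),
    and keep the few best ones."""
    rng = np.random.default_rng(seed)
    candidates = [np.sort(rng.choice(n_bands_total, set_size, replace=False))
                  for _ in range(n_candidates)]
    return sorted(candidates, key=evaluate)[:keep]
```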
Overall, we found that the training set size has a relatively low influence on the result, while the sparsity target and, surprisingly, even the dictionary size should be chosen to be low. More precise values are further investigated in
Section 5.
5. Results
In this section, we compare the performance of K-SVD against the other approaches in order to determine which is the best choice and under what circumstances. In contrast to the previous examples, here, training is always performed on a dataset using a different sensor than the dataset used for evaluation in order to better reflect the real-world situation. The selection of best bands is also performed on one of the sub-datasets using the same sensor as the training sub-dataset because the selection of best bands belongs to the training phase.
In each dataset combination, we compared the resulting RMSE for all pixels in the dataset with 20 replications for band selection and in the case of DCS, for the groups of pixels evaluated together. For K-SVD, we kept varying the dictionary size in order to further investigate which size is appropriate.
The results are shown in
Figure 6 and
Figure 7 for a varying number of bands
M as box plots. Note that we refrained from including outliers in the graphics, as their sheer number, a result of the large sample size, was highly distracting. For some parameters, the boxes partially or completely lie outside the plotting range. The first setting shown in
Figure 6 is the main use case followed in the paper: training on a remote sensing image with a diverse environment and using it on measurements from the ground. We compared a total of six different approaches:
K-SVD,
KSVD-BBS,
UPDM,
UPDM-BBS,
DCS and
DCS-GM.
KSVD-BBS refers to K-SVD including the best band selection; herein, the three best band sets according to the band selection sub-dataset were kept.
UPDM-BBS refers to UPDM including the best band selection; again, the three best band sets were kept.
DCS-GM refers to DCS with the mixing of the groups by calculating the median of all spectra calculated for one pixel. As the 20 replications are generated by combining five group selections with four band set selections, four median spectra of five spectra each were constructed per parameter set and pixel.
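The group mixing in DCS-GM amounts to an element-wise median over the spectra reconstructed for one pixel under the different group selections, sketched below (a minimal illustration with our own naming):

```python
import numpy as np

def group_mix(spectra_per_group):
    """DCS-GM mixing for one pixel: element-wise median over the spectra
    reconstructed under the different group selections.
    spectra_per_group: array of shape (n_groups, n_bands)."""
    return np.median(spectra_per_group, axis=0)
```

In the setting above, this would be applied once per band set selection, yielding four median spectra of five group spectra each per pixel.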
K-SVD and
KSVD-BBS were performed with different numbers of atoms as shown in
Figure 6. The main observations here are that an increasing number of bands naturally leads to a reduced RMSE for all approaches. The selection of the best bands in
KSVD-BBS leads to a significant improvement in the
K-SVD results. In addition to the lowered median, the spread of values is also much lower. The selection of the best bands in
UPDM-BBS leads to a decent improvement, especially for small numbers of bands. The group mixing in
DCS-GM also leads to a slight but reliable reduction in the RMSE in comparison to the pure
DCS, proving the benefit of this modification. These improvements were very similar for all dataset combinations. Therefore, we refrained from including the unimproved versions in
Figure 7 to make it more comprehensible and facilitate the comparison of the dataset combinations.
The first plot,
Figure 7a, shows the same data as
Figure 6, only in a more compact version included here to facilitate the comparison. In this dataset combination,
DCS-GM reliably delivers good results and outperforms
UPDM-BBS.
KSVD-BBS beats
DCS-GM at certain dictionary sizes. Interestingly, for fewer than six bands, it performs best with approximately four atoms, while for more than six bands, it performs best with eight atoms. For all numbers of bands,
KSVD-BBS is able to outperform
DCS-GM for the best-fitting number of atoms. However, when the number of atoms is chosen even slightly incorrectly,
DCS-GM tends to be better.
In order to provide a better understanding of the results, we picked the spectra with the lowest, highest, and median RMSE from one of the simulations with eight bands (and, in the case of
KSVD-BBS, eight atoms). These spectra are shown for both
KSVD-BBS and
DCS-GM in
Figure 8. For
KSVD-BBS, the band set which performed best in the band selection dataset (
Air_MB) was chosen. For
DCS-GM, the band sets were randomly picked, as no such indicator was available. For both approaches, in the best case as well as the median case, the differences between the estimates and the original spectra were very low, qualitatively confirming the quantitative findings.
This allows an estimate of the impact on vegetation indices: these are usually built by comparing the reflectance at different wavelengths. For a vegetation index requiring reflectances at wavelengths that have not been directly measured, the reconstruction clearly delivers better reflectance values than simply using the closest measured values or interpolating between them. Hence, the result of the vegetation index will also be improved.
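As an illustration, an NDVI-style index can be read off a reconstructed high-resolution spectrum at the exact wavelengths it requires; NDVI and the wavelengths below are example choices of ours, not taken from the paper:

```python
import numpy as np

def index_from_spectrum(wavelengths, reflectance, wl_red=670.0, wl_nir=800.0):
    """Evaluate an NDVI-style index on a (reconstructed) spectrum by
    sampling the reflectance at the required wavelengths."""
    r = float(np.interp(wl_red, wavelengths, reflectance))
    n = float(np.interp(wl_nir, wavelengths, reflectance))
    return (n - r) / (n + r)
```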
For the cases considered herein, the errors with
KSVD-BBS and
DCS-GM are qualitatively similar, not allowing a general statement on the cause of the difference between the two approaches. However, the differences are much clearer in the worst-case spectrum. In both approaches, the worst-case errors occur in the spectral ranges covered by few bands. In the case of
DCS-GM (
Figure 8b), these ranges are wider due to the many different band sets. A particularly uneven band distribution generates the worst case seen herein, with five bands clustered in the range between 450 nm and 570 nm. In the case of
KSVD-BBS (
Figure 8a), the band selection is not as uneven because such band sets are rejected. While the relatively large gap with no bands between 500 nm and 700 nm causes the problems in the worst case, it cannot be identified as a fundamental problem because it only affects the worst cases. In the median and best case, this gap has little to no impact on the reconstruction quality. Note that the three representatives in both
Figure 8a,b were selected from the same set of pixels. Therefore, the reflectance being overall higher in
Figure 8b is completely coincidental.
In
Figure 7b,d, we investigated the performance when training and evaluating on datasets containing solely vegetation data. In
Figure 7b, training was performed on the remote sensing dataset and evaluation was performed on the ground dataset. In
Figure 7d, it was the other way around. In both cases,
DCS-GM is ahead as it performs particularly well with homogeneous data. Interestingly, in
Figure 7b,
KSVD-BBS is superior to
UPDM-BBS, despite both suffering from the same restriction: the training for
KSVD-BBS and the base spectra for
UPDM-BBS are based on the other dataset. In
Figure 7d,
UPDM-BBS performs almost as well as
DCS-GM for high band numbers and better for small band numbers, as found in [
2], which may be explained by very similar vegetation spectra in the dataset
Air_V1 that are also very similar to the base spectra in use. In
Figure 7d,
KSVD-BBS performs slightly better than in
Figure 7b at least for a higher number of bands and optimal number of atoms. Together with
Figure 7a, this shows a trend of
KSVD-BBS benefiting from more diversified training data. However, in
Figure 7d,
DCS-GM and
UPDM-BBS benefit more from the similar data, rendering
KSVD-BBS inferior in this case. The similarity also leads to the very low variation of the results with all approaches.
Figure 7c was included mainly to complete the set of dataset combinations. Its usefulness is limited, as the training dataset is far less diverse than the evaluation dataset. However,
KSVD-BBS still works surprisingly well in this case in comparison to
DCS-GM and
UPDM-BBS which both fall behind in this case because of the lower similarity between pixels in the case of
DCS-GM and fewer vegetation pixels in the case of
UPDM-BBS. Since
Air_Full is a typical remote sensing dataset, this shows that
KSVD-BBS may also be a promising approach for estimating hyperspectral remote sensing images based on multispectral remote sensing images—aside from its application in M2HSNs.
For a qualitative evaluation of this case, we assess the reconstruction quality across the area in
Figure 9.
Figure 9a is an RGB image including the corresponding bands from the original dataset.
Figure 9b,c show the RMSE per pixel for one of the best band sets according to training in
KSVD-BBS and a group mixing randomly selected for
DCS-GM. In
Figure 9b,
KSVD-BBS generates low RMSE values, especially in vegetation areas. Curiously, the trained basis was also suitable for the bare soil areas, although no such samples were included in the training data. The quality suffers in the villages. If the approach is to be applied to remote sensing data, a more diversified training dataset is clearly required.
Figure 9c shows why the RMSE values are higher with
DCS-GM: the quality is only about as good as with
KSVD-BBS in some of the vegetation areas but the main effect affecting the RMSE is the extreme variation across the whole area. We attribute this to a mixture of two effects: the first one is the one observed in
Figure 8b, namely that in some pixels an uneven band selection leads to bad reconstructions. The second effect is that the groupings often include spectra of differing kinds, which reduces the overall reconstruction quality of the group. The latter effect also serves as the main explanation for why the reconstruction quality is significantly better in all the other scenarios with less variation across the spectra.