Deep-Learning-Based Multispectral Image Reconstruction from Single Natural Color RGB Image—Enhancing UAV-Based Phenotyping

Multispectral images (MSIs) are valuable for precision agriculture due to the extra spectral information acquired compared to natural color RGB (ncRGB) images. In this paper, we thus aim to generate high-spatial-resolution MSIs through a robust, deep-learning-based reconstruction method using ncRGB images. Using data from an agronomic research trial for maize and a breeding research trial for rice, we first reproduced ncRGB images from MSIs through a rendering model, Model-True to natural color image (Model-TN), which was built using a benchmark hyperspectral image dataset. Subsequently, an MSI reconstruction model, Model-Natural color to Multispectral image (Model-NM), was trained on prepared ncRGB (ncRGB-Con) image and MSI pairs, ensuring the model can use widely available ncRGB images as input. The integrated loss function of mean relative absolute error (MRAEloss) and spectral information divergence (SIDloss) was most effective during the building of both models, while models using the MRAEloss function were more robust towards variability between growing seasons and species. The reliability of the reconstructed MSIs was demonstrated by high coefficients of determination compared to ground truth values, using the Normalized Difference Vegetation Index (NDVI) as an example. The advantages of using "reconstructed" NDVI over the Triangular Greenness Index (TGI), as calculated directly from RGB images, were illustrated by its higher capability in differentiating three levels of irrigation treatments on maize plants. This study emphasizes that the performance of MSI reconstruction models can benefit from an optimized loss function and the intermediate step of ncRGB image preparation. The ability of the developed models to reconstruct high-quality MSIs from low-cost ncRGB images will, in particular, promote applications for plant phenotyping in precision agriculture.


Introduction
Monitoring plant growth status during the whole growing season is an important objective targeted by multispectral imaging in both breeding and precision agriculture [1,2]. In multispectral images (MSIs), each pixel is composed of reflectance or radiance from multiple discrete wavebands, providing additional spectral information regarding the chemical composition of an object compared to natural color RGB (ncRGB; red, green, blue) images.
Taking the different yet complementary properties of these two types of loss functions (i.e., directionless/directional) into consideration, it is tempting to construct a composite loss function that can simultaneously measure differences in magnitude and dissimilarities in shape, providing the model an overall "learning direction" during training. Combining two or more algorithms has been shown to yield outstanding performance in material classification, pointing towards the superiority of composite measures of spectra over individual ones [31][32][33]. For example, [34] combined both the shape and magnitude of spectra to avoid the limitation of using SAMloss in HSI classification. Unfortunately, models using composite loss functions during training for spectral reconstruction are still rare in precision agriculture.
This study developed and validated a novel method for reconstructing MSIs using ncRGB images of maize and rice plots captured by UAVs. We improved the state-of-the-art deep learning architecture HSCNN-R by tuning the number of residual blocks inside the architecture for feature extraction and by testing different loss functions to optimize model convergence. The enhanced models were used to generate ncRGB images from MSIs and subsequently reconstruct MSIs from ncRGB images. The major contributions can be summarized as follows. First is the derivation of the mapping function converting tcRGB to ncRGB images through training Model-TN on a benchmark drone HSI dataset [35]. The mapping function ensured that the RGB images used for training and future predictions were of a more similar color space, minimizing the performance drop of RGB-color-space-dependent MSI reconstruction models. Second is the involvement of combined loss functions in regulating model convergence. The combined loss functions with complementary properties further improved model performance. Third, we tested the MSI reconstruction models in real-world experiments. The end-to-end supervised deep learning model (Model-NM) was built based on ncRGB-Con images and MSIs of the maize experiment in 2018. The performance of Model-NM in recovering multispectral information from RGB images was tested on contrasting testing datasets (i.e., maize data 2019, and rice data 2018) from our experiments. The fidelity of the reconstructed MSIs for phenotyping applications was demonstrated by comparing the calculated Normalized Difference Vegetation Index (NDVI) of MSIs reconstructed from standardized ncRGB images and of ground truth MSIs across different growth stages, years and/or crop species.
The potential benefits of using NDVI calculated from reconstructed MSIs over Triangular Greenness Index (TGI) based on RGB images for discriminating differently irrigated maize plants were explored to illustrate the application potential of the novel image reconstruction technique in real-world tasks in precision agriculture.
The remainder of this article is organized as follows. The Materials and Methods section describes how the different data were collected and used to build the different models. The Results section shows the major outcomes of the different analyses. The Discussion section interprets the major outcomes of the study and how they were affected by the different analyses. The Conclusion section summarizes the study and outlines future research directions on this topic.

tcRGB and ncRGB Image Acquisition from Hyperspectral Images
The true-color RGB (tcRGB) and the corresponding converted, natural color RGB (ncRGB-Con) images, used to build Model-TN (see below), were derived from an HSI benchmark dataset, WHU-Hi-Honghu HSI [35]. The benchmark HSI dataset contains 270 bands from 400 to 1000 nm with a ground sampling distance of about 0.043 m. The tcRGB image was composed of the 33rd, 72nd and 119th bands, corresponding to ~475, ~561 and ~668 nm in the HSI. The ncRGB images were converted from the HSIs following the procedure described in [14], excluding radiance conversion. In brief, the CIE1934 color response function was used to convert hyperspectral reflectance to a tristimulus vector XYZ, which was then subjected to a linear transformation and a non-linear brightness adjustment; finally, a threshold of 10^-4 was applied to improve image contrast.
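The rendering procedure above can be sketched as follows. This is a minimal NumPy illustration in which Gaussian curves stand in for the tabulated CIE color response functions and the XYZ-to-RGB matrix is the standard sRGB one; the function name `hsi_to_ncrgb` and all parameter values are illustrative assumptions, not the study's exact implementation:

```python
import numpy as np

def hsi_to_ncrgb(hsi, wavelengths):
    """Render an ncRGB image from hyperspectral reflectance (H, W, B).

    Sketch of the pipeline described in the text: color response functions
    -> tristimulus XYZ -> linear transform -> non-linear brightness
    adjustment -> contrast threshold.
    """
    # Gaussian stand-ins for the x, y, z color response functions
    def gauss(mu, sigma):
        return np.exp(-0.5 * ((wavelengths - mu) / sigma) ** 2)
    cmf = np.stack([gauss(600, 60), gauss(550, 50), gauss(450, 40)], axis=1)  # (B, 3)
    cmf /= cmf.sum(axis=0, keepdims=True)           # normalize each channel
    xyz = hsi @ cmf                                  # (H, W, 3) tristimulus
    m = np.array([[ 3.2406, -1.5372, -0.4986],       # standard XYZ -> linear sRGB
                  [-0.9689,  1.8758,  0.0415],
                  [ 0.0557, -0.2040,  1.0570]])
    rgb = np.clip(xyz @ m.T, 0, None)
    rgb = rgb ** (1 / 2.2)                           # non-linear brightness adjustment
    rgb[rgb < 1e-4] = 0.0                            # contrast threshold (10^-4)
    return np.clip(rgb, 0, 1)

wl = np.linspace(400, 1000, 270)                     # 270 bands, 400-1000 nm
hsi = np.random.default_rng(0).uniform(0, 1, (4, 4, 270))
rgb = hsi_to_ncrgb(hsi, wl)
print(rgb.shape)  # (4, 4, 3)
```

In the study itself, the tabulated CIE functions replace the Gaussians, and Model-TN learns the residual mapping from tcRGB to this rendered ncRGB space.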

UAV Image Acquisition of MSI and ncRGB Images from Maize and Rice Fields
Maize (Zea mays L.) plants of the same cultivar were grown on 30 subplots in the field in 2018 (Figure S1a) and 2019 (Figure S1b) in India (N17°19′27.22″, E78°23′55.71″); see [36] for details. Three irrigation levels, i.e., 60% (level 1), 80% (level 2) and 120% (level 3) of cumulative pan evaporation, were applied at the subplot level throughout the growing season. Each treatment was replicated three times and randomly distributed as subplots (90) in the field.
In both experiments, MSIs and ncRGB images (ncRGB-Cam) were collected simultaneously by different cameras onboard. For MSI acquisition, a MicaSense RedEdge multispectral camera (MicaSense Inc., Seattle, WA, USA) [37] covering 5 wavebands, blue (475 nm, 32 nm bandwidth), green (560 nm, 27 nm bandwidth), red (668 nm, 14 nm bandwidth), near-infrared (840 nm, 57 nm bandwidth) and red edge (717 nm, 12 nm bandwidth), was mounted on an Inspire-1 Pro (DJI, Shenzhen, China) unmanned aerial vehicle (UAV). The onboard RGB camera was a Zenmuse X5 (DJI, Shenzhen, China) with a resolution of 16.0 megapixels and an ISO range of 100 to 25,600. The flights were set in autopilot mode; flight speed and altitude were 4 km h^-1 and 10 m above ground level, respectively. An eighty percent overlap between two consecutive images, each containing 1296 × 960 pixels per band, was realized. Flights were conducted every week around 11 a.m. throughout the growing seasons. Subsequently, specific days of the year (DOY) representing four different growth stages of maize (DOY 303, 313, 323 and 354) and rice (DOY 233, 270, 299 and 328) in 2018, and three different growth stages of maize (DOY 313, 326 and 354) in 2019, were selected for further analysis. Orthomosaics of the entire fields were produced per flight by Agisoft Metashape software [38].

Model Selection, Training, Validation and Testing
The residual block, previously used in HSCNN-R [13] with excellent performance in HSI reconstruction, was used as the basic architecture, and the number of residual blocks was tuned based on the task complexity in this study. The original architecture, built to map 3-channel RGB images to 31-channel HSIs for HSI reconstruction, included 16 residual blocks. We adapted the output channel number to 3 or 5 based on the number of channels of the output images in our study. Previous studies have shown that this number of residual blocks is redundant [6,39], so the number of residual blocks was further tuned in this study as well. One residual block was finalized for the ncRGB-Con image production model (Model-True to Natural color image (Model-TN), see below), while three blocks were used for the MSI reconstruction model (Model-Natural color to Multispectral image (Model-NM), see below). Model-TN (Figure S2) was trained to convert tcRGB images of maize and rice crops to ncRGB ones. Model-NM (Figure S3) was used to recover the multispectral information of the HR-RGB images of maize and rice. HR images are advantageous as the boundaries between different objects can be more clearly set and smaller objects are more distinguishable, contributing to easier pixel-level semantic segmentation [40]. Contrary to HR features rich in spatial details such as points, lines and local edges, high-level features provide abstract semantic information such as cars, trees and differently irrigated crops [41][42][43]. In order to manage semantic classification tasks well, HR features and high-level features have to be combined [42]. The flowchart of the whole analysis is supplemented as Figure S4.
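The direct reconstruction idea, an input layer, a tunable number of residual blocks with identity skip connections, and an output layer mapping 3 RGB channels to 5 spectral bands, can be sketched per pixel with 1×1 "convolutions" (per-pixel linear maps). This is a hedged NumPy toy, not the actual HSCNN-R code; all names, sizes and random weights are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1x1(x, w, b):
    # A 1x1 convolution is a per-pixel linear map: (H, W, Cin) @ (Cin, Cout)
    return x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical sizes: 3-channel ncRGB input, 64 features, 5-band output, 3 blocks
C_IN, C_FEAT, C_OUT, N_BLOCKS = 3, 64, 5, 3
w_in = rng.normal(0, 0.1, (C_IN, C_FEAT)); b_in = np.zeros(C_FEAT)
blocks = [(rng.normal(0, 0.1, (C_FEAT, C_FEAT)), np.zeros(C_FEAT))
          for _ in range(N_BLOCKS)]
w_out = rng.normal(0, 0.1, (C_FEAT, C_OUT)); b_out = np.zeros(C_OUT)

def model_nm(rgb):
    """Forward pass: input conv -> N residual blocks -> output conv."""
    x = relu(conv1x1(rgb, w_in, b_in))
    for w, b in blocks:
        x = x + relu(conv1x1(x, w, b))   # identity skip connection
    return conv1x1(x, w_out, b_out)

msi = model_nm(rng.uniform(0, 1, (8, 8, 3)))
print(msi.shape)  # (8, 8, 5)
```

The real blocks also contain spatial (e.g., 3×3) convolutions; the point here is only the skip-connection structure and the 3-to-5-channel mapping.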
Multispectral radiance from the cameras was converted to reflectance based on the known reference panel provided by MicaSense Inc. (Seattle, WA, USA) [37]. RGB images were color corrected with a white reference panel placed inside the field. The RGB images of both benchmark and field experiments were transformed to a range of 0-1, while the multispectral images, with reflectance already in the range of 0-1, were used directly. In order to increase the robustness of both models towards different brightness levels, a random scaling factor (0.1-1.9) was applied to each pixel of both input images and output prediction during training [21]. For model training, the batch size was set to 1 and the optimizer Adamax [44] with default settings of β1 = 0.9, β2 = 0.999 and eps = 10^-8 was selected. The weights were initialized through HeNormal initialization [45] in each convolutional layer. The initial learning rate was set at 10^-4, and the learning rate decreased by 20% every 5 epochs if the validation loss failed to decrease further. The training of models with composite loss functions was initialized from the weights of trained models with the corresponding magnitude loss functions, with the same initial learning rate (10^-4). All models were trained until no further decrease in validation loss occurred over 200 successive epochs.
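The learning-rate schedule and brightness augmentation described above might be sketched as follows. This is a plain-Python approximation; `PlateauDecay` and `brightness_augment` are hypothetical helpers, and the actual training used Adamax inside a deep learning framework:

```python
import random

class PlateauDecay:
    """Reduce the learning rate by 20% after 5 epochs without improvement
    in validation loss, mirroring the schedule described in the text."""
    def __init__(self, lr=1e-4, factor=0.8, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.stale = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.stale = val_loss, 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor   # decrease by 20%
                self.stale = 0
        return self.lr

def brightness_augment(image, target, rng=random):
    # Apply the same random scaling factor (0.1-1.9) to input and target,
    # so the model sees many brightness levels of each sample
    s = rng.uniform(0.1, 1.9)
    return image * s, target * s

sched = PlateauDecay()
for loss in [1.0] * 6:           # six epochs with no improvement after the first
    lr = sched.step(loss)
print(lr)                        # 1e-4 reduced once by 20%
```

Scaling input and target by the same factor is what pushes the model towards brightness invariance, as the mapping it must learn is unchanged by the common scale.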
The quality of a reconstructed spectrum can be quantified by both magnitude and shape differences when compared to the ground truth one. Three different loss functions were used to either measure magnitude differences, i.e., mean square error (MSEloss; Equation (1); [6,46]) or mean relative absolute error (MRAEloss; Equation (2); [47]), or shape dissimilarities, i.e., spectral information divergence (SIDloss; Equation (3); [26]). Both the MSEloss and MRAEloss functions are commonly used but are sensitive to extreme reflectance values [25]. SIDloss has been shown to effectively quantify spectral shape differences regardless of magnitude [26]. In addition, two composite loss functions combined measurements of magnitude and shape dissimilarity, i.e., MSE-SIDloss and MRAE-SIDloss. Different weights (W) were assigned to each subcomponent of the composite loss functions: Wmse:Wsid equaled 0.333 and Wmrae:Wsid was 0.0667 throughout the training process to ensure similar contributions to the final loss. The effectiveness of the five different loss functions was compared during model training. The trained Model-TN and Model-NM with the smallest validation losses were selected for further predictions. Three evaluation metrics corresponding to the loss functions, i.e., MRAEev (Equation (4)), RMSEev (Equation (5)) and SIDev (Equation (6)), were used to evaluate the models' performance.
Igt(i) and Ire(i) represent the ith pixel of the ground truth and reconstructed MSIs, I, respectively.
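Under the definitions above, the magnitude and shape measures might be implemented as follows. This is a hedged NumPy sketch: the symmetric-KL form of SID and the eps guards are standard, and the composite weighting follows the stated Wmrae:Wsid ratio, but it is not the paper's exact code:

```python
import numpy as np

def mrae(gt, re):
    # Mean relative absolute error (magnitude; cf. Equation (2))
    return np.mean(np.abs(gt - re) / gt)

def rmse(gt, re):
    # Root mean square error (magnitude; cf. Equation (5))
    return np.sqrt(np.mean((gt - re) ** 2))

def sid(gt, re, eps=1e-12):
    # Spectral information divergence (shape; cf. Equation (3)):
    # symmetric KL divergence between spectra normalized to unit sum,
    # averaged over pixels; the last axis holds the spectral bands
    p = gt / (gt.sum(axis=-1, keepdims=True) + eps) + eps
    q = re / (re.sum(axis=-1, keepdims=True) + eps) + eps
    return np.mean(np.sum(p * np.log(p / q) + q * np.log(q / p), axis=-1))

def mrae_sid(gt, re, w_ratio=0.0667):
    # Composite loss; Wmrae:Wsid = 0.0667 as stated in the text
    return w_ratio * mrae(gt, re) + sid(gt, re)
```

Because SID compares unit-sum spectra, a uniform rescaling of a spectrum leaves it unchanged, which is exactly why the magnitude term is needed in the composite loss.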
For Model-TN, original-sized tcRGB and ncRGB-Con images were split into 5 sections: 4 of those were randomly selected for training and the other for validation. The trained model with the best performance was then used for transforming tcRGB images, derived from MSIs of field experiments, to ncRGB-Con images.
During the training of Model-NM, all images of maize collected in 2018 were used for model building, while maize images from 2019 and rice images from 2018 were used for testing, allowing for an independent test of model performance and transferability. Training images were fragmented into 152 patches (size: 512 × 512 pixels), of which 2/3 were used for training and 1/3 for initial validation. These validated models, trained with different loss functions, were subsequently used to reconstruct MSIs of the independent testing datasets and further evaluated through three metrics, MRAEev, RMSEev and SIDev, at subplot level (Figure S1). The cloud service Google Colaboratory (Colab Pro), with 25 GB RAM and a Python 3 runtime, served as the major platform for all model training, validation and testing.
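The patch fragmentation step can be sketched as below. `extract_patches` is a hypothetical helper; whether border patches were kept or overlapped in the study is not stated, so non-overlapping complete patches are assumed:

```python
import numpy as np

def extract_patches(image, size=512, stride=512):
    """Split an orthomosaic (H, W, C) into square patches, discarding
    incomplete border patches (sketch of the 512 x 512 patching used
    for Model-NM training)."""
    h, w = image.shape[:2]
    return [image[r:r + size, c:c + size]
            for r in range(0, h - size + 1, stride)
            for c in range(0, w - size + 1, stride)]

patches = extract_patches(np.zeros((1200, 1500, 3)), size=512)
print(len(patches))  # 4 (a 2 x 2 grid of complete patches)
```

A random 2/3 vs 1/3 split of the resulting patch list then yields the training and validation sets.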

Ground Truthing of Reconstructed MSIs Using NDVI, and Comparison with RGB-Derived TGI
To illustrate the quality of MSIs reconstructed from standard ncRGB-Con images, the ncRGB-Cam images of maize and rice from the year 2018 were color matched to the corresponding ncRGB-Con images rendered from MSIs before being used to reconstruct high-spatial-resolution MSIs. Mutual information (MI) is the Kullback–Leibler divergence between the joint probability density function (PDF) of observed values over local patches of the two images and the product of their marginal PDFs [48]. MI was calculated to compare the similarity between ncRGB-Con and ncRGB-Cam images either before or after histogram color matching; the normalized MI reaches its maximum of one when the two images are identical. The quality of the reconstructed MSIs was further assessed through comparisons between the calculated NDVI [49] of subplots of MSIs reconstructed from ncRGB-Cam-Con images and that of ground truth MSIs. Because only plants were of interest when comparing the NDVIs, a threshold of 0.6 was applied to filter out soil and shadows. The average NDVI of each subplot was calculated, and linear regression was used to compare reconstructed and ground truth values.
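Histogram color matching and an MI-style similarity might be sketched as follows. These are hedged NumPy versions; the bin count and the normalized-MI variant (which is bounded by one) are assumptions rather than the study's exact implementation:

```python
import numpy as np

def match_histogram(src, ref):
    # Map src values so their empirical CDF matches ref's (per channel)
    out = np.empty_like(src, dtype=float)
    for c in range(src.shape[-1]):
        s, r = src[..., c].ravel(), ref[..., c].ravel()
        s_idx = np.argsort(s)
        ranks = np.empty_like(s_idx)
        ranks[s_idx] = np.arange(len(s))
        out[..., c].flat = np.sort(r)[(ranks * len(r)) // len(s)]
    return out

def normalized_mi(a, b, bins=64):
    # Normalized mutual information from the joint histogram; equals 1
    # for identical images (raw MI is unbounded, hence the normalization)
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return 2 * mi / (hx + hy)

rng = np.random.default_rng(0)
src = rng.uniform(0, 1, (16, 16, 3))   # stand-in for an ncRGB-Cam image
ref = rng.uniform(0, 1, (16, 16, 3))   # stand-in for an ncRGB-Con image
matched = match_histogram(src, ref)
```

After matching, each channel of `matched` has exactly the value distribution of the corresponding `ref` channel, which is what raises the MI between the two images.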
The NDVIs of reconstructed MSIs from models built on both ncRGB-Cam and tcRGB images were compared so as to highlight the superiority of the intermediate step of natural color conversion from tcRGB to ncRGB-Con.
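The NDVI ground-truthing steps, band arithmetic, the 0.6 vegetation threshold and the R² of a least-squares fit, can be sketched as below. The band ordering blue/green/red/NIR/red edge is assumed from the camera description above, and the helper names are illustrative:

```python
import numpy as np

def ndvi(msi, red_band=2, nir_band=3):
    # NDVI = (NIR - Red) / (NIR + Red); band indices assume the order
    # blue, green, red, NIR, red edge
    nir, red = msi[..., nir_band], msi[..., red_band]
    return (nir - red) / (nir + red + 1e-12)

def subplot_mean_ndvi(msi, threshold=0.6):
    # Keep only vegetation pixels (NDVI > 0.6 filters out soil and shadow)
    v = ndvi(msi)
    return v[v > threshold].mean()

def r_squared(y_true, y_pred):
    # Coefficient of determination of a least-squares linear fit
    slope, intercept = np.polyfit(y_true, y_pred, 1)
    resid = y_pred - (slope * y_true + intercept)
    return 1 - resid.var() / y_pred.var()

msi = np.zeros((4, 4, 5))
msi[..., 2], msi[..., 3] = 0.1, 0.9    # red = 0.1, NIR = 0.9 everywhere
```

Subplot means of NDVIrec would then be regressed against NDVIgt with `r_squared` to reproduce the kind of comparison reported in the Results.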
In order to show the advantages of reconstructed MSIs over indices derived from ncRGB-Cam images, the TGI was calculated based on the color-matched ncRGB-Cam images of maize [50]. The ability of either MSI-derived NDVIs or ncRGB-Cam-derived TGIs to reliably separate the three levels of irrigation treatments in maize was compared. NDVIs and TGIs of differently irrigated maize plants within each sampling date in 2018 were compared through a permutation test [51] in R [52].
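TGI and the permutation comparison might be sketched as follows. The TGI coefficients follow the common formulation over the 670/550/480 nm bands, and `permutation_test` is a minimal Python stand-in for the permutation test run in R in the study:

```python
import numpy as np

def tgi(rgb):
    # Triangular Greenness Index: TGI = -0.5 * [190 * (R - G) - 120 * (R - B)],
    # the triangle-area form over the red, green and blue bands
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return -0.5 * (190 * (r - g) - 120 * (r - b))

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sample permutation test on the absolute difference of means."""
    rng = np.random.default_rng(seed)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)   # add-one smoothing of the p-value
```

With per-subplot NDVI or TGI values grouped by irrigation level, pairwise calls to `permutation_test` give the significance comparisons reported for Figure 8.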

Model Convergence with Different Loss Functions
Both Model-TN and Model-NM, optimized by five different loss functions, had similar performance rankings based on the three evaluation metrics (Tables 1 and 2). Models with MRAE-SIDloss generated the minimum total losses of 0.0582 and 0.0571 for Model-TN and Model-NM, respectively, compared to models with the other four loss functions. Models with MRAE-SIDloss also predominantly produced the minimum values in each individual evaluation metric, except RMSEev = 0.0127 from Model-TN with MRAEloss, with a marginal difference of 0.0001. The second-best loss function was MRAEloss, with a performance close to models regulated by MRAE-SIDloss; the total error differences were marginal as well, i.e., 0.000400 and 0.00310 in Model-TN and Model-NM, respectively. Models with MSE-SIDloss produced errors of 0.258 (Model-NM) and 0.0636 (Model-TN), both being lower than those of the corresponding models with MSEloss while greater than those of models with SIDloss. Both model types with SIDloss had much higher MRAEev and RMSEev while featuring comparable SIDev compared with models optimized by the other loss functions.

Models of Model-TN trained with the MSEloss and MSE-SIDloss loss functions had a similar convergence speed, with 3724 and 3729 epochs in 248 and 250 s, respectively (Table 1); using the other loss functions resulted in considerably more epochs and time. Models of Model-NM with the MRAEloss and MSEloss loss functions converged at a similar speed, in 22,969 and 20,952 s, respectively (Table 2), while final convergence took 35,298 s when using SIDloss. Noticeably, the two models with composite loss functions consumed much more time than those trained with individual loss functions. The minimum values of the different evaluation metrics are highlighted in bold in the tables.

Universality of Models Trained with Different Loss Functions
The model performances on the maize testing datasets, evaluated at subplot level, varied greatly with the different loss functions (Figure 2a-c). Models with MRAEloss featured the best performances on the maize testing data, indicated by low evaluation values (0.0600, 0.0237 and 0.00348 in MRAEev, RMSEev and SIDev, respectively). In contrast, models trained with SIDloss had significantly higher magnitude errors (MRAEev = 0.322, RMSEev = 0.0620) compared with the other loss functions on the maize testing data; nevertheless, they reached a similar level of SIDev compared to the best performing models. The models with the composite loss function MRAE-SIDloss had errors comparable to models optimized by MRAEloss in both the MRAEev and SIDev evaluations. Applied to the rice testing data, the performances of models with different loss functions were similar to those on the maize testing data (Figure 2d-f). Models with MRAEloss performed best in all three evaluation criteria (MRAEev, RMSEev and SIDev of 0.107, 0.046 and 0.012, respectively). SIDloss once more possessed the least generality, as suggested by significantly greater errors of MRAEev and RMSEev (0.0363 and 0.0978, respectively) when applied to the rice data. Models trained with MSE-SIDloss had the same level of generality as their corresponding subcomponent, MSEloss, as statistically the same values were produced in all three evaluations. The composite loss function MRAE-SIDloss performed better in MRAEev but worse in RMSEev than the other composite one, MSE-SIDloss.
Interestingly, models with all loss functions except MRAEloss reconstructed rice MSIs with statistically the same SIDev.
Figure 2. Evaluation of reconstructed MSIs of maize (2019, subplot level, a-c) and rice (2018, subplot level, d-f) by five loss functions (MRAEloss, MSEloss, SIDloss, MRAE-SIDloss and MSE-SIDloss) using three evaluation metrics (MRAEev, RMSEev and SIDev) (mean ± standard error); see text for details. A total of 90 maize subplots from three sampling dates and 1680 rice subplots from four sampling dates were analyzed. Different letters in each subpanel indicate a significant difference at the 0.05 level. Note the differences in Y-axis scales; Y-axis breaks in subpanels a and d for better visualization are indicated by dotted lines.
Visualization of different evaluation metrics on reconstructed MSIs from ncRGB-Con images ( Figure S5) of both maize and rice testing data are shown in Figures 3 and 4. More extensive errors of the reconstructed MSIs, indicated by red color, mainly originated from locations with very low reflectance values, such as shadows.


Effectiveness of MSIs Reconstructed from ncRGB-Cam-Con Images through NDVI and TGI Comparisons
The color differences of tcRGB, ncRGB-Cam, ncRGB-Con and ncRGB-Cam-Con images are exemplified in Figure 5. Even though the ncRGB-Con images transformed by Model-TN looked more natural than the tcRGB ones, they were still different from the directly captured ncRGB-Cam images (Figure 5). The colors of the ncRGB-Cam-Con images (Figure 5c,f) of both maize and rice became more similar to the ncRGB-Con ones (Figure 5b,e), both by visual impression and by MI (Table S1), after color matching. Color matching increased the MI of the ncRGB-Cam testing images of both maize and rice: after histogram color matching, the minimum increase in MI was 7.55%, on the maize image of DOY 313 in 2018, while the highest was 38.9%, on the maize image of DOY 323 in 2018. In rice, the improvement in MI was generally higher than in the maize images; the smallest increase in MI was 21. A significant increase in the correlation of reconstructed NDVIs with ground truth NDVIs was found when reconstruction models were built from ncRGB-Con rather than tcRGB images: from 0.57 to 0.75 in maize and from 0.47 to 0.78 in rice, respectively (Figure 6). The NDVI values calculated from MSIs reconstructed from ncRGB-Cam-Con images (NDVIrec) of maize and rice subplots in 2018 reached the highest correlations (R2 = 0.89-0.91) with the NDVI calculated from ground truth MSIs taken directly by a multispectral camera (NDVIgt) (Figure 7).
Statistically significant differences in the NDVIs of maize leaves among the different levels of irrigation were similarly detected by both ground truth MSIs and MSIs reconstructed from ncRGB-Cam-Con images on DOY 354 in 2018 (Figure 8d,e), while no differences were found between irrigation levels at DOY 303 and 313 (not shown) and DOY 323 (Figure 8a,b). Similar to NDVI, no significant differences were found in the TGIs of maize plants among the different irrigation levels during the first three growth stages, DOY 303 and 313 (not shown) and 323 in 2018 (Figure 8c). However, unlike NDVI, TGI could not fully separate the different irrigation levels on maize plants on DOY 354 in 2018.
Even though the TGI values of the least irrigated maize plants (60%) were significantly greater (19.6) than those of the most heavily irrigated plants (16.3), no difference was found between the moderately irrigated maize plants and either of the other two irrigation levels on DOY 354 in 2018 (Figure 8f).

Discussion
The multispectral information recovery model, Model-NM, has to target ncRGB images for the sake of universality, because ncRGB images can be produced easily by low-cost RGB sensors. Models trained on input images of one color type, either true or natural (Figure 1c,f), will perform poorly when reconstructing MSIs from another color space because of the enormous differences in vector magnitude (RMSEev and MRAEev) and direction (SIDev) of pixel values [53]. The performance increase from Model-NM built on tcRGB images to Model-NM built on ncRGB-Con images was confirmed by the much more accurate NDVIs reconstructed from ncRGB-Cam images (Figure 6). The recovery of spectral information is thus largely the reconstruction of a higher-dimensional vector from lower-dimensional ones, and both the magnitude and direction of the lower-dimensional RGB vector surely affect the reconstruction process [53,54]. Model-TN was able to produce natural-looking ncRGB images from over-saturated tcRGB images with the help of hyperspectral images, which was an indispensable step to connect MSIs and ncRGB images. Model-TN was trained on benchmark images with various crops, soil and buildings, while maize and rice were not included. Nevertheless, the rendered ncRGB images of maize and rice appeared natural and looked much closer to the colors captured by the onboard RGB cameras, based on human perception, indicating a well-polished multispectral response function embedded within the model. Representation of MSIs in ncRGB images is valuable, as false color tcRGB images can easily lead to misunderstandings during knowledge transfer, especially when viewers interpret them based on common sense [55].
The MSI reconstruction model, Model-NM, trained on maize from the year 2018, already performed very well on maize of different growth stages from the following year, 2019, and even on another crop, rice, at different sampling dates in this study, as indicated by the low errors (green color) of the three applied evaluation metrics on the reconstructed MSIs for models trained with the different loss functions, except for those with SIDloss. The highly accurate reconstructed MSIs at least validated the possibility of recovering MSIs from RGB images, which can already offer both researchers and farmers the potential to obtain high-quality MSIs based on standard ncRGB-Cam images.
Fusion of HR-RGB and LR-MSIs is another commonly used approach in spectral image super-resolution. One advantage of fusion strategies over direct reconstruction ones, e.g., HSCNN-R, is that higher quality HR-MSIs can be generated when the super-resolution scale is high (often ≥8) [56]. However, the final quality of the generated HR-MSIs should not be affected if the super-resolution scale is low, e.g., only 4 in our study. An advantage of direct reconstruction methods in MSI super-resolution is that the HR-MSIs are reconstructed directly from the HR-RGB image, taking full advantage of the spatial information of the HR-RGB and leading to marginal spatial distortion in the derived HR-MSIs [57]. Additionally, the fusion strategy normally requires a pair of existing low-resolution spectral and HR-RGB images, largely limiting its transferability in practical applications.
Different brightness levels of input images were also covered by the trained models through the scaling factor, which changed the brightness from 0.1 to 1.9 times on input images and reconstructed outputs, as shown in [21]; this could be inferred from the excellent performance of the trained models on various testing images in the field. Brightness invariance is important because brightness can be easily affected by many factors such as shutter speed, illumination and aperture size [53]. Most spectral reconstruction studies failed to cover this aspect, resulting in large errors when RGB images with varying exposures were tested [6]. An earlier study using deep learning to reconstruct spectral images in a brightness-invariant way still failed to manage MSI reconstruction due to the missing standard function converting MSIs to ncRGB images [53]. Nonetheless, most images are more or less white balanced through either automatic or manual correction in practice. In precision agriculture, either reference reflectance panels or a Downwelling Light Sensor is a must in order to correctly calculate comparable reflectance of the crops in time series and under changing environmental conditions [58]. Even though it is not strictly necessary to train a model with brightness differences considered, it is still appealing if the model can handle them well. The brightness-invariant property of the trained models also guarantees much greater potential and robustness towards various real-world situations.
The trained brightness-invariant models still could not handle shadow well. Brightness is a matter of light intensity, i.e., differences in magnitude, while the spectral shape remains relatively constant [53]. In contrast, shadows affect both the shape and the magnitude of the spectrum due to heterogeneous illumination and the geometric structure of objects [59]. Shadows are unavoidable in field images, and reflectance values under shadow are generally much lower, close to 0 in all wavebands, than under well-illuminated conditions [60], making them prone to large MRAE errors during spectral reconstruction. Nansen et al. [61] found that a model developed on crops under well-lit conditions failed to work on the same crop under shade. The unpredictable properties of spectra under shadow might be one of the biggest obstacles reducing the generality of the trained models, as no two shadowed spectra are alike. Shadow removal, which has been studied extensively in recent years, is thus one avenue to further improve model performance [62][63][64].
In brief, loss functions regulate the "direction" in which a model learns. Despite their high importance, loss function optimization for spectral reconstruction tasks has been studied much less than architectural changes and other hyperparameter tuning in deep learning [39]. Directionless loss functions such as MSE loss and MRAE loss are commonly used in deep learning, including in most spectral reconstruction scenarios. In contrast, direction-sensitive loss functions such as SID loss and SAM loss were previously used mostly for spectral matching on hyperspectral images during multi/hyperspectral image exploration [6,65]. Considering the practical applications of spectral imaging, where the integrity of the whole spectrum is of interest, models trained with MRAE loss show much smaller bias than models trained with MSE loss [13,39].
The superiority and consistency of models trained with MRAE loss were also confirmed by Zhao et al. [39]. SID loss, however, only regulates the shape of reconstructed MSIs, leaving magnitude-related errors enormously high [26]. If the aim is to reconstruct MSIs as close to ground truth as possible and to obtain physically plausible ncRGB images converted from the reconstructed MSIs [21], SID loss alone is thus not an option. Nonetheless, because most spectral matching tasks search for spectral signatures to separate different materials or objects, SID loss remains essential in spectral studies.
The more of the overall difference a loss function can measure, the better the model can be regulated. In this study, models trained with the composite loss function MRAE-SID loss, whose subcomponents have complementary properties, consistently outperformed models trained with the individual subcomponents. While it was shown earlier that MRAE loss works more effectively than MSE loss [39], the composite loss function performed even better than MRAE loss alone because of the extra regulation of spectral shapes, specifically during model convergence. The below-average performance of models with MSE-SID loss might be due to the lower capability of the MSE loss component in managing the bias towards outliers or high reflectance values, analogous to the limited performance of models trained with MSE loss alone. The contributions of the subcomponents were set equal in this study; tuning their ratio might further improve the final model performance but was beyond the scope of this study.
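The complementary roles of the two subcomponents can be sketched as below. This is an illustrative NumPy version, not the training code: SID is written as the symmetric Kullback–Leibler divergence between band-normalized spectra, the equal weights mirror the equal contributions used in this study, and the epsilon terms are numerical-stability assumptions. A pure brightness error leaves SID near zero while MRAE is penalized, showing how MRAE constrains magnitude and SID constrains shape.

```python
import numpy as np

def mrae_loss(pred, gt, eps=1e-8):
    """Mean relative absolute error over all pixels and bands."""
    return float(np.mean(np.abs(pred - gt) / (gt + eps)))

def sid_loss(pred, gt, eps=1e-8):
    """Spectral information divergence: symmetric KL divergence between
    spectra normalized to probability distributions along the band axis."""
    p = pred / (pred.sum(axis=-1, keepdims=True) + eps) + eps
    q = gt / (gt.sum(axis=-1, keepdims=True) + eps) + eps
    return float(np.mean(np.sum(p * np.log(p / q) + q * np.log(q / p), axis=-1)))

def mrae_sid_loss(pred, gt, w_mrae=1.0, w_sid=1.0):
    """Composite loss with equal weights: MRAE regulates magnitude, SID shape."""
    return w_mrae * mrae_loss(pred, gt) + w_sid * sid_loss(pred, gt)

# A uniform 10% brightness error: MRAE sees it, SID (shape only) does not.
gt = np.random.default_rng(0).uniform(0.05, 0.6, size=(8, 8, 10))
pred = gt * 1.1
```

An autograd framework version for training would follow the same formulas with tensor operations in place of NumPy calls.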
The performances of models trained with different loss functions on new datasets were not consistent with those on the training sets. Models with the composite loss function MRAE-SID loss fell short of models with MRAE loss, which is common in deep learning due to overfitting [39]. Spectral reflectance deformed by shadows also makes the models harder to generalize (see the discussion above). Moreover, it has been shown that the performance of spectral reconstruction models partially depends on the spatial pattern of the imaged objects [6]. This may explain why all evaluations yielded lower error values when testing on maize plants from another growing season than on rice. As MRAE-SID loss helped the models converge well in all three tasks, further increasing the diversity of, or augmenting, the training data might further increase model performance on unseen data. Nevertheless, the overfitted model still performed outstandingly on the different testing datasets, better than models trained with most of the other loss functions tested.

NDVI is a parameter indicating chlorophyll content [49] and is frequently used to create biochemical maps of field crops in precision agriculture. The fidelity of reconstructed MSIs can thus also be partially judged by whether NDVIs calculated from the reconstructed images match the ground truth values from MSIs, particularly whether they can similarly detect plants that were watered differently. The NDVIs showed significant differences among the levels of irrigation at the last growth stage of the maize plants. Even though the R² values of the subplots' NDVIs for the rice data were not as high as those for maize, this is reasonable, as the canopy structures of maize and rice are very different, which certainly affected the reconstruction of their spectral reflectance [66].
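Computing NDVI from a reconstructed MSI is a direct per-pixel operation; the sketch below uses the standard formula NDVI = (NIR − Red)/(NIR + Red). The band indices are hypothetical and depend on the sensor's band layout, and the toy reflectance values are invented to mimic healthy vegetation (low red, high NIR).

```python
import numpy as np

def ndvi(msi, red_band, nir_band, eps=1e-8):
    """NDVI = (NIR - Red) / (NIR + Red), per pixel, from an MSI cube.

    Band indices are assumptions here; they must match the sensor layout.
    """
    red = msi[..., red_band].astype(float)
    nir = msi[..., nir_band].astype(float)
    return (nir - red) / (nir + red + eps)

# Toy cube: healthy vegetation absorbs red and reflects NIR strongly.
msi = np.zeros((2, 2, 10))
msi[..., 2] = 0.05   # red band (index assumed)
msi[..., 7] = 0.45   # NIR band (index assumed)
print(ndvi(msi, red_band=2, nir_band=7))  # ~0.8 for every pixel
```

Averaging such an NDVI map over each subplot yields the per-plot values compared against ground truth in this study.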
Even though the NDVIs calculated from MSIs reconstructed directly from ncRGB-Cam images also correlated significantly with the ground truth ones, the relationship was further strengthened, with R² values rising from 0.75–0.78 to 0.89–0.91, when the ncRGB-Cam images were color-adjusted to match the ncRGB-Con ones.
The natural color space in which these ncRGB-Cam images lie also affects the final reconstructed MSIs [53]. Histogram color matching is commonly used in remote sensing to correct color differences caused by varying light and atmospheric conditions [67,68].
As the MIs of all the color-corrected ncRGB-Cam images increased substantially, by more than 100% for the rice images, the effectiveness of histogram color matching in bringing different ncRGB-Cam images into a more uniform color space is supported. The color-corrected images share a color space more similar to that of the corresponding ncRGB-Con images, which is the main reason the reconstructed MSIs reached higher accuracy.
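Histogram matching itself reduces to quantile mapping per channel: each source value is replaced by the reference value at the same empirical rank. The NumPy sketch below is an illustration of the idea, not this study's implementation (scikit-image's `exposure.match_histograms` offers an off-the-shelf equivalent); the function names are ours.

```python
import numpy as np

def match_histograms_channel(src, ref):
    """Map src values so their empirical CDF matches ref's (one channel)."""
    shape = src.shape
    s = src.ravel()
    order = np.argsort(s)                # rank positions of source pixels
    r_sorted = np.sort(ref.ravel())      # reference values by quantile
    # Interpolate reference quantiles at the source's rank positions.
    matched = np.empty(s.size, dtype=float)
    matched[order] = np.interp(
        np.linspace(0.0, 1.0, s.size),
        np.linspace(0.0, 1.0, r_sorted.size), r_sorted)
    return matched.reshape(shape)

def match_histograms(src_rgb, ref_rgb):
    """Per-channel histogram matching, e.g., ncRGB-Cam onto ncRGB-Con."""
    return np.stack(
        [match_histograms_channel(src_rgb[..., c], ref_rgb[..., c])
         for c in range(src_rgb.shape[-1])], axis=-1)
```

Because only ranks are preserved, the spatial content of the source image is untouched while its per-channel value distribution becomes that of the reference.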
The indexes generated from these corrected images were also more consistent with the ground truth ones in revealing the different water treatments. The spectral response functions of consumer-level cameras vary greatly, causing extensive color differences in the produced RGB images, yet these functions are mostly not available to consumers [69]. Transforming ncRGB-Cam images of different appearances into a standard format, e.g., ncRGB-Con in this study, should therefore be explored as a more robust way to simplify the reconstruction process while increasing the model's generality.
The indexes generated from reconstructed MSIs are more consistent with ground truth measurements than those derived from "normal", broadband RGB images. The best-known index characterizing chlorophyll content based on RGB images is TGI [70,71], which also tracks NDVI more closely than other indexes calculated from RGB images [71]. However, in contrast to the excellent agreement of NDVI rec, TGIs were not able to reveal the irrigation differences. It has been shown recently that the TGI calculation depends on the peak wavelength sensitivities, which are determined by the specifications of the Complementary Metal Oxide Semiconductor (CMOS) sensors of the camera used [50]. Even a recalibrated TGI formula intended to cover wider CMOS sensitivities could not fully account for the variability of the different camera sensors in our study. The reconstructed near-infrared information in MSIs is thus indispensable for achieving higher efficacy in distinguishing differently irrigated maize plants.
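The sensor dependence of TGI is visible directly in its formula. The sketch below uses the common formulation TGI = −0.5[(λR − λB)(ρR − ρG) − (λR − λG)(ρR − ρB)] with the nominal band centers 670/550/480 nm; the toy reflectance values are invented. Because the band-center wavelengths enter as coefficients, any shift in a camera's actual CMOS sensitivity peaks changes the index, which is the calibration problem discussed above.

```python
def tgi(red, green, blue, lam_r=670.0, lam_g=550.0, lam_b=480.0):
    """Triangular Greenness Index. The band centers (nm) are nominal
    defaults and shift with the actual peak sensitivities of the sensor."""
    return -0.5 * ((lam_r - lam_b) * (red - green)
                   - (lam_r - lam_g) * (red - blue))

# Toy reflectances: green vegetation (strong green peak) vs. bare soil.
print(tgi(red=0.08, green=0.20, blue=0.05))  # clearly higher for vegetation
print(tgi(red=0.30, green=0.28, blue=0.25))  # near zero for soil
```

Unlike NDVI, no near-infrared band enters the formula, which is consistent with TGI's weaker separation of the irrigation treatments observed here.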

Conclusions
We validated the fidelity of the deep learning model in reconstructing MSIs from high-resolution ncRGB-Cam images irrespective of brightness levels. We illustrated the benefits of combining complementary loss functions to supervise the convergence direction of the model on different tasks. The advantage of natural color conversion of tcRGB images in improving the performance of MSI reconstruction models was also highlighted. The model was trained and validated using one crop (maize) imaged in different years and successfully applied to another crop, rice, with a totally different canopy structure. The superiority of the reconstructed NDVIs over the frequently used TGI derived from broadband RGB images in separating differently watered maize plants was demonstrated.
The application of reconstructed MSIs in precision agriculture has only just begun, and more studies in this area should be conducted, especially targeting image segmentation, object detection and 3D image reconstruction, in which both higher spatial and spectral resolution play important roles [5,72,73]. For further advances in this field, some key limitations have to be solved. A more robust mapping function should be developed to connect MSIs to the ncRGB-Cam color space, rather than a complicated model relying on benchmark HSIs. A spectral response function is needed that translates the reflectance of hyperspectral, multispectral or even RGB images directly to the target ncRGB-Cam image through deep learning; for example, an end-to-end supervised linear neural network with physical constraints could be integrated into the reconstruction model. Another issue in multispectral image reconstruction is the ubiquitous presence of shadows. Image pre-processing could either transform images into a shadow-invariant space or apply deep-learning methods to remove shadows before the images are used for multispectral image reconstruction [74,75]. Once these strategies are incorporated into current reconstruction models, full-spectrum reconstruction will likely become more precise and thus automatable. Hyperspectral images, featuring a higher spectral resolution and thus wider application potential than MSIs, should also be a focus in the future.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/rs14051272/s1, Table S1: The Kullback divergence-based mutual information (MI) between the ncRGB-Con image and ncRGB-Cam images of maize and rice on different DOYs in 2018 either without (MI or) or with (MI matched) histogram color matching. Figure S1: Examples of subplots in the fields of maize on DOY 354 in both 2018 (a) and 2019 (b), and rice on DOY 328 in 2018 (c). Figure S2: Model-TN for tcRGB to ncRGB image conversion. The architecture of Model-TN, composed of only one residual block (highlighted in the dashed box), is shown in (a). The whole process of ncRGB image conversion is shown in (b), with model training in the red box and prediction in the blue box. Figure S3: Model-NM for ncRGB image to MSI conversion. The architecture of Model-NM, composed of three residual blocks (highlighted in dashed boxes), is shown in (a). The whole conversion process from ncRGB image to MSIs is shown in (b). Figure S4: The flowchart of the detailed analysis. Figure S5: The ncRGB-Con images of maize on DOY 354 (a) and rice on DOY 328 in 2018 (b) used for visualization in Figures 3 and 4, respectively.
Author Contributions: J.Z. performed data analysis and drafted the manuscript; A.K., B.N.B., B.M. and P.R. performed the field experiment and collected data; S.N. and W.G. designed the experiment and acquired the funding; all authors contributed substantially to the manuscript preparation and revision. All authors have read and agreed to the published version of the manuscript.
Funding: This study is partially funded by the Japan Science and Technology Agency (JST) and India Department of Science and Technology (DST), SICORP Program JPMJSC16H2, and JST AIP Acceleration Research "Studies of CPS platform to raise big-data-driven AI agriculture". B.R. was funded by the University of Natural Resources and Life Sciences Vienna.