True-Color Reconstruction Based on Hyperspectral LiDAR Echo Energy

: With the development of remote sensing technology, the simultaneous acquisition of 3D point cloud and color information has become the constant goal for scientiﬁc research and commercial applications in this ﬁeld. However, since radar echo data in practice refer to the value of the spectral channel and its corresponding energy, it is still impossible to obtain accurate tristimulus values of the point through color integral calculation after traditional normalization and multispectral correction. Furthermore, the reﬂectance of the target, the laser transmission power and other factors lead to the problems of no echo energy or weak echo energy in some bands of the visible spectrum, which further leads to large chromatic difference compared to the color calculated from the spectral reﬂectance of standard color card. In response to these problems, the hyperbolic tangent spectrum correction model with parameters is proposed for the spectrum correction of the acquired hyperspectral LiDAR in the 470–700 nm band. In addition, the improved gradient boosting decision tree sequence prediction algorithm is proposed for the reconstruction of missing spectrum in the 400–470 nm band where the echo energy is weak and missing. Experimental results show that there is relatively small chromatic difference between the obtained spectral information after correction and reconstruction and the spectrum of standard color card, achieving the purpose of true color reconstruction.


Introduction
With the emergence of projects such as "Realistic 3D" and "Smart City", there are increasingly urgent needs for target true color reproduction in the fields of urban planning, landform restoration, and scene reconstruction [1][2][3][4][5][6][7]. In order to reproduce the 3D urban real scene or reflect the real geographic pattern and information, the structure and color of the target or landform must be accurately restored [8]. In the visible range, spectral reflectance is the fingerprint of target surface color, which characterizes the ratio of the radiant exitance on object surface to the radiant incidence at each waveband, and reflects the object's absorption and reflection characteristics of radiant energy [9][10][11]. According to the colorimetry principle, the tristimulus values {X, Y, Z} [12] of the object color are obtained by integrating the relative spectral power distribution of illuminant l(λ), spectra of object surface r(λ), standard colorimetric observer matching functions {x(λ),ȳ(λ),z(λ)}, adjustment factor K, and wavelength sampling interval d(λ), as shown in Equation (1). X = K l(λ)r(λ)x(λ)dλ; Y = K l(λ)r(λ)ȳ(λ)dλ; Z = K l(λ)r(λ)z(λ)dλ; (1) As shown in Figure 1a, the new hyperspectral LiDAR [13][14][15][16][17][18][19] not only retains the three-dimensional space detection capability of laser radar, but also increases the number of laser detection bands. By combining the advantages of active detection technology and passive detection technology, the integrated acquisition of spatial 3D spectral information on the same device is realized, which increases the possibility of conducting 3D reconstruction and true-color restoration simultaneously [20]. However, since the laser radar echo data are the energy values corresponding to the spectral channel, there is relatively large spectral difference between the spectral response function calculated by the traditional normalization method and the spectrum correction with the standard spectrum provided by the X-rite ColorChecker Classic in practical applications, which leads to the colour difference in the color reconstruction. Moreover, the distribution of laser transmission power also causes some problems. As shown in Figure 1b, within the visible light range of 400-700 nm, there is no echo energy when the band is 400-430 nm, and the echo energy is weak when the band is 430-470 nm, which leads to the problem that the target cannot be correctly reconstructed with true color. Therefore, conduct normalization and spectrum correction on laser radar echo energy data, and perform spectrum reconstruction on the missing spectral information in the band of 400-470 nm have become the key to the true-color restoration of the target.  In terms of true-color reconstruction of hyperspectral LiDAR, Wang et al. [21] and Chen et al. [22] found that the most suitable wavelength combinations for RGB true-color reconstruction are 466 nm, 546 nm and 626 nm through the analysis of the true-color synthesis results of different wavelengths. Therefore, they selected three wide bands near the corresponding bands, especially the relatively wide blue band, to increase the energy of the RGB channel and obtain a higher signal-to-noise ratio. This method can achieve the color reconstruction of the target to a certain extent. On the other hand, due to the metamerism problem and the failure to consider the standard illuminant and standard observer, the color obtained by reconstruction will have color difference after changing the illuminant.
Normalization and spectrum correction, as the key in hyperspectral laser radar energy data processing, play a decisive role in the accuracy of the subsequent color integral calculation results of multispectral data. Traditional spectrum correction algorithms include first derivative spectrophotometry, centroid algorithm local similarity matching algorithm and various improvement methods [23]. Among them, first derivative spectrophotometry [24] and local similarity matching algorithm [25] have the advantages of simple operation and high real-time, as well as the disadvantage of being easily affected by noise. The centroid algorithm [26,27] is sensitive to the spectrum waveform asymmetry, which will cause the corrected multispectral image to generate random direction deviation. After simulation correction of radar echo energy through the above algorithms, the color obtained by spectral response curve integral calculation and standard color block calculation still has great color difference, which can cause the traditional spectrum correction method to fail to perform true-color reconstruction on the target.
Regarding the reconstruction of missing spectrum, the missing band fixed by laser radar is between 400-470 nm, and the information in the 470-700 nm band is relatively intact. At the same time, there is a strong correlation between the multispectral spectrum segments. Therefore, it is feasible to restore the missing 400-470 nm band by using 470-700 nm. Adam et al. [28] sampled the low-quality spectrum segments in multispectral data and the adjacent high-quality spectrum segments respectively, and rearranged them to form a 3D data cube. Then, wavelet decomposition and sparse transformation were used to restore the data cube. Zhong et al. [29] firstly established the conditional random fields model between multiple spectrums to select the unusable part of the spectrum and the spectrum segment with high signal-to-noise ratio. Then, the texture and statistical information related to the spectrum segment with high signal-to-noise ratio was used to restore the continuous missing spectrum segment. Yin et al. [30] believed that the spectral information of multispectral images was highly redundant. They discussed the sparsity of the spectrum by setting up the general Gaussian model. In addition, they conducted the spectrum-dimensional reconstruction based on the compressed sensing theory, which was superior to the linear interpolation in the recovery of missing spectrum. Liu et al. [31] established a bi-inverted Gaussian fitting model. As atmospheric water vapor absorbed the reconstruction vegetation and resulted in the unusable spectrum, the diagnostic spectrum features of vegetation absorption peak were extracted for the inversion of water content. It should be noted that the above model-based reconstruction methods for missing spectrum only have good effects for plants within certain wavelength ranges or specific types of geographic objects, and their versatility in true-color reconstruction is significantly limited.
In order to solve the problem of the large spectral difference between the normalized echo energy data based on traditional methods and the standard spectrum provided by Xrite ColorChecker Classic, this article proposes a Hyperbolic Tangent Spectrum Correction Model with Parameters. The echo energy processed by this model has a relatively small spectral difference compared with the X-rite Eye One Pro spectrophotometer at 10 nm intervals from 400 to 700 nm.
In order to solve the problem of serious color difference in true-color restoration due to the missing of 400-470 nm band spectrum, a series forecasting algorithm based on gradient boosting decision tree is proposed in this article. The spectrum data provided by X-rite ColorChecker Classic is used as the radar control experiment data, and the multispectral data set [32] provided by ICVL (interdisciplinary computational vision laboratory) is used as the prior data and test data. Through cyclic learning, the spectrum data are divided into 31 parts, and each feature is modeled separately. Through the locally weighted linear regression algorithm, the missing 400-470 nm band information is cyclically and accurately reconstructed, so as to solve the problem of great color difference in the true-color reconstruction result.
Through the spectral reconstruction by hyperbolic tangent normalization and correction model with parameters and improved gradient boosting decision tree sequence prediction algorithm, the relative energy data and coordinate point cloud data of the radar echo can be directly processed into the color information and position information of each point, thus achieving the integrated representation of echo energy-spectrum-color.

HSL System Description
All spatial point cloud and echo energy data in this article were obtained through the all-day active hyperspectral LiDAR prototype device. The laboratory HSL is mainly composed of five parts, including supercontinuum laser, optical receiver, light-splitting, signal reception and ranging components. First, high-nonlinearity photonic crystal fibers are used to generate and emit wide-band white lasers. Among them, the output power of the supercontinuum laser source is greater than 6 w, with the repetition rate of 0.1-1 mhz and the pulse duration of 4 ns. The spectral range of the laser is 400-2400 nm. In order to consider the data quality of backscattered signals, a high-precision two-dimensional scanning platform is adopted, and it is equipped with off-axis parabolic mirror of high reflectivity and Schmidt-Cassegrain telescope with focal length and diameter of 400 mm and 200 mm. The collected signal is guided by fiber to a grating spectrometer. In order to accurately project the spectral signals onto the detectors, we adopted a 150 g/mm blazed grating with high spectroscopic efficiency. The 32-element photosensitive photomultiplier tube array with a wavelength of 300-920 nm is selected as the received sensor. The output signal of the laser source and the backscattered signals of the target will be recorded into the APD detector respectively. By comparing the time interval between output signals and backscattered signals, time-of-flight can be used to calculate distance. The laser radar samples the multispectral data and point cloud data for each point on the X-rite ColorChecker Classic, as shown in Figure 2. The standard color card used was X-rite ColorChecker Classic, and the sampling method was the single point test of the color sample of color card. The final value was the average of three samplings at a single point. The collected standard diffuse reflection whiteboard channel energy values were used as the peak value of the traditional normalization method.

Hyperbolic Tangent Normalization and Correction Model with Parameters
After sampling the energy of X-rite ColorChecker Classic hyperspectral LiDAR, the corresponding energy distribution of each band is shown in Figure 3. By analyzing the corresponding relationship between energy and spectrum, the following findings are obtained. When the spectral band is between 400 nm and 430 nm, the echo energy obtained by the hyperspectral LiDAR is almost 0. As a result, the spectral information of this band is completely lost. When the spectral band is between 430-470 nm, the echo energy obtained by hyperspectral LiDAR is less than 5000. Therefore, it is a weaker echo energy range. When the spectral band is between 470-700 nm, the hyperspectral LiDAR has relatively normal echo energy data. Within the visible light range of 400-700 nm, the energy curve also corresponds to the divergence power of the laser radar, as shown in Figure 1b. Although the echo energy in the 470-700 nm range is normal, by measuring the standard diffuse reflection whiteboard, the spectral response curve of the normalized echo energy is significantly different from the standard spectral response curve provided by the X-rite ColorChecker Classic color card, as shown in Figure 4.  The expected spectral response curve has the higher sensitivity and change rate when the echo energy is low, as well as the lower sensitivity and change rate when the reflected energy approaches the maximum value. In order to get the accurate spectrum normalization value, this article proposes a new normalization model. The min-max normalization of the energy function in the 470-700 nm band is shown in Equation (2): where, MIN is the minimum value among all the energy values collected in the 470-700 nm band, and MAX is the maximum value. By comparing the spectral response curves, we found that the two curves have the same trend but have larger fluctuations. The function fitting method can solve this problem well. When it is close to 1, it can be flattened, and the other parts retain their original sensitivity. Hyperbolic tangent function has this property. As shown in Equation (3), the hyperbolic tangent spectrum correction model with parameters is used to correct the normalization value in the 470-700 nm band.
In the calibration process, the mean square difference between the target spectrum and the source spectrum information ∃b is calculated. By making f (x, b) = 0 and substituting the parameter b into the hyperbolic tangent calibration model, the normalized spectral response curve of 470-700 nm band is obtained.

Improved Spectrum Reconstruction of Gradient Boosting Decision Tree Series Forecasting
In essence, the gradient boosting decision tree is a kind of integrated learning method based on tree structure. The basic tree structure is the classification and regression tree, which is a locally weighted linear regression algorithm that divides the features into multiple parts and models each feature separately. Different from the traditional methods, the regression algorithm based on the tree structure is a parameter-based learning algorithm. Through the data training of model, the parameter value of each tree node is determined, without the need of updating in the subsequent prediction process.
Currently, Gradient Boosting Decision Trees [33][34][35] has many improved algorithms, but they are all for classification tasks. Zhang J., et al. [36] proposed Fisher score and Gradient Boosting Decision Trees algorithm, this paper combines Fisher algorithm with features trained in the gradient boosting decision tree for cancer risk identification. Zhang W., et al. [37] This paper is also an improved Gradient Boosting Decision Trees for the prediction model for air conditioning system loads. Chen Yiyi, et al. [38] The paper is a classification of data related to psychology by using gradient boosting decision trees. All of the above are improvements to classification problems using gradient boosting decision trees, which are not suitable for trend prediction tasks.
Firstly, the gradient boosting decision tree has been rewritten from the classification method to the sequence prediction algorithm. Therefore, the output value is no longer categories, but sequence prediction results. Secondly, if the correction function is not added, the subsequent value will become larger and larger, thus affecting the accuracy of prediction result.
It is assumed that the model is represented as F(x; P), where P represents the parameter set, P = {p 0 , p 1 , p 2 · · · p n }, and F(x; P) represents the prediction function x with P as the parameter. As each feature is modelled separately, the model is composed of multiple sub models, where β represents the weight of each sub model, and α represents the parameters in each sub model. Optimizing F(x; P) an be abstracted to optimizing {β, α}, i.e., optimizing parameter P. Considering that the optimized data set is normalized based on Equation (3), the expression for x in the form of Equation (3) is used to achieve consistency between the optimized result and the expected result.
Equation (4) performs a hyperbolic optimization of the expression for x. As it approaches 1, the model has less variation. At it is less than 1, the model can maintain its original sensitivity. The parameter b is derived by Equation (3). The parameter {β, α} in the model is represented by Equation (5): The purpose of this equation is to use N sample points (x i , y i ) to calculate the loss function under the model F(x; β, α). The purpose of optimizing {β, α} is to minimize the value of the loss function. The model F m (x) is optimized in the direction where the loss function of the previously obtained model F m−1 (x) declines fastest. Each point x i yields −−−→ g m (x i ). Equation (6) is used to derive the complete direction of gradient descent.

− →
In order for F m (x) to be in the direction of −−−→ g m (x i ), least-square method is used to optimize the Equation (5) to get Equation (7): On the basis of the derived α m , β m can be calculated, as shown in Equation (8): The final combined model equation is shown in Equation (9): where, the value of parameter b is the parameter value calculated by the normalization equation. The overall algorithm flow chart is shown in Figure 5, where the parameter P is the number of channels to be predicted, and M is the number of features. In addition, there are many trend prediction algorithms, such as seq2seq and Random-Forest. However, Seq2Seq and RandomForest are suitable for NLP tasks such as Machine Translation and Semantic Recognition.

Results
In order to enable the monitor to display the color situation correctly, the angle 10 • of CIE1964 was selected as the standard observer parameter, and the standard illuminant information was used as the D65 illuminant parameter. The tristimulus values calculated by the color integral equation were converted to the sRGB color gamut.
The spectral information provided by X-rite ColorChecker Classic was shown in Figure 6a. The result normalized by the standard diffuse reflection whiteboard was used to calculate the colors as shown in Figure 6b. The whiteboard with standard normalization is corrected by first derivative spectrophotometry, local similarity matching algorithm and centroid algorithm. The final calculated colors were shown in Figure 6d-f. As indicated, this type of correction algorithm was not applicable to spectrum color restoration. Through the hyperbolic tangent spectrum normalization and correction model with parameters proposed in this article, the results calculated by color integral are shown in Figure 6c. According to the color calculation results, first derivative spectrophotometry, local similarity matching algorithm and centroid algorithm were not suitable for true-color reconstruction. The traditional normalization result based on diffuse reflection whiteboard and the spectral difference between the normalization optimization algorithm proposed in this article as well as the standard spectrum of X-rite ColorChecker Classic were shown in Figure 7, the calculation method is shown in Equation (10).
Among them, the abscissa was the color number, and the ordinate was the spectral difference value. By comparison, it could be found that the spectrum after normalization and correction by the algorithm proposed in this article was closer to the standard spectrum. According to the color calculation results in Figure 6, it can be found that there was obvious colour difference between the spectral information missing the 400-470 nm channel and the X-rite ColorChecker Classic. In this article, the processed ICVL multispectral data set was used as the priori data set. Since 12 multispectral natural images with resolution of were composed, there were a total of 21,715,200 spectral information in 400-700 nm band in the data set. Among them, the first 80% of the spectral information of each image was used as the priori data, and the last 20% of the spectrum data was used as the test data. We did a five-fold cross-validation [39] on the training set and test set. The two indicators for comparison are ∆E 2000 , which mainly measures the color difference between the reconstructed color and the original color. The smaller the value, the more similar the two colors. The reconstructed part and the original part are calculated by RMSE, which measures the similarity between the reconstructed spectrum and the original spectrum. The smaller the value, the more similar the two spectra. The results are shown in Table 1. Based on the calculated prior data parameters, the improved gradient boosting decision tree series forecasting algorithm was applied to reconstruct the missing channel between 400-470 nm, and the results were shown in Figure 8b. The bi-inverted Gaussian model is applied to reconstruct the missing part of spectrum, and the color calculation result was shown in Figure 8d. The color calculation results of the normalized spectrum based on the traditional standard diffuse reflection whiteboard was shown in Figure 8c. The color calculation results based on the Linear Interpolation was shown in Figure 8e.  Figure 9 showed the color difference between the reconstruction results based on the above three methods and the X-rite ColorChecker Classic. Among them, the abscissa was the color number, and the ordinate was the color difference value [40]. According to the color difference value, there was smaller color difference between color information calculated by using the energy obtained by laser radar and X-rite Col-orChecker Classic.
At the same time, a simulation experiment based on Munsell 1269 color chip was carried out to verify the adaptability of the algorithm in this paper. According to the spectral reflectance of each color published by Munsell, the color calculated by the color integral equation was converted into the CIE Lab Color Space. Figure 10a was obtained, where value a was the abscissa (X-axis) and value b was the ordinate (Y-axis). After removing the 400-470nm band information, the calculated result was shown in Figure 10b. Through improved spectrum reconstruction of gradient boosting decision tree series forecasting, the 400-470 nm band information was complemented, and the calculated results were shown in Figure 10c. The results obtained by complementing the spectrum by the biinverted Gaussian model and linear Interpolation were shown in Figure 10d,e respectively. The color difference relationship between the color calculated by the recovered spectrum based on the above 4 methods and the color calculated by the complete spectrum was calculated, as shown in Figure 11.
As can be seen from the line chart Figure 11, the average color difference between 24 colors and 1269 colors was calculated as shown in Table 2 for the color blocks missing the 400-470 nm band by the method proposed in this article.
As can be seen from Table 2, compared with B the bi-inverted Gaussian model and the linear interpolation algorithm, the color results calculated by the method proposed in this paper had the advantage of smaller color difference.
For both data results of Mean and Max, the performance of the improved algorithm will be significantly enhanced than that of the original method. In the analysis of Figure 11, there was relatively large color difference in individual colors. The color difference heat map Figure 12 involved the missing spectra in the 400-470 nm band Figure 12a, the restored spectra based on the algorithm Figure 12b in this paper, and the complete spectra. Since the missing 400-470 nm band belongs to the blue band, a serious color cast appeared in the blue color zone. The algorithm in this paper had a relatively large color difference between the cyan area and the central area, which caused large fluctuations in these parts of the spectrum, which in turn led to the deviations in the prediction results. This could be used as the optimization direction for follow-up research.  In order to verify whether it can accurately restore the color information of geographic objects in the actual application, experiments on color card with 17 colors and real object are carried out, as shown in Figure 13.
By scanning the laser radar array of the color card with 17 colors, the spatial information of the color card with 17 colors and the energy information corresponding to the channel are obtained. The collected energy information is normalized and corrected through the hyperbolic tangent spectrum correction model with parameters. Then, spectral reconstruction is performed on the missing 400-470 nm bands based on the parameters calculated above. Finally, the spectrum integral of each reconstructed point is used to calculate the corresponding color information, as shown in Figure 14.  Based on the same approach, the result of real object data is shown in Figure 15. The experimental results show that the high spectrum correction algorithm and the missing reconstruction algorithm proposed in this article can also restore the target color accurately in actual application. Figure 15 is obtained by scanning the HSL system array. A total of 2800 points are obtained. In addition to the x.y.z coordinate data, there are 27 channels of data from 440-700 nm at 10 nm intervals. Through the processing of the spectral reconstruction by hyperbolic tangent normalization and correction model with parameters, the reconstruction of the missing part by the improved gradient boosting decision tree sequence prediction algorithm, and the combination of the color value of the point calculated based on the color integral formula and the x.y.z coordinate data, the final result figure is obtained. In addition, the problem of mixed pixels also led to inaccurate colors. As shown in Figure 12, the red part of the Rubik's Cube appeared muddy. This issue was also worthy of in-depth study.

Discussion
The above experiment did not consider the negative impact of atmospheric effects on the high-precision acquisition of backscattered intensity. Various conditions such as atmospheric components will make the backscattered laser energy suffer a certain degree of attenuation. For some satellite LiDAR systems, atmospheric effects may cause varying degrees of interference to the reconstructed colors, which further causes the reconstruction accuracy to fail to reach the laboratory effect.
Similarly, good hardware performance of system configuration is a prerequisite for true color reconstruction. Continuous supercontinuum laser source device is needed, with spectral output range that can cover most of the visible range and the acquired signal that has a higher signal-to-noise ratio (SNR). In order to the improve photoelectric transformation efficiency, photoelectric detectors also need to exhibit high quantum efficiency.
This research provides a reconstruction method that combines hyperspectral LiDAR echo energy and true-color. In the laboratory scenario, this method can more accurately reconstruct the true colors of color cards and objects in the absence of part of the visible light spectrum. Moreover, this method can also be applied to hyperspectral imaging to optimize the color reconstruction results.

Conclusions
This article proposes the hyperbolic tangent spectrum normalization and correction model and the improved spectrum reconstruction algorithm with gradient boosting de-cision tree sequence prediction, with the aim of directly converting radar echo energy data into image color information. The point cloud information obtained by hyperspectral LiDAR and the energy information of corresponding channel are used as the experimental data set. The results show that the color integral equation based on the normalization and correction function and the missing part spectrum reconstruction provided in this article can accurately achieve the true color reconstruction. By comparing various spectrum correction algorithms based on 24-color and 1269-color cards, it is found that the traditional spectrum correction algorithm is not applicable to the true color reconstruction. Compared with the traditional normalization method with standard diffuse reflection whiteboard, the spectrum corrected by the algorithm proposed in this article is closer to that of the standard color card. The improved spectrum reconstruction algorithm with gradient boosting decision tree sequence prediction proposed in this article can accurately reconstruct the missing multispectral band information between 400-470 nm. In addition, experiments suggest that it can also achieve good effects in practical application of hyperspectral LiDAR. This article demonstrates the possibility of true color reconstruction through hyperspectral LiDAR echo energy data, which is of vital significance and value for multispectral imaging and ecological monitoring.