Random Reflectance: A New Hyperspectral Data Preprocessing Method for Improving the Accuracy of Machine Learning Algorithms

Dmitriev, Pavel A.; Dmitrieva, Anastasiya A.; Kozlovsky, Boris L.

doi:10.3390/agriengineering7030090


by Pavel A. Dmitriev *, Anastasiya A. Dmitrieva and Boris L. Kozlovsky

Botanical Garden, Academy of Biology and Biotechnologies, Southern Federal University, Rostov-on-Don 344006, Russia

* Author to whom correspondence should be addressed.

AgriEngineering 2025, 7(3), 90; https://doi.org/10.3390/agriengineering7030090

Submission received: 10 February 2025 / Revised: 10 March 2025 / Accepted: 18 March 2025 / Published: 20 March 2025

(This article belongs to the Collection Exploring the Application of Artificial Intelligence and Image Processing in Agriculture)


Abstract

Hyperspectral plant phenotyping has a wide range of applications in agriculture, forestry, food processing, medicine and plant breeding. It yields a large amount of spectral and spatial information about an object, but it has inherent limitations, notably noise and information redundancy. The present study assesses a novel approach to hyperspectral data preprocessing, Random Reflectance (RR), for the classification of plant species. Machine learning (ML) algorithms, specifically Random Forest (RF) and Gradient Boosting (GB), were used to compare the performance of RR with that of Min–Max Normalisation (MMN) and Principal Component Analysis (PCA). Testing was conducted on data from proximal hyperspectral imaging (HSI) of leaves of three maple species, sampled from trees at 7–10-day intervals between 2021 and 2024. With RR preprocessing, the RF algorithm showed a relative increase in F1-score of 8.8% in 2021, 9.7% in 2022, 11.3% in 2023 and 11.8% in 2024. The GB algorithm showed a similar trend: 6.5% in 2021, 13.2% in 2022, 16.5% in 2023 and 17.4% in 2024. Preprocessing hyperspectral data with the MMN and PCA methods did not improve accuracy when classifying species using ML algorithms. The effect of the RR method may arise because the set of synthesised spectral profiles reflects the general parameters of spectral reflectance more strongly than the set of actual profiles. Future research is expected to provide a mechanistic rationale for the effect of the RR method in conjunction with the RF and GB algorithms, and the method will also be evaluated with deep machine learning algorithms.

Keywords: hyperspectral phenotyping; proximal hyperspectral imaging; spectral band; synthetic hyperspectral profile; Acer; random forest; gradient boosting

1. Introduction

Remote and proximal hyperspectral phenotyping is now widely used in agriculture and forestry [1,2,3]. Hyperspectral imaging (HSI) is a technique that provides simultaneous spectral and spatial information about an object by combining the data into a three-dimensional matrix, known as a hypercube. This method employs two approaches to study the object—imaging and spectroscopy—which enable simultaneous assessments of both physiological and morphological parameters of plants [4,5,6]. Hyperspectral imaging is characterised by high spatial and spectral resolution but is sensitive to noise and can lead to impractical data sizes due to redundancy. Noise in the data can be caused by sensor calibration, atmospheric effects, and light scattering due to the plant morphology [5,7].

Proximal HSI is a new technology for acquiring hyperspectral images of ultra-high spatial resolution [4]. One of the current remote sensing problems that can be addressed by proximal HSI is the identification of woody plants from their spectral characteristics. Without solving this problem, it is impossible to remotely monitor the condition of fruit trees [8] and tree plantations in settlements and forests [3,9,10]. In proximal HSI, the primary source of noise is scatter effects, which introduce nonlinearity into the data [7].

Data redundancy and noise complicate hyperspectral data analysis, processing and storage. Therefore, preprocessing, including noise removal and information compression, is a critical step in hyperspectral data analytics. Currently, there is a large set of simple methods of hyperspectral data preprocessing aimed at noise removal. These include simple spectral averaging, spectral smoothing (moving average methods, the Savitzky–Golay filter, first derivatives and second derivatives), Standard Normal Variate, multiplicative scatter correction, Min–Max Normalisation (MMN), Principal Component Analysis (PCA), mean centring, area under the curve and single wavelength [4,11,12,13,14]. These methods are quite simple, easy to understand, long-tested and do not require large computational costs. Concurrently, these techniques do not significantly enhance the classification accuracy of machine learning (ML) algorithms [15,16,17,18].

Classification accuracy can be significantly affected by more complex variants of spectral averaging due to the enlargement of the object of classification, e.g., from pixel to leaf or crown and from pixel to superpixel [19,20]. The use of superpixels created by different methods can meaningfully improve the classification accuracy of ML algorithms [21,22,23,24]. In addition, the enlargement of the analytics object leads to a reduction in the amount of data, but the use of these preprocessing methods results in the loss of some of the spectral information.

There are other promising approaches to improve classification accuracy. Thus, Fan et al. [25] proposed a new image preprocessing method, aiming to reduce the complexity of spatial information to facilitate model training and improve accuracy. This method evaluates the similarity between the central pixel and other pixels in each image fragment and then replaces pixels that are different from the centre pixel with pixels that are like the central pixel. This method can be considered as one of the variants of spectral smoothing. Shenming et al. [26] proposed a hyperspectral image classification method to improve the accuracy of models, which combines a two-dimensional Gabor filter with random patch convolution (GRPC) feature extraction to obtain spatial–spectral feature information. Duan et al. [27] proposed a novel complex total variation method for extracting structural features from hyperspectral images to improve classification accuracy. However, such preprocessing methods are quite complex and consequently require large computational resources.

This study presents a very simple method for hyperspectral data preprocessing, which can be easily implemented in any computing environment and does not require large computational resources. The method, Random Reflectance (RR), synthesises artificial spectral profiles (SPs) by randomly selecting the value of each spectral band (SB) from the original data within the region of interest (ROI) in the hyperspectral image.

The aim of this study was to test the RR method for plant species classification tasks using the Random Forest (RF) and Gradient Boosting (GB) algorithms and to compare its performance with MMN and PCA.

It is found that the RR method significantly improves the accuracy of ML algorithms.

2. Materials and Methods

2.1. Object of Study

This study was conducted in the Botanical Garden of the Southern Federal University (SFedU, 47°13′ N; 39°39′ E) from 2021 to 2024. The objects of the study were three maple species: Acer campestre L., A. negundo L. and A. saccharinum L. Acer campestre and A. saccharinum are valuable plants for ornamental horticulture [28]. In contrast, A. negundo is a widespread invasive species that requires constant monitoring [29]. Each species was represented by three trees. All specimens were of the same age and grown under the same conditions, including light exposure. Maple leaves, sampled from the trees at 7–10-day intervals during the vegetation period, were the object of laboratory proximal HSI. Ten leaves were sampled from each tree along the perimeter of the crown. The sampled leaves were transported to the laboratory for HSI within an hour.

2.2. Hyperspectral Imaging

Hyperspectral imaging of maple leaves was carried out in laboratory conditions under artificial light using a Cubert UHD-185 frame camera (Cubert GmbH, Ulm, Germany). The camera has a spectral range of 450 to 950 nm and a spectral resolution of 4 nm, giving 125 SBs. The leaf was positioned 40 cm from the camera lens (pixel size 0.25 cm²).

2.3. Hyperspectral Data Preprocessing

All spectra in the hyperspectral images were first smoothed using the Savitzky–Golay filter. Region of interest (ROI) selection on the hyperspectral image was performed using two-stage segmentation [30]. In the first stage, pixels with a Carter5 [31] vegetation index value greater than 1.4 were selected. In the second stage, morphological erosion was applied to the resulting mask using a 3 × 3 structuring element. A flowchart explaining the technology roadmap for this study is presented in Figure 1.
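The two-stage ROI selection can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that Carter5 is the ratio R695/R670, that the band centres run from 450 nm in 4 nm steps, and that the hypercube is a NumPy array of shape (rows, columns, bands). The helper names `band_index`, `erode_3x3` and `select_roi` are hypothetical, and the Savitzky–Golay smoothing step is omitted.

```python
import numpy as np

def band_index(wavelengths, target_nm):
    """Index of the band whose centre is closest to target_nm."""
    return int(np.argmin(np.abs(wavelengths - target_nm)))

def erode_3x3(mask):
    """Binary erosion with a 3 x 3 structuring element (pure NumPy)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    rows, cols = mask.shape
    out = np.ones_like(mask, dtype=bool)
    # A pixel survives only if all 9 pixels of its neighbourhood are set
    for dr in range(3):
        for dc in range(3):
            out &= padded[dr:dr + rows, dc:dc + cols]
    return out

def select_roi(cube, wavelengths, threshold=1.4):
    """Stage 1: vegetation-index threshold. Stage 2: 3 x 3 erosion."""
    r695 = cube[:, :, band_index(wavelengths, 695)]
    r670 = cube[:, :, band_index(wavelengths, 670)]
    # Assumed Carter5 definition: R695 / R670 (guarded against division by zero)
    carter5 = r695 / np.maximum(r670, 1e-12)
    return erode_3x3(carter5 > threshold)

# Toy hypercube: 7 x 7 pixels, 125 assumed band centres (450, 454, ..., 946 nm)
wavelengths = np.arange(450, 950, 4)
cube = np.full((7, 7, wavelengths.size), 0.05)
# A 5 x 5 "leaf" in which R695 is twice R670
cube[1:6, 1:6, band_index(wavelengths, 695)] = 0.10
roi = select_roi(cube, wavelengths)
print(roi.sum())  # number of pixels kept after erosion
```

Erosion removes the outer ring of the thresholded mask, which discards mixed pixels along the leaf edge.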

Synthetic SPs were used to eliminate scatter effects and improve the classification accuracy of ML algorithms. Each synthetic SP was generated by randomly selecting, by sampling with replacement, a reflectance (R) value for each SB from the set of real SPs corresponding to pixels within a particular maple leaf on a hyperspectral image. This method of generating synthetic spectral profiles was named Random Reflectance (RR). The principle of synthetic profile generation is demonstrated in Figure 2.
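The synthesis step described above can be sketched in a few lines of NumPy. This is an illustrative reading of the description rather than the authors' implementation. The function name `random_reflectance` and its parameters are assumptions, and the input is taken to be an array of shape (pixels, bands) holding the real SPs of one leaf.

```python
import numpy as np

def random_reflectance(profiles, n_synthetic, rng=None):
    """Synthesise spectral profiles by per-band random sampling.

    profiles    : (n_pixels, n_bands) real SPs from one leaf (ROI pixels)
    n_synthetic : number of synthetic SPs to generate
    rng         : seed or numpy Generator (assumed parameter, for repeatability)
    """
    rng = np.random.default_rng(rng)
    n_pixels, n_bands = profiles.shape
    # For every synthetic profile and every band, pick a source pixel
    # independently and with replacement
    rows = rng.integers(0, n_pixels, size=(n_synthetic, n_bands))
    # Element [i, j] of the result is profiles[rows[i, j], j]
    return profiles[rows, np.arange(n_bands)]

# Toy leaf with 4 pixels and 3 bands; each column holds one band's values
leaf = np.arange(12.0).reshape(4, 3)
synthetic = random_reflectance(leaf, n_synthetic=300, rng=0)
print(synthetic.shape)
```

Every value in band j of a synthetic profile is a real measurement from band j, but the values within one synthetic profile generally come from different pixels.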

The performance of the RR method was compared with hyperspectral data preprocessing methods such as MMN and PCA (Figure 1).
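For orientation, the two reference methods can be sketched with NumPy alone. The text does not say whether normalisation was applied per band or per profile, nor how many principal components were retained. The sketch assumes per-band scaling and two components, and the function names are hypothetical.

```python
import numpy as np

def min_max_normalise(X):
    """Scale each band (column) of X to the range [0, 1]."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for flat bands
    return (X - lo) / span

def pca_scores(X, n_components):
    """Project X onto its first principal components via SVD."""
    centred = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.random((50, 125))                   # toy data: 50 profiles x 125 bands
X_mmn = min_max_normalise(X)                # MMN
X_pca = pca_scores(X_mmn, n_components=2)   # normalisation followed by compression
print(X_mmn.shape, X_pca.shape)
```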

2.4. Analysing Preprocessed Hyperspectral Data

The classification of maple species based on SB values was performed using the RF and GB algorithms. A total of 70% of the data were used for training and 30% for testing. Five-fold cross-validation was used to tune the hyperparameters and to assess efficacy (Figure 3). RF hyperparameters: number of trees = 100, number of features used for node partitioning = 5 and number of predictors randomly sampled at each split = 64. GB hyperparameters: maximum tree depth = 3, learning rate = 0.2 and number of trees = 500. The number of iterations of the RF and GB models was 100. The number of original SPs was levelled to the minimum number per leaf on a particular date. The number of synthetic SPs per leaf was 300.
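The levelling of original SPs and the 70/30 split might look like the sketch below. The text does not state how the SPs were subsampled or whether the split was stratified or grouped by leaf. Random subsampling without replacement and a simple random split are assumptions here, as are the function names.

```python
import numpy as np

def level_profiles(leaves, rng=None):
    """Subsample every leaf to the smallest pixel count among the leaves.

    leaves : list of (n_pixels_i, n_bands) arrays for one imaging date
    """
    rng = np.random.default_rng(rng)
    n_min = min(leaf.shape[0] for leaf in leaves)
    return [leaf[rng.choice(leaf.shape[0], size=n_min, replace=False)]
            for leaf in leaves]

def split_70_30(X, y, rng=None):
    """Random 70/30 split into training and test sets."""
    rng = np.random.default_rng(rng)
    order = rng.permutation(X.shape[0])
    n_train = int(round(0.7 * X.shape[0]))
    train, test = order[:n_train], order[n_train:]
    return X[train], X[test], y[train], y[test]

# Toy example: three leaves with 5, 8 and 12 ROI pixels, 125 bands each
leaves = [np.full((n, 125), float(i)) for i, n in enumerate((5, 8, 12))]
levelled = level_profiles(leaves, rng=0)

X = np.arange(200.0).reshape(100, 2)
y = np.arange(100) % 3
X_train, X_test, y_train, y_test = split_70_30(X, y, rng=0)
print([leaf.shape[0] for leaf in levelled], X_train.shape[0], X_test.shape[0])
```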

3. Results

3.1. Exploration Analysis of Synthetic Spectral Profiles

Exploratory analyses of synthetic SP were performed, and basic statistics were calculated for the SB of the original and synthetic SPs (Table 1).

Synthesising spectral profiles with the RR method does not change the quantitative or qualitative characteristics of the distribution of R values within each SB compared with the original SPs. The main parameters of the SB distributions also do not change as the number of synthetic SPs increases. This shows that the synthesis of artificial SPs involves no loss of spectral information (dimensionality and variation) within individual SBs. The method may therefore be suitable for balancing classes and expanding training samples when using ML algorithms.
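A comparison of this kind can be reproduced on toy data. The sketch below uses random numbers in place of real leaf spectra, so its output says nothing about the values in Table 1. It only shows how per-band statistics of original and synthetic SPs can be set side by side. The name `band_statistics` is hypothetical.

```python
import numpy as np

def band_statistics(profiles):
    """Per-band summary statistics of an (n_profiles, n_bands) array."""
    return {
        "min": profiles.min(axis=0),
        "max": profiles.max(axis=0),
        "mean": profiles.mean(axis=0),
        "median": np.median(profiles, axis=0),
        "sd": profiles.std(axis=0, ddof=1),
    }

rng = np.random.default_rng(0)
# Toy stand-in for the real SPs of one leaf: 200 pixels x 125 bands
original = rng.uniform(0.05, 0.60, size=(200, 125))

# RR draw: an independent pixel index per band, sampled with replacement
rows = rng.integers(0, original.shape[0], size=(3000, original.shape[1]))
synthetic = original[rows, np.arange(original.shape[1])]

stats_original = band_statistics(original)
stats_synthetic = band_statistics(synthetic)
for name in stats_original:
    gap = np.max(np.abs(stats_original[name] - stats_synthetic[name]))
    print(f"{name}: largest per-band difference = {gap:.4f}")
```

Because each synthetic value is drawn from the observed values of its own band, the synthetic minimum and maximum can never fall outside the original range, and the remaining statistics agree up to sampling error.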

Figure 2 shows that the randomisation of SB values of SP represents one of the variants of spectral averaging, which is performed at the SP level. To demonstrate this, the distributions of original and synthetic SPs were plotted according to the mean values of their R (Figure 4). The average R value of SP was calculated as the sum of R values of all its SB, which was then divided by their number (125).

The nature of the SP distributions indicates that synthetic SPs reflect the central tendency of the leaf spectral response more effectively, with their distributions clearly differentiated by species. This has potential significance for object classification using ML algorithms.
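The averaging effect at the SP level can be illustrated on simulated data. The toy leaf below is an assumption made for illustration only: one underlying spectrum, a per-pixel brightness offset as a crude stand-in for scatter effects, and small band-wise noise. It is not the study's data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_bands = 400, 125

# Toy leaf: shared spectrum + per-pixel offset (assumed scatter model) + noise
spectrum = np.linspace(0.10, 0.50, n_bands)
offset = rng.normal(0.0, 0.05, size=(n_pixels, 1))
noise = rng.normal(0.0, 0.01, size=(n_pixels, n_bands))
original = spectrum + offset + noise

# RR synthesis: 300 profiles, each band drawn independently with replacement
rows = rng.integers(0, n_pixels, size=(300, n_bands))
synthetic = original[rows, np.arange(n_bands)]

# Mean R of a profile: sum of its band values divided by the number of bands
mean_r_original = original.mean(axis=1)
mean_r_synthetic = synthetic.mean(axis=1)

print("spread of mean R, original :", mean_r_original.std())
print("spread of mean R, synthetic:", mean_r_synthetic.std())
```

In this toy model the per-pixel offsets are mixed across bands, so the mean R of synthetic profiles is concentrated much more tightly around the leaf-level mean than that of the original profiles.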

The use of synthetic SPs for maple classification is supported by the PCA results (Figure 5). In the plane of the first two principal components, the synthetic SPs group by species much more clearly than the original SPs.

A comparison of the matrices of pairwise coefficients of determination representing the strength of the correlation relationship between the SB of the original and synthetic SPs provides interesting information (Figure 6).

In synthetic SPs, there is no correlation between different SBs. The absence of multicollinearity phenomenon in synthetic SPs may be of importance when using ML algorithms.
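Why the correlation between bands vanishes can be seen on simulated data: in an original SP all bands come from the same pixel, whereas in a synthetic SP each band comes from an independently chosen pixel. The sketch below uses an assumed toy leaf (shared spectrum, per-pixel offset, small noise), and `mean_offdiag_r2` is a hypothetical helper.

```python
import numpy as np

def mean_offdiag_r2(profiles):
    """Mean pairwise R^2 between bands, excluding each band with itself."""
    r = np.corrcoef(profiles, rowvar=False)
    off_diagonal = ~np.eye(r.shape[0], dtype=bool)
    return float(np.mean(r[off_diagonal] ** 2))

rng = np.random.default_rng(2)
n_pixels, n_bands = 400, 125

# Toy leaf: a per-pixel offset shared by all bands makes the bands collinear
spectrum = np.linspace(0.10, 0.50, n_bands)
offset = rng.normal(0.0, 0.05, size=(n_pixels, 1))
noise = rng.normal(0.0, 0.01, size=(n_pixels, n_bands))
original = spectrum + offset + noise

# RR synthesis breaks the link between bands of the same pixel
rows = rng.integers(0, n_pixels, size=(300, n_bands))
synthetic = original[rows, np.arange(n_bands)]

print("mean R^2, original :", mean_offdiag_r2(original))
print("mean R^2, synthetic:", mean_offdiag_r2(synthetic))
```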

3.2. Results of Maple Classification Using Machine Learning Algorithms Based on Original and Synthetic Spectral Profiles

Time series of hyperspectral leaf images of three maple species for four growing seasons (2021–2024) were used for classification by ML algorithms.

The results of the RF classification of three maple species based on original, synthetic, MMN and PCA preprocessed spectral data are presented in Figure 7.

The use of SPs synthesised with the RR method improved the accuracy of the RF classification of maple species. The relative increase in F1-score was 8.8% in 2021, 9.7% in 2022, 11.3% in 2023 and 11.8% in 2024. The increase in classification accuracy was observed on all imaging dates and in all years. At the same time, preprocessing of hyperspectral data using the MMN and PCA methods did not improve the classification accuracy. In connection with this result, it is interesting to analyse the dependence of out-of-bag (OOB) error on the number of trees in the RF algorithm when original and synthetic SPs are used in classification (Figure 8).
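The text does not give the formula behind the relative increase. A natural reading, assumed here, is the difference between the F1-score with synthetic SPs and the F1-score with original SPs, divided by the latter and expressed in percent. The sketch also assumes macro-averaging over the three species. The numbers in the example are hypothetical and are not results from the study.

```python
import numpy as np

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1-scores (macro-averaging is an assumption)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for label in labels:
        tp = np.sum((y_true == label) & (y_pred == label))
        fp = np.sum((y_true != label) & (y_pred == label))
        fn = np.sum((y_true == label) & (y_pred != label))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

def relative_increase(baseline, improved):
    """Relative change of a score with respect to the baseline, in percent."""
    return 100.0 * (improved - baseline) / baseline

# Hypothetical labels for three species coded 0, 1, 2
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
f1 = macro_f1(y_true, y_pred, labels=(0, 1, 2))

# Hypothetical F1-scores: 0.80 with original SPs, 0.88 with synthetic SPs
gain = relative_increase(0.80, 0.88)
print(round(f1, 4), round(gain, 1))
```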

It is shown that the optimal number of trees for RF classification when using synthetic SPs is smaller than when using original SPs. At the same time, the OOB error is significantly lower.

The results of the GB classification of three maple species based on original, synthetic, MMN and PCA preprocessed spectral data are presented in Figure 9.

As was the case with the RF algorithm, the GB algorithm performed better in classifying the three maple species when using synthetic SPs than when using the original SPs or data preprocessed with the MMN and PCA methods. The relative increase in F1-score was 6.5% in 2021, 13.2% in 2022, 16.5% in 2023 and 17.4% in 2024.

With both algorithms, the following trend is observed: the lower the classification accuracy based on the original SPs, the greater the positive effect of using synthetic SPs.

Thus, the effect of using RR is independent of the classification algorithm, seasons (spring, summer, autumn) of HSI and the year of the experiment.

4. Discussion

In recent years, remote sensing with hyperspectral cameras has assumed a pivotal role in agriculture, forestry and other human activities [1,2,3]. HSI is a more efficient method for spectral phenotyping of plants than multispectral imaging [4,5,6,32]. At the same time, the analysis of hyperspectral data is complicated by several factors, including information redundancy, strong correlation between neighbouring spectral channels (multicollinearity), nonlinear data structure (partly as a consequence of noise) and a small number of training samples [7,26,33]. Preprocessing is therefore a very important step in hyperspectral data analytics. The present study puts forward a novel RR method for generating synthetic SPs for subsequent use in ML algorithms. The SPs are synthesised by randomly selecting reflectance (R) values for each SB from the set of original SPs corresponding to the object pixels in the hyperspectral image. At first sight, transforming SPs with the RR method appears counterintuitive, since it increases entropy. Nevertheless, the efficacy of the RR method for preprocessing SPs to enhance the accuracy of plant species classification with ML algorithms is supported by research material collected over four years. Mixing reflectance values within separate SBs over the whole SP averages the ‘bad’ and ‘good’ SPs (pixels) of the object (maple leaf). At the same time, the SPs become closer to one another (i.e., more similar) in individual SBs. The set of synthetic SPs reflects the general parameters of the spectral reflectance of the object (the critical characteristics that distinguish different maple species) better than the set of original SPs.

When applying the RR method, both the dimensionality and the variation in SB values within the ROI remain constant (Table 1), so no spectral information is lost. This is particularly important for plant object classification using ML and deep ML algorithms, where the completeness of the information is crucial for effective classification. Part of what is treated as noise may be relevant information if it results, for example, from a leaf shape or surface structure characteristic of a particular plant species or cultivar [34,35]. The presented method of synthesising artificial SPs most likely cannot be considered a direct way to remove noise (all information on individual SBs is preserved completely; only the central tendency of the SP is changed). Considering that the RF algorithm is quite robust to noise in the data [15,17], the steady increase in the accuracy of maple classification after preprocessing the original SPs with the RR method gives reason to regard it as a means of improving the accuracy of classification algorithms.

Noise removal techniques such as spectral smoothing, Standard Normal Variate, multiplicative scatter correction, Min–Max Normalisation, mean centring, area under the curve and single wavelength do not reject ‘bad’ SPs but only transform them [4,13,14]. At the same time, unlike the RR method, these approaches can substantially change the original dimensionality and variation of reflectance values. Alterations in the statistical attributes of the initial hyperspectral data during spectral smoothing may therefore impede the accurate classification of vegetation objects [36].

The multicollinearity of features is a serious problem for ML [37]. It can cause overfitting of models, which leads to incorrect classification results. A high correlation between neighbouring SBs in a hyperspectral image serves to negate the advantage offered by high spectral resolution. The proposed RR method very strongly reduces the correlation (R² < 0.1) between SBs over the whole SP (Figure 6). This may be one of the reasons for the increased accuracy of ML algorithms.

This study compared the performance of three preprocessing methods for maple species classification: RR, PCA and MMN. Previous studies have indicated that PCA and MMN are effective preprocessing methods for hyperspectral data [11,38] and can improve classification accuracy [12,39]. However, in this study, MMN (data normalisation) and PCA (data normalisation followed by data compression) did not improve classification accuracy. It has been shown previously that the effectiveness of a particular hyperspectral data preprocessing method may depend on the nature of the data itself [40,41]. This emphasises the need to pre-test hyperspectral data preprocessing methods. The RR method performed consistently with both the RF and GB algorithms, so it is potentially applicable to other ML algorithms.

In consideration of the findings, three primary applications of Random Reflectance can be delineated:

To balance classes and expand the size of training samples, which is relevant in cases when the initial data are few or difficult to collect. This is justified by the fact that the method synthesises any number of SPs while preserving the statistical characteristics of the distribution of reflectivity values of their SBs. In such cases, it is crucial that the sample of the initial SP is representative of the object.
To improve the classification accuracy of ML algorithms, which is justified by the significant improvement in prediction accuracy of maple species observed with both the RF and GB algorithms.
To counter the collinearity of neighbouring spectral channels in the hyperspectral cube; applying the RR method reduces the R² value between spectral channels to below 0.1.

The RR method is a simple, computationally inexpensive technique for hyperspectral data preprocessing that has been shown to improve the accuracy of plant object classification. This makes it suitable for implementation in programmes for the remote identification of plant species in real time, a potential advantage over methods relying on deep ML [25,26,27].

5. Limitations

The use of the RR method for hyperspectral data preprocessing has several limitations. One is the loss of most of the spatial information: a synthetic SP cannot be linked to a specific pixel of the object (its location on the object is determined only with a certain probability). Another is that the efficiency of the RR method has been shown only for hyperspectral data; using it to preprocess multispectral data will probably not give a positive result.

6. Future Perspectives

Further testing of the RR method on deep machine learning algorithms is envisaged. In addition, future research will focus on establishing links between the accuracy of species classification and their phenological status. The experiment lasted four years, so it is possible to give a preliminary answer to the current question of spectral phenotyping of woody plants about the presence of phenological phases, interphase periods or climatic seasons in which the difference between species in spectral characteristics is most significant [42,43,44,45]. It is unfortunate that the present study failed to identify a well-defined pattern between leaf development and the accuracy of maple species classification.

7. Conclusions

The present study found that the use of SPs synthesised using the RR method significantly improves the accuracy of maple species classification using ML algorithms. For the RF algorithm, the relative increase in F1-score was 8.8% in 2021, 9.7% in 2022, 11.3% in 2023 and 11.8% in 2024. For the GB algorithm, it was 6.5% in 2021, 13.2% in 2022, 16.5% in 2023 and 17.4% in 2024. The performance of RR was compared with the MMN and PCA methods; preprocessing hyperspectral data with MMN and PCA did not increase accuracy when classifying species using ML algorithms. The effect of RR can be explained by a clearer reflection in the synthetic SPs of the central tendencies of leaf spectral characteristics that determine the differences between species. At the same time, there is no loss of spectral information in individual spectral bands; the minimum, maximum, mean, median and standard deviation of reflectance remain constant. Moreover, synthetic profiles can be used for class balancing and for enlarging samples, which is relevant when raw data are scarce or difficult to obtain. The RR method has also been shown to address multicollinearity in hyperspectral data, which is relevant when using ML algorithms.

Author Contributions

Conceptualization, P.A.D. and B.L.K.; data curation, A.A.D.; formal analysis, A.A.D.; investigation, P.A.D., B.L.K. and A.A.D.; methodology, P.A.D. and B.L.K.; project administration, P.A.D.; resources, P.A.D. and B.L.K.; software, A.A.D.; writing—original draft, P.A.D. and B.L.K.; writing—review and editing, P.A.D., B.L.K. and A.A.D. All authors have read and agreed to the published version of the manuscript.

Funding

The project was supported by the Russian Science Foundation under grant No. 24-24-00405, https://rscf.ru/project/24-24-00405/ (accessed on 8 February 2025), and performed at Southern Federal University (Rostov-on-Don, Russian Federation).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

García-Vera, Y.E.; Polochè-Arango, A.; Mendivelso-Fajardo, C.A.; Gutiérrez-Bernal, F.J. Hyperspectral Image Analysis and Machine Learning Techniques for Crop Disease Detection and Identification: A Review. Sustainability 2024, 16, 6064. [Google Scholar] [CrossRef]
Ram, B.G.; Oduor, P.; Igathinathane, C.; Howatt, K.; Sun, X. A systematic review of hyperspectral imaging in precision agriculture: Analysis of its current state and future prospects. Comput. Electron. Agric. 2024, 222, 109037. [Google Scholar] [CrossRef]
Yel, S.G.; Gormus, E.T. Exploiting hyperspectral and multispectral images in the detection of tree species: A review. Front. Remote Sens. 2023, 4, 1136289. [Google Scholar] [CrossRef]
Jian, L.; Asaari, M.S.M.; Haidi, I.; Khairi, M.I.; Abdul, D. A Review on Analysis Method of Proximal Hyperspectral Imaging for Studying Plant Traits. Pertanika J. Sci. Technol. T1 2023, 31, 2823–2850. [Google Scholar] [CrossRef]
Lu, B.; Dao, P.D.; Liu, J.; He, Y.; Shang, J. Recent Advances of Hyperspectral Imaging Technology and Applications in Agriculture. Remote Sens. 2020, 12, 2659. [Google Scholar] [CrossRef]
Wong, C.Y.S.; Gilbert, M.E.; Pierce, M.A.; Parker, T.A.; Palkovic, A.; Gepts, P.; Magney, T.S.; Buckley, T.N. Hyperspectral Remote Sensing for Phenotyping the Physiological Drought Response of Common and Tepary Bean. Plant Phenomics 2023, 5, 0021. [Google Scholar] [CrossRef]
Mishra, P.; Lohumi, S.; Khan, H.A.; Nordon, A. Close-range hyperspectral imaging of whole plants for digital phenotyping: Recent applications and illumination correction approaches. Comput. Electron. Agric. 2020, 178, 105780. [Google Scholar] [CrossRef]
Huang, Y.; Ren, Z.; Li, D.; Liu, X. Phenotypic techniques and applications in fruit trees: A review. Plant Methods 2020, 16, 107. [Google Scholar] [CrossRef] [PubMed]
Dmitriev, P.A.; Kozlovsky, B.L.; Dmitrieva, A.A.; Varduni, T.V. Maple species identification based on leaf hyperspectral imaging data. Remote Sens. Appl. Soc. Environ. 2023, 30, 100964. [Google Scholar] [CrossRef]
Hycza, T.; Stereńczak, K.; Bałazy, R. Potential use of hyperspectral data to classify forest tree species. N. Z. J. For. Sci. 2018, 48, 18. [Google Scholar] [CrossRef]
Mazdeyasna, S.; Arefin, M.S.; Fales, A.; Leavesley, S.J.; Pfefer, T.J.; Wang, Q. Evaluating Normalization Methods for Robust Spectral Performance Assessments of Hyperspectral Imaging Cameras. Biosensors 2025, 15, 20. [Google Scholar] [CrossRef] [PubMed]
Saha, D.; Manickavasagan, A. Machine learning techniques for analysis of hyperspectral images to determine quality of food products: A review. Curr. Res. Food Sci. 2021, 4, 28–44. [Google Scholar] [CrossRef]
Witteveen, M.; Sterenborg, H.J.C.M.; van Leeuwen, T.G.; Aalders, M.C.G.; Ruers, T.J.M.; Post, A.L. Comparison of preprocessing techniques to reduce nontissue-related variations in hyperspectral reflectance imaging. J. Biomed Opt. 2022, 27, 106003. [Google Scholar] [CrossRef]
Mishra, P.; Asaari, M.S.M.; Herrero-Langreo, A.; Lohumi, S.; Diezma, B.; Scheunders, P. Close range hyperspectral imaging of plants: A review. Biosyst. Eng. 2017, 164, 49–67. [Google Scholar] [CrossRef]
Arun, S.; Lovish, S.; Viren, C.; Debrup, C.; Aneek, B.R. Impact of Noise in Dataset on Machine Learning Algorithms. Mach. Learn. Res. 2019, 1, 1–8. [Google Scholar] [CrossRef]
Barra, I.; Briak, H.; Kebede, F. The application of statistical preprocessing on spectral data does not always guarantee the improvement of the predictive quality of multivariate models: Case of soil spectroscopy applied to Moroccan soils. Vib. Spectrosc. 2022, 121, 103409. [Google Scholar] [CrossRef]
Hoosen, A.N.; Onisimo, M.; Kabir, P.; Riyad, I. The Impact of Simulated Spectral Noise on Random Forest and Oblique Random Forest Classification Performance. J. Spectrosc. 2018, 2018, 8316918. [Google Scholar] [CrossRef]
Maharana, K.; Mondal, S.; Nemade, B. A review: Data pre-processing and data augmentation techniques. Glob. Transit. Proc. 2022, 3, 91–99. [Google Scholar] [CrossRef]
Zhang, C.; Liu, F.; He, Y. Identification of coffee bean varieties using hyperspectral imaging: Influence of preprocessing methods and pixel-wise spectra analysis. Sci Rep. 2018, 8, 2166. [Google Scholar] [CrossRef]
Clark, M.; Roberts, D.; Clark, D. Hyperspectral discrimination of tropical rain forest tree species at leaf to crown scales. Remote Sens. Environ. 2005, 96, 375–398. [Google Scholar] [CrossRef]
Sellars, P.; Aviles-Rivero, A.; Schonlieb, C.B. Superpixel Contracted Graph-Based Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4180–4193. [Google Scholar] [CrossRef]
Xie, F.; Gao, Q.; Jin, C.; Zhao, F. Hyperspectral Image Classification Based on Superpixel Pooling Convolutional Neural Network with Transfer Learning. Remote Sens. 2021, 13, 930. [Google Scholar] [CrossRef]
Li, D.; Wang, Q.; Kong, F. Superpixel-feature-based multiple kernel sparse representation for hyperspectral image classification. Signal Process. 2020, 176, 107682.
Tu, B.; Ren, Q.; Li, Q.; He, W.; He, W. Hyperspectral Image Classification Using A Superpixel-Pixel-Subpixel Multilevel Network. IEEE Trans. Instrum. Meas. 2023, 72, 5013616.
Fan, J.; Zhang, X.; Chen, Y.; Sun, C. Classification of hyperspectral image by preprocessing method based relation network. Int. J. Remote Sens. 2023, 44, 6929–6953.
Shenming, Q.; Xiang, L.; Zhihua, G. A new hyperspectral image classification method based on spatial-spectral features. Sci. Rep. 2022, 12, 1541.
Duan, P.; Kang, X.; Li, S.; Ghamisi, P. Noise-Robust Hyperspectral Image Classification via Multi-Scale Total Variation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 1948–1962.
Prakash, I. Comprehensive Review of Maple Trees: Evolution, Biogeographical Distribution, Ecology, and Economic Significance with Emphasis on Canada. Indian J. Ecol. 2024, 15, 1418–1423.
Khapugin, A. A global systematic review of publications concerning the invasion biology of four tree species. Hacquetia 2019, 18, 233–270.
Dmitriev, P.A.; Kozlovsky, B.L.; Dmitrieva, A.A. Assessing the phenological state of evergreen conifers using hyperspectral imaging time series. Remote Sens. Appl. Soc. Environ. 2024, 36, 101342.
Carter, G.A. Ratios of leaf reflectances in narrow wavebands as indicators of plant stress. Int. J. Remote Sens. 1994, 15, 697–703.
Adão, T.; Hruška, J.; Pádua, L.; Bessa, J.; Peres, E.; Morais, R.; Sousa, J.J. Hyperspectral Imaging: A Review on UAV-Based Sensors, Data Processing and Applications for Agriculture and Forestry. Remote Sens. 2017, 9, 1110.
Zhou, C.; Tu, B.; Ren, Q.; Chen, S. Spatial peak-aware collaborative representation for hyperspectral imagery classification. IEEE Geosci. Remote Sens. Lett. 2021, 19, 5506805.
Stasinski, L.; White, D.M.; Nelson, P.R.; Ree, R.H.; Meireles, J.E. Reading light: Leaf spectra capture fine-scale diversity of closely related, hybridizing arctic shrubs. New Phytol. 2021, 232, 2283–2294.
Peters, R.D.; Noble, S.D. Characterization of leaf surface phenotypes based on light interaction. Plant Methods 2023, 19, 26.
Patiluna, V.; Owen, J., Jr.; Maja, J.M.; Neupane, J.; Behmann, J.; Bohnenkamp, D.; Borra-Serrano, I.; Peña, J.M.; Robbins, J.; de Castro, A. Using Hyperspectral Imaging and Principal Component Analysis to Detect and Monitor Water Stress in Ornamental Plants. Remote Sens. 2025, 17, 285.
Chan, J.Y.-L.; Leow, S.M.H.; Bea, K.T.; Cheng, W.K.; Phoong, S.W.; Hong, Z.-W.; Chen, Y.-L. Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review. Mathematics 2022, 10, 1283.
Chen, Z.; Yang, B.; Wang, B. A Preprocessing Method for Hyperspectral Target Detection Based on Tensor Principal Component Analysis. Remote Sens. 2018, 10, 1033.
Cozzolino, D.; Williams, P.J.; Hoffman, L.C. An overview of pre-processing methods available for hyperspectral imaging applications. Microchem. J. 2023, 193, 109129.
Xu, X.; Chen, S.; Xu, Z.; Yu, Y.; Zhang, S.; Dai, R. Exploring Appropriate Preprocessing Techniques for Hyperspectral Soil Organic Matter Content Estimation in Black Soil Area. Remote Sens. 2020, 12, 3765.
Vaiphasa, C. Consideration of smoothing techniques for hyperspectral remote sensing. ISPRS J. Photogramm. Remote Sens. 2006, 60, 91–99.
Ferreira, M.P.; de Almeida, D.R.A.; de Almeida Papa, D.; Minervino, J.B.S.; Veras, H.F.P.; Formighieri, A.; Santos, C.A.N.; Ferreira, M.A.D.; Figueiredo, E.O.; Ferreira, E.J.L. Individual tree detection and species classification of Amazonian palms using UAV images and deep learning. For. Ecol. Manag. 2020, 475, 118397.
Pu, R.; Landry, S. Mapping Urban Tree Species by Integrating Multi-Seasonal High Resolution Pléiades Satellite Imagery with Airborne LiDAR Data. Urban For. Urban Green 2020, 53, 126675.
Grigorieva, O.; Brovkina, O.; Saidov, A. An original method for tree species classification using multitemporal multispectral and hyperspectral satellite data. Silva Fenn. 2020, 54, 10143.
Fang, F.; McNeil, B.E.; Warner, T.A.; Maxwell, A.E.; Dahle, G.A.; Eutsler, E.; Li, J. Discriminating tree species at different taxonomic levels using multitemporal WorldView-3 imagery in Washington DC, USA. Remote Sens. Environ. 2020, 246, 111811.

Figure 1. Roadmap for this study.

Figure 2. An example of synthesising a single synthetic SP using the RR method based on several original SPs.

Figure 3. The division of the dataset into training, testing and validation subsets.

Figure 4. Distribution of the original (a) and synthetic (b) SPs of leaves of three maple species by mean reflectance values.

Figure 5. Projection of the SPs of leaves of three maple species onto the first two principal components: (a) original SPs; (b) synthetic SPs (date of HSI: 15 August 2023).

Figure 6. Matrix of pairwise coefficients of determination for the values of SBs of the original (a) and synthetic (b) SPs.

Figure 7. F1-score dynamics from the RF classification of A. campestre, A. negundo and A. saccharinum using original, synthetic, MMN and PCA preprocessed spectral data.

Figure 8. Dynamics of OOB error values as a function of the number of trees during the RF classification of maples using original (a), synthetic (b), MMN (c) and PCA (d) preprocessed spectral data.

Figure 9. F1-score dynamics from the GB classification of A. campestre, A. negundo and A. saccharinum using original, synthetic, MMN and PCA preprocessed spectral data.

Table 1. Basic statistics of SB values of original and synthetic SPs (using one leaf of A. negundo as an example). Column headings give the band wavelength in nm.

Spectral Profile (Sample Size)	Statistic	454 nm	510 nm	562 nm	614 nm	670 nm	722 nm	770 nm	822 nm	870 nm	930 nm
Original (n = 172)	Min	3.0	6.0	9.0	6.0	5.0	16.0	17.0	17.0	17.0	11.0
	Max	5.0	11.0	15.0	12.0	10.0	48.0	59.0	59.0	57.0	39.0
	Mean	3.9	7.7	12.0	7.9	6.1	33.7	40.7	40.3	39.1	25.3
	Median	4.0	8.0	12.0	8.0	6.0	35.0	42.0	41.5	40.0	26.0
	Standard deviation	0.6	0.8	1.1	0.8	0.8	5.8	8.0	8.0	7.8	5.1
Synthetic (n = 172)	Min	3.0	6.0	9.0	6.0	5.0	16.0	17.0	21.0	20.0	11.0
	Max	5.0	11.0	15.0	12.0	10.0	48.0	59.0	59.0	57.0	36.0
	Mean	3.9	7.7	12.0	8.0	6.0	33.5	41.1	40.7	39.0	25.1
	Median	4.0	8.0	12.0	8.0	6.0	34.0	43.0	42.0	41.0	26.0
	Standard deviation	0.6	0.8	1.1	0.8	0.7	5.7	8.3	7.5	8.1	4.8
Synthetic (n = 300)	Min	3.0	6.0	9.0	6.0	5.0	16.0	17.0	17.0	17.0	11.0
	Max	5.0	11.0	15.0	12.0	8.0	48.0	59.0	59.0	57.0	39.0
	Mean	4.0	7.7	12.0	7.9	6.1	33.3	40.7	40.4	39.6	25.4
	Median	4.0	8.0	12.0	8.0	6.0	34.0	42.0	43.0	40.0	26.0
	Standard deviation	0.6	0.9	1.1	0.8	0.8	6.0	8.1	7.8	8.4	5.2
Synthetic (n = 1000)	Min	3.0	6.0	9.0	6.0	5.0	16.0	17.0	17.0	17.0	11.0
	Max	5.0	11.0	15.0	12.0	10.0	48.0	59.0	59.0	57.0	39.0
	Mean	3.9	7.7	12.0	7.9	6.1	33.5	40.7	40.5	39.1	25.5
	Median	4.0	8.0	12.0	8.0	6.0	35.0	42.0	41.0	40.0	26.0
	Standard deviation	0.6	0.8	1.1	0.8	0.8	5.7	8.0	7.7	7.6	4.9
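
The per-band statistics reported in Table 1 can be reproduced along the following lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `band_statistics` and `resample_per_band`, the toy array `original` and its values are all hypothetical. The resampling step assumes that an RR-style synthetic SP is built by drawing each band value independently from the original values of that band; this is only one reading of Figure 2 and Table 1 and is not confirmed by this section. Whether Table 1 uses the sample or the population standard deviation is also not stated, so `ddof=1` is an assumption.

```python
import numpy as np


def band_statistics(profiles):
    """Per-band summary of an (n_profiles, n_bands) array of reflectance values."""
    profiles = np.asarray(profiles, dtype=float)
    return {
        "min": profiles.min(axis=0),
        "max": profiles.max(axis=0),
        "mean": profiles.mean(axis=0),
        "median": np.median(profiles, axis=0),
        # Assumption: sample standard deviation (ddof=1).
        "std": profiles.std(axis=0, ddof=1),
    }


def resample_per_band(profiles, n_synthetic, seed=0):
    """Assumed RR-style synthesis: every band of every synthetic profile
    takes the value of that band from a randomly chosen original profile."""
    profiles = np.asarray(profiles, dtype=float)
    rng = np.random.default_rng(seed)
    n_profiles, n_bands = profiles.shape
    # One random source-profile index per (synthetic profile, band) cell.
    rows = rng.integers(0, n_profiles, size=(n_synthetic, n_bands))
    return profiles[rows, np.arange(n_bands)]


# Hypothetical toy data: 3 original profiles, 2 bands (values are made up).
original = np.array([[3.0, 16.0], [4.0, 35.0], [5.0, 48.0]])
synthetic = resample_per_band(original, n_synthetic=1000)

original_stats = band_statistics(original)
synthetic_stats = band_statistics(synthetic)
```

Under this assumption a synthetic band can never fall outside the range of the original band, which agrees with the Min and Max rows of Table 1, while the mean, median and standard deviation only approximate the original values.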

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
