Communication

Classification of Zones with Different Levels of Atmospheric Pollution Through a Set of Optical Features Extracted from Mulberry and Linden Leaves

1
Faculty of Technics and Technologies, Trakia University, 38 Graf Ignatiev Street, 8602 Yambol, Bulgaria
2
Faculty of Agriculture, Trakia University, Studentski Grad Str., 6000 Stara Zagora, Bulgaria
*
Author to whom correspondence should be addressed.
Environments 2026, 13(4), 185; https://doi.org/10.3390/environments13040185
Submission received: 15 February 2026 / Revised: 16 March 2026 / Accepted: 24 March 2026 / Published: 26 March 2026

Abstract

This study evaluates the ability of three classification procedures to distinguish areas with different levels of atmospheric pollution, based on biomonitoring carried out by analyzing the color and spectral characteristics of mulberry (Morus L.) and linden (Tilia L.) leaves. Sampling was carried out in areas that were grouped into four classes according to the concentrations of fine particulate matter (PM2.5, PM10) and gaseous pollutants (TVOC, NOx, SOx, CO, and eCO2), measured using a specialized multisensor device. A total of 57 informative features were analyzed, representing indices obtained from two color models (RGB and Lab), as well as from VIS and NIR spectral characteristics measured for the adaxial and abaxial leaf surfaces. The data processing methodology includes feature selection using the ReliefF method and a comparative analysis between two approaches to dimensionality reduction—principal components (PC) and latent variables (LV). The results indicate that data reduction using PC provides significantly higher accuracy and better class separability, regardless of the classifier used, compared to LV, where errors exceed 40%. The comparison between classifiers shows a clear superiority of nonlinear models. While linear discriminant analysis demonstrates low efficiency, quadratic discriminant analysis (Q and DQ) and SVM with radial basis function (RBF) achieve high accuracy of class separability, reaching 100% in the SVM-RBF model for both tree species. The study also reveals functional asymmetry: the adaxial side of the leaves is more informative for spectral indices, while the abaxial side is more sensitive to color changes. The results confirm that the combined optical characteristics obtained from the leaf surface of bioindicators form a reliable method for ecological monitoring of air quality in urban areas.

1. Introduction

Tree leaves are one of the most widely used bioindicators for assessing environmental quality due to their sensitivity to pollution, easy accessibility, and ability to accumulate both gaseous and solid pollutants. They respond to changes in atmospheric conditions through physiological, biochemical, and morphological changes that can be quantitatively measured and interpreted. According to Asif et al. [1], leaves are a reliable integral indicator of air quality, as they reflect both short-term and long-term effects of pollutants on plants. Similar conclusions are drawn by Mehmood et al. [2], who point out that plant species, especially trees, show clear reactions to pollution, making them suitable for biomonitoring in urban and industrial areas.
To complement the instantaneous measurements of gaseous pollutants and particulate matter, leaf-based optical indices serve as integrative bioindicators that reflect the cumulative deposition of pollutants and their physiological impact on plant tissues. Unlike direct air-quality measurements, these indices capture long-term stress responses, pigment degradation, structural changes, and species-specific sensitivity, providing a biological assessment of pollution effects.
Air pollution affects plant physiology through various mechanisms. Gaseous pollutants such as ozone, sulfur dioxide, and nitrogen oxides cause oxidative stress, which leads to chlorophyll degradation, disruption of photosynthetic processes, and changes in leaf metabolism [3,4,5]. Fine dust particles, on the other hand, accumulate on the leaf surface, reduce light transmission, alter leaf surface temperature, and can block stomata, leading to reduced gas exchange [6,7]. Patil et al. [8] demonstrate that pollution leads to significant changes in the biochemical parameters of leaves, including relative water content, extract pH, and ascorbic acid content, which are among the main components of the air pollution tolerance index (APTI). These changes inevitably affect the optical properties of leaves, as pigments and leaf tissue structure determine how light is transmitted, absorbed, reflected, and scattered.
Spectral methods for assessing plant stress have been developed intensively in recent decades, offering a non-invasive, sensitive, and rapid approach to detecting physiological changes. Spectral analysis allows the recording of changes in reflectance at different parts of the spectrum, which are related to chlorophyll concentration, the state of the photosynthetic apparatus, water content, and the structural characteristics of the leaves [9,10]. For example, a decrease in reflectance in the green range and changes in the red edge are often associated with chlorophyll degradation and pollution-induced stress [11,12]. Molnár et al. [13] demonstrate that the spectral characteristics of leaves can be used as a reliable indicator of biochemical changes associated with the impact of pollutants on urban trees. This makes spectral methods particularly suitable for environmental monitoring, including through remote techniques such as hyperspectral photography, satellite observations, and drones [14,15].
Despite significant advances in spectral biomonitoring, there is a lack of studies that examine the adaxial (upper) and abaxial (lower) sides of leaves separately. These two surfaces have different anatomical structures, different degrees of exposure to pollutants, and different physiological functions. The adaxial side is usually thicker, with a more developed cuticle, while the abaxial side contains more stomata and is more sensitive to gaseous pollutants [16,17]. However, most studies treat leaves as a homogeneous structure, which limits the identification of specific mechanisms of pollution impact. Furthermore, comparative analyses between different tree species are also limited, although species-specific differences can help in determining the most appropriate bioindicators [18,19].
The color characteristics of leaves are linked to their physiological state, and visual alterations often represent the earliest symptoms of pollution-induced stress [20]. Changes in green intensity and the appearance of yellow, brown, or red hues reflect shifts in chlorophyll levels, pigment degradation, and the accumulation of stress-related secondary metabolites [21]. Air pollutants can accelerate chlorophyll loss, disrupt carotenoid and anthocyanin synthesis, and induce localized necrotic lesions, producing measurable modifications in leaf coloration [22,23]. Although traditionally evaluated subjectively, these alterations can be quantified through digital imaging and analysis in RGB, HSV, and CIELAB color spaces, enabling precise detection of stress responses and supporting the integration of color indicators into biomonitoring systems [24].
In the available literature on spectral analysis of plant leaves, the majority of studies focus on the use of the NIR and VNIR ranges. These spectral regions are traditionally considered more informative for the assessment of physiological and biochemical parameters, which is why the VIS range often remains in the background or is considered more limited. A significant portion of the available data is generated by specialized devices with a relatively high cost, which naturally narrows the scope of applications and limits the possibilities for wider practical implementation of spectral analysis methods.
Although both color and spectral characteristics have been well studied individually as indicators of physiological changes, relatively few studies have examined their combined use. The lack of sufficient research in this area limits the possibility of developing comprehensive stress assessment models that combine visual and spectral markers. This highlights the need for research that analyzes both types of characteristics simultaneously, especially in the context of affordable sensor technologies that allow simultaneous recording of color and spectral data.
Using an extended vector of combined features enables machine learning models to resolve complex overlaps in the data and achieve more reliable classification of areas with different pollution levels than is possible with any single indicator. This illustrates the value of developing stress-detection approaches that integrate multiple optical and spectral characteristics into a unified analytical framework.
This study introduces an integrated approach that combines color-based (RGB, Lab) and spectral (VIS, NIR) characteristics obtained from accessible imaging devices and spectrophotometric measurements. The combination of visual and spectral information enables the detection of both morphological changes expressed through color indices and subtle physiological or biochemical alterations captured across the visible and near-infrared ranges. The use of affordable equipment makes these methods feasible in real-world conditions, including environments where specialized high-cost instruments are impractical, thereby expanding the potential for large-scale environmental biomonitoring and broader participation from research and applied communities.
The aim is to evaluate how combined color (RGB, Lab) and spectral (VIS, NIR) indices derived from mulberry (Morus) and linden (Tilia) leaf biomass support the classification performance of three machine learning algorithms in distinguishing areas with different levels of atmospheric pollution. Analyses are performed separately for the adaxial and abaxial leaf surfaces. By constructing an informative vector of optical indices, the study identifies specific markers associated with pollution-induced physiological and biochemical changes. The findings contribute to the development of accessible, sensitive, and widely applicable methods for detecting ecological stress in urban, industrial, and natural ecosystems.
The main contributions of this work are: (i) separate assessments of adaxial and abaxial leaf surfaces, (ii) integration of combined optical features from RGB/Lab and VIS/NIR domains, and (iii) comparative evaluation of multiple classifiers across pollution gradients.

2. Materials and Methods

The experimental measurements were carried out in four consecutive stages (E1, E2, E3, and E4) throughout the growing season in order to evaluate the robustness of the classification models under different levels of accumulated ecological stress and changing physiological conditions of mulberry and linden leaves. Early stages (E1–E2) captured the initial responses of the leaves to atmospheric pollution, while the later stages (E3–E4) reflected more pronounced physiological and biochemical changes resulting from cumulative exposure to particulate matter and gaseous pollutants. Leaf sampling was conducted across 16 sites evenly distributed among the four pollution classes (C1–C4), with four sites assigned to each class. At each site, leaves were collected from four mature Morus and four Tilia trees representing typical canopy conditions. From every tree, eight fully developed, undamaged leaves were sampled during each stage, yielding 32 leaves per species per site per stage. The same trees were revisited in all four stages to ensure temporal consistency and minimize inter-individual variability. Sampling took place during four phenological periods of the vegetation season (E1 in late May, E2 in late June, E3 in late July, and E4 in early September), providing balanced spatial and temporal replication for assessing pollution-related optical changes.
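The sampling design described above implies the following sample sizes per species. This is a minimal arithmetic sketch; the derived totals assume that every planned leaf was actually collected and measured, which the text does not state explicitly, and the variable names are chosen here for illustration.

```python
# Sampling design constants as stated in the text.
SITES = 16
CLASSES = 4
TREES_PER_SPECIES_PER_SITE = 4
LEAVES_PER_TREE = 8
STAGES = 4

# Leaves per species at one site in one stage (the text gives 32).
leaves_per_site_stage = TREES_PER_SPECIES_PER_SITE * LEAVES_PER_TREE

# Derived totals per species (assumption: no leaves were discarded).
leaves_per_stage = SITES * leaves_per_site_stage
leaves_per_class_stage = leaves_per_stage // CLASSES
leaves_total = leaves_per_stage * STAGES

print(leaves_per_site_stage, leaves_per_stage, leaves_per_class_stage, leaves_total)
```

Under these assumptions, each pollution class contributes the same number of leaves in every stage, so the classification problem is balanced.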
The introduction of time intervals allows us to check whether the combined color and spectral indices retain their informativeness and resolution throughout the entire life cycle of the leaves, which is crucial for the reliability of biomonitoring in real conditions.
The characteristics of the environment were determined using a device developed at FTT-Yambol, Bulgaria [25]. The measuring device was placed on site in the regions where leaf biomass was collected, and the concentrations of fine dust particles and gaseous pollutants were recorded systematically. The measurements were taken in a quadrant in the southeastern part of Bulgaria with a total area of 340 km², bounded by the following WGS84 coordinates: 42.50239167 N 26.48405833 E; 42.49651389 N 26.93885556 E; 42.41335278 N 26.93737778 E; and 42.42006944 N 26.48560833 E.
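The stated area can be checked against the four corner coordinates. The sketch below projects the corners onto a local plane with an equirectangular approximation and applies the shoelace formula. The spherical Earth radius and the projection are assumptions made here for illustration, not a description of how the authors computed the area.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation (assumption)

# Corner coordinates (latitude, longitude) in degrees, in the order given in the text.
CORNERS = [
    (42.50239167, 26.48405833),
    (42.49651389, 26.93885556),
    (42.41335278, 26.93737778),
    (42.42006944, 26.48560833),
]

def polygon_area_km2(corners):
    """Approximate polygon area via equirectangular projection and the shoelace formula."""
    lat0 = math.radians(sum(lat for lat, _ in corners) / len(corners))
    pts = [
        (EARTH_RADIUS_KM * math.radians(lon) * math.cos(lat0),
         EARTH_RADIUS_KM * math.radians(lat))
        for lat, lon in corners
    ]
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

area = polygon_area_km2(CORNERS)
print(f"approximate area: {area:.0f} km^2")
```

For a region of this size the planar approximation introduces only a small error, and the result is close to the reported 340 km².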
Table 1 contains data on the threshold values for pollutants. They are defined according to the pollution levels specified by Lark [26], Bright et al. [27], and Latif et al. [28]. The CO2 value of 410 ppm represents the typical background atmospheric concentration during the study period and is not used as a pollution threshold in the regulatory sense. The cited references do not classify CO2 as a harmful pollutant; instead, it is included here as a contextual environmental parameter that helps distinguish areas with elevated anthropogenic activity when combined with the other measured pollutants. Its role in the classification scheme is therefore supportive rather than diagnostic.
Table 2 shows the average pollutant values in the four classes. The classes contrast clearly: classes 1 and 2 are the most polluted, especially in terms of fine particulate matter (PM2.5, PM10) and TVOC, indicative of emissions from heavy industry and vehicles. Class 1 has the highest NOx and SOx values, indicative of its proximity to intensive or industrial areas, while class 2 has the highest TVOC concentration. Class 4, by contrast, shows the lowest values of all pollutants, reflecting low-pollution areas such as rural or suburban areas. Class 3 occupies an intermediate position, with relatively moderate levels of eCO2 and TVOC, which is characteristic of suburban areas with average levels of harmful emissions.
To assign each measurement zone to one of the four pollution classes (C1–C4), all pollutants listed in Table 1 were compared against their respective threshold values. For each pollutant, values exceeding the threshold were coded as “high,” and values below the threshold as “low.” Particulate matter (PM2.5 and PM10) was treated as the primary determinant of pollution intensity because of its strong spatial variability and well-documented physiological impact on leaf surfaces. The remaining pollutants (TVOC, eCO2, NOx, SOx, CO) were combined into an overall pollution-level score (PL), calculated as the number of pollutants exceeding their thresholds. Zones with at least three pollutants above the threshold were classified as high-pollution-level (H-PL), while those with fewer than three were classified as low-pollution-level (L-PL). The final class labels were assigned based on the combination of PM status (H-PM or L-PM) and PL status (H-PL or L-PL), resulting in four categories: C1 (H-PM, H-PL), C2 (H-PM, L-PL), C3 (L-PM, H-PL), and C4 (L-PM, L-PL). This approach provides a transparent and reproducible method for integrating multipollutant exposure into a categorical classification scheme.
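The labeling rule above can be sketched as a small function. The text does not say how the PM2.5 and PM10 comparisons are merged into a single PM status, so the sketch takes that status as an input rather than guessing. The function name and the dictionary layout are illustrative choices, not the authors' implementation.

```python
# Pollutants that contribute to the pollution-level score (PL), as listed in the text.
GAS_POLLUTANTS = ("TVOC", "eCO2", "NOx", "SOx", "CO")

def assign_class(pm_high, gas_exceeds):
    """Return the zone label C1..C4.

    pm_high     -- True if the zone is H-PM (how PM2.5 and PM10 are combined
                   is not specified in the text, so it is supplied by the caller)
    gas_exceeds -- dict mapping each pollutant name to True if its measured
                   value exceeds the threshold from Table 1
    """
    pl_score = sum(bool(gas_exceeds[name]) for name in GAS_POLLUTANTS)
    pl_high = pl_score >= 3  # at least three pollutants above threshold -> H-PL
    if pm_high and pl_high:
        return "C1"  # H-PM, H-PL
    if pm_high:
        return "C2"  # H-PM, L-PL
    if pl_high:
        return "C3"  # L-PM, H-PL
    return "C4"      # L-PM, L-PL

# Hypothetical zone: high particulate matter, two gaseous pollutants above threshold.
example = assign_class(True, {"TVOC": True, "eCO2": False, "NOx": True, "SOx": False, "CO": False})
print(example)
```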
The four classes of areas with different degrees of pollution, covering the different levels of fine dust particles and other measured pollutants, are presented in Table 3.
Color digital images of mulberry and linden leaves were obtained. The photographed sample was placed in a dome-shaped lighting system on a white background in order to prevent the influence of ambient light and to obtain homogeneous lighting. The shooting distance from the video sensor to the object was 25 cm.
The illumination was provided by a cold-white LED strip with a correlated color temperature of 6400 K. This type of phosphor-converted LED emits a broad continuous spectrum across the visible range, produced by a blue primary emission peak near 450 nm combined with a wide yellow–green phosphor band, resulting in stable white illumination suitable for colorimetric analysis.
The color setting of the video sensor was made using a color scale with 24 color fields, Color Check Chart BST11 (Danes-Picta, Prague, Czech Republic).
Spectral characteristics in the VIS spectral range were obtained from the full spectrum of the images. The conversion functions are for the 2° observer (Stiles and Burch 2°, RGB (1955)) and D65 illumination (average daylight with UV component, 6500 K), according to the mathematical dependencies presented by Wyman et al. [29].
Spectrophotometric measurements were made in the NIR spectral range. For each leaf, spectral characteristics were obtained from the adaxial and abaxial sides of the leaf blade. The spectral characteristics were obtained at a distance of 5 mm from the measuring probe to the object. An NIRQuest512 measuring device (Ocean Optics Inc., Orlando, FL, USA) was used.
For the purposes of the study, several groups of optical indicators were calculated and analyzed. Color indices were defined based on the components of RGB [30,31,32] and the Lab model [33,34]. The spectral characteristics of the leaves were assessed by calculating VIS indices using specific wavelengths in the visible range [35,36], as well as by NIR spectral indices [37,38], in which the average (central) wavelengths for each index were applied. A detailed description of the mathematical formulas used, as well as the calculated values for all RGB, Lab, VIS, and NIR indices, are presented in Supplementary Materials (Equations (S1)–(S52), and Tables S1–S8).
Table 4 shows the features used and their numbers. Several RGB- and VIS-based indices share identical names because they originate from different formula families but use the same abbreviation.
The ReliefF method was applied for the selection of informative features [39,40,41]. The algorithm evaluates the significance of each feature according to its ability to distinguish between observations that are close in value but belong to different classes. Since the study addresses a classification problem with a categorical dependent variable (the four pollution zones), ReliefF was used, which is designed for this type of data, unlike its regression variant RReliefF. The method assigns a weight to each variable, with higher values indicating a greater contribution to the separability of the data. Features with weight coefficients above 0.6 are considered informative [42,43], and the selected features are assembled into a feature vector. This makes ReliefF an effective tool for preliminary selection of informative variables, and in the present study it was used to form compact feature vectors for the adaxial and abaxial sides of the leaves. A feature selected in only one of the four stages was retained because the goal of the analysis was to capture both stable and stage-specific stress responses. Some indices (e.g., pigment-related VIS ratios) become informative only under stronger cumulative stress in late stages, while others are more sensitive during early physiological transitions. Excluding such features would remove biologically meaningful information about temporal dynamics of pollution effects.
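To make the weighting idea concrete, the following is a simplified ReliefF sketch in NumPy. It is not the implementation used in the study: the number of neighbors, the Manhattan distance on min-max scaled features, and the use of every observation as a reference point are assumptions made here. The text also does not say whether the weights were rescaled before applying the 0.6 threshold, so the sketch returns raw weights only.

```python
import numpy as np

def relieff_weights(X, y, n_neighbors=5):
    """Simplified ReliefF: reward features that differ on nearest misses
    and penalize features that differ on nearest hits."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span           # scale each feature to [0, 1]
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    weights = np.zeros(p)
    for i in range(n):
        diff = np.abs(Xs - Xs[i])             # per-feature differences to sample i
        dist = diff.sum(axis=1)               # Manhattan distance
        for c in classes:
            idx = np.flatnonzero(y == c)
            idx = idx[idx != i]
            k = min(n_neighbors, idx.size)
            if k == 0:
                continue
            nearest = idx[np.argsort(dist[idx])[:k]]
            contrib = diff[nearest].mean(axis=0)
            if c == y[i]:
                weights -= contrib            # nearest hits
            else:                             # nearest misses, weighted by class prior
                weights += prior[c] / (1.0 - prior[y[i]]) * contrib
    return weights / n
```

A feature whose values separate the classes receives a clearly positive weight, while a feature that behaves like noise stays near zero.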
The study also used and compared established classification methods in biomonitoring, ecological spectroscopy, and chemometric analysis, including the Naive Bayes classifier, discriminant analysis, and SVM models. These approaches allowed the assessment of the data structure, the degree of separability between classes, and the effectiveness of different separating functions in the classification of mulberry and linden samples, and hence the spatial differences in air quality.
The Naive Bayes classifier [44,45,46] was used as a reference method to assess the primary separability of classes and the effectiveness of data reduction procedures. The classifier forms characteristic circular or elliptical (concentric) class regions. These stem from the assumption about the shape of the probability distributions: when the features are assumed to be normally distributed and independent, the contours of equal probability take on a circular or elliptical shape. The model is called “naive” because it assumes that all features are conditionally independent of each other within a class. This greatly simplifies the calculations and allows for fast and stable classification, but the assumption is rarely met exactly, since in real data the features are often correlated.
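For reference, the Gaussian form of the Naive Bayes decision rule can be written as follows. The notation is introduced here and is not taken from the article: x_j is the j-th feature, P(c) is the prior of class c, and μ_cj and σ_cj are the class-wise mean and standard deviation of feature j.

```latex
\hat{c}(\mathbf{x}) = \arg\max_{c} \; P(c) \prod_{j=1}^{p}
  \frac{1}{\sqrt{2\pi\sigma_{cj}^{2}}}
  \exp\!\left( -\frac{(x_j - \mu_{cj})^{2}}{2\sigma_{cj}^{2}} \right),
\qquad
\sum_{j=1}^{p} \frac{(x_j - \mu_{cj})^{2}}{\sigma_{cj}^{2}} = \text{const.}
```

The second expression describes the contours of equal class-conditional density. They are ellipses aligned with the feature axes and become circles when all σ_cj within a class are equal, which is the geometric origin of the concentric regions mentioned above.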
The classification task was also performed with a discriminant classifier with three types of separating functions [47,48]. In the linear model, the boundary between classes is a straight line, reflecting the assumption that the covariance matrices of the classes are identical. In the quadratic model, the separating boundary is curved, as a separate covariance structure is assumed for each class, allowing for more flexible adaptation to the data shape. The diagonal–quadratic model uses a simplified covariance matrix without correlations between features, resulting in an intermediate type of boundary that is more flexible than the linear one but more limited than the full quadratic one.
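The three separating functions differ only in how the class covariance is estimated. The sketch below illustrates this with Gaussian discriminant scores in NumPy. The labels "linear", "quadratic", and "diagquadratic" and both function names are chosen here for illustration, and the code is not the software used in the study.

```python
import numpy as np

def fit_discriminant(X, y, kind="quadratic"):
    """Estimate class means, priors, and covariances for one of three
    covariance structures: pooled, per class, or per class and diagonal."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    means, covs, priors = {}, {}, {}
    pooled = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        means[c] = Xc.mean(axis=0)
        priors[c] = len(Xc) / n
        covs[c] = np.atleast_2d(np.cov(Xc, rowvar=False))
        pooled += (len(Xc) - 1) * covs[c]
    pooled /= n - len(classes)
    for c in classes:
        if kind == "linear":
            covs[c] = pooled                          # one shared covariance
        elif kind == "diagquadratic":
            covs[c] = np.diag(np.diag(covs[c]))       # drop feature correlations
        elif kind != "quadratic":
            raise ValueError(f"unknown kind: {kind}")
    return classes, means, covs, priors

def predict_discriminant(model, X):
    """Assign each row of X to the class with the highest Gaussian log-score."""
    classes, means, covs, priors = model
    X = np.asarray(X, dtype=float)
    scores = []
    for c in classes:
        d = X - means[c]
        maha = np.einsum("ij,jk,ik->i", d, np.linalg.inv(covs[c]), d)
        _, logdet = np.linalg.slogdet(covs[c])
        scores.append(np.log(priors[c]) - 0.5 * (maha + logdet))
    return classes[np.argmax(np.array(scores), axis=0)]
```

With a shared covariance the quadratic terms cancel between classes and the boundary is linear; with class-specific covariances they remain and the boundary is curved.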
The third classification approach involves the application of the SVM method, implemented with three separating functions to assess the distinguishability between classes [45,46,47,49]. In the linear model, the separating boundary is a straight line (hyperplane), which is based on the assumption of linear separability of the data in the feature space. The polynomial kernel forms more complex, curved boundaries, which allows the classifier to identify nonlinear dependencies between individual classes. The highest degree of flexibility is offered by the radial basis function (RBF), which creates localized separation surfaces around the data. The use of the RBF kernel is aimed at accurately modeling complex and unpredictable distributions characteristic of the multidimensional spectral and color characteristics of leaf biomass under conditions of ecological stress. The selection of these three functions allows us to compare the ability of SVM to adapt to the different geometric structure of the mulberry and linden data.
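The three SVM separating functions correspond to three kernel functions, which can be sketched as follows. The parameter names and default values (`degree`, `coef0`, `gamma`) are illustrative assumptions; the text does not report the hyperparameters used in the study.

```python
import numpy as np

def linear_kernel(A, B):
    """Inner product: yields a hyperplane in the original feature space."""
    return A @ B.T

def polynomial_kernel(A, B, degree=3, coef0=1.0, gamma=1.0):
    """Polynomial kernel: yields curved boundaries of the given degree."""
    return (gamma * (A @ B.T) + coef0) ** degree

def rbf_kernel(A, B, gamma=1.0):
    """Radial basis function kernel: similarity decays with squared distance,
    which produces localized separation surfaces around the data."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))
```

Each function returns the matrix of pairwise kernel values between the rows of `A` and the rows of `B`. Larger `gamma` values in the RBF kernel make the boundary follow individual samples more closely.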
Table 5 shows the class label categories when evaluating the accuracy of classification into two classes. The four main label categories obtained from the comparison between the classes predicted by the classifier and the actual labels are shown [50,51]: true positive (TP), false positive (FP), false negative (FN), and true negative (TN). These categories serve as the basis for calculating key metrics for assessing classification accuracy.
Table 6 presents the evaluations used for the classifiers. The indicators describe the main quantitative criteria for evaluating the quality of two-class classification models [52,53]:
Precision: the proportion of correctly predicted positive cases relative to all cases predicted as positive. It shows the reliability of positive predictions.
Sensitivity (Recall, True Positive Rate): the proportion of actual positive cases that the model correctly recognizes. It assesses the ability of the model to detect positive examples.
Specificity (True Negative Rate): the proportion of truly negative cases that the model classifies correctly. It reflects the ability of the model to avoid false positive predictions.
Accuracy: the proportion of all correctly classified cases (positive and negative) relative to the total number of observations. It represents an overall assessment of the quality of the model.
Overall Error: the proportion of incorrectly classified cases relative to the total number. It reflects the overall frequency of errors in the classification.
For the model to work well, all metrics should have high values close to 100%, while the overall error should be minimal and tend towards 0%.
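These definitions translate directly into code. The sketch below computes the five indicators, in percent, from the four counts of Table 5. The function name and the example counts are hypothetical, and the function assumes that no denominator is zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return the two-class evaluation metrics in percent.

    Assumes tp + fp, tp + fn, and tn + fp are all non-zero.
    """
    total = tp + fp + fn + tn
    return {
        "precision": 100.0 * tp / (tp + fp),
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        "accuracy": 100.0 * (tp + tn) / total,
        "overall_error": 100.0 * (fp + fn) / total,
    }

# Hypothetical confusion counts for one pair of classes.
metrics = binary_metrics(tp=45, fp=5, fn=3, tn=47)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Accuracy and overall error always sum to 100%, so reporting both is a convenience rather than independent evidence.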

3. Results and Discussion

3.1. Selection of Informative Features for Classes of Areas, Depending on the Degree of Atmospheric Air Pollution

A total of 57 indicators were used to ensure sufficiently accurate classification in the study. They cover the intensity of the main color components and their derivatives, various spectral ratios, and derived spectral and color indices (e.g., NDVI, ExG, VARI, etc.). These indicators reflect changes in vegetation cover, such as density and coloration, the physiological condition of the samples, including water content and pigment concentration, photosynthetic activity, and the characteristic visual–spectral features of leaf tissue.
Figures S1–S4 in Supplementary Materials present the results of the selection of informative features for the adaxial (FVG) and abaxial (FVD) sides of mulberry and linden leaves. The features were selected using the ReliefF method. Features with weight coefficients above 0.6 in at least one of the measurement stages are defined as informative.
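The "at least one stage" rule amounts to a union over stages. The following sketch applies it to a small table of made-up weights; the function name and the demonstration values are hypothetical and only illustrate the rule, with 1-based feature codes as used in the feature vectors.

```python
import numpy as np

def informative_features(stage_weights, threshold=0.6):
    """Return 1-based codes of features whose weight exceeds the threshold
    in at least one stage. Rows are stages, columns are features."""
    w = np.asarray(stage_weights, dtype=float)
    keep = (w > threshold).any(axis=0)
    return (np.flatnonzero(keep) + 1).tolist()

# Hypothetical weights: 4 stages (E1..E4) x 6 features.
demo_weights = [
    [0.10, 0.70, 0.20, 0.61, 0.05, 0.30],
    [0.20, 0.65, 0.10, 0.40, 0.10, 0.59],
    [0.05, 0.80, 0.30, 0.20, 0.62, 0.60],
    [0.15, 0.75, 0.25, 0.10, 0.30, 0.55],
]
selected = informative_features(demo_weights)
print(selected)
```

In this made-up table, a feature that crosses the threshold in a single stage is retained, while a feature that only reaches exactly 0.6 is not, because the rule requires a weight above the threshold.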
The following feature vectors were formed for mulberry leaves:
FVG = [3 6 9 10 19 20 21 30 33 35 37 38 39 40 41 42 44 48 50 55]
FVD = [3 5 24 25 32 33 42 44 46 50 52 55 56 57]
The following feature vectors were formed for linden leaves:
FVG = [2 3 6 8 9 10 11 21 29 31 33 35 36 37 38 39 40 41 42 44 46 50 52 55 57]
FVD = [2 3 5 7 8 9 10 11 12 13 17 21 24 28 31 43 44 45 46 47 49 50 51 52 53 54 55 56 57]
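The four vectors above can be compared directly. The sketch below counts the selected features per species and leaf side and lists the codes shared by both sides; the dictionary layout and function name are illustrative, while the codes are copied from the vectors listed above.

```python
# Feature codes copied from the vectors listed above (FVG = adaxial, FVD = abaxial).
FEATURE_VECTORS = {
    ("mulberry", "adaxial"): [3, 6, 9, 10, 19, 20, 21, 30, 33, 35, 37, 38, 39, 40,
                              41, 42, 44, 48, 50, 55],
    ("mulberry", "abaxial"): [3, 5, 24, 25, 32, 33, 42, 44, 46, 50, 52, 55, 56, 57],
    ("linden", "adaxial"): [2, 3, 6, 8, 9, 10, 11, 21, 29, 31, 33, 35, 36, 37, 38,
                            39, 40, 41, 42, 44, 46, 50, 52, 55, 57],
    ("linden", "abaxial"): [2, 3, 5, 7, 8, 9, 10, 11, 12, 13, 17, 21, 24, 28, 31,
                            43, 44, 45, 46, 47, 49, 50, 51, 52, 53, 54, 55, 56, 57],
}

def shared_codes(species):
    """Feature codes selected for both leaf sides of one species."""
    adaxial = set(FEATURE_VECTORS[(species, "adaxial")])
    abaxial = set(FEATURE_VECTORS[(species, "abaxial")])
    return sorted(adaxial & abaxial)

for (species, side), codes in FEATURE_VECTORS.items():
    print(f"{species:8s} {side:8s} n = {len(codes)}")
for species in ("mulberry", "linden"):
    print(species, "shared:", shared_codes(species))
```

The counts and overlaps show how much of each vector is specific to one leaf side, which is the basis for comparing the informativeness of the two surfaces.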
Table 7 provides a mapping between the numeric feature codes selected by ReliefF and their corresponding full index names for both species and both leaf surfaces. It summarizes which RGB, Lab, VIS, and NIR indices were identified as informative for mulberry and linden leaves, separately for the adaxial and abaxial sides.
The feature vectors for both mulberry and linden show that the spectral indices are more informative for the adaxial side of the leaves in terms of the classification task. This suggests that the adaxial surface carries more information about the pigment content, water status, and structural features of the tissues, which are more reliably reflected by spectral ratios. For the abaxial side, color indices prove to be more informative, which indicates that it provides more information about the visual characteristics of the leaf surface (hue, intensity, and spatial distribution of color), which are assessed more accurately by color models than by spectral characteristics. The differences in the number of selected indices indicate that the adaxial and abaxial sides of the leaves differ in terms of the informativeness of the spectral response, which reflects their different morphological and optical structure.

3.2. Classification with a Naive Bayes (NB) Classifier

The results presented in Table S9 (adaxial side) and Table S10 (abaxial side) of Supplementary Materials, as well as in Figure 1, show the difference between the two approaches used for data reduction, latent variables (LV) and principal components (PC), in the reference classification with the Naive Bayes classifier based on mulberry data. For the adaxial side of the leaves, the LV method demonstrates low to moderate effectiveness, with the average accuracy in all stages remaining around 45–55% and the overall error in the range of 40–55%. This shows significant overlap between classes and the limited ability of LV to extract informative features from the adaxial leaf surface. The PC method achieves higher accuracy values, reaching 80–95% in individual comparisons, which indicates better but still inconsistent classification performance compared to LV. For the abaxial side of the leaves, the difference between the two methods is even more pronounced. The PC method provides very high and stable accuracy in all stages (E1–E4), with accuracy reaching 95–100% in many cases and overall error falling below 5%. This shows that the abaxial side provides more informative and stable characteristics, which PC extracts effectively. Conversely, LV again shows moderate effectiveness, with accuracy values around 45–55% and high errors, confirming its limited applicability. The results clearly indicate that PC is the more suitable data reduction method for both the adaxial and abaxial sides of the leaves, with its advantage being particularly pronounced on the abaxial surface. The Naive Bayes classifier, used as a reference method, is sensitive to the choice of data reduction method.
The results presented in Table S11 (adaxial side) and Table S12 (abaxial side) of Supplementary Materials, as well as in Figure 2, show the advantage of the principal components (PC) method over latent variables (LV) in the classification of data from linden leaves with the Naive Bayes classifier. For the adaxial side of the leaves, the PC method demonstrates very high and stable classification efficiency in all stages (E1–E4). In many pairs of classes, the accuracy exceeds 95% and the overall error falls below 5%, including cases of almost perfect classification. This shows that PC manages to extract the key spectral and color characteristics that best distinguish the classes. Conversely, LV shows moderate efficiency, with accuracy values around 45–55% and significantly higher errors, which limits its ability to form reliable boundaries between classes. For the abaxial side of the leaves, the difference between the two methods is even more pronounced. The PC method achieves extremely high accuracy at all stages, in many cases between 98% and 100%, with overall error practically zero. This shows that the abaxial side of linden leaves provides particularly informative characteristics, which PC extracts effectively. LV again demonstrates low to moderate effectiveness, with accuracy values around 47–60% and significantly higher errors, which highlights its limited applicability for this type of data. PC is therefore a suitable data reduction method for classifying linden leaves, providing a more stable, accurate, and reliable classification than LV. The high errors in LV, especially on the adaxial side, are an indicator of strong overlap between classes, which is characteristic of the elliptical membership areas formed by the Naive Bayes classifier. This further emphasizes the need to use two principal components as the optimal approach for subsequent classification analyses.
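Reduction to two principal components, as referred to above, can be sketched with a singular value decomposition. The function name is hypothetical, and the sketch only centers the features; whether the authors also standardized them before the decomposition is not stated in the text.

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Project mean-centered data onto its leading principal components.

    Returns the component scores and the fraction of total variance
    explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = (s ** 2) / (s ** 2).sum()
    return scores, explained[:n_components]
```

The resulting two-column score matrix is what a classifier would receive in place of the selected feature vector. The explained-variance fractions indicate how much information the reduction retains.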

3.3. Results from Classification with Discriminant Analysis

The results presented in Tables S13–S16 (E1–E4) in Supplementary Materials show a clear and consistent distinction between the effectiveness of the three separating functions in discriminant analysis (DA), namely linear (L), quadratic (Q), and diagonal–quadratic (DQ), in the classification of data from the adaxial and abaxial sides of the mulberry leaf, reduced by principal components (PC). In all four stages, the quadratic models (Q and DQ) hold a stable and clear advantage. They achieve high accuracy for the adaxial side of the leaves (usually 75–96%) and very high accuracy for the abaxial side (in most cases 97–100%), with minimal or practically zero errors. This shows that nonlinear separating functions describe the data structure much better and allow reliable class differentiation in all measurement stages. Linear DA, on the other hand, demonstrates low to moderate effectiveness in all stages. The accuracy varies approximately between 22 and 57% for the adaxial side and between 24 and 68% for the abaxial side, and the overall error remains high. This shows that linear boundaries are not suitable for the complex structure of the spectral and color characteristics of the leaves and cannot capture the nonlinear dependencies that dominate the data. Comparison with the results of the Naive Bayes classifier (NB) applied to the same PC-reduced data shows that the quadratic DA models are fully comparable to NB and in some cases even outperform it, especially on the abaxial side of the leaves, where the errors are practically zero at all stages. Linear DA remains significantly weaker and cannot compete with either NB or the quadratic DA models. The results from the four stages clearly show that for mulberry data, the most suitable methods are those that can model complex, nonlinear separation boundaries: the quadratic forms of discriminant analysis (Q and DQ) and the Naive Bayes classifier. Linear DA does not provide reliable classification in any of the stages, which emphasizes the need to use nonlinear models when analyzing the spectral and color characteristics of the leaves.
The results presented in Tables S17–S20 (E1–E4) in Supplementary Materials show that the three separating functions of discriminant analysis, linear (L), quadratic (Q), and diagonal–quadratic (DQ), differ markedly in effectiveness when classifying data from the adaxial and abaxial sides of linden leaves, reduced by principal components (PC). The linear model (L) is systematically unable to distinguish the classes reliably according to their level of atmospheric pollution. The overall accuracy for the adaxial surface varies within the unsatisfactory range of 33–61%, while for the abaxial side it lies between 26 and 70%, accompanied by high overall error. These indicators suggest that linear separating hyperplanes do not correspond to the actual distribution of optical indices in this tree species.
Unlike linear approximation, nonlinear modifications (Q and DQ) adapt precisely to the complex variability of the data. Throughout the entire vegetation cycle, the quadratic models maintain extremely high diagnostic reliability: the accuracy on the adaxial side of the leaves varies between 89 and 100%, and on the abaxial surface it reaches practically maximum values (93–100%). Minimizing the error to levels of 0–3% confirms that nonlinear boundaries describe much more adequately the specificity of the spectral and color response of linden under conditions of ecological stress.
When compared to the reference Naive Bayes classifier (NB), the quadratic variants of discriminant analysis are equivalent in performance, and in the analysis of the abaxial leaf surface they often demonstrate even higher precision. The results obtained lead to the conclusion that, for the purposes of biomonitoring in linden trees, the use of linear methods is methodologically unjustified due to the strong overlap of characteristics in a real urban environment. Optimal resolution is achieved only through nonlinear modeling (Q, DQ, or NB), which successfully maps the multidimensional dependencies in the combined characteristics of leaf biomass.
Figure 3 presents the results of the discriminant analysis (DA) for mulberry and linden leaves, comparing the accuracy and overall error for the upper (adaxial) and lower (abaxial) sides of the leaves in the four measurement stages (E1–E4). A clear trend is observed for both species: the abaxial side of the leaves consistently shows higher accuracy and lower error, regardless of the stage. The differences are particularly pronounced in linden, where the classification accuracy for the abaxial surface reaches the highest values in all stages. These results confirm that the abaxial side of the leaves provides more informative spectral and color characteristics for classification purposes, while the adaxial side demonstrates more moderate class separability.
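To make the difference between the three separating functions concrete, the sketch below fits Gaussian discriminants to two-dimensional points (standing in for two principal-component scores) with a pooled covariance (L), a full per-class covariance (Q), and a per-class diagonal covariance (DQ). The data are synthetic, equal priors and equal class sizes are assumed, and the function names are hypothetical; this is not the implementation used in the study.

```python
import math
import random

def mean2(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def cov2(pts, mu):
    """Covariance of 2-D points as (sxx, sxy, syy)."""
    n = len(pts)
    sxx = sum((p[0] - mu[0]) ** 2 for p in pts) / n
    syy = sum((p[1] - mu[1]) ** 2 for p in pts) / n
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in pts) / n
    return (sxx, sxy, syy)

def log_gauss(p, mu, cov):
    """Log of a 2-D Gaussian density, up to an additive constant."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mu[0], p[1] - mu[1]
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (maha + math.log(det))

def fit(classes, kind):
    """classes: dict label -> list of (x, y); kind: 'L', 'Q' or 'DQ'."""
    params = {}
    for label, pts in classes.items():
        mu = mean2(pts)
        params[label] = (mu, cov2(pts, mu))
    if kind == "L":
        # One pooled covariance for all classes gives a linear boundary.
        k = len(params)
        pooled = tuple(sum(c[i] for _, c in params.values()) / k for i in range(3))
        params = {lab: (mu, pooled) for lab, (mu, _) in params.items()}
    elif kind == "DQ":
        # Per-class covariance with the off-diagonal term dropped.
        params = {lab: (mu, (c[0], 0.0, c[2])) for lab, (mu, c) in params.items()}
    return params

def predict(params, p):
    return max(params, key=lambda lab: log_gauss(p, *params[lab]))

def accuracy(params, classes):
    total = sum(len(pts) for pts in classes.values())
    hits = sum(predict(params, p) == lab for lab, pts in classes.items() for p in pts)
    return hits / total

# Synthetic classes with the same center but different spread:
# a straight line cannot separate them, a quadratic boundary can.
rng = random.Random(0)
data = {
    "tight": [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(200)],
    "wide": [(rng.gauss(0, 3.0), rng.gauss(0, 3.0)) for _ in range(200)],
}
acc = {kind: accuracy(fit(data, kind), data) for kind in ("L", "Q", "DQ")}
```

On data of this kind, `acc["Q"]` and `acc["DQ"]` come out well above `acc["L"]`, which mirrors the qualitative pattern reported for the leaf data without reproducing its values.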

3.4. Results from Classification with SVM

The results presented in Tables S21–S24 (E1–E4) in Supplementary Materials show a distinction between the effectiveness of the three kernels used in the support vector machine (SVM) method, namely linear, polynomial, and radial basis function (RBF), in the classification of data from the adaxial and abaxial sides of the mulberry leaf, reduced by principal components (PC). In all four stages, the linear and polynomial kernels demonstrate unstable and inconsistent effectiveness. Although high sensitivity (sometimes reaching 100%) is observed in individual class pairs, it is often combined with low specificity, resulting in moderate overall accuracy: approximately 32–77% for the adaxial side and 38–64% for the abaxial side. These results show that linear and polynomial kernels fail to capture the complex nonlinear dependencies in the spectral and color characteristics of the leaves and do not provide reliable class separation. In contrast, SVM with a radial basis function (RBF) kernel demonstrates extremely high and stable performance at all stages. For the adaxial side of the leaves, the accuracy reaches 98–100%, and for the abaxial side, 92–100%, with minimal or practically zero errors. This clearly shows that the RBF model is best suited for describing the nonlinear structures in the data and provides the most stable and reliable classification regardless of the measurement stage. A comparison with the results of the Naive Bayes classifier (NB) and discriminant analysis (DA) shows that SVM-RBF consistently outperforms both methods. While NB and the quadratic variants of DA also achieve high accuracy on PC-reduced data, SVM-RBF demonstrates even higher stability, lower errors, and greater robustness between different class pairs. Linear DA and linear SVM show similar limitations, while the nonlinear variants of both methods perform significantly better. The results of the four stages clearly show that SVM with a radial basis kernel is the most effective classifier for mulberry data, providing the highest accuracy, lowest errors, and greatest robustness to data variations. This makes it the most suitable method for reliable classification based on a combination of the spectral and color characteristics of the leaves.
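The interplay between sensitivity, specificity, and overall accuracy mentioned above can be checked with a few lines of arithmetic. The sketch below assumes the standard confusion-matrix definitions based on TP, TN, FP, and FN; the counts are hypothetical and chosen only to show how perfect sensitivity can coexist with a moderate accuracy.

```python
def binary_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and overall error as fractions."""
    total = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn),      # share of positives correctly found
        "specificity": tn / (tn + fp),      # share of negatives correctly rejected
        "accuracy": (tp + tn) / total,
        "overall_error": (fp + fn) / total,
    }

# Hypothetical class pair with 50 samples per class: every positive is found,
# but most negatives are also labeled positive.
m = binary_metrics(tp=50, tn=15, fp=35, fn=0)
```

With these counts the sensitivity is 50/50, the specificity is 15/50, and the accuracy is 65/100, so a classifier that labels almost everything as one class still reaches 100% sensitivity.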
The results presented in Tables S25–S28 (E1–E4) in Supplementary Materials show clear differences between the classification effectiveness of the three kernels used in the support vector machine (SVM) method, namely linear, polynomial, and radial basis function (RBF), regardless of whether data from the adaxial or abaxial side of the linden leaves were used.
The linear and polynomial kernels show limited ability to approximate the complex dependencies in the spectral–color features. They show unsatisfactory performance, characterized by an imbalance between sensitivity and specificity, resulting in low overall accuracy—between 30 and 67% for the adaxial side and 22 and 80% for the abaxial side of the leaves. These geometric limitations of the models confirm the presence of a nonlinear data structure that cannot be covered by standard planar or polynomial boundaries. In contrast, the SVM model with a radial basis function (RBF) demonstrates high adaptability and stability throughout the entire growing season. Its ability to form flexible, localized separation boundaries allows for extremely high accuracy: from 90% to 100% for the adaxial surface and absolute accuracy (100%) for the abaxial surface in all class pairs. The minimal error levels and high resistance of the RBF kernel to data variations establish it as the most reliable tool for identifying ecological stress in linden trees.
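For reference, the three kernels compared above are usually written in the following textbook forms. The symbols $\gamma$, $c$, and $d$ are generic hyperparameters introduced here for illustration; their values in the study are not stated in this section.

```latex
\begin{align*}
K_{\mathrm{lin}}(\mathbf{x}, \mathbf{x}') &= \mathbf{x}^{\top}\mathbf{x}' \\
K_{\mathrm{poly}}(\mathbf{x}, \mathbf{x}') &= \left(\gamma\,\mathbf{x}^{\top}\mathbf{x}' + c\right)^{d} \\
K_{\mathrm{RBF}}(\mathbf{x}, \mathbf{x}') &= \exp\!\left(-\gamma\,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right)
\end{align*}
```

The RBF kernel depends only on the distance between two samples and decays toward zero as they move apart, which is why it can form the localized separation boundaries described above, whereas the linear kernel can only produce a single separating hyperplane.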
Figure 4 presents the results of the classification using the support vector machine (SVM) method for mulberry and linden leaves, showing the accuracy and overall error for the upper (adaxial) and lower (abaxial) sides of the leaves, averaged for the four measurement stages (E1–E4). Both species show high classification efficiency, with accuracy close to 100% for all stages and errors that are minimal or practically zero. The differences between the stages are insignificant, which emphasizes the stability of the model. The abaxial side of the leaves shows relatively higher accuracy and lower error, but both surfaces demonstrate almost perfect separability. These results illustrate that the SVM model, especially with a radial basis kernel, provides the most reliable and stable classification among all the methods studied.

3.5. Discussion

A comparative analysis of the results obtained for the classification of areas with different degrees of atmospheric pollution, based on spectral and color indices from mulberry and linden leaves, is consistent with the trends described in the available literature [54,55]. The presented study uses a wide range of indices (NDVI, ExG, VARI, NDWI, SAVI, EVI, PSRI, TVI, MSR, NBR, etc.) that reflect the pigment content, water status, and structural features of the leaf tissue. This complements the established understanding that spectral indices are sensitive to the physiological state of vegetation and can serve as reliable indicators of stress and environmental pollution [37,56,57].
The results for mulberry and linden, where SVM with a radial basis function (RBF) kernel achieves very high classification accuracy, complement previous studies in which SVM has been established as one of the most effective methods for classifying plant objects based on spectral and image characteristics [58,59]. Studies dedicated to the assessment of the urban environment through spectral characteristics of mulberry leaves also use data reduction methods (principal components, latent variables) and SVM, showing that the combination of an appropriate data reduction method and a nonlinear classifier leads to high accuracy in distinguishing areas with different environmental quality [60,61]. Similar results are observed in other applications, for example in the classification of stress symptoms in plants under controlled conditions, where SVM (especially with an RBF kernel) demonstrates high sensitivity and specificity in distinguishing different types of stress based on color and spectral features [62,63].
The superiority of SVM-RBF over the Naive Bayes classifier and linear variants of discriminant analysis observed in the present study is consistent with the general trend in the literature, according to which nonlinear methods outperform linear ones in tasks where the relationship between features and classes is highly nonlinear and multidimensional [44]. Previous studies related to the assessment of urban environment quality using mulberry leaves have shown that the use of principal components and nonlinear classifiers (including SVM) leads to significantly higher accuracy compared to linear models and simpler statistical approaches [54,55]. The results obtained in this study complement those in the literature in that SVM-RBF and quadratic discriminant analysis (Q, DQ) systematically dominate linear variants and NB, especially in the later stages of leaf development (E3–E4), when the differences between zones are more pronounced.
The use of feature selection (ReliefF) and dimensionality reduction (principal components) is also in line with the best practices described in contemporary guidelines for working with spectral data from tree leaves. The literature [64,65,66] emphasizes that with a large number of highly correlated spectral variables, it is necessary to apply techniques such as PLS, PCA, or other reduction and selection methods to avoid overfitting and improve model stability. In the present study, this was achieved by combining ReliefF for feature selection and PCA for reduction to two principal components, which creates compact but informative feature vectors for subsequent classification. The high classification accuracy obtained with NB, DA, and SVM on PC-reduced data indicates that this approach is methodologically sound and effective.
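The idea behind Relief-type feature weighting can be sketched in a few lines. The example below implements the basic two-class Relief scheme (one nearest hit and one nearest miss per sampled instance), which is a simplification of the ReliefF variant named above; the function name, the synthetic data, and the scaling choice are assumptions made for illustration.

```python
import random

def relief_weights(X, y, n_iter=None, seed=0):
    """Basic Relief: reward features that differ on the nearest miss
    and agree on the nearest hit. Returns one weight per feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Scale differences by each feature's range so they are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    m = n_iter or n
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        hit = min((k for k in range(n) if k != i and y[k] == y[i]),
                  key=lambda k: dist(X[i], X[k]))
        miss = min((k for k in range(n) if y[k] != y[i]),
                   key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / m
    return w

# Synthetic data: feature 0 depends on the class, feature 1 is pure noise.
rng = random.Random(1)
X, y = [], []
for label, center in ((0, 0.0), (1, 5.0)):
    for _ in range(30):
        X.append([rng.gauss(center, 0.5), rng.uniform(0, 1)])
        y.append(label)

weights = relief_weights(X, y)
```

Features would then be ranked by weight and the top-ranked ones retained; ReliefF extends this scheme to several neighbors per class and to more than two classes.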
Of particular interest is the result that, in some configurations, the Naive Bayes classifier using principal components achieves high accuracy, comparable to that of DA and SVM, while the accuracy drops significantly when the data are reduced using latent variables (LV). This complements observations from other studies [45,46,49,67], in which the choice of data reduction method proves to be important for the quality of classification: an incorrectly selected or overly complex reduction method can worsen class separability, even when using relatively powerful classifiers. The results of the present study provide empirical confirmation that for tasks involving the classification of pollution zones based on leaf indices, PCA is a more suitable choice than LV, especially when combined with nonlinear classifiers such as SVM-RBF.
The results demonstrate that the proposed approach has strong practical applicability for real-world biomonitoring. By relying on inexpensive optical measurements of leaves, the method captures cumulative physiological responses to pollution rather than only instantaneous air-quality values. The high classification accuracy achieved across species, leaf surfaces, and phenological stages indicates that the combined RGB, Lab, VIS, and NIR indices provide a reliable and scalable tool for assessing pollution gradients in urban environments. This makes the approach suitable for routine monitoring in areas where dense sensor networks are impractical or economically unjustified.
The results obtained are in line with the general trend in precision agriculture and environmental monitoring, where the combination of spectral indices, feature selection and dimensionality reduction methods, and modern machine learning algorithms (SVM, deep networks, etc.) is establishing itself as the standard approach for reliable detection of stress, pollution, and degradation of plant ecosystems. The present study complements these developments by showing that even with a relatively limited number of classes (four pollution zones) and using classical indices, effective classification can be achieved if a structured process is applied, involving the selection of features, an appropriate data reduction method, and the choice of a nonlinear classifier.

4. Conclusions

The selection of informative features revealed a distinction between the adaxial and abaxial sides of the leaves: the adaxial side yields more informative spectral indices, while the abaxial side is more sensitive to color indices. This functional asymmetry of the leaf surface should be taken into account when leaves are analyzed for ecological monitoring.
Compact feature vectors were formed for the two leaf species that nevertheless retain high informativeness. These vectors are important for the task of classifying pollution zones and have been validated by three independent classification methods.
A Naive Bayes classifier was used as a reference method to evaluate the effectiveness of the data reduction methods. With PC, the accuracy reaches 100% in individual pairs of classes, while with LV, the errors exceed 40%.
A comparative analysis of three classification methods (NB, DA, and SVM) showed the superiority of nonlinear models over linear separating functions. Quadratic discriminant analysis (Q and DQ) and SVM with RBF kernel achieved the highest accuracy, with SVM-RBF reaching the highest classification levels (100%) in almost all stages of the study for both types of leaves.
SVM-RBF was found to be the most suitable method for classifying pollution zones among all those compared, showing the highest stability, the lowest errors, and the smallest variation in results between the study stages.
The results indicate that leaf indices can serve as a reliable bioindicator of environmental quality, allowing areas with different degrees of pollution to be differentiated with an accuracy comparable to that of the most modern methods presented in the available literature. This supports the applicability of the approach in real conditions.
A high degree of consistency between the results for the two leaf types was demonstrated, indicating that the methodology is robust and not tied to a single plant species. This suggests that the approach could be applied to other tree species.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/environments13040185/s1: Equations (S1)–(S15) Equations for RGB color indices; Equations (S16)–(S26) Equations for Lab color indices; Equations (S27)–(S37) Equations for VIS spectral indices; Equations (S38)–(S52) Equations for NIR spectral indices; Table S1. RGB indices of mulberry leaves; Table S2. RGB indices of linden leaves; Table S3. Lab indices of mulberry leaves; Table S4. Lab indices of linden leaves; Table S5. VIS indices of mulberry leaves; Table S6. VIS indices of linden leaves; Table S7. NIR indices of mulberry leaves; Table S8. NIR indices of linden leaves; Figure S1. Results of selection of informative features based on data for the adaxial part of mulberry leaves; Figure S2. Results of selection of informative features based on data for the abaxial part of mulberry leaves; Figure S3. Results of selection of informative features based on data for the adaxial part of a linden leaf; Figure S4. Results of selection of informative features based on data for the abaxial part of a linden leaf; Table S9. Classification results with a Naive Bayes classifier on data from the adaxial part of a mulberry leaf (DRM: data reduction method); Table S10. Classification results with a Naive Bayes classifier on data from the abaxial part of a mulberry leaf (DRM: data reduction method); Table S11. Classification results with a Naive Bayes classifier on data from the adaxial part of a linden leaf (DRM: data reduction method); Table S12. Classification results with a Naive Bayes classifier on data from the abaxial part of a linden leaf (DRM: data reduction method); Table S13. Classification errors with DA for mulberry leaves in E1 from measurements; Table S14. Classification errors with DA for mulberry leaves in E2 from measurements; Table S15. Classification errors with DA for mulberry leaves in E3 from measurements; Table S16. Classification errors with DA for mulberry leaves in E4 from measurements; Table S17. Classification errors with DA for linden leaves in E1 from measurements; Table S18. Classification errors with DA for linden leaves in E2 from measurements; Table S19. Classification errors with DA for linden leaves in E3 from measurements; Table S20. Classification errors with DA for linden leaves in E4 from measurements; Table S21. Classification errors with SVM for mulberry leaves in E1 from measurements; Table S22. Classification errors with SVM for mulberry leaves in E2 from measurements; Table S23. Classification errors with SVM for mulberry leaves in E3 from measurements; Table S24. Classification errors with SVM for mulberry leaves in E4 from measurements; Table S25. Classification errors with SVM for linden leaves in E1 from measurements; Table S26. Classification errors with SVM for linden leaves in E2 from measurements; Table S27. Classification errors with SVM for linden leaves in E3 from measurements; Table S28. Classification errors with SVM for linden leaves in E4 from measurements.

Author Contributions

Conceptualization, Z.Z. and M.V.; methodology, P.V.; software, Z.Z.; validation, D.K., M.V., P.V. and Z.Z.; formal analysis, Z.Z.; investigation, D.K., M.V. and P.V.; resources, D.K., M.V. and Z.Z.; data curation, D.K.; writing—original draft preparation, Z.Z. and M.V.; writing—review and editing, D.K. and P.V.; visualization, Z.Z.; supervision, Z.Z. and P.V.; project administration, Z.Z. and M.V.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Bulgarian national program “Development of scientific research and innovation at Trakia University, Bulgaria in the service of health and sustainable well-being”—BG-RRP-2.004-006-C02.

Data Availability Statement

The supporting data is presented in Supplementary Materials, added as a Supplementary File. Additional information can be requested from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The abbreviations for color and spectral indices are presented in Supplementary Materials. The following abbreviations are used in this manuscript:
APTI	Air pollution tolerance index
CO	Carbon monoxide
DA	Discriminant analysis
DQ	Diagonal–quadratic separation function
DRM	Data reduction method
eCO2	Equivalent carbon dioxide
FN	False negative
FP	False positive
FTT	Faculty of Technics and Technologies
FVD	Feature vector of abaxial part of the leaves
FVG	Feature vector of adaxial part of the leaves
HSV	Hue–saturation–value color model
L	Linear separation function
Lab	Lab color model
LED	Light-emitting diode
LV	Latent variables
NB	Naive Bayes classifier
NIR	Near-infrared spectra
NOx	Nitrogen oxides
OE	Overall error
PC	Principal components
PL	Pollution level
PM	Particulate matter
ppm	Parts per million
Q	Quadratic separation function
RBF	Radial basis function
RGB	Red–green–blue color model
SOx	Sulfur oxides
SVM	Support vector machine (classifier)
TN	True negative
TP	True positive
TVOC	Total volatile organic compounds
UV	Ultraviolet
VIS	Visible spectra
VNIR	Visible–near-infrared spectral range
WGS84	World Geodetic System 1984

References

  1. Asif, Z.; Ma, W. Assessing the Air Pollution Tolerance Index of Urban Plantation: A Case Study Conducted along High-Traffic Roadways. Atmosphere 2024, 15, 659.
  2. Mehmood, Z.; Yang, H.-H.; Awan, M.U.F.; Ahmed, U.; Hasnain, A.; Luqman, M.; Muhammad, S.; Sardar, A.A.; Chan, T.-Y.; Sharjeel, A. Effects of Air Pollution on Morphological, Biochemical, DNA, and Tolerance Ability of Roadside Plant Species. Sustainability 2024, 16, 3427.
  3. Anand, P.; Mina, U.; Khare, M.; Kumar, P.; Kota, S.H. Air pollution and plant health response—Current status and future directions. Atmos. Pollut. Res. 2022, 13, 101508.
  4. Ainsworth, E.A.; Yendrek, C.R.; Sitch, S.; Collins, W.J.; Emberson, L.D. The effects of tropospheric ozone on net primary productivity and implications for climate change. Annu. Rev. Plant Biol. 2012, 63, 637–661.
  5. Fowler, D.; Pilegaard, K.; Sutton, M.A.; Ambus, P.; Raivonen, M.; Duyzer, J.; Simpson, D.; Fagerli, H.; Fuzzi, S.; Schjoerring, J.K.; et al. Atmospheric composition change: Ecosystems–Atmosphere interactions. Atmos. Environ. 2009, 43, 5193–5267.
  6. Przybysz, A.; Sæbø, A.; Hanslin, H.M.; Gawroński, S.W. Accumulation of particulate matter and trace elements on vegetation as affected by pollution level, rainfall and the passage of time. Sci. Total Environ. 2014, 481, 360–369.
  7. Sæbø, A.; Popek, R.; Nawrot, B.; Hanslin, H.M.; Gawronska, H.; Gawronski, S.W. Plant species differences in particulate matter accumulation on leaf surfaces. Sci. Total Environ. 2012, 427–428, 347–354.
  8. Patil, P.Y.; Goud, A.V.; Patil, P.P.; Jadhav, K.K. Assessment of air pollution tolerance index (APTI) and anticipated performance index (API) of selected roadside plant species for the green belt development at Ratnagiri City in the Konkan region of Maharashtra, India. Environ. Monit. Assess. 2023, 195, 494.
  9. Carter, G.A.; Knapp, A.K. Leaf optical properties in higher plants: Linking spectral characteristics to stress and chlorophyll concentration. Am. J. Bot. 2001, 88, 677–684.
  10. Mederer, D.; Kattenborn, T.; Cherif, E.; Guimaraes Steinicke, C.; Joswig, J.S.; Schneider, F.D.; Feilhauer, H. Unraveling the seasonality of functional diversity through remote sensing. Commun. Earth Environ. 2025, 6, 790.
  11. Gitelson, A.A.; Zygielbaum, A.I.; Arkebauer, T.J.; Walter-Shea, E.A.; Solovchenko, A. Stress detection in vegetation based on remotely sensed light absorption coefficient. Int. J. Remote Sens. 2024, 45, 259–277.
  12. Peng, Y.; Nguy-Robertson, A.; Arkebauer, T.; Gitelson, A.A. Assessment of Canopy Chlorophyll Content Retrieval in Maize and Soybean: Implications of Hysteresis on the Development of Generic Algorithms. Remote Sens. 2017, 9, 226.
  13. Molnár, V.É.; Tőzsér, D.; Szabó, S.; Tóthmérész, B.; Simon, E. Use of Leaves as Bioindicator to Assess Air Pollution Based on Composite Proxy Measure (APTI), Dust Amount and Elemental Concentration of Metals. Plants 2020, 9, 1743.
  14. Pascucci, S.; Pignatti, S.; Casa, R.; Darvishzadeh, R.; Huang, W. Special Issue “Hyperspectral Remote Sensing of Agriculture and Vegetation”. Remote Sens. 2020, 12, 3665.
  15. Thenkabail, P.; Aneece, I.; Teluguntla, P.; Upadhyay, R.; Siddiqui, A.; Kalambukattu, J.G.; Kumar, S.; Gumma, M.; Dheeravath, V. Hyperspectral remote sensing for terrestrial applications. In Remote Sensing Handbook, Volume III: Agriculture, Food Security, Rangelands, Vegetation, Phenology, and Soils; Thenkabail, P.S., Ed.; USGS Publications Warehouse: Reston, VA, USA, 2024; pp. 285–358. Available online: https://pubs.usgs.gov/publication/70261150 (accessed on 28 January 2026).
  16. Karabourniotis, G.; Liakopoulos, G.; Bresta, P.; Nikolopoulos, D. The Optical Properties of Leaf Structural Elements and Their Contribution to Photosynthetic Performance and Photoprotection. Plants 2021, 10, 1455.
  17. Momayyezi, M.; Rippner, D.A.; Duong, F.V.; Raja, P.V.; Brown, P.J.; Kluepfel, D.A.; Earles, J.M.; Forrestel, E.J.; Gilbert, M.E.; McElrone, A.J. Structural and functional leaf diversity lead to variability in photosynthetic capacity across a range of Juglans regia genotypes. Plant Cell Environ. 2022, 45, 2351–2365.
  18. Diana Grecia, A.-M.; Sergio Arturo, T.-S.; Marlenne, G.-R. Review: Implications of Air Pollution on Trees Located in Urban Areas. Earth 2025, 6, 38.
  19. Petrova, S.; Velcheva, I.; Nikolov, B.; Vasileva, T.; Bivolarski, V. Antioxidant Responses and Adaptation Mechanisms of Tilia tomentosa Moench, Fraxinus excelsior L. and Pinus nigra J.F. Arnold towards Urban Air Pollution. Forests 2022, 13, 1689.
  20. Chang, Y.; Moan, S.L.; Bailey, D. RGB imaging based estimation of leaf chlorophyll content. In Proceedings of the 2019 International Conference on Image and Vision Computing New Zealand (IVCNZ), Dunedin, New Zealand, 2–4 December 2019; pp. 1–6.
  21. Yan, N.; Wang, W.; Zhao, A.; Chang, K.; Zhang, X.; Li, R.; Du, Y.; Song, Z.; Zhao, L.; Du, G. Physiological and molecular mechanisms of leaf color transformation in ‘Nanguo’ pear and its impact on flower bud quality. Plant Soil 2025, 517, 1563–1581.
  22. Liu, Y.; Feng, X.; Zhang, Y.; Zhou, F.; Zhu, P. Simultaneous changes in anthocyanin, chlorophyll, and carotenoid contents produce green variegation in pink-leaved ornamental kale. BMC Genom. 2021, 22, 455.
  23. Ložienė, K.; Chochlovaitė, I. Effect of Phenological Stage and Leaf Age on Changes of Chlorophyll and Carotenoid Contents in Some Weeds and Invasive Species. Molecules 2025, 30, 3788.
  24. Agarwal, A.; de Jesus Colwell, F.; Correa Galvis, V.A.; Hill, T.R.; Boonham, N.; Prashar, A. Assessing nutritional pigment content of green and red leafy vegetables by image analysis: Catching the “red herring” of plant digital color processing via machine learning. Biol. Methods Protoc. 2025, 10, bpaf027.
  25. Zlatev, Z.; Todorov, A.; Karadzhova, D.; Vasilev, M.; Veleva, P. Evaluating the Performance and Practicality of a Multi-Parameter Assessment System with Design, Comparative Analysis, and Future Directions. Sustainability 2024, 16, 4124.
  26. Lark, R. Acceptable VOC Levels in PPM. Available online: https://environment.co/acceptable-voc-levels-ppm (accessed on 5 August 2024).
  27. Bright, R.M.; Lund, M.T. CO2-equivalence metrics for surface albedo change based on the radiative forcing concept: A critical review. Atmos. Chem. Phys. 2021, 21, 9887–9907.
  28. Latif, S.D.; Almalayih, M.; Yafouz, A.; Ahmed, A.N.; Zaini, N.; Irwan, D.; AlDahoul, N.; Sherif, M.; El-Shafie, A. Prediction of atmospheric carbon monoxide concentration utilizing different machine learning algorithms: A case study in Kuala Lumpur, Malaysia. Environ. Technol. Innov. 2023, 32, 103387.
  29. Wyman, C.; Sloan, P.-P.; Shirley, P. Simple Analytic Approximations to the CIE XYZ Color Matching Functions. J. Comput. Graph. Tech. 2013, 2, 11.
  30. Bendig, J.; Yu, K.; Aasen, H.; Bolten, A.; Bennertz, S.; Broscheit, J.; Gnyp, M.L.; Bareth, G. Combining UAV-based plant height from crop surface models, visible, and near-infrared vegetation indices for biomass monitoring in barley. Int. J. Appl. Earth Obs. Geoinf. 2015, 39, 79–87.
  31. Lussem, U.; Bolten, A.; Gnyp, M.L.; Jasper, J.; Bareth, G. Evaluation of RGB-based vegetation indices from UAV imagery to estimate forage yield in grassland. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, 42, 1215–1219.
  32. Portz, G.; Gnyp, M.; Jasper, J. Capability of crop canopy sensing to predict crop parameters of cut grass swards aiming at early season variable rate nitrogen top dressings. Adv. Anim. Biosci. 2017, 8, 792–795.
  33. Ghodke, A.; Shalini, K.; Laxmi, A. Influence of additives on rheological characteristics of whole-wheat dough and quality of Chapatti (Indian unleavened flat bread) Part I—Hydrocolloids. Food Hydrocoll. 2007, 21, 110–117.
  34. Pathare, P.; Opara, U.; Al-Said, F. Colour measurement and analysis in fresh and processed foods: A review. Food Bioprocess Technol. 2013, 6, 36–60.
  35. Atanassova, S.; Nikolov, P.; Valchev, N.; Masheva, S.; Yorgov, D. Early detection of powdery mildew (Podosphaera xanthii) on cucumber leaves based on visible and near-infrared spectroscopy. AIP Conf. Proc. 2019, 2075, 160014.
  36. Cermakova, I.; Komarkova, J.; Sedlak, P. Calculation of Visible Spectral Indices from UAV-Based Data: Small Water Bodies Monitoring. In Proceedings of the 2019 14th Iberian Conference on Information Systems and Technologies (CISTI), Coimbra, Portugal, 19–22 June 2019; pp. 1–5.
  37. Ayanlade, A. Remote sensing vegetation dynamics analytical methods: A review of vegetation indices techniques. Geoinform. Pol. 2017, 16, 7–17.
  38. Lykhovyd, P.; Averchev, O.; Fedorchuk, M.; Fedorchuk, V. The relationship between spatial vegetation indices: A case study for the south of Ukraine. Environ. Ecol. Res. 2023, 11, 740–746.
  39. Aggarwal, N.; Shukla, U.; Saxena, G.J.; Rawat, M.; Bafila, A.S.; Singh, S.; Pundir, A. Mean-based Relief: An improved feature selection method based on ReliefF. Appl. Intell. 2023, 53, 23004–23028.
  40. Kira, K.; Rendell, L.A. The feature selection problem: Traditional methods and a new algorithm. In Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, CA, USA, 12–16 July 1992; AAAI Press: Washington, DC, USA, 1992; pp. 129–134.
  41. Robnik-Šikonja, M.; Kononenko, I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23–69.
  42. Urbanowicz, R.J.; Meeker, M.; La Cava, W.; Olson, R.S.; Moore, J.H. Relief-based feature selection: Introduction and review. J. Biomed. Inform. 2018, 85, 189–203.
  43. Urbanowicz, R.J.; Olson, R.S.; Schmitt, P.; Meeker, M.; Moore, J.H. Benchmarking Relief-based feature selection methods for bioinformatics data mining. arXiv 2018, arXiv:1711.08477.
  44. Chong, K.S.; Shah, N. Comparison of Naive Bayes and SVM classification in grid-search hyperparameter tuned and non-hyperparameter tuned healthcare stock market sentiment analysis. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 90–94.
  45. Nurwicaksana, S.; Oh, L.; Sukmana, H. A Comparative Study of Naive Bayes, SVM, and Decision Tree Algorithms for Diabetes Detection Based on Health Datasets. Int. J. Inf. Inf. Syst. 2025, 7, 200–209.
  46. Wibowo, J.S.; Wahyudi, E.N.; Listiyono, H. Performance comparison of SVM, Naive Bayes, and Random Forest models in fake news classification. Eng. Technol. J. 2024, 9, 4799–4804.
  47. Owoyi, M.C.; Okoh, J.E.; Obukohwo, V.; Olamuyiwa, S. Comparative study of Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and Support Vector Machine (SVM) in dataset. Adv. J. Sci. Technol. Eng. 2025, 5, 70–84.
  48. Qu, L.; Pei, Y. A Comprehensive Review on Discriminant Analysis for Addressing Challenges of Class-Level Limitations, Small Sample Size, and Robustness. Processes 2024, 12, 1382.
  49. Polowczyk, A.; Polowczyk, A. The effectiveness of PCA in KNN, Gaussian Naive Bayes classifier and SVM for raisin dataset. CEUR Workshop Proc. 2023, 3695, 9–16. Available online: https://ceur-ws.org/Vol-3695/p02.pdf (accessed on 27 January 2026).
  50. Powers, D.M.W. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2011, 2, 37–63.
  51. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
  52. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
  53. Kirilova, E. Synthesis of classifiers based on color features for detecting Fusarium Moniliforme mold on maize kernels. Sci. Work. Univ. Ruse 2012, 51, 168–175.
  54. Dineva, S.; Zlatev, Z. Urban environmental quality assessment by shape and spectral indices of mulberry leaves. Appl. Res. Tech. Technol. Educ. 2019, 7, 184–205.
  55. Dineva, S.; Veleva-Doneva, P.; Zlatev, Z. Urban Environmental Quality Assessment by Spectral Characteristics of Mulberry (Morus L.) Leaves. Environments 2021, 8, 87. [Google Scholar] [CrossRef]
  56. Andreeva, A.; Buznikov, A.; Timofeev, A.; Alekseeva-Popova, N.; Belyaeva, A.I. Evaluation of the ecological state of environment with a reflection spectrum of indicator species of vegetation. Actual Probl. Remote Sens. Earth Space 2006, 2, 265–270. Available online: http://www.iki.rssi.ru/earth/articles06/vol2-265-270.pdf (accessed on 23 March 2026).
  57. Aničić, M.; Spasić, T.; Tomašević, M.; Rajšić, S.; Tasić, M. Trace Elements Accumulation and Temporal Trends in Leaves of Urban Deciduous Trees (Aesculus hippocastanum and Tilia spp.). Ecol. Indic. 2011, 11, 824–830. [Google Scholar] [CrossRef]
  58. Burnett, A.C.; Anderson, J.; Davidson, K.J.; Ely, K.S.; Lamour, J.; Li, Q.; Morrison, B.D.; Yang, D.; Rogers, A.; Serbin, S.P. A best-practice guide to predicting plant traits from leaf-level hyperspectral data using partial least squares regression. J. Exp. Bot. 2021, 72, 6175–6189. [Google Scholar] [CrossRef] [PubMed]
  59. Islam, S.; Samsuzzaman; Reza, M.N.; Lee, K.-H.; Ahmed, S.; Cho, Y.J.; Noh, D.H.; Chung, S.-O. Image Processing and Support Vector Machine (SVM) for Classifying Environmental Stress Symptoms of Pepper Seedlings Grown in a Plant Factory. Agronomy 2024, 14, 2043. [Google Scholar] [CrossRef]
  60. Meirista, E.; Ruslau, M.F.V.; Nurhayati; Pratama, R.A. Application of Principal Component Analysis (PCA) for dimensionality reduction in watermelon leaf classification. In Proceedings of the NST Proceedings: The 8th International Joint Conference on Science and Technology, Sumenep, Indonesia, 25 October 2025; pp. 129–139. [Google Scholar] [CrossRef]
  61. Shi, G.; Shen, X.; Ren, H.; Rao, Y.; Weng, S.; Tang, X. Kernel principal component analysis and differential non-linear feature extraction of pesticide residues on fruit surface based on surface-enhanced Raman spectroscopy. Front. Plant Sci. 2022, 13, 956778. [Google Scholar] [CrossRef] [PubMed]
  62. Brabant, C.; Alvarez-Vanhard, E.; Morin, G.; Nguyen, T.N.; Laribi, A.; Houet, T. Evaluation of dimensional reduction methods on urban vegetation classification performance using hyperspectral data. In Proceedings of the IGARSS 2018—IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018. [Google Scholar] [CrossRef]
  63. Li, X.; Peng, F.; Wei, Z.; Han, G.; Liu, J. Non-destructive detection of protein content in mulberry leaves by using hyperspectral imaging. Front. Plant Sci. 2023, 14, 1275004. [Google Scholar] [CrossRef]
  64. Schratz, P.; Muenchow, J.; Iturritxa, E.; Cortés, J.; Bischl, B.; Brenning, A. Monitoring Forest Health Using Hyperspectral Imagery: Does Feature Selection Improve the Performance of Machine-Learning Techniques? Remote Sens. 2021, 13, 4832. [Google Scholar] [CrossRef]
  65. Sreehari, E.; Dhinesh Babu, L.D. Feature selection techniques for data analysis and decision making in interdisciplinary areas: A systematic review. IEEE Access 2024, 12, 188845–188873. [Google Scholar] [CrossRef]
  66. Xu, L.; Chen, L.; Luo, Q.; Zhao, S.; Huang, J.; Wang, K.; Yang, Z.; Weng, X.; Fang, K.; Feng, H. Leveraging UAV hyperspectral imaging for crop physiology and biochemistry: A comprehensive review of feature extraction and selection methods. Plant Phenomics 2026, 8, 100141. [Google Scholar] [CrossRef]
  67. Blazhevska, T.; Menkinoska, M.; Pavlova, V.; Stamatovska, V.; Cilev, G.; Stojanovski, S.; Dimovska, N. Microbiological characteristics and hygienic-sanitary aspect of the water from Black River in the Pelagonija region–North Macedonia. Bulgar. J. Soil Sci. Agrochem. Ecol. 2021, 55, 15–20. Available online: https://agriacad.eu/ojs/index.php/bjssae/article/view/1660 (accessed on 5 February 2026).
Figure 1. Results from NB for mulberry leaves. Acc—accuracy; OE—overall error; LV—latent variables; PC—principal components.
Figure 2. Results from NB for linden leaves. Acc—accuracy; OE—overall error; LV—latent variables; PC—principal components.
Figure 3. Results from DA for (a) mulberry and (b) linden leaves.
Figure 4. Results from SVM for (a) mulberry and (b) linden leaves.
Table 1. Criteria for determining pollutant threshold levels.
| Pollutant | Threshold Value | Pollutant | Threshold Value |
|---|---|---|---|
| PM2.5 | 35 µg/m³ | NOx | 1 ppm |
| PM10 | 50 µg/m³ | SOx | 0.6 ppm |
| TVOC | 1 ppm | CO | 0.6 ppm |
| eCO2 | 410 ppm | - | - |
Table 2. Measured pollutant values in the four classes of areas (all data have a statistically significant difference at p < 0.05).
| Class | PM2.5, µg/m³ | PM10, µg/m³ | TVOC, ppm | eCO2, ppm | NOx, ppm | SOx, ppm | CO, ppm |
|---|---|---|---|---|---|---|---|
| Class 1 | 45.45 ± 8.24 | 59.58 ± 9.07 | 2.07 ± 0.63 | 401.57 ± 63.60 | 0.88 ± 0.13 | 1.00 ± 0.03 | 0.7 ± 0.08 |
| Class 2 | 127.76 ± 19.81 | 143.34 ± 10.63 | 5.62 ± 1.34 | 402.52 ± 15.63 | 0.83 ± 0.33 | 0.57 ± 0.07 | 0.5 ± 0.04 |
| Class 3 | 16.61 ± 3.48 | 28.47 ± 9.43 | 3.97 ± 1.01 | 448 ± 40.77 | 0.79 ± 0.12 | 0.57 ± 0.03 | 0.73 ± 0.01 |
| Class 4 | 5.11 ± 1.51 | 8.11 ± 9.77 | 1.02 ± 0.7 | 408.99 ± 14.16 | 0.57 ± 0.09 | 0.51 ± 0.03 | 0.6 ± 0.01 |
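As an illustration of how the thresholds in Table 1 relate to the measured values in Table 2, the following Python sketch lists, for each class, the pollutants whose mean value lies above its threshold. It is not the authors' procedure (they report using Matlab), and it makes two assumptions: a strict "greater than" comparison, and the use of class means only, with standard deviations ignored. The rule that combines individual exceedances into the H/L labels is not stated in these tables, so the sketch stops at flagging exceedances.

```python
# Thresholds from Table 1 (particulate matter in µg/m³, gases in ppm).
THRESHOLDS = {
    "PM2.5": 35.0, "PM10": 50.0, "TVOC": 1.0, "eCO2": 410.0,
    "NOx": 1.0, "SOx": 0.6, "CO": 0.6,
}

# Mean values per class as read from Table 2 (standard deviations omitted).
CLASS_MEANS = {
    "Class 1": {"PM2.5": 45.45, "PM10": 59.58, "TVOC": 2.07, "eCO2": 401.57,
                "NOx": 0.88, "SOx": 1.00, "CO": 0.7},
    "Class 2": {"PM2.5": 127.76, "PM10": 143.34, "TVOC": 5.62, "eCO2": 402.52,
                "NOx": 0.83, "SOx": 0.57, "CO": 0.5},
    "Class 3": {"PM2.5": 16.61, "PM10": 28.47, "TVOC": 3.97, "eCO2": 448.0,
                "NOx": 0.79, "SOx": 0.57, "CO": 0.73},
    "Class 4": {"PM2.5": 5.11, "PM10": 8.11, "TVOC": 1.02, "eCO2": 408.99,
                "NOx": 0.57, "SOx": 0.51, "CO": 0.6},
}


def exceeded(means, thresholds=THRESHOLDS):
    """Return the pollutants whose mean is strictly above its threshold.

    The strict comparison is an assumption of this sketch.
    """
    return [name for name, value in means.items() if value > thresholds[name]]


if __name__ == "__main__":
    for label, means in CLASS_MEANS.items():
        print(label, exceeded(means))
```

Under these assumptions a value equal to its threshold, such as the CO mean of 0.6 ppm in Class 4, is not counted as an exceedance.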
Table 3. Grouping of measurement zones into classes, depending on the degree of pollution.
| Class | Degree of Pollution |
|---|---|
| Class C1 | H-PM, H-PL |
| Class C2 | H-PM, L-PL |
| Class C3 | L-PM, H-PL |
| Class C4 | L-PM, L-PL |
H—high; L—low; PM—particulate matter; PL—pollution level.
Table 4. Features used.
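| No | Feature | No | Feature | No | Feature | No | Feature | No | Feature | No | Feature |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Nr | 11 | GLI | 21 | YI | 31 | PACI | 41 | VARI | 51 | DVI |
| 2 | Ng | 12 | VARI | 22 | WI | 32 | REI | 42 | ExG | 52 | PSRI |
| 3 | Nb | 13 | RGR | 23 | BI | 33 | PTI | 43 | NDVI | 53 | TVI |
| 4 | ExR | 14 | RBR | 24 | SI | 34 | CTI | 44 | NDWI | 54 | MSR |
| 5 | ExG | 15 | GBR | 25 | CIRG | 35 | TVI | 45 | SR | 55 | NBR |
| 6 | ExB | 16 | L | 26 | COL | 36 | G | 46 | WBI | 56 | MTVI2 |
| 7 | GRVI | 17 | a | 27 | CL | 37 | NExG | 47 | SAVI | 57 | WBI2 |
| 8 | BRVI | 18 | b | 28 | ECB | 38 | NGRDI | 48 | EVI | - | - |
| 9 | GBVI | 19 | C | 29 | FCI | 39 | RGBVI | 49 | Clre | - | - |
| 10 | RGBVI | 20 | h | 30 | WL | 40 | GLI | 50 | PRI | - | - |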
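Table 4 gives only the feature names. As a hedged illustration of how a few of the RGB-based entries are commonly computed, the Python sketch below assumes that Nr, Ng and Nb denote normalized chromatic coordinates and that ExG and ExR follow their widely used definitions (ExG = 2g − r − b, ExR = 1.4r − g). The exact formulas used by the authors may differ; the function names and the sample pixel are hypothetical.

```python
def normalized_rgb(r, g, b):
    """Normalized chromatic coordinates (assumed meaning of Nr, Ng, Nb)."""
    total = r + g + b
    if total == 0:
        # A black pixel has no defined chromaticity; return zeros by convention.
        return 0.0, 0.0, 0.0
    return r / total, g / total, b / total


def color_indices(r, g, b):
    """Compute a few RGB indices using commonly cited definitions.

    These definitions are an assumption of this sketch, not taken from the paper.
    """
    nr, ng, nb = normalized_rgb(r, g, b)
    return {
        "Nr": nr,
        "Ng": ng,
        "Nb": nb,
        "ExG": 2 * ng - nr - nb,   # excess green
        "ExR": 1.4 * nr - ng,      # excess red
    }


if __name__ == "__main__":
    # Hypothetical mean leaf color in 8-bit RGB.
    for name, value in color_indices(60, 120, 20).items():
        print(f"{name}: {value:.3f}")
```

Because the normalized coordinates sum to one, ExG in this form equals 3·Ng − 1, so it carries the same information as Ng on a different scale.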
Table 5. Presentation of class label categories in two-class classification.
| Labels predicted by the classifier | True label: Correct | True label: Incorrect |
|---|---|---|
| Correct (Positive) | Actually Correct (True Positive, TP) | Incorrectly Correct (False Positive, FP) |
| Incorrect (Negative) | Incorrectly False (False Negative, FN) | Actually False (True Negative, TN) |
Table 6. Types of performance evaluations for classifiers.
| Evaluation | Formula | Essence of the Evaluation |
|---|---|---|
| Precision | $\frac{TP}{TP + FP} \times 100$, % | The percentage of objects recognized by the classifier as correct that are actually correct |
| Sensitivity | $\frac{TP}{TP + FN} \times 100$, % | The percentage of actually correct objects that the classifier recognizes as correct (TP category) |
| Specificity | $\frac{TN}{TN + FP} \times 100$, % | The percentage of actually incorrect objects that the classifier recognizes as incorrect |
| Accuracy | $\frac{TP + TN}{TP + TN + FP + FN} \times 100$, % | The percentage of objects correctly recognized by the classifier in relation to all objects |
| Total error (eo) | $\frac{FP + FN}{TP + TN + FP + FN} \times 100$, % | The percentage of all incorrectly classified objects out of the total number of objects |
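The formulas in Table 6 can be checked with a short calculation. The following Python sketch is an illustration only, not the authors' Matlab code; the function name and the confusion-matrix counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Evaluate the Table 6 measures, in percent, from confusion-matrix counts."""
    total = tp + fp + fn + tn

    def pct(numerator, denominator):
        # Return NaN instead of raising when a denominator is zero.
        return 100.0 * numerator / denominator if denominator else float("nan")

    return {
        "precision": pct(tp, tp + fp),
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "accuracy": pct(tp + tn, total),
        "total_error": pct(fp + fn, total),
    }


if __name__ == "__main__":
    # Hypothetical counts for a two-class problem with 100 objects.
    metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
    for name, value in metrics.items():
        print(f"{name}: {value:.2f} %")
```

Accuracy and total error share the same denominator, so they always sum to 100%, which gives a quick consistency check on reported values.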
Matlab 2017a (The Mathworks Inc., Natick, MA, USA) and MS Office 2016 (Microsoft Corp., Redmond, WA, USA) software were used to process the experimental data.
Table 7. Selected numerical features and their full names.
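| No | Name (mulberry, adaxial) | No | Name (mulberry, abaxial) | No | Name (linden, adaxial) | No | Name (linden, abaxial) |
|---|---|---|---|---|---|---|---|
| 3 | Nb | 3 | Nb | 2 | Ng | 2 | Ng |
| 6 | ExB | 5 | ExG | 3 | Nb | 3 | Nb |
| 9 | GBVI | 24 | SI | 6 | ExB | 5 | ExG |
| 10 | RGBVI | 25 | CIRG | 8 | BRVI | 7 | GRVI |
| 19 | C | 32 | REI | 9 | GBVI | 8 | BRVI |
| 20 | h | 33 | PTI | 10 | RGBVI | 9 | GBVI |
| 21 | YI | 42 | ExG | 11 | GLI | 10 | RGBVI |
| 30 | WL | 44 | NDWI | 21 | YI | 11 | GLI |
| 33 | PTI | 46 | WBI | 29 | FCI | 12 | VARI |
| 35 | TVI | 50 | PRI | 31 | PACI | 13 | RGR |
| 37 | NExG | 52 | PSRI | 33 | PTI | 17 | a |
| 38 | NGRDI | 55 | NBR | 35 | TVI | 21 | YI |
| 39 | RGBVI | 56 | MTVI2 | 36 | G | 24 | SI |
| 40 | GLI | 57 | WBI2 | 37 | NExG | 28 | ECB |
| 41 | VARI | - | - | 38 | NGRDI | 31 | PACI |
| 42 | ExG | - | - | 39 | RGBVI | 43 | NDVI |
| 44 | NDWI | - | - | 40 | GLI | 44 | NDWI |
| 48 | EVI | - | - | 41 | VARI | 45 | SR |
| 50 | PRI | - | - | 42 | ExG | 46 | WBI |
| 55 | NBR | - | - | 44 | NDWI | 47 | SAVI |
| - | - | - | - | 46 | WBI | 49 | Clre |
| - | - | - | - | 50 | PRI | 50 | PRI |
| - | - | - | - | 52 | PSRI | 51 | DVI |
| - | - | - | - | 55 | NBR | 52 | PSRI |
| - | - | - | - | 57 | WBI2 | 53 | TVI |
| - | - | - | - | - | - | 54 | MSR |
| - | - | - | - | - | - | 55 | NBR |
| - | - | - | - | - | - | 56 | MTVI2 |
| - | - | - | - | - | - | 57 | WBI2 |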
Karadzhova, D.; Vasilev, M.; Veleva, P.; Zlatev, Z. Classification of Zones with Different Levels of Atmospheric Pollution Through a Set of Optical Features Extracted from Mulberry and Linden Leaves. Environments 2026, 13, 185. https://doi.org/10.3390/environments13040185