1. Introduction
Environmental quality monitoring in urban areas helps to prevent adverse effects on human health. Passive and active biomonitoring of air quality each have advantages and disadvantages. Passive biomonitoring uses tree species already present in the ecosystem, which makes this approach affordable and effective over time [1].
Detailed studies have been conducted on passive biomonitoring of air quality based on leaf data from linden (Tilia sp.) [2], hornbeam (Carpinus betulus) [3], and white willow (Salix mucronata) [4]. These studies used both classical laboratory methods and contactless measurement techniques, such as spectral characteristics in the visible and near-infrared ranges and hyperspectral analysis. Using different plant species to passively determine urban air quality requires identifying the features of pollution that alter plant characteristics and developing new methods or refining existing ones [5,6,7,8,9]. This study aims to address precisely these requirements.
The mulberry is a plant with a proven content of bioactive substances [10] that is resistant to contaminated soils and allows pesticide treatment [11]. The plant is also sensitive to changes in the environment [12]. Mulberries are economically important as feed for silkworms, cattle, and goats, and in some cases they are used as park trees. In Bulgaria, they are mainly found near major roads in cities. The wide use of mulberry plantations for different purposes (leaf mass, fruit, wood, landscaping) and the versatile applications of the individual parts of the tree determine its great economic importance.
Many published studies on mulberry analysis and evaluation use spectral and hyperspectral methods to determine qualitative indicators such as proteins, vegetable oils, starch, dry matter, moisture, acidity, various mycotoxins, and infections, which are relevant to the use of the plant for medical and commercial purposes. Most of the laboratory methods used to analyze mulberries are subjective and require considerable time to process plant samples; the accuracy of diagnosis is not high and depends on the expert's qualifications. The development of highly efficient automated technologies for evaluating mulberry indicators is therefore a priority of current research in this field. The purpose of spectral methods is to speed up the assessment of the state of the environment, not to replace classical laboratory methods.
Directive 2008/105/EC defines the good chemical status to be achieved by all Member States of the European Union (including Bulgaria) and provides the legal basis for monitoring priority substances in sediments, flora, and fauna. The laboratory and field methods described in this normative document include the determination of O2, CO2, CO, SOx, NOx, Cl2, H2S, HCl, VOC, and PM in the air, as well as the determination of heavy metals, anions, and pesticides.
Few studies have addressed the qualitative indicators related to the surface texture of mulberry leaves or the influence of a polluted environment in the habitat area of the plant. Whether the mulberry is suitable for passive biomonitoring of air quality remains unclear.
The purpose of this study is to analyze changes in the spectral reflectance characteristics of mulberry leaves depending on the level of air pollution in the habitat area of the plant.
2. Material and Methods
Mulberry leaf samples were taken from six areas with high and low car traffic. From each area, 25 leaves were collected from the sun-exposed side of the trees. The leaves were transported in a cooler bag and measured as soon as they were delivered to the laboratory.
Table 1 lists the areas in which the leaves were collected, together with their pollution levels and geographical coordinates. All areas are located in southeastern Bulgaria, where mulberries are among the more common urban plants.
The leaves were divided into two groups: those from less polluted zones (LP) and those from polluted zones (P). In this study, passive determination is synonymous with passive biomonitoring.
2.1. Measurement of Air Environmental Parameters
To determine the environmental parameters in the analyzed zones, an experimental set-up developed at the Faculty of Technics and Technologies, Yambol, Bulgaria [13] was used. The measuring device consists of a sensor module and a microprocessor control system with wireless communication. The system measured smoke gases (ppm), particulate matter PM > 0.5 (μg/m³), equivalent CO2 (eCO2, ppm), and total volatile organic compounds (TVOC, ppb). The measurements were made at a temperature of 22 ± 3 °C and a relative humidity of 39 ± 5%.
2.2. Planar Chromatography
The method of Priyadarshini et al. [14] was used with some modifications. The petioles of the mulberry leaves from the polluted and less polluted areas were removed, and the leaves were cut into 10 × 10 mm pieces and soaked in acetone for 4 h. A drop of the extract was applied to white paper (80 g/m², 105 × 19 mm) at a distance of 15 mm from the end of the paper, and the sample was dried for 15 min. The end of the paper was then immersed in acetone to a depth of 4 mm. After 15 min, the samples were removed from the acetone and dried for 1 h, and the Rf values of the five separated fractions (carotene, xanthophyll, chlorophyll a, chlorophyll b, and anthocyanin) were determined. Three replicates were made, and the mean and standard deviation of Rf are reported. Rf = A/B, where A is the distance travelled by the corresponding fraction and B is the distance travelled by the solvent front.
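As an illustration of how the reported Rf values and their spread can be obtained, the sketch below applies the standard definition (fraction distance divided by solvent-front distance) to each replicate and then takes the mean and sample standard deviation. The distances are hypothetical placeholders, not measurements from this study.

```python
# Minimal sketch of the Rf calculation for planar (paper) chromatography.
# ASSUMPTION: distances are in mm from the application point; the example
# values below are hypothetical, not data from this study.
from statistics import mean, stdev


def retention_factor(fraction_distance_mm: float, solvent_front_mm: float) -> float:
    """Rf = distance travelled by the fraction / distance travelled by the solvent front."""
    if solvent_front_mm <= 0:
        raise ValueError("solvent front distance must be positive")
    if not 0 <= fraction_distance_mm <= solvent_front_mm:
        raise ValueError("a fraction cannot travel beyond the solvent front")
    return fraction_distance_mm / solvent_front_mm


def summarize_replicates(pairs):
    """Mean and sample standard deviation of Rf over replicate runs.

    pairs: iterable of (fraction_distance_mm, solvent_front_mm) tuples.
    """
    values = [retention_factor(a, b) for a, b in pairs]
    return mean(values), stdev(values)


# Three hypothetical replicates for one fraction
rf_mean, rf_sd = summarize_replicates([(42.0, 80.0), (40.0, 80.0), (44.0, 80.0)])
print(f"Rf = {rf_mean:.3f} ± {rf_sd:.3f}")
```

Because Rf is a ratio of two distances, it is dimensionless and always lies between 0 and 1; the range check makes swapped arguments fail loudly.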
2.3. Determination of Physicochemical Parameters
The measurement samples were prepared according to the procedure in AACC 02-52.01 [15], with some modifications suitable for the electrometric measurement of leaf parameters, in the following order: distilled water was heated to 70 °C; the leaf mass was crushed and placed in the distilled water at a ratio of 1:10 (5 g of raw material in 50 mL of distilled water); the mixture was stirred; and after cooling to ambient temperature, three consecutive measurements of each indicator were made and their mean and standard deviation were determined.
The following measuring instruments were used: a technical balance MH-200 (ZheZhong Weighing Apparatus Factory, Yongkang, China) with a maximum capacity of 200 g and a resolution of 0.02 g; a pH meter PH-108 (Hangzhou Lohand Biological Co., Ltd, Hangzhou, China) for active acidity (pH); a Conductivity Meter AP-2 (HM Digital, Inc., Redondo Beach, CA, USA) for electrical conductivity (EC, µS/cm); a TDS-3 measuring instrument (HM Digital, Inc., Redondo Beach, CA, USA) for total dissolved solids (TDS, ppm); and a Measuring Instrument Model ORP-2069 (Shanghai Longway Optical Instruments Co., Ltd, Shanghai, China) for redox potential (ORP, mV).
2.4. Experimental Set-Up for Obtaining Spectral Characteristics
The experimental set-up consists of a personal computer with software for receiving and processing images and spectral characteristics in the visible and near-infrared ranges. The spectral characteristics of the leaves were captured with a TCS230 spectrophotometric sensor (TAOS Inc., Premstaetten, Austria) controlled by an Arduino Nano-compatible single-board microcomputer (Kuongshun Electronic Ltd., Shenzhen, China). The distance between the leaf and the sensor was 0.5 cm. White LEDs with maximum light intensity at 450 nm were used. Measurements were taken at 5 points on the adaxial side and 5 points on the abaxial side of each leaf.
2.5. Obtaining Spectral Characteristics
The values from the XYZ and LMS models were transformed mathematically into reflection spectra in the VIS and NIR ranges (390–730 nm and 800–1000 nm), and the transformation is possible in both directions [16]. The mathematical dependencies, which allow conversion in both directions, are:
where A(λ) is the matrix for converting color to reflection spectra in the VIS range for the accepted observer and illumination.
The matrices (matching functions) used to convert color components to spectra in the VIS region are available in [17]. Conversion functions for the 2° observer were applied (LMS 2°, CIE 2006).
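To illustrate how a colour-to-spectrum relationship can work "in both directions", the sketch below uses a generic least-squares (pseudo-inverse) reconstruction; this is a named, swapped-in technique and not necessarily the transformation of [16]. It assumes a discretised matching-function matrix C of shape 3 × n (rows are colour channels, columns are wavelengths). The Gaussian-shaped matching functions are hypothetical placeholders for the tabulated CIE 2006 functions in [17], and the illuminant weighting is omitted for brevity.

```python
import numpy as np

# Hypothetical sampling grid: 390-730 nm in 10 nm steps (35 wavelengths)
wavelengths = np.arange(390, 731, 10)


def gaussian(center_nm, width_nm):
    """Bell-shaped curve on the wavelength grid (used only to build toy data)."""
    return np.exp(-0.5 * ((wavelengths - center_nm) / width_nm) ** 2)


# HYPOTHETICAL 3 x n matching-function matrix. Real work would use the tabulated
# CIE 2006 (2-degree) functions, weighted by the D65 illuminant.
C = np.vstack([gaussian(600, 40), gaussian(550, 40), gaussian(450, 30)])


def spectrum_to_color(reflectance):
    """Forward direction: weight the spectrum by each matching function and sum."""
    return C @ reflectance


def color_to_spectrum(tristimulus):
    """Backward direction: minimum-norm least-squares spectrum for a colour."""
    return np.linalg.pinv(C) @ tristimulus


rho = 0.05 + 0.10 * gaussian(550, 30)    # hypothetical leaf-like reflectance
t = spectrum_to_color(rho)
rho_hat = color_to_spectrum(t)

# rho_hat usually differs from rho (many spectra map to the same colour),
# but it reproduces the same three colour components:
print(np.allclose(spectrum_to_color(rho_hat), t))
```

The backward step can only recover a spectrum consistent with three numbers, so in practice it is combined with prior information about typical spectra; the sketch shows the algebra, not a calibrated reconstruction.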
The conversion to NIR was performed using the matching functions presented in [18]. The illumination data used to convert the VIS and NIR characteristics correspond to illuminant D65 (average daylight including the UV component, 6500 K). The conversion function between the RGB and XYZ models in the range λ1–λ2 (380–780 nm) can be represented as:
where M is the transformation matrix for the 2° observer and illuminant D65. From this, the spectral characteristic takes the form:
Conversion functions change the way spectral data are stored or represented. The conversion function between the XYZ and LMS models in the range λ1–λ2 (800–1000 nm) can be represented as:
where T is the transformation matrix for the 2° observer and illuminant D65. From this, the spectral characteristic takes the form:
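As a concrete instance of a 3 × 3 RGB to XYZ transformation for the 2° observer and D65, the sketch below uses the linear-sRGB matrix from the sRGB standard (IEC 61966-2-1). The authors' matrix M is not reproduced in this section, so using the sRGB matrix is an assumption; the sRGB transfer curve is removed first because the matrix applies to linear RGB values.

```python
import numpy as np

# ASSUMPTION: linear sRGB -> CIE XYZ matrix (D65 white point, 2-degree observer,
# IEC 61966-2-1 values); the matrix M used by the authors may differ.
M = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])


def srgb_to_linear(c):
    """Undo the sRGB transfer curve; c holds values scaled to [0, 1]."""
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)


def rgb_to_xyz(rgb_8bit):
    """8-bit sRGB triplet -> XYZ, scaled so that Y of the white point is 1."""
    linear = srgb_to_linear(np.asarray(rgb_8bit, dtype=float) / 255.0)
    return M @ linear


def xyz_to_linear_rgb(xyz):
    """Inverse direction (linear RGB), solving M @ rgb = xyz."""
    return np.linalg.solve(M, xyz)


white = rgb_to_xyz([255, 255, 255])
print(np.round(white, 4))  # D65 white point, approximately (0.9505, 1.0, 1.089)
```

Because M is invertible, the same matrix serves both directions of the conversion, which mirrors the "both directions of equality" property described in the text.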
2.6. Determination of Information Indices by Spectral Characteristics
The NDAI (Normalized Dorsiventral Asymmetry Index) is defined as a linear combination of the reflectances of the adaxial and abaxial sides of the leaves:
where ρI,ab is the reflectance of the abaxial side of the leaf and ρI,ad is the reflectance of the adaxial side.
Both reflectances are determined at the same wavelength: 420 nm for the blue, 520 nm for the green, and 620 nm for the red part of the visible spectrum.
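The NDAI formula itself is not reproduced in this section, so the sketch below ASSUMES a normalized-difference form, NDAI = (ρab − ρad)/(ρab + ρad), which is a common construction for normalized indices. It averages the reflectance over the five measurement points per leaf side and evaluates the index at the sampled wavelength nearest to 420, 520, and 620 nm. The data are hypothetical.

```python
import numpy as np

# Band centres from the text (blue, green, red)
BANDS_NM = {"blue": 420, "green": 520, "red": 620}


def ndai(rho_ab, rho_ad):
    """ASSUMED normalized-difference form of the NDAI (not confirmed by the source)."""
    rho_ab = np.asarray(rho_ab, dtype=float)
    rho_ad = np.asarray(rho_ad, dtype=float)
    return (rho_ab - rho_ad) / (rho_ab + rho_ad)


def leaf_ndai(wavelengths, abaxial_points, adaxial_points):
    """abaxial_points / adaxial_points: arrays of shape (5, n_wavelengths),
    one row per measurement point. Returns the NDAI at each named band."""
    wavelengths = np.asarray(wavelengths)
    ab_mean = np.mean(abaxial_points, axis=0)   # average over the 5 points
    ad_mean = np.mean(adaxial_points, axis=0)
    out = {}
    for name, nm in BANDS_NM.items():
        i = int(np.argmin(np.abs(wavelengths - nm)))  # nearest sampled wavelength
        out[name] = float(ndai(ab_mean[i], ad_mean[i]))
    return out


# Hypothetical data: flat reflectances, abaxial side brighter than adaxial
wl = np.arange(390, 731, 10)
ab = np.full((5, wl.size), 0.30)
ad = np.full((5, wl.size), 0.20)
print(leaf_ndai(wl, ab, ad))
```

If the authors' definition is a different linear combination, only the `ndai` function needs to change; the averaging and band selection stay the same.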
2.7. Reducing the Amount of Spectral Characteristics Data
Latent variables (LV), principal components (PC), and the kernel variant of principal components (kPC) were used to reduce the amount of spectral data [19]. The kernel variant was applied with three kernel functions: simple, polynomial, and Gaussian. The kernel principal components were obtained with the software tools described by Wang [20].
The kernel PCA method can be summarized in the following steps:
- creating a kernel matrix K from the training sample {xi}:
- calculating the Gram matrix K′:
- calculating the vectors ai by dividing K by K′:
- calculating the kernel principal components yk(x):
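The steps above can be sketched with the standard kernel PCA formulation: build the kernel matrix, centre it in feature space, take its leading eigenvectors, scale them by 1/√λ, and project new data through the centred kernel. This is not necessarily identical to the toolbox of Wang [20]; in particular, treating the "simple" kernel as the linear kernel and the default kernel parameters are assumptions.

```python
import numpy as np


def kernel_matrix(X, Y, kind="simple", degree=2, coef0=1.0, sigma=1.0):
    """Kernel functions; 'simple' is ASSUMED to mean the linear kernel."""
    if kind == "simple":
        return X @ Y.T
    if kind == "polynomial":
        return (X @ Y.T + coef0) ** degree
    if kind == "gaussian":
        sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-np.maximum(sq, 0) / (2 * sigma**2))
    raise ValueError(f"unknown kernel: {kind}")


class KernelPCA:
    def __init__(self, n_components=2, **kernel_args):
        self.n_components = n_components
        self.kernel_args = kernel_args

    def fit(self, X):
        self.X_ = np.asarray(X, float)
        n = self.X_.shape[0]
        K = kernel_matrix(self.X_, self.X_, **self.kernel_args)
        # Centre the kernel matrix in feature space
        one = np.full((n, n), 1.0 / n)
        Kc = K - one @ K - K @ one + one @ K @ one
        eigvals, eigvecs = np.linalg.eigh(Kc)            # ascending order
        idx = np.argsort(eigvals)[::-1][: self.n_components]
        self.lambdas_ = eigvals[idx]
        # Scale coefficient vectors so the feature-space axes have unit norm
        self.alphas_ = eigvecs[:, idx] / np.sqrt(self.lambdas_)
        self.K_fit_ = K
        return self

    def transform(self, Z):
        Kz = kernel_matrix(np.asarray(Z, float), self.X_, **self.kernel_args)
        # Centre the test kernel consistently with the training kernel
        Kz_c = (Kz - Kz.mean(axis=1, keepdims=True)
                - self.K_fit_.mean(axis=0)[None, :] + self.K_fit_.mean())
        return Kz_c @ self.alphas_


# Hypothetical spectra: 25 leaves x 35 wavelengths
rng = np.random.default_rng(0)
spectra = rng.random((25, 35))
scores = KernelPCA(n_components=2, kind="polynomial", degree=2).fit(spectra).transform(spectra)
print(scores.shape)  # (25, 2)
```

With the linear kernel, the scores coincide (up to sign) with ordinary PCA scores, which is a convenient sanity check; the polynomial and Gaussian kernels are what allow nonlinear structure in the spectra to be captured.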
2.8. Correlation Analysis
Correlation analysis was used to determine the strength of the relationship between the NDAI spectral index and the physicochemical characteristics of mulberry leaves. The distribution of the data was checked with the Shapiro–Wilk, Kolmogorov–Smirnov, and Lilliefors tests.
As a criterion for evaluation, a correlation coefficient R was used. At R < 0.3, there is no or a very weak relationship between the data; at 0.3 < R < 0.5, the relationship is weak; at 0.5 < R < 0.7 the relationship is moderate; and for R > 0.7 the relationship is strong.
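The grading rule above can be written as a small helper. The sketch computes Pearson's R with NumPy and maps it to the verbal categories of this section; using |R| (so that negative correlations are graded the same way) and assigning exact boundary values to the higher category are assumptions, since the text gives only open intervals. The NDAI and pH values are hypothetical.

```python
import numpy as np


def correlation_strength(r: float) -> str:
    """Verbal strength of a correlation using the thresholds of Section 2.8.
    ASSUMPTION: |R| is graded, and boundary values go to the higher class."""
    a = abs(r)
    if a < 0.3:
        return "none or very weak"
    if a < 0.5:
        return "weak"
    if a < 0.7:
        return "moderate"
    return "strong"


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical NDAI values (420 nm) and pH values for five leaves
ndai_420 = [0.10, 0.12, 0.15, 0.18, 0.21]
ph = [5.9, 6.0, 6.2, 6.3, 6.5]
r = pearson_r(ndai_420, ph)
print(f"R = {r:.2f} ({correlation_strength(r)})")
```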
2.9. Classification Methods Used
The Naïve Bayes classifier was used as a reference [21,22]. It is one of the classic machine-learning algorithms and is based on Bayes' theorem for determining the posterior probability of an event. Owing to the "naïve" assumption that the attributes are conditionally independent given the class, the classifier copes effectively with examples described by many attributes, i.e., with the so-called "curse of dimensionality". Bayes' theorem states:
$$P(y = c \mid x) = \frac{P(x \mid y = c)\, P(y = c)}{P(x)},$$
where P(y = c | x) is the probability that object x belongs to class c (the posterior probability of the class); P(x | y = c) is the probability of observing object x among the objects of class c (the likelihood); P(y = c) is the unconditional probability of class c (the prior probability of the class); and P(x) is the unconditional probability of object x.
The purpose of classification is to determine the class to which object x belongs. This is done by evaluating P(y = c | x) for all classes and selecting the class with the maximum posterior probability:
$$\hat{c} = \arg\max_{c} P(y = c \mid x).$$
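A minimal Naïve Bayes sketch follows. The section does not state how the likelihood P(x | y = c) was modelled, so per-class, per-feature Gaussian likelihoods are an ASSUMPTION here. Conditional independence turns log P(x | y = c) into a sum over features, and dividing by P(x) amounts to normalising the posteriors to sum to 1. The two-dimensional data are hypothetical stand-ins for reduced spectral scores.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes with per-class, per-feature Gaussian likelihoods (an assumption)."""

    def fit(self, X, y, var_floor=1e-9):
        X = np.asarray(X, float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_floor
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])  # log P(y = c)
        return self

    def _log_joint(self, X):
        X = np.asarray(X, float)[:, None, :]       # shape (n_samples, 1, n_features)
        # log P(x | y = c): a sum over features because of conditional independence
        ll = -0.5 * (np.log(2 * np.pi * self.vars_) + (X - self.means_) ** 2 / self.vars_)
        return ll.sum(axis=2) + self.log_priors_   # log P(x | c) + log P(c)

    def predict_proba(self, X):
        lj = self._log_joint(X)
        lj -= lj.max(axis=1, keepdims=True)         # numerical stability
        p = np.exp(lj)
        return p / p.sum(axis=1, keepdims=True)     # normalising plays the role of P(x)

    def predict(self, X):
        return self.classes_[np.argmax(self._log_joint(X), axis=1)]


# Hypothetical reduced features for 25 LP and 25 P leaves
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(25, 2)), rng.normal(4.0, 1.0, size=(25, 2))])
y = np.array(["LP"] * 25 + ["P"] * 25)
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[0.0, 0.0], [4.0, 4.0]]))
```

Working in log space avoids underflow when many features are multiplied together, which is exactly the many-attribute setting the classifier is chosen for.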
The boundary values separating polluted and less polluted zones on the basis of the mulberry leaf characteristics were determined by discriminant analysis with a linear separating function (LDA) [23]. LDA is suitable for datasets with well-defined clusters and low within-class variance. In general, the linear separating function is:
where δk is the separating function; µk is the mean vector; x denotes the observations; and Σ−1 is the inverse of the covariance matrix. For practical purposes, it is convenient to present the separation function as:
where K is a constant; L is the vector of linear coefficients; and v = [x; y] is the data vector (matrix) of x and y.
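For two classes, the separation function can be brought to the form K + Lᵀv = 0. The sketch below ASSUMES the textbook LDA discriminant δk(x) = xᵀΣ⁻¹μk − ½μkᵀΣ⁻¹μk + log πk with a pooled covariance Σ and priors πk taken from the class sizes; subtracting the two discriminants gives L = Σ⁻¹(μ1 − μ2) and K = −½(μ1 + μ2)ᵀΣ⁻¹(μ1 − μ2) + log(π1/π2). The data are hypothetical two-dimensional scores (at least two features are needed for the covariance matrix).

```python
import numpy as np


def fit_two_class_lda(X1, X2):
    """Return (K, L) such that K + v @ L > 0 assigns v to class 1 and < 0 to class 2."""
    X1 = np.asarray(X1, float)
    X2 = np.asarray(X2, float)
    n1, n2 = len(X1), len(X2)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-class covariance, shared by both classes in LDA
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    L = np.linalg.solve(S, mu1 - mu2)                    # Sigma^{-1} (mu1 - mu2)
    K = -0.5 * (mu1 + mu2) @ L + np.log(n1 / n2)         # priors from class sizes
    return K, L


def classify(K, L, V):
    """Apply the separation function K + L^T v to each row v of V."""
    return np.where(K + np.asarray(V, float) @ L > 0, 1, 2)


# Hypothetical reduced features: class 1 = LP, class 2 = P
rng = np.random.default_rng(1)
X_lp = rng.normal(0.0, 1.0, size=(25, 2))
X_p = rng.normal(4.0, 1.0, size=(25, 2))
K, L = fit_two_class_lda(X_lp, X_p)
print(f"K = {K:.3f}, L = {np.round(L, 3)}")
print(classify(K, L, [[0.0, 0.0], [4.0, 4.0]]))
```

The sign of K + Lᵀv decides the class, so the fitted pair (K, L) is all that is needed to reuse the boundary on new leaves.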
When classifiers are used, one of the most important parts of the work is choosing an appropriate measure to assess classification performance. The performance of the classifiers used here is evaluated by the general classification error:
where yik is the number of class i samples assigned by the classifier to class k; yii is the number of correctly recognized samples; k = 1, …, n; the error expresses the number of samples incorrectly assigned to class i relative to the total number of samples; and n is the number of classes. All data were processed at a significance level of α = 0.05.
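Because the error formula is not reproduced here, the sketch below ASSUMES the usual reading of these definitions: the general classification error is the number of off-diagonal entries yik (i ≠ k) divided by the total number of samples, i.e., one minus the sum of yii over the total. The predicted labels are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, classes):
    """confusion[i, k] = number of class-i samples assigned to class k (y_ik)."""
    index = {c: i for i, c in enumerate(classes)}
    C = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[index[t], index[p]] += 1
    return C


def total_classification_error(confusion) -> float:
    """ASSUMED form: misclassified samples / all samples, in percent."""
    C = np.asarray(confusion, dtype=float)
    total = C.sum()
    correct = np.trace(C)                 # sum of y_ii
    return 100.0 * (total - correct) / total


# Hypothetical: 25 LP and 25 P leaves, one P leaf misclassified as LP
y_true = ["LP"] * 25 + ["P"] * 25
y_pred = ["LP"] * 25 + ["LP"] + ["P"] * 24
C = confusion_matrix(y_true, y_pred, ["LP", "P"])
print(C)
print(total_classification_error(C))   # 2.0
```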
3. Results and Discussion
Using mulberry leaf data to determine the degree of pollution of the habitat area effectively requires methods that allow rapid and simple classification while giving satisfactory accuracy by generally accepted standards.
The results are presented in three parts. First, technological measurements of the mulberry leaves were made, including chromatographic and physicochemical analyses. Second, data from the analysis of spectral characteristics in the visible and near-infrared ranges are presented, both used directly and after data reduction and classification. Finally, the results are discussed and compared with those reported by other authors.
The measured environmental parameters in the mulberry habitats are shown in Table 2. In the high-pollution areas, the parameter values were significantly higher than in the low-pollution zones.
Table 3 presents the results of planar chromatography of mulberry leaves from polluted (P) and less polluted (LP) areas. For leaves from the polluted areas, the values of the individual parameters are significantly lower than for leaves from the less polluted areas. The coefficient of variation (CV = SD/mean) is below 30%.
Table 4 shows the physicochemical parameters of mulberry leaves from polluted (P) and less polluted (LP) areas. Compared with leaves from the less polluted areas, leaves from the polluted areas have higher values of active acidity (pH) and redox potential and lower values of electrical conductivity and total dissolved solids. The coefficient of variation (CV = SD/mean) is again below 30%. As in the previous case, the CV is higher for mulberry leaves from the polluted areas than for those from the less polluted areas.
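The CV check can be sketched as follows. Using the sample standard deviation (n − 1) and dividing by the absolute mean are assumptions; the absolute value matters for indicators such as the redox potential, which can take negative values in mV. The readings are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values) -> float:
    """CV = SD / |mean|, in percent. ASSUMPTION: sample SD (n - 1)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * stdev(values) / abs(m)


# Hypothetical triplicate pH readings of one leaf extract
readings = [6.1, 6.3, 6.2]
cv = coefficient_of_variation(readings)
print(f"CV = {cv:.2f}% ({'below' if cv < 30 else 'not below'} the 30% threshold)")
```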
Figure 1 shows the averaged VIS spectral characteristics for the adaxial and abaxial sides of mulberry leaves. On the adaxial side, the spectral characteristics of leaves from the polluted and less polluted areas are separated, with overlap observed only at 490–510 nm. On the abaxial side, the spectral characteristics overlap strongly; a visible separation appears only in the 380–500 nm range.
Figure 2 shows the averaged NIR spectral characteristics for the adaxial and abaxial sides of mulberry leaves. The characteristics overlap strongly on both sides. Separation is observed in the 820–860 nm and 880–950 nm ranges; in the latter range, it is more pronounced on the abaxial side of the leaves.
Figure 3 presents the NDAI values obtained from the spectral characteristics measured on the adaxial and abaxial sides of the leaf blade. The mean values of the spectral indices differ between mulberry leaves from the polluted and less polluted areas, but their standard deviations overlap, which indicates that these indices cannot separate the two groups in every measurement case. These results show that the direct use of spectral characteristic data is not appropriate for distinguishing between mulberry leaves from polluted and less polluted areas.
The correlation between the NDAI spectral index and the physicochemical characteristics of mulberry leaves was then evaluated. The normality tests gave p = 0.07–0.09 (df = 18–74); since p > 0.05, the data can be assumed to have a distribution close to normal.
Figure 4 shows the correlations between the analyzed values. At λ = 420 nm (blue part of the spectrum), a strong correlation (R > 0.7) was observed between the NDAI index and anthocyanin, pH, EC, and TDS. At λ = 520 nm (green part of the spectrum), a strong correlation (R > 0.7) was observed between the NDAI index and carotene, xanthophyll, chlorophyll a, and pH. At λ = 620 nm (red part of the spectrum), a strong correlation (R > 0.7) was observed between the NDAI index and carotene, xanthophyll, chlorophyll a, and pH.
The strong relationship between the NDAI spectral index and chlorophyll is attributable to the fact that chlorophylls absorb light most strongly in the blue and red parts of the spectrum. Conversely, they absorb green and near-green light poorly and reflect it, which gives chlorophyll-containing tissues their green color [24].
The possibility of distinguishing mulberry leaves from polluted and less polluted areas was examined by applying data-reduction methods to the spectral characteristics of the leaves in the visible and near-infrared ranges.
Figure 5 shows an example of the operation of the Naïve Bayes classifier, using spectral data from the adaxial side of the mulberry leaf reduced by the kernel principal components method.
Table 5 shows the common classification error obtained with the Naïve Bayes classifier. With latent variables and the linear variant of principal components, the total classification error is excessively large, exceeding 45%. This is expected, since these two methods produce reduced values that are close in nature to the original spectral characteristics.
Significantly lower error values were obtained with the kernel variant of principal components. The only exception is the Gaussian kernel applied to spectral characteristics in the visible range, which yields a common classification error above 40%.
For the next stages of the work, the kernel variant of principal components with the simple and polynomial kernel functions was selected.
Figure 6 shows the overall results of the linear discriminant classifier. With the polynomial kernel, the classifier produces worse results than with the simple kernel: the data overlap, which increases the classification error and hence reduces the accuracy of distinguishing polluted from less polluted areas on the basis of mulberry leaf data.
Table 6 shows the common classification error of the linear discriminant classifier combined with the two selected methods for reducing the amount of spectral data of mulberry leaves. Spectral data from the adaxial and abaxial sides of the leaves, reduced with the simple and polynomial variants of kPCA, were compared.
The results show that the lowest common classification error is obtained by combining the linear discriminant classifier with kernel principal components using the "simple" kernel function. For this combination, separation functions were derived for the VIS and NIR spectral characteristics reduced by this method. The separation functions are:
The results obtained indicate that the direct use of the spectral characteristics of the adaxial and abaxial sides of the leaf blade to passively determine the degree of air pollution in the mulberry habitat is not appropriate. In contrast to the results reported for linden (Tilia) leaves by Zadeh et al. [2], spectral indices obtained from the adaxial and abaxial spectral characteristics cannot be used directly in mulberry analysis.
The data obtained corroborate those reported by Sun et al. [11], who used spectral characteristics to predict pesticide content in mulberry leaves. Accurate prediction, with an accuracy of 87%, requires a more sophisticated method of analysis, such as support vector regression. In the present work, leaves collected from polluted and less polluted areas were separated after applying the kernel variant of principal components.
Bhosle et al. [24] reported the prediction of water stress in mulberry caused by various factors, including air pollution. Their method, based on the REP spectral index, shows a predictive power of 93%. An accuracy similar to that obtained in the present work can be achieved using the averages of these spectral indices. Based on the results of other authors and those obtained here, complex methods of analysis are recommended for evaluating changes in mulberry leaves depending on the pollution of the plant's habitat area.
4. Conclusions
An approach has been adapted to passively determine the degree of pollution by the spectral characteristics of mulberry leaves, based on extracted features and classification. A comparative analysis of the application of methods for reducing the amount of data of spectral characteristics was performed. This analysis found that the direct use of latent variables and the linear variant of principal components is not appropriate in distinguishing between polluted and less polluted areas in an urban environment, according to mulberry leaf data, because the common classification error in using them exceeds 40%.
In the study conducted to determine the degree of air pollution by the spectral characteristics of the mulberry leaf, it was found that this can be realized with a common error of 0–1%, using a linear discriminant classifier, in combination with the kernel variant of the principal components. Analytical dependencies of the separation functions were derived. They were shown to be effective in solving the problem of determining the degree of air pollution in the mulberry habitat.
A strong relationship between the NDAI spectral index and chlorophyll was found, because mulberry leaves absorb light most strongly in the blue and red parts of the spectrum.
The results obtained improve and complement those reported in the available literature. They can be used to refine the approaches and methods used so far to passively determine the degree of air pollution in the habitat area of the plant.
The proposed methods and software tools could be used to develop mobile applications and remote measurement methods for the rapid determination of the degree of environmental pollution from mulberry leaf data. Further research in this area could include data on color and spectral indices and their combinations. Organizing these into feature vectors and processing them with data-reduction methods would increase the accuracy of forecasting the state of the environment from mulberry leaf data.