Low-Cost Multispectral Sensor Array for Determining Leaf Nitrogen Status

Abstract: A crop’s health can be determined from its leaf nutrient status; in particular, the leaf nitrogen (N) level is a critical indicator that carries useful information for classifying the plant’s health. However, the existing non-invasive techniques are expensive and bulky. The aim of this study is to develop a low-cost, quick-read multi-spectral sensor array to predict the N level in leaves non-invasively. The proposed sensor module has been developed using two reflectance-based multi-spectral sensors (visible and near-infrared (NIR)). The device can capture reflectance data at 12 different wavelengths (six for each sensor). We conducted the experiment on canola leaves in a controlled greenhouse environment as well as in the field. In the greenhouse experiment, spectral data were collected from 87 leaves of 24 canola plants subjected to varying levels of N fertilization. Later, 42 canola cultivars were subjected to low and high nitrogen levels in the field experiment. The k-nearest neighbors (KNN) algorithm was employed to model the reflectance data. The trained model shows an average accuracy of 88.4% on the test set for the greenhouse experiment and 79.2% for the field experiment. Overall, the results indicate that the proposed cost-effective sensing system is viable for determining leaf nitrogen status.


Introduction
Nitrogen (N) is one of the most indispensable elements for plants owing to its vital role in major physiological processes [1]. Various enzymatic proteins that regulate plant growth processes are built largely from N [2]. During vegetative growth, plants take up soil N in mineral form and transport it to the leaves. Because the N supply in soils is limited, the optimal use of N fertilizer at the different growth stages of plants is a challenging issue in the existing agricultural system [3]. In addition, to achieve high yield, farmers sometimes over-apply N fertilizer, which can be leached below the root zone or lost in run-off. These losses may raise the nitrate concentration in water, which creates human health issues [4]. Furthermore, excess N fertilization results in adverse environmental effects such as soil denitrification, the greenhouse effect, and water pollution [5]. On the other hand, photosynthesis and crop yield are negatively affected by N deficiency; therefore, N content monitoring during the vegetative growth stage is needed for optimal fertilizer application [6].
Optimal N management can minimize N losses to the environment and also ensure high production. The conventional approach for N management is based on soil analysis, which is too slow to support timely fertilizer recommendations [6]. Several crop-based N status measurement techniques have been proposed over time. Most of the current techniques can be categorized as chemical analysis, optical spectroscopy, image-based analysis, and electrical impedance-based analysis. The widely used chemical-based methods are Kjeldahl digestion and Dumas combustion; these are destructive and labor-intensive processes [7]. Optical spectroscopy is widely used for non-destructive measurements of N status in leaves by utilizing Soil Plant Analysis Development (SPAD) (Spectrum Technologies, Inc., 3600 Thayer Court, Aurora, IL, USA), the spectrometer FieldSpec 3 (Analytical Spectral Devices, Boulder, CO, USA), GreenSeeker (Trimble, Sunnyvale, CA, USA), or QuickBird (Ball Aerospace and Technologies, Boulder, CO, USA). These are called proximal optical sensors, a form of remote sensing in which the sensor is placed either in contact with or close to the crop [4]. However, these spectral-based measurements have some drawbacks. For instance, the SPAD meter's functionality is centered on only one wavelength at 650 nm for chlorophyll determination and can become saturated with high N application. According to Xiong et al. (2015) [5], only a small part of leaf N is allocated to chlorophyll; most of it constitutes photosynthetic proteins. Moreover, the SPAD meter is also affected by the crop species and by environmental factors such as shading and light-dependent chloroplast movements. Although a correlation of R² = 0.86 has been reported using the FieldSpec 3 device [8], this device is expensive and not practical to carry around in the field.
Additionally, GreenSeeker is a costly device, and its estimate can become saturated with increasing areas of biomass/leaf. Technical developments in precision agriculture that facilitate non-invasive measurements, such as image-based measurement, are becoming popular among farmers. Image capture can be categorized into three platforms: satellites, airplanes, and unmanned aerial vehicles (UAVs) [9]. The imagery from the QuickBird satellite is expensive and affected by atmospheric interference [10]. Further, using aerial devices requires specific training, and weather conditions can degrade image quality. Red green blue (RGB) images are based on three optical bands that correspond to the three visual cones of the eye. Mercado et al. (2008) [11] used a digital RGB camera and showed a correlation between N in tomato and color images. However, in recent years, the hyperspectral camera has become more popular than RGB cameras for plant phenotyping due to its high throughput. This device can provide the nitrogen distribution in spatial coordinates [12]. However, hyperspectral imaging (HSI) is generally very expensive, which limits its use in the field, and its usage is limited to very specific applications. A large amount of spaceborne imaging spectrometer data has become accessible in recent years, and this potential supply of data could decrease the cost of research in the future [13][14][15]. Researchers are also trying to correlate the electrical properties of plants (electrical impedance) with macronutrients [16]. These electrical techniques are still in the initial phase of development and, in brief, suffer from electrode polarization effects. Figure 1 summarizes the overall techniques related to N estimation in crops.
Nitrogen 2020, 1
Various chemical forms of N are sensitive to the visible and near-infrared (NIR) regions. The reflectance around the 550 nm wavelength is significant for differentiating N treatments [17]. Isotopes of N have also been measured using spectral analysis [18]. Moreover, the total N in plants can be determined at the 671 nm and 780 nm wavelengths [19]. More precisely, for hyperspectral imaging, the 440, 473, 513, 542, 659, 718, 744, 865, 928, 965, 986, and 1015 nm wavelengths are effective for determining the N level in leaves, yielding a correlation of R² = 0.882 with N content [20].
Machine learning (ML) allows effective decision-making and rational behavior with little or no human interference in real-world scenarios. ML provides a comprehensive and scalable platform not only for data-driven decision making but also for integrating specialist expertise into the method. According to the literature [21][22][23], machine learning approaches can help to create a precise model for estimating the leaf N status. Probabilistic machine learning models, such as Gaussian and Buffet processes [24,25], are able to reduce sensor noise when information fusion is used. On the other hand, since multi-spectral data are high-dimensional [26], non-linear methods such as support vector machine, k-nearest neighbor, neural networks, and decision tree have been regarded as acceptable methods with the ability to address overfitting [6].
In the present study, a multi-spectral sensor is proposed that can be used to detect leaf N levels. The proposed method is applied to and tested on canola leaves. The primary objectives of this study are (a) to develop a low-cost sensor, (b) to validate its portability for in-field measurement, and (c) to build an accurate machine learning model for N level identification. To achieve these objectives, we tested our model on a single canola variety in a greenhouse and on several canola varieties grown in the field with variable nitrogen levels.

Designed Hardware for N Sensing System
The hardware prototype comprised two sensors, referred to as Sensor1 and Sensor2. Sensor1 (SparkFun number SEN-14347, SparkFun Electronics, Niwot, CO, USA) was a visible multi-spectral sensor, and Sensor2 (SparkFun number SEN-14351, SparkFun Electronics, Niwot, CO, USA) was an NIR multi-spectral sensor. Other essential components were a Qwiic mux breakout board (SparkFun number BOB-14685, SparkFun Electronics, Niwot, CO, USA) and a Raspberry Pi version 3 model B (Raspberry Pi Foundation, Cambridge, UK). Sensor1 (Figure 2a,b) was a six-channel multi-spectral sensor in the visible spectrum. With a full-width half-max (FWHM) of 40 nm, the spectral bands ranged from around 430 to 670 nm. It had a built-in aperture that provided a controlled light entrance. The I2C register set was used to store the spectral data. The six visible channels were 450 nm (channel V), 500 nm (channel B), 550 nm (channel G), 570 nm (channel Y), 600 nm (channel O), and 650 nm (channel R), each with 40 nm FWHM, as shown in Figure 2c. Moreover, it had a 16-bit ADC (analog-to-digital converter) and operated at a low voltage of around 2.7 to 3.6 V over the I2C interface. The sensor's field of view was ±20°. Each channel was tested with GAIN = 16× at ambient temperature (25 °C), and the calibration and measurements were made using diffused light. The measurement unit of each channel was µW/cm² with an accuracy of 12%. The sensor-on-chip had a built-in light source (5700 K white LED, L130-5780HE1400001, Lumileds, CA, USA) with a color rendering index (CRI) of 80. The energy at each channel was calculated with a ±40 nm bandwidth around the center wavelength.
Sensor2 was a digital six-channel spectrometer in the NIR light region (Figure 3a,b). The six independent optical filters had a spectral response defined in the NIR wavelengths, from approximately 600 to 870 nm, with an FWHM of 20 nm. Like Sensor1, it integrated Gaussian filters into CMOS (complementary metal-oxide-semiconductor) silicon via nano-optic deposited interference filter technology. It provided a six-channel spectrometry solution at wavelengths of 610 nm (channel R), 680 nm (channel S), 730 nm (channel T), 760 nm (channel U), 810 nm (channel V), and 860 nm (channel W), each with 20 nm FWHM (Figure 3c). The onboard light source was a 2700 K warm LED (L130-2790001400001, Lumileds, CA, USA) with a CRI of 90. The energy at each channel was calculated with a ±33 nm bandwidth around the center wavelength. Configuration details shared with Sensor1 are not repeated here.
A Raspberry Pi 3 was used for controlling the sensors. The main processing unit consisted of a 1.2 GHz ARM processor and 1 GB of LPDDR2 main memory [30]. This device is used in various applications such as image processing [31], IoT systems [32], and automatic control, for example in smart irrigation [33].
The two sensors had the same I2C address. A multiplexer (mux) was therefore used so that both could communicate with the Raspberry Pi on the same bus. The mux had eight configurable addresses of its own, which allowed up to 64 I2C buses. In the proposed setup, channel 1 and channel 2 were used. After the data were collected, the modeling and processing were performed on a computer using MATLAB 2018b. A summary of the hardware components mentioned above is given in Table 1. The overall setup, along with the physical 3D view of the system, is shown in Figure 4a-c.
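The address arithmetic behind the mux can be sketched in a few lines. This is an illustration only: it assumes an 8-channel mux that selects a downstream channel by a one-hot control byte and a base address of 0x70, neither of which is stated in the text, and it does not touch any hardware. The assignment of Sensor1 and Sensor2 to channels 1 and 2 is likewise an assumption.

```python
# Sketch only: assumes an 8-channel I2C mux with one-hot channel selection
# and a base address of 0x70 (both assumptions, not taken from the text).
MUX_BASE_ADDRESS = 0x70
CHANNELS_PER_MUX = 8


def mux_addresses(base=MUX_BASE_ADDRESS, count=8):
    """Return the configurable mux addresses (base .. base + count - 1)."""
    return [base + i for i in range(count)]


def control_byte(channel):
    """One-hot byte that enables a single downstream channel."""
    if not 0 <= channel < CHANNELS_PER_MUX:
        raise ValueError("channel must be in 0..7")
    return 1 << channel


# Eight mux addresses, each exposing eight channels.
total_buses = len(mux_addresses()) * CHANNELS_PER_MUX

# Hypothetical mapping of the two sensors to the two channels in use.
sensor_channels = {"Sensor1": 1, "Sensor2": 2}
select_bytes = {name: control_byte(ch) for name, ch in sensor_channels.items()}
```

Under these assumptions, the two sensors can keep their identical address because only one downstream channel is enabled at a time.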

Greenhouse Experimental Setup
Canola seeds were sown on 2 November 2018 in a controlled greenhouse environment at Agriculture and Agri-Food Canada (AAFC), Saskatoon. During the first 3 weeks, all 24 pots were fertilized with slow-release 15-30-15 (15% nitrogen, 30% phosphorus, and 15% potassium) fertilizer at a rate of 4 g/L to ensure uniform establishment. Later, the 24 pots were divided into four N concentrations, each containing six replicate plants (4 N levels × 6 pots = 24 plants, Figure 5). From then on, only nitrogen fertilizer 30-0-0 (30% nitrogen, 0% phosphorus, and 0% potassium) was applied three times a week at four concentration levels: 0, 6, 12, and 20 g/L. Two panels of 45 W grow lights were used, which were illuminated 16 h a day. The day temperature was kept at 25 °C and the night temperature at 20 °C, with a relative humidity of ~45%.

Field Experiment Setup
This experiment was conducted at Lewellyn Farm, Saskatoon, Saskatchewan, Canada, during the summer of 2019 (Figure 6a). A total of 42 canola varieties were subjected to two levels of N treatment (low N and high N), giving 336 plots (42 varieties × 4 replicates × 2 N levels). From each plot, three leaves (one per plant) were taken from three plants in the central and south side of the plot. The total number of leaf samples was therefore 1008 (3 × 336; Figure 6b,c).

Data Collection and Modeling
The proposed prototype measures reflectance at different wavelengths. The device was calibrated before each data collection by taking reflectance data from a white surface; regular white mirror paper [34] served as the calibration reference for all measurements. Reflectance data at 12 wavelengths (450, 500, 550, 570, 600, and 650 nm from Sensor1, and 610, 680, 730, 760, 810, and 860 nm from Sensor2) were collected for each leaf. In both experiments, reflectance data collection started at week seven after sowing. For the greenhouse experiment, 87 fresh leaves were selected from the 24 plants, while 1008 leaves from the 336 plots were taken for the field experiment. Figure 6b,c shows the proposed setup being used for collecting spectral data.
All data were collected from three different positions around the midrib of a leaf by scanning 15 times. For data processing, the K-means clustering technique, which is widely used for pattern recognition and plant stress phenotyping [35], was applied. K-means clustering was performed on the 15 scans with K = 3. From the three most different clusters, three centroids were selected, corresponding to the reflectance from the three different positions. The reflectance data collected at 12 wavelengths from the leaves were used as features to classify N levels. The purpose of the classification was to investigate whether a model can be built from the sensor reflectance to determine the four categories of response to the four N treatments in the greenhouse experiment and the two categories in the field experiment.
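The scan-reduction step can be illustrated with a small sketch. It assumes that calibration divides each raw channel reading by the white-reference reading, uses plain Lloyd's K-means with a deterministic farthest-first initialization, and runs on synthetic two-channel data; the study itself used MATLAB and 12 channels, so every name and value below is hypothetical.

```python
import math


def calibrate(raw, white):
    """Divide each raw channel reading by the white-reference reading (assumed)."""
    return [r / w for r, w in zip(raw, white)]


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k=3, max_iter=100):
    """Plain Lloyd's algorithm with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        far = max(points, key=lambda p: min(_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: _dist(p, centroids[i]))
            clusters[idx].append(p)
        new = [_mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


# Synthetic data: 15 scans, five per leaf position, two channels for brevity.
white = [1000.0, 2000.0]
offsets = [-2, -1, 0, 1, 2]
raw_scans = (
    [[200.0 + d, 400.0 + 2 * d] for d in offsets]
    + [[500.0 + d, 1000.0 + 2 * d] for d in offsets]
    + [[800.0 + d, 1600.0 + 2 * d] for d in offsets]
)
scans = [calibrate(s, white) for s in raw_scans]
centroids = sorted(kmeans(scans, k=3))  # one centroid per leaf position
```

Each leaf is thus reduced from 15 scans to three representative spectra, one per position, which is the shape of data described for the classification step.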
Feature selection methods can be categorized into filter, wrapper, and embedded methods [36]. Filter methods select subsets of variables without any information about the classifier. By contrast, wrapper and embedded methods use a fitting process, such as the machine learning procedure itself, to find significant features. Specifically, wrapper methods evaluate different feature combinations with a predictive model, whereas embedded methods learn, during training, which set of features gives the model the best accuracy. In this paper, we used an embedded method in which the features are weighted by the particle swarm optimization (PSO) algorithm during learning. A similar approach was taken by Pourmohammadali et al. [37], who utilized the genetic algorithm.
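A minimal sketch of PSO-based feature weighting is given below. The text does not specify the fitness function or the PSO parameters, so the sketch assumes leave-one-out 1-NN accuracy under a weighted squared Euclidean distance as the fitness, weights clamped to [0, 1], and common textbook coefficients (inertia 0.7, acceleration 1.5); the toy data are hypothetical.

```python
import random


def loo_accuracy(X, y, w):
    """Leave-one-out 1-NN accuracy under a weighted squared Euclidean distance."""
    hits = 0
    for i, xi in enumerate(X):
        best_j = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum(wk * (a - b) ** 2 for wk, a, b in zip(w, xi, X[j])),
        )
        hits += y[best_j] == y[i]
    return hits / len(X)


def pso_weights(X, y, n_particles=10, n_iter=20, seed=0):
    """Search feature weights in [0, 1] that maximize loo_accuracy."""
    rng = random.Random(seed)
    dim = len(X[0])
    # The first particle is the unweighted baseline, so the result never scores below it.
    pos = [[1.0] * dim] + [
        [rng.random() for _ in range(dim)] for _ in range(n_particles - 1)
    ]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [loo_accuracy(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (
                    0.7 * vel[i][d]
                    + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                    + 1.5 * rng.random() * (gbest[d] - pos[i][d])
                )
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = loo_accuracy(X, y, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Toy data: feature 0 separates the classes, feature 1 is misleading.
X = [[0.0, 0.0], [0.1, 5.0], [0.2, 9.0], [1.0, 0.2], [1.1, 5.1], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
baseline = loo_accuracy(X, y, [1.0, 1.0])
best_w, best_fit = pso_weights(X, y)
```

The learned weights can then be read as feature importances, which is how the wavelength ranking is used later in the paper.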
In the next step, normalization was applied to bring the features onto a comparable scale, since they were not all in the same numeric range. Normalization helps the classifier work more effectively because features with different ranges then contribute comparably. In this paper, a modified standard score (z-score) was used for normalization [38]. The median was used in place of the mean to make the normalization robust against outliers [39].
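The modified z-score can be sketched as follows. The text states only that the median replaces the mean; this sketch assumes the scale term remains the sample standard deviation, which may differ from the exact formulation in [38,39].

```python
import statistics


def median_zscore(column):
    """Standard score with the median in place of the mean.

    The scale is the sample standard deviation (an assumption).
    """
    center = statistics.median(column)
    scale = statistics.stdev(column)
    if scale == 0:
        return [0.0 for _ in column]
    return [(v - center) / scale for v in column]


def normalize(rows):
    """Normalize each feature (column) of a row-major table independently."""
    columns = [median_zscore(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*columns)]


# Hypothetical two-feature table with very different ranges.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = normalize(rows)
```

After this step both columns span the same range, so neither dominates a distance-based classifier such as KNN.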
The k-nearest neighbor (KNN) algorithm is a lazy learning classification method that works on the basis of the closest training examples in feature space [40]. It is among the simplest techniques to apply when there is no prior knowledge about the distribution of the dataset. In this research, K was set to 1, a value found by trial and error. KNN has been applied in many fields, such as leaf N and water content classification [38], plant leaf classification [41], and gene expression analysis [42]. Several other machine learning approaches, including decision tree (DT), support vector machine (SVM), and ensemble bagged tree, were also used for comparison. SVM relies on a set of hyperplanes in a high- or infinite-dimensional space. The radial basis function (RBF) was used as the SVM kernel, and the soft-margin and RBF kernel parameters were optimized using the Wang [43] method. SVM has been widely used in several research domains, such as plant disease detection [44] and optical remote sensing [45]. DT divides the dataset into small subsets based on the entropy of the features and incrementally creates a tree with leaf nodes. In this research, the C4.5 decision tree [46] was used. Moreover, an ensemble bagged decision tree was also implemented, in which a bag of decision trees uses an ensemble technique for aggregating results [47].
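With K = 1, prediction reduces to returning the label of the single closest training sample. A minimal sketch follows, assuming Euclidean distance (the distance metric is not stated in the text) and using hypothetical two-feature samples.

```python
import math


def predict_1nn(train_X, train_y, sample):
    """Return the label of the training sample closest to `sample` (Euclidean)."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], sample))
    return train_y[nearest]


# Hypothetical normalized reflectance features and labels.
train_X = [[0.10, 0.80], [0.12, 0.78], [0.40, 0.30], [0.42, 0.33]]
train_y = ["low N", "low N", "high N", "high N"]
label = predict_1nn(train_X, train_y, [0.41, 0.31])
```

Because there is no training phase beyond storing the samples, the quality of the prediction depends entirely on the feature scaling and weighting applied beforehand.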

Evaluation Metrics
The final classification model needs a multiclass evaluation. The systematic analysis of classification performance introduced by Sokolova and Lapalme [48] is used, and the validation metrics, including accuracy, sensitivity/recall, specificity, precision, and F1-score, are utilized. The definitions of these metrics are given in the Supplementary Materials. For evaluation, the data are divided into training (75%) and test (25%) sets using the hold-out method. After training, the designed model is applied to the test set. Figure 7 shows the process flow diagram of this study, from the calibration phase to the training and testing phase.
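The class-wise metrics can be computed from one-vs-rest confusion counts. The sketch below uses the standard confusion-matrix definitions, which are assumed to match those in the Supplementary Materials; the labels and predictions are hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred, label):
    """Accuracy, recall, specificity, precision and F1 for one class against the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero for empty denominators.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical test-set labels for four N categories.
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 2, 2, 2, 3, 3, 4, 4]
category2 = one_vs_rest_metrics(y_true, y_pred, label=2)
```

In this toy case one Category 1 leaf is predicted as Category 2, which lowers the precision of Category 2 while leaving its recall at 1.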

Result and Discussion
First, the reflectance properties of leaves under the different N fertilization levels in the greenhouse experiment were investigated. For this purpose, the average reflectance from each of the four N treatments at the 12 wavelengths was plotted (Figures 8 and 9). These figures show that the average reflectance values of the 20 g/L treatment are greater than those of the other treatments. Additionally, the reflectance curves are distinct with respect to the N treatments, which suggests that the four N levels can be classified.
Figure 8. Average reflectance versus wavelength (Sensor1) of leaves subjected to four nitrogen fertilization regimes under the visible range. The red line indicates the reflectance from the 0 g/L plant, the black line represents 6 g/L, blue represents 12 g/L, and green represents 20 g/L nitrogen rates. All the reflectance is scaled to the 20 g/L reflectance at 550 nm.
Figure 9. Average reflectance versus wavelength (Sensor2) of canola leaves subjected to four nitrogen fertilization regimes under the NIR range. The red line indicates the reflectance from the 0 g/L plant, the black line represents 6 g/L, blue represents 12 g/L, and green represents 20 g/L nitrogen rates. All the reflectance is scaled to the 20 g/L reflectance at 610 nm.
For classification, the data set was divided into two parts, a training set (75%) and a testing set (25%), and the model was trained and tested five times with the data reshuffled each time. The mean and standard deviation of the evaluation metrics are reported in Table 2, which gives the class-wise (for each category) and overall results on the test set for the greenhouse-grown canola plants. From the classification results shown in Table 2, Category 4 has the best accuracy and Category 2 the lowest. One possible reason is that some samples from Category 2 have reflectance properties nearly the same as those of Category 1. Later, a binary (two-class) classification was performed in which Category 1 and Category 2 were combined into one class and Category 3 and Category 4 into the other. The results of this binary classification are shown in Table 3. For the greenhouse data, the accuracy, precision, recall, specificity, and F1-score all improved compared with the four-class classification. Moreover, the standard deviation decreased, which means that a more consistent result was obtained.
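The repeated hold-out procedure and the merging of categories can be sketched as follows. The sketch assumes a 1-NN classifier with Euclidean distance, a fixed random seed, and synthetic one-feature data; it illustrates the protocol only and does not reproduce the reported numbers.

```python
import math
import random
import statistics


def holdout_accuracy(X, y, rng, test_fraction=0.25):
    """Shuffle, hold out a test fraction, and score a 1-NN classifier on it."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = max(1, round(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    hits = 0
    for i in test:
        nearest = min(train, key=lambda j: math.dist(X[j], X[i]))
        hits += y[nearest] == y[i]
    return hits / n_test


def repeated_holdout(X, y, runs=5, seed=0):
    """Mean and standard deviation of test accuracy over reshuffled splits."""
    rng = random.Random(seed)
    scores = [holdout_accuracy(X, y, rng) for _ in range(runs)]
    return statistics.mean(scores), statistics.stdev(scores)


# Categories 1-2 and 3-4 are merged for the binary classification.
MERGE = {1: "low", 2: "low", 3: "high", 4: "high"}

# Synthetic data: four categories, four samples each, one feature.
base = {1: 0.0, 2: 1.0, 3: 10.0, 4: 11.0}
X = [[base[c] + 0.1 * k] for c in (1, 2, 3, 4) for k in range(4)]
y4 = [c for c in (1, 2, 3, 4) for _ in range(4)]
y2 = [MERGE[c] for c in y4]

four_class_mean, four_class_std = repeated_holdout(X, y4)
binary_mean, binary_std = repeated_holdout(X, y2)
```

In this synthetic case the two merged groups are far apart, so the binary task is easy regardless of the split; this mirrors, in exaggerated form, why merging neighboring categories can raise accuracy and reduce the spread across runs.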
The field experiment results are also summarized in Table 3, which shows the results on the 25% test set for the two categories (low N and high N). The average test accuracy is 79.2%, which is lower than that achieved in the greenhouse experiment. Although the field experiment involves only two nitrogen categories, the combined data set has high variance because it consists of 42 canola varieties. In some cases, samples from different categories produce similar responses, so the model may confuse the two nitrogen levels.
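For reference, the two-class metrics reported in Table 3 (accuracy, precision, recall, specificity, and F1-score) follow from the confusion counts using their standard definitions. The sketch below is a minimal illustration with made-up labels; the function name `binary_metrics` and the choice of "high" as the positive class are assumptions, not details taken from the study.

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall, specificity and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up labels for illustration only (not results from the study).
y_true = ["high", "high", "high", "high", "high", "low", "low", "low"]
y_pred = ["high", "high", "high", "high", "low", "low", "low", "high"]
metrics = binary_metrics(y_true, y_pred, positive="high")
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case one high-N leaf is predicted as low and one low-N leaf as high, so specificity (2 of 3 low-N leaves correct) is lower than recall (4 of 5 high-N leaves correct).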
In this section, other state-of-the-art machine learning algorithms, namely decision tree, SVM, and ensemble bagged tree, are compared with the KNN algorithm. The same procedure is applied to each algorithm, and the testing accuracy is shown in Table 4. KNN shows the best accuracy (underlined), followed by the ensemble bagged tree, SVM, and decision tree. KNN is therefore the most suitable of the tested algorithms for modeling this type of data. This is not surprising because several previous comparative studies have reported similar results. For example, in a comparative study of SVM and KNN for classifying acoustic signals, the KNN classifier performed better than SVM [49]. Additionally, KNN outperformed SVM on weather classification data [50]. Moreover, KNN achieved better accuracy than decision tree in the classification of Irish national forest inventory data [51]. In general, KNN can achieve higher accuracy when the number of training samples is much larger than the number of features [50]. However, no universal classifier is consistently better than the others at every task.
In this research, the reflectance at 12 different wavelengths was used to train the proposed method. In the feature weight optimization analysis described in the methodology, the features are ranked by the weights obtained from PSO. The whole process is run five times, and the weights from the five runs are shown as box plots in Figure 10. The importance of the features is ranked by the median weight across the runs. The most important feature is 450 nm, followed by 500, 860, 680, 570, 650, 600, 550, 760, 810, 730, and 610 nm in descending order.
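The ranking step can be sketched in a few lines: given one weight per feature from each run, take the median across runs and sort in descending order. The weights below are hypothetical values for four of the twelve wavelengths, chosen only to illustrate the procedure; they are not the PSO weights from the study, and `rank_by_median` is an illustrative name.

```python
import statistics


def rank_by_median(weights_per_run, feature_names):
    """Rank features by the median of their weights across runs, highest first."""
    medians = {
        name: statistics.median(run[i] for run in weights_per_run)
        for i, name in enumerate(feature_names)
    }
    ranking = sorted(medians, key=medians.get, reverse=True)
    return ranking, medians


# Hypothetical weights from five runs for four wavelengths (nm); not study data.
wavelengths_nm = [450, 500, 610, 860]
weights_per_run = [
    [0.90, 0.70, 0.10, 0.60],
    [0.80, 0.75, 0.20, 0.65],
    [0.95, 0.60, 0.05, 0.70],
    [0.85, 0.80, 0.15, 0.55],
    [0.70, 0.65, 0.30, 0.75],
]
ranking, medians = rank_by_median(weights_per_run, wavelengths_nm)
print("ranking (nm):", ranking)
print("median weights:", medians)
```

Using the median rather than the mean keeps a single atypical run from changing the order, which matches reading the ranking off the center line of each box plot.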
The proposed multi-spectral sensors, combined with a suitable machine learning algorithm, have been shown to be very cost-effective for determining leaf nitrogen status. This is the first time that a low-cost system has been designed for measuring leaf nitrogen level using the reflectance at 12 wavelengths.
The proposed method was verified in greenhouse and field conditions. One limitation of this research is that the actual nitrogen content of the leaves was not measured and correlated with the reflectance data; this will be the next phase of our project. Additionally, only four levels (greenhouse experiment) and two levels (field experiment) of nitrogen were considered, and the effects of increasing nitrogen levels need further investigation. Furthermore, N in leaf cells is found mainly in proteins and chlorophyll; however, only nitrogen-containing biochemical constituents sensitive in the visible and NIR spectral domains were used in this study. One reason for neglecting protein-related N forms is that they are most sensitive in the SWIR spectral domain [52,53] (1020, 1510, 1980, 2060, 2130, 2180, and 2300 nm, as confirmed by other research [54-56]), and detectors for those regions are too expensive and bulky to incorporate into the existing setup. Continued exploration of cheaper detector technology is therefore needed. In addition, applying the technique to other crop species at various growth stages has great potential for future research.

Conclusions
In this research, a low-cost and portable multi-spectral sensing system is proposed for classifying the N status of plants. The device captures the reflectance at 12 wavelengths ranging from 450 to 860 nm. The two major parts of the system are a visible sensor that captures the reflectance at six wavelengths from 450 to 650 nm and a NIR sensor that covers the other six wavelengths from 610 to 860 nm. To examine the performance of the proposed device, two separate experiments were conducted, one in a greenhouse and one in the field. The KNN algorithm was employed to model the collected data. The model achieved 88.4% and 79.2% accuracy for the greenhouse and field experiments, respectively. The results show that the proposed low-cost multi-spectral sensor array can classify leaf nitrogen level with reasonable accuracy. It is worth mentioning that the overall cost of the proposed device is USD 200, which is inexpensive compared with existing technologies. Moreover, the device could be further modified to measure other plant nutrients such as phosphorus (P) and potassium (K). Additionally, the accuracy of the device might be improved if more optical bands in the SWIR region were incorporated.
Supplementary Materials: The following are available online at http://www.mdpi.com/2504-3129/1/1/7/s1, Table S1: Validation metrics.
Funding: This work is financially supported by The Canada First Research Excellence Fund through the Global Institute for Food Security, University of Saskatchewan to K.A.W. and by Agriculture and Agri-Food Canada (AAFC) to R.S.