Classification of Organic and Conventional Vegetables Using Machine Learning: A Case Study of Brinjal, Chili and Tomato

Growing organic food is becoming a challenging task with increasing demand. Food fraud activity has increased considerably with the increase in population growth. Consumers cannot visually distinguish between conventional and organically grown food products. Spectroscopic methodologies are presented to identify chemicals in food, thereby identifying organic and conventional food. Such spectroscopic techniques are laboratory-based, take more time to produce an outcome, and are costlier. Thus, this research designed a portable, low-cost multispectral sensor system to discriminate between organic and conventional vegetables. The designed multispectral sensor system uses a wavelength range (410 nm–940 nm) that includes three bands, namely visible (VIS), ultraviolet (UV) and near-infrared (NIR) spectra, to enhance the accuracy of detection. Tomato, brinjal and green chili samples are employed for the experiment. The organic and conventional discrimination problem is formulated as a classification problem and solved through random forest (RF) and neural network (NN) models, which achieve 92% and 89% accuracy, respectively. A two-stage enhancement mechanism is proposed to improve accuracy. In the first stage, the fuzzy logic mechanism generates additional feature sets. Ant colony optimization (ACO) algorithm-based parameter tuning and feature selection are employed in the second stage to enhance accuracy further. This two-stage improvement mechanism results in 100% accuracy in discriminating between organic and conventional vegetable samples. The detected adulterant is displayed on a web page through an IoT-developed application module to be accessed from anywhere.


Introduction
Currently, organic farming accounts for only around 1-2% of the total agricultural land in India [1]. Organic farming is still in the growth stage and is developing well, and organic products are increasing in both domestic and international markets. However, developing and commercializing organic products remains challenging because of the lack of certification, limited market access, limited investment and low consumer awareness. Environmental and health concerns drive the demand for organic products, even though they are more expensive than conventional products. Limited production, limited supply and higher prices have encouraged fraudulent activities for profit. False labeling, spraying chemicals on products to make them look fresh and false marketing are common problems that undermine consumer confidence in the authenticity of organic products. As the demand for organic products grows, a system model that helps identify organic products is required. Spectroscopic measurements such as near-infrared (NIR), Fourier transform infrared (FT-IR), mid-infrared (MIR), terahertz and nuclear magnetic resonance (NMR) can be used to distinguish between organic and conventional food products. Spectroscopic measurements record light absorbance at different wavelengths, which helps determine the concentration of substances in a material.
Beer-Lambert's law defines a linear relationship between the absorbance of a substance and its molar absorption coefficient, concentration and optical path length:

$$\Omega = \varepsilon \chi l \qquad (1)$$

where Ω is the absorbance; ε is the molar absorptivity (M$^{-1}$ cm$^{-1}$); χ is the concentration of the substance; and l is the length of the sample. The amount of light reflected after absorption by organic and conventional vegetables is used to distinguish between the constituent elements present in conventional and organic vegetables.
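As a brief illustration of Equation (1), the following sketch evaluates Beer-Lambert's law in both directions. The function names and the numerical values are assumptions chosen for illustration only; they are not measurements from this study.

```python
def absorbance(epsilon, concentration, path_length):
    """Beer-Lambert law: absorbance = epsilon * concentration * path length."""
    return epsilon * concentration * path_length


def concentration_from_absorbance(absorb, epsilon, path_length):
    """Invert Beer-Lambert's law to recover the concentration."""
    if epsilon <= 0 or path_length <= 0:
        raise ValueError("epsilon and path_length must be positive")
    return absorb / (epsilon * path_length)


# Illustrative values only (assumed, not taken from the experiments).
eps = 1.5e4      # molar absorptivity, M^-1 cm^-1
conc = 2.0e-5    # concentration, M
length = 1.0     # optical path length, cm

omega = absorbance(eps, conc, length)
recovered = concentration_from_absorbance(omega, eps, length)
print(f"absorbance = {omega:.3f}, recovered concentration = {recovered:.2e} M")
```

Because the relationship is linear, doubling either the concentration or the path length doubles the absorbance, which is what allows a measured absorbance to be mapped back to a concentration.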
A few literature works have been reported to detect organic and conventional food items using spectroscopic methods.
NIR spectroscopic methods were used to evaluate the differences between conventional and organic apples [2]. The results show significant differences between conventional and organic apples in sugar-acid ratio, volume and color. According to the analysis of variance (ANOVA), the difference in outer appearance (volume and color) is highly significant (p < 0.0001), whereas the sugar-acid ratio differs only weakly (p = 0.04) between organic and conventional apple samples.
A review work focused on identifying organic and conventional vegetables and fruits. Electron paramagnetic resonance (EPR) spectrometry [3] was used for the consecutive measurement of nitrate and nitrite ions in vegetables and fruits. The quantities of nitrite and nitrate ions in 18 different vegetables and fruits were compared between samples homegrown with organic fertilizers and samples purchased from local markets. The highest concentration of nitrate ions for vegetables was 38.9 mg/kg for homegrown products and 451.6 mg/kg for local market products. The experimental analysis revealed a nitrite ion concentration of 0.45 mg/kg in homegrown beetroot and 1.54 mg/kg in beetroot purchased from the local market. The lowest nitrate concentration was found in potato (17.6 mg/kg homegrown and 40.2 mg/kg local market), and the nitrite ion concentration was 0 mg/kg for homegrown potato and 0.66 mg/kg for local market potato. Biosensors and chemometric analysis play a significant role in disease diagnosis [4,5].
A study [6] was conducted to determine the influence of agronomic practice on the concentration of antioxidant vitamins and phenolics in yellow plums. Organic yellow plums were grown on three different soils: soil covered with Trifolium, tilled soil and soil covered with natural meadow. Conventional plums were grown on tilled soil. Proximate analysis revealed that antioxidant vitamins and phenolic compounds reach their maximum concentration in organically grown plums.
In farming field trials [7], Chinese yam samples were used to discriminate between conventional and organic cultivation. Manganese (Mn), chromium (Cr), Se, Na, As, δD and δ15N are the crucial variables used to discriminate between the yam samples. Mn, Cr, Se, Na and As are essential to determine the presence of nutrients in Chinese yam. Stable isotopes were used to trace the toxic substances in the samples. Mass spectrometry was used to analyze the contaminants and chemical components of the yam samples. The random forest (RF) classifier achieved 97.3% classification accuracy for organic and conventional yam discrimination.
The use of dyeing agents on food items to make them look fresh and prolong the storage period was analyzed, as this practice harms individuals. ¹H NMR relaxometry [8] was deployed to determine the presence of Sudan dye, malachite green and copper sulfate on red chili, peas and okra. Homegrown vegetable samples were prepared by spraying them with the dyes and soaking them in different solvents. The results show that ¹H NMR relaxometry can quantify the presence of dyes down to 1 g/L (0.1%).

Table 1 summarizes the various spectroscopic methods for discriminating between organic and conventional vegetables and fruits. Both destructive and nondestructive methods are deployed to evaluate the compositional elements and perform the classification. Machine learning algorithms are deployed for the analysis of the spectral information. A maximum accuracy of 100% and a minimum accuracy of 85.9% are reported from the destructive spectroscopic analysis for the discrimination of conventional and organic vegetables and fruits.

Table 1. Spectroscopic analysis methods for organic and conventional fruits/vegetables discrimination.
A computer vision-based sensor system [10] was presented that uses spectroscopic image data to discriminate between conventional and organic apples. It involves five pattern recognition algorithms, namely support vector machine (SVM), k-nearest neighbor (K-NN), partial least-squares discriminant analysis (PLS-DA), kernel PLS-DA and locally weighted partial least-squares classifier (LW-PLSC). A maximum classification accuracy of 94% was obtained by the LW-PLSC algorithm. A subsequent work explored an NIR spectroscopy system for distinguishing between organic and conventional Gala apples [11]. The spectral data were preprocessed, baseline-corrected and normalized for classification by PLS-DA. The results show that a classification accuracy of 98% is achieved for discriminating the Gala apple samples.
To differentiate between conventional and organic tomatoes, proton nuclear magnetic resonance (¹H NMR), isotope ratio mass spectrometry (IRMS) and MIR spectroscopy were utilized [12]. The spectral data were processed with linear discriminant analysis (LDA), principal component analysis (PCA) and PLS-DA algorithms. The accuracy ranges of the individual spectroscopic methods are 91.3% to 100% for NMR and 75% to 91.7% for MIR. A validation accuracy of 100% was obtained for the fused data (¹H NMR + IRMS + MIR) with the LDA method.
The authors of [13] carried out a study to discriminate between organic and conventional pineapples based on the total soluble solids (TSS) content. A portable NIR spectroscopic technique combined with machine learning (ML) algorithms such as KNN, LDA, PCA, PLS-DA and multiplicative scatter correction-principal component analysis-linear discriminant analysis (MSC-PCA-LDA) was used to classify organic and conventional pineapple samples. Among these models, MSC-PCA-LDA provides 100% classification accuracy based on the TSS present in pineapples. Two more NIR spectroscopy techniques [14,15] were employed to differentiate between conventional and organic food materials by examining the soluble solid content (SSC) in fruits, vegetables and dairy products. A nondestructive spectroscopic system [16] was used to predict the SSC of apple samples. An optical setup was used to obtain the multispectral data from the samples. A multiple linear regression model predicted the SSC with a root mean square error of 0.861.
Another work [17] classified whether the diuron herbicide content is within the maximum residual limit (0.2 ppm) or not on intact olives using NIR spectroscopic technology. The spectral validation dataset achieved 85.9% classification accuracy for binary class of C1: diuron level > 0.2 ppm and C2: ≤ 0.2 ppm.
Most of the reported spectroscopic analyses are based on destructive sample analysis, which is expensive, laborious, bulky and unsuitable for the real-time analysis of organic and conventional discrimination. Portable, low-cost spectroscopic sensor systems [18] were previously designed and applied in milk adulteration detection; however, no attempt has been made to discriminate between organic and conventional vegetables. A real-time, cost-effective and reliable detection mechanism is required to detect organic vegetables and fruits. To overcome these flaws, this research work focuses on the following:

• Our proposed work includes the design and construction of a nondestructive, portable, rapid-analysis multispectral sensor system to discriminate between conventional and organic vegetable samples using random forest and neural network algorithms.
• A multispectral sensor system is designed with three bands (NIR, UV and visible) to enable reliable detection.
• In practical use of the sensor system, the accuracy of detection is challenged by several influencing factors: the angle of placement of the sensor on the sample, the distance between the sensor and the sample, the presence of ambient light, the size of the sample and the variation in the color texture of the sample. Fuzzy-logic-based feature generation is employed to cope with such variable factors.
• The ant colony algorithm is utilized for feature selection and parameter tuning to improve detection accuracy in the presence of variable factors and to reduce the response time.
The remainder of this article is arranged as follows: Section 2 presents the materials and methods, the development of an AI-enabled multispectral sensor system, the design of a fuzzy engine for feature generation, feature selection using the ACO algorithm, and random forest and neural network models for discriminating between organic and conventional vegetable samples, followed by a discussion of sample preparation and data acquisition. Section 3 presents the results and discussion of the histogram analysis and three-band spectral responses, followed by ant colony, random forest and neural network algorithm results. Finally, Section 4 concludes the work with the future scope.

Materials and Methods
This section describes the multispectral sensor system design. Data acquisition, fuzzy logic feature generation, optimized feature selection, and parameter tuning by ACO algorithm are delineated. Random forest and neural network models are presented to classify organic and conventional vegetable samples.

Multispectral Sensor System Model
The prototype designed and developed for organic and conventional vegetable classification is shown in Figure 1. Its hardware components are the Triad spectroscopic sensor system AS7265X, an ESP8266 and a Raspberry Pi. Its software components comprise a fuzzy engine for feature generation and an ACO algorithm module for parameter optimization and feature selection. Neural network and random forest machine learning modules are used for the classification of organic and conventional samples, an MQTT client and server module is used for moving the data to a PC server, and a PC web server module with a GUI is used to display the result.
The Triad sensor holds three optical sensors: (i) AS72651, (ii) AS72652 and (iii) AS72653. Each sensor has 6 independent optical filters, giving a total of 18 channels with wavelengths ranging from 410 nm to 940 nm. The AS72651 acts as the master and covers the range from 610 nm to 860 nm. The AS72652 and AS72653 act as slave sensors; together with the master, they provide the remaining channels of the 410 nm to 940 nm range [19,20]. The sensor has 3 integrated LED light sources: a white LED with a color temperature of 5700 K, a UV LED emitting at a wavelength of 405 nm and an IR LED emitting at 875 nm. The field of view of the sensor is ±20.5°.
All channel data in the Triad sensor represent the received photon count β reflected from the measurement surface after absorption. The received photon count β is expressed in terms of the following quantities: α, the intensity of the incident LED light source; γ, the amount of light incident on the sample; and εχl, the number of photons absorbed in the sample, where l is the sample length, ε is the molar absorptivity and χ is the concentration of the substance.

The ESP8266 is connected to the sensor system through a QWIIC cable. It acquires the data in the three bands (UV, visible and NIR) over 18 channels through the data streamer of an Excel sheet. The software module for data acquisition and formatting, together with the message queuing telemetry transport (MQTT) protocol and a web client, is deployed on the ESP8266. Using the MQTT protocol, the web client communicates with the web server and stores the 18-channel data on it.
In the system, the data acquisition module is separated from the data processing module to support scalability. The ESP8266 is used for data acquisition alone, and the Raspberry Pi is used for data processing and for running the machine learning algorithms. The Raspberry Pi runs the software modules for data splitting, the fuzzy engine, ACO, and the neural network and random forest algorithms. It obtains the 18-channel data stored on the web server using the MQTT protocol. The 18-channel data are split into three-band data for further processing. The fuzzy logic engine generates a new set of 18 fuzzy features from the three-band split data. The ACO algorithm selects the 12 best features from the 18 fuzzy features. These selected features are used for classification by the random forest (RF) ensemble learning algorithm and the neural network (NN) algorithm. The random forest and neural network models are designed for the binary classification of organic and conventional vegetables.

This research also aims to share the discrimination information on conventional and organic vegetables with the end-user community. The Internet of Things (IoT) is used for this purpose. IoT technology connects things to the internet so that the output response is accessible from anywhere. The sensor system is therefore IoT-enabled, and it publishes only the result on the internet for access by the public community. A web page with a user-friendly graphical user interface (GUI) is designed to display the result and the acquired data. A password-based authentication system is developed to secure access.
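The band-splitting step can be sketched as follows. This is a minimal illustration that assumes the 18 channels arrive ordered by wavelength, with Channels 1 to 6 treated as the UV band, 7 to 12 as the visible band and 13 to 18 as the NIR band, following the channel labels used in the histogram analysis. The names `BANDS` and `split_bands` are hypothetical.

```python
# Assumed channel layout: 0-based index ranges for each band.
BANDS = {
    "UV": range(0, 6),     # Channels 1-6
    "Vis": range(6, 12),   # Channels 7-12
    "NIR": range(12, 18),  # Channels 13-18
}


def split_bands(reading):
    """Split one 18-channel photon-count reading into three 6-channel bands."""
    if len(reading) != 18:
        raise ValueError("expected 18 channel values, got %d" % len(reading))
    return {name: [reading[i] for i in idx] for name, idx in BANDS.items()}


# Placeholder reading: the values 1..18 stand in for photon counts.
example = list(range(1, 19))
bands = split_bands(example)
print(bands["UV"], bands["Vis"], bands["NIR"])
```

Keeping the split in a single lookup table makes it easy to adapt if the channel ordering delivered by the acquisition module differs from the one assumed here.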

Fuzzy Model for Feature Generation
Fuzzy logic is designed with a set of rules to solve decision-making problems when the data are fuzzy in nature [20]. Fuzzy logic consists of three process stages: fuzzifier, intelligence and defuzzifier. Here, fuzzification and defuzzification are applied to generate a new feature set. In our spectral sensor, variable factors that affect the sensor response are (i) the distance between the sensor and the sample, (ii) the focus angle of the sensor and (iii) the variation in the size of the samples. Such factors introduce fuzziness in the acquired data set. Fuzzy-logic-based new feature generation is used to handle this fuzziness. Fuzziness in a dataset is determined using the membership function. In this system, the triangle membership function is used.
The triangle fuzzy membership function is defined as

$$\mu(x) = \max\left(\min\left(\frac{x-a}{b-a}, \frac{c-x}{c-b}\right), 0\right)$$

where a is the lowest edge point of the triangle; b is the top point of the triangle; c is the highest edge point of the triangle; and x is the input variable. Table 2 gives the triangular point values a, b and c of the input membership functions used to fuzzify the sensor values. After the fuzzification and fuzzy inference process, defuzzification is applied to generate the new feature data. In this work, centroid-based defuzzification is applied. The centroid method computes the output $y_{defu}$ for the fuzzy input x as

$$y_{defu} = \frac{\int \mu_i(x)\, x \, dx}{\int \mu_i(x)\, dx}$$

where $\mu_i(x)$ is the aggregated membership function and $y_{defu}$ is the defuzzified output. Table 3 defines the output membership functions associated with converting the fuzzy feature value into a crisp value. The new 18-channel feature set generated using fuzzy logic is passed to the ACO algorithm for feature selection, which is discussed next.

Our proposed system involves 18 channels of data, which are acquired in 3 different spectral bands. Using all 18 channels with 3 bands may overfit the machine learning model; therefore, feature selection is essential for this work. The spectral data show significant variation with respect to the ambient light conditions, the distance between the sensor and the sample, the focus angle of the sensor to the sample, the dimension of the samples and the variation in the sample color texture. The new fuzzy feature generation and feature selection processes are carried out to cope with such variation factors and improve classification accuracy. Evolutionary algorithms such as genetic and ACO algorithms are commonly used for parameter optimization and feature selection [21]. This work uses the ACO algorithm to search for an optimal feature subset. The ACO algorithm uses a probabilistic method to solve optimal path-finding problems, modeled on the behavior of ants searching for food.
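The triangular membership function and centroid defuzzification described above can be sketched as follows. This is a minimal sketch: the triangle points a = 0, b = 50 and c = 100 are hypothetical placeholders rather than the values of Tables 2 and 3, and the centroid is approximated by a discrete sum over sampled points.

```python
def triangular_membership(x, a, b, c):
    """Membership degree of x for a triangle with feet a, c and peak b."""
    if not a < b < c:
        raise ValueError("triangle points must satisfy a < b < c")
    rising = (x - a) / (b - a)    # left slope, reaches 1 at x = b
    falling = (c - x) / (c - b)   # right slope, reaches 0 at x = c
    return max(min(rising, falling), 0.0)


def centroid_defuzzify(xs, memberships):
    """Discrete centroid: sum(mu * x) / sum(mu) over the sampled points."""
    total = sum(memberships)
    if total == 0:
        raise ValueError("aggregated membership is zero everywhere")
    return sum(m * x for x, m in zip(xs, memberships)) / total


# Hypothetical triangle points, used only for illustration.
a, b, c = 0.0, 50.0, 100.0
xs = [float(i) for i in range(0, 101)]
mu = [triangular_membership(x, a, b, c) for x in xs]
crisp = centroid_defuzzify(xs, mu)
print(f"membership at 25: {triangular_membership(25.0, a, b, c):.2f}, centroid: {crisp:.1f}")
```

For a symmetric triangle the centroid coincides with the peak, which gives a quick sanity check before the real membership parameters are substituted.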
The ACO algorithm builds a graph with the n features as nodes and k ants; a list $V_k$ records the nodes that each ant visits. The pheromone $\delta_{ij}(0)$ is assigned an initial value of 0. In this work, the optimal feature selection problem is formulated as an optimal path-finding problem and solved by the ACO algorithm.
Sometimes, the ACO algorithm performs poorly and produces a reduced feature dimension because of low diversity in the selected features. To improve the performance of the algorithm in this case, the pheromone is updated in two stages. The first-stage update increases the pheromone value of a particular path, and the second-stage update prevents the search from falling into a local optimum. With this update, the optimal feature subset can be sorted, the classification accuracy can be enhanced and the feature subset dimension can be reduced.
The ACO algorithm selects the optimal features through the following steps.
Step 1: Generate random pheromone values and an initial outcome, and set the ant index k = 0.
Step 2: Repeat for each iteration. Each ant selects the next node to move to based on the pheromone concentration. The path transfer probability $\psi_{ij}^{k}(t)$ with which ant k moves from feature node i to feature node j in iteration t (Equation (5)) depends on $\delta_{ij}(t)$, the path pheromone concentration from the ith to the jth feature during iteration t; on the heuristic function $h_{ij}$; and on $\Gamma_i$ and $\Gamma_h$, the information and expectation heuristic factors used to assign weights. In general, $h_{ij} = 1/ed_{ij}$, where $ed_{ij}$ is the Euclidean distance. For the two-stage pheromone updating process, however, $h_{ij}$ is redefined in terms of the true-positive rate (TPR), which is defined as

$$TPR = \frac{TP}{TP + FN}$$

where TP is the number of true positives and FN is the number of false negatives. After the ants traverse according to the probability of Equation (5), the information concentration (pheromone) is updated using the weight coefficient p (0 < p < 1) and the pheromone path increment $\delta_{ij}^{k}$ of the kth ant between feature nodes i and j. This increment is determined by a constant Π and by $l_k$, the path length of the kth ant in its traversal.
Step 3: Select a feature subset and the parameters for the machine learning models. Calculate the fitness function from the feature subset and the classification accuracy. The fitness function involves ξ, the balancing weight between feature dimension and classification performance; $S_n$, the dimension of the whole set traversed by the ants; $S_{fd}$, the selected feature subset; $S_p$, the selected parameter set; and the false-positive rate (FPR), which is defined as

$$FPR = \frac{FP}{FP + TN}$$

where FP is the number of false positives and TN is the number of true negatives.
Step 4: Check whether the maximum accuracy has been reached. If it has, stop and take the currently selected feature subset as the optimal feature set and the selected parameter set as the optimal parameters. If not, perform the two-stage pheromone updating in Steps 5 and 6.
Step 5: Stage 1. The pheromone concentration is updated over the total path traversed by the ants in iteration t. The update involves k, the ant number; m, the total number of ants in the colony in the iteration; κ, the pheromone concentration attenuation coefficient; and $\delta_{ij}^{k}(t)$, the pheromone concentration released by ant k on the path (i, j) in iteration t. The released pheromone is in turn determined by $f^{k}(t)$, the feature subset chosen by ant k in iteration t; $|f^{k}(t)|$, the length of that feature subset; and $CF(f^{k}(t))$, an index measuring classifier performance.
Step 6: Stage 2. The pheromone change value of the best ant in the tth iteration is defined as

$$\Delta\delta_{ij}^{bs}(t) = \begin{cases} \eta\,\delta_{ij}^{bs}(t), & \text{if arc } (i, j) \text{ is on the path of the best ant} \\ 0, & \text{otherwise} \end{cases} \qquad (15)$$

where bs is the optimal solution created in the iteration by all ants and η is the pheromone concentration for special paths. The pheromone additions of the nearest ant and the best ant are then computed, and the Stage 2 pheromone concentration update is defined in Equation (19) using the best-ant and nearest-ant addition values.
Step 7: Increase the ant number, k = k + 1, and go to Step 2.

The optimally selected feature set produced by the ant colony algorithm is used as input to the random forest and neural network machine learning algorithms to perform accurate classification with reduced time complexity. The details of the random forest classification are discussed in the next section.
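The overall loop of Steps 1 to 7 can be illustrated with a deliberately simplified sketch. It is not the two-stage update of this work: it assumes a single evaporate-and-reinforce update on the best subset found so far, uniform initial pheromone, and a user-supplied fitness function returning a score between 0 and 1. The names `aco_select` and `toy_fitness` and all parameter values are hypothetical.

```python
import random


def aco_select(n_features, n_select, fitness, n_ants=10, n_iter=20,
               evaporation=0.2, deposit=1.0, seed=0):
    """Simplified ACO feature selection (single-stage pheromone update)."""
    if not 0 < n_select <= n_features:
        raise ValueError("n_select must be between 1 and n_features")
    rng = random.Random(seed)
    pheromone = [1.0] * n_features          # assumed uniform start
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):
        for _ in range(n_ants):
            candidates = list(range(n_features))
            subset = []
            for _ in range(n_select):
                # Pick the next feature with probability proportional to pheromone.
                weights = [pheromone[j] for j in candidates]
                choice = rng.choices(candidates, weights=weights, k=1)[0]
                subset.append(choice)
                candidates.remove(choice)
            score = fitness(subset)
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        # Evaporate everywhere, then reinforce the best subset found so far.
        pheromone = [(1.0 - evaporation) * p for p in pheromone]
        for j in best_subset:
            pheromone[j] += deposit * max(best_score, 0.0)
    return best_subset, best_score


def toy_fitness(subset):
    """Toy stand-in for classifier accuracy: features 0 and 3 are 'informative'."""
    informative = {0, 3}
    return len(informative & set(subset)) / len(informative)


subset, score = aco_select(n_features=6, n_select=2, fitness=toy_fitness)
print(subset, score)
```

In practice, the fitness function would train and validate the classifier on the candidate subset, which is the expensive part of the loop; the pheromone bookkeeping itself is cheap.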

Random Forest Algorithm
A random forest algorithm is constructed from decision trees built on different spectral samples and takes the majority vote as the final decision, as presented in Algorithm 1. It uses ensemble learning, combining many classifiers to solve complex problems. The symbols used for the random forest algorithm are defined as follows [22]: $N_P$, partition; $N_d$, number of decision trees in the random forest; $R_{np}$, ratio of negative and positive cases; Ts, sampling threshold; $R_{nn}$, radius of nearest neighbor; bootstrap, the sampling process; $M_{ot}$, optimal model; $Acc_M$, accuracy; $P_{ot}$, optimal parameter. The neural network is another machine learning algorithm utilized in the proposed method to test classification accuracy, which is discussed in the next section.

Neural Network Architecture
Figure 2 illustrates the neural network's architecture. The 12 optimal features, $i_1, i_2, i_3, \ldots, i_{12}$, represent the input layer neurons, which are followed by ten hidden layers, and a single neuron is used at the output layer for binary classification. $W_{mn}$ denotes the weights between the input and hidden layers, $W_{no}$ denotes the weights between the interconnected hidden layers, and $W_{op}$ denotes the weights between the last hidden layer and the output layer. The output layer calculates the error value by comparing the predicted and target values. This error value is propagated backward from the output layer to the input layer, and the weight values are adjusted with the gradient descent method to minimize it.

Cross-entropy is employed as the loss function, which measures the error between the target and predicted values. It is computed from Θ, the true value, and $r_i$, the sigmoidal probability of the ith class. Cross-entropy is minimized to reach the lowest possible error, and the weight values are updated accordingly. The sigmoidal activation function is used in the output layer for binary classification. It is given by

$$F(v) = \frac{1}{1 + e^{-v}}$$

where v is the input and F(v) is the output.
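The sigmoid output and the cross-entropy loss can be illustrated with a short sketch. The binary form of the cross-entropy used here, $-[\Theta \log r + (1 - \Theta)\log(1 - r)]$, is the standard textbook expression and is an assumption about the exact form used in this work; the function names are hypothetical.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function 1 / (1 + exp(-v))."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)          # avoids overflow for large negative v
    return z / (1.0 + z)


def binary_cross_entropy(target, prob, eps=1e-12):
    """Standard binary cross-entropy for a target in {0, 1} (assumed form)."""
    prob = min(max(prob, eps), 1.0 - eps)   # clip to keep log() finite
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


# A confident correct prediction is penalized less than a confident wrong one.
good = binary_cross_entropy(1, sigmoid(2.0))
bad = binary_cross_entropy(1, sigmoid(-2.0))
print(f"loss (correct) = {good:.3f}, loss (wrong) = {bad:.3f}")
```

The loss grows without bound as the predicted probability of the true class approaches zero, which is why the probability is clipped before taking the logarithm.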

Sample Preparation
This section describes the land and sample preparation procedure, followed by the multispectral data collection. Land preparation is one of the key elements in cultivating agricultural samples, especially when preparing vegetable samples for organic and conventional discrimination. Nutrition component analysis also discriminates between conventional and organic vegetables and fruits [23]. The organic and conventional vegetable samples were grown in the same red soil in different areas with a proper water supply. Figure 3a,b shows organically and conventionally grown brinjal plants. Soil fertility plays a significant role in growing organic food products, and the soil should supply enough nutrients and minerals for plant growth [24]. The red soil land is divided into two separate 10 × 10 square feet areas, one for growing organic and the other for growing conventional vegetable samples. The two areas are kept 15 m apart during cultivation. The organic land is treated with organic fertilizers such as cow dung and goat dung. The fertilizers used for the conventional area are diammonium phosphate (DAP) 18:46 and single super phosphate (SSP).

In total, 70 vegetable samples were obtained from the farm. One or two samples of each vegetable were eliminated because they were damaged, diseased or spoiled. Out of 25 tomatoes (red color), 11 organic and 13 conventional samples were retained. Similarly, for brinjal (purple), 12 samples were organic and 10 were conventional. For green chili (green color), 12 samples were conventional and 12 were organic. The samples were collected from a controlled environment, and no presample preparation was carried out before testing. In total, 10,800 spectral data were obtained from all three vegetables, and each class of each vegetable (i.e., organic and conventional) contributed 1800 spectral data.
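The sample and spectra counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the variable names are hypothetical.

```python
# Retained samples per vegetable as listed in the text: (organic, conventional).
samples = {"tomato": (11, 13), "brinjal": (12, 10), "green chili": (12, 12)}
SPECTRA_PER_CLASS = 1800   # spectra per vegetable and cultivation type

total_samples = sum(org + conv for org, conv in samples.values())
n_classes = 2 * len(samples)            # organic and conventional per vegetable
total_spectra = n_classes * SPECTRA_PER_CLASS

print(f"{total_samples} samples, {n_classes} classes, {total_spectra} spectra")
```

The retained samples sum to 70, and six classes of 1800 spectra each give the 10,800 spectral data reported.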

Results and Discussion
The hardware setup discussed in Section 2 collects spectral data at a 150 ms sampling interval, that is, approximately 6 data samples per second. The acquired spectral data are processed with the fuzzy engine to generate new features. The ACO algorithm analyzes all the generated features to select the optimal 12-channel feature set. The selected 12-channel features are applied to the random forest algorithm and the neural network model to classify organic and conventional vegetable samples. This results section is arranged as follows: Section 3.1 discusses the histogram analysis of the 18-channel spectral photon counts, followed by the NIR, UV and visible-band photon count distributions in Section 3.2. Section 3.3 describes the results of the ACO optimal channel feature selection mechanism. The classification performance of the random forest algorithm and the neural network model is discussed in Section 3.4. The system's response time and repeatability analysis are discussed in Sections 3.5 and 3.6.

Histogram Analysis of Photon Count Distribution-18 Channels
This section analyzes and visually illustrates the number of photons received by the multispectral sensor detector from the vegetable samples. Since the spectral response depends on the color of the vegetables, the histogram analysis was carried out separately, in three subsections, for the three differently colored vegetables considered in this study (red tomato, purple brinjal and green chili).

Histogram Analysis of Photon Count Distribution to Discriminate Organic and Conventional Vegetable-Red Color Tomato
The channel values are distinguished by colors in the histogram. Figure 4a,b illustrates a few observations of the histogram analysis for conventional and organic tomato samples, showing the response of the photon count distribution to the constituent elements present in the organic and conventional tomatoes. Table 4 lists the count of occurrence of the photon counts, which can be examined to find the peak-responsive band and channel for a given vegetable sample. For the conventional tomato sample in Channel 1, a photon count value of 128 is observed with a minimum count of occurrence of only 2, and a photon count value of 66 is observed with a maximum count of occurrence of 44. In the second column of Table 4, the first value of 2 refers to the minimum count of occurrence, and 128 represents the corresponding photon count value of the conventional tomato sample in Channel 1.
Likewise, for the organic tomato sample in Channel 1, a minimum count of occurrence of 2 is recorded with a photon count value of 248, and a maximum count of occurrence of 69 is observed with a photon count value of 146. The photon count values and counts of occurrence for the remaining channels are listed in Table 4 for the conventional and organic tomato (red color) samples.
For conventional tomatoes, the maximum count of occurrence of 55 is recorded in Channel 13 (NIR, 730 nm) with a photon count value of 2300. The first minimum count of occurrence of 11 is recorded in Channel 3 (UV, 460 nm) with a photon count value of 68.
For organic tomatoes, the first maximum count of occurrence of 140 is recorded in Channel 6 (UV, 535 nm) with a photon count value of 300. The first minimum count of occurrence of 6 is recorded in Channel 12 (Vis, 705 nm) with a photon count value of 102. Table 4 thus shows that the conventional tomato reaches its maximum count of occurrence in Channel 13 (NIR, 730 nm) and the organic tomato reaches its maximum count of occurrence in Channel 6 (UV, 535 nm). Channels 13 and 6 are therefore required to discriminate between the organic and conventional red-color tomato samples.

Histogram Analysis of Photon Count Distribution to Discriminate Organic and Conventional Vegetable-Purple-Color Brinjal
Figure 5a,b visually illustrates a few key observations of the histogram analysis for conventional and organic brinjal samples.
In Table 5, Channel 1, a photon count value of 321 is observed with a minimum count of occurrence of 6 times for conventional brinjal.
For the conventional brinjal sample, Table 5 shows that the first maximum count of occurrence of 59 is recorded in Channel 13 (NIR, 730 nm) with a photon count value of 686, followed by an initial minimum count of occurrence of 9 with a photon count value of 390 in Channel 3 (UV, 460 nm). For the organic brinjal sample, Table 5 shows that the first maximum count of occurrence of 91 is observed in Channel 7 (Vis, 560 nm) with a photon count value of 31, and the first minimum count of 7 occurs in Channel 12 (Vis, 705 nm) with a photon count value of 342. The conventional brinjal sample shows the same maximum count of occurrence of 59 in Channels 7 (Vis, 560 nm) and 13 (NIR, 730 nm). Of these two channels, the reflected photon count is higher in Channel 13 (NIR, 730 nm), so Channel 13 provides the better response for the conventional case, whereas Channel 7 (Vis, 560 nm) shows the maximum count of occurrence for the organic brinjal sample. Channels 13 (NIR, 730 nm) and 7 (Vis, 560 nm) are therefore required to differentiate between the conventional and organic brinjal samples.

Figure 6a,b illustrates the histogram analysis of the conventional and organic chili samples. Table 6 records a photon count value of 1048 with a minimum count of occurrence of 2 in Channel 1 for conventional chili. A photon count value of 421 is recorded with a single count of occurrence for organic chili.

Table 6. Eighteen-channel photon count distribution and the number of occurrences for conventional and organic green chili.

For conventional green chili, Table 6 shows that the first maximum count of 91 is observed in Channel 4 (UV, 485 nm) with a photon count value of 50. Next, the minimum count of occurrence of 10 is observed with a photon count value of 59 in Channel 9 (Vis, 610 nm).

For organic green chili, the first maximum count of occurrence of 86 is observed from Channel 2 (UV, 435 nm) with a photon count value of 72. Next, a minimum count of occurrence of 7 is recorded in Channel 4 (UV, 485 nm) with a photon count value of 181. Table 6 thus shows that the occurrence count is maximum in Channel 4 (UV, 485 nm) for conventional green chili and in Channel 2 (UV, 435 nm) for organic green chili. Therefore, Channels 4 and 2 give the better response for discriminating between the organic and conventional green chili samples.
Taken together, the histogram analyses show that Channels 13 (NIR, 730 nm) and 6 (UV, 535 nm) are required to discriminate between the red tomato samples, Channels 13 (NIR, 730 nm) and 7 (Vis, 560 nm) for purple brinjal, and Channels 4 (UV, 485 nm) and 2 (UV, 435 nm) for green chili.
Each vegetable sample shows its maximum count of responses at different bands, which is what allows discrimination. Figures 4a,b, 5a,b and 6a,b show that, for most channels, the data fall on the same counts of occurrence of photon values. It is therefore difficult to distinguish between the photon count distributions of the individual channels because they overlap at the same points. However, the peak response band of the organic and conventional samples can still be located using the maximum count of occurrence.
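The peak-location step described above can be illustrated with a minimal sketch. It assumes that each channel's readings are available as a plain list of photon count values; the function names and the toy readings are hypothetical and are not taken from Tables 4 to 6.

```python
from collections import Counter

def peak_occurrence(readings):
    """Return (photon_count_value, occurrences) for the most frequent value."""
    counts = Counter(readings)
    # most_common(1) returns a list holding the single (value, count) pair
    value, occurrences = counts.most_common(1)[0]
    return value, occurrences

def peak_channel(samples):
    """samples maps channel number -> list of photon counts.

    Returns (channel, (value, occurrences)) for the channel whose most
    frequent photon count value occurs the largest number of times.
    """
    peaks = {ch: peak_occurrence(vals) for ch, vals in samples.items()}
    best = max(peaks, key=lambda ch: peaks[ch][1])
    return best, peaks[best]

# Hypothetical toy readings for two channels (illustration only)
conventional = {7: [31, 31, 40, 52], 13: [686, 686, 686, 700]}
organic = {7: [31, 31, 31, 31], 13: [650, 686, 690, 700]}

print(peak_channel(conventional))
print(peak_channel(organic))
```

With these toy readings the conventional sample peaks in Channel 13 and the organic sample in Channel 7, which mirrors the kind of channel pairing reported for brinjal.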

Individual Channel Responses
In the proposed multispectral sensor system design, 18 channels are available, divided into three spectral bands, namely ultraviolet, visible and near-infrared, each consisting of 6 channels. This section illustrates the NIR, UV and Vis spectral responses for organic and conventional brinjal samples. Each band shows discrimination in terms of the spectral photon count distribution for the brinjal sample. To limit the length of the article, the individual-band channel response is analyzed only for the purple brinjal sample as an illustrative example.
Figure 8 depicts the samples collected in the 560 to 705 nm band at wavelengths of 560 nm, 585 nm, 610 nm, 680 nm and 705 nm on Channels 7, 8, 9, 11 and 12. These channels show better discrimination between the organic and conventional brinjal samples. Channel 10 (Vis, 645 nm) does not discriminate between the organic and conventional brinjal samples because their amplitude levels overlap.

ACO Results
The analysis of individual band responses in Section 3.2 shows that only a few bands respond well, while the data from many bands do not discriminate between organic and conventional vegetable samples. Optimal feature selection is therefore required to pick the bands that respond best, and the ACO algorithm was employed for this purpose. The 12 selected optimal features were applied to the random forest and neural network classifiers. The results are shown in Figure 10. At the 40th iteration, the accuracy of the random forest and neural network models is 100% and 90%, respectively, and the neural network then also converges to 100%. RF and NN thus converged at the 40th and 50th iterations, respectively. The ACO algorithm is essential for improving classification accuracy through optimal feature selection. When the optimally selected features (S_fd) were given to the random forest algorithm for organic and conventional vegetable sample discrimination, a classification accuracy of 100% was achieved. Table 7 compares the accuracy of the random forest and neural network models before and after ACO optimization for the organic and conventional vegetable samples.
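The idea of pheromone-guided feature selection can be sketched as follows. This is a simplified illustration and not the implementation used in this work: it uses only the Python standard library, replaces the random forest and neural network with a nearest-centroid classifier as the fitness function, runs on synthetic data, and omits the parameter tuning stage. All function names, the toy data set and the parameter values (`n_ants`, `n_iter`, `rho`, `k`) are assumptions made for the example.

```python
import random

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Hold-out accuracy of a nearest-centroid classifier on `features` only."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, y in zip(X_test, y_test):
        def sq_dist(label):
            return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))
        correct += min(centroids, key=sq_dist) == y
    return correct / len(y_test)

def aco_select(fitness, n_features, k, n_ants=10, n_iter=20, rho=0.2, seed=0):
    """Pick k features: ants sample subsets in proportion to pheromone."""
    rng = random.Random(seed)
    pheromone = [1.0] * n_features
    best_subset, best_score = None, -1.0
    for _ in range(n_iter):
        for _ in range(n_ants):
            candidates = list(range(n_features))
            subset = []
            for _ in range(k):  # sample k distinct features
                weights = [pheromone[f] for f in candidates]
                chosen = rng.choices(candidates, weights=weights, k=1)[0]
                subset.append(chosen)
                candidates.remove(chosen)
            score = fitness(subset)
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        # Evaporate all trails, then reinforce the best subset found so far
        pheromone = [(1 - rho) * p for p in pheromone]
        for f in best_subset:
            pheromone[f] += best_score
    return best_subset, best_score

def make_toy_data(n_per_class=20, n_features=6, seed=1):
    """Synthetic data: features 0 and 1 carry the class signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.random() for _ in range(n_features)]
            row[0] += 9 * label
            row[1] += 9 * label
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_data()
X_train, y_train, X_test, y_test = X[::2], y[::2], X[1::2], y[1::2]

def fitness(subset):
    return nearest_centroid_accuracy(X_train, y_train, X_test, y_test, subset)

best_subset, best_score = aco_select(fitness, n_features=6, k=2)
print(best_subset, best_score)
```

Reinforcing only the best subset found so far keeps the sketch short; depositing pheromone for every ant in proportion to its score is a common alternative.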

Analysis of Neural Network Model
The ACO-selected optimal feature set was applied to the neural network model. The network was trained with 70% of the sample data, the validation phase used 15% and the testing phase used the remaining 15%. Figure 11 depicts the predicted values, represented in the columns, and the actual values, represented in the rows. The diagonal represents the true-positive and true-negative labels classified for conventional and organic vegetables. The resultant classification achieves 100% accuracy in discriminating between conventional and organic samples. Figure 12 shows the receiver operating characteristic (ROC) curve for the training, validation, test and complete data sets. The ROC curve plots the true-positive rate (TPR), i.e., sensitivity, against the false-positive rate (FPR), i.e., 1 − specificity. An ROC curve that reaches the top left corner indicates that the system performs the two-class classification with 100% accuracy.
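The data partitioning and the quantities behind the confusion matrix and ROC curve can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the label coding (1 for organic, 0 for conventional) is chosen for the example only, and the toy labels are not experimental data.

```python
import random

def split_70_15_15(samples, seed=0):
    """Shuffle and split into 70% training, 15% validation and 15% test."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = round(0.70 * len(items))
    n_val = round(0.15 * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains after train and val
    return train, val, test

def binary_metrics(actual, predicted, positive=1):
    """Confusion-matrix metrics; assumes both classes occur in `actual`."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn),          # sensitivity
        "fpr": fp / (fp + tn),          # 1 - specificity
        "specificity": tn / (tn + fp),
    }

train, val, test = split_70_15_15(range(100))
print(len(train), len(val), len(test))

# Toy labels: 1 = organic, 0 = conventional (assumed coding)
actual = [1, 1, 1, 0, 0, 0]
print(binary_metrics(actual, actual))              # perfect classifier
print(binary_metrics(actual, [1, 1, 0, 0, 0, 1]))  # one error per class
```

A perfect classifier gives TPR = 1 and FPR = 0, which is the single point at the top left corner of the ROC plot.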
A comparison of the proposed model performance with similar literature is presented in Table 8.
Compared with the similar work listed in Table 8, the proposed system performs better in the following aspects:

• The model proposed in [6] achieves an accuracy of 97.3% for organic and conventional yam discrimination. That system is expensive and not portable, so mass spectroscopy is not applicable for real-time analysis, and it was analyzed with only a single vegetable. The proposed work involves designing and constructing a nondestructive, less expensive, portable, rapid-analysis multispectral sensor system. It can discriminate between organic and conventional vegetables (tomato, green chili, brinjal) using fuzzy and ACO optimization algorithms for optimal feature selection. The optimal feature set is applied through random forest and neural network algorithms, achieving 100% classification accuracy.
Since one of our objectives is to reduce the cost and make the system portable, the Triad multispectral sensor system was used to discriminate between organic and conventional vegetables. The Triad multispectral sensor costs around Rs. 6250/-, the Raspberry Pi around Rs. 1600/- and the ESP8266 Wi-Fi module around Rs. 1850/-. The complete equipment is therefore much cheaper, at around Rs. 9700/- in total.

System's Response Time
The response time of the entire sensor system was analyzed, and the system was observed to respond within a maximum of 400 ms.

Repeatability Analysis
The repeatability of the proposed system was tested at regular intervals over three months. Each time, three tests were carried out with different conventional vegetable samples obtained from different markets. The organic samples were cultivated and collected from different soil locations for the repeatability test. These tests show that the system reproduces the same results.

Conclusions
This research showed the potential of a multispectral sensor system for the nondestructive, rapid discrimination of organic and conventional green chili, tomato and brinjal. The proposed multispectral sensor is portable, which enables the consumer to use it efficiently in the field. The histogram analysis of the 18-channel spectral responses provided discrimination between organic and conventional vegetable samples. Three spectral-band responses were plotted, depicting the discrimination between organic and conventional purple brinjal samples. The 18-channel raw data were tested with neural network and random forest algorithms, which obtained 89% and 92% classification accuracy, respectively. The classification accuracy was improved to 100% by the new fuzzy feature set generation and the optimal feature set selected by the ACO algorithm. The organic and conventional discrimination results are displayed on the developed web page through the IoT application system, so authenticated users can access the detected output from anywhere. Future work can test other vegetable samples grown under different soil conditions. The vegetable samples tested here were grown using DAP fertilizer, and testing can be extended to other similar types of fertilizers. Testing of pesticide- and insecticide-sprayed vegetable samples can also be expanded. Some chemicals are sprayed on vegetables to make them look fresh during storage, and identifying this kind of contamination is left for future work. Sample testing in the proposed work was performed in a controlled environment. On-farm field testing can be explored in the near future.
Funding: This research received no external funding.
Institutional Review Board Statement: Ethical review and approval were waived for this study because it did not involve humans or animals.

Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to confidentiality.
Acknowledgments: I gratefully acknowledge P. Singaravelu, a farmer from Attur, who provided land for cultivating the organic and conventional vegetables in a timely manner.

Conflicts of Interest: The authors declare that the research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.