Weed Classification for Site-Specific Weed Management Using an Automated Stereo Computer-Vision Machine-Learning System in Rice Fields

Site-specific weed management and selective application of herbicides as eco-friendly techniques are still challenging tasks to perform, especially for densely cultivated crops, such as rice. This study is aimed at developing a stereo vision system for distinguishing between rice plants and weeds and further discriminating two types of weeds in a rice field by using artificial neural networks (ANNs) and two metaheuristic algorithms. For this purpose, stereo videos were recorded across the rice field and different channels were extracted and decomposed into the constituent frames. Next, upon pre-processing and segmentation of the frames, green plants were extracted out of the background. For accurate discrimination of the rice and weeds, a total of 302 color, shape, and texture features were identified. Two metaheuristic algorithms, namely particle swarm optimization (PSO) and the bee algorithm (BA), were used to optimize the neural network for selecting the most effective features and classifying different types of weeds, respectively. Comparing the proposed classification method with the K-nearest neighbors (KNN) classifier, it was found that the proposed ANN-BA classifier reached accuracies of 88.74% and 87.96% for right and left channels, respectively, over the test set. Taking into account either the arithmetic or the geometric means as the basis, the accuracies were increased up to 92.02% and 90.7%, respectively, over the test set. On the other hand, the KNN suffered from more cases of misclassification, as compared to the proposed ANN-BA classifier, generating an overall accuracy of 76.62% and 85.59% for the classification of the right and left channel data, respectively, and 85.84% and 84.07% for the arithmetic and geometric mean values, respectively.


Introduction
Rice is considered a staple food for over half of the world's population [1]. Weeds are among the most significant factors decreasing the yield of rice, incurring not only major economic costs but also crop losses. A related study applied two optimized neural network metaheuristic algorithms to select the most effective features and then classify the plants. Accordingly, five effective features were selected from the original pool of 186 features by using a hybrid neural network-cultural algorithm method. Next, the neural network was combined with a harmony search algorithm to detect weeds among potato plants with 98.38% accuracy and a processing time of 0.8 s per frame [26]. Nguyen et al. (2013) used genetic programming to discriminate rice from other leaf classes. To evaluate the classifier, they applied a 20 × 20 pixel scanning window to a test image and classified every pixel of the window based on a color threshold, achieving an accuracy of 90% [31]. Partel et al. (2019) designed and developed a smart sprayer that uses machine vision and artificial intelligence to discriminate weed and non-weed objects. For precise spraying, this targeted system integrated a novel precision sprayer with a state-of-the-art weed detection system and a weed mapping system. The results showed that this system reduced the required quantity of agrochemicals compared to traditional broadcast sprayers, which usually treat the entire field [32]. Bakhshipour and Jafari (2018) used an SVM and an ANN to detect weeds across a sugar beet field. For each type of weed/crop, they built a pattern from several shape features and used this pattern for classification. The study showed higher classification accuracy for the SVM (up to 93.39%) than for the ANN (up to 92.67%) [33]. Hamuda et al. 
(2017) used the HSV color space and morphological operations for automatic discrimination of cauliflowers, weeds, and soil under natural conditions. In the proposed algorithm, a target region was found by thresholding each of the three HSV (hue, saturation, and value) channels between minimum and maximum values before applying morphological operations to the selected region. Finally, the statistical moment method was applied to the video frames to determine the position and mass of each object. For evaluation, the results were compared with ground truth, returning a classification sensitivity of 98.91% and a precision of 99.04% [34].
Passive stereo vision techniques use two cameras separated by a baseline, which capture two images (left and right) of the same scene at the same time. To obtain disparity maps for depth calculation, the technique depends on correspondence matching between the left and right images. This matching is not only computationally intensive in real time, but also requires adequate textural information to guarantee its reliability and accuracy. Active stereo vision, in contrast, projects a structured light pattern (grid, lines) onto the surface of the detected object to reconstruct its 3D shape; this technique may take an unacceptably long processing time for in-field applications [35][36][37]. Jeon et al. (2011) developed a machine vision system using a stereo camera to discriminate crop plants, weeds, and soil in images taken from fields under natural illumination in the early growth stage of the plants. The proposed algorithm included normalized excessive green modification, statistical threshold estimation, image segmentation, median filtering, morphological feature extraction, and an ANN. The results showed that the ANN could detect the crop plants correctly with an accuracy of up to 95.1% [38]. Tilneac et al. (2012) developed a 3D stereo vision system using two web cameras to discriminate weeds and plants under laboratory conditions. For this purpose, they used green color and depth information. The results showed that such discrimination was feasible only when there was a significant height difference between the crop and the weed [39].
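The geometry behind this depth calculation is compact enough to state directly: for a rectified stereo pair with focal length f (in pixels) and baseline B, a disparity of d pixels corresponds to a depth of Z = f·B/d. The sketch below illustrates this relation in Python; the focal length and baseline are assumed example values, not the parameters of any camera discussed here.

```python
# Depth from disparity in a rectified stereo pair: Z = f * B / d,
# where f is the focal length (px), B the baseline (m), d the disparity (px).
# The defaults below are illustrative values only.
def depth_from_disparity(d_px, focal_px=700.0, baseline_m=0.075):
    if d_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / d_px

# A nearer object produces a larger disparity and hence a smaller depth:
assert depth_from_disparity(70.0) < depth_from_disparity(35.0)
```

This inverse relation is also why textureless regions are problematic: without reliable correspondences, d is unknown and Z cannot be recovered.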
The main objective of this study was to develop a new use of the stereo vision system for discriminating the rice plant and two groups of weeds, using ANNs and two metaheuristic algorithms for feature selection and classification. Unlike common stereo vision methods that use point clouds and disparity maps to detect weeds and crops by their phenotype and height, the innovation of the presented research lies in a new application of a stereo camera for rice and weed identification by splitting the left and right channels of the stereo-recorded video. Classification results of the two extracted 2D videos were then compared to obtain high-accuracy plant classification and, hence, smart weed control under natural illumination in field conditions. The scarcity of research on weed detection in rice fields using image processing, owing to the specific conditions of such fields, motivated combining the advantages of 2D and 3D vision processing to identify weeds in a rice field. In principle, there should not be much difference between images taken of one scene by two similar, closely spaced lenses; in practice, however, overlapping and shadows caused by vegetation density introduce some ambiguity, which can be reduced by using multiple information sources (cameras) and integrating their outputs.

Plant Material
In this study, a rice cultivar (Tarom Mahali) and two common groups of weeds (narrow-leaf weeds (Echinochloa crus-galli, Paspalum distichum, and Cyperus difformis) and wide-leaf weeds (Alisma plantago-aquatica and Eclipta prostrata)) were targeted because of their abundance in the selected region. Another reason for choosing these weed species was their competition with rice at every stage of growth. The study was performed on a 5 ha rice field in Mazandaran, Iran (36°37′48.71″ N, 52°30′11.39″ E), during 2017. The predominant method of rice cultivation in this area is traditional transplanting, which begins around the middle of April. In-water application and foliar spraying are the two modes of herbicide application in rice fields. The in-water method, in addition to consuming large amounts of herbicide, causes considerable environmental pollution. The foliar spray method, on the other hand, is time-consuming and requires great care in practice; precision spraying, as a form of site-specific weed management, can increase the efficiency of this method.

Video Data Acquisition
The required data was collected in the form of stereo videos by a stereo camera, with the different channels of each frame subsequently extracted. For this purpose, a Fujifilm FinePix Real 3D-W3 digital camera (equipped with a 10-megapixel CCD sensor capturing stereo videos in AVI format (NTSC)) with a sensitivity of ISO 400 and a frame resolution of 640 × 480 pixels (30 fps) was used. To hold the camera and move it across the field, a rail platform was designed (Figure 1). The camera was attached to the conveyor at a height of 70 cm above the soil surface (approximately 30 cm above the tip of the topmost leaf) and moved along a 3 m rail at 0.10 m/s during video acquisition in the rice field. An inverter-controlled electromotor was used to move the conveyor. Data analysis was performed using MATLAB 2018 on a computer with an Intel Core i5-2540M 2.6 GHz CPU and 4 GB of RAM running a 64-bit operating system. Videos of the rice plants were recorded 2-3 weeks after transplanting (rice growth stages from code 14 to code 25 on the BBCH scale) [40], the time at which herbicide application would minimize weed competition. An attempt was made to collect as much data as possible under cloudy conditions, with 850-1200 lux illumination.

Pre-Processing and Segmentation
To achieve the aim of this research, the captured stereo video was decomposed into two 2D video channels (left and right) using the FFmpeg software. The videos were then converted to their constituent frames by a script written in MATLAB. To remove unneeded information from the frames, segmentation was performed to separate different regions of the frames according to common features, thereby discriminating the plants (either weeds or the crop) from the background (soil, rocks, and residue). Considering the accuracy and speed of color-based segmentation, different color spaces (e.g., RGB, HSI, HSV, YIQ, YCbCr, and CMY) were surveyed to select the most efficient color model for segmentation. The best results were obtained with the RGB color space because of the presence of green objects in the frames. Equation (1) identifies a pixel as a plant if its green component (II(:,:,2)) is dominant over its blue (II(:,:,3)) and red (II(:,:,1)) components. Numerous studies have segmented green plants from the background using RGB indices [4,41]. Considering the camera motion speed (0.10 m/s) and natural conditions, an optimal threshold of 140 for the green channel of the RGB color space was adopted (from the range 20-250), according to the method described in Reference [42], by checking different images to balance the elimination of incomplete green components against the required processing time. Figure 2 shows sample frames on which the green components were segmented. For shape feature extraction, the result of this segmentation had to be converted to binary images, a process that inevitably generates unwanted noise and holes in the image. To solve this problem, a morphological closing operation was employed, connecting thin broken components and filling small holes [43]. A combination of dilation and erosion operations, used as a closing filter, softened the contours of the objects.
mask = (II(:,:,2) > II(:,:,1)) & (II(:,:,2) > II(:,:,3))    (1)
If G(x, y) ≥ 140, then pixel (x, y) is considered as the object (foreground); otherwise, it is considered as the background, where (x, y) is the position of the pixel under consideration, which in turn depends on both the sample image and the capture system used. A series of algorithms was used to extract information from the images and process them. In this context, masks were filters used to remove unwanted information and image noise. The growth stage of the rice on the BBCH scale ranged from leaf development to tillering (from one week after transplanting to the sixth week), and the weeds were in leaf development (after three leaves had unfolded). The water depth was 10 cm.
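As an illustration of this segmentation step (the study itself used MATLAB), the following Python/NumPy sketch applies the green-dominance rule and the 140 threshold described above, followed by a morphological closing; the 3 × 3 structuring element is an assumption for the example, not a parameter stated in the text.

```python
import numpy as np
from scipy.ndimage import binary_closing

def segment_green(frame, g_thresh=140):
    """Mask pixels whose green channel dominates red and blue and
    exceeds the empirically chosen threshold (140 in the study)."""
    r = frame[:, :, 0].astype(int)
    g = frame[:, :, 1].astype(int)
    b = frame[:, :, 2].astype(int)
    mask = (g >= g_thresh) & (g > r) & (g > b)
    # Morphological closing (dilation followed by erosion) fills small
    # holes and reconnects thin broken components, as described above.
    return binary_closing(mask, structure=np.ones((3, 3)))
```

Applied to a frame with a small hole inside a green region, the closing fills the hole while leaving non-green pixels untouched.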

Feature Extraction
As a common operation in machine learning, feature extraction has been studied by many researchers [4,26,27]. In principle, feature extraction is based on measuring the geometric properties (such as size and shape) and surface characteristics (such as color and texture) of the different objects in an image. In this research, in order to achieve accurate identification of the rice and weeds, a total of 302 color, shape, and texture features were extracted. Texture has frequently been used to classify and discriminate crops and weeds where leaf occlusion and overlapping are problematic [44]. Two texture-based feature extraction methods have been utilized in the relevant literature, based on either the gray-level co-occurrence matrix or histogram analysis. The former technique considers the gray-scale values of the pixels and seeks to capture repetitive patterns in an image [45]. The co-occurrence matrix C(i, j) counts the co-occurrences of pixels with gray values i and j at a distance d, defined as a length in a polar coordinate system along an orientation declared by θ. In this study, 146 texture features were extracted at four different neighborhood angles (θ = 0°, 45°, 90°, and 135°) and at a unit distance of 1. Histogram analysis was performed to extract two further texture features. Many researchers have used color characteristics for feature extraction as a key step of image processing [5,30]. Accordingly, 127 color features (based on the average and standard deviation of pixel values in each of the three channels of six color spaces (RGB, HSI, HSV, YIQ, CMY, and YCbCr), as well as vegetation indices [46]) were extracted in this study.
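To make the co-occurrence idea concrete, the following Python sketch builds C(i, j) for a chosen offset and derives one common texture statistic (contrast); it is a minimal illustration, not the study's MATLAB feature-extraction code.

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Gray-level co-occurrence matrix C(i, j): counts of pixel pairs
    with gray values i and j at offset (dy, dx). For example,
    (dy, dx) = (0, 1) corresponds to theta = 0 deg at distance 1."""
    C = np.zeros((levels, levels), dtype=int)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                C[img[y, x], img[y2, x2]] += 1
    return C

def contrast(C):
    """One texture statistic derived from the normalized matrix."""
    P = C / C.sum()
    i, j = np.indices(P.shape)
    return float(((i - j) ** 2 * P).sum())
```

Statistics such as contrast, energy, and homogeneity computed from C at the four angles yield the kind of texture features counted above.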
As a visual feature, shape can be used for accurate feature extraction when no occlusion or overlapping of the leaves exists [47,48]. In this study, 29 shape features were extracted for each object.

Effective Feature Selection
Not all of the features extracted from an image are equally informative. Indeed, some features may be noisy, correlated with other features, or even irrelevant. To save processing time in pattern recognition and increase classification accuracy, it is very important to select the most effective features from the entire pool of extracted features. For this purpose, several authors have used statistical techniques as well as ANN-based methods [18,49]. An ANN is a network of artificial neurons connected together to mimic the function of the human brain, designed to perform non-linear signal processing, classification, or regression. Thanks to its capability of learning complex functions (supervised learning from input samples with known desired outputs), an ANN can process information in parallel (e.g., identifying crop plants in field images) [50,51] and, once it has properly learnt from examples, can generalize to unseen input data. The present work proposes a hybrid artificial neural network-particle swarm optimization (ANN-PSO) algorithm to search for and choose the most relevant features from the entire pool of extracted features. The PSO is a bio-inspired computational algorithm that works on the basis of randomly initialized populations, called particle swarms. The particles move collectively within a search space in order to reach the optimal solution, resembling the flocking behavior of birds. The movement of the particles in the search space is governed by their velocities. An objective function is used to evaluate a fitness value for each particle, and the velocities and positions of the particles are iteratively updated until the goal is met.
To select the most significant features among the 302 extracted, the total set of input samples (objects) extracted from the frames was split into a training subset (70%), a validation subset (15%), and a test subset (15%) by uniform random selection. The PSO algorithm was used to form feature subsets of different sizes and pass them to a multi-layer perceptron (MLP) neural network. Table 1 shows the empirically tuned parameters of the MLP and the PSO. Table 1. The parameters of the multi-layer perceptron (MLP) and particle swarm optimization (PSO) for the hybrid artificial neural network (ANN)-PSO for selecting the most significant features.
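A minimal sketch of such a wrapper approach is shown below in Python. It is illustrative only: the fitness function here is a toy stand-in for the MLP validation accuracy used in the study, and the inertia and acceleration constants are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_pso(fitness, n_features, swarm_size=30, iterations=20):
    """Minimal binary-PSO wrapper for feature selection (illustrative,
    not the exact formulation used in the study). Each particle is a
    0/1 mask over features; `fitness` scores a mask (higher is better)."""
    X = rng.integers(0, 2, (swarm_size, n_features))   # positions (masks)
    V = rng.uniform(-1, 1, (swarm_size, n_features))   # velocities
    pbest, pbest_f = X.copy(), np.array([fitness(x) for x in X])
    g = pbest[pbest_f.argmax()].copy()                 # global best
    w, c1, c2 = 0.7, 1.5, 1.5                          # assumed constants
    for _ in range(iterations):
        r1, r2 = rng.random(X.shape), rng.random(X.shape)
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (g - X)
        # Sigmoid transfer: velocity sets the probability of a 1-bit.
        X = (rng.random(X.shape) < 1 / (1 + np.exp(-V))).astype(int)
        f = np.array([fitness(x) for x in X])
        better = f > pbest_f
        pbest[better], pbest_f[better] = X[better], f[better]
        g = pbest[pbest_f.argmax()].copy()
    return g

# Toy fitness standing in for MLP validation accuracy: reward masks
# close to a hypothetical "ideal" subset of 6 out of 20 features.
target = np.zeros(20, dtype=int); target[:6] = 1
best = binary_pso(lambda m: -np.abs(m - target).sum(), 20)
```

In the study, the role of the toy fitness was played by the validation performance of the MLP trained on the candidate subset.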

MLP Parameters                        PSO Parameters
One input layer                       Swarm size: 30
One hidden layer with 10 neurons      Maximum iteration: 20
One output layer with 3 outputs       Inertia weight damping ratio:

Classification
The last step in applying computer vision to weed and rice detection is classification: the process of analyzing various properties of the image features and clustering them into predefined classes [52].
Different classifiers have been used for image classification, such as decision trees, ANNs, and SVMs. Classification accuracy is directly associated with the choice of the classifier. Of these classifiers, an ANN was utilized in this study because of the following advantages:

• High computation speed;
• Ability to efficiently handle noisy inputs;
• Data-driven nature, thanks to learning from the training data.
An ANN is formed of successive layers, each composed of a set of neurons. Weighted connections link the neurons of each layer to all neurons of the preceding and succeeding layers. The first layer (input layer) receives inputs from the environment, and the last layer (output layer) provides the processed data. The artificial intelligence (AI) incorporated into the proposed classifier in this research was an MLP neural network, a technique inspired by biological networks of neurons. This classifier includes a series of parameters (the number of neurons in each layer, transfer functions, back-propagation weight and bias learning function, and back-propagation training function), which together define the network structure. To optimize the network parameters and successfully execute the ANN, the bee algorithm (BA), with multiple iterations, was used in this study. The BA is an optimization algorithm inspired by the honey bee's foraging behavior [53], in which a group of scout bees fly randomly from one patch to another in the field. After returning to the hive, they perform a dance [54] that conveys information about a flower patch, such as the direction toward it, the distance to it, and the quality of the found flowers [55]. This information helps other bees watching the dance to find the flowers without further guidance; in this way, the colony gathers food quickly. For performance evaluation of the proposed ANN-BA, the K-nearest neighbors (KNN) classifier was used as a reference for comparing the classification results. The KNN is a supervised machine-learning algorithm for statistical classification. It is a popular, simple, and easy-to-use classifier [56] that stores the training data and classifies test data by comparing them to the training data.
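The foraging behavior described above translates into a simple search procedure: scouts sample the space at random, and recruited bees refine the best sites within shrinking neighborhoods. The Python sketch below minimizes a plain test function for illustration; in the study the optimized quantity was the MLP parameter set, and all constants here (numbers of scouts, elite sites, recruits, patch size) are assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

def bee_algorithm(f, lo, hi, n_scouts=20, n_best=5, n_recruits=10,
                  patch=0.1, iterations=50):
    """Minimal bees-algorithm sketch (minimization). Scout bees sample
    the space at random; recruited bees search shrinking patches around
    the best sites, mimicking flower-patch exploitation."""
    dim = len(lo)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    sites = rng.uniform(lo, hi, (n_scouts, dim))
    for _ in range(iterations):
        sites = sites[np.argsort([f(s) for s in sites])]
        new = []
        for s in sites[:n_best]:            # local (flower-patch) search
            cand = s + rng.uniform(-patch, patch, (n_recruits, dim)) * (hi - lo)
            cand = np.clip(cand, lo, hi)
            best = min(cand, key=f)
            new.append(best if f(best) < f(s) else s)   # keep the elite
        # Remaining bees scout new random sites (global search).
        new.extend(rng.uniform(lo, hi, (n_scouts - n_best, dim)))
        sites = np.array(new)
        patch *= 0.95                        # shrink the neighborhoods
    return min(sites, key=f)

# Minimize a toy quadratic with optimum at (3, 3):
best = bee_algorithm(lambda x: float(((x - 3.0) ** 2).sum()), [-10, -10], [10, 10])
```

In the hybrid ANN-BA, a candidate "site" would encode the MLP parameter values, and f would be the network's classification error.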
In order to evaluate the capabilities of the proposed classifiers in predicting the respective classes, 70% of the input data was selected for training and validation of the network, and the remaining 30% was used for testing the accuracy of the classifier.

Proposed System for the Classification of Rice and Weed Plants Inside Rice Fields
To achieve high-accuracy discrimination of rice and weeds based on stereo vision under natural illumination, four categories of data were processed and classified. In addition to classifying the left and right channel data separately, the arithmetic and geometric means of the corresponding features on the two channels were calculated and classified. Finally, the classification results of the right channel, left channel, arithmetic mean, and geometric mean data were compared to select the best classification scheme. For this purpose, the pre-processing, segmentation, effective feature extraction, and classification steps were performed for all four data sets. Figure 3 presents the complete flowchart of the proposed system for rice and weed classification.

Arithmetic and Geometric Means
Arithmetic and geometric means are two commonly used measures that differ in their method of calculation. The arithmetic mean (or simply the mean) is calculated by adding up all the numbers in the dataset and dividing the result by the number of data points, while the geometric mean is calculated by multiplying the numbers in the dataset and taking the nth root of the result, where n is the number of data points [57]. In this study, after extracting the features of the left and right channels and finding the corresponding points, the arithmetic and geometric means of the corresponding feature values were calculated.
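For corresponding feature values from the two channels (n = 2), the two means can be computed as follows; the sketch is a plain Python illustration of the definitions above, with example values.

```python
import math

def arithmetic_mean(values):
    # Sum of the values divided by their count.
    return sum(values) / len(values)

def geometric_mean(values):
    # nth root of the product; defined for positive values only.
    return math.prod(values) ** (1 / len(values))

# A feature measured on corresponding points of the right and left
# channels, e.g. right = 4.0 and left = 9.0:
am = arithmetic_mean([4.0, 9.0])   # 6.5
gm = geometric_mean([4.0, 9.0])    # 6.0 (AM >= GM always holds)
```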

Effective Feature Extraction with ANN-PSO
The results of effective feature selection by the hybrid ANN-PSO algorithm for the two channels and for the arithmetic and geometric means are shown in Table 2. As shown in this table, six features were selected from the pool of 302 extracted features, and most of the selected features were either color or texture features. Table 3 presents the definitions of the selected features in the four categories.


Description                                                     Selected Feature Name
Excess yellow from the YIQ color space                          EXY-YIQ
Elongation = (L − W)/(L + W), where L = length and W = width    Elongation feature
                                                                Ng(i, j)

Classification Using Hybrid Metaheuristic Algorithms
As mentioned earlier, the aim of using the BA was to optimize the parameters of the MLP neural network. To this end, the training process was iterated for 1000 cycles for all four categories. The specifications of the MLP used and the optimal values of its parameters are shown in Table 4.

Classification Using Hybrid ANN-BA
The results of the hybrid ANN-BA classification of the test set for the left channel, right channel, arithmetic mean, and geometric mean into three classes (Class 1: rice, Class 2: narrow-leaf weeds, and Class 3: wide-leaf weeds) are tabulated, in the form of the best confusion matrix and the overall accuracy, in Table 5. A confusion matrix indicates the performance of a classification model (typically a supervised-learning model) on the test dataset by comparing the predicted classes (columns) against the actual ones (rows). The results show that the proposed classifier could correctly classify the rice in all four categories. Misclassification was much more frequent for the narrow-leaf weed class than for the other classes, due to the close similarity between the first and second classes. Misclassification of the wide-leaf weed class as the first (rice) or second (narrow-leaf weeds) class, and vice versa, could be a result of moving the camera in the field and recording the video under natural light conditions (Video S1 and Video S2). As shown in the tables, the overall accuracy of the ANN-BA classifier reached 88.74% and 87.96% for the right and left channel data (Video S3), respectively. Taking the arithmetic and geometric means as a basis, accuracies of 92.02% and 90.7% were obtained, respectively. In fact, the classification accuracy increased when the process was based on the arithmetic and geometric means of the corresponding points from the right and left channels. This result shows the inadequacy of a single 2D video recording for classification under natural field conditions.
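The overall accuracy reported above is the trace of the confusion matrix divided by its total, as the short sketch below illustrates; the matrix shown is a hypothetical 3-class example, not the actual values of Table 5.

```python
import numpy as np

def overall_accuracy(cm):
    """Overall accuracy from a confusion matrix whose rows are actual
    classes and whose columns are predicted classes."""
    cm = np.asarray(cm)
    return cm.trace() / cm.sum()

# Hypothetical counts for (rice, narrow-leaf, wide-leaf), illustration only:
cm = [[50, 0, 0],
      [6, 40, 4],
      [1, 3, 46]]
acc = overall_accuracy(cm)   # 136/150
```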
On the other hand, higher classification accuracy was obtained with the arithmetic mean (Video S4) than with the geometric mean (Video S5). The geometric mean is defined only for positive values and never exceeds the arithmetic mean, while the arithmetic mean covers both positive and negative values. This finding further indicates that there were no outliers in the dataset.

Classification Using KNN Classifier
To investigate the performance of the proposed ANN-BA classifier, the KNN classifier was used as a reference for comparison. The results of applying the KNN classifier to the test set for the left channel, right channel, arithmetic mean, and geometric mean, with three output classes (1: rice, 2: narrow-leaf weeds, and 3: wide-leaf weeds), are presented in Table 6. The results show that, just like the proposed ANN-BA classifier, the KNN classifier classified the first and third classes more accurately than the second class. Indeed, misclassification of rice as the second class was the most frequent problem, confirming the similarity of rice and narrow-leaf weeds, as explained previously. In total, the KNN generated more cases of misclassification in all three classes than the proposed ANN-BA classifier, returning classification accuracies of 76.62%, 85.59%, 85.84%, and 84.07% for the right channel, left channel, arithmetic mean, and geometric mean data, respectively. This comparison highlights the greater capability of the machine-learning classification method compared to the statistical method: statistical methods are based merely on mathematical equations and background assumptions, while machine learning, by learning from training examples, resembles human cognition. To check the reliability of the proposed classifier, the training process was iterated for 1000 cycles, and the mean and standard deviation (STD) of the accuracy were calculated. Table 7 reports the results for the proposed hybrid ANN-BA and the KNN classifier in each of the four categories. The results indicate low STD values for both classifiers for the three classes in the four categories, confirming proper training of the classifiers. The higher mean accuracy of the proposed hybrid ANN-BA, as compared to the KNN classifier, indicates better training performance of the ANN-BA classifier.
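For reference, the KNN decision rule is simple enough to state in a few lines; the sketch below is a generic Euclidean majority-vote implementation with toy data, not the exact configuration used in the study (whose k value is not stated here).

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Plain K-nearest-neighbors vote: find the k training samples
    closest to x (Euclidean distance) and return the majority label."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[counts.argmax()]

# Toy 2D feature vectors; labels 1 and 2 are hypothetical class codes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y = np.array([1, 1, 2, 2])
label = knn_predict(X, y, np.array([0.05, 0.1]))
```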

Classification Performance Evaluation by Receiver Operating Characteristic (ROC) Curves
Performance evaluation is an essential task in machine learning. The receiver operating characteristic (ROC) curve is a graphical representation of the performance of a classifier over all possible thresholds. The Y-axis and X-axis of an ROC curve represent the true positive rate (TPR) and the false positive rate (FPR), respectively. Since the TPR and FPR correspond to "sensitivity" and "1 − specificity", respectively, the ROC curve is sometimes described as the "sensitivity vs. 1 − specificity" plot. Each prediction result of the confusion matrix represents a point on the ROC curve. In the ROC domain, the point (0, 1) represents perfect classification. The points from random guessing lie on the diagonal line running from the bottom left to the top right corner of the ROC domain. Points above the diagonal indicate good (better than random) classification results, while those below the line indicate poor (worse than random) results. To evaluate classifier performance more accurately, the ROC-best curve was studied in this work in addition to the ROC curve. Figure 4 shows the classification performance of the two classifiers on the test datasets from the right and left channel cameras and the arithmetic and geometric means, in the form of ROC and ROC-best curves. Based on this figure, the ANN-BA classifier outperformed the KNN classifier in all four categories. Moreover, both classifiers returned better results with the arithmetic and geometric mean data than with either the right or left channel data alone. To confirm this, the areas under the curves were calculated (Table 7), showing larger values for the proposed hybrid ANN-BA classifier than for the KNN classifier for all three classes in all four categories. Given the novelty of the proposed methodology, we were unable to find similar approaches against which to compare our results directly; we therefore compared them with related studies.
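The TPR/FPR coordinates that place a confusion-matrix result in ROC space follow directly from the definitions above, as this short sketch shows; the counts are invented for illustration.

```python
def roc_point(tp, fp, fn, tn):
    """One classifier operating point in ROC space:
    TPR = sensitivity = TP / (TP + FN);
    FPR = 1 - specificity = FP / (FP + TN)."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return fpr, tpr

# A point above the chance diagonal (TPR > FPR) beats random guessing:
fpr, tpr = roc_point(tp=45, fp=5, fn=5, tn=45)   # (0.1, 0.9)
```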
Cheng and Matson (2015) used feature-based methods for rice and weed identification. In their research, the Harris corner detection algorithm was applied to find characteristic points, such as the tips and ears of the leaves. Multiple features were extracted for each point and fed into three machine-learning algorithms (decision tree, SVM, and neural network) to distinguish between weeds and rice. The decision tree classifier returned an accuracy of 98.2%, while the corresponding figures for the SVM and naive Bayes classifiers were 95.3% and 93.1%, respectively. They also used a clustering algorithm for noise removal and for grouping the most similar objects into three clusters. Since they worked on presumably edited or processed images retrieved from the Internet, rather than on photos taken in the field, their accuracies were artificially high. Sabzi et al. (2018a) classified potato plants and three different kinds of weeds with a novel computer vision system. They acquired 2D video images under outdoor lighting conditions and used them to train and test two metaheuristic algorithms (ANN-CA for selecting effective features and ANN-HS for classification) to optimize the performance of a neural network classifier. They further compared their results against the KNN, a statistical classifier. The results of this study showed that the proposed expert system achieved a high accuracy of 98.38%, compared to the 88.25% accuracy of the KNN classifier [27]. In another study by Sabzi et al. (2018b), a digital camera was used as the video acquisition system for classifying potato plants and three weed types using three hybrid ANN classifiers based on an ant colony algorithm, a radial basis function ANN, and discriminant analysis, leading to ultimate accuracies of 98.13%, 91.23%, and 70.8%, respectively [58].
A comparison between these results and those of the methodology proposed in the present study revealed the capability of the proposed hybrid ANN-BA classification algorithm. The aforementioned studies were conducted in dry fields with a low density of weeds and crops, while the present study focused on a densely cultivated rice field under special conditions (moving the video capturing device on wet soil), where high accuracy could not be expected from a single digital camera (either the right or the left channel). This highlights the efficiency of using a pair of cameras (stereo vision) for increasing the classification accuracy. The average time for image acquisition and pre-processing operations was 0.233 s. The feature extraction stage needed 0.175 s and the classification time was 0.164 s on average, resulting in an average total time of 0.633 s per frame.
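The channel-fusion step referenced above, combining corresponding points from the right and left channels via their arithmetic and geometric means, can be sketched as follows; the feature values are hypothetical and serve only to illustrate the two combination rules:

```python
import numpy as np

# Hypothetical feature values extracted at corresponding points in the
# right- and left-channel frames of a stereo pair.
right = np.array([0.40, 0.90, 0.16])
left  = np.array([0.10, 0.40, 0.25])

# Arithmetic mean of corresponding points: (R + L) / 2
arith = (right + left) / 2.0

# Geometric mean of corresponding points: sqrt(R * L),
# well defined here since the feature values are non-negative.
geom = np.sqrt(right * left)
```

Feeding such fused values to the classifier, instead of either channel alone, is what produced the higher accuracies reported for the arithmetic- and geometric-mean categories.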
Considering that discriminating weeds from the rice crop is the first stage of site-specific weed management, the results of this study showed that the proposed methodology with the hybrid ANN-BA classification algorithm performed well in discriminating rice and weeds (Tables 8 and 9). Finally, Figure 5 shows the results of the proposed computer vision expert system in the segmentation and classification of rice and weeds in the four categories for one frame. It can be observed that the system detected and classified all existing plants more accurately in the arithmetic mean (Video S6) and geometric mean (Video S7) categories than in the left channel (Video S8) and right channel (Video S9) categories.

Conclusions
A new stereo vision-based method for classifying weeds and crop across a densely cultivated rice field was developed. To this end, stereo videos were recorded in the rice field and decomposed into right and left channel data. All weeds in the rice field were classified as either narrow-leaf or wide-leaf weeds. To enhance the classification accuracy, two metaheuristic algorithms, namely PSO and BA, were used to optimize the performance of the neural network for selecting the most effective features and for classification, respectively. The results of this classification method were compared to those of the KNN classifier on a test set consisting of right channel and left channel data and the arithmetic and geometric means of the corresponding points on the two channels. The results demonstrated the promising capability of the proposed stereo vision technology, of averaging the corresponding points on the different channels, and of the proposed hybrid ANN-BA classifier for increasing classification accuracy. Future research may evaluate the proposed methodology on other varieties of rice and on crops cultivated at different densities, under different farm conditions.

Funding: This study was financially supported by the University of Mohaghegh Ardabili.