Recognition of Bloom/Yield in Crop Images Using Deep Learning Models for Smart Agriculture: A Review

Precision agriculture is a crucial way to achieve greater yields by utilizing natural resources in a diverse environment. The yield of a crop may vary from year to year depending on variations in climate, soil parameters and the fertilizers used. Automation in the agricultural industry moderates the usage of resources and can increase the quality of food in the post-pandemic world. Agricultural robots have been developed for crop seeding, monitoring, weed control, pest management and harvesting. Physically counting fruitlets, flowers or fruits at various phases of growth is a labour-intensive and expensive procedure for crop yield estimation. Remote sensing technologies offer accuracy and reliability in crop yield prediction and estimation. Automating image analysis with computer vision and deep learning models provides precise field and yield maps. In this review, it has been observed that the application of deep learning techniques has provided better accuracy for smart farming. The crops considered in the study include fruits such as grapes, apples, citrus and tomatoes, as well as vegetables and field crops such as sugarcane, corn, soybean, cucumber, maize and wheat. The research works surveyed in this paper are available as products for applications such as robot harvesting, weed detection and pest infestation detection. The methods that used conventional deep learning techniques provided an average accuracy of 92.51%. This paper elucidates the diverse automation approaches for crop yield detection, covering visual analysis and classifier approaches. The technical hurdles and limitations of deep learning techniques, together with future lines of investigation, are also surveyed. This work highlights the machine vision and deep learning models that need to be explored to improve automated precision farming, especially during this pandemic.


Introduction
Smart farming helps farmers plan their work with data obtained from agricultural drones, satellites and sensors. Detailed topography, climate forecasts, and the temperature and acidity of the soil can be accessed through sensors positioned on agricultural farms. Precision agriculture provides farmers with compiled statistics to:
• create an outline of the agricultural land
• detect environmental risks
• manage the usage of fertilizers and pesticides
• forecast crop yields
• organize the harvest
… was developed considering light variations. The machine vision system changed its color threshold in response to changes in light intensity [16]. Robot harvesting machines achieve lower accuracy in spotting and picking crops due to occlusions caused by leaves and twigs [17,18]. Modern machine vision techniques and machine learning models with assorted sensors and cameras can overcome these limitations. The basic system of a robot harvester must perform functions such as: detect the fruit or detect the disease, pick the fruit/berry without damaging it, guide the harvester to navigate the field, maneuver irrespective of the lighting and weather conditions, be cost-effective and have a simple mechanical design [19]. As this is a review paper, we have extensively surveyed the merits and demerits of the deep learning techniques used in smart agriculture. A keyword-based search was performed for transactions, journal and conference papers indexed in databases such as IEEE Xplore, Scopus, Wiley Online Library and ScienceDirect. We used "machine vision" and "deep learning techniques in agriculture" as keywords and filtered the papers for various agricultural applications. This review intends to help researchers further explore machine vision techniques and the various deep learning classifiers used in smart farming.
The outline of our review is as follows: Section 2 discusses the various image acquisition approaches at ground level and from an aerial view. The modes of automation for diverse agricultural applications are surveyed in Section 3. Section 4 deals with the image preprocessing techniques involved in enhancing the raw images. Section 5 lists the image segmentation approaches for distinguishing fruit/flower pixels from background pixels. Section 6 emphasizes the selection and extraction of features for deep classifier models. Section 7 discusses further literature on the classifiers used in deep learning models for various agricultural applications, followed by the available datasets. Section 8 expands the survey with the performance metrics used to compare existing and proposed algorithms. Section 9 discusses the pros and cons of the existing approaches. Section 10 concludes the survey with a discussion of potential future work.

Materials and Methods
Many studies on machine vision and deep learning models for fruit and flower detection, counting and harvesting have been carried out. Accurate yield estimation for diverse vegetable and fruit crops is essential for better harvesting, marketing and logistics planning. Bloom intensity estimation provides crop yield predictions, and fruit detection with machine vision techniques facilitates yield estimation. Accurate yield prediction helps farmers improve the quality of the crop at an early stage.
This review deals with diverse research issues in agricultural automation such as: image acquisition using handheld cameras under different lighting conditions; approaches employing image segmentation techniques; identification of features with various descriptors; improving the classification rate with deep learning models; achieving high accuracy and reducing the error rates; the essential challenges to be tackled in the future.
The outcome of this research is intended to reduce labor and time consumption and to support farmers in precision farming with cost-effective machinery. The stages of the literature review are shown in Figure 1. Several publicly available datasets were used in this study. The Apple A dataset provides a collection of apple flower images recorded with a hand-held camera, and the Apple B dataset provides a collection of apple flower images taken from a utility vehicle. Figure 2a presents the distribution of papers taken for the literature review. The bar chart shows that 50% of the papers studied were published in 2018, 2019 and 2020. Figure 2b presents the distribution of agriculture-related articles in each journal taken for the literature study.

Crop Image Acquisition
Machine vision systems attempt to provide automatic image-based analysis and inspection data for guiding and controlling machines, integrating available methods in innovative ways to solve real-time problems [20]. So-called agrobots execute their farm duties using machine learning technology and robot vision systems [21]. The mapping, navigation and detection needed for an autonomous agrobot to plan and control its execution can be achieved with these machine vision algorithms [22]. A machine vision system draws on an assortment of image processing technologies such as filtering, thresholding, segmentation, color-texture-shape analysis, pattern recognition, edge detection, blob detection and diverse neural network processing models. Machine learning models combined with machine vision techniques enable the automated system to perform farm duties more precisely.
2.1.1. Crop Image Acquired by Cameras at Ground Level
RGB images are captured with cameras on a small or large scale depending on the area of the field. The images of the plants are captured using common digital cameras with high image resolution [23]. The recordings were taken under various environmental conditions, with lighting that included natural sunlight with or without shadows, and artificial illumination or infrared lighting at night. Fusion of multimodal RGB and near-infrared (NIR) images acquired throughout the day and at night was used for fruit detection [24]. To enhance fruit detection, pixel-based fusion techniques such as the Laplacian pyramid transform (LPT) and fuzzy logic were tested. On grey images, fuzzy logic performed better than LPT according to image fusion indices, with a segmentation success rate of 0.89 for fuzzy logic versus 0.72 for LPT. Fusion of a visible image with a thermal infrared image also enhanced fruit detection [25]. RGB images were analyzed to estimate the chlorophyll content of potato plants [26] and the maturity of tomato plants. Image acquisition cameras provide high-resolution real-time pictures that are further processed according to the requirements of the machine. Artificial active lighting, such as a ring flash fastened around the lens, enhances the system [23]. Image acquisition at the bloom (flowering) stage enabled crop yield prediction using image processing algorithms [27]. The images acquired for training data need to be chosen carefully. The selection of the neural network architecture, with respect to its performance, should not be affected by the size of the training dataset.
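The pixel-level fusion of co-registered RGB and NIR images with a Laplacian pyramid transform can be sketched as follows. This is a minimal NumPy illustration, not the implementation evaluated in [24]: it assumes two aligned single-channel images of equal size whose sides are divisible by 2^levels, replaces the usual Gaussian blur with 2 × 2 block averaging, keeps the larger-magnitude detail coefficient at each level and averages the coarsest level. All function names are hypothetical.

```python
import numpy as np


def downsample(img):
    """Halve each side by 2x2 block averaging (a simple stand-in for Gaussian blur + decimation)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(img):
    """Double each side by nearest-neighbour repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)


def laplacian_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, base]."""
    pyramid, current = [], img.astype(float)
    for _ in range(levels):
        smaller = downsample(current)
        pyramid.append(current - upsample(smaller))  # band-pass detail at this scale
        current = smaller
    pyramid.append(current)  # coarsest low-pass level
    return pyramid


def reconstruct(pyramid):
    """Invert laplacian_pyramid by upsampling and adding the details back."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = detail + upsample(current)
    return current


def fuse_lpt(img_a, img_b, levels=3):
    """Fuse two aligned grey images (e.g. RGB luminance and NIR) in the Laplacian domain."""
    if img_a.shape != img_b.shape or any(s % (2 ** levels) for s in img_a.shape):
        raise ValueError("images must match and have sides divisible by 2**levels")
    pa, pb = laplacian_pyramid(img_a, levels), laplacian_pyramid(img_b, levels)
    # Keep the stronger detail (edge) coefficient from either modality ...
    fused = [np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(pa[:-1], pb[:-1])]
    # ... and average the coarse illumination level.
    fused.append((pa[-1] + pb[-1]) / 2.0)
    return reconstruct(fused)


rng = np.random.default_rng(0)
rgb_luma = rng.random((64, 64))  # placeholder for a greyscale RGB frame
nir = rng.random((64, 64))       # placeholder for the co-registered NIR frame
fused = fuse_lpt(rgb_luma, nir)
print(fused.shape)  # (64, 64)
```

Reconstruction is exact for any choice of down/upsampling operator, because each detail level stores exactly what upsampling loses. A fuzzy-logic fusion rule, which performed better in the cited comparison, would replace the max-absolute selection step.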
The machine vision techniques encounter specific problems owing to the configuration of the agricultural fields for image acquisition, such as:
♦ Natural illumination to detect the fruit/berries on the plant [28,29]
♦ Multiple recognition instances of the same fruit, acquired from subsequent images, that may perhaps lead to miscounting [30]
♦ Occlusion of fruits due to foliage, twigs, branches or additional fruit [14,31]
♦ Location of camera with respect to distance and angle [32]
Sample dataset images of tomato crop phenology are shown in Figure 3.

Crop Image Acquisition by Remote Sensing
The images obtained through Landsat 8 OLI have low resolution, so a pan-sharpening technique is applied before calculating the vegetation indices [33][34][35]. The multispectral and hyperspectral images acquired through remote sensing were used to monitor seasonally variable crop and soil features such as crop diseases, crop biomass, leaf nitrogen content, weed and insect infestation, leaf chlorophyll levels, moisture content, surface roughness, soil texture and soil temperature. Satellite images have lower spatial resolution than images acquired with drones or in situ. Effective calibration of the acquired remote sensing data requires cloud-scattering, radiometric, atmospheric and geometric corrections with assorted techniques [8]. Gaps in remote sensing data may occur due to cloud cover during the satellite overpass. Corn yield estimation was achieved with deep learning techniques using various data from MODIS products, such as leaf area index (LAI), gross primary production (GPP), fraction of photosynthetically active radiation (FPAR), evapotranspiration (ET), soil moisture (SM) and enhanced vegetation index (EVI) [36][37][38][39][40]. Yield prediction was more effective with RGB data than with normalized difference vegetation index (NDVI) images [41].
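The two vegetation indices named above can be computed per pixel from surface-reflectance bands. This sketch assumes reflectances scaled to 0-1 and uses the standard NDVI definition and the MODIS EVI coefficients (G = 2.5, C1 = 6, C2 = 7.5, L = 1); the function names and sample reflectances are illustrative, not data from the cited studies.

```python
import numpy as np


def ndvi(nir, red, eps=1e-9):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (nir - red) / (nir + red + eps)  # eps guards against division by zero


def evi(nir, red, blue, g=2.5, c1=6.0, c2=7.5, l_adj=1.0):
    """Enhanced vegetation index with the MODIS coefficients."""
    nir, red, blue = (np.asarray(b, float) for b in (nir, red, blue))
    return g * (nir - red) / (nir + c1 * red - c2 * blue + l_adj)


# Hypothetical reflectances: a dense-canopy pixel and a sparse/bare-soil pixel
nir = np.array([0.50, 0.30])
red = np.array([0.10, 0.25])
blue = np.array([0.05, 0.15])
print(ndvi(nir, red).round(3))
print(evi(nir, red, blue).round(3))
```

Both indices rise with the contrast between near-infrared and red reflectance; EVI's blue-band and soil terms make it less prone to saturation over dense canopies and less sensitive to atmospheric and soil-background effects.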

Unmanned Ground Vehicles (UGVs)
Autonomous ground vehicles have been trained, reconfigured and reassigned for numerous operations, including tree pruning and blossom-stage tasks such as fruit thinning, as well as mowing, pest spraying, sensing, fruit harvesting and post-harvest handling. An autonomous prime mover could complete 300 km of driving in orchards with no supervision, reducing labor costs [42]. For row detection among small plants, an autonomous navigation technique fits a pattern using a Hough transform to estimate the row spacing and lateral offset from real-time field data [28,43]. Localization, mapping, path planning and agricultural field information from miscellaneous sensors guide autonomous ground vehicles in performing various farm duties. Obstacle-avoidance decisions in agricultural terrain are complicated. The fusion of sensors with multiple algorithms and multiple robots detects defined obstacles and controls navigation through the agricultural environment [44][45][46]. The automation system increases farm efficiency with self-guided vehicles and autonomous execution of farm duties such as spraying, pruning, mowing, thinning and harvesting. A phenotyping robot can be fast and precise; a next-best view (NBV) algorithm collects information about unknown obstacles to plan plant phenotyping automatically [47].
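The Hough-transform row detection mentioned above can be illustrated with a small voting accumulator. This NumPy sketch is not the method of [28,43]; it assumes a binary vegetation mask has already been produced (for example by a color threshold). Each foreground pixel votes for every line rho = x cos(theta) + y sin(theta) passing through it, and peaks in the accumulator correspond to candidate crop rows. The function name and synthetic mask are hypothetical.

```python
import numpy as np


def hough_lines(mask, n_theta=180):
    """Vote in (rho, theta) space for every foreground pixel of a binary mask."""
    ys, xs = np.nonzero(mask)
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)  # 0 <= theta < 180 degrees
    diag = int(np.ceil(np.hypot(*mask.shape)))                 # largest possible |rho|
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((rhos.size, n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in zip(xs, ys):
        r = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[r + diag, cols] += 1  # one vote per theta; column indices are unique
    return acc, rhos, thetas


# Synthetic mask with one vertical crop row at x = 5
mask = np.zeros((20, 20), dtype=bool)
mask[:, 5] = True
acc, rhos, thetas = hough_lines(mask)
print(acc.max(), acc[np.where(rhos == 5)[0][0], 0])  # 20 20
```

For several parallel rows, the rho values of the accumulator peaks at a shared theta give each row's lateral offset from the image origin, and the differences between consecutive peaks give the row spacing.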
The autonomous harvesting process entails the detection of targets by the machine vision model and the planning of sequential tasks for grasping the real targets with manipulators or grippers. A sugarcane harvester combined with a machine vision algorithm detects damaged billets, thereby increasing the quality of production [48]. An autonomous rice harvester combined with a robot performed harvesting, unloading and restarting with adequate accuracy [49]. Autonomous harvesting grippers with machine vision locate targets such as peduncles for various crops and remove leaves and stems as obstacles, improving the harvesting system [50]. A manipulator (Jaco arm) can trim a bush into three shapes, with the navigation system's route formulated as a generalized travelling salesman problem (GTSP) [51]. Reducing the harvesting cycle time plays a vital role in the robotics industry. The review showed that a kiwifruit harvester could achieve the shortest cycle time of 1 s owing to a low-cost and effective manipulator [52,53]. Damage caused while grasping the fruit is one of the main concerns with manipulators [54]. An arc-shaped finger was designed and demonstrated to reduce injury to the cortex of apples.
Autonomous robots are widely used in weed detection and management [55]. Agricultural robots are quite expensive and are not widely used due to safety concerns and mechanical and industrial limitations. A pest sprayer robot reduces the exposure of human workers to pesticides, lowering medical hazards [56][57][58]. A high-resolution camera for the machine vision system, along with accurate sensors and an increased number of manipulators working in parallel with human collaboration, can advance the agricultural automation industry [59]. Dual-arm manipulators in a single harvester, with one gripper dedicated to moving obstacles aside, successfully picked strawberries [16]. An autonomous robot can navigate straight or curved rows without hundreds of programmed waypoints by using a 2D laser scanner [46]. A 3D simulation with real-time geographic coordinates and a first-order approximation model was designed for a skid-steering autonomous robot [60]. The dynamic and kinematic constraints of a path-planning robot were successfully applied to an autonomous machine for yield prediction and harvest-scheduling path planning [61]. Motion control of the manipulator has been realized with the TRAC-IK kinematic solver; other tools used include the ROS MoveIt motion-planning framework and the lazy PRM planner [62].

Unmanned Aerial Vehicles (UAVs)
Drones or unmanned aerial vehicles mounted with RGB-NIR cameras provide data with high spatial resolution. By flying at low altitudes, UAVs can capture tree crowns and plants in multispectral images [63]. Agricultural drones provide a bird's-eye view with multispectral images and survey the field periodically to provide information about the crops. Aerial images are acquired by drones equipped with a high-definition 4K RGB camera and an attached GPS [64]. Depending on the agricultural application, UAV platforms with diverse embedded technologies have been commercialized. UAVs can fly at lower or higher altitudes, depending on the requirements of the monitoring task at hand [65].
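The trade-off between flying altitude and spatial resolution can be made concrete with the standard ground sampling distance (GSD) relation for a nadir-looking camera over flat terrain: GSD = (sensor width × altitude) / (focal length × image width). The camera parameters below are hypothetical example values, not those of the drones cited above.

```python
def ground_sampling_distance(sensor_width_mm, focal_length_mm, image_width_px, altitude_m):
    """Ground distance in metres covered by one pixel (nadir view, flat terrain assumed)."""
    return (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)


# Hypothetical camera: 13.2 mm sensor width, 8.8 mm focal length, 5472 px image width
for altitude in (30, 60, 120):
    gsd_cm = 100 * ground_sampling_distance(13.2, 8.8, 5472, altitude)
    print(f"{altitude:>4} m altitude -> {gsd_cm:.2f} cm/pixel")
```

Because GSD grows linearly with altitude, halving the altitude halves the ground footprint of each pixel, which is why low-altitude flights can resolve individual flowers or fruits while high-altitude flights cover more area per image.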

Enhancement of Captured Crop Images for Bloom/Yield Detection
The enhancement process reduces noise or blur in an image. Techniques such as bilinear, nearest-neighbour and bicubic interpolation, iterative curvature-based interpolation, histogram equalization and linear/non-linear filtering have been employed in applications such as resizing, image reduction, image registration, zooming and the correction of spatial and geometric distortions.
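Histogram equalization, one of the enhancement techniques listed above, remaps grey levels through the image's cumulative histogram so that a low-contrast image spreads over the full intensity range. A minimal NumPy sketch for an 8-bit single-channel image; the function name and the sample patch are illustrative.

```python
import numpy as np


def equalize_hist(img):
    """Classic histogram equalization for a 2-D uint8 image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest grey level present
    denom = cdf[-1] - cdf_min
    if denom == 0:                     # constant image: nothing to stretch
        return img.copy()
    lut = np.round((cdf - cdf_min) / denom * 255).clip(0, 255).astype(np.uint8)
    return lut[img]                    # apply the look-up table to every pixel


# Hypothetical low-contrast 8x8 patch using only grey levels 100..103
patch = np.array([[100, 101], [102, 103]], dtype=np.uint8).repeat(4, axis=0).repeat(4, axis=1)
out = equalize_hist(patch)
print(patch.min(), patch.max(), "->", out.min(), out.max())
```

For color images, equalization is usually applied to a luminance channel (for example V in HSV) rather than to R, G and B separately, to avoid shifting hues.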

Resolution Enhancement
Real-time captured color images are resized with techniques such as bicubic or bilinear interpolation for computational simplicity. Bicubic interpolation resamples images using the 16 (4 × 4) nearest pixels on a 2D grid to produce a smooth scaled image. In remote sensing applications, a process based on a digital terrain model improves the accuracy and efficacy of satellite data [66]. Resizing is necessary to unify the dimensions of the samples. Classification effectiveness can be ensured by increasing the resolution of satellite images through a pan-sharpening technique. The accuracy of crop yield estimation relies upon both the number and the size of the samples, and the size of the input image for classification remains a crucial parameter. The average size of the feature templates defines the minimum size of the image window [67]. The image resolution was chosen based on the regions of interest (ROIs) extracted from the captured real-time frame. The importance of camera positioning angles, such as the azimuth and zenith angles, is explained in [32]. The detection rate increases with the number of images of the same crop captured from multiple viewpoints.
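Bilinear interpolation, the simpler of the two resizing methods above, weights the 2 × 2 nearest input pixels instead of the 4 × 4 neighbourhood used by bicubic interpolation. The NumPy sketch below assumes the "align-corners" convention, in which the corner pixels of input and output coincide; the function name is hypothetical.

```python
import numpy as np


def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D image with bilinear interpolation (align-corners convention)."""
    img = np.asarray(img, dtype=float)
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)   # output rows mapped to input coordinates
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)      # clamp neighbours at the border
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]                # horizontal weights, shape (1, out_w)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


small = np.array([[0.0, 1.0], [2.0, 3.0]])
print(bilinear_resize(small, 3, 3))
```

Many image libraries instead map pixel centres (a half-pixel convention), so their border values can differ slightly from this sketch.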

Filtering
The images require filtering techniques to reduce noisy pixels. NIR filters help to trace the visible light precisely. A Gaussian filter is applied to the tonal images to reduce noise [68]. Separation of fruit from background pixels, together with noise elimination, uses a Gaussian density function followed by erosion and dilation of the neighbourhood pixels [23]. With a fixed window size, Gaussian filtering blurs edges during noise removal, whereas the median filter preserves edges effectively and eliminates impulsive noise in digital images. The templates are transformed with K-means clustering based on Euclidean metrics to rescale images around points of interest. To enhance an RGB image and remove the edge pixel region, a variance filter was used to replace individual pixels with the neighbourhood variance of the R, G and B channels, respectively [14]. A median filter can reduce the noise caused by sunlight illumination.
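The difference between median filtering and linear smoothing on impulsive noise can be seen with a small sketch. This NumPy version uses edge padding and a square window, both illustrative choices; a box (mean) filter stands in for the linear case because, like a Gaussian, it spreads an outlier into its neighbours.

```python
import numpy as np


def median_filter(img, size=3):
    """Replace each pixel by the median of its size x size neighbourhood (edge-padded)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out


def mean_filter(img, size=3):
    """Box (mean) filter for comparison; a linear filter smears outliers."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + size, j:j + size].mean()
    return out


img = np.full((7, 7), 0.5)
img[3, 3] = 1.0  # a single "salt" pixel, e.g. a specular glint on a leaf
print(median_filter(img)[3, 3], mean_filter(img)[2:5, 2:5].round(3))
```

The median removes the isolated bright pixel completely, whereas the mean filter leaves a raised 3 × 3 patch around it.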

Histogram Equalization
In computer vision, the image histogram graphically represents the number of pixels at each tonal value; its peaks and valleys are analyzed to find a threshold value. Histograms computed in different color spaces assist in background removal, improving both efficiency and accuracy [69]. For natural images, the optimal threshold value can be determined automatically from the unimodal properties of the grey-level luminance histogram. The histogram of an image forms peaks and valleys corresponding to objects of different colors; as a consequence, a threshold value can be obtained for each entity in the image. Normalized histograms were used for correlation with persimmon fruit [70].
Our investigation indicates that the most common color space model used in various yield recognition applications was RGB. Image resizing processes are applied in most of the works to unify the dimensions of the samples and to speed up the training process in deep learning models. High-resolution cameras are commonly used for capturing the details of the crops. In most of the works, color-based or threshold-based segmentation was performed to extract the region of interest. Table 1 lists the miscellaneous modes of capture and their enhancement methods.

Crop Image Segmentation
Image segmentation is the process of classifying parts of an image as fruit, leaf, stem, flower or background (non-plant) pixels. In this method, the acquired raw images are first modified to reduce the effects of blur, noise and distortion and so improve image quality. Recent advances in computer vision enable each pixel of an image to be analyzed. Image segmentation approaches are required to identify a pixel region as an individual fruit, leaf, flower or twig [76,77]. Some of the image segmentation approaches for machine vision systems are reviewed in this article.

Threshold-Based Segmentation
The partitioning of an image into foreground and background by a threshold value is defined by exploring the peaks and valleys of the histogram. The optimal threshold value segments the object from the background. Threshold values were determined by trial and error for assorted color spaces to obtain the required color layer. The acquired color image can be threshold-segmented directly in the H layer by the Otsu method to differentiate reddish grape regions from the greenish background. Histogram-based thresholding of the H component eliminated twigs, leaves, trunks and sky from real-time images. Local 3D threshold values computed for smaller regions of the image in normalized difference index (NDI) space, under high, medium and low illumination conditions, reduced false detections of real fruits [78]. Whether a tracked fruit corresponds to a new detection is decided by a boundary threshold value and an intersection over union (IoU) threshold value. For tomato estimation, a threshold value was computed for the red regions, and the background was removed with morphological operations. EVI data contaminated by clouds were smoothed with a hard threshold [68]. False positives were eliminated on the basis of cluster reflectance, geometry and position. The clusters were segmented from a 3D point cloud with a reflectance threshold value. Blob- and pixel-based segmentation with the X-means clustering technique classified and detected individual tomato berries within a fruit cluster [79].
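The IoU-based tracking step can be sketched as a greedy association between boxes already being tracked and boxes detected in the next frame. The (x1, y1, x2, y2) box format, the 0.3 threshold and the greedy strategy are assumptions for illustration, not the procedure of a specific cited work. Detections left unmatched would be counted as new fruits, which limits the miscounting caused by recognizing the same fruit in successive images.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(tracks, detections, iou_thresh=0.3):
    """Greedily assign each detection to the best-overlapping unused track.

    Returns {detection_index: track_index}; unmatched detections are new fruits.
    """
    matches, used = {}, set()
    for d_idx, det in enumerate(detections):
        best, best_iou = None, iou_thresh
        for t_idx, trk in enumerate(tracks):
            if t_idx in used:
                continue
            score = iou(trk, det)
            if score >= best_iou:
                best, best_iou = t_idx, score
        if best is not None:
            matches[d_idx] = best
            used.add(best)
    return matches


tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]        # fruits seen in the previous frame
detections = [(1, 1, 11, 11), (50, 50, 60, 60)]    # boxes found in the current frame
print(match_detections(tracks, detections))        # {0: 0}; detection 1 is a new fruit
```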

Color-Based Segmentation
Machine vision for harvesting incorporates various color spaces, such as RGB, HSI and CIE L*a*b*, whose values vary with the illumination of the environment. The CIE L*a*b* color space approximates human vision by separating lightness from the chromatic components. In [15,80], fruit detection during the ripening period used RGB and HSI color models, along with calibration spheres to determine the size of apples. Fruit detection by thresholding grey images was not satisfactory, as the histograms of the color and grey images were not unimodal. The fruit color was characterized from the pixels of individual citrus fruits and tomato berries. Color-based segmentation may be effective only under natural daylight conditions. Multiband ratio-based segmentation, combined with an Otsu threshold value, provides a self-adaptive range for diverse illumination effects.
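A ratio-based color index combined with an Otsu threshold can be sketched as below. The redness ratio R/(R+G+B) is a hypothetical choice of index for red fruit such as tomatoes or grapes; the Otsu step picks the histogram split that maximizes the between-class variance, so no threshold has to be tuned by hand.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes Otsu's between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    w0 = np.cumsum(p)                # probability of the lower class
    mu = np.cumsum(p * centers)      # cumulative first moment of the lower class
    mu_t = mu[-1]                    # global mean
    num = (mu_t * w0 - mu) ** 2
    den = w0 * (1.0 - w0)
    sigma_b = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return edges[np.argmax(sigma_b) + 1]  # upper edge of the last lower-class bin


def segment_red(rgb):
    """Foreground mask from the chromatic ratio r = R / (R + G + B) and an Otsu threshold."""
    rgb = np.asarray(rgb, dtype=float)
    ratio = rgb[..., 0] / (rgb.sum(axis=-1) + 1e-9)
    return ratio > otsu_threshold(ratio.ravel())


# Hypothetical 4x4 image: left half reddish fruit, right half green foliage
img = np.zeros((4, 4, 3))
img[:, :2] = (0.8, 0.1, 0.1)
img[:, 2:] = (0.1, 0.6, 0.1)
print(segment_red(img).astype(int))
```

Because the ratio divides out overall brightness, uniformly brighter or darker lighting changes it little, which is the motivation for ratio-based indices under varying illumination.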

Segmentation Based on Texture Analysis
Texture-based segmentation extracts regions of interest from images based on spatially distributed boundaries between groups of similar pixels. The Wigner-Ville distribution defines an auto-correlation function in the time-frequency domain to construct the textures of the color-segmented image [81]. Entropy (E) and smoothness (S) were estimated to remove false positives: regions with lower entropy values than those of leaves and twigs were categorized as fruits. Fruit identification with texture-based analysis extracts regions of similar adjacent pixels and separates the fruit pixels from the background pixels. A rotation-invariant circular Gabor texture segmentation combined with color and shape features (the 'Eigenfruit' algorithm) was proposed to detect green citrus fruits [74].
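The entropy and smoothness descriptors used above to reject false positives can be computed per candidate region as sketched below. The definitions follow common textbook forms (Shannon entropy of the grey-level histogram, and smoothness S = 1 - 1/(1 + sigma^2) on intensities scaled to 0-1); whether the cited works used exactly these normalizations is an assumption, and the two test patches are synthetic.

```python
import numpy as np


def texture_stats(patch, bins=256):
    """Entropy (bits) of the grey-level histogram and smoothness S = 1 - 1/(1 + variance).

    `patch` holds intensities scaled to [0, 1].
    """
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                                   # 0 * log(0) is taken as 0
    entropy = max(0.0, float(-np.sum(p * np.log2(p))))
    smoothness = 1.0 - 1.0 / (1.0 + float(np.var(patch)))
    return entropy, smoothness


flat = np.full((8, 8), 0.5)                                # uniform, fruit-skin-like region
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)  # high-contrast, foliage-like texture
print(texture_stats(flat), texture_stats(checker))
```

A uniform region has zero entropy and zero smoothness S, while the two-level checkerboard reaches one bit of entropy, so an entropy cut-off separates the two.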

Segmentation Based on Shapes
To preserve shapes and reconstruct the captured scene, a 3D shape reconstruction algorithm with image registration techniques is employed for pruning vines [82]. Considering only color features may lead to many false positives, because fruits and leaves share similar green colors. A circular Hough transform can identify circular citrus fruits by merging multiple detections along with the histograms of the H, R and B components [69]. Calibration against destructive hand samples taken at the time of imaging provided accurate predictions of vine yield [83].

Morphological Operations
Morphological operations convert pixel regions into individual fruits to be counted. Eccentricity was calculated for individual color-segmented apple regions; a threshold on this value identifies relatively round regions, which correspond to individual apples without occlusion. For two or more occluded fruits, the length of the ellipse was calculated and its major axis was split into two segments [30]. To separate individual fruits from clusters, or to link disjoint fragments of the same fruit, the watershed algorithm was used along with a circular Hough transform; the watershed algorithm performed better than the circular Hough transform. The watershed transformation is an efficient segmentation algorithm that treats the image as a topographic surface. Morphological functions were applied with a Euclidean distance transform and watershed gradient lines to detect small blob-shaped cucumbers while eliminating small leaves and flowers [67].
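The eccentricity test for separating single, round fruits from occluded clusters can be sketched from the second-order moments of each segmented region: the eigenvalues of the pixel-coordinate covariance are proportional to the squared semi-axes of an equivalent ellipse, so e = sqrt(1 - lambda_min / lambda_max). The 0.6 cut-off below is a hypothetical value, not a threshold from the cited work.

```python
import numpy as np


def eccentricity(mask):
    """Eccentricity of the ellipse with the same second moments as a binary region (0 = circle)."""
    ys, xs = np.nonzero(mask)
    if xs.size < 2:
        return 0.0
    cov = np.cov(np.stack([xs, ys]).astype(float), bias=True)
    minor, major = np.sort(np.linalg.eigvalsh(cov))  # ascending eigenvalues
    if major <= 0:
        return 0.0
    return float(np.sqrt(np.clip(1.0 - minor / major, 0.0, 1.0)))


yy, xx = np.mgrid[0:21, 0:33]
one = (xx - 10) ** 2 + (yy - 10) ** 2 <= 49              # a single round fruit
two = one | ((xx - 22) ** 2 + (yy - 10) ** 2 <= 49)      # two touching fruits
for name, region in (("one", one), ("two", two)):
    e = eccentricity(region)
    print(name, round(e, 3), "single fruit" if e < 0.6 else "possible cluster")
```

Regions flagged as possible clusters would then be split, for example along the major axis as described above or with a watershed transform.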

Feature Extraction for Classification
Feature extraction with RGB color values converted to HSI values determines the maturity of tomato samples with image processing techniques. Deep learning methods do not require hand-crafted features during training. The basic convolutional neural network (CNN) architecture, with its input layer, intermediate convolution and max-pooling (sub-sampling) layers and output layer, automatically extracts features and classifies diverse object classes in images. Haar-like features based on edges, radians and lines were tested on grey images, combined with color analysis in an AdaBoost classifier, to reduce the false negative ratio. Texture features and maximally stable color region (MSCR) descriptor sets govern fruit detection in the subdivided support window, using a frequency-distribution histogram and a support vector machine (SVM) classifier. Attribute profiles with multilevel morphological characteristics, such as area (the size of the region in pixels), standard deviation (a texture measure of the region) and moment of inertia (the shape of the region), offered a substantial improvement over state-of-the-art descriptors in weed and crop classification using machine vision [84]. Fruit counting is realized by treating the fruits as features in structure from motion (SfM), which converts 2D traces into 3D landmarks. This feature improves a CNN-based system by eliminating double counting of fruits and was faster than scale-invariant feature transform (SIFT) features [85]. Feature sets such as closeness, solidity, extent, compactness and texture were selected by sequential forward selection and the RELIEF algorithm and used with an SVM for yield estimation and prediction [86].
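The convolution and max-pooling stages of a basic CNN can be illustrated with a single hand-set filter. In a trained network the kernel weights are learned rather than fixed; the vertical-edge kernel here is a hypothetical example and, as in most deep learning libraries, the "convolution" is computed as a cross-correlation (the kernel is not flipped).

```python
import numpy as np


def conv2d_valid(img, kernel):
    """'Valid' 2-D cross-correlation, the operation CNN convolution layers compute."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Rectified linear activation applied element-wise."""
    return np.maximum(x, 0.0)


def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling (odd trailing rows/columns are dropped)."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


img = np.zeros((6, 6))
img[:, 3:] = 1.0                            # dark-to-bright vertical boundary
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # hypothetical vertical-edge filter
features = max_pool_2x2(relu(conv2d_valid(img, kernel)))
print(features)
```

The 6 × 6 input shrinks to a 4 × 4 feature map after the valid convolution and to 2 × 2 after pooling; stacking such layers is what lets a CNN build hierarchical features without hand-crafted descriptors.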
A combination of histogram of oriented gradients (HOG) features in the color image, a false color removal (FCR) technique and non-maximum suppression (NMS), with an SVM classifier, detected mature tomatoes with a processing time of 0.95 s [87].
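Non-maximum suppression, as used after the HOG/SVM detector above, keeps the highest-scoring box and discards every remaining box whose IoU with it exceeds a threshold, repeating until no boxes are left. A minimal sketch; the 0.5 threshold and the (x1, y1, x2, y2) box format are assumptions.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy hard NMS: keep the best box, drop boxes overlapping it by more than iou_thresh."""
    order = np.argsort(scores)[::-1]       # indices by descending score
    keep = []
    while order.size > 0:
        best = int(order[0])
        keep.append(best)
        rest = order[1:]
        overlaps = np.array([iou(boxes[best], boxes[i]) for i in rest])
        order = rest[overlaps <= iou_thresh]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]  # two boxes on one tomato, one elsewhere
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```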
Other parameters, selected by feature selection algorithms such as the variance inflation factor, sequential forward selection, random forest variable selection and correlation-based feature selection, were the number of wells, area, tanks, canal lengths, soil capacity and landscape features. The descriptors selected for crop yield prediction were rainfall, maximum, average and minimum temperature, solar radiation, planting area, irrigation water depth and season duration [88]. Automatic disease detection with computer vision technology allows the crop to be treated at the earliest stage, which improves quality and increases crop yield. Crop disease detection can use simple linear iterative clustering (SLIC) to segment superpixels in the CIELAB color model [89]. Autonomous maturity detection of tomato berries was developed by fusing multiple color-texture features with an iterative RELIEF algorithm [90]. Principal component analysis (PCA) of pixel-level color features could automatically detect diseases in pepper leaves [91].

Deep Learning Models
Deep learning models have been used in diverse crop yield applications such as crop monitoring, yield prediction and estimation, and fruit detection for harvesting, using large datasets from which the machine learns. The architecture can be implemented in different ways, such as deep Boltzmann machines, deep belief networks, convolutional neural networks and stacked auto-encoders. CNN architectures learn hierarchical features in depth, for example with residual blocks. Soft non-maximum suppression (soft-NMS) then decays the scores of overlapping detected bounding boxes instead of discarding them. The networks are regarded as universal approximators, in the sense of the universal approximation theorem, and their capacity is set by the hidden layers, filters and hyper-parameters. Previous works have reported prediction accuracy as a function of the number of convolutional layers. In those studies, increasing the number of convolutional layers improved the accuracy of the network.
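The soft-NMS step mentioned above can be sketched with its Gaussian decay. The highest-scoring box is selected, and every remaining box has its score multiplied by exp(-IoU²/σ) rather than being deleted outright. Boxes are assumed to be (x1, y1, x2, y2) tuples. The values of `sigma` and `score_threshold` are illustrative assumptions, not settings taken from the reviewed works.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes: list[Box], scores: list[float], sigma: float = 0.5,
             score_threshold: float = 0.001) -> list[tuple[Box, float]]:
    """Gaussian soft-NMS: returns (box, decayed score) pairs in selection order."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        best_idx = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_box, best_score = remaining.pop(best_idx)
        kept.append((best_box, best_score))
        # decay the other boxes in proportion to their overlap with the selected box
        remaining = [(b, s * math.exp(-iou(best_box, b) ** 2 / sigma)) for b, s in remaining]
        remaining = [(b, s) for b, s in remaining if s >= score_threshold]
    return kept
```

For two heavily overlapping fruit detections, the weaker one survives with a reduced score instead of vanishing. This helps when fruits in a cluster genuinely overlap.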

Deep Architectures in Smart Farming
A deep convolutional neural network (DCNN) is a multi-layered neural network that is trained on images with appropriately labeled features to learn complex patterns. The InceptionV3 model serves as a conventional image feature extractor to classify fruit and background pixels in an image. The classifier then localizes the fruits to count them [31,92] and to classify tomato species [93]. A k-nearest neighbor (KNN) classifier was employed to classify fruit pixels in the training datasets, with a threshold on the pixel value used to label a pixel as fruit. SVMs can perform pattern classification as well as linear regression based on the selected features. A trained "you only look once" (YOLO) model with a Darknet classifier detected iceberg lettuce [94] and grapes [95], along with their edges, for harvesting with a Vegebot. YOLO models offer faster real-time object detection than the faster region-based CNN (Faster R-CNN) [96].
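A KNN pixel classifier of the kind described above can be sketched in a few lines. Each RGB pixel is labeled by a majority vote among its k nearest labeled training pixels. The training pixels, their labels and k = 3 are hypothetical values chosen for illustration. They do not come from the cited datasets.

```python
import math
from collections import Counter

# Hypothetical labelled pixels: (R, G, B) -> class. Values are illustrative only.
TRAIN = [
    ((200, 30, 30), "fruit"), ((180, 40, 35), "fruit"), ((210, 20, 25), "fruit"),
    ((40, 140, 50), "background"), ((30, 160, 40), "background"), ((50, 120, 60), "background"),
]


def knn_classify(train: list[tuple[tuple[int, int, int], str]],
                 query: tuple[int, int, int], k: int = 3) -> str:
    """Label a pixel by majority vote among its k nearest training pixels (Euclidean in RGB)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

KNN has no training phase, since it only stores the labeled pixels. The cost moves to prediction, where every query pixel is compared with the whole training set.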
AdaBoost builds a strong classifier as a weighted linear combination of weak threshold classifiers. Using Haar-like features, it detected tomato berries with an accuracy of 96% [72]. A multi-modal Faster R-CNN model fuses RGB and near-infrared images to build an efficient fruit detection technique for yield estimation, reaching an F1 score of 0.83 [97]. The dataset images were fed to the R-CNN model to generate the feature map used for classification.
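The "weighted linear combination of weak threshold classifiers" can be sketched as discrete AdaBoost with one-feature decision stumps. The input features are hypothetical stand-ins for Haar-like responses, labels must be +1/-1, and the number of rounds is an assumption. The cited detector's actual feature pool and cascade structure are not reproduced.

```python
import math


def stump(x: list[float], f: int, thr: float, pol: int) -> int:
    """Weak classifier: predict `pol` if feature f >= thr, otherwise -pol."""
    return pol if x[f] >= thr else -pol


def train_adaboost(X: list[list[float]], y: list[int], rounds: int = 5):
    """Discrete AdaBoost; returns a list of (alpha, feature, threshold, polarity)."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        for f in range(d):
            for thr in sorted({x[f] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y) if stump(xi, f, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        err, f, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        # misclassified samples gain weight, correctly classified ones lose weight
        w = [wi * math.exp(-alpha * yi * stump(xi, f, thr, pol)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict_adaboost(ensemble, x: list[float]) -> int:
    """Sign of the alpha-weighted sum of stump outputs."""
    score = sum(alpha * stump(x, f, thr, pol) for alpha, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```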
Spatiotemporal normalized difference vegetation index (NDVI) data from remote sensing images were used to train a spiking neural network (SNN) for yield prediction and yield estimation of winter wheat [98]. An improved prediction approach for corn, soybean [99] and paddy crops was proposed with a feed-forward backpropagation artificial neural network (ANN), which was later fused with multiple linear regression (MLR). Linear discriminant analysis (LDA) was used to remove the imbalance in the performance values obtained with an ANN classifier [100]. Large fused datasets were used to compare machine learning models such as SVM, deep learning (DL), extremely randomized trees (ERT) and random forest (RF) for corn yield estimation [36]. The DL model achieved high accuracy in terms of the correlation coefficient. Flower detection by semantic segmentation with a CNN, followed by an SVM classifier, supports crop yield management. Image segmentation techniques and canopy features were used to train a backpropagation neural network (BPNN) model for apple yield prediction [101]. The SVM and kNN classifiers were efficient, with accuracies of 98.49% and 98.50%, respectively.

Deep convolutional neural networks were developed to identify plant diseases and to predict macronutrient deficiencies during the flowering and fruit development stages [102]. The visual geometry group (VGG) CNN architecture identified plant diseases from leaf images and communicated the results to farmers through smartphones [103,104]. An adaptive deep CNN, trained and validated with ImageNet datasets, was used to diagnose endemic fungal infection in winter wheat [89]. A deep CNN model based on GoogLeNet classified nine diseases in tomato leaves [105].
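The MLR component of the ANN-plus-MLR yield prediction approach described above can be sketched as ordinary least squares solved through the normal equations. In this illustration, the predictors could be quantities such as NDVI or rainfall. Those predictors, and the choice of Gauss-Jordan elimination as the solver, are assumptions, and the ANN part and the fusion scheme of [99] are not shown.

```python
def fit_mlr(features: list[list[float]], target: list[float]) -> list[float]:
    """Least-squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + ... (normal equations)."""
    rows = [[1.0] + list(x) for x in features]  # prepend an intercept column
    p = len(rows[0])
    # augmented matrix [X^T X | X^T y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, target))] for i in range(p)]
    for col in range(p):
        # partial pivoting for numerical stability
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * c for v, c in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict_mlr(coef: list[float], x: list[float]) -> float:
    """Evaluate the fitted linear model at one feature vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))
```

On noise-free data generated from y = 1 + 2x₁ + 3x₂, the fit recovers the coefficients [1, 2, 3] up to rounding.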
External defects and occluded flowers and berries of tomatoes were identified with deep autoencoders and a residual neural network (ResNet-50) classifier [106,107]. A leaf-based disease identification model with a random forest classifier trained on HOG features detected diseases on papaya leaves [108].
Ripeness estimation is needed in the agricultural industry to determine the quality and maturity level of fruit. For harvesting robots, tomato ripeness was detected by fusing extracted features and classifying them with a weighted relevance vector machine (RVM) in a bilayer classification approach. Tomato maturity levels were also detected by classifying color features with a BPNN model. A fuzzy rule-based classification system (FRBCS) was proposed to estimate six maturity stages of tomato berries [109]. It was built on color features with decision trees (DT) and the Mamdani fuzzy technique. Mature tomatoes can be identified with an SVM classifier trained on HOG features, together with false detection elimination and overlap removal.

Network Training Datasets and Tools
Deep learning architectures vary widely depending on the application and on the models used to train them. Deep learning techniques can analyze large datasets in a short computation time to predict the sowing time as well as the optimum harvesting time of plants. The available datasets can also be used to predict and detect crop diseases. Various architectures exist for creating a deep learning model, and they may be pre-trained on diverse datasets. Large datasets with many input images are required to train deep learning models to solve complicated problems. Commonly used models include CaffeNet, a modified Inception-ResNet, YOLO version 3 with a Darknet classifier, MobileNet, R-CNN with a ResNet-152 backbone, ResNet-50, AlexNet, GoogLeNet, OverFeat, AlexNetOWTBn and VGG. Most of these use the rectified linear unit (ReLU) activation, and models and datasets are commonly shared through GitHub and Kaggle. Neural network libraries and tools include Keras, TensorFlow, PyTorch, Caffe, Theano and pylearn2 for Python, Torch7 (Lua, running on LuaJIT), R libraries and the MATLAB Deep Learning Toolbox. DeepLab + RGR, by contrast, is a semantic segmentation approach rather than a library. The available datasets for crop images are listed in Table 2.

Performance Metrics
The image enhancement techniques, image processing algorithms, and detection and prediction algorithms with classifier approaches were compared and evaluated with diverse performance metrics. The comparative studies of the deep learning models were validated with the following metrics:
• Root mean square error (RMSE)
• Normalized mean absolute error (MAE)
• Root relative square error (RRSE)
• Correlation coefficient (R)
• Mean forecast error
• Average cycle time
• Harvest and detachment success
• Mean absolute percentage error (MAPE)
• Receiver operating characteristic (ROC)
• Precision, recall and F-measure
Table 3 lists the various classifiers used for different agricultural applications, with their respective pros and cons and some topics for future work.
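Several of the metrics listed above can be computed directly from paired observed and predicted values, and the detection metrics from true-positive, false-positive and false-negative counts. The sketch below shows only the plain (unnormalized) MAE, because normalization conventions for MAE differ between studies. The error is taken as actual minus predicted, which is one of two sign conventions in use for the mean forecast error. ROC analysis is omitted.

```python
import math


def regression_metrics(actual: list[float], predicted: list[float]) -> dict[str, float]:
    """RMSE, MAE, MAPE (%), RRSE, Pearson R and mean forecast error (MFE)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]  # actual minus predicted
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    sse = sum(e * e for e in errors)
    sst = sum((a - mean_a) ** 2 for a in actual)
    spp = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,  # actual values must be non-zero
        "RRSE": math.sqrt(sse / sst),  # error relative to always predicting the mean
        "R": cov / math.sqrt(sst * spp),
        "MFE": sum(errors) / n,  # bias; sign convention varies between studies
    }


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Detection metrics from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

An RRSE below 1 means the model beats the trivial predictor that always outputs the mean observed yield.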

Discussion
Precision agriculture requires constant innovation to increase the quantity and quality of food production. Machine vision and machine learning approaches for agricultural automation have both strengths and shortcomings, which are discussed in this section.

Advantages and Disadvantages
Acquired images may be degraded by camera misfocus, poor lighting conditions or sensor noise, and image enhancement techniques are applied to compensate for such degradation. Image segmentation techniques are easy to implement and modify, and they classify pixels with little computation. Threshold segmentation requires appropriate lighting conditions. An optimal threshold value has to be selected, and that value may not suit every application. Any background complexity increases the error rate and computation time. Color-based segmentation is constrained by its sensitivity to non-uniform lighting. Otsu's method selects the threshold automatically from the gray-level histogram by maximizing the between-class variance, and it separates objects from the background well when the histogram is clearly bimodal. Watershed segmentation provides continuous boundaries, but at the cost of more complex gradient calculation. Texture- and shape-based segmentation are time-consuming and produce blurred boundaries. Further research in variable field environments is needed to make computer vision technology reliable for agriculture.
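Otsu's threshold selection can be sketched on a flat list of 8-bit grayscale values. Image loading and the later masking step are outside the sketch. The function scans all 256 candidate thresholds and keeps the one that maximizes w₀·w₁·(μ₀ − μ₁)², which is proportional to the between-class variance. Pixels at or below the returned value form one class.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the 8-bit threshold t maximizing between-class variance (class 0: pixel <= t)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

Because the threshold depends only on the histogram, a shift in lighting that changes the histogram shape also changes the selected threshold. This is why thresholding is sensitive to field illumination.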
Feature selection reduces the amount of input data when developing a predictive classifier model by prioritizing the existing features in a dataset. Haar wavelet features combined with an AdaBoost classifier achieved high accuracy. PCA can outperform other feature sets in accuracy when it works at the pixel level, with the original image as input rather than pre-selected features. SIFT must search across image scales to detect its scale-invariant local features. HOG extracts global shape features by computing histograms of edge-gradient orientations. HOG combined with FCR and NMS achieved a computation time of 0.95 s for maturity detection. Hybrid feature extraction approaches can improve both classification accuracy and computation time.
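Sequential forward selection, one of the feature selection methods mentioned in this review, can be sketched as a greedy loop. At each step, the loop adds the feature whose inclusion gives the best score. The scoring function is a hypothetical choice for illustration: leave-one-out accuracy of a 1-nearest-neighbor classifier. The studies cited earlier paired selection with an SVM, which is not reproduced here.

```python
from typing import Callable, Sequence


def loo_1nn_accuracy(X: list[list[float]], y: list[int], feats: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen feature columns."""
    correct = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in feats))
        correct += y[nearest] == y[i]
    return correct / len(X)


def sequential_forward_selection(n_features: int,
                                 score: Callable[[Sequence[int]], float],
                                 k: int) -> list[int]:
    """Greedily add the feature that maximizes `score` until k features are selected."""
    selected: list[int] = []
    while len(selected) < k:
        candidates = [f for f in range(n_features) if f not in selected]
        best = max(candidates, key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected
```

Greedy selection evaluates far fewer subsets than an exhaustive search, but it cannot undo an early choice. Floating variants of the method add a backward step to address this.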
The DL model and the SVM and BPNN classifiers outperformed the other classifiers. SVM classifiers give low error and effective prediction, but they require abundant data and are complex and sensitive when handling varied data types. Reinforcement learning implemented with the TensorFlow library seeks an optimal policy and updates the utility function at each step instead of waiting until the end of an episode. K-NN classifiers are robust and incur no cost in the learning process, but they require large datasets and heavy computation for mixed data. DL can extract the required features based on color, texture, shape and SIFT descriptors. The combination of ANN and MLR classifiers provided the highest accuracy in crop yield prediction. DL classifiers were used in a wide range of agricultural applications with an average F1 score of 0.8, and errors arose from occluding leaves or clustered fruits. For fruit detection in robot harvesting and yield estimation, a combination of CNN and linear regression models performed best. The need for large training datasets increases the computation time of DL approaches, whereas SVM classifiers provide high accuracy with shorter computation time. Fusing classifiers with assorted features may improve both computer vision techniques and DL models.
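Classifier fusion, suggested above as a way to improve detection, can be sketched in its simplest form as hard majority voting over the labels produced by several classifiers. The classifier outputs in the usage example are hypothetical. Weighted voting or fusion at the feature level are alternatives that this sketch does not cover.

```python
from collections import Counter


def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Fuse labels from several classifiers by majority vote.

    predictions[c][i] is classifier c's label for sample i. Ties go to the tied label
    predicted by the earliest classifier in the list (Counter keeps first-seen order).
    """
    return [Counter(sample_labels).most_common(1)[0][0] for sample_labels in zip(*predictions)]


# Hypothetical per-sample outputs from three classifiers
svm_labels = ["fruit", "leaf", "fruit"]
knn_labels = ["fruit", "fruit", "leaf"]
cnn_labels = ["leaf", "leaf", "fruit"]
fused = majority_vote([svm_labels, knn_labels, cnn_labels])
```

Voting helps most when the individual classifiers make different errors, for example when one fails on occluded fruit and another fails under poor lighting.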

Conclusions
The progression and challenges of various image processing and deep learning classification techniques in agricultural farm duties were analyzed in this paper. The inferences drawn from our extensive review are presented below:

• The review highlighted the merits and demerits of different machine vision and deep learning techniques, along with their performance metrics.

• Applying these techniques to yield prediction through bloom intensity estimation helps farmers improve their crop yields from an early stage.

• Fruit detection and counting models that combine image analysis with feature extraction for classifiers advance crop yield estimation and robotic harvesting.

• Combining various hand-crafted features in hybrid DL models improves computational efficiency and reduces computation time.

• Deep learning models outperform conventional image processing techniques, with an average accuracy of 92.51% across diverse agricultural applications.
As future work, hybrid techniques that combine machine vision and deep learning models can be applied to develop automated systems for precision agriculture. The methods highlighted in this paper can be tested in real-time crop yield estimation applications, and innovative methods that improve the performance of the overall system can also be developed.

Abbreviations
The abbreviations used in this manuscript are given as under:

Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.