Artificial Neural Networks for Image Processing in Precision Agriculture: A Systematic Literature Review on Mango, Apple, Lemon, and Coffee Crops

Unigarro, Christian; Hernandez, Jorge; Florez, Hector

doi:10.3390/informatics12020046

Open Access Systematic Review

by Christian Unigarro, Jorge Hernandez and Hector Florez *

ITI Research Group, Universidad Distrital Francisco Jose de Caldas, Bogota 110231, Colombia

* Author to whom correspondence should be addressed.

Informatics 2025, 12(2), 46; https://doi.org/10.3390/informatics12020046

Submission received: 8 January 2025 / Revised: 31 March 2025 / Accepted: 30 April 2025 / Published: 6 May 2025

Abstract:

Precision agriculture is an approach that uses information technologies to improve and optimize agricultural production. It is based on the collection and analysis of agricultural data to support decision making in agricultural processes. In recent years, Artificial Neural Networks (ANNs) have demonstrated significant benefits in addressing precision agriculture needs, such as pest detection, disease classification, crop state assessment, and soil quality evaluation. This article aims to perform a systematic literature review on how ANNs with an emphasis on image processing can assess if fruits such as mango, apple, lemon, and coffee are ready for harvest. These specific crops were selected due to their diversity in color and size, providing a representative sample for analyzing the most commonly employed ANN methods in agriculture, especially for fruit ripening, damage, pest detection, and harvest prediction. This review identifies Convolutional Neural Networks (CNNs), including commonly employed architectures such as VGG16 and ResNet50, as highly effective, achieving accuracies ranging between 83% and 99%. Additionally, it discusses the integration of hardware and software, image preprocessing methods, and evaluation metrics commonly employed. The results reveal the notable underuse of vegetation indices and infrared imaging techniques for detailed fruit quality assessment, indicating valuable opportunities for future research.

Keywords:

artificial neural networks; image processing; harvest prediction; systematic literature review

1. Introduction

Food security is one of the most important aspects in the world; it ensures the quality of life in countries [1]. If a country manages to industrialize crops, it is likely to be able to obtain more food for its population [2]. However, crops must be of good quality and have a great variety to ensure that people can have food every day. Nowadays, most crops lack early warning systems that can reduce food loss. Crops must be monitored because different factors can affect productivity, sustainability, and economic viability. The most common factors are climate change that affects crop yield [3], diseases that can usually be detected visually [4], and soil fertility that enables the high production of crops by having the right nutrients [5].

Precision agriculture has been evolving thanks to technologies that use the minimum amount of resources for production. Harvest readiness can be detected by monitoring the state of the fruits in the crops (ripe or unripe and healthy or damaged), along with multispectral or drone images. Harvesting is usually performed manually; therefore, calculating the amount of harvested products requires technological solutions that automate the process [6].

In the past, pests were difficult to detect and reduced agricultural production. Today, thanks to ANNs, pests and weeds can be detected automatically and early. In particular, Convolutional Neural Networks (CNNs) trained with images of crops can accurately recognize weeds and pests in real time. For instance, recent research proposed a CNN model with an attention mechanism, which achieved 98.17% precision [7]. Furthermore, models such as CATNet, which integrate efficient attention and cascading transformer modules, have demonstrated significant improvements in marine species classification, suggesting that similar techniques could be applied in agriculture for accurate pest and weed classification [8]. In the same way, detecting weeds in crops has motivated the use of object detection techniques like YOLO (You Only Look Once). An experiment with YOLOv4 identified 98.8% of weeds with a mean average precision (mAP) of 73% for the detection of different species of invasive weeds [9].

In line with these studies, the detection of plant diseases remains relevant. Deep Convolutional Neural Networks (CNNs) trained with thousands of images have demonstrated strong performance in distinguishing diseases [10]. For instance, pre-trained CNN models such as ResNet50, VGG16, or DenseNet121 obtain accuracies between 95% and 99% [11]. Hybrid approaches also use classical algorithms; one study reported an accuracy of 98% in classifying leaf diseases using a Support Vector Machine (SVM) supported by segmentation and texture extraction techniques [11].

Another application is crop variety identification and hyperspectral image classification. Artificial Neural Networks (ANNs) allow for modeling complex interactions between productivity factors (climate, soil type, plant genetics, agricultural management, etc.). For example, GACNet, a generative adversarial-powered cross-attention network, has been used for wheat variety identification using hyperspectral images, improving variety classification accuracy [12]. Similarly, SSTNet, a spatially, spectrally, and texture-aware attention network, has been applied for corn variety identification, demonstrating the effectiveness of attention techniques in crop classification [13].

Monitoring the nutritional status of both soil and plants is another area where ANNs are having an impact. They determine how balanced soil is in terms of nutrients (nitrogen, phosphorus, potassium, etc.). An ANN is applied in two ways: (1) the prediction of soil nutrients from sensor data or images and (2) the detection of nutritional deficiencies in plants using computer vision. For the first case, models are used for nutrient estimation, combining sample data and data from sensors [14]. For the second case, CNNs are trained on leaf images to diagnose deficiencies (e.g., nitrogen deficiency often causes the yellowing of leaves, potassium deficiency causes burnt edges, etc.). A recent study trained different CNN architectures on images of rice and bananas, resulting in Inception-V3 achieving 93% accuracy in identifying nutritional deficiencies in plants [15].

Irrigation prediction is another application, especially for zones with water shortages. Intelligent irrigation systems combine sensors (humidity and weather) with ANN algorithms to decide when and how much water is applied to the crops, optimizing the use of water without sacrificing plant health. For instance, a study of an automatic irrigation system using decision trees, SVM, neural networks, and Random Forest reported 84.6% accuracy in predicting the optimum irrigation time [16]. Once implemented, the system achieved 60% water savings compared to traditional fixed irrigation methods [16].

Technologies such as Artificial Neural Networks (ANNs), along with the Internet of Things (IoT), play a crucial role in modern agriculture. ANNs can analyze historical data to identify patterns, helping farmers make informed decisions in areas such as automatic crop monitoring, determining the optimal harvest time, and predicting the overall yield. Specifically, neural networks can be employed to assess fruit maturity, detect the best harvesting periods, and accurately estimate total crop production [17].

Highlights

The purpose of this article is to perform a systematic literature review to establish how fruit maturity has been calculated in crops like apples, coffee, lemon, and mango. Using ANNs that process aerial or whole-crop images, we want to establish what types of architectures and image processing have been generated when trying to monitor the state of fruits and crops. Additionally, we want to know how software and hardware have interacted when applying an ANN to crops.

The main contributions are summarized as follows:

We analyze how neural networks can determine the state of fruits in crops such as mango, apple, lemon, and coffee.
We find that CNNs, including architectures such as VGG16 and ResNet50, are the most used to detect crop maturity.
We discuss the integration of hardware and software, image preprocessing methods, and evaluation metrics employed.
We analyze whether the approaches were post-harvest or pre-harvest.

Figure 1 presents a summary of this paper. In this study, we used a literature review process to identify how other authors used Artificial Neural Networks to determine whether a crop can be harvested based on the state of the fruit (lemon, coffee, apple, and mango).

This article is structured as follows. Section 2 presents the methodology used to systematically review the literature. Section 3 presents the results obtained in the systematic literature review. Section 4 discusses the results. Finally, Section 5 offers some future work, while Section 6 concludes the paper.

2. Methodology

This article conducts a systematic literature review via the following steps. First, we discuss the research questions to be answered with the analysis of the literature. Then, the inclusion and exclusion criteria are presented to limit the literature according to the research questions. Finally, the quality of the literature is validated [18,19].

2.1. Research Questions

This article used the following research questions:

Q1. What types of ANNs are most frequently used to detect the status of mango, apple, lemon, and coffee crops?
Q2. What are the performance evaluation metrics of ANNs in predicting the status of each fruit or crop product?
Q3. Which hardware and software tools have been used to implement ANNs for predicting and monitoring crop status in agriculture?
Q4. How are images prepared and processed for input into ANNs?
Q5. What vegetation indices have been calculated using ANNs in mango, apple, lemon, and coffee crops?
Q6. Which hardware tools or devices are used to collect images for datasets?

2.1.1. Search Strategy

To perform the systematic literature review, we focused on the following relevant aspects:

Type of Crop: The kind of agricultural product studied or analyzed (mango, apple, lemon, or coffee).
Device for Image Processing: The equipment used to capture or analyze images, such as drones, cameras, or satellites.
Computer Vision Task: The specific goal of analyzing images, such as classification, segmentation, or object detection.
Artificial Neural Network Architecture: The specific design or structure of the neural network, including layers, nodes, and connections.

2.1.2. Selected Journals and Conferences

The selection of journals and conferences was conducted to ensure a review of the application of ANNs in precision agriculture. The focus is on fruit ripeness detection, harvest prediction, and crop quality assessment. To achieve this, we queried several academic databases, including Scopus, ScienceDirect, Springer Link, and IEEE Xplore, using filters to identify studies published between 2019 and 2024.

The queries incorporated specific keywords related to ANNs, vegetation indices (e.g., NDVI, EVI, SAVI), and crops such as apple, lemon, coffee, and mango. Articles were further refined by selecting those within relevant subject areas, such as agriculture, computer science, and engineering, ensuring that only peer-reviewed research articles were included. Table 1 outlines the filters and queries applied to each database.

2.2. Study Selection Criteria

We used selection criteria based on documents about image processing using ANNs for precision agriculture.

2.2.1. Inclusion Criteria

The inclusion criteria ensure the selected articles are relevant to the research focus and objectives. The criteria are as follows:

IC1: Articles focused on fruit crops such as apples, mangoes, lemons, or coffee. These crops were selected because of their economic importance and frequent use in studies involving ANNs for agricultural monitoring.
IC2: Studies that involve the application of ANNs for analyzing crop health, predicting maturity, or calculating vegetation indices.
IC3: Research articles published between 2019 and 2024 to ensure the inclusion of recent advancements in precision agriculture.
IC4: Peer-reviewed articles in the domains of agriculture, computer science, or engineering to maintain high-quality and relevant sources.
IC5: Literature that describes how images are processed for use in ANNs.

2.2.2. Exclusion Criteria

The exclusion criteria filter out articles that do not align with the research scope. The criteria are as follows:

EC1: Studies that do not use ANNs as part of their methodology for crop monitoring or analysis.
EC2: Articles that do not involve selected crops (apple, mango, lemon, or coffee) or focus on other unrelated agricultural products.
EC3: Publications such as opinion papers, editorials, or conference abstracts that lack a detailed methodology or experimental results.
EC4: Publications that do not mention image processing.

2.3. Study Quality Assessment

Following inclusion and exclusion criteria, a total of 68 articles were selected through a final screening process (see Table 2). To enhance the accessibility of each algorithm, the selected articles were categorized into modules (see Table 3).

2.4. Data Extraction

The main pieces of information used to evaluate our defined criteria are the following:

The title, authors, journal or conference, and publication details (year, country, and reference);
The article database where it was found;
The topic area;
The problem to be solved;
Objectives;
Methods used to provide a solution;
Results.

2.5. Data Synthesis and Quality Verification

After classifying the articles into modules, we outlined questions to select them based on prior modules (see Table 4).

3. Results

This section presents the literature review results, mainly the methods, types of neural networks, and how the vegetation indices and the fruits’ state of maturity are calculated and analyzed. Figure 2 shows the results for each research question. This study identified several types of ANNs, the metrics, the hardware or software used in ANNs, the image processing techniques, the vegetation index, and the hardware to collect the images.

3.1. Methods for Classification and Segmentation

For fruit classification and segmentation, approaches typically use Convolutional Neural Networks (CNNs) combined with algorithms such as Multilayer Perceptrons (MLPs), Random Forests, Support Vector Machines (SVMs), or You Only Look Once (YOLO), which are used to evaluate fruit quality or detect pests on fruits and leaves [20]. The evaluation then typically relies on precision, recall, F1-score, and confusion matrix metrics to measure the performance of classification models [21,22].
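As an illustration of the evaluation metrics named above, the following minimal sketch derives accuracy and per-class precision, recall, and F1-score from a confusion matrix. The function names, class labels, and counts are hypothetical and are not taken from any of the reviewed studies.

```python
def per_class_metrics(confusion, labels):
    """Return {label: (precision, recall, f1)} from a square confusion matrix.

    confusion[i][j] counts samples whose true class is labels[i] and
    whose predicted class is labels[j].
    """
    metrics = {}
    n = len(labels)
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(confusion[k]) - tp                       # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical counts: rows are true classes, columns are predictions.
labels = ["ripe", "unripe"]
confusion = [[40, 10],
             [5, 45]]
metrics = per_class_metrics(confusion, labels)
overall = accuracy(confusion)
```

Reporting per-class values alongside accuracy is useful when classes are imbalanced, since accuracy alone can hide poor recall on a rare class such as damaged fruit.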

CNN architectures are also applied to maintain simplicity and avoid additional steps in the training process. For example, Dakwala et al. [23] evaluate multiple architectures commonly referenced in other research, including VGGNet16, ResNet, NASNet, Xception, ResNet50, ResNet v2, DenseNet, and VGGNet19. Among these, ResNet50 achieved the best performance, with an accuracy of 98.27%.

In Ahmed et al. [24], a segmentation of apple diseases is performed using an analysis of the tree’s leaves with a CNN. Admass et al. [25,26] also present a CNN to study the mango leaf. Bhavya et al. [27] used a CNN to calculate the fruit quality of ten types of crops. Also, Singh et al. [28] used a CNN to detect diseases in the leaves of apples.

Images are used because they enable non-destructive methods, unlike common methods that require damaging the fruit. These methods aim to detect sugar levels or internal problems in the crop, such as pests or over-ripeness [29].

In many studies, researchers created their own datasets to control variability and ensure proper model training and better generalization. These studies often avoid using freely available or paid datasets [30].

Given the challenges associated with accessing or creating datasets, one notable approach observed was the use of synthetic images generated by tools like DALL-E to overcome these limitations [31].

Also, Li et al. [32] use a backpropagation neural network (BPNN) to estimate the canopy nitrogen concentration of apples.

3.2. Image Features

This study focuses on the literature that uses images of fruits captured using RGB cameras, hyperspectral cameras, drones, or professional cameras. Thus, the selected features to perform the literature review are textures, edges, and patterns associated with fruit ripeness.

3.3. Mathematical Methods

The reviewed literature used mathematical methods to complete the approach or to analyze the images and generate results. The study identifies several methods used in the reviewed articles.

3.3.1. Residual Predictive Deviation (RPD)

In precision agriculture, RPD is used to evaluate the accuracy of models that predict crop properties, such as nutrient content or moisture, from spectral data. For example, Li et al. (2022) [32] employed RPD to evaluate the accuracy of models estimating nitrogen concentration in the apple canopy using hyperspectral images captured by UAVs.
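A minimal sketch of the calculation, assuming the common definition of RPD as the standard deviation of the observed reference values divided by the root mean square error of prediction. The use of the sample standard deviation and the example values are assumptions, not data from [32].

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(observed, predicted):
    """Residual predictive deviation: SD of observations divided by RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


# Hypothetical nitrogen concentrations (%), for illustration only.
observed = [2.0, 2.4, 2.8, 3.2, 3.6]
predicted = [2.1, 2.3, 2.9, 3.1, 3.7]
score = rpd(observed, predicted)
```

A higher RPD means the prediction error is small relative to the natural spread of the measured property.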

3.3.2. Particle Swarm Optimization (PSO)

In precision agriculture, PSO is used to optimize parameters of predictive models and classifiers, improving accuracy in tasks such as fruit quality assessment. For example, Peng et al. (2023) [33] combined PSO with neural networks to qualitatively and quantitatively assess apple quality using visible spectroscopy.
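The sketch below shows the basic PSO update rule on a toy objective. It is a generic illustration, not the configuration used in [33]: the inertia and acceleration coefficients, the bounds, and the `toy_loss` function standing in for a model's validation error are all assumptions.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100,
        inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # swarm-wide best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_loss(params):
    """Hypothetical stand-in for a validation error; minimum at (1, -2)."""
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


best_params, best_loss = pso(toy_loss, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In practice the objective would wrap model training and validation, with each particle encoding one candidate set of hyperparameters.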

3.3.3. Chicken Swarm Optimization (CSO)

In precision agriculture, CSO has been applied to select optimal traits and improve grading in automated fruit quality assessment systems. For example, Kumari et al. (2022) [34] used CSO for optimal feature selection and hybrid classification in automated mango quality assessment.

3.3.4. Discriminant Score

In precision agriculture, the discriminant score is used to classify and detect diseases in fruit, differentiating between healthy and affected fruit. For example, Ashok et al. (2023) [35] constructed a medium-sized dataset for the non-destructive classification of diseases in mangoes using machine and deep learning models, employing discriminant scores to improve accuracy.

3.3.5. Discrete Fourier Transform (DFT)

In precision agriculture, DFT is used to analyze textures and patterns in crop images, facilitating the detection of anomalies or diseases. For example, Kumari et al. (2022) [34] also applied DFT in their study to analyze texture features in mango images, improving the accuracy in automated quality assessment.

3.3.6. Monte Carlo

In precision agriculture, the Monte Carlo method is used to model and predict agricultural variables, such as fruit quality, considering the inherent uncertainty and variability. For example, Guo et al. (2024) [36] used a portable Vis/NIR transmission spectroscopy system combined with the Monte Carlo method for the nondestructive determination of edible quality and watercore degree in apples.

3.3.7. Multicount Measurement Classification and Recognition

In precision agriculture, multicount measurement classification and recognition is used to optimize visual pattern recognition in the detection of mechanical damage in fruit using laser relaxation spectroscopy. For example, Lian et al. (2023) [37] applied this method to detect mechanical damage in apples.

3.3.8. Principal Component Analysis (PCA)

In precision agriculture, PCA is used to reduce the dimensionality of spectral or image data, facilitating the classification and quality assessment of fruits. For example, Dhiman et al. (2021) [38] developed a general-purpose multi-fruit system for assessing fruit quality, applying PCA for dimensionality reduction and using recurrent neural networks for classification.

3.3.9. Land Surface Temperature (LST)

In precision agriculture, LST is monitored to assess crop health and stress, and manage irrigation and other farming practices. For example, Francis et al. (2023) [39] monitored canopy quality and improved equitable outcomes of urban tree planting using LiDAR and machine learning.

3.3.10. Shannon Entropy

The fusion of multitemporal data and very-high-resolution aerial imagery is conducted for species mapping. For example, Neyns et al. (2024) [40] apply Shannon Entropy to fuse multitemporal data from PlanetScope and high-resolution aerial imagery. This technique improves tree species classification and mapping in urban environments.
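For reference, the quantity itself can be sketched as follows. This only computes the Shannon entropy of a set of discrete values; how [40] uses it within the data fusion is not described here, and the example labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over observed value frequencies."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical per-pixel class labels: a mixed patch and a uniform patch.
mixed_patch = ["oak", "oak", "lime", "lime"]
uniform_patch = ["oak", "oak", "oak", "oak"]
mixed_entropy = shannon_entropy(mixed_patch)
uniform_entropy = shannon_entropy(uniform_patch)
```

Entropy is zero when every value is identical and grows as the values become more evenly spread across classes.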

3.4. Fruit Harvest Using Artificial Neural Networks

Harvest outcomes, particularly fruit products, can be predicted or enhanced by assessing fruit quality, ripeness, or pest damage using artificial intelligence tasks such as classification, segmentation, and detection. These tasks are commonly performed with architectures based on CNNs.

Ashok et al. [35] used a CNN to classify whether mango crops contained damaged, undamaged, or cold-damaged fruit. The input of the network is a discriminant function with 603 predictor variables based on the image with the following characteristics: intensity extracted from the R, G, and B color channels of each sample. They used the Haralick approach with 162 standard intensity features and Fourier descriptors, 21 Hu invariant moments, and 420 texture statistics.

Kumari et al. [34] used a CNN to classify mangoes as ripe, partially ripe, or unripe. The authors use the K-means algorithm for segmentation.

Studies discussing indices such as the NDVI, EVI, or SAVI generally focus on the condition of the plant (mainly in leaves) rather than the state of the fruit. This is because most of these indices are derived from high-altitude data collected using unmanned aerial vehicles (UAVs). These indices are more suitable for evaluating the overall condition of the crop and are not specifically related to harvesting.

Some studies address the detection of pests on leaves, offering a more specific perspective on crop conditions. These approaches can be complemented by indices, enabling a more comprehensive and accurate assessment of the crop [41,42].

Other studies calculate the soluble solid content (SSC) to determine the ripening stages. Huang et al. [43] and Guo et al. [36] used a BPNN to calculate SSC. However, Liu et al. [44] argue that calculating SSC is computationally expensive.

We also found that Recurrent Neural Networks (RNNs) have been used. Dhiman et al. [38] assessed the quality of nine types of fruits using an RNN. Also, Watnakornbuncha et al. [45] used a combination of RNNs and CNNs for quality prediction in lemon crops.

Magro et al. [46] reviewed computational models in precision fruit growing, emphasizing the impact of temporal variability on perennial crop yield assessment. The study focuses on how fluctuations in environmental conditions affect fruit development and yield predictions. Machine learning techniques, including Artificial Neural Networks, have been used to model these variations, integrating multisource data such as weather patterns, phenological stages, and spectral indices.

3.5. Image Augmentation

The most common augmentation methods were scaling, flipping, rotation, and translation, which aim to improve generalization. Some studies also apply normalization, rescaling each pixel value from the range 0 to 255 to the range 0 to 1 [23,47].
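The operations listed above can be sketched in plain Python on a small grayscale image stored as a list of rows. This is only an illustration under that simplifying assumption: the reviewed studies work with full-size color images and dedicated libraries, the function names are hypothetical, and scaling is omitted for brevity.

```python
def normalize(image):
    """Rescale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(row) for row in zip(*image[::-1])]


def translate_right(image, shift, fill=0):
    """Shift columns right by `shift` pixels, padding the left edge with `fill`."""
    width = len(image[0])
    shift = min(shift, width)
    return [[fill] * shift + row[:width - shift] for row in image]


# Hypothetical 2 x 3 grayscale image.
image = [[0, 128, 255],
         [64, 32, 16]]
augmented = [flip_horizontal(image), rotate_90_clockwise(image), translate_right(image, 1)]
model_inputs = [normalize(variant) for variant in augmented]
```

Each augmented variant keeps the original label, so the training set grows without collecting new images.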

Li et al. [32] aimed to explore an effective approach to invert and map canopy nitrogen concentration (CNC) distributions based on UAV hyperspectral imagery data for apple crops. The drone was a DJI Matrice 600 PRO equipped with a Cubert UHD 185-Firefly hyperspectral sensor to capture canopy images of an apple orchard.

Images from different spectra support computer vision tasks such as classification, segmentation, and object detection. For instance, Kumar et al. [48] employed infrared images from the Landsat satellite to train a Convolutional Neural Network, achieving an accuracy of 90.10%.

Farjon et al. [49] discuss the development of various datasets to enhance model training, incorporating imaging techniques like RGB, hyperspectral, and thermal imaging. The review also addresses challenges related to occlusion, where objects like fruits or plants are partially hidden; scale variation, where objects appear in different sizes due to perspective changes; and data scarcity, which limits model training and generalization. Strategies like data augmentation and transfer learning are explored to mitigate these issues.

In Ashok et al. [35], an image dataset was created to train a CNN with the classes chilling damage, defective, and non-defective in mango crops. The images were taken between 12:00 and 13:00 using a Sony Cyber-shot DSC-WX7 digital camera. The dataset contained 2279 color images at a resolution of 640 × 480 pixels, each with a size of 60.5 KB. Preprocessing consisted of resizing the images to 200 × 150 pixels.

Identified Techniques for Image Processing

According to the reviewed literature, preprocessing is required before creating the models. Primarily, preprocessing was performed to resize the image, crop the image, rotate the image, and check the contrast. Because there are no tidy data but rather images that are input to the models, distributions or relationships in the variables are not reviewed. In the studies where a CNN was used, it was identified that feature selection was related to creating the necessary filters to find shapes, colors, and sizes. For example, in [25], in the pooling layer, filters were used to reduce the image size and reduce its dimensionality.

Table 5 shows the general steps that were derived from the literature to capture the general flow for solving the problem. First, the image must be captured and then processed to obtain the patterns, from which the classification or prediction model can be created. A further step, post-processing and decision making, was not identified in the literature; only ref. [27] indicates that alerts should be sent, but it is unclear whether the approach actually sends them. A complete approach would alert users about the status of the fruit in real time. The techniques identified for the images are presented below.

Kumari et al. [34] used the gray-level co-occurrence matrix (GLCM) technique to extract features from a mango dataset for a CNN, analyzing the texture of the image using energy, entropy, contrast, and homogeneity.
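A minimal sketch of how these four texture descriptors can be computed from a GLCM, assuming a single pixel offset, a symmetric normalized matrix, and an image already quantized to a small number of gray levels. These choices and the function names are assumptions and do not reproduce the pipeline in [34]. Note also that some libraries define energy as the square root of the sum used here.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Energy, entropy, contrast, and homogeneity of a normalized GLCM `p`."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


# Hypothetical 2-level images: a flat patch and vertical stripes.
flat = texture_features(glcm([[1, 1], [1, 1]], levels=2))
stripes = texture_features(glcm([[0, 1], [0, 1]], levels=2))
```

The flat patch yields zero contrast and maximal energy, while the striped patch yields higher contrast and entropy, which is the kind of difference a classifier can exploit.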

Knott et al. [50] used pre-trained vision transformers to create a CNN to detect defects in apples and ripeness in bananas. They split the images into tokens using the DINO ViTs approach. In addition, Xiao et al. [51] used vision transformers to calculate apple ripeness mainly with a Swin Transformer.

Da et al. [52] applied a deep neural network (DNN) to infrared images to identify bruised regions in apples.

3.6. Type of Crops

As mentioned before, this review focused on lemon, mango, coffee, and apple crops. However, as some works also covered other types of crops, we counted these as well. Figure 3 presents the percentage of studies that addressed crop quality for each type of crop. The apple is the most studied crop in the reviewed literature. The "other" category refers to pitahaya, strawberry, corn, potato, orange, pineapple, guava, grape, pomegranate, and peach.

3.7. Calculating Vegetation Index

According to the systematic literature review, only three articles regarding vegetation indices were identified. In particular, Kumar et al. [48] calculated the NDVI using common neural network architectures. The study focused on assessing the state of crops, without addressing the relationship between ripeness and fruit quality.

Li et al. [32] used the vegetation index (NDVI) to calculate the canopy nitrogen concentration in apple crops. This vegetation index is very commonly used to extract the green vegetation of crops.

Afsar et al. [53] used the NDVI index to detect mango crops. The authors use a CNN to calculate the health status of the crop.
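For reference, the NDVI is computed per pixel from the near-infrared (NIR) and red reflectance bands as (NIR - Red) / (NIR + Red). The sketch below applies this formula to small hypothetical bands; returning 0.0 when both bands are zero is an assumption made to avoid division by zero, and the function names are illustrative.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red) for one pixel; 0.0 when both bands are 0."""
    denom = nir + red
    return (nir - red) / denom if denom else 0.0


def ndvi_map(nir_band, red_band):
    """Apply NDVI pixel by pixel to two equally sized reflectance bands."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]


# Hypothetical reflectance values in [0, 1].
nir_band = [[0.5, 0.2],
            [0.8, 0.0]]
red_band = [[0.1, 0.2],
            [0.2, 0.0]]
index = ndvi_map(nir_band, red_band)
```

Values close to 1 indicate dense green vegetation, while values near 0 or below indicate bare soil, water, or built surfaces.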

3.8. Software and Hardware in Artificial Neural Networks in Crops

Some ANNs aim to support tasks in automated harvesting using robots, generally for cultivation, and the application of insecticides. Consequently, architectures requiring lower computational resources, such as MobileNetV2 (a CNN-based model), are favored for practical implementation [31].

Mirbod et al. [54] developed hardware to take images for apple quality identification based on the size of each fruit.

Bongulwar et al. [55] designed a lighting prototype to avoid backscatter effects from other lighting sources when taking images for quality assessment of mangoes.

3.9. Common ANN Architectures and Models

Some architectures are commonly used for fruit classification, addressing various purposes ranging from hardware-specific applications to the evaluation of detailed characteristics for detection. Architectures designed for use on hardware or robots during harvesting must be lightweight. In contrast, those aimed at obtaining detailed information about fruits, such as diseases or pests, require more computationally intensive evaluations to support the application of pesticides or fertilizers. The most widely used architectures are described in this section.

A way to capture the mathematical meaning of deep convolutional architectures is to focus on how each network transforms an input image X through a series of convolutional, activation, and pooling or skip-connection operations, which results in a set of fully connected layers that produce the final outputs [56].

3.9.1. MobileNetV2 [31]

MobileNetV2 uses depthwise separable convolutions to reduce computational complexity.
A standard 2D convolution with kernel $W \in \mathbb{R}^{k \times k \times C_{in} \times C_{out}}$ is factored into

$\text{Conv2D}(X, W) \approx \underbrace{\text{Conv2D}_{\text{depthwise}}(X, W_{1})}_{k \times k \times 1 \text{ filters}} \to \underbrace{\text{Conv2D}_{\text{pointwise}}(\cdot, W_{2})}_{1 \times 1 \text{ filters}}.$
It includes inverted residual blocks with linear bottlenecks, designed to reduce parameters while preserving representational capacity.
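To illustrate why the factorization above reduces complexity, the following sketch compares weight counts for a standard convolution and its depthwise separable counterpart. Bias terms are ignored, and the layer sizes are hypothetical rather than taken from MobileNetV2.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution: k * k * C_in * C_out."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k x 1 filter per input channel
    pointwise = c_in * c_out   # 1 x 1 filters that mix channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)
reduction = standard / separable
```

For this example the separable form needs roughly eight times fewer weights, which is the kind of saving that makes such layers attractive for embedded or robotic hardware.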

3.9.2. VGG16 [26,27,57,58,59]

VGG16 is characterized by sequential 3 × 3 convolutions with fixed spatial padding and ReLU activations.
Each layer transforms an input feature map $X$ via

$X_{\ell + 1} = \sigma(\text{Conv2D}(X_{\ell}, W_{\ell})),$

where $\sigma$ is a nonlinear activation function (ReLU).

3.9.3. ResNet50 [23,59]

ResNet50 introduces skip connections (or residual connections) to combat vanishing gradients and enable very deep architectures.
The residual block can be written as

$X_{\ell + 1} = X_{\ell} + F(X_{\ell}, W_{\ell}),$

where $F(\cdot)$ is a stack of convolution, batch normalization, and ReLU layers.
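The skip connection can be sketched on a plain feature vector. For brevity, this illustration replaces the convolution and batch normalization layers inside F with two small dense layers, which is a simplification and not the ResNet50 block itself; all names and weights are hypothetical.

```python
def relu(vector):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in vector]


def linear(vector, weights):
    """Dense layer without bias: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute x + F(x), with F simplified to linear -> ReLU -> linear."""
    fx = linear(relu(linear(x, w1)), w2)
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

unchanged = residual_block(x, zeros, zeros)       # F(x) = 0, so the block is an identity map
adjusted = residual_block(x, identity, identity)  # F(x) = relu(x), added to x
```

With zero weights the block returns its input unchanged, which shows why stacking many such blocks does not by itself degrade the signal passed to deeper layers.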

3.9.4. Inception V3 [58]

Inception V3 uses inception modules that factorize convolutions into parallel paths with different kernel sizes (e.g., $1 \times 1$, $3 \times 3$, $5 \times 5$) and then concatenate their outputs along the channel dimension.
Mathematically, for an inception block with $k$ parallel paths, the following applies:

$X_{\ell + 1} = \bigoplus_{i = 1}^{k} \sigma(\text{Conv2D}(X_{\ell}, W_{\ell, i})),$

where $\bigoplus$ denotes concatenation along channels.

3.9.5. ThinNet [60]

ThinNet is a family of lightweight CNNs focusing on channel reduction or factorized convolutions to reduce computation.
It may use techniques similar to depthwise separable convolutions or group convolutions to achieve smaller parameter counts while maintaining accuracy.

3.9.6. Faster R-CNN [54]

A Faster R-CNN is a two-stage object detection architecture.
Stage 1: The Region Proposal Network (RPN) predicts candidate bounding boxes as pairs $(p, t)$, where $p$ represents objectness probabilities and $t$ represents bounding-box coordinates.
Stage 2: It classifies the proposals and refines their coordinates.

3.9.7. Mask R-CNN [51,54]

A Mask R-CNN extends a Faster R-CNN with a third branch for instance segmentation.
It adds an FCN-style (fully convolutional network) head for predicting the segmentation mask $M$ of an object within each detected bounding box.

3.9.8. YOLOv3 [61] and YOLOv5 [62]

YOLOv3 and YOLOv5 are single-stage detection architectures designed to perform object localization and classification simultaneously, providing bounding boxes and associated class probabilities.
They operate on multiple scale levels, detecting objects of varying sizes efficiently.
Each detection scale $i \in \{1, 2, 3\}$ generates feature maps through convolutional layers (Conv2D), predicting anchor boxes as follows:

$(p_{i}, t_{i}) = \mathrm{DetectionHead}_{i} (X_{ℓ})$

where the following applies:
– $p_{i}$ denotes the objectness probabilities and class probabilities.
– $t_{i}$ represents bounding-box coordinate adjustments at the corresponding scale level.
– $X_{ℓ}$ indicates the input feature map generated by the backbone network at scale ℓ.

3.9.9. DenseNet [63]

DenseNet uses dense connections: each layer takes as input all the feature maps of preceding layers.
If $X_{0}, X_{1}, \dots, X_{ℓ - 1}$ are previous feature maps, the ℓ-th layer output is

$X_{ℓ} = F ([X_{0}, X_{1}, \dots, X_{ℓ - 1}], W_{ℓ}),$

where $[X_{0}, X_{1}, \dots, X_{ℓ - 1}]$ denotes concatenation and F is a convolution-based transformation.
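One consequence of the concatenation is that the input width of each layer grows linearly with depth. The sketch below counts input channels under the assumption that every layer emits the same number of new feature maps (usually called the growth rate); the values 64 and 32 are illustrative, not taken from the reviewed studies.

```python
def dense_input_channels(c0: int, growth: int, layer: int) -> int:
    """Channels seen by layer `layer` (1-based): X_0 plus all earlier layer outputs."""
    if layer < 1:
        raise ValueError("layer index starts at 1")
    return c0 + growth * (layer - 1)


# Assumed sizes: X_0 has 64 channels, each layer adds 32 feature maps.
channels = [dense_input_channels(64, 32, layer) for layer in range(1, 5)]
print(channels)
```

Each layer stays narrow while its input widens, so parameter counts remain modest even though many feature maps must be kept available for later layers.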

Despite their architectural differences, these networks share the fundamental concept of iteratively applying convolutional transformations, normalizations, and nonlinearities to extract abstract features from raw image data. The design variations—such as depthwise separable layers, residual connections, inception modules, and dense connections—aim to enhance accuracy, efficiency, or both, all within the same mathematical framework of convolutional feature extraction.

We identified common principles that these architectures employ, such as the Conv2D operation, which captures spatial patterns and local features. They learn hierarchical features, detecting edges and textures in shallow layers and recognizing shapes or objects as more abstract features in deeper layers. The following formula illustrates this shared operation:

$y (i, j, c) = \sum_{k, l, d} w (k, l, d, c) \cdot x (i + k, j + l, d) + b (c)$

where the following applies:

$y (i, j, c)$ : the value of the output feature map at spatial position $(i, j)$ for output channel c.
$x (i + k, j + l, d)$ : the value of the input feature map at spatial position $(i + k, j + l)$ for input channel d.
$w (k, l, d, c)$ : the value of the convolutional filter (or kernel) at position $(k, l)$ for input channel d and output channel c.
$b (c)$ : the bias added to output channel c.
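The formula can be evaluated directly with nested loops. The sketch below assumes no padding and a stride of 1, which the formula does not specify, and uses nested Python lists instead of an array library so that each index matches a symbol in the equation; the toy input and kernel are made up for illustration.

```python
from typing import List

Tensor3 = List[List[List[float]]]        # indexed [row][col][channel]
Kernel4 = List[List[List[List[float]]]]  # indexed [k][l][d][c]


def conv2d(x: Tensor3, w: Kernel4, b: List[float]) -> Tensor3:
    """Evaluate y(i,j,c) = sum_{k,l,d} w(k,l,d,c) * x(i+k, j+l, d) + b(c)."""
    height, width = len(x), len(x[0])
    k_h, k_w = len(w), len(w[0])
    d_in, c_out = len(w[0][0]), len(w[0][0][0])
    out_h, out_w = height - k_h + 1, width - k_w + 1  # no padding, stride 1 (assumed)
    # Start every output value at the bias of its channel.
    y = [[[b[c] for c in range(c_out)] for _ in range(out_w)] for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            for c in range(c_out):
                for k in range(k_h):
                    for l in range(k_w):
                        for d in range(d_in):
                            y[i][j][c] += w[k][l][d][c] * x[i + k][j + l][d]
    return y


# Toy input: one 3 x 3 channel holding 1..9; one 2 x 2 kernel of ones; bias 0.5.
x = [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]], [[7.0], [8.0], [9.0]]]
w = [[[[1.0]], [[1.0]]], [[[1.0]], [[1.0]]]]
b = [0.5]
y = conv2d(x, w, b)
print(y)
```

Each output value here is the sum of a 2 × 2 window plus the bias, for example 1 + 2 + 4 + 5 + 0.5 = 12.5 at position (0, 0).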

They are trained with optimization techniques such as SGD (Stochastic Gradient Descent) or its variants, such as Adam or RMSProp (Root Mean Square Propagation), which build on the basic update rule:

$θ \leftarrow θ - η \nabla_{θ} L (X, Y; θ)$

where the following applies:

$θ$ : model parameters.
$η$ : learning rate.
$\nabla_{θ} L (X, Y; θ)$ : the gradient of the loss function $L$ with respect to parameter $θ$ .
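As a minimal sketch of this update rule, the loop below minimizes a toy one-parameter loss, L(θ) = (θ − 3)². The loss, the starting point, and the learning rate are assumptions for illustration; in a real network the gradient would be estimated from a mini-batch of images, which is where the "stochastic" part comes from.

```python
def loss_gradient(theta: float, target: float = 3.0) -> float:
    """Gradient of the toy loss L(theta) = (theta - target) ** 2."""
    return 2.0 * (theta - target)


def sgd_step(theta: float, grad: float, lr: float) -> float:
    """One update: theta <- theta - lr * grad."""
    return theta - lr * grad


theta = 0.0   # initial parameter (assumed)
lr = 0.1      # learning rate eta (assumed)
for _ in range(100):
    theta = sgd_step(theta, loss_gradient(theta), lr)
print(theta)
```

Each step moves θ a fraction of the way toward the minimizer at 3; Adam and RMSProp keep the same structure but rescale the step using running statistics of past gradients.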

Because current architectures require datasets with a large number of images, some approaches have leveraged transfer learning to build on existing pre-trained models. For instance, transfer learning techniques using CNN architectures such as ResNet50 have been employed to predict the maturity of Citrus limon fruits with high precision [64]. Fine-tuning pre-trained models reduced the need for large datasets while still achieving accurate predictions of fruit maturity and optimizing computational resources.

Arivalagan et al. [65] used transfer learning with CNN-based architectures, including InceptionV3 and MobileNetV2, to assess fruit quality. This work emphasized the effectiveness of lightweight and efficient models, especially in real-time applications where computational limitations are critical. In addition, it underlines the versatility of transfer learning in adapting existing models to specific tasks in agriculture, enabling the faster development and deployment of solutions for tasks like fruit quality assessment and maturity prediction.

3.10. Metrics for Evaluating the Artificial Neural Networks

Table 6 presents the metrics used in the articles and their corresponding values. These metrics are important because they indicate how much confidence can be placed in each model for the problem posed in its article, given the type of neural network and data. Most of the articles use CNNs evaluated with accuracy, because most authors process images to decide whether the crop should be harvested or not.

The following formulas represent the metrics used to measure the efficiency of these kinds of models:

3.10.1. R-Squared ( $R^{2}$ )

This metric measures how well the model explains the variability of the observed data:

$R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}$

where the following applies:

$y_{i}$ : the observed value for the i-th data point.
${\hat{y}}_{i}$ : the predicted value for the i-th data point from the regression model.
$\bar{y}$ : the mean of all observed values.
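A direct translation of the formula is shown below as a minimal sketch; the observed and predicted values are made up for illustration and do not come from any reviewed study.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot, following the formula above."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r_squared(observed, predicted))
```

Here the residual sum of squares is 1 and the total sum of squares is 5, so the model explains 80% of the variance.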

3.10.2. Root Mean Square Error (RMSE)

The RMSE is the square root of the mean squared residual, which can be read as the standard deviation of the residuals:

$RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}$

where the following applies:

n: the total number of data points.
$y_{i}$ : the observed value for the i-th data point.
${\hat{y}}_{i}$ : the predicted value for the i-th data point.
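The following short sketch evaluates the RMSE formula on made-up values; it is expressed in the same units as the target variable, which makes it easy to interpret.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observations and predictions.
error = rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(error)
```

A single error of 1 across four points gives a mean squared error of 0.25 and therefore an RMSE of 0.5.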

3.10.3. Accuracy

Accuracy measures the proportion of correctly predicted instances (positive and negative) out of the total number of predictions.

$Accuracy = \frac{T P + T N}{T P + T N + F P + F N}$

where the following applies:

$T P$ : true positives.
$T N$ : true negatives.
$F P$ : false positives.
$F N$ : false negatives.
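As a minimal sketch, accuracy can be computed from the four confusion-matrix counts. The counts below describe a hypothetical ripe/unripe classifier and are not results from the reviewed studies.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical counts: 40 ripe fruits found, 45 unripe fruits rejected,
# 5 unripe fruits flagged as ripe, 10 ripe fruits missed.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
print(acc)
```

Accuracy treats both error types equally, so on imbalanced datasets it is worth reading it together with precision and recall.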

3.10.4. Precision, Recall and F1-Score

These formulas define three commonly used metrics for evaluating classification models. Recall measures the proportion of actual positive instances correctly identified by the model, whereas precision measures the fraction of predicted positive instances that are truly positive. The F1-Score, which is the harmonic mean of precision and recall, provides a balanced measure that is useful when both metrics are equally important.

$Recall = \frac{TP}{TP + FN}$

where the following applies:

$T P$ : true positives.
$F N$ : false negatives.

$Precision = \frac{TP}{TP + FP}$

where the following applies:

$T P$ : true positives.
$F P$ : false positives.

$F1\text{-}Score = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$

The previous metrics are commonly used to validate classification models, particularly in applications related to precision agriculture, where the accurate detection, classification, and segmentation of agricultural variables are useful. These metrics indicate the model’s strengths and weaknesses by quantifying how closely the predictions match the ground truth.
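The three formulas can be sketched as small functions over confusion-matrix counts. The counts are hypothetical, and returning 0.0 for an empty denominator is a convention assumed here rather than one stated in the reviewed studies.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP); 0.0 if nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN); 0.0 if there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """2 * P * R / (P + R), the harmonic mean of precision and recall."""
    return 2.0 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for a defect detector.
p = precision(tp=40, fp=5)
r = recall(tp=40, fn=10)
f1 = f1_score(p, r)
print(round(p, 4), round(r, 4), round(f1, 4))
```

Because the harmonic mean is pulled toward the smaller of the two values, the F1-Score here (about 0.842) is slightly below the arithmetic average of precision and recall (about 0.844).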

3.11. Input Data Dimensionality and Dataset Sizes

The reviewed studies describe a wide range of input data dimensionalities and dataset sizes, depending on the type of image and the problem being addressed. In Li et al. [32] (2022), the authors collected 92 hyperspectral samples from apple canopies using UAV imaging, splitting them into 69 samples for training and 23 for validation, following a 3:1 ratio. Each sample was represented by a 16-dimensional vector composed of selected spectral indices such as the NDVI and red-edge parameters.

In contrast, studies based on RGB images often report the resolution and number of input images. Ashok et al. [35] (2023) created a dataset of 2279 mango fruit images with an initial resolution of 640 × 480 pixels. These images were preprocessed by resizing them to 200 × 150 pixels before being fed into a CNN. The dataset was split into 80% for training and 20% for testing, and 10-fold cross-validation was applied on the training set.

Peng et al. [33] (2023) collected UV-Vis spectroscopy data from 100 apple samples for predicting soluble solid content (SSC) and increased the dataset to 165 spectra through data augmentation. The raw spectral data (ranging from 190 to 1100 nm) was reduced to 35 principal components using PCA before model training.

For the classification of leaf diseases, some datasets began with 1452 images and were augmented to over 3000 images. For instance, Bezabh et al. [26] (2024) resized all images to 128 × 128 pixels and applied augmentation techniques. In their study, the data were split into 70% training, 15% validation, and 15% testing sets.

These examples illustrate the variation in image dimensionality, from small feature vectors derived from spectroscopy or hyperspectral indices to full-resolution or resized RGB images, as well as the different dataset sizes and training-validation strategies employed across studies.
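The split sizes quoted above follow from simple arithmetic, sketched below. The helper `split_counts` is hypothetical, and its rounding rule (round every part except the first, which receives the remainder) is an assumption; the studies do not state how fractional counts were handled.

```python
from typing import List, Sequence


def split_counts(n_samples: int, fractions: Sequence[float]) -> List[int]:
    """Sample count per subset; the first subset absorbs any rounding remainder."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    tail = [round(n_samples * f) for f in fractions[1:]]
    return [n_samples - sum(tail)] + tail


# 92 hyperspectral samples at a 3:1 ratio, as reported for the apple canopy study.
canopy = split_counts(92, (0.75, 0.25))
# 2279 mango images at 80/20; the exact counts are an assumption of this sketch.
mango = split_counts(2279, (0.8, 0.2))
print(canopy, mango)
```

The first call reproduces the 69 training and 23 validation samples mentioned above; the second yields 1823 and 456 images under the assumed rounding rule.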

3.12. Training and Validation Methodologies

The included studies employed a variety of strategies to train models and validate their performance. A common approach is to split the available data into separate training and testing subsets. For instance, in the mango leaf disease Ensemble-CNN study, the authors partitioned their augmented dataset into 70% training, 15% validation, and 15% testing data [26]. Many deep learning studies follow this paradigm with similar ratios (e.g., 80/20 or 70/30 splits). On the other hand, several works leveraged cross-validation to make full use of limited data and to obtain more reliable performance estimates. Ashok et al. (2023) first set aside 20% of their mango fruit images as an independent test set and then performed 10-fold cross-validation on the remaining 80% during training [35]. This yielded not only an average accuracy but also a 95% confidence interval for the SVM classifier’s performance (they report 95% accuracy with a CI of ±1.05%), indicating an attempt to quantify result stability. In the apple SSC study by Peng et al., the evaluation was even more rigorous: they employed leave-one-out cross-validation (LOOCV) on the training set to compute the RMSECV (root mean square error of cross-validation) as a robust indicator and then tested the final model on a separate test set [33]. This approach helps ensure the reported prediction errors are not tied to a particular train–test split. Another example is the apple canopy nitrogen study, which used an unconventional but systematic split: samples were sorted by nitrogen level and then divided 3:1 into a calibration set (modeling set, 69 samples) and a validation set (23 samples) [32].

Machine learning studies employed k-fold cross-validation or LOOCV to maximize data usage. Each paper’s methodology section specifies the approach: e.g., X–Y% train–test splits [26], k-fold cross-validation (typically k = 5 or 10) [35], or a combination of both (train/validation/test plus the cross-validation of training data) in certain cases.
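The cross-validation schemes mentioned above can be sketched with the standard library alone. The function names, the fixed shuffling seed, and the sample counts are assumptions for illustration; LOOCV is simply the special case in which the number of folds equals the number of samples.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and cut them into k folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold_id in range(k):
        size = base + (1 if fold_id < extra else 0)  # first `extra` folds get one more
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validation_splits(n_samples: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices), holding out each fold once."""
    folds = k_fold_indices(n_samples, k, seed)
    for held_out, validation in enumerate(folds):
        train = [i for fold_id, fold in enumerate(folds) if fold_id != held_out for i in fold]
        yield train, validation


splits = list(cross_validation_splits(23, 5))  # 5-fold on 23 hypothetical samples
loocv = list(cross_validation_splits(7, 7))    # LOOCV: one sample held out per round
print([len(v) for _, v in splits], len(loocv))
```

Every sample is used for validation exactly once, which is why these schemes are attractive when only a few dozen samples are available.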

3.13. Model Hyperparameters and Architecture Details

Most papers provide specifics about model architecture and hyperparameters, though the level of detail varies. Several studies using Artificial Neural Networks (ANNs) outline the network topology and learning parameters. For example, in the apple canopy nitrogen estimation study [32], a single-hidden-layer backpropagation neural network (BPNN) was employed with four neurons in the hidden layer. The authors specified the use of the Levenberg–Marquardt training algorithm (trainlm), a learning rate of 0.001, a target mean squared error of 0.0001, and a maximum of 1000 training iterations for the BPNN.

Similarly, Peng et al. (2023) [33] described their network for predicting soluble solid content (SSC) in apples, selecting a configuration with 14 neurons in the hidden layer (determined via experiments). A learning rate of $η = 0.001$ yielded the best results. Their network was initialized using a stacked autoencoder’s learned weights to improve training stability.

Deep learning models also include hyperparameter descriptions. Ashok et al. (2023) [35] designed a custom CNN for mango defect classification, consisting of three convolutional blocks with 64, 128, and 256 filters respectively, each followed by pooling layers, and two fully connected layers with 128 neurons in the penultimate layer and a 3-neuron softmax output corresponding to the three quality classes. This architecture comprised approximately 1.67 million trainable parameters in total. The authors used the ReLU activation function throughout and added a dropout layer to mitigate overfitting. The CNN was trained for 50–100 epochs, determined by early experimentation, and the batch size was selected to allow for adequate sample exposure per update step.

In the SVM-based portion of the same study, a linear kernel was selected, and the optimal feature subset was determined using a Chicken Swarm Optimization (CSO) algorithm. While the CSO algorithm introduced hyperparameters such as the population size and number of iterations, these were not described in detail.

Across the included papers, commonly reported hyperparameters include the learning rate (typically around 0.001 in deep learning models), the number of training epochs (ranging from tens to over a thousand, depending on model complexity), network architecture specifications (e.g., the number of layers, neurons per layer, and filter sizes; see Table 6), and training techniques such as transfer learning, dropout regularization, and optimization algorithms like Adam, SGD, or MATLAB’s default backpropagation implementations.

4. Discussion

The findings of this systematic literature review highlight several key areas using ANNs for fruit maturity detection, crop monitoring, and harvest prediction. This section discusses observations, limitations, and advancements based on the reviewed studies.

4.1. Using Customized Datasets

Researchers often create their own datasets to ensure variability and model generalizability. These approaches provide datasets aligned with specific crop types (e.g., mango, apple, lemon, and coffee) and environmental conditions, which are often lacking in publicly available datasets [30].

Future research could benefit from efforts to create standardized and publicly available datasets for fruit quality and maturity detection, ensuring larger and more diverse datasets.

4.2. Indices Not Related to Ripeness and Quality of Fruits

Vegetation indices like the NDVI, EVI, and SAVI are frequently used to evaluate the condition of crops but rarely applied to assess fruit ripeness or quality [68]. These indices are derived from high-altitude images captured using UAVs that provide metrics for plant health; however, they are not related to fruit-specific metrics such as sugar content levels, internal pest damage, and ripeness stage.

Approaches to incorporate spectral or infrared imaging for detecting fruit-specific traits are needed because they could increase the accuracy of the models.
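For reference, the NDVI mentioned above is computed per pixel from near-infrared and red reflectance as (NIR − Red) / (NIR + Red). The sketch below shows only this index; the reflectance values are hypothetical, and returning 0.0 when both bands are zero is a convention assumed here.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denominator = nir + red
    if denominator == 0:
        return 0.0  # assumed convention for pixels with no signal in either band
    return (nir - red) / denominator


# Hypothetical reflectances: dense canopy reflects strongly in NIR and weakly in red.
canopy_value = ndvi(nir=0.5, red=0.1)
soil_value = ndvi(nir=0.2, red=0.2)
print(round(canopy_value, 3), soil_value)
```

The index summarizes canopy vigor at the pixel level, which illustrates why it says little about traits of an individual fruit such as sugar content or internal damage.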

4.3. Infrared Images as a Better Dataset to Improve Detection

There are good approaches for improving fruit quality detection based on infrared images and other spectral data. Infrared imaging helps to detect internal defects, pests, and ripeness levels without causing physical damage to the fruits [29]. However, only a few studies leverage these advanced imaging techniques.

Research should explore cost-effective imaging solutions (e.g., low-cost hyperspectral cameras) and lightweight models optimized for real-time fruit assessment.

4.4. Machine Learning Architectures Designed for Robotic Purposes

ANN architectures such as ResNet50, VGG16, and MobileNetV2 are commonly used for fruit classification and segmentation tasks. Lightweight models (e.g., MobileNetV2) are preferred for robotic and real-time applications, where computational resources are limited [31]. Complex architectures (e.g., ResNet50, InceptionV3) are used for detailed evaluations such as disease detection or pest classification, where high accuracy is critical.

Transfer learning represents a solution for reducing computational costs and training times when large datasets are unavailable [64,65]. This approach allows pre-trained models to be fine-tuned for specific agricultural tasks, offering a balance between accuracy and efficiency.

To ensure the effective deployment of these architectures in robotic applications, standardized evaluation metrics are essential.

4.5. Standardized Evaluation Metrics

Metrics such as accuracy, precision, recall, F1-score, and RMSE are commonly used to evaluate the performance of ANNs in agricultural applications, including robotic systems. However, the lack of standardized metrics makes it difficult to compare models across different tasks and platforms. Establishing a common set of performance metrics would facilitate comparisons and help identify the most effective ANN approaches, particularly for robotic harvesting and monitoring.

4.6. Integration of Hardware and Software

ANNs in agriculture require the harmonious integration of hardware and software tools. Research should focus on optimizing software–hardware interactions to develop scalable and low-cost solutions for real-world agricultural applications. Zhang et al. [69] provide a survey on using small unmanned aerial vehicles (UAVs) for orchard management, emphasizing their role in data acquisition and crop monitoring. The study reviews various UAV types and sensor technologies, including RGB, multispectral, hyperspectral, thermal, and LiDAR, focusing on their applications in assessing crop health, resource efficiency, and disease detection. Integrating these sensing technologies with AI analysis presents opportunities to enhance precision agriculture, improving automation and decision making in orchard management.

4.7. Discussion on the Complexity of Approaches

Each approach in the selected literature used a type of neural network or a pre-trained model to determine the state of the fruit. Table 7 presents a comparison of Artificial Neural Networks and pre-trained models, indicating the main advantages and disadvantages of each and its level of complexity in terms of use and implementation. The level of complexity can be interpreted in terms of processing time or of configuration, that is, whether the structure is simple or complex. For example, VGG has high complexity compared to YOLOv3 and YOLOv5 because of its large number of parameters; it is easy to interpret but takes a long time to train. ResNet also has high complexity because it introduces residual connections to allow very deep networks, which consumes a lot of memory. DenseNet connects each layer to all previous layers, which maximizes feature reuse. Although DenseNet uses fewer parameters than ResNet, each layer receives inputs from all previous layers, which increases memory usage. The comparison is made in terms of use and implementation because the approaches differ in image size, number of images, and number of parameters. Therefore, a direct comparison of accuracy is not applicable, since each approach solves the problem differently.

Among Artificial Neural Networks, DPNNs are computationally expensive because they incorporate uncertainty into their predictions, and they are highly complex compared to CNNs, RNNs, and DNNs. CNNs, in turn, are very useful for object recognition in images, although they require large datasets.

4.8. Recommendations According to the Results

Based on the results of the literature, the following can be recommended to assess fruit quality. Note that the first step is to identify whether the setting is post-harvest or not:

Post harvest
– Crop type. At this point, it is recommended to use a single crop type to calculate fruit quality, preferably apples or mangoes, which have been studied the most.
– Artificial Neural Network. It is recommended to use a CNN-type neural network in a model that calculates the condition of the fruit based on previously taken photos. Preprocessing should focus on generating rotations, as the fruit could be in different positions or have extra elements around it, which could lead to incorrect predictions.
– Pretrained model. It is recommended to use YOLOv3, as the results of the selected studies show good accuracy.
– Image processing technology. Because these types of studies are designed to be carried out on conveyor belts or in containers, it is recommended to use a professional camera to take the pictures. This device must have an internet connection to upload the image to the cloud. Image resizing and texture techniques must then be applied before passing the image to the model to achieve good results.
In harvest
– Crop type. Due to the shape of the fruit, it is best to use a single crop type in the study. In terms of size, apples and mangoes are the best to study. Lemons and coffee are smaller, so these crop types could make it difficult to calculate the fruit’s state.
– Artificial Neural Network. Much work has been carried out with CNNs. It is recommended to use CNNs to calculate the fruit’s state, performing image preprocessing to classify the fruit’s ripeness.
– Pretrained model. ResNet-50 is used to classify ripeness based on color changes. However, YOLOv5 could be used as long as adjustments are made to calculate fruit size, since we identified that YOLO has good results when the size and shape of the fruit are taken into account.
– Image processing technology. It is recommended to take spectral images with drones to determine the overall condition of the crop using the NDVI. To calculate the condition of individual fruits, photos can be taken manually with a camera. Otherwise, an approach must be created in which a drone flies from fruit to fruit and takes photos of them. This requires one model to detect whether an object is a fruit so that the drone can generate a flight path, and another model to calculate the condition of the fruit.

4.9. Reliability and Significance of Reported Results

One notable gap in the reviewed literature is the limited attention to statistical validation and result consistency across multiple experimental runs. While most studies report high classification or prediction accuracies (ranging from 83% to 99% for CNN-based models), most do not include statistical significance testing or variability metrics to assess the robustness of these results.

For instance, Peng et al. [33] (2023) report achieving an accuracy of 98.48% on training data and 87.88% on test data using an optimized neural network but do not specify whether this outcome was averaged over multiple runs or subject to statistical testing. Similarly, many studies, including those employing CNNs or SVMs, provide point estimates for accuracy, precision, or $R^{2}$, often based on a single execution. While cross-validation (e.g., k-fold or leave-one-out) was used in some cases to mitigate overfitting and assess generalization (e.g., Ashok et al. [35] 2023; Huang et al. [43] 2021), formal tests of significance (such as ANOVA, t-tests, or confidence intervals) have been rarely reported.

An exception is found in Ashok et al. [35] (2023), where a 10-fold cross-validation procedure was used to report a 95% confidence interval for classification accuracy, offering a limited but valuable measure of reliability. Nonetheless, this level of rigor remains uncommon among the reviewed studies.

We therefore emphasize that the reliability of the performance metrics reported in the literature is often assumed rather than demonstrated. Future studies should incorporate statistical analysis to validate improvements, such as using multiple random train–test splits, reporting standard deviations, and applying hypothesis testing to compare models. Including these practices would strengthen the credibility and reproducibility of ANN-based approaches in agricultural image analysis.
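As a minimal sketch of the kind of reporting recommended here, the function below summarizes accuracies from repeated runs as a mean, a sample standard deviation, and an approximate 95% confidence interval. The scores are hypothetical, and the default critical value of 1.96 is a normal approximation; with few runs, a Student t critical value (about 2.262 for 10 runs) would be more appropriate.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize_runs(scores: Sequence[float], critical_value: float = 1.96) -> Tuple[float, float, float]:
    """Return (mean, sample standard deviation, CI half-width) for repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variability")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    half_width = critical_value * std / math.sqrt(len(scores))
    return mean, std, half_width


# Hypothetical accuracies from four repeated runs (not taken from any reviewed study).
scores = [0.94, 0.96, 0.95, 0.95]
mean, std, half_width = summarize_runs(scores)
print(f"accuracy = {mean:.3f} +/- {half_width:.3f} (std = {std:.4f})")
```

Reporting the interval alongside the point estimate makes it possible to judge whether a difference between two models exceeds run-to-run variation.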

5. Future Work

In this article, we conducted a literature review to validate the use of neural networks to detect fruit status using image processing. We found that there is a big difference between approaches that focus on pre-harvest [27,45] and post-harvest [33,34,38,50,55,66]. This might change the way the image is captured to calculate the fruit status as, if it is post-harvest, the fruit may already be in a packing machine. However, if the fruit is in the crop, a mechanism must be created to calculate the fruit status, but the location of the fruits must first be detected using real-time approaches [27].

Most of the literature agrees that fruit quality is still handled manually. On the one hand, this implies that the agricultural industry must continue to strengthen itself to achieve control over crop quality. On the other hand, the current literature mainly focuses on disease detection and on quantitatively or qualitatively assessing fruit quality, fruit size, and soluble solid content. Some approaches are further divided into destructive methods [70] and non-invasive methods that avoid wasting fruit [34,35,36,37,44,53,63,66,71]. However, we found little emphasis in the literature on approaches that monitor the crop, count how many fruits are damaged, and notify the crop manager. With this in mind, there is a great opportunity to create new research projects to solve this issue.

5.1. Generate New Literature Review

This work shows us that there are many opportunities for precision agriculture. Now, we want to review how approaches have been generated to control drones focused on automatic navigation for the purpose of monitoring the state of the crop. This new literature review will allow us to understand how to process images in real time and determine daily how the crop behaves in terms of its quality.

5.2. New Approaches in Crop Monitoring

The results of the literature review have left us with the following approaches that can be applied:

Using ANNs. We want to use Convolutional Neural Networks to detect the state of fruits. A dataset with images of fruits in different states (ripe, unripe, healthy, and damaged) could be created. In addition, we want to calculate the vegetation index with aerial images to have a complete analysis of the crop. So, with the vegetation index, we can calculate the diseases, physical damage, and water deficit. This combination of looking at each fruit with neural networks and generally the crop with the vegetation index would give us a complete approach to industrializing the crop and measuring its quality.
Multiple types of crops in the analysis. In most cases, researchers focused on a single type of crop. Therefore, it is necessary to have an approach that can analyze different types of crops. In this way, a complete solution can be provided. However, this is a challenge because there would be several types of problems that the neural network must solve.

6. Conclusions

This work presents a systematic literature review (SLR) on the application of Artificial Neural Networks (ANNs) for assessing fruit quality and maturity in mango, apple, lemon, and coffee, using image processing techniques. CNNs emerged as the most frequently utilized networks to assess fruit conditions, achieving accuracies ranging between 83% and 99%. Among these architectures, VGG16 was identified as the most commonly used, with only studies employing VGG16 explicitly integrating transfer learning techniques [27,70]. The review analyzed various architectures such as MobileNetV2, VGG16, and YOLO, demonstrating their effectiveness in classifying fruits and detecting defects or diseases accurately.

Several studies considered multiple crop types simultaneously [27,38,44,50,63], with apple and mango being the most widely investigated. We found that the methods used are generally post-harvest, taking into account that real-time methods are applied to detect how crops behave.

Additionally, this review identified several research opportunities, particularly the limited integration of vegetation indices, infrared, and hyperspectral imaging techniques, which offer substantial advantages by enabling non-destructive assessments of internal fruit quality and maturity. These approaches could enhance the predictive capabilities of precision agriculture systems.

There remains a need for developing optimized ANN algorithms for real-time deployment in embedded systems and robotic hardware. Lightweight architectures and transfer learning techniques should be further explored to provide cost-effective and efficient solutions in precision agriculture environments.

A notable issue identified was the lack of standardized public datasets and their metrics, hindering consistent comparisons between studies. Future research should focus on developing such standards to improve reproducibility and facilitate the benchmarking of neural network performance in agriculture applications.

Finally, the integration of vegetation indices such as the NDVI or EVI into neural network-based analyses remains underexplored, despite their proven effectiveness in monitoring crop health. This integration would enhance the practicality of smart agriculture solutions, enabling not only accurate predictions but also efficient resource management and timely decision making in farming practices.

Author Contributions

Conceptualization, C.U., J.H. and H.F.; methodology, C.U., J.H. and H.F.; formal analysis, C.U., J.H. and H.F.; investigation, C.U. and J.H.; resources, H.F.; data curation, C.U. and J.H.; writing—original draft preparation, C.U. and J.H.; writing—review and editing, H.F.; supervision, H.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ANN	Artificial Neural Network
BPNN	Backpropagation Neural Network
CNN	Convolutional Neural Network
GAN	Generative Adversarial Network
MLP	Multilayer Perceptron
SVM	Support Vector Machine
UAV	Unmanned Aerial Vehicle
YOLO	You Only Look Once

References

Sood, S.; Singh, H. Computer vision and machine learning based approaches for food security: A review. Multimed. Tools Appl. 2021, 80, 27973–27999.
Jararweh, Y.; Fatima, S.; Jarrah, M.; AlZu’bi, S. Smart and sustainable agriculture: Fundamentals, enabling technologies, and future directions. Comput. Electr. Eng. 2023, 110, 108799.
Singh, B.K.; Delgado-Baquerizo, M.; Egidi, E.; Guirado, E.; Leach, J.E.; Liu, H.; Trivedi, P. Climate change impacts on plant pathogens, food security and paths forward. Nat. Rev. Microbiol. 2023, 21, 640–656.
Chin, R.; Catal, C.; Kassahun, A. Plant disease detection using drones in precision agriculture. Precis. Agric. 2023, 24, 1663–1682.
Havlin, J.; Heiniger, R. Soil fertility management for better crop production. Agronomy 2020, 10, 1349.
Benos, L.; Moysiadis, V.; Kateris, D.; Tagarakis, A.C.; Busato, P.; Pearson, S.; Bochtis, D. Human–robot interaction in agriculture: A systematic review. Sensors 2023, 23, 6776.
Zhao, S.; Liu, J.; Bai, Z.; Hu, C.; Jin, Y. Crop pest recognition in real agricultural environment using convolutional neural networks by a parallel attention mechanism. Front. Plant Sci. 2022, 13, 839572.
Zhang, W.; Chen, G.; Zhuang, P.; Zhao, W.; Zhou, L. CATNet: Cascaded attention transformer network for marine species image classification. Expert Syst. Appl. 2024, 256, 124932.
Saqib, M.A.; Aqib, M.; Tahir, M.N.; Hafeez, Y. Towards deep learning based smart farming for intelligent weeds management in crops. Front. Plant Sci. 2023, 14, 1211235.
Ghanei Ghooshkhaneh, N.; Mollazade, K. Optical techniques for fungal disease detection in citrus fruit: A review. Food Bioprocess Technol. 2023, 16, 1668–1689.
Ngugi, H.N.; Akinyelu, A.A.; Ezugwu, A.E. Machine Learning and Deep Learning for Crop Disease Diagnosis: Performance Analysis and Review. Agronomy 2024, 14, 3001.
Zhang, W.; Li, Z.; Li, G.; Zhuang, P.; Hou, G.; Zhang, Q.; Li, C. GACNet: Generate adversarial-driven cross-aware network for hyperspectral wheat variety identification. IEEE Trans. Geosci. Remote Sens. 2023, 62, 5503314.
Zhang, W.; Li, Z.; Sun, H.H.; Zhang, Q.; Zhuang, P.; Li, C. SSTNet: Spatial, spectral, and texture aware attention network using hyperspectral image for corn variety identification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 5514205.
Folorunso, O.; Ojo, O.; Busari, M.; Adebayo, M.; Joshua, A.; Folorunso, D.; Ugwunna, C.O.; Olabanjo, O.; Olabanjo, O. Exploring machine learning models for soil nutrient properties prediction: A systematic review. Big Data Cogn. Comput. 2023, 7, 113.
Mkhatshwa, J.; Kavu, T.; Daramola, O. Analysing the performance and interpretability of CNN-based architectures for plant nutrient deficiency identification. Computation 2024, 12, 113.
Glória, A.; Cardoso, J.; Sebastião, P. Sustainable irrigation system for farming supported by machine learning and real-time sensor data. Sensors 2021, 21, 3079.
Oliveira, R.C.d.; Silva, R.D.d.S.e. Artificial intelligence in agriculture: Benefits, challenges, and trends. Appl. Sci. 2023, 13, 7405.
Unigarro, C.; Florez, H. RGB Image Reconstruction for Precision Agriculture: A Systematic Literature Review. In Proceedings of the International Conference on Applied Informatics, Vina del Mar, Chile, 24–26 October 2024; Springer: Cham, Switzerland, 2024; pp. 211–227.
Rodríguez, Y.; Huérfano, A.; Yepes-Calderon, F.; McComb, J.G.; Florez, H. Cerebrospinal Fluid Containers Navigator. A Systematic Literature Review. In Proceedings of the International Conference on Computational Science and Its Applications, Malaga, Spain, 4–7 July 2022; Springer: Cham, Switzerland, 2022; pp. 340–351.
Gupta, R.; Kaur, M.; Garg, N.; Shankar, H.; Ahmed, S. Lemon Diseases Detection and Classification using Hybrid CNN-SVM Model. In Proceedings of the 2023 Third International Conference on Secure Cyber Computing and Communication (ICSCCC), Jalandhar, India, 26–28 May 2023; pp. 326–331.
Govindharaj, I.; Thapliyal, N.; Manwal, M.; Kukreja, V.; Sharma, R. Enhancing Mango Quality Evaluation: Utilizing an MLP Model for Five-Class Severity Grading. In Proceedings of the 2024 International Conference on Innovations and Challenges in Emerging Technologies (ICICET), Nagpur, India, 7–8 June 2024; pp. 1–4.
Ghodeswar, U.; Puri, C.; Shingade, S.; Waware, T.; Ladhe, A.; Durge, T. Sorting of Fresh and Damaged Apple Fruits using Machine Learning Approach. In Proceedings of the 2024 5th International Conference for Emerging Technology (INCET), Belgaum, India, 24–26 May 2024; pp. 1–4.
Dakwala, K.; Shelke, V.; Bhagwat, P.; Bagade, A.M. Evaluating performances of various CNN architectures for multi-class classification of rotten fruits. In Proceedings of the 2022 Sardar Patel International Conference on Industry 4.0-Nascent Technologies and Sustainability for ‘Make in India’ Initiative, Mumbai, India, 22–23 December 2022; pp. 1–4.
Ahmed, I.; Yadav, P.K. Predicting Apple Plant Diseases in Orchards Using Machine Learning and Deep Learning Algorithms. SN Comput. Sci. 2024, 5, 700.
Admass, W.S.; Munaye, Y.Y.; Bogale, G.A. Convolutional neural networks and histogram-oriented gradients: A hybrid approach for automatic mango disease detection and classification. Int. J. Inf. Technol. 2024, 16, 817–829.
Bezabh, Y.A.; Ayalew, A.M.; Abuhayi, B.M.; Demlie, T.N.; Awoke, E.A.; Mengistu, T.E. Classification of mango disease using ensemble convolutional neural network. Smart Agric. Technol. 2024, 8, 100476.
Bhavya, K.; Raja, S.P. Fruit quality prediction using deep learning strategies for agriculture. Int. J. Intell. Syst. Appl. Eng. 2023, 11, 301–310.
Singh, S.; Gupta, I.; Gupta, S.; Koundal, D.; Aljahdali, S.; Mahajan, S.; Pandit, A.K. Deep learning based automated detection of diseases from Apple leaf images. Comput. Mater. Contin. 2022, 71, 1849–1866.
Shi, H.; Wang, Z.; Peng, H.; Jiang, J. Application Research of Non-destructive Detection of Apple Sugar Content Based on Convolution Neural Network. In Proceedings of the 2023 5th International Conference on Electronics and Communication, Network and Computer Technology (ECNCT), Guangzhou, China, 18–20 August 2023; pp. 168–171.
Jayaweera, S.; Sewwandi, P.; Tharaka, D.; Pallewatta, P.; Halloluwa, T.; Wickramasinghe, M.; Karunanayaka, K.; Arachchi, S.M. MangoDB-A TJC Mango Dataset for Deep-Learning-Based on Classification and Detection in Precision Agriculture. In Proceedings of the 2024 4th International Conference on Advanced Research in Computing (ICARC), Belihuloya, Sri Lanka, 21–24 February 2024; pp. 115–120.
Kona, M.S.R.; Guvvala, A.; Eedara, V.V.L.; Gowri, M.S.; Aluri, V. Mango Fruit Defect Detection Using MobileNetV2. In Proceedings of the 2024 International Conference on Emerging Innovations and Advanced Computing (INNOCOMP), Sonipat, India, 25–26 May 2024; pp. 22–27.
Li, W.; Zhu, X.; Yu, X.; Li, M.; Tang, X.; Zhang, J.; Xue, Y.; Zhang, C.; Jiang, Y. Inversion of nitrogen concentration in apple canopy based on UAV hyperspectral images. Sensors 2022, 22, 3503.
Peng, W.; Ren, Z.; Wu, J.; Xiong, C.; Liu, L.; Sun, B.; Liang, G.; Zhou, M. Qualitative and Quantitative Assessments of Apple Quality Using Vis Spectroscopy Combined with Improved Particle-Swarm-Optimized Neural Networks. Foods 2023, 12, 1991.
Kumari, N.; Dwivedi, R.K.; Bhatt, A.K.; Belwal, R. Automated fruit grading using optimal feature selection and hybrid classification by self-adaptive chicken swarm optimization: Grading of mango. Neural Comput. Appl. 2022, 34, 1285–1306.
Ashok, V.; Bharathi, R.; Shivakumara, P. Building a Medium Scale Dataset for Non-destructive Disease Classification in Mango Fruits Using Machine Learning and Deep Learning Models. Int. J. Image Graph. Signal Process. 2023, 15, 83–95.
Guo, Z.; Zou, Y.; Sun, C.; Jayan, H.; Jiang, S.; El-Seedi, H.R.; Zou, X. Nondestructive determination of edible quality and watercore degree of apples by portable Vis/NIR transmittance system combined with CARS-CNN. J. Food Meas. Charact. 2024, 18, 1–16.
Lian, J.; Ma, L.; Wu, X.; Zhu, T.; Liu, Q.; Sun, Y.; Mei, Z.; Ning, J.; Ye, H.; Hui, G.; et al. Visualized pattern recognition optimization for apple mechanical damage by laser relaxation spectroscopy. Int. J. Food Prop. 2023, 26, 1566–1578.
Dhiman, B.; Kumar, Y.; Hu, Y.C. A general purpose multi-fruit system for assessing the quality of fruits with the application of recurrent neural network. Soft Comput. 2021, 25, 9255–9272.
Francis, J.; Disney, M.; Law, S. Monitoring canopy quality and improving equitable outcomes of urban tree planting using LiDAR and machine learning. Urban For. Urban Green. 2023, 89, 128115.
Neyns, R.; Efthymiadis, K.; Libin, P.; Canters, F. Fusion of multi-temporal PlanetScope data and very high-resolution aerial imagery for urban tree species mapping. Urban For. Urban Green. 2024, 99, 128410.
Rayed, M.E.; Akib, A.A.; Alfaz, N.; Niha, S.I.; Islam, S.S. A vision transformer-based approach for recognizing seven prevalent mango leaf diseases. In Proceedings of the 2023 26th International Conference on Computer and Information Technology (ICCIT), Cox’s Bazar, Bangladesh, 13–15 December 2023; pp. 1–6.
Mir, T.A.; Gupta, S.; Malhotra, S.; Devliyal, S.; Banerjee, D.; Chythanya, K.R. Hybrid CNN-SVM System for Multiclass Detection of Apple Leaf Diseases. In Proceedings of the 2024 IEEE International Conference on Information Technology, Electronics and Intelligent Communication Systems (ICITEICS), Bangalore, India, 28–29 June 2024; pp. 1–6.
Huang, Y.; Wang, J.; Li, N.; Yang, J.; Ren, Z. Predicting soluble solids content in “Fuji” apples of different ripening stages based on multiple information fusion. Pattern Recognit. Lett. 2021, 151, 76–84.
Liu, Y.; Jiang, L.; Kong, L.; Xiang, Q.; Liu, X.; Chen, G. Wi-Fruit: See through fruits with smart devices. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2021, 5, 1–29.
Watnakornbuncha, D.; Am-Dee, N.; Sangsongfa, A. Adaptive Deep Learning with Optimization Hybrid Convolutional Neural Network and Recurrent Neural Network for Prediction Lemon Fruit Ripeness. Prz. Elektrotech. 2024, 2024, 202–211.
Magro, R.B.; Alves, S.A.M.; Gebler, L. Computational models in Precision Fruit Growing: Reviewing the impact of temporal variability on perennial crop yield assessment. SN Comput. Sci. 2023, 4, 554.
Awotunde, J.B.; Misra, S.; Obagwu, D.; Florez, H. Multiple colour detection of RGB images using machine learning algorithm. In Proceedings of the International Conference on Applied Informatics, Arequipa, Peru, 27–29 October 2022; Springer: Cham, Switzerland, 2022; pp. 60–74.
Kumar, S.; Shwetank, S.; Jain, K. Development of spectral signature of land cover and feature extraction using artificial neural network model. In Proceedings of the 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), Greater Noida, India, 19–20 February 2021; pp. 113–118.
Farjon, G.; Huijun, L.; Edan, Y. Deep-learning-based counting methods, datasets, and applications in agriculture: A review. Precis. Agric. 2023, 24, 1683–1711.
Knott, M.; Perez-Cruz, F.; Defraeye, T. Facilitated machine learning for image-based fruit quality assessment. J. Food Eng. 2023, 345, 111401.
Xiao, B.; Nguyen, M.; Yan, W.Q. Fruit ripeness identification using transformers. Appl. Intell. 2023, 53, 22488–22499.
Das, D.H.; Dey Roy, S.; Dey, S.; Saha, P.; Bhowmik, M.K. A novel self-attention guided deep neural network for bruise segmentation using infrared imaging. Innov. Syst. Softw. Eng. 2024, 20, 1–9.
Afsar, M.M.; Bakhshi, A.D.; Hussain, E.; Iqbal, J. A deep learning-based framework for object recognition in ecological environments with dense focal loss and occlusion. Neural Comput. Appl. 2024, 36, 9591–9604.
Mirbod, O.; Choi, D.; Heinemann, P.H.; Marini, R.P.; He, L. On-tree apple fruit size estimation using stereo vision with deep learning-based occlusion handling. Biosyst. Eng. 2023, 226, 27–42.
Bongulwar, D.M.; Singh, V.P.; Talbar, S. Evaluation of CNN based on Hyperparameters to Detect the Quality of Apples. Int. J. Eng. Trends Technol. 2022, 70, 232–246.
Huang, Y.; Ren, Z.; Li, D.; Liu, X. Phenotypic techniques and applications in fruit trees: A review. Plant Methods 2020, 16, 2–22.
Sadhana, T.; RJ, A.K.; Bhavani, S.; BN, S.K. Fruit Quality Identification Using Deep Learning Techniques. In Proceedings of the 2022 Fourth International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT), Mandya, India, 26–27 December 2022; pp. 1–4.
Goel, D.; Singh, D.; Gupta, A.; Yadav, S.P.; Sharma, M. An Efficient Approach For To Predict The Quality Of Apple Through Its Appearance. In Proceedings of the 2023 International Conference on Computer, Electronics & Electrical Engineering & Their Applications (IC2E3), Srinagar Garhwal, India, 8–9 June 2023; pp. 1–6.
Sharma, G.; Singh, A.; Jain, S. Hybrid deep learning techniques for estimation of daily crop evapotranspiration using limited climate data. Comput. Electron. Agric. 2022, 202, 107338.
Sundaram, K.M.; Shankar, T.; Reddy, N.S. An efficient fruit quality monitoring and classification using convolutional neural network and fuzzy system. Int. J. Eng. Syst. Model. Simul. 2024, 15, 20–26.
Karthikeyan, M.; Subashini, T.; Srinivasan, R.; Santhanakrishnan, C.; Ahilan, A. YOLOAPPLE: Augment Yolov3 deep learning algorithm for apple fruit quality detection. Signal Image Video Process. 2024, 18, 119–128.
Chandak, M.; Rawat, S. Hyperspectral Imaging Technique to Analyse Fruit Quality Using Deep Learning: Apple Perspective. Int. J. Intell. Syst. Appl. Eng. 2024, 12, 114–123. Available online: https://ijisae.org/index.php/IJISAE/article/view/4797 (accessed on 5 May 2024).
Zhao, M.; You, Z.; Chen, H.; Wang, X.; Ying, Y.; Wang, Y. Integrated Fruit Ripeness Assessment System Based on an Artificial Olfactory Sensor and Deep Learning. Foods 2024, 13, 793.
Sannidhan, M.; Martis, J.E.; Suhas, M.; Sunil Kumar Aithal, S. Predicting Citrus Limon Maturity with Precision Using Transfer Learning. In Proceedings of the 2023 International Conference on Recent Advances in Information Technology for Sustainable Development (ICRAIS), Manipal, India, 6–7 November 2023; pp. 182–187.
Arivalagan, D.; Nikitha, P.; Manoj, G.; Jeevanantham, C.; Vignesh, O. Intelligent Fruit Quality Assessment Using CNN Transfer Learning Techniques. In Proceedings of the 2024 International Conference on Distributed Computing and Optimization Techniques (ICDCOT), Bengaluru, India, 15–16 March 2024; pp. 1–6.
Hasanzadeh, B.; Abbaspour-Gilandeh, Y.; Soltani-Nazarloo, A.; Hernández-Hernández, M.; Gallardo-Bernal, I.; Hernández-Hernández, J.L. Non-destructive detection of fruit quality parameters using hyperspectral imaging, multiple regression analysis and artificial intelligence. Horticulturae 2022, 8, 598.
Kaur, A.; Sharma, R.; Thapliyal, N.; Aeri, M. Improving Mango Quality Assessment: A Multi-Layer Perceptron Approach for Grading. In Proceedings of the 2024 2nd World Conference on Communication & Computing (WCONF), Raipur, India, 12–14 July 2024; pp. 1–4.
Meléndez, A.S.; Burry, L.S.; Palacio, P.I.; Trivi, M.E.; Quesada, M.N.; Freire, V.Z.; D’Antoni, H. Ecosystems dynamics and environmental management: An NDVI reconstruction model for El Alto-Ancasti mountain range (Catamarca, Argentina) from 442 AD through 1980 AD. Quat. Sci. Rev. 2024, 324, 108450.
Zhang, C.; Valente, J.; Kooistra, L.; Guo, L.; Wang, W. Orchard management with small unmanned aerial vehicles: A survey of sensing and analysis approaches. Precis. Agric. 2021, 22, 2007–2052.
Pan, L.; Wu, W.; Hu, Z.; Li, H.; Zhang, M.; Zhao, J. Updating apple Vis-NIR spectral ripeness classification model based on deep learning and multi-seasonal database. Biosyst. Eng. 2024, 245, 164–176.
Tan, W.K.; Husin, Z.; Yasruddin, M.L.; Ismail, M.A.H. Development of a non-destructive fruit quality assessment utilizing odour sensing, expert vision and deep learning algorithm. Neural Comput. Appl. 2024, 36, 19613–19641.

Figure 1. A summary of this study. The problem is described first, and a literature review is then used to identify and analyze the solutions.

Figure 2. The general results. In this figure, the taxonomy terms identified in the selected literature are mapped to each research question.

Figure 3. The types of crops in the results. We identified the type of crop used in the papers. The main crops are coffee, mango, and others.

Table 1. Database queries.

Database	Year Interval	Query
Scopus	2019–2024	TITLE-ABS-KEY (“neural networks”) AND ((TITLE-ABS-KEY (“fruit ripeness” OR “harvest prediction” OR “crop maturity detection” OR “fruit quality”)) OR (TITLE-ABS-KEY(“NDVI” OR “EVI” OR “SAVI”))) AND TITLE-ABS-KEY (“apple” OR “coffee” OR “lemon” OR “mango”) AND PUBYEAR > 2019 AND PUBYEAR < 2025 AND DOCTYPE (ar) AND (LIMIT-TO (SUBJAREA,“AGRI”) OR LIMIT-TO (SUBJAREA,“COMP”) OR LIMIT-TO (SUBJAREA,“ENGI”))
Science Direct	2019–2024	(“neural networks” AND ((“fruit ripeness” OR “harvest prediction” OR “crop maturity detection” OR “fruit quality”) OR (“NDVI” OR “EVI” OR “SAVI”)) Year: 2019–2026)
Springer Link	2019–2024	“neural networks” AND (“fruit ripeness” OR “harvest prediction” OR “crop maturity detection” OR “fruit quality”) AND (“NDVI” OR “EVI” OR “SAVI”) AND (“apple” OR “coffee” OR “lemon” OR “mango”)
IEEE Xplore	2019–2024	((“neural networks”) AND ((“fruit ripeness” OR “harvest prediction” OR “crop maturity detection” OR “fruit quality”) OR (“NDVI” OR “EVI” OR “SAVI”)) AND (“apple” OR “coffee” OR “lemon” OR “mango”))

Table 2. Selected articles.

Database	Retrieved	Selected
Scopus	52	30
IEEE Xplore	37	19
Science Direct	56	6
Springer Link	11	10
Total	156	65

Table 3. Study design modules.

Module	Basic Type	Definition
Training data	Input	This module focuses on collecting and preparing the data used to train the ANN. It includes labeled datasets of images representing crops and their respective states (e.g., maturity, quality).
Neural network architecture	Processing	This module defines the design and structure of the neural networks, including the number of layers, activation functions, and connection strategies tailored for agricultural image processing.
Image preprocessing	Input	This module involves preparing the images for neural network input, such as resizing, filtering, and augmentation to enhance learning and reduce noise.
Vegetation indices	Feature extraction	This module computes indices like NDVI, EVI, and SAVI from images to enhance the representation of crop health and maturity levels.
Hardware and devices	Acquisition	This module discusses the tools and devices (e.g., drones, cameras, satellites) used for capturing the images required for the analysis.
Evaluation metrics	Output	This module focuses on defining the metrics used to evaluate the performance of the neural networks, such as accuracy, precision, recall, and F1-score.
Prediction and monitoring	Output	This module focuses on how the trained models are used to predict crop maturity, detect diseases, or estimate yield, providing actionable insights for farmers.
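
The vegetation indices module in Table 3 names NDVI, EVI, and SAVI. As a minimal sketch, assuming per-pixel reflectance values in [0, 1] for the near-infrared, red, and blue bands, the standard formulas for these indices could be written as below. The function names and sample reflectances are illustrative and are not taken from the reviewed studies; the coefficients are the commonly used defaults (soil factor 0.5 for SAVI; 2.5, 6, 7.5, and 1 for EVI).

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    # Guard against a zero denominator (e.g., fully dark pixels).
    return 0.0 if denom == 0 else (nir - red) / denom


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil-Adjusted Vegetation Index with soil brightness factor L (assumed 0.5)."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced Vegetation Index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


if __name__ == "__main__":
    # Hypothetical reflectances for a single healthy-canopy pixel.
    nir, red, blue = 0.5, 0.1, 0.05
    print(round(ndvi(nir, red), 4))
    print(round(savi(nir, red), 4))
    print(round(evi(nir, red, blue), 4))
```

In practice these functions would be applied band-wise over whole images, which requires a sensor that records a near-infrared channel in addition to RGB.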

Table 4. Quality criteria per module.

Module	Quality Criteria
Training data	VC1. The dataset includes labeled images relevant to the selected crops (mango, apple, lemon, or coffee). VC2. The dataset covers diverse environmental conditions to ensure model generalizability. VC3. The data are balanced to avoid bias in training the neural networks.
Neural network architecture	VC1. The architecture is explicitly described, including the number of layers, activation functions, and optimization techniques. VC2. The choice of architecture (e.g., CNNs, GANs, transformers) is justified based on the problem being addressed. VC3. The architecture incorporates techniques to mitigate overfitting, such as dropout or regularization.
Image preprocessing	VC1. The preprocessing steps, such as resizing, normalization, and augmentation, are clearly detailed. VC2. The preprocessing techniques align with best practices for neural network training.
Hardware and devices	VC1. The hardware used for image acquisition (e.g., drones, cameras) is specified. VC2. The resolution and quality of the captured images are appropriate for the intended analysis.
Evaluation metrics	VC1. The evaluation process includes both training and validation datasets.
Prediction and monitoring	VC1. The models provide insights for farmers, such as predicting fruit ripeness or detecting crop diseases. VC2. The prediction outputs are validated with real-world data. VC3. The monitoring solutions are scalable and practical for agricultural use.

Table 5. Image processing workflow in CNN for fruit ripeness validation.

Stage	Description
Image Acquisition	Images of fruits are captured using RGB cameras, hyperspectral cameras, drones, or professional cameras.
Preprocessing	Techniques such as resizing, color normalization, rotation, segmentation, and data augmentation are applied to improve image quality.
Feature Extraction	Convolutional layers detect textures, edges, and patterns associated with fruit ripeness. Pretrained models like ResNet, DenseNet, or MobileNet are used for this.
Classification or Prediction	Fully connected layers (Softmax) or algorithms like SVM are used to predict fruit ripeness in categories or continuous values.
Post-Processing and Decision Making	Alerts for harvesting are generated, data are integrated into agricultural databases, and automated harvesting systems are activated.
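
The preprocessing stage in Table 5 lists resizing, normalization, and augmentation. The following sketch illustrates those three steps on a small grayscale image stored as nested lists, using only the Python standard library. The helper names, the nearest-neighbor resizing choice, and the 8-bit pixel range are assumptions made for illustration; the reviewed studies typically rely on image libraries rather than hand-written routines.

```python
from typing import List

Image = List[List[float]]  # grayscale image, row-major


def resize_nearest(img: Image, new_h: int, new_w: int) -> Image:
    """Resize by nearest-neighbor sampling (a simple, assumed choice)."""
    h, w = len(img), len(img[0])
    return [
        [img[r * h // new_h][c * w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def normalize(img: Image, max_value: float = 255.0) -> Image:
    """Scale pixel values to [0, 1], assuming 8-bit input by default."""
    return [[p / max_value for p in row] for row in img]


def hflip(img: Image) -> Image:
    """Horizontal flip, one of the simplest augmentation operations."""
    return [row[::-1] for row in img]


def augment(img: Image) -> List[Image]:
    """Return the original image plus its horizontally flipped copy."""
    return [img, hflip(img)]


if __name__ == "__main__":
    raw = [[0, 255], [255, 0]]  # hypothetical 2x2 input
    prepared = normalize(resize_nearest(raw, 4, 4))
    for sample in augment(prepared):
        print(sample)
```

Augmentation of this kind increases the number of training samples without new image acquisition, which matters when datasets are small.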

Table 6. Metrics used by the authors to evaluate the ANNs. Because each study used a different dataset, performance cannot be compared directly across studies. Our goal was to map the metrics used and the values reported for each metric.

Paper	Type of ANN or Model	Metrics	Values
[32]	BPNN (Backpropagation Neural Network)	$R^{2}$ (Section 3.10.1), RMSE (Section 3.10.2)	0.77, 0.16
[33]	SDSG-PCA-BPNN	Accuracy (Section 3.10.3)	87.88%
[24,25,27,28,34,35,50,54,60]	CNN	Accuracy (Section 3.10.3)	91.52%, 83.17%, 92.00%, 99.00%, 99.5%, 90.00%, 94.79%, 99.20%
[43]	SAE-BPNN	$R^{2}$ (Section 3.10.1), RMSE (Section 3.10.2)	0.5953, 0.8856%
[44]	Light ANN + Visual Fusion	RMSE (Section 3.10.2)	0.319
[50]	Pre-Trained Vision Transformer (ViT)	Accuracy (Section 3.10.3)	95.0%
[66]	Feed-forward neural network (trained with LM)	$R^{2}$ (Section 3.10.1), RMSE (Section 3.10.2)	0.93, 0.03
[67]	MLP (Multilayer Perceptron)	Accuracy (Section 3.10.3)	95.0%
[36]	CARS-CNN	Accuracy (Section 3.10.3)	9.43%
[38]	RNN	Accuracy (Section 3.10.3), Precision (Section 3.10.4), Recall (Section 3.10.4)	98.47%, 98.93%, 75.44%
[55]	CNN-LSTM	Accuracy (Section 3.10.3)	96.08%
[61]	YOLO v3	Precision (Section 3.10.4)	99.13%
[62]	YOLO v5	Precision (Section 3.10.4)	95.00%
[23]	ResNet50	Accuracy (Section 3.10.3)	98.27%
[23]	VGG	Accuracy (Section 3.10.3)	98.41%
[63]	DenseNet	Accuracy (Section 3.10.3)	97.39%
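
The metrics mapped in Table 6 follow standard definitions. As a hedged sketch, the functions below compute accuracy, precision, recall, and F1-score for binary labels, and $R^{2}$ and RMSE for regression outputs, using only the Python standard library. The function names and the toy inputs are illustrative and are not taken from any reviewed study.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical ripe (1) / unripe (0) labels and predictions.
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
    # Hypothetical regression targets, e.g., a measured quality attribute.
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]), rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Reporting several of these metrics together, rather than accuracy alone, makes results easier to interpret when classes are imbalanced.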

Table 7. A comparison of pretrained models and the types of Artificial Neural Networks.

Approach	Advantages	Disadvantages	Level of Complexity
MobileNetV2	Lightweight, fast inference	Lower accuracy than deeper models	Low
ResNet	Handles vanishing gradients well	Computationally expensive	High
VGG	Simple architecture, good accuracy	Large number of parameters, slow	High
YOLO v3	Fast object detection	Lower accuracy for small objects	Medium
YOLO v5	Optimized, lightweight	Requires retraining for custom datasets	Medium
DenseNet	Efficient parameter usage, improved feature reuse	High memory consumption	High
CNN	Effective for image tasks	Requires large datasets	Medium
RNN	Good for sequential data	Struggles with long dependencies	Medium
DPNN	Robust feature extraction	High computational cost	High
DNN	General-purpose, scalable	Requires careful tuning	Medium

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Unigarro, C.; Hernandez, J.; Florez, H. Artificial Neural Networks for Image Processing in Precision Agriculture: A Systematic Literature Review on Mango, Apple, Lemon, and Coffee Crops. Informatics 2025, 12, 46. https://doi.org/10.3390/informatics12020046

