Article
Peer-Review Record

A Survey of Methods for Addressing Imbalance Data Problems in Agriculture Applications

Remote Sens. 2025, 17(3), 454; https://doi.org/10.3390/rs17030454
by Tajul Miftahushudur 1,2, Halil Mertkan Sahin 2, Bruce Grieve 2 and Hujun Yin 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 4 November 2024 / Revised: 23 January 2025 / Accepted: 25 January 2025 / Published: 29 January 2025
(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study aims to survey and analyze methods for mitigating data imbalance in machine learning, with a particular focus on agricultural applications. Data imbalance is of critical importance in agriculture, where minority classes, such as rare crop diseases and pest infestations, often have limited samples but significant implications for decision-making. The research addresses a vital issue in the application of ML to agriculture. However, the following feedback highlights areas that could improve the clarity, coherence, and relevance of the paper:

 

Introduction Section:

  • The authors dedicate an extensive portion of the introduction (lines 16–76) to emphasize the significance of food safety and sustainable agriculture. While these are undoubtedly important topics, they lack a direct causal relationship with the issue of data imbalance. It is recommended that the authors shift their focus towards addressing the underlying reasons for data imbalance in agricultural science and its specific implications for machine learning (ML) development.
  • In lines 55–57, the authors state that 'NDVI is limited by environmental factors.' While NDVI is designed to quantify vegetation greenness, it is influenced by environmental factors such as soil background effects, atmospheric conditions, and vegetation density. Additionally, sensor characteristics (e.g., spatial or spectral resolution) can further exacerbate these limitations. The authors should also clarify the connection between NDVI and data imbalance, as this link is not clearly articulated and would benefit from additional explanation or examples. 
  • Lines 60–69 claim that 'VI do not provide the expected level of accuracy in certain situations, while ML has demonstrated effectiveness.' This comparison may be flawed, as VI and ML serve different roles in agricultural science. VI is typically used as a feature augmentation layer, enhancing input data, whereas ML is an application layer for tasks such as classification or prediction. In most ML and DL models, VI functions as either input data or intermediate features. Comparing VI and ML directly overlooks their complementary relationship in many workflows and would benefit from clearer contextual framing.
  • In lines 79–84, the authors assert that "the cost incurred due to misclassification by ML can be significant" and provide examples to support this claim. However, it should be noted that the cost of misclassification, irrespective of its origin, is inherently significant and is not a disadvantage specific to ML. Additionally, the relevance of this argument to the central theme of data imbalance is unclear and should be better contextualized.  The authors could strengthen their argument by explicitly linking the cost of misclassification to data imbalance. For example, the authors might explain how data imbalance increases the likelihood of misclassification for underrepresented classes and why this is particularly problematic.
  • The example provided in lines 92–96, which assumes an extreme data distribution of 5% healthy and 95% diseased samples, may not effectively support the argument. For the example to be persuasive, the authors should provide credible sources or a strong rationale for why this distribution is appropriate in the context of their argument.

 

Section 2:

  • As a review paper, the current classification of challenges is divided into only two categories. Data imbalance amplifies challenges at every stage of machine learning, from training to evaluation. For example, minority class underrepresentation leads to biased models and unreliable metrics, requiring solutions like cost-sensitive learning, resampling, or specialized algorithms.  Providing a more comprehensive classification of these challenges would enhance understanding of how data imbalance affects various stages of machine learning and support the development of more effective mitigation strategies.

Section 3:

  • Section 4 could be integrated into Section 3.1 for better structural coherence and to avoid redundancy in the discussion.

Section 6:

  • The metrics discussed in this section, such as MCC, recall, F1-score, PR-AUC, the confusion matrix, sensitivity, and specificity, are valuable for evaluating model performance in the presence of data imbalance. However, the review could be further enhanced by including additional metrics specifically relevant to imbalanced datasets, such as G-Mean, balanced accuracy, or cost-sensitive metrics. These metrics provide complementary insights, particularly in cases where class imbalance is severe or where specific trade-offs between false positives and false negatives are critical. Expanding the discussion to include these metrics would offer a more comprehensive evaluation framework and better address the unique challenges posed by data imbalance.

 

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

  1. The authors dedicate an extensive portion of the introduction (lines 16–76) to emphasize the significance of food safety and sustainable agriculture. While these are undoubtedly important topics, they lack a direct causal relationship with the issue of data imbalance. It is recommended that the authors shift their focus towards addressing the underlying reasons for data imbalance in agricultural science and its specific implications for machine learning (ML) development.

Response: Thank you for your valuable feedback. We have expanded paragraph 6 (lines 70–76) to further emphasize the importance of data imbalance and explain the relationship between the causes of data imbalance in agricultural sciences and its implications for the development of machine learning models.

(Lines 80 – 94) One of the most common challenges in developing ML and Deep Learning (DL) models is data imbalance, which occurs when the number of samples in one class is significantly smaller or larger compared to other classes in the dataset [1]. This issue is not confined to a specific field but is also frequently encountered in various domains such as healthcare, finance, and security [2, 3, 4]. In the context of Precision Agriculture (PA), data imbalance becomes more complex due to the unique characteristics of agricultural data, such as the irregularity of events (e.g., pest outbreaks or rare diseases) [5], limited access to data from remote or under-researched regions [6], and seasonal variations that cause certain agricultural phenomena to be infrequent or difficult to capture [7]. As a result, ML models trained on imbalanced datasets tend to be biased toward the majority class, making it difficult to generalise to underrepresented classes. For example, a model primarily trained on healthy plants may fail to detect rare diseases that occur sporadically, thereby reducing the reliability of ML solutions in addressing real-world agricultural challenges. Therefore, understanding and addressing data imbalance is crucial for improving the accuracy, robustness, and reliability of ML models in agricultural applications.

  2. In lines 55–57, the authors state that 'NDVI is limited by environmental factors.' While NDVI is designed to quantify vegetation greenness, it is influenced by environmental factors such as soil background effects, atmospheric conditions, and vegetation density. Additionally, sensor characteristics (e.g., spatial or spectral resolution) can further exacerbate these limitations. The authors should also clarify the connection between NDVI and data imbalance, as this link is not clearly articulated and would benefit from additional explanation or examples.

Response: Thank you for your suggestion. We have revised the statement regarding the limitations of NDVI and the factors that influence it. Additionally, we have added a clearer explanation of the relationship between NDVI (or other vegetation indices) and data imbalance, particularly when they are integrated into the development of machine learning models.

(Lines 49 – 67) One example is the monitoring of plant health conditions using hyperspectral or multispectral imagery, which are processed to derive spectral Vegetation Indices (VIs) as indicators of plant health. One of the most common VIs is the Normalised Difference Vegetation Index (NDVI), which is related to vegetation greenness and can detect changes in plant health [8]. However, the effectiveness of analyses based on NDVI is limited by environmental factors, such as soil background effects, atmospheric conditions, and vegetation density [9, 10]. Soil background effects can influence reflectance [11], while atmospheric conditions [12], such as clouds and dust, degrade the accuracy of measurements.

NDVI values are not always uniform within a scene due to various factors, especially cloud shadows or other occluding objects [13]. Variations in these conditions can lead to differences in reflectance values for each pixel of an object. For example, in bad weather, or when an area is covered by thick clouds that reduce sunlight intensity [14], the plant's reflectance becomes very low, lowering the NDVI value. This can cause analyses to misclassify healthy plants as unhealthy, decreasing accuracy [15, 16]. On the other hand, machine learning-based analysis can provide better accuracy by analysing more complex spectral data and considering feature patterns, even in the presence of environmental changes that disturb the reflectance readings of sensors [17, 18]. In addition, high vegetation density can lead to NDVI saturation [19], and sensor characteristics, such as spatial and spectral resolution, limit the ability to detect variations in vegetation [20].
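To make the mechanism concrete, below is a minimal NumPy sketch of the NDVI computation; the reflectance values are invented for illustration. A purely multiplicative dimming would cancel in the ratio, but additive effects (a sensor noise floor, atmospheric path radiance) do not, which is one source of the intra-class NDVI variance discussed here.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red), bounded to [-1, 1].
    `eps` avoids division by zero on very dark pixels."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

# Illustrative reflectances for the same canopy, sunlit vs. heavily shaded.
# The shaded pixel's red band carries a small additive offset (noise/path
# radiance), so its NDVI drops even though the vegetation is identical.
sunlit = ndvi(np.array([0.50]), np.array([0.08]))   # ~0.72
shaded = ndvi(np.array([0.05]), np.array([0.02]))   # ~0.43
print(sunlit, shaded)
```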

We have also added a new paragraph highlighting that the use of vegetation indices (including NDVI) in training data for machine learning shows promise. However, this approach is susceptible to data imbalance issues, as NDVI values at each pixel are vulnerable to intra-class variance due to inconsistencies in light sources.

(Lines 67 – 72) The combination of various features from VIs, including NDVI, commonly used to assess plant health, in ML, can significantly enhance the performance of health analysis and plant classification [17]. However, the issue of inconsistency in NDVI values can lead to intra-class variance [21, 22]. This phenomenon occurs when an object/class has a variation in NDVI values, potentially creating sub-classes that are unevenly spread within a main class.

  3. Lines 60–69 claim that 'VI do not provide the expected level of accuracy in certain situations, while ML has demonstrated effectiveness.' This comparison may be flawed, as VI and ML serve different roles in agricultural science. VI is typically used as a feature augmentation layer, enhancing input data, whereas ML is an application layer for tasks such as classification or prediction. In most ML and DL models, VI functions as either input data or intermediate features. Comparing VI and ML directly overlooks their complementary relationship in many workflows and would benefit from clearer contextual framing.

Response: Thank you for your feedback. Indeed, these two have different roles. We have revised the paragraph and sentence, and we have also provided examples of the use of vegetation indices, including NDVI, and machine learning, highlighting their complementary roles in precision agriculture applications.

  4. In lines 79–84, the authors assert that "the cost incurred due to misclassification by ML can be significant" and provide examples to support this claim. However, it should be noted that the cost of misclassification, irrespective of its origin, is inherently significant and is not a disadvantage specific to ML. Additionally, the relevance of this argument to the central theme of data imbalance is unclear and should be better contextualized. The authors could strengthen their argument by explicitly linking the cost of misclassification to data imbalance. For example, the authors might explain how data imbalance increases the likelihood of misclassification for underrepresented classes and why this is particularly problematic.

Response: Thank you for your comments and suggestion. We agree. We have revised these sentences and added a couple of sentences outlining the issues caused by data imbalance.

(Lines 95 – 104) In PA, researchers tend to focus more on identifying rare diseases or stress conditions that occur infrequently in crops. In such situations, detecting rare cases is much more critical, as costs arising from misclassification can be highly significant [23]. Data imbalance is one of the factors that exacerbate this issue, as ML models tend to prioritise the majority class to minimise overall errors (global loss). As a result, classes with fewer representations, such as crops with rare diseases, are often overlooked or misclassified. Conversely, if the model predicts healthy plants as infected with a disease, the costs may include unnecessary pesticide purchases and applications. However, if the model fails to detect a rare disease that should have been identified and treated, the consequences can be much more severe: the infection could spread to other areas, increasing the risk to the entire agricultural field.

  5. The example provided in lines 92–96, which assumes an extreme data distribution of 5% healthy and 95% diseased samples, may not effectively support the argument. For the example to be persuasive, the authors should provide credible sources or a strong rationale for why this distribution is appropriate in the context of their argument.

Response: Thank you. We have updated the statement with a more concrete example from a medical diagnosis application, supported by a credible source.

(Lines 117 – 129) For example, when an ML model is trained to classify patients as healthy or unhealthy on a dataset consisting of 10097 (94%) healthy patients and 677 (6%) diagnosed with congenital heart disease (CHD) [24], the reported recall for the healthy class is 100%. This indicates that the model successfully identifies all healthy patients. However, even if the model fails to identify any diseased patients, it will still produce an accuracy score of 94%. This occurs because accuracy, as a metric, is heavily influenced by the majority class (healthy patients in this case) and does not reflect the model's poor performance in detecting the minority class. Therefore, relying solely on accuracy in imbalanced datasets can be misleading and insufficient for evaluating model performance. While the imbalance in this example is extreme, ratios such as 10:100 or 15:100 are also common in agricultural applications. For instance, in [25], healthy potato samples (1430 samples) vastly outnumber diseased ones, particularly those affected by early blight (203 samples) or late blight (251 samples).
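The arithmetic of this example can be reproduced in a few lines; below is a sketch with scikit-learn in which only the class counts (10097 vs. 677) come from [24], while the degenerate majority-class predictor is hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Class counts from the CHD example: 10097 healthy (0), 677 diseased (1).
y_true = np.concatenate([np.zeros(10097, dtype=int), np.ones(677, dtype=int)])

# A degenerate classifier that labels every patient healthy.
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))             # ~0.937: looks good on paper
print(recall_score(y_true, y_pred, pos_label=0))  # 1.0 recall for healthy class
print(recall_score(y_true, y_pred, pos_label=1))  # 0.0: every CHD case missed
print(balanced_accuracy_score(y_true, y_pred))    # 0.5: exposes the failure
```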

  6. As a review paper, the current classification of challenges is divided into only two categories. Data imbalance amplifies challenges at every stage of machine learning, from training to evaluation. For example, minority class underrepresentation leads to biased models and unreliable metrics, requiring solutions like cost-sensitive learning, resampling, or specialized algorithms. Providing a more comprehensive classification of these challenges would enhance understanding of how data imbalance affects various stages of machine learning and support the development of more effective mitigation strategies.

Response: Thank you for the constructive feedback and suggestion. We fully agree that data imbalance affects every stage of machine learning, from data collection to evaluation. We have therefore expanded the classification of challenges into three more comprehensive categories: data collection, model training, and evaluation metric selection, and reviewed corresponding strategies for addressing them.

(Lines 189 – 248) 2.3. Impact of Data Imbalance on the ML Pipeline

Data imbalance presents significant obstacles across various stages of ML pipelines, from data collection to evaluation. To address this, the challenges are classified into three key stages: data collection, model training, and evaluation metric selection.

  • Data Collection:
    Datasets often suffer from unbalanced class distributions, especially when data for minority classes is scarce or difficult to collect due to various factors. One cause is the natural rarity of certain phenomena, such as rare diseases in crops or specific pest infestations that only occur in certain environmental conditions. Limited access to certain populations or regions is also a factor, such as remote farmlands that are difficult to reach for surveys or sampling. Finally, limited technology or resources to capture data in complex or hard-to-reach environments, such as farms in mountainous areas or areas that experience frequent and extreme weather changes, can also exacerbate data imbalance issues.

Another issue faced in data collection is the need to obtain varied data. In agriculture, for example, it is important to collect samples of plant species from different positions, as well as from different stages of plant growth or varying degrees of virus severity. This is crucial to ensure that ML models can generalise well and make accurate predictions across a wide range of real-world conditions. Unfortunately, in practice, obtaining data with sufficient variation is often constrained by limited time, cost, and availability of representative samples.

This challenge can be mitigated by generating or synthesising realistic data using techniques like data augmentation as a pre-processing strategy. Data augmentation methods, such as random cropping, flipping, or colour jittering, can be used to artificially increase the diversity of the dataset without the need for additional data collection [26]. Furthermore, generative models, such as GANs, can be employed to create new, realistic samples that represent underrepresented classes [27]. These synthetic data generation techniques have proven effective in addressing the challenges posed by limited or imbalanced real-world data in fields like agriculture or scenarios with rare events [28, 29].
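As a concrete illustration, below is a minimal augmentation pipeline with torchvision. The operations mirror those named above (random cropping, flipping, colour jittering), but the parameters are illustrative choices, not taken from any cited study.

```python
from torchvision import transforms

# Augmentation pipeline for images of an underrepresented class.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),  # random cropping
    transforms.RandomHorizontalFlip(p=0.5),               # flipping
    transforms.ColorJitter(brightness=0.2, contrast=0.2,  # colour jittering
                           saturation=0.2),
    transforms.ToTensor(),
])

# Repeated application to a single minority-class image (a hypothetical
# PIL image `img`) yields many distinct training samples:
# samples = [augment(img) for _ in range(16)]
```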

  • Model Training:
    During training, bias toward the majority class becomes a significant issue. Consider, for instance, plant disease detection for cassava, where datasets are often imbalanced: the number of healthy cassava plant images far exceeds those of plants infected with disease, causing the model to predominantly predict the majority class (healthy plants). This reduces the model's accuracy in detecting diseases in cassava plants, as it becomes less sensitive to the minority class, which represents diseased plants [30].

Furthermore, overfitting presents another challenge, particularly when models are trained on imbalanced data. In this situation, the model frequently "sees" the majority class during training, leading to a lack of robustness and a tendency to overlook the minority class. As a result, the model overfits, becoming overly tuned to the training data and performing poorly on new, unseen data. In practice, such models struggle to generalise and tend to predict the majority class, neglecting the minority class [31].

In addition to data augmentation, this challenge often requires solutions like cost-sensitive learning, where greater penalties are applied to errors involving the minority class, or moving the decision threshold to a point that better balances predictions between the majority and minority classes, such as fine-tuning the threshold to maximise recall or F1-score for the minority class [32, 33].
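Below is a compact sketch of both remedies using scikit-learn; the synthetic 95:5 dataset and the 1:10 class weights are illustrative assumptions, and in practice the threshold would be tuned on a validation split rather than the test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for, e.g., healthy vs. diseased plants.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Cost-sensitive learning: errors on the minority class are penalised 10x.
clf = LogisticRegression(class_weight={0: 1, 1: 10}).fit(X_tr, y_tr)

# Threshold moving: sweep the decision threshold and keep the one that
# maximises F1 for the minority class, instead of the default 0.5.
probs = clf.predict_proba(X_te)[:, 1]
thresholds = np.linspace(0.05, 0.95, 19)
f1s = [f1_score(y_te, probs >= t) for t in thresholds]
best = thresholds[int(np.argmax(f1s))]
print(f"best threshold = {best:.2f}, minority-class F1 = {max(f1s):.3f}")
```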

  • Evaluation Metric Selection
    When assessing model performance, class imbalance can make common metrics such as accuracy misleading, as a model may achieve high accuracy by predominantly predicting the majority class while failing to correctly classify the minority class. Similarly, precision and recall, when used independently, may not provide a complete picture of model performance. For example, high precision for a minority class might not account for the recall trade-off, where the model misses most of the minority class instances [34]. Alternative metrics such as F1-score, MCC, or G-Mean are often more informative in the context of class imbalance [35].
  7. Section 4 could be integrated into Section 3.1 for better structural coherence and to avoid redundancy in the discussion.

Response: Thank you for the suggestion to integrate Section 4 into Section 3.1. However, we believe it is more appropriate to keep both sections separate for the following reasons:

Section 3.1 focuses on addressing imbalance at the algorithm level, specifically through traditional methods such as cost-sensitive learning, threshold moving, and others. This section deals with established techniques that generally modify standard machine learning algorithms. Section 4, on the other hand, discusses advanced deep learning approaches, such as data augmentation with GANs and Variational Autoencoders (VAEs), which operate at the data level, as well as transfer learning. Keeping the two sections separate maintains a clearer distinction between classical methods and more advanced deep learning techniques.

  8. The metrics discussed in this section, such as MCC, recall, F1-score, PR-AUC, the confusion matrix, sensitivity, and specificity, are valuable for evaluating model performance in the presence of data imbalance. However, the review could be further enhanced by including additional metrics specifically relevant to imbalanced datasets, such as G-Mean, balanced accuracy, or cost-sensitive metrics. These metrics provide complementary insights, particularly in cases where class imbalance is severe or where specific trade-offs between false positives and false negatives are critical. Expanding the discussion to include these metrics would offer a more comprehensive evaluation framework and better address the unique challenges posed by data imbalance.

Response: Thank you for the valuable suggestion. We agree that including additional metrics such as geometric mean (G-Mean), balanced accuracy, and cost-sensitive metrics would provide a more comprehensive evaluation framework. We have added these metrics and explained their relevance in the context of imbalanced datasets.

(Lines 710 - 730) 6.6. Geometric Mean (G-Mean)

G-Mean is a metric that gauges the balance of classification performance between the majority and minority classes and penalises inequalities (see Equation 6). A low G-Mean indicates poor performance on one of the classes, even if the other class is accurately classified [36]. G-Mean is particularly useful when the goal is to maintain a balance between sensitivity and specificity.

\text{G-Mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \qquad (6)

6.7. Balanced Accuracy

Balanced accuracy is the mean of sensitivity and specificity, reflecting the overall accuracy achieved for both the minority and majority classes. When a classifier performs equally well on both classes, this measure is equivalent to traditional accuracy. However, if the classifier achieves a high traditional accuracy as a result of focusing on the majority class, the balanced accuracy will produce a lower score than the traditional accuracy [36]. The balanced accuracy is defined as:

\text{Balanced Accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2} \qquad (7)

As seen in Equation 7, balanced accuracy accounts for performance on both classes, helping to mitigate the bias introduced by imbalanced datasets.

6.8. Cost-Sensitive Metrics

Cost-sensitive metrics are designed to account for the costs associated with classification errors, especially in the context of imbalanced data where errors in the minority class often have a greater impact. Some commonly used cost-sensitive metrics include the weighted F1-score [37], cost matrix (confusion matrix with cost), and cost-sensitive accuracy [38].
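Below is a short sketch computing the metrics of Sections 6.6-6.8 on a toy 9:1 dataset; the predictions and the cost matrix are invented for illustration.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix, f1_score, recall_score)

y_true = np.array([0] * 90 + [1] * 10)                  # 9:1 imbalance
y_pred = np.array([0] * 88 + [1] * 2 + [0] * 7 + [1] * 3)

sens = recall_score(y_true, y_pred, pos_label=1)        # sensitivity (minority)
spec = recall_score(y_true, y_pred, pos_label=0)        # specificity (majority)

g_mean = np.sqrt(sens * spec)                           # Equation (6)
bal_acc = (sens + spec) / 2                             # Equation (7)

print(accuracy_score(y_true, y_pred))                   # 0.91, inflated by majority
print(g_mean, bal_acc, balanced_accuracy_score(y_true, y_pred))
print(f1_score(y_true, y_pred, average="weighted"))     # weighted F1-score

# A cost matrix weights the confusion-matrix entries directly, e.g. making a
# missed minority sample 10x as costly as a false alarm (illustrative costs).
cm = confusion_matrix(y_true, y_pred)                   # rows: true, cols: predicted
costs = np.array([[0, 1], [10, 0]])
print((cm * costs).sum())                               # total misclassification cost
```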

 

REFERENCES

[1] Leevy, J.L.; Khoshgoftaar, T.M.; Bauder, R.A.; Seliya, N. A survey on addressing high-class imbalance in big data. Journal of Big Data 2018. https://doi.org/10.1186/s40537-018-0151-6.

[2] Ojo, M.O.; Zahid, A. Improving Deep Learning Classifiers Performance via Preprocessing and Class Imbalance Approaches in a Plant Disease Detection Pipeline. Agronomy 2023, 13. https://doi.org/10.3390/agronomy13030887.

[3] Walsh, R.; Tardy, M. A Comparison of Techniques for Class Imbalance in Deep Learning Classification of Breast Cancer. Diagnostics 2023, 13. https://doi.org/10.3390/diagnostics13010067.

[4] Cheah, P.C.Y.; Yang, Y.; Lee, B.G. Enhancing Financial Fraud Detection through Addressing Class Imbalance Using Hybrid SMOTE-GAN Techniques. International Journal of Financial Studies 2023, 11. https://doi.org/10.3390/ijfs11030110.

[5] Xiang, Y.; Yao, J.; Yang, Y.; Yao, K.; Wu, C.; Yue, X.; Li, Z.; Ma, M.; Zhang, J.; Gong, G. Real-Time Detection Algorithm for Kiwifruit Canker Based on a Lightweight and Efficient Generative Adversarial Network. Plants 2023, 12. https://doi.org/10.3390/plants12173053.

[6] Pesaresi, S.; Mancini, A.; Casavecchia, S. Recognition and Characterization of Forest Plant Communities through Remote-Sensing NDVI Time Series. Diversity 2020, 12. https://doi.org/10.3390/d12080313.

[7] Mahakosee, S.; Jogloy, S.; Vorasoot, N.; Theerakulpisut, P.; Holbrook, C.C.; Kvien, C.K.; Banterng, P. Seasonal Variation in Canopy Size, Light Penetration and Photosynthesis of Three Cassava Genotypes with Different Canopy Architectures. Agronomy 2020, 10. https://doi.org/10.3390/agronomy10101554.

[8] Thenkabail, P.S.; Lyon, J.G.; Huete, A. Hyperspectral Indices and Image Classifications for Agriculture and Vegetation, 2 ed.; CRC Press, 2019.

[9] Li, C.; Xue, J.; Su, B. Significant Remote Sensing Vegetation Indices: A Review of Developments and Applications. Journal of Sensors 2017, 2017, 1353691. https://doi.org/10.1155/2017/1353691.

[10] Nascimento, S.M.; Amano, K.; Foster, D.H. Spatial distributions of local illumination color in natural scenes. Vision Research 2016. https://doi.org/10.1016/j.visres.2015.07.005.

[11] Prudnikova, E.; Savin, I.; Vindeker, G.; Grubina, P.; Shishkonakova, E.; Sharychev, D. Influence of Soil Background on Spectral Reflectance of Winter Wheat Crop Canopy. Remote Sensing 2019, 11. https://doi.org/10.3390/rs11161932.

[12] Moravec, D.; Komárek, J.; López-Cuervo Medina, S.; Molina, I. Effect of Atmospheric Corrections on NDVI: Intercomparability of Landsat 8, Sentinel-2, and UAV Sensors. Remote Sensing 2021, 13. https://doi.org/10.3390/rs13183550.

[13] Stamford, J.D.; Vialet-Chabrand, S.; Cameron, I.; Lawson, T. Development of an accurate low cost NDVI imaging system for assessing plant health. Plant Methods 2023, 19, 9. https://doi.org/10.1186/s13007-023-00981-8.

[14] Yang, X.; Zuo, X.; Xie, W.; Li, Y.; Guo, S.; Zhang, H. A Correction Method of NDVI Topographic Shadow Effect for Rugged Terrain. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2022, 15, 8456–8472. https://doi.org/10.1109/JSTARS.2022.3193419.

[15] Guo, Y.; Wang, C.; Lei, S.; Yang, J.; Zhao, Y. A Framework of Spatio-Temporal Fusion Algorithm Selection for Landsat NDVI Time Series Construction. ISPRS International Journal of Geo-Information 2020, 9. https://doi.org/10.3390/ijgi9110665.

[16] Dougherty, T.R.; Jain, R.K. Invisible walls: Exploration of microclimate effects on building energy consumption in New York City. Sustainable Cities and Society 2023, 90, 104364. https://doi.org/10.1016/j.scs.2022.104364.

[17] AlSuwaidi, A.; Veys, C.; Hussey, M.; Grieve, B.; Yin, H. Hyperspectral selection based algorithm for plant classification. In Proceedings of the 2016 IEEE International Conference on Imaging Systems and Techniques (IST), 2016, pp. 395–400. https://doi.org/10.1109/IST.2016.7738258.

[18] Ramanath, A.; Muthusrinivasan, S.; Xie, Y.; Shekhar, S.; Ramachandra, B. NDVI Versus CNN Features in Deep Learning for Land Cover Classification of Aerial Images. In Proceedings of the IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, 2019, pp. 6483–6486. https://doi.org/10.1109/IGARSS.2019.8900165.

[19] Zaitunah, A.; Samsuri.; Marbun, Y.M.H.; Susilowati, A.; Elfiati, D.; Syahputra, O.K.H.; Arinah, H.; Rangkuti, A.B.; Rambey, R.; Harahap, M.M.; et al. Vegetation density analysis using normalized difference vegetation index in East Jakarta, Indonesia. IOP Conference Series: Earth and Environmental Science 2021, 912, 012053. https://doi.org/10.1088/1755-1315/912/1/012053.

[20] Franke, J.; Heinzel, V.; Menz, G. Assessment of NDVI- Differences Caused by Sensor Specific Relative Spectral Response Functions. In Proceedings of the 2006 IEEE International Symposium on Geoscience and Remote Sensing, 2006, pp. 1138–1141. https://doi.org/10.1109/IGARSS.2006.294.

[21] Gong, C.; Yin, R.; Long, T.; Jiao, W.; He, G.; Wang, G. Spatial–Temporal Approach and Dataset for Enhancing Cloud Detection in Sentinel-2 Imagery: A Case Study in China. Remote Sensing 2024, 16. https://doi.org/10.3390/rs16060973.

[22] Revel, C.; Deville, Y.; Achard, V.; Briottet, X.; Weber, C. Inertia-Constrained Pixel-by-Pixel Nonnegative Matrix Factorisation: A Hyperspectral Unmixing Method Dealing with Intra-Class Variability. Remote Sensing 2018, 10. https://doi.org/10.3390/rs10111706.

[23] He, J.; Cheng, M.X. Weighting Methods for Rare Event Identification From Imbalanced Datasets. Frontiers in Big Data 2021, 4. https://doi.org/10.3389/fdata.2021.715320.

[24] Kosolwattana, T.; Liu, C.; Hu, R.; Han, S.; Chen, H.; Lin, Y. A self-inspected adaptive SMOTE algorithm (SASMOTE) for highly imbalanced data classification in healthcare. BioData Min 2023, 16, 15.

[25] Hou, C.; Zhuang, J.; Tang, Y.; He, Y.; Miao, A.; Huang, H.; Luo, S. Recognition of early blight and late blight diseases on potato leaves based on graph cut segmentation. Journal of Agriculture and Food Research 2021, 5, 100154. https://doi.org/10.1016/j.jafr.2021.100154.

[26] Khare, O.; Mane, S.; Kulkarni, H.; Barve, N. LeafNST: an improved data augmentation method for classification of plant disease using object-based neural style transfer. Discover Artificial Intelligence 2024, 4, 50. https://doi.org/10.1007/s44163-024-00150-3.

[27] Sauber-Cole, R.; Khoshgoftaar, T.M. The use of generative adversarial networks to alleviate class imbalance in tabular data: a survey. Journal of Big Data 2022, 9, 98. https://doi.org/10.1186/s40537-022-00648-6.

[28] Temraz, M.; Kenny, E.M.; Ruelle, E.; Shalloo, L.; Smyth, B.; Keane, M.T. Handling Climate Change Using Counterfactuals: Using Counterfactuals in Data Augmentation to Predict Crop Growth in an Uncertain Climate Future. In Proceedings of the Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2021, Vol. 12877 LNAI. https://doi.org/10.1007/978-3-030-86957-1_15.

[29] Mirzaei, A.; Bagheri, H.; Khosravi, I. Enhancing Crop Classification Accuracy through Synthetic SAR-Optical Data Generation Using Deep Learning. ISPRS International Journal of Geo-Information 2023, 12. https://doi.org/10.3390/ijgi12110450.

[30] Sambasivam, G.; Opiyo, G.D. A predictive machine learning application in agriculture: Cassava disease detection and classification with imbalanced dataset using convolutional neural networks. Egyptian Informatics Journal 2021, 22, 27–34. https://doi.org/10.1016/j.eij.2020.02.007.

[31] Vaidya, H.; Prasad, K.; Rajashekhar, C.; Tripathi, D.; S, R.; Shetty, J.; Swamy, K.; Y, S. A class imbalance aware hybrid model for accurate rice variety classification. International Journal of Cognitive Computing in Engineering 2025, 6, 170–182. https://doi.org/10.1016/j.ijcce.2024.12.004.

[32] Prexawanprasut, T.; Banditwattanawong, T. Improving Minority Class Recall through a Novel Cluster-Based Oversampling Technique. Informatics 2024, 11. https://doi.org/10.3390/informatics11020035.

[33] Provost, F.; Fawcett, T. Robust Classification for Imprecise Environments. Machine Learning 2001, 42, 203–231. https://doi.org/10.1023/A:1007601015854.

[34] Saito, T.; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 2015, 10. https://doi.org/10.1371/journal.pone.0118432.

[35] Williams, C.K.I. The Effect of Class Imbalance on Precision-Recall Curves. Neural Computation 2021, 33, 853–857. https://doi.org/10.1162/neco_a_01362.

[36] Akosa, J.S. Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data. 2017.

[37] Hinojosa Lee, M.C.; Braet, J.; Springael, J. Performance Metrics for Multilabel Emotion Classification: Comparing Micro, Macro, and Weighted F1-Scores. Applied Sciences 2024, 14. https://doi.org/10.3390/app14219863.

[38] Man, X.; Lin, J.; Yang, Y. Stock-UniBERT: A News-based Cost-sensitive Ensemble BERT Model for Stock Trading. In Proceedings of the 2020 IEEE 18th International Conference on Industrial Informatics (INDIN), 2020, Vol. 1, pp. 440–445. https://doi.org/10.1109/INDIN45582.2020.9442147.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The author provides a good perspective on the importance of data for agricultural development and the need for precision agriculture.  In particular, the author mentions the practical application of many examples and tools in the process of monitoring and managing crop nutrients, and also points out the importance of data shortage and data analysis methods for agricultural development.  There are some questions about this paper:

1. What is your solution to the agricultural data imbalance?

2. There have been a number of attempts at an explanation for the imbalance in agricultural data. How do you address the weaknesses of these methods?

3. Please redraw Table 1.

4. In '5. Application in Agriculture', do you need to add growth monitoring and nutritional diagnosis?

5. Explain why future research focuses on the interpretability of DL models?

6. Please standardize the citation format.

Author Response

  1. What is your solution to the agricultural data imbalance?

Response: Thank you for your review and the question. To address agricultural data imbalance, several solutions can be implemented, focusing on both data-related and algorithmic approaches. Here is an outline of potential solutions:

  1. Data Augmentation: One of the most common approaches to mitigating imbalance is to use data augmentation techniques. This can involve generating synthetic data to balance underrepresented classes. Techniques such as image rotation, flipping, scaling, or even using generative models like GANs (Generative Adversarial Networks) can be applied to increase the diversity of the minority class.
  2. Resampling Methods: Oversampling the minority class or undersampling the majority class can help balance the dataset. In the case of oversampling, synthetic samples are generated for the minority class. For undersampling, samples are randomly removed from the majority class. Advanced methods like SMOTE (Synthetic Minority Over-sampling Technique) can be applied to create synthetic data points in the feature space (see the sketch after this list).
  3. Cost-sensitive Learning: This approach adjusts the ML model training to penalize misclassification of minority class samples more heavily. By applying higher weights to the minority class, the model is encouraged to focus on improving the prediction for the underrepresented class, thus mitigating the bias towards the majority class.
  4. Using Ensemble Methods: Techniques like random forests or boosting (e.g., AdaBoost or XGBoost) can be effective when dealing with imbalanced datasets. These methods can focus on the minority class by incorporating weighted voting or by emphasizing the errors made on the minority class during model training.
  5. Evaluation Metrics: To effectively evaluate performance on imbalanced datasets, it is essential to focus on appropriate metrics that consider both classes, such as F1-score, Precision, Recall, and the Area Under the Receiver Operating Characteristic curve (AUC-ROC), instead of relying on accuracy alone, which may be misleading in the case of imbalanced data.
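As a concrete illustration of the resampling and evaluation points above, here is a minimal pipeline with imbalanced-learn; the 95:5 synthetic dataset and the random-forest classifier are illustrative choices.

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Illustrative imbalanced dataset (95:5), standing in for healthy vs. diseased crops.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

print("before:", Counter(y_tr))
X_res, y_res = SMOTE(random_state=42).fit_resample(X_tr, y_tr)  # oversample minority
print("after: ", Counter(y_res))

# Train on the balanced set, but always evaluate on the untouched test split,
# reporting per-class precision/recall rather than accuracy alone.
clf = RandomForestClassifier(random_state=42).fit(X_res, y_res)
print(classification_report(y_te, clf.predict(X_te)))
```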
  2. There have been a number of attempts at an explanation for the imbalance in agricultural data. How do you address the weaknesses of these methods?

Response: Thank you for your thoughtful question. Several conventional resampling approaches as well as deep learning-based methods have been developed to address the imbalance issue, but these approaches still have their weaknesses. Many traditional methods, such as random oversampling or undersampling, tend to be too simplistic and often fail to handle the complexity of agricultural data. For example, oversampling techniques like SMOTE generate data randomly to address imbalance, without validation or supervision from experts. Without proper control, the generated data may be unrealistic and could worsen data quality, making such methods not always effective in improving detection accuracy. Therefore, domain knowledge is needed to ensure that the generated data is relevant and accurate. Additionally, by incorporating domain knowledge, one can use more context-aware sampling techniques, for example by considering specific variables like soil conditions, weather, or crop types.

Secondly, generative models like GANs (Generative Adversarial Networks) have been used for data augmentation, but they can sometimes generate blurry, less diverse or unrepresentative synthetic samples, especially when the training data is imbalanced or not diverse enough. To overcome this, one can fine-tune GANs using a semi-supervised learning approach, where real data labels help guide the model's generation of synthetic data. Additionally, a hybrid approach that combines GANs with traditional techniques like SMOTE or under-sampling can also be effective in training GANs with insufficient training data.

  3. Please redraw Table 1.

Response: Thank you, we have redrawn Table 1.

  4. In '5. Application in Agriculture', do you need to add growth monitoring and nutritional diagnosis?

Response: Thank you for your question. Indeed, while many studies address plant nutrition and growth monitoring using machine learning for agricultural applications, it is difficult to find studies that directly address the issue of imbalanced data in that context. This may be because such research focuses mainly on the accuracy and effectiveness of models in predicting plant nutrition or growth status, without considering the class distribution in the dataset.

However, in some studies that address agricultural data analysis, there is discussion of imbalanced data. For example, in a study discussing the analysis of data variation in plant roots and stems, it was mentioned that data imbalance can be a problem in machine learning. However, these studies did not specifically address the application of data imbalance in nutrient diagnosis or plant growth monitoring.

Therefore, although there are many studies that use machine learning for nutrient analysis and plant growth, studies that specifically address the application of data imbalance handling techniques in these contexts are limited.

  5. Explain why future research focuses on the interpretability of DL models?

Response: Thank you for bringing up this important question.

(Lines 781 - 790) Secondly, future research increasingly focuses on the interpretability of DL models because, while DL models often achieve high performance, their "black-box" nature makes it difficult to understand how they make decisions. This lack of transparency presents several challenges, particularly in high-stakes fields like agriculture. In practical applications, stakeholders (farmers, agronomists, etc.) need to trust the model's recommendations. If a model trained on imbalanced data consistently misclassifies rare plant diseases or crop conditions, good interpretability would allow experts to understand the rationale behind these errors. This transparency helps in adjusting the model or training process to improve performance, making the model more reliable and applicable in real-world scenarios.

  6. Please standardize the citation format.

Response: Thank you for your concern, we are using the LaTeX format provided by the publisher, so the citation style is automatically standardized according to the publisher's guidelines.

 

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

The paper “A survey of methods for addressing imbalance data problems in agriculture applications” focuses on reviewing existing methods for mitigating imbalances in detection, classification, or segmentation methods for agricultural applications. The authors investigate data-level, algorithm-level, and hybrid approaches. They describe the challenges of imbalanced classification in agricultural applications, emphasizing multiclass and intra-class classification. Multiclass classification requires specialized techniques to effectively address imbalances between different classes of crops or diseases, while intra-class classification encounters difficulties in distinguishing subtle variations within a single type of crop that shares similar features. The methods addressing imbalanced data are reviewed based on algorithm-level, data-level, and hybrid-level approaches and are summarized in tables and figures. The paper also discusses generative models for synthetic data generation to address imbalanced datasets. Another chapter is dedicated to applications in agriculture, explaining how this imbalance is mitigated in plant disease detection, soil management and crop type classification. The datasets covered include a variety of data types, from leaves and crops to hyperspectral and SAR data. Regarding soil management, various approaches for soil classification are summarized in a table for better visibility. Crop type classification using machine learning focuses on hyperspectral data due to its ability to characterize plants, distinguish between species, assess plant health and detect the presence of diseases or pests with much higher precision than other datasets. Evaluation metrics such as confusion matrix, Matthews Correlation Coefficient (MCC), precision, recall, sensitivity, specificity and F-score are presented and discussed. The authors acknowledge certain limitations in this survey, including the use of non-public datasets in some studies, especially for soil management and disease detection. The lack of standardized public datasets hinders efforts to compare and validate research findings consistently. The authors make recommendations for future studies, particularly emphasizing the use of public datasets and GANs for generating more complex and representative synthetic data.

For authors: The article provides an extensive review of methods for addressing data imbalance, covering a wide range of techniques, including data-level, algorithm-level, and hybrid approaches. This categorization makes it accessible to researchers with varying levels of expertise. While the article presents a solid review of methods addressing data imbalance in agricultural applications, it could benefit from several improvements. Although the categorization into data-level, algorithm-level and hybrid approaches is clear, the lack of detailed comparisons between methods leaves readers without a strong understanding of where each approach excels or falls short in specific contexts, such as multiclass or intra-class classification. Additionally, the paper briefly mentions generative models like GANs but does not explore other emerging techniques such as Variational Autoencoders or transfer learning, which could provide broader insights. High-dimensional datasets like hyperspectral and SAR data introduce unique challenges that are not fully addressed such as the need for dimensionality reduction or feature selection. Furthermore, while evaluation metrics such as MCC and F-score are mentioned, there is no critical analysis of their suitability for different imbalance scenarios, leaving a gap in understanding which metrics are most appropriate. The discussion on the importance of public datasets is relevant, but the recommendations are broad; more actionable suggestions, such as proposing collaborative efforts for dataset standardization, would add value. Finally, the review could better contextualize its findings within agricultural needs by discussing how constraints like noisy or incomplete datasets affect the applicability of the methods. These additions would make the article more comprehensive and impactful within its scope as a survey.

Author Response

  1. Although the categorization into data-level, algorithm-level and hybrid approaches is clear, the lack of detailed comparisons between methods leaves readers without a strong understanding of where each approach excels or falls short in specific contexts, such as multiclass or intra-class classification.

Response: Thank you for highlighting this important point. We understand that a detailed comparison between data-level, algorithm-level, and hybrid approaches can provide valuable insights into their respective strengths and weaknesses. To address this, we have expanded Sections 3.1, 3.2, and 3.3 to include a comparative analysis of the three approaches. We highlight specific scenarios where each method performs optimally.

(Lines 357 - 370) The use of resampling depends heavily on the characteristics of the dataset [1]. For example, data-level methods such as SMOTE work effectively when the majority and minority classes are well-separated with a clear decision boundary. However, in datasets with more complex class distributions, where minority class data points lie near or on the decision boundary, ADASYN is preferable as it adapts the sampling distribution to focus on harder-to-classify instances. Similarly, when many minority samples are ambiguous or similar to majority samples, SVM-SMOTE becomes more effective because it generates synthetic data points around the decision boundary. This helps to refine the decision boundary and improve the classification of minority class instances. For noisy datasets, undersampling methods like Edited Nearest Neighbours (ENN) or Tomek Links outperform oversampling techniques by removing noisy instances and preventing noise amplification. Moreover, when the majority class exhibits significant intra-class variation, clustering-based undersampling (e.g., k-means) is beneficial as it minimises the risk of losing critical data.
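This guidance maps directly onto imbalanced-learn's sampler classes. Below is a sketch, with a synthetic noisy 9:1 dataset standing in for real agricultural data; each sampler corresponds to one of the scenarios described above.

```python
from imblearn.over_sampling import ADASYN, SMOTE, SVMSMOTE
from imblearn.under_sampling import EditedNearestNeighbours, TomekLinks
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=3000, weights=[0.9, 0.1],
                           flip_y=0.05,  # inject label noise
                           random_state=0)

# Well-separated classes: plain SMOTE interpolates safely between neighbours.
X1, y1 = SMOTE(random_state=0).fit_resample(X, y)

# Minority points near the decision boundary: ADASYN adapts the sampling
# distribution toward harder-to-classify instances.
X2, y2 = ADASYN(random_state=0).fit_resample(X, y)

# Ambiguous samples close to the majority class: SVM-SMOTE synthesises
# points around the decision boundary.
X3, y3 = SVMSMOTE(random_state=0).fit_resample(X, y)

# Noisy datasets: cleaning undersamplers remove mislabeled or borderline
# majority points instead of amplifying the noise.
X4, y4 = EditedNearestNeighbours().fit_resample(X, y)
X5, y5 = TomekLinks().fit_resample(X, y)
```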

(Lines 312 - 321) These approaches are particularly effective for large datasets with extreme imbalance, where data-level methods like oversampling may incur high computational costs or risk overfitting. Cost-sensitive learning adjusts the loss function to penalise misclassification of minority class samples, making it a robust choice for maintaining class distribution without requiring preprocessing steps. Another algorithm-level method is threshold moving, which adjusts the decision threshold for classification to favour the minority class. This technique is effective when the model outputs probabilistic scores that are well-calibrated. For example, in scenarios where achieving high recall for the minority class is critical (e.g., detecting rare diseases or identifying plant diseases in early stages), moving the threshold to a lower value for the minority class can significantly improve its detection rate.

(Lines 398 - 404) Hybrid methods combine the strengths of data-level and algorithm-level techniques to address their respective limitations. For example, SMOTE-ENN integrates synthetic oversampling with noise-reducing undersampling [2]. This approach compensates for the potential increase in noise associated with oversampling by analysing and removing noisy samples, resulting in a more robust method than SMOTE or ENN individually. However, hybrid methods like SMOTE-ENN are computationally intensive, making them best suited for applications where processing time is not a critical factor.
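A minimal sketch of this combination via imbalanced-learn's SMOTEENN, on the same kind of synthetic data as above:

```python
from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=3000, weights=[0.9, 0.1],
                           flip_y=0.05, random_state=0)

# Hybrid resampling: SMOTE first oversamples the minority class, then ENN
# removes samples whose neighbourhood disagrees with their label, pruning
# the noise that oversampling may have amplified.
X_res, y_res = SMOTEENN(random_state=0).fit_resample(X, y)
```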

  2. Additionally, the paper briefly mentions generative models like GANs but does not explore other emerging techniques such as Variational Autoencoders or transfer learning, which could provide broader insights.

Response: Thank you very much for the valuable comments. We appreciate the suggestions regarding the discussion of recent techniques such as Variational Autoencoders (VAEs) and transfer learning. We fully agree that these two approaches have great potential in addressing class imbalance issues in agricultural data. In the revised version, we have added the use of Variational Autoencoders (VAEs) as an alternative generative method, in addition to GANs, for generating synthetic data.

(Lines 474 - 533) 4.2. Variational Autoencoder (VAE)

Another approach that can be categorized as a generative model is the variational autoencoder (VAE) [3]. The VAE is a development of the traditional autoencoder (AE) network, primarily used for dimensionality reduction. The basic architecture of an autoencoder comprises two main components: an encoder and a decoder. The encoder's function is to convert input data into a condensed, lower-dimensional representation, while the decoder is responsible for reconstructing the data to its original size and shape. Because standard autoencoders compress data into a compact latent vector representation and then reconstruct it, they often produce reconstructed data with limited variation, so this mechanism is not ideal for tasks requiring high variability, such as data generation. The main distinction between a traditional autoencoder and a VAE is that while an autoencoder learns to produce a compressed representation in the encoder, stored in a bottleneck or latent vector, a VAE learns the distribution of the input data. The bottleneck of the autoencoder is thus divided into two separate layers, the mean and the standard deviation of the distribution, from which a latent vector is sampled and then reconstructed by the decoder to obtain the output (see Figure 4 for illustration). This stochastic process enables VAEs not just to learn a fixed representation but to capture the distribution of the data in the latent space, allowing them to generate new, similar data points and making them highly effective for generative tasks.

Figure 4. Variational autoencoder model, a type of generative model that learns to encode input data into a probabilistic latent space. (Modified from [4])
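A minimal PyTorch sketch of the architecture just described: the encoder ends in separate mean and log-variance layers, and the reparameterisation trick keeps sampling differentiable. The fully connected layout and dimensions are illustrative, and the loss assumes inputs scaled to [0, 1].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal fully connected VAE: the encoder outputs a mean and a
    log-variance instead of a single fixed latent vector."""

    def __init__(self, in_dim=784, hidden=256, latent=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)       # mean layer
        self.logvar = nn.Linear(hidden, latent)   # log-variance layer
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: z = mu + sigma * eps keeps the
        # stochastic draw differentiable for backpropagation.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term + KL divergence pulling the latent toward N(0, I).
    rec = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# After training on minority-class samples, new synthetic data comes from
# decoding draws of the prior:
# synthetic = VAE().dec(torch.randn(32, 16))
```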

Recent studies demonstrate various approaches to addressing class imbalance in different types of data using VAEs. Several papers apply VAEs to generate synthetic data that enhances the performance of predictive models, both in regression contexts, such as Imbalanced Regression (IR) [5], and in disease detection, such as COVID-19 detection using chest X-ray images [6]. VAEs have also been used to create synthetic data that improves prediction accuracy by integrating them into graph attention networks for construction management [7]. A technique called contrastive variational autoencoders is also used for oversampling in [8].

Although both generative model approaches are effective in generating new synthetic data, each approach still has its own weaknesses. For example, while GANs can effectively generate high-quality data, they are still susceptible to the problem of mode collapse, where the variation in the generated data is quite limited [9]. On the other hand, VAEs are only effective in generating low-resolution data, as they are more likely to produce blurry data [10].

4.3. Transfer Learning

Transfer learning is a machine learning technique that leverages knowledge from previously learned tasks or domains to enhance performance on new, related tasks or domains [11]. This approach is particularly beneficial when the target task has limited or unbalanced training data, as it enables models to utilise information acquired from other sources [12]. Common methods of transfer learning include feature extraction and fine-tuning [13]. In feature extraction, pre-trained models are employed to extract features from new data, which are then used by other models for classification. Fine-tuning involves adjusting the weights of a pre-trained model on the target dataset, allowing the model to adapt to the specific characteristics of the new data.

One significant advantage of transfer learning in the context of imbalanced data is its ability to reduce the necessity for large training datasets [14]. By utilising models trained on extensive, balanced datasets, transfer learning enables models to recognise important patterns and features, even when data for minority classes is scarce. This capability can enhance the accuracy and generalisation performance of models concerning minority classes.

However, applying transfer learning to imbalanced data presents challenges. If there are substantial differences between the source and target domains, transfer learning may result in negative transfer, where the transferred knowledge is irrelevant or detrimental to model performance [15]. Therefore, ensuring compatibility between source and target domains is crucial. Additionally, incorporating techniques such as class weight adjustment or error-cost sensitive algorithms may be necessary to effectively address data imbalance [16, 17].

In agriculture, transfer learning has been applied to address data imbalance in leaf disease detection. For instance, the Lightweight Federated Transfer Learning (LFTL) framework has been developed to detect and classify leaf diseases [18]. Additionally, the Progressive Loss-Aware Fine-Tuning Stepwise Learning (PLAFTSL) model has been proposed for rice disease detection. This model utilises a fine-tuned ResNet50 architecture with stepwise learning to improve the training efficiency [19].
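A fine-tuning sketch in PyTorch along the lines described above; the backbone choice, class count, and class weights are illustrative assumptions (the weights API requires torchvision >= 0.13).

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from ImageNet weights, freeze the backbone (feature extraction), and
# retrain only a new classification head on the small, imbalanced target set.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False

num_classes = 4  # hypothetical: healthy + 3 disease classes
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new, trainable head

# A class-weighted loss combines transfer learning with cost-sensitive
# training, as suggested above; weights here are illustrative (roughly
# inverse class frequency).
class_weights = torch.tensor([0.3, 1.0, 1.2, 1.5])
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# Later, unfreezing deeper layers with a small learning rate (fine-tuning)
# lets the backbone adapt to the target domain and helps avoid negative
# transfer when source and target domains differ.
```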

  3. High-dimensional datasets like hyperspectral and SAR data introduce unique challenges that are not fully addressed such as the need for dimensionality reduction or feature selection.

Response: Thank you for the provided feedback. In this manuscript, we have discussed the challenges related to high-dimensional data, particularly hyperspectral data (HSI), in Subsection 5.3. We explained that conventional oversampling methods are less effective for HSI data because they may generate overlapping samples. This can potentially distort decision boundaries between classes and reduce the model's accuracy in classifying plant species. To address this issue, we also refer to solutions suggested by various studies, such as the use of feature selection to eliminate irrelevant features before applying oversampling methods like SMOTE.

  4. Furthermore, while evaluation metrics such as MCC and F-score are mentioned, there is no critical analysis of their suitability for different imbalance scenarios, leaving a gap in understanding which metrics are most appropriate.

Response: Thank you for the valuable feedback regarding the use of evaluation metrics such as Matthews Correlation Coefficient (MCC) and F1-score in the context of imbalanced data. We fully agree that selecting the appropriate evaluation metric is crucial for accurately assessing model performance in class imbalance scenarios.

We have added recommendations on when these metrics are best considered:

(Lines 707 - 709) The F1 Score is recommended for use when precision and recall need to be balanced, particularly in situations where the cost of both types of errors is considered equal.

(Lines 714 - 715) G-Mean is particularly useful when the goal is to maintain a balance between sensitivity and specificity.

(Lines 668 - 670)  MCC is effectively used when a comprehensive understanding of model performance is required, particularly in scenarios with extreme class imbalance.
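For concreteness, all three metrics can be computed with scikit-learn (the labels and predictions below are hypothetical):

```python
import numpy as np
from sklearn.metrics import f1_score, matthews_corrcoef, recall_score

y_true = np.array([0] * 94 + [1] * 6)            # hypothetical imbalanced labels
y_pred = np.array([0] * 94 + [1] * 3 + [0] * 3)  # half the minority class recovered

f1 = f1_score(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
sensitivity = recall_score(y_true, y_pred, pos_label=1)
specificity = recall_score(y_true, y_pred, pos_label=0)
g_mean = (sensitivity * specificity) ** 0.5
```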

  1. The discussion on the importance of public datasets is relevant, but the recommendations are broad; more actionable suggestions, such as proposing collaborative efforts for dataset standardization, would add value.

Response: Thank you for the valuable feedback.

(Lines 813 - 817) Additionally, journals may encourage data sharing by offering free open-access publication to researchers who make their datasets publicly available. Collaborative efforts for dataset standardization would add significant value by promoting consistency and improving the usability of shared data.

  1. Finally, the review could better contextualize its findings within agricultural needs by discussing how constraints like noisy or incomplete datasets affect the applicability of the methods. These additions would make the article more comprehensive and impactful within its scope as a survey.

Response: Thank you for the valuable feedback. We have discussed the issue of noisy data in the Resampling Techniques section, particularly under Undersampling Methods. One of the primary objectives of undersampling techniques, such as Edited Nearest Neighbours (ENN) and Tomek Links, is to selectively remove noisy and borderline samples from the dataset. Regarding the issue of incomplete datasets, we have added several paragraphs in the Challenges and Future Directions section.
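Both cleaning methods are available in imbalanced-learn; a brief sketch on synthetic data (shapes and counts are illustrative):

```python
import numpy as np
from imblearn.under_sampling import EditedNearestNeighbours, TomekLinks

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = np.array([0] * 170 + [1] * 30)

# ENN removes samples whose neighbours disagree with their label; Tomek Links
# removes the majority member of each cross-class nearest-neighbour pair.
X_enn, y_enn = EditedNearestNeighbours().fit_resample(X, y)
X_tl, y_tl = TomekLinks().fit_resample(X, y)
```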

(Lines 740 - 743)  Incomplete data, which can arise from factors such as inaccurate sensors, fluctuating weather conditions, or human errors during data collection, further exacerbates this issue by reducing model accuracy and hindering generalization.
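Where such gaps surface as missing values, a simple imputation step is one common mitigation (a sketch with assumed sensor readings):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical temperature/moisture readings with NaN gaps from sensor faults.
X = np.array([[21.5, 0.62],
              [np.nan, 0.58],
              [19.8, np.nan]])
X_filled = SimpleImputer(strategy="mean").fit_transform(X)
```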

REFERENCES

[1] Kraiem, M.S.; Sánchez-Hernández, F.; Moreno-García, M.N. Selecting the Suitable Resampling Strategy for Imbalanced Data Classification Regarding Dataset Properties. An Approach Based on Association Models. Applied Sciences 2021, 11. https://doi.org/10.3390/app11188546.

[2] Batista, G.E.A.P.A.; Prati, R.C.; Monard, M.C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter 2004. https://doi.org/10.1145/1007730.1007735.

[3] Kingma, D.P.; Welling, M. Auto-encoding Variational Bayes. 2014.

[4] Qian, W.; Gechter, F. Variational information bottleneck model for accurate indoor position recognition. 2020. https://doi.org/10.1109/ICPR48806.2021.9412651.

[5] Stocksieker, S.; Pommeret, D.; Charpentier, A. Data Augmentation with Variational Autoencoder for Imbalanced Dataset, 2024, [arXiv:cs.LG/2412.07039].


[6] Chatterjee, S.; Maity, S.; Bhattacharjee, M.; Banerjee, S.; Das, A.K.; Ding, W. Variational Autoencoder Based Imbalanced COVID-19 Detection Using Chest X-Ray Images. New Generation Computing 2023, 41, 25–60. https://doi.org/10.1007/s00354-022-00194-y.

[7] Mostofi, F.; Behzat Tokdemir, O.; Toğan, V. Generating synthetic data with variational autoencoder to address class imbalance of graph attention network prediction model for construction management. Advanced Engineering Informatics 2024, 62, 102606. https://doi.org/10.1016/j.aei.2024.102606.

[8] Dai, W.; Ng, K.; Severson, K.; Huang, W.; Anderson, F.; Stultz, C. Generative Oversampling with a Contrastive Variational Autoencoder. In Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM), 2019, pp. 101–109. https://doi.org/10.1109/ICDM.2019.00020.

[9] Kossale, Y.; Airaj, M.; Darouichi, A. Mode Collapse in Generative Adversarial Networks: An Overview. In Proceedings of the 2022 8th International Conference on Optimization and Applications (ICOA), 2022, pp. 1–6. https://doi.org/10.1109/ICOA55659.2022.9934291.

[10] Naderi, H.; Soleimani, B.H.; Matwin, S. Generating High-Fidelity Images with Disentangled Adversarial VAEs and Structure-Aware Loss. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), 2020, pp. 1–8. https://doi.org/10.1109/IJCNN48605.2020.9207056.

[11] Marques, J.A.L.; Gois, F.N.B.; do Vale Madeiro, J.P.; Li, T.; Fong, S.J. Chapter 4 - Artificial neural network-based approaches for computer-aided disease diagnosis and treatment. In Cognitive and Soft Computing Techniques for the Analysis of Healthcare Data; Bhoi, A.K.; de Albuquerque, V.H.C.; Srinivasu, P.N.; Marques, G., Eds.; Intelligent Data-Centric Systems, Academic Press, 2022; pp. 79–99. https://doi.org/10.1016/B978-0-323-85751-2.00008-6.

[12] Ali, A.H.; Yaseen, M.G.; Aljanabi, M.; Abed, S.A.; GPT, C. Transfer Learning: A New Promising Techniques. Mesopotamian Journal of Big Data 2023. https://doi.org/10.58496/mjbd/2023/004.

[13] Hadhrami, E.A.; Mufti, M.A.; Taha, B.; Werghi, N. Transfer learning with convolutional neural networks for moving target classification with micro-Doppler radar spectrograms. In Proceedings of the 2018 International Conference on Artificial Intelligence and Big Data, ICAIBD 2018, 2018. https://doi.org/10.1109/ICAIBD.2018.8396184.

[14] Liu, T.; Alibhai, S.; Wang, J.; Liu, Q.; He, X.; Wu, C. Exploring Transfer Learning to Reduce Training Overhead of HPC Data in Machine Learning. In Proceedings of the 2019 IEEE International Conference on Networking, Architecture and Storage (NAS), 2019, pp. 1–7. https://doi.org/10.1109/NAS.2019.8834723.

[15] Zhang, W.; Deng, L.; Zhang, L.; Wu, D. A Survey on Negative Transfer. IEEE/CAA Journal of Automatica Sinica 2023, 10, 305–329. https://doi.org/10.1109/JAS.2022.106004.

[16] Lakkapragada, A.; Sleiman, E.; Surabhi, S.; Wall, D.P. Mitigating Negative Transfer in Multi-Task Learning with Exponential Moving Average Loss Weighting Strategies. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023, 2023, Vol. 37.

[17] Zhang, H.; Liu, W.; Yang, H.; Zhou, Y.; Zhu, C.; Zhang, W. CSAL: Cost sensitive active learning for multi-source drifting stream. Knowledge-Based Systems 2023, 277, 110771. https://doi.org/10.1016/j.knosys.2023.110771.

[18] Choubey, S.; Divya. Lightweight Federated Transfer Learning for Plant Leaf Disease Detection and Classification across Multiclient Cross-Silo Datasets. In Proceedings of the BIO Web of Conferences, 2024, Vol. 82. https://doi.org/10.1051/bioconf/20248205018.

[19] Upreti, K.; Singh, P.; Jain, D.; Pandey, A.K.; Gupta, A.; Singh, H.R.; Srivastava, S.K.; Prasad, J.S. Progressive loss-aware fine-tuning stepwise learning with GAN augmentation for rice plant disease detection. Multimedia Tools and Applications 2024, 83, 84565–84588. https://doi.org/10.1007/s11042-024-19255-z.

Author Response File: Author Response.docx

Reviewer 4 Report

Comments and Suggestions for Authors

Thanks for this opportunity to review this survey. My comments are as follows:

1. The abstract needs to be majorly revised as it did not provide a clear statement of the key findings and key conclusions of this survey. In other words, I did not find enough valuable information or new knowledge solely from the current abstract.

2. Line 110-113, the authors claimed that they surveyed the topics of plant disease detection, soil management and monitoring, and plant/crop classification. Indeed, these topics are important in precision agriculture. My question: Is data imbalance occurring in crop/weed discrimination or weed classification/detection/segmentation? If so, it would be better to add related review and analysis within a new subsection in section 5.

3. The survey seems to pay extensive attention to classification in the field of precision agriculture. However, classification is relatively easy and primary task both in computer vision and precision agriculture. More high-level tasks, such as data/image/video detection, segmentation, and even visual object tracking are providing more in-depth and comprehensive information for agricultural stakeholders. More importantly, the imbalance data problem also applies to these agricultural tasks. Therefore, I would highly recommend the authors do a more comprehensive review of these topics to further underscore the meaning and value of this survey.

4. Almost all contents in Section 6 cover the common information both in computer vision and agriculture. This information is both meaningless and low-related to the topic of this survey.

5. For a literature review, a detailed literature search method and results analysis are usually necessary.

6. The limitations of this article need to be clearly stated.

7. There are not enough references closely related to agriculture.

Author Response

Authors’ Responses (in blue) to Reviewer 4’s Questions and Comments (in red)

  1. The abstract needs to be majorly revised as it did not provide a clear statement of the key findings and key conclusions of this survey. In other words, I did not find enough valuable information or new knowledge solely from the current abstract.

Response: Thank you for your concern, we have revised the abstract.

This survey explores recent advances in addressing class imbalance issues in precision agriculture, with a focus on techniques used for plant disease detection, soil management, and crop classification. We examine the impact of class imbalance on agricultural data and evaluate various resampling methods, such as oversampling and undersampling, as well as algorithm-level approaches, to mitigate this challenge. The paper also highlights the importance of evaluation metrics, including F1-score, G-mean, and MCC, in assessing the performance of machine learning models under imbalanced conditions. Additionally, the review provides an in-depth analysis of emerging trends in the use of generative models, like GANs and VAEs, for data augmentation in agricultural applications. Despite the significant progress, challenges such as noisy data, incomplete datasets, and the lack of publicly available datasets remain. This survey concludes with recommendations for future research directions, including the need for robust methods that can handle high-dimensional agricultural data effectively.

  2. Line 110-113, the authors claimed that they surveyed the topics of plant disease detection, soil management and monitoring, and plant/crop classification. Indeed, these topics are important in precision agriculture. My question: Is data imbalance occurring in crop/weed discrimination or weed classification/detection/segmentation? If so, it would be better to add related review and analysis within a new subsection in section 5.

Response: Thank you for your question. Weed detection and segmentation are indeed relevant to the issue of data imbalance, as weed classes often have significantly fewer samples compared to the crop class. We have added such studies on weed detection and segmentation in the revision.

(Lines 622 - 637) 5.4. Weed detection

In weed detection, data imbalance often arises when the number of weed samples is significantly lower than that of healthy crops or background elements. This imbalance can cause models to favour classifying images as healthy crops, leading to reduced accuracy in identifying weeds. Moreover, the small size and scattered distribution of weeds, coupled with their visual similarity to young crop plants, pose additional challenges for accurate detection. Errors in weed detection can result in inefficient pesticide application, increased costs, and potential environmental damage. To address these challenges, various approaches have been explored, including data resampling techniques, the adoption of specialized deep learning architectures like SegNet and U-Net, the use of tailored loss functions such as weighted loss or focal loss, and the application of GANs to increase the number of training samples. Accurate weed detection is crucial for optimizing pesticide use, minimizing environmental impact, and ensuring better crop yields. Table 6 summarizes recent advancements and methodologies employed to tackle data imbalance issues in weed detection within precision agriculture systems.


Table 6. Summary of Various Approaches for Weed Detection with Imbalanced Data.

| Title | Year | Dataset | Techniques |
| --- | --- | --- | --- |
| Shape and style GAN-based multispectral data augmentation for crop/weed segmentation in precision farming [1] | 2024 | Multispectral image of sugar beet dataset | This research utilises two types of GAN, cGAN and DCGAN, for scene augmentation. |
| Synthesizing Training Data for Intelligent Weed Control Systems Using Generative AI [2] | 2024 | Multispectral image of sugar beet dataset | This study employs a generative approach to create synthetic images for training object detection systems for weed control. It combines the Segment Anything Model (SAM) for zero-shot transfer to new domains with an AI-based Stable Diffusion Model to generate synthetic images. |
| Fully convolutional network for rice seedling and weed image segmentation at the seedling stage in paddy fields [3] | 2019 | RGB images of rice seedlings and weeds in paddy fields | The study applies semantic segmentation using SegNet to detect the positions of rice seedlings and weeds in paddy fields. In addition, class weight coefficients are calculated to handle the class imbalance. |
| Weed identification in broomcorn millet field using Segformer semantic segmentation based on multiple loss functions [4] | 2024 | RGB images of broomcorn millet farms with 67% weed coverage | The study uses Segformer. Furthermore, a combination of dice loss and focal loss is applied to address the imbalance between positive and negative samples and to resolve the issue of small-area segmentation due to densely growing weeds. |
| Real-time recognition of sugar beet and weeds in complex backgrounds using multi-channel depth-wise separable convolution model [5] | 2019 | Multispectral image of sugar beet dataset | This study introduces a lightweight convolutional neural network with a codec structure for real-time sugar beet and weed recognition. A weighted loss function addresses pixel imbalances between soil, crops, and weeds. |
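Several of the loss-function remedies in Table 6 share the same idea of down-weighting easy examples; a minimal focal-loss sketch (generic defaults, not the cited studies' exact settings) illustrates it:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    # Binary focal loss for pixel-wise weed/background segmentation:
    # well-classified pixels are down-weighted so rare weed pixels
    # dominate the gradient.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # probability assigned to the true class
    return (alpha * (1.0 - p_t) ** gamma * bce).mean()

# Illustrative call on a fake 64x64 segmentation map with ~10% weed pixels.
logits = torch.randn(1, 1, 64, 64)
targets = (torch.rand(1, 1, 64, 64) > 0.9).float()
loss = focal_loss(logits, targets)
```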


  3. The survey seems to pay extensive attention to classification in the field of precision agriculture. However, classification is relatively easy and primary task both in computer vision and precision agriculture. More high-level tasks, such as data/image/video detection, segmentation, and even visual object tracking are providing more in-depth and comprehensive information for agricultural stakeholders. More importantly, the imbalance data problem also applies to these agricultural tasks. Therefore, I would highly recommend the authors do a more comprehensive review of these topics to further underscore the meaning and value of this survey.

Response: Thank you for your comments. ML-based models are making their way into precision agriculture to complement or enhance simple sensor-based monitoring. The survey focuses on one of the challenges, imbalanced data, as it is critical to training ML models. Indeed, in addition to classification, such problems occur in detection and segmentation tasks. We have emphasized these and added more explanation, comments, and examples of issues associated with data imbalance in such ML models.


  4. Almost all contents in Section 6 cover the common information both in computer vision and agriculture. This information is both meaningless and low-related to the topic of this survey.

Response: Thank you for your concern, but we would like to clarify that Section 6 was intentionally focused on evaluating machine learning techniques in the context of imbalanced datasets, which is a key aspect of our survey. While some of the information may seem common across computer vision and agriculture, it is crucial for general readers to understand how the evaluation metrics, methods, and challenges are applied, especially in agricultural applications where imbalanced datasets are prevalent.

  5. For a literature review, a detailed literature search method and results analysis are usually necessary.

Response: Thank you for your comment. In response to your feedback, we have conducted a comprehensive literature search focusing on papers published within the past few years and indexed in Scopus journals.

  6. The limitations of this article need to be clearly stated.

Response: Thank you for your feedback. Below are the limitations listed in the revised version:

(Lines 800 - 813)  1. Scope of the literature: The review focuses on papers published within the last few years and indexed in Scopus, which means it does not cover older journals or publications that may still provide valuable insights.

  2. Focus on specific techniques: While the review discusses a range of techniques to address class imbalance in agricultural applications, it does not exhaustively compare all of the methods discussed, nor does it explore other potentially effective methods, such as transfer learning, which has gained traction in other applications.
  3. Data limitations: A significant limitation noted in the reviewed studies is the lack of publicly available datasets. Many of the datasets used in the papers referenced are proprietary or have restricted access, which may limit reproducibility.

  7. There are not enough references closely related to agriculture.

Response: Thank you for your comment. In the revised version, we have added additional references closely related to agriculture, particularly in the field of e-agriculture.


REFERENCES

[1] Fawakherji, M.; Suriani, V.; Nardi, D.; Bloisi, D.D. Shape and style GAN-based multispectral data augmentation for crop/weed segmentation in precision farming. Crop Protection 2024, 184, 106848. https://doi.org/10.1016/j.cropro.2024.106848.

[2] Modak, S.; Stein, A. Synthesizing Training Data for Intelligent Weed Control Systems Using Generative AI. In Proceedings of the Architecture of Computing Systems; Fey, D.; Stabernack, B.; Lankes, S.; Pacher, M.; Pionteck, T., Eds., Cham, 2024; pp. 112–126.

[3] Ma, X.; Deng, X.; Qi, L.; Jiang, Y.; Li, H.; Wang, Y.; Xing, X. Fully convolutional network for rice seedling and weed image segmentation at the seedling stage in paddy fields. PLOS ONE 2019, 14, 1–13. https://doi.org/10.1371/journal.pone.0215676.

[4] Bi, Z.; Li, Y.; Guan, J.; Li, J.; Zhang, P.; Zhang, X.; Han, Y.; Wang, L.; Guo, W. Weed identification in broomcorn millet field using Segformer semantic segmentation based on multiple loss functions. Engineering in Agriculture, Environment and Food 2024, 17, 27–36. https://doi.org/10.37221/eaef.17.1_27.

[5] Jun, S.; Wenjun, T.; Xiaohong, W.; Jifeng, S.; Bing, L.; Chunxia, D. Real-time recognition of sugar beet and weeds in complex backgrounds using multi-channel depth-wise separable convolution model. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE) 2019, 35, 184–190. https://doi.org/10.11975/j.issn.1002-6819.2019.12.022.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed most of my comments, but one minor comment:

In the example from lines 109-121, the authors note that in a dataset with 94% healthy patients and only 6% with congenital heart disease (CHD), a classifier that always predicts "healthy" would achieve an accuracy of 94%. While this highlights the limitations of accuracy in imbalanced datasets, it is important to recognize accuracy as one part of a comprehensive evaluation. Paired with metrics like class-wise recall, accuracy provides useful context for model performance. Therefore, dismissing accuracy entirely without considering its role within a broader evaluation framework may be too narrow an approach.

Author Response

Comments:

The authors have addressed most of my comments, but one minor comment:

In the example from lines 109-121, the authors note that in a dataset with 94% healthy patients and only 6% with congenital heart disease (CHD), a classifier that always predicts "healthy" would achieve an accuracy of 94%. While this highlights the limitations of accuracy in imbalanced datasets, it is important to recognize accuracy as one part of a comprehensive evaluation. Paired with metrics like class-wise recall, accuracy provides useful context for model performance. Therefore, dismissing accuracy entirely without considering its role within a broader evaluation framework may be too narrow an approach.


Responses: 

Thank you!

Regarding this minor comment: we fully agree with you. Indeed, accuracy is still a useful and common measure of performance. Although it alone may not give a full picture, as it may be biased towards majority classes, combining it with other measures such as F1 or using class-wise accuracy gives a more comprehensive evaluation. We have revised this paragraph in the Introduction of the manuscript following your suggestion.
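The point is easy to demonstrate numerically; a small sketch with the hypothetical 94/6 split discussed above:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Hypothetical 94% healthy / 6% CHD split and a majority-only classifier.
y_true = np.array([0] * 94 + [1] * 6)
y_pred = np.zeros(100, dtype=int)

print(accuracy_score(y_true, y_pred))             # 0.94: looks strong on its own
print(recall_score(y_true, y_pred, pos_label=1))  # 0.00: minority class entirely missed
print(balanced_accuracy_score(y_true, y_pred))    # 0.50: class-wise view reveals the bias
```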


Reviewer 4 Report

Comments and Suggestions for Authors

Since the author has basically responded to my concerns, I recommend accept for publication.

Author Response

Comments: 

Since the author has basically responded to my concerns, I recommend accept for publication.


Responses:

Thank you!

