MemGanomaly: Memory-Augmented Ganomaly for Frost- and Heat-Damaged Crop Detection

Park, Jun; Park, Sung-Wook; Kim, Yong-Seok; Jung, Se-Hoon; Sim, Chun-Bo

doi:10.3390/app151910503

Open AccessArticle

MemGanomaly: Memory-Augmented Ganomaly for Frost- and Heat-Damaged Crop Detection

by

Jun Park

¹

,

Sung-Wook Park

¹

,

Yong-Seok Kim

²,

Se-Hoon Jung

^3,*

and

Chun-Bo Sim

^1,*

¹

Interdisciplinary Program in IT-Bio Convergence System, Sunchon National University, Suncheon 57922, Republic of Korea

²

Climate Change Assessment Division, Wanju National Institute of Agricultural Sciences, 166, Nongsaengmyeong-ro, Iseo-myeon, Wanju-gun 55365, Republic of Korea

³

Department of Computer Engineering, Sunchon National University, Suncheon 57922, Republic of Korea

^*

Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(19), 10503; https://doi.org/10.3390/app151910503

Submission received: 28 August 2025 / Revised: 24 September 2025 / Accepted: 26 September 2025 / Published: 28 September 2025

(This article belongs to the Special Issue Advanced Agricultural Technologies: Monitoring, Modeling, and Machine Learning Techniques)

Download

Browse Figures

Versions Notes

Abstract

Climate change poses significant challenges to agriculture, leading to increased crop damage owing to extreme weather conditions. Detecting and analyzing such damage is crucial for mitigating its effects on crop yield. This study proposes a novel autoencoder (AE)-based model, termed “Memory Ganomaly,” designed to detect and analyze weather-induced crop damage under conditions of significant class imbalance. The model integrates memory modules into the Ganomaly architecture, thereby enhancing its ability to identify anomalies by focusing on normal (undamaged) states. The proposed model was evaluated using apple and peach datasets, which included both damaged and undamaged images, and was compared with existing robust Convolutional neural network (CNN) models (ResNet-50, EfficientNet-B3, and ResNeXt-50) and AE models (Ganomaly and MemAE). Although these CNN models are not the latest technologies, they are still highly effective for image classification tasks and are deemed suitable for comparative analyses. The results showed that CNN and Transformer baselines achieved very high overall accuracy (94–98%) but completely failed to identify damaged samples, with precision and recall equal to zero under severe class imbalance. Few-shot learning partially alleviated this issue (up to 75.1% recall in the 20-shot setting for the apple dataset) but still lagged behind AE-based approaches in terms of accuracy and precision. In contrast, the proposed Memory Ganomaly delivered a more balanced performance across accuracy, precision, and recall (Apple: 80.32% accuracy, 79.4% precision, 79.1% recall; Peach: 81.06% accuracy, 83.23% precision, 80.3% recall), outperforming AE baselines in precision and recall while maintaining comparable accuracy. This study concludes that the Memory Ganomaly model offers a robust solution for detecting anomalies in agricultural datasets, where data imbalance is prevalent, and suggests its potential for broader applications in agricultural monitoring and beyond. While both Ganomaly and MemAE have shown promise in anomaly detection, they suffer from limitations—Ganomaly often lacks long-term pattern recall, and MemAE may miss contextual cues. Our proposed Memory Ganomaly integrates the strengths of both, leveraging contextual reconstruction with pattern recall to enhance detection of subtle weather-related anomalies under class imbalance.

Keywords:

deep learning; computer vision; autoencoder; anomaly detection; crop damage detection

1. Introduction

All living organisms on Earth are highly sensitive to climate change, and agriculture is one of the industries most significantly affected by climatic variations. Climate change can severely disrupt crop production, leading to reduced yields and, in some cases, complete crop loss [1,2,3]. Over the past few decades, a rapid increase in industrial activity has led to a sharp increase in greenhouse gas emissions, resulting in an increase in global average temperatures. This increase in temperature has amplified the frequency and intensity of extreme weather events, which, in turn, disrupt crop growth cycles and productivity on a global scale.

Efforts to mitigate weather-related crop damage have been pursued across various fields with advancements in machine learning, particularly deep learning, which offer new opportunities for the agricultural sector [4,5,6]. Deep learning technology has already achieved remarkable success in solving complex problems in healthcare, automotive, finance, and robotics, and holds significant potential for applications in agriculture, such as crop disease detection, yield prediction, and growth monitoring [7]. Comprehensive surveys have also summarized the rapid expansion of deep learning applications across diverse domains [8].

However, the collection of data on weather-induced crop damage remains challenging. Unpredictable occurrences, accessibility issues, the need for specialized imaging equipment, and inconsistent shooting conditions present significant hurdles.

To address these challenges, this study proposes a novel AE-based model called “Memory Ganomaly.” The model was designed to accurately identify weather-induced crop damage by extracting critical features from limited data [9]. This research is situated at the intersection of machine learning technology and agricultural meteorology and aims to foster complementary advancements in both fields.

2. Related Work

2.1. Machine Learning-Based Approaches

Traditional machine learning algorithms such as Support Vector Machine and Random Forest have primarily been used to predict crop growth status, physiological status, and detect diseases [10,11,12,13,14,15]. However, these techniques often fail to handle large-scale data and complex patterns, particularly when managing the nonlinearity and high dimensionality of agricultural data.

2.2. Convolutional Neural Networks

CNNs are widely used in agricultural fields to analyze image data. Research has demonstrated the effectiveness of CNNs in classifying crop diseases and monitoring growth stages [16,17,18,19]. However, CNNs require large amounts of labeled data and further improvements to detect anomalies caused by climate variations. In particular, handling the data collected from various perspectives and environments remains challenging.

2.3. Autoencoders

AEs are unsupervised learning techniques used to extract and reconstruct important data features. Studies utilizing AEs have focused on anomaly detection, which is useful for identifying crop damage caused by climate change [20,21,22,23]. However, AEs tend to effectively reconstruct both normal and abnormal patterns, limiting their effectiveness in distinguishing anomalies.

2.4. Generative Adversarial Networks

Generative adversarial networks (GANs) train generative and discriminative models to generate data. Models such as Ganomaly have introduced GAN concepts into AE structures to enhance anomaly detection performance. GAN-based models can learn diverse data distributions, thus increasing the generalizability of anomaly detection [24,25,26,27,28,29]. However, GANs can suffer from training instability, overfitting, and mode collapse, which hinder their effective implementation.

This study proposes a model that integrates the MemAE [30] memory module into the Ganomaly [31] structure to more accurately detect crop damage due to climatic conditions. This approach overcomes the limitations of previous studies by effectively storing and reconstructing the critical features of input data, consequently improving the accuracy and reliability of anomaly detection.

2.5. Transformer-Based Models

Recently, transformer-based architectures such as Vision Transformer (ViT), and self-supervised models like MAE have shown strong generalization in vision tasks under limited data regimes. However, their resource requirements and training complexity make them less feasible in low-resource agricultural deployments. This study instead focuses on enhancing lightweight AE-based approaches with memory integration to address class imbalance and interpretability.

3. Materials and Methods

3.1. Data Preprocessing Module

In this study, a novel deep-learning model was developed to automatically detect and classify weather-induced crop damage. The methodology includes the entire process from data collection to final damage identification. Figure 1 illustrates the steps involved in this study. During the data collection stage, real-time weather data and cropped images were collected using an automatic weather station (AWS). In the data preprocessing stage, these images were optimized to highlight weather-damaged areas. The model architecture is based on an AE consisting of an encoder and a decoder designed to distinguish between normal crop conditions and weather-damaged conditions. The memory module stores significant patterns from these intermediate representations and the decoder uses this information to reconstruct the input data. Through this process, the model accurately identifies damaged areas, and the classifier categorizes normal and abnormal states.

3.2. Data Collection

In this study, apples and peaches were selected as the data categories to develop a model for identifying weather damage to crops. These data were provided by the Rural Development Administration of South Korea. The collection process involved installing an AWS equipped with cameras on farms where apples and peaches were grown, allowing for automatic periodic photography. Peach data were collected in 2019 from the Icheon and Cheongdo regions and in 2020 from the Naju, Namwon, Imsil, Jangseong, and Jeonju regions. Apple data were collected from the Jangsu, Yeongju, Gunwi, Chungju, and Geochang regions in 2019 and 2020. The collected data encompassed the various growth stages of the crops, which were categorized into eight stages: dormancy, bud stage, flowering, post-flowering, early fruit growth, fruit growth, harvest, and nutrient accumulation. This categorization includes the entire process, from plant growth and energy accumulation to dormancy, allowing for a precise analysis of weather damage impacts at each stage. Figure 2 presents example images of apple and peach crops collected at various stages of growth. Each column represents the condition of the crops at a specific point in time, showing seasonal progression from dormancy to harvest. This visualization supports the detection of abnormal patterns by highlighting the temporal variation in crop appearance.

To ensure fair and reproducible evaluation, the collected datasets were divided into training, validation, and test sets. We adopted a 5-fold cross-validation scheme. Stratified sampling was applied with respect to the damaged/undamaged class labels. In addition, samples from different growth stages and regions were evenly distributed across folds to prevent bias caused by temporal or regional concentration. Data augmentation of damaged samples was performed only within the training subset after splitting to avoid any data leakage into the validation or test folds. The process was repeated five times so that each subset served once as the test set, and the reported results were averaged across all folds. To ensure fair evaluation under imbalanced conditions, stratified sampling was applied with respect to the “damaged/undamaged” class labels. In addition, samples from different growth stages and regions were evenly distributed across folds to prevent bias caused by temporal or regional concentration. Data augmentation of damaged samples was performed only within the training subset after splitting to avoid any data leakage into the validation or test folds. Stratified sampling and the fold construction ensured a representative composition across growth stages and regions, while augmentation improved the diversity of the training data.

3.3. Data Preprocessing

Data preprocessing is a critical stage that significantly affects model performance. In this study, the collected images were adjusted to render them suitable for the AE model. The images collected through the AWS contain metadata such as the region name, date, and time of capture, which are unnecessary for analysis and must be removed. Therefore, the top and bottom parts of the images were cropped to remove the region name, date, and time.

In the image-augmentation stage, each image was flipped horizontally and vertically. This increases the diversity of the dataset and allows the model to recognize weather damage to crops from various angles. Other augmentation techniques, such as adding noise or changing colors, were not used because they could introduce unintended confusion and risk distorting the actual characteristics of the weather damage. After this preprocessing, the images were used as inputs for the AE model, providing a foundation for the model to accurately identify crop weather damage.

3.4. Autoencoder Model

The structure of the AE model developed in this study is illustrated in Figure 3. It was designed by integrating the memory module of MemAE and skip connections into the Ganomaly architecture. This integrated structure allows for more precise identification and classification of crop damage caused by weather conditions.

The proposed Memory Ganomaly model employs an encoder–decoder architecture integrated with a memory module. The generator encoder is composed of four convolutional layers (kernel size = 3 × 3, stride = 2), which progressively reduce the input image (128 × 128 × 3) into feature maps of 64 × 64 × 64, 32 × 32 × 128, 16 × 16 × 256, and 8 × 8 × 512. The latent representation is refined by the memory module through pattern retrieval. The generator decoder mirrors the encoder with four deconvolutional layers (stride = 2), reconstructing feature maps of 16 × 16 × 256, 32 × 32 × 128, 64 × 64 × 64, and finally 128 × 128 × 3. Skip connections transfer intermediate feature maps to preserve spatial details during reconstruction. The discriminator consists of four convolutional layers (kernel size = 4 × 4, stride = 2), followed by a fully connected layer that outputs the real/fake score for adversarial training.

The memory module size was determined empirically through validation experiments. Configurations of 128, 256, 512, and 1024 items were tested. A capacity of 512 items provided the best trade-off between anomaly detection performance and computational efficiency. Smaller capacities reduced recall due to limited representation ability, whereas larger ones introduced redundancy and additional computational cost without consistent performance gains. Therefore, 512 memory items were selected for all experiments.

3.4.1. Generator

The generator in the proposed model was designed to transform the input images into a latent-space representation and then reconstruct them. The generator comprises two main components: the generator encoder (

G_{E}

) and generator decoder (

G_{D}

).

Generator Encoder: The generator encoder compresses the input image into a lower-dimensional latent-space vector

z

. This process involved multiple convolutional layers and skip connections. Skip connections help retain important features and fine details from the input image by bypassing certain layers and directly connecting the early to later layers. This architecture ensures that the model preserves the essential characteristics of the input data, which are crucial for accurate reconstruction and anomaly detection.

Latent Space Vector: The compressed latent-space vector

z

passes through a memory module that addresses and retrieves the relevant memory items. This module refines the representation by associating the input with the stored patterns, leading to a more precise latent-space vector

\hat{z}

.

Generator Decoder: The generator decoder takes the concatenated latent-space vectors

z

and

\hat{z}

and reconstructs the input image. The decoder has an asymmetric structure compared to the encoder, which is designed to expand the compressed information back into the original image dimensions. The reconstruction process involved several deconvolutional layers that progressively upscaled the latent representation into a high-dimensional output image.

Reconstruction Process: The goal of the reconstruction process is to generate an output image

\hat{x}

that closely matches the input image

x

. This process captures the detailed characteristics of the input, enabling the model to identify normal and anomalous patterns accurately.

Skip Connections: Skip connections in the encoder and decoder allow the model to transfer detailed information directly, aiding in better reconstruction. These connections are particularly useful for retaining the spatial details that may be lost during the compression process in the encoder.

Memory Module: The memory module in the generator addresses and retrieves memory items based on the latent space vector

z

. This module enhances the representation by integrating the input with the stored patterns, thus reconstruction accuracy. Memory items store significant features learned from the training data, which improves the ability of the model to distinguish between normal and anomalous inputs.

Output: The final output of the generator is a reconstructed image

\hat{x}

, that is used for further processing in the model to determine anomalies.

This generator architecture effectively compresses and reconstructs the input images, allowing accurate anomaly detection by comparing the reconstructed images with the original inputs. The integration of the memory module enhances the model’s ability to handle diverse patterns and improves the reconstruction fidelity, which is crucial for reliable anomaly detection in agricultural images.

3.4.2. Memory Module

The memory module is a crucial component of the proposed model and is designed to store significant patterns from the input data and utilize them during the reconstruction process to enhance the performance of the model. This module performs two key functions, namely pattern storage and pattern retrieval.

As illustrated in Figure 4, the memory module addresses and retrieves memory items based on the latent space vector z. This module enhances the representation by integrating the input with stored patterns, improving reconstruction accuracy. The memory items store significant features learned from the training data, which helps improve the model’s ability to distinguish between normal and anomalous inputs.

Pattern Storage: During the training process, the memory module stores important features of the input data as memory items. Each memory item represents a specific pattern of data, aiding the model in better understanding and reconstructing input data.

Pattern Retrieval: Pattern retrieval is critical for the reconstruction process. The memory module retrieves patterns from memory items similar to the input vector

z

, thereby enhancing the reconstruction accuracy. The pattern-retrieval process involves the following steps:

Similarity Measurement: The memory module measures the similarity between the input vector

z

and the memory items. It selects memory items with patterns that are the most similar to

z

.

Pattern Combination: The selected memory items are combined with

z

to form a more refined latent space vector

\hat{z}

. This combined vector is then passed to the generator decoder to produce the final reconstructed image.

The structure of the memory module involves measuring the similarity between the input vector

z

and the memory items and selecting the memory item with the most similar pattern. The selected memory item is combined with the input vector to generate a sophisticated latent-space vector

\hat{z}

, which is finally transmitted to the generator decoder. Mathematically, the similarity between the input vector z and a memory item

M_{i}

can be defined as follows:

d (z, M i) = ∥ z - M_{i} ∥^{2}

(1)

where

d (z, M_{i})

represents the distance between input vector

z

and memory item

M_{i}

. The selected memory item

M_{s}

was determined as follows:

M_{s} = a r g m i n_{M_{i}} d (z, M_{i})

(2)

The combined latent-space vector

\hat{z}

is expressed as the weighted sum of the input vector

z

and the selected memory item

M_{s}

:

\hat{z} = α z + (1 - α) M_{s}

(3)

where

α

is a parameter that adjusts the weighting. The combined vector

\hat{z}

is then passed to the generator decoder to generate the final reconstructed image.

3.4.3. Encoder

The encoder compresses the input image into a latent space vector. This process involves multiple convolutional layers and nonlinear activation functions that extract essential features from an input image and convert them into low-dimensional vectors. The encoder focuses on minimizing the difference between the reconstructed and original images, allowing the model to better distinguish between normal and anomalous patterns.

The structure of the encoder consists of several convolutional layers and activation functions that gradually compress the input image into a latent space vector. During this process, the essential features of the input image were extracted and stored in a latent-space vector. This compression process plays a crucial role in retaining essential information from the input image while reducing its dimensionality.

3.4.4. Discriminator

The discriminator differentiates between the images generated by the generator and the real images. This is a crucial part of the GAN structure because it guides the generator to produce more realistic images. The discriminator learns the differences between real and generated images and uses this knowledge to determine the authenticity of the input image.

The discriminator consists of several convolutional layers and activation functions that determine whether an input image is real or generated. During this process, the discriminator learns the features of the input image and uses them to distinguish real images from those generated by the generator.

The primary role of the discriminator is to learn the differences between real and generated images. The discriminator plays a key role in enhancing the overall performance of the model by guiding the generator to produce better images. Through this learning process, the discriminator can determine the authenticity of the input image, ensuring that the model can effectively distinguish between normal and anomalous patterns.

For a comprehensive evaluation, we compared the proposed Memory Ganomaly with multiple categories of baseline models, each representing different methodological characteristics. Convolutional neural networks (CNNs), including ResNet-50, EfficientNet-B3, ResNeXt-50, and ConvNeXt-Tiny, are powerful in large-scale supervised classification tasks but tend to overfit the majority class under severe class imbalance. Transformer-based architectures such as Swin Transformer leverage self-attention mechanisms and exhibit strong generalization, yet they require substantial computational resources and large-scale data for training. Few-shot learning models, including Siamese Network, Prototypical Network, and Matching Network, are designed to operate with limited labeled samples and thus address data scarcity, although their performance is often lower than that of anomaly detection frameworks. Autoencoder (AE)-based approaches, such as Ganomaly and MemAE, focus on learning normal patterns and identifying deviations as anomalies. The proposed Memory Ganomaly builds upon these AE-based frameworks by integrating a memory module into Ganomaly, thereby combining contextual reconstruction with long-term pattern recall to improve anomaly detection in imbalanced agricultural datasets.

3.4.5. Hyperparameter Settings

For reproducibility, we specify the key hyperparameters used in the proposed Memory Ganomaly model. The memory module contained 512 items with a dimension of 256. The weighting coefficient

a

for combining the input latent vector with the selected memory item was set to 0.7, based on validation experiments. The loss weights were set as follows: 1.0 for contextual loss, 0.5 for encoder loss, and 0.1 for adversarial loss. These values were empirically chosen to balance reconstruction fidelity and anomaly sensitivity.

3.4.6. Programming Environment

All experiments were implemented in Python 3.7 using TensorFlow 1.15 with CUDA 10.1 and cuDNN 7.6 for GPU acceleration. Supporting libraries included NumPy 1.18.5, Pandas 1.1.5, and Scikit-learn 0.24.2 for data preprocessing and evaluation, while Matplotlib 3.3.4 was used for visualization.

3.5. Training Details

The proposed Memory Ganomaly model was trained using the Adam optimizer with β1 = 0.5 and β2 = 0.999. The initial learning rate was set to 0.0002 and reduced by a factor of 0.5 every 50 epochs. A batch size of 32 was used, and the model was trained for a total of 200 epochs. These training configurations were selected based on commonly adopted settings in GAN-based anomaly detection and were empirically validated to ensure stable convergence and reproducibility. The size of the memory module was also determined empirically through validation experiments. We tested 128, 256, 512, and 1024 memory items and found that 512 provided the best trade-off between anomaly detection performance and computational efficiency. Smaller sizes (128 or 256) reduced recall due to insufficient representation capacity, whereas larger sizes (1024) increased redundancy and computational cost without consistent performance gains. Therefore, 512 memory items were selected as the optimal configuration in all experiments.

4. Loss Function

4.1. Loss Function Overview

The proposed model employs several loss functions to optimize the performance of both the generator and the discriminator. Each loss function targets a specific aspect of the model operation to enhance overall performance. The primary loss functions used in this model are the contextual loss (

L_{c o n t e x t u a l}

), encoder loss (

L_{e n c o d e r}

), and adversarial loss (

L_{a d v e r s a r i a l}

). Each of these loss functions plays a critical role in ensuring that the model can effectively reconstruct the images and accurately detect anomalies.

4.2. Contextual Loss

Contextual loss is designed to minimize the pixel-wise difference between the reconstructed and original input images. By reducing this difference, the model improves the quality of the reconstructed images, which is crucial for accurate anomaly detection. This loss function can be mathematically expressed as

L_{c o n t e x t u a l} = ∥ x - \hat{x} ∥^{2}

(4)

where

x

is the original image and

\hat{x}

is the reconstructed image generated by the model.

4.3. Encoder Loss

The encoder loss is applied to ensure that the latent space representations of the original input and reconstructed images are closely aligned. This loss function is critical for maintaining consistency between the input and reconstructed data in the latent space, which helps distinguish between normal and anomalous patterns. The encoder loss is defined as

L_{e n c o d e r} = ∥ z - \hat{z} ∥^{2}

(5)

where

z

is the latent-space vector of the input image and

\hat{z}

is the latent-space vector of the reconstructed image.

4.4. Adversarial Loss

Adversarial loss plays a crucial role in enhancing the realism of generated images by leveraging the competition between the generator and discriminator. In the context of GANs, this loss function encourages the generator to produce images indistinguishable from real images. The adversarial loss is represented as

L_{a d v e r s a r i a l} = - E [\log D (x)] - E [\log (1 - D (\hat{x})]

(6)

where

D

represents the discriminator,

x

is the real image, and

\hat{x}

is the image generated by the generator.

4.5. Total Loss Function

The total loss function is a weighted combination of the contextual, encoder, and adversarial loss functions. By adjusting the weights (

λ_{c o n t e x t u a l},

λ_{e n c o d e r}

, and

λ_{a d v e r s a r i a l}

), the model can be fine-tuned to optimize different aspects of the reconstruction and anomaly detection processes. The total loss function is defined as

L_{t o t a l} = λ_{c o n t e x t u a l L c o n t e x t u a l} + λ_{e n c o d e r L e n c o d e r} + λ_{a d v e r s a r i a l L a d v e r s a r i a l}

(7)

By carefully tuning these weights, the model can be optimized to balance the reconstruction accuracy with the ability to generate realistic images and effectively detect anomalies.

5. Results

5.1. Crop Dataset

In this study, the performance of the proposed memory-anomaly model was evaluated using the apple and peach datasets provided by the Rural Development Administration of South Korea. Data were collected using an AWS installed at various farm locations to capture a wide range of environmental conditions throughout crop growth stages. The dataset was used for training and testing the model, with data augmentation applied to some samples by flipping the images horizontally and vertically to address data imbalance.

Table 1 and Table 2 provide an overview of the apple and peach datasets. In particular, weather damage types, including cold and heat damage, were augmented owing to the limited amount of data collected, resulting in a threefold increase in data volume. This augmentation was crucial for ensuring the reliability of the model performance evaluation.

As shown in Table 1, each apple growth phase contains exactly 401 samples. This uniform distribution was not artificially standardized for presentation purposes but was the result of a preprocessing step. Because the Flowering stage had the fewest samples (401), we undersampled the other phases to this count to ensure balanced evaluation across growth stages and to prevent bias toward overrepresented phases.

5.2. Model Performance

Table 3 and Table 4 present the performance results of all evaluated models on the apple and peach datasets. These models can be broadly categorized into four groups: CNN-based models (including ResNet-50 [32], EfficientNet-B3 [33], ResNeXt-50 [34], and ConvNeXt-Tiny [35]), a Transformer-based model (Swin Transformer-Tiny [36]), few-shot learning models (Siamese Network [37], Prototypical Network [38], and Matching Network [39]), and autoencoder (AE)-based anomaly detection models (Ganomaly, MemAE, and the proposed Memory Ganomaly).

The CNN and Transformer-based models exhibited relatively high overall accuracy, approximately 94.86%. However, they failed to identify any damaged samples, resulting in zero precision and recall. This outcome does not indicate flaws in labeling or evaluation, but rather reflects the extreme class imbalance in the dataset. Because undamaged samples dominated the training distribution, the supervised CNN and Transformer models overfit to the majority class, leading to high overall accuracy but an inability to recognize minority damaged samples. As a result, these supervised learning approaches demonstrated a critical limitation in generalizing to minority class instances under real-world agricultural conditions. It should be noted that this comparison with unsupervised anomaly detection methods is not intended as a direct equivalence of learning paradigms, but rather to contextualize the limitations of supervised models under severe imbalance and to highlight the complementary advantages of anomaly detection frameworks in such scenarios.

In contrast, the few-shot learning models showed improved performance in detecting damaged crops under limited supervision. These models were evaluated using a 2-way K-shot classification setup, where K was set to 10 and 20 in separate experiments. Each episode consisted of a support set comprising a small number of damaged and undamaged samples, and a query set used for evaluation. This setup reflects practical agricultural scenarios in which only a few labeled damaged samples are available.

Among the few-shot models, the Siamese Network achieved the highest recall at 75.1% in the 20-shot setting, followed by the Matching and Prototypical Networks. Notably, although increasing the number of support samples from 10 to 20 generally improved performance, minor fluctuations were observed in some models due to the inclusion of less representative support examples. These findings highlight the effectiveness of metric-based approaches in addressing class imbalance, even under severely limited data conditions. Nonetheless, their overall accuracy and precision remained lower than those of AE-based anomaly detection models.

The AE-based models, which are trained to reconstruct only normal patterns and detect deviations as anomalies, demonstrated the most robust performance under class-imbalanced conditions. In particular, the proposed Memory Ganomaly model outperformed all other models, achieving an accuracy of 80.32%, precision of 79.4%, and recall of 79.1% on the apple dataset. These findings underscore the effectiveness of anomaly detection frameworks in scenarios where abnormal cases are scarce and difficult to define explicitly.

The performance results highlight critical differences among the evaluated models. CNN and Transformer-based models achieved high overall accuracy (94–97%) but completely failed to identify damaged samples, with precision and recall both equal to zero. This demonstrates their tendency to overfit the majority “undamaged” class under severe class imbalance. Few-shot learning models partially alleviated this issue, achieving recall scores up to 75.1% in the apple dataset under the 20-shot setting, but their accuracy and precision remained lower than those of AE-based approaches. By contrast, anomaly detection frameworks showed consistent robustness. Ganomaly and MemAE outperformed supervised and few-shot models, and the proposed Memory Ganomaly achieved the best balance across all metrics (Apple: 80.32% accuracy, 79.4% precision, 79.1% recall; Peach: 81.06% accuracy, 83.23% precision, 80.3% recall). These improvements of 1–3% over MemAE and up to 15% over few-shot models demonstrate the benefit of integrating a memory module for both contextual reconstruction and long-term pattern recall.

To further assess the contribution of the memory module and skip connections, we reorganized the baseline results into an ablation study, as shown in Table 5. The proposed Memory Ganomaly achieved the best overall performance across precision and recall while maintaining comparable accuracy to the baselines on both apple and peach datasets. Notably, although Ganomaly showed slightly higher accuracy on the peach dataset (81.42% vs. 81.06%), our model outperformed it in precision and recall, highlighting the benefits of the memory module for anomaly discrimination.

In addition, removing skip connections (MemAE) resulted in lower precision (Apple: 79.4% → 78.2%, Peach: 83.2% → 79.0%), underscoring their importance in preserving spatial details during reconstruction. These findings indicate that the two components contribute complementary strengths, leading to the superior overall performance of the proposed model.

Moreover, to illustrate fold-wise variability and complement the tabulated results, we report ablation boxplots for the apple and peach datasets across three variants: the proposed model, a variant without the memory module (Ganomaly), and a variant without skip connections (MemAE) (Figure 5). Each boxplot summarizes the distribution of five folds for accuracy, precision, and recall, with diamonds denoting fold means. The results reveal consistent trends across both crops: the memory module substantially improves recall, skip connections mainly enhance precision, and their joint integration produces the most balanced and stable performance across folds. These visual findings align with the quantitative comparisons in Table 3, Table 4 and Table 5, reinforcing that the two components contribute complementary strengths.

Finally, to investigate the effect of memory module size itself, we conducted an additional analysis using four configurations (128, 256, 512, and 1024 items). Figure 6 summarizes the trade-off curves for accuracy, precision, and recall on both apple and peach datasets. The results show that smaller memory sizes (128–256) led to limited representation capacity and reduced recall, whereas an excessively large size (1024) introduced redundancy and increased computational overhead without consistent performance gains. In contrast, the intermediate size of 512 items achieved the most favorable balance, delivering stable accuracy and recall with moderate computational cost. These findings justify our choice of 512 memory items in the proposed model.

5.3. Reconstruction Results

Figure 7 and Figure 8 present the reconstruction results for the undamaged and damaged data, respectively, using the proposed model. For undamaged data, the model successfully reconstructed images that closely resembled the original images, thereby accurately capturing the characteristics of healthy crops. In contrast, for the damaged data, the reconstructed images exhibited significant differences from the original images, particularly in areas where the damage was most pronounced. These discrepancies indicate the sensitivity of the model to anomalies, aligning with the objective of the AE approach, where the model, trained primarily on undamaged data, detects anomalies by struggling to accurately reconstruct them.

The reconstruction results further illustrate these differences. For undamaged crops (Figure 7), the proposed model reconstructed images that closely resembled the originals, preserving fine-grained details such as leaf structure and fruit texture. For damaged crops (Figure 8), however, the reconstructed images diverged significantly from the originals, particularly in regions corresponding to visible damage such as browning of leaves or deformation of fruits. This indicates that the model, trained primarily on normal samples, struggled to reproduce anomalous regions and thereby marked them as anomalies. These findings validate the model’s ability to capture subtle damage characteristics, providing a more interpretable anomaly detection process.

Figure 9 shows the data collection and reconstruction results at specific locations across the different growth stages. The top row shows the original images captured at various growth stages, and the bottom row shows the corresponding reconstructed images generated by the proposed model. This sequence highlights the model’s ability to consistently reconstruct crop images across different growth stages while maintaining the integrity of key features despite variations in crop development over time. The results demonstrate the robustness of the model in handling temporal changes in data, further supporting its applicability to real-world agricultural monitoring tasks.

Furthermore, Figure 9 demonstrates that this reconstruction capability remained stable across different growth stages. The model consistently preserved the structural integrity of undamaged crops while amplifying discrepancies in damaged regions. This robustness across temporal variation highlights the applicability of the proposed approach to long-term agricultural monitoring under changing growth conditions.

To complement the qualitative reconstruction results presented in Figure 7 and Figure 8, we further quantified reconstruction quality using SSIM and PSNR. The proposed Memory Ganomaly achieved an average SSIM of 0.87 and PSNR of 28.5 dB on undamaged samples, while showing substantially lower values (SSIM: 0.62, PSNR: 21.7 dB) on damaged samples. This contrast highlights the model’s difficulty in reconstructing anomalous regions, thereby validating its anomaly detection capability. Compared with baseline AE models, our method consistently achieved higher SSIM and PSNR on normal data, indicating improved reconstruction fidelity.

Table 6 reports the quantitative reconstruction results in terms of SSIM and PSNR for apple and peach datasets. The proposed model achieved high reconstruction quality on undamaged samples (SSIM > 0.87, PSNR > 28 dB), whereas the scores were substantially lower on damaged samples (SSIM ≈ 0.62–0.64, PSNR ≈ 21–22 dB). This discrepancy highlights the model’s difficulty in reproducing anomalous regions and reinforces its suitability for anomaly detection.

5.4. Computational Efficiency

Table 7 presents the computational complexity of the baseline methods and our proposed MemGanomaly under the same setting (128 × 128 RGB input, memory module = 1000 × 512, α = 0.7). GANomaly has 6.5 M parameters and only 248 G FLOPs, reflecting its relatively simple generator–discriminator design. In contrast, MemAE employs fewer parameters (5.2 M) but exhibits a much higher computational cost (769 G FLOPs) because of the intensive memory addressing operations. Our MemGanomaly combines both backbones, resulting in 38.1 M parameters and 2241 G FLOPs. The additional parameters mainly come from the encoder–decoder–encoder backbone and skip connections, whereas the FLOPs increase is largely due to the matrix multiplications in the memory module.

Although the complexity is considerably higher, forward runtime on RTX 3090 Ti hardware remains within a few milliseconds, and GPU memory usage is still practical for modern deployment. This analysis highlights an explicit trade-off: GANomaly is parameter-heavy but computationally light, MemAE is parameter-light but computationally heavy, and MemGanomaly inherits both characteristics. By sacrificing efficiency, our model achieves superior anomaly detection accuracy, illustrating that enhanced representational power often requires additional computational resources.

6. Discussion

The findings of this study highlight the effectiveness of anomaly detection frameworks, particularly the proposed Memory Ganomaly, for detecting weather-induced crop damage under severe class imbalance. In comparison with conventional CNN and Transformer-based models, which achieved high overall accuracy but failed to identify damaged samples, our approach demonstrated balanced performance across accuracy, precision, and recall. This result is consistent with previous studies in agricultural anomaly detection, where supervised deep learning models often struggled to handle minority classes. By contrast, integrating a memory module into the Ganomaly architecture enhanced long-term pattern recall and contextual reconstruction, thereby improving robustness under imbalanced conditions.

Despite the overall improvements, the ablation study provided more nuanced insights into the contributions of individual components. Removing the memory module (Ganomaly) decreased recall, especially on the peach dataset (80.3% → 76.9%), demonstrating its role in improving anomaly discrimination. Although Ganomaly achieved slightly higher accuracy on the peach dataset (81.42% vs. 81.06%), this gain was largely due to the dominance of the majority class. In contrast, our proposed model achieved higher recall (80.3% vs. 76.9%), thereby providing a more balanced performance across accuracy, precision, and recall. Removing skip connections (MemAE) resulted in lower precision (Apple: 79.4% → 78.2%, Peach: 83.2% → 79.0%), underscoring their importance in preserving spatial details during reconstruction. Taken together, these findings indicate that the memory module and skip connections contribute complementary strengths, and their integration leads to the superior overall performance of the proposed model.

Several limitations of this study should be acknowledged. First, the dataset was restricted to apple and peach crops collected in specific regions of South Korea. Although this provides a valuable benchmark, the generalizability of the model to other crops, climates, and environmental conditions remains to be validated. Second, due to the scarcity of damaged samples, data augmentation was applied within the training subset. While this strategy improved balance, it may not fully capture the diversity of real-world damage scenarios. Third, the proposed model, though relatively lightweight compared with CNNs and Transformers, has not yet been evaluated for scalability to higher-resolution imagery or multimodal inputs such as hyperspectral or thermal data. Moreover, although this study primarily focused on frost- and heat-induced crop damage, the framework itself is not inherently restricted to these conditions. As an unsupervised anomaly detection approach, Memory Ganomaly has the potential to generalize to other stressors such as hail, drought, or disease-related damage. However, the present dataset did not include sufficient samples of these cases, and systematic validation across broader categories of crop stress remains an important direction for future research.

Future research should address these limitations. Expanding validation to a wider range of crops and climatic conditions will be critical for assessing the generalizability of the proposed method. Integrating self-supervised or hybrid few-shot learning strategies could further mitigate data imbalance, enabling more robust detection of rare damage types. Moreover, recent Transformer-based anomaly detection methods have shown strong performance on visual anomaly tasks. Although not included in the present study due to dataset scale and computational constraints, incorporating and benchmarking such Transformer models against our framework will be an important direction for future work. Additionally, real-world deployment in agricultural monitoring systems will provide practical insights into scalability, reliability, and ease of integration with existing smart farming infrastructure. Finally, exploring attention mechanisms and advanced memory structures may offer further improvements in detecting subtle weather-related anomalies across dynamic agricultural environments.

Finally, while this study primarily focused on demonstrating anomaly detection performance, we did not conduct a systematic evaluation of the computational overhead introduced by the memory module. Given that model efficiency and lightweight deployment are critical for practical agricultural applications, we acknowledge this omission as a limitation and highlight the development of lightweight and efficiency-optimized variants of Memory Ganomaly as an important avenue for future research.

Previous studies on anomaly detection in agriculture have similarly highlighted the limitations of conventional supervised models under class imbalance. For example, previous studies reported that autoencoder-based approaches improved anomaly sensitivity but often suffered from imbalanced precision–recall trade-offs [20,22]. Our results extend these findings by demonstrating that the integration of a memory module improves both precision and recall simultaneously, leading to a more balanced performance. In addition, another study emphasized the importance of reconstruction fidelity for anomaly detection in agricultural machinery [9], which aligns with our observation that skip connections are essential for preserving fine-grained structural details. Furthermore, while deep learning has been successfully applied for orchard mapping with abundant labeled samples [4], our framework highlights the advantage of anomaly detection under limited damaged samples, thereby complementing supervised and few-shot paradigms. These comparisons collectively support the robustness and practical relevance of the proposed Memory Ganomaly model for agricultural monitoring.

7. Conclusions

This study proposed Memory Ganomaly, a novel anomaly detection framework that integrates memory modules into the conventional AE–GAN architecture to address severe class imbalance in agricultural datasets. Using apple and peach images collected by AWSs across multiple regions in South Korea, the model was evaluated under diverse crop health conditions and weather-induced damage. The results showed that the proposed method achieved balanced performance across accuracy, precision, and recall, outperforming CNNs, Transformer-based models, few-shot learning approaches, and other AE variants.

The key contribution of this work lies in demonstrating that anomaly detection frameworks, particularly memory-augmented AE models, are more effective than conventional supervised classifiers in domains with highly imbalanced data. The memory module enhanced recall by retaining representative patterns of normal data, while skip connections preserved fine details, enabling robust identification of weather-induced anomalies. Beyond methodological advances, this study provides evidence that advanced anomaly detection techniques can serve as practical tools for agricultural monitoring in the era of climate change, where timely and reliable detection of crop stress is essential.

Despite these strengths, the study has several limitations. The dataset was limited to apples and peaches collected from specific regions, which restricts the generalizability of the findings. Furthermore, the scarcity of damaged samples required data augmentation during training, which may not fully capture the variability encountered in real-world conditions. These factors highlight the need for further validation in broader agricultural scenarios.

Future research should evaluate the proposed framework across diverse crop types, climates, and field conditions to assess robustness and adaptability. Incorporating self-supervised learning, attention mechanisms, or few-shot strategies may further enhance performance in data-scarce scenarios. In addition, deploying the model in real-world agricultural monitoring systems and assessing its practical impact in collaboration with stakeholders will be critical for advancing sustainable food production and resilience against climate variability.

Finally, this study reported mean and standard deviation values from 5-fold cross-validation to capture performance variability but did not include formal statistical significance tests (e.g., paired t-tests) in the main results. Because the small number of folds limits statistical power, reporting exact p-values could be misleading. We acknowledge this as a limitation but note that the proposed model consistently outperformed all baselines across folds. Future work will include more rigorous statistical analyses, such as paired tests with larger sample sizes or bootstrap-based validation, to enhance the reliability of performance comparisons.

Author Contributions

J.P. conceptualized the study, developed the Memory Ganomaly model, conducted the experiments, analyzed the results, and wrote the original draft of the manuscript. S.-W.P. contributed to data preprocessing, model implementation, and validation methodologies. Y.-S.K. provided expertise on climate-induced crop damage, supported data collection, and contributed to the agricultural domain analysis. S.-H.J. and C.-B.S. equally supervised the research, provided critical feedback on model architecture, secured funding, and reviewed and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This study was conducted with support from the Rural Development Administration’s “New Agricultural Climate Change Response System Construction Project (Project Numbers: RS-2020-RD009396 and RS-2024-00332198)”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article material. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

You, J.; Li, X.; Low, M.; Lobell, D.; Ermon, S. Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data. Proc. AAAI Conf. Artif. Intell. 2017, 31, 4559–4566. [Google Scholar] [CrossRef]
Verma, S.; Singh, A.; Pradhan, S.S.; Kushuwaha, M. Impact of Climate Change on Agriculture: A Review. Int. J. Environ. Clim. Change 2024, 14, 615–620. [Google Scholar] [CrossRef]
Arora, N.K. Impact of Climate Change on Agriculture Production and Its Sustainable Solutions. Environ. Sustain. 2019, 2, 95–96. [Google Scholar] [CrossRef]
Afsar, M.M.; Bakhshi, A.D.; Iqbal, M.S.; Hussain, E.; Iqbal, J. High-Precision Mango Orchard Mapping Using a Deep Learning Pipeline Leveraging Object Detection and Segmentation. Remote Sens. 2024, 16, 3207. [Google Scholar] [CrossRef]
Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine Learning in Agriculture: A Review. Sensors 2018, 18, 2674. [Google Scholar] [CrossRef]
Sharma, A.; Jain, A.; Gupta, P.; Chowdary, V. Machine Learning Applications for Precision Agriculture: A Comprehensive Review. IEEE Access 2020, 9, 4843–4873. [Google Scholar] [CrossRef]
Nevavuori, P.; Narra, N.; Lipping, T. Crop yield prediction with deep convolutional neural networks. Comput. Electron. Agric. 2019, 163, 104859. [Google Scholar] [CrossRef]
Dong, S.; Wang, P.; Abbas, K. A Survey on Deep Learning and Its Applications. Comput. Sci. Rev. 2021, 40, 100379. [Google Scholar] [CrossRef]
Mujkic, E.; Philipsen, M.P.; Moeslund, T.B.; Christiansen, M.P.; Ravn, O. Anomaly Detection for Agricultural Vehicles Using Autoencoders. Sensors 2022, 22, 3608. [Google Scholar] [CrossRef]
Kamilaris, A.; Prenafeta-Boldú, F.X. Deep Learning in Agriculture: A Survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef]
Jeyabose, A.; Eunice, J.; Popescu, D.E.; Chowdary, M.K.; Hemanth, J. Deep Learning-Based Leaf Disease Detection in Crops Using Images for Agricultural Applications. Agronomy 2022, 12, 2395. [Google Scholar] [CrossRef]
Rumpf, T.; Mahlein, A.-K.; Steiner, U.; Oerke, E.-C.; Dehn, H.-W.; Plümer, L. Early Detection and Classification of Plant Diseases with Support Vector Machines Based on Hyperspectral Reflectance. Comput. Electron. Agric. 2010, 74, 91–99. [Google Scholar] [CrossRef]
Demilie, W.B. Plant disease detection and classification techniques: A comparative study of the performances. J. Big Data 2024, 11, 5. [Google Scholar] [CrossRef]
Oğuz, A.; Ertuğrul, Ö.F. A survey on applications of machine learning algorithms in water quality assessment and water supply and management. Water Supply 2023, 23, 895–922. [Google Scholar] [CrossRef]
Meshram, V.; Patil, K.; Meshram, V.; Hanchate, D.; Ramteke, S.D. Machine learning in agriculture domain: A state-of-art survey. Artif. Intell. Life Sci. 2021, 1, 100010. [Google Scholar] [CrossRef]
Garofalo, S.P.; Ardito, F.; Sanitate, N.; De Carolis, G.; Ruggieri, S.; Giannico, V.; Rana, G.; Ferrara, R.M. Robustness of Actual Evapotranspiration Predicted by Random Forest Model Integrating Remote Sensing and Meteorological Information: Case of Watermelon (Citrullus lanatus, (Thunb.) Matsum. & Nakai, 1916). Water 2025, 17, 323. [Google Scholar] [CrossRef]
Araújo, S.O.; Peres, R.S.; Ramalho, J.C.; Lidon, F.; Barata, J. Machine Learning Applications in Agriculture: Current Trends, Challenges, and Future Perspectives. Agronomy 2023, 13, 2976. [Google Scholar] [CrossRef]
Sutaji, D.; Rosyid, H. Convolutional Neural Network (CNN) Models for Crop Diseases Classification. Kinetik 2022, 7. [Google Scholar] [CrossRef]
Ferentinos, K.P. Deep Learning Models for Plant Disease Detection and Diagnosis. Comput. Electron. Agric. 2018, 145, 311–318. [Google Scholar] [CrossRef]
Too, E.C.; Li, Y.; Njuki, S.; Liu, Y. A Comparative Study of Fine-Tuning Deep Learning Models for Plant Disease Identification. Comput. Electron. Agric. 2019, 161, 272–279. [Google Scholar] [CrossRef]
Ramcharan, A.; Baranowski, A.; McCloskey, P.; Ahmed, B.; Legg, J.; Hughes, D.P. Deep Learning for Image-Based Cassava Disease Detection. Front. Plant Sci. 2017, 8, 1852. [Google Scholar] [CrossRef]
Tokunaga, T.; Katafuchi, R. Image-based Plant Disease Diagnosis with Unsupervised Anomaly Detection Based on Reconstructability of Colors. In Proceedings of the International Conference on Image Processing and Vision Engineering (IMPROVE 2021), Online, 28–30 April 2021; pp. 112–120. [Google Scholar] [CrossRef]
Garcia-Huerta, R.A.; González-Jiménez, L.E.; Villalon-Turrubiates, L.E. Sensor Fusion Algorithm Using a Model-Based Kalman Filter for the Position and Attitude Estimation of Precision Aerial Delivery Systems. Sensors 2020, 20, 5227. [Google Scholar] [CrossRef] [PubMed]
Benfenati, A.; Causin, P.; Oberti, R.; Stefanello, G. Unsupervised deep learning techniques for automatic detection of plant diseases: Reducing the need of manual labelling of plant images. J. Math. Ind. 2023, 13, 5. [Google Scholar] [CrossRef]
Ciniglio, A.; Guiotto, A.; Spolaor, F.; Sawacha, Z. The Design and Simulation of a 16-Sensors Plantar Pressure Insole Layout for Different Applications: From Sports to Clinics, a Pilot Study. Sensors 2021, 21, 1450. [Google Scholar] [CrossRef]
Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Langs, G.; Schmidt-Erfurth, U. f-AnoGAN: Fast Unsupervised Anomaly Detection with Generative Adversarial Networks. Med. Image Anal. 2019, 54, 30–44. [Google Scholar] [CrossRef]
Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2016, arXiv:1511.06434. [Google Scholar] [CrossRef]
Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the 28th International Conference on Neural Information Processing Systems—Volume 2 (NIPS’14), Montreal, QC, Canada, 8–13 December 2014; MIT Press: Cambridge, MA, USA, 2014; pp. 2672–2680. Available online: https://dl.acm.org/doi/10.5555/2969033.2969125 (accessed on 25 September 2025).
Prosvirin, A.E.; Islam, M.M.M.; Kim, J.-M. An Improved Algorithm for Selecting IMF Components in Ensemble Empirical Mode Decomposition for Domain of Rub-Impact Fault Diagnosis. IEEE Access 2019, 7, 121728–121741. [Google Scholar] [CrossRef]
Gong, D.; Liu, L.; Le, V.; Saha, B.; Mansour, M.R.; Venkatesh, S. Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 16000–16009. [Google Scholar] [CrossRef]
Akçay, S.; Atapour-Abarghouei, A.; Breckon, T.P. Ganomaly: Semi-Supervised Anomaly Detection via Adversarial Training. Proc. Asian Conf. Comput. Vis. 2018, 11363, 622–637. [Google Scholar] [CrossRef]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar] [CrossRef]
Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946. [Google Scholar] [CrossRef]
Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995. [Google Scholar] [CrossRef]
Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11966–11976. [Google Scholar] [CrossRef]
Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002. [Google Scholar] [CrossRef]
Koch, G.; Zemel, R.; Salakhutdinov, R. Siamese Neural Networks for One-Shot Image Recognition. Available online: https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf (accessed on 25 September 2025).
Snell, J.; Swersky, K.; Zemel, R.S. Prototypical Networks for Few-Shot Learning. Adv. Neural Inf. Process. Syst. 2017, 30, 4077–4087. [Google Scholar] [CrossRef]
Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One-Shot Learning. Adv. Neural Inf. Process. Syst. 2016, 29, 3630–3638. [Google Scholar] [CrossRef]

Figure 1. Full structure of the proposed crop anomaly detection model. In the data collection stage, real-time weather data and crop images are collected through the automatic weather station (AWS). In the data preprocessing stage, the collected images are optimized for analysis, highlighting the damaged areas. In the AE model stage, the input data is reconstructed through the encoder and decoder, which store and restore important features through the memory modules. Finally, the normal and abnormal conditions to determine crop abnormalities.

Figure 2. Examples of data used for model training. These images show the status of apples and peaches collected at various stages of growth. Each column represents the crop status at a particular point in time, and this data enables the detection of abnormal patterns in crops due to climate change.

Figure 3. Structure of the proposed AE-based model. The model consists of a generator, a memory module, an encoder, and a discriminator. The generator encodes the input image into a latent spatial vector

z

, and combines similar latent vector

\hat{z}

through a memory module to generate a reconstructed image. The encoder and discriminator receive reconstructed images to distinguish normal from abnormal images.

L_{c o n t e x t u a l}

,

L_{e n c o d e r}

, and

L_{a d v e r s a r i a l}

are loss functions used improve the model’s performance with context, encoder, and adversarial learning, respectively.

Figure 3. Structure of the proposed AE-based model. The model consists of a generator, a memory module, an encoder, and a discriminator. The generator encodes the input image into a latent spatial vector

z

, and combines similar latent vector

\hat{z}

through a memory module to generate a reconstructed image. The encoder and discriminator receive reconstructed images to distinguish normal from abnormal images.

L_{c o n t e x t u a l}

,

L_{e n c o d e r}

, and

L_{a d v e r s a r i a l}

are loss functions used improve the model’s performance with context, encoder, and adversarial learning, respectively.

Figure 4. Structure of the memory module. The input vector

z

measures similarity with the memory item, and the memory item with the most similar pattern is selected. The selected memory item is combined with the input vector to generate a sophisticated latent space vector

\hat{z}

, which is finally transmitted to the generator decoder.

Figure 4. Structure of the memory module. The input vector

z

measures similarity with the memory item, and the memory item with the most similar pattern is selected. The selected memory item is combined with the input vector to generate a sophisticated latent space vector

\hat{z}

, which is finally transmitted to the generator decoder.

Figure 5. Ablation study boxplots on apple (top row) and peach (bottom row) datasets for accuracy, precision, and recall across 5-fold cross-validation. The three variants—Proposed model, w/o Memory (Ganomaly), and w/o Skip (MemAE)—are compared. Boxes denote the interquartile range, whiskers indicate the overall range, and diamonds represent fold means. The results highlight that the memory module mainly improves recall, while skip connections enhance precision, leading to the most balanced performance when both are integrated in the proposed model.

Figure 6. Trade-off analysis of memory module size (128, 256, 512, 1024) on apple and peach datasets. Each curve shows mean accuracy, precision, and recall with standard deviation across 5 folds. Smaller sizes reduce recall due to limited representation, while very large size (1024) increases redundancy and computational cost without consistent gains. The configuration with 512 items achieves the best balance between detection performance and efficiency.

Figure 7. Reconstruction results of undamaged crop images using the proposed Memory Ganomaly model. The reconstructed outputs closely resemble the original images, successfully preserving fine-grained visual details such as leaf structure and fruit texture. This demonstrates the model’s ability to accurately capture and reproduce normal crop conditions.

Figure 8. Reconstruction results of damaged crop images using the proposed Memory Ganomaly model. The reconstructed outputs differ significantly from the original images, particularly in regions with visible damage such as leaf browning or fruit deformation. This illustrates the model’s sensitivity to anomalies, as it struggles to accurately reproduce damaged areas and thereby highlights them as abnormal regions.

Figure 9. Examples of original and reconstructed crop images collected at the same location across different growth stages. Unlike Figure 7 and Figure 8, which show individual undamaged or damaged samples, this figure emphasizes longitudinal consistency. The proposed model robustly reconstructs crop images over time while maintaining structural integrity in undamaged regions and amplifying discrepancies in damaged regions, thereby demonstrating stable anomaly detection performance under temporal variations.

Table 1. Overview of Original and Used Crop Data by Growth Phase.

State	Apple		Peach
State	Original	Used	Original	Used
Dormancy	4504	401	10,877	1047
Bud stage	683	401	2584	1047
Flowering	401	401	1047	1047
Post-flowering	619	401	1417	1047
Early fruit growth	1560	401	2960	1047
Fruit growth	2364	401	9298	1047
Harvest	544	401	2120	1047
Nutrient accumulation	3667	401	4122	1047
Total	14,342	3208	34,425	8376

Table 2. Overview of Original and Augmented Weather Damage Data for Apples and Peaches.

State	Apple		Peach
State	Original	Used	Original	Used
Cold Damage	31	93	36	108
Heat Damage	27	81	32	96
Total	58	174	68	204

Table 3. Performance test results with 5-Fold cross-validation on the Apple dataset. Metrics are reported as mean ± standard deviation.

Model	Accuracy (%)	Precision (%)	Recall (%)
ResNet-50	94.86	0	0
EfficientNet-B3	94.86	0	0
ResNeXt-50	94.86	0	0
ConvNeXt-Tiny	94.86	0	0
Swin Transformer-Tiny	94.86	0	0
Siamese Network(10-shot)	68.12 ± 2.7	64.11 ± 2.1	70.3 ± 2.8
Prototypical Network(10-shot)	68.64 ± 3.1	65.32 ± 4.4	65.6 ± 5.7
Matching Network(10-shot)	65.81 ± 3.4	61.55 ± 3.2	64.0 ± 4.7
Siamese Network(20-shot)	73.11 ± 3.7	77.43 ± 2.6	75.3 ± 2.8
Prototypical Network(20-shot)	66.64 ± 6.9	61.32 ± 7.4	62.6 ± 7.1
Matching Network(20-shot)	71.81 ± 4.3	73.55 ± 5.0	72.0 ± 4.7
Ganomaly	77.29 ± 2.6	76.98 ± 3.1	77.12 ± 2.9
MemAE	79.61 ± 1.7	78.21 ± 2.5	78.0 ± 2.1
Proposed Model	80.32 ± 1.3	79.4 ± 1.6	79.1 ± 1.4

Table 4. Performance test results with 5-Fold cross-validation on the Peach dataset. Metrics are reported as mean ± standard deviation.

Model	Accuracy (%)	Precision (%)	Recall (%)
ResNet-50	97.63	0	0
EfficientNet-B3	97.63	0	0
ResNeXt-50	97.63	0	0
ConvNeXt-Tiny	97.63	0	0
Swin Transformer-Tiny	97.63	0	0
Siamese Network(10-shot)	70.64 ± 2.7	69.9 ± 3.7	71.9 ± 1.1
Prototypical Network(10-shot)	67.33 ± 3.6	69.33 ± 6.7	68.31 ± 4.7
Matching Network(10-shot)	68.17 ± 4.1	64.87 ± 3.4	57.74 ± 5.4
Siamese Network(20-shot)	71.43 ± 3.1	76.3 ± 4.9	75.1 ± 2.8
Prototypical Network(20-shot)	61.7 ± 5.5	63.27 ± 3.3	59.6 ± 7.1
Matching Network(20-shot)	73.94 ± 5.2	69.64 ± 4.6	71.0 ± 3.7
Ganomaly	81.42 ± 3.1	81.08 ± 2.7	76.9 ± 3.2
MemAE	80.96 ± 2.1	79.01 ± 2.9	77.8 ± 2.5
Proposed Model	81.06 ± 1.7	83.23 ± 1.6	80.3 ± 1.7

Table 5. Ablation study results of the proposed Memory Ganomaly model. Performance is compared with variants without the memory module (Ganomaly) and without skip connections (MemAE). Metrics include accuracy, precision, and recall, reported as mean ± standard deviation across 5-fold cross-validation on apple and peach datasets.

Dataset	Metric	Proposed Model	w/o Memory Module (Ganomaly)	w/o Skip Connections (MemAE)
Apple	Accuracy	80.32 ± 1.3	77.29 ± 2.6	79.61 ± 1.7
	Precision	79.4 ± 1.6	76.98 ± 3.1	78.21 ± 2.5
	Recall	79.1 ± 1.4	77.12 ± 2.9	78.0 ± 2.1
Peach	Accuracy	81.06 ± 1.7	81.42 ± 3.1	80.96 ± 2.1
	Precision	83.23 ± 1.6	81.08 ± 2.7	79.01 ± 2.9
	Recall	80.3 ± 1.7	76.9 ± 3.2	77.8 ± 2.5

Table 6. Quantitative reconstruction results using SSIM and PSNR metrics for apple and peach datasets.

Dataset	Condition	SSIM (↑)	PSNR (dB, ↑)
Apple	Undamaged	0.87 ± 0.02	28.5 ± 1.1
Apple	Damaged	0.62 ± 0.03	21.7 ± 1.4
Peach	Undamaged	0.89 ± 0.01	29.2 ± 1.0
Peach	Damaged	0.64 ± 0.02	22.1 ± 1.2

Table 7. Computational complexity comparison of the baseline models and the proposed MemGanomaly (input: 128 × 128 RGB, memory module = 1000 × 512, α = 0.7).

Model	Component	Params (M)	FLOPs (G)
GANomaly	Generator	6.07	120.1
	Discriminator	0.37	127.8
	Total	6.45	247.9
MemAE	AE (Total)	5.24	769.2
MemGanomaly (Proposed)	Generator	34.18	2022.6
	Discriminator	3.91	218.5
	Total	38.09	2241.0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Park, J.; Park, S.-W.; Kim, Y.-S.; Jung, S.-H.; Sim, C.-B. MemGanomaly: Memory-Augmented Ganomaly for Frost- and Heat-Damaged Crop Detection. Appl. Sci. 2025, 15, 10503. https://doi.org/10.3390/app151910503

AMA Style

Park J, Park S-W, Kim Y-S, Jung S-H, Sim C-B. MemGanomaly: Memory-Augmented Ganomaly for Frost- and Heat-Damaged Crop Detection. Applied Sciences. 2025; 15(19):10503. https://doi.org/10.3390/app151910503

Chicago/Turabian Style

Park, Jun, Sung-Wook Park, Yong-Seok Kim, Se-Hoon Jung, and Chun-Bo Sim. 2025. "MemGanomaly: Memory-Augmented Ganomaly for Frost- and Heat-Damaged Crop Detection" Applied Sciences 15, no. 19: 10503. https://doi.org/10.3390/app151910503

APA Style

Park, J., Park, S.-W., Kim, Y.-S., Jung, S.-H., & Sim, C.-B. (2025). MemGanomaly: Memory-Augmented Ganomaly for Frost- and Heat-Damaged Crop Detection. Applied Sciences, 15(19), 10503. https://doi.org/10.3390/app151910503

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

MemGanomaly: Memory-Augmented Ganomaly for Frost- and Heat-Damaged Crop Detection

Abstract

1. Introduction

2. Related Work

2.1. Machine Learning-Based Approaches

2.2. Convolutional Neural Networks

2.3. Autoencoders

2.4. Generative Adversarial Networks

2.5. Transformer-Based Models

3. Materials and Methods

3.1. Data Preprocessing Module

3.2. Data Collection

3.3. Data Preprocessing

3.4. Autoencoder Model

3.4.1. Generator

3.4.2. Memory Module

3.4.3. Encoder

3.4.4. Discriminator

3.4.5. Hyperparameter Settings

3.4.6. Programming Environment

3.5. Training Details

4. Loss Function

4.1. Loss Function Overview

4.2. Contextual Loss

4.3. Encoder Loss

4.4. Adversarial Loss

4.5. Total Loss Function

5. Results

5.1. Crop Dataset

5.2. Model Performance

5.3. Reconstruction Results

5.4. Computational Efficiency

6. Discussion

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI