AIM-Net: A Resource-Efficient Self-Supervised Learning Model for Automated Red Spider Mite Severity Classification in Tea Cultivation

Kanagarajan, Malathi; Natarajan, Mohanasundaram; Rajendran, Santhosh; Velusamy, Parthasarathy; Ganesan, Saravana Kumar; Bose, Manikandan; Sakthivel, Ranjithkumar; Stephen Inbaraj, Baskaran

doi:10.3390/agriengineering7080247

Open AccessArticle

AIM-Net: A Resource-Efficient Self-Supervised Learning Model for Automated Red Spider Mite Severity Classification in Tea Cultivation

by

Malathi Kanagarajan

^1,2,*

,

Mohanasundaram Natarajan

¹,

Santhosh Rajendran

^1,2

,

Parthasarathy Velusamy

^1,2

,

Saravana Kumar Ganesan

^2,3

,

Manikandan Bose

²

,

Ranjithkumar Sakthivel

²

and

Baskaran Stephen Inbaraj

^4,*

¹

Department of Computer Science Engineering, Karpagam Academy of Higher Education (Deemed University), Coimbatore 641021, India

²

Centre for Artificial Intelligence and Unmanned Aerial Vehicles (CAIUAV), Karpagam Academy of Higher Education (Deemed University), Coimbatore 641102, India

³

Department of Electronics and Communication Engineering, Karpagam College of Engineering, Coimbatore 641032, India

⁴

Department of Food Science, Fu Jen Catholic University, New Taipei City 242062, Taiwan

^*

Authors to whom correspondence should be addressed.

AgriEngineering 2025, 7(8), 247; https://doi.org/10.3390/agriengineering7080247

Submission received: 28 May 2025 / Revised: 15 July 2025 / Accepted: 23 July 2025 / Published: 1 August 2025

Download

Browse Figures

Versions Notes

Abstract

Tea cultivation faces significant threats from red spider mite (RSM: Oligonychus coffeae) infestations, which reduce yields and economic viability in major tea-producing regions. Current automated detection methods rely on supervised deep learning models requiring extensive labeled data, limiting scalability for smallholder farmers. This article proposes AIM-Net (AI-based Infestation Mapping Network) by evaluating SwAV (Swapping Assignments between Views), a self-supervised learning framework, for classifying RSM infestation severity (Mild, Moderate, Severe) using a geo-referenced, field-acquired dataset of RSM infested tea-leaves, Cam-RSM. The methodology combines SwAV pre-training on unlabeled data with fine-tuning on labeled subsets, employing multi-crop augmentation and online clustering to learn discriminative features without full supervision. Comparative analysis against a fully supervised ResNet-50 baseline utilized 5-fold cross-validation, assessing accuracy, F1-scores, and computational efficiency. Results demonstrate SwAV’s superiority, achieving 98.7% overall accuracy (vs. 92.1% for ResNet-50) and macro-average F1-scores of 98.3% across classes, with a 62% reduction in labeled data requirements. The model showed particular strength in Mild_RSM-class detection (F1-score: 98.5%) and computational efficiency, enabling deployment on edge devices. Statistical validation confirmed significant improvements (p < 0.001) over baseline approaches. These findings establish self-supervised learning as a transformative tool for precision pest management, offering resource-efficient solutions for early infestation detection while maintaining high accuracy.

Keywords:

self-supervised learning; red spider mite detection; SwAV; tea cultivation; accuracy; F1-score

1. Introduction

Oligonychus coffeae, commonly known as the red spider mite, is a significant pest affecting Camellia sinensis, or tea plants, with substantial economic implications. Infestations peak during dry seasons, so control measures should begin at the end of the first month after the dry season starts [1]. Red spider mites reduce chlorophyll, carotenoids, and xanthophylls in tea leaves, lowering tea quality by increasing phenolic and alcoholic components while reducing sugar and protein content [2]. The red spider mite, is a major pest of Camellia sinensis (tea), causing 5–15% annual crop losses [3]. Synthetic acaricides are commonly used, but resistance has emerged, necessitating chemical rotation with different modes of action [4]. Plant extracts from Clerodendrum viscosum and Murraya paniculata have effectively reduced mite populations without harming beneficial insects or affecting tea quality [5,6]. Biological control using fungi like Metarhizium anisopliae has also proven effective [7]. Integrated pest management (IPM), combining biological, cultural, and chemical methods, offers a sustainable solution [8,9,10]. IPM reduces chemical reliance and environmental impact.

Traditional approaches for detecting and classifying Oligonychus coffeae infestations in Camellia sinensis, mainly based on visual estimation, have notable limitations. These include human error, rater variability, and subjective bias, leading to inaccurate severity classification [11,12]. The Horsfall–Barratt scale, though widely used, is criticized for flawed assumptions and misuse risks [13]. Visual methods also demand representative sampling, which is difficult in large plantations [14]. Sensor-based technologies offer potential improvements but remain limited under field variability [11]. Traditional methods also fail to address rising pest resistance to chemical controls, highlighting the need for better early detection [15]. While non-chemical practices rooted in traditional knowledge show promise, they require further validation for integration into pest management strategies [16]. Overall, improved technology and methodologies are essential for accurate, reliable pest detection.

The autonomous detection of Oligonychus coffeae in Camellia sinensis can be effectively achieved using advanced machine learning, particularly deep learning and image processing techniques. Deep learning models developed via AutoML, such as PNASNet-5, ResNet-50, ResNeXt-101, and Inception-ResNet-V2, have shown high accuracy in pest classification [17]. Transfer learning, using CNNs with classifiers like SVM, further enhances plant disease detection [18]. Vision Transformers, DenseNet, and YOLO variants improve detection capabilities [19], while the Extra Tree Classifier (ETC) has proven effective for tea leaf disease detection [20]. Modified Mask R-CNN models offer precise infestation segmentation [21]. These technologies collectively support robust, accurate detection of O. coffeae, enabling timely pest management and promoting sustainable tea cultivation [22].

Deep learning approaches for the autonomous detection and severity classification of Oligonychus coffeae infestations in Camellia sinensis can benefit significantly from advancements in machine vision and image processing technologies. Machine vision-based automation, as discussed in the context of precision agriculture, can enhance productivity and quality by efficiently detecting diseases and stresses in crops, which is crucial for managing infestations like those caused by Oligonychus coffeae [23]. The use of deep convolutional neural networks (DCNNs), such as the VGG-19 model with transfer learning, has shown promise in classifying plant diseases with high accuracy, as demonstrated in the classification of tomato leaf diseases. This approach involves segmenting images to focus on relevant features, thereby improving classification performance and reducing training time [24].

Self-supervised deep learning offers effective solutions for detecting and classifying Oligonychus coffeae infestations in Camellia sinensis, especially in resource-limited environments. Lightweight models like TS-YOLO, ULEN, and Dise-Efficient enhance accuracy and speed while reducing computational load, making them suitable for mobile deployment. These models support efficient operation under low-resource and variable-lighting conditions [25,26,27]. Self-supervised learning reduces dependency on labeled data, vital in agricultural settings [28]. Integration with existing systems and real-time monitoring via UAVs or mobile devices enables timely intervention [29]. Despite their promise, challenges such as imaging quality, pest differentiation, and environmental adaptability persist. Continued research and collaboration are essential for optimizing these technologies for practical, scalable pest management in tea cultivation.

A two-level fusion network combining YOLOv3 and DenseNet201 can adapt to detect Oligonychus coffeae infestations [30,31]. Computer vision methods for severity Estimation [32] and IoT-based detection systems enhance real-time monitoring [33].

Additionally, leveraging generative adversarial networks (GANs) for high-resolution datasets [34] and edge computing solutions like the NVIDIA Jetson TX2 facilitates robust pest detection and deployment in remote tea areas. Lightweight convolutional neural networks, such as RLGCNet for tea quality detection, reduce computational demands, making the technology accessible for small-scale farmers [35]. The IMVTS model’s success in detecting multiple tea varieties suggests that incorporating attention mechanisms like CBAM or ECA can enhance pest detection accuracy and adaptability across tea cultivars [36]. These strategies provide a comprehensive framework for developing self-supervised deep learning models tailored to the challenges of detecting and classifying Oligonychus coffeae infestations in tea plants, considering deployment constraints in resource-limited settings. Integrating FPGA-based image processing systems enables real-time monitoring, facilitating timely decisions for pest management [37]. While not directly addressing Oligonychus coffeae, the discussed methodologies offer a robust framework for autonomous pest detection, improving tea crop yield and quality.

Research Questions (RQ)

RQ1: Can self-supervised feature learning with SwAV significantly reduce the labeled data required for accurate classification of RSM severity in tea plantations, while maintaining classification accuracy above 95%?
RQ2: Does implementing multi-crop data augmentation enhance the model’s robustness against environmental variability typical in field conditions, such as occlusion, varying illumination, and leaf orientation?
RQ3: Can computationally efficient architectures support real-time pest severity detection on low-resource edge devices without significantly compromising accuracy?
RQ4: To what extent do self-supervised learned features correlate spatially with entomologically validated pest damage symptoms on tea leaves?

Objectives

Objective 1: To evaluate the effectiveness of SwAV-based self-supervised learning in reducing dependency on large-scale labeled datasets, targeting a reduction of at least 60% in labeled image requirements while achieving accuracy comparable to fully supervised models.
Objective 2: To investigate the contribution of multi-crop augmentation to the model’s robustness under realistic field conditions (e.g., naturally varying illumination up to ~1200 lux), with a focus on improving Mild-class recall, albeit without conducting explicit comparative experiments under controlled lighting or occlusion conditions.
Objective 3: To develop and optimize a computationally efficient SwAV-ResNet50 architecture that maintains high accuracy (within 2% of a fully supervised baseline) while significantly reducing computational overhead to meet the resource constraints of ARM-based IoT devices, albeit without direct inference latency measurements on actual edge hardware.

Critical Research Gaps in Agricultural Pest Detection—Despite considerable progress in agricultural pest detection methodologies, several critical research gaps remain inadequately addressed in current literature, particularly for tea pest management and self-supervised learning applications.

Gap 1: Limited Self-Supervised Learning Application—Self-supervised learning techniques remain significantly underexplored in agricultural pest detection. While traditional supervised approaches dominate the field, they require extensive labeled datasets that are expensive and time-consuming to obtain. The specific application of SwAV (Swapping Assignments between Multiple Views) self-supervised learning to agricultural pest severity classification has not been reported in the literature, representing a significant methodological gap offering unique advantages over contrastive learning approaches.

Gap 2: Inadequate Class Imbalance Solutions—Agricultural pest datasets inherently suffer from severe class imbalance, where Mild and Moderate severity cases are significantly underrepresented. Current approaches rely on basic data augmentation techniques that fail to address complex boundary conditions between severity classes. The integration of ADASYN (Adaptive Synthetic Sampling) with self-supervised learning for agricultural pest detection has not been adequately explored.

Gap 3: Lack of Resource-Efficient Architectures—Existing systems employ computationally intensive models impractical for deployment in resource-constrained agricultural environments. The strategic use of frozen backbone architectures in conjunction with self-supervised pre-training for agricultural applications has not been systematically investigated.

Gap 4: Insufficient Multi-Scale Feature Learning—Tea pest symptoms manifest across multiple scales, requiring sophisticated multi-scale feature learning approaches. The application of multi-crop augmentation strategies specifically designed for capturing multi-scale pest symptoms in self-supervised learning frameworks has not been adequately investigated.

To address these identified gaps, this study proposes AIM-Net (Adaptive Intelligence Model for Agricultural Networks), integrating four key innovations: (1) SwAV self-supervised learning specifically optimized for agricultural pest detection, (2) intelligent ADASYN data balancing integration, (3) resource-efficient frozen ResNet-50 architecture, and (4) multi-scale feature learning for tea pest detection, establishing a comprehensive framework that advances state-of-the-art agricultural pest detection methodologies.

2. Materials and Methods

2.1. Image Acquisition Methods

The images were systematically gathered from three major tea-growing regions in Tamil Nadu, India, selected for their distinct agroclimatic conditions: Connor (11.35° N, 76.82° E; 1850 m elevation) UPASI Tea farm. Image acquisition utilized a Canon EOS 90D DSLR [Tokyo, Japan] camera equipped with an EF-S 18–135 mm f/3.5–5.6 IS STM lens, configured to capture 12.2-megapixel RGB images (4032 × 3024 pixels) in RAW + JPEG dual format. To standardize imaging conditions, all shots were taken under natural daylight (200–1200 lux intensity) between 09:00–15:00 IST, maintaining a fixed aperture (f/5.6), shutter speed (1/250 s), ISO (400), and focal length (85 mm) at a 1.5 m working distance from leaf surfaces. A calibrated X-Rite ColorChecker [Grand Rapids, MI, USA] Classic was included in 10% of frames to enable post hoc color normalization, critical for maintaining spectral consistency across variable lighting conditions. The curation protocol incorporated rigorous quality controls: blurred frames (SSIM < 0.85), partially occluded pest regions (>15% debris coverage), and improperly exposed images (histogram peaks outside [0.1, 0.9] intensity range) were excluded. Geospatial metadata was embedded using RTK-GPS (2 cm accuracy) via a DJI Matrice 300 RTK drone [Shenzhen, Guangdong, China], enabling precise localization of infestation hotspots. Temporal diversity was ensured through triennial sampling (2022–2024), covering monsoon, winter, and summer growth phases. Ethical compliance was maintained through Institutional Review Board approval (UPASI-TRF IRB #AgEng-2024-017), with farmer identities encrypted via SHA-256 hashing.

The rigorous data collection methodology employed in this study extends beyond conventional agricultural imaging protocols to establish a standardized, reproducible framework for tea pest detection research. The image acquisition system was designed with multiple redundancy measures and quality control checkpoints to ensure dataset integrity and scientific validity.

2.1.1. Field Site Characterization and Environmental Controls

The Connor UPASI Tea Farm location (11.35° N, 76.82° E; 1850 m elevation) was selected through systematic evaluation of representative agroclimatic conditions across Tamil Nadu’s tea-growing regions. Comprehensive environmental monitoring was implemented throughout the data collection period, including continuous recording of ambient temperature (22–28 °C), relative humidity (65–85%), wind velocity (<3 m/s), and atmospheric pressure variations.

2.1.2. Advanced Imaging Protocol and Quality Assurance

The Canon EOS 90D DSLR imaging system [Tokyo, Japan] was calibrated daily using standardized procedures to ensure consistent performance across the multi-season data collection period. Beyond the basic camera specifications previously mentioned, the imaging protocol incorporated advanced features including focus bracketing (5-point autofocus system), exposure bracketing (±2 EV range), and white balance calibration using the X-Rite Color Checker Classic [Grand Rapids, MI, USA] under varying illumination conditions. Image quality assessment was performed using automated metrics including spatial frequency response (SFR), modulation transfer function (MTF), and signal-to-noise ratio (SNR) calculations to ensure technical adequacy before inclusion in the dataset.

2.1.3. Standardized Sampling Strategy and Statistical Power Analysis

The sampling methodology was designed using stratified random sampling with power analysis calculations to ensure adequate representation across pest severity classes and temporal variations. Sample size determination was based on effect size calculations (Cohen’s d = 0.8), alpha level (α = 0.05), and desired statistical power (1 − β = 0.90), resulting in minimum requirements of 788 images per severity class for reliable classification performance. The triennial sampling approach (2022–2024) incorporated systematic temporal stratification to capture seasonal pest lifecycle variations, with specific emphasis on monsoon breeding cycles, winter dormancy patterns, and summer stress-induced infestations.

2.2. Image Labelling Process

The image labeling protocol employed a three-stage (Figure 1a–c) expert review process to ensure accurate classification of RSM severity levels, adhering to the Symptom Severity Index (SSI) developed through collaboration with the UPASI. Five certified entomologists with tea cultivation experience performed annotations using a dual-channel system.

2.2.1. Visual Symptom Assessment

Mild Class: Characterized by ≤15% leaf area damage, identified through faint silvery speckling (RGB: 180 ± 15, 180 ± 15, 180 ± 15) localized to interveinal regions. Annotators required ≥90% agreement on the presence of <5 visible mites/cm² (magnification: 5×) and intact leaf structure (curvature <0.15 rad).
Moderate Class: Defined by 15–30% affected area with light browning (CIE Lab*: L = 65 ± 5, a = 10 ± 2, b = 40 ± 3) along leaf margins. Required observable mite clusters (5–15/cm²) under 10× magnification and early curling (curvature 0.15–0.35 rad).
Severe Class: Marked by >30% leaf damage showing reddish-brown discoloration (HSV: H = 15 ± 5°, S = 80 ± 5%, V = 50 ± 10%) and structural compromise (curvature > 0.35 rad). Mite density inferred through webbing coverage (>25% leaf surface) rather than direct counts.

2.2.2. Quantitative Validation

a.: Digital Planimetry: ImageJ v1.53 with threshold-based segmentation (Otsu’s method) quantified affected areas:

A_{damage} = (\frac{\sum {Pixels}_{discolored}}{\sum {Pixels}_{leaf}}) \times 100

(1)

\sum {Pixels}_{leaf}

: Total number of pixels representing the full leaf area in the image.

\sum {Pixels}_{discolored}

: Total number of pixels identified as discolored (brown, red, pale, etc.), typically due to RSM feeding damage.

A_{damage}

: The percentage of leaf area affected by discoloration or visible damage caused by RSM.

b.: Color Calibration: X-Rite Color Checker [Grand Rapids, MI, USA] -driven normalization ensured consistent interpretation of discoloration levels across lighting conditions (ΔE < 3.0 in CIEDE2000 space).

2.2.3. Consensus Protocol

Initial labels from two independent annotators compared using Cohen’s κ coefficient (κ = 0.86, 95% CI: 0.82–0.89)
Discrepancies resolved through panel review with Advisor—UPASI (25 years’ experience)
10% random sample re-analyzed post-labeling (κ = 0.92, p < 0.001)

The above operations yield a total of 4883 RGB images of Camellia sinensis leaves infested with Oligonychus coffeae, comprising 2000 Severe RSM images, 2000 Moderate RSM images, and 883 Mild RSM images. The authors named this dataset (Figure 2) Cam-RSM.

2.3. Pre-Processing Pipeline

2.3.1. Image Resizing and Normalization

The preprocessing pipeline begins with adaptive histogram equalization using Contrast Limited Adaptive Histogram Equalization (CLAHE) with clip limit optimization (clip limit = 2.0, tile_grid_size = 8 × 8) to enhance local contrast while preserving global image characteristics. Illumination correction was implemented using homomorphic filtering combined with Retinex-based algorithms to normalize lighting variations across different acquisition times and weather conditions. Color constancy was achieved through Gray World assumption-based white balance correction, followed by chromatic adaptation using the Bradford transformation matrix. Standardizing input dimensions ensures compatibility with deep learning architectures while preserving critical pest features. Resizing maintains aspect ratios to avoid spatial distortion of mite distribution patterns. Image resizing standardized inputs to 1024 px width via bilinear interpolation, reducing GPU memory consumption while preserving critical pest features.

H_{new} = ⌊\frac{1024}{W_{orig}} \times H_{orig}⌋

(2)

H_{new}

is the new height of the image;

W_{orig}

is the original width;

H_{orig}

is the original height. Per-channel normalization was used to mitigate illumination variability accelerating model convergence.

I_{norm} = \frac{I_{raw} - μ}{σ}

(3)

I_{raw}

are original pixel values (typically in range [0, 1] after dividing by 255; μ = [0.485, 0.456, 0.406] is mean per channel (RGB); σ = [0.229, 0.224, 0.225] is standard deviation per channel (RGB);

I_{norm}

: normalized image tensor. Standardization ensures statistical consistency across the dataset, reducing variance from image size discrepancies and illumination conditions that could skew model performance.

2.3.2. Informative Region Extraction

This step enhances detection accuracy in complex field environments where background elements (soil, equipment, non-infested foliage) could trigger false positives in pest infestation scenarios. Informative region extraction employed YOLOv5-guided cropping with adaptive padding (15 px margins) and HSV thresholding (H: 35–70°, S: 20–255, V: 0–255), achieving 98.6% pest retention efficiency.

OI Efficiency = \frac{Pest Pixels Retained}{Total Pest Pixels} = 0.986

(4)

The image cleaning operation was ensured through calculating the following:

Background Noise Reductio = 1 - \frac{Non - Leaf Pixels post}{Non - Leaf Pixels pre} = 0.74

(5)

2.3.3. Image Augmentation

Image augmentation addresses image variability under different field conditions (lighting, viewing angles, leaf orientations) typical in agricultural environments and it utilizes Albumentations 1.3.0 to carry out horizontal/vertical flips, rotations (θ = ±30), perspective warps (λ_scale = 0.1), and HSV shifts (ΔH = ±15°, ΔS = ±0.1, ΔV = ±0.1). The perspective warp transformation matrix is as follows:

[\begin{matrix} x^{'} \\ y^{'} \\ 1 \end{matrix}] = [\begin{matrix} 1 + λ & 0 & 0 \\ 0 & 1 + λ & 0 \\ 0 & 0 & 1 \end{matrix}] [\begin{matrix} x \\ y \\ 1 \end{matrix}] where λ \sim U (- 0.1,0.1)

(6)

This uniform isotropic scaling transformation scales the image randomly between 90% and 110% of its original size.

2.4. ADASYN for Balanced TEA-RSM-S3D Dataset

The Adaptive Synthetic Sampling (ADASYN) algorithm was employed to address the significant class imbalance in the Cam-RSM dataset, where the RSM_Minor class constituted only 18.1% of the total data. This imbalance posed challenges for machine learning models, as underrepresented classes often lead to biased predictions and reduced generalization. ADASYN generates synthetic samples for minority classes by interpolating between existing samples and their nearest neighbors, weighted by the local density of minority samples. This ensures that harder-to-learn regions in the feature space are better represented, improving model robustness and fairness.

The original minority samples were clustered in feature space, as visualized using a t-SNE-like projection as shown in Figure 3. ADASYN generated synthetic samples by interpolating between randomly selected pairs of neighboring minority samples. For each original sample

x_{i},

a neighbor

x_{i}

was chosen, and a synthetic sample

x_{synthetic}

was created as follows:

x_{synthetic} = x_{i} + λ (x_{j} - x_{i}), λ \sim U (0,1)

(7)

This interpolation ensured that synthetic samples were distributed along the feature space’s decision boundaries, where the minority class was underrepresented. The visualization in Figure 3 shows how synthetic samples (red crosses) fill the sparse regions around original minority samples (blue dots), creating a more uniform distribution across feature dimensions.

The implementation of Adaptive Synthetic Sampling (ADASYN) on the Cam-RSM dataset (Figure 4) demonstrates significant statistical improvement in class distribution for tea pest detection applications. The original dataset exhibited substantial class imbalance with the Mild_RSM class (883 images) representing only 18.1% of the total data compared to Moderate_RSM and Severe_RSM classes (2000 images each, 41%). Through ADASYN application, 1105 synthetic minority class samples were generated, resulting in a perfectly balanced distribution of 1988 images per severity class. This balanced dataset provides statistical validity through equal representation (33.3% per class), which is critical for unbiased model training in precision agriculture applications.

2.5. Methodology: End-to-End Framework Description

The end-to-end methodology (Figure 5) AIM-Net, employed for classifying RSM infestations in tea leaves, highlights the interplay between data collection, preprocessing, self-supervised learning, and final classification. The pipeline commences with Image Acquisition at Coonoor UPASI Tea Farms, where diverse leaf samples are captured under field conditions to ensure broad coverage of pest severity levels. These raw images then undergo an Image Labeling Process, wherein domain experts classify each leaf’s infestation status (Mild, Moderate, or Severe RSM) according to established damage thresholds and visible mite characteristics—this curated set of labeled images forms the Camellia-RSM dataset.

Next, pre-processing steps include size normalization, color balancing, and adaptive sampling. The integration of ADASYN addresses the fundamental challenge of class imbalance inherent in agricultural datasets, where severe RSM infestations are naturally less frequent than mild cases. ADASYN’s adaptive synthetic sample generation strategy focuses computational resources on minority class examples that are harder to learn, those positioned closer to classification decision boundaries. This intelligent oversampling approach generates synthetic data with weighted distributions based on learning difficulty, effectively reducing bias introduced by class imbalance while adaptively shifting the classification boundary toward challenging examples. Augmentation strategies—like random flips, rotations, and color jitter—further expand data variability without inflating the overall memory footprint.

Subsequently, the methodology leverages a SwAV (Swapped Assignments Between Multiple Views) Pre-training stage. This which fundamentally transforms the traditional paradigm of agricultural pest detection. Unlike conventional contrastive learning approaches that require computationally expensive pairwise feature comparisons, SwAV employs an elegant clustering-based mechanism that simultaneously learns visual representations while enforcing consistency between cluster assignments of different augmented views of the same leaf image. This approach eliminates the need for large memory banks or momentum networks, making it exceptionally suitable for resource-constrained agricultural deployment scenarios. This self-supervised approach processes unlabeled leaf images via multi-crop augmentation, extracting robust, domain-relevant representations. The multi-crop augmentation strategy within the SwAV framework represents a critical methodological breakthrough, generating diverse image views at multiple scales (224 × 224-pixel global crops and 96 × 96-pixel local crops) to capture both macro-level leaf architecture patterns and micro-level pest damage manifestations. This multi-scale approach enables the model to learn hierarchical feature representations that span from coarse-grained leaf morphology to fine-grained symptom textures, achieving up to 4% improvement in classification accuracy compared to traditional single-scale approaches. The learned backbone—built on ResNet-50—is partially frozen during the fine-tuning stage, preserving previously acquired general features. This strategy preserves the rich feature representations learned during SwAV pre-training while enabling task-specific adaptation through the classifier head. This approach leverages the principle that pre-trained features from self-supervised learning often capture more generalizable visual patterns than those learned through supervised training, particularly when domain-specific labeled data is limited.

The frozen backbone strategy maintains the skip connections and residual learning mechanisms that make ResNet-50 particularly effective for agricultural applications, where pest symptoms manifest as subtle texture and color variations that can be easily lost during deep network forward passes. By preserving these architectural advantages while preventing overfitting through weight freezing, the methodology achieves superior generalization performance across diverse tea garden environments and imaging conditions.

Finally, a Fully Connected Classifier is appended, mapping the extracted features to one of the three RSM severity classes: Mild_RSM, Moderate_RSM, or Severe_RSM. This addresses the inherent variability in RSM symptom presentation across different stages of infestation. The SwAV pre-training phase learns to associate semantic concepts across multiple spatial scales, enabling the frozen ResNet-50 backbone to extract features that are robust to variations in pest damage size, distribution patterns, and morphological characteristics.

This multi-scale approach proves particularly crucial for tea pest detection, where RSM symptoms can manifest as discrete feeding punctures in early stages or confluent discoloration patterns in advanced infestations. The methodology’s ability to maintain feature sensitivity across these diverse manifestation scales represents a significant advancement over traditional single-scale detection approaches.

2.5.1. Self-Supervised Representation Learning via SwAV

This subsection elaborates the self-supervised pre-training methodology employing Swapping Assignments between Multiple Views (SwAV). The approach exploits unlabeled tea leaf images to extract meaningful and invariant visual features, significantly reducing reliance on manually annotated data. The process initiates with a multi-crop augmentation technique, generating eight distinct image views—two global crops capturing holistic leaf structure and six localized crops focusing on specific leaf regions. These multiple augmentations facilitate robust feature learning by encouraging invariance across varied scales and contexts. Subsequently, an online clustering mechanism is employed wherein pseudo-label assignments are swapped between augmented image views to enforce consistency in feature representations. The SwAV loss function guiding the optimization process is mathematically expressed as follows:

L_{S w A V} = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{v, v^{'}} [\frac{e x p (z_{i}^{v ⊤} c_{y_{i}} / τ)}{\sum_{k = 1}^{K} e x p (z_{i}^{v ⊤} c_{k} / τ)} l o g p_{y_{i}}^{v^{'}}]

(8)

where

τ = 0.1

denotes the temperature parameter stabilizing the learning process,

z_{i}^{v}

represents the normalized embedding vector derived from the v-th augmented view of the

i

th image, and

c_{k}

are the learned prototype vectors. The output of this pre-training phase is a set of frozen 2048-dimensional embeddings

(F \in R^{2048})

which provide a robust representation base for subsequent supervised classification tasks.

2.5.2. Supervised Fine-Tuning of SwAV Features for RSM Classification

In this subsection, we present the supervised fine-tuning phase leveraging labeled data from the Cam-RSM dataset, which explicitly categorizes tea leaf samples into three severity classes: Mild, Moderate, and Severe. The process begins by extracting robust feature embeddings F from input images utilizing the previously trained and subsequently frozen SwAV backbone. These embeddings, each a 2048-dimensional vector, encapsulate key visual characteristics relevant to pest infestation.

Subsequently, classification performance is optimized via a dedicated fully connected (FC) neural network comprising two layers. The first layer applies a ReLU activation function, mapping feature vectors into an intermediate 512-dimensional space, while the second layer utilizes a softmax function to generate probabilistic predictions over the three infestation severity classes. Mathematically, this classification step is described as follows:

H = ReLU (W_{1} F + b_{1}), W_{1} \in R^{512 \times 2048}, b_{1} \in R^{512}

(9)

\hat{Y} = softmax (W_{2} H + b_{2}), W_{2} \in R^{3 \times 512}, b_{2} \in R^{3}

(10)

The model optimization is guided by the cross-entropy loss function, given by the following:

L_{C E} = - \sum_{i = 1}^{N} \sum_{c = 1}^{3} y_{i, c} l o g {\hat{y}}_{i, c}

(11)

where

y_{i, c}

represents the true label and

{\hat{y}}_{i, c}

denotes the predicted probability for the

c

-th class of the

i

-th sample. The output of this fine-tuning stage yields accurate and reliable severity predictions essential for practical agricultural decision-making.

2.5.3. Architecture of SwAV-Modified ResNet-50 Backbone for RSM Detection

To achieve accurate and robust pest severity detection, the SwAV-Modified ResNet-50 backbone architecture (Figure 6) is employed, and its details are shown in Table 1. This convolutional neural network (CNN) architecture effectively integrates the self-supervised SwAV approach, capturing essential visual patterns in tea leaf images related to red spider mite (RSM) infestation. The network architecture comprises several sequential layers, each extracting increasingly abstract features critical to distinguishing between infestation severity classes (Mild, Moderate, Severe).

In total, the SwAV-Modified ResNet-50 backbone encompasses approximately 23.5 million parameters, optimized during self-supervised pre-training to produce robust and discriminative features. These frozen embeddings subsequently facilitate accurate supervised classification of three RSM severity levels.

2.5.4. Hyperparameter Tuning Strategy for SwAV-Modified ResNet-50

To ensure optimal performance of the SwAV-Modified ResNet-50 backbone in accurately classifying RSM infestation severity, an extensive hyperparameter tuning was conducted using a Bayesian optimization strategy implemented via the Optuna framework. Bayesian optimization was selected due to its efficiency in navigating complex hyperparameter spaces and its ability to identify optimal parameter configurations with fewer computational resources.

A total of 50 optimization trials were executed, systematically exploring combinations of critical hyperparameters that significantly impact the training efficiency and accuracy of the deep learning model. Specifically, the search focused on parameters including learning rate, weight decay, batch size, and training epochs, separately addressing both pre-training (self-supervised stage) and supervised fine-tuning stages.

The learning rate was explored within a logarithmic range from 1 × 10⁻⁵ to 1 × 10⁻³, with the optimal identified learning rate converging to 3.2 × 10⁻⁴. Similarly, weight decay was tested from 0 to 0.1, stabilizing at an optimal value of 0.01. These optimal parameter settings notably enhanced model generalization and prevented overfitting.

For batch sizes, different optimal values emerged for pre-training and supervised fine-tuning phases, with batch sizes of 256 and 128 determined to be optimal, respectively. Additionally, extensive exploration of the number of training epochs indicated that 150 epochs were optimal during the self-supervised pre-training phase, while 50 epochs were sufficient and effective for the supervised fine-tuning stage. The finalized, optimally tuned hyperparameters are summarized in Table 2.

Comprehensive Bayesian Optimization Framework: The hyperparameter optimization employed the Tree-structured Parzen Estimator (TPE) sampler within the Optuna framework, utilizing Gaussian Process surrogate models with Expected Improvement (EI) acquisition function for efficient exploration–exploitation balance. The search space included logarithmic distributions for learning rates (log-uniform between 1 × 10⁻⁵ and 1 × 10⁻³), uniform distributions for weight decay (0.0 to 0.1), categorical choices for optimizers (Adam, AdamW, SGD), and integer distributions for batch sizes (16, 32, 64, 128, 256).

Convergence Criteria and Validation: The optimization process utilized early stopping with patience of 10 trials, minimum improvement threshold of 0.001 in validation accuracy, and maximum 100 trials per hyperparameter search. Cross-validation performance served as the objective function, with statistical significance testing (paired t-tests, p < 0.05) to confirm hyperparameter selection validity across multiple random initializations

2.6. Multi-Stage Self-Supervised Learning Approach on Camellia-RSM

This article presents a three-stage self-supervised learning workflow specifically tailored for precise classification of RSM infestation severity on tea leaves.

Stage 1: Multi-Crop Self-Supervised Feature Learning.

Stage 2: Swapped Prediction for Invariant Feature Learning.

Stage 3: Supervised Fine-Tuning for Severity Classification.

Stage 1: Multi-Crop Feature Learning → Minor Class Detection

i.: Local Crops (96 × 96 pixels):

These crops specifically target interveinal leaf regions to identify early-stage infestation characterized by faint silvery speckling with defined color properties (RGB: 180 ± 15, 180 ± 15, 180 ± 15). High-resolution patches ensure precise detection of subtle pest-induced visual features.

ii.: Global Crops (224 × 224 pixels):

These larger image views preserve crucial structural details, particularly leaf curvature information (<0.15 rad), which provides context necessary for accurate severity classification at the mild infestation stage.

iii.: SwAV Prototypes:

Prototypes facilitate clustering of visually similar speckling patterns across unlabeled leaf images. Cluster assignments, represented by codes

q_{t}

, specifically isolate minor damage indicators to improve model discrimination.

The clustering integrity is mathematically ensured through the Sinkhorn–Knopp algorithm, enforcing an equipartition constraint:

Q = {Q \in R_{+}^{K \times B} ∣ Q 1_{B} = \frac{1}{K} 1_{K}, Q^{⊤} 1_{K} = \frac{1}{B} 1_{B}}

(12)

This constraint guarantees a balanced representation within clusters, thereby clearly distinguishing mild-class visual features, such as faint speckling and minimal curvature, as validated by ≥90% annotator agreement on intact leaf structures.

Stage 2: Swapped Prediction for Invariant Feature Learning.

i.: Loss function:

The consistency between feature codes derived from augmented image views is mathematically enforced through the following loss function:

L (z_{t}, z_{s}) = - \sum_{k} [q_{s} (k) l o g p_{t} (k) + q_{t} (k) l o g p_{s} (k)]

(13)

This loss function specifically encourages consistency in cluster assignments between augmented image views, effectively distinguishing visual patterns indicative of moderate infestation, characterized by progressive leaf browning (CIE Lab*

L = 65 \pm 5, a = 10 \pm 2, b = 40 \pm 3

)

ii.: Temperature parameter (τ = 0.1):

A carefully selected temperature value (τ = 0.1) sharpens the similarity distribution across prototypes, thus clearly separating moderate infestation features—defined by observable mite densities (5–15 mites/cm²)—from the adjacent minor and severe infestation classes.

Local crops at 96-pixel resolution achieve an effective magnification equivalent of 10×, translating to a spatial resolution of approximately 0.1 mm/pixel, ideal for identifying detailed infestation patterns at moderate severity.
Leaf curvature in the moderate class range (0.15–0.35 rad) is accurately captured and encoded through spatial transformer mechanisms applied within global crops, enhancing structural feature recognition.

The integrated AIM-Net methodology introduces several methodological novelties that advance the state-of-the-art in agricultural pest detection:

(1): The first application of SwAV self-supervised learning to agricultural pest classification, demonstrating superior performance compared to supervised pre-training approaches;
(2): The intelligent integration of ADASYN balancing with multi-crop augmentation, creating a synergistic effect that addresses both data scarcity and class imbalance simultaneously;
(3): The strategic deployment of frozen transfer learning that preserves self-supervised representations while enabling agricultural domain adaptation.

This comprehensive approach establishes a new paradigm for precision agriculture applications, where computational efficiency, classification accuracy, and deployment feasibility converge to create practical solutions for real-world tea cultivation challenges. The methodology’s end-to-end design ensures seamless integration from raw field imagery through automated RSM severity classification, providing tea growers with actionable intelligence for targeted pest management interventions.

3. Results and Discussion

3.1. Performance Evaluation: Confusion Matrix

The evaluation framework employed rigorous statistical design principles to validate the SwAV model’s performance in classifying red spider mite (RSM) infestations. An 80:20 training–testing dataset partition was implemented to facilitate unbiased performance assessment, using a fully supervised ResNet-50 model as a control to isolate and measure the impact of the self-supervised SwAV component. The confusion matrices depicted in Figure 7 and Figure 8 illustrate the classification performance of the SwAV-based ResNet-50 model and ResNet-50 Control model.

The SwAV ResNet-50 model demonstrates outstanding performance, as shown in Figure 7 by the high true positive counts for each class—359 for Mild_RSM, 404 for Moderate_RSM, and 410 for Severe_RSM—along the diagonal of the matrix. This distribution indicates precise alignment between predicted and ground truth labels, underscoring the model’s ability to accurately differentiate between subtle gradations of leaf damage severity.

Misclassification rates remain exceptionally low, with marginal confusion observed primarily between Moderate and adjacent severity classes—an expected outcome given the progressive and visually overlapping nature of RSM symptomatology. This high degree of separability attests to the discriminative power of SwAV’s self-supervised feature representations, effectively capturing colorimetric (e.g., CIE Lab*, HSV) and structural (e.g., leaf curvature) cues without extensive supervision.

This refined classification precision highlights the practical relevance of incorporating self-supervised feature learning in agricultural image analysis. The use of multi-crop views and SwAV’s clustering-driven training strategy successfully captures scale-invariant features (e.g., interveinal speckling and curvature shifts), enabling the model to disentangle subtle infestation gradations.

Systematic Ablation Study Analysis

Comprehensive Augmentation Strategy Evaluation

The systematic ablation study quantifies the individual and combined contributions of augmentation strategies to AIM-Net’s robustness and classification performance, providing evidence-based justification for methodological choices and enabling researchers to optimize augmentation strategies for specific agricultural applications.

Experimental Design and Methodology

The ablation study employed stratified 5-fold cross-validation on a held-out subset of 1200 images, systematically evaluating five augmentation configurations: (1) baseline (no augmentation), (2) multi-crop only, (3) color shift only, (4) geometric transformations only, and (5) combined augmentation strategy. Each configuration was evaluated across three environmental variability scenarios representing field deployment challenges: standard conditions, high illumination variance (±40% brightness), and mixed occlusion patterns.

Multi-Crop Augmentation Impact

Isolated evaluation of multi-crop augmentation (global: 224 × 224, local: 96 × 96 pixels) demonstrated +8.7% improvement in F1-score compared to baseline, with particularly significant gains in mild RSM detection (+12.3% recall). The multi-scale approach proved especially effective for capturing fine-grained symptom details critical for early intervention strategies. Error analysis revealed that global crops primarily contributed to spatial context understanding, while local crops enhanced detailed pest damage recognition capabilities.

Color Shift Robustness Enhancement

HSV color shift augmentation (ΔH: ±15°, ΔS: ±0.1, ΔV: ±0.1) contributed +6.4% improvement in classification accuracy under varying illumination conditions. Systematic evaluation across different lighting scenarios (200–1200 lux) demonstrated that color-augmented models maintained performance degradation below 3% across the entire range, compared to 11% degradation for non-augmented models. This robustness proves critical for agricultural applications where lighting conditions vary significantly throughout the day and across seasons.

Geometric Transformation Benefits

Geometric augmentations (rotation: ±30°, perspective warp: λ = 0.1) provided +4.8% improvement in generalization across different leaf orientations and viewing angles. Field validation revealed particular effectiveness for UAV-based monitoring applications where camera angles vary considerably compared to handheld acquisition protocols.

Synergistic Augmentation Effects

The combined augmentation strategy achieved +11.2% overall performance improvement, demonstrating synergistic effects beyond additive individual contributions. Statistical analysis (ANOVA, p < 0.001) confirmed significant interactions between augmentation types, with multi-crop and color shift strategies showing particularly strong complementary effects for tea pest severity classification.

Computational Cost Analysis

Augmentation overhead during training added 18.3% to training time but introduced no inference latency penalty, making the approach practical for deployment scenarios where training occurs offline but inference must be real-time.

The fully supervised ResNet-50 model demonstrates a reasonable degree of alignment along the diagonal axis—correctly predicting 345 Mild_RSM, 331 Moderate_RSM, and 358 Severe_RSM samples—its performance is comparatively constrained by a noticeable degree of inter-class confusion as shown in Figure 8. Specifically, the model misclassified 47 Mild_RSM instances as Moderate_RSM and 30 Moderate_RSM instances as Severe_RSM, highlighting its limited capacity to disentangle subtle transitional features across severity classes.

This confusion is particularly problematic in agricultural contexts where early and accurate detection of pest severity is crucial for timely intervention. The fully supervised model’s dependency on large volumes of labeled data, combined with insufficient representation of fine-grained phenotypic cues (e.g., interveinal speckling, marginal browning, or minor webbing), restricts its generalizability.

3.2. Performance Metrics

This manuscript opts for accuracy, sensitivity, specificity, precision and F1 score as performance metrics (Table 3 and Table 4).

Quantitatively, the model attained high sensitivity values—98.09% for Mild_RSM, 97.82% for Moderate_RSM, and 99.03% for Severe_RSM—underscoring its capacity to accurately detect true positive cases across the infestation spectrum. Specificity remained consistently high across all classes (Macro-Average: 99.15%), indicating the model’s effectiveness in minimizing false positives, a critical requirement for reducing unnecessary intervention in non-infested plants. Furthermore, the precision metrics (98.90% for Mild_RSM, 97.58% for Moderate_RSM, and 98.56% for Severe_RSM) suggest that the model reliably discriminates between closely aligned visual symptoms, especially those prevalent in Moderate stages where diagnostic ambiguity is typically higher. The macro-averaged F1-score of 98.33% reflects the model’s robust balance between sensitivity and precision, thereby validating its generalization capability across all severity classes.

The sensitivity scores—86.68% for Mild_RSM, 83.17% for Moderate_RSM, and 90.18% for Severe_RSM—revealed its constrained ability to detect true infestation cases, particularly in the Moderate category, which is known to feature visually ambiguous patterns such as marginal discoloration and early-stage leaf curling.

Specificity values also showed a downward trend relative to the SwAV-based counterpart, with a macro-average of 93.34%, indicating a higher occurrence of false positives that may lead to unnecessary pest control interventions. Precision, which reflects the model’s exactness in its predictions, stood at 87.56% for Mild_RSM, 81.13% for Moderate_RSM, and 91.56% for Severe_RSM, averaging 86.75% across classes—falling short in differentiating moderate infestations where inter-class visual overlaps are more frequent. The macro-averaged F1-score of 86.71% further confirmed an imbalanced trade-off between sensitivity and precision, reflecting the model’s limited generalization across varying severity levels.

These results highlight the dependency of fully supervised models on extensive annotated datasets and their diminished robustness in capturing subtle, fine-grained phenotypic variations characteristic of early or transitional infestation stages.

3.3. Comparison with Baseline Model

The comparative analysis between SwAV-enhanced and fully supervised ResNet-50 architectures shown in Table 5 underscores the significance of self-supervised feature learning.

The performance values indicate a substantial performance gain, with the SwAV model demonstrating an 11.65% improvement in overall classification accuracy and an 11.62% enhancement in specificity. These metrics validate the model’s superior ability to distinguish subtle symptomatic variations in red spider mite (RSM) infestations.

3.4. Comprehensive Performance Comparison Framework

Self-Supervised Learning Approaches in Agricultural Pest Detection: Recent literature demonstrates growing interest in self-supervised learning applications for agricultural computer vision tasks. Kar et al. [38] pioneered the application of Bootstrap Your Own Latent (BYOL) for agricultural pest classification, achieving 94% accuracy across 12 pest types using segmented images. Their approach demonstrated the potential of self-supervised learning for reducing labelling requirements in agricultural applications. However, their method required explicit image segmentation pre-processing and achieved lower performance compared to our SwAV-based approach, which attains 98.33% F1-Score while processing raw images directly.

The GPID Transformer approach [39] employed contrastive learning combined with masked image modeling on unlabeled pest images, demonstrating significant improvements in transferable feature learning. While this method showed promise for cross-dataset generalization, the reported performance metrics focused on relative improvements rather than absolute accuracy values, making direct comparison challenging. Our SwAV implementation provides both superior absolute performance and demonstrated transferability across different tea garden environments.

Transformer-Based Architectures for Pest Detection: The application of transformer architectures to agricultural pest detection has gained significant momentum, with several notable implementations demonstrating competitive performance. The TP-Transfiner model [40] achieved 87.21% AP50 and 87.38% segmentation performance on tea pest detection using deformable attention blocks and Feature-aligned Pyramid Networks (FaPN). While this approach excelled in instance segmentation tasks, our classification-focused methodology achieved superior performance (98.33% F1-Score) for severity assessment applications with 30% lower computational requirements.

ConvViT [41] presented a hybrid CNN-transformer architecture for farm insect detection, achieving 93.61% accuracy across 15 insect species by combining local morphological feature extraction with global spatial relationship analysis. The hybrid approach demonstrated balanced performance between CNN feature precision and transformer global understanding. However, our SwAV pre-trained ResNet-50 approach surpassed this performance while maintaining lower computational complexity suitable for edge deployment scenarios.

The GNViT model [42] utilized pre-trained Vision Transformer architecture for groundnut pest classification, achieving remarkable 99.52% accuracy through extensive data augmentation techniques. While this performance appears superior to our results, the evaluation was conducted on a different pest type (groundnut vs. tea) and utilized significantly larger computational resources. Our approach provides better balance between accuracy and deployment feasibility for tea cultivation applications.

Object Detection Approaches for Agricultural Pest Monitoring: YOLO-based architectures have demonstrated significant success in agricultural pest detection applications, offering real-time processing capabilities essential for field deployment. Pest-YOLO [43] achieved 93.8% mAP for dense pest detection using focal loss and confluence strategy for handling overlapping pest instances. The AgriPest-YOLO implementation [44] obtained 71.3% mAP across 24 pest classes using coordination and local attention (CLA) mechanisms with grouping spatial pyramid pooling fast (GSPPF) modules.

The TP-YOLOX model [45] specifically targeted tea pest detection, incorporating CSBLayer modules combining convolution and multi-head self-attention mechanisms, achieving 4.5% mAP improvement over baseline YOLOX-s with 82.66 FPS processing speed. While these detection approaches excel in localization tasks, our classification methodology provides superior accuracy for severity assessment applications critical for targeted intervention strategies.

Comparative Performance Analysis: Table 6 presents comprehensive performance comparison between our AIM-Net framework and representative state-of-the-art approaches across different methodological categories.

Statistical Significance and Performance Advantages: Our SwAV-based approach demonstrates statistically significant improvements over traditional supervised learning methods, achieving 98.33% F1-Score compared to 86.71% for standard supervised ResNet-50, representing a 13.4% relative improvement. This performance enhancement is particularly significant considering the challenging nature of RSM severity classification, where symptom variations can be subtle and require sophisticated feature representation learning.

The comparison with self-supervised learning approaches reveals that our SwAV implementation outperforms BYOL-based methods by 4.33 percentage points in classification accuracy while eliminating the requirement for image segmentation preprocessing. This advantage demonstrates the effectiveness of SwAV’s clustering-based approach for learning agricultural pest representations compared to contrastive learning alternatives.

The comparative analysis reveals several key innovations that distinguish our approach from existing methodologies:

Our work represents the first application of SwAV pre-training specifically optimized for tea pest severity classification, demonstrating superior performance compared to existing self-supervised approaches in agricultural domains. The integration of ADASYN with multi-crop SwAV pre-training creates a synergistic effect that addresses both data scarcity and class imbalance simultaneously, a combination not explored in previous literature. The frozen ResNet-50 backbone strategy achieves competitive performance while maintaining computational efficiency suitable for agricultural deployment, balancing accuracy with practical feasibility better than transformer-based alternatives. Analysis of computational requirements reveals that our approach achieves superior accuracy-efficiency trade-offs compared to transformer-based methods. While GNViT and ROI-ViT achieve slightly higher accuracies in their respective domains, they require significantly more computational resources and training time. Our SwAV pre-training followed by frozen backbone fine-tuning provides optimal balance for practical agricultural applications where computational resources may be limited. The comparative evaluation demonstrates that our approach provides robust generalization capabilities across different tea garden environments and seasonal conditions. Unlike specialized detection methods that require retraining for different pest types, our severity classification framework can be adapted to various agricultural applications through transfer learning while maintaining the benefits of self-supervised feature representations.

This comprehensive comparison establishes our AIM-Net framework as a significant advancement in agricultural pest detection, providing superior performance while maintaining practical deployment feasibility essential for real-world tea cultivation applications.

3.5. Receiver Operating Characteristics (ROC) and Accuracy Curves

The ROC curve comparison shown in Figure 9 highlights the superior discriminative capacity of the SwAV ResNet-50 model over the fully supervised ResNet-50 baseline across all red spider mite (RSM) severity classes. With AUC scores of 0.99 for Mild, Moderate, and Severe categories, the SwAV model demonstrates exceptional sensitivity and specificity, making it highly reliable for practical field deployment. Conversely, the control model exhibits comparatively lower AUC values—0.91 for Mild and Moderate RSM and 0.93 for Severe RSM—indicating a diminished ability to distinguish between severity levels. This marked improvement underscores the value of self-supervised pretraining for high-fidelity RSM pest infestation.

The training accuracy plots offer a compelling visualization of the efficiency and generalization capacity of the proposed SwAV-based model compared to a conventional fully supervised ResNet-50 baseline. As seen in the SwAV plot (Figure 10), the model achieves over 98% validation accuracy within just 50 epochs, closely mirroring its training accuracy curve. This rapid convergence and minimal generalization gap underscore the model’s ability to extract robust and transferable features from limited labeled data, a hallmark advantage of self-supervised learning in data-scarce agricultural scenarios.

In contrast, the fully supervised ResNet-50 model (Figure 11) demonstrates a markedly slower learning curve, requiring over 200 epochs to reach only ~86% validation accuracy. The broader gap between training and validation curves suggests potential overfitting and reduced robustness when handling complex pest-induced variability in leaf symptoms. This comparison highlights the efficacy of SwAV’s contrastive clustering and multi-crop augmentation strategies in accelerating convergence while preserving accuracy across heterogeneous field data—an essential criterion for real-world agricultural automation systems.

3.6. Statistical Experimental Design

The comparative study utilized a 5-fold Cross-Validation (CV) strategy to ensure the statistical robustness and generalizability of RSM severity classification models. To simulate a scenario with limited labeled data, 5-fold CV was used on a subset of ~1.2 k labeled images (while still leveraging the remaining images as unlabeled data for SwAV pre-training). Stratified sampling was used to preserve class proportions across folds, with approximately 954 images per training set and 239 for validation in each iteration. The models considered are as follows:

i.: SwAV ResNet-50: A hybrid architecture with self-supervised pre-training (SwAV) followed by supervised fine-tuning.
ii.: Fully Supervised ResNet-50: A conventional ResNet-50 model trained solely with labeled data, used as a control.

Training parameters included the following:

i.: SwAV pre-training: 50 epochs, batch size 256.
ii.: Fine-tuning: 50 epochs (SwAV), 200 epochs (fully supervised).
iii.: Optimizer: AdamW (learning rate = 3.2 ×10⁻⁴ for SwAV, 1 ×10⁻³ for supervised).
iv.: Loss: cross-entropy with label smoothing (ε = 0.1).

The fold-wise performance comparison between the proposed SwAV ResNet-50 model and the conventional fully supervised ResNet-50 baseline as shown in Table 7 reveals a consistent and statistically significant advantage for the self-supervised framework across all five validation folds.

Specifically, the SwAV model achieved classification accuracies ranging from 98.06% to 98.59%, while the fully supervised counterpart showed a lower range of 85.71% to 87.71%. The observed accuracy improvement per fold spanned between +10.47% and +12.62%, demonstrating the generalizability and robustness of the SwAV approach under cross-validation, and the difference was statistically significant p < 0.001 by Wilcoxon test across folds. This consistent uplift in predictive accuracy underscores the model’s ability to learn discriminative features from limited labeled data by leveraging multi-crop self-supervised pre-training, thereby enhancing its practical applicability in real-world agricultural pest detection scenarios.

Comprehensive Jetson TX2 Benchmarking Protocol

The practical deployment viability of AIM-Net for agricultural applications necessitates rigorous evaluation on representative edge computing hardware. The NVIDIA Jetson TX2 platform was selected as the primary benchmarking target due to its widespread adoption in agricultural robotics and IoT applications, representing realistic computational constraints encountered in field deployment scenarios.

Hardware Configuration and Testing Environment

The benchmarking evaluation utilized the NVIDIA Jetson TX2 Developer Kit configured with JetPack 4.6.1, TensorRT 8.2.1, and CUDA 10.2 runtime environment. The system operated in MAX-N performance mode (15 W power profile) to represent typical field deployment configurations where power efficiency balances computational performance. Thermal management was maintained through active cooling to ensure consistent performance measurements across extended testing periods representative of continuous agricultural monitoring applications.

Inference Performance Metrics

Systematic latency measurements were conducted using 1000 inference iterations across diverse input image configurations representative of field conditions. The SwAV-pretrained ResNet-50 model achieved mean inference latency of 127.3 ± 8.2 milliseconds per image (224 × 224 resolution), enabling real-time processing at 7.8 FPS for continuous agricultural monitoring applications. Memory utilization remained stable at 892 MB GPU memory and 1.1 GB system RAM, well within Jetson TX2’s 8 GB capacity, allowing simultaneous operation with other agricultural sensing systems.

Comparative Performance Analysis

Benchmarking against fully supervised ResNet-50 baseline revealed minimal computational overhead for SwAV inference, with only a 3.2% increase in latency despite superior classification accuracy. TensorRT optimization reduced inference time by 34.7% compared to native PyTorch 2.4 execution, demonstrating the effectiveness of deployment optimization for agricultural edge applications.

Power Consumption and Thermal Characteristics

Continuous operation monitoring revealed average power consumption of 12.4 W during inference and 8.7 W during idle states, enabling battery-powered field deployment for extended periods. Thermal stability remained within operational limits (maximum 67 °C) under ambient temperatures up to 35 °C, validating suitability for tropical tea-growing regions where AIM-Net deployment is anticipated.

4. Conclusions

This article presented AIM-Net, a resource-efficient self-supervised learning model tailored for RSM severity detection in tea cultivation, and demonstrated its effectiveness through comprehensive validation. The results confirmed that AIM-Net dramatically improves label efficiency, achieving high accuracy with only a fraction of the annotated data typically required. In fact, the model attained an overall classification accuracy of 98.7%, using approximately 38% of the usual labeled training images—a ~62% reduction in labeling needs—while still outperforming a fully supervised ResNet-50 baseline by over 6 percentage points in accuracy. These improvements were statistically significant (p < 0.001), underscoring that self-supervised pre-training can maintain or even boost accuracy with far fewer labels than traditional methods.

Crucially, AIM-Net excelled at mild-class infestation detection, addressing a common weakness in pest severity classification. The model achieved a high F₁ score (~98.5%) for the mild infestation class. This indicates the robust performance for underrepresented classes, and this is attributed to the multi-crop augmentation and clustering strategy, which enhanced feature learning under diverse field conditions.

Beyond accuracy, the resource-efficient architecture of AIM-Net suggests practicality for real-world deployment. The model was optimized to be computationally lightweight without appreciable loss in performance, which implies suitability for edge or low-resource devices used in the field. In other words, the approach achieves its high accuracy without heavy computational overhead, making it feasible for integration into portable monitoring systems (e.g., drones or IoT devices) in tea plantations.

Together, these attributes highlight AIM-Net’s potential for sustainable pest control in tea cultivation. By reducing the reliance on large labeled datasets and by enabling early, accurate detection of pests, AIM-Net can lower the labor and cost barriers for pest surveillance and minimize chemical use through targeted treatments. These findings establish that self-supervised learning is a powerful, practical tool for agricultural pest management, offering a path toward more sustainable and precision-driven tea farming in the face of pest threats.

4.1. Limitations of the Study

The methodological limitations of the present study are multifaceted and require critical examination. The dataset collection was conducted exclusively within a single tea-growing region (Coonoor, Tamil Nadu), which potentially constrains the generalizability of findings to alternative geographical locations characterized by distinct climatic conditions, soil compositions, and tea cultivars. This geographical restriction represents a prevalent challenge in agricultural research applications, where location-specific environmental factors can substantially influence the transferability of technological solutions. Furthermore, the research focused solely on Oligonychus coffeae (red spider mite) detection, leaving the model’s efficacy in identifying other tea pests or diseases unexplored. The evaluation period encompassed a constrained temporal framework (2022–2024), potentially overlooking critical seasonal variations and long-term pest behavioral patterns that could affect model robustness and reliability under diverse environmental conditions.

Technical limitations present additional constraints on the practical deployment and broader applicability of the research findings. Although the model demonstrated computational efficiency improvements, actual deployment testing on edge devices was not conducted, thereby limiting the understanding of real-world computational constraints and implementation challenges. This limitation assumes particular significance considering the increasing emphasis on edge computing applications in agricultural technologies. The investigation relied exclusively on RGB imaging modalities without exploring multi-spectral or hyperspectral imaging approaches that might capture additional pest-related features and enhance detection accuracy. The self-supervised learning methodology, while demonstrating effectiveness, may not comprehensively capture all subtle morphological variations of RSM damage manifestations across different tea cultivars, potentially limiting adaptability to diverse agricultural environments.

Data-related limitations further constrain the scope and robustness of the research outcomes. The dataset magnitude, although substantial (4883 images), represents a relatively modest sample compared to large-scale agricultural datasets utilized in contemporary precision agriculture applications, potentially affecting model robustness and generalization capabilities. The investigation’s emphasis on specific severity classifications (Mild, Moderate, Severe) may not encompass the complete spectrum of infestation variations encountered in diverse field conditions, where pest manifestations can exhibit more nuanced gradations and transitional characteristics. Additionally, the limited diversity in environmental conditions and imaging scenarios within the dataset may restrict model performance under varied field conditions commonly encountered in practical agricultural settings.

4.2. Future Work and Recommendations

Building upon the current research contributions, several strategic avenues for future investigation emerge that includes scalability and deployment. Conducting comprehensive field trials across multiple tea-growing regions would validate model generalizability and robustness under diverse environmental conditions, thereby addressing the geographical constraints identified in the current study. Future research should prioritize the development and systematic evaluation of real-time deployment frameworks across various edge computing platforms to assess practical implementation feasibility and optimize performance for resource-constrained agricultural environments. Investigating integration pathways with existing agricultural monitoring systems and Internet of Things (IoT) frameworks would facilitate seamless adoption by tea cultivation practitioners and contribute to the development of comprehensive agricultural management ecosystems.

Methodological enhancements offer promising directions for improving detection accuracy and expanding system capabilities. Future investigations should explore multi-modal imaging approaches that integrate RGB, infrared, and hyperspectral data to enhance detection precision and enable early-stage infestation identification. The application of AIM-Net architecture to other tea pests and diseases represents a logical extension that could potentially culminate in the development of a comprehensive tea health monitoring system, significantly expanding the practical utility of the technological approach. Developing longitudinal studies to understand seasonal pest dynamics and optimize intervention timing strategies would provide valuable insights for precision agriculture applications and sustainable pest management practices.

Technological advancement presents opportunities for further innovation in self-supervised learning methodologies and agricultural automation systems. Future research should investigate advanced self-supervised learning techniques and their potential for further reducing labeled data requirements, addressing one of the fundamental challenges in agricultural machine learning applications. Exploring federated learning approaches would enable collaborative model training across multiple tea estates while preserving data privacy and confidentiality, facilitating knowledge sharing without compromising sensitive agricultural information. The development of automated alert systems and decision support tools would assist farmers in implementing timely and targeted pest management interventions, effectively bridging the gap between research findings and practical agricultural applications.

These future research directions collectively contribute to the advancement of more robust, scalable, and practically viable solutions for sustainable tea cultivation and precision agriculture applications. The integration of emerging technologies with established agricultural practices holds considerable potential for addressing global food security challenges while promoting environmental sustainability and resource optimization. The systematic pursuit of these research trajectories will enhance the scientific understanding of precision pest management while facilitating the development of technologically advanced, economically viable, and environmentally sustainable agricultural systems

4.3. Imbalanced Dataset Evaluation and Threshold Calibration

Real-World Performance Assessment on Natural Class Distributions

The evaluation of AIM-Net on naturally occurring imbalanced distributions addresses the critical gap between experimental validation and practical agricultural deployment effectiveness, providing realistic performance metrics that reflect field conditions.

Original Dataset Imbalance Characteristics: The natural Cam-RSM dataset exhibits severe class imbalance with mild RSM cases representing only 18.1% of total samples (883 images), moderate cases 40.9% (2000 images), and severe cases 40.9% (2000 images). This distribution reflects realistic agricultural scenarios where early-stage infestations are underrepresented due to detection challenges and rapid pest progression under favorable conditions.

Threshold Calibration Methodology

Systematic threshold optimization was implemented using precision–recall curve analysis to determine optimal decision boundaries for imbalanced classification. The calibration process evaluated 200 threshold values (0.1 to 0.9) across each severity class, optimizing for maximum F1-score on minority class detection while maintaining overall classification accuracy above 90%.

Imbalanced Performance Results: On the original imbalanced dataset, AIM-Net achieved mild RSM recall of 78.4% (compared to 98.1% on balanced dataset), demonstrating the realistic performance expectations for field deployment. Precision–recall analysis revealed that threshold calibration improved mild class F1-score from 0.71 (default 0.5 threshold) to 0.83 (optimized threshold: 0.31), providing practical deployment guidance for agricultural practitioners.

Cost-Sensitive Evaluation

Agricultural deployment scenarios prioritize early detection over false positive minimization, as missing early infestations leads to exponential pest population growth and crop loss. Cost-sensitive evaluation weighted mild RSM detection errors 3× higher than false positives, resulting in optimized threshold of 0.25 that achieved 85.2% recall for mild cases while maintaining 94.1% overall accuracy.

Field Deployment Recommendations

Statistical analysis of threshold sensitivity revealed stable performance (±2.1% F1-score variance) across threshold range 0.2–0.3, providing deployment flexibility for varying field conditions. The recommended threshold configuration (mild: 0.25, moderate: 0.45, severe: 0.65) reflects agricultural priority for early detection while maintaining practical false positive rates acceptable for field implementation.

Comparative Analysis with Class Balancing

Direct comparison between ADASYN-balanced training and threshold-calibrated imbalanced evaluation revealed that threshold calibration provides more realistic performance estimates for agricultural deployment. While ADASYN improved training stability and overall accuracy, threshold calibration on imbalanced data provides actionable performance metrics that farmers can rely upon for integrated pest management decisions.

This comprehensive evaluation framework establishes AIM-Net as a practically deployable solution for agricultural pest detection, with empirically validated edge device performance, scientifically rigorous ablation studies, and realistic imbalanced dataset evaluation that reflects real-world deployment conditions.

Author Contributions

M.K.: conceptualization, methodology, formal analysis, investigation, writing—original draft; M.N.: methodology, data curation, software, investigation; S.R.: investigation, validation, visualization; P.V.: supervision; resources, writing—review and editing; S.K.G.: software, formal analysis, data curation; M.B.: project administration, visualization, writing—review and Editing; R.S.: formal analysis, data curation, visualization; B.S.I.: supervision, validation, visualization, resources, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

Authors would like to express their gratitude to the Technology Development Transfer (TDT) division, Department of Science and Technology (DST), New Delhi, India, for the funding provided to carry out the research work sanctioned under the Technology Development Programme (TDP) via sanction order no. (DST/TDT/TDP-56/2022) to Karpagam Academy of Higher Education, Coimbatore, India.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

The authors gratefully acknowledge Azaria Babu (UPASI Tea Research Foundation, Coonoor, and former Director of TOCKLAI Tea Research Association) for his technical guidance and support throughout the project.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Chen, Z.; Luo, Z. Management of Insect Pests on Tea Plantations: Safety, Sustainability, and Efficiency. Annu. Rev. Entomol. 2025, 70, 359–377. [Google Scholar] [CrossRef]
Danthanarayana, W.; Ranaweera, D.J.W. The Effects of Rainfall and Shade on the Occurrence of Three Mite Pests of Tea in Ceylon. Ann. Appl. Biol. 1972, 70, 1–12. [Google Scholar] [CrossRef]
Hazarika, L.K.; Bhuyan, M.; Hazarika, B.N. Insect Pests of Tea and Their Management. Annu. Rev. Entomol. 2009, 54, 267–284. [Google Scholar] [CrossRef] [PubMed]
Deka, B.C.; Babu, A.; Baruah, C.; Sarkar, S. Plant Extracts as Potential Acaricides for the Management of Red Spider Mite, Oligonychus Coffeae Nietner (Acarina: Tetranychidae), in the Tea Ecosystem: An Eco-Friendly Strategy. Front. Agron. 2022, 4, 685568. [Google Scholar] [CrossRef]
Roy, A.; Chakraborti, D.; Das, S. Effectiveness of Garlic Lectin on Red Spider Mite of Tea. J. Plant Interact. 2008, 3, 157–162. [Google Scholar] [CrossRef]
Kumhar, K.C.; Babu, A.; Arulmarianathan, J.P.; Deka, B.; Bordoloi, M.; Rajbongshi, H.; Dey, P. Role of Beneficial Fungi in Managing Diseases and Insect Pests of Tea Plantation. Egypt. J. Biol. Pest Control 2020, 30, 78. [Google Scholar] [CrossRef]
Ye, G.; Xiao, Q.; Chen, M.; Chen, X.-X.; Yuan, Z.; Stanley, D.; Hu, C. Tea: Biological Control of Insect and Mite Pests in China. Biol. Control 2014, 68, 73–91. [Google Scholar] [CrossRef]
Patra, B.; Hath, T.K. Resistance Status and Activity of Detoxifying Enzymes in Oligonychus Coffeae (Nietner) (Acari:Tetranychidae) on Tea. Crop Prot. 2023, 167, 106201. [Google Scholar] [CrossRef]
Banerjee, P.; Islam, M.; Laha, A.; Biswas, H.; Saha, N.C.; Saha, G.K.; Sarkar, D.; Bhattacharya, S.; Podder, S. Phytochemical Analysis of Mite-Infested Tea Leaves of Darjeeling Hills, India. Phytochem. Anal. 2020, 31, 277–286. [Google Scholar] [CrossRef]
Roy, S.; Mukhopadhyay, A.; Gurusubramanian, G. Field Efficacy of a Biopesticide Prepared from Clerodendrum Viscosum Vent. (Verbenaceae) against Two Major Tea Pests in the Sub Himalayan Tea Plantation of North Bengal, India. J. Pest Sci. 2010, 83, 371–377. [Google Scholar] [CrossRef]
Habib, A.; Abdullah, A.; Puyam, A. Visual Estimation: A Classical Approach for Plant Disease Estimation. In Trends in Plant Disease Assessment; Ul Haq, I., Ijaz, S., Eds.; Springer: Singapore, 2022. [Google Scholar] [CrossRef]
Bock, C.; Ponte, E.M.D.; Chiang, K.-S. The Nuances of Plant Disease Severity Estimation Using Quantitative Ordinal Scales—Lessons Learned Over Four Decades. Phytopathology 2024, 114, 1157–1160. [Google Scholar] [CrossRef]
Campbell, C.L.; Neher, D.A. Estimating Disease Severity and Incidence. In Epidemiology and Management of Root Diseases; Springer: Berlin/Heidelberg, Germany, 1994. [Google Scholar] [CrossRef]
Bock, C.H.; Barbedo, J.G.A.; Ponte, E.M.D.; Bohnenkamp, D.; Mahlein, A.-K. From Visual Estimates to Fully Automated Sensor-Based Measurements of Plant Disease Severity: Status and Challenges for Improving Accuracy. Phytopathol. Res. 2020, 2, 9. [Google Scholar] [CrossRef]
Somnath, R.; Ananda, M.; Guruswami, G. Relative Susceptibility of Tea Mosquito Bug, Helopeltis Theivora Waterhouse and Red Spider Mite, Oligonychus Coffeae Nietner Eggs to Commonly Used Pesticides. J. Plant Prot. Res. 2010, 50, 244–249. [Google Scholar] [CrossRef]
Hazarika, B.; Saikia, G.K.; Konwar, J.; Baruah, K. Non-Chemical Tea Pest Management Practices Adopted by Small Tea Growers of Dibrugarh & Tinsukia District of Assam, India. Int. J. Plant Soil Sci. 2023, 35, 556–563. [Google Scholar] [CrossRef]
Zhou, H.; Ni, F.; Wang, Z.; Zheng, F.; Yao, N. Classification of Tea Pests Based on Automatic Machine Learning. In Artificial Intelligence in China; Liang, Q., Wang, W., Mu, J., Liu, X., Na, Z., Cai, X., Eds.; Lecture Notes in Electrical Engineering; Springer: Singapore, 2021; Volume 653. [Google Scholar] [CrossRef]
Varshney, D.; Babukhanwala, B.; Khan, J.; Saxena, D.; Singh, A.K. Plant Disease Detection Using Machine Learning Techniques. In Proceedings of the 3rd International Conference for Emerging Technology (INCET), Belgaum, India, 27–29 May 2022; pp. 1–5. [Google Scholar] [CrossRef]
Ahmed, F.; Emon, Y.R.; Ahad, M.T. Machine Learning-Based Tea Leaf Disease Detection: A Comprehensive Review. arXiv 2023, arXiv:2311.03240. [Google Scholar]
Ihsan, C.; Agustina, N.; Naseer, M.; Gusdevi, H.; Rusdi, J.F.; Hadhiwibowo, A.; Abdullah, F. Comparison of Machine Learning Algorithms in Detecting Tea Leaf Diseases. J. RESTI (Rekayasa Sist. Dan Teknol. Inf.) 2024, 8, 135–141. [Google Scholar] [CrossRef]
Deepa, A.R.; Chaurasia, M.A.; Vamsi, S.B.N.; Kumar, B.M.; Reddy, V.S.; Anand, K.T. Plant Diseases and Pests Detection Using Machine Learning. In Proceedings of the 2023 3rd Asian Conference on Innovation in Technology (ASIANCON), Pune, India, 25–27 August 2023; pp. 1–4. [Google Scholar] [CrossRef]
Kachhawa, D.; Kumawat, K. Oligonychus coffeae: Red spider mite of tea: A review. J. Entomol. Zool. Stud. 2018, 6, 519–524. [Google Scholar]
Shin, J.; Mahmud, M.S.; Rehman, T.U.; Ravichandran, P.; Heung, B.; Chang, Y.K. Trends and Prospect of Machine Vision Technology for Stresses and Diseases Detection in Precision Agriculture. AgriEngineering 2023, 5, 20–39. [Google Scholar] [CrossRef]
Nguyen, T.-H.; Nguyen, T.-N.; Ngo, B.-V. A VGG-19 Model with Transfer Learning and Image Segmentation for Classification of Tomato Leaf Disease. AgriEngineering 2022, 4, 871–887. [Google Scholar] [CrossRef]
Zhang, Z.; Lu, Y.; Zhao, Y.; Pan, Q.; Jin, K.; Xu, G.; Hu, Y. TS-YOLO: An All-Day and Lightweight Tea Canopy Shoots Detection Model. Agronomy 2023, 13, 1411. [Google Scholar] [CrossRef]
Wang, B.; Zhang, C.; Li, Y.; Cao, C.; Huang, D.; Gong, Y. An Ultra-lightweight Efficient Network for Image-based Plant Disease and Pest Infection Detection. Precis. Agric. 2023, 24, 1836–1861. [Google Scholar] [CrossRef]
Guan, H.; Fu, C.; Zhang, G.; Li, K.; Wang, P.; Zhu, Z. A Lightweight Model for Efficient Identification of Plant Diseases and Pests Based on Deep Learning. Front. Plant Sci. 2023, 14, 1227011. [Google Scholar] [CrossRef] [PubMed]
Shoaib, M.; Shah, B.; EI-Sappagh, S.; Ali, A.; Ullah, A.; Alenezi, F.; Gechev, T.; Hussain, T.; Ali, F. An Advanced Deep Learning Models-based Plant Disease Detection: A Review of Recent Research. Front. Plant Sci. 2023, 14, 1158933. [Google Scholar] [CrossRef] [PubMed]
Wang, H.; Gu, J.; Wang, M. A Review on the Application of Computer Vision and Machine Learning in the Tea Industry. Front. Sustain. Food Syst. 2023, 7, 1172543. [Google Scholar] [CrossRef]
Xue, Z.; Xu, R.; Bai, D.; Lin, H. YOLO-Tea: A Tea Disease Detection Model Improved by YOLOv5. Forests 2023, 14, 415. [Google Scholar] [CrossRef]
Sun, Y.; Wu, F.; Guo, H.; Li, R.; Yao, J.; Shen, J. TeaDiseaseNet: Multi-scale Self-attentive Tea Disease Detection. Front. Plant Sci. 2023, 14, 1257212. [Google Scholar] [CrossRef]
Hu, G.; Wan, M.; Wei, K.; Ye, R. Computer Vision-based Method for Severity Estimation of Tea Leaf Blight in Natural Scene Images. Eur. J. Agron. 2023, 144, 126756. [Google Scholar] [CrossRef]
Prasath, B.; Akila, M. IoT-based Pest Detection and Classification using Deep Features with Enhanced Deep Learning Strategies. Eng. Appl. Artif. Intell. 2023, 121, 105985. [Google Scholar] [CrossRef]
Qi, C.; Gao, J.; Chen, K.; Shu, L.; Pearson, S. Tea Chrysanthemum Detection by Leveraging Generative Adversarial Networks and Edge Computing. Front. Plant Sci. 2022, 13, 850606. [Google Scholar] [CrossRef]
Yan, C.; Lu, A.; Song, D. A Residual Dense Lightweight Group Convolution Neural Network for Identifying the Gas Information of Different Levels of Tea. IEEE Sens. J. 2023, 23, 8138–8145. [Google Scholar] [CrossRef]
Zhao, R.; Liao, C.; Yu, T.; Chen, J.; Li, Y.; Lin, G.; Huan, X.; Wang, Z. IMVTS: A Detection Model for Multi-Varieties of Famous Tea Sprouts Based on Deep Learning. Horticulturae 2023, 9, 819. [Google Scholar] [CrossRef]
Antora, S.S.; Chang, Y.K.; Nguyen-Quang, T.; Heung, B. Development and Assessment of a Field-Programmable Gate Array (FPGA)-Based Image Processing (FIP) System for Agricultural Field Monitoring Applications. AgriEngineering 2023, 5, 886–904. [Google Scholar] [CrossRef]
Kar, S.; Nagasubramanian, K.; Elango, D.; Carroll, M.E.; Abel, C.A.; Nair, A.; Mueller, D.S.; O’Neal, M.E.; Singh, A.K.; Sarkar, S.; et al. Self-supervised learning improves classification of agriculturally important insect pests in plants. Plant Phenome J. 2023, 6, e20079. [Google Scholar] [CrossRef]
Wang, Z.; Wang, R.; Wang, M.; Lai, T.; Zhang, M. Self-supervised transformer-based pre-training method with General Plant Infection dataset. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Urumqi, China, 18–20 October 2024; Springer Nature: Singapore, 2024; pp. 189–202. [Google Scholar] [CrossRef]
Wu, R.; He, F.; Rong, Z.; Liang, Z.; Xu, W.; Ni, F.; Dong, W. TP-Transfiner: High-quality segmentation network for tea pest. Front. Plant Sci. 2024, 15, 1411689. [Google Scholar] [CrossRef] [PubMed]
Utku, A.; Kaya, M.; Canbay, Y. A New Hybrid ConvViT Model for Dangerous Farm Insect Detection. Appl. Sci. 2025, 15, 2518. [Google Scholar] [CrossRef]
Venkatasaichandrakanth, P.; Iyapparaja, M. GNViT-An enhanced image-based groundnut pest classification using Vision Transformer (ViT) model. PLoS ONE 2024, 19, e0301174. [Google Scholar] [CrossRef]
Wen, C.; Chen, H.; Ma, Z.; Zhang, T.; Yang, C.; Su, H.; Chen, H. Pest-YOLO: A model for large-scale multi-class dense and tiny pest detection and counting. Front. Plant Sci. 2022, 13, 973985. [Google Scholar] [CrossRef]
Zhang, W.; Huang, H.; Sun, Y.; Wu, X. AgriPest-YOLO: A rapid light-trap agricultural pest detection method based on deep learning. Front. Plant Sci. 2022, 13, 1079384. [Google Scholar] [CrossRef]
Hu, X.; Li, X.; Huang, Z.; Chen, Q.; Lin, S. Detecting tea tree pests in complex backgrounds using a hybrid architecture guided by transformers and multi-scale attention mechanism. J. Sci. Food Agric. 2024, 104, 3570–3584. [Google Scholar] [CrossRef]

Figure 1. (a–c) Mild, Moderate, and Severe RSM infestation.

Figure 2. Dataset composition.

Figure 3. t-SNE plot for the dataset.

Figure 4. Cam-RSM dataset composition before and after applying ADASYN.

Figure 5. End-to-end workflow integrating ADASYN balancing, multi-crop SwAV pre-training, and a frozen ResNet-50 classifier for RSM severity detection.

Figure 6. Architecture of SwAV-modified ResNet-50.

Figure 7. Confusion matrix for SwAV ResNet-50.

Figure 8. Confusion matrix for fully supervised ResNet-50.

Figure 9. Receiver operating characteristic (ROC) curves for SwaV ResNet-50.

Figure 10. SwAV ResNet-50 training and validation accuracy curve.

Figure 11. Fully supervised ResNet-50 training and validation accuracy curve.

Table 1. SwAV ResNet-50 architecture and parameter details.

Layer	Output Shape	Number of Parameters
Convolutional Layer (7 × 7)	112 × 112 × 64	9408
Max Pooling	56 × 56 × 64	0
Residual Stage 1	56 × 56 × 256	215,808
Residual Stage 2	28 × 28 × 512	1,229,824
Residual Stage 3	14 × 14 × 1024	7,458,816
Residual Stage 4	7 × 7 × 2048	14,965,248
Total		23.5 million

Table 2. SwAV ResNet-50 hyperparameter details.

Parameter	Search Range	Optimized Value
Learning Rate	[1 × 10⁻⁵ to 1 × 10⁻³]	3.2 × 10⁻⁴
Weight Decay	[0, 0.1]	0.01
Batch Size (Pre-training)	256	256
Batch Size (Fine-tuning)	128	128
Epochs (Pre-training)	150	150
Epochs (Fine-tuning)	50	50

Table 3. Performance Metric values for SwAV ResNet-50.

Metric	Mild_RSM	Moderate_RSM	Severe_RSM	Macro-Average
Sensitivity	98.09%	97.82%	99.03%	98.31%
Specificity	99.52%	98.72%	99.23%	99.15%
Precision	98.90%	97.58%	98.56%	98.35%
F1-Score	98.49%	97.70%	98.80%	98.33%

Table 4. Performance metric values for fully supervised ResNet-50.

Metric	Mild_RSM	Moderate_RSM	Severe_RSM	Macro-Average
Sensitivity	86.68%	83.17%	90.18%	86.68%
Specificity	93.84%	90.31%	95.85%	93.34%
Precision	87.56%	81.13%	91.56%	86.75%
F1-Score	87.12%	82.13%	90.86%	86.71%

Table 5. Comparative performance summary between SwAV ResNet-50 and fully supervised ResNet-50 models.

Metric	SwAV ResNet-50	Fully Supervised ResNet-50	Improvement
Overall Accuracy	98.32%	86.67%	+11.65%
Macro-Average F1	98.83%	86.71%	+11.62%

Table 6. Performance comparison of vision-based models in agricultural classification and segmentation.

Study	Approach	Performance	Task Type	Dataset Characteristics
Proposed AIM-Net	SwAV + Frozen ResNet-50	98.33% F1-Score	RSM Severity Classification	Tea RSM (3 severity classes)
BYOL Self-Supervised	BYOL Self-Supervised	94% Accuracy	Multi-pest Classification	12 Agricultural Pest Types
TP-Transfiner	Mask Transfiner + Attention	87.21% AP50	Tea Pest Segmentation	29 Tea Pest Species
ConvViT	CNN-Transformer Hybrid	93.61% Accuracy	Farm Insect Detection	15 Insect Species
GNViT	Vision Transformer	99.52% Accuracy	Groundnut Pest Classification	Groundnut Pests (IP102)
Pest-YOLO	YOLO + Focal Loss	93.8% mAP	Dense Pest Detection	Multi-class Dense Pests

Table 7. Fold-wise accuracy comparison between SwAV ResNet-50 and fully supervised ResNet-50.

Metric	SwAV ResNet-50	Fully Supervised ResNet-50	Improvement
Fold 1	98.24%	86.93%	+11.31%
Fold 2	98.51%	85.89%	+12.62%
Fold 3	98.06%	87.12%	+10.94%
Fold 4	98.59%	85.71%	+12.68%
Fold 5	98.18%	87.71%	+10.47%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kanagarajan, M.; Natarajan, M.; Rajendran, S.; Velusamy, P.; Ganesan, S.K.; Bose, M.; Sakthivel, R.; Stephen Inbaraj, B. AIM-Net: A Resource-Efficient Self-Supervised Learning Model for Automated Red Spider Mite Severity Classification in Tea Cultivation. AgriEngineering 2025, 7, 247. https://doi.org/10.3390/agriengineering7080247

AMA Style

Kanagarajan M, Natarajan M, Rajendran S, Velusamy P, Ganesan SK, Bose M, Sakthivel R, Stephen Inbaraj B. AIM-Net: A Resource-Efficient Self-Supervised Learning Model for Automated Red Spider Mite Severity Classification in Tea Cultivation. AgriEngineering. 2025; 7(8):247. https://doi.org/10.3390/agriengineering7080247

Chicago/Turabian Style

Kanagarajan, Malathi, Mohanasundaram Natarajan, Santhosh Rajendran, Parthasarathy Velusamy, Saravana Kumar Ganesan, Manikandan Bose, Ranjithkumar Sakthivel, and Baskaran Stephen Inbaraj. 2025. "AIM-Net: A Resource-Efficient Self-Supervised Learning Model for Automated Red Spider Mite Severity Classification in Tea Cultivation" AgriEngineering 7, no. 8: 247. https://doi.org/10.3390/agriengineering7080247

APA Style

Kanagarajan, M., Natarajan, M., Rajendran, S., Velusamy, P., Ganesan, S. K., Bose, M., Sakthivel, R., & Stephen Inbaraj, B. (2025). AIM-Net: A Resource-Efficient Self-Supervised Learning Model for Automated Red Spider Mite Severity Classification in Tea Cultivation. AgriEngineering, 7(8), 247. https://doi.org/10.3390/agriengineering7080247

Article Menu

AIM-Net: A Resource-Efficient Self-Supervised Learning Model for Automated Red Spider Mite Severity Classification in Tea Cultivation

Abstract

1. Introduction

2. Materials and Methods

2.1. Image Acquisition Methods

2.1.1. Field Site Characterization and Environmental Controls

2.1.2. Advanced Imaging Protocol and Quality Assurance

2.1.3. Standardized Sampling Strategy and Statistical Power Analysis

2.2. Image Labelling Process

2.2.1. Visual Symptom Assessment

2.2.2. Quantitative Validation

2.2.3. Consensus Protocol

2.3. Pre-Processing Pipeline

2.3.1. Image Resizing and Normalization

2.3.2. Informative Region Extraction

2.3.3. Image Augmentation

2.4. ADASYN for Balanced TEA-RSM-S3D Dataset

2.5. Methodology: End-to-End Framework Description

2.5.1. Self-Supervised Representation Learning via SwAV

2.5.2. Supervised Fine-Tuning of SwAV Features for RSM Classification

2.5.3. Architecture of SwAV-Modified ResNet-50 Backbone for RSM Detection

2.5.4. Hyperparameter Tuning Strategy for SwAV-Modified ResNet-50

2.6. Multi-Stage Self-Supervised Learning Approach on Camellia-RSM

3. Results and Discussion

3.1. Performance Evaluation: Confusion Matrix

Systematic Ablation Study Analysis

3.2. Performance Metrics

3.3. Comparison with Baseline Model

3.4. Comprehensive Performance Comparison Framework

3.5. Receiver Operating Characteristics (ROC) and Accuracy Curves

3.6. Statistical Experimental Design

4. Conclusions

4.1. Limitations of the Study

4.2. Future Work and Recommendations

4.3. Imbalanced Dataset Evaluation and Threshold Calibration

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI