An Advancing GCT-Inception-ResNet-V3 Model for Arboreal Pest Identification

Abstract: The significance of environmental considerations has been highlighted by the substantial impact of plant pests on ecosystems. Addressing the urgent demand for sophisticated pest management solutions in arboreal environments, this study leverages advanced deep learning technologies to accurately detect and classify common tree pests, such as “mole cricket”, “aphids”, and “Therioaphis maculata (Buckton)”. Through comparative analysis with the baseline ResNet-18 model, this research not only enhances the SE-RegNetY and SE-RegNet models but also introduces innovative frameworks, including the GCT-Inception-ResNet-V3, SE-Inception-ResNet-V3, and SE-Inception-RegNetY-V3 models. Notably, the GCT-Inception-ResNet-V3 model demonstrates exceptional performance, achieving a remarkable average overall accuracy of 94.59%, average kappa coefficient of 91.90%, average mAcc of 94.60%, and average mIoU of 89.80%. These results signify substantial progress over conventional methods, outperforming the baseline model's results by margins of 9.1%, nearly 13.7%, 9.1%, and almost 15% in overall accuracy, kappa coefficient, mAcc, and mIoU, respectively. This study signifies a considerable step forward in blending sustainable agricultural practices with environmental conservation, setting new benchmarks in agricultural pest management. By enhancing the accuracy of pest identification and classification in agriculture, it lays the groundwork for more sustainable and eco-friendly pest control approaches, offering valuable contributions to the future of agricultural protection.


Introduction
The criticality of effective pest management in safeguarding agricultural productivity and environmental health has heightened in recent years [1]. Consequently, understanding and mitigating tree pest threats has become an imperative endeavor. Tree pests, such as mole crickets, aphids, and Therioaphis maculata (Buckton) [2], detrimentally affect crop yields and forest ecosystems, underscoring the urgent need to devise sophisticated detection and management tactics. Trees, meanwhile, are pivotal to both the natural environment and human societies; they are foundational to ecosystems and crucial in sustaining biodiversity [3], climate regulation, soil and water conservation, and the provision of ecological services. Thus, the application of advanced technology to address arboricultural pest issues is imperative [4].
The main contributions of this study are as follows:
1. The introduction of three innovative models, notably GCT-Inception-ResNet-V3, which outperform traditional deep learning models used in pest research [26]. The GCT-Inception-ResNet-V3 model, in particular, exhibits significant improvement over the baseline model. Moreover, the standard deviations of all reported indicators of the proposed model are smaller than 0.001, demonstrating its stability and applicability;
2. Previous tree-specific pest studies have been limited, often focusing on a single tree species, and deep learning-based pest research covering a broader range of trees is lacking. This research targets a wide variety of trees, specifically arborvitae [27,28];
3. While prior research on tree insect pests primarily concentrated on intelligent target detection, there has been little emphasis on their identification and classification [29], which is the focal point of this study;
4. Incorporating the GCT (Gated Channel Transformation) attention mechanism, the GCT-Inception-ResNet-V3 model is demonstrated to be less time-consuming and more efficient than models utilizing the traditional SE (Squeeze-and-Excitation) attention mechanism [26];
5. The proposed model combines an improved feature-extraction algorithm so that it is also applicable to identifying different pests on other types of crops, while the data-preprocessing stage includes image-processing operations such as median filtering and histogram equalization, which help suppress the effects of weather, sunlight, and smoke, the most common interfering conditions in agricultural fields. This study can therefore be applied to a wider range of agricultural crops and pests in the future.
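The GCT gating referenced in contribution 4 can be sketched in plain NumPy. This is a minimal illustration following the published GCT formulation (per-channel L2-norm embedding, cross-channel normalization, residual tanh gate); the function and parameter names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gct(x, alpha, beta, gamma, eps=1e-5):
    """Gated Channel Transformation over a (C, H, W) feature map.

    alpha, beta, gamma are per-channel learnable parameters (shape (C,));
    names and shapes here are illustrative, not the paper's code.
    """
    C = x.shape[0]
    # Embedding: per-channel L2 norm scaled by alpha.
    s = alpha * np.sqrt((x ** 2).sum(axis=(1, 2)) + eps)
    # Normalization: channels compete through an L2 normalization.
    s_hat = s * np.sqrt(C) / np.sqrt((s ** 2).sum() + eps)
    # Gating: residual tanh gate, near identity when gamma = beta = 0.
    gate = 1.0 + np.tanh(gamma * s_hat + beta)
    return x * gate[:, None, None]

x = np.random.default_rng(0).normal(size=(8, 6, 6))
# With gamma = beta = 0 the gate is exactly 1, i.e. an identity mapping,
# which is why GCT can be inserted into a pretrained network safely.
y = gct(x, alpha=np.ones(8), beta=np.zeros(8), gamma=np.zeros(8))
```

The identity-at-initialization property is the design reason GCT adds little training overhead compared with SE-style bottleneck attention.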
The rest of this paper is organized as follows. Section 2 introduces the data and the models used in this study. Section 3 shows the experimental results. Section 4 offers discussions of the effectiveness of the proposed GCT-Inception-ResNet-V3 model. Section 5 gives the conclusions of the paper.

Dataset
The IP102 dataset, a specialized image collection for agricultural pests, is designed to foster the advancement of image recognition technology within this field [30]. Comprising images of 102 prevalent agricultural pests across various categories, such as insects and mites, this dataset underscores the substantial impact these pests have on agricultural productivity, posing significant threats to crop growth and yield. The IP102 dataset serves as a valuable resource for researchers, facilitating the application of machine learning and deep learning methodologies to devise an effective pest recognition and classification system. Such advancements aim to bolster agricultural production through intelligent pest management.
For this research, three arboreal pests from the IP102 dataset, namely mole cricket, aphids, and Therioaphis maculata (Buckton), were chosen for investigation due to their pronounced relevance to environmental health. These pests are notorious in agricultural contexts, directly endangering crop vitality and yield. The dataset encompasses 1905 images of these pests, forming the training set (635 training samples for each of the three classes), and an additional 333 images constituting the test set used to assess the model's pest recognition efficacy (the test set comprises 119, 107, and 107 images of mole cricket, aphids, and Therioaphis maculata (Buckton), respectively). Figure 1 illustrates samples from the pest dataset.
To augment the dataset and enhance the model's generalization capabilities, this study employs various data preprocessing techniques, including image flipping and rotation. These methods create new training samples from the original images, thereby increasing data diversity [31]. Furthermore, the pixel dimensions of the images were standardized: all images were uniformly resized to 60 × 60 pixels. This adjustment not only diminishes the computational demands of model training but also accelerates the training process while ensuring the model's effectiveness with input images of varying sizes [11]. In addition, histogram equalization was performed to improve the overall contrast of the images, making details more visible and reducing the effects of strong light, weak light, and uneven brightness; a median filter was applied to suppress noise caused by dust, shadows, and smoke, efficiently removing random noise while preserving the edges and details of the images [12]. Through these data augmentation and preprocessing measures, the research aims to develop an efficient and robust pest identification model that facilitates the rapid and precise detection of agricultural pests, offering technical support for pest management strategies. The processed RGB color image data are fed into the model for training and testing. Figure 2 illustrates the research's technical pathway.
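The two preprocessing operations above, median filtering and histogram equalization, can be sketched in plain NumPy. This is a minimal illustration, not the study's implementation; `median_filter` and `equalize_hist` are hypothetical helper names, and production code would typically use an image library such as OpenCV.

```python
import numpy as np

def median_filter(img, k=3):
    """Naive k x k median filter on a 2-D grayscale image (edge-padded)."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

def equalize_hist(img):
    """Histogram equalization for a uint8 grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[np.nonzero(hist)[0][0]]        # cdf at darkest occupied bin
    denom = max(cdf[-1] - cdf_min, 1)
    lut = ((cdf - cdf_min) * 255 // denom).clip(0, 255).astype(np.uint8)
    return lut[img]

noisy = np.full((5, 5), 100, dtype=np.uint8)
noisy[2, 2] = 255                 # a single salt-noise pixel
clean = median_filter(noisy)      # the outlier is replaced by the local median
```

The median filter removes isolated noise pixels without blurring edges, while the equalization LUT stretches the occupied intensity range to the full 0-255 interval.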

Methodology

ResNet-18 Model
The ResNet-18 model, a derivative of the Residual Network architecture, introduces a pioneering method for deep neural network training by incorporating skip connections, which circumvent certain layers [32]. These connections counteract the vanishing gradient issue, facilitating the training of significantly deeper networks than were previously achievable [33]. The ResNet-18 model, with its 18-layer deep structure, has demonstrated effectiveness in various image recognition tasks, attributed to its capacity to extract highly distinctive features from a broad spectrum of images while maintaining computational efficiency [34]. This makes ResNet-18 a suitable choice for the task of arboreal pest identification. Particularly, when identifying pests like mole cricket, aphids, and Therioaphis maculata (Buckton) from the IP102 dataset, the ResNet-18 model's ability to discern detailed patterns and textures is crucial [35].
For this study, the ResNet-18 model has been customized for classifying three common arboreal pests associated with environmental impact. The model starts with a convolutional layer of 3 × 3 kernel size, a stride of 1, and padding of 1, followed by batch normalization and ReLU activation, setting the stage for feature extraction. This layer is tailored to accommodate input images resized to 60 × 60 pixels, optimizing the model for this specific dataset. The model's architecture includes sequences of basic blocks, each containing two 3 × 3 convolutional layers with batch normalization and ReLU activation, arranged in four sequences with the configuration [2, 2, 2, 2], aligning with the ResNet-18 standard. To tailor the model for pest detection using the IP102 dataset, several adjustments and optimizations were applied, including a grid search for optimal parameters. To mitigate overfitting and boost generalization, a dropout rate of 0.5 was implemented before the final fully connected layer. Figure 3 illustrates the ResNet-18 model's adapted architecture used in this research.
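The residual form underlying these basic blocks can be illustrated with a minimal sketch. Convolutions are replaced by plain linear maps for brevity, so this shows only the skip-connection structure, not the paper's network.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def basic_block(x, w1, w2):
    """Sketch of a ResNet basic block with an identity skip connection.

    The 3 x 3 convolutions are replaced here by per-feature linear maps;
    the point is the residual form y = ReLU(F(x) + x), which keeps the
    identity path open so gradients flow through many stacked layers.
    """
    out = relu(w1 @ x)      # first 'conv' + ReLU
    out = w2 @ out          # second 'conv'
    return relu(out + x)    # skip connection, then activation

rng = np.random.default_rng(1)
x = rng.normal(size=(16,))
# With zero weights the block collapses to ReLU(x): the identity path
# alone carries the signal, which is why deep ResNets remain trainable.
y = basic_block(x, np.zeros((16, 16)), np.zeros((16, 16)))
```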

SE-RegNet Model
The SE-RegNet model embodies a cutting-edge architectural integration, merging the strengths of Squeeze-and-Excitation (SE) blocks with the efficient RegNet framework, designed for precise and effective image classification tasks [36]. SE blocks, which adaptively recalibrate channel-wise feature responses, boost the network's representational capacity by methodically capturing interdependencies among channels, achieving notable performance enhancements with minimal additional computational demand [37]. RegNet, recognized for its straightforwardness and scalability, offers a modular structure that can be readily scaled and customized for various applications. The selection of SE-RegNet for arboreal pest identification is driven by the requirement for a model adept at processing the complex and nuanced imagery of pests, capable of accentuating pertinent features and diminishing irrelevant ones [38]. This function is vital for differentiating among similar pest species and variations within the IP102 dataset, where minor visual indicators are key for precise categorization. The integration of SE blocks into the RegNet architecture aims to exploit spatial and channel-wise attention mechanisms, thereby improving feature extraction and learning efficacy, positioning SE-RegNet as an optimal choice for this demanding field.
The SE-RegNet model is intricately designed to excel in classifying three common arboreal pests related to environmental concerns from the IP102 dataset. At the heart of its architecture is the incorporation of Squeeze-and-Excitation (SE) blocks, which introduce a dynamic channel-wise attention mechanism to enhance the feature learning process. These blocks employ global average pooling to condense spatial information into a succinct channel descriptor, which is subsequently refined through a series of fully connected layers, resulting in a channel-wise modulation of the feature maps based on the learned importance of each channel. Figure 4 illustrates the architectural configuration of the SE-RegNet model.
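The squeeze-and-excitation recalibration just described can be sketched as follows: a minimal single-sample NumPy illustration with assumed weight shapes, not the SE-RegNet implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Minimal Squeeze-and-Excitation over a (C, H, W) feature map.

    Squeeze: global average pooling -> channel descriptor of shape (C,).
    Excite: two fully connected layers (reduction, then expansion) with
    a sigmoid, giving per-channel weights in (0, 1) that rescale x.
    """
    s = x.mean(axis=(1, 2))          # squeeze: channel descriptor
    z = np.maximum(w1 @ s, 0.0)      # FC reduction + ReLU
    w = sigmoid(w2 @ z)              # FC expansion + sigmoid gate
    return x * w[:, None, None]      # channel-wise recalibration

C, r = 8, 2                          # channels and reduction ratio (assumed)
rng = np.random.default_rng(2)
x = rng.normal(size=(C, 4, 4))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
y = se_block(x, w1, w2)
```

Because the gate lies in (0, 1), an SE block can only attenuate channels, which is the "emphasize pertinent, diminish irrelevant" behavior the text describes.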


SE-RegNetY Model
The SE-RegNetY model marks a significant advancement in deep learning for image classification, particularly tailored for the intricate demands of environmental monitoring, such as in arboreal pest detection [39]. This model stands out by integrating the dynamic, channel-wise attention modulation of the Squeeze-and-Excitation (SE) mechanism with the structural efficiency of the RegNet architecture, forming the specialized SE-RegNetY framework [40]. The incorporation of the SE mechanism within the RegNet framework facilitates a nuanced, focused recalibration of features, ensuring the model accentuates the most relevant features for classification tasks. This feature is crucial in pest recognition, where minor distinctions between species are essential for precise identification [41]. The SE-RegNetY model, distinct from conventional SE-RegNet models, is specifically designed to offer enhanced precision and adaptability, addressing the complex variability found in natural settings, such as changing light conditions, various pest positions, and backgrounds that challenge typical classification models.
The SE-RegNetY model is intricately constructed using YBlocks as its fundamental components, each consisting of a convolutional layer followed by batch normalization and ReLU activation. This configuration constitutes the model's backbone, facilitating efficient feature extraction from input images resized to 60 × 60 pixels. The distinctive aspect of the YBlock is the incorporation of SE-Blocks, which dynamically modulate the network's channel weights, enabling the model to focus on features crucial for pest classification. This modulation is accomplished via adaptive average pooling within the SE-Block, which condenses spatial information into a channel descriptor that is subsequently utilized to adjust the feature maps, thereby amplifying the network's representational capacity. The Adam optimizer is utilized, incorporating a weight decay rate for regularization, to refine the training process, ensuring consistent progress. Moreover, L2 regularization is implemented to mitigate overfitting. Figure 5 depicts the SE-RegNetY model's architecture.
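The regularized Adam setup mentioned above can be illustrated with a single update step. This sketch uses decoupled (AdamW-style) weight decay with illustrative hyperparameters; the paper may instead apply L2 regularization through the loss, so treat the exact placement of the decay term as an assumption.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
              eps=1e-8, weight_decay=1e-2):
    """One Adam update with decoupled weight decay (AdamW-style).

    Hyperparameter values are illustrative, not taken from the paper.
    """
    m = b1 * m + (1 - b1) * grad             # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2        # second-moment estimate
    m_hat = m / (1 - b1 ** t)                # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

w = np.array([1.0, -1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
w1, m, v = adam_step(w, grad=np.zeros_like(w), m=m, v=v, t=1)
# With a zero gradient, only the weight-decay term acts, shrinking
# the weights toward zero: this is the regularization effect.
```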


SE-Inception-ResNet-V3 Model
The SE-Inception-ResNet-V3 model represents an advanced integration of three key architectural concepts in deep learning: Squeeze-and-Excitation (SE) blocks, Inception modules, and the Residual Network (ResNet) framework. This composite architecture is deliberately selected for arboreal pest detection, attributed to its superior capacity to identify and accentuate pivotal features within intricate visual data. The inclusion of SE blocks facilitates dynamic channel-wise recalibration, augmenting the model's acuity in recognizing essential characteristics that distinguish various pest species [42]. Inception modules enhance the model's adaptability in managing diverse spatial dimensions, allowing the effective analysis of detailed pest imagery in natural settings [43]. The design of this model is particularly apt for detecting specific arboreal pests from the IP102 dataset, where the visual resemblances among different species present a notable classification challenge. The differences in appearance, scale, and context of these pests in the images demand a model proficient in detecting subtle nuances while ensuring high accuracy levels.
The SE-Inception-ResNet-V3 model is meticulously engineered to optimize its performance in analyzing and classifying complex image data. The network's initial phase comprises a convolutional layer with a 7 × 7 kernel, a stride of 2, and padding of 3, followed by batch normalization and ReLU activation. This configuration effectively reduces the input's spatial dimensions while initiating the feature extraction process. A subsequent max-pooling layer further downsamples the input, setting the stage for the ensuing layers. At the core of the model are the modified Basic Blocks that integrate SE functionality, enabling the network to dynamically recalibrate its focus on the most salient features. After the initial residual blocks, the model progresses to a series of Inception modules, which divide the input into various branches processed with distinct kernel sizes (1 × 1, 5 × 5, and 3 × 3) before concatenating the results. This branched strategy allows the model to capture an extensive spectrum of spatial information, encompassing both fine details and larger contextual elements. The incorporation of L2 regularization and a dropout layer further bolsters the model's robustness and generalizability. The distinctive architecture of the model is illustrated in Figure 6.
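The branched Inception structure can be sketched as follows: a single-channel NumPy illustration of 'same'-padded branches concatenated along the channel axis, with assumed kernels; it shows the parallel multi-scale idea, not the model's actual layers.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2-D cross-correlation of a single channel."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * k).sum()
    return out

def inception_branches(x, kernels):
    """Apply each branch kernel to x and concatenate along channels.

    Mirrors the branched 1x1 / 3x3 / 5x5 structure described above,
    reduced to single-channel branches for clarity.
    """
    return np.stack([conv2d_same(x, k) for k in kernels])

x = np.ones((6, 6))
kernels = [np.ones((1, 1)),           # 1x1 branch: fine detail / identity
           np.ones((3, 3)) / 9.0,     # 3x3 averaging branch: local context
           np.ones((5, 5)) / 25.0]    # 5x5 averaging branch: broad context
feats = inception_branches(x, kernels)
```

Because each branch keeps the spatial size ('same' padding), the outputs stack cleanly, which is what makes the concatenation in a real Inception module well-defined.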

SE-Inception-RegNetY-V3 Model
The SE-Inception-RegNetY-V3 model represents a state-of-the-art architecture tailored for the intricate task of arboreal pest classification, harnessing the advantages of Squeeze-and-Excitation (SE) blocks, Inception modules, and the RegNet framework. This amalgamation responds to the necessity for a model capable of adeptly managing the significant variability and complexity found in environmental imagery, particularly within the IP102 dataset for pest identification. SE blocks introduce an adaptive mechanism for recalibrating channel-wise feature responses, markedly improving the model's capacity to emphasize pertinent features [44]. Inception modules facilitate the extraction of information across multiple scales, enabling the model to discern detailed attributes of pests and their environments, a critical aspect for differentiating similar species [45]. Additionally, the RegNet backbone provides a scalable and efficient structure, supporting deep learning processes while keeping computational demands in check. The selection of the SE-Inception-RegNetY-V3 model for this task is based on its superior proficiency in addressing the nuanced differences and high intraclass variation among pest species. The model's dynamic attention mechanism, courtesy of SE blocks, along with the multi-scale feature analysis afforded by Inception modules, guarantees thorough feature extraction [46].
The SE-Inception-RegNetY-V3 model's architecture is meticulously designed to optimize feature extraction and classification efficacy. The network begins with a convolutional layer that readies the input for advanced processing, followed by the implementation of Basic Blocks enhanced with SE functionality. This early incorporation of attention mechanisms primes subsequent layers to focus on pertinent features, reducing the influence of background noise. At its core, the model boasts a series of Inception-YBlocks, a novel amalgamation that combines YBlock architecture with Simplified Inception Modules. These blocks are structured to analyze inputs via parallel pathways, capturing both local and broad contexts through the use of various kernel sizes. Embedding SE-Blocks within these Inception-YBlocks further hones the feature maps, accentuating vital details and minimizing redundancies. The model adopts a weight decay approach during optimization to enhance learning regularization, promoting adaptability to novel data. Figure 7 displays the SE-Inception-RegNetY-V3 model's architecture.
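The parallel-pathway idea behind the Inception-YBlocks can be illustrated with a minimal branched module; the branch widths and the use of plain convolutions (omitting the YBlock internals and the embedded SE blocks) are simplifying assumptions:

```python
import torch
import torch.nn as nn

class SimplifiedInception(nn.Module):
    """Minimal branched module: parallel 1x1, 3x3, and 5x5 pathways whose
    outputs are concatenated along the channel dimension."""
    def __init__(self, in_ch, branch_ch=16):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, padding=2)

    def forward(self, x):
        # padding keeps spatial size identical on every branch,
        # so channel-wise concatenation is valid
        return torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1)

out = SimplifiedInception(32)(torch.randn(1, 32, 14, 14))
```

The small kernels capture local texture while the larger ones capture broader context, which is the multi-scale behavior the paragraph attributes to these blocks.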

GCT-Inception-ResNet-V3 Model
The GCT-Inception-ResNet-V3 model represents a groundbreaking architecture specifically designed for the complex task of arboreal pest detection. It skillfully merges Global Context Transformation (GCT) mechanisms with the adaptability of Inception modules and the robustness of the ResNet framework, and thus creates a formidable solution for deep learning-driven image classification [26]. The GCT mechanism stands out for its use of channel-wise feature normalization to boost the network's representational efficiency. By adjusting features according to their global statistical characteristics, GCT ensures the model accentuates pertinent patterns while minimizing noise, thus enhancing focus and discriminability. This model is chosen for arboreal pest identification for several key reasons. First, the GCT's capacity to refine feature representation is ideally suited to the subtle distinctions among various pest species, where minor details can greatly influence classification outcomes. Second, the multi-scale processing ability of the Inception modules enables the network to discern a broad array of spatial features, ranging from fine details to larger shapes, vital for detecting pests of varying sizes and orientations. Finally, integrating these elements within a ResNet backbone facilitates deep feature extraction while preventing vanishing gradients, courtesy of the residual connections that promote effective learning in deep networks. Consequently, the GCT-Inception-ResNet-V3 model is an exemplary choice for the complex requirements of pest recognition within natural environments. The sophisticated management of global and local features, paired with the dynamic adjustment of feature channels, establishes a new standard for accuracy and robustness in environmental image classification tasks. Within convolutional networks, let x ∈ R^(C×H×W) denote an activation feature, with H and W as spatial dimensions and C as the channel count. GCT operates via the transformation shown in Formula (1):

s_c = α_c · ‖x_c‖_2,  ŝ_c = (√C · s_c)/‖s‖_2,  x̂_c = x_c · [1 + tanh(γ_c · ŝ_c + β_c)]  (1)

where the embedding weight α adapts outputs, and the gating weight γ along with the bias β control gate activation, influencing GCT's channel-specific actions. Importantly, GCT's parameter load, O(C), is more efficient than the SE attention module's O(C²) [26].
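A minimal PyTorch sketch of a GCT layer following Formula (1) is given below; the epsilon stabilizers and the zero-initialized gating weight (which makes the layer an identity at the start of training) follow the reference implementation of [26] and are assumptions here:

```python
import torch
import torch.nn as nn

class GCT(nn.Module):
    """Sketch of a GCT layer: per-channel l2 embedding, channel
    normalization, and a tanh gate, one (alpha, gamma, beta) per channel."""
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1, channels, 1, 1))   # embedding weight
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))  # gating weight
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))   # gating bias
        self.eps = eps

    def forward(self, x):
        # embedding: s_c = alpha_c * l2 norm of channel c over spatial positions
        embed = self.alpha * (x.pow(2).sum((2, 3), keepdim=True) + self.eps).sqrt()
        # channel normalization: s_c * sqrt(C) / ||s||_2, via the RMS across channels
        norm = embed * (embed.pow(2).mean(1, keepdim=True) + self.eps).rsqrt()
        # gate: with gamma = beta = 0 the gate equals 1, i.e. an identity mapping
        return x * (1.0 + torch.tanh(self.gamma * norm + self.beta))

x = torch.randn(2, 64, 8, 8)
out = GCT(64)(x)
```

Note that the layer holds only three vectors of length C, which is the O(C) parameter count contrasted with the SE module's O(C²) fully connected weights.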
The GCT-Inception-ResNet-V3 model's architecture is intricately crafted to optimize both efficiency and effectiveness in processing high-dimensional image data. Initiating with a BasicConv2d layer featuring a 7 × 7 convolution, the model kickstarts the feature extraction process, which is then refined by max-pooling to diminish spatial dimensions while retaining vital information. At the heart of the architecture lies the integration of Inception modules, which divide the input into multiple branches, each processed with different kernel sizes, enabling the network to concurrently capture information across various scales. This feature is essential for processing complex images where pests are set against varied backgrounds and orientations. Subsequent to the Inception module, a GCT layer globally normalizes features, boosting the model's emphasis on significant patterns via a transformation that involves scaling and shifting parameters. Additional convolutional layers and an adaptive average pooling layer follow, further honing the features before classification. A fully connected layer then conducts the final classification, mapping the processed features to specific pest categories. The GCT-Inception-ResNet-V3 model represents an exemplary fusion of advanced neural network technologies, enhancing its proficiency in distinguishing closely related pest species. Through the strategic amalgamation of GCT for feature modulation, Inception modules for multi-scale processing, and ResNet for deep learning efficiency, the model establishes a new benchmark in environmental image analysis, marking notable progress in agricultural and environmental monitoring. The incorporation of a weight decay rate in the optimizer aids in regularizing the learning process, fostering a robust model capable of high precision across varied environmental settings. Figure 8 illustrates the GCT-Inception-ResNet-V3 model's architecture.

Improved Algorithm Combination
The Enhanced Feature Extraction Algorithm, denoted as Algorithm 1, presents an intricate methodology for characterizing arboreal images to improve the identification of arboreal pests. This advanced approach includes morphological operations such as opening and top-hat transformations, adeptly engineered to highlight minute textural features and contrasts within the foliage that are indicative of pest activity [47]. By amalgamating this algorithm with six distinct models, each specifically crafted for the detection of tree pests, the combination leverages the unique strengths inherent in diverse analytical frameworks [48]. The algorithm was incorporated for feature extraction owing to its robustness against variation across different imaging conditions and its computational efficiency, both crucial for the real-time detection of pests.
The Enhanced Feature Extraction Algorithm boasts several advantages:
1. Increased sensitivity: the preliminary application of morphological operations amplifies subtle textural discrepancies, potentially flagging the early stages of pest invasion and thus enabling timelier and more precise detection;
2. Improved feature relevance: the dilation operation increases the size of the pest region, thus compensating for some of the edge detail that may be lost during segmentation [49];
3. Enhanced robustness: the top-hat transformation is specially designed to accentuate novel elements within the image, such as the manifestation of pests against complex backgrounds, often overlooked by conventional methodologies;
4. Flexibility: the algorithm is model-agnostic, thus facilitating seamless incorporation across various computational models to enhance their discernment of pest-afflicted regions within tree canopies;
5. Efficiency: despite its complexity, the algorithm remains computationally economical, making it suitable for deployment in resource-limited environments, such as mobile applications or in-field diagnostic tools.
The integration of Algorithm 1 into each of the six models is predicted to significantly elevate the precision of arboreal pest detection. This enhancement will yield models that are not merely more attuned to the different expressions of pest damage but are also equipped to function consistently across diverse environmental settings, assuring robust and dependable pest detection.
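The morphological steps named above (opening, top-hat, dilation) can be sketched with SciPy; the structuring-element sizes and the order of operations are illustrative assumptions, not the exact Algorithm 1:

```python
import numpy as np
from scipy import ndimage

def enhance_features(gray, size=5):
    """gray: 2-D float array (a grayscale arboreal image).
    Returns the three intermediate maps discussed in the text."""
    # opening suppresses small bright specks while preserving larger structure
    opened = ndimage.grey_opening(gray, size=(size, size))
    # white top-hat = image minus its opening: small bright details vs background
    tophat = ndimage.white_tophat(gray, size=(size, size))
    # dilation grows candidate pest regions, recovering some lost edge detail
    dilated = ndimage.grey_dilation(tophat, size=(3, 3))
    return opened, tophat, dilated

img = np.random.rand(32, 32)
opened, tophat, dilated = enhance_features(img)
```

The top-hat response is non-negative by construction, and dilation can only enlarge values, which matches the text's description of accentuating and then growing candidate pest regions.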

Experimental Setup
The experimental component of this study was conducted using Python version 3.7 and the PyTorch framework. The experiments were carried out on hardware equipped with NVIDIA RTX 3090 graphics cards. To ensure the reliability and stability of our findings, each model underwent ten separate experimental runs for comparison purposes. This approach allowed for a more comprehensive assessment of the models' performance and stability across different experimental conditions.

Setting of Hyperparameters
In this research, we adopted the ResNet-18 model as the baseline for pest identification studies due to its prominence in previous research [15,16]. The hyperparameters for ResNet-18 were meticulously adjusted to suit the IP102 dataset's requirements, implementing a grid search for the optimal set. The network was configured with a batch size of 30 and a learning rate of 0.007, and it utilized the Adam optimizer to efficiently handle sparse gradients in complex environments. To enhance model generalization and prevent overfitting, a dropout rate of 0.5 was applied prior to the final fully connected layer.
For the SE-RegNet model, a grid search was employed to determine the best hyperparameters, settling on a learning rate of 0.0075, a batch size of 27, and an extended training duration of 270 epochs to achieve consistent model convergence. The Adam optimizer was selected for its dynamic adjustment of learning rates, complemented by dropout techniques to navigate the model through varied pest classification challenges effectively. The SE-RegNetY model also utilized a grid search for hyperparameter optimization, opting for a learning rate of 0.0075, a batch size of 23, and a training span of 260 epochs to ensure a thorough learning process while maintaining rapid convergence. It incorporated a weight decay rate and L2 regularization to further refine the training procedure and control overfitting. For the SE-Inception-ResNet-V3 model, the hyperparameters were fine-tuned through a grid search, selecting a learning rate of 0.007, a batch size of 23, and a succinct training period of 75 epochs to balance efficiency and effectiveness. The addition of L2 regularization and a dropout layer was intended to enhance the model's durability and generalization capabilities. The SE-Inception-RegNetY-V3 model's hyperparameters, identified via grid search, included a learning rate of 0.008, a batch size of 21, and a 200-epoch training regimen, carefully chosen to prevent overfitting while ensuring swift model convergence. The optimization process was augmented with a weight decay strategy to boost the regularization of the learning curve. The GCT-Inception-ResNet-V3 model's hyperparameter configuration was determined through a grid search, adopting a learning rate of 0.008 and a batch size of 29. This setup aimed to balance quick convergence with the minimization of overfitting risks, employing a weight decay rate in the optimizer to regularize the learning process effectively. These strategic selections and modifications across models demonstrate a comprehensive approach to optimizing pest detection performance while addressing the challenges of model training and generalization.
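The grid searches described above can be sketched as an exhaustive sweep over candidate settings; the candidate values and the stand-in scoring function below are purely illustrative, not the grids actually searched:

```python
from itertools import product

# Hypothetical search space; a real run would sweep the values the authors
# considered and train/evaluate a model for each combination.
grid = {
    "learning_rate": [0.007, 0.0075, 0.008],
    "batch_size": [21, 23, 27, 29, 30],
}

def validation_accuracy(lr, bs):
    # stand-in for a full train/evaluate cycle on the IP102 validation split;
    # this toy score simply peaks at lr=0.008, bs=29 for demonstration
    return 1.0 - abs(lr - 0.008) * 10 - abs(bs - 29) * 0.001

best = max(product(grid["learning_rate"], grid["batch_size"]),
           key=lambda cfg: validation_accuracy(*cfg))
print(best)  # the (learning_rate, batch_size) pair with the best score
```

In practice each configuration is expensive (a full training run), which is why the searched grids are kept small and per-model.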

Results
This study assesses the efficacy of various models, including the standard ResNet-18, employing metrics such as overall accuracy (OA), Kappa coefficient, mean accuracy (mAcc), and mean Intersection over Union (mIoU), to understand their effectiveness in pest identification [38]. Specifically, OA quantifies the percentage of correct predictions, computed as the total of true positives and true negatives divided by all observations, as delineated in Formula (2):

OA = (TP + TN)/(TP + TN + FP + FN) (2)

Here, TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. The Kappa coefficient (k) quantifies the concordance between model predictions and actual observations, correcting for random agreement, as illustrated in Formula (3):

k = (Po - Pe)/(1 - Pe) (3)

where Po signifies observed agreement and Pe denotes the probability of random agreement. Mean accuracy (mAcc) represents the average classification accuracy across the N classes, providing insight into the model's uniformity in performance, particularly valuable in instances of dataset imbalance. The mAcc formula is:

mAcc = (1/N) Σ_i TP_i/(TP_i + FN_i) (4)

Mean Intersection over Union (mIoU) is crucial for segmentation tasks and for evaluating the congruence between predicted and actual classifications, reflecting the model's precision in defining class boundaries and specifics, which is vital for tasks demanding meticulous accuracy:

mIoU = (1/N) Σ_i TP_i/(TP_i + FP_i + FN_i) (5)

For a more comprehensive analysis, User's Accuracy (UA, or Precision), Producer's Accuracy (PA, or Recall), and the F1-Score are employed to examine the precision of each category within the models. UA evaluates the likelihood that a pixel classified in the map/image accurately represents its real-world class, highlighting the model's precision in identifying specific classes, a key metric for tasks requiring accurate class identification [37]. UA is calculated as the ratio of correct predictions (TP) to all predictions for that class (TP + FP):

UA = TP/(TP + FP) (6)

PA indicates the likelihood that a pixel of a given true class is correctly classified, which is pivotal for reducing omissions. It is determined by dividing the true positives (TP) by the sum of true positives and false negatives (TP + FN) [50]:

PA = TP/(TP + FN) (7)

The F1-score integrates precision and recall into a single metric that underscores their equilibrium. It is critical in scenarios where both precision and recall are equally important, ensuring one is not optimized at the expense of the other. This metric offers a comprehensive evaluation of model performance, as illustrated in Formula (8):

F1-Score = (2 × Precision × Recall)/(Precision + Recall) (8)

These metrics elucidate the model's effectiveness, highlighting its strengths and areas for enhancement in pest identification.
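The metrics above can be computed directly from a confusion matrix; the 3 × 3 matrix below is toy data for illustration, with rows as true classes and columns as predicted classes:

```python
import numpy as np

cm = np.array([[50, 2, 1],
               [3, 45, 2],
               [1, 4, 40]], dtype=float)

tp = np.diag(cm)
fp = cm.sum(axis=0) - tp   # predicted as class i but actually another class
fn = cm.sum(axis=1) - tp   # actually class i but predicted as another class
total = cm.sum()

oa = tp.sum() / total                                    # Formula (2)
po = oa                                                  # observed agreement
pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2  # chance agreement
kappa = (po - pe) / (1 - pe)                             # Formula (3)
macc = np.mean(tp / (tp + fn))                           # Formula (4)
miou = np.mean(tp / (tp + fp + fn))                      # Formula (5)
```

Since each class's IoU denominator adds the false positives on top of the recall denominator, mIoU is never larger than mAcc, which is consistent with the score gaps reported for every model in Figure 9.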
Figure 9 illustrates the average values of the performance metrics (OA, Kappa coefficient, mAcc, and mIoU) for seven different models over the 10 separate experimental runs, including the baseline model ResNet-18. An in-depth analysis of these metrics for the evaluated models provides critical insights into their capabilities in classifying arboreal pests. The data indicate that the GCT-Inception-ResNet-V3, SE-Inception-ResNet-V3, SE-Inception-RegNetY-V3, and SE-RegNet models outperform the baseline model ResNet-18, achieving superior OA, Kappa coefficient, mAcc, and mIoU scores. Specifically, the average OA values for GCT-Inception-ResNet-V3, SE-Inception-ResNet-V3, SE-Inception-RegNetY-V3, SE-RegNet, and ResNet-18 are 0.9459, 0.8851, 0.8682, 0.8576, and 0.8547, respectively. The average Kappa coefficients for these models are 0.9190, 0.8280, 0.8020, 0.7860, and 0.7820, respectively. The average mAcc values follow the same order at 0.9460, 0.8854, 0.8684, 0.8573, and 0.8549, while the average mIoU scores are 0.8980, 0.7930, 0.7680, 0.7465, and 0.7475, respectively. These enhanced metrics underscore the superior capacity of these models to detect and categorize the intricate features of arboreal pests, signifying a notable progression beyond the ResNet-18 model. Notably, the GCT-Inception-ResNet-V3 model excels across all four average accuracy indicators, showcasing improvements of 9.1% in OA, 13.7% in the Kappa coefficient, and 9.1% in mAcc, with an mIoU increase of 15% compared to the baseline model. For the GCT-Inception-ResNet-V3 model, the standard deviations of OA, Kappa coefficient, mAcc, and mIoU over the ten sets of experiments are 0.0003, 0.0006, 0.0003, and 0.0009, respectively, all less than 0.001, demonstrating the stability and statistical reliability of this model. This substantial progress emphasizes the GCT-Inception-ResNet-V3 model's precision and consistency in identifying arboreal pests. OA, Kappa coefficient, mAcc, and mIoU provide a comprehensive overview of a model's predictive accuracy and its alignment with empirical observations. The GCT-Inception-ResNet-V3 model's exemplary scores in these metrics underscore its exceptional predictive proficiency and classification accuracy, thereby demonstrating its advanced capabilities in pest identification [51]. Nevertheless, for a holistic understanding, it is imperative to also incorporate average User's and Producer's Accuracies [52], as well as the F1-Score, in order to thoroughly assess the model's performance on specific pest characteristics, as delineated in Table 1. GCT-Inception-ResNet-V3 demonstrates outstanding results in almost all aspects of average Producer's and User's Accuracies and the average F1-score, affirming its leading status. The average performance matrix for each model, which offers detailed comparisons, is depicted in Figure 10.

Conclusions
This study was dedicated to the identification of arboreal pests, a critical aspect of supporting agricultural productivity and environmental conservation [54]. To this end, we explored and assessed the efficacy of various advanced models, including GCT-Inception-ResNet-V3, SE-Inception-ResNet-V3, and SE-Inception-RegNetY-V3, in addition to employing SE-RegNet and SE-RegNetY, with the well-established ResNet-18 model serving as the baseline for our comparative analysis. Among these, the GCT-Inception-ResNet-V3 model stood out, demonstrating exceptional performance by achieving an overall accuracy (OA) of 94.59%, a Kappa coefficient of 91.90%, a mean accuracy (mAcc) of 94.60%, and a mean Intersection over Union (mIoU) of 89.80%. These results not only highlight the model's accuracy and reliability in identifying arboreal pests but also validate the method's effectiveness. Additionally, the application of visualized convolutional neural network heatmaps provided insightful data on the model's focus areas, further affirming its practical utility and validity [53]. The study encountered certain limitations, including the models' varying degrees of sensitivity to different types of arboreal pests and the inherent challenges in generalizing the findings across diverse agricultural contexts. Future research directions will concentrate on developing innovative modeling approaches that can accommodate the complexities inherent in the investigation of arboreal pests and diseases. The aim is to refine and enhance agricultural conservation strategies further, contributing to the sustainability of agricultural practices and the preservation of environmental health [55].

Figure 3. The architecture of the ResNet-18 Model.

Figure 4. The architecture of the SE-RegNet Model.

Figure 5. The architecture of the SE-RegNetY model.

Figure 9. Model performance comparison.
Figure 11 illustrates the variation in loss and validation set accuracy across epochs during the training phase of each model, wherein the GCT-Inception-ResNet-V3 model reduces the loss to 0.0952 by the final epoch and achieves a validation set accuracy of 0.9659.

Figure 12. An exemplary result of the CNN visualization heat map.

Author Contributions: Conceptualization, C.L.; methodology, C.L.; software, C.L., Y.T. and M.S.; validation, C.L.; formal analysis, C.L. and Y.T.; investigation, C.L.; resources, C.L. and M.S.; data curation, C.L. and M.S.; writing-original draft preparation, C.L. and Y.T.; writing-review and editing, C.L. and X.T.; visualization, C.L. and H.C.; supervision, X.T., Y.Z. and H.C.; project administration, X.T. and Y.Z.; funding acquisition, X.T. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding: This work was supported in part by the Wuyi University Hong Kong Macao Joint Research and Development Fund under Grant 2022WGALH19, and in part by the Science and Technology Development Fund of Macau (grant number 0038/2020/A1).

Informed Consent Statement: Not applicable.

Data Availability Statement: The data of this study were obtained from the publicly available dataset IP102, accessed at https://github.com/xpwu95/IP102 on 13 January 2024.

Acknowledgments: This work was supported by Wuyi University and Macau University of Science and Technology.