Article

Automatic Potato Crop Beetle Recognition Method Based on Multiscale Asymmetric Convolution Blocks

1
Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
2
State Key Laboratory for Biology of Plant Diseases and Insect Pests, Key Laboratory for Prevention and Control of Invasive Alien Species, Ministry of Agriculture and Rural Affairs, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Agronomy 2025, 15(7), 1557; https://doi.org/10.3390/agronomy15071557
Submission received: 23 May 2025 / Revised: 20 June 2025 / Accepted: 23 June 2025 / Published: 26 June 2025
(This article belongs to the Special Issue Sustainable Management of Arthropod Pests in Agriculture)

Abstract

Five beetle species can occur in potato fields simultaneously, including one quarantine pest (the Colorado potato beetle (CPB)), one phytophagous pest (the 28-spotted potato ladybird beetle), and three predatory ladybird beetles (the 7-spotted lady beetle, the tortoise beetle, and the harlequin ladybird beetle). The timely detection and accurate identification of CPB and other phytophagous or predatory beetles are critical for the effective implementation of monitoring and control strategies. However, morphological identification requires specialized expertise, is time-consuming, and is particularly challenging due to the dark brown body color of these beetles in the young larval stages. This study provides an effective solution for distinguishing between phytophagous and/or quarantine and predatory beetles. This solution takes the form of a new convolutional neural network architecture, known as MSAC-ResNet. Specifically, it comprises several multiscale asymmetric convolution blocks, which are designed to extract features at multiple scales, mainly by integrating different-sized asymmetric convolution kernels in parallel. We evaluated the MSAC-ResNet through comprehensive model training and testing on a beetle image dataset of 11,325 images across 20 beetle categories. The proposed recognition model achieved accuracy, precision, and recall rates of 99.11%, 99.18%, and 99.11%, respectively, outperforming five other existing models, namely, AlexNet, MobileNet-v3, EfficientNet-b0, DenseNet, and ResNet-101. Notably, the developed field investigation mini-program can identify all the developmental stages of these five beetle species, from young larvae to adults, and provide timely management (or protection) suggestions to farmers. Our findings could be significant for future research related to precise pest control and the conservation of natural enemies.

1. Introduction

The potato industry plays a globally important role in poverty alleviation and rural revitalization; in China, it has contributed significantly to grain output and farmers’ incomes and supports food security, targeted poverty alleviation, and the adjustment of planting structures. With the expansion of the potato planting area and the intensification of climate change, frequent outbreaks of insect pests have become an important factor restricting increases in potato production. Pests not only directly worsen the quality and yield of potatoes but also transmit various viruses that indirectly promote the development and spread of potato diseases [1]. China has a vast territory, complex agricultural habitats, and a wide range of potato pests and natural enemies.
Five beetle species can occur in potato fields simultaneously, including the Colorado potato beetle (CPB, Leptinotarsa decemlineata Say) (Coleoptera: Chrysomelidae), the 28-spotted potato ladybird beetle Henosepilachna vigintioctopunctata (Fabricius), and predatory ladybird beetles (e.g., the 7-spotted lady beetle Coccinella septempunctata (L.), the tortoise beetle Propylaea japonica (Thunberg), and the harlequin ladybird beetle Harmonia axyridis (Pallas)) (Coleoptera: Coccinellidae). The CPB, native to North America and now also distributed in Asia and Europe, is one of the most destructive potato pests globally and is listed as a quarantine or regulated pest by over 30 countries and organizations worldwide. It causes significant annual losses in global potato production. The area affected by the CPB is approximately 324 hectares in China. The 28-spotted potato ladybird beetle, native to India, feeds on the foliage of potatoes and other solanaceous crops. It is mainly distributed in Asia (in China, Japan, South Korea, Vietnam, Thailand, Bhutan, Indonesia, and Nepal), but it has also been detected in Europe (Russia), Oceania (New Zealand and Australia), and South America (Brazil and Argentina) [2,3]. The seven-spotted lady beetle, originally found in Europe and Asia, is a predator of many crop-threatening aphids and is distributed throughout the Middle East, India, and North America [4]. The tortoise beetle, originally identified in Asia, is widely distributed in Japan, Russia, North Korea, Vietnam, Bhutan, and India [5]. The harlequin ladybird beetle is native to central and eastern Asia and has been used as a biological control agent against aphids worldwide; however, it is also identified as an invasive species in most areas of North America and Europe. It can now be found in more than 70 countries and regions, mainly due to intentional introductions coupled with natural dispersal [6].
Usually, these two pest beetle species and three predatory beetle species appear at the same time in potato fields. Due to their morphological similarities, it is difficult to distinguish herbivorous beetles from predatory beetles, especially in the young larval stages, because newly hatched larvae are less than 2 mm long and dark brown in color [1,7]. This has resulted in poor pest control, especially in the case of the quarantine pest CPB, and in natural enemies being killed by mistake.
With the development of computer vision technology, convolutional neural networks (CNNs) and deep learning (DL) methods have been applied to disease and pest identification in recent years. Bevers et al. [8] developed an automated soybean disease classifier based on digital images. The model was trained and validated on a dataset of over 9500 soybean images covering eight distinct deficiency and disease categories, with transfer learning, data augmentation, and data engineering adopted for efficient training; however, the computational complexity of the model could be further reduced. Bertolla et al. [9] proposed a strategy for recognizing fall armyworms to improve the management of maize crops in Brazil; identifying fall armyworms at every growth stage is crucial in maize production. Their method classified data by finding an optimal hyperplane that maximizes the distance between each class of caterpillar in a multi-dimensional space. Serrato et al. [10] presented an automatic pest detection method for two pests of potato and bean crops, the Colorado potato beetle and the Mexican bean beetle; however, their study covered only the adult stage of these pests. Sohel et al. [11] used Inception-v3, Vgg-16, and MobileNet-v2 to identify eight prevalent potato pest species in Bangladesh, including the Colorado potato beetle. Their study also focused only on the adult stage, and the authors noted that the recognition accuracy could be further improved.
Application systems combined with CNN models have been studied and developed to address practical problems in pest and disease control. An automatic pest monitoring system was developed to detect and count rice planthoppers [12]. Chen et al. [13] presented an automatic system for detecting and counting stored-grain pests in real situations so that proper measures can be taken to minimize economic damage. The system comprised a vehicle with a camera and an object detection model, and the experimental results demonstrated a mean average precision of 97.55%, which meets the practical requirements for counting and detecting pests in granaries. To accurately distinguish Liriomyza species, Li et al. [14] proposed a DL image recognition algorithm known as SeResNet-Liriomyza. The recognition model could distinguish between different Liriomyza species, including L. sativae, L. huidobrensis, L. trifolii, L. chinensis, and L. bryoniae, and an intelligent field investigation application was developed to identify these five species conveniently. A magnifying lens (APEXEL, with a built-in LED light) and a rotatable carrier mounted on the mobile phone camera were used to capture magnified images of Liriomyza; the lens could magnify the photographed target 200 times to resolve tiny targets, and the average recognition accuracy reached 99.88%. A CNN classification model and an acoustic detection system were adopted to detect insects hidden in stored grains [15]. The recorded sounds of major insect pests in stored paddy grains, including the lesser grain borer (Rhyzopertha dominica Fabricius), red flour beetle (Tribolium castaneum Herbst), and rice weevil (Sitophilus oryzae L.), were characterized using spectrogram profiles, and a CNN was applied to classify the insect pests based on the emitted sound profiles, achieving an average accuracy of 84.51%.
Establishing a comprehensive four-phase early prevention and control system (with the phases encompassing early detection, early reporting, early isolation, and early response) is a key strategy for blocking the spread and damage of the CPB [16]. To effectively monitor and manage this pest, scientific surveillance is typically implemented in its endemic regions, high-risk spread areas, and major potato-producing areas during the growth periods of both cultivated host crops and wild host plants (e.g., Solanum nigrum L. and Hyoscyamus niger L.) [17]. Currently, the primary monitoring method involves manual inspections to detect abnormal symptoms on host plants, such as leaf lesions, egg clusters, or feeding damage on leaves and tender stems. If suspected pests are found, specimens need to be collected for further identification. While manual inspection is straightforward and requires no sophisticated equipment, it demands significant expertise from inspectors; non-professionals often struggle to accurately distinguish potato beetles across their developmental stages (eggs, larvae, pupae, and adults). However, preliminary identification of host plant damage symptoms, combined with insect image recognition technology, can greatly improve the accuracy and efficiency of potato beetle monitoring. To further improve image-based identification precision, this study selected beetles that are morphologically similar to, or co-occur with, the CPB in potato fields, as identified during field surveys and pest management programs.
Many researchers have conducted preliminary pest identification and recognition studies using computer vision and deep learning, and their results demonstrate the capability of convolutional neural networks. However, few studies have applied such models to identify all developmental stages of the CPB, from young larvae to adults. This study focuses in particular on the young larval stage, a suitable period for pesticide control. The target sizes in beetle images at different growth stages are diverse, and the recognition accuracy of existing models for distinguishing between beneficial and harmful beetles, especially CPBs from other beetle species with similar appearances, needs to be further improved. Accurate species identification is crucial for pest control and for the protection and utilization of natural enemies. In potato fields, the CPB usually coexists with the 28-spotted potato ladybird beetle. These two herbivorous beetles are morphologically similar, especially in their pre-adult stages (e.g., egg, larva, and pupa) [18], so misidentification frequently occurs during field surveys and pest control. Moreover, predatory ladybird beetles, such as the seven-spotted lady beetle, the tortoise beetle, and the harlequin ladybird beetle, often hunt aphids in potato fields. These pest and natural enemy species, particularly in the pre-adult stages, are morphologically similar, making it very challenging to accurately distinguish between them. Therefore, the present study aimed to develop a methodology to rapidly and precisely distinguish between quarantine pest beetles (one species), herbivorous beetles (one species), and predatory beetles (three species) in a potato field. The primary research contributions can be summarized as follows:
  • A multiscale asymmetric convolution block was designed and developed to extract features at multiple scales by integrating asymmetric convolution kernels of different sizes in parallel.
  • A novel CNN built based on the aforementioned multiscale asymmetric convolution block, named ‘MSAC-ResNet’, was proposed to distinguish between these five beetle species. The proposed algorithm outperforms five other SOTA networks.
  • The developed field investigation mini-program can identify all developmental stages of these five beetle species, from young larvae to adults, and provide management (or protection) suggestions in a timely manner.
This paper is organized as follows. Section 1 introduces related research. Section 2 contains a description of the beetle image dataset and the proposed MSAC-ResNet architecture. The experimental results and analyses are described in Section 3. The experimental results were obtained by comparing the MSAC-ResNet with popular networks and integrating the proposed network with the CPB investigation program. Finally, we provide a discussion of the results in Section 4 and our conclusions in Section 5.

2. Materials and Methods

2.1. Overall Framework of Colorado Potato Beetle Investigation System

The CPB investigation system comprises three parts: a portable image acquisition device, a mini-program for fieldwork, and a server integrated with a recognition algorithm. The overall framework is illustrated in Figure 1. During a field investigation, staff capture suspected CPB samples and photograph them. The images are then transmitted to the server through the mini-program using the Hypertext Transfer Protocol Secure (HTTPS) protocol. On receiving the uploaded images, the server extracts and analyzes their features and returns the recognition results to the corresponding staff member. Besides species identification, the location of the infestation can be easily recorded and transmitted in the mini-program; data on the type and location of infestations provide important information.

2.2. Beetle Image Dataset Organization

2.2.1. Image Collection

During the field investigation, images of different growth stages of CPBs, as well as other beetle species occurring in the same potato fields, including H. vigintioctopunctata, P. japonica, H. axyridis, and C. septempunctata, were collected. Stage-specific images of these insects included young larvae, older larvae, pupae, and adults. Data on the number of images for each of the five beetle species are shown in Table 1. Images of CPBs were captured in a potato field in Xinyuan County, Ili Prefecture, Xinjiang Autonomous Region, China. The H. vigintioctopunctata images were captured in a potato field in Huocheng County, Ili Prefecture, Xinjiang Autonomous Region, China. Images of P. japonica and H. axyridis were captured in a laboratory at the Institute of Plant Protection, Chinese Academy of Agricultural Sciences (IPP, CAAS). The C. septempunctata images were captured in a laboratory at the Beijing Academy of Agriculture and Forestry Sciences. These three predatory ladybird beetles fed on aphids on potato plants.
All categories of insect samples used for image collection were identified based on morphological characteristics or via molecular biological methods (for eggs, DNA barcoding was used) by specialists from the IPP, CAAS to ensure the correctness of the image dataset. The images collected in this study are shown in Figure 2. From left to right, the images show the following: (a) the young larval stage of CPB, (b) older larval stage of CPB, (c) pupal stage of CPB, and (d) adult stage of CPB; (e) the young larval stage of H. vigintioctopunctata, (f) older larval stage of H. vigintioctopunctata, (g) pupal stage of H. vigintioctopunctata, and (h) adult stage of H. vigintioctopunctata; (i) young larval stage of P. japonica, (j) older larval stage of P. japonica, (k) pupal stage of P. japonica, and (l) adult stage of P. japonica; (m) the young larval stage of H. axyridis, (n) older larval stage of H. axyridis, (o) pupal stage of H. axyridis, (p) adult stage of H. axyridis; and (q) the young larval stage of C. septempunctata, (r) older larval stage of C. septempunctata, (s) pupal stage of C. septempunctata, and (t) adult stage of C. septempunctata. Overall, the dataset included 11,325 images from 20 categories across these five beetle species (Table 1).

2.2.2. Data Augmentation

To enhance the model’s robustness, data augmentation was performed to expand and diversify the training datasets. During preprocessing, the images were normalized on each channel by subtracting the channel mean and dividing by the channel standard deviation. This preprocessing step helps to accentuate distinct features among different specimens. Furthermore, we applied various augmentation techniques—including randomly changing the images’ brightness, contrast, and saturation; flipping them horizontally and vertically; and rotating them by 90° and 270°—to improve the algorithm’s generalization capacity.
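The preprocessing and augmentation steps above can be sketched as follows; the brightness-jitter range and the 0.5 application probabilities are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def normalize(img: np.ndarray) -> np.ndarray:
    """Per-channel normalization: subtract each channel's mean and divide
    by its standard deviation (img has shape H x W x C)."""
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True)
    return (img - mean) / (std + 1e-8)  # epsilon guards constant channels

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly apply the augmentations described in the text: brightness
    jitter, horizontal/vertical flips, and 90/270-degree rotations."""
    if rng.random() < 0.5:                 # brightness jitter (assumed range)
        img = img * rng.uniform(0.8, 1.2)
    if rng.random() < 0.5:                 # horizontal flip
        img = np.flip(img, axis=1)
    if rng.random() < 0.5:                 # vertical flip
        img = np.flip(img, axis=0)
    k = rng.choice([0, 1, 3])              # rotate 0, 90, or 270 degrees
    return np.rot90(img, k=k, axes=(0, 1))
```

In a full pipeline, contrast and saturation jitter would be handled analogously (e.g., by torchvision's `ColorJitter`); only brightness is shown here for brevity.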

2.3. Asymmetric Convolution

As illustrated in Figure 3, a k × k standard convolution (conv k × k) can be decomposed into a k × 1 convolution (conv k × 1) followed by a 1 × k convolution (conv 1 × k), which is known as an asymmetric convolution. Asymmetric convolution is often used to approximate the standard convolution for model compression and acceleration [19]. The fraction of parameters saved by the asymmetric convolution relative to the standard convolution is given in Formula (1):
D = (K × K − 2 × K) / (K × K) = (K − 2) / K
where D denotes the fraction of parameters reduced and K is the convolutional kernel size. For instance, in the Inception-v3 algorithm, a 7 × 7 convolution is replaced with 1 × 7 and 7 × 1 convolutions [20]. The image segmentation algorithm EDANet also adopted this method to design an efficient network [21], reducing the number of parameters by 33%.
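Formula (1) can be checked numerically by counting convolution weights; the channel count `c` below is an arbitrary illustrative value, since the ratio is independent of it:

```python
def conv_params(c_in: int, c_out: int, kh: int, kw: int) -> int:
    """Weight count of a convolution with a kh x kw kernel (bias ignored)."""
    return c_in * c_out * kh * kw

def reduction_ratio(k: int, c: int = 64) -> float:
    """Fraction of parameters saved by replacing a k x k convolution with
    a k x 1 followed by a 1 x k convolution; equals (k - 2) / k."""
    standard = conv_params(c, c, k, k)                       # c*c*k*k
    asymmetric = conv_params(c, c, k, 1) + conv_params(c, c, 1, k)  # 2*c*c*k
    return (standard - asymmetric) / standard
```

For K = 3 this gives 1/3, matching the 33% reduction reported for EDANet; for K = 7 (the Inception-v3 case) it gives 5/7.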

2.4. Multiscale Asymmetric Convolution Block

Because insect images captured at different growth stages contain features at different scales, a multiscale asymmetric convolution (MSAC) block comprising different-sized asymmetric convolution kernels in parallel was used to extract features at multiple scales, namely 3 × 3, 5 × 5, and 7 × 7. Each conv k × k is decomposed into two asymmetric convolutions, conv k × 1 and conv 1 × k. Because every branch produces features of the same spatial dimensions, the outputs can be concatenated directly. The output features are therefore no longer evenly distributed: strongly correlated features are clustered into aggregate feature sets, one per kernel size (the sets produced by the 3 × 3, 5 × 5, and 7 × 7 asymmetric convolution operations, respectively), as illustrated in Figure 4. Furthermore, the asymmetric factorization approximates the standard convolution for model compression and acceleration. Multiple feature sets with strong correlations are clustered and irrelevant noncritical features are weakened, so even for outputs of the same size, the redundancy in the features produced by the multiscale asymmetric convolution block is low. A parallel structure with multisized convolutional kernels enhances the network’s ability to perceive features at different scales, and fusing features extracted in parallel at multiple scales improves the accuracy of the algorithm.
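A minimal PyTorch sketch of such a block follows, assuming equal channel widths per branch and "same" padding so the three branch outputs concatenate directly; the per-branch width and the BN/ReLU placement are implementation assumptions, not details confirmed by the paper:

```python
import torch
import torch.nn as nn

class MSACBlock(nn.Module):
    """Parallel asymmetric-convolution branches (3x3, 5x5, 7x7 equivalents),
    each factored into conv k x 1 followed by conv 1 x k, with the branch
    outputs concatenated along the channel dimension."""

    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        self.branches = nn.ModuleList()
        for k in (3, 5, 7):
            self.branches.append(nn.Sequential(
                # k x 1 then 1 x k: 'same' padding keeps H and W unchanged
                nn.Conv2d(in_ch, branch_ch, (k, 1), padding=(k // 2, 0), bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(branch_ch, branch_ch, (1, k), padding=(0, k // 2), bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # All branches preserve spatial size, so outputs concatenate directly.
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```

With `branch_ch = 32`, the block maps a 64-channel feature map to 3 × 32 = 96 channels while keeping the spatial resolution.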
A residual network is an architecture that addresses the problems of gradient vanishing [22] and explosion [23] during the training of a deep neural network. Its core idea is to let the network learn the residual relationship between the input and the expected output rather than learning the expected output directly [24]. The multiscale asymmetric convolution block achieves identity mapping by adding skip connections [25]; that is, the input skips certain network layers and is added to the output of those layers. Its operating mechanism is similar to that of a highway network, which opens the gate through a large positive bias weight [26]. This design makes DL models with dozens or hundreds of layers easier to train and can maintain, or even improve, accuracy as model depth increases.
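A skip connection of this kind can be sketched as a thin wrapper around any inner block; the 1 × 1 projection used when the channel counts differ is a common implementation convention and an assumption here, since the paper only states that identity mapping is achieved via skip connections:

```python
import torch
import torch.nn as nn

class ResidualWrapper(nn.Module):
    """Skip connection around an inner block: output = F(x) + shortcut(x).
    The shortcut is the identity when channel counts match, otherwise a
    1x1 convolution aligns the channels (an assumed convention)."""

    def __init__(self, block: nn.Module, in_ch: int, out_ch: int):
        super().__init__()
        self.block = block
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1, bias=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Learn the residual F(x); the skip path carries x forward unchanged.
        return self.block(x) + self.shortcut(x)
```

If the inner block outputs zero (i.e., learns nothing), the wrapper reduces to the identity mapping, which is exactly the property that eases optimization of very deep networks.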

2.5. MSAC-ResNet Architecture

The MSAC-ResNet architecture was proposed to identify insects at different growth stages more accurately. MSAC-ResNet comprises several multiscale asymmetric convolution blocks, as described in the previous section, except for the first stage, where a standard convolution is used. To construct a neural network simply, exploring network topologies is necessary to efficiently determine an appropriate architecture [27]. All convolution operations are followed by batch normalization [28] and ReLU to reduce the internal covariate shift and improve the nonlinearity of the network. Figure 5 shows the architecture of MSAC-ResNet, which primarily comprises a standard convolution stage and four multiscale asymmetric convolution block stages. In stage 1, the first convolutional layer filters the 224 × 224 × 3 input image with 64 kernels of size 7 × 7 × 3, a stride of 2 pixels, and padding of 3. In the subsequent stages, the MSAC blocks comprise different-sized asymmetric filters in parallel and follow two simple design principles: (1) for the same output feature map size, the blocks use the same number of filters; (2) if the size of the feature map is halved, the number of filters is doubled to preserve the complexity per layer. Downsampling is performed by convolutional operations with a stride of 2. A final average pooling layer reduces the spatial dimensions to one before a 20-way fully connected layer, whose output vector is used to compute the final classification results. The total number of weighted layers is 34.
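The stage layout described above can be sketched at the shape level as follows; note that plain 3 × 3 convolutions stand in for the MSAC blocks here, so this is a simplification illustrating the stem, the halve-spatial/double-channel rule, and the 20-way head, not a reimplementation of the full 34-layer network:

```python
import torch
import torch.nn as nn

def msac_resnet_sketch(num_classes: int = 20) -> nn.Sequential:
    """Shape-level sketch of the MSAC-ResNet layout: a 7x7/stride-2 stem
    with 64 filters, four downsampling stages that halve H and W while
    doubling the channel count, global average pooling, and a 20-way
    fully connected classifier."""
    layers = [
        nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),  # stage 1 stem
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
    ]
    ch = 64
    for _ in range(4):  # four block stages (MSAC blocks simplified to 3x3)
        layers += [
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(ch * 2),
            nn.ReLU(inplace=True),
        ]
        ch *= 2
    layers += [
        nn.AdaptiveAvgPool2d(1),          # spatial dims -> 1 x 1
        nn.Flatten(),
        nn.Linear(ch, num_classes),       # 20-way fully connected layer
    ]
    return nn.Sequential(*layers)
```

A 224 × 224 × 3 input passes through spatial sizes 112 → 56 → 28 → 14 → 7 before pooling, ending in a 20-dimensional output.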
The MSAC-ResNet training procedure used an adaptive gradient-descent optimizer starting from randomly initialized weights. This method does not require manual tuning of the learning rate and appears to be robust across different model architectures, hyperparameter selections, and data modalities. The experimental setup included the PyTorch 2.0 framework and a cluster of 4 NVIDIA Tesla P100-SXM2-16GB graphics cards, and 10-fold cross-validation was used. The hyperparameters are listed in Table 2: the batch size was set to 64, the learning rate to 0.001, and training was stopped after 150 epochs.
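A minimal training loop with the listed hyperparameters might look as follows; the choice of Adam is an assumption, since the paper describes only an adaptive optimizer that needs no manual learning-rate tuning:

```python
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs: int = 150,
          lr: float = 1e-3, device: str = "cpu") -> nn.Module:
    """Minimal supervised training loop using the hyperparameters from
    Table 2 (learning rate 0.001, 150 epochs; the loader is assumed to
    yield batches of 64). Adam is an assumed stand-in for the paper's
    adaptive gradient-descent optimizer."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images.to(device)), labels.to(device))
            loss.backward()   # backpropagate the classification loss
            optimizer.step()  # adaptive per-parameter update
    return model
```

In the paper's setup this loop would be run once per fold of the 10-fold cross-validation, with the held-out fold used for evaluation.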

2.6. Transfer Learning

Pretrained models are commonly used for transfer learning. The advantage of a pretrained model is that it acquires rich feature representation capabilities through training on large-scale datasets. In transfer learning, a pretrained model is used as the initial model for the target task [29]. By fine-tuning on the target task's dataset, continuing training from the pretrained weights, the model can be adapted to the specific characteristics and requirements of the target task [30]. The ImageNet Large Scale Visual Recognition Challenge 2012 dataset was used to pretrain the model. Through transfer learning, the knowledge learned in the source domain helps with learning tasks in the target domain, thereby improving the generalization ability of the model [31]. Starting from the pretrained model avoids training the proposed model from scratch, which saves considerable time and computing resources. Simultaneously, the pretrained model provides better initial parameters [32], accelerating convergence and improving the model’s performance on the target task.

3. Results

3.1. Performance Evaluation

The experiment was conducted from October 2024 to April 2025. To evaluate the model performance accurately, the precision, recall, F1-score, and accuracy were selected to quantify the algorithm’s performance. The metrics used for our evaluation are defined in Formulas (2)–(5):
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 × Precision × Recall / (Precision + Recall)
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Here, TP (True Positive) represents a case where both the predicted and actual values are positive. FP (False Positive) represents a case where the predicted value is positive but the actual value is negative. FN (False Negative) represents a case where the predicted value is negative but the actual value is positive. TN (True Negative) represents a case where both the predicted and actual values are negative.
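Formulas (2)–(5) can be computed directly from the four confusion-matrix counts:

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1-score, and accuracy (Formulas (2)-(5))
    computed from true/false positive and negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

For the multi-class beetle task, these counts are tallied per class (one-vs-rest) and then averaged to obtain the overall figures reported in Section 3.2.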

3.2. Comparison with SOTA Networks

To evaluate the recognition performance of the proposed MSAC-ResNet model, several SOTA algorithms that are commonly used for agricultural pest recognition and have achieved good results were used for comparison: the general-purpose models AlexNet [33], MobileNet-v3 [34], EfficientNet [35], DenseNet [36], and ResNet [24]. To ensure a fair comparison, all model weights were pretrained on the ImageNet Large Scale Visual Recognition Challenge 2012 dataset. The experimental results are listed in Table 3.
The comparison results indicated that our proposed MSAC-ResNet model outperformed the five existing models in distinguishing CPBs. The identification accuracy of the MSAC-ResNet model was 99.11%, an improvement of 2.32%, 3.14%, 2.13%, 1.50%, and 1.05% over the AlexNet, MobileNet-v3, EfficientNet-b0, DenseNet, and ResNet-101 models, respectively. The CPU inference time was measured on an Intel(R) Xeon(R) Gold 6132 CPU @ 2.60 GHz (Table 3). Comparisons of the accuracy of each model on each class are shown in Table 4 to verify the proposed model’s effectiveness. The proposed MSAC-ResNet model achieves better recognition accuracy at a comparable inference time, since the multiscale asymmetric convolution blocks can effectively extract features at different scales.
To further explore the causes of error, a retrospective analysis of the misjudged samples was conducted. The analysis showed that the proposed model achieved accurate recognition results in most cases; the main cause of recognition errors was that the regions showing the most distinctive differences between species were not captured in the photographs.

3.3. The Design of the Field Investigation Mini-Program

The main functions of the field investigation and identification WeChat mini-program designed in this study are shown in Figure 6. The WeChat mini-program mainly includes the following functions: species recognition at various growth stages, field investigation information recording, morphological dictionaries, and historical record queries. Users can take photos or select images from their album via the WeChat mini-program to distinguish beetles, especially CPBs, from other similar-looking potato pest insects or predatory beetles at various growth stages, including the young larval, older larval, pupal, and adult stages. The mini-program sends the photo to the server, where the server program, combined with the trained model, extracts image features and infers the category.
When using the field investigation information record, the mini-program automatically obtains relevant information, including the system time and the investigator’s location (longitude and latitude), which is marked on a map. After the field investigation information is completed and submitted, the server receives and stores it in the database. The species dictionary contains basic information on prevention and control measures for potato-related pest insects. In addition, users can query the corresponding field investigation information by time, geographic location, distribution range, and species name.

4. Discussion

The prompt and accurate identification of potato beetles is a prerequisite for the effective implementation of monitoring and control measures for quarantine pests such as the CPB, and effective CPB monitoring can mitigate potato yield losses. This study provides an effective solution for distinguishing beetles that occur in potato fields, especially CPBs, from other beetle species (including one herbivorous beetle species and three predatory ladybird beetle species). This is especially useful given that these beetles have similar appearances across their various growth stages. With the help of an automatic recognition model, the designed investigation system can help agricultural production personnel identify CPBs rapidly and accurately, particularly as larvae and pupae, improving recognition efficiency and stability. The proposed MSAC-ResNet algorithm adopts asymmetric convolution kernels of multiple sizes to extract features, concatenates these features, and passes them to the subsequent module for computation, so that the network can extract richer and more diverse feature information than other models; moreover, the network’s adaptability to scale is increased. The experimental results indicated that the accuracy, precision, and recall of the proposed model were 99.11%, 99.18%, and 99.11%, respectively, outperforming existing models, namely, AlexNet [33], MobileNet-v3 [34], EfficientNet-b0 [35], DenseNet [36], and ResNet-101 [24]. Thus, the present study offers a new strategy for distinguishing the CPB and the 28-spotted potato ladybird beetle from predatory beetle species with similar appearances.
The multiscale asymmetric convolution block is the core component of the proposed MSAC-ResNet model. Because of the different target sizes in the beetle images at different growth stages, this block comprises different-sized asymmetric convolution kernels in parallel to extract multiscale features. The experimental results demonstrate that the proposed model has the highest classification accuracy, and using the proposed model can help to decrease the misclassification between morphologically similar species. This comparison verifies the effectiveness of the proposed algorithm. However, even if they are of the same species, beetles from different regions or climates might show distinct body colors; thus, regular updates of the image database are required. Moreover, there are other species of beetles in the potato fields; images of these beetle species need to be supplemented as soon as possible to improve the database.
Furthermore, a potato beetle investigation system was designed, comprising a portable image acquisition device, a WeChat mini-program for fieldwork, and a server integrated with the recognition algorithm. Images are uploaded to the server through the mini-program for intelligent identification. The designed investigation system, combined with the proposed network architecture, helps distinguish CPBs from the 28-spotted potato ladybird beetle and benefits the protection and utilization of natural enemies.
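On the server side, the network's 20-way output (5 species × 4 developmental stages, per the dataset in Table 1) must be mapped back to a species and stage before a suggestion is returned to the mini-program. A minimal sketch of such a decoder follows; the class-index ordering here is an assumption for illustration, not the authors' actual label encoding.

```python
# Hypothetical label layout: classes grouped by species, then by stage.
SPECIES = [
    "Leptinotarsa decemlineata",          # Colorado potato beetle
    "Henosepilachna vigintioctopunctata", # 28-spotted potato ladybird beetle
    "Propylea japonica",
    "Harmonia axyridis",
    "Coccinella septempunctata",
]
STAGES = ["young larva", "older larva", "pupa", "adult"]

def decode_class(idx: int) -> tuple:
    """Map a class index in [0, 20) to a (species, stage) pair."""
    n = len(SPECIES) * len(STAGES)
    if not 0 <= idx < n:
        raise ValueError(f"class index out of range: {idx}")
    species, stage = divmod(idx, len(STAGES))
    return SPECIES[species], STAGES[stage]

assert decode_class(0) == ("Leptinotarsa decemlineata", "young larva")
assert decode_class(19) == ("Coccinella septempunctata", "adult")
```

Keeping the label mapping in one place like this makes it straightforward to attach per-species management (or protection) suggestions to each prediction.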
This study mainly focused on beetles that occur in potato fields, including the quarantine CPB, the herbivorous 28-spotted potato ladybird beetle, and three predatory beetle species widely distributed in China and abroad. However, numerous insect species worldwide threaten Solanaceae crops. In the future, additional images of other species will be collected to expand the model's training dataset. Moreover, combining image-based recognition with molecular techniques could address the problem that the eggs of different beetle species cannot be distinguished from images because of their highly similar appearance [37]. The CPB investigation system could then provide more comprehensive information for staff to identify and control this pest rapidly and accurately. In future work, introducing the CNN structure into the Vision Transformer will combine the local feature extraction strength of CNNs with the long-range dependency modeling of Transformers, which could further improve the model's performance in visual tasks.

5. Conclusions

This study provides an effective, timely, and accurate solution for distinguishing between three beetle groups: quarantine beetles (e.g., the CPB), herbivorous beetles (e.g., the 28-spotted potato ladybird beetle), and predatory beetles (e.g., the seven-spotted lady beetle, the tortoise beetle, and the harlequin ladybird beetle). These beetles occur simultaneously in potato fields and are morphologically similar across their growth stages (young larva, older larva, pupa, and adult), especially during the young larval stage. A new CNN architecture, known as MSAC-ResNet, has been proposed; it comprises several multiscale asymmetric convolution blocks that extract features at multiple scales, mainly by integrating differently sized asymmetric convolution kernels in parallel. Image features are processed at various scales and then aggregated, so the next stage can simultaneously draw on features from different scales, improving the training results. The experimental results indicated a recognition accuracy of 99.11%, higher than that of the five existing networks compared, particularly in distinguishing CPBs. The developed WeChat mini-program requires minimal development cost and is easy to use, even for farmers, so it could play a significant role in effective pest control, especially for the quarantine pest CPB, and in the protection of natural enemies. To improve recognition accuracy and effectiveness, further work should focus on technology optimization and algorithm innovation, as well as imaging more beetle species that occur in potato fields.

Author Contributions

J.C.: conceptualization, investigation, data curation, methodology, software, validation, writing. X.X.: methodology, writing—review and editing, project administration. M.Q.: data curation, software. X.L.: data curation, software. Y.W.: data curation, software. W.L.: conceptualization, methodology, validation, resources, supervision. G.Z.: methodology, supervision, funding acquisition. L.J.: conceptualization, methodology, validation, resources, supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Key Research and Development Project of China (grant number 2021YFD1400200, 2021–2025).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors are grateful to Mingqin Zhao, Dibao Chen, and Shixiang Zhang from the College of Tropical Crops of Yunnan Agricultural University for their assistance with photograph collection.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Guo, W.; Xun, T.; Xu, J.; Liu, J.; He, J. Research on the identification of Colorado potato beetle and its distribution dispersal and damage in Xinjiang. Xinjiang Agric. Sci. 2010, 47, 906–909. [Google Scholar]
  2. Epilachna vigintioctopunctata. Available online: https://www.cabidigitallibrary.org/doi/10.1079/cabicompendium.21518#sec-6 (accessed on 16 June 2025).
  3. Henosepilachna vigintioctopunctata (Fabricius). Wikipedia. Available online: http://en.wikipedia.org/wiki/Henosepilachna_vigintioctopunctata#Distribution (accessed on 16 June 2025).
  4. Animal Diversity Web. Available online: https://animaldiversity.org/accounts/Coccinella_septempunctata/ (accessed on 16 June 2025).
  5. Propylea japonica. Available online: https://www.cabidigitallibrary.org/doi/10.1079/cabicompendium.44604 (accessed on 16 June 2025).
  6. Harmonia axyridis. Available online: https://www.cabidigitallibrary.org/doi/10.1079/cabicompendium.26515 (accessed on 16 June 2025).
  7. Leptinotarsa decemlineata. Available online: https://www.cabidigitallibrary.org/doi/10.1079/cabicompendium.30380 (accessed on 16 June 2025).
  8. Bevers, N.; Sikora, E.J.; Hardy, N.B. Soybean disease identification using original field images and transfer learning with convolutional neural networks. Comput. Electron. Agric. 2022, 203, 107449. [Google Scholar] [CrossRef]
  9. Bertolla, A.B.; Cruvinel, P.E. Computational Intelligence Approach for Fall Armyworm Control in Maize Crop. Electronics 2025, 14, 1449. [Google Scholar] [CrossRef]
  10. Roldán-Serrato, K.L.; Escalante-Estrada, J.; Rodríguez-González, M. Automatic pest detection on bean and potato crops by applying neural classifiers. Eng. Agric. Environ. Food 2018, 11, 245–255. [Google Scholar] [CrossRef]
  11. Sohel, A.; Shakil, M.S.; Siddiquee, S.M.T.; Marouf, A.A.; Rokne, J.G.; Alhajj, R. Enhanced Potato Pest Identification: A Deep Learning Approach for Identifying Potato Pests. IEEE Access 2024, 12, 172149–172161. [Google Scholar] [CrossRef]
  12. Wang, F.; Wang, R.; Xie, C.; Zhang, J.; Li, R.; Liu, L. Convolutional neural network based automatic pest monitoring system using hand-held mobile image analysis towards non-site-specific wild environment. Comput. Electron. Agric. 2021, 187, 106268. [Google Scholar] [CrossRef]
  13. Chen, C.; Liang, Y.; Zhou, L.; Tang, X.; Dai, M. An automatic inspection system for pest detection in granaries using YOLOv4. Comput. Electron. Agric. 2022, 201, 107302. [Google Scholar] [CrossRef]
  14. Li, H.; Liang, Y.; Liu, Y.; Xian, X.; Xue, Y.; Huang, H.; Yao, Q.; Liu, W. Development of an intelligent field investigation system for Liriomyza using SeResNet-Liriomyza for accurate identification. Comput. Electron. Agric. 2023, 214, 108276. [Google Scholar] [CrossRef]
  15. Balingbing, C.B.; Kirchner, S.; Siebald, H.; Kaufmann, H.H.; Gummert, M.; Hung, N.V.; Hensel, O. Application of a multi-layer convolutional neural network model to classify major insect pests in stored rice detected by an acoustic device. Comput. Electron. Agric. 2024, 225, 109297. [Google Scholar] [CrossRef]
  16. Wray, A.K.; Agnew, A.C.; Brown, M.E.; Dean, E.M.; Hernandez, N.D.; Jordon, A.; Morningstar, C.R.; Piccolomini, S.E.; Pickett, H.A.; Daniel, W.M.; et al. Understanding gaps in early detection of and rapid response to invasive species in the United States: A literature review and bibliometric analysis. Ecol. Inform. 2024, 84, 102855. [Google Scholar] [CrossRef]
  17. Guo, W.; Xun, T.; Cheng, D.; Tan, W. Research progress on the main biology and ecology of potato beetles in my country and their monitoring and control strategies. Plant Prot. 2014, 40, 1–11. [Google Scholar]
  18. Yan, J.; Guo, W.; Guoqing, L.; Pan, H. Current status and prospects of the management of important insect pests on potato in China. Plant Prot. 2023, 49, 190–206. [Google Scholar]
  19. Ding, X.; Guo, Y.; Ding, G.; Han, J. ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1911–1920. [Google Scholar]
  20. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  21. Lo, S.Y.; Hang, H.M.; Chan, S.W.; Lin, J.J. Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation. In Proceedings of the ACM Multimedia Asia, Beijing, China, 16–18 December 2019; pp. 1–6. [Google Scholar] [CrossRef]
  22. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166. [Google Scholar] [CrossRef] [PubMed]
  23. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Journal of Machine Learning Research, Sardinia, Italy, 13–15 May 2010; Volume 9, pp. 249–256. [Google Scholar]
  24. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; Volume 2016, pp. 770–778. [Google Scholar] [CrossRef]
  25. Waheed, A.; Goyal, M.; Gupta, D.; Khanna, A.; Hassanien, A.E.; Pandey, H.M. An optimized dense convolutional neural network model for disease recognition and classification in corn leaf. Comput. Electron. Agric. 2020, 175, 105456. [Google Scholar] [CrossRef]
  26. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
  27. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856. [Google Scholar] [CrossRef]
  28. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015; Volume 1, pp. 448–456. [Google Scholar]
  29. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
  30. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A survey on deep transfer learning. In Proceedings of the Artificial Neural Networks and Machine Learning–ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, 4–7 October 2018; pp. 270–279. [Google Scholar]
  31. Zhou, K.; Liu, Z.; Qiao, Y.; Xiang, T.; Loy, C.C. Domain Generalization: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 4396–4415. [Google Scholar] [CrossRef] [PubMed]
  32. Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  33. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  34. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 1314–1324. [Google Scholar]
  35. Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  36. Huang, G.; Liu, Z.; Maaten, L.V.D.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar] [CrossRef]
  37. Huang, W.; Xie, X.; Huo, L.; Liang, X.; Wang, X.; Chen, X. An integrative DNA barcoding framework of ladybird beetles (Coleoptera: Coccinellidae). Sci. Rep. 2020, 10, 10063. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Overall framework of Colorado potato beetle investigation system.
Figure 2. Image samples of the dataset. (a–t) show the young larva, older larva, pupa, and adult stages of L. decemlineata, H. vigintioctopunctata, P. japonica, H. axyridis, and C. septempunctata, respectively.
Figure 3. Comparison between standard convolution (a) and asymmetric convolution (b).
Figure 4. Multiscale asymmetric convolution block.
Figure 5. The MSAC-ResNet architecture.
Figure 6. Interfaces of Colorado potato beetle identification WeChat mini-program. (a) Interface of recognition result. (b) Interface of investigation summary. (c) Interface of species dictionary. (d) Interface of history record.
Table 1. Number of images per category in the database.
| Species | Young Larva | Older Larva | Pupa | Adult Insect |
|---|---|---|---|---|
| Colorado potato beetle | 718 | 622 | 594 | 507 |
| Henosepilachna vigintioctopunctata | 661 | 613 | 574 | 693 |
| Propylea japonica | 507 | 663 | 634 | 457 |
| Harmonia axyridis | 519 | 544 | 510 | 590 |
| Coccinella septempunctata | 554 | 520 | 393 | 452 |
Table 2. Hyperparameter values optimized through training.
| Parameter | Value |
|---|---|
| Optimization algorithm | SGD |
| Initial learning rate | 0.001 |
| Epoch | 150 |
| Batch size | 64 |
Table 3. Results of comparison with SOTA networks.
| Network | Precision | Recall | F1-Score | Accuracy | Inference Time (s) |
|---|---|---|---|---|---|
| AlexNet | 97.21% | 96.79% | 96.85% | 96.79% | 4.335 |
| MobileNet-v3 | 96.80% | 95.97% | 96.00% | 95.97% | 4.751 |
| EfficientNet-b0 | 96.77% | 96.98% | 96.78% | 96.98% | 4.691 |
| DenseNet | 97.96% | 97.61% | 97.58% | 97.61% | 4.557 |
| ResNet101 | 98.20% | 98.06% | 98.08% | 98.06% | 4.650 |
| MSAC-ResNet | 99.18% | 99.11% | 99.11% | 99.11% | 4.505 |
Table 4. Testing the accuracy of each model on each class of the five beetle species.
| Species | Stage | AlexNet | MobileNet-v3 | EfficientNet-b0 | DenseNet | ResNet101 | MSAC-ResNet |
|---|---|---|---|---|---|---|---|
| Colorado potato beetle | Young larva | 99% | 97% | 91% | 100% | 95% | 100% |
| Colorado potato beetle | Older larva | 99% | 97% | 98% | 100% | 97% | 100% |
| Colorado potato beetle | Pupa | 98% | 99% | 100% | 99% | 100% | 100% |
| Colorado potato beetle | Adult insect | 70% | 57% | 92% | 100% | 93% | 100% |
| H. vigintioctopunctata | Young larva | 93% | 93% | 93% | 94% | 94% | 93% |
| H. vigintioctopunctata | Older larva | 100% | 100% | 100% | 100% | 100% | 100% |
| H. vigintioctopunctata | Pupa | 100% | 100% | 100% | 100% | 100% | 100% |
| H. vigintioctopunctata | Adult insect | 100% | 100% | 100% | 100% | 100% | 100% |
| P. japonica | Young larva | 98% | 99% | 93% | 100% | 100% | 100% |
| P. japonica | Older larva | 100% | 100% | 100% | 100% | 100% | 100% |
| P. japonica | Pupa | 90% | 91% | 85% | 99% | 96% | 97% |
| P. japonica | Adult insect | 100% | 100% | 100% | 100% | 100% | 100% |
| H. axyridis | Young larva | 100% | 100% | 100% | 100% | 100% | 100% |
| H. axyridis | Older larva | 94% | 98% | 99% | 99% | 97% | 100% |
| H. axyridis | Pupa | 100% | 99% | 100% | 100% | 100% | 100% |
| H. axyridis | Adult insect | 98% | 98% | 98% | 100% | 98% | 100% |
| C. septempunctata | Young larva | 100% | 100% | 100% | 72% | 100% | 100% |
| C. septempunctata | Older larva | 92% | 98% | 87% | 91% | 88% | 91% |
| C. septempunctata | Pupa | 100% | 99% | 100% | 100% | 100% | 100% |
| C. septempunctata | Adult insect | 100% | 98% | 100% | 100% | 98% | 100% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cao, J.; Xian, X.; Qiu, M.; Li, X.; Wei, Y.; Liu, W.; Zhang, G.; Jiang, L. Automatic Potato Crop Beetle Recognition Method Based on Multiscale Asymmetric Convolution Blocks. Agronomy 2025, 15, 1557. https://doi.org/10.3390/agronomy15071557


