Integrated Analysis of Machine Learning and Deep Learning in Silkworm Pupae (Bombyx mori) Species and Sex Identification

Simple Summary
Identifying silkworm pupae species and sex accurately is essential for the hybrid pairing of corresponding species in sericulture, which guarantees the quality of the silkworm eggs and silk. However, no cost-effective method currently offers a labor-saving and intelligent solution for this task. In this study, machine learning and deep learning are used for the automatic recognition of pupae species and sex, either separately or simultaneously, based on the patterns perceived from images. A large number of postural images of pupae were used for global modeling to eliminate the impact of posture on the recognition rate. Six traditional descriptors and six deep learning descriptors were employed for feature extraction and then combined with three machine learning classifiers for identification. On this basis, the model with the best identification performance was screened out, and it can serve as a reference for sericulture breeding.

Abstract
Hybrid pairing of the corresponding silkworm species is a pivotal link in sericulture, ensuring egg quality and directly influencing silk quantity and quality. Considering the potential of image recognition and the impact of varying pupal postures, this study used machine learning and deep learning for global modeling to identify pupae species and sex separately or simultaneously. The performance of traditional feature-based approaches, deep learning feature-based approaches, and their fusion approaches was compared. First, 3600 images of the back, abdomen, and side postures of 5 species of male and female pupae were captured. Next, six traditional descriptors, including the histogram of oriented gradients (HOG), and six deep learning descriptors, including ConvNeXt-S, were utilized to extract significant species and sex features. Finally, classification models were constructed using the multilayer perceptron (MLP), support vector machine, and random forest. The results indicate that the {HOG + ConvNeXt-S + MLP} model excelled, achieving 99.09% accuracy for separate species and sex recognition and 98.40% for simultaneous recognition, with areas under the precision–recall and receiver operating characteristic curves ranging from 0.984 to 1.0 and from 0.996 to 1.0, respectively. In conclusion, this model can capture subtle distinctions between pupal species and sexes and shows promise for extensive application in sericulture.


Introduction
As the birthplace of sericulture and the world's foremost silk producer and exporter, China has leveraged the industry to boost rural revitalization and sericulturists' earnings. For example, sericulturists in Guangxi Province, China, earned CNY 20.818 billion from cocoon sales in 2021 [1].
Animals 2023, 13, 3612
In addition, sericulture impacts the global economy and culture considerably. Sorting silkworm pupae by sex affects the hybridization rate of eggs and the market competitiveness of silk [2]. However, various factors, such as production seasonality, the brief pupal duration, and rising labor costs, impose challenges on the sorting process [3].
Since the last century, researchers have been developing nondestructive and reliable sex sorting methods. Initially, biologists developed silkworm species with sex-linked traits to ease pupal sex sorting [4,5]; however, these represent only a small fraction of the species. DNA and amino acid analyses can determine sex but are destructive [6,7]. Recently, powerful spectral and visual techniques have been proposed to identify pupal sexes, including magnetic resonance imaging (MRI) [8], X-ray imaging [9], hyperspectral imaging (HSI) [10,11], near-infrared (NIR) spectroscopy [12], and image recognition [13]. Among these, NIR is frequently used for sex determination based on differences in morphology, gonadal traits such as eggs, and water content between male and female pupae [14][15][16][17][18]. To address sericulture challenges in regions such as Shandong, China, Zhu et al. developed an NIR system with 97.5% accuracy and a sorting rate of 7.7 pupae per second [19]. However, NIR requires frequent manual modeling, and the relatively high cost of its components limits its applicability.
Leveraging the stable genetic traits of pupae, low-cost image recognition has shown significant promise in sex identification. Kamtongdee et al. and Tao et al. identified pupal sex based on gonadal traits through image analysis [20][21][22][23]. However, these methods require precise pupa positioning, making them less suitable for production lines. Methods using pupal appearance can overcome this drawback. Liang et al. achieved a 98% sex recognition rate using machine learning [24], and Yu et al. obtained 97% accuracy using deep learning [25]. However, these studies did not address the impact of varying pupal postures on feature extraction, and large datasets are required to train such models to ensure practical feasibility.
As the field advances, research on pupae species identification has been conducted alongside studies on sex identification [26,27]. Identifying pupal species provides an objective basis for silkworm identity, thereby reducing species mix-ups in breeding and ensuring correct hybridization of the corresponding species. However, there is no reliable and cost-effective method that offers an intelligent and labor-saving solution for this. In response, this study combines image recognition with machine learning and deep learning techniques to develop a model that identifies pupae species and sex, either separately or simultaneously, to meet the needs of silkworm breeding factories. Through global modeling of the pupal back, abdomen, and side postures, the proposed method effectively addresses interference caused by changing pupal positions [19,28].
Figure 1 shows a flowchart of the proposed method. First, images of pupae from both sexes across five species were captured. Then, six traditional descriptors (self-designed pupal shape feature (SD-PSF), Hu moments, histogram of oriented gradients (HOG), equivalent local binary pattern (LBP), gray-level co-occurrence matrix (GLCM), and color histogram) and six deep learning descriptors (VGG16, ResNet101, DenseNet169, MobileNetV3-L, RegNetX-8GF, and ConvNeXt-S) were used to extract features from the pupae images. The extracted features were then input to three classifiers, namely the multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF), to perform the recognition task.
In summary, this paper makes the following contributions: (1) An economical and intelligent solution for sericulture breeding is proposed. (2) A global modeling approach is proposed that, using the constructed pupae image set, addresses posture-related identification challenges. (3) The {HOG + ConvNeXt-S + MLP} model with the top recognition rate was screened out based on an integrated analysis. (4) The advantages and disadvantages of the proposed model are compared with those of other techniques.

Sample Preparation
Live pupae from five species bred in autumn 2022 were sourced from the Institute of Sericulture Science and Technology Research in Chongqing, China, including the 7532 and 872 species from the Japanese system and the 871, FuRong, and HaoYue species from the Chinese system. Skilled workers then selected 120 female and 120 male pupae from each species based on the gonadal texture of their tails.

Image Acquisition and Data Partitioning
To capture pupae images, a system was configured using a digital single-lens reflex camera (D90, Nikon, Tokyo, Japan), paired with a zoom lens (AF-S NIKKOR 18-105 mm f/3.5-5.6 G ED, Nikon, Tokyo, Japan) mounted on a tripod (CVT-999RM, YunTeng, Zhongshan, China). The shooting distance from the lens to the pupa was 34 cm. Under indirect natural light, images were acquired with an F5.6 aperture, a sensitivity of 100, and automatic exposure and white balance. The captured images were saved in JPEG format with a resolution of 2848 × 4288 pixels.

Image acquisition for each species was conducted between the 9th and 11th days of the pupal stage. As shown in Figure 2, each pupa was imaged from the back, abdomen, and side views; therefore, a dataset of 3600 images was collected, representing male and female pupae from 5 species (10 classes with 360 images each). Given the significant differences in pupae weight by species and sex [13,29], weight served as the basis for data partitioning. For traditional approaches, the datasets were split at a ratio of 8:2 for training and testing. For deep learning and fusion approaches, the split was 8:1:1 for the training, calibration, and testing sets.
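The text does not spell out how weight drives the 8:1:1 partition; one plausible reading, assumed here for illustration, is to sort each class's samples by weight and deal them cyclically into the subsets so that every split spans the full weight range. A minimal sketch (the function name and the dealt-cycle scheme are our own, not the authors'):

```python
import numpy as np

def weight_stratified_split(weights, ratios=(8, 1, 1)):
    """Partition sample indices so each subset covers the full weight range.

    Sort samples by weight, then deal them into train/calibration/test in
    proportion to `ratios` (8:1:1 here), cycling through the sorted order so
    every subset contains light, medium, and heavy pupae.
    """
    order = np.argsort(weights)              # indices from lightest to heaviest
    total = sum(ratios)
    splits = [[] for _ in ratios]
    for pos, idx in enumerate(order):
        slot = pos % total                   # position within one 8:1:1 cycle
        if slot < ratios[0]:
            splits[0].append(idx)            # training
        elif slot < ratios[0] + ratios[1]:
            splits[1].append(idx)            # calibration
        else:
            splits[2].append(idx)            # testing
    return [np.array(s) for s in splits]

# Example: 360 images of one class -> 288 / 36 / 36
rng = np.random.default_rng(0)
train, cal, test = weight_stratified_split(rng.normal(1.0, 0.1, 360))
```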

Image Preprocessing
To precisely separate the pupa from the background, the image preprocessing flow was based on sampled images and involved graying, binarization, image inversion, a morphological open operation, and contour detection. Figure 3 illustrates these operations, during which an area threshold was applied to all identified contours to reduce interference from molts shed during image acquisition. After the pupa was identified, each image was cropped to 320 × 320 pixels (centered on the pupa's center of mass) and then passed on for feature extraction.
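The preprocessing chain can be sketched end to end. The sketch below uses SciPy rather than whatever toolkit the authors used, and the binarization threshold and area limit are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def segment_and_crop(rgb, area_min=5000, size=320):
    """Isolate the pupa and crop a window centered on its mass.

    Mirrors the described flow: graying -> binarization (a simple mean
    threshold here; the paper's threshold is not given) -> inversion (the
    pupa is darker than the background) -> open operation -> region
    detection with an area filter (rejects molt debris) -> 320x320 crop.
    """
    gray = rgb.mean(axis=2)                                 # graying
    binary = ~(gray > gray.mean())                          # binarize + invert
    binary = ndimage.binary_opening(binary, iterations=2)   # open operation
    labels, n = ndimage.label(binary)                       # connected regions
    if n == 0:
        return None
    areas = ndimage.sum(binary, labels, np.arange(1, n + 1))
    keep = int(np.argmax(areas)) + 1                        # largest region
    if areas[keep - 1] < area_min:                          # area threshold
        return None
    cy, cx = ndimage.center_of_mass(labels == keep)
    y0 = int(np.clip(cy - size // 2, 0, rgb.shape[0] - size))
    x0 = int(np.clip(cx - size // 2, 0, rgb.shape[1] - size))
    return rgb[y0:y0 + size, x0:x0 + size]

# Synthetic check: a dark 200x200 "pupa" on a bright 600x800 background
img = np.full((600, 800, 3), 220.0)
img[200:400, 300:500] = 30.0
crop = segment_and_crop(img)
```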



Traditional Feature-Based Approaches
Features function as distinctive attributes for classifying input patterns. Traditional features cover the basic visual attributes, including shape, texture, color, and transform-based features [30]. Herein, six traditional descriptors were employed to extract features: SD-PSF, HOG, Hu moments, equivalent LBP, GLCM, and color histogram (Table 1).
Of these, the equivalent LBP yields 59 features and relies on numerical leaps in the ordering of differential values between the center pixel and its neighbors, while the color histogram yields 768 features, with hue, saturation, and intensity each quantized to 256 bins.
The SD-PSF is derived from the pupal contour, and it includes parameters such as perimeter, area, and dispersity, as well as the minimum, average, and maximum radii, and the radius ratio. It also covers the major and minor axes, aspect ratio, rectangularity, circularity, and compactness by fitting the contour to shapes, e.g., rectangle, circle, and ellipse. In addition, the minor axis at K/24 of the pupal major axis (KMM, where K ranges from 1 to 23) represents the change in the contour's curvature. As shown in Figure 4, the pupal contour can be placed horizontally using its major axis and the eccentricity of its minimum-fit ellipse. Furthermore, the KMM can be calculated using its straight-edge bounding rectangle. The pupal head and tail are determined by comparing TL1 and TL2; if TL2 > TL1, the pupal contour is flipped to maintain a consistent order for feature extraction. HOG is a descriptor characterizing an image's local gradient direction and intensity, and it is robust to illumination shifts and invariant to local geometric and photometric transformations [31]. Hu moments, geometrically invariant moments formed from second- and third-order central moments, excel in pattern recognition and image matching because of their robustness against zooming, translation, rotation, and mirroring [32].
GLCM, which is frequently employed in statistical texture analysis, extracts texture features from a gray-level co-occurrence matrix and captures an image's details, including direction, interval, change range, and speed [10]. LBP operators, in turn, highlight the local image texture structure and resist gray-scale variations [33]. Among them, the equivalent LBP takes advantage of numerical variations in the sequence of differential values between a central pixel and its surrounding pixels.
Color histograms serve as statistical representations of the frequency distribution of color intensity levels in images [30]. The color histogram, which decouples color information from grayscale and characterizes images based on hue, saturation, and intensity, is particularly well-suited for machine vision research [34].
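As a concrete example of one traditional descriptor, the 768-dimensional color histogram from Table 1 (hue, saturation, and intensity each quantized to 256 bins) can be sketched as follows; the per-channel normalization is our assumption, and the input is assumed already converted to HSV with channels scaled to 0–255:

```python
import numpy as np

def hsv_color_histogram(hsv):
    """Color-histogram descriptor: hue, saturation, and intensity each
    quantized to 256 bins and concatenated -> 768 features per image."""
    feats = []
    for ch in range(3):
        hist, _ = np.histogram(hsv[..., ch], bins=256, range=(0, 256))
        feats.append(hist / hist.sum())   # normalize each channel (assumption)
    return np.concatenate(feats)

# Toy 320x320 "HSV image" with integer channel values in 0..255
rng = np.random.default_rng(1)
vec = hsv_color_histogram(rng.integers(0, 256, (320, 320, 3)))
```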

Deep Learning Feature-Based Approaches
CNNs can extract features directly from images through weight sharing and convolutional processes [35]. Table 2 describes the six CNN architectures explored in this study: VGG16 [36], ResNet101 [37], DenseNet169 [38], MobileNetV3-L [39], RegNetX-8GF [40], and ConvNeXt-S [41]. Figure 5 shows the research framework for the deep learning approaches. Initially, data augmentation was employed to enhance training performance and prevent model overfitting [42]. This process included transformations such as flips, translations, rotations, and brightness adjustments, mimicking the variation encountered when photographing the pupae. Transfer learning was used to expedite model convergence and boost accuracy [43]. In particular, weights pretrained on ImageNet were used to initialize each CNN model, followed by fine-tuning at a lower learning rate (LR). For both the training and validation sets, the batch size was set to 64. During CNN training, stochastic gradient descent with a momentum of 0.9 and cross-entropy were employed as the optimizer and loss function, respectively. A dynamic LR strategy was used, starting with an initial LR of 0.0001 and reducing it by a factor of 0.8 every five epochs. As shown in Figure 6, early stopping was employed to handle overfitting and halt training early. Herein, convergence was considered to begin once the training loss fluctuated by less than 0.001 over five epochs; if the validation loss then shifted by less than 0.005 over another five epochs, the model was considered fully converged, leading to early termination. The best model parameters were saved at peak validation accuracy.
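The dynamic LR schedule and the two-stage early-stopping rule can be sketched framework-agnostically; the window handling below is one reasonable reading of the description, not the authors' exact code:

```python
def learning_rate(epoch, base_lr=1e-4, factor=0.8, step=5):
    """Dynamic LR: start at 1e-4 and multiply by 0.8 every 5 epochs."""
    return base_lr * factor ** (epoch // step)

def fully_converged(train_losses, val_losses, window=5):
    """Two-stage early stopping: convergence 'begins' once the training loss
    fluctuates by < 0.001 over five epochs; training then halts when the
    validation loss shifts by < 0.005 over another five epochs."""
    def stable(seq, tol):
        return len(seq) >= window and max(seq) - min(seq) < tol
    # stage 1: training loss stable over the five epochs before the final window
    # stage 2: validation loss stable over the most recent five epochs
    return (stable(train_losses[-2 * window:-window], 0.001)
            and stable(val_losses[-window:], 0.005))
```

In a real training loop, `learning_rate(epoch)` would be assigned to the optimizer at the start of each epoch, and `fully_converged` would be checked after each validation pass.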

Identification
In this step, the features extracted from the pupa image are input into a classifier for identification. The classifiers include MLP, SVM, and RF, and their parameters and settings are shown in Table 3. Herein, the traditional feature-based approach determines the optimal model hyperparameters through a grid search based on the highest classifier accuracy in five-fold cross-validation. The hyperparameters under consideration include the hidden layer size of the MLP, the penalty factor and kernel function parameter of the SVM, and the number of decision trees in the RF.
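A minimal sketch of this hyperparameter selection using scikit-learn's GridSearchCV, shown for the SVM; the grid values and synthetic data are placeholders, not the paper's settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy feature matrix standing in for the extracted pupa features
X, y = make_classification(n_samples=200, n_features=20, n_classes=2,
                           random_state=0)

# Grid search driven by five-fold cross-validated accuracy, as described.
search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10],          # penalty factor (placeholder grid)
                "gamma": ["scale", 0.01]},  # kernel parameter (placeholder grid)
    scoring="accuracy",
    cv=5,                                   # five-fold cross-validation
)
search.fit(X, y)
best = search.best_params_                  # hyperparameters with top CV accuracy
```

The MLP (hidden layer size) and RF (number of trees) searches follow the same pattern with their own `param_grid`.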

Performance Evaluation
The performance of the proposed model was evaluated based on accuracy, precision, recall, and F1-score. Identifying pupae species and sex involves multi-class classification; therefore, the arithmetic mean of each metric over all classes was used, as described in Equations (1)–(4).
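Equations (1)–(4) are not reproduced here, but they correspond to accuracy plus the macro (arithmetic-mean) averages of per-class precision, recall, and F1-score, which can be sketched from a confusion matrix:

```python
import numpy as np

def macro_metrics(cm):
    """Accuracy and macro-averaged precision, recall, and F1-score from a
    confusion matrix `cm` (rows = true class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)          # per class: TP / (TP + FP)
    recall = tp / cm.sum(axis=1)             # per class: TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),       # arithmetic mean over classes
        "recall": recall.mean(),
        "f1": f1.mean(),
    }

# Toy 3-class confusion matrix
m = macro_metrics([[9, 1, 0], [0, 10, 0], [1, 0, 9]])
```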
For hybrids of two descriptors that include {ConvNeXt-S}, the MLP achieved the highest recognition rates for {Species, Sex, Species + Sex} using {ResNet101 + ConvNeXt-S} at {98.08%, 97.73%, 95.98%}, and the SVM reached {98.36%, 97.88%, 96.46%} with {DenseNet169 + ConvNeXt-S}. The RF obtained its highest recognition rates of {97.17%, 90.58%} for {Sex, Species + Sex} using {RegNetX-8GF + ConvNeXt-S}. However, these results show a decrease compared with those obtained using single descriptors. This indicates that deep learning performance relies not only on feature dimensions but also on the dataset scale and the interplay between descriptor structures. Moreover, descriptors combined with {MobileNetV3-L} were found to underperform in the classification tasks.

Fusion Approaches
Based on the above experimental results, this study selected the optimal traditional and deep learning descriptors from each classifier to conduct fusion experiments. Before the descriptors were concatenated, each descriptor's feature dimensions were compressed to 500. As shown in Table 6, the MLP with {HOG + ConvNeXt-S} reached recognition rates of {99.09%, 99.09%, 98.40%} for {Species, Sex, Species + Sex}, while the RF with {Color Histogram + RegNetX-8GF + ConvNeXt-S} obtained the lowest accuracy at {66.84%, 89.92%, 73.01%} for the same categories. Even though the accuracy of the {HOG + ResNet101 + ConvNeXt-S + MLP} model is lower than that of the {HOG + ConvNeXt-S + MLP} model, and the accuracy of the {HOG + DenseNet169 + ConvNeXt-S + SVM} model is lower than that of the {HOG + ConvNeXt-S + SVM} model, both can serve as references for more complex future classification tasks.
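The compress-then-concatenate fusion step can be sketched as follows. The paper does not name the dimensionality reducer, so PCA is assumed here, and the toy sizes stand in for the real descriptors and the 500-dimension target:

```python
import numpy as np
from sklearn.decomposition import PCA

def fuse(descriptor_blocks, k=500):
    """Fusion step: compress each descriptor's features to `k` dimensions,
    then concatenate the compressed blocks into one fused vector per sample.
    PCA is assumed; the paper does not name the reducer."""
    compressed = []
    for block in descriptor_blocks:
        k_eff = min(k, *block.shape)    # PCA cannot exceed n_samples/n_features
        compressed.append(PCA(n_components=k_eff).fit_transform(block))
    return np.hstack(compressed)

rng = np.random.default_rng(0)
# Toy stand-ins for a HOG block and a ConvNeXt-S block, 40 samples each
fused = fuse([rng.normal(size=(40, 1764)), rng.normal(size=(40, 768))], k=8)
```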
Table 7 compares the precision, recall, and F1-score of three models. For all models, precision outperforms recall, suggesting fewer false positives. Based on these results, the features extracted by the deep learning approaches exhibit superior performance compared with the traditional approaches. For example, even when the SVM uses {ConvNeXt-S} alone, its precision, recall, and F1-score (0.9772, 0.9767, and 0.9769) surpass those of {HOG + Color Histogram} (0.9468, 0.9458, and 0.9463). In addition, the {HOG + ConvNeXt-S + MLP} model achieved the highest precision, recall, and F1-score (0.9840, 0.9838, and 0.9839).
Figure 7 shows the precision–recall (PR) and receiver operating characteristic (ROC) curves used to visualize model performance. The areas under the PR and ROC curves for all classes in the {HOG + ConvNeXt-S + MLP} model range from 0.984 to 1.0 and from 0.996 to 1.0, respectively. Here, the areas under the PR and ROC curves for 871Female, 872Female, FuRongFemale, and HaoYueFemale all reached 1.0. This reflects the high reliability of machine learning and deep learning methods for recognizing pupae species and sex through images.
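The per-class areas under the PR and ROC curves are one-vs-rest quantities; a sketch of how they are computed (with synthetic scores, not the paper's predictions):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.preprocessing import label_binarize

rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, 120)               # 3 toy classes, 120 samples
scores = rng.random((120, 3))                  # toy per-class scores
scores[np.arange(120), y_true] += 1.0          # make the true class rank highest

# One-vs-rest: binarize the labels, then compute one PR area and one ROC area
# per class column, as plotted in the paper's Figure 7.
Y = label_binarize(y_true, classes=[0, 1, 2])
pr_auc = [average_precision_score(Y[:, c], scores[:, c]) for c in range(3)]
roc_auc = [roc_auc_score(Y[:, c], scores[:, c]) for c in range(3)]
```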


Comparison with Other Techniques
Compared with MRI [8], X-ray imaging [9], HSI [10], and NIR [19], the proposed global model has a lower cost; it is also nondestructive and nonradiative, unlike X-ray, DNA, and amino acid analyses [6,7]. The proposed model is based on stable genetic pupal traits unaffected by breeding conditions, thereby reducing the frequent remodeling that NIR requires. It can also mitigate the effects of pupal postures on feature extraction and bypass the precise positioning required by methods based on gonadal traits [21,23]. Furthermore, to the best of our knowledge, this is the first study to introduce global modeling to predict pupae species and sex, achieving 99.09% accuracy for separate identification and 98.40% for simultaneous recognition. Thus, the proposed global model is well suited for practical application in sericulture.

Conclusions
Based on a dataset of postural images of silkworm pupae, this study aimed to establish a global model that accurately identifies the species and sex of pupae, separately or simultaneously. Three machine learning classifiers (MLP, SVM, and RF) were combined with traditional descriptors (SD-PSF, HOG, Hu moments, equivalent LBP, GLCM, and color histogram) and deep learning descriptors (VGG16, ResNet101, DenseNet169, MobileNetV3-L, RegNetX-8GF, and ConvNeXt-S) to construct classification models. The performance of the traditional, deep learning, and fusion approaches was then evaluated and compared. The findings demonstrate that the {HOG + ConvNeXt-S + MLP} model achieved top recognition rates of 99.09%, 99.09%, and 98.40% for species, sex, and species + sex, respectively. Additionally, its precision, recall, and F1-score were 0.9840, 0.9838, and 0.9839, with all classes achieving areas under the PR and ROC curves exceeding 0.984 and 0.996, respectively. These results validate the effectiveness of machine learning and deep learning in recognizing the species and sexes of pupae through image analysis. Future research will collect a broader range of pupal images from different species and sexes to assess the proposed model's generalizability under varied conditions. This will include the development of a finer-grained neural network based on HOG and ConvNeXt, with the aim of improving the detection of subtle variations across species and sexes. Additional efforts will focus on modeling with datasets from diverse breeding batches to precisely identify stable hereditary phenotypic traits within individual pupal species.


Figure 1. Flowchart of the identification of silkworm pupae species and sex.


Figure 2. Examples of the back, abdomen, and side posture images of the five species of male and female pupae. 7532, 871, 872, FuRong, and HaoYue denote the five silkworm species.


Figure 4. The process of extracting the KMM. The black dashed line represents the horizontal line; the pink solid line and its length represent the KMM and its size. TL: total length.


Figure 5. The research framework for deep learning feature-based approaches.


Figure 8 further illustrates the classification efficacy of the {HOG + ConvNeXt-S + MLP} model more intuitively. The accuracy for all classes under this model ranges from 94.70% to 100%, with eight classes exceeding 98%. In particular, the classification accuracies for the FuRongFemale and HaoYueFemale classes are 100%, corresponding to the PR and ROC curve results. In addition, misclassification mainly occurs between pupae of different sexes within the same species and between those of the same sex across different species. On the one hand, the {HOG + ConvNeXt-S + MLP} model mostly misclassifies 7532Male as 7532Female (and vice versa). On the other hand, it predominantly misclassifies 872Male as 7532Male, 871Male, FuRongMale, or HaoYueMale.

Table 1. Number of features extracted by the six traditional descriptors and their descriptions. For the GLCM, co-occurrence matrices were computed at 0°, 45°, 90°, and 135°, followed by computing the angular moments, contrast, entropy, correlation, and inverse difference moments.


Table 2. Details of the six CNN architectures.

Table 6. Classification accuracy of the fusion approaches.

Table 7. Precision, recall, and F1-score of the selected descriptor and classifier combinations.