Papaver somniferum and Papaver rhoeas Classification Based on Visible Capsule Images Using a Modified MobileNetV3-Small Network with Transfer Learning

Traditional identification methods for Papaver somniferum and Papaver rhoeas (PSPR) are time-consuming and labor-intensive, require strict experimental conditions, and usually damage the plant. This work presents a novel method for fast, accurate, and nondestructive identification of PSPR. First, to fill the gap in PSPR data resources, we construct a PSPR visible capsule image dataset. Second, we propose a modified MobileNetV3-Small network with transfer learning, which addresses the low classification accuracy and slow model convergence caused by the small number of PSPR capsule image samples. Experimental results demonstrate that the modified MobileNetV3-Small is effective for fast, accurate, and nondestructive PSPR classification.


Introduction
The private cultivation of Papaver somniferum is illegal in many countries because its extracts can be turned into addictive and poisonous opioids. However, because of the huge profits, the illegal cultivation of Papaver somniferum occurs all over the world. The appearance of Papaver somniferum is similar to that of its relatives, such as the ornamental plant Papaver rhoeas, frequently leading to mistaken identification reports from civilians engaged in anti-drug work. This paper seeks to develop a fast, accurate, and non-destructive identification method for Papaver somniferum and its close relatives (represented by Papaver rhoeas) to improve civilians' ability to distinguish between them, thereby effectively assisting the police in drug control work. It also provides model support for the development of Papaver somniferum identification systems on mobile terminals.
Papaver somniferum is traditionally identified by methods including direct observation, physical and chemical property identification, and spectral analysis. Zhang et al. [1] employed a discrete stationary wavelet transform to extract characteristics from Fourier transform infrared spectroscopy data to identify Papaver somniferum and Papaver rhoeas (PSPR). Choe et al. [2] used metabolite spectral analysis to identify Papaver somniferum, Papaver rhoeas, and Papaver setigerum. Wang et al. [3] used specific combinations of characteristic wavelength points to distinguish between Papaver somniferum and non-poppy plants, proving that spectral properties can be used to identify Papaver somniferum. Li [4] used a fluorescent complex amplification test that contained three simple sequence repeats to achieve the precise detection of Papaver somniferum and its relatives.
The above methods have limitations that render them unsuitable for the identification of PSPR by ordinary people in daily life. Direct observation, for example, is time-consuming and labor-intensive, and observers must be familiar with the characteristics of these plants. The other approaches require stringent experimental conditions and tedious operations, and they usually damage the plant.
DCNNs can show superior performance only when there are enough training samples; when training samples are insufficient, they are prone to overfitting and to becoming trapped in local optima [9]. Because Papaver somniferum cultivation is strictly controlled by the government, it is difficult to obtain a large number of Papaver somniferum capsule images, and because there is no publicly available PSPR capsule dataset, we could only rely on an Internet image search to build our experimental dataset, which results in a small sample. Transfer learning is a useful machine learning method that applies the knowledge or patterns learned in one domain or task to a different but related domain or problem. By transferring the parameters of a neural network trained on a large image dataset to a target model, existing feature extraction capabilities can be leveraged to accelerate and optimize learning, enabling models with higher recognition accuracy to be trained on smaller samples [28]. Transfer learning can effectively improve the accuracy and robustness of a model and has been widely used in text processing [29-31], image classification [32-34], collaborative filtering [35-37], and artificial intelligence planning [38,39].
MobileNetV3 has the advantages of high classification accuracy and a fast classification speed, and it can better balance efficiency and accuracy for image classification on mobile devices. We propose a new classification model, P-MobileNet, based on an improved MobileNetV3 network with transfer learning from ImageNet. This study provides a new solution for fast, accurate, and nondestructive identification of PSPR for ordinary people, and it can be extended to identify any relatives of Papaver somniferum.
The main contributions of this paper are as follows:
• A database of 1496 Papaver somniferum capsule images and 1325 Papaver rhoeas capsule images is established;
• The structure of the MobileNetV3 network is improved to reduce the number of parameters and amount of computation, achieving fast, convenient, accurate, and non-destructive identification of PSPR;
• The effectiveness of data expansion and transfer learning for model training is experimentally verified, and the influence of different transfer learning methods on the model is compared;
• The improved MobileNetV3 model combined with transfer learning solves the problem of low classification accuracy and slow model convergence due to the small number of PSPR capsule image samples, and it improves the robustness and classification accuracy of the proposed classification model.

Data
It is difficult to take images of Papaver somniferum capsules in the field because its cultivation is strictly controlled by the government. Therefore, all data for this experiment were collected from an Internet search: 2821 images in total, comprising 1496 images of Papaver somniferum capsules and 1325 images of Papaver rhoeas capsules. The collected images were taken under different angles and lighting conditions and cover all growth and development stages of the capsule stage (flowering-fruiting, fruiting, and seed-drop), as shown in Figure 1. Note that the capsule images in the dataset are not of the same size and are resized consistently in Figure 1 for aesthetics. The maximum and minimum image sizes in the capsule dataset are 624 × 677 pixels and 27 × 35 pixels, respectively.
The establishment of the PSPR capsule image dataset can be divided into the following steps:
1. First, the dataset was mixed, scrambled, and separated into training, validation, and testing data at a ratio of 8:1:1;
2. To improve the model's feature-extraction and generalization ability and avoid the overfitting and low classification accuracy caused by a small sample dataset, the capsule image training set was expanded using common data expansion methods in deep learning [28]: horizontal mirroring, vertical mirroring, and rotation by 90, 180, and 270 degrees, as shown in Figure 2. As in Figure 1, the capsule images in Figure 2 are resized to a consistent size. The expanded training set includes 7170 Papaver somniferum capsule images and 6366 Papaver rhoeas capsule images;
3. Finally, all images were resized to 224 × 224 pixels to match the model's input size.
The process flow of the establishment of the capsule image dataset is shown in Figure 3.
Figure 3. Establishment of the capsule image dataset of PSPR.
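The expansion step above can be sketched with NumPy (the `expand` helper is hypothetical; the paper does not describe its implementation). Each original training image yields five augmented copies in addition to itself:

```python
import numpy as np

def expand(image: np.ndarray) -> list:
    """Return the five augmented copies used to expand the training set:
    horizontal mirror, vertical mirror, and 90/180/270-degree rotations."""
    return [
        np.fliplr(image),      # horizontal mirroring
        np.flipud(image),      # vertical mirroring
        np.rot90(image, k=1),  # rotate by 90 degrees
        np.rot90(image, k=2),  # rotate by 180 degrees
        np.rot90(image, k=3),  # rotate by 270 degrees
    ]
```

Applied to the training split only (the validation and test splits stay unexpanded), this keeps evaluation images independent of their mirrored or rotated variants.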

Basic MobileNetV3-Small
MobileNetV3, as part of a new generation of lightweight networks, builds on MobileNetV1 and MobileNetV2 by combining depthwise separable convolution and an inverted residual structure with a linear bottleneck to improve computational efficiency and effectively extract feature information. It uses platform-aware Neural Architecture Search [40] and Neural Network Adaptation [41] to optimize the network structure and parameters. A Squeeze-and-Excite (SE) [24] channel attention module further improves network performance and operational efficiency. Figure 4 shows the MobileNetV3 structure.
MobileNetV3 comes in two versions, MobileNetV3-Small and MobileNetV3-Large, with similar architectures but different complexity to suit different scenarios. MobileNetV3-Small is suitable for low-performance mobile and embedded devices. Considering computational cost and model efficiency, we use MobileNetV3-Small as the basic framework of the PSPR classifier and improve its network structure.

Construction of Network for Papaver Somniferum Identification
We propose the P-MobileNet model, based on transfer learning and a modified MobileNetV3-Small, to lower the model's data requirements while improving operational efficiency. Figure 5 shows the P-MobileNet model structure, which consists of a MobileNetV3-Small model pre-trained on the ImageNet dataset and a modified MobileNetV3-Small model.

Transfer Learning
DCNNs often fail to achieve high prediction performance on small sample datasets, where they are prone to training difficulty and overfitting [9], and it is sometimes difficult to obtain a large amount of labeled data. Transfer learning is an efficient strategy for solving image classification problems with small samples [32-34,42,43].
There are two main approaches for applying a pre-trained DCNN to a new image classification task [9,44]. One, called transfer learning method 1 (TL_M1), is to freeze all the weights of the convolutional layers of the pre-trained model and use them as fixed feature extractors [9,45,46]; fully connected layers are then added and trained on the new sample dataset. The other, called transfer learning method 2 (TL_M2), is to initialize the target model with the weights of the pre-trained model and then fine-tune the network weights by training on the new sample dataset [9,47,48]. The impact of transfer learning on the model is examined experimentally in Section 4.3.
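The two strategies differ only in which parameters receive gradients. A minimal PyTorch sketch, using a toy stand-in backbone rather than the actual ImageNet-pre-trained MobileNetV3-Small:

```python
import torch.nn as nn

def make_backbone() -> nn.Module:
    # toy stand-in for the pre-trained convolutional feature extractor
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )

def setup_transfer(backbone: nn.Module, num_classes: int, freeze: bool) -> nn.Module:
    """freeze=True  -> TL_M1: backbone weights act as fixed feature extractors.
    freeze=False -> TL_M2: all weights remain trainable and are fine-tuned."""
    for p in backbone.parameters():
        p.requires_grad = not freeze
    head = nn.Linear(8, num_classes)  # new classification layer, always trained
    return nn.Sequential(backbone, head)
```

With `freeze=True`, the optimizer updates only the new head; with `freeze=False`, every layer is fine-tuned from the pre-trained initialization.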


Modified MobileNetV3-Small Model
MobileNetV3-Small performed well on the challenging 1000-class classification task on ImageNet. For our binary identification task, however, such a deep network imposes excessive computational cost and slows classification. Consequently, after analyzing the network configuration, we modified the architecture of MobileNetV3-Small to improve efficiency without degrading performance. The kernel size of the depthwise convolution in the last bottleneck layer of the original MobileNetV3-Small model is reduced from 5 × 5 to 3 × 3 to lower the computation and latency of feature extraction. The last two 1 × 1 convolution layers, responsible for extrapolation and classification, are merged into one layer to reduce the number of parameters. These changes significantly reduce the number of model parameters and the computational burden while maintaining accuracy. Table 1 shows the network structure of the proposed P-MobileNet model.
The columns in Table 1 are as follows: (1) Input is the size of the feature map input to each layer of MobileNetV3; (2) Operator is the layer structure each feature map passes through; (3) Exp size is the number of channels after expansion in the inverted residual structure of the bottleneck; (4) Out is the number of channels in the feature map output by the bottleneck; (5) SE indicates whether the SE attention mechanism is introduced at this layer; (6) NL is the type of activation function used, HS (h-swish) or RE (ReLU); and (7) S is the stride used in each layer.
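The saving from the kernel modification is easy to quantify: a depthwise convolution learns k × k weights per channel, so shrinking the kernel from 5 × 5 to 3 × 3 removes 64% of that layer's depthwise weights. The channel width of 576 below is the expanded width of the last MobileNetV3-Small bottleneck and is assumed here for illustration:

```python
def depthwise_params(kernel_size: int, channels: int) -> int:
    # a depthwise convolution learns kernel_size * kernel_size weights per channel
    return kernel_size * kernel_size * channels

CHANNELS = 576  # assumed expanded width of the last bottleneck
original = depthwise_params(5, CHANNELS)  # 5 x 5 kernel
modified = depthwise_params(3, CHANNELS)  # 3 x 3 kernel
saving = 1 - modified / original          # fraction of depthwise weights removed
```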

Experimental Environment
The configuration used for model training and testing in this paper is as follows: Intel Core i5-10210U CPU @ 1.60 GHz/2.11 GHz; 16 GB RAM; Nvidia GeForce MX250 graphics card; Windows 10 Home Chinese version; CUDA version 10.1; and PyTorch with Python 3.8.

Evaluation Indicators
The model was evaluated based on accuracy, precision (P), recall (R), F1, number of parameters, computation (measured using FLOPs), weight file size, and average prediction time for a single image. The task of PSPR is a binary classification problem, and we define Papaver somniferum as the positive class and Papaver rhoeas as the negative class.
Accuracy, precision, recall, and F1 are defined as follows [6,49]: accuracy is the proportion of correct predictions over all samples; precision is the proportion of samples predicted positive that are truly positive; recall is the proportion of all positive samples that are correctly predicted; and F1 is the harmonic mean of precision and recall [25].
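These definitions can be written out directly from the binary confusion matrix (a minimal sketch; TP, FP, FN, and TN are the confusion-matrix counts, with Papaver somniferum as the positive class):

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts
    (Papaver somniferum = positive class, Papaver rhoeas = negative class)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1
```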

Experimental Design
The MobileNetV3-Small model trained on the ImageNet dataset was selected as the basic model and P-MobileNet was the target model. Six sets of experiments were conducted, combined with three learning methods (training from scratch, TL_M1, and TL_M2) and two data expansion methods (unexpanded data and expanded data).
Specifically, training from scratch means randomly initializing the weight parameters of all layers of the model, and the capsule image dataset is used to train the model, following which the back-propagation algorithm is used to tune its weights. In TL_M1, the pre-trained model's weights are used as fixed feature extractors and the linear classifiers are trained on the new sample dataset. To clarify, since the feature extraction layer structure of P-MobileNet is not identical to that of MobileNetV3-Small (the kernel size of the depthwise convolution of the last bottleneck layer of MobileNetV3-Small is modified from 5 × 5 to 3 × 3), the weight information of this layer is not passed from the pre-trained model but is trained from scratch together with the classification layer (which is a 1 × 1 convolutional layer without batch normalization in P-MobileNet). In TL_M2, the new sample dataset is used to fine-tune all layers of the model initialized by the weights of the pre-trained model (the weight information of the last bottleneck layer from the pre-trained model is ignored, as in TL_M1). This enables the model to learn highly generalizable features from a larger sample dataset, while the features are more relevant to the new classification task.
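Because the modified last bottleneck no longer matches the pre-trained shapes, the transfer must copy weights selectively. A sketch of such shape-aware loading in PyTorch (the helper name and skip logic are illustrative, not the paper's code):

```python
import torch.nn as nn

def transfer_matching_weights(pretrained: nn.Module, target: nn.Module) -> list:
    """Copy pre-trained weights into the target model wherever the parameter
    name and shape match; mismatched layers (e.g. the modified last
    bottleneck) keep their random initialization and are trained from scratch."""
    target_sd = target.state_dict()
    kept = {name: w for name, w in pretrained.state_dict().items()
            if name in target_sd and target_sd[name].shape == w.shape}
    target_sd.update(kept)
    target.load_state_dict(target_sd)
    return sorted(kept)  # names of the parameters actually transferred
```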
Regarding the data expansion methods, training under unexpanded data means the model is trained using the original capsule image dataset with 2821 images, while the other is training under the expanded capsule image dataset with 14,099 images, using the data expansion method described in Section 2.
Considering the computation and training time, the batch size for both testing and training was set to eight. The Adam optimizer was used with a learning rate of 0.0001, and the maximum number of training rounds was set to 120 epochs.

Experimental Results and Analysis
After 120 training epochs, a comparison of the performance of P-MobileNet under different learning methods and data expansion methods is shown in Table 2. In addition to the accuracy, precision, recall, and F1 values on the testing set, we also calculated the standard deviation (SD) of the training loss (train_loss) and of the validation accuracy (val_acc) to measure the volatility of the data.
1. Influence of different learning methods on model performance.
The train_loss curves and val_acc for the three learning methods are shown in Figures 6 and 7, respectively. In both cases, P-MobileNet trained from scratch had the slowest convergence rate with large fluctuations, and its loss function stabilized at a high value; the model with TL_M2 had the fastest convergence speed and the lowest loss value. The accuracy of P-MobileNet trained from scratch was the lowest and fluctuated greatly, while the accuracy of the models with transfer learning fluctuated less, with TL_M2 achieving the highest accuracy. The SD of val_acc for TL_M2 under unexpanded data was 3.354 percentage points lower than that for training from scratch.
The differences in performance between TL_M1 and TL_M2 were relatively small, but P-MobileNet with TL_M2 still held an advantage over TL_M1. From Table 2, the F1 value of TL_M2 was more than 1 percentage point higher than that of TL_M1, which shows that P-MobileNet with TL_M2 has higher recognition accuracy and robustness.
These results indicate that transfer learning effectively solved the problems of low classification accuracy and slow model convergence due to a small-sample dataset.

2.
Effect of data expansion on model performance.
The train_loss and val_acc for the expanded and unexpanded datasets under the three training methods are shown in Figures 8-10. For all three learning methods, the same general phenomenon was observed: the loss function of the model trained on the expanded capsule image dataset was lower and less volatile than that on the original dataset. From Table 2, for training from scratch, TL_M1, and TL_M2, the test accuracy under expanded data was 1.4, 0.7, and 0.3 percentage points higher, respectively, than when training on the original data, and the SD of val_acc under expanded data decreased by 2.924, 1.115, and 1.429 percentage points, respectively, indicating that data expansion could improve the classification accuracy and robustness of the model.
It could also be observed that data expansion did more to improve accuracy and avoid overfitting for the model trained from scratch than for the models with transfer learning. This is mainly because the pre-trained model had already learned a large amount of knowledge from a large image dataset, weakening the role of data expansion.
In any case, the accuracy and robustness of the model were improved by different magnitudes on the expanded capsule image dataset, regardless of the learning strategy, indicating that the data expansion provided the necessary amount of data for model training and that a certain size of dataset is still necessary.
Based on these results, the expanded capsule image dataset and TL_M2 were used to train P-MobileNet in the subsequent experiments.

Comparison of Classification Networks
To verify the effectiveness of P-MobileNet for PSPR identification, we compared various DCNNs on the self-constructed PSPR capsule image dataset (including the expanded training data and the unexpanded validation and test data), with a total of 14,099 images. The models included representative traditional CNNs (AlexNet, GoogLeNet, ResNet-34) and popular lightweight networks. All models were trained with transfer learning. Classification results were compared in terms of accuracy, precision, recall, F1, number of parameters, FLOPs, weight file size, and average prediction time for a single image on the testing set, as shown in Table 3. Table 3 further illustrates that the traditional network models could not meet the requirements for mobile deployment because of their enormous computational demands. Lightweight networks tend to have far fewer parameters and FLOPs than traditional networks, with comparable or even better performance. Among the lightweight models, SqueezeNet had the fewest parameters and the smallest model size but the lowest accuracy and recall rates, 96.2% and 94.7%, respectively. ShuffleNetV2 outperformed SqueezeNet, with only 2.28 M parameters but the largest FLOPs among the lightweight models, at 148.8 M. The performance of GhostNet and MobileNetV3 exceeded that of ResNet-34.
MobileNetV3 performed best. The number of parameters, amount of computation, and model size of MobileNetV3-Small were much smaller than those of MobileNetV3-Large, while they showed similar performance at PSPR classification, which further indicates the redundancy of the MobileNetV3 model for this task. Compared with MobileNetV3-Small, the recall of P-MobileNet increased by 0.8 percentage points, and the F1 value was the same, at 98.9%. However, P-MobileNet had only 36% of the parameters of MobileNetV3-Small, and it used less calculation. The model was only slightly larger than SqueezeNet, and the prediction speed was the fastest.
We compared the performance of the models based on F1 and the number of parameters, as shown in Figure 11, where the horizontal axis is the number of parameters and the vertical axis is F1. P-MobileNet had the highest F1 with the fewest parameters. Based on these results, P-MobileNet best balanced accuracy and efficiency for the PSPR classification task, with a classification accuracy of 98.9% and an average prediction time of 45.7 ms for a single image, which is better than the other tested models.

Conclusions
The appearance of Papaver somniferum is similar to that of Papaver rhoeas, increasing the difficulty of its identification. Traditional methods of Papaver somniferum identification, including direct observation, physical and chemical property identification, and spectral analysis, cannot be applied to drug-related cases and Papaver somniferum identification in daily life. To solve these problems, we proposed the P-MobileNet model for PSPR classification, based on the improved MobileNetV3-Small with transfer learning.

• Compared with training from scratch, transfer learning fully utilized the knowledge learned on large datasets, significantly accelerated the convergence of the model, and improved classification performance. Regardless of the transfer learning method adopted, pre-training and fine-tuning P-MobileNet had a greater impact than training P-MobileNet from scratch; the feature extraction ability of a randomly initialized model is not good enough under a small sample dataset;
• The impact of data expansion on the model trained from scratch was greater than on the models with transfer learning. Data expansion enriched the diversity of the data, which helped mitigate overfitting and improved the classification performance of the model. Although transfer learning weakened the effect of data expansion, a certain amount of training set expansion was still necessary to improve the robustness of the model;
• Analysis of the classification performance of different models showed that the proposed P-MobileNet model has the advantages of high classification accuracy, few parameters, and fast detection speed. Compared with MobileNetV3-Small, P-MobileNet maintains a high classification accuracy of 98.9% with only 36% of the parameters; the FLOPs are reduced by 2 M; and the detection speed is improved to 45.7 ms/image. This study provides a means to achieve the rapid, accurate, and non-destructive identification of PSPR on mobile terminals.