An Intelligent Sorting Method of Film in Cotton Combining Hyperspectral Imaging and the AlexNet-PCA Algorithm

Long-staple cotton from Xinjiang is renowned for its exceptional quality. However, it is susceptible to contamination with plastic film during mechanical picking. To address the difficult problem of removing film from seed cotton, a technique based on hyperspectral images and AlexNet-PCA is proposed to identify the colorless and transparent film in seed cotton. The method consists of black and white correction of hyperspectral images, dimensionality reduction of hyperspectral data, and training and testing of convolutional neural network (CNN) models. The key technique is to find the optimal way to reduce the dimensionality of the hyperspectral data, thus reducing the computational cost. The main innovation of the paper is the combination of CNNs and dimensionality reduction methods to achieve high-precision intelligent recognition of transparent plastic films. Experiments with three dimensionality reduction methods and three CNN architectures are conducted to find the optimal model for plastic film recognition. The results demonstrate that AlexNet-PCA-12 achieves the highest recognition accuracy and the best cost performance in dimensionality reduction. In the practical application sorting tests, the proposed method achieves a 97.02% removal rate of plastic film, which provides a modern theoretical model and an effective method for high-precision identification of foreign matter in seed cotton.


Introduction
Cotton plays an irreplaceable part in the livelihood of the general population. Xinjiang is China's largest producer of long-staple cotton. However, due to low rainfall and strong light, drip irrigation under plastic film is often adopted to boost yield, which makes the cotton prone to mixing with impurities such as plastic film during mechanical picking. In the spinning and weaving processes, residual film mixed with seed cotton can result in a significant number of flaws, which impact the strength and coloring effect of the yarn and lead to financial losses for the textile sector [1].
The existing mainstream cotton film removal processes include mechanical separation, electrostatic separation, and optical color separation. Whitelock D. P. et al. investigated major impurity removal equipment in the US cotton industry, in which a rotating spiked cylinder was utilized to eliminate large impurities from the seed cotton; these impurities were subsequently gathered in a separate box by means of a grid strip or screen [2]. Zhang et al. used computational fluid dynamics (CFD) to model the electrostatic separation of mechanically harvested cotton and residual plastic film by flying the experimental sample into an electric field at different speeds and applying different electric field forces [3]. With the increasing prevalence of machine vision, optical color separation has become a popular method for the intelligent classification of agricultural products. In a study conducted by Li et al., a machine vision system was utilized to gather information on the color, shape, and texture of foreign fibers in cotton, and the resulting data achieved a classification accuracy of 92.34% through a multi-class support vector machine (MSVM) [4].
However, mechanical classification is challenging in the aspect of assuring accuracy and small-size film classification. Electrostatic separation becomes unstable for long-term work because of environmental conditions. Optical color selection relies on color and form characteristics, making it challenging to effectively classify film which is colorless, transparent, or irregularly shaped. Hence, it is imperative to investigate a dependable technique for identifying transparent films in seed cotton.
Hyperspectral imaging combines advanced knowledge from multiple disciplines to achieve a perfect fusion of traditional two-dimensional imaging techniques and spectroscopy. Guo and Ma described the linear relationship between spectra and data by the partial least squares (PLS) method, which could realize the analysis of adulterated rice and the prediction of pork meat fatty acids [5,6]. Zhang, Jiang, et al. employed the support vector machine (SVM) in combination with shortwave infrared hyperspectral techniques for cotton foreign matter classification, which significantly improved the detection rate of plastic films in cotton compared to conventional methods [7][8][9].
The above literature has yielded promising results. However, extracting features from hyperspectral images requires manual intervention, which demands considerable expertise and introduces subjectivity into feature mining and selection. Therefore, it is highly significant to explore an automatic feature extraction method for hyperspectral images.
Deep learning is an advanced technology applied to image processing. It can automatically detect and analyze complex information, which helps to extract deeper features. The use of hyperspectral data greatly enhances the accuracy and efficiency of image recognition [10,11]. However, hyperspectral data suffer from high dimensionality and severe information redundancy. To efficiently extract feature information to support the training of deep learning models, data dimensionality reduction is commonly applied to improve the data processing speed [12]. Jia et al. employed flexible Gabor-based superpixel-level unsupervised linear discriminant analysis (LDA) for dimensionality reduction of hyperspectral images, which reduced a large number of flexible Gabor (FG) features and enhanced the distinctiveness of image features [13]. Kang et al. proposed a method based on PCA-EPFs for hyperspectral image (HSI) classification, which used principal component analysis (PCA) to reduce the dimension of superimposed edge-preserving features (EPFs); the work not only represented the EPFs in the mean square sense but also highlighted the separability of pixels in EPFs [14]. To reduce the dimension of hyperspectral remote sensing images, Daniela Lupu et al. established an independent component analysis (ICA) method based on a stochastic higher-order Taylor approximation algorithm, which could identify local maxima and facilitate minibatching [15]. Previous researchers have thus utilized LDA, PCA, and ICA techniques for reducing the dimensionality of hyperspectral data, and the experimental results have demonstrated excellent outcomes, effectively enhancing the efficiency of image processing.
Convolutional neural networks (CNNs) are the most frequently employed deep learning models and achieve an excellent classification effect in feature extraction from hyperspectral data; they can be used to solve the problem of plastic film in seed cotton [16,17]. LeNet, AlexNet, and VGGNet are frequently employed CNN architectures that achieve high classification and recognition accuracy and fuse well with hyperspectral images. Hüseyin Fırat et al. proposed a method to effectively classify hyperspectral remote sensing images (HRSIs) based on PCA dimension reduction and a LeNet-5 3D-CNN model. The results showed that a 100% recognition and classification effect was obtained on all experimental data [18]. Jiang et al. obtained hyperspectral images of different types of pesticide residues and used an AlexNet-based deep learning network to detect post-harvest pesticide residues in apples. The test results showed that when the number of training epochs was 10, the detection accuracy was 99.09% [19]. Zhao et al. classified waterlogged cotton by CNN; the classification accuracy of VGG-16 was 97.00%, higher than that of GLNI-v3, and the method could provide theoretical support for the evaluation of cotton loss after waterlogging [20]. The aforementioned literature demonstrates that the CNN-based models (LeNet, AlexNet, VGGNet) exhibit strong generalization and adaptive capabilities in processing hyperspectral images, resulting in effective application outcomes.
The combination of hyperspectral imaging and CNN techniques is commonly applied to the classification of remote-sensing images. However, there have been few reports of methods to identify residual film in seed cotton. This paper presents a novel approach for removing film from seed cotton, which combines hyperspectral images and deep learning algorithms. The innovations are as follows: (1) The study establishes an optimal method for dimensionality reduction of hyperspectral data, which can reduce redundant hyperspectral characteristic information and reduce the time and cost of subsequent neural network training.
(2) The study integrates hyperspectral imaging technology and deep learning algorithm to obtain the optimal AlexNet-PCA-12 model which can effectively remove the colorless and transparent film in seed cotton in the practical application.
The remainder of the paper is structured as follows. Section 2 describes the hyperspectral sorting system, the theory of dimensionality reduction and CNN. Section 3 illustrates the discussion of the results of dimension reduction and CNN experiments. Conclusions and viewpoints are provided in Section 4.

Experimental Materials
Gaia Sorter-Dual, a full-band hyperspectral sorter, is used in conjunction with the hyperspectral camera "Image-λ-N25E-SWIR". A total of 10 kg of machine-picked long-staple cotton from southern Xinjiang and 50 pieces of film of different sizes are picked out by skilled workers.
As shown in Figure 1, the hyperspectral imaging system can obtain hyperspectral images of seed cotton mixed with film: the resolution is 384 pixels × 600 pixels, and the spectral range is 1000~2500 nm over 288 bands. The hyperspectral camera is positioned directly above the platform, with four halogen lamps symmetrically placed around it. The irradiation angle of the halogen lamps can be adjusted arbitrarily, and all halogen sources are adjusted toward the position directly below the camera. The distance regulating mechanism controls the vertical motion of the hyperspectral camera in order to adjust the camera's image surface. Additionally, the transfer platform can continuously move horizontally to capture continuous one-dimensional images. Since experimental subjects come in different sizes, the electronic control platform allows for vertical movement to create storage space for the subjects. The collected hyperspectral images can be regarded as 230,400 pieces of sample data, including 92,456 seed cotton samples, 63,478 film-on-cotton samples, 62,897 background samples, and 11,569 film-on-background samples. The specific operation steps are as follows: (1) The samples are placed on the transfer platform and irradiated with uniform light from the halogen lamps. The reflected light is then captured by the hyperspectral camera, which provides one-dimensional spectral information.
(2) The transfer platform moves horizontally to obtain continuous one-dimensional spectral information, which is then transmitted to an industrial computer to generate hyperspectral images containing all the spectral information.


Technical Route
The technical route of the film sorting system is illustrated in Figure 2, where seed cotton mixed with the film is given to the study subject. Firstly, a hyperspectral camera is used to collect 1000-2500 nm hyperspectral images. Secondly, the experimental validation involves nine models, including black and white correction, dimension reduction, and CNN training and testing. The purpose is to determine the best models for hyperspectral data dimension reduction and CNN. The binarization is established to display the recognition outcome of the optimal AlexNet-PCA-12 model, which has an eminent recognition accuracy of 98.07%. Finally, the coordinates of the film are fed into a high-speed spray valve to complete the film removal in the practical application sorting tests.


Black and White Correction of Hyperspectral Images
The stability of the data can be impacted by environmental factors, including light intensity and angular variations. In addition, there is a dark current in the camera and noise interference during acquisition. Therefore, the hyperspectral images need to be corrected, which removes ambient light interference and most of the noise in the image and effectively improves the classification and recognition accuracy of the subsequent model. The original hyperspectral images can be corrected by [20]

I_ref = (I_raw − I_dark) / (I_white − I_dark),

where I_ref represents the corrected image, I_raw is the original image, I_white denotes the standard white correction image, and I_dark indicates the background (dark) correction image.
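As a minimal illustration, the correction can be applied per pixel with numpy; the function name and the toy 2 × 2 single-band images below are illustrative assumptions, not from the paper:

```python
import numpy as np

def black_white_correct(raw, white, dark):
    """Correct a raw hyperspectral image with white/dark references:
    I_ref = (I_raw - I_dark) / (I_white - I_dark)."""
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    denom = white - dark
    denom[denom == 0] = np.finfo(float).eps  # guard against division by zero
    return (raw - dark) / denom

# Toy 2x2 single-band example.
raw = np.array([[0.5, 0.9], [0.1, 0.7]])
white = np.full((2, 2), 1.0)   # white reference (maximum reflectance)
dark = np.full((2, 2), 0.1)    # dark reference (sensor dark current)
ref = black_white_correct(raw, white, dark)
```

In a real pipeline, the white and dark reference images would be captured once per session and applied to every one of the 288 bands of the cube.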

Dimension Reduction of Hyperspectral Data
Dimension reduction can effectively eliminate noise and irrelevant information while also preventing the data redundancy and dimension explosion caused by high-dimensional data during algorithmic processing [21]. At present, the dimension reduction methods for hyperspectral data mainly include linear discriminant analysis (LDA), principal component analysis (PCA), independent component analysis (ICA), etc. By extracting and mapping the main feature bands of the original data, these methods can effectively reduce the operating cost of the algorithm while ensuring its recognition accuracy.

Linear Discriminant Analysis
LDA is a linear learning method that employs pattern recognition, machine learning, and other techniques to extract similar features of two objects or events from multiple datasets. These features are then combined to more accurately identify the differences between them [22].
Hyperspectral data involve an LDA multi-classification task, which projects the D-dimensional vector x to a d-dimensional (d < D) vector y. The projection equation is

y = W^T x,

where W is the projection matrix and w_i, the i-th column vector, gives each projection direction. The multi-classification data set X can be written as

X = {X_1, X_2, . . . , X_N},

where N represents the number of sample classes, i indicates the class of a sample, and x_j^(i) is the j-th sample of class i. The within-class divergence matrix S_w is obtained as [22]

S_w = Σ_{i=1}^{N} Σ_j p(i, j) (x_j^(i) − μ_i)(x_j^(i) − μ_i)^T,

where μ_i represents the mean of the training samples of class i and p(i, j) is the probability of x_j^(i). The overall divergence matrix S_t is given by [22]

S_t = Σ_x (x − μ)(x − μ)^T,

where μ represents the mean value of all training samples. The between-class divergence matrix is

S_b = S_t − S_w = Σ_{i=1}^{N} p(i) (μ_i − μ)(μ_i − μ)^T,

where p(i) denotes the probability of class i. Then, we obtain the objective function J [22]:

J(W) = tr(W^T S_b W) / tr(W^T S_w W).

The d-dimensional projection matrix W can be obtained by calculating the largest d eigenvalues of S_w^(−1) S_b and the corresponding d eigenvectors; d (d < N) is the dimension after dimensionality reduction of the hyperspectral data.
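To make the projection concrete, the derivation above can be sketched in a few lines of numpy; the helper name and the toy two-class data are illustrative assumptions, not from the paper:

```python
import numpy as np

def lda_project(X, y, d):
    """Project samples X (rows) with labels y onto the top-d LDA directions,
    built from the d largest eigenvectors of S_w^{-1} S_b (d < #classes)."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    D = X.shape[1]
    S_w = np.zeros((D, D))
    S_b = np.zeros((D, D))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_w += (Xc - mu_c).T @ (Xc - mu_c)     # within-class scatter
        diff = (mu_c - mu).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)       # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(vals.real)[::-1][:d]    # d largest eigenvalues
    W = vecs[:, order].real                    # projection matrix (D x d)
    return X @ W

# Two well-separated 3-D classes reduced to one dimension.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 3)), rng.normal(3, 0.1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
Y = lda_project(X, y, 1)
```

After projection, the two classes remain clearly separated along the single LDA axis.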

Principal Component Analysis
PCA is a dimension reduction algorithm based on the discrete Karhunen-Loeve transform for extracting the main feature components of multivariate data [23]. PCA can remove the majority of the noise in an image, and it also has clear advantages in terms of time complexity.
Data conversion. When the hyperspectral image data are read, each band is converted into a one-dimensional vector. Assume the hyperspectral image data have a total of N bands with a w × h resolution, so they can be represented as a (w × h) × N matrix in which band i is the column vector x_i.
Forming the eigenspace. The mean vector of all bands is calculated as [24]

μ = (1/N) Σ_{i=1}^{N} x_i.

The distance vector between each band and the average band can be obtained as

d_i = x_i − μ.

We set the matrix B as B = (d_1, d_2, . . . , d_N). Then, the covariance matrix can be obtained as follows [24]:

C = B B^T,

and its transpose form can be written as

C′ = B^T B.

Since C is a high-dimensional (w × h) × (w × h) matrix, directly calculating the eigenvectors of its first Z (Z ≤ N) largest eigenvalues is too expensive, whereas C′ is a low-dimensional N × N matrix, so its eigenvalues can be calculated first [25]:

C′ u_j = λ_j u_j,

where λ_j is also an eigenvalue of C and u_j is an eigenvector of C′. The eigenspace v_j can be formed from the eigenvectors of C′:

v_j = B u_j.

Projection and similarity detection. The difference vector between each band and the average band is projected into the eigenspace, and the feature vector of band i is expressed as

P_i = v^T d_i.

The Euclidean distance between two feature vectors is written as [25]

ε = ||P_i − P_j||.

When using PCA dimension reduction, the similarity between images is determined by the Euclidean distance: a smaller Euclidean distance indicates greater similarity and better results. After this operation, the n feature vectors P with the minimum Euclidean distance are selected to form a fresh hyperspectral data set, where n (n < N) is the dimension of the hyperspectral data after dimensionality reduction.
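The band-wise procedure above can be sketched in numpy; note the small N × N covariance trick that avoids forming the (w × h) × (w × h) matrix. The function name and the random toy cube are illustrative assumptions:

```python
import numpy as np

def pca_reduce_bands(cube, n):
    """Project an (h, w, N)-band hyperspectral cube onto its first n
    principal directions using the small N x N covariance trick."""
    h, w, N = cube.shape
    X = cube.reshape(h * w, N).astype(float)   # (w*h) x N, one band per column
    mean = X.mean(axis=1, keepdims=True)       # mean band
    B = X - mean                               # difference vectors
    C_small = B.T @ B                          # N x N instead of (w*h) x (w*h)
    vals, U = np.linalg.eigh(C_small)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n]         # keep the n largest
    V = B @ U[:, order]                        # lift back to full-size eigenvectors
    V /= np.linalg.norm(V, axis=0, keepdims=True)
    scores = B.T @ V                           # projection of each band
    return scores, V

cube = np.random.default_rng(1).normal(size=(8, 8, 10))  # toy 10-band cube
scores, V = pca_reduce_bands(cube, 3)
```

Each row of `scores` is one band expressed in the 3-dimensional eigenspace, from which the most similar bands can then be selected by Euclidean distance.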

Independent Component Analysis
ICA is a method to find data intrinsic components from multi-dimensional statistical data which focuses on data analysis from independent sources, decomposing multivariate signals into different non-Gaussian signals [26]. Hyperspectral image data X can be regarded as a two-dimensional matrix with N rows and L columns (L = w × h). Hyperspectral data with band n (n < N) can be obtained through ICA to achieve the purpose of dimension reduction.
ICA of X can be expressed as [15]

S = W X,

where N is the number of bands, (w_1, w_2, . . . , w_d, . . . , w_N) is defined as the transformation matrix W, and (w_1d, w_2d, . . . , w_Nd)^T represents the d-th column vector of W. The independent component S is obtained by finding an appropriate transformation matrix W such that each component is statistically independent and non-Gaussian, according to the principle of the central limit theorem.
Depending on the choice of the objective function, ICA includes FastICA, projection pursuit, and Infomax, which mainly extract independent components by increasing non-Gaussianity, reducing mutual information, and performing maximum likelihood estimation, respectively [15]. The FastICA approach, which adopts batch processing to incorporate a huge quantity of sample data into the iterative process, is utilized to optimize the independent components. It also establishes negative entropy as a non-Gaussian measure of random variables. The steps of solving the independent components by the FastICA algorithm can be described as follows:
(1) Whitening the data. We set the average value of the hyperspectral image data X as X̄ and perform decentralized processing on the data to obtain

P = X − X̄.

The covariance matrix of P can be written as [27]

C = E(P P^T).

The eigenvalues λ and the eigenvalue diagonal matrix D are calculated through |λI − C| = 0, where I denotes the unit matrix; the eigenvector matrix E can be found from (λI − C)E = 0. The whitening transformation matrix is U = D^(−1/2) E^T, and the data after whitening are obtained as [27]

Z = U × P.

(2) Finding the matrix W. We let k be the number of iterations; the iterative computation of w(k) can be expressed as [27]

w(k + 1) = E[Z G(w(k)^T Z)] − E[g(w(k)^T Z)] w(k),

where G(t) = tanh(t) = (e^t − e^(−t)) / (e^t + e^(−t)) denotes the hyperbolic tangent function, g(t) is the first derivative of G(t), and E(·) indicates the mean function. We then orthogonalize and normalize the matrix W [27]. For any real number ε greater than 0, if |w(k + 1)^T w(k) − 1| < ε, the iteration has converged; otherwise, k = k + 1 and the iteration continues. The column vectors w_d of W can thus be obtained in turn, where d (d = 1, 2, . . . , N) is the index number of each band; when d = N, the matrix W is complete. By substituting the matrix W into the ICA model, the independent component S can be solved.
Each element w_ij of the matrix W reflects the capacity of band j to carry the information of independent component i. By calculating the average absolute weight coefficient, it can be assessed how much independent component information each band contains:

w̄_j = (1/N) Σ_{i=1}^{N} |w_ij|.

The n bands with the largest average weight coefficients w̄_j are formed into a new low-dimensional image to achieve the dimensionality reduction of the hyperspectral image; n (n < N) is the number of bands after the dimensionality reduction of the hyperspectral data.
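A compact numpy sketch of the whitening and tanh-based FastICA iteration described above (deflation keeps successive components orthogonal; the function name and the toy two-source mixture are illustrative assumptions, not the paper's data):

```python
import numpy as np

def fastica_components(X, n_components, max_iter=200, tol=1e-6, seed=0):
    """One-unit FastICA with deflation. X: (n_bands, n_pixels)."""
    rng = np.random.default_rng(seed)
    # (1) Whiten: centre the data, then Z = D^{-1/2} E^T P.
    P = X - X.mean(axis=1, keepdims=True)
    C = np.cov(P)
    d, E = np.linalg.eigh(C)
    U = np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = U @ P
    # (2) Iterate w(k+1) = E[Z G(w^T Z)] - E[g(w^T Z)] w, with G = tanh.
    W = []
    for _ in range(n_components):
        w = rng.normal(size=Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wz = w @ Z
            w_new = (Z * np.tanh(wz)).mean(axis=1) - (1 - np.tanh(wz) ** 2).mean() * w
            for v in W:                    # deflation: stay orthogonal to found rows
                w_new -= (w_new @ v) * v
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1) < tol
            w = w_new
            if converged:
                break
        W.append(w)
    return np.array(W) @ Z                 # independent components S

# Unmix two toy sources (sine and square wave) mixed into three "bands".
t = np.linspace(0, 8 * np.pi, 2000)
S_true = np.vstack([np.sin(t), np.sign(np.cos(3 * t))])
A = np.array([[1.0, 0.5], [0.5, 1.0], [0.3, 0.8]])
X_mix = A @ S_true + 0.01 * np.random.default_rng(42).normal(size=(3, t.size))
S_est = fastica_components(X_mix, 2)
```

The recovered components are mutually decorrelated by construction of the whitened space; for band selection, one would then rank bands by the average absolute weights as described above.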

Construction of the Convolutional Neural Network
A convolutional neural network mainly consists of an input layer, convolution layers, pooling layers, fully connected layers, and an output layer, and it can effectively mitigate the over-fitting problem [28]. This research illustrates a 2D-CNN-based method for hyperspectral image classification which can reduce the training cost while ensuring high classification and recognition accuracy.
Convolution layer. The convolution layer applies a convolution kernel to transform the input matrix into a unit matrix for the next layer. During forward propagation, the convolution kernel computes the nodes in the right unit matrix by using the nodes in the left input matrix [29]. Multiple convolution kernels are convolved with the input image data, and a series of feature maps are obtained through an activation function after biasing [30]. In this paper, the ReLU activation function is utilized to map the input of neurons to the output; its nonlinear characteristics are introduced into the neural network, enabling its application to various nonlinear models. The convolution formula is expressed as follows [29]:

X_j^l = f( Σ_{i∈M_j} X_i^(l−1) · w_ij^l + b_j^l ),

where X_j^l denotes the j-th element of layer l, M_j stands for the j-th convolution area of the layer l − 1 feature map, X_i^(l−1) represents its elements, w_ij^l is the weight of the corresponding convolution kernel matrix, b_j^l is the offset item, f(·) indicates the activation function, and Σ_{i∈M_j} X_i^(l−1) · w_ij^l is the convolution operation.
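A minimal numpy sketch of the convolution formula above for a single input feature map with a ReLU activation (the function names and toy data are illustrative assumptions):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d(feature, kernel, bias=0.0):
    """Valid 2-D convolution of one feature map followed by ReLU,
    i.e. X^l = f(X^{l-1} * w + b) for a single input map."""
    kh, kw = kernel.shape
    H, W = feature.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(feature[r:r + kh, c:c + kw] * kernel) + bias
    return relu(out)

x = np.arange(16, dtype=float).reshape(4, 4)     # toy 4x4 feature map
k = np.array([[0.0, -1.0], [1.0, 0.0]])          # simple difference kernel
y = conv2d(x, k)                                 # 3x3 output map
```

Real CNNs sum this operation over many input maps and kernels, but the per-window multiply-accumulate plus activation is exactly the formula above.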
Pooling layer. If all the features obtained through convolution are inputted into the classifier, a significant amount of computation is required to handle it. In this case, the Pooling function is required to process the feature maps obtained by convolution, and the Max pooling method is utilized in this paper. The pooled element matrix can reduce the dimension of the feature information obtained from the convolution layer and reduce the size of the matrix in the direction of height and width while ensuring the invariance of the feature scale. Meanwhile, the number of parameters of the whole neural network can be reduced, thus improving the generalization ability of the model [31].
Fully connected layer. With multi-layer convolution and pooling processing, higher-level and more abstract feature information is gradually extracted from the images and classified by fully connected layers [32]. After unrolling the input feature vector into one dimension, the fully connected layer outputs the result via weighted summation and activation functions. The output formula is [29]

y_k = f(w_k x_(k−1) + b_k),

where k is the serial number of the network layer, y_k is the output, x_(k−1) represents the expanded one-dimensional eigenvector, w_k stands for the weight coefficient, and b_k is the offset item. f(·) is an activation function suitable for classification tasks that performs probabilistic computation; for an output with n units it can be formulated as follows:

f(z_i) = e^(z_i) / Σ_(j=1)^(n) e^(z_j).

The softmax loss function is structured in the fully connected layer to measure the solving accuracy of the problem, and the loss function is adopted to describe the degree of dissatisfaction with the classification result. The effect of the neural network model is defined by the loss function: the smaller the loss value, the smaller the deviation between the result obtained by the model and the real value [33].
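The Softmax output and its loss can be sketched as follows; the four logits mimic the paper's four output units, but the numeric values are invented for illustration:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, label):
    """Softmax loss for one sample: -log p(true class)."""
    return -np.log(probs[label])

# Four output units: cotton, film on cotton, background, film on background.
logits = np.array([0.2, 3.1, 0.4, 0.1])
p = softmax(logits)
loss = cross_entropy(p, 1)   # true class: "film on cotton"
```

The loss shrinks as the probability assigned to the true class grows, matching the "smaller loss, smaller deviation" statement above.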
The purpose of neural network optimization is to update the parameters accurately and in a timely manner. Two optimization methods are employed for the neural networks in this paper: the first is the gradient descent algorithm, and the second is the back propagation algorithm. Gradient descent randomly selects samples from the training data during the iteration process, which ensures the rapid update of parameters in each iteration. The back propagation algorithm, built on gradient descent, can calculate not only vector gradients but also gradients of multidimensional tensors [34].
To avoid overfitting during training, the Dropout function is used in the fully connected layers to set the output of hidden-layer neurons to zero with a certain probability. Dropout disables some hidden-layer nodes so that they do not participate in the forward propagation of the CNN. Due to the stochastic nature of Dropout, each sample input to the network corresponds to a different network structure, but all these structures share weights. Since a neuron cannot depend on specific other neurons, Dropout reduces the complexity of inter-neuron co-adaptation and enables the network to learn deeper features [35].
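A sketch of inverted Dropout, which zeroes activations with a given probability and rescales the survivors so the expected activation is unchanged (the paper does not specify this exact variant, so the rescaling is an assumption of the sketch):

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`
    and divide survivors by (1 - rate) to keep the expected value."""
    mask = rng.random(x.shape) >= rate     # True = neuron kept
    return x * mask / (1.0 - rate), mask

rng = np.random.default_rng(0)
x = np.ones(10000)                         # toy layer of constant activations
y, mask = dropout(x, 0.5, rng)             # drop ~50%, as in the LeNet output layer
```

Because survivors are rescaled, the mean activation stays close to 1 even though roughly half the neurons are silenced; at test time dropout is simply disabled.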


Design of Intelligent Recognition Algorithm for Film in Seed Cotton
In this section, three CNN models based on LeNet, AlexNet, and VGGNet are constructed for hyperspectral image recognition. The CNN schematic is shown in Figure 3. The schematic involves two steps. Firstly, the hyperspectral data are used to train the model and extract useful image features. Secondly, the trained features are applied to the testing set for verification, and the resulting recognition accuracy is outputted. Additionally, the network parameters are regulated through the gradient descent and back propagation algorithms, which update the network parameters in time. To achieve optimal recognition results for hyperspectral image recognition, the LeNet, AlexNet, and VGGNet models are altered accordingly; the specific parameters for each model are outlined in Tables 1-3. To allow the CNN to input hyperspectral data and output recognition accuracy, the input and output layers are set as follows. In the input layer, 5 × 5 indicates the data size of the input convolutional network obtained by manual division, and D denotes the data dimension obtained after adopting the different dimensionality reduction algorithms. In the output layer, the Softmax loss function outputs the probabilities of four units: "cotton", "film on cotton", "background", and "film on background". The LeNet structure mainly consists of 2 Convs, 2 Pools, and 1 FC, where "Conv" is a convolution layer, "Pool" denotes a pooling layer, and "FC" indicates a fully connected layer.
Step: 2. Flatten layer: converts the multi-dimensional input into one dimension. FC: input neuron number 27, output neuron number 60. Output layer: Dropout drops 50% of the weights; the Softmax loss function outputs the probabilities of the four units.
Step: 2. Dropout drops 20% of the weights. Flatten layer: converts the multi-dimensional input into one dimension. FC: input neuron number 72, output neuron number 24. Output layer: Dropout drops 20% of the weights; the Softmax loss function outputs the probabilities of the four units.

Analysis of Dimension Reduction Results
To verify the generalization of the dimension reduction methods, scatter plots are presented in Figure 4. The plots depict the application of the different dimension reduction methods to the same hyperspectral data from the three-dimensional reduction experiments. From the scatter plots of dimensionality reduction for different batches of the same data under the same experimental conditions, the following can be concluded:

(1) Considering only the first two samples, the LDA data show obvious clustering and separability, but LDA cannot classify the sample "background" accurately.
(2) The ICA data classified the four types of samples differently in different batches, so ICA does not generalize to different batches of dimensionality-reduced data, and the trained model cannot achieve ideal results in testing.
(3) Considering only the first two samples, PCA dimension reduction shows distinct aggregation and separability for "background" and "film on background", while the data of the two samples "cotton" and "film on cotton" coincide and cannot be classified.
(4) The results show that LDA has outstanding classification performance when the hyperspectral data are reduced to two dimensions. However, LDA can only reduce the data to at most three dimensions (one fewer than the number of classes). Therefore, when the computer performance allows, PCA obtains higher recognition accuracy because it can retain more dimensions.

CNN Model Training
After the hyperspectral data are reduced to three dimensions by the three dimension reduction methods (LDA, PCA, and ICA), three CNNs with different structures (LeNet, AlexNet, and VGGNet) are adopted for training, and the testing accuracy is compared.
LeNet model training. Variations of training and testing accuracy of LeNet with the number of training epochs, training and testing loss curves are shown in Figure 5.
In Figure 5a, LDA recognition accuracy on the test set is about 92%, and the loss value is 0.15~0.2. In Figure 5b, PCA recognition accuracy on the test set is about 89%, and the loss value is 0.25~0.3. In Figure 5c, ICA recognition accuracy on the test set is about 85%, and the loss value is 0.3~0.35.
Although LDA and PCA remain relatively stable throughout the training phase, the LeNet model with PCA-reduced hyperspectral data slightly underperforms the one with LDA-reduced data, while the LeNet model with ICA-reduced data has the worst stability of the three.
AlexNet model training. Variations of the training and testing accuracy of AlexNet with the number of training epochs, together with the training and testing loss curves, are shown in Figure 6.
In Figure 6a, LDA recognition accuracy on the test set is about 93%, and the loss value is 0.15~0.2. In Figure 6b, PCA recognition accuracy on the test set is about 90%, and the loss value is 0.25~0.3. In Figure 6c, ICA recognition accuracy on the test set is about 88%, and the loss value is 0.3~0.35.
Although LDA and PCA remain relatively stable throughout the training phase, the AlexNet model with PCA-reduced hyperspectral data slightly underperforms the one with LDA-reduced data, while the AlexNet model with ICA-reduced data has the worst stability of the three.
VGGNet model training. Variations of the training and testing accuracy of VGGNet with the number of training epochs, together with the training and testing loss curves, are shown in Figure 7.
In Figure 7a, LDA recognition accuracy on the test set is about 90%, and the loss value is 0.2~0.25. The LDA model is relatively stable to changes throughout the training process and has excellent model stability.
In Figure 7b, PCA recognition accuracy on the test set is about 84%, and the loss value is about 0.4. In Figure 7c, ICA recognition accuracy on the test set is about 80%, and the loss value fluctuates widely. Both models show limited stability during the training phase, and the ICA results are inferior to those of the VGGNet model with PCA-reduced hyperspectral data.


CNN Model Testing
The confusion matrices of the test samples for the different models are illustrated in Tables 4-6, where 1 denotes cotton, 2 the film on cotton, 3 the background, and 4 the film on background; the diagonal gives the probability of correct classification. The experimental data analysis is as follows: (1) With LDA- and PCA-reduced hyperspectral data, all three CNN models achieve high recognition accuracy on the test samples. However, there are some errors in classifying the film-on-cotton and film-on-background samples, which is consistent with the conclusions drawn from the scatter plots above.
(2) Since the hyperspectral data reduced by ICA are not from the same batch as the training data, the extracted dimension information is unstable and the recognition results are confused. Therefore, ICA cannot be applied to hyperspectral image recognition, which is consistent with the conclusions of the scatter plots above. The Overall Accuracy (OA) of the test samples, the percentage of all samples that are predicted correctly, is shown in Table 7. The results can be summarized as follows: (1) When the hyperspectral data are reduced to three dimensions, the average OA of LDA is 91.68%, that of PCA is 87.08%, and that of ICA is only 40.35%. Based on these results, LDA performs best at this level of dimensionality reduction.
(2) The data in the table show that the CNN-based AlexNet model achieves excellent recognition even when the data are dimensionally reduced.
(3) With a dimension reduction of three, ICA shows poor average OA compared with the other two methods. PCA, however, can retain more dimensions to improve recognition accuracy and therefore has more potential in practical applications. To distinguish the classification effects more intuitively, three bands are selected to display the hyperspectral data as pseudocolor images, and the spectral toolkit is used during model testing to plot the predictions as a two-dimensional image. The pseudocolor and manually labeled images are shown in Figure 8.
Since the actual sorting system only needs to locate the spatial coordinates of the film, the four categories are merged into two: "film on cotton" and "film on background" are classified as film, while "cotton" and "background" are classified as non-film. The binarized images are shown in Figures 9-11. The combination of the AlexNet architecture and the LDA algorithm gives the best recognition results, while VGGNet combined with ICA gives the worst.
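The merging of the four predicted classes into a binary film mask can be sketched in a few lines of NumPy (an illustration, not the authors' code; the class codes follow the paper's labeling):

```python
import numpy as np

# Class codes as labeled in the paper:
# 1 = cotton, 2 = film on cotton, 3 = background, 4 = film on background
pred = np.array([[1, 2, 1],
                 [3, 4, 3],
                 [1, 1, 4]])

# Film (codes 2 and 4) vs. non-film (codes 1 and 3)
film_mask = np.isin(pred, (2, 4))
print(film_mask.astype(np.uint8))
# [[0 1 0]
#  [0 1 0]
#  [0 0 1]]
```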
The above experiments validate the classification effect of the different dimensionality reduction methods on hyperspectral data reduced to three dimensions. The results show that LDA best preserves the aggregation and separability of features after dimensionality reduction, so with limited device conditions for hyperspectral imaging, it is advisable to opt for LDA. However, due to the limitations of the LDA algorithm, the data can only be reduced to at most three dimensions. Therefore, when computer performance meets the requirements, PCA achieves higher recognition accuracy by retaining more dimensions. In summary, the AlexNet-PCA multi-dimensional algorithm is investigated experimentally to obtain the highest recognition accuracy for seed cotton mixed with film.

AlexNet-PCA Model Training
In Figure 12, the accuracy and loss curves of the AlexNet model are shown when PCA is used to reduce the dimensionality to 6, 9, 12, and 15. The test-set accuracy curve begins to converge after about 40 training epochs and mostly peaks by 60 epochs. The variation is stable throughout the training process, and the model shows great stability.

AlexNet-PCA Model Testing
As shown in Table 8, the experimental data analysis can be summarized as follows: (1) The AlexNet-PCA algorithm makes a small number of classification errors on the "cotton" and "background" samples. This can be attributed to the edge junctions containing the reflection spectra of both the cotton and the background.
(2) Misclassification is observed when the AlexNet-PCA algorithm distinguishes "cotton" from "film on cotton" and "background" from "film on background". This can be attributed to the weak reflection of the film, which makes its features indistinct. For the PCA dimension selection, a set of linearly increasing dimensions (3, 6, 9, and 12) is chosen for the AlexNet-PCA multi-dimensional experiment; linearly increasing dimensions produce a smooth curve of overall accuracy against dimension, making the experimental results more intuitive.
The Overall Accuracy (OA) of the test samples, the percentage of all samples predicted correctly, is shown in Table 9. As the number of dimensions retained by PCA increases, the OA of the samples keeps increasing. Figure 13 illustrates the OA of the samples as a function of the PCA dimensionality. The data in Table 9 and Figure 13 show the following: (1) The marginal gain in accuracy decreases as the PCA dimensionality increases.
(2) When the PCA dimension is set to 12, the proposed algorithm achieves a recognition accuracy of over 98%. Additionally, the overall classification accuracy of the samples begins to converge.
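The dimension sweep behind Table 9 can be sketched as below. This is a toy stand-in on synthetic four-class data, with a logistic-regression probe in place of AlexNet to keep it fast; the dimension set and all sizes are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic 4-class "spectra" standing in for the per-pixel samples
X, y = make_classification(n_samples=2000, n_features=30, n_informative=12,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

oa = {}
for d in (3, 6, 9, 12, 15):
    pca = PCA(n_components=d).fit(X_tr)               # reduce to d dimensions
    clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_tr), y_tr)
    oa[d] = accuracy_score(y_te, clf.predict(pca.transform(X_te)))
print(oa)   # overall accuracy per retained dimension
```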
As the PCA dimensionality increases, the complexity of the neural network model also increases, which can lead to overfitting and in turn decrease the generalization ability of the model. In this study, we primarily use the Dropout method to avoid overfitting: Dropout weakens the connections between neuron nodes, reducing the network's reliance on individual neurons and thereby enhancing generalization. The binarized images are shown in Figure 14a. As demonstrated in Figure 14b, a morphological opening operation is applied to the binary image, which effectively reduces the noise caused by light, dust, and artificial marks; small identified regions and image-edge artifacts are removed from the binary image. The Figure 14 results show the following: (1) When the dimensionality is reduced to six with PCA, the post-processed results still exhibit significant imperfections. When the dimension reduction is increased to 12, the post-processed images meet the requirement of providing coordinates, and with a dimension reduction of 15 there is no significant difference from the results obtained with 12.
(2) Considering the trade-off among the speed of accuracy improvement, computer performance, image processing results, and training cost, PCA with a dimension reduction of 12 is the optimal choice.
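The morphological opening used in the Figure 14 post-processing can be illustrated with SciPy (a minimal sketch on an assumed toy mask, not the authors' code): erosion followed by dilation removes specks smaller than the structuring element while preserving larger film regions.

```python
import numpy as np
from scipy.ndimage import binary_opening

# Hypothetical binary film mask with a single-pixel noise speck
mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 2:6] = True        # a real 4x4 film region
mask[8, 8] = True            # an isolated noise pixel

# Opening removes any feature the 2x2 structuring element cannot fit inside
cleaned = binary_opening(mask, structure=np.ones((2, 2), dtype=bool))
print(mask.sum(), cleaned.sum())   # 17 16: only the speck is removed
```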
As can be seen from the above, the AlexNet-PCA-12 model with the optimal recognition accuracy is obtained experimentally. To verify the feasibility of the research, an application sorting test of the algorithm is conducted at a cotton factory in Aksu, Xinjiang. As depicted in Figure 15, the computer platform running the algorithm obtains the actual coordinates of the film and sends them to the industrial control center, which controls the response time of the high-speed spray valve to complete the film removal. Table 10 shows the data of several sorting experiments: the overall removal rate of film is 97.02%, and the cotton sorting throughput reaches 3.0 t/h, which meets the requirements of practical application.
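Extracting film coordinates from the cleaned binary mask, as required by the control center, amounts to connected-component labeling plus centroid computation. A minimal SciPy sketch (an illustration on an assumed toy mask, not the deployed code):

```python
import numpy as np
from scipy.ndimage import center_of_mass, label

# Hypothetical cleaned film mask; each connected region is one film fragment
mask = np.zeros((12, 12), dtype=bool)
mask[1:4, 1:4] = True        # fragment 1
mask[8:11, 6:10] = True      # fragment 2

labeled, n = label(mask)                          # connected-component labeling
centers = center_of_mass(mask, labeled, range(1, n + 1))
print(n, centers)            # fragment count and (row, col) centroids
```

In the real system these pixel centroids would still have to be mapped to belt coordinates and spray-valve timing, which depends on the hardware calibration and is not shown here.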