Article

Nondestructive Detection of Soluble Solids Content in Apples Based on Multi-Attention Convolutional Neural Network and Hyperspectral Imaging Technology

Yan Tian, Jun Sun, Xin Zhou, Sunli Cong, Chunxia Dai and Lei Shi
1 School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China
2 School of Automation, Jiangsu University of Science and Technology, Zhenjiang 212008, China
* Author to whom correspondence should be addressed.
Foods 2025, 14(22), 3832; https://doi.org/10.3390/foods14223832
Submission received: 1 October 2025 / Revised: 5 November 2025 / Accepted: 6 November 2025 / Published: 9 November 2025

Abstract

Soluble solids content (SSC) is the most important attribute related to the quality and price of apples. The objective of this study was to detect the SSC of ‘Fuji’ apples using hyperspectral imaging combined with a deep learning algorithm. Hyperspectral images of 570 apple samples were obtained, and the hyperspectral data of the whole apple region were collected and preprocessed. A multi-attention convolutional neural network (MA-CNN) is proposed, which extracts spectral and spatial features from hyperspectral images by embedding channel attention (CA) and spatial attention (SA) modules in a convolutional neural network. The CA and SA modules help the network adaptively focus on important spectral–spatial features while reducing interference from redundant information. Additionally, the Bayesian optimization algorithm (BOA) is used for model hyperparameter optimization. A comprehensive evaluation is conducted by comparing the proposed model with the CA-CNN, the SA-CNN, and current mainstream models. The best prediction performance for detecting SSC in apple samples was obtained with the MA-CNN model, with an $R_p^2$ of 0.9602 and an RMSEP of 0.0612 °Brix. The results of this study indicate that the MA-CNN algorithm combined with hyperspectral imaging technology can serve as an effective method for rapid detection of apple quality parameters.

1. Introduction

Apple is a popular fruit favored by consumers for its nutritional value and health benefits. It contains various sugars, fruit acids, vitamins, cellulose, microelements, and antioxidant components, which can effectively reduce the damage caused by free radicals in the human body [1,2,3]. Soluble solids content (SSC), an important inherent quality indicator of apples, is closely related to their taste and also affects consumers' purchasing decisions and satisfaction [4]. Traditional SSC detection relies on destructive measurement with digital refractometers, which is time-consuming and cannot rapidly and efficiently determine the composition of the fruit flesh. It is therefore particularly important to develop a rapid, nondestructive, and safe detection technology to evaluate the SSC of apples [5].
With advancements in detection technology, various spectral techniques have been increasingly applied in the field of food analysis and detection, including visible and near-infrared spectroscopy [6,7,8], near-infrared spectroscopy [9,10,11], and Raman spectroscopy [12,13,14]. However, due to the uneven spatial distribution of chemical components within the sample, the information obtained through these point-source spectroscopic techniques is not fully representative [15]. Hyperspectral imaging (HSI) technology integrates spectral and spatial information and can characterize the properties of samples nondestructively [16,17,18]. In previous studies, HSI technology has been successfully applied to quantitative and qualitative analysis of fruits such as apples [19], blueberries [20], pears [21], and grapes [22]. However, the spectral and spatial information of hyperspectral images is highly coupled, and the increase in data dimensionality often causes important features located within local band ranges to be overwhelmed by redundant information.
Convolutional neural networks (CNNs) can learn deep abstract features from raw HSI data, avoiding complex expert knowledge and excessive manual intervention in feature extraction [23,24]. However, a CNN treats information from different spatial positions and spectral bands equally during feature extraction, making it difficult to extract high-quality spatial–spectral features. Meanwhile, processing all spectral bands at once to extract globally representative features leaves the data susceptible to interference from redundant information, which limits the improvement of model prediction accuracy [25].
The attention mechanism can enhance crucial features while suppressing redundant information through weight mapping. Zhao et al. proposed a spatial–spectral transformation network that captures long-range spectral/image relationships via a multi-head attention mechanism for strawberry defect detection [26]. Roy et al. proposed an attention-based adaptive spectral–spatial kernel ResNet, which incorporates an adaptive spectral–spatial kernel and attention mechanism into a residual network; by dynamically adjusting the adaptive kernel based on the similarity and importance of samples, the model accuracy is significantly improved [27]. These studies demonstrate that the attention mechanism can effectively optimize the spectral–spatial feature extraction process, significantly enhancing model accuracy and robustness. Moreover, the reasonable selection of hyperparameters is crucial in deep learning modeling; excessively high or low parameter values can lead to overfitting or underfitting. Popular hyperparameter optimization algorithms include grid search, particle swarm optimization, and Bayesian optimization algorithms (BOAs). It is worth noting that grid search requires a large number of repeated experiments and is time-consuming, while particle swarm optimization is prone to getting trapped in local optima and exhibits low convergence accuracy. In contrast, BOAs obtain optimal parameters by introducing an acquisition function to evaluate the next optimization point [28].
Therefore, this study proposes a method that combines multi-attention mechanisms with a CNN (MA-CNN) to extract the spectral and spatial features of hyperspectral images using channel attention and spatial attention, respectively. Attention weighting enhances key features and suppresses redundant information, and the BOA is used to obtain optimized hyperparameters for the MA-CNN. Ultimately, a prediction model based on deep fusion of spectral and spatial features is constructed to achieve nondestructive detection of apple quality.

2. Materials and Methods

2.1. Apple Samples

Mature ‘Fuji’ apples were harvested from orchards in Qixia County, Shandong Province, China. All apple trees in the orchard were covered with two layers of paper bags two months after the peak flowering period. To ensure that the samples covered a wide range of growing conditions, apples were picked from different crown positions and trunks, and three batches of experimental samples were collected, each containing approximately 190 apples. In total, 570 apple samples with no obvious mechanical damage or defects were selected and transported to the laboratory of Jiangsu University, which was kept at a temperature of 20 °C and a humidity of 60%. The apple samples were placed in the laboratory for 24 h before the test to ensure that their temperature was consistent with the laboratory environment. All samples were then cleaned and numbered, and the labeled samples were used for hyperspectral image acquisition and SSC determination. In this experiment, the samples were divided into two groups at a ratio of 5:1, with 475 samples in the calibration set and 95 samples in the prediction set. Five-fold cross-validation was applied to the calibration set, in combination with the BOA, to obtain the optimal network hyperparameters and to prevent the model from overfitting.
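For readers wishing to reproduce the partitioning scheme, the sketch below shows one way to realize the 5:1 split and the five-fold cross-validation with scikit-learn. The array contents and the random seed are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Hypothetical placeholders for the 570 mean ROI spectra (478 bands) and SSC values.
X, y = np.random.rand(570, 478), np.random.rand(570)

# 5:1 split: 475 calibration samples and 95 prediction samples.
X_cal, X_pred, y_cal, y_pred = train_test_split(X, y, test_size=95, random_state=0)

# Five-fold cross-validation inside the calibration set, used together with
# the BOA to select hyperparameters and guard against overfitting.
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X_cal):
    X_train, X_val = X_cal[train_idx], X_cal[val_idx]
    y_train, y_val = y_cal[train_idx], y_cal[val_idx]
```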

2.2. Region of Interest Extraction and Hyperspectral Data Processing

The hyperspectral imaging system, with a spectral range of 400.648–1001.61 nm (478 bands), is presented in Figure 1. The system mainly comprised a hyperspectral imaging camera (ImSpector V10, Spectral Imaging Ltd., Oulu, Finland), two optical fiber halogen lamps (3900-ER, Illumination Technology, Inc., New York, NY, USA), a CCD camera (Zyla 4.2 Plus, Andor Technology, Inc., Belfast, UK), a mobile platform controller (TS200AB, Zolix Corp., Beijing, China), and a computer. The spectral resolution was 2.8 nm, and the spatial resolution was 2048 pixels. Before acquiring hyperspectral images of the apple samples, the instrument was warmed up for half an hour. The distance between the upper surface of the apple sample and the CCD camera was set to 0.45 m. The CCD exposure times for the white reference, black reference, and sample were set to 10 ms, 10 ms, and 17 ms, respectively. The speed of the sample stage was set to 3.76 mm/s, and the CCD camera and hyperspectral imager were set to a 2048 pixels × 478 bands (spatial × spectral) sampling mode. Apple samples were placed on the displacement platform of the HSI system, and hyperspectral images of each sample were obtained one by one. To eliminate the effects of uneven illumination and dark current noise, the raw hyperspectral images were calibrated according to the following formula [29]:
$$I_{cal} = \frac{I_{raw} - I_{black}}{I_{white} - I_{black}} \tag{1}$$
where $I_{cal}$ is the corrected reflectance image, $I_{raw}$ is the original hyperspectral image, $I_{black}$ is the black reference image (approximately 0% reflectance) obtained by covering the lens completely with an opaque black cover, and $I_{white}$ is the white reference image obtained by scanning a white standard plate with uniform and high reflectance (approximately 99.9%).
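As a minimal NumPy sketch of Formula (1) (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def calibrate(i_raw, i_white, i_black):
    """Apply the black/white reference correction of Formula (1) to a raw
    hyperspectral cube of shape (rows, cols, bands)."""
    denom = i_white - i_black
    denom[denom == 0] = 1e-6  # guard against division by zero in dead pixels
    return (i_raw - i_black) / denom
```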
Extracting the region of interest (ROI) of each sample is critical to guaranteeing the reliability and representativeness of the spectral information. Figure 1 shows the ROI extraction process. First, the spectra of the samples were compared with those of the background region to identify the bands exhibiting the largest and smallest differences (715.16 nm and 525.54 nm), and a ratio image was computed from these two bands. Then, a mask was generated using a minimum threshold of 1.6, selected manually at the boundary between the apple target area and the interfering background, which differ markedly in reflectance. Finally, the mask was applied to the original image to obtain the target image, and the average spectrum of the ROI was calculated for subsequent modeling and analysis.
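A possible NumPy implementation of this band-ratio masking is sketched below; the helper name and the nearest-band lookup are assumptions, while the two wavelengths and the 1.6 threshold come from the text.

```python
import numpy as np

def mean_roi_spectrum(cube, wavelengths, thresh=1.6):
    """Segment the apple from the background with a band-ratio mask and
    return the average ROI spectrum. `cube` has shape (rows, cols, bands)."""
    hi = int(np.argmin(np.abs(wavelengths - 715.16)))  # band with large contrast
    lo = int(np.argmin(np.abs(wavelengths - 525.54)))  # band with small contrast
    ratio = cube[:, :, hi] / (cube[:, :, lo] + 1e-6)
    mask = ratio > thresh                # minimum threshold selected in the paper
    return cube[mask].mean(axis=0)       # mean spectrum over the masked pixels
```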
After data acquisition, data augmentation was performed via rotation and mirroring. Minor angle variations (three rotations randomly generated within different angular ranges: 0° to 30°, 150° to 180°, and 180° to 210°) were applied, enabling the model to better adapt to the tilt angles encountered in practice. The image data were then mirrored in the left–right direction to further increase data diversity. In this way, each hyperspectral image was augmented into five images (including the original) through three rotations and subsequent left–right mirroring. Because five-fold cross-validation was used in the calibration set, data augmentation was performed independently within each round, i.e., only the four training folds were augmented in each cross-validation round. Large hyperspectral images fed into the MA-CNN model can occupy significant memory and slow down processing, so image normalization was applied by scaling pixel values to the range of 0 to 1, and every hyperspectral image was cropped to 224 × 224 pixels.
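The augmentation scheme can be sketched as follows; `scipy.ndimage.rotate` is an assumed choice of rotation routine, and the centre crop stands in for whatever cropping the authors used.

```python
import numpy as np
from scipy.ndimage import rotate

def augment(cube, rng):
    """Return five views of one hyperspectral cube: the original, three random
    rotations drawn from the stated angular ranges, and a left-right mirror."""
    views = [cube]
    for lo, hi in [(0, 30), (150, 180), (180, 210)]:
        views.append(rotate(cube, rng.uniform(lo, hi), axes=(0, 1),
                            reshape=False, mode="nearest"))
    views.append(cube[:, ::-1, :])       # left-right mirroring
    return views

def normalize_and_crop(cube, size=224):
    """Scale pixel values to [0, 1] and centre-crop to size x size pixels."""
    cube = (cube - cube.min()) / (cube.max() - cube.min() + 1e-9)
    r0, c0 = (cube.shape[0] - size) // 2, (cube.shape[1] - size) // 2
    return cube[r0:r0 + size, c0:c0 + size, :]

# Usage: views = augment(cube, np.random.default_rng(0))
```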

2.3. Determination of SSC in Apple Samples

The SSC of the apple samples was measured using a digital refractometer (PAL-1, ATAGO Co., Ltd., Tokyo, Japan). The procedure was as follows: the flesh of the apple sample was minced, pressed manually, and filtered through gauze; 2–3 drops of the juice were then placed onto the center of the prism, and once the entire prism surface was thoroughly wetted with the juice, the SSC value was read and recorded. The average of three measurements per sample was taken as the reference SSC value [30].

2.4. CNN

As a supervised deep learning approach, CNNs have achieved remarkable results in speech recognition, image classification, and object detection. A typical CNN architecture comprises convolutional layers, activation layers, pooling layers, fully connected layers, and batch normalization layers [31]. As the core component of a CNN, the convolutional layer achieves local connectivity and weight sharing through convolutional kernels; these kernels slide across the input feature map, performing convolution operations on the data within the receptive field to extract features [32]. Batch normalization (BN) is employed to mitigate internal covariate shift and expedite the training of deep neural networks. The activation function imparts nonlinear representation capability to the network, enhances the model's feature representation, and maps originally indistinguishable multi-dimensional features into another space where they become more easily discernible [33]. Rectified linear units (ReLU) are used as activation functions to accelerate the convergence of neural networks. The pooling layer is usually placed after the convolutional layer to reduce the feature dimensionality and number of parameters and to prevent overfitting. Finally, a few fully connected layers integrate the extracted features and generate the output through a linear activation function.
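To make the layer roles concrete, here is a minimal Keras regression CNN assembled from those building blocks; the filter counts and dense width are illustrative placeholders rather than the optimized values reported later.

```python
from tensorflow import keras
from tensorflow.keras import layers

def basic_cnn(input_shape=(224, 224, 478)):
    """Conv -> BN -> ReLU -> MaxPool stacks, then fully connected layers
    with a linear output for SSC regression."""
    inp = keras.Input(shape=input_shape)
    x = inp
    for filters in (32, 64, 128):                      # illustrative widths
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)             # mitigates covariate shift
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D(2)(x)                  # shrinks feature maps
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(1, activation="linear")(x)      # regression head
    return keras.Model(inp, out)
```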

2.5. MA-CNN

Figure 2 shows the overall network architecture, comprising a spectral feature extraction branch based on channel attention, a spatial feature extraction branch based on spatial attention, and a spatial–spectral feature fusion stage. Taking each pixel in the hyperspectral image as the center, two image blocks of different spatial sizes are constructed from the center pixel and its neighboring pixels. These blocks are fed into the spectral feature extraction branch and the spatial feature extraction branch, respectively, where convolution, batch normalization, and pooling operations extract the spectral features of the pixel and the spatial features formed by the center pixel and its neighborhood. At the same time, shallow attention information is fused in to assist the backbone network in extracting deep spectral and spatial features. Finally, the deep spectral and spatial features are adaptively aggregated.
Step 1: Extract spectral features based on channel attention. The structure of the channel attention module is shown in Figure 3. In the spectral feature extraction branch, 3 × 3 image blocks are selected and fed into the network. The deep spectral feature vector $F_c$ is obtained through convolution, batch normalization, ReLU activation, and global average pooling. The feature map of the $l$-th layer of the spectral branch is denoted as $F_c^l$; it is subjected to global average pooling and global maximum pooling along the spatial dimension to aggregate spatial information. The resulting average-pooled and max-pooled descriptors are then passed through a multi-layer perceptron with shared weights to generate two one-dimensional feature vectors, $M_{avg}^{l}$ and $M_{max}^{l}$. The specific formulas are as follows [34]:

$$M_{avg}^{l} = \mathrm{MLP}(\mathrm{AvgPool}(F_c^{l})) \tag{2}$$

$$M_{max}^{l} = \mathrm{MLP}(\mathrm{MaxPool}(F_c^{l})) \tag{3}$$

where AvgPool(·) and MaxPool(·) denote global average pooling and global maximum pooling along the spatial dimension, and MLP(·) denotes a multi-layer perceptron composed of two fully connected layers and one ReLU activation function. According to Formula (4), the shallow channel attention weight $M_c^{l-1}$ is introduced into the current-layer features to assist in extracting deep features and to obtain the attention weight vector $M_c^{l}$ of the current layer [35]:

$$M_c^{l} = \sigma\left(\lambda_1 M_{avg}^{l} + \lambda_2 M_{max}^{l} + \lambda_3 \mathrm{MLP}\left(W_0\left(M_c^{l-1}\right)\right)\right) \tag{4}$$

where σ(·) denotes the sigmoid activation function; $\lambda_1$, $\lambda_2$, and $\lambda_3$ are weight coefficients for adaptive weighted fusion learned by the network, initialized to 1.0; and $W_0$ denotes a convolutional layer that performs preliminary feature learning on the shallow channel attention vector $M_c^{l-1}$, followed by batch normalization and ReLU activation. The output $M_c^{l}$ is replicated through a broadcast mechanism into a matrix with the same dimensions as $F_c^{l}$, which refines $F_c^{l}$ through element-wise multiplication [36]:

$$\tilde{F}_c^{l} = F_c^{l} \otimes M_c^{l} \tag{5}$$

where ⊗ denotes element-wise multiplication. Finally, the output $F_c$ of the spectral feature extraction branch is obtained through the global average pooling operation (Figure 3).
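A compact Keras sketch of Formulas (2)–(5) is given below; for brevity it omits the cross-layer term $\lambda_3 \mathrm{MLP}(W_0(M_c^{l-1}))$ and the learnable λ weights (an assumption to keep the sketch short), so it reduces to standard shared-MLP channel attention.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class ChannelAttention(layers.Layer):
    """Channel attention: a shared MLP over spatially average- and max-pooled
    descriptors, summed and squashed by a sigmoid, then broadcast onto F."""
    def __init__(self, channels, reduction=8, **kwargs):
        super().__init__(**kwargs)
        self.mlp = keras.Sequential([
            layers.Dense(channels // reduction, activation="relu"),
            layers.Dense(channels),
        ])

    def call(self, f):                                   # f: (batch, H, W, C)
        m_avg = self.mlp(tf.reduce_mean(f, axis=[1, 2])) # AvgPool over space
        m_max = self.mlp(tf.reduce_max(f, axis=[1, 2]))  # MaxPool over space
        m = tf.sigmoid(m_avg + m_max)                    # (batch, C) weights
        return f * m[:, tf.newaxis, tf.newaxis, :]       # element-wise reweighting
```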
Step 2: Extract spatial features based on spatial attention. The structure of the spatial attention module is shown in Figure 4. In the spatial feature extraction branch, 31 × 31 image blocks from each hyperspectral image are taken as inputs, and the deep spatial feature vector $F_s$ is obtained through convolutional layers, batch normalization layers, ReLU activation layers, local max pooling, and global average pooling. Let the feature map of the $l$-th layer of the spatial branch be $F_s^l$; global average pooling and global maximum pooling are performed along the channel dimension to aggregate spatial information, yielding two two-dimensional feature maps that describe the spatial information distribution: the average-pooled map $M_{avg}^{l}$ and the max-pooled map $M_{max}^{l}$. The specific formulas are as follows [37]:

$$M_{avg}^{l} = \mathrm{AvgPool}(F_s^{l}) \tag{6}$$

$$M_{max}^{l} = \mathrm{MaxPool}(F_s^{l}) \tag{7}$$

The spatial attention weight map is then calculated according to Formula (8): the attention weight map $M_s^{l-1}$ generated by the shallow ($(l-1)$-th) layer and the two-dimensional spatial feature maps $M_{avg}^{l}$ and $M_{max}^{l}$ generated from the current-level feature map $F_s^{l}$ are concatenated along the channel dimension and adaptively fused through a convolutional layer $W_s$ [38]:

$$M_s^{l} = \sigma\left(W_s\left[M_{avg}^{l}, M_{max}^{l}, W_l\left(M_s^{l-1}\right)\right]\right) \tag{8}$$

where $W_l$ denotes the learnable weights of the convolutional layer for cross-layer transmission of spatial attention information, $W_s$ denotes the convolutional layer for information fusion and spatial attention generation, and σ(·) denotes the sigmoid activation function. The output $M_s^{l}$ is replicated along the channel dimension through a broadcast mechanism and refines $F_s^{l}$ through element-wise multiplication [39]:

$$\tilde{F}_s^{l} = F_s^{l} \otimes M_s^{l} \tag{9}$$

where ⊗ denotes element-wise multiplication. The output feature $F_s$ of the spatial feature extraction branch is obtained by applying global average pooling to the reweighted features (Figure 4).
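An analogous sketch of Formulas (6)–(9), again dropping the shallow-attention input $W_l(M_s^{l-1})$ to stay minimal (an assumption, not the full module):

```python
import tensorflow as tf
from tensorflow.keras import layers

class SpatialAttention(layers.Layer):
    """Spatial attention: channel-wise average and max maps are concatenated,
    fused by a convolution (Ws), and the sigmoid weight map rescales every
    spatial position of F."""
    def __init__(self, kernel_size=7, **kwargs):
        super().__init__(**kwargs)
        self.conv = layers.Conv2D(1, kernel_size, padding="same")

    def call(self, f):                                     # f: (batch, H, W, C)
        m_avg = tf.reduce_mean(f, axis=-1, keepdims=True)  # (batch, H, W, 1)
        m_max = tf.reduce_max(f, axis=-1, keepdims=True)
        m = tf.sigmoid(self.conv(tf.concat([m_avg, m_max], axis=-1)))
        return f * m                                       # broadcast over channels
```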
Step 3: Fuse spectral and spatial features. The output vector $F_c$ of the spectral feature extraction branch and the output vector $F_s$ of the spatial feature extraction branch are concatenated to obtain the spatial–spectral fusion feature vector, which is then fed into the fully connected layers to produce the output through the activation function.
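The fusion step itself is a concatenation followed by dense layers. Here is a sketch under the assumption that `f_c` and `f_s` are the pooled branch outputs, with the (86, 86) layer widths taken from the BOA result in Table 4:

```python
from tensorflow.keras import layers

def fuse_and_predict(f_c, f_s):
    """Concatenate the spectral (Fc) and spatial (Fs) feature vectors and
    regress SSC through two fully connected layers."""
    fused = layers.Concatenate()([f_c, f_s])
    x = layers.Dense(86, activation="relu")(fused)   # FC1 width from Table 4
    x = layers.Dense(86, activation="relu")(x)       # FC2 width from Table 4
    return layers.Dense(1, activation="linear")(x)   # predicted SSC (°Brix)
```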

2.6. BOA

The BOA establishes a prior distribution based on historical evaluations of the objective function and combines it with the observation points obtained from previous iterations to form a posterior distribution. The next sample point is then selected based on this posterior information, with the aim of minimizing the objective function. The BOA comprises two core components: a probabilistic surrogate model and an acquisition function. In this paper, a Gaussian process and the probability of improvement (PI) function are chosen as the probabilistic surrogate model and the acquisition function, respectively. The BOA is used to determine the optimal MA-CNN architecture and hyperparameters, including the number of kernels in all convolutional layers, the activation functions, the optimization algorithms, the batch sizes, and the learning rates.
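The paper does not name a software package; as one hedged illustration, scikit-optimize's `gp_minimize` offers a Gaussian-process surrogate with a PI acquisition function over a search space like that of Table 4. Here, `cross_validated_r2` is a hypothetical helper that would train the network with the given hyperparameters and return the mean five-fold R2.

```python
from skopt import gp_minimize
from skopt.space import Categorical, Integer, Real

# Search space mirroring Table 4 (ranges from the paper, encoding assumed).
space = [
    Real(1e-5, 1e-1, prior="log-uniform", name="learning_rate"),
    Integer(4, 128, name="batch_size"),
    Categorical(["relu", "softmax", "sigmoid", "elu"], name="activation"),
    Categorical(["sgd", "adam", "adabound", "rmsprop"], name="optimizer"),
]

def objective(params):
    lr, batch, act, opt = params
    # cross_validated_r2 is a hypothetical helper: build the MA-CNN with these
    # hyperparameters and return the mean five-fold R^2 on the calibration set.
    return -cross_validated_r2(lr, batch, act, opt)  # negate: gp_minimize minimizes

# Gaussian-process surrogate + probability-of-improvement acquisition,
# matching the component choices described above; 30 calls as in the experiments.
result = gp_minimize(objective, space, acq_func="PI", n_calls=30, random_state=0)
```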

2.7. Model Evaluation

The performance of the model was evaluated by the coefficient of determination (R2), root mean square error (RMSE), and residual prediction deviation (RPD). The R2 measures model accuracy, while the RMSE reflects the average difference between predicted and actual values in the corresponding set. RPD can further measure the quantitative prediction ability and stability of the model [40]. The calculation formulae for the above parameters are as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2} \tag{10}$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{N}} \tag{11}$$

$$RPD = \frac{SD}{RMSE} \tag{12}$$
where $y_i$ is the reference value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the average of the reference values, and $N$ represents the number of samples. SD stands for the standard deviation of the reference values. Generally, a well-performing model should have a high R2, a low RMSE, and a high RPD; ideally, the RPD value should be greater than 2.5 [15]. The MA-CNN was executed on a Windows 7 system with an Intel i7-8700K processor (3.7 GHz) and developed with the Keras framework (available at http://github.com/fchollet/keras, accessed on 1 November 2024), a Python (version 3.6) library for building deep learning models. In this experiment, two NVIDIA GTX 1080 Ti GPUs were used to accelerate the training of the MA-CNN.
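These metrics are straightforward to compute; a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return R^2, RMSE, and RPD as defined in Formulas (10)-(12)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    rpd = y_true.std(ddof=1) / rmse   # SD of the reference values over RMSE
    return r2, rmse, rpd
```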

3. Results and Discussion

3.1. Statistics of Reference Values

Table 1 shows the statistics of the SSC of the apple samples. In the calibration set, the measured SSC values ranged from 7.20 to 18.10 °Brix, while those of the prediction set ranged from 8.05 to 15.41 °Brix. The SSC range of the calibration set is broader than that of the prediction set, indicating that the sample set division is reasonable. Figure 5 illustrates the frequency distribution histogram of SSC in the apple samples. The data points are clustered around the mean and exhibit an approximately normal distribution. The measured SSC values of the samples range from 7.20 to 18.10 °Brix, with an average of 11.76 °Brix and a standard deviation of 2.21 °Brix. Notably, the distribution is unimodal, with 68% of the data points falling within one standard deviation of the mean (9.55 to 13.97 °Brix), indicating that most values are highly concentrated around the average SSC.

3.2. CA-CNN Model

In this study, the kernel size of the first to third convolution layers was uniformly set to three. For each max pooling layer, both the pooling size and stride were configured to two. The number of convolution kernels, activation functions, optimization algorithms, batch sizes, and learning rates for all convolution layers were optimized using the BOA. Thirty iterations were assigned to the BOA in the experiment, and the model was trained for thirty epochs. The optimization process details are illustrated in Figure 6, while the optimized parameter values for the CA-CNN model are provided in Table 2. Figure 6a indicates that the average R2 value of the training set reached its peak during the 22nd iteration, attaining a value of 0.9754. As depicted in Figure 6b, although several iterations after the 22nd iteration also achieved the highest R2 value, the 22nd iteration demonstrated superior performance stability compared to others. Consequently, CA-CNN adopted the hyperparameters from the 22nd iteration and evaluated them through five-fold cross-validation, with the corresponding performance depicted in Figure 6c. Table 2 presents the hyperparameters and their respective search spaces.

3.3. SA-CNN Model

In the SA-CNN, each convolutional layer is followed by a batch normalization layer. All convolutional layers are configured with predefined kernel sizes, as are the pooling size and stride of the max pooling layers. Specifically, the max pooling layers have a stride of (2, 2) and a pooling size of (3, 3); the first convolutional layer employs a (3, 3) kernel, the second a (5, 5) kernel, and the third a (7, 7) kernel. The optimization strategy mirrors that of the CA-CNN, with the BOA run for 30 iterations and the model trained for 30 epochs. Figure 7a illustrates the best R2 value achieved in each iteration, with the highest R2 occurring at the 18th iteration. Figure 7b provides a detailed depiction of the BOA iteration process. As depicted, several iterations after the 18th attain the same highest R2 value but exhibit greater performance fluctuations. Consequently, the hyperparameters from the 18th iteration were ultimately selected (see Table 3 for specifics). Figure 7c displays the R2 value of the model under five-fold cross-validation. Table 3 lists the hyperparameters and their respective search spaces.

3.4. MA-CNN Model

In the MA-CNN model, a parallel architecture separates the channel-attention-based spectral feature extraction from the spatial-attention-based spatial feature extraction. This prevents mutual interference between the two processes while reducing network depth and model complexity. The extracted spectral and spatial features are then fused and fed into a fully connected layer, with the BOA being utilized to select the best-performing model. Figure 8 illustrates the optimization results of the MA-CNN, with the highest R2 value of the training set per iteration presented in Figure 8a. As illustrated, the optimal performance (0.9796) is achieved at the eighth iteration. The detailed procedure of the BOA can be found in Figure 8b. Although multiple iterations yield optimal performance, some require longer training times due to an excessive number of hidden layer nodes. Consequently, the hyperparameters from the eighth iteration were ultimately chosen for their minimal number of hidden layer nodes. The R2 value of the optimized model under five-fold cross-validation is shown in Figure 8c. Table 4 provides the specific hyperparameters and their search spaces.

3.5. Comparison of SSC Detection Model

In this section, the models without the feature fusion module concatenate the spatial and spectral features directly and produce the output through two fully connected layers. The results are shown in Table 5, comparing models built on the P1 dataset, the P2 dataset, and the fused P1 and P2 data. Among the CNN models without attention modules, the CNN based on the P2 input data exhibits the worst performance, with an $R_p^2$ of 0.9389, an RMSEP of 0.0906 °Brix, and an RPD of 2.3529. The CNN based on the fused P1 and P2 data is slightly better than the CNN based on the P1 input data, with $R_p^2$ values of 0.9578 and 0.9409, RMSEP values of 0.0743 °Brix and 0.0842 °Brix, and RPD values of 3.0635 and 2.6384, respectively.
Among the CNN models with attention modules, when only the spectral feature extraction branch with the CA module is used, the $R_p^2$, RMSEP, and RPD are 0.9571, 0.0738 °Brix, and 2.9876, respectively. When only the spatial feature extraction branch with the SA module is used, they are 0.9516, 0.0795 °Brix, and 2.8593, respectively. The performance in these two scenarios is similar, reflecting the limited information utilization of single-category feature extraction, which hinders further improvement in prediction accuracy. When both branches are used together, the MA-CNN model with the fused P1 and P2 data obtains the best results, with an $R_p^2$ of 0.9602, an RMSEP of 0.0612 °Brix, and an RPD of 3.3417, indicating that the MA-CNN model can integrate multi-layer attention information and better focus the network on key feature information.

3.6. Comparative Evaluation and Computational Complexity Analysis of Different Models

To evaluate the performance and computational cost of the MA-CNN model, four deep learning networks were introduced: Vision Transformer (ViT) [41], Hybrid Spectral Network (HybridSN) [42], Spectral–Spatial Attention Network (SSAN) [43], and HybridViT network [44]. All models were tested using the same set of experimental samples. The MA-CNN model followed the parameter settings in this paper, while the parameters of the other models were set with reference to the relevant literature.
The experimental results are detailed in Table 6. As shown, the ViT model, lacking the image inductive bias built into CNNs, requires more training data to achieve the same performance. Compared to ViT, HybridSN combines the advantages of 3D and 2D convolution, SSAN introduces spatial and spectral attention mechanisms, and HybridViT integrates spatial and spectral features with attention mechanisms; all three outperform ViT. A comprehensive comparison of Table 5 and Table 6 shows that the MA-CNN model, which adopts a spatial–spectral feature fusion module, optimizes the extraction of spectral and spatial features by introducing channel attention and spatial attention modules into the dual-branch CNN, generating spectral weight vectors that represent the importance of spectral bands and spatial weight matrices that represent the importance of neighboring pixels. This enhances key features, weakens redundant features, and effectively improves the model's prediction performance.
To thoroughly evaluate model complexity and computational time cost, this section tests the different models on the same set of experimental samples. Computational complexity is assessed by counting the trainable weight parameters updated during backpropagation in each network, and the training and testing times of each network are recorded to quantify the computational cost of the different methods. The experimental results are presented in Table 7. The number of trainable parameters in the MA-CNN is moderate among the five networks. SSAN has the fewest trainable parameters and therefore the shortest training and testing times. Because the computational complexity of its self-attention mechanism grows quadratically with input sequence length, ViT has the largest number of network parameters, leading to the longest model training and testing times. The MA-CNN extracts multi-dimensional features through a dual-branch architecture and adopts a spatial–spectral multi-attention strategy, which increases computational cost while improving prediction accuracy. Even so, the proposed method is broadly comparable to current mainstream algorithms in training and testing time, ranking third in training time and fourth in testing time among the five methods. More importantly, the MA-CNN achieves the best prediction accuracy within an acceptable time frame.

3.7. Discussion

In previous work on predicting SSC in fruits, Li et al. detected SSC using FT-NIR spectroscopy; the optimal PLSR model was obtained by selecting feature bands with CARS, yielding a correlation coefficient of 0.92 and an RMSEP of 0.661 °Brix [45]. Guo et al. used shortwave near-infrared spectroscopy to predict the SSC of ‘Fuji’ apples and obtained superior performance with an ICA-SVR model (Rp of 0.9398, RMSEP of 0.3870%) [46]. These studies all employed NIR spectroscopy combined with traditional machine learning to build SSC prediction models; however, NIR spectroscopy captures only spectral information, and traditional machine learning relies on feature engineering that requires extensive human experience and expert knowledge. Qi et al. utilized hyperspectral imaging combined with a convolutional neural network and a Transformer (CNN–Transformer) to analyze the SSC of cherry tomatoes, yielding a determination coefficient of only 0.83, perhaps because only spectral information was used while spatial information was overlooked [47]. In contrast, the MA-CNN model proposed in this paper exhibits superior performance, benefiting from the dual-branch parallel architecture, which not only separates the extraction of spectral and spatial features and avoids mutual interference between the two processes, but also reduces network depth and model complexity.

4. Conclusions

In this study, the MA-CNN model with the fused P1 and P2 data was assessed as a nondestructive method for effectively detecting the SSC of apple samples. By embedding the channel attention and spatial attention modules into the CNN, the network can adaptively focus on important features and reduce interference from redundant information. Compared with traditional machine learning methods based on spectral and spatial features, this approach significantly improves detection performance. The BOA was introduced to automatically select the optimal hyperparameter combination for the CA-CNN, SA-CNN, and MA-CNN models. Experiments showed that the fused model was superior to the CA-CNN and SA-CNN models in accuracy and robustness, and the results confirmed that combining deep learning algorithms with hyperspectral imaging technology can effectively improve detection performance, opening up new ideas for apple quality parameter prediction and providing innovative directions for developing fast, reliable, and nondestructive quality inspection tools for other products.

Author Contributions

Methodology, Y.T.; software, Y.T.; writing—original draft, Y.T.; investigation, J.S.; project administration, J.S.; conceptualization, X.Z.; resources, X.Z.; data curation, S.C.; supervision, S.C.; formal analysis, C.D.; validation, L.S.; writing—review and editing, L.S. All authors have read and agreed to the published version of the manuscript.

Funding

Funding was received from the Postgraduate Research & Practice Innovation Program of Jiangsu Province (SJCX25-2450).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mendoza, F.; Lu, R.; Cen, H. Grading of apples based on firmness and soluble solids content using Vis/SWNIR spectroscopy and spectral scattering techniques. J. Food Eng. 2014, 125, 59–68. [Google Scholar] [CrossRef]
  2. Wu, X.; Zhou, H.; Wu, B.; Fu, H. Determination of apple varieties by near infrared reflectance spectroscopy coupled with improved possibilistic Gath-Geva clustering algorithm. J. Food Process. Preserv. 2020, 44, e14561. [Google Scholar] [CrossRef]
  3. Xu, Q.; Wu, X.; Wu, B.; Zhou, H. Detection of apple varieties by near-infrared reflectance spectroscopy coupled with SPSO-PFCM. J. Food Process Eng. 2022, 45, e13993. [Google Scholar] [CrossRef]
  4. Bobelyn, E.; Serban, A.; Nicu, M.; Lammertyn, J.; Nicolai, B.M.; Saeys, W. Postharvest quality of apple predicted by NIR-spectroscopy: Study of the effect of biological variability on spectra and model performance. Postharvest Biol. Technol. 2010, 55, 133–143. [Google Scholar] [CrossRef]
  5. Tian, Y.; Sun, J.; Zhou, X.; Yao, K.; Tang, N. Detection of soluble solid content in apples based on hyperspectral technology combined with deep learning algorithm. J. Food Process. Preserv. 2022, 46, e16414. [Google Scholar] [CrossRef]
  6. Rong, Y.; Zareef, M.; Liu, L.; Din, Z.; Chen, Q.; Ouyang, Q. Application of portable Vis-NIR spectroscopy for rapid detection of myoglobin in frozen pork. Meat Sci. 2023, 201, 109170. [Google Scholar] [CrossRef] [PubMed]
  7. Wu, J.; Zareef, M.; Chen, Q.; Ouyang, Q. Application of visible-near infrared spectroscopy in tandem with multivariate analysis for the rapid evaluation of matcha physicochemical indicators. Food Chem. 2023, 421, 136185. [Google Scholar] [CrossRef]
  8. Ouyang, Q.; Rong, Y.; Wu, J.; Wang, Z.; Lin, H.; Chen, Q. Application of colorimetric sensor array combined with visible near-infrared spectroscopy for the matcha classification. Food Chem. 2023, 420, 136078. [Google Scholar] [CrossRef]
  9. Liu, L.; Zareef, M.; Wang, Z.; Li, H.; Chen, Q.; Ouyang, Q. Monitoring chlorophyll changes during Tencha processing using portable near-infrared spectroscopy. Food Chem. 2023, 412, 135505. [Google Scholar] [CrossRef] [PubMed]
  10. Wu, X.; Fang, Y.; Wu, B.; Liu, M. Application of Near-Infrared Spectroscopy and Fuzzy Improved Null Linear Discriminant Analysis for Rapid Discrimination of Milk Brands. Foods 2023, 12, 3929. [Google Scholar] [CrossRef]
  11. Li, Q.; Wu, X.; Zheng, J.; Wu, B.; Jian, H.; Sun, C.; Tang, Y. Determination of Pork Meat Storage Time Using Near-Infrared Spectroscopy Combined with Fuzzy Clustering Algorithms. Foods 2022, 11, 2101. [Google Scholar] [CrossRef]
  12. Li, H.; Zhang, W.; Nunekpeku, X.; Sheng, W.; Chen, Q. Investigating the change mechanism and quantitative analysis of minced pork gel quality with different starches using Raman spectroscopy. Food Hydrocoll. 2025, 159, 110634. [Google Scholar] [CrossRef]
  13. Jiang, H.; Wang, Z.; Deng, J.; Ding, Z.; Chen, Q. Quantitative detection of heavy metal Cd in vegetable oils: A nondestructive method based on Raman spectroscopy combined with chemometrics. J. Food Sci. 2024, 89, 8054–8065. [Google Scholar] [CrossRef] [PubMed]
  14. Zhu, J.; Jiang, X.; Rong, Y.; Wei, W.; Wu, S.; Jiao, T.; Chen, Q. Label-free detection of trace level zearalenone in corn oil by surface-enhanced Raman spectroscopy (SERS) coupled with deep learning models. Food Chem. 2023, 414, 135705. [Google Scholar] [CrossRef] [PubMed]
  15. Liang, J.; Wang, Y.; Shi, Y.; Huang, X.; Li, Z.; Zhang, X.; Zou, X.; Shi, J. Non-destructive discrimination of homochromatic foreign materials in cut tobacco based on VIS-NIR hyperspectral imaging. J. Sci. Food Agric. 2023, 103, 4545–4552. [Google Scholar] [CrossRef]
  16. Yu, X.; Lu, H.; Liu, Q. Deep-learning-based regression model and hyperspectral imaging for rapid detection of nitrogen concentration in oilseed rape (Brassica napus L.) leaf. Chemom. Intell. Lab. Syst. 2017, 172, 188–193. [Google Scholar] [CrossRef]
  17. Tian, X.; Aheto, J.; Huang, X.; Zheng, K.; Dai, C.; Wang, C.; Bai, J. An evaluation of biochemical, structural and volatile changes of dry-cured pork using a combined ion mobility spectrometry, hyperspectral and confocal imaging approach. J. Sci. Food Agric. 2021, 101, 5972–5983. [Google Scholar] [CrossRef]
  18. Shi, L.; Sun, J.; Zhang, B.; Wu, Z.; Jia, Y.; Yao, K.; Zhou, X. Simultaneous detection for storage condition and storage time of yellow peach under different storage conditions using hyperspectral imaging with multi-target characteristic selection and multi-task model. J. Food Compos. Anal. 2024, 135, 106647. [Google Scholar] [CrossRef]
  19. Tian, Y.; Sun, J.; Zhou, X.; Wu, X.; Lu, B.; Dai, C. Research on apple origin classification based on variable iterative space shrinkage approach with stepwise regression–support vector machine algorithm and visible-near infrared hyperspectral imaging. J. Food Process. Eng. 2020, 43, e13432. [Google Scholar] [CrossRef]
  20. Shanthini, K.; George, S.N.; Chandran, O. NorBlueNet: Hyperspectral imaging-based hybrid CNN-transformer model for non-destructive SSC analysis in Norwegian wild blueberries. Comput. Electron. Agric. 2025, 235, 110340. [Google Scholar] [CrossRef]
  21. Yu, X.; Lu, H.; Wu, D. Development of deep learning method for predicting firmness and soluble solid content of postharvest Korla fragrant pear using Vis/NIR hyperspectral reflectance imaging. Postharvest Biol. Technol. 2018, 141, 39–49. [Google Scholar] [CrossRef]
  22. Xu, M.; Sun, J.; Yao, K. Nondestructive detection of total soluble solids in grapes using VMD-RC and hyperspectral imaging. J. Food Sci. 2022, 87, 236–338. [Google Scholar] [CrossRef]
  23. Guo, Z.; Zou, Y.; Sun, C.; Jayan, H.; Jiang, S.; El-Seedi, H.; Zou, X. Nondestructive determination of edible quality and watercore degree of apples by portable Vis/NIR transmittance system combined with CARS-CNN. J. Food Meas. Charact. 2024, 18, 4058–4073. [Google Scholar] [CrossRef]
  24. Cheng, J.; Sun, J.; Yao, K.; Xu, M.; Dai, C. Multi-task convolutional neural network for simultaneous monitoring of lipid and protein oxidative damage in frozen-thawed pork using hyperspectral imaging. Meat Sci. 2023, 201, 109196. [Google Scholar] [CrossRef] [PubMed]
  25. Wang, Y.; Li, T.; Chen, T.; Zhang, X.; Taha, M.; Yang, N.; Mao, H.; Shi, Q. Cucumber Downy Mildew Disease Prediction Using a CNN-LSTM Approach. Agriculture 2024, 14, 1155. [Google Scholar] [CrossRef]
  26. Zhao, L.; Zhou, S.; Liu, Y.; Pang, K.; Yin, Z.; Chen, H. Strawberry defect detection and visualization using hyperspectral imaging. Spectrosc. Spectr. Anal. 2025, 45, 1310–1318. [Google Scholar] [CrossRef]
  27. Roy, S.; Manna, S.; Song, T.; Bruzzone, L. Attention-based adaptive spectral-spatial kernel ResNet for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 7831–7843. [Google Scholar] [CrossRef]
  28. Alam, M.; Sultana, N.; Hossain, S. Bayesian optimization algorithm based support vector regression analysis for estimation of shear capacity of FRP reinforced concrete members. Appl. Soft Comput. 2021, 105, 107281. [Google Scholar] [CrossRef]
  29. Sun, J.; Zhou, X.; Hu, Y.; Wu, X.; Zhang, X.; Wang, P. Visualizing distribution of moisture content in tea leaves using optimization algorithms and NIR hyperspectral imaging. Comput. Electron. Agric. 2019, 160, 153–159. [Google Scholar] [CrossRef]
  30. Vega, D.; Aldana, A.; Zuluaga, D. Prediction of dry matter content of recently harvested ‘Hass’ avocado fruits using hyperspectral imaging. J. Sci. Food Agric. 2020, 101, 897–906. [Google Scholar] [CrossRef]
31. Huang, Y.; Li, J.; Yang, R.; Wang, F.; Qian, W. Hyperspectral imaging for identification of an invasive plant Mikania micrantha Kunth. Front. Plant Sci. 2021, 12, 626516. [Google Scholar] [CrossRef] [PubMed]
  32. Hu, Y.; Sheng, W.; Adade, S.; Wang, J.; Li, H.; Chen, Q. Comparison of machine learning and deep learning models for detecting quality components of vine tea using smartphone-based portable near-infrared device. Food Control 2025, 174, 111244. [Google Scholar] [CrossRef]
33. Xue, Y.; Jiang, H. Monitoring of chlorpyrifos residues in corn oil based on Raman spectral deep-learning model. Foods 2023, 12, 2402. [Google Scholar] [CrossRef]
  34. Plakias, S.; Boutalis, Y. Fault detection and identification of rolling element bearings with attentive dense CNN. Neurocomputing 2020, 405, 208–217. [Google Scholar] [CrossRef]
  35. Shi, H.; Sun, H.; Zhao, C.; Han, G.; Wu, R.; Liu, Y. Bearing fault diagnosis based on residual networks and grouped two-level attention mechanism for multisource signal Fusion. IEEE Trans. Instrum. Meas. 2025, 74, 3532211. [Google Scholar] [CrossRef]
  36. Huang, T.; Fu, S.; Feng, H.; Kuang, J. Bearing fault diagnosis based on shallow multi-scale convolutional neural network with attention. Energies 2019, 12, 3937. [Google Scholar] [CrossRef]
  37. Liu, H.; Wei, C.; Sun, B. Quantitative Evaluation for robustness of intelligent fault diagnosis algorithms based on self-attention mechanism. J. Internet Technol. 2024, 25, 921–929. [Google Scholar] [CrossRef]
  38. Zhou, X.; Luo, C.; Ren, P.; Zhang, B. Multiscale Complex-Valued Feature Attention Convolutional Neural Network for SAR Automatic Target Recognition. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 2052–2066. [Google Scholar] [CrossRef]
39. Xu, Q.; Jiang, H.; Zhang, X.; Li, J.; Chen, L. Multiscale convolutional neural network based on channel space attention for gearbox compound fault diagnosis. Sensors 2023, 23, 3827. [Google Scholar] [CrossRef]
  40. Ni, J.; Xue, Y.; Zhou, Y.; Miao, M. Rapid identification of greenhouse tomato senescent leaves based on the sucrose-spectral quantitative prediction model. Biosyst. Eng. 2024, 238, 200–211. [Google Scholar] [CrossRef]
  41. Visweswaran, M.; Mohan, J.; Kumar, S.; Soman, K. Synergistic detection of multimodal fake news leveraging TextGCN and vision transformer. Procedia Comput. Sci. 2024, 235, 142–151. [Google Scholar] [CrossRef]
  42. Roy, S.; Krishna, G.; Dubey, S.; Chaudhuri, B. HybridSN: Exploring 3-D–2-D CNN feature hierarchy for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2019, 17, 277–281. [Google Scholar] [CrossRef]
  43. Zhao, J.; Wang, J.; Chao, R.; Huang, L. Dual-Branch Spectral-spatial attention network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5504718. [Google Scholar] [CrossRef]
  44. Yang, X.; Cao, W.; Tang, D.; Zhou, Y.; Lu, Y. ACTN: Adaptive coupling transformer network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5503115. [Google Scholar] [CrossRef]
  45. Li, X.; Huang, J.; Xiong, Y.; Zhou, J.; Tan, X.; Zhang, B. Determination of soluble solid content in multi-origin ‘Fuji’ apples by using FT-NIR spectroscopy and an origin discriminant strategy. Comput. Electron. Agric. 2018, 155, 23–31. [Google Scholar] [CrossRef]
  46. Guo, Z.; Huang, W.; Peng, Y.; Chen, Q.; Ouyang, Q.; Zhao, L. Color compensation and comparison of shortwave near infrared and long wave near infrared spectroscopy for determination of soluble solids content of ‘Fuji’ apple. Postharvest Biol. Technol. 2016, 115, 81–90. [Google Scholar] [CrossRef]
  47. Qi, H.; Li, H.; Chen, L.; Chen, F.; Luo, J.; Zhang, C. Hyperspectral Imaging Using a Convolutional Neural Network with Transformer for the Soluble Solid Content and pH Prediction of Cherry Tomatoes. Foods 2024, 13, 251. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Hyperspectral imaging system and ROI extraction.
Figure 2. MA-CNN architecture diagram.
Figure 3. CAM module structure diagram.
Figure 4. SAM module structure diagram.
Figure 5. The distribution frequency histogram of SSC in the apple samples.
Figure 6. Details of the BOA as used in the CA-CNN model. (a) The value of R2 of the training set for each iteration; (b) the value of R2 of the training set for each epoch; and (c) the value of R2 for five-fold cross-validation.
Figure 7. Details of the BOA as used in the SA-CNN model. (a) The value of R2 of the training set for each iteration; (b) the value of R2 of the training set for each epoch; and (c) the value of R2 for five-fold cross-validation.
Figure 8. Details of the BOA as used in the MA-CNN model. (a) The value of R2 of the training set for each iteration; (b) the value of R2 of the training set for each epoch; and (c) the value of R2 for five-fold cross-validation.
Table 1. Measurement results of SSC in apple samples.

| Dataset | Samples | Max (°Brix) | Min (°Brix) | Mean (°Brix) | SD (°Brix) |
|---|---|---|---|---|---|
| Calibration set | 320 | 18.10 | 7.20 | 12.23 | 2.21 |
| Prediction set | 95 | 15.41 | 8.05 | 11.34 | 1.89 |
| Total samples | 570 | 18.10 | 7.20 | 11.76 | 1.92 |
Table 2. Optimized parameters of the CA-CNN.

| Parameters | Search Space | Search Results |
|---|---|---|
| Number of filters of Conv1 to Conv3 | (4, 64), (8, 128), (16, 256) | (32, 64, 128) |
| Neurons in FC1 to FC2 | (32, 128), (32, 128) | (83, 51) |
| Learning rate | (1 × 10−4, 1 × 10−1) | 0.0009 |
| Batch size | (2, 128) | 32 |
| Activation function | [ReLU, SoftMax, Sigmoid, ELU] | ReLU |
| Optimization method | [SGD, Adam, AdaBound, RMSProp] | AdaBound |
Table 3. Optimized parameters of the SA-CNN.

| Parameters | Search Space | Search Results |
|---|---|---|
| Number of filters of Conv1 to Conv3 | (4, 64), (16, 256), (8, 128) | (64, 128, 56) |
| Neurons in FC1 to FC2 | (64, 256), (64, 256) | (125, 93) |
| Learning rate | (1 × 10−4, 1 × 10−2) | 0.0007 |
| Batch size | (2, 128) | 53 |
| Activation function | [ReLU, SoftMax, Sigmoid, ELU] | ELU |
| Optimization method | [SGD, Adam, AdaBound, RMSProp] | RMSProp |
Table 4. Optimized parameters of the MA-CNN.

| Parameters | Search Space | Search Results |
|---|---|---|
| Neurons in FC1 to FC2 | (64, 256), (64, 256) | (86, 86) |
| Learning rate | (1 × 10−5, 1 × 10−1) | 0.0001 |
| Batch size | (4, 128) | 69 |
| Activation function | [ReLU, SoftMax, Sigmoid, ELU] | ReLU |
| Optimization method | [SGD, Adam, AdaBound, RMSProp] | AdaBound |
Table 5. Model performance based on deep learning models constructed with different inputs.

| Input Data | Model | $R_c^2$ | RMSEC (°Brix) | $R_p^2$ | RMSEP (°Brix) | RPD |
|---|---|---|---|---|---|---|
| P1 | CA-CNN | 0.9754 | 0.0698 | 0.9571 | 0.0738 | 2.9876 |
| P1 | CNN | 0.9607 | 0.0771 | 0.9409 | 0.0842 | 2.6384 |
| P2 | SA-CNN | 0.9732 | 0.0583 | 0.9516 | 0.0795 | 2.8593 |
| P2 | CNN | 0.9588 | 0.0897 | 0.9389 | 0.0906 | 2.3529 |
| P1 and P2 | MA-CNN | 0.9796 | 0.0513 | 0.9602 | 0.0612 | 3.3417 |
| P1 and P2 | CNN | 0.9691 | 0.0659 | 0.9578 | 0.0743 | 3.0635 |

Note: $R_c^2$ and RMSEC refer to the calibration set; $R_p^2$, RMSEP, and RPD refer to the prediction set. P1: the image block size is 3 × 3; P2: the image block size is 31 × 31.
Table 6. Comparison of the performance of different models.

| Model | $R_c^2$ | RMSEC (°Brix) | $R_p^2$ | RMSEP (°Brix) | RPD |
|---|---|---|---|---|---|
| ViT | 0.9514 | 0.0541 | 0.9151 | 0.1496 | 2.7837 |
| HybridSN | 0.9633 | 0.0438 | 0.9210 | 0.0938 | 3.3121 |
| SSAN | 0.9678 | 0.0437 | 0.9230 | 0.0875 | 3.3252 |
| HybridViT | 0.9758 | 0.0406 | 0.9357 | 0.0806 | 3.2863 |

Note: $R_c^2$ and RMSEC refer to the training set; $R_p^2$, RMSEP, and RPD refer to the test set.
Table 7. Analysis of model complexity and computational time cost of different methods.

| Methods | Trainable Params (M) | Training Time (s) | Testing Time (s) |
|---|---|---|---|
| ViT | 0.25 | 1570 | 5.2 |
| HybridSN | 0.29 | 1457 | 4.8 |
| SSAN | 0.14 | 630 | 2.3 |
| HybridViT | 0.22 | 1289 | 4.1 |
| MA-CNN | 0.23 | 1103 | 4.6 |