1. Introduction
Strawberries are widely cultivated worldwide and are one of the most popular fruits among consumers. They are rich in various nutritional components, including sugars, vitamins, minerals, and bioactive compounds such as anthocyanins, making them beneficial for promoting human health and preventing disease [1]. Strawberry quality is typically evaluated based on several physicochemical factors, including soluble solid content (SSC), titratable acidity (TA) [2], and firmness. Among these, SSC is considered the most important quality indicator for determining taste, maturity, and harvest timing [3]. Furthermore, Korean consumers regard SSC as a critical criterion when selecting strawberries.
Conventional SSC measurement methods have primarily involved destructive techniques such as high-performance liquid chromatography, gas chromatography, and refractive-index-based sugar content meters. However, these approaches are unsuitable for large-scale sorting because they destroy the samples and require considerable time and labor for analysis. Consequently, near-infrared (NIR) spectroscopy [1] and hyperspectral imaging (HSI) have recently attracted attention as nondestructive analytical tools for evaluating food quality and shelf life [4,5,6]. In particular, HSI, which combines spectroscopy and image processing, can acquire both spectral and spatial information about the internal and external quality of food, making it highly applicable to fruit quality assessment [2].
Several studies have demonstrated the feasibility of rapid, nondestructive quality evaluation by predicting quality factors such as strawberry SSC using deep learning-based artificial neural networks in the visible–NIR and short-wave infrared wavelength ranges [7]. Among regression models developed to predict quality factors, including SSC, pH, and vitamin C in strawberries, partial least squares regression (PLSR) models exhibited the best predictive performance for SSC [3]. A PLSR model designed to visualize the spatial distribution of sugar content in white strawberry flesh achieved an R² of 0.841 and a root mean square error of prediction (RMSEP) of 0.576 [8]. Similarly, a PLSR model based on the NIR range for predicting SSC and pH in cherry tomatoes achieved an R² of 0.84 and an RMSEP of 0.56 [9]. Furthermore, in a comparison of one-dimensional (1D) and three-dimensional (3D) convolutional neural network (CNN) models for strawberry sugar content prediction, the 1D CNN showed superior performance with smaller sample sizes [10].
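The studies above are compared mainly through R², RMSEP, and related error metrics. The minimal Python sketch below shows how these quantities are commonly computed. It assumes R² is defined as 1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation between measured and predicted values, and the cited works do not all state which definition they use. RMSEP, RMSEV, and RMSE share one formula and differ only in the data split (independent prediction set, validation set, or other) on which they are evaluated. The function names and sample values are illustrative.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error; called RMSEP when computed on a prediction set
    and RMSEV when computed on a validation set."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted SSC values (°Brix)
measured = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 13.0]
print(round(r2(measured, predicted), 3))    # 0.75
print(round(rmse(measured, predicted), 3))  # 0.816
```

Because RMSEP is expressed in the units of SSC (°Brix), it is directly interpretable, whereas R² also depends on how widely SSC varies in the sample set.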
Recent research has also focused extensively on SSC prediction for fruit maturity analysis. In studies combining HSI with machine learning to predict quality factors in mangoes and strawberries, random forest [11] models showed the highest performance for strawberry SSC prediction, with an R² of 0.857 and a mean squared error of 0.13 [12]. Research applying PLSR models to predict sugar content in goji berries achieved an R² of 0.94 and an RMSEP of 0.70 [13]. In studies developing machine learning models to predict SSC across four maturity stages of strawberries, support vector machine (SVM) models achieved an R² of 0.89 and a root mean square error (RMSE) of 0.72 [14]. Research on quality-factor-based models to predict the storage life of jujubes reported sugar content predictions with R² values of 0.837 and 0.806 and root mean square error of validation (RMSEV) values of 0.810 and 1.304 for medium-ripe and fully ripe fruit, respectively [15]. Additionally, HSI-based machine learning models have been developed to predict sugar content and classify ripeness in kiwifruit during postharvest storage, with SVM models achieving an RMSEP of 0.890 and a classification accuracy of 92.381% [16]. Research has also integrated HSI, organic analysis, and machine learning to predict the maturity of Seolhyang strawberries [17].
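Because PLSR recurs throughout the studies above, a compact single-response PLS (PLS1) sketch based on the NIPALS algorithm is given below. It assumes mean-centred spectra without variance scaling and is an illustration of the technique, not the implementation used in the cited studies or in this work; scikit-learn, for example, provides a maintained version as `sklearn.cross_decomposition.PLSRegression`. The function names `pls1_fit` and `pls1_predict` are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS regression with NIPALS.

    X: (n_samples, n_bands) spectra; y: (n_samples,) reference SSC values.
    Returns (x_mean, y_mean, coef) such that y_hat = (X - x_mean) @ coef + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean   # mean-centre only (no variance scaling)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight vector: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # latent-variable scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # B = W (P^T W)^-1 q
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    """Predict SSC for new spectra with a fitted PLS1 model."""
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
```

In practice the number of latent variables is usually chosen by cross-validation, because too many components fit spectral noise and inflate RMSEP on new fruit.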
Sugar content in strawberries is not uniformly distributed but varies spatially within the flesh. A previous study divided strawberries into three equal sections and analyzed their correlation with phosphorus content, confirming regional differences in sugar content that were closely related to phosphorus content [18]. These spatial patterns reflect the underlying sugar accumulation dynamics during fruit development, in which changes in sucrose content are closely associated with fruit growth and regulated by sucrose transporters such as FaSUT1 [19]. Metabolomic analyses further indicate that the strawberry receptacle undergoes marked increases in major soluble sugars (sucrose, glucose, and fructose) during maturation, whereas achenes exhibit contrasting metabolic trajectories [20]. Such spatial variations in sugar distribution can be attributed to the fruit’s vascular architecture: vascular bundles transport nutrients acropetally from the pith toward the achenes and surrounding receptacle cells [21], thereby establishing differential accumulation patterns across fruit regions. However, most existing nondestructive sugar content prediction studies have not accounted for such differences and have used the average sugar content of the entire flesh. Moreover, spectral data obtained using NIR spectroscopy typically represent average values across the whole fruit, which may obscure region-specific spectral information. Conversely, because pixel-level spectral data are susceptible to noise, averaging over a spatial region can improve the accuracy of SSC prediction. Region-based analysis balances these considerations: it retains the noise reduction of spatial averaging while capturing localized spectral features that represent the actual sugar distribution of the fruit more accurately than a single whole-fruit average. Although some studies have targeted the edge regions of strawberries for specific purposes owing to presumed SSC variations, these regions exhibit the least variation in SSC during growth and therefore have limited utility as representative indicators of overall fruit quality [22]. This highlights the lack of standardized criteria for selecting representative regions in applications such as nondestructive sorting equipment or portable sugar content measurement devices.
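The trade-off between pixel-level noise and spatial averaging can be illustrated numerically. The sketch below assumes an idealized case in which every pixel in a region shares the same underlying spectrum and carries independent Gaussian noise; the spectrum shape, wavelength range, noise level, and the function name `region_mean_error` are all hypothetical. Under these assumptions, the error of the region-mean spectrum shrinks roughly as σ/√n with the number of averaged pixels n.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands = 200
wavelengths = np.linspace(400, 1000, n_bands)         # hypothetical band centres (nm)
true_spectrum = 0.4 + 0.1 * np.sin(wavelengths / 80)   # hypothetical noise-free reflectance
noise_sd = 0.05                                        # hypothetical per-pixel noise level


def region_mean_error(n_pixels):
    """RMSE between a region-averaged noisy spectrum and the noise-free spectrum."""
    pixels = true_spectrum + rng.normal(0.0, noise_sd, size=(n_pixels, n_bands))
    mean_spectrum = pixels.mean(axis=0)
    return float(np.sqrt(np.mean((mean_spectrum - true_spectrum) ** 2)))


for n in (1, 25, 400):
    # Observed error next to the idealized sigma / sqrt(n) prediction
    print(n, round(region_mean_error(n), 4), round(noise_sd / np.sqrt(n), 4))
```

In a real fruit the underlying spectrum itself differs between regions, so averaging over the whole fruit also blends region-specific information; this is the motivation for averaging within defined regions rather than across the entire surface.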
Therefore, this study aimed to develop and validate a region-based hyperspectral imaging and deep learning framework for the nondestructive prediction of SSC in strawberries. The strawberry surface was segmented into five concentric regions (G1–G5) to systematically assess the effect of spatial variability on model performance. To this end, a machine learning model (PLSR) and deep learning architectures, including a simplified and lightweight VGG-CNN, were developed and compared with established CNNs in terms of both accuracy and computational efficiency. The ultimate goal is to deliver a practical solution that enhances prediction performance while supporting real-time applications in postharvest quality control, high-throughput sorting, and portable sensing systems.
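The segmentation rule for G1–G5 is not specified in this section, so the sketch below is only one plausible way to build concentric regions from a binary fruit mask. It assumes nested regions, each defined as the innermost fraction of fruit pixels, with "inner" measured by how many 4-neighbour binary erosions a pixel survives; this makes the regions follow the fruit outline rather than forming circles. The area fractions and the function names (`erosion_depth`, `nested_regions`, `region_mean_spectra`) are hypothetical.

```python
import numpy as np


def erosion_depth(mask):
    """Number of 4-neighbour erosions each foreground pixel survives (0 = outer edge)."""
    mask = mask.astype(bool)
    depth = np.zeros(mask.shape, dtype=int)
    current = mask.copy()
    level = 0
    while current.any():
        padded = np.pad(current, 1, constant_values=False)
        # A pixel survives erosion only if its up, down, left, and right neighbours are set
        eroded = (current
                  & padded[:-2, 1:-1] & padded[2:, 1:-1]
                  & padded[1:-1, :-2] & padded[1:-1, 2:])
        depth[current & ~eroded] = level
        current = eroded
        level += 1
    return depth


def nested_regions(mask, fractions):
    """Boolean masks holding the innermost `fraction` of fruit pixels for each fraction."""
    mask = mask.astype(bool)
    depth = erosion_depth(mask)
    ys, xs = np.nonzero(mask)
    order = np.argsort(-depth[ys, xs], kind="stable")   # deepest (innermost) pixels first
    regions = []
    for f in fractions:
        k = int(round(f * len(order)))
        region = np.zeros_like(mask)
        region[ys[order[:k]], xs[order[:k]]] = True
        regions.append(region)
    return regions


def region_mean_spectra(cube, regions):
    """Average spectrum of each region; `cube` has shape (rows, cols, bands)."""
    return np.stack([cube[r].mean(axis=0) for r in regions])


yy, xx = np.mgrid[:60, :40]
fruit = ((yy - 30) / 28.0) ** 2 + ((xx - 20) / 18.0) ** 2 <= 1.0   # hypothetical elliptical mask
regions = nested_regions(fruit, fractions=(0.2, 0.4, 0.6, 0.8, 1.0))
print([int(r.sum()) for r in regions])
```

If non-overlapping rings are preferred over nested regions, they can be obtained by differencing consecutive masks, for example `[regions[0]] + [b & ~a for a, b in zip(regions, regions[1:])]`.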
4. Discussion
Despite the promising outcomes, several limitations remain. First, the study was conducted using a single strawberry cultivar (Seolhyang) harvested under specific seasonal and regional conditions. Strawberry fruit quality is influenced by both genetic and environmental factors. Large-scale genetic studies have demonstrated substantial SSC variation across cultivars, with some varieties exhibiting markedly different sugar accumulation patterns owing to differences in sucrose transporters and metabolic pathways [54]. Environmental conditions such as temperature and harvest season can also significantly affect fruit SSC and other quality attributes [55,56]; for example, winter and spring harvests can differ markedly in SSC because of variations in temperature and solar radiation.
Additional validation across diverse cultivars, maturity stages, and growing environments is required to confirm the generalizability of the proposed framework. Nevertheless, the core value of this study lies not in the specific trained model optimized for a single cultivar but in the methodological validity of the region-based framework, which systematically segments spatial variability and identifies data-optimal regions. This framework can serve as a standardized guideline for recalibrating predictive models to the characteristics of different cultivars or cultivation environments.
Second, the experiments were performed under controlled laboratory conditions; future work should evaluate system robustness under variable illumination and environmental conditions representative of industrial sorting lines and in-field applications. In particular, the random orientation of strawberries during high-speed transport on actual sorting lines may compromise data consistency. Furthermore, factors such as surface moisture, bruising, and calyx residues could introduce significant noise into the spectral data. The data acquisition and processing times reported in this study must also be further reduced to meet the throughput requirements of industrial workflows.
Third, although the lightweight CNN improved computational efficiency, further hardware-level optimization (e.g., GPU acceleration or embedded system integration) is necessary for full deployment in portable devices. Matching the current data processing speed to the throughput of industrial sorting lines is a critical step in transitioning laboratory models to field applications, and improving this real-time processing capability will substantially enhance the practicality and on-site feasibility of the proposed system.
Accordingly, future research will focus on extending this region-based framework to other fruit species and advancing it into a multi-parameter quality assessment system (e.g., TA, firmness) to better reflect consumer perceptions. Efforts will also be directed toward ensuring system robustness against sample orientation and surface noise (moisture, bruising) on industrial lines. Furthermore, hardware-level optimization will be pursued to meet industrial throughput requirements, transforming the proposed methodology into a practical, on-site solution for the fresh produce supply chain.
5. Conclusions
This study developed a region-based hyperspectral imaging framework combined with lightweight deep learning models for nondestructive SSC prediction in strawberries. The PLSR model showed a general trend of improved prediction accuracy and reduced prediction error. G3 yielded the best performance, while partial regions also provided representative SSC information.
Among the deep learning models, the simplified VGG architecture consistently achieved superior performance across all spectral regions. For G1–G4, standard normal variate (SNV) preprocessing yielded the best performance. When datasets from all five groups (G1–G5) were used, the RMSEP values remained below 0.5, confirming the feasibility of the approach for nondestructive quality prediction. Localized regional data at the G2–G3 level (50–75% of fruit area) achieved sufficient accuracy to represent overall SSC, highlighting advantages for sensor miniaturization and real-time measurement.
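SNV is a per-spectrum normalization: each spectrum is centred on its own mean and divided by its own standard deviation, which suppresses additive baseline offsets and multiplicative light-scattering and path-length effects. A minimal NumPy sketch follows. The use of the sample standard deviation (`ddof=1`) is an assumption, since implementations differ on this detail, and the function name `snv` is illustrative.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


rng = np.random.default_rng(0)
raw = rng.random((3, 100))        # hypothetical reflectance spectra: 3 samples x 100 bands
shifted = 1.8 * raw + 0.3         # same spectra with a gain change and baseline offset
print(np.allclose(snv(raw), snv(shifted)))  # True
```

Because SNV removes any per-spectrum gain and offset, spectra from regions with different illumination intensity (for example, near the curved edges of the fruit) become more directly comparable before model training.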
These results provide valuable insights into the design of practical sensing systems for quality control in strawberries and can be extended to other fresh produce. By enabling nondestructive and efficient SSC prediction, the proposed method can support improved postharvest handling, more reliable quality grading, and enhanced consumer satisfaction.
Overall, the region-based spectral analysis framework established in this study provides a robust foundation for designing cost-effective and lightweight sensors through data optimization. By reducing implementation costs, this approach could promote broader adoption of nondestructive quality assessment technologies, including in small- and medium-sized farms and packing facilities. However, the present study is limited to a single strawberry cultivar, and additional validation is required to ensure generalizability across diverse cultivars, maturity stages, and storage conditions. Future research will focus on extending model applicability across various growth conditions and varieties, while also integrating 3D correction algorithms to address the effects of surface curvature and sample geometry.