1. Introduction
One of the most consumed subtropical and tropical fruits in the world is the pineapple (
Ananas comosus (L.) Merril.), the fruit is nutritionally important; it is a source of fiber, carotenoids and flavonoids, and supports digestion, the immune system, and promotes skin improvements [
1,
2,
3]. The pineapple is a fruit that does not ripen after harvesting. Therefore, the fruit must have a pleasant taste for the consumer when it is harvested.
Usually, the harvest season is defined by measurements inside the fruit, such as total soluble solids, where sugar predominates, measured in °Brix, and total acidity [
4]. However, the ratio between °Brix and the total acidity may be more accurate in determining quality, as the fruit may contain sufficient sugar but the high citric acid content may mask some of the perception of sweetness, i.e., fruit with sufficient sugar content but high acidity may be perceived as sour [
5].
In addition to the parameters that are usually evaluated to determine the time of harvest and the resulting flavor of the fruit, the evaluation of bioactive molecules with antioxidant and anti-inflammatory properties that can prevent cancer and cardiovascular diseases is of great value to human health. Flavonoids are among these beneficial compounds. They are found in various parts of the pineapple plant, particularly in the peel of the fruit, and are retained even after drying or freeze-drying the peel, a process used to produce flour that can be used in feed [
6].
Conventional methods for evaluating the quality of pineapples are destructive, require equipment, laboratory procedures, and trained personnel, and are quite laborious. Furthermore, not all fruit can be assessed using the destructive method. Instead, one sample is used to assess all fruits and this can lead to errors. Therefore, a more practical and non-destructive assessment of pineapple quality parameters such as °Brix, acidity and flavonoids would be more sustainable as it would eliminate the use of reagents and better meet the growing consumer demand for high-quality fruit [
4]. Therefore, it is important to develop new technologies and effective equipment to check the quality of pineapple fruit [
7,
8].
The use of hyperspectral sensors has many advantages, such as the fact that the fruit is not damaged and that the quality characteristics of the fruit can be predicted very well [
9]. In addition to this method, machine learning (ML) makes it possible to find a model with the accuracy of the evaluated factors that make up the quality of the fruit, ensuring a database of qualitative and quantitative characteristics for future evaluations of other fruits [
10]. Furthermore, studies have shown that selecting specific spectral features can improve maturity classification in crops such as strawberries and tomatoes using hyperspectral imaging [
11]. Non-destructive fruit quality assessment using hyperspectral data have benefited significantly from the use of machine learning techniques, such as Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs). These approaches allow detailed analysis of internal and external fruit characteristics, providing accurate and efficient results.
Computational intelligence algorithms can predict the quality of fruit using training data without the need to manually extract the features, reducing the influence of human factors on the prediction results [
12]. Studies using ML algorithms with spectral data to evaluate fruit quality have been very promising in various fruits, e.g., peaches [
13], apples [
14], pears [
15], oranges [
16]. Specifically for pineapples noticed that NIR-HSI techniques could be successfully used as a commercial method to predict the maturity index of individual pineapple fruits [
17]. It was also concluded that the infrared thermal imaging technique, when used on image parameters, can auspiciously predict the physicochemical properties of different pineapple varieties during storage, especially fruit stored at 10 °C [
18]. In another study, they predicted the chemical composition in each pixel of the multispectral images using the calibration models that yielded spatially distributed quantification of the fruit slice [
19]. However, research and in-depth analysis are required to develop more accurate and robust models.
With this in mind, the objective of this study is to validate accurate ML models for predicting pineapple fruit quality using different parts of the plant: leaf, peel, and fruit. The specific objectives of the present study are as follows: (i) to build a predictive model for pineapple fruit quality features by comparing machine learning algorithms with different inputs and (ii) to optimize the detection of spectral features for pineapple fruit representative of different parts of the fruit, peel, and leaf, thereby improving the predictive results of the machine learning model.
2. Materials and Methods
2.1. Study Site and Growing Conditions
The field experiment was carried out in the experimental area at the Federal University of Mato Grosso do Sul, located in the municipality Chapadão do Sul, Brazil, at 18°46′44″ S, 52°36′59″ W and an altitude of 820 m. The region’s climate is classified as tropical humid (Aw type) according to the Köppen classification, with a rainy season in summer and a dry season in winter [
20]. The average rainfall is 1850 mm and the average annual temperature is between 13 °C and 29 °C [
21]. Throughout the experiment, air temperature and relative humidity measurements were made (
Figure 1).
The soil is classified as Oxisol [
22], with a clayey texture. Before planting, the soil was chemically analyzed (
Table 1). Soil correction and fertilization followed the recommendations for the Cerrado biome [
23]. Soil samples were collected in the 0 to 20 cm depth layer, using a Dutch auger manufactured by Sondaterra.
The pineapple variety ‘Pérola’ was used. To obtain the seedlings, after harvesting the pineapple fruits that had already been planted in the UFMS area, the seedlings from the base of the fruits remained on the plant until they reached 25–30 cm, then they were harvested and placed in a shaded area until planting.
The pineapples were planted in a single row. The distance between the rows was 1 m and between the plants 0.5 m. Eight rows of 20 m each were planted. The experiment used a drip irrigation system installed under a plastic film (mulching) to suppress the development of weeds. Topdressing fertilization was applied using a Venturi injector.
2.2. Brix and Total Titratable Acidity
The soluble solids content was determined using a manual refractometer with a scale of 0–32% and the results were expressed in °Brix [
24]. The ATC portable refractometer manufactured by Shenzhen Yieryi Technology (Shenzhen, China) was used, with a reading of 0–32% °Brix and automatic temperature compensation. Total titratable acidity was determined by titrimetry with 0.1 M NaOH solution and phenolphthalein indicator until a permanent light pink color was obtained [
25].
2.3. Liquid Chromatography Analysis for Quantification of Flavonoids
Flavonoids in pineapple fruit, peel, and leaf were extracted and measured during harvest. After drying, the samples were ground, and 50 mg of the samples were placed into a 2 mL Eppendorf, which also included 1.5 mL of 70% methanol with 0.1% acetic acid, in order to extract the isoflavones. Subsequently, the samples were centrifuged at 5000 rpm for 20 min and the supernatant was filtered using a 0.2 µm filter syringe and transferred to 1.5 mL vials before injection into an ultra-performance liquid chromatography system. Each sample was analyzed three times [
26]. The isoflavones were separated and quantified using a Waters Acquity 1100 series ultra-performance liquid chromatography (UPLC, Waters Corporation, Milford, CT, USA) automatic sample injector.
2.4. Hyperspectral Data
Ten months after planting, a spectral assessment was performed on the largest leaf of the plant, as it is the youngest among the adult leaves and the most physiologically active among all leaves [
27]. A total of 14 months after planting, at the time of harvest, a spectral evaluation of the fruit peel and pulp was performed. Hyperspectral evaluations of each part (leaf, rind and fruit) were performed on 100 plants.
The ASD FieldSpec® instrument, manufactured by Analytical Spectral Device (Boulder, CO, USA), which provides spectral readout capability in the broad reflectance spectrum (350–2500 nm), was used to acquire the spectral data. The equipment was coupled to a computer, allowing the recording of spectral data from each part of the plant. Each reading had 2200 spectral information items used in full for data processing with machine learning.
2.5. Machine Learning
Six machine learning algorithms were used to model fruit quality variables based on hyperspectral data: Artificial Neural Networks (ANN), M5P (model tree), REPTree decision trees, Random Forest (RF), Support Vector Machine (SMV), and Zero R. Hyperspectral data were used as input variables and the variables to be predicted were used as output variables: acidity, °Brix, ratio, daidzein, daidzin, and genistin. Hyperspectral data from leaves, bark and fruit were used as different inputs for each model.
Three different inputs were used for each model: leaf reflectance (leaf), peel reflectance (peel), and fruit reflectance (fruit) to predict the variables acidity, °Brix, ratio between °Brix and acidity (ratio) and secondary metabolites. Prediction was performed using stratified cross-validation with k-fold = 10 and 10 replicates (100 runs for each model). All model parameters were set according to the default configuration of Weka 3.8.5. software, except for the ANN, which was restricted to 10 neurons in the first and 10 neurons in the second layer, as previously performed in other studies [
28,
29,
30].
The ANN was implemented as a Multilayer Perceptron using the backpropagation algorithm, with a learning rate of 0.3, momentum of 0.2 and 500 epochs. J48 is a classifier that applies an additional pruning based on an error reduction strategy, adopting a minimum of 4 instances per leaf node. REPTree uses information gain for tree selection and pruning by error reduction, with no depth restriction and a minimum weight of the instances in the leaf of 2.0. RF generates 100 trees and uses voting for prediction, with other hyperparameters in the Weka standards. SVM builds hyperplanes for classification, being executed via LibSVM with RBF kernel (degree 3), tolerance of and cost of 1.0.
The performance of the evaluated prediction models was using the following: Mean absolute error (MAE) and Pearson’s correlation coefficient (r). To check the significance of the inputs, the tested algorithms and the interaction between them, an analysis of variance was performed. If significance was present, tables were created with the mean values of r and MAE grouped by the Scott–Knott test at a 5% probability level. The comparison of averages was carried out with the help of the ExpDes.pt packages of the R software version 4.2.3 [
31].
3. Results
Spectral information from pineapple leaves, peels, and fruit were tested in the machine learning models to predict each variable indicative of fruit quality. An analysis of variance with mean squares to investigate the interaction between ML models and inputs was constructed, finding significant interaction between all variables (
Table 2).
When predicting acidity, the use of fruit reflectance provided higher accuracy for all models (
Table 3). When evaluating the performance of the models with each input, fruit reflectance provided better results for SVM with a mean prediction of 0.43 correlation coefficient. With leaf reflectance as input, ANN, RF, and SVM showed higher accuracy mean values of 0.11, 0.11, and 0.18, respectively, which are unreliable prediction values. Peel reflectance showed better performance for the M5P model with an average accuracy of 0.30. As for the error of the models (
Table 3), fruit reflectance generally provided smaller errors for the models, while leaf reflectance produced larger errors. SVM and RF had smaller errors when using each input compared to the other models. These results suggest that while fruit reflectance is the most promising for predicting pineapple acidity, other parts of the plant can be examined to optimize accuracy depending on the variable of interest.
The use of peel reflectance led to a higher accuracy of the models for °Brix. In addition to using this input for ANN and SVM, fruit reflectance also provided good accuracy (
Table 4). When evaluating the models for each input tested, it can be seen that fruit reflectance provided excellent accuracy for the SVM in predicting °Brix, with an accuracy of 0.75. When using leaf reflectance, the SVM model also showed superiority over the others, but with a low correlation coefficient of about 0.17. The SVM also showed better accuracy than the others when using peel reflectance, with an r of about 0.66. As for the MAE (
Table 4), the models with the lowest errors were generally those using SVM, especially when using fruit reflectance. The prediction of °Brix content, peel reflectance and fruit reflectance are the most accurate inputs, with SVM consistently proving to be the most accurate model.
In terms of the relationship between °Brix and acidity (ratio), there were excellent prediction results, with the use of peel and fruit reflectance allowing the best performance for the models, in particular, SVM showed an accuracy of 0.73 when using fruit reflectance (
Table 5). When evaluating the performance of the models within each input data using fruit reflectance, SVM stood out from the others with an accuracy of 0.73. The same model performed better than the others when using leaf reflectance, but with a very low predictive metric of about 0.17. Peel reflectance provided better results for M5P and SVM with an accuracy of 0.57 and 0.56, respectively. Peel and fruit reflectance provided smaller errors, especially for SVM (
Table 5). The results confirm that peel and fruit reflectance are more efficient inputs for predicting the ratio, with SVM performing particularly well against the other models.
When evaluating the secondary metabolite daidzein, it was found that the use of peel reflectance provided better accuracy in all models (
Table 6). When evaluating the performance of the models within each input data, it is noticeable that fruit reflectance provided better accuracy for SVM with an r of 0.61. The same model showed better accuracy than the other models using leaf reflectance of about 0.37. Using the peel reflectance guaranteed higher r values for SVM and M5P, namely 0.63 and 0.60, respectively. In terms of error (
Table 6), the peel also generally provided lower errors and the SVM algorithm was the one with the lowest error rate compared to the others. The reflectance of the shell is the most promising input for the prediction of daidzein, with the SVM algorithm consistently proving to be the most accurate model.
The prediction of daidzin showed a different behavior than the prediction of the other variables, where leaf reflectance contributed the most to better model accuracy (
Table 7). When fruit reflectance was used, the SVM model showed better results, 0.41. When leaf SVM was used as input, it also showed good accuracy, 0.48. Peel reflectance provided better accuracy for M5P and SVM, 0.33 and 0.31, respectively. Leaf reflectance also provided smaller errors for the models, and the SVM algorithm had the lowest error rate. These results indicate that leaf reflectance proved to be the most effective input in predicting daidzin, with SVM remaining the most accurate model with the lowest error.
When predicting genistin, peel reflectance was more accurate than the other inputs (
Table 8). When using the fruit spectral data as input, the SVM achieved higher accuracy than the other models with a correlation coefficient of 0.64. For leaf reflectance, the same model was superior to the others with an accuracy of 0.44 and remained so when peel reflectance was used as input, with 0.63 r. In terms of error, it was mostly similar between models between inputs (
Table 8). The SVM was the model with the lowest accuracy, regardless of the input used.
Finally, the best model was the SVM, which had the best accuracy results, the lowest errors and proved to be the most robust model for most of the pineapple fruit quality variables analyzed. Regarding the best input, the reflectance of the peel and the fruit were generally the most effective inputs, with the peel being particularly efficient for secondary metabolites and the fruit standing out in the prediction of acidity, °Brix and ratio. Leaf reflectance showed good results for daidzin but unsatisfactory performance for the other variables.
4. Discussion
The applicability of hyperspectral data for fruit quality assessment is a technique that identifies and examine spectral data of different wavelengths and bands without destroying the fruit and with high precision [
32,
33]. Predicting fruit quality by analyzing leaves, peel and fruit and other attributes of fruit evaluation is a complex approach that can improve agricultural practices [
34,
35]. When the data are combined with machine learning algorithms to evaluate fruit quality, such as fruit damage, good performance is achieved [
36], as in carrots [
37] and real-time identification of defective apples [
38]. The use of hyperspectral data in the 400–1000 nm range has different capabilities in measuring some internal quality characteristics of Red and Golden Delicious apples in a non-destructive manner for variables such as pH, soluble solids, titratable acid and phenol [
39]. Another application of spectral data are the effectiveness of evaluating the internal and external quality attributes of peaches, demonstrating their versatility in relation to different species [
40].
When analyzing the deconvolution of the models, it was found that the use of fruit reflectance resulted in higher accuracy for all models. In terms of the performance of the individual inputs and the models, fruit reflection gave better results for SVM. The same algorithm showed promising accuracy in identifying moldy apple seeds [
41]. The same model also had good accuracy in predicting titratable acidity [
42], as shown in this study.
Peel and fruit reflectance were the most accurate inputs for °Brix prediction, with the SVM proving to be even more accurate. According to research on blueberry anthocyanin prediction [
43], it was reported that the SVM model was more appropriate for the determination of small samples and had the lowest errors. For the ratio, the results confirm that the reflectance of the peel and fruit are more efficient inputs for predicting the ratio, with the SVM performing particularly well compared to the other models. Furthermore, the application of ML techniques in conjunction with these data produces fast and reliable results for industrial applications [
44].
Pineapple fruits have a high content of various flavonoids, that are directly linked to the color of the fruit and are involved in the white-yellow pigmentation of the fruit [
45,
46,
47]. It has an antioxidant function and cellular signaling pathways and plays an essential role in plant growth and biotic and abiotic stress reduction [
48]. For the flavonoid variable, in general, peel and fruit reflectance were the most effective inputs, with peel being particularly efficient for secondary metabolites. The highest concentrations of flavonoids are found in the pineapple peel, which can be attributed to the protection of the fruit against environmental stress. They also act as biological protectors, against oxidative stress caused by UV radiation and in antimicrobial activity to protect the fruit [
49,
50].
Among the criteria of accuracy that were examined, the SVM’s algorithm was the one with the best performance, regardless of the input used. This algorithm shows good performance in prediction in conjunction with the machine learning algorithm [
51,
52]. It also has minimization of overfitting [
53], it reduces the errors presented, making it an excellent algorithm for prediction and accuracy.
The results indicate that although the use of leaves was initially considered the most promising approach for prediction, the results obtained were not satisfactory as their use would provide early answers while the fruit is still in the field. This suggests that the limited number of leaf samples may negatively affect the accuracy of the models. For future work, increasing the number of leaf samples could provide a more robust and representative database for the analyses. In addition, the application of deep learning techniques could be an effective alternative to improve the predictions, which could lead to more accurate and consistent results in the assessment of the variables of interest. However, the use of hyperspectral data for predictions based on pineapple peel and fruit features can optimize assessments, curb fruit destruction, and be very relevant for post-harvest economics as it provides fast and accurate information, especially when using fruit peels. It is a more accurate, effective, and easy-to-use method. This technology could be made available to farmers and companies to provide information to predict the quality of pineapple fruit. In this way, losses and waste of pineapple fruit can be reduced.
5. Conclusions
The machine learning model, Support Vector Machine (SVM), provided the best results, demonstrating good generalization capacity and effectiveness in predicting these attributes, reaching accuracy values above 0.7 for Brix and ratio, for example, using fruit reflectance. In terms of the overall efficiency of the input variables, peel and fruit were the most informative, with peel standing out especially for the estimation of secondary metabolism compounds, while the fruit showed excellent performance in predicting flavor-related attributes, such as acidity, °Brix and ratio, as mentioned previously, above 0.7. These results highlight the potential of using spectral data and machine learning techniques in the non-destructive assessment of pineapple quality, enabling advances in monitoring and selecting fruits with better sensory and nutritional characteristics.