Characterisation of pineapple cultivars under different storage conditions using infrared thermal imaging coupled with machine learning algorithms

The non-invasive nature of infrared thermal imaging has attracted interest in various food classification and recognition tasks. In this work, infrared thermal imaging was used to distinguish different pineapple cultivars, i.e. MD2, Morris, and Josapine, which were stored at different temperatures, i.e. 5, 10, and 25 °C, and a relative humidity of 85 to 90 %. A total of 14 features were extracted from the thermal images to determine the variation in image parameters among the pineapple cultivars. Principal component analysis was applied for feature reduction in order to prevent any effect of significant difference between the selected features. Several machine learning algorithms were compared, including linear discriminant analysis, quadratic discriminant analysis, support vector machine, k-nearest neighbour, decision tree, and Naïve Bayes, to obtain the best performance for the classification of pineapple cultivars. The results showed that the support vector machine achieved the best performance from the combination of optimal image parameters, with the highest classification rate of 100 %. Infrared thermal imaging coupled with machine learning approaches can potentially be used to distinguish pineapple cultivars, which could enhance the grading and sorting processes of the fruit.


Introduction
Pineapple belongs to the Ananas genus of the Bromeliaceae family which has been cultivated commercially in subtropical and tropical regions worldwide (Lobo and Yahia 2016, Mohd Ali et al. 2020).
The appealing aroma and abundant nutritional composition of pineapple make it highly favoured by consumers. Pineapples can be eaten fresh, dried, or processed into various products such as jam, juice, pickle, candy, canned syrup, and beverages. The fruit also contains bromelain, an enzyme that breaks down protein and offers various health benefits (Zdrojewicz et al. 2018).
Pineapple cultivars grown worldwide are classified into four main groups: Queen, Smooth Cayenne, Red Spanish, and Pernambuco (Wali 2018). Generally, the cultivars are distinguishable by fruit weight, shape, size, colour, bioactive compounds, and physicochemical composition. Because of these distinct characteristics, it is necessary to differentiate between the fruit cultivars to match the preferences of consumers.
Pineapple fruit from different cultivars has been classified using various analytical methods such as evaluation of bioactive compounds (Chakraborty et al. 2016), determination of physicochemical properties (Nadzirah et al. 2013, Siti Rashima et al. 2019), carotenoid detection using high performance liquid chromatography (Steingass et al. 2020), and volatile fingerprinting (Lasekan and Hussein 2018).
However, these methods are labour- and time-intensive due to the complex analysis and require specialised skills. In particular, conventional methods rely heavily on human labour, which is very subjective and requires extensive operation (Rungpichayapichet et al. 2017, Tavakolian et al. 2013).
The market demand for different pineapple cultivars is associated not only with the external quality of the fruit but also with the internal quality, which is prone to defects, specifically during storage.
Storage is a beneficial factor in the postharvest chain of pineapple since the fruit availability and quality need to be monitored before distribution to the commercial market. For this reason, assessing fruit quality during storage with conventional methods, which are destructive in nature, remains a huge challenge. Thus, a reliable and non-destructive technique for classifying pineapple cultivars is required to obtain efficient and robust results.
In recent years, infrared thermal imaging has been introduced as a reliable and non-destructive evaluation technique for monitoring the quality and safety of various agricultural products. Infrared thermal imaging is a non-contact technique that converts the temperature pattern of a material into visible images for feature extraction and analysis (Gowen et al. 2010, Vadivambal and Jayas 2011). The applications of infrared thermal imaging have gained much interest in the fruit industry due to the reduced cost of operating devices, rapid measurement, and simple procedure for obtaining data on the material (Ishimwe et al. 2014). Furthermore, Hussain et al. (2018) described that monitoring temperature in food processing with this technique requires no external source of energy for imaging. To date, various researchers have investigated the potential of the infrared thermal imaging technique for the quality inspection of fruit. Previous studies involving the applications of infrared thermal imaging include immature citrus fruit detection (Gan et
Nowadays, various machine learning methods have been developed for the quality and safety evaluation of different kinds of fruit. In this sense, the integration of infrared thermal imaging with machine learning approaches is considered efficient since the multivariate nature of the algorithms makes the data easy to analyse and produces rapid results. Machine learning has been explored using various algorithms such as partial least squares (PLS), support vector machine (SVM), principal component analysis (PCA), random forest, ordinary least squares, stepwise linear regression, k-nearest neighbour (kNN), etc. (Caladcad et al. 2020, Manthou et al. 2020). While significant effort has been devoted to investigating the chemical and physical attributes of pineapples, only limited studies have been undertaken to develop predictive and classification systems based on various storage conditions for the fruit. In practical application, any machine learning classifier can be implemented in such a way that the feature extraction may provide distinct classification rates and increase the model accuracy (De-la-torre et al. 2019). Hence, this work attempts to classify pineapple cultivars under different storage conditions using infrared thermal imaging coupled with a machine learning approach. Useful information regarding the effects of different storage conditions is required to enhance the fruit quality and ensure a long shelf life of pineapples, especially during postharvest operation and storage. Six prominent machine learning algorithms were employed to classify the pineapple cultivars in relation to the different storage conditions: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), SVM, kNN, decision tree, and Naïve Bayes. The specific objectives of this research were: (1) to evaluate the variation of the image parameters among different pineapple cultivars under different storage conditions, and (2) to compare the performance metrics of machine learning algorithms based on the image parameter features.

Fruit samples
Three different pineapple cultivars, i.e. MD2, Morris, and Josapine, were harvested at a ripening stage of Index 2 (50 % unripe, glossy dark green in colour with traces of yellow between eyes at the base) from a local farm in Simpang Renggam, Johor, Malaysia. All the pineapple cultivars were transported immediately after harvest to the Biomaterials Processing Laboratory, Universiti Putra Malaysia. The fruit samples were stored at three different temperatures, 5 °C (cold storage room), 10 °C (controlled refrigerator), and 25 °C (air-ventilated laboratory room), and a relative humidity of 85 to 90 %. The fruit samples were randomly numbered without cleaning or treatment prior to storage to prevent any losses. Thirty pineapple samples were randomly assigned to each of the four interval groups (Day 0, Day 7, Day 14, and Day 21) for each cultivar. The fruits were kept at laboratory room conditions (25.0 ± 1.0 °C, 90.0 ± 0.5 % RH) before starting the sample preparation procedure. For the pineapple classification, a total of 1080 samples (360 of each cultivar) were randomly selected. All of the fruit samples from the three cultivars were analysed and divided into training and testing datasets. Based on the random classification algorithms, 756 pineapple samples were used in the training dataset, whereas the remaining 324 samples were chosen for the testing dataset. The fruit samples in the training and testing datasets remained the same for all the algorithms in order to compare the performance of different models.
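The sample counts above can be checked with a short sketch. It assumes that the 360 samples per cultivar arise from 30 fruits for each combination of three storage temperatures and four storage days, which is an inferred reading of the design rather than a stated one. The function name `split_train_test` and the fixed seed are hypothetical; the seed only illustrates how the same split could be reused for every algorithm.

```python
import random

CULTIVARS = ["MD2", "Morris", "Josapine"]
TEMPERATURES_C = [5, 10, 25]
STORAGE_DAYS = [0, 7, 14, 21]
SAMPLES_PER_GROUP = 30  # assumed: 30 fruits per cultivar, temperature, and day

# One record per fruit: (cultivar, temperature, day, replicate index).
samples = [
    (c, t, d, i)
    for c in CULTIVARS
    for t in TEMPERATURES_C
    for d in STORAGE_DAYS
    for i in range(SAMPLES_PER_GROUP)
]

def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle once with a fixed seed and cut at the training fraction."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

train, test = split_train_test(samples)
print(len(samples), len(train), len(test))  # 1080 756 324
```

A 70 % training fraction reproduces the reported 756 training and 324 testing samples.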

Infrared thermal imaging
An infrared thermal imaging system was developed, consisting of a thermographic camera (FLIR E60, FLIR systems, King Hills, United Kingdom), a sample holder, and a computer equipped with processing software. The thermographic camera had a temperature range of −20 °C to +650 °C, a spectral range of 0.7 to 1.4 µm, an infrared resolution of 320 × 240 pixels, and a thermal sensitivity of less than 0.05 °C. A lens with a field of view of 25° × 19° and five measurement modes was used with the thermographic camera. The distance between the camera lens and the fruit surface was set to 0.4 m to capture the thermal images. Image acquisition was performed at ambient temperature immediately upon removing the samples from storage for the identification of pineapple cultivars under different storage conditions. The thermal images were acquired at a room temperature of 25 °C to avoid potential fluctuations in the temperature of the thermal camera due to continuous operation.
A total of 3240 thermal images were obtained for the overall fruit samples.

Thermal image processing
Feature extraction was carried out on the region of interest (ROI) of the thermal images. Prior to feature extraction, image processing and segmentation steps were performed to facilitate the cultivar classification of pineapples based on the selected image features. The image processing and segmentation steps are described in Fig. 1. The image processing steps comprise the removal of image shadow, background noise elimination, and the separation of the ROI from the image background. The thermal image was converted to a grayscale image to facilitate the feature extraction. The Otsu thresholding technique was used to obtain the threshold level for converting the grayscale image to a binary image. In this way, the image was segmented into the background and the selected ROI. The shape and pixel value features were extracted using MATLAB Version R2020a software (The MathWorks, USA). A total of 14 image features, comprising 6 pixel value features (maximum intensity, mean intensity, minimum intensity, maximum of ROI, mean of ROI, and minimum of ROI) and 8 shape features (centroid, area, eccentricity, perimeter, orientation, major axis length, minor axis length, and extent), were selected for each pineapple cultivar. The values of all the selected features were expressed as pixel counts and stored as the classification variables.
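The thresholding step can be illustrated with a minimal Python sketch of Otsu's method. This is not the MATLAB routine used in the study; the tiny 4 × 4 image and the helper names `otsu_threshold` and `binarise` are hypothetical, and the image is assumed to be 8-bit grayscale.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level that maximises the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(levels):
        weight_bg += hist[level]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level

def binarise(image, threshold):
    """Pixels above the threshold become ROI (1); the rest are background (0)."""
    return [[1 if p > threshold else 0 for p in row] for row in image]

# Hypothetical grayscale image: a warm 2 x 2 region on a cool background.
image = [
    [10, 12, 11, 13],
    [12, 200, 210, 11],
    [10, 205, 198, 12],
    [11, 13, 12, 10],
]
threshold = otsu_threshold([p for row in image for p in row])
mask = binarise(image, threshold)
roi_area = sum(map(sum, mask))  # area feature expressed as a pixel count
```

Shape features such as area follow directly from the binary mask, while pixel value features are taken from the grayscale values inside it.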

Variable selection using principal component analysis
PCA was applied for feature extraction according to the high component loadings of the principal components (PCs). The PCA was carried out using the Unscrambler X Version 10.3 (CAMO Software, Oslo, Norway). The relationship between the storage conditions and the selected features from the thermal images of pineapple was highlighted. The variables were selected according to the maximum eigenvalues and the largest contribution to the PCs in order to visualise the distribution of the pineapple cultivars. The proportion of total variability and the eigenvalues were determined from the PCA. In this study, PCA score plots and correlation loadings were obtained to choose the optimal image parameters for the cultivar classification of pineapples.
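How eigenvalues translate into a proportion of total variability can be sketched for the simplest case of two features. The example below is a simplified illustration, not the Unscrambler workflow: it uses the closed-form eigenvalues of a 2 × 2 covariance matrix, and the feature values and the function name `explained_variance_2d` are hypothetical.

```python
import math

def explained_variance_2d(xs, ys):
    """Eigenvalues of the 2x2 covariance matrix as fractions of total variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eig1, eig2 = half_trace + root, half_trace - root
    total = eig1 + eig2
    return eig1 / total, eig2 / total

# Hypothetical values for two image parameters measured on four samples.
mean_intensity = [0.61, 0.64, 0.67, 0.70]
extent = [0.80, 0.82, 0.83, 0.86]
pc1_share, pc2_share = explained_variance_2d(mean_intensity, extent)
```

When two features are strongly correlated, as in this toy data, PC1 carries almost all of the variance and PC2 very little.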

Machine learning algorithms
Six different machine learning algorithms, namely LDA, QDA, SVM, kNN, decision tree, and Naïve Bayes, were studied to classify the pineapple cultivars in relation to different storage conditions based on the image parameters. All the machine learning algorithms were built using MATLAB Version R2020a software (The MathWorks, USA) in order to discriminate the pineapple cultivars. The flowchart for the classification of pineapple cultivars using infrared thermal imaging is illustrated in Fig. 2.

Linear discriminant analysis (LDA)
The LDA algorithm applies a linear transformation to obtain the directions that maximise the separation between classes in the input data (De-la-torre et al. 2019). LDA is a supervised method that is widely used to classify objects in data generation and classification tasks. This algorithm aims to maximise interclass variability by employing several groupings for the classification model. In the present study, the LDA method was applied to develop classification models for the discrimination of pineapple cultivars according to the different storage conditions.

Quadratic discriminant analysis (QDA)
Another discriminant analysis used for classification is the QDA method. The QDA method is based on a quadratic model that classifies the testing data. In addition, QDA estimates the covariance of each class separately instead of pooling it over the whole sample (Munera et al. 2017). To run both the LDA and QDA algorithms, 70 % of the overall dataset was used as training data and the remaining 30 % as testing data.

Support vector machine (SVM)
The SVM is a supervised learning method used to evaluate data for regression and classification problems. This approach uses a hyperplane that has the largest distance to the nearest training data of any class to achieve good classification (Azarmdel et al. 2020). The SVM can also discriminate nonlinear data by using a kernel function k(x, y) according to the related task. Further, SVM has been used for binary classification in order to achieve accurate results with a small amount of data sampling. For the SVM method, 30 % of the data was used as testing data and the rest as training data.
To establish a computational load, several common kernels can be used such as linear, sigmoid, radial basis, and polynomial functions (Koklu and Ozkan 2020). For this purpose, the radial basis function with the penalty coefficient value (γ) at 1 was selected for the SVM model. The radial basis function is defined in Equation 1.
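For illustration, a radial basis kernel in its commonly used form, k(x, y) = exp(−γ‖x − y‖²), can be sketched in Python as below. This assumes that the γ reported above plays the role of the kernel coefficient in that standard form; the function name `rbf_kernel` and the feature vectors are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Standard radial basis kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical feature vectors (e.g. mean intensity and extent of two fruits).
fruit_a = (0.67, 0.84)
fruit_b = (0.61, 0.80)
similarity = rbf_kernel(fruit_a, fruit_b, gamma=1.0)
```

The kernel equals 1 for identical feature vectors and decays towards 0 as the vectors move apart, with γ controlling how quickly the similarity falls off.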

K-nearest neighbour (kNN)
The kNN is a supervised method for regression and classification analyses in which K denotes the number of neighbours (Noviyanto and Abdulla 2020). The input comprises the K closest training data in the feature space, whereas the output is the class membership of the instance. Generally, the distances between the training and testing data were calculated to obtain the k-nearest neighbour decision factor (Qiu et al. 2018). In order to categorise the testing data into one of several classes, the minimum distance to the training data was determined. Initially, K values of one and the highest value were evaluated to obtain the optimum number of neighbours. In this study, the number of neighbours used to develop the classification model was equal to 10.
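The distance-and-vote procedure can be sketched in a few lines of Python. This is an illustrative implementation with Euclidean distance and majority voting, not the MATLAB model of the study; the training points, labels, and the function name `knn_predict` are hypothetical, and k is set to 3 only because the toy dataset is small.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=10):
    """Label a query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training data for two cultivars.
train_X = [(0.60, 0.80), (0.61, 0.81), (0.62, 0.79),
           (0.70, 0.85), (0.71, 0.86), (0.69, 0.84)]
train_y = ["MD2", "MD2", "MD2", "Morris", "Morris", "Morris"]

predicted = knn_predict(train_X, train_y, (0.70, 0.85), k=3)
```

Small values of K make the decision sensitive to individual neighbours, whereas large values smooth the decision boundary, which is why a range of K values is usually compared.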

Decision tree
A decision tree is a decision support algorithm that uses a tree-like model to describe a possible result as a function of independent variables. This approach is widely used in classification models due to its easy interpretation and good reliability with database systems (Sharma et al. 2020). The tree model was developed by repeatedly splitting the training dataset into subsets. Typically, each split was described by a simple rule on a single independent variable. An optimal feature was chosen as the basis for each division in order to construct the tree model. The training datasets were also randomly divided into subsets to obtain the best classification results. The tree model was complete when all of the subsets fitted to the leaf nodes. A Gini index was used as the benchmark for choosing each split in the decision tree. Once a split was determined, the process was repeated until the tree model reached its maximum size. For this study, the maximum number for decision trees was 16 for the classification model.
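The use of the Gini index to choose a split can be sketched as follows. The example evaluates candidate thresholds on a single feature and keeps the one with the lowest weighted impurity; the feature values, labels, and function names are hypothetical, and a full tree would repeat this search recursively over all features.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(values, labels, threshold):
    """Weighted Gini impurity after splitting one feature at a threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return sum(len(part) / n * gini(part) for part in (left, right) if part)

def best_threshold(values, labels):
    """Try midpoints between sorted unique values; keep the lowest impurity."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return min(candidates, key=lambda t: split_gini(values, labels, t))

# Hypothetical mean-intensity values for four fruits of two cultivars.
values = [0.60, 0.62, 0.70, 0.72]
labels = ["MD2", "MD2", "Morris", "Morris"]
threshold = best_threshold(values, labels)
```

A weighted impurity of zero after the split means that each branch contains a single cultivar, so no further splitting is needed on that branch.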

Naïve Bayes
Naïve Bayes is a parametric and supervised technique based on Bayes' theorem with strong independence assumptions between the data features (Yang et al. 2019). The prior probability of each class was determined using the class relative frequency distribution. Naïve Bayes assumes a normal distribution within each class, with the standard deviation and average calculated from the training dataset via maximum likelihood estimation (Mustaffa et al. 2018). In this study, the Naïve Bayes classification model assigned each testing sample to the class with the largest posterior probability.
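These steps can be sketched as a small Gaussian Naïve Bayes classifier in Python. The sketch assumes normally distributed features within each class and uses hypothetical training data; the function names `fit_gaussian_nb` and `predict_gaussian_nb` are illustrative and do not correspond to the MATLAB functions used in the study.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/std for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Maximum likelihood std (divide by n); a small floor avoids zero.
        stds = [
            max(math.sqrt(sum((v - m) ** 2 for v in col) / n), 1e-9)
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, stds)  # prior = relative frequency
    return model

def predict_gaussian_nb(model, row):
    """Return the class with the largest log posterior (up to a constant)."""
    def log_posterior(params):
        prior, means, stds = params
        log_likelihood = sum(
            -math.log(s) - 0.5 * ((v - m) / s) ** 2
            for v, m, s in zip(row, means, stds)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical two-feature training data for two cultivars.
X = [(0.60, 0.80), (0.61, 0.81), (0.62, 0.79),
     (0.70, 0.85), (0.71, 0.86), (0.69, 0.84)]
y = ["MD2", "MD2", "MD2", "Morris", "Morris", "Morris"]
model = fit_gaussian_nb(X, y)
predicted = predict_gaussian_nb(model, (0.70, 0.85))
```

Working with log probabilities avoids numerical underflow when many features are multiplied together, and the shared constant term of the normal density can be dropped because it does not affect which class is largest.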

Data analysis
Significant differences in image parameters between cultivars were identified using analysis of variance (ANOVA). The mean comparison was determined by Tukey's test at P < 0.05 using the SAS software (Version 9.4, SAS Institute, Cary, NC, USA).
The performance of the machine learning algorithms was evaluated in terms of classification accuracy (%). The classification methods were carried out with ten-fold cross-validation using the selected image parameters from the feature extraction. The mean accuracy was obtained for each classification trial in order to compare the performance of the machine learning algorithms. Generally, a classification model was judged on achieving a high accuracy rate in the classification trials. Further, a confusion matrix was used to describe the estimation rate of the machine learning algorithm in terms of true negatives, true positives, false negatives, and false positives (Nisio et al. 2020).
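The evaluation procedure can be sketched as below. The code shows how ten folds partition a dataset and how a confusion matrix yields an accuracy in percent; it is a simplified illustration in which shuffling before folding is omitted, and the labels and function names are hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    position = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[position[a]][position[p]] += 1
    return matrix

def accuracy_percent(matrix):
    """Correct predictions lie on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(map(sum, matrix))
    return 100.0 * correct / total

# Hypothetical predictions for four fruits.
classes = ["Josapine", "MD2", "Morris"]
actual = ["MD2", "MD2", "Morris", "Josapine"]
predicted = ["MD2", "Morris", "Morris", "Josapine"]
matrix = confusion_matrix(actual, predicted, classes)
accuracy = accuracy_percent(matrix)  # 75.0
folds = list(k_fold_indices(1080, k=10))
```

In a multi-class setting, the true and false positives and negatives for a given cultivar are read from its row and column of the matrix, and the mean accuracy over the ten folds summarises each algorithm.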

Feature selection
The average values of the image parameters from different pineapple cultivars are tabulated in Table 1. All image parameters of different pineapple cultivars had significant differences at the 95 % confidence level (p < 0.05). The values of all image parameters were calculated as pixel counts. The highest eccentricity and perimeter were found in the MD2 cultivar, with values of 0.72 and 1464.30, respectively. For the Josapine cultivar, the highest values were obtained for area (63976.00), orientation (0.70), and extent (0.84). On the other hand, the values of minimum intensity (0.54), maximum of ROI (0.97), and minimum of ROI (0.55) were the same for all pineapple cultivars. As for the remaining image parameters, the highest values were found in Morris, including centroid (157.39), major axis length (393.74), minor axis length (292.21), maximum intensity (0.96), mean intensity (0.67), and mean of ROI (0.81). For this reason, the image parameters were well suited to describing the behaviour of the thermal images, which depended strongly on the pineapple cultivar.
In order to explore the dataset, a quantitative feature comparison was carried out to evaluate the differences between the pineapple cultivars for the classification task. It was revealed that the distribution of image parameter values was significantly different between all pineapple cultivars. Considering the difference in the fruit cultivar, the temperature differences were attributed to the selected features of the thermal images (Yogesh et al. 2018). The changes in image parameters reflected the pixel distribution based on the temperature mapping at the surface of the pineapples for different fruit cultivars. The image features were the basic elements for the cultivar discrimination, which would be useful in determining the characteristics and parameters of the sample (Singh et al. 2020). Apart from that, the output from the feature selection of the image parameters was applied as input for developing machine learning algorithms to further improve the classification accuracy.

Relationship analysis
The image parameters derived from the pixel values and shape features were used to distinguish the pineapple cultivars. The linear correlation coefficients between all image parameters of pineapple images are shown in Fig. 3. Among all the image parameters, minimum intensity was highly correlated with eccentricity, with a correlation coefficient (r) of 0.98. In contrast, extent was negatively correlated (r = -0.97) with minimum intensity. A low correlation was found between perimeter and major axis length (r = 0.56). The centroid was positively correlated with maximum intensity, area, extent, and orientation, with linear correlation coefficients ranging from 0.68 to 0.94. Among the pixel value features, only the maximum intensity was positively correlated with all of the shape features.
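The linear correlation coefficient r between two image parameters can be computed as in the following sketch, which implements the standard Pearson formula. The parameter values are hypothetical and only illustrate a strongly positive and a strongly negative relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values of three image parameters for four fruits.
minimum_intensity = [0.52, 0.54, 0.55, 0.57]
eccentricity = [0.66, 0.69, 0.70, 0.73]
extent = [0.86, 0.84, 0.82, 0.80]

r_positive = pearson_r(minimum_intensity, eccentricity)
r_negative = pearson_r(minimum_intensity, extent)
```

Values of r close to 1 or -1 indicate that two parameters carry largely the same information, which is relevant when selecting a compact set of features.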
In addition, specific image parameters with high correlations could be chosen to be associated with a certain feature for the classification. The high correlations reflected the variation between the fruit cultivars, indicating the relationship between the pixel values and shape features of the pineapples. Koklu and Ozkan (2020) identified different types of dry beans using shape and dimensional features taken from two-dimensional images for classifying the varieties. Feature extraction was used to obtain feature values that were statistically compared between the classes for the classification (Van De Looverbosch et al. 2020). In this case, linear correlation has been used to investigate the relationship among fruit properties and cultivars as well as to obtain discriminatory features (Yang et al. 2019). In the relationship analysis, all of the image parameters were significantly correlated, which made them feasible for classifying pineapple cultivars according to different storage conditions.

Classification results using PCA
Based on the image parameters of pineapple images, the effectiveness of the PCA models was evaluated as shown in Fig. 4. The PCA model was established to verify the clustering of the three pineapple cultivars, namely MD2, Josapine, and Morris. The three pineapple cultivars were successfully separated by two PCs, PC1 (97 %) and PC2 (3 %), accounting for a total variance of 100 % (Fig. 4a). The classification results using PCA models were in agreement with Kuzy et al. (2018), who demonstrated a high clustering capability between Farthing and Meadowlark berries. Further, the findings revealed that the three pineapple cultivars showed positive scores along both PC1 and PC2 according to the variability loadings.
For the clustering based on the storage temperatures, the results clearly distinguished the variations by two components, PC1 (80 %) and PC2 (17 %), with a total variance of 97 % (Fig. 4b). This indicates that each pineapple cultivar subjected to the three storage temperatures (5, 10, and 25 °C) showed significant variations in the quality attributes of the fruit. As a result, all the pineapple cultivars stored at the three storage temperatures were correctly discriminated according to the variability of image parameters. Additionally, the findings successfully discriminated the variations of image parameters in relation to different storage days, with PC1 (74 %) and PC2 (25 %) giving a total variance of 99 % (Fig. 4c). Together, PC1 and PC2 demonstrated the ability of the infrared thermal imaging technique to distinguish the variations observed in pineapple samples during storage.
With respect to the classification scores corresponding to the selected image parameters and different pineapple cultivars, the correlation loadings were strongly correlated with PC1 (95 %) and PC2 (5 %), accounting for a total variance of 100 % (Fig. 4d). The results indicated that maximum intensity, mean intensity, minimum intensity, maximum of ROI, mean of ROI, minimum of ROI, orientation, and extent, as identified by the interior ellipse in the PCA plot, described the best combination of image parameters for the classification of pineapple cultivars. Furthermore, the loading scores aided in the detection of optimal image parameters which were suitable for the classification of pineapple cultivars based on different storage conditions. All pineapple samples of MD2, Josapine, and Morris were correctly distinguished in their respective clusters according to their cultivar-related functions. The discrimination of pineapple cultivars based on the image parameters was important as an indicator to provide a clear visualisation of the influence of the different storage conditions. These observations were similar to Sanchez et al. (2021), who reported total variances of 100 % for the classification of sweet potato varieties based on the quality properties during storage. With regard to the experimental factors used, the PCA method required at least two variables to evaluate the classification performance of the samples (Mohd Ali et al. 2021). Thus, the baseline data could be applied to evaluate the variability of other physicochemical properties of pineapples for a wide range of cultivars and experimental factors.

Comparison of machine learning models
Machine learning algorithms were developed to determine the classification accuracy for the detection of pineapple cultivars under different storage conditions using the infrared thermal imaging technique. The classification performance of pineapple cultivars at different storage days and temperatures using the LDA method is presented in Table 2. The LDA results showed the classification performance of pineapple cultivars at 25 °C (93.21-98.03 %), followed by 10 °C (92.49-97.91 %) and 5 °C (92.81-97.64 %). The classification accuracy of the LDA models increased over storage days for all pineapple cultivars at different storage temperatures. The LDA models attained the highest classification accuracies at 25 °C for both Day 0 (94.67 %) and Day 7 (96.39 %) from the Josapine cultivar. The Morris cultivar obtained the highest classification accuracy among all storage days at 25 °C (98.03 %) for Day 21. The infrared thermal imaging technique based on LDA was found to be feasible, with overall classification rates of up to 96.25 % under different storage conditions for all pineapple cultivars.
The classification performance of pineapple cultivars at different storage days and temperatures using the QDA method is shown in Table 3. The QDA results showed the classification performance of the pineapple cultivars at 25 °C (92.66-99.28 %), followed by 10 °C (92.53-98.47 %) and 5 °C (93.85-97.60 %). The classification accuracy of the QDA models gradually increased over the storage days for all pineapple cultivars at different storage temperatures. The QDA models obtained the highest classification accuracies at 25 °C for both Day 7 (95.71 %) and Day 21 (99.28 %) from the Josapine cultivar. Based on the QDA results, the overall classification rates reached up to 96.40 % under different storage conditions for all pineapple cultivars.
The classification performance of pineapple cultivars at different storage days and temperatures using the SVM method is shown in Table 4. The SVM results showed the classification performance of the pineapple cultivars at 25 °C (96.32-99.93 %), followed by 10 °C (94.96-99.72 %) and 5 °C (96.02-99.62 %). The classification accuracy of the SVM models also increased over storage days for all pineapple cultivars at different storage temperatures. The SVM models achieved the highest classification accuracies at 25 °C for Day 7 (99.11 %), Day 14 (99.92 %), and Day 21 (99.93 %) from the Morris cultivar. Similarly, the Morris cultivar obtained the highest classification accuracy for Day 0 (98.26 %), which was recorded at 5 °C. Moreover, the overall classification rates reached up to 99.30 % under different storage conditions for all pineapple cultivars.
The classification performance of pineapple cultivars at different storage days and temperatures using the kNN method is presented in Table 5. The kNN results showed the classification performance of the pineapple cultivars at 25 °C (95.83-99.93 %), followed by 10 °C (96.42-99.75 %) and 5 °C (95.39-99.46 %). The classification accuracy of the kNN models increased over the storage days for all pineapple cultivars at different storage temperatures. The kNN models obtained the highest classification accuracies at 25 °C for Day 7 (98.41 %), Day 14 (99.48 %), and Day 21 (99.93 %) from the Morris cultivar. Likewise, the Morris cultivar attained the highest classification accuracy for Day 0 (97.49 %), which was recorded at 10 °C. In addition, the overall classification rates reached up to 98.70 % under different storage conditions for all pineapple cultivars.
The classification performance of pineapple cultivars at different storage days and temperatures using the decision tree method is tabulated in Table 6. The decision tree results showed the classification performance of the pineapple cultivars at 10 °C (96.37-99.95 %), followed by 25 °C (94.59-99.86 %) and 5 °C (95.20-99.59 %). The classification accuracy of the decision tree models significantly increased over the storage days for all pineapple cultivars at different storage temperatures. The decision tree models achieved the highest classification accuracies at 25 °C for Day 7 (99.86 %) and Day 14 (99.74 %) from the Morris cultivar. The overall classification rates reached up to 98.67 % under different storage conditions for all pineapple cultivars.
The classification performance of pineapple cultivars at different storage days and temperatures using the Naïve Bayes method is shown in Table 7. The Naïve Bayes results showed promising classification performance for the pineapple cultivars at 5 °C (95.27-99.96 %), followed by 10 °C (95.09-99.96 %) and 25 °C (93.67-99.92 %). The classification accuracy of the Naïve Bayes models increased over the storage days for all pineapple cultivars at different storage temperatures. The Naïve Bayes models obtained the highest classification accuracy at 10 °C for Day 21 (99.96 %) from the Morris cultivar. The Josapine cultivar also obtained its highest classification accuracy at 10 °C (97.49 %), which was recorded at Day 7. The overall classification rates reached up to 98.03 % under different storage conditions for all pineapple cultivars. These findings suggest that the changes in image parameters of pineapple cultivars measured with the infrared thermal imaging technique are promising for monitoring various storage conditions.
In general, the machine learning algorithms achieved overall classification rates of up to 99.30 % in distinguishing pineapple cultivars according to various storage conditions. Typically, the classification accuracy improved with a larger total number of features selected from the feature extraction (Kuzy et al. 2018). Regardless of the discrepancy in classification accuracies between the pineapple cultivars, it should be noted that the reference measurement described the significant changes in image parameters. Vélez Rivera et al. (2014) obtained a success rate of 90 % in detecting mechanical defects in mango using several algorithms such as LDA, kNN, and Naïve Bayes. Notwithstanding, in the majority of cases, the high correlation of fruit properties could be predicted based on the selected features from the images (Yang et al. 2019). In view of the different storage conditions of the fruit, infrared thermal imaging coupled with machine learning demonstrated strong performance and ability for the given classification applications.
To further classify pineapple cultivars according to the image parameters, the machine learning algorithms were trained on the features retained by the feature selection. The classification accuracy of the machine learning algorithms for the pineapple cultivars was compared based on the optimal combination of image parameters. In this case, the distinct features selected from the image parameters provided a different optimal combination for each machine learning algorithm, evaluated using a confusion matrix. In particular, eight image parameters were selected by PCA-based feature selection to achieve the highest classification accuracy: maximum intensity, mean intensity, minimum intensity, maximum of ROI, mean of ROI, minimum of ROI, orientation, and extent. The confusion matrices with the average classification rates of the different pineapple cultivars using six different machine learning algorithms are illustrated in Fig. 5.
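One common way to read a PCA for feature ranking is to standardise the features, extract the first principal component, and order the features by the absolute size of their loadings. The sketch below illustrates that idea only; it is not necessarily the selection procedure used in this study. The feature names are borrowed from the text, but the three-column data set, the helper names, and the use of power iteration are all assumptions made for illustration.

```python
import math

def standardise(rows):
    """Z-score each column so features on different scales are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of rows whose columns are already centred."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Leading eigenvector of cov (unit length), found by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical toy data (NOT measurements from the study): the first two
# columns are perfectly correlated, the third is uncorrelated with them.
names = ["mean_intensity", "max_intensity", "orientation"]
rows = [[1, 3, 1], [2, 5, -1], [3, 7, 0], [4, 9, 0], [5, 11, -1], [6, 13, 1]]

loadings = first_component(covariance(standardise(rows)))

# Rank features by the magnitude of their loading on the first component.
ranking = sorted(names, key=lambda name: -abs(loadings[names.index(name)]))
```

In this toy case the two correlated intensity features share the first component and the uncorrelated feature ranks last. With real thermal-image features, several components would normally be inspected, not just the first.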
The LDA achieved an accuracy of 95 %, 94 %, and 96 % for the correct classification of Josapine, MD2, and Morris, respectively. With QDA, the classification accuracies for correctly classified Josapine, Morris, and MD2 were 97 %, 97 %, and 94 %, respectively. On the other hand, the SVM outperformed the rest of the machine learning algorithms with the highest classification rate of 100 % for the correct classification of all pineapple cultivars. In the case of the kNN algorithm, both Josapine and MD2 were correctly classified with the highest classification accuracy of 100 %. The decision tree reached a good classification accuracy of 98 % for Josapine, 95 % for MD2, and 99 % for Morris. For the Naïve Bayes algorithm, the highest classification accuracy obtained was 98 % for the correctly classified MD2 cultivar. The dataset of each pineapple cultivar was validated without retraining the machine learning algorithms in order to test the generalisability to other cultivars. Different algorithms should be employed according to the condition and current state of the data analysis in order to obtain more accurate classification results.
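The per-cultivar rates quoted above are the diagonal entries of a row-normalised confusion matrix. A minimal sketch of that calculation follows, assuming rows are true cultivars and columns are predicted cultivars. The counts are invented placeholders, not values from Fig. 5, and the function names are hypothetical.

```python
def per_class_rates(matrix, labels):
    """Fraction of each true class that was predicted correctly (row-wise)."""
    rates = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        rates[label] = matrix[i][i] / row_total if row_total else float("nan")
    return rates

def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

labels = ["Josapine", "MD2", "Morris"]

# Placeholder counts for illustration only: 20 samples per true cultivar.
confusion = [
    [19, 1, 0],   # true Josapine
    [1, 18, 1],   # true MD2
    [0, 2, 18],   # true Morris
]

rates = per_class_rates(confusion, labels)
overall = overall_accuracy(confusion)
```

The two figures can differ when the classes are of unequal size, so the row-wise rate is the one comparable with the per-cultivar percentages reported here.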
The misclassification of different pineapple cultivars could be attributed to differences in maturity stages and the related variation in quality attributes (Müller et al. 2019). In a previous study by Van De Looverbosch et al. (2020), the SVM algorithm was used to detect internal disorders of several severities in two pear cultivars and obtained the highest classification accuracy of 95 %. Generally, all the machine learning algorithms reached their highest classification accuracies with the optimal combination of image parameter features. It was observed that all the machine learning models successfully classified the pineapple cultivars, with correct classification of up to 100 %. Feature extraction may provide the means to choose a minimum number of image parameters for a given classification task in such a way as to reduce the computational complexity and enhance the model performance (Dela-torre et al. 2019). Hence, it can be concluded that all the machine learning algorithms were able to distinguish between the different pineapple cultivars imaged using the infrared thermal imaging technique.

Conclusion
The current study evaluated the potential of infrared thermal imaging coupled with machine learning approaches for the cultivar classification of pineapples. PCA was employed to determine the optimal features to facilitate the cultivar classification of pineapples. By comparing the performance of six different machine learning algorithms, SVM was found to achieve the highest overall classification accuracy of 100 %, which could be applied for the discrimination of pineapple cultivars in a nondestructive manner. Additionally, the results demonstrated that feature extraction based on the image parameters allows the machine learning classifiers to obtain high accuracy, which should be considered for the real-time performance of the infrared thermal imaging technique. This evidence provides an insight into fruit classification and recognition operations as an alternative to the manual and tedious conventional methods, saving a considerable amount of time and effort. Future work may include the application of more sophisticated algorithms, such as deep learning, for dealing with large datasets. Other algorithms should also be tested to obtain the best combination of feature extraction for the classification and recognition of various fruits as well as other agricultural produce.
thermal image analysis using image segmentation processes

Author Contributions
M.M.A. wrote the main manuscript. N.H. critically revised and finalized the manuscript. S.A.A. and O.L. provided advice in the manuscript revisions. All authors have read and reviewed the manuscript.

Table 2. The classification performance of pineapple cultivars at different storage days and temperatures
* Values are the mean ± standard deviation. Different letters in the same row indicate significant differences (P < 0.05).

Table 4. The classification performance of pineapple cultivars at different storage days and temperatures using support vector machine

Table 5. The classification performance of pineapple cultivars at different storage days and temperatures

Table 6. The classification performance of pineapple cultivars at different storage days and temperatures using decision tree

Table 7. The classification performance of pineapple cultivars at different storage days and temperatures