Modelling of Mechanical Properties of Fresh and Stored Fruit of Large Cranberry Using Multiple Linear Regression and Machine Learning

The study investigated selected mechanical properties of fresh and stored large cranberries. The analyses focused on changes in the energy required to reach the breaking point and aimed to determine the apparent elasticity index of the fruit of the investigated large cranberry varieties in relation to harvest time, water content, and storage duration and conditions. After 25 days of storage, the fruit of the investigated varieties showed a decrease in mean acidity, from 1.56 g·100 g⁻¹ to 1.42 g·100 g⁻¹, and in mean water content, from 89.71% to 87.95%. Breaking energy decreased, and the apparent modulus of elasticity also changed: its mean value was 0.431 ± 0.07 MPa in fresh fruit and decreased to 0.271 ± 0.08 MPa after 25 days of storage. The relationships between the cranberry varieties, storage temperature, duration of storage, the x, y, and z dimensions of the fruits, and their selected mechanical parameters were modeled using multiple linear regression and machine learning (artificial neural networks and support vector machines).


Introduction
The large cranberry (Vaccinium macrocarpon Aiton) is cultivated mainly as an industrial crop in North America (Canada and the north-eastern USA), where the commonly occurring peat soils and wetlands provide the optimum substrate for its cultivation; it is also grown in Europe, mainly in Latvia and Belarus, and, in recent years, in south-eastern Poland [1]. In 2019, the largest harvest was recorded in the USA: 359,110 tons from a cultivation area of 15,580 ha. In Canada, 172,440 tons were harvested from 6393 ha, and in Belarus, 235 tons from 101 ha [2].
Depending on the variety, large cranberry fruits are spherical, typically about 20 mm in diameter, with skin color ranging from pink to dark purple [3]. Cranberry fruit and juice are valued in medicine for their antimicrobial and antimycotic properties, which have beneficial effects, for instance, in the treatment of kidney diseases and urinary tract inflammation [4,5]. The high content of polyphenols, including anthocyanins, flavonoids, stilbenes, phenolic acids, and proanthocyanidins, further supports the health benefits of cranberry fruit [6,7].
On large plantations, cranberries are harvested with highly efficient mechanical shakers. "Wet harvesting" technology, in which a given section of the plantation is flooded with water, increases harvest efficiency and reduces damage to mechanically collected fruit compared with the "dry harvest" method [8][9][10]. Mechanical defects occurring during harvest, transportation to a processing plant, and handling activities such as cleaning, rinsing, and other post-harvest operations reduce the quality of the raw material and can lead to the rejection of part or all of a batch intended for sale or food processing. This is because crushed or bruised fruit loses water and undergoes changes in texture and firmness [11]. A decrease in the water content of soft fruit after storage has been reported previously [12,13].
Cranberries, which are predominantly harvested mechanically on commercial plantations, are increasingly used by the food and pharmaceutical industries, which also impose stricter quality requirements. It is therefore necessary to investigate their mechanical properties, which can help determine the timing of harvest and appropriate storage conditions. The mechanical properties of cranberry skin and flesh were investigated by Gorzelany et al. [14]. Skin and flesh puncture tests were carried out with a cylindrical punch, 2 mm in diameter, pressed into the fruit. The recorded measurements included puncture strength, puncture energy, and absolute longitudinal deformation.
Given the spherical shape of cranberries of the selected varieties, their mechanical properties can be determined using methods applicable to other spherical produce, such as highbush blueberries, redcurrants, tomatoes, onions, and Brussels sprouts [15,16]. In such tests, analysis of the force-displacement relationship yields the permissible force and the relative displacement in response to a preset load.
In a uniaxial compression test with quasi-static loading, cranberries were compressed between two flat horizontal plates until the breaking point [17]. Interpreting the results requires appropriate statistical methods that allow an accurate assessment of the effect of the selected factors on the relevant mechanical parameters of large cranberry fruit and on the chemical characteristics of the raw material.
Mathematical modelling is a very common method in food technology and agriculture [18][19][20][21][22]. Accurate models allow the physicochemical properties of food to be predicted and storage conditions to be optimized. There are two approaches to the mathematical modelling of the mechanical properties of food: the first is based on experimental data (empirical), and the second on the physical nature of the phenomenon (theoretical). Because empirical modelling is more precise and easier to develop [23], this approach is becoming increasingly popular. Various methods are employed for empirical modelling in the food industry, including artificial intelligence techniques such as artificial neural networks (ANNs) and support vector machines (SVMs), and many such applications have been reported. ANNs were used to determine changes in the water, protein, and gluten content of stored wheat [24], for accurate and rapid prediction of the moisture and fat content of tofu [25], to develop a crispness prediction model for crunchy food [26], and to estimate sugar concentration in food products [27]. Chauchard et al. [28] proposed a sensor for predicting grape acidity based on NIR spectroscopy and least-squares support vector machine (LS-SVM) regression. The same technique was employed to predict the mechanical properties of prawns [29]. Support vector machine regression was also reported to yield an accurate model of the soluble solids content of apples [30].
The aim of this research was to investigate the effect of storage duration on selected physicochemical properties of the fruit (water content and total acidity) and on its mechanical properties, including deformation, breaking energy, and apparent elasticity index. The results can serve as a useful tool for developing and optimizing the harvesting, transportation, and processing of cranberry fruit. The results were assessed using statistical analysis and mathematical modelling. An additional aim was to compare models developed using multiple linear regression and machine learning.

Characteristics of the Research Material
The research material consisted of three cranberry varieties, Ben Lear, Pilgrim, and Stevens, obtained from a plantation located in Radomyśl nad Sanem (50°40′52″ N 21°56′41″ E; Stalowa Wola District, Podkarpackie Region). The plantation was established in 2014 on plots with a modified substrate whose top layer was fine-grained washed sand 0.4 m thick. The meteorological conditions in the year of the research are characterized in Table 1 [31]. Fruits were harvested at the maturation stage (ready for harvest), on a date that depended on the variety: 29 September 2018 for Stevens and 10 October 2018 for Pilgrim and Ben Lear. The fruits of each variety were divided into three batches: two were placed in cold storage at 4 °C and 10 °C, and the third was stored at 20 °C.

Measurement of the Chemical Properties
The water content of the fresh cranberries and of the stored material (randomly selected samples) was determined by the oven-drying method at 105 °C according to PN-90/A-75101-03 [32], and the total acidity was determined according to PN-90/A-75101-04 [33].

Measurement of the Mechanical Properties
The selected mechanical parameters of cranberry fruit (randomly selected samples) were determined in a uniaxial compression test between two horizontal plates using a Zwick/Roell 2010 testing machine (ZwickRoell GmbH & Co. KG, Ulm, Germany). The test settings were an initial (preload) force of 0.1 N and a loading-plate speed of 0.5 mm·s⁻¹. The maximum breaking force F (N) and deformation λ (mm) were recorded after each series of measurements, and force-deformation characteristics were obtained from the compression test. The summed unit energy input (work) was taken into account when calculating the apparent elasticity index Ec, used as a measure of the effective mechanical resistance of the investigated material.
where Ec is the apparent modulus of elasticity (MPa), F is the maximum breaking force (N), x and y are the dimensions of the ellipsoid perpendicular to the direction of the acting load (mm), and λ is the deformation in the direction of the applied load (mm).
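The text does not give the formula used for the breaking energy W. Assuming, as is common in compression testing, that W is the area under the force-deformation curve up to the breaking point, it could be computed from a recorded curve as in the following minimal Python sketch. The function name `breaking_energy` and the choice of the maximum-force sample as the breaking point are assumptions, not the authors' documented procedure.

```python
def breaking_energy(deformation_mm, force_n):
    """Return (F_max, deformation at F_max, work up to F_max) for one curve.

    Assumption: the breaking point is the sample with the maximum force, and
    the breaking energy is the area under the force-deformation curve up to it.
    With force in N and deformation in mm, the energy is in N*mm (= mJ).
    """
    if len(deformation_mm) != len(force_n) or len(force_n) < 2:
        raise ValueError("need two equally long series with at least two samples")
    # index of the largest recorded force, treated as the breaking point
    i_break = max(range(len(force_n)), key=lambda i: force_n[i])
    energy = 0.0
    for i in range(1, i_break + 1):
        # trapezoidal rule on each deformation increment
        step = deformation_mm[i] - deformation_mm[i - 1]
        energy += 0.5 * (force_n[i] + force_n[i - 1]) * step
    return force_n[i_break], deformation_mm[i_break], energy
```

For example, a curve sampled at 0, 1, 2, and 3 mm with forces of 0, 2, 4, and 1 N gives F = 4 N at λ = 2 mm and W = 4 N·mm.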

Method of Building Linear Models Using Multiple Linear Regression
Multiple linear regression (MLR) is the most commonly used form of linear regression. It is often used as a predictive tool and helps to explain the relationship between several independent variables (X1, X2, ..., Xk) and the dependent variable under study (Y). The coefficient of determination, R², gives the proportion of the variation in the dependent variable that is explained by the model; in other words, it is a measure of model fit.
The computational problem of multiple regression is to fit a hyperplane (a straight line in the case of a single independent variable) to a set of points. The method most frequently used for this is the method of least squares, which adjusts the parameters of the regression equation so that the sum of the squared vertical distances of the measurement points from the fitted hyperplane is as small as possible.
The regression equation has the form:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$

where Y is the dependent (explained) variable, X1, X2, ..., Xk are the independent (explanatory) variables, β0, β1, β2, ..., βk are the parameters of the equation, and ε is the random component (model residual).
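A minimal sketch of ordinary least squares for the regression equation above, in plain Python: it solves the normal equations (XᵀX)β = Xᵀy by Gauss-Jordan elimination and also computes R². The function names are hypothetical, and all predictors are assumed to be numeric; a categorical variable such as variety would first need a numeric (e.g., dummy) coding, which the text does not describe. Statistical packages generally use numerically more stable decompositions (such as QR) rather than the normal equations.

```python
def fit_mlr(X, y):
    """Least-squares estimates [b0, b1, ..., bk] of Y = b0 + b1*X1 + ... + bk*Xk + e."""
    n, p = len(X), len(X[0]) + 1
    # design matrix: a leading column of ones carries the intercept b0
    A = [[1.0] + [float(v) for v in row] for row in X]
    # augmented normal equations [A^T A | A^T y]
    aug = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(p)]
           + [sum(A[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, p + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(beta, row):
    """Evaluate the fitted regression equation for one row of predictors."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def r_squared(beta, X, y):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((t - predict(beta, row)) ** 2 for row, t in zip(X, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot
```

Data generated exactly from Y = 1 + 2X1 - 3X2 are recovered as β ≈ [1, 2, -3] with R² ≈ 1; with real measurements, R² below 1 reflects the unexplained residual variation ε.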
In this study, three regression models were built, named Rλ, RW, and REc after the dependent variables they describe (λ, deformation; W, breaking energy; Ec, modulus of elasticity). The full set of measurement data consisted of 244 records, and six independent variables were used in the analyses: variety, storage temperature, duration of storage, and the x, y, and z dimensions.

Artificial Neural Networks
Artificial neural networks (ANNs) are a group of tools that are very useful for regression, classification, clustering, and other tasks. Their most important advantage is that they are trained on a data set (e.g., an experimental data set) and therefore do not have to be explicitly programmed, so no prior knowledge of the mechanisms underlying the modelled phenomenon is required. In this research, a multilayer perceptron (MLP) was used for nonlinear regression. An MLP consists of an input layer, one or more hidden layers, and an output layer. The hidden and output layers are composed of very simple units called artificial neurons, while the input layer is composed of nodes that pass the input signals into the network. In an MLP, signals are propagated only forward, from the input layer through the hidden layers to the output layer, where the output signals of the ANN are produced (there are no feedback loops). An MLP with one hidden layer was used in this research. The error backpropagation algorithm was employed to train the MLP, adjusting the connection weights of the network starting from random initial values. Training minimizes the error between the target output vector and the output signals calculated by the ANN. Some parameters must be chosen during MLP development, namely the number of neurons in the hidden layer and the transfer (activation) functions of the neurons; in this research, they were selected by trial and error. For each regression model, 5000 ANNs were trained using Statistica v. 13 software. The number of neurons in the hidden layer was varied from 10 to 40, and sigmoid, hyperbolic tangent, and exponential activation functions were tested. The experimental data set of 244 vectors was first normalized and then randomly divided into training, test, and validation sets at a 70:15:15 ratio.
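The training procedure described above can be illustrated with a minimal, dependency-free Python sketch of a one-hidden-layer MLP (input-hidden-output) trained by error backpropagation with plain per-sample gradient descent. It is not Statistica's implementation: the tanh hidden activation, the linear output neuron, the learning rate, the number of epochs, and the name `train_mlp` are all assumptions chosen for illustration, and the inputs are assumed to be already normalized.

```python
import math
import random


def train_mlp(X, y, n_hidden=10, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer MLP (tanh hidden neurons, one linear output
    neuron) by error backpropagation with per-sample gradient descent.

    X: list of normalized input rows, y: list of target values.
    Returns a function mapping an input row to the network output.
    """
    rng = random.Random(seed)
    n_in = len(X[0])
    # random initial weights; the last element of each list is the bias
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        # zip stops at the shorter list, so each bias is added separately
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in w_hidden]
        out = sum(w * hj for w, hj in zip(w_out, h)) + w_out[-1]
        return h, out

    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for idx in order:
            x, target = X[idx], y[idx]
            h, out = forward(x)
            err = out - target  # gradient of 0.5 * err**2 w.r.t. the output
            # error signals of hidden neurons, using tanh'(a) = 1 - tanh(a)**2
            delta = [err * w_out[j] * (1.0 - h[j] ** 2) for j in range(n_hidden)]
            for j in range(n_hidden):
                w_out[j] -= lr * err * h[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta[j] * x[i]
                w_hidden[j][-1] -= lr * delta[j]
            w_out[-1] -= lr * err

    return lambda x: forward(x)[1]
```

In a trial-and-error search like the one described, such a function would be called repeatedly with different `n_hidden` values (and activation functions), keeping the network with the lowest error on the test set.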

Sensitivity Analysis
Sensitivity analysis of neural networks provides information about the relative importance of the independent input variables in a model. In this research, the sensitivity analysis implemented in Statistica v. 13 was used to quantify the influence of the input parameters on the output parameter of the ANN model. The method consists of two steps. First, the values of each input variable, one variable at a time, are replaced by its mean value calculated from the training data set. Then an error ratio is calculated: the network error obtained with that input replaced by its mean, divided by the network error obtained with the original input values. Based on the error ratios of all input parameters, the percentage influence of each independent input variable on the output of the ANN model can be determined. A similar sensitivity analysis method was used by Hadzima-Nyarko et al. [34] to model and analyze structural damage after an earthquake.
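The two-step mean-substitution procedure can be sketched in Python for any trained model passed in as a function. Two details are assumptions, since the text does not specify them: the network error is taken as RMSE, and the percentage influence is obtained by normalizing the error ratios so that they sum to 100%. The function name `sensitivity` is hypothetical.

```python
import math


def sensitivity(model, X, y):
    """Mean-substitution sensitivity analysis for a trained model.

    model: callable mapping one input row (list of floats) to a prediction.
    Returns (error ratios, percentage influence of each input).
    """
    n_rows, n_in = len(X), len(X[0])
    means = [sum(row[i] for row in X) / n_rows for i in range(n_in)]

    def rmse(rows):
        return math.sqrt(sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / n_rows)

    base_error = rmse(X)  # error with all inputs at their original values
    if base_error == 0.0:
        raise ValueError("error ratio undefined for a model with zero error")
    ratios = []
    for i in range(n_in):
        # replace input i by its mean, leave all other inputs unchanged
        perturbed = [list(row[:i]) + [means[i]] + list(row[i + 1:]) for row in X]
        ratios.append(rmse(perturbed) / base_error)
    total = sum(ratios)
    # assumption: percentages obtained by normalizing the ratios to sum to 100
    return ratios, [100.0 * r / total for r in ratios]
```

An input the model ignores yields an error ratio of exactly 1, while an influential input yields a ratio well above 1 and therefore a larger percentage.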

Support Vector Machines
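The support vector machine (SVM) was first proposed by Vapnik [35]. This technique is generally used for classification or nonlinear regression, and many papers describe its underlying concept and theoretical background in detail [36,37]. Two types of SVM models can be used for regression: ε-type and ν-type support vector regression. For this research, the ε-SVM regression model was used. The regression function is defined as:

$$f(x) = w^{T}\varphi(x) + b$$

where w is the weight vector, b is the bias, x is an input feature vector, and f(x) is the predicted value of the target y. The following objective function (4) is minimized:

$$\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \tag{4}$$

subject to

$$y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \quad w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i, \xi_i^{*} \ge 0, \quad i = 1, \dots, N$$

where C is the capacity constant, ε is the size of the ε-insensitive tube, which can be interpreted as the accuracy demanded of the approximation, φ(x) is the nonlinear mapping of the input into a high-dimensional feature space, defined implicitly by the kernel function K(x_i, x_j) = φ(x_i)ᵀφ(x_j), ξ_i and ξ_i* are slack variables, and N is the number of training samples. The kernel function is crucial for the performance of the SVM model. Kernel functions used in SVM regression include, e.g., polynomial, sigmoid, and radial basis function (RBF) kernels, and the RBF kernel has been reported as the most appropriate for nonlinear regression [38]. The Gaussian radial basis function (6) was used as the kernel function in this study:

$$K(x_i, x_j) = \exp\left( -\gamma \lVert x_i - x_j \rVert^{2} \right) \tag{6}$$

where γ > 0 is the kernel parameter.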
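The ε-SVR building blocks described above can be written down directly. The following Python sketch implements the Gaussian RBF kernel with parameter γ, the ε-insensitive loss (the smallest slack ξ_i + ξ_i* that satisfies the constraints for a given prediction), the kernel-expansion form of the regression function, f(x) = Σ (α_i − α_i*) K(x_i, x) + b, and the value of objective (4) for given coefficients. It does not include a solver for the underlying quadratic programme; the function names and argument layout are hypothetical.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def eps_insensitive_loss(y_true, y_pred, eps):
    """Deviations inside the epsilon tube cost nothing; outside, cost grows linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b, with dual_coefs[i] = alpha_i - alpha_i*."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + b


def svr_objective(X, y, dual_coefs, b, gamma, C, eps):
    """Objective (4) for a model in kernel-expansion form over the training rows X.

    0.5 * w^T w equals 0.5 * sum_i sum_j c_i c_j K(x_i, x_j); for fixed w and b the
    smallest feasible slacks xi_i + xi_i* equal the epsilon-insensitive loss.
    """
    reg = 0.5 * sum(ci * cj * rbf_kernel(xi, xj, gamma)
                    for xi, ci in zip(X, dual_coefs) for xj, cj in zip(X, dual_coefs))
    slack = sum(eps_insensitive_loss(t, svr_predict(x, X, dual_coefs, b, gamma), eps)
                for x, t in zip(X, y))
    return reg + C * slack
```

A larger C penalizes points outside the tube more heavily relative to the w^T w term, which is why C, together with ε and γ, controls the trade-off between fitting the data and keeping the model smooth.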
When the ε-SVM regression model with the RBF kernel is used, three parameters must be tuned: C, ε, and γ. Proper tuning of these parameters can greatly improve the generalization capacity of the model. A suitable value of the RBF kernel parameter γ helps avoid under-fitting and over-fitting in prediction [39]. The parameter ε significantly influences the bias, and its optimal value depends on the type of noise present in the data set [40,41]. The C parameter affects the number of support vectors, and a proper value of C can limit over-fitting [42]. In this research, C, ε, and γ were adjusted by trial and error. The data set was randomly divided into training and validation sets in a 3:1 ratio, and ten-fold cross-validation was used. All computations were performed in Statistica v. 13 software.
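A sketch of the data partitioning described above, in Python: a random 3:1 split of the record indices into training and validation sets, and ten folds drawn from the training indices for cross-validation during the trial-and-error search. How the cross-validation folds relate to the 3:1 split is not stated in the text; drawing them from the training part is an assumption, as are the function names and the fixed seed.

```python
import random


def train_validation_split(n, train_fraction=0.75, seed=0):
    """Randomly split record indices 0..n-1 into training and validation parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * n)
    return idx[:cut], idx[cut:]


def k_fold_indices(indices, k=10):
    """Yield (training, held-out) index lists for k-fold cross-validation."""
    # every k-th index goes to the same fold, giving folds of nearly equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        rest = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield rest, folds[i]
```

For the 244 records, the 3:1 split gives 183 training and 61 validation indices. Under this scheme, the (C, ε, γ) combination with the lowest mean error over the ten held-out folds would be kept and then checked on the validation set.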

Criteria of Accuracy Assessment of Models
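The accuracy of the models developed in this research was evaluated using two criteria, namely the coefficient of correlation (R) and the root mean squared error (RMSE), calculated as follows:

$$R = \frac{\sum_{i=1}^{n} \left( Y_{pred,i} - \bar{Y}_{pred} \right) \left( Y_{meas,i} - \bar{Y}_{meas} \right)}{\sqrt{\sum_{i=1}^{n} \left( Y_{pred,i} - \bar{Y}_{pred} \right)^{2} \sum_{i=1}^{n} \left( Y_{meas,i} - \bar{Y}_{meas} \right)^{2}}}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Y_{pred,i} - Y_{meas,i} \right)^{2}}$$

where Y_pred is the predicted value, Ȳ_pred is the average predicted value, Y_meas is the measured (experimental) value, Ȳ_meas is the average of the measured values, and n is the number of observations.
The closer R is to 1 and the closer RMSE is to 0, the better the model.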
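Both criteria can be computed directly from paired predicted and measured values. A minimal Python sketch with hypothetical function names:

```python
import math


def pearson_r(y_pred, y_meas):
    """Coefficient of correlation R between predicted and measured values."""
    n = len(y_pred)
    mean_p = sum(y_pred) / n
    mean_m = sum(y_meas) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(y_pred, y_meas))
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    ss_m = sum((m - mean_m) ** 2 for m in y_meas)
    return cov / math.sqrt(ss_p * ss_m)


def rmse(y_pred, y_meas):
    """Root mean squared error between predicted and measured values."""
    n = len(y_pred)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(y_pred, y_meas)) / n)
```

Note that R measures only how well the predictions follow the measured trend, whereas RMSE is expressed in the units of the modelled parameter, which is why both are reported.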

Water Content
The mean water content in the fresh fruits of the relevant large cranberry varieties is detailed in Table 2. The results differed slightly and ranged between 89.19 and 90.05%. After 14 days in storage, the water content of the berries decreased on average by 1% compared to fresh fruit. After 25 days of storage the lowest water content was found in the Pilgrim variety (87.66%) and the highest value of the parameter was identified in the Stevens variety (88.12%). Statistical data are expressed as means ± SD.

Acidity of Cranberries
The mean acidity of the fresh fruit representing the selected cranberry varieties is presented in detail in Table 3. The mean acidity was in the range of 1.50-1.60 g·100 g⁻¹. After 14 days in storage, the acidity of the berries decreased on average by 0.08 g·100 g⁻¹ compared to fresh fruit. After 25 days of storage, the lowest acidity was found in the Stevens variety (1.30 g·100 g⁻¹) and the highest in the Pilgrim variety (1.50 g·100 g⁻¹). Compared to fresh fruit, the largest decrease in acidity after 25 days of storage was observed in the Stevens variety. Statistical data are expressed as means ± SD.

Mechanical Properties of Cranberry Fruit
Based on the test results (Table 4), irrespective of variety and storage temperature (4 °C, 10 °C, 20 °C), a relationship was observed between the selected mechanical parameters of cranberry fruit and the duration of storage. The lowest values of the mechanical parameters were found in fruit stored for 25 days. Analysis of the specific mechanical parameters showed a significant decrease in their values, which were as follows:
Statistical data are expressed as means ± SD. Means in a column followed by different letters differ significantly (α = 0.05) according to the LSD test.

The Results of Multiple Linear Regression
The developed R, RW, and REc regression models were based on six independent variables (variety, storage temperature, duration of storage, x, y and z dimensions). Detailed results of the multiple regression analysis for the presented independent variables and the dependent variables are presented in Table 5. Determination of the level of statistical significance: −non-significant. + significant for α = 0.05.
The variable for which statistical significance was not confirmed at the α = 0.05 level in all models was storage temperature. In the R model, the statistically significant traits were variety, y and x. Whereas in the RW model, the statistically significant traits were: variety, storage time, and the y, x, and z dimensions. The situation was different in the REc model, where only two variables were statistically significant, namely the y and z dimensions.
Based on the results from Table 5, considering only statistically significant traits, multiple regression equations were constructed for each model, which took the form:

Artificial Neural Networks
For each output parameter (, W and Ec) a separate neural model was developed (NN, NNW, NNEc). In Table 6, the structure and quality metrics of the best neural models are presented. Model structure means the number of neurons in each layer of MLP: input-hidden-output. The number of nodes in the input layer equals the number of input parameters of model, which is six. The number of neurons in the hidden layer was selected by trial-and-error method. In the output layer there is one neuron calculating the value of the output value of neural model. For deformation, the best architecture of the model is 48 neurons in the hidden layer. This model is of rather low accuracy with an R-value of 0.69 for the train data set and 0.68 for the validation data set. Better performance was obtained in the case of NNW model. The best architecture of ANN is network containing 10 neurons in the hidden layer with R value of 0.80 for train data set and 0.74 for the validation data set. For the NNEc the best model was achieved for architecture with 47 neurons in the hidden layer. The accuracy of this model can be stated as satisfactory with a relatively high R-value (0.89 for the train data set and 0.88 for the validation data set).

Sensitivity Analysis
The best MLP models described in Table 6 were used for sensitivity analysis. For better readability, the results are presented as a percentage influence of certain input variables on the output parameter.
As presented in Figure 1, deformation was influenced most by cranberry variety (23.04%). The geometrical dimensions, duration of storage, and storage temperature had similar effects on deformation (from 14.21% to 16.97%). Breaking energy was influenced most by duration of storage (45.64%). A much lower impact was observed for variety and the x dimension, and the influence of the y and z dimensions and storage temperature was minimal. The modulus of elasticity was affected most by storage temperature (41.45%), with a lower influence of duration of storage (19.25%) and variety (14.96%). The impact of the geometrical dimensions on Ec was very low.

Support Vector Machines
Three separate SVM models (SVMλ, SVMW, and SVMEc) were developed with the same six input parameters (variety, storage temperature, duration of storage, and the x, y, and z dimensions) and different output parameters (deformation λ, breaking energy W, or modulus of elasticity Ec). The model parameters (C, ε, and γ) were adjusted by trial and error; for all three models they were C = 10, ε = 0.03, and γ = 0.24. Error metrics of the most accurate models are detailed in Table 7. The best accuracy was obtained for the SVMW model (R = 0.76 for the validation data set). A lower correlation between experimental results and model predictions was observed for the SVMλ model (R = 0.71). The SVMEc model had low accuracy, with an R-value of 0.67.
In this research, three modelling methods were used to develop models of the relationships between large cranberry variety, storage temperature, duration of storage, the x, y, and z dimensions of the fruits, and the mechanical parameters, namely deformation, breaking energy, and modulus of elasticity. The regression method produced models of very low accuracy (R = 0.578 for deformation, R = 0.579 for breaking energy, and R = 0.475 for modulus of elasticity). Better models were developed using artificial intelligence techniques. The best models for deformation and breaking energy were produced with the SVM method; their error metrics for the validation data set are considerably better than those of the regression models (R = 0.705, RMSE = 1.451 and R = 0.758, RMSE = 53.869, respectively). Neural networks produced slightly worse models for these mechanical parameters. For the modulus of elasticity, the ANN model was the most accurate (R = 0.878, RMSE = 0.067 for the validation data set). Figures 2-4 present the performance of the best models of deformation λ, breaking energy W, and modulus of elasticity Ec for the validation data set.

Discussion
Compared to fresh fruit, the largest decrease in water content after 25 days of storage was observed in the Pilgrim variety. The fresh cranberries examined by Oszmiański et al. (2017) [42] varied in water content, from 87.21% in Pilgrim and 87.52% in Stevens to 89.94% in Ben Lear. Similar water content in fresh cranberries of the investigated varieties (87.0-87.5%) was reported by Oszmiański et al. (2018) [43]. Paniagua et al. [12] found a 1.34% decrease in the water content of blueberries after three weeks of storage. Ruse et al. [13] reported a 2% decrease in the moisture content of cranberries stored in closed PP boxes in air for six months.
Compared with this study, Teleszko [44] reported higher acidity in fresh cranberries of the Ben Lear variety, amounting to 2.18 g·100 g⁻¹, whereas Oszmiański et al. [42] found that the total acidity of fresh cranberries ranged from 1.95 g·100 g⁻¹ in Pilgrim and 2.25 g·100 g⁻¹ in Stevens to 2.29 g·100 g⁻¹ in Ben Lear. In another study, Oszmiański et al. [43] also reported higher acidity in the investigated cranberry varieties, with values in the range of 2.1-2.4 g·100 g⁻¹.
Modelling and the development of regression models are very common and important in many scientific fields. Model accuracy is crucial in real-life applications, and models are developed from experimental data. Therefore, besides traditional techniques, artificial intelligence algorithms are often used for modelling. In this research, ANN and SVM provided much better estimates of the mechanical parameters of large cranberry than the MLR technique. Similar results have been reported in the literature, where these techniques were used to develop regression models for various relationships. Karsavran and Erdik [45] developed sea-level prediction models and found that the ANN and SVM models outperformed MLR; the best performance was obtained with the ANN model, with a coefficient of correlation R = 0.76. The same techniques were used by Mohammed et al. [46] to estimate time and cost indexes for predicting site overhead cost; they reported that the ANN and SVM techniques produced more accurate models than MLR. A slightly higher accuracy of the ANN model (R = 0.99) compared with SVM (R = 0.97) was reported by Afradi and Ebrahimabadi [47], who used AI methods to predict the penetration rate of a tunnel boring machine. Sabzi-Nojadeh et al. [48] compared the accuracy of ANN and MLR models used to predict the oil yield and trans-anethole yield of fennel populations; ANN performed better (R = 0.96 and R = 0.88) than MLR (R = 0.74 and R = 0.68).

Conclusions
Knowledge of the mechanical parameters of fruit is crucial for optimizing the storage process, especially for delicate fruit such as large cranberry. Therefore, the mechanical parameters of cranberry fruit were investigated in relation to storage conditions, and mathematical models of the relationships under study were developed. The results revealed that the water content of the fresh fruit of the investigated large cranberry varieties ranged from 89.19 to 90.05%. After 25 days of storage, the lowest water content was found in the Pilgrim variety (87.66%) and the highest in the Stevens variety (88.12%). The mean acidity of the fresh fruit of the selected cranberry varieties was in the range of 1.50-1.60 g·100 g⁻¹. After 25 days of storage, the lowest acidity was found in the Stevens variety (1.30 g·100 g⁻¹) and the highest in the Pilgrim variety (1.50 g·100 g⁻¹). Irrespective of variety and storage temperature (4 °C, 10 °C, 20 °C), a relationship was observed between the selected mechanical parameters of cranberry fruit and the duration of storage, and the lowest values of the mechanical parameters were found in fruit stored for 25 days. The ANN and SVM prediction models of the relationships under study outperformed the MLR models, and their accuracy was comparable. For deformation and breaking energy, the best performance was obtained with the SVM models (R = 0.705 and R = 0.758, respectively), while ANN produced the best model for the modulus of elasticity (R = 0.878).
Funding: Wrocław University of Environmental and Life Sciences.
Institutional Review Board Statement: Not applicable.