Wild Blueberry Harvesting Losses Predicted with Selective Machine Learning Algorithms

The production of wild blueberries (Vaccinium angustifolium) contributes 112.2 million dollars yearly to Canada's revenue, which can be further increased by reducing harvest losses. A precise prediction of blueberry harvest losses is necessary to mitigate such losses. The performance of three machine learning (ML) algorithms was assessed to predict the wild blueberry harvest losses on the ground. Data from four commercial fields in Atlantic Canada (Tracadie, Frank Webb, Small Scott, and Cooper) were used to achieve this goal. Wild blueberry losses (fruit loss on ground, leaf losses, blower losses) and yield were measured manually from randomly selected plots during mechanical harvesting. The plant height of wild blueberry, field slope, and fruit zone readings were collected from each of the plots. To predict ground loss as a function of fruit zone, plant height, fruit yield, slope, leaf loss, and blower loss, three ML models, i.e., support vector regression (SVR), linear regression (LR), and random forest (RF), were used. Statistical parameters, i.e., mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²), were used to assess the prediction accuracy of the models. The correlation matrices showed that the blueberry yield and losses (leaf loss, blower loss) had medium to strong correlations with ground loss, with correlation coefficients (r) ranging from 0.37 to 0.79. The LR model gave the best predictions of ground loss among the models analyzed, with R² values of 0.87, 0.91, 0.91, and 0.73 for Tracadie, Frank Webb, Small Scott, and Cooper, respectively. Support vector regression performed comparatively better at all the fields, i.e., R² = 0.93 (Frank Webb), R² = 0.88 (Tracadie), and R² = 0.79 (Cooper), except the Small Scott field (R² = 0.07). When comparing the actual and predicted ground loss, the SVR performed best (R² = 0.79 to 0.93) as compared to the other two algorithms, i.e., LR (R² = 0.73 to 0.92) and RF (R² = 0.53 to 0.89), for the three fields. The outcomes revealed that these ML algorithms can be useful in predicting ground losses during wild blueberry harvesting in the selected fields.


Introduction
Native to the northern parts of North America, the wild or lowbush blueberry (Vaccinium angustifolium Ait.) is a perennial, deciduous shrub [1]. Canada produced 161,346 tons of harvested wild blueberries in 2020, accounting for more than 50% of the world's wild blueberry production [2]. Unlike other fruits, wild blueberries grow naturally from indigenous stands on deforested lands developed for agriculture [3]. Commercial fields of wild blueberries are developed on abandoned farmland or cleared forests where native blueberry plants already exist [4]. The fields are pruned to ground level in the first year (vegetative year) as part of a biennial cycle that primarily controls the stands, and are harvested in the second year (fruiting year) [4]. The wild blueberries are small and [...] and extreme gradient boosting in terms of predicting maize production and nitrogen losses. Yoosefzadeh-Najafabadi [24] compared three commonly used ML models, namely RF, multilayer perceptron (MLP), and support vector machine (SVM), to predict soybean yield. Their results revealed that RF had the highest prediction accuracy for soybean yield among the three models tested. Esfandiarpour-Boroujeni [25] estimated apricot yield with high accuracy (R² = 0.81) using support vector regression (SVR). Abbas [26] predicted potato yield using four ML models, namely LR, k-nearest neighbour, elastic net, and SVR, and concluded that all the algorithms worked very well in explaining the tuber yield, with R² = 0.70, 0.65, 0.64, and 0.72, respectively.
The literature review has shown that various ML models have been used for the prediction of crop yield and loss. However, limited work has been done using ML models to predict the wild blueberry harvesting fruit losses. There is a need to investigate harvesting patterns and pinpoint losses since wild blueberry growers experience significant harvesting losses as a result of modified growing circumstances brought on by novel management techniques [27]. Prediction of harvesting losses would help farmers in decision-making so that they can develop their harvesting strategies to overcome the predicted losses by increasing the fruit yield. The goal of this study was to predict wild blueberry ground losses during harvesting using ML models.

Data Sites
Data on blueberry mechanical harvesting yield losses and the related contributing factors were obtained from four wild blueberry field studies conducted in Nova Scotia. The selected sites were in commercial wild blueberry fields including Frank Webb (45. To replicate early and late harvesting, the chosen fields were harvested every year from early August to early September using a mechanical blueberry harvester (Doug Bragg Enterprises Ltd., Collingwood, NS, Canada). Figure 1 displays the geolocation of the chosen fields. The chosen fields had undergone biennial pruning by mowing as well as traditional farming management practices (fertilization, pruning, weed and disease control, and pollination), and they had been commercially managed for the previous ten years.

Data Collection and Analysis
To collect data on the harvesting losses, eighty-two plots of 0.91 × 3 m (identical to the harvester head's width) were flagged at random in the Frank Webb, Cooper, and Small Scott fields, and one hundred and nine plots were flagged in the Tracadie field. Each plot included a 0.3 m buffer around it to prevent inaccuracy during the data collection. A John Deere tractor (62.5 kW; Moline, Ill., Grand Detour, IL, United States) was equipped with a single wild blueberry harvester. The harvester was operated in the fields at a ground speed of 1.6 km h⁻¹ and 28 rpm. At the beginning of each plot, the harvester's head was lowered for harvesting and then raised at the end point of the plot. The belt of the harvester conveyor was connected to a bucket to collect the blueberries from each plot. Three losses, namely blower, ground, and leaf losses, were considered. The blower loss was retrieved by mounting a collection bucket below the harvester blower fans, which was emptied after each plot. The dropped berries were hand-picked from each plot to calculate the ground loss. For the leaf loss, the leaves and debris were separated from the collected good berries, placed in labelled Ziploc® bags, and weighed to calculate the weight of yield and fruit loss.
To obtain the average plant height, five plants were selected in each plot and their heights were measured using a measuring tape and then averaged for each plot. The zone from the top to the bottom of the cluster of fruits on blueberry plants is referred to as the fruit zone. The purpose of the fruit zone reading was to help the operator adjust the harvester's head height from the ground to pick blueberries effectively. Slope measurements (five at each plot) were recorded by hand using a Craftsman SmartTool Plus digital level (Sears Holdings Corporation, Hoffman Estates, IL, USA) and then averaged; a characteristic slope for each selected field was obtained from the slope values of its plots. Fruit zone and plant height ranged from 7.4 to 34.6 cm and 10.6 to 39.0 cm, respectively, and both were moderately variable. The slope was highly variable, with a range from 0.2 to 23.7 degrees within all the selected fields. The attributes of the harvesting plots are given in Table 1. The primary indicator of variability in descriptive statistics is the coefficient of variation (CV). A CV less than 15% shows the least variability of a parameter, a CV between 15 and 35% indicates that the parameter is moderately variable, and a CV greater than 35% indicates that the parameter is highly variable [28]. The ground loss varied from 3.4 to 1847 kg/ha with CV > 35% across all the fields. Relationships among all variables were assessed using Pearson correlation coefficients. Correlation coefficients of r ≤ 0.35 normally stand for weak correlations, 0.36 to 0.67 for medium correlations, and 0.68 to 0.90 for strong correlations, while r ≥ 0.90 indicates very high correlations [29]. The selected ML models were trained using the datasets (plant height, fruit zone, slope, blueberry yield, leaf loss, and blower loss).
Because the model was constructed utilizing a variety of attributes, its parameters were established during the training stage using data from prior years (2011 and 2012). During the testing stage, the fraction of the data that was not used for training was employed for performance assessment [22].
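The variability and correlation classes described above can be expressed as two small lookup functions. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the thresholds are the ones quoted from [28] and [29], and applying the correlation thresholds to the absolute value of r is an assumption made here so that negative correlations can be classified too. The slope readings in the usage lines are made-up illustrative numbers, not study data.

```python
import math
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


def variability_class(cv_percent):
    """Classify a CV (%) with the thresholds quoted from [28]."""
    if cv_percent < 15:
        return "least variable"
    if cv_percent <= 35:
        return "moderately variable"
    return "highly variable"


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_class(r):
    """Classify r with the thresholds quoted from [29].

    Assumption: the thresholds are applied to |r|, and values between
    0.35 and 0.36 are counted as medium.
    """
    strength = abs(r)
    if strength <= 0.35:
        return "weak"
    if strength <= 0.67:
        return "medium"
    if strength < 0.90:
        return "strong"
    return "very high"


# Illustrative (made-up) slope readings in degrees, not study data.
example_slopes = [0.2, 5.1, 12.4, 23.7, 8.0]
print(variability_class(coefficient_of_variation(example_slopes)))
print(correlation_class(0.78))   # a reported r between ground loss and yield
```

Under these thresholds, the reported r = 0.78 falls in the "strong" class and r = −0.28 in the "weak" class.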

Machine Learning Models
Machine learning can detect patterns and correlations and uncover insights from datasets. In this study, 80% of the data was used for training and 20% for testing. Machine learning studies face challenges such as inaccessible data, data security, and time-consuming implementation when building a well-functioning predictive model. It is also difficult to choose the correct models for the problem at hand; moreover, the models and the underlying platforms have to handle a large volume of data [22].
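An 80/20 partition of the plots can be sketched as below. This is only an illustration: the text does not state how plots were assigned to each set, so the seeded random shuffle, the function name, and the use of plot indices as stand-ins for plot records are all assumptions.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and return (train, test) lists.

    Assumption: random assignment with a fixed seed; the source only
    states the 80/20 proportions.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 82 plot indices stand in for the plot records of one field.
plots = list(range(82))
train, test = split_train_test(plots)
print(len(train), len(test))
```

With 82 plots, this rounding rule places 16 plots in the test set and 66 in the training set.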

Linear Regression
Linear regression (LR) models are easy to interpret yet powerful. Linear regression gives the impact of each predictor variable on the response variable [30]. According to [30], LR is a supervised learning method that can be applied to predict continuous variables. Linear regression in ML learns from data by reducing a loss, commonly the mean square error (MSE) or root mean square error (RMSE), using optimization methods such as gradient descent. Gradient descent iteratively moves the coefficients toward the minimum of the loss function, increasing the LR model's ability to predict outcomes accurately [30]. Linear regression is described by the equation below:

$$y = a + bx$$

The loss function (J) assists in evaluating the values of the coefficients (a, b) by reducing the error between the actual values and the predicted values. It can be expressed by the equation below:

$$J = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$

where ŷ_i = predicted value, y_i = actual value, and N = number of values.
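The idea of fitting the coefficients by gradient descent on a mean-square-error loss can be illustrated as follows. This is a hedged sketch rather than the study's implementation: it assumes a single predictor with the form y = a + bx (a as intercept, b as slope), and the learning rate, epoch count, and function name are arbitrary illustrative choices. The data in the usage lines are synthetic.

```python
def fit_linear_regression(x, y, learning_rate=0.05, epochs=2000):
    """Fit y = a + b*x by batch gradient descent on the MSE loss.

    Assumptions: one predictor, a = intercept, b = slope; the step size
    and number of epochs are illustrative, not values from the study.
    """
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        # Prediction error for every observation with the current a, b.
        errors = [(a + b * xi) - yi for xi, yi in zip(x, y)]
        # Partial derivatives of J = (1/n) * sum(error^2).
        grad_a = (2.0 / n) * sum(errors)
        grad_b = (2.0 / n) * sum(e * xi for e, xi in zip(errors, x))
        # Step against the gradient to reduce the loss.
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


# Synthetic data generated from y = 1 + 2x, so the fit should recover
# an intercept close to 1 and a slope close to 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_linear_regression(xs, ys)
print(round(intercept, 4), round(slope, 4))
```

In practice a closed-form least-squares solution gives the same coefficients for this loss; gradient descent is shown only because it is the optimizer named in the text.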

Support Vector Regression
Support vector regression is a supervised learning model from the support vector family, which may be used for both regression and classification purposes [31,32]. Support vector regression can be linear or non-linear depending on the kernel function. The well-known kernels are linear, radial basis function, sigmoidal, and polynomial. The performance of the SVR relies heavily on the selection of the kernel. A linear kernel is used in the SVR for linear regression, while using an appropriate nonlinear kernel makes it nonlinear [33]. SVR aims to reduce error by fitting a hyperplane together with a margin around it, within which deviations between the predicted and actual values are tolerated. The SVR may perform better when applied to information that is imbalanced regarding the binary outcome because the model is built using only the support vectors [34]. On the other hand, it has some drawbacks; for example, the user must choose the SVR's kernel for nonlinear scenarios. The kernel and any related hyperparameters that the kernel requires should be picked carefully, since a bad kernel selection might impair the performance of the model [35]. Linear SVR was used for this study based on the results obtained from LR. Linear SVR can be defined by the formula given below: where, a and x = supplementary hyperplanes in conjunction with the regression line.
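The role of the margin in SVR can be illustrated with the standard epsilon-insensitive loss, in which errors smaller than epsilon cost nothing. The sketch below uses the textbook single-feature linear SVR objective, which is not necessarily the exact formulation used by the authors; the function names are hypothetical, and the default values C = 150 and epsilon = 0.2 are simply the settings reported as performing well in this study.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.2):
    """Sum of errors that exceed the epsilon tube; smaller errors cost 0."""
    return sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred))


def linear_svr_objective(w, b, x, y, C=150.0, epsilon=0.2):
    """Textbook linear SVR objective for a single feature.

    0.5 * w^2 keeps the fitted line flat (regularization); C weights the
    penalty for points that fall outside the epsilon tube.
    """
    predictions = [w * xi + b for xi in x]
    return 0.5 * w * w + C * epsilon_insensitive_loss(y, predictions, epsilon)


# The first error (0.1) lies inside the tube and is ignored; the second
# error (0.5) contributes 0.5 - 0.2 = 0.3.
print(epsilon_insensitive_loss([1.0, 2.0], [1.1, 2.5]))
```

A larger C penalizes points outside the tube more heavily, while a larger epsilon widens the tube and ignores more of the small errors.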

Random Forest
The RF model is an ensemble approach that generates predictions by aggregating predictions from many different base models. The RF model has had outstanding success as a regression and classification tool since its introduction by [36]. The bootstrap aggregating technique used by the RF model, also known as bagging, lowers the variance of a statistical learning method [37]. In summary, different bootstrapped samples are drawn from the training data, and trees are built using these samples. For classification, a majority vote is taken over the classes predicted by the trees; for regression, the average prediction is returned. The overall prediction power of the model is potentially increased by this method. In addition, bootstrap aggregating allows an estimate of the out-of-bag error, which is a reliable estimate of the test error [36]. Both regression and classification problems can be solved by the RF model, which makes it a versatile model that is extensively used by engineers. A wide range of sectors, including the stock market, banking, pharmacology, patient healthcare management, and physiology, frequently use RF as a tool [38]. When RF is used to solve regression problems, MSE is used to determine how the data branch at each node. The following equation is used to find MSE:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(f_i - y_i\right)^2$$

where N = total number of data points, f_i = output of the model, and y_i = true value of data point i.
The RF model can provide the estimation of the importance of the variables by comparing the changes of MSE when a specific variable is randomly altered, and other variables are kept unchanged.
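The variable-importance idea described above (shuffle one variable, keep the others unchanged, and compare the MSE) can be sketched in a model-agnostic way. This is an illustrative sketch only: `predict` is a hypothetical stand-in for any fitted model, the function names are not from the source, and the toy rows are synthetic.

```python
import random


def mse(y_true, y_pred):
    """Mean square error between true values and model outputs."""
    return sum((f - y) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, column, seed=0):
    """Increase in MSE after shuffling one column of the inputs.

    `predict` is any callable that maps a row (list of features) to a
    prediction; a fitted random forest would be used in practice.
    """
    rng = random.Random(seed)
    baseline = mse(targets, [predict(row) for row in rows])
    shuffled_values = [row[column] for row in rows]
    rng.shuffle(shuffled_values)
    permuted_rows = []
    for row, value in zip(rows, shuffled_values):
        new_row = list(row)
        new_row[column] = value  # only this variable is altered
        permuted_rows.append(new_row)
    permuted = mse(targets, [predict(row) for row in permuted_rows])
    return permuted - baseline


# Toy model that uses only feature 0, so feature 1 has zero importance.
def toy_predict(row):
    return 2.0 * row[0]


toy_rows = [[1.0, 9.0], [2.0, 7.0], [3.0, 5.0], [4.0, 3.0], [5.0, 1.0], [6.0, 0.0]]
toy_targets = [toy_predict(row) for row in toy_rows]
importance_used = permutation_importance(toy_predict, toy_rows, toy_targets, column=0)
importance_unused = permutation_importance(toy_predict, toy_rows, toy_targets, column=1)
print(importance_used, importance_unused)
```

A variable the model never uses yields an importance of exactly zero, whereas shuffling a variable the model depends on can only increase the error of this toy model.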

Hyperparameter Tuning
The data samples were split into 80% and 20% sets for training and testing, respectively. It is important to evaluate various hyperparameters for each dataset, because hyperparameters behave differently for different kinds of datasets [26]. Therefore, a trial-and-error (hit and trial) method was adopted to determine the range of hyperparameters, and the combination of values that gave the highest R² and the lowest mean absolute error (MAE) and RMSE was selected. Following this procedure, the hyperparameters displayed in Table 2 were used to train the selected ML models. Support vector regression was tested by varying the regularization parameter (C) from 50 to 200, and it performed well at C = 150. Similarly, the epsilon value was varied from 0.1 to 1.0, and it performed well at 0.2. In the case of RF, seven hyperparameters were tested, including maximum depth = 10-60, random state = 5-75, min samples leaf = 1-20, and verbose = 0.1-10, and it showed the best results at maximum depth = 35, random state = 30, min samples leaf = 3, and verbose = 2.
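A trial-and-error search of this kind amounts to scoring each candidate combination and keeping the best one. The sketch below shows the loop only, under stated assumptions: the grid values are illustrative points inside the ranges quoted above, and `placeholder_score` is a synthetic stand-in shaped to peak at the reported settings purely so that the example has a known answer. It is not the study's validation score; in practice `evaluate` would train a model and return its test R².

```python
from itertools import product


def trial_and_error_search(evaluate, grid):
    """Score every combination in `grid` and return the best one.

    `evaluate` is a callable that takes a dict of hyperparameters and
    returns a score where higher is better (for example, test R²).
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Illustrative grid inside the quoted ranges (C: 50-200, epsilon: 0.1-1.0).
svr_grid = {"C": [50, 100, 150, 200], "epsilon": [0.1, 0.2, 0.5, 1.0]}


def placeholder_score(params):
    """Synthetic stand-in for a real train-and-evaluate step."""
    return -abs(params["C"] - 150) / 150 - abs(params["epsilon"] - 0.2)


best_params, best_score = trial_and_error_search(placeholder_score, svr_grid)
print(best_params)
```

The same loop applies to the RF settings by supplying a grid over maximum depth and min samples leaf and an `evaluate` function that fits the forest.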

Model Evaluation Criteria
Three statistical parameters, R², RMSE, and MAE, as used in [39,40], were utilized to evaluate the LR, SVR, and RF models. R² assesses how well a model explains or predicts the outcomes. Its value ranges from 0.0 to 1.0, and a value closer to 1 represents excellent model efficiency. The amount of error in a measurement is called the absolute error, and the average of all those absolute errors is known as the MAE. The difference between the actual and predicted values is measured using the RMSE; a lower RMSE value indicates a more effective model.

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

where y_i = actual value at the ith time, ŷ_i = estimated value at the ith time, ȳ = mean value of y_i, i ranges from 1 to N, and N = number of values.

Table 3 displays the descriptive statistics for the chosen parameters. Fruit yield was highly variable within all the selected fields, with values varying from 253 to 17,968 kg/ha. The ground loss was highly variable and occurred due to many factors, including pre-harvest berry drop. Leaf loss showed high variability, ranging from 0 to 575 kg/ha. Blower loss also showed high variability (0-529 kg/ha), except at the Tracadie site, where it was only moderately variable (21.2-129 kg/ha).
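The three evaluation criteria in this section (R², MAE, and RMSE) can be computed directly from paired actual and predicted values. The following is a minimal sketch using the standard definitions; the function names are hypothetical and the numbers in the usage lines are synthetic, not study data.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic example values (not study data).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
print(mean_absolute_error(actual, predicted))     # 0.5
print(root_mean_square_error(actual, predicted))  # sqrt(0.5), about 0.707
print(r_squared(actual, predicted))               # 0.9
```

MAE and RMSE carry the units of the predicted quantity (kg/ha for ground loss), whereas R² is unitless, which is why the text compares R² across fields but reports MAE and RMSE per field.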

Correlation Analysis
In order to identify the relationships between ground loss and the other input variables, correlation matrices were established. The results of the Pearson correlation are shown in Figure 2. In the Frank Webb field, there were strong, significant, and positive correlations between ground loss and fruit yield (r = 0.78) and leaf loss (r = 0.79). Farooque [41] reported that fruit losses on the ground increased with an increase in blueberry yield during harvesting. There was a moderate correlation between ground loss and blower loss (r = 0.62). The ground loss was negatively correlated with plant height (r = −0.28) and fruit zone (r = −0.06), which means that ground loss decreased as plant height and fruit zone increased, and vice versa. It has also been reported by [13] that the ground loss was inversely related to plant height (r = −0.21) and fruit zone (r = −0.07). The slope had a positive correlation with ground loss (r = 0.04). In the Cooper field, the ground loss had a moderate positive correlation with fruit yield (r = 0.47). It was due to the topography of the field and the size of berries. The remaining variables had a weak correlation, i.e., r ≤ 0.35. In the Small Scott field, ground loss and fruit yield were positively correlated (r = 0.59). Leaf loss, blower loss, and slope also had positive correlations with ground loss, i.e., r = 0.33, 0.14, and 0.37, respectively. In the Tracadie field, a significant correlation between ground loss and fruit yield was observed (r = 0.73). Leaf loss, blower loss, and slope were weakly correlated with the ground loss, i.e., r ≤ 0.35 for these variables. Reference [13] concluded that ground loss had a significant correlation with fruit yield (r = 0.78) but a weak correlation with blower loss (r = 0.15) and slope (r = 0.16). Plant height and fruit zone showed an inverse relationship with the ground loss.
Agriculture 2022, 12, 1657

Evaluation of Machine Learning Algorithms
The outcomes of the model assessment are given in Table 4. SVR had the highest R² (0.93) for the Frank Webb field and LR recorded R² = 0.91, whereas the lowest R² (0.53) was recorded for RF in this field. The ranges of MAE and RMSE were 2.35-2.49 and 2.96-3 kg/ha, respectively, for all the algorithms in this field. In the Tracadie field, high R² values were recorded for LR and SVR (0.87 and 0.88, respectively), whereas RF had R² = 0.78. The values of MAE and RMSE for this field ranged from 10.74-34.32 and 13.08-45.15 kg/ha, respectively. In the Cooper field, the highest R² (0.89) was observed for RF, whereas for LR and SVR the values of R² were 0.73 and 0.79, respectively. The lowest values of MAE and RMSE in this field were calculated for SVR, i.e., 0.1 and 0.15, respectively. In the Small Scott field, the highest R² was recorded for LR (0.91), and the lowest R² values were recorded for SVR and RF at 0.07 and 0.18, respectively. The highest MAE and RMSE were observed for RF, i.e., 53.76 and 103, respectively. The findings revealed that the SVR and LR performed very well in predicting the berry losses (Table 4). Wang [42] compared the performance of the RF algorithm with SVR and artificial neural network (ANN) to remotely estimate wheat biomass and reported that RF (R² = 0.79) and SVR (R² = 0.62) showed good results as compared to ANN (R² = 0.3). Gandhi [43] used different machine learning techniques and reported that SVR performed very well for the prediction of rice crop yield under different climatic scenarios. Palanivel [44] used diverse machine learning techniques such as LR, ANN, and backpropagation methods to predict crop yield. To develop a prediction model for fruit yield, Obsie [45] selected four ML models, namely boosted decision trees, multiple linear regression, extreme gradient boosting, and RF, and concluded that RF was the second-most successful algorithm, followed by the boosted decision tree algorithm with R² = 0.90.

Comparison of Actual and Predicted Ground Losses
Outputs of the algorithms were compared to evaluate which algorithm performed better in predicting the ground losses from plant height, fruit zone, slope, blueberry yield, leaf loss, and blower loss, as shown in Figure 3. In the Frank Webb field, SVR (R² = 0.94) performed better than LR (R² = 0.91) in predicting the ground losses. In this field, RF did not perform well in predicting the losses (R² = 0.53). In the Cooper field, by contrast, RF had the highest value (R² = 0.99), which means RF performed very well in predicting the ground losses, while SVR and LR had R² = 0.79 and R² = 0.74, respectively. LR was found to be the best performer in predicting the ground losses in Small Scott (R² = 0.91) and Tracadie (R² = 0.89), although SVR (R² = 0.88) and RF (R² = 0.78) were also good in predicting the ground losses for the Tracadie field. In comparison, SVR and RF performed better in three fields, the exception being the Small Scott field, whereas LR performed well in all the fields. The poor performance of RF and SVR in the Small Scott field could be the result of influencing factors such as climate and soil, which may influence yield. Therefore, some unknown factors that were not included in this study may have influenced yield and reduced modelling accuracy at the Small Scott site. Different models have performed well in different studies; for example, [46] used machine learning algorithms, namely SVR, RF, and deep neural networks, for autumn crop yield prediction, and the results showed that SVR and RF performed very well in predicting the yield, with R² = 0.92 and R² = 0.90. All the models performed differently in the various fields due to the type of data. They perform well especially for linear data, whereas our data are point-based or discrete and not linear. So, the performance of the models depends on the correlation between the input parameters (slope, plant height, blower loss, fruit zone, and leaf loss) and the output data (ground loss), which is different for each of the fields. The models performed differently in each field because the performance of models relies on the nature of the specific input data.

Comparison of Machine Learning Algorithms
Three ML models were utilized in this study to predict the ground losses. The comparison of these algorithms showed that LR and SVR performed comparatively well for all the fields, as shown in Figure 4. The LR performed better because it learns from the data by reducing a loss such as MAE or RMSE [29]. SVR may perform better than other algorithms due to its use of a stronger optimization method for a wide range of variables [47]. Pan [48] established quantitative structure-function relationship algorithms for forecasting the auto-ignition temperatures of organic substances using a support vector approach, and investigated and contrasted the calibration and predictive power of the SVR with two other widely used techniques, back-propagation neural network and multiple linear regression (MLR). The outcomes revealed that the support vector approach performed better than the back-propagation neural network and MLR.
Additionally, that study demonstrated the improved generalization capability of the support vector approach and showed that it is a powerful tool. The results of the present research also highlight the superior performance of LR and SVR in comparison to RF because of their improved optimization methods for a large number of parameters [46]. Support vector regression offers the additional functionality of kernels, which increases the performance of the model by capturing the nature of the attributes [49]. Linear regression performed well in all the fields and SVR performed well in three of the four fields, whereas RF performed well in only two fields. On the basis of this study's findings, LR and SVR models are suggested for predicting the ground losses in the selected blueberry fields.

Conclusions
In this study, ground losses during the harvesting of wild blueberries were predicted using ML algorithms, and the best algorithms for predicting fruit losses on the ground have been proposed. Four blueberry fields were selected, and a randomized experiment was conducted in each field. Eighty-two plots were set up in three fields and one hundred and nine plots were made in the fourth field. Berry losses and fruit yield were measured from each plot. The values of fruit zone, plant height, and topography were also noted from all the plots within the selected fields. Three ML algorithms, namely LR, SVR, and RF, were used to predict ground losses. Modeling techniques were used to assess the prediction of ground losses. The correlation analysis indicated that the blueberry yield and the losses (leaf loss, blower loss) had moderate to high correlations with the ground loss, with r ranging from 0.37 to 0.79. The LR model performed best as compared to the other models for Frank Webb, Tracadie, Cooper, and Small Scott, with R² = 0.91, 0.87, 0.73, and 0.91, respectively. With the exception of Small Scott (R² = 0.07), the SVR model also performed well for Frank Webb (R² = 0.93), Tracadie (R² = 0.88), and Cooper (R² = 0.79). When actual and predicted ground losses were compared, the LR model performed best, with R² ranging from 0.73 to 0.92 within all selected fields. SVR also performed well, with R² ranging from 0.79 to 0.93 for three fields. The results showed that these ML algorithms could be used to predict blueberry losses on the ground. These results will further help in optimizing the harvesting techniques.