Prediction of Strawberry Leaf Color Using RGB Mean Values Based on Soil Physicochemical Parameters Using Machine Learning Models

Abstract: Intensively grown greenhouse strawberries require frequent and precise management of soil physicochemical constituents for optimal production. Strawberry leaf color analysis is among the most effective ways to evaluate soil status and to guard against excess environmental nutrients and financial setbacks. Precision agriculture (PA) approaches have been used to address these problems. This research aimed to create machine learning models, namely multiple linear regression (MLR) and gradient boost regression (GBR), for simulating strawberry leaf color changes related to soil physicochemical components and plant age using RGB (red, green, and blue) mean values. The soil physicochemical properties in the rooting zones of the largest leaves of varied color were measured with a multifunctional soil sensor. A total of 400 strawberry leaflets were detached during the vegetative and reproductive stages, and individual leaves were captured using a digital imaging system. The RGB mean values of the color images were extracted using image segmentation. MLR and GBR models were then developed to predict leaf RGB mean values from soil physicochemical measurements and plant age. The GBR model fitted the RGB mean values well throughout the growth stages, with R² and RMSE values of 0.77 and 7.16 for R, 0.72 and 7.37 for G, and 0.70 and 5.68 for B, respectively. The MLR model performed moderately, with R² and RMSE values of 0.67 and 8.59 for R, 0.57 and 9.12 for G, and 0.56 and 6.81 for B, respectively. Overall, the GBR model outperformed the MLR model on these metrics. In addition, the leaf color model uses visualization technology to measure growth progress, and it performs well in predicting dynamic changes in strawberry leaf color.


Introduction
Strawberry (Fragaria × ananassa) cultivation has increased markedly in South Korea. Optimal strawberry yield in greenhouse cultivation depends on favorable nutrient availability in the soil. In recent years, greenhouse strawberry output has surpassed traditional soil growing all across the world [1].
The strawberry leaf is considered a vital vegetative organ that acts as a reservoir for phytochemicals and other bioactive compounds that support the growth of the reproductive structures [2]. Therefore, leaf color is a typical visual character of the strawberry plant impacted by the growing environment and can help understand plant [...] have been developed based on leaf chlorophyll content or SPAD value changes in different phases [5]. Maresma et al., 2016 [15], explored an unmanned aerial vehicle (UAV) and various vegetation indices in maize fields with varying fertilizer application rates through a regression model. The study provided convincing results, with R² values of 0.92 from different indexes. However, a limitation of that study is that it did not analyze the effect of any single macronutrient on vegetative growth. Moreover, as these studies used datasets of moderate size, there is a strong possibility that overfitting occurred. Özreçberoğlu et al., 2020 [16], explored various linear regression models to estimate pomegranate leaf chlorophyll content (of a given area) using both G and B color values. In the present study, ML models are used that employ the soil physicochemical parameters together with plant age to predict the RGB color of strawberry plant leaves.
ML is a promising technique for analyzing massive volumes of data and is mainly applied for prediction and classification. Plant science, plant production, and plant phenotyping are just a few of the disciplines where this technology is applied [17,18].
Gradient boost regression (GBR) was used as the prediction model in this investigation. GBR is a regression machine learning technique that generates a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model stage by stage and generalizes it by optimizing an arbitrary differentiable loss function. In addition, GBR produces highly competitive, robust, and interpretable results and is especially appropriate for mining data that are less than clean, as revealed by Friedman, 2001 [19]. Moreover, the GBR model is deemed to perform better with input features that are complex and nonlinear. Multiple linear regression (MLR) is a machine learning model often applied in agriculture-related research to model the linear relationship between several input variables and a response variable [15].
The current research aims to provide fresh insight into modeling leaf color dynamics in strawberries using RGB mean values based on plant age and soil physicochemical parameters. To the best of our knowledge, this study is the first of its kind to use the combination of soil nutrients and plant age to indicate strawberry leaf color. The expected results can provide a key technology to support further development of virtual strawberry production and its application in agricultural production.

Experimental Design
The present experiment was laid out in a controlled greenhouse at the Smart Farm Systems Laboratory, Gyeongsang National University, South Korea, during the winter season in 2021. The experiment lasted 120 days (from October to the end of January). Indoor parameters such as humidity, temperature, light, and CO₂ were monitored daily using a high-accuracy sensor unit, the MCH 383SD (Lutron Electronic Enterprises Co., Ltd., Taipei, Taiwan) [2]. A combination of bio plus compost and Hoagland solution was used for five rows of strawberries, with 100 strawberry plants in each row, as demonstrated in Figure 1. The bio plus compost soil consists of cocopeat (68.86%), peat moss (11.00%), perlite (11.00%), and zeolite (9.00%), as revealed by Khan et al., 2019 [20].

Leaf Sample Collection and Soil Physicochemical Parameter Measurement
Normal and colored strawberry leaves (biggest leaf) were collected from plants in consistent normal growth with no signs of pests or disease. Soil physicochemical measurements, namely soil pH, electrical conductivity (EC), soil temperature (ST), and N, P, and K content, were taken near the root zone using a multifunctional soil sensor (JXBS-3001-SCY-PT, High-precision Environmental Sensors, Weihai JXCT Electronics Technology Co., Ltd., Weihai, China). A total of 400 normal and colored leaves of different ages were collected, starting 60 days after transplanting. Generally, normal leaves are green in color, whereas colored leaves consist of green and non-green parts that are differently colored [21]. Sampling was carried out weekly, with 40 leaves per week (20 normal leaves and 20 colored leaves) [5]. The leaf color extraction, model development, and performance analysis procedures followed the workflow illustrated in Figure 2.


Image Acquisition
Each leaf was placed in a smooth rectangular light chamber (80 cm × 80 cm × 80 cm) with a black surface directly under white light-emitting diodes (LEDs). The LED lamps were two 20 W strips. The lamp positions were adjusted so that the leaves were evenly illuminated with no shadows, as indicated in Figure 3. The reflectance of the light band was observed at 450 nm at the upper edge. Images were captured using a high-resolution RGB camera with a resolution of 5472 × 3648 pixels (SONY DSC-RX100 VII, Seoul, Korea). The camera was fixed on a tripod 80 cm above the top of the platform, at the nadir position [6].

Leaf Image Segmentation, Denoising, and Color Feature Extraction
The images were edited using remove.bg software: each image was saved as a PNG with a transparent background, and the image size was adjusted to 612 × 408. Then, for each normal and colored leaf image, the mean values of the red, green, and blue channels were computed using an image segmentation algorithm implemented in Python on Google Colaboratory, as shown in Figure 4. Finally, histograms of the leaf R, G, and B mean values were produced at two different ages, 60 and 123 days after transplanting, to obtain the leaf color distribution pattern.
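The channel-averaging step can be illustrated with a minimal sketch. It assumes that the background-removed PNG has already been loaded as an (H, W, 4) RGBA array and that leaf pixels are those with non-zero alpha; the function name `leaf_rgb_means` and the `alpha_threshold` parameter are hypothetical and are not taken from the authors' program.

```python
import numpy as np


def leaf_rgb_means(rgba, alpha_threshold=0):
    """Return the mean R, G, B over pixels whose alpha exceeds the threshold.

    `rgba` is an (H, W, 4) array, e.g. from a background-removed PNG.
    `alpha_threshold` is a hypothetical parameter, not from the paper.
    """
    rgba = np.asarray(rgba, dtype=np.float64)
    if rgba.ndim != 3 or rgba.shape[2] != 4:
        raise ValueError("expected an (H, W, 4) RGBA array")
    leaf_mask = rgba[..., 3] > alpha_threshold  # True on leaf (foreground) pixels
    if not leaf_mask.any():
        raise ValueError("no foreground pixels found")
    leaf_pixels = rgba[..., :3][leaf_mask]  # shape: (n_leaf_pixels, 3)
    return tuple(leaf_pixels.mean(axis=0))


# Tiny synthetic 2 x 2 image: two opaque "leaf" pixels, two transparent ones.
demo = np.array(
    [
        [[40, 120, 30, 255], [60, 140, 50, 255]],
        [[255, 255, 255, 0], [0, 0, 0, 0]],
    ],
    dtype=np.uint8,
)
r_mean, g_mean, b_mean = leaf_rgb_means(demo)
print(r_mean, g_mean, b_mean)
```

With a real image, the array could be obtained with Pillow, for example `np.array(Image.open(path).convert("RGBA"))`. Masking on the alpha channel keeps the transparent background from pulling the channel means toward zero.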

Data Preprocessing and Models Building
Before fitting the data to the machine learning models, a Pearson correlation coefficient heatmap was produced to assess the magnitude and direction of association among the independent variables (soil pH, EC, ST, N, P, K, and plant age) and the dependent variables (R, G, and B mean values of each strawberry leaf). The scikit-learn library was used to perform data preprocessing and develop the MLR and GBR models, and figures were created in Python in a Google Colaboratory notebook.
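As a rough illustration of the correlation step, the sketch below computes a Pearson correlation matrix with NumPy. The readings are made up for demonstration only and are not the study's measurements; the helper name `pearson_matrix` is hypothetical.

```python
import numpy as np


def pearson_matrix(columns):
    """Pearson correlation matrix for a dict of equally long 1-D sequences."""
    names = list(columns)
    data = np.array([columns[name] for name in names], dtype=np.float64)
    # np.corrcoef treats each row of `data` as one variable.
    return names, np.corrcoef(data)


# Made-up readings for illustration only (not the study's data).
toy = {
    "soil_pH": [5.8, 6.0, 6.3, 6.5, 6.9],
    "K": [210, 190, 180, 150, 140],
    "R_mean": [95, 92, 88, 80, 78],
}
names, corr = pearson_matrix(toy)
for name, row in zip(names, corr):
    print(name, np.round(row, 2))
```

The resulting matrix is what a heatmap would color-code: values near +1 or −1 mark strongly related pairs, and values near 0 mark weak linear association.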
Subsequently, a standard scaler was applied to standardize the features of each independent variable by removing the mean and scaling to unit variance before the machine learning models were fitted. Standard scaling was performed according to Equation (1) [22].
$$Z = \frac{x - \mu}{\sigma} \tag{1}$$
where $Z$ is the standard score, $x$ is the feature value, $\mu$ is the mean value, and $\sigma$ is the standard deviation. As a result, all variables contributed on a comparable scale to the model fitting, which removes scale-related bias. Subsequently, principal component analysis (PCA) was applied to reduce the dimension of the data sets, with the number of components set to 0.95 (in scikit-learn, a fractional value retains as many components as are needed to explain that share of the variance).
MLR has been more extensively applied in agricultural fields than other prediction techniques [23,24]. The primary goal of the MLR model is to create a linear relationship between the explanatory (independent) and response (dependent) variables. As a predictive analysis, MLR is based on the linear association between two or more explanatory variables and a response variable, as revealed by Abdipour et al., 2015 [25]. MLR was developed according to Equation (2) [26].
$$y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon \tag{2}$$
where $y_i$ is the R/G/B mean value, $\beta_0$ to $\beta_n$ are the regression coefficients, $X_1$ to $X_n$ are the input variables, and $\varepsilon$ is the error associated with the $i$th observation.
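The preprocessing and MLR steps described above can be sketched as a single scikit-learn pipeline. This is a minimal sketch on synthetic data, not the authors' code: the seven columns stand in for soil pH, EC, ST, N, P, K, and plant age, the coefficients are invented, and a 75%/25% split is assumed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 400

# Synthetic stand-ins for pH, EC, ST, N, P, K, and plant age (assumption).
X = rng.normal(size=(n_samples, 7))
true_coef = np.array([-4.0, 1.5, 0.5, 2.0, 1.0, -3.0, 0.8])  # invented values
y = 100.0 + X @ true_coef + rng.normal(scale=2.0, size=n_samples)

# 75% of the samples for training, 25% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Standardize, keep components explaining 95% of the variance, then fit MLR.
mlr = make_pipeline(StandardScaler(), PCA(n_components=0.95), LinearRegression())
mlr.fit(X_train, y_train)

n_kept = mlr.named_steps["pca"].n_components_
test_r2 = mlr.score(X_test, y_test)
print(f"components kept: {n_kept}, test R^2: {test_r2:.2f}")
```

Wrapping the scaler and PCA in the pipeline means their statistics are estimated from the training split only, so the held-out samples do not leak into preprocessing. One such model would be fitted per response (R, G, or B mean value).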
Light gradient boosting machine (LGBM) is a decision tree-based machine learning algorithm that was released by Microsoft in late 2017. Its advantages are low memory usage and fast convergence, and it has gained increasing popularity in machine learning and data science, as reported by Cai et al., 2021 [27]. LGBM uses a histogram-based algorithm that buckets continuous feature values into discrete bins, which speeds up training, and it grows trees leaf-wise, splitting the leaf that gives the best fit. In contrast, other boosting algorithms grow trees depth-wise or level-wise rather than leaf-wise [28].
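The idea of bucketing continuous feature values into discrete bins can be illustrated with a short NumPy sketch. Equal-width bins are used here purely as a simplifying assumption; this is not LightGBM's actual bin-finding procedure, and the function name `bin_feature` is hypothetical.

```python
import numpy as np


def bin_feature(values, max_bin=8):
    """Map continuous values to integer bin indices in 0..max_bin-1.

    Uses equal-width bins (a simplification for illustration only).
    """
    values = np.asarray(values, dtype=np.float64)
    edges = np.histogram_bin_edges(values, bins=max_bin)
    # Only the interior edges are passed to digitize, so the minimum maps
    # to bin 0 and the maximum maps to bin max_bin - 1.
    return np.digitize(values, edges[1:-1]), edges


soil_ph = np.array([5.5, 5.9, 6.1, 6.4, 6.8, 7.0])  # made-up readings
bins, edges = bin_feature(soil_ph, max_bin=3)
print(bins, edges)
```

Once features are binned, candidate split points only need to be evaluated at bin boundaries rather than at every distinct value, which is the source of the speed-up.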
For LGBM modelling, GBR, which also provides a feature importance measure, was used. GBR is an ensemble learning method that combines multiple weak learners to reduce the overfitting of the model. The additive GBR update follows Equation (3), as reported by Jiao et al., 2006 [29].
$$F_m(x) = F_{m-1}(x) + \rho_m\, h(x; \alpha_m) \tag{3}$$
where $F_m(x)$ is the model output after $m$ iterations, $h(x; \alpha_m)$ is the decision tree, $\alpha_m$ is the parameter vector of the decision tree, and $\rho_m$ is the weight parameter of the regressor. GBR has many parameters that need to be tuned, such as boosting type, max depth, learning rate, and the number of leaves. Max depth is the maximum depth of the individual regression estimators; the greater the value, the more complex the features the model can describe [30], although a high value might result in overfitting the training dataset. The learning rate shrinks the contribution of each tree. The subsample parameter is the fraction of samples used to fit the individual base learners. The bagging fraction specifies the fraction of data used in each iteration and is generally used to speed up training and avoid overfitting. Finally, max bin sets the number of bins into which the feature values are bucketed [30]. The parameters were adjusted by trial and error, and they are listed in Table 1.
In this study, three statistical performance metrics were utilized: coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). In regression analysis, R² is a crucial statistic that indicates how near the predictions are to the actual values, as well as the extent of the regression model bias-variance trade-off [8]. In contrast, the RMSE is sensitive to large prediction errors and measures their variation. The third metric, MAE, reveals the average magnitude of errors across all model predictions and shows how widely the predicted values are scattered, as mentioned by Jaihuni et al., 2020 [31]. The formulas for the metrics are given in Equations (4)-(6).
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{4}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{5}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{6}$$
where $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the measured values, and $n$ is the number of observations.
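A minimal sketch of fitting a gradient boosting regressor and scoring it with the three metrics is given below. It uses scikit-learn's `GradientBoostingRegressor` on synthetic nonlinear data; the hyperparameter values are placeholders rather than the values in Table 1, and LightGBM-specific settings mentioned above (number of leaves, bagging fraction, max bin) have no direct counterpart in this estimator.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic nonlinear data standing in for the seven predictors (assumption).
X = rng.uniform(-1.0, 1.0, size=(400, 7))
y = (
    100.0
    + 20.0 * np.sin(3.0 * X[:, 0])
    + 10.0 * X[:, 1] ** 2
    + rng.normal(scale=2.0, size=400)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Placeholder hyperparameters; not the tuned values reported in the paper.
gbr = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.05,  # shrinks the contribution of each tree
    max_depth=3,         # maximum depth of each regression tree
    subsample=0.8,       # fraction of samples used to fit each tree
    random_state=1,
)
gbr.fit(X_train, y_train)
y_pred = gbr.predict(X_test)

r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
mae = float(mean_absolute_error(y_test, y_pred))
print(f"R^2 = {r2:.2f}, RMSE = {rmse:.2f}, MAE = {mae:.2f}")
```

Comparing these metrics on the training and testing splits is one way to judge whether a chosen depth or learning rate is overfitting.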

Results and Discussion
The use of RGB models for strawberry leaf color analysis has shown clear drawbacks in the past. Their major flaw was that they relied on too few parameters to forecast RGB color, and the physiological importance of soil physicochemical factors in characterizing leaf color change was not explained [32].
Generally, the parameters that indicate some attribute or trait of leaf color are soil pH, EC, ST, N, P, K, and plant age. The RGB mean value is extracted based on the normality assumption; the leaf color heterogeneity is ignored. Moreover, the mean value can only describe the leaf color state quantitatively [5].
Color alterations from green to yellow are prominent characteristics of leaf senescence [33]; therefore, the color model in this study could be used to explore an approach to prolonging the duration of the functional leaf period and delaying leaf senescence by adjusting the fertilizer rate for enhancing strawberry productivity. Moreover, the color model could be applied to recognize the growth status of strawberries based on the RGB values, which would facilitate the potential application of virtual strawberry production [34].

Color Feature Extraction
The mean R, G, and B values of strawberry leaves are plotted at two different ages, 60 days (vegetative stage) and 123 days (reproductive stage) after transplanting, in Figure 5a-c. The skewed patterns differ with leaf age, showing increasing and decreasing trends. Two skewed distribution patterns are observed in the last stage of the strawberry lifetime.

Data Preprocessing Results
The correlations between the independent variables (soil physicochemical properties and plant age) and the dependent variables (leaf R, G, and B mean values) are illustrated as a heatmap in Figure 6. The Pearson correlation coefficient, used in this investigation, quantifies the strength of a linear relationship between two variables. It ranges from −1 to +1, with −1 indicating a perfect negative correlation, 0 indicating no linear correlation, and +1 indicating a perfect positive correlation. This coefficient helps to characterize the relationship between the dependent and independent variables as strong or weak. According to the heatmap results, both positive and negative correlations were observed; the darker the red or blue color, the more highly correlated the variables. The color values were strongly correlated with one another, with coefficients of 0.76 (R with B mean), 0.75 (R with G mean), and 0.73 (G with B mean). Moreover, a high positive correlation was observed between P and K (0.82), whereas strong negative correlations were exhibited between soil pH and R mean (−0.73), K and plant age (−0.71), and soil pH and B mean (−0.68). All independent variables were selected for the ML models' development according to the correlation coefficient values.
PCA was used as a statistical means of dimension reduction in feature space. The dimensionality of the data was reduced, as demonstrated in Figure 7, with the data spread out along the first and second principal components. The dimensionality reduction increases the accuracy of MLR and LGBM models [30].

Performance of the MLR and LGBM (GBR) Models
The MLR and GBR regression models were trained in a supervised manner using data from 400 leaf images. The dataset was split into training (75%) and testing (25%) subsets so that the models could be fitted on one subset and checked on unseen data for overfitting and underfitting. From the input and output data used for the MLR model, the following formulas were computed to predict the R, G, and B mean values (Equations (7)-(9)).

where the R, G, and B mean values are the red, green, and blue values of the strawberry leaf, and the inputs are the soil physicochemical parameters: soil pH, electrical conductivity (EC), soil temperature (ST), nitrogen (N), phosphorus (P), and potassium (K). The constants of the MLR model are 107.58, 126.65, and 71.76 for the R, G, and B mean values, respectively. The regression coefficient for plant age was zero, so plant age did not affect the R, G, and B color predictions according to the MLR results. To evaluate the MLR model, the distributions of actual and predicted R, G, and B mean values were compared on scatter plots (Figure 8a-c). Figure 8 shows a comparatively large number of outliers, which can be attributed to the inability of the model to predict the strawberry leaf color values properly. By comparison, the scatter plots obtained from the GBR model (Figure 9a-c) show very close agreement between measured and predicted R, G, and B mean values. The small number of outliers (unusual data values) in these scatter plots indicates that the GBR model is the better of the two for predicting strawberry leaf color.
The trained models' generalizability in predicting RGB mean values linked with plant age and soil physicochemical features was then assessed using digital photographs. According to the metrics, the GBR model exhibited higher performance than the MLR model. The overall performance levels of the models in training and testing are stated in Tables 2 and 3. Most of the models performed well in RMSE and R² during training. However, the MLR training-phase RMSE values are slightly higher than the testing results, because the training set was randomly allocated and the test set contains data that the model has not seen before [15].
The training accuracy was much higher for the GBR model than for the MLR model, and the GBR results changed in the testing phase. For the R mean values, the percentage differences between GBR training and testing were 22.22% in R², 80.45% in RMSE, and 80.65% in MAE. For the G mean values, the corresponding differences in R², RMSE, and MAE were 27.27%, 80.60%, and 80.54%, respectively. For the B mean values, the differences in R², RMSE, and MAE were 15.66%, 10.04%, and 16.55%, respectively.
Based on the statistical metrics (R², RMSE, and MAE), the results of the study indicate that the GBR model is a more powerful tool than MLR for forecasting strawberry leaf color, as seen in Table 2. R² measures the goodness of fit and the strength of the relationship: the MLR model fits the data moderately, whereas the GBR model fits the data substantially, based on the R² values [35]. Keskin et al., 2018 [36], explored the effect of leaf moisture content on predicting nutrition stress using chromameter color values. In that study, the leaf sample's color was highly correlated with N, calcium (Ca), and water content estimated from color data (R² = 0.66, 0.70, and 0.65, respectively) [36].
Previous researchers developed a deep neural network to identify the nutrition stress in plant canopies using spatiotemporal information. Abdalla et al., 2020 [37], proposed the long short-term memory and convolutional neural network (CNN) combined model to classify oilseed rape crops according to nutrition status. The Inceptionv3-LSTM obtained the highest overall classification accuracy of 95% when tested on the dataset of 2017/2018, and it also provided an excellent generalization when using a cross-dataset validation, with the highest overall accuracy of 92%.
Jaihuni et al., 2021 [8], explored cornfield normalized difference vegetation index (NDVI) information during the vegetative and reproductive stages, using a UAV to capture the plants' reflectance information. At the same time, the field's soil samples were examined for N, P, K, and carbon (C), and a CNN model was developed to predict the in-field NPKC spatiotemporal variations. The model performed strongly, with R² values of 0.93, 0.92, 0.98, and 0.83 in predicting N, P, K, and C levels in soil, respectively.
In the current study, the GBR model was most effective in regressing the R mean values on soil physicochemical and plant age parameters, followed by G and B. The RMSE results demonstrated small perturbations in the difference between the predicted and actual RGB mean values, and the MAE values reiterated that the tested model was stable in keeping error rates in predictions under 10%. Overall, it can be deduced from the metrics that the regression process successfully preserved a balance between variance and bias in the model. The results show that the GBR model efficiently imitated and predicted the strawberry leaf color values based on soil physicochemical and plant age parameters.
According to the results of both models, the RMSE and MAE metrics are lower for the B mean value than for the R and G mean values. Lower RMSE and MAE values imply higher accuracy of a regression model [38].
Furthermore, the novelty of the current work lies in several aspects. Previously, extensive studies were conducted to predict the RGB model relevant to soil macronutrients. In contrast, our approach examines the relationship between strawberry leaf RGB color and both soil physicochemical characteristics and plant age. The developed models were able to quantify the soil pH, EC, ST, N, P, and K optimum values for altering the leaf color. Hence, farmers can determine fertilizer demands from the leaf RGB color information, and color-altering problems deriving from fertilizer misuse can be easily monitored and controlled. This facilitates the optimum production of strawberries in the greenhouse under controlled environmental conditions. On the other hand, small datasets and manually acquired images were used to generate comparable RMSE levels in estimating strawberry RGB color related to soil chemical components, which is unlikely to nullify the risk of overfitting in the models; hence, the generalizability and reliability of such ML models need to be scrutinized further with extensive and different datasets.

Conclusions
This study developed applicable and stable machine learning models, namely multiple linear regression (MLR) and gradient boost regression (GBR), that predicted strawberry leaf color from plant age and soil physicochemical measurements, including soil pH, electrical conductivity (EC), soil temperature (ST), N, P, and K, when compared with digital images captured during the vegetative and reproductive growth phases. The GBR model performed better than the MLR model on the performance metrics. The GBR provided R² levels of 0.77, 0.72, and 0.70 for the R, G, and B mean values, respectively, whereas MLR fitted the datasets moderately, with R² levels of 0.67, 0.57, and 0.56 for the R, G, and B mean values, respectively. Plant age also affected the skewed color pattern. Furthermore, the results indirectly revealed that with an increase in plant age, the strawberry leaf R mean value increased appreciably relative to the G and B mean values, which led to higher model performance for R, followed by G and B, in both models.
As seen in the results, the MLR model was unable to make predictions when the data distribution was beyond the limit, and it focused only on the linear relationship between variables. Nevertheless, the GBR model performs better with input variables that are complex and nonlinear due to its self-adaptive nature.
Our proposed technique has outperformed the benchmark studies while adding some innovative features. This research can be expanded upon by analyzing soil nutrients and mapping them against vegetative and reproductive indicators. Adding diverse soil types and fertilizer levels to a study will also help to offer additional value to it. In addition, it is suggested that future studies also consider the seasonal changes that affect the strawberry leaf color changes.

Data Availability Statement: The datasets generated during and/or analyzed in the current study are available from the corresponding author on reasonable request.