Indoor Plant Soil-Plant Analysis Development (SPAD) Prediction Based on Multispectral Indices and Soil Electroconductivity: A Deep Learning Approach

Abstract: Leaf Soil-Plant Analysis Development (SPAD) is a crucial measure of plant health, and its prediction is essential for optimizing indoor plant management. Deep learning methods offer advanced tools for precise evaluations, but their adaptation to the heterogeneous indoor plant ecosystem presents distinct challenges. This study assesses how accurately a deep neural network (DNN) predicts leaf SPAD values of indoor plants compared to well-established machine learning techniques, including Random Forest (RF) and Extreme Gradient Boosting (XGB). The covariates for prediction were based on low-cost multispectral and soil electroconductivity (EC) sensors, enabling a non-destructive sensing approach. The study also strongly emphasized multicollinearity analysis, quantified by the Variance Inflation Factor (VIF) and two independent indices, as well as its effect on prediction accuracy using deep and machine learning methods. DNN resulted in higher accuracy than RF and XGB, and performed better using filtered data after multicollinearity analysis (R² = 0.589, RMSE = 11.68, MAE = 9.52) than using all input covariates (R² = 0.476, RMSE = 12.90, MAE = 10.94), based on the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). Overall, DNN proved to be a more accurate prediction method than the conventional machine learning approaches for the prediction of leaf SPAD values in indoor plants, despite heterogeneous plant types and input covariates.


Introduction
In an increasingly fast-paced and technologically driven world, where the majority of working hours are spent indoors, the importance of creating a work environment that is both conducive and sustainable cannot be overstated [1]. The modern workplace is more than just a physical place to perform tasks; it is now a dynamic ecosystem with the ability to dramatically affect the physical and psychological well-being of its occupants [2]. The introduction of greenery into interior spaces is not only motivated by aesthetic preferences but is also recognized as a strategic requirement in the quest to create workplaces that promote employee health and productivity [3]. More than just decorative elements, indoor plants have been shown to improve air quality, reduce stress, increase productivity, and contribute to a more desirable and productive workplace [4]. This increased awareness of the many benefits of indoor plants has created new opportunities for research and innovation in the fields of agriculture and horticulture. Precise assessment and prediction of plant health and vitality have emerged as critical variables in ensuring optimal crop development and effective resource management in the context of modern horticulture [5,6]. Measurement of chlorophyll concentration is a key indicator used to assess overall plant health due to its direct relationship with photosynthetic activity [7]. In addition, phytoremediation, the use of plants to remove toxins from indoor environments, is becoming increasingly important [8]. In such cases, accurate prediction of chlorophyll concentration helps select indoor plant species with the greatest potential to effectively clean indoor air [9]. The ability to differentiate between plants based on chlorophyll concentration can greatly improve the efficiency of phytoremediation efforts, making indoor environments healthier and more sustainable.
Chlorophyll content measurements are frequently represented by Soil-Plant Analysis Development (SPAD) values, which have become essential tools for assessing plant health [10]. Variables that affect SPAD meters include the surrounding environment, leaf age, species-specific characteristics, and nutrient availability [11]. Standardized techniques to improve reliability have been proposed through calibration and validation studies that have examined species-specific calibration curves and the effect of changing conditions on SPAD meter performance. However, many of these studies proposed calibration equations that are empirical and based on a single crop type, meaning that these are not viable for the often highly heterogeneous indoor plant types [12][13][14]. Studies contrasting SPAD values with conventional techniques highlight the relationship between these values and indices of plant health and a very high correlation with leaf chlorophyll content [15]. A thorough understanding of SPAD estimation is becoming increasingly important as indoor plant management requires optimizing development and maintaining plants efficiently. As research into the process of predicting SPAD values has progressed, new advances in deep learning approaches are providing tools for ever more accurate assessments [16,17]. However, previous research has mostly focused on outdoor plants, so applying these methods to indoor plants presents a unique set of obstacles. A wide variety of plant species and variable abiotic factors characterize the indoor environment, resulting in increased biodiversity [18]. As a result, using deep learning techniques to reliably predict SPAD values in indoor plants requires novel approaches and specialized models adapted to this distinct and dynamic environment.
Previous studies have highlighted the critical importance of integrating complementary and independent sensors in the context of automated plant health monitoring via the Internet of Things (IoT) [19]. The integration of such sensor technology offers a novel paradigm for accurate and real-time plant health monitoring in the era of smart and sustainable indoor environments [20]. Multispectral sensors collect spectral data at multiple wavelengths, enabling non-destructive inspection of plant leaves and, as a result, more accurate estimates of SPAD values [21]. Soil electroconductivity (EC) measurements, on the other hand, provide an indirect but important indication of soil nutrient levels and general environmental conditions [22]. These sensor modalities, when combined, provide an in-depth view of the plant-soil system, providing information on the health and development potential of indoor plants. Several authors used soil EC directly as an important physico-chemical-biological component of plant health [23], as well as an indirect representation of soil moisture availability for plant development in a broad scope of studies [24]. In the context of the IoT, these sensors enable automated and continuous monitoring, providing real-time input that can be used for timely intervention and efficient resource management [25]. Deep and machine learning approaches use sensor data to improve prediction accuracy, fostering the development of intelligent systems capable of maintaining thriving indoor plant ecosystems, with broader implications for occupant well-being and indoor sustainability [26].
An advanced search of studies indexed in the Web of Science Core Collection with the broad topics of "SPAD" and "deep learning", as well as "SPAD" and "indoor plants", returned 27 and 10 studies, respectively. While the existing research using deep learning relied exclusively on hyperspectral and multispectral images of outdoor arable crops [27][28][29][30], the studies focusing on indoor plants used SPAD values as auxiliary indicators of the effectiveness of LED light in an indoor environment [31,32]. Therefore, there is currently no research that provides indoor plant health monitoring quantified by SPAD values and suitable for implementation in the IoT. To address this research gap in predicting leaf SPAD values of indoor plants based on state-of-the-art deep learning, the main objective of this study was to evaluate its efficiency compared to well-known machine learning methods. Furthermore, the data collection for covariates was based on non-destructive multispectral and soil EC sensors, allowing the proposed procedure to be implemented as part of automated plant monitoring in the IoT.

Materials and Methods
The data collection in this study comprised two primary components: (1) leaf SPAD measurement of indoor plants as training and test data for prediction; and (2) leaf multispectral and soil EC sensing for modeling covariates for the prediction. The deep neural network (DNN) was proposed for leaf SPAD prediction and was evaluated alongside two well-known machine learning methods.

Indoor Plants Analyzed in the Study
A total of 52 individual indoor plants of ten species were analyzed in the study, as shown in Figure 1. Data collection was performed on 13 October 2023 in the building of the Faculty of Agrobiotechnical Sciences Osijek, which covers an area of 18,600 m². The studied indoor plants were located on the first three floors of the building, distributed on the four main sides of the building as well as in two central corridors connecting its north and south sides. All indoor plants were maintained in standardized containers, as shown in Figure 1, and were also watered, fertilized, and managed in a standardized manner throughout the building.

Sensors and Sensing Approach Used for Plant and Soil Measurement
Three non-destructive sensing approaches were used to obtain leaf SPAD values, as well as the plant and soil covariates for deep and machine learning prediction of leaf SPAD (Figure 2). Furthermore, all sensors used are low-cost solutions, which allows their widespread implementation, either as standalone systems [33] or as part of the IoT [34,35].
The SPAD values of indoor plant leaves were measured using the Konica Minolta SPAD-502 Plus handheld chlorophyll sensor (Tokyo, Japan). SPAD values were used to represent relative leaf chlorophyll content based on absorbance measurement, achieving a very high correlation with leaf nitrogen concentration [15]. The SPAD value per plant was calculated as the average of six measurements at evenly spaced points, chosen to account for the variation in leaf color and condition in proportion to the overall plant condition.
The covariates for predicting SPAD values for indoor plants were collected using two different approaches based on plant and soil sensing: (1) multispectral sensing of plant leaves and (2) soil EC measurement (Figure 3). Multispectral sensing of the indoor plant canopy was performed using the Plant-O-Meter, a handheld device that operates in five wavelengths (blue, green, red, red-edge, and near-infrared bands) [36]. All measurements were collected and exported using the Plant-O-Meter Android application connected to the handheld device via Bluetooth. Because Plant-O-Meter operates as an active sensor, accurate canopy sensing was maintained regardless of lighting conditions. A total of 21 vegetation indices were calculated based on the Plant-O-Meter measurements, as listed in Table 1. These indices quantified a wide range of vegetation properties, providing a basis for implementing deep and machine learning methods in a variety of agricultural studies [37]. Because the indoor plant species evaluated in this study differed by canopy system, measurements were made by aiming the sensor so that the sensed plane included the maximum leaf coverage during sampling. Soil EC was sampled using the Hanna Instruments HI 98331 Handheld Soil EC Sensor (Nusfalau, Romania) at 5 cm, 10 cm, and 15 cm soil depths within a 10 cm radius of plant stems, allowing for non-destructive measurement toward root systems. This resulted in a total of 52 input samples and 25 covariates evaluated in the study.
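To illustrate how vegetation indices are derived from band readings, the following minimal Python sketch computes four indices using their commonly cited definitions: NDVI, NDRE, RGR, and RDVI. The reflectance values, function names, and dictionary layout are assumptions made for this example. The exact formulations used in the study are those listed in Table 1, and may differ in detail.

```python
import math

def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - R) / (NIR + R)."""
    return (nir - red) / (nir + red)

def ndre(nir, red_edge):
    """Normalized difference red-edge index: (NIR - RE) / (NIR + RE)."""
    return (nir - red_edge) / (nir + red_edge)

def rgr(red, green):
    """Red-green ratio: R / G."""
    return red / green

def rdvi(nir, red):
    """Renormalized difference vegetation index: (NIR - R) / sqrt(NIR + R)."""
    return (nir - red) / math.sqrt(nir + red)

# Hypothetical reflectance readings for one plant (not measured data).
bands = {"blue": 0.04, "green": 0.10, "red": 0.05, "red_edge": 0.30, "nir": 0.45}

indices = {
    "NDVI": ndvi(bands["nir"], bands["red"]),
    "NDRE": ndre(bands["nir"], bands["red_edge"]),
    "RGR": rgr(bands["red"], bands["green"]),
    "RDVI": rdvi(bands["nir"], bands["red"]),
}

for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Indices that share the red and near-infrared bands, such as NDVI and RDVI here, are functions of the same two inputs, which is one reason they tend to be strongly correlated with each other.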
Variance Inflation Factor (VIF) analysis was used to evaluate all multispectral and soil EC covariates for multicollinearity. The VIF was used as the primary metric to measure the degree of multicollinearity among predictor variables in a regression model. The VIF values generated for each covariate indicated the extent to which they were related, with values greater than 10 strongly indicating multicollinearity [54]. In addition, two multicollinearity indices (IND1 and IND2) proposed by Ullah et al. [55] were used independently to assess multicollinearity. IND1 values greater than 0 and IND2 values less than 1 indicated less multicollinearity.
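The VIF screening described above can be sketched as follows. For covariate j, VIF_j = 1 / (1 − R_j²), where R_j² is obtained by regressing covariate j on all other covariates; equivalently, VIF_j is the j-th diagonal element of the inverse of the covariate correlation matrix, which is the form used here. This is a minimal standard-library illustration, not the procedure used in the study: the covariate names and the synthetic data are hypothetical, and the IND1 and IND2 indices are not reproduced.

```python
import random
import statistics

def pearson(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Return {name: VIF} from the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}

# Synthetic data: two nearly identical "indices" and one independent covariate.
rng = random.Random(0)
index_a = [rng.gauss(0, 1) for _ in range(52)]
index_b = [v + rng.gauss(0, 0.1) for v in index_a]  # almost collinear with index_a
soil_ec = [rng.gauss(0, 1) for _ in range(52)]

vif_values = vif({"index_a": index_a, "index_b": index_b, "soil_ec": soil_ec})
filtered = [name for name, value in vif_values.items() if value < 10]
print(vif_values)
print("retained covariates:", filtered)
```

With the threshold of 10 used in the study, the two nearly collinear columns are flagged while the independent one is retained. The correlation matrix becomes singular when covariates are exactly collinear, in which case the inversion in this sketch fails and a regression-based computation would be needed.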

Deep and Machine Learning Prediction and Accuracy Assessment
The DNN was built using the Keras library in R v4.0.3, with each layer added sequentially to facilitate the flow of data from input to output, as shown in Figure 4. The data were normalized before entering the network using the -1L axis normalization, which indicates normalization over the last axis. The normalizer was then fitted to the training features by applying the fit function to the training data, which was represented as a matrix. The network consists of several dense (fully connected) layers, each with its own set of features. The first layer consisted of 32 units and used the Rectified Linear Unit (ReLU) activation function, which is a popular choice for hidden layers [56,57]. In addition, to avoid overfitting, L1 regularization with a regularization strength of 0.001 was applied. To add further regularization by randomly deactivating a percentage of input units during training, a dropout layer with a dropout rate of 0.1 was used. This was followed by another 32-unit dense layer with ReLU activation, followed by a pair of 16-unit layers, the first with L1 regularization and the second without explicit regularization. Finally, the output layer was a single-unit dense layer, since the network was designed for a regression problem. The overall design incorporates several features aimed at maximizing predictive performance and controlling model complexity for the problem domain addressed in this study.
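The layer sequence described above can be sketched as follows. This is a dependency-free Python illustration of the forward pass, not the Keras code used in the study (which was written in R), and it does not perform training. Several details are assumptions because the text does not state them: four input features (matching the filtered covariates), ReLU activation on the 16-unit layers, no L1 penalty on the second 32-unit layer, dropout placed directly after the first hidden layer, and Glorot-uniform weight initialization.

```python
import random

# (units, activation, l1_strength) for each dense layer, in order.
LAYERS = [
    (32, "relu", 0.001),  # first hidden layer with L1 regularization
    (32, "relu", 0.0),    # second hidden layer (absence of L1 is assumed)
    (16, "relu", 0.001),  # L1 as described; activation assumed
    (16, "relu", 0.0),    # no explicit regularization; activation assumed
    (1, "linear", 0.0),   # single-unit output for regression
]
DROPOUT_RATE = 0.1  # applied after the first hidden layer (placement assumed)

def init_params(n_inputs, rng):
    """Create (weights, biases) per layer with Glorot-uniform weights."""
    params, fan_in = [], n_inputs
    for units, _, _ in LAYERS:
        limit = (6.0 / (fan_in + units)) ** 0.5
        weights = [[rng.uniform(-limit, limit) for _ in range(units)]
                   for _ in range(fan_in)]
        params.append((weights, [0.0] * units))
        fan_in = units
    return params

def dense(x, weights, biases, activation):
    """Fully connected layer: activation(x . W + b)."""
    out = [b + sum(xi * row[j] for xi, row in zip(x, weights))
           for j, b in enumerate(biases)]
    return [max(0.0, v) for v in out] if activation == "relu" else out

def forward(x, params, rng=None):
    """Predict for one normalized feature vector; pass rng to enable dropout."""
    for i, ((weights, biases), (_, activation, _)) in enumerate(zip(params, LAYERS)):
        x = dense(x, weights, biases, activation)
        if i == 0 and rng is not None:  # training-mode inverted dropout
            keep = 1.0 - DROPOUT_RATE
            x = [v / keep if rng.random() < keep else 0.0 for v in x]
    return x[0]

def l1_penalty(params):
    """Sum of strength * |w| over the layers that carry an L1 term."""
    return sum(strength * sum(abs(w) for row in weights for w in row)
               for (weights, _), (_, _, strength) in zip(params, LAYERS))

def count_params(params):
    """Number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)

rng = random.Random(0)
params = init_params(n_inputs=4, rng=rng)
n_params = count_params(params)
prediction = forward([0.2, -1.0, 0.5, 1.3], params)
penalty = l1_penalty(params)
print(n_params, prediction, penalty)
```

Under these assumptions the network has 2033 trainable parameters, which is large relative to 52 samples and helps explain why the architecture combines L1 penalties, dropout, and cross-validated evaluation.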
Two machine learning methods, Random Forest (RF) and Extreme Gradient Boosting (XGB), were evaluated alongside DNN. RF and XGB achieved superior prediction accuracy in regression problems compared to other machine learning algorithms in similar studies on various aspects of horticulture [58,59] and agriculture in general [60][61][62]. As an ensemble learning technique, RF builds a forest of decision trees, each trained separately on randomly selected samples of the data and features [63]. Averaging the predictions from these trees reduces overfitting, improves model robustness, and captures complex interactions between the covariates and the target variable. In contrast, the XGB gradient boosting technique uses an iterative process based on decision trees [64]. Starting with a simple model, it builds decision trees in stages, each of which aims to minimize the errors produced by the previous iterations. This process is guided by the calculation of residuals. Through this repeated learning process, XGB can adapt to complicated data relationships and continuously improve predictions. Together, these two techniques represent both the ensemble approach of RF and the iterative improvement of XGB for the estimation of leaf SPAD values. The tuned hyperparameter for RF was the number of variables randomly sampled at each split (mtry). For XGB, the tuned hyperparameters were the learning rate (eta), which scales the contribution of each tree to the overall ensemble, the number of boosting rounds (nrounds), and the alpha and lambda terms, which controlled the L1 and L2 regularization on the leaf weights, respectively. The hyperparameter tuning was performed using a built-in automated approach in the caret library.
DNN, RF, and XGB were evaluated in two approaches based on the input data, considering all multispectral and soil covariates (all input data) and only covariates for which multicollinearity was not detected (filtered input data). Three statistical measures were used to assess the accuracy and reliability of the prediction models using 10-fold cross-validation. The primary metric used was the coefficient of determination (R²), which quantifies the proportion of variation in measured SPAD explained by the model. A higher R² value indicates a better fit of the model to the data, indicating its ability to accurately capture the underlying relationships. In addition, the root mean square error (RMSE) and mean absolute error (MAE) were used to assess the prediction performance of the model. The RMSE quantifies the average size of prediction errors while weighting large errors more heavily, providing a measure of how well the model's predictions match the observed data. The MAE expresses the average size of the absolute errors and is less sensitive to individual large errors, providing a more robust insight into the accuracy of the model. The combined use of R², RMSE, and MAE provided a comprehensive evaluation of leaf SPAD prediction by DNN, RF, and XGB.
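The three accuracy metrics and the fold assignment for 10-fold cross-validation can be sketched in a few lines of Python. The function names, the observed and predicted values, and the shuffling seed are hypothetical, and the sketch assumes the standard definitions of R², RMSE, and MAE; whether the study averaged the metrics per fold or pooled the predictions is not stated in the text.

```python
import math
import random

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

# Hypothetical SPAD values for four plants (not data from the study).
observed = [30.0, 42.5, 55.0, 61.0]
predicted = [33.0, 40.5, 50.0, 64.0]
scores = {
    "R2": r_squared(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
}
print(scores)

# Each of the 52 samples appears in exactly one test fold; the remaining
# folds form the training set for that iteration.
folds = k_fold_indices(52, k=10)
for test_idx in folds:
    train_idx = [i for fold in folds if fold is not test_idx for i in fold]
    assert len(train_idx) + len(test_idx) == 52
```

With 52 samples, each test fold contains only five or six plants, so per-fold metrics can vary considerably between folds.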

Results and Discussion
The Pearson's correlation coefficients shown in the correlation plot (Figure 5) indicate two main results: (1) the indoor plant type and soil EC covariates generally produced low correlations compared to the Plant-O-Meter vegetation indices, and (2) the individual vegetation indices tended to produce high and very high absolute Pearson's correlation coefficients when evaluated against each other. The exception to the latter was to some extent present for DGCI, RGR, RDVI, NDRE, and EVI. DGCI is the most distinct vegetation index among those evaluated because it uses band values transformed to the hue-saturation-brightness model [65], resulting in Pearson's correlation coefficients with other vegetation indices ranging from −0.73 to 0.48. Although the RGR calculation is based on a simple red-green ratio, it produced weak to moderate correlations with other vegetation indices. This was probably due to the specific selection of Plant-O-Meter vegetation indices, which focused dominantly on the use of red and near-infrared bands [36], with RGR being the only evaluated index using only red and green bands. However, band selection does not explain the relatively low positive correlation of RDVI with almost all other vegetation indices. Similar to RGR, NDRE resulted in low to moderate correlations with other vegetation indices, ranging from −0.49 to 0.63, because it was the only vegetation index evaluated that used the red-edge band. Aside from its specificity in band selection, NDRE likely provided different results from the majority of indices based on red and near-infrared bands (especially NDVIr) due to its resistance to the saturation effect in cases of high biomass [66]. Based on previous studies, EVI was expected to provide different results from vegetation indices using only red and near-infrared bands and likely provided a more robust assessment of plant health due to the inclusion of the blue band [67].
The multicollinearity indices used in the study, VIF, IND1, and IND2, partially agreed with the results of the correlation analysis regarding the covariate correlation (Table 2). With the primary criterion of VIF values less than 10 [54], only four out of a total of 25 covariates indicated an absence of multicollinearity, including plant type, soil EC at 5 cm and 15 cm soil depth, and NDRE. While the results of plant type and soil EC measurements strongly indicated their independence relative to all input covariates, NDRE was the only vegetation index that resulted in the absence of multicollinearity, probably due to the previously mentioned ability of the red-edge band to provide resistance to saturation in cases of higher biomass [67]. The multicollinearity observations of VIF were confirmed by the IND1 and IND2 indices, providing an independent check of the multicollinearity analysis [55]. The final data filtered after the multicollinearity analysis consisted of four covariates, including plant type, soil EC at 5 cm and 15 cm soil depth, along with NDRE as a single vegetation index.
Across all evaluation measures, the DNN model outperformed RF and XGB in terms of prediction accuracy with the primary criterion of higher R² (Table 3), while also achieving highly consistent prediction accuracy regardless of training and test data folds during cross-validation (Figure 6). For RF, the optimal hyperparameter was the same for both input datasets (mtry = 2), while XGB tuning selected a stronger regularization for all input data (eta = 0.3, nrounds = 50, lambda = 0.1, alpha = 0.0001) than for filtered input data (eta = 0.3, nrounds = 100, lambda = 0, alpha = 0). Notably, filtering the input data after multicollinearity analysis improved the prediction accuracy of DNN, an effect that has so far usually been evaluated only for machine learning methods [68]. The two evaluated machine learning methods were less sensitive to input covariate selection, with XGB producing slightly higher prediction accuracy using filtered input data. These results underscore the need for multicollinearity analysis in similar studies that consider numerous input covariates with DNN, supporting and expanding on the observations of McCaw et al. [69], despite mixed results regarding relative accuracy quantified by R² and absolute prediction accuracy represented by RMSE and MAE. Meanwhile, the machine learning prediction results confirm the observations of previous studies, which showed that RF can effectively handle multicollinearity, providing robust predictions despite a high correlation between input features [70], while XGB may be more prone to the effects of multicollinearity, given its tendency to construct deeper and more intricate trees [71].
The results of 10-fold cross-validation also strongly indicated its superiority over split-sample accuracy assessment, since the accuracy varied greatly among folds for RF and XGB, and a single split could otherwise imply inconsistent and unreliable prediction performance in heterogeneous input datasets [72]. In contrast to the more common image-based deep learning studies, Graditi et al. [73] noted that regression problems do not require a large amount of input data for accurate prediction, which was confirmed in studies based on both deep [74] and conventional machine learning [75]. To ensure resistance to overfitting during predictions based on smaller datasets, Hosseini et al. [76] strongly recommended implementing k-fold cross-validation, as was performed in this study, instead of the simpler and more frequent split-sample approach. However, a study by Gilbertson and van Niekerk showed that, while smaller datasets can reliably produce moderately high prediction accuracy, the addition of a larger amount of training samples would likely lead to higher prediction accuracy [77].
Although SPAD values are widely used, there are several limitations when it comes to representing the chlorophyll content of indoor plant leaves. One notable limitation is the potential effect of environmental variables on SPAD measurements, since chlorophyll fluorescence can be affected by changes in temperature, humidity, and light intensity, which can introduce variability into SPAD readings [78]. In addition, there are difficulties in calibrating SPAD meters because the best calibration curves vary with species and with the highly heterogeneous indoor growing environments. Moreover, because SPAD values cannot distinguish between chlorophyll a and chlorophyll b, they can only be used to estimate the total amount of chlorophyll present [15]. The possible effect of nutrient supplementation on SPAD readings should also be considered, as indoor plant habitats are often subject to regulated conditions, and excessive or deficient nutrient levels may result in a false representation of chlorophyll status. Given the considerable heterogeneity of the input data, with ten species represented by 52 samples, the proposed approach can be expected to be robust with similar indoor plant datasets. Moreover, an increased sample count with a similar number of plant species will likely result in increased prediction accuracy in all evaluated instances [77].
Vegetation indices are widely recognized as sensitive indicators of plant health due to their ability to capture various aspects of vegetation dynamics [37,79]. Thus, it is important to account for the effect of biomass when utilizing vegetation indices for plant health assessments [80]. However, the influence of biomass on these indices was not explicitly considered in this research. Because biomass affects spectral reflectance properties [81], omitting biomass-related factors may have limited the predictive accuracy of the model. To overcome this limitation, further calibration and refinement of the proposed approach are possible. Integration of this approach with deep learning-based plant identification algorithms also presents an avenue for future enhancement [82]. By utilizing the potential of vegetation indices and advanced plant species recognition technology based on deep learning, a more comprehensive and consistent model for predicting plant health can be developed. This integration may provide a more thorough understanding of vegetation status, including species-specific intricacies and overall biomass-related dynamics, consequently enhancing the robustness and dependability of plant health predictions.

Conclusions
The process of plant health assessment has changed significantly since deep learning was introduced to the field. However, there are particular difficulties in applying these methods to complex and heterogeneous indoor plant ecosystems. This research addressed these issues and highlighted the effectiveness of DNN in predicting leaf SPAD of indoor plants based on a non-destructive approach.
Indoor plant types and soil EC showed lower correlations compared to the vegetation indices obtained from the Plant-O-Meter. The evaluation of individual vegetation indices against each other resulted in high and very high absolute Pearson's correlation coefficients, except for DGCI, RGR, RDVI, NDRE, and EVI. Partial concurrence between multicollinearity analysis using VIF, IND1, and IND2 with the correlation results was observed. Only four out of the 25 covariates, which were plant type, soil EC at 5 cm and 15 cm soil depth, and NDRE, showed an absence of multicollinearity as indicated by VIF values. The only vegetation index without detected multicollinearity was NDRE, likely due to the resistance of its red-edge band to saturation effects in high-biomass scenarios. The fact that VIF, IND1, and IND2 all show consistent results provides solid evidence of the accuracy of the analysis regarding multicollinearity.
The DNN model outperformed both RF and XGB in terms of predictive accuracy across several evaluation measures. Additionally, it showed superior consistency in its prediction accuracy across various training and testing data folds during cross-validation. Notably, this robustness suggests a dependable predictive performance in diverse and heterogeneous input datasets. Filtering the input data based on multicollinearity analysis improved the prediction accuracy of the DNN model, which highlights the importance of accounting for multicollinearity in similar studies. On the other hand, RF demonstrated the ability to handle multicollinearity effectively, providing robust predictions despite high correlations between input features, while XGB produced moderately high accuracy but was more susceptible to multicollinearity because it constructed deeper and more intricate trees.
Furthermore, by incorporating these technologies into the IoT framework, it is possible to automatically monitor plant health in real time, which promises to create healthier indoor environments. The results of this study underscore the importance of considering multicollinearity when selecting variables for DNN and emphasize the need for accuracy in data collection.

Figure 1. The display of ten indoor plant species analyzed in the study, maintained in standardized containers at the Faculty of Agrobiotechnical Sciences Osijek.

Figure 3. The display of the sensing process for: (a) six evenly distributed and representative points per plant using the chlorophyll sensor, (b) soil EC measurement, (c,d) multispectral sensing using Plant-O-Meter, based on plant canopy system.

Table 1. Vegetation indices collected using Plant-O-Meter as covariates for the prediction of leaf SPAD values.

Horticulturae 2023, 8 , 16
Figure 4. The architecture of the proposed DNN for the prediction of leaf SPAD values of indoor plants.

Figure 5. The correlation plot of all input covariates evaluated for the prediction of leaf SPAD of indoor plants, with values representing Pearson's correlation coefficient.

Figure 6. A display of the variability of accuracy assessment metrics across 10 folds during cross-validation.

Table 2. The results of multicollinearity analysis for all 25 input covariates evaluated in the study.

Table 3. Prediction accuracy of evaluated deep and machine learning methods in leaf SPAD prediction of indoor plants. The most accurate prediction metrics are bolded.