Machine Learning in the Analysis of Multispectral Reads in Maize Canopies Responding to Increased Temperatures and Water Deficit

Real-time monitoring of crop responses to environmental deviations represents a new avenue for applications of remote and proximal sensing. Combining the high-throughput devices with novel machine learning (ML) approaches shows promise in the monitoring of agricultural production. The 3 × 2 multispectral arrays with responses at 610 and 680 nm (red), 730 and 760 nm (red-edge) and 810 and 860 nm (infrared) spectra were used to assess the occurrence of leaf rolling (LR) in 545 experimental maize plots measured four times for calibration dataset (n = 2180) and 145 plots measured once for external validation. Multispectral reads were used to calculate 15 simple normalized vegetation indices. Four ML algorithms were assessed: single and multilayer perceptron (SLP and MLP), convolutional neural network (CNN) and support vector machines (SVM) in three validation procedures, which were stratified cross-validation, random subset validation and validation with external dataset. Leaf rolling occurrence caused visible changes in spectral responses and calculated vegetation indexes. All algorithms showed good performance metrics in stratified cross-validation (accuracy >80%). SLP was the least efficient in predictions with external datasets, while MLP, CNN and SVM showed comparable performance. Combining ML with multispectral sensing shows promise in transition towards agriculture based on data-driven decisions especially considering the novel Internet of Things (IoT) avenues.


Introduction
Human population growth has led to increasing food requirements and resource depletion, intensifying the use of modern technologies in agriculture over the last few decades. Thus, major achievements in sensing technologies, wireless communication and artificial intelligence have been made by research efforts in agriculture globally [1,2]. In the context of climate change, real-time monitoring of drought is of primary interest which yielded a forked remote sensing approach, deploying satellite imagery to spot drought occurrence [3][4][5][6][7], or utilizing the recent developments in affordable sensor solutions, generating data in a more (unmanned aerial vehicles-UAV) or less (pole, machine or tower mounted) remote manner [8][9][10][11][12]. In conventional agricultural production, the only objective data collected on plant side (grain yield) are collected when the plant is already dead, so the real-time monitoring practices represent a paradigmatic shift for most farmers around the world.
Studies show that there is a growing occurrence of heatwave days accompanied by longer spans of drought [13]. These deleterious climate changes come with increased sensitivity of maize and soybean to heat and drought, despite ever strong breeding efforts for tolerance to abiotic stress [14]. In maize, there is a large number of morpho-physiological 2 of 17 adjustments in response to drought stress [15][16][17]. However, many of these changes are irreversible, such as low fertilization rate, increased susceptibility to diseases or depleted stands [18], and once they are expressed, the economic losses are unavoidable. There are also some changes that are different among maize cultivars [19] and they are reversible in nature providing an excellent signal of the current plant water status.
One such trait is transverse rolling of leaf blades ( Figure S1), usually caused by hydronastic changes. Many plant species use this mechanism as a drought avoidance strategy [20]. It represents a mechanism of adaptation in plants to control stress mostly by the means of auto-stress (cell-wall tension increase/decrease), reduction in the light interception and transpiration, thus preventing dehydration and overheating [21]. Not all genotypes express leaf rolling at the same conditions [22,23], nor the monotonic increase in leaf curvature shows the same maximum over different genotypes [24]. While in some genotypes it represents an avoidance strategy for stress conditions, in others it marks a tipping point for the physiological damage [25][26][27], thus implying the need to dynamically monitor for this trait in real time, for breeding and management purposes. So far, the use of different methods of leaf rolling quantification was reported in the literature, most of them grading responses in scales from 1 (no leaf rolling) to 5 (completely rolled leaves, dead or lax) [28][29][30], or 1 (no leaf rolling) to 9 (completely rolled leaf blades) [31]. All of the mentioned methods imply phenotyping at (i) drought stress treatment, or (ii) at the time of solar noon, when the strongest leaf rolling is expected to occur [24]. However, all of the conventional methods also imply a need for human screening limiting the ability to capture trait dynamics, accompanied with the increased error in rolling grading between scorers. In order to use phenotypic indicators, such as leaf rolling in some unfavorable conditions, in decision support systems, the possibility for their remote monitoring is of critical importance.
Leaf rolling causes several easily detectable changes in plant level. One such change is reduction in leaf area index (LAI), rendering the changes easily detectable by simple hemispherical photography [31]. However, there are also more subtle changes in spectral derivatives of leaf-rolled plants exposed to drought treatment, mostly caused by the accumulation/translocation of biochemicals and decrease in photosynthetic activity [32], allowing the drought detection using hyperspectral data along with several derived normalized vegetation indexes such as normalized difference vegetation index (NDVI) [33]. Normalized difference vegetation index and other normalized vegetation indices (VI) represent a useful and sensitive tool in vegetation monitoring converting the raw sensor reads to useful normalized and repeatable results [34,35]. Hyper/multispectral monitoring is already a proven method of vegetation monitoring [36,37] coming more and more to focus of researchers addressing high-throughput phenotyping and precision agriculture [38][39][40]. Moreover, different types of stress can be detected by combining ML with spectral monitoring [41][42][43]. There are many approaches for applied regression analysis of the remote sensing data being used [44,45]; however, such models are inefficient when accounting for nonlinearity of targets in multi-dimensional hyperplanes. On the other hand, these data properties are efficiently handled by modern machine learning (ML) algorithms, extracting numerical features from the data while retaining the information from the original dataset. Moreover, there is a growing body of evidence of the superiority in performance of ML algorithms in remote sensing data analysis for various agricultural applications such as vegetation classification [46], biomass and soil moisture analysis [47], crop stress phenotyping [38], precision farming [48,49] and many others [50]. Furthermore, these methods also have an ecological and humanitarian depth showing promise in helping to adhere to the Sustainable Development Goals [51] presented by the United Nations [52].
In this research, we attempted to use a low-cost sensor capturing spectral responses around several critical plant reflectance wavelengths and to apply machine learning to detect changes in plant morphology for the envisioned use in precision agriculture and plant breeding. Specifically, objectives were to determine the usefulness of a simple multispectral sensor to monitor leaf rolling in maize as a sign of stress, and to assess different machine learning models readily applied in the classification of labeled data.

Field Experiments
The field experiments were carried out at the experimental station of the Agricultural Institute Osijek (AIO) in Osijek Croatia (45 • 32 N 18 • 44 E). Fields are subject to barleysoybean-maize rotation and are, following the soil analysis, fertilized and maintained with non-limiting amounts of fertilizers following respective local best practices and regulations. For the purpose of development of new cultivars AIO organizes several levels of field trials with multiple cultivars, separated between early, early to medium, medium to late and late maturity breeding programs. Maturity was categorized by FAO system [53] with reference genotypes used as checks to categorize hybrids to groups 1 (early) to 7 (late). Besides breeding programs, AIO organizes demonstrational trials with 3 to 75 hybrids representing the pallet of latest breeding efforts. AIO is a certified seed producer, marketing maize hybrids in Southeast and Central Europe and the Middle East. Thus, there is a considerable diversity of hybrids present at breeding trials of every maturity group, aiming at adaptation to different agro-ecological scenarios. For the purpose of this study, we chose seven trials with different numbers of hybrids from the mentioned maturity groups, namely, single irrigated demonstrational trial (DTir) and two rainfed demonstrational trials (DTrf and SDTrf). Soil type in DTrf and DTir was anthropogenized eutric cambisol, while at SDTrf, soil was sandy loam. In the gradient of trial qualities, SDTrf thus represented a low-water availability trial, while DTir was not water limited during the screening. The DTir trial was irrigated twice with 40 mm/m 2 per irrigation, on 20th of June and 2nd of July. Irrigation was carried out by gear-driven full circle sprinklers. Trials of early (ET), early to medium (EMT), medium to late (MLT) and late maturity (LT) breeding programs were represented by randomized complete block trials with 25 hybrids in four replicates on anthropogenized eutric cambisol (ET, MLT and LT) and sandy loam (EMT). All trials were sown in a northsouth orientation except SDTrf and EMT which were sown in an east-west orientation. Details on experimental design are shown in Table 1.

Measurements and Agroecological Conditions
Measurements were carried out with AMS (ams-OSRAM AG, Austria) AS7263 sensor unit with six spectral bands (3 × 2 photo diode array) responsive to wavelengths in red and near-infrared spectra (610, 680, 730, 760, 810 and 860 nm) with 20 nm full width at half maximum. Sensor consists of plastic housing, a lens and photodiode array with aperture of 0.75 mm and 20.5 • viewing angle. The sensor was connected to Arduino Uno prototyping board and the data were logged based on a programmed button-interrupt to SD card. Each interrupt consisted of 10 consecutive measurements within 2000 ms and their average was logged with timestamp. The wiring was mounted to a 3D printer printed mount and set up on a 2.2 m telescopic tripod, and the 10 Ah power bank was used to power the device (Figure 1c). The sensor was set 2 m from ground at 90 • to capture leaves in 0.6 m 2 of theoretical field width (to ground). The 2 m height was chosen as only 16 out of 545 plots showed height lower than 2 m (shortest hybrid in SDTrf was 1.77 m), so the sensor captured the leaves intersecting the field of view. Measurements were carried out during the morning and around solar noon (Figure 1a), when the weakest and the strongest leaf rolling is expected to occur (Table 1) [31]. The tripod with mounted sensor wiring was carried between the plots and set 1.5 m within rows for measurement ( Figure 1b). The exact data for plant height were collected for three experiments DTir, DTrf and SDTrf with 3 m long ruler, and no connection was observed between multispectral reads and plant height (correlations < 0.3). The mean height was 214.9 cm, with standard deviation of 6.05 cm. average was logged with timestamp. The wiring was mounted to a 3D printer printed mount and set up on a 2.2 m telescopic tripod, and the 10 Ah power bank was used to power the device (Figure 1c). The sensor was set 2 m from ground at 90° to capture leaves in 0.6 m 2 of theoretical field width (to ground). The 2 m height was chosen as only 16 out of 545 plots showed height lower than 2 m (shortest hybrid in SDTrf was 1.77 m), so the sensor captured the leaves intersecting the field of view. Measurements were carried out during the morning and around solar noon (Figure 1a), when the weakest and the strongest leaf rolling is expected to occur (Table 1) [31]. The tripod with mounted sensor wiring was carried between the plots and set 1.5 m within rows for measurement ( Figure 1b). The exact data for plant height were collected for three experiments DTir, DTrf and SDTrf with 3 m long ruler, and no connection was observed between multispectral reads and plant height (correlations < 0.3). The mean height was 214.9 cm, with standard deviation of 6.05 cm. All 545 plots (Table 1) were assessed for four times with a sensor and labeled by a maize breeder for leaf rolling, two times during the morning and two times in the solar noon ( Figure 1a) on sunny days, yielding 2180 labeled measurements and means of sensor reads. Additionally, DTir, DTrf and SDRtf were assessed on 22nd of July, to obtain an external validation set for testing the robustness of the modelling approach.
Timing of measurements ( Figure 1a) was chosen to capture the window of highest maize susceptibility to drought [54], heat [55] and the combination of these two stressors. This window covers growth stages from floral transition to early grain filling. During the experiments, different hybrids transitioned between developmental stages; however, the All 545 plots (Table 1) were assessed for four times with a sensor and labeled by a maize breeder for leaf rolling, two times during the morning and two times in the solar noon ( Figure 1a) on sunny days, yielding 2180 labeled measurements and means of sensor reads. Additionally, DTir, DTrf and SDRtf were assessed on 22nd of July, to obtain an external validation set for testing the robustness of the modelling approach.
Timing of measurements ( Figure 1a) was chosen to capture the window of highest maize susceptibility to drought [54], heat [55] and the combination of these two stressors. This window covers growth stages from floral transition to early grain filling. During the experiments, different hybrids transitioned between developmental stages; however, the aim of this study was not to analyze genotypic responses but rather the ability of a simple multispectral sensor to capture leaf rolling occurrence. Initial grades of leaf rolling (samples available as Supplementary Figure S1) were taken following methodology described in Bolaños and Edmeades [30] on scale 1 (green erect leaf blades) to 5 (rolled, lax or dead). However, the experimental design limited appropriate account for all factors affecting the rolling occurrence. Furthermore, as the leaf morphology affects the rolling maximum [24], the lower grades (higher than 1) also indicate leaf rolling, whose occurrence Remote Sens. 2022, 14, 2596 5 of 17 was of primary interest of this study. Reads were thus binary labeled with 0 for no leaf rolling (below 15% of the plants within the plot showing leaf rolling) and 1 (more than 15% of the plants within the plot showing leaf rolling). The 15% threshold represents the tolerable amount of secondary plants that are usually more susceptible to leaf rolling. The images of leaf-rolled plants in the field are available as Figure S1.

Data Analysis and Model Assesment
Six raw sensor reads were used to calculate 15 unique, simple vegetation indexes (VI). The indexes were calculated as absolute values of the quotient between differences of the subtracted and added values of each two pairs of wavelengths (wl a... f ): The raw sensor reads and VIs were scaled, centered and log-transformed and the principal component analysis (PCA) was carried out in R [56]. The raw sensor reads and the VI values (21 original features) were tested for differences between LR+ and LR− by the means of a Welch two-sample t-test. Prior to tests, the data were visually assessed for normality of distribution densities. The raw data and VIs were read into Python environment and four machine learning models were constructed. First model was the single layer perceptron (SLP) with single fully connected layer with 128 nodes with rectified linear unit (ReLu) activation function. The dense neural network layer was flattened and passed through softmax function to obtain predictions. For multilayer perceptron (MLP), another hidden layer was added with 32 fully connected nodes prior to flattening. Convolutional neural network (CNN) was setup with single 1D convolution layer of length 64, followed by two hidden fully connected layers with 48 and 24 nodes before flattening and passing to the softmax function ( Figure 2). These three models were setup in TensorFlow library with training in 100 epochs with batch size of 16. Additionally, a support vector machine (SVM) model was built with scikit-learn module svm with linear kernel and penalty of the error term (C) set to 1. Three procedures of model validation were followed using calibration (n = 2180) and external (n = 145) labeled datasets: 1.
Validation based on a 15% random subset (327) of the 2180 records with random seed number 109; 3.
Validation with an external validation set consisting of 145 separate records.
To ensure reproducibility of the results, the same subsets were used to validate each model.
The model performance was assessed by model accuracy and its standard deviation across folds in cross-validation, and by measuring accuracy, precision and recall in randomsubset validation and validation with external dataset. The breeder's classifications at solar noon (explained above) to LR− and LR+ were used as ground truth in model evaluation metrics. The performance indicators were calculated as: where TN is number of true negatives, TP is number of true positives, FP is a number of false positives, FN is number of false negatives and n is a number of relevant samples. Furthermore, F1 score was calculated as The CPU time was assessed on an Intel ® i7 9750H 6-core, 12-thread processor with 12 MB internal cache memory. Full notebook with Python code is available from the corresponding author upon request. The model performance was assessed by model accuracy and its standard deviation across folds in cross-validation, and by measuring accuracy, precision and recall in random-subset validation and validation with external dataset. The breeder's classifications at solar noon (explained above) to LR-and LR+ were used as ground truth in model evaluation metrics. The performance indicators were calculated as: where TN is number of true negatives, TP is number of true positives, FP is a number of false positives, FN is number of false negatives and n is a number of relevant samples. Furthermore, F1 score was calculated as The CPU time was assessed on an Intel ® i7 9750H 6-core, 12-thread processor with 12 MB internal cache memory. Full notebook with Python code is available from the corresponding author upon request.

Changes in Multispectral Sensor Reads in Leaf Rolling Conditions
All measurements in the calibration set were carried out during the extremely dry conditions ( Figure S2) and high temperature and VPD (Figure 1a). Conditions changed to very wet when the external validation set was assessed, leading to a low number (7) of

Changes in Multispectral Sensor Reads in Leaf Rolling Conditions
All measurements in the calibration set were carried out during the extremely dry conditions ( Figure S2) and high temperature and VPD (Figure 1a). Conditions changed to very wet when the external validation set was assessed, leading to a low number (7) Table 2). High standard deviations of raw wavelengths indicate the changes in light quality. However, the deviations decreased in VI values reflecting the normalization of the data. Interestingly, all VI values decreased in the LR+ in both datasets except VI680610 which increased slightly in LR+. According to the twosample t-test, all differences between LR− and LR+ were significant in the calibration set in both original features and Vis. In the external set, significant differences were observed only in reads at 610 and 680 nm among original features. In VIs, a lack of significant differences was observed among all indexes with reads at 730 nm as denominator, and 860 as numerator.
Principal component analysis (Figure 3) showed diverse and substantial correlations between original variables and their projections (PCs), seen as red arrows, e.g., eigenvectors. Full list of loading weights is available online as Table S1. First three principal components explained 74.0% of total variability in the dataset, separate PCs explaining 43.2, 19.0 and 11.8%, respectively. Principal component analysis confirmed patterns from Table 2 (Table S1) and ellipses represent 95% confidence intervals for each group based on Gaussian distribution.
Calculation of different normalized vegetation indices represents a convenient mean of auto-normalization of the raw sensor reads. The 15 normalized vegetation indices (termed VI) assessed in this study ( Table 2) showed significant variability between leaf rolling and plots without leaf rolling. Interestingly, between wavelengths from the red spectra (610 and 680 nm), lower difference was observed between LR-and LR+ (Figure 4). The same pattern was observed in VIs assessing wavelengths > 700 nm. The largest differences between LR-and LR+ reads were observed in VIs combining wavelengths > 760 nm and <700 nm.  (Table S1) and ellipses represent 95% confidence intervals for each group based on Gaussian distribution. Table 2. Raw sensor output at six wavelengths and 15 values of normalized difference vegetation indexes (VI) calculated from unique combinations of six wavelengths expressed as a mean ± standard deviation for maize plots showing leaf rolling (LR+) and no leaf rolling (LR−) in calibration (n = 2180) and external (n = 144) datasets measured by a multispectral sensor. Column p denotes significance according to two-sample t-test at values of α < 0.05 (*), <0.01 (**) and <0.001 (***). p values of differences >0.05 are denoted as non-significant (n.s.).

Feature
Calibration Set (n = 2180) p   (Table 2) showed significant variability between leaf rolling and plots without leaf rolling. Interestingly, between wavelengths from the red spectra (610 and 680 nm), lower difference was observed between LR− and LR+ (Figure 4). The same pattern was observed in VIs assessing wavelengths > 700 nm. The largest differences between LR− and LR+ reads were observed in VIs combining wavelengths > 760 nm and <700 nm.

Assesment Different of Machine Learning Algorithms for Prediction of Leaf Rolling
In stratified 5-fold cross-validation, considerable variability was detected between the performance of different models. Highest prediction accuracy with lowest standard deviation was observed for SLP, followed by CNN, SVM and MLP, respectively (Table 3). Highest precision in cross-validation was observed for SVM, accompanied by secondhighest standard deviation between folds. Recall was the highest for SLP, followed by

Assesment Different of Machine Learning Algorithms for Prediction of Leaf Rolling
In stratified 5-fold cross-validation, considerable variability was detected between the performance of different models. Highest prediction accuracy with lowest standard deviation was observed for SLP, followed by CNN, SVM and MLP, respectively (Table 3). Highest precision in cross-validation was observed for SVM, accompanied by secondhighest standard deviation between folds. Recall was the highest for SLP, followed by CNN. The highest F1 was observed for CNN. The compute times increased with model complexity in the order SLP < MLP < CNN < SVM. However, when attempting to generalize the results of the calibrated models with the random 15% subset, the prediction accuracies changed. The SLP was shown to be the least accurate model, however, with high recall. According to the precision metrics, the model aimed at target many times (many false positives) which was followed by many hits. Such results indicate overfitting in the model architecture possibly caused by many nodes (128) and only a single layer. Briefly, the model was able to extract features linked to leaf rolling in a stratified set, but when attempting to generalize an unrelated dataset, the performance metrics dramatically decreased. Support vector machines model showed highest accuracy, but according to the high precision and lowest recall, it was the most conservative model, yielding a low number of false positives. High accuracy and generalization ability of this model is in accordance with results of PCA ( Figure 3) and ability of simple dimension reduction (L2 norm in SVC linear kernel) to facilitate efficient feature extraction.
Contrarily, MLP and CNN showed fewer conservative values of precision with second highest accuracy (CNN) and 9.04% lower precision compared to SVM. Multilayer perceptron and CNN showed good generalization ability due to the added layers that mitigated the overfitting found for SLP. The best overall performance was captured by SVM and CNN, with higher accuracy and precision in SVM and higher recall in CNN. This was also confirmed by the highest values of harmonic mean accuracy of the model (F1) which was the highest for CNN, followed by SVM. Overall, compute times for the random subset training and predictions were significantly shorter (10 s to 136 s) compared to the cross-validation procedure.
The performance from validation procedures was mostly in alignment with the test using an external validation dataset. Due to the abundant rain between measurements of calibration and external datasets (Figure 1), there were only seven plots with detectable leaf rolling in the external dataset ( Table 2), four of which were detected in SDTrf with sandy loam (not shown). However, this represented an appropriate test of the model robustness, due to the changes in many aspects of plant vitality. Single-layer perceptron maintained poor generalization ability, although with all seven LR+ plots properly classified (Figure 5a). using an external validation dataset. Due to the abundant rain between measurements of calibration and external datasets (Figure 1), there were only seven plots with detectable leaf rolling in the external dataset ( Table 2), four of which were detected in SDTrf with sandy loam (not shown). However, this represented an appropriate test of the model robustness, due to the changes in many aspects of plant vitality. Single-layer perceptron maintained poor generalization ability, although with all seven LR+ plots properly classified (Figure 5a). As in the validation with the random subset, this validation was also followed by a high number of false positives reducing the F1 value to only 14.1 percent ( Figure 6) caused As in the validation with the random subset, this validation was also followed by a high number of false positives reducing the F1 value to only 14.1 percent (Figure 6) caused by very low precision (7.6%). High number of good classifications (7/7) followed by a relatively conservative number of false positives and no true negatives in MLP (Figure 5b) rewarded the highest F1 score of 77.7% in the external set validation ( Figure 6). Marginally lower F1 scores were obtained for CNN and SVM (both 76.9%), which were caused by the inability of the models to correctly classify all seven LR+ plots. by very low precision (7.6%). High number of good classifications (7/7) followed by a relatively conservative number of false positives and no true negatives in MLP (Figure 5b) rewarded the highest F1 score of 77.7% in the external set validation ( Figure 6). Marginally lower F1 scores were obtained for CNN and SVM (both 76.9%), which were caused by the inability of the models to correctly classify all seven LR+ plots.

Discussion
Phenotyping for drought responses in real time represents a new frontier for crop breeding and precision agriculture [57]; however, the advancements in this field are limited by the high costs of measurement equipment such as unmanned aerial vehicles and

Discussion
Phenotyping for drought responses in real time represents a new frontier for crop breeding and precision agriculture [57]; however, the advancements in this field are limited by the high costs of measurement equipment such as unmanned aerial vehicles and hyperspectral cameras. This implies the need for new low-cost proximal or remote sensing solutions, efficiently assessing plant physiological status in real time. Hot and dry conditions during the phenotyping procedure of our study (Figure 1a and Figure S2) in a large number of maize hybrids allowed us to robustly assess a large number of experimental plots with a prototype of multispectral proximal sensing node intended for use in the Internet of Things (IoT) applications. The sensor was used to assess leaf rolling, a trait that was shown to be involved in adaptation to drought and heat conditions [22]. There are various methods for the assessment of leaf rolling [23,29,31,58] and it is well known that different hybrids show different levels of leaf folding, especially given the varying water availability and temperature changes. However, the aim of our work was to assess if the leaf rolling occurrence, despite the varying levels of phenotypic expression, could be spotted or predicted based only on basic reads of spectral responses in red and near-infrared parts of electromagnetic spectra.
At the onset of leaf rolling, many physiological changes take place, such as reduction in photosynthetic activity [59] and changes in metabolic genetic regulatory mechanisms [60].
The reduction in photosynthetic activity should be mostly visible at wavelengths between 710 and 740 nm capturing fluorescence overlap of both photosystem I and II, and 685 nm representing a peak of fluorescence of photosystem II [61] which is within the spectral peaks captured by 680 and 730 nm diodes (according to 20 nm full width at half-maximum) in our study. This was corroborated by the reduction in VIs assessing 680 and 730 nm wavelengths (Table 2, Figure 4).
Stress adaptation, such as the reduction in photochemical activity, also involves translocation of the biochemicals [32]. Additionally, in responses such as leaf rolling, the previously unexposed plant parts become intercepted by sunlight, such as abaxial parts of leaves, having the different pigment mixtures compared to adaxial parts [62]. The reflectance between 700 nm and 980 nm, where the spectral responses of the brown pigments are located, along with chlorophyll fluorescence signals might provide the insight in plant biochemistry, and consequently, physiological status [63]. The exposure of plant abaxial surfaces to sunlight reveals red-brown pigments due to the water deficit [64], thus changing the leaf optical properties [65]. According to the results presented in Weber et al. [66], combined water stress and heat stress, as in our study (Figure 1 and Figure S2), are expected to produce the most visible response in leaf reflectance at wavelengths near the reflectance of brown pigments. This was also confirmed in our results, where the difference between LR− and LR+ increased in indices combining wavelengths > 700 nm and <700 nm ( Figure 4). Among these indices is also the commonly used NDVI combining reflectance at approximately 680 and 770 nm [67]. Normalized vegetation indices are traditionally used to assess the vegetation cover from satellite imagery [34], but the advancement of analytic solutions allows their deployment for analysis of a wide range of quantitative and qualitative traits. However, one must note that the usage of multispectral sensor reads also bears the risk of reduced repeatability of the results due to the deviations of atmospheric and sensor effects [68], so the use of VIs is advised, such as Vis in this study.
Due to the rapidly changing climate [69], there is strong pressure on developing new proximal and remote, data-rich high-throughput plant phenotyping solutions, rectified by the lack of manpower and the increased demand for high-quality data [70]. The UAV and phenopole (phenotyping pole) solutions yield similar insight and information density of the reads, however, with more throughput in UAV solutions [71], but more temporal information with phenopoles, facilitating the monitoring of the trait onset dynamics. The sensor node used in our study aims to provide the low-cost solution to this problem, so that the increased density of the sensing nodes, providing the better sampling, could compensate for UAV's higher throughput, at a lower cost. Furthermore, given the application of our sensor in the envisioned IoT framework, the simultaneous, real-time data collection could provide higher information density compared to UAV, without the need for human intervention [72]. Combining such developments with machine learning methodology should converge to provide new layers of information in plant monitoring paradigm for transition to Agriculture 4.0 [73]. Modelling complex data with unknown hidden features can be efficiently carried out using ML methodology to explain considerable amounts of variance in agricultural production deviations [74]; however, the interpretability of the models is low and despite their high accuracy, they are unable to surrogate the science.
In our study, only marginal differences were observed between MLP, CNN showing good generalization ability due to the added layers that mitigated the overfitting found for SLP. In stratified cross-validation, SLP showed the highest F1 score, followed by the CNN. However, the severe drop in accuracy of SLP in un-stratified datasets can be viewed from the perspective of poor generalization abilities of the networks having a small number of layers with an excessive number of neurons [75], which is apparently the case with SLP presented in our study. Thus, the model was able to extract features linked to leaf rolling in a stratified set, but when attempting to generalize an unrelated dataset, the performance metrics dramatically decreased.
Study on plant Bromus inermis using hyperspectral indices and three ML algorithms, CNN, SVM and random forest, showed feasibility of drought classification in ML framework with the highest prediction accuracies observed for SVM [32]. In our study, SVM also showed the highest overall prediction accuracy in random subset validation. This is also corroborated by the results of PCA ( Figure 3) and the ability of a simple dimension reduction technique such as Tikhonov regularization [76] (L2 norm in SVC linear kernel) to facilitate efficient feature extraction.
All models, except SLP, in our study performed well on the given dataset, and the observed differences were not discriminatory. Studies assessing vegetation classifications by the use of hyperspectral imagery with SVM, artificial neural networks and CNN by Hassan et al. [77,78], demonstrating very high classification accuracies reported similar conclusions. The added value in our research can be seen in the analysis of performance in tabular data using additional, unrelated validation dataset (Figures 5 and 6). It was shown that CNN and SVM yielded more conservative, similar performance metrics, while MLP showed the best overall performance, but with only a 1% increase in F1 score. On the other hand, the limitations of our study can be seen from two perspectives. As an early report, our dataset was only created in a single stage of plant development over a limited number of climatological scenarios [79], so further efforts with increased spatial and temporal resolution are needed. Additionally, the deep learning models represent a good way to cope with many types of data taking many conformations in multidimensional hyperplanes, however, with limited interpretability. Further assessment should thus include other ML models such as decision trees retaining more information on the effects of predictor variables.
The usability of ML was also demonstrated at many levels of agricultural production, by using multi/hyperspectral reads and imagery and climatological data, such as disease detection [41,48,[80][81][82][83], nutrient deficiency assessment [84,85], stress detection [86][87][88] and in-season predictions of agronomic performance in maize [89,90], sorghum [91], soybean [92], wheat [93][94][95] and cocoa [96]. Corroborating these new avenues in agricultural sciences with the constant involvement of large companies in the development of new learning ML algorithms, and the optimization of the existing ones, Python open-source libraries Tensorflow [97] created and maintained by Google and Pytorch [98] created and maintained by Facebook, makes the future uses of ML incomprehensible.

Conclusions
This study demonstrated the ability of machine learning algorithms to use simple multispectral reads for efficient classification of maize leaf rolling. It was shown that there is variability between ML algorithms in terms of performance metrics, but also, computing times. There is a growing need for increased spatiotemporal resolution of plant monitoring with affordable remote sensing solutions, especially in the context of the Internet of Things (IoT). It was demonstrated that ML algorithms can efficiently extract information from multispectral reads and predict plant states such as leaf rolling. Since the envisioned use of the demonstrated sensor is an IoT framework, the inclination is towards less computationally intensive ML algorithms, without sacrificing performance. Thus, the use of MLP might represent the best overall option. Increasing the information density and using smart solutions for decision support in agriculture could facilitate the transition to the Agriculture 4.0 empowered by the nexus between food production and machine learning. Further research should test the framework in multiple topographies and water/nutrient availability scenarios to tackle the abilities of simple and affordable sensing solutions in the dissection of biological systems showing the highest order of complexity.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/rs14112596/s1, Figure S1. (a) Maize plot showing no leaf rolling (LR−) and a plot with leaf rolling (b); Figure S2. Precipitation during the months of June and July 2021 with 50-year average-based percentiles of dry and wet conditions for Osijek. Gray boxes show measurement times, and the red box shows external validation set measurement time; Figure S3. Learning rates of neural network algorithms from 85% random subset training procedure; Table S1: loading weights of the first three principal components

Conflicts of Interest:
The authors declare no conflict of interest.