1. Introduction
Achieving global food security is an important 21st-century challenge [1,2,3,4,5]. Population growth, climate variability, and mismanagement have reduced the availability of natural resources needed for growing crops [6,7]. Furthermore, biotic and abiotic stresses reduce crop yields, emphasizing the importance of adopting precision agriculture practices [7,8,9]. Biotic stresses have biological sources, such as pathogens (viruses, bacteria, and fungi), pests, and weeds [7,10,11]. Abiotic stressors include radiation, salinity, floods, and water and nutrient deficiency, among others [10,11,12].
Deficiencies of water and nutrients can significantly reduce agricultural productivity. Precise and accurate early detection of water and nutrient stress in plants can boost agricultural productivity while improving water use efficiency [13,14,15]. Such detection is a necessary first step for sustainable intensification: increasing crop yields without causing environmental degradation or converting additional non-agricultural land into farmland [16,17]. Corrective treatments, such as irrigation and fertilizer applications, should then be applied only to the areas where they would be beneficial, and in the amounts needed to satisfy the actual need [12,16,18].
Because different stresses can cause similar symptoms, determining crop stresses from visible symptoms is often a complex manual task predominantly conducted by trained and experienced agronomists, crop scientists, and plant pathologists. Consequently, manual stress detection is labor-intensive, time-consuming, and inconsistently reproducible due to differences in experience, subjective interpretation, and manual ratings [19]. Image processing (IP) and machine learning (ML) can be coupled to mitigate the weaknesses of manual methods. Recent advances in both disciplines have made remote sensing-based plant stress inference computationally tractable [20,21].
IP and ML techniques have been coupled in multiple agricultural applications [16,22,23,24,25,26]. The coupled techniques have been applied to identify [27,28,29], classify [30,31], quantify [19], and predict [32,33] crop stress. For instance, Raza et al. used visible and thermal image inputs with support vector machines (SVMs) and Gaussian classifiers to detect water stress in spinach, achieving 97% accuracy [28]. Stas et al. compared Boosted Regression Trees (BRTs) and SVM models using NDVI data from low-resolution SPOT VEGETATION imagery to predict wheat yield, finding BRTs consistently superior based upon cross-validation error (RMSE) [34]. Naik et al. used high-quality RGB images of soybeans’ fractional vegetation ratios (FVRs) to develop a real-time classification framework for detecting iron deficiency chlorosis (IDC) [35]. They employed and compared Naïve Bayes (NB), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), SVM, K-Nearest Neighbors (KNN), and a Gaussian mixture model, ultimately achieving 96% accuracy. When Moghimi et al. [36] and Niu et al. [37] used hyperspectral and multispectral images to evaluate salt tolerance and vegetation cover under water stress, Random Forest was the most reliable algorithm. Khanna et al. used a complex dataset consisting of 55-dimensional vectors of RGB, infrared, and hyperspectral images, as well as the canopy cover, crop height, reflectance, and vegetation spectral indices, to detect crop stress [16].
Building on the work by Khanna et al. [16], this study focuses solely on developing, evaluating, and comparing Machine Learning Image Modules (MLIMs), each intended to identify stresses due to lack of water and nitrogen in growing sugar beet plants. Moreover, to mitigate overfitting and enhance model robustness, the existing dataset was artificially expanded through augmentation. At the beginning of the training and testing process, each MLIM consists of 54 Machine Learning Image Submodules (MLISs), each pairing one of six machine learning algorithms with one of nine input datasets derived from RGB images and canopy cover. Every MLIS within a module is trained to identify the same particular stress. After training and testing, only the best one or more MLISs are retained within each of the four MLIMs.
Unlike previous work that required specialized equipment, this approach uses raw RGB images that can be easily captured with common devices such as smartphones or smart glasses. Use of the developed MLIMs would greatly simplify the process of detecting and classifying water and nitrogen stress in crops and determining nitrogen stress severity. The mission of this work is to develop low-cost, non-invasive modules deployable on mobile devices to enhance the timing of agricultural resource management. Although demonstrated for sugar beets, applying the reported MLIM development procedure to other crops would enable farmers worldwide to make more informed decisions, improving the economics of water use and enhancing agricultural productivity.
2. Materials and Methods
An overview of the experimental conditions and data of [16] is presented below in Section 2.1. Then, the proposed MLIM method of detecting abiotic stresses of different kinds and levels is illustrated. The MLIM method included two phases: preparing input datasets using image-processing approaches; and developing and testing MLIMs for detecting and identifying four crop stress classifications.
2.1. Experiment’s Environmental Conditions
The experiment was conducted at the ETH research station for plant sciences in Lindau Eschikon, Zurich, Switzerland (47.45° N, 8.68° E). Sugar beet plants of the variety “Samuela” were grown under controlled climate conditions in a greenhouse. In this experiment, six sugar beet plants were planted in rectangular plant cultivation boxes placed about 2 m below light sources.
The 150 images included 3 replications × 10 dates × 5 treatments: control, water stress (including sufficient and insufficient water), and nitrogen stress (including nitrogen stress and more nitrogen stress). Table 1 and Table A1 provide more information about the treatments and dataset used in this study.
2.2. Dataset and Preprocessing
Visible images were captured using the Intel® RealSense ZR300 (RealSense Inc., Cupertino, CA, USA) camera. The 150 raw images had 1920 × 1080 pixels. See [16] for more information on the imaging setup. These raw images included information beyond the cropped area, and due to installation imprecision, the raw images were rotated from −2 to 2.5 degrees away from perfect alignment.
The image processing workflow, illustrated in Figure 1, was applied to correct these deviations. The preprocessing procedure consisted of two main steps: rotation correction and region-of-interest (ROI) extraction. Image rotation angles ranged from −2° to +2.5°, as reported in the original dataset metadata. These rotations were corrected using the imutils 0.5.4 library in Python by reorienting each image around its center. The ROI was then defined in two stages: first, the inner edges of the growth boxes were selected and cropped using the SelectROI function from OpenCV 4.5.5; second, all images were re-cropped from the center to a uniform dimension of 600 × 400 pixels to ensure consistency in shape and size before being processed by the machine learning models. Equations (1) and (2) show the rotation transformation applied to each pixel coordinate (x, y):
x′ = (x − X) cos θ − (y − Y) sin θ + X  (1)
y′ = (x − X) sin θ + (y − Y) cos θ + Y  (2)
where θ is the angle of rotation, and (X, Y) represent the coordinates of the rotation center.
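The rotation about the image center and the uniform center crop described above can be sketched in plain NumPy (the function names `rotate_points` and `center_crop` are ours for illustration; the study itself used imutils and OpenCV):

```python
import numpy as np

def rotate_points(x, y, theta_deg, cx, cy):
    """Rotate coordinates (x, y) by theta degrees around center (cx, cy),
    mirroring the rotation transformation of Equations (1) and (2)."""
    t = np.deg2rad(theta_deg)
    xr = (x - cx) * np.cos(t) - (y - cy) * np.sin(t) + cx
    yr = (x - cx) * np.sin(t) + (y - cy) * np.cos(t) + cy
    return xr, yr

def center_crop(img, width=600, height=400):
    """Re-crop an image array from its center to a uniform size,
    as done before feeding images to the machine learning models."""
    h, w = img.shape[:2]
    x0 = (w - width) // 2
    y0 = (h - height) // 2
    return img[y0:y0 + height, x0:x0 + width]
```

Applied to a 1920 × 1080 raw image, `center_crop` yields the 600 × 400 array expected by the models.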
Then, to develop robust models, five augmented versions of each image (rotation, zooming, flipping, brightness adjustment, and noise addition) were generated from the preprocessed images, resulting in a total of 900 images (Table A2). Although the dataset originates from [16] and was collected under relatively uniform conditions, the augmentation process with randomly applied modification factors introduced variability and simulated real-world environmental inconsistencies, thereby enabling a more reliable evaluation of model robustness.
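A minimal NumPy-only sketch of such randomized augmentation follows; it covers only the flip, brightness, and noise variants (rotation and zooming would additionally require an image library), and the factor ranges are illustrative assumptions, not the study's values:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Generate simple augmented variants of a preprocessed uint8 image:
    horizontal flip, random brightness scaling, and additive Gaussian noise.
    (Illustrative subset; the study also applied rotation and zooming.)"""
    flipped = np.fliplr(img)
    factor = rng.uniform(0.8, 1.2)                    # random brightness factor (assumed range)
    bright = np.clip(img.astype(float) * factor, 0, 255).astype(np.uint8)
    noise = rng.normal(0.0, 5.0, img.shape)           # Gaussian noise, assumed sigma = 5
    noisy = np.clip(img.astype(float) + noise, 0, 255).astype(np.uint8)
    return [flipped, bright, noisy]
```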
2.3. Proposed Method
Figure 2 illustrates the proposed crop stress detection framework and processing. In Phase 1 (Figure 2a), raw RGB images are preprocessed and subjected to multiple feature extraction techniques to generate nine different input sets.
The set names refer to the employed extraction techniques: RGB bands, canopy cover, Excess Green (ExG), Excess Green–Blue (ExGB), Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), Canny edges, recolored binary images (RGBWB), and the Green band.
For Phase 2, Figure 2b shows the four machine learning modules trained to identify, respectively, (a) stress existence, (b) water stress existence, (c) nitrogen stress existence, and (d) the severity level of nitrogen stress, classified as sufficient, nitrogen stress, or more nitrogen stress. Each of the four modules contains 54 Machine Learning Image Submodules (MLISs). Each MLIS accepts only one of the nine input datasets and employs a single machine learning algorithm: Multilayer Perceptron (MLP), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Linear Discriminant Analysis (LDA), or Stochastic Gradient Descent Classifier (SGDC). All algorithms were implemented using the Scikit-learn 0.24.2 package, except for the MLP, which was implemented using TensorFlow 2.5.0 and Keras 2.4.3.
Figure 2c illustrates a close-up of the water stress existence module when it accepts only the RGB dataset as the input to each of the six machine learning algorithms separately. Each of the six MLAs analyzes the input and determines whether the plant had sufficient or insufficient water when the image was taken. That model output is then compared to the actual condition label for validation. This two-phase pipeline enables scalable, accurate stress detection using only image data and machine learning. Each phase is explained in more detail in subsequent sections.
2.3.1. Phase 1: Generating Input Sets
Table 2 shows the nine input datasets that were created for this study by extraction from preprocessed RGB images. The nine input datasets were derived from RGB images and related indices as follows.
RGBCC: Consists of Red (R), Green (G), and Blue (B) bands with Fractional Canopy Cover (FCC). For the FCC input, we used the results from [36]’s study, which analyzed the same image dataset used here. FCC was calculated as
FCC = N_leaf/N_total
where N_leaf and N_total are the number of leaf pixels and the total number of image pixels, respectively. RGB and FCC arrays were flattened using the NumPy 1.21.3 library to generate the input matrix. Haddadi applied nine compound image segmentation methods to estimate fractional canopy cover under drought and nitrogen stress conditions. They showed that the combination of the Excess Green minus Excess Red vegetation index with manual thresholding achieved the highest accuracy (94.69%). Here, we used their method (Haddadi’s fractional canopy cover estimates) to calculate the canopy cover.
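The FCC computation can be sketched as follows; the Excess Green minus Excess Red segmentation with a zero threshold is an illustrative assumption (the cited study tuned its threshold manually):

```python
import numpy as np

def fractional_canopy_cover(rgb):
    """FCC = N_leaf / N_total. Leaf pixels are segmented here by a simple
    Excess Green minus Excess Red threshold at zero (assumed threshold;
    the cited study used manual thresholding)."""
    r, g, b = [rgb[..., i].astype(float) for i in range(3)]
    exg = 2 * g - r - b          # Excess Green
    exr = 1.4 * r - g            # Excess Red
    leaf_mask = (exg - exr) > 0  # leaf pixels
    return leaf_mask.sum() / leaf_mask.size
```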
EGI and EGBI indices: Vegetation indices were computed from the visible bands to enhance plant–soil discrimination; the Excess Green index is EGI = 2G − R − B, and EGBI additionally incorporates the Blue band to further suppress background.
HOG descriptor: The Histogram of Oriented Gradients (HOG) is a widely used feature descriptor in computer vision and image processing, primarily applied for object detection. The HOG computation begins by dividing the image into small cells, which are then grouped into blocks to allow local contrast normalization. HOG features were extracted using
g = √(g_x² + g_y²), θ = arctan(g_y/g_x)
where g is the gradient magnitude, g_x and g_y are, respectively, the horizontal and vertical gradients of each pixel in the image, and θ is the gradient orientation in radians. For each cell, the histogram of unsigned gradient orientations is computed and weighted by the corresponding magnitudes. These histograms are concatenated and normalized within each block to form the block descriptor, denoted by d(p), where p = 1, 2, …, n, and n is the total number of blocks in the image. The block descriptors d(p) form the matrix D(i) ∊ ℜ^(n×c) as
D(i) = [d(1); d(2); …; d(n)]
where i = 1, 2, …, m; m is the number of images in the training set; and c is the number of elements in each block descriptor. The value of c is calculated using Equation (9):
c = bins × CPB  (9)
where bins is the number of gradient orientation bins per cell, and CPB is the number of cells per block.
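A simplified sketch of the per-cell computation follows (block grouping and contrast normalization are omitted, so this is not a full HOG implementation; the function name is ours):

```python
import numpy as np

def cell_orientation_histogram(cell, bins=9):
    """Unsigned-gradient orientation histogram for one HOG cell:
    g = sqrt(gx^2 + gy^2), theta folded into [0, pi), each pixel voting
    with weight g. Simplified: no block grouping or normalization."""
    cell = cell.astype(float)
    gx = np.gradient(cell, axis=1)              # horizontal gradient
    gy = np.gradient(cell, axis=0)              # vertical gradient
    g = np.hypot(gx, gy)                        # gradient magnitude
    theta = np.arctan2(gy, gx) % np.pi          # unsigned orientation
    hist, _ = np.histogram(theta, bins=bins, range=(0, np.pi), weights=g)
    return hist

# Descriptor length per block, as in Equation (9): c = bins * cells per block,
# e.g., 9 bins and 2x2 cells per block give 36 values per block descriptor.
c = 9 * 4
```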
SIFT features: The Scale-Invariant Feature Transform (SIFT) is a robust keypoint detection and description algorithm widely used in image analysis. The process begins by constructing a Difference-of-Gaussians (DoG) scale-space to identify potential keypoints invariant to scale and rotation. The scale-space representation of an input image I(x, y) is obtained by convolving it with Gaussian filters G(x, y, σ) of varying standard deviations σ:
L(x, y, σ) = G(x, y, σ) ∗ I(x, y)
where “∗” represents the convolution operator, and L(x, y, σ) is the resulting Gaussian-blurred image at scale σ. Then, the DoG function D(x, y, σ) is computed as the difference between two consecutive Gaussian-blurred images at scales σ and kσ:
D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)
The local extrema of D(x, y, σ) are detected by comparing each pixel with its 26 neighboring points (8 in the same scale, 9 in the scale above, and 9 below). A point is identified as a potential keypoint if it represents a local maximum or minimum. The extremum location Z is refined using a Taylor expansion to improve accuracy:
Z = −(∂²D/∂Z²)⁻¹ (∂D/∂Z)
Keypoints with low contrast or located along edges are discarded using a predefined threshold. For each remaining keypoint, a consistent orientation is assigned based on the local image gradients, ensuring rotation invariance. The gradient magnitude m(x, y) and orientation θ(x, y) at each pixel are calculated as follows:
m(x, y) = √((L(x + 1, y) − L(x − 1, y))² + (L(x, y + 1) − L(x, y − 1))²)
θ(x, y) = tan⁻¹((L(x, y + 1) − L(x, y − 1))/(L(x + 1, y) − L(x − 1, y)))
Finally, orientation histograms are created from the gradient magnitudes and orientations within the neighborhood of each keypoint, forming the SIFT descriptor vector that characterizes the local image structure.
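The DoG stage alone can be sketched as follows, using SciPy's Gaussian filter as a stand-in for the convolution with G(x, y, σ); extremum refinement and descriptor computation are omitted, and the parameter values are common SIFT defaults rather than values from this study:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(img, sigma=1.6, k=np.sqrt(2), levels=4):
    """Difference-of-Gaussians stack: D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma),
    the first stage of SIFT keypoint detection (minimal sketch only)."""
    img = img.astype(float)
    blurred = [gaussian_filter(img, sigma * k**i) for i in range(levels)]
    return [blurred[i + 1] - blurred[i] for i in range(levels - 1)]
```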
Canny Edge Detection (CED): The Canny edge detection method outlines edges by looking for local maxima of the gradient of the input image. The method uses two thresholds to detect strong and weak edges, and includes weak edges in the output only if they are connected to strong edges. Non-maximum suppression is used to remove spurious responses, and two threshold values (t_1, t_2) are applied to determine the potential edges of a leaf image (t_1 < t_2). The recommended t_2-to-t_1 ratio is 3:1. The smoothed image is filtered with a Sobel kernel in both the horizontal and vertical directions to derive the first derivatives in the horizontal direction (G_x) and the vertical direction (G_y). From these two images, the edge gradient and the direction for each pixel are obtained as follows:
G = √(G_x² + G_y²), θ = tan⁻¹(G_y/G_x)
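The gradient and double-thresholding steps can be sketched with SciPy's Sobel filters (a simplified illustration: non-maximum suppression and hysteresis edge linking are omitted, and the threshold values merely demonstrate the 3:1 ratio):

```python
import numpy as np
from scipy.ndimage import sobel

def gradient_and_edges(img, t1=30.0, t2=90.0):
    """Sobel gradients G_x, G_y; edge gradient G = sqrt(Gx^2 + Gy^2) and
    direction theta = arctan2(Gy, Gx); double thresholding with t2:t1 = 3:1."""
    img = img.astype(float)
    gx = sobel(img, axis=1)        # horizontal derivative
    gy = sobel(img, axis=0)        # vertical derivative
    g = np.hypot(gx, gy)           # edge gradient magnitude
    theta = np.arctan2(gy, gx)     # edge direction
    strong = g >= t2               # definite edges
    weak = (g >= t1) & (g < t2)    # kept only if linked to strong edges
    return g, theta, strong, weak
```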
RGBWB: Binary images obtained using EGI thresholding were recolored to restore white leaf pixels, minimizing soil interference.
Green band: The Green channel alone was used to enhance the contrast between vegetation and soil.
2.3.2. Phase 2: Developing Machine Modules for Detection and Classification
The purpose of this phase was to formulate an Identify–Classify–Quantify crop stress model that accepts RGB images or data derived from RGB images. The proposed model consists of four distinct modules, each designed to perform a different crop stress detection or classification task. The first module determines whether a crop is under stress, classifying it into one of two categories: under stress or not under stress. The second module classifies crops as having sufficient or insufficient water. The third module classifies a crop as being under nitrogen stress or not under nitrogen stress. The final module classifies a crop’s nitrogen status as sufficient, nitrogen stress, or more nitrogen stress.
In all, each of the four MLIM modules incorporates 54 different Machine Learning Image Submodules (MLISs). Each MLIS employs one of the nine Table 2 input datasets and one of six machine learning algorithms (MLAs): Multilayer Perceptron artificial neural networks (MLPs) [41], Support Vector Machines (SVMs) [42], Random Forests (RFs) [43], Decision Trees (DTs) [44], Linear Discriminant Analysis (LDA) [45], and a Stochastic Gradient Descent Classifier (SGDC) [46]. To ensure a robust and unbiased evaluation, multiple machine learning algorithms were employed. These models represent diverse learning paradigms (neural, kernel-based, ensemble, rule-based, and linear), allowing a comparison across different data assumptions and decision complexities. Moreover, similar algorithms have been widely applied in previous studies on abiotic stress detection (e.g., [9,28,47,48,49,50]), demonstrating their effectiveness for plant stress classification.
Each MLIS applies one of six supervised algorithms trained to map input features x_i to a binary stress label y_i ∊ {0, 1}. All models aim to learn a function f(x_i; θ) that minimizes a general loss function:
min_θ (1/n) Σ_{i=1}^{n} L(y_i, f(x_i; θ))
where n is the number of samples, and L is a model-specific loss function.
Multilayer Perceptron (MLP-ANN): The MLP uses a feed-forward neural network with one or more hidden layers. The output of layer l is given by
h^(l) = σ(W^(l) h^(l−1) + b^(l))
where W^(l) and b^(l) are the weight matrix and bias vector, h^(l) is the output vector of layer l, and σ is the activation function (ReLU in this study). The network parameters are optimized using stochastic gradient descent by minimizing the cross-entropy loss:
L = −(1/n) Σ_{i=1}^{n} [y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i)]
where L is the cross-entropy loss, n is the total number of training samples, y_i is the true (target) label of the ith sample, and ŷ_i is the predicted output of the ith sample.
Support Vector Machine (SVM): The SVM aims to find the optimal hyperplane w⏉x + b = 0 that maximizes the margin between classes by minimizing
(1/2)‖w‖² + C Σ_{i=1}^{n} ξ_i
subject to y_i(w⏉x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0, where w is the weight vector defining the separating hyperplane, b is the bias term, C is the penalty parameter controlling misclassification tolerance, and ξ_i is the slack variable for sample i.
Random Forest (RF): The RF combines predictions from T independent decision trees f_t(x):
ŷ = majority vote{f_t(x), t = 1, …, T}
where each tree is trained on a bootstrap sample and uses random feature subsets at each split to reduce variance, f_t(x) is the prediction of the tth decision tree, and T is the total number of trees in the forest.
Decision Tree (DT): The DT recursively partitions the feature space by selecting thresholds that maximize the information gain:
IG(S, A) = H(S) − Σ_{υ∊Values(A)} (|S_υ|/|S|) H(S_υ)
where IG(S, A) is the information gain of attribute A for dataset S, H(S) is the Shannon entropy of set S, and S_υ is the subset of samples where attribute A takes value υ.
Linear Discriminant Analysis (LDA): LDA projects data onto a lower-dimensional space, maximizing class separability:
w* = argmax_w (w⏉ S_B w)/(w⏉ S_W w)
where w is the projection vector maximizing class separability, S_B and S_W are the between- and within-class scatter matrices, and w⏉ is the transpose of the vector w.
Stochastic Gradient Descent Classifier (SGDC): The SGDC minimizes a regularized loss function iteratively:
min_w (1/n) Σ_{i=1}^{n} L(y_i, w⏉x_i) + λ‖w‖²
where w is the model coefficient vector, x_i is the feature vector of the ith training sample, y_i is the true label of the ith sample, n is the total number of training samples, λ is the regularization parameter controlling model complexity, L is the loss function (hinge or logistic loss), and ‖w‖² is the squared Euclidean norm of the weight vector, serving as the regularization term.
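The six algorithms above can be trained and compared side by side. A minimal Scikit-learn sketch follows, with sklearn's MLPClassifier standing in for the study's Keras/TensorFlow MLP and synthetic features standing in for the flattened image-derived inputs:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in data; real MLISs use flattened image-derived inputs.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "MLP": MLPClassifier(max_iter=500, random_state=0),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
    "SGDC": SGDClassifier(random_state=0),
}
# Train each model on 70% of the data and score on the remaining 30%.
accuracy = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```

One MLIM would repeat this comparison for each of the nine input datasets, yielding the 54 MLIS accuracy values per module.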
To identify, classify, and quantify crop stress, the performance of each MLIS was evaluated. Seventy percent of the input data was used for training each MLIS, and the remaining thirty percent was used for testing. The MLP-ANN was built using the Keras and TensorFlow Python 3.8 packages, and the other ML models were built using Scikit-learn [51]. The MLAs were fine-tuned by adjusting the hyperparameters available in the same libraries. Details of the MLP structure and its hyperparameter tuning are provided below.
A multilayer perceptron (MLP) is an artificial neural network (ANN) with multiple layers of interconnected nodes. A neuron in a hidden layer is connected to the neurons in the previous layer, and its output is passed to neurons in the next layer [52,53]. For the multiclass classification problem presented here, a single neuron per class exists in the output layer using the logistic activation function. Each employed MLP model has one input layer, one or more hidden layers, and one output layer (Figure 3).
MLP development consisted of three parts: partitioning the total dataset into training and testing sub-sets; looping; and a stopping callback. During training, the model architecture and hyperparameters were set in two ways: (1) empirically, by varying the number of dense layers, the number of neurons per hidden layer, and the activation functions [54]; and (2) by using the Keras tuner’s RandomSearch capability [55] (Keras tuner library version 1.0.3). The optimizer was ADAM, and the loss function was categorical cross-entropy. The convergence of the model was evaluated based on a specific test accuracy value. A loop with a stopping condition halted the training epochs when the highest test accuracy was achieved.
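The loop-with-stopping idea can be sketched as follows; this uses sklearn's MLPClassifier with `warm_start` as a lightweight stand-in for the study's Keras/TensorFlow training loop, and the patience and layer sizes are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; the study used image-derived inputs.
X, y = make_classification(n_samples=300, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# Manual training loop with a stopping-callback analogue: keep training
# while test accuracy improves, stop once it stalls for `patience` checks.
model = MLPClassifier(hidden_layer_sizes=(64, 32), warm_start=True,
                      max_iter=20, random_state=1)
best_acc, patience, stalled = 0.0, 3, 0
for epoch_block in range(25):
    model.fit(X_tr, y_tr)            # warm_start continues from current weights
    acc = model.score(X_te, y_te)
    if acc > best_acc:
        best_acc, stalled = acc, 0
    else:
        stalled += 1
    if stalled >= patience:
        break
```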
2.3.3. Training Process
After using 70% of the input data to train an MLIS, the remaining 30% of the data was used to test the MLIS and assess its accuracy. The modules were trained using an Intel® Xeon® Silver 4210 (Hewlett Packard Enterprise, Spring, TX, USA) processor with twenty 2.20 GHz cores and 125.5 GB of memory, and an NVIDIA GeForce RTX 3090 (NVIDIA, Santa Clara, CA, USA) graphics card. All code was executed in the Python programming language (Spyder 5.0.5 environment).
2.4. Statistical Analysis
For each Machine Learning Image Module (MLIM), the performance of its associated Machine Learning Image Submodules (MLISs) was evaluated using statistical metrics and computational time. A confusion matrix was generated for each MLIS, and standard machine learning evaluation indicators, including accuracy, precision, recall, and F1-score, were calculated and compared to identify the most accurate MLIS for each module.
In this study, ‘positive’ refers to non-stress conditions, such as sufficient water or the absence of nitrogen stress, while ‘negative’ refers to stressed conditions, such as insufficient water or nitrogen stress. Here, TP represents true positives, where the model correctly classifies a non-stress condition; FP represents false positives, where the model incorrectly classifies a stressed condition as non-stress; FN represents false negatives, where a non-stress condition is misclassified as stressed; and TN represents true negatives, where the model correctly classifies a stressed condition.
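With that labeling convention, the standard indicators follow directly from the four confusion-matrix counts (the function name is ours for illustration):

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard indicators from confusion-matrix counts, with 'positive'
    meaning non-stress, as defined above."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)                       # of predicted non-stress, fraction correct
    recall = tp / (tp + fn)                          # of actual non-stress, fraction found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```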
2.5. Equilibrating Conflicting Objectives
Concerning each MLIS (a combination of an ML algorithm and an input dataset), two critical outputs of this study are the accuracy of the computed results and the required computer execution time. Both within each MLIM alone and when comparing all MLISs against each other, a user’s objective of achieving the highest accuracy can conflict with their objective of requiring the least execution time. While achieving the highest accuracy or the shortest execution time might be important for some situations, under other circumstances, it might be preferable to reach an intermediate state that balances both objectives. Bargaining theory includes methods useful for finding the equilibrium point and the trade-off value between such conflicting objectives [56,57,58].
The trade-off between two objectives is calculated by examining how the Pareto optimal value of one objective shifts in response to a change in the Pareto optimal value of the other objective, within the set of all feasible Pareto optimal solutions defined in two-dimensional real space (S ⊆ ℜ², where ℜ² refers to two dimensions in real number space). Assuming a maximization objective (in which the largest objective function value is the best), the following equation is used to normalize the optimal accuracy and execution-time values. A normalized value of one represents the best objective function value, and a zero value represents the worst:
f̂_i = (f_i − f_i^min)/(f_i^max − f_i^min)
where f̂_i is the normalized ith objective function value, f_i is the ith objective function value, f_i^min is the ith minimum objective function value, and f_i^max is the ith maximum objective function value.
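The normalization can be sketched in a few lines; negating the minimized objective (execution time) before normalizing is our illustrative convention for making "1 = best" hold for both objectives:

```python
import numpy as np

def normalize(values):
    """Min-max normalization so that 1 is the best and 0 is the worst value,
    assuming the objective is maximized; minimized objectives are negated first."""
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

# Accuracies (%) are maximized; execution times (s) are minimized, so negate.
acc_norm = normalize([90.0, 95.0, 100.0])
time_norm = normalize([-12.0, -5.0, -1.0])   # the fastest MLIS normalizes to 1.0
```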
Before applying any conflict resolution method, MLISs dominated by other MLISs are removed from the dataset. For example, if two MLISs have the same accuracy but one has a lower execution cost, the MLIS with the higher cost is considered dominated and removed. In this work, we utilized the Nash method to illustrate how decision-makers can assign proportional priorities to maximizing accuracy versus minimizing the execution time when choosing between different MLISs.
Nash’s solution, an economics-based bargaining game theory method, is applied here [59]. This method aims to find an optimal point upon the Pareto frontier that is concurrently most distant from the point of disagreement, where both objectives achieve their worst values. Therefore, a pair (f*_1, f*_2) is a Nash bargaining solution if it solves the following optimization problem:
max (f_1 − d_1)^(w_1) (f_2 − d_2)^(w_2)
where f_1 and f_2 are the values of the first and second objective functions, respectively; w_1 and w_2 are the weights of importance of the first and second objective functions, respectively; and d_1 and d_2 are the values of the least achievements of the first and second objective functions, respectively. This conflict is defined in the set S (S ⊆ ℜ²), in which S is the set of feasible bargaining objective function value pairs.
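For normalized objectives with the disagreement point at the worst values (d_1 = d_2 = 0), selecting the Nash solution among candidate MLISs reduces to maximizing the weighted product. A minimal sketch (the function name and the assumption that dominated MLISs have already been removed are ours):

```python
import numpy as np

def nash_select(acc, time, w_acc=0.5, w_time=0.5):
    """Index of the MLIS maximizing the weighted Nash product
    (f1 - d1)^w1 * (f2 - d2)^w2 over normalized accuracy and execution
    time, with the disagreement point at the worst values (0, 0)."""
    acc = np.asarray(acc, dtype=float)
    time = np.asarray(time, dtype=float)
    f1 = (acc - acc.min()) / (acc.max() - acc.min())      # accuracy: higher is better
    f2 = (time.max() - time) / (time.max() - time.min())  # time: lower is better
    product = (f1 ** w_acc) * (f2 ** w_time)              # d1 = d2 = 0
    return int(np.argmax(product))
```

With equal weights, an MLIS that is merely second-best on both objectives can beat one that is best on a single objective but worst on the other, which is exactly the intermediate balance described above.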
We calculated the average performance for each combination of machine learning model and input set (MLIS) across all four modules. By adjusting the weights assigned to the objectives (high accuracy and low computational cost), we could (a) determine which input sets were most cost-effective (meaning they balance accuracy with execution time cost) for each machine learning algorithm; and (b) determine which algorithms were most cost-effective for each input set.
3. Results and Discussion
The identification of water stress, as well as the identification and classification of nitrogen stress severity, were each addressed using the same set of machine learning algorithms and input sets. Each of the 54 unique combinations of an algorithm and an input set is referred to as a Machine Learning Image Submodule (MLIS). The following sections report the results for each step of phase 2. Phase 2 develops MLIMs for detecting stress existence, the type of stress, and the severity of nitrogen stress.
3.1. Stress Existence Module
To evaluate the existence of stress, each of the six ML algorithms was run separately with each of the nine types of input sets. The obtained results were then organized and summarized based on the test accuracy (%) into a radar chart (Figure 4). In Figure 4, the point farthest from the center represents 100% test accuracy, and the point closest to the center represents an accuracy near zero.
The MLP algorithms using RGB input achieved a 96.67% accuracy in detecting stress existence. With the same input, the DT and LDA algorithms were the least accurate, below 90%. Using RGBCC input, SVM achieved 91.85% accuracy, while DT and LDA had the lowest accuracies. Overall, the MLP algorithm often performed the best with all inputs, while DT and LDA frequently were least accurate. The superior performance of the MLP models can be attributed to their ability to capture nonlinear and complex interactions among RGB spectral bands, which are typical of subtle stress-induced variations in plant canopies. This aligns with previous findings that artificial neural networks and ensemble methods outperform linear models in predicting vegetation responses to stress due to their higher capacity for nonlinear feature mapping [47,60]. In contrast, linear classifiers such as LDA and rule-based models like DT were less effective because overlapping reflectance patterns in the visible spectrum make linear separations unreliable [28,61]. The MLP’s iterative learning and distribution-free structure also enhance flexibility and generalization, allowing it to better adapt to varying spectral–structural relationships in stressed versus non-stressed canopies.
Figure 5 shows the confusion matrix of the most accurate MLIS for detecting stress existence, MLP with RGB input. Most of the predicted classification labels of non-stress and stress matched the actual true labels of the tested boxed crops. The 51 true positives, 9 false negatives, 0 false positives, and 210 true negatives correspond to correctly classifying 85% of non-stress and all stressed boxed crops in the test data.
The statistical indices in Table 3 confirm that these models also had nearly 97% precision, recall, and F1-score for both stress and non-stress categories, indicating highly reliable detection. Notably, this study investigates stress presence or absence, which [16] did not.
3.2. Detecting the Type of Stress and Its Severity
3.2.1. Water Stress Module
The water stress detection module was less accurate than the stress existence detection module, and the resulting Figure 6a radar plot spans a wider range of radii. The most accurate MLISs had RGBCC input with the RF model (95.93%); RGB with MLP (95.56%); and RGB with RF (95.18%). The least accurate MLISs were HOG with DT (67.41%), SIFT with SGDC (68.89%), and CED with DT (72.59%).
Figure 7a displays the confusion matrix for the most accurate water stress detection MLIS (RF with RGBCC input). This model correctly classified 95 out of 104 instances of insufficient water (stressed condition) and 164 out of 166 instances of sufficient water (non-stressed condition).
Table 4 summarizes the statistical parameters of the most accurate MLIS for the water stress module, obtained using RGBCC input. The 95.93% test accuracy was lower than the 97.62% achieved by Khanna et al. [16], using Subspace KNN and RUSBoosted Trees algorithms with a 55-dimensional input set, including height, canopy cover, reflectance, and hyperspectral bands. Note the relatively comparable accuracy of the much simpler RGB-based MLIM approach presented here.
For water stress, changes in Green reflectance occur but are often accompanied by canopy structural alterations, such as leaf rolling, wilting, and fractional canopy cover reduction, which influence both pigment absorption and scattering. Studies have reported that canopy-based features like the fractional canopy cover are strongly correlated with soil water availability and improve stress classification when combined with spectral data [16,60,62,63]. The incorporation of FCC in this study improved water-stress detection, especially for RF and MLP models, confirming that combining structural and spectral cues enhances overall model robustness. Using FCC and integrating it with the RGB image (RGBCC) allowed for the detection of water stress with higher accuracy by a simpler machine (RF) and with lower computational cost compared to an MLP.
3.2.2. Nitrogen Stress
Two types of classifiers were designed for detecting nitrogen stress. The first functioned to identify the severity of nitrogen stress at three levels (sufficient, nitrogen stress, and more nitrogen stress), resulting from high, medium, and low nitrogen inputs, respectively, as was previously examined by Khanna et al. [16]. The second classifier type merely tried to detect nitrogen stress.
For three-level nitrogen stress detection,
Figure 6b shows that the highest accuracy was achieved by RGB with MLP (94.44%), while the weakest accuracy was by RGBWB with MLP (70.74%) (
Figure 6b). Also, for the three levels,
Figure 7b shows that RGB with MLP was the most accurate classifier, correctly identifying 95% of non-stressed cases (58 out of 61), 101 out of 107 more nitrogen stress cases, and 95 out of 102 nitrogen stress cases. Note that the RGB with MLP model’s 94.44% accuracy was 13.49% higher than the best 80.95% accuracy obtained by Khanna et al. [
16] for the same three-level problem (using an SVM algorithm), with a complex 55-dimensional input set.
For two-level nitrogen stress detection, RGB with MLP achieved the highest accuracy at 95.93%, followed by RGB with RF, at 89.51%. EGI with DT was the least accurate, with an accuracy of 67.9% (
Figure 6c). This new two-level detection machine improved nitrogen stress detection accuracy by 1.49 percentage points compared to the three-level detection model. For two-level nitrogen stress detection, RGB with MLP was the most accurate classifier, correctly identifying 108 of 112 stressed samples and 47 of 50 non-stressed samples. This model categorized nitrogen levels into two groups: sufficient (high nitrogen input) and insufficient (medium and low nitrogen inputs) (
Figure 7c).
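The collapse from three nitrogen levels to two amounts to a simple relabeling before training, which can be sketched as follows. The features, labels, and MLP hyperparameters below are placeholders for illustration, not the configuration used in the study.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Three-level labels: 0 = sufficient (high N input), 1 = nitrogen stress
# (medium N input), 2 = more nitrogen stress (low N input).
rng = np.random.default_rng(1)
X = rng.random((270, 3))            # e.g., mean R, G, B per image (placeholder)
y3 = rng.integers(0, 3, size=270)   # placeholder three-level labels

# Collapse to two levels: sufficient (0) vs. insufficient (1 or 2).
y2 = (y3 > 0).astype(int)

# Train a small MLP on the binary labels (architecture is illustrative).
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=1).fit(X, y2)
```

Merging the medium- and low-nitrogen classes reduces the decision boundary to a single sufficient/insufficient split, which is one plausible explanation for the accuracy gain over the three-level model.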
For nitrogen stress, the high accuracy of RF and SVM using only the Green band reflects the strong sensitivity of Green reflectance to chlorophyll concentration, which directly depends on nitrogen availability. Nitrogen deficiency leads to a measurable reduction in leaf chlorophyll, resulting in increased Green reflectance before visible yellowing or senescence occurs [
64,
65,
66]. Burns et al. (2022) showed that an index derived from the Green spectral region correlated strongly with nitrogen content and was effective for the early diagnosis of nitrogen stress across crops [
66].
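A Green-band-only input of the kind that performed well with RF and SVM can be sketched as simple summary statistics of the G channel. The specific statistics (mean and standard deviation) and the synthetic data below are illustrative assumptions; the study's exact Green-band feature definition is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def green_band_features(rgb):
    """Green-only input: summary statistics of the normalized G channel
    (mean and standard deviation are illustrative choices)."""
    g = rgb[..., 1].astype(float) / 255.0
    return np.array([g.mean(), g.std()])

# Synthetic example with placeholder nitrogen-stress labels.
rng = np.random.default_rng(2)
images = rng.integers(0, 256, size=(30, 16, 16, 3))
X = np.array([green_band_features(img) for img in images])
y = rng.integers(0, 2, size=30)
clf = RandomForestClassifier(n_estimators=50, random_state=2).fit(X, y)
```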
3.3. Execution Time and Accuracy Comparison
Figure 8 shows the average computation time used by the MLISs for the above classifications. MLISs employing the MLP algorithm were the most accurate but also required the most computation time. SGDC was the fastest algorithm. Most algorithms, other than MLP, executed within about one to ten seconds. Given the slight accuracy differences, faster algorithms such as SGDC can be substituted for MLP in some stress treatments.
When considering just one MLA, different input sets required different execution times. The HOG input required the least execution time, followed by the SIFT input. The RGBWB input required the longest execution times, followed closely by RGBCC.
Evaluation of MLIS test results shows a clear conflict between the processing execution time and the accuracy of the results. To address that conflict using the Nash method, weights were applied to prioritize the two objectives involving accuracy and time. For example, the weight pairs of 50%:50%, 30%:70%, and 70%:30%, respectively, represented equal weight for both objectives; 30% weight for accuracy and 70% for speed; and 70% for accuracy and 30% for speed.
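One way to express this weighted trade-off is a Nash-product style score over normalized accuracy and speed utilities, as sketched below. The scoring formula, the speed normalization, and the candidate accuracy/runtime values are illustrative assumptions; the paper's exact Nash formulation is not reproduced here.

```python
import numpy as np

def nash_score(accuracy, time_s, w_acc, w_time, t_max):
    """Weighted Nash-product style score (illustrative formulation).
    Accuracy is a utility in [0, 1]; speed utility = 1 - time / t_max."""
    speed = 1.0 - time_s / t_max
    return (accuracy ** w_acc) * (speed ** w_time)

# Hypothetical (input set, accuracy, runtime in seconds) results for one MLA.
candidates = [("RGB", 0.96, 12.0), ("HOG", 0.88, 1.0), ("RGBWB", 0.90, 20.0)]
t_max = 25.0

# Evaluate the three weighting scenarios described above.
for w_acc, w_time in [(0.5, 0.5), (0.3, 0.7), (0.7, 0.3)]:
    best = max(candidates,
               key=lambda c: nash_score(c[1], c[2], w_acc, w_time, t_max))
    print(f"weights acc={w_acc}, time={w_time}: best input = {best[0]}")
```

Repeating this scoring once per MLA (step one) and once per input set (step two) reproduces the two-step selection procedure described below.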
The first Nash method step was to determine the best input set for each MLA. Three optimizations were run using the above weights for accuracy and cost.
Table 5 shows the resulting most cost-effective input sets.
The second step was to determine the best MLAs for each input set, and three more optimizations were performed, using the same weights as above. Then, for each input set, the most cost-effective machines were determined (
Table 6). As shown in
Table 5, when considering both time and accuracy equally, the MLIS combinations that stood out were RGB for MLP; RGB for SVM; RGBCC for RF; RGBWB for DT; RGB for LDA; and RGB for SGDC.
If time was prioritized, the results were similar, except that EGI for MLP became the most cost-effective MLA. This indicates that the time factor had little impact on determining the most cost-effective model. When accuracy was prioritized, the best combinations were RGB for MLP; RGB for SVM; RGBCC for RF; RGBWB for DT; RGB for LDA; and RGB for SGDC. Notably, MLP consistently performed well across all three weighting scenarios.
For each input set, the most cost-effective machine was independent of the time and accuracy weights. The combinations of RGB and RF, RGBCC and RF, EGI and SVM, EGBI and SVM, HOG and SGDC, SIFT and RF, CED and SVM, RGBWB and SVM, and Green and RF were the most economical.
The analysis of execution time and accuracy revealed a clear trade-off between computational efficiency and predictive precision. The MLP models achieved the highest accuracies due to their multilayer structure and iterative optimization but required the longest processing times. In contrast, SGDC executed the fastest, reflecting its simpler linear optimization process, though at the expense of slightly lower accuracy. The Nash optimization results confirmed that model performance was largely stable across different weighting scenarios, indicating that RF and SVM offer a robust balance of accuracy and efficiency. The variation in input-set execution times also suggests that feature complexity directly affects computational demand, as RGBWB and RGBCC carry more information than compact descriptor inputs such as HOG or SIFT. Likewise, the Nash results showed that the RGB input remained largely stable across different weighting scenarios, demonstrating a robust balance between accuracy and efficiency.
4. Conclusions
This study demonstrated the effectiveness of using RGB-based Machine Learning Image Modules (MLIMs) for detecting and classifying stress in crops, focusing on both water and nitrogen stress. Each developed and tested MLIM employed one of nine input feature sets and one of six machine learning algorithms to analyze RGB images for detecting general stress, water-insufficiency stress, nitrogen-insufficiency stress, and two levels of nitrogen-insufficiency stress. Achieving perfect accuracy in detecting general stress due to water or nitrogen insufficiency, the MLP algorithm with RGB input outperformed more complex methods that require additional data, such as canopy cover and hyperspectral bands. An MLIM that feeds RGB input to an MLP algorithm can provide growers with a practical tool to assess crop stress and optimize water and nitrogen use with minimal technological overhead.
Regarding water stress detection, the RF algorithm with RGBCC input achieved an accuracy within two percent of that achieved by previously published models with much more complex and costly input, such as the 55-dimensional dataset of [
16]. This underscores the potential of RGB images as a cost-effective, readily accessible data source. In terms of nitrogen stress detection, other machine learning algorithms such as RF, SVM, and SGDC also performed well, particularly when considering both accuracy and computational time. The MLIMs that best balanced accuracy and execution time included RF with RGB, SVM with RGB, and RF with RGBCC. Although MLP offered the highest accuracy, it required much more computation time. Depending on the priority (time minimization vs. accuracy maximization), other algorithms such as SGDC and SVM might be preferable for some real-time applications. Overall, this research highlights the feasibility of using simple RGB images combined with classical machine learning techniques to offer robust and efficient crop stress monitoring solutions and to improve agricultural productivity and sustainability. The results confirm that visible image-based MLP models effectively integrate both spectral (chlorophyll-related) and structural (canopy-related) information, providing a reliable and data-efficient framework for identifying and differentiating abiotic stresses in sugar beet crops.
Future research should explore deep learning approaches with larger datasets, particularly deployed on devices such as drones, to detect crop stress conditions and extend the applicability of our models. Additionally, incorporating images from farms is suggested for future work to make the model more robust for applications outside controlled greenhouse environments.