Diagnosis of Cotton Nitrogen Nutrient Levels Using Ensemble MobileNetV2FC, ResNet101FC, and DenseNet121FC

Nitrogen plays a crucial role in cotton growth, making the precise diagnosis of its nutrition levels vital for the scientific and rational application of fertilizers. Addressing this need, our study introduced an EMRDFC-based diagnosis model specifically for cotton nitrogen nutrition levels. In our field experiments, cotton was subjected to five different nitrogen application rates. To enhance the diagnostic capabilities of our model, we employed ResNet101, MobileNetV2


Introduction
Throughout its growth, nitrogen plays a crucial role in cotton development, significantly influencing the synthesis of amino acids, proteins, and chlorophyll in cotton plants [1]. These compounds collectively determine the yield and quality of the cotton harvest. In agricultural practices, the liberal use of nitrogen fertilizers is known to substantially boost cotton yields. However, the prevailing method of applying nitrogen fertilizer predominantly depends on manual judgment. In an effort to maximize yields, farmers often apply nitrogen fertilizers excessively, leading to unintended consequences such as reduced crop yields and environmental degradation. Accurately assessing the specific nitrogen status of cotton is therefore essential. This accuracy not only enables the precise application of the optimal amount of nitrogen fertilizer but also contributes to increased yields and cost savings.
The conventional method for determining the nitrogen nutrient level in plant leaves primarily involves chemical detection [2]. However, this approach necessitates the destructive sampling of cotton leaves, which is both time-consuming and costly. Consequently, there is an increasing demand for intelligent diagnostic technologies that enable growers to quickly ascertain the nitrogen levels in cotton plants without resorting to destructive methods. With advancements in science and technology, optical imaging equipment has become widely accessible in daily life, offering a convenient and timely means to capture optical images. Leveraging these images along with computer vision technology facilitates the rapid acquisition of plant nutrient levels. This approach is advantageous as it is non-destructive, swift, and highly automated. Although machine learning methods have seen extensive application in agriculture [3][4][5], challenges arise due to the complex background of agricultural images and issues like leaf occlusion, leading to incomplete image information extraction. This limitation hinders accurate and efficient determination of crop conditions. For instance, Lee et al. [6] obtained rice canopy images under varying nitrogen treatments using a digital camera. They extracted features and applied multiple linear regression to assess nitrogen deficiency in rice, achieving a final accuracy rate of 75%. However, traditional machine learning methods require manual extraction of specific data features, limiting the generality and transferability of these methods across different tasks.
Convolutional Neural Networks (CNNs) have become a fundamental tool in agricultural applications, as detailed in [7]. They are notably utilized for diagnosing pest and disease occurrences and predicting yields in various agricultural products, including fruits [8,9], vegetables [10,11], and grains [12,13]. The extensive adoption of CNNs in these areas is attributed to their inherent abilities for automated feature extraction, high accuracy, and robustness. With the advancement of image classification techniques, CNNs have opened up new possibilities for identifying nutrient deficiencies in crops, leading to numerous studies on plant nutritional status. For instance, Han et al. [14] employed multiple deep convolutional neural networks and transfer learning methods to determine nutrient deficiency statuses in black bean leaves. Their results showed that the ResNet50 model achieved the highest test accuracy of 65.44%. Similarly, Cevallos et al. [15] compiled a dataset of tomato leaf images and used CNNs to identify deficiency types in tomatoes, achieving a notable accuracy of 86.57%. Bahtiar et al. [16] utilized various deep learning models to detect four different nutritional deficiencies in chili peppers, finding that the RCNN model performed best with a classification accuracy of 82.61%. Additionally, Uchechi et al. [17] investigated nutritional deficiencies in grapes, combining machine learning and deep learning models to differentiate between potassium-deficient and healthy grape leaves. Using feature extraction and SVM, they achieved an accuracy of 66.7%, while the application of the ResNet50 model increased accuracy to 80%.
CNNs have been extensively applied in cotton research. For instance, Caldeira et al. [18] utilized GoogleNet and ResNet50 for deep learning-based recognition of cotton leaf health conditions, achieving accuracies of 86.6% and 89.2%, respectively. Zekiwos et al. [19] developed a deep learning model for the detection of common diseases and pests in cotton, achieving promising results. Xu et al. [20] employed YOLOv5 to identify the severity of cotton aphid damage during the cotton seedling stage and implemented it on mobile devices for precision pest management. Islam et al. [21] enhanced the early detection of cotton leaf diseases by fine-tuning various deep learning models, facilitating early treatment and improving cotton yield. Kou et al. [22] predicted cotton nitrogen content by manually extracting features from UAV RGB images and feeding them into a CNN model used to diagnose cotton nitrogen levels, achieving good results, although the model's accuracy requires further improvement. He et al. [23] developed a grading model for cotton leaf nitrogen deficiency levels using deep learning technology and improved the ResNeXt model to enhance its diagnostic effectiveness, yet there is still room for performance improvement.
The complexity of crop elemental nutrient levels and deficiency types, compounded by the confounding factors of pest and disease symptoms, presents a significant challenge when it comes to accurately classifying crop nutritional status. To address this, researchers have turned to ensemble learning approaches with increasing frequency [24]. These approaches involve the combination of multiple models for prediction, utilizing various integration strategies. This technique draws on the strengths of different models to overcome the limitations of individual models, thereby improving the classification accuracy. For example, Tran et al. [25] employed a combination of Inception-ResNet V2, an autoencoder, and an ensemble model that averaged the predictions of both, attaining accuracies of 87.27%, 79.09%, and 91%, respectively, on their test dataset. Notably, the ensemble model showed a 3.73 percentage point increase in accuracy over the best-performing base model. Similarly, Talukder et al. [26] used an ensemble model to identify nutrient deficiencies in early-stage rice. They developed a diagnosis model using pre-trained MobileNet, DenseNet121, and DenseNet169 models, achieving a maximum single-model accuracy of 93.33% and an average ensemble accuracy of 96.67% for the ternary model. In another instance, Talukder et al. [27] developed a Deep Ensemble Convolutional Neural Network (DECNN) model for diagnosing deficiency types in rice. They enhanced the base model by adding layers such as pooling and dropout, which improved the model's accuracy. Using a weighted averaging integration method, an ensemble of improved DenseNet169, DenseNet201, and InceptionV3 achieved a test accuracy of 98.33%. Furthermore, Yang et al. [28] created a Stacking integrated convolutional neural network model for identifying nitrogen nutrient levels in rice. Using DenseNet121, ResNet50, and Inception-ResNet V2 as base learners, they achieved an optimal diagnostic accuracy of 96.41% with DenseNet121, which increased to 98.10% after integrating these models using the Stacking algorithm. These integrative methodologies highlight the effectiveness of combining diverse models to enhance accuracy when diagnosing crop nutritional status, effectively navigating the complexities associated with various nutrient levels and deficiency types in agricultural scenarios.
To achieve high-precision diagnosis of cotton nitrogen nutrient levels, an ensemble strategy integrating the predictions of multiple individual models was employed, effectively reducing the risk of overfitting and enhancing the model's generalization capability and robustness towards unknown data. This approach, by aggregating the diverse insights and patterns recognized by different models, not only mitigates the potential biases inherent in single models but also leverages the specific expertise of different models in processing particular data types. Consequently, this method has demonstrated significant advantages in improving the accuracy of diagnosing cotton nitrogen nutrient levels.
In this paper, we propose an EMRDFC-based diagnosis model for accurately determining cotton nitrogen nutrition levels. The EMRDFC framework is constructed as an ensemble of three enhanced models (ResNet101FC, MobileNetV2FC, and DenseNet121FC), each based on its foundational architecture: ResNet101, MobileNetV2, and DenseNet121, respectively. The nomenclature 'EMRDFC' derives from the initialism of the key terms involved. Modifications were first applied to MobileNetV2, ResNet101, and DenseNet121 to enhance their feature extraction capabilities. Subsequently, the improved models were integrated using different ensemble strategies to construct ensemble models. A comparative analysis was conducted to examine the differences in performance between the base models and the ensemble models in diagnosing cotton nitrogen levels, as well as the impact of various ensemble strategies on the performance of the ensemble models. This approach facilitated the development of a high-performance model for diagnosing cotton nitrogen nutrient levels, offering a solution for detecting nitrogen levels in cotton under complex environmental conditions.
This study demonstrates the application of deep learning models for the accurate and rapid diagnosis of cotton nitrogen nutrient status, providing a scientific foundation for precision fertilization based on actual crop requirements. This approach optimizes fertilizer use efficiency and enhances crop yield. The utilization of this technology aids in avoiding the overuse or insufficient use of nitrogen fertilizers, thereby reducing environmental issues such as soil degradation and eutrophication of water bodies and promoting environmentally friendly and sustainable agricultural production. Furthermore, precise nitrogen nutrient management, by minimizing unnecessary fertilizer inputs, reduces agricultural production costs; diminishes the incidence of diseases and pesticide use due to nutrient deficiencies, further lowering the costs associated with agricultural production; and simultaneously enhances the level of intelligence in agricultural production.
Section 2 presents the cotton nitrogen nutrient level dataset; the foundational models ResNet101, MobileNetV2, and DenseNet121, along with their enhancements; the employed ensemble strategies; and the proposed EMRDFC model. Section 3 examines the impact of these improvements on model performance and conducts a comparative analysis of model performance under different ensemble strategies. Section 4 analyzes and discusses the distinctions from existing research. Section 5 concludes the paper.

Data Acquisition and Preprocessing
The field experiment on cotton cultivation was conducted at the Experimental Station of the Xinjiang Academy of Agricultural Reclamation, located in Shihezi City, Xinjiang Uygur Autonomous Region (XUAR). The experiment established five different nitrogen application levels: 0 kg·ha⁻¹, 72 kg·ha⁻¹, 144 kg·ha⁻¹, 192 kg·ha⁻¹, and 240 kg·ha⁻¹, with 240 kg·ha⁻¹ being the standard nitrogen application rate. The cotton variety used in this study was 'Jinken 1161'. During the early flowering stage, the top 4 uniform leaves on the primary stem of the cotton plants, termed functional leaves, were identified as particularly sensitive to nitrogen supply, providing an accurate reflection of the cotton's nitrogen nutritional status, as noted in [29]. At the bolling stage of the main cotton shoot, images of these functional leaves and their canopy were captured using a Huawei P40 Pro camera, resulting in a collection of 1156 cotton canopy images and 231 functional leaf images. The device is equipped with a 50-megapixel rear ultra-sensitive camera featuring wide-angle capabilities, an f/1.9 aperture, and Optical Image Stabilization (OIS), among others. For the shooting mode, Professional mode was selected, with parameter settings adjusted as follows. To minimize color discrepancies caused by changes in light intensity, the white balance was set to automatic mode under sunny conditions and to 4800 K under cloudy conditions. The focus mode was set to manual. Exposure compensation was adjusted based on the strength of the light, with settings of −2 for sunny and +1 for cloudy conditions. The shutter speed was set to automatic. To avoid excessive noise, the camera's sensitivity (ISO) was set to 100 on cloudy days and to automatic for Experiment II. During shooting, the distance between the camera lens and the leaf was maintained within a range of 20 to 50 cm, with the angle between the lens and the leaf surface controlled within a range of 45 to 90 degrees. In parallel, selected plants were destructively sampled, and the Kjeldahl method [30] was used to determine the actual nitrogen levels in these samples, which were then used as data labels. Examples of samples under five nitrogen nutrition levels are illustrated in Figure 1.
For data captured under dimly lit, overcast conditions, feature enhancement was applied by converting RGB color space images to HIS color space. Subsequently, a gamma function was utilized to correct overexposure or underexposure in the images, thereby enhancing details. To facilitate the training of deep learning models, the results of grayscale enhancement in HIS space were converted back to RGB images. This conversion employed a contrast-limited adaptive histogram equalization (CLAHE) algorithm to prevent image distortion. This process was designed to neutralize potential light interference. To prevent model overfitting, the dataset was expanded using data augmentation techniques such as random flipping, rotating, and mirroring. The expanded dataset, with the quantity of each category post-augmentation, is detailed in Table 1. Following augmentation, the dataset was divided into training, validation, and test sets in an 8:1:1 ratio. This distribution was aimed at optimizing the training process and facilitating effective model evaluation.
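As a rough illustration of two of the steps described above, the following NumPy sketch applies a power-law (gamma) correction to an intensity channel and splits sample indices in an 8:1:1 ratio. It is a minimal sketch and not the authors' pipeline: the function names `gamma_correct` and `split_indices`, the gamma value, and the random seed are assumptions made for illustration, and the color space conversion and CLAHE steps are omitted.

```python
import numpy as np


def gamma_correct(intensity, gamma):
    """Apply power-law (gamma) correction to an intensity channel scaled to [0, 1].

    gamma < 1 brightens underexposed images; gamma > 1 darkens overexposed ones.
    """
    intensity = np.clip(np.asarray(intensity, dtype=np.float64), 0.0, 1.0)
    return intensity ** gamma


def split_indices(n_samples, ratios=(8, 1, 1), seed=0):
    """Shuffle sample indices and split them into train/validation/test sets."""
    rng = np.random.default_rng(seed)  # seed is an illustrative assumption
    order = rng.permutation(n_samples)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # any remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    dark_pixels = np.array([0.04, 0.25, 0.49])
    print(gamma_correct(dark_pixels, 0.5))  # gamma = 0.5 brightens dark pixels
    train, val, test = split_indices(1000)
    print(len(train), len(val), len(test))
```

In practice the gamma value would be chosen per image or per lighting condition; the single fixed value here only shows the direction of the correction.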

Development of a Diagnosis Model for Assessing Nitrogen Nutrient Levels in Cotton
In the domain of deep learning, the three models, ResNet101, MobileNetV2, and DenseNet121, each demonstrate unique advantages when handling complex image recognition tasks. ResNet101, with its 101-layer deep architecture and residual learning mechanism, effectively addresses the vanishing gradient problem and enhances the learning capability for deep features, making it suitable for recognizing subtle image differences. MobileNetV2, through its inverted residual structure and depthwise separable convolutions, achieves a lightweight model, ensuring efficient performance on mobile and embedded devices. DenseNet121, with its densely connected network structure, improves feature utilization efficiency and model accuracy. The combination of these three models, leveraging their complementarity, enhances overall performance and robustness in complex tasks such as precise diagnosis of cotton nitrogen nutrient levels. Their application in image recognition and classification tasks has been extensively validated, proving their effectiveness and reliability in various fields, including agricultural image processing.
In this study, the base models MobileNetV2, ResNet101, and DenseNet121 were selected with careful consideration. To boost their feature extraction capabilities for images showing different nitrogen nutrition levels in cotton, the CBAM (Convolutional Block Attention Module) was strategically integrated into these models. The primary goal of this integration was to refine the models' effectiveness by reducing background noise and emphasizing key indicators of nitrogen deficiency in cotton leaves, thereby enhancing the models' precision and accuracy. Additionally, the dataset exhibited a slight imbalance in category distribution. To counter this imbalance, the Focal loss function was introduced, specifically to address the uneven representation of cotton nitrogen levels in the dataset. Following this enhancement, the models underwent extensive training and validation. Recognizing the inherent limitations of single models in achieving optimal results, our research adopted a strategy of model integration. This involved combining the enhanced models using various integration strategies, aiming to significantly improve diagnostic accuracy in determining cotton nitrogen nutrition levels. The ultimate objective was to achieve a more precise classification of the different nitrogen levels present in cotton plants.

Basic Network Model
The general understanding that increased depth in deep learning models leads to improved performance is valid to a certain degree. However, with increasing network depth, there is often a notable reduction in model accuracy after a specific threshold is surpassed, presenting challenges in effectively training deeper models. Addressing this issue, He et al. [31] introduced the ResNet model, which is distinguished by its innovative residual structure. This structure incorporates 'shortcut connections', enabling a smoother information flow across layers and thus alleviating the problem of network degradation. As a result, ResNet facilitates the training of exceedingly deep models, successfully accommodating hundreds or even thousands of layers. The ResNet series includes several versions, each with unique benefits. Considering the advantages of deep-layered models and the complexities involved in diagnosing cotton nitrogen nutrition levels, this study opts for ResNet101 as the base model.
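The shortcut connection can be illustrated with a toy residual block. The sketch below is an assumption-laden simplification rather than the ResNet101 implementation: matrix products stand in for the convolutions, batch normalization is omitted, and the names `relu` and `residual_block` are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Compute y = ReLU(F(x) + x), where F is a two-layer transform."""
    fx = relu(x @ w1) @ w2  # residual branch F(x); stands in for the conv layers
    return relu(fx + x)     # shortcut connection adds the input back


if __name__ == "__main__":
    x = np.array([[1.0, 2.0, 3.0]])
    zeros = np.zeros((3, 3))
    # With an all-zero residual branch the block passes a non-negative input
    # through unchanged, which is why extra blocks need not degrade the signal.
    print(residual_block(x, zeros, zeros))
```

Because the block only has to learn the residual F(x), a layer that has nothing useful to add can fall back to the identity mapping.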
DenseNet, developed by Huang et al. [32], represents a notable advance in neural network architecture. This model optimizes the utilization of network features by densely connecting each layer in the network with every other layer. A key feature of DenseNet is that each layer receives inputs from all previous layers, which effectively tackles the problem of vanishing gradients and simultaneously reduces the overall number of parameters. This architectural design ensures remarkable performance and represents a significant breakthrough in neural network development. In light of the experimental platform employed in this study and the complexity of the task, we chose DenseNet121 as our base model.
MobileNetV2 [33] marks a significant advancement over its predecessor, MobileNetV1 [34]. As a lightweight convolutional neural network, similar to its forerunner, it distinguishes itself through the incorporation of an inverted residual structure and a linear bottleneck, both aimed at enhancing the model's overall performance. The inverted residual structure is instrumental in facilitating efficient feature information transfer between layers within the network. This feature is key to substantially improving the model's capability for feature extraction. Additionally, the linear bottleneck is designed to reduce the loss of feature information that typically occurs due to the ReLU activation function, thereby increasing the model's accuracy. These architectural improvements collectively account for MobileNetV2's enhanced performance relative to MobileNetV1.

CBAM Module
In this study, the cotton leaf images collected under various nitrogen levels were obtained in natural settings, where the similarity across different categories of images poses challenges. The extraction of features is susceptible to interference from irrelevant factors, hindering the model's ability to learn essential features and leading to suboptimal performance. The attention mechanism enables the model to focus on critical features, reducing the noise from background and environmental factors, thereby enhancing model performance. Therefore, to augment the feature extraction capability of the base models, the Convolutional Block Attention Module (CBAM) [35] was incorporated at strategic positions within the base models, aiming to improve their ability to recognize cotton nitrogen nutrient levels. The structure of CBAM is depicted in Figure 2. By incorporating the CBAM attention mechanism, the model can focus more on regions within the image that are crucial for the classification task. This is particularly critical for the precise diagnosis of cotton nitrogen nutrient levels, as different nutrient levels may manifest as subtle differences in the images.
The CBAM (Convolutional Block Attention Module) functions as a hybrid attention mechanism, encompassing both the Channel Attention Module (CAM) and the Spatial Attention Module (SAM). The features F extracted from a convolutional layer of the model serve as inputs to the CBAM, where they first pass through the channel attention mechanism to yield F_1, and subsequently through the spatial attention mechanism to produce F_2. The specific calculation formulas for these processes are as follows:
$$F_1 = M_c(F) \otimes F$$
$$F_2 = M_s(F_1) \otimes F_1$$
where $M_c(F)$ represents the channel module weights, $M_s(F_1)$ represents the spatial module weights, and $\otimes$ denotes element-wise multiplication, with the attention weights broadcast across the remaining dimensions.
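The two-stage computation can be sketched in NumPy for a single feature map of shape (C, H, W). This is a minimal sketch that follows the general CBAM design (a shared two-layer MLP over average- and max-pooled channel descriptors, then a k × k convolution over channel-wise average and max maps). The function names, the weight shapes, the reduction ratio, and the 7 × 7 kernel size are illustrative assumptions and are not taken from this study.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(f, w0, w1):
    """M_c(F): shared MLP applied to average- and max-pooled channel descriptors."""
    def mlp(v):
        return w1 @ np.maximum(w0 @ v, 0.0)

    avg = f.mean(axis=(1, 2))            # shape (C,)
    mx = f.max(axis=(1, 2))              # shape (C,)
    return sigmoid(mlp(avg) + mlp(mx))   # shape (C,)


def spatial_attention(f1, kernel):
    """M_s(F1): k x k convolution over channel-wise average and max maps."""
    pooled = np.stack([f1.mean(axis=0), f1.max(axis=0)])       # (2, H, W)
    pad = kernel.shape[-1] // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    windows = sliding_window_view(padded, kernel.shape[1:], axis=(1, 2))
    return sigmoid(np.einsum("chwij,cij->hw", windows, kernel))  # (H, W)


def cbam(f, w0, w1, kernel):
    """Return F2 given input features F of shape (C, H, W)."""
    f1 = channel_attention(f, w0, w1)[:, None, None] * f  # F1 = M_c(F) * F
    f2 = spatial_attention(f1, kernel)[None, :, :] * f1   # F2 = M_s(F1) * F1
    return f2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels, reduction = 8, 4            # illustrative sizes
    features = rng.normal(size=(channels, 5, 5))
    w0 = rng.normal(size=(channels // reduction, channels))
    w1 = rng.normal(size=(channels, channels // reduction))
    kernel = rng.normal(size=(2, 7, 7))
    print(cbam(features, w0, w1, kernel).shape)
```

Both attention maps lie in (0, 1), so the module rescales features rather than replacing them: channels and locations judged less informative are attenuated while the tensor shape is preserved.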

Focal Loss Function
Because cotton leaf images are difficult to obtain, the numbers of images at the different nitrogen levels are unbalanced, which lowers the model's overall accuracy when classifying images of different nitrogen levels. To alleviate this imbalance, the Focal loss function, obtained by modifying the cross-entropy loss function, is introduced [36]. Focal loss addresses class imbalance by reducing the weight of easy-to-classify samples and increasing the weight of hard-to-classify samples. This improves the model's ability to learn the characteristics of minority classes in the diagnosis of cotton nitrogen nutrient levels, thereby enhancing its overall diagnostic performance. The multi-class cross-entropy loss formula is as follows:
$$L = -\sum_{i} x_i \log(x'_i)$$
where $L$ represents the loss function, the sum runs over the categories $i$, $x_i$ refers to the sample label for category $i$, and $x'_i$ denotes the output probability following the application of the Softmax activation function.
The Focal loss function adds adjustment factors to the cross-entropy loss function, including the focusing parameter γ, which reduces the weight of the easy-to-classify samples so that the model pays more attention to the difficult-to-classify samples. The Focal loss formula is shown below:
$$FL = -\sum_{i} \alpha_i (1 - x'_i)^{\gamma} x_i \log(x'_i)$$
where $FL$ represents the focal loss function and $\alpha_i$ represents the balancing factor, which is used to balance the proportion of each category and is determined from $m_i$, the number of samples in category $i$, and $M$, the total number of samples in the dataset.
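A small numerical sketch may help show how the modulating factor changes the loss. The code below assumes one-hot labels and Softmax probabilities. The function names, the default γ = 2, and the inverse-frequency choice of α in the demonstration are illustrative assumptions and not the settings used in this study.

```python
import numpy as np


def cross_entropy(probs, labels, eps=1e-12):
    """Mean multi-class cross-entropy: -sum_i x_i * log(x'_i)."""
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(labels * np.log(probs), axis=1)))


def focal_loss(probs, labels, alpha, gamma=2.0, eps=1e-12):
    """Mean focal loss: -sum_i alpha_i * (1 - x'_i)^gamma * x_i * log(x'_i)."""
    probs = np.clip(probs, eps, 1.0)
    weight = alpha * (1.0 - probs) ** gamma  # down-weights confident predictions
    return float(np.mean(-np.sum(weight * labels * np.log(probs), axis=1)))


if __name__ == "__main__":
    labels = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    probs = np.array([[0.9, 0.05, 0.05],   # easy sample: confident and correct
                      [0.3, 0.40, 0.30]])  # hard sample: true class gets 0.3
    # Hypothetical class counts; alpha_i = 1 - m_i / M is one possible choice.
    counts = np.array([300.0, 100.0, 100.0])
    alpha = 1.0 - counts / counts.sum()
    print("cross-entropy:", cross_entropy(probs, labels))
    print("focal loss:   ", focal_loss(probs, labels, alpha))
```

With γ = 0 and all α_i = 1, the focal loss reduces to the ordinary cross-entropy; increasing γ shrinks the contribution of samples that are already classified confidently.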

Ensemble Strategy
Ensemble learning, as mentioned in [37], involves the combination of multiple base models using a variety of strategies, often leading to improved performance compared to that achieved using a single model. Due to the limitations of individual models in achieving optimal diagnostic effectiveness, the enhancement of diagnosis models for cotton nitrogen nutrient levels requires the adoption of ensemble learning techniques. In this study, we use three base models and implement three different integration strategies: simple averaging, weighted averaging, and relative majority voting. These strategies are crucial in developing an ensemble model, facilitating the evaluation and comparison of performance differences among the various ensemble models.
The simple averaging method aggregates the outputs from multiple base models and takes the average of these outputs as the final prediction. The simple average formula is shown below:
$$B(x) = \frac{1}{3} \sum_{i=1}^{3} B_i(x)$$
In this equation, $B(x)$ denotes the ensemble output, and $B_1(x)$, $B_2(x)$, and $B_3(x)$ represent the outputs from the respective base models when classifying a single image of cotton nitrogen nutrition.
The key difference between the weighted averaging method and the simple averaging method lies in the allocation of distinct weights to the various base models. This approach takes into account the unique attributes of each base model, integrating them in a way that emphasizes their relative importance. This is implemented to optimize the overall performance of the ensemble model. The formula for the weighted average method is shown below:
$$B(x) = \sum_{i=1}^{3} w_i B_i(x), \quad w_i \geq 0, \quad \sum_{i=1}^{3} w_i = 1$$
In this context, $B(x)$ denotes the ensemble output and $B_i(x)$ the output of the $i$-th base model. The weights $w_1$, $w_2$, and $w_3$ are greater than or equal to 0, and their sum is constrained to equal 1. These weights represent the relative significance attributed to the outputs of each respective base model in the weighted averaging method.
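Both averaging strategies can be expressed with one function, since simple averaging is the special case of equal weights. The sketch below is illustrative only: the function name `weighted_average`, the example probability vectors, and the example weights are assumptions and not values from this study.

```python
import numpy as np


def weighted_average(outputs, weights=None):
    """Combine base-model probability vectors.

    outputs: array-like of shape (n_models, n_classes).
    weights: non-negative values summing to 1; None gives the simple average.
    """
    outputs = np.asarray(outputs, dtype=np.float64)
    if weights is None:
        weights = np.full(len(outputs), 1.0 / len(outputs))
    weights = np.asarray(weights, dtype=np.float64)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return weights @ outputs


if __name__ == "__main__":
    # Hypothetical Softmax outputs of three base models over five nitrogen levels.
    b1 = [0.1, 0.6, 0.1, 0.1, 0.1]
    b2 = [0.1, 0.3, 0.4, 0.1, 0.1]
    b3 = [0.1, 0.2, 0.5, 0.1, 0.1]
    simple = weighted_average([b1, b2, b3])
    weighted = weighted_average([b1, b2, b3], weights=[0.2, 0.4, 0.4])
    print("simple average   -> class", int(np.argmax(simple)))
    print("weighted average -> class", int(np.argmax(weighted)))
```

In this toy case the two strategies disagree: equal weights favor the class that the first model predicts with high confidence, whereas down-weighting that model shifts the decision to the class preferred by the other two.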
Relative majority voting is a commonly used integration strategy in model ensembles, involving a collective voting process among the outputs of multiple base models. This method counts the number of base-model outputs corresponding to each category. The category receiving the highest count is then selected as the final output of the ensemble. In cases where there is a tie, with multiple categories receiving an equal number of votes, one of these categories is randomly selected to be the definitive result.
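The voting rule, including the random tie-break, can be sketched as follows. The function name `plurality_vote` and the example votes are hypothetical and serve only to illustrate the procedure described above.

```python
import numpy as np


def plurality_vote(predicted_labels, n_classes, rng=None):
    """Return the class with the most votes; ties are broken at random."""
    counts = np.bincount(predicted_labels, minlength=n_classes)
    winners = np.flatnonzero(counts == counts.max())
    if len(winners) == 1:
        return int(winners[0])
    rng = np.random.default_rng() if rng is None else rng
    return int(rng.choice(winners))  # random choice among the tied classes


if __name__ == "__main__":
    # Each entry is the class predicted by one base model for the same image.
    print(plurality_vote([2, 2, 4], n_classes=5))  # two of three models agree
    print(plurality_vote([0, 1, 3], n_classes=5))  # three-way tie, random pick
```

With three base models and five classes, a three-way tie occurs whenever all models disagree, so the tie-break rule matters more here than it would in a larger ensemble.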

EMRDFC Network Model
To improve the base models' ability to recognize images representing different nitrogen levels, the CBAM (Convolutional Block Attention Module) was strategically integrated at specific points within these models. Simultaneously, the introduction of the Focal loss function aimed to enhance overall model performance. In ResNet101, the CBAM was inserted at the beginning of each residual module. This was implemented to strengthen the model's feature representation and perception capabilities, enabling it to capture finer details and gather comprehensive global information from images with varied nitrogen levels, thereby boosting its accuracy. For DenseNet121, the CBAM was placed before each dense block. This placement was intended to improve feature selection and minimize the loss of low-dimensional feature information, thus strengthening the model's performance and generalization ability. In MobileNetV2, the CBAM was added into the inverted residual structure following the depthwise separable convolution phase. This strategic placement allowed the module to focus on channel significance and the specific identification of cotton functional leaves, facilitating the integration of global information within the model and enhancing its ability to discern varying nitrogen nutrient levels in cotton. Additionally, the Focal loss function was introduced to address the slight category imbalance in the dataset, optimizing the model's classification performance. As a result of these enhancements, the advanced models ResNet101FC, DenseNet121FC, and MobileNetV2FC were developed, representing improved versions of the original ResNet101, DenseNet121, and MobileNetV2, respectively.
To enhance the accuracy in identifying cotton nitrogen nutrition levels, this study introduces the EMRDFC model, an ensemble Convolutional Neural Network (CNN) diagnosis model. This model is built by fusing ResNet101FC, DenseNet121FC, and MobileNetV2FC using various integration strategies, aiming to improve the overall performance in assessing cotton nitrogen nutrition levels. The structural flow of the EMRDFC model is illustrated in Figure 3. Given the distinct structures and capabilities of the three individual models, they provide different perspectives when classifying the same image of cotton nitrogen nutrient levels. By integrating these diverse results through specific strategies, the diagnostic model's performance in classifying nitrogen nutrient levels in cotton is significantly enhanced. Combining the varying model perspectives in this way leads to a more comprehensive, robust, and accurate evaluation of cotton nitrogen nutrition levels.

Evaluation Metrics
In this study, the model's performance is assessed using four metrics: Accuracy (A), Precision (P), Recall (R), and the F1 score (F1). Higher values indicate better model performance. The metrics are calculated as follows:

$$A = \frac{TP + TN}{TP + TN + FP + FN}$$

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times P \times R}{P + R}$$

where TP (True Positives) denotes the number of samples correctly predicted as positive; TN (True Negatives) denotes the number of samples correctly predicted as negative; FP (False Positives) denotes the number of samples incorrectly predicted as positive despite having negative labels; and FN (False Negatives) denotes the number of samples incorrectly predicted as negative despite having positive labels.
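As a rough sketch of how these definitions apply to a multi-class problem, the code below derives per-category precision, recall, and F1 from a confusion matrix by treating each category in turn as the positive class. Overall accuracy is computed as the fraction of correctly classified samples, which is an assumption about how the accuracy definition is applied across five categories. The function name and the small 3x3 matrix are hypothetical and are not data from the study.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples with true class i predicted as class j.

    Returns (overall_accuracy, [(precision, recall, f1), ...] per class).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    report = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append((precision, recall, f1))
    return accuracy, report


# Hypothetical 3-class confusion matrix, for illustration only.
toy_cm = [
    [5, 0, 0],
    [1, 3, 1],
    [0, 1, 4],
]
toy_accuracy, toy_report = per_class_metrics(toy_cm)
```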

Experimental Platform
The specific parameters of the testbed used in the model training and validation process are shown in Table 2.

Parameter Settings
The hyper-parameters used for model training are as follows: a learning rate of 0.0001 for ResNet101 and DenseNet121 and 0.001 for MobileNetV2, a weight regularization factor of 0.0001, a momentum of 0.9, a batch size of 16, and an iteration number (epochs) of 50. The optimizer is stochastic gradient descent (SGD).
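For reference, the settings above can be collected into a small configuration, together with a one-parameter sketch of how learning rate, momentum, and weight regularization interact in a single SGD update. The dictionary layout and the function `sgd_momentum_step` are hypothetical, and the update rule shown (L2 weight decay added to the gradient, then a momentum buffer) is one common formulation assumed here, not a confirmed detail of the study's training code.

```python
# Values are those stated in the text; the layout itself is hypothetical.
TRAIN_CONFIG = {
    "optimizer": "SGD",
    "momentum": 0.9,
    "weight_decay": 1e-4,   # the "weight regularization factor"
    "batch_size": 16,
    "epochs": 50,
    "learning_rate": {
        "ResNet101": 1e-4,
        "DenseNet121": 1e-4,
        "MobileNetV2": 1e-3,
    },
}


def sgd_momentum_step(w, grad, velocity, lr, momentum, weight_decay):
    """One SGD-with-momentum update for a single scalar parameter (assumed form)."""
    g = grad + weight_decay * w          # L2 regularization term
    velocity = momentum * velocity + g   # momentum buffer
    return w - lr * velocity, velocity


new_w, new_v = sgd_momentum_step(
    w=1.0, grad=0.5, velocity=0.0,
    lr=TRAIN_CONFIG["learning_rate"]["MobileNetV2"],
    momentum=TRAIN_CONFIG["momentum"],
    weight_decay=TRAIN_CONFIG["weight_decay"],
)
```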

Basic Model Performance Analysis
Figure 4 shows the accuracy and loss curves for ResNet101, MobileNetV2, and DenseNet121 on both the training and validation sets. Throughout training, the training and validation losses decrease consistently, while accuracy increases steadily with each iteration until convergence. Around the 40th iteration, the curves begin to stabilize. Among these models, DenseNet121 achieves the highest accuracy and MobileNetV2 the lowest. The final validation accuracies of ResNet101, MobileNetV2, and DenseNet121 were 90.76%, 88.24%, and 93.08%, respectively.

Ablation Experiment
To evaluate the effectiveness of integrating the CBAM module and the Focal loss function, experiments were conducted using the same data and parameters, with the aim of examining the impact of these enhancements on the base models. The model variants are named as follows: ResNet101C, MobileNetV2C, and DenseNet121C denote models augmented with the CBAM module, while ResNet101FC, MobileNetV2FC, and DenseNet121FC denote models that incorporate both the CBAM module and the Focal loss function. The changes in accuracy and loss on the validation set before and after the enhancements are illustrated in Figure 5.
Compared to the original models, the inclusion of the CBAM module improved accuracy across all models. This improvement can be attributed to the CBAM module's ability to focus the model on the functional leaf region while suppressing interference from secondary information, such as canopy and background elements. Consequently, the model could quickly and accurately locate the region with the greatest influence on the output. Embedding the attention mechanism in both shallow and deep layers notably strengthened the model's feature extraction for assessing cotton nitrogen levels. The low-order texture information encapsulates morphological features distinctive to cotton leaves across varying nitrogen levels. The shallow attention mechanism provides a preliminary screening; however, it blocks out the functional leaves and other leaves in the canopy, hindering the mining of semantic information. In contrast, the deep attention mechanism operates at a higher semantic level, enabling precise localization of the cotton functional leaves and thereby improving the accuracy of cotton nitrogen level diagnosis. Detailed experimental findings illustrating these effects are presented in Figure 6.
As shown in Figure 6, after embedding the CBAM module, the validation accuracy of ResNet101, MobileNetV2, and DenseNet121 improved by 1.52%, 2.19%, and 2.15%, respectively, compared with the original models. This indicates that adding CBAM effectively improves the performance of the diagnosis model, although the improvement for ResNet101 is smaller than for MobileNetV2 and DenseNet121 owing to the large number of redundant parameters in ResNet101. The Focal loss function was then introduced to alleviate the category imbalance, and the validation accuracy of ResNet101FC, MobileNetV2FC, and DenseNet121FC reached 93.06%, 91.15%, and 96.01%, respectively. Relative to the CBAM-only models, this is an improvement of 0.78%, 0.72%, and 0.78%; relative to the original models, it is an improvement of 2.30%, 2.91%, and 2.93%. These results indicate that the enhancements proposed in this paper effectively improve model performance.
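The accuracy gains quoted in this section can be cross-checked with simple arithmetic. The sketch below uses only figures reported in the text (base accuracies, CBAM gains, and final FC accuracies); the intermediate CBAM-only accuracies are derived here by addition rather than quoted from the paper, and the function name is hypothetical.

```python
# Validation accuracies (%) and CBAM gains as reported in the text.
base = {"ResNet101": 90.76, "MobileNetV2": 88.24, "DenseNet121": 93.08}
cbam_gain = {"ResNet101": 1.52, "MobileNetV2": 2.19, "DenseNet121": 2.15}
fc = {"ResNet101": 93.06, "MobileNetV2": 91.15, "DenseNet121": 96.01}


def ablation_table(base, cbam_gain, fc):
    """Return {model: (cbam_only_accuracy, focal_gain, total_gain)}."""
    rows = {}
    for name in base:
        with_cbam = round(base[name] + cbam_gain[name], 2)  # derived, not quoted
        focal_gain = round(fc[name] - with_cbam, 2)
        total_gain = round(fc[name] - base[name], 2)
        rows[name] = (with_cbam, focal_gain, total_gain)
    return rows


table = ablation_table(base, cbam_gain, fc)
```

The derived Focal loss gains (0.78, 0.72, 0.78) and total gains (2.30, 2.91, 2.93) agree with the values stated above, which confirms that the reported figures are mutually consistent.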

Ensemble Model Performance Analysis
To enhance the diagnostic efficiency for cotton nitrogen nutrition levels, this study used ResNet101FC, MobileNetV2FC, and DenseNet121FC as base models and integrated them using three strategies: simple averaging, weighted averaging, and relative majority voting. Herein, v_Ensemble, a_Ensemble, and w_Ensemble denote the ensemble models constructed from the three base models using relative majority voting, simple averaging, and weighted averaging, respectively. The best performance was achieved when MobileNetV2FC, ResNet101FC, and DenseNet121FC were ensembled using the weighted averaging method with weights of 0.2, 0.3, and 0.5, respectively.
Precision, recall, and F1 score are critical metrics for evaluating the efficacy of the different ensemble models. Figure 7 presents a comparative analysis of these metrics across categories on the validation set. The relative majority voting method (v_Ensemble) outperformed the simple average method (a_Ensemble) and the weighted average method (w_Ensemble) on some metrics in certain categories. The simple average method also exhibited slightly superior metrics in a few categories, most notably a 1.43% higher precision for the N_72 category compared to the weighted average method. Because each integration strategy had distinct advantages in different categories in terms of precision and recall, these two metrics alone do not give a clear overall ranking. The F1 score, which combines precision and recall, allows a more holistic evaluation. Except for the N_240 category, where the F1 score of the weighted average method was 0.68% lower than that of the simple average method, the weighted average method achieved the highest F1 score in every category. Overall, the ensemble using the weighted average method exhibited the best performance, characterized by robustness and strong generalization capabilities.
The EMRDFC model proposed in this study integrates MobileNetV2FC, ResNet101FC, and DenseNet121FC using three distinct integration strategies. Table 3 details the performance of each model on the validation set. According to the table, every ensemble model outperforms its constituent base models. Among the ensemble models, v_Ensemble exhibited the lowest accuracy, 96.88%. Nevertheless, this is still 0.87% higher than the best-performing base model, DenseNet121FC, and 5.73% higher than MobileNetV2FC. This improvement is attributed to the fact that each base model in the ensemble may focus on different features or patterns within the data. Ensemble techniques such as simple averaging capture the data characteristics more comprehensively, thereby reducing the bias inherent in single models and enhancing the generalization capability of the overall model. Furthermore, integrating diverse models allows errors to cancel one another and reduces the variance of the predictions. This helps mitigate overfitting and reinforces the response to key signals within the data, improving the accuracy and stability of the model on unseen data and enhancing the diagnostic classification accuracy of cotton nitrogen nutrient levels. In contrast, w_Ensemble achieved the highest accuracy of 98.61%, an increase of 2.6% over the best base model, DenseNet121FC, and 1.73% and 0.87% higher than v_Ensemble and a_Ensemble, respectively. Overall, the ensemble models, and w_Ensemble in particular, meet the high-precision diagnostic requirements of this field and show potential as a valuable tool for assessing cotton nitrogen nutrition levels.
To further validate the performance of the ensemble models, their confusion matrices on the test set were analyzed. The values on the diagonal of each confusion matrix represent the percentage of correct classifications for each category. As evident from Figure 8, all three ensemble models correctly classified all samples of N_0, while the other categories experienced varying degrees of misclassification. N_72 and N_144, as well as N_192 and N_240, were prone to being confused with each other because of the similarity in color, morphology, and texture features between these categories. The v_Ensemble model exhibited the weakest ability to correctly classify N_72, with only 87.50% accuracy and 12.50% of samples misclassified as N_144. Additionally, 7.46% of N_144 samples were misclassified as N_72. There were relatively fewer misclassifications between N_192 and N_240. The weaker performance of v_Ensemble is attributed to the equal weighting of each base model, which fails to reflect the contribution of each base model to the final outcome: models with lower accuracy adversely affect the overall accuracy and increase the likelihood of misclassification across categories. The a_Ensemble model misclassified 5.21% of N_72 samples and showed a weakened ability to correctly classify N_144, but improved accuracy for the N_192 and N_240 categories. Unlike the v_Ensemble model, which only considers the final predicted category, the a_Ensemble model accounts for the confidence level of each model's prediction. By averaging the output probabilities, it partially reflects each model's 'certainty' in its predictions. However, assigning equal weight to all base models does not optimize the ensemble's effect. The w_Ensemble model significantly reduced the number of misclassifications across the aforementioned four categories, achieving higher accuracy for each category. This improvement is attributed to the w_Ensemble model allocating different weights based on the performance of each base model, which provides a more comprehensive perspective from the predictions of the different models, reduces prediction uncertainty, and ultimately enhances the overall performance and accuracy of the ensemble. In conclusion, the w_Ensemble model emerged as the most effective, with the best comprehensive performance and high classification accuracy across categories, which highlights its robustness and strong generalization capabilities.
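The difference between voting on labels and averaging probabilities can be made concrete with a small sketch. The weights 0.2, 0.3, and 0.5 are those reported above for MobileNetV2FC, ResNet101FC, and DenseNet121FC; the probability vectors are hypothetical values invented for illustration, and the function names are not taken from the study's code.

```python
CLASSES = ["N_0", "N_72", "N_144", "N_192", "N_240"]
WEIGHTS = {"MobileNetV2FC": 0.2, "ResNet101FC": 0.3, "DenseNet121FC": 0.5}


def weighted_average(probs_by_model, weights):
    """Fuse per-model class probabilities using normalized model weights."""
    total_w = sum(weights[m] for m in probs_by_model)
    n = len(CLASSES)
    return [
        sum(weights[m] * probs_by_model[m][k] for m in probs_by_model) / total_w
        for k in range(n)
    ]


def predict(probs_by_model, weights):
    """Return the category with the highest fused probability."""
    fused = weighted_average(probs_by_model, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return CLASSES[best]


# Hypothetical outputs: two models lean weakly to N_72, one strongly to N_144.
example = {
    "MobileNetV2FC": [0.0, 0.55, 0.45, 0.0, 0.0],
    "ResNet101FC":   [0.0, 0.52, 0.48, 0.0, 0.0],
    "DenseNet121FC": [0.0, 0.10, 0.90, 0.0, 0.0],
}
w_label = predict(example, WEIGHTS)                # weighted averaging
a_label = predict(example, dict.fromkeys(WEIGHTS, 1.0))  # simple averaging
```

In this constructed case, a majority vote over the three predicted labels would return N_72, whereas both averaging schemes return N_144 because they take the confidence of each prediction into account; weighted averaging additionally favors the base model with the higher validation accuracy.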

Discussion
This study integrates the enhanced versions of ResNet101, MobileNetV2, and DenseNet121 using various ensemble strategies to construct a cotton nitrogen nutrient-level diagnosis model based on EMRDFC, aiming for high-precision diagnosis of cotton nitrogen levels. The findings show that the proposed EMRDFC model effectively diagnoses the nitrogen nutrient levels in cotton and offers insights for diagnosing nitrogen levels in other crops. Compared to traditional methods of plant nitrogen nutrient-level diagnosis, such as chemical and manual diagnostics [2], this method offers advantages in terms of lower cost, timeliness, and scalability for widespread application. It can assist agricultural workers in applying nitrogen fertilizers scientifically and rationally, thereby increasing crop yield.
Previous studies have diagnosed cotton's nitrogen nutrient levels through image processing techniques [38][39][40], manually analyzing the relationship between image color, texture, and other feature parameters and different levels of nitrogen nutrition in cotton to construct models for predicting cotton nitrogen content. In contrast, the EMRDFC diagnosis model proposed in this paper recognizes the nitrogen nutrient levels in cotton autonomously, without manual feature extraction or manual analysis of the relationship between image feature parameters and nitrogen nutrient levels, thereby saving researchers a significant amount of labor and resources. Similarly, traditional machine learning methods for identifying crop nitrogen nutrient levels also require manual parameter extraction [41], which is heavily influenced by subjective factors and ultimately leads to poor model performance. Although CNNs can automatically extract features with superior effectiveness [42], single models are not ideal for diagnosing crop nitrogen nutrient levels. Compared to previous research [43,44], the method presented in this paper not only overcomes the limitations of traditional approaches but also achieves high-precision recognition of crop nitrogen nutrient levels. Furthermore, existing studies [28,45] have shown that ensemble models exhibit strong learning capabilities and generalization performance when diagnosing crop nitrogen nutrient levels, which supports the effectiveness of the method proposed in this paper for accurately diagnosing cotton nitrogen nutrient levels. This contributes to enhancing the efficiency and sustainability of agricultural production and to promoting the intelligent development of agricultural production management. In conclusion, the EMRDFC model proposed in this article circumvents the subjectivity associated with manual feature extraction and the limitations of single-model performance, ultimately achieving high-precision diagnosis of cotton nitrogen nutrition levels. However, the increased complexity of the model ensemble raises the demand for computational resources.


Conclusions
This study aimed to improve the performance of diagnosis models in determining cotton nitrogen nutrition levels, using datasets of cotton imagery with varied nitrogen levels. We selected three base models, MobileNetV2, ResNet101, and DenseNet121, and enhanced them with the CBAM module and the Focal loss function. These modifications led to accuracy improvements of 2.91%, 2.30%, and 2.93%, respectively. To further advance the diagnostic capabilities for cotton nitrogen nutrition, we introduced the EMRDFC model, which combines the improved models using three distinct integration strategies. Notably, the ensemble using the weighted average approach achieved an accuracy of 98.61%, which is 2.6% higher than that of the best-performing base model. This result underscores the efficacy of the proposed method in diagnosing cotton nitrogen nutrition levels. Applying deep learning technologies to the diagnosis of cotton nitrogen nutrient levels can enhance agricultural production efficiency, optimize fertilizer utilization, reduce environmental impact, lower production costs, and promote sustainable agricultural development. Through precise analysis and decision support for fertilization, this technology aids agricultural ecological conservation, drives the digital and intelligent transformation of agriculture, and strengthens the decision-making process.
To facilitate real-time diagnosis on edge devices, future research could focus on model compression to reduce the model size and improve operational efficiency. Furthermore, the potential application of the EMRDFC model to other crops, such as cereal crops (e.g., wheat, maize) and vegetable crops (e.g., tomatoes, cucumbers), warrants investigation. This involves assessing the model's accuracy and applicability in diagnosing nitrogen nutrition levels in diverse crops. Considering the many environmental factors influencing the growth of cotton and other crops, integrating these factors into the EMRDFC model could enhance diagnostic accuracy and practicality. Most importantly, exploring how the model's diagnostic outcomes can be applied to precision fertilization and crop management, in close alignment with the realities of agricultural production, could advance the development of intelligent and precision agriculture.

Figure 4. Model training accuracy and loss value variation: (a) Accuracy changes on the model training set; (b) change in loss values on the model training set; (c) accuracy changes on the model validation set; and (d) change in loss values on the model validation set.

Figure 5. Changes in validation set accuracy and loss values before and after model improvement: (a) Change in accuracy before and after ResNet101 improvement; (b) change in loss values before and after ResNet101 improvement; (c) change in accuracy before and after MobileNetV2 improvement; (d) change in loss values before and after MobileNetV2 improvement; (e) change in accuracy before and after DenseNet121 improvement; and (f) change in loss values before and after DenseNet121 improvement.

Figure 6. Change in accuracy of validation set before and after model improvement.

Figure 7. Indicators of the ensemble model across categories: (a) Precision of the ensemble models across categories; (b) recall of the ensemble models across categories; and (c) F1 of the ensemble models across categories.

Figure 8. Indicators of the ensemble model across categories: (a) Confusion matrix for relative majority voting ensemble models; (b) confusion matrix for simple averaging ensemble models; and (c) confusion matrix for weighted average ensemble models.

Table 1. Number of categories in the dataset.

Table 3. Comparison of the performance of the models.