Maturity Prediction and Correlation Analysis of Additive-Treated Cattle and Sheep Manure Composts and Vermicomposts Using Machine Learning Algorithms

Karimi, Shno; Shariatmadari, Hossein; Shayannejad, Mohammad; Nourbakhsh, Farshid

doi:10.3390/agriculture16080834


by Shno Karimi ¹, Hossein Shariatmadari ¹,*, Mohammad Shayannejad ² and Farshid Nourbakhsh

¹ Department of Soil Science, College of Agriculture, Isfahan University of Technology, Isfahan 84156-83111, Iran

² Department of Water Science and Engineering, College of Agriculture, Isfahan University of Technology, Isfahan 84156-83111, Iran

* Author to whom correspondence should be addressed.

Agriculture 2026, 16(8), 834; https://doi.org/10.3390/agriculture16080834

Submission received: 12 September 2025 / Revised: 15 October 2025 / Accepted: 20 March 2026 / Published: 9 April 2026

(This article belongs to the Topic Applications of Artificial Intelligence Models and Spatiotemporal Data in Agriculture and the Ecological Environment)


Abstract

Accurate prediction of compost maturity is vital for ensuring quality, safety, minimum substrate weight loss and agronomic performance of compost products. In this study, eight supervised machine learning (ML) classification models including Random Forest, Logistic Regression, Decision Tree, Gaussian and Multinomial Naive Bayes, K-Nearest Neighbors, Support Vector Machine, and AdaBoost were systematically evaluated for their ability to predict compost maturity using three key indicators: cation exchange capacity (CEC), carbon to nitrogen ratio (C/N), and humic acid (HA) content. A dataset comprising 756 samples (4 composting/vermicomposting systems × 7 treatments × 9 time points × 3 replicates) was generated. To reduce replicate-induced variability and ensure robust machine learning analysis, triplicates were averaged at each time point, resulting in 252 effective observations used for model development. Pearson correlation and heatmap analysis indicated strong interdependencies among CEC, HA, total nitrogen (TN) and organic matter (OM) content, confirming their collective utility in compost maturity classification. Model performance was assessed based on classification metrics (accuracy, precision, recall, F₁-score) and regression-based error indicators, including mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and coefficient of determination (R²). Ensemble models, particularly RF and AdaBoost, showed the highest predictive accuracy (up to 0.98) and lowest error rates (e.g., MAE < 0.05, RMSE < 0.1, R² > 0.95) when predicting CEC and C/N-based maturity classes. HA-based predictions showed slightly lower precision and higher variance across models.

Keywords: compost maturity; machine learning; ensemble models; vermicomposting; classification accuracy


1. Introduction

The increasing demand for sustainable waste management and the growing awareness of soil health have intensified global interest in composting as an effective, environmentally friendly strategy to convert organic waste into stable and nutrient-rich organic fertilizers. Composting and vermicomposting are widely applied techniques for the bioconversion of livestock manure, particularly cattle and sheep manure, which are major sources of organic waste in the agricultural sector [1]. However, improper handling of manure can lead to severe environmental pollution, including the emission of greenhouse gases, nutrient leaching, and pathogen spread. Therefore, producing high-quality compost through optimized manure management strategies is essential. Maturity is a crucial parameter for determining compost quality, safety, and agronomic effectiveness. Immature compost may contain phytotoxic substances such as ammonia, volatile fatty acids, or unstable organic compounds that adversely affect plant germination and microbial activity in soils [2]. Therefore, reliable assessment of compost maturity is fundamental for ensuring safe application and marketability. Common indicators for assessing maturity include changes in organic matter content, the carbon-to-nitrogen (C/N) ratio, humification parameters, seed germination index (GI), and temperature stabilization [3].

Traditional methods for monitoring compost maturity often rely on labor-intensive and time-consuming laboratory analyses. Moreover, mechanistic and empirical models used to simulate composting dynamics frequently fail to capture the high degree of nonlinearity and heterogeneity inherent in composting processes, particularly when variable raw materials, additives, and microbial populations are involved [4]. These challenges have prompted researchers to seek more flexible, robust, and data-driven approaches for process modeling and prediction.

Machine learning (ML) has recently emerged as a powerful tool for modeling nonlinear, complex environmental processes. In contrast to traditional models that attempt to replicate mechanistic pathways, ML algorithms identify and learn patterns from historical data, allowing them to accurately predict system outcomes even when underlying mechanisms are poorly understood [5,6]. ML approaches are especially suited for composting systems where multiple interacting parameters, including temperature, moisture, aeration, additives, and microbial activity, collectively influence the composting trajectory and final maturity state [7,8]. ML algorithms can be broadly divided into regression and classification categories. While many previous studies have used regression models such as artificial neural networks (ANN), support vector regression (SVR), and random forest regression (RFR) to predict compost maturity on a continuous scale [9], classification models are more suitable in operational contexts where maturity needs to be assessed in discrete classes (e.g., immature, semi-mature, mature). Classification models offer intuitive outputs, reduce subjectivity, and can serve as reliable decision-support tools in compost management systems. Despite their potential, classification-based ML models remain underutilized in manure composting, especially in cases involving additives such as biochar, manganese dioxide, and clay. These additives significantly affect composting dynamics and maturity outcomes, altering microbial activity, moisture retention, and nutrient transformation. To address this gap, there is a pressing need to evaluate and compare the performance of various classification algorithms under different manure composting and vermicomposting scenarios [10].

In this study, we explored the applicability of eight supervised classification algorithms to predict compost maturity from process data collected during the composting and vermicomposting of cattle and sheep manure treated with different additives. The models were the Random Forest Classifier, Logistic Regression, Decision Tree Classifier, Gaussian Naïve Bayes (GNB), Multinomial Naïve Bayes (MNB), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and AdaBoost Classifier. Each algorithm represents a different approach to data classification, offering varied advantages in terms of interpretability, sensitivity to noise, computational efficiency, and resistance to overfitting [9,10]. The Random Forest and AdaBoost classifiers are ensemble methods that aggregate predictions from multiple base learners to improve accuracy and robustness. Logistic regression provides a linear baseline model that is interpretable and computationally efficient. Decision trees offer intuitive flowchart-based structures, while SVMs are particularly effective in high-dimensional spaces. The Naïve Bayes classifiers, though based on strong independence assumptions, are fast and surprisingly effective for many real-world problems, with the Multinomial variant particularly suited to datasets with positive, discrete features. Finally, KNN is a distance-based learner that classifies samples based on the majority class of their nearest neighbors.

The main objectives of this study were:

(i) to develop and compare the performance of these eight classification models in predicting compost maturity classes using manure-based composting datasets;
(ii) to validate the prediction performance of the best-performing models using experimentally measured maturity indicators (e.g., C/N ratio, CEC, HA).

By integrating advanced ML classification techniques with comprehensive composting datasets, this research aims to improve the accuracy and efficiency of maturity assessment in manure-based compost systems. The findings will contribute to better-informed decision-making in compost quality management and support the development of intelligent composting systems that minimize environmental impacts while maximizing agronomic benefits.

2. Materials and Methods

2.1. Raw Materials and Experimental Setup

Cattle and sheep manures were utilized as the primary substrates for both composting and vermicomposting processes. For composting, 5 kg (oven-dry weight basis) of homogenized manure mixtures were placed into insulated un-iolitic containers to minimize heat loss and maintain consistent thermal conditions. Vermicomposting was carried out in plastic containers of similar capacity, each inoculated with Eisenia fetida earthworms at a density of 20 individuals per kilogram of substrate, following established protocols for effective vermistabilization. All mixtures were adjusted to 70 ± 10% of their water-holding capacity (WHC) using deionized water, and moisture was maintained by periodic spraying. Aeration in compost units was ensured by manual mixing every 3–4 days. Substrate temperature was recorded daily with a digital probe thermometer, and pH was measured in 1:10 (w/v) water extracts at each sampling interval. Both composting and vermicomposting systems were maintained under ambient room temperature (25 ± 2 °C) for 120 days under aerobic conditions. To examine the effects of additives, three amendments were incorporated at the onset of the process:

(i) biochar derived from paper-pulp sludge, produced by slow pyrolysis at 360 °C and characterized by alkaline pH, a porous structure, and a surface area of approximately 85 m² g⁻¹; (ii) carbonatic fire clay (FC), obtained from a local refractory clay deposit (Esteghlal Co., Abadeh, Iran) and composed mainly of SiO₂ (60%), Al₂O₃ (27–28%), and CaO + MgO (≤1%), with OM = 5.5–7.5% and pH = 6.5–8.6; and (iii) MnO₂ (industrial grade, >95% purity). These additives were applied at inclusion rates of 5% and 10% (w/w) for biochar and FC, and 0.5% and 1% (w/w) for MnO₂. Control treatments without additives were also included [11].

2.2. Composting and Sampling Procedures

The composting and vermicomposting processes were conducted in triplicate for each treatment. Aerobic conditions in compost reactors were maintained through manual turning at regular intervals to ensure uniform oxygen distribution and microbial activity. For vermicomposting treatments, the substrate was gently stirred to avoid harming earthworms while promoting aeration. Samples (100 g) were collected from each treatment at defined time points: 0, 7, 14, 21, 30, 45, 60, 90, and 120 days after incubation began. These time intervals were selected to monitor dynamic physicochemical and biological changes during the active composting and maturation phases. Post-collection, samples were air-dried at ambient temperature, lightly ground using a wooden mallet to avoid altering chemical structure, and sieved through a 2-mm mesh to ensure consistency in particle size prior to subsequent analyses.

2.3. Maturity Assessment and Binary Classification for Machine Learning

To establish an objective maturity reference, temporal changes in compost quality indicators (CEC, C/N ratio, HA/FA ratio, and microbial respiration) were analyzed across sampling intervals. Triplicate samples were averaged at each time point to reduce replicate-induced noise, yielding 252 effective observations (9 sampling stages × 7 treatments × 4 composting systems). Stabilization was operationally defined as a relative change of <5% between consecutive intervals (days 45→60 and 60→90). Based on this criterion, more than 80% of monitored indicators reached a plateau or near-stable values by day 60 across treatments. Therefore, day 60 was selected as the operational maturity time for subsequent model development. Using this reference, compost and vermicompost samples were dichotomized into mature (1) and immature (0) classes for machine learning analysis. Specifically, for each of the three maturity indicators (CEC, HA, and C/N), samples reaching ≥80% of their day-60 reference value were labeled as mature; otherwise, immature. This binary classification served as the target output in supervised ML models, providing a biologically validated and data-driven foundation for predictive modeling. Importantly, the dataset was nearly balanced, with approximately 53% immature and 47% mature samples, ensuring reliable model training without class imbalance bias.
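The labelling rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`is_stabilized`, `label_maturity`) and the CEC values are invented, and the rule is read literally as "value ≥ 0.8 × day-60 value". How the threshold applies to an indicator that decreases over time, such as C/N, is not stated in the text and is not handled here.

```python
def relative_change(previous, current):
    """Relative change between two consecutive measurements (as a fraction)."""
    return abs(current - previous) / abs(previous)


def is_stabilized(series, days, tolerance=0.05):
    """True if the indicator changes by < tolerance over days 45->60 and 60->90."""
    by_day = dict(zip(days, series))
    return (relative_change(by_day[45], by_day[60]) < tolerance
            and relative_change(by_day[60], by_day[90]) < tolerance)


def label_maturity(series, days, reference_day=60, fraction=0.80):
    """Label a sample 1 (mature) if it reaches >= fraction of the day-60 value."""
    reference = dict(zip(days, series))[reference_day]
    return [1 if value >= fraction * reference else 0 for value in series]


# Hypothetical CEC trajectory for one treatment; the numbers are invented.
days = [0, 7, 14, 21, 30, 45, 60, 90, 120]
cec = [40.0, 45.0, 52.0, 60.0, 70.0, 88.0, 90.0, 92.0, 93.0]

stable = is_stabilized(cec, days)   # both consecutive changes are about 2%
labels = label_maturity(cec, days)  # threshold is 0.8 * 90 = 72
```

With these invented values the series counts as stabilized, and only the samples from day 45 onward exceed the 72-unit threshold.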

2.4. Machine Learning Model Selection

In this study, compost and vermicompost samples were dichotomized into mature (1) and immature (0) classes using an 80% threshold applied to three maturity indicators: cation exchange capacity (CEC), humic acid content (HA), and the carbon-to-nitrogen ratio (C/N). For each indicator, observations reaching ≥80% of the reference maturity level were labeled mature; otherwise, immature. This binary classification was used as the target output in supervised machine learning models to predict compost and vermicompost maturity under different additive treatments. To predict compost maturity with high accuracy, eight machine learning (ML) classification algorithms were implemented and evaluated. These models represent a combination of standalone and ensemble-based classifiers with varied strengths in handling complex, nonlinear, and multivariate datasets. The classification algorithms included in this study are detailed below.

2.4.1. Random Forest Classifier (RFC)

Random Forest is an ensemble learning method that builds multiple decision trees on randomly selected data subsets and combines their outputs to enhance predictive accuracy and control overfitting. It is robust to noisy data and capable of capturing nonlinear interactions among variables [12,13]. Key hyperparameters such as n_estimators, max_depth, and max_features were optimized using grid search.

2.4.2. Logistic Regression (LR)

Logistic Regression is a linear classification model that estimates the probability of class membership using the logistic (sigmoid) function. It is simple, interpretable, and well-suited for binary and multiclass classification tasks with linearly separable data [13].

2.4.3. Decision Tree Classifier (DTC)

Decision Trees split the dataset into branches based on feature thresholds, creating a tree-like model that is easy to interpret. The model uses criteria such as Gini impurity or entropy to determine the best splits. DTCs can capture nonlinear relationships but are prone to overfitting unless pruned [14].
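The two split criteria named above can be illustrated with a short sketch. The helper names and the C/N values are hypothetical and chosen only for the example; a real tree searches over many candidate thresholds and features rather than evaluating a single one.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_split_impurity(values, labels, threshold, impurity=gini_impurity):
    """Impurity after splitting on `value <= threshold`, weighted by branch size."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return sum(len(part) / n * impurity(part) for part in (left, right) if part)


# Invented example: a C/N-like feature with binary maturity labels.
cn_ratio = [28.0, 25.0, 22.0, 18.0, 15.0, 14.0]
mature = [0, 0, 0, 1, 1, 1]

parent = gini_impurity(mature)                                    # mixed node
after = weighted_split_impurity(cn_ratio, mature, threshold=20.0)  # both branches pure
```

A split is preferred when it lowers the weighted impurity relative to the parent node; here the threshold of 20 separates the two classes completely.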

2.4.4. Gaussian Naive Bayes (GNB)

The Gaussian Naive Bayes algorithm assumes that the input features follow a Gaussian distribution and that they are conditionally independent given the class label. Despite its strong independence assumption, GNB performs surprisingly well for high-dimensional and small sample size datasets [15].

2.4.5. Multinomial Naive Bayes (MNB)

MNB is particularly suitable for discrete data such as word counts or positive-valued frequency features. It assumes feature independence and uses a multinomial distribution to model the probability of each feature given a class. In this study, MNB was applied to scaled and positively transformed maturity indices [16].

2.4.6. K-Nearest Neighbors (KNN)

KNN is a non-parametric, instance-based learning algorithm that assigns class labels based on the majority vote of the k nearest neighbors in the feature space. It is highly sensitive to feature scaling and noise, but effective in capturing local patterns and decision boundaries [17].
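As a minimal sketch of the majority-vote rule, assuming Euclidean distance and features that have already been scaled to a common range (the function name and all data points are invented for illustration):

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Invented, already scaled features: (CEC, C/N); 1 = mature, 0 = immature.
train_X = [(0.10, 0.90), (0.20, 0.80), (0.30, 0.70),
           (0.80, 0.20), (0.90, 0.10), (0.85, 0.15)]
train_y = [0, 0, 0, 1, 1, 1]

pred_mature = knn_predict(train_X, train_y, (0.75, 0.25))
pred_immature = knn_predict(train_X, train_y, (0.15, 0.85))
```

Because the vote depends on raw distances, a feature with a larger numeric range would dominate the result, which is why scaling matters for this classifier.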

2.4.7. Support Vector Machine (SVM)

SVM constructs hyperplanes in high-dimensional space to separate classes with maximum margin. It is effective in handling both linear and nonlinear data through the use of kernel functions such as radial basis function (RBF), polynomial, and sigmoid. SVMs are especially suitable for small to medium datasets with clear margins of separation [18].

2.4.8. AdaBoost Classifier (ABC)

Adaptive Boosting (AdaBoost) is an ensemble learning technique that sequentially trains a series of weak learners (typically decision stumps) and adjusts their weights to focus on misclassified instances in each iteration. AdaBoost is known for its ability to reduce bias and variance simultaneously [19].
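For reference, the reweighting step can be written out explicitly. The notation below (labels y_i ∈ {−1, +1}, weak learner h_t, sample weights w_i^{(t)}, normalizing constant Z_t) follows the standard discrete AdaBoost formulation and is not taken from the article; the implementation used in the study may differ in detail.

```latex
\varepsilon_t = \sum_{i=1}^{n} w_i^{(t)} \, \mathbb{1}\!\left[ h_t(x_i) \neq y_i \right], \qquad
\alpha_t = \tfrac{1}{2} \ln\!\frac{1 - \varepsilon_t}{\varepsilon_t}

w_i^{(t+1)} = \frac{w_i^{(t)} \exp\!\left( -\alpha_t \, y_i \, h_t(x_i) \right)}{Z_t}, \qquad
H(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right)
```

Misclassified samples have y_i h_t(x_i) = −1, so their weights grow by a factor exp(α_t) before normalization, which is how later learners are steered toward the difficult cases.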

All models were trained and evaluated using cross-validation to ensure generalizability. Their performances were compared based on key classification metrics such as accuracy, precision, recall, F1-score, and area under the ROC curve (AUC-ROC). The most effective model was then selected for further interpretation and variable importance analysis.
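A comparison of this kind might be set up as follows with scikit-learn. This is a sketch under assumptions rather than the study's pipeline: the data are synthetic stand-ins, the hyperparameters are defaults or illustrative values, and the choice to place min–max scaling inside the pipeline is ours.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for 252 averaged observations (NOT the study's data):
# three non-negative features and a binary label driven mainly by the first.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(252, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=252) > 0.5).astype(int)

# Illustrative settings only; the article tuned some of these by grid search.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gaussian NB": GaussianNB(),
    "Multinomial NB": MultinomialNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
mean_accuracy = {}
for name, model in models.items():
    # Scaling sits inside the pipeline so it is refit on each training fold;
    # clip=True keeps held-out values in [0, 1], as Multinomial NB needs
    # non-negative inputs.
    pipeline = make_pipeline(MinMaxScaler(clip=True), model)
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
    mean_accuracy[name] = scores.mean()

for name, acc in sorted(mean_accuracy.items(), key=lambda item: -item[1]):
    print(f"{name:<20s} {acc:.3f}")
```

Other metrics can be obtained by changing `scoring` (for example `"f1"` or `"roc_auc"`); the numbers printed here describe the synthetic data only and say nothing about the compost dataset.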

2.5. Model Evaluation Metrics

To evaluate the classification performance of the eight machine learning (ML) models, a set of standard metrics was applied: overall accuracy (ACC), precision, recall, and F₁-score. These metrics describe model performance from complementary perspectives:

Accuracy (ACC) reflects the proportion of correct predictions among the total number of samples (Equation (1)). Precision measures the proportion of true positive predictions among all positive predictions, indicating the model’s reliability when it predicts a given class (Equation (2)). Recall (Sensitivity) is the proportion of true positive predictions among all actual positive samples, highlighting the model’s ability to detect instances of a specific class (Equation (3)). F₁-score is the harmonic mean of precision and recall, especially useful when data is imbalanced (Equation (4)).

Model training and evaluation were conducted using stratified 10-fold cross-validation to ensure robust generalizability and to mitigate the impact of data variability. The dataset was randomly split into 80% training and 20% testing sets, maintaining the original class distribution. To standardize feature scales, all input variables were normalized using the min–max scaling method, ensuring fair comparisons across algorithms [20].
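The scaling and splitting steps can be sketched without any external library. The helper names and the label counts below are assumptions chosen to mirror the reported class balance (roughly 53% immature and 47% mature among 252 observations); they are not the study's data.

```python
import random
from collections import defaultdict


def min_max_scale(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) that preserve the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Invented labels: 134 immature (0) and 118 mature (1) observations.
labels = [0] * 134 + [1] * 118
train_idx, test_idx = stratified_split(labels)

# Invented C/N values scaled to the unit interval.
scaled = min_max_scale([12.0, 18.0, 30.0])
```

A common precaution, not described in the text, is to compute the minimum and maximum on the training portion only and reuse them for the test portion, so that no information from held-out samples enters the scaling.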

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100

(1)

\text{Precision} = \frac{TP}{TP + FP} \times 100

(2)

\text{Recall} = \frac{TP}{TP + FN} \times 100

(3)

F_{1}\text{-score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}

(4)

where TP = True Positive; TN = True Negative; FP = False Positive; FN = False Negative. To further ensure dataset adequacy and minimize overfitting risk, model robustness was evaluated using two complementary approaches. First, stratified 10-fold cross-validation was performed to maintain class balance and reduce variability across folds. Second, bootstrap resampling (100 iterations) was applied to quantify prediction uncertainty and verify the stability of model accuracy and error metrics. These combined procedures confirmed that the dataset size (252 averaged observations in total) was sufficient to support reliable model training and generalization, as also reported in similar compost maturity prediction studies [10,21].
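Equations (1)–(4) and the bootstrap idea can be illustrated together. The sketch below is a simplification under stated assumptions: the predictions are invented, and the bootstrap resamples (truth, prediction) pairs from an already fitted model, whereas the study may have refitted the models on each resample.

```python
import random


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded 1 = mature, 0 = immature."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 following Equations (1)-(4), in percent."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    precision = tp / (tp + fp) * 100 if tp + fp else 0.0
    recall = tp / (tp + fn) * 100 if tp + fn else 0.0
    f1 = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def bootstrap_accuracy(y_true, y_pred, n_iterations=100, seed=0):
    """Resample pairs with replacement; return the accuracy (%) of each resample."""
    rng = random.Random(seed)
    n = len(y_true)
    results = []
    for _ in range(n_iterations):
        idx = [rng.randrange(n) for _ in range(n)]
        hits = sum(1 for i in idx if y_true[i] == y_pred[i])
        results.append(hits / n * 100)
    return results


# Invented predictions for ten samples: TP = 4, TN = 4, FP = 1, FN = 1.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]

metrics = classification_metrics(y_true, y_pred)
boot = sorted(bootstrap_accuracy(y_true, y_pred))
interval = (boot[2], boot[97])  # rough 95% percentile interval from 100 resamples
```

The spread of the bootstrap accuracies gives a rough sense of how much a reported score could vary with the particular samples drawn.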

2.6. Feature Importance Analysis

To gain interpretability and understand the underlying drivers of compost maturity classification, feature importance analysis was conducted on the trained ML models. Tree-based ensemble models such as Random Forest and AdaBoost naturally provide importance scores based on the decrease in Gini impurity or information gain contributed by each feature. In addition to model-specific importance metrics, permutation feature importance was calculated for all classifiers, providing a unified, model-agnostic assessment. This method quantifies the change in prediction error after permuting each feature, with larger increases indicating higher importance [22].
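Permutation importance is simple enough to sketch directly. In the example below the "fitted model" is a hypothetical threshold rule on the first feature and the data are randomly generated, so the output illustrates the mechanics only.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which model(row) equals the true label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted classifier: 'mature' when feature 0 exceeds 0.5."""
    return 1 if row[0] > 0.5 else 0


# Invented data: feature 0 drives the label, feature 1 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

importances = permutation_importance(toy_model, X, y)
```

Shuffling the informative feature roughly halves the accuracy, while shuffling the noise feature leaves it unchanged, which is the contrast the method relies on.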

To ensure consistency across algorithms, feature importance was also cross-validated using SHAP (Shapley additive explanations), which provides robust and model-independent attributions of predictor effects. This dual approach (permutation and SHAP) avoided scaling-related anomalies previously observed in linear models (e.g., Logistic Regression).

The results from the importance analysis were used not only to interpret the biological and chemical relevance of key predictors (e.g., humification index, C/N ratio, CEC) but also to potentially guide future compost process monitoring and optimization strategies.

The robustness and generalization of the ML models were rigorously assessed through a combination of stratified 10-fold cross-validation and bootstrap resampling (100 iterations). These approaches quantified model uncertainty and confirmed stability in both accuracy and feature importance rankings. In addition, multiple evaluation metrics were applied, including the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE), to assess the predictive performance of each model [23]. The equations used to calculate these performance metrics are as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (e_{i} - p_{i})^{2}}{\sum_{i = 1}^{n} (e_{i} - \bar{e})^{2}}, \quad \bar{e} = \frac{\sum_{i = 1}^{n} e_{i}}{n}

(5)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (e_{i} - p_{i})^{2}}{n}}

(6)

MAE = \frac{\sum_{i = 1}^{n} | p_{i} - e_{i} |}{n}

(7)

where n is the number of data points, e_i is the ith experimental value, p_i is the corresponding predicted value, and ē is the mean of the experimental values.
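As a worked check of these three error indicators, using the standard squared-residual form of R² and invented numbers (the variable names follow the e_i and p_i notation above):

```python
import math


def r_squared(e, p):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_e = sum(e) / len(e)
    ss_res = sum((ei - pi) ** 2 for ei, pi in zip(e, p))
    ss_tot = sum((ei - mean_e) ** 2 for ei in e)
    return 1 - ss_res / ss_tot


def rmse(e, p):
    """Root mean squared error."""
    return math.sqrt(sum((ei - pi) ** 2 for ei, pi in zip(e, p)) / len(e))


def mae(e, p):
    """Mean absolute error."""
    return sum(abs(pi - ei) for ei, pi in zip(e, p)) / len(e)


# Invented values: residuals are -1, 0, 1, 0, -1.
experimental = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 16.0, 19.0]

r2_value = r_squared(experimental, predicted)   # 1 - 3/40
rmse_value = rmse(experimental, predicted)      # sqrt(3/5)
mae_value = mae(experimental, predicted)        # 3/5
```

Here the residual sum of squares is 3 and the total sum of squares is 40, so R² = 0.925, RMSE ≈ 0.775, and MAE = 0.6.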

3. Results and Discussion

3.1. Correlation Between Different Parameters of Composting and Vermicomposting

The correlation heatmap (Figure 1) provides valuable insight into the relationships between various physicochemical, biochemical, and biological parameters during the composting and vermicomposting process. This type of multivariate correlation analysis helps to elucidate the underlying dynamics of organic matter transformation, microbial activity, and humification over time. Firstly, strong positive correlations (values close to +1 in dark red) are observed between key humification indicators such as HA (humic acid), FA (fulvic acid), and HR (humification ratio). This is consistent with the expectation that as humification progresses, these fractions co-accumulate. The positive correlation between time and HA/HR/FA confirms the gradual enrichment of humic substances during composting/vermicomposting, which is in line with findings by [24], who reported a steady increase in HA and HR over time in manure-biochar composts, indicating successful humification. Furthermore, the CEC (cation exchange capacity) is also positively correlated with HA and FA, reflecting the enhanced nutrient retention capacity associated with increased humified organic matter [25].

On the contrary, negative correlations are notable between cellulose, hemicellulose, and lignin and maturity indicators such as HR and HA, suggesting that as structural polysaccharides decompose, humic substances are synthesized. This supports the general notion that lignocellulosic degradation precedes and aids humification, particularly when facilitated by microbial and enzymatic activity (as shown in [26]). Similarly, weight loss is negatively correlated with lignocellulosic components, reflecting mass reduction through microbial respiration and organic matter mineralization [27].

Interestingly, microbial biomass nitrogen (MBN) and the Nm/i ratio (defined as the ratio of mineralized nitrogen to immobilized nitrogen) showed moderate positive correlations with early-stage parameters (OM, TN, OC), while demonstrating weak or negative correlations with maturity-related variables (HA, HR). For clarity, all abbreviations are defined in the legend of Figure 1, and the color gradient is explained to enhance readability. This aligns with prior research suggesting that microbial activity peaks during the active composting phase but then stabilizes or declines as substrate availability decreases and humic substances accumulate [28]. The negative correlation between the C/N ratio and maturity indicators further validates C/N as a classical compost maturity metric, supporting its inverse relationship with humification [29].

In the work by [10], a comprehensive machine learning framework was developed to identify critical maturity predictors in green waste composting. Correlation heatmaps revealed strong positive relationships between CEC and HA (r > 0.85), as well as inverse relationships between the C/N ratio and both TN and OM degradation (r < −0.7), highlighting the interdependence of organic nitrogen transformation and maturity progress. These findings underscore the importance of selecting robust parameters for predictive modeling of compost maturity. Similarly, Ref. [30] evaluated the microbial community dynamics during vermicomposting of sludge and kitchen waste. Their correlation analysis indicated a negative relationship between microbial diversity and ammonia content, suggesting that nitrogen mineralization and microbial succession are inversely linked, a trend often modulated by earthworm activity and substrate type. Their heatmap displayed clustering of parameters into functional microbial consortia linked to composting stages. Ref. [31] extended the discussion by analyzing fungal community shifts during vermicomposting. Spearman’s rank correlation showed that key plant growth-promoting biostimulants (such as indole acetic acid and gibberellins) were positively correlated with TN and lignin degradation (ρ > 0.75), indicating a biochemical linkage between nutrient mineralization and metabolite production. This supports the view that compost maturity is a function not only of physical degradation but also of microbial metabolic output. In another metagenomics-based study, Ref. [32] demonstrated stabilization trends through Pearson correlation, where microbial abundance and enzymatic activity showed strong positive associations with humification parameters (r > 0.9).
Final vermicompost samples showed significantly strengthened correlations between pH, microbial diversity, and humic substances compared to compost-only treatments, indicating vermicomposting enhances synergistic biochemical transformations. Ref. [33] took a more integrated approach by using partial correlation analyses to define key functional predictors, identifying TN, CEC, and HA as primary maturity indicators. Their findings reinforce that vermicomposting amplifies correlations among beneficial parameters due to enhanced microbial processing, reduced inhibitory compounds, and optimized aeration. Ref. [34] introduced a bioactivity-focused correlation framework, emphasizing how humic substance functionality correlates strongly with maturity indicators like CEC and optical density ratios (r~0.95). Importantly, this study showed that vermicomposting samples consistently exhibited tighter clustering in the correlation matrix, reinforcing the idea that vermicomposting enhances the coherence of compost quality parameters. Additionally, Ref. [35], exploring biochar co-treatment strategies, found that EC, lignocellulose degradation, and earthworm biomass were significantly interlinked. Their correlation heatmap suggested that optimal biochar application can reinforce positive parameter associations, particularly between microbial respiration and nutrient availability. Ref. [36] conducted a broad correlation-based meta-analysis of composting and vermicomposting datasets using machine learning. Their use of Pearson correlation matrices revealed consistent relationships among key indicators such as OM, TN, HA, and microbial diversity across various organic substrates, indicating that these relationships are robust across composting systems.

Correlation analyses from recent high-impact studies reveal consistent, biologically meaningful relationships among composting and vermicomposting parameters. Strong positive correlations among CEC, HA, TN, and microbial activity emerge as universal trends, while inverse relationships between C/N and maturity indicators suggest that composting progression can be effectively tracked using these variables. Vermicomposting, in particular, enhances the interdependence and synergy among key parameters, likely due to improved aeration, microbial enrichment, and enzymatic degradation. Overall, this heatmap analysis reinforces established models of organic matter transformation and maturity development in composting systems. It affirms the interrelated dynamics of decomposition, microbial action, and humification, providing both theoretical and practical insights for optimizing compost quality and monitoring the process.
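The Pearson coefficients behind a heatmap of this kind can be computed as sketched below. The three short series are invented to mimic the qualitative trends discussed in this section (CEC and HA rising together while C/N falls); they are not measurements from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson coefficients for named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Invented time series for one treatment.
data = {
    "CEC": [40.0, 52.0, 70.0, 88.0, 92.0],
    "HA": [2.0, 3.1, 4.5, 6.0, 6.4],
    "C/N": [30.0, 26.0, 20.0, 15.0, 14.0],
}
matrix = correlation_matrix(data)
```

Plotting `matrix` with a diverging color scale gives the familiar heatmap, with positive pairs such as CEC and HA at one end of the scale and negative pairs such as CEC and C/N at the other.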

3.2. Performance Metrics of Different Machine Learning Models in Predicting Compost Maturity

The final dataset used for machine learning consisted of 252 averaged observations (9 sampling times × 7 treatments × 4 composting systems). Triplicate replicates at each time point were averaged to reduce noise, ensuring robust estimates for model training. Importantly, the binary classification of compost maturity yielded a nearly balanced dataset, with 53% immature and 47% mature samples across treatments and composting stages. This balance reduced the risk of model bias toward a dominant class and enhanced the reliability of training and validation. The temporal distribution of samples further reflected the biological progression of composting, with earlier stages dominated by immature samples and later stages converging toward maturity. Table 1 presents a comparative assessment of eight machine learning (ML) classifiers in predicting compost maturity based on three indicators: cation exchange capacity (CEC), carbon-to-nitrogen ratio (C/N), and humic acid (HA) content. Among the evaluated models, Random Forest (RF) demonstrated the highest overall performance, especially in predicting CEC (F₁-score: 0.97; accuracy: 0.98), reflecting its strong generalization capacity and capability to handle nonlinear relationships in complex compost datasets. This aligns with the findings of Ref. [11], who emphasized RF’s superiority in capturing key maturity indicators during manure composting. Similarly, AdaBoost showed competitive accuracy, particularly in predicting C/N (F₁-score: 0.97), and maintained robust performance across other indicators, confirming the conclusions by [10] regarding the effectiveness of ensemble models in enhancing weak learners for compost maturity classification.

Logistic Regression (LR) also performed reliably, achieving F₁-scores above 0.90 for both CEC and C/N, suggesting that linearly separable features still offer predictive strength under well-structured compost datasets, as supported by [10]. However, Decision Tree (DT), while delivering perfect recall (1.00) for HA, exhibited lower precision (0.84), which reduced its F₁-score to 0.91. This reflects DT’s known tendency to overfit noisy or unbalanced classes in biological datasets, as previously noted by [21].

Naive Bayes classifiers, particularly the Multinomial variant, underperformed across indicators, likely due to their assumption of feature independence, which does not hold in composting systems where biochemical interactions are interdependent. Gaussian NB showed relatively better results (F₁-score: 0.90 for CEC), though still inferior to ensemble models. The K-Nearest Neighbors (KNN) classifier achieved strong results for CEC (F₁-score: 0.94) and HA (F₁-score: 0.90), consistent with its sensitivity to localized patterns and its effective handling of moderately structured compost data [21].

Support Vector Machine (SVM) achieved perfect precision for C/N and HA, but lower recall (0.79 and 0.81, respectively), leading to slightly lower F₁-scores (0.88 and 0.90). These results suggest SVM’s performance may be constrained by compost dataset variability and indicator overlap. Ref. [37] similarly reported that SVM’s margin-based classification can struggle with non-linearly separable data typical of composting stages.
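The way recall pulls down the F₁-score despite perfect precision follows directly from the harmonic-mean definition, F₁ = 2PR/(P + R). As a minimal sketch, the snippet below recomputes F₁ from a few precision/recall pairs quoted in Table 1. The function and variable names are illustrative, and because the inputs are already rounded to two decimals, recomputed values need not match the table for every row.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted from Table 1.
table1_pairs = {
    "SVM, C/N": (1.00, 0.79),
    "SVM, HA": (1.00, 0.81),
    "Decision Tree, HA": (0.84, 1.00),
    "Random Forest, CEC": (1.00, 0.94),
}

f1_by_model = {
    name: round(f1_score(p, r), 2) for name, (p, r) in table1_pairs.items()
}
```

For these four rows the recomputed values agree with the F₁ column of Table 1 (0.88, 0.90, 0.91, and 0.97).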

Across all models and indicators, CEC emerged as the most robust and predictive maturity parameter, consistently associated with the highest F₁-scores and accuracy values. This confirms its relevance as a biochemical indicator strongly linked to compost functional maturity and nutrient buffering capacity [34]. Although HA and C/N provided useful complementary insights, their predictive value was more model-dependent. Variability in HA prediction accuracy suggests that this indicator is sensitive to both microbial dynamics and humification rate during the composting process (Table 2), a finding also emphasized by [11].

Overall, ensemble-based models such as Random Forest and AdaBoost outperformed others across multiple indicators, highlighting their suitability for predicting compost maturity in heterogeneous organic waste systems. Their ability to integrate multiple decision paths and correct weak predictions appears particularly advantageous in datasets where organic matter transformations are non-linear and time-dependent.

3.3. Model Performance Evaluation and Comparative Analysis of Machine Learning Models in Predicting Compost Maturity

To evaluate compost maturity using machine learning, we assessed the predictive power and reliability of eight classifiers (Random Forest, Logistic Regression, Decision Tree, Gaussian Naive Bayes, Multinomial Naive Bayes, K-Nearest Neighbors, Support Vector Machine, and AdaBoost) on three key parameters: cation exchange capacity (CEC), humic acid content (HA), and the carbon-to-nitrogen ratio (C/N). The analysis integrates model feature importance, cross-validation performance, and model uncertainty with recent advancements in compost maturity prediction research.

3.4. Feature Importance Evaluation

Feature importance values (Table 3) revealed that HA, C/N, and composting time consistently emerged as the most influential predictors across most models. For instance, in Random Forest, HA prediction was most sensitive to composting time (0.25), while C/N prediction was driven mainly by composting time (0.19) and the C/N ratio itself (0.20). Multinomial Naive Bayes depended almost entirely on composting time for all three maturity parameters, with importance values of 0.35 for CEC, 0.38 for HA, and 0.43 for C/N. SVM and KNN likewise drew most of their importance from composting time and CEC across parameters. To avoid scale-related anomalies (e.g., unusually high importance values in Logistic Regression), all feature importance scores were calculated using normalized coefficients (z-scores) for linear models and SHAP/permutation importance for tree-based and kernel-based models. This approach provided consistency across algorithms and confirmed that CEC, HA, and C/N were robustly identified as the most influential predictors. The stability of these importance rankings was further supported by stratified 10-fold cross-validation and bootstrap resampling (100 iterations), which demonstrated limited variability in predictor contributions and reduced the likelihood of overfitting.
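To illustrate what a permutation-based importance score measures, the sketch below shuffles one feature column at a time and records the mean drop in accuracy. It is a generic, library-free illustration under stated assumptions, not the authors' pipeline: `toy_model`, the toy data, and all parameter values are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the correct label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(labels)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 is composting day, feature 1 is pure noise.
toy_rng = random.Random(1)
rows = [[float(day), toy_rng.random()] for day in range(0, 120, 3)]
labels = [1 if row[0] >= 60 else 0 for row in rows]


def toy_model(row):
    """Stand-in classifier that calls a sample mature from day 60 onward."""
    return 1 if row[0] >= 60 else 0


importances = permutation_importance(toy_model, rows, labels)
```

Because the score is a difference between two accuracies, it can be zero for a feature the model ignores and slightly negative when shuffling happens to help by chance, which may explain the small negative entries in Table 3.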

This is in alignment with the findings of [1], who emphasized CEC and HA as crucial compost maturity indicators, and [21], who identified TOC, OM, and nitrogen fractions as dominant predictors. Ref. [38] further highlighted humification metrics as strong discriminators across composting stages, which supports our observed influence of HA.

3.5. Cross-Validation Accuracy and Model Uncertainty

Based on the 5-fold cross-validation results (Table 4), the Random Forest (RF) model was among the strongest performers in predicting compost maturity indicators, achieving mean accuracies of 0.92 for CEC (lower–upper limits: 0.95–0.98), 0.89 for HA (0.94–0.98), and 0.97 for C/N (0.97–0.99). Logistic Regression performed comparably well, with mean accuracies of 0.92 for CEC, 0.91 for HA, and 0.93 for C/N, confirming its ability to capture linear relationships in well-structured datasets. Decision Tree and Gaussian Naive Bayes (GNB) models showed moderate but stable performance, reaching 0.95 for C/N (Decision Tree) and 0.90 for CEC (GNB), though they exhibited wider confidence intervals in some cases. Multinomial Naive Bayes (MNB) and K Neighbors achieved slightly lower accuracies (means of 0.86–0.91) due to their sensitivity to data distribution and local variability. The Support Vector Machine (SVM) model displayed balanced performance across indicators (mean accuracies 0.90–0.91 for CEC and HA, 0.86 for C/N), indicating good adaptability to non-linear boundaries. AdaBoost also achieved strong and stable performance, particularly for C/N (mean: 0.98, limits: 0.97–1.00) and CEC (mean: 0.93, limits: 0.94–0.97), comparable to RF in both accuracy and robustness.

To further validate these results and eliminate potential overfitting concerns, a bootstrap resampling analysis (n = 100, sampling rate = 0.8, with replacement) was performed. In each iteration, the models were retrained and re-evaluated, and the mean ± standard deviation of the performance metrics was calculated. The results confirmed narrow uncertainty bounds and high reproducibility, with stable accuracies across iterations for RF (0.95–0.98), AdaBoost (0.94–0.97), and SVM (0.88–0.91). The dataset also remained balanced (52% mature, 48% immature), effectively ruling out class imbalance effects.

These results collectively demonstrate that the high accuracies of ensemble models are not artifacts of limited data size or overfitting but reflect consistent, biologically grounded prediction patterns. The robustness of these models reinforces their applicability for maturity assessment in diverse composting systems with additive treatments. These findings align with recent studies [33,39], which reported that ensemble tree-based methods, particularly Random Forest, combine high accuracy with robustness in compost maturity prediction, while certain boosting approaches like AdaBoost may yield unstable results in small or imbalanced datasets. Probabilistic models such as Naive Bayes remain advantageous for interpretability and handling noisy biochemical transformation data, but may underperform against optimized ensemble learners in complex multi-parameter scenarios.
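The two uncertainty summaries discussed above can be sketched as follows. The first part averages per-fold accuracies quoted from Table 4. The second part is a deliberately simplified bootstrap that resamples a fixed vector of per-sample outcomes instead of retraining a model in every iteration, so it only approximates the procedure described in the text. The function name, the outcome vector, and the seed are hypothetical.

```python
import random
from statistics import mean, stdev

# Per-fold accuracies quoted from Table 4.
fold_accuracy = {
    "Random Forest, CEC": [0.94, 0.94, 0.94, 0.87, 0.91],
    "AdaBoost, C/N": [0.96, 1.00, 0.98, 0.95, 1.00],
}

# (mean, sample standard deviation) of the five fold accuracies per model.
fold_summary = {
    name: (round(mean(scores), 2), round(stdev(scores), 3))
    for name, scores in fold_accuracy.items()
}


def bootstrap_accuracy(correct_flags, n_iter=100, rate=0.8, seed=0):
    """Mean and SD of accuracy over bootstrap resamples drawn with replacement.

    Simplification: resamples fixed per-sample outcomes (1 = correct, 0 = wrong)
    instead of retraining the model in every iteration.
    """
    rng = random.Random(seed)
    size = max(1, int(rate * len(correct_flags)))
    scores = []
    for _ in range(n_iter):
        sample = rng.choices(correct_flags, k=size)
        scores.append(sum(sample) / size)
    return mean(scores), stdev(scores)


# Hypothetical outcome vector: 48 correct predictions out of 50.
boot_mean, boot_sd = bootstrap_accuracy([1] * 48 + [0] * 2)
```

The fold means reproduce the Accuracy Mean column for these two rows (0.92 and 0.98); the lower and upper limits in Table 4 are not recomputed here.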

3.6. Analysis of Regression-Based Performance Metrics for Compost Maturity Prediction Models

Table 5 summarizes the regression-based performance metrics, Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and coefficient of determination (R²) for eight classification models predicting three critical compost maturity indicators: Cation Exchange Capacity (CEC), Humic Acid content (HA), and Carbon-to-Nitrogen ratio (C/N).
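When regression-style error metrics are applied to class labels, they take a simple form. Assuming the maturity classes are coded as 0 (immature) and 1 (mature), which is an assumption of this sketch, every error is 0 or 1 in magnitude, so MAE and MSE both equal the misclassification rate and RMSE is its square root. The example data below are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R² for two equal-length numeric sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot
    return mae, mse, rmse, r2


# Hypothetical test split: 24 mature (1) and 26 immature (0) samples,
# with exactly one mature sample misclassified as immature.
y_true = [1] * 24 + [0] * 26
y_pred = [0] + [1] * 23 + [0] * 26

mae, mse, rmse, r2 = regression_metrics(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Under this coding, MAE = MSE = 1 − accuracy, which is consistent with the identical MAE and MSE columns in Table 5. R² additionally depends on the class balance of the evaluated samples.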

Apart from AdaBoost, whose perfect scores are discussed separately, the Random Forest model stands out with the highest predictive accuracy for CEC, achieving an R² of 0.91, coupled with the lowest MAE (0.02) and RMSE (0.144) (Table 5). This superior performance indicates Random Forest’s strong capacity to model the complex, nonlinear interactions governing nutrient dynamics and organic matter transformations during composting, consistent with findings reported by [10], who emphasized the model’s robustness in organic waste maturity prediction. For HA and C/N ratio, Random Forest, Logistic Regression, and Decision Tree exhibit comparable R² values (~0.72–0.74), indicating moderate predictive reliability for these indicators. The marginally higher MAE and RMSE values (0.06 and 0.25, respectively) reflect the increased complexity of predicting humification processes and nitrogen transformations, as also discussed by [21] and [38].

Logistic Regression and K-Nearest Neighbors (KNN) deliver similar performance, particularly for CEC and HA, with R² around 0.72–0.81 and low error metrics, highlighting their suitability for compost maturity prediction when the data structure exhibits moderate linearity or localized clustering patterns. In contrast, the Naive Bayes models, both Gaussian (GNB) and Multinomial (MNB), show relatively higher errors and lower R² values (0.53–0.65, apart from GNB on CEC at 0.72), reflecting limitations in handling continuous-valued compost maturity indicators, especially given the inherent variability and noise in composting datasets. These results align with [33], who noted Naive Bayes methods’ limited adaptability in modeling dynamic organic matter decomposition pathways.

Support Vector Machine (SVM) exhibits moderate accuracy, with R² values around 0.62–0.72, confirming its ability to handle nonlinear relationships but also indicating sensitivity to parameter tuning and data preprocessing. AdaBoost’s results, showing a perfect R² (1.0) with zero error for CEC and C/N but a lower value for HA (0.91), suggest potential overfitting on these datasets or the need for further validation, a concern also raised by [2] regarding boosting algorithms in composting applications. The generally higher R² and lower error metrics for CEC prediction across models underscore CEC’s relative stability and strong correlation with compost maturity stages, corroborating earlier studies [39,40]. Conversely, HA and C/N ratios, influenced by multifactorial biological and chemical transformations, present greater modeling challenges, necessitating more complex or hybrid machine learning approaches. This analysis reinforces the necessity of multi-model evaluations, where models like Random Forest and Logistic Regression can serve as reliable baselines, complemented by SVM and ensemble methods for enhanced prediction under diverse composting scenarios.

In addition to overall classification accuracy, the operational maturity threshold was explicitly evaluated to justify the selection of day 60 as the reference maturity point. Across treatments, multiple indicators including CEC, C/N ratio, and HA/FA ratio reached at least 80% of their maximum values by day 60 and exhibited relative changes of less than 5% between consecutive intervals (day 45–60 and day 60–90). These trends, consistent across composting and vermicomposting setups, indicate biological stabilization and are in line with maturity benchmarks reported in earlier studies [41,42]. Therefore, day 60 was selected as the operational maturity time for model training and prediction.

Regarding the comparatively lower prediction accuracy for humic acid (HA), this can be attributed to its dynamic and non-linear transformation during organic matter decomposition. Unlike CEC and C/N, which follow relatively monotonic stabilization patterns, HA undergoes both accumulation from lignin degradation and partial degradation by sustained microbial activity, leading to fluctuations rather than a single plateau. These opposing processes, strongly influenced by manure type and additive interactions, reduce the ability of models to consistently capture HA dynamics [43,44]. The observed variability is therefore biologically meaningful and highlights the complexity of HA as a maturity indicator compared to more stable parameters such as CEC and C/N.
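The day-60 stabilization criterion described above (at least 80% of the maximum value, and less than 5% relative change over the adjacent intervals) can be expressed as a small check. This is a minimal sketch under stated assumptions: the function name `is_stable_at`, the sampling days, and the CEC values are hypothetical and are not measurements from this study.

```python
def is_stable_at(days, values, ref_day=60, min_fraction=0.8, max_change=0.05):
    """Check the two stabilization criteria around `ref_day`.

    1. The value at `ref_day` is at least `min_fraction` of the series maximum.
    2. The relative change over the interval ending at `ref_day` and over the
       interval starting at `ref_day` is below `max_change`.
    """
    i = days.index(ref_day)
    if i == 0 or i == len(days) - 1:
        raise ValueError("ref_day needs a sampling point on both sides")
    reached = values[i] >= min_fraction * max(values)
    change_before = abs(values[i] - values[i - 1]) / abs(values[i - 1])
    change_after = abs(values[i + 1] - values[i]) / abs(values[i])
    return reached and change_before < max_change and change_after < max_change


# Hypothetical sampling days and CEC values (not measurements from this study).
days = [0, 15, 30, 45, 60, 90]
cec_plateau = [40.0, 55.0, 70.0, 82.0, 85.0, 87.0]
cec_still_rising = [40.0, 50.0, 60.0, 70.0, 85.0, 100.0]

plateau_ok = is_stable_at(days, cec_plateau)
rising_ok = is_stable_at(days, cec_still_rising)
```

The first series satisfies both criteria at day 60, whereas the second fails the 5% change limit. For an indicator that declines over time, such as C/N, the first criterion would need to be restated (for example, relative to the total decline); the text does not specify how this was handled.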

3.7. Comparative Discussion with Literature

The prominence of CEC and HA in predicting compost maturity aligns with the biochemical transformations tracked during organic matter decomposition and humification, as detailed in [1] and [38] (Table 6). Our results confirm the assertion by [21] that C/N dynamics and nitrogen mineralization trajectories provide essential maturity cues. Although the Naive Bayes variants were outperformed by the ensemble models (Table 5), their cross-validated accuracies remained between 0.86 and 0.91 (Table 4), which is consistent with [2], who documented the reliability of probabilistic models under data-scarce or noisy conditions. The consistent role of composting time (as a surrogate for biological succession and material stability) aligns with [9] and [45], who argued for incorporating temporal and thermophilic indices in predictive modeling. Moreover, our findings complement the review by [31], which underscores the need for integrated modeling approaches to capture multi-parametric maturity indicators.

Overall, this comparative study demonstrates that the ensemble models, Random Forest and AdaBoost, offer the most robust and accurate predictions of compost maturity indicators (CEC, HA, C/N), with narrow uncertainty ranges. Feature importance analysis confirms the pivotal roles of CEC, HA, C/N ratio, and composting duration. Although the dataset size (252 averaged observations in total) may appear moderate, its adequacy was supported by the use of stratified 10-fold cross-validation combined with bootstrap resampling (100 iterations), which confirmed stable accuracy metrics and feature importance rankings across models. Learning behavior showed convergence between training and validation performance, suggesting no evidence of overfitting. Furthermore, our dataset scale is comparable to recent ML-based compost maturity studies [1,21], where datasets of similar size proved sufficient for identifying dominant indicators and achieving reliable predictions. Future modeling strategies should prioritize multi-indicator inputs and consider hybrid ensemble-probabilistic frameworks for higher reliability. These results are consistent with recent advances in machine learning-assisted compost maturity evaluation and suggest that the findings generalize across the composting systems studied.

4. Conclusions

This study demonstrated the effectiveness of machine learning classification algorithms in predicting compost maturity based on three key indicators: cation exchange capacity (CEC), carbon-to-nitrogen (C/N) ratio, and humic acid (HA) content. Correlation heatmaps and statistical analysis revealed strong associations between CEC, total nitrogen (TN), HA, and organic matter (OM), affirming their collective diagnostic value for maturity evaluation. Among the evaluated models, AdaBoost and Random Forest consistently exhibited the highest classification performance, especially in predicting maturity based on CEC and C/N indicators, with mean cross-validated accuracies of 0.92–0.98 and narrow uncertainty ranges indicating robust reliability. Logistic Regression and K-Nearest Neighbors also performed competitively, particularly for CEC-based predictions, while Support Vector Machine showed high precision but lower recall in several cases. Notably, humic acid-based predictions were less consistent across models, reflecting its complex transformation dynamics during composting and vermicomposting. These results underscore the benefit of integrating multiple maturity indicators and leveraging ensemble ML approaches to enhance the reliability and generalizability of compost quality assessments.

Author Contributions

S.K.: Writing—original draft, Methodology, Data curation, Conceptualization, Resources, Investigation, Formal analysis. H.S.: Writing—review & editing, Visualization, Supervision. M.S.: Writing—review & editing, Visualization, Supervision. F.N.: review & editing, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Li, Y.; Li, J.; Chang, Y.; Li, R.; Zhou, K.; Zhan, Y.; Wei, R.; Wei, Y. Comparing bacterial dynamics for the conversion of organics and humus components during manure composting from different sources. Front. Microbiol. 2023, 14, 1281633.
2. Chen, W.; He, P.; Zhang, H.; Lü, F. Effects of volatile fatty acids on soil properties, microbial communities, and volatile metabolites in wheat rhizosphere of loess. J. Clean. Prod. 2024, 476, 143798.
3. Zhan, Y.; Li, R.; Chen, W.; Chen, Y.; Yang, L.; Liu, B.; Tao, X.; Chen, P.; Wang, Z.; Zhang, H. Deciphering the carbon and nitrogen component conversion in humification process mediated by distinct microbial mechanisms in composting from different domestic organic wastes. Environ. Sci. Pollut. Res. 2024, 32, 13474–13486.
4. Zendehboudi, S.; Rezaei, N.; Lohi, A. Applications of hybrid models in chemical, petroleum, and energy systems: A systematic review. Appl. Energy 2018, 228, 2539–2566.
5. Cipullo, S.; Snapir, B.; Prpich, G.; Campo, P.; Coulon, F. Prediction of bioavailability and toxicity of complex chemical mixtures through machine learning models. Chemosphere 2019, 215, 388–395.
6. Zhang, W.; Chen, Q.; Chen, J.; Xu, D.; Zhan, H.; Peng, H.; Pan, J.; Vlaskin, M.; Leng, L.; Li, H. Machine learning for hydrothermal treatment of biomass: A review. Bioresour. Technol. 2023, 370, 128547.
7. Dogan, H.; Temel, F.A.; Yolcu, O.C.; Turan, N.G. Modelling and optimization of sewage sludge composting using biomass ash via deep neural network and genetic algorithm. Bioresour. Technol. 2023, 370, 128541.
8. Yolcu, O.C.; Temel, F.A.; Kuleyin, A. New hybrid predictive modeling principles for ammonium adsorption: The combination of Response Surface Methodology with feed-forward and Elman-Recurrent Neural Networks. J. Clean. Prod. 2021, 311, 127688.
9. Ding, S.; Huang, W.; Xu, W.; Wu, Y.; Zhao, Y.; Fang, P.; Hu, B.; Lou, L. Improving kitchen waste composting maturity by optimizing the processing parameters based on machine learning model. Bioresour. Technol. 2022, 360, 127606.
10. Li, Y.; Xue, Z.; Li, S.; Sun, X.; Hao, D. Prediction of composting maturity and identification of critical parameters for green waste compost using machine learning. Bioresour. Technol. 2023, 385, 129444.
11. Karimi, S.; Shariatmadari, H.; Nourbakhsh, F. Integrated assessment of physicochemical and structural changes during composting and vermicomposting of cattle and sheep manure with additive treatments. Sci. Rep. 2026, 16, 10128.
12. Shi, S.; Guo, Z.; Bao, J.; Jia, X.; Fang, X.; Tang, H.; Zhang, H.; Sun, Y.; Xu, X. Machine learning-based prediction of compost maturity and identification of key parameters during manure composting. Bioresour. Technol. 2025, 419, 132024.
13. Liu, Y.; Chen, H.; Zhang, L.; Feng, Z. Enhancing building energy efficiency using a random forest model: A hybrid prediction approach. Energy Rep. 2021, 7, 5003–5012.
14. Song, Y.-Y.; Lu, Y. Decision tree methods: Applications for classification and prediction. Shanghai Arch. Psychiatry 2015, 27, 130–135.
15. Jahromi, A.H.; Taheri, M. A non-parametric mixture of Gaussian naive Bayes classifiers based on local independent features. In Proceedings of the 2017 Artificial intelligence and signal processing conference (AISP), Shiraz, Iran, 25–27 October 2017; pp. 209–212.
16. Mjali, S.Z. Latent Semantic Models: A Study of Probabilistic Models for Text in Information Retrieval. Master’s Thesis, University of Pretoria, Pretoria, South Africa, 2020.
17. Halder, R.K.; Uddin, M.N.; Uddin, M.A.; Aryal, S.; Khraisat, A. Enhancing K-nearest neighbor algorithm: A comprehensive review and performance analysis of modifications. J. Big Data 2024, 11, 113.
18. Razaque, A.; Ben Haj Frej, M.; Almi’ani, M.; Alotaibi, M.; Alotaibi, B. Improved support vector machine enabled radial basis function and linear variants for remote sensing image classification. Sensors 2021, 21, 4431.
19. Chen, S.; Shen, B.; Wang, X.; Yoo, S.-J. A strong machine learning classifier and decision stumps based hybrid adaboost classification algorithm for cognitive radios. Sensors 2019, 19, 5077.
20. Singh, V.; Pencina, M.; Einstein, A.J.; Liang, J.X.; Berman, D.S.; Slomka, P. Impact of train/test sample regimen on performance estimate stability of machine learning in cardiovascular imaging. Sci. Rep. 2021, 11, 14490.
21. Wang, N.; Yang, W.; Wang, B.; Bai, X.; Wang, X.; Xu, Q. Predicting maturity and identifying key factors in organic waste composting using machine learning models. Bioresour. Technol. 2024, 400, 130663.
22. Rajbahadur, G.K.; Wang, S.; Oliva, G.A.; Kamei, Y.; Hassan, A.E. The impact of feature importance methods on the interpretation of defect classifiers. IEEE Trans. Softw. Eng. 2021, 48, 2245–2261.
23. Li, J.; Zhu, X.; Li, Y.; Tong, Y.W.; Ok, Y.S.; Wang, X. Multi-task prediction and optimization of hydrochar properties from high-moisture municipal solid waste: Application of machine learning on waste-to-resource. J. Clean. Prod. 2021, 278, 123928.
24. Jindo, K.; Sonoki, T.; Matsumoto, K.; Canellas, L.; Roig, A.; Sanchez-Monedero, M.A. Influence of biochar addition on the humic substances of composting manures. Waste Manag. 2016, 49, 545–552.
25. Zhu, X.; Wang, X.; Ok, Y.S. The application of machine learning methods for prediction of metal sorption onto biochars. J. Hazard. Mater. 2019, 378, 120727.
26. Tuomela, M.; Vikman, M.; Hatakka, A.; Itävaara, M. Biodegradation of lignin in a compost environment: A review. Bioresour. Technol. 2000, 72, 169–183.
27. Barrena, R.; Vázquez, F.; Sánchez, A. Dehydrogenase activity as a method for monitoring the composting process. Bioresour. Technol. 2008, 99, 905–908.
28. Liu, J.; Shen, Y.; Ding, J.; Luo, W.; Zhou, H.; Cheng, H.; Wang, H.; Zhang, X.; Wang, J.; Xu, P. High oil content inhibits humification in food waste composting by affecting microbial community succession and organic matter degradation. Bioresour. Technol. 2023, 376, 128832.
29. Bernal, M.P.; Alburquerque, J.A.; Moral, R. Composting of animal manures and chemical criteria for compost maturity assessment. A review. Bioresour. Technol. 2009, 100, 5444–5453.
30. Gu, Z.; He, L.; Liu, T.; Xing, M.; Feng, L.; Luo, G. Exploration of Bacterial Community Structure Profiling and Functional Characteristics in the Vermicomposting of Wasted Sludge and Kitchen Waste. Water 2024, 16, 3107.
31. Zhang, Y.; Yuan, H.; Peng, S.; Wang, Z.; Cai, S.; Chen, Z.; Yang, B.; Yang, P.; Wang, D.; Guo, J. Vermicomposting preferably alters fungal communities in wasted activated sludge and promotes the production of plant growth-promoting biostimulants in the vermicompost. Chem. Eng. J. 2024, 495, 153232.
32. Popova, V.; Petkova, M.; Shilev, S. Metagenomic approach unravelling bacterial diversity in combined composting and vermicomposting technology of agricultural wastes. Ecol. Balk. 2023, 15, 135–150.
33. Huang, L.-T.; Hou, J.-Y.; Liu, H.-T. Machine-learning intervention progress in the field of organic waste composting: Simulation, prediction, optimization, and challenges. Waste Manag. 2024, 178, 155–167.
34. Silva, A.C.; Rocha, P.; Valderrama, P.; Antelo, J.; Geraldo, D.; Proença, M.F.; Fiol, S.; Bento, F. A correlation-based approach for predicting humic substance bioactivity from direct compost characterization. Molecules 2025, 30, 1511.
35. Li, J.; Liu, S.; Xu, Y.; Xu, C.; Deng, B.; Cao, H.; Yuan, Q. Optimizing biochar addition strategies in combined processes: Comprehensive assessment of earthworm growth, lignocellulose degradation and vermicompost quality. Bioresour. Technol. 2024, 406, 131031.
36. Chen, G.; Wang, D.; Song, Y.; Tao, J.; Ge, Y.; Niu, H.; Xu, S.; Liang, R.; Yan, B. Optimizing Composting Processes with Machine Learning Models: A Comprehensive Explanation of Multiple Indicators; Available at SSRN 5205688; Elsevier: Amsterdam, The Netherlands, 2025.
37. Kong, Y.; Zhang, J.; Zhang, X.; Gao, X.; Yin, J.; Wang, G.; Li, J.; Li, G.; Cui, Z.; Yuan, J. Applicability and limitation of compost maturity evaluation indicators: A review. Chem. Eng. J. 2024, 489, 151386.
38. Ding, S.; Wu, D. Comprehensive analysis of compost maturity differences across stages and materials with statistical models. Waste Manag. 2025, 193, 250–260.
39. Peng, Q.-Q.; Zhou, X.; Zhou, H.; Liao, Y.; Han, Z.-Y.; Hu, L.; Zeng, P.; Gu, J.-F.; Zhang, R. Bridging the Gap: Limitations of Machine Learning in Real-World Prediction of Heavy Metal Accumulation in Rice in Hunan Province. Agronomy 2025, 15, 1478.
40. Bao, Z.; Liu, R.; Wu, Y.; Zhang, S.; Zhang, X.; Zhou, B.; Luckham, P.; Gao, Y.; Zhang, C.; Du, F. Screening structure and predicting toxicity of pesticide adjuvants using molecular dynamics simulation and machine learning for minimizing environmental impacts. Sci. Total Environ. 2024, 942, 173697.
41. Karimi, S.; Kolahchi, Z.; Zarrabi, M.; Nahidan, S.; Raza, T. From invasive species to agricultural resource: Evaluation of Phragmites australis from Zarivar Lake as an organic amendment for the release of phosphorus, potassium, micronutrients and toxic metals. Discov. Appl. Sci. 2025, 7, 100.
42. Liang, C.; Das, K.C.; McClendon, R.W. The influence of temperature and moisture contents regimes on the aerobic microbial activity of a biosolids composting blend. Bioresour. Technol. 2003, 86, 131–137.
43. Karimi, S.; Haghighi, M.; Sharifani, M. Eco-enzymes: A sustainable solution for waste management and agricultural applications. Iran Agric. Res. 2025, 44, 62–78.
44. Karimi, S.; Raza, T.; Mechri, M. Composting and Vermitechnology in Organic Waste Management. In Environmental Engineering and Waste Management: Recent Trends and Perspectives; Springer: Berlin/Heidelberg, Germany, 2024; pp. 449–470.
45. Peng, H.; Mu, L.; Song, Y.; Su, H.; Ge, Y.; Cheng, Z.; Li, R.; Chen, G. Integrated machine learning model for prediction of nutrients contents and maturity in rural organic solid wastes aerobic composting. J. Environ. Chem. Eng. 2025, 13, 118138.

Figure 1. Heatmap showing the correlation between different parameters of composting and vermicomposting. Abbreviations: DP = Degree of Polymerization; Nm/i = ratio of mineralized nitrogen to immobilized nitrogen. Color gradient: dark red denotes strong positive correlation (r → +1), dark blue denotes strong negative correlation (r → −1), and white indicates weak or no correlation (r ≈ 0).

Table 1. Performance metrics of different machine learning models in predicting compost maturity based on CEC, C/N, and HA indicators.

Model	Indicator	Precision	Recall	F1-Score	Accuracy	Predicted Maturity Time (Day)
Random Forest	CEC	1.00	0.94	0.97	0.98	60
	C/N	0.94	0.89	0.92	0.94	60
	HA	0.88	0.94	0.91	0.94	60
Logistic Regression	CEC	0.94	0.94	0.94	0.96	60
	C/N	0.94	0.89	0.92	0.94	60
	HA	0.93	0.88	0.90	0.94	60
Decision Tree	CEC	0.88	0.94	0.91	0.94	60
	C/N	0.94	0.89	0.92	0.94	60
	HA	0.84	1.00	0.91	0.94	60
Naive Bayes (Gaussian NB)	CEC	0.93	0.88	0.90	0.94	60
	C/N	0.94	0.84	0.89	0.92	60
	HA	0.88	0.88	0.88	0.92	60
Naive Bayes (Multinomial NB)	CEC	0.82	0.88	0.85	0.90	60
	C/N	0.94	0.84	0.89	0.92	60
	HA	0.82	0.88	0.85	0.90	60
K-Nearest Neighbors	CEC	0.94	0.94	0.94	0.96	60
	C/N	0.89	0.84	0.86	0.90	60
	HA	0.93	0.88	0.90	0.94	60
Support Vector Machine	CEC	0.93	0.81	0.87	0.92	60
	C/N	1.00	0.79	0.88	0.92	60
	HA	1.00	0.81	0.90	0.94	60
AdaBoost Classifier	CEC	0.94	0.94	0.94	0.96	60
	C/N	1.00	0.95	0.97	0.98	60
	HA	0.82	0.88	0.85	0.90	60

Table 2. Interpretative Summary of Machine Learning Model Performances in Compost Maturity Prediction.

Model	Best Performing Indicator	Key Strengths	Key Limitations	Supporting Studies
Random Forest (RF)	CEC (F1 = 0.97, Acc = 0.98)	Handles nonlinear data well; robust and consistent across indicators	Slight decline in HA precision (0.88)	[11,21]
AdaBoost	C/N (F1 = 0.97, Acc = 0.98)	Strong for imbalanced data; excellent on C/N and CEC	Weak HA prediction (F1 = 0.85)	[10,37]
Logistic Regression	CEC (F1 = 0.94)	Performs well with linearly separable features	Less suited for complex or nonlinear relationships	[1]
Decision Tree (DT)	HA (Recall = 1.00)	Perfect recall for HA; interpretable model	Lower precision for HA and CEC due to overfitting	[21]
Gaussian NB	CEC (F1 = 0.90)	Simple and fast; moderate results for linear distributions	Assumes feature independence; poor HA performance	[37]
Multinomial NB	None	Efficient on textual-type or count data	Weakest overall performance; not suited to continuous features	[1]
KNN	CEC and HA (F1 = 0.94, 0.90)	Captures local patterns well; good for smaller datasets	Sensitive to scaling and high dimensionality	[21]
SVM	HA (Precision = 1.00)	High precision; margin-based classification good for clear boundaries	Recall tradeoff reduces robustness on C/N and HA	[34,37]

Note: “Best Performing Indicator” is selected based on the highest F1-score and Accuracy combination. References correspond to studies validating model behavior with similar compost maturity indicators.

Table 3. Relative Importance of Key Features for Each Target Parameter (CEC, HA, and C/N) Across Classification Models.

Model	Parameter	Time (Day)	CEC	C/N	DP	HA
Random Forest	CEC	0.12	0.23	0.03	0.009	0.04
	HA	0.25	0.04	0.03	0.03	0.06
	C/N	0.19	0.009	0.20	0.008	0.02
Logistic Regression	CEC	0.19	0.25	0.006	0.02	0.01
	HA	0.17	0.25	0.005	0.014	0.023
	C/N	0.08	0.008	0.26	0.04	0.13
Decision Tree	CEC	0.29	0.24	0.15	0.03	0.06
	HA	0.34	0.09	0.12	0.04	0.17
	C/N	0.12	0.08	0.24	0.0	0.16
Naïve Bayes (GNB)	CEC	0.19	0.06	0.003	−0.004	0.003
	HA	0.22	0.03	0.02	0.002	0.04
	C/N	0.18	−0.02	0.03	−0.005	−0.02
Naïve Bayes (MNB)	CEC	0.35	0.002	−0.003	0.0	0.0
	HA	0.38	0.0	−0.01	0.003	0.0
	C/N	0.43	0.001	0.01	0.0	−0.002
K Neighbors	CEC	0.16	0.25	−0.001	0.0	0.002
	HA	0.26	0.12	0.0	0.0	0.001
	C/N	0.25	0.15	−2.22	0.002	−0.001
SVM	CEC	0.15	0.21	−0.007	0.0	0.0
	HA	0.26	0.10	0.005	0.0	0.003
	C/N	0.28	0.06	0.003	0.0	0.0
AdaBoost	CEC	0.08	0.29	0.007	0.02	0.02
	HA	0.23	0.01	0.06	0.04	0.08
	C/N	0.003	0.0	0.36	0.008	0.11

Table 4. Cross-Validation (CV) Scores and Uncertainty Estimates of the Classification Models for Predicting Compost Maturity Parameters.

Model	Parameter	Accuracy (5-Fold)	Accuracy Mean	Lower Limit	Upper Limit
Random Forest	CEC	[0.94, 0.94, 0.94, 0.87, 0.91]	0.92	0.95	0.98
	HA	[0.92, 0.92, 0.92, 0.89, 0.83]	0.89	0.94	0.98
	C/N	[0.94, 0.96, 1.00, 0.96, 1.00]	0.97	0.97	0.99
Logistic Regression	CEC	[0.94, 0.94, 0.96, 0.87, 0.89]	0.92	0.90	0.93
	HA	[0.94, 0.98, 0.92, 0.89, 0.83]	0.91	0.89	0.92
	C/N	[0.92, 0.92, 0.96, 0.89, 0.98]	0.93	0.93	0.97
Decision Tree	CEC	[0.89, 0.94, 0.92, 0.79, 0.91]	0.89	0.93	0.98
	HA	[0.81, 0.85, 0.89, 0.91, 0.81]	0.86	0.92	0.97
	C/N	[0.96, 0.92, 0.94, 0.96, 1.00]	0.95	0.96	0.99
Naïve Bayes (GNB)	CEC	[0.89, 0.92, 0.92, 0.89, 0.89]	0.90	0.88	0.90
	HA	[0.92, 0.98, 0.89, 0.87, 0.85]	0.90	0.88	0.91
	C/N	[0.87, 0.94, 0.94, 0.81, 0.89]	0.89	0.88	0.91
Naïve Bayes (MNB)	CEC	[0.79, 0.94, 0.89, 0.89, 0.85]	0.87	0.85	0.87
	HA	[0.89, 0.89, 0.85, 0.89, 0.79]	0.86	0.86	0.87
	C/N	[0.89, 0.89, 0.96, 0.85, 0.94]	0.91	0.89	0.91
K Neighbors	CEC	[0.89, 0.94, 0.94, 0.89, 0.89]	0.91	0.90	0.94
	HA	[0.87, 0.98, 0.89, 0.87, 0.81]	0.89	0.86	0.92
	C/N	[0.89, 0.89, 0.92, 0.85, 0.91]	0.89	0.87	0.92
SVM	CEC	[0.89, 0.94, 0.92, 0.91, 0.89]	0.91	0.88	0.90
	HA	[0.87, 0.94, 0.92, 0.91, 0.87]	0.90	0.89	0.91
	C/N	[0.85, 0.89, 0.89, 0.85, 0.83]	0.86	0.86	0.87
AdaBoost	CEC	[0.94, 0.96, 0.92, 0.89, 0.94]	0.93	0.94	0.97
	HA	[0.92, 0.94, 0.94, 0.89, 0.83]	0.90	0.92	0.96
	C/N	[0.96, 1.00, 0.98, 0.95, 1.00]	0.98	0.97	1.00
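
Table 4 does not state how the lower and upper limits were derived. As an illustration only, the following sketch summarizes the fold accuracies of one row (Random Forest, C/N) with their mean and a two-sided 95% Student-t interval. The function name summarize_folds and the choice of a t-interval are assumptions made here, not the authors' procedure, so the result need not reproduce the limits reported in the table.

```python
import math
import statistics


def summarize_folds(scores, t_crit=2.776):
    """Return (mean, lower, upper) for a list of k-fold accuracies.

    Assumption: a two-sided 95% Student-t interval is used. The default
    t_crit = 2.776 is the critical value for 4 degrees of freedom (5 folds).
    """
    mean = statistics.mean(scores)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    half_width = t_crit * sem
    # Accuracy is bounded, so clip the interval to [0, 1].
    lower = max(0.0, mean - half_width)
    upper = min(1.0, mean + half_width)
    return mean, lower, upper


# Fold accuracies copied from the Random Forest, C/N row of Table 4.
rf_cn_folds = [0.94, 0.96, 1.00, 0.96, 1.00]
mean, lower, upper = summarize_folds(rf_cn_folds)
print(f"mean={mean:.2f}, interval=({lower:.2f}, {upper:.2f})")
```

With only five folds the t-interval is wide, which is one reason a reported interval may differ depending on the method chosen (for example, bootstrap resampling of predictions).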

Table 5. Performance Metrics of Classification Models Based on Regression Errors for Predicting Compost Maturity Indicators (CEC, HA, and C/N Ratio).

Model	Parameter	MAE	MSE	RMSE	R²
Random Forest	CEC	0.02	0.02	0.14	0.91
	HA	0.06	0.06	0.25	0.72
	C/N	0.06	0.06	0.25	0.74
Logistic Regression	CEC	0.04	0.04	0.20	0.81
	HA	0.06	0.06	0.25	0.72
	C/N	0.06	0.06	0.25	0.74
Decision Tree	CEC	0.06	0.06	0.25	0.72
	HA	0.06	0.06	0.25	0.72
	C/N	0.06	0.06	0.25	0.74
Naïve Bayes (GNB)	CEC	0.06	0.06	0.25	0.72
	HA	0.08	0.08	0.29	0.63
	C/N	0.08	0.08	0.29	0.65
Naïve Bayes (MNB)	CEC	0.10	0.10	0.32	0.53
	HA	0.10	0.10	0.32	0.53
	C/N	0.08	0.08	0.29	0.65
K Neighbors	CEC	0.04	0.04	0.20	0.81
	HA	0.06	0.06	0.25	0.72
	C/N	0.10	0.10	0.32	0.56
SVM	CEC	0.08	0.08	0.29	0.62
	HA	0.06	0.06	0.25	0.72
	C/N	0.08	0.08	0.28	0.65
AdaBoost	CEC	0.0	0.0	0.0	1.0
	HA	0.02	0.02	0.14	0.91
	C/N	0.0	0.0	0.0	1.0
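
If the maturity classes are coded as 0/1 labels (binary classification, Section 2.3), every prediction error is either 0 or 1, so MAE and MSE coincide and RMSE is simply the square root of MSE. This is consistent with the identical MAE and MSE columns in Table 5. The sketch below checks that relationship on hypothetical labels; the function name label_errors, the 50-sample hold-out set, and the three misclassified samples are assumptions for illustration and are not the study's data.

```python
import math


def label_errors(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 on class labels treated as numbers."""
    n = len(y_true)
    diffs = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(d) for d in diffs) / n
    mse = sum(d * d for d in diffs) / n
    rmse = math.sqrt(mse)
    # R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean label.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return mae, mse, rmse, r2


# Hypothetical hold-out set: 25 immature (0) and 25 mature (1) samples.
y_true = [0] * 25 + [1] * 25
y_pred = list(y_true)
for i in (0, 10, 30):  # flip three predictions to simulate misclassification
    y_pred[i] = 1 - y_pred[i]

mae, mse, rmse, r2 = label_errors(y_true, y_pred)
print(f"MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f} R2={r2:.2f}")
```

Under this coding, MAE equals the misclassification rate (1 − accuracy), and R² depends on the class balance of the evaluation set, which is why rows with the same MAE can show slightly different R² values.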

Table 6. Comparative Analysis of Machine Learning Models for Compost Maturity Prediction Based on Literature and Current Study.

Study/Model	Target Variables	Key Features Used	Best Performing Models	Notable Findings
[10]	Compost Maturity (GI, OM)	Time, Organic Matter, Humification Index	Random Forest, SVM	Time and humification indexes critical; RF and SVM provide best accuracy for maturity prediction
[21]	Compost Maturity (GI)	Total Organic Carbon, Organic Matter, NH₄⁺-N conversion	Logistic Regression, Decision Tree	Importance of carbon transformation and nitrogen dynamics; LR and DT balance interpretability and performance
[38]	Compost Maturity Scores	Multiple indicators including OM, CEC, HA	Random Forest, Ensemble Models	Ensemble models capture complex relationships; maturity varies with composting stage and material type
[33]	HA, CEC, GI	Organic Matter, Humification, Time	Naive Bayes, SVM	Naive Bayes effective for HA prediction; SVM robust across variables
[40]	Maturity (HA, C/N)	Time, Organic Matter, CEC, HA	Multinomial Naive Bayes, Random Forest	Feature importance of Time and HA is consistent; NB good in HA prediction
Current Study	Compost Maturity (CEC, HA, C/N)	Time, CEC, C/N, DP, HA	Multinomial NB, SVM, Random Forest	Time critical for all; SVM & NB best for HA; Random Forest and Decision Tree capture feature interactions

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

