Implementation of XGBoost Models for Predicting CO2 Emission and Specific Tractor Fuel Consumption

Nebojša Balać; Zoran Mileusnić; Aleksandra Dragičević; Mihailo Milanović; Andrija Rajković; Rajko Miodragović; Olivera Ećim-Đurić

doi:10.3390/agriculture15111209

¹ Kite d.o.o., 21333 Čenej, Serbia

² Department of Agricultural Engineering, Faculty of Agriculture, University of Belgrade, Nemanjina 6, 11080 Belgrade, Serbia

* Author to whom correspondence should be addressed.

Agriculture 2025, 15(11), 1209; https://doi.org/10.3390/agriculture15111209

This article belongs to the Section Agricultural Technology

Abstract

Tillage is one of the most energy-intensive operations in crop production, leading to high fuel consumption and the emission of harmful gases such as CO₂ and NO_x. This study was conducted under real field conditions to explore how soil parameters influence variations in fuel use and exhaust emissions. A machine learning approach based on the XGBoost algorithm was applied to develop predictive models for CO₂ concentrations in exhaust gases and specific fuel consumption. The CO₂ prediction model achieved an accuracy exceeding 80%, while the model for fuel consumption reached over 65%. Although not optimized for high precision, these models offer a valuable basis for preliminary assessments and highlight the potential of data-driven approaches for improving energy efficiency and environmental sustainability in agricultural mechanization.

Keywords: tractor exhaust emissions; predictive soil tillage; VRA soil tillage; precision agriculture; machine learning; XGBoost model

1. Introduction

Global energy consumption, according to the International Energy Agency, is expected to double by 2050 [1]. Unfortunately, energy consumption derived from fossil fuels remains high at around 80%, not only in certain countries but also in specific economic sectors [2]. Specifically, in agricultural production, diesel fuel currently dominates this sector, primarily due to the economic efficiency, durability, and reduced emissions of unburned hydrocarbons and carbon dioxide [3,4,5]. Although diesel is currently being successfully replaced by biodiesel in many countries [6], as the fuel of the 21st century, unfortunately, in the Republic of Serbia, diesel production has not yet reached a competitive level in the market [7]. Similar modeling approaches have been explored internationally, confirming the global relevance of such analyses. In Finland, Jokiniemi et al. [8] modeled fuel consumption across different silage-harvesting methods, integrating operational, mechanical, and environmental parameters. Nagar et al. [9] proposed a generalized machine learning framework to predict tractor fuel use under diverse Indian farming conditions, while Yang et al. [10] applied neural networks for estimating fuel consumption in autonomous agricultural systems. These studies demonstrate the broader applicability of AI-based models in optimizing fuel use and reducing emissions in agriculture across varying geographic and technological contexts [8,9,10]. Therefore, the analysis of energy inputs—fuel consumption, as well as monitoring gas emissions in different operational modes of tractors—is of exceptional importance.

The diverse operational purposes of tractors necessitate structural, energy, ergonomic, and ecological adaptations to current conditions. A tractor, as a working machine in agriculture, serves multifunctional purposes, and its ability to adapt to operational requirements defines it as more or less adaptable, which can significantly contribute to the reduction in exhaust gas emissions resulting from fuel combustion in internal combustion engines [11,12,13,14].

Many studies have addressed the impact of exhaust gases on the environment, with authors stating that vehicle exhaust gases have a significant effect on global warming, influence acid rain, and affect air composition [15] and, consequently, human health and other living beings [16,17]. In previous work [18], the authors developed a simplified approach to assess the key factors influencing tractor fuel consumption. To more accurately assess the issues of tractor exhaust emissions and the factors leading to increased emissions, factors are classified into three groups [19]: the first group is the technological equipment of tractors with technical systems for processing exhaust gases, the second includes operational factors, and the third pertains to the type and composition of the fuel and lubricants burned in the internal combustion engine. The first group can include factors such as technical and technological solutions for post-treatment of exhaust gases, as well as the technological level of the tractor engine itself. The second group can encompass operational causes such as the working setups of the tractor for performing certain operations in accordance with working conditions, the requirements of the implement with which the tractor is aggregated, etc., while the third group can include factors related to the fuel, such as the type and kind of fuel, the chemical composition of the fuel, etc.

This paper focuses on studying the causes of the second group, namely the operational factors that influence the change and composition of exhaust gases in the agro-technical operation of primary soil tillage with a plow. A tractor operating as a working machine in conjunction with a plow operates under variable loads, so variations in operation can be projected through different emissions of exhaust gases into the atmosphere. Therefore, an inadequately adjusted operational mode of the tractor for the given operational conditions can be noted as one of the causes of variation in exhaust gas emissions, categorized as an operational factor [20,21]. Determining the optimal operating mode of the tractor is very interesting from an economic point of view. Substantial expenditures are not needed to reduce exhaust gas emissions; instead, certain corrective actions or software solutions can bring about specific benefits.

Various authors have analyzed the fuel consumption of tractors and the environmental impact of exhaust gases when carrying out different operations and conditions [22,23]. According to the authors [24], the differences in the observed parameters range from a few percent to several times more.

The setting of the tractor’s working mode, especially the engine load, has a major influence on exhaust emissions. For example, an operating mode with an engine speed of around 1000 rpm and a low torque of around 30% of Mmax was found to be unfavorable from both an economic and an ecological point of view. At a higher torque (>50% of Mmax) and at medium (1000–2000 rpm) or high (>2000 rpm) engine speeds, the tractor is more efficient and more acceptable from an environmental point of view [25,26].

In a previous paper [27], it is stated that the specific draught of the tractor during tillage changes with the clay content, the volumetric mass of the soil, the organic matter content of the soil, and the cohesive forces acting between the soil aggregates. It was also found that when an increased rate of organic fertilizer was applied for several years, diesel fuel consumption on the plot was reduced by 25%, while the standard rate of organic fertilizer saved 14% of diesel fuel compared to a plot that received no organic fertilizer. In addition to organic matter, the moisture content of the soil also has a major influence on the specific draught and thus on fuel consumption [28].

In addition to the standard methods for modeling exhaust emissions, artificial intelligence (AI) has been increasingly used for these problems in recent years. The most applied models are artificial neural networks (ANNs), which have proven to be exceptionally good at predicting data compared to traditional numerical models. Most models are based on the application of the Levenberg–Marquardt algorithm in conjunction with log- and tan-sigmoidal transfer functions, which provide the best results [29,30,31,32,33]. In contrast to the most applied models such as ANN networks, Decision Trees (DTs), Random Forest (RF), Support Vector Machines (SVM), and others [34,35], the XGBoost model was applied to predict specific fuel consumption and CO₂ emissions due to the specificity of the data obtained from the experiment (for model descriptions, see Section 2.7).

Therefore, the main objective of this study is to develop and validate predictive models using the XGBoost machine learning algorithm to estimate CO₂ emissions and specific fuel consumption of an agricultural tractor during primary soil tillage. The models are based on real operational data collected under field conditions, including soil properties (clay content, organic matter, and moisture), tractor settings (engine load and working regime), and exhaust gas measurements. By applying AI techniques to field data, this research aims to improve understanding of the relationships between soil–tractor interactions and environmental impact and support future development of data-driven decision-support tools in precision agriculture.

2. Materials and Methods

2.1. Structure of the Experiment

The experimental tests were carried out in December 2019 and November 2020 on a sample area of 243 ha in Vojvodina in the municipality of Stara Pazova at the coordinates 44°59′38.0″ N 20°06′10.3″ E, during the agrotechnology time frame for autumn/winter soil tillage in Serbian conditions. The area is located at an altitude of 81–86 m above sea level, which is important because the height of the terrain has a minimal influence on the change in the traction resistance of the plough. The direction of movement of the tractor on the plot is shown in Figure 1. The length of the tractor’s path during operation in the direction 75° E–256° W was approximately 2000 m. The plot is divided into 80 cultivation zones, of which 33 cultivation plots were effectively utilized during the trials in 2019 and 2020. The soil type on the trial plot is carbonate chernozem.

Figure 1. Tractor movement paths under different working regimes during field trials in 2019 and 2020, correlated with soil properties based on Arany index and humus content.

The control of soil compaction on the plot was performed at the tillage depth of 25 cm with a penetrometer at 50 points, and in the 2019 production year, resistances between 100 and 200 N/cm² were detected, while in 2020, the values were between 90 and 220 N/cm².

In each production year, soil moisture was monitored in certain cultivation zones where the tractor was driven in order to rule out a possible influence of moisture on the change in resistance during tillage. Sampling was carried out at 3 points during a tractor pass. Soil moisture during tillage in 2019 was between 24.10 and 27.63%, while in 2020, moisture of 24.20–27.20% was measured according to the gravimetric method.

The average volumetric mass of the soil, determined with 100 cm³ Kopecky cylinders on samples taken from the surface and from the ploughing depth of 25 cm, was 1.42 g/cm³ in 2019 and 1.40 g/cm³ in 2020. During tillage, the tractor moved across the plot in the direction 75° E–256° W. A single working mode was maintained from one end of the plot to the other, and the six working modes were alternated on successive passes, each covering the 2.4 m working width of the plough. All passes were made within the same cultivation zones, with similar soil moisture and similar soil compaction. In this way, the influence of secondary factors was minimized, and the characteristics of the soil could exert their influence on the variation of the observed factors.

The wheel slippage in 2019 was between 9 and 16%, while in 2020 it was 8–14%. The Fendt 936 tractor (AGCO, Marktoberdorf, Germany) 2017 was used for the test. The tractor has a Deutz Fahr TTCD 6-cylinder diesel engine with a volume of 7750 cm³, which is specified with a rated power of 263 kW (358 hp) at 2200 rpm in accordance with the ECE R24 standard. The diesel fuel in the tractor is defined according to the DIN EN 590 standard, as required by the manufacturer. The compression ratio of the engine is 1:18 ± 0.3 and the fuel injection is a Deutz common rail system with an EDC 17 hardware engine control unit from Bosch. The engine meets the Tier IV emissions standard with a 2-stage turbocharger and cooled air in the intercooler, an EGR valve with cooled exhaust gases, a DPF filter, and SCR technology with AdBlue fluid, which is declared in accordance with the DIN 770 70 standard and diluted with demineralized water in a ratio of 32.5:67.5. The DPF filter was regenerated in 80 working hours in 2019, while the regeneration in 2020 took 130 working hours. All experimental setup data are shown in Table 1.

Table 1. Technical specifications of the tractor and plough used during the field experiment for primary soil tillage under varying operational modes—experimental setup data.

During tractor operation, the Tractor Management System (TMS) was used to set the transmission ratio with load limit control in combination with Vario technology. In this way, communication between the engine and transmission is direct, and the TMS processor is responsible for the most efficient way of operating each mode.

In addition to the specified factory weight of 10,830 kg, the mass of the tractor in operation has been increased with a ballast weight of 3000 kg, namely, an 1800 kg front ballast weight and 1200 kg (2 × 600 kg) ballast weights in the rear wheels.

Soil cultivation was performed with a 6-furrow reversible plow Kuhn Multi-Master 183, three-point hitched, with a working width of 240 cm and a support wheel that limited the soil cultivation depth to 25 cm.

During plowing, the tractor moved entirely on the unplowed ground (the “on land” variant), using navigation with an accuracy of 2.5 cm (1 inch), thus maintaining a uniform working width of 2.4 m.

2.2. Categorization of Input Variables

To enable the inclusion of diverse agronomic and environmental conditions within the machine learning model, several continuous input variables were transformed into categorical classes based on expert-defined thresholds and statistical distribution. Specifically:

Soil texture was categorized into three ordinal classes (1 = light, 2 = medium, 3 = heavy), based on the percentage of clay content as defined by FAO soil texture classification. Class 1 included soils with clay content < 20%, class 2 represented 20–35%, and class 3 referred to soils with >35% clay.
Humus content was categorized into three classes according to the agronomic interpretation: 1 = low (<2%), 2 = moderate (2–4%), and 3 = high (>4%).
Tractor working regime (engine load and gear combination) was encoded as categorical values (1, 2, and 3), representing predefined field operation setups: (1) low engine RPM and low gear, (2) medium engine RPM and medium gear, and (3) high engine RPM and high gear. These settings were based on typical field operation standards and predefined during the experimental design phase.

This binning approach allowed for improved model generalization while still preserving the agronomic interpretability of categorical inputs.
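The thresholds listed above can be written as two small functions. This is a minimal sketch, not the authors' code: the function names are hypothetical, and the handling of values that fall exactly on a boundary (20%, 35%, 2%, 4%) is an assumption, since the text does not state which class they belong to.

```python
def clay_class(clay_pct: float) -> int:
    """Ordinal soil-texture class from clay content (%), per Section 2.2."""
    if clay_pct < 20:
        return 1  # light
    if clay_pct <= 35:  # assumption: boundary values fall in the medium class
        return 2  # medium
    return 3  # heavy


def humus_class(humus_pct: float) -> int:
    """Ordinal humus class from humus content (%), per Section 2.2."""
    if humus_pct < 2:
        return 1  # low
    if humus_pct <= 4:  # assumption: boundary values fall in the moderate class
        return 2  # moderate
    return 3  # high


# Hypothetical samples: (clay %, humus %)
samples = [(18.0, 1.5), (27.5, 3.1), (41.0, 4.6)]
encoded = [(clay_class(c), humus_class(h)) for c, h in samples]
print(encoded)
```

Keeping the classes as ordered integers preserves the light-to-heavy and low-to-high ordering, which tree-based models can exploit through threshold splits.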

2.3. Remote Method for Determining Management Zones Based on NDVI

Remote sensing based on the Normalized Difference Vegetation Index (NDVI) was the starting point for delineating management zones within the 243 ha experimental field, using satellite data collected in the previous growing season. NDVI values served as a proxy for spatial variability in vegetation and indirectly reflected underlying differences in soil characteristics. These zones were used to define different sampling locations for measurements and were also coded as categorical variables during the machine learning model development to account for site-specific variability. Remote sensing is a relatively simple way to collect a large amount of multidimensional data [36,37,38].

The factors that led to variations in the NDVI index of the observed plant species on the experimental plot were drought, healthy and diseased plants, insects, soil compaction, nutrient availability, the quality of performed agricultural operations, variations in the textural composition of the soil within the plot, and many other factors.

In order to eliminate the factors that affect the NDVI index in a single production year and to focus on the factors that are present in the long term on the plot (such as different soil composition), a multi-year analysis of images was performed over a period of seven years using LandSat satellites. Analysis of the NDVI imagery resulted in the formation of 80 management zones of irregular shape, ranging in size from 0.92 ha to 5.9 ha [39].
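For readers unfamiliar with the index, NDVI is computed per pixel from near-infrared (NIR) and red reflectance as (NIR − Red)/(NIR + Red). The sketch below illustrates that formula and one plausible way to suppress single-year effects, namely a per-pixel mean over several years. The averaging step, the function names, and the reflectance values are assumptions for illustration; the paper does not describe how the seven years of imagery were combined.

```python
def ndvi(nir: float, red: float) -> float:
    """NDVI for one pixel from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat a pixel with no signal as NDVI = 0
    return (nir - red) / total


def multi_year_mean(ndvi_by_year):
    """Per-pixel mean NDVI; ndvi_by_year is a list of equally long per-year lists."""
    n_years = len(ndvi_by_year)
    return [sum(pixel_values) / n_years for pixel_values in zip(*ndvi_by_year)]


# Hypothetical (NIR, red) reflectance for three pixels in two years
year_a = [ndvi(0.50, 0.10), ndvi(0.40, 0.20), ndvi(0.30, 0.25)]
year_b = [ndvi(0.55, 0.08), ndvi(0.38, 0.22), ndvi(0.32, 0.24)]
print(multi_year_mean([year_a, year_b]))
```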

2.4. Soil Sampling with an Automatic Soil Sampler

After the management zones were determined, soil was sampled within each of the 80 zones using an automatic soil sampler mounted on a car and guided by GPS navigation. The number of samples per zone ranged from 5 for a management zone of 0.92 ha up to 30 for a management zone of 5.9 ha. All samples from one management zone were then combined into one representative sample, which was analyzed in the laboratory. A total of about 1200 samples were taken. The sampling depth corresponded to the plowing depth, i.e., 25 cm [40].

2.5. Soil Testing Methods in the Laboratory

The upper limit of soil plasticity was determined by the Arany method (MSZ-08-0205-2:1978) [41,42], which is based on determining the amount of water (in cm³) that needs to be added, with continuous mixing, to air-dried soil to reach the upper limit of plasticity.

Although the Arany Yarn Number method is not internationally recognized as a standard for soil texture classification based on particle-size distribution (such as USDA or FAO methods), it is widely used in Serbia and parts of Central Europe as a practical and economical proxy for estimating soil plasticity and workability. Its selection in this study was motivated by the goal of future field-level implementation, where cost and simplicity are critical.

To support comparability with global standards, we refer to [43], who established correlations between the Arany index and conventional soil texture classes based on sand, silt, and clay content. This relationship enables the alignment of our soil classification results with globally recognized texture systems. To differentiate the results according to different soil compositions, the classification of the clay content in the soil was carried out as follows: 42–48 Arany number values indicate a low clay level, 49–51 Arany number values indicate a medium clay level, and 52–60 Arany number values indicate a high clay level.

The content of organic carbon, i.e., humus, in the soil was determined by the Hungarian standard methods MSZ 08-0210:1977 and MSZ-08-0452:1980 [44,45]. For the classification of soil according to humus content, a division analogous to that for clay was made: low humus level for 2–3.06%, medium humus level for 3.07–3.2%, and high humus level for 3.21–4.00%.

The instantaneous soil moisture content was determined by the gravimetric method, drying soil samples in an oven at 105 °C to a constant mass.
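As a worked illustration of the gravimetric method, the moisture content follows from the sample mass before and after oven drying. The sketch assumes moisture is expressed on a dry-mass basis, which is the common convention but is not stated in the text; the function name and the masses are hypothetical.

```python
def gravimetric_moisture(wet_mass_g: float, dry_mass_g: float) -> float:
    """Soil moisture (%) on a dry-mass basis (assumed convention)."""
    if dry_mass_g <= 0:
        raise ValueError("dry mass must be positive")
    return (wet_mass_g - dry_mass_g) / dry_mass_g * 100.0


# Hypothetical sample: 125.0 g moist, 100.0 g after drying at 105 °C
print(gravimetric_moisture(125.0, 100.0))  # 25.0
```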

2.6. Method for Collecting Data from Tractors in Real Time

For the purpose of collecting data in real time from the Fendt 936 tractor, a compatible data logger “FMB 120” manufactured by Teltonika (Teltonika, Vilnius, Lithuania) was used. The device used GNSS connection in communication with the operations center: GPS, GLONASS, GALILEO, BEIDOU, SBAS, QZSS, DGPS, AGPS, and GSM mobile technology, which are shown in Table 2 with frequency transmission and reception range.

Table 2. Frequency transmission and reception ranges of the Teltonika FMB 120 GPS/GSM data logger used for tractor telemetry monitoring.

The working parameters of the tractor obtained via the FMB 120 data logger were the current location of the tractor on the experimental plot with an accuracy of <3 m, distance traveled (m), fuel consumption (L⁻¹), engine load (%), and engine temperature (°C). Signals arrived at the operations center every 60 s. By placing POI polygons with management zones in the operations center, the exact position of the tractor within each zone was determined. Several authors have addressed the topic of real-time data collection from tractors via the CAN bus. For example, in previous work [42], a specific data logger was used to collect data for the analysis of fuel consumption, engine load, engine speed, tractor performance, etc.

2.7. Method for Measuring Exhaust Gases

The composition of exhaust gases was measured with a portable gas analyzer Testo 350 (TestoGMBH, Lenzkirch, Germany), which meets the requirements of EN50270:2000-01 [46]. The analyzer was configured to monitor the concentration of molecular oxygen O₂, nitric oxide NO, nitrogen oxides NO_x, nitrogen dioxide NO₂, sulphur dioxide SO₂, carbon dioxide CO₂, and carbon monoxide CO. The time on the gas analyzer was synchronized with the time on the data logger installed in the tractor. This synchronization enabled the mutual correlation of real-time data from the mentioned devices and allowed for comparisons of different operating conditions, performance indicators, and exhaust gases.
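Once both clocks are synchronized, the two data streams can be joined on their timestamps. The following is a minimal sketch of one way to do this, pairing each data-logger record with the nearest gas-analyzer reading. The nearest-neighbour strategy, the 30 s tolerance, the function names, and all values are assumptions for illustration, not the procedure used in the study.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Index of the entry in sorted_times closest to t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    return i if (after - t) < (t - before) else i - 1


def align(logger_records, gas_records, max_gap_s=30):
    """Pair (time, engine_load) with the nearest (time, co2) within max_gap_s seconds."""
    gas_times = [t for t, _ in gas_records]
    merged = []
    for t, load in logger_records:
        j = nearest_index(gas_times, t)
        if abs(gas_times[j] - t) <= max_gap_s:
            merged.append((t, load, gas_records[j][1]))
    return merged


# Hypothetical data: seconds since start; logger reports every 60 s
logger = [(0, 62.0), (60, 71.0), (120, 68.0)]
gas = [(2, 11.8), (58, 12.4), (200, 12.9)]
print(align(logger, gas))
```

Logger records with no gas reading inside the tolerance are dropped rather than paired with a distant measurement, so the third logger record in the example is not matched.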

2.8. Machine Learning Model

Machine learning (ML) models have proven to be effective tools for regression and classification problems. An ML model learns, through training, a mapping from input data, which need not show a clear correlation with the target in advance, to the output data. Extreme Gradient Boosting (XGBoost) is an ensemble learning technique that iteratively combines weak learners, mainly decision trees, into a strong prediction model, with each new tree correcting the prediction errors of the trees built before it. The advantages of XGBoost are high flexibility, strong predictive performance, high scalability, and efficient model training. The method has been successful on various types of data in both regression and classification, in part because a regularization term is added to the loss function being minimized. Compared with other methods, XGBoost also tends to reach high accuracy with less training time. These models can be adapted to a wide range of applications: they can be applied directly, integrated with other algorithms, or tuned through their hyperparameters [47,48,49,50,51,52,53,54]. Figure 2 shows the proposed XGBoost model for the analysis of experimentally collected data [55].

Figure 2. Conceptual flowchart of the XGBoost machine learning model used for prediction of CO₂ emissions and fuel consumption from field-collected tractor and soil data.

The model selection and the final comparison of model efficiency were based on standard statistical measures: the coefficient of determination (R²), the mean absolute error (MAE), and the root mean square error (RMSE) [56]:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(1)

MAE = \frac{1}{n} \sum_{i = 1}^{n} |{\hat{y}}_{i} - y_{i}|

(2)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}}

(3)

$y_{i}$: actual (measured) value.
${\hat{y}}_{i}$: predicted value from the model.
$\bar{y}$: mean of the actual values.
$n$: total number of observations.
$R^{2}$ (coefficient of determination): measures the proportion of the variance in the dependent variable that is predictable from the independent variables.
$MAE$ (mean absolute error): the average of the absolute differences between actual and predicted values.
$MSE$ (mean squared error): the average of the squares of the differences between actual and predicted values.
$RMSE$ (root mean squared error): the square root of MSE, providing an error metric in the same units as the target variable.
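The metrics defined above can be computed directly from paired measured and predicted values. The sketch below is a plain-Python illustration with a hypothetical function name and toy numbers; in practice the same quantities are available in scikit-learn.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², MAE, MSE and RMSE for paired measured and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    mae = sum(abs(p - y) for y, p in zip(y_true, y_pred)) / n
    mse = ss_res / n
    return {"r2": 1 - ss_res / ss_tot, "mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


# Toy values, not measurements from the study
metrics = regression_metrics([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])
print(metrics)  # r2 = 0.75 for these toy values
```

Because RMSE is the square root of MSE, it is expressed in the same units as the target (for example, % CO₂ or g/kWh), which makes it easier to compare against the measured range.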

All statistical processing was performed in Python using the scikit-learn and XGBoost libraries. The dataset was pre-processed by encoding categorical variables numerically. Model performance was evaluated using standard regression metrics: coefficient of determination (R²), mean squared error (MSE), and root mean squared error (RMSE). In addition, hyperparameter tuning (learning rate and booster type) was performed to optimize model performance and avoid overfitting.

3. Results

The aim of the study was to determine the relationship between the working regime, the composition of the tilled soil (clay and humus content), the specific fuel consumption, and the CO₂ content in the exhaust gases. Since the input data are given in the form of categorical values, building each model requires converting the input data into numerical values. Based on all input data, the working regimes are grouped into six categories, and the humus and clay contents into three categories each. Table 3 shows the input data converted into numerical values.

Table 3. Categorized input variables (tractor working regime, clay content, and humus percentage) used in XGBoost model training and prediction.

The correlation matrix (Figure 3) was created to determine the dependencies between various input and output parameters. Positive values of the correlation coefficient indicate a positive correlation, while negative values indicate a negative correlation. The strength of the correlation is indicated by the absolute value of the correlation coefficient. From the data shown, it can be seen that none of the input data has a strong correlation with the output data, and this is the reason for choosing the XGBoost regression model. The percentage composition of CO₂ in the exhaust gases shows a negative correlation with all input parameters, and it is interesting to note that the percentage of humus shows a negative correlation in relation to the specific fuel consumption and the amount of CO₂.

Figure 3. Pearson correlation matrix showing linear relationships between input features (working regime, clay content, and humus content) and output variables (specific fuel consumption, CO₂ emission). Correlation values are dimensionless, ranging from –1 (perfect negative) to +1 (perfect positive). The units of the variables are as follows: humus (%) by weight, specific fuel consumption in g/kWh, and CO₂ concentration in exhaust gases in %.

This trend can be explained by the fact that soils with higher humus content tend to have better physical properties, including improved structure, porosity, and water retention capacity. These factors reduce soil compaction and draught resistance during tillage operations, which in turn leads to lower engine load and reduced fuel consumption. As a result, the engine operates more efficiently, producing lower levels of CO₂ in the exhaust gases.

The CO₂ concentration in exhaust gases is expressed as a percentage (%), while the specific fuel consumption is given in grams per kilowatt-hour (g/kWh).

The correlation matrix (Figure 3) reveals a negative relationship between humus content and both CO₂ emissions (r = –0.25) and specific fuel consumption (r = –0.10). This indicates that increased humus content leads to improved soil structure and reduced tillage resistance, thereby decreasing fuel demand and lowering emissions. Additionally, CO₂ emissions show a weak negative correlation with working regime and clay content, while specific fuel consumption has a weak positive correlation with CO₂ emissions (r = 0.12), suggesting that engine load and combustion dynamics are linked but also influenced by soil variability. The moderate positive correlation between clay and humus (r = 0.32) reflects soil texture tendencies in the tested plots.
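For reference, each entry of such a matrix is a Pearson coefficient between two variables. The sketch below computes it from first principles; the function name and the humus and CO₂ values are hypothetical and serve only to show how a negative coefficient arises.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical values: humus class and CO2 concentration (%)
humus = [1, 2, 3, 3, 2, 1]
co2 = [12.5, 12.0, 11.4, 11.8, 12.2, 12.1]
r = pearson_r(humus, co2)
print(round(r, 2))  # negative: higher humus class goes with lower CO2 here
```

Pearson's r captures only linear association, so weak coefficients like those in Figure 3 do not rule out nonlinear or interaction effects, which is one motivation for using a tree-based model.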

In order to optimize the XGBoost model for different output data for which the model needs to be trained, the hyperparameters of the model, booster and learning rate, were varied to avoid overfitting the model. The training score, i.e., the quality of the corresponding model, is determined according to the standard statistical equations (1) to (3), and the modeling results for the case of determining the percentage CO₂ content in the combustion products are shown in Table 4.

Table 4. Statistical performance of different XGBoost model configurations (booster and learning rate) in predicting CO₂ concentration in tractor exhaust gases.

Table 4 clearly shows that the last model (dart booster, learning rate = 0.11) has the best statistical performance: R² has the highest value, and MSE and RMSE have the lowest values. For this model, the distribution of the measured and calculated values is shown in Figure 4. Although the model with the gbtree booster and a learning rate of 0.02 shows a 22% lower R² value, an 11% higher MSE value, and a 0.05% higher RMSE value, Figure 5, which shows the distribution of its measured and calculated values, indicates that this model is also acceptable.

Figure 4. Scatter plot comparing measured and predicted CO₂ values using the best-performing XGBoost model (dart booster, learning rate = 0.11).

Figure 5. Predicted versus measured CO₂ values for XGBoost model with gbtree booster and learning rate = 0.02, showing acceptable model performance.

When comparing the predicted and measured values for working modes 1 and 2 (Figure 6), a noticeable variation can be observed in a subset of data points. Specifically, prediction errors greater than 30% occurred in less than 15% of the total cases, indicating that the model maintained a relatively good level of generalization under field conditions. These mismatches are likely due to external factors not directly captured in the input dataset, such as localized variations in soil moisture, ploughing depth, and sensor dynamics during real-time data acquisition.

Figure 6. Model prediction error analysis for CO₂ emissions in working regimes 1 (a) and 2 (b), highlighting deviation patterns.

Despite these discrepancies, the model demonstrated adequate robustness and can be reliably used for the prediction of CO₂ emissions based on the selected operational and soil parameters. This confirms its potential for application in exploratory analysis and as a component of decision-support systems in precision agriculture.

As shown in Figure 4, Figure 5 and Figure 6, there are several data points where the predicted CO₂ values deviate significantly from the observed measurements. These differences are particularly noticeable in the peak ranges, where the actual CO₂ values exceed 14%. Such deviations may result from operational conditions not captured by the recorded variables, such as instantaneous changes in plough depth, engine acceleration, or soil heterogeneity. Additionally, delays in sensor response during dynamic field conditions can introduce temporal offsets between measurements and predictions. Despite these outliers, the model accurately follows the general trend and captures the variation structure of the data, as confirmed by the overall prediction accuracy exceeding 80%.

The modeling results for specific fuel consumption are listed in Table 5. In contrast to the previous data set, the statistical indicators are much lower when predicting the specific fuel consumption per hectare traveled. The best model was obtained with the “dart” booster and a learning rate of 0.05. The distribution of the measured and predicted values is shown in Figure 7, where the deviations between the data can be clearly seen.

Table 5. Performance evaluation of XGBoost models with varying booster types and learning rates for predicting specific tractor fuel consumption (g/kWh).

Figure 7. Comparison of predicted and actual values for specific fuel consumption using the optimal XGBoost model (dart booster, learning rate = 0.05).

To better evaluate model performance beyond standard regression metrics, we transformed the continuous output of specific fuel consumption (SFC) into a binary classification problem. A threshold of 350 g/kWh, based on the median of the empirical distribution, was used to divide data into “low” and “high” consumption classes. This allowed us to construct a confusion matrix (Figure 8) that visualizes the model’s classification accuracy.

Despite the limited R² value obtained from regression (<0.3), the model correctly classified 65% of cases, showing reasonable alignment with observed labels. The matrix reveals a slight bias toward underestimating high-consumption cases, which can be attributed to the absence of key variables (e.g., real-time torque, load, and terrain resistance). Nonetheless, this level of accuracy is sufficient for exploratory or operational screening where a quick estimation of whether consumption is within or above the threshold is valuable. The chosen threshold and binarization were further justified by their prevalence in practical field advisory systems, where such thresholds are often predefined based on fuel economy targets.

In order to evaluate the classification performance of the XGBoost model for specific fuel consumption (SFC), a binary confusion matrix was constructed by transforming the continuous SFC output into two classes. The threshold for classification was defined at 350 g/kWh, based on the median value of the empirical distribution. Values below this threshold were categorized as “low consumption”, and values above it as “high consumption”.
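The binarization step can be sketched as follows. This is an illustrative example rather than the code used in the study: the function name `confusion_counts` and the SFC values are hypothetical, and “high consumption” (above the threshold) is assumed to be the positive class.

```python
from statistics import median


def confusion_counts(measured, predicted, threshold):
    """Count TN, FP, FN, TP with 'high' (> threshold) as the positive class."""
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for m, p in zip(measured, predicted):
        actual_high = m > threshold
        predicted_high = p > threshold
        if actual_high and predicted_high:
            counts["tp"] += 1
        elif actual_high:
            counts["fn"] += 1  # high consumption predicted as low
        elif predicted_high:
            counts["fp"] += 1  # low consumption predicted as high
        else:
            counts["tn"] += 1
    return counts


# Hypothetical SFC values in g/kWh, chosen so that the median is 350.
measured = [338, 344, 347, 350, 353, 356, 362]
predicted = [341, 352, 346, 349, 348, 357, 360]

threshold = median(measured)
counts = confusion_counts(measured, predicted, threshold)
print(threshold, counts)
```

Values equal to the threshold fall into the “low” class, matching the ≤350 g/kWh convention used in the caption of Figure 8.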

As shown in Figure 8, the model correctly classified 124 instances as true negatives (low SFC correctly identified) and 55 instances as true positives (high SFC correctly identified). However, 70 false negatives (high SFC incorrectly predicted as low) and 27 false positives (low SFC incorrectly predicted as high) were observed. This corresponds to an overall accuracy of about 65% (179 of 276 samples) but also reveals a tendency of the model to misclassify higher fuel consumption cases. This misclassification may be attributed to the limited number of features related to engine load and field resistance included in the current dataset. Future improvements could include real-time telemetry data such as torque, fuel injection pressure, or load dynamics to enhance prediction accuracy across consumption classes.
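The summary metrics implied by these counts can be worked out in a few lines. The sketch assumes that “high consumption” is the positive class, so the 70 high-SFC samples predicted as low count as missed positives and the 27 low-SFC samples predicted as high count as false alarms. The variable names are illustrative.

```python
# Counts reported for Figure 8.
true_low = 124      # low SFC correctly identified
true_high = 55      # high SFC correctly identified
missed_high = 70    # high SFC predicted as low
false_alarm = 27    # low SFC predicted as high

total = true_low + true_high + missed_high + false_alarm

# Share of all samples that were classified correctly.
accuracy = (true_low + true_high) / total
# Share of actual high-SFC samples that were detected.
sensitivity = true_high / (true_high + missed_high)
# Share of actual low-SFC samples that were detected.
specificity = true_low / (true_low + false_alarm)

print(f"n = {total}")
print(f"accuracy    = {accuracy:.3f}")
print(f"sensitivity = {sensitivity:.3f}")
print(f"specificity = {specificity:.3f}")
```

Under this assumption the accuracy is about 0.65, while the sensitivity (0.44) is much lower than the specificity (about 0.82), which quantifies the difficulty the model has with high-consumption cases.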

Figure 8. Confusion matrix for the classification of specific fuel consumption (SFC) based on a threshold of 350 g/kWh. The model distinguishes between low (≤350 g/kWh) and high (>350 g/kWh) consumption levels. Matrix values represent the number of samples (unitless), while SFC is measured in grams per kilowatt-hour (g/kWh).

This binary classification was used to simplify model evaluation and better visualize the discrimination ability of the model in distinguishing extreme consumption behaviors under field conditions, as recommended in similar studies on ML-based agricultural system modeling.

4. Discussion

This study demonstrates the applicability of the XGBoost algorithm for modeling CO₂ emissions and specific fuel consumption (SFC) during soil tillage using real-world experimental data. The obtained R² value exceeding 0.80 for CO₂ emissions confirms the ability of the model to capture complex, nonlinear relationships among operational, mechanical, and soil-related variables. Similar levels of accuracy were reported in related studies—e.g., Lim et al. (2024) achieved R² values of 0.85 and 0.97 for NO_x and PM emissions, respectively, during tractor plow tillage using regression-based models [23].

In contrast, the predictive power for SFC was substantially lower (R² < 0.3), which aligns with findings from previous studies that suggest SFC is influenced by a wider set of dynamic variables, including torque load, real-time engine RPM fluctuations, traction resistance, and topographical variations [45]. The absence of such parameters in our model—particularly torque and dynamic load data—likely limited the predictive performance. Moreover, the categorization of soil texture and working regimes (e.g., as classes instead of continuous values) may have contributed to a loss of sensitivity and granularity in model training.

Although the inclusion of multiple agronomic and machine parameters (e.g., soil moisture, wheel slippage, and engine temperature) enabled a robust predictive model for emissions, further model improvement—especially for SFC—could be achieved through advanced feature engineering. Incorporating polynomial interactions, time-series transformations, or sensor fusion techniques (e.g., CAN-bus data integration) would provide richer input context. Studies in other domains (e.g., wastewater aeration [56] and trajectory prediction for autonomous vehicles [54]) support the potential of such approaches.
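As an illustration of the kind of feature engineering mentioned above, the following sketch derives a second-order interaction term, a squared term, and a trailing moving average from raw records. It is a minimal example under stated assumptions: the field names (`soil_moisture`, `wheel_slip`, `engine_rpm`), the window length, and the sample values are hypothetical and do not reproduce the study's dataset.

```python
def rolling_mean(values, window):
    """Trailing moving average; early samples use the values available so far."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def add_engineered_features(rows, window=3):
    """Return copies of `rows` extended with polynomial and time-series features."""
    rpm_smooth = rolling_mean([r["engine_rpm"] for r in rows], window)
    enriched = []
    for r, smooth in zip(rows, rpm_smooth):
        new = dict(r)
        new["moisture_x_slip"] = r["soil_moisture"] * r["wheel_slip"]  # interaction term
        new["moisture_sq"] = r["soil_moisture"] ** 2                   # squared term
        new["rpm_rolling_mean"] = smooth                               # smoothed time series
        enriched.append(new)
    return enriched


# Hypothetical records in time order (moisture and slip in %, engine speed in rpm).
rows = [
    {"soil_moisture": 20.0, "wheel_slip": 10.0, "engine_rpm": 1400},
    {"soil_moisture": 22.0, "wheel_slip": 12.0, "engine_rpm": 1500},
    {"soil_moisture": 21.0, "wheel_slip": 11.0, "engine_rpm": 1600},
    {"soil_moisture": 19.0, "wheel_slip": 9.0, "engine_rpm": 1700},
]

features = add_engineered_features(rows)
for f in features:
    print(f["moisture_x_slip"], f["moisture_sq"], f["rpm_rolling_mean"])
```

Tree-based models can approximate such interactions on their own, but explicit terms and smoothed signals may help when the dataset is small or the raw signals are noisy.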

The DART (Dropouts meet Multiple Additive Regression Trees) booster gave the best results in our experiments, with a learning rate of 0.11 for the CO₂ model (Table 4) and 0.05 for the SFC model (Table 5). This is consistent with prior research highlighting the regularization and generalization benefits of DART in avoiding overfitting on smaller or noisy datasets [48]. DART’s dropout mechanism reduces the dominance of individual trees and encourages diversity in the boosted ensemble, improving generalization in agronomic settings where data may include irregularities or missing values [56].
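The dropout idea can be illustrated with a toy re-implementation. The sketch below is not XGBoost's code and not the model trained in this study: it uses one-split trees (stumps) on synthetic data, all function names are hypothetical, and the re-weighting follows the “tree” normalization described in the XGBoost documentation, where dropped trees are scaled by k/(k + learning rate) and the new tree by learning rate/(k + learning rate) for k dropped trees.

```python
import random


def fit_stump(x, residual):
    """Fit a one-split regression tree (stump) by minimising squared error."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        split = (lo + hi) / 2
        left = [r for xi, r in zip(x, residual) if xi <= split]
        right = [r for xi, r in zip(x, residual) if xi > split]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, split, lv, rv)
    _, split, lv, rv = best
    return lambda xi: lv if xi <= split else rv


def predict(base, trees, weights, xi, skip=()):
    """Weighted sum of tree outputs, ignoring the tree indices listed in `skip`."""
    return base + sum(w * t(xi) for i, (t, w) in enumerate(zip(trees, weights))
                      if i not in skip)


def fit_dart(x, y, rounds=20, learning_rate=0.5, rate_drop=0.3, seed=0):
    rng = random.Random(seed)
    base = sum(y) / len(y)
    trees, weights = [], []
    for _ in range(rounds):
        # Randomly drop some existing trees for this round.
        dropped = [i for i in range(len(trees)) if rng.random() < rate_drop]
        # Fit the new tree to the residual of the ensemble without dropped trees.
        residual = [yi - predict(base, trees, weights, xi, skip=dropped)
                    for xi, yi in zip(x, y)]
        new_tree = fit_stump(x, residual)
        k = len(dropped)
        if k == 0:
            new_weight = learning_rate  # plain gradient-boosting step
        else:
            for i in dropped:
                weights[i] *= k / (k + learning_rate)
            new_weight = learning_rate / (k + learning_rate)
        trees.append(new_tree)
        weights.append(new_weight)
    return base, trees, weights


def mse(base, trees, weights, x, y):
    return sum((yi - predict(base, trees, weights, xi)) ** 2
               for xi, yi in zip(x, y)) / len(y)


# Synthetic step-shaped data, for illustration only.
x = list(range(20))
y = [1.0 if xi < 10 else 3.0 for xi in x]

base, trees, weights = fit_dart(x, y)
mse_before = mse(base, [], [], x, y)
mse_after = mse(base, trees, weights, x, y)
print(mse_before, mse_after)
```

In the XGBoost library itself, the corresponding settings are `booster="dart"` together with `learning_rate` and `rate_drop`; the values used in this toy example are arbitrary.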

A significant advantage of the XGBoost method, compared to linear regression, support vector machines (SVMs), or even artificial neural networks (ANNs), lies in its capacity to handle heterogeneously scaled features and missing values without extensive preprocessing [43,45]. Moreover, unlike deep neural networks, XGBoost provides relatively interpretable outputs—particularly when combined with SHAP (Shapley Additive Explanations), which has been effectively used in land-use modeling and environmental science [50]. Although SHAP was not implemented in this study, its application in future work could yield deeper insights into variable importance and model explainability.

Our findings on the weak individual correlation between input variables and emissions reaffirm the utility of ensemble tree-based models that aggregate weak signals to form strong predictors. However, this complexity raises challenges in direct physical interpretability—a known limitation of tree-based ensemble methods in agricultural research. For practical deployment in field machinery, transparency and traceability remain important, especially for systems intended to support operator decision-making or regulatory compliance.

From a practical standpoint, the high accuracy of the CO₂ emission model offers promising implications for real-time applications in precision agriculture. Integrating such models into embedded tractor systems or cloud-based farm management platforms could allow dynamic emission monitoring and feedback-based operational adjustments. Similar approaches have been demonstrated in intelligent irrigation and carbon-aware aeration management in wastewater systems [56].

It is also noteworthy that the model was developed using real operational data from two separate growing seasons, which contributes to its robustness. Nevertheless, the generalizability remains constrained by the geographical and mechanical homogeneity of the dataset. Expanding the dataset to include different machinery types, more soil classes, and variable topographies—combined with the integration of high-frequency telemetry—could further enhance model performance and applicability across diverse agricultural systems.

Lastly, the methodology used in this study contributes to the growing body of literature advocating for machine learning approaches in environmental impact assessment and low-carbon agriculture. As emission regulations for non-road mobile machinery become more stringent globally, the development of reliable, interpretable, and data-efficient predictive models will become increasingly important for compliance, optimization, and sustainability in agricultural mechanization.

5. Conclusions

This study investigated the viability of the XGBoost machine learning algorithm for predicting CO₂ emission and specific fuel consumption (SFC) during soil tillage operations with agricultural tractors. By integrating a wide range of field-collected variables—including soil texture, moisture content, wheel slippage, and engine performance parameters—XGBoost demonstrated strong predictive capability for CO₂ emissions (R² > 0.80), confirming its suitability for modeling complex, nonlinear systems in agricultural mechanization.

The relatively lower performance of the SFC model (R² < 0.3) highlights the need for a broader set of dynamic input features to capture the multifactorial nature of fuel consumption under variable field conditions. Despite this limitation, the study confirms the relevance of machine learning models in the environmental performance monitoring of agricultural machinery, with particular value for emission estimation, operational optimization, and the development of precision farming technologies.

The choice of the DART booster within the XGBoost framework proved effective in preventing overfitting while maintaining predictive accuracy, indicating its potential for use in data-limited but complex agricultural scenarios. Compared to traditional statistical methods, the use of tree-based ensemble learning models enables higher flexibility and interpretability, particularly when paired with post hoc explanatory tools such as SHAP.

From a broader perspective, this research contributes to the growing body of work on sustainable agriculture, offering a data-driven approach for supporting regulatory compliance, energy efficiency, and low-carbon transitions in farming. Future research should focus on the expansion of datasets across seasons, topographies, and machinery types, the incorporation of real-time sensor data, and the integration of predictive models into decision-support systems and embedded machinery platforms.

The findings underline the transformative potential of machine learning in advancing environmentally responsible agricultural practices and pave the way for intelligent, emission-aware field operations aligned with global sustainability goals.

Author Contributions

N.B.: experiment setup, data collection, and data processing. O.E.-Đ.: methodology, software, validation, formal analysis, visualization, investigation, data curation, resources, writing—original draft, and writing—review and editing. M.M.: software, formal analysis, validation, data curation, investigation, and writing—review and editing. A.R.: data curation and writing—original draft. Z.M.: formal analysis and writing—review and editing. A.D.: formal analysis, writing—original draft, and writing—review and editing. R.M.: data curation, formal analysis, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by “Contract for the Transfer of Funds for the Financing of Scientific Research Work of Teaching Staff at Accredited Higher Education Institutions in 2025, contract registration number: 451-03-65/2024-03/200116”.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The dataset is available upon request from the authors.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Conti, J.; Holtberg, P.; Diefenderfer, J.; LaRose, A.; Turnure, J.T.; Westfall, L. International Energy Outlook 2016 with Projections to 2040; No. DOE/EIA-0484; USDOE Energy Information Administration (EIA), Office of Energy Analysis: Washington, DC, USA, 2016.
2. Song, S.; Li, T.; Liu, P.; Li, Z. The transition pathway of energy supply systems towards carbon neutrality based on a multi-regional energy infrastructure planning approach: A case study of China. Energy 2022, 238, 122037.
3. Akbari, M.; Piri, H.; Renzi, M.; Bietresato, M. The Effects of Biodiesel on the Performance and Gas Emissions of Farm Tractors’ Engines: A Systematic Review, Meta-Analysis, and Meta-Regression. Energies 2024, 17, 4226.
4. Visković, J.; Dunđerski, D.; Adamović, B.; Jaćimović, G.; Latković, D.; Vojnović, Đ. Toward an Environmentally Friendly Future: An Overview of Biofuels from Corn and Potential Alternatives in Hemp and Cucurbits. Agronomy 2024, 14, 1195.
5. Naik, S.N.N.; Goud, V.V.; Rout, P.K.; Dalai, A.K. Production of first and second generation biofuels: A comprehensive review. Renew. Sustain. Energy Rev. 2010, 14, 578–597.
6. Perkowska, A.; Klepacki, B.; Bórawski, P.; Bełdycka-Bórawska, A.; Michalski, K. Changes in energy consumption in agriculture in the EU countries. Energies 2021, 14, 1570.
7. Mizik, T.; Gyarmati, G. Economic and Sustainability of Biodiesel Production—A Systematic Literature Review. Clean Technol. 2021, 3, 19–36.
8. Jokiniemi, T.; Ahokas, J.; Rinne, M. Predicting fuel consumption of grass silage harvesting systems. Agric. Food Sci. 2021, 30, 26–40.
9. Nagar, M.; Pandey, A.K.; Ghosh, A. Cloud-driven serverless framework for generalised tractor fuel consumption prediction model using machine learning. Comput. Electron. Agric. 2024, 214, 10807.
10. Yang, M.; Wang, Y.; Zhang, Y. Estimating Fuel Consumption of an Agricultural Robot by Applying Artificial Neural Networks. Machines 2023, 11, 43.
11. Felten, D.; Froba, N.; Fries, J.; Emmerling, C. Energy balances and greenhouse gas-mitigation potentials of bioenergy cropping systems (Miscanthus, rapeseed, and maize) based on farming conditions in Western Germany. Renew. Energy 2013, 55, 160–174.
12. Shamshirband, S.; Khoshnevisan, B.; Yousefi, M.; Bolandnazar, E.; Anuar, N.B.; Wahab, A.W.; Khan, S.U. A multiobjective evolutionary algorithm for energy management of agricultural systems—A case study in Iran. Renew. Sustain. Energy Rev. 2015, 44, 457–465.
13. Paciolla, F.; Łyp-Wrońska, K.; Quartarella, T.; Pascuzzi, S. Simulation Analysis of Energy Inputs Required by Agricultural Machines to Perform Field Operations. AgriEngineering 2025, 7, 7.
14. Dettù, F.; Formentin, S.; Savaresi, S.M. Driving Style Assessment System for Agricultural Tractors: Design and Experimental Validation. Agronomy 2022, 12, 590.
15. Milosavljević, M.M.; Marinković, A.D.; Petrović, S.D.; Sovrlić, M. A new ecologically friendly process for the synthesis of selective flotation reagents. Chem. Ind. Chem. Eng. Q. 2009, 15, 257–262.
16. Sedyaaw, P.; Kawade, S.S.; Bhaladhare, R.D.; Pranali, K.; Pandey, A. Causes, Effects and Management Measures of Acid Rain: A Review. Bhartiya Krishi Anusandhan Patrika 2024, 39, 215–222.
17. Mamta, B. Acid Rain-A Specific type of Pollution: Its concept, causes, effects and tentative solutions. Chem. Sci. Rev. Lett. 2016, 5, 21–26.
18. Cutini, M.; Brambilla, M.; Pochi, D.; Fanigliulo, R.; Bisaglia, C. A Simplified Approach to the Evaluation of the Influences of Key Factors on Agricultural Tractor Fuel Consumption during Heavy Drawbar Tasks under Field Conditions. Agronomy 2022, 12, 1017.
19. Renius, K.T. Fundamentals of Tractor Design; Springer Nature: Dordrecht, The Netherlands, 2020.
20. Martins, M.B.; Marques Filho, A.C.; Seron, C.d.C.; Guimarães Júnnyor, W.d.S.; Vendruscolo, E.P.; Bortolheiro, F.P.d.A.P.; Blanco Bertolo, D.M.; Lopes, A.G.C.; Santana, L.S. Controlled Traffic Farm: Fuel Demand and Carbon Emissions in Soybean Sowing. AgriEngineering 2024, 6, 1794–1806.
21. Damanauskas, V.; Janulevičius, A. Validation of Criteria for Predicting Tractor Fuel Consumption and CO₂ Emissions When Ploughing Fields of Different Shapes and Dimensions. AgriEngineering 2023, 5, 2408–2422.
22. Al-Sager, S.M.; Almady, S.S.; Marey, S.A.; Al-Hamed, S.A.; Aboukarima, A.M. Prediction of Specific Fuel Consumption of a Tractor during the Tillage Process Using an Artificial Neural Network Method. Agronomy 2024, 14, 492.
23. Lim, R.-G.; Kim, T.-B.; Kim, W.-S.; Baek, S.-Y.; Jeon, H.-H.; Ham, J.-Y.; Yoo, C.; Kim, Y.-J. Development and Validation of Prediction Model for Exhaust Emissions During Tractor Plow Tillage. Agriculture 2024, 14, 2334.
24. Janulevicius, A.; Juostas, A.; Pupinis, G. Estimation of tractor wheel slippage with different tire pressures for 4wd and 2wd driving systems. Eng. Rural. Dev. 2019, 22, 88–93.
25. Juostas, A.; Janulevičius, A. Tractor’s engine efficiency and exhaust emissions’ research in drilling work. J. Environ. Eng. Landsc. Manag. 2014, 22, 141–150.
26. Peltre, C.; Nyord, T.; Bruun, S.; Jensen, L.S.; Magid, J. Repeated soil application of organic waste amendments reduces draught force and fuel consumption for soil tillage. Agric. Ecosyst. Environ. 2015, 211, 94–101.
27. Arvidsson, J.; Hillerström, O. Specific draught, soil fragmentation and straw incorporation for different tine and share types. Soil Tillage Res. 2010, 110, 154–160.
28. Manimaran, R.; Mohanraj, T.; Venkatesan, M.; Ganesan, R.; Balasubramanian, D. A computational technique for prediction and optimization of VCR engine performance and emission parameters fuelled with Trichosanthes cucumerina biodiesel using RSM with desirability function approach. Energy 2022, 254, 124293.
29. El-Shafay, A.; Gad, M.; Ağbulut, Ü.; Attia, E.-A. Optimization of performance and emission outputs of a CI engine powered with waste fat biodiesel: A detailed RSM, fuzzy multi-objective and MCDM application. Energy 2023, 275, 127356.
30. Wang, H.; Ji, C.; Shi, C.; Ge, Y.; Meng, H.; Yang, J.; Chang, K.; Wang, S. Comparison and evaluation of advanced machine learning methods for performance and emissions prediction of a gasoline Wankel rotary engine. Energy 2022, 248, 123611.
31. Krzywanski, J.; Skrobek, D.; Sosnowski, M.; Ashraf, W.M.; Grabowska, K.; Zylka, A.; Kulakowska, A.; Nowak, W.; Sztekler, K.; Shahzad, M.W. Towards enhanced heat and mass exchange in adsorption systems: The role of AutoML and fluidized bed innovations. Int. Commun. Heat Mass Transf. 2024, 152, 107262.
32. Li, Y.; Jia, M.; Han, X.; Bai, X.-S. Towards a comprehensive optimization of engine efficiency and emissions by coupling artificial neural network (ANN) with genetic algorithm (GA). Energy 2021, 225, 120331.
33. Venkatesh, N.S.; Sugumaran, V.; Thangavel, V.; Vijayaragavan, M.; Subramanian, B.; Js, F.J.; Varuvel, E.G. Efficacy of machine learning algorithms in estimating emissions in a dual fuel compression ignition engine operating on hydrogen and diesel. Int. J. Hydrogen Energy 2023, 48, 39599–39611.
34. Yıldırım, S.; Tosun, E.; Çalık, A.; Uluocak, İ.; Avşar, E. Artificial intelligence techniques for the vibration, noise, and emission characteristics of a hydrogen-enriched diesel engine. Energy Sources Part A Recovery Util. Environ. Eff. 2018, 41, 2194–2206.
35. Ali, A.; Rondelli, V.; Martelli, R.; Falsone, G.; Lupia, F.; Barbanti, L. Management Zones Delineation through Clustering Techniques Based on Soils Traits, NDVI Data, and Multiple Year Crop Yields. Agriculture 2022, 12, 231.
36. Cammarano, D.; Zha, H.; Wilson, L.; Li, Y.; Batchelor, W.D.; Miao, Y. A Remote Sensing-Based Approach to Management Zone Delineation in Small Scale Farming Systems. Agronomy 2020, 10, 1767.
37. Serrano, J.; Shahidian, S.; Paixão, L.; Marques da Silva, J.; Moral, F. Management Zones in Pastures Based on Soil Apparent Electrical Conductivity and Altitude: NDVI, Soil and Biomass Sampling Validation. Agronomy 2022, 12, 778.
38. Rokhafrouz, M.; Latifi, H.; Abkar, A.A.; Wojciechowski, T.; Czechlowski, M.; Naieni, A.S.; Maghsoudi, Y.; Niedbała, G. Simplified and Hybrid Remote Sensing-Based Delineation of Management Zones for Nitrogen Variable Rate Application in Wheat. Agriculture 2021, 11, 1104.
39. Szolnoki, Z.; Farsang, A. Evaluation of metal mobility and bioaccessibility in soils of urban vegetable gardens using sequential extraction. Water Air Soil Pollut. 2013, 224, 1737.
40. Kassai, P.; Kocsis, M.; Szatmári, G.; Makó, A.; Mészáros, J.; Laborczi, A.; Magyar, Z.; Takács, K.; Pásztor, L.; Szabó, B. Large-scale mapping of soil particle size distribution using legacy data and machine learning-based pedotransfer functions. Geoderma 2025, 454, 117178.
41. Pitla, S.; Lin, N.; Shearer, S.A.; Luck, J.D. Use of Controller Area Network (Can) Data to Determine Field Efficiencies of Agricultural Machinery. Appl. Eng. Agric. 2014, 30, 829–838.
42. MSZ-08 0205-2:1978; Determination of Physical and Hydrophysical Properties of Soil. Hungarian Standards Institution: Budapest, Hungary, 1978.
43. Weynants, M.; Baetens, J.M.; Tóth, G.; Lobb, D.A.; Van Orshoven, J. Predicting soil particle-size distribution using the Arany plasticity index: A comparative study across European soils. Geoderma 2024, 439, 116596.
44. MSZ−08-0210:1977; Determination of the Organic Carbon Content of Soil. Hungarian Standards Institution: Budapest, Hungary, 1977.
45. MSZ 08-0452:1980; Use of High-Capacity Analyser Systems for Soils Analyses. Quantitative Determination of the Organic Carbon Content of the Soil on Contiflo Analyzer System. Hungarian Standards Institution: Budapest, Hungary, 1980.
46. EN 50270: 2000-01; Electromagnetic Compatibility of Electrical Apparatus for the Detection and Measurement of Combustible Gases, Toxic Gases, or Oxygen. British Standard Institution: London, UK, 2015.
47. Li, X.; Ma, L.; Chen, P.; Xu, H.; Xing, Q.; Yan, J.; Lu, S.; Fan, H.; Yang, L.; Cheng, Y. Probabilistic solar irradiance forecasting based on XGBoost. Energy Rep. 2022, 8, 1087–1095.
48. Sagi, O.; Rokach, L. Approximating XGBoost with an interpretable decision tree. Inf. Sci. 2021, 572, 522–542.
49. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; pp. 785–794.
50. Dong, J.; Chen, Y.; Yao, B.; Zhang, X.; Zeng, N. A neural network boosting regression model based on XGBoost. Appl. Soft Comput. 2022, 125, 109067.
51. Batunacun; Wieland, R.; Lakes, T.; Nendel, C. Using shapley additive explanations to interpret extreme gradient boosting predictions of grassland degradation in Xilingol, China. Geosci. Model Dev. 2021, 14, 1493–1510.
52. Romeo, L.; Frontoni, E. A unified hierarchical XGBoost model for classifying priorities for COVID-19 vaccination campaign. Pattern Recognit. 2022, 121, 108197.
53. Zhang, X.L.; Xiu, C.D.; Wang, Y.Z.; Yang, D. High-precision WiFi indoor localization algorithm based on CSI-XGBoost. J. Beijing Univ. Aeronaut. Astronaut. 2018, 44, 2536–2544.
54. Li, Y.Z.; Wang, Z.Y.; Zhou, Y.L.; Han, X. The improvement and application of XGBoost method based on the Bayesian optimization. J. Guangdong Univ. Technol. 2018, 35, 23–28.
55. Liu, P.; Fan, W. Extreme Gradient Boosting (XGBoost) Model for Vehicle Trajectory Prediction in Connected and Autonomous Vehicle Environment. Promet-Traffic Transp. 2021, 33, 767–774.
56. Wang, H.C.; Wang, Y.Q.; Wang, X.; Yin, W.X.; Yu, T.C.; Xue, C.H.; Wang, A.J. Multimodal machine learning guides low carbon aeration strategies in urban wastewater treatment. Engineering 2024, 36, 51–62.

Figure 1. Tractor movement paths under different working regimes during field trials in 2019 and 2020, correlated with soil properties based on Arany index and humus content.

Figure 2. Conceptual flowchart of the XGBoost machine learning model used for prediction of CO₂ emissions and fuel consumption from field-collected tractor and soil data.

Figure 3. Pearson correlation matrix showing linear relationships between input features (working regime, clay content, and humus content) and output variables (specific fuel consumption, CO₂ emission). Correlation values are dimensionless, ranging from –1 (perfect negative) to +1 (perfect positive). The units of the variables are as follows: humus (%) by weight, specific fuel consumption in g/kWh, and CO₂ concentration in exhaust gases in %.

Figure 4. Scatter plot comparing measured and predicted CO₂ values using the best-performing XGBoost model (dart booster, learning rate = 0.11).

Figure 5. Predicted versus measured CO₂ values for XGBoost model with gbtree booster and learning rate = 0.02, showing acceptable model performance.

Table 1. Technical specifications of the tractor and plough used during the field experiment for primary soil tillage under varying operational modes—experimental setup data.

Tractor Characteristic	Value	Plough Characteristic	Value
Engine power (kW), ECE R24	263	Number of working bodies	6
No. of revolutions at max. power (min⁻¹)	1900	Working width per working body (cm)	40
M_max/n_max (Nm/min⁻¹)	1498/1450	Plough working width (cm)	240
q (g/kWh)	195	Tillage depth (cm)	25
Energy supply in reference to const. mass (kW/t)	2.4	Clearance (cm)	80
Specific mass without ballast (kg/kW)	41.5	Hitching type	Mounted
Specific mass with ballast (kg/kW)	52.99	Mass (kg)	2341
Mass without ballast (kg)	10830	Required power (kW)	243

Table 2. Frequency transmission and reception ranges of the Teltonika FMB 120 GPS/GSM data logger used for tractor telemetry monitoring.

Channels	Tx	Rx
GSM 900	880–915 MHz	925–960 MHz
GSM 1800	1710–1785 MHz	1805–1880 MHz
BT/BLE	2400–2483.5 MHz	2400–2483.5 MHz
GPS L1		1575.42 MHz
GLONASS		1602.56–1615.50 MHz

Table 3. Categorized input variables (tractor working regime, clay content, and humus percentage) used in XGBoost model training and prediction.

Tractor Working Regime	Class	Clay [%]	Class	Humus [%]	Class
1300–1700 TMS, 25 cm, 2.4 m	1	42–50	1	2–3.06	1
1400–1500 TMS, 25 cm, 2.4 m	2	50–60	2	3.07–3.2	2
1400–1600 TMS, 25 cm, 2.4 m	3	42–58	3	3.2–4	3
1500–1600 TMS, 25 cm, 2.4 m	4	49–51	4		
1600–1700 TMS, 25 cm, 2.14 m	5	52–60	5		
without TMS, 25 cm, 2.4 m	6				

Table 4. Statistical performance of different XGBoost model configurations (booster and learning rate) in predicting CO₂ concentration in tractor exhaust gases.

Booster	Learning Rate	R²	MSE	RMSE
gbtree	0.1	0.337811	0.256934	0.506889
gbtree	0.05	0.334865	0.243014	0.492964
gbtree	0.01	0.330987	0.254212	0.504195
gbtree	0.02	0.427529	0.147581	0.384163
gbtree	0.021	0.315038	0.266614	0.515885
dart	0.02	0.312012	0.253482	0.503470
dart	0.01	0.365701	0.256175	0.506137
dart	0.1	0.423450	0.217854	0.466748
dart	0.11	0.549201	0.132960	0.364636

Table 5. Performance evaluation of XGBoost models with varying booster types and learning rates for predicting specific tractor fuel consumption (g/kWh).

Booster	Learning Rate	R²	MSE	RMSE
gbtree	0.3	0.278144	8.712392	2.951676
gbtree	0.1	0.235264	10.042184	3.168940
dart	0.05	0.289586	9.048537	3.008078
dart	0.04	0.274618	9.580924	3.095306
dart	0.3	0.286009	8.822204	2.970193

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
