The Relationship Between Breakdowns and Production, and the Detection of Breakdown Units in Mining Vehicles Using Machine Learning

Gödur, Erol; Çebi, Yalçın; Onur, Ahmet Hakan

doi:10.3390/app16031517


by Erol Gödur ¹,*, Yalçın Çebi ² and Ahmet Hakan Onur ³

¹ Department of Computer Engineering, Graduate School of Natural and Applied Sciences, Dokuz Eylül University, Izmir 35160, Turkey

² Department of Computer Engineering, Faculty of Engineering, Dokuz Eylül University, Izmir 35160, Turkey

³ Department of Mining Engineering, Faculty of Engineering, Dokuz Eylül University, Izmir 35160, Turkey

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(3), 1517; https://doi.org/10.3390/app16031517

Submission received: 10 November 2025 / Revised: 8 December 2025 / Accepted: 27 January 2026 / Published: 3 February 2026


Abstract

The mining industry relies heavily on large-scale machinery, making operational efficiency highly sensitive to equipment breakdowns and maintenance interruptions. Such breakdowns directly affect production performance, operational costs, and planning accuracy. Therefore, the ability to predict machinery downtime, particularly for haul trucks, loaders, drilling machinery, and dozers used in open-pit operations, is essential for improving productivity and ensuring reliable mine planning. This study aims to predict machinery breakdowns and estimate the annual total number of breakdowns using machine-learning techniques applied to a fully digitalized dataset of 16,027 breakdown and maintenance records collected from an open-pit coal mine. A Random Forest classification model was developed to identify the breakdown unit for each event, achieving an accuracy of 94%, while a Random Forest regression model estimated the annual breakdown counts with an R² value of 0.98. In addition, the relationships between breakdown frequency and key production indicators were examined using linear regression and correlation analyses. The results show a strong association between run-of-mine quantities and coal production, a moderate relationship between stripping activity and breakdown frequency, and negligible linear relationships between breakdowns and production volumes. Overall, the findings demonstrate that integrating machine-learning models with operational mining data can significantly enhance predictive maintenance, reduce unplanned downtime, and improve production planning in open-pit mining operations.

Keywords:

open-pit mine; vehicle breakdowns; machine learning; random forest; predictive maintenance

1. Introduction

The mining industry is one of the oldest pillars of global industrial development and continues to serve as a critical supplier of raw materials for energy production, metallurgy, construction, and large-scale manufacturing. As mineral demand intensifies, modern mining operations face interconnected economic and operational pressures, including fluctuating commodity prices, higher extraction costs, and increasingly complex logistical systems [1,2]. These challenges underscore the need for improved operational efficiency, robust production planning, and data-driven maintenance strategies, particularly in open-pit mining where excavation, loading, hauling, dumping, and reclamation activities depend heavily on large-scale machinery.

Open-pit mining operations involve multiple technical processes that require coordinated planning of equipment fleets, scheduling, maintenance, and haulage logistics. Over the past two decades, researchers have increasingly used optimization models and decision-support systems (DSSs) to enhance production planning, fleet allocation, and environmental performance. Several DSS frameworks, ranging from shovel–truck allocation models [3] and architectural systems for robotized pits [4] to multi-criteria decision tools [5] and ultimate pit limit optimization [6,7], have demonstrated the value of mathematical and computational approaches for mining operations. Other studies have addressed reclamation planning [8,9,10,11], transportation system selection [12,13,14], and integrated processing–transportation decision-making [15]. Overall, the literature shows a strong focus on improving operational planning and long-term strategic decisions, yet the direct relationship between equipment failures and production-level indicators remains insufficiently studied.

Digitalization has further accelerated the development of DSS in mining and other industrial fields. Real-time tracking, sensor integration, simulation-driven route optimization, and data analytics have become central tools for improving logistics and maintenance [16,17,18,19,20]. DSS applications now span sectors such as automotive disassembly [21], maritime fuel optimization [22], and electrical infrastructure management [23], illustrating the broad applicability of data-driven maintenance systems. In the mining sector, similar tools have been used for oil analysis-based condition monitoring [24], stochastic maintenance cost estimation [25], preventive maintenance scheduling [26], and early warning systems for mining haul trucks [27,28,29,30,31,32,33]. These studies collectively demonstrate that machine-learning approaches can support maintenance decisions, enhance reliability, and reduce operational risk.

Despite this progress, a significant knowledge gap remains. Existing studies largely focus on isolated aspects of mining operations, such as predictive maintenance, haulage optimization, or equipment condition monitoring, without integrating these components with real production indicators (e.g., stripping, run-of-mine, coal output). Very few works investigate the bidirectional relationships between production load and equipment breakdowns. Moreover, although machine-learning models have been widely used for fault detection and predictive maintenance, they are seldom coupled with operational regression analyses that quantify how breakdowns interact with production metrics in real-world mines. This gap limits the ability of mine operators to design maintenance strategies aligned with actual production demands.

In addition, mining operations face unique environmental and mechanical stressors, including dust, thermal fluctuation, vibration, abrasive materials, and variable operator performance, that accelerate equipment wear [34,35,36,37,38,39,40,41,42,43,44,45,46]. Understanding how these stressors manifest as measurable breakdown patterns, and how those breakdowns relate to production output, remains an open research challenge. Therefore, there is a need for studies that bridge the gap between operational production data and data-driven predictive maintenance models [47,48].

Random Forest is particularly well suited for this purpose. As an ensemble algorithm, it offers high accuracy, robustness to noise, and the ability to model nonlinear relationships inherent in mining systems [49,50,51]. These characteristics make it a strong candidate for identifying breakdown-prone components and for predicting annual failure frequencies in complex operational environments where traditional statistical assumptions (e.g., linearity, homoscedasticity) often do not hold.

To address the gaps identified in the literature, the present study makes the following contributions:

A bidirectional regression framework quantifying how stripping, run-of-mine output, and coal production interact with breakdown frequency, using a complete two-year dataset from a large lignite mine.

A Random Forest–based classification model capable of accurately detecting breakdown units, supporting early failure identification and targeted preventive maintenance.

A Random Forest regression model for predicting the annual number of breakdowns for each machine, enabling long-term fleet planning, spare-parts optimization, and workforce allocation.

Unlike previous works that examine maintenance data or production indicators in isolation, this study provides the first integrated analysis that combines operational regression analysis with machine-learning-based predictive maintenance using a fully digitalized real-world dataset. The findings offer a practical, data-driven framework for reducing downtime, improving maintenance planning, and strengthening decision-making in open-pit mining.

2. Materials and Methods

The Turkish Coal Enterprises Institution (TCE), established in 1957, is responsible for producing and marketing lignite and other coal types to support Türkiye’s national energy demand. As a state-owned corporation, TCE plays a critical role in domestic energy security by supplying lignite to thermal power plants and numerous energy-intensive industries, including metallurgy, cement, ceramics, brickmaking, paper, and sugar production [52,53].

The Ege Lignite Operations Directorate (ELO), located in the Soma district of Manisa, accounts for nearly half of TCE’s total lignite output [54]. The Soma Open Pit Mine, the largest lignite-producing operation in the Aegean region, maintains fully digitalized maintenance and production logging systems, enabling large-scale data-driven operational analysis. Accordingly, all datasets used in this study were obtained from the Soma Open Pit Mine [55].

Unlike previous studies, this work integrates production indicators, maintenance logs, and machine-learning models in a single unified framework. The novelty of the methodology lies in (i) constructing a two-way regression analysis linking production load to failure patterns and (ii) developing dual Random Forest models capable of detecting breakdown units and forecasting annual breakdown frequency.

2.1. Dataset Description

The final dataset comprises 16,027 fully digitalized breakdown and maintenance records collected from seven equipment categories between 2021 and 2022:

Trucks (8560)
Dozers (2527)
Drilling machinery (1064)
Electric excavators (1108)
Graders (1606)
Hydraulic excavators (684)
Wheel loaders (478)

Each maintenance entry contains:

Machinery ID
Breakdown unit (41 categories)
Fault category
Breakdown date–time
Maintenance duration
Operational activity
Technician annotations
Brand/model information
Operating hours
Work performed

Production indicators (coal production, ROM quantities, stripping volumes) were taken from the mine’s digital reporting system and synchronized with maintenance records on a monthly basis.

A two-year period was selected because this is the only interval during which production and maintenance records were consistently logged in a fully digital format. Earlier years included incomplete handwritten logs and inconsistent formats, making them unsuitable for machine-learning analysis.

Breakdown units were represented by 41 predefined classes (ID 0–40), directly inherited from the mine’s internal maintenance management system. No reclassification or merging was performed to preserve operational integrity.

2.2. Data Preprocessing

Preprocessing consisted of the following systematic steps:

Data cleaning:

Removal of missing entries
Duplicate elimination
Timestamp validation
Standardization of categorical labels

Feature engineering:

Extraction of day, month, and year
Breakdown frequency calculation
Encoding of categorical variables

Integration of datasets:

Maintenance logs were merged with monthly production metrics (coal production, ROM, stripping).

Feature matrix construction:

A total of 27 numerical and categorical features were used for classification and regression tasks.

Training/testing split:

Both machine-learning models used an 80% training/20% testing configuration.

This process ensures a clean, consistent dataset suitable for reliable machine-learning analysis.
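The preprocessing steps listed above can be sketched in a few lines of standard-library Python. This is only an illustrative outline, not the authors' pipeline: the field names (`machine_id`, `breakdown_unit`, `timestamp`), the timestamp format, and the monthly production figures are hypothetical placeholders chosen for the example.

```python
import random
from datetime import datetime

def preprocess(records, monthly_production, seed=0, test_fraction=0.2):
    """Clean records, add date features, join monthly production, then split."""
    required = ("machine_id", "breakdown_unit", "timestamp")  # hypothetical field names
    seen, rows = set(), []
    for rec in records:
        # Data cleaning: drop entries with missing required fields.
        if any(rec.get(k) in (None, "") for k in required):
            continue
        # Timestamp validation: skip records whose date-time cannot be parsed.
        try:
            ts = datetime.strptime(rec["timestamp"], "%Y-%m-%d %H:%M")
        except ValueError:
            continue
        # Standardize the categorical label, then eliminate duplicates.
        unit = rec["breakdown_unit"].strip().lower()
        key = (rec["machine_id"], unit, ts)
        if key in seen:
            continue
        seen.add(key)
        # Integration: attach the production metrics of the matching month.
        production = monthly_production.get((ts.year, ts.month))
        if production is None:
            continue
        # Feature engineering: decompose the date into year, month, and day.
        rows.append({"machine_id": rec["machine_id"], "breakdown_unit": unit,
                     "year": ts.year, "month": ts.month, "day": ts.day,
                     **production})
    # 80/20 train-test split after a seeded shuffle.
    rng = random.Random(seed)
    rng.shuffle(rows)
    cut = int(round(len(rows) * (1 - test_fraction)))
    return rows[:cut], rows[cut:]

# Toy records and placeholder production values, for illustration only.
records = [
    {"machine_id": "T01", "breakdown_unit": "Engine ", "timestamp": "2021-03-04 08:15"},
    {"machine_id": "T01", "breakdown_unit": "engine", "timestamp": "2021-03-04 08:15"},
    {"machine_id": "D07", "breakdown_unit": "Hydraulics", "timestamp": "2021-03-09 14:00"},
    {"machine_id": "T02", "breakdown_unit": None, "timestamp": "2021-04-01 10:00"},
    {"machine_id": "G03", "breakdown_unit": "Tyres", "timestamp": "not a date"},
    {"machine_id": "T05", "breakdown_unit": "Brakes", "timestamp": "2021-04-11 06:30"},
    {"machine_id": "T09", "breakdown_unit": "Engine", "timestamp": "2021-04-20 19:45"},
    {"machine_id": "E02", "breakdown_unit": "Electrical", "timestamp": "2021-03-28 23:10"},
]
monthly_production = {
    (2021, 3): {"coal": 1.0, "rom": 1.2, "stripping": 5.0},
    (2021, 4): {"coal": 1.1, "rom": 1.3, "stripping": 4.5},
}
train, test = preprocess(records, monthly_production)
print(len(train), len(test))
```

In this toy input, one duplicate, one record with a missing label, and one record with an invalid timestamp are discarded, so five rows remain before the split.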

2.3. Machine-Learning Models

Random Forest was selected as the primary machine-learning method for both classification and regression due to:

Its robustness on heterogeneous industrial datasets
Its ability to model nonlinear interactions
Low sensitivity to hyperparameters
High predictive accuracy

Mathematical Framework:

These equations correspond to the fundamental components of the Random Forest algorithm and are included to clarify the internal mechanics of how impurity, prediction error, and ensemble aggregation are computed during model training.

Data set:

D = \{(x_{i}, y_{i})\}_{i = 1}^{n}, \quad x_{i} \in \mathbb{R}^{p}, \quad y_{i} \in \mathbb{R} \ \text{or} \ y_{i} \in \{1, 2, \dots, K\}

Model Training:

B bootstrap samples are drawn from the training data.
Each tree:
- is trained on a randomly selected subset of the data.
- splits each node using m ≪ p randomly selected features.
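The two sources of randomness described above, bootstrap sampling of rows and random selection of features at each node, can be illustrated as follows. This is a minimal sketch: p = 27 matches the feature count reported in Section 2.2, whereas the toy values n = 10 and B = 3 and the choice m = ⌊√p⌋ are assumptions made for the example, since the source does not state the value of m.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]

def random_feature_subset(p, m, rng):
    """Pick m distinct feature indices out of p for one node split."""
    return sorted(rng.sample(range(p), m))

rng = random.Random(42)
n, p, B = 10, 27, 3            # toy sizes; p = 27 follows Section 2.2
m = max(1, int(p ** 0.5))      # assumed heuristic: m = floor(sqrt(p))

# One bootstrap sample per tree; some rows repeat and others are left out.
samples = [bootstrap_indices(n, rng) for _ in range(B)]
# A fresh feature subset would be drawn like this at every node.
features = random_feature_subset(p, m, rng)

for b, idx in enumerate(samples, start=1):
    print(f"tree {b}: rows {idx}")
print("candidate features at one node:", features)
```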

Regression:

\hat{f}_{RF}(x) = \frac{1}{B} \sum_{b = 1}^{B} h^{(b)}(x)

Classification:

\hat{y}_{RF}(x) = \arg\max_{k} \sum_{b = 1}^{B} \mathbb{I}\left(h^{(b)}(x) = k\right)

Each h^{(b)} is a decision tree trained on the b-th bootstrap sample. Predictions are averaged across trees (regression) or decided by majority vote (classification).
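The two aggregation rules can be written directly from the formulas: the regression output is the mean of the B tree predictions, and the classification output is the class that receives the most votes. The per-tree outputs below are made-up values used only to exercise the functions.

```python
from collections import Counter

def rf_regress(tree_predictions):
    """Average the outputs h^(b)(x) of all trees (regression)."""
    return sum(tree_predictions) / len(tree_predictions)

def rf_classify(tree_votes):
    """Return the class with the most votes (classification).

    Ties are resolved in favor of the class that was seen first.
    """
    return Counter(tree_votes).most_common(1)[0][0]

# Hypothetical outputs of B = 3 regression trees and B = 5 classification trees.
annual_breakdowns = rf_regress([12.0, 15.0, 9.0])   # -> 12.0
breakdown_unit = rf_classify([3, 7, 3, 12, 3])      # -> 3
print(annual_breakdowns, breakdown_unit)
```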

Prior to selection, logistic regression and single decision trees were evaluated but showed considerably lower accuracy due to imbalance across breakdown classes.

Although advanced boosting algorithms (XGBoost, Gradient Boosting, LightGBM) are promising for predictive maintenance, a full benchmarking analysis is reserved for future work.

2.4. Machine-Learning Workflow

This subsection describes the step-by-step workflow used to apply the machine-learning models.

Step 1—Data Preparation

Encode categorical variables (41 breakdown units, vehicle types, operational descriptions).
Decompose date–time fields.
Compute breakdown frequencies.

Step 2—Train–Test Split

A train–test split of 80% and 20% was employed for all machine-learning experiments. Multiple alternative split ratios (e.g., 70–30, 75–25, 85–15) were tested during initial trials; however, the 80–20 configuration provided the most favorable balance between model generalization and predictive stability.

Step 3—Model Training

Random Forest Classifier: 1000 trees, Gini impurity.
Random Forest Regressor: 1000 trees, Mean Squared Error.
Bootstrap sampling used for all trees.
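The two split criteria named in this step can be sketched as follows. Gini impurity is the usual quantity 1 − Σ p_k² over the class proportions at a node, and the regression criterion is the mean squared error around the node mean; a candidate split is then scored by the size-weighted average of the criterion over its two child nodes. The helper names are illustrative and the label values are toy data.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def node_mse(values):
    """Mean squared error of a node around its own mean (regression)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def weighted_split_score(left, right, criterion):
    """Size-weighted criterion of the two child nodes; lower is better."""
    n = len(left) + len(right)
    return (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)

mixed = gini_impurity([0, 0, 1, 1])                              # -> 0.5
pure_split = weighted_split_score([0, 0], [1, 1], gini_impurity)  # -> 0.0
spread = node_mse([1.0, 2.0, 3.0])
print(mixed, pure_split, spread)
```

A tree builder would evaluate `weighted_split_score` for each candidate threshold of each sampled feature and keep the split with the lowest score.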

Step 4—Model Evaluation

Classification metrics: precision, recall, F1-score, confusion matrix.
Regression metrics: R², MAE, MSE, RMSE.
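For reference, the listed metrics follow their standard definitions and can be computed as sketched below. The prediction vectors are invented solely to exercise the functions and do not correspond to results reported in this study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, and RMSE for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}

def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class, treated one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Invented example values.
reg = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
cls = per_class_scores([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], label=1)
print(reg)
print(cls)
```

With 41 breakdown-unit classes, the per-class scores would typically be computed for each class and then averaged, and a confusion matrix tallies the same (true, predicted) pairs.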

Step 5—Prediction and Interpretation

Classification model identifies failure-prone breakdown units.
Regression model predicts annual breakdown count for each machine.
Results support preventive maintenance scheduling and spare-parts planning.

3. Results and Discussions

3.1. Two-Way Regression

Before implementing the machine-learning models, it was essential to clarify whether production-related operational loads influence mechanical failure behavior. Production indicators such as stripping volume, run-of-mine quantities, and coal output represent the mechanical, environmental, and cyclical stresses applied to mining equipment. Therefore, examining the bidirectional relationship between production metrics and breakdown frequency provides an analytical foundation for understanding whether equipment failures arise as a consequence of operational intensity or whether they occur independently of production processes. This analysis establishes the operational context required for interpreting the predictive maintenance results presented in later sections.

The regression analysis was performed to explore whether mechanical failures directly influence production performance or whether production processes generate operational stresses that may lead to increased breakdowns. Clarifying this relationship provides essential background for evaluating the machine-learning models developed subsequently.

The relationships among four key operational indicators, namely monthly coal production, run-of-mine (ROM) quantities, stripping volumes, and the number of machinery breakdowns, play a critical role in assessing performance and planning efficiency in open-pit mining operations. In this study, the interactions among these four parameters were analyzed over a 24-month period using two-way linear regression models. Linear regression is a widely used statistical method for quantifying the direction and magnitude of linear associations between variables. The coefficient of determination (R²) was used as the primary evaluation metric to quantify the proportion of variance in the dependent variable that can be explained by the independent variable.

The use of linear regression in this context offers two major advantages: interpretability and analytical simplicity. The regression coefficients provide direct insight into the direction (positive or negative) and magnitude of the relationships, enabling clear interpretation from both theoretical and operational perspectives. Moreover, the method supports bidirectional evaluation, allowing each pair of variables to be assessed in both directions. This bidirectional approach facilitates a more comprehensive understanding of whether variations in production-related indicators correspond to increases or decreases in machinery breakdown frequency.

Two-way simple linear regression was applied to capture the potential mutual influence between variables. Instead of examining each pair of parameters in a single direction, both forward and reverse regressions were performed. This approach allows the evaluation of whether a given variable possesses similar explanatory power when used as either the dependent or independent variable, enabling a robust interpretation of consistency, direction, and statistical validity.
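A forward and reverse fit of this kind can be sketched with ordinary least squares as shown below. One property is worth keeping in mind when reading the bidirectional results: for simple linear regression with an intercept, R² equals the squared Pearson correlation, so the forward and reverse fits share the same R² by construction and differ only in slope (β) and intercept (α). The 12 monthly values used here are invented placeholders, not the mine's data.

```python
def fit_line(x, y):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta, R2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    ss_res = sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return alpha, beta, 1.0 - ss_res / ss_tot

# Placeholder monthly series (hypothetical units).
stripping = [4.1, 5.0, 3.8, 6.2, 5.5, 4.7, 6.8, 5.9, 4.4, 6.1, 5.2, 4.9]
breakdowns = [610, 700, 590, 760, 705, 640, 820, 730, 650, 745, 660, 690]

alpha_f, beta_f, r2_forward = fit_line(stripping, breakdowns)   # breakdowns ~ stripping
alpha_r, beta_r, r2_reverse = fit_line(breakdowns, stripping)   # stripping ~ breakdowns

print(f"forward: beta={beta_f:.3f}, R2={r2_forward:.4f}")
print(f"reverse: beta={beta_r:.5f}, R2={r2_reverse:.4f}")
```

The product of the two slopes also equals R², which gives a quick consistency check on any reported pair of forward and reverse fits.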

The results of the correlation analysis are presented in Table 1, while the pairwise relationships between variables are illustrated in Figure 1. The R² values for all bidirectional combinations are reported in Table 2.

The low R² values obtained in several regression pairs, particularly those involving breakdown count versus coal production or run-of-mine quantities, indicate that production volume alone is not a meaningful predictor of equipment failures. This outcome is consistent with the operational characteristics of open-pit mining, where breakdowns are primarily driven by mechanical and environmental stressors such as stripping intensity, cyclic loading patterns, abrasive material interactions, and machinery duty duration. Therefore, weak linear relationships between production metrics and breakdown frequency are expected and reflect the true operational behavior of mining equipment. These results do not represent a model deficiency but instead provide a realistic depiction of underlying operational processes.

Figure 1 illustrates the temporal behavior of production indicators and breakdown frequency. The intention is not to imply causality but to visually assess whether breakdowns correspond with production intensity before conducting bidirectional linear regression. The regression coefficients (β), intercepts (α), and R² values quantify whether production metrics (coal production, ROM, stripping) and breakdown counts have explanatory effects on each other. The low R² values observed for breakdown-related regressions indicate that breakdowns are not linearly driven by production volume, supporting the need for nonlinear machine-learning models in later sections.

3.1.1. Run of Mine—Coal Production Relationship

Bidirectional linear regression analysis between run-of-mine (ROM) quantities and coal production yielded an identical coefficient of determination (R² = 0.979) in both directions. This result indicates that approximately 98% of the variance in coal production can be directly attributed to variations in ROM, reflecting an exceptionally strong positive linear dependency.

This finding is operationally expected, as ROM represents the primary physical input to coal processing and directly constrains the achievable production output. However, confirming this relationship through statistical analysis is essential because it validates the internal consistency of the dataset and ensures that no structural anomalies exist in the production recording system. If a weaker correlation had been observed, it could have signaled potential issues such as unrecorded losses, processing bottlenecks, inconsistent excavation rates, or inefficiencies in hauling and stockpile management.

The regression plot illustrating the ROM–production relationship is presented in Figure 2. The near-linear alignment of the data points additionally reinforces that ROM acts as the dominant explanatory variable for production output in the evaluation period. This strong linearity also confirms that ROM is a suitable operational indicator for inclusion in subsequent analyses involving production–breakdown interactions.

3.1.2. Stripping–Breakdown Relationship

The bidirectional regression analysis revealed a moderately strong linear relationship between stripping activity and breakdown frequency, with an R² value of 0.589 in both directions. This result indicates that approximately 59% of the variability in breakdown counts can be explained by monthly stripping volumes. The regression plot depicting this interaction is presented in Figure 3.

This relationship is consistent with the operational mechanisms of open-pit mining. Stripping operations rely heavily on intensive use of trucks, excavators, dozers, graders, and drilling machines, all of which are included in the dataset. Higher stripping volumes typically reflect increased excavation intensity, longer machinery duty cycles, repeated loading–hauling sequences, and extended exposure to abrasive overburden materials. These operational conditions create greater mechanical stress on hydraulic systems, undercarriage components, gear assemblies, and electrical subsystems, factors known to accelerate wear and increase failure probability. Therefore, an upward trend in breakdown frequency with increasing stripping volume is a logical outcome grounded in both mechanical behavior and mine-operational dynamics.

Importantly, the bidirectional regression shows that using breakdown counts as the independent variable yields the same moderate R² value. This is consistent with the statistical relationship reflecting shared operational intensity rather than causality in either direction. In other words, breakdowns do not increase stripping volume, nor does stripping volume inherently cause specific breakdown events. Instead, both variables co-vary under the influence of the same operational workload conditions.

It should also be noted that the dataset does not include intervention-based variability such as changes in maintenance strategy, fleet replacement schedules, or operational redesign during the study period. As a result, the regression captures the natural correlation arising under real-world operating conditions rather than revealing controlled causal effects. Future studies incorporating multi-year datasets or scenario-based maintenance simulations could provide deeper causal inferences regarding the stripping–breakdown interaction.

3.1.3. Coal Production–Breakdown Relationship

The bidirectional regression analysis between coal production and breakdown frequency resulted in a very weak linear association, with an R² value of 0.002 in both directions. This means that less than 1% of the variation in breakdown counts can be explained by fluctuations in coal production levels, a statistically negligible relationship. The corresponding regression plot is presented in Figure 4.

This weak association is aligned with the underlying operational structure of open-pit mining. Unlike stripping operations, coal production is predominantly influenced by factors such as seam thickness, geological characteristics, shovel–truck match efficiency, and short-haul haulage performance. Breakdown events, on the other hand, occur across the entire fleet, including machinery involved in stripping, auxiliary operations, and non-coal-bearing excavation zones. As a result, coal production does not directly reflect the intensity or nature of mechanical workloads experienced by the fleet as a whole.

Moreover, monthly coal output is often stabilized through production planning measures such as equipment redistribution, task balancing, and dynamic scheduling. When breakdowns occur, operational teams may compensate by reallocating trucks, adjusting excavation sequences, or temporarily increasing loading rates in unaffected regions. These compensatory mechanisms reduce the observable statistical impact of breakdown frequency on coal production, leading to near-zero R² values.

Similarly, using breakdowns as an independent variable yields an equally weak R² value. This confirms that the statistical independence between the two variables is not directional but structural: the processes generating production output and the processes leading to mechanical failure do not interact strongly at the monthly scale.

It is important to note that coal production is also influenced by non-mechanical constraints, such as blasting schedules, weather conditions, seam accessibility, and short-term energy demand, which further dilute any potential coupling with breakdown events. Therefore, the weak regression outcome is not a model limitation but an accurate reflection of the operational system, demonstrating that coal output cannot be used as a meaningful predictor of mechanical failure behavior.

These results reinforce the necessity of incorporating production metrics into the study’s two-way regression analysis. By identifying which production indicators exhibit operational coupling with breakdown events (stripping) and which do not (coal production), the analysis provides an essential interpretive framework for the machine-learning models presented later. Without this evaluation, predictive maintenance results would lack operational context and could be misinterpreted as implying causal relationships that do not exist.

3.1.4. Run-of-Mine (ROM)–Breakdown Relationship

The bidirectional regression analysis between run-of-mine (ROM) quantities and breakdown frequency yielded extremely weak linear relationships, with an R² value of 0.0001 in both directions. These values indicate that ROM output explains virtually none of the variation in breakdown counts, and breakdown frequency likewise has no measurable influence on ROM production. The regression relationship is illustrated in Figure 5.

This insignificant association is operationally consistent with the nature of ROM activities in open-pit mining. ROM represents the total volume of material excavated, regardless of whether it contains coal, overburden, or interburden layers. Although ROM is a key production metric, it does not reflect the specific mechanical loading conditions experienced by individual equipment groups. ROM output is influenced primarily by:

shovel bucket capacity and excavation cycle times,
geological structure of the bench,
blasting efficiency and fragmentation quality,
haulage allocation and queue times.

These factors affect the rate at which material is removed but do not necessarily impose additional mechanical stress on the broader fleet responsible for ancillary tasks. As such, ROM cannot serve as a meaningful operational proxy for equipment stress or mechanical loading intensity.

The reverse regression analysis, in which breakdown counts were used as the independent variable, also produced near-zero explanatory power. This suggests that the occurrence of mechanical failures does not have a measurable impact on ROM output at the monthly scale. One reason is that ROM production is typically stabilized through operational compensation strategies. When a machine breaks down, the mine often reallocates alternative equipment, adjusts material flow, or temporarily redistributes loading tasks to maintain ROM continuity. These adjustments effectively buffer ROM production from short-term mechanical disruptions.

Additionally, ROM quantities include material excavated for purposes other than coal production, such as bench preparation, slope stabilization, and access road maintenance. These activities rely on different subsets of the fleet and may not coincide temporally with the breakdown-intensive operations. This structural disconnect between ROM processes and the breakdown-generating conditions further reduces the likelihood of identifying a statistical relationship.

The near-zero regression results therefore provide a realistic and informative characterization of the operational system rather than indicating any modelling limitation. The finding clarifies that ROM is not a sensitive indicator of mechanical stresses acting on the fleet, reinforcing the broader conclusion that only certain operational variables (e.g., stripping) exhibit a meaningful association with breakdown frequency. This differentiation is essential for accurately interpreting the predictive maintenance results presented later in the study and determining which production indicators are relevant to mechanical failure behavior.

The re-evaluation of the breakdown–coal production regression confirmed that the originally reported R² value of 0.002 is mathematically correct and consistent with the underlying operational structure of the mine. A coefficient of determination this small indicates that breakdown frequency is statistically independent of coal production volume at the monthly scale. This result is expected because the occurrence of mechanical breakdown is not governed by production output but by machinery-specific operational stressors such as stripping load, duty-cycle duration, abrasive material interaction, hydraulic pressure fluctuations, and the cumulative wear associated with repetitive loading cycles.

Coal production reflects the final output of the excavation and material-handling chain rather than the mechanical conditions that cause equipment failures. Therefore, it does not capture the underlying physical stresses acting on machinery. Additionally, mine operations typically employ compensation strategies, such as reassigning equipment, adjusting haulage plans, or modifying shift allocations, that maintain production stability even when breakdowns occur. These compensatory mechanisms further decouple coal output from failure events, reinforcing the absence of a linear relationship.

Consequently, a near-zero R² value is a theoretically consistent and operationally meaningful outcome for this regression pair. The result should not be interpreted as an analytical or computational error but rather as an accurate reflection of the non-linear, non-causal, and weakly coupled nature of the breakdown–production relationship in open-pit mining systems. The corresponding regression visualization is presented in Figure 5.

3.1.5. Breakdown–Run of Mine Relationship

The regression analysis between breakdown frequency and run-of-mine (ROM) output produced an R² value of 0.001, confirming that breakdowns have an insignificant linear relationship with ROM production. Similarly, variations in ROM output do not provide meaningful explanatory value for predicting breakdown occurrence. The relationship is illustrated in Figure 6.

The extremely low R² values reported in Section 3.1.4 and Section 3.1.5 were verified and are consistent with operational expectations. These results reflect the absence of a direct linear relationship between breakdown counts and ROM quantities, as well as between stripping activities and ROM production. In open-pit mining systems, breakdown rates are dictated by equipment usage intensity, geological conditions, material properties, environmental stressors, and mechanical loading cycles rather than by production totals. Additionally, the stripping–ROM relationship is inherently non-linear, influenced by excavation geometry, bench progression, operational scheduling, and ore-body heterogeneity. Therefore, the near-zero R² values do not indicate computational or modeling errors but instead accurately represent the underlying operational dynamics of the mine.

3.1.6. Stripping–Run of Mine Relationship

The regression analysis between stripping volume and run-of-mine (ROM) output yielded an R² value of approximately 0.05 (5.14% in Table 2, corresponding to r = 0.227 in Table 1), indicating the absence of a meaningful linear relationship between these two operational parameters. The regression plot presented in Figure 7 shows that the two variables exhibit highly scattered behavior, and the linear model does not capture any significant trend.

This outcome was re-evaluated to check for potential outlier influence. Visual inspection of the scatter distribution and repeated computations indicated that the very low R² value is not caused by isolated outliers but instead reflects the structure of the data. Stripping and ROM quantities are not expected to exhibit a direct linear relationship because the ROM output represents material extracted from productive benches, while stripping involves removing overburden layers with highly variable geometries and heterogeneous material characteristics.

Operationally, the stripping–ROM relationship is inherently nonlinear and depends on several factors, including:

bench advancement rate and excavation sequence,
thickness and spatial heterogeneity of overburden layers,
local geotechnical conditions,
operational scheduling and dispatch decisions,
intermittent equipment allocation between stripping and productive excavation.

As a result, stripping intensity does not translate proportionally into ROM output on a month-to-month basis. Instead, the two processes operate with different temporal and spatial dynamics. Therefore, the near-zero R² value accurately represents the absence of a linear dependency between stripping operations and ROM production rather than a calculation error or an artifact of outliers.
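One simple way to check whether a low R² is driven by isolated outliers is to recompute it with each observation left out in turn and inspect the spread. The sketch below illustrates that idea only; it is not the authors' procedure. The data are synthetic, and the use of 24 monthly observations is an assumption based on the 2021–2022 monthly series.

```python
import random


def r_squared(x, y):
    """R² of a one-predictor least-squares line (the squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def leave_one_out_r_squared(x, y):
    """Recompute R² once per observation, with that observation removed."""
    return [
        r_squared(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]


# Synthetic stand-in for 24 monthly observations (NOT the mine's data).
rng = random.Random(0)
stripping = [rng.uniform(600_000, 900_000) for _ in range(24)]
rom = [rng.uniform(80_000, 120_000) for _ in range(24)]

full = r_squared(stripping, rom)
loo = leave_one_out_r_squared(stripping, rom)
print(f"R² on all months: {full:.4f}")
print(f"R² range with one month dropped: {min(loo):.4f} to {max(loo):.4f}")
```

If dropping any single month leaves R² low, the weak fit is a property of the whole scatter rather than of one extreme point.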

3.2. Breakdown Unit Detection with Random Forest Method

Vehicle breakdowns directly affect production continuity, maintenance planning, and fleet utilization efficiency in open-pit mining. Therefore, identifying the specific machinery unit that is likely to break down before the breakdown occurs is a critical component of predictive maintenance. Early detection enables the scheduling of pre-emptive repairs, optimization of spare-parts inventory, and preparation of maintenance personnel, while also reducing safety risks associated with unexpected equipment malfunctions.

In this study, a supervised machine-learning model was developed to classify the breakdown unit using historical maintenance records. The input variables were derived from the digital maintenance logs and included vehicle ID, equipment type, breakdown start date (encoded as day, month, and year), chassis operating hours, recorded breakdown description, repair operation category, and the historical frequency of the same breakdown type. These variables represent both operational context and historical behavioral patterns of each machine. The output variable was the breakdown unit ID, comprising 41 distinct breakdown categories (coded 0–40), as listed in Table 3.

The Random Forest classifier was selected for this task because of its ability to:

handle heterogeneous and imbalanced datasets,
model nonlinear interactions among operational features,
maintain high predictive stability despite noisy industrial data,
reduce overfitting through ensemble averaging.

Before selecting Random Forest, exploratory experiments were conducted using logistic regression and single decision-tree classifiers. These baseline models yielded significantly lower accuracy, which is attributed to the complex multivariate structure of the data and the imbalance across breakdown categories. In contrast, Random Forest consistently achieved superior performance across all validation runs, supporting its suitability for this dataset.

The classifier was trained using an 80/20 split, where 80% of the data was used for model training and 20% for independent testing. All reported performance metrics (accuracy, precision, recall, micro-F1, macro-F1, and weighted-F1) correspond to the test set to ensure unbiased model evaluation. The final Random Forest model achieved 94% prediction accuracy, indicating a strong ability to correctly identify the malfunctioning unit before the actual repair event. The lower macro-F1 score reflects natural class imbalance, as certain breakdown categories occur very frequently while others are rarely observed. Despite this imbalance, the model demonstrated stable predictive behaviour across all major equipment groups, supporting its effectiveness for operational use.

Although Random Forest performed best among tested algorithms, the absence of extended benchmarking with gradient boosting and deep-learning methods is acknowledged as a limitation of the present study. Future research will include XGBoost, LightGBM, CatBoost, and hybrid ensemble architectures to assess potential gains in predictive accuracy and early-breakdown detection capability.

The Random Forest classification algorithm was used to identify the specific breakdown unit associated with each maintenance event. Before training, the maintenance records were preprocessed as follows. First, the date variable was split into day, month, and year components to provide time-based information to the model. Next, missing data were examined, and rows with missing values in critical fields such as date of breakdown, breakdown description, and repaired breakdown were removed to ensure data integrity. All categorical variables were then converted to numerical values using label encoding, and an additional feature representing the past frequency of each breakdown unit was created. In modeling, the vehicle ID, total number of breakdowns, date components, and frequency information were used as input variables, while the breakdown unit was used as the target variable. Several training–test ratios were tried to test the model’s stability against variation in the data; the 80% training and 20% test split was selected as the most appropriate, and the dataset was randomly partitioned accordingly to ensure reliable out-of-sample evaluation. A Random Forest classifier with 1000 trees was then trained. Performance was evaluated using the accuracy obtained for each split ratio, and the most stable results were obtained with the 1000-tree structure.
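The preprocessing and training steps described above can be sketched with pandas and scikit-learn as follows. This is an illustrative sketch, not the authors' code: the column names, the synthetic records, and the exact definitions of the frequency and breakdown-count features are assumptions, and the demo uses 100 trees instead of 1000 to keep it fast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical column names; the real log schema is not given in the text.
CRITICAL = ["breakdown_date", "description", "breakdown_unit"]
FEATURES = ["vehicle_id", "vehicle_breakdowns", "day", "month", "year", "unit_frequency"]


def prepare(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, split the date, label-encode, add count features."""
    df = raw.dropna(subset=CRITICAL).copy()
    df["breakdown_date"] = pd.to_datetime(df["breakdown_date"])
    df = df.sort_values("breakdown_date").reset_index(drop=True)
    df["day"] = df["breakdown_date"].dt.day
    df["month"] = df["breakdown_date"].dt.month
    df["year"] = df["breakdown_date"].dt.year
    # Label-encode the categorical columns, including the target.
    for col in ["vehicle_id", "description", "breakdown_unit"]:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))
    # Assumed definitions: number of earlier records with the same unit,
    # and number of earlier records for the same vehicle.
    df["unit_frequency"] = df.groupby("breakdown_unit").cumcount()
    df["vehicle_breakdowns"] = df.groupby("vehicle_id").cumcount()
    return df


def train_and_score(df, n_trees=1000, test_size=0.2, seed=0):
    """Random 80/20 split, fit a Random Forest, return model and test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df["breakdown_unit"], test_size=test_size, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    model.fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


# Synthetic records for illustration only (NOT the mine's data).
rng = np.random.default_rng(0)
n = 500
raw = pd.DataFrame({
    "vehicle_id": rng.choice(["T01", "T02", "L01", "D01"], size=n),
    "breakdown_date": pd.Timestamp("2021-01-01")
    + pd.to_timedelta(rng.integers(0, 730, size=n), unit="D"),
    "description": rng.choice(["oil leak", "flat tire", "no start"], size=n),
    "breakdown_unit": rng.choice(["Engine", "Tire", "Hydraulic", "Brake"], size=n),
})
raw.loc[raw.index[::25], "description"] = None  # simulate missing descriptions

df = prepare(raw)
model, acc = train_and_score(df, n_trees=100)
print(f"rows kept: {len(df)}, test accuracy on synthetic data: {acc:.3f}")
```

The accuracy printed here says nothing about the real dataset, since the synthetic labels are random; the sketch only shows how the described steps fit together.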

Using these input features, the Random Forest classifier achieved a 94% prediction accuracy, demonstrating strong capability in identifying the unit most likely to break down. Because class imbalance is common in industrial maintenance data, the classification model was also evaluated using Precision, Recall, and F-score. Precision is the proportion of predictions of a given breakdown unit that are correct; Recall is the proportion of actual breakdowns of that unit that are correctly detected; and the F-score is the harmonic mean of the two. Reporting these metrics alongside accuracy guards against accuracy alone being misleading when some failure categories are rare, and gives a more reliable picture of the model’s performance under real maintenance conditions. The detailed performance metrics are provided in Table 4. The confusion matrix illustrating classification performance across the 41 breakdown categories is presented in Figure 8.

The classifier obtained a micro-averaged F1-score of 0.936 and a weighted F1-score of 0.933, reflecting high predictive stability across the dataset. The macro-averaged F1-score (0.667) was comparatively lower, which is expected in industrial maintenance datasets where certain breakdown types occur disproportionately more often than others. Despite this inherent class imbalance, the consistent precision and recall values observed across frequently occurring categories indicate that the classifier performs robustly in practical operational settings.
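The gap between the micro, weighted, and macro averages can be illustrated with a small worked example. The sketch below computes the three averages from scratch on a made-up, imbalanced three-class prediction set; the labels and counts are hypothetical and are not taken from the study's confusion matrix.

```python
from collections import Counter


def f1_report(y_true, y_pred):
    """Return (micro, macro, weighted) F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
    n = len(y_true)
    # With exactly one label per sample, micro-F1 equals plain accuracy.
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Macro: every class counts equally, however rare it is.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: each class counts in proportion to its number of true samples.
    weighted = sum(per_class[label] * support[label] for label in labels) / n
    return micro, macro, weighted


# Hypothetical imbalanced example: one dominant class and two rare ones.
y_true = ["Engine"] * 90 + ["Tire"] * 8 + ["Rope"] * 2
y_pred = (
    ["Engine"] * 88 + ["Tire"] * 2      # Engine: 88 correct, 2 missed
    + ["Tire"] * 6 + ["Engine"] * 2     # Tire: 6 correct, 2 missed
    + ["Engine"] * 2                    # Rope: never predicted correctly
)

micro, macro, weighted = f1_report(y_true, y_pred)
print(f"micro = {micro:.3f}, weighted = {weighted:.3f}, macro = {macro:.3f}")
```

In this toy case the rare class that is never detected barely moves the micro and weighted scores but pulls the macro average down sharply, which is the same pattern as in Table 4.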

Mining equipment operates under harsh and highly variable environmental conditions, such as dust, vibration, temperature fluctuations, and sudden mechanical loading, which influence the long-term distribution of breakdown patterns. Although direct sensor measurements of environmental variables were not included in the dataset, the model’s stable predictive performance across different equipment groups suggests that these operational stress factors are implicitly encoded within the historical breakdown records. The ability of the Random Forest algorithm to maintain strong classification accuracy under such variability further supports its suitability for noisy and heterogeneous industrial datasets.

3.3. Estimating the Total Number of Breakdowns Annually with Random Forest Regression Algorithm

The dataset contains a field specifying the total annual number of breakdowns recorded for each machine. This variable provides essential information for long-term fleet planning, spare-parts management, and maintenance scheduling. Since the objective is to predict a continuous value rather than classify breakdown categories, regression-based machine-learning methods are more appropriate for this task. Accordingly, a predictive model was developed using the Random Forest Regression algorithm to estimate the annual breakdown count of each machine based on its historical breakdown patterns.

The model was trained using the same 80/20 train–test split described in Section 2.2. Input features included aggregated monthly breakdown data, machinery identifiers, operational attributes, and the contextual variables derived during preprocessing. The output variable was the total number of breakdowns expected for each machine within a one-year period.

The Random Forest Regression model demonstrated excellent predictive performance, achieving an R² value of 0.98 on the test set. This result indicates that the model can capture the non-linear, multi-factor relationships that influence breakdown frequency. The regression performance metrics are reported in Table 5 (MAE and MSE, from which RMSE follows as the square root of MSE). These metrics support the model’s ability to generalize across multiple machinery types and operational conditions.
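For reference, the standard definitions of these regression metrics are given below, with $y_i$ the observed annual breakdown count, $\hat{y}_i$ the predicted count, $\bar{y}$ the mean observed count, and $n$ the number of test samples (this notation is introduced here for illustration). The RMSE value is derived from the MSE of 16.24 listed in Table 5.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{16.24} \approx 4.03, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```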

Overall, the results show that Random Forest Regression provides a robust and reliable framework for estimating annual machinery breakdown rates. Such predictive capability offers a valuable decision-support tool for maintenance scheduling, resource allocation, and long-term operational planning in open-pit mining environments.

4. Conclusions

This study examined the interaction between production indicators and machinery breakdown patterns in a large-scale open-pit lignite mine and developed machine-learning-based predictive maintenance models using 16,027 fully digitalized maintenance and production records collected over a two-year period. The two-way regression analyses revealed three key findings:

(i): a very strong linear relationship between run-of-mine (ROM) and coal production,
(ii): a moderate bidirectional association between stripping activity and breakdown frequency,
(iii): negligible linear relationships between breakdown counts and production quantities.

These results suggest that operational load and mechanical stress, rather than production output itself, are the primary contributors to equipment breakdown in open-pit mining.

The machine-learning analysis demonstrated the practical value of data-driven predictive maintenance. The Random Forest classifier achieved 94% accuracy in detecting the specific breakdown unit, offering an effective mechanism for early fault localization and enabling proactive maintenance interventions. The Random Forest regression model achieved an R² of 0.98, supported by low MAE, MSE, and RMSE values, indicating a strong capability to estimate the annual breakdown frequency of each machine based on historical breakdown behavior. Together, these models show that nonlinear, multifactorial breakdown patterns in mining machinery can be effectively learned and predicted using ensemble-based machine-learning methods.

Operationally, mining machinery functions under extreme environmental and mechanical stressors, including dust exposure, vibration, thermal fluctuations, and heavy cyclic loading. Although direct sensor measurements were not available, these conditions are likely reflected implicitly in the breakdown logs, allowing the models to generalize across different machinery categories. Integrating additional multi-source information, such as telematics, GPS data, operator behavior logs, hydraulic pressure signals, and real-time environmental sensor readings, would be expected to further enhance prediction accuracy and enable real-time anomaly detection.

While the dataset is limited to a two-year period from a single mine, it is complete, consistent, and fully digitized, providing a reliable foundation for modeling. The stability of model performance across seven machinery categories suggests strong generalizability under diverse operating conditions. Future work will incorporate longer multi-year datasets from multiple mining sites; evaluate advanced ensemble methods such as XGBoost, LightGBM, and Gradient Boosting; and develop hybrid data-fusion frameworks that integrate environmental and telematics data for real-time predictive maintenance. Nonlinear and time-series models will also be explored to better capture the dynamic evolution of machinery health.

Overall, the findings demonstrate that machine-learning-based predictive maintenance can significantly improve operational efficiency, reduce downtime, enhance maintenance planning, and support data-driven decision-making in open-pit mining. The integration of digital maintenance logs, production data, and machine-learning models provides a meaningful step toward intelligent, automated, and efficiency-focused mine management systems.

Author Contributions

Conceptualization, E.G., Y.Ç. and A.H.O.; Methodology, E.G., Y.Ç. and A.H.O.; Software, E.G., Y.Ç. and A.H.O.; Validation, E.G., Y.Ç. and A.H.O.; Formal analysis, E.G., Y.Ç. and A.H.O.; Investigation, E.G., Y.Ç. and A.H.O.; Resources, E.G., Y.Ç. and A.H.O.; Data curation, E.G., Y.Ç. and A.H.O.; Writing—original draft, E.G., Y.Ç. and A.H.O.; Writing—review & editing, E.G., Y.Ç. and A.H.O.; Visualization, E.G., Y.Ç. and A.H.O.; Supervision, Y.Ç. and A.H.O.; Project administration, Y.Ç. and A.H.O.; Funding acquisition, E.G., Y.Ç. and A.H.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from Turkish Coal Enterprises and are available from the authors with the permission of Turkish Coal Enterprises.

Acknowledgments

This study was conducted as part of a PhD dissertation.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Monthly variation of coal production, run-of-mine quantities, stripping volume, and total breakdown counts (2021–2022).

Figure 2. Linear regression between Run-of-Mine quantities and coal production (R² = 0.98).

Figure 3. Relationship between stripping volume and breakdown count (R² = 0.58), showing a moderate positive trend.

Figure 4. Relationship between coal production and stripping quantities (weak correlation).

Figure 5. Relationship between breakdown counts and coal production quantities (very weak linear trend).

Figure 6. Relationship between breakdown counts and Run-of-Mine quantities (R² = 0.001), illustrating negligible linear association.

Figure 7. Relationship between stripping volume and Run-of-Mine output.

Figure 8. Confusion matrix of the Random Forest classifier used to detect breakdown unit categories.

Table 1. Correlation matrix for breakdown relation.

	Run of Mine	Coal Production	Stripping	Breakdown Count
Run of Mine	1.000000	0.989301	0.226669	−0.036154
Coal production	0.989301	1.000000	0.189045	−0.046421
Stripping	0.226669	0.189045	1.000000	0.767545
Breakdown Count	−0.036154	−0.046421	0.767545	1.000000

Table 2. Results of bidirectional two-way linear regression among production indicators and breakdown frequency.

X (Independent)	Y (Dependent)	Coefficient (β)	Constant (α)	R² (%)
Run of Mine	Coal production	0.9883	−7628.48	97.87
Coal production	Run of Mine	0.9903	9480.09	97.87
Stripping	Breakdowns	0.0009	70.92	58.91
Breakdowns	Stripping	665.7702	248,901.97	58.91
Run of Mine	Stripping	0.6627	660,750.61	5.14
Stripping	Run of Mine	0.0775	34,586.36	5.14
Coal production	Stripping	0.5533	675,457.12	3.57
Stripping	Coal production	0.0646	35,219.22	3.57
Coal production	Breakdowns	−0.0002	721.46	0.22
Breakdowns	Coal production	−13.7575	91,519.27	0.22
Run of Mine	Breakdowns	−0.0001	719.67	0.13
Breakdowns	Run of Mine	−10.7261	98,061.81	0.13

Table 3. Breakdown unit categories (IDs 0–40) used as target classes in the Random Forest classification.

ID	Unit	ID	Unit
0	Attachments	21	Compressor
1	Crawler	22	Rack Gearbox
2	Circle turn	23	Tire
3	Tipper dumper	24	Marion Electric
4	Drill bit	25	Drill Motor
5	Differential rear	26	Engine
6	Differential front	27	Front Wheel
7	Steering	28	Pump Redactor
8	Gear	29	Ripper Tip
9	Electrics	30	Right Traction
10	Electrics general	31	Left Traction
11	Inductor	32	Left Swing Gearbox
12	Filter	33	Swing Gearbox
13	Brake	34	Gearbox
14	Brakes and Steering	35	Chassis
15	Rope	36	Tandem
16	Hydraulic	37	Torque
17	Hydraulic Pump	38	Lubrication
18	Hoist Gearbox	39	Walking Gearbox
19	Truck Electric	40	Walking
20	Bodywork

Table 4. Random Forest Classification Performance Metrics (Precision, Recall, F-score).

Precision	Recall	F-Score	Type
0.935	0.936	0.936	Micro
0.934	0.936	0.933	Weighted
0.677	0.681	0.667	Macro

Table 5. Random Forest regression performance for annual breakdown prediction.

Metric	Value
MAE	0.45
MSE	16.24
R²	0.98
