Data-Driven Fleet Optimization Using ML Algorithms and a Decision-Making Grid Framework

Labib, Ashraf; Tǎnǎsuicǎ (Zotic), Coralia; Seecharan, Turuna S.; Roman, Mihai-Daniel

doi:10.3390/asi9030063


¹ Faculty of Business and Law, University of Portsmouth, Portsmouth P01 3DE, UK

² Department of Cybernetics and Economic Informatics, Faculty of Cybernetics, Statistics and Economic Informatics, Bucharest University of Economic Studies, 010374 Bucharest, Romania

³ Department of Mechanical and Industrial Engineering, University of Minnesota Duluth, Duluth, MN 55812, USA

* Author to whom correspondence should be addressed.

Appl. Syst. Innov. 2026, 9(3), 63; https://doi.org/10.3390/asi9030063

Submission received: 27 January 2026 / Revised: 25 February 2026 / Accepted: 9 March 2026 / Published: 17 March 2026


Abstract

The most impactful factors for the cost of fleet management are maintenance expenses and fuel consumption. Traditional ways of monitoring fleet performance fail to connect raw operational data with driving habits. The current study addresses this challenge by developing an architecture of frameworks, consisting of unsupervised and supervised machine learning algorithms, statistical testing, simulation and survival analysis to discover insights that lead to key behavioral predictors. The nucleus of this complex architecture is the decision-making grid (DMG), a two-dimensional matrix that groups vehicles based on their frequency of entering the service and the cost of their repairs. It is the first integration of DMG with ML for prescriptive fleet management. The objective of the study is twofold: firstly, to build a system that classifies vehicles according to their risk profile, and secondly, to offer clear directions for changing driver patterns that most affect vehicle costs or for keeping good practices. The framework proposed by this study not only drives the optimization of operational efficiency but also contributes to a methodology that links driver profiles to costs, offering a scalable methodology for similar business contexts.

Keywords: machine learning; decision-making grid (DMG); driver behavior analysis; Weibull analyses; SHAP interpretability; predictive maintenance

1. Introduction

The current global economic context is characterized by growing uncertainty and escalating conflict situations that further increase investor risk aversion. Keeping a business alive in these situations requires companies to adapt by optimizing internal processes and activities.

One of the most immediate and impactful strategies is cost reduction, or improving procedures for cost monitoring, as these actions have a direct and measurable economic effect. Decision-makers within organizations typically follow this direction, especially in times of imminent crisis, and transportation companies are increasingly considering the same strategic approach. The transport and storage sector represents almost 5% of the European Union’s GDP, and the demand for transportation is estimated to increase by around 60% by 2050, as highlighted by the European Commission in its 2019 report [1]. A Deloitte report on fleet management in Europe [2] states that fuel represents approximately 20% of the total cost of ownership, while maintenance and repair account for an additional 15%.

The objective of the current research is to develop a structure of algorithms and methodologies that can guide cost optimization for a transportation company by identifying the correlations between driver behavior and the resulting impact on maintenance and fuel expenses. Traditional fleet management approaches are often insufficiently effective, as they tend to be descriptive rather than predictive. This limitation prevents us from capturing the connection between driver behavior and its financial consequences, making it impossible to extract actionable insights from raw operational data.

The idea for this research came about from the need to make proactive, rather than reactive, decisions within a vehicle fleet. A large amount of data is collected from sensors installed at the level of a car’s Controller Area Network (CAN); this data cannot be processed using classical methods due to its high dimensionality and the lack of a clear understanding of which parameters have the greatest impact on maintenance and fuel consumption costs.

The article’s objective is to build an architecture of methods, models, and methodologies, combining advanced analytics, machine learning, artificial intelligence algorithms, and statistical modeling to identify key behavioral predictors that have the highest and most significant impacts on costs.

The novelty of the current work is that it combines the decision-making grid (DMG) with classification algorithms to identify the predictors and their specific combination that determine whether a specific vehicle should be placed in a specific zone of the decision matrix. The final result not only allows for classification but also outlines strategies through which the migration of a vehicle from a less efficient zone to a superior one—either in terms of cost reduction or reduced maintenance frequency—can be made possible.

Another distinctive element of the current work is that it proposes a new method for identifying and evaluating the optimal approach for generating the DMG, specifically from the perspective of determining the thresholds that define the nine distinct zones of the grid. The use of classification algorithms and the evaluation of their accuracy during various simulations (based on thresholds generated through different methods, such as the percentile-based quantiles method, the largest gap method, and a hybrid segmentation method) position this work as a turning point. Unlike previous studies that often rely on arbitrary or purely heuristic segmentation boundaries, this approach transforms the segmentation problem into a measurable and optimizable task.

The research applies unsupervised machine learning algorithms to identify distinct clusters of drivers, according to different categories of predictors, like accelerometer predictors or consumption variables. The most important predictors included are: total distance traveled in kilometers during the monitored period; time spent driving between 10 and 60 km/h, 60 and 110 km/h, and over 110 km/h; number of aggressive starts; unsafe braking at speeds between 10 and 60 km/h, 60 and 110 km/h, and over 110 km/h; dangerous cornering at speeds between 10 and 60 km/h, 60 and 110 km/h, and over 110 km/h; unsafe or fluctuating driving at speeds between 60 and 110 km/h and over 110 km/h; and the number of incidents associated with dangerous speeds (over 110 km/h). Statistical analysis and hypothesis testing were afterwards applied to determine if the differences between the clusters are significant from a statistical point of view, and to identify if any correlations exist between the age of the vehicles and the frequency or the amount of repair-related expenses.

Another important stage of the research was the construction of an alternative decision matrix. It is considered an alternative because, instead of using downtime as one of its axes, it uses the total repair expenses, while the other axis remains the frequency of the vehicle’s entries into service. A critical step was to select the most appropriate method for generating the thresholds needed to build the DMG. Given the lack of relevant studies offering guidance on how to determine the optimal segmentation strategy, the current study proposes a method for identifying the best thresholds as follows: a DMG is generated using an initial thresholding method, and then classification algorithms are trained using nine distinct classes, each representing one of the matrix’s zones. The model with the highest classification performance, based on accuracy and F1-score, is selected. Beyond model performance, the distribution of vehicles across the nine zones is also considered essential—the goal being that most, or all, zones contain some data. For each matrix built using different threshold generation methods, classification models are evaluated, and the optimal DMG is selected based on both the highest accuracy and the broadest zone coverage. Once the optimal thresholding strategy is identified, classification algorithms are used to determine the predictors with the greatest influence on vehicle positioning within the matrix. The justification behind identifying these predictors and their zone-specific value intervals is to simulate and explore how modifying one or more of these parameters facilitates a vehicle’s migration from a lower-efficiency zone to a more efficient one in terms of maintenance costs and intervention frequency.

This entire complex process is consolidated into a useful tool for intelligent fleet management, with the final aim being to develop a preventive way to prolong vehicle lifespan, reduce maintenance expenses and interventions, and improve fuel consumption efficiency, which can also translate into a reduction in CO₂ emissions.

Driving style has been thoroughly studied in the context of reducing emissions, reducing fuel consumption, and increasing road safety. It can be classified into three categories: calm, normal, and aggressive. Ecological driving behavior (“Eco-Driving”) implies a calmer driving style that reduces hard accelerations, hard braking, and speeding [3]. Eco-Driving improves fuel economy by 15–25%, reduces greenhouse gas (GHG) emissions by at least 30%, and promotes safety in road transport [3,4,5,6,7,8,9,10,11].

Other research applies machine learning to predict aggressive driving. Wan et al. [12] applied Random Forest (RF) and adaptive boosting (AdaBoost) algorithms as classifiers, along with adaptive synthetic sampling (ADASYN) and the synthetic minority oversampling technique (SMOTE) for oversampling. They differentiated between normal and risky behaviors based on a spectrum of driving behaviors, such as speed and steering control, focusing specifically on expressway tunnel sections. The results indicate that the risky driving behavior identification model using the ADASYN-RF algorithm achieves the best recognition performance. This approach exploits the objectivity and comprehensiveness of the driving behavior spectrum, integrating multidimensional variables to achieve higher recognition accuracy than other models. The innovation of this method for identifying dangerous driving behaviors based on driving behavior profiles lies in its comprehensiveness, multidimensional data integration, spectral radius analysis, and enhanced recognition accuracy.

Another study by Rahman et al. [13] developed an automatic driver fatigue detection system based on a convolutional neural network deep learning framework. The system combines computer vision (eye movement tracking and yawn detection) with heart rate monitoring to detect signs of drowsiness or fatigue in real time and generate an alarm. The eye aspect ratio (EAR) and lip distance are computed to classify the driver’s state as active, drowsy, or sleepy. The proposed drowsiness detection technique was deployed on an edge computing device, the NVIDIA Jetson Nano. This portable and user-friendly device can help drivers reduce road accidents and the pain and anguish they cause.

Gatteschi et al. [14] compared the performance of several state-of-the-art algorithms for aggressive driving event detection (belonging to anomaly detection-, threshold- and machine learning-based categories) on multiple datasets containing sensor data collected with different devices (black boxes and smartphones), on different vehicles and in different locations. Their results confirmed the superiority of ML-based approaches with respect to the other proposed algorithms. The best results were obtained by the Support Vector Machine (SVM) and RF techniques.

Few researchers have investigated the direct effect of aggressive driving on vehicle maintenance. A study by Mikulic et al. [15] examined the factors affecting vehicle roadworthiness. The researchers hypothesized that driving style affects vehicle condition regardless of maintenance frequency and vehicle age. They analyzed vehicle ownership data collected at Periodic Technical Inspection (PTI) stations and compared the results with questionnaire responses used as indicators of driving style and vehicle maintenance routine. Their results showed that privately owned vehicles have higher rates of roadworthiness than vehicles owned by legal entities. Regarding driving style, their research found that privately owned vehicles were driven less aggressively than vehicles owned or leased by legal entities.

Recent studies show that telematics data combined with ML and AI algorithms form a powerful tool for automatic driver behavior detection [16]. These studies also highlight that driver behavior varies depending on operational conditions, with aggressive starts and dangerous braking events being some of the most important predictors of behavioral patterns [16,17]. Advanced statistical methods such as Kaplan–Meier curves are used to better understand how events occur over time. Other research points out that driver behavior can also vary across time and space [17], while work environment conditions represent another important factor influencing driver performance and behavior [18]. The most important objective of these studies is to identify predictors that are highly correlated with driver behavior so that they can be modeled to improve traffic safety.

The decision-making grid (DMG) is a structured framework for selecting appropriate maintenance strategies based on two critical dimensions: frequency of failure and downtime. First introduced in the late 1990s, the DMG has become a recognized tool in the fields of reliability engineering, maintenance management, and asset optimization [19,20]. The DMG simplifies complex maintenance decisions by mapping equipment into a 3 × 3 matrix defined by two axes: (1) frequency of failure (low, medium, high), and (2) downtime (low, medium, high). Each of the nine resulting zones corresponds to a recommended maintenance policy, such as corrective, preventive, condition-based, or redesign. For example: low frequency, low downtime → operate-to-failure (corrective maintenance). This does not mean that the assets in this zone are ignored; rather, it implies an auditing strategy to sustain best practices. High frequency, high downtime → design-out maintenance, which implies redesign or replacement, and assets in this zone should be candidates for the next turnaround or shutdown. Medium zones → preventive maintenance (PM) schedules, with emphasis on distinct aspects of the PM schedules depending on the zone’s location in the grid.

The DMG provides a qualitative yet systematic decision support tool. It has been implemented in various industries, such as manufacturing and processing industries [20], small and medium industries (SMI) [21], the food processing industry [22], the oil palm industry [23], the nuclear power industry [24], and hydroelectric power plants [25]. It was also integrated into a Computerized Maintenance Management System (CMMS) implemented in the automotive manufacturing industry [3].

The results of these implementations show not only reductions in costs and machine downtime but also better reliability in daily maintenance operations [21]. The decision-making grid (DMG) offers a continuous enhancement of a smart and easy-to-implement maintenance decision analysis system that can form the nucleus of any maintenance management system that contributes to overall business excellence [26]. It also serves as a foundational module for more complex frameworks like the Knowledge-Based Maintenance System (KBMS) and has been incorporated into intelligent decision-support tools and simulations.

The DMG has been compared to other decision grids in maintenance, such as the Jack-Knife Diagram [26], and has been integrated into other models such as the bathtub curve [21], overall equipment effectiveness (OEE) [27], and criticality analysis [28]. It has also been extended to incorporate cost analysis [22], detection rates as part of the risk priority number (RPN) [29], and the process capability index in quality [30].

The scope of the DMG has been expanded to areas other than maintenance management, such as managing innovation strategies in R&D [26,31], and it has been integrated with root cause analysis (RCA), the Analytic Hierarchy Process (AHP), Multi-Criteria Decision-Making (MCDM), fuzzy logic control (FLC) [9,20], and genetic algorithms [21,32,33]. Labib and others [9,20] emphasized the flexibility of the DMG, adapting it for use in lean manufacturing environments and as a pedagogical tool for teaching complex decision-making.

The main advantages of the DMG are its simplicity and visual clarity, practical decision guidance, and its ability to integrate with expert systems, whereas its limitations can be its structure, which is static unless integrated with dynamic data [34], as well as the existence of various classifications into low/medium/high zones [26].

2. Materials and Methods

The study’s complexity stems from its objective: achieving increased efficiency for the analyzed company through a reduction in fleet maintenance and consumption expenditures. Multiple analytical techniques were employed. First, exploratory data analysis was used to understand the behavior of the fleet. We then used PCA in both the analysis and the segmentation of the population. We also segmented the population into a decision-making grid (DMG), on which we built classification models to identify the most relevant predictors of a car’s position in the grid. The intuition was to find the combination of parameters, and their interval limits, that places a car in a certain zone of the matrix. This would give the fleet manager an overview of every car’s position and allow them to run simulations, changing parameter values to see where cars would be repositioned. Figure 1 presents the methodologies used and the outputs obtained in the modeling phase of our research.

A pre-trained Large Language Model (LLM) was key to extracting the topics of the maintenance expenditures and classifying the descriptions into expenditure classes. The data received from the client was an Excel worksheet containing car-level information for each service entry, but the description was written as open text by the issuer, so the expenditures were not classified into categories. The model used for topic classification was GPT-Neo 1.3B, an open-source LLM created by EleutherAI (available online: https://huggingface.co/EleutherAI/gpt-neo-1.3B (accessed on 28 February 2025)). We chose this model because it is also trained on the Romanian language, in which the expenditure descriptions were written, so it could classify vague or ambiguous descriptions. Because the model was already trained, it directly provided the categorized data needed for the analytical part of the project. The LLM-based categorization was validated using a human-in-the-loop procedure: a subset of descriptions was manually reviewed to test whether the assigned categories were consistent with the meaning of the original text. Manual validation was used because no labeled benchmark dataset was available.

The methodological elements used during the case study can be grouped into seven categories: unsupervised machine learning for segmentation, statistical analysis and hypothesis testing, survival and reliability analyses, the decision-making grid (DMG), supervised machine learning for classification, data augmentation and class balancing, and model interpretability.

For unsupervised machine learning, K-means was the algorithm used to find patterns in the data and to segment the population. Its objective is to group the data points into k clusters according to their similarity, revealing patterns or structure in the analyzed data. The algorithm receives the number of clusters, randomly initializes k centroids, and assigns each data point to the nearest centroid. The centroid of cluster j is the mean (center point) of all data points that belong to cluster j.

\mathrm{Cluster}(x_{i}) = \arg\min_{j} ‖ x_{i} - μ_{j} ‖

(1)

The algorithm then recomputes the centroid of each cluster and repeats the assignment and update steps until convergence.

μ_{j} = \frac{1}{| c_{j} |} \sum_{x_{i} \in c_{j}} x_{i},

(2)

where:

c_j represents the set of all data points that were assigned to cluster j;

μ_j represents the mean of all data included in cluster j;

|c_j| represents the number of data points in the cluster j;

x_i represents an individual data point in the dataset.
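The assignment and update steps in Equations (1) and (2) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the implementation used in the study: the function names, the fixed random seed, the stopping rule (assignments no longer change), and the toy two-feature points are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step, Equation (1): nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step, Equation (2): centroid = mean of the assigned points.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster became empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical, already-scaled two-feature driver profiles (illustration only).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
```

Minimizing the squared distance gives the same assignment as minimizing the distance itself, so the square root is skipped. In practice, features would normally be scaled before clustering so that no single predictor dominates the distance.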

An important step is identifying the optimal number of clusters, which can be done using two methods: the elbow method and the silhouette score. The elbow method looks for the number of clusters beyond which the total distance of all points from their respective cluster centroids stops decreasing significantly. The silhouette score measures how well each data point fits into its cluster. Its values lie between −1 and 1, with higher values indicating better clustering. The formula for the silhouette score is the following:

s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}},

(3)

where:

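a(i) represents the mean distance between point i and the other points in its own cluster (mean intra-cluster distance);

b(i) represents the mean distance between point i and the points of the nearest neighboring cluster (mean nearest-cluster distance).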
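As a concrete reading of Equation (3), the following sketch computes s(i) for every point of a small toy dataset. It is an illustration under stated assumptions: the function name and the sample points are hypothetical, at least two clusters and distinct points are assumed, and a point that is alone in its cluster is given a score of 0, which is a common convention rather than something specified in this study.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette values s(i), following Equation (3)."""
    clusters = set(labels)
    scores = []
    for i, x in enumerate(points):
        # Mean distance from point i to the other points of each cluster.
        mean_dist = {}
        for c in clusters:
            dists = [
                math.dist(x, y)
                for j, y in enumerate(points)
                if labels[j] == c and j != i
            ]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        if labels[i] not in mean_dist:
            scores.append(0.0)  # convention for a single-point cluster
            continue
        a = mean_dist[labels[i]]  # mean intra-cluster distance
        b = min(d for c, d in mean_dist.items() if c != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return scores


# Toy data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_scores(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_scores(points, [0, 1, 0, 1])   # labels cut across the groups
mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
```

Averaging the per-point values gives the overall score that is compared across candidate numbers of clusters; here the labeling that follows the two groups scores close to 1, while the labeling that cuts across them scores below 0.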

Principal Component Analysis (PCA) is a method for reducing the dimensionality of data. It was used here with two specific objectives: first, to represent the population graphically in a two-dimensional space, and second, to reduce the dimension of the variable space before segmenting the population. This is recommended to eliminate multicollinearity in the data while retaining the information needed. The main drawback of PCA when used for segmentation is that the explainability of the results decreases, because it becomes harder to identify what combined information each principal component holds.
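To make the idea of a principal component concrete, the sketch below extracts the first component of a small dataset with power iteration on the sample covariance matrix and reports the share of variance it explains. It is a minimal illustration, not the procedure used in the study: the function name and toy rows are hypothetical, the data are centered but not standardized, and the starting vector is assumed not to be orthogonal to the leading component.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (direction, explained_variance_ratio, scores) of the first PC."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeated multiplication converges to the leading eigenvector.
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue (variance along the component).
    eigenvalue = sum(
        vec[a] * sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)
    )
    explained = eigenvalue / sum(cov[a][a] for a in range(d))
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, explained, scores


# Two perfectly collinear toy features: one component carries all the variance.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, explained, scores = first_principal_component(rows)
```

The example also shows the interpretability cost mentioned above: the single retained component is a weighted mix of both original features, so its values no longer map to one named predictor.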

Regarding the statistical analysis and hypothesis testing, three important tests were used to identify whether any statistically significant differences exist between the generated clusters. The first was the analysis of variance (ANOVA), applied after clustering the population to test whether variables such as maintenance cost or failure frequency differ significantly between clusters. A high F-value combined with a p-value smaller than 0.05 indicates that the mean of at least one cluster is statistically different from the others.

F = \frac{\text{Between-group variability}}{\text{Within-group variability}} = \frac{\text{Mean square between groups}}{\text{Mean square within groups}},

(4)

If ANOVA finds differences between clusters that are statistically significant, then a second test can be applied: Tukey’s Honestly Significant Difference (HSD) test, having as its objective the identification of which pairs of clusters differ.
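The F-statistic in Equation (4) can be computed directly from the cluster-wise values, as in the sketch below. The function name and the three toy groups are hypothetical and only illustrate the arithmetic; obtaining the p-value additionally requires the F distribution with (k − 1, N − k) degrees of freedom, which is left to a statistics library.

```python
def one_way_anova_f(groups):
    """F = mean square between groups / mean square within groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical maintenance costs (arbitrary units) for three clusters.
clusters = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = one_way_anova_f(clusters)
```

For these toy groups the between-group mean square is 13 and the within-group mean square is 1, so F = 13; the third group, whose mean lies far from the other two, is what drives the large value.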

The robustness of the clustering results was tested through several complementary validation approaches appropriate for the size and characteristics of the dataset. The optimal number of clusters was determined using the elbow method, while silhouette scores were used to evaluate cluster separation across the different segmentation scenarios. Principal Component Analysis (PCA) provided a structural visualization of the clustering patterns. Statistical testing through ANOVA and Tukey HSD further confirmed the presence of consistent differences across clusters. Considering the relatively small sample size, formal resampling-based stability techniques such as bootstrap clustering were not considered appropriate, as they may produce unstable estimates in small datasets.

One of the hypotheses that needed to be tested during the study was the one regarding a possible association between the age of a vehicle and its frequency of failure or between the age of a vehicle and the amount of money spent on repairs for the car. The chi-square test for independence evaluates whether there is a significant association between two categorical variables. A high chi-square value and a p-value lower than 0.05 suggest a statistically significant association.

χ^{2} = \sum_{i, j} \frac{(O_{i j} - E_{i j})^{2}}{E_{i j}},

(5)

where:

O_ij—observed frequency in cell i, j;

E_ij—expected frequency under independence assumption.
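A short sketch of the chi-square computation in Equation (5) for a contingency table follows. The function name and the 2 × 2 table (vehicle age group versus service-frequency group) are hypothetical; the expected counts use the usual independence formula, row total × column total / grand total.

```python
def chi_square_statistic(observed):
    """Return (chi2, degrees_of_freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # Expected count under independence of the two variables.
            e_ij = row_totals[i] * col_totals[j] / n
            chi2 += (o_ij - e_ij) ** 2 / e_ij
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical counts: rows = newer / older vehicles, columns = low / high frequency.
table = [[10, 20], [20, 10]]
chi2, dof = chi_square_statistic(table)
```

Here every expected count is 15, so each cell contributes 25/15 and the statistic is 20/3 ≈ 6.67 with one degree of freedom; the p-value would then be read from the chi-square distribution.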

The next section of methodologies used in the current study focuses on reliability and survival analysis and includes statistical techniques to evaluate the likelihood of a vehicle requiring service over time, taking into account vehicle age and service frequency.

The first model used was Weibull analysis, having as its final goal the modelling of time-to-service data, assuming a specific distribution. In this study, it evaluates whether service-related rates increase, decrease or remain constant with vehicle age:

F(t) = 1 - e^{-(t / λ)^{β}},

(6)

where:

F(t) represents the cumulative distribution function (the probability that a vehicle has entered service by time t);

λ represents the characteristic life (scale parameter);

β represents the shape parameter (β < 1: the service-entry rate decreases with age, so early entries dominate; β = 1: constant service-entry rate; β > 1: the service-entry rate increases with age).
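The role of the shape parameter is easiest to see through the hazard (instantaneous service-entry) rate that corresponds to Equation (6), h(t) = (β/λ)(t/λ)^(β−1). The sketch below evaluates both functions; the function names and parameter values are illustrative assumptions, not estimates from the study.

```python
import math


def weibull_cdf(t, scale, shape):
    """F(t) from Equation (6): probability of a service entry by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_hazard(t, scale, shape):
    """Instantaneous service-entry rate at time t for the same distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


scale = 5.0  # hypothetical characteristic life, in years
# At t equal to the characteristic life, F(t) = 1 - 1/e (about 63.2%) for any shape.
cdf_at_scale = weibull_cdf(scale, scale, shape=2.0)
# Hazard at ages 1 and 4 for the three regimes of the shape parameter.
decreasing = [weibull_hazard(t, scale, 0.5) for t in (1.0, 4.0)]
constant = [weibull_hazard(t, scale, 1.0) for t in (1.0, 4.0)]
increasing = [weibull_hazard(t, scale, 2.0) for t in (1.0, 4.0)]
```

With β = 1 the hazard equals 1/λ at every age, which is the exponential (memoryless) case; values below or above 1 tilt the rate downward or upward with age.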

To estimate the variability and the stability of the Weibull shape parameter (β), a technique for resampling was used, the so-called bootstrap method, which makes multiple random sample extractions from the data with replacement, fits the Weibull distribution to each sample and calculates the distribution of estimated β values. The bootstrap analysis for the Weibull analysis was applied to different vehicle categories (Small, Midsize, Large) to evaluate whether the increasing-service-visit trend is consistent across the population or driven by outliers.
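The bootstrap procedure described above can be sketched as follows. This is a hedged illustration rather than the study's code: the shape parameter is fitted by maximum likelihood using bisection on the standard profile-likelihood equation, the data are synthetic draws from a Weibull distribution with known parameters, no censoring is assumed, and the function names, the 200 resamples, and the 95% percentile interval are choices made for the example.

```python
import math
import random


def fit_weibull_shape(times, lo=0.01, hi=100.0, tol=1e-6):
    """Maximum-likelihood estimate of the Weibull shape for positive times."""
    peak = max(times)
    scaled = [t / peak for t in times]  # rescaling leaves the shape unchanged
    logs = [math.log(s) for s in scaled]
    mean_log = sum(logs) / len(logs)

    def score(beta):
        # Profile-likelihood equation; increasing in beta, zero at the MLE.
        weights = [s ** beta for s in scaled]
        weighted = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
        return weighted - 1.0 / beta - mean_log

    while hi - lo > tol:  # bisection on the monotone score function
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bootstrap_shape_interval(times, n_boot=200, seed=0):
    """Percentile interval (about 95%) for the shape from resampled fits."""
    rng = random.Random(seed)
    estimates = sorted(
        fit_weibull_shape(rng.choices(times, k=len(times))) for _ in range(n_boot)
    )
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Synthetic time-to-service data: scale 5.0, shape 2.0 (known only because simulated).
data_rng = random.Random(42)
times = [data_rng.weibullvariate(5.0, 2.0) for _ in range(200)]
beta_hat = fit_weibull_shape(times)
ci_low, ci_high = bootstrap_shape_interval(times)
```

If the whole interval lies above 1, the increasing service-entry trend is unlikely to be an artifact of a few outlying vehicles; an interval that straddles 1 would leave the trend undetermined.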

The Weibull analysis can be limited in terms of efficiency because it considers a specific distribution of data, but there is another non-parametric method for estimating the survival function without assuming any distribution: Kaplan–Meier Survival Analysis. The Kaplan–Meier curve estimates survival probability over age. Each vehicle is considered to “fail” when it is serviced. The formula for the survival function is shown as Equation (7).

\hat{S}(t) = \prod_{t_{i} \le t} \left(1 - \frac{d_{i}}{n_{i}}\right),

(7)

where:

d_i represents the number of events (service entries) at time t_i;

n_i represents the number of vehicles at risk just before t_i.
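The product-limit estimate in Equation (7) can be computed step by step as sketched below. The function name and the five toy vehicles are hypothetical. The optional censoring flag (a vehicle that left observation without a recorded service entry) is an assumption added for generality; in the setup described above every vehicle is treated as having an event.

```python
def kaplan_meier(ages, serviced):
    """Return [(t_i, S(t_i))] at each distinct event time.

    ages[i] is the age at which vehicle i was serviced or last observed;
    serviced[i] is True for a service entry and False for a censored record.
    """
    event_times = sorted({t for t, e in zip(ages, serviced) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        n_i = sum(1 for a in ages if a >= t)  # at risk just before t
        d_i = sum(1 for a, e in zip(ages, serviced) if a == t and e)  # events at t
        survival *= 1.0 - d_i / n_i
        curve.append((t, survival))
    return curve


# Five hypothetical vehicles; the third was censored at age 2.
ages = [1, 2, 2, 3, 4]
serviced = [True, True, False, True, True]
curve = kaplan_meier(ages, serviced)
```

For this toy input the survival estimate steps down through 0.8, 0.6, 0.3, and 0 at ages 1, 2, 3, and 4. The censored vehicle counts as at risk up to age 2 but does not trigger a step of its own.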

The decision-making grid (DMG) is a central component of the research, providing an interpretable way to categorize vehicle behavior and a way of identifying which predictors can be influenced and modified, so that the vehicle may migrate into a safer and more cost-effective zone of the grid. It is a 3 × 3 matrix based on two dimensions, the frequency of a vehicle entering into the service and the downtime cost, more precisely, the volume of expenses paid due to the car being placed in service. Each axis is divided into three zones, low, medium and high, resulting in nine distinct zones. Different strategies were tested to determine the most appropriate thresholds for segmentation: percentile-based quantiles (tertile segmentation), the largest gap method, equal count binning (tertile via sorting), and a hybrid segmentation method combining all of the three methods.

The percentile-based quantiles method divides the population using the 33rd and the 66th percentiles. Its strength is that it ensures an equal distribution of observations across the groups, but it does not capture natural breaks in the data. The largest gap method sorts the data and identifies the two largest differences between consecutive values. Equal count binning was also tested, but it produced classes similar to those of the percentile method. The hybrid segmentation method combined all three methods, generating more robust and balanced boundaries. The approach was as follows:

\mathrm{Threshold}_{1} = \frac{Q_{33} + \mathrm{Tertile}_{1} + \mathrm{Gap}_{1}}{3},

(8)

\mathrm{Threshold}_{2} = \frac{Q_{66} + \mathrm{Tertile}_{2} + \mathrm{Gap}_{2}}{3}

(9)
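The three threshold methods and their hybrid average can be sketched as follows, together with the mapping of a vehicle to one of the nine zones. Several details are assumptions made for illustration, because the text does not fix them: percentiles use linear interpolation, the largest gap method places each cut at the midpoint of the gap, equal count binning takes the values at one third and two thirds of the sorted data, and the second hybrid threshold uses the 66th percentile alongside the second tertile and second gap. All names and the sample costs are hypothetical.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of sorted data, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    return sorted_vals[lower] + (pos - lower) * (sorted_vals[upper] - sorted_vals[lower])


def largest_gap_thresholds(sorted_vals):
    """Midpoints of the two largest gaps between consecutive sorted values."""
    by_gap = sorted(
        range(len(sorted_vals) - 1),
        key=lambda i: sorted_vals[i + 1] - sorted_vals[i],
        reverse=True,
    )[:2]
    mids = sorted((sorted_vals[i] + sorted_vals[i + 1]) / 2 for i in by_gap)
    return mids[0], mids[1]


def equal_count_thresholds(sorted_vals):
    """Values at one third and two thirds of the sorted data."""
    n = len(sorted_vals)
    return sorted_vals[n // 3], sorted_vals[(2 * n) // 3]


def hybrid_thresholds(values):
    """Average of the three methods, as in Equations (8) and (9)."""
    s = sorted(values)
    q33, q66 = percentile(s, 33), percentile(s, 66)
    t1, t2 = equal_count_thresholds(s)
    g1, g2 = largest_gap_thresholds(s)
    return (q33 + t1 + g1) / 3, (q66 + t2 + g2) / 3


def level(value, low_cut, high_cut):
    """Map a value to low / medium / high given two cut points."""
    if value < low_cut:
        return "low"
    return "medium" if value < high_cut else "high"


def dmg_zone(frequency, cost, freq_cuts, cost_cuts):
    """Zone of the 3 x 3 grid as a (frequency level, cost level) pair."""
    return level(frequency, *freq_cuts), level(cost, *cost_cuts)


# Hypothetical total repair costs for nine vehicles.
costs = [100, 120, 150, 400, 420, 450, 900, 950, 1000]
cost_cuts = hybrid_thresholds(costs)
zone = dmg_zone(frequency=2, cost=950, freq_cuts=(3, 6), cost_cuts=cost_cuts)
```

For these costs the three methods propose first cuts of 310, 400, and 275, which average to about 328, and second cuts of 576, 900, and 675, which average to 717. Averaging lets a natural break in the data pull the boundary without giving up the balance of the quantile-based cuts.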

Each DMG segmentation method was tested using the classification algorithms described below, and performance was measured using prediction accuracy for the DMG zone assignment and the distribution of class coverage (whether all zones received data).
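The evaluation criteria for a candidate segmentation (accuracy, F1-score, and zone coverage) can be written down compactly. The sketch below is illustrative: the function names and the zone codes are hypothetical, the F1-score is macro-averaged over the zones that appear in the labels (the averaging scheme is an assumption), and coverage is taken as the share of the nine zones that contain at least one vehicle.

```python
def accuracy(y_true, y_pred):
    """Share of vehicles whose predicted zone matches the true zone."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-zone F1 scores."""
    scores = []
    for zone in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == zone and p == zone for t, p in zip(y_true, y_pred))
        fp = sum(t != zone and p == zone for t, p in zip(y_true, y_pred))
        fn = sum(t == zone and p != zone for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive for listed zones.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def zone_coverage(zones, n_zones=9):
    """Share of the grid's zones that contain at least one vehicle."""
    return len(set(zones)) / n_zones


# Hypothetical zone codes: first letter = frequency level, second = cost level.
y_true = ["LL", "LL", "MM", "MM"]
y_pred = ["LL", "MM", "MM", "MM"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
coverage = zone_coverage(y_true)
```

Reporting coverage next to accuracy guards against a segmentation that scores well only because most vehicles fall into one or two zones.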

Supervised machine learning was used to classify vehicles into DMG zones based on predictive features like driving style and maintenance costs, with the goal being to identify which predictors influence zone assignment. The feature selection process followed an iterative approach, guided by the need to identify predictors that are both influential and actionable from a fleet management perspective. The initial set of variables included telematics and accelerometer data describing driving behavior, as well as data on vehicle usage and operational performance, allowing the analysis to capture the connection between driver patterns and vehicle positioning within the DMG. Feature importance was evaluated using SHAP in order to identify the predictors with the strongest contribution to zone classification. The results showed that some highly influential variables, such as total kilometers driven, could not be directly influenced through operational interventions. Based on this observation, predictors that could not be adjusted to influence driver behavior were removed, and the models were retrained using only the remaining actionable variables. This step ensured that the final feature set reflects meaningful behavioral and operational drivers of vehicle positioning within the DMG rather than purely statistical associations.

Many algorithms were tested to compare performance and robustness: RandomForestClassifier, an ensemble of decision trees that reduces variance and handles non-linear interactions; AdaBoostClassifier, a boosting method that combines weak learners sequentially to improve performance; LogisticRegression, a linear model very good for interpretable classification, but which performs well when relationships are linear; and KNeighborsClassifier (KNN), which classifies based on proximity in feature space, but unfortunately is sensitive to feature scaling. Other models tested were NaiveBayes (GaussianNB), a probabilistic model assuming feature independence; the DecisionTreeClassifier model, a rule-based classifier; XGBoost, gradient-boosted trees, which are powerful and efficient for structured data; the GradientBoostingClassifier model, which is similar to XGBoost; and ExtraTreesClassifier, a randomized version of RandomForest, often improving variance and runtime. Other models were trained and tested, like BaggingClassifier, a bootstrap Aggregation applied to base; MLPClassifier, a multilayer perceptron algorithm useful for capturing non-linear patterns; Support Vector Machines (SVC) with linear kernel, which performs well with linearly separable classes; SVC with RBF kernel, which captures more complex relationships using radial basis functions; Fourier with LogisticRegression, which is a fine-tuned model through the addition of random Fourier features to map data into a higher-dimensional space. A notable issue was the occurrence of imbalanced classes, and the technique used for balancing the distinct zones of the DMG was the data augmentation technique, because imbalance leads to poor generalization, biased models, and unreliable classification. There were more strategies for augmenting the data during the study: for underrepresented zones, rows were manually duplicated and perturbed by adding small Gaussian noise to numerical features. 
Even with manual augmentation, class imbalance persisted. Therefore, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training set. SMOTE synthesizes new samples by interpolating between neighboring instances of the minority class. The method preserves label balance while increasing diversity of minority samples.
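The interpolation idea behind SMOTE can be sketched in plain Python. This is an illustrative simplification and not the implementation used in the study: the function name `smote_like_samples`, the neighbor count `k`, and the toy feature vectors are assumptions introduced only for this example.

```python
import random

def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (SMOTE-style sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-zone vehicles: [aggressive starts, unsafe brakings]
minority_rows = [[10.0, 4.0], [12.0, 5.0], [11.0, 7.0]]
new_rows = smote_like_samples(minority_rows, n_new=5)
```

Because every synthetic row lies on a segment between two real minority rows, the generated values stay inside the range already observed for that zone.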

Given the relatively small sample size, the augmentation was intentionally conservative to avoid altering the economic interpretation of cost-related variables. The data augmentation strategy was built to preserve the statistical properties of the original dataset while addressing class imbalance across DMG zones. For classes with a single observation, a small Gaussian noise perturbation (mean = 0, standard deviation = 0.01) was applied to numeric features in order to generate synthetic observations without significantly altering the underlying distribution. The low value of variance in the noise ensured that generated values remained close to the original observations. SMOTE was used to generate synthetic samples through interpolation between existing observations, preserving local relationships within the feature space. This approach improved class balance while maintaining the overall structure of the dataset.
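A minimal sketch of the noise-based step is given below, using the stated parameters (mean 0, standard deviation 0.01). The function name `perturb`, the number of copies, and the example vector are hypothetical. The sketch also assumes that the numeric features are on a comparable (for example, standardized) scale, which the text does not state explicitly.

```python
import random

def perturb(row, n_copies, sigma=0.01, seed=0):
    """Return n_copies noisy duplicates of row, adding N(0, sigma) noise
    independently to every numeric feature."""
    rng = random.Random(seed)
    return [[value + rng.gauss(0.0, sigma) for value in row]
            for _ in range(n_copies)]

# Hypothetical single observation of a DMG zone (assumed to be scaled features)
single_vehicle = [0.42, 0.10, 0.87]
copies = perturb(single_vehicle, n_copies=4)
```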

The focus of the current study was on interpreting the trained models to understand why they make certain predictions and which predictors influence those decisions the most. This interpretability is critical for actionable insights, particularly in the context of fleet behavior and cost optimization. The SHAP (SHapley Additive exPlanations) library for Python was used to identify the most important predictors that determine a vehicle’s position in a specific zone of the matrix. SHAP is a model-agnostic interpretability framework based on cooperative game theory. It assigns each feature an importance value for a particular prediction, considering all possible combinations of features. SHAP values explain how much each feature contributes to increasing or decreasing the prediction and provide both global (overall importance) and local (for a specific instance) explanations.
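The game-theoretic idea of "considering all possible combinations of features" can be illustrated with an exact Shapley computation on a toy model. This sketch is not the SHAP library itself, which relies on approximations because full enumeration grows exponentially with the number of features. The scoring function, the instance, and the baseline below are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values: weighted average marginal contribution of each
    feature over all coalitions; absent features take the baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical linear score over three features
def toy_score(v):
    return 2.0 * v[0] - 1.0 * v[1] + 0.5 * v[2]

instance = [3.0, 4.0, 2.0]
reference = [1.0, 1.0, 0.0]
phi = shapley_values(toy_score, instance, reference)
```

For a linear model each contribution reduces to the coefficient times the deviation from the baseline, and the contributions sum to the difference between the prediction for the instance and the prediction for the baseline.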

Considering all the strategies adopted and the methodologies tested, a visual representation was created to illustrate the flow of actions, algorithms, and strategies employed to address this specific business problem (Figure 2).

3. Results

3.1. Data Description

The database contains private data [35], consisting of 91 vehicles of various categories and dimensions, all of which are involved in the goods transportation process. The population is composed of 11 Large vehicles, weighing more than 12 tons; 21 Midsize vehicles, weighing between 4 and 12 tons; and 59 Small vehicles, weighing less than 4 tons. On these vehicles, IoT devices were installed that continuously sent information into a data lake. Through APIs, the information was collected from the data lake and inserted into a MySQL database. The data received contained accelerometer information, usage data, GPS data, and data referring to different impacts that affected cars in various ways and with different intensities. Other data received from the client referred to service repairs, costs, and issues for which cars required servicing, as well as data regarding the total number of kilometers driven monthly and the total amount of fuel used.

From the initial list of expenses, we eliminated the categories referring to Periodic Technical Inspection (PTI) and kept only the following categories: batteries, tires, parts, repair, and bodywork. After applying the LLM model (GPT-Neo 1.3B) for topic generation and segmenting the expenses according to the new topics, we arrived at the following segments of expenses: braking systems, suspension and steering systems, exhaust and catalysis systems, engine and transmission, electrical and on-board systems, air conditioning systems, wheel and tire system, body systems and accessories, fuel and AdBlue, miscellaneous and other operations, fluids and lubricants, and accidents and towing.

Regarding the brand of the vehicles, the most common one in the analyzed population was Mercedes-Benz, accounting for over 70% of the total, with other brands (IVECO, FIAT) having significantly lower shares.

One of the first insights found in the yearly distribution of expenses by category was the noticeable increase in repair-related costs from 57.6% in 2022 to 75.3% in 2024. This suggests that a higher share of the total expenses is now going toward repairs. But what is driving this shift? Could it be the natural wear and tear caused by the aging of vehicles, or is it more about how vehicles are being driven and the impact of driver dynamics on vehicle condition? This question drove us to pursue a deeper investigation using analytics and modeling techniques.

In contrast, tire-related expenses have declined, dropping from 26.6% in 2022 to just 10.2% in 2024 (Figure 3). It is important to note, however, that seasonal tire replacement costs were excluded from this analysis. These are considered mandatory expenses due to legal requirements that dictate the use of winter tires during colder months and summer tires in warmer periods. These costs are expected and unavoidable.

3.2. Clustering and Principal Component Analysis—Statistical Analysis

The first direction for identifying insights in the data was to cluster the data by considering the different sections of information provided by the IoT, like consumption, accelerometer data, and specific impacts correlated with driver behavior, such as unsafe or fluctuating driving at speeds between 60 and 110 km/h, unsafe or fluctuating driving at speeds >110 km/h, number of aggressive starts, unsafe braking at speeds between 10 and 60 km/h, unsafe braking at speeds between 60 and 110 km/h, unsafe braking at speeds >110 km/h, dangerous cornering at speeds between 10 and 60 km/h, and dangerous cornering at speeds between 60 and 110 km/h.

Through the accelerometer sensor, overload exceedance can be detected, as it measures G-forces in three directions (X: lateral/horizontal, Y: front/back, Z: vertical). For example, on the horizontal axis, an overload event can be defined as a reading that exceeds a given threshold, such as 3 m/s².
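The threshold rule can be sketched as follows. The function name and the sample readings are hypothetical, and the choice to count consecutive readings above the threshold as a single event is an assumption, since the text does not define how events are delimited.

```python
def count_overload_events(samples, threshold=3.0):
    """Count separate episodes in which |acceleration| exceeds the threshold.
    Consecutive readings above the threshold count as one event."""
    events = 0
    above = False
    for a in samples:
        exceeds = abs(a) > threshold
        if exceeds and not above:
            events += 1
        above = exceeds
    return events

# Hypothetical lateral-axis readings in m/s^2
lateral = [0.4, 1.2, 3.5, 3.8, 0.9, -3.2, 0.1, 2.9]
n_events = count_overload_events(lateral)
```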

Considering all the accelerometer parameters, we segmented the population using an unsupervised machine learning model, K-means, and identified that the optimal number of clusters is 5 by using the elbow method (Figure 4).
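The elbow method compares the within-cluster sum of squares (inertia) for increasing numbers of clusters and looks for the point after which the improvement flattens. The sketch below uses a small pure-Python K-means with a deterministic farthest-point initialization, which is an assumption made for reproducibility and not necessarily the initialization used in the study. The toy points are hypothetical and form three well-separated groups.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_inertia(points, k, n_iter=50):
    """Lloyd's algorithm with farthest-point initialisation; returns the
    within-cluster sum of squared distances (inertia)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Hypothetical two-feature data with three obvious groups
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1), (0, 10), (1, 10), (0, 11)]
inertias = {k: kmeans_inertia(points, k) for k in range(1, 5)}
```

On this toy data the inertia drops sharply up to three clusters and only marginally afterwards, which is the "elbow" the method looks for.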

Because several variables contributed to the segmentation, the clusters could not be plotted directly in the original multidimensional space. Therefore, we used Principal Component Analysis (PCA) to reduce the dimensionality of the clustering space in order to visualize the cars according to their segments. PCA was deliberately not applied before segmentation in order to preserve the explainability of the clusters, since the purpose of the research was to understand the connections between driver behavior and car expenses and to offer solutions to reduce these costs.

The original dimensions were combined into two new dimensions, PCA1 and PCA2, which made it possible to visualize the distribution of the cars according to their K-means segment (Figure 5).

To identify patterns and insights in the data, boxplots were created for each predictor considered as a direction for the segmentation algorithm. The most important ones, in terms of identifying special characteristics of each group of vehicles, are presented in the following figures. Cluster number 1 contains the cars with the highest number of special events that occurred (Figure 6, left image), where ‘events’ refer to impacts on the vehicle from different directions, overload on braking, overload on curves, or overload when accelerating the car.

The acceleration overload also identifies two groups of cars that concentrate the individuals with the highest rate of these inefficient driving patterns: cluster 1 and cluster 4 (Figure 6, right graph). A closer look at other predictors, such as braking overload (Figure 7, left graph) and cornering overload (Figure 7, middle graph), shows that clusters 1 and 4 also concentrate the individuals with the highest values for these clustering dimensions. For the variable describing the average speed during these events (Figure 7, right graph), the evidence indicates that cluster 1 contains the drivers with the highest number of events occurring at higher speeds compared to the other segments.

A first step in identifying whether there might be a connection between driving style and maintenance expenses was to examine the distribution of expenses by type at the cluster level. Cluster 1 distinguishes itself through having the highest amount of money the company spent on the maintenance (Appendix A) of the vehicles included in it (types of expenses are detailed in Appendix B). Also, vehicles assigned to cluster 1, which are characterized by a more aggressive driving style, are correlated with a significantly higher cost of repairs. The expenses are not only higher in total, but also across multiple categories such as engine and transmission repairs, electrical components, and braking systems. This finding has important operational consequences. For example, drivers in cluster 1 could be targeted with tailored training programs focused on safe and fuel-efficient driving. Additional interventions might include real-time driving feedback, incentive schemes to promote smoother driving behavior, or even tighter monitoring for new drivers showing similar driving patterns.

The segmentation model and the heat map both associate the highest expenses with the cluster whose drivers most often overload the vehicle when braking, cornering, or accelerating. This finding was the inflection point of the analysis: it indicated that a deeper investigation was warranted to identify the variables with the greatest impact on maintenance costs and fuel consumption.

The next step was to use an unsupervised machine learning algorithm, K-means, to segment the drivers based on their behavior. The segmentation variables were: total distance traveled in kilometers during the monitored period, time spent driving between 10 and 60 km/h (Time1), time spent driving between 60 and 110 km/h (Time2), time spent driving at speeds >110 km/h (Time3), number of aggressive starts (AF), unsafe braking at speeds between 10 and 60 km/h (Brake1), unsafe braking at speeds between 60 and 110 km/h (Brake2), unsafe braking at speeds >110 km/h (Brake3), dangerous cornering at speeds between 10 and 60 km/h (Curve1), dangerous cornering at speeds between 60 and 110 km/h (Curve2), dangerous cornering at speeds >110 km/h (Curve3), unsafe or fluctuating driving at speeds between 60 and 110 km/h (Disfluent2), unsafe or fluctuating driving at speeds >110 km/h (Disfluent3), and number of incidents associated with dangerous speeds (>110 km/h) (Speed3).

Histograms give a better perspective on the data and the direction to be followed for modeling and analyzing them. The fleet analyzed has a positively skewed distribution for the total number of kilometers driven (Figure 8), which was evaluated for each vehicle included in the analysis over a period of 6 months (from June 2024 to November 2024). Most activity fell between 10,000 km and 25,000 km during the analyzed period, indicating that the majority of the cars had reasonable activity for a commercial fleet (on average, between approximately 1600 and 4100 km per month). There were also more than 10 cars that were underused and a few vehicles that were intensively used, possibly on longer or interurban routes.

The K-means algorithm was initially trained on all the predictors described, and the maximum silhouette score obtained was 0.31; when Principal Component Analysis was used to reduce the dimensionality, the silhouette score increased to 0.48. Two components were used to reduce the dimension of the input data, and the population was segmented into seven distinct clusters of cars (Figure 9). To preserve the interpretability of the model, the loadings for PCA1 and PCA2 were analyzed (Figure 9). The conclusion was that PCA1 identifies drivers who might be riskier, because the strongly positive contributions to the component come from variables such as Total_AF (total number of aggressive starts), Total_Brake1/2/3 (total number of dangerous braking events at different speeds), and Total_Curve1/2/3 (risky turns at different speeds). This means that high PCA1 values indicate vehicles that are frequently involved in risky events.

The second principal component (PCA2) seems to be associated with drivers who drive frequently and over long distances, but without making too many mistakes. Long distances and driving time at moderate speeds are the variables with the highest influence on PCA2. The Total_Dist_km variable (total number of kilometers driven) has a strongly positive contribution to PCA2 (0.59) (Figure 10), which indicates that drivers who drive a lot are the ones who determine high values of PCA2. Other variables such as Total_Time1_min and Total_Time2_min also contribute positively, suggesting that these drivers spend a lot of time driving at moderate speeds.
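For reference, the silhouette score used to compare the two segmentations can be computed as sketched below. This is a plain-Python illustration on hypothetical one-dimensional data; in practice a library routine such as scikit-learn's `silhouette_score` would typically be used. Assigning a score of 0 to singleton clusters is a common convention and is assumed here.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b), where a is the mean
    distance to the point's own cluster and b is the mean distance to the
    nearest other cluster."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: two tight, well-separated groups
toy_points = [(0.0,), (1.0,), (10.0,), (11.0,)]
toy_labels = [0, 0, 1, 1]
score = silhouette_score(toy_points, toy_labels)
```

Values close to 1 indicate compact, well-separated clusters, which is why the increase from 0.31 to 0.48 after dimensionality reduction is read as an improvement in cluster quality.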

The use of boxplots at the variable level for each cluster is helpful in discovering the real components of a cluster and if there are indeed differences that can reveal distinct drivers’ behaviors in traffic.

In terms of total distance traveled by the vehicles (Figure 11, left graph), cluster 4 shows the longest total average distance traveled, with significant variations, indicating vehicles that have accumulated high mileage. Clusters 0, 1, and 6 have close median values, but cluster 6 has fewer extreme values compared to the others. Visualizing the variable showing the total number of minutes traveled at speeds between 10 and 60 km per hour (Figure 11, middle graph), cluster 1 shows the largest variation in total time, with some extreme values, while cluster 4 has the highest mean but low variation, suggesting uniform use over time at low speeds. The variable total time driven at speeds between 60 and 110 km per hour (Figure 11, right graph) reveals that cluster 4 has the largest median value and the widest range, indicating increased variability in associated time. Clusters 5 and 6 have relatively low median values, suggesting reduced time for these events.

While the first set of boxplots explained the clusters from a car usage perspective (kilometers traveled and time spent driving at different speeds), the second set of graphs (Figure 12) presents the segments from a driver dynamics perspective, considering the following variables: the total number of aggressive starts, the total number of unsafe braking events at speeds between 60 and 110 km per hour, and the total number of unsafe or fluctuating driving events at speeds between 60 and 110 km per hour.

Cluster 6 shows the highest values for the number of aggressive starts, indicating a more aggressive behavior, while cluster 4 displays lower values, but with significant variation. Cluster 2 has the highest average values for the dangerous braking event, suggesting an aggressive braking style, while cluster 4 shows lower and more consistent values, suggesting a group of safety-conscious drivers. For the disfluent driving style, cluster 2 is distinguished by a significantly higher median and interquartile range, indicating a high frequency of disfluent events, while clusters 4 and 5 show exceptionally low values, with almost no disfluent events, highlighting the drivers with the best driving performance.

The goal of identifying distinct groups of drivers with similar behavior was reached, but the main objective is to find a way to reduce the expenses associated with the fleet. The next stage was therefore to identify how the variables describing driving style are correlated with fuel consumption and maintenance costs. One direction was to analyze how a driver’s braking dynamics affect maintenance costs. To accomplish this, the fleet was segmented based on only three variables: unsafe braking at speeds between 10 and 60 km/h, unsafe braking at speeds between 60 and 110 km/h, and unsafe braking at speeds >110 km/h. A K-means algorithm was used, and the optimal number of clusters was 5. Cluster 0 contains the drivers with the best braking patterns (Figure 13); they are the most prudent drivers. From an expense point of view, however, a different picture emerged (Figure 13).

While cluster 0 represents a group of prudent drivers in terms of braking behavior, this group of cars recorded the highest values of expenses related to the braking system (Appendix C). ANOVA was used to test whether there is a significant difference in braking style between clusters, as well as in braking system expenses across clusters (Appendix D). The results of the ANOVA show a statistically significant difference between clusters for braking dynamics across all three speed categories (p-value < 0.05), but no significant difference between clusters in terms of braking expenses (p-value > 0.05).
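The quantity behind the ANOVA comparison is the F statistic, the ratio of between-cluster to within-cluster variability. A minimal sketch with hypothetical per-cluster values is shown below. The p-value additionally requires the F distribution, for which a routine such as `scipy.stats.f_oneway` could be used.

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square divided by
    within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical unsafe-braking counts for three clusters
clusters = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = anova_f(clusters)
```

A large F value means that the cluster means differ by more than the spread inside the clusters would explain, which corresponds to the pattern reported for braking dynamics but not for braking expenses.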

Tukey’s HSD (Honestly Significant Difference) tests were also applied to validate the ANOVA results and to better understand which clusters differ the most from each other in terms of braking habits at different speeds and braking costs (Appendix E). The Tukey test results confirm the ANOVA findings, indicating that for the majority of clusters, there are statistically significant differences in braking style across speed categories, while there are no significant differences between any of the clusters in terms of braking costs.

Another perspective on analyzing the data was to identify whether clustering the population based only on the total number of engine starts, total amount of braking, and total number of curves taken during the analyzed period would reveal significant differences between clusters in terms of the associated maintenance costs. The K-means algorithm was trained on the data, and according to the elbow method (Figure 14, left graph), four clusters were considered optimal for segmenting the population. The boxplots for each of the three segmentation variables showed that cluster 2 contained the vehicles with the most intensive usage (Figure 14, right graph). The segmentation algorithm grouped into this cluster the cars with the highest values for total engine starts, total number of curves taken, and total amount of braking.

ANOVA and Tukey tests were used to identify if there were significant differences, and the results suggested that there was no significant difference between clusters for any of the expense categories (Appendix F).

The final approach to segmenting the data to identify specific patterns and correlations between driving behavior and expenses was to analyze the fleet from a fuel consumption perspective. The variables used for segmentation were the total amount of fuel used, the total number of kilometers driven, and the consumption per kilometer over a period of four months, from May 2024 to August 2024. The elbow method (Figure 15) suggested an optimal number of four clusters, with a silhouette score of 0.6 for this segmentation. The segmentation algorithm identified that cluster 1 contained the vehicles with the highest usage—cars that were intensively used both in terms of fuel consumption and kilometers driven—followed by cluster 2.

The ANOVA and Tukey HSD analyses (Appendix G) revealed that for the majority of the expense categories, there were significant differences between the generated clusters. However, costs related to accidents showed no significant variation between clusters. Clusters 1 and 2 highlight vehicles with higher costs, as shown by the distribution heat map (Appendix H) for the expenses per category and per cluster.

The conclusion after analyzing the data was that the most important correlation found was between usage and costs: an intensively used car generates the highest maintenance costs and also the highest fuel costs. Since this result ran contrary to initial expectations, the solution was to adopt a different type of data analysis and other methodologies for finding insights into the analyzed population.

3.3. Decision-Making Grid Construction

The decision-making grid (DMG) is the tool that segmented the fleet into distinct, tangible zones and made it possible to obtain the expected outputs, allowing us to propose targeted actions to improve overall efficiency by changing driver behavior and moving vehicles to a better zone. The DMG (Figure 16) was adapted to the current case study: the frequency of failure was translated into the frequency of a vehicle entering service (x-axis), and the downtime was translated into the amount of money spent on the maintenance of a specific vehicle (y-axis).

The DMG was built based on quantile thresholds, using the 0.33 and 0.66 quantiles as cut-off points to segment the population along both dimensions. This resulted in nine distinct zones, where vehicles were allocated (Figure 16) according to their values of service entry frequency and the amount of money spent on their maintenance.
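The quantile-based construction can be sketched as follows. The 0.33 and 0.66 cut-offs come from the text; the linear-interpolation quantile rule, the handling of values that fall exactly on a cut-off, the label format, and the toy data are assumptions made for illustration.

```python
def quantile(values, q):
    """Linear-interpolation quantile (the same rule as NumPy's default)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

def tier(value, low_cut, high_cut):
    """Map a value to low / medium / high given the two cut-off points."""
    if value <= low_cut:
        return "low"
    if value <= high_cut:
        return "medium"
    return "high"

def dmg_zones(frequency, cost):
    """Label each vehicle as '<frequency tier>-<cost tier>' (nine possible zones)."""
    f_cuts = quantile(frequency, 0.33), quantile(frequency, 0.66)
    c_cuts = quantile(cost, 0.33), quantile(cost, 0.66)
    return [f"{tier(f, *f_cuts)}-{tier(c, *c_cuts)}" for f, c in zip(frequency, cost)]

# Hypothetical fleet of seven vehicles
service_visits = [1, 2, 3, 4, 5, 6, 7]
maintenance_cost = [700, 600, 500, 400, 300, 200, 100]
zones = dmg_zones(service_visits, maintenance_cost)
```

Because the cut-offs are computed from the fleet itself, each tier holds roughly one third of the vehicles along each axis, although the nine zones are not necessarily equally populated.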

The first question that arose was whether there is any correlation between the age of a car and the frequency of failure (Figure 17), as well as between age and the cost of failure. To answer this, two contingency matrices were built and a chi-squared test was applied to each of them. The results were as follows: chi-square statistic for frequency: 59.10, p-value: 9.82 × 10⁻⁶; and chi-square statistic for cost: 61.20, p-value: 4.63 × 10⁻⁶. The outcomes of the chi-squared tests indicate a strong and significant relationship between the age of the vehicle and both the frequency of the car entering into service and the associated cost of maintenance.
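The test statistic for a contingency matrix can be computed as sketched below. The table is hypothetical and much smaller than the matrices used in the study, and no continuity correction is applied. Converting the statistic to a p-value requires the chi-squared distribution, for which a routine such as `scipy.stats.chi2_contingency` could be used.

```python
def chi_square(table):
    """Pearson chi-squared statistic and degrees of freedom for a
    contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = age band (newer, older), columns = frequency tier (low, high)
age_by_frequency = [[10, 20], [20, 10]]
chi2_stat, chi2_dof = chi_square(age_by_frequency)
```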

Despite the modest number of vehicles included in this study, the next steps of the analyses began from a key inflection point. The assumption that drivers might behave differently depending on the size of their vehicle led to a new split within the fleet. The segmentation was based on the vehicle tonnage and resulted in three distinct groups: the ‘Large’ segment, consisting of vehicles over 12 tons (Figure 18, left graph); the ‘Midsize’ segment, including the vehicles between 4 and 12 tons (Figure 18, middle graph); and the ‘Small’ category, comprising vehicles under 4 tons (Figure 18, right graph). This new split translated into three new DMGs, all of them built on quantile thresholds, using the 0.33 and 0.66 quantiles as cut-off points (Figure 18). In terms of volume, the ‘Large’ segment included 11 trucks, the ‘Midsize’ segment comprised 21 vehicles, and the ‘Small’ category contained 59 cars.

The different zones of the DMG were labeled according to the two thresholds, and these zone names will be used throughout the remainder of this paper. The first part of the label includes the frequency of failure, and the second part, the cost of failure (Figure 19).

Given this new segmentation of the population, it was necessary to verify whether the previously observed strong correlation between vehicle age and both failure frequency and maintenance cost still holds within each vehicle category. The chi-squared test (Table 1) indicates that there is no statistically significant association between vehicle age and the different zones of the DMG at the segment level. This lack of significance may be due to the reduced data volume within each zone and its associated segment, but it may also suggest that other variables influence maintenance costs and the frequency with which a vehicle requires service.

The boxplot for the ‘Small’ vehicle category supports the chi-square test result, showing that there is no clear difference in the median vehicle age across the DMG zones (Figure 20). This suggests that vehicle age is not significantly associated with the assigned zone, in line with the statistical test outcome.

The next objective was to investigate whether the age of the vehicle still remains a significant factor in explaining the frequency of service incidents and maintenance costs at the segment level. For each segment, the chi-square test was applied to determine the significance of the relationship. The average vehicle age was also calculated for each frequency and cost tier (Table 2).

For the ‘Large’ and ‘Midsize’ segments of cars, there was no significant relationship found between vehicle age and either frequency or cost (p-values > 0.2). The sample size for the ‘Large’ class of vehicles (only 11 vehicles) may limit the strength of conclusions. The ‘Small’ segment showed the clearest trend; frequency increased with vehicle age. However, the chi-square test results were marginal (p ≈ 0.06), meaning the relationship was close to statistical significance, but was not strong enough to be confirmed definitively at the 5% significance level.

The chi-square analysis represents the first step in identifying a relationship between vehicle age and maintenance frequency and cost. Age by itself is not able to capture the overall complexity of fleet dynamics. Its analysis, together with the vehicle size segmentation, clustering based on usage and behavioral parameters, and survival analysis, provides a broadened perspective on how it interacts with operational parameters. The principal obstacle in applying multivariate modeling was mainly the reduced population size.

3.4. Survival and Reliability Analysis

An important step in understanding the age distribution of the vehicles and their dynamics in terms of age-related failures was to apply Weibull analysis (Figure 21). According to the beta shape parameter, the analysis leads to the following conclusions: ‘Large’ vehicles are in the accelerated wear-out phase, meaning that failures increase with age; ‘Midsize’ vehicles show similar behavior to the ‘Large’ segment, indicating that the frequency of service visits increases with age; and for the ‘Small’ category of cars, the beta value is the lowest, suggesting that failures are more evenly distributed and possibly driven by factors other than age alone.

To assess the reliability of the Weibull distribution fit for vehicle age (especially given the small number of vehicles in the ‘Large’ category), a bootstrap resampling approach was applied. This method involved repeatedly resampling the vehicle age data and re-estimating the Weibull Beta (shape) parameter to evaluate its variability. The results (Table 3) show that the ‘Small’ category of vehicles has low and consistent Beta values, indicating a more stable failure pattern not strongly tied to age; ‘Large’ and ‘Midsize’ vehicles have high Beta values, consistent with an accelerated wear-out phase, but also show high variability, suggesting that these estimates may be unstable due to the limited sample size.
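A minimal sketch of the bootstrap procedure is shown below. It assumes that the ages are treated as complete (uncensored) observations and that the shape parameter is estimated by maximum likelihood, neither of which is stated in the text. The ages, the number of resamples, and the percentile interval are likewise hypothetical choices.

```python
import math
import random

def weibull_shape(ages, lo=0.01, hi=50.0):
    """Maximum-likelihood Weibull shape (beta), found by bisection on the
    score equation sum(x^b ln x)/sum(x^b) - 1/b - mean(ln x) = 0."""
    logs = [math.log(x) for x in ages]
    mean_log = sum(logs) / len(logs)

    def score(b):
        powers = [x ** b for x in ages]
        return sum(p * l for p, l in zip(powers, logs)) / sum(powers) - 1.0 / b - mean_log

    for _ in range(100):  # the score is increasing in b, so bisection converges
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def bootstrap_shape(ages, n_boot=200, seed=0):
    """Percentile interval (2.5%, 97.5%) of beta over resamples with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        weibull_shape([rng.choice(ages) for _ in ages]) for _ in range(n_boot)
    )
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot)]

# Hypothetical vehicle ages in years for a small segment
segment_ages = [3.0, 5.0, 6.0, 7.0, 8.0, 8.0, 9.0, 10.0, 11.0, 12.0, 14.0]
beta_hat = weibull_shape(segment_ages)
beta_low, beta_high = bootstrap_shape(segment_ages)
```

A wide interval between `beta_low` and `beta_high` is the kind of instability the text attributes to the segments with few vehicles.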

Considering that the outcomes of the Weibull analysis and the bootstrap results still required further validation, an additional method was applied: Kaplan–Meier (KM) Survival Analysis. Unlike the Weibull model, which assumes a specific failure distribution, Kaplan–Meier analysis directly estimates the probability of survival over time without relying on a predefined mathematical function. The results (Figure 22) showed that for the ‘Small’ vehicle category, the survival probability declines rapidly, and most vehicles are no longer operational by year 10. For the ‘Midsize’ category, failures are distributed between years 8 and 12, suggesting age-related degradation but with less consistency. ‘Large’ vehicles exhibit high survival probability initially, followed by a sharp decline between years 8 and 10, indicating that failures in this category are strongly age-driven.
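The Kaplan–Meier estimate can be sketched in a few lines. The data below are hypothetical, with an event flag of 1 for an observed failure and 0 for a censored observation. A dedicated survival-analysis package would normally be used for confidence bands and plotting.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time. events[i] is 1 for
    an observed failure and 0 for a censored observation."""
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if failures:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical ages (years) at failure or at end of observation
ages = [1, 2, 2, 3]
failed = [1, 1, 0, 1]
km_curve = kaplan_meier(ages, failed)
```

The estimate is a step function that only drops at observed failure times, which is why it makes no assumption about the shape of the underlying failure distribution.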

Considering the results of the three methods applied (Weibull, bootstrap for Weibull, and Kaplan–Meier), which identified the ‘Small’ group of vehicles as the category where maintenance costs and service frequency do not significantly depend on vehicle age, the next stage of the analysis focused specifically on this group. The objective was to identify the predictors most strongly correlated with a higher failure rate and increased maintenance costs.

With the number of potential predictors being high, including accelerometer data and various types of expenses, the optimal direction for evaluating their impact was to reduce dimensionality by using Principal Component Analysis (PCA). The elbow method was used to determine the optimal number of components to be used in further modeling, with the objective of achieving approximately 80% explained variance. According to this criterion, the optimal number of principal components was 10 (Figure 23).
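The explained-variance criterion can be illustrated as follows. The eigenvalues are hypothetical, and the function simply returns the smallest number of components whose cumulative share of variance reaches the target, which here is the 80% level mentioned in the text.

```python
def components_for_variance(eigenvalues, target=0.80):
    """Smallest number of principal components whose cumulative explained
    variance ratio reaches the target."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for n, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev / total
        if cumulative >= target:
            return n
    return len(eigenvalues)

# Hypothetical eigenvalues of the covariance matrix
toy_eigenvalues = [4.0, 3.0, 2.0, 1.0]
n_components = components_for_variance(toy_eigenvalues)
```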

To determine whether driving style is related to maintenance costs, a multiple linear regression was fitted to assess whether and how the principal components affect these costs. For example, the driving behavior captured by some principal components has a significant impact on braking expenses, but not all components are relevant (Appendix I). The R-squared value is 0.395, meaning that approximately 40% of the variance in braking expenses is explained by the 10 principal components. The model is statistically significant (p-value of the F-statistic = 3.12 × 10⁻⁶). Regarding the individual components, PC1 has a significant negative effect on braking expenses (p < 0.001), while PC2 has a significant positive effect (p = 0.001). Other components, such as PC3, PC4, and PC5, are not significant at the 0.05 level.

3.5. Classification Models—Interpretability

The issue with this way of identifying the predictors with the highest influence on maintenance costs is interpretability. Because each principal component combines many factors with different loadings, it is hard to isolate the influence of a specific predictor, and therefore hard to act on it. This limitation motivated another direction of analysis: the use of machine learning algorithms to find which predictors are most important in determining where a vehicle is placed among the nine distinct zones of the DMG.

Another issue that occurred was connected to the low sample size for some of the DMG zones. That situation imposed the use of an augmentation technique to improve model training. For each small group, synthetic examples were generated by adding a small amount of noise to the original vehicle’s numeric features. Although simplistic, this approach helped to balance the training dataset and reduce bias toward overrepresented classes.

The models achieved modest accuracy overall; the best performer, the SVC algorithm, reached an accuracy of 67% and an F1-score of 74% (Figure 24). Given the class imbalance among the nine DMG zones, the final model was trained using SMOTE (synthetic minority oversampling technique) to synthetically balance the dataset. A linear Support Vector Machine (SVM) was chosen, as it delivered the best performance during testing. The model was trained on the resampled dataset and saved along with the corresponding feature list for reproducibility.

SVM was selected for its robustness in high-dimensional settings and its ability to find optimal class-separating hyperplanes. The other models tested (Table 4) did not reach a higher accuracy in capturing the behavior of the predictors.

SHAP analysis was used for interpreting the final SVM model by quantifying the contribution of each feature to the prediction. The summary plot (Appendix J) visually ranks the predictors based on their overall impact, helping identify which features are most influential in assigning vehicles to specific DMG zones. The target variable was encoded using an ordinal mapping, where 0 corresponds to the high–high risk zone (highest risk) and 8 corresponds to the low–low risk zone (lowest risk). Because the SHAP analysis was applied to the output of model prediction—which returns the numeric class label—the SHAP values describe how each feature influences movement along this ordinal scale.

The SHAP summary plot (Appendix I) reveals that the most influential predictors for classifying vehicles into DMG zones are the kilometers driven in 2023, braking expenses, and bodywork-related costs. Positive SHAP values (points on the right side of the plot) indicate that a feature pushes the prediction toward higher label values, and therefore toward safer risk zones. Conversely, negative SHAP values (points on the left) indicate that a feature pushes the prediction toward lower label values, corresponding to riskier zones. High values of the predictor kilometers driven in 2023 have a negative SHAP contribution, meaning that vehicles with high annual mileage are assigned to the higher-risk zones (e.g., high–high or high–medium).

Other important insights provided by the models are that aggressive driving patterns increase the fleet risk, and repeated electrical failures are events correlated with an increase in the class risk. Vehicles with consistent fluid replacements tend to remain in lower-risk categories, and driving at medium–high speeds (e.g., highways) is mechanically less stressful than urban stop-and-go driving.

The predictors most highly correlated with the target were identified, but not all of them can be influenced in order to change driver patterns effectively. For example, the SHAP importance plot highlights the number of kilometers driven in 2023 as the most important predictor. While this is a strong indicator, it is not a factor that can currently be modified. Based on this insight, the predictors that cannot be adjusted to influence driver behavior, and which therefore do not support actionable strategies for reducing fleet costs, were removed. The classification models were then retrained using only the remaining, actionable predictors (Figure 25). Two further predictors were added to the list: maintenance expenses per kilometer driven and fuel expenses per kilometer driven. These features were calculated by dividing the corresponding total expenses for the period January 2023 to November 2024 by the total number of kilometers driven in the same period.
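The two per-kilometer features can be illustrated with a short sketch that reads them as period totals divided by the kilometers driven in the same period. The helper name `cost_per_km`, the dictionary keys, and the numeric values are hypothetical and serve only as an example.

```python
def cost_per_km(total_expense, total_km):
    """Return expense per kilometer, or None when no distance was recorded."""
    if total_km <= 0:
        return None
    return total_expense / total_km


# Hypothetical totals for one vehicle over the observation period
vehicle = {
    "maintenance_expenses": 18400.0,
    "fuel_expenses": 52300.0,
    "km_driven": 96000.0,
}
maintenance_cost_per_km = cost_per_km(vehicle["maintenance_expenses"], vehicle["km_driven"])
fuel_cost_per_km = cost_per_km(vehicle["fuel_expenses"], vehicle["km_driven"])
```

Normalizing by distance makes vehicles with very different mileage comparable, which matters once the raw kilometers driven are removed from the predictor set.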

The overall accuracy of the models decreased after the most important predictor was eliminated; the best-performing model was a Random Forest, with an accuracy of 58% and an F1-score of 55%. None of the other tested models achieved a higher accuracy (Table 5). Using the SHAP library, the most impactful predictors were identified: the maintenance cost per kilometer, with an importance of 74%, followed by the average unsafe or fluctuating driving at speeds between 60 and 110 km/h, with an importance of 22%, and the fuel cost per kilometer, with a SHAP importance of 19%.

3.6. Thresholding DMG

At this point, the predictors most strongly correlated with the probability of a car being classified in a specific DMG zone have been identified, and they are predictors that can be modified by influencing drivers through different strategies. A significant limitation of the initial modeling phase is the low accuracy obtained across all the classification algorithms. This limitation is linked to the small amount of data and the high class imbalance in the training data, which directly affect the ability of a model to learn stable decision boundaries. As a result, the initial models only weakly approximate the structure of the DMG segments. This leads to low confidence in the validity of the predicted classes and limits the capacity to extract relevant insights about the relationship between driver behavior and the risk zones. These outcomes raised a new question: given the small amount of data, is building the DMG from quantile thresholds the optimal method?

The last part of the research focuses on answering this question. The quantile-based method of building the DMG was compared with two other methods by training the classification models under each segmentation and recording the highest accuracy a model can reach. The segmentation method that yields the most accurate classification model is taken to be the most effective for building the DMG. The two other methods were the largest gap method and a hybrid segmentation technique that averages the thresholds obtained from the other two methods.

The largest gap method also splits the population into three categories, but it places the thresholds where the differences in the data are the most significant. The method proceeds as follows: first, the values are sorted; then, the absolute differences between consecutive values are calculated, measuring the jumps between neighboring values. Finally, the two largest jumps are identified, and the values near these jumps become the thresholds that split the data into three segments.
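The steps of the largest gap method can be sketched as follows. The text only states that the thresholds lie near the two largest jumps; placing each threshold at the midpoint of the gap is an assumption made here, as are the function name `largest_gap_thresholds` and the example costs.

```python
def largest_gap_thresholds(values, n_cuts=2):
    """Return cut points placed in the middle of the n_cuts largest gaps."""
    ordered = sorted(values)
    if len(ordered) <= n_cuts:
        raise ValueError("need more values than cut points")
    # Jump between each pair of neighbouring values, with its position
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    widest = sorted(gaps, key=lambda g: g[0], reverse=True)[:n_cuts]
    # Midpoint of each selected gap (assumption: the source says "near")
    return sorted((ordered[i] + ordered[i + 1]) / 2 for _, i in widest)


# Hypothetical repair costs for eight vehicles
repair_costs = [120, 150, 160, 900, 950, 1000, 4200, 4300]
cuts = largest_gap_thresholds(repair_costs)  # [530.0, 2600.0]
```

With these values the two widest jumps are 1000 to 4200 and 160 to 900, so the data fall into three natural groups. When the jumps sit close to the extremes, one or more segments can end up nearly empty.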

From the graphical perspective (Figure 26) of the vehicle distribution based on the largest gap method, it is clearly visible that several zones within the matrix lack data or contain very few data points. This may lead either to the impossibility of augmenting the data or to overfitting the model as a result of excessive augmentation. For the ‘Small’ vehicle category, the classification algorithm was only able to generate predictions for two out of the nine zones, which disqualifies this data-splitting methodology for the current sample of data.

The hybrid segmentation technique (Figure 27) averages the thresholds obtained from the two methods, percentile-based quantiles and the largest gap, in order to create more balanced and stable zone definitions across the two axes. It is a weighted average, assigning greater importance to the percentile-based quantiles (giving them twice the weight compared to the thresholds from the largest gap method). The hybrid method offers a balanced approach, reducing dependence on a single segmentation strategy and improving stability across diverse distributions.
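The weighted averaging of thresholds can be sketched as below, with the percentile-based cut points weighted twice as heavily as the largest gap cut points. The function names, the example cut points, and the choice of inclusive upper bounds for the "low" and "medium" levels are assumptions made for illustration.

```python
def hybrid_thresholds(quantile_cuts, gap_cuts, quantile_weight=2.0, gap_weight=1.0):
    """Weighted average of two sets of cut points (quantile cuts count double)."""
    total = quantile_weight + gap_weight
    return [
        (quantile_weight * q + gap_weight * g) / total
        for q, g in zip(sorted(quantile_cuts), sorted(gap_cuts))
    ]


def zone_level(value, cuts):
    """Map a value to 'low', 'medium' or 'high' using two cut points."""
    low_cut, high_cut = cuts
    if value <= low_cut:
        return "low"
    if value <= high_cut:
        return "medium"
    return "high"


# Hypothetical cut points for the repair-cost axis
quantile_cuts = [300.0, 1200.0]
gap_cuts = [530.0, 2600.0]
hybrid = hybrid_thresholds(quantile_cuts, gap_cuts)
level = zone_level(900.0, hybrid)
```

Applying `zone_level` once to the frequency axis and once to the cost axis gives the pair of levels that identifies one of the nine DMG zones.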

This version of the training pipeline introduces data augmentation prior to train-test split, ensuring a better balance across DMG zones. SMOTE (synthetic minority oversampling technique) is further used to synthetically oversample minority classes, followed by training and evaluating 14 classification models. Firstly, the models were trained based on all the predictors, including the ones that could not be changed by influencing driver behavior.

Secondly, predictors that could not be modified were eliminated, and the 14 classification models were again retrained on this new set of predictors.

The accuracy metrics are much better under this segmentation than for the same models and predictors under the percentile-based quantile generation of the DMG (Figure 28). The models with the highest accuracy are Random Forest, Extra Trees, Bagging and Fourier with Logistic Regression, which reach an accuracy of 93% and an F1-score of 90% (Table 6). The model chosen to generate the classification was Random Forest (RF).

SHAP analysis was used to interpret the model by quantifying the contribution of each feature to the prediction. The summary plot (Appendix K) visually ranks the predictors based on their overall impact. The predictors with the highest influence are the total expenses, with a SHAP importance of 0.37; the cost of maintenance per kilometer driven, with an importance of 0.2; and electrical expenses, with an importance of 0.18. These examples point to a specific outcome: both the accuracy of the models and the number of predictors influencing the target increased, while the importance of the predictors became more uniformly distributed, with no predictor dominating the others.

Appendix K presents the SHAP summary plot, illustrating the magnitude and direction of impact for each predictor variable on the model output. The horizontal axis represents the SHAP value, indicating how much a given feature contributes to increasing (positive SHAP value) or decreasing (negative SHAP value) the model’s prediction. Each dot corresponds to a single observation, colored according to the feature’s value (red = high, blue = low). As seen, the total expenses, electrical expenses, and the maintenance cost per kilometer driven are the most influential features, as evidenced by their wide SHAP value distributions. This confirms their critical role in predicting the target variable (vehicle class). Behavioral features, such as the number of unique locations where the driver overbraked, show consistent impacts, emphasizing the contribution of aggressive or unsafe driving to increased costs. Several insights follow from the SHAP graph. Vehicles generating repeated maintenance costs require preventive interventions. Unresolved electrical issues signal an increase in a car’s risk classification, so early detection programs are highly recommended. Finally, coaching drivers to improve braking behavior can reduce maintenance costs.

If only the changeable predictors are kept, there is a slight decrease in accuracy, but the performance remains significantly higher than that of the DMG generated using percentile-based quantile segmentation. Using the same predictors and models, the percentile-based DMG achieved a maximum classification accuracy of 58% (Appendix J), whereas the current best-performing model, ExtraTrees, reaches an accuracy of 86% (Table 7, Figure 29).

Threshold selection in decision-making grid applications is context-dependent, as threshold limits are normally determined based on the relative magnitude and distribution of the available data rather than according to universally accepted rules. Previous studies [22] highlight that DMG thresholds are often established empirically to reflect the characteristics of the dataset and the specific maintenance decision-making context. In this study, classification accuracy is used as a quantitative criterion to evaluate alternative threshold configurations, which generates a data-driven selection process that correlates the segmentation with predictive performance and practical decision relevance.
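The selection criterion can be expressed as a small sketch: each segmentation method is scored by the best accuracy any classifier reached under it, and the method with the highest score is kept. The function name `select_threshold_method` and the dictionary layout are assumptions; the two accuracy values are the ones reported in the text for the actionable predictors, and `None` marks the largest gap method, which was disqualified for this sample.

```python
def select_threshold_method(best_accuracy_by_method):
    """Return the segmentation method whose best classifier is most accurate."""
    usable = {m: a for m, a in best_accuracy_by_method.items() if a is not None}
    if not usable:
        raise ValueError("no method produced a usable classifier")
    return max(usable, key=usable.get)


# Best reported accuracies (actionable predictors only)
reported = {"percentile": 0.58, "largest_gap": None, "hybrid": 0.86}
chosen = select_threshold_method(reported)
```

Because the same models, predictors, and preprocessing are used under every segmentation, differences in this score can be attributed to the threshold definitions rather than to the classifiers themselves.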

In conclusion, the hybrid method for generating the DMG proved to be the most effective for the dataset used in this research, as demonstrated by its ability to support a classification algorithm that achieved higher accuracy compared to the other two segmentation models. The objective of identifying the predictors most strongly correlated with driver behavior was also met by selecting the best-performing prediction model, training it on the entire population, and determining the interval limits for each predictor across the zones of the matrix. With this knowledge of both the key predictors and their corresponding thresholds within each DMG zone, a fleet manager can identify actionable levers to influence driver style in the desired direction within the matrix.

Figure 30 presents the features ranked by their SHAP values, indicating their contribution to the model’s predictions. These features include a wide range of expense categories (e.g., total_expenses, engine_transmission_expenses, electrical_expenses) as well as behavioral indicators such as unique_locations_overbraking, total_acceleration_events, and average_speed_events. SHAP values provide both global and local interpretability; globally, they highlight which features most strongly influence the model across all predictions. In this case, total_expenses and maintenance_cost_per_km emerge as dominant predictors, reflecting their central role in the cost modeling task.

Some of the practical outcomes of the SHAP summary plot are: vehicles accumulating many kilometers in short intervals (km driven in the last 4 months) tend to deteriorate faster and should be monitored for early intervention. Also, high values of unsafe or fluctuating driving at 60–110 km/h are correlated with risk, meaning that a fleet manager should target reductions in rapid speed oscillations. High fuel consumption per 100 km is also an important signal of operational inefficiency, and low-speed driving time correlates with increased risk, meaning that vehicles used in urban areas may require adjusted maintenance programs.

4. Discussion

The focus of the study was to build a predictive and prescriptive framework for fleet cost optimization by correlating driver style to maintenance and fuel expenses through machine learning and decision-support methodologies.

The decision-making grid (DMG) was used as a classification tool and an instrument for generating a strategic roadmap for behavioral interventions. By identifying the key variables and their value ranges that determine where a vehicle should be placed in a specific zone, the DMG also offers methods for moving vehicles from less efficient zones to more optimal ones, or for maintaining high-performing vehicles in their current efficient matrix area.

To build the DMG, three distinct methods for threshold generation were tested, and it was found that the hybrid thresholding method outperformed traditional segmentation techniques in terms of both accuracy and coverage. The research also proposes a methodology for determining which thresholding strategy performs best, using supervised machine learning classification algorithms.

Both supervised and unsupervised machine learning techniques were used to identify the predictors with the highest impact on the target variable, making it easier to identify distinct driver clusters, while the statistical approach was the key to assessing if the differences between clusters were statistically significant and allowed for a detailed characterization of each cluster’s behavioral pattern. The SHAP library was used for model interpretability and to determine the most influential predictors.

The purpose of the modeling framework was to provide a consistent comparative foundation for evaluating alternative DMG threshold configurations, rather than to optimize predictive performance. Training conditions were identical across models, including the same dataset, preprocessing pipeline, and augmentation strategy. As a result, performance differences reflect the impact of threshold definitions rather than model-specific effects. The classification framework therefore serves as a controlled experimental setting for threshold evaluation and should be read as an exploratory approach that can be further validated on larger datasets.

The proposed system is a powerful tool for fleet managers, offering the possibility to simulate the effects of driver profile changes on vehicle performance. It enables proactive interventions aimed at reducing costs and emissions while improving operational efficiency. Fleet managers can adjust key behavioral and maintenance-related predictors (such as braking intensity, harsh acceleration frequency, maintenance cost per kilometer, or short-term mileage) and immediately observe whether a vehicle migrates to another zone within the DMG matrix. The classification models are executed in real time, providing instant feedback regarding the expected shift toward higher or lower risk clusters. This functionality transforms the DMG from a static segmentation framework into an interactive decision-support system. Fleet managers can simulate, in real time, the cost savings resulting from adjusting parameters that move a vehicle from a high-risk zone to a lower-risk zone. However, the system remains a tool: changing driver dynamics still requires human intervention. One-to-one discussions or specific human resource strategies, such as targeted training programs for drivers exhibiting more aggressive driving styles, may be necessary.

The current research has its limitations. The first limitation is related to the number of vehicles included in the analysis, a fleet of just 91 cars. This significantly constrains the results, especially since some zones of the matrix contained very few individuals. For these zones, data augmentation techniques were applied, which may decrease the accuracy of the outcomes. Another limitation is that the results are based on data from a single company’s fleet and thus may require validation on larger or more diverse datasets. From a business perspective, an additional limitation is that using this type of tool requires IoT devices to be installed at the CAN level of each vehicle, which involves both time and cost investments.

Despite all these limitations, the research proposes a foundation for integrating data-driven decision-making into fleet management, combining interpretability, accuracy, and business applicability. Future researchers could aim to validate this methodology on larger and more heterogeneous fleets, test real-time integration with telematics systems, and further explore behavioral incentive mechanisms based on predictive risk profiling. Additionally, expanding the DMG approach to incorporate environmental impact metrics or other variables could align fleet management with broader sustainability goals.

In line with these directions of development, future work will address three key areas highlighted by the current findings: real-time integration of the classification and simulation framework into operational telematics platforms; validation of the proposed models on significantly larger datasets to enhance statistical reliability; and systematic inclusion of environmental performance indicators, so that the DMG framework supports both operational efficiency and sustainability objectives.

5. Conclusions

The research distinguishes itself from prior studies because it adapts the DMG matrix to the automotive domain, whereas it is traditionally used in industrial and manufacturing contexts, with the main objective of optimizing and increasing the efficiency of fuel usage and maintenance costs based on driver behavior. The study follows a trial-and-error research design, in which multiple methodological directions and segmentation strategies were explored to identify the most suitable approach given the constraints of the available dataset, particularly the limited sample size and heterogeneous class distribution. The chosen flow of testing different methods and methodologies shows the importance of flexibility when dealing with real-world telematics and maintenance data, where analytics must be adapted to structural limitations rather than idealized conditions.

The research shows the importance of data-driven methodologies for more efficient fleet management, combining predictive modeling, interpretability, and actionable operational insights. After testing several segmentation strategies, the study finds that the hybrid DMG approach substantially improves the accuracy of vehicle risk classification. While the initial models achieved limited predictive performance because of the small data sample and class imbalance, the hybrid methodology produced a marked increase in accuracy metrics, with accuracies of around 0.85 and F1-scores near 0.80, a clear improvement over the quantile-based baseline.

An essential component of the analysis was the explicit focus on predictor interpretability, which was prioritized over dimensionality-reduction techniques such as PCA. Although PCA would have improved model accuracy, it would have done so at the expense of transparency and behavioral insight—precisely the elements most relevant for understanding and influencing driver behavior. Maintaining full interpretability allows the model to identify which predictors contribute to optimal performance and, more importantly, to articulate the behavioral patterns that characterize the most efficient drivers in the fleet.

Finding the ideal zone of drivers is based on interpretability. By detecting the ideal intervals for the predictors that drive a car’s placement in that ideal zone, fleet managers can perform targeted interventions that help other drivers reach these optimal patterns and this ideal zone. This contributes to fuel consumption improvements, which leads to reductions in CO₂ emissions and improves the overall environmental impact.

Overall, the research provides a robust foundation for future developments in intelligent fleet management. It shows that integrating interpretable machine learning with behavioral analytics can meaningfully enhance decision-making, even under important data constraints. The methods, methodology, and insights built here can be extended to larger fleets, integrated with telematics systems for continuous monitoring, or combined with environmental metrics to better align operational efficiency with sustainability goals.

The current work is also limited in the contextual variables available: because the dataset was anonymized, potentially relevant factors could not be used. The available contextual variable was the vehicle brand, but approximately 80% of the vehicles had the same manufacturer, resulting in a highly imbalanced distribution. Future research could extend the analysis by incorporating additional explanatory variables such as driver age, driving experience, vehicle technical characteristics, loading degree, and maximum payload capacity.

Author Contributions

Conceptualization, T.S.S. and C.T.; methodology, A.L., C.T. and T.S.S.; software, C.T.; validation, T.S.S., A.L. and M.-D.R.; formal analysis, C.T.; investigation, T.S.S. and A.L.; resources, C.T.; data curation, C.T. and T.S.S.; writing—original draft preparation, C.T.; writing—review and editing, T.S.S. and A.L.; visualization, C.T.; supervision, A.L. and M.-D.R.; project administration, M.-D.R. and A.L.; funding acquisition, M.-D.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by European Union—Next generation EU and Romanian Government, under National Recovery and Resilience Plan for Romania, grant number 760050/23.05.2023, code PNRR-C9-I8-CF 267/29.11.2022, through the Romanian Ministry of Research, Innovation and Digitalization, within Component 9, Investment I8. This paper was financed by the Bucharest University of Economic Studies during the PhD program.

Data Availability Statement

Data is unavailable due to privacy and ethical restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. Heatmap for the maintenance costs by category and cluster, showing that cluster 1 has higher costs across multiple repair types.

Appendix B

Details of the included expenses:

1—braking expenses;

2—suspension expenses;

3—exhaust and catalysis expenses;

4—engine and transmission expenses;

5—electrical expenses;

6—air conditioning expenses;

7—wheels and tires expenses;

8—body and accessories expenses;

9—AdBlue expenses;

10—miscellaneous expenses;

11—fluid expenses;

12—accident-related expenses;

13—towing expenses.

Appendix C

Figure A2. Boxplot showing no significant cluster differences in braking-system expenses, despite behavioral distinctions.

Appendix D

Variable	Source	Sum of Squares	df	F	p-Value
Total_Break1	Cluster_Braking	1.16 × 10⁶	5	68.25	5.84 × 10⁻²⁹
	Residual	3.03 × 10⁵	89	–	–
Total_Break2	Cluster_Braking	728,180.13	5	115.2	2.53 × 10⁻³⁷
	Residual	112,484.23	89	–	–
Total_Break3	Cluster_Braking	59,284.36	5	195.4	2.13 × 10⁻⁴⁶
	Residual	5,401.79	89	–	–
cheltuieli_franare	Cluster_Braking	4.64 × 10⁸	5	0.51	0.766
	Residual	1.61 × 10¹⁰	89	–	–
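As a reading aid for the table above, the F statistic of a one-way ANOVA is the ratio of the between-group mean square to the residual mean square. The sketch below recomputes it from the printed sums of squares and degrees of freedom; the function name `f_statistic` is an illustrative choice, and small deviations from the tabulated F values are expected because the sums of squares are printed in rounded form.

```python
def f_statistic(ss_between, df_between, ss_residual, df_residual):
    """One-way ANOVA F: between-group mean square over residual mean square."""
    return (ss_between / df_between) / (ss_residual / df_residual)


# (sum of squares for Cluster_Braking, df, residual sum of squares, df)
table_rows = {
    "Total_Break1": (1.16e6, 5, 3.03e5, 89),
    "Total_Break2": (728180.13, 5, 112484.23, 89),
    "Total_Break3": (59284.36, 5, 5401.79, 89),
    "cheltuieli_franare": (4.64e8, 5, 1.61e10, 89),
}
recomputed = {name: f_statistic(*row) for name, row in table_rows.items()}
```

The recomputed values match the tabulated ones to within rounding. The three behavioral variables have very large F values, while braking expenses (cheltuieli_franare) have an F value below 1, consistent with the reported p-value of 0.766.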

Appendix E

Tukey HSD Results for Total_Break1:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=========================================================

group1 group2 meandiff p-adj lower upper reject

---------------------------------------------------------

0 1 415.8393 0.0 293.6209 538.0577 True

0 2 178.9821 0.0 139.6726 218.2917 True

0 3 22.8393 0.9988 −148.5072 194.1857 False

0 4 225.2679 0.0 157.1817 293.354 True

0 5 469.8393 0.0 298.4928 641.1857 True

1 2 −236.8571 0.0 −361.1649 −112.5494 True

1 3 −393.0 0.0 −601.0067 −184.9933 True

1 4 −190.5714 0.0014 −326.7438 −54.3991 True

1 5 54.0 0.974 −154.0067 262.0067 False

2 3 −156.1429 0.1004 −328.9858 16.7001 False

2 4 46.2857 0.4221 −25.4834 118.0548 False

2 5 290.8571 0.0001 118.0142 463.7001 True

3 4 202.4286 0.0198 20.8654 383.9917 True

3 5 447.0 0.0 206.8145 687.1855 True

4 5 244.5714 0.0023 63.0083 426.1346 True

---------------------------------------------------------

Tukey HSD Results for Total_Break2:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=========================================================

group1 group2 meandiff p-adj lower upper reject

---------------------------------------------------------

0 1 524.1607 0.0 449.6433 598.6781 True

0 2 45.4107 0.0 21.4434 69.378 True

0 3 −3.8393 1.0 −108.3104 100.6318 False

0 4 151.4464 0.0 109.9338 192.959 True

0 5 323.1607 0.0 218.6896 427.6318 True

1 2 −478.75 0.0 −554.5413 −402.9587 True

1 3 −528.0 0.0 −654.8231 −401.1769 True

1 4 −372.7143 0.0 −455.7395 −289.6891 True

1 5 −201.0 0.0002 −327.8231 −74.1769 True

2 3 −49.25 0.7498 −154.6335 56.1335 False

2 4 106.0357 0.0 62.2776 149.7938 True

2 5 277.75 0.0 172.3665 383.1335 True

3 4 155.2857 0.0013 44.5855 265.986 True

3 5 327.0 0.0 180.5573 473.4427 True

4 5 171.7143 0.0003 61.014 282.4145 True

---------------------------------------------------------

Tukey HSD Results for Total_Break3:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=========================================================

group1 group2 meandiff p-adj lower upper reject

---------------------------------------------------------

0 1 45.0357 0.0 28.7059 61.3655 True

0 2 2.5357 0.7232 −2.7165 7.7879 False

0 3 174.0357 0.0 151.1419 196.9296 True

0 4 39.8929 0.0 30.7958 48.99 True

0 5 140.0357 0.0 117.1419 162.9296 True

1 2 −42.5 0.0 −59.1089 −25.8911 True

1 3 129.0 0.0 101.2079 156.7921 True

1 4 −5.1429 0.9625 −23.337 13.0513 False

1 5 95.0 0.0 67.2079 122.7921 True

2 3 171.5 0.0 148.4062 194.5938 True

2 4 37.3571 0.0 27.768 46.9463 True

2 5 137.5 0.0 114.4062 160.5938 True

3 4 −134.1429 0.0 −158.4018 −109.8839 True

3 5 −34.0 0.0313 −66.0915 −1.9085 True

4 5 100.1429 0.0 75.8839 124.4018 True

---------------------------------------------------------

Tukey HSD Results for cheltuieli_franare:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=============================================================

group1 group2 meandiff p-adj lower upper reject

-------------------------------------------------------------

0 1 −7644.0803 0.9686 −35,844.8716 20,556.711 False

0 2 −730.6443 0.9999 −9800.9708 8339.6823 False

0 3 −2233.9852 1.0 −41,770.6221 37,302.6516 False

0 4 −7056.7682 0.7797 −22,767.0347 8653.4982 False

0 5 −8699.9653 0.9875 −48,236.6022 30,836.6715 False

1 2 6913.4361 0.9812 −21,769.455 35,596.3271 False

1 3 5410.0951 0.9995 −42,585.5617 53,405.7519 False

1 4 587.3121 1.0 −30,833.2208 32,007.845 False

1 5 −1055.885 1.0 −49,051.5418 46,939.7718 False

2 3 −1503.341 1.0 −41,385.2825 38,378.6006 False

2 4 −6326.1239 0.8749 −22,886.1988 10,233.9509 False

2 5 −7969.3211 0.992 −47,851.2626 31,912.6205 False

3 4 −4822.783 0.9994 −46,716.8268 37,071.2609 False

3 5 −6465.9801 0.9994 −61,886.5908 48,954.6306 False

4 5 −1643.1971 1.0 −43,537.241 40,250.8467 False
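The columns of the Tukey HSD tables above are linked by two simple relations: the mean difference sits at the midpoint of the confidence interval, and the reject flag is True exactly when that interval excludes zero. The sketch below checks both relations on four rows copied from the Total_Break1 table; the helper name `ci_excludes_zero` and the tolerance are illustrative choices.

```python
def ci_excludes_zero(lower, upper):
    """True when a confidence interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


# (group1, group2, meandiff, lower, upper, reject) from the Total_Break1 table
rows = [
    (0, 1, 415.8393, 293.6209, 538.0577, True),
    (0, 3, 22.8393, -148.5072, 194.1857, False),
    (1, 4, -190.5714, -326.7438, -54.3991, True),
    (2, 3, -156.1429, -328.9858, 16.7001, False),
]
checks = [
    ci_excludes_zero(lo, hi) == reject and abs((lo + hi) / 2 - diff) < 1e-3
    for _, _, diff, lo, hi, reject in rows
]
```

Read this way, the cheltuieli_franare table shows that every interval straddles zero, so no pair of braking clusters differs significantly in braking expenses.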

Appendix F

ANOVA Results for cheltuieli_franare:

sum_sq df F PR (>F)

C(Cluster) 2.841738 × 10⁸ 3.0 0.52916 0.66339

Residual 1.628987 × 10¹⁰ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_franare:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=============================================================

group1 group2 meandiff p-adj lower upper reject

-------------------------------------------------------------

0 1 −3158.7533 0.8103 −12,453.1318 6135.6253 False

0 2 −7364.0565 0.8723 −32,735.5188 18,007.4058 False

0 3 475.6588 0.9988 −7903.1866 8854.5042 False

1 2 −4205.3032 0.974 −30,066.2922 21,655.6858 False

1 3 3634.4121 0.7643 −6126.9764 13,395.8005 False

2 3 7839.7153 0.8528 −17,706.5238 33,385.9543 False

-------------------------------------------------------------

ANOVA Results for cheltuieli_suspensie:

sum_sq df F PR (>F)

C(Cluster) 3.463899 × 10⁸ 3.0 0.806233 0.493608

Residual 1.303241 × 10¹⁰ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_suspensie:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=============================================================

group1 group2 meandiff p-adj lower upper reject

-------------------------------------------------------------

0 1 −3984.0542 0.5942 −12,297.3633 4329.2549 False

0 2 −8066.3515 0.7887 −30,759.7255 14,627.0225 False

0 3 −2922.6289 0.7378 −10,417.0441 4571.7863 False

1 2 −4082.2973 0.9671 −27,213.5259 19,048.9313 False

1 3 1061.4253 0.9888 −7669.5984 9792.449 False

2 3 5143.7226 0.9351 −17,705.9796 27,993.4248 False

-------------------------------------------------------------

ANOVA Results for cheltuieli_motor_transmisie:

sum_sq df F PR (>F)

C(Cluster) 4.989381 × 10⁸ 3.0 0.867066 0.461234

Residual 1.745479 × 10¹⁰ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_motor_transmisie:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=============================================================

group1 group2 meandiff p-adj lower upper reject

-------------------------------------------------------------

0 1 −5132.9774 0.5049 −14,753.949 4487.9942 False

0 2 −9396.966 0.7854 −35,659.9505 16,866.0184 False

0 3 −1245.7583 0.9818 −9919.0261 7427.5095 False

1 2 −4263.9887 0.9755 −31,033.7012 22,505.7239 False

1 3 3887.2191 0.7458 −6217.1726 13,991.6108 False

2 3 8151.2078 0.8511 −18,292.695 34,595.1105 False

-------------------------------------------------------------

ANOVA Results for cheltuieli_evacuare_catalizare:

sum_sq df F PR (>F)

C(Cluster) 1.343718 × 10⁸ 3.0 0.444019 0.722111

Residual 9.179672 × 10⁹ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_evacuare_catalizare:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=============================================================

group1 group2 meandiff p-adj lower upper reject

-------------------------------------------------------------

0 1 −1492.999 0.9436 −8470.1031 5484.1051 False

0 2 −5173.2767 0.8925 −24,219.1261 13,872.5726 False

0 3 1033.5704 0.9732 −5256.2613 7323.4021 False

1 2 −3680.2777 0.9598 −23,093.605 15,733.0496 False

1 3 2526.5694 0.8036 −4801.1097 9854.2485 False

2 3 6206.8471 0.8318 −12,970.2037 25,383.8979 False

-------------------------------------------------------------

ANOVA Results for cheltuieli_electrice:

sum_sq df F PR (>F)

C(Cluster) 2.407304 × 10⁸ 3.0 0.379846 0.767753

Residual 1.922398 × 10¹⁰ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_electrice:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=============================================================

group1 group2 meandiff p-adj lower upper reject

-------------------------------------------------------------

0 1 −1973.8559 0.9561 −12,070.6434 8122.9316 False

0 2 −9511.849 0.8032 −37,073.6986 18,050.0006 False

0 3 318.8126 0.9997 −8783.4014 9421.0265 False

1 2 −7537.9931 0.896 −35,631.6316 20,555.6454 False

1 3 2292.6684 0.9419 −8311.4472 12,896.7841 False

2 3 9830.6615 0.7904 −17,921.0538 37,582.3769 False

-------------------------------------------------------------

ANOVA Results for cheltuieli_climatizare:

sum_sq df F PR (>F)

C(Cluster) 7.324592 × 10⁸ 3.0 1.361474 0.259614

Residual 1.631902 × 10¹⁰ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_climatizare:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=============================================================

group1 group2 meandiff p-adj lower upper reject

-------------------------------------------------------------

0 1 −6224.3171 0.3037 −15,527.0106 3078.3764 False

0 2 −7836.203 0.8507 −33,230.3633 17,557.9573 False

0 3 24.6508 1.0 −8361.6906 8410.9922 False

1 2 −1611.8859 0.9984 −27,496.0109 24,272.239 False

1 3 6248.9679 0.3434 −3521.1533 16,019.0891 False

2 3 7860.8538 0.8521 −17,708.2396 33,429.9473 False

-------------------------------------------------------------

ANOVA Results for cheltuieli_roti_anvelope:

sum_sq df F PR (>F)

C(Cluster) 1.893244 × 10⁷ 3.0 0.190798 0.902421

Residual 3.009906 × 10⁹ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_roti_anvelope:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=============================================================

group1 group2 meandiff p-adj lower upper reject

-------------------------------------------------------------

0 1 −732.3808 0.9634 −4727.5766 3262.8149 False

0 2 −2667.935 0.9187 −13,573.8776 8238.0076 False

0 3 −271.6284 0.9973 −3873.2815 3330.0248 False

1 2 −1935.5542 0.9684 −13,051.9202 9180.8118 False

1 3 460.7525 0.9917 −3735.1878 4656.6928 False

2 3 2396.3067 0.9404 −8584.7638 13,377.3772 False

-------------------------------------------------------------

ANOVA Results for cheltuieli_caroserie_accesorii:

sum_sq df F PR (>F)

C(Cluster) 7.002934 × 10⁷ 3.0 0.29172 0.831278

Residual 7.281723 × 10⁹ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_caroserie_accesorii:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=============================================================

group1 group2 meandiff p-adj lower upper reject

-------------------------------------------------------------

0 1 −460.6617 0.9974 −6674.7683 5753.4449 False

0 2 −5884.999 0.8006 −22,848.0451 11,078.0471 False

0 3 72.4271 1.0 −5529.5654 5674.4196 False

1 2 −5424.3373 0.8444 −22,714.6748 11,866.0003 False

1 3 533.0888 0.9965 −5993.2548 7059.4324 False

2 3 5957.4261 0.798 −11,122.4736 23,037.3257 False

-------------------------------------------------------------

ANOVA Results for cheltuieli_combustibil_adblue:

sum_sq df F PR (>F)

C(Cluster) 5.005394 × 10⁶ 3.0 0.309556 0.818422

Residual 4.904782 × 10⁸ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_combustibil_adblue:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

==========================================================

group1 group2 meandiff p-adj lower upper reject

----------------------------------------------------------

0 1 −285.3995 0.9669 −1898.1658 1327.3669 False

0 2 −823.0772 0.9613 −5225.5492 3579.3947 False

0 3 −492.5824 0.8118 −1946.4849 961.3201 False

1 2 −537.6777 0.9892 −5025.0926 3949.7372 False

1 3 −207.1829 0.9886 −1900.9851 1486.6193 False

2 3 330.4948 0.9973 −4102.3045 4763.2941 False

----------------------------------------------------------

ANOVA Results for cheltuieli_diverse:

sum_sq df F PR (>F)

C(Cluster) 1.147887 × 10⁸ 3.0 0.651101 0.584282

Residual 5.347747 × 10⁹ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_diverse:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=============================================================

group1 group2 meandiff p-adj lower upper reject

-------------------------------------------------------------

0 1 −1752.3966 0.8247 −7077.7327 3572.9396 False

0 2 −2928.8825 0.9523 −17,465.7947 11,608.0298 False

0 3 1013.5904 0.9456 −3787.179 5814.3599 False

1 2 −1176.4859 0.9968 −15,993.8789 13,640.9071 False

1 3 2765.987 0.5689 −2826.9286 8358.9026 False

2 3 3942.4729 0.8949 −10,694.58 18,579.5258 False

-------------------------------------------------------------

ANOVA Results for cheltuieli_fluide:

sum_sq df F PR (>F)

C(Cluster) 5.117825 × 10⁸ 3.0 0.740223 0.530771

Residual 2.097215 × 10¹⁰ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_fluide:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

=============================================================

group1 group2 meandiff p-adj lower upper reject

-------------------------------------------------------------

0 1 −4412.2682 0.6935 −14,958.1553 6133.619 False

0 2 −9414.9568 0.8274 −38,202.7425 19,372.8289 False

0 3 546.7461 0.9988 −8960.3294 10,053.8215 False

1 2 −5002.6886 0.9702 −34,345.9169 24,340.5397 False

1 3 4959.0142 0.6461 −6116.7667 16,034.7952 False

2 3 9961.7029 0.8051 −19,024.3937 38,947.7995 False

-------------------------------------------------------------

ANOVA Results for cheltuieli_accident:

sum_sq df F PR (>F)

C(Cluster) 3.214899 × 10⁶ 3.0 0.398348 0.754503

Residual 2.448077 × 10⁸ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_accident:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

==========================================================

group1 group2 meandiff p-adj lower upper reject

----------------------------------------------------------

0 1 −331.9215 0.8711 −1471.316 807.473 False

0 2 −394.667 0.9873 −3504.9454 2715.6114 False

0 3 −394.667 0.7465 −1421.8266 632.4926 False

1 2 −62.7455 0.9999 −3233.0347 3107.5438 False

1 3 −62.7455 0.9991 −1259.3905 1133.8996 False

2 3 0.0 1.0 −3131.7042 3131.7042 False

----------------------------------------------------------

ANOVA Results for cheltuieli_tractare:

sum_sq df F PR (>F)

C(Cluster) 5.951550 × 10⁶ 3.0 0.664992 0.575693

Residual 2.714774 × 10⁸ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_tractare:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

==========================================================

group1 group2 meandiff p-adj lower upper reject

----------------------------------------------------------

0 1 −609.1466 0.5472 −1809.0008 590.7076 False

0 2 −697.6475 0.9443 −3972.9666 2577.6716 False

0 3 −337.4327 0.8466 −1419.0965 744.2312 False

1 2 −88.5009 0.9999 −3427.0153 3250.0135 False

1 3 271.7139 0.9424 −988.4287 1531.8566 False

2 3 360.2149 0.9918 −2937.667 3658.0967 False

Appendix G

ANOVA Results for cheltuieli_franare:

sum_sq df F PR (>F)

C(Cluster) 7.155399 × 10⁹ 3.0 23.044421 3.476264 × 10⁻¹¹

Residual 9.418640 × 10⁹ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_franare:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

===============================================================

group1 group2 meandiff p-adj lower upper reject

---------------------------------------------------------------

0 1 26,287.601 0.0 16,507.5664 36,067.6355 True

0 2 17,924.1016 0.0002 7054.2009 28,794.0023 True

0 3 −233.1758 0.9996 −6236.4646 5770.1131 False

1 2 −8363.4994 0.3664 −21,781.5961 5054.5973 False

1 3 −26,520.7767 0.0 −36,416.7383 −16,624.8152 True

2 3 −18,157.2774 0.0002 −29,131.5983 −7182.9564 True

---------------------------------------------------------------

ANOVA Results for cheltuieli_suspensie:

sum_sq df F PR (>F)

C(Cluster) 7.107431 × 10⁹ 3.0 34.377208 6.021829 × 10⁻¹⁵

Residual 6.271366 × 10⁹ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_suspensie:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

===============================================================

group1 group2 meandiff p-adj lower upper reject

---------------------------------------------------------------

0 1 29,424.1832 0.0 21,443.7334 37,404.6331 True

0 2 10,267.1506 0.0165 1397.3764 19,136.9247 True

0 3 952.2923 0.9568 −3946.3556 5850.9402 False

1 2 −19,157.0327 0.0001 −30,106.1196 −8207.9458 True

1 3 −28,471.8909 0.0 −36,546.9366 −20,396.8453 True

2 3 −9314.8583 0.0382 −18,269.8387 −359.8778 True

---------------------------------------------------------------

ANOVA Results for cheltuieli_motor_transmisie:

sum_sq df F PR (>F)

C(Cluster) 4.772522 × 10⁹ 3.0 10.982796 0.000003

Residual 1.318121 × 10¹⁰ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_motor_transmisie:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

===============================================================

group1 group2 meandiff p-adj lower upper reject

---------------------------------------------------------------

0 1 23,815.5287 0.0 12,245.7814 35,385.276 True

0 2 4838.9764 0.7584 −8020.0788 17,698.0316 False

0 3 −727.0795 0.9932 −7828.9496 6374.7906 False

1 2 −18,976.5523 0.0124 −34,850.1147 −3102.9899 True

1 3 −24,542.6082 0.0 −36,249.4968 −12,835.7196 True

2 3 −5566.0559 0.6769 −18,548.64 7416.5281 False

---------------------------------------------------------------

ANOVA Results for cheltuieli_evacuare_catalizare:

sum_sq df F PR (>F)

C(Cluster) 3.835001 × 10⁹ 3.0 21.231507 1.637299 × 10⁻¹⁰

Residual 5.479044 × 10⁹ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_evacuare_catalizare:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

===============================================================

group1 group2 meandiff p-adj lower upper reject

---------------------------------------------------------------

0 1 13,000.9049 0.0001 5541.5947 20,460.2151 True

0 2 20,537.5277 0.0 12,246.9679 28,828.0875 True

0 3 −364.6007 0.9968 −4943.3569 4214.1555 False

1 2 7536.6228 0.2239 −2697.4664 17,770.712 False

1 3 −13,365.5056 0.0001 −20,913.2343 −5817.7769 True

2 3 −20,902.1284 0.0 −29,272.3304 −12,531.9265 True

---------------------------------------------------------------

ANOVA Results for cheltuieli_electrice:

sum_sq df F PR (>F)

C(Cluster) 8.008809 × 10⁹ 3.0 21.20601 1.673999 × 10⁻¹⁰

Residual 1.145590 × 10¹⁰ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_electrice:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

===============================================================

group1 group2 meandiff p-adj lower upper reject

---------------------------------------------------------------

0 1 25,987.478 0.0 15,201.4674 36,773.4886 True

0 2 21,601.0922 0.0001 9613.1116 33,589.0728 True

0 3 −713.2676 0.9921 −7334.0558 5907.5206 False

1 2 −4386.3858 0.8652 −19,184.6703 10,411.8987 False

1 3 −26,700.7456 0.0 −37,614.6075 −15,786.8837 True

2 3 −22,314.3598 0.0 −34,417.5013 −10,211.2183 True

---------------------------------------------------------------

ANOVA Results for cheltuieli_climatizare:

sum_sq df F PR (>F)

C(Cluster) 5.827819 × 10⁹ 3.0 15.750395 2.492119 × 10⁻⁸

Residual 1.122367 × 10¹⁰ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_climatizare:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

==============================================================

group1 group2 meandiff p-adj lower upper reject

--------------------------------------------------------------

0 1 24,397.7049 0.0 13,721.58 35,073.8298 True

0 2 14,838.0674 0.0081 2972.218 26,703.9169 True

0 3 −120.3248 1.0 −6673.6617 6433.0122 False

1 2 −9559.6375 0.3255 −24,207.16 5087.8851 False

1 3 −24,518.0297 0.0 −35,320.7034 −13,715.356 True

2 3 −14,958.3922 0.0082 −26,938.2294 −2978.555 True

--------------------------------------------------------------

ANOVA Results for cheltuieli_roti_anvelope:

sum_sq df F PR (>F)

C(Cluster) 1.436568 × 10⁹ 3.0 27.367145 1.051549 × 10⁻¹²

Residual 1.592270 × 10⁹ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_roti_anvelope:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

==============================================================

group1 group2 meandiff p-adj lower upper reject

--------------------------------------------------------------

0 1 13,262.6445 0.0 9241.4536 17,283.8354 True

0 2 5932.7756 0.0043 1463.4718 10,402.0795 True

0 3 1236.1569 0.5586 −1232.175 3704.4887 False

1 2 −7329.8689 0.0043 −12,846.8973 −1812.8405 True

1 3 −12,026.4876 0.0 −16,095.3435 −7957.6318 True

2 3 −4696.6187 0.0381 −9208.8564 −184.3811 True

--------------------------------------------------------------

ANOVA Results for cheltuieli_caroserie_accesorii:

sum_sq df F PR (>F)

C(Cluster) 3.087725 × 10⁹ 3.0 21.965378 8.689141 × 10⁻¹¹

Residual 4.264028 × 10⁹ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_caroserie_accesorii:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

===============================================================

group1 group2 meandiff p-adj lower upper reject

---------------------------------------------------------------

0 1 18,921.8382 0.0 12,341.3774 25,502.299 True

0 2 9555.8923 0.0051 2242.119 16,869.6656 True

0 3 1142.6196 0.8805 −2896.6715 5181.9108 False

1 2 −9365.946 0.039 −18,394.2637 −337.6282 True

1 3 −17,779.2186 0.0 −24,437.6805 −11,120.7567 True

2 3 −8413.2726 0.0189 −15,797.3047 −1029.2405 True

---------------------------------------------------------------

ANOVA Results for cheltuieli_combustibil_adblue:

sum_sq df F PR (>F)

C(Cluster) 1.034293 × 10⁸ 3.0 8.002354 0.000086

Residual 3.920542 × 10⁸ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_combustibil_adblue:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

============================================================

group1 group2 meandiff p-adj lower upper reject

------------------------------------------------------------

0 1 3485.9325 0.0001 1490.5813 5481.2838 True

0 2 −308.624 0.9834 −2526.333 1909.0849 False

0 3 −95.5397 0.997 −1320.3483 1129.2688 False

1 2 −3794.5566 0.0026 −6532.1559 −1056.9573 True

1 3 −3581.4723 0.0001 −5600.4753 −1562.4692 True

2 3 213.0843 0.9945 −2025.9287 2452.0974 False

------------------------------------------------------------

ANOVA Results for cheltuieli_diverse:

sum_sq df F PR (>F)

C(Cluster) 1.098222 × 10⁹ 3.0 7.632985 0.000131

Residual 4.364313 × 10⁹ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_diverse:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

==============================================================

group1 group2 meandiff p-adj lower upper reject

--------------------------------------------------------------

0 1 9402.8761 0.0021 2745.4823 16,060.2699 True

0 2 7709.7318 0.0378 310.4523 15,109.0113 True

0 3 −714.0763 0.968 −4800.5913 3372.4388 False

1 2 −1693.1443 0.9622 −10,827.0133 7440.7247 False

1 3 −10,116.9523 0.0009 −16,853.2592 −3380.6455 True

2 3 −8423.8081 0.0206 −15,894.1678 −953.4483 True

--------------------------------------------------------------

ANOVA Results for cheltuieli_fluide:

sum_sq df F PR (>F)

C(Cluster) 8.530027 × 10⁹ 3.0 19.974215 4.947865 × 10⁻¹⁰

Residual 1.295391 × 10¹⁰ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_fluide:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

===============================================================

group1 group2 meandiff p-adj lower upper reject

---------------------------------------------------------------

0 1 28,259.0217 0.0 16,789.4632 39,728.5802 True

0 2 20,124.9599 0.0005 7377.2583 32,872.6614 True

0 3 −476.629 0.998 −7517.0001 6563.7421 False

1 2 −8134.0618 0.5321 −23,870.1662 7602.0426 False

1 3 −28,735.6507 0.0 −40,341.1628 −17,130.1385 True

2 3 −20,601.5889 0.0004 −33,471.7495 −7731.4282 True

---------------------------------------------------------------

ANOVA Results for cheltuieli_accident:

sum_sq df F PR (>F)

C(Cluster) 3.103811 × 10⁶ 3.0 0.384409 0.76448

Residual 2.449188 × 10⁸ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_accident:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

==========================================================

group1 group2 meandiff p-adj lower upper reject

----------------------------------------------------------

0 1 −375.8733 0.9242 −1952.9663 1201.2197 False

0 2 −178.6733 0.9933 −1931.5142 1574.1676 False

0 3 −375.8733 0.7404 −1343.942 592.1953 False

1 2 197.2 0.9952 −1966.5538 2360.9538 False

1 3 0.0 1.0 −1595.787 1595.787 False

2 3 −197.2 0.9913 −1966.8793 1572.4793 False

----------------------------------------------------------

ANOVA Results for cheltuieli_tractare:

sum_sq df F PR (>F)

C(Cluster) 6.526823 × 10⁷ 3.0 9.331617 0.000019

Residual 2.121608 × 10⁸ 91.0 NaN NaN

Tukey HSD Results for cheltuieli_tractare:

Multiple Comparison of Means—Tukey HSD, FWER = 0.05

===========================================================

group1 group2 meandiff p-adj lower upper reject

-----------------------------------------------------------

0 1 2796.5333 0.0 1328.693 4264.3736 True

0 2 770.616 0.6056 −860.7974 2402.0293 False

0 3 −22.962 0.9999 −923.968 878.044 False

1 2 −2025.9173 0.0481 −4039.7776 −12.057 True

1 3 −2819.4953 0.0 −4304.7346 −1334.256 True

2 3 −793.578 0.59 −2440.6632 853.5073 False

Appendix H

Figure A3. 1—braking expenses; 2—suspension expenses; 3—exhaust and catalysis expenses; 4—engine and transmission expenses; 5—electrical expenses; 6—air conditioning expenses; 7—wheels and tires expenses; 8—body and accessories expenses; 9—AdBlue expenses; 10—miscellaneous expenses; 11—fluids expenses; 12—accident-related expenses; 13—towing expenses.

Appendix I

'cheltuieli_franare': <class 'statsmodels.iolib.summary.Summary'>

"""

OLS Regression Results

==============================================================

Dep. Variable: cheltuieli_franare R-squared: 0.395

Model: OLS Adj. R-squared: 0.323

Method: Least Squares F-statistic: 5.486

Date: Wed, 12 Feb 2025 Prob (F-statistic): 3.12 × 10⁻⁶

Time: 14:40:49 Log-Likelihood: −1012.3

No. Observations: 95 AIC: 2047.

Df Residuals: 84 BIC: 2075.

Df Model: 10 Covariance Type: nonrobust

==============================================================

coef std err t P > |t| [0.025 0.975]

----------------------------------------------------------------------------

const 7688.6229 1120.889 6.859 0.000 5459.611 9917.635

PC1 −609.9627 165.231 −3.692 0.000 −938.543 −281.382

PC2 942.6661 276.673 3.407 0.001 392.472 1492.861

PC3 279.7312 308.564 0.907 0.367 −333.882 893.345

PC4 358.7384 314.072 1.142 0.257 −265.829 983.306

PC5 −437.0935 403.623 −1.083 0.282 −1239.742 365.555

PC6 −890.0702 479.432 −1.857 0.067 −1843.473 63.332

PC7 −723.4756 504.791 −1.433 0.156 −1727.308 280.357

PC8 1545.1146 525.902 2.938 0.004 499.302 2590.928

PC9 −1833.7117 535.557 −3.424 0.001 −2898.725 −768.698

PC10 402.6532 587.966 0.685 0.495 −766.581 1571.888

==============================================================

Omnibus: 36.408 Durbin–Watson: 1.582

Prob(Omnibus): 0.000 Jarque–Bera (JB): 81.394

Skew: 1.433 Prob(JB): 2.12 × 10⁻¹⁸

Kurtosis: 6.514 Cond. No. 6.78

==============================================================
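Several entries of this summary follow from one another, which gives a quick consistency check. The sketch below assumes the standard OLS definitions of adjusted R-squared, AIC, and BIC and uses only numbers printed above; it is illustrative and not the authors' code.

```python
import math

n_obs, k = 95, 10             # No. Observations and Df Model
r2, loglik = 0.395, -1012.3   # R-squared and Log-Likelihood

# Adjusted R-squared penalizes R-squared for the number of regressors.
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / (n_obs - k - 1)

# Information criteria count the constant as an estimated parameter.
n_params = k + 1
aic = 2 * n_params - 2 * loglik
bic = n_params * math.log(n_obs) - 2 * loglik

# Each t value is the coefficient divided by its standard error (PC1 shown).
t_pc1 = -609.9627 / 165.231

print(round(adj_r2, 3), round(aic), round(bic), round(t_pc1, 3))
```

The recomputed values round to the printed Adj. R-squared, AIC, BIC, and the PC1 t value.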

Appendix J

Figure A4. SHAP summary plot—feature impact on vehicle DMG classification (SVM model) showing that kilometers driven in 2023, braking expenses, and bodywork-related costs are the strongest predictors pushing vehicles toward higher-risk DMG zones (source: author’s contribution, generated using the Python 3.10 programming language).

Appendix K

Figure A5. SHAP summary plot showing that total expenses, maintenance cost per kilometer, and electrical expenses are the most influential predictors, with feature importance distributed more uniformly across variables compared to previous models (source: author’s contribution, generated using the Python programming language).

Figure 1. The actions, methodologies, and outputs.

Figure 2. Analytical framework for fleet behavior optimization. In the central matrix, green represents low maintenance cost and low service frequency (low risk), while red represents high cost and high service frequency (high risk).

Figure 3. Percentage distribution of maintenance expenses by category between 2022 and 2024, showing an increase in repair-related costs and a decline in tire expenses.

Figure 4. The elbow method, used to determine the optimal number of clusters for accelerometer-based segmentation; the curve shows an inflection point at k = 5.

Figure 5. Visualization of the five K-means clusters in the PCA1–PCA2 space, illustrating how principal component reduction enables graphical inspection of high-dimensional behavioral segmentation.

Figure 6. Boxplots for total number of events (left graph) and acceleration overload (right graph).

Figure 7. Boxplots for braking overload (left graph), cornering overload (middle graph) and average speed during the events (right graph).

Figure 8. Right-skewed histogram, with most vehicles falling within moderate usage levels and a few outliers showing minimal or intensive driving activity.

Figure 9. PCA1–PCA2 projection of the seven clusters (PCA1 on the x-axis and PCA2 on the y-axis), distinguishing risky from high-distance driving behaviors.

Figure 10. PCA loading heatmap showing the contribution of driving-related variables to the first two principal components; PCA2 is strongly driven by long-distance and moderate-speed driving indicators.

Figure 11. Boxplots for total distance traveled (left), total time driven at speeds between 10 and 60 km/h (middle), and total time driven at speeds between 60 and 110 km/h (right).

Figure 12. Boxplots for total aggressive starts (left), total number of unsafe braking events at speeds between 60 and 110 km/h (middle), and total number of unsafe driving events at speeds between 60 and 110 km/h (right).

Figure 13. Distribution of unsafe braking events across the five K-means clusters, showing behavioral differences at low (10–60 km/h), medium (60–110 km/h), and high (>110 km/h) speeds; cluster 0 has the most prudent braking patterns.

Figure 14. Clustering based on usage-intensity indicators: the elbow plot (left) identifies four optimal clusters, while the boxplot (right) shows clear separation between clusters, with cluster 2 displaying the highest levels of vehicle activity (engine starts, braking events, and curve-taking frequency).

Figure 15. Clustering analysis based on fuel usage and mileage: the elbow plot (left) indicates four optimal clusters, while the two boxplots (middle and right) show differences in total fuel consumption and total kilometers across clusters.

Figure 16. Decision-making grid (DMG) adapted for the case study, illustrating how vehicles are segmented based on the frequency of service entries (x-axis) and total maintenance cost (y-axis).

Figure 17. Boxplots confirming that older vehicles are more likely to incur higher failure frequency and higher maintenance costs.

Figure 18. Quantile-based DMG matrices illustrating risk segmentation across Large, Midsize, and Small vehicle groups.

Figure 19. Structure of the decision-making grid (DMG), showing the nine zones formed by combining failure frequency and cost, which define the low–low to high–high risk categories used throughout the analysis. Green indicates low risk, yellow medium risk, and red high risk based on failure frequency and maintenance cost.

Figure 20. Boxplot for the age of vehicles according to DMG zones.

Figure 21. Weibull curves illustrating accelerated wear-out for Large/Midsize vehicles and diffuse failures for the Small category.

Figure 22. Kaplan–Meier Survival Curves for vehicle categories, showing that Small vehicles exhibit the fastest decline in survival probability, while Large vehicles fail sharply between years 8 and 10. The shaded areas represent the confidence intervals of the survival estimates.

Figure 23. Cumulative explained variance curve indicating that approximately 10 principal components capture around 80% of the total variance, marking the optimal dimensionality threshold based on the elbow method.

Figure 24. Accuracy and F1 outcomes showing strong class imbalance effects, with only SVC (linear) reaching moderate performance levels.

Figure 25. Model performance comparison showing that, after removing non-actionable predictors, ensemble methods (RandomForest, ExtraTrees, Bagging) achieve the highest accuracy and F1-score, confirming their robustness under the reduced feature set.

Figure 26. Decision-making grids for Large, Midsize, and Small vehicles under the largest gap method, illustrating large empty regions in the matrix and insufficient data density—particularly for the Small category.

Figure 27. Hybrid DMG segmentation combining percentile-based thresholds and largest gap boundaries to produce more balanced and stable zone definitions across vehicle categories.

Figure 28. Accuracy and F1-score comparison showing that, under the hybrid DMG segmentation with augmented and SMOTE-balanced data, ensemble models (RandomForest, ExtraTrees, Bagging) achieve the highest predictive performance, reaching up to 93% accuracy and a 90% F1-score.

Figure 29. Model accuracy and F1-score obtained using only actionable predictors, showing that ensemble methods (ExtraTrees, RandomForest, Bagging) maintain high performance—up to 86% accuracy—substantially outperforming the percentile-based DMG segmentation.

Figure 30. SHAP-derived feature importance showing cost variables as dominant and behavioral metrics as secondary contributors.
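
Figures 26 and 27 contrast two ways of placing DMG thresholds: the largest gap method and percentile-based cut-offs. The following is a minimal Python sketch of both rules, assuming a plain list of per-vehicle repair counts. The function names, the sample values and the single low/high split are illustrative assumptions, not the authors' implementation; in particular, how the hybrid scheme combines the two rules is not specified in these captions.

```python
from statistics import quantiles


def largest_gap_threshold(values):
    """Return the midpoint of the widest gap between consecutive sorted values."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    # Pair each gap width with the midpoint of that gap.
    gaps = [(b - a, (a + b) / 2) for a, b in zip(ordered, ordered[1:])]
    # max() compares gap width first; ties resolve to the larger midpoint.
    return max(gaps)[1]


def percentile_thresholds(values, n_tiers=3):
    """Return the n_tiers - 1 cut points that split values into roughly equal groups."""
    return quantiles(values, n=n_tiers, method="inclusive")


# Hypothetical repair counts per vehicle (not data from the study).
repairs = [1, 2, 2, 3, 4, 11, 12, 14]
gap_cut = largest_gap_threshold(repairs)    # single boundary in the widest gap
tier_cuts = percentile_thresholds(repairs)  # two boundaries for three tiers
```

With these hypothetical counts, the largest gap rule places its boundary between 4 and 11, so the two groups it produces are uneven in size, whereas the percentile rule balances group sizes by construction. This is one way to picture why a gap-only grid can leave regions of the matrix sparsely populated.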

Table 1. Chi-square results.

Segment	Chi-Square	p-Value	Degrees of Freedom
Large	22	0.0786	14
Midsize	16	0.1816	12
Small	72	0.0686	56
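
The p-values in Table 1 can be cross-checked from the reported statistic and degrees of freedom. The sketch below uses only the Python standard library and relies on the closed-form upper-tail probability of the chi-square distribution for even degrees of freedom, which covers all three rows. The function name is an illustrative assumption. Because the statistics in the table appear to be rounded, recomputed values need not match the reported p-values exactly.

```python
import math


def chi2_sf_even_df(stat, df):
    """Upper-tail probability P(X >= stat) for a chi-square variable with even df.

    For df = 2k the tail equals exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form only holds for positive even df")
    half = stat / 2.0
    term, total = 1.0, 1.0  # i = 0 term of the sum
    for i in range(1, df // 2):
        term *= half / i    # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# (chi-square statistic, degrees of freedom) as reported in Table 1
table1 = {"Large": (22, 14), "Midsize": (16, 12), "Small": (72, 56)}
p_values = {seg: chi2_sf_even_df(stat, df) for seg, (stat, df) in table1.items()}

for seg, p in p_values.items():
    print(f"{seg}: p = {p:.4f}")
```

In all three segments the recomputed p-value stays above 0.05, which agrees with the reported values.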

Table 2. Chi-square results by vehicle tonnage segment, frequency of failure and costs of failure.

Vehicle Category	Large	Midsize	Small
Frequency Tier	Average Age
Low	8.40	9.50	5.20
Mid	7.70	9.33	5.52
High	9.00	8.71	6.28
Chi-square frequency	4.64	5.74	25.5
p-value	0.33	0.22	0.06
Cost Tier	Average Age
Low	7.50	9.86	5.30
Mid	8.30	9.00	4.68
High	9.25	8.71	6.90
Chi-square cost	5.88	5.65	25.9
p-value	0.21	0.23	0.0553

Table 3. Weibull Beta parameter—bootstrap results by vehicle category.

Vehicle Category	Mean_Beta	Std_Beta	Min_Beta	Max_Beta
Large	11.15	4.70	6.54	43.77
Small	2.37	0.25	1.84	3.69
Midsize	9.87	4.02	7.76	89.15
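
As a reading aid for Table 3, the standard two-parameter Weibull hazard function shows how the shape parameter governs the failure pattern. The scale symbol \eta is an assumption introduced here for illustration; the table reports only the shape parameter \beta.

```latex
h(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1}, \qquad t > 0,
\qquad
\begin{cases}
\beta < 1: & \text{hazard decreases with age (early failures)} \\
\beta = 1: & \text{hazard is constant (random failures)} \\
\beta > 1: & \text{hazard increases with age (wear-out)}
\end{cases}
```

All three mean values in Table 3 exceed 1, so the hazard rises with age in every category. It rises far more steeply for Large and Midsize vehicles (mean \beta of about 11 and 10) than for Small vehicles (mean \beta of about 2.4).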

Table 4. Summary of model performance metrics.

Rank	Model	Accuracy	F1-Score	Key Notes
1	SVC (Linear Kernel)	0.67	0.74	Best-performing model
2	XGBoost/ExtraTrees	0.58	0.61	Next best performing model
3	RandomForest	0.50	0.54	Moderate performance
4	Logistic Regression/Naive Bayes/Bagging	0.42	0.42	Limited ability to model this dataset
5	KNN/GradientBoosting/MLP	0.33	0.37	Underfitting
6	Nystroem+SVC	0.25	0.1	Kernel approximation ineffective here
7	AdaBoost/DecisionTree	0.17	0.1	Poor generalization
8	SVC (RBF Kernel)/Fourier + LogisticRegression	0.08	0.06	Lowest performance among all models

Table 5. Summary of model performance metrics.

Rank	Model	Accuracy	F1-Score	Key Notes
1	RandomForest/ExtraTrees	0.58	0.55	Best-performing models
2	AdaBoost/Naive Bayes/Bagging/XGBoost	0.47	0.35/0.43/0.42/0.44	Moderate performance
3	DecisionTree	0.42	0.37	Limited generalization
4	Logistic Regression/SVC (Linear Kernel)/Fourier + LogisticRegression	0.32	0.22/0.24/0.3	Hard to capture nonlinear patterns
5	KNN	0.26	0.21	Low predictive stability
6	SVC (RBF Kernel)/MLP	0.16/0.11	0.08/0.10	Underfitting

Table 6. Summary of model performance metrics.

Rank	Model	Accuracy	F1-Score	Key Notes
1	RandomForest/ExtraTrees/Bagging	0.93	0.90	Best-performing models
2	Logistic Regression/Naive Bayes/DecisionTree/XGBoost	0.87	0.87/0.87/0.84/0.88	High overall performance
3	GradientBoosting	0.80	0.83	Solid performance
4	KNN	0.73	0.79	Good performance
5	SVC (RBF Kernel)/MLP/SVC (Linear Kernel)	0.67	0.74	Moderate performance
6	Fourier + LogisticRegression	0.60	0.66	Kernel approximation provides limited gains
7	AdaBoost	0.13	0.08	Underfitting

Table 7. Summary of model performance metrics.

Rank	Model	Accuracy	F1-Score	Key Notes
1	ExtraTrees/Nystroem + SVC	0.86	0.79	Best-performing models
2	RandomForest/Logistic Regression/KNN/Naive Bayes/DecisionTree/XGBoost/GradientBoosting/Bagging/SVC (Linear Kernel)	0.79	0.7–0.8	Broad group of well-performing models; stable generalization
3	MLP/SVC (RBF Kernel)	0.71	0.68	Moderate performance
4	AdaBoost	0.57	0.55	Underfitting
5	Fourier + LogisticRegression	0.50	0.47	Kernel approximation provides modest predictive power

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Share and Cite

MDPI and ACS Style

Labib, A.; Tǎnǎsuicǎ, C.; Seecharan, T.S.; Roman, M.-D. Data-Driven Fleet Optimization Using ML Algorithms and a Decision-Making Grid Framework. Appl. Syst. Innov. 2026, 9, 63. https://doi.org/10.3390/asi9030063

