Appendix A
Figure A1.
Heatmap of maintenance costs by category and cluster, showing that cluster 1 has higher costs across multiple repair types.
Appendix B
Details of the included expenses:
1—braking expenses;
2—suspension expenses;
3—exhaust and catalysis expenses;
4—engine and transmission expenses;
5—electrical expenses;
6—air conditioning expenses;
7—wheels and tires expenses;
8—body and accessories expenses;
9—AdBlue expenses;
10—miscellaneous expenses;
11—fluid expenses;
12—accident-related expenses;
13—towing expenses.
Appendix C
Figure A2.
Boxplot showing no significant cluster differences in braking-system expenses, despite behavioral distinctions.
Appendix D
| Variable | Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|---|
| Total_Break1 | Cluster_Braking | 1.16 × 10⁶ | 5 | 68.25 | 5.84 × 10⁻²⁹ |
| | Residual | 3.03 × 10⁵ | 89 | – | – |
| Total_Break2 | Cluster_Braking | 728,180.13 | 5 | 115.2 | 2.53 × 10⁻³⁷ |
| | Residual | 112,484.23 | 89 | – | – |
| Total_Break3 | Cluster_Braking | 59,284.36 | 5 | 195.4 | 2.13 × 10⁻⁴⁶ |
| | Residual | 5,401.79 | 89 | – | – |
| cheltuieli_franare (braking expenses) | Cluster_Braking | 4.64 × 10⁸ | 5 | 0.51 | 0.766 |
| | Residual | 1.61 × 10¹⁰ | 89 | – | – |

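Each F value in the table above is the ratio of two mean squares, F = (SS_between / df_between) / (SS_residual / df_residual). The short sketch below recomputes F from the printed sums of squares as a consistency check; it is an illustration, not the author's code. Because the Total_Break1 and cheltuieli_franare sums of squares are printed to only three significant figures, their recomputed F values agree with the reported ones only approximately (about 68.1 instead of 68.25 for Total_Break1). The matching p-values would additionally need the survival function of the F distribution, for example `scipy.stats.f.sf(F, df_between, df_residual)` in SciPy.

```python
def f_from_sums_of_squares(ss_between, df_between, ss_within, df_within):
    """One-way ANOVA F: mean square between groups / mean square within groups."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within

# Rows transcribed from the table: (variable, SS_between, df_b, SS_residual, df_r, reported F)
ROWS = [
    ("Total_Break1", 1.16e6, 5, 3.03e5, 89, 68.25),
    ("Total_Break2", 728_180.13, 5, 112_484.23, 89, 115.2),
    ("Total_Break3", 59_284.36, 5, 5_401.79, 89, 195.4),
    ("cheltuieli_franare", 4.64e8, 5, 1.61e10, 89, 0.51),
]

for name, ssb, dfb, ssw, dfw, reported in ROWS:
    f = f_from_sums_of_squares(ssb, dfb, ssw, dfw)
    print(f"{name}: recomputed F = {f:.2f}, reported F = {reported}")
```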
Appendix E
Multiple Comparison of Means (Tukey HSD, FWER = 0.05)

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 415.8393 | 0.0 | 293.6209 | 538.0577 | True |
| 0 | 2 | 178.9821 | 0.0 | 139.6726 | 218.2917 | True |
| 0 | 3 | 22.8393 | 0.9988 | −148.5072 | 194.1857 | False |
| 0 | 4 | 225.2679 | 0.0 | 157.1817 | 293.354 | True |
| 0 | 5 | 469.8393 | 0.0 | 298.4928 | 641.1857 | True |
| 1 | 2 | −236.8571 | 0.0 | −361.1649 | −112.5494 | True |
| 1 | 3 | −393.0 | 0.0 | −601.0067 | −184.9933 | True |
| 1 | 4 | −190.5714 | 0.0014 | −326.7438 | −54.3991 | True |
| 1 | 5 | 54.0 | 0.974 | −154.0067 | 262.0067 | False |
| 2 | 3 | −156.1429 | 0.1004 | −328.9858 | 16.7001 | False |
| 2 | 4 | 46.2857 | 0.4221 | −25.4834 | 118.0548 | False |
| 2 | 5 | 290.8571 | 0.0001 | 118.0142 | 463.7001 | True |
| 3 | 4 | 202.4286 | 0.0198 | 20.8654 | 383.9917 | True |
| 3 | 5 | 447.0 | 0.0 | 206.8145 | 687.1855 | True |
| 4 | 5 | 244.5714 | 0.0023 | 63.0083 | 426.1346 | True |

Multiple Comparison of Means (Tukey HSD, FWER = 0.05)

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 524.1607 | 0.0 | 449.6433 | 598.6781 | True |
| 0 | 2 | 45.4107 | 0.0 | 21.4434 | 69.378 | True |
| 0 | 3 | −3.8393 | 1.0 | −108.3104 | 100.6318 | False |
| 0 | 4 | 151.4464 | 0.0 | 109.9338 | 192.959 | True |
| 0 | 5 | 323.1607 | 0.0 | 218.6896 | 427.6318 | True |
| 1 | 2 | −478.75 | 0.0 | −554.5413 | −402.9587 | True |
| 1 | 3 | −528.0 | 0.0 | −654.8231 | −401.1769 | True |
| 1 | 4 | −372.7143 | 0.0 | −455.7395 | −289.6891 | True |
| 1 | 5 | −201.0 | 0.0002 | −327.8231 | −74.1769 | True |
| 2 | 3 | −49.25 | 0.7498 | −154.6335 | 56.1335 | False |
| 2 | 4 | 106.0357 | 0.0 | 62.2776 | 149.7938 | True |
| 2 | 5 | 277.75 | 0.0 | 172.3665 | 383.1335 | True |
| 3 | 4 | 155.2857 | 0.0013 | 44.5855 | 265.986 | True |
| 3 | 5 | 327.0 | 0.0 | 180.5573 | 473.4427 | True |
| 4 | 5 | 171.7143 | 0.0003 | 61.014 | 282.4145 | True |

Multiple Comparison of Means (Tukey HSD, FWER = 0.05)

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 45.0357 | 0.0 | 28.7059 | 61.3655 | True |
| 0 | 2 | 2.5357 | 0.7232 | −2.7165 | 7.7879 | False |
| 0 | 3 | 174.0357 | 0.0 | 151.1419 | 196.9296 | True |
| 0 | 4 | 39.8929 | 0.0 | 30.7958 | 48.99 | True |
| 0 | 5 | 140.0357 | 0.0 | 117.1419 | 162.9296 | True |
| 1 | 2 | −42.5 | 0.0 | −59.1089 | −25.8911 | True |
| 1 | 3 | 129.0 | 0.0 | 101.2079 | 156.7921 | True |
| 1 | 4 | −5.1429 | 0.9625 | −23.337 | 13.0513 | False |
| 1 | 5 | 95.0 | 0.0 | 67.2079 | 122.7921 | True |
| 2 | 3 | 171.5 | 0.0 | 148.4062 | 194.5938 | True |
| 2 | 4 | 37.3571 | 0.0 | 27.768 | 46.9463 | True |
| 2 | 5 | 137.5 | 0.0 | 114.4062 | 160.5938 | True |
| 3 | 4 | −134.1429 | 0.0 | −158.4018 | −109.8839 | True |
| 3 | 5 | −34.0 | 0.0313 | −66.0915 | −1.9085 | True |
| 4 | 5 | 100.1429 | 0.0 | 75.8839 | 124.4018 | True |

Multiple Comparison of Means (Tukey HSD, FWER = 0.05)

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −7644.0803 | 0.9686 | −35,844.8716 | 20,556.711 | False |
| 0 | 2 | −730.6443 | 0.9999 | −9800.9708 | 8339.6823 | False |
| 0 | 3 | −2233.9852 | 1.0 | −41,770.6221 | 37,302.6516 | False |
| 0 | 4 | −7056.7682 | 0.7797 | −22,767.0347 | 8653.4982 | False |
| 0 | 5 | −8699.9653 | 0.9875 | −48,236.6022 | 30,836.6715 | False |
| 1 | 2 | 6913.4361 | 0.9812 | −21,769.455 | 35,596.3271 | False |
| 1 | 3 | 5410.0951 | 0.9995 | −42,585.5617 | 53,405.7519 | False |
| 1 | 4 | 587.3121 | 1.0 | −30,833.2208 | 32,007.845 | False |
| 1 | 5 | −1055.885 | 1.0 | −49,051.5418 | 46,939.7718 | False |
| 2 | 3 | −1503.341 | 1.0 | −41,385.2825 | 38,378.6006 | False |
| 2 | 4 | −6326.1239 | 0.8749 | −22,886.1988 | 10,233.9509 | False |
| 2 | 5 | −7969.3211 | 0.992 | −47,851.2626 | 31,912.6205 | False |
| 3 | 4 | −4822.783 | 0.9994 | −46,716.8268 | 37,071.2609 | False |
| 3 | 5 | −6465.9801 | 0.9994 | −61,886.5908 | 48,954.6306 | False |
| 4 | 5 | −1643.1971 | 1.0 | −43,537.241 | 40,250.8467 | False |

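In these Tukey HSD tables, `reject` is True exactly when the simultaneous confidence interval [lower, upper] excludes zero, which also corresponds to p-adj < 0.05, and each interval is centred on meandiff. A minimal consistency check over a few rows transcribed from the first table could look like the sketch below; the helper names are illustrative, and because p-adj is rounded to four decimals, a value sitting exactly at the 0.05 boundary could in principle disagree.

```python
ALPHA = 0.05  # family-wise error rate used in the tables

# (group1, group2, meandiff, p_adj, lower, upper, reject), from the first table
ROWS = [
    (0, 3, 22.8393, 0.9988, -148.5072, 194.1857, False),
    (1, 4, -190.5714, 0.0014, -326.7438, -54.3991, True),
    (2, 3, -156.1429, 0.1004, -328.9858, 16.7001, False),
    (3, 4, 202.4286, 0.0198, 20.8654, 383.9917, True),
]

def interval_excludes_zero(lower, upper):
    """True when the simultaneous CI does not contain 0."""
    return not (lower <= 0.0 <= upper)

def inconsistent_rows(rows, alpha=ALPHA, tol=1e-3):
    """Return (group1, group2) pairs whose reject flag disagrees with the CI or
    with p_adj < alpha, or whose CI is not centred on meandiff (within tol)."""
    bad = []
    for g1, g2, diff, p_adj, lower, upper, reject in rows:
        ci_rejects = interval_excludes_zero(lower, upper)
        p_rejects = p_adj < alpha
        centred = abs((lower + upper) / 2 - diff) <= tol
        if ci_rejects != reject or p_rejects != reject or not centred:
            bad.append((g1, g2))
    return bad

print(inconsistent_rows(ROWS))  # → []
```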
Appendix F
ANOVA results for cheltuieli_franare (braking expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 2.841738 × 10⁸ | 3 | 0.52916 | 0.66339 |
| Residual | 1.628987 × 10¹⁰ | 91 | – | – |

Tukey HSD results for cheltuieli_franare (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −3158.7533 | 0.8103 | −12,453.1318 | 6135.6253 | False |
| 0 | 2 | −7364.0565 | 0.8723 | −32,735.5188 | 18,007.4058 | False |
| 0 | 3 | 475.6588 | 0.9988 | −7903.1866 | 8854.5042 | False |
| 1 | 2 | −4205.3032 | 0.974 | −30,066.2922 | 21,655.6858 | False |
| 1 | 3 | 3634.4121 | 0.7643 | −6126.9764 | 13,395.8005 | False |
| 2 | 3 | 7839.7153 | 0.8528 | −17,706.5238 | 33,385.9543 | False |

ANOVA results for cheltuieli_suspensie (suspension expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 3.463899 × 10⁸ | 3 | 0.806233 | 0.493608 |
| Residual | 1.303241 × 10¹⁰ | 91 | – | – |

Tukey HSD results for cheltuieli_suspensie (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −3984.0542 | 0.5942 | −12,297.3633 | 4329.2549 | False |
| 0 | 2 | −8066.3515 | 0.7887 | −30,759.7255 | 14,627.0225 | False |
| 0 | 3 | −2922.6289 | 0.7378 | −10,417.0441 | 4571.7863 | False |
| 1 | 2 | −4082.2973 | 0.9671 | −27,213.5259 | 19,048.9313 | False |
| 1 | 3 | 1061.4253 | 0.9888 | −7669.5984 | 9792.449 | False |
| 2 | 3 | 5143.7226 | 0.9351 | −17,705.9796 | 27,993.4248 | False |

ANOVA results for cheltuieli_motor_transmisie (engine and transmission expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 4.989381 × 10⁸ | 3 | 0.867066 | 0.461234 |
| Residual | 1.745479 × 10¹⁰ | 91 | – | – |

Tukey HSD results for cheltuieli_motor_transmisie (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −5132.9774 | 0.5049 | −14,753.949 | 4487.9942 | False |
| 0 | 2 | −9396.966 | 0.7854 | −35,659.9505 | 16,866.0184 | False |
| 0 | 3 | −1245.7583 | 0.9818 | −9919.0261 | 7427.5095 | False |
| 1 | 2 | −4263.9887 | 0.9755 | −31,033.7012 | 22,505.7239 | False |
| 1 | 3 | 3887.2191 | 0.7458 | −6217.1726 | 13,991.6108 | False |
| 2 | 3 | 8151.2078 | 0.8511 | −18,292.695 | 34,595.1105 | False |

ANOVA results for cheltuieli_evacuare_catalizare (exhaust and catalysis expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 1.343718 × 10⁸ | 3 | 0.444019 | 0.722111 |
| Residual | 9.179672 × 10⁹ | 91 | – | – |

Tukey HSD results for cheltuieli_evacuare_catalizare (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −1492.999 | 0.9436 | −8470.1031 | 5484.1051 | False |
| 0 | 2 | −5173.2767 | 0.8925 | −24,219.1261 | 13,872.5726 | False |
| 0 | 3 | 1033.5704 | 0.9732 | −5256.2613 | 7323.4021 | False |
| 1 | 2 | −3680.2777 | 0.9598 | −23,093.605 | 15,733.0496 | False |
| 1 | 3 | 2526.5694 | 0.8036 | −4801.1097 | 9854.2485 | False |
| 2 | 3 | 6206.8471 | 0.8318 | −12,970.2037 | 25,383.8979 | False |

ANOVA results for cheltuieli_electrice (electrical expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 2.407304 × 10⁸ | 3 | 0.379846 | 0.767753 |
| Residual | 1.922398 × 10¹⁰ | 91 | – | – |

Tukey HSD results for cheltuieli_electrice (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −1973.8559 | 0.9561 | −12,070.6434 | 8122.9316 | False |
| 0 | 2 | −9511.849 | 0.8032 | −37,073.6986 | 18,050.0006 | False |
| 0 | 3 | 318.8126 | 0.9997 | −8783.4014 | 9421.0265 | False |
| 1 | 2 | −7537.9931 | 0.896 | −35,631.6316 | 20,555.6454 | False |
| 1 | 3 | 2292.6684 | 0.9419 | −8311.4472 | 12,896.7841 | False |
| 2 | 3 | 9830.6615 | 0.7904 | −17,921.0538 | 37,582.3769 | False |

ANOVA results for cheltuieli_climatizare (air conditioning expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 7.324592 × 10⁸ | 3 | 1.361474 | 0.259614 |
| Residual | 1.631902 × 10¹⁰ | 91 | – | – |

Tukey HSD results for cheltuieli_climatizare (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −6224.3171 | 0.3037 | −15,527.0106 | 3078.3764 | False |
| 0 | 2 | −7836.203 | 0.8507 | −33,230.3633 | 17,557.9573 | False |
| 0 | 3 | 24.6508 | 1.0 | −8361.6906 | 8410.9922 | False |
| 1 | 2 | −1611.8859 | 0.9984 | −27,496.0109 | 24,272.239 | False |
| 1 | 3 | 6248.9679 | 0.3434 | −3521.1533 | 16,019.0891 | False |
| 2 | 3 | 7860.8538 | 0.8521 | −17,708.2396 | 33,429.9473 | False |

ANOVA results for cheltuieli_roti_anvelope (wheels and tires expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 1.893244 × 10⁷ | 3 | 0.190798 | 0.902421 |
| Residual | 3.009906 × 10⁹ | 91 | – | – |

Tukey HSD results for cheltuieli_roti_anvelope (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −732.3808 | 0.9634 | −4727.5766 | 3262.8149 | False |
| 0 | 2 | −2667.935 | 0.9187 | −13,573.8776 | 8238.0076 | False |
| 0 | 3 | −271.6284 | 0.9973 | −3873.2815 | 3330.0248 | False |
| 1 | 2 | −1935.5542 | 0.9684 | −13,051.9202 | 9180.8118 | False |
| 1 | 3 | 460.7525 | 0.9917 | −3735.1878 | 4656.6928 | False |
| 2 | 3 | 2396.3067 | 0.9404 | −8584.7638 | 13,377.3772 | False |

ANOVA results for cheltuieli_caroserie_accesorii (body and accessories expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 7.002934 × 10⁷ | 3 | 0.29172 | 0.831278 |
| Residual | 7.281723 × 10⁹ | 91 | – | – |

Tukey HSD results for cheltuieli_caroserie_accesorii (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −460.6617 | 0.9974 | −6674.7683 | 5753.4449 | False |
| 0 | 2 | −5884.999 | 0.8006 | −22,848.0451 | 11,078.0471 | False |
| 0 | 3 | 72.4271 | 1.0 | −5529.5654 | 5674.4196 | False |
| 1 | 2 | −5424.3373 | 0.8444 | −22,714.6748 | 11,866.0003 | False |
| 1 | 3 | 533.0888 | 0.9965 | −5993.2548 | 7059.4324 | False |
| 2 | 3 | 5957.4261 | 0.798 | −11,122.4736 | 23,037.3257 | False |

ANOVA results for cheltuieli_combustibil_adblue (AdBlue expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 5.005394 × 10⁶ | 3 | 0.309556 | 0.818422 |
| Residual | 4.904782 × 10⁸ | 91 | – | – |

Tukey HSD results for cheltuieli_combustibil_adblue (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −285.3995 | 0.9669 | −1898.1658 | 1327.3669 | False |
| 0 | 2 | −823.0772 | 0.9613 | −5225.5492 | 3579.3947 | False |
| 0 | 3 | −492.5824 | 0.8118 | −1946.4849 | 961.3201 | False |
| 1 | 2 | −537.6777 | 0.9892 | −5025.0926 | 3949.7372 | False |
| 1 | 3 | −207.1829 | 0.9886 | −1900.9851 | 1486.6193 | False |
| 2 | 3 | 330.4948 | 0.9973 | −4102.3045 | 4763.2941 | False |

ANOVA results for cheltuieli_diverse (miscellaneous expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 1.147887 × 10⁸ | 3 | 0.651101 | 0.584282 |
| Residual | 5.347747 × 10⁹ | 91 | – | – |

Tukey HSD results for cheltuieli_diverse (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −1752.3966 | 0.8247 | −7077.7327 | 3572.9396 | False |
| 0 | 2 | −2928.8825 | 0.9523 | −17,465.7947 | 11,608.0298 | False |
| 0 | 3 | 1013.5904 | 0.9456 | −3787.179 | 5814.3599 | False |
| 1 | 2 | −1176.4859 | 0.9968 | −15,993.8789 | 13,640.9071 | False |
| 1 | 3 | 2765.987 | 0.5689 | −2826.9286 | 8358.9026 | False |
| 2 | 3 | 3942.4729 | 0.8949 | −10,694.58 | 18,579.5258 | False |

ANOVA results for cheltuieli_fluide (fluid expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 5.117825 × 10⁸ | 3 | 0.740223 | 0.530771 |
| Residual | 2.097215 × 10¹⁰ | 91 | – | – |

Tukey HSD results for cheltuieli_fluide (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −4412.2682 | 0.6935 | −14,958.1553 | 6133.619 | False |
| 0 | 2 | −9414.9568 | 0.8274 | −38,202.7425 | 19,372.8289 | False |
| 0 | 3 | 546.7461 | 0.9988 | −8960.3294 | 10,053.8215 | False |
| 1 | 2 | −5002.6886 | 0.9702 | −34,345.9169 | 24,340.5397 | False |
| 1 | 3 | 4959.0142 | 0.6461 | −6116.7667 | 16,034.7952 | False |
| 2 | 3 | 9961.7029 | 0.8051 | −19,024.3937 | 38,947.7995 | False |

ANOVA results for cheltuieli_accident (accident-related expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 3.214899 × 10⁶ | 3 | 0.398348 | 0.754503 |
| Residual | 2.448077 × 10⁸ | 91 | – | – |

Tukey HSD results for cheltuieli_accident (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −331.9215 | 0.8711 | −1471.316 | 807.473 | False |
| 0 | 2 | −394.667 | 0.9873 | −3504.9454 | 2715.6114 | False |
| 0 | 3 | −394.667 | 0.7465 | −1421.8266 | 632.4926 | False |
| 1 | 2 | −62.7455 | 0.9999 | −3233.0347 | 3107.5438 | False |
| 1 | 3 | −62.7455 | 0.9991 | −1259.3905 | 1133.8996 | False |
| 2 | 3 | 0.0 | 1.0 | −3131.7042 | 3131.7042 | False |

ANOVA results for cheltuieli_tractare (towing expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 5.951550 × 10⁶ | 3 | 0.664992 | 0.575693 |
| Residual | 2.714774 × 10⁸ | 91 | – | – |

Tukey HSD results for cheltuieli_tractare (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −609.1466 | 0.5472 | −1809.0008 | 590.7076 | False |
| 0 | 2 | −697.6475 | 0.9443 | −3972.9666 | 2577.6716 | False |
| 0 | 3 | −337.4327 | 0.8466 | −1419.0965 | 744.2312 | False |
| 1 | 2 | −88.5009 | 0.9999 | −3427.0153 | 3250.0135 | False |
| 1 | 3 | 271.7139 | 0.9424 | −988.4287 | 1531.8566 | False |
| 2 | 3 | 360.2149 | 0.9918 | −2937.667 | 3658.0967 | False |

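The `C(Cluster)` and `Residual` rows have the layout that `statsmodels`' `anova_lm` prints for an OLS model with a categorical cluster term; for a single factor they reduce to the classical one-way ANOVA decomposition. With 4 clusters and 95 vehicles, df_between = 4 − 1 = 3 and df_residual = 95 − 4 = 91, as printed. A dependency-free sketch of that decomposition, run on made-up numbers rather than the fleet data, is:

```python
from statistics import fmean

def one_way_anova(groups):
    """Classical one-way ANOVA for a list of samples (one list per cluster).

    Returns (ss_between, df_between, ss_within, df_within, f_stat)."""
    values = [v for g in groups for v in g]
    grand_mean = fmean(values)
    group_means = [fmean(g) for g in groups]
    # Between-cluster sum of squares: size-weighted squared deviations of cluster means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-cluster (residual) sum of squares.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_stat

# Hypothetical expenses for three clusters (illustrative only).
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # → (54.0, 2, 6.0, 6, 27.0)
```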
Appendix G
ANOVA results for cheltuieli_franare (braking expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 7.155399 × 10⁹ | 3 | 23.044421 | 3.476264 × 10⁻¹¹ |
| Residual | 9.418640 × 10⁹ | 91 | – | – |

Tukey HSD results for cheltuieli_franare (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 26,287.601 | 0.0 | 16,507.5664 | 36,067.6355 | True |
| 0 | 2 | 17,924.1016 | 0.0002 | 7054.2009 | 28,794.0023 | True |
| 0 | 3 | −233.1758 | 0.9996 | −6236.4646 | 5770.1131 | False |
| 1 | 2 | −8363.4994 | 0.3664 | −21,781.5961 | 5054.5973 | False |
| 1 | 3 | −26,520.7767 | 0.0 | −36,416.7383 | −16,624.8152 | True |
| 2 | 3 | −18,157.2774 | 0.0002 | −29,131.5983 | −7182.9564 | True |

ANOVA results for cheltuieli_suspensie (suspension expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 7.107431 × 10⁹ | 3 | 34.377208 | 6.021829 × 10⁻¹⁵ |
| Residual | 6.271366 × 10⁹ | 91 | – | – |

Tukey HSD results for cheltuieli_suspensie (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 29,424.1832 | 0.0 | 21,443.7334 | 37,404.6331 | True |
| 0 | 2 | 10,267.1506 | 0.0165 | 1397.3764 | 19,136.9247 | True |
| 0 | 3 | 952.2923 | 0.9568 | −3946.3556 | 5850.9402 | False |
| 1 | 2 | −19,157.0327 | 0.0001 | −30,106.1196 | −8207.9458 | True |
| 1 | 3 | −28,471.8909 | 0.0 | −36,546.9366 | −20,396.8453 | True |
| 2 | 3 | −9314.8583 | 0.0382 | −18,269.8387 | −359.8778 | True |

ANOVA results for cheltuieli_motor_transmisie (engine and transmission expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 4.772522 × 10⁹ | 3 | 10.982796 | 0.000003 |
| Residual | 1.318121 × 10¹⁰ | 91 | – | – |

Tukey HSD results for cheltuieli_motor_transmisie (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 23,815.5287 | 0.0 | 12,245.7814 | 35,385.276 | True |
| 0 | 2 | 4838.9764 | 0.7584 | −8020.0788 | 17,698.0316 | False |
| 0 | 3 | −727.0795 | 0.9932 | −7828.9496 | 6374.7906 | False |
| 1 | 2 | −18,976.5523 | 0.0124 | −34,850.1147 | −3102.9899 | True |
| 1 | 3 | −24,542.6082 | 0.0 | −36,249.4968 | −12,835.7196 | True |
| 2 | 3 | −5566.0559 | 0.6769 | −18,548.64 | 7416.5281 | False |

ANOVA results for cheltuieli_evacuare_catalizare (exhaust and catalysis expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 3.835001 × 10⁹ | 3 | 21.231507 | 1.637299 × 10⁻¹⁰ |
| Residual | 5.479044 × 10⁹ | 91 | – | – |

Tukey HSD results for cheltuieli_evacuare_catalizare (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 13,000.9049 | 0.0001 | 5541.5947 | 20,460.2151 | True |
| 0 | 2 | 20,537.5277 | 0.0 | 12,246.9679 | 28,828.0875 | True |
| 0 | 3 | −364.6007 | 0.9968 | −4943.3569 | 4214.1555 | False |
| 1 | 2 | 7536.6228 | 0.2239 | −2697.4664 | 17,770.712 | False |
| 1 | 3 | −13,365.5056 | 0.0001 | −20,913.2343 | −5817.7769 | True |
| 2 | 3 | −20,902.1284 | 0.0 | −29,272.3304 | −12,531.9265 | True |

ANOVA results for cheltuieli_electrice (electrical expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 8.008809 × 10⁹ | 3 | 21.20601 | 1.673999 × 10⁻¹⁰ |
| Residual | 1.145590 × 10¹⁰ | 91 | – | – |

Tukey HSD results for cheltuieli_electrice (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 25,987.478 | 0.0 | 15,201.4674 | 36,773.4886 | True |
| 0 | 2 | 21,601.0922 | 0.0001 | 9613.1116 | 33,589.0728 | True |
| 0 | 3 | −713.2676 | 0.9921 | −7334.0558 | 5907.5206 | False |
| 1 | 2 | −4386.3858 | 0.8652 | −19,184.6703 | 10,411.8987 | False |
| 1 | 3 | −26,700.7456 | 0.0 | −37,614.6075 | −15,786.8837 | True |
| 2 | 3 | −22,314.3598 | 0.0 | −34,417.5013 | −10,211.2183 | True |

ANOVA results for cheltuieli_climatizare (air conditioning expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 5.827819 × 10⁹ | 3 | 15.750395 | 2.492119 × 10⁻⁸ |
| Residual | 1.122367 × 10¹⁰ | 91 | – | – |

Tukey HSD results for cheltuieli_climatizare (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 24,397.7049 | 0.0 | 13,721.58 | 35,073.8298 | True |
| 0 | 2 | 14,838.0674 | 0.0081 | 2972.218 | 26,703.9169 | True |
| 0 | 3 | −120.3248 | 1.0 | −6673.6617 | 6433.0122 | False |
| 1 | 2 | −9559.6375 | 0.3255 | −24,207.16 | 5087.8851 | False |
| 1 | 3 | −24,518.0297 | 0.0 | −35,320.7034 | −13,715.356 | True |
| 2 | 3 | −14,958.3922 | 0.0082 | −26,938.2294 | −2978.555 | True |

ANOVA results for cheltuieli_roti_anvelope (wheels and tires expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 1.436568 × 10⁹ | 3 | 27.367145 | 1.051549 × 10⁻¹² |
| Residual | 1.592270 × 10⁹ | 91 | – | – |

Tukey HSD results for cheltuieli_roti_anvelope (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 13,262.6445 | 0.0 | 9241.4536 | 17,283.8354 | True |
| 0 | 2 | 5932.7756 | 0.0043 | 1463.4718 | 10,402.0795 | True |
| 0 | 3 | 1236.1569 | 0.5586 | −1232.175 | 3704.4887 | False |
| 1 | 2 | −7329.8689 | 0.0043 | −12,846.8973 | −1812.8405 | True |
| 1 | 3 | −12,026.4876 | 0.0 | −16,095.3435 | −7957.6318 | True |
| 2 | 3 | −4696.6187 | 0.0381 | −9208.8564 | −184.3811 | True |

ANOVA results for cheltuieli_caroserie_accesorii (body and accessories expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 3.087725 × 10⁹ | 3 | 21.965378 | 8.689141 × 10⁻¹¹ |
| Residual | 4.264028 × 10⁹ | 91 | – | – |

Tukey HSD results for cheltuieli_caroserie_accesorii (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 18,921.8382 | 0.0 | 12,341.3774 | 25,502.299 | True |
| 0 | 2 | 9555.8923 | 0.0051 | 2242.119 | 16,869.6656 | True |
| 0 | 3 | 1142.6196 | 0.8805 | −2896.6715 | 5181.9108 | False |
| 1 | 2 | −9365.946 | 0.039 | −18,394.2637 | −337.6282 | True |
| 1 | 3 | −17,779.2186 | 0.0 | −24,437.6805 | −11,120.7567 | True |
| 2 | 3 | −8413.2726 | 0.0189 | −15,797.3047 | −1029.2405 | True |

ANOVA results for cheltuieli_combustibil_adblue (AdBlue expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 1.034293 × 10⁸ | 3 | 8.002354 | 0.000086 |
| Residual | 3.920542 × 10⁸ | 91 | – | – |

Tukey HSD results for cheltuieli_combustibil_adblue (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 3485.9325 | 0.0001 | 1490.5813 | 5481.2838 | True |
| 0 | 2 | −308.624 | 0.9834 | −2526.333 | 1909.0849 | False |
| 0 | 3 | −95.5397 | 0.997 | −1320.3483 | 1129.2688 | False |
| 1 | 2 | −3794.5566 | 0.0026 | −6532.1559 | −1056.9573 | True |
| 1 | 3 | −3581.4723 | 0.0001 | −5600.4753 | −1562.4692 | True |
| 2 | 3 | 213.0843 | 0.9945 | −2025.9287 | 2452.0974 | False |

ANOVA results for cheltuieli_diverse (miscellaneous expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 1.098222 × 10⁹ | 3 | 7.632985 | 0.000131 |
| Residual | 4.364313 × 10⁹ | 91 | – | – |

Tukey HSD results for cheltuieli_diverse (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 9402.8761 | 0.0021 | 2745.4823 | 16,060.2699 | True |
| 0 | 2 | 7709.7318 | 0.0378 | 310.4523 | 15,109.0113 | True |
| 0 | 3 | −714.0763 | 0.968 | −4800.5913 | 3372.4388 | False |
| 1 | 2 | −1693.1443 | 0.9622 | −10,827.0133 | 7440.7247 | False |
| 1 | 3 | −10,116.9523 | 0.0009 | −16,853.2592 | −3380.6455 | True |
| 2 | 3 | −8423.8081 | 0.0206 | −15,894.1678 | −953.4483 | True |

ANOVA results for cheltuieli_fluide (fluid expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 8.530027 × 10⁹ | 3 | 19.974215 | 4.947865 × 10⁻¹⁰ |
| Residual | 1.295391 × 10¹⁰ | 91 | – | – |

Tukey HSD results for cheltuieli_fluide (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 28,259.0217 | 0.0 | 16,789.4632 | 39,728.5802 | True |
| 0 | 2 | 20,124.9599 | 0.0005 | 7377.2583 | 32,872.6614 | True |
| 0 | 3 | −476.629 | 0.998 | −7517.0001 | 6563.7421 | False |
| 1 | 2 | −8134.0618 | 0.5321 | −23,870.1662 | 7602.0426 | False |
| 1 | 3 | −28,735.6507 | 0.0 | −40,341.1628 | −17,130.1385 | True |
| 2 | 3 | −20,601.5889 | 0.0004 | −33,471.7495 | −7731.4282 | True |

ANOVA results for cheltuieli_accident (accident-related expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 3.103811 × 10⁶ | 3 | 0.384409 | 0.76448 |
| Residual | 2.449188 × 10⁸ | 91 | – | – |

Tukey HSD results for cheltuieli_accident (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | −375.8733 | 0.9242 | −1952.9663 | 1201.2197 | False |
| 0 | 2 | −178.6733 | 0.9933 | −1931.5142 | 1574.1676 | False |
| 0 | 3 | −375.8733 | 0.7404 | −1343.942 | 592.1953 | False |
| 1 | 2 | 197.2 | 0.9952 | −1966.5538 | 2360.9538 | False |
| 1 | 3 | 0.0 | 1.0 | −1595.787 | 1595.787 | False |
| 2 | 3 | −197.2 | 0.9913 | −1966.8793 | 1572.4793 | False |

ANOVA results for cheltuieli_tractare (towing expenses):

| Source | Sum of Squares | df | F | p-Value |
|---|---|---|---|---|
| C(Cluster) | 6.526823 × 10⁷ | 3 | 9.331617 | 0.000019 |
| Residual | 2.121608 × 10⁸ | 91 | – | – |

Tukey HSD results for cheltuieli_tractare (multiple comparison of means, FWER = 0.05):

| group1 | group2 | meandiff | p-adj | lower | upper | reject |
|---|---|---|---|---|---|---|
| 0 | 1 | 2796.5333 | 0.0 | 1328.693 | 4264.3736 | True |
| 0 | 2 | 770.616 | 0.6056 | −860.7974 | 2402.0293 | False |
| 0 | 3 | −22.962 | 0.9999 | −923.968 | 878.044 | False |
| 1 | 2 | −2025.9173 | 0.0481 | −4039.7776 | −12.057 | True |
| 1 | 3 | −2819.4953 | 0.0 | −4304.7346 | −1334.256 | True |
| 2 | 3 | −793.578 | 0.59 | −2440.6632 | 853.5073 | False |

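The lower and upper columns are simultaneous Tukey–Kramer intervals. In statsmodels' output, meandiff is the mean of group2 minus the mean of group1, and the interval is meandiff ± q · sqrt(MSE / 2 · (1/n_i + 1/n_j)), where MSE is the residual sum of squares divided by its degrees of freedom (for cheltuieli_franare above, 9.418640 × 10⁹ / 91 ≈ 1.035 × 10⁸) and q is the upper 5% critical value of the studentized range for 4 groups and 91 degrees of freedom (roughly 3.7; SciPy 1.7 and later exposes it through `scipy.stats.studentized_range.ppf`). Cluster sizes are not reported in this appendix, so the sketch below takes them, and q, as inputs; the numbers in the usage line are purely illustrative.

```python
import math

def tukey_kramer_interval(meandiff, mse, n_i, n_j, q_crit):
    """Simultaneous (Tukey-Kramer) CI for a difference of two cluster means.

    meandiff: mean(group2) - mean(group1), as reported in the tables
    mse: residual mean square of the one-way ANOVA (Residual sum_sq / df)
    n_i, n_j: sizes of the two clusters (not reported in this appendix)
    q_crit: upper-alpha critical value of the studentized range for
            k groups and the residual degrees of freedom
    """
    half_width = q_crit * math.sqrt(mse / 2.0 * (1.0 / n_i + 1.0 / n_j))
    lower, upper = meandiff - half_width, meandiff + half_width
    reject = not (lower <= 0.0 <= upper)  # reject equal means if 0 lies outside
    return lower, upper, reject

# Purely illustrative numbers, not taken from the fleet data.
print(tukey_kramer_interval(meandiff=10.0, mse=4.0, n_i=5, n_j=5, q_crit=3.7))
```

Unequal cluster sizes simply widen or narrow each pair's interval through the 1/n_i + 1/n_j term, which is why intervals in the same table have different widths.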
Appendix H
Figure A3.
1—braking expenses; 2—suspension expenses; 3—exhaust and catalysis expenses; 4—engine and transmission expenses; 5—electrical expenses; 6—air conditioning expenses; 7—wheels and tires expenses; 8—body and accessories expenses; 9—AdBlue expenses; 10—miscellaneous expenses; 11—fluids expenses; 12—accident-related expenses; 13—towing expenses.
Appendix I
OLS regression results for cheltuieli_franare (braking expenses), regressors PC1–PC10:

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable | cheltuieli_franare | R-squared | 0.395 |
| Model | OLS | Adj. R-squared | 0.323 |
| Method | Least Squares | F-statistic | 5.486 |
| Date | Wed, 12 Feb 2025 | Prob (F-statistic) | 3.12 × 10⁻⁶ |
| Time | 14:40:49 | Log-Likelihood | −1012.3 |
| No. Observations | 95 | AIC | 2047 |
| Df Residuals | 84 | BIC | 2075 |
| Df Model | 10 | Covariance Type | nonrobust |

| Term | coef | std err | t | P > \|t\| | CI lower (0.025) | CI upper (0.975) |
|---|---|---|---|---|---|---|
| const | 7688.6229 | 1120.889 | 6.859 | 0.000 | 5459.611 | 9917.635 |
| PC1 | −609.9627 | 165.231 | −3.692 | 0.000 | −938.543 | −281.382 |
| PC2 | 942.6661 | 276.673 | 3.407 | 0.001 | 392.472 | 1492.861 |
| PC3 | 279.7312 | 308.564 | 0.907 | 0.367 | −333.882 | 893.345 |
| PC4 | 358.7384 | 314.072 | 1.142 | 0.257 | −265.829 | 983.306 |
| PC5 | −437.0935 | 403.623 | −1.083 | 0.282 | −1239.742 | 365.555 |
| PC6 | −890.0702 | 479.432 | −1.857 | 0.067 | −1843.473 | 63.332 |
| PC7 | −723.4756 | 504.791 | −1.433 | 0.156 | −1727.308 | 280.357 |
| PC8 | 1545.1146 | 525.902 | 2.938 | 0.004 | 499.302 | 2590.928 |
| PC9 | −1833.7117 | 535.557 | −3.424 | 0.001 | −2898.725 | −768.698 |
| PC10 | 402.6532 | 587.966 | 0.685 | 0.495 | −766.581 | 1571.888 |

| Diagnostic | Value | Diagnostic | Value |
|---|---|---|---|
| Omnibus | 36.408 | Durbin–Watson | 1.582 |
| Prob(Omnibus) | 0.000 | Jarque–Bera (JB) | 81.394 |
| Skew | 1.433 | Prob(JB) | 2.12 × 10⁻¹⁸ |
| Kurtosis | 6.514 | Cond. No. | 6.78 |

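Several statistics in this summary follow from one another, which offers a quick check on the printed values. With n = 95 observations, p = 10 regressors and k = p + 1 = 11 estimated parameters (statsmodels counts the constant in AIC and BIC), the adjusted R², the overall F, AIC, BIC, a coefficient t statistic and the Jarque–Bera statistic can all be recomputed from other entries. The sketch uses the rounded printed inputs, so agreement holds only to the printed precision.

```python
import math

# Values copied from the OLS summary above (rounded as printed).
n, p = 95, 10            # No. Observations, Df Model (PC1..PC10)
k = p + 1                # estimated parameters, including const
r2 = 0.395               # R-squared
llf = -1012.3            # Log-Likelihood
skew, kurt = 1.433, 6.514

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)        # adjusted R-squared (n - k = Df Residuals)
f_stat = (r2 / p) / ((1 - r2) / (n - k))         # overall F-statistic
aic = -2 * llf + 2 * k                           # Akaike information criterion
bic = -2 * llf + k * math.log(n)                 # Bayesian information criterion
t_const = 7688.6229 / 1120.889                   # t = coef / std err for const
jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)   # Jarque-Bera statistic

print(f"adj R2={adj_r2:.3f}  F={f_stat:.3f}  AIC={aic:.1f}  BIC={bic:.1f}  "
      f"t(const)={t_const:.3f}  JB={jb:.3f}")
```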
Appendix J
Figure A4.
SHAP summary plot—feature impact on vehicle DMG classification (SVM model) showing that kilometers driven in 2023, braking expenses, and bodywork-related costs are the strongest predictors pushing vehicles toward higher-risk DMG zones (source: author’s contribution, generated using the Python 3.10 programming language).
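SHAP summary plots aggregate, over all vehicles, the Shapley value of each feature: its average marginal contribution to the model output across all coalitions of the other features. The `shap` library estimates these values for models such as an SVM; the exact enumeration below is only a didactic sketch under simplifying assumptions (a single baseline vector stands in for absent features, and enumeration is feasible only for a handful of features). The toy model and its feature values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**len(x), so this is only practical for a few features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = set(coalition)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

# Hypothetical risk score with an interaction between the first two features.
def toy_model(v):
    km, braking, body = v
    return 0.5 * km + 2.0 * braking + body + km * braking

x = [2.0, 1.0, 3.0]
base = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, base)
print(phi)
```

The values always sum to model(x) − model(baseline), and an interaction term is split between the features involved, which is one reason importance can look "distributed" across several correlated cost variables.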
Appendix K
Figure A5.
SHAP summary plot showing that total expenses, maintenance cost per kilometer, and electrical expenses are the most influential predictors, with feature importance distributed more uniformly across variables compared to previous models (source: author’s contribution, generated using the Python programming language).
Figure 1.
The actions, methodologies, and outputs.
Figure 2.
Analytical framework for fleet behavior optimization. In the central matrix, green represents low maintenance cost and low service frequency (low risk), while red represents high cost and high service frequency (high risk).
Figure 3.
Percentage distribution of maintenance expenses by category between 2022 and 2024, showing an increase in repair-related costs and a decline in tire expenses.
Figure 4.
Elbow method used to determine the optimal number of clusters for accelerometer-based segmentation; the curve shows an elbow at k = 5.
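The elbow is usually read off the plot by eye. One simple way to make that choice reproducible, which is not necessarily how this figure was produced, is to normalize the curve of within-cluster sum of squares (inertia) against k and pick the k lying farthest from the straight line joining the curve's first and last points. The inertia values below are invented for illustration; with scikit-learn they would come from `KMeans(n_clusters=k).fit(X).inertia_` for each candidate k.

```python
import math

def elbow_k(ks, inertias):
    """Return the k whose normalized (k, inertia) point lies farthest from the
    chord joining the first and last points of a decreasing elbow curve."""
    k_lo, k_hi = ks[0], ks[-1]
    w_hi, w_lo = inertias[0], inertias[-1]
    best_k, best_dist = ks[0], -1.0
    for k, w in zip(ks, inertias):
        x = (k - k_lo) / (k_hi - k_lo)   # 0 at the first k, 1 at the last k
        y = (w - w_lo) / (w_hi - w_lo)   # 1 at the first inertia, 0 at the last
        # Distance from (x, y) to the chord x + y = 1 joining (0, 1) and (1, 0).
        dist = abs(x + y - 1) / math.sqrt(2)
        if dist > best_dist:
            best_k, best_dist = k, dist
    return best_k

# Invented inertia values for k = 1..9 (not the fleet data).
ks = list(range(1, 10))
inertias = [1000, 800, 600, 400, 200, 180, 165, 155, 150]
print(elbow_k(ks, inertias))  # → 5
```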
Figure 5.
Visualization of the five K-means clusters in the PCA1–PCA2 space, illustrating how principal component reduction enables graphical inspection of high-dimensional behavioral segmentation.
Figure 6.
Boxplots for total number of events (left graph) and acceleration overload (right graph).
Figure 7.
Boxplots for braking overload (left graph), cornering overload (middle graph) and average speed during the events (right graph).
Figure 8.
Right-skewed histogram, with most vehicles falling within moderate usage levels and a few outliers showing minimal or intensive driving activity.
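Right skew of this kind can be quantified with the sample skewness g₁ = m₃ / m₂^(3/2), where m₂ and m₃ are the second and third central moments; positive values indicate a longer right tail. This matches SciPy's `scipy.stats.skew` with its default `bias=True`. A minimal sketch on made-up usage values rather than the fleet data:

```python
from statistics import fmean

def sample_skewness(values):
    """Fisher-Pearson coefficient of skewness g1 = m3 / m2**1.5 (biased moments)."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical usage levels: most vehicles moderate, one intensive outlier.
usage = [8, 10, 11, 12, 12, 13, 14, 15, 40]
print(round(sample_skewness(usage), 2))
```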
Figure 9.
PCA1–PCA2 projection of the seven clusters (PCA1 on the x-axis and PCA2 on the y-axis), distinguishing risky from high-distance driving behaviors.
Figure 10.
PCA loading heatmap showing the contribution of driving-related variables to the first two principal components; PCA2 is strongly driven by long-distance and moderate-speed driving indicators.
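Loadings of this kind can be obtained by standardizing the driving variables and taking a singular value decomposition of the data matrix: the rows of Vᵀ are the component weight vectors (what scikit-learn exposes as `PCA.components_`), and projecting the data onto the first two gives the PCA1–PCA2 coordinates used in the scatter plots. This NumPy sketch assumes standardized inputs, which the captions do not state, and uses random data; the sign of each component is arbitrary.

```python
import numpy as np

def pca_first_two(X):
    """Standardize columns, then return (scores, loadings, explained_ratio)
    for the first two principal components."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # z-scores per variable
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[:2]                               # rows: PC1, PC2; columns: variables
    scores = Z @ loadings.T                         # PCA1-PCA2 coordinates per vehicle
    explained_ratio = s[:2] ** 2 / np.sum(s ** 2)   # share of total variance
    return scores, loadings, explained_ratio

rng = np.random.default_rng(0)
X = rng.normal(size=(95, 6))    # hypothetical: 95 vehicles x 6 driving indicators
scores, loadings, ratio = pca_first_two(X)
print(scores.shape, loadings.shape, ratio)
```

A heatmap of `loadings` (2 rows by one column per variable) is the kind of display the caption describes; large absolute entries mark the variables that drive each component.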
Figure 11.
Boxplots for total distance traveled (left), total time driven at speeds between 10 and 60 km/h (middle), total time driven at speeds between 60 and 110 km/h (right).
Figure 12.
Boxplots for total aggressive starts (left), total number of unsafe braking events at speeds between 60 and 110 km/h (middle), and total number of unsafe driving events at speeds between 60 and 110 km/h (right).
Figure 13.
Distribution of unsafe braking events across the five K-means clusters, showing behavioral differences at low (10–60 km/h), medium (60–110 km/h), and high (>110 km/h) speeds; cluster 0 has the most prudent braking patterns.
Figure 14.
Clustering based on usage-intensity indicators: the elbow plot (left) identifies four as the optimal number of clusters, while the boxplot (right) shows clear separation between clusters, with cluster 2 displaying the highest levels of vehicle activity (engine starts, braking events, and curve-taking frequency).
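Figures 14 and 15 read the number of clusters off an elbow plot of within-cluster sum of squares (WCSS) against k. This section does not say how the elbow was located. One common, reproducible heuristic picks the point farthest from the chord that joins the curve's endpoints. The sketch below assumes that heuristic, and the WCSS values in the usage note are hypothetical.

```python
from typing import Sequence


def elbow_k(ks: Sequence[int], wcss: Sequence[float]) -> int:
    """Return the k whose point on the normalized elbow curve lies farthest
    from the straight line joining the first and last points (a heuristic)."""
    if len(ks) != len(wcss) or len(ks) < 3:
        raise ValueError("need at least three (k, WCSS) pairs of equal length")
    k_lo, k_hi = min(ks), max(ks)
    w_lo, w_hi = min(wcss), max(wcss)
    if k_hi == k_lo or w_hi == w_lo:
        raise ValueError("curve is flat in k or WCSS")
    # Rescale both axes to [0, 1] so the distance is not dominated by WCSS units.
    xs = [(k - k_lo) / (k_hi - k_lo) for k in ks]
    ys = [(w - w_lo) / (w_hi - w_lo) for w in wcss]
    x1, y1, x2, y2 = xs[0], ys[0], xs[-1], ys[-1]

    def distance_to_chord(i: int) -> float:
        # Unnormalized perpendicular distance; the constant factor does not change the argmax.
        return abs((y2 - y1) * xs[i] - (x2 - x1) * ys[i] + x2 * y1 - y2 * x1)

    return ks[max(range(len(ks)), key=distance_to_chord)]
```

For the made-up curve `elbow_k(list(range(1, 9)), [1000, 600, 350, 200, 180, 165, 155, 150])`, the function returns 4.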
Figure 15.
Clustering analysis based on fuel usage and mileage: the elbow plot (left) indicates four as the optimal number of clusters, while the two boxplots (middle and right) show differences in total fuel consumption and total kilometers across clusters.
Figure 16.
Decision-making grid (DMG) adapted for the case study, illustrating how vehicles are segmented based on the frequency of service entries (x-axis) and total maintenance cost (y-axis).
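The two DMG axes in Figure 16 are per-vehicle aggregates of the service history. Assume each service record can be reduced to a (vehicle ID, cost) pair; this record layout is hypothetical. Under that assumption, both coordinates can be computed in one pass:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def dmg_coordinates(records: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[int, float]]:
    """Map each vehicle ID to (number of service entries, total maintenance cost).

    Each record is assumed (hypothetically) to be a (vehicle_id, cost) pair.
    """
    counts: Dict[str, int] = defaultdict(int)
    costs: Dict[str, float] = defaultdict(float)
    for vehicle_id, cost in records:
        counts[vehicle_id] += 1  # x-axis: frequency of service entries
        costs[vehicle_id] += cost  # y-axis: total maintenance cost
    return {v: (counts[v], costs[v]) for v in counts}
```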
Figure 17.
Boxplots confirming that older vehicles are more likely to incur higher failure frequency and higher maintenance costs.
Figure 18.
Quantile-based DMG matrices illustrating risk segmentation across Large, Midsize, and Small vehicle groups.
Figure 19.
Structure of the decision-making grid (DMG), showing the nine zones formed by combining failure frequency and cost, which define the low–low to high–high risk categories used throughout the analysis. Green indicates low risk, yellow medium risk, and red high risk based on failure frequency and maintenance cost.
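The nine zones follow from cutting each axis into three tiers. Figure 18 refers to quantile-based thresholds. The sketch below assumes tertile cut points computed with Python's `statistics.quantiles` (default "exclusive" method), which may differ from the percentiles the authors used. It labels zones as frequency-cost pairs and does not reproduce the green/yellow/red mapping.

```python
import statistics
from typing import Dict, List, Tuple

TIERS = ("low", "mid", "high")


def tier(value: float, cuts: List[float]) -> str:
    """Place a value into low/mid/high using two ascending cut points."""
    if value <= cuts[0]:
        return TIERS[0]
    if value <= cuts[1]:
        return TIERS[1]
    return TIERS[2]


def dmg_zones(coords: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Label each vehicle 'frequencyTier-costTier' from (frequency, cost) pairs,
    using tertile cut points (an assumed, quantile-based thresholding rule)."""
    if len(coords) < 3:
        raise ValueError("need at least three vehicles to form tertiles")
    freq_cuts = statistics.quantiles([f for f, _ in coords.values()], n=3)
    cost_cuts = statistics.quantiles([c for _, c in coords.values()], n=3)
    return {
        vehicle: f"{tier(f, freq_cuts)}-{tier(c, cost_cuts)}"
        for vehicle, (f, c) in coords.items()
    }
```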
Figure 20.
Boxplot for the age of vehicles according to DMG zones.
Figure 21.
Weibull curves illustrating accelerated wear-out for Large/Midsize vehicles and diffuse failures for the Small category.
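The contrast in Figure 21 follows from the Weibull shape parameter β. The hazard h(t) = (β/η)(t/η)^(β−1) rises with age whenever β > 1, and it rises far more steeply for large β. The sketch below evaluates the hazard and reliability functions. The scale η = 10 years is a hypothetical value chosen only for illustration; the two shape values are the Table 3 means for Large and Small vehicles. The hazard ratio between years 9 and 7 equals (9/7)^(β−1) and does not depend on η. It is roughly 12.8 for β = 11.15 versus about 1.4 for β = 2.37.

```python
import math


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Probability of surviving beyond t: R(t) = exp(-(t / eta) ** beta)."""
    if t < 0 or beta <= 0 or eta <= 0:
        raise ValueError("t must be non-negative; beta and eta positive")
    return math.exp(-((t / eta) ** beta))


# Hypothetical scale eta = 10 years; shapes are the Table 3 mean estimates.
for beta in (11.15, 2.37):
    ratio = weibull_hazard(9.0, beta, 10.0) / weibull_hazard(7.0, beta, 10.0)
    print(f"beta={beta}: hazard at year 9 is {ratio:.1f}x the hazard at year 7")
```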
Figure 22.
Kaplan–Meier survival curves for vehicle categories, showing that Small vehicles exhibit the fastest decline in survival probability, while Large vehicles fail sharply between years 8 and 10. The shaded areas represent the confidence intervals of the survival estimates.
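Figure 22 is based on the Kaplan–Meier product-limit estimator, S(t) = ∏ over t_i ≤ t of (1 − d_i/n_i). Here d_i is the number of failures at time t_i and n_i is the number of vehicles still at risk. A plain-Python sketch follows. It omits the confidence bands shown as shaded areas (usually obtained from Greenwood's variance formula), and the toy durations are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) at each distinct failure time.

    durations: time to failure or censoring; events: 1 = failure observed, 0 = censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = removed = 0
        # Group every observation (failed or censored) that shares time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures:
            survival *= 1 - failures / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # failed and censored units leave the risk set
    return curve
```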
Figure 23.
Cumulative explained variance curve indicating that approximately 10 principal components capture around 80% of the total variance, marking the optimal dimensionality threshold based on the elbow method.
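Reading Figure 23 amounts to finding the smallest number of leading components whose cumulative share of variance reaches a target. The sketch below applies a fixed 80% cut-off, which simplifies the elbow reading described in the caption. The eigenvalues in the example are hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def n_components_for(explained_variance: Sequence[float], threshold: float = 0.80) -> int:
    """Smallest number of leading components whose cumulative share of
    total variance reaches `threshold`."""
    total = sum(explained_variance)
    if total <= 0:
        raise ValueError("explained variances must sum to a positive value")
    # Sort descending so components are taken in PCA order.
    ordered = sorted(explained_variance, reverse=True)
    for n, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return n
    return len(ordered)
```

For example, hypothetical eigenvalues `[4, 3, 2, 1]` reach 80% of the total variance with three components.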
Figure 24.
Accuracy and F1-score outcomes showing strong class-imbalance effects, with only SVC (linear kernel) reaching moderate performance levels.
Figure 25.
Model performance comparison showing that, after removing non-actionable predictors, ensemble methods (RandomForest, ExtraTrees, Bagging) achieve the highest accuracy and F1-score, confirming their robustness under the reduced feature set.
Figure 26.
Decision-making grids for Large, Midsize, and Small vehicles under the largest-gap method, illustrating large empty regions in the matrix and insufficient data density, particularly for the Small category.
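The largest-gap method is not defined in this section. One plausible reading, assumed here, places each tier boundary at the midpoint of one of the widest gaps between consecutive sorted values. Because the boundaries land wherever the data happen to be sparse, the resulting tiers can be very unevenly populated. This is consistent with the empty regions visible in Figure 26. The values in the example are hypothetical.

```python
from typing import List, Sequence


def largest_gap_cuts(values: Sequence[float], n_cuts: int = 2) -> List[float]:
    """Place n_cuts boundaries at the midpoints of the widest gaps between
    consecutive sorted distinct values (assumed reading of the method)."""
    distinct = sorted(set(values))
    if len(distinct) <= n_cuts:
        raise ValueError("not enough distinct values for the requested cuts")
    gaps = [(distinct[i + 1] - distinct[i], i) for i in range(len(distinct) - 1)]
    # Widest gaps first; ties are broken by position for determinism.
    widest = sorted(gaps, key=lambda g: (-g[0], g[1]))[:n_cuts]
    return sorted((distinct[i] + distinct[i + 1]) / 2 for _, i in widest)
```

For instance, `largest_gap_cuts([1, 2, 3, 10, 11, 12, 30, 31])` returns `[6.5, 21.0]`.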
Figure 27.
Hybrid DMG segmentation combining percentile-based thresholds and largest-gap boundaries to produce more balanced and stable zone definitions across vehicle categories.
Figure 28.
Accuracy and F1-score comparison showing that, under the hybrid DMG segmentation with augmented and SMOTE-balanced data, ensemble models (RandomForest, ExtraTrees, Bagging) achieve the highest predictive performance, reaching up to 93% accuracy and a 90% F1-score.
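Figure 28 refers to SMOTE-balanced data. The core SMOTE step places each synthetic minority sample on the line segment between a minority sample and one of its k nearest minority-class neighbors. The plain-Python sketch below shows only that interpolation. In practice the `SMOTE` class from imbalanced-learn (`fit_resample`) is the usual tool, and oversampling is normally applied to the training split only. The value k = 5 mirrors imbalanced-learn's default but is an assumption about the authors' setting.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def smote(minority: Sequence[Vector], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create synthetic minority-class samples by interpolating between a
    random minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to `base`.
        others = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda q: sum((a - b) ** 2 for a, b in zip(base, q)),
        )
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic
```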
Figure 29.
Model accuracy and F1-score obtained using only actionable predictors, showing that ensemble methods (ExtraTrees, RandomForest, Bagging) maintain high performance—up to 86% accuracy—substantially outperforming the percentile-based DMG segmentation.
Figure 30.
SHAP-derived feature importance showing cost variables as dominant and behavioral metrics as secondary contributors.
Table 1.
Chi-square results.
| Segment | Chi-Square | p-Value | Degrees of Freedom |
|---|---|---|---|
| Large | 22 | 0.0786 | 14 |
| Midsize | 16 | 0.1816 | 12 |
| Small | 72 | 0.0686 | 56 |
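Table 1 reports a Pearson chi-square statistic with its degrees of freedom and p-value. The sketch below computes the statistic and the (r − 1)(c − 1) degrees of freedom for an r × c contingency table. Every degrees-of-freedom value in Table 1 is even, so the sketch also uses the exact closed-form upper tail for even dof: P(X ≥ x) = e^(−x/2) × Σ_{i=0}^{dof/2 − 1} (x/2)^i / i!. For example, x = 22 with 14 dof gives p ≈ 0.0786, matching the Large row. The contingency tables behind Table 1 are not given in this section, so any table passed in here is illustrative. `scipy.stats.chi2_contingency` is the usual library route.

```python
import math
from typing import Sequence, Tuple


def chi2_independence(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    (all row and column totals must be positive)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf_even_dof(x: float, dof: int) -> float:
    """Upper-tail p-value P(X >= x) for a chi-square variable with an even
    number of degrees of freedom, via the closed-form Poisson sum."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive even dof")
    half = x / 2
    term, acc = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i  # (x/2)^i / i!
        acc += term
    return math.exp(-half) * acc
```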
Table 2.
Average vehicle age by failure-frequency tier and failure-cost tier, with chi-square results, for each vehicle tonnage segment.
| Vehicle Category | Large | Midsize | Small |
|---|---|---|---|
| Frequency tier (average age) | | | |
| Low | 8.40 | 9.50 | 5.20 |
| Mid | 7.70 | 9.33 | 5.52 |
| High | 9.00 | 8.71 | 6.28 |
| Chi-square (frequency) | 4.64 | 5.74 | 25.5 |
| p-value | 0.33 | 0.22 | 0.06 |
| Cost tier (average age) | | | |
| Low | 7.50 | 9.86 | 5.30 |
| Mid | 8.30 | 9.00 | 4.68 |
| High | 9.25 | 8.71 | 6.90 |
| Chi-square (cost) | 5.88 | 5.65 | 25.9 |
| p-value | 0.21 | 0.23 | 0.0553 |
Table 3.
Weibull Beta parameter—bootstrap results by vehicle category.
| Vehicle Category | Mean_Beta | Std_Beta | Min_Beta | Max_Beta |
|---|---|---|---|---|
| Large | 11.15 | 4.70 | 6.54 | 43.77 |
| Small | 2.37 | 0.25 | 1.84 | 3.69 |
| Midsize | 9.87 | 4.02 | 7.76 | 89.15 |
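Table 3 summarizes bootstrap replicates of the Weibull shape parameter β. The fitting method is not stated in this section. As a simple stand-in for maximum likelihood, the sketch below assumes median-rank regression: β is the slope of ln(−ln(1 − F_i)) on ln t_i, with Bernard's approximation F_i = (i − 0.3)/(n + 0.4). It resamples failure ages with replacement and reports the same four summary columns. Function names and the resample count are hypothetical.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence


def weibull_beta_rank_regression(times: Sequence[float]) -> float:
    """Weibull shape beta via median-rank regression (an assumed fitting method):
    ln(-ln(1 - F_i)) = beta * ln(t_i) - beta * ln(eta), with Bernard's ranks."""
    t = sorted(times)
    n = len(t)
    if n < 2 or t[0] <= 0:
        raise ValueError("need at least two positive failure times")
    xs = [math.log(v) for v in t]
    ys = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all failure times are identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx  # least-squares slope = beta


def bootstrap_beta(times: Sequence[float], n_boot: int = 1000, seed: int = 0) -> Dict[str, float]:
    """Resample failure times with replacement and summarize the beta estimates."""
    if len(set(times)) < 2 or n_boot < 2:
        raise ValueError("need two distinct failure times and n_boot >= 2")
    rng = random.Random(seed)
    betas: List[float] = []
    while len(betas) < n_boot:
        sample = rng.choices(times, k=len(times))
        try:
            betas.append(weibull_beta_rank_regression(sample))
        except ValueError:
            continue  # skip degenerate resamples (all values identical)
    return {
        "Mean_Beta": statistics.fmean(betas),
        "Std_Beta": statistics.stdev(betas),
        "Min_Beta": min(betas),
        "Max_Beta": max(betas),
    }
```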
Table 4.
Summary of model performance metrics.
| Rank | Model | Accuracy | F1-Score | Key Notes |
|---|---|---|---|---|
| 1 | SVC (Linear Kernel) | 0.67 | 0.74 | Best-performing model |
| 2 | XGBoost/ExtraTrees | 0.58 | 0.61 | Next best performing model |
| 3 | RandomForest | 0.50 | 0.54 | Moderate performance |
| 4 | Logistic Regression/Naive Bayes/Bagging | 0.42 | 0.42 | Limited ability to model this dataset |
| 5 | KNN/GradientBoosting/MLP | 0.33 | 0.37 | Underfitting |
| 6 | Nystroem+SVC | 0.25 | 0.1 | Kernel approximation ineffective here |
| 7 | AdaBoost/DecisionTree | 0.17 | 0.1 | Poor generalization |
| 8 | SVC (RBF Kernel)/Fourier + LogisticRegression | 0.08 | 0.06 | Lowest performance among all models |
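Tables 4 to 7 report accuracy and F1-score, but this section does not state how F1 was averaged across DMG classes. With imbalanced classes, macro averaging (each class counts equally) and support-weighted averaging can differ noticeably, so the sketch below returns both. It follows the definitions behind scikit-learn's `f1_score` with `average="macro"` and `average="weighted"`. The labels in the example are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float, float]:
    """Return (accuracy, macro-F1, support-weighted F1) for multiclass labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred), key=str)
    support = Counter(y_true)
    f1_per_class = {}
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1_per_class[lab] = 2 * tp / denom if denom else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro = sum(f1_per_class.values()) / len(labels)
    weighted = sum(f1_per_class[lab] * support[lab] for lab in labels) / len(y_true)
    return accuracy, macro, weighted
```

For `y_true = ["A", "A", "A", "B"]` and `y_pred = ["A", "A", "B", "B"]`, this gives accuracy 0.75, macro-F1 of about 0.733, and weighted F1 of about 0.767.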
Table 5.
Summary of model performance metrics.
| Rank | Model | Accuracy | F1-Score | Key Notes |
|---|---|---|---|---|
| 1 | RandomForest/ExtraTrees | 0.58 | 0.55 | Best-performing models |
| 2 | AdaBoost/Naive Bayes/Bagging/XGBoost | 0.47 | 0.35/0.43/0.42/0.44 | Moderate performance |
| 3 | DecisionTree | 0.42 | 0.37 | Limited generalization |
| 4 | Logistic Regression/SVC (Linear Kernel)/Fourier + LogisticRegression | 0.32 | 0.22/0.24/0.3 | Hard to capture nonlinear patterns |
| 5 | KNN | 0.26 | 0.21 | Low predictive stability |
| 6 | SVC (RBF Kernel)/MLP | 0.16/0.11 | 0.08/0.10 | Underfitting |
Table 6.
Summary of model performance metrics.
| Rank | Model | Accuracy | F1-Score | Key Notes |
|---|---|---|---|---|
| 1 | RandomForest/ExtraTrees/Bagging | 0.93 | 0.90 | Best-performing models |
| 2 | Logistic Regression/Naive Bayes/DecisionTree/XGBoost | 0.87 | 0.87/0.87/0.84/0.88 | High overall performance |
| 3 | GradientBoosting | 0.80 | 0.83 | Solid performance |
| 4 | KNN | 0.73 | 0.79 | Good performance |
| 5 | SVC (RBF Kernel)/MLP/SVC (Linear Kernel) | 0.67 | 0.74 | Moderate performance |
| 6 | Fourier + LogisticRegression | 0.60 | 0.66 | Kernel approximation provides limited gains |
| 7 | AdaBoost | 0.13 | 0.08 | Underfitting |
Table 7.
Summary of model performance metrics.
| Rank | Model | Accuracy | F1-Score | Key Notes |
|---|---|---|---|---|
| 1 | ExtraTrees/Nystroem + SVC | 0.86 | 0.79 | Best-performing models |
| 2 | RandomForest/Logistic Regression/KNN/Naive Bayes/DecisionTree/XGBoost/GradientBoosting/Bagging/SVC (Linear Kernel) | 0.79 | 0.70–0.80 | Broad group of well-performing models; stable generalization |
| 3 | MLP/SVC (RBF Kernel) | 0.71 | 0.68 | Moderate performance |
| 4 | AdaBoost | 0.57 | 0.55 | Underfitting |
| 5 | Fourier + LogisticRegression | 0.50 | 0.47 | Kernel approximation provides modest predictive power |