1. Introduction
The average highway mileage of the three countries with the largest total highway networks reached 6 million kilometres in 2024 [1]. With highway mileage continuously increasing, maintenance is playing a more critical role than ever. Some countries, such as the US and Denmark, have already allocated a significant part of their road-related budgets to maintenance [2].
Considering pavement maintenance, the evaluation of pavement condition is a critical component of infrastructure management, necessitating robust assessment techniques. Two primary methodologies are utilised: destructive testing, such as coring, and non-destructive testing (NDT), exemplified by the Falling Weight Deflectometer (FWD) or the Traffic Speed Deflectometer (TSD) [3]. Coring involves the extraction of cylindrical pavement samples to directly characterise material properties, including compressive strength and asphalt content, as well as structural attributes, such as layer thickness and interlayer bonding. Despite its precision, this method causes permanent structural damage, requiring subsequent repairs that elevate costs and disrupt traffic flow. Conversely, NDT assesses pavement condition without compromising structural integrity, offering greater efficiency and the ability to cover larger areas than destructive methods. The FWD, a widely adopted NDT technique, generates a dynamic load pulse by dropping a calibrated weight onto a 300 mm diameter circular load plate, inducing vertical deformations that form a deflection basin (Figure 1) [4]. These deformations are measured using geophones positioned at the load centre and at multiple radial distances, enabling rapid data acquisition over extensive pavement sections. The collected deflection data facilitate the prediction of pavement layer moduli and thicknesses through back-calculation techniques [5].
Back-calculation employs an iterative numerical optimisation process to determine pavement properties. This process begins with an initial estimate of pavement properties, followed by a forward calculation to simulate the FWD test and compute the predicted deflection response. The computed response is then compared to the actual FWD deflection data, yielding a value of an objective function that quantifies the agreement between estimated properties and observed measurements. Through an optimisation procedure, the estimated properties are iteratively adjusted to minimise the objective function, refining the accuracy of the pavement property predictions [6]. Compared to coring, which is labour-intensive and limited to discrete locations, the FWD enables efficient, large-scale assessments with minimal disruption. While coring provides precise, localised material data for detailed forensic analysis, NDT methods like the FWD preserve pavement integrity, making them ideal for routine monitoring and broad structural evaluations. The selection of an appropriate method depends on the project objectives, balancing the need for detailed material characterisation against the advantages of rapid, non-invasive assessment for effective pavement management.
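For illustration, the sketch below pairs a toy surrogate forward model (a stand-in for a layered-elastic solver such as JPav or BISAR; its decay constants and moduli are purely illustrative assumptions, not the study's method) with a SciPy optimiser that iteratively minimises the RMS misfit between computed and measured deflections, mirroring the loop described above.

```python
import numpy as np
from scipy.optimize import minimize

OFFSETS = np.array([0.0, 0.2, 0.3, 0.45, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1])  # geophone offsets, m

def forward_deflections(moduli, offsets=OFFSETS):
    """Toy stand-in for a layered-elastic solver (JPav, BISAR, ...).

    Each term mimics one layer's contribution: stiffer layers deflect less,
    and deeper layers dominate the outer geophones. Purely illustrative.
    """
    e_as, e_gr, e_su = moduli  # layer moduli, MPa
    return (30.0 / e_as * np.exp(-offsets / 0.3)
            + 100.0 / e_gr * np.exp(-offsets / 0.9)
            + 500.0 / e_su * np.exp(-offsets / 2.0))

# Synthetic "FWD record" generated from assumed true moduli.
measured = forward_deflections(np.array([4000.0, 250.0, 80.0]))

def objective(log_moduli):
    # Optimising log-moduli keeps the estimates positive and well scaled.
    predicted = forward_deflections(np.exp(log_moduli))
    return np.sqrt(np.mean((predicted - measured) ** 2))  # RMS misfit

x0 = np.log([2000.0, 400.0, 120.0])  # initial estimate of (Eas, Egr, Esu)
result = minimize(objective, x0, method="Nelder-Mead")
print("Back-calculated moduli [MPa]:", np.exp(result.x).round(1))
```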
Various programmes are available to analyse pavement structures, such as ELSYM5, WESLEA, and BISAR, as well as back-calculation tools like ELMOD. In static analysis, the peak load and deflection are used to calculate the thickness and moduli of pavement layers. In contrast, dynamic analyses such as DBALM, FEM, and 3D-Move [7,8,9] use the recorded force–time functions directly to predict the required results. However, based on research by Tarefder [10], the results from different back-calculation software packages differ, and the final output can deviate from laboratory results because of the underlying calculation algorithm.
Unlike traditional back-calculation analytical algorithms, machine learning (ML) is a popular tool for analysing similar tasks [11,12,13,14]. The Artificial Neural Network (ANN) is one of the most popular analysis tools, allowing reasonable predictions without reference to the physical phenomena underlying the analysed problems [15,16,17]. Pure ANN and hybrid models, e.g., ANN combined with a Genetic Algorithm, have been utilised to predict pavement moduli. The results showed that ANN models improved prediction compared to traditional back-calculation software, and the hybrid ANN model showed stronger generalisation ability than the traditional ANN [18,19,20].
Ensemble models also include popular ML models, e.g., RF (Random Forest) and GBM (Gradient Boosting Machines), which are commonly used to predict pavement properties. Sudyka et al. [21] used RF, ANN, and BT (Bagged Trees) models to predict asphalt layer temperature based on data obtained from FWD and TSD. The results showed that all models had good prediction accuracy, with R² values exceeding 0.8. Worthey et al. [22] predicted the dynamic modulus of asphalt mixtures with a Bagged Trees ensemble model, which exhibited significantly better prediction accuracy for this property than some ANN models.
One of the ensemble models, XGBoost, performs better than other tree boosting models in practice [23]. Despite this advantage, there is a lack of literature on back-calculation in specific FWD applications, although its performance has been proven in many other pavement analysis applications. Wang et al. [24] found that its performance in predicting the International Roughness Index of rigid pavements surpassed that of other ensemble models. In addition, Ali et al. [25] utilised XGBoost to predict the dynamic modulus of asphalt concrete mixtures, with results that significantly outperformed some well-known regression models, including Witczak, Hirsch, and Al-Khateeb. Ahmed [26] built both RF and XGBoost models to predict pavement structural conditions based on data derived from the Long-Term Pavement Performance (LTPP) programme. The results demonstrated that XGBoost outperformed RF and had practical advantages over empirical equations. Zhu [27] compared ANN, SVM, KNN, and a combined RF–XGBoost model to propose a pavement maintenance decision model, and found that the combined RF–XGBoost model achieves a classification accuracy of 93.1%, surpassing the other ML models.
In short, ML models, e.g., ANN and ensemble models, are increasingly utilised for analysing pavement properties, offering advantages over traditional methods. One of the ensemble models, XGBoost, delivers better performance than other tree boosting models and, in some cases, outperforms the ANN. However, there is little research on FWD back-calculation using XGBoost. Therefore, this study focuses on FWD back-calculation predictions with XGBoost models.
3. Database Generation and Its Analyses
In order to generate a database, i.e., sets of input data (all parameters of the layered structure, such as layer thicknesses and their material properties) and output data in the form of deflections at 10 distinguished points (as in the FWD test), a boundary value problem was formulated as in Figure 1. This is an axisymmetric problem of an infinite layered half-space symmetrically loaded on a circular area of radius a, with a load of 40 kN, corresponding to half of the 80 kN standard axle load.
A database for training the machine learning models was generated using the JPav programme, developed by one of the authors. JPav is static analytical software that calculates the deflection at a given point from the properties of the pavement layers.
The programme allows the stresses/strains of a multi-layered pavement with linear-elastic, isotropic behaviour to be calculated by applying the biharmonic function proposed by Burmister [28]. The stresses, strains, and displacements are calculated by substituting the biharmonic function into the equations of elasticity theory. The programme reproduces the results provided by BISAR and other primary programmes developed to calculate stresses, strains, and displacements in multi-layered pavements. The programme was developed to calculate the vertical displacement on the theoretical basis presented in Appendix A; see also [29]. In the interpretation of FWD measurements, it is considered valid to employ static layered-elastic analysis and to perform a back-calculation of pavement layer moduli, provided that the resulting parameters remain consistent with the mechanistic–empirical design framework [30].
The pavement layer properties comprise the thickness (Has) and modulus (Eas) of the asphalt layer, the thickness (Hgr) and modulus (Egr) of the granular layer, the thickness (Hsu) and modulus (Esu) of the subgrade layer, and the modulus (Eil) of the infinite subgrade layer. The deflections at ten measuring points, d1 to d10, are the output of the software; the distances of the measuring points from the load centre are 0, 0.2 m, 0.3 m, 0.45 m, 0.6 m, 0.9 m, 1.2 m, 1.5 m, 1.8 m, and 2.1 m, respectively.
JPav requires at least 11 input parameters to obtain results. The four constant inputs are the number of loads, the magnitude of the load (kN), the radius of the circular contact area (m), and the Poisson's ratio of each layer, set at 1, 40, 0.15, and 0.35, respectively. To obtain a continuous database for later analysis, the remaining inputs, i.e., the properties of each layer, were generated using the random uniform function, and the number of decimal places for each value was determined by the corresponding interval value, as shown in Table 1. In total, 65,824 datasets were randomly generated (see the flowchart in Figure 2), consisting of the input data (geometrical and mechanical parameters of the structure) and output data (deflection values at points d1–d10).
This large dataset allows for the effective training and validation of the developed models. The limits adopted for thickness and stiffness are typical values found in pavements. In the field, any combination of these thickness and stiffness values of the pavement layers can occur; therefore, the cases included in the database are valid for characterising a pavement.
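A minimal sketch of this sampling step is given below; the bounds shown are assumed for illustration (the actual intervals are those of Table 1, except the Hsu range, which is quoted in Section 5.4), and numpy's uniform generator plays the role of the random uniform function.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 65_824  # number of generated structures

ranges = {                 # (low, high) bounds; values assumed for illustration
    "Has": (0.05, 0.40),   # asphalt thickness, m
    "Hgr": (0.10, 0.60),   # granular thickness, m
    "Hsu": (0.25, 10.0),   # subgrade thickness, m (range quoted in Section 5.4)
    "Eas": (1500, 20000),  # asphalt modulus, MPa
    "Egr": (100, 800),     # granular modulus, MPa
    "Esu": (20, 400),      # subgrade modulus, MPa
    "Eil": (20, 400),      # infinite-layer modulus, MPa
}
inputs = {k: rng.uniform(lo, hi, N) for k, (lo, hi) in ranges.items()}
# Each sampled row would then be passed to JPav to compute deflections d1-d10.
```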
Considering a continuous and even distribution of the thickness and Young's modulus of each layer within the above ranges, the database was generated using JPav. The distribution of the database's input parameters is presented in Figure 3.
It should be noted that the randomly chosen geometric parameters (i.e., layer thicknesses), as well as the mechanical parameters (layer stiffness moduli), together define the pavement structure as having a specific global stiffness (compliance). To evaluate the generated database, the compliance of individual pavement layers and the total compliance of the layers (excluding the native soil layer of infinite thickness) were assessed. The compliance was assessed assuming a uniaxial stress state (compression) in a layer of isotropic elastic material, following the formulas resulting from the theory of elasticity [31,32].
where i = as, gr, su, il; Hi represents the thickness of a single layer; Ei is the Young's modulus of that layer; and ν is Poisson's ratio, herein ν = 0.35. The summary compliance can be calculated using the following sum:

SC = Σi Ci  (2)
Assuming that, at some depth H = 15 m − Has − Hgr − Hsu, there is a rigid layer top surface, the total compliance may be determined based on a displacement boundary condition. In this type of compliance estimation, the local nature of the load in the FWD test is not taken into account. Therefore, an estimation of the (bending) compliance based on Kirchhoff's thin plate theory was also proposed [33] to determine the compliance of each layer according to the following formula:
The summary bending compliance (SBC) can be determined analogously to the axial compliance (see Equation (2) and the assumptions presented below this equation). The actual compliance in the FWD problem lies somewhere between SC and SBC.
Following the equations above for the analysed database, the resulting compliance distributions are shown in Figure 4. The dataset shows a predominance of layers with lower compliance, indicating higher stiffness in most simulated structures.
The compliance of individual layers contributes to the overall compliance of the pavement structure.
Figure 5 presents histograms of the total compliances, SBC and SC. These histograms do not conform to normal distributions but resemble gamma distributions. The normality of the distributions was checked using the Anderson–Darling criterion, and the fit to the gamma distribution was also assessed (see also Appendix B). Comparing the distributions of total compliance with those of individual layers reveals a shift in their characteristics. The database predominantly contains cases with medium compliance, with fewer cases exhibiting low compliance. Cases with high compliance are also numerous.
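The checks can be reproduced with SciPy along the following lines; the gamma-distributed stand-in data and the Kolmogorov–Smirnov goodness-of-fit step are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
total_compliance = rng.gamma(shape=2.5, scale=1.0, size=60_000)  # stand-in data

# Anderson-Darling test for normality: reject normality when the
# statistic exceeds the critical value at the chosen significance level.
ad = stats.anderson(total_compliance, dist="norm")
print(f"A^2 = {ad.statistic:.2f}, 5% critical value = {ad.critical_values[2]:.2f}")

# Maximum-likelihood gamma fit, then a goodness-of-fit check.
shape, loc, scale = stats.gamma.fit(total_compliance)
ks = stats.kstest(total_compliance, "gamma", args=(shape, loc, scale))
print(f"Gamma fit: shape={shape:.2f}, scale={scale:.2f}, KS p-value={ks.pvalue:.3f}")
```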
The section above presents the analysis of the input data for the ML algorithm. The data were processed using JPav to generate the output dataset, comprising ten deflection points (d1–d10) that define the deflection basin. The output data underwent statistical evaluation, with the results presented in Table 2. Additionally, multicollinearity assessment is essential for high-dimensional datasets. Elevated multicollinearity within a dataset complicates the isolation of the individual effects of correlated features on the target variable, potentially compromising the interpretability and reliability of predictive models [34]. A Pearson correlation matrix was computed, with the results visualised as a heatmap in Figure 6. To facilitate analysis across varying distances, each distance (r) was normalised by the radius of the loading plate (a), as described below:

r̄ = r/a
where r̄ is the normalised distance, a is the radius of loading (herein a = 0.15 m), and r is the radial distance of the measuring point from the axis of symmetry, in m.
Analysis of the correlation matrix heatmap (Figure 6) reveals a strong correlation between neighbouring measurement points within the range d1 to d5. Beyond this range, deflections correlate strongly not only with adjacent points but also with two or three subsequent points. Consequently, reducing the number of measurement points in the range d6 to d10 (for r̄ ≥ 6) is unlikely to significantly affect the accuracy of the data used for back-analysis in interpreting FWD test results.
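A compact sketch of this computation, using pandas and matplotlib with stand-in data in place of the generated deflections, is shown below.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in frame; in the study, `df` holds the deflection columns d1..d10.
df = pd.DataFrame(np.random.default_rng(1).random((1000, 10)),
                  columns=[f"d{i}" for i in range(1, 11)])

corr = df.corr(method="pearson")           # Pearson correlation matrix
fig, ax = plt.subplots()
im = ax.imshow(corr.values, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
plt.show()
```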
Table 2 summarises the average, maximum, and minimum deflection values, which characterise a typical deflection basin. All skewness values are positive, indicating right-skewed distributions. Kurtosis values are less than 3 (the kurtosis of a normal distribution), indicating platykurtic distributions with fewer outliers. This is confirmed by the absence of outliers within the homogeneous interval [Q1 − 1.5·IQR; Q3 + 1.5·IQR]. Approximately 3% of the data were identified as outliers and removed from the dataset prior to training the machine learning models. After removing these outliers, the dataset contained 60,687 data points.
As can be seen from the generated database, cases identified as outliers were removed. At this stage, it is difficult to assess the precise impact of this removal on the predictive performance of the trained machine-learning models. Accordingly, after the models were developed, a subsequent verification was carried out, as presented in Appendix C, to evaluate the effect of outlier removal. This was achieved by comparing the performance of models trained on the full dataset (including outliers) with those trained on the cleaned dataset.
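A minimal sketch of such an IQR-based screen is given below; the column-wise rule and the function name are illustrative, not the exact implementation used in the study.

```python
import pandas as pd

def remove_iqr_outliers(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Drop rows falling outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] in any column."""
    mask = pd.Series(True, index=df.index)
    for c in cols:
        q1, q3 = df[c].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[c].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[mask]

# clean = remove_iqr_outliers(data, [f"d{i}" for i in range(1, 11)])
# In the study, this kind of screen removed ~3% of rows, leaving 60,687 samples.
```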
5. ML Models Evaluation
5.1. Model Training with the Whole Dataset
Based on the preparation in the previous section, models were trained using XGBoost (version 2.1.4) [42]. Their performance was evaluated using the coefficient of determination (R²), the root mean square error (RMSE), and the Residual Variance (RV), with the results presented in Table 5.
The results indicate that the models predicting Has, Hsu, and Esu achieved strong performance, with R² values exceeding 0.8. However, the models for Hgr and Eil exhibited poor performance, with R² values below 0.1.
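A minimal sketch of the per-target training and evaluation loop is shown below; the train/test split and hyperparameter values are illustrative assumptions rather than the tuned settings of the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error
from xgboost import XGBRegressor

def train_one_target(X, y):
    """Train one XGBoost regressor and report R2, RMSE, and residual variance."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = XGBRegressor(n_estimators=500, max_depth=6, learning_rate=0.05)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    r2 = r2_score(y_te, pred)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    rv = np.var(np.asarray(y_te) - pred)   # residual variance (RV)
    return model, r2, rmse, rv

# One model per target property:
# results = {t: train_one_target(X, targets[t])
#            for t in ["Has", "Hgr", "Hsu", "Eas", "Egr", "Esu", "Eil"]}
```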
To interpret these models, feature importance (FI) and SHAP violin plots were employed, as detailed below. FI quantifies the influence of each input feature on the model's predictions [43], as presented in Figure 7.
Figure 7 illustrates the feature importance calculated using the Gain metric for each target. The results reveal that deflections at positions d1, d6, and d10 exert the most significant influence on the prediction of Has, Esu, and Hsu, respectively, with importance values exceeding 0.5. Notably, d10 also demonstrates a substantial impact on the prediction of Eil. For the prediction of Eas, deflections at d1 and d2 exhibit greater influence compared to other features, with importance values of approximately 0.2. In contrast, the remaining features show relatively minor influence on predictive performance, with importance values generally remaining below 0.2.
To elucidate the contribution of each feature to the model, SHAP (SHapley Additive exPlanations) violin plots were employed. These plots visualise the SHAP values, which quantify the contribution of each predictor to the model's output. The violin plots illustrate the distribution of these contributions, highlighting the magnitude and direction (positive or negative) of each feature's impact on predictions. The SHAP violin plots are presented in Figure 8.
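A sketch of the SHAP computation is given below, reusing the model and test-split names from the previous sketch (assumptions carried over); TreeExplainer is the standard SHAP choice for tree ensembles such as XGBoost.

```python
import shap

# `model` and `X_te` are assumed from the training sketch above.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)          # per-sample contributions, shape (n, 10)
shap.summary_plot(shap_values, X_te, plot_type="violin")
```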
Analysis of the SHAP violin plots reveals distinct dominant features for the various target variables. Specifically, for the targets Has, Hgr, Hsu, Eas, Egr, Esu, and Eil, the corresponding dominant features are the deflections at positions d1, d10, d10, d1 and d2, all positions, d6 and d10, and d10, respectively. Examination of Figure 7 and Figure 8 indicates that models exhibiting high predictive performance, such as those for Has, Hsu, and Eas, are characterised by a limited number of dominant features. For instance, in the violin plot for Has (Figure 8a), the deflection at d1 emerges as the dominant feature, demonstrating a consistent influence on predictions, as evidenced by its broader span along the x-axis compared to other features. Additionally, certain features exhibit minimal importance in the predictive tasks, prompting consideration of feature reduction techniques, which are discussed in subsequent sections. In Figure 9, the residual plots for each target feature are presented.
5.2. Model Training with PCA
Principal Component Analysis (PCA) is a widely utilised dimensionality reduction technique that identifies a reduced set of orthogonal features, or principal components, capable of representing the original dataset in a lower-dimensional subspace while minimising information loss [44]. The correlation analysis detailed in Section 3, coupled with the extensive dataset comprising 60,687 samples, indicates a highly correlated and voluminous database, rendering it well-suited for PCA.
During the data pre-processing phase, the methodology closely followed the procedures outlined previously, with the sole distinction occurring after data scaling. At this stage, the PCA transformation was applied to the entire dataset. The explained variance ratio of each principal component is illustrated in Figure 10. To ensure that 99% of the variance is accounted for, three principal components (PC1, PC2, and PC3) were selected for model training. These components were subsequently utilised to train the predictive models. The results of the trained models are presented in Figure 11.
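A sketch of this pre-processing step is shown below; passing n_components=0.99 to scikit-learn's PCA retains the smallest number of leading components explaining 99% of the variance (three here, per Figure 10).

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardise the ten deflections, then project onto the leading components.
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.99))
X_pca = pca_pipeline.fit_transform(X)   # X: (n_samples, 10) deflection matrix
print(pca_pipeline.named_steps["pca"].explained_variance_ratio_)
```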
Subsequently, XGBoost regression models were trained on the whole dataset using the same hyperparameters as those applied to the PCA-processed dataset. The performance of the models for each target property was evaluated by R², RMSE, and RV. The results are presented in Table 6 (for the entire dataset) and Figure 11 (for the reduced database after applying the PCA approach).
Based on the results presented in Table 6 and the accompanying figures, the trained models for Has, Hsu, and Esu exhibit satisfactory predictive performance, achieving R² values exceeding 0.8. In contrast, the predictive performance of the remaining models is suboptimal. A comparison with the models trained in Section 5.1 reveals that the models exhibiting both high and poor predictive performance are consistent across datasets pre-processed with and without PCA. However, the predictive capabilities of the models for most properties show a slight decline, as indicated by reductions in R² and increases in RMSE, with the notable exception of the Has model, which maintains nearly equivalent predictive performance. Considering the dimensionality reduction from 10 features to 3, there is a significant speed-up in model training compared to training on the entire feature set; see the results presented in Table 7. All values in the table are means over all models (Has, Hgr, etc.), so the unit of RMSE is omitted.
The table above demonstrates that PCA yields superior computational performance compared to the raw database in terms of optimisation time, fitting time, total time, and model size. Notably, PCA required only approximately 40% of the training time relative to the full dataset, while reducing the model size to 40%.
5.3. Model Training with FDM-like Approach
The Feature Difference Method (FDM) can be used to generate a new database comprising deflection differences between adjacent points, as described, among other sources, in [45]. In the present study, the FDM was not applied directly. Instead, drawing on the available literature and established experience in the interpretation of FWD measurements, a set of features was selected, consisting of differences between the deflections recorded by individual geophones.
Based on previous studies [46,47,48], four variables were employed (SCI, BDI, BCI, and CI), of which SCI (Surface Curvature Index), BDI (Base Damage Index), and BCI (Base Curvature Index) are deflection basin parameters, and the variable d7 − d8 (referred to here as CI) was introduced by Chen [47]. These variables are defined as follows:
SCI = d1 − d3, BDI = d3 − d5, BCI = d5 − d6, CI = d7 − d8

The variables SCI, BDI, BCI, and CI represent the deflection differences between adjacent points, while d1, d3, d5, d6, d7, and d8 correspond to the deflections measured at the 1st, 3rd, 5th, 6th, 7th, and 8th points, respectively.
The use of specific deflections, and particularly of the differences between them, allows the deflection basin to be defined straightforwardly while reducing the amount of data. Typically, the FWD test produces a deflection basin with up to nine points, each representing the deflection measured by a specific sensor. This representation allows the pavement layer moduli to be calculated through back-analysis. If only the condition of the pavement, in terms of the asphalt layer, granular layers, and subgrade, is required, the amount of data defining the deflection basin can be reduced, typically by using the deflection basin parameters.
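With the deflection columns d1–d10 available, the four indices can be computed directly, following the definitions as reconstructed above (a sketch; column names assumed):

```python
import pandas as pd

def basin_indices(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the four basin indices from deflection columns d1..d10."""
    return pd.DataFrame({
        "SCI": df["d1"] - df["d3"],  # Surface Curvature Index
        "BDI": df["d3"] - df["d5"],  # Base Damage Index
        "BCI": df["d5"] - df["d6"],  # Base Curvature Index
        "CI":  df["d7"] - df["d8"],  # Chen's index
    })
```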
The model training process closely mirrored the methodology described previously, with the sole distinction being the substitution of the training database with the variables SCI, BDI, BCI, and CI during the data pre-processing stage. Following model training and prediction, the performance was assessed using R², RMSE, and RV. The results are presented in Table 8 and visualised in Figure 12.
Based on the results presented in Table 8 and the accompanying figures, the predictive performance of the models varies significantly across target properties. Only the model for Has (Figure 12a) achieves satisfactory performance, with an R² value exceeding 0.8. The models for Eas (Figure 12d) and Esu (Figure 12f) approach this threshold, with R² values close to 0.8. However, the remaining models exhibit poor predictive performance. Notably, the models for Hgr (Figure 12b) and Eil (Figure 12g) yield R² values close to zero, indicating an extremely poor fit to the dataset compared to the other trained models.
5.4. Model Assessment, Heteroscedasticity, and Sensitivity Analyses
To facilitate a direct comparison of the predictive models trained using the different pre-processing methods, the R² and RV values are compiled in Table 9.
In summary, a comparative analysis of the models trained in this study reveals that the models utilising the full deflection data from 10 measuring points outperform both the PCA-pre-processed and FDM-pre-processed models. Notably, the models for Hsu and Esu achieve R² values exceeding 0.9, indicating exceptional predictive capability. In contrast, models trained with PCA-pre-processed data exhibit moderate predictive performance, with nearly all models failing to surpass the predictive accuracy of those trained on the full deflection data. However, these PCA-based models display a similar trend in predictive performance, excelling in predictions for Has, Hsu, and Esu while performing poorly for the other properties. Given that the PCA-pre-processed dataset is less than half the size of the full dataset, this approach significantly reduces computational resource demands, particularly for large databases.
Models trained with FDM-pre-processed data demonstrate superior performance in predicting Has, achieving an R² of approximately 0.919, which surpasses all other models, as well as notable performance for Eas. However, their predictive capabilities for the remaining properties are markedly inferior to those of both the whole-dataset and PCA-pre-processed models.
The observed-versus-predicted plots (Figure 9, Figure 10, Figure 11 and Figure 12) reveal that the residual scatter is not consistent across the range of each target variable. In nearly all models, regardless of whether training was performed on raw deflections, PCA components, or FDM indices, the vertical spread of the residuals visibly increases with the predicted values, resulting in a characteristic funnel or fan-out shape. This pattern is particularly pronounced for the best-performing targets (Has, Hsu, Esu), where the highest densities of thick or stiff layers coincide with the largest prediction errors. The systematic increase in residual variance with the magnitude of the predicted parameter clearly indicates the presence of heteroscedasticity. Although XGBoost, a Gradient Boosting Decision Tree (GBDT) algorithm, does not assume homoscedasticity and is generally robust to its presence [49], a formal evaluation remains appropriate. Even after removing all identified outliers, the database comprises approximately 60,000 samples, which can render visual inspections of residual plots sensitive to local point density. Accordingly, a quantitative Decile Variance Check was employed to rigorously assess heteroscedasticity. The results of this test are presented in the tables below.
The decile variance check performed on the trained models (Table 10, Table 11 and Table 12) reveals clear differences in residual behaviour depending on the input representation.
When the models are trained on the full set of ten deflection points, the residual variance remains remarkably stable for most targets: Has, Hgr, Eas, Egr, Esu, and Eil exhibit variance ratios between the lowest and highest deciles of only 1.0–1.3, indicating essentially homoscedastic behaviour. The sole exception is the subgrade thickness Hsu, which exhibits a pronounced heteroscedastic pattern (variance ratio ≈ 4.8), a natural consequence of its very wide range of generation (0.25–10 m), over which absolute prediction errors inevitably increase with the magnitude of the target.
Applying Principal Component Analysis markedly alters this picture. Although the first three components still capture 99% of the variance, the dimensionality reduction introduces or amplifies heteroscedasticity for several parameters. While Has and Hgr remain reasonably stable, Eas, Egr, Esu, and especially Hsu now display variance ratios of 3 to 7, demonstrating that the information discarded in the lower-variance components is particularly important for maintaining uniform predictive precision across stiff or thick structures.
The strongest heteroscedasticity appears with the basin indexes (SCI, BDI, BCI, CI). Even the best-performing target, Has, exhibits a variance ratio exceeding 3, while Eas, Esu, and Hsu reach ratios of 6 to 10. The aggressive compression of the deflection basin into just four engineered parameters causes small variations in extreme or noisy basins to be magnified into substantially larger relative errors, particularly for thicker and stiffer pavement configurations.
Overall, the complete ten-point deflection dataset produces the most homoscedastic residuals and therefore the most stable predictions across the entire range of realistic pavement structures. Both dimensionality-reduction approaches, and especially the use of traditional deflection-basin indexes, systematically increase heteroscedasticity, meaning that any gain in computational speed or simplicity comes at the cost of reduced prediction reliability for thick or stiff layers.
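A minimal sketch of the check is given below: test samples are binned into deciles of the predicted value, the residual variance is computed per bin, and the max/min variance ratio quoted in the tables follows directly (function name illustrative).

```python
import numpy as np
import pandas as pd

def decile_variance_check(y_true, y_pred):
    """Residual variance per decile of the predicted value, plus max/min ratio."""
    resid = pd.Series(np.asarray(y_true) - np.asarray(y_pred))
    deciles = pd.qcut(np.asarray(y_pred), q=10, labels=False, duplicates="drop")
    var_by_decile = resid.groupby(deciles).var()
    ratio = var_by_decile.max() / var_by_decile.min()
    return var_by_decile, ratio  # ratio near 1 suggests homoscedastic residuals
```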
To evaluate the influence of individual input parameters on the predicted targets, a One-at-a-Time (OAT) sensitivity analysis was conducted [50]. A baseline input vector was established using the arithmetic mean of the test dataset. Afterwards, each feature was individually perturbed by increasing and decreasing its raw value by 20% while holding the other variables constant. Crucially, these perturbed inputs were processed through the full Standard Scaler and PCA transformation pipeline before inference, ensuring that the XGBoost model evaluated the physical changes within the correct feature space. The resulting sensitivity was quantified by the "Total Range", defined as the absolute difference between the predictions of the high- and low-perturbation scenarios, and subsequently normalised as a percentage of the total accumulated range to rank the relative importance of each component.
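A sketch of this procedure is given below; `pipeline` denotes the fitted scaler-plus-PCA transformer from the earlier sketch, and the function name and arguments are illustrative assumptions.

```python
import numpy as np

def oat_total_range(model, pipeline, X_test):
    """Normalised OAT sensitivity: +/-20% perturbation of each raw feature."""
    baseline = X_test.mean(axis=0)          # arithmetic mean of the test set
    ranges = {}
    for name in X_test.columns:
        hi, lo = baseline.copy(), baseline.copy()
        hi[name] *= 1.2                     # +20% perturbation
        lo[name] *= 0.8                     # -20% perturbation
        # Push both raw scenarios through the scaler+PCA pipeline before inference.
        scenarios = pipeline.transform(np.vstack([hi.values, lo.values]))
        p_hi, p_lo = model.predict(scenarios)
        ranges[name] = abs(p_hi - p_lo)     # "Total Range" for this feature
    total = sum(ranges.values())
    return {k: 100.0 * v / total for k, v in ranges.items()}  # % of accumulated range
```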
Models trained on the full deflection dataset (d1–d10) and on PCA-transformed data showed consistent sensitivity patterns: the predicted values of Has, Hsu, Esu, and Eas were most sensitive to changes in the dominant deflection sensors identified by feature importance and SHAP (primarily d1 for asphalt-layer parameters, d6–d10 for subgrade and deeper-layer parameters). Perturbations of the key sensors produced relative response changes typically between 15% and 35%, whereas perturbations of less important sensors induced changes below 5%.
For poorly predicted targets (Hgr and Eil), the total sensitivity range remained extremely low (<6%, even for the most influential sensors), confirming that realistic variations in deflections within the generated database contain almost no information about granular-layer thickness and infinite-subgrade stiffness. This insensitivity, rather than model deficiency, explains the persistently low R² values.
Models trained on the basin indexes (SCI, BDI, BCI, CI) exhibited markedly higher sensitivity for the well-predicted parameters (Has and Eas), with perturbation ranges reaching up to 45% for SCI and BDI, reflecting the strong concentration of predictive power in these engineered basin parameters. Conversely, sensitivity to BCI and CI was negligible for most targets, and overall sensitivity for Hgr, Egr, and Eil remained near zero, reinforcing the conclusion that these four indices alone cannot resolve granular and deep-subgrade properties.
In summary, one-at-a-time sensitivity analysis corroborated the feature importance and SHAP findings, quantitatively demonstrating that high-performing predictions are driven by a small subset of highly informative deflection measurements (or derived indices).