1. Introduction
The Kinematic Transmission Error (KTE) of a gear pair, the angular deviation between the theoretical and actual gear position, has long been recognised as one of the primary excitation sources for tonal gear noise [1,2,3]. Since the mid-20th century, numerous analytical and experimental studies have confirmed that fluctuations in KTE, caused by tooth deflection, geometric imperfections, and misalignments, directly correlate with tonal noise levels in vehicle drivetrains [4,5,6]. Early predictive approaches relied on analytical formulas [7] and later on advanced numerical techniques such as the Finite Element Method (FEM) and the Boundary Element Method (BEM) [8,9], which provided accurate results but at high computational cost.
Traditional noise reduction strategies have mainly focused on optimising gear geometry and alignment, including profile modification, crowning, backlash adjustment, and careful material and surface treatment selection [10,11,12]. While these measures can effectively mitigate some geometric error effects, they do not always result in substantial overall noise reduction, particularly when multiple error sources interact [13].
Classical vibroacoustic simulations, such as FEM- and BEM-based approaches, remain indispensable for detailed noise analysis and design validation [8,9]. However, these methods are computationally intensive [14,15], limiting the number of design iterations that can be performed within typical development timelines. This constraint has motivated research into alternative modelling techniques that can provide rapid yet reliable predictions during early-stage design.
In recent years, machine learning (ML) has emerged as a powerful surrogate modelling approach for gear noise prediction [16,17,18]. ML models, once trained on representative datasets, can capture highly non-linear relationships between manufacturing tolerances and KTE with negligible computation time at inference [19]. This capability enables extensive parametric studies and rapid design-space exploration without the need for repeated high-fidelity simulations.
Concurrently, three recent developments are strengthening this trend. First, non-contact optical metrology and encoder-based TE techniques now deliver high-resolution, production-speed measurements that expand training datasets and improve label quality, ranging from full-field optical gear profiling to phase-accurate TE capture with clock-pulse schemes [20,21,22,23]. Second, absolute TE formulations enable repeatable, phase-consistent dynamic signals that are robust to drift and have even been used as real-time proxies for wear progression in meshing gears [24,25]. Third, uncertainty-aware and physics-informed surrogates are gaining traction in NVH (e.g., Gaussian-process models that return predictive variance and hybrid schemes that encode governing relations), making fast predictors more trustworthy outside the training grid [26,27]. In parallel, open tolerance–KTE corpora (e.g., Recherche Data Gouv) now provide tens of thousands of Monte-Carlo samples with standardized inputs and KTE outputs, enabling benchmarking, transfer learning, and reproducible studies [28]. Synthetic training data from multibody dynamics have likewise proven effective for gear systems, pointing to practical pipelines where high-fidelity simulations seed fast ML predictors [29]. Finally, ML is increasingly used to assess tonal perception in EVs, underscoring the link between TE-based excitation modeling and sound-quality objectives [30].
While the dataset used in this study has been previously published, its original application by Khezri et al. [31] focused on evolutionary cost–tolerance optimisation for complex assembly mechanisms, with micro gears serving as the case study. Their analysis aimed at balancing manufacturing cost and tolerance quality through simulation and surrogate modelling, without addressing NVH aspects or developing noise-related metrics. In contrast, the present work applies advanced machine learning models to the same tolerance parameters to predict Kinematic Transmission Error (KTE) and introduces the Tonal Risk Index (TRI) as a novel, dimensionless indicator of relative tonal noise risk, explicitly linking manufacturing tolerances to NVH performance.
The TRI quantifies relative tonal noise risk directly from predicted KTE values, providing a simple, interpretable link between manufacturing tolerance combinations and expected tonal noise severity. By normalising predicted KTE values to a 0–1 scale, it enables quick comparison of tolerance configurations and supports early design decision-making without requiring full vibroacoustic modelling. While the TRI focuses on gear-mesh excitation effects, it offers a scalable foundation for integration into broader NVH (Noise, Vibration, and Harshness) optimisation frameworks. Prior studies have used ML to predict KTE; the contribution here is a simple, normalised indicator that lets design engineers assess tonal excitation risk directly from tolerance combinations.
The remainder of this paper is organized as follows. Section 2 describes the dataset, feature definitions, and preprocessing steps, together with the applied machine learning models, the TRI definition, the workflow, and the limitations of the approach. Section 3 presents the results, including prediction accuracy, feature importance, and interaction analysis, followed by the distribution of the TRI. Section 4 discusses the findings in relation to traditional vibroacoustic approaches, uncertainty sources, and psychoacoustic considerations. Finally, Section 5 summarizes the main conclusions.
  Research Questions, Hypothesis, and Objectives
The present study addresses two main research questions:
- How accurately can the KTE be predicted from manufacturing tolerances and process deviations, and which tolerances or tolerance interactions have the greatest influence?
- Can the causal chain from tolerances → KTE → mechanical excitation → radiated noise be quantified?
The hypothesis is that KTE is a non-linear, interaction-driven function of manufacturing tolerances, and that tree-based machine learning models (e.g., Random Forest, XGBoost) will significantly outperform linear regression approaches while consistently identifying critical tolerances. Furthermore, because KTE is a key excitation source, a dimensionless TRI derived from predicted KTE is expected to serve as a reliable proxy for relative noise risk in drivetrains.
The objectives of this work are as follows:
- To train and cross-validate machine learning models using 15 tolerance and process-shift parameters for KTE prediction. 
- To quantify the relative importance of each tolerance and identify dominant interactions. 
- To derive a TRI and map manufacturing tolerances through KTE to an estimated noise risk metric. 
  2. Materials and Methods
We utilised the Gear Statistical Tolerance Analysis dataset provided by Recherche Data Gouv (France), comprising 39,984 Monte Carlo simulation samples. The dataset is openly available under a CC-BY license [28]. Each sample contains 15 input variables (seven manufacturing tolerances, covering pitch, run-out, form, and misalignment for both spur and crown wheels, and eight process-shift variables) and seven output variables, including the KTE value. The 15 input variables were sampled independently using a Latin Hypercube Sampling (LHS) strategy. Each variable followed a truncated normal distribution centered around zero, with standard deviations selected to represent typical industrial tolerance ranges. The upper and lower bounds were defined using a ±3σ range, corresponding to the expected spread of manufacturing deviations under standard process control. No explicit correlation structure was enforced between variables, allowing for the exploration of independent and combined effects in a high-dimensional tolerance space. Input tolerances are expressed in micrometres (µm) and represent deviations from nominal gear geometry. The KTE value is given in microradians (µrad) and quantifies the periodic angular error at the mesh line under loaded conditions.
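The sampling scheme described above can be illustrated with a minimal sketch. It is not the generator used to build the published dataset: it assumes SciPy's `qmc.LatinHypercube` and `truncnorm`, and the function name `sample_tolerances` and the uniform 2 µm standard deviation are hypothetical placeholders.

```python
import numpy as np
from scipy.stats import qmc, truncnorm


def sample_tolerances(n_samples, sigmas, seed=0):
    """LHS samples with zero-mean normal marginals truncated at +/-3 sigma."""
    sigmas = np.asarray(sigmas, dtype=float)
    sampler = qmc.LatinHypercube(d=sigmas.size, seed=seed)
    u = sampler.random(n=n_samples)  # stratified uniforms in [0, 1)
    # truncnorm expects its bounds a, b in units of the standard deviation,
    # so a=-3, b=3 truncates each marginal at +/-3 sigma.
    return truncnorm.ppf(u, a=-3.0, b=3.0, loc=0.0, scale=sigmas)


# Hypothetical standard deviations: 2 um for each of the 15 inputs.
sigmas = np.full(15, 2.0)
X = sample_tolerances(1000, sigmas)
print(X.shape, float(np.abs(X).max()))
```

Each column is stratified independently, so no correlation structure is imposed between inputs, which matches the independence assumption stated in the text.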
The dataset was pre-processed by ensuring numerical type consistency, imputing missing values via median imputation, and standardising the measurement units. The target variable for modelling was the scalar KTE value in µrad. Median imputation was selected due to its robustness against outliers, which are common in tolerance datasets derived from worst-case scenario sampling. Unlike mean imputation, the median preserves the central tendency without being skewed by extreme values, ensuring more stable model behavior during training.
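As an illustration of the imputation step, the following sketch fills missing entries column by column with the median of the observed values. The helper `impute_median` and the toy array are hypothetical and only stand in for the real tolerance table.

```python
import numpy as np


def impute_median(X):
    """Replace NaNs in each column with the median of that column's observed values."""
    X = np.array(X, dtype=float)  # copy, and force a numeric dtype
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X


# Toy data: 100.0 plays the role of an extreme value in the first column.
raw = np.array([[1.0, 10.0],
                [2.0, np.nan],
                [3.0, 30.0],
                [100.0, 20.0]])
clean = impute_median(raw)
print(clean[1, 1])  # median of the observed values 10, 30, 20
```

In the toy first column the median is 2.5 while the mean is 26.5, which shows why the median is less sensitive to a single extreme entry.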
A baseline Ordinary Least Squares (OLS) regression model was trained, achieving R² ≈ 0 and MAE ≈ 3.75 µrad, indicating that a simple linear model cannot capture the complex non-linear dependencies between tolerances and KTE. Consequently, two non-linear ensemble methods, Random Forest (RF) and Extreme Gradient Boosting (XGBoost 2.1.0), were trained using 5-fold cross-validation with stratified sampling to ensure even distribution of KTE values across folds. Hyperparameters (e.g., n_estimators, max_depth, learning_rate) were tuned via grid search. These algorithms were chosen for their strong performance on tabular datasets with complex, non-linear feature interactions, as well as their robustness to multicollinearity and outliers. Moreover, both models provide direct tools for interpretability, including feature importance and SHAP analysis, making them suitable for engineering design tasks where transparency is essential.
Model interpretability was assessed using permutation feature importance and SHAP (SHapley Additive exPlanations) values. Feature interactions were visualised through two-dimensional partial dependence plots (PDPs) to identify tolerance combinations leading to significant KTE variation.
Statistical inspection revealed that the majority of tolerance values are normally distributed, centred near zero with standard deviations corresponding to realistic industrial limits. For example, a pitch error of 10 µm typically represents the upper bound of acceptable deviations in precision automotive gear manufacturing and can result in measurable increases in KTE. Pearson correlation analysis showed moderate positive correlations between pitch error and KTE (r ≈ 0.45) and between radial run-out and KTE (r ≈ 0.39), suggesting additive and interaction-driven effects. Outlier detection was performed via interquartile range (IQR) analysis, with extreme cases retained to preserve the dataset’s coverage of worst-case manufacturing scenarios (Figure 1).
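The correlation and outlier checks described above could look roughly as follows. The arrays are synthetic stand-ins with an assumed linear-plus-noise relationship, and the conventional 1.5 × IQR fence is an assumption, since the text does not state the multiplier that was used.

```python
import numpy as np


def iqr_outlier_mask(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is the usual convention."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


rng = np.random.default_rng(1)
pitch_error = rng.normal(0.0, 3.0, size=2000)               # hypothetical, in um
kte = 0.5 * pitch_error + rng.normal(0.0, 3.0, size=2000)   # toy relation, in urad

r = np.corrcoef(pitch_error, kte)[0, 1]   # Pearson correlation coefficient
outliers = iqr_outlier_mask(kte)          # flagged, but not removed
print(round(float(r), 2), int(outliers.sum()), kte.size)
```

The mask only marks extreme cases; the data array is left untouched, mirroring the decision to retain outliers.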
  2.1. Machine Learning Model Configuration
Two tree-based ensemble methods—Random Forest (RF) and Extreme Gradient Boosting (XGBoost)—were trained for KTE prediction.
- Random Forest: 500 trees, maximum depth = 12, minimum samples per split = 2, bootstrap sampling enabled. 
- XGBoost: 500 estimators, maximum depth = 8, learning rate = 0.05, subsample = 0.8, colsample_bytree = 0.8. 
A stratified 80/20 train–test split was applied to the full dataset. The training set (80%) was used for model fitting and hyperparameter tuning. Within this subset, hyperparameter tuning was conducted using 5-fold cross-validation to ensure robust selection of model parameters. The final model performance was evaluated on the held-out 20% test set, which was never seen during training or tuning.
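The split and cross-validation protocol can be sketched as below. Stratifying a continuous target requires a binning step that the text does not specify, so the quintile bins used here are an assumption, as are the synthetic data and toy target. The Random Forest settings follow the configuration listed above; the XGBoost counterpart is omitted to keep the example limited to scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 15))  # 15 synthetic inputs
y = np.abs(X[:, 0]) * (1.0 + np.abs(X[:, 1])) + 0.05 * rng.normal(size=300)

# Assumption: stratify on quintile bins of the continuous target.
bins = np.digitize(y, np.quantile(y, [0.2, 0.4, 0.6, 0.8]))

X_tr, X_te, y_tr, y_te, b_tr, b_te = train_test_split(
    X, y, bins, test_size=0.2, stratify=bins, random_state=0)

rf = RandomForestRegressor(n_estimators=500, max_depth=12,
                           min_samples_split=2, bootstrap=True, random_state=0)

# 5-fold cross-validation inside the training set only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_mae = []
for fit_idx, val_idx in cv.split(X_tr, b_tr):
    rf.fit(X_tr[fit_idx], y_tr[fit_idx])
    cv_mae.append(np.mean(np.abs(rf.predict(X_tr[val_idx]) - y_tr[val_idx])))

# Final fit on the full training set, scored once on the held-out 20%.
rf.fit(X_tr, y_tr)
test_mae = np.mean(np.abs(rf.predict(X_te) - y_te))
baseline_mae = np.mean(np.abs(y_te - np.median(y_tr)))
print(np.mean(cv_mae), test_mae, baseline_mae)
```

A grid search would wrap the cross-validation loop and repeat it for each hyperparameter combination; the held-out test set is touched only after that selection is finished.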
  2.2. Tonal Risk (TRI) Definition
To enable a direct mapping from predicted KTE values to a relative noise risk metric, we introduced the TRI, defined as:
$$\mathrm{TRI}_i = \frac{\widehat{\mathrm{KTE}}_i - \min_j \widehat{\mathrm{KTE}}_j}{\max_j \widehat{\mathrm{KTE}}_j - \min_j \widehat{\mathrm{KTE}}_j}$$
where $\widehat{\mathrm{KTE}}_i$ is the ML-predicted KTE for the i-th tolerance configuration. This min–max normalisation yields a dimensionless score in [0, 1], where 0 denotes the lowest tonal noise risk within the dataset and 1 denotes the highest. The current TRI formulation focuses on gear-mesh excitation and does not account for gearbox housing vibration, bearing-induced excitations, or psychoacoustic weighting. This normalization was applied across the full set of predicted KTE values to ensure that the TRI reflects relative excitation severity within the simulated population. This formulation supports comparability across tolerance configurations but does not constitute an absolute risk scale.
All analysis was performed in Python 3.11 using the scikit-learn and SHAP libraries on an Intel® Core™ i7 (Intel Corporation, Santa Clara, CA, USA) workstation with 128 GB RAM.
Normalization ensures comparability across tolerance configurations within a given dataset or manufacturing scenario. TRI values are right-skewed in most cases, indicating that only a small proportion of tolerance combinations yield critically high tonal noise risk.
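The min–max normalisation behind the TRI can be written in a few lines. The function name `tonal_risk_index` and the four example KTE values are hypothetical; the guard against a constant input is an added safeguard that the text does not discuss.

```python
import numpy as np


def tonal_risk_index(kte_pred):
    """Min-max normalise predicted KTE values to a dimensionless score in [0, 1]."""
    kte_pred = np.asarray(kte_pred, dtype=float)
    lo, hi = kte_pred.min(), kte_pred.max()
    if hi == lo:
        raise ValueError("TRI is undefined when all predicted KTE values are equal")
    return (kte_pred - lo) / (hi - lo)


# Hypothetical predicted KTE values in urad.
kte_pred = np.array([2.0, 3.5, 5.0, 8.0])
tri = tonal_risk_index(kte_pred)
print(tri)  # values 0, 0.25, 0.5, 1
```

Because the bounds are taken from the population being scored, adding or removing configurations changes every TRI value, which is why the score is relative rather than absolute.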
  2.3. Workflow
- Define the manufacturing tolerance and process-shift parameters (e.g., pitch error, run-out, misalignment, form defect). 
- Train a machine learning model (Random Forest/XGBoost) to predict KTE from the tolerance parameters. 
- Apply min–max normalization to the predicted KTE values to compute the TRI. 
- Map TRI values to qualitative noise-risk levels. 
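The last step of the workflow, mapping TRI values to qualitative levels, might be sketched as follows. The three labels and the lower band edge of 0.5 are hypothetical choices; only the 0.8 edge echoes the 0.8–1.0 range that the results describe as high-risk.

```python
import numpy as np

# Hypothetical band edges and labels; no thresholds are specified in the text.
BAND_EDGES = np.array([0.5, 0.8])
BAND_LABELS = np.array(["low", "medium", "high"])


def risk_band(tri):
    """Map TRI scores in [0, 1] to qualitative noise-risk levels."""
    tri = np.asarray(tri, dtype=float)
    # np.digitize returns 0 below 0.5, 1 in [0.5, 0.8), and 2 from 0.8 upwards.
    return BAND_LABELS[np.digitize(tri, BAND_EDGES)]


labels = risk_band([0.10, 0.50, 0.79, 0.80, 1.00])
print(list(labels))  # ['low', 'medium', 'medium', 'high', 'high']
```

In practice the band edges would need to be calibrated against measured noise or customer acceptance criteria before being used for decisions.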
  2.4. Limitations
The present study has several limitations that should be acknowledged. First and foremost, both the machine learning models and the proposed TRI metric are developed and validated exclusively using synthetic simulation data. Their performance on real-world, noisy, and imperfectly measured tolerance inputs remains unknown and will likely be lower. Therefore, experimental validation against physical measurements is an essential next step to assess the practical applicability of the method.
Secondly, the TRI is an excitation-based indicator derived solely from predicted KTE values. It does not account for structural dynamics, acoustic radiation, or psychoacoustic weighting. As such, it should not be interpreted as a direct predictor of radiated noise but rather as a relative risk index for tonal excitation.
Other important limitations include the exclusion of additional drivetrain excitation sources, such as bearing dynamics, shaft misalignments, and electromagnetic torque ripple. Likewise, effects related to gearbox housing modes and vibroacoustic coupling are not captured in the current approach.
Although no experimental noise measurements were available in the present study, an important future research direction is the qualitative comparison of the TRI with measured noise levels.
Despite these constraints, the TRI offers a computationally efficient, interpretable tool for early-stage tolerance assessment and supports future integration into broader NVH simulation frameworks.
  3. Results
The Random Forest model achieved an R² ≈ 0.82 and a mean absolute error (MAE) of 0.36 µrad, while XGBoost produced comparable performance. Both significantly outperformed the linear baseline (OLS), which failed to capture the non-linear behaviour of KTE. The predicted versus true KTE values (Figure 2) demonstrate a close match between model outputs and reference data.
Other performance metrics were calculated to evaluate the predictive accuracy of the models. The mean absolute error (MAE) was 0.36 µrad, and the root mean squared error (RMSE) was 0.52 µrad, both indicating a low average deviation from the reference values. The mean absolute percentage error (MAPE) was approximately 5.2%, confirming the model’s reliability across a broad range of tolerance configurations. These results suggest that the model performs consistently across the dataset, including both low and high KTE values.
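For reference, the reported metrics follow their standard definitions, sketched below with NumPy. The four reference and predicted values are invented for illustration and are unrelated to the results above.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent) and R^2 for paired predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "mae": np.mean(np.abs(err)),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "mape": 100.0 * np.mean(np.abs(err) / np.abs(y_true)),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Invented values in urad, for illustration only.
metrics = regression_metrics([4.0, 6.0, 8.0, 10.0], [4.5, 5.5, 8.0, 11.0])
print(metrics)
```

MAPE divides by the reference value, so it becomes unstable when the true KTE is close to zero; this is worth checking before quoting a single percentage for the whole dataset.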
Feature importance analysis (Figure 3) identified pitch error, radial run-out, and misalignment as the dominant contributors to KTE. This finding is consistent with physical expectations, as deviations in pitch and profile, along with eccentricity, are known to cause significant transmission errors.
Interaction analysis (Figure 4) revealed that certain tolerance combinations produced super-additive effects: pitch error paired with run-out resulted in disproportionately high KTE, and misalignment combined with form-defect process shift generated errors exceeding the sum of individual effects.
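One simple way to make "exceeding the sum of individual effects" concrete is to compare the joint change in predicted KTE with the sum of the two single-parameter changes. The sketch below does this for any prediction function; `toy_kte` and its coefficients are hypothetical and stand in for a trained model.

```python
import numpy as np


def interaction_excess(predict, base, i, j, delta_i, delta_j):
    """Joint effect of shifting features i and j minus the sum of the individual effects."""
    base = np.asarray(base, dtype=float)
    only_i, only_j, both = base.copy(), base.copy(), base.copy()
    only_i[i] += delta_i
    only_j[j] += delta_j
    both[i] += delta_i
    both[j] += delta_j
    f0, fi, fj, fij = predict(np.vstack([base, only_i, only_j, both]))
    return (fij - f0) - ((fi - f0) + (fj - f0))


def toy_kte(X):
    # Hypothetical response: two additive terms plus a product term.
    return 1.0 + 0.2 * X[:, 0] + 0.1 * X[:, 1] + 0.05 * X[:, 0] * X[:, 1]


excess = interaction_excess(toy_kte, base=[0.0, 0.0], i=0, j=1,
                            delta_i=10.0, delta_j=8.0)
print(excess)  # positive: the joint shift exceeds the sum of the single shifts
```

A positive excess indicates a super-additive combination, zero indicates purely additive behaviour, and the same function can be called with `model.predict` of a fitted regressor in place of `toy_kte`.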
The TRI, calculated by normalising predicted KTE values to a [0, 1] scale, exhibited a right-skewed distribution (Figure 5) with a median value of ≈0.63. Most tolerance configurations yielded relatively low noise risk scores, but a small subset fell into the upper 0.8–1.0 range. These high-risk outliers correspond to rare, extreme combinations of manufacturing and process-shift parameters, representing potential NVH quality concerns that warrant targeted process control. This distribution suggests that, under realistic production conditions, assemblies are unlikely to reach the highest tonal noise severity levels, though outliers can still present significant risks.
  4. Discussion
The study confirms that KTE prediction is non-linear and interaction-driven. Linear models fail, whereas tree-based learners capture the complex relationship and identify the most influential tolerances. Misalignment and machining errors are known to increase noise; the machine-learning results provide a quantitative ranking of these effects. The proposed TRI offers an accessible proxy for relative noise risk, allowing early design optimisation without full vibroacoustic simulations. Random Forest models are particularly suitable because they do not require assumptions about functional form, making them well-suited to complex tolerance–noise relationships.
  4.1. Comparison with Conventional Methods
Classical vibroacoustic prediction methods, such as Finite Element Analysis (FEA) and Boundary Element Method (BEM), offer high physical fidelity and detailed insight into structural and acoustic behaviour. They can directly model housing resonances, gear–housing coupling, and radiated sound fields. However, these approaches are computationally expensive and require extensive meshing, material characterisation, and boundary condition definition, which can limit their use to a small number of design iterations in early development stages. In contrast, the machine learning (ML) approach adopted here provides orders-of-magnitude faster predictions once trained, enabling rapid tolerance-space exploration. ML models are also more flexible in integrating heterogeneous data sources (e.g., simulation outputs, measurement datasets) without the need for fully parameterised physical models. The trade-off is reduced physical interpretability and reliance on representative training data for accurate generalisation.
  4.2. Uncertainty Analysis
While the ML models achieved satisfactory predictive performance, they inherently carry predictive uncertainty. Sources of uncertainty include:
- (1) Measurement noise in the original KTE values (from test benches or simulation discretization errors). 
- (2) Data variance due to uncontrolled process shifts or unmeasured variables not captured in the dataset. 
- (3) Model approximation error, arising from the inability of the chosen ML algorithms to perfectly model all underlying non-linearities. 
In practice, predictive intervals or ensemble variance estimates could be integrated into the TRI framework to communicate confidence levels alongside noise risk scores.
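An ensemble variance estimate of the kind mentioned above can be obtained from a Random Forest by looking at the spread of the individual tree predictions. The sketch below is a heuristic under stated assumptions: the data are synthetic, the helper `predict_with_spread` is hypothetical, and the per-tree standard deviation is not a calibrated prediction interval.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 3))
y = X[:, 0] ** 2 + 0.5 * X[:, 0] * X[:, 1] + 0.05 * rng.normal(size=400)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)


def predict_with_spread(forest, X_new):
    """Return the forest prediction and the standard deviation across its trees."""
    per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)


X_new = rng.uniform(-1.0, 1.0, size=(5, 3))
mean_pred, spread = predict_with_spread(rf, X_new)
print(mean_pred, spread)
```

The spread could be reported next to each TRI value so that configurations with a high score and a large disagreement between trees are treated with more caution.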
  4.3. Psychoacoustic Relevance of Tonality
Tonality is a critical factor in human perception of mechanical noise. Even at modest overall sound pressure levels, a pronounced tonal component can be perceived as more intrusive or unpleasant than broadband noise. Psychoacoustic studies have shown that tonal components are more likely to draw attention and contribute disproportionately to perceived annoyance. Therefore, targeting reductions in tonal excitation, as captured by the KTE-derived TRI, can yield greater subjective NVH improvements than equivalent reductions in overall sound power.
  5. Conclusions
Machine-learning models were able to predict the KTE of a gear pair with high accuracy (R² ≈ 0.82, MAE ≈ 0.36 µrad), whereas linear models failed to capture the underlying non-linear behaviour (R² ≈ 0). Pitch error, radial run-out, and misalignment were consistently identified as the most influential tolerances affecting KTE. Significant non-linear interactions were observed, most notably between pitch error and run-out, and between misalignment and form-defect process shift. These findings align with physical expectations that deviations in profile, pitch, and eccentricity are major contributors to transmission error. Feature rankings remained stable across validation folds, confirming the robustness of the analysis.
The Tonal Risk Index (TRI), introduced in this study, is a novel and easily applicable metric for quantifying relative tonal noise risk directly from predicted KTE values. The TRI is computed by normalising predicted KTE values to a dimensionless [0, 1] scale, enabling straightforward comparison of manufacturing tolerance configurations in terms of their potential tonal excitation severity. From a psychoacoustic perspective, tonal components are perceived as more intrusive than broadband noise, making the TRI particularly relevant for early detection of NVH-critical designs.
From an industrial standpoint, the TRI framework offers practical decision-support capabilities during early-stage design and tolerance allocation. It allows engineers to rapidly identify and avoid tolerance combinations likely to cause NVH quality issues, and can also guide process control by flagging high-risk tolerance interactions before production. This approach reduces reliance on extensive physical prototyping and computationally expensive high-fidelity simulations (FEM/BEM), thereby accelerating the product development process.
Overall, integrating machine learning with the proposed TRI metric provides a fast, interpretable, and sustainable approach to assessing and managing tonal noise risk in gear manufacturing, bridging the gap between traditional high-accuracy simulation methods and the need for rapid, data-driven design decision-making.