1. Introduction
Artificial Neural Networks (ANNs) have been widely used in engineering to predict material properties or structure responses. The involvement of numerous components, such as material constituents or structural elements, coupled with varying environmental and testing conditions, makes such applications inherently complex. To achieve accurate predictions, researchers often include as many input features as possible to comprehensively characterize the system. However, high-dimensional input data increases computational cost and training time and typically requires larger datasets to achieve satisfactory performance. Moreover, excessive or redundant input variables can lead to overfitting, thereby reducing prediction accuracy. A common practice is to carefully select input variables and identify the most important and influential features [
1]. Previous studies have shown that using too many input parameters can make predictive models unnecessarily complex, whereas using too few may omit critical information [
2]. This trade-off underscores the importance of dimensionality reduction in ANN models in engineering applications.
Beyond predictive accuracy, reducing input parameters is motivated by the growing emphasis on sustainable and resource-efficient modeling in civil engineering. Reducing input dimensionality can simplify model structure and lower computational demand during training and inference, which in turn reduces runtime and energy use for data-driven prediction workflows. Such computational efficiency aligns with the principles of Green AI by prioritizing performance per unit of compute and enabling faster, lower-cost design iterations for engineering materials and systems, including low-carbon concretes and durable pavement structures [
3,
4].
A common strategy for reducing input dimensionality is feature selection, which identifies the most informative subset of variables. Recent engineering studies have applied feature selection to enhance ANN model performance. Zhu and Wang [
5] adopted an improved Relief algorithm to automatically identify the most relevant bridge attributes for a deep-learning model, enabling accurate bridge condition forecasts up to four years ahead. Liu et al. [
6] used random forest feature importance and LASSO regression to select key factors influencing process-induced deformation of composite structures. Their ANN-based framework achieved a 98% reduction in computational time with less than 5% loss in accuracy, showing that under limited computing resources, focusing on the top-ranked variables yields nearly the same accuracy as using the full set. Khalaf et al. [
7] integrated a genetic algorithm for input selection in predicting shear connector strength. Out of ten candidate variables, the optimal ANN model required only seven inputs while maintaining high accuracy (R² = 0.96). Other feature selection techniques frequently used in engineering include correlation filtering [
8], sensitivity analysis [
9,
10], rank-correlation analysis [
11], and metaheuristic approaches such as particle swarm optimization and firefly algorithms [
12,
13]. Bezerra et al. recently compared conventional feature selection methods and found that, although methods like PCA and random forest often enhance performance, they may perform differently depending on downstream models and task characteristics, especially when high-variance components obscure task-relevant patterns, underscoring that feature selection is not universally beneficial without careful method choice [
14]. Similarly, Heidary et al. show that dimensionality reduction can improve computational efficiency in classification tasks but that aggressive reduction may degrade generalization on unseen data if relevant structures are discarded [
15]. Collectively, these studies demonstrate that selecting a relevant subset of inputs enhances model robustness and interpretability while reducing overfitting risks associated with high-dimensional data, but also that the choice and tuning of selection techniques critically influence outcomes.
In addition to selecting a subset of existing features, another approach is to transform the input space into a lower-dimensional representation. Principal Component Analysis (PCA) is widely used in engineering machine learning studies as a dimensionality reduction technique that generates new, uncorrelated features capturing the largest variance in the data. Wan et al. [
16] examined the effect of dimensionality reduction on predicting concrete compressive strength using eight original input variables. Performance was compared among three configurations: all eight features, six PCA-selected features, and six manually chosen features. The results showed that the ANN with PCA-derived features improved test accuracy from an R² of 0.913 to 0.934 by mitigating noise and multicollinearity. Although prediction accuracy decreased slightly with fewer inputs, training became noticeably faster. Similarly, Sun et al. [
17] applied a PCA-ANN model to predict frozen soil strength and found that using only four to five principal components, which captured 90–95% of the data variance, produced predictions nearly identical to those using the full feature set. However, since PCA is an unsupervised reduction method, it does not always guarantee improved model performance. It primarily removes linear correlations and noise. For instance, in a recent study [
16], an extreme gradient boosting model performed worse with PCA-reduced features than with the original inputs, likely because the discarded components contained nonlinear information useful for prediction.
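The PCA workflow described in these studies, retaining the smallest number of components that explain a target share of variance, can be sketched with scikit-learn; the synthetic data, 95% threshold, and standardization step below are illustrative assumptions rather than the cited studies' actual setups.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                        # stand-in for 8 input variables
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=200)      # one nearly redundant feature

# PCA is variance-driven and scale-sensitive, so standardize first.
X_std = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components explaining >= 95% variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```

Because the injected redundant feature contributes almost no independent variance, fewer than eight components are needed to reach the threshold.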
While existing feature selection and transformation techniques have shown promise in improving ANN performance, they also exhibit several limitations. Most feature selection methods rely heavily on model-specific importance metrics (e.g., weight sensitivity analysis in ANNs or permutation-based feature importance) or optimization heuristics (e.g., genetic algorithms and particle swarm optimization). These approaches may overlook feature redundancy and inter-feature correlations, because such metrics typically evaluate features individually or optimize a global objective without explicitly accounting for dependencies among features. Conversely, transformation-based approaches such as PCA eliminate collinearity but disregard nonlinear dependencies and lack interpretability, as the transformed components no longer represent physical variables. Moreover, these methods are often applied in isolation, without combining feature selection and transformation, which can result in inconsistent improvements across datasets and problem types.
Recent advances in embedded and ensemble-based feature selection have provided additional tools for handling high-dimensional inputs. For instance, SHapley Additive exPlanations (SHAP)-based selection methods leverage game-theoretic attribution to quantify each feature’s contribution to model output, offering strong interpretability and the ability to capture nonlinear interactions [
18]. However, SHAP values are computed on a feature-by-feature basis and may still retain redundant predictors that exhibit high mutual correlation, which can negatively influence ANN training stability. Similarly, XGBoost-based feature selection has gained popularity due to its capacity to model nonlinear relationships and rank features through split-based gain metrics [
19]. Yet, the tree-boosting mechanism tends to favor features that perform well in early splits, and its importance scores do not explicitly account for inter-feature dependencies or multicollinearity. As recent studies have noted, the lack of redundancy-aware mechanisms in these modern methods can lead to inconsistent selection behavior across datasets with varied feature interactions [
20]. These limitations highlight the continued need for feature reduction frameworks that integrate multiple selection criteria while explicitly addressing feature redundancy.
To address these shortcomings, this study develops a robust two-stage input reduction framework that combines statistical and model-based feature selection with autocorrelation-based redundancy elimination. The objective is to evaluate this framework across 32 publicly available datasets, assess its stability, sensitivity, and efficiency, and demonstrate its effectiveness through a case study on ANN-based prediction performance.
3. Feature Importance Ranking and Redundancy Elimination (FIRRE) Methods
To improve the optimization efficiency and interpretability of high-dimensional input features in ANN models, a two-stage feature reduction framework, termed Feature Importance Ranking and Redundancy Elimination (FIRRE), is proposed. As presented in
Figure 1, FIRRE operates in two sequential steps: feature ranking based on importance scores, and redundancy removal using pairwise autocorrelation analysis.
In the first stage, four feature selection methods are employed to quantify the importance of the original input features. These methods are selected to evaluate feature relevance to the target variable from both statistical and model-based perspectives. The four methods are as follows:
(a) Pearson Correlation Analysis. This method evaluates the linear association between each input feature and the target variable and is commonly used for preliminary variable screening. The Pearson correlation coefficient is calculated using Equation (1), where x denotes the feature, y represents the target variable, and n is the number of samples in the dataset.
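Equation (1) is not reproduced here; assuming the standard computational form of the Pearson coefficient, the calculation can be sketched as:

```python
import numpy as np

def pearson_r(x, y):
    # Computational form of the Pearson coefficient:
    # r = (n*sum(xy) - sum(x)sum(y)) /
    #     sqrt(n*sum(x^2) - sum(x)^2) / sqrt(n*sum(y^2) - sum(y)^2)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = np.sqrt(n * np.sum(x**2) - np.sum(x)**2) * np.sqrt(n * np.sum(y**2) - np.sum(y)**2)
    return num / den

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(pearson_r(x, 2 * x + 1), pearson_r(x, -x))   # perfect positive / negative association
```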
(b) Recursive Feature Elimination (RFE). RFE evaluates feature importance by iteratively training a model and removing the least important features to retain an optimal subset. In this study, a Support Vector Machine is adopted as the base estimator. The features with the lowest scores are excluded, and the process is repeated until only the top 60% of the features are retained.
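The exact SVM configuration used in the study is not detailed here; the sketch below uses scikit-learn's RFE with a linear-kernel SVR (which exposes the coefficients RFE ranks by) and the 60% retention ratio, on synthetic data where only two inputs carry signal.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=150)   # inputs 0 and 1 are informative

# Drop one feature per iteration until 60% (6 of 10) remain.
selector = RFE(SVR(kernel="linear"), n_features_to_select=6, step=1)
selector.fit(X, y)

print(np.flatnonzero(selector.support_))   # indices of the retained features
```

The informative inputs survive the elimination because their SVR coefficients stay large at every iteration.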
(c) Random Forest. This approach ranks and selects features based on importance scores obtained from a trained random forest model. The Mean Decrease in Impurity (MDI) criterion is employed to quantify the average reduction in node impurity resulting from splits on a given feature across all trees in the ensemble, as presented in Equation (2). In this equation, Δi denotes the decrease in impurity from a split, and T represents the total number of trees in the ensemble.
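In scikit-learn's RandomForestRegressor, the MDI criterion of Equation (2) corresponds to the feature_importances_ attribute; a minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=8, n_informative=3,
                       noise=0.1, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ averages each feature's impurity decreases over all
# trees and normalizes the result to sum to 1 (the MDI ranking).
mdi = rf.feature_importances_
print(np.argsort(mdi)[::-1])   # features ranked from most to least important
```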
(d) SelectKBest. This method evaluates features individually using statistical hypothesis testing. As shown in Equation (3), the F-test for regression is used as the scoring function to quantify the linear dependency between each feature and the continuous target variable, where R² denotes the coefficient of determination and n represents the number of samples. The top 60% of features with the highest F-scores are then retained for subsequent analysis.
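This univariate screening maps directly onto scikit-learn's SelectKBest with the f_regression score function; the synthetic data below is an illustrative assumption.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 3] + 0.5 * rng.normal(size=200)   # feature 3 drives the target

# Retain the top 60% of features (6 of 10) by univariate F-score.
k = int(0.6 * X.shape[1])
skb = SelectKBest(score_func=f_regression, k=k).fit(X, y)

print(np.flatnonzero(skb.get_support()))   # indices of the 6 retained features
```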
The four feature ranking methods were selected to provide complementary assessments of input importance. Pearson correlation captures linear statistical dependence between individual inputs and the target variable, while the F-test evaluates group-wise variance-based relevance [
46]. In contrast, RFE-SVM and Random Forest provide model-based rankings that account for nonlinear relationships and feature interactions [
47]. Combining these methods improves robustness by reducing the bias associated with any single criterion.
The scores obtained from the four feature selection methods are first normalized and combined using a weighted average to produce an integrated importance score. Features accounting for the top 60% of cumulative importance are retained for the next stage. Such ensemble scoring strategies are widely applied in feature selection research and have been shown to outperform single-method approaches [
48,
49,
50].
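A sketch of this integration step follows; the min-max normalization, equal weights, and the cumulative-importance cutoff are illustrative assumptions, as the study's exact weighting scheme is not specified here.

```python
import numpy as np

def integrate_scores(score_lists, weights=None, cum_threshold=0.60):
    # Rows: one importance vector per selection method; columns: features.
    S = np.asarray(score_lists, float)
    # Min-max normalize each method's scores to [0, 1] so they are comparable.
    S = (S - S.min(axis=1, keepdims=True)) / np.ptp(S, axis=1, keepdims=True)
    w = np.full(S.shape[0], 1.0 / S.shape[0]) if weights is None else np.asarray(weights, float)
    integrated = w @ S                                   # weighted-average importance
    # Keep the top-ranked features covering 60% of cumulative importance.
    order = np.argsort(integrated)[::-1]
    cum = np.cumsum(integrated[order]) / integrated.sum()
    n_keep = int(np.searchsorted(cum, cum_threshold)) + 1
    return np.sort(order[:n_keep]), integrated

scores = [[0.9, 0.1, 0.8, 0.2], [0.8, 0.2, 0.9, 0.1],
          [0.95, 0.05, 0.7, 0.3], [0.85, 0.15, 0.75, 0.25]]
kept, integrated = integrate_scores(scores)
print(kept)   # features 0 and 2 dominate the cumulative importance
```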
In the first stage, the primary objective is to evaluate the relationship between each feature and the target variable. In the second stage, the focus shifts to examining inter-feature relationships to mitigate high collinearity, which can cause gradient instability and model overfitting. To identify and remove redundant features, a pairwise Pearson correlation matrix is constructed using the features retained from the first stage. A correlation threshold of ρ > 0.95 is applied to detect strongly correlated feature pairs. For each correlated pair, the feature with the lower importance score is discarded, ensuring that the more informative variable is preserved. This process adheres to the low-collinearity principle commonly recommended in variable selection for regression and neural-network models. By eliminating highly correlated variables, this procedure not only reduces redundancy but also compacts the feature space, stabilizes model training, and enhances interpretability. It effectively mitigates the distortive effects of multicollinearity, which are particularly critical for models sensitive to input scaling and weight updates.
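A minimal sketch of this second stage, taking the absolute pairwise correlation (a common convention, since strong negative correlation is equally redundant) with the 0.95 threshold:

```python
import numpy as np

def prune_redundant(X, importance, threshold=0.95):
    # Pairwise Pearson correlations between the retained features.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = set(range(X.shape[1]))
    for i in range(X.shape[1]):
        for j in range(i + 1, X.shape[1]):
            # For each strongly correlated pair, drop the less important member.
            if i in keep and j in keep and corr[i, j] > threshold:
                keep.discard(i if importance[i] < importance[j] else j)
    return sorted(keep)

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=300)    # feature 1 nearly duplicates feature 0
importance = np.array([0.9, 0.4, 0.7, 0.6])

print(prune_redundant(X, importance))   # feature 1 is discarded, feature 0 kept
```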
The proposed FIRRE framework integrates multi-source feature importance evaluation with a structured correlation-based pruning mechanism. This hybrid approach, grounded in both theoretical rationale and empirical evidence, yields a more compact, stable, and interpretable feature set for downstream neural-network modeling. Comparable methodologies have been widely adopted in high-dimensional data analysis.
4. Cross-Dataset Comparative Evaluation
4.1. Comparison with Existing Methods
This section presents a cross-dataset comparative evaluation of FIRRE against widely used dimensionality reduction and feature selection approaches, namely Random Feature Selection (RFS), Principal Component Analysis (PCA), Variance-Threshold Filtering (VTF), and K-means clustering. For each of the 32 datasets, each method was applied within an identical preprocessing and training pipeline to reduce the input dimensions. The predictive performance of the ANN was assessed on held-out data using the coefficient of determination (R²) and root mean square error (RMSE). Higher R² and lower RMSE indicate better performance. The aggregated outcomes across datasets, together with per-dataset results, are presented in
Figure 2 and
Figure 3.
In terms of R², FIRRE ranked 1st in 18 out of 32 datasets and in the top 2 in 25 out of 32, demonstrating superior or highly competitive predictive accuracy across the majority of datasets. Regarding RMSE, FIRRE ranked 1st in 20 out of 32 datasets and in the top 2 in 26 out of 32, indicating consistently low prediction errors and strong generalization. These rankings confirm FIRRE's robust performance, with the method achieving first or second place in approximately 80% of all datasets for both evaluation metrics. Notably, FIRRE maintained competitive performance even in datasets where it did not rank first, rarely falling below third place, which highlights its stability and reliability across diverse data characteristics and dimensionality challenges.
4.2. Comparison with Novel Methods
This section presents a cross-dataset comparative evaluation of FIRRE against advanced machine learning-based feature selection approaches, namely Least Absolute Shrinkage and Selection Operator (LASSO) [
51], Extreme Gradient Boosting feature importance (XGB) [
19], Boruta algorithm [
52], and SHapley Additive exPlanations (SHAP) [
18]. For each of the 32 datasets, each method was applied within an identical preprocessing and training pipeline to reduce the input dimensions. The predictive performance of the ANN was assessed on held-out data using R² and RMSE. The aggregated outcomes across datasets, together with per-dataset results, are presented in
Figure 4 and
Figure 5.
The comparative analysis reveals that FIRRE demonstrates highly competitive performance against these advanced machine-learning-based feature selection methods. In
Figure 4, FIRRE maintains consistently high R² values across most datasets, exhibiting particular stability in challenging scenarios where competing methods show performance degradation. In datasets 2, 7, 13, 20, and 23, where several methods experience marked drops in predictive accuracy (R² below 0.7 or 0.6), FIRRE sustains robust performance above these thresholds. While LASSO, XGB, Boruta, and SHAP occasionally achieve comparable or superior results on datasets with moderate complexity, FIRRE demonstrates greater consistency across the diverse data characteristics in the benchmark suite.
Regarding RMSE, FIRRE achieves particularly pronounced advantages in high-dimensional datasets (29–32), where errors fall in the 10⁻¹ to 10⁻⁴ range, compared with 10⁰ to 10¹ or higher for the other methods. This performance gap suggests that FIRRE's iterative refinement mechanism is especially effective for complex problems where feature interactions and redundancies pose greater challenges. Quantitatively, FIRRE ranked 1st in R² for 15 out of 32 datasets and in the top 2 for 23 out of 32 (72%). For RMSE, FIRRE ranked 1st in 17 datasets and in the top 2 for 24 (75%). Even when not ranking first, FIRRE typically remains within the top three methods, underscoring its reliability and competitiveness against state-of-the-art feature selection approaches.
4.3. Stability Analysis
To further assess the robustness of the proposed FIRRE framework, a stability analysis was conducted to evaluate its consistency across different datasets compared with eight competing methods, namely RFS, PCA, VTF, K-means, LASSO, XGB, Boruta, and SHAP. While previous sections demonstrated that FIRRE outperforms other methods in most cases, this analysis focuses on evaluating stability. Although FIRRE is not always the best-performing approach for every single dataset, it exhibits markedly superior stability. In contrast, the competing methods occasionally achieve higher accuracy on certain datasets but perform poorly on others, indicating strong dataset dependency and weak generalization capability. This instability severely limits their practical applicability, whereas FIRRE maintains consistently good performance across all datasets.
To quantitatively evaluate this stability, the relative errors of R² and RMSE between the dimension-reduced and original results were computed for five representative datasets, using Equations (4) and (5).
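Equations (4) and (5) are not reproduced here; assuming the standard definition of relative error, the computation for either metric reduces to:

```python
def relative_error(reduced, original):
    # Relative error of a metric (R2 or RMSE) after feature reduction,
    # measured against the full-feature baseline.
    return abs(reduced - original) / abs(original)

# e.g., R2 falling from 0.934 with all features to 0.913 after reduction:
print(relative_error(0.913, 0.934))
```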
As shown in
Figure 6, two boxplots were drawn to illustrate the distributions of relative errors in R² and RMSE for all methods, with corresponding variances calculated for quantitative comparison. FIRRE achieved the lowest median relative error and the smallest variance for both metrics, indicating minimal and consistent accuracy loss after feature reduction. For R², FIRRE demonstrated a variance of σ² = 0.018, substantially lower than almost all competing methods. Among the baseline methods, RFS (σ² = 0.079) and K-means (σ² = 0.086) exhibited the highest variances, representing the poorest stability. Notably, the advanced machine learning methods showed mixed results: while LASSO achieved competitive stability (σ² = 0.017), XGB (σ² = 0.023), Boruta (σ² = 0.016), and SHAP (σ² = 0.020) demonstrated modest improvements over traditional methods but still exhibited comparable or greater variability than FIRRE. For RMSE, FIRRE again demonstrated superior consistency with σ² = 8272.336, considerably lower than RFS (σ² = 52,764.258), K-means (σ² = 57,622.798), and even the advanced methods such as XGB (σ² = 12,812.966) and Boruta (σ² = 30,185.304). LASSO (σ² = 34,070.728) and SHAP (σ² = 8744.576) showed relatively better stability among the competing methods but remained less consistent than FIRRE.
These findings confirm that FIRRE provides highly stable and reliable dimensionality reduction across diverse datasets. Its minimal fluctuation in predictive accuracy demonstrates strong generalization and robustness. While some advanced machine learning methods (particularly LASSO and SHAP) approach FIRRE's stability on certain metrics, FIRRE maintains consistently superior performance across both R² and RMSE evaluations. This balanced stability makes FIRRE a dependable choice for engineering applications where consistent performance across varied data characteristics is essential.
4.4. Stage-Wise Analysis of FIRRE
To evaluate the contribution of each reduction stage within the proposed FIRRE framework, a stage-wise analysis was conducted across all 32 datasets. Since FIRRE comprises only two sequential stages, this analysis serves as a focused ablation study examining the incremental contribution of each stage. Three configurations were compared: (1) RFS, which serves as a baseline by randomly selecting features to match the same reduced dimensionality as FIRRE; (2) Stage 1 only, applying importance screening to remove weakly correlated features; and (3) the full FIRRE framework, combining both stages.
Figure 7 presents the comparison through mean values, standard error bars, and individual dataset distributions for R² and RMSE.
Overall, both Stage 1 and FIRRE yield positive relative gains in R² and negative relative changes in RMSE, demonstrating systematic improvement over the random baseline. Stage 1 importance screening removes variables that are weakly correlated with the target, leading to a clear median R² improvement of 48.1% and a corresponding RMSE reduction of 21.3%. These improvements indicate that eliminating non-informative inputs suppresses noise propagation within the ANN and enhances generalization stability.
The full FIRRE framework further improves predictive performance by pruning redundant features with high pairwise correlation. Although the gains beyond Stage 1 are smaller in magnitude, the average R² improvement reaches almost 50% relative to RFS, and RMSE shows an additional 20% decrease on average. The reduction in performance variance also highlights FIRRE's enhanced robustness and consistency across datasets.
The scattered dots beside each bar reveal that while a few datasets exhibit only moderate gains, the overwhelming majority cluster above the RFS baseline, confirming that both stages contribute positively in most cases. Taken together, these findings verify that Stage 1 provides the primary accuracy enhancement through relevance screening, whereas Stage 2 delivers additional refinement and stability through redundancy elimination. The two-stage procedure thus ensures reliable accuracy gains relative to random selection while achieving substantial dimensionality reduction and improved efficiency.
5. Sensitivity and Efficiency Analysis
In this section, the results after applying FIRRE to the 32 datasets are presented and analyzed. The results are presented in
Figure 8.
As presented in
Figure 8, the use of the FIRRE method led to a clear reduction in input dimensionality across the 32 datasets while maintaining prediction accuracy. The number of input variables decreased substantially, with a median reduction of approximately 35% and an average reduction of around 40%. More than half of the datasets were reduced to five or fewer inputs, indicating that FIRRE effectively identified and removed redundant or weakly correlated features without compromising representational capacity. With respect to prediction performance, R² remained largely stable after applying the FIRRE method. Across all datasets, the mean change in R² was only −0.008, and the median change was −0.001, revealing minimal overall deviation. Moreover, 14 datasets exhibited improved R², while 18 exhibited slight decreases, most within ±0.2. The largest observed improvement was an increase of 0.1, whereas the largest decline was −0.11. These results confirm that FIRRE achieves a substantial simplification of model inputs with negligible impact on prediction accuracy, enhancing both the interpretability and computational efficiency of ANN models in engineering applications.
5.1. Sensitivity Analysis
In
Figure 9, a bubble plot was used to provide a comprehensive visualization of the relationship between input parameter reduction, predictive accuracy, and dataset characteristics after applying FIRRE. Along the horizontal axis, input reduction ranges from 20% to 70%, showing that most datasets underwent moderate dimensionality reduction. The vertical distribution of ΔR² ranges from −0.12 to 0.10, with more than 30% of the datasets concentrated around zero and around 25% above zero. This indicates that most datasets maintained nearly identical, or even better, predictive accuracy after feature reduction. Only 18% of datasets exhibit a notable decrease in predictive accuracy, with ΔR² falling below −0.05, reflecting moderate decline. The color gradient, representing ΔRMSE, shows mostly yellow hues near the center, corresponding to minimal changes in prediction error. A few darker-colored bubbles at higher input reduction ratios indicate cases where prediction error decreases notably, implying that more aggressive reduction occasionally enhanced model generalization. The purple circle corresponds to Dataset 23 (N = 12,000) and shows a large negative ΔRMSE, indicating a substantial reduction in RMSE after applying FIRRE. Although ΔR² is slightly negative, RMSE decreases, suggesting that reducing the inputs from 7 to 3 may remove some predictive information captured by R² while still improving average prediction accuracy as measured by RMSE.
Regarding the effect of sample size on the application of FIRRE,
Figure 10 illustrates the relationship between dataset size, initial input dimensionality, and the change in predictive performance (ΔR²) after applying FIRRE. Overall, most datasets cluster around ΔR² = 0, reaffirming that FIRRE generally maintains model accuracy regardless of dataset scale or input complexity. Moreover, the marker shapes reveal useful trends across input dimensionality groups. Datasets with fewer than 8 inputs predominantly exhibit ΔR² values close to zero or slightly negative, suggesting that when input dimensionality is already low, further reduction provides limited benefit and may occasionally result in minor information loss. Datasets with moderate input counts show small variations on both sides of the zero line, indicating a balanced effect where FIRRE selectively improves or slightly reduces accuracy depending on the correlation structure among variables. In contrast, datasets with more than 15 inputs show more frequent positive ΔR² values, including the highest gains observed. This trend suggests that FIRRE is particularly effective for higher-dimensional problems, where redundant features are more prevalent and their removal improves model generalization.
Across all dataset sizes, from small laboratory datasets to large-scale numerical simulations, the results confirm that FIRRE performs consistently, with no clear dependence of ΔR² on sample size. Instead, the primary factor influencing performance gain is the initial input dimensionality: the higher the number of original features, the more likely FIRRE is to enhance predictive accuracy. Overall, both
Figure 9 and
Figure 10 illustrate that FIRRE consistently reduces input dimensionality without significantly sacrificing prediction accuracy and, in some instances, improves model efficiency and stability by eliminating redundant or noisy features.
5.2. Efficiency Analysis
5.2.1. Predictive Efficiency
To further evaluate the impact of feature reduction on model performance, two predictive efficiency indicators are proposed. These indicators quantify the change in prediction accuracy or RMSE relative to the proportion of inputs removed. Specifically, the efficiency of feature reduction on R², E_R², and the efficiency of feature reduction on RMSE, E_RMSE, were computed using Equations (6) and (7), where E_R² > 0 indicates an improvement in model accuracy per fraction of input reduction, and E_RMSE < 0 represents a reduction in RMSE per fraction of input reduction. These normalized indicators provide a fair basis for comparing datasets with different input dimensions and reduction ratios, allowing a more consistent interpretation of FIRRE's predictive performance. The results are presented in Figure 11 and Figure 12.
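Equations (6) and (7) are not reproduced here; assuming they normalize the metric change by the fraction of inputs removed, the indicators can be sketched as:

```python
def reduction_efficiency(metric_after, metric_before, n_inputs_after, n_inputs_before):
    # Change in a metric (R2 or RMSE) per unit fraction of inputs removed.
    reduction_fraction = (n_inputs_before - n_inputs_after) / n_inputs_before
    return (metric_after - metric_before) / reduction_fraction

# E_R2: R2 rising from 0.88 to 0.91 while inputs drop from 27 to 12.
e_r2 = reduction_efficiency(0.91, 0.88, 12, 27)
# E_RMSE: RMSE falling from 4.0 to 3.2 over the same reduction.
e_rmse = reduction_efficiency(3.2, 4.0, 12, 27)
print(e_r2, e_rmse)   # positive E_R2 and negative E_RMSE both indicate gains
```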
As presented in
Figure 11, E_R² exhibits considerable variation across datasets but is distributed roughly symmetrically around zero for the majority of cases. Positive E_R² values are observed primarily in datasets with high-dimensional inputs: dataset 2, with 27 inputs, has E_R² = 0.09, while datasets 21 and 22, also with 27 inputs, have E_R² = 0.27 and 0.25, respectively. This indicates that FIRRE successfully identifies and removes redundant or weakly correlated inputs, thereby improving generalization. In contrast, slightly negative E_R² values occur mostly in low-dimensional datasets, where further reduction can remove a small portion of informative variables and lead to marginal declines in R². Overall, most datasets exhibit modest efficiency magnitudes, demonstrating that FIRRE achieves a balanced trade-off between simplification and predictive accuracy.
As presented in
Figure 12, E_RMSE reveals a similar pattern. Sixteen out of 32 datasets show negative E_RMSE values, suggesting that FIRRE not only maintains but actively improves prediction accuracy by mitigating overfitting and reducing noise sensitivity. These improvements are most evident in datasets with many inputs, where redundancy is high. Conversely, a few smaller datasets show positive E_RMSE values, indicating slightly higher errors after reduction; however, these increases remain limited in magnitude. Overall, the efficiency analysis demonstrates that FIRRE delivers consistent and stable predictive performance across diverse datasets, simultaneously achieving input simplification and enhanced model generalization.
5.2.2. Computational Efficiency
To further evaluate the computational benefits of input reduction, a time improvement indicator was used to quantify the change in computation time before and after applying FIRRE. This indicator is calculated using Equation (8), where positive values indicate a reduction in total computation time, reflecting improved computational efficiency after feature reduction. Computation time includes the complete training and validation process for each dataset and is plotted on a logarithmic scale to accommodate the wide variation in dataset sizes and model complexities.
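Equation (8) is not reproduced here; assuming the usual fractional-improvement form, the indicator can be sketched as:

```python
def time_improvement(t_reduced, t_original):
    # Fractional change in computation time; positive values mean the
    # FIRRE-reduced input set trains faster than the original one.
    return (t_original - t_reduced) / t_original

# e.g., training time dropping from 120 s to 72 s after input reduction:
print(f"{time_improvement(72.0, 120.0):.0%}")
```

For this example the improvement is 40%, i.e., the reduced model trains in 60% of the original time.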
As shown in
Figure 13, FIRRE consistently reduces computation time across most datasets, demonstrating its effectiveness in enhancing computational efficiency. Approximately 85% of the datasets exhibit a noticeable reduction in training time, indicating that feature reduction generally accelerates model convergence and improves computational performance. The blue bars are generally lower than the gray bars, particularly in datasets with a large number of inputs or samples, such as datasets 3, 13, 21, and 29 to 32. The corresponding dotted line indicates that most datasets achieve a time improvement between 10% and 60%. However, a few smaller datasets show minimal or slightly negative improvement, likely due to nonlinear overheads from deeper architectures, batch size or input–output inefficiencies, and hardware-level mismatches. These cases indicate that when system or selection costs approach the scale of training time, the overall gains may diminish slightly [
53,
54]. It should be noted that hardware-level mismatches refer to minor variations in system resource allocation, background processes, or memory management that can slightly affect computation time. Changes in input features, such as different numbers of inputs, may interact with these hardware-level factors, leading to small variations in measured training times.
The logarithmic scale highlights that FIRRE’s impact becomes more pronounced as computation time increases. For computationally intensive datasets, feature reduction results in substantial time savings, primarily due to the smaller input layer size and fewer network connections during training. Conversely, datasets with limited input or already efficient training configurations show less noticeable improvement. Overall, the results demonstrate that FIRRE not only simplifies model inputs but also enhances computational efficiency, particularly for medium- and large-scale datasets where training time is a limiting factor.
5.3. Summary
The effect of the FIRRE framework can be characterized using six complementary indicators that jointly describe its predictive and computational performance. Besides the percentage of input reduction (%) and the percentage of computation time reduction (%), ER2 (%), ERMSE (%), ΔR2, and ΔRMSE are also incorporated and normalized into relative values using Equations (9) and (10). Together, these six indicators capture FIRRE’s effects on accuracy, efficiency, and model simplification in a consistent, quantitative manner.
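Assuming Equations (9) and (10) perform a min-max style rescaling of each indicator across datasets (a common choice for radar plots; the exact formulas are not reproduced here), a minimal sketch is:

```python
import numpy as np

def minmax_normalize(values):
    """Scale a set of indicator values onto a 0-1 range (min-max style).

    A single extreme value (e.g., a very large baseline RMSE) stretches
    the range and pushes the remaining datasets toward one end, which is
    the clustering effect noted in the text.
    """
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.zeros_like(v)  # all values identical: nothing to rank
    return (v - v.min()) / span

# One outlier dominates the range; the other values cluster near 0.
print(minmax_normalize([1.0, 2.0, 3.0, 100.0]))
```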
Figure 14 presents radar plots summarizing the median performance and interquartile range (IQR) across six normalized indicators (0 to 1 scale) for three dataset groups categorized by initial input dimensionality: fewer than 8, between 9 and 15, and more than 15 input variables. The indicators include changes in predictive accuracy (ΔR2, ΔRMSE) and efficiency metrics (ER2, ERMSE), alongside input and time reduction percentages. For ΔR2 and ER2, higher values indicate performance improvement, while for ΔRMSE and ERMSE, lower (more negative) values indicate better performance, as they represent error reduction. However, in the normalization process, datasets with extremely large baseline RMSE values can produce disproportionately large ΔRMSE and ERMSE magnitudes, which dominate the normalization range and cause other datasets to cluster near the maximum normalized value.
High-dimensional datasets exhibit the largest normalized gains in ΔR2 and input reduction, confirming that FIRRE achieves substantial dimensionality compression while maintaining accuracy when feature redundancy is high. Medium-dimensional datasets demonstrate balanced performance across all indicators with median values consistently above 0.5 and narrow IQR bands, reflecting optimal trade-offs between simplification and accuracy. Low-dimensional datasets show more modest improvements, as fewer redundant features are available for removal, though FIRRE maintains minimal accuracy loss.
Overall, these results confirm that FIRRE adapts effectively to varying dimensionality, delivering the greatest benefits in high-dimensional problems while maintaining stable performance across all dataset complexities.
6. Case Study: Prediction of Dynamic Modulus of Asphalt Mixtures
To further demonstrate the practical applicability and interpretability of the proposed FIRRE framework in engineering contexts, this section presents a case study using a representative dataset. Whereas the preceding sections focused on cross-dataset evaluation, sensitivity analysis, and computational efficiency, the current case study illustrates FIRRE’s performance when applied to a specific engineering problem. Through a step-by-step implementation, the reduced input set, prediction accuracy, and computational efficiency are examined in detail, emphasizing how FIRRE enhances model generalization and simplifies input representation without compromising predictive accuracy.
6.1. Dataset Description and ANN Architecture
As a case study, the proposed FIRRE method is applied to an ANN dataset for dynamic modulus prediction of asphalt mixtures [
55]. The original model was a Genetic Algorithm-modified ANN (GA-ANN), with 25 input parameters, one hidden layer of 16 neurons, and one output parameter. Hyperparameters for the architecture are summarized in
Table 2. The architecture of the ANN was evaluated, and all hyperparameters were fine-tuned and reported in the previous study [
55].
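The reported 25-16-1 topology fixes the number of trainable weights and biases in the fully connected network; a quick sketch of the count (the GA-based hyperparameter tuning itself is not reproduced here):

```python
def dense_param_count(layer_sizes):
    """Weights + biases of a fully connected network given layer widths."""
    return sum(n_in * n_out + n_out  # weight matrix plus bias vector
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Original 25-16-1 topology vs. the FIRRE-reduced 7-input topology.
print(dense_param_count([25, 16, 1]))  # 433 parameters
print(dense_param_count([7, 16, 1]))   # 145 parameters
```

The roughly threefold drop in parameter count after input reduction is one source of the training-time savings discussed in Section 5.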
The input and output parameters are summarized in
Table 3. As the dynamic modulus is a material characteristic, the original study classified the input parameters into the properties of the various components and the testing conditions. It was reported that the input parameters had not been screened for correlation or redundancy, as the 25 input parameters helped characterize the material composition of aggregates, base binder, and aged binder, as well as the testing method and aging procedures [
55].
6.2. Application of FIRRE Reduction Method
The FIRRE method was applied to reduce redundant or weakly informative variables. In stage 1, the importance ranking module removed inputs with weak correlations to the target output, reducing the feature set from 25 to 16 variables. Using the same GA-ANN configuration, model performance improved from R2 = 0.926 and RMSE = 2259 to R2 = 0.947 and RMSE = 1915, confirming that removal of irrelevant variables enhanced predictive performance. In stage 2, a pairwise Pearson correlation analysis was performed on the remaining 16 variables to identify and remove redundant inputs.
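The stage-2 idea can be sketched as follows: compute the pairwise Pearson correlation matrix and greedily drop one feature from each highly correlated pair. The 0.9 threshold and the keep-the-earlier-feature rule are illustrative assumptions, not values from the original study:

```python
import numpy as np

def prune_redundant(X, feature_names, threshold=0.9):
    """Greedy redundancy pruning via pairwise Pearson correlation.

    Keeps a feature only if its absolute correlation with every
    already-kept feature stays at or below `threshold`.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))  # features in columns
    keep = []
    for j in range(len(feature_names)):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return [feature_names[j] for j in keep]

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.01 * rng.normal(size=200)   # nearly a duplicate of a
c = rng.normal(size=200)              # independent feature
X = np.column_stack([a, b, c])
print(prune_redundant(X, ["a", "b", "c"]))  # ['a', 'c']
```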
Figure 15 presents the corresponding correlation map. In
Figure 15a, the initial correlation matrix for all 25 inputs reveals strong multicollinearity among variables describing coarse aggregate gradation and RAP binder properties. Then, in
Figure 15b, the reduced matrix after stage 1 still presents moderate cross-correlations between binder-related and aging-related features. After stage 2, the final correlation structure depicted in
Figure 15c shows only weakly correlated variables after redundancy pruning. The sparse pattern confirms the elimination of multicollinearity and ensures that each retained feature provides distinct information. Retraining the ANN with the 7-input configuration yielded the highest predictive accuracy, with R2 = 0.966 and RMSE = 1492. This demonstrates that removing highly correlated variables stabilizes the learning process and improves model efficiency without sacrificing accuracy. The prediction results are presented in
Figure 16.
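The accuracy gains above are reported as R2 and RMSE; a minimal sketch of the standard definitions used to score such predictions (not code from the original study):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the units of the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(r2_score(y_true, y_pred), rmse(y_true, y_pred))
```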
6.3. Interpretation on the Reduced Input Parameters
Inspection of the reduced input composition clarifies the physical meaning of FIRRE’s selection. After stage 1, nine features were eliminated, including large-sieve aggregate contents (13.2 mm, 9.5 mm, 4.75 mm), crumb rubber mesh, aged binder content in 0–3 and 3–5 mm RAP, aged binder penetration grade and softening point, and the accelerated aging factor. Among these, three were related to coarse aggregate gradation, one described a crumb rubber characteristic, four described RAP binder properties, and one represented the laboratory aging condition. As noted in the study [
10], mixture gradations were controlled to be consistent among groups; thus, the three largest sieve sizes (13.2 mm, 9.5 mm, and 4.75 mm) were excluded due to low correlation with the target output. Since only a single crumb rubber type (30 mesh) and a single accelerated aging factor (53.1) were used, their exclusion was expected. Notably, aside from RAP content (0–3 mm and 3–5 mm), all other RAP-related binder properties were removed, suggesting weak correlation between these RAP binder properties and the target modulus.
After stage 2, the input parameters were further reduced from 16 to 7, removing nine additional variables: aggregates retained on the 2.36 mm, 1.18 mm, 0.6 mm, 0.15 mm, and 0.075 mm sieves, filler content, RAP content 0–3 mm, RAP content 3–5 mm, and base binder softening point. Six of these were aggregate gradation parameters. Following FIRRE, only the aggregate retained on the 0.3 mm sieve remained, indicating that this parameter had the strongest correlation with the target modulus while showing minimal redundancy with other inputs. In this case, the aggregate retained on the 0.3 mm sieve may be the critical gradation factor affecting the dynamic modulus of asphalt mixtures. Interestingly, both RAP content variables were removed at this stage, possibly because their influence, along with that of RAP on gradation, was already captured by the 0.3 mm sieve retention parameter when the aggregate gradation was designed. The base binder softening point was also excluded, likely due to redundancy with penetration grade, as both describe binder rheological properties. Thus, only penetration grade was retained among the final seven inputs.
Overall, the case study confirms that FIRRE can automatically identify physically meaningful and statistically independent variables, enhancing both predictive reliability and interpretability of ANN models for asphalt materials.