Article

Machine Learning Approaches for Predicting the Elastic Modulus of Basalt Fibers Combined with SHapley Additive exPlanations Analysis

Ling Zhang, Ning Lin and Lu Yang
1 Xinjiang Biomass Solid Waste Resources Technology and Engineering Center, Kashi University, Kashi 844006, China
2 School of Resources and Environmental Engineering, Shandong University of Technology, Zibo 255000, China
* Author to whom correspondence should be addressed.
Minerals 2025, 15(4), 387; https://doi.org/10.3390/min15040387
Submission received: 5 March 2025 / Revised: 1 April 2025 / Accepted: 3 April 2025 / Published: 5 April 2025
(This article belongs to the Section Clays and Engineered Mineral Materials)

Abstract

The elastic modulus of basalt fibers is closely associated with their chemical composition. In this study, eight machine learning models were developed to predict the elastic modulus, with hyper-parameter tuning performed through the GridSearchCV technique. Model performance was evaluated using the coefficient of determination (R2), root-mean-square error (RMSE), and mean absolute error (MAE). SHAP analysis was employed to uncover the relationships between oxide compositions and the elastic modulus, including their interactions. Among the models, the Categorical Boosting (CatBoost) algorithm performed best, with an R2 of 0.9554, an RMSE of 4.7556, and an MAE of 2.0323. SHAP analysis indicated that CaO had the most significant influence on the elastic modulus predictions, with the importance of the other oxides ranked as follows: SiO2, Al2O3, MgO, K2O, Na2O, Fe2O3, FeO, and TiO2. Additionally, SHAP analysis identified the oxide content ranges that contribute positively to the predicted elastic modulus. This research provides new insights into leveraging machine learning to optimize the mechanical properties of basalt fibers.

Graphical Abstract

1. Introduction

Basalt fibers, recognized as a high-performance fiber material, have gained considerable attention for their superior mechanical strength, thermal resistance, and chemical stability [1]. These fibers are produced from natural basalt rocks through a melt-spinning process and consist mainly of oxides such as SiO2, Al2O3, CaO, MgO, FeO, and Fe2O3 [2]. The elastic modulus of basalt fibers denotes the ratio of incremental stress to incremental strain during elastic deformation when the fiber is subjected to an external force [3,4]. This ratio serves as a critical indicator of a basalt fiber’s rigidity and its capacity to resist deformation under stress. In applications that must withstand high loads, such as high-strength composite materials [5,6] and reinforced concrete [7,8], a high elastic modulus is essential. Furthermore, the advantages of basalt fibers, including their low production costs [9] and eco-friendliness [10], render them a compelling alternative to conventional reinforcement materials in various industrial applications.
Basalt fiber-reinforced composites are produced by adding basalt fibers as reinforcement to the matrix of composite materials. These reinforced composites include basalt fiber-reinforced cement and basalt fiber-reinforced polymer composites [11], which generally exhibit excellent mechanical properties, chemical stability, and durability [12]. Tumadhir et al. analyzed the elastic modulus of basalt fiber-reinforced concrete with different basalt fiber volume fractions (0.1%, 0.2%, 0.3%, and 0.5%) [13] and found that the elastic modulus reached its maximum at a volume fraction of 0.3%. Ayub et al. discovered that the elastic modulus of concrete increases with basalt fiber volume fraction, with the optimal basalt fiber content lying between 1% and 3% [14]. Elmahdy et al. investigated the elastic modulus of basalt–epoxy and glass–epoxy composites under various strain rates; across all tested strain rates, the elastic modulus of the basalt–epoxy composites surpassed that of the glass–epoxy composites, with a difference ranging from 3.7% to 41% [14]. In a comparative study by Lopresto et al., the Young’s modulus of basalt composites was found to be 45% higher than that of glass composites [15]. Clearly, the elastic modulus of the basalt fibers themselves has a significant effect on the elastic modulus of the reinforced composites.
Previous studies have shown that the elastic modulus of basalt fibers is closely related to their chemical composition. The primary chemical components, SiO2, Al2O3, CaO, MgO, FeO, and Fe2O3, affect the network structure in different ways, which in turn affects the elastic modulus of the basalt fibers and, consequently, the mechanical properties of basalt-reinforced composites. Ding et al. compared the elastic moduli of basalt fibers and glass fibers, finding that the elastic modulus of basalt fibers is approximately 18% higher than that of glass fibers [16]. Research by Deák et al. showed that SiO2 and Al2O3 are the main components of basalt fibers and that increasing their content can enhance the elastic modulus of the fibers [17]; the combined SiO2 and Al2O3 content correlates strongly with the elastic modulus, with a correlation coefficient of up to 0.80. Oxides formed from ions with a larger ionic radius and lower charge, such as Na+, K+, and Ba2+, do not favor enhancement of the elastic modulus, whereas oxides consisting of ions with a smaller ionic radius and higher polarizing ability, such as Li+, Be2+, Mg2+, Al3+, and Ti4+, contribute to an increased elastic modulus [18].
Machine learning is an effective tool for predicting the mechanical properties of basalt-reinforced materials and reinforced concrete. Wei et al. utilized an Artificial Neural Network (ANN) model to predict the alkali resistance of basalt fibers; the input variables were the contents of Si, Al, Fe, Ca, Mg, K, Na, Ti, and Zr in basalt and glass fibers, and the output variable was the tensile strength retention rate. The optimized ANN model achieved an R2 value of 0.92 on the test set [19]. Sun et al. developed three eXtreme Gradient Boosting (XGBoost) models to predict the split tensile strength of basalt fiber-reinforced coral aggregate concrete; the ESOA-XGBoost model (XGBoost optimized with the egret swarm optimization algorithm) performed best, with an R2 of 0.9633 [20]. Alarfaj et al. compared the performance of five machine learning models in predicting the splitting tensile strength of fiber-reinforced recycled aggregate concrete, and a deep neural network showed the best performance, with the highest R2 value of 0.94 [21]. A review by Machello et al. found that various machine learning algorithms can generally predict the mechanical properties of fiber-reinforced polymer (FRP) composites accurately with minimal error, and pointed out that more experimental data are needed in future studies to enhance the current database and improve the performance of these models [22]. It is therefore reasonable to expect that machine learning can reveal the relationship between the chemical composition of basalt fibers and their elastic modulus.
This study evaluates the performance of eight machine learning models (Multiple Linear Regression (MLR), K-Nearest Neighbors (KNN), decision tree (DT), Support Vector Regression (SVR), and ANN, as well as ensemble machine learning models such as Random Forest (RF, based on the Bagging algorithm), XGBoost (based on the Boosting algorithm), and Categorical Boosting (CatBoost, based on the Boosting algorithm)) in predicting the elastic modulus of basalt fibers. The chemical components SiO2, Al2O3, TiO2, Fe2O3, CaO, Na2O, MgO, FeO, and K2O are used as input variables, with the elastic modulus as the output variable. The coefficient of determination (R2), root-mean-square error (RMSE), and mean absolute error (MAE) serve as evaluation metrics for assessing the predictive accuracy of the models. SHapley Additive exPlanations (SHAP) is applied to analyze the significance of each input variable in the optimal model and the dependence relationship between variables. This study presents a cost-effective and efficient approach for predicting the mechanical properties of basalt fibers based on their chemical composition, showcasing the potential of machine learning in the industrial production of basalt fibers.

2. Materials and Methods

2.1. Data Pre-Processing

The data utilized in this study comprise experimental and literature data. Of these, 92 sets of experimental data were provided by Sichuan Sizhong Basalt Fiber Technology Research Co., Ltd. The remaining 85 sets of literature data were primarily sourced from studies published between 2008 and 2023 [16,17,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42], retrieved through keyword combinations (“chemical composition” with “basalt fiber” or “basalt”) in Elsevier, Springer, Wiley, and related databases. This timeframe was selected to capture recent studies on the elastic modulus of basalt fibers and their chemical composition, and the relationship between the two. All literature data were drawn from publications focused on the chemical compositions of basalt and the corresponding fiber mechanical properties. The data comprised input and output variables: the input variables were the chemical components SiO2, Al2O3, TiO2, Fe2O3, CaO, Na2O, MgO, FeO, and K2O, while the output variable was the elastic modulus of the basalt fibers. Oxide contents were quantified predominantly by wet chemical analysis, ICP-OES, XRF, and EDS. The elastic modulus of basalt fibers typically refers to the tangent modulus (Young’s modulus) measured under uniaxial tension, following testing standards including ISO 9163:2005(E) [43], EN ISO 5079:1999 [44], the German standard DIN 65382 [45], GB/T 7690.1-2001 [46], GB/T 38897-2020 [47], GB/T 3362-82 [48], ASTM C1557-14 [49], and ASTM D 3379-75 [50]. The analytical methods used for chemical composition and elastic modulus in each literature source are summarized in Table A1 (Appendix B).
It should be noted that the use of different analytical methods (wet chemical analysis, ICP-OES, XRF, and EDS) across the literature introduces systematic variations in the measured chemical compositions, and that the elastic moduli measured on single fibers and on fiber rovings differ; both factors may affect the development of the machine learning models. Because each study adopted its own testing methods, unifying them when collecting the data was not feasible. Rigorous data cleaning was therefore conducted on all collected data, involving the removal of outliers and the imputation of missing values. Outliers were identified and removed using the Z-score method, where the Z-score of each data point was calculated as
$Z = \frac{x - \mu}{\sigma}$ (1)
where $x$ represents the data point, $\mu$ is the mean of the dataset, and $\sigma$ is the standard deviation. A threshold of $|Z| > 3$ was set, meaning that any data point with a Z-score greater than 3 or less than −3 was classified as an outlier and excluded from the analysis. This threshold was selected based on field-specific conventions and the characteristics of the dataset. Missing values were imputed with the mean of the remaining values of the corresponding variable, a commonly adopted approach. Following data cleaning, the input variables were min-max normalized to the range [0, 1] to ensure that all data were on the same scale.
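As a minimal sketch of this cleaning pipeline, assuming the raw data are loaded into a pandas DataFrame with one column per oxide and one for the elastic modulus (the file and column names below are illustrative, not the actual dataset layout), the three steps could be implemented as follows:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative column names; the actual dataset layout may differ.
oxide_cols = ["SiO2", "Al2O3", "TiO2", "Fe2O3", "CaO", "Na2O", "MgO", "FeO", "K2O"]
df = pd.read_csv("basalt_fiber_data.csv")  # hypothetical file name

# 1. Z-score outlier removal: drop rows with any oxide value where |Z| > 3.
z = (df[oxide_cols] - df[oxide_cols].mean()) / df[oxide_cols].std()
df = df[((z.abs() <= 3) | z.isna()).all(axis=1)]

# 2. Impute missing oxide values with the mean of that variable.
df[oxide_cols] = df[oxide_cols].fillna(df[oxide_cols].mean())

# 3. Min-max normalize the input variables to [0, 1].
X = MinMaxScaler().fit_transform(df[oxide_cols])
y = df["elastic_modulus"].to_numpy()  # hypothetical target column name
```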

2.2. Machine Learning Models

Representative machine learning models including MLR, KNN, DT, SVR, ANN, RF, XGBoost, and CatBoost were selected for training and testing in this study. An overview of each model’s algorithm, along with its basic principles, advantages, and disadvantages, is provided in Appendix A.

2.3. Hyper-Parameter Optimization

Empirical results suggest that for traditional machine learning applications with limited datasets (e.g., sample sizes below 10,000), a 70:30 training–test split often achieves an optimal balance, enabling a reliable evaluation of model generalizability. Therefore, the preprocessed data were divided into a training set (70%) and a testing set (30%) to ensure that the models could perform well on unseen data. GridSearchCV was employed for hyper-parameter optimization of each machine learning model. This method defines a grid of candidate hyper-parameter values and combines an exhaustive search over that grid with cross-validation to identify the combination that maximizes model performance. This step is crucial for improving the generalization ability and prediction accuracy of the models.
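A minimal sketch of this split and grid search, using scikit-learn's GridSearchCV with the CatBoost regressor as an example estimator (X and y are the cleaned arrays from the sketch in Section 2.1, and the parameter grid is an illustrative subset of the ranges listed in Table A2):

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from catboost import CatBoostRegressor

# 70:30 split of the preprocessed data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Candidate hyper-parameter values (illustrative subset of Table A2).
param_grid = {
    "iterations": [50, 500, 1000],
    "learning_rate": [0.01, 0.03, 0.1],
    "depth": [3, 5, 7],
}

# Exhaustive grid search with 5-fold cross-validation on the training set.
search = GridSearchCV(
    estimator=CatBoostRegressor(verbose=0),
    param_grid=param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)
best_model = search.best_estimator_
print(search.best_params_)
```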

2.4. Model Performance Evaluation

The evaluation metrics R2, RMSE, and MAE were employed to compare the performance of the models on both the training and testing sets after hyper-parameter optimization. These metrics provide a comprehensive evaluation of each model’s predictive accuracy and stability: R2 provides an overall measure of goodness of fit, while RMSE and MAE indicate the magnitude of the prediction errors. The coefficient of determination, R2, assesses the degree of fit between the model’s predicted values and the actual observed values, with a value closer to 1 indicating a better fit.
$R^2 = 1 - \frac{SSE}{SST}$ (2)
SSE (Sum of Squares due to Error) represents the sum of squared residuals, which is the sum of the squares of the differences between the predicted values and the actual observed values. SST (Total Sum of Squares) denotes the total sum of squared deviations, which is the sum of the squares of the differences between the actual observed values and their mean. The coefficient of determination can also be expressed by the following formula.
$R^2 = 1 - \frac{\sum_{i=1}^{m}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{m}(y_i - \bar{y})^2}$ (3)
RMSE is the square root of the ratio of the sum of squared deviations between the predicted values and the actual observed values to the number of observations, m.
$RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2}$ (4)
MAE is the average of the absolute values of the differences between all predicted values and actual observed values.
$MAE = \frac{1}{m}\sum_{i=1}^{m}|y_i - \hat{y}_i|$ (5)
In Equations (3)–(5), $y_i$ represents the actual value at the i-th observation point, $\hat{y}_i$ denotes the predicted value at the i-th observation point, and $\bar{y}$ indicates the average of the actual values across all observation points.
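Assuming the fitted model and the hold-out split from the earlier sketches, these three metrics can be computed directly with scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = best_model.predict(X_test)

r2 = r2_score(y_test, y_pred)                       # Equation (3)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # Equation (4)
mae = mean_absolute_error(y_test, y_pred)           # Equation (5)
print(f"R2 = {r2:.4f}, RMSE = {rmse:.4f}, MAE = {mae:.4f}")
```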

2.5. Interpretability Analysis

SHAP is a machine learning interpretation tool grounded in the Shapley value from game theory, capable of quantifying each input variable’s contribution to model prediction [51]. It serves as an effective approach for addressing complex, black-box problems and is applicable to nearly all supervised machine learning models. A significant advantage of the SHAP method is that it provides both global and local explanations. The global explanation can identify which input variables most strongly affect model predictions, while the local explanation helps reveal which input variables significantly influence predictions at specific data points. Additionally, SHAP offers a suite of powerful visualization tools to aid in comprehending and interpreting complex machine learning models. This study uses SHAP to analyze the significance of each oxide in the input variables for the basalt elastic modulus, identify how input variables affect the basalt elastic modulus at various values, and uncover any possible nonlinear relationships or interactions between input variables and the basalt elastic modulus.
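A minimal sketch of such an analysis for a fitted tree-ensemble model, reusing best_model, X_test, and oxide_cols from the earlier sketches (shap.TreeExplainer supports gradient-boosted tree models such as CatBoost and XGBoost):

```python
import shap

explainer = shap.TreeExplainer(best_model)
shap_values = explainer.shap_values(X_test)  # one SHAP value per sample and oxide

# Global explanation: oxides ranked by mean absolute SHAP value (cf. Figure 5).
shap.summary_plot(shap_values, X_test, feature_names=oxide_cols, plot_type="bar")

# Dependence view: how CaO content shifts the predicted modulus (cf. Figure 6).
shap.dependence_plot("CaO", shap_values, X_test, feature_names=oxide_cols)
```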

3. Results

3.1. Description of Variables and Correlation Analysis

The elastic modulus of basalt fibers is closely correlated with their oxide composition. Table 1 summarizes the range, mean, and standard deviation of the input variables (oxides), while Figure 1 presents a pie chart of the average percentage of each oxide in the database. Among all oxides, SiO2 is the most abundant, with content ranging from 42.43% to 66.90% and an average of 54.07%, followed by Al2O3, with content ranging from 8.70% to 25.60% and an average of 15.37%. These two oxides form the structural backbone of basalt fibers, acting as network formers that provide the fundamental mechanical properties. The other oxides are generally regarded as network modifiers, which alter the network structure of basalt fibers and significantly influence their mechanical behavior.
Table 1 also reveals that SiO2 and Al2O3 exhibit relatively small standard deviations, indicating that the primary components of basalt fiber exhibit minimal fluctuation in content, whereas other oxides have broader distribution ranges. Figure 2 provides a Pearson correlation coefficient (PCC) heatmap illustrating the relationships between each oxide and the elastic modulus. The absolute PCC values for SiO2-Fe2O3, SiO2-Al2O3, and Al2O3-CaO are 0.62, 0.51, and 0.43, respectively, while the correlations among other variables remain below 0.40. This indicates a weak correlation among the input variables, which is favorable for constructing a stable machine learning model.

3.2. Model Performance

To obtain the best performance for each model, grid search combined with k-fold cross-validation (k = 5) was employed to determine the optimal combination of model parameters. The coefficient of determination (R2), root-mean-square error (RMSE), and mean absolute error (MAE) were introduced to evaluate the performance of each model in predicting the elastic modulus of basalt fiber. Smaller RMSE and MAE values indicate better predictive accuracy of the model, while an R2 value closer to 1 signifies a stronger fit to the test set. The hyper-parameter ranges and optimal parameters for each model are shown in Table A2 (Appendix B). The performance of each model is shown in Table 2 and Figure 3.
The R2 values of the DT, ANN, XGBoost, and CatBoost models on the training set are close to 1, accompanied by low RMSE and MAE values, indicating strong performance during training; however, the near-perfect training fit of DT paired with its poor test performance (R2 = 0.0516) points to overfitting. On the test set, the CatBoost model achieves an R2 of 0.9554, an RMSE of 4.7556, and an MAE of 2.0323, demonstrating the lowest error and the highest degree of fit. Although the R2, RMSE, and MAE values of XGBoost are comparable to those of CatBoost, the latter outperforms XGBoost in terms of overall error reduction and predictive accuracy. The ranking of predictive accuracy on the test set is as follows: CatBoost, XGBoost, ANN, KNN, RF, MLR, SVR, and DT. CatBoost’s superior performance is attributed to its ability to handle nonlinear relationships, its effectiveness on small datasets, its resistance to overfitting, and its relative insensitivity to hyper-parameter settings. These advantages, together with its strong adaptability to the small dataset used here, make it the best-performing model in this study, surpassing the other tree-based algorithms such as RF and DT.
Figure 4a shows a linear regression plot comparing the actual values to the predicted values of the CatBoost model on the test set. Figure 4b illustrates the test set data, predicted values, and corresponding absolute errors. The results reveal strong alignment between the actual and predicted values, with relatively small discrepancies. Specifically, the mean error for the elastic modulus is only 1.19, underscoring the CatBoost model’s accuracy. These findings demonstrate the model’s effectiveness in predicting the elastic modulus of basalt fibers based on their chemical composition.

3.3. SHAP Interpretation

The evaluation of model performance demonstrates that the CatBoost model achieves the highest accuracy in the elastic modulus prediction task. However, its complexity poses challenges in understanding and interpreting its behavior during model training, testing, and internal decision-making processes. This lack of interpretability limits the model’s transparency and credibility in practical applications. To address this limitation, this study utilizes the SHAP method to compute the specific contribution (SHAP value) of each input variable to the elastic modulus of basalt. By quantifying the importance of each oxide, SHAP provides consistent, fair, and interpretable insights, enhancing the model’s transparency and practical utility.
Figure 5 shows the importance of each oxide in predicting the elastic modulus, ranked by mean absolute SHAP value. In descending order of importance, the oxides are CaO, SiO2, Al2O3, MgO, K2O, Na2O, Fe2O3, FeO, and TiO2. Notably, CaO has the most pronounced influence on the elastic modulus, a finding not highlighted in previous studies. In the melt state during fiber formation, CaO acts as a network modifier, significantly enhancing ionic mobility, while the divalent Ca2+ ion, with its large ionic radius and moderate charge, is effective in balancing the charges of the basalt network [18,52]. Its presence facilitates the incorporation of additional atoms into the fiber structure, forming stable R-O ionic bonds that consolidate the network structure and enhance the elastic modulus. SiO2 and Al2O3 follow CaO in importance, contributing significantly to the elastic modulus as they form the structural backbone of basalt fibers and endow them with their fundamental mechanical properties. MgO plays a role similar to CaO, but because of the smaller ionic radius of Mg2+, it generates a higher electric field intensity; this promotes the polymerization of the polyhedral units disrupted by alkali metals in the glass network, further improving the fiber’s elastic modulus. The SHAP values for K2O, Na2O, and Fe2O3 are comparable, reflecting their similar influence on the elastic modulus, while FeO and TiO2 exhibit the lowest SHAP values, indicating minimal impact, likely due to their relatively low concentrations in the dataset. It is noteworthy that the mean absolute SHAP values for SiO2, Al2O3, MgO, K2O, Na2O, Fe2O3, FeO, and TiO2 are only 1/7 to 1/3 of that for CaO. This underscores the role of CaO as a network modifier that significantly enhances network polymerization and thereby improves the fiber’s elastic modulus, making optimization of the CaO content a critical strategy for enhancing the elastic modulus of basalt fibers.
Pearson correlation analysis reveals that the input variables exhibit relatively weak intercorrelations. Consequently, univariate dependence analysis is employed to visualize the impact of individual variables on the model’s predictions. Figure 6 illustrates the relationship between each oxide content and its SHAP value, where the x-axis represents the input variable value and the y-axis the SHAP value. This plot highlights whether a given oxide content has a positive or negative influence on the predicted elastic modulus. For instance, for CaO, the SHAP value exceeds 0 when the CaO content lies in the range [3.40, 8.20]; within this range, CaO increases the predicted elastic modulus of basalt fibers. Figure 7 summarizes the ranges over which each input variable contributes positively to the predicted elastic modulus.

4. Conclusions

This study constructed eight machine learning prediction models, namely MLR, KNN, DT, SVR, ANN, RF, XGBoost, and CatBoost, to assess the elastic modulus of basalt fiber based on oxide composition. SHAP analysis was conducted to examine the relevance and interaction of oxide composition on the elastic modulus. The following are the main conclusions.
(1)
The correlation among the oxide variables is weak, and there is no significant linear correlation with the elastic modulus.
(2)
The CatBoost model performed best for elastic modulus prediction, achieving an R2 of 0.9554, an RMSE of 4.7556, and an MAE of 2.0323 on the test set.
(3)
The SHAP results for the CatBoost model revealed the following ranking of input variable importance, in descending order: CaO > SiO2 > Al2O3 > MgO > K2O > Na2O > Fe2O3 > FeO > TiO2.
(4)
The calcium oxide content has a significant impact on the elastic modulus of basalt fibers, indicating that adjusting the calcium oxide content is an important approach to improving the elastic modulus of basalt fibers.

Author Contributions

Conceptualization, methodology, writing—original draft, funding acquisition, L.Z.; validation, data curation, N.L.; supervision, writing—review and editing, funding acquisition, L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Open Project Fund of Xinjiang Biomass Solid Waste Resources Technology and Engineering Center (KSUGCZX202204), Shandong Provincial Natural Science Foundation (ZR2021QE016), and National Natural Science Foundation of China (52004228).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. An Overview of the Employed Machine Learning Models

Appendix A.1. Multiple Linear Regression

Multiple Linear Regression (MLR) is a regression model that predicts a dependent variable under the assumption of linear relationships with multiple independent variables. The core principle of MLR is to fit the data by the method of least squares, identifying a line (or, in multidimensional space, a hyperplane) that minimizes the sum of squared errors between the observed and predicted values, i.e., minimizing $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $y_i$ represents the actual value and $\hat{y}_i$ the predicted value. The predictive model is given by $\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}$, where $\beta$ denotes the regression coefficients and $x$ the independent variables.
This model is characterized by its simplicity, ease of understanding and interpretation, and effectiveness in fitting data that exhibit linear relationships. However, MLR is sensitive to outliers and performs poorly when attempting to fit nonlinear data.
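For reference, a minimal least-squares MLR fit with scikit-learn, reusing the X_train/y_train split from the sketch in Section 2.3:

```python
from sklearn.linear_model import LinearRegression

mlr = LinearRegression()
mlr.fit(X_train, y_train)         # ordinary least-squares fit
print(mlr.intercept_, mlr.coef_)  # beta_0 and beta_1 ... beta_p
y_pred_mlr = mlr.predict(X_test)
```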

Appendix A.2. K-Nearest Neighbors Regression (KNN)

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm that predicts the target value by averaging the targets of its k nearest neighbors. The training dataset is denoted as $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$. For a sample $x$ to be predicted, the distance (e.g., the Euclidean distance $d(x, x_i)$) between $x$ and each sample $x_i$ in the training set is calculated. The $k$ samples nearest to $x$ are then identified and denoted as $N_k(x)$. Finally, the prediction for $x$ is obtained from the $y_i$ of these neighbors, typically by averaging or weighted averaging; with simple averaging, the predicted value is $\hat{y} = \frac{1}{k}\sum_{i \in N_k(x)} y_i$.
The KNN model does not require prior assumptions about or modeling of the data and can adapt to various data distributions, demonstrating good fitting ability for nonlinear data. However, when the dataset is large, prediction becomes slow; the model is sensitive to the choice of k and the distance metric, and it is susceptible to the influence of noise and outliers.
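A brief scikit-learn sketch using the optimum reported in Table A2 (k = 22, uniform weights, Minkowski distance) and the split from Section 2.3:

```python
from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor(n_neighbors=22, weights="uniform", metric="minkowski")
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)  # mean modulus of the 22 nearest training samples
```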

Appendix A.3. Decision Tree (DT)

Decision tree (DT) is a supervised learning algorithm that relies on a tree-structured decision-making process. A decision tree comprises nodes and edges, with nodes categorized into root nodes, internal nodes, and leaf nodes. Starting from the root node, the tree gradually partitions the data based on the features of the training data, dividing them into different subsets until leaf nodes are formed. These leaf nodes provide the final prediction values. The DT model is intuitive, easy to understand and interpret, capable of handling nonlinear data, insensitive to missing values, and capable of automatic feature selection. However, it is susceptible to overfitting.

Appendix A.4. Support Vector Regression (SVR)

The Support Vector Regression (SVR) algorithm aims to find an optimal hyperplane such that most data points lie as close as possible to it while allowing for a controlled margin of error.
Given a training dataset $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, assume there exists a linear function $f(x) = w^T x + b$ that fits the data, where $w$ is the weight vector and $b$ is the bias term. The objective of SVR is to minimize
$\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*)$
subject to the constraints
$y_i - w^T x_i - b \le \epsilon + \xi_i$
$w^T x_i + b - y_i \le \epsilon + \xi_i^*$
$\xi_i, \xi_i^* \ge 0$
Here, $\xi_i$ and $\xi_i^*$ are slack variables, $C$ is the penalty parameter, and $\epsilon$ is the allowable error margin. By solving this optimization problem, the optimal values of $w$ and $b$ are obtained, thereby defining the regression model.
SVR is particularly robust and effective in handling high-dimensional and nonlinear datasets. However, its computational complexity increases significantly when applied to large datasets. Moreover, the model’s performance is sensitive to the selection of key parameters such as $C$, $\epsilon$, and the kernel function, and its interpretability is relatively limited.
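A corresponding scikit-learn sketch; C, the kernel, and gamma follow the optima in Table A2, while epsilon is left at the scikit-learn default because it is not reported:

```python
from sklearn.svm import SVR

svr = SVR(kernel="rbf", C=1.0, gamma="scale", epsilon=0.1)
svr.fit(X_train, y_train)
y_pred_svr = svr.predict(X_test)
```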

Appendix A.5. Artificial Neural Networks (ANNs)

An Artificial Neural Network (ANN) is a computational model inspired by the structure and function of biological neural networks. It consists of numerous interconnected neurons and learns to make predictions by identifying patterns in input data. Neural networks typically consist of an input layer, one or more hidden layers, and an output layer.
For a neural network with $n$ input neurons, $m$ hidden neurons, and $p$ output neurons, the input vector $x = (x_1, x_2, \ldots, x_n)$ undergoes a linear transformation with a weight matrix $W_1$ and a bias vector $b_1$, followed by a nonlinear activation function $f_1$, yielding the hidden-layer output $h = f_1(W_1 x + b_1)$. The hidden-layer output is then subjected to another linear transformation with weight matrix $W_2$ and bias vector $b_2$, followed by a nonlinear activation function $f_2$, giving the output layer’s output $\hat{y} = f_2(W_2 h + b_2)$. The backpropagation algorithm adjusts the network’s weights and biases by minimizing a loss function, such as the mean-squared error (MSE):
$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2$
The model exhibits strong fitting capabilities for complex nonlinear relationships, has self-learning and self-adaptive abilities, and can handle various types of data. However, the model is complex and the training process is challenging, with a tendency to become stuck in local optima; it also requires a large amount of training data and considerable training time. Additionally, its interpretability is poor, making the decision-making process difficult to understand.
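A brief sketch with scikit-learn's MLPRegressor, using the architecture and settings from Table A2 (max_iter and random_state are assumed values, not reported in the paper):

```python
from sklearn.neural_network import MLPRegressor

ann = MLPRegressor(hidden_layer_sizes=(100, 100), activation="relu", solver="adam",
                   alpha=0.02, learning_rate="constant", max_iter=2000, random_state=42)
ann.fit(X_train, y_train)         # weights learned by backpropagation
y_pred_ann = ann.predict(X_test)
```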

Appendix A.6. Random Forest (RF)

Random Forest (RF) is an ensemble learning algorithm based on decision trees. Multiple subsets of the original training dataset are randomly sampled with replacement to construct multiple decision trees, each trained independently. During prediction, each decision tree independently predicts the sample, and the final prediction is obtained by aggregating the results of all decision trees, typically by averaging or voting. With averaging, the prediction is
$\hat{y} = \frac{1}{K}\sum_{k=1}^{K} f_k(x)$
where $K$ is the number of decision trees, and $f_k(x)$ represents the prediction of the $k$-th decision tree for the sample $x$.
The model exhibits excellent predictive accuracy and robustness, effectively handles high-dimensional and nonlinear datasets, demonstrates resilience to outliers and noise, and provides insights into feature importance. However, its training and inference times are relatively long, particularly when a large number of trees is used; its complexity hinders interpretability; and it may occasionally overfit, depending on the dataset and parameters.
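The averaging in the equation above can be made explicit with scikit-learn's RandomForestRegressor (hyper-parameters follow Table A2; averaging the individual trees reproduces rf.predict):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=141, max_depth=12,
                           min_samples_split=2, min_samples_leaf=1, random_state=42)
rf.fit(X_train, y_train)

# Ensemble prediction = mean of the K individual trees' predictions.
tree_preds = np.stack([tree.predict(X_test) for tree in rf.estimators_])
y_pred_rf = tree_preds.mean(axis=0)  # identical to rf.predict(X_test)
```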

Appendix A.7. Extreme Gradient Boosting (XGBoost)

Extreme Gradient Boosting (XGBoost) is an efficient gradient boosting algorithm that uses decision trees as base learners. Following the idea of gradient boosting, the process starts with an initial prediction $\hat{y}^{(0)}$, often set to the mean of the target values in the training data. In each iteration, a new decision tree $f_t(x)$ is constructed based on the residuals between the current model’s predictions and the actual target values. The new tree is added to the model, updating the prediction as
$\hat{y}^{(t)} = \hat{y}^{(t-1)} + \eta f_t(x)$
where $\eta$ is the learning rate. Each decision tree’s structure and parameters are determined by minimizing a differentiable loss function, such as the squared-error loss:
$L(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$
Regularization techniques, such as penalizing tree complexity, are also applied during training to prevent overfitting.
XGBoost achieves high predictive accuracy and has excelled in many data mining and machine learning competitions. It supports large-scale datasets and distributed computing, automatically handles missing values, and offers excellent scalability and flexibility.
The model is highly complex with numerous parameters, making hyper-parameter tuning challenging. The training process is relatively intricate and requires significant computational resources and time. Additionally, the model is sensitive to outliers.
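A brief sketch with the xgboost Python package, using the optimal values from Table A2; internally, each boosting round applies the additive update described above:

```python
from xgboost import XGBRegressor

xgb = XGBRegressor(n_estimators=86, learning_rate=0.1, max_depth=6,
                   subsample=0.9, gamma=0, random_state=42)
xgb.fit(X_train, y_train)
y_pred_xgb = xgb.predict(X_test)
```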

Appendix A.8. Categorical Boosting (CatBoost)

CatBoost is a gradient boosting-based machine learning algorithm that automatically handles categorical features, delivering superior performance and efficiency. Like XGBoost, CatBoost is built on the gradient boosting framework, improving model accuracy through the iterative addition of decision trees.
CatBoost employs an innovative ordered boosting approach: when constructing each new tree, it uses only the information from preceding examples in a random permutation of the training data, which avoids the prediction shift caused by reusing the same data for gradient estimation. For categorical features, it applies a target-statistics-based encoding to transform them into numerical values, efficiently exploiting their information. Regularization techniques are also applied to prevent overfitting.
CatBoost offers robust support for categorical features without requiring complex preprocessing. It achieves high prediction accuracy and strong generalization ability, trains quickly with low memory consumption, and is relatively less sensitive to hyper-parameters, making it easier to tune.
Similar to other tree-based ensemble models, CatBoost suffers from limited interpretability. Its performance may be inferior to that of specialized algorithms when dealing with high-dimensional sparse data. Additionally, its robustness to noisy data and outliers requires further improvement.
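A brief sketch with the catboost package using the optima from Table A2; setting boosting_type="Ordered" selects the ordered boosting scheme described above (the paper does not state which boosting type was used, so this choice is illustrative):

```python
from catboost import CatBoostRegressor

cat = CatBoostRegressor(iterations=1000, learning_rate=0.03, depth=7,
                        l2_leaf_reg=3, bagging_temperature=1, random_strength=1,
                        boosting_type="Ordered", verbose=0, random_seed=42)
cat.fit(X_train, y_train)
y_pred_cat = cat.predict(X_test)
```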

Appendix B

Table A1. Analytical methods for chemical composition and elastic modulus in the literature.
Analytical Method for Chemical Composition | Analytical Method for Elastic Modulus | Fiber or Roving | References
Chemical analysis (ASTM C169-92) | ISO 9163:2005(E) | roving | [16]
ICP-OES | EN ISO 5079:1999 | fiber | [17]
ICP-OES | German standard DIN 65382 | fiber | [23]
EDS | GB/T 7690.1–7690.6-2001 | fiber | [24]
XRF | ISO 5079 | fiber | [25]
GB/T 1549-2008 | GB/T 38897-2020 | fiber | [26]
XRF | ISO 5079 | fiber | [27]
Not mentioned | Not mentioned | fiber | [28]
XRF | ISO 5079 | fiber | [29]
Not mentioned | Not mentioned | fiber | [30]
ICP | GB/T 3362-82 | roving | [31]
Not mentioned | Not mentioned | fiber | [32]
Not mentioned | Not mentioned | fiber | [33]
Not mentioned | Not mentioned | fiber | [34]
ICP | ASTM C1557-14 | fiber | [35]
XRF | ASTM D 3379-75 | fiber | [36]
Not mentioned | Not mentioned | fiber | [37]
Not mentioned | Not mentioned | fiber | [38]
Not mentioned | Not mentioned | fiber | [39]
Not mentioned | Not mentioned | fiber | [40]
XRF | Not mentioned | fiber | [41]
Not mentioned | Not mentioned | fiber | [42]
Table A2. The hyper-parameter ranges and the optimum values.
Model | Hyper-Parameter | Range | Optimum
MLR | - | - | -
KNN | n_neighbors | [1, 50] | 22
KNN | weights | ['uniform', 'distance'] | uniform
KNN | metric | ['euclidean', 'manhattan', 'minkowski'] | minkowski
DT | criterion | ['squared_error', 'friedman_mse', 'absolute_error', 'poisson'] | squared_error
DT | max_depth | [1, 100] | 4
DT | min_samples_split | [2, 20] | 2
DT | min_samples_leaf | [1, 20] | 1
DT | max_features | [None, 'sqrt', 'log2'] | None
SVR | C | [0.1, 1, 10, 100, 1000] | 1.0
SVR | kernel | ['linear', 'poly', 'rbf', 'sigmoid'] | rbf
SVR | gamma | ['scale', 'auto'] | scale
ANN | hidden_layer_sizes | [(50, 50), (100, 100), (100, 50)] | (100, 100)
ANN | activation | ['identity', 'logistic', 'tanh', 'relu'] | relu
ANN | solver | ['lbfgs', 'sgd', 'adam'] | adam
ANN | learning_rate | ['constant', 'invscaling', 'adaptive'] | constant
ANN | alpha | [0.0001, 0.02, 0.05] | 0.02
RF | n_estimators | [10, 300] | 141
RF | max_depth | [1, 100] | 12
RF | min_samples_split | [2, 10] | 2
RF | min_samples_leaf | [1, 10] | 1
XGBoost | n_estimators | [50, 500] | 86
XGBoost | learning_rate | [0.01, 0.1, 0.2, 0.3] | 0.1
XGBoost | max_depth | [3, 10] | 6
XGBoost | subsample | [0.5, 1.0] | 0.9
XGBoost | gamma | [0, 10] | 0
CatBoost | iterations | [50, 1000] | 1000
CatBoost | learning_rate | [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.2, 0.3] | 0.03
CatBoost | depth | [3, 10] | 7
CatBoost | l2_leaf_reg | [1, 10] | 3
CatBoost | bagging_temperature | [0, 1] | 1
CatBoost | random_strength | [1, 10] | 1

References

  1. Dhand, V.; Mittal, G.; Rhee, K.Y.; Park, S.J.; Hui, D. A short review on basalt fiber reinforced polymer composites. Compos. B Eng. 2015, 73, 166–180. [Google Scholar] [CrossRef]
  2. Ivanitskii, S.G.; Gorbachev, G.F. Continuous basalt fibers: Production aspects and simulation of forming processes. I. State of the art in continuous basalt fiber technologies. Powder Metall. Met. Ceram. 2011, 50, 125. [Google Scholar] [CrossRef]
  3. Shi, F.J. A study on structure and properties of basalt fiber. Appl. Mech. Mater. 2012, 238, 17–21. [Google Scholar]
  4. Antunes, P.; Domingues, F.; Granada, M.; André, P. Mechanical Properties of Optical Fibers; INTECH Open Access Publisher: London, UK, 2012. [Google Scholar]
  5. Li, G.; Chen, Y.; Wei, G. Continuous fiber reinforced meta-composites with tailorable Poisson’s ratio and effective elastic modulus: Design and experiment. Compos. Struct. 2024, 329, 117768. [Google Scholar]
  6. Bi, C.; Tang, G.H.; He, C.B.; Yang, X.; Lu, Y. Elastic modulus prediction based on thermal conductivity for silica aerogels and fiber reinforced composites. Ceram. Int. 2022, 48, 6691–6697. [Google Scholar]
  7. Alshahrani, A.; Kulasegaram, S.; Kundu, A. Elastic modulus of self-compacting fibre reinforced concrete: Experimental approach and multi-scale simulation. Case Stud. Constr. Mater. 2023, 18, e01723. [Google Scholar] [CrossRef]
  8. Wang, Y.; Hu, S.; Sun, X. Experimental investigation on the elastic modulus and fracture properties of basalt fiber-reinforced fly ash geopolymer concrete. Constr. Build. Mater. 2022, 338, 127570. [Google Scholar] [CrossRef]
  9. Asadi, A.; Baaij, F.; Mainka, H.; Rademacher, M.; Thompson, J.; Kalaitzidou, K. Basalt fibers as a sustainable and cost-effective alternative to glass fibers in sheet molding compound (SMC). Compos. B Eng. 2017, 123, 210–218. [Google Scholar]
  10. Jagadeesh, P.; Rangappa, S.M.; Siengchin, S. Basalt fibers: An environmentally acceptable and sustainable green material for polymer composites. Constr. Build. Mater. 2024, 436, 136834. [Google Scholar]
  11. Khandelwal, S.; Rhee, K.Y. Recent advances in basalt-fiber-reinforced composites: Tailoring the fiber-matrix interface. Compos. B Eng. 2020, 192, 108011. [Google Scholar]
  12. Guo, Z.S.; Hao, N.; Wang, L.M.; Chen, J.X. Review of Basalt-Fiber-Reinforced Cement-based Composites in China: Their Dynamic Mechanical Properties and Durability. Mech. Compos. Mater. 2019, 55, 107–120. [Google Scholar] [CrossRef]
  13. Tumadhir, M.B. Thermal and mechanical properties of basalt fibre reinforced concrete. Int. J. Civ. Environ. Eng. 2013, 7, 334–337. [Google Scholar]
  14. Wang, D.; Ju, Y.; Shen, H.; Xu, L. Mechanical properties of high performance concrete reinforced with basalt fiber and polypropylene fiber. Constr. Build. Mater. 2019, 197, 464–473. [Google Scholar]
  15. Lopresto, V.; Leone, C.; De Iorio, I. Mechanical characterisation of basalt fibre reinforced plastic. Compos. B Eng. 2011, 42, 717–723. [Google Scholar]
  16. Ding, L.; Liu, Y.; Liu, J.; Wang, X. Correlation analysis of tensile strength and chemical composition of basalt fiber roving. Polym. Compos. 2019, 40, 2959–2966. [Google Scholar] [CrossRef]
  17. Deák, T.; Czigány, T. Chemical Composition and Mechanical Properties of Basalt and Glass Fibers: A Comparison. Text. Res. J. 2009, 79, 645–651. [Google Scholar]
  18. Wu, Z.; Liu, J.; Chen, X. Continuous Basalt Fiber Technology; Chemical Industry Press Co., Ltd.: Beijing, China, 2020; p. 238. [Google Scholar]
  19. Wei, C.; Zhou, Q.; Deng, K.; Lin, Y.; Wang, L.; Luo, Y.; Zhang, Y.; Zhou, H. Alkali resistance prediction and degradation mechanism of basalt fiber: Integrated with artificial neural network machine learning model. J. Build. Eng. 2024, 86, 108850. [Google Scholar]
  20. Sun, Z.; Li, Y.; Yang, Y.; Su, L.; Xie, S. Splitting tensile strength of basalt fiber reinforced coral aggregate concrete: Optimized XGBoost models and experimental validation. Constr. Build. Mater. 2024, 416, 135133. [Google Scholar]
  21. Alarfaj, M.; Qureshi, H.J.; Shahab, M.Z.; Javed, M.F.; Arifuzzaman, M.; Gamil, Y. Machine learning based prediction models for spilt tensile strength of fiber reinforced recycled aggregate concrete. Case Stud. Constr. Mater. 2024, 20, e02836. [Google Scholar]
  22. Machello, C.; Bazli, M.; Rajabipour, A.; Rad, H.M.; Arashpour, M.; Hadigheh, A. Using machine learning to predict the long-term performance of fibre-reinforced polymer structures: A state-of-the-art review. Constr. Build. Mater. 2023, 408, 133692. [Google Scholar] [CrossRef]
  23. Eduard, K.; Rainer, G.; Jona, S. Basalt, glass and carbon fibers and their fiber reinforced polymer composites under thermal and mechanical load. AIMS Mater. Sci. 2016, 3, 1561–1576. [Google Scholar]
  24. Wei, B.; Cao, H.; Song, S. Tensile behavior contrast of basalt and glass fibers after chemical treatment. Mater. Des. 2010, 31, 4244–4250. [Google Scholar] [CrossRef]
  25. Sergey, I.G.; Evgeniya, S.Z.; Sergey, S.P.; Bogdan, I.L. Correlation of the chemical composition, structure and mechanical properties of basalt continuous fibers. AIMS Mater. Sci. 2019, 6, 806–820. [Google Scholar]
  26. Wang, L. Study on Effect of Basalt Fiber Component on Elastic Modulus. Master's Thesis, Southeast University, Nanjing, China, 2021. [Google Scholar]
  27. Kuzmin, K.L.; Gutnikov, S.I.; Zhukovskaya, E.S.; Lazoryak, B.I. Basaltic glass fibers with advanced mechanical properties. J. Non-Cryst. Solids 2017, 476, 144–150. [Google Scholar] [CrossRef]
  28. Manylov, M.S.; Gutnikov, S.I.; Lipatov, Y.V.; Malakho, A.P.; Lazoryak, B.I. Effect of deferrization on continuous basalt fiber properties. Mendeleev Commun. 2015, 25, 386–388. [Google Scholar] [CrossRef]
  29. Kuzmin, K.L.; Zhukovskaya, E.S.; Gutnikov, S.I.; Pavlov, Y.V.; Lazoryak, B.I. Effects of Ion Exchange on the Mechanical Properties of Basaltic Glass Fibers. Int. J. Appl. Glass Sci. 2016, 7, 118–127. [Google Scholar] [CrossRef]
  30. Wu, Z.; Liu, J.; Jiang, M.; Wang, Y.; Lei, L. A High-Temperature Resistant Basalt Fiber Composition. China Patent CN 201410139342.1, 6 January 2016. [Google Scholar]
  31. Wei, B. Evaluation of Basalt Fiber and Its Hybrid Reinforced Composite Performance. Master's Thesis, Harbin Institute of Technology, Harbin, China, 2008. [Google Scholar]
  32. Wang, X.; Sun, K.; Shao, J.; Ma, J. Fracture properties of graded basalt fiber reinforced concrete: Experimental study and Mori-Tanaka method application. Constr. Build. Mater. 2023, 398, 132510. [Google Scholar] [CrossRef]
  33. Ramachandran, B.E.; Velpari, V.; Balasubramanian, N. Chemical durability studies on basalt fibres. J. Mater. Sci. 1981, 16, 3393–3397. [Google Scholar] [CrossRef]
  34. Dong, J.F.; Wang, Q.Y.; Guan, Z.W.; Chai, H.K. High-temperature behaviour of basalt fibre reinforced concrete made with recycled aggregates from earthquake waste. J. Build. Eng. 2022, 48, 103895. [Google Scholar] [CrossRef]
  35. Xing, D.; Chang, C.; Xi, X.Y.; Hao, B.; Zheng, Q.; Gutnikov, S.I.; Lazoryak, B.I.; Ma, P.C. Morphologies and mechanical properties of basalt fibre processed at elevated temperature. J. Non-Cryst. Solids 2022, 582, 121439. [Google Scholar] [CrossRef]
  36. Nasir, V.; Karimipour, H.; Taheri-Behrooz, F.; Shokrieh, M.M. Corrosion behaviour and crack formation mechanism of basalt fibre in sulphuric acid. Corros. Sci. 2012, 64, 1–7. [Google Scholar]
  37. Li, R.; Gu, Y.; Zhang, G.; Yang, Z.; Li, M.; Zhang, Z. Radiation shielding property of structural polymer composite: Continuous basalt fiber reinforced epoxy matrix composite containing erbium oxide. Compos. Sci. Technol. 2017, 143, 67–74. [Google Scholar]
  38. Ahmad, M.R.; Chen, B. Effect of silica fume and basalt fiber on the mechanical properties and microstructure of magnesium phosphate cement (MPC) mortar. Constr. Build. Mater. 2018, 190, 466–478. [Google Scholar]
  39. Vejmelková, E.; Koňáková, D.; Scheinherrová, L.; Doleželová, M.; Keppert, M.; Černý, R. High temperature durability of fiber reinforced high alumina cement composites. Constr. Build. Mater. 2018, 162, 881–891. [Google Scholar] [CrossRef]
  40. Qin, J.; Qian, J.; Li, Z.; You, C.; Dai, X.; Yue, Y.; Fan, Y. Mechanical properties of basalt fiber reinforced magnesium phosphate cement composites. Constr. Build. Mater. 2018, 188, 946–955. [Google Scholar] [CrossRef]
  41. Tang, C.; Jiang, H.; Zhang, X.; Li, G.; Cui, J. Corrosion Behavior and Mechanism of Basalt Fibers in Sodium Hydroxide Solution. Materials 2018, 11, 1381. [Google Scholar] [CrossRef]
  42. Li, M.; Gong, F.; Wu, Z. Study on mechanical properties of alkali-resistant basalt fiber reinforced concrete. Constr. Build. Mater. 2020, 245, 118424. [Google Scholar] [CrossRef]
  43. ISO 9163:2005(E); Textile Glass—Rovings—Manufacture of Test Specimens and Determination of Tensile Strength of Impregnated Rovings. International Organization Standardization: Brussels, Belgium, 2005.
  44. ISO/DIS 5079(en); Textile Fibres—Determination of Breaking Force and Elongation at Break of Individual Fibres. International Organization Standardization: Brussels, Belgium, 1999.
  45. DIN 65382; Aerospace; Reinforcement Fibres for Plastics; Tensile Test of Impregnated Yarn Test Specimens. German Institute for Standardisation: Berlin, Germany, 1988.
  46. GB/T 7690.1−2001; Reinforcements—Test Method for Yarns—Part 1: Determination of Linear Density. Standardization Administration of China: Beijing, China, 2001.
  47. GB/T 38897−2020; Non-Destructive Testing—Measurement Method for Material Elastic Modulus and Poisson's Ratio Using Ultrasonic Velocity. Standardization Administration of China: Beijing, China, 2020.
  48. GB/T 3362−2017; Test Methods for Tensile Properties of Carbon Fiber Multifilament. Standardization Administration of China: Beijing, China, 2017.
  49. ASTM, C1557−14; Standard Test Method for Tensile Strength and Young’s Modulus of Fibers. American Society Testing and Materials: West Conshohocken, PA, USA, 2014.
  50. ASTM, D3379-75; Standard Test Method for Tensile Strength and Young’s Modulus for High-Modulus Single-Filament Materials. American Society Testing and Materials: West Conshohocken, PA, USA, 1989.
  51. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4768–4777. [Google Scholar]
  52. Cao, H.; Yan, Y.; Yue, L.; Zhao, J. Basalt Fiber; National Defense Industry Press: Beijing, China, 2017; p. 188. [Google Scholar]
Figure 1. Pie chart of average oxide percentage.
Figure 2. Pearson correlation coefficient heatmap of variables.
Figure 3. Performance of each model on the training data and test data.
Figure 4. Comparison of predicted data and test data. (a) Linear regression plot comparing the actual values to the predicted values of the CatBoost model on the test set. (b) The test set data, predicted values, and the corresponding absolute errors.
Figure 5. Feature importance of input variables.
Figure 6. Dependence plot of input variables.
Figure 7. The ranges within which the input variables positively contribute to the elastic modulus.
Table 1. The range, mean, and standard deviation of the input variables.
Input Variable | Minimum–Maximum | Mean | Standard Deviation
SiO2/wt% | 42.43–66.90 | 54.07 | 4.54
Al2O3/wt% | 8.70–25.60 | 15.37 | 3.75
TiO2/wt% | 0.00–8.46 | 1.57 | 1.31
Fe2O3/wt% | 0.30–19.34 | 8.56 | 3.65
CaO/wt% | 3.20–18.91 | 8.27 | 2.26
Na2O/wt% | 0.20–14.00 | 3.12 | 1.75
MgO/wt% | 1.70–19.44 | 5.85 | 2.69
FeO/wt% | 0.00–6.62 | 1.36 | 1.86
K2O/wt% | 0.00–9.35 | 1.58 | 1.18
Table 2. The performance of each model.
Model | Training R2 | Training RMSE | Training MAE | Test R2 | Test RMSE | Test MAE
MLR | 0.2951 | 10.3082 | 8.0533 | 0.1703 | 11.6958 | 8.7801
KNN | 0.5318 | 8.4015 | 5.4992 | 0.4270 | 9.7199 | 6.8333
DT | 1.0000 | 0.0407 | 0.0052 | 0.0516 | 12.5045 | 8.1983
SVR | 0.1854 | 11.0816 | 8.6477 | 0.1099 | 12.1137 | 9.1008
ANN | 0.9856 | 1.2547 | 0.3789 | 0.9209 | 5.3056 | 2.5518
RF | 0.9184 | 3.5075 | 2.3776 | 0.3916 | 10.0152 | 6.4477
XGBoost | 1.0000 | 0.0408 | 0.0058 | 0.9390 | 4.9997 | 2.2462
CatBoost | 0.9993 | 0.3207 | 0.2613 | 0.9554 | 4.7556 | 2.0323
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
