Article

Prediction of Electrochemical Surface COF of Titanium Alloy Using an Enhanced LightGBM with Lag and Rolling Features

School of Mechanical and Electrical Engineering, Shaanxi University of Science and Technology, Xi’an 710021, China
*
Author to whom correspondence should be addressed.
Coatings 2026, 16(6), 680; https://doi.org/10.3390/coatings16060680
Submission received: 9 April 2026 / Revised: 28 May 2026 / Accepted: 30 May 2026 / Published: 4 June 2026

Highlights

What are the main findings?
  • Proposing an enhanced LightGBM model that incorporates lag and rolling-window features to effectively capture the temporal dynamics of friction coefficient.
  • Identifying and ranking the key influencing factors (e.g., solution concentration, test duration) through feature importance analysis, providing actionable insights for process optimization.
What are the implications of the main findings?
  • The modified LightGBM algorithm effectively addresses challenges in modeling high-dimensional, non-linear complex data, exhibiting superior generalization ability and predictive accuracy.
  • Through modified LightGBM-based feature contribution analysis, the model’s predictions are endowed with strong interpretability, providing critical data support and theoretical reference for optimizing the electrochemical machining process of titanium alloys.

Abstract

To accurately predict the surface coefficient of friction (COF) of titanium alloys under electrochemical corrosion conditions, this study investigates the tribological behavior of titanium alloys across various solutions, concentrations, voltages, and sliding velocities and constructs a systematic dataset from the results. Two machine learning models are developed and optimized: a standard Light Gradient Boosting Machine (LightGBM) and an enhanced LightGBM model incorporating lag and rolling features. These models are used to predict the friction coefficient and to perform feature importance analysis. The results indicate that solution concentration is the primary factor influencing the friction coefficient of the titanium alloy, followed by test duration, while sliding velocity exerts the least influence. Following experimental validation and iterative optimization, the enhanced LightGBM model with lag and rolling features achieves the better predictive accuracy, with a coefficient of determination (R²) of 0.979 on the training set and 0.951 on the test set. This research establishes a data-driven predictive framework that is more accurate and more interpretable than models using only raw features, showcasing the potential of feature-engineered machine learning in optimizing electrochemical machining parameters.

1. Introduction

Titanium alloys are widely utilized in aerospace, petrochemicals, biomedicine, and national defense due to their exceptional properties, including low density, high specific strength, excellent corrosion resistance, and superior biocompatibility [1,2]. Following prolonged periods of technical integration, independent research, and industrial promotion, China’s titanium industry has entered a phase of rapid development, with production volumes steadily increasing to establish the nation as a global leader in the sector [3]. However, the stringent requirements for raw materials, processing techniques, and equipment—coupled with the inherent difficulty of manufacturing—result in high production costs, posing significant challenges for industrial processing. Consequently, achieving low-cost, high-quality fabrication while ensuring the reliability and safety of titanium alloys in industrial applications remains a critical challenge for the engineering community.
In the context of titanium alloy forming, numerous scholars have sought to improve traditional drawing technologies. In recent years, innovative techniques such as electroplastic drawing and electrochemical drawing have been proposed. Electroplastic drawing utilizes high-energy pulse currents to enhance material plasticity and reduce deformation resistance, thereby decreasing the drawing force during processing. However, this method requires specialized DC pulse power supplies to maintain system stability and safety, leading to substantial production costs. Conversely, electrochemical drawing replaces traditional lubricants with specialized active electrolytes and applies micro-currents to the workpiece. This maintains favorable tribological properties on the metal surface and reduces surface deformation resistance. Simultaneously, this method allows for the optimization and adjustment of product surface quality with low energy consumption and minimal equipment requirements [4]. Therefore, enhancing the surface tribological performance of titanium alloys during processing has become a pivotal issue. As early as the 1960s, Gutman et al. [5] from Ben-Gurion University of the Negev proposed that mechanochemical effects, also known as electrochemical plasticization [5], can induce changes in the mechanical properties and microstructure of material surfaces. By applying electrochemical treatment to the drawing process, they observed a reduction in surface residual stress and hardness, which increased the plastic deformation capacity of high-strength alloys. Over the past half-century, electrochemical surface plasticization has been studied extensively, leading to the development of various theoretical frameworks and experimental methods aimed at improving the surface quality of titanium alloy drawing processes [6]. 
While previous theoretical analyses relied heavily on data obtained from extensive simulation experiments [7], the advancement of data mining and machine learning provides a novel paradigm. These modern tools enable the development of predictive models based on experimental data to identify the primary factors influencing the electrochemical surface COF of titanium alloys.
In recent years, researchers worldwide have conducted extensive studies on electrochemical plasticization and the interactions between corrosion and deformation in various materials [8,9,10]. An increasing number of studies have focused on accelerating corrosion through applied currents to exploit the synergy between corrosion and wear [10,11], thereby enhancing surface plasticity and facilitating sliding. Gutman et al. [12] applied electrochemical treatment to metal surfaces during drawing processes, attributing the success of this method to chemomechanical effects (CME), which reduce surface residual stress and hardness and consequently improve the plastic deformation capacity of high-strength alloys. Chen et al. [13,14] proposed a novel metal plastic forming technique termed electrochemical cold drawing (ECD). Findings indicated that surface hardness decreases with increasing current density across various electrolyte solutions [15]. Notably, the current density required for this method (10¹–10² mA/cm²) is significantly lower than that of the electroplastic effect (10⁸–10⁹ mA/cm²). This provides distinct advantages over traditional drawing processes, including reduced deformation resistance and enhanced plasticity. As research into the impact of electrochemical dissolution on the mechanical properties of metal surfaces matured, Gutman et al. [12] applied ECD to AM60B magnesium alloy bars. Their systematic investigation of ECD parameters, including electrolyte composition, current density, drawing speed, and die aperture, demonstrated that electrochemical treatment significantly enhances alloy formability. However, most existing studies rely solely on experimental data analysis to evaluate improvements in surface tribological properties. In reality, electrochemical tribological performance is subject to the complex, coupled influences of voltage, solution type, concentration, and sliding speed. These variables must be integrated into a multi-factor predictive framework to accurately assess material performance.
The rise of machine learning (ML) has opened a new trajectory for predicting the electrochemical surface COF of titanium alloys. Current research on the tribological characteristics of titanium alloys under electrochemical corrosion often focuses on single factors or limited operating conditions, leaving a gap in the understanding of their coefficient of friction under complex, synergistic environments. Furthermore, most existing ML applications in this field are directed toward predicting material service behavior [16,17,18,19]. For instance, Alqurashi [20] proposed a hybrid intelligent framework combining fuzzy logic with Artificial Neural Networks (ANN) to model the erosion-corrosion behavior of glass fiber reinforced pipes (GRP) under harsh conditions, providing accurate quantitative predictions. Zheng et al. [21] coupled high-throughput testing with ML to develop a model for predicting the erosion-corrosion rate of 90/10 Cu-Ni alloy; a comparison of six ML models revealed that the Random Forest-based model achieved the highest coefficient of determination alongside the lowest error metrics. Kuang and Long [22] employed various machine learning algorithms to predict the atmospheric corrosion rates of low-alloy steel (LAS). By utilizing attribute transformation descriptors, they effectively analyzed corrosion behaviors and significantly enhanced the generalization capability of the predictive models. Similarly, Li et al. [23] established a predictive model for the corrosion fatigue crack growth rate of aluminum alloys by integrating the Bayesian bootstrap method with Gradient Boosting Regression Trees (GBRT). The accuracy of their predictive model was quantitatively evaluated using the Mean Squared Error (MSE) and the Coefficient of Determination (R²) as core performance metrics, providing a robust theoretical reference for the subsequent prediction of corrosion fatigue life in aluminum alloys.
Most prior studies have employed machine learning algorithms primarily for predicting the corrosion fatigue of metals. However, in the electrochemical dynamic tribocorrosion process, the time-evolving behavior of the friction coefficient is influenced by multiple parameters such as solution, concentration, voltage, and speed in a nonlinearly coupled manner. Its prediction is essentially a high-dimensional nonlinear regression problem with strong temporal dependency, and machine learning models tailored to this scenario remain scarce. Furthermore, because ML is a data-driven approach, model accuracy depends heavily on the quality and quantity of the sample data. Based on this analysis, this paper focuses on the tribological characteristics of titanium alloys under varying conditions of solution type, concentration, sliding velocity, and voltage to systematically analyze the influence of these factors on surface COF. Specifically, this study proposes an enhanced LightGBM model incorporating lag features and rolling statistics. By collecting friction coefficient data across diverse experimental conditions, we modeled and predicted the COF of titanium alloys, established a predictive framework for electrochemical surface COF, and identified how different operating conditions affect material performance. The core rationale for selecting this algorithm is that it matches the requirements of electrochemical friction coefficient prediction. Gradient-boosted decision trees handle complex non-linear relationships and high-dimensional features well, capturing the interacting effects of solution concentration, velocity, and voltage on the friction coefficient. LightGBM's built-in feature importance ranking, applied to the base features together with the lag variables and rolling statistics, was used to eliminate redundant and low-contribution features.

2. Experimental and Methodology

2.1. Materials and Experiments

The titanium alloy material used in this study was sourced from Shenzhen Pacific Steel Co., Ltd. and sectioned into rectangular specimens with dimensions of 30 mm × 10 mm × 5 mm using electrical discharge machining, as illustrated in Figure 1. The chemical composition of the titanium alloy sample was analyzed with an XRF-1800 X-ray fluorescence spectrometer, and the main components are shown in Table 1. The surface and cross-section of the titanium alloy sample were successively ground with 200#, 400#, 600#, 800#, 1000# and 1200# sandpapers on a grinding and polishing machine, then polished with 1.0-micron diamond polishing agent to a mirror-like finish. The sample was then washed with distilled water, ultrasonically cleaned with ethanol for 10 min, and dried with cold air. The electrochemical corrosion and wear performance of the titanium alloy samples were characterized using an MSR-2T electrochemical reciprocating friction and wear tester (Lanzhou Zhongke Kaihua Technology Co., Ltd., Lanzhou, China). This instrument is specifically designed to investigate the corrosive-wear behavior and friction mechanisms of specimens in electrochemical media. Tribological properties were measured using the reciprocating motion module, while electrochemical corrosion behavior was monitored in real time via a CHI604e electrochemical workstation (Shanghai Chenhua Electrochemical Workstation Co., Ltd., Shanghai, China). The reagents and electrochemical corrosion solutions used in the experiments are listed in Table 2. Specimens were mounted in a 300 mL electrochemical cell. A load was applied vertically onto the specimen surface through a counter-ball indenter mounted on a sensor-integrated loading rod, as shown in Figure 2.
The applied load was generated by calibrated weights stacked in descending order of mass (i.e., smaller weights placed atop larger ones), with the convex side of the counter ball oriented downward to suppress vibration during reciprocation and thereby minimize systematic measurement errors. The electrochemical cell—and thus the lower specimen—was driven horizontally by a motorized actuator to produce reciprocating sliding contact against the stationary counter ball. This configuration enabled continuous acquisition of the time-dependent coefficient of friction at the specimen–counter-ball interface, as shown in Figure 3.
First, TC4 titanium alloy samples were ground, polished, and ultrasonically cleaned. Electrochemical corrosion and tribological tests were performed using an MSR-2T electrochemical corrosion–tribology coupled testing machine in conjunction with a CHI604e electrochemical workstation. A tungsten carbide (WC) ball (5 mm diameter, 1460 HV) served as the counterface. A normal load of 5 N and a reciprocating amplitude of 5 mm were applied, and the sampling frequency was set to 1 Hz. Tests were conducted at reciprocating frequencies of 60, 120, 180, and 240 cycles/min. The corrosive media were prepared using aqueous sulfuric acid and hydrochloric acid solutions with different concentrations at room temperature. Potentiostatic polarization was carried out at applied potentials of −0.3, 0.2, 0.5, 1.0 and 1.2 V (vs. SCE). By systematically varying the corrosive media, electrochemical potential, and mechanical parameters, the evolution of surface coefficient of friction was investigated. The specific experimental parameters are summarized in Table 3.

2.2. Experimental Analysis

Electrochemical tribocorrosion tests reveal that under an applied potential of 0.5 V, the alloy exhibits a lower friction coefficient of approximately 0.45 in hydrochloric acid solution compared with sulfuric acid solution. Ultra-depth microscopic observations (Figure 4) demonstrate that the wear tracks formed in both solutions are relatively smooth with shallow furrows.
In sulfuric acid solution (Figure 4a), typical delamination wear characteristics are observed. Continuous sliding friction increases the strain rate at contact areas. The combined effect of electrochemical corrosion and stress-induced micro-plastic deformation initiates microcracks on the alloy surface. Synergistic reciprocating sliding and electrochemical corrosion promote crack propagation, eventually leading to spalling and fatigue wear. Meanwhile, the plasticized layer formed via electrochemical corrosion reduces deformation resistance and alleviates local work hardening, thus lowering the friction coefficient under applied potential relative to the uncharged condition.
In hydrochloric acid solution (Figure 4b), obvious pitting corrosion can be detected. Chloride ions are highly active under the applied electric field and severely damage the passive film on the alloy surface. Enriched chloride ions accelerate localized anodic dissolution and facilitate the formation of corrosion pits. Additionally, continuous friction disrupts and decomposes the surface passive film, reducing the shear force between the WC ball and the alloy surface, which further decreases the friction coefficient.
Figure 5 shows the EDS elemental distribution results in different regions of the wear scar surface of TC4 alloy in 0.5 mol/L H2SO4 solution. It can be observed that at the potential of −0.3 V, a large area of fresh TC4 matrix remains in the wear scar region with almost no corrosive wear occurring. As the applied potential increases gradually from 0.2 V to 1.0 V, the elemental contents in the wear scar region change correspondingly. The average content of oxygen element rises continuously, reaching a maximum value of 55.33%. Although the average content of sulfur element is relatively low, it also presents an increasing trend. This indicates that oxides and a small amount of sulfides generated by corrosive friction form on the fresh matrix surface, and the accumulation of corrosion products becomes more abundant and thicker with the increase in applied potential.
Figure 6 presents the wear track morphologies of TC4 titanium alloy observed via ultra-depth field microscope under an applied potential of 0.5 V in 0.5 mol/L H2SO4 solution at different reciprocating sliding speeds. It can be seen that the wear surface remains relatively smooth at low sliding speeds. As the sliding speed increases, the wear track gradually becomes rougher, accompanied by increased residual wear debris and evident adhesive spalling features.
At a low reciprocating speed of 60 cycles/min, moderate plastic deformation and shallow ploughing grooves are detected on the wear surface. The low sliding velocity provides sufficient time for electrochemical dissolution at the contact interface. Although friction tends to thin and remove the surface plasticized layer, the combined effect of applied potential and tribocorrosion allows adequate time for the regeneration of corrosive oxide films. The stable-thickness surface film thereby achieves favorable friction-reducing and lubricating effects.
In contrast, at a high sliding speed of 240 cycles/min, the surface film is frequently sheared and stripped away during continuous friction. The newly formed corrosion products fail to fully cover the contact zone between the WC ball and alloy substrate. The sliding process mainly acts on the fresh anodically dissolved matrix and local oxides, resulting in more severe plastic deformation, denser ploughing grooves and local fragment spalling. Such damage is primarily attributed to corrosion-induced cracking driven by high cyclic contact stress.
This study experimentally investigates the effects of various corrosion parameters, including corrosive medium type, solution concentration, applied potential and reciprocating sliding speed, on the friction coefficient, wear rate, wear track morphology and surface element distribution of the plasticized layer on titanium alloy surfaces.
Experimental results reveal that under the synergistic effect of applied potential and tribocorrosion, the surface plasticized layer of alloy specimens exhibits excellent friction-reducing and lubricating properties in both hydrochloric acid and sulfuric acid solutions. Different corrosive media and concentrations exert distinct influences on the friction coefficient of corroded surfaces. Nevertheless, the quantitative contribution degree of each parameter to surface COF remains unclear. Accordingly, this work aims to establish a data-driven model based on characteristic factors such as solution type, concentration, applied voltage and sliding speed to realize accurate prediction of friction coefficient.
All data used for dataset construction in this study are entirely derived from controlled electrochemical corrosion and wear experiments. In alignment with standard electrochemical corrosion conditions, the solution type, concentration, velocity and voltage were selected as the primary research parameters to explore their influence on the friction coefficient of the titanium alloy. Based on practical operating conditions, 60 min tests were conducted for each parameter, generating 30 initial sets of experimental conditions. Using cross-factor experimental design, the dataset was further expanded to 250 data points. These data were divided into a training set (90%) and a test set (10%). This methodology enabled a comprehensive investigation into the effects of relevant variables on the tribocorrosion friction coefficient of titanium alloy, as well as a rigorous evaluation of its performance under complex service conditions.

2.3. Machine Learning Methods

In the field of material tribology, it is essential to collect, process, and analyze extensive experimental datasets that encompass friction coefficients and their underlying correlations with external operating conditions. However, due to the high-dimensional nature of such data and the complex non-linear relationships between variables, traditional analytical methods often fail to fully reveal the intrinsic associations within the data. Consequently, the application of sophisticated machine learning algorithms and data modeling strategies can effectively extract latent associative features, significantly enhancing the accuracy and reliability of predictive outcomes. This provides a rigorous scientific foundation for optimizing the processing and expanding the industrial applications of titanium alloys.
Random Forest (RF) constitutes a robust bagging ensemble that exhibits low variance and strong resilience to noise and outliers, rendering it a reliable baseline for time-series forecasting. However, its bootstrap sampling procedure may disrupt temporal dependencies, and its predictive capacity is often outperformed by gradient-boosted trees in modeling complex nonlinear and non-stationary dynamics.
LightGBM, an efficient gradient-boosting decision tree implementation, leverages histogram-based splitting, gradient-based one-side sampling (GOSS), and leaf-wise growth to deliver superior predictive accuracy and computational efficiency on large-scale time-series data. It excels in capturing intricate temporal patterns, feature interactions, and residual structures, making it the preferred choice for high-performance forecasting in contemporary research.
In comprehensive benchmarks across diverse time-series domains (e.g., finance, energy, traffic), LightGBM consistently achieves lower forecasting errors (RMSE, MAE, MAPE) than Random Forest, particularly for high-dimensional, large-sample, and dynamically complex datasets. RF remains valuable for small, noisy datasets where stability and interpretability are prioritized.
Considering that the electrochemical corrosion wear dataset of titanium alloys presents high dimensionality, complex nonlinear relationships, and obvious time-series-dependent characteristics, the LightGBM algorithm is adopted in this study for model training and predictive analysis.

2.4. LightGBM Modeling

In this study, the LightGBM model was selected for training and prediction. As an efficient gradient boosting decision tree framework, LightGBM demonstrates exceptional training efficiency and predictive precision when handling large-scale, high-dimensional data by employing a leaf-wise growth strategy and histogram-based optimization [24,25,26]. Its core mechanisms involve iterative residual fitting, a tree-based decision framework, and a regularized objective function.
Figure 7 systematically illustrates the core operational mechanism of the LightGBM framework. At the feature engineering level, the model utilizes continuous feature discretization and histogram construction strategies to quantify floating-point features into discrete bins and aggregate gradient information. By leveraging histogram subtraction for the rapid calculation of splitting gains, the model significantly reduces memory overhead while enhancing computational throughput. At the ensemble learning level, the framework adopts a gradient boosting architecture with a leaf-wise tree growth strategy. Using the training set as input, the model iteratively constructs decision trees across successive generations, where each new tree fits the residual error of the preceding models. Ultimately, a global predictive model is established through the linear superposition of all base learners. This architecture achieves a sophisticated equilibrium between computational efficiency and generalization performance, providing critical technical support for high-efficiency learning in complex data scenarios.
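The histogram construction and histogram subtraction described above can be illustrated with a minimal sketch. It is not LightGBM's internal implementation: the equal-width binning, the helper names (`to_bins`, `build_histogram`, `subtract_histogram`), and the toy data are all assumptions made for illustration. The point it shows is that once the parent node's histogram and one child's histogram are known, the sibling's histogram follows by subtraction, without rescanning the sibling's samples.

```python
def to_bins(values, n_bins):
    """Map floating-point feature values to equal-width bin indices (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def build_histogram(bin_ids, gradients, n_bins):
    """Accumulate the gradient sum and the sample count for each bin."""
    grad_sum = [0.0] * n_bins
    count = [0] * n_bins
    for b, g in zip(bin_ids, gradients):
        grad_sum[b] += g
        count[b] += 1
    return grad_sum, count


def subtract_histogram(parent, child):
    """Derive the sibling histogram as parent minus child, bin by bin."""
    p_grad, p_cnt = parent
    c_grad, c_cnt = child
    return ([p - c for p, c in zip(p_grad, c_grad)],
            [p - c for p, c in zip(p_cnt, c_cnt)])


# Hypothetical toy data: one feature and per-sample gradients (residuals).
feature = [0.1, 0.4, 0.35, 0.8, 0.95, 0.6]
gradients = [0.5, -0.2, 0.1, -0.4, 0.3, -0.1]
bins = to_bins(feature, n_bins=4)

parent_hist = build_histogram(bins, gradients, 4)

# Suppose a split sends samples 0, 1 and 2 to the left child.
left_idx = [0, 1, 2]
left_hist = build_histogram([bins[i] for i in left_idx],
                            [gradients[i] for i in left_idx], 4)

# The right child's histogram is obtained without touching its samples.
right_hist = subtract_histogram(parent_hist, left_hist)
```

Only the smaller child needs a full scan in such a scheme, which is where the reduction in computation comes from.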
This study employs a systematic workflow for predicting the coefficient of friction (COF) in tribocorrosion processes, as illustrated in Figure 8. First, tribocorrosion experiments are conducted under varying operating parameters for a duration of 60 min to generate the raw dataset. The acquired raw data is then subjected to a systematic preprocessing stage to remove noise, handle missing values, and standardize the signals. Following preprocessing, we enrich the dataset by engineering lagged features and rolling statistical features. These features are designed to capture the temporal dependencies and evolving statistical characteristics of the friction coefficient signals. Subsequently, the enriched dataset is used to train a Light Gradient Boosting Machine (LightGBM) model. Finally, the trained model is applied to perform time-series prediction of the coefficient of friction (COF).

3. Results and Discussion

3.1. Data Description and Preprocessing

The performance ceiling of a machine learning model is fundamentally determined by data quality, while feature engineering serves as the core process for extracting latent patterns and enhancing model generalization. The experimental data concerning the electrochemical corrosive-wear of titanium alloys exhibit prominent time-series characteristics, with evolutionary behavior governed by the non-linear coupling of factors such as solution concentration, voltage, and sliding velocity. Consequently, this study first performed an exhaustive descriptive analysis of the raw experimental data, followed by the design and implementation of a systematic data preprocessing and feature engineering framework to construct a high-quality dataset for model training and optimization.
The friction coefficient data used in this study were obtained from laboratory records of electrochemical corrosive-wear experiments, encompassing measured values of the friction coefficient over time under diverse experimental conditions. The raw dataset consists of six fields, which are categorized into independent variables (input features) and a dependent variable (target variable) based on the experimental mechanism. The specific definitions and descriptions of each field are provided in Table 4.
A preliminary Exploratory Data Analysis (EDA) was conducted to evaluate the sample distribution across various experimental conditions. The experiments encompassed sulfuric acid and hydrochloric acid solutions at different concentrations, with reciprocating sliding speeds set at 60, 120, 180, and 240 cycles/min. Applied voltage conditions included −0.3 V, 0 V (open circuit), 0.2 V, 0.5 V, 1.0 V, and 1.2 V. The raw experimental data contained minor missing values and outliers resulting from sensor fluctuations. To address these data quality issues, a systematic preprocessing workflow comprising data cleaning, outlier handling, missing value imputation, and data standardization was implemented.
To eliminate the influence of varying measurement scales on model performance, the raw input variables were subjected to Min-Max Normalization according to the following equation:
$$\hat{x}_i = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)} \tag{1}$$
In Equation (1), $x_i$ and $\hat{x}_i$ represent the raw and normalized values of the $i$th variable, respectively, while $\max(x_i)$ and $\min(x_i)$ denote the maximum and minimum values of that variable. Concurrently, to mitigate the adverse impact of the wide numerical distribution range of the friction coefficient on model fitting, a logarithmic transformation was applied to the friction coefficient $u$ for each data set under corrosive-wear conditions. The transformed value, $y = \lg(u)$, served as the model output variable to enhance fitting performance and convergence stability for data with large numerical spans. To quantitatively evaluate the predictive accuracy and generalization capability of the constructed models, the Coefficient of Determination ($R^2$) and Mean Squared Error ($MSE$) were selected as core evaluation metrics, defined as follows:
$$MSE = \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{N} \tag{2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (\bar{y} - y_i)^2} \tag{3}$$
In Equations (2) and (3), $y_i$ is the actual value of the target variable, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the actual values, and $N$ is the total number of samples. Generally, an $R^2$ value closer to 1 and an $MSE$ closer to 0 indicate smaller overall prediction errors and higher accuracy in model fitting and forecasting.
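As a minimal sketch of Equations (1) to (3) and the logarithmic target transform, the following pure-Python functions may help make the preprocessing and evaluation steps concrete. The function names and the sample COF values are hypothetical; only the applied potentials are taken from the experimental description, and the authors' actual implementation is not given in the source.

```python
import math


def min_max_normalize(values):
    """Equation (1): scale one input variable to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_transform(cof):
    """Target transform y = lg(u), the base-10 logarithm of the friction coefficient."""
    return [math.log10(u) for u in cof]


def mse(y_true, y_pred):
    """Equation (2): mean squared error."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Equation (3): coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((mean_y - t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Applied potentials from the experimental description (V vs. SCE).
voltages = [-0.3, 0.2, 0.5, 1.0, 1.2]
scaled = min_max_normalize(voltages)

# Hypothetical friction coefficients, used only to exercise the functions.
cof = [0.45, 0.50, 0.62, 0.38]
y = log_transform(cof)
y_pred = [v + 0.01 for v in y]  # a prediction that is off by 0.01 everywhere

print(scaled)
print(mse(y, y_pred), r2_score(y, y_pred))
```

Note that the metrics here are computed on the log-transformed target; whether the reported R² values refer to the transformed or the original scale is not stated in the source.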

3.2. Feature Selection

Feature selection is a critical stage in enhancing the accuracy of time-series prediction models. In this study, in addition to retaining experimental conditions as base features, we constructed lag features, rolling statistical features, and derived statistical features based on the inherent characteristics of the time series. This approach was designed to systematically characterize the dynamic evolutionary patterns of the friction coefficient. The base features were extracted directly from the raw experimental data to represent the macroscopic operating environment of the electrochemical corrosive-wear experiments. These primarily include temporal features and condition features. Temporal features encompass absolute time and normalized relative time (the ratio of elapsed time to the total experimental duration). Condition features include solution encoding, concentration, sliding velocity, and operating voltage. The construction of time-series features specifically addresses the significant temporal dependencies and hysteresis effects inherent in friction coefficient variations. To effectively capture these dynamic characteristics, two key categories of features were developed:
(1) Lag Features: Lag features utilize the friction coefficient values from historical time points to facilitate the prediction of the current state, effectively reflecting the intrinsic memory effect of the tribological system. As shown in Equation (4), for the target variable (friction coefficient $y_t$), this study constructed lag features with orders $k$ = 1, 2, …, 5:
$$\mathrm{lag}_k = y_{t-k} \tag{4}$$
By incorporating these lag features, the model can effectively learn the short-term inertial changes and fluctuation trends of the friction coefficient.
(2) Rolling Statistical Features: Rolling statistical features employ a sliding window mechanism to smooth the data, enabling the extraction of statistical patterns within local time scales. This approach is instrumental in suppressing experimental noise. In this study, sliding windows with sizes of 3, 5, and 10 were utilized to calculate the following features:
Rolling Mean: Characterizes the local average level of the friction coefficient.
Rolling Standard Deviation: Reflects the local fluctuation intensity of the friction coefficient.
Rolling Extremes (Minimum/Maximum): Captures the boundary values of the friction coefficient within the window.
The calculation for the rolling mean is defined as follows:
$$\mathrm{Rolling\_Mean}(t, w) = \frac{1}{w} \sum_{i=0}^{w-1} y_{t-i} \tag{5}$$
In Equation (5), w represents the size of the sliding window.
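The lag and rolling features defined in Equations (4) and (5) can be sketched with pandas as follows. This is an illustration under stated assumptions rather than the authors' code: the column names `experiment_id` and `cof`, the output column naming, and the choice to compute features separately within each experiment are all hypothetical. The value `min_periods=1` follows Table 5.

```python
import pandas as pd

def add_temporal_features(df, target="cof", group="experiment_id",
                          n_lags=5, windows=(3, 5, 10)):
    """Append lag and rolling-window columns, computed within each experiment."""
    out = df.copy()
    grouped = df.groupby(group)[target]

    # Equation (4): lag_k = y_{t-k} for k = 1..n_lags
    for k in range(1, n_lags + 1):
        out[f"{target}_lag{k}"] = grouped.shift(k)

    # Equation (5) and related statistics over the last w samples (window includes y_t)
    for w in windows:
        for stat in ("mean", "std", "min", "max"):
            out[f"{target}_roll_{stat}_{w}"] = grouped.transform(
                lambda s, w=w, stat=stat: getattr(s.rolling(w, min_periods=1), stat)()
            )
    return out

# Tiny hypothetical example with two experiments
demo = pd.DataFrame({
    "experiment_id": [1, 1, 1, 1, 2, 2],
    "cof": [0.1, 0.2, 0.3, 0.4, 0.5, 0.7],
})
features = add_temporal_features(demo, n_lags=2, windows=(3,))
print(features.round(3))
```

Two points are worth noting. The first rows of each experiment have no history, so their lag columns are missing values, which LightGBM can handle natively. Also, as written in Equation (5), the window ends at the current sample $y_t$. When the rolling statistics are used as predictors of $y_t$ itself, a common precaution is to shift the series by one step before rolling so that the window ends at $y_{t-1}$. Whether this was done in the study is not stated.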
In time-series prediction tasks, data samples possess inherent temporal correlations. Adopting traditional random partitioning methods can easily introduce data leakage (the inclusion of “future” information), leading to over-optimistic and unreliable evaluation results. Consequently, this study adopts a rigorous chronological splitting strategy. The dataset was partitioned proportionally using individual experiments as the basic unit. First, the raw data were grouped according to their unique Experiment IDs. Subsequently, within each experimental group, the first 90% of samples were assigned chronologically to the training set for model parameter learning, while the remaining 10% served as the test set to validate generalization capability. Finally, the training and test sets from all experiments were merged to form the global training and test sets. This partitioning method ensures that all test samples occur strictly later in time than the training samples, effectively simulating a real-world prediction scenario. This approach eliminates information leakage and ensures that the model evaluation results are both objective and credible.
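The per-experiment chronological split described above might be implemented along the following lines. The column names (`experiment_id`, `time`) and the function name are assumptions made for illustration. Only the 90%/10% ratio and the grouping by experiment come from the text.

```python
import pandas as pd

def chronological_split(df, group="experiment_id", time="time", train_frac=0.9):
    """Split each experiment by time order, then merge the pieces."""
    train_parts, test_parts = [], []
    for _, exp in df.groupby(group, sort=False):
        exp = exp.sort_values(time)            # enforce chronological order
        cut = int(len(exp) * train_frac)       # first 90% of samples go to training
        train_parts.append(exp.iloc[:cut])
        test_parts.append(exp.iloc[cut:])      # remaining tail goes to testing
    return pd.concat(train_parts), pd.concat(test_parts)

# Hypothetical data: two experiments with ten samples each
demo = pd.DataFrame({
    "experiment_id": [1] * 10 + [2] * 10,
    "time": list(range(10)) * 2,
    "cof": [0.5] * 20,
})
train, test = chronological_split(demo)
print(len(train), len(test))
```

With this scheme the ordering guarantee holds within each experiment: every test sample of an experiment is later than that experiment's training samples.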
Owing to the high-dimensional and complex non-linear characteristics of the titanium alloy electrochemical corrosive-wear dataset, directly fitting a standard LightGBM model would not only increase the computational burden but could also degrade predictive accuracy through the introduction of irrelevant features. To address these challenges, an enhanced LightGBM approach was developed, as illustrated in the flowchart in Figure 9. This method constructs a predictive model for the electrochemical surface COF of titanium alloys by incorporating lag features and rolling statistical features. Furthermore, the input features are optimized through a hybrid approach combining Pearson correlation coefficients with LightGBM’s built-in feature importance ranking. LightGBM is an implementation of the gradient boosting decision tree (GBDT). As a boosting-based ensemble learning method, GBDT serially combines multiple decision trees, each of which is fitted to the residuals of the current ensemble. It is an additive model optimized by a forward stagewise algorithm in which each new tree approximates the negative gradient of the loss function, that is, a steepest-descent step, and it thereby addresses high-dimensional non-linear modeling problems efficiently. Compared to approaches using only raw instantaneous features, traditional feature selection, or conventional ML models, the proposed method offers two distinct advantages:
(1) Temporal Depth: By integrating lag and rolling features, the model extracts historical dependencies and local dynamic patterns within the time-series data, significantly enhancing its capability to represent complex, time-varying processes.
(2) Efficiency and Robustness: The integration of Pearson correlation filtering reduces feature dimensionality and noise interference. This ensures high predictive precision while simultaneously improving training efficiency, generalization capability, and model interpretability.
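The text does not specify how the Pearson filter and the importance ranking are combined, so the following is only one plausible sketch. It assumes a feature is kept if its absolute Pearson correlation with the target reaches a threshold or if it ranks among the top features by importance. The function name, the threshold `min_abs_r`, the cutoff `top_k`, and the toy data are all hypothetical. The importance scores are passed in as a plain Series, for instance one built from a fitted LightGBM model's feature importances.

```python
import pandas as pd

def hybrid_select(X, y, importance, min_abs_r=0.2, top_k=10):
    """Keep features that pass a Pearson filter OR rank in the top_k by importance."""
    abs_r = X.corrwith(y).abs()                       # |Pearson r| of each column with y
    by_corr = set(abs_r[abs_r >= min_abs_r].index)
    by_importance = set(importance.sort_values(ascending=False).head(top_k).index)
    return sorted(by_corr | by_importance)

# Toy data: x1 and x3 are linearly related to y, x2 and x4 are not
X = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],
    "x2": [1, -1, 0, -1, 1],
    "x3": [5, 4, 3, 2, 1],
    "x4": [0, 1, 0, 1, 0],
})
y = pd.Series([2, 4, 6, 8, 10])
importance = pd.Series({"x1": 10, "x2": 50, "x3": 0, "x4": 5})  # made-up scores

selected = hybrid_select(X, y, importance, min_abs_r=0.2, top_k=1)
print(selected)
```

Taking the union rather than the intersection is a deliberate choice in this sketch: a feature with a weak linear correlation can still matter through a non-linear effect, which a tree-based importance score can pick up.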

3.3. Data Description and Hyperparameter Configuration

This study systematically analyzes and quantitatively evaluates the electrochemical corrosion and wear dataset in terms of global statistical properties and data quality. Multiple critical data characteristics are exhaustively characterized, including total sample scale, intrinsic feature configuration, missing data ratio, statistical distribution, abnormal sample distribution, and cross-feature correlation. The exhaustive data inspection enhances the reliability of model training and ensures the reproducibility of subsequent predictive experiments.
The dataset contains 32,470 valid time-series sampling points and five input variables, which are divided into categorical and numerical features according to physical experimental attributes. The electrolyte solution type is treated as a categorical feature, while the remaining four environmental and operational parameters are defined as continuous numerical variables. The friction coefficient is regarded as the core regression target for corrosion wear prediction. All experimental measurements were collected under independent and standardized electrochemical operating conditions. The sufficient sample size satisfies the statistical requirements for reliable machine learning training and performance verification, which further supports the robust evaluation of model generalization capability in complex wear prediction tasks.
To simulate real industrial monitoring scenarios where future tribological parameters are predicted based on historical sequential observations, a strict time-series sequential partitioning strategy is employed in this study. All samples collected from each 60 min independent experiment are divided chronologically without random shuffling. Specifically, the first 90% of temporal data are used for model training and parameter optimization, and the last 10% are reserved as the unseen test set to evaluate temporal extrapolation performance and long-term wear forecasting accuracy. This time-aware partitioning strictly maintains temporal causality, completely avoids potential time-series data leakage, and guarantees the practical applicability and scientific validity of the prediction evaluation results.
The time variable is synchronously recorded at fixed intervals throughout the entire experimental process, forming a complete, evenly distributed, and missing-free time-series index. As a fundamental sequential label, the time variable ensures uniform sampling frequency without requiring complex distribution fitting. Other key control factors, including solution concentration, sliding speed, applied voltage, and ambient temperature, are precisely configured before each test with balanced sample distribution across different gradient levels, satisfying the standard requirements of controlled-variable electrochemical experiments. Statistical results show that the friction coefficient ranges from 0.079 to 0.743, with a mean value of 0.506 and a standard deviation of 0.087. The overall data distribution is approximately symmetric, with no severe skewness or extreme outliers. Such a stable and complete data distribution reflects the full evolutionary process of the titanium alloy friction coefficient, covering both the initial low-friction running-in stage and the subsequent stable high-friction wear stage, and thereby provides high-quality sequential samples for the accurate prediction of surface COF using the LightGBM framework. The feature engineering parameters and the LightGBM hyperparameters used in this study are listed in Table 5 and Table 6, respectively.
Based on the aforementioned dataset statistical characteristics and optimal hyperparameter configuration, exhaustive predictive experiments are conducted to further evaluate the model performance and analyze the corresponding results.
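For orientation, the hyperparameters in Table 6 could be passed to the LightGBM Python package roughly as shown below. This is a sketch rather than the authors' training script: the function name `train_model` and its arguments are hypothetical, and the package is imported inside the function so that the parameter dictionary can be inspected without LightGBM installed.

```python
# Values copied from Table 6
PARAMS = {
    "objective": "regression",
    "metric": "rmse",
    "boosting_type": "gbdt",
    "num_leaves": 31,
    "learning_rate": 0.05,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "min_data_in_leaf": 20,
    "min_gain_to_split": 0.01,
    "lambda_l1": 0.1,
    "lambda_l2": 0.1,
    "verbose": -1,
    "random_state": 42,
    "n_jobs": -1,
}

def train_model(X_train, y_train, X_valid, y_valid):
    """Train with early stopping (hypothetical helper, not from the paper)."""
    import lightgbm as lgb  # lazy import: only needed when training is run

    train_set = lgb.Dataset(X_train, label=y_train)
    valid_set = lgb.Dataset(X_valid, label=y_valid, reference=train_set)
    return lgb.train(
        PARAMS,
        train_set,
        num_boost_round=1000,                      # maximum iteration rounds
        valid_sets=[train_set, valid_set],
        valid_names=["train", "valid"],
        callbacks=[
            lgb.early_stopping(stopping_rounds=50),  # stop after 50 rounds without improvement
            lgb.log_evaluation(period=100),          # print metrics every 100 rounds
        ],
    )
```

Table 6 lists the test set as the validation set used for early stopping. A more conservative variant would carve the validation data out of the tail of the training portion, so that the test set plays no part in choosing the number of boosting rounds.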

3.4. Data-Driven Prediction of Electrochemical Surface Coefficient of Friction

Figure 10 presents the Pearson correlation coefficient heatmap, which quantitatively reveals the linear association intensity between each feature of the electrochemical corrosive-wear experiment and the target variable (friction coefficient). The diagonal elements represent the unit values of variable self-correlation. A strong positive correlation was observed between solution encoding and concentration (r = 0.63), suggesting potential multicollinearity. The target variable, friction coefficient, exhibits moderate to strong positive correlations with time (r = 0.64) and velocity (r = 0.50), indicating that these are the primary linear factors driving the dynamic evolution of the friction coefficient. In contrast, the linear correlations between the friction coefficient and solution encoding, concentration, and voltage are relatively weak (|r| < 0.20). This suggests that their influences may manifest in non-linear forms, providing a critical rationale for the subsequent feature selection and modeling strategies.
As illustrated in Figure 11 and Figure 12, the conventional LightGBM model, relying solely on the five base features, exhibits poor fitting performance and weak predictive capability. The coefficient of determination ($R^2$) for the training and test sets reached only 0.771 and 0.487, respectively. The predicted data points are highly dispersed, showing significant deviations across high, medium, and low wear rate regions, which indicates a deficient learning capacity regarding the underlying features of the training data. Furthermore, the prediction errors are substantial: the MSE, RMSE, and MAE for the training set are 0.00169, 0.04101, and 0.03091, while those for the test set are 0.00368, 0.06062, and 0.05034, respectively. The root cause of this shortfall lies in the temporal dependencies and hysteresis effects inherent in the friction coefficient (COF) time series. Unlike traditional static tabular data, the core characteristic of time-series data is autocorrelation: the COF at the current moment is strongly correlated with historical observations. This dynamic evolutionary pattern cannot be fully characterized by static experimental condition features alone.
As a tree-based model founded on the assumption of sample independence, LightGBM, if not explicitly provided with temporal information, can only learn the average trends from external operating conditions. It struggles to capture the inertia, abrupt changes, and periodic fluctuations of the COF itself. Consequently, a model that neglects temporal lag can only output an “average friction coefficient” based on current conditions, failing to predict accelerations or sudden shifts in the COF sequence. This results in a fitting curve that is overly smooth and centered around the mean, leading to significant deviations from the actual fluctuating COF values and explaining the fundamental failure of the base-feature model.
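The notion of autocorrelation invoked here can be made concrete with a short sketch. The series below is synthetic: it is a first-order autoregressive process centred on 0.5 with a persistence of 0.9, chosen only to mimic a signal with short-term inertia. It is not the experimental COF data, and the function name is illustrative.

```python
import numpy as np

def autocorr(series, lag):
    """Sample autocorrelation at a positive integer lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                                   # centre the series
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic series with inertia: each value stays close to the previous one
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.01, size=2000)
cof = np.empty(2000)
cof[0] = 0.5
for t in range(1, 2000):
    cof[t] = 0.5 + 0.9 * (cof[t - 1] - 0.5) + noise[t]

r1 = autocorr(cof, 1)
r10 = autocorr(cof, 10)
print(f"lag-1: {r1:.2f}, lag-10: {r10:.2f}")
```

For a series of this kind the lag-1 autocorrelation is high and decays with increasing lag. A model given only static operating conditions has no access to this information, whereas a lag feature supplies it directly.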
As demonstrated in Figure 12, the fitting and generalization results of the conventional LightGBM model are suboptimal. Because LightGBM is a tree ensemble and therefore inherently non-linear, this shortfall is attributable less to model capacity than to the inputs: the five base features carry no information about the recent history of the friction coefficient. In contrast, the fitting results incorporating both lag and rolling features are highly satisfactory. A comparison of generalization errors reveals that the enhanced model significantly outperforms the conventional version. Specifically, as shown in Figure 13, the coefficient of determination ($R^2$) improved to 0.979 for the training set and 0.951 for the test set. Correspondingly, the MSE, RMSE, and MAE for the training set decreased to 0.00015, 0.012285, and 0.00946, respectively, and those for the test set dropped to 0.000348, 0.018653, and 0.013139. The predicted values align closely with the experimental data, exhibiting minimal deviations and maintaining stability even in high-wear-rate regions. These results validate the efficacy of the proposed feature selection strategy, which optimizes predictive performance and enhances generalization accuracy by eliminating redundant features.
Given that the friction coefficient is a representative time-series variable with significant autocorrelation, current observations are heavily dependent on historical states. Relying solely on basic experimental features is insufficient to fully characterize its dynamic evolutionary patterns. Consequently, this study developed lag and rolling statistical features, as illustrated in Figure 13, to explicitly introduce temporal dependency information. This allows the model to capture the short-term inertia, local fluctuations, and temporal patterns of the COF, thereby substantially elevating both fitting and predictive performance.
Figure 14 illustrates the feature importance ranking after incorporating lag and rolling statistical features. Notable discrepancies exist between these results and those of the conventional LightGBM model shown in Figure 11, reflecting differences in how each model interprets the data characteristics. According to the importance ranking in Figure 14, the first-order lag feature of the friction coefficient (COF_lag1) holds a dominant position, with its importance significantly exceeding that of all other features. This finding offers direct evidence for the strong autocorrelation of friction coefficient sequences, revealing that the wear condition at the previous moment dominates the real-time variation in friction. It is consistent with the inherent memory effect and short-term inertia of tribological systems.
Rolling statistical features (such as rolling mean and rolling standard deviation) also demonstrate high importance. This suggests that the friction coefficient exhibits stable trends and fluctuation patterns within local time windows. These features effectively capture the dynamic evolutionary modes of the sequence, providing essential information for the model to identify local extrema and trend transitions.
Temporal features (absolute and relative time) possess moderate importance, indicating that the cumulative effect of the experimental process is a significant factor influencing the long-term variation in the friction coefficient. This aligns with the findings from the Pearson correlation analysis, which showed a strong positive correlation between time and the friction coefficient.
In contrast, the solution encoding feature exhibits a relatively low level of importance. This implies that within the context of this specific experimental dataset, the linear and non-linear impacts of the solution type on the friction coefficient are limited. Its influence is likely partially subsumed by temporal and other operating condition features. This observation provides a logical basis for future feature pruning and model lightweighting efforts.
Quantitative analysis was performed on each group of results in Figure 14. Mean square error (MSE) and coefficient of determination (R2) were adopted to evaluate the fitting performance and generalization error of the two algorithms, and the comparison results are presented in Table 7.
It can be observed from Table 7 that the enhanced LightGBM model achieves favorable fitting capability. In terms of generalization error on the test set, the proposed optimized model exhibits distinctly superior generalization performance compared with the conventional model.

4. Conclusions

In this study, a machine learning framework was developed to predict the electrochemical surface COF of titanium alloys and to identify the factors influencing it. The research systematically explored material behavior and model predictive capabilities under diverse operating conditions. An enhanced LightGBM model, incorporating lag and rolling statistical features, was constructed and validated using experimental datasets. To evaluate predictive precision, RMSE, MAE, and MSE were employed as core metrics, alongside an analysis of feature importance for variables such as solution concentration, experimental duration, and sliding velocity. The results demonstrate that the modified LightGBM algorithm effectively addresses challenges in modeling high-dimensional, non-linear complex data, exhibiting superior generalization ability and predictive accuracy. The established model accurately fits and predicts the electrochemical surface COF of titanium alloys. Moreover, through feature contribution analysis, the model offers robust interpretability, providing critical data support and a theoretical reference for the optimization of electrochemical machining processes for titanium alloys. The main contributions of this work are threefold:
(1) A systematic dataset of titanium alloy COF under multi-factor electrochemical conditions was constructed, and an enhanced LightGBM model incorporating lag and rolling-window features was proposed to capture the temporal dynamics of the friction coefficient. This enhanced model demonstrated the highest overall performance. Feature analysis identified solution concentration as the most critical of the five investigated factors (solution concentration, solution type, voltage, sliding speed, and experimental duration), followed by experimental duration, while sliding velocity exhibited a relatively minor impact.
(2) In the prediction of friction coefficients, the enhanced LightGBM model outperformed conventional models across both training and testing datasets, achieving the lowest prediction errors and superior fitting, particularly in high-friction coefficient regimes. Conversely, the conventional LightGBM model exhibited the poorest performance, characterized by significant prediction errors and an increased number of outliers in the medium-to-high friction ranges, indicating an insufficient capacity to fit complex, non-linear data.
(3) The key influencing factors (e.g., solution concentration, test duration) were identified and ranked through feature importance analysis, providing actionable insights for process optimization. Feature importance analysis confirmed that solution concentration is the primary driver of friction coefficient variations, yielding the highest contribution scores across all models. Experimental duration followed in importance, though weight evaluations varied by model. While the conventional LightGBM model overemphasized the role of voltage, the enhanced model assigned it a lower weight. Velocity had the least impact overall, despite holding a slightly higher relative importance in the conventional model than solution type. These discrepancies highlight how different model architectures interpret data characteristics, providing a crucial reference for future model optimization and feature selection strategies.
The enhanced LightGBM model we developed—incorporating lag and rolling statistical features tailored to titanium alloy electrochemical corrosive-wear data—effectively performs variable selection and model construction in high-dimensional spaces. This approach is feasible and practical for predicting and analyzing the factors that govern the COF of titanium alloy surfaces in complex electrochemical corrosive environments. While this study concentrates on machine learning-based COF prediction under standardized experimental settings, thorough evaluation of statistical repeatability will be addressed in follow-up research.

Author Contributions

Conceptualization, Methodology, Writing—original draft, F.H.; Supervision, H.W.; Supervision, Funding acquisition, J.J.; Methodology, J.S.; Software, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article, and further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Guo, L.; He, W.; Zhou, P.; Liu, B. Research Status and Development Prospect of Titanium and Titanium Alloy Products in China. Hot Work. Technol. 2020, 49, 22–28.
  2. Zhao, Y.; Ge, P.; Xin, S. Progresses of R&D on Ti-Alloy Materials in Recent 5 Years. Mater. China 2020, 39, 527–534+557.
  3. Shen, C. Progress and development trend of titanium industry technology in China. China Metal Bulletin 2022, 9, 1–3.
  4. Wang, L.; Wang, Y.; Li, X.; Wu, Y.; Ding, N.; Wang, N.; Xin, X.; Su, B. Research Progress and Application Status of Plastic Processing Technologies of Titanium and Titanium Alloys in China. Titan. Ind. Prog. 2025, 42, 43–48.
  5. Gutman, E.M. Interdependence of corrosion phenomena and mechanical factors acting on metals. Sov. Mater. Sci. 1968, 3, 401–409.
  6. Guo, X.; Chen, T. Influence of different electrolytes on electrochemical plasticization for 304L stainless steel bar. Forg. Stamp. Technol. 2025, 50, 77–85.
  7. Dan, W.; He, Y.; Li, H. Advance and trend of friction study in plastic forming. Trans. Nonferrous Met. Soc. China 2014, 24, 1263–1272.
  8. Zhao, Z.; Wang, J.; Wang, L. Research on the Relationship between the Average Friction Coefficient and Contact Pressure in Metal Plastic Forming. Forg. Stamp. Mach. 2006, 41, 4.
  9. Yang, H.; Gu, R.; Zhan, M.; Li, H. Effect of frictions on cross section quality of thin-walled tube NC bending. Trans. Nonferrous Met. Soc. China 2006, 16, 878–886.
  10. Wang, L.; Zhou, Y.; Wang, J.; Wang, Z.; Huang, W. Corrosion-Wear Interaction Behavior of TC4 Titanium Alloy in Simulated Seawater. Tribology 2019, 7, 206–212.
  11. Wang, Z.; Yang, S.; Peng, Z.; Tan, Q.; Guo, J.; Zhou, L. Corrosive-Wear Properties of Two NiAl Alloys in Sulfuric Acid Solution. Chin. J. Mater. Res. 2015, 29, 595–601.
  12. Gutman, E.M.; Unigovski, Y.; Shneck, R.; Ye, F.; Liang, Y. Electrochemically enhanced surface plasticity of steels. Appl. Surf. Sci. 2016, 388, 49–56.
  13. Li, L.L.; Chen, T.J.; Zhang, S.Q.; Yan, F.Y. Electrochemical cold drawing of in situ Mg2Sip/AM60B composite: A comparison with the AM60B alloy. J. Mater. Process. Technol. 2017, 240, 33–41.
  14. Li, L.; Chen, T.; Zhang, S.; Gutman, E.M.; Unigovski, Y.; Yan, F. Electrochemical cold drawing of Mg alloy bars. Mater. Sci. Technol. 2017, 33, 244–254.
  15. Yang, S. Research on the Corrosion Wear Characteristic of NiAl-2.5Ta-7.5Cr Based Alloy. Master’s Thesis, Hunan University of Science and Technology, Xiangtan, China, 2015.
  16. Schmidt, J.; Marques, M.R.; Botti, S.; Marques, M.A. Recent advances and applications of machine learning in solid-state materials science. npj Comput. Mater. 2019, 5, 83.
  17. Hu, J.J.; Cao, Z.; Dan, Y.; Niu, C.; Li, X.; Qian, S. Elastic Property Prediction of Materials Based on Machine Learning and Feature Selection. J. South China Univ. Technol. (Nat. Sci. Ed.) 2019, 47, 48–55.
  18. Liu, Y.; Zhao, T.; Ju, W.; Shi, S. Materials discovery and design using machine learning. Materiomics 2017, 3, 159–177.
  19. Anaei, M.T.M.; Khosravifard, A.; Bui, T.Q. Analysis of fracture mechanics and fatigue crack growth in moderately thick plates using an efficient meshfree approach. Theor. Appl. Fract. Mech. 2021, 113, 102943.
  20. Alqurashi, H.F.; Abdellah, M.Y.; Alshareef, M.; Hassan, M.K.; Alabdullah, F.T.; Moamed, A.F. Intelligent Modeling of Erosion-Corrosion in Polymer Composites: Integrating Fuzzy Logic and Machine Learning. Polymers 2026, 18, 9.
  21. Zheng, Z.Q.; Wang, Z.B.; Hu, H.X.; Zheng, Y.G. Predicting erosion-corrosion rate of a 90/10 copper-nickel alloy via high-throughput experiments and machine learning. Mater. Lett. 2026, 406, 140003.
  22. Kuang, J.G.; Long, Z.L. Prediction model for corrosion rate of low-alloy steels under atmospheric conditions using machine learning algorithms. Int. J. Miner. Metall. Mater. 2024, 31, 337–350.
  23. Li, X.; Zhang, Y.; Yao, L.; Tong, X. Research on the prediction method of corrosion fatigue crack extension rate of aluminum alloy based on BB-GBRT algorithm. Theor. Appl. Fract. Mech. 2025, 136, 104807.
  24. Wei, J.M.; Yuan, S.J.; Kong, S.S.; Yang, A.M.; Zhao, C.Y. Development and application of light gradient boosting machine. Comput. Eng. Appl. 2025, 61, 32–42.
  25. Hou, J.Q. Research on the Prediction Model of Rolling Force for Tandem Cold Rolling Process Based on Data-Driven. Master’s Thesis, Yanshan University, Qinhuangdao, China, 2023.
  26. Shao, J.B. Research on Prediction of Multi-Class Surface Defects of Hot Rolled Strip Based on Data-Driven. Master’s Thesis, Yanshan University, Qinhuangdao, China, 2023.
  27. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 3146–3154.
Figure 1. Titanium alloy specimens.
Figure 2. Schematic of electrochemical friction and wear testing.
Figure 3. Experimental platform.
Figure 4. SEM morphologies of the wear tracks of TC4 in two acid solutions at an applied potential of 0.5 V: (a) 0.5 mol/L H2SO4; (b) 0.5 mol/L HCl.
Figure 5. EDS results of different areas on the TC4 abrasion surface: (a) −0.3 V; (b) 0.2 V; (c) 0.5 V; (d) 1.0 V (V = 240 t/min).
Figure 6. SEM morphologies of the wear tracks of TC4 alloy at different reciprocating speeds in 0.5 mol/L H2SO4 solution: (a) 60 t/min; (b) 120 t/min; (c) 180 t/min; (d) 240 t/min.
Figure 7. Core operational mechanism of LightGBM [27].
Figure 8. Workflow chart.
Figure 9. Flowchart of the enhanced LightGBM method.
Figure 10. Pearson correlation coefficient heatmap of features and target variable.
Figure 11. Feature importance ranking derived from the LightGBM model.
Figure 12. Fitting and generalization results of the LightGBM model: (a) training set, measured vs. predicted scatter plot; (b) test set, measured vs. predicted scatter plot; (c) training set, measured vs. predicted COF; (d) test set, measured vs. predicted COF.
Figure 13. Fitting and generalization results of the enhanced LightGBM algorithm: (a) training set, measured vs. predicted scatter plot; (b) test set, measured vs. predicted scatter plot; (c) training set, measured vs. predicted COF; (d) test set, measured vs. predicted COF.
Figure 14. Enhanced feature importance ranking.
Table 1. Composition of TC4 titanium alloy used in this study (mass fraction, wt.%).

| Element | Al | V | Fe | Si | C | N | Ti |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ti6Al4V | 6.10 | 4.15 | 0.09 | 0.12 | 0.01 | 0.01 | Bal. |
Table 2. Experimental reagents.

| Reagent Name | Specification | Manufacturer |
| --- | --- | --- |
| Concentrated sulfuric acid | Analytical pure | Sinopharm Chemical Reagent Co., Ltd., Shanghai, China |
| Hydrochloric acid | Analytical pure | Sinopharm Chemical Reagent Co., Ltd., Shanghai, China |
| Anhydrous ethanol (C2H5OH) | ≥99.7% | Tianjin Fuyu Fine Chemical Co., Ltd., Tianjin, China |
Table 3. List of experimental parameters.

| Solution | H2SO4 | HCl |
| --- | --- | --- |
| Concentration (mol/L) | 0.5, 0.75 | 0.5, 0.1 |
| Voltage (V) | −0.3, 0.2, 0.5, 1.0, 1.2 | −0.3, 0.2, 0.5, 1.0, 1.2 |
| Speed (t/min) | 60, 120, 180, 240 | 60, 120, 180, 240 |
Table 4. Definition and description of data variables.

| Variable Name | Type | Unit | Description |
| --- | --- | --- | --- |
| Time | Numerical | min | Recorded from the start of the experiment; reflects the dynamic evolution of the friction process. |
| Solution | Categorical | - | Type of chemical solution used, including “Sulfuric Acid” and “Hydrochloric Acid”. |
| Concentration | Numerical | mol/L | Molar concentration of the solution; affects the physicochemical properties of the sliding surface. |
| Velocity | Numerical | r/min | Relative sliding velocity of the specimen; directly influences frictional heat. |
| Voltage | Numerical | V | Applied electric field strength; used to investigate the effect of voltage on the friction coefficient. |
| Friction Coefficient | Numerical | - | Target variable; characterizes the coefficient of friction between material surfaces. |
Table 5. Feature engineering parameters.

| Parameter | Value | Description |
| --- | --- | --- |
| Number of lag periods | lag_periods = 5 | Creates lag1 to lag5 |
| Rolling window size | 3, 5, 10 | |
| Rolling statistics type | Mean, standard deviation, minimum, maximum | |
| Rolling minimum number of periods | min_periods = 1 | A value is computed when the window contains at least one valid observation |
| Time difference | time_diff | Difference between adjacent time points |
| Experimental duration | max(time) − min(time) | Computed within each experiment |
| Cumulative time | time − min(time) | Computed within each experiment |
Table 6. LightGBM model hyperparameters.
Parameter | Value | Description
objective | 'regression' | Regression task
metric | 'rmse' | Root mean square error
boosting_type | 'gbdt' | Traditional gradient boosting decision tree
num_leaves | 31 | Maximum number of leaves per tree
learning_rate | 0.05 | Learning rate (shrinkage step)
feature_fraction | 0.9 | 90% of the features are randomly selected for each tree
bagging_fraction | 0.8 | 80% of the samples are randomly selected for each bagging iteration
bagging_freq | 5 | Bagging is performed every 5 rounds
min_data_in_leaf | 20 | Minimum number of samples in a leaf
min_gain_to_split | 0.01 | Minimum gain required to make a split
lambda_l1 | 0.1 | L1 regularization
lambda_l2 | 0.1 | L2 regularization
verbose | -1 | Silent mode
random_state | 42 | Fixed random seed
n_jobs | -1 | Use all CPU cores
num_boost_round | 1000 | Maximum number of boosting rounds
early_stopping_rounds | 50 | Training stops if the validation metric does not improve for 50 boosting rounds
Validation sets | Training set + test set | valid_names = ['train', 'valid']
Log printing period | 100 | -
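The settings in Table 6 map onto LightGBM's native Python training API roughly as shown below. This is a sketch under assumptions: the source lists only the parameter values, so the use of `lgb.train` with callbacks, the function name `train_cof_model`, and the argument names are illustrative rather than the authors' implementation.

```python
# Hyperparameters as listed in Table 6.
PARAMS = {
    "objective": "regression",
    "metric": "rmse",
    "boosting_type": "gbdt",
    "num_leaves": 31,
    "learning_rate": 0.05,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "min_data_in_leaf": 20,
    "min_gain_to_split": 0.01,
    "lambda_l1": 0.1,
    "lambda_l2": 0.1,
    "verbose": -1,
    "random_state": 42,
    "n_jobs": -1,
}
NUM_BOOST_ROUND = 1000
EARLY_STOPPING_ROUNDS = 50
LOG_PERIOD = 100


def train_cof_model(X_train, y_train, X_test, y_test):
    """Train a LightGBM regressor with the Table 6 settings (hypothetical helper)."""
    # Imported here so the configuration above can be inspected without LightGBM.
    import lightgbm as lgb

    train_set = lgb.Dataset(X_train, label=y_train)
    valid_set = lgb.Dataset(X_test, label=y_test, reference=train_set)
    return lgb.train(
        PARAMS,
        train_set,
        num_boost_round=NUM_BOOST_ROUND,
        valid_sets=[train_set, valid_set],
        valid_names=["train", "valid"],
        callbacks=[
            lgb.early_stopping(stopping_rounds=EARLY_STOPPING_ROUNDS),
            lgb.log_evaluation(period=LOG_PERIOD),
        ],
    )
```

As in Table 6, the test set doubles as the early-stopping validation set here; where enough data are available, a separate hold-out split for early stopping would keep the test score independent of the stopping round.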
Table 7. Comparison of fitting and generalisation errors of the two models.
Algorithm | Training MSE | Training R² | Test MSE | Test R²
LightGBM model | 0.00169 | 0.771 | 0.00368 | 0.487
Enhanced LightGBM model | 0.00015 | 0.979 | 0.000348 | 0.951
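For reference, the two metrics reported in Table 7 can be computed as in the sketch below, which uses the standard definitions of MSE and R² (one minus the ratio of the residual sum of squares to the total sum of squares). The function names and the demo values are hypothetical and are not data from the study.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
demo_mse = mse(y_true, y_pred)
demo_r2 = r2(y_true, y_pred)
```

Comparing each metric between the training and test columns indicates how well a model generalises: in Table 7 the standard model's R² falls from 0.771 to 0.487, while the enhanced model's falls only from 0.979 to 0.951.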
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Han, F.; Wen, H.; Jia, J.; Sun, J.; Wang, X. Prediction of Electrochemical Surface COF of Titanium Alloy Using an Enhanced LightGBM with Lag and Rolling Features. Coatings 2026, 16, 680. https://doi.org/10.3390/coatings16060680
