Conventional evaluation methods, rooted in statistical analysis, often rely on linear or polynomial regression models. While these models are computationally straightforward and directly reflect the correlation between each index and the target, they fall short in capturing nonlinear relationships among multiple indexes. Earlier, support vector machines (SVMs) were also employed as regression models, but their complexity, particularly in multi-class problems, posed challenges. In recent years, many scholars have used decision tree models for evaluation. A decision tree represents the decision process as a tree structure: the root node represents the entire dataset, each internal node represents a feature attribute, and each leaf node represents a class or value. The tree is constructed recursively. Starting from the root node, a feature attribute and a threshold are selected to divide the dataset into two subsets, according to whether the condition on that feature is satisfied. This selection of the best split feature and threshold is repeated on each subset until a stopping condition is reached, for example, when the depth of the tree reaches a set value or the number of data points in a leaf node falls below a threshold. Once the decision tree is built, it can be used for classification or regression predictions: starting from the root node, the tree is traversed downward according to the feature values of the data until a leaf node is reached, whose class or numerical value is the prediction result [20].
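To make the recursive construction concrete, the following is a minimal sketch of a regression tree in Python; the variance-reduction split criterion and the max_depth and min_samples stopping rules are illustrative assumptions, not a specific implementation from the literature.

```python
import numpy as np

def build_tree(X, y, depth=0, max_depth=3, min_samples=5):
    """Recursively split (X, y) on the feature/threshold that most reduces variance."""
    # Stopping conditions: depth limit reached or too few samples in the node.
    if depth >= max_depth or len(y) < min_samples:
        return {"leaf": True, "value": float(np.mean(y))}
    best = None
    for j in range(X.shape[1]):                    # candidate split feature
        for t in np.unique(X[:, j]):               # candidate split threshold
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue                           # a split must yield two subsets
            # Weighted variance of the two children; lower means a better split.
            score = left.sum() * y[left].var() + (~left).sum() * y[~left].var()
            if best is None or score < best[0]:
                best = (score, j, t, left)
    if best is None:                               # no valid split: make a leaf
        return {"leaf": True, "value": float(np.mean(y))}
    _, j, t, left = best
    return {
        "leaf": False, "feature": j, "threshold": float(t),
        "left": build_tree(X[left], y[left], depth + 1, max_depth, min_samples),
        "right": build_tree(X[~left], y[~left], depth + 1, max_depth, min_samples),
    }

def predict(tree, x):
    """Traverse from the root by feature value until a leaf is reached."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]
```

The same traversal in predict() applies to a classification tree, with the leaf storing a class label instead of a mean value.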
In practice, the most commonly used decision-tree approaches are ensemble models, such as the bagging model random forest (RF) and boosting models such as the gradient-boosting machine (GBM). Compared with boosting models, a bagging model cannot reduce model bias and therefore offers limited performance improvement, and its handling of unbalanced datasets is also limited [21,22,23,24].

This paper chooses a gradient-boosting machine from the boosting family; the main implementations are the gradient-boosting decision tree (GBDT), extreme gradient boosting (XGBoost), and the light gradient-boosting machine (LightGBM). In each iteration, GBDT must traverse the entire training dataset multiple times, so the model computation is complex and training is relatively slow; on large datasets in particular, GBDT builds its decision trees serially, so training cannot be parallelized. Before LightGBM was proposed, the best-known GBDT tool was XGBoost, a decision tree algorithm based on a pre-sorting method. Its disadvantages, however, are also obvious. First, it consumes a large amount of space: such algorithms must store the feature values of the data as well as the results of sorting the features (for example, the sorted indexes, so that split points can be computed quickly later), which consumes roughly twice the memory of the training data. Second, when traversing the candidate split points, the split gain must be calculated for each one, which is computationally expensive. In addition, XGBoost has many parameters to adjust, which may require considerable tuning work. Therefore, the LightGBM model, with its fast training speed and low memory usage, is selected for the intelligent evaluation of anti-slip performance in this paper. The model offers high performance at low cost, optimizes for accuracy, and supports three parallel training modes [25]. To avoid the above shortcomings of XGBoost, and to speed up GBDT training without compromising accuracy, LightGBM applies the following optimizations to the traditional GBDT algorithm [26].
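As a point of reference, a minimal training sketch using the lightgbm Python package might look as follows; the synthetic data, train/test split, and parameter values are illustrative assumptions, not the configuration used in this paper.

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the anti-slip evaluation indexes (X) and target (y).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 6))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = lgb.LGBMRegressor(
    n_estimators=500,    # number of boosting iterations (trees)
    learning_rate=0.05,  # shrinkage applied to each tree's contribution
    num_leaves=31,       # leaf-wise growth budget per tree
    max_depth=7,         # explicit depth limit (see the discussion below)
)
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
print("held-out R^2:", model.score(X_test, y_test))
```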
LightGBM adopts a histogram-based learning approach instead of pre-sorting. This reduces memory consumption, because the sorted feature indexes no longer need to be stored, and it speeds up the calculation of split points. The basic idea of the histogram algorithm is that, when constructing the decision tree, the sorted order of the original data is not used directly; instead, the values of each feature are divided into several intervals, the histogram bins. As shown in Figure 10, the histogram algorithm in effect converts continuous floating-point features into discrete features; the statistics of the samples falling into each interval are then accumulated. This discretization makes finding the best split point more efficient. The first advantage of histogram-based learning is reduced memory consumption: since the sorted indexes of the original data are not stored, only the histogram information is kept, lowering the memory overhead. The second is faster split-point calculation: the histogram statistics make the search for the best split point more efficient, especially on large-scale datasets.
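A minimal sketch of the binning idea follows; the equal-frequency bin edges and the per-bin gradient sums are simplifying assumptions, since LightGBM's actual bin construction differs in detail.

```python
import numpy as np

def build_histogram(feature, gradients, n_bins=16):
    """Discretize a continuous feature into bins and accumulate per-bin statistics."""
    # Bin edges from quantiles, so each bin holds roughly the same number of samples.
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
    bin_ids = np.searchsorted(edges, feature)   # map each value to a bin index
    grad_sum = np.bincount(bin_ids, weights=gradients, minlength=n_bins)
    count = np.bincount(bin_ids, minlength=n_bins)
    # Split search now scans n_bins candidates instead of all n sample values.
    return grad_sum, count
```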
LightGBM employs a leaf-wise growth strategy, which, compared with level-wise growth, quickly identifies the leaf nodes with the highest gain and thereby reduces the depth of the decision tree. Most models in previous studies adopted the level-wise decision-tree growth strategy (Figure 11), whereas LightGBM uses the leaf-wise algorithm with a depth restriction (Figure 12). In each iteration, LightGBM finds the leaf with the greatest split gain among all current leaves and splits it. The advantage of the leaf-wise algorithm is that, for the same number of splits, it yields a smaller error than the level-wise algorithm and thus higher precision. However, the leaf-wise algorithm can grow a much deeper decision tree, potentially overfitting the target data. To counter this, LightGBM adds a depth threshold to the leaf-wise algorithm, preventing overfitting while maintaining efficiency.
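The leaf-wise loop can be sketched as follows; the node objects, their best_gain attribute, and the split() method are hypothetical placeholders, not LightGBM's internals.

```python
import heapq
import itertools

_tie = itertools.count()  # tie-breaker so equal gains never compare node objects

def grow_leaf_wise(root, max_leaves, max_depth):
    """Always split the current leaf with the highest gain, subject to a depth cap."""
    # Max-heap via negated gain; heapq itself is a min-heap.
    heap = [(-root.best_gain, next(_tie), root)]
    n_leaves = 1
    while heap and n_leaves < max_leaves:
        neg_gain, _, leaf = heapq.heappop(heap)
        # Depth restriction: the guard LightGBM adds against overfitting.
        if -neg_gain <= 0 or leaf.depth >= max_depth:
            continue
        left, right = leaf.split()  # placeholder: apply the leaf's best split
        heapq.heappush(heap, (-left.best_gain, next(_tie), left))
        heapq.heappush(heap, (-right.best_gain, next(_tie), right))
        n_leaves += 1  # one leaf became two: net gain of one leaf
    return root
```

A level-wise strategy would instead split every leaf of the current depth before moving on, regardless of gain.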
LightGBM introduces gradient-based one-side sampling (GOSS), which reduces the attention paid to samples with small gradients while preserving the overall data distribution. This further accelerates training.
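A minimal sketch of the one-side sampling step follows; the retention ratios a and b are chosen purely for illustration.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Keep all large-gradient samples; subsample the rest and reweight them."""
    rng = np.random.default_rng(seed)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))   # indexes by |gradient|, descending
    top = order[: int(a * n)]                # top a of the samples kept outright
    rest = rng.choice(order[int(a * n):], size=int(b * n), replace=False)
    weights = np.ones(n)
    weights[rest] = (1 - a) / b              # amplify the sampled small-gradient
    idx = np.concatenate([top, rest])        # samples to keep statistics unbiased
    return idx, weights[idx]
```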
LightGBM supports exclusive feature bundling (EFB), which merges mutually exclusive features into a single feature and thereby reduces computational complexity. The EFB algorithm transforms the bundling problem into a graph coloring problem and solves it with a greedy approximation. Specifically, EFB treats each feature as a vertex of a graph, and the weight of the edge between two vertices represents the conflict between the corresponding features (Figure 13). The features to be bundled are then those vertices that would be painted the same color in the graph coloring problem.
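The greedy step can be sketched as follows; the precomputed conflict-count matrix and the max_conflicts budget are illustrative assumptions.

```python
import numpy as np

def greedy_bundle(conflicts, max_conflicts=0):
    """Greedily assign features to bundles (colors), tolerating small conflicts."""
    # Visit features by total conflict (vertex degree), most conflicted first.
    order = np.argsort(-conflicts.sum(axis=1))
    bundles = []  # each bundle is a list of feature indexes sharing a "color"
    for f in order:
        for bundle in bundles:
            # Place f in the first bundle whose accumulated conflict stays small.
            if sum(conflicts[f, g] for g in bundle) <= max_conflicts:
                bundle.append(f)
                break
        else:
            bundles.append([f])  # no compatible bundle: open a new color
    return bundles
```

For example, with conflicts = np.array([[0, 3, 0], [3, 0, 0], [0, 0, 0]]) and max_conflicts=0, features 0 and 1 end up in separate bundles, while feature 2 joins the first bundle.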