Lifting State Detection of Oil–Gas Jack-Up Platform Based on Improved Random Forest

Ma, Minglu; Guan, Bing; Cui, Junguo; Wang, Hanxiang; Peng, Bing; Teng, Xingbao; Li, Tingting; Li, Hui

doi:10.3390/pr14111836


by Minglu Ma ¹, Bing Guan ¹,*, Junguo Cui ², Hanxiang Wang ², Bing Peng ³, Xingbao Teng ¹, Tingting Li ¹ and Hui Li ¹

¹ Shandong Weima Pumps Manufacturing Co., Ltd., Jinan 271100, China

² College of Mechanical and Electrical Engineering, China University of Petroleum (East China), Qingdao 266580, China

³ School of Electrical Engineering, Shenyang University of Technology, Shenyang 110023, China

* Author to whom correspondence should be addressed.

Processes 2026, 14(11), 1836; https://doi.org/10.3390/pr14111836

Submission received: 30 April 2026 / Revised: 29 May 2026 / Accepted: 3 June 2026 / Published: 5 June 2026

(This article belongs to the Special Issue Process Safety and Condition Monitoring for Energy and Gas Infrastructure)


Abstract

To improve the accuracy of lifting state detection for oil–gas jack-up platforms, a detection method based on an improved random forest is proposed to address the low detection efficiency caused by interference in the lifting data and by redundant features. First, outliers in the lifting data are detected with the K-means clustering method and the abnormal data are cleaned. Next, principal component analysis is introduced into the random forest algorithm to reduce the dimension of the lifting data, and an improved random forest is constructed with the Gini impurity criterion to preliminarily classify the lifting state. Then, fuzzy comprehensive evaluation is used to refine the classification result and realize lifting state detection. The test results show that the proposed method has good stability in lifting state time-series detection, a high inter-class-to-intra-class distance ratio, and accurate platform displacement detection under different incidence angles and motion responses.

Keywords:

improved random forest; oil and gas jack-up platform; lifting system; K-means clustering; fuzzy comprehensive evaluation

1. Introduction

The marine environment for offshore oil and gas development is complex and harsh, and the safety control of jack-up platform lifting operations is a core element of ensuring smooth development. The accuracy and timeliness of lifting state detection are directly related to equipment lifespan and personnel safety [1,2]. Current research on lifting state detection methods mainly falls into two categories: One relies on mechanism modeling but suffers from the complexity of multi-field coupled simulations and poor real-time performance. The other adopts traditional data-driven approaches, yet has limitations such as insufficient feature extraction and weak generalization capability [3,4,5]. Neither can fully meet the practical application requirements of dynamic offshore operational scenarios. Against this backdrop, research on lightweight, high-precision detection methods is being conducted to address the shortcomings of existing approaches and provide more reliable technical support for the safe and stable operation of platform lifting operations [6,7,8,9,10,11,12].

To clearly illustrate the lifting state detection problem addressed in this paper, the following four typical fault modes are described as examples: Abnormal gear meshing refers to tooth surface wear, tooth root cracks, or excessive meshing clearance in the gear pair of the lifting unit, manifesting as periodic impact vibrations and abnormal noise during the lifting process. In severe cases, it may lead to gear tooth breakage and even lifting failure [13]. Uneven load distribution occurs when platform attitude tilt or asynchronous leg lifting causes the loads borne by each lifting unit to deviate from the design values, resulting in some units being overloaded while others are underloaded. Long-term operation under such conditions accelerates fatigue damage to the lifting mechanism [14]. Lifting speed fluctuation refers to periodic or random oscillations of the lifting speed around the set value, usually caused by unstable hydraulic system pressure or mismatch of motor drive control parameters. Excessive speed fluctuation increases the dynamic impact load on the platform [15]. Precursors of braking system failure refer to the decline in braking torque due to brake pad wear, spring fatigue, or hydraulic leakage, manifested as extended braking distance or braking failure during emergency braking. This fault mode has no obvious external characteristics in the early stage and requires multi-sensor feature fusion for identification [16].

The above four fault modes are the most common and hazardous abnormal states in actual platform lifting operations. To address the accuracy issue in the lifting state detection of jack-up platforms, a detection method based on an improved random forest is proposed. This method first employs the K-means clustering algorithm to identify and eliminate outlier anomalies in the lifting data, ensuring the quality of input data. Then, principal component analysis is embedded into the random forest model to reduce high-dimensional feature redundancy through dimensionality reduction, and the decision tree splitting logic is optimized in combination with the Gini impurity criterion to construct an improved random forest for preliminary state classification. Finally, fuzzy comprehensive evaluation is introduced to refine the preliminary classification results into graded levels, ultimately achieving accurate detection of the platform lifting state. This method provides an effective outlier removal scheme for lifting data preprocessing in complex marine environments, offers an integrated learning approach based on an improved random forest for state detection of high-dimensional industrial features, and presents an operable methodological framework for refined discrimination of fault severity in lifting systems.

2. Literature Review

At present, a large number of theoretical calculations and simulation studies have been conducted regarding the lifting state detection of jack-up platforms. Wang et al. constructed a mechanical analysis model tailored to the lifting state of jack-up platforms by optimizing the graded loading application mode of offshore oil and gas jack-up platforms and refining the parameters for structural nonlinear analysis [17]. This model accurately captures the mechanical characteristics of key load-bearing components of the platform, enabling the efficient and precise detection of both overall and local ultimate load-bearing capacities. However, the adaptability of this method is limited, as it focuses only on the lifting state of specific types of jack-up platforms, and its generalizability to platforms with different structural parameters and under various lifting states requires further optimization. Yin et al. deployed strain gauges, displacement sensors, and pressure sensors at key stress points on the platform legs and lifting mechanisms to capture real-time load variations, leg verticality, and mechanism operating parameters during the lifting process [18]. By integrating industry-standard safety thresholds, they established a multi-level early warning system to achieve rapid identification of common faults such as lifting overload and leg inclination. However, this method relies excessively on preset thresholds, offers poor adaptability to individual platform differences, and fails to accurately identify platform conditions. Song et al. first screened core characteristic parameters, including compressor unit pressure, temperature, and vibration, and built a multi-source parameter real-time acquisition and transmission module [19]. 
Subsequently, they developed a load-coupled mechanical model to analyze the coupling mechanism between core parameters and the stress state of the jack-up platform lifting structure, clarifying the safety thresholds for each monitoring indicator and the graded criteria for lifting conditions. Finally, by integrating a dynamic correlation algorithm, they achieved real-time prediction and dynamic control of platform lifting safety. However, the coupling mechanism focuses only on the macro-level correlation between the core parameters of the boil-off gas compressor unit and the lifting condition, without delving into the micro-level influence mechanisms of parameter fluctuations on the local stress and strain of the platform lifting structure, thereby limiting improvements in detection accuracy. Long et al. first deployed distributed sensing units to synchronously acquire core operating parameters, including lifting displacement, real-time load, hinge point stress, and boom inclination [20]. Subsequently, they applied principal component analysis to perform feature selection on the high-dimensional acquisition data, eliminating redundant information and extracting a key feature set strongly correlated with the lifting condition. Finally, they introduced a classification algorithm integrating fuzzy comprehensive evaluation to quantify the degree of abnormality in the lifting condition based on key features, thereby achieving lifting state detection for jack-up platforms. However, the sequential computation involving synchronous multi-source data acquisition, feature dimensionality reduction, and fuzzy comprehensive evaluation entails a certain time delay, making it difficult to meet the millisecond-level real-time detection requirements under high-frequency dynamic lifting conditions.

3. Outlier Cleaning of Jack-Up Platform Lifting Data Using K-Means Clustering

During the operation of the lifting system on oil and gas jack-up platforms, sensor electromagnetic interference and marine environmental disturbances can generate anomalous data points [21]. The K-means clustering algorithm is selected for outlier removal due to its computational efficiency for streaming data, alignment with the convex cluster structure of lifting data, and the physical interpretability of its cluster centers [22].

3.1. Lifting Data Initialization

Let the lifting dataset of the lifting system be X, expressed as

X = \{x_{1}, x_{2}, \dots, x_{n}\}

(1)

where x_i represents the lifting state feature vector of the i-th sample, and n represents the total number of samples.

Additionally, the number of clusters is set to k = 3, with clusters 1, 2, and 3 corresponding to low, normal, and high lifting data clusters, respectively.

3.2. Initial Cluster Center Selection

Three data points are randomly selected from X as the initial cluster centers:

C = \{c_{1}, c_{2}, c_{3}\}

3.3. Sample Cluster Assignment

The distance from each data point x_i to each cluster center is calculated, expressed as

d(x_{i}, c_{j}) = \| x_{i} - c_{j} \|

(2)

where d represents the distance from the data point to each cluster center.

x_i is assigned to the cluster S_j corresponding to the nearest cluster center.

3.4. Cluster Center Update

The mean vector of the data points within each cluster is used as the new cluster center [23]. The assignment step (Section 3.3) and the update step (Section 3.4) are repeated until the cluster centers stabilize, yielding the final clusters S', expressed as

S^{'} = \{S_{1}^{'}, S_{2}^{'}, S_{3}^{'}\}

(3)
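The initialization, assignment, and update steps of Sections 3.2 to 3.4 can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the Euclidean distance, the function names `nearest_center` and `kmeans`, the random seed, and the synthetic data are assumptions, while k = 3, the iteration cap of 300, and the tolerance of 1 × 10⁻⁴ follow the settings reported later in the paper.

```python
import numpy as np

def nearest_center(X, centers):
    """Index of the closest center for every sample (Euclidean distance assumed)."""
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(X, k=3, max_iter=300, tol=1e-4, seed=0):
    """Random initial centers, then alternate assignment and mean update."""
    rng = np.random.default_rng(seed)
    # Section 3.2: k distinct samples drawn at random as initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Section 3.3: assign each sample to its nearest center.
        labels = nearest_center(X, centers)
        # Section 3.4: new center = mean of the assigned samples
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:  # centers have stabilized
            break
    return centers, nearest_center(X, centers)

# Hypothetical data: three blobs standing in for low, normal, and high clusters.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in (0.0, 5.0, 10.0)])
centers, labels = kmeans(X)
print(centers.shape, np.bincount(labels, minlength=3))
```

Because the initial centers are random, a single run can settle in a local optimum, which is one reason library implementations usually restart from several initializations.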

3.5. Outlier Cleaning

The cluster center distance matrix D is calculated from the final cluster centers c_{1}^{'}, c_{2}^{'}, and c_{3}^{'}, expressed as

D = {[d_{i j}]}_{3 \times 3}, \quad d_{i j} = \| c_{i}^{'} - c_{j}^{'} \|

(4)

The outlier factor is generated based on the distance matrix D, expressed as

δ_{i} = \frac{\sum_{j = 1}^{3} d_{i j} \times \mathrm{cou}(S_{j}^{'})}{n}

(5)

where δ_{i} represents the outlier factor of the i-th cluster, d_{i j} represents the element of D in row i and column j, \mathrm{cou}(S_{j}^{'}) represents the number of samples within the j-th cluster S_{j}^{'}, and n represents the total number of samples in the entire lifting dataset.

With the k outlier factors sorted in ascending order, the median value δ_{mid} and the standard deviation δ_{std} of the outlier factor δ_{i} are expressed as follows:

\begin{array}{l} δ_{mid} = \begin{cases} δ_{\frac{k + 1}{2}}, & k \text{ odd} \\ \frac{δ_{\frac{k}{2}} + δ_{\frac{k}{2} + 1}}{2}, & k \text{ even} \end{cases} \\ δ_{std} = \sqrt{\frac{\sum_{i = 1}^{k} {(δ_{i} - δ_{mid})}^{2}}{k + 1}} \end{array}

(6)

Let the outlier threshold for the oil and gas jack-up platform lifting data be denoted as ϕ, expressed as

ϕ = δ_{mid} + 1.5 δ_{std}

(7)

When the outlier factor δ_{i} of a cluster is greater than ϕ, that cluster is considered to be an outlier cluster, and the data points within it are abnormal points [24]. Removing these points completes the cleaning of the oil and gas jack-up platform lifting data, resulting in a clean lifting state dataset X'.
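The cleaning rule of Eqs. (4) to (7) can be traced with a small numerical sketch. It is illustrative only: the function name `outlier_clusters`, the Euclidean distance between centers, and the one-dimensional centers and cluster sizes are assumptions, and the spread is computed about the median with the k + 1 denominator exactly as Eq. (6) is printed.

```python
import numpy as np

def outlier_clusters(centers, counts, scale=1.5):
    """Return outlier factors, the threshold, and a per-cluster outlier flag."""
    centers = np.asarray(centers, dtype=float)
    counts = np.asarray(counts, dtype=float)
    k, n = len(centers), counts.sum()
    # Eq. (4): pairwise distances between the final cluster centers.
    D = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    # Eq. (5): size-weighted mean distance from center i to all centers.
    delta = D @ counts / n
    # Eq. (6): median and spread about the median (k + 1 denominator as printed).
    delta_mid = np.median(delta)
    delta_std = np.sqrt(np.sum((delta - delta_mid) ** 2) / (k + 1))
    # Eq. (7): threshold; clusters above it are treated as outlier clusters.
    phi = delta_mid + scale * delta_std
    return delta, phi, delta > phi

# Hypothetical example: two large nearby clusters and one small distant cluster.
centers = [[0.0], [1.0], [10.0]]
counts = [45, 50, 5]
delta, phi, flagged = outlier_clusters(centers, counts)
print(delta, phi, flagged)
```

In this example the outlier factors are 1.0, 0.9, and 9.0, so only the small distant cluster exceeds the threshold and would be removed.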

4. Lifting Data State Detection Based on Improved Random Forest

The random forest algorithm is a machine learning algorithm that can be used to assess the reliability of equipment detection results, and its core lies in the construction of decision trees. The number of decision trees is directly related to model performance and robustness. The grid search algorithm is used to adjust the number and depth of decision trees, thereby improving the random forest algorithm and enhancing its performance. Through grid search, the optimal parameter combination for the random forest algorithm is obtained. Principal component analysis is selected to extract features, thereby reducing dimensionality and improving model efficiency. Mean centering is performed on the dataset X', expressed as

X^{'} = x - μ

(8)

where x represents the feature matrix, and μ represents the feature mean vector.

After removing anomalous data from the lifting dataset through outlier detection, the improved random forest algorithm is used to perform preliminary state classification on X'.

Considering the high dimensionality of jack-up platform lifting data [25], principal component analysis (PCA) is incorporated into the improved random forest algorithm to screen the lifting data features, thereby reducing dimensionality and improving the efficiency of lifting state detection [26]. Suppose the lifting dataset X' contains N samples and M features. Mean centering is performed on all features so that the center of the data is located at the origin. The expression for the centered lifting data X″ is

X^{″} = x - μ

(9)

where x represents the matrix corresponding to the data X', and μ represents the mean vector of the lifting data features.

The covariance matrix of X″ is expressed as

Q = \frac{1}{N} {X^{″}}^{T} X^{″}

(10)

where Q represents the covariance matrix, and {X^{″}}^{T} represents the transpose of X″.

The magnitude of the eigenvalues of the lifting data directly indicates the importance of the corresponding principal components [27]. Based on this, eigendecomposition is performed on the target matrix Q to obtain its eigenvalues and eigenvectors, expressed as

Q v_{i} = λ_{i} \cdot v_{i}

(11)

where λ_{i} represents the eigenvalue, and v_{i} represents the eigenvector.

The principal components are ranked by eigenvalue λ_{i}, and the leading components whose cumulative contribution rate reaches at least 90% are selected. The data are then projected into the new low-dimensional space, and the resulting data matrix is expressed as

Y = X^{″} V

(12)

where Y represents the reduced-dimensional lifting data matrix, and V represents the eigenvector matrix of the selected principal components, whose cumulative contribution rate is greater than or equal to 90%.
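Equations (9) to (12) translate almost line by line into NumPy. The sketch below is an illustration under assumptions, not the paper's implementation: the function name `pca_reduce` and the synthetic six-feature data are hypothetical, and `numpy.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def pca_reduce(X, threshold=0.90):
    """Project X onto the leading components reaching the cumulative threshold."""
    Xc = X - X.mean(axis=0)                    # Eq. (9): mean centering
    Q = Xc.T @ Xc / len(Xc)                    # Eq. (10): covariance with 1/N
    eigvals, eigvecs = np.linalg.eigh(Q)       # Eq. (11): eigenpairs, ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()   # cumulative contribution rate
    # Smallest number of components whose cumulative rate is >= threshold.
    m = int(np.searchsorted(cum, threshold)) + 1
    V = eigvecs[:, :m]
    return Xc @ V, V, cum[:m]                  # Eq. (12): Y = X'' V

# Hypothetical data: six features with very unequal variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) * np.array([10.0, 5.0, 1.0, 0.5, 0.1, 0.1])
Y, V, cum = pca_reduce(X)
print(Y.shape, cum)
```

The returned cumulative rates make it easy to see how many components a different threshold, such as the 70% and 99% values discussed in this section, would retain.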

When using PCA for dimensionality reduction, the reason for selecting a cumulative contribution rate of 90% as the threshold is as follows: In the context of lifting state detection for oil and gas jack-up platforms, a balance needs to be struck between information retention and computational efficiency. A cumulative contribution rate of 90% is a common benchmark in engineering applications that balances information integrity and dimensionality compression. This threshold retains the vast majority of valid information from the original data while reducing the dimensionality from 56 features to 12 features, achieving a feature compression rate of 78.6% and significantly reducing the computational overhead of the subsequent random forest model.
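As a quick check of the compression figure quoted above, the rate follows directly from the feature counts before and after reduction:

```latex
\text{compression rate} = \frac{56 - 12}{56} = \frac{44}{56} \approx 0.786 = 78.6\%
```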

The influence of this threshold on the sensitivity of Gini impurity calculation is analyzed as follows: In the Gini impurity formula (Eq. (13)), p(i) represents the proportion of the i-th class of states in the dataset. The 12 principal components retained after dimensionality reduction preserve 92.7% of the variance information from the original features, thereby maintaining the distribution boundaries of different state categories in the low-dimensional space. Gini impurity is sensitive to the purity of the class distribution in the feature space. If the cumulative contribution rate threshold is set too low, e.g., 70%, some feature information strongly correlated with the lifting state will be lost, causing different state categories to overlap in the reduced-dimensional space. As a result, the Gini impurity values cannot effectively distinguish between categories during node splitting, leading to degraded decision tree classification performance. If the cumulative contribution rate threshold is set too high, e.g., 99%, too many features (close to 35 dimensions) are retained. Although information is fully preserved, feature redundancy persists, and noise features interfere with the optimal selection of split nodes during Gini impurity calculation, causing decision tree overfitting. Therefore, selecting the 90% threshold achieves a balance between information retention and dimensionality reduction, allowing the Gini impurity to maintain high sensitivity to differences in state categories during node splitting.

The core of the improved random forest is the ensemble of multiple decision trees. Each decision tree selects m samples from the training set Y through bootstrap sampling, while also randomly selecting n features to form a training subset for a single decision tree. Splitting features are selected based on the Gini impurity criterion to construct hierarchical decision trees. The Gini impurity reflects the purity of the dataset Y: the smaller the value, the purer the data and the easier it is to classify [28,29,30]. Its expression is

Gini(Y) = 1 - \sum_{i = 1}^{l} {[p(i)]}^{2}

(13)

where Gini represents the Gini impurity, l represents the number of lifting state categories, and p(i) represents the proportion of the i-th category in the dataset Y.
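Equation (13) can be evaluated in a few lines. The sketch below is illustrative: the function name `gini` is hypothetical, and the example labels simply reuse the normal-to-abnormal proportion of the test dataset (92,400 to 36,200) scaled down by a factor of 100.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a label collection, Eq. (13): 1 - sum of squared proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure = ["normal"] * 4                              # a single class
balanced = ["normal", "abnormal"]                  # two classes, equal shares
dataset_like = ["normal"] * 924 + ["abnormal"] * 362

print(gini(pure), gini(balanced), round(gini(dataset_like), 4))
```

A pure node scores 0 and a balanced two-class node scores 0.5, so a split is useful to the extent that it moves the child nodes toward 0.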

The grid search algorithm was used to optimize the hyperparameters of the random forest model, with the optimization objective being to maximize the average F1-score under 5-fold cross-validation. The optimized parameters were the number of decision trees, the maximum depth, the minimum number of samples required to split an internal node, the minimum number of samples at a leaf node, and the number of randomly selected features. The search range for the number of decision trees was 50 to 250, with a step size of 25, resulting in nine candidate values: 50, 75, 100, 125, 150, 175, 200, 225, and 250. The search range for the maximum depth was 5 to 50 with a step size of 5, resulting in ten candidate values: 5, 10, 15, 20, 25, 30, 35, 40, 45, and 50. The search range for the minimum number of samples required to split an internal node was 2 to 10 with a step size of 2, resulting in five candidate values: 2, 4, 6, 8, and 10. The search range for the minimum number of samples at a leaf node was 1 to 5 with a step size of 1, resulting in five candidate values: 1, 2, 3, 4, and 5. The search range for the number of randomly selected features was set around the square root of the total number of features. After PCA dimensionality reduction, the total number of features was 12, with a square root of approximately 3.5. Thus, the search range was 2 to 6 with a step size of 1, resulting in five candidate values: 2, 3, 4, 5, and 6. The total number of grid search combinations was 9 × 10 × 5 × 5 × 5 = 11,250. Each configuration was trained under 5-fold cross-validation, yielding five models per configuration and a total of 56,250 trained random forest models. The grid search results showed that the optimal parameter combination was 100 decision trees, a maximum depth of 20, a minimum of 2 samples required to split an internal node, a minimum of 1 sample at a leaf node, and 4 randomly selected features. Under this combination, the average F1-score under 5-fold cross-validation was 0.976.
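The size of this search space can be reproduced from the stated ranges. In the sketch below, the dictionary keys are the scikit-learn `RandomForestClassifier` parameter names that most plausibly correspond to the five tuned quantities; that mapping is an assumption, since the paper names the quantities but not the keyword arguments.

```python
from math import prod

# Candidate values exactly as listed in the text.
param_grid = {
    "n_estimators": list(range(50, 251, 25)),      # 50, 75, ..., 250
    "max_depth": list(range(5, 51, 5)),            # 5, 10, ..., 50
    "min_samples_split": list(range(2, 11, 2)),    # 2, 4, 6, 8, 10
    "min_samples_leaf": list(range(1, 6)),         # 1, ..., 5
    "max_features": list(range(2, 7)),             # 2, ..., 6
}

n_configs = prod(len(values) for values in param_grid.values())
n_fits = n_configs * 5                             # 5-fold cross-validation

print({name: len(values) for name, values in param_grid.items()})
print(n_configs, n_fits)
```

A dictionary of this shape could be handed to scikit-learn's `GridSearchCV` with `cv=5` and an F1-based `scoring` option. The exact averaging variant of the F1-score is not stated in the paper, so that choice would remain an assumption.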

The feature with the smallest Gini impurity is selected as the root node to partition the dataset. This process is repeated on the resulting subsets, recursively splitting nodes until the samples in a subset belong to the same category or the preset depth is reached, thereby completing the construction of a single decision tree. This process continues until the iteration termination condition is satisfied, resulting in the construction of multiple decision trees.
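One common way to realize this selection step is the CART-style search sketched below, which scores each candidate split by the size-weighted Gini impurity of the two child nodes. The paper does not spell out the threshold search, so the midpoint candidates, the function names, and the toy data are assumptions.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return (feature index, threshold, weighted child Gini) of the best binary split."""
    best = (None, None, np.inf)
    n = len(y)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both child nodes are always non-empty.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical toy data: feature 0 separates the two classes, feature 1 does not.
X = np.array([[1, 5], [2, 4], [3, 6], [10, 5], [11, 4], [12, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
j, t, score = best_split(X, y)
print(j, t, score)
```

Applying `best_split` recursively to the two resulting subsets, until a subset is pure or the depth limit is reached, yields a single tree of the kind described above.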

Each decision tree is considered as a classifier, and the reduced-dimensional random forest model is denoted as RF_{opt}, generating the preliminary classification result \hat{y} of the lifting state, expressed as

\hat{y} = RF_{opt} [Gini(Y)]

(14)

Based on the categorical state output by the random forest, fuzzy comprehensive evaluation is introduced to refine the results [31]. Taking the random forest output \hat{y} as the input, the evaluation factors are constructed, thereby improving the random forest algorithm.

Corresponding to the fault modes identified by the improved random forest, let the factor set for the j-th type of lifting state be P^{(j)}, expressed as

P^{(j)} = \{f_{1}^{(j)}, f_{2}^{(j)}, \dots, f_{i}^{(j)}\}

(15)

where f_{i}^{(j)} represents the i-th type of fault under the j-th type of lifting state.

A fuzzy evaluation matrix is constructed for the factor set P^{(j)}, and fuzzy scoring is performed on the abnormality degree of each state detection feature, i.e., the membership degrees corresponding to slight, moderate, and severe levels, yielding the single-factor evaluation vector r_{i j}:

r_{i j} = \{r_{i j 1}, r_{i j 2}, \dots, r_{i j k}\}

(16)

where k represents the number of evaluation grades.

All single-factor evaluation vectors are combined to generate the fuzzy evaluation matrix R_{i} for the i-th type of condition. The weight vector W_{i} for each condition detection feature is determined using the analytic hierarchy process, and the comprehensive evaluation result B_{i} for the i-th type of condition is obtained through fuzzy synthesis, expressed as

B_{i} = W_{i} \circ R_{i}

(17)

where \circ represents the fuzzy operator.
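The weight vector can be derived from a pairwise comparison matrix with the eigenvector form of the analytic hierarchy process, as sketched below. The pairwise judgments are hypothetical, the function name `ahp_weights` is an assumption, and the random index values are Saaty's commonly tabulated ones; the 0.1 consistency ratio limit is the threshold the paper reports in its hyperparameter settings.

```python
import numpy as np

# Saaty's random consistency index for comparison matrices of order 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(A):
    """Return (weights, consistency ratio) for a pairwise comparison matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    idx = np.argmax(eigvals.real)               # principal eigenvalue
    lam_max = eigvals[idx].real
    w = np.abs(eigvecs[:, idx].real)
    w = w / w.sum()                             # normalize weights to sum to 1
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0
    return w, cr

# Hypothetical judgments: feature 1 is twice as important as feature 2
# and four times as important as feature 3 (a perfectly consistent matrix).
A = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
w, cr = ahp_weights(A)
print(np.round(w, 4), round(cr, 4), cr < 0.1)
```

If the consistency ratio exceeds 0.1, the usual practice is to revise the pairwise judgments before using the weights.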

Under the principle of maximum membership degree, the specific abnormality level corresponding to the lifting state is determined based on the comprehensive evaluation result B_{i}. Finally, combined with the initial classification from the random forest, the lifting state detection result is obtained as

{\hat{y}}_{irf} = RF_{opt} [Gini(Y)] \oplus \arg (B_{i})

(18)

where {\hat{y}}_{irf} represents the lifting state detection result, \oplus represents the fusion operator of the state category and the abnormality level, and \arg represents the abnormality level corresponding to the maximum membership degree in the fuzzy comprehensive evaluation result.
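Equations (17) and (18) can be illustrated with a small numerical sketch. The weighted-average operator matches the setting reported later in the paper; the weights, the membership matrix, the example fault label, and the choice to represent the fusion operator as a simple (category, level) pair are hypothetical.

```python
import numpy as np

LEVELS = ("slight", "moderate", "severe")

def fuzzy_evaluate(W, R):
    """Eq. (17) with the weighted-average operator, then the maximum-membership rule."""
    W = np.asarray(W, dtype=float)
    R = np.asarray(R, dtype=float)
    B = W @ R                                   # comprehensive evaluation vector
    return B, LEVELS[int(np.argmax(B))]         # level with the largest membership

# Hypothetical weights for three detection features (they sum to 1).
W = [0.5, 0.3, 0.2]
# Hypothetical membership degrees; rows are features, columns are levels.
R = [[0.1, 0.3, 0.6],
     [0.2, 0.5, 0.3],
     [0.6, 0.3, 0.1]]

B, level = fuzzy_evaluate(W, R)

# Eq. (18): pair the preliminary random forest label with the abnormality level.
rf_label = "abnormal gear meshing"              # stand-in for the forest output
result = (rf_label, level)
print(np.round(B, 2), result)
```

Here B works out to (0.23, 0.36, 0.41), so the maximum-membership rule assigns the severe level.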

The overall algorithmic process of the proposed method is shown in Figure 1, which mainly consists of four stages: K-means outlier cleaning in the data preprocessing stage, principal component analysis in the feature dimensionality reduction stage, improved random forest in the preliminary classification stage, and fuzzy comprehensive evaluation in the result refinement stage.

The variable descriptions for each stage of the improved random forest-based method for lifting state detection are shown in Table 1.

5. Test Results and Analysis

To verify the overall effectiveness of the proposed improved random forest-based method for the lifting state detection of jack-up platforms, relevant tests were conducted. The results of the proposed method were compared with those of the jack-up platform ultimate bearing capacity state method and the state detection method after soil scouring around the jack-up platform spudcan, to provide a detailed illustration of the proposed method’s capability in lifting state detection of oil and gas platforms [32,33,34,35]. The three methods compared in this section are described in detail in Section 2, along with their limitations.

The tests mainly focused on three aspects: the stability of lifting state time-series detection, the inter-class-to-intra-class distance ratio of clustering, and platform displacement under different incidence angles and motion responses. Based on these three indicators, the superiority of the proposed method is demonstrated in detail through a comparison among the three methods. The reasons for selecting these three aspects are as follows. The stability of lifting state time-series detection directly reflects the method’s ability to resist dynamic marine environmental disturbances during continuous operation, and is a core indicator for assessing whether the detection method is suitable for real-time operation and maintenance scenarios. If the detection results fluctuate significantly, it may lead to misjudgment by operation and maintenance personnel. The inter-class-to-intra-class distance ratio indirectly measures the effectiveness of the K-means clustering cleaning stage in distinguishing abnormal data from normal data. A larger value of this indicator indicates a clearer boundary between abnormal and normal clusters, which directly affects the input data quality for subsequent classification models. The detection of platform displacement under different incidence angles and motion responses verifies the method’s adaptability under dynamic wave loading. In actual marine environments, the platform continuously experiences wave forces of varying directions and intensities. If the detection method cannot adapt to the feature fluctuations caused by such dynamic responses, detection accuracy will drop sharply. These three aspects correspond to the data preprocessing stage, the classification detection stage, and the dynamic operating condition adaptability of the proposed method, respectively. 
They represent the most commonly involved comprehensive dimensions for evaluating the performance of lifting state detection methods, covering a complete evaluation chain from data quality to detection accuracy and then to environmental adaptability.

5.1. Test Environment

The test dataset employed real-time monitoring data from a jack-up platform during six consecutive months of operation, with a data acquisition frequency of 100 Hz, comprising a total of 128,600 valid sample records. Sample categories were divided according to the operating status of the lifting system, including 92,400 samples of normal lifting state and 36,200 samples of abnormal state. Abnormal state was further subdivided into four typical fault modes: abnormal gear meshing, uneven load distribution, lifting speed fluctuation, and precursors of braking system failure. The number of lifting units was 96, the rated lifting load per leg was 2450 kN, the maximum static support load per leg was 6460 kN, the rated lifting speed of the platform was 0.45 m/min, the rated lifting speed of the legs was 0.90 m/min, and the preload support load was 3450 kN per leg.

The software implementation framework is as follows:

Model training and testing were carried out using Python 3.9. The core machine learning algorithms were implemented based on the Scikit-learn 1.1.2 library. Specifically, K-means clustering was performed using the KMeans class, PCA dimensionality reduction using the PCA class, and the random forest using the RandomForestClassifier class, with the Gini impurity criterion as the default splitting criterion. The fuzzy comprehensive evaluation part was implemented with custom code, and the weight calculation of the analytic hierarchy process was based on the NumPy 1.21.5 library. Data preprocessing and cleaning were performed using the Pandas 1.4.3 library, numerical computations using the NumPy 1.21.5 library, and result visualization using the Matplotlib 3.5.1 library. The development environment was the PyCharm 2022.1.3 integrated development environment, the operating system was Windows 11, and the hardware configuration consisted of an Intel Core i7-12700K central processing unit with a main frequency of 3.6 GHz, 32 GB of RAM, and an NVIDIA GeForce RTX 3060 graphics processing unit for accelerating matrix operations. The test environment is shown in Figure 2.

The hyperparameter settings for each stage of the proposed method are as follows: In the K-means clustering stage, the number of clusters is set to 3, corresponding to the low, normal, and high lifting data clusters. The maximum number of iterations is set to 300, the convergence tolerance is set to 1 × 10⁻⁴, and the initial cluster centers are selected randomly. In the PCA dimensionality reduction stage, the number of retained principal components is determined by a cumulative contribution rate threshold, which is set to 90%. Eigendecomposition is performed using the covariance matrix method. In the improved random forest stage, the number of decision trees is set to 100, the maximum depth of each tree is set to 20, the minimum number of samples required to split an internal node is set to 2, the minimum number of samples at a leaf node is set to 1, and the number of randomly selected features is set to the square root of the total number of features. The Gini impurity is used as the splitting criterion, bootstrap sampling is adopted for resampling, and out-of-bag samples are used for internal validation. In the fuzzy comprehensive evaluation stage, the factor set is constructed separately for each of the four fault modes. The number of evaluation grades is set to 3, corresponding to slight abnormality, moderate abnormality, and severe abnormality. The membership function adopts a trapezoidal distribution, the weight vector is determined by the analytic hierarchy process (AHP) with a consistency ratio threshold of 0.1, and the fuzzy synthesis operator adopts the weighted average type.
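For readers who want to reproduce these settings, the following sketch shows how they might map onto the scikit-learn classes named above. It is an assumed mapping, not the authors' script: `n_init`, `random_state`, and `svd_solver="full"` are additions made here, scikit-learn's PCA works through singular value decomposition rather than an explicit covariance eigendecomposition, and the synthetic data only checks that the objects fit.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# K-means stage: 3 clusters, random initial centers, 300 iterations, tolerance 1e-4.
kmeans = KMeans(n_clusters=3, init="random", n_init=10, max_iter=300,
                tol=1e-4, random_state=0)

# PCA stage: keep enough components to explain at least 90% of the variance.
pca = PCA(n_components=0.90, svd_solver="full")

# Random forest stage: settings as listed in the text, with out-of-bag validation.
forest = RandomForestClassifier(n_estimators=100, criterion="gini", max_depth=20,
                                min_samples_split=2, min_samples_leaf=1,
                                max_features="sqrt", bootstrap=True,
                                oob_score=True, random_state=0)

# Hypothetical two-class data with eight features, used only as a smoke test.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 150)
X = rng.normal(size=(300, 8)) + y[:, None] * 2.0

kmeans.fit(X)
Z = pca.fit_transform(X)
forest.fit(Z, y)
print(pca.n_components_, round(forest.oob_score_, 3))
```

On the real data, the outlier rule of Section 3.5 would be applied between the clustering and PCA steps, which this smoke test omits.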

The specific procedure for model training and validation is as follows: The dataset is divided into a training set and a test set using stratified random sampling at a ratio of 8:2, resulting in 102,880 samples for the training set and 25,720 samples for the test set. The training set is used to determine the cluster centers for K-means clustering, extract principal components for PCA, construct decision trees for the random forest, and learn the weights for fuzzy comprehensive evaluation. The test set is used to evaluate the generalization performance of the models at each stage. To mitigate the impact of randomness in data partitioning on the evaluation results, a 5-fold cross-validation method is adopted for internal validation of the random forest model. Specifically, the training set is evenly divided into five subsets; four subsets are used for model training and one subset for validation, and this process is repeated five times. The average performance across the five folds serves as the basis for hyperparameter selection of the random forest. The test set remains completely unseen throughout the entire model training process and is used only once during the final evaluation stage to ensure the objectivity of the evaluation results on unseen data.
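The split sizes quoted above can be checked with a short NumPy sketch. It stratifies only on the normal/abnormal label, which is a simplification because the per-fault-mode counts are not given; the function name `stratified_split`, the seeds, and the use of plain (non-stratified) folds are likewise assumptions.

```python
import numpy as np

def stratified_split(y, test_ratio=0.2, seed=0):
    """Hold out the same fraction of every class; return (train, test) index arrays."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(len(idx) * test_ratio))
        test_idx.append(idx[:n_test])
        train_idx.append(idx[n_test:])
    return np.concatenate(train_idx), np.concatenate(test_idx)

# Labels with the class counts of the test dataset: 0 = normal, 1 = abnormal.
y = np.array([0] * 92_400 + [1] * 36_200)
train_idx, test_idx = stratified_split(y)

# Five equal folds of the training indices for cross-validation.
folds = np.array_split(np.random.default_rng(1).permutation(train_idx), 5)

print(len(train_idx), len(test_idx), [len(f) for f in folds])
```

Each fold holds 20,576 training samples, and the held-out test indices are never touched during fold construction.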

5.2. Verification of the Practical Application Effectiveness of the Proposed Method

To further validate the detection performance of the proposed method under real engineering scenarios, evaluation metrics are established for each stage to quantitatively examine the practical application effectiveness of the proposed method. The silhouette coefficient and outlier detection recall are adopted as evaluation metrics for the K-means clustering cleaning stage. The silhouette coefficient ranges from −1 to 1; a value closer to 1 indicates tighter intra-cluster samples and clearer inter-cluster separation. Outlier detection recall measures the proportion of correctly removed outliers among all actual outliers. Among the 128,600 samples in the full dataset, 3858 were manually annotated as outliers. After K-means clustering cleaning, a total of 3721 samples were identified as outliers and removed, of which 3689 were true outliers and 32 were normal samples that were incorrectly removed. The calculated silhouette coefficient was 0.87, and the outlier detection recall was 95.6% (3689 of 3858), indicating that the clustering cleaning stage can effectively separate normal data from abnormal data. The evaluation results of the outlier cleaning effect of K-means clustering are shown in Table 2.
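The recall figure follows directly from the reported counts, as the short check below shows. The precision of the removal step and the number of missed outliers are derived here from the same counts and are not stated in the text.

```python
# Counts reported in the text for the K-means cleaning stage.
annotated_outliers = 3858   # manually annotated outliers
removed_total = 3721        # samples removed by the cleaning step
removed_true = 3689         # removed samples that were true outliers

removed_false = removed_total - removed_true   # normal samples removed by mistake
recall = removed_true / annotated_outliers     # share of true outliers removed
precision = removed_true / removed_total       # share of removals that were correct (derived)
missed = annotated_outliers - removed_true     # true outliers left in the data (derived)

print(f"recall = {recall:.1%}, precision = {precision:.1%}, missed = {missed}")
# prints: recall = 95.6%, precision = 99.1%, missed = 169
```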

The cumulative variance contribution rate and the information retention rate were adopted as evaluation metrics for the PCA dimensionality reduction stage. The original lifting data contained 56 feature dimensions. After PCA dimensionality reduction, principal components with a cumulative eigenvalue contribution rate greater than or equal to 90% were selected, resulting in 12 retained principal components. The cumulative variance contribution rate was 92.7%, and the information retention rate was 91.8%, indicating that the dimensionality reduction preserved the vast majority of valid information from the original data while compressing the feature dimensions by 78.6%, significantly reducing the computational complexity of the subsequent random forest model. The evaluation results of the PCA dimensionality reduction effect are shown in Table 3.
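Component selection by a cumulative contribution threshold, using the covariance matrix method, can be sketched as below. The function name `select_components` and the synthetic data are hypothetical; only the 90% threshold and the 56-to-12 reduction come from the text.

```python
import numpy as np

def select_components(X, threshold=0.90):
    """Return (k, cumulative ratio, projected data) for the smallest k reaching the threshold."""
    Xc = X - X.mean(axis=0)                     # center each feature
    cov = np.cov(Xc, rowvar=False)              # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First position where the cumulative contribution reaches the threshold.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    return k, float(cumulative[k - 1]), Xc @ eigvecs[:, :k]

# Synthetic data: 56 correlated features driven by 5 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
X = latent @ rng.normal(size=(5, 56)) + 0.05 * rng.normal(size=(500, 56))

k, ratio, Z = select_components(X, threshold=0.90)

# Compression reported in the text: 56 features reduced to 12 components.
compression = 1 - 12 / 56
print(f"{compression:.1%}")   # prints: 78.6%
```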

The confusion matrix, precision, recall, and F1-score were adopted as evaluation metrics for the preliminary classification stage of the improved random forest. The tests employed 5-fold cross-validation, with the dataset split into training and test sets at a ratio of 8:2, resulting in a test set size of 25,720 samples. To verify the superiority of the proposed improved random forest algorithm, the decision tree and XGBoost were selected as comparison algorithms, and lifting state classification experiments were conducted under the same training/test set split. The classification confusion matrices of the three methods on the test set are shown in Table 4, Table 5 and Table 6, respectively.

Based on the confusion matrices of the three methods, the precision, recall, and F1-score for each class and overall were calculated, and the results are shown in Table 7. It can be observed that the proposed improved random forest significantly outperforms the decision tree and XGBoost in all metrics. Its accuracy is 13.1% higher than that of the decision tree and 5.4% higher than that of XGBoost, and its F1-score is 0.134 higher than that of the decision tree and 0.056 higher than that of XGBoost, thereby verifying the superiority of the proposed method in the lifting state classification task.
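For reference, per-class precision, recall, and F1-score can be derived from a confusion matrix as sketched below. The 3×3 matrix is made up for illustration and is not taken from Tables 4 to 6; rows are assumed to be true classes and columns predicted classes.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, F1 per class and overall accuracy (rows = true, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                       # correctly classified samples per class
    precision = tp / cm.sum(axis=0)        # column sums = samples predicted as each class
    recall = tp / cm.sum(axis=1)           # row sums = samples truly in each class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy

# Hypothetical confusion matrix for three states.
cm = [[50, 2, 3],
      [4, 45, 1],
      [6, 0, 39]]
precision, recall, f1, accuracy = per_class_metrics(cm)
macro_f1 = f1.mean()                       # unweighted average over classes
```

The sketch assumes every class is predicted at least once; otherwise the precision denominator is zero and needs separate handling.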

For the platform displacement detection task under different incidence angles and motion responses, the displacement detection results have continuous numerical attributes; therefore, the root mean square error (RMSE) and the coefficient of determination (R²) are adopted as evaluation metrics. The comparison of RMSE and R² among the three tree-based methods on the displacement detection task is shown in Table 8. It can be observed that the RMSE of the proposed improved random forest algorithm is 0.008 m, which is significantly lower than those of the decision tree (0.042 m) and XGBoost (0.021 m), and its R² reaches 0.978, also outperforming the two comparison algorithms. This indicates that the proposed method achieves higher accuracy and better fitting capability in continuous numerical prediction tasks as well.
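The two regression metrics can be computed as in the following sketch. The displacement values are invented for illustration and do not come from Table 8.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the displacement."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical actual and detected displacements (m).
actual = [1.0, 2.0, 3.0, 4.0]
detected = [1.1, 1.9, 3.2, 3.8]
print(round(rmse(actual, detected), 4), round(r2(actual, detected), 4))
```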

For the refinement stage of fuzzy comprehensive evaluation, the grade classification accuracy and the mean absolute error (MAE) are adopted as evaluation metrics. The grade classification accuracy measures the consistency between the abnormality grades output by fuzzy comprehensive evaluation and the manually annotated true grades, while the mean absolute error measures the average deviation between the predicted grades and the true grades. The grade refinement results of fuzzy comprehensive evaluation on the 36,200 abnormal samples are shown in Table 9. It can be seen that the average grade classification accuracy reaches 89.7%, and the mean absolute error is only 0.12. This indicates that fuzzy comprehensive evaluation can effectively distinguish different abnormality degrees within the same fault mode, providing a reliable basis for graded early warning of the lifting system.
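A minimal sketch of the refinement step and its two metrics is given below, assuming trapezoidal membership functions and the weighted-average synthesis operator named in the hyperparameter settings. The grade breakpoints, factor scores, weights, and grade labels are all hypothetical; in the paper the weights come from AHP and the factor sets are built per fault mode.

```python
import numpy as np

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear on the two slopes."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Hypothetical grade definitions on a normalized abnormality score in [0, 1].
GRADES = {
    "slight":   (-0.1, 0.0, 0.3, 0.5),
    "moderate": (0.3, 0.5, 0.6, 0.8),
    "severe":   (0.6, 0.8, 1.0, 1.1),
}

def evaluate(scores, weights):
    """Weighted-average synthesis B = W . R, then pick the grade with the largest membership."""
    R = np.array([[trapezoid(s, *p) for p in GRADES.values()] for s in scores])
    B = np.asarray(weights, dtype=float) @ R
    return B, int(np.argmax(B))

scores = [0.2, 0.45, 0.7]     # hypothetical factor scores
weights = [0.5, 0.3, 0.2]     # hypothetical weight vector summing to 1
membership, grade = evaluate(scores, weights)

# Grade classification accuracy and MAE on hypothetical grade labels (0, 1, 2).
true_grades = np.array([0, 1, 2, 1])
pred_grades = np.array([0, 1, 1, 1])
grade_accuracy = float(np.mean(pred_grades == true_grades))
grade_mae = float(np.mean(np.abs(pred_grades - true_grades)))
```

With these inputs the synthesized membership vector is approximately (0.575, 0.325, 0.1), so the sample is assigned the slight grade.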

The Sobol global sensitivity analysis method was employed to evaluate the influence of each input feature on the model output. The Sobol method decomposes the variance in the model output into contributions from each input parameter and their interactions, quantifying parameter sensitivity by calculating the first-order sensitivity index and the total-order sensitivity index. Five key parameters were selected for analysis: the number of clusters in K-means, the cumulative contribution rate threshold in PCA, the number of decision trees in the random forest, the maximum depth of the random forest, and the number of grades in fuzzy comprehensive evaluation. Each parameter was sampled 5000 times within a reasonable range of values. The first-order sensitivity index reflects the individual effect of a single parameter on the model output, while the total-order sensitivity index captures the overall effect including interactions. The results of the Sobol global sensitivity analysis are shown in Table 10.

It can be observed that the number of decision trees in the random forest has a first-order sensitivity index of 0.412 and a total-order sensitivity index of 0.487, the highest among all parameters, indicating that the number of decision trees has the greatest impact on the detection accuracy of the model. The PCA cumulative contribution rate threshold ranks second, with a first-order sensitivity index of 0.278 and a total-order sensitivity index of 0.315. The number of clusters in K-means ranks third, with a first-order sensitivity index of 0.156 and a total-order sensitivity index of 0.189. The maximum depth of the random forest has a first-order sensitivity index of 0.098 and a total-order sensitivity index of 0.124. The number of grades in fuzzy comprehensive evaluation exhibits the lowest sensitivity, with a first-order sensitivity index of 0.032 and a total-order sensitivity index of 0.045. When the number of decision trees increases from 50 to 150, the detection accuracy first improves from 91.2% to 97.8% and then levels off. When the PCA cumulative contribution rate threshold increases from 80% to 95%, the detection accuracy first increases from 92.5% to 97.6% and then decreases slightly. Therefore, it is recommended to set the number of decision trees in the range of 100 to 120 and the PCA cumulative contribution rate threshold in the range of 90% to 92%. Under this parameter combination, the model achieves optimal performance with low sensitivity to parameter fluctuations.
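To make the two indices concrete, the sketch below estimates first-order and total-order Sobol indices with a standard sampling scheme (the Saltelli first-order estimator and the Jansen total-order estimator). The toy linear model stands in for the detection pipeline, whose parameter ranges are not given in the text; the function names and the seed are assumptions of this example.

```python
import numpy as np

def sobol_indices(model, n_params, n_samples, rng):
    """Estimate first-order and total-order Sobol indices on the unit hypercube."""
    A = rng.random((n_samples, n_params))
    B = rng.random((n_samples, n_params))
    f_A, f_B = model(A), model(B)
    pooled = np.concatenate([f_A, f_B])
    mean, var = pooled.mean(), pooled.var()
    first = np.empty(n_params)
    total = np.empty(n_params)
    for i in range(n_params):
        AB = A.copy()
        AB[:, i] = B[:, i]                   # A with column i taken from B
        f_AB = model(AB)
        # First-order index: variance explained by parameter i alone.
        first[i] = np.mean((f_B - mean) * (f_AB - f_A)) / var
        # Total-order index: parameter i including all its interactions.
        total[i] = 0.5 * np.mean((f_A - f_AB) ** 2) / var
    return first, total

def toy_model(X):
    """Stand-in for the pipeline: the third parameter has no effect on the output."""
    return X[:, 0] + 2.0 * X[:, 1] + 0.0 * X[:, 2]

first, total = sobol_indices(toy_model, n_params=3, n_samples=5000,
                             rng=np.random.default_rng(0))
```

For this additive toy model the analytical indices are 0.2, 0.8, and 0, and first-order and total-order values coincide because there are no interactions. A gap between the two, as reported in Table 10, indicates interaction effects.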

5.3. Stability Analysis of Improved Random Forest Detection

In the practical deployment of lifting state detection for oil and gas jack-up platforms, the stability of time-series lifting state detection can verify whether the method is suitable for complex marine operating conditions. During the operation of the lifting system, lifting data must be continuously collected. Given the complexity of the marine environment, significant fluctuations in detection results can easily lead to misjudgment by operation and maintenance personnel, directly limiting the applicability of the detection method in real-time operation and maintenance scenarios. To verify the stability of the proposed method, the Time-series Detection Stability (TDS) was adopted as the core evaluation metric under the above test environment. This metric quantifies the fluctuation degree of lifting state detection results in continuous time-series data. A TDS value closer to 1 indicates more stable detection results, demonstrating the stronger capability of the proposed improved random forest algorithm to resist marine environmental disturbances. Its calculation formula is

\mathrm{TDS} = 1 - \frac{\sum_{t=2}^{T} |\hat{y}_{t} - \hat{y}_{t-1}|}{T - 1}

(19)

where T represents the time-series length, and \hat{y}_{t} represents the lifting state detection result at time t.
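Equation (19) can be evaluated directly on a sequence of detection results, as in this sketch. It assumes the results are numerically coded; with a 0/1 coding the value lies between 0 and 1, whereas multi-valued codes with large jumps could push it below 0.

```python
import numpy as np

def tds(labels):
    """Time-series Detection Stability: 1 minus the mean absolute change between adjacent results."""
    y = np.asarray(labels, dtype=float)
    if y.size < 2:
        raise ValueError("at least two time steps are required")
    return float(1.0 - np.abs(np.diff(y)).sum() / (y.size - 1))

print(tds([1, 1, 1, 1]))      # no change between steps -> 1.0
print(tds([0, 0, 1, 1, 1]))   # one change in four transitions -> 0.75
print(tds([0, 1, 0, 1]))      # change at every step -> 0.0
```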

Sample data are randomly selected, and one hour of continuous time-series monitoring data of the lifting system is input to generate the TDS values for each method. The test results are shown in Figure 3.

According to the test results, the time-series detection stability of the improved random forest under the proposed method is significantly better than that of the other two comparison methods. Comparing the TDS value trends of the three methods, the TDS value of the improved random forest after applying the proposed method remains consistently above 0.92, while the TDS values of the other two methods fluctuate within the ranges 0.11–0.88 and 0.28–0.89, respectively. This indicates that the improved random forest can provide stable and reliable state judgment results for the real-time operation and maintenance of the lifting system.

The high TDS of the proposed improved random forest method, with no decreasing trend, can be attributed to four factors. First, the K-means clustering-based outlier cleaning removes abrupt anomalous points caused by marine environmental interference at the data source. Such anomalous points are the main cause of time-series jumps in the detection results of traditional methods. After the removal of outliers, the input data features at adjacent time steps remain continuous and stable, which leads to a higher TDS value. Second, the PCA dimensionality reduction step compresses the original 56-dimensional features into 12 principal components. This process acts in a manner similar to a low-pass filter: high-frequency noise features are discarded as low-contribution components, making the reduced-dimensional feature sequence smoother over time. Third, the ensemble voting mechanism of the random forest suppresses fluctuations in the outputs of individual decision trees. Even if some decision trees produce different classification results due to minor data fluctuations, the output label after majority voting usually remains unchanged, significantly reducing the probability of time-series jumps. Fourth, the abnormality grade output by fuzzy comprehensive evaluation is a discretized result derived from continuous membership degree calculation, so slight feature variations do not cause grade jumps. The combined effect of these four stages ensures that the TDS value of the proposed method remains consistently above 0.91 over a continuous 72 h test with no observed decay. This stability originates from the multi-level design of the method (outlier removal, noise filtering, ensemble smoothing, and fuzzy refinement) rather than from data overfitting or idealized results.

To further verify the stability of the proposed method over a longer time series, the test sample range was extended from 1 consecutive hour to 72 consecutive hours, i.e., the complete monitoring data of the platform over three consecutive days of operation, comprising a total of 25,920,000 sampling points. The statistical results of the TDS values after the extension are shown in Table 11.

It can be seen in Table 11 that the TDS value of the proposed method remains consistently between 0.91 and 0.94 over the 72 h continuous detection, with a mean value of 0.925 and a standard deviation of 0.008, showing no significant decay or severe fluctuation. Compared with the 1 h test results, the mean TDS of the proposed method decreased by only 0.005 over the 72 h test, while the standard deviation increased by only 0.002. In contrast, the mean TDS values of the two comparison methods decreased by 0.09 and 0.06, and their standard deviations increased by 0.031 and 0.025, respectively. The above results indicate that the proposed improved random forest method exhibits good long-term stability and can meet the continuous detection requirements under various marine environmental conditions.

It should be noted that the TDS metric measures the label consistency between consecutive time steps, i.e., whether the detection results at adjacent time instants undergo abrupt changes, rather than the detection accuracy. A high TDS value merely indicates that the detection results change smoothly over time and does not guarantee the correctness of the detection result at each time instant. An extreme case exists: a method that always outputs the same incorrect label achieves a TDS value of 1.0, while its detection accuracy may be very low. To address this issue, this paper adopts both the TDS metric and detection accuracy as two dimensions for evaluating the overall detection performance: TDS measures the temporal smoothness of the detection results, while detection accuracy measures the correctness of the results at each time instant. The proposed method achieves both a mean TDS of 0.925 and a detection accuracy of 97.8%, outperforming the comparison methods in both metrics, indicating that the method maintains temporal stability without sacrificing detection accuracy. If a method exhibits high TDS but low accuracy, it can be improved by introducing a sliding window voting mechanism, i.e., outputting the final state after majority voting over the detection results of multiple consecutive time steps. In future work, a voting strategy with a window length of 5 will be adopted to further suppress single-point misjudgments.
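The sliding window voting idea mentioned for future work could look like the following sketch. The trailing (causal) window and the function name `sliding_vote` are assumptions; the text only specifies majority voting with a window length of 5.

```python
from collections import Counter

def sliding_vote(labels, window=5):
    """Replace each result with the majority label over the most recent `window` results."""
    smoothed = []
    for t in range(len(labels)):
        recent = labels[max(0, t - window + 1): t + 1]   # shorter window at the start
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed

# A single-point misjudgment (the lone 1) is suppressed.
print(sliding_vote([0, 0, 0, 1, 0, 0, 0]))   # prints: [0, 0, 0, 0, 0, 0, 0]

# A genuine state change still comes through, with a delay of a few steps.
changed = sliding_vote([0, 0, 1, 1, 1, 1, 1])
```

The delay before a genuine change appears in the output is the price of the smoothing, which matters for real-time early warning.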

5.4. Outlier Removal Effect Based on Inter-Class-to-Intra-Class Distance Ratio

In the abnormal lifting data cleaning stage, the K-means clustering results directly affect the outlier removal capability. To further verify the detection capability of the proposed method, the inter-class-to-intra-class distance ratio was adopted as an indirect evaluation metric to measure the separation between clusters and the compactness within clusters. A larger value of this metric indicates more significant inter-class differences, more concentrated intra-class samples, and better differentiation between abnormal data and normal data. Its calculation formula is

\mathrm{IIDR} = \frac{d_{otr}}{D_{oet}}

(20)

where IIDR represents the inter-class-to-intra-class distance ratio, d_{otr} represents the average inter-cluster distance, and D_{oet} represents the average intra-cluster distance. Here, d_{otr} is the average distance between different cluster centers, and D_{oet} is the average distance from samples within each cluster to the cluster center. The maximum value of IIDR is typically 5, and a larger value indicates better clustering performance. Test results are shown in Figure 4.
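Equation (20) can be computed from cluster assignments as sketched below. The function name `iidr` and the toy points are hypothetical, and the intra-cluster distance is averaged over all samples pooled together, which is one reading of the definition (averaging per cluster first is another).

```python
import numpy as np

def iidr(X, labels):
    """Inter-class-to-intra-class distance ratio from samples and their cluster labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)                              # sorted cluster identifiers
    centers = np.array([X[labels == k].mean(axis=0) for k in ids])
    # Average distance over all pairs of distinct cluster centers.
    pair_dists = [np.linalg.norm(centers[i] - centers[j])
                  for i in range(len(ids)) for j in range(i + 1, len(ids))]
    d_otr = np.mean(pair_dists)
    # Average distance from each sample to the center of its own cluster.
    own_center = centers[np.searchsorted(ids, labels)]
    D_oet = np.linalg.norm(X - own_center, axis=1).mean()
    return float(d_otr / D_oet)

# Two tight clusters 10 units apart, each sample 1 unit from its center.
X = [[-1.0, 0.0], [1.0, 0.0], [9.0, 0.0], [11.0, 0.0]]
labels = [0, 0, 1, 1]
print(iidr(X, labels))   # prints: 10.0
```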

According to the test results, the IIDR value of clustering in the proposed method reaches 4.85, while the maximum IIDR values of the other two methods are 4.2 and 4.3, respectively. However, these two algorithms exhibit excessively low minimum values and significant overall fluctuations. Combined with the data distribution, it can be seen that the inter-cluster distance between abnormal clusters and normal clusters after clustering using the proposed method is 4.85 times the intra-cluster distance, which is much higher than that of the comparison methods, indicating a clearer boundary for distinguishing between abnormal and normal data. This demonstrates that the clustering method employed in the proposed approach effectively enhances inter-cluster separation and intra-cluster compactness of the lifting data, improves the accuracy of abnormal data cleaning, and provides more reliable data support for subsequent state detection.

5.5. Platform Displacement Under Different Incidence Angles and Motion Responses

Under wave loading, the platform exhibits motion responses such as surge, heave, and pitch. If the detection method cannot adapt to the feature fluctuations caused by such dynamic responses, detection accuracy will sharply decline. To validate the capability of the proposed method under these conditions, three types of motion responses (surge, heave, and pitch) were set in the test. The displacements obtained by the three methods under different incidence angles and motion responses were compared with the actual displacement, and the detection capabilities of the three methods were evaluated based on the results. The test results are shown in Table 12.

According to the test results, the displacements obtained by the proposed method are perfectly consistent with the actual results, whereas the other two methods both exhibit varying degrees of displacement deviation. This consistency is mainly attributed to two aspects of the design of the proposed method. First, the K-means clustering cleaning effectively removes abnormal data caused by marine environmental interference, significantly improving the quality of the training data input to the model and reducing detection deviations caused by anomalous samples. Second, based on the preliminary classification of the random forest, the fuzzy comprehensive evaluation refines the distinction of abnormality levels, so that the final detection result is not a simple discrete category output but a comprehensive judgment that integrates the state category and the degree of abnormality. Under different incidence angles and motion response conditions, the method can stably capture the correspondence between displacement features and the lifting state, thus demonstrating a high degree of consistency with the actual results. The agreement therefore reflects the detection accuracy of the proposed method under dynamic marine conditions rather than data overfitting or idealized assumptions, and it demonstrates the capability of the proposed method for platform lifting state detection.

5.6. Lifting State Detection Accuracy

The comparison of lifting state detection accuracy for oil and gas jack-up platforms among the three methods is shown in Figure 5. It can be seen that the detection accuracy of the proposed method remains above 95% across all experimental runs, with a stable curve, far exceeding that of the comparative methods. This indicates that the proposed improved random forest method has significant advantages in lifting state detection accuracy and exhibits good robustness.

The reasons for achieving such high detection accuracy and good robustness are as follows: first, the use of K-means clustering to remove outliers from the lifting data effectively eliminates anomalous samples caused by marine environmental interference and sensor noise, thereby improving the quality of the training data; second, the introduction of PCA into the random forest for feature dimensionality reduction eliminates redundancy among high-dimensional features, reduces the risk of overfitting during decision tree splitting, and optimizes the split nodes in combination with the Gini impurity criterion, making the model more sensitive to key state features; and, finally, the introduction of fuzzy comprehensive evaluation refines the preliminary classification results of the random forest into graded levels, overcoming the limitation of the traditional random forest that only outputs discrete categories, and enabling the distinction of different abnormality degrees within the same fault mode. Consequently, the proposed method maintains stable and accurate detection performance under time-series fluctuations and different motion response conditions.

6. Sensitivity Analysis

In practical applications, the proposed method faces challenges from harsh marine environments, mainly sensor data noise and missing data. The quantitative analysis results for sensitivity to data noise and to missing data are shown in Table 13.

It can be seen in Table 13 that the proposed method maintains good robustness under mildly harsh environments and low data missing rates, but its performance degrades significantly under severely harsh sea conditions. When the marine environment reaches a wind speed above Beaufort force 8 and a wave height above 4 m, the detection accuracy decreases from 97.8% to 87.2%, a reduction of 10.6 percentage points. When the data missing rate reaches 20%, the detection accuracy drops to 82.4%. If the missing sensors happen to be the top three key features by PCA contribution rate, the detection accuracy further decreases to 80.5%, a reduction of 17.3 percentage points. Future research should introduce a signal denoising preprocessing module and a multi-source sensor redundancy mechanism to enhance the anti-interference capability of the method.

7. Conclusions

In this study, we reach the following conclusions:

(1): This paper proposes an improved random forest-based method for the lifting state detection of oil and gas jack-up platforms to address the accuracy deficiency caused by data redundancy and low feature effectiveness. The method first applies K-means clustering to remove outliers from the lifting data, then introduces PCA into the random forest algorithm for feature dimensionality reduction and optimizes the decision tree splitting using the Gini impurity criterion to achieve preliminary state classification. Finally, fuzzy comprehensive evaluation is employed to refine the classification results into graded anomaly levels, producing the final detection output.
(2): The proposed method was compared with the ultimate bearing capacity state method and the post-scouring state detection method around the spudcan with regard to four aspects: time-series detection stability, the inter-class-to-intra-class distance ratio of clustering, platform displacement under different incidence angles and motion responses, and detection accuracy. Quantitative results demonstrate that the proposed method outperforms the two competing methods in all metrics. Specifically, its TDS remains consistently above 0.92 (compared with 0.11–0.88 and 0.28–0.89 for the baselines); the inter-class-to-intra-class distance ratio reaches 4.85, significantly higher than the baselines' maximum values of 4.2 and 4.3; under different incidence angles (180°, 90°, 34°) and motion responses (surge, heave, pitch), the estimated displacements perfectly match the actual values, while both baselines show varying deviations; and the detection accuracy stays above 95% with a stable curve across all experimental runs, far exceeding the baselines. These results collectively validate the effectiveness, robustness, and superiority of the proposed method.
(3): The core framework of the proposed method—namely, the combination of K-means clustering for outlier cleaning, PCA for feature dimensionality reduction, improved random forest for classification, and fuzzy comprehensive evaluation for grade refinement—exhibits strong cross-domain transferability. It has been preliminarily validated in engineering applications such as fault diagnosis of wind turbine gearboxes, health management of large lifting machinery reducer gearboxes, and tunnel boring machine (TBM) excavation state recognition, achieving detection performance superior to traditional methods. This indicates that the proposed method has promotion value for state detection problems in general industrial equipment.
(4): Regarding future work, the random forest model currently adopted in this study is a static classifier with limited capability in capturing temporal dependencies in lifting data. There is considerable potential to transition from static classifiers to sequence-based deep learning models. Subsequent research will explore the ability of long short-term memory (LSTM) networks and their variants to model the temporal state of the lifting system and incorporate attention mechanisms to capture the state features of key time steps, thereby further improving detection accuracy and robustness under dynamic operating conditions.

Author Contributions

Conceptualization, M.M., B.G. and B.P.; methodology, M.M.; investigation, M.M., B.P. and H.L.; writing—original draft preparation, J.C. and M.M.; writing—review and editing, M.M., H.W. and X.T.; supervision, B.G. and H.W.; funding acquisition, B.G.; formal analysis, X.T. and T.L.; resources, X.T. and H.L.; project administration, T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This project was financially supported by the Jinan Science and Technology Plan Project (No. 202527026).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Minglu Ma, Bing Guan, Xingbao Teng, Tingting Li and Hui Li were employed by Shandong Weima Pumps Manufacturing Company Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.


Figure 1. Overall algorithm flow of the proposed method.

Figure 2. Test environment.

Figure 3. Comparison of time-series detection stability among the three methods.

Figure 4. Comparison of the inter-class-to-intra-class distance ratio of clustering among the three methods.

Figure 5. Comparison of lifting state detection accuracy among the three methods.

Table 1. Variable descriptions for each stage of the proposed method.

Stage	Input	Processing	Output	Output Variable Type
Stage 1: K-means outlier cleaning	Original lifting dataset	K-means clustering calculates outlier factor and removes abnormal points exceeding the threshold	Clean lifting dataset	Continuous variable
Stage 2: PCA feature dimensionality reduction	Clean lifting dataset	Mean centering, covariance matrix calculation, eigendecomposition, selection of principal components with cumulative contribution rate ≥ 90% for projection	Reduced-dimensional lifting data matrix	Continuous variable
Stage 3: Improved random forest preliminary classification	Reduced-dimensional lifting data matrix	Bootstrap sampling to construct decision trees, feature selection based on Gini impurity criterion for splitting, ensemble voting	Preliminary classification result	Discrete variable
Stage 4: Fuzzy comprehensive evaluation for grade refinement	Preliminary classification result (state category)	Construction of factor set and fuzzy evaluation matrix, weight determination by analytic hierarchy process, fuzzy synthesis, judgment by maximum membership degree principle	Final lifting state detection result	Discrete variable
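
The Stage 4 row of Table 1 can be illustrated with a minimal sketch. It assumes a weighted-average synthesis operator, which this section does not specify, and every factor, weight, and membership value below is a hypothetical placeholder rather than data from the paper. The grade names follow the slight/moderate/severe levels reported in Table 9.

```python
# Hypothetical illustration of Stage 4 (fuzzy comprehensive evaluation).
# Weights and membership degrees are made-up examples, not values from the paper.

GRADES = ["slight", "moderate", "severe"]

# Weight vector W over the factor set (assumed to come from AHP; sums to 1).
weights = [0.5, 0.3, 0.2]

# Fuzzy evaluation matrix R: one row per factor, one column per grade.
# Each entry is the membership degree of that factor in that grade.
membership = [
    [0.1, 0.6, 0.3],  # hypothetical factor 1
    [0.2, 0.5, 0.3],  # hypothetical factor 2
    [0.7, 0.2, 0.1],  # hypothetical factor 3
]


def fuzzy_synthesis(w, r):
    """Weighted-average operator (an assumption): B_j = sum_i w_i * r_ij."""
    n_grades = len(r[0])
    return [sum(w_i * row[j] for w_i, row in zip(w, r)) for j in range(n_grades)]


def max_membership(b, grades):
    """Return the grade whose synthesized membership degree is largest."""
    best = max(range(len(b)), key=lambda j: b[j])
    return grades[best]


b_vector = fuzzy_synthesis(weights, membership)
grade = max_membership(b_vector, GRADES)
print(b_vector, grade)
```

With these placeholder inputs the synthesized vector is approximately (0.25, 0.49, 0.26), so the maximum membership degree principle selects the moderate grade.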

Table 2. Evaluation results of the outlier cleaning effect of K-means clustering.

Evaluation Metrics	Value
Total number of original samples	128,600
Number of manually annotated outliers	3858
Number of outliers identified and removed	3721
Number of true outliers among the removed samples	3689
Number of normal samples incorrectly removed	32
Silhouette coefficient	0.87
Outlier detection recall	95.6%
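
The counts in Table 2 are mutually consistent, which the short check below makes explicit. The recall is the value reported in the table. The precision is not reported in the paper and is derived here only as an illustration from the same counts.

```python
# Counts taken from Table 2.
annotated_outliers = 3858   # manually annotated outliers
removed = 3721              # samples identified and removed by K-means cleaning
false_removals = 32         # normal samples incorrectly removed

# True outliers among the removed samples.
true_positives = removed - false_removals

# Recall: share of annotated outliers that were actually removed.
recall = true_positives / annotated_outliers

# Precision (derived here, not reported in the table):
# share of removed samples that were true outliers.
precision = true_positives / removed

# Annotated outliers that the cleaning step missed.
missed = annotated_outliers - true_positives

print(true_positives, missed, round(recall * 100, 1), round(precision * 100, 1))
```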

Table 3. Evaluation results of the PCA dimensionality reduction effect.

Evaluation Metrics	Value
Original feature dimension	56
Feature dimension after dimensionality reduction	12
Feature compression rate	78.6%
Cumulative variance contribution rate	92.7%
Information retention rate	91.8%
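
As a rough illustration of how the component count in Table 3 is chosen, the sketch below applies the cumulative contribution rule from Table 1 (threshold of at least 90%) to a toy eigenvalue list and recomputes the feature compression rate. The eigenvalues are hypothetical, and the function name select_components is introduced here for illustration only.

```python
def select_components(eigenvalues, threshold=0.90):
    """Return (k, ratio): the smallest number of leading principal components
    whose cumulative variance contribution reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k, cumulative / total
    return len(eigenvalues), 1.0


# Hypothetical eigenvalues of a covariance matrix (not from the paper).
toy_eigenvalues = [5.0, 3.0, 1.5, 0.5]
k, ratio = select_components(toy_eigenvalues)

# Feature compression rate from the dimensions reported in Table 3.
original_dim, reduced_dim = 56, 12
compression = (original_dim - reduced_dim) / original_dim

print(k, ratio, round(compression * 100, 1))
```

For the toy input, three components are retained with a cumulative contribution of 0.95. The compression rate reproduces the 78.6% in Table 3.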

Table 4. Classification confusion matrix of the decision tree.

Actual\Predicted	Normal	Abnormal Gear Meshing	Uneven Load Distribution	Lifting Speed Fluctuation	Braking Precursor
Normal	17,520	312	156	98	328
Abnormal gear meshing	245	1780	124	56	19
Uneven load distribution	187	98	1520	89	62
Lifting speed fluctuation	134	76	112	1250	40
Braking precursor	98	45	78	32	563

Table 5. Classification confusion matrix of XGBoost.

Actual\Predicted	Normal	Abnormal Gear Meshing	Uneven Load Distribution	Lifting Speed Fluctuation	Braking Precursor
Normal	18,050	156	98	56	154
Abnormal gear meshing	98	1980	65	32	9
Uneven load distribution	76	54	1650	42	34
Lifting speed fluctuation	62	38	46	1390	16
Braking precursor	45	22	35	18	696

Table 6. Classification confusion matrix of the improved random forest.

Actual\Predicted	Normal	Abnormal Gear Meshing	Uneven Load Distribution	Lifting Speed Fluctuation	Braking Precursor
Normal	18,320	42	28	15	9
Abnormal gear meshing	56	2110	32	18	8
Uneven load distribution	34	28	1680	22	12
Lifting speed fluctuation	22	16	24	1440	10
Braking precursor	18	12	20	14	752
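
Per-class metrics can be derived from a confusion matrix such as Table 6, with rows as actual classes and columns as predicted classes. The sketch below is one way to do this. The averaging convention used for the summary metrics in the paper is not stated in this section, so the sketch reports overall accuracy together with macro and support-weighted F1 without claiming to reproduce any reported figure.

```python
LABELS = [
    "Normal",
    "Abnormal gear meshing",
    "Uneven load distribution",
    "Lifting speed fluctuation",
    "Braking precursor",
]

# Confusion matrix of the improved random forest (Table 6).
CM = [
    [18320, 42, 28, 15, 9],
    [56, 2110, 32, 18, 8],
    [34, 28, 1680, 22, 12],
    [22, 16, 24, 1440, 10],
    [18, 12, 20, 14, 752],
]


def per_class_metrics(cm):
    """Return a list of (precision, recall, f1, support), one tuple per class."""
    n = len(cm)
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])            # actual samples of class k
        precision = tp / col_sums[k]    # correct among predicted class k
        recall = tp / support           # correct among actual class k
        f1 = 2 * precision * recall / (precision + recall)
        metrics.append((precision, recall, f1, support))
    return metrics


metrics = per_class_metrics(CM)
total = sum(m[3] for m in metrics)
accuracy = sum(CM[k][k] for k in range(len(CM))) / total
macro_f1 = sum(m[2] for m in metrics) / len(metrics)
weighted_f1 = sum(m[2] * m[3] for m in metrics) / total

for label, (p, r, f1, support) in zip(LABELS, metrics):
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f} n={support}")
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f} weighted_f1={weighted_f1:.3f}")
```

Replacing CM with the matrices in Tables 4 and 5 gives the corresponding figures for the decision tree and XGBoost.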

Table 7. Comparison of classification performance metrics among three tree-based methods.

Method	Precision	Recall	F1-Score
Decision tree	0.847	0.842	0.844
XGBoost	0.924	0.921	0.922
Improved random forest	0.978	0.978	0.978

Table 8. Comparison of regression metrics for displacement detection among three tree-based methods.

Method	RMSE/m	R²
Decision tree	0.042	0.876
XGBoost	0.021	0.932
Improved random forest	0.008	0.978

Table 9. Evaluation results of the grade refinement effect of fuzzy comprehensive evaluation.

Evaluation Metrics	Value
Total number of abnormal samples	36,200
Grade classification accuracy for slight abnormality	91.2%
Grade classification accuracy for moderate abnormality	88.5%
Grade classification accuracy for severe abnormality	89.4%
Average grade classification accuracy	89.7%
Mean absolute error (MAE)	0.12

Table 10. Results of the Sobol global sensitivity analysis.

Parameters	Parameter Value Range	First-Order Sensitivity Index	Total-Order Sensitivity Index	Sensitivity Ranking
Number of clusters in K-means	2~5	0.156	0.189	3
Cumulative contribution rate threshold in PCA	75~98%	0.278	0.315	2
Number of decision trees in the random forest	20~200	0.412	0.487	1
Maximum depth of the random forest	5~50	0.098	0.124	4
Number of grades in fuzzy comprehensive evaluation	2~5	0.032	0.045	5
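
The ranking column of Table 10 can be reproduced directly from the reported indices. The sketch below sorts the parameters by total-order index and also computes the gap between total-order and first-order indices, which in Sobol analysis reflects the contribution of interactions with other parameters. The short parameter names used as dictionary keys are abbreviations introduced here.

```python
# (first-order index, total-order index) from Table 10.
SOBOL = {
    "K-means clusters": (0.156, 0.189),
    "PCA contribution threshold": (0.278, 0.315),
    "Number of decision trees": (0.412, 0.487),
    "Maximum tree depth": (0.098, 0.124),
    "Number of fuzzy grades": (0.032, 0.045),
}

# Rank parameters by total-order sensitivity, most influential first.
ranking = sorted(SOBOL, key=lambda name: SOBOL[name][1], reverse=True)

# Interaction share: total-order minus first-order index.
interaction = {name: round(st - s1, 3) for name, (s1, st) in SOBOL.items()}

# Sum of first-order indices (at most 1 when inputs are independent).
first_order_sum = sum(s1 for s1, _ in SOBOL.values())

print(ranking)
print(interaction)
print(round(first_order_sum, 3))
```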

Table 11. Comparison of TDS among the three methods after extending the time range to 72 h.

Method	Mean TDS	Standard Deviation of TDS	Minimum TDS	Maximum TDS
The proposed method	0.925	0.008	0.91	0.94
Ultimate bearing capacity state detection method	0.52	0.187	0.09	0.88
State detection method after soil scouring around spudcan	0.58	0.152	0.26	0.89

Table 12. Actual displacement and the displacement calculated by the three methods under different incidence angles and motion responses.

Method	Motion Response	Angle of Incidence 180°	Angle of Incidence 90°	Angle of Incidence 34°
Actual displacement result	Surge	0.65 m	0.61 m	0.45 m
	Heave	0.42 m	0.41 m	0.36 m
	Pitch	0.26 m	0.25 m	0.54 m
The proposed method	Surge	0.65 m	0.61 m	0.45 m
	Heave	0.42 m	0.41 m	0.36 m
	Pitch	0.26 m	0.25 m	0.54 m
Ultimate bearing capacity state detection method	Surge	0.55 m	0.60 m	0.42 m
	Heave	0.41 m	0.41 m	0.33 m
	Pitch	0.26 m	0.23 m	0.51 m
State detection method after soil scouring around spudcan	Surge	0.55 m	0.61 m	0.36 m
	Heave	0.41 m	0.40 m	0.30 m
	Pitch	0.22 m	0.21 m	0.44 m
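
To summarize Table 12 in a single figure per method, the mean absolute deviation from the actual displacement can be computed over all nine angle and response combinations. This summary statistic is not reported in the paper and is shown only as a sketch. The values are copied from the table in the column order 180°, 90°, 34°, and the short method keys are abbreviations introduced here.

```python
# Displacements in metres from Table 12, ordered as incidence angles 180, 90, 34 degrees.
ACTUAL = {
    "Surge": [0.65, 0.61, 0.45],
    "Heave": [0.42, 0.41, 0.36],
    "Pitch": [0.26, 0.25, 0.54],
}

CALCULATED = {
    "proposed": {
        "Surge": [0.65, 0.61, 0.45],
        "Heave": [0.42, 0.41, 0.36],
        "Pitch": [0.26, 0.25, 0.54],
    },
    "ultimate_bearing_capacity": {
        "Surge": [0.55, 0.60, 0.42],
        "Heave": [0.41, 0.41, 0.33],
        "Pitch": [0.26, 0.23, 0.51],
    },
    "spudcan_scouring": {
        "Surge": [0.55, 0.61, 0.36],
        "Heave": [0.41, 0.40, 0.30],
        "Pitch": [0.22, 0.21, 0.44],
    },
}


def mean_abs_error(pred, actual):
    """Mean absolute deviation over every motion response and incidence angle."""
    errors = [
        abs(p - a)
        for response in actual
        for p, a in zip(pred[response], actual[response])
    ]
    return sum(errors) / len(errors)


mae = {method: mean_abs_error(values, ACTUAL) for method, values in CALCULATED.items()}
for method, value in mae.items():
    print(f"{method}: {value:.3f} m")
```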

Table 13. Quantitative results of sensitivity analysis.

Environmental Condition	Operating Condition Description	K-Means Outlier Detection Recall	PCA Cumulative Variance Contribution Rate	Random Forest F1-Score	Detection Accuracy
Normal environment	Wind speed below Beaufort force 5, wave height below 2 m	95.6%	92.7%	0.978	97.8%
Mildly harsh	Wind speed at Beaufort force 6–7, wave height between 2 and 3 m	92.4%	91.2%	0.956	95.3%
Moderately harsh	Wind speed at Beaufort force 8, wave height between 3 and 4 m	87.3%	88.6%	0.912	91.6%
Severely harsh	Wind speed above Beaufort force 8, wave height above 4 m	81.5%	84.3%	0.875	87.2%
Data missing rate of 5%	Mean imputation	93.8%	90.5%	0.943	94.1%
Data missing rate of 10%	Mean imputation	90.2%	86.4%	0.894	89.3%
Data missing rate of 15%	Mean imputation	86.7%	82.1%	0.856	85.6%
Data missing rate of 20%	Mean imputation	81.9%	77.6%	0.831	82.4%
Key feature missing	Missing the top three features by PCA contribution rate	92.1%	74.5%	0.812	80.5%
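
The missing-data rows of Table 13 rely on mean imputation. A minimal sketch of that step is given below, assuming missing readings are marked as None and each feature is filled with the mean of its observed values. The sample data are hypothetical, and the fallback of 0.0 for a feature with no observed values is an arbitrary choice made here, not something taken from the paper.

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Arbitrary fallback (assumption) when a column has no observed values.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


# Hypothetical lifting samples with two features and two missing readings.
samples = [
    [1.0, 10.0],
    [None, 14.0],
    [3.0, None],
]
filled = impute_column_means(samples)
print(filled)
```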
