Research on Early Warning for Gas Risks at a Working Face Based on Association Rule Mining

Abstract: In gas prediction and early warning, outliers in the data series are often discarded, so key information may be lost during analysis. To address this, this paper proposes an early warning model based on multifactor coupling relationship analysis of coal face gas. The model combines a k-means algorithm with optimized initial cluster centers and an Apriori algorithm with weight optimization. The initial cluster centers for the full dataset are taken from the cluster centers of the preorder data subset, which optimizes the k-means algorithm. The optimized algorithm is used to filter out the outliers in the collected dataset and obtain the set of outliers. The Apriori algorithm is then optimized so that it can identify important information that appears infrequently in the events, and it is used to mine association rules among the abnormal values and obtain interesting association rule events among gas outliers in different dimensions. Finally, four warning levels of gas risk are set according to different confidence intervals, and true and reliable warning results are obtained. Mining association rules between abnormal data in different dimensions verifies the validity and effectiveness of the proposed gas early warning model. Graded early warning of gas risks has important practical significance for improving coal mine safety. By analyzing the influence of multifactor coupling on gas concentration fluctuations, the corresponding association rules and risk warning models are established to help prevent gas overruns.


Introduction
In most countries, coal mines are threatened to varying degrees by natural disasters such as gas, coal dust, fire, roof collapse and water inrush during the mining process [1][2][3]. Among the many accidents in coal mines, gas accidents are the most prominent. From 2013 to 2020, a total of 225 gas accidents of various types occurred in China, causing 1304 deaths; these accounted for 8.3% of all accidents and 28.05% of all deaths. The need for coal mine gas control remains urgent [4,5]. To date, many scholars around the world have conducted research on gas prediction and early warning [6,7]. Song et al. [8] used the R/S analysis method to analyze the chaotic characteristics of gas in a gas drainage pipeline in the 1203 working face of the Hongyang No. 2 Mine and used the Hurst index to analyze the trend of gas changes and provide early warning of the coexistence of coal and gas. Hu et al. [9] established a grey target model, analyzed the influence of gas pressure, initial speed of diffusion, coal stiffness and damage type on coal and gas outbursts and predicted coal and gas outbursts, demonstrating the feasibility of the model. Cheng et al. [10] improved a BP neural network by adjusting the network weights with additional momentum and applied it to predict coal and gas outburst disasters, which proved the superiority of the improved algorithm. Kumari et al. [11] proposed a deep learning model that combines uniform manifold approximation and projection (UMAP) with long short-term memory (LSTM) to forecast the fire status of a sealed-off area in underground coal mines; experiments showed that the prediction efficiency of the UMAP-LSTM model is higher than that of the existing SVR and ARIMA models. Slezak et al. [12] introduced a new approach for learning forecasting models over large multi-sensor datasets, including sliding-window-based feature extraction and rough-set-inspired feature subset ensemble selection, used it to predict coal mine methane concentration and obtained good experimental results. Borowski et al. [13] analyzed the possibility of electricity production using gas engines fueled with methane captured from the Budryk coal mine in Poland and tested artificial neural networks with different properties. The developed models achieved high correlation coefficients but showed deviations for very low values that persist for a short time. The study shows that electricity production forecasting is possible, but it requires data on many variables that directly affect the production capacity of the system. Jo et al. [14] introduced a real-time monitoring, event-reporting and early warning platform for improving safety management and preventing accidents in underground coal mines, based on cluster analysis for outlier detection, spatiotemporal statistical analysis and an RSS range-based weighted centroid localization algorithm. Experiments showed that this system helps solve the problems of accessibility, serviceability, interoperability and flexibility associated with safety in coal mines.
During informatization and intelligentization of coal mines, "massive data and a lack of information" are common problems [15][16][17]. Analyzing mine monitoring data and guiding production based on the results are of great significance for coal mine enterprises and for the high-quality development of the coal industry and related industries in the new era. Gas monitoring data are related to coal mine production safety, and many abnormal data points often appear in them [18][19][20]. Although underground coal mine gas monitoring systems have gradually become more complete, the analysis and processing of the data they collect remain insufficient. Most gas data are studied only with mathematical statistics and time-series prediction, and association rule mining of outliers across influencing factors in different dimensions is lacking, which makes it difficult to meet the needs of coal mine safety production. Due to their inherent attributes, coal mine gas data tend to have clustering characteristics. When the label information in the sample data is unknown, a clustering algorithm is needed to reveal the internal relevance of the samples. Tang et al. [21] proposed a new fuzzy clustering algorithm driven by data and knowledge, named density viewpoint-induced possibilistic fuzzy C-means (DVPFCM). In experimental studies, including some comparative analyses, this algorithm exhibited better performance in terms of the distance between computed clustering centers and reference centers. S. Askari et al. [22] proposed a generalized entropy-based possibilistic fuzzy C-means (GEPFCM) algorithm for clustering noisy data and showed that GEPFCM is more accurate than the PFCM algorithm. Tang et al. [23] proposed a new robust fuzzy C-means (FCM) algorithm for image segmentation called the patch-based fuzzy local similarity C-means (PFLSCM) and demonstrated that it achieves improved segmentation performance in comparison with some related FCM-based algorithms.
Association rules are primarily used to mine the associations between different data; commonly used association rule mining algorithms include the Apriori algorithm and the FP-growth algorithm [24][25][26]. However, these algorithms have performance limitations on particular problems, so most studies optimize and improve the algorithm for the actual problem at hand [27,28]. Sudhakar Singh et al. [29] proposed the improved Apriori algorithms VFPC and ETDPC based on MapReduce, conducted a quantitative analysis of the calculation cost and concluded that the optimized algorithms substantially reduce the calculation cost and are more efficient in terms of execution time. Anindita Borah et al. [30] proposed a rare association rule extraction method based on a one-way tree. This method can generate a complete set of frequent and rare patterns, eliminating the need to repeatedly scan the database and rebuild the tree. The effectiveness and superiority of the method were experimentally verified. Ji et al. [31] optimized and improved the Apriori algorithm in terms of performance and used the grid coverage model as the basic model. The improved Apriori algorithm was applied to a WSN coverage optimization process of mobile nodes, and the results showed that the improved algorithm can solve the problem more effectively. Guo et al. [32] proposed an efficient data mining method for mining the association rules of passenger flow between different service lines in an urban rail transit network and used the Beijing subway network as an example to verify the feasibility of this method. R. Uday Kiran et al. [33] proposed an improved approach to extract rare association rules and showed the efficiency of the method through experimental results on synthetic and real datasets. Xin et al. [34] proposed using a reinforcement learning algorithm to improve a treap-based association rule mining algorithm for large-scale databases and conducted an experimental analysis of the algorithm. The results showed that the algorithm can complete the task of mining variable relationships in large databases in a short time. Wang et al. [35] created a new algorithm, TREAP, which combines intermediate values and adjusted p-values for target inference.
A data sequence may contain data objects that are inconsistent with the general behavior or model of the data. These data objects are outliers (discrete points). However, most of the data mining methods used in the above studies regard outliers as noise and discard them, which is likely to result in the loss of very important information. Rare outliers in gas data may be more interesting than normal data; in fact, any abnormal point of gas concentration may cause a gas disaster. When outliers emerge suddenly during real-time prediction of gas concentration, data mining algorithms are likely to discard them during data preprocessing. As a result, gas warnings lag behind the events, and accurate warning is difficult. To improve the accuracy of gas warning, it is necessary to accurately distinguish the outliers in the dataset and determine whether they are interesting outliers. In this paper, an early warning model for multifactor coupling relationship analysis of gas concentration at the working face is constructed on the basis of an optimized k-means clustering algorithm and an improved Apriori algorithm. Through cluster analysis, we extract abnormal data from four datasets of different dimensions that can directly or indirectly affect gas concentration and build the association rule learning set between the dimensions. The model can not only accurately identify the outliers in the dataset but also find interesting association rule events. Finally, a hierarchical early warning mechanism is established to achieve graded early warning of gas risks and to improve the timeliness and accuracy of early gas warning.

Apriori Algorithm Based on Weight Optimization
The FP-growth and Treap algorithms are more difficult to implement in software and place higher requirements on the dataset, and their performance may decrease when mining certain datasets. The FP-growth algorithm is prone to producing short, flat trees when constructing the FP-tree. Although the Apriori algorithm is more traditional, its ideas and processes are easier to understand and more flexible, and it can be optimized and improved for the different situations encountered during mining, which gives it strong applicability. Considering also the data structure characteristics and application scenarios of this paper, the Apriori algorithm is selected as the association rule mining algorithm.
The Apriori algorithm uses an iterative layer-by-layer search in which k-itemsets are used to explore (k + 1)-itemsets; it finds all frequent itemsets and generates strong association rules from them [36][37][38]. Let Φ = {I_1, I_2, ..., I_m} be a set of items, and let the task-related data D be a set of database transactions in which each transaction T is a nonempty itemset such that T ⊆ Φ. Each transaction has an identifier called TID. Let A be an itemset; transaction T contains A if and only if A ⊆ T. An association rule is an implication of the form A ⇒ B, where A ⊂ Φ, B ⊂ Φ, A ≠ ∅, B ≠ ∅ and A ∩ B = ∅. Rule A ⇒ B holds in the transaction set D with support s, where s is the percentage of transactions in D that contain A ∪ B, that is, the probability P(A ∪ B):

$$\mathrm{support}(A \Rightarrow B) = P(A \cup B)$$
However, when mining frequent itemsets, the traditional Apriori algorithm discards itemsets below the minimum support threshold during the initial pruning step. In the setting of this paper, some data have a large impact on gas risk at the working face, yet the few detected gas anomaly data points may not reach the set support threshold. The algorithm then deletes them and misses part of the risk rules. To address this omission, the support calculation is optimized: a weight parameter is added so that low-frequency gas anomaly values can still enter the frequent itemsets. The designed support threshold function is expressed in terms of support(X), the support of the given itemset; λ_X, the weight parameter; and D, the transaction set. For any x = {x_1, x_2, ..., x_r} with x_i ∈ I (i = 1, 2, ..., r), if x is a single item, its weight coefficient is assigned after the itemset is generated; otherwise, the weight parameter must be obtained from the included items:

$$\lambda_x = \min\{\lambda_{x_1}, \lambda_{x_2}, \ldots, \lambda_{x_r}\}$$

That is, the weight parameter of an itemset is the smallest of the weight parameters of the items it contains.
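The weighting idea can be illustrated with a minimal Python sketch. The exact weighted support function is not recoverable from the text, so the product form `lambda_X * count(X) / |D|` used here is an assumption, as are the function names and the item labels U, W, C and S in the example; only the rule that an itemset takes the smallest weight among its items comes from the description above.

```python
from itertools import combinations


def itemset_weight(itemset, item_weights):
    """Weight of an itemset: the smallest weight among its items."""
    return min(item_weights[item] for item in itemset)


def weighted_support(itemset, transactions, item_weights):
    """ASSUMED form: lambda_X * count(X) / |D| (not confirmed by the source)."""
    count = sum(1 for t in transactions if itemset <= t)
    return itemset_weight(itemset, item_weights) * count / len(transactions)


def weighted_frequent_itemsets(transactions, item_weights, min_sup):
    """Level-wise (Apriori-style) search that prunes on weighted support."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        kept = {}
        for candidate in current:
            s = weighted_support(candidate, transactions, item_weights)
            if s >= min_sup:
                kept[candidate] = s
        frequent.update(kept)
        k += 1
        # Join step: unions of kept itemsets that contain exactly k items.
        current = list({a | b for a, b in combinations(kept, 2) if len(a | b) == k})
    return frequent


if __name__ == "__main__":
    # Hypothetical transactions: each set lists the dimensions flagged as abnormal.
    data = [frozenset(t) for t in ({"U", "W"}, {"W"}, {"S", "W"}, {"U", "W"}, {"W"})]
    weights = {"U": 1.0, "W": 1.0, "S": 3.0}  # S is rare but assumed important
    for itemset, s in weighted_frequent_itemsets(data, weights, 0.5).items():
        print(sorted(itemset), round(s, 3))
```

Under this assumed form, adding an item to an itemset can only lower its count and its minimum weight, so level-wise pruning remains valid for non-negative weights.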
Rule A ⇒ B has confidence c in transaction set D, where c is the percentage of transactions in D containing A that also contain B. This is the conditional probability P(B|A):

$$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{support\_count}(A \cup B)}{\mathrm{support\_count}(A)}$$

where support_count(A∪B) is the number of transactions containing itemset A∪B, and support_count(A) is the number of transactions containing itemset A. A rule that meets both the minimum support threshold (min_sup) and the minimum confidence threshold (min_conf) is a strong rule [39,40]. Lift compares the probability that X and Y occur together with the probability expected if they were independent, and it reflects the correlation between X and Y in an association rule. Lift > 1 indicates positive correlation, with higher values indicating stronger correlation, and Lift = 1 indicates no correlation, as shown in the following formula:

$$\mathrm{Lift}(X \Rightarrow Y) = \frac{P(X \cup Y)}{P(X)\,P(Y)}$$

The abnormal values in the gas data can be treated as outliers in the clustering process. Outliers can therefore be extracted with a clustering-based method that examines the relationship between each object and the clusters. Using k-means clustering, each object o can be assigned an outlier score according to the distance between the object and its nearest cluster center. Assume that the closest center to o is c_o, the distance between o and c_o is dist(o, c_o), and the average distance between c_o and the objects assigned to c_o is l_{c_o}. The ratio dist(o, c_o)/l_{c_o} measures the degree of difference between dist(o, c_o) and the average, and points far from the corresponding center are suspected to be outliers. The purpose of the k-means algorithm is to cluster n m-dimensional data points X = {x_1, x_2, ..., x_n}, x_i ∈ R^m (1 ≤ i ≤ n), into k sets. The algorithm steps are as follows.
First, randomly initialize k cluster centers c = {c_1, c_2, ..., c_k}, c_j ∈ R^m (1 ≤ j ≤ k), and denote the cluster set of center c_j by G_j.
Second, calculate the Euclidean distance between each data point x_i to be clustered and each cluster center c_j:

$$d(x_i, c_j) = \lVert x_i - c_j \rVert = \sqrt{\sum_{l=1}^{m} (x_{il} - c_{jl})^2}$$

Put each x_i into the unique cluster set of its nearest cluster center, namely,

$$G_j = \{\, x_i : d(x_i, c_j) \le d(x_i, c_p),\ 1 \le p \le k \,\}$$

Third, according to the data contained in each cluster set, update the cluster center of that set, namely,

$$c_j = \frac{1}{|G_j|} \sum_{x_i \in G_j} x_i$$

Next, repeat the second and third steps until the change in the cluster centers is less than a given threshold or the number of iterations exceeds a given number.
Finally, assign each sample x_i to the cluster category of its nearest cluster center.
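The outlier score described above, the ratio dist(o, c_o)/l_{c_o}, can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the cluster centers are assumed to have been obtained already by the k-means steps.

```python
import math


def nearest_center(point, centers):
    """Index of the closest cluster center and the Euclidean distance to it."""
    dists = [math.dist(point, c) for c in centers]
    j = min(range(len(centers)), key=dists.__getitem__)
    return j, dists[j]


def outlier_scores(points, centers):
    """Ratio dist(o, c_o) / l_{c_o} for every point o.

    l_{c_o} is the average distance between center c_o and the points
    assigned to it; scores well above 1 mark suspected outliers.
    """
    assigned = [nearest_center(p, centers) for p in points]
    totals = [0.0] * len(centers)
    counts = [0] * len(centers)
    for j, d in assigned:
        totals[j] += d
        counts[j] += 1
    means = [t / n if n else 0.0 for t, n in zip(totals, counts)]
    return [d / means[j] if means[j] > 0 else 0.0 for j, d in assigned]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    print(outlier_scores(pts, [(0.5, 0.5)]))
```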

Improved K-Means Clustering Algorithm
The above algorithm selects the initial cluster centers at random, so different initial values lead to different clustering results. The method for selecting the initial cluster centers must therefore be optimized. Production at the coal mining face is cyclical: the shearer feeds, cuts coal, loads, transports and advances the supports over three shifts a day, whether in cyclic operation or during maintenance, drilling, support and other processes, so the daily production process is relatively fixed. Therefore, the cluster centers of the preorder data in the collected dataset are similar to those of the postorder data, and it is feasible to use the cluster centers of the preorder data subset as the initial cluster centers for all data. Cluster centers are iterated by minimizing the variance of each cluster, so that the data contained in each set of the final clustering result show the smallest difference. After the cluster centers are optimized, the data are divided into k categories using the k-means algorithm, where k is determined by the silhouette coefficient method (called the contour coefficient method in this paper). The distance from each data point in a class to its cluster center is then calculated. If the distance of a data point from its cluster center exceeds a given threshold, it is determined to be abnormal-value data.
The algorithm is described as follows. Suppose the gas monitoring dataset consists of n m-dimensional data points X = {x_1, x_2, ..., x_n}, x_i ∈ R^m (1 ≤ i ≤ n), to be clustered into k sets. The algorithm steps are as follows.
Extract the preorder data subset X_pre ⊆ X of the gas dataset X. Use the contour coefficient to determine the most suitable number of clusters k for the preorder dataset X_pre.
For the preorder dataset X_pre, randomly determine k cluster centers c_pre = {c_pre1, c_pre2, ..., c_prek} and use Equation (9) to calculate the distance between each data point x_i to be clustered and the cluster centers c_prej. According to the data contained in each cluster set, Equation (10) is used as the criterion function to update the cluster center values iteratively until the final clustering result is determined, and the final cluster center set c_pro = {c_pro1, c_pro2, ..., c_prok} of the preorder data subset is obtained.
Set the k initial cluster centers of the dataset X to c = c_pro and apply the k-means algorithm to determine the final cluster centers.
A data value whose distance from its cluster center is greater than the threshold θ is considered an abnormal value. The threshold can be set as follows. For a dataset without outlier labels (the dataset is assumed by default to contain no outliers), take the sum of the average distance within each cluster and 1.5 times the standard deviation as the threshold. For a training dataset with outlier labels, calculate the distance from each labeled outlier to its cluster center and take the smallest distance as the threshold. If the threshold obtained from the labeled outlier distances fails, enter different thresholds manually and detect the abnormal values of the training set until the number of abnormal points detected for the chosen k is close to the known number.
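The steps above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the initial centers for the preorder subset are passed in by the caller (they may be drawn at random, as in the text), and the threshold uses the unlabeled-data rule (mean distance plus 1.5 times the standard deviation per cluster) with the population standard deviation, which the source does not specify.

```python
import math
import statistics


def kmeans(points, centers, max_iter=100, tol=1e-6):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[j].append(p)
        # Mean of each group; an empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def preorder_kmeans(points, n_pre, pre_init):
    """Cluster the first n_pre points, then reuse those centers for all points."""
    pre_centers = kmeans(points[:n_pre], pre_init)
    return kmeans(points, pre_centers)


def detect_outliers(points, centers, factor=1.5):
    """Flag points farther from their center than mean + factor * std of that cluster."""
    nearest = [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]
    dists = [math.dist(p, centers[j]) for p, j in zip(points, nearest)]
    thresholds = {}
    for j in set(nearest):
        d = [x for x, n in zip(dists, nearest) if n == j]
        thresholds[j] = statistics.mean(d) + factor * statistics.pstdev(d)
    return [x > thresholds[j] for x, j in zip(dists, nearest)]
```

A call such as `detect_outliers(X, preorder_kmeans(X, n_pre, init))` returns one Boolean flag per sample; seeding the full run with the preorder centers removes the dependence of the final clustering on a fresh random start.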

Multifactor Coupling Relationship Analysis and Early Warning Model of Gas in the Working Face
Conventional gas early warning methods are mainly achieved by setting thresholds. When the monitored value has an abnormal value higher than the threshold, an alarm is issued. However, it is impossible to distinguish whether the outlier is an interesting outlier. The main idea of this model is to comprehensively judge gas risk events and establish a hierarchical warning mechanism by the interrelationship of the gas concentration in the working face, the gas concentration in the coal seam, the gas concentration in the upper corner and the pressure on the working face. The relationship among the gas concentration in the working face, the gas concentration in the mining coalbed, the gas concentration in the upper corner and the pressure on the working face is described next.
During the mining process of a working face, the dirty air which passes through the working face carries a large amount of gas to the upper corner and causes the accumulation of gas in the upper corner, which affects the gas concentration of the working face in the underground mine. Because of the influence of mine pressure, geological structures and other factors in the mining coalbed, the gas concentration occurring in the mining coalbed changes accordingly, which directly affects the gas concentration during mining. The supporting pressure data of the working face reflect the changes in the geological structure of the coal seam, and the roof pressure of the mining machine is disturbed during operation. As the roof pressure changes, the gas concentration in the goaf fluctuates, which affects the gas concentration in the downhole working face. There is a certain coupling relationship between the upper corner gas concentration, mining coalbed gas concentration, face pressure and working face gas concentration. Once the four dimensions of data have different degrees of abnormality, this may lead to gas disasters.
To further explore the coupling relationship among the four dimensions of data (upper corner gas concentration, mining coalbed gas concentration, working face gas concentration and working face pressure) and the correlation between these data and underground gas risk events, this paper establishes an early warning model for multifactor coupling relationship analysis of coal face gas. The model analyzes in depth the abnormal values of the upper corner gas concentration, mining coalbed gas concentration, working face gas concentration and working face pressure, and determines the association rules among them.
The model is mainly composed of two parts: the outlier detection of the k-means algorithm based on cluster center optimization and the correlation analysis of the Apriori algorithm based on support optimization. Firstly, the optimized k-means algorithm is used to cluster the data and filter out the outliers in the data of each dimension. Secondly, the outlier data are reduced in dimensionality and converted into a 0-1 Boolean matrix. The optimized Apriori algorithm is used to analyze the association rules of the 0-1 Boolean matrix. Parameters such as support, confidence and lift are set to filter interesting association rule events. Finally, a hierarchical early warning mechanism for gas risk events is built to achieve hierarchical early warning of gas risk events.
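As an illustration of the second stage, the following Python sketch converts per-dimension outlier flags into 0-1 Boolean rows and computes support, confidence and lift for a candidate rule. The dimension labels U, W, C and S, the function names and the example flags are assumptions made for this sketch and are not taken from the paper's data.

```python
def to_boolean_rows(flags_by_dim):
    """Turn {dimension: [0/1 flags per time step]} into a list of 0-1 rows."""
    dims = list(flags_by_dim)
    length = len(flags_by_dim[dims[0]])
    return [{d: int(flags_by_dim[d][t]) for d in dims} for t in range(length)]


def rule_metrics(rows, antecedent, consequent):
    """Support, confidence and lift of antecedent => consequent over 0-1 rows."""

    def has(row, cols):
        return all(row[c] == 1 for c in cols)

    n = len(rows)
    n_a = sum(has(r, antecedent) for r in rows)
    n_b = sum(has(r, consequent) for r in rows)
    n_ab = sum(has(r, list(antecedent) + list(consequent)) for r in rows)
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else 0.0
    return support, confidence, lift


if __name__ == "__main__":
    # Hypothetical outlier flags for eight time steps in two dimensions.
    flags = {
        "U": [1, 1, 0, 1, 0, 0, 0, 0],
        "W": [1, 1, 0, 1, 0, 1, 0, 0],
    }
    print(rule_metrics(to_boolean_rows(flags), ["U"], ["W"]))
```

Rules would then be kept only if all three values pass the chosen thresholds; the threshold values themselves are a modelling choice that this sketch does not fix.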

Data Sources
A working face of a mine in Shaanxi was used as a test site to verify the above model. Gas concentration sensors were installed on the mining coalbed drainage pipeline, mining face and upper corner, and pressure sensors were added to the hydraulic supports of the face. The gas concentration sensors are model KG9001C; their technical parameters are shown in Table 1 below. The sensor layout is shown in Figure 1, where U is the upper-corner gas sensor on the working face, W is the working face gas sensor, C is the gas concentration sensor of the mining coalbed drainage pipeline, and S1-S156 are support pressure sensors arranged on the working face.
Table 1. Technical parameters of gas sensor.

Energies 2021, 14, x FOR PEER REVIEW 7 of 20

Model | Measuring Range | Measurement Error | Response Time
KG9001C | (4-10)% CH4 | ±1% CH4 |
KG9001C | (10-100)% CH4 | ±10% CH4 |

The gas data of the working face, mining coalbed and upper corner are collected at intervals of 5 s, and the maximum gas value within each 5 min window is taken as the experimental data value. The pressure data are sampled every 5 min, aligned with the time nodes of the extracted gas values. A total of 7 days of data are sorted and recorded as U_X, W_X, C_X and S_X, giving 2016 groups of data in 4 dimensions. Part of the data is shown in Table 2 below. A three-dimensional distribution diagram of the gas concentration data from the upper corner, working face and mining coalbed is shown in Figure 2, and a scatter diagram of the pressure data is shown in Figure 3. The figures show that abnormal points exist in the data, so an outlier detection algorithm must be used to detect them. During the data collection period, the gas monitoring system also recorded 55 gas risk alarm events.
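The sampling scheme described above (5 s readings reduced to one maximum per 5 min) can be sketched as follows. The function name and the assumption that readings arrive as a gap-free list are illustrative; 60 readings per window follows from 5 min divided by 5 s.

```python
def window_maxima(samples, window=60):
    """Maximum of each consecutive full window of readings.

    With one reading every 5 s, window=60 corresponds to 5 min.
    A trailing partial window is ignored.
    """
    return [
        max(samples[i:i + window])
        for i in range(0, len(samples) - window + 1, window)
    ]


if __name__ == "__main__":
    readings_per_week = 7 * 24 * 3600 // 5  # one reading every 5 s for 7 days
    week = [0.0] * readings_per_week        # placeholder values, not real data
    print(len(window_maxima(week)))
```

Seven days at 288 windows per day gives the 2016 groups stated in the text.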


Correlation Analysis of Preorder and Postorder Gas Data
The gas data of two adjacent days are taken at random to analyze the degree of fit. The change trends of the gas concentration between any two days show a certain similarity in each dimension. As shown in Figure 4a-c, the first 4 days of data are extracted from the three-dimensional dataset of the mining coalbed, upper corner and working face; the data of the first two days are treated as the preorder data and the data of the latter two days as the postorder data. The analysis shows that the preorder and postorder gas data of the working face, upper corner and mining coalbed fit closely. Therefore, it is feasible to detect outliers with the k-means algorithm by using the cluster centers obtained from the preorder gas concentration data as the initial cluster centers for the postorder data.
Using the gas concentration data of the upper corner, working face and mining coalbed for the first 2 days, the preorder datasets are U_Xpre, W_Xpre and C_Xpre with 576 groups each, and the datasets for the next 5 days are U_Xpost, W_Xpost and C_Xpost with 1440 groups each. The maximum, minimum and mean characteristics of the preorder and postorder gas datasets are listed in Table 3 below. A correlation heat map analysis was performed on the preorder and postorder data. The correlation coefficient reflects the degree of correlation between two variables: the higher its absolute value, the greater the degree of linear correlation between the variables. The calculation involves two statistics, covariance and standard deviation. Covariance is an indicator used to measure the linear relationship between two random variables and is defined as follows:

$$\mathrm{Cov}(x, y) = E\left[(x - \mu_x)(y - \mu_y)\right]$$

where µ_x and µ_y represent the mean values of random variables x and y, respectively. Covariance reflects the degree of correlation between two variables. A covariance greater than 0 indicates a positive correlation, and a covariance less than 0 indicates a negative correlation. When there are abnormal points in the data or the degree of dispersion of the data changes, the value of the covariance is affected. Therefore, to better account for the degree of dispersion of the data and to standardize the data, the covariance and standard deviation are used together to define the correlation coefficient:

$$\rho_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$$

where σ_x and σ_y are the standard deviations of x and y. The correlation heat map is shown in Figure 5. The figure shows that the correlation coefficient of the preorder and postorder data is 0.13 for the upper corner, −0.16 for the working face and −0.55 for the coal seam. The absolute value of each correlation coefficient is greater than 0.1, which this paper takes as indicating a correlation between the preorder and postorder data.
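The covariance and correlation coefficient defined above can be computed with a short Python sketch. The function names are illustrative, and the two series are assumed to have equal length; how the paper aligns the 576-group and 1440-group series is not stated, so no alignment step is shown.

```python
import math


def covariance(x, y):
    """Population covariance: mean of (x - mu_x) * (y - mu_y)."""
    mu_x = sum(x) / len(x)
    mu_y = sum(y) / len(y)
    return sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / len(x)


def correlation(x, y):
    """Correlation coefficient: Cov(x, y) / (sigma_x * sigma_y)."""
    sigma_x = math.sqrt(covariance(x, x))
    sigma_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sigma_x * sigma_y)


if __name__ == "__main__":
    rising = [1.0, 2.0, 3.0, 4.0]
    falling = [8.0, 6.0, 4.0, 2.0]
    print(correlation(rising, falling))
```

Dividing by the standard deviations bounds the result between −1 and 1, which is what makes coefficients from series with different spreads comparable.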

Cluster Analysis and Abnormal Point Detection of Gas Concentration Data
The k-means clustering algorithm is used to divide the data into k clusters. The silhouette (contour) coefficient method is used to select k. For each vector i, a(i) is the average distance to all other points in the cluster to which it belongs, and b(i) is the average distance to all points in the nearest adjacent cluster. The silhouette coefficient of the vector is then S(i) = [b(i) − a(i)]/max{a(i), b(i)}. The value of the silhouette coefficient lies between −1 and 1, and a value closer to 1 means better cohesion and separation. k is set to 2, 3, 4, 5 and 8, and the silhouette coefficients are calculated as shown in Figure 6a,b below. The figure indicates that the maximum value of the silhouette coefficient, obtained when k = 3, is 0.89. According to the silhouette coefficient method, the clustering effect is optimal at this value.

Taking the preorder dataset for clustering, the clustering effect is shown in Figure 7. Outliers (discrete points) among the scattered points are detected after clustering. With the threshold set to 2, 29 groups of abnormal data were detected, of which 1 group was detected by mistake, giving an accuracy rate of 96.55%. The detection effect is shown in Figure 8.

The cluster center of the preorder dataset is used to optimize the initial cluster center of the full dataset, and outlier detection is performed on the entire dataset. A three-dimensional graph of the clustering effect of the full dataset is shown in Figure 9. An outlier identification diagram of the full dataset is shown in Figure 10.
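The procedure described above (cluster a preorder subset, reuse its centers to initialize clustering of the full dataset, score the clustering with the silhouette coefficient and flag distant points) can be sketched as follows. This is plain Lloyd's k-means in pure Python, not the authors' implementation. The sample points are hypothetical, and treating the threshold as a maximum distance to the assigned cluster center is an assumption.

```python
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]

def kmeans(points, init_centers, max_iter=100):
    """Lloyd's k-means started from the given centers; returns (centers, labels)."""
    centers = [list(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            # An empty cluster keeps its previous center
            new_centers.append([sum(c) / len(members) for c in zip(*members)] if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def silhouette(points, labels):
    """Mean of S(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != l)
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)

def outliers(points, centers, labels, threshold):
    """Indices of points farther than `threshold` from their assigned center."""
    return [i for i, (p, l) in enumerate(zip(points, labels))
            if dist(p, centers[l]) > threshold]

# Hypothetical preorder subset and full dataset (2-D for brevity)
pre = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
full = pre + [(0.5, 0.5), (10.5, 10.5), (5, 20)]

pre_centers, pre_labels = kmeans(pre, [pre[0], pre[3]])
full_centers, full_labels = kmeans(full, pre_centers)  # warm start from preorder centers
flagged = outliers(full, full_centers, full_labels, threshold=5.0)
print(silhouette(pre, pre_labels), flagged)
```

Starting the full-dataset run from fixed centers rather than random ones makes the result reproducible, which is the motivation the text gives for optimizing the initial cluster centers.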
A comparison between the optimized cluster center and the random initial cluster center is shown in Table 4.

Energies 2021, 14, x FOR PEER REVIEW

During the data collection process, the initial value of the support pressure data is 24 MPa. Under the influence of mining, the support pressure changes as the stress structure of the coal seam changes.
The above abnormal value detection method is also used to detect abnormal values in the support pressure data. Through outlier detection, a total of 253 sets of abnormal data were detected, comprising 56 sets of abnormal gas concentration values in the working face, 58 sets in the upper corner, 56 sets in the mining coalbed and 83 sets of abnormal support pressure values. There are a total of 55 groups of recorded gas risk alarms (D). The detected abnormal data values and risk records are converted into a 0-1 Boolean matrix, in which abnormal data are recorded as 1 and normal data as 0, and abnormal risk events are recorded as 1 and normal events as 0. There are 125 groups of abnormal data combinations. Part of the converted 0-1 Boolean matrix is shown in Table 5, and the association rules are mined using the optimized Apriori algorithm constructed above.
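The conversion into a 0-1 Boolean matrix can be sketched as below. The column names, the index-based input format and the helper names are hypothetical. Interpreting an "abnormal data combination" as a row with at least one entry equal to 1 is an assumption, since the text does not define it.

```python
# Hypothetical column order: four monitored dimensions plus the gas risk record (D)
DIMENSIONS = ["working_face", "upper_corner", "mining_coalbed", "support_pressure", "gas_risk"]

def to_boolean_matrix(n_samples, abnormal_indices):
    """Build one 0-1 row per sample: 1 where a dimension is abnormal, else 0.

    abnormal_indices maps a dimension name to the sample indices flagged
    as abnormal (or, for gas_risk, recorded as a risk alarm).
    """
    matrix = [[0] * len(DIMENSIONS) for _ in range(n_samples)]
    for col, name in enumerate(DIMENSIONS):
        for i in abnormal_indices.get(name, ()):
            matrix[i][col] = 1
    return matrix

def abnormal_rows(matrix):
    """Keep only the rows in which at least one entry is 1."""
    return [row for row in matrix if any(row)]

# Hypothetical detection results for six samples
flags = {
    "working_face": [0, 2],
    "upper_corner": [2, 3],
    "support_pressure": [2],
    "gas_risk": [2],
}
matrix = to_boolean_matrix(6, flags)
for row in abnormal_rows(matrix):
    print(row)
```

Each row then acts as one transaction for association rule mining, with the columns serving as items.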

Discussion
The Apriori algorithm was used to analyze the coupling relationship between the abnormal values detected in the working face, upper corner and mining coalbed gas concentrations and in the support pressure. The specific analysis steps are as follows:
(1) Data entry. The binary table of abnormal data includes binary information of the working face, upper corner, mining coalbed and support pressure data.
(2) Establish the association rules between the data in the training set to obtain the weight.
(3) Based on the weighted Apriori algorithm, analyze the correlation of the entire dataset and obtain the association rules.
(4) Set different confidence levels, obtain different levels of strong association rules and set the warning level.
(5) Analyze the dataset according to the strong association rules of the different early warning levels to achieve hierarchical early warning.
The minimum support is set to 10%, the minimum confidence is set to 10%, and 19 association rules are obtained. A scatter plot is presented in Figure 11, where the x-axis is confidence, the y-axis is support, and the shade of blue indicates the degree of lift. The figure shows that most of the points have a support of 20-40% and a confidence of 60-100%, with only a few points outside this range. To filter strong association rules, the support threshold is set to 20%, the confidence threshold is set to 60%, and the lift threshold is set to 1. A total of 16 association rules are obtained, which indicates that some of the 19 rules do not meet the threshold constraints. The resulting association rules are shown in Table 6. In the table, Con represents the preconditions of the association rules; the rules generated by mining have 1, 2 or 3 preconditions. Re represents the result of the preconditions in the association rules, Sup represents the support of the association rules, and Cof represents the confidence of the association rules.
The above table shows a strong correlation between the working face, upper corner and mining coalbed gas concentrations and the working face pressure. Anomalies in any two dimensions of data lead to anomalies in the data of the other dimensions, except for the mining coalbed. The gas concentration in the mining coalbed is primarily affected by the mine pressure during the mining process and by the natural factors of gas occurrence. Therefore, only two association rules result in abnormalities in this coal seam, and both have abnormal pressure data as a precondition.
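The threshold filtering described above (support of at least 20%, confidence of at least 60% and lift of at least 1) can be sketched on a 0-1 Boolean matrix as follows. This sketch enumerates itemsets exhaustively instead of using Apriori's level-wise pruning, and it does not include the weighting of the optimized algorithm. The sample rows are hypothetical, and restricting rules to a single result and treating the lift threshold as inclusive are both assumptions.

```python
from itertools import combinations

def support(rows, items):
    """Fraction of rows in which every column index in `items` equals 1."""
    return sum(all(row[i] for i in items) for row in rows) / len(rows)

def mine_rules(rows, names, min_sup=0.2, min_conf=0.6, min_lift=1.0):
    """Enumerate single-result rules and keep those meeting all three thresholds."""
    rules = []
    for size in range(2, len(names) + 1):
        for itemset in combinations(range(len(names)), size):
            sup = support(rows, itemset)
            if sup == 0 or sup < min_sup:
                continue
            for target in itemset:
                cond = tuple(i for i in itemset if i != target)
                conf = sup / support(rows, cond)          # P(result | preconditions)
                lift = conf / support(rows, (target,))    # confidence relative to base rate
                if conf >= min_conf and lift >= min_lift:
                    rules.append({
                        "con": tuple(names[i] for i in cond),
                        "re": names[target],
                        "sup": sup,
                        "cof": conf,
                        "lift": lift,
                    })
    return rules

# Hypothetical abnormality rows: working face, upper corner, mining coalbed, support pressure
names = ["W", "U", "C", "P"]
rows = [
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
for rule in mine_rules(rows, names):
    print(rule)
```

The dictionary keys mirror the Con, Re, Sup and Cof columns used in the rule tables.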

The above method is used to explore the relationship between the four dimensions (working face, upper corner, mining coalbed and support pressure) and underground gas danger. The minimum support threshold was set to 20%, and the minimum confidence threshold was set to 60% to obtain 14 sets of association rules, as shown in Table 7.

Table 7. Partial association rules of underground gas risks and coal mine abnormal data training set.

The above results show that when abnormalities are detected in three or more dimensions among the gas concentration data of the upper corner, mining coalbed and working face and the pressure data, there is 100% confidence that a gas risk event occurs. Among the association rules in which two factors lead to the result, the confidence of a gas risk event is 100% only for two combinations: abnormal mining coalbed and working face data, and abnormal working face pressure and working face data. The confidence of the two-factor association rules is above 95%. In comparison, the association rules in which a single factor leads to a gas risk event have a lower confidence of approximately 75%. Therefore, multiple abnormal factors downhole are more likely to cause gas danger than a single abnormal factor.

By setting different confidence grades, four grades of association rules are obtained. According to the confidence grade from high to low, the risk warning grade is divided into grade I, grade II, grade III and grade IV from strong to weak, as shown in Table 8. The table indicates that the grade-I (Sup ≥ 0.2, Cof ≥ 0.99) risk warning confidence is 100%, and most of these warnings are caused by a simultaneous abnormality of data in 3 dimensions. When abnormal values are detected in the mining coalbed, upper corner and working face data and in the support pressure data, it may be that the coal seam structure has been destroyed during the mining process and that the hydrology and other conditions have changed, leading to gas gushing in many places. Grade-II (Sup ≥ 0.2, 0.99 > Cof ≥ 0.7) danger warnings are mainly caused by abnormal data in two dimensions. The preconditions of these association rules show that abnormal gas concentrations in the mining coalbed and upper corner occur relatively frequently. Grade-III (Sup ≥ 0.2, 0.7 > Cof ≥ 0.5) and grade-IV (Sup ≥ 0.2, 0.5 > Cof ≥ 0.4) warnings have a single cause: only one dimension of abnormal data leads to the gas risk event.

The mining coalbed, upper corner, working face and rock pressure disturbance during the mining process all have a certain impact on the gas concentration. In particular, when the gas concentrations of multiple spatial dimensions are abnormal at the same time, underground gas risks are likely to appear. At present, the early warning result for coal mine gas concentration is mainly judged by the magnitude of the data uploaded by the sensor. The main problem with this method is that it is difficult to quickly identify whether outliers are genuine, so the reliability of early warning events is poor. In contrast, the early warning results obtained in this article are based on analyzing multiple data dimensions with coupling relationships and making judgments on early warning events. The obtained gas warning result combines the screening and verification of multiple parameters and has high reliability.
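The grading by support and confidence described above can be written as a small lookup. The function name is hypothetical, and returning `None` for rules that fall below every grade is an assumption, as the text does not say how such rules are handled.

```python
def warning_grade(sup, cof, min_sup=0.2):
    """Map a rule's support and confidence to a warning grade (I strongest, IV weakest).

    Thresholds follow the grade definitions in the text:
      I:   Sup >= 0.2, Cof >= 0.99
      II:  Sup >= 0.2, 0.99 > Cof >= 0.7
      III: Sup >= 0.2, 0.7 > Cof >= 0.5
      IV:  Sup >= 0.2, 0.5 > Cof >= 0.4
    Returns None when no grade applies (assumed behavior).
    """
    if sup < min_sup:
        return None
    if cof >= 0.99:
        return "I"
    if cof >= 0.7:
        return "II"
    if cof >= 0.5:
        return "III"
    if cof >= 0.4:
        return "IV"
    return None

# Hypothetical rules given as (support, confidence) pairs
for sup, cof in [(0.25, 1.0), (0.30, 0.95), (0.22, 0.55), (0.20, 0.45)]:
    print(sup, cof, warning_grade(sup, cof))
```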

Conclusions
This paper studied the dynamic changes in the gas in a mine and provided an in-depth analysis of the abnormal values of the gas monitoring data. An early warning model was established for a multifactor coupling relationship analysis of coal face gas based on the improved k-means and Apriori algorithms. The model improves the k-means algorithm through the selection method of initial cluster centers, so that a unique clustering result is obtained even when different initial values are set for the cluster analysis of the four dimensions of data that directly or indirectly affect the gas concentration, thereby improving the accuracy of distinguishing abnormal values in gas data. Weights are then added to the support to optimize the Apriori algorithm, so that larger risk events and major risk events with lower frequency can be distinguished. It is also possible to establish a gas grading early warning mechanism based on the support and confidence parameters. Finally, interesting association rule events can be filtered by lift.

The model was verified by an example. The verification results show that the model can accurately predict the gas concentration under conditions in which multiple factors affect it, and that the gas grading early warning mechanism derived from the in-depth analysis of gas outliers can effectively improve the early warning performance for gas disasters. By analyzing the influence of multifactor coupling on gas concentration fluctuations, the corresponding association rules and risk warning models are established, which can effectively prevent gas overruns and production stops, release coal mining capacity and better protect the lives of operators. This promotes the safe, efficient and stable production of coal and lays a foundation for intelligent and accurate gas warning.

The next step of the research will be to conduct data fusion analysis with the upstream and downstream topological structure of the ventilation as the link, in order to improve the accuracy of underground gas concentration prediction and early warning.