Pattern Recognition of Hazardous Gas Leak Monitoring Data Based on Field Sensors

Xi, Jian; Guan, Lei; Zhu, Xiaoguang; Zong, Kai; Yan, Wenrui

doi:10.3390/pr14010108


by Jian Xi ¹,*, Lei Guan ¹,*, Xiaoguang Zhu ², Kai Zong ¹,* and Wenrui Yan ¹

¹ Work Safety Risk Monitoring and Early Warning Center, China Academy of Safety Science and Technology, Beijing 100012, China

² Shandong Environmental Protection Development Group Technology Co., Ltd., Jinan 250014, China

* Authors to whom correspondence should be addressed.

Processes 2026, 14(1), 108; https://doi.org/10.3390/pr14010108 (registering DOI)

Submission received: 29 August 2025 / Revised: 4 December 2025 / Accepted: 8 December 2025 / Published: 28 December 2025

(This article belongs to the Special Issue AI-Driven Safe and High-Quality Development in Process Industries)


Abstract

Hazardous gas leaks are a major trigger of chemical incidents. If not handled in time, they can easily lead to secondary disasters such as fires and explosions. In recent years, with the construction of hazardous chemical monitoring and early-warning systems in China, large volumes of field operating data from flammable and toxic gas sensors have been accumulated, providing a data foundation for leak-pattern studies grounded in real-world scenarios. In this study, 56 leak samples verified by site feedback were selected. Time-aware interpolation and Z-score normalization were used for preprocessing, and time-series features—including standard deviation of first differences, autocorrelation coefficients, and frequency-domain energy—were extracted. Leak patterns were then identified using two unsupervised approaches: K-Means clustering and a 1D-CNN autoencoder. Results show that K-Means effectively distinguishes macro-patterns such as sustained leaks, instantaneous leaks, fluctuating leaks, and interrupted leaks, while the autoencoder demonstrates stronger capability in extracting temporal features, revealing leak evolution and transition characteristics. The two methods are complementary and together provide a viable route to developing an end-to-end model for leak scenario identification and risk discrimination. This work not only verifies the feasibility of conducting leak-pattern recognition using real GDS data but also offers technical guidance for the intelligent upgrading of hazardous chemical monitoring and early-warning systems.

Keywords:

hazardous gas leak; Gas Detection System (GDS); K-Means; autoencoder; pattern recognition

1. Introduction

Hazardous gas leakage is a common type of incident in the chemical industry. Its direct consequences mainly include poisoning, asphyxiation, and environmental pollution; if not controlled promptly, it can also lead to more serious outcomes such as fires and explosions. Moreover, gas leakage frequently occurs during the course of various hazardous chemical accidents. Typical recent cases include the November 2018 leak at Shenghua Company [1] in Hebei, China, where leaked vinyl chloride ignited and exploded, causing 24 deaths and 21 injuries. In November 2019, a pipeline rupture at the TPC facility in Port Neches (USA) [2] led to a massive butadiene release and a vapor cloud explosion, injuring three people and prompting the evacuation of 54,000 residents.

To address leakage problems, the chemical industry began early on to deploy Gas Detection Systems (GDSs) for flammable and toxic gases. International development has gone through several stages:

Single-point alarm (1920s–1960s): Personnel had to carry a detector to patrol locally, with no centralized remote monitoring. Refineries began using portable/fixed catalytic-bead alarms. Even today, carrying portable sensors for inspections remains a key detection approach.
Distributed detection with centralized indication (1960s–1970s): The 4–20 mA current loop plus cabinet annunciator panels constituted the early GDS prototype, centralizing the display of values from tens of detectors. NFPA 72 [3] subsequently adopted this configuration as part of the fire alarm circuit.
System concept establishment (1980s): Gas alarms as well as interlocks, nitrogen purging, and emergency shutdown functions were consolidated into unified PLC (Programmable Logic Controller)/ESD (Emergency Shutdown Device) platforms. Industry practice also began to include quantitative evaluation of detector system coverage. The 1988 Piper Alpha offshore disaster [4] prompted the UK and the broader industry to designate the installation of GDS as a core safety requirement in the petrochemical sector.
Functional safety integration (1990s–2000s): GDSs were integrated into the Safety Integrity Level (SIL) framework and subjected to quantitative availability verification. The first edition of IEC 61511-1 [5] incorporated GDS into SIS (Safety Instrumented System) lifecycle management.
Performance and coverage design (2010s): Evaluating GDS in terms of coverage, PFD (Probability of Failure on Demand), and MTTR (Mean Time to Repair) has become mainstream practice. IEC 60079-29-2 [6] provides guidance on the selection, installation, and maintenance of GDS, while ISA TR84.00.07 [7] introduces 3D quantitative coverage metrics and a scheme compatible with LOPA (Layer of Protection Analysis).
Cloud and IIoT convergence (2020s–): GDSs are increasingly converging with Wi-Fi, HART, and 5G communications, as well as self-diagnostics and digital-twin technologies. Some GDS platforms now support remote OTA calibration and RUL (Remaining Useful Life) and health prognostics. NFPA 715 [8] establishes a dedicated, standalone standard for fuel-gas detection.

China’s GDS development has likewise progressed:

Initiation (1980s–1990s): Early GDSs were typically composed of a control panel coupled with a single-loop controller. GB 12358 [9] was the first Chinese national standard in this domain, which focused on instrument conformity and addressed product qualification for GDS.
Engineering support (1990–2008): Petrochemical enterprises began deploying detector points across critical process units and areas. For the first time, GB 50160 [10] codified GDS as a mandatory provision.
Systematic design (2009): GB/T 50493 [11] specifies explicit requirements for GDS performance, detector placement methodology, multi-level alarming, and interlock interfaces, thereby establishing system-level design criteria.
IoT upgrading (2010–2019): GDSs were integrated with SIS and MES (Manufacturing Execution Systems) via IoT gateways or the Modbus protocol, enabling digital monitoring and maintenance. GB/T 50493 [11] further introduces remote diagnostics, redundant communications, and the concept of lifecycle management.
Intelligent upgrading (2020–present): GDSs have been integrated with enterprise work safety platforms and government supervisory systems. GB 17681 [12] defines GDS as a core monitoring subsystem and introduces hierarchical alarms and early-warning concepts.

Since 2019, China has been building a hazardous chemical monitoring and early-warning system. By installing gateways at enterprise sites, the system collects safety-related monitoring parameters and alarm information from DCS/SCADA/GDS systems and records them in a unified national information system. Large amounts of data have been accumulated to date. Among them, monitoring data from flammable and toxic gas sensors are a key focus and provide a foundation for studying real plant leak processes.

Research on data analytics and pattern recognition for GDS has primarily focused on gas-type identification, leak-scenario recognition, and leak-state prediction. Early on, Stetter et al. [13] proposed a systematic approach to hazardous gas identification using electrochemical sensor arrays and pattern recognition algorithms. By varying heater (Pt, Rh) temperatures and bias settings, a single sensor operated under four modes, yielding 16 independent channels to collect data on 22 hazardous gases (e.g., NO₂, H₂S, CO, benzene). Using Karhunen–Loève transforms, K-Means, and nonlinear mapping, they achieved good discrimination after dimensionality reduction and found that ~10 channels retained ≈99% of discriminative information. The study highlighted the importance of multi-dimensional array responses and normalization but focused on static classification (gas identification), without time-series forecasting, trend analysis, or early-warning/actuation logic, leaving room for dynamic-pattern research.

Sharma et al. [14] advanced multimodality and privacy protection by building a dataset that synchronously captured 7-channel MOX sensor signals (MQ2, MQ3, MQ5, MQ6, MQ7, MQ8, and MQ135) and IR thermal frames (36° field of view, 40–330 °C range, 9 Hz, 32,136 thermal pixels), and by training a federated deep-fusion network to classify four scenarios (leak, smoke, smoke–leak mix, neutral gas) in real time. Potharaju et al. [15] addressed scarce labels and sensor faults in environmental monitoring via a two-stage framework: Isolation Forest auto-labeled anomalies in a 400 k record telemetry dataset (CO, LPG, smoke, temperature/humidity, etc.), after which Random Forest, MLP, and AdaBoost were trained for real-time prediction. For subsea pipelines, Zhang [16] proposed a GNN + time-series framework for leak/defect diagnosis: 5000 historical samples (10-D features: pressure, flow, temperature, etc.) were built into a node-edge topology. GCN + GAT extracted node patterns to reach 92% defect-recognition accuracy, and a 24 h ahead leak risk predictor achieved 85% accuracy.

Kang et al. [17] developed an intelligent early-warning workflow that integrates large-scale monitoring datasets, feature engineering, imbalanced sampling strategies, and ensemble learning. Twenty-three statistical features and two knowledge-driven features were extracted from daily/seasonal mechanisms. DT-RFE selected six key features (e.g., 24 h mean, slope, 7 d differential peak). Borderline SMOTE addressed extreme imbalance (447 NG-leak vs. 96k biogas samples). XGBoost achieved an F-score of 72.7%, a recall of 73.4%, and a precision of 71.2% on an independent test set, reducing on-site verification workload by ≈30% and shortening average response time to 12 min. Thanigaivelu et al. [18] proposed an integrated safety framework combining IoT multi-sensor networks with Isolation Forest for anomaly detection in chemical plants: eight classes of low-power ZigBee/Wi-Fi sensors (temperature/pressure, concentration, vibration, acoustics, etc.) collect data at pipelines/tanks and stream to the cloud. After cleaning, Isolation Forest generates isolation scores to trigger alarms under dynamic thresholds, then links to control systems for emergency stop, cut off, or containment. The system supports continuous learning and shows >98% detection, <3% false-alarm rate, <30 s response delay, scalable via sensor redundancy, and cybersecurity hardening.

Overall, while GDS performance for concentration/type detection and false-alarm control has improved markedly, most studies rely on laboratory data; real-world leak data are hard to obtain and are influenced by meteorology, layout, and sensor placement. Moreover, the leak scenario and risk information embedded in monitoring data are not yet fully mined. This study, therefore, leverages field GDS data, builds a real leak sample set, extracts leak features, and identifies leak patterns.

2. Data Sources

Per GB/T 50493 [11], detector locations for combustible/toxic gases should be determined by a comprehensive analysis of gas physico-chemical properties, characteristics of release sources, plant layout, geography, climate, detector characteristics, reliability requirements, and inspection routes, with placement where gases tend to accumulate and where sampling and maintenance are convenient. For outdoor/open-shed areas, the horizontal distance from a combustible-gas detector to any release source in its coverage should not exceed 10 m; for a toxic-gas detector, it should not exceed 4 m. In enclosed or poorly ventilated semi-open spaces, the limits are ≤5 m for combustible gas and ≤2 m for toxic gas. Detector mounting heights should reflect gas density: 0.3–0.6 m above the floor for gases heavier than air; within 2.0 m above the source for lighter-than-air gases; 0.5–1.0 m below the source for slightly heavier-than-air gases; and 0.5–1.0 m above the source for slightly lighter-than-air gases, as shown schematically in Figure 1.

Most Chinese chemical plants have deployed flammable/toxic gas alarms per these standards and upload data to the early-warning system. As noted in the Introduction, this system—developed by the Ministry of Emergency Management of China and local counterparts—collects chemical plant alarms via on-site IoT gateways and synchronizes data upward through park/county → city → province → ministry, at a 5 min sampling cadence. At the ministerial level, more than 1000 alarm records are logged daily from thousands of enterprises. Because site conditions are complex and alarm causes vary (sensor faults, panel failures, communication drops, power outages, maintenance work, exogenous gas contamination, etc.), not all alarm records are valid dispersion-pattern samples. Starting from the alarm records of various toxic and flammable gas sensors, we identified alarm points with long durations (≥15 min) whose readings were not near the upper or lower limits of the measurement range. For each point, we traced back and retrieved the preceding 4 h of sensor monitoring data and filtered out time series likely caused by sensor faults (step-like patterns). This yielded 56 groups of real leakage samples that constitute the dataset for this study. Sensor types and corresponding samples are displayed in Table 1.

To handle missing time-series values, we first quantified pre-interpolation missingness on a 5 min grid across the 56 series. A bin is counted as “missing” if no raw observation falls inside it. At the dataset level, missingness is the proportion of empty bins among all bins. The dataset-level missing rate was 10.5%, with a median per-series rate of 4.1%; the per-series patterns were no missing bins (37.5%), isolated single-bin gaps (35.7%), short runs of 2–3 bins (16.1%), long runs of ≥4 bins, i.e., ≥20 min (7.1%), and mixed (3.6%).

We also compute the per-bin slope:

\begin{matrix} s_{i} = \frac{|x_{i + 1} - x_{i}|}{|τ_{i + 1} - τ_{i}| / Δ}, Δ = 5 \min \end{matrix}

(1)

We mark steep edges as the top decile of {s_i} and ask what fraction of these edges cross a gap, i.e., satisfy (τ_{i+1} − τ_i)/Δ > 1. If missingness clustered at sharp transitions, this fraction would be high. The median share of steep edges that cross a gap was 0.0, indicating that missingness rarely coincided with sharp edges. Sample plots (after min–max normalization) are shown in Figures 2–5.
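The two missingness checks described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: timestamps are given in minutes from the start of the window and are strictly increasing, and the function names `missing_rate` and `steep_edge_gap_share` are hypothetical, not taken from the authors' code.

```python
import numpy as np

DELTA_MIN = 5.0  # bin width Delta in minutes (the 5 min grid from the text)

def missing_rate(timestamps_min, window_min=240.0):
    """Share of 5 min bins in the window that contain no raw observation."""
    n_bins = int(window_min // DELTA_MIN) + 1      # 49 bins for a 4 h window
    idx = np.floor(np.asarray(timestamps_min, dtype=float) / DELTA_MIN).astype(int)
    idx = idx[(idx >= 0) & (idx < n_bins)]
    occupied = np.zeros(n_bins, dtype=bool)
    occupied[idx] = True
    return 1.0 - occupied.mean()

def steep_edge_gap_share(timestamps_min, values, top_fraction=0.10):
    """Fraction of top-decile slopes s_i whose interval spans more than one bin."""
    t = np.asarray(timestamps_min, dtype=float)
    x = np.asarray(values, dtype=float)
    gap = np.diff(t) / DELTA_MIN                   # (tau_{i+1} - tau_i) / Delta
    slope = np.abs(np.diff(x)) / gap               # per-bin slope, Equation (1)
    threshold = np.quantile(slope, 1.0 - top_fraction)
    steep = slope >= threshold                     # steep edges: top decile of s_i
    return float(np.mean(gap[steep] > 1.0))        # share of steep edges crossing a gap
```

A share near zero, as reported in the text, means the steepest changes were observed on consecutive bins rather than reconstructed across gaps.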

Each sample spans 4 h, chosen so that the period around the leak alarm covers representative scenarios such as short-duration, long-duration, stable, and unstable leaks.

3. Data Preprocessing

Because the 56 samples come from different sensors, we first normalize them using Z-score normalization, which expresses measurements in standard deviations from the sample mean. This removes scale while preserving shape and relative differences, making it suitable for time-series pattern alignment [19]. Steps are as follows:

1. Mean:

\begin{matrix} μ = \frac{1}{n} \sum_{i = 1}^{n} x_{i} \end{matrix}

(2)

where x_i is the measured gas concentration and n is the number of points. With 4 h windows sampled every 5 min, n = 49.

2. Standard deviation:

\begin{matrix} σ = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(x_{i} - μ)}^{2}} \end{matrix}

(3)

3. Standardized value:

\begin{matrix} z_{i} = \frac{x_{i} - μ}{σ} \end{matrix}

(4)

Representative post-standardization features are summarized in Table 2.
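As a minimal sketch of Equations (2)–(4), the per-sample standardization can be written as follows. The handling of a perfectly flat series (σ = 0) is an assumption added here for safety; the source does not say how such cases are treated.

```python
import numpy as np

def zscore(x):
    """Z-score a single series using the population SD (1/n), as in Equation (3)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()                 # Equation (2)
    sigma = x.std(ddof=0)         # Equation (3), divisor n
    if sigma == 0.0:
        # Assumption: a constant series carries no shape information, map it to zeros.
        return np.zeros_like(x)
    return (x - mu) / sigma       # Equation (4)
```

After this step every series has zero mean and unit variance, so curves from sensors with different ranges and units can be compared by shape alone.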

In Table 2, standard deviation of first differences reflects short-term volatility/jitter and step frequency, computed by the following formula:

\begin{matrix} d_{t} = x_{t} - x_{t - 1}, t = 2, \dots, N \end{matrix}

(5)

\begin{matrix} \bar{d} = \frac{1}{N - 1} \sum_{t = 2}^{N} d_{t} \end{matrix}

(6)

\begin{matrix} s_{d} = \sqrt{\frac{1}{(N - 1) - 1} \sum_{t = 2}^{N} {(d_{t} - \bar{d})}^{2}} \end{matrix}

(7)

For the representative samples in Table 2, the standard deviation of first differences lies in the range 0.20–0.26, indicating that these series contain either a step-like increase or pronounced fluctuations around an elevated level. In other words, this feature acts as a proxy for leak intensity and edge strength: higher values correspond to stronger or more jagged changes in concentration.
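Equations (5)–(7) amount to the sample standard deviation of the differenced series. A short sketch (the function name `diff_std` is hypothetical):

```python
import numpy as np

def diff_std(x):
    """Sample standard deviation of first differences, Equations (5)-(7)."""
    d = np.diff(np.asarray(x, dtype=float))   # d_t = x_t - x_{t-1}, t = 2..N
    return float(d.std(ddof=1))               # divisor (N - 1) - 1, as in Equation (7)
```

A pure linear ramp has constant differences and therefore scores 0, while a single step in an otherwise flat series gives a clearly positive value, which is why the feature responds to edges and jitter rather than to slow drift.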

Autocorrelation r_k (for k = 1,3,6) measures smoothness/persistence and periodic structure over the corresponding lags.

\begin{matrix} \bar{x} = \frac{1}{N} \sum_{t = 1}^{N} x_{t} \end{matrix}

(8)

\begin{matrix} r_{k} = \frac{\sum_{t = k + 1}^{N} (x_{t} - \bar{x}) (x_{t - k} - \bar{x})}{\sum_{t = 1}^{N} {(x_{t} - \bar{x})}^{2}} \end{matrix}

(9)

In Table 2, the lag 1 autocorrelation is very high (0.95–0.97), indicating that each point is very similar to the previous one and that deviations from baseline do not immediately dissipate. The autocorrelation at lag 3 remains in the range 0.65–0.82, and even at lag 6 it stays positive (0.14–0.57). Given our 5 min grid, lags 3 and 6 correspond to approximately 15 and 30 min, respectively. This slow decay of autocorrelation over 3–6 steps is consistent with a sustained or slowly decaying leakage process.
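The lag-k autocorrelation of Equations (8) and (9) can be sketched directly in NumPy. The function name is hypothetical, and k is assumed to be a positive integer smaller than the series length.

```python
import numpy as np

def autocorr(x, k):
    """Lag-k autocorrelation r_k, Equations (8)-(9); requires 1 <= k < len(x)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                          # deviations from the mean, Equation (8)
    numerator = np.sum(xc[k:] * xc[:-k])       # sum over t = k+1..N of (x_t - mean)(x_{t-k} - mean)
    denominator = np.sum(xc ** 2)
    return float(numerator / denominator)

# On the 5 min grid, lags k = 1, 3, 6 correspond to 5, 15 and 30 min.
```

A rapidly alternating series gives a strongly negative r_1, whereas the smooth, persistent curves described in the text give values close to 1.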

FFT band 1 energy [20] represents the share of low-frequency variation (trend). With

\begin{matrix} X_{m} = \sum_{t = 1}^{N} x_{t} \exp (- j \frac{2 π (m - 1) (t - 1)}{N}), m = 1, \dots, ⌊\frac{N}{2}⌋ \end{matrix}

(10)

\begin{matrix} P_{m} = {|X_{m}|}^{2} \end{matrix}

(11)

\begin{matrix} B_{1} = \{m_{a}, \dots, m_{b}\} \subseteq \{1, \dots, ⌊\frac{N}{2}⌋\} \end{matrix}

(12)

The band 1 energy is

\begin{matrix} \mathrm{fft}_{\mathrm{band1}} = \sum_{m \in B_{1}} P_{m} \end{matrix}

(13)

In Table 2, band 1 energy is consistently on the order of 1.9 × 10⁴–2.3 × 10⁴. This indicates that a large fraction of the signal power in these samples is concentrated at slow time scales rather than in high-frequency components.
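Equations (10)–(13) can be sketched with `numpy.fft.fft`, whose zero-based output index equals m − 1 in the notation above. The band limits m_a and m_b are not given in the text, so they are left as explicit arguments here; any particular values used are an assumption.

```python
import numpy as np

def fft_band_energy(x, m_a, m_b):
    """Sum of P_m = |X_m|^2 over the band m_a..m_b (1-based m, m = 1 is the DC term)."""
    x = np.asarray(x, dtype=float)
    half = len(x) // 2
    if not (1 <= m_a <= m_b <= half):
        raise ValueError("band must satisfy 1 <= m_a <= m_b <= floor(N / 2)")
    X = np.fft.fft(x)                  # X[m - 1] corresponds to X_m in Equation (10)
    P = np.abs(X) ** 2                 # Equation (11)
    return float(P[m_a - 1:m_b].sum()) # Equations (12)-(13)
```

For example, a pure cosine with one cycle over the window puts all of its one-sided power into m = 2, so a band that includes m = 2 captures it and a higher band returns essentially zero.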

4. Leak Pattern Recognition with K-Means

Based on the features in Table 2 (augmented with zero-crossing rate, skewness, kurtosis, etc.), we constructed a feature matrix from the raw time series and applied K-Means clustering, which partitions samples into K clusters by minimizing the within-cluster sum of squared distances to the cluster centroids [21]. The procedure is as follows:

1. Randomly initialize K centroids.

2. Assign each sample F_i to the nearest centroid c_j using Euclidean distance.

\begin{matrix} d (F_{i}, c_{j}) = \sqrt{\sum_{k = 1}^{m} {(F_{i k} - c_{j k})}^{2}} \end{matrix}

(14)

where m is the feature dimension.

3. Update each centroid to the mean of points assigned to it.

4. Repeat steps 2–3 until convergence or a maximum iteration count is reached.
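The four steps above can be sketched as a plain NumPy implementation of Lloyd's algorithm. This is an illustration, not the authors' code: the seed, the iteration cap, and the choice to keep an old centroid when a cluster becomes empty are assumptions.

```python
import numpy as np

def kmeans(F, k, max_iter=100, seed=0):
    """Minimal K-Means on a feature matrix F of shape (n_samples, m)."""
    F = np.asarray(F, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct samples as initial centroids.
    centroids = F[rng.choice(len(F), size=k, replace=False)]
    labels = np.full(len(F), -1)
    for _ in range(max_iter):
        # Step 2: Euclidean distance of every sample to every centroid, Equation (14).
        dist = np.linalg.norm(F[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        # Step 4: stop when assignments no longer change.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its members.
        for j in range(k):
            members = F[labels == j]
            if len(members):           # assumption: keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Because the features have very different scales (for instance autocorrelations near 1 versus band energies near 10⁴), it would usually make sense to standardize the columns of F before calling such a routine; whether the authors did so is not stated.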

The choice of K followed the elbow method, plotting K vs. within-cluster sum of squares (WCSS) to locate the “elbow” (Figure 6). The final K we selected is 7. When K = 7, the Davies–Bouldin index (DBI) and Calinski–Harabasz index (CHI) are 1.010 and 18.0, respectively. The trends of DBI (↓) and CHI (↑) are consistent with this choice. The resulting clusters are shown in Figure 7. Quantitative summaries per cluster are displayed in Table 3.
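The three quantities used to choose K can be computed from a feature matrix and a label vector as sketched below. The definitions follow the standard formulas for WCSS, the Davies–Bouldin index, and the Calinski–Harabasz index; the function name is hypothetical, and at least two clusters and more samples than clusters are assumed.

```python
import numpy as np

def cluster_scores(F, labels):
    """Return (WCSS, Davies-Bouldin index, Calinski-Harabasz index) for one clustering."""
    F = np.asarray(F, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    k, n = len(ids), len(F)
    cents = np.array([F[labels == j].mean(axis=0) for j in ids])
    sizes = np.array([(labels == j).sum() for j in ids])

    # Within-cluster sum of squares (the elbow-plot quantity).
    wcss = float(sum(np.sum((F[labels == j] - c) ** 2) for j, c in zip(ids, cents)))

    # Calinski-Harabasz: between-cluster over within-cluster dispersion (higher is better).
    between = np.sum(sizes * np.sum((cents - F.mean(axis=0)) ** 2, axis=1))
    chi = float((between / (k - 1)) / (wcss / (n - k)))

    # Davies-Bouldin: worst-case ratio of scatter to centroid separation (lower is better).
    scatter = np.array([np.linalg.norm(F[labels == j] - c, axis=1).mean()
                        for j, c in zip(ids, cents)])
    sep = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)      # exclude each cluster's comparison with itself
    dbi = float(np.mean(np.max((scatter[:, None] + scatter[None, :]) / sep, axis=1)))
    return wcss, dbi, chi
```

Evaluating this for a range of K values and plotting WCSS against K gives the elbow curve, while the DBI and CHI values give a second opinion on the same choice.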

Cluster 6 (n = 17): Low-amplitude fluctuation with sparse pulses. Most standardized values lie within ±1 σ (Figure 8), and the baseline is nearly flat (no obvious drift). Although local fluctuations exist (due to flow fields, sensor characteristics, and dispersion complexity), the global mean level is stable, which is indicative of a sustained-leak process.

Cluster 0 (n = 15): Low-amplitude, slow drift. Most curves drift slowly within ±1 σ for extended periods (Figure 9). Read together with the raw curves, this reflects either long-duration leaks that have reached sensor range limits or post-leak decay toward zero.

Cluster 5 (n = 10): Interrupted process. Raw curves show a clear leak interruption (Figure 10), possibly due to ventilation or manual intervention.

Cluster 4 (n = 6): Start-up peak with exponential decay. Multiple traces reached 4–5 σ around 00:00 and then decayed monotonically, returning to approximately 0 σ within 1–3 h (Figure 11). This pattern suggests a transient release that was subsequently diluted by ventilation.

5. Leak Pattern Recognition with 1D-CNN Autoencoder

A 1D-CNN autoencoder offers the following: (i) local-pattern capture—kernels naturally extract jumps, pulses, edges, and periodicities [22], while stacked/multi-scale kernels cover short/medium/long windows; (ii) translation invariance via shared weights and “same” padding, reducing sensitivity to event phase misalignment across samples; (iii) robust global summarization—a global average pooling (GAP) layer forms a sample-level latent vector less sensitive to outliers; and (iv) label-free training via reconstruction loss (MSE)—enables learning from limited data. The procedure is as follows:

1. Encoder. Multiple 1D convolution layers (kernel sizes 7, 5, 3; stride 1) are followed by a 1 × 1 convolution (shown in Figure 12) to compress channels to d. GAP over time yields the latent vector:

\begin{matrix} h = \mathrm{ConvStack} (x) \end{matrix}

(15)

\begin{matrix} z = \mathrm{GAP} (\mathrm{Conv}_{1 \times 1} (h)) \in \mathbb{R}^{d} \end{matrix}

(16)

2. Decoder. A fully connected layer expands z to R^{C × T}, where C is the number of expanded channels, followed by a symmetric convolution stack (kernels 3, 5, 7) to reconstruct

\begin{matrix} \hat{x} \in \mathbb{R}^{1 \times T} \end{matrix}

(17)

3. Training objective (point-wise MSE):

\begin{matrix} \mathcal{L}_{\mathrm{MSE}} (x, \hat{x}) = \frac{1}{T} \sum_{t = 1}^{T} {(x_{t} - {\hat{x}}_{t})}^{2} \end{matrix}

(18)
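To make the shape flow of the encoder (Equations (15) and (16)) and the loss (Equation (18)) concrete, the following NumPy sketch runs a forward pass with random weights. It is not the trained model: the channel widths, the latent size d = 4, and the ReLU activation are assumptions chosen for illustration, and the decoder and training loop are omitted.

```python
import numpy as np

def conv1d_same(x, w, b):
    """Stride-1 convolution with zero 'same' padding. x: (C_in, T); w: (C_out, C_in, K), K odd."""
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    # windows[c, t, :] is the length-K slice of channel c centered at time t
    windows = np.stack([xp[:, t:t + k] for t in range(T)], axis=1)
    return np.einsum("oik,itk->ot", w, windows) + b[:, None]

def init_params(rng, channels=(8, 16, 16), kernels=(7, 5, 3), d=4):
    """Random weights for kernel sizes 7, 5, 3 plus a 1x1 convolution to d channels."""
    stack, c_in = [], 1
    for c_out, k in zip(channels, kernels):
        stack.append((rng.normal(0.0, 0.1, (c_out, c_in, k)), np.zeros(c_out)))
        c_in = c_out
    return {"stack": stack, "conv1x1": (rng.normal(0.0, 0.1, (d, c_in, 1)), np.zeros(d))}

def encode(x, params):
    """Map a series of length T to a latent vector z of length d."""
    h = np.asarray(x, dtype=float)[None, :]            # (1, T)
    for w, b in params["stack"]:
        h = np.maximum(conv1d_same(h, w, b), 0.0)      # ConvStack with ReLU (assumed), Equation (15)
    w1, b1 = params["conv1x1"]
    return conv1d_same(h, w1, b1).mean(axis=1)         # 1x1 conv, then GAP over time, Equation (16)

def mse(x, x_hat):
    """Point-wise reconstruction loss, Equation (18)."""
    return float(np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2))

rng = np.random.default_rng(0)
z = encode(rng.normal(size=49), init_params(rng))      # one 4 h sample, T = 49
```

Because the time axis is averaged out by the pooling step, the latent vector has the same length regardless of where in the window an event occurs, which is the property the text relies on for phase-insensitive clustering.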

Optimization used Adam with mixed-precision CUDA training and gradient clipping; early stopping was based on validation-set MSE. Autoencoder hyperparameters are shown in Table 4. Training ran for 111 epochs in total. The best validation loss occurred at epoch 81 (validation MSE 0.7632, training MSE 0.6382), which we adopt as the checkpoint for all downstream analyses. By the final epoch, training and validation MSE were 0.5577 and 0.8204, respectively, yielding a generalization gap of 0.2628. After the best epoch (≥81), the validation series exhibited low variability (SD ≈ 0.050), and both curves showed gentle downward trends over the last 10 epochs (slopes: train −0.0010, val −0.0053). With an early-stopping patience of 30, the run terminated at epoch 81 + 30 = 111, consistent with the observed plateau behavior (shown in Figure 13).
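The relationship between the best epoch, the patience, and the stopping epoch can be illustrated with a small, framework-independent sketch. The loss curve used in the usage example is synthetic and only mimics the reported shape; it is not the authors' training log.

```python
def early_stopping_run(val_losses, patience=30):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss      # new best: this is the checkpoint to keep
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch                 # no improvement for `patience` epochs
    return best_epoch, len(val_losses)

# Synthetic curve: improves until epoch 81, then stays above the best value.
synthetic = [2.0 - 0.01 * e for e in range(1, 82)] + [1.5] * 100
best, stop = early_stopping_run(synthetic, patience=30)
```

With a best epoch of 81 and a patience of 30, the loop stops at epoch 111, matching the arithmetic in the text.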

After training, the latent vectors {z_i} of all samples were clustered with K-Means. A 2D PCA projection colored by cluster was used to visualize separability (Figure 14). Quantitative summaries per cluster are given in Table 5.
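The 2D projection used for visualization can be sketched with an SVD-based PCA. The function name is hypothetical, and the input is assumed to be a matrix with one latent vector per row.

```python
import numpy as np

def pca_2d(Z):
    """Project latent vectors Z (n_samples, d) onto their first two principal components."""
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)                            # center each latent dimension
    _, S, Vt = np.linalg.svd(Zc, full_matrices=False)  # rows of Vt are principal directions
    coords = Zc @ Vt[:2].T                             # (n_samples, 2) plotting coordinates
    explained = (S[:2] ** 2) / np.sum(S ** 2)          # variance share of the two components
    return coords, explained
```

The explained-variance shares indicate how much of the latent structure the scatter plot actually shows; clusters that overlap in 2D may still be separated in the full d-dimensional space where K-Means is run.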

Cluster 6 (n = 11): Curves fluctuate slightly around [−0.7, −0.5] σ with multiple “instantaneous rise → hold 5–20 min → sharp drop” segments (Figure 15), reflecting transitions from a stable leak to leak cessation.

Cluster 1 (n = 12): Several curves oscillate between 2 and 4 σ before entering a declining regime (Figure 16); step-downs to [−0.7, −0.4] σ with subsequent amplitude convergence suggest the aftermath and decay of an instantaneous leak.

Cluster 0 (n = 11): Many sequences oscillate within [−1, 1] σ with frequent zero-crossings (Figure 17) and no long-term trend (mostly low-to-moderate jitter). A few nearly constant, low-dynamic curves resemble sustained leaks (as in Section 4, cluster 6).

Cluster 3 (n = 8): Many sequences show low amplitude in the first half (around [−0.6, 0] σ) and multiple positive spikes of ≥3 σ later (Figure 18), indicating the onset of leakage from a normal baseline.

Cluster 4 (n = 7): Most curves rise from a negative or near-zero level in the middle of the window to peaks of ≈1.2–2.3 σ and then fall back uniformly (Figure 19), again characteristic of instantaneous leaks.

6. Conclusions

Using real field GDS data from chemical facilities, we constructed a dataset of 56 valid leak samples. After feature extraction, normalization, and interpolation-based completion, we built a multi-dimensional time-series feature matrix and applied two unsupervised methods—K-Means and a 1D-CNN autoencoder—for leak-pattern identification, yielding the following conclusions:

Confirmed diversity of leak patterns. K-Means shows clear morphological differences among samples, separating sustained, instantaneous, fluctuating, and externally interrupted leaks. This indicates that field sensor data indeed embeds discriminable leak-scenario information that can support incident progression assessment and risk analysis.
Feature-based clustering is intuitive. K-Means quickly exposes global shape characteristics (e.g., persistence, stability), suiting macro-pattern recognition. However, it depends on hand-crafted features, struggles to capture complex temporal dynamics comprehensively, and is less robust to noise and phase shifts.
Autoencoders better capture temporal dynamics. The 1D-CNN autoencoder, via convolutions and global pooling, learns sample-level latent vectors that automatically encode local jumps, periodicities, and global trends without labels. Clusters reveal fine-grained evolution—e.g., transitions from stable leak to stop or post-instantaneous decay—better reflecting dynamic processes than K-Means.
Complementary methods. K-Means excels at macro-scene classification and rapid partitioning; the autoencoder provides refined temporal-evolution analysis. Combined, they enable a “coarse recognition + fine analysis” framework that enhances early warning accuracy and interpretability.
Practical implications. Beyond demonstrating the feasibility of pattern recognition and scenario classification with real plant data, this approach suggests how to optimize hazardous chemical early-warning systems. By identifying distinct leak patterns and mapping them—via CFD and site feedback—to risk grades, the paradigm can evolve from single-point alarms to intelligent scenario discrimination with linked early-warning actions.
Limitations and outlook. Constraints include limited sample size, imbalanced scenario distribution, and multiple external interferences. Future work will expand cross-plant datasets and incorporate multi-source fusion (video, meteorology, process parameters) to build multimodal leak-discrimination models. Autoencoders with attention mechanisms or graph neural networks may better capture spatiotemporal dispersion patterns, improving generalization and predictive accuracy.

In summary, leak-pattern recognition based on real-world GDS data is not only feasible but also provides more intelligent and granular technical support for hazardous chemical risk monitoring, informing the digital transformation of work safety and emergency management.

Author Contributions

Conceptualization, L.G. and J.X.; methodology, J.X. and K.Z.; data curation, X.Z.; writing—original draft preparation, J.X.; writing—review and editing, X.Z. and K.Z.; validation, W.Y.; funding acquisition, W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Basic Scientific Research Funding of the China Academy of Safety Science and Technology (2025JBKY12), Key Science and Technology Project of the Ministry of Emergency Management of the People’s Republic of China (2024EMST090901), and National Key R&D Program of China (2024YFC3013600).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the first author or the corresponding author.

Acknowledgments

During the preparation of this study, the authors used ChatGPT 4o for the purposes of coding and translation. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

Author Xiaoguang Zhu was employed by the company Shandong Environmental Protection Development Group Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Bulletin of the Office of the Work Safety Committee of the State Council Regarding the “11·28” Major Explosion and Deflagration Accident in Zhangjiakou, Hebei Province. Available online: https://www.mem.gov.cn/gk/zfxxgkpt/fdzdgknr/202012/t20201207_374013.shtml (accessed on 15 August 2025).
TPC Port Neches Explosions and Fire. Available online: https://www.csb.gov/tpc-port-neches-explosions-and-fire/ (accessed on 15 August 2025).
National Fire Protection Association. NFPA 72: National Fire Alarm and Signaling Code, 2019 ed.; NFPA: Quincy, MA, USA, 2019; Available online: https://edufire.ir/storage/Library/elam/NFPA%2072-2019.pdf (accessed on 20 August 2025).
Cullen, W.D. The Public Inquiry into the Piper Alpha Disaster; HMSO: London, UK, 1990. Available online: https://www.hse.gov.uk/offshore/piper-alpha-disaster-public-inquiry.htm (accessed on 20 August 2025).
IEC 61511-1:2016; Functional Safety—Safety Instrumented Systems for the Process Industry Sector—Part 1. IEC: London, UK, 2016. Available online: https://webstore.iec.ch/en/publication/24241 (accessed on 20 August 2025).
IEC 60079-29-2:2015; Explosive Atmospheres—Part 29-2: Gas Detectors—Selection, Installation, Use and Maintenance. IEC: London, UK, 2015. Available online: https://webstore.iec.ch/en/publication/21961 (accessed on 20 August 2025).
ISA-TR84.00.07-2018; Guidance on the Evaluation of Fire, Combustible Gas, and Toxic Gas System Effectiveness. ISA: Research Triangle Park, NC, USA, 2018. Available online: https://www.isa.org/products/isa-tr84-00-07-2018-guidance-on-the-evaluation-of (accessed on 20 August 2025).
NFPA 715; Standard for the Installation of Fuel Gas Detection and Warning Equipment. NFPA: Quincy, MA, USA, 2023. Available online: https://www.nfpa.org/codes-and-standards/nfpa-715-standard-development/715 (accessed on 20 August 2025).
GB 12358-2024; General Technical Requirements for Workplace Environmental Gas Detectors and Alarms. China Standards Press: Beijing, China, 2024. Available online: https://openstd.samr.gov.cn/bzgk/gb/newGbInfo?hcno=CAB81EA2B8D4788DF49A803C73C0507E (accessed on 22 August 2025).
GB 50160-2008; Code for Fire Protection Design of Petrochemical Enterprises. China Standards Press: Beijing, China, 2008. Available online: https://sc.119.gov.cn/scxfjyzd/gfbz/2014/7/7/630369cc8a33455ea6fc7782f68c890b.shtml (accessed on 22 August 2025).
GB/T 50493-2019; Design Standard for Combustible and Toxic Gas Detection and Alarm in Petrochemical Industry. China Planning Press: Beijing, China, 2019. Available online: https://www.gov.cn/zhengce/zhengceku/2019-09/25/content_5454468.htm (accessed on 22 August 2025).
GB 17681-2024; Technical Specification for Safety Monitoring of Major Hazard Installations of Hazardous Chemicals. China Standards Press: Beijing, China, 2024. Available online: https://openstd.samr.gov.cn/bzgk/gb/newGbInfo?hcno=07550FC116FD753A83069D0A0F963891 (accessed on 22 August 2025).
Stetter, J.R.; Jurs, P.C.; Rose, S.L. Detection of hazardous gases and vapors: Pattern-recognition analysis of data from an electrochemical sensor array. Anal. Chem. 1986, 58, 860–866.
Sharma, A.; Khullar, V.; Kansal, I.; Chhabra, G.; Arora, P.; Popli, R.; Kumar, R. Gas detection and classification using multimodal data based on federated learning. Sensors 2024, 24, 5904.
Potharaju, S.; Tirandasu, R.K.; Tambe, S.N.; Jadhav, D.B.; Kumar, D.A.; Amiripalli, S.S. A two-step machine-learning approach for predictive maintenance and anomaly detection in environmental sensor systems. MethodsX 2025, 14, 103181.
Zhang, L. Optimization of oil and gas pipeline leakage data and defect identification based on graph neural processing. Ann. Data Sci. 2025, 12, 1413–1430.
Kang, Z.; Qian, X.; Li, Y.; Hou, L.; Huang, Z.; Duanmu, W.; Yuan, M. Feature extraction of natural gas leakage for an intelligent warning model: A data-driven analysis and modeling. Process Saf. Environ. Prot. 2023, 174, 574–584.
Thanigaivelu, P.S.; Ranganathan, C.S.; Priya, S.; Asha, R.M.; Murugan, S.; GaneshBabu, T.R. Securing chemical processing plants using Isolation Forest algorithm and IoT sensors for leakage prevention. In Proceedings of the 7th International Conference on Inventive Computation Technologies (ICICT 2024), Kathmandu, Nepal, 24–26 April 2024.
Paparrizos, J.; Gravano, L. k-Shape: Efficient and accurate clustering of time series. ACM SIGMOD Rec. 2016, 45, 69–76.
Kazempour, D.; Beer, A.; Schrüfer, O.; Seidl, T. Clustering Trend Data Time-Series through Segmentation of FFT-decomposed Signal Constituents. In Proceedings of the Conference on “Lernen, Wissen, Daten, Analysen” (LWDA 2019), Berlin, Germany, 30 September–2 October 2019; Available online: https://ceur-ws.org/Vol-2454/paper_64.pdf (accessed on 22 August 2025).
Wang, X.; Smith, K.; Hyndman, R. Characteristic-based clustering for time series data. Data Min. Knowl. Discov. 2006, 13, 335–364.
Chen, S.; Yu, J.; Wang, S. One-dimensional convolutional auto-encoder-based feature learning for fault diagnosis of multivariate processes. J. Process Control 2020, 87, 54–67.

Figure 1. Schematic of on-site sensor placement requirements.

Figure 2. Normalized curve of Sample 1.

Figure 3. Normalized curve of Sample 23.

Figure 4. Normalized curve of Sample 11.

Figure 5. Normalized curve of Sample 19.

Figure 6. k vs. total within-cluster sum of squared deviations.

Figure 7. K-Means clustering results.

Figure 8. Curve overlay of K-Means cluster 6.

Figure 9. Curve overlay of K-Means cluster 0.

Figure 10. Curve overlay of K-Means cluster 5.

Figure 11. Curve overlay of K-Means cluster 4.

Figure 12. 1D-CNN encoder structure.

Figure 13. 1D-CNN autoencoder training curve.

Figure 14. 1D-CNN autoencoder clustering results.

Figure 15. Curve overlay of autoencoder cluster 6.

Figure 16. Curve overlay of autoencoder cluster 1.

Figure 17. Curve overlay of autoencoder cluster 0.

Figure 18. Curve overlay of autoencoder cluster 3.

Figure 19. Curve overlay of autoencoder cluster 4.

Table 1. List of sensor types and corresponding samples.

Sensor Type	Sample No.
carbon monoxide	18, 29, 48, 10, 11, 34, 44, 9
butadiene	12, 23
propylene	39, 46
acrylonitrile	14, 19, 20, 21, 30, 31, 32
sulfur dioxide	60
methane	55
ammonia	15, 33, 51, 43, 13, 24
vinyl chloride	57, 59
ethylene oxide	22, 61
benzene	0, 1, 35, 36, 37, 40, 41, 42, 5, 52, 54, 56, 6, 7, 8
unknown	38, 45, 47, 58, 16, 17, 27, 28, 53

Table 2. Post-standardization features summary.

Sample Number	Min	Max	Standard Deviation of First Differences	Autocorrelation at Lag 1	Autocorrelation at Lag 3	Autocorrelation at Lag 6	FFT Band 1 Energy
1	−1.32706	4.620586	0.260192	0.950937	0.651711	0.139321	19,652.6264
52	−1.65775	2.376913	0.2424456	0.959969	0.708493	0.243905	20,988.07925
44	−0.50482	3.545038	0.2228604	0.968425	0.767406	0.36791	22,363.36531
48	−0.36168	3.805253	0.2224088	0.968849	0.776096	0.417579	22,608.58266
40	−0.83239	3.518055	0.219864	0.969357	0.792958	0.554151	22,535.44202
0	−1.00419	2.602187	0.2155932	0.966879	0.748485	0.334752	20,423.71543
13	−0.53538	3.412387	0.2108702	0.972253	0.794703	0.456207	22,888.40488
24	−0.53538	3.412387	0.2108702	0.972253	0.794703	0.456207	22,888.40488
56	−1.08731	3.064646	0.2051731	0.972229	0.789689	0.427625	21,555.48367
43	−0.37541	3.465129	0.2036276	0.97476	0.821291	0.573815	23,492.06252
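
The feature columns of Table 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of the population standard deviation, and the equal-width split of the positive-frequency spectrum into four bands are assumptions made for illustration. The band limits and scaling behind the tabulated FFT energies are not restated in this section, so values produced by the sketch need not match the table.

```python
import cmath
import math
from statistics import mean, pstdev


def zscore(series):
    """Z-score normalize a series (population standard deviation, assumed)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def diff_std(series):
    """Standard deviation of the first differences x[t+1] - x[t]."""
    return pstdev([b - a for a, b in zip(series, series[1:])])


def autocorr(series, lag):
    """Sample autocorrelation coefficient at the given lag."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[t] - mu) * (series[t + lag] - mu)
              for t in range(len(series) - lag))
    return num / denom


def band_energy(series, n_bands=4, band=0):
    """Sum of squared DFT magnitudes in one equal-width band (DC excluded).

    A naive O(n^2) DFT keeps the sketch dependency-free; the number of
    bands is a hypothetical choice.
    """
    n = len(series)
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        power.append(abs(coeff) ** 2)
    width = math.ceil(len(power) / n_bands)
    return sum(power[band * width:(band + 1) * width])


def extract_features(raw):
    """Compute the Table 2 style features on the z-scored series."""
    z = zscore(raw)
    return {
        "min": min(z), "max": max(z),
        "diff_std": diff_std(z),
        "acf_1": autocorr(z, 1), "acf_3": autocorr(z, 3), "acf_6": autocorr(z, 6),
        "fft_band1_energy": band_energy(z, band=0),
    }
```

Calling `extract_features` on one raw concentration series returns a dictionary with one entry per feature column, which could then be stacked across samples to form a clustering input.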

Table 3. Quantitative summaries per cluster of K-Means.

Cluster	Sample Count	Median |z|	Median Amplitude	Median Total Time with |z| > 2.0	Median Zero-Crossing Count
0	15	0.77126481	2.748384744	5	3
1	4	0.270706386	1.42898119	12.5	4
2	1	0.147654238	0	30	2
3	3	0.441616587	1.654499535	5	2
4	6	0.494996604	3.104568234	15	2
5	10	0.455661096	2.968139332	25	2
6	17	0.552570624	2.666793355	15	11
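
One way per-cluster summaries of this kind could be computed is sketched below in Python. The metric definitions are assumptions, not taken from the paper: amplitude is treated as max minus min of the z-scored series, the time above threshold as the number of points with |z| > 2.0 multiplied by a hypothetical sampling interval `step`, and zero crossings as sign changes with exact zeros ignored. The function names are likewise illustrative.

```python
from statistics import median


def sample_metrics(z, step=1.0, threshold=2.0):
    """Per-sample metrics for a z-scored series; `step` is an assumed sampling interval."""
    abs_z = [abs(v) for v in z]
    # Track the sign of each non-zero point and count changes between neighbours.
    signs = [v > 0 for v in z if v != 0]
    crossings = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return {
        "abs_z_median": median(abs_z),
        "amplitude": max(z) - min(z),
        "time_above": step * sum(1 for v in abs_z if v > threshold),
        "zero_crossings": crossings,
    }


def cluster_summary(samples, labels, **kwargs):
    """Median of every per-sample metric within each cluster label."""
    grouped = {}
    for z, label in zip(samples, labels):
        grouped.setdefault(label, []).append(sample_metrics(z, **kwargs))
    return {
        label: {"count": len(rows),
                **{k: median(r[k] for r in rows) for k in rows[0]}}
        for label, rows in grouped.items()
    }
```

Here `labels` would be the cluster assignments from either method, so the same helper could serve both the K-Means and the autoencoder summaries.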

Table 4. Autoencoder hyperparameters.

Item	Setting
Latent dimension (d)	16
Learning rate (Adam)	1 × 10⁻³
Batch size	64
Max epochs	300
Early-stopping patience	30
Weight decay	1 × 10⁻⁵
Dropout	0.1
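
How the maximum epoch count and the early-stopping patience in Table 4 interact can be sketched in plain Python. The names `AutoencoderConfig` and `stopping_epoch` are hypothetical, and the assumption that a single per-epoch loss is monitored for improvement is made for illustration only; this section does not state which loss the authors tracked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoencoderConfig:
    """Settings copied from Table 4; field names are illustrative."""
    latent_dim: int = 16
    learning_rate: float = 1e-3   # Adam
    batch_size: int = 64
    max_epochs: int = 300
    patience: int = 30            # early stopping
    weight_decay: float = 1e-5
    dropout: float = 0.1


def stopping_epoch(losses, cfg):
    """Return (last epoch run, best epoch) for a list of per-epoch losses.

    Training stops once `cfg.patience` consecutive epochs pass without a
    new minimum, or when `cfg.max_epochs` is reached, whichever is first.
    """
    best, best_epoch = float("inf"), -1
    for epoch, loss in enumerate(losses[:cfg.max_epochs]):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= cfg.patience:
            return epoch, best_epoch
    return min(len(losses), cfg.max_epochs) - 1, best_epoch
```

In a real training loop the weights from the best epoch would typically be restored after stopping; that step is omitted here.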

Table 5. Quantitative summaries per cluster of autoencoder.

Cluster	Sample Count	Median |z|	Median Amplitude	Median Total Time with |z| > 2.0	Median Zero-Crossing Count
0	11	0.765957	2.759981	10	10
1	12	0.454202	2.732783	15	3
2	3	0.671843	3.45946	15	3
3	8	0.520103	2.674599	15	10.5
4	7	0.966703	2.699907	0	2
5	4	0.600381	3.27234	15	3
6	11	0.361681	2.857962	20	5

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xi, J.; Guan, L.; Zhu, X.; Zong, K.; Yan, W. Pattern Recognition of Hazardous Gas Leak Monitoring Data Based on Field Sensors. Processes 2026, 14, 108. https://doi.org/10.3390/pr14010108
