A Machine Learning Approach for the Classification of Refrigerant Gases

Argirusis, Nikolaos; Konstantaras, John; Argirusis, Christos; Dimokas, Nikos; Thanopoulos, Sotirios; Karvelis, Petros

doi:10.3390/app14146230


by Nikolaos Argirusis ¹, John Konstantaras ², Christos Argirusis ¹,³, Nikos Dimokas ⁴, Sotirios Thanopoulos ⁵ and Petros Karvelis ⁶,*

¹ mat4nrg GmbH, 38678 Clausthal-Zellerfeld, Germany

² Energy and Environmental Research Laboratory, National and Kapodistrian University of Athens, 34400 Psachna, Evia, Greece

³ School of Chemical Engineering, National Technical University of Athens, Zografou, 15773 Athens, Greece

⁴ Department of Informatics, University of Western Macedonia, 52100 Kastoria, Greece

⁵ School of Mechanical Engineering, National Technical University of Athens, Zografou, 15773 Athens, Greece

⁶ Department of Informatics and Telecommunications, University of Ioannina, 45110 Ioannina, Greece

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(14), 6230; https://doi.org/10.3390/app14146230

Submission received: 6 June 2024 / Revised: 6 July 2024 / Accepted: 16 July 2024 / Published: 17 July 2024


Abstract

Combining an Internet of Things-driven approach with machine learning algorithms holds great promise in discerning pure gases across various applications. Interconnecting gas sensors within a network allows for continuous monitoring and real-time environmental analysis, producing valuable data for machine learning models. Utilizing supervised learning algorithms, like random forests, enables the creation of accurate classification models that can effectively distinguish between different pure gases based on their distinct features, such as spectral signatures or sensor responses. This groundbreaking integration of the Internet of Things and Machine Learning fosters the development of robust, automated gas detection systems, ensuring high accuracy and minimal delay in recognizing pure gases. Consequently, it opens avenues for enhanced safety, efficiency, and environmental sustainability in numerous industrial and commercial scenarios.

Keywords: machine learning; refrigerant gases; classification; environmental sustainability; gas detection

1. Introduction

The rapid progress of the Internet of Things (IoT) has resulted in an abundance of interconnected devices, generating vast amounts of data with substantial potential to enhance system and process performance [1]. Among the promising applications is mobile gas detection and identification [2], where integrating IoT with machine learning techniques can provide a more precise and dependable method for distinguishing various gaseous substances, including refrigerants. These refrigerants pose a significant environmental challenge as they contribute to the greenhouse effect, and their widespread use in Heating, Ventilation, Air Conditioning and Refrigeration (HVAC-R) systems demands strict regulatory measures. Therefore, achieving accurate detection and classification of these gases is of utmost importance for reasons related to safety, environmental preservation, and economic considerations.

Within the realm of IoT, gas sensors can be strategically deployed in a networked setup, facilitating continuous monitoring and real-time analysis of the surrounding environment [3]. The data collected from these sensors can then undergo processing and analysis through machine learning algorithms, either locally on edge devices or remotely on cloud-based platforms, depending on the specific needs of the application [4].

Through the synergistic blend of IoT and machine learning [5], the development of robust gas detection systems becomes feasible. These systems possess the capability to automatically identify and differentiate various gases, such as R32 and R134a, with exceptional precision and minimal delay. Such a setup empowers these systems to provide crucial information for effective decision-making, ensuring the safe and efficient operation of HVAC-R systems while simultaneously reducing their environmental impact.

Machine learning algorithms have emerged as a transformative tool for gas classification in the context of IoT. By harnessing the vast datasets obtained from interconnected gas sensors, these algorithms can learn intricate patterns and features indicative of various refrigerant gases. The application of machine learning techniques has been demonstrated to significantly improve the accuracy, efficiency, and real-time capabilities of gas classification systems [6]. With the ability to process data either locally on edge devices or remotely on cloud-based platforms, machine learning offers unparalleled flexibility to cater to diverse application requirements.

Climate change, primarily driven by the burning of fossil fuels and greenhouse gas (GHG) emissions, results in rising average temperatures, melting ice masses, and extreme climate events globally. Since the 1900s, surface temperatures have increased by 1.65 °C, with an extreme rise of 0.18 °C reported in June 2023 [7]. Fluorinated greenhouse gases (FGGs), with their high global warming potential (GWP), are significant contributors to this issue [6]. Historically, chlorofluorocarbons (CFCs), used in packaging materials, aerosol solvents, and refrigerants, were recognized for their ozone-depleting properties, leading to their phase-out under the 1987 Montreal Protocol and subsequent replacement with hydrofluorocarbons (HFCs) [8,9]. However, HFCs have GWPs ranging from 500 to 5000, prompting the 1997 Kyoto Protocol to address the six largest GHGs, including carbon dioxide, methane, and HFCs, aiming to reduce their emissions [10].

Environmental regulations have spurred the development of greener refrigerants, necessitating accurate detection of both new and older gases to ensure system efficacy and proper disposal. Different countries have varying laws regarding used refrigerants: some classify them as hazardous waste requiring immediate destruction, while others permit resale by licensed processors after recycling. Typically, refrigerant recovery involves reprocessing through filtering, drying, distillation, and chemical treatment to meet the “Air-Conditioning, Heating, and Refrigeration Institute” (AHRI) Standard 700-1995 specifications [11].

Quantitative measurements of such gases are also required for leak detection during the manufacture of air conditioning components, and for testing newer systems to determine whether they operate at maximum efficiency and with the correct mixing ratios [10].

Fast Fourier transform spectroscopy [12] can be used to perform the detection and quantification. However, these instruments are sensitive, cumbersome, and expensive and often a trained technician is needed to operate them under field conditions. Therefore, a mobile, easy-to-use, and less expensive detection device would be useful. Small devices using IR-sensor arrays deliver data that can feed machine-learning algorithms in order to analyze the measurement results.

In this manuscript, we present a comprehensive study on the implementation of machine learning algorithms for the classification of refrigerant gases in IoT-based gas detection systems. Our approach leverages the networked configuration of gas sensors to facilitate continuous monitoring and real-time analysis of the environment. By integrating state-of-the-art machine learning models, such as Random Forests, Support Vector Machines, and Neural Networks, we demonstrate how these algorithms surpass traditional gas detection methods in accuracy and speed. Through experimental evaluations, we showcase the advantages of our machine learning approach in accurately distinguishing between critical refrigerant gases, including R32 and R134a, with minimal latency.

Some key findings from our experiments and methodology include:

High Accuracy in Gas Classification: The Random Forest classifier achieved high precision and recall rates in distinguishing between R32 and R134a gases, underscoring its effectiveness in this application. This model’s robustness was further validated through confusion matrices and performance metrics such as F1-scores.
Effectiveness of Feature Extraction: The use of the tsfresh library for automated feature extraction proved pivotal in transforming raw sensor data into meaningful features, thereby enhancing the performance of the classification models. This automated process reduced the need for manual feature engineering, accelerating the development of predictive models.
Impact of Air Dilution Levels: The models also effectively handled varying air dilution levels, maintaining high accuracy across different gas concentrations. This capability is crucial for practical applications where gas concentrations can fluctuate significantly.
Dimensionality Reduction for Visualization: Techniques such as t-SNE were employed to reduce the dimensionality of the data, facilitating better visualization and understanding of the underlying patterns. These techniques highlighted the distinct clustering of gas types and their concentrations, supporting the models’ classification decisions.

2. Related Work

The environmental and health impacts of refrigerants are critical areas of ongoing research, given their complex implications for both global warming and local health risks. In the study [13], a detailed trade-off analysis between global impact potential and local risk was performed, showcasing significant challenges when substituting refrigerants like R-22 with R-410a, which, although reducing ozone depletion, inadvertently increases the global warming potential. This emphasizes the nuanced decisions required in selecting optimal refrigerants that meet environmental compliance and sustainability goals.

Further research into the Japanese household air conditioning sector [14] highlighted the need for comprehensive policies that consider both the direct and indirect environmental impacts of refrigerants, suggesting a transition towards low GWP alternatives like R-32 to mitigate adverse effects effectively.

Furthermore, Ref. [15] provides insights into the variations in energy efficiency and operational patterns of residential air conditioning systems. This supports the implementation of continuous monitoring and adaptive control systems to optimize HVAC performance and minimize ecological footprints.

Advancements in machine learning have enabled the prediction of the thermophysical properties of refrigerants, aiding in the selection and design of more efficient refrigeration systems with potentially reduced environmental impacts, as discussed in [16].

Additionally, developments in sensor technology, such as those detailed in [17], enhance the accuracy of refrigerant management systems, contributing to better maintenance practices and reduced leakage rates. Infrared thermography has been utilized to analyze refrigerant distribution within heat exchangers, offering significant improvements in understanding refrigerant behavior, essential for designing more efficient and less environmentally damaging refrigeration systems, as explored in [18].

The impact of F-gas regulations in Europe on market trends for various refrigerants underscores how regulatory measures are reshaping the refrigerant landscape, pushing towards low GWP refrigerants and phasing out harmful substances, as analyzed in [19].

Together, these studies form a comprehensive view of the current challenges and considerations in managing refrigerants, underlining the critical need for ongoing research and technological adaptation to mitigate the environmental impact of refrigerant gases effectively. This collective body of work also highlights the importance of policy frameworks that integrate life cycle analyses and risk assessments, supporting the development of sustainable refrigeration technologies.

3. Materials and Methods

The figure provided below (Figure 1) outlines the comprehensive process flow for classifying refrigerant gases using a combination of Internet of Things (IoT) technology and machine learning. Initially, specific gases, namely R134a and R32, are the focus of the analysis. These gases are monitored using sensors as part of the signal acquisition phase, where real-time data are collected. These data are then processed through a Raspberry Pi, illustrating the use of IoT technology to handle and preprocess the sensor data efficiently. Following data acquisition, the next critical step involves feature extraction, performed using the ’tsfresh’ library, which is designed to automatically extract relevant features from time series data. This feature extraction is crucial for transforming raw sensor data into a format suitable for machine learning analysis. The extracted features are then fed into a machine-learning model, specifically a Random Forest classifier. This model is trained to distinguish between the types of gases based on the patterns recognized in the data. The output of the machine learning model is a decision on the type of gas present, which is the final goal of the process. This workflow exemplifies a robust application of both IoT and machine learning to achieve precise and reliable gas classification in an automated manner.

3.1. IoT Approach

The mobile device shown in Figure 2 has IoT capabilities that allow the identification of the user via an external mobile application, the device itself and the bottle (both via pre-registered unique QR-codes) used for the collection of the used refrigerant. The user starts the application and the device is registered by reading the QR code on it. The QR code is assigned to the device’s MAC address and stored in a database to which the user has no further access so that data manipulation is not possible. Further, the QR code on the bottle which is intended to be used for the storage of the refrigerant is also read and submitted to an external database along with the measurement results; the collection will be described in the following section.

3.2. Signal Acquisition

An STM32 µ-controller is responsible for controlling the hardware periphery, including pumps, valves, voltage sources, etc. Additionally, it handles data acquisition at a scan rate of 1 point per millisecond. The infrared (IR) source operates in a pulsed manner at 10 Hz with a duty cycle of 62%. During each pulse, 100 data points are collected per channel, of which only 60 are considered usable, as they correspond to the time during which the IR source is active. The data collection process spans 1000 pulses. Following this data collection phase, the acquired data are transferred to a Raspberry Pi µ-computer, where the subsequent data analysis takes place. An example of the acquired signal is shown in Figure 3.
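The pulse bookkeeping described above can be illustrated with a short sketch. It is not the device firmware: the function names are hypothetical, the input is a synthetic square wave, and it assumes (the text does not say so) that the 60 usable points form one contiguous block at the start of each 100-point pulse.

```python
from typing import List

SAMPLES_PER_PULSE = 100  # 1 point per ms at a 10 Hz pulse rate -> 100 points per pulse
USABLE_PER_PULSE = 60    # points kept per pulse, as stated in the text


def split_into_pulses(stream: List[float],
                      samples_per_pulse: int = SAMPLES_PER_PULSE) -> List[List[float]]:
    """Cut a flat single-channel stream into consecutive pulses.

    A trailing partial pulse, if any, is dropped.
    """
    n_pulses = len(stream) // samples_per_pulse
    return [stream[i * samples_per_pulse:(i + 1) * samples_per_pulse]
            for i in range(n_pulses)]


def usable_window(pulse: List[float],
                  usable: int = USABLE_PER_PULSE,
                  offset: int = 0) -> List[float]:
    """Return the samples assumed to coincide with the active IR source.

    ASSUMPTION: the usable samples are contiguous and start at `offset`;
    the real alignment depends on the hardware timing.
    """
    return pulse[offset:offset + usable]


# Synthetic stand-in for one channel: 3 pulses of a 62%-on square wave.
synthetic_stream = [1.0 if (t % SAMPLES_PER_PULSE) < 62 else 0.0
                    for t in range(3 * SAMPLES_PER_PULSE)]
pulses = split_into_pulses(synthetic_stream)
active = [usable_window(p) for p in pulses]
```

With 1000 pulses per measurement, the same slicing would leave 60,000 usable points per channel.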

3.3. Feature Extraction

In the process of signal classification, feature extraction [20,21] plays a pivotal role by transforming raw signals into a meaningful set of features, which can be effectively utilized by machine learning algorithms. The primary goal of feature extraction is to reduce data dimensionality while preserving essential information, thereby facilitating the development of more precise and computationally efficient classification models [22].

Existing literature proposes a range of techniques for feature extraction, including time-domain, frequency-domain, and time-frequency domain methods. Time-domain methods focus on extracting statistical features like mean, variance, and kurtosis directly from the raw signals [23,24]. Conversely, frequency-domain methods analyze the spectral content of the signals, often employing Fourier or wavelet transformations to derive features related to the signal’s frequency components [25].
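As a minimal sketch of the time-domain features named above, the following computes mean, variance, and kurtosis for a single signal. It uses plain population formulas; it is not the tsfresh implementation, whose estimators may differ, and the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def time_domain_features(signal: Sequence[float]) -> Dict[str, float]:
    """Compute simple statistical features of one signal.

    Variance is the population variance (divide by n). Kurtosis is the
    non-excess fourth standardized moment (3.0 for a Gaussian).
    """
    n = len(signal)
    if n == 0:
        raise ValueError("signal must not be empty")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        kurtosis = float("nan")  # undefined for a constant signal
    else:
        fourth_moment = sum(c ** 4 for c in centered) / n
        kurtosis = fourth_moment / variance ** 2
    return {"mean": mean, "variance": variance, "kurtosis": kurtosis}


features = time_domain_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

Each signal is thereby reduced from a long sequence of samples to a handful of numbers, which is the dimensionality reduction that feature extraction aims for.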

The tsfresh library (Time Series Feature Extraction on the basis of Scalable Hypothesis tests) [26] holds substantial significance in the domain of feature extraction for time series data. tsfresh offers a comprehensive and efficient toolkit that automates the extraction of relevant features from time series datasets. The library is designed to handle a wide range of time series data, from domains such as finance, healthcare, and sensor measurements. By automating the feature extraction process, tsfresh reduces the need for manual feature engineering, which can be a time-consuming and error-prone task. This automation allows researchers and practitioners to focus more on data analysis and modeling, accelerating the overall development of predictive models and data-driven insights.

One of the key advantages of tsfresh lies in its ability to perform feature extraction at scale [27]. The library incorporates scalable hypothesis testing techniques to efficiently identify and extract meaningful features from large and complex time series datasets. This scalability empowers data scientists to work with extensive datasets, enabling the extraction of valuable features from thousands or even millions of time series instances. Moreover, tsfresh provides a diverse set of time domain and statistical features, as well as the flexibility to define custom features, ensuring a comprehensive exploration of the data’s characteristics. With its user-friendly interface and compatibility with popular machine learning libraries, tsfresh facilitates seamless integration into existing data analysis workflows, making it a valuable tool for researchers, practitioners, and data scientists seeking to harness the power of time series data for various applications.

3.4. Random Forest Classifier

Supervised classification, a prominent branch of machine learning, involves training algorithms on labeled data to enable them to recognize patterns and make predictions on unseen samples [28,29]. In this section, we present a comprehensive evaluation of the Random Forests algorithm [30,31].

We opted for Random Forests for several compelling reasons. First and foremost, Random Forests demonstrated superior performance in terms of predictive accuracy across various datasets, making them a robust choice for our classification task [31]. Furthermore, Random Forests’ ensemble nature and feature randomness mitigate the risk of overfitting, providing better generalization capabilities compared to some other algorithms, such as Multi Layer Perceptron (MLP) [32], which are more prone to overfitting, especially with complex datasets [33].

Moreover, the interpretability of Random Forests stood out as a crucial advantage over black-box models like MLP and Support Vector Machines (SVMs) [34]. The ability to easily estimate feature importance allowed us to gain valuable insights into the significant variables driving the classification decisions, making the results more interpretable and insightful. Lastly, Random Forests are computationally efficient, especially when compared to complex models like SVM and MLP, which often require extensive computational resources and training time. Given our research constraints, the efficiency of Random Forests made them a practical and viable choice for our classification task.

The algorithm can be summarized in a few key steps. Firstly, a random subset of the training data is selected with replacement, known as bootstrapping [35]. This step introduces diversity among the individual decision trees that will be generated, reducing the risk of overfitting [36]. Next, decision trees are constructed using these bootstrapped datasets, but with a slight variation. At each node of the tree, instead of considering all available features for splitting, a random subset of features is chosen. This process, called feature bagging or random feature selection, helps reduce the correlation between trees and further improves model generalization. Once the decision trees are fully grown, they are used to make predictions on new data points. For classification tasks, the final prediction is often determined by a majority vote among the individual trees, while for regression tasks, it is typically the average of the predictions.
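The three ingredients described above (bootstrapping, random feature selection at each split, and majority voting) can be sketched in isolation. This is not a full tree learner and not the implementation used in this study; the helper names are hypothetical and the tree predictions in the usage lines are made up for illustration.

```python
import random
from collections import Counter
from typing import List, Sequence


def bootstrap_indices(n_samples: int, rng: random.Random) -> List[int]:
    """Draw n_samples row indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features: int, k: int, rng: random.Random) -> List[int]:
    """Pick k distinct candidate features to consider at one tree node."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(tree_predictions: Sequence[str]) -> str:
    """Return the label predicted by most trees.

    Ties are resolved in favour of the label seen first.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)                    # fixed seed for reproducibility
rows = bootstrap_indices(10, rng)         # training rows for one tree
candidates = random_feature_subset(n_features=9, k=3, rng=rng)
winner = majority_vote(["R32", "R134a", "R32"])  # illustrative votes of 3 trees
```

Because sampling is done with replacement, a bootstrap replicate typically repeats some rows and omits others, which is what makes the individual trees differ.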

In summary, the Random Forests algorithm combines the strength of bootstrapping, random feature selection, and ensemble techniques to create a robust and accurate predictive model, making it a valuable tool in various scientific and practical applications. In our analysis, the Random Forest model was configured with 300 trees, which provided a robust ensemble for handling the variability in our dataset while balancing computational efficiency. Additionally, we set the maximum depth of each tree in the forest to 5. These parameters were selected based on a series of preliminary tests that evaluated the model’s performance with varying numbers of trees and depths.

4. Dataset

In this research, we adopted an IoT-based approach to gather data from two refrigerant gases, R32 and R134a. To achieve this, we deployed a custom-designed sensor array comprising multiple gas sensors, chosen for their sensitivity, selectivity, and compatibility with the target refrigerants. The sensor array was connected to an IoT platform, enabling real-time data collection, transmission, and storage.

For the data acquisition process, we exposed the sensor array to controlled gas environments containing varying concentrations of different refrigerant gases. To ensure precision and consistency in measurements, these gas environments were carefully created in a laboratory setting using gas chambers and precisely controlled gas mixtures. Prior to each experiment, the sensors underwent calibration to guarantee accurate readings. Throughout the experiments, the sensor array continuously monitored the gas concentrations, promptly transmitting the acquired data to the IoT platform for further analysis and processing.

For this research, we gathered a total of 240 signals (samples), each representing a distinct gas concentration and its corresponding sensor response. The signals were then processed by the tsfresh library to extract the features used by our ML algorithm. The compiled data are presented in Table 1, which summarizes the sample distribution for each refrigerant gas type.

We further divided the dataset of the two refrigerant gases with different air dilution levels. Table 2 describes the dataset comprising measurements of two refrigerant gases, R134a and R32, recorded under varying air dilution levels, specifically 0%, 5%, 10%, 25%, 50% and 70%. This table underscores the balanced nature of our data collection process, as it demonstrates an equitable distribution of samples across different gases. Such a comprehensive and balanced dataset played a crucial role in enabling the successful training and evaluation of our machine-learning algorithms, ensuring a robust and dependable discrimination of refrigerant gases.

The raw data from these sensors are then subjected to an automated feature extraction process using the Python package tsfresh [26]. tsfresh is pivotal in our methodological framework, facilitating the efficient extraction of relevant features from the time series data. This tool is designed to automate the calculation of a wide array of time series characteristics, extracting over 794 distinct features by default. These features are then filtered based on their statistical significance, ensuring that only the most relevant attributes are retained for model training.

Our final dataset is organized in a tabular data format, where each row represents a unique measurement instance and each column a feature. This structure is particularly suited for the application of machine learning algorithms and aligns with the requirements of our Random Forest classifier, which distinguishes between the gas types based on the learned patterns in the training data.

5. Evaluation

In this section, we present a comprehensive analysis of the outcomes achieved by the models we trained. Through the implementation of a stratified holdout methodology, the dataset underwent a meticulous partitioning procedure, leading to the creation of two distinct subsets, designated as the Train and Test sets.

Before presenting the evaluation measures, we must first describe the True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN) for the multiclass classification problem.

True Positives (TP): TP is the number of instances correctly classified as belonging to a specific class. In a multi-class problem, TP is computed for each class separately.

True Negatives (TN): TN is the number of instances correctly classified as not belonging to a specific class.

False Positives (FP): FP is the number of instances incorrectly classified as belonging to a specific class, i.e., instances of other classes that were assigned to it.

False Negatives (FN): FN is the number of instances incorrectly classified as not belonging to a specific class. In a multi-class problem, FN is computed for each class, signifying instances that were erroneously excluded from that class.

These definitions ensure that the assessment of model performance in a multi-class setting accounts for the nuances of each class separately, allowing for a more detailed and informative evaluation.

Accuracy: Accuracy represents the fraction of correctly classified instances out of the total number of instances in the dataset.

$Acc = \frac{TP + TN}{TP + TN + FP + FN}$ (1)

Precision: Precision quantifies the fraction of true positive predictions among all instances predicted as positive.

$P = \frac{TP}{TP + FP}$ (2)

Recall: Recall, also known as Sensitivity or True Positive Rate, represents the fraction of true positive predictions among all actual positive instances.

$R = \frac{TP}{TP + FN}$ (3)

F1 Score: The F1 Score is the harmonic mean of precision and recall. It provides a balance between precision and recall, making it useful when dealing with imbalanced datasets.

$F1 = \frac{2 \cdot P \cdot R}{P + R}$ (4)
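The four measures can be computed per class from a confusion matrix, as in the following sketch. The function name and the example matrix are hypothetical; the matrix orientation (rows are predicted classes, columns are actual classes) and the choice to return 0.0 when a denominator is zero are assumptions of this sketch.

```python
import math
from typing import Dict, List


def per_class_metrics(cm: List[List[int]], k: int) -> Dict[str, float]:
    """One-vs-rest metrics for class k from a square confusion matrix.

    ASSUMPTION: cm[i][j] counts samples predicted as class i whose
    actual class is j (rows = predicted, columns = actual).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fp = sum(cm[k][j] for j in range(n)) - tp  # predicted k, actually another class
    fn = sum(cm[i][k] for i in range(n)) - tp  # actually k, predicted as another class
    tn = total - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical 3-class matrix, for illustration only (not data from this study).
example_cm = [[5, 1, 0],
              [0, 4, 1],
              [1, 0, 6]]
metrics_class0 = per_class_metrics(example_cm, 0)
```

Note that the accuracy returned here is the one-vs-rest accuracy of Equation (1) for the chosen class, which differs from the overall fraction of correctly classified samples when there are more than two classes.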

We conducted two experiments. In the first, Gas Type Classification, our aim was to differentiate between the gases R32 and R134a (a 2-class classification problem). In the second, Gas Type and Dilution Level Classification, we tried to differentiate between R32 and R134a at each dilution level (a 12-class classification problem: 2 gases × 6 dilution levels). In both cases we employed a stratified holdout technique [37]: the dataset was divided into two distinct subsets, train (80%) and test (20%), to evaluate the efficiency of the machine learning algorithms. This method guaranteed that the distribution of each gas class remained consistent across both training and testing sets, thereby minimizing the potential for biased performance evaluations. Moreover, this approach allows for the utilization of multiple performance metrics.
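A stratified 80/20 holdout can be sketched as follows. The function is a hypothetical stand-in for whatever splitting routine was actually used, and the label list assumes, purely for illustration, that the 240 samples are spread evenly over the 12 classes (20 per class); the true per-class counts are those of Table 2.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_holdout(labels: Sequence[Hashable],
                       test_fraction: float = 0.2,
                       seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so each class keeps the same train/test ratio."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class, key=str):  # fixed order for reproducibility
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# ASSUMPTION: 20 samples per (gas, dilution) class, 240 samples in total.
gases = ["R32", "R134a"]
dilutions = [0, 5, 10, 25, 50, 70]
labels = [f"{g} {d}%" for g in gases for d in dilutions for _ in range(20)]
train_idx, test_idx = stratified_holdout(labels)
test_counts = Counter(labels[i] for i in test_idx)
```

Under this assumption the split yields 192 training and 48 test samples, with 4 test samples per class.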

5.1. Gas Type Classification

Table 3 reports the confusion matrix for the first experiment, the classification of the two refrigerant gases, R134a and R32. The rows of the matrix represent the classes predicted by the model, while the columns show the actual classes from the test dataset. The diagonal elements (22 for both R134a and R32) give the number of correct predictions for each gas type. The off-diagonal elements (2 in each cell) are the misclassifications, where the model predicted one gas type but the actual type was the other. In total, 44 of the 48 test samples are classified correctly, with few instances of confusion between the two gas types. This performance can aid applications requiring precise and reliable gas detection, such as environmental monitoring, safety systems in industrial settings, and compliance with environmental regulations.

Table 4 details the classification performance metrics for the two refrigerant gases, R32 and R134a, evaluated by precision, recall and f1-score. The precision metric for both gases stands at 0.9167, indicating the high accuracy of positive predictions and reflecting the model’s ability to correctly identify relevant instances of each gas type. The recall metric, also at 0.9167 for both gases, shows the proportion of actual positives that were correctly identified, highlighting the model’s sensitivity and effectiveness in capturing all relevant instances without significant omission. The f1-score, a harmonic mean of precision and recall, is consistently 0.9167 for both gases, providing a balanced measure of the model’s accuracy. These results collectively demonstrate the model’s robust performance in classifying both R32 and R134a gases with high precision and recall, confirming the effectiveness of the applied machine learning approach in distinguishing between the gas types.
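As a cross-check, the counts quoted in the text for Table 3 can be substituted into Equations (1) to (4). The substitution assumes that one gas is treated as the positive class, giving TP = 22, FP = 2, FN = 2 and TN = 22; because the matrix is symmetric, the same values hold when the other gas is taken as positive.

$P = \frac{22}{22 + 2} = \frac{22}{24} \approx 0.9167$

$R = \frac{22}{22 + 2} = \frac{22}{24} \approx 0.9167$

$F1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{22}{24} \approx 0.9167$

$Acc = \frac{22 + 22}{22 + 22 + 2 + 2} = \frac{44}{48} \approx 0.9167$

Since P = R here, the harmonic mean in F1 reduces to the same value, which agrees with the identical scores reported for both gases.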

5.2. Gas Type and Dilution Level Classification

This section presents the detailed performance metrics for the classification of R32 and R134a refrigerant gases across various dilution levels. The analysis evaluates the precision, recall and f1-score for each class, offering insights into the model’s effectiveness and areas needing improvement.

Table 5 provides a detailed breakdown of the classification results for twelve distinct classes, corresponding to the two refrigerant types, R32 and R134a, at six air dilution levels. This matrix is crucial for visualizing the model’s ability to accurately classify the gases across various dilution levels. The diagonal cells represent the number of true positives for each class, illustrating the model’s strength in correctly identifying each specific scenario. Off-diagonal cells contain the number of misclassifications, where the model predicted one condition but the actual condition was different, offering insights into the specific challenges or confusions faced by the model. This detailed representation helps in understanding the precision and robustness of the classification system in handling complex, multi-class scenarios.

Table 6 provides a detailed breakdown of the performance metrics for a 12-class classification problem, featuring precision, recall and f1-score for each class. The metrics elucidate the efficacy and limitations of the predictive model used in the classification of the dataset.

The analysis reveals variability in performance across the classes. Lower dilution levels, specifically R32 0% and R32 5%, demonstrate moderate performance with both precision and recall at 0.5000. This suggests challenges in the model’s ability to differentiate at lower dilution levels, potentially due to overlapping sensor responses or insufficient distinguishing features at these levels. Conversely, mid to high dilution levels such as R32 25%, R32 70%, R134a 50%, and R134a 70% show improved performance. Notably, R32 25% achieves perfect recall, indicating the model’s consistent capability to identify all true positives for this dilution level. High dilution levels for R134a also perform well, particularly R134a 70%, for which both precision and recall are high.

Exceptional performance is noted for R134a 5% and R134a 10% classes, with perfect scores across all metrics—precision, recall, and f1-score. This excellence underscores the model’s capability to flawlessly distinguish these conditions, marking a high point in model reliability for these specific dilutions.

However, the R32 50% class exhibits significant challenges: its precision and recall are both zero, which points to a complete failure of the model to detect this class. This could be due to sensor limitations or deficiencies in the model’s training for this particular condition. Similarly, R134a 0% and R134a 25% combine perfect precision (1.0000) with lower recall (0.5000 and 0.7500, respectively). Predictions of these classes are therefore correct when made, but not all actual instances are detected, which is particularly noticeable at the 0% dilution level.

Overall, the model achieves an accuracy of 72.92% across 48 instances, indicating a reasonable general performance. However, this also highlights the necessity for improvement, especially in classes with suboptimal detection rates. To enhance model performance, particularly at lower and mid-level dilutions where performance is lacking, it may be beneficial to improve feature extraction methods and potentially increase the size of the training dataset. Considering advanced preprocessing techniques or reevaluating the sensor technology used might also aid in distinguishing between overlapping features of different gas dilutions more effectively.
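The overall accuracy can be checked directly from the diagonal of Table 5. The short sketch below does this in plain Python; the variable names are illustrative, and the per-class test size of 4 is an inference from the 48 test instances and 12 classes rather than a figure stated explicitly in the text.

```python
# Correctly classified test signals per class (diagonal of Table 5),
# ordered R32 0%, 5%, 10%, 25%, 50%, 70%, then R134a 0% ... 70%.
correct = [2, 2, 3, 4, 0, 3, 2, 4, 4, 3, 4, 4]

# Assumption: the 48 test instances are spread evenly over the 12 classes.
per_class_test_size = 4
total = per_class_test_size * len(correct)

# Accuracy is the share of all test signals on the diagonal.
accuracy = sum(correct) / total

# Recall per class is the diagonal count divided by the class size.
recalls = [count / per_class_test_size for count in correct]

print(f"accuracy = {sum(correct)}/{total} = {accuracy:.4f}")
print("recall per class:", recalls)
```

This gives 35/48, which rounds to 0.7292, and the per-class recalls agree with the recall column of Table 6.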

In conclusion, while the model demonstrates promising results in several scenarios, there are evident opportunities for refinement to achieve consistently high performance across all tested scenarios. This analysis not only forms a foundation for further model refinement but also provides critical feedback for ongoing research in the field of IoT-based refrigerant gas detection, paving the way for future advancements in environmental monitoring technologies.

5.3. Data Visualization

High-dimensional datasets can be difficult to interpret and comprehend. Visualizing them directly often results in cluttered and overlapping visual elements, leading to a loss of interpretability and an inability to discern meaningful patterns [38]. Dimensionality reduction techniques address this by representing complex data in a lower-dimensional space, making it easier to visualize and understand the underlying relationships between variables. Moreover, dimensionality reduction can reveal the inherent structure of the data, helping researchers and practitioners identify clusters, trends, and outliers that may not be apparent in the original high-dimensional space.

Dimensionality reduction techniques include t-distributed stochastic neighbor embedding (t-SNE) [39]. Visualizing data using t-SNE is particularly valuable for signal visualization because it can preserve the essential characteristics of the data while reducing noise and redundancy. By focusing on the most significant components of the data, these methods enhance the clarity and interpretability of visualizations, allowing researchers to gain a deeper understanding of the underlying signal dynamics. Furthermore, dimensionality reduction can also aid in the pre-processing stage of machine learning algorithms by addressing the curse of dimensionality. This phenomenon arises when high-dimensional datasets suffer from sparse sampling, leading to decreased model performance and longer training times.
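For reference, the objective that t-SNE optimizes can be written as follows, using the standard notation of [39]. The symbols are not defined in this article and are introduced here only for illustration: x_i are the extracted feature vectors, y_i the corresponding two-dimensional map points, N the number of signals, and σ_i a per-point bandwidth that t-SNE derives from its perplexity setting. The perplexity used for the projection is not stated in this section.

```latex
% High-dimensional similarities between feature vectors x_i and x_j
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Low-dimensional similarities between map points y_i and y_j (Student-t kernel)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Cost minimized by gradient descent over the map points
C = \mathrm{KL}(P \parallel Q) = \sum_{i} \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Because the cost penalizes placing similar signals far apart much more than placing dissimilar signals close together, t-SNE mainly preserves local neighborhoods; distances between well-separated clusters in the resulting plot should be interpreted with care.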

In summary, dimensionality reduction techniques are essential for signal visualization as they improve interpretability, reveal hidden structures in the data, and address the challenges associated with high-dimensional datasets. By leveraging these techniques, researchers and practitioners can effectively explore and analyze complex signal data, ultimately leading to more informed decisions and better insights into the underlying phenomena.

Figure 4 displays the t-SNE projection of the feature space for two refrigerant gases, R134a and R32, across various air dilution levels (0%, 5%, 10%, 25%, 50%, 70%). Each point in the plot represents a gas sample, with shapes indicating the level of air dilution within that type and colors showing the type of gas.

Below we provide an analysis and some insights into the visualization:

Distinct Clustering by Gas Type: The two gases, R134a and R32, are represented by red and green colors, respectively. The clusters formed by each gas type are distinct, demonstrating good class separability in the t-SNE projection. This separability is crucial for the effectiveness of machine learning algorithms used in classifying these gases, as it indicates that the extracted features are robust enough to distinguish between the two gases reliably.
Influence of Dilution Levels: The variation in shapes within each color group illustrates the impact of different dilution levels on the feature space. It is evident that as the dilution level changes, there are shifts in the clustering patterns. For instance, denser clusters at lower dilutions suggest more consistent sensor responses at these concentrations, while more spread-out clusters at higher dilutions may indicate variability in sensor behavior, which could be due to less distinct physical or chemical properties at lower gas concentrations.
Inter-Class Overlap: While there is a clear distinction between different gases, some overlap between adjacent dilution levels within the same gas type can be observed. This overlap is more pronounced in mid-dilution levels (25%, 50%), suggesting that these levels pose more of a challenge for classification. The overlap could result from similar sensor responses elicited by these dilution levels, which may require more sophisticated features or algorithms to resolve effectively.

In summary, the t-SNE projection illustrated in Figure 4 serves as a visual tool that both supports the effectiveness of the feature extraction techniques employed and highlights the challenges associated with classifying refrigerant gases at different dilution levels. The insights gained from this visualization can guide future enhancements in both data preprocessing and model development. By addressing the observed overlaps and variability in feature space, further refinements can be made to improve the accuracy and reliability of gas detection systems, thereby enhancing their applicability in real-world environmental monitoring and safety applications.

6. Discussion

In this study, we demonstrated the effective application of machine learning techniques, particularly the Random Forest classifier, for the classification of refrigerant gases using IoT-enabled sensor arrays. The results showed high accuracy, precision, recall, and F1 scores, particularly when compared to traditional statistical methods which often struggle with real-time data and complex gas mixtures. Our approach uniquely combines real-time IoT data acquisition with advanced machine learning, significantly outperforming older methods that rely on singular, less dynamic datasets.

Our analysis, presented in detailed tables and visualizations, highlights the strong performance of the Random Forest algorithm. For instance, the confusion matrix in Table 3 illustrates the classifier’s ability to distinguish between gases R32 and R134a with minimal misclassification, a significant improvement over traditional spectroscopy methods that often require lengthy calibration and are prone to interference from similar gas types. The precision and robustness demonstrated here suggest a viable path forward for the deployment of these systems in critical areas such as industrial safety and environmental monitoring.

Furthermore, the use of t-SNE visualizations provided a clear depiction of the clustering of data points by gas type and dilution levels, as shown in our figures. These visualizations not only confirmed the effectiveness of our feature extraction techniques but also offered intuitive insights into the data structure that are not readily apparent from numerical data alone. This graphical representation helps in understanding complex patterns and interactions within the data, facilitating better decision-making and model improvements.

Despite these promising results, our study acknowledges certain limitations. The specificity of the sensors and potential interference from other gases require further exploration to ensure broader application. Future research will aim to expand the dataset to include more types of gases and test the models under a variety of environmental conditions to assess their robustness and reliability.

7. Conclusions

In conclusion, we explored the efficacy of integrating Internet of Things (IoT) technology with machine learning algorithms to classify refrigerant gases, specifically R32 and R134a. The combination of these technologies enables continuous, real-time monitoring and analysis, enhancing the precision and efficiency of gas detection systems.

Our approach utilized an array of gas sensors connected via an IoT network, which facilitated the collection of a comprehensive dataset under controlled conditions. This dataset was then used to train various machine learning models, including Random Forests, Support Vector Machines (SVM), and Neural Networks (MLP). Among these, the Random Forest algorithm demonstrated superior performance due to its robustness, accuracy, and computational efficiency.

The integration of IoT and machine learning for refrigerant gas classification presents a promising solution for real-time environmental monitoring and safety applications. By enabling accurate and rapid detection of hazardous gases, this approach not only enhances operational efficiency but also contributes to environmental sustainability and regulatory compliance.

Future work will focus on expanding the dataset, both by including more types of refrigerant gases and by acquiring new signals to increase the number of samples, and on exploring additional machine learning techniques. A larger dataset will allow us to refine and validate our model further, improving its accuracy and robustness in diverse conditions. Additionally, implementing these systems in real-world scenarios will provide valuable insights into their practical viability and performance.

Author Contributions

Conceptualization, J.K.; Methodology, N.A., N.D., S.T. and P.K.; Software, N.A., J.K. and P.K.; Investigation, C.A., N.D. and S.T.; Writing—original draft, N.A., J.K., S.T. and P.K.; Writing—review & editing, C.A., N.D. and P.K.; Supervision, P.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received funding through the project “Circular economy ecosystem to Recover, Recycle and Re-use F-gases contributing to the depletion of greenhouse gases” (LIFE Retradeables) from the LIFE Programme of the European Union under grant agreement LIFE19 CCM/AT 001226—LIFE Retradeables.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Nikolaos Argirusis and Christos Argirusis were employed by the company mat4nrg GmbH. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Al-Fuqaha, A.; Guizani, M.; Mohammadi, M.; Aledhari, M.; Ayyash, M. Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications. IEEE Commun. Surv. Tutor. 2015, 17, 2347–2376.
Narkhede, P.; Walambe, R.; Mandaokar, S.; Chandel, P.; Kotecha, K.; Ghinea, G. Gas Detection and Identification Using Multimodal Artificial Intelligence Based Sensor Fusion. Appl. Syst. Innov. 2021, 4, 3.
Ahmed, S.; Rahman, M.J.; Razzak, M.A. Design and Development of an IoT-Based LPG Gas Leakage Detector for Households and Industries. In Proceedings of the 2023 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 7–10 June 2023; pp. 0762–0767.
Gomes, J.B.A.; Rodrigues, J.J.P.C.; Rabêlo, R.A.L.; Kumar, N.; Kozlov, S. IoT-Enabled Gas Sensors: Technologies, Applications, and Opportunities. J. Sens. Actuator Netw. 2019, 8, 57.
Peng, P.; Zhao, X.; Pan, X.; Ye, W. Gas Classification Using Deep Convolutional Neural Networks. Sensors 2018, 18, 157.
Hansen, J.; Ruedy, R.; Sato, M.; Lo, K. Global surface temperature change. Rev. Geophys. 2010, 48, 1–29.
Rohde, R. March 2024 Temperature Update. 2024. Available online: https://berkeleyearth.org/march-2024-temperature-update/ (accessed on 15 July 2024).
Wallington, T.; Schneider, W.; Worsnop, D.; Nielsen, O.; Sehested, J.; Debruyn, W.; Shorter, J. The environmental impact of CFC replacements—HFCs and HCFCs. Environ. Sci. Technol. 1994, 28, 320A–326A.
Sheraz, M.; Anus, A.; Le, V.C.T.; Swamidoss, C.M.A.; Kim, E.k.; Kim, S. A comprehensive review of contemporary strategies and approaches for the treatment of HFC-134a. Greenh. Gases Sci. Technol. 2021, 11, 1118–1133.
Andersen, S.O.; Halberstadt, M.L.; Borgford-Parnell, N. Stratospheric ozone, global warming, and the principle of unintended consequences—An ongoing science and policy success story. J. Air Waste Manag. Assoc. 2013, 63, 607–647.
Analysis of Equipment and Practices in the Refrigerant Reclamation Industry|US EPA—epa.gov. Available online: https://shorturl.at/CbTgU (accessed on 20 May 2024).
Jaggi, N.; Vij, D. Fourier transform infrared spectroscopy. In Handbook of Applied Solid State Spectroscopy; Vij, D.R., Ed.; Springer US: Boston, MA, USA, 2006; pp. 411–450.
Xue, M.; Kojima, N.; Zhou, L.; Machimura, T.; Tokai, A. Trade-off analysis between global impact potential and local risk: A case study of refrigerants. J. Clean. Prod. 2019, 217, 627–632.
Xue, M.; Kojima, N.; Machimura, T.; Tokai, A. Flow, stock, and impact assessment of refrigerants in the Japanese household air conditioner sector. Sci. Total Environ. 2017, 586, 1308–1315.
Guo, F.; Rasmussen, B. Performance benchmarking of residential air conditioning systems using smart thermostat data. Appl. Therm. Eng. 2023, 225, 120195.
Rathod, K.; Ravula, S.C.; Kommireddi, P.S.C.; Thangeda, R.; Kikugawa, G.; Chilukoti, H.K. Predicting thermophysical properties of alkanes and refrigerants using machine learning algorithms. Fluid Phase Equilibria 2024, 578, 114016.
Qian, H.; Hrnjak, P. Mass measurement based calibration of a capacitive sensor to measure void fraction for R134a in smooth tubes. Int. J. Refrig. 2020, 110, 168–177.
Li, W.; Hrnjak, P. Quantification of two-phase refrigerant distribution in brazed plate heat exchangers using infrared thermography. Int. J. Refrig. 2021, 131, 348–358.
Mota-Babiloni, A.; Makhnatch, P. Predictions of European refrigerants place on the market following F-gas regulation restrictions. Int. J. Refrig. 2021, 127, 101–110.
Amin, H.U.; Mumtaz, W.; Subhani, A.R.; Saad, M.N.M.; Malik, A.S. Classification of EEG Signals Based on Pattern Recognition Approach. Front. Comput. Neurosci. 2017, 11, 103.
Georgoulas, G.; Karvelis, P.; Loutas, T.; Stylios, C.D. Rolling element bearings diagnostics using the Symbolic Aggregate approXimation. Mech. Syst. Signal Process. 2015, 60–61, 229–242.
Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
Ince, T.; Kiranyaz, S.; Gabbouj, M. A Generic and Robust System for Automated Patient-Specific Classification of ECG Signals. IEEE Trans. Biomed. Eng. 2009, 56, 1415–1426.
Taylan, O.; Sattari, M.A.; Elhachfi Essoussi, I.; Nazemi, E. Frequency Domain Feature Extraction Investigation to Increase the Accuracy of an Intelligent Nondestructive System for Volume Fraction and Regime Determination of Gas-Water-Oil Three-Phase Flows. Mathematics 2021, 9, 2091.
Christ, M.; Braun, N.; Neuffer, J.; Kempa-Liehr, A.W. Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh—A Python package). Neurocomputing 2018, 307, 72–77.
Christ, M.; Kempa-Liehr, A.W.; Feindt, M. Distributed and parallel time series feature extraction for industrial big data applications. arXiv 2017, arXiv:1610.07717.
Bishop, C.M. Pattern Recognition and Machine Learning (Information Science and Statistics); Springer: Berlin/Heidelberg, Germany, 2006.
Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer Series in Statistics; Springer: New York, NY, USA, 2001.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: New York, NY, USA, 2013.
Haykin, S.; Lippmann, R. Neural Networks. A Comprehensive Foundation. Int. J. Neural Syst. 1994, 5, 363–364.
Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy Build. 2017, 147, 77–89.
Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
Murphy, K.P. Machine Learning: A Probabilistic Perspective; The MIT Press: Cambridge, MA, USA, 2012.
Efron, B.; Tibshirani, R. Improvements on Cross-Validation: The .632+ Bootstrap Method. J. Am. Stat. Assoc. 1997, 92, 548–560.
Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.; Morgan Kaufmann Series in Data Management Systems; Morgan Kaufmann: Amsterdam, The Netherlands, 2011.
Van Der Maaten, L.; Postma, E.; Van den Herik, J. Dimensionality reduction: A comparative review. J. Mach. Learn. Res. 2009, 10, 66–71.
van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.

Figure 1. An overview of our approach.

Figure 2. The IoT sensor.

Figure 3. Signal acquired from the IoT sensor.

Figure 4. t-SNE visualization of machine learning features derived from sensor data.

Table 1. Distribution of Collected Signals for Refrigerant Gases.

Type of Refrigerant Gas	Number of Signals
R32	120
R134a	120

Table 2. Distribution of collected signals with the two refrigerant gases, R134a and R32 with different air dilution levels.

	Number of Signals
Dilution with Air	0%	5%	10%	25%	50%	70%
R32	20	20	20	20	20	20
R134a	20	20	20	20	20	20

Table 3. Confusion Matrix for the Classification of Refrigerant Gases.

Predicted\Actual	R32	R134a
R32	22	2
R134a	2	22

Table 4. Performance Metrics for Gas Type Classification.

Class	Precision	Recall	F1-Score
R32	0.9167	0.9167	0.9167
R134a	0.9167	0.9167	0.9167

Table 5. Confusion Matrix for the % Refrigerant Type Classification Problem.

	Predicted Class
Actual	R32 0%	R32 5%	R32 10%	R32 25%	R32 50%	R32 70%	R134a 0%	R134a 5%	R134a 10%	R134a 25%	R134a 50%	R134a 70%
R32 0%	2	2	0	0	0	0	0	0	0	0	0	0
R32 5%	2	2	0	0	0	0	0	0	0	0	0	0
R32 10%	0	0	3	0	0	0	0	0	0	0	1	0
R32 25%	0	0	0	4	0	0	0	0	0	0	0	0
R32 50%	0	0	1	2	0	1	0	0	0	0	0	0
R32 70%	0	0	0	1	0	3	0	0	0	0	0	0
R134a 0%	0	0	0	0	0	0	2	0	0	0	1	1
R134a 5%	0	0	0	0	0	0	0	4	0	0	0	0
R134a 10%	0	0	0	0	0	0	0	0	4	0	0	0
R134a 25%	0	0	0	0	0	0	0	0	0	3	1	0
R134a 50%	0	0	0	0	0	0	0	0	0	0	4	0
R134a 70%	0	0	0	0	0	0	0	0	0	0	0	4

Table 6. Classification Performance Metrics.

Class	Precision	Recall	F1-Score
R32 0%	0.5000	0.5000	0.5000
R32 5%	0.5000	0.5000	0.5000
R32 10%	0.7500	0.7500	0.7500
R32 25%	0.5714	1.0000	0.7273
R32 50%	0.0000	0.0000	0.0000
R32 70%	0.7500	0.7500	0.7500
R134a 0%	1.0000	0.5000	0.6667
R134a 5%	1.0000	1.0000	1.0000
R134a 10%	1.0000	1.0000	1.0000
R134a 25%	1.0000	0.7500	0.8571
R134a 50%	0.5714	1.0000	0.7273
R134a 70%	0.8000	1.0000	0.8889
Accuracy			0.7292

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

