1. Introduction
According to the World Health Organization (WHO), approximately 1.35 million fatal accidents occur around the world each year, and the number is increasing annually [1]. The study states that 15% of these accidents are caused by driver drowsiness and impaired cognitive ability. Moreover, agencies such as the American Automobile Association (AAA) estimate that one out of eight accidents in the United States that require hospitalization happens because of driver drowsiness and fatigue [2]. Drowsiness can therefore be seen as a life-threatening condition, especially when the driver is cruising at high speed, where the resulting damage to lives and property is even more severe.
Drowsiness can be caused by many factors, among them chronic driver fatigue, lack of sleep, and increased CO2 concentration in the vehicle [3]. Some studies show that the vehicle cabin contains different pollutants that can affect human health, such as carbon monoxide (CO), carbon dioxide (CO2), nitrogen dioxide (NO2) and volatile organic compounds (VOC) [4,5]. These pollutants can cause various health problems for the occupants, including impaired vision and physical coordination while driving, as well as dizziness and fatigue [6]. In combination, these effects make it difficult for drivers to operate vehicles on the road [7,8].
Over time, a new generation of vehicle manufacturers has concentrated on Heating, Ventilation and Air Conditioning (HVAC) systems that offer the occupants a fresh-air mode or a re-circulation (RC) mode. Most HVAC systems use the RC mode to help reduce the distribution of pollutants and gases that come from exhaust systems. However, since most of the major air pollutants cannot be seen with the human eye, drivers are not aware of the air quality inside the vehicle cabin. Meanwhile, the occupants inhale oxygen and exhale carbon dioxide (CO2), which is part of the contamination known as human bio-effluents [9]. An elevated concentration of CO2 reduces individuals’ cognitive ability, which results in drowsiness, dizziness and fatigue [10].
Thus, there is a need for monitoring systems that can measure in-vehicle air pollutants and ultimately monitor drivers’ conditions while driving. Previous studies used monitoring technologies such as cameras and in-vehicle sensors that are difficult to install and may constrain the driver’s behavior. Most of the existing systems have employed artificial intelligence techniques to support decision-making on air quality [11]. Such approaches include rule-based systems. Although they have made significant contributions in this area, real-time monitoring systems are still immature and remain challenging. This may be because many rules must be defined before the system can work efficiently. Detection accuracy also depends on the parameters of the in-vehicle environment, which vary continuously. Furthermore, these studies focused only on classifying the air quality in real time, without the ability to predict future conditions [12].
To make accurate predictions, real-time information on various pollutants in the vehicle is required. To date, such information has not been available in an online public repository or in a constantly updated database. Only a few published works focus on driver drowsiness and its relationship with air pollutants inside the vehicle. Furthermore, there is little information on systems available on the market that can classify and predict the future state of in-vehicle conditions and present them in an interactive visualization.
This paper addresses the above-mentioned limitations by proposing a new approach to classify in-vehicle air quality and predict its future state. Two deep learning models are used to handle the time-series data: Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU). These models are then compared with conventional machine learning algorithms such as Support Vector Regression (SVR) and Multi-Layer Perceptron (MLP), using performance metrics such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and the coefficient of determination (R²).
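As an illustration of the three metrics named above, the following minimal sketch computes RMSE, MAE and the coefficient of determination from their standard definitions. The function name `regression_metrics` and the sample readings are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(actual, predicted):
    """Return RMSE, MAE and R^2 for two equal-length numeric sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    # R^2 is undefined when the actual series is constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    # Hypothetical CO2 readings in ppm, for illustration only.
    observed = [620.0, 655.0, 700.0, 760.0]
    forecast = [630.0, 650.0, 710.0, 750.0]
    print(regression_metrics(observed, forecast))
```

Lower RMSE and MAE indicate smaller errors, while an R² closer to 1 indicates that the predictions explain more of the variance in the observed data.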
The remainder of the paper is organized as follows: Section 2 describes the previous work related to this study. Section 3 gives detailed explanations of the data used and the methodology applied to predict the future state. Section 4 presents the experimental results. Finally, Section 5 concludes the study and provides directions for future research.
2. Related Work
Studies in indoor air quality prediction have increased considerably in recent years. However, most of them have focused on indoor or outdoor environments rather than on the vehicle cabin, and most air quality indexes and standards were introduced for outdoor environments. To date, air quality inside the vehicle cabin has not been included in any standard [13]. Over time, driver monitoring systems have been developed to monitor and measure drivers’ conditions while they are driving [14]. This is due to the progress in autonomous driving technologies, which promote driver safety and health [15]. One of the concerns in driver monitoring systems is drowsiness, a condition caused by a lack of oxygen and an increase in air pollutants from the outside environment, such as CO, CO2 and NO2 [16]. Furthermore, air pollution is also a major concern that can affect drivers’ ability to focus on the road [17]. Studies show that long-term exposure to air pollution poses a high risk to human health and results in respiratory and cardiovascular problems, neuropsychiatric complications, skin diseases and chronic illnesses such as cancer [18].
Several studies found that a high concentration of CO2 could affect human decision-making performance. Although not immediately life-threatening, it had a significant impact, particularly when driving. Ref. [19] reported that seven out of nine cognitive function domains could be affected by an increase in CO2 concentration in the vehicle cabin. Prolonged exposure to a high concentration of CO2 (1400 ppm) significantly affected human cognitive performance, compared with 100% outdoor air ventilation and a moderate CO2 concentration (~945 ppm). Meanwhile, ref. [20] conducted an experiment that recorded CO2 concentrations every 5 min under two different air circulation modes. It found that the CO2 concentration reached 3200 ppm after one hour, and human subjects reported an unpleasant sensation after 25 min.
Table 1 presents the rest of the related work focusing on in-vehicle air quality systems.
The Air Quality Index (AQI) is used as a standard to measure the current air quality in the surrounding environment. In particular, it measures the state of each air quality parameter relative to human needs or purposes [26]. This informs the public of the current air quality and helps determine whether it has an impact on their health. Several AQIs have been established in different countries with different names, limit ranges and observation parameters. For example, the Air Quality Health Index (AQHI) has been introduced in Canada and Hong Kong [27]. Singapore uses the Pollutant Standards Index (PSI), while Malaysia uses the Air Pollution Index (API).
Several techniques have been introduced to predict air quality in the in-vehicle environment. Some use electronic devices that are attached to the driver’s skin to measure biological signals such as electrocardiography, electrooculography and electromyography [28]. They monitor variations in the brain signal and determine the driver’s cognitive ability and psychological state for driving. Another approach uses cameras to obtain visual information on the driver’s behavior [29]. Visual characteristics, including the eyes and mouth, are analyzed to detect signs of drowsiness or distraction such as yawning and eye activity. In recent years, new technology involving multi-modal sensors that can analyze drivers’ bio-signals has been emerging [30]. This includes measuring the concentration of air pollutants in the car as well as particulate matter. The method is promising, as these gases can affect decision-making ability and information usage.
Table 2 presents examples of pollutant gases that can affect a driver’s ability to drive properly in the vehicle cabin.
With respect to prediction systems, artificial intelligence approaches have been widely used. Other traditional prediction methods use statistical techniques and mathematical models such as linear regression, principal component analysis (PCA) and multiple linear regression [34]. In addition, machine learning approaches such as Support Vector Machine (SVM) and Decision Tree (DT) are also used to classify air quality [30]. However, traditional prediction techniques are ill-suited for time-series applications, and prediction results always depend on the historical data [35]. Furthermore, features have to be selected and handcrafted manually each time the environment changes. This makes classification systems time-consuming and ineffective [36].
In recent years, studies have shown that deep learning models are highly capable of dealing with time-series data as well as with the long-term dependencies in air quality prediction data. As a result, deep learning has gained increasing interest in the prediction field. A deep learning model contains hidden layers that can learn data patterns autonomously [37]. Furthermore, deep learning has advantages over traditional approaches, including the ability to extract features automatically without handcrafted feature extraction. Moreover, deep learning can make use of shallow features that are difficult to use with traditional methods. Researchers have applied deep learning models to predict air quality. For example, ref. [38] applied an LSTM and Deep Autoencoder model to predict air quality in Seoul, South Korea. The study showed high prediction performance using parameters such as PM10 and PM2.5. Moreover, ref. [39] also used particulate matter as the main parameter. The study applied LSTM and GRU models to predict air quality and found that GRU outperformed the LSTM model. However, most of these studies focused only on indoor or outdoor air quality. In addition, the learning models were evaluated in post-analysis rather than in real-time systems.
From the review, it can be seen that most of the presented work focused on air quality prediction for indoor or outdoor environments. This paper takes a different viewpoint and investigates the capability of deep learning algorithms to predict air quality inside the vehicle cabin. The work compares the performance of deep learning with traditional machine learning algorithms using several parameters such as CO2, particulate matter, temperature and humidity. This is important to ensure the safety of the driver and passengers on the road.
5. Experimental Results
A comparison was performed to evaluate which approach gave the best performance. It was carried out between the machine learning and deep learning models. The machine learning approach was represented by SVR and MLP, while the deep learning approach comprised LSTM and GRU, which are variants of the Recurrent Neural Network (RNN). In this paper, SVR and MLP are regarded as machine learning approaches because they have no recurrent feedback to update the weights and biases. In contrast, both the LSTM and GRU models have recurrent feedback to improve the weight and bias values that are used.
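To make the notion of recurrent feedback concrete, the sketch below implements one forward step of a GRU cell from the standard textbook equations (update gate, reset gate, candidate state) in plain Python. It is not the architecture used in this study: the weights are random, the hidden size is arbitrary, and the four input features per time step are hypothetical scaled readings. Note also that the sign convention for combining the old state and the candidate state differs between references.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, U, h, b):
    """Compute W @ x + U @ h + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def gru_step(x, h_prev, p):
    """One GRU time step: the previous hidden state feeds back into every gate."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h_prev, p["bz"])]  # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h_prev, p["br"])]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], gated, p["bh"])]
    # Blend old state and candidate state (one common convention).
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def init_params(n_in, n_hidden, seed=0):
    """Random illustrative weights; a trained model would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in "zrh":
        params["W" + gate] = mat(n_hidden, n_in)
        params["U" + gate] = mat(n_hidden, n_hidden)
        params["b" + gate] = [0.0] * n_hidden
    return params


def run_gru(sequence, params, n_hidden):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = [0.0] * n_hidden
    for x in sequence:
        h = gru_step(x, h, params)
    return h


if __name__ == "__main__":
    # Two time steps of four hypothetical, already scaled sensor features.
    readings = [[0.40, 0.10, 0.50, 0.60], [0.45, 0.12, 0.50, 0.61]]
    print(run_gru(readings, init_params(4, 3), 3))
```

The loop in `run_gru` is the feedback path: the hidden state produced at one time step becomes an input at the next, which a feed-forward MLP or an SVR does not have.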
The hyperparameter values were chosen using the grid-search method. Table 6 presents the specific hyperparameter values that were selected. It can be seen that tuning these parameters has a large impact on the prediction results. The structures of MLP, LSTM and GRU are similar, as they belong to the same family of learning models, while SVR differs in its parameters, such as the kernel, kernel coefficient and regularization parameter.
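The grid-search procedure can be sketched as an exhaustive loop over every combination of candidate values, keeping the combination with the lowest validation error. In the sketch below, the search grid, the toy forecaster (a weighted mean of recent readings with hypothetical `window` and `decay` parameters) and the sample series are all assumptions made for illustration; they stand in for the SVR, MLP, LSTM and GRU models and for the actual values listed in Table 6.

```python
import itertools
import math


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination; return (best_params, best_score).

    Lower scores are better. On ties, the first combination found is kept.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def make_score_fn(series):
    """Build a scoring function: RMSE of a toy one-step-ahead forecaster."""

    def score(window, decay):
        squared_errors = []
        for t in range(window, len(series)):
            recent = series[t - window:t]
            # The newest reading has weight 1; older readings decay geometrically.
            weights = [decay ** (window - 1 - i) for i in range(window)]
            forecast = sum(w * v for w, v in zip(weights, recent)) / sum(weights)
            squared_errors.append((series[t] - forecast) ** 2)
        return math.sqrt(sum(squared_errors) / len(squared_errors))

    return score


if __name__ == "__main__":
    # Hypothetical, steadily rising CO2 series in ppm.
    co2 = [400.0 + 10.0 * t for t in range(30)]
    grid = {"window": [1, 2, 4], "decay": [0.2, 0.5, 1.0]}
    print(grid_search(grid, make_score_fn(co2)))
```

Because the cost grows with the product of the grid sizes, grids are usually kept coarse for models that are expensive to train.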
The next step was to build the prediction models with the hyperparameters determined in the previous step. The collected data were used to build the proposed models, with 80% of the input used for training and 20% for validation. The training data were also based on the section and monthly data. Table 7 presents the performance of the proposed models validated on the section and monthly data. The results show that the SVR with RBF kernel and the GRU model had almost the same performance index in the evaluation. However, the proposed GRU prediction model achieved higher prediction accuracy and lower error rates. The GRU model obtained an R² value of 0.83 for the section data and 0.97 for the monthly data.
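The 80%/20% division described above can be sketched as follows. The text does not state whether the samples were shuffled before splitting; this sketch assumes a chronological split, which is a common choice for time-series data because it keeps future readings out of the training set. The function name `chronological_split` is hypothetical.

```python
def chronological_split(samples, train_fraction=0.8):
    """Split ordered samples into an earlier training part and a later validation part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = round(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


if __name__ == "__main__":
    readings = list(range(10))  # placeholder for time-ordered sensor samples
    train, validation = chronological_split(readings)
    print(len(train), len(validation))
```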
Lastly, the final step was to evaluate the prediction of future in-vehicle air quality. Table 8 shows the results for the future prediction data. It compares three time periods: five minutes, ten minutes and twenty minutes. The predictions for the five-minute data had a slightly higher performance than those for the ten- and twenty-minute data. The results also show that the proposed GRU prediction model gave very stable evaluation performance across the three types of data. Meanwhile, Figure 8 plots the prediction results of the proposed GRU prediction model against the actual data.
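The text does not specify whether the five-, ten- and twenty-minute periods are forecast horizons or data collection intervals. Under the assumption that they are forecast horizons, the sketch below shows one way to turn a sensor series into supervised training pairs for each horizon. The sampling interval of five minutes, the number of lagged readings and the function names are all hypothetical.

```python
def horizon_to_steps(horizon_minutes, sampling_minutes):
    """Convert a forecast horizon in minutes into a number of samples ahead."""
    if horizon_minutes <= 0 or horizon_minutes % sampling_minutes != 0:
        raise ValueError("horizon must be a positive multiple of the sampling interval")
    return horizon_minutes // sampling_minutes


def make_supervised(series, n_lags, horizon_steps):
    """Pair each window of n_lags past readings with the reading horizon_steps ahead."""
    inputs, targets = [], []
    last_start = len(series) - n_lags - horizon_steps
    for start in range(last_start + 1):
        inputs.append(series[start:start + n_lags])
        targets.append(series[start + n_lags + horizon_steps - 1])
    return inputs, targets


if __name__ == "__main__":
    series = list(range(30))  # placeholder readings, one every 5 minutes (assumed)
    for minutes in (5, 10, 20):
        steps = horizon_to_steps(minutes, sampling_minutes=5)
        X, y = make_supervised(series, n_lags=3, horizon_steps=steps)
        print(minutes, len(X), X[0], y[0])
```

Longer horizons leave fewer usable training pairs and ask the model to extrapolate further, which is consistent with the slightly lower performance reported for the ten- and twenty-minute cases.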
6. Conclusions
After an extensive series of experiments, it can be concluded that the GRU deep learning model performs well in predicting in-vehicle air quality. The model was compared with the LSTM model as well as with SVR and MLP from the traditional machine learning models. The proposed model achieved the highest R² value of 0.97. Furthermore, the GRU model also showed the lowest error in terms of MSE, RMSE and MAE. The experiments also indicate that the performance of the prediction system depends on the time taken to collect the data: the GRU model with five-minute data had the highest performance compared with the ten- and twenty-minute data.
Moreover, the model’s hyperparameters were optimized using the grid-search method, which allowed the optimum values to be used for predicting air quality. The overall results showed that the GRU model was able to capture the historical data of the installed sensors and predict them successfully. However, some limitations were noted throughout the study. Some data are missing due to the loss of internet connectivity. Furthermore, the in-vehicle system needs to be provided with reliable communication systems in order to provide an efficient prediction system.
For future work, the model will be embedded in the cloud database for faster data processing. This task can be extended to various applications of prediction systems for smart mobility applications. Furthermore, the feature extraction process can be conducted before performing the prediction task. The goal is to autonomously extract relevant features for representing environmental conditions and to compare the performance rates with non-extracted feature methods.