Article

Proposal of a Methodology for Prediction of Indoor PM2.5 Concentration Using Sensor-Based Residential Environments Monitoring Data and Time-Divided Multiple Linear Regression Model

1
Department of Chemical and Environmental Engineering, Seokyeong University, Seoul 02713, Republic of Korea
2
Department of Nano and Biological Engineering, Seokyeong University, Seoul 02713, Republic of Korea
3
Department of Health, Division Chemical Analysis Center, Korea Conformity Laboratories, Seoul 08503, Republic of Korea
4
Department of Occupational Health, Daegu Catholic University, Gyeongsan 38430, Republic of Korea
5
Department of Nano, Chemical and Biological Engineering, Seokyeong University, Seoul 02713, Republic of Korea
6
Department of Public Health, California State University, Fresno, CA 93740, USA
*
Author to whom correspondence should be addressed.
Toxics 2023, 11(6), 526; https://doi.org/10.3390/toxics11060526
Submission received: 27 April 2023 / Revised: 27 May 2023 / Accepted: 7 June 2023 / Published: 12 June 2023
(This article belongs to the Special Issue Ambient Air Pollution Exposure and Human Health)

Abstract

This study proposes an easily applicable indoor air quality prediction method that reflects temporal characteristics. Indoor PM2.5 concentration is calculated with a multiple linear regression model whose inputs are indoor and outdoor data measured near the indoor target point. Atmospheric conditions and air pollution measured at one-minute intervals with sensor-based monitoring equipment (Dust Mon, Sentry Co Ltd., Seoul, Korea) inside and outside houses from May 2019 to April 2021 were used to develop the prediction model. By dividing the multiple linear regression model into one-hour increments, we attempted to overcome two limitations: the model's inability to represent changes over time, and its limited set of input variables. The multiple linear regression (MLR) model classified by time unit showed an improvement in explanatory power of up to 9% compared to the existing model, and some hourly models had an explanatory power of 0.30. These results indicate that the model needs to be subdivided by time period to predict indoor PM2.5 concentrations more accurately.

1. Introduction

Koreans spend around 20.66 hours per day indoors, equivalent to spending more than 72 years in an indoor environment based on life expectancy as of 2021 [1,2]. As most modern people spend the majority of their time indoors, it is critical to identify and regulate the concentration of pollutants in the indoor environment. PM2.5 is particulate matter with an aerodynamic diameter of 2.5 μm or smaller and is detrimental to health due to its ease of adsorption and concentration of poisonous substances [3]. Previous studies have shown that long-term exposure to PM2.5 significantly increases the chances of cardiopulmonary problems and the mortality of lung cancers [4,5]. Recently, high concentrations of PM2.5 in the air have become common in Korea. Thus, it is necessary to know the spatial or temporal distributions of indoor PM2.5 concentrations to prevent adverse health impacts of PM2.5 exposure on occupants.
Indoor air quality is mostly monitored with measuring devices. However, assessment of measurement-based indoor air quality requires a significant amount of time and money. Among the various indoor spaces, difficulties in installing, maintaining, and repairing measurement devices are of particular note in residential environments [6]. To overcome the limitations of this measurement-based monitoring, many attempts have been made to estimate indoor particulate matter (PM10, PM2.5) concentrations. In particular, approaches through artificial intelligence, such as machine learning and deep learning, are being intensively researched [7,8]. Models that have been commonly used in these studies are multiple linear regression (MLR), decision tree model, support vector machine, random forests, and artificial neural networks (ANN) [9]. However, these prediction models only estimate parameters based on knowledge of other parameters (temperature, relative humidity, occupants’ activities, etc.) and are not designed to incorporate time as a variable or reflect temporal characteristics [9].
Meanwhile, since indoor air quality is affected by various factors, it is necessary to determine the relative contribution of the variables affecting indoor PM2.5 concentration and to choose input variables accordingly in order to predict indoor air quality accurately. Thus, previous studies have developed prediction models by choosing ventilation conditions, indoor pollutants, indoor airflow, and pressure that can be directly measured or validated indoors as input variables to increase predictive performance [10,11]. Using input variables obtained through sampling or surveys, such as ventilation rate, indoor pollutant concentration, pressure, airflow, and activity pattern information, may result in high performance [12]. However, these variables are difficult to obtain in real time, making it even more challenging to predict indoor air quality accurately.
To be effective, an indoor concentration prediction model should be composed of easily obtainable variables. A previous study found that indoor air quality was more affected by outdoor air quality than by indoor sources in the case of natural ventilation [13]. When the windows were closed, outdoor sources accounted for 53 to 63% of indoor PM2.5 concentration. However, this increased to 92% when the windows were open [14]. In particular, PM2.5 in outdoor air can enter indoors through cracks or gaps in building envelopes and windows, so outdoor PM2.5 is a key factor that can affect indoor PM2.5 [15].
Most previous studies took outdoor PM2.5 concentrations from fixed monitoring stations that were somewhat distant from the target point [16,17]. In the case of Korea, the national monitoring network is established at the city and county level, so the reported outdoor PM2.5 concentration may differ from the concentration in the air near the indoor measurement point [18]. By contrast, the I/O ratio, which is calculated as the ratio of the average indoor concentration to the average outdoor concentration, can be used to estimate the approximate average indoor concentration from the outdoor concentration [19]. In most studies, the measurements used to calculate the I/O ratio were taken simultaneously indoors and at an outdoor measurement point near the indoor sampling location [20]. Using the I/O ratio to estimate indoor PM2.5 concentrations can be a useful approach when it is difficult to obtain direct indoor PM2.5 measurements. However, it is important to note that the I/O ratio may not accurately reflect the actual indoor concentration, because it assumes that the infiltration of outdoor PM2.5 is constant over time and it treats the outdoor PM2.5 concentration as the only variable influencing the indoor PM2.5 concentration. Therefore, the I/O ratio method should be used with caution and supplemented with other methods to accurately predict the indoor PM2.5 concentration.
The main goal of this study was to overcome the limitations of using fixed station data as input values, which failed to reflect the temporal characteristics of existing indoor air quality prediction models. To achieve this, we used the I/O ratio method and utilized outdoor PM2.5 concentration, temperature, and humidity data measured near the indoor target point as input data to investigate the relationship between indoor and nearby outdoor PM2.5 concentrations. Subsequently, the indoor PM2.5 concentration was calculated using a multiple linear regression (MLR) model, employing easily available variables such as meteorological data, and the influence of outdoor variables was confirmed through weights and error terms. Finally, our aim was to provide an easily utilizable indoor PM2.5 concentration prediction method that can accurately reflect temporal characteristics.

2. Materials and Methods

2.1. Measurement Method

In this study, we measured the indoor and outdoor PM2.5 concentrations, temperature, and relative humidity for our indoor PM2.5 concentration prediction model. Other meteorological variables were obtained from the automatic weather system (AWS) data provided by the Korea Meteorological Administration.
First, measurable variables (PM2.5 concentration, temperature, and relative humidity) were collected from the inside and outside of a house in Bu-Cheon and Nam-Yang-Ju. The measuring point was located within a residential complex, with the Si-Heung interchange of the Metropolitan First Circular Expressway approximately 1.8 km to the south and Nam-Yang-Ju interchange of Metropolitan First Circular Expressway and North Arterial Road approximately 1 km to the west and north (Figure 1).
An outdoor air quality measuring instrument (Dust Mon, Sentry Co. Ltd., Seoul, Republic of Korea) using light scattering was used. This instrument had an error within 80–120% of the total average value of PM2.5 measured using standard equipment based on variability evaluation among monitoring equipment at the Korea Testing & Research Institute. The specifications of the measuring instrument are shown in Table 1, and the flow rate was fixed at 0.5 L/min. Measurement data were collected in real time through LTE Cat M1 and stored on an SD card within the instrument. In Bu-Cheon, the instruments were attached to the roof of the third floor of the target house and to the wall of the living room; in Nam-Yang-Ju, they were attached to the balcony and to a room. Indoor and outdoor real-time PM2.5 concentrations were measured simultaneously at one-minute intervals over two periods of about one year each (1 May 2019 to 30 April 2020 and 27 June 2020 to 22 April 2021).
The meteorological data utilized in the study were obtained from AWS located at the target points in Bu-Cheon (37°50′05.49″ N, 126°76′36.40″ E) and Nam-Yang-Ju (37°63′42.48″ N, 127°15′11.84″ E).

2.2. Data Analysis

2.2.1. Statistics Analysis

Indoor and outdoor variables, including PM2.5, temperature, and relative humidity, were directly measured and collected at ten-minute intervals. The meteorological data (wind direction, wind speed, precipitation) were obtained through an AWS, and atmospheric data and time data (year-month-day 00:00) were extracted by averaging the minute data over ten-minute intervals. Rows containing missing or negative values were removed before performing descriptive statistical analysis and correlation analysis using the statistical program R.
Spearman’s rank correlation analysis, a non-parametric analysis method, was used for the correlation analysis considering the non-normal distribution of PM concentrations, and the significance level was set at 0.05.
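As an illustration of the rank-based correlation described above, the following is a minimal sketch in plain Python (the study itself used R). The function names and the sample readings are hypothetical, and the significance test at the 0.05 level is not included.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical ten-minute averages (ug/m3), for illustration only.
outdoor = [12.0, 30.0, 18.0, 55.0, 22.0]
indoor = [5.0, 11.0, 9.0, 40.0, 8.0]
rho = spearman(outdoor, indoor)
```

Because only the ordering of the values matters, a single extreme PM2.5 spike affects this coefficient far less than it would affect a Pearson correlation, which is the usual reason for preferring it on skewed concentration data.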

2.2.2. Selection of Input Variables

The selection of input variables is a crucial consideration in modeling methods because they determine the model structure and can impact the coefficients and overall performance [21,22,23]. In this study, we aimed to use easily accessible meteorological variables as input variables. However, these variables exhibit nonlinear characteristics when predicting PM concentration [24]. This is due to numerous artificial conditions, such as household heating, transportation, and the activities of occupants, that can affect the immediate PM concentration [25]. Although using corresponding variables to increase the predictive power of the model may seem effective, it is not feasible for the purpose of this study, which is to propose an effective model. Furthermore, since sources that affect particulate matter concentration differ across regions, applying a variable that can impact immediate PM concentration is limited to a specific location.
In this study, outdoor PM concentration, temperature, relative humidity, and other meteorological variables were selected as the main input variables. Additionally, to overcome the limitation of the existing prediction model’s training data being somewhat distant from the indoor target point, a measurement method was applied to calculate the I/O ratio. Data obtained from measuring devices installed outdoors near the target point were used to determine the outdoor PM2.5 concentration, temperature, and relative humidity.
The MLR model used in this study is based on the I/O ratio, with the outdoor PM2.5 concentration selected as the initial variable. Other variables, including indoor and outdoor temperature and relative humidity measured by a sampling device, AWS data (wind speed, wind direction, and precipitation), as well as the difference between indoor and outdoor temperature and relative humidity, were also selected as input variables. Currently, the temperature and relative humidity difference between indoor and outdoor environments are recognized as contributing factors to natural ventilation, and hence, it was considered as a potential input variable [26]. The final input variables were chosen based on the results of correlation analysis and previous research.
To check for multicollinearity among the independent variables, the variance inflation factor (VIF) was calculated using Equation (1). A VIF value of 1 indicates independence among variables, while a value > 5 indicates a high correlation among variables. If the VIF value is >10, one of the variables violating independence must be removed [27].
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2} \tag{1}$$
In Equation (1), $\mathrm{VIF}_i$ is the variance inflation factor for the ith independent variable, and $R_i^2$ is the coefficient of determination obtained by regressing the ith independent variable on the remaining independent variables.
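The VIF calculation in Equation (1) can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the function name `vif` and the synthetic columns are invented for the example, and each auxiliary regression includes an intercept.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (rows = samples, columns = predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for i in range(p):
        target = X[:, i]
        others = np.delete(X, i, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))  # Equation (1)
    return np.array(out)

# Synthetic predictors: column c is nearly a copy of column a, b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
vifs = vif(np.column_stack([a, b, c]))
```

In this synthetic case the two nearly duplicated columns receive VIF values far above 10, while the independent column stays close to 1, which mirrors the removal rule described in the text.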

2.3. Data Preprocessing before Training

Data pre-processing and learning methods are shown in Figure 2. The dataset was first processed to remove missing values, after which it was divided into hourly units based on date and time, resulting in 24 datasets from 0:00 to 23:00. To ensure that the model predicts typical concentration levels and to improve its performance, outliers of indoor PM2.5 concentration were removed from each of the 24 datasets using the interquartile range method [28]. The interquartile range is given by Equation (2), and values falling outside the range in Equation (3) were treated as outliers:
$$\mathrm{IQR} = Q_3 - Q_1 \tag{2}$$
$$Q_1 - 1.5 \times \mathrm{IQR} \le x \le Q_3 + 1.5 \times \mathrm{IQR} \tag{3}$$
In Equations (2) and (3), $\mathrm{IQR}$ denotes the interquartile range, $Q_3$ denotes the third quartile, and $Q_1$ denotes the first quartile.
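The hourly split and the IQR filter of Equations (2) and (3) can be sketched as below. This is a minimal standard-library example, not the study's pipeline: the record layout, the function names, and the sample values are hypothetical, and the quartile interpolation method (`"inclusive"`) is an assumption because the source does not state which one was used.

```python
from collections import defaultdict
from datetime import datetime
from statistics import quantiles

def split_by_hour(records):
    """Group (timestamp, indoor_pm25) pairs into buckets keyed by hour (0..23)."""
    buckets = defaultdict(list)
    for ts, value in records:
        buckets[ts.hour].append(value)
    return buckets

def iqr_filter(values):
    """Keep values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (Equations (2) and (3))."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if lo <= v <= hi]

# Hypothetical ten-minute indoor readings (ug/m3).
records = [
    (datetime(2019, 5, 1, 9, 0), 10.0),
    (datetime(2019, 5, 1, 9, 10), 12.0),
    (datetime(2019, 5, 1, 9, 20), 11.0),
    (datetime(2019, 5, 1, 9, 30), 13.0),
    (datetime(2019, 5, 1, 9, 40), 95.0),  # isolated spike
    (datetime(2019, 5, 1, 20, 0), 8.0),
]
hourly = split_by_hour(records)
# Quartiles need at least two points, so single-value buckets are left as they are.
cleaned = {h: iqr_filter(v) if len(v) >= 2 else v for h, v in hourly.items()}
```

Filtering inside each hourly bucket, rather than on the pooled data, means a value is judged against what is typical for that hour of the day.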

2.4. Multiple Linear Regression Model

Multiple linear regression (MLR) is a technique used for modeling the linear relationship between a dependent variable and two or more independent variables. The model is fitted such that the sum of squares of differences between observed and predicted values is minimized [29]. The following represents an MLR model:
$$y = a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + \varepsilon \tag{4}$$
In Equation (4), $y$ is the dependent variable (PM2.5), $x_1, x_2, \ldots, x_n$ are the independent variables, $a_1, a_2, \ldots, a_n$ are their regression coefficients, and $\varepsilon$ is the intercept.
MLR was used to create a prediction model for indoor PM2.5 concentration. Prior to the application of the MLR procedure, all data were normalized according to Equation (5) [30].
$$Z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{5}$$
In Equation (5), $Z_i$ denotes the ith normalized value, $x_i$ denotes the ith observed value for the variable $x$, $\min(x)$ denotes the minimum value in the dataset, and $\max(x)$ denotes the maximum value in the dataset. The MLR model was formulated using Scikit-learn (version 1.0.2) [31].
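As a small worked instance of the min-max normalization in Equation (5), the sketch below scales one variable to the range 0 to 1. The function name and the input values are illustrative, and returning zeros for a constant column is a choice made here, not something the source specifies.

```python
def min_max_scale(values):
    """Scale values to [0, 1] following Equation (5)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([5.0, 10.0, 25.0])
```

Since every input ends up on the same 0 to 1 scale, the fitted regression coefficients can be compared with one another as rough indicators of each variable's influence.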
In this study, we aim to develop a real-time indoor PM2.5 concentration prediction model using gradient descent, which is a method used for estimating model weights in deep learning models such as neural networks.
The gradient descent methods include full gradient descent (i.e., batch gradient descent), stochastic gradient descent (SGD), and mini-batch gradient descent [32]. Full gradient descent uses the entire dataset to update the parameters once, but it can take a long time to calculate the coefficients if the dataset is large. In the case of the SGD method, an appropriate gradient can be obtained for one data point that has been randomly sampled from all the data to update the weight quickly. With mini-batch gradient descent, the gradient is calculated using a randomly selected batch size [33]. This method is often used because it is known to solve the problem of gradient vanishing and exploding, which can prevent finding better weight. However, very recently, it has been discovered that mini-batching is not necessary to resolve the non-vanishing variance issue inherent in the original SGD methods [31].
The regression coefficients of the prediction model were estimated with the SGD method, using scikit-learn's "SGDRegressor" class for training. Default values were used for all hyper-parameters except the regularization intensity (alpha) and the initial learning rate (eta0) [31].
Regularization is a method used to prevent the overfitting of MLR. Ridge regression (L2), a type of weight regularization method, was applied so that all the selected variables could be used. Whereas ordinary least squares minimizes only the mean squared error (Equation (6)), ridge regression adds a penalty term, as shown in Equation (7), which mitigates overfitting by shrinking the overall weights according to the alpha value of the penalty term. The larger the alpha value, the stronger the regularization intensity; when alpha is zero, regularization is not applied. To apply an appropriate regularization intensity, we varied the alpha value in powers of 10 within the range of 0.0001 to 10.
$$\underset{w,b}{\operatorname{argmin}} \; \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{6}$$
$$\mathrm{Error} = \underset{w,b}{\operatorname{argmin}} \; \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \alpha \sum_{j} w_j^2 \tag{7}$$
where $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, $n$ is the number of data points, $w_j$ is the jth weight, and $\alpha$ is alpha.
Meanwhile, the initial learning rate (eta0) was set to 0.001 instead of the default value of 0.01. This change was made because, with the default value, training converged to a local optimum rather than the global optimum.
The dataset was split into 70% training data and 30% test data for use in the SGD model. The random split was performed with the 'train_test_split' function of the scikit-learn package, using its 'random_state' parameter. The SGD models were trained using the training dataset, and their performance was assessed using the test data that were not used during training.
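The training procedure described in this section can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the authors' script: the data, the "true" weights, the seed values, and the choice to report a test-set score for every alpha are assumptions made for the example, and the source does not say how the final alpha was chosen.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for six input variables; the weights below are invented.
rng = np.random.default_rng(42)
X = rng.uniform(size=(2000, 6))
true_w = np.array([12.0, -8.0, 3.0, -2.0, 1.0, 4.0])
y = X @ true_w + 5.0 + rng.normal(scale=1.0, size=2000)

# 70/30 random split, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Min-max scaling (Equation (5)), fitted on the training data only.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Ridge-penalized SGD with eta0 = 0.001 and alpha swept over powers of 10.
scores = {}
for alpha in [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]:
    model = SGDRegressor(penalty="l2", alpha=alpha, eta0=0.001, random_state=0)
    model.fit(X_train_s, y_train)
    scores[alpha] = r2_score(y_test, model.predict(X_test_s))
```

In practice, the regularization strength would normally be chosen on a separate validation split or by cross-validation, so that the test data remain untouched until the final evaluation.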

2.5. Performance Indicators

The accuracy of the MLR methods was evaluated using the coefficient of determination ($R^2$), root-mean-square error (RMSE), and mean absolute error (MAE). The $R^2$ value describes how much of the variability in the observed data is explained by the model's predictions. The RMSE and MAE are used to measure the difference between the measured and predicted values. The equations for these performance indicators are given in Equations (8)–(10), respectively [34]:
$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{8}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \tag{9}$$
$$MAE = \frac{\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|}{n} \tag{10}$$
where $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the measured values, and $n$ is the number of samples.
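Equations (8)–(10) translate directly into code. The following is a minimal plain-Python sketch; the function name and the four sample values are illustrative only.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Compute R2, RMSE and MAE as defined in Equations (8)-(10)."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    tss = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    return {"r2": 1 - rss / tss, "rmse": sqrt(rss / n), "mae": mae}

# Illustrative measured and predicted concentrations (ug/m3).
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
metrics = regression_metrics(measured, predicted)
```

Here every prediction is off by exactly 1 µg/m3, so RMSE and MAE are both 1.0, while R2 is 0.8 because the residual sum of squares (4) is one fifth of the total sum of squares (20).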

3. Results

3.1. Distribution Characteristics of Indoor and Outdoor Measurement Data

The distributions of the measurement data showed an indoor PM2.5 concentration of 10.31 ± 13.70 μg/m3, an outdoor PM2.5 concentration of 26.28 ± 20.69 μg/m3, and an I/O ratio of 0.39 (median ratio 0.29) (Table 2). To investigate the distribution of each variable, skewness and kurtosis were calculated. Indoor PM2.5 concentrations were found to have a positively skewed and highly peaked distribution, with a skewness of 4.44 and a kurtosis of 50.27. Outdoor PM2.5 concentrations were found to have a similar distribution. By contrast, temperature and relative humidity were found to have approximately normal distributions.
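As a quick arithmetic check, assuming the I/O ratio is the ratio of the mean indoor to the mean outdoor concentration as defined in the Introduction, the reported means reproduce the reported ratio. The symbols $\bar{C}_{\mathrm{in}}$, $\bar{C}_{\mathrm{out}}$, and $\hat{C}_{\mathrm{in}}$ are introduced here for illustration only.

```latex
\frac{\bar{C}_{\mathrm{in}}}{\bar{C}_{\mathrm{out}}}
  = \frac{10.31}{26.28} \approx 0.39,
\qquad
\hat{C}_{\mathrm{in}} \approx 0.39 \times C_{\mathrm{out}}
```

Under this constant-ratio assumption, an outdoor reading of 30 μg/m3 would map to an indoor estimate of about 11.7 μg/m3 regardless of the time of day, which is the limitation the time-divided model is intended to address.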
This study aimed to develop an indoor PM2.5 concentration prediction model that reflects temporal characteristics. Thus, the distribution characteristics of indoor PM2.5 concentrations by time were determined (Figure 3). Indoor PM2.5 concentrations were higher between 7–10 h and 19–21 h than at other times. Furthermore, it was verified that an extreme concentration compared to the average indoor PM2.5 concentration appeared in the evening (17–20 h).

3.2. Selection of Input Variables

To select input variables that can predict indoor PM2.5 well, the correlation coefficient was checked (Figure 4). Indoor PM2.5 concentration was found to have the highest correlation with PM2.5 (r = 0.43) among outdoor parameters, followed by relative humidity (r = 0.40) and wind speed (r = −0.17) (p < 0.05). In the case of indoor parameters, temperature (r = −0.43) (p < 0.05) had a high correlation. The indoor/outdoor temperature difference had a weak negative correlation (r = −0.17), and the relative humidity difference had a negative correlation (r = −0.41) (p < 0.05).
Initially, input variables with an absolute correlation coefficient of 0.1 or higher were selected from the available variables [35]. Then, because of the concern of multicollinearity, the VIF was used to select the final set of input variables. The VIF values were 9.38 for outdoor PM2.5, 2.33 for indoor temperature, 1.28 for wind speed, 3.75 for outdoor relative humidity, 1.32 for temperature difference (ΔTemp), and 5.33 for relative humidity difference (ΔRH). As no value exceeded 10, none of these variables was removed. Finally, the selected input variables were outdoor PM2.5, indoor temperature, wind speed, outdoor relative humidity, temperature difference (ΔTemp), and relative humidity difference (ΔRH).

3.3. Model Training Result

3.3.1. MLR Model

The result of the training MLR model by the previous method is summarized in Table 3. As a result of checking the model’s performance, the explanatory power of the model that did not reflect time series characteristics was found to be 25%, and the RMSE and MAE were 4.87 and 3.66, respectively. Furthermore, as a result of checking the weight of the model, it was confirmed that the outdoor PM2.5 concentration had the greatest effect on the indoor PM2.5 concentration (16.44), and the indoor temperature was found to have the next negative effect (−9.44).
The calculated regression coefficients and error terms are as follows:
$$\mathrm{PM}_{2.5,\mathrm{in}} = a \cdot \mathrm{PM}_{2.5,\mathrm{out}} + b \cdot \mathrm{Temp.}_{\mathrm{in}} + c \cdot \mathrm{RH}_{\mathrm{out}} + d \cdot \mathrm{WS}_{\mathrm{out}} + e \cdot \Delta\mathrm{Temp.} + f \cdot \Delta\mathrm{RH} + \varepsilon \tag{11}$$
where $a$, $b$, $c$, $d$, $e$, and $f$ denote the regression coefficients of the respective input variables, and $\varepsilon$ denotes the error term.

3.3.2. MLR Model Divided by Hour

Unlike the previous model, the time-divided prediction models were trained on datasets separated by hour, with outliers removed from each hourly dataset using the interquartile range. The results of training the hourly prediction models for indoor PM2.5 concentration are shown in Table 4.
The explanatory power of the model divided into time periods was determined to be about 20~34%, and the RMSE and MAE were confirmed to be 4~7 and 3~5, respectively. The explanatory power of the time-specific model was found to be improved by up to 9%, and the error between the measured value and the predicted value was improved.
The explanatory power of each model divided per hour was the highest at 0.34 for H4 (4:00~4:59) models. The RMSE and MAE of the H15 (15:00~15:59) model were 3.34157 and 2.55109, respectively, showing the smallest error between the measured value and the predicted value.
The regression coefficients of the hourly prediction models, which were trained on the same hourly, outlier-filtered datasets, are shown in Table 5.
As a result of checking the regression coefficient of the MLR model, it was established that the variable that had the greatest effect on the indoor PM2.5 concentration was the indoor temperature at the H9 model, and it was confirmed that it had the greatest negative (−) effect at −17.11. The outdoor PM2.5 was found to have the next largest positive (+) effect, with 15.70 at H8.
The calculated regression coefficients and error terms are as follows:
$$\mathrm{PM}_{2.5,\mathrm{in}} = a_t \cdot \mathrm{PM}_{2.5,\mathrm{out}}(t) + b_t \cdot \mathrm{Temp.}_{\mathrm{in}}(t) + c_t \cdot \mathrm{RH}_{\mathrm{out}}(t) + d_t \cdot \mathrm{WS}_{\mathrm{out}}(t) + e_t \cdot \Delta\mathrm{Temp.}(t) + f_t \cdot \Delta\mathrm{RH}(t) + \varepsilon_t \tag{12}$$
where $a_t$, $b_t$, $c_t$, $d_t$, $e_t$, and $f_t$ denote the regression coefficients of the model for hour $t$, and $\varepsilon_t$ denotes the corresponding error term.
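To make the structure of the time-divided model concrete, the sketch below evaluates the hourly equation by looking up the coefficient set for the hour of interest. All coefficient and input values are hypothetical placeholders rather than the fitted values from Table 5, the class and function names are invented, and the inputs are assumed to be already min-max normalized.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HourlyCoefficients:
    """One coefficient set (a_t .. f_t plus the error term) for a single hour."""
    pm25_out: float   # a_t
    temp_in: float    # b_t
    rh_out: float     # c_t
    ws_out: float     # d_t
    d_temp: float     # e_t
    d_rh: float       # f_t
    intercept: float  # error term

def predict_indoor_pm25(models, hour, x):
    """Evaluate the hourly linear model for the given hour and input dict."""
    c = models[hour]
    return (
        c.pm25_out * x["pm25_out"]
        + c.temp_in * x["temp_in"]
        + c.rh_out * x["rh_out"]
        + c.ws_out * x["ws_out"]
        + c.d_temp * x["d_temp"]
        + c.d_rh * x["d_rh"]
        + c.intercept
    )

# Hypothetical coefficients for the 09:00 model only.
models = {9: HourlyCoefficients(10.0, -5.0, 2.0, -1.0, 0.5, -2.0, 3.0)}
# Hypothetical normalized inputs for one ten-minute record at 09:xx.
inputs = {"pm25_out": 0.5, "temp_in": 0.4, "rh_out": 0.5,
          "ws_out": 0.2, "d_temp": 0.6, "d_rh": 0.5}
estimate = predict_indoor_pm25(models, 9, inputs)
```

A full deployment would hold 24 such coefficient sets, one per hourly model, and select among them using the timestamp of each incoming record.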

4. Discussion

4.1. Indoor PM2.5 Concentration and Outdoor Variables

The study aimed to propose a method for predicting indoor PM2.5 concentration by time by adding meteorological variables based on the I/O ratio method using PM2.5 concentration data measured in outdoor air adjacent to indoors. To confirm the correlation between indoor and outdoor PM2.5 concentrations, PM2.5 was measured in outdoor air adjacent to indoors. The outdoor PM2.5 concentration at the study site was found to be 26.28 ± 20.69 μg/m3, which is approximately twice the domestic annual average standard of 15 μg/m3. This concentration is also 1.5 times higher than the annual average PM2.5 concentration (20 μg/m3) of Seoul in 2021, according to the Air Quality Annual Report of the Ministry of Environment [36]. The indoor PM2.5 concentration was determined to be 14.86 ± 15.23 μg/m3, which is lower than the standard (35 μg/m3) for facilities used by sensitive classes in the indoor air quality management standard [37].
As a result of checking the indoor PM2.5 concentration characteristics by time zone, the highest concentration was 14.51 µg/m3 at 9:00, and the PM2.5 concentration between 7 and 10 o’clock was higher than at other times. In the afternoon, it gradually increased after 18:00, appeared high at 11.62 µg/m3 at 20:00, and then gradually decreased. These results were similar to those of previous studies [38,39,40,41,42,43]. The indoor PM2.5 concentration in the morning was higher than at other times, which could be due to occupants preparing for the day, cooking breakfast, or opening windows for ventilation. In the evening, the PM2.5 concentration gradually increased, which could be due to occupants returning home and cooking dinner. The outdoor PM2.5 concentrations on weekdays near apartments gradually increased from 6:00 a.m. and peaked at 9:00 a.m. [39]. It is assumed that this was due to the fact that car traffic and population movement during commuting hours affect the outdoor PM2.5 concentration in locations where residential complexes are concentrated, such as the study site.
A correlation analysis between indoor and outdoor PM2.5 showed that the indoor PM2.5 concentration had a correlation coefficient of 0.43 with the outdoor PM2.5 concentration. This correlation with outdoor PM2.5 was lower than that reported in previous studies [40,41,42,43,44]. Meanwhile, in a dry urban environment where atmospheric dust events frequently occur, the correlation between indoor and outdoor PM2.5 (r = 0.82) was confirmed to be very high even when the windows were closed [41]. These findings suggest that the indoor PM2.5 concentration can also increase when the wind speed is strong or when the PM2.5 concentration in the outdoor air is high. In addition, it was also validated that the outdoor PM2.5 concentration had a quantitative effect on the indoor PM2.5 concentration even with the windows closed in offices in downtown areas where dust storms did not frequently occur [39]. These results suggest that the outdoor PM2.5 concentration can be used as an indicator to predict the indoor PM2.5 concentration and that the correlation between indoor and outdoor PM2.5 concentrations can be improved by also considering meteorological variables such as wind speed and direction.
Moreover, based on the analysis, indoor PM10 had the highest correlation coefficient with indoor PM2.5 (0.95), followed by outdoor PM10, outdoor PM2.5, indoor temperature, ΔRH, outdoor RH, wind speed, and ΔT, in that order (p < 0.05). Excluding indoor PM10, outdoor PM10 and outdoor PM2.5 concentrations were significantly correlated with indoor PM2.5, and indoor temperature showed a significant negative correlation with indoor PM2.5 concentrations (r = −0.43). Additionally, a significantly high positive correlation of 0.40 or more was observed between outdoor RH and ΔRH.
In other words, meteorological conditions can significantly impact indoor PM concentration levels, as they can affect the particle size depending on the indoor air exchange rate, relative humidity, and the origin of the air mass [45,46]. Therefore, it can be inferred that outdoor meteorological variables may be used to predict indoor PM2.5 concentrations.

4.2. Previous Studies about Indoor PM2.5 Concentration Prediction Model

In this study, the MLR model was used to predict indoor PM2.5 concentration, and the model was subdivided by time period. The results of previous studies that used the MLR model to predict indoor PM2.5 concentration are presented in Table 6. Three out of five studies used survey results as indoor variables, including questions on the number of pets, rooms, air purifiers, use of air fresheners, occupant activity patterns, and building characteristics [36,46,47]. However, the model that was mainly composed of survey items had an explanatory power of 0.35, which was lower than other studies. Other input variables used in these studies included indoor PM10, PM2.5 concentration, temperature, relative humidity, and ventilation rate [36].
All studies have verified that outdoor PM2.5 concentration is a crucial input variable in predicting indoor PM2.5 concentration, using outdoor-related variables such as PM10, PM2.5 concentration, temperature, relative humidity, wind speed, NO2, CO2, etc. [46,47,48,49]. Therefore, it can be inferred that outdoor PM2.5 concentration plays a significant role in determining indoor PM2.5 concentration.
In this study, the prediction model was subdivided by time period. Although no study subdivided the model by time period, as in our study, one study classified it by season, showing that the explanatory power of each model differed by season, and the annual model performed the best. However, this result was influenced by the dataset size, which is an important parameter for model evaluation. Furthermore, the regression coefficient of outdoor PM2.5 in the MLR model varied by season, with autumn (0.58), winter (0.69), spring (0.69), and annual (0.88) seasons showing different effects on indoor PM2.5 [48].
Most studies used data measured for a short period of time, ranging from 48 h to four months, for model training rather than data collected over a year. Additionally, one study used outdoor data from a national measurement network that was distant from the indoor measurement point. Some studies measured both indoor and outdoor air quality at nearby locations simultaneously, as in our study, but most used data were measured for a short period of time.
The performance of the prediction model was evaluated, and the model that used data measured in the experimental building had the lowest RMSE (0.09) and highest R2 (0.99) values, followed by the model predicting PM2.5 concentration in school indoor spaces [48,49]. However, the model developed for residential interiors had an explanatory power of less than 50% based on the validation score [36,46,47]. This is likely due to the significant variation in indoor pollutant concentrations based on individual occupant characteristics, as well as the presence of various sources in living spaces, unlike laboratory buildings or schools. Nevertheless, even when using input variables such as direct measurements or survey results for indoor pollutant concentrations, ventilation rates, or other factors, similar explanatory power to the time zone-based model proposed in this study was observed [36,46,47]. It is expected that the model could, to some extent, reflect the activity patterns of occupants through further subdivision.
Table 6. Summary of indoor PM2.5 prediction studies using MLR model.

| Indoor type | Indoor variables | Outdoor variables | Time division (O/X) (1) | Data | RMSE | R2 | Ref. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Dwelling | Survey result | PM2.5 | X | (1) Country: America; (2) Sampling period: 48 h sampling; (3) Sampling the indoor and outdoor data simultaneously, nearby | - | 0.35 | [36] |
| Apartment | Survey results, building characteristics | PM2.5 concentration, temperature, wind speed | X | (1) Country: Mongolia; (2) Sampling period: 7 days, during 24 h; (3) Indoor data: the direct measurement of indoor air; (4) Outdoor data: the national monitoring network | 0.48, 0.50 (val) (2) | 0.52, 0.49 (val) | [46] |
| Dwelling | PM10_2.5, survey result, VOCs, building characteristics | PM10_2.5, RH, PM2.5 | X | (1) Country: Japan; (2) Sampling period: 7 days, during 24 h; (3) Sampling the indoor and outdoor data simultaneously, nearby | 15.70 (val) | 0.42 (val) | [47] |
| School | Relative humidity, temperature, ventilation | PM2.5, CO2, wind speed, PM10 | O | (1) Country: Israel; (2) Sampling period: 7 days, 7:00–12:00 in winter and spring, 12:00–17:00 in fall; (3) Indoor and outdoor measurements alternately at 15 min intervals | 0.17 (Fall), 0.13 (Winter), 0.14 (Spring), 0.08 (Annual) | 0.58 (Fall), 0.69 (Winter), 0.69 (Spring), 0.88 (Annual) | [50] |
| Laboratory building | Temperature, relative humidity, PM10, NO2 | Temperature, relative humidity, PM10, NO2, PM2.5 | X | (1) Country: America; (2) Sampling period: May-September 2020 during 24 h; (3) Sampling the indoor and outdoor data simultaneously, nearby; (4) Reflection of time delay effect (TSR model) | 0.09 | 0.99 | [51] |

(1) Time division (O/X): whether the model was divided by time (season, month, hour, etc.). (2) val: score of validation data.

4.3. MLR Model

In the MLR model, the higher the slope, the greater the influence on the dependent variable, and the larger the intercept, the larger the dependent variable on average [41]. In this study, outdoor PM2.5 concentration and environmental parameters were used as input variables for the indoor PM2.5 concentration prediction model, and normalized data were applied to the MLR model to regress the importance of each variable on indoor PM2.5 concentration.
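The reading of slopes and intercepts described above can be illustrated with a minimal sketch. It assumes that "normalized" means z-score scaling and it uses an ordinary least-squares solve from NumPy rather than the study's own training procedure; the function name `fit_standardized_mlr` and the synthetic data are hypothetical and are not the study's measurements.

```python
import numpy as np

def fit_standardized_mlr(X, y):
    """Fit ordinary least squares on z-scored inputs.

    Returns (coefficients, intercept). Every input column has zero mean
    and unit variance after scaling, so coefficient magnitudes can be
    compared directly across variables.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)          # assumption: z-score scaling
    design = np.column_stack([Xz, np.ones(len(y))])    # last column models the intercept
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params[:-1], params[-1]

# Synthetic, made-up data for illustration only
rng = np.random.default_rng(0)
outdoor_pm = rng.gamma(2.0, 13.0, size=500)
indoor_temp = rng.normal(27.0, 2.6, size=500)
indoor_pm = (0.4 * outdoor_pm
             - 1.5 * (indoor_temp - 27.0)
             + rng.normal(0.0, 3.0, size=500))

coefs, intercept = fit_standardized_mlr(
    np.column_stack([outdoor_pm, indoor_temp]), indoor_pm
)
print("coefficients (outdoor PM2.5, indoor temperature):", coefs)
print("intercept:", intercept)
```

With z-scored inputs the fitted intercept equals the mean of the dependent variable, which is one way to see why a larger intercept corresponds to a larger indoor concentration on average.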
As a result, outdoor PM2.5 concentration had the greatest positive effect on indoor PM2.5 concentration (16.44), followed by indoor temperature, which had a negative effect (−9.44). This suggests that indoor PM2.5 concentration increases when outdoor PM2.5 concentration is high and indoor temperature is low. In a previous study that used variables similar to ours, outdoor PM2.5, wind speed, temperature, and relative humidity affected indoor PM2.5 concentration in that order, and wind speed had a significant effect, unlike in this study [41]. The difference in importance is likely because this study used wind speed regardless of direction, whereas the previous study used wind speed in the wind direction that affected its target point. Additionally, a previous study found that PM2.5 in the air easily penetrates indoors through window cracks during winter [43].
Figure 5 shows the distribution of measured and predicted values from the previous model. The scatter of predicted against measured values is tilted along the y-axis, which confirms that the model generally underestimated the measured values.
On the other hand, it is known that the concentration of PM2.5 in indoor air is greatly influenced by the activity patterns of occupants and indoor sources [45,50,51,52]. However, in order to reflect factors such as indoor sources, survey results must be used, and survey results are difficult to obtain compared to outdoor variables. Meanwhile, indoor sources are heavily influenced by occupants’ activities, and according to previous research findings, except for special events, occupants’ daily activity patterns tend to exhibit similarity by time period [38,39,40,41,42,43]. In this study, in order to reflect the distribution characteristics of indoor PM2.5 concentration by time and activity patterns of indoor occupants, the model was subdivided by the hour, and the importance of variables in the model by time zone and the influence on indoor PM2.5 concentration by time zone were analyzed. In order to check the weight of the variable, the slope and intercept were checked.
As a result, indoor temperature had the greatest effect (−17.11) on indoor PM2.5 concentration, followed by outdoor PM2.5 concentration (15.70). This differed from the previous model, which did not differentiate between time periods, and highlights the importance of segmenting the model. Indoor temperature had a strong negative effect in the H8–H10 models, and outdoor PM2.5 concentration also carried a large weight in the H7–H10 models. Furthermore, the intercepts of the H8–H11 models were higher than those of the other models, and indoor PM2.5 concentration was high during those hours [45]. This result indicates that an indoor source is present during those hours. In addition, the intercept of the H18 model was 8.55, higher than those of the models for the preceding and following hours. In a residential environment, the occupants’ cooking activity is the likely main indoor source. Prior research has shown that cooking can double indoor PM2.5 exposure [53], and that indoor PM2.5 concentration remains elevated for about 30–60 min during cooking [43]. However, because the indoor concentration in this study stayed high for about 3–4 h in the morning, an additional contribution is likely, such as cleaning activity or high outdoor PM2.5 concentration during traffic congestion.
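The hourly subdivision discussed above can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's implementation: it fits each hourly model with a plain least-squares solve, and the function names `fit_hourly_models` and `predict_hourly`, as well as the synthetic data with an artificial source at hour 18, are hypothetical.

```python
import numpy as np

def fit_hourly_models(hours, X, y):
    """Fit one least-squares linear model per hour of day.

    Returns {hour: (coefficients, intercept)} for every hour present.
    """
    hours = np.asarray(hours)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    models = {}
    for h in np.unique(hours):
        mask = hours == h
        design = np.column_stack([X[mask], np.ones(mask.sum())])
        params, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
        models[int(h)] = (params[:-1], params[-1])
    return models

def predict_hourly(models, hours, X):
    """Predict each row with the model that belongs to its hour."""
    X = np.asarray(X, dtype=float)
    return np.array([
        X[i] @ models[int(h)][0] + models[int(h)][1]
        for i, h in enumerate(hours)
    ])

# Synthetic, made-up data: a constant indoor source is active only at hour 18
rng = np.random.default_rng(1)
hours = np.tile(np.arange(24), 200)
outdoor_pm = rng.gamma(2.0, 13.0, size=hours.size)
source = np.where(hours == 18, 8.0, 0.0)
indoor_pm = 0.3 * outdoor_pm + source + rng.normal(0.0, 1.0, size=hours.size)

models = fit_hourly_models(hours, outdoor_pm[:, None], indoor_pm)
predicted = predict_hourly(models, hours, outdoor_pm[:, None])
print("intercept at H3:", models[3][1], "intercept at H18:", models[18][1])
```

In this toy setup the contribution that the outdoor variable cannot explain shows up as a larger intercept in the hour-18 model, which mirrors how the intercepts are interpreted in the text.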
Figure 6 shows the distribution of the measured and predicted values of the proposed time-specific model. After classification by time period and removal of outliers, the model predicted a wider range of concentrations and showed slightly improved prediction performance compared to the existing model, with an average explanatory power of 27% ± 4%. The R2 values of Figure 5 and Figure 6 may appear to differ little, but R2 can be greatly influenced by sample size, so the observed improvement is considered noteworthy. However, as in the existing model, the predicted values were generally underestimated compared to the measured values, as seen from the tilt of the predicted and measured values along the y-axis.
Examining the performance of the MLR model classified by time unit showed that the explanatory power improved by up to 9% compared to the existing MLR model. Figure 7 and Figure 8 show the test results for indoor PM2.5 concentration per hour; the explanatory power of the H1, H2, H4, H5, H6, H9, H10, and H15 models was 0.30 or higher, compared with 0.25 for the existing MLR model (Table 3 and Table 4). These results show that, because the distribution characteristics of indoor PM2.5 concentrations differ by time period, a model with high accuracy can be developed only when the model is subdivided and trained with time taken into account. In addition, the method was applied here to residential spaces with various indoor sources and differing occupant activity patterns; it is expected to perform better in indoor spaces such as offices, where the types of indoor sources are relatively few and occupant activity patterns are constant.
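The summary figures quoted in this section can be checked with simple arithmetic on the reported R2 values. The short sketch below transcribes the hourly R2 values from Table 4 and the R2 of the undivided model from Table 3; the use of the population standard deviation is an assumption, since the text does not say which form was used, and the variable names are illustrative.

```python
from statistics import mean, pstdev

# Test R2 for the H0..H23 models, transcribed from Table 4
hourly_r2 = [
    0.25, 0.31, 0.31, 0.29, 0.34, 0.30, 0.31, 0.25, 0.22, 0.33, 0.31, 0.25,
    0.28, 0.24, 0.24, 0.33, 0.28, 0.25, 0.28, 0.20, 0.24, 0.25, 0.28, 0.24,
]
baseline_r2 = 0.25  # undivided ("previous") model, Table 3

# Largest gain of any hourly model over the undivided model
best_gain = max(hourly_r2) - baseline_r2

# Hours whose model reaches an explanatory power of 0.30 or more
hours_at_or_above_030 = [h for h, r2 in enumerate(hourly_r2) if r2 >= 0.30]

# Average and spread of the hourly explanatory power
average_r2 = mean(hourly_r2)
spread_r2 = pstdev(hourly_r2)  # assumption: population standard deviation

print(f"best gain: {best_gain:.2f}")
print("hours with R2 >= 0.30:", hours_at_or_above_030)
print(f"mean R2: {average_r2:.2f} +/- {spread_r2:.2f}")
```

Read this way, the "up to 9%" improvement is a difference of 0.09 in R2 between the best hourly model (H4) and the undivided model, that is, nine percentage points of explained variance.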
However, the explanatory power of the time zone-based classification model proposed in this study was relatively low compared to other studies, possibly due to the limited data used for model training which were collected from only two households. It is believed that a more accurate and generalized model can be developed by incorporating a variable that has a significant impact on indoor PM2.5 concentration. Since this study divided the model by time zones based on indoor PM2.5 concentration, it is expected that the limitations of direct measurement and investigation can be partially overcome by reflecting the distribution characteristics of indoor PM2.5 concentration across different time zones.

4.4. Influence of Seasonal Characteristics on Prediction Results of PM2.5 Concentration

Our study trained an MLR model using PM2.5 concentration data collected simultaneously indoors and in the nearby outdoor areas, along with hourly meteorological data, over a period of one year. We categorized the dataset into four seasons: spring (March to May), summer (June to August), autumn (September to November), and winter (December to February). We aimed to examine the performance differences of the prediction model according to the seasons and determine the impact of seasons on indoor PM2.5 concentrations. We applied the categorized datasets to the prediction model and compared the results with the previously calculated test-RMSE of the model (Table 7). The distribution of predicted values and actual measurements was examined (Figure 9).
Among the four seasons, the RMSE values were found to be highest in the order of spring, winter, summer, and autumn. When evaluating the model performance based on different time periods, significant errors were observed in the predicted values compared to the actual values for H15 in spring and winter, H5 in summer, and H19 in autumn. Upon analyzing the input data of the models during the time periods with high errors, the maximum concentrations were found to be 171.85 μg/m3, 160.00 μg/m3 in both spring and winter, 126.00 μg/m3 in summer, and 344.40 μg/m3 in autumn, respectively. The substantial differences between these values and the maximum RMSE values observed during the corresponding seasons (spring—60.00 μg/m3, winter—72.00 μg/m3, summer—57.00 μg/m3, autumn—43.98 μg/m3) indicate the possible presence of indoor pollution sources during those specific time periods. These findings are expected to serve as important reference data for future studies on indoor PM2.5 concentration prediction models.
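The seasonal split used in this section can be sketched as a month-to-season lookup followed by a per-season RMSE. The month ranges follow the definition given above; the helper names `rmse` and `rmse_by_season` and the tiny example values are hypothetical and serve only to show the calculation.

```python
import numpy as np

# Season definition from the text: spring (Mar-May), summer (Jun-Aug),
# autumn (Sep-Nov), winter (Dec-Feb)
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}

def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def rmse_by_season(months, observed, predicted):
    """Group rows by season of their calendar month and return {season: RMSE}."""
    seasons = np.array([SEASON_BY_MONTH[int(m)] for m in months])
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return {
        str(s): rmse(observed[seasons == s], predicted[seasons == s])
        for s in np.unique(seasons)
    }

# Made-up values for illustration only
example = rmse_by_season(
    months=[1, 1, 7, 7],
    observed=[10.0, 10.0, 5.0, 5.0],
    predicted=[13.0, 7.0, 5.0, 5.0],
)
print(example)
```

The same grouping could be applied within each hourly model to obtain a season-by-hour grid of errors of the kind reported in Table 7.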

5. Conclusions

Most people spend much of their time indoors, and it is important to regulate indoor air quality to prevent health problems related to exposure to PM2.5. While measuring devices are commonly used to monitor indoor air quality, it can be difficult and expensive to assess measurement-based indoor air quality. This study aims to provide a more easily utilizable indoor PM2.5 concentration prediction method that can accurately reflect temporal characteristics by utilizing outdoor PM2.5 concentration, temperature, and humidity data measured near the indoor target point as input data to calculate indoor PM2.5 concentration through a multiple linear regression model.
To address the limitations of the MLR model and capture the distribution characteristics by time period, the dataset was divided into hourly units. Additionally, outliers were removed from the dataset during model training by utilizing the interquartile range to produce a more accurate and universally applicable concentration value.
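The interquartile-range filtering mentioned above can be sketched as follows. The fence multiplier of 1.5 is the conventional choice and is an assumption here, since the text does not state the value used; the function name `iqr_mask` and the sample values are hypothetical.

```python
import numpy as np

def iqr_mask(values, k=1.5):
    """Return a boolean mask that is True for values inside the IQR fences.

    A value is kept when it lies within [Q1 - k*IQR, Q3 + k*IQR].
    k = 1.5 is the conventional multiplier (assumed, not taken from the study).
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)

# Made-up indoor PM2.5 readings with one extreme spike
readings = np.array([4.0, 5.0, 6.0, 6.0, 7.0, 8.0, 460.0])
keep = iqr_mask(readings)
filtered = readings[keep]
print(filtered)
```

When the model is divided by hour, one option is to apply such a mask separately within each hourly subset so that the fences follow the concentration distribution of that hour; whether the study did this is not stated.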
As a result of the training, a significant difference in model performance was observed depending on whether or not the time zones were taken into account. By incorporating temporal characteristics into the training process, the MLR model showed an up to 9% improvement in explanatory power compared to the existing model. Some temporal models demonstrated an explanatory power of 30% or higher.
On the other hand, since the model was trained using data collected from two specific dwellings, its accuracy may be lower when applied to different indoor spaces. Furthermore, the explanatory power of the predictive model was relatively low due to the limited availability of input variables that could be easily obtained. In addition, the fact that indoor pollution sources and ventilation have a large effect on indoor PM2.5 concentration can be seen as a major limitation of the model proposed in this study because these variables are not reflected.
However, to propose a practical prediction method and overcome the limited set of input variables, we judge that the indoor and outdoor temperature difference, relative humidity, and the difference between temperature and humidity can substitute for the ventilation rate and indoor pollutants as input variables. Furthermore, by employing an MLR model that allows us to examine the weights of each variable, we were able to assess the importance of input variables on an hourly basis. When the models were evaluated by time period, the model for the early morning hours, which correspond to the occupants’ sleeping hours (H~H), showed the best prediction performance. By contrast, the models trained on time periods when occupants are most active, such as H8 and H19, exhibited poorer prediction performance. These results suggest that indoor pollution sources, not explainable by outdoor variables, might have influenced indoor PM2.5 concentrations during those specific time periods. Time is therefore an essential variable that must be considered when predicting indoor PM2.5 concentrations.
Furthermore, to gain a more detailed understanding of the factors influencing the improvement of model performance, we evaluated the model performance by season. The results showed that the seasonal characteristics had a significant impact on indoor PM2.5 concentrations and the performance of the prediction models. This study is expected to serve as valuable reference material for future research on predicting indoor PM2.5 concentrations.

Author Contributions

Conceptualization, S.-Y.P. and C.-M.L.; data curation, S.-Y.P.; formula analysis, S.-Y.P.; investigation, S.-Y.P., D.-K.Y., S.-H.P., J.-I.J. and J.-M.L.; methodology, S.-Y.P. and C.-M.L.; project administration, C.-M.L. and Y.-S.C.; supervision, C.-M.L., Y.-S.C. and J.K.; visualization, S.-Y.P., data analysis, S.-Y.P.; writing—original draft preparation, S.-Y.P.; supervision, C.-M.L., W.-H.Y. and Y.-S.C.; writing—review and editing, C.-M.L., W.-H.Y. and Y.-S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This research was supported by the Environmental Disease Prevention and Management Core Technology Development Project of the Korea Environmental Industry and Technology Institute under funding from the Ministry of Environment (project number: 2022003310002) and by the Korea Ministry of Environment (MoE) through the “Project of Professional Manpower training for the Safety Management of Chemicals”.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Anonymous. Korean Exposure Factors Handbook; Risk Assessment Division; National Institute of Environmental Research: Incheon, Republic of Korea, 2019.
  2. F KSOSTAT. Kindicator. 2022. Available online: http://www.index.go.kr/unify/idx-info.do?idxCd=4275#:~:text=%EA%B5%AD%EC%A0%9C%EC%A0%81%EC%9C%BC%EB%A1%9C%20%ED%95%9C%EA%B5%AD%EC%9D%98%20%EB%AF%B8%EC%84%B8,2%EB%B0%B0%20%EC%A0%95%EB%8F%84%20%EC%8B%AC%ED%95%9C%20%EA%B2%83%EC%9D%B4%EB%8B%A4 (accessed on 27 March 2023).
  3. Wang, X.; Xu, Z.; Su, H.; Ho, H.C.; Song, Y.; Zheng, H.; Hossain, M.Z.; Khan, M.A.; Bogale, D.; Zhang, H.; et al. Ambient Particulate Matter (PM1, PM2.5, PM10) and Childhood Pneumonia: The Smaller Particle, the Greater Short-Term Impact? Sci. Total Environ. 2021, 772, 145509.
  4. Schwartz, J. Harvesting and Long Term Exposure Effects in the Relation between Air Pollution and Mortality. Am. J. Epidemiol. 2000, 151, 440–448.
  5. Franklin, M.; Koutrakis, P.; Schwartz, J. The Role of Particle Composition on the Association between PM2.5 and Mortality. Epidemiology 2008, 19, 680.
  6. Chen, L.Y.; Tsay, Y.S.; Jung, C.C. Machine Learning Models for Indoor PM2.5 Concentrations in Residential Architecture in Taiwan. In Proceedings of the CLIMA 2022 Conference, UT Austin, Austin, TX, USA, 23–26 May 2022.
  7. Jung, J.; Ahn, J. Intelligent User Pattern Recognition Based on Vision, Audio and Activity for Abnormal Event Detections of Single Households. J. Korea Soc. Comput. Inf. 2019, 24, 59–66.
  8. Lee, S.H.; Yoon, Y.A.; Jung, J.H.; Chang, T.W.; Kim, Y.S. A Machine Learning Model for Predicting Silica Concentrations through Time Series Analysis of Mining Data. J. Korean Soc. Qual. Manag. 2020, 48, 511–520.
  9. Wei, W.; Ramalho, O.; Malingre, L.; Sivanantham, S.; Little, J.C.; Mandin, C. Machine Learning and Statistical Models for Predicting Indoor Air Quality. Indoor Air 2019, 29, 704–726.
  10. Choi, Y.J.; Choi, E.J.; Cho, H.U.; Moon, J.W. Development of an Indoor Particulate Matter (PM2.5) Prediction Model for Improving School Indoor Air Quality Environment. KIEAE J. 2021, 21, 35–40.
  11. Lagesse, B.; Wang, S.; Larson, T.V.; Kim, A.A. Predicting PM2.5 in Well-Mixed Indoor Air for a Large Office Building Using Regression and Artificial Neural Network Models. Environ. Sci. Technol. 2020, 54, 15320–15328.
  12. Choi, Y.; Choi, E.; Cho, H.; Moon, J. Development of a Prediction Model for Indoor Fine Dust (PM2.5) to Improve Indoor Air Quality in School Facilities. KIEAE J. 2021, 21, 35–40.
  13. Phillips, J.L.; Field, R.; Goldstone, M.; Reynolds, G.L.; Lester, J.N.; Perry, R. Relationships between indoor and outdoor air quality in four naturally ventilated offices in the United Kingdom. Atmos. Environ. Part A Gen. Top 1993, 27, 1743–1753.
  14. Ji, W.; Zhao, B. Contribution of Outdoor-Originating Particles, Indoor-Emitted Particles and Indoor Secondary Organic Aerosol (SOA) to Residential Indoor PM2.5 Concentration: A Model-Based Estimation. Build. Environ. 2015, 90, 196–205.
  15. Liu, D.L.; Nazaroff, W.W. Modeling Pollutant Penetration Across Building Envelopes. Atmos. Environ. 2001, 35, 4451–4462.
  16. Bakht, A.; Han, S.; Khan, M.S.; Jang, K.; Kim, K.H. Deep Learning-Based Indoor Air Quality Forecasting Framework for Indoor Subway Station Platforms. Toxics 2022, 10, 557.
  17. Marzouk, M.; Atef, M. Assessment of Indoor Air Quality in Academic Buildings Using IoT and Deep Learning. Sustainability 2022, 14, 7015.
  18. AirKorea. Available online: https://www.airkorea.or.kr/index (accessed on 27 March 2023).
  19. Chen, C.; Zhao, B. Review of Relationship between Indoor and Outdoor Particles: I/O Ratio, Infiltration Factor and Penetration Factor. Atmos. Environ. 2011, 45, 275–288.
  20. Kang, J.-W.; An, C.-J.; Choi, W. Surrounding environment and indoor fine dust concentration distribution characteristics based on indoor/outdoor concentration ratio (I/O ratio): Focusing on previous research reviews and measurement results in Busan and Pyeongtaek elementary schools in summer. J. Korean Soc. Remote Sens. 2020, 36, 1691–1710.
  21. Abdipour, M.; Ramazani, S.H.R.; Younessi-Hmazekhanlu, M.; Niazian, M. Modeling Oil Content of Sesame (Sesamum indicum L.) Using Artificial Neural Network and Multiple Linear Regression Approaches. J. Am. Oil Chem. Soc. 2018, 95, 283–297.
  22. Emamgholizadeh, S.; Parsaeian, M.; Baradaran, M. Seed Yield Prediction of Sesame Using Artificial Neural Network. Eur. J. Agron. 2015, 68, 89–96.
  23. May, R.J.; Dandy, G.C.; Maier, H.R. Review of Input Variable Selection Methods for Artificial Neural Networks. In Artificial Neural Networks—Methodological Advances and Biomedical Applications; InTech: Rijeka, Croatia, 2011; pp. 215–241.
  24. Czernecki, B.; Półrolniczak, M.; Kolendowicz, L.; Marosz, M.; Kendzierski, S.; Pilguj, N. Influence of the Atmospheric Conditions on PM10 Concentrations in Poznań, Poland. J. Atmos. Chem. 2017, 74, 115–139.
  25. Xu, C.; Xu, D.; Liu, Z.; Li, Y.; Li, N.; Chartier, R.; Li, N. Estimating Hourly Average Indoor PM2.5 Using the Random Forest Approach in Two Megacities, China. Build. Environ. 2020, 180, 107025.
  26. Yeom, M.-S.; Cho, G.-Y. Natural Ventilation of a High-Rise Residential Building Using a Double Skin System. Archit 2007, 51, 57–62.
  27. Isixsigma. Variance Inflation Factor (VIF). Available online: https://www.isixsigma.com/dictionary/variance-inflation-factor-vif/ (accessed on 27 March 2023).
  28. Jeon, Y.T.; Yu, S.H.; Kwon, H.Y. Improvement of PM Forecasting Performance by Outlier Data Removing. J. Korea Multimed. Soc. 2020, 23, 747–755.
  29. Kashi, H.; Emamgholizadeh, S.; Ghorbani, H. Estimation of Soil Infiltration and Cation Exchange Capacity Based on Multiple Regression, ANN (RBF, MLP), and ANFIS Models. Commun. Soil Sci. Plant Anal. 2014, 45, 1195–1213.
  30. Masood, A.; Ahmad, K. A Model for Particulate Matter (PM2.5) Prediction for Delhi Based on Machine Learning Approaches. Procedia Comput. Sci. 2020, 167, 2101–2110.
  31. Scikit-Learn. Sklearn.linear_model.SGDRegressor. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html (accessed on 27 March 2023).
  32. Kang, M.J. Comparison of Gradient Descent for Deep Learning. J. Korea Acad. Ind. Coop. Soc. 2020, 21, 189–194.
  33. Konečný, J.; Liu, J.; Richtárik, P.; Takáč, M. Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting. IEEE J. Sel. Top. Signal Process. 2015, 10, 242–255.
  34. Taşan, S.; Demir, Y. Comparative analysis of MLR, ANN, and ANFIS models for prediction of field capacity and permanent wilting point for Bafra plain soils. Commun. Soil Sci. Plant Anal. 2020, 51, 604–621.
  35. Nishihama, Y.; Ishikuro, M.; Miyashita, C.; Yamamoto-Hanada, K.; Sato, M.; Kato, N.; Minami, Y.; Hori, T.; Doi, H.; Araki, A.; et al. Indoor Air Quality of 5000 Households and Its Determinants. Part A: Particulate Matter (PM2.5 and PM10–2.5) Concentrations in the Japan Environment and Children’s Study. Environ. Res. 2021, 198, 111196.
  36. National Institute of Environmental Research. Annual Report of Air Quality in Korea 2021. Available online: https://www.niehs.nih.gov/research/programs/geh/partnerships/network/centres/south_korea/index.cf (accessed on 27 March 2023).
  37. Ministry of Environment. Implementation Regulations of Indoor Air Quality Management Act. Year. Available online: https://www.law.go.kr/%EB%B2%95%EB%A0%B9/%EC%8B%A4%EB%82%B4%EA%B3%B5%EA%B8%B0%EC%A7%88%EA%B4%80%EB%A6%AC%EB%B2%95%EC%8B%9C%ED%96%89%EA%B7%9C%EC%B9%99 (accessed on 23 April 2023).
  38. Park, B.R.; Choi, D.H.; Kang, D.H. Seasonal Contribution of Indoor Generated and Outdoor Originating PM2.5 to Indoor Concentration Depending on Airtightness of Apartment Units. J. Archit. Inst. Korea Struct. Constr. 2020, 36, 155–163.
  39. Park, J.; Kim, E.; Choe, Y.; Ryu, H.; Kim, S.; Woo, B.L.; Cho, M.; Yang, W. Indoor to Outdoor Ratio of Fine Particulate Matter by Time of the Day in House According to Time-activity Patterns. J. Environ. Health Sci. 2020, 45, 504–512.
  40. Park, S.; Yoon, D.; Kong, H.; Kang, S.; Lee, C. A Case Study on Distribution Characteristics of Indoor and Outdoor Particulate Matters (PM10, PM2.5) and Black Carbon (BC) by Season and Time of the Day in Apartment. J. Environ. Health Sci. 2021, 47, 339–355.
  41. Han, Y.; Qi, M.; Chen, Y.; Shen, H.; Liu, J.; Huang, Y.; Chen, H.; Liu, W.; Wang, X.; Liu, J. Influences of Ambient Air PM2.5 Concentration and Meteorological Condition on the Indoor PM2.5 Concentrations in a Residential Apartment in Beijing Using a New Approach. Environ. Pollut. 2015, 205, 307–314.
  42. Zhao, L.; Chen, C.; Wang, P.; Chen, Z.; Cao, S.; Wang, Q.; Xie, G.; Wan, Y.; Wang, Y.; Lu, B. Influence of Atmospheric Fine Particulate Matter (PM2.5) Pollution on Indoor Environment during Winter in Beijing. Build. Environ. 2015, 87, 283–291.
  43. Qi, M.; Zhu, X.; Du, W.; Chen, Y.; Chen, Y.; Huang, T.; Pan, X.; Zhong, Q.; Sun, X.; Zeng, E.Y. Exposure and Health Impact Evaluation Based on Simultaneous Measurement of Indoor and Ambient PM2.5 in Haidian, Beijing. Environ. Pollut. 2017, 220, 704–712.
  44. Kearney, J.; Wallace, L.; MacNeill, M.; Heroux, M.-E.; Kindzierski, W.; Wheeler, A. Residential Infiltration of Fine and Ultrafine Particles in Edmonton. Atmos. Environ. 2014, 94, 793–805.
  45. Yang, S.; Mahecha, S.D.; Moreno, S.A.; Licina, D. Integration of Indoor Air Quality Prediction into Healthy Building Design. Sustainability 2022, 14, 7890.
  46. Yuchi, W.; Gao, J.; Xie, W.; Wang, L.; Zhang, Z.; Lai, A.C.K. Evaluation of Random Forest Regression and Multiple Linear Regression for Predicting Indoor Fine Particulate Matter Concentrations in a Highly Polluted City. Environ. Pollut. 2019, 245, 746–753.
  47. Elbayoumi, M.; Ramli, N.A.; Yusof, N.F.F.M.; Yahaya, A.S.B.; Al Madhoun, W.; Ul-Saufie, A.Z. Multivariate Methods for Indoor PM10 and PM2.5 Modelling in Naturally Ventilated Schools Buildings. Atmosphere 2014, 94, 11–21.
  48. Zhang, H.; Srinivasan, R.; Yang, X. Simulation and Analysis of Indoor Air Quality in Florida Using Time Series Regression (TSR) and Artificial Neural Networks (ANN) Models. Symmetry 2021, 13, 952.
  49. Zhang, H.; Wang, Y.; Yang, X.; Wang, J.; Xie, M.; Wei, X.; Liu, H.; Yuan, Y. Factors Influencing Indoor Air Pollution in Buildings Using PCA-LMBP Neural Network: A Case Study of a University Campus. Build. Environ. 2022, 225, 109643.
  50. Zhong, J.; Ding, J.; Su, Y.; Shen, G.; Yang, Y.; Wang, C.; Tao, S. Carbonaceous Particulate Matter Air Pollution and Human Exposure from Indoor Biomass Burning Practices. Environ. Eng. Sci. 2012, 29, 1038–1045.
  51. Abt, E.; Suh, H.H.; Catalano, P.; Koutrakis, P. Relative Contribution of Outdoor and Indoor Particle Sources to Indoor Concentrations. Environ. Sci. Technol. 2000, 34, 3579–3587.
  52. Pagel, É.; Costa Reis, N.; de Alvarez, C.E.; Santos, J.M.; Conti, M.M.; Boldrini, R.S.; Kerr, A.S. Characterization of the Indoor Particles and Their Sources in an Antarctic Research Station. Environ. Monit. Assess. 2016, 188, 1–16.
  53. Shrubsole, C.; Ridley, I.; Biddulph, P.; Milner, J.; Vardoulakis, S.; Ucci, M.; Oreszczyn, T.; Wilkinson, P.; Marmot, A.; Davies, M. Indoor PM2.5 Exposure in London’s Domestic Stock: Modelling Current and Future Exposures Following Energy Efficient Refurbishment. Atmosphere 2012, 3, 623–646.
Figure 1. Location (yellow circle) of the monitoring device.
Figure 2. The flowchart of data pre-processing, model development and validation, comparison performance.
Figure 3. Distribution of indoor PM2.5 by the hour.
Figure 4. Heatmap of variables.
Figure 5. Relationship between predicted and observed indoor PM2.5 concentration according to the previous multiple linear regression models.
Figure 6. Relationship between predicted and observed indoor PM2.5 concentration according to the proposed multiple linear regression models (H0–H23).
Figure 7. Relationship between predicted and observed indoor PM2.5 concentration per hour in a.m.
Figure 8. Relationship between predicted and observed indoor PM2.5 concentration per hour in p.m.
Figure 9. Relationship between predicted and observed indoor PM2.5 concentration according to the seasonal characteristic (H0–H23).
Table 1. Specifications of the measuring device.

| Specification | Dust Mon (Sentry Co. Ltd., Seoul, Republic of Korea) |
| --- | --- |
| Appearance | 300 (W) × 150 (D) × 430 (H) mm, 9 kg |
| Metrics | Particulate matter: PM2.5; Other: Temperature, Relative humidity |
| Measurement range | Particulate matter: 0–100,000 µg/m3 |
| Flux | 0.5 L/min |
| Operating range | −30 °C~60 °C, 0~99% relative humidity (RH) |
| Working power | 220 VAC/60 Hz |
| Power | 144 kW/month |
| Communications | LTE Cat M1 |
| Data storage | SD CARD |
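Table 2. Distribution of indoor and outdoor measurement variables.

| Variable | Units | I/O | N | Mean ± S.D. | Median | Max | I/O ratio (mean) | I/O ratio (median) | Skewness | Kurtosis |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PM2.5 | µg/m3 | Indoor | 80,572 | 10.31 ± 13.70 | 6.00 | 460.56 | 0.39 | 0.29 | 4.44 | 50.27 |
| PM2.5 | µg/m3 | Outdoor | 80,572 | 26.28 ± 20.69 | 21.00 | 227.00 | | | 1.76 | 5.38 |
| Temperature | °C | Indoor | 80,572 | 27.01 ± 2.61 | 27.00 | 33.30 | 2.15 | 2.16 | 0.28 | −0.36 |
| Temperature | °C | Outdoor | 80,572 | 12.55 ± 10.65 | 12.50 | 40.00 | | | −0.06 | −0.80 |
| Relative humidity | % | Indoor | 80,572 | 46.29 ± 18.61 | 41.06 | 94.25 | 0.61 | 0.48 | 1.39 | 1.18 |
| Relative humidity | % | Outdoor | 80,572 | 75.37 ± 22.89 | 86.19 | 99.90 | | | −0.91 | −0.42 |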
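Table 3. The results of the MLR model.

| Model | N (Train/Test) | a | b | c | d | e | f | Intercept | RMSE | MAE | R2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Previous method | 72,300 (50,610/21,690) | 16.44 | −9.44 | 4.46 | −0.71 | −4.57 | 0.69 | 9.62 | 4.86594 | 3.66157 | 0.25 |

Columns a to f are the regression coefficients.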
Table 4. The results of the MLR model separated per hour.

| Model | N (Train/Test) | RMSE | MAE | R2 |
| --- | --- | --- | --- | --- |
| H0 | 2113/906 | 4.41158 | 3.36610 | 0.25 |
| H1 | 2153/923 | 4.57719 | 3.54399 | 0.31 |
| H2 | 2204/945 | 4.49406 | 3.41214 | 0.31 |
| H3 | 2223/953 | 4.65165 | 3.53079 | 0.29 |
| H4 | 2199/942 | 4.75141 | 3.60395 | 0.34 |
| H5 | 2199/943 | 4.61411 | 3.57164 | 0.30 |
| H6 | 2137/916 | 4.80890 | 3.69874 | 0.31 |
| H7 | 2132/914 | 6.50780 | 5.01471 | 0.25 |
| H8 | 2140/918 | 7.17099 | 5.37863 | 0.22 |
| H9 | 2136/916 | 6.87636 | 5.08694 | 0.33 |
| H10 | 2150/922 | 6.57170 | 4.95590 | 0.31 |
| H11 | 2141/918 | 5.64699 | 4.10089 | 0.25 |
| H12 | 2097/899 | 5.12931 | 3.72021 | 0.28 |
| H13 | 2116/907 | 5.16333 | 3.78103 | 0.24 |
| H14 | 2109/905 | 4.23131 | 3.13249 | 0.24 |
| H15 | 2073/889 | 3.34157 | 2.55109 | 0.33 |
| H16 | 2072/889 | 3.94571 | 2.91574 | 0.28 |
| H17 | 2064/885 | 4.39213 | 3.26841 | 0.25 |
| H18 | 2093/897 | 4.70427 | 3.61987 | 0.28 |
| H19 | 2143/919 | 5.65613 | 4.12237 | 0.20 |
| H20 | 2133/915 | 5.59225 | 4.18854 | 0.24 |
| H21 | 2089/896 | 5.10041 | 3.79425 | 0.25 |
| H22 | 2074/889 | 4.89382 | 3.64940 | 0.28 |
| H23 | 2077/891 | 4.54320 | 3.38853 | 0.24 |
Table 5. The results of the MLR model separated per hour.
Table 5. The results of the MLR model separated per hour.
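| Model | Coeff. a | Coeff. b | Coeff. c | Coeff. d | Coeff. e | Coeff. f | Intercept |
|---|---|---|---|---|---|---|---|
| H0 | 14.00 | −6.45 | 5.83 | −0.29 | −0.16 | −0.19 | 4.47 |
| H1 | 13.62 | −6.41 | 6.76 | −0.56 | 0.07 | −0.73 | 3.50 |
| H2 | 9.86 | −6.92 | 5.83 | 0.27 | −0.38 | 0.03 | 4.05 |
| H3 | 8.92 | −8.17 | 5.63 | 1.40 | −0.86 | 0.84 | 4.04 |
| H4 | 10.06 | −8.50 | 5.27 | 0.19 | −0.03 | 0.17 | 4.70 |
| H5 | 9.60 | −9.36 | 4.81 | 3.31 | −0.40 | 0.95 | 5.15 |
| H6 | 8.70 | −8.01 | 5.12 | 2.45 | 0.65 | 0.3 | 3.84 |
| H7 | 14.73 | −7.58 | 5.27 | 0.34 | 2.45 | −0.86 | 5.19 |
| H8 | 15.70 | −15.77 | 9.09 | −0.56 | −2.23 | 3.74 | 9.27 |
| H9 | 12.15 | −17.11 | 9.56 | 0.59 | −3.58 | 5.12 | 9.85 |
| H10 | 12.12 | −13.90 | 7.43 | 0.03 | −4.47 | 1.67 | 11.85 |
| H11 | 8.84 | −9.03 | 5.49 | −1.72 | −4.51 | 0.79 | 10.18 |
| H12 | 8.22 | −5.21 | 5.64 | −1.97 | −3.61 | 0.22 | 6.17 |
| H13 | 9.78 | −3.63 | 4.67 | −2.27 | −4.25 | −0.84 | 6.56 |
| H14 | 6.76 | −3.86 | 4.69 | −1.45 | −4.53 | −0.33 | 6.20 |
| H15 | 4.98 | −3.22 | 3.35 | −1.72 | −5.28 | −0.87 | 6.75 |
| H16 | 5.72 | −2.66 | 3.61 | −1.53 | −4.41 | −0.56 | 5.77 |
| H17 | 6.86 | −2.87 | 3.17 | −0.82 | −6.02 | −1.26 | 6.91 |
| H18 | 9.11 | −3.61 | 3.09 | −2.23 | −6.92 | −0.58 | 8.55 |
| H19 | 10.91 | −4.63 | 4.37 | −2.61 | −3.76 | −0.06 | 6.74 |
| H20 | 9.51 | −4.59 | 5.90 | −2.81 | −3.04 | 0.40 | 5.86 |
| H21 | 10.07 | −5.07 | 4.99 | −3.10 | −1.95 | 0.16 | 6.02 |
| H22 | 10.26 | −4.39 | 5.82 | −2.87 | −0.76 | −1.06 | 4.30 |
| H23 | 10.68 | −4.83 | 5.35 | −1.40 | −1.64 | −0.43 | 4.42 |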
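
The hourly results above come from fitting a separate MLR model to the records of each hour of the day (H0 to H23) and scoring it on held-out data with RMSE, MAE, and R2. The following is a minimal sketch of that idea, not the authors' code. It assumes six predictors, a random 70/30 train/test split (the 70/30 proportion matches the N columns, but the splitting procedure is an assumption), and synthetic data in place of the monitoring records. The function names `fit_mlr`, `evaluate`, and `fit_hourly_models` are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares fit; returns (coefficients, intercept)."""
    A = np.column_stack([X, np.ones(len(X))])  # append a column for the intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[:-1], beta[-1]


def evaluate(y_true, y_pred):
    """Return (RMSE, MAE, R2) for a set of predictions."""
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2))
    return rmse, mae, r2


def fit_hourly_models(hours, X, y, train_frac=0.7, seed=0):
    """Fit one MLR model per hour of day on a random train/test split (assumed)."""
    rng = np.random.default_rng(seed)
    results = {}
    for h in range(24):
        idx = np.flatnonzero(hours == h)  # records observed during hour h
        if idx.size == 0:
            continue
        rng.shuffle(idx)
        n_train = int(train_frac * idx.size)
        train, test = idx[:n_train], idx[n_train:]
        coef, intercept = fit_mlr(X[train], y[train])
        metrics = evaluate(y[test], X[test] @ coef + intercept)
        results[f"H{h}"] = {"coef": coef, "intercept": intercept, "metrics": metrics}
    return results


# Synthetic demonstration data (NOT the study's measurements).
rng = np.random.default_rng(42)
n = 24 * 200
hours = np.arange(n) % 24
X = rng.normal(size=(n, 6))  # six hypothetical predictors
true_coef = np.array([1.0, -0.5, 0.3, 0.0, 0.2, -0.1])
y = X @ true_coef + 0.05 * hours + rng.normal(scale=0.5, size=n)

models = fit_hourly_models(hours, X, y)
rmse, mae, r2 = models["H0"]["metrics"]
print(f"H0: RMSE={rmse:.3f}, MAE={mae:.3f}, R2={r2:.3f}")
```

To predict for a new record, one would select the model by the hour of its timestamp, for example `models["H8"]`, and apply that model's coefficients and intercept.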
Table 7. RMSE of the proposed model by season.
| Model | Year | Spring | Summer | Fall | Winter |
|---|---|---|---|---|---|
| H0 | 4.41158 | 12.34248 | 6.96156 | 6.56937 | 15.74986 |
| H1 | 4.57719 | 11.13504 | 7.01424 | 6.83583 | 15.80936 |
| H2 | 4.49406 | 8.10813 | 8.29448 | 7.56207 | 13.25280 |
| H3 | 4.65165 | 11.32884 | 10.50381 | 6.68133 | 11.75929 |
| H4 | 4.75141 | 9.55735 | 13.42097 | 7.39000 | 9.87304 |
| H5 | 4.61411 | 9.58722 | 14.30802 | 6.70083 | 9.71402 |
| H6 | 4.80890 | 14.22191 | 13.67163 | 9.14738 | 12.81491 |
| H7 | 6.50780 | 14.79353 | 18.09313 | 11.90778 | 17.34678 |
| H8 | 7.17099 | 17.38503 | 13.09771 | 8.77013 | 20.58378 |
| H9 | 6.87636 | 18.37358 | 15.05030 | 12.30992 | 21.72775 |
| H10 | 6.57170 | 13.15244 | 13.09226 | 8.31716 | 19.67714 |
| H11 | 5.64699 | 11.99965 | 11.74966 | 7.32458 | 17.13957 |
| H12 | 5.12931 | 12.72795 | 12.71104 | 6.55808 | 15.19633 |
| H13 | 5.16333 | 12.26430 | 14.66854 | 8.40014 | 13.69871 |
| H14 | 4.23131 | 9.99186 | 13.51807 | 8.57123 | 12.54090 |
| H15 | 3.34157 | 14.57501 | 15.19851 | 6.61890 | 16.26682 |
| H16 | 3.94571 | 14.08520 | 14.94630 | 5.25771 | 13.67740 |
| H17 | 4.39213 | 10.96050 | 13.81440 | 20.00288 | 12.47072 |
| H18 | 4.70427 | 11.03774 | 12.46243 | 11.41809 | 14.39600 |
| H19 | 5.65613 | 9.94045 | 8.85166 | 17.84064 | 15.55853 |
| H20 | 5.59225 | 14.90723 | 11.33855 | 13.60358 | 21.36625 |
| H21 | 5.10041 | 11.61009 | 8.88894 | 17.98565 | 16.00126 |
| H22 | 4.89382 | 13.85268 | 9.27154 | 7.68929 | 17.13908 |
| H23 | 4.54320 | 13.69663 | 69.81764 | 7.23575 | 16.53751 |

Park, S.-Y.; Yoon, D.-K.; Park, S.-H.; Jeon, J.-I.; Lee, J.-M.; Yang, W.-H.; Cho, Y.-S.; Kwon, J.; Lee, C.-M. Proposal of a Methodology for Prediction of Indoor PM2.5 Concentration Using Sensor-Based Residential Environments Monitoring Data and Time-Divided Multiple Linear Regression Model. Toxics 2023, 11, 526. https://doi.org/10.3390/toxics11060526
