Change Point Detection for Airborne Particulate Matter (PM2.5, PM10) by Using the Bayesian Approach

Khan, Muhammad Rizwan; Sarkar, Biswajit

doi:10.3390/math7050474


by Muhammad Rizwan Khan ¹ and Biswajit Sarkar ²,*

¹ Department of Industrial Engineering, Hanyang University, 222 Wangsimni-Ro, Seoul 133-791, Korea
² Department of Industrial & Management Engineering, Hanyang University, Ansan, Gyeonggi-do 15588, Korea
* Author to whom correspondence should be addressed.

Mathematics 2019, 7(5), 474; https://doi.org/10.3390/math7050474

Submission received: 28 February 2019 / Revised: 25 April 2019 / Accepted: 8 May 2019 / Published: 24 May 2019

(This article belongs to the Special Issue Application of Optimization in Production, Logistics, Inventory, Supply Chain Management and Block Chain)


Abstract

Airborne particulate matter (PM) is a key air pollutant that affects human health adversely. Exposure to high concentrations of such particles may cause premature death, heart disease, respiratory problems, or reduced lung function. Previous work on particulate matter ($PM_{2.5}$ and $PM_{10}$) was limited to specific areas. Therefore, more studies are required to investigate airborne particulate matter patterns, given their complex and varying properties and their associated ($PM_{10}$ and $PM_{2.5}$) concentrations and compositions, and to assess the numerical productivity of pollution control programs for air quality. Consequently, to control particulate matter pollution and to make effective plans for countermeasures, it is important to measure the efficiency and efficacy of policies applied by the Ministry of Environment. The primary purpose of this research is to construct a simulation model for the identification of a change point in particulate matter ($PM_{2.5}$ and $PM_{10}$) concentration, if one occurs, in different areas of the world. The methodology is based on the Bayesian approach for the analysis of different data structures, and a likelihood ratio test is used to detect a change point at an unknown time (k). Real-time data of particulate matter concentrations at different locations have been used for numerical verification. The model parameters before the change point ($θ$) and the parameters after the change point ($λ$) have been critically analyzed so that the proficiency and success of environmental policies for particulate matter ($PM_{2.5}$ and $PM_{10}$) concentrations can be evaluated. The main reason for using different areas is their considerably different features, i.e., environment, population densities, and transportation vehicle densities. Consequently, this study also provides insight into how well the suggested model could perform in different areas.

Keywords:

airborne particulate matter; Bayesian approach; change point detection; likelihood ratio test; time series analysis; air quality

1. Introduction

Airborne particulate matter is one of the most dangerous air pollutants and is harmful to human health. Over the last two decades, information about the negative impacts of $PM_{10}$ (particles less than 10 μm in diameter) and $PM_{2.5}$ (particles less than 2.5 μm in diameter) has increased enormously. Exposure to high concentrations of such particles may cause premature death, heart disease, respiratory problems, or reduced lung function through different mechanisms, which include pulmonary and systemic inflammation, accelerated atherosclerosis, and altered cardiac autonomic function (Heroux et al. [1] and Pope et al. [2]). Therefore, to control particulate matter (PM) pollution and to make effective plans for countermeasures, it is important to measure the efficiency and effectiveness of policies applied by the Ministry of Environment. Every region has developed an extensive body of legislation that establishes air quality standards for key air pollutants in order to improve air quality. The European Environment Agency, the United States Environmental Protection Agency, and the Ministry of Environment in South Korea have each set their own air quality standards for all air pollutants. Thus, it is essential to follow established air quality monitoring systems to measure the PM concentrations on an hourly as well as daily basis, because some areas deviate from the established PM standards. This may cause adverse environmental effects and serious health problems.

Until now, a number of statistical methods have been established to model the hazards of PM from air quality standards. A Bayesian multiple change point model was proposed to measure the quantitative efficiency of pollution control programs for air quality, which estimates the hazards of different air pollutants. In the model, the underlying process was assumed to be a nonhomogeneous Poisson process with multiple change points. The change points were identified, and a rate function was estimated by using a reversible jump MCMC algorithm (Gyarmati-Szabo et al. [3]). In another study, the changes in health effects due to simultaneous exposure to physical and chemical properties of airborne particulate matter were gauged through a Bayesian approach, and inferences were drawn via the Markov Chain Monte Carlo method (Pirani et al. [4]). A Bayesian approach was introduced to estimate the distributed lag functions in time series models, which can be used to determine the short-term health effects of particulate air pollution on mortality (Welty et al. [5]). Hybrid models were proposed to forecast the PM concentrations for four major cities of China: Beijing, Shanghai, Guangzhou, and Lanzhou (Qin et al. [6]).

A change point detection method for detecting changes in the mean of a one-dimensional Gaussian process was proposed on the basis of a generalized likelihood ratio test (GLRT). The important characteristic of this method is that it includes the data dependence and covariance of the Gaussian process. However, in the case of unidentified covariance, the plug-in GLRT method was suggested, which remains asymptotically near optimal (Keshavarz et al. [7]). A new method for acute change point detection was proposed for fractional Brownian motion with a time-dependent diffusion coefficient. The likelihood ratio method has been used for change point detection in Brownian motion. A statistical test was also suggested to identify the significance of a calculated critical point (Kucharczyk et al. [8]). A two-stage change point detection technique for machine monitoring was suggested. In the first stage, irregularities are measured in time series data through the autoregressive (AR) model, and then the martingale statistical test is applied to detect the change point in unsupervised time series data (Lu et al. [9]). An integrated inventory model was developed to determine the optimal lot size and production uptime while considering stochastic machine breakdown and multiple shipments for a single buyer and a single vendor (Taleizadeh et al. [10]).

A statistical change point algorithm based on nonparametric deviation estimation between time series samples from two retrospective segments was proposed in which the direct density ratio estimation method was applied for deviation measurement through relative Pearson divergence (Liu et al. [11]). A novel statistical methodology for online change point detection was suggested in which data for an uncertain system was composed through an autoregressive model. On the basis of nonparametric estimation of unidentified elements, an innovative CUSUM-like scheme was recommended for change detection. This estimation method could also be updated online (Hilgert et al. [12]). A new methodology, the Karhunen-Loeve expansions of the limit Gaussian processes, was suggested for change point test in the level of a series. Firstly, change point detection in the mean was explained, which later extended to linear and nonlinear regression (Górecki et al. [13]). A Cramer-von Mises type test was presented to test the sudden changes in random fields which was dependent on Hilbert space theory (Bucchia and Wendler [14]). The continuous-review inventory model was developed for Controllable lead time for comparing two models; one with normally distributed lead time demand and the second assumes that there is no specific distribution for lead time demand (Shin et al. [15]).

A new technique was developed to identify the structural changes in linear quantile regression models. When a structural change in the relationship between covariates and response at a specific point exists, it may not be at the centre of response distribution, but at the tail. The traditional mean regression method might not be applicable for change point detection of such structural changes at tails. Subsequently, the proposed technique could be appropriate for it (Zhou et al. [16]). For detection of simultaneous changes in mean and variance, a new methodology called the fuzzy classification maximum likelihood change point (FCML-CP) algorithm was suggested. Multiple change points in the mean and variance of a process can be estimated by this method. This technique is much better than the normal statistical mixture likelihood method because it saves a lot of time (Lu and Chang [17]). A model for Partial Trade-Credit Policy of Retailer was developed in which deterioration of products was assumed as exponentially distributed (Sarkar and Saren [18]).

The Bayesian change point algorithm for sequential data series was introduced which has some uncertain limitations regarding location and number of change points. This algorithm was precisely based on posterior distribution to deduce if a change point has occurred or not. It can also update itself linearly as new data points are observed. Posterior distribution monitoring is the finest way to identify the presence of a new change point in observed data points. Simulation studies illustrate that this algorithm is good for rapid detection of existing change points, and it is also known for a low rate of false detection (Ruggieri and Antonellis [19]). Due to the probabilistic concept of Bayesian change point detection (BCPD), this methodology can overcome threats in identifying the location and number of change points.

The performance of two different methods for change point detection of multivariate data with both single and multiple changes was compared. The results illustrated adequate performance for both Expectation Maximization (EM) and Bayesian methods. However, EM exhibits better performance in case of minor changes and unsuitable priors while the Bayesian method has less computational work to do (Keshavarz and Huang [20]). The Bayesian multiple change point model was suggested for the identification of Distributed Denial of Service (DDoS) flooding attacks in VoIP systems in which Session Initiation Protocol (SIP) is used as signalling mechanism (Kurt et al. [21]). One of the well-known change detection techniques is post classification with multi temporal remote sensing images. An innovative post classification technique with iterative slow feature analysis (ISFA) and Bayesian soft fusion was suggested to acquire accurate and reliable change detection maps. Three steps were suggested in this technique, first was to get the class probability of images through independent classification. After that, a continuous change probability map of multi temporal images was obtained by ISFA algorithm. Lastly, posterior probabilities for the class combinations of coupled pixels were determined through the Bayesian approach to assimilate the class probability with the change probability, which is called Bayesian soft fusion. This technique could be widely applicable in land cover monitoring and change detection at a large scale (Wu et al. [22]).

The Bayesian change point technique was designed to analyse biomarkers time series data in women for the diagnosis of ovarian cancer. The identification of such kind of change points could be used to diagnose the disease earlier (Mariño et al. [23]). The Generalized Extreme Value (GEV) fused lasso penalty function was applied to identify the change point of annual maximum precipitation (AMP) in South Korea. Numerical analysis and applied data analysis were conducted in order to compare performance from the GEV fused lasso and Bayesian change point analysis, which shows that when water resource structures are hydrologically designed the GEV fused lasso method should be used to identify the change points (Jeon et al. [24]). The Bayesian method was recommended to identify the change point occurrence in extreme precipitation data, and the model follows a generalized Pareto distribution. This Bayesian change point detection was inspected for four different situations, one with no change model, second with a shape change model, third with a scale change model, and fourth with both a scale and shape change model. It was determined that unexpected and sustained change points need to be considered in extreme precipitation while making hydraulic design (Chen et al. [25]).

Bayesian change point methodology was presented to identify changes in the temporal event rate for a non-homogeneous Poisson process. This methodology was used to determine whether a change in the event rate has occurred, the time of the change, and the event rate before and after the change. The methodology has been explained through an example of earthquake occurrence in Oklahoma. This spatiotemporal change point methodology can also be used for identifying changes in climate patterns and assessing the spread of diseases. It permits participants to make real-time decisions about the influence of changes in event rates (Gupta and Baker [26]). A new Bayesian methodology was recommended to analyze multiple time series with the objective of identifying abnormal regions. A general model was developed and it was shown that Bayesian inference allows independent sampling from the posterior distribution. Copy number variations (CNVs) are identified by using data from multiple individuals. The Bayesian method was evaluated on both real and simulated CNV data to provide evidence that this method is more precise compared with other suggested methods for analyzing such data (Bardwell and Fearnhead [27]).

All of the above-mentioned methods are either too complex to apply to random hazards of PM or not applicable to the randomness of PM hazards. Therefore, more studies are still required to investigate PM hazards, given their complex and varying properties and associated ($PM_{10}$ and $PM_{2.5}$) concentrations and compositions, and to investigate the numerical productivity of pollution control programs for air quality. The primary purpose of this research is to develop models for detecting a change point in particulate matter ($PM_{2.5}$ and $PM_{10}$) concentrations, if one occurs, in different areas. The pollutant concentrations before and after a change point have to be critically analyzed so that the proficiency and success of environmental policies for particulate matter ($PM_{2.5}$ and $PM_{10}$) concentrations can be evaluated. The Bayesian approach is used to analyze random hazards of PM concentrations with a change point at an unknown time (k).

To demonstrate the proposed approach, real-time data of random hazards of PM concentrations at different sites have been used. The PM concentration change point (k), the parameters before the change point ($θ$), and the parameters after the change point ($λ$) have been comprehensively analyzed by using the Bayesian technique. Thus, simulation models have been constructed for different data structures. The main reason for using different areas is their considerably different features, i.e., environment, population densities, and transportation vehicle densities. Consequently, this study also provides insight into how well the suggested model could perform in different areas. The paper is structured as follows: Section 2 presents the problem definition, notation and assumptions, and Section 3 shows the formulation of the mathematical models. Section 4 and Section 5 present numerical examples and results, respectively, to validate the practical applications of the proposed models. Section 6 discusses the results of the previous section and explains their managerial insights. Finally, Section 7 presents the conclusions of this study. Table 1 gives a comparative study of different authors who have contributed in this direction of research, while the last row of the table shows the contribution of this research paper. Table 2 and Table 3 compare previous work with this work.

2. Problem Definition, Notation and Assumptions

2.1. Problem Definition

The major objective of this research is to develop a more precise, well-defined and user-friendly method that can be applied to random hazards of PM to detect the change point of these air pollutant hazards at an unknown time (k), if one occurs, in any area across the globe. The existing methods are either too complex to apply to random hazards of PM, given its complex and varying properties, or not applicable to the randomness of PM hazards. Therefore, more studies are required to develop a methodology that is easily understandable, appropriate for modelling the hazards of PM concentrations relative to air quality standards, and able to detect change points in these hazards. Secondly, this method could be applicable to any kind of time series and data distribution. These changes need to be analyzed to establish whether the change points are favorable for the environment. For this, the pollutant hazards before and after a change point have to be compared in order to evaluate the pollution control programs adopted by environmental protection agencies. If hazard occurrences increase after the change point, then the environmental policies have had a negative impact, which marks the failure of the pollution control program; if hazard occurrences decrease after the change point, this demonstrates the effectiveness of the pollution control program. Thirdly, the alteration in occurrences must be measured to define new pollution control policies for further improvement in the current level of air pollutant hazards.

To achieve these goals, the Bayesian approach will be used to determine posterior probabilities of pollutant occurrences, and the likelihood ratio test will be used to identify the change point in that Bayesian model. The suggested model will be numerically validated by using real-time data of particulate matter concentrations in different areas of Seoul, South Korea, observed from January 2004 to December 2013. The change point (k) for particulate matter ($PM_{2.5}$ and $PM_{10}$) hazards, the rate before the change point ($θ$), and the rate after the change point ($λ$) will be comprehensively analyzed. The central reason for using different regions is their considerably different features, i.e., environment, population densities, and transportation vehicle densities. Hence, this study can also guide the implementation of the recommended model in different areas. Air quality standards for particulate matter $PM_{2.5}$ and $PM_{10}$ are given in Table 4. Results have been determined by following these standards.

Table 5 and Table 6 give details of the data collected for Guro, Nowon, Songpa, and Yongsan, showing the percentage of polluted days by standard and by location. In the case of $PM_{2.5}$, more than 44%, 21% and 8% of days are polluted as per the European, American and Korean standards, respectively, which is alarming. Similarly, in the case of $PM_{10}$, the share of polluted days as per the European and Korean standards is more than 40% and 6%, respectively, which is not acceptable. Hence, there is a need to control hazards of PM.

2.2. Notation

The list of notation to represent the random variables and parameters is as follows:

Indices

i: replication or sequence, i = 1, 2, …
j: position in the chain, j = 1, 2, …n

Random variables

Y: random process
y: variable (Y) at any given point
$y_{i}$: variable (Y) at point i where $i \in 0, 1, 2 \dots$

Parameters

k: change point in the random process
$θ$: parameter before change point k associated with probability distribution function of random variable Y
$λ$: parameter after change point k associated with probability distribution function of random variable Y

Variables

$\Pr(θ)$: prior distribution for parameter $θ$
$\Pr(θ \mid y_{i})$: posterior distribution for parameter $θ$
$\Pr(λ \mid y_{i})$: posterior distribution for parameter $λ$
$\Pr(y_{i} \mid θ)$: likelihood or sampling model
V: mean of the chain or replications (average of daily pollutant concentrations)
$V_{i j}$: jth observation from the ith replication
$V_{i}$: mean of the ith replication
V: mean of m replications
B: between sequence variance, i.e., the variance of the replication means about the mean of m replications
$S_{i}^{2}$: variance of the ith replication
W: within sequence variance, i.e., the mean variance for m replications
$\operatorname{Var}(V)$: overall estimate of the variance of V in the target distribution
$\sqrt{R}$: estimated potential scale reduction for convergence

2.3. Assumptions

The following assumptions were used for the proposed model:

Y represents the number of times an event occurs in time t, and Y always takes positive values $y \in \{1, 2, \dots\}$, each of which can be any random value.
$Y(0) = 0$ means that no event occurred at time $t = 0$.
The time series data are observed at intervals of equal length.
The daily particulate matter concentrations, or occurrences of events, follow a specific probability distribution function.
The daily particulate matter concentration in any interval of length $t$ is a random variable, and the number of times an event occurs is also a positive random variable with parameter (rate $= θ$).

3. Mathematical Model

3.1. Formulation of Mathematical Model

The probability distribution function of a random variable Y at any given point y in the sample space is given as follows:

f(y; θ) = \Pr(Y = y \mid θ) \quad \text{for } y \in \{1, 2, \dots\}

There could be a single parameter or multiple parameters depending upon the probability distribution function of random variable Y.

The change point for the random process Y is detected by the likelihood ratio test, a statistical test used for comparing the goodness of fit of two statistical models: a null model and an alternative model. The test is based on the likelihood ratio, which states how many times more likely the data are under one model than the other. This likelihood ratio is compared with a critical value to decide whether to reject the null model.

f(\text{change point} \mid Y, \text{expectation before change point}, \text{expectation after change point}) = \frac{L(Y; \text{change point}, \text{expectation before change point}, \text{expectation after change point})}{\sum_{j = 1}^{n} L(Y; j, \text{expectation before change point}, \text{expectation after change point})}

(1)

The parameters before and after the change point are also compared.

Let the change point in the random process be denoted by k, let $θ$ be the parameter of the random variable before change point k, and let $λ$ be the parameter of the random variable after change point k. This can be represented as:

y_{i} \sim \text{pdf}(θ) \quad \text{for } i = 1, 2, \dots, k

y_{i} \sim \text{pdf}(λ) \quad \text{for } i = k + 1, k + 2, \dots, n

Hence,

f(y; θ) = \Pr(Y = y_{i} \mid θ) \quad \text{for } i = 1, 2, \dots, k

f(y; λ) = \Pr(Y = y_{i} \mid λ) \quad \text{for } i = k + 1, k + 2, \dots, n

For independent observations, the joint pdf (probability density function) is the product of the marginal pdfs. If the random variable $Y = y_{i}$ with parameter $θ$ is modelled, then the joint pdf of the sample data is as follows:

\Pr(y_{1}, y_{2}, \dots, y_{n} \mid θ) = \prod_{i = 1}^{n} \Pr(y_{i} \mid θ)

A class of prior densities is conjugate for the likelihood/sampling model $\Pr(y_{i} \mid θ)$ if the posterior distribution is also in the same class. The likelihood $\Pr(y_{i} \mid θ)$ follows the distribution determined by the data. Therefore, for the Bayesian analysis, the prior distribution $\Pr(θ)$ of the parameters and the posterior distribution $\Pr(θ \mid y_{i})$ of the same parameters must belong to the same family, which is conjugate to the likelihood/sampling model $\Pr(y_{i} \mid θ)$. Bayes' theorem for the parameters $θ$ and $λ$ is as follows:

P r (θ | y_{i}) \propto P r (θ) P r (y_{i} | θ)

P r (λ | y_{i}) \propto P r (λ) P r (y_{i} | λ)

By applying Bayes' theorem, the posterior distributions of the model parameters $θ$ and $λ$ can be determined:

\Pr(θ \mid y_{i}) = \frac{\Pr(y_{i} \mid θ) \Pr(θ)}{\Pr(y_{i})} \quad \text{for } i \in \{1, 2, \dots, k\}

\Pr(λ \mid y_{i}) = \frac{\Pr(y_{i} \mid λ) \Pr(λ)}{\Pr(y_{i})} \quad \text{for } i \in \{k + 1, k + 2, \dots, n\}

As,

L (θ | Y) = f_{θ} (Y) = f (Y | θ)

Now, apply likelihood ratio test statistic for change point detection

f(k \mid Y, θ, λ) = \frac{L(Y; k, \text{expectation before change point}, \text{expectation after change point})}{\sum_{j = 1}^{n} L(Y; j, \text{expectation before change point}, \text{expectation after change point})}

The likelihood is given by:

L(Y; k, θ, λ) = \exp\left[k \left(\text{expectation after } k - \text{expectation before } k\right)\right] \left(\frac{\text{expectation before } k}{\text{expectation after } k}\right)^{\sum_{i = 1}^{k} y_{i}}

The change point k is uniform over the observations $y_{i}$. Please note that $θ$, $λ$ and k are all independent of each other.
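As an illustration only, the change point posterior and the conjugate updates can be sketched in Python under two assumptions that this section does not state: the observations are Poisson counts (so the expectations before and after k are $θ$ and $λ$ themselves), and $θ$ and $λ$ each have a Gamma(a, b) prior. The function names, the prior values and the toy data are hypothetical, and no burn-in is discarded.

```python
import math
import random

def change_point_posterior(y, theta, lam):
    """f(k | Y, theta, lam) for k = 1..n, assuming Poisson counts (an assumption)."""
    log_l, running_sum = [], 0.0
    for k, value in enumerate(y, start=1):
        running_sum += value
        # log of exp(k (lam - theta)) * (theta / lam) ** sum_{i <= k} y_i
        log_l.append(k * (lam - theta) + running_sum * math.log(theta / lam))
    peak = max(log_l)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_l]
    total = sum(weights)
    return [w / total for w in weights]

def gibbs_change_point(y, iterations=2000, a=1.0, b=1.0, seed=0):
    """Hypothetical Gibbs sampler with Gamma(a, b) priors on theta and lam."""
    rng = random.Random(seed)
    n = len(y)
    k = n // 2  # arbitrary starting change point
    draws = []
    for _ in range(iterations):
        # Conjugate Gamma updates; gammavariate takes (shape, scale = 1 / rate).
        theta = rng.gammavariate(a + sum(y[:k]), 1.0 / (b + k))
        lam = rng.gammavariate(a + sum(y[k:]), 1.0 / (b + n - k))
        # Draw k from its discrete conditional distribution.
        probs = change_point_posterior(y, theta, lam)
        k = rng.choices(range(1, n + 1), weights=probs)[0]
        draws.append((k, theta, lam))
    return draws

# Toy counts whose level shifts after the fifth observation.
counts = [2, 3, 1, 2, 3, 8, 9, 7, 10, 8]
draws = gibbs_change_point(counts)
ks = [k for k, _, _ in draws]
k_mode = max(set(ks), key=ks.count)
print("most frequent change point:", k_mode)
```

Working on the log scale keeps the exponential term from overflowing for long series, which is why the likelihood is normalized only after the maximum is subtracted.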

3.1.1. Convergence of the Parameters

A single simulation run of a somewhat arbitrary length cannot represent the actual characteristics of the resulting model. Therefore, to estimate the steady-state parameters, the Gelman-Rubin convergence diagnostic is applied, in which target parameters are estimated by running multiple sequences of the chain. m replications of the simulation ($m \geq 10$) are made, each of length $n = 1000$. If the target distribution is unimodal, Cowles and Carlin recommend running at least ten chains. This approach monitors the scalar quantities of interest in the analysis; here, the parameter of interest is the mean rate of pollutant concentrations, denoted by V.

Scalar summary V = Mean of the chain (Average of daily pollutant concentrations)

Let $V_{i j}$ be the jth observation from the ith replication:

V_{i j}, \quad i = 1, 2, \dots, m, \quad j = 1, 2, \dots, n

Mean of ith replication

V_{i} = \frac{1}{n} \sum_{j = 1}^{n} V_{i j}

Mean of m replications

V = \frac{1}{m} \sum_{i = 1}^{m} V_{i}

The between sequence variance represents the variance of replications with the mean of m replications calculated as follows:

B = \frac{n}{m - 1} \sum_{i = 1}^{m} {(V_{i} - V)}^{2}

The variance of each replication is calculated about its own mean to determine the within sequence variance:

S_{i}^{2} = \frac{1}{n - 1} \sum_{j = 1}^{n} {(V_{i j} - V_{i})}^{2}

The within sequence variance is the mean variance for m replications, determined as given below:

W = \frac{1}{m} \sum_{i = 1}^{m} S_{i}^{2}

Finally, the within sequence variance and between sequence variance are combined to get an overall estimate of the variance of V in the target distribution

\operatorname{Var}(V) = \frac{n - 1}{n} W + \frac{1}{n} B

Convergence is diagnosed by calculating

\sqrt{R} = \sqrt{\frac{\operatorname{Var}(V)}{W}}

This factor $\sqrt{R}$ (estimated potential scale reduction) is the ratio between the upper and lower bounds on the space range of V, and it estimates the factor by which $\operatorname{Var}(V)$ could be reduced through more iterations. Further iterations of the chain must be run if the potential scale reduction is high. Run the replications until $\sqrt{R}$ is less than 1.1 or 1.2 for all scalar summaries.
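The diagnostic can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name and the synthetic chains are hypothetical, and each $S_{i}^{2}$ is taken about its own chain mean, as in the standard Gelman-Rubin diagnostic.

```python
import math
import random

def potential_scale_reduction(chains):
    """sqrt(R) for m chains of equal length n, built from B, W and Var(V)."""
    m, n = len(chains), len(chains[0])
    chain_means = [sum(c) / n for c in chains]            # V_i
    grand_mean = sum(chain_means) / m                     # V
    b = n / (m - 1) * sum((v - grand_mean) ** 2 for v in chain_means)
    s2 = [sum((x - mu) ** 2 for x in c) / (n - 1)         # S_i^2
          for c, mu in zip(chains, chain_means)]
    w = sum(s2) / m
    var_v = (n - 1) / n * w + b / n
    return math.sqrt(var_v / w)

rng = random.Random(1)
# Ten chains sampling the same distribution (should look converged).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(10)]
# Ten chains stuck around different levels (should look unconverged).
stuck = [[rng.gauss(3.0 * i, 1.0) for _ in range(1000)] for i in range(10)]

print(potential_scale_reduction(mixed))
print(potential_scale_reduction(stuck))
```

Chains that explore the same distribution give a value close to 1, while chains centred on different levels inflate B and push the value well above the 1.1 to 1.2 threshold.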

3.1.2. Flowchart

The flowchart (Figure 1) for detection of the change point k, for any random process Y, is given as follows:

3.2. Comparison Method for Change Point Detection

A change point analysis has been done by using a combination of CUSUM (cumulative sum control chart) and bootstrapping for comparative analysis.

3.2.1. The CUSUM (Cumulative Sum Control Chart) Technique

The CUSUM (cumulative sum control chart) is a sequential analysis technique typically used for monitoring change detection. CUSUM charts are constructed by calculating and plotting a cumulative sum based on the data. The cumulative sums are calculated as follows:

First, calculate the average:

$\bar{X} = \frac{X_{1} + X_{2} + \dots + X_{n}}{n}$
Start the cumulative sum at zero by setting $S_{0} = 0$.
Calculate the other cumulative sums by adding the difference between the current value and the average to the previous sum, i.e.,

$S_{i} = S_{i - 1} + (X_{i} - \bar{X})$ for $i = 1, 2, \dots, n$

Plot the series of cumulative sums. Note that this cumulative sum is not the cumulative sum of the values; instead, it is the cumulative sum of the differences between the values and the average. Because the average is subtracted from each value, the cumulative sum also ends at zero ($S_{n} = 0$).

Interpreting a CUSUM chart requires some practice. Suppose that during a period of time the values tend to be above the overall average. Most of the values added to the cumulative sum will be positive and the sum will steadily increase. A segment of the CUSUM chart with an upward slope indicates a period where the values tend to be above the overall average. Likewise a segment with a downward slope indicates a period of time where the values tend to be below the overall average. A sudden change in direction of the CUSUM indicates a sudden shift or change in the average. Periods where the CUSUM chart follows a relatively straight path indicate a period where the average did not change.
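The CUSUM steps can be sketched as follows. The function name and the ten-point series are hypothetical and serve only to show the shape of the chart when the level shifts upward after the fifth value.

```python
def cusum(values):
    """Return [S_0, S_1, ..., S_n] with S_i = S_{i-1} + (X_i - mean)."""
    mean = sum(values) / len(values)
    sums = [0.0]  # S_0 = 0
    for x in values:
        sums.append(sums[-1] + (x - mean))
    return sums

# Hypothetical series: low values first, then a higher level.
series = [10, 12, 11, 9, 10, 20, 22, 19, 21, 20]
s = cusum(series)
turning_point = min(range(len(s)), key=s.__getitem__)
print([round(v, 1) for v in s])
print("CUSUM changes direction at index", turning_point)
```

For this series the sums fall while the values sit below the overall average and rise afterwards, so the chart turns at the fifth observation and returns to zero at the end.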

3.2.2. Bootstrap Analysis

A confidence level can be determined for the apparent change by performing a bootstrap analysis. Before performing the bootstrap analysis, an estimator of the magnitude of the change is required. One choice, which works well regardless of the distribution and despite multiple changes, is $S_{\text{diff}}$, defined as:

S_{\text{diff}} = S_{\max} - S_{\min}

S_{\max} = \max_{i = 0, 1, \dots, n} S_{i}

S_{\min} = \min_{i = 0, 1, \dots, n} S_{i}

Once the estimator of the magnitude of the change has been selected, the bootstrap analysis can be performed. A single bootstrap is performed by:

Generate a bootstrap sample of n units, denoted $X^{0}_{1}, X^{0}_{2}, X^{0}_{3}, \dots, X^{0}_{n}$, by randomly reordering the original n values. This is called sampling without replacement.
Based on the bootstrap sample, calculate the bootstrap CUSUM, denoted $S^{0}_{0}, S^{0}_{1}, S^{0}_{2}, \dots, S^{0}_{n}$.
Calculate the maximum, minimum and difference of the bootstrap CUSUM, denoted $S^{0}_{\max}$, $S^{0}_{\min}$ and $S^{0}_{\text{diff}}$.
Determine whether the bootstrap difference $S^{0}_{\text{diff}}$ is less than the original difference $S_{\text{diff}}$.

The idea behind bootstrapping is that the bootstrap samples represent random reorderings of the data that mimic the behavior of the CUSUM if no change has occurred. By performing a large number of bootstrap samples, one can estimate how much $S_{\text{diff}}$ would vary if no change took place. This can then be compared with the $S_{\text{diff}}$ value calculated from the data in its original order to determine whether this value is consistent with what would be expected if no change had occurred. If the bootstrap CUSUM charts tend to stay closer to zero than the CUSUM of the data in its original order, this leads one to suspect that a change must have occurred. A bootstrap analysis consists of performing a large number of bootstraps and counting the number of bootstraps for which $S^{0}_{\text{diff}}$ is less than $S_{\text{diff}}$. Let N be the number of bootstrap samples performed and let X be the number of bootstraps for which $S^{0}_{\text{diff}} < S_{\text{diff}}$. Then the confidence level that a change occurred, as a percentage, is calculated as follows:

\text{Confidence level} = 100 \, \frac{X}{N} \, \%

A high confidence level is strong evidence that a change did in fact occur. Ideally, rather than bootstrapping, one would like to determine the distribution of $S^{0}_{diff}$ based on all possible reorderings of the data. However, this is generally not feasible. A better estimate can be obtained by increasing the number of bootstrap samples. Bootstrapping results in a distribution-free approach with only a single assumption, that of an independent error structure. Both control charting and change-point analysis are based on the mean-shift model. Let $X_{1}, X_{2}, X_{3}, \dots$ represent the data in time order. The mean-shift model can be written as

X_{i} = μ_{i} + ϵ_{i}

where $μ_{i}$ is the average at time $i$. Generally $μ_{i} = μ_{i-1}$, except for a small number of values of $i$ called the change-points. $ϵ_{i}$ is the random error associated with the $i$th value. It is assumed that the $ϵ_{i}$ are independent with means of zero. Once a change has been detected, an estimate of when the change occurred can be made. One such estimator is the CUSUM estimator. Let $m$ be such that:

|S_{m}| = \max_{i = 0, 1, \dots, n} |S_{i}|

$S_{m}$ is the point furthest from zero in the CUSUM chart. The point $m$ estimates the last point before the change occurred. The point $m + 1$ estimates the first point after the change. Once a change has been detected, the data can be broken into two segments, one on each side of the change-point, 1 to $m$ and $m + 1$ to $n$. The average of each segment is then estimated, and the two estimated averages are analyzed.
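The bootstrap steps and the CUSUM estimator described in this subsection can be sketched as follows. This is an illustrative sketch under the independence assumption stated above, not the implementation used in the study; the function names, the number of bootstraps (1000), the seed, and the toy series are all assumptions.

```python
import numpy as np

def cusum(x):
    """S_0..S_n with S_0 = 0 and S_i = S_{i-1} + (x_i - mean(x))."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([0.0], np.cumsum(x - x.mean())))

def s_diff(x):
    """Magnitude estimator S_diff = S_max - S_min."""
    s = cusum(x)
    return s.max() - s.min()

def bootstrap_confidence(x, n_boot=1000, seed=0):
    """Percentage of random reorderings whose S_diff is below the original."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    original = s_diff(x)
    # Each permutation is one bootstrap sample (sampling without replacement).
    count = sum(s_diff(rng.permutation(x)) < original for _ in range(n_boot))
    return 100.0 * count / n_boot

def cusum_change_point(x):
    """Index m maximizing |S_m|: estimated last point before the change."""
    return int(np.argmax(np.abs(cusum(x))))

# Toy series (not study data) with a drop after the 6th value.
x = [10, 11, 9, 10, 12, 10, 4, 5, 3, 4, 5, 4]
confidence = bootstrap_confidence(x)
m = cusum_change_point(x)
mean_before, mean_after = np.mean(x[:m]), np.mean(x[m:])
```

Here `m` splits the series into points 1 to `m` and `m + 1` to `n`, and the two segment means give the estimated averages before and after the change.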

4. Numerical Example

The formulated mathematical model was verified numerically and its validity was checked. For this purpose, real-time data on particulate matter hazards at four different sites in Seoul, South Korea were used.

4.1. Particulate Matter ($PM_{2.5}$ and $PM_{10}$) Change Points for Four Different Sites

Two different cases need to be considered.

4.1.1. Case 1—When There Is No Hazard

In this case, there is no hazard and the concentrations of particulate matter do not exceed the threshold values of the standards. Therefore, there are no polluted days and the random variable $Y$ is always $y = 0$. Because there is no hazard in the particulate matter concentrations, the model is not applied in this case.

4.1.2. Case 2—When There Are Hazards

In this case, the number of polluted days for particulate matter ($PM_{2.5}$ and $PM_{10}$) concentrations is treated as a Poisson counting process with rate $θ > 0$. Here, we report the results obtained by applying the method described in Section 3 to the particulate matter ($PM_{2.5}$ and $PM_{10}$) concentrations for four different sites (Guro, Nowon, Songpa, and Yongsan) in Seoul, South Korea. We used the daily data observed from January 2004 to December 2013 to compute the change point of both pollutants.

f(y; θ) = \Pr(Y = y | θ) = \mathrm{Poisson}(y, θ) = \frac{e^{-θ} θ^{y}}{y!} \quad \text{for } y \in \{0, 1, 2, \dots\}

The Poisson distribution describes the number of events occurring in a given time period. In this case, the number of polluted days in a month is modeled as Poisson distributed. The rates of polluted days for $PM_{2.5}$ and $PM_{10}$ are given in Table 7 and Table 8, respectively.

The change point for this Poisson process has to be detected in order to know whether a change has occurred, the most likely month in which it occurred, and whether the rate of polluted days increased or decreased after the change point. It is assumed that the number of polluted days for particulate matter ($PM_{2.5}$ and $PM_{10}$) concentrations follows a Poisson distribution with mean rate $θ$ until month $k$. After month $k$, the polluted days follow a Poisson distribution with mean rate $λ$. This can be represented as:

y_{i} \sim \mathrm{Poisson}(θ) \quad \text{for } i = 1, 2, \dots, k

y_{i} \sim \mathrm{Poisson}(λ) \quad \text{for } i = k + 1, k + 2, \dots, n

Hence,

f(y; θ) = \Pr(Y = y_{i} | θ) \quad \text{for } i = 1, 2, \dots, k

f(y; λ) = \Pr(Y = y_{i} | λ) \quad \text{for } i = k + 1, k + 2, \dots, n

If we model $Y = y_{i}$ as Poisson with mean rate $θ$, then the joint pmf of our sample data is:

\Pr(y_{1}, y_{2}, \dots, y_{n} | θ) = \prod_{i = 1}^{n} \Pr(y_{i} | θ) = \prod_{i = 1}^{n} \frac{e^{-θ} θ^{y_{i}}}{y_{i}!} = c(y_{1}, y_{2}, \dots, y_{n}) e^{-n θ} θ^{\sum y_{i}}, \quad y_{i} \in \{0, 1, 2, \dots\}

This means that whatever our conjugate class of densities is, it will have to include terms like $e^{-C_{2} θ} θ^{C_{1}}$ for constants $C_{1}$ and $C_{2}$. The simplest class of densities that includes these terms is the family of Gamma distributions. Therefore, the prior distribution $\Pr(θ)$ and the posterior distribution $\Pr(θ | y_{1}, y_{2}, \dots, y_{n})$ follow a Gamma distribution, while the likelihood or sampling model $\Pr(y_{1}, y_{2}, \dots, y_{n} | θ)$ follows a Poisson distribution.

Therefore, the uncertain positive quantities $θ$ and $λ$ are given $\mathrm{Gamma}(a_{1}, b_{1})$ and $\mathrm{Gamma}(a_{2}, b_{2})$ prior distributions, respectively, where $a_{1}$ is the shape parameter and $b_{1}$ is the rate parameter for $θ$, while $a_{2}$ is the shape parameter and $b_{2}$ is the rate parameter for $λ$:

\Pr(θ) = \mathrm{Gamma}(θ, a_{1}, b_{1}) = \frac{b_{1}^{a_{1}} e^{-b_{1} θ} θ^{a_{1} - 1}}{Γ(a_{1})}

\Pr(λ) = \mathrm{Gamma}(λ, a_{2}, b_{2}) = \frac{b_{2}^{a_{2}} e^{-b_{2} λ} λ^{a_{2} - 1}}{Γ(a_{2})}

The Gamma distribution is also the conjugate prior of the rate (inverse scale) parameter of the Gamma distribution itself. That is why the rate parameters $b_{1}$ and $b_{2}$ are also given Gamma distributions, with different shape and rate parameters, as given below:

b_{1} \sim \mathrm{Gamma}(c_{1}, d_{1}), \quad \text{where } c_{1} = \text{shape parameter}, \ d_{1} = \text{rate parameter}

b_{2} \sim \mathrm{Gamma}(c_{2}, d_{2}), \quad \text{where } c_{2} = \text{shape parameter}, \ d_{2} = \text{rate parameter}

By applying Bayes' theorem, the posterior distributions for the parameters $θ$, $λ$, $b_{1}$ and $b_{2}$ are determined in the following way. The likelihood and prior distribution of $θ$ are:

\Pr(y_{1}, y_{2}, \dots, y_{n} | θ) \sim \mathrm{Poisson}(θ)

\Pr(θ) = \mathrm{Gamma}(θ, a_{1}, b_{1})

\begin{aligned} \Pr(θ | y_{1}, y_{2}, \dots, y_{n}) &= \frac{\Pr(y_{1}, y_{2}, \dots, y_{n} | θ) \Pr(θ)}{\Pr(y_{1}, y_{2}, \dots, y_{n})} \\ &= (e^{-b_{1} θ} θ^{a_{1} - 1}) \times (e^{-n θ} θ^{\sum y_{i}}) \times c(y_{1}, y_{2}, \dots, y_{n}, a_{1}, b_{1}) \\ &= (e^{-(b_{1} + n) θ} θ^{a_{1} + \sum y_{i} - 1}) \times c(y_{1}, y_{2}, \dots, y_{n}, a_{1}, b_{1}) \\ (θ | y_{1}, y_{2}, \dots, y_{n}) &\sim \mathrm{Gamma}(a_{1} + \sum_{i = 1}^{n} y_{i}, b_{1} + n) \end{aligned}

This is evidently a Gamma distribution, which confirms the conjugacy of the Gamma family for the Poisson sampling model or likelihood. It follows that if:

θ \sim \mathrm{Gamma}(a_{1}, b_{1})

\Pr(y_{1}, y_{2}, \dots, y_{n} | θ) \sim \mathrm{Poisson}(θ)

then:

(θ | y_{1}, y_{2}, \dots, y_{n}) \sim \mathrm{Gamma}(a_{1} + \sum_{i = 1}^{n} y_{i}, b_{1} + n)
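The conjugate update above amounts to adding the observed counts to the shape and the number of observations to the rate. A minimal sketch follows; the function name, the prior values and the counts are hypothetical and chosen only for illustration.

```python
def gamma_poisson_update(a, b, counts):
    """Posterior (shape, rate) for a Gamma(a, b) prior and Poisson counts."""
    return a + sum(counts), b + len(counts)

# Hypothetical prior and monthly counts of polluted days (not study data).
a1, b1 = 2.0, 1.0
y = [3, 5, 4, 6]

shape, rate = gamma_poisson_update(a1, b1, y)
# Mean of a Gamma distribution in the shape/rate parameterization.
posterior_mean = shape / rate
```

With these illustrative numbers the posterior is Gamma(20, 5), whose mean of 4 lies between the prior mean (2) and the sample mean (4.5).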

Similarly, the conditional posterior distributions of all parameters $θ$, $λ$, $b_{1}$ and $b_{2}$ can be determined as given below. Since $k$ months precede the change point and $n - k$ months follow it, the rate parameters are updated by $k$ and $n - k$, respectively:

(θ | y, λ, b_{1}, b_{2}, k) \sim \mathrm{Gamma}(a_{1} + \sum_{i = 1}^{k} y_{i}, k + b_{1})

(λ | y, θ, b_{1}, b_{2}, k) \sim \mathrm{Gamma}(a_{2} + \sum_{i = k + 1}^{n} y_{i}, n - k + b_{2})

(b_{1} | y, θ, λ, b_{2}, k) \sim \mathrm{Gamma}(a_{1} + c_{1}, θ + d_{1})

(b_{2} | y, θ, λ, b_{1}, k) \sim \mathrm{Gamma}(a_{2} + c_{2}, λ + d_{2})
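As a brief sketch of where the conditional for $b_{1}$ comes from, assuming only the priors stated earlier ($θ \sim \mathrm{Gamma}(a_{1}, b_{1})$ and $b_{1} \sim \mathrm{Gamma}(c_{1}, d_{1})$): $b_{1}$ enters the joint density only through these two factors, so

```latex
\Pr(b_{1} \mid θ)
  \propto \underbrace{b_{1}^{a_{1}} e^{-b_{1} θ}}_{\text{from } \Pr(θ \mid a_{1}, b_{1})}
  \times \underbrace{b_{1}^{c_{1} - 1} e^{-d_{1} b_{1}}}_{\text{from } \Pr(b_{1})}
  = b_{1}^{a_{1} + c_{1} - 1} e^{-(θ + d_{1}) b_{1}}
```

which is the kernel of a $\mathrm{Gamma}(a_{1} + c_{1}, θ + d_{1})$ density. The same argument with $λ$, $a_{2}$, $c_{2}$ and $d_{2}$ gives the conditional for $b_{2}$.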

The Gamma distribution is a two-parameter family of continuous probability distributions. The likelihood function is defined as:

L(θ | Y) = f_{θ}(Y) = f(Y | θ)

The conditional probability of the change point $k$ is given by the likelihood ratio:

f(k | Y, θ, λ) = \frac{L(Y; k, θ, λ)}{\sum_{j = 1}^{n} L(Y; j, θ, λ)}

where the likelihood is:

L(Y; k, θ, λ) = \exp(k (λ - θ)) (θ / λ)^{\sum_{i = 1}^{k} y_{i}}
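The conditional distributions and the likelihood ratio above suggest a Gibbs sampler that draws each quantity in turn. The following Python sketch is one possible reading, not the MATLAB code used in the study. The hyperparameter values, starting values, seed and synthetic data are all assumptions. The rate of the conditional for $λ$ is taken as $n - k + b_{2}$, the number of months after the change plus $b_{2}$, which is the form consistent with the likelihood $L(Y; k, θ, λ)$ above.

```python
import numpy as np

def gibbs_change_point(y, a1=0.5, a2=0.5, c1=1.0, c2=1.0, d1=1.0, d2=1.0,
                       n_iter=1100, burn_in=100, seed=0):
    """Gibbs sampler for a single Poisson change point (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    csum = np.cumsum(y)              # csum[k-1] = y_1 + ... + y_k
    ks = np.arange(1, n + 1)         # candidate change points 1..n
    k, b1, b2 = n // 2, 1.0, 1.0     # arbitrary starting values
    draws = np.empty((n_iter, 3))
    for t in range(n_iter):
        # NumPy's gamma takes (shape, scale), so scale = 1 / rate.
        theta = rng.gamma(a1 + csum[k - 1], 1.0 / (k + b1))
        lam = rng.gamma(a2 + csum[-1] - csum[k - 1], 1.0 / (n - k + b2))
        b1 = rng.gamma(a1 + c1, 1.0 / (theta + d1))
        b2 = rng.gamma(a2 + c2, 1.0 / (lam + d2))
        # log L(Y; k, theta, lam) = k (lam - theta) + log(theta / lam) * sum_{i<=k} y_i
        log_l = ks * (lam - theta) + np.log(theta / lam) * csum
        p = np.exp(log_l - log_l.max())   # subtract the max for numerical stability
        k = int(rng.choice(ks, p=p / p.sum()))
        draws[t] = (k, theta, lam)
    return draws[burn_in:]               # discard the burn-in draws

# Synthetic monthly counts (not study data): rate 15 for 50 months, then 8.
rng = np.random.default_rng(1)
y = np.concatenate([rng.poisson(15, 50), rng.poisson(8, 70)])
draws = gibbs_change_point(y)
k_hat, theta_hat, lam_hat = draws.mean(axis=0)
```

Working with the log-likelihood and subtracting its maximum before exponentiating avoids overflow when the sums of counts are large; the normalization in the likelihood ratio is unaffected by this shift.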

For the Bayesian approach, MATLAB was used for change point detection in the particulate matter ($PM_{2.5}$ and $PM_{10}$) data during the study period (2004–2013) for four different sites (Guro, Nowon, Songpa and Yongsan) in Seoul, South Korea. Ten replications of each simulation were made, with 1100 observations in each replication. The first 100 observations were discarded as a burn-in period. The replication mean $V_{i}$ of the remaining 1000 observations was taken for each replication, as shown in Table 9 and Table 10. The mean $V$ of the replication means was then taken to obtain the converged values of the parameters.
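The replication scheme just described (burn-in removal, replication means $V_{i}$, and their overall mean $V$) can be sketched as below. The array of draws is a synthetic placeholder, not simulation output from the study, and the function name is hypothetical.

```python
import numpy as np

def replication_summary(chains, burn_in=100):
    """chains: array of shape (replications, observations) for one parameter."""
    chains = np.asarray(chains, dtype=float)
    kept = chains[:, burn_in:]       # discard the burn-in observations
    v_i = kept.mean(axis=1)          # replication means V_i
    return v_i, v_i.mean()           # overall mean V of the replication means

# Placeholder draws: 10 replications x 1100 observations (synthetic).
rng = np.random.default_rng(0)
chains = rng.normal(loc=43.0, scale=2.0, size=(10, 1100))
v_i, v = replication_summary(chains)
```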

The CUSUM charts of polluted days as per European, American and Korean standards are shown in Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 for the four sites Guro, Nowon, Songpa and Yongsan in Seoul, South Korea.

The bootstrap analysis for the European standards is shown in Figure 10, Figure 11, Figure 12, Figure 13, Figure 14, Figure 15, Figure 16 and Figure 17.

The prior for the change point $k$ is discrete uniform over $\{1, 2, 3, \dots, 120\}$, as there are 120 months in 10 years. Please note that $θ$, $λ$ and $k$ are all a priori independent of each other.

5. Numerical Results

Two different approaches have been used to obtain the results. The first is the Bayesian approach, which is based on probability distributions and can be applied to any kind of data distribution. The data distributions are defined first, and the proposed method is then applied to obtain the results. This approach is better suited to random data structures and time series. The second method is based on CUSUM charts. It is applied directly to the raw data, which suits deterministic data structures. Summaries of the particulate matter ($PM_{2.5}$ and $PM_{10}$) change point $k$, the rate before the change point $θ$, and the rate after the change point $λ$ during the study period (2004–2013) for four different sites (Guro, Nowon, Songpa and Yongsan) in Seoul, South Korea are given in Table 11, Table 12, Table 13 and Table 14. The results have been computed following the European, American, and Korean standards, as discussed in Table 4.

5.1. $PM_{2.5}$ Change Point ($k$) through Bayesian Approach

Table 11 presents the results obtained through the Bayesian approach, where $k$ is the predicted change point, which varies across areas and air quality standards. The results indicate a reduction of polluted days after the change point $k$ for $PM_{2.5}$. Here $θ$ represents the monthly rate of polluted days before the change point $k$, and $λ$ the monthly rate of polluted days after the change point $k$.

5.2. $PM_{2.5}$ Last Point before Change ($k$) and First Point after Change ($k + 1$) through CUSUM Approach

Table 12 presents the results obtained for $PM_{2.5}$ through the CUSUM approach, where $k$ is the last point before the change and $k + 1$ is the first point after the change. The change point therefore lies somewhere between $k$ and $k + 1$. This method also shows a reduction of polluted days after the change point, where $θ$ represents the monthly rate of polluted days before the change point and $λ$ the monthly rate of polluted days after the change point.

5.3. $PM_{10}$ Change Point ($k$) through Bayesian Approach

Table 13 presents the results obtained for $PM_{10}$ through the Bayesian approach. The expected change point $k$ differs across areas and air quality standards. These results show a reduction of polluted days after the change point $k$ for $PM_{10}$. Here $θ$ is the monthly rate of polluted days before the change point $k$, and $λ$ the monthly rate of polluted days after the change point $k$.

5.4. $PM_{10}$ Last Point before Change ($k$) and First Point after Change ($k + 1$) through CUSUM Approach

The results obtained for $PM_{10}$ through the CUSUM approach are presented in Table 14, where the last point before the change is $k$ and the first point after the change is $k + 1$. The change point therefore lies somewhere between $k$ and $k + 1$. This method also shows a reduction of polluted days after the change point, where $θ$ represents the monthly rate of polluted days before the change point and $λ$ the monthly rate of polluted days after the change point.

6. Discussion

The results of the two approaches were described in the previous section.

The Bayesian approach is based on probability distributions and can be applied to any kind of data distribution. The data distributions are defined first, and the proposed method is then applied to obtain the results. This approach is better suited to random data structures and time series.
The CUSUM approach is applied directly to the raw data, which suits deterministic data structures.

6.1. Guro (Seoul, South Korea)

Guro is located in the southwestern part of Seoul, and has an important position as a transport link which includes railroads and land routes. The largest digital industrial complex in Korea is also located in Guro, centering on research and development activities as well as advanced information and knowledge industries. The policies of the Ministry of Environment in South Korea have influenced the concentrations of particulate matter ($PM_{2.5}$ and $PM_{10}$) in Guro, and the rate of polluted days has decreased in all cases.

6.1.1. Bayesian Approach

The Bayesian method is better suited to random time series data. In the case of Guro, Table 11 indicates that for $PM_{2.5}$ the change point $k$ of polluted days was 43.18, 33.62, and 38.26 according to European, American, and Korean standards, respectively. Therefore, a change occurred in the rate of polluted days, but it varied according to the standard. The smallest reduction in the rate of polluted days, from $θ = 15.99$, was 30.14%, and the largest, from $θ = 3.56$ to $λ = 1.63$, was 54.21% in the case of Korean standards. Similarly, Table 13 shows the reduction in the rate of polluted days from $θ$ to $λ$ for $PM_{10}$ after the change point $k$, which was 85.46, 90.34, and 85.98 according to European, American, and Korean standards, respectively. Moreover, the decrease in the rate of polluted days from $θ = 14.73$ to $λ = 9.20$ was at least 37.54% for European standards, but it was 80.95% in the case of American standards. Figure 18 and Figure 19 graphically represent the replications of monthly polluted days before and after the change point, which are given in Table 9 and Table 10.
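The percentage reductions quoted in this discussion are relative decreases from $θ$ to $λ$. A small sketch of that arithmetic, using two of the Guro pairs quoted above (the function name is illustrative):

```python
def percent_reduction(theta, lam):
    """Relative decrease from the rate before (theta) to the rate after (lam), in %."""
    return 100.0 * (theta - lam) / theta

# Guro values quoted in the text.
pm25_korean = percent_reduction(3.56, 1.63)     # PM2.5, Korean standards
pm10_european = percent_reduction(14.73, 9.20)  # PM10, European standards
print(round(pm25_korean, 2), round(pm10_european, 2))  # 54.21 37.54
```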

6.1.2. CUSUM Approach

The CUSUM approach also indicates a reduction in the hazard rate from $θ$ to $λ$ after the change. For Guro, Table 12 presents the change in $PM_{2.5}$ polluted days obtained through the CUSUM approach: the change point occurred between points 40 ($k$) and 41 ($k + 1$) for European standards, between points 36 ($k$) and 37 ($k + 1$) for American standards, and between points 35 ($k$) and 36 ($k + 1$) for Korean standards. Table 14 presents the change in $PM_{10}$ polluted days obtained through the CUSUM approach: the change point lies between points 77 ($k$) and 78 ($k + 1$) for European standards, points 89 ($k$) and 90 ($k + 1$) for American standards, and points 64 ($k$) and 65 ($k + 1$) for Korean standards.

6.2. Nowon (Seoul, South Korea)

Nowon is located in the northeastern part of the city and has the highest population density in Seoul, with 619,509 persons living in 35.44 km$^{2}$. It is surrounded by mountains and forests to the northeast. The policies of the Ministry of Environment in Nowon have improved the rate of polluted days for $PM_{2.5}$ and $PM_{10}$ hazards from $θ$ to $λ$. The size of the reduction in polluted days varies from case to case.

6.2.1. Bayesian Approach

Correspondingly, in the case of Nowon, Table 11 shows that the change point $k$ of polluted days for $PM_{2.5}$ was 49.95, 41.83, and 50.34 according to European, American, and Korean standards, respectively. In this case, the change point was nearly the same according to European and Korean standards, but differed for American standards. The rate of polluted days for European standards showed the smallest decrease, 35.14%, from $θ = 15.88$ to $λ = 10.30$ after the change point, while the largest decrease was for Korean standards, at 67.65%, with $θ = 3.74$ and $λ = 1.21$. In the same manner, Table 13 shows that for $PM_{10}$ there was again a reduction in the rate of polluted days after the change point $k$, which was 67.12, 73.80, and 65.68 according to European, American, and Korean standards, respectively. The change points are comparable for European and Korean standards, but somewhat different for American standards. In addition, the reduction in the rate of polluted days for the European standard was at least 44.01%, from $θ = 14.93$ to $λ = 8.36$, while the maximum reduction, from $θ = 0.62$ to $λ = 0.11$, was 82.25% for American standards. Figure 20 and Figure 21 graphically represent the replications of monthly polluted days before and after the change point, which are given in Table 9 and Table 10.

6.2.2. CUSUM Approach

Moreover, the CUSUM approach also confirms the reduction of PM hazards. In the case of Nowon, Table 12 presents the change in $PM_{2.5}$ polluted days obtained through the CUSUM approach: the change point occurred between points 53 ($k$) and 54 ($k + 1$) for European standards, between points 41 ($k$) and 42 ($k + 1$) for American standards, and between points 50 ($k$) and 51 ($k + 1$) for Korean standards. Table 14 presents the change in $PM_{10}$ polluted days obtained through the CUSUM approach: the change point lies between points 67 ($k$) and 68 ($k + 1$) for European standards, points 63 ($k$) and 64 ($k + 1$) for American standards, and points 66 ($k$) and 67 ($k + 1$) for Korean standards.

6.3. Songpa (Seoul, South Korea)

Songpa is located in the southeastern part of Seoul and has the largest population, with 647,000 residents. Under the Ministry of Environment policies in Songpa, the reduction in the rate of polluted days from $θ$ to $λ$ is smaller than in Guro and Nowon, but there is still a significant reduction in PM hazards.

6.3.1. Bayesian Approach

For Songpa, Table 11 shows that the change point $k$ of polluted days for $PM_{2.5}$ was 53.85, 53.32, and 56.44 for European, American, and Korean standards, respectively, which are all similar. The reduction in the rate of polluted days was at least 24.50% for European standards, from $θ = 14.90$ to $λ = 11.25$, while it was highest for Korean standards at 47.77%, with $θ = 3.14$ and $λ = 1.64$. Correspondingly, Table 13 shows the improvement in the rate of polluted days from $θ$ to $λ$ for $PM_{10}$ after the change point $k$. The change point $k$ for the rate of polluted days due to $PM_{10}$ concentration was 53.50, 88.16, and 52.75 according to European, American and Korean standards, respectively, which is nearly the same for European and Korean standards. The smallest improvement, 41.17%, was for European standards, where $θ = 16.08$ decreased to $λ = 9.46$. On the other hand, for American standards the rate of polluted days ($θ = 0.56$) was already low, and it decreased by a further 87.5% to $λ = 0.07$. Hence, this area almost meets the $PM_{10}$ concentration requirements of the American standards, but not those of the other standards. Figure 22 and Figure 23 graphically represent the replications of monthly polluted days before and after the change point, which are given in Table 9 and Table 10.

6.3.2. CUSUM Approach

The CUSUM approach also shows a decrease in PM hazards. Table 12 presents the change in $PM_{2.5}$ polluted days obtained through the CUSUM approach: the change point occurred between points 53 ($k$) and 54 ($k + 1$) for European standards, between points 52 ($k$) and 53 ($k + 1$) for American standards, and between points 52 ($k$) and 53 ($k + 1$) for Korean standards. Table 14 presents the change in $PM_{10}$ polluted days obtained through the CUSUM approach: the change point lies between points 53 ($k$) and 54 ($k + 1$) for European standards, points 53 ($k$) and 54 ($k + 1$) for American standards, and points 52 ($k$) and 53 ($k + 1$) for Korean standards.

6.4. Yongsan (Seoul, South Korea)

Yongsan is a district in the center of Seoul in which almost 250,000 people reside. Prominent locations in Yongsan include Yongsan station, the electronics market and the Itaewon commercial area, with heavy traffic and transportation. The policies of the Ministry of Environment in Yongsan have affected the particulate matter ($PM_{2.5}$ and $PM_{10}$) concentrations more than in the previous three locations (Guro, Nowon and Songpa). There is a remarkable decrease in the rate of polluted days from $θ$ to $λ$.

6.4.1. Bayesian Approach

Similarly, in the case of Yongsan, Table 11 and Table 13 show that the rate of polluted days $θ$ for particulate matter ($PM_{2.5}$ and $PM_{10}$) was the highest in Seoul. For $PM_{2.5}$, the change occurred at change point $k$ of 7.94, 6.18, and 5.78 with respect to European, American and Korean standards, respectively, which is comparable for all three standards. The rate of polluted days fell by at least 50.71%, from $θ = 24.55$ to $λ = 12.10$, for European standards, while the largest reduction, 74.18%, was in the case of Korean standards. Likewise, Table 13 indicates that a change in the rate of polluted days also occurred for $PM_{10}$ concentrations. The change point $k$ was 89.74, 91.36, and 68.06 for European, American and Korean standards, respectively. Furthermore, the rate of polluted days was reduced by at least 42.10%, from $θ = 14.56$ to $λ = 8.43$, for European standards, while the maximum decrease was 93.93% for American standards, from $θ = 0.66$ to $λ = 0.04$, although the area was already approaching the requirements of this standard. Figure 24 and Figure 25 graphically represent the replications of monthly polluted days before and after the change point, which are given in Table 9 and Table 10.

6.4.2. CUSUM Approach

The CUSUM approach is applied directly to the raw data, which should suit deterministic data structures. It also shows a reduction in PM hazards. In the case of Yongsan, Table 12 presents the change in $PM_{2.5}$ polluted days obtained through the CUSUM approach: the change point occurred between points 11 ($k$) and 12 ($k + 1$) for European standards, between points 64 ($k$) and 65 ($k + 1$) for American standards, and between points 64 ($k$) and 65 ($k + 1$) for Korean standards. Table 14 presents the change in $PM_{10}$ polluted days obtained through the CUSUM approach: the change point lies between points 77 ($k$) and 78 ($k + 1$) for European standards, points 89 ($k$) and 90 ($k + 1$) for American standards, and points 64 ($k$) and 65 ($k + 1$) for Korean standards.

6.5. Strengths

This approach is precise, well defined, user friendly and easy to understand when applied to probability distributions, time series and random data.
The model is an appropriate approach for detecting change points in random data structures.
It is a good technique for evaluating process control programs by comparing the parameters before and after the change point.

6.6. Limitations

Detection of only single change point is given in this model.
Further extension is required by making a model for a multiple number of change points for locating changed segments.

6.7. Managerial Insights

This model presents a suitable technique to analyze the air quality and pollutant hazards in the air.
By detecting change points in particulate matter ( $P M_{2.5}$ and $P M_{10}$ ) concentrations and analyzing the occurrences of polluted days before and after a change point, environmental protection agencies can understand the role of their legislation efforts, and whether these change points are favorable or not for the environment.
A comparison of particulate matter hazards before and after a change point evaluates a pollution control program adopted by environmental protection agencies, helping them decide whether these policies need further revision to reduce death rates and the burden of disease due to airborne particulate matter concentrations.
This study of pollutant hazards also identifies the current levels of the air pollutants in question, which is helpful for making new pollution control policies for further improvement.
This research also helps to define new goals if previously defined goals have been achieved, and indicates whether the environmental standards need to be revised to overcome environmental challenges.

7. Conclusions

The main focus of this research work was to elucidate an appropriate change point detection model for occurrences of pollutant hazards due to higher concentrations of particulate matter ($PM_{2.5}$ and $PM_{10}$) in different locations. The rate of pollutant hazards before and after a change point was also estimated comprehensively to investigate the effectiveness of policies applied by the Ministry of Environment. To verify the model, four major locations (Guro, Nowon, Songpa, and Yongsan) in Seoul, South Korea were selected as study areas due to their different characteristics, such as climate zones, environment, populations and population densities. Three different environmental standards (European, American and Korean) were chosen as threshold values. Then, the model was applied to real time data sets in all cases and conclusions were drawn. The rate before and after the change point of particulate matter concentrations indicated a reduction in polluted days over a 10-year period. The overall results of our study confirm the effective role of legislation efforts used consistently to improve the air quality through the years, but pollutant hazards still exist. Hence, further improvements are required to meet set standards to nullify hazards. This study can be further extended by making a multi-parameter change point model for a multiple number of change points, considering the fact that different data structures follow different probability distributions.

Author Contributions

Conceptualization, Muhammad Rizwan Khan (M.R.K.) and Biswajit Sarkar (B.S.); methodology, M.R.K.; software, M.R.K.; validation, M.R.K. and B.S.; formal analysis, M.R.K.; investigation, M.R.K.; resources, M.R.K. and B.S.; data curation, M.R.K. and B.S.; writing—original draft preparation, M.R.K.; writing—review and editing, B.S.; visualization, M.R.K. and B.S.; supervision, B.S.; project administration, M.R.K. and B.S.; funding acquisition, B.S.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

Héroux, M.E.; Anderson, H.R.; Atkinson, R.; Brunekreef, B.; Cohen, A.; Forastiere, F.; Hurley, F.; Katsouyanni, K.; Krewski, D.; Krzyzanowski, M.; et al. Quantifying the health impacts of ambient air pollutants: recommendations of a WHO/Europe project. Int. J. Public Health 2015, 60, 619–627. [Google Scholar] [CrossRef] [PubMed]
Pope, C.A.; Burnett, R.T.; Thurston, G.D.; Thun, M.J.; Calle, E.E.; Krewski, D.; Godleski, J.J. Cardiovascular mortality and long-term exposure to particulate air pollution: Epidemiological evidence of general pathophysiological pathways of disease. Circulation 2004, 109, 71–77. [Google Scholar] [CrossRef] [PubMed]
Gyarmati-Szabó, J.; Bogachev, L.V.; Chen, H. Modelling threshold exceedances of air pollution concentrations via non-homogeneous Poisson process with multiple change-points. Atmos. Environ. 2011, 45, 5493–5503. [Google Scholar] [CrossRef]
Pirani, M.; Best, N.; Blangiardo, M.; Liverani, S.; Atkinson, R.W.; Fuller, G.W. Analysing the health effects of simultaneous exposure to physical and chemical properties of airborne particles. Environ. Int. 2015, 79, 56–64. [Google Scholar] [CrossRef] [PubMed]
Welty, L.J.; Peng, R.; Zeger, S.; Dominici, F. Bayesian distributed lag models: Estimating effects of particulate matter air pollution on daily mortality. Biometrics 2009, 65, 282–291. [Google Scholar] [CrossRef]
Qin, S.; Liu, F.; Wang, J.; Sun, B. Analysis and forecasting of the particulate matter (PM) concentration levels over four major cities of China using hybrid models. Atmos. Environ. 2014, 98, 665–675. [Google Scholar] [CrossRef]
Keshavarz, H.; Scott, C.; Nguyen, X. Optimal change point detection in Gaussian processes. J. Stat. Plan. Inference 2018, 193, 151–178. [Google Scholar] [CrossRef]
Kucharczyk, D.; Wyłomańska, A.; Sikora, G. Variance change point detection for fractional Brownian motion based on the likelihood ratio test. Phys. Stat. Mech. Its Appl. 2018, 490, 439–450. [Google Scholar] [CrossRef]
Lu, G.; Zhou, Y.; Lu, C.; Li, X. A novel framework of change-point detection for machine monitoring. Mech. Syst. Signal Process. 2017, 83, 533–548. [Google Scholar] [CrossRef]
Taleizadeh, A.A.; Samimi, H.; Sarkar, B.; Mohammadi, B. Stochastic machine breakdown and discrete delivery in an imperfect inventory-production system. J. Ind. Manag. Optim. 2017, 13, 1511–1535. [Google Scholar] [CrossRef][Green Version]
Liu, S.; Yamada, M.; Collier, N.; Sugiyama, M. Change-point detection in time-series data by relative density-ratio estimation. Neural Netw. 2013, 43, 72–83. [Google Scholar] [CrossRef]
Hilgert, N.; Verdier, G.; Vila, J.P. Change detection for uncertain autoregressive dynamic models through nonparametric estimation. Stat. Methodol. 2016, 33, 96–113. [Google Scholar] [CrossRef]
Górecki, T.; Horváth, L.; Kokoszka, P. Change point detection in heteroscedastic time series. Econom. Stat. 2018, 7, 63–88. [Google Scholar] [CrossRef]
Bucchia, B.; Wendler, M. Change-point detection and bootstrap for Hilbert space valued random fields. J. Multivar. Anal. 2017, 155, 344–368. [Google Scholar] [CrossRef]
Shin, D.; Guchhait, R.; Sarkar, B.; Mittal, M. Controllable lead time, service level constraint, and transportation discounts in a continuous review inventory model. RAIRO-Oper. Res. 2016, 50, 921–934. [Google Scholar] [CrossRef]
Zhou, M.; Wang, H.J.; Tang, Y. Sequential change point detection in linear quantile regression models. Stat. Probab. Lett. 2015, 100, 98–103. [Google Scholar] [CrossRef]
Lu, K.P.; Chang, S.T. Detecting change-points for shifts in mean and variance using fuzzy classification maximum likelihood change-point algorithms. J. Comput. Appl. Math. 2016, 308, 447–463. [Google Scholar] [CrossRef]
Sarkar, B.; Saren, S. Partial trade-credit policy of retailer with exponentially deteriorating items. Int. J. Appl. Comput. Math. 2015, 1, 343–368. [Google Scholar] [CrossRef]
Ruggieri, E.; Antonellis, M. An exact approach to Bayesian sequential change point detection. Comput. Stat. Data Anal. 2016, 97, 71–86. [Google Scholar] [CrossRef]
Keshavarz, M.; Huang, B. Bayesian and Expectation Maximization methods for multivariate change point detection. Comput. Chem. Eng. 2014, 60, 339–353. [Google Scholar] [CrossRef]
Kurt, B.; Yıldız, Ç.; Ceritli, T.Y.; Sankur, B.; Cemgil, A.T. A Bayesian change point model for detecting SIP-based DDoS attacks. Digit. Signal Process. 2018, 77, 48–62. [Google Scholar] [CrossRef]
Wu, C.; Du, B.; Cui, X.; Zhang, L. A post-classification change detection method based on iterative slow feature analysis and Bayesian soft fusion. Remote Sens. Environ. 2017, 199, 241–255.
Mariño, I.P.; Blyuss, O.; Ryan, A.; Gentry-Maharaj, A.; Timms, J.F.; Dawnay, A.; Kalsi, J.; Jacobs, I.; Menon, U.; Zaikin, A. Change-point of multiple biomarkers in women with ovarian cancer. Biomed. Signal Process. Control 2017, 33, 169–177.
Jeon, J.J.; Sung, J.H.; Chung, E.S. Abrupt change point detection of annual maximum precipitation using fused lasso. J. Hydrol. 2016, 538, 831–841.
Chen, S.; Li, Y.; Kim, J.; Kim, S.W. Bayesian change point analysis for extreme daily precipitation. Int. J. Climatol. 2017, 37, 3123–3137.
Gupta, A.; Baker, J.W. Estimating spatially varying event rates with a change point using Bayesian statistics: Application to induced seismicity. Struct. Saf. 2017, 65, 1–11.
Bardwell, L.; Fearnhead, P. Bayesian detection of abnormal segments in multiple time series. Bayesian Anal. 2017, 12, 193–218.
Lim, S.S.; Vos, T.; Flaxman, A.D.; Danaei, G.; Shibuya, K.; Adair-Rohani, H.; AlMazroa, M.A.; Amann, M.; Anderson, H.R.; Andrews, K.G.; et al. A comparative risk assessment of burden of disease and injury attributable to 67 risk factors and risk factor clusters in 21 regions, 1990–2010: A systematic analysis for the Global Burden of Disease Study 2010. Lancet 2012, 380, 2224–2260.
Rao, X.; Zhong, J.; Brook, R.D.; Rajagopalan, S. Effect of particulate matter air pollution on cardiovascular oxidative stress pathways. Antioxid. Redox Signal. 2018, 28, 797–818.
Wei, T.; Meng, T. Biological Effects of Airborne Fine Particulate Matter (PM2.5) Exposure on Pulmonary Immune System. Environ. Toxicol. Pharmacol. 2018, 60, 195–201.
Wang, Y.; Xiong, L.; Tang, M. Toxicity of inhaled particulate matter on the central nervous system: neuroinflammation, neuropsychological effects and neurodegenerative disease. J. Appl. Toxicol. 2017, 37, 644–667.
Wellenius, G.A.; Burger, M.R.; Coull, B.A.; Schwartz, J.; Suh, H.H.; Koutrakis, P.; Schlaug, G.; Gold, D.R.; Mittleman, M.A. Ambient air pollution and the risk of acute ischemic stroke. Arch. Intern. Med. 2012, 172, 229–234.
Li, P.; Xin, J.; Wang, Y.; Wang, S.; Shang, K.; Liu, Z.; Li, G.; Pan, X.; Wei, L.; Wang, M. Time-series analysis of mortality effects from airborne particulate matter size fractions in Beijing. Atmos. Environ. 2013, 81, 253–262.
Kim, S.E.; Bell, M.L.; Hashizume, M.; Honda, Y.; Kan, H.; Kim, H. Associations between mortality and prolonged exposure to elevated particulate matter concentrations in East Asia. Environ. Int. 2018, 110, 88–94.
Kim, S.E.; Honda, Y.; Hashizume, M.; Kan, H.; Lim, Y.H.; Lee, H.; Kim, C.T.; Yi, S.M.; Kim, H. Seasonal analysis of the short-term effects of air pollution on daily mortality in Northeast Asia. Sci. Total Environ. 2017, 576, 850–857.
Qin, R.X.; Xiao, C.; Zhu, Y.; Li, J.; Yang, J.; Gu, S.; Xia, J.; Su, B.; Liu, Q.; Woodward, A. The interactive effects between high temperature and air pollution on mortality: A time-series analysis in Hefei, China. Sci. Total Environ. 2017, 575, 1530–1537.
Lee, H.; Honda, Y.; Hashizume, M.; Guo, Y.L.; Wu, C.F.; Kan, H.; Jung, K.; Lim, Y.H.; Yi, S.; Kim, H. Short-term exposure to fine and coarse particles and mortality: A multicity time-series study in East Asia. Environ. Pollut. 2015, 207, 43–51.
Cabrieto, J.; Tuerlinckx, F.; Kuppens, P.; Wilhelm, F.H.; Liedlgruber, M.; Ceulemans, E. Capturing correlation changes by applying kernel change point detection on the running correlations. Inf. Sci. 2018, 447, 117–139.

Figure 1. Flowchart for change point (k) detection.

Figure 2. CUSUM chart for Guro PM_{2.5}.

Figure 3. CUSUM chart for Guro PM_{10}.

Figure 4. CUSUM chart for Nowon PM_{2.5}.

Figure 5. CUSUM chart for Nowon PM_{10}.

Figure 6. CUSUM chart for Songpa PM_{2.5}.

Figure 7. CUSUM chart for Songpa PM_{10}.

Figure 8. CUSUM chart for Yongsan PM_{2.5}.

Figure 9. CUSUM chart for Yongsan PM_{10}.