Correlation Structures of PM2.5 Concentration Series in the Korean Peninsula

In this paper, the authors investigate the idiosyncratic features of auto- and cross-correlation structures of PM2.5 (particulate matter of diameter less than 2.5 μ m ) mass concentrations using DFA (detrended fluctuation analysis) methodologies. Since air pollutant mass concentrations are greatly affected by geographical, topographical, and meteorological conditions, their correlation structures can have non-universal properties. To this end, the authors firstly examine the spatio-temporal statistics of PM2.5 daily average concentrations collected from 18 monitoring stations in Korea, and then select five sites from those stations with overall lower and higher concentration levels in order to make up two groups, namely, G1 and G2, respectively. Firstly, to compare characteristic behaviors of the auto-correlation structures of the two groups, we performed DFA and MFDFA (multifractal DFA) analyses on both and then confirmed that the G2 group shows a clear crossover behavior in DFA and MFDFA analyses, while G1 shows no crossover. This finding implies that there are possibly two different scale-dependent underlying dynamics in G2. Furthermore, in order to confirm that different underlying dynamics govern G1 and G2, the authors conducted DCCA (detrended cross-correlation analysis) analysis on the same and different groups. As a result, in the same group, coupling behavior became more prominent between two series as the scale increased, while, in the different group, decoupling behavior was observed. This result also implies that different dynamics govern G1 and G2. Lastly, we presented a stochastic model, namely, ARFIMA (auto-regressive fractionally integrated moving average) with periodic trends, to reproduce behaviors of correlation structures from real PM2.5 concentration time series. Although those models succeeded in reproducing crossover behaviors in the auto-correlation structure, they yielded no valid results in decoupling behavior among heterogeneous groups.


Introduction
Currently, most countries are facing severe air pollution problems incurred by rapid industrialization, high population density, traffic density, and so on. Air pollution levels are determined by the concentrations of the six pollutants, i.e., atmospheric fine particulate matters (PM 2.5 of diameter less than 2.5 µm), atmospheric particulate matter (PM 10 of diameter less than 10 µm), SO 2 , NO 2 , CO, and O 3 . In particular, PM 2.5 is a critical pollutant linked to respiratory and cardiovascular diseases, as well as visibility degradation and climate change [1]. So far, there were many studies on the characteristics of PM 2.5 using three approaches: finding sources of emission and characterizing seasonal distributions [2][3][4][5], analyzing various correlation structures inherent in PM 2.5 time series [6][7][8][9], and developing theoretical models for prediction and control [10][11][12]. also provides a detailed description of the methodology applied to this study. Section 3 derives and interprets various analytical results using DFA, MFDFA, and DCCA methods. Finally, Section 4 gives the discussion and conclusion of this study.

Data
South Korea began opening limited air pollution data of 16 stations to the public in 2002, and later expanded to providing real-time concentrations of six air pollutants from a well-constructed nationwide monitoring network consisting of 344 stations via a website (http://www.airkorea.or.kr) in 2005. The dataset is composed of PM 2.5 mass concentrations recorded hourly from about 344 monitoring stations located nationwide in Korea, from 1 January 2015 to 31 December 2017. Most datasets contain erroneous data points due to operational errors of measuring devices and abnormal values. Thus, the authors selected datasets with less than 10% of missing data points during the whole time span. Finally, they obtained daily average concentrations with a total of 1090 data points each at 18 stations, which were categorized into urban and roadside groups. In fact, all monitoring stations were grouped into four categories: urban, suburban, roadside, and country background. However, most datasets were removed upon processing the wrong data points contained in the original time series. In this study, we analyze the daily average mass concentration of PM 2.5 , defined as the daily average mass concentration instead of the logarithmic value, because the mass concentration is not stored and updated every hour.
As shown in Figure 1, we can observe a seasonal trend as reported in other studies, i.e., a high concentration in the winter and spring, and a low concentration in the summer and fall. This trend shows a similar behavior to meteorological factors such as temperature, relative humidity, wind speed, air pressure, and precipitation. While there seems to be a strong dependence of PM 2.5 concentrations on meteorological factors, it is not clear whether there is a reliable causal relationship between them. No meteorological factor is determined to contribute to the concentration other than sharing a seasonal trend [7].
Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 13 derives and interprets various analytical results using DFA, MFDFA, and DCCA methods. Finally, Section 4 gives the discussion and conclusion of this study.

Data
South Korea began opening limited air pollution data of 16 stations to the public in 2002, and later expanded to providing real-time concentrations of six air pollutants from a well-constructed nationwide monitoring network consisting of 344 stations via a website (http://www.airkorea.or.kr) in 2005. The dataset is composed of PM2.5 mass concentrations recorded hourly from about 344 monitoring stations located nationwide in Korea, from 1 January 2015 to 31 December 2017. Most datasets contain erroneous data points due to operational errors of measuring devices and abnormal values. Thus, the authors selected datasets with less than 10% of missing data points during the whole time span. Finally, they obtained daily average concentrations with a total of 1090 data points each at 18 stations, which were categorized into urban and roadside groups. In fact, all monitoring stations were grouped into four categories: urban, suburban, roadside, and country background. However, most datasets were removed upon processing the wrong data points contained in the original time series. In this study, we analyze the daily average mass concentration of PM2.5, defined as the daily average mass concentration instead of the logarithmic value, because the mass concentration is not stored and updated every hour.
As shown in Figure 1, we can observe a seasonal trend as reported in other studies, i.e., a high concentration in the winter and spring, and a low concentration in the summer and fall. This trend shows a similar behavior to meteorological factors such as temperature, relative humidity, wind speed, air pressure, and precipitation. While there seems to be a strong dependence of PM2.5 concentrations on meteorological factors, it is not clear whether there is a reliable causal relationship between them. No meteorological factor is determined to contribute to the concentration other than sharing a seasonal trend [7].  Figure 2 shows the spatial variations in PM2.5 concentrations. In particular, stations #15 and #18 show a low-level concentration, while stations #5, #11, and #16 show a high-level concentration. Station #16 is a roadside station, which is strongly affected by mobile vehicles. In order to examine the correlation structure of PM2.5 concentrations obtained from spatially separated monitoring stations, the authors applied the DFA and the MFDFA methods to two groups, namely, G1 (stations #15 and #18) and G2 (stations #5, #11, and #16), consisting of time series with different characteristic  Figure 2 shows the spatial variations in PM 2.5 concentrations. In particular, stations #15 and #18 show a low-level concentration, while stations #5, #11, and #16 show a high-level concentration. Station #16 is a roadside station, which is strongly affected by mobile vehicles. In order to examine the correlation structure of PM 2.5 concentrations obtained from spatially separated monitoring stations, the authors applied the DFA and the MFDFA methods to two groups, namely, G1 (stations #15 and #18) and G2 (stations #5, #11, and #16), consisting of time series with different characteristic seasonal statistics, as shown in Figure 2, among five PM 2.5 monitoring stations with representative statistics.
Appl. Sci. 2020, 10, x FOR PEER REVIEW  4 of 13 seasonal statistics, as shown in Figure 2, among five PM2.5 monitoring stations with representative statistics.
In Figure 3, a seasonal trend is even clearer in G2 while a less clear one can be seen in G1.  shows a strong seasonality. There is a surprising difference in PM2.5 concentration between stations #16 and #18, although they are located in the same city. This is a clue that PM2.5 concentration could be strongly affected by an endogenous factor rather than an exogenous one.

Detrended Fluctuation Analysis (DFA)
In order to investigate the correlation structures inherent in PM2.5 series, the authors applied three kinds of fluctuation analysis methods, i.e., DFA (detrended fluctuation analysis) [15], MFDFA (multifractal DFA) [23], and DCCA (detrended cross-correlation analysis) [24], which are briefly illustrated below. In the application of these fluctuation analyses, the authors took the detrending approach with overlapping sliding boxes, which is a widely used method to obtain better statistics when the data points are finite [25]. The authors firstly considered a noisy PM2.5 series, ( ) ( = 1, 2, … , max ). Then, the authors integrated the time series ( ) to construct the following profile: In Figure 3, a seasonal trend is even clearer in G2 while a less clear one can be seen in G1.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 4 of 13 seasonal statistics, as shown in Figure 2, among five PM2.5 monitoring stations with representative statistics.
In Figure 3, a seasonal trend is even clearer in G2 while a less clear one can be seen in G1.  shows a strong seasonality. There is a surprising difference in PM2.5 concentration between stations #16 and #18, although they are located in the same city. This is a clue that PM2.5 concentration could be strongly affected by an endogenous factor rather than an exogenous one.

Detrended Fluctuation Analysis (DFA)
In order to investigate the correlation structures inherent in PM2.5 series, the authors applied three kinds of fluctuation analysis methods, i.e., DFA (detrended fluctuation analysis) [15], MFDFA (multifractal DFA) [23], and DCCA (detrended cross-correlation analysis) [24], which are briefly illustrated below. In the application of these fluctuation analyses, the authors took the detrending approach with overlapping sliding boxes, which is a widely used method to obtain better statistics when the data points are finite [25]. The authors firstly considered a noisy PM2.5 series, ( ) ( = 1, 2, … , max ). Then, the authors integrated the time series ( ) to construct the following profile: Figure 3. The seasonal trend is weak in G1 (stations #15 and #18), while G2 (stations #5, #11, and #16) shows a strong seasonality. There is a surprising difference in PM 2.5 concentration between stations #16 and #18, although they are located in the same city. This is a clue that PM 2.5 concentration could be strongly affected by an endogenous factor rather than an exogenous one.

Detrended Fluctuation Analysis (DFA)
In order to investigate the correlation structures inherent in PM 2.5 series, the authors applied three kinds of fluctuation analysis methods, i.e., DFA (detrended fluctuation analysis) [15], MFDFA (multifractal DFA) [23], and DCCA (detrended cross-correlation analysis) [24], which are briefly illustrated below. In the application of these fluctuation analyses, the authors took the detrending approach with overlapping sliding boxes, which is a widely used method to obtain better statistics when the data points are finite [25]. The authors firstly considered a noisy PM2.5 series, x(i) (i = 1, 2, . . . , N max ). Then, the authors integrated the time series x(i) to construct the following profile: The integrated series was divided into N n (= N max − n + 1) overlapping boxes of equal size n, and the authors fit it using a polynomial function, X ν (k), which was the local trend for the ν-th box. Herein, the authors used the first-order polynomial. To detrend the integrated time series X(k), the authors subtracted the local trend and calculated the local detrended fluctuation function for the ν-th box such that Then, by summing over all overlapping N n boxes of size n, the authors computed the detrended variance function as follows: The above computation provides a relationship between F DFA (n) and n, such as for scale-invariant signals with power-law correlations. The scaling exponent α represents the degree of correlation in the signal. If α = 0.5, there is no correlation, that is, the signal is uncorrelated. On the other hand, if α < 0.5, the signal is negatively correlated. Lastly, if α > 0.5, there is a positive correlation in the signal.

Multifractal Detrended Fluctuation Analysis (MFDFA)
MFDFA is an extension of the conventional DFA and consists of five steps. The first three steps are identical to the DFA procedure. Let us consider x(i) of length N max and construct the profile by integrating x(i) − x , as shown in Equations (1) and (2). After dividing the integrated series into N n (= N max − n + 1) overlapping boxes of equal size n, the authors calculated the detrended fluctuation function F 2 n (ν) as done in Equation (3). Then, the authors averaged over all boxes to obtain the q-th order fluctuation function as follows: where the index variable q can take any real value in general. For q = 2, the standard DFA was retrieved. Lastly, the authors examined the scaling behavior of the generalized q-dependent fluctuation functions by analyzing double-logarithmic plots of F q (n) versus n for each index value q. If the x(i) series featured a long-range power-law correlation, the authors could find a power-law relationship between them, such as F q (n) ∼ n h(q) .
Here, the authors called the exponent h(q) a generalized Hurst exponent since h(2) was identical to the well-known Hurst exponent H [26]. For positive values of q, h(q) describes the scaling behavior of the boxes with large fluctuations because the boxes ν with large variance F 2 n (ν) contribute more to the average F q (n). On the contrary, for negative values of q, h(q) describes the scaling behavior of the boxes with small fluctuations, since the boxes ν with small variance F 2 n (ν) dominate the average F q (n).

Detrended Cross-Correlation Analysis (DCCA)
In order to quantify long-range cross correlations between two non-stationary time series, a modification of the covariance analysis, called DCCA, was proposed by Podobnik and Stanley [23]. The authors firstly considered two long-range cross-correlated time series x(i) and y(i) of equal length N max , from which the authors computed two integrated signals as follows: The authors divided the integrated time series into N n (= N max − n + 1) overlapping boxes of equal size n and defined the local trends X ν (k) and Y ν (k) of a polynomial function for the ν-th box by using a linear least-squares fit. Next, by defining the detrended walk as the difference between the integrated series and the local trend, the authors calculated the covariance of the residuals in the ν-th box as follows: and the detrended covariance function was obtained by averaging the above covariance functions as follows: In order to examine the behavior of the covariance function versus n, the authors considered two cases: one with no power-law cross-correlations in both time series, and the other with power-law cross-correlations in both series. For the former, the exponent of F DCCA (n) is not defined because it has complex values; thus, the authors defined a new measure to quantify the degree of cross-correlation in both time series with non-stationaries. The DCCA coefficient ρ DCCA (α, α , N max , n), defined below, was introduced by Gebende [27].
which is dependent on the length of two time series, their respective DFA exponents (α, α ), and the box size n. Podobnik et al. presented the framework to determine the statistical significance of ρ DCCA (α, α , N max , n) via the computed critical point ρ c (N max , n) for the 95% confidence level by varying the data length N max and the box size n [28]. For the latter approach, the authors could find a power-law scaling between both time series, as shown below.
In particular, for two persistent time series, the DCCA exponent λ was approximately equal to the average of both DFA exponents, i.e., λ ≈ (α + α )/2 [24]. However, for a pair of persistent and anti-persistent series, the authors had no clear support for the previous relationship. According to the numerical analysis, the DCCA exponent λ tended to approximately 0.5. Lastly, the authors addressed the fact that there should be a common error term in both time series if the authors observed a power-law cross-correlation between them.

Autocorrelation Structures
The DFA approach is a well-known method to quantify the degree of correlation properties in non-stationary time series, and it was successfully applied to various fields such as finance, meteorology, cardiac dynamics, and so on. However, there are some important problems in estimating the scaling index, such as changes in the index of scale, crossover, and scaling range. In this work, the authors firstly determined the scaling range using the surrogate dataset generated from the original series using two methods, i.e., shuffling and Fourier phase randomization, as shown in Figure 4 [29]. Herein, the authors set the scaling range to the scale interval [10,200] days. Then, the authors estimated the correlation exponent by linearly fitting the double-logarithmic plots of the detrended fluctuation functions vs. box sizes using the least-squares method. The authors confirmed that the PM 2.5 series showed a weak persistent behavior, as shown in Figure 5.

Autocorrelation Structures
The DFA approach is a well-known method to quantify the degree of correlation properties in non-stationary time series, and it was successfully applied to various fields such as finance, meteorology, cardiac dynamics, and so on. However, there are some important problems in estimating the scaling index, such as changes in the index of scale, crossover, and scaling range. In this work, the authors firstly determined the scaling range using the surrogate dataset generated from the original series using two methods, i.e., shuffling and Fourier phase randomization, as shown in Figure 4 [29]. Herein, the authors set the scaling range to the scale interval [10,200] days. Then, the authors estimated the correlation exponent by linearly fitting the double-logarithmic plots of the detrended fluctuation functions vs. box sizes using the least-squares method. The authors confirmed that the PM2.5 series showed a weak persistent behavior, as shown in Figure 5. box size plotted for two groups of surrogate data, namely, S1 and S2. S1 represents the shuffled series of an original time series, and S2 represents the shuffled series of a Fourier-phase randomized series of an original series. They have almost the same kurtosis. The authors determined the scaling range to be in the interval of [10,200] days, which was free from the finite size effect. All other series showed a similar behavior.  box size plotted for two groups of surrogate data, namely, S1 and S2. S1 represents the shuffled series of an original time series, and S2 represents the shuffled series of a Fourier-phase randomized series of an original series. They have almost the same kurtosis. The authors determined the scaling range to be in the interval of [10,200] days, which was free from the finite size effect. All other series showed a similar behavior.

Autocorrelation Structures
The DFA approach is a well-known method to quantify the degree of correlation properties in non-stationary time series, and it was successfully applied to various fields such as finance, meteorology, cardiac dynamics, and so on. However, there are some important problems in estimating the scaling index, such as changes in the index of scale, crossover, and scaling range. In this work, the authors firstly determined the scaling range using the surrogate dataset generated from the original series using two methods, i.e., shuffling and Fourier phase randomization, as shown in Figure 4 [29]. Herein, the authors set the scaling range to the scale interval [10,200] days. Then, the authors estimated the correlation exponent by linearly fitting the double-logarithmic plots of the detrended fluctuation functions vs. box sizes using the least-squares method. The authors confirmed that the PM2.5 series showed a weak persistent behavior, as shown in Figure 5. box size plotted for two groups of surrogate data, namely, S1 and S2. S1 represents the shuffled series of an original time series, and S2 represents the shuffled series of a Fourier-phase randomized series of an original series. They have almost the same kurtosis. The authors determined the scaling range to be in the interval of [10,200] days, which was free from the finite size effect. All other series showed a similar behavior.  Furthermore, crossover behavior was observed in G2 although there were variations in the degree of crossover, while G1 showed little evidence of crossovers. Considering the seasonality inherent in air pollutant series, this crossover behavior was mainly due to the seasonal trend. In order to confirm this possibility, the authors examined the crossover behaviors arising from the artificially generated time series with periodic trends. To this end, the authors used a stationary linear ARFIMA process to generate a power-law auto-correlated time series as follows [30]: where ρ (range of −0.5 and 0.5) is a free parameter, a i (ρ) represents weights defined by a i (ρ) = −Γ(i − ρ)/[Γ(−ρ)Γ(1 + i)], Γ(x) denotes the gamma function, and η(t) denotes an independent identically distributed (i.i.d.) Gaussian variable. The parameter ρ is related to a Hurst exponent such as = 2 + ρ. Here, the authors added periodic trends to the ARFIMA process in two ways: one was exogenous and the other was endogenous, as shown below. Exogenous trend: y p (t) = y(t) + Asin(2πt/T).
Endogenous trend: In the analysis, the authors set ρ = 0.2 in order to obtain a similar scaling exponent with that of the PM 2.5 series. The data size was also same as the PM 2.5 time series and the period T was set to 360 days.
As shown in Figure 6, the authors confirmed that the crossover behavior in DFA analysis could be adjusted by the degree of periodic trends with different amplitudes. Furthermore, it is likely that a periodic trend with small amplitude existed, although no clear evidence existed in the DFA analysis. However, the authors could not determine which trend was more appropriate to describe the behavior observed in the PM 2.5 series. Another interesting finding was a big difference observed in the crossover behavior of stations #4 and #5, although they were located in the same city. This may be a clue that, in some respects, endogenous factors overwhelm exogenous ones; station #4 was located by the roadside, while station #5 was in an urban area.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 8 of 13 Furthermore, crossover behavior was observed in G2 although there were variations in the degree of crossover, while G1 showed little evidence of crossovers. Considering the seasonality inherent in air pollutant series, this crossover behavior was mainly due to the seasonal trend. In order to confirm this possibility, the authors examined the crossover behaviors arising from the artificially generated time series with periodic trends. To this end, the authors used a stationary linear ARFIMA process to generate a power-law auto-correlated time series as follows [30]: is related to a Hurst exponent such as = 2 + . Here, the authors added periodic trends to the ARFIMA process in two ways: one was exogenous and the other was endogenous, as shown below.
Exogenous trend: Endogenous trend: In the analysis, the authors set = 0.2 in order to obtain a similar scaling exponent with that of the PM2.5 series. The data size was also same as the PM2.5 time series and the period was set to 360 days.
As shown in Figure 6, the authors confirmed that the crossover behavior in DFA analysis could be adjusted by the degree of periodic trends with different amplitudes. Furthermore, it is likely that a periodic trend with small amplitude existed, although no clear evidence existed in the DFA analysis. However, the authors could not determine which trend was more appropriate to describe the behavior observed in the PM2.5 series. Another interesting finding was a big difference observed in the crossover behavior of stations #4 and #5, although they were located in the same city. This may be a clue that, in some respects, endogenous factors overwhelm exogenous ones; station #4 was located by the roadside, while station #5 was in an urban area. Although there is no noticeable difference in both, the crossover behavior seems to be controlled by the magnitude of the periodic trend, i.e., the amplitude. Although there is no noticeable difference in both, the crossover behavior seems to be controlled by the magnitude of the periodic trend, i.e., the amplitude.
Next, the authors examined the correlation structures of different fluctuation levels using MFDFA analysis with different q-values, as shown in Figures 7 and 8. For the low fluctuations represented by q = −5, there was no clear scale range in G1, although a strong persistent tendency was seen in small scales, while G2 showed a strong persistent behavior over the whole scale range. For high fluctuations with q = +5, G1 and G2 showed a very similar behavior to the DFA cases.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of 13 Next, the authors examined the correlation structures of different fluctuation levels using MFDFA analysis with different q-values, as shown in Figures 7 and 8. For the low fluctuations represented by = −5, there was no clear scale range in G1, although a strong persistent tendency was seen in small scales, while G2 showed a strong persistent behavior over the whole scale range. For high fluctuations with = +5, G1 and G2 showed a very similar behavior to the DFA cases.

Cross-Correlation Structures
In order for a power-law correlation to exist between two time series, two conditions must be satisfied; one is a common error term in both series, and the other involves sharing the same periodic term overwhelming random errors. As shown in Figure 9, PM2.5 series showed a considerably robust behavior in power-law cross-correlation scaling although there were different behaviors in the DCCA coefficients, i.e., decoupling in different groups and coupling in the same groups. These results imply that all PM2.5 series in the Korean peninsula, which are small compared with China and India, have at least one common error term and share a similar seasonal trend. Next, the authors examined the correlation structures of different fluctuation levels using MFDFA analysis with different q-values, as shown in Figures 7 and 8. For the low fluctuations represented by = −5, there was no clear scale range in G1, although a strong persistent tendency was seen in small scales, while G2 showed a strong persistent behavior over the whole scale range. For high fluctuations with = +5, G1 and G2 showed a very similar behavior to the DFA cases.

Cross-Correlation Structures
In order for a power-law correlation to exist between two time series, two conditions must be satisfied; one is a common error term in both series, and the other involves sharing the same periodic term overwhelming random errors. As shown in Figure 9, PM2.5 series showed a considerably robust behavior in power-law cross-correlation scaling although there were different behaviors in the DCCA coefficients, i.e., decoupling in different groups and coupling in the same groups. These results imply that all PM2.5 series in the Korean peninsula, which are small compared with China and India, have at least one common error term and share a similar seasonal trend.

Cross-Correlation Structures
In order for a power-law correlation to exist between two time series, two conditions must be satisfied; one is a common error term in both series, and the other involves sharing the same periodic term overwhelming random errors. As shown in Figure 9, PM 2.5 series showed a considerably robust behavior in power-law cross-correlation scaling although there were different behaviors in the DCCA coefficients, i.e., decoupling in different groups and coupling in the same groups. These results imply that all PM 2.5 series in the Korean peninsula, which are small compared with China and India, have at least one common error term and share a similar seasonal trend. Stable cross-correlated power-law behaviors remain in both, stressing that at least one common factor governs their dynamics.
According to the simulation results based on the ARFIMA process with periodic trends, both time series of the same periodic trend with independent identically distributed (i.i.d.) Gaussian variables showed an incomplete behavior in DCCA scaling, i.e., with complex-valued covariance in small scales, as shown in Figure 10. Depending on the characteristics of a periodic trend, i.e., exogenous or endogenous, there are two distinguishable tendencies. When the amplitude of a trend function is large enough to overwhelm the randomness, the endogenous factor promotes coupling instead of the exogenous one. On the other hand, decoupling lasts longer in terms of DCCA coefficients in the endogenous case than the exogenous case, as shown in Figures 10b and 11b. This may be due to the accumulation effect in the endogenous process because a trend is incorporated into the underlying dynamics. Also, an incomplete covariance appeared in cases where both time series had a common error term but periodic terms with different periods. Therefore, the authors concluded that the PM2.5 series shared more than one factor governing the dynamics, and that they were governed by a similar seasonal trend with variable amplitudes.  Stable cross-correlated power-law behaviors remain in both, stressing that at least one common factor governs their dynamics.
According to the simulation results based on the ARFIMA process with periodic trends, both time series of the same periodic trend with independent identically distributed (i.i.d.) Gaussian variables showed an incomplete behavior in DCCA scaling, i.e., with complex-valued covariance in small scales, as shown in Figure 10. Depending on the characteristics of a periodic trend, i.e., exogenous or endogenous, there are two distinguishable tendencies. When the amplitude of a trend function is large enough to overwhelm the randomness, the endogenous factor promotes coupling instead of the exogenous one. On the other hand, decoupling lasts longer in terms of DCCA coefficients in the endogenous case than the exogenous case, as shown in Figures 10b and 11b. This may be due to the accumulation effect in the endogenous process because a trend is incorporated into the underlying dynamics. Also, an incomplete covariance appeared in cases where both time series had a common error term but periodic terms with different periods. Therefore, the authors concluded that the PM 2.5 series shared more than one factor governing the dynamics, and that they were governed by a similar seasonal trend with variable amplitudes. Stable cross-correlated power-law behaviors remain in both, stressing that at least one common factor governs their dynamics.
According to the simulation results based on the ARFIMA process with periodic trends, both time series of the same periodic trend with independent identically distributed (i.i.d.) Gaussian variables showed an incomplete behavior in DCCA scaling, i.e., with complex-valued covariance in small scales, as shown in Figure 10. Depending on the characteristics of a periodic trend, i.e., exogenous or endogenous, there are two distinguishable tendencies. When the amplitude of a trend function is large enough to overwhelm the randomness, the endogenous factor promotes coupling instead of the exogenous one. On the other hand, decoupling lasts longer in terms of DCCA coefficients in the endogenous case than the exogenous case, as shown in Figures 10b and 11b. This may be due to the accumulation effect in the endogenous process because a trend is incorporated into the underlying dynamics. Also, an incomplete covariance appeared in cases where both time series had a common error term but periodic terms with different periods. Therefore, the authors concluded that the PM2.5 series shared more than one factor governing the dynamics, and that they were governed by a similar seasonal trend with variable amplitudes.   There is no complex value in fluctuation covariance, and the behaviors of DCCA coefficients are considerably different, as shown in the inset. (a) The cross-correlation is overwhelmed by the periodic trend, which is also shown in the scaling exponent, while (b) the randomness overwhelms the trend and, furthermore, its effect is accumulated in comparison to an exogenous case.

Discussion and Conclusions
In this work, we examined the descriptive statistics of PM2.5 concentrations at 18 sites in Korea, and we confirmed a well-known pronounced seasonal trend in most series. Then, we selected five notable sites with lower and higher mass concentrations and then generated two groups, G1 and G2, in order to conduct comparative analyses on both. To determine the heterogeneity in underlying dynamics depending on the concentration levels, we analyzed the auto-and cross-correlation structures of both groups, i.e., G1 and G2, consisting of stations #15 and #18 and stations #5, #11, and #16, respectively. They showed a different behavior in DFA (detrended fluctuation analysis); G1 showed no crossover, while G2 showed strong evidence of crossover. This finding implies that G2 has two governing dynamics depending on the scale, while G2 has one dynamic. In terms of the impact of trans-boundary transport from China, G1 has little influence while G2 has strong influence. Furthermore, to confirm the heterogeneity between groups, we conducted DCCA (detrended crosscorrelation analysis) on two series belonging to the same groups and different groups. Our finding strongly supports the heterogeneity in both. In the same group, coupling or synchronization increased as the scale increased, while, in the different group, decoupling between both series developed over increasing scale. This can be a starting point to categorize the specific region into several groups based on DCCA behaviors. Lastly, in order to reproduce the auto-correlation properties produced from real PM2.5 time series, we presented the ARFIMA (auto-regressive fractionally integrated moving average) process. Although these models succeeded in reproducing crossover behaviors in the auto-correlation structure, they yielded no valid results in decoupling behavior among heterogeneous groups. Also, through the analysis of the ARFIMA process, we found slightly different behavior in the DCCA analysis when incorporating the trend term, whether exogenous or endogenous.
In summary, the PM2.5 accumulation process operates in a complicated way and shows different dynamical behavior depending on mass concentration. However, using simple DFA analysis, we could categorize all PM2.5 time series into two groups with and without crossovers. Then, we confirmed that DCCA analysis works as an effective tool for checking the heterogeneity of DCCA coefficient behavior over different scales. There is no complex value in fluctuation covariance, and the behaviors of DCCA coefficients are considerably different, as shown in the inset. (a) The cross-correlation is overwhelmed by the periodic trend, which is also shown in the scaling exponent, while (b) the randomness overwhelms the trend and, furthermore, its effect is accumulated in comparison to an exogenous case.

Discussion and Conclusions
In this work, we examined the descriptive statistics of PM 2.5 concentrations at 18 sites in Korea, and we confirmed a well-known pronounced seasonal trend in most series. Then, we selected five notable sites with lower and higher mass concentrations and then generated two groups, G1 and G2, in order to conduct comparative analyses on both. To determine the heterogeneity in underlying dynamics depending on the concentration levels, we analyzed the auto-and cross-correlation structures of both groups, i.e., G1 and G2, consisting of stations #15 and #18 and stations #5, #11, and #16, respectively. They showed a different behavior in DFA (detrended fluctuation analysis); G1 showed no crossover, while G2 showed strong evidence of crossover. This finding implies that G2 has two governing dynamics depending on the scale, while G2 has one dynamic. In terms of the impact of trans-boundary transport from China, G1 has little influence while G2 has strong influence. Furthermore, to confirm the heterogeneity between groups, we conducted DCCA (detrended cross-correlation analysis) on two series belonging to the same groups and different groups. Our finding strongly supports the heterogeneity in both. In the same group, coupling or synchronization increased as the scale increased, while, in the different group, decoupling between both series developed over increasing scale. This can be a starting point to categorize the specific region into several groups based on DCCA behaviors. Lastly, in order to reproduce the auto-correlation properties produced from real PM 2.5 time series, we presented the ARFIMA (auto-regressive fractionally integrated moving average) process. Although these models succeeded in reproducing crossover behaviors in the auto-correlation structure, they yielded no valid results in decoupling behavior among heterogeneous groups. Also, through the analysis of the ARFIMA process, we found slightly different behavior in the DCCA analysis when incorporating the trend term, whether exogenous or endogenous.
In summary, the PM 2.5 accumulation process operates in a complicated way and shows different dynamical behavior depending on mass concentration. However, using simple DFA analysis, we could categorize all PM 2.5 time series into two groups with and without crossovers. Then, we confirmed that DCCA analysis works as an effective tool for checking the heterogeneity of DCCA coefficient behavior over different scales.