Multivariate Multifractal Detrending Moving Average Analysis of Air Pollutants

: One of the most challenging endeavors of contemporary research is to describe and analyze the dynamic behavior of time series arising from real-world systems. To address the need for analyzing long-range correlations and multifractal properties of multivariate time series, we generalize the multifractal detrended moving average algorithm (MFDMA) to the multivariate case and propose a multivariate MFDMA algorithm (MV-MFDMA). The validity and performance of the proposed algorithm are tested by conducting numerical simulations on synthetic multivariate monofractal and multifractal time series. The MV-MFDMA algorithm is then utilized to analyze raw, seasonally adjusted, and remainder components of ﬁve air pollutant time series. Results from all three cases reveal multifractal properties with persistent long-range correlations.


Introduction
Research of complex systems has become increasingly important in both natural and social sciences. Time series derived from complex real-world systems exhibit nonlinear behavior that cannot be characterized by linear statistical models. Namely, time series from financial, environmental, and many other fields often manifest long-term memory and frequent large fluctuations that cannot be adequately explained with normal-like distributions. Conventional econometric models such as ARMA, GARCH, and EGARCH fail to capture and accommodate these properties and the nature of such series. The scaling law mostly used to describe these types of time series is a power law with a scaling exponent α that (at least) asymptotically describes the behavior of a quantity F as a function of a scale parameter s: F(s) ∼ s α [1]. The systems characterized by a scaling law typically represent fractals or multifractals, depending on whether they are described by one scaling exponent or by a multitude of scaling exponents [1,2].
Various methods have been developed to characterize the properties of fractals and multifractals. Detrended fluctuation analysis (DFA) is applied to time series data in various fields, but Peng et al. originally proposed it for identifying long-range dependence in DNA nucleotide sequences [3]. The DFA algorithm was extended to multifractal detrended fluctuation analysis (MFDFA) for describing the multifractal properties of time series [4]. Methods based on the moving average have also been developed. The first such method was introduced for estimating the Hurst exponent of self-affinity signals [5]. It was further extended to the detrending moving average (DMA) and the multifractal detrending moving average (MFDMA) by considering the second-order difference between the original time series and its moving average function [6]. The behavior of the multivariate time series has been widely studied [1,[7][8][9][10][11][12]. The DFA and MFDFA have been recently extended to the multivariate cases (MVDFA and MV-MFDFA) to give insight into the multichannel data and auto-correlation behavior [1,12].
In this paper, inspired by Xiong and Shang [1] and Zhang et al. [12], we extend the method of MFDMA to a multivariate case (MV-MFDMA) to analyze the long-range correlations and multifractal properties of multivariate air pollutant time series data. A better understanding of the temporal and spatial variability of environmental time series is essential for modeling various phenomena. Algorithms that can handle multivariate time series have become exceedingly important, especially in dealing with environmental issues. Different environmental time series have been previously analyzed in the contexts of multifractality. For instance, Reference [13] used multifractal analysis on an air pollution index time series, while References [14,15] used six air pollutants for numerical experiments in their multifractal analysis. Fine particulate matter pollutants have been of interest for several research endeavors [16][17][18], while [19], for example, used a multifractal analysis on a time series from the European carbon futures markets. Multifractal analysis is often used for temperature analysis like [20] did for air temperature, or [21] for global methane concentrations and remotely-sensed temperature anomalies.
We chose data on air pollution as it directly impairs the environment, endangers entire ecosystems, causes biodiversity loses, and jeopardizes human health. Air pollution occurs when harmful or excessive amounts of gases, particles, and biological molecules are introduced into the atmosphere, whether they are of natural or anthropogenic origin. However, the consensus is that anthropogenic pollution is the main cause of most problems the environment faces today. Pollution is primarily caused by industrial activities, energy use, transport, and agricultural activities. Furthermore, some household activities, such as heating, can cause significant air pollution, although this is mostly relevant for developing and underdeveloped countries. Coordinated action and great willpower are needed to make positive changes because some pollutants, such as fine particulate matter and groundlevel ozone continuously create significant health problems, while various emissions continue to damage the environment. Although air quality improvements are somewhat noticeable in recent years because of coordinated international actions and global, regional, and national policies, it is evident that a multidisciplinary approach is needed to advance this fundamentally important research area and create solutions to improve air quality.
Therefore, our paper has three objectives. We extend the MFDMA method to a multivariate case, suggest the MV-MFDMA algorithm, and then, conduct numerical experiments on multivariate processes to investigate the performance of the newly proposed MV-MFDMA. Finally, we apply the MV-MFDMA algorithm to explore the long-range correlations and multifractal properties of five air pollutant time series.
The remainder of the paper is organized as follows. We introduce the proposed MV-MFDMA algorithm in Section 2, while the validity of the MV-MFDMA algorithm is presented in Section 3. In Section 4, we outline the available data set and pre-processing analysis and provide the results obtained using air pollutant time series. Finally, the conclusions are drawn in Section 5.

Multivariate Multifractal Detrending Moving Average Analysis
In this section, we propose a multivariate multifractal detrended moving average algorithm (MV-MFDMA). Let y = (y t,1 , y t,2 , . . . , y t,p ), t = 1, 2, . . . , N denote the p time series, where N is the number of observations in each time series. The newly constructed MV-MFDMA algorithm consists of the following steps.
Step 1: Calculate cumulative sums of each time series i = 1, 2, . . . , p: Step 2: Calculate the moving average function of each time series i = 1, 2, . . . , p in a moving window of size n:Ỹ where η is the largest integer not larger than η and η is the smallest integer not smaller than η. The parameter θ ∈ [0, 1] specifies the position of the moving window. In general, the moving average function includes (n − 1)(1 − θ) data points in the past and (n − 1)θ data points in the future. Here, we consider three different values of parameter θ = 0, 0.5, 1. If θ = 0, then the moving average function is calculated over all the past (n − 1) data points of the series, and hence, it refers to the backward moving average. For θ = 0.5, the moving average function includes half past and half future data points in each window and refers to the central moving average. In the case of θ = 1, the moving average function is calculated over all the future (n − 1) data points and refers to the forward moving average. For more details, see [6,22].
Step 3: Calculate the series residue by subtracting the moving average functionỸ t,i from Y t,i : where t satisfies the criterion n − (n − 1)θ ≤ t ≤ N − (n − 1)θ .
Step 4: Divide the residue series e t,i into N n = N/n − 1 non-overlapping segments of equal length n. The segments are denoted by e v such that e v t,i = e l+t,i for 1 ≤ t ≤ n with l = (v − 1)n.
Step 5: Calculate the fluctuation variance F 2 (v, n) as a function of n for an arbitrary segment v: where e v = (e t,1 , e t,2 , . . . , e t,p ) and || · || stands for the Euclidean norm.
Step 6: Average over all local variances F 2 (v, n) to obtain the q th order fluctuation function: where q = 0 and F q (v, n) = (F 2 (v, n)) q/2 If q = 0, Step 7: Vary the values of segment size n to determine the power law relation between the function F MV−MFDMA q (n) and the size scale n. If a time series exhibits multifractal properties, then F MV−MFDMA q (n) for large values of n will follow a power law type of scaling relation, such as: where h(q) denotes the generalized Hurst exponent.
The generalized Hurst exponent h(q) can be attained by the slope of the log-log plot of F MV−MFDMA q (n) versus n through the method of ordinary least squares. The scaling exponent h(q) when q > 0 traces the scaling behavior of the segments with larger fluctuations. Similarly, it shows the scaling behavior of segments with small fluctuations for q < 0. Usually, the large fluctuations are identified by a smaller scaling exponent h(q) for multifractal series. For monofractal time series, h(q) is independent of q. If only short-range correlations or no correlations exist in the sequence, then the scaling exponent h(2) equals 0.5. In this case, the time series display a random walk behavior. However, if there is long-range power law correlation, then h = 0.5. Furthermore, if 0.5 < h(2) < 1, the long-range auto-correlations are persistent, which indicates that an increase is likely to be followed by another increase. If 0 < h(2) < 0.5, we have long-range auto-correlations with anti-persistent behavior [3,4]. In this case, an increase is likely to be followed by a decrease.
Such a defined generalized Hurst exponent h(q) is directly related to the multifractal scaling exponent τ(q): τ(q) = q · h(q) − 1. The monofractal time series are characterized by a linear form for the multifractal scaling exponent. The multifractality of the time series can be characterized by the singularity spectrum f (α) of the Holder exponent α.
The singularity spectrum f (α) indicates the dimension of the subset of the series that is characterized by α. The spectrum gives information about the relative dominance of various fractal exponents present in the series, while α characterizes the strength of the singularity. The wider the spectrum, the richer the multifractality behavior.
Note that if i = 1, then the MV-MFDMA reduces to the standard MFDMA. On the other hand, if i ≥ 2, the MV-MFDMA investigates the multifractal features and long-range correlation properties of the multivariate process of dimension i as a whole.

Numerical Experiments on Synthetic Data Sets
The validity of the proposed MV-MFDMA algorithm is investigated in this section by performing numerical simulations on synthetic multivariate time series data. Synthetic data are modeled by the autoregressive fractionally integrated moving-average process (ARFIMA) following the procedure in [12]. Let us recall that the long-range correlations in stochastic variables can be modeled as: where (t) follows a standard normal distribution, d ∈ (−0.5, 0.5) is a memory parameter related to the Hurst exponent h z = 0.5 + d, and Z(d, t) = ∑ ∞ n=1 a n (d)z(t − n), a n (d) = dΓ(n − d)/(Γ(1 − d)Γ(n + 1)) [12]. Then, the two-component ARFIMA process can be defined as: where x (t) and y (t) follow a standard normal distribution N(0, 1), d 1 , d 2 ∈ [0, 0.5] are the scaling parameters, and W ∈ [0.5, 1] is a free parameter that controls the coupling strength between variables x and y [12]. If W = 1, then the variables x and y are fully decoupled and represent two separate ARFIMA processes. On the other hand, if W decreases from one to 0.5, then the correlations between variables x and y increase.
In the experiments, we considered multivariate monofractal time series, both independent and correlated. The length of each time series was N = 2 14 . The results were averaged over 20 realizations of each type of test series.

Independent Multivariate Monofractal Series
We generated a trivariate uncorrelated monofractal time series with parameters corresponding to the scaling exponent h = 0.5 (d = 0) using the ARFIMA model (9) and (10). These series were considered as uncorrelated since the initial data channels were realizations of mutually independent uncorrelated monofractal series [12]. Figure 1 displays the scaling exponents h(q) as a function of q and multifractal spectra f (α) obtained by applying the MV-MFDMA to trivariate uncorrelated monofractal series, for θ = 0, 0.5, 1. As expected, regardless of θ, for trivariate uncorrelated monofractal series, the estimated scaling exponent h(q) equals 0.5. This result implies the monofractal properties of multivariate series and holds regardless of the parameter θ. The multifractal spectra f (α) confirms the validity of the algorithm.  Next, we simulated trivariate anti-correlated and correlated monofractal time series with the following scaling exponents: Figure 2. The results show that h(q) is independent of q and practicality equal to 0.3 and 0.7, respectively. These results hold regardless of the chosen parameter θ.

Correlated Bivariate Monofractal Series
The MV-MFDMA method was also applied and tested on the correlated bivariate time series simulated by the two-component ARFIMA model following the procedure in [12].
The following values of a free parameter controlling the coupling strength between time series were considered: W = 0.5, 0.7, 0.9, while the parameters of the model d 1 = d 2 = 0.2 corresponded to scaling coefficients h 1 = h 2 = 0.7. Figure 3 shows the average MV-MFDMA results of correlated bivariate time series with different correlation levels for θ = 0, 0.5, 1. If we fix the coupling parameter W, the scaling exponents h(q) do not depend on the value of q and are nearly equal to the value of 0.7. This result is in harmony with the scaling exponent of individual univariate series. Moreover, even if we change the coupling parameter W, the scaling exponents h(q) remain the same for different values of q.

Multivariate Multifractal Series
Finally, we applied the MV-MFDMA to multifractal time series simulated by the following binomial multifractal model: where p x ∈ (0, 0.5) and n(k) is the number of the digits equal to one in the binary representation of index k [12]. More precisely, we simulated trivariate time series x, y, z according to (11) and chose p x = 0.2, p y = 0.3 and p z = 0.4, respectively. The proposed algorithm was applied to the generated series, and the obtained results are shown in Figure 4 in terms of scaling exponent h(q) and multifractal spectra f (α) for θ = 0.5. We also present the same results for individual time series.  Based on the presented results, we can observe that the MV-MFDMA algorithm preserves the properties of the univariate MFDMA. Namely, it is known that algorithms based on the multifractal detrending moving average are sensitive to the sample size, the selection of the scaling range, the choice of q-orders, and the position of the moving window [23]. These algorithms require a larger sample size, i.e., a larger number of observations are needed to obtain a more accurate estimation of the multifractal spectrum. Then, the accuracy is also influenced by the selection of the appropriate scaling range. Equal spacing between scales is suggested, where the scaling range varies from 10 to N/10 [23]. The choice of the q-orders should include both positive and negative values. It is evident from the presented results that the algorithm depends on the position of the moving window. According to [24], the lowest accuracy is obtained when θ = 0.5, while θ = 0 and θ = 1 provide higher accuracy.
A comparison of the proposed MV-MFDMA algorithm with other algorithms for detecting multifractal properties is given in Table 1. • requires an equal length of time series MV-MFDFA [12] • applies to multivariate time series

Air Pollutant Time Series via MV-MFDMA
In this section, we present the data and analyze the multifractal properties of multivariate air pollutant time series. We performed the MV-MFDMA algorithm on raw series, seasonally adjusted series, and remainder components and then compared the results.

Data
Data for this paper were acquired from [25]. For the purposes of testing the MV-MFDMA, we used datasets of five air pollutants presented in Table 2. To obtain uniform series in terms of length, values were used only on days when all five variables were measured.  Table 3 provides the descriptive statistics of the air pollutant time series. Positive values of skewness indicate that the distributions of all air pollutants are positively skewed. Kurtosis is greater than three for all air pollutants, which indicates that the homogeneity of the distribution in all cases is leptokurtic in relation to the normal distribution. The Jarque-Bera test rejects the null hypothesis of normality in air pollutants' series distribution, which is further corroborated by the fact that kurtosis is larger than three, and there is no zero skewness in any of the five cases. The statistics of the augmented Dickey-Fuller (ADF) unit root test based on both the Akaike (AIC) and Schwarz information criterion (SIC) reject the null hypothesis of a unit root at the 1% significance level. Additionally, the Philips-Perron (PP) test also rejects the null hypothesis of a unit root at the same significance level. Both the ADF and PP tests were done for both the trend and intercept and just the intercept cases. The t-statistics for all six cases of unit root tests are also presented in Table 3 and indicate that the air pollutants series are stationary.  The state of California was chosen because it is usually singled out as a leader in air pollution among other U.S. states. Even though all air pollutants are high in California, the O 3 levels particularly stand out. Since seasonal trends can sometimes influence the multifractal behavior of time series, it is preferable to perform seasonal decomposition; see for example [15,26,27]. It is interesting to note that a common approach for analyzing trendless fluctuations is absent. Nigmatullin and Vorobev [28] stated that in most cases, authors use traditional methods, such as the Fourier method, the wavelet method, Yulmenteyev's method, and Timashev's method or some additional processing algorithms that are also based on conventional methods containing some model assumptions and treatment methods associated with continuous mathematics. Nigmatullin, Lino, and Maione [29] pointed out that these sets of methods solve some specific tasks, but cannot be viewed as universal. To address this gap, a universal "platform" for treating various types of different trendless sequences has been proposed (see [28,29] for more details). For the purposes of this paper, we adjusted the considered time series by performing seasonal and trend decomposition using LOESS (STL) decomposition [30]. Through this algorithm, each time series is additively decomposed into deterministic trends, seasonal components, and stochastic remainder components. Removing seasonal components from the raw time series does not significantly influence the descriptive statistics results. All the descriptive statistics and unit root tests conclusions from the raw data are valid for the seasonally adjusted data as well (see Appendix A). Figure 5 shows that the seasonal components of all time series are characterized by the annual oscillation. Figure 5 also demonstrates, among others, the evolution trend of five air pollutant concentration series. The PM 2.5 and SO 2 series are more prone to extreme spikes, whereas NO 2 , CO, and O 3 concentrations have somewhat different trends in winter and summer, indicating a seasonal inclination. On the other hand, the trend components are characterized by a very small range of variability and do not show a large temporal evolution. The range of variability is the largest for series NO 2 and PM 2.5 . The remainder components of all time series do not display discernible patterns, and there are small fluctuations around zero.

Results
The MV-MFDMA presented in Section 2 was applied to the above-mentioned raw and seasonally adjusted air pollutant time series obtained upon removal of the seasonal components, as well as on the remainder components of the series. Applying multifractal analysis on the remainder components ensures the identification of the dynamic characteristics of air pollutants' inner fluctuations and enhances the robustness of the results. We set the range of time scale n to be 10 < n < N/10, where N is the length of each time series. Figure 6 illustrates the behavior of the MV-MFDMA fluctuation function versus time in the logarithmic scale for all three case: raw, seasonally adjusted, and remainder. The plots are presented for q = −4, 0, 4 and fixed θ = 0.5. All curves are approximately linear under large scales, indicating the power law behavior and the presence of multifractality. The MV-MFDMA fluctuation functions have quite similar behavior in all three cases, while we can note that fluctuations are the smallest for the remainder components. Fluctuations are observed regardless of the q values for small log(s) values. That is, the results show crossover time scales, i.e., different scaling laws and scaling exponents for time scales n < n * (where n * ≈ 3.32 for raw and n * ≈ 12 for the other two cases). When n > n * , we find that the MV-MFDMA fluctuation functions nicely respect the scaling relation, i.e., scale well with the scale size. Additionally, although not displayed, the MV-MFDMA is tested for θ = 0, 1. The results confirm the power law behavior and demonstrate the absence of large fluctuations.  Figure 7 shows the values of the generalized Hurst exponents h(q) versus q, where the variation of q belongs to the interval [−4, 4] for raw, seasonally adjusted, and remainder series. In all cases for θ = 0, 0.5, 1, we observe the dependence of MV-MFDMA h(q) on q. The MV-MFDMA h(q) monotonically decreases when the value of q increases. This result implies the multifractality of the examined time series. It is interesting to note that raw and seasonally adjusted series display almost the same behavior of the generalized Hurst exponents h(q) for all values of parameter θ. On the contrary, the values of the generalized Hurst exponents h(q) for the remainder series are above 0.5 for θ = 0.5, which indicates a strong long-term persistence. However, in the cases of the forward and backward moving average, the values of h(q) are lower.   With the aim of quantifying the degree of the multifractality of the multivariate system, we use the following measures (12) and (13) [12]: The larger the measures of ∆h and ∆α, the richer and stronger the degree of multifractality of the time series is. Tables 4-6 present the scaling exponent h(2) and the multifractality degrees ∆h and ∆α for the individual and multivariate air pollutants for all levels θ = 0, 0.5, 1 and for raw, seasonally adjusted, and remainder series, respectively.  Raw and seasonally adjusted series have similar characteristics of multifractal parameters. For θ = 0 and θ = 1, the values of h(2) are approximately around one, while in the case of θ = 0.5, the values are a bit lower, namely h(2) ≈ 0.85 and h(2) ≈ 0.9 for raw and seasonally adjusted series, respectively. This holds for both individual and multivariate time series. The results imply that both individual and multivariate series show persistent long-range correlation. Analogous to 1/ f , noise can be deducted in the case of the scaling exponent h(2) of multivariate time series less than one. In the case of the remainder components, the fluctuations are also characterized by a long-term persistence. However, the values of h(2) suggest that the remainder components are characterized by a weak persistence suggesting stable trends in the next period. In summary, multivariate series of five air pollutants show consistent, positive autocorrelation behavior. In all three cases, an interesting fact is that all values of ∆h and ∆α are significantly larger than zero. On the other hand, the values for multivariate air pollutants range around those values for individual time series and are lower than the maximum values of the individuals in each case. Both the ∆h and ∆α values in the remainder components are higher than the values of both raw and seasonally adjusted series. This finding reveals that the considered air pollutants can be characterized by multifractality behavior. Furthermore, since the relationship always exists among the individuals, it possibly makes the multifractal level weaker. As a result, the ∆h and ∆α values for multivariate time series are smaller than the maximum and average of the ∆h and ∆α values of individuals.

Conclusions
The multivariate approach and analysis may have a key role in cases when a high degree of uncertainty and coupling underlying dynamical mechanics is present. Hence, in this paper, we introduced the multivariate multifractal detrended moving average analysis (MV-MFDMA) as a multivariate generalization of the MFDMA method. The proposed method provides a way to analyze and investigate fractal dynamics information in multi-variate time series data sets generated by complex systems. The validity of the proposed method was investigated by conducting numerical simulations on synthetic monofractal and multifractal time series. The correlation properties of multivariate monofractal time series are in line with the univariate case, and the fractal behavior of correlated bivariate time series is independent of its correlation level. In the case of the multivariate multifractal series, the multivariate system exhibits multifractal properties. The corresponding multifractal degree decreases with the decreasing of the multifractal degree of individual series. According to our simulation results, the MV-MFDMA represents a reliable technique for measuring the long-term correlations of non-stationary multivariate time series.
The MV-MFDMA was also utilized to investigate the multifractal properties of air pollutant time series where the individual air pollutants were considered as different variables from the same system. We analyzed raw and seasonally adjusted time series of air pollutants obtained upon removing the annual oscillations. We found that air pollutants exhibited multifractal auto-correlation behavior, even after removing the seasonal pattern. In both cases, the air pollutant time series data exhibited multifractal properties, with persistent long-range correlations. In order to capture the dynamic characteristics of the inner fluctuations of the air pollutants, we also applied the algorithm to the stochastic remainder components of the time series. The results confirmed that multivariate time series of air pollutants possessed multifractal properties and exhibited long-term persistence such that an increase (decrease) in the previous period will be followed by an increase (decrease) in the next period.
The findings of this paper provide an elevated understanding of the evolutionary activity and temporal links of five air pollutants in the state of California. In general, it is necessary to achieve a better comprehension of the associations between different air pollution time series to manage the environmental air quality based on evidence. Researchers from any scientific field that operate with multivariate frequent time series are encouraged to apply the proposed algorithm since it provides an improvement in detecting long-range correlations and multifractal properties of multichannel data. However, the proposed algorithm has certain limitations. It applies only to frequent and large-dimensional multivariate time series with an equal length N. Moreover, when applying the proposed algorithm, it is suggested to use an equal spacing between scales, where the scaling range varies from 10 to N/10, while the q-orders should include both positive and negative values. Finally, in some cases, seasonality can influence the selected time series, so if that is the case, the series should be firstly depersonalized before applying the algorithm. Future research papers can use the proposed MV-MFDMA algorithm, not only the environmental quality ones, but on the widest range of different real-world complex system time series, for example on financial, economic, meteorological, and health care time series. Therefore, directions for future research should go towards studying new time series when a better comprehension of their interconnections is necessary.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.