Fractal and Long-Memory Traces in PM10 Time Series in Athens, Greece

This work examines if chaos and long memory exist in PM10 concentrations recorded in Athens, Greece. The algorithms of Katz, Higuchi, and Sevcik were employed for the calculation of fractal dimensions and Rescaled Range (R/S) analysis for the calculation of the Hurst exponent. Windows of approximately two months’ duration were employed, sliding one sample forward until the end of each utilized signal. Analysis was applied to three long PM10 time series recorded by three different stations located around Athens. Analysis identified numerous dynamical complex fractal time-series segments with patterns of long memory. All these windows exhibited Hurst exponents above 0.8 and fractal dimensions below 1.5 for the Katz and Higuchi algorithms, and 1.2 for the Sevcik algorithm. The paper discusses the importance of threshold values for the postanalysis of the discrimination of fractal and long-memory windows. After setting thresholds, computational calculations were performed on all possible combinations of two or more techniques for the data of all or two stations under study. When all techniques were combined, several common dates were found for the data of the two combinations of two stations. When the three techniques were combined, more common dates were found if the Katz algorithm was not included in the meta-analysis. Excluding Katz’s algorithm, 12 common dates were found for the data from all stations. This is the first time that the results from sliding-window chaos and long-memory techniques in PM10 time series were combined in this manner.


Introduction
Urban air pollution is a problem related to a sustainable environment, and human society and economy [1,2]. Caused by human intervention, urban air pollution is now a sustained ever-changing part of the urban environment, governed by mechanisms such as emission, dilution, and transportation [3]. Significant pollutants are sulphur dioxides and nitrogen oxides, the ozone, and respirable particulate matter (PM) [4]. The scientific research of the topic focuses on understanding the variations of urban air pollutants with respect to time, investigating the contribution of environmental and meteorological parameters to those variations through modeling, forecasting, and predicting future trends and alterations. Noteworthy approaches utilize neural networks [5][6][7][8][9][10][11][12][13][14][15][16], wavelets [17,18], statistical [19,20], and deterministic [21] models, and, currently, cutting-edge techniques based on chaos and complexity (e.g., References [4,[22][23][24][25][26][27][28][29][30][31][32][33][34][35][36]). Since no forecast method is solid, reliable, and accurate enough to sufficiently match all air-contamination time series [37], it is a challenging task to achieve credible estimations. Their main limitations are related to a lack in measurements, gaps in knowledge on the interaction of air pollutants with the atmosphere, and the fact that it is not fully understood if urban air-pollution variations are governed by randomness or dynamical complexity. The exploration of the latter topic has recently gained particular attention.
The science of complex dynamical systems involves diverging techniques for the investigation of interstate transitions of such systems. Techniques based on fractals [38][39][40][41][42][43][44][45][46] and long-memory [41,[43][44][45]47] receive particular interest because they succeed in delineating the hidden traces of the time series of numerous physical systems of different types, importantly: stationary, nonstationary, and irregular [48,49]. Regarding air-pollution environmental systems, Shi et al. [24] reported two power laws in three air-pollution time series in Shanghai, China, indicating different self-organised critical (SOC) states. Liu et al. [4] employed detrended fluctuation analysis (DFA) and multifractals to investigate the temporal variations of SO 2 , NO 2 , and PM 10 concentrations in the city of Shanghai, China. They reported that the corresponding air pollution was a self-organised critical (SOC) process. They observed different SOC behaviour within the SO 2 , NO 2 , and PM 10 concentration time series that were attributed to differences in corresponding power-law relations. Varotsos et al. [31] observed power-law correlations and the persistent behaviour of deseasonalised ozone-concentration time series with lags between one week and five years. Lee et al. [50] reported scaling invariance and decreasing box-dimension function in the hourly average of O 3 time series concentrations during a whole year by analysing their data series with monofractals. Windsor and Toumi [35] reported, through R/S analysis, a high persistency of an hourly ozone-concentration time series extending up to 400 days. Similarly, Weng et al. [34] confirmed, through R/S analysis, the nonlinearity, fractality, persistency, and long-lasting characteristics of deseasonalised ground-level peak ozone-concentration time series in southern Taiwan. Xue at al. [51]concluded that R/S analysis, DFA, power-law spectral analysis, linear unbiased estimates, correlation analyses, and power-spectrum analysis are sophisticated approaches to analyse and describe the distribution, periodicity, and trends of air-pollutant concentrations. Similarly, Dong et al. [52] employed DFA and multifractal DFA in PM 2.5 and PM 10 data series, and reported the statistical self-affinity, long-memory, and multifractality of their pollutant time series. Finally, Yuval and broday [33] employed continuous wavelet-transform fractal analysis and found that the predictability of air pollutants is consistent with that of meteorological variables in the short scale.

Area of Study
Athens is an urban area with approximately four million habitants and air-contamination issues similar to other large cities worldwide [53]. The city of Athens is situated in a zone of complex geography inside the Athens basin (approximately 450 km 2 ). Mountains frame the city on the west, north and east, with heights ranging from 400 up to 1500 m ( Figure 1). Openings between these mountains are found northeast and west of the basin, while the ocean broadens southward (Saronikos Gulf). The major axis of the Athens basin is from the southwest to the northeast. Usually, winds in the Athens basin blow from the north and northeast in the late summer, fall, and winter, but from the south-southwest in spring and late spring. The northeastern and southwestern directions coincide with the major geographical axis of the basin. During the prevalence of local circulation systems, the basin's ventilation is poor [54,55]. Generally, the dispersion of air pollutants is not favoured by the topography. Air-pollutant emissions in Athens are mainly due to vehicles, industry, and domestic heating.

Measurement Methodology
PM 10 data were retrieved by the Department of Environmental Protection of the Hellenic Ministry of Environment and Energy. This department constantly monitors pollutant concentrations throughout each day. The automatic analysers employed in the measurements provide data every minute. From these data, daily average-pollution time series are calculated via microprocessors, located inside each automatic analyser. All values are transferred to the central station where they are further processed and stored. The present paper uses the daily PM 10 time series collected between 2001 and 2016 from the Agia Paraskevi (AGP), Maroussi (MAR) and Lykovrissi (LYK) stations ( Figure 1). PM 10 detection is conducted via beta radiation absorption.

Fractality and Long-Memory
Nature exhibits fractal behaviour in several physical systems that reflect this when dilated, translated, or rotated in space. Depending on their mathematical properties, these systems are characterised as self-affine or self-similar. Such systems are fractals because the self-affinity and the self-similarity characterise all parts of the system in a manner that every part is a large or small scale imitation or representation of the whole system. Due to this property, fractal systems can be investigated by analysing their parts. It is notable that scaling and fractal behaviour are associated with a system's long-memory [56][57][58] and complexity [57], where the complexity of a system indicates the existence of linear mechanisms and orderliness [59,60]. The complexity, fractality, and long memory of a system are closely related. For this reason, long-memory analysis can reveal the fractality and the complexity of a system [43][44][45]61]. The long-memory analysis of a system can show if there are strong associations between the past, present, and future states of the system.
Among the various methods for estimating the fractality of a system, the ones that directly calculate the fractal dimension are advantageous. These methods reveal the long memory of the system as well. A direct method for estimating the long memory of a system is R/S analysis [41,[45][46][47][62][63][64].
In the following, these methods are described in detail. As associated with the calculations, the Hurst exponent [62] is analysed first since all the methods in the following are combined in terms of Hurst exponents.

Katz's Method
Katz's method is a first approach for the calculation of fractal dimension D. The method [77,78], starts with the transpose array [s 1 , s 2 , . . . , s N ] of a time series, s i , i = 1, 2, . . . , N, where s i = (t i , y i ) and y i are the recorded values at sampled time instances t i .
Two sequential time-series points s i and s i+1 , represent value pairs (t i , y i ) and (t i+1 , y i+1 ) with Euclidean distance given by: The total length of the curve of the distances of Equation (1) is: Under the assumption that the above curve does not cross itself, its planar extend, d, is given by: Katz's fractal dimension, D, is calculated by Equations (2) and (3) by where n = L/a and a is the average distance of the points.

Higuchi's Method
Higuchi's method is a second approach for the calculation of the fractal dimension D of a time series. As reported in References [79,80] Higuchi's fractal dimension D can be calculated for a finite time series recorded at regular intervals i = 1, 2 . . . N.
From Series (5) [79][80][81], a new sequence, y k m , is constructed as follows: Higuchi [81], defines the length of the curve associated to each time series, X k m , as follows: where m and k are integers with m = 1, 2 . . . , k that determine the time lag between samples, the symbol [. . .] represents the bigger integer part of a value (Gauss notation), and the term is a normalisation factor. If a curve is a fractal with fractal dimension D, then the average value L(k) of the lengths of Equation (7), is a power law, namely: Hence, if L(k) is plotted versus k, k = 1, 2, . . . , k max , on a log-log graph, Higuchi's fractal dimension D can be calculated as a slope of the plot's linear regression. The interval time is taken as k = 1, .., k max for k max ≤ 4 (e.g., k = 1, 2, 3, 4, for k max = 4) and as k = 2 (j−1)/4 , j = 11, 12, 13 . . . , for k > 4 (hence, k max > 4) and [. . .] is the Gauss notation [78].

Sevcik Method
Sevcik [82] showed that fractal dimension D can be approximated for the sampled time series of dimension N. According to the Sevcik method, D is estimated from the Hausdorff dimension D h of a curve as [78]: where N( ) is the number of segments of length 2 that make up the curve. For a curve of length L, N( ) = L/2 [78]; hence, D h becomes: By mapping N points of curve L into a unit square of N × N cells of a normalised metric space, via double linear transformation, Equation (11) becomes [78,82]: The Sevcik fractal dimension is estimated by Equation (12), and the estimation is improved as N → ∞.

Computational Methodology of Fractal Dimension
To calculate fractal dimensions via the methods of Higuchi, Katz, and Sevcik, the following steps were followed: 1. The signal was divided into windows of 64 samples (approximately two months' duration). 2. In each segment, the fractal dimension was calculated as follows: • Katz's method: As D from Equation (4) for n = 64 and a = 1 sample per day, namely, the distance between the points of the series that feed parameter L. • Higuchi's method: As slope D of the best-fit line of the log-log plot of Equation (9), namely, log( L(k) )versus log(k), for kmax = 4. • Sevcik's method: As D = D h from Equation (12) for N = 64, namely, equal to the length of the series in each window.
3. The window slid one sample, and Steps (i-ii) were repeated until the end of the signal.

Rescaled-Range Analysis
R/S analysis was conceptualised by Hurst [62,63], and is a method that can reveal patterns that may be reproduced in the future [41,[45][46][47][62][63][64]. The method is based on the transformation of a natural time series, X(N) = x(1), x(2), . . . , x(N), to a new time series y(n, N), in a time interval n, where n = 1, 2, . . . , N according to Equation (13): In Equation (13), , is the average value calculated over a time interval N. Variable y(n, N) is known as the accumulated departure of the natural time series.
The R/S (rescaled) range is then calculated from Equation (14) where R(n) is the range of y(n, N), namely, the difference between its minimum and maximum values R(n) = max and S(n) is the standard deviation calculated as: If there are patterns of long memory hidden in time-series X(N), R/S exhibits a power-law association with bin size n according to Equation (17): In Equation (17), H is the Hurst exponent, and C is a constant of proportionality.
The logarithmic transformation of Equation (17) yields a linear relation: The Hurst exponent is estimated by Equation (18) as the slope of the best line fit.
Computational Methodology of R/S Analysis R/S analysis was utilised according to the following steps: 1.
The signal was divided in windows of 64 samples (approximately two months' duration).

2.
In each segment, the least-square fit was applied to the log R(n) S(n) − log (n) linear representation of Equation (6). Successful representations were considered those exhibiting squares of Spearman's correlation coefficient above 0.95.

3.
The window slid one sample, and steps (i-ii) were repeated until the end of the signal.
From the above, the time evolution of the Hurst exponent was derived for every signal under investigation.

Results and Discussion
The results from the fractal-analysis methods are presented in Figures 2-4. The results from R/S analysis are given in Figures 5-7. The reader may recall that the PM 10 time series (subfigures (a) of Figures 2-7), are lengthy since they refer to all years between 2001 and 2016. Interestingly, during certain periods, relatively high PM 10 concentrations were visually observed for all three monitoring stations. Such example areas are around samples 4500 and 5000 in the AGP series, 4500 and 5000 in the LYK series, and 4000 and 4250 in the MAR time series. In spite of the fact that the above sample values are different, the corresponding dates are similar, as computed from the date-sample data of all these series. The above temporal alignment may partially be attributed to the vicinity of the AGP, LYK, and MAR stations. Although the visual observation of data variations may reveal some patterns, it cannot be compared with the outcomes of cutting-edge methods such as neural networks [5][6][7][8][9][10][11][12][13][14][15][16]83] wavelets [4,17,18], and high-level statistics [20,24,28,37]; most importantly, fractal and long-memory methods are more powerful in signal exploration. The latter methods, as already emphasised, are advantageous because they can outline trends that are hidden in the time series. The following issues are significant and should be emphasised. The completeness of the PM 10 time series of Figures 2-7 are different; 88.2% for the Agia Paraskevi station, 89.5% for the Lykovrissi station, and 78.9% for the Maroussi station. This can be observed from the temporal differentiation of the three time series. The missing values of the time series impose bias to the analysis (e.g., References [52,84] and the related discussion and references), and, for this reason, the above investigators filled the missing data via cubic splines. However, this approach may alter the raw time series and, therefore, if followed, it could bias, or even suppress, any existing fractal and long-memory patterns in the time series. To avoid this type of bias, the PM 10 time series (subfigures (a) of Figures 2-7) were treated jointly, namely, the missing PM 10 values were ignored. This treatment avoids the bias of unavailable data in fractal and long-memory analysis, and is significant according to the following logic: If an N=64-sample window contains missing time-series values, values prior to the missing ones that fill part of the window up to a sample number, n, n ≤64, are mixed with N − n values recorded after the missing values. In the aspect that n values constitute the present of the time series, whereas N − n values constitute its future, the present is mixed with the future. Considering that sliding one sample forward is finally performed, the past of the series is rematched with its present, which is mixed and also matched with its future. If the time series have hidden fractal and long-memory patterns in critical states, their present is defined not only by their long-term history, but also, most importantly, by their long-lasting future [40][41][42][43][44][45][46][47]61,64,[70][71][72]74,85,86]. Under a different aspect, the system is in a non-Markovian state (see above publications and references therein), where its past, present, and future are inevitably linked by interchanged SOC states leading to critical end-points. In view of the above, the joint treatment of missing values is advantageous concerning the delineation of potentially existing fractal and long-memory patterns, since it combines the past, present, and future of the time series in each sliding window. Hereafter, missing values are treated as mentioned above.       [43,45,46,86]. As discussed extensively in Reference [45] and the above publications, the in situ relationship of H and D is linear, but a different relation might be valid. This view seems to be confirmed by the results of Figures 2-7, and could be another reason for the addressed discrepancies; however, the latter is outside the scope of this paper.
What is most important from the results of Figures 2-7 is that the time variations of D and H are completely different from concurrent time-series variations. This is observed in Figures 2-7. This is a significant finding that has been acknowledged by numerous investigators [40][41][42][43][44][45][46][47]61,64,[70][71][72]74,[85][86][87][88][89][90]. As extensively discussed in the above publications, the employed methods reveal hidden patterns in the time series, and for this reason their time evolution is different from that of the time series. Several windows are spotted with significantly low fractal dimensions and significantly high Hurst exponents. These windows may have been, most probably, linked to PM 10 time epochs of critical fractal and long-memory behaviour. Such critical time epochs for the PM 10 time series could be the following: (a) continuously changing, power-law fractal, long-memory, or SOC state interactions of the PM 10 air-pollution system with various sources of pollutants from industries, vehicles, and energy -generating power stations, combined with complicated physical and chemical processes [24,52]; (b) different small-and large-scale fluctuations of the PM 10 time series [52,[91][92][93][94], e.g., different types of persistency or antipersistency at the small and large time scales [4]; (c) scalefree fractal PM 10 systems with the power-law scaling and long-term memory recognised as a footprint of SOC behaviour [4]; (d) nonlinear low-dimensional chaos dynamics governing the PM 10 [95]; (e) external forces, such as meteorological processes and photochemical reactions, which amplify or reduce the contribution of linear and nonlinear mechanisms of ambient pollutant interaction, which further complicate the temporal structure of the dynamics of ambient pollutant concentrations and lead to complex behaviour of air-pollution systems [96]. According to the above argumentation, since the PM 10 systems of Figures 2-7 had passed from some or numerous critical fractal and long-memory epochs as the ones described above, the most beneficial method to treat the missing values would be the one adopted in this paper, namely, the one that combines the past with the present and the future. These epochs are also linked to SOC states of the PM 10 generating system.
The following issues are noteworthy: First, all the windows of Figures 2-7 are of 64-sample length, which is the power of two nearest to two months' length. This is a fair compromise between sufficiently large windows, but not excessively long to include great seasonal changes of atmospheric conditions [97][98][99][100][101]. As found in previous publications [61,64], window lengths above 32 are successful in calculating the Hurst exponents [64] of DFA exponents of electric signals produced by irradiated Quantum Dots [61]. Especially in the latter publication, long-memory analysis was checked versus trivial methodology [91][92][93][94]. Utilisation of very long windows, e.g., above 1024 samples per window, was avoided due to intense seasonal variations' nonpractical results regarding the time evolution of the hidden trends.
Greatest emphasis, however, should be placed on the following issues. As expressed in recent publications (References [43][44][45] and references therein), in complement to the claims of other publications (e.g., References [38,39,[102][103][104] and references therein), it is most significant when analysing the long memory and fractality of a system, not just to identify under-or overthreshold segments, but preferably, to locate under-or overthreshold areas that are commonly found through different chaos-analysis and long-memory methods. In this sense, in order to evaluate the segments of fractal and long-memory behaviour of Figures 2-7, it is crucial to set the threshold values for both the fractal dimensions and the Hurst exponents. To achieve this, the output data of the fractal analysis were computationally searched according to user-defined thresholds for common areas of underthreshold values for the fractal dimensions, and overthreshold values for the Hurst exponents from R/S analysis. At first, the threshold for the Hurst exponents derived from R/S analysis was set equal to 0.8 under the constraints that the accepted exponents were above this threshold and simultaneously exhibiting square of Spearman's correlation coefficient r 2 ≥ 0.95. The threshold for the fractal dimensions calculated from the Sevcik method was set equal to 1.2, since this value corresponds to H = 0.8 from the aforementioned relation H = 2 − D. On the other hand, the outcomes of Katz's and Higuchi's methods showed the discrepancies discussed above, in comparison to the dimensions of the Sevcik method. For this reason, the thresholds of the fractal dimensions from Katz's and Higuchi's methods were enlarged so as to correspond to Hurst exponents below 0.8, since no fractal dimension was below 1.2 from calculations according to Katz's and Higuchi's algorithms. Accounting for this, the fractal-dimension thresholds of Katz's and Higuchi's algorithms were taken equal to 1.5, a value significantly lower than the majority of the calculated D values. In order to locate the dates of the common overthreshold areas for H values from the R/S analysis, and the underthreshold values for the D values, proper computational calculations were iterated in the date vectors of the analysed segments under the constraint that the dates of the first sample of each window were arbitrarily attributed to the whole window. This was considered as an acceptable compromise since each window is moved one sample forward and, hence, all dates but the last are covered by analysis windows. The computational iterations to locate common dates with all techniques were repeated for all possible combinations of stations, and were also repeated for all possible combinations of two or more techniques for all stations. The reader should note that this holistic approach to locate common areas in the outputs of different chaos and log-memory analysis techniques that were employed in this paper is not trivial and, when found, the scientific evidence is more in-depth and should be emphasised. Figures 8 and 9 present the meta-analysis of the chaos and long-memory techniques in accordance to the arguments expressed above. Both Figures show the concurrent meta-analysis results for the data of all three investigated stations. Figure 8 includes all methods of Section 3, while Figure 9 shows all techniques except Katz's algorithm. As can be observed from Figure 8, there are no common areas simultaneously identified by the three stations when the result data are further processed for coincidences by all techniques of Section 3. This observation can mainly be attributed to the discrepancies in the chaos-analysis results by the three fractal dimension algorithms. Despite this, it is evident from Figure 8 that the results coincide for the data of two stations. Twelve dates were identified from combination analysis of the results of the Agia Paraskevi and Lykovrissi stations, and 36 for the the combination of data from the Agia Paraskevi and Maroussi stations. Lykovrissi results versus Maroussi results did not show common areas. On the other hand, results were significantly enhanced when the outputs from Katz's algorithm were not included in the meta-analysis. As can be observed from Figure 9, 16 common areas are common for the data from all stations. The actual dates of Figure 9 are presented in Table 1, as this is considered the most significant output. The combinations of fractal methods only, as well as of two fractal methods and R/S analysis, did not present more than two coincident dates. These findings are due to the fact that Katz's algorithm produces fewer D values in the range of 0 ≤ D ≤ 2, a fact that might be the source of the observed deviations. On the other hand, despite that Sevcik's N is far from ∞, the results of Sevcik's method are closer to the results of R/S analysis, because the H = 2 − D relation seems to be functional for the combination of only these techniques. However, as expressed in the discussion above, the scientific validity of the outcomes of the fractal and long-memory analysis techniques are more in-depth in the identified common areas by the combination of several methods. In this sense, the results of Figures 8 and 9 and Table 1, are, to the view of the authors, the most significant. As also expressed in the discussion, other investigators published the view of signifying results from only one method. In this view, it is interesting to note that, with the utilised threshold criteria, R/S analysis showed 1902 long-memory segments for the Agia Paraskevi station, 1425 for the Lykovrissi station, and 1559 for the Maroussi station. As also expressed in recent publications (References [43][44][45] and references therein), identification on many H-over-threshold segments is trivial for R/S analysis. Regarding the fractal-dimension analysis, e.g., from the data of the Lykovrissi station, 411 fractal segments were recognised by Katz's method, 889 by Higuchi's method, and 2881 by Sevcik's method. An analogous number of segments were identified by the data from the other two stations as well. This latter finding reinforces the aspect expressed in this paper and followed in the postanalysis of the results of the fractal and long-memory methods; every technique showed several areas of fractal and long-memory behaviour. The evidence is stronger only for those of Figures 8 and 9 and Table 1. The expressed aspects, the employed methodology, together with the presented results and discussion, indicate that the combination of several methods is a very promising tool for analysing PM air-pollution data.  Table 1. Coincidence dates of common segments of Figure 9.

Conclusions
The following summarises the most significant results: 1. The algorithms of Katz, Higuchi, and Sevcik were employed together with R/S analysis via sliding windows of two months' duration to investigate the existence of chaos and long memory in three 16-year-long PM 10 concentration time series recorded in Athens, Greece. 2. Several segments were found with dynamical complex fractal behaviour and long memory.
Via specific thresholds, computational calculations were performed on all possible combinations of two or more techniques for the data of all stations under study. The best combination of methods for the data of all stations was the one not including Katz's algorithm in the meta-analysis. 3. Twelve dates of coincidence were identified from this combination of techniques.