Establishment of Regional Concentration–Duration–Frequency Relationships of Air Pollution: A Case Study for PM2.5

Poor air quality often triggers PM2.5 warnings and harms human health. The frequency and duration of extreme air quality events have received considerable attention. The extreme concentration of an air pollutant is related to its duration and annual frequency of occurrence through concentration–duration–frequency (CDF) relationships. CDF formulas are empirical equations that express the maximum concentration as a dependent variable of duration and annual frequency of occurrence. This study derives the extreme CDF relationship of PM2.5 on the assumption that the extreme concentration is a function of duration and frequency, and it estimates the spatial pattern of extreme PM2.5. The regional CDF gives the regional extreme concentration for a specified duration and return period. For durations over 8 h, the spatial pattern of extreme air pollution shows hotspots in the central and southwestern areas of Taiwan, which are at high risk of exposure to air pollution. Regional CDF analysis is recommended for the efficient design of air quality management and control.


Introduction
Air pollution has become a worldwide environmental health risk [1,2], especially in developing countries. For management, an essential task is to understand the duration, frequency, and intensity of air pollution [3,4]. Poor air quality affects human health [5–7]. Elderly people, children, and people with heart or lung conditions are more sensitive than others to the effects of inhaling fine particles (PM2.5). Acute exposure to PM2.5 is associated with various health outcomes, such as respiratory symptoms [8,9], hospital admissions, or death [10]. Maximum PM2.5 concentrations are significantly and positively associated with outpatient visits [11]. Chronic exposure to PM2.5 can lead to chronic diseases or reduced life expectancy [12]. The risk that extreme pollutant concentrations pose to human health is a subject of environmental concern [10].
The frequency and duration of poor air quality have received considerable attention over the past decades. However, few studies discuss concentration-duration-frequency (CDF) relationships in air pollution. The CDF is adapted from the intensity-duration-frequency (IDF) relationship, which is a mathematical relationship between rainfall intensity, duration, and annual frequency of occurrence [13]. The most common IDF techniques in hydrological engineering are based on the Gumbel distribution [14]. The CDF, by contrast, relates the extreme concentration of air pollution to its duration and annual frequency of occurrence. The CDF formulas are empirical equations that express the maximum concentration as a dependent variable of the other parameters of interest, i.e., duration and annual frequency of occurrence.
Figure 1 shows the locations of the air quality monitoring stations in Taiwan. Air quality zones, e.g., northern, central, southwestern, and eastern, are also established in Taiwan (Figure 1). The Taiwanese Environmental Protection Agency (TWEPA) has been regularly recording air quality and meteorological data throughout Taiwan since July 1982 [20]. Since August 2005, the TWEPA has completed the installation of 76 monitoring instruments for fine airborne aerosols (PM2.5; Figure 1). In this study, hourly PM2.5 concentrations in Taiwan from 2005 to 2015 were obtained from the TWEPA's Air Monitoring Network. Only hourly PM2.5 data from TWEPA stations that comply with the regulatory air monitoring procedures were used.
Int. J. Environ. Res. Public Health 2020, 17, x FOR PEER REVIEW 3 of 13

Methods
First, moving averages of the PM2.5 data were prepared for the various durations. The PM2.5 CDF relationships for each station and return period were then derived by fitting the Gumbel distribution to the corresponding annual maximum concentrations at each station (Section 3.1). An empirical model for the PM2.5 CDF analysis at each station was developed (Section 3.2). Finally, the regional PM2.5 CDF analysis was performed (Section 3.3).
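The first step, moving averages followed by annual maxima, can be sketched as below. This is a minimal illustration rather than the procedure used in the study: the function names, the toy hourly values, and the simplifications (no missing hours, averaging windows that do not cross year boundaries) are all assumptions made for the example.

```python
def moving_average(values, window):
    """Trailing moving average over `window` consecutive hourly values."""
    if window < 1 or window > len(values):
        return []
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window by one hour: add the new value, drop the oldest.
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


def annual_maxima(hourly_by_year, window):
    """Largest `window`-hour average concentration in each year."""
    return {
        year: max(moving_average(values, window))
        for year, values in hourly_by_year.items()
        if len(values) >= window
    }


# Toy hourly PM2.5 series (hypothetical values, far shorter than a real year).
hourly_by_year = {
    2005: [10, 20, 30, 40, 50, 60, 20, 10],
    2006: [5, 15, 80, 100, 90, 30, 10, 5],
}

for duration in (1, 2, 4):
    print(duration, annual_maxima(hourly_by_year, duration))
```

With real data, the same two functions would be applied to each station for the 1, 8, 16, 24, 48, and 96 h durations, which yields one series of annual maxima per station and duration for the distribution fit.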

Derivations of PM2.5 CDF
The probability of occurrence of an extreme pollution event ($C \ge c$), i.e., the annual frequency of exceedance ($F$), is the inverse of the return period:

$$P(C \ge c) = F = \frac{1}{T} \tag{1}$$

The probability of non-exceedance is as follows:

$$P(C \le c) = 1 - \frac{1}{T} \tag{2}$$

where $T$ is the return period or recurrence interval of the extreme concentration, $C$ is the extreme PM2.5 concentration, and $c$ is the threshold.
The Gumbel distribution was fitted to the extreme PM2.5 data, i.e., to the annual maximum values. The cumulative Gumbel extreme value distribution is as follows [21]:

$$P(C \le c) = \exp\left[-\exp\left(-\frac{c - \beta}{\alpha}\right)\right] \tag{3}$$

where $P(C \le c)$ is the probability of non-exceedance, and $\alpha$ and $\beta$ are the parameters of the Gumbel distribution. The parameters can be estimated from the observed sequence [13]. They are linked to the mean value ($\bar{C}$) and the standard deviation ($\sigma_C$) of the extreme PM2.5 data through the following equations:

$$\alpha = \frac{\sqrt{6}}{\pi}\,\sigma_C \tag{4}$$

$$\beta = \bar{C} - 0.5772\,\alpha \tag{5}$$

Taking the logarithm of Equation (3) twice and substituting Equation (2) into the result gives the extreme concentration of PM2.5 ($C$) for each return period:

$$C = \beta - \alpha \ln\left[-\ln\left(1 - \frac{1}{T}\right)\right] \tag{6}$$

$C$ can therefore be expressed as a function of the return period for each duration. The average extreme concentrations for the various durations were calculated following the steps above. The annual maxima of the 1, 8, 16, 24, 48, and 96 h-average concentrations were used for the modeling. For a specific duration, the PM2.5 extreme concentration is a function of the return period. The following return periods are considered in this study: 5, 10, 20, 30, 40, 50, 60, 75, and 100 years.
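The Gumbel fit and the return-period concentrations described above can be illustrated with a short sketch. It assumes that $\alpha$ is the scale parameter and $\beta$ the location parameter and that the parameters are estimated by the method of moments. The annual maxima are made-up numbers, not data from the study, and the function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def gumbel_moments(annual_max):
    """Method-of-moments Gumbel parameters (alpha = scale, beta = location)."""
    sigma = statistics.stdev(annual_max)  # sample standard deviation
    alpha = math.sqrt(6.0) * sigma / math.pi
    beta = statistics.mean(annual_max) - EULER_GAMMA * alpha
    return alpha, beta


def non_exceedance(c, alpha, beta):
    """Gumbel cumulative probability P(C <= c)."""
    return math.exp(-math.exp(-(c - beta) / alpha))


def return_level(alpha, beta, T):
    """Concentration with a T-year return period, where P(C <= c) = 1 - 1/T."""
    return beta - alpha * math.log(-math.log(1.0 - 1.0 / T))


# Hypothetical annual maxima of 8 h-average PM2.5 (ug/m3), one per year.
annual_max = [132.0, 118.0, 141.0, 125.0, 150.0, 110.0,
              128.0, 137.0, 121.0, 115.0, 145.0]

alpha, beta = gumbel_moments(annual_max)
for T in (5, 10, 20, 30, 40, 50, 60, 75, 100):
    print(T, round(return_level(alpha, beta, T), 1))
```

Repeating the fit for every duration gives, for each station, a table of extreme concentrations indexed by duration and return period, which is the input to the empirical CDF regression.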

Empirical PM2.5 CDF Function
Based on the above modeling, empirical CDF formulas can be developed that relate the maximum PM2.5 concentration ($C$) to the duration ($D$) and the frequency ($F$), which is the inverse of the return period ($1/T$), as follows:

$$C = f(T, D) \tag{7}$$

where $D$ is the duration. The empirical function follows the power expression based on the Bernard equation [13,19]:

$$C = \frac{k\,T^{a}}{D^{b}} \tag{8}$$

where $k$, $a$, and $b$ are the model coefficients. Taking the natural logarithm of both sides of the equation gives:

$$\ln C = \ln k + a \ln T - b \ln D \tag{9}$$

The least squares method is applied to Equation (9) to determine the parameters of the empirical CDF equation.

The empirical PM2.5 CDF function for each station was identified using Equation (8), whereas the regional CDF model was determined using Equation (10). The regional PM2.5 CDF model is generated from the CDF with spatially varying parameters $k(x)$, $a(x)$, and $b(x)$ that are spatially interpolated. The regional concentration $C(x)$ of PM2.5 at any location $x$ is a function of a specified return period ($T$) and duration ($D$) as follows:

$$C(x) = \frac{k(x)\,T^{a(x)}}{D^{b(x)}} \tag{10}$$

i.e.,

$$\ln C(x) = \ln k(x) + a(x) \ln T - b(x) \ln D \tag{11}$$

where the regional coefficients $k(x)$, $a(x)$, and $b(x)$ at any location $x$ are generated from the station regression parameters $k$, $a$, and $b$ by inverse distance weighted (IDW) interpolation. The IDW method is a straightforward and computationally inexpensive approach to spatial interpolation [22]. The $k(x)$, $a(x)$, and $b(x)$ regional coefficient maps are shown in Appendix Figure A1.

In this study, the CDF relationship (Equations (4) to (6)) was derived for each station, and the empirical CDF formulas (Equation (8)) can be evaluated with different durations and return periods.
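The log-linear least squares fit and the IDW step can be sketched together as follows. This is an illustrative sketch under stated assumptions: the Bernard form is taken as $C = kT^{a}/D^{b}$, the IDW power of 2 is a common default that the text does not specify, and the function names, station coordinates, and coefficient values are hypothetical.

```python
import math


def solve_linear(A, y):
    """Solve A p = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    p = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * p[j] for j in range(i + 1, n))
        p[i] = (M[i][n] - s) / M[i][i]
    return p


def fit_bernard(samples):
    """Fit ln C = ln k + a ln T - b ln D; samples are (T, D, C) tuples."""
    rows = [(1.0, math.log(T), -math.log(D)) for T, D, _ in samples]
    logs = [math.log(C) for _, _, C in samples]
    # Normal equations of ordinary least squares: (X^T X) p = X^T y.
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * v for r, v in zip(rows, logs)) for i in range(3)]
    ln_k, a, b = solve_linear(XtX, Xty)
    return math.exp(ln_k), a, b


def idw(x, y, stations, power=2.0):
    """Inverse distance weighted value at (x, y); stations are (xs, ys, value)."""
    num = den = 0.0
    for xs, ys, value in stations:
        d2 = (x - xs) ** 2 + (y - ys) ** 2
        if d2 == 0.0:
            return value  # exactly at a station
        w = 1.0 / d2 ** (power / 2.0)
        num += w * value
        den += w
    return num / den


def regional_concentration(x, y, station_fits, T, D):
    """Interpolate k, a, b to (x, y), then evaluate C = k T^a / D^b."""
    k = idw(x, y, [(xs, ys, k) for xs, ys, k, _, _ in station_fits])
    a = idw(x, y, [(xs, ys, a) for xs, ys, _, a, _ in station_fits])
    b = idw(x, y, [(xs, ys, b) for xs, ys, _, _, b in station_fits])
    return k * T ** a / D ** b


# Synthetic check: generate concentrations from known coefficients, then refit.
true_k, true_a, true_b = 150.0, 0.10, 0.20
samples = [(T, D, true_k * T ** true_a / D ** true_b)
           for T in (5, 10, 20, 50, 100) for D in (1, 8, 24, 96)]
k, a, b = fit_bernard(samples)

# Two hypothetical stations given as (x, y, k, a, b).
station_fits = [(0.0, 0.0, 150.0, 0.10, 0.20), (10.0, 0.0, 100.0, 0.05, 0.30)]
print(regional_concentration(5.0, 0.0, station_fits, T=5, D=8))
```

Interpolating the three coefficients and then evaluating the formula means a single set of coefficient maps can produce a concentration map for any return period and duration, without repeating the interpolation for each combination.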

Heterogeneity of Air Pollution
According to the CDF relationship, the extreme PM2.5 concentration increases with the return period, i.e., high concentrations occur at a low annual frequency. In addition, the extreme concentration decreases with increasing duration, at a rate that differs among stations. Figure 4 compares the CDF curves of Erlin, Xiaogang, and Hualien in western, southwestern, and eastern Taiwan, respectively (locations shown in Figure 1). For the 5-year return period in Hualien, the 8 h-average concentration is 110 µg/m³, and the 96 h-average concentration drops to 60 µg/m³. In Erlin and Xiaogang, however, the 8 h-average concentrations are 170 and 175 µg/m³, and the 96 h-average concentrations are only reduced to 100 and 110 µg/m³. Table 1 lists the model coefficients and shows that coefficient b for duration differs considerably among the stations. Air pollution lasts longer in Erlin than in Xiaogang and Hualien. According to the data for durations over 24 h, air pollution persists in Erlin more than in Xiaogang and Hualien. The CDF curves clearly show the differences between the extreme PM2.5 concentrations in the eastern and the western parts of Taiwan. In Figure 4, the downward slope of the CDF curve is greater in Hualien than in Erlin and Xiaogang. Air pollution was generally more serious in the western part than in the eastern part of Taiwan. The heterogeneity of extreme pollutant concentrations represents different levels of risk to people's health [23]. The emissions were from local sources, such as traffic and industrial or agricultural activities, and from outside Taiwan [24,25]. Highly developed industry and a high population density characterize the western plain of Taiwan. Air quality is mainly influenced by local emission sources over the southwestern and central inland cities of Taiwan, but meteorological conditions also affect air pollution dispersion in Taiwan [26]. Air quality in the urban and industrial regions of southwestern and west central Taiwan is poor in the winter [20].
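As a rough worked example of the duration effect, the power-law form $C = kT^{a}/D^{b}$ implies that, at a fixed return period, $b = \ln(C_1/C_2)/\ln(D_2/D_1)$ for any two durations $D_1 < D_2$. The sketch below applies this two-point estimate to the rounded 5-year concentrations quoted above. It is an assumption-laden illustration, not the regression behind Table 1, which uses all durations and return periods, so the values need not match the tabulated coefficients.

```python
import math


def implied_b(c_short, c_long, d_short=8.0, d_long=96.0):
    """Two-point duration exponent from C = k T^a / D^b at a fixed T."""
    return math.log(c_short / c_long) / math.log(d_long / d_short)


# Rounded 5-year concentrations (8 h average, 96 h average) in ug/m3.
quoted = {
    "Hualien": (110.0, 60.0),
    "Erlin": (170.0, 100.0),
    "Xiaogang": (175.0, 110.0),
}

estimates = {name: implied_b(c8, c96) for name, (c8, c96) in quoted.items()}
for name, value in estimates.items():
    print(name, round(value, 3))
```

The larger exponent obtained for Hualien agrees with the steeper downward slope of its CDF curve, whereas the smaller exponents for the two western stations indicate concentrations that decay more slowly with duration.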
In contrast, traffic and industrial activities are low in eastern Taiwan, and the air quality there is generally good. In addition, emission sources, meteorological conditions, and atmospheric boundary layers affect the transmission and diffusion of air pollution [27,28]. Using the CDF curves and the results of the regional CDF analysis, the effect of diffusion differences among locations on extreme PM2.5 can be determined.

Regional CDF Analysis
Extreme concentrations for short and long return periods, such as 5 and 100 years, reveal the differences in the frequency of local high pollution levels and the spatial patterns of PM2.5. Figures 5 and 6 show the estimated extreme concentrations for the 5-year and 100-year return periods at durations of 1, 8, 24, and 96 h. For the 1 h duration, a few scattered local hotspots appeared in the northern, central, and southwestern areas of Taiwan. This anomaly in air quality may have been caused by a local event, e.g., a fireworks festival, agricultural and domestic solid fuel burning, or a forest wildfire. For durations over 8 h, the spatial patterns of air pollution were highly similar. The greater the duration, the larger the hotspot. PM2.5 exposure is not randomly distributed. The hotspots of PM2.5 air pollution appeared in the central and southwestern areas of Taiwan, especially at the 96 h duration.
Long-duration air pollution in Taiwan is usually the effect of local sources, topography, and the monsoon [20]. A weakening northeast monsoon leads to poor diffusion conditions for dispersing air pollutants [29]. Air quality becomes poor under stagnant conditions in western Taiwan, particularly in central and southern Taiwan. Previous research also found that elevated total ambient PM2.5, due to poor diffusion conditions as the northeast monsoon weakens, may be associated with acute health outcomes [30]. In general, the PM2.5 concentration is low in summer and high in winter. Both short-term (acute) and long-term (chronic) exposure to air pollution should be considered [31,32]. Studies suggest that the public health impacts of air pollution will be dominated by short-term and long-term exposure, as determined by the association between exposure and mortality [6,31,33,34]. According to the spatial results, central and southwestern Taiwan is at high risk of long-duration exposure to air pollution. These areas are unhealthy for groups who are sensitive to air pollution.
Extreme pollution events were frequently observed by the monitoring network. The 1 h-average PM2.5 concentrations reach very high levels. Furthermore, industry and traffic emissions [35], agricultural burning [36], massive incense burning [37], dust storms [38], domestic solid fuel burning [39], and wildfires [30] significantly affect local extreme air pollution. Pollutants released into the environment fluctuate spatially rather than being uniformly distributed, and thus the health risk due to air pollution is an issue of spatial variability [23,40]. This study provides information on the distribution of air pollution for a specified return period and duration.

Discussion
The CDF, or the regional CDF method, belongs to the spatiotemporal data analysis of extreme PM2.5 concentration data. Spatiotemporal data analysis and visualization are important for air pollution management [41], and summarizing the extreme pollution data is critical. The CDF analysis can help summarize the main characteristics and patterns of air pollution for a specified duration and frequency and provide information about the occurrence of limit exceedances of air pollution [10]. The CDF curves and their spatial maps can be used for area delineation, policy control, and management [42,43]. The framework is relatively practical, and the experiment shows that it is valid for analyzing extreme air pollution monitoring results with various durations and return periods. The method is used to determine the multiple functions of the CDF in the air pollution data. A longer history of records would help define the long-term behavior. We will keep the data set as large as possible, and a future study will update the results.
Heterogeneity of PM2.5 is highly related to pollutant sources, geographical location, and climatic and topographic conditions [44]. Using the regional model rather than models for individual stations provides more spatial information for management purposes. The extreme PM2.5 concentration generally varied in space. Moreover, the regional PM2.5 CDF map could be useful for detecting the spatial variation of health risk, and the CDF for various durations is needed to determine the health risk [45]. Avoiding exposure to air pollutants is especially critical for the elderly, children, and susceptible individuals with cardiovascular or pulmonary diseases (CVD or COPD) [32]. A further study can be conducted to determine the long-duration concentrations of PM2.5, such as the 1-week-, 10-day-, and 1-month-average concentrations. We will apply the CDF to high-frequency and massive low-cost sensor data, such as airbox data [46,47], and identify the differences.

Conclusions
This study presented the derivation of the CDF curves of fine aerosols, PM2.5. The Bernard formula was applied as the basis for estimating the empirical CDF relationship. The empirical CDF curve coefficients were estimated by regression. Moreover, the regional map for the CDF analysis can be identified based on regional parameters. Results of the preliminary analysis of the data for 2005-2015 showed that the spatial patterns of extreme air pollution were highly similar for durations over 8 h. The local pattern of air pollution lasting for 1 h may be caused by a local event, e.g., a fireworks festival, agricultural and domestic solid fuel burning, or a forest wildfire. The greater the duration, the larger the hotspot. Moreover, the use of regional maps rather than those for individual stations provides spatial information on air pollution, particularly for longer durations. Using the regional extreme concentration with a specified duration and return period, the spatial pattern of air pollution showed hotspots in the central and southwestern areas of Taiwan. Central and southwestern Taiwan is at high risk of long-duration exposure to air pollution.
In addition, the CDF relationships of the stations in western and eastern Taiwan revealed air quality differences between these areas. For the same return period, the concentration was lower in the eastern part than in the western part. Air pollution persisted longer in the western part than in the eastern part. Furthermore, the local and the regional CDF relationships proposed in this study can be used as a reference for health planning and management. Further study can be conducted on air quality design and pollution control.