Key Factors for Improving the Carcinogenic Risk Assessment of PAH Inhalation Exposure by Monte Carlo Simulation

Monte Carlo simulation (MCS) is a computational technique widely used in exposure and risk assessment. However, the result of traditional health risk assessment based on the MCS method has always been questioned due to the uncertainty introduced in parameter estimation and the difficulty in result validation. Herein, data from a large-scale investigation of individual polycyclic aromatic hydrocarbon (PAH) exposure was used to explore the key factors for improving the MCS method. Research participants were selected using a statistical sampling method in a typical PAH polluted city. Atmospheric PAH concentrations from 25 sampling sites in the area were detected by GC-MS and exposure parameters of participants were collected by field measurement. The incremental lifetime cancer risk (ILCR) of participants was calculated based on the measured data and considered to be the actual carcinogenic risk of the population. Predicted risks were evaluated by traditional assessment method based on MCS and three improved models including concentration-adjusted, age-stratified, and correlated-parameter-adjusted Monte Carlo methods. The goodness of fit of the models was evaluated quantitatively by comparing with the actual risk. The results showed that the average risk derived by traditional and age-stratified Monte Carlo simulation was 2.6 times higher, and the standard deviation was 3.7 times higher than the actual values. In contrast, the predicted risks of concentration- and correlated-parameter-adjusted models were in good agreement with the actual ILCR. The results of the comparison suggested that accurate simulation of exposure concentration and adjustment of correlated parameters could greatly improve the MCS. The research also reveals that the social factors related to exposure and potential relationship between variables are important issues affecting risk assessment, which require full consideration in assessment and further study in future research.


Introduction
Health risk is defined as the likelihood of an adverse effect in the population due to exposure to a contaminant. Accurate assessment of the health risk is of great significance for the determination of adverse effects and disease prevention. Monte Carlo simulation (MCS) is a computational technique used in the health risk assessment (HRA) [1][2][3]. Compared to point estimation, probabilistic risk estimation based on MCS allows considerably greater accuracy in describing the variables' uncertainty and improves the understanding of contaminants' environmental behaviors [4,5]. Thus, it makes the assessment more informative to risk managers [6,7]. MCS is widely used in the risk estimation of multimedia exposure and parameter sensitivity qualification [8][9][10]. Recently, MCS has been combined with the physiologically based pharmacokinetic model to determine the relationship between internal exposure dose and cancer risk [11]. Improved, two-dimensional MCS has also been used to directly provide specific guidance for the risk control of pollutant exposure [12,13]. With the development of exposure parameter research, such as the publication of Exposure Factor Handbook of U.S. EPA and Exposure Factors Handbook of Chinese Population, sufficient exposure parameters have been provided to support use of the MCS method in exposure estimation. Therefore, MCS has gradually replaced point estimation and is widely applied in the risk assessment of multiple exposure routes [14][15][16].
The traditional assessment model based on the MCS approach also suffers from several drawbacks. First, MCS treats input variables as random variables with known or estimated probability density functions. The distribution model's selection of parameters is obviously affected by the subjectivity of the researcher. It is difficult to determine how well the model represents the actual conditions. Second, correlated variables are treated independently in traditional risk assessment, which may increase the estimation uncertainty. Mathematical relationships exist among some variables. For example, positive correlation can be detected between a subject's water intake rate and body weight. It has been reported that a correlation with an absolute value of the coefficient greater than 0.6 will have larger effects on the output distributions [5]. In a traditional MCS risk assessment model, however, these variables are often treated as independent variables. This treatment will inevitably result in increasing uncertainty. Finally, the results of risk assessment based on MCS are difficult to validate. Risk estimation based on MCS usually focuses on a large population group. It is difficult to obtain the actual risk of each participant in the population by sampling due to the high cost.
Here, we wanted to use data from a large-scale investigation of individual exposure to polycyclic aromatic hydrocarbons (PAHs) to explore the key factors for improving the accuracy and decreasing the uncertainty of assessment based on MCS. PAHs are semivolatile organic compounds that are generally formed by the incomplete combustion of fossil fuels and biomass fuels [17][18][19]. They are associated with potentially toxic, mutagenic, and carcinogenic effects [20,21]. In accordance with the currently available epidemiological and toxicology data, exposure to high levels of PAHs has been proven to be a potential cause of skin, lung, bladder, and gastrointestinal cancers [22][23][24][25][26]. Due to the wide range of sources, strong carcinogenicity, and large emissions of PAHs, health risks caused by them have been a topic of public concern [27][28][29]. Actual carcinogenic risk though PAH inhalation was estimated by measured PAH concentration and population exposure parameters. Traditional assessment model based on MCS and improved models from three following aspects were also applied to evaluate the carcinogenic risk though PAH inhalation. (1) PAH exposure concentration was adjusted by the weight of population density in the research area, (2) the target population was age-stratified to increase the accuracy of exposure parameters, and (3) correlated parameters were adjusted by a regression model. The goodness of fit of the models was evaluated quantitatively by comparison with the actual risk of 2740 participants. The purpose of this study was to screen the key factors affecting the MCS method and find methods to improve the assessment based on MCS.

Research Area and Participants
Taiyuan City, the capital of Shanxi Province, was selected as the research area in this study ( Figure 1). Taiyuan, as an important industrial base in northern China, has long been plagued by air pollution, especially PAH pollution, due to high energy consumption and poor air diffusivity [30][31][32][33]. According to statistics, the annual consumption of oil in Taiyuan was 1.9 million tons standard coal equivalent, and the annual coal consumption was 79.0 million tons of coal, accounting for 20.4% of the total oil consumption and 29.4% of the coal consumption in Shanxi Province, respectively [34]. In this study, a total of 25 sampling sites were selected and covered all the six districts and four counties of Taiyuan City. Because the exposure scenarios of urban and rural areas were different, all the 25 sampling sites were classified into urban and rural groups, respectively. The population nearby was also classified accordingly. To obtain the exposure parameters including the height and body weight of the population in Taiyuan, field measurements were conducted among adult residents according to a designed statistical sampling method. The field measurement covered a sample size of 2740 local adult residents. The basic information of the participants is shown in Table 1. Age structure and sex ratio are two important factors to determine the representativeness of the participants. These two factors of participants in Taiyuan were compared with the census data of Taiyuan ( Figure S1). Similar age structure and sex ratio can be found in subjects from most age groups and the target population [35]. It can be considered that the sample size is sufficient to reflect the characteristics of the population in Taiyuan.

Sampling and Analytical Methods
Atmospheric particulate and gaseous-phase PAH samples were collected from 25 sampling sites in each region using passive air samplers with a PUF disk and glass fiber filter ( Figure 1). Atmospheric samples were collected in three periods of a year including spring, summer-autumn, and winter between 2009 and 2010. After sampling, the PAHs in the gas and particulate phase samples were Soxhlet extracted, purified by alumina silica gel column, and measured using an Agilent 6890 N gas chromatograph coupled with an Agilent 5975 mass spectrometer with an electron impact ion source. Details of the analytical and quantification procedures have been described previously [36,37]. Fifteen priority controlled PAHs were identified, including acenaphthene (Ace), acenaphthylene (Acy), fluorine (Flo), phenanthrene (Phe), anthracene (Ant), fluoranthene (Fla), pyrene (Pyr), benz(a)anthracene (BaA), chrysene (Chr), benzo(b)fluoranthene (BbF), benzo(k)fluoranthene (BkF), benzo(a)pyrene (BaP), dibenz(a, h) anthracene (DahA), indeno(1,2,3-cd)pyrene (IcdP), and benzo(ghi) perylene (BghiP). The annual average PAH concentration was calculated and used to represent the PAH exposure level of the population.

Risk Characterization
Based on the USEPA risk assessment protocol [38], the carcinogenic risk was characterized by the incremental lifetime cancer risk (ILCR). The ILCR of the studied population was estimated based on the following equation: where BaP eq is the BaP equivalent concentration (BaP eq ) in gaseous phase and particulate phase (ng·m −3 ), IR is the inhalation rate (m 3 ·day −1 ), EF is the exposure frequency (365 day·year −1 ), ED is exposure duration (53 year), BW is the body weight (kg), and AT is the average time (70 years × 365 day·year −1 ). CSF BaP is the inhalation cancer slope factor for BaP (3.14 per mg·kg −1 ·day −1 ). The BaP eq and the toxicity equivalency factors (TEFs) were used to express the carcinogenic risk of PAH mixtures. The BaP eq based on the BaP toxicity was incorporated using Equation (2), where C i is the concentration of the PAH species in atmospheric samples, and TEF i is the toxic equivalence factor of the congener of PAH, i. Table S1 lists the PAHs and TEFs used in the calculation associated with the evidence of cancer in PAH-exposed individuals.
Parameters in calculating individual risk are different from those used in Monte Carlo simulation. For the risk assessment of participants, the monitored PAH concentration in the closest sampling site to the participant was taken as the individual exposure level. Body weight was obtained directly by field measurement. IR was estimated by the basal metabolic rate method based on body weight and height. The details of estimation are provided in Table S2 and Text S1. For the Monte Carlo simulation, the exposure parameters of adults in Taiyuan were collected from the Exposure Handbook of Chinese Population.

Monte Carlo Simulation
Monte Carlo simulations were used to generates samples from probability distributions describing the range of results and the sensitivities of relative parameters. Here, three parameters including BaP eq concentration, body weight, and inhalation rate were considered as variables. Log-normal distributions were applied to describe variables, because log-normal is the most widely applied model in studies on the exposure parameters [39,40]. Distribution of concentration was obtained by Gaussian fitting of the monitoring data from 25 sampling sites. BW and IR were derived based on the exposure parameter collected from the Exposure Factors Handbook of Chinese Population [41].
Once the distribution of the variables was fixed, the computer selected a value for each variable at random from the specified distribution and calculated the corresponding risk. This process was repeated many times, the set of input values and corresponding estimate of risk were saved each time. In an MCS process, each calculation is referred to as an iteration, and a set of iterations is called a simulation. The Monte Carlo simulation program was written in Python 3.7. Iterations ranging from 5000 to 50,000 were performed to test convergence and the stability of the numerical output. As a result, a total of 30,000 iterations were sufficient to ensure a subgroup with the least population (age 60-70) can obtain convergence results. Rank correlation coefficients between each input variable and the carcinogenic risk were calculated by Spearman correlation analysis. The sensitivity of each input variable was quantified by comparing the rank correlation coefficients [42].

Individual Actual Risks Based on Measured Parameters
The carcinogenic risk of 2740 participants was assessed based on the PAH concentrations and parameters measured. We assumed that the subjects could represent the population of Taiyuan; thus, the calculated risk value based on the actual measurement was close to the real risk of the population. Therefore, in our research, risks based on measured data were considered as actual risk values to evaluate the goodness of fit of the Monte Carlo simulations.
The cancer risk of the whole population varied from 7.53 × 10 −7 to 4.67 × 10 −5 , with a median of 4.83 × 10 −6 . Compared with risks reported from the other cities in China, ILCR in Taiyuan was much higher than that reported in Xingtai, Hebei, with an average ILCR of 3.4 × 10 −7 [43]. However, ILCR in Taiyuan was lower than that of people in some typical industrial cities such as Nanchong, Sichuan Province, with a reported value of 1.2 × 10 −5 [44] and Changzhou with a value of 3.6 × 10 −3 [45].
Influencing factors of both environmental concentration and population exposure characteristics were studied ( Figure 2). Exposure assessment indicated that local residents had a mean gaseous exposure level of 16.28 ng kg −1 day −1 , which was in the same order of magnitude as the level of particulate PAH 15 with a mean value of 21.13 ng kg −1 day −1 . However, it can be seen from Figure 2A that particulate PAHs had a significantly higher BaP eq contribution and higher risk than did gaseous phase (Kruskal-Wallis test, p < 0.01). Previous research suggested PAHs with higher molecular weight were also more carcinogenic and more easily absorbed onto particles. Our findings are consistent with previous results [46].
In addition to the influence of environmental concentration, the exposure behaviors and physiological parameters of different subgroups of the population are often different, resulting in variation in the health risk. Kruskal-Wallis analysis was used to compare people of different genders, residential areas, and age groups. Although a significant difference can be detected between the body weight and inhalation rate of male and female participants (Kruskal-Wallis, p < 0.01), no significant carcinogenic risk difference was found between genders (p = 0.11). Similarly, no significant risk difference was observed among age groups (p = 0.64). These results also indicated that the influence of physiological parameters on exposure risk was not as obvious as we expected. In contrast, the health risk of rural participants (geomean, 5.9 × 10 −6 ) was significantly higher than that of urban ones (geomean, 3.4 × 10 −6 ). Based on the results above, it can be inferred that different exposure levels in urban and rural areas may be more important factors than physiological parameters. According to the criteria suggested by the U.S. EPA, a one-in-a-million chance of an additional human cancer over a 70 year lifetime is an acceptable level of risk, and a one-in-ten-thousand or greater chance is considered to be serious risk. Based on this, almost the entire population within the research area exceeded acceptable limits. However, none of the participant's risk exceeded the serious level. The results suggested local people had low probability of potential cancer risk.

Parameter Distribution Estimation
According to the results of the Shapiro-Wilk test (p > 0.05), the BaP eq concentration can be described by a log-normal distribution LN(2.21, 1.01). After ranking these data from least to greatest, plotting positions (proportions) for use in a cumulative probability distribution were calculated as: Distribution of concentration was obtained by Gaussian fitting of the monitoring data from 25 sampling sites ( Figure 3A, R 2 = 0.941). In a traditional assessment model, it is difficult to collect the physiological parameters of participants due to the large population and high cost. The development of research on exposure parameters in recent years has provided great convenience to the accurate assessment of health risk through the Monte Carlo method. We obtained quartiles and medians from the Exposure Factors Handbook of Chinese Population. The weight and IR area taken as x values and the corresponding quartiles as y values. The weight and IR distribution of the population were also fitted using a cumulative distribution function of the log-normal model ( Figure 3B,C). The fitting parameters describing the curves are shown in Table 2.

Comparison of Risk Distribution between the Two Methods
The risk distribution of individual participants and the result derived by Monte Carlo simulation are illustrated in Figure 4A,B, respectively. Obvious difference can be observed between the distribution mode of two methods. Three peaks can be found in the actual risk distribution of 2740 participants. The three peaks contained a population of 741, 1594, and 405, respectively. The age composition of three peaks was analyzed. No obvious age difference was observed among three groups. If the population was divided into age subgroups by every 10 years, people in age group 30-40 had the highest proportion in three peaks with percentages of 25.9%, 26.0%, and 32.1%, respectively. In contrast, the analysis of the spatial variation of risk found that a total of 51.2% of the population in the first peak was from Loufan District, which was considered to be a relatively low-pollution area in Taiyuan. For the third peak with high risk, about 49.5% of the population was from Jiancaoping District, which is a highly polluted area in the center of Taiyuan City. The results showed that a large proportion of people at high risk were from the same region. The same applied to the people at low risk. This showed that spatial variation of PAHs had an obvious impact on risk distribution. Meanwhile, population density may also be an important reason for this result. In summary, the statistical analysis showed that the arithmetic mean of actual ILCR was 6.25 × 10 −6 , with a standard deviation of 6.40 × 10 −6 . The average risk according to the Monte Carlo simulation was 1.63 × 10 −5 with a standard deviation of 2.37 × 10 −5 .

Parameter Sensitivity Analysis
Comparing the Monte Carlo results with the actual risk calculated based on 2740 subjects, it can be found that Monte Carlo simulation overestimated the risk level of the population, and the range of uncertainty introduced was much larger than that of the real population. A detailed sensitivity analysis was conducted to identify the input variables that were critical to the accuracy of the risk assessment.
In each iteration, the set of input variables and the corresponding risk were saved. The sensitivity of each input variable was quantified by comparing the Spearman rank correlation coefficients ( Figure 5). The greater the coefficient between parameter and risk, the more sensitive the parameter is. The result showed that BaP eq concentration had the highest parameter sensitivity with Spearman's r = 0.958. This is consistent with the previous research [47,48]. Exposure concentration was a more sensitive factor than exposure parameters in affecting the accuracy of assessment in this research. The other two parameters, BW and IR, had comparable sensitivity, with r values of −0.185 and 0.182.

Key Factors for Improving the Monte Carlo Simulation
The comparison showed that differences exist between the risk calculated by the traditional model and the actual risk. Combined with the results of parameter sensitivity, the process of exposure parameter estimation can be an important source of uncertainty. Three possible causes of the high uncertainty existed in the process of parameter estimation. First, when fitting the concentration distribution, all the 25 samples had the same weight, which ignored the spatial variation of pollutants and population density. As a result, the importance of sample sites in densely populated areas was reduced, and the importance of concentration from some sparsely populated areas was overestimated. Second, the physiological parameters of the population differed greatly, which brought great uncertainty when they were included in the same group. Third, obvious positive correlation existed between BW and IR ( Figure S2), but the two variables were randomly sampled in the traditional Monte Carlo process separately. We could not find the relationship between randomly sampled BW and IR ( Figure S3).
Correspondingly, we revised the traditional model from three aspects to solve the problems above and reduce the uncertainty caused by parameter estimation. First, a new BaP eq concentration distribution curve was established based on the BaP eq exposure level of individual participants instead of sampling sites. Thus, the weight of each sample point in the fitting model would be reflected through the number of people near the sites. Second, stratified analysis was used in the population to reduce the difference of physiological parameters in each subgroup. According to the characteristics of exposure parameters, the population was divided into three age groups: 18-44, 45-59, and 60-70 years old. Surrogate samples were generated by MCS. The number of surrogate samples in the age group was determined by the proportion of the age group in the whole population. Finally, the regression relationship between IR and BW was established. IR was replaced by an equation with BW to calculate the carcinogenic risk (Equation (4)). Only the concentration and body weight were used as variables for Monte Carlo simulation. The purpose of the three adjustments was to explore the influence of parameters on the simulation results, increase the accuracy, and reduce the uncertainty of MCS.

Concentration-Adjusted Monte Carlo
Different sampling points had different contributions to the population ILCR. The weight of each sampling site depended on the population density near the sites. The spatial distribution of BaP eq concentration in Taiyuan were constructed by ArcGIS software using the inverse distance weighting method ( Figure 6A). The participant frequency of each sampling site is illustrated in Figure 6B. From the figure, we can clearly see that the pollution level in the urban area was higher than that in the suburbs. The highest pollution level was found in the north of the city. Jiancaoping District had the highest concentration. Site JCP3 had the greatest BaP eq concentration of 76.09 ng·m −3 . Fortunately, the participant frequency of four sampling sites in Jiancaoping was relatively low. The residence of only 80 participants was near the sampling site JCP3. The highest frequency was at a sampling site in Xiaodian District, followed by Yingze and Xinghualing. In order to take the factor of population density into account, we assumed that the population density around the sampling sites can be reflected by the frequency of subjects. Thus, the BaP eq concentrations from each sampling site were replaced by exposure levels of individual participants nearby to build a new BaP eq concentration distribution. The BaP eq was also fitted using log-normal distribution LN(2.419,1.317). The risk of the population was estimated by another Monte Carlo simulation. The arithmetic mean of participants was 8.23 × 10 −6 , with a standard deviation of 8.48 × 10 −6 .

Age-Stratified Monte Carlo
In addition to the influence of exposure concentration, uncertainty existed in the physiological parameters of participants also. Collected data showed that there always were big gaps between the exposure parameters of different life stages. The average inhalation rate of people in age of 18~44 years was 18% higher than that of people aged from 60~70 years old. Dividing the population into subgroups is a commonly used method to reduce the uncertainty. We categorized people potentially being exposed to PAHs in Taiyuan

Parameter-Correlation-Adjusted Monte Carlo
Inhalation rate was replaced by the relationship between IR and BW in (Equation (4)). The concentration and weight were used as variables and randomly generated by MCS. As a result of the parameter-correlation-adjusted Monte Carlo, the arithmetic mean of ILCR was 8.10 × 10 −6 , with a standard deviation of 1.26 × 10 −5 .

Comparison of Improved Models
The risk distributions of four Monte Carlo simulations were compared with the actual risk as shown in Figure 7. It can be seen that the actual cancer risk had an irregular distribution. Therefore, it is difficult to accurately reflect the real distribution of risk using Monte Carlo simulation. However, we can evaluate the goodness of fit of the simulation based on two indicators including the accuracy and the variation. Simulation accuracy was quantitatively characterized by comparing the arithmetic mean and median values. The variation was evaluated by comparing the standard variation and the range. Risk distribution curves and statistics showed that no difference was found between the traditional model and the age-stratified model. The mean and median values of MCS surrogates was 2.6 and 1.9 times higher than the actual risk based on measured parameters. The standard deviation was 3.7 times higher, and the range was 1.5 orders of magnitude larger. The result indicated there was a big gap between the ILCR predicted by traditional and age-stratified MCS and the actual ILCR, and the age-stratification method did not improve the accuracy of prediction in this study. There maybe two reasons accounting for the failure of age-stratified MCS. First, according to the results of sensitivity analysis, the Spearman rank correlation coefficients between these two parameters and ILCR were 0.19 and 0.18. BW and IR were not the most sensitive parameters in this study. The effect of age stratification on reducing uncertainty is relatively small. Second, our target group was adults in Taiyuan, and the proportion of adolescents and the elderly in the population was relatively low, which means that the difference of physiological parameters was not obvious. Therefore, the difference between the two simulation methods in Taiyuan was not obvious. In contrast, much better prediction results can be seen in Figure 7 by adjustment of the concentration and parameter correlation. The arithmetic mean and median of the concentration-adjusted model were 32% and 18% higher than the actual ILCR. The arithmetic mean and median of the correlated-parameter-adjusted model were only 29.5% and 2% higher than actual risk, respectively. Elimination of parameter-related effects greatly increased the accuracy of prediction. The predicted median value almost coincided with the actual risk. Furthermore, the overlap between the curves of the two models and the green curve representing the actual risk showed a great reduction in the variation of these two models. The standard deviation of the concentration-adjusted and parametercorrelation-adjusted models was 1.97 times higher than the actual ILCR. Concentration adjustment had a better effect in reducing the standard deviation.
To provide a reasonable and effective method to improve the MCS method, the cost of each improvement also needs to be considered. In this study, the adjustment of concentration requires a better understanding of population spatial distribution and density. The acquisition of this information always requires a lot of additional work. If we want more accurate assessment results, it is a good choice to replace the site sampling with individual sampling, which requires even more resources. In contrast, the cost of parameter correlated adjustment is relatively low. Due to the development of exposure science, the relationship between parameters has been well documented in different races, ages, and genders. Correlated parameter adjustment can effectively improve the accuracy at low cost. Age stratification requires more detailed physiological parameter research of different age groups. It is also a low-cost method for countries and regions with relatively perfect parameter research. In addition, it should be noted that although the effect of age stratification was not obvious in this study, it is still possible to greatly improve the accuracy of prediction for populations with a more complex age structure.

Conclusions
In recent years, risk assessment has become an area of public concern. Due to the complexity of individual exposure environment and high cost of monitoring, risk assessment needs to find a balance between cost and accuracy. Therefore, the improvement of risk assessment based on MCS not only has important scientific value but also carries positive socioeconomic impacts. In this research, the health risk of the target population was evaluated through the traditional assessment method and also through three improved models. The results were compared with risk estimated based on the measured parameters of 2740 participants. The comparison suggested that the adjustment of concentration and correlated parameters can greatly improve the accuracy and reduce the uncertainty of risk assessment. In contrast, for people with small physiological differences, stratified analysis cannot improve the model effectively. The findings of this study are significant as they can help improve the accuracy of the risk assessment. Moreover, the result suggested that in addition to laboratory experiments and field measurements, more attention should be paid to collecting the sociological information of populations and finding out potential relationship between exposure parameters. Full consideration of these factors can greatly improve the prediction of health risk, reduce the scale of sampling, and minimize the cost of experiments.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/ijerph182111106/s1, Figure S1: Comparison of age structure difference between participants and local population, Figure S2: Correlation between IR and BW in participants, Figure S3: Relationship between inhalation rate and body weight by traditional Monte Carlo simulation, Table S1: Toxicity equivalency factors (TEFs) and reference dose for the PAHs [49], Table S2: Estimation of BMR (basal metabolic rate) of population in Taiyuan, Text S1: Estimation of inhalation rate [50,51].