An Integrated Method for Factor Number Selection of PMF Model: Case Study on Source Apportionment of Ambient Volatile Organic Compounds in Wuhan

The positive matrix factorization (PMF) model is widely used for source apportionment of volatile organic compounds (VOCs). How to select the proper number of factors, however, is rarely studied. In this study, an integrated method to determine the most appropriate number of sources was developed, and its application was demonstrated by a case study in Wuhan. The concentrations of 103 ambient VOCs were measured intensively using online gas chromatography/mass spectrometry (GC/MS) during spring 2014 in an urban residential area of Wuhan, China. During the measurement period, the average temperature was approximately 25 °C, with very little domestic heating and cooling. The concentrations of the most abundant VOCs (ethane, ethylene, propane, acetylene, n-butane, benzene, and toluene) in Wuhan were comparable to those reported in other studies of urban areas in China and other countries. The newly developed integrated method combines a fixed minimum threshold value for the correlation coefficient, the average weighted correlation coefficient of each species, and the normalized minimum error. Seven sources were identified by using the integrated method: vehicular emissions (45.4%), industrial emissions (22.5%), combustion of coal (14.7%), liquefied petroleum gas (LPG) (9.7%), industrial solvents (4.4%), pesticides (3.3%), and refrigerants. The orientations of emission sources were characterized by taking into account the frequency of wind directions and the contributions of sources in each wind direction for the measurement period. The vehicle exhaust contribution is greater than 40% in all directions, whereas industrial emissions are mainly attributed to the west-southwest and south-southwest.


Introduction
Volatile Organic Compounds (VOCs) play a significant role in local, regional, and global air pollution. VOCs are harmful to humans, ecosystems, and the atmosphere because of their role in the formation of ozone and peroxy-acetyl nitrate (PAN) [1][2][3][4]. Exposure to VOCs is associated with acute toxic symptoms and the risk of mutagenicity and carcinogenicity [5,6]. With rapid economic growth

Measurement Site and Instrumentations
Wuhan city, situated in the eastern part of the Jianghan Plain and at the intersection of the Yangtze and Hanjiang Rivers, is the largest metropolis in central China with an area of about 8500 km² and a population of approximately 10.2 million [34]. The economic growth of Wuhan in recent years is dramatic. In 2017, Wuhan's gross domestic product amounted to 1340 billion yuan, ranking eighth in the country, where automobile, electronic information, equipment manufacturing, food and tobacco, and energy and environmental protection are the pillar industries with economic product over one-hundred-billion Yuan (RMB). The number of vehicles in Wuhan exceeded 1,000,000 in 2010, and increased by 399,500, 414,000, and 362,000 vehicles from 2015 to 2017 respectively, and reached 2,830,000 vehicles in 2018 [36].
Atmosphere 2018, 9, x FOR PEER REVIEW
A typical subtropical humid monsoon climate occurs in this region, with a hot summer and a cold, humid winter. The measurement site is located at the super monitoring station of Wuhan (30°36′ N, 114°17′ E), in a typical residential area in the urban area (Figure 1). The measurement period was from 10 May to 31 May 2014. Given the springtime period, the average temperature was 25 °C, within the range of 17 to 34 °C, so domestic heating and air conditioning were used very little compared with winter and summer. The average wind speed was 1.3 m/s, with north/northwest as the prevailing wind direction. Low wind speeds occurred for 24% of the recorded hours of the measurement period.
The environmental monitoring center (EMC) of Wuhan municipality has started an investigation of VOC emission inventory. However, quantitative emission inventory data is currently lacking and only the number of companies active in different industrial sectors and located in the 13 districts of Wuhan is available. This piece of information can only provide an idea of the spatial distribution of industrial sources related to VOC emission. As reported in Table 1, 562 companies have been registered, mostly for packaging and printing (165), automotive (92), and equipment and furniture manufacturing (68). The districts of Huangpi, Dongxihu, and Jingkai (numbered 1, 2, and 4 in the right graph of Figure 1, respectively) are located north, northwest, and southwest of the monitoring site, respectively, and have the largest numbers of companies. However, a rather high number of companies are also located in the districts of Jianghan and Hanyang (numbered 8 and 10 in Figure 1), much closer to the monitoring site.
VOC concentrations in ambient air were measured with the online monitor TH-PKU 300B [37], which provides continuous and more intensive concentration data than passive samplers [38]. VOCs in air are captured and concentrated by an ultra-low temperature (−160 °C) air tube capture and concentration technology. Sampling was performed for 10 min of each hour, using a metal tube with a 1 μm filter. Gas chromatography/mass spectrometry (GC/MS) analysis was performed using Agilent 7820 and Agilent 5975 devices. The flame ionization detector (FID) was used, and the columns were PLOT and DB-624 for MS. The temperature increased from 35 °C to 180 °C at a rate of 6 °C/min. Calibration of the sampling flow rate, mass spectrometer tuning, blank experiments, and instrument calibration were conducted regularly to validate the quality of the data acquired by the TH-PKU 300B system. The measurement was operated by specialists from Wuhan EMC. The main calibration methods were internal standard and external standard. The internal standard at 4 ppb was inserted into the sample each hour and analyzed together with the samples. The external standard at 4 ppb was tested once a day for all 103 species. The detection limit of 97% of the species was less than 0.05 ppb, and the detection limit of 70% of the species was less than 0.01 ppb. The measurement accuracy of 80% of the species was within 10%, and the measurement accuracy of 45% of the species was within 5%. The precision of 80% of the species was within ±20%, and of 60% of the species was within ±10%. Overall, 57 non-methane hydrocarbons (NMHCs), 33 halocarbons, and 13 carbonyls were analyzed at one-hour time resolution (Table 2).
Other sectors include: papermaking industry, cement manufacturing, textile printing and dyeing industry, fertilizer manufacturing industry, battery manufacturing, cooking industry, and tobacco products industry.
Atmosphere 2018, 9, 390
Table 2. VOC species analyzed by the gas chromatography-mass spectrometry (GC-MS) system (species considered for positive matrix factorization (PMF) analysis are in bold).

Positive Matrix Factorization
Positive matrix factorization (PMF) is a multivariate factor analysis technique used for source identification and source apportionment of atmospheric pollutants. The PMF model is one of the multivariate receptor models developed by the US Environmental Protection Agency (US-EPA). The PMF receptor model is most preferred [39] and has been widely used for source apportionment [40,41], since it requires only measured concentration data rather than detailed prior knowledge of sources.
In the PMF model, any data matrix X (n × m) can be factorized into two matrices, G (n × p) and F (p × m), plus a residual matrix E, as described in Equation (1):
$$x_{i,j} = \sum_{k=1}^{p} g_{i,k} f_{k,j} + e_{i,j} \qquad (1)$$
where n and m are the number of samples and the number of species and p is the number of factors extracted. Equation (1) describes the case of source apportionment of atmospheric pollutants, where $x_{i,j}$ is the concentration of species j measured in sample i, p is the number of factors contributing to the samples, $g_{i,k}$ is the relative contribution of factor k to sample i, $f_{k,j}$ is the concentration of species j in factor profile k, and $e_{i,j}$ is the error of the PMF model for species j measured in sample i. The goal is to find the $g_{i,k}$, $f_{k,j}$, and p values that best reproduce the observations $x_{i,j}$. In the computational process, the values of $g_{i,k}$ and $f_{k,j}$ are adjusted until a minimum value of the objective function Q for a given p is found, where Q is defined in Equation (2):
$$Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{e_{i,j}}{s_{i,j}} \right)^{2} \qquad (2)$$
where $s_{i,j}$ is the uncertainty of the concentration of species j in sample i, n is the number of samples, and m is the number of species. Unlike other receptor models (e.g., chemical mass balance), the PMF solves Equation (1) without requiring prior knowledge of the number and type of sources that contribute to the chemical characteristics of the samples. Relying only on two input files, the sample species concentration data and the sample species uncertainty data, the PMF solves the equation for a given number of factors p, concurrently estimating the factor contributions (G) and the factor profiles (F). Sample species uncertainty can be derived from actual uncertainty data of the analytical determination or be estimated through an equation-based approach from specific parameters, such as the detection limit (DL) of the measurement method [42][43][44].
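As an illustration of how the objective function relates to the factorization, the following minimal sketch evaluates Q for fixed G and F on a toy dataset. It does not perform the minimization carried out by the PMF software; the function name `pmf_objective` and all numbers are hypothetical and chosen only so the result can be checked by hand.

```python
def pmf_objective(X, G, F, S):
    """Return Q = sum over i, j of (e_ij / s_ij)^2,
    where e_ij = x_ij - sum_k g_ik * f_kj."""
    n, m, p = len(X), len(X[0]), len(F)
    q = 0.0
    for i in range(n):
        for j in range(m):
            modeled = sum(G[i][k] * F[k][j] for k in range(p))
            residual = X[i][j] - modeled
            q += (residual / S[i][j]) ** 2
    return q


# Toy example (hypothetical values): 2 samples, 2 species, 1 factor.
X = [[2.0, 4.0], [1.0, 2.5]]   # observed concentrations
G = [[2.0], [1.0]]             # factor contributions (n x p)
F = [[1.0, 2.0]]               # factor profile (p x m)
S = [[0.5, 0.5], [0.5, 0.5]]   # uncertainties

q_value = pmf_objective(X, G, F, S)
print(q_value)  # only one residual (0.5) is non-zero, so Q = (0.5 / 0.5)^2 = 1.0
```

Because each residual is divided by its uncertainty, species with large uncertainties contribute less to Q than precisely measured ones.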
In this study, 63 significant species of the measured 103 VOCs were selected for the PMF 5.0 model runs (Table 2). Following the recommendations of the PMF user guide [45], the missing concentration data in the time series of selected species were replaced with the median values of the data distributions, while the related uncertainty was set equal to four times the median. The uncertainty determination followed the description by Polissar et al. [42] and Yuan et al. [46]. Outliers (values higher than 5% of all samples) were excluded from the dataset and flagged as missing values. Concentrations less than or equal to the instrumental detection limit were substituted by half the DL, and the corresponding uncertainty was set to two times the DL. The uncertainty of each species is determined using Equation (3), where $U_{i,j}$ is the uncertainty for sample i and species j, and $c_{i,j}$ is the concentration of species j in sample i.
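The substitution rules for missing values and values at or below the detection limit can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `prepare_series` is hypothetical, the median is assumed to be taken over all non-missing values of the species, and the uncertainty of ordinary values is passed in as an input because Equation (3) is not reproduced here.

```python
from statistics import median


def prepare_series(values, detection_limit, base_uncertainty):
    """Apply the substitution rules to one species' time series.

    values: concentrations, with None marking a missing value.
    base_uncertainty: uncertainties for ordinary values (hypothetical input,
    standing in for Equation (3)); entries for missing values are ignored.
    """
    valid = [v for v in values if v is not None]
    med = median(valid)  # assumption: median over all non-missing values
    conc, unc = [], []
    for v, u in zip(values, base_uncertainty):
        if v is None:
            # Missing value: median concentration, uncertainty = 4 x median.
            conc.append(med)
            unc.append(4 * med)
        elif v <= detection_limit:
            # At or below DL: half the DL, uncertainty = 2 x DL.
            conc.append(detection_limit / 2)
            unc.append(2 * detection_limit)
        else:
            conc.append(v)
            unc.append(u)
    return conc, unc


# Hypothetical series in ppb with one missing value and one value below DL.
conc, unc = prepare_series(
    values=[1.0, None, 0.01, 3.0],
    detection_limit=0.05,
    base_uncertainty=[0.1, None, None, 0.3],
)
print(conc)
print(unc)
```

Assigning large uncertainties to substituted values reduces their weight in the objective function, so they influence the solution less than measured data.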

Method for Selection of Factor Number
The final goal of the PMF runs is the determination of the number of factors, where factors refer to the sources of emission, the chemical composition of each factor, and the contribution of each factor to the sample, while minimizing the residuals. In general, source apportionment techniques do not recognize a single source but rather source categories (for instance, traffic exhaust or biomass burning) whose emissions are characterized by specific markers in their chemical composition profiles.
As the PMF solution depends on the number of factors used to initialize the model run, this choice strongly affects source apportionment analysis. Additionally, the factors resulting from PMF have to be associated with emission sources characterized by the related chemical compositions. In general, increasing the number of factors decreases the error of the estimated concentrations: the higher the number of factors, the better the explanation of the observed concentration data but, at the same time, the more difficult it is to associate each factor with a corresponding proper source. Emission inventory data may be useful to guide the choice of the number of factors. In our case, however, a quantitative estimation of VOC emissions was not available, and the indication of a wide variety of industrial activities potentially responsible for VOC emissions was the only available information. Thus, in order to make the choice of the number of factors as little arbitrary as possible, we developed a multiple-indicator approach based on the values of three different statistical indicators of PMF model performance: (i) the correlation coefficients between observed and model-reconstructed concentrations for a single VOC; (ii) an overall correlation coefficient for all the VOCs considered in PMF runs; and (iii) the normalized absolute error between observed and reconstructed concentrations for the entire VOC dataset.
The first indicator is intended to assess the PMF performance in properly reconstructing the observed time trend of a single VOC species. In practice, we set a minimum threshold value ($r^2_{\min} = 0.8$) for the correlation coefficient between observed and reconstructed concentrations and, for each model run, counted the number of species ($N_p$) with an actual coefficient ($r^2_{act,p}$) greater than the threshold. The number $N_p$ of species with $r^2_{act,p} > r^2_{\min}$ is expected to increase as PMF model runs are initialized with an increasing number of factors p.
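A minimal sketch of the first indicator is given below. It assumes that the coefficient is the squared Pearson correlation between observed and reconstructed time series; the function names and the two toy species are hypothetical.

```python
def r_squared(obs, pred):
    """Squared Pearson correlation between observed and reconstructed series."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    sxy = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    sxx = sum((o - mean_o) ** 2 for o in obs)
    syy = sum((p - mean_p) ** 2 for p in pred)
    return sxy * sxy / (sxx * syy)


def count_well_reconstructed(r2_by_species, threshold=0.8):
    """Return N_p: the number of species with r^2 above the threshold."""
    return sum(1 for r2 in r2_by_species.values() if r2 > threshold)


# Hypothetical observed and reconstructed series for two species.
observed = {"species_a": [1.0, 2.0, 3.0, 4.0], "species_b": [1.0, 2.0, 3.0, 4.0]}
reconstructed = {"species_a": [1.1, 1.9, 3.2, 3.8], "species_b": [2.0, 1.0, 2.0, 1.0]}

r2 = {name: r_squared(observed[name], reconstructed[name]) for name in observed}
n_p = count_well_reconstructed(r2)
print(r2, n_p)  # species_a is well reconstructed, species_b is not
```

Repeating the count for each PMF run gives one $N_p$ value per number of factors p.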
The second indicator, instead of considering $r^2_{act,p}$ values for a single VOC, relies on a weighted average correlation coefficient $r^2_{avg,p}$ calculated with Equation (4):
$$r^2_{avg,p} = \frac{\sum_{j=1}^{N} r^2_{act,p,j} \, C_{avg,j}}{\sum_{j=1}^{N} C_{avg,j}} \qquad (4)$$
where $r^2_{act,p,j}$ is the coefficient of correlation between observed and reconstructed concentrations for species j resulting from a PMF run with p factors, $C_{avg,j}$ is the average concentration of species j, and N is the number of species considered in PMF runs (N = 63 in this study). This indicator is intended to assess overall PMF performance in reconstructing the time patterns of the entire VOC dataset, giving larger relative importance to the most abundant species through concentration-based weights. As for the first indicator, larger values of $r^2_{avg,p}$ are expected in PMF runs initialized with a larger number of factors.
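The concentration-based weighting can be sketched as below, assuming the indicator is the mean of the per-species coefficients weighted by average concentration. The function name and the values for the two species are hypothetical and serve only to show the effect of the weights.

```python
def weighted_average_r2(r2_by_species, avg_conc_by_species):
    """Concentration-weighted mean of per-species r^2 values."""
    total_conc = sum(avg_conc_by_species[name] for name in r2_by_species)
    weighted_sum = sum(
        r2_by_species[name] * avg_conc_by_species[name] for name in r2_by_species
    )
    return weighted_sum / total_conc


# Hypothetical inputs: an abundant, well-reconstructed species and a
# less abundant, poorly reconstructed one.
r2_values = {"ethane": 0.9, "toluene": 0.6}
avg_conc = {"ethane": 3.0, "toluene": 1.0}  # ppb

r2_avg = weighted_average_r2(r2_values, avg_conc)
print(r2_avg)  # (0.9 * 3.0 + 0.6 * 1.0) / 4.0 = 0.825
```

The unweighted mean of the two coefficients would be 0.75; the weighted value is higher because the abundant species dominates.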
The third indicator, intended to assess the accuracy of the model reconstruction of the observed concentration values for the entire dataset, is the normalized absolute error ($NAE_p$) calculated according to Equation (5), where $C_{obs,i}$ is the observed concentration of species i and $C_{pred,i}$ is the corresponding concentration reconstructed by the PMF model with p factors. Contrary to the two previous indicators, $NAE_p$ is expected to decrease as p increases because of the enhanced ability of the model in data reconstruction. Performing different PMF simulations, initialized with an increasing number of factors p (in our case we considered p = 4, 5, 6, 7, and 8), leads to a set of p-dependent indicators ($N_p$, $r^2_{avg,p}$, and $NAE_p$) that account for both the reproduction of the time pattern and the accuracy of the modeled concentrations. The comparative analysis of the behavior of the indicator sets in relation to the number of factors, and in particular the identification of the point beyond which additional factors bring only a marginal improvement in data reconstruction, guides the selection of a number of factors (i.e., of sources) that is acceptable as a mathematical solution and, most of all, reasonable for the environmental interpretation of the results (i.e., source identification).

General Pattern of VOC Concentrations
The time pattern of NMHC, halocarbon, and carbonyl concentrations observed during the monitoring period in Wuhan is presented in Figure 2. In general, the concentration of NMHCs is about one order of magnitude higher than that of the halocarbons and carbonyls. The average concentration of NMHCs was 31 ppb. Halocarbon and carbonyl concentrations were in the order of a few tenths of ppb but mostly below 20 ppb. The average concentration of the halocarbons was 5.4 ppb, and that of the carbonyls was 4.5 ppb. Concentrations of NMHCs, halocarbons, and carbonyls show peaks around 16 and 25 May; these peaks are influenced by the local weather type and are more pronounced than the diurnal effect (see Figure 2). During the period of 16 to 25 May, the atmosphere was controlled by subtropical high pressure and, at the local scale, the air was strongly stagnant [47]; local emissions therefore accumulated, which resulted in a notable increase in all three groups.
The most abundant VOCs during the monitoring period were the lightest alkanes and alkenes (ethane, propane, n-butane, and ethylene) together with acetylene and the lightest, single-ring aromatics (benzene and toluene), which had period-averaged concentrations in the 1-5 ppb range. Table 3 compares the concentrations of these VOCs in Wuhan with literature data from urban areas in China and other countries. Even though the reported concentration levels depend on several factors (e.g., monitoring season, monitoring site location, exposure to emission sources, and analytical methods) [48], they allow for contextualization of the Wuhan data, showing substantial agreement with those from other Chinese cities. In particular, the comparison shows that the concentration levels of benzene and toluene in Wuhan are similar to those concurrently measured in Beijing in the same period of May 2014.

Factor Selection for PMF Runs
Figure 3 shows the results for the sets of the three statistical indicators obtained by PMF simulations initialized with four, five, six, seven, and eight factors, respectively.
The minimum number of species $N_p$ with correlation coefficients greater than the threshold ($r^2_{\min} = 0.8$) is $N_4 = 10$, obtained with the 4-factor simulation; the maximum is $N_8 = 22$, obtained with the 8-factor simulation (Figure 3, bottom panel). As expected, $N_p$ becomes larger as p increases, but the trend is not linear. The transition from four to five factors leads to a sharp increase in $N_p$ (from 10 to 16), whereas further increases in the number of factors result in a more gradual increase of $N_p$: passing from six to seven factors increases $N_p$ by one unit (18 to 19), while going from seven to eight factors increases it from 19 to 22.
The weighted average correlation coefficient $r^2_{avg,p}$ ranges between 0.70 and 0.78 (Figure 3, middle panel), with an increasing trend as p increases. However, the transition from four to five factors does not imply any significant change in the indicator ($r^2_{avg,4} \approx r^2_{avg,5}$), whose notable increase occurs only when shifting from six ($r^2_{avg,6} = 0.72$) to seven ($r^2_{avg,7} = 0.76$) factors. Including one additional factor in the PMF simulation leads to only a small further increase in the indicator value (from 0.76 to 0.78).
Contrary to the previous indicators, $NAE_p$ shows a declining trend for an increasing number of factors (Figure 3, top panel), from $NAE_4 = 3.85$ down to $NAE_8 = 2.56$. However, while the gain in PMF accuracy is rather limited from four to six factors, a clear improvement is obtained when shifting from six to seven factors, with $NAE_p$ passing from $NAE_6 = 3.51$ to $NAE_7 = 2.64$ (−25%); conversely, shifting from seven to eight factors does not bring an important improvement ($NAE_8 = 2.56$, only 3% less than $NAE_7$).
The comparative analysis of the behavior of the indicators in relation to the number of factors supports the 7-factor solution as mathematically reasonable. Too few factors may lead to an underestimation of the emissions in spring in the area, whereas too many factors can prevent the correct identification of sources; our final choice therefore fell on the 7-factor solution.
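The marginal improvements discussed above can be checked with a few lines of arithmetic. The sketch below uses only the indicator values quoted in the text ($N_p$ for p = 4 to 8 and $NAE_p$ for p = 6 to 8); the variable and function names are hypothetical.

```python
# Indicator values as reported in the text for PMF runs with p factors.
n_p = {4: 10, 5: 16, 6: 18, 7: 19, 8: 22}
nae = {6: 3.51, 7: 2.64, 8: 2.56}


def relative_change_percent(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Gain in N_p obtained by adding one factor.
n_p_gain = {p: n_p[p] - n_p[p - 1] for p in sorted(n_p) if p - 1 in n_p}

# Relative change in NAE obtained by adding one factor.
nae_change = {
    p: relative_change_percent(nae[p - 1], nae[p])
    for p in sorted(nae)
    if p - 1 in nae
}

print(n_p_gain)  # {5: 6, 6: 2, 7: 1, 8: 3}
print({p: round(c, 1) for p, c in nae_change.items()})  # {7: -24.8, 8: -3.0}
```

The computed changes in NAE (about −24.8% and −3.0%) agree with the rounded figures of −25% and 3% quoted in the text.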

Source Identification by PMF
The profiles of the factors resulting from the 7-factor solution are shown in Figure 4. The concentration of each species apportioned to each factor is shown in blue, and the percentage of each species explained by the factor is shown by a red square. The concentrations, read on the left y axis, are expressed on a logarithmic scale, and the percentage of species explained by the factor is read on the right y axis. The seven factors have been identified as sources of VOCs on the basis of the resulting emission profile markers, as explained below.

Figure 4. Source profiles. The blue bars represent the concentration and red dots represent the percentage of each species explained by the factor.
The first factor is assigned to industrial use of solvents. This factor explains the presence of TEX (toluene 15.4%, ethylbenzene 51.6%, and xylene 40.2%) as well as C6 and C7 alkanes (cyclohexane 49%, methylcyclopentane 51.7%, 2-methylpentane 51.9%, and 3-methylpentane 53.3%). All these organic compounds are commonly used as solvents in industrial processes [32,46]. The main industrial sources of VOCs in Wuhan are the manufacture of cars, printing, and the production of furniture, shoes, and toys. VOCs explained by this factor are mainly related to paints and the use of adhesives in the production processes [46].
The second factor explains mainly ethylene (58.6%) and toluene (52.0%), which are associated with the combustion of coal [32,46,53]. In China, coal is the dominant source of energy [54]. Coal combustion is also an important VOC source [55].
The third factor is associated with the exhaust gases of motor vehicles, identified by specific tracers such as ethylene, toluene, benzene, acetylene, and other aromatics and alkanes (propane, n-butane, etc.) [56]. This source, identified as vehicular emissions, explains 14.1% of ethylene, 53.7% of propane, and 35.4% of acetylene. These VOCs, indicated as tracers of emissions from vehicles, are consistent with other PMF-based studies for Los Angeles [53], Shanghai [57], Tianjin [58], and Houston [59], and with studies based on other receptor models, such as that in Turkey reported by Dumanoglu et al. [25].
The fourth factor is characterized by the dominant presence of two specific VOCs, trans-1,3-dichloropropene (90.8%) and 1,4-dichlorobenzene (73.7%). These two species belong to the family of chlorinated VOCs and are commonly used as pesticides.
The fifth factor explains n-butane (39.4%), trans-2-butene (72.5%), and 1-butene (50.7%). The combination of these species is typically found in the combustion gas of liquefied petroleum gas (LPG) [59], as also reported in other works in China [46,57]. Actually, in Wuhan there are no vehicles using LPG, but LPG is used in catering for domestic use is very popular in urban areas [32].
The sixth factor is associated with industrial emissions, because it explains a large share of Freon 22 (81.0%), acrolein (77.7%), acetonitrile (70.5%), and methylvinylketone (73.3%). Freon 22 has been commonly used as a fuel, coolant, and as a versatile intermediate in the chemical industries. Acrolein is used in manufacturing plastics and synthetic rubber, and is an important and versatile intermediate for the chemical industry [49]. Acetonitrile is an important solvent in the chemical industries [60], and with its increasingly wide use in industrial sectors such as pharmaceuticals, solvents, and chromatography, the public is paying more and more attention to its environmental presence [61]. Methylvinylketone is used as a chemical reagent.
The seventh factor is associated with the use of refrigerants, as it explains the presence of Freon 11 (53.9%), the first widely used cooling fluid, and of chloromethane (64.8%), which was widely used as a coolant in the past. Given the risks related to its contribution to climate change and ozone depletion, its use has been reduced in most countries, but it is still detectable in Wuhan as a source contributor. There could still be some emission, so its presence might not be due only to the residual left by its long atmospheric lifetime.
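The percentages quoted for each factor above (for example, the share of a tracer species explained by one factor) can be read as the fraction of a species' modelled concentration that is attributed to that factor. The following is a minimal sketch of that calculation, assuming the factor profiles are already expressed in concentration units; the function name and the toy profiles are hypothetical and are not taken from the study.

```python
def percent_of_species(profiles):
    """Share (%) of each species' modelled concentration assigned to each factor.

    profiles: dict mapping factor name -> dict mapping species -> concentration.
    All concentrations are assumed to be in the same units.
    """
    species = sorted({s for prof in profiles.values() for s in prof})
    # Total modelled concentration of each species, summed over all factors.
    totals = {s: sum(prof.get(s, 0.0) for prof in profiles.values()) for s in species}
    shares = {}
    for factor, prof in profiles.items():
        shares[factor] = {
            s: (100.0 * prof.get(s, 0.0) / totals[s]) if totals[s] > 0 else 0.0
            for s in species
        }
    return shares


# Hypothetical two-factor example (illustrative numbers only).
toy_profiles = {
    "factor_A": {"species_1": 0.9, "species_2": 0.1},
    "factor_B": {"species_1": 0.1, "species_2": 0.3},
}
toy_shares = percent_of_species(toy_profiles)
```

In this toy case, factor_A carries most of species_1 and factor_B most of species_2, which is the kind of pattern used above to label a factor by its dominant tracers.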
The contributions of the emission sources to the observed VOC concentrations were also calculated. The primary source is vehicular emissions (45.4%), which is comparable to that in the Pearl River Delta (PRD, ~50%) [13] and Beijing (57.7%) [62], and was over 40% in a French urban area ten years ago [30]. Other dominant sources are industrial emissions (22.5%) and the combustion of coal (14.7%). Sources that contribute less than 10% include LPG combustion (9.7%), industrial solvents (4.4%), and pesticide use (3.3%). The contribution from the use of refrigerants is less than 0.05%, but it is worth noting because of the high toxicity risks associated with compounds emitted by this source. Lyu et al. [32] conducted measurements over all four seasons (February 2013 to October 2014) and found that vehicular exhausts (27.8 ± 0.9%), coal burning (21.8 ± 0.8%), and LPG (19.8 ± 0.9%) were the main contributors to VOCs in Wuhan; industrial solvents and pesticide use were not reported.
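As a quick consistency check on the reported shares, the six contributions quoted to one decimal place can be summed to see how much is left for the refrigerant source. This is only an arithmetic sketch using the percentages stated in the text; the variable names are illustrative.

```python
# Source contributions (%) as reported in the text, to one decimal place.
reported = {
    "vehicular emissions": 45.4,
    "industrial emissions": 22.5,
    "combustion of coal": 14.7,
    "LPG combustion": 9.7,
    "industrial solvents": 4.4,
    "pesticide use": 3.3,
}

# Sum of the six reported sources, rounded to the reporting precision.
total_six = round(sum(reported.values()), 1)

# Share left for the seventh (refrigerant) source at this precision.
residual = round(100.0 - total_six, 1)

# Sources ranked from largest to smallest contribution.
ranking = sorted(reported, key=reported.get, reverse=True)
```

At one-decimal precision the six sources already account for the whole total, which is consistent with a refrigerant contribution below 0.05%.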
The results of the source apportionment were represented using polar plots that show the association between the contribution of the sources generated by PMF and the origin of the air masses [63]. The results are graphically displayed in the panels of Figure 5, with computed source contributions in color scale as a function of wind speed and wind direction on an hourly basis.
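The idea behind such a plot can be sketched as a binning step: each hourly source contribution is assigned to a wind-direction sector and a wind-speed bin, and the cell is coloured by a summary statistic. The sketch below assumes simple cell means rather than the smoothed surface that plotting packages typically fit; the sector width, the speed step, the function name, and the sample records are all hypothetical.

```python
from collections import defaultdict

COMPASS_16 = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def bin_by_wind(records, sector_deg=22.5, speed_step=1.0):
    """Mean contribution per (direction sector, speed bin) cell.

    records: iterable of (wind_dir_deg, wind_speed_m_s, contribution).
    Returns a dict mapping (sector_index, speed_bin_index) -> mean contribution.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n_sectors = int(round(360.0 / sector_deg))
    for wind_dir, wind_speed, contribution in records:
        # Centre sectors on the compass points, so sector 0 (N) spans
        # 348.75 to 11.25 degrees when sector_deg is 22.5.
        sector = int(((wind_dir % 360.0) + sector_deg / 2.0) // sector_deg) % n_sectors
        speed_bin = int(wind_speed // speed_step)
        sums[(sector, speed_bin)] += contribution
        counts[(sector, speed_bin)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical hourly records: (direction in degrees, speed in m/s, contribution).
toy_records = [
    (337.5, 5.5, 4.0),  # NNW, 5 to 6 m/s
    (340.0, 5.2, 6.0),  # NNW, 5 to 6 m/s
    (45.0, 4.1, 2.0),   # NE, 4 to 5 m/s
    (355.0, 1.0, 1.0),  # N, 1 to 2 m/s
]
cell_means = bin_by_wind(toy_records)
```

Cells with high mean contributions at high wind speeds point to more distant sources in that direction, whereas high values at low wind speeds suggest local emissions.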
The sources identified as the industrial use of solvents and industrial emissions give their largest contributions when winds blow from the north northwest and northwest of the monitoring site, with wind speeds ranging from 5 to 7 m/s (Figure 5a,f). The source associated with the combustion of coal (Figure 5b) is confined to wind directions between west southwest and south southwest and is associated with lower wind speeds.
The source identified as vehicular emissions (Figure 5c) is associated with westerly winds and is spread around the monitoring station mainly from south to west northwest, but also up to the north. The area west of the monitoring site contains a dense road system accounting for a high mileage of the Wuhan road network, including two high-capacity ring roads. The association of the largest contributions of this source with rather low wind speeds confirms the very local origin of the traffic source. Figure 5d shows that the pesticide source contributes most from the northeast of the monitoring site, particularly at wind speeds of approximately 4 m/s. The LPG source (Figure 5e) has the same origin. The refrigerant source (Figure 5g) provides the lowest contribution among the sources in Wuhan; it is more pronounced to the southwest.

Conclusions
In this study, the emission sources of VOCs in Wuhan, the biggest city in Central China, have been investigated using the positive matrix factorization (PMF) model. A multiple-indicator method, based on the marginal improvement in data reconstruction as the number of factors used to initialize the PMF runs increases, has been developed to select the proper number of factors. This method suggested a 7-factor PMF solution, and each of the seven factors could be associated with an emission source based on its VOC source profile. The seven identified sources are vehicular emissions (45.4%), industrial emissions (22.5%), combustion of coal (14.7%), LPG combustion (9.7%), industrial use of solvents (4.4%), pesticides (3.3%), and use of refrigerants. The vehicular emissions source profile shows a high abundance of ethylene, toluene, benzene, acetylene, and other aromatics and alkanes, which are typical VOCs emitted from the exhaust gases of motor vehicles.
The industrial emissions source profile shows high contributions of Freon 22, acrolein, acetonitrile, and methylvinylketone, all typical intermediate products and process materials in the chemical industry. The profile of the coal combustion source is characterized by the strong presence of ethylene and toluene, whereas butane and butene characterize the profile of the LPG combustion source, which is mainly related to catering and domestic use, both very popular in Wuhan.
The origins of the sources identified by PMF were examined using polar plots. The results indicate that the sources identified as industrial solvents and industrial emissions dominate in the area between the north northwest and northwest. The source associated with the combustion of coal is confined to the zone between west southwest and south southwest of the monitoring station. The source associated with vehicular emissions is spread around the monitoring point but with a predominant westerly component, whereas the sources of pesticides and liquefied petroleum gas are more associated with winds from east of the monitoring site. These findings can be used to track the source origins for the development of an emission reduction strategy in Wuhan, and the method can be applied in other cities suffering from air pollution.
The newly developed multiple-indicator method is independent of the type and number of species input to the PMF model. Each of the indicators provides robust values to compare, which minimizes the influence of user experience on the source apportionment. The method was developed specifically to meet the increased demand for VOC source identification in China, but it can be used for source apportionment of any kind of species by the PMF model.
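To illustrate how indicators of this kind can be compared across candidate factor numbers, the sketch below computes three reconstruction indicators from observed and modelled concentrations and then picks the factor number after which the marginal gain falls below a tolerance. The exact definitions used in the study are not restated in this section, so the formulas here (concentration-weighted Pearson correlation, share of species above a fixed correlation threshold, and squared error normalized by the squared observations), the function names, the threshold values, and the toy numbers are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def indicators(observed, modelled, r_min=0.6):
    """observed, modelled: dict species -> list of concentrations (same order)."""
    r = {s: pearson(observed[s], modelled[s]) for s in observed}
    mean_conc = {s: sum(v) / len(v) for s, v in observed.items()}
    # Assumed weighting: each species' r is weighted by its mean concentration.
    weighted_r = sum(r[s] * mean_conc[s] for s in r) / sum(mean_conc.values())
    # Share of species whose correlation reaches the fixed minimum threshold.
    share_ok = sum(1 for s in r if r[s] >= r_min) / len(r)
    sq_err = sum((o - m) ** 2 for s in observed
                 for o, m in zip(observed[s], modelled[s]))
    sq_obs = sum(o ** 2 for s in observed for o in observed[s])
    return {"share_above_r_min": share_ok,
            "weighted_r": weighted_r,
            "normalized_error": sq_err / sq_obs}


def first_plateau(values_by_p, min_gain):
    """Smallest factor number p for which going to the next p gains < min_gain."""
    ps = sorted(values_by_p)
    for p, nxt in zip(ps, ps[1:]):
        if values_by_p[nxt] - values_by_p[p] < min_gain:
            return p
    return ps[-1]


# Hypothetical data: a perfect reconstruction of two species.
toy_obs = {"a": [1.0, 2.0, 3.0], "b": [2.0, 1.0, 4.0]}
perfect = indicators(toy_obs, toy_obs)

# Hypothetical weighted-r values for candidate factor numbers 2 to 6.
toy_weighted_r = {2: 0.60, 3: 0.75, 4: 0.88, 5: 0.885, 6: 0.89}
chosen_p = first_plateau(toy_weighted_r, min_gain=0.02)
```

In this toy series the gain from four to five factors is small, so four factors would be retained. In practice each indicator would be computed from a separate PMF run per candidate factor number and the indicators compared together.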