The Study of Characteristic Environmental Sites Affected by Diverse Sources of Mineral Matter Using Compositional Data Analysis

Compositional data analysis was applied to the concentrations of mineral elements (Al, Ti, Si, Ca, Mg, Fe, Sr) in simultaneous PM10, PM2.5 and PM1 measurements at three characteristic environmental sites: a kerbside, a background and a rural site. Different possible sources of mineral trace elements affecting the PM at the considered sites were highlighted. In particular, the results show that compositional data analysis allows for the assessment of chemical/physical differences between the mineral element concentrations of PM. These differences can be associated with both the different kinds of mineral sources involved and the different mechanisms of accumulation/dispersion of PM at the considered sites.


Introduction
Particulate matter (PM) is a mixture of airborne particles with different masses, sizes, shapes and surface areas, characterized by many chemical elements and compounds. Interest in PM has increased widely in the last several decades owing to its detrimental effects on public health and the environment [1–3]. When inhaled, PM particles can be deposited within the respiratory system and can act as a universal carrier of a wide variety of potentially toxic chemical substances [4,5]. It is well known that acute and chronic exposure to PM is associated with adverse effects on the cardiovascular and respiratory systems [6]. PM is also associated with the degradation of air quality and visibility as well as with changes in Earth's climate [7,8]. PM particles can affect solar radiation as it passes through the atmosphere because they scatter and absorb light, and they can alter the radiative properties and lifetime of clouds, with a possible effect on incoming and outgoing solar radiation and on the overall energy balance of Earth [9–11].
In light of the significance of PM, effective strategies to mitigate PM levels in the air, and thereby protect the environment and public health, require a detailed assessment of the possible PM emission sources in relation to its chemical composition [12]. In the European context, selected groups of chemical elements and compounds have been linked to specific natural and anthropogenic sources of PM: Al, Si, Ca, Fe, Ti, Mg and Sr to crustal and mineral matter and African dust; Na, Cl and Mg to marine sources and sea spray; V and Ni to fuel-oil combustion; and SO₄²⁻, NO₃⁻ and NH₄⁺ to secondary aerosol and long-range transport [13]. However, identifying an element or a group of elements that can be unequivocally attributed to specific natural or anthropogenic sources of mineral matter has proven problematic. Sources of mineral matter, including desert dust, crustal dust raised by road traffic and farming activities, and dust from demolition and construction activities, can share the same range of chemical elements. To date, the identification and characterization of the different sources of mineral matter contributing to the atmospheric PM mixture remain inadequate and require further research, as pointed out in the literature [13–15].
Meanwhile, several authors have focused their studies on simultaneous measurements of the PM size fractions PM10, PM2.5 and PM1 (i.e., particles with aerodynamic diameters below 10, 2.5 and 1 µm, respectively). Because PM particles of different sizes are emitted into the air from different sources and have different physical and chemical characteristics, the assessment of the PM size fractions can provide important data on the formation of PM and the identification of its sources. The coarse size fraction (i.e., particles with aerodynamic diameters between 2.5 and 10 µm) originates mainly from both natural and anthropogenic sources, including desert dust, volcanic eruptions, sea spray, fugitive dust from paved and unpaved roads, and demolition and construction activities [16–18]. The fine size fraction (PM2.5) and the submicron size fraction (PM1) originate mainly from anthropogenic sources such as industrial activities, road traffic, different kinds of combustion processes and secondary particles generated in the atmosphere [19–21]. Simultaneous measurements of PM10, PM2.5 and PM1 have been carried out at a range of environmental sites, such as a traffic point [22], urban traffic and suburban background sites [23], a Nordic background site and wildfire episodes [24], urban background and rural sites [25,26], industrial sites [27,28], a superstation site [29], urban areas and surrounding sites [30], and suburban sites and regional background [31]. Further environmental investigations on size-segregated PM simultaneous measurements are reported elsewhere [32,33].
This preliminary study investigates the application of compositional data analysis to the mineral element concentrations of size-segregated PM simultaneous measurements. Compositional data consist of vectors whose components represent proportions or percentages of a whole. The characteristic of these vectors is that the sum of their components is a constant, equal to 1 for proportions, 100 for percentages or some other constant c. Compositional data may refer to the mineral composition of rocks, sediments, pollutant compositions, mixtures of gases, water composition, etc. Compositional data, analysed with appropriate statistical tools, have been used for interpreting environmental data as well as for characterizing processes acting in the environment. The statistical analysis of compositional data began with Aitchison [34,35] and has since undergone several developments. Today, it is considered a consolidated statistical technique [36–40].
This study is based on the mineral element concentrations of PM10, PM2.5 and PM1 simultaneous measurements sampled at three characteristic sites (i.e., kerbside, background and rural site) possibly affected by diverse sources of mineral matter [41]. The relevance of the mineral element concentrations of PM at the kerbside site was taken into consideration. The mineral element concentrations of PM from the kerbside site were compared with those from the background and rural sites [42,43].
The main objective of this preliminary study is to evaluate the essential differences between the patterns of variability of the kerbside and background site datasets, as well as between the patterns of variability of the kerbside and rural site datasets. These comparisons aim to highlight the underlying processes that influence the mineral element concentrations of PM at the considered environmental sites. The present study shows that the PM at the investigated environmental sites can be affected by different sources of mineral matter. Moreover, mechanisms of accumulation and/or dispersion of mineral matter can be observed.

Materials and Methods
Mineral element concentrations from simultaneous PM10, PM2.5 and PM1 measurements reported in the literature were considered; they refer to three characteristic monitored environmental sites, namely a rural, a background and a kerbside site. The kerbside site was placed close to a trafficked road at the side of a street canyon, the background site was placed in a residential area close to a school building, and the rural site was located away from direct emission sources, surrounded by fields. The measurements were conducted in winter (January-February). The PM samples were collected using a rotating drum impactor. The elements in the collected samples were analyzed using synchrotron radiation-induced X-ray fluorescence spectrometry [41]. The mineral elements considered were Al, Ti, Si, Ca, Mg, Fe and Sr, which are commonly interpreted as related to mineral matter [13]. In this section, the methods used for the compositional data analysis are summarized.

Compositional Data and Sample Space
Compositional data are vectors whose components are positive numbers that sum to a constant, usually 1 or 100. The sample space of a compositional observation x with three components is the simplex:
$$S_c^3 = \{\mathbf{x} = (x_1, x_2, x_3) : x_i > 0,\ x_1 + x_2 + x_3 = c\} \tag{1}$$
The simultaneous measurements PM10, PM2.5 and PM1 can be decomposed into their relative fractions, namely the coarse (see Equation (2)), intermodal (see Equation (3)) and submicron, PM1, mass concentrations [44–46]:
$$\mathrm{coarse} = \mathrm{PM}_{10} - \mathrm{PM}_{2.5} \tag{2}$$
$$\mathrm{intermodal} = \mathrm{PM}_{2.5} - \mathrm{PM}_{1} \tag{3}$$
These fractions can be converted into compositions based on proportions by weight [40,47]:
$$\mathbf{x} = (x_1, x_2, x_3) = \frac{100}{\mathrm{PM}_{10}}\,(\mathrm{coarse},\ \mathrm{intermodal},\ \mathrm{PM}_{1}) \tag{4}$$
The compositional variables of this vector, x, are positive and sum to the constant c = 100 (see Equation (1)). The compositional data related to the simultaneous sampling of PM10, PM2.5 and PM1 for various mineral elements can be cast into the form of a matrix X, with i rows representing the mineral elements and j columns representing the coarse, intermodal and submicron size fractions. This matrix, a three part compositional dataset, expresses the mineral element concentrations of PM with respect to the coarse, intermodal and submicron size fractions in %.
The following matrix for the simultaneous measurements PM10, PM2.5 and PM1 of the elements Al, Ti, Si, Ca, Mg, Fe, Sr was considered at each characteristic environmental site (i.e., rural, kerbside and background):
$$\mathbf{X} = \begin{pmatrix} x_{\mathrm{Al},1} & x_{\mathrm{Al},2} & x_{\mathrm{Al},3} \\ x_{\mathrm{Ti},1} & x_{\mathrm{Ti},2} & x_{\mathrm{Ti},3} \\ \vdots & \vdots & \vdots \\ x_{\mathrm{Sr},1} & x_{\mathrm{Sr},2} & x_{\mathrm{Sr},3} \end{pmatrix} \tag{5}$$
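The decomposition into size fractions and the conversion to percentages can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the two example concentrations are hypothetical, and the sketch assumes that PM10 > PM2.5 > PM1 > 0 for every element.

```python
def closure(parts, c=100.0):
    """Rescale positive parts so that they sum to the constant c."""
    total = sum(parts)
    return [c * p / total for p in parts]


def size_fraction_composition(pm10, pm25, pm1, c=100.0):
    """Return (coarse, intermodal, submicron) as a composition summing to c."""
    coarse = pm10 - pm25       # particles between 2.5 and 10 um
    intermodal = pm25 - pm1    # particles between 1 and 2.5 um
    submicron = pm1            # particles below 1 um
    if min(coarse, intermodal, submicron) <= 0:
        raise ValueError("expected PM10 > PM2.5 > PM1 > 0")
    return closure([coarse, intermodal, submicron], c)


# Hypothetical element concentrations (PM10, PM2.5, PM1), NOT values from the study.
example = {"Al": (120.0, 45.0, 12.0), "Si": (300.0, 100.0, 20.0)}

# One row per element, one column per size fraction, as in the matrix X.
dataset = {el: size_fraction_composition(*values) for el, values in example.items()}
# dataset["Al"] is (62.5, 27.5, 10.0) up to floating-point rounding
```

Because the three fractions add up to PM10, the closure to c = 100 is equivalent to dividing each fraction by PM10 and multiplying by 100.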

Transformation of Compositional Data
Compositional data in Equation (4) are constrained to a constant sum. The statistical analysis of compositional data therefore requires an approach based on log-ratio transformations, by which a composition is represented as a real vector. The compositional data are transformed into coordinates using the isometric log-ratio (ilr) transformation [48,49] (see Equation (6)), which maps each three-part composition to two coordinates, ilr1 and ilr2. The matrix for the PM10, PM2.5 and PM1 simultaneous measurements of the elements Al, Ti, Si, Ca, Mg, Fe, Sr was thus considered at each characteristic environmental site (i.e., rural, kerbside and background) in terms of log-ratios (see Equation (7)), with one pair (ilr1, ilr2) per element. The isometric coordinates ilr1 and ilr2 can be inverse transformed to a composition (see Equation (8)) by means of the closure operation C, defined for a vector x in Equation (9). This operation divides each component of the vector x by the sum of its components, hence scaling the vector to the constant c:
$$C(\mathbf{x}) = \left(\frac{c\,x_1}{x_1 + x_2 + x_3},\ \frac{c\,x_2}{x_1 + x_2 + x_3},\ \frac{c\,x_3}{x_1 + x_2 + x_3}\right) \tag{9}$$
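As an illustration of the transformation and its inverse, the sketch below uses one common orthonormal basis for three-part compositions, ilr1 = ln(x1/x2)/√2 and ilr2 = ln(x1·x2/x3²)/√6. This basis is an assumption made here for the example; the basis actually used in the study is the one given in [48,49] and may differ in the order or sign of the coordinates. The function names are hypothetical.

```python
import math


def closure(parts, c=100.0):
    """Rescale positive parts so that they sum to the constant c."""
    total = sum(parts)
    return [c * p / total for p in parts]


def ilr(x):
    """Map a three-part composition to two ilr coordinates (assumed basis)."""
    x1, x2, x3 = x
    z1 = math.log(x1 / x2) / math.sqrt(2.0)
    z2 = math.log(x1 * x2 / x3 ** 2) / math.sqrt(6.0)
    return z1, z2


def ilr_inverse(z, c=100.0):
    """Map two ilr coordinates back to a composition summing to c."""
    z1, z2 = z
    # Centred log-ratio coefficients obtained from the same assumed basis.
    y1 = z1 / math.sqrt(2.0) + z2 / math.sqrt(6.0)
    y2 = -z1 / math.sqrt(2.0) + z2 / math.sqrt(6.0)
    y3 = -2.0 * z2 / math.sqrt(6.0)
    return closure([math.exp(y1), math.exp(y2), math.exp(y3)], c)


composition = [62.5, 27.5, 10.0]          # hypothetical composition in %
coords = ilr(composition)                 # two unconstrained real numbers
recovered = ilr_inverse(coords)           # round trip back to the simplex
neutral = ilr([100 / 3, 100 / 3, 100 / 3])  # the barycentre maps to (0, 0)
```

Whatever the basis, the ilr coordinates are unconstrained real numbers, so standard multivariate methods can be applied to them, and the barycentre of the triangular diagram always maps to the origin.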

Triangular Diagram Representation, Centering and Rescaling Technique
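The compositional datasets, their centres and confidence regions can be represented using a triangular diagram. The data are displayed following Graham and Midgley [50]. Calculations were performed using the CoDaPack software [51] and the R software [52]. Compositional data can be centered using the perturbation operator of the simplex [35].
The perturbation operation is defined as the perturbation p applied to a composition x that produces the composition v:
$$\mathbf{v} = \mathbf{p} \oplus \mathbf{x} = C(p_1 x_1,\ p_2 x_2,\ p_3 x_3) \tag{10}$$
with v, p and x vectors in $S_c^3$; C is the closure operation (see Equation (9)). Perturbing a vector x by its inverse (see Equation (11)), it is possible to locate any composition at the barycentre of the triangular diagram:
$$\mathbf{x}^{-1} \oplus \mathbf{x} = C\!\left(\frac{1}{x_1},\ \frac{1}{x_2},\ \frac{1}{x_3}\right) \oplus \mathbf{x} = \left(\frac{c}{3},\ \frac{c}{3},\ \frac{c}{3}\right) \tag{11}$$
Likewise, it is possible to centre a compositional dataset of size n:
$$\mathbf{x}_k^{*} = \mathbf{g}^{-1} \oplus \mathbf{x}_k, \qquad k = 1, \dots, n \tag{12}$$
using the inverse of its centre, $\mathbf{g}^{-1}$, defined as in Equation (13) [53,54]:
$$\mathbf{g}^{-1} = C\!\left(\Big(\prod_{k=1}^{n} x_{k1}\Big)^{-1/n},\ \Big(\prod_{k=1}^{n} x_{k2}\Big)^{-1/n},\ \Big(\prod_{k=1}^{n} x_{k3}\Big)^{-1/n}\right) \tag{13}$$
The centering and rescaling of the compositional data allow for a better visualization of compositions close to the boundary of the triangular diagram, while preserving the straight lines of the grid as well as the statistical properties [55].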
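The centering step can be sketched as follows. The sketch assumes that the centre of a dataset is the closed vector of column-wise geometric means, which is the usual definition in compositional data analysis; the function names and the small dataset are hypothetical, and the rescaling of the diagram grid is not covered.

```python
import math


def closure(parts, c=100.0):
    """Rescale positive parts so that they sum to the constant c."""
    total = sum(parts)
    return [c * p / total for p in parts]


def perturb(p, x, c=100.0):
    """Perturbation of x by p: component-wise product followed by closure."""
    return closure([pi * xi for pi, xi in zip(p, x)], c)


def inverse(x, c=100.0):
    """Inverse perturbation: closure of the component-wise reciprocals."""
    return closure([1.0 / xi for xi in x], c)


def centre(dataset, c=100.0):
    """Centre of a dataset: closed column-wise geometric means (assumed definition)."""
    n = len(dataset)
    geo_means = [
        math.exp(sum(math.log(row[k]) for row in dataset) / n) for k in range(3)
    ]
    return closure(geo_means, c)


def centred(dataset, c=100.0):
    """Perturb every composition by the inverse of the dataset centre."""
    g_inv = inverse(centre(dataset, c), c)
    return [perturb(g_inv, row, c) for row in dataset]


# Hypothetical three-part compositions in %, NOT values from the study.
data = [[70.0, 22.0, 8.0], [65.0, 27.0, 8.0], [75.0, 18.0, 7.0]]
moved = centred(data)
new_centre = centre(moved)   # lies at the barycentre, about (33.3, 33.3, 33.3)
```

After centering, the centre of the dataset coincides with the barycentre of the triangle, so compositions dominated by the coarse fraction are spread over the middle of the diagram instead of being crowded near one vertex.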

Perturbation Difference
The perturbation difference is defined as the perturbation p to which the change from a composition x to a composition y can be attributed, whatever the processes involved (see Equation (14)):
$$\mathbf{p} = \mathbf{y} \ominus \mathbf{x} = C\!\left(\frac{y_1}{x_1},\ \frac{y_2}{x_2},\ \frac{y_3}{x_3}\right) \tag{14}$$
with p, x and y vectors in $S_c^3$ [56]; C is the closure operation (see Equation (9)).

Testing Hypothesis of Multivariate Normal Distribution
The multivariate normal distribution is a generalization of the normal distribution to higher dimensions. Testing the hypothesis of a multivariate normal distribution for the compositional datasets of the rural, kerbside and background sites (see Equation (5)) is needed before testing hypotheses about the centre and covariance structure. The tests used for multivariate normality are the Anderson-Darling, Cramer-von Mises and Watson tests, applied to the log-ratio transformed dataset (see Equation (7)) [35,36].
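For a single ilr coordinate, the three empirical distribution function statistics named above can be computed as in the following sketch. It assumes that the mean and standard deviation are estimated from the sample (with the n − 1 denominator) and returns the unmodified statistics; the small-sample modifications and the critical values against which they are compared are those tabulated in the cited references and are not reproduced here. The bivariate angle test is not covered. The function name and the sample values are hypothetical.

```python
import math


def edf_statistics(values):
    """Return (Anderson-Darling, Cramer-von Mises, Watson) statistics
    for normality with estimated mean and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Normal CDF evaluated at the standardised, sorted observations.
    z = sorted(0.5 * (1.0 + math.erf((v - mean) / (sd * math.sqrt(2.0)))) for v in values)
    a2 = -n - sum(
        (2 * r - 1) * (math.log(z[r - 1]) + math.log(1.0 - z[n - r]))
        for r in range(1, n + 1)
    ) / n
    w2 = sum((z[r - 1] - (2 * r - 1) / (2.0 * n)) ** 2 for r in range(1, n + 1)) + 1.0 / (12.0 * n)
    u2 = w2 - n * (sum(z) / n - 0.5) ** 2
    return a2, w2, u2


# Hypothetical ilr1 values for seven elements, NOT values from the study.
sample = [0.42, 0.10, 0.55, -0.20, 0.31, 0.05, 0.78]
a2, w2, u2 = edf_statistics(sample)
```

Small values of the statistics are compatible with normality; the hypothesis is rejected when a statistic exceeds the tabulated critical value at the chosen significance level.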

Testing Hypothesis about the Center and the Covariance Structure
In order to evaluate whether there are real differences between two datasets in the centre, in the covariance structure, or in both, hypothesis tests about the centre and covariance structure were performed. The tests are applied to the log-ratio transformed datasets (see Equation (7)). The methods are described in [35] (pp. 153-158) and [36]. The PM size fractions and their chemical composition provide important data on the formation of PM and the identification of its sources [16–21]. The centre and the covariance structure of a compositional dataset x derived from PM simultaneous measurements are linked to the chemical composition and the distribution of the elements within the coarse, intermodal and submicron size fractions. The hypothesis tests about centre and covariance structure thus allow for the statistical evaluation of differences in chemical composition and size distribution between two datasets. Therefore, the results of these tests provide information about differences in PM sources and formation processes between the two datasets considered.
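One of these tests, the hypothesis of equal covariance structures with possibly different centres, can be illustrated with a generalized likelihood-ratio statistic. The sketch below is an assumption about the general form of such a test (maximum-likelihood covariance estimates, pooled covariance under the hypothesis, chi-square reference distribution with 3 degrees of freedom for two ilr coordinates); the exact procedure used in the study is the one described in [35]. The function names and the data are hypothetical.

```python
import math


def mean_and_cov(rows):
    """Mean vector and maximum-likelihood 2x2 covariance of (ilr1, ilr2) pairs."""
    n = len(rows)
    m = [sum(r[k] for r in rows) / n for k in range(2)]
    s = [
        [sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / n for b in range(2)]
        for a in range(2)
    ]
    return m, s


def det2(s):
    """Determinant of a 2x2 matrix."""
    return s[0][0] * s[1][1] - s[0][1] * s[1][0]


def equal_covariance_statistic(rows1, rows2):
    """Likelihood-ratio statistic for equal covariances, centres left free."""
    n1, n2 = len(rows1), len(rows2)
    _, s1 = mean_and_cov(rows1)
    _, s2 = mean_and_cov(rows2)
    pooled = [
        [(n1 * s1[a][b] + n2 * s2[a][b]) / (n1 + n2) for b in range(2)]
        for a in range(2)
    ]
    return n1 * math.log(det2(pooled) / det2(s1)) + n2 * math.log(det2(pooled) / det2(s2))


CHI2_CRITICAL_3DF_005 = 7.815  # upper 5% point of chi-square with 3 degrees of freedom

# Hypothetical ilr coordinates for seven elements, NOT values from the study.
site_a = [(0.42, 0.10), (0.10, 0.31), (0.55, -0.20), (-0.20, 0.05),
          (0.31, 0.40), (0.05, -0.12), (0.78, 0.22)]
site_b = [(a + 0.4, b - 0.1) for a, b in site_a]   # same spread, shifted centre
site_c = [(2.0 * a, 0.5 * b) for a, b in site_a]   # different spread

q_same = equal_covariance_statistic(site_a, site_b)
q_diff = equal_covariance_statistic(site_a, site_c)
```

A pure shift of the centre leaves the statistic at zero, which is why a dataset pair can have equal covariance structures and still differ in its centres.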

Results and Discussion
The difference between the mineral element concentrations of PM at the background and kerbside sites was evaluated by comparing their compositional datasets. Figure 1a shows the three part compositional datasets of the two characteristic monitored environmental sites, background and kerbside, after the application of the centering and rescaling technique. The three part compositional datasets of background and kerbside display high values of the coarse size fraction, above 55%, and low values of the intermodal and submicron size fractions, which are below about 35% and 12%, respectively. The coarse size fraction is the dominant component in mineral tracers [57]. The application of the centering and rescaling technique allows for a better visualization of the data despite the low values of the intermodal and submicron size fractions. The two compositional datasets and their respective centres are displayed close to each other. Thus, in order to confirm or reject the hypothesis about the occurrence of two distinct sets, a statistical analysis is performed [57].
Figure 1. Compositional datasets of the background and kerbside sites: (a) triangular diagram after centering and rescaling; (b) ilr coordinates (see Equation (6)) after perturbation. The centre for both datasets is at (ilr1, ilr2) = (0, 0), the centre for the background site is at (ilr1, ilr2) = (−0.19, −0.05), denoted with +, and the centre for the kerbside site is at (ilr1, ilr2) = (0.19, 0.05), denoted with ×. The continuous lines are the (1 − α)100% confidence regions, α = 0.05.
The two datasets of the background and kerbside sites are tested for multivariate normality. The numerical results are reported in Table 1 and are compared with the critical values reported by Stephens [36,58]. The bivariate angle test shows that the hypothesis of normality can be accepted for both the background and the kerbside site with a significance level greater than 10%. The marginal test shows that the hypothesis of normality can be accepted for the kerbside site, for both ilr1 and ilr2, with a significance level greater than 10%. The marginal test shows that the hypothesis of normality can be accepted for the background site with a significance level greater than 5% and 10% for ilr1 and ilr2, respectively. Therefore, for each dataset, the hypothesis of multivariate normality cannot be rejected. Thus, the datasets of the kerbside and background sites are tested for the hypothesis of equality in their centres and covariance structures [35] (p. 153), [36]. The results are reported in Table 2. The test value for the datasets related to the kerbside and background sites is below the critical value for the considered hypothesis of inequality of centres, equivalent to µ1 ≠ µ2, and equality of covariance structures, equivalent to Σ1 = Σ2; thus, this hypothesis cannot be rejected. The inequality between the two centres indicates that the mineral element distribution differs between the two datasets. Mineral elements are relatively more abundant in the coarse size fraction in the dataset related to the kerbside site. Thus, it can be assumed that at the kerbside site there are mechanisms that promote the accumulation of elements in the coarser fractions.
Table 2. Test about the centres and the covariance structures for kerbside and background site.
Columns: Hypothesis; Test Value; χ² Critical Value (α = 0.05); Degrees of Freedom; Significance.

However, the equality between the two covariance structures suggests that the datasets related to the kerbside and background sites cannot be regarded as clearly distinct in chemical composition (see Figure 1b). This result can be interpreted to mean that the kerbside and background sites are characterized by similar sources of mineral matter for the set of considered mineral elements.
In order to evaluate the nature of the difference between the mineral element concentrations of PM at the kerbside and background sites, the perturbation difference is calculated between the perturbation centres of the kerbside and background site compositional datasets [59]. The perturbation centre for the background site is (59.31, 31.93, 8.77). The perturbation centre for the kerbside site is (70.91, 22.08, 7.01). The perturbation difference, kerbside minus background, is (44.51, 25.75, 29.74). Relative to the neutral composition (33.33, 33.33, 33.33), which corresponds to no difference, the coarse part shows the largest departure, suggesting that the relative difference between these two sites is mainly in the coarse size fraction.
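The calculation can be checked directly from the two centres reported above, using the definition of the perturbation difference as the closure of the component-wise ratios. The function names in this sketch are hypothetical.

```python
def closure(parts, c=100.0):
    """Rescale positive parts so that they sum to the constant c."""
    total = sum(parts)
    return [c * p / total for p in parts]


def perturbation_difference(y, x, c=100.0):
    """Perturbation that moves composition x to composition y."""
    return closure([yi / xi for yi, xi in zip(y, x)], c)


# Perturbation centres reported in the text (coarse, intermodal, submicron), in %.
background = (59.31, 31.93, 8.77)
kerbside = (70.91, 22.08, 7.01)

difference = perturbation_difference(kerbside, background)
reported = (44.51, 25.75, 29.74)
```

The recomputed difference agrees with the reported one to within about 0.01 in each part, a discrepancy that is consistent with the centres being rounded to two decimals.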
The simultaneous PM measurements at the kerbside site refer to a very busy road with a street canyon feature. Thus, it can be supposed that, at the kerbside, the effect of large traffic volumes combined with possibly narrow streets and multi-floor institutional and commercial buildings can determine a characteristic local accumulation/dispersion condition (e.g., a street canyon environment) [60]. These conditions may lead to higher tracer concentrations in the coarse size fraction.
Likewise, the difference between the mineral element concentrations of PM at the kerbside and rural sites was evaluated by comparing their compositional datasets. Figure 2a shows the three part compositional datasets of the two characteristic monitored environmental sites, kerbside and rural, after the application of the centering and rescaling technique. The three part compositional data of the kerbside and rural sites display high values of the coarse size fraction, above approximately 50%, and low values of the intermodal and submicron size fractions, which are below about 40% and 25%, respectively. The coarse size fraction is the dominant component. The application of the centering and rescaling technique allows for a better visualization of the data despite the low values of the intermodal and submicron size fractions. The compositional datasets of the kerbside and rural sites, as well as their respective centres, are displayed close to each other. As above, in order to confirm or reject the hypothesis about the occurrence of two distinct sets, a statistical analysis is performed. The dataset of the rural site is tested for multivariate normality. The numerical results are reported in Table 1 and are compared with critical values. The bivariate angle test shows that the hypothesis of normality can be accepted with a significance level greater than 10%. The marginal test shows that the hypothesis of normality can be accepted for the rural site with a significance level greater than 5% and 2.5% for ilr1 and ilr2, respectively. Therefore, the hypothesis of multivariate normality cannot be rejected. Thus, the datasets of the kerbside and rural sites are tested for the hypothesis of equality in their centres and covariance structures. The results are reported in Table 3. The test values for the datasets related to the kerbside and rural sites are above the critical ones for each considered hypothesis; thus, the equality of centres, equivalent to µ1 = µ2, of covariance structures, equivalent to Σ1 = Σ2, or of both has to be rejected. The two datasets related to the kerbside and rural sites have to be regarded as clearly distinct in chemical composition and mineral element distribution (see Figure 2b). These results can be interpreted as follows. The sources of mineral elements at the rural site and at the kerbside site may differ. Moreover, the mechanisms of accumulation and dispersion at the two sites may also be different (e.g., long-range transport, dust resuspension and traffic-related processes). The combination of possibly different sources of mineral elements and diverse mechanisms of accumulation/dispersion can together determine a different chemical composition and size distribution of the mineral matter contained in the PM.
In order to evaluate the nature of the difference between the mineral element concentrations of PM at the kerbside and rural sites, the perturbation difference is calculated between the perturbation centres of the kerbside and rural site compositional datasets. The perturbation centre for the kerbside site is (70.91, 22.08, 7.01), whereas the perturbation centre for the rural site is (55.94, 31.51, 12.55). The perturbation difference, kerbside minus rural, is (50.17, 27.74, 22.09), suggesting that the relative difference between these two sites is mainly in the coarse size fraction of the considered set of mineral elements. This may be a result of the mineral matter resuspended by road traffic, which contributes to higher concentrations in the coarse size fraction at the kerbside site [61].

Conclusions
The statistical methods used for the analysis of compositional data allow for statistically validating either differences or similarities between the investigated datasets of the related environmental sites. These differences or similarities can be associated with both the kind of mineral sources involved and the possible mechanisms of addition and/or subtraction of materials that influence the behavior of the characteristic environmental sites.
The results highlight that the datasets of the kerbside site and the background site have different centres and equal covariance structures. The mineral elements of PM at these two sites are compositionally equivalent. However, the two distinct centres indicate that the mineral elements have different distributions. This can be related to possibly different mechanisms of accumulation and/or dispersion of PM at the two sites. At the kerbside, traffic in combination with narrow streets and multi-floor buildings determines characteristic local conditions, which lead to higher tracer concentrations in the coarse size fraction.
The datasets of the kerbside site and the rural site have different centres and covariance structures. The mineral elements of PM at these two sites differ in composition and size distribution. These two sites have different sources of mineral elements. Furthermore, the mechanisms of accumulation and/or dispersion at the two sites may also be different. The combination of different sources and diverse mechanisms of accumulation and/or dispersion can together determine a different chemical composition and size distribution of the mineral elements of PM.
Compositional analysis applied to the mineral element concentrations of PM10, PM2.5 and PM1 simultaneous measurements is a technique that can be used to study environmental sites affected by different sources of mineral matter.

Table 1. Tests on multivariate normality for the datasets of rural, kerbside and background sites.


Table 3. Test about the centres and the covariance structures for kerbside and rural site.
