Classification of Clouds Sampled at the Puy de Dôme Station (France) Based on Chemical Measurements and Air Mass History Matrices

A statistical analysis of 295 cloud samples collected at the Puy de Dôme station in France (PUY), covering the period 2001–2018, was conducted using principal component analysis (PCA), agglomerative hierarchical clustering (AHC), and partial least squares (PLS) regression. Our model classified the cloud water samples on the basis of their chemical concentrations and of the dynamical history of their air masses estimated with back-trajectory calculations. The statistical analysis split our dataset into two sets: the first is characterized by westerly air masses and marine characteristics, with high concentrations of sea salts; the second has air masses originating from the northeastern sector and the “continental” zone, with high concentrations of potentially anthropogenic ions. It appears from our dataset that the influence of cloud microphysics remains minor at PUY as compared with the impact of the air mass history, i.e., physicochemical processes, such as multiphase reactivity.


Introduction
The chemical composition of cloud water has revealed the high complexity of this medium [1], resulting from the cloud scavenging of soluble gases, the dissolution of the soluble fraction of the aerosol acting as cloud condensation nuclei (CCN), and aqueous phase reactions [2]. Additionally, recent studies have shown that microbial activity alters the cloud water chemical composition [3–5]. Therefore, cloud water is a complex mixture of inorganic and organic compounds with strong oxidants, which have been shown to drive aqueous phase transformations in the presence of solar radiation [6,7].
Studies devoted to the analysis of the chemical composition and to its variability have aimed to better understand several atmospheric physicochemical processes, such as droplet activation and growth and the production and consumption of chemical compounds, as well as to support transport and deposition modeling. Finally, the effect of cloud physics or air mass history on the chemical composition of clouds sampled at PUY is estimated.

Cloud Sampling
Sampling was performed at PUY (45.7722° N, 2.9648° E), which belongs to several international networks: EMEP (the European Monitoring and Evaluation Programme), GAW (Global Atmosphere Watch), and ACTRIS (Aerosols, Clouds, and Trace Gases Research Infrastructure). The observatory chalet is on top of a monogenetic volcano rising above the surrounding area with a height of 1465 m. PUY is part of the Chaîne des Puys, a north-south oriented chain of extinct volcanoes in the Massif Central (France); to the west, an agricultural plain extends to the ocean (300 km away). The urban area of Clermont-Ferrand and its surrounding suburbs (285,000 inhabitants) is situated 16 km east of the station and 1000 m lower. PUY makes it possible to characterize air masses with various histories, coming from the boundary layer or from the free troposphere, depending on the season and the time of day. The top of the mountain is frequently in cloudy conditions, on average 30% of the time per year, with higher occurrences during winter and autumn [50]. This makes PUY a reference site to study and sample cloud properties.
The cloud sampling dataset used in this study covered the period 2001–2018, with an average sampling time of 3 h and an average sampling volume of 75 mL. Non-precipitating cloud droplets were sampled using cloud collectors, as described previously for PUY cloud studies [46]. Cloud droplets larger than 7 μm (cut-off diameter) [51] were collected by impaction onto a rectangular aluminum plate. Most of the time, droplets were collected directly as a liquid; more rarely, they froze upon impaction (supercooled conditions). The water was transferred at room temperature into glass vials, either directly or after a short melting period. The aluminum collectors were cleaned and sterilized by autoclaving. Samples were collected in sterilized bottles and cloud water was filtered (0.20 μm nylon filter) to eliminate microorganisms and particles. The majority of the sampled clouds resulted from frontal systems that mainly occurred during autumn, spring, and winter; these meteorological conditions enabled sampling clouds over long durations.

PuyCloud Database
The PuyCloud observation system monitors the biological, microphysical, and chemical properties of clouds. Biological and chemical analyses are performed in collaboration with the ICCF (Institute of Chemistry of Clermont-Ferrand). It is part of the French CO-PDD (Cézeaux-Aulnat-Opme-Puy De Dôme) multisite platform, fully described by Baray et al. [39] in terms of instrumentation and data availability, and widely employed [22,25,52–55].
The meteorological parameters monitored at PUY are wind speed and direction, temperature, pressure, and relative humidity. Cloud microphysical properties, i.e., liquid water content (LWC) and effective droplet radius (re), are measured with a Gerber particle volume monitor, model 100 (PVM-100).
Physicochemical parameters (pH, conductivity, and redox potential) are measured immediately after sampling. The concentrations of the major organic and inorganic ions (acetic, formic, succinic, malonic, and oxalic acids, Ca²⁺, K⁺, Mg²⁺, Na⁺, NH₄⁺, Cl⁻, SO₄²⁻, and NO₃⁻) are measured by ion chromatography, using a DIONEX DX-320. The H₂O₂ and iron contents, which are important parameters in the evaluation of the cloud oxidative capacity, are also determined. The spectrofluorimetric method based on the reactivity of p-hydroxyphenylacetic acid with horseradish peroxidase [53] was used to measure the concentration of hydrogen peroxide in cloud water. The Fe(II) concentration was measured by UV-visible spectroscopy at 562 nm, using the method developed by Stookey [56] based on the rapid complexation of iron with ferrozine. A detailed description of the physicochemical parameters and chemical analysis has been provided in Bianco et al. [6]. Table S1 indicates the physicochemical analyses performed for each cloud event. This cloud water chemical characterization has been performed systematically for the last 20 years. Additional chemical and biological analyses of cloud water have been developed during the last decade using targeted or global methods [3,22,25,52,54,57–61].

Dynamical Analysis
The trajectory approach is commonly used to identify source areas of air pollutants, based on conditional probability fields including back trajectory calculations, land cover, and meteorological data [62,63]. In the present work a dynamical analysis using the CAT model is performed to identify source areas of chemical compounds detected in cloud samples.
The CAT model is the recent evolution of the Lagrangian model LACYTRAJ [44]. CAT is a three-dimensional (3D) forward/backward kinematic trajectory code using initialization wind fields from the recent reanalysis ECMWF ERA-5 [64]. A cluster of starting back trajectory points is defined by the user and advected by the model using a bilinear interpolation for horizontal wind fields and time and a log-linear interpolation for vertical wind fields. The CAT model has already been used to determine the air masses arriving in PUY on the basis of calculations of two sets of 24 h back trajectories per day over a two-year period (2015-2016) [39].
In this study, sets of 45 back trajectories were calculated every hour during the cloud sampling, within a volume of ±0.1° in latitude and longitude. The vertical starting altitude of the back trajectories was deduced from the pressure measured at the Puy de Dôme summit, assuming hydrostatic equilibrium. Trajectories were calculated between the summit and 50 m below (corresponding to 4 hPa) to take into account the ascent along the slopes of PUY of the air masses arriving below the observatory. The temporal resolution was 15 min and the total duration was 72 h.
The CAT model can be initialized with ECMWF ERA-5 wind fields of any temporal and spatial resolution. For this work, wind fields were extracted every 3 h with a spatial resolution of 0.5° in latitude (55 km) and longitude (40 km), on 23 vertical pressure levels between 200 and 1000 hPa. CAT integrates a topography matrix at a resolution of around 10 km [65].
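As a plausibility check on the grid spacing quoted above, the following minimal sketch converts a 0.5° step into kilometres at the latitude of PUY. It assumes a spherical Earth with a mean radius of 6371 km; the function name is illustrative and is not part of the CAT code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def grid_spacing_km(step_deg, latitude_deg):
    """Return the (north-south, east-west) length in km of a step_deg grid step."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = step_deg * km_per_degree
    # Meridians converge toward the poles, so the east-west step
    # shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_km, ew_km = grid_spacing_km(0.5, 45.7722)
print(f"{ns_km:.1f} km in latitude, {ew_km:.1f} km in longitude")
```

Under these assumptions the step is a little under 56 km in latitude and a little under 39 km in longitude, in line with the rounded values of 55 km and 40 km given in the text.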
In addition to the wind parameters, the boundary layer height was also extracted from the ECMWF ERA-5 reanalysis in the same horizontal resolution, and spatially and temporally interpolated on all the trajectory points.
The trajectory calculation phase was followed by a dynamical characterization phase, which consisted of flagging the cloud samples on the basis of the results of the trajectory calculations.
The history of air masses was modeled by counting the number of trajectory points in each of the following nine geographic areas: eight areas named "sectors" hereafter, i.e., north-northeast (NNE), east-northeast (ENE), east-southeast (ESE), south-southeast (SSE), south-southwest (SSW), west-southwest (WSW), west-northwest (WNW), and north-northwest (NNW), and one nearby area. The latter was defined because the spatial resolution of the wind fields did not allow the origin to be determined for the points closest to PUY, within a radius of 0.5°. The percentage of points located over the sea and over continental surfaces was then determined using the topography file. If the altitude of the topography interpolated on a trajectory point was 0, this point was considered to be above the sea and was assigned to the "sea surface" zone; otherwise, it was assigned to the "continental surface" zone. Finally, we separated the continental and sea zones vertically, using the atmospheric boundary layer height (ABLH) interpolated on the trajectory points (data summarized in Table S1, blue columns).
All of these characteristics were, then, compiled for each cloud sampling, providing a so-called "zone matrix" and a so-called "sector matrix". Thus, the matrices indicated, for each cloud sample, the distribution of the sectors or the zones crossed by their 72-hour backward trajectory. The relationship between the air mass history and the cloud composition was the subject of a statistical analysis, as described in Section 2.4.
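The construction of the two matrices can be sketched as follows. This is a simplified illustration and not the CAT implementation: the bearing is computed with a flat-Earth approximation, the point altitude and the ABLH are assumed to share the same vertical reference, and all function names and example points are hypothetical.

```python
import math
from collections import Counter

PUY_LAT, PUY_LON = 45.7722, 2.9648
SECTORS = ["NNE", "ENE", "ESE", "SSE", "SSW", "WSW", "WNW", "NNW"]
NEARBY_RADIUS_DEG = 0.5  # points closer than this are not assigned to a sector


def sector_of(lat, lon):
    """Assign a trajectory point to one of eight 45-degree sectors or to 'nearby'."""
    dlat, dlon = lat - PUY_LAT, lon - PUY_LON
    if math.hypot(dlat, dlon) <= NEARBY_RADIUS_DEG:
        return "nearby"
    # Flat-Earth bearing (assumption): scale the longitude offset by cos(latitude).
    east = dlon * math.cos(math.radians(PUY_LAT))
    bearing = math.degrees(math.atan2(east, dlat)) % 360.0  # 0 = north, 90 = east
    return SECTORS[int(bearing // 45.0) % 8]


def zone_of(topo_alt_m, point_alt_m, ablh_m):
    """Combine the surface type (topography altitude of 0 means sea) with the layer."""
    surface = "sea surface" if topo_alt_m == 0 else "continental surface"
    layer = "<ABLH" if point_alt_m < ablh_m else ">ABLH"
    return f"{surface} ({layer})"


def matrix_rows(points):
    """Return the percentage of trajectory points per sector and per zone."""
    points = list(points)
    sectors = Counter(sector_of(p["lat"], p["lon"]) for p in points)
    zones = Counter(zone_of(p["topo_m"], p["alt_m"], p["ablh_m"]) for p in points)

    def as_percent(counts):
        return {key: 100.0 * n / len(points) for key, n in counts.items()}

    return as_percent(sectors), as_percent(zones)


# Hypothetical trajectory points (not taken from the dataset).
example_points = [
    {"lat": 46.0, "lon": -5.0, "alt_m": 2000, "topo_m": 0, "ablh_m": 800},
    {"lat": 44.0, "lon": -8.0, "alt_m": 300, "topo_m": 0, "ablh_m": 600},
    {"lat": 49.0, "lon": 8.0, "alt_m": 500, "topo_m": 300, "ablh_m": 1000},
    {"lat": 45.9, "lon": 3.1, "alt_m": 1400, "topo_m": 900, "ablh_m": 1200},
]
sector_row, zone_row = matrix_rows(example_points)
print(sector_row)
print(zone_row)
```

Each cloud sample then contributes one row of sector percentages and one row of zone percentages, computed over all the points of its back trajectories.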

Statistical Analysis
A principal component analysis (PCA) was performed using the concentrations of both organic and inorganic ions (Ca²⁺, K⁺, Mg²⁺, Na⁺, NH₄⁺, Cl⁻, NO₃⁻, SO₄²⁻, acetate, formate, malonate, oxalate, and succinate). The aim was to determine the most relevant variables to establish chemical categories (categories which would then be compared with microphysical parameters or air mass history matrices). However, some chemical analyses were lacking and some samples presented missing values. Missing values were not replaced by mean values, in order to fully represent the variability of the dataset and to avoid overfitting [66]. Thus, samples that were not fully characterized were excluded from the statistical analysis.
Then, we performed numerous PCAs. The maximum of information gathered on the first two factors was obtained with three ions with predominantly marine sources (Cl⁻, Mg²⁺, and Na⁺) and three ions with major sources on the continental surface, mostly anthropogenic (NH₄⁺, SO₄²⁻, and NO₃⁻). Hence, by keeping six inorganic ions, as in similar studies [33,46,47], we achieved the best balance between samples and variables (i.e., increasing the number of variables means decreasing the number of samples).
The PCA was computed on Spearman's rank correlations, which are more appropriate when the variables have different distributions.
Then, we performed agglomerative hierarchical clustering (AHC), an iterative classification method that builds homogeneous groups of objects (categories; here, cloud events) on the basis of the dissimilarity between their descriptions by a set of variables (here, chemical variables). The AHC produced a dendrogram which showed the progressive grouping of the data. To calculate the dissimilarity between samples, we applied the common Ward's agglomeration method (which minimizes the within-group inertia) using Euclidean distance. The data were centered-reduced, to prevent variables with large variance from unduly weighing on the results. The truncation level, and therefore the number of categories to retain, was automatically defined on the basis of the entropy.
Then, a PLS regression was performed to establish the correlations between the chemical parameters and the air mass history parameters. The Mann-Whitney and Kruskal-Wallis nonparametric tests were carried out to validate significant differences between two and among several data groups, respectively. Two air mass categories were declared to be different when the probability for the groups to have identical data distributions was lower than 5% (p-value < 0.05). These tests were chosen because the population from which the samples were extracted did not follow a normal distribution, according to the Shapiro-Wilk normality test.
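The two preprocessing choices described above (discarding incomplete samples and using rank correlations) can be illustrated with a short, dependency-free sketch. The concentration values are invented for the example, and the helper names are hypothetical; the analysis in this study was run in XLSTAT, not with this code.

```python
def complete_cases(samples):
    """Keep only samples in which every variable was measured (no None)."""
    return [s for s in samples if all(v is not None for v in s)]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented (Na+, Cl-) concentrations; one sample lacks its Cl- measurement.
samples = [(5, 8), (20, 25), (80, 60), (150, None), (300, 500), (1200, 900)]
kept = complete_cases(samples)
na = [s[0] for s in kept]
cl = [s[1] for s in kept]
print(len(kept), pearson(na, cl), spearman(na, cl))
```

Because the relationship in this toy example is monotonic but not linear, the Spearman coefficient reaches 1 while the Pearson coefficient stays below 1, which is the kind of situation where a rank-based correlation matrix is the more robust input for a PCA.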
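The clustering step can be sketched with a naive Ward agglomeration on centered-reduced data. This is a minimal illustration under stated assumptions: the number of clusters is passed by hand (the entropy-based truncation used in this study is not reproduced), the data are invented, and the function names are hypothetical.

```python
def standardize(rows):
    """Center each variable and scale it to unit standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    stds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 for j in range(p)]
    return [[(r[j] - means[j]) / stds[j] for j in range(p)] for r in rows]


def ward_clusters(rows, n_clusters):
    """Merge, at each step, the two clusters whose fusion least increases
    the within-group inertia (Ward's criterion with Euclidean distance)."""
    clusters = [{"members": [i], "centroid": list(r)} for i, r in enumerate(rows)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                na, nb = len(ca["members"]), len(cb["members"])
                d2 = sum((u - v) ** 2 for u, v in zip(ca["centroid"], cb["centroid"]))
                cost = na * nb / (na + nb) * d2  # increase in within-group inertia
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            "centroid": [(na * u + nb * v) / (na + nb)
                         for u, v in zip(ca["centroid"], cb["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c["members"]) for c in clusters]


# Invented (Na+, NO3-) concentrations for seven samples.
rows = [[10, 12], [12, 10], [11, 11], [300, 15], [320, 20], [15, 250], [20, 270]]
groups = sorted(ward_clusters(standardize(rows), 3))
print(groups)
```

In this toy case the three groups correspond to samples with low concentrations, samples rich in Na⁺, and samples rich in NO₃⁻, mirroring in miniature the logic of the chemical categories discussed later.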
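The decision rule above (p-value < 0.05) can be illustrated with a minimal Mann-Whitney sketch. It uses the normal approximation without tie or continuity correction, which is a simplification that statistical packages usually refine; the two samples are invented and the function name is hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    n, m = len(x), len(y)
    # Count the pairs in which the x value exceeds the y value; ties count for half.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u = min(u_x, n * m - u_x)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return u, p_value


# Invented concentrations for two groups of eight samples each.
group_a = [120, 95, 140, 110, 130, 105, 150, 125]
group_b = [60, 70, 55, 80, 65, 75, 50, 85]
u_stat, p_val = mann_whitney_u(group_a, group_b)
print(u_stat, p_val, "different" if p_val < 0.05 else "not different")
```

With these fully separated toy samples the p-value falls well below 0.05, so the two groups would be declared different; comparing a group with itself yields a p-value of 1.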
This statistical analysis was performed using Excel XLSTAT software [67].

Results and Discussion
The multivariate statistical analysis was performed on 295 cloud samples collected at PUY, starting with PCA and AHC, in order to classify them according to their chemical composition. Then, these results were compared to the previous PUY study [46]. Then, we investigated, by using PLS regression, the relationships among these chemical data and the matrices provided by the CAT model, both on the zones ("sea and continental surfaces") and on the cardinal sectors crossed by the air masses. Finally, we compared the respective influences of the air mass history and microphysics on the chemical composition of clouds.

Clusterization of Cloud Waters at PUY
Data relative to Cl⁻, Mg²⁺, Na⁺, NH₄⁺, NO₃⁻, and SO₄²⁻, presented in Table S1, were analyzed by AHC and PCA to obtain categories based on ion concentration dissimilarities.

Chemical Categories
AHC was used to categorize cloud samples based on the long-term monitoring of their chemical composition. The AHC algorithm successfully grouped all the observations with a satisfactory cophenetic correlation (correlation coefficient between the dissimilarity and the Euclidean distance matrices) of 0.619 (Figure 1). Indeed, the closer the correlation is to 1, the better the quality of the clustering. The dotted line in Figure 1 represents the degree of truncation (dissimilarity = 91.16) of the dendrogram used for creating categories; it was automatically chosen based on the entropy level. Given the small difference in dissimilarity (Figure 1) between the light blue category (dissimilarity = 88.91) and the aggregate of the yellow and red categories (dissimilarity = 93.41), the AHC could be almost as robust with three or five categories. Nevertheless, these four AHC categories are consistent with our previous study [46]. Categories 1, 2, 3, and 4 consist of 113, 31, 55, and 9 clouds, respectively.
The AHC profile plot (Figure 2) represents the four categories determined from the six main inorganic ions (Cl⁻, Mg²⁺, Na⁺, NH₄⁺, NO₃⁻, and SO₄²⁻). The light blue category, with low ion concentrations, is named "marine" according to its air mass history (i.e., the time spent by this air mass above the "sea surface", detailed in Section 3.2). This category is the most homogeneous, as confirmed by its significantly lower within-class variance (179.86), shown in Figure 1. The "marine" category is also the main category (113 objects), which is consistent with the remoteness of PUY. The dark blue category is characterized by high concentrations of Na⁺, Cl⁻, and Mg²⁺ and by its air mass history, and is thus called "highly marine". PUY is located more than 300 km from the Atlantic shore. Nevertheless, at a synoptic scale, the air masses are mainly transported from the ocean to PUY with no relief in between (as confirmed hereafter by the CAT model). Hence, this category, with 31 objects, appears counterintuitively small. This suggests that some western clouds (which could have been classified as "highly marine") have either precipitated or become diluted (increase in liquid water content), thereby decreasing their concentrations. These western clouds are then classified as "marine", which explains the size of this category (i.e., a category with a marine history, but without salt). In red, the smallest category (nine objects), referred to as "polluted" in Figure 1, displays peak concentrations for SO₄²⁻, NH₄⁺, and NO₃⁻, suggesting that the air mass passed over an urbanized area. Below these maxima, in yellow, the "continental" category with 55 objects stands out. It should be noted that, with only nine objects, the "polluted" category is statistically less robust; it could have been merged with the "continental" category (see dissimilarities in Figure 1) and regarded as the extreme SO₄²⁻, NH₄⁺, and NO₃⁻ values of that category.
Conversely, the "highly marine" category could have been split (see dissimilarities in Figure 1), according to its SO₄²⁻ concentration (not shown).

Variable Validation
A PCA was computed on a Spearman correlation matrix using the concentrations of the ions (Cl⁻, Mg²⁺, Na⁺, NH₄⁺, NO₃⁻, and SO₄²⁻). The PCA correlation circle (Figure 4a) provides evidence that Cl⁻, Mg²⁺, and Na⁺ are strongly correlated (see correlation matrix in Table S3). In this PCA (Figure 4), the first two factors represent 85.57% of the initial variability of the data; the PCA is robust, with no information hidden in the next four factors (see squared cosines of the variables in Table S4). The horizontal axis (F1) is linked to the total ion concentration and represents 58.16% of the information, while the vertical axis (F2: 27.4%) is linked to the concentrations of NH₄⁺, NO₃⁻, and SO₄²⁻ on the positive side, and Cl⁻, Mg²⁺, and Na⁺ on the negative side. The PCA is consistent with the AHC: in Figure 4b, the AHC "marine" category stands out on the left (F1 < 0) of the chart, the "highly marine" category at the bottom right (F1 > 0 and F2 < 0), and the "continental" and "polluted" categories at the top right (F1 > 0 and F2 > 0).
In this study, the statistical analysis evolves as compared to our previous work [46]. First, the AHC is performed with a larger number of samples (208 versus 134), and the variables considered for the statistical analysis are different, i.e., pH is not taken into account and Mg²⁺ is added to the variables, as explained above. We removed cloud events with missing values. Spearman's correlations replaced Pearson's in the PCA. However, the distribution of categories is fairly unchanged; among the 208 cloud events used in the AHC, 164 (78.8%) were clustered in a category similarly named in the 2001-2011 study [46] (see Table S1).
In both studies, 4% of cloud samples are in the "polluted" category, while the mean ion concentrations are markedly lower. Among the nine cloud events in "polluted", eight are clustered in "polluted_01-11".
In summary, the "marine" category slightly increases in percentage, as the mean ion concentration of the "continental" and "polluted" categories dwindle, in particular for the anthropogenic ions. "Highly marine" is the category that has expanded the most. This trend is not fully explained by the minor statistical processing adjustments (see Section 2.4). We compared (not shown) the two methods on the first 2001-2011 dataset, without observing any significant difference.
We performed a Mann-Whitney test (Figure 5) on the clouds sampled from 2001 to 2011 (period covered by our previous work [46]) and those sampled since then. It appears that NH₄⁺, NO₃⁻, and SO₄²⁻ concentrations decreased significantly. The concentration of sea salts does not evolve significantly, but the changes in the anthropogenic classes (mentioned above) drive the changes observed in the "marine" and "highly marine" categories. Category terminology will receive additional justification in Section 3.2.
Figure 5. Mann-Whitney test on the clouds sampled from 2001 to 2011 [46] and the 88 clouds sampled from 2012 to 2018. We compare, for both periods, NH₄⁺, NO₃⁻, and SO₄²⁻ concentrations. The p-values of all pairwise comparisons are significant at level alpha = 0.05.

Influence of Air Mass History at PUY
This section is devoted to the correlation between the concentrations of the inorganic ions and the air mass history. During their atmospheric transport, the air masses received chemical species in various forms (gases and particles) from various sources. This strongly depended on the altitude of the air masses. During transport, chemicals could also undergo multiphase chemical transformations, as well as dry or wet deposition. The objective here is to evaluate the effect of the history of air masses on the chemical composition of clouds. To this end, PLS regressions are performed and the results are validated with nonparametric tests (Kruskal-Wallis and Mann-Whitney tests). As described in Section 2.4, the CAT model provides two matrices. The "zone matrix" contains information about the time spent by air masses over the "continental surface" or the "sea surface", in the atmospheric boundary layer (<ABLH) or in the free troposphere (>ABLH). The "sector matrix" contains information about the time spent by air masses in the eight 45° sectors (NNE, ENE, ESE, SSE, SSW, WSW, WNW, and NNW; see Figure S1c). Figure 6a represents the distribution of these parameters for all the cloud events. Despite the distance from the coast (300 km), the strong maritime influence at PUY is obvious (Figure 6a). Over a 72-hour backward trajectory, an air mass spends, on average, almost two days over the "sea surface". Consistently, PUY is characterized by prevailing strong west and north winds (WSW, WNW, NNW, and NNE), with average percentages of time spent over these four main sectors of 23, 44, 14, and 12%, respectively (Figure 6b).
To perform the PLS analysis (Figure 7), the matrix of the explanatory variables (the "Xs") is composed of the "sector matrix" and the "zone matrix". The matrix of the dependent variables (the "Ys") is the chemical matrix. As explained in Section 2.3, we restricted our statistical analysis to the concentrations of six chemical compounds to avoid excessive loss of information and overfitting in the statistical analyses.
Figure 7. Partial least squares (PLS) chart with t components on axes t1 and t2. The correlations map superimposes the "Xs", the "Ys", and the cloud events. The dependent variables from the chemical matrix are symbolized by a black "Y"; the explanatory variables from the "sector matrix" by a black "X"; and those from the "zone matrix" by a blue, brown, light or dark "X". The 208 cloud events are grouped by AHC category (light blue circle, "marine"; dark blue diamond, "highly marine"; yellow square, "continental"; and red triangle, "polluted").
The index of the predictive quality of the models is quite low (Q² = 0.1; ideally it should be close to 1), suggesting weak correlations. It is well known that cloud composition depends on many parameters other than the chosen explanatory variables related to the air mass history calculated by the model. Indeed, cloud chemical composition depends foremost on local microphysics [17,37,73], proximity to sources [33,48,74,75], biological activity [4,5,61], seasonal cycles [30,76–78], and diurnal cycles [79]. Figure 7 displays numerous intricacies between chemical parameters and the air mass history. First, some zone variables are weakly correlated to some sector variables (cf. PLS correlation matrix in Table S5): "sea surface" (>ABLH) with WNW, "sea surface" (<ABLH) with WSW, "continental surface" (>ABLH) with ENE (too few observations to be interpretable on the graph) and, more robustly, "continental surface" (>ABLH) with ENE (R = 0.7).
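For readers unfamiliar with the predictive quality index mentioned at the start of this paragraph, one common cross-validation form is recalled below. It is given here as an assumption, since the exact variant computed by the software is not stated in the text; PRESS denotes the predicted residual sum of squares.

```latex
Q^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{SS}}, \qquad
\mathrm{PRESS} = \sum_{i} \left( y_i - \hat{y}_{(i)} \right)^2, \qquad
\mathrm{SS} = \sum_{i} \left( y_i - \bar{y} \right)^2
```

Here $\hat{y}_{(i)}$ is the value predicted for sample $i$ by a model fitted without that sample, and $\bar{y}$ is the mean of the observed values. Under this definition, a value close to 1 means that left-out samples are predicted well, whereas a value close to 0 means that the model predicts hardly better than the mean.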
The "polluted" category in red (Figure 7) and, to a lesser extent, the "continental" category in yellow are on the left of the display, toward the NNE sector and the "continental surface" (>ABLH) zone. The "highly marine" category in dark blue and, to a lesser extent, the "marine" category in light blue are drawn toward the WNW/WSW sectors and the "sea surface" (>ABLH) zone.
We performed an AHC on the "sector matrix" and obtained three clusters. Then, we reran the previous PLS. The simplified correlation matrix (Table 1) highlights the link between the "sea surface" zones and the west sector, and between the "continental surface" zones and the northeast sector. We do not keep this clusterization in the main PLS to avoid a loss of information.
Table 1. PLS correlation matrix between clustered sector variables and zone variables. Highest correlation displayed in red and highest anticorrelation in blue.
The simplified correlation matrix (Table 2) displays weak correlations. However, the link between sea salts (Cl⁻, Mg²⁺, and Na⁺) and both the "sea surface" (>ABLH) zone and the WSW/WNW clustered sectors is noticeable. The same applies to ions of potentially anthropogenic origin (NH₄⁺, NO₃⁻, and SO₄²⁻) and both the "continental surface" (>ABLH) zone and the NNW/NNE/ENE sector. For both marine and continental ions, the correlations are higher above the atmospheric boundary layer height (>ABLH), confirming that PUY is influenced by long-range transport [42,43].
Table 2. PLS correlation matrix between chemical variables and both zone and clustered sector variables. Highest correlation displayed in dark red and highest anticorrelation in dark blue.
In order to statistically validate these observations, we performed the Kruskal-Wallis test and compared the category distribution within each zone (Figure S1a) and main sector (Figure S1b). As the computed p-values are lower than the significance level alpha = 0.05, we conclude that the main sectors (WSW, WNW, NNW, NNE, and ENE) and the zones ("sea surface" (>ABLH), "sea surface" (<ABLH), and "continental surface" (>ABLH)) differ significantly between categories. The samples do not come from the same population. Only the p-value of "continental surface" (>ABLH) is greater than the significance level alpha = 0.05 (p-value = 0.062).
The difference between the categories according to the sector distribution can also be observed on the map (Figure S1c) provided by the CAT model. The history of air masses significantly influences the chemical composition of clouds.

Influence of Cloud Microphysics at PUY
The air mass history can influence solute concentration by scavenging aerosol particles and gaseous species (as discussed in Section 3.2). This strongly depends on the CCN concentration related to the physicochemical properties of aerosol particles (size distribution and chemical composition) and on the gas phase chemical composition and corresponding phase equilibria. Microphysical cloud conditions such as liquid water content (LWC) and effective droplet radius (re) can also perturb solution concentration variability, as well as chemical reactions occurring within cloud waters. This section is devoted to the possible relationships between LWC and chemical variables.
For this, a PLS analysis was performed with the LWC and the re as the matrix of the explanatory variables (the "Xs"), and the chemical matrix as the dependent variables (the "Ys"). The PLS chart is presented in Figure 8, and Table S6 reports the correlation matrix between these variables. There are weak correlations between LWC and ion concentrations. The strongest anticorrelation is between NH₄⁺ and re, i.e., r(NH₄⁺, re) = −0.37. This analysis shows that at PUY, the microphysical properties of the sampled clouds are only weakly correlated with their chemical composition. This could be explained by the type of clouds that are collected; the majority are frontal clouds which were formed well before their arrival at the top of the mountain and which present a low variability in their microphysical properties. A supplementary analysis (PCA) has also been performed (Table S7), demonstrating that correlations between microphysical variables and air mass history parameters (Section 3.2) are negligible. Thus, the influence of air mass history on the chemical composition of clouds cannot be attributed to the variability of microphysical parameters. In other words, there is an influence of microphysics, but it is statistically identical, whatever the zones or the sectors crossed by the air mass.
To remove any influence of LWC variation, cloud water loadings (CWLs) are commonly calculated to evaluate the solute content per volume of air. The statistical analyses in this study were conducted on solute concentrations in cloud water to get more robust results, because microphysical parameters are not always available, especially under winter conditions. However, LWC and re are not highly variable (Figure S2) [...] correlations with t on axes t1 and t2 of solute concentrations in cloud waters (Figure S3). This suggests that for clouds sampled at PUY, the air mass history can better explain the variability of cloud water solute concentrations than LWC variations. A decrease of the solute concentrations of continental origin (NO₃⁻, SO₄²⁻, and NH₄⁺) was observed between the periods 2001-2011 and 2012-2018, as mentioned in Section 3.1.3. In addition, a slight decrease of the mean LWC (from 0.31 to 0.27 g·m⁻³) suggests that the CWLs for these species also significantly decreased. This trend could be explained by the aerological evolution highlighted by the CAT model and requires further investigation.
Previous field studies have investigated the dependency of cloud chemical composition on microphysical parameters [16,80–82]. It has been shown, for sites more exposed to anthropogenic emissions, that LWC could modulate solute concentrations. For example, clouds freshly formed by orography can have their chemical composition modulated by cloud microphysics [7,19,73]. [...] Table S2) as compared with the literature data [19]. Polluted events are only exceptionally observed at PUY and most likely originate from afar (northeast France, several hundred kilometers away). If we consider the synoptic scale, we should see higher concentrations of NH₄⁺, NO₃⁻, and SO₄²⁻; however, these ions are involved in chemical and photochemical reactions [83–86]. Hence, more information on the chemical aging of the air masses is needed (chemical characterization in progress). Moreover, such long-term monitoring, with varied air masses, smooths the influence of microphysics (LWC and re) (Figure S2). Cloud water is a complex matrix resulting from the interaction of many factors. Nevertheless, it appears that the air mass history, despite reduced correlations, remains the prevailing parameter, with either western and oceanic clouds or northeastern anthropogenic clouds.
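The conversion from an aqueous concentration to a cloud water loading can be sketched as follows. The sketch assumes a liquid water density of 1000 g per litre; the 100 µmol/L concentration is hypothetical, while the two LWC values are the period means quoted above. The function name is illustrative.

```python
def cloud_water_loading(conc_umol_per_l, lwc_g_per_m3, water_density_g_per_l=1000.0):
    """Return the solute loading in nmol per cubic metre of air."""
    # Litres of liquid water contained in one cubic metre of cloudy air.
    litres_per_m3_air = lwc_g_per_m3 / water_density_g_per_l
    return conc_umol_per_l * litres_per_m3_air * 1000.0  # convert umol to nmol


# Same hypothetical concentration, evaluated with the two mean LWC values.
cwl_first = cloud_water_loading(100.0, 0.31)
cwl_second = cloud_water_loading(100.0, 0.27)
relative_change = (cwl_second - cwl_first) / cwl_first
print(f"{cwl_first:.1f} -> {cwl_second:.1f} nmol/m3 ({100 * relative_change:.1f}%)")
```

At constant concentration, the lower mean LWC alone reduces the loading by about 13%; since the concentrations of the continental ions also decreased, the two effects act in the same direction on the CWLs.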

Conclusions
In this study, statistical analyses (AHC and PCA) were carried out on 208 cloud samples collected at the Puy de Dôme station (France) between 2001 and 2018, which resulted in clustering the cloud samples according to their chemical properties (concentrations of inorganic ions from marine and continental origins) into four categories as follows: "highly marine", "marine", "continental", and "polluted". Despite an evolution of the statistical treatment to classify the clouds samples, this work confirms those established in a previous study by Deguillaume et al. [46] for clouds sampled between 2001 and 2011. A change between the relative proportions of categories is however noticed and attributed to a significant decrease in the NH4 + , NO3 − , and SO4 2− concentrations during the second period (2012-2018) of cloud sampling.
CAT models the history of the air masses arriving at PUY, providing for each air mass the time spent above the eight cardinal sectors and above continental or sea surfaces. It also specifies whether the air mass is in the free troposphere or in the atmospheric boundary layer. From these in silico zone and sector matrices and the in situ chemical characteristics, PLS analysis highlights two main relationships between air mass origins and ion concentrations. The first air mass type comes predominantly from the western sectors and from the "sea surface" (> ABLH) zone, with the highest concentrations of sea salts (Cl⁻, Mg²⁺, and Na⁺). A total of 31 cloud samples, characteristic of this air mass type, are gathered in the "highly marine" AHC category. Loosely linked to the latter, the "marine" AHC category, named for its air mass history and characterized by low ion concentrations, is the largest (113 cloud samples) and the most "homogeneous". The second main air mass type arrives from the northeast sector and from the "continental surface" (> ABLH) zone, with the highest concentrations of potentially anthropogenic ions (NH₄⁺, NO₃⁻, and SO₄²⁻). Only nine cloud samples, characteristic of this air mass type, are grouped in the "polluted" AHC category. With less extreme values and 55 cloud samples, the "continental" category represents the bulk of this set.
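To illustrate what one row of a sector matrix could look like, the sketch below bins the points of a back-trajectory into eight 45° sectors around the station and returns the fraction of time spent in each. It is not the CAT implementation: the station coordinates are approximate, the trajectory points are assumed to be equally spaced in time, and all function names and trajectory values are hypothetical.

```python
import math

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

# Approximate coordinates of the Puy de Dome station (assumption).
PUY_LAT, PUY_LON = 45.77, 2.96


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2 (degrees from north)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0


def sector_of(bearing):
    """Map a bearing to one of eight 45-degree sectors centred on N, NE, E, ..."""
    return SECTORS[int(((bearing + 22.5) % 360.0) // 45.0)]


def sector_fractions(trajectory):
    """Fraction of (equally spaced) trajectory points lying in each sector."""
    counts = dict.fromkeys(SECTORS, 0)
    for lat, lon in trajectory:
        counts[sector_of(bearing_deg(PUY_LAT, PUY_LON, lat, lon))] += 1
    return {sector: n / len(trajectory) for sector, n in counts.items()}


# Hypothetical trajectory: three points over the Atlantic, one over Germany.
toy_trajectory = [(45.77, -5.0), (45.77, -10.0), (46.0, -20.0), (50.0, 8.0)]
fractions = sector_fractions(toy_trajectory)
print({sector: f for sector, f in fractions.items() if f > 0})
```

Stacking such rows for all cloud events gives a samples-by-sectors matrix of the kind that can be related to ion concentrations with a PLS regression; a zone matrix (sea or continental surface, above or below the ABLH) could be assembled in the same way from surface type and altitude.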
Finally, the influence of cloud microphysical properties (LWC and re) on the cloud water composition was investigated in a similar way using PLS analysis. The analysis shows no robust statistical correlations between cloud microphysics and cloud water chemical composition. This suggests that cloud chemical composition at PUY is primarily influenced by air mass history, which encompasses several physicochemical processes (CCN physical and chemical processes, mass transfer of soluble species, multiphase reactivity, etc.).
This study highlights the parameters that could drive the chemical composition of clouds at PUY; these findings cannot be generalized to other observation sites with different environmental settings. However, at a remote site without major and immediate urban or marine influence, it appears that an air mass coming from the ocean or from a polluted area can arrive more or less loaded, depending on complex biophysicochemical processes. In addition, much of the oceanic influence (i.e., Cl⁻, Mg²⁺, and Na⁺ concentrations) seems to decrease quickly (78% of the clouds coming from the ocean appear "cleaned"), whereas the anthropogenic influence seems more persistent, with NH₄⁺, NO₃⁻, and SO₄²⁻ concentrations that remain significant.
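The 78% figure is consistent with the category sizes reported above, under the assumption that "clouds coming from the ocean" means the "highly marine" and "marine" categories taken together and that "cleaned" refers to the low-concentration "marine" category. The short check below makes that arithmetic explicit; the interpretation is an assumption and not a statement of how the value was originally derived.

```python
# Sample counts per AHC category, as reported in the Conclusions.
counts = {"highly marine": 31, "marine": 113, "continental": 55, "polluted": 9}

total = sum(counts.values())

# Share of each category in the whole dataset, in percent.
shares = {name: 100.0 * n / total for name, n in counts.items()}

# Assumption: oceanic clouds = "highly marine" + "marine";
# "cleaned" clouds = the low-concentration "marine" category.
oceanic = counts["highly marine"] + counts["marine"]
cleaned_pct = 100.0 * counts["marine"] / oceanic

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of {total} samples")
print(f"'marine' share of oceanic samples: {cleaned_pct:.1f}%")
```

The four categories add up to the 208 samples used in the clustering, and the "marine" share of the oceanic samples rounds to the 78% quoted in the text.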
The PUY site is a European reference station for the study of gases, aerosols, and clouds. International field campaigns have been conducted there in the past, and future campaigns will especially target cloud biophysicochemical processes. Cloud waters collected at PUY for various air mass histories also serve laboratory investigations aimed at (1) characterizing the complex chemical composition and its environmental variability with innovative analytical methods, and (2) quantifying photochemical and biological transformations occurring in this complex liquid medium. Cloud field investigations performed at PUY also help to build relevant chemical scenarios that better constrain cloud chemistry models [87]. As for the dynamical framework, the CAT model provides an overview of the air mass history; this helps to constrain cloud chemistry models and also makes it possible to compare the PUY station with other observatories where cloud studies are conducted.

Supplementary Materials:
The following are available online at www.mdpi.com/xxx/, Figure S1: Category distribution within each zone (a) and sector (b), Figure S2: LWC distribution at PUY, Figure S3: Comparison of normalized concentrations and normalized CWLs, Table S1: PuyCloud data ("TableS1.xlsm"), Table S2: Ion concentrations of the categories, Table S3: Spearman chemical correlation matrix of PCA, Table S4: Squared cosines of the variables, Table S5: Squared cosines of the variables (sectors, zones, and chemistry), Table S6: PLS correlation matrix between microphysics and chemistry, Table S7: PCA correlation matrix between microphysics and parameters related to air mass history.