Two-way and multiway models are interpreted mainly through the evaluation of bidimensional plots. It is worth recalling that in the Tucker3 model the directions of the plot axes depend on the sign of the corresponding core array element. Since bidimensional modeling is well known and widely used in the atmospheric sciences, this article explains the interpretation of the Tucker3 model in greater detail.
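As a reminder of why the core signs matter, the four-way Tucker3 model of Section 3.2 can be written element by element. A, B, C, and D are the loading matrices of the four modes and Z is the core array; the indices and the numbers of factors P, Q, R, S are our own notation. Changing the sign of one loading vector together with all core elements that involve it leaves every modeled value unchanged, so the direction of an axis is meaningful only in combination with the sign of the core element. The sign of each term is the product of five signs:

```latex
\hat{x}_{ijkl} = \sum_{p=1}^{P}\sum_{q=1}^{Q}\sum_{r=1}^{R}\sum_{s=1}^{S}
  z_{pqrs}\, a_{ip}\, b_{jq}\, c_{kr}\, d_{ls}

z_{pqrs}\, a_{ip} = (-z_{pqrs})\,(-a_{ip}) \quad \text{for all } i, q, r, s

\operatorname{sign}(z_{pqrs}\, a_{ip}\, b_{jq}\, c_{kr}\, d_{ls})
  = \operatorname{sign}(z_{pqrs})\,\operatorname{sign}(a_{ip})\,
    \operatorname{sign}(b_{jq})\,\operatorname{sign}(c_{kr})\,\operatorname{sign}(d_{ls})
```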
3.1. Unfold-Principal Component Analysis
The unfold-PCA calculation was first performed on the 2D array with the 16 yearly average values (1992–2007) for each of the three sites in the rows (48 rows) and the 5 variables in the columns. Autoscaling was applied to the unfolded matrix. The aim of this model was to understand the differences between the sites with respect to the investigated variables and their changes over time. The score and loading plots are shown in Figure 4.
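A minimal sketch of this first unfold-PCA in Python/NumPy, assuming the yearly averages are already arranged as a 48 × 5 matrix (16 years × 3 sites stacked in the rows; the row order is an assumption). Autoscaling centers each column and divides it by its standard deviation; the principal components are then obtained from a singular value decomposition. The data below are random placeholders, not the measured values, and `autoscale` and `pca` are hypothetical helper names.

```python
import numpy as np

def autoscale(X):
    """Center each column to zero mean and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(X, n_components):
    """PCA of an already preprocessed matrix via SVD.

    Returns scores, loadings, and the fraction of total variance per component."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]

# Placeholder data: 16 years x 3 sites stacked in the rows (48 rows),
# 5 variables (NO, NO2, NOx, SO2, O3) in the columns.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=3.0, sigma=0.5, size=(16 * 3, 5))

scores, loadings, explained = pca(autoscale(X), n_components=2)
print(scores.shape, loadings.shape)   # (48, 2) (5, 2)
print(np.round(100 * explained, 1))   # % of variance for PC1 and PC2
```

Because autoscaling gives every variable unit variance, the five pollutants contribute equally regardless of their concentration ranges; the `explained` fractions are the quantities reported above as percentages of the total variance.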
The first PC explained 62% of the total variance and the second 28%. In the loading plot, PC1 separated the primary pollutants, located at positive values of the axis, from O3, which had a negative loading. The second component, instead, differentiated among the primary pollutants.
Three groups of samples emerged in the score plot. The first component separated the samples collected at the two urban sites (positive scores) from those collected at the rural site (negative scores). PC2 separated the Como samples (squares in the plot) from the Lecco samples (triangles in the plot). A pattern could also be detected: while all the Varenna samples lay in a small area of the plot, the Como and Lecco samples spread from positive PC1 values (earliest sampling years) to values closer to zero (more recent years). Notably, the two trends had different directions in the PC space.
Some interesting conclusions arose from this model. Varenna had higher O3 values than the other two sites, whereas Como was characterized by higher NO2 values and Lecco by higher SO2 values. During the period considered, a reduction in primary pollutants can be observed at the two urban sites, while the environmental conditions at Varenna remained more stable (yearly average data are available in the Supplementary Materials, Table S1).
A second unfold-PCA was performed on the autoscaled array with the 16 yearly average values in the rows and the 5 variables for each of the 3 sites in the columns (15 columns). The aim of this calculation was to study the relationships between the variables according to their environmental behavior and their variation over time. The results are shown in Figure 5.
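The two unfold-PCA models differ only in which modes are merged into rows or columns. As a sketch, assuming the yearly averages are stored in a hypothetical three-way NumPy array `Y` with shape years × sites × variables, both matrices can be obtained by reshaping. The row and column orderings chosen here are our assumption; permuting rows or columns only permutes the scores or loadings and does not change the PCA result.

```python
import numpy as np

n_years, n_sites, n_vars = 16, 3, 5
# Hypothetical array of yearly averages: Y[year, site, variable]
Y = np.arange(n_years * n_sites * n_vars, dtype=float).reshape(n_years, n_sites, n_vars)

# First unfolding (Figure 4 model): one row per (site, year) pair, variables in columns.
# Rows are grouped by site: site 1 years 1-16, then site 2, then site 3.
X_site_year = Y.transpose(1, 0, 2).reshape(n_sites * n_years, n_vars)   # 48 x 5

# Second unfolding (Figure 5 model): one row per year; the columns are
# the 5 variables of site 1, then those of site 2, then those of site 3.
X_year = Y.reshape(n_years, n_sites * n_vars)                           # 16 x 15

# Both matrices would then be autoscaled column-wise before PCA.
```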
The first two PCs retained 76% of the total variance (PC1 62%, PC2 14%); subsequent PCs did not add any valuable information.
In the loading plot, the first component described the primary pollutants at the three sites: these variables appeared highly correlated, especially at the Como and Lecco sites. PC2 was mainly influenced by O3 (the secondary pollutant).
In this case, the model highlighted the correlation between the variables according to their environmental source. Note that their positions in this plot were quite different from those in the loading plot of the previously described PCA model.
In the score plot, the samples were distributed along the PC1 direction, from 1992 (negative scores) to the more recent years, which had positive scores (the years from 2000 onward occupied a small area of the plot). The model thus indicated an average reduction of the primary pollutants from 1992 (the most polluted situation) to 2000, followed by stabilization. O3 levels (averaged over the three sites) did not show significant changes.
3.2. Four-Way Tucker3 Model
The first multiway model was calculated by arranging the data set in a four-way array of 365 days × 5 parameters × 16 years × 3 sites. A logarithmic transformation and J-scaling (scaling within the variable mode) were applied prior to modeling. The aim of this model was to investigate: (1) the trends of the pollutants over the period 1992–2007; (2) the variability of the pollutants throughout the year; and (3) the differences between the three sites considered.
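The preprocessing can be sketched as follows, assuming the daily data sit in a NumPy array of shape (365, 5, 16, 3) ordered as days × parameters × years × sites. Scaling within the variable mode divides every variable slab X[:, j, :, :] by a single constant, so that all five parameters carry comparable weight while the relative structure across days, years, and sites is preserved. Using the root mean square as that constant, and clipping very small values before the logarithm, are our assumptions; `log_j_scale` is a hypothetical helper and the data are random placeholders.

```python
import numpy as np

def log_j_scale(X, eps=1e-6):
    """Log-transform a four-way array and scale it within mode 2 (variables).

    X has shape (days, variables, years, sites). Values below eps are clipped
    before taking the logarithm (assumption: avoids log(0) for zero readings).
    Each variable slab L[:, j, :, :] is then divided by its root mean square."""
    L = np.log(np.clip(X, eps, None))
    rms = np.sqrt(np.mean(L**2, axis=(0, 2, 3), keepdims=True))  # shape (1, 5, 1, 1)
    return L / rms

# Placeholder data with the dimensions used in the text
rng = np.random.default_rng(1)
X = rng.lognormal(mean=2.0, sigma=0.8, size=(365, 5, 16, 3))
Xs = log_j_scale(X)
```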
The optimal model complexity, that is, the optimal number of factors to retain in each mode for an adequate explained variance, was determined by evaluating the scree plot [29,30], in which the percentage of explained variance is plotted against the model complexity (see Figure 6). The scree plot was built by computing all models from the one with one factor in each mode, [1,1,1,1], up to the model [5,5,5,3]. Several models were considered, and the optimal one was the best compromise between a small number of factors in each mode and a high percentage of explained data variance.
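The scree-plot procedure can be sketched in NumPy as below. Each candidate model is fitted here by higher-order orthogonal iteration (HOOI), one standard algorithm for the Tucker model; it is our choice for illustration and not necessarily the algorithm used in the original calculation. With orthonormal loading matrices, the model sum of squares equals the sum of squared core elements, so the explained variance is ||Z||²/||X||². Combinations in which one rank exceeds the product of the others are skipped, because such a model cannot use the extra factor. The functions `unfold`, `mode_dot`, `tucker_hooi`, `explained_variance`, and `scree` are hypothetical helpers written for this example.

```python
import itertools
import numpy as np

def unfold(X, mode):
    """Mode-n matricization: the chosen mode becomes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_dot(X, M, mode):
    """Multiply array X along `mode` by matrix M of shape (new_dim, X.shape[mode])."""
    return np.moveaxis(np.tensordot(M, X, axes=(1, mode)), 0, mode)

def leading_vectors(M, r):
    """First r left singular vectors of matrix M."""
    return np.linalg.svd(M, full_matrices=False)[0][:, :r]

def tucker_hooi(X, ranks, n_iter=30):
    """Tucker model fitted by HOOI; returns the core Z and orthonormal loadings."""
    # HOSVD initialization: leading singular vectors of each unfolding
    factors = [leading_vectors(unfold(X, n), r) for n, r in enumerate(ranks)]
    for _ in range(n_iter):
        for n in range(X.ndim):
            Y = X
            for m in range(X.ndim):
                if m != n:
                    Y = mode_dot(Y, factors[m].T, m)  # project onto the other modes
            factors[n] = leading_vectors(unfold(Y, n), ranks[n])
    core = X
    for m in range(X.ndim):
        core = mode_dot(core, factors[m].T, m)
    return core, factors

def explained_variance(X, core):
    """Fraction of the sum of squares of X reproduced by an orthonormal Tucker model."""
    return float(np.sum(core**2) / np.sum(X**2))

def scree(X, max_ranks, n_iter=30):
    """Explained variance for every complexity from [1,...,1] up to max_ranks."""
    results = {}
    for ranks in itertools.product(*(range(1, m + 1) for m in max_ranks)):
        total = int(np.prod(ranks))
        if any(r * r > total for r in ranks):  # r > product of the other ranks
            continue
        core, _ = tucker_hooi(X, ranks, n_iter)
        results[ranks] = explained_variance(X, core)
    return results
```

For the daily model one would call `scree(Xs, (5, 5, 5, 3))` on the preprocessed (365, 5, 16, 3) array and plot the values against the model complexity; the loop fits up to 375 models, so it can take some time.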
The decomposition model [2,1,2,2], which explained 96.5% of the total data variance, was chosen for further investigation (Figure 6a). Three of the four modes were described by two factors, whereas the second mode needed only one. The first mode (A) contained information on the days of the year, the second mode (B) on the correlation between the parameters, and the third (C) and fourth (D) modes on the differences between the years and between the sites, respectively.
The most important elements of the core array Z were [1,1,1,1], [2,1,1,2], and [1,1,2,2], which explained 64.5%, 21.1%, and 6.2% of the core variance, respectively, together accounting for 91.8% of the total core variance. The loading plots for the different modes are shown in Figure 7.
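The percentage of core variance attributed to each element is its squared value divided by the sum of all squared core elements; this is meaningful when the loading matrices are orthonormal, as in an orthogonal Tucker fit. The hypothetical helper below ranks the core elements and reports them with 1-based indices and signs, in the same form used in the text. The core in the usage example is a toy array, not the fitted one.

```python
import numpy as np

def core_contributions(core, top=3):
    """Rank core elements by their share of the core variance.

    Returns (1-based index tuple, percentage, sign) for the `top` largest elements.
    Assumes orthonormal loadings, so squared core elements add up to the model
    sum of squares."""
    sq = core**2
    pct = 100 * sq / sq.sum()
    order = np.argsort(pct, axis=None)[::-1][:top]  # flat indices, largest first
    return [(tuple(int(i) + 1 for i in np.unravel_index(k, core.shape)),
             float(pct.flat[k]),
             "+" if core.flat[k] > 0 else "-")
            for k in order]

# Toy [2,1,2,2] core with three non-zero elements (illustrative values only)
Z_toy = np.zeros((2, 1, 2, 2))
Z_toy[0, 0, 0, 0] = -3.0   # element [1,1,1,1]
Z_toy[1, 0, 0, 1] = 2.0    # element [2,1,1,2]
Z_toy[0, 0, 1, 1] = 1.0    # element [1,1,2,2]
for idx, p, sign in core_contributions(Z_toy):
    print(idx, round(p, 1), sign)
# (1, 1, 1, 1) 64.3 -
# (2, 1, 1, 2) 28.6 +
# (1, 1, 2, 2) 7.1 +
```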
The first important core element, [1,1,1,1], explained 64.5% of the core variance and carried the information about the interaction between the first factors of all modes (A1, B1, C1, D1). The sign of this core element was negative (−). Along the first axis of mode A (days) and mode C (years), all loadings were positive; therefore, the sign of the interaction was determined by modes B (variables) and D (sites). B1 was mainly related to the variation of NO, NO2, NOx, and SO2, which were highly correlated with one another and had negative loadings, whereas O3 had a positive loading. This axis therefore discriminated between the primary pollutants and the secondary one. D1 separated the rural town, Varenna (V), which had a negative loading, from the two industrialized and populated cities, Como (C) and Lecco (L), which had similar positive loadings. The negative sign of the core element can be explained through the interaction between the strongly negative loadings of B1 (variables) and the strongly positive loadings of D1 (sites), or vice versa (modes A and C showed only positive loadings). Thus, on average, the concentrations of the primary pollutants reached higher levels at the Como and Lecco sites, whereas Varenna was characterized by higher O3 levels. This held for all days in all years.
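The sign reasoning above can be checked mechanically: the contribution of one core element to a cell is the product of the core element and four loadings, so only their signs matter for the direction of the effect. The sketch below encodes the signs read from the [1,1,1,1] interpretation (magnitudes are deliberately ignored; `contribution_sign` is a hypothetical helper) and tabulates whether this term raises or lowers each pollutant at each site.

```python
import numpy as np

def contribution_sign(z, a, b, c, d):
    """Sign of z*a*b*c*d, the contribution of core element z to cell x_ijkl."""
    return int(np.sign(z * a * b * c * d))

# Signs taken from the [1,1,1,1] interpretation: core (-), A1 (+), C1 (+)
b1 = {"NO": -1, "NO2": -1, "NOx": -1, "SO2": -1, "O3": +1}
d1 = {"Varenna": -1, "Como": +1, "Lecco": +1}
table = {(v, s): contribution_sign(-1, +1, bv, +1, ds)
         for v, bv in b1.items() for s, ds in d1.items()}

# +1: the term raises the (log-scaled) value; -1: it lowers it
for (v, s), sign in table.items():
    print(f"{v:>4} at {s:<8} {sign:+d}")
```

The table reproduces the conclusion in the text: positive contributions for the primary pollutants at Como and Lecco and for O3 at Varenna, negative ones otherwise.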
The second important core element, [2,1,1,2], explained 21.1% of the core variance and reflected the second factor of the first and fourth modes and the first factor of the second and third modes (A2, B1, C1, D2). A trend in the sample positions emerged along A2, despite the overlap caused by the large number of samples (days). Samples were ordered from negative to positive values according to seasonal variability: winter and autumn days (black and dark gray in the plot) had negative values, whereas spring and summer days (gray and light gray in the plot) had positive values. B1 distinguished between the primary pollutants and O3, as already mentioned, while all the years had positive values on C1. Along D2 all loadings were negative, so this factor showed no differences between the sites. O3 levels (positive loading on B1) increased during spring and summer days, while the highest values of the primary pollutants were recorded in the cold seasons.
The third important core element, [1,1,2,2], had a positive sign and explained 6.2% of the core variance; it carried the information about the interaction between factors A1, B1, C2, and D2. C2 showed a trend from the first years investigated (positive values) to the more recent ones (negative values). The other factors have already been described above. The third core element therefore showed the decrease of the primary pollutants over time, as previously found with the unfold-PCA models.
The second Tucker3 model presented in this article was built to study the differences between the days of the week (values averaged on a weekly basis, Monday to Sunday). The data were arranged into a four-way array of 7 days of the week × 5 parameters × 834 weeks × 3 sites. A logarithmic transformation and J-scaling were applied prior to modeling.
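One way to obtain such an array from a continuous daily record is sketched below, assuming a NumPy array of daily values with shape (days, parameters, sites) and keeping only complete Monday-to-Sunday weeks. The helper `to_weeks` is hypothetical, and dropping the partial weeks at both ends is our assumption about how incomplete weeks are handled; the daily values are placeholders.

```python
from datetime import date
import numpy as np

def to_weeks(daily, first_weekday):
    """Reshape daily data (days, ...) into (7 weekdays, weeks, ...).

    first_weekday is the weekday of daily[0] (0 = Monday, as in date.weekday()).
    Days before the first Monday and the final incomplete week are dropped."""
    start = (7 - first_weekday) % 7          # index of the first Monday
    n_weeks = (daily.shape[0] - start) // 7
    weeks = daily[start:start + 7 * n_weeks].reshape(n_weeks, 7, *daily.shape[1:])
    return np.moveaxis(weeks, 1, 0)          # weekday becomes the first mode

first, last = date(1992, 1, 1), date(2007, 12, 31)
n_days = (last - first).days + 1             # 5844 days
daily = np.zeros((n_days, 5, 3))             # placeholder: days x parameters x sites
W = to_weeks(daily, first.weekday())         # (7, weeks, 5, 3)
X4 = W.transpose(0, 2, 1, 3)                 # (7, 5, weeks, 3)
print(X4.shape)                              # (7, 5, 834, 3)
```

Since 1 January 1992 was a Wednesday, the first five days are dropped, and the remaining record contains 834 complete weeks plus one leftover day.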
The decomposition model with complexity [2,1,2,2] (77.3% of the total data variance) was chosen (Figure 6b). The most important elements of Z were [1,1,1,1], [1,1,2,2], and [2,1,1,2] (66.8%, 31.4%, and 1.8% of the core variance, respectively), with negative signs for the first two elements and a positive sign for the third. The loading plots for the different modes are shown in Figure 8.
The first mode (A) described the difference between the days of the week, the second mode (B) described the correlation between the parameters, the third mode (C) described the difference between the weeks, and the fourth one (D) the difference between the sites.
The first important core element, [1,1,1,1], described the interaction between the first factors of all modes (A1, B1, C1, D1). The sign of this core element was negative (−). Along the first axis of mode A (days), all loadings were positive, so there were no differences between the days of the week. B1 showed a clear separation between O3, with a positive loading, and NO, NO2, NOx, and SO2, with negative loadings. Along C1, all loadings were positive. D1 separated Varenna (V), with a negative loading, from the other two sites, Como (C) and Lecco (L), with similar positive loadings. The sign of the core element therefore depended on the interaction between the strongly negative loadings of B1 (variables) and the strongly positive loadings of D1 (sites), or vice versa (modes A and C showed only positive loadings). The core element [1,1,1,1] thus showed that Como and Lecco were characterized by higher levels of primary pollutants, while Varenna showed higher O3 levels on all days of the week and in all weeks.
The second important core element, [1,1,2,2], accounted for 31.4% of the core variance and described the interactions between factors A1, B1, C2, and D2. The core element sign was negative (−). A1 and B1 have already been discussed. Along C2, winter weeks had positive values, which changed into negative values in the summer weeks. All three sites had positive values along D2, but Como and Lecco had higher values than Varenna. The sign of this core element therefore depended on the interaction between the strongly negative loadings of B1 (variables) and the strongly positive loadings of C2 (weeks), or vice versa, since A1 and D2 showed only positive loadings. Therefore, the core element [1,1,2,2] showed higher concentrations of the primary pollutants (NO, NO2, NOx, and SO2) in winter and autumn, whereas the secondary pollutant (O3) was higher in summer and spring, on all days of the week and at all sites. Again, Como and Lecco showed higher values of the primary pollutants.
The third important core element, [2,1,1,2], accounted for 1.8% of the core variance and described the interaction between factors A2, B1, C1, and D2. The core element sign was positive (+). A2 separated weekends (Saturday and Sunday), with positive values, from working days (Monday to Friday), with negative values. B1 showed positive values for O3 and negative values for the primary pollutants, while C1 and D2 showed positive values for the weeks and the sites, respectively. Therefore, the core element [2,1,1,2] showed higher concentrations of the primary pollutants (NO, NO2, NOx, and SO2) on working days and a higher concentration of the secondary pollutant (O3) on weekends, for all weeks and all sites.
This model thus revealed a difference between weekends and working days. Like the previously computed model, it also showed the difference between Varenna and the other two cities and, as expected, confirmed the ARPA site classifications. Moreover, the cold seasons (winter and autumn) showed higher concentrations of primary pollutants, and O3 reached its maximum values in the warm seasons (summer and spring), as already shown by the previous model.