Predicting PM2.5 and PM10 Levels during Critical Episodes Management in Santiago, Chile, with a Bivariate Birnbaum-Saunders Log-Linear Model

: Improving air quality is an important environmental challenge of our time. Chile currently has one of the most stable and emerging economies in Latin America, where human impact on natural resources and air quality does not go unperceived. Santiago, the capital of Chile, is one of the cities in which particulate matter (PM) levels exceed national and international limits. Its location and climate cause critical conditions for human health when interaction with anthropogenic emissions is present. In this paper, we propose a predictive model based on bivariate regression to estimate PM levels, related to PM2.5 and PM10, simultaneously. Birnbaum-Saunders distributions are used in the joint modeling of real-world PM2.5 and PM10 data by considering as covariates some relevant meteorological variables employed in similar studies. The Mahalanobis distance is utilized to assess bivariate outliers and to detect suitability of the distributional assumption. In addition, we use the local inﬂuence technique for analyzing the impact of a perturbation on the overall estimation of model parameters. In the predictions, we check the categorization for the observed and predicted cases of the model according to the primary air quality regulations for PM.


Introduction and Literature Review
Note that particulate matter (PM) with a diameter less than 2.5 micrometers (PM2.5) is formed by particles small enough to penetrate respiratory pathways until reaching lungs and alveoli causing risks in public health [1]. Related epidemiological, toxicological and controlled human exposure studies have been reviewed [2]. This review concluded that various investigations, focused on individual sources of PM, provide evidence on a specific source that affects human health. This is the case for atmospheric contamination derived from vehicle traffic provoking some effects on human health like asthma, exacerbation of chronic respiratory diseases, respiratory problems and total cardiovascular mortality, among others [2]. Other disorders caused by atmospheric pollutants are epilepsy, headaches and venous thromboembolic disease [3].
For more than three decades, the city of Santiago in Chile has been one of the urban places that has presented levels exceeding national and international contamination limits [4]. Its location, topography and meteorology cause critical conditions on human Note that multivariate robust regression approaches have been proposed [20]. Multivariate outliers can affect the resulting ML estimates. The detection of outliers in multivariate observations is often based on the Mahalanobis distance (MD) [21]. Nevertheless, sometimes outliers do not have an enough large MD, which is due to the fact that the estimators based on the model employed to generate the MD are non-robust [22,23]. This is named the masking effect and occurs when a group of extreme observations distorts the estimates of the mean vector and/or variance-covariance matrix, producing a small distance from the outlier to the mean. The GBS family of distributions, including its BS and BS-t members, has been extended to the multivariate case [24], its multivariate qualitative robustness has been studied, and the mentioned masking effect has been evaluated numerically by simulations in multivariate BS-t models [21].
Another aspect to be considered regarding multivariate outliers is the low-dimensional visualization when employing usual scatterplots (2D). This type of visualization is not reliable to identify high-dimensional outliers. There are several outlier identification approaches looking at axis-parallel views or low-dimensional projections (often 2D) which are assumed to indicate high-dimensional outliers [25][26][27]. Low-dimensional views are risky, as discussed in [25] and shown by its Figure 9. The 2D scatterplots fail to reveal 3D outliers, a situation which is even worse in higher dimensions. Usual 2D scatterplots can be utilized to support the linear relationships between response variables and covariates provided by correlation coefficients. However, one must be careful when analyzing the 2D scatterplot matrix to detect high-dimensional outliers having in mind such a limitation.
Rieck and Nedelman [28] were the first ones in deriving BS regressions, often based on the logarithmic BS (log-BS) distribution [29]. The bivariate version of the BS distribution was proposed in [30], where ML and modified moment estimates of the corresponding parameters were derived. Multivariate log-BS distributions and multivariate BS log-linear regression models are presented in [10,12,[31][32][33].
For the present work, one of the assumptions on the bivariate regression is that its random errors are positive-skew distributed, which permits us to suitably model atmospheric pollutant levels. The use of the BS distribution has been justified by the proportionate-effect model demonstrating that this distribution has properties similar to those corresponding to the log-normal distribution, which allows its employ in atmospheric pollutant models [34]. For other applications of the BS distribution to environmental phenomena, see [35][36][37][38].
The main objective of this study is to apply a bivariate regression model to predict, simultaneously, the levels of PM2.5 and PM10 for the next day during the critical episodes management (CEM) in Santiago, Chile. This predictive model is based on a bivariate GBS regression, specifically using the bivariate BS and BS-t distributions for the model errors. A stepwise algorithm considering the Bayesian information criterion (BIC) is employed as a systematic variable selection tool to obtain a final bivariate regression model. In addition, diagnostics analytics is conducted by goodness-of-fit (GOF) and global/local influence techniques. GOF is used to determine which model offers a better fit to the atmospheric contamination data, whereas the local influence technique is utilized to analyze the impact of a perturbation on the overall estimation of model parameters [10,39]. Thus, model precision to predict a critical episode of atmospheric contamination is determined. The data were analyzed with the R software [40].
In Section 2, background on bivariate GBS and log-GBS distributions, bivariate loglinear GBS models and diagnostic techniques is provided. In Section 3, the case study is presented to motivate the application of the bivariate predictive model. Then, we introduce an application where this model is used with real-world PM2.5 and PM10 data. Section 4 contains the conclusions of this investigation and ideas for future research from the present applied study.
Algorithm 1 Generator of bivariate log-BS random vectors. 1: Perform a Cholesky decomposition of Ψ as Ψ = LL , with L being a lower triangular matrix with real and positive diagonal elements.

5:
Iterate Steps 1 to 4 until the vector of data is generated.

Algorithm 2 Generator of bivariate log-BS-t random vectors.
1: Perform a Cholesky decomposition of Ψ as Ψ = LL , with L being a lower triangular matrix with real and positive diagonal elements.
6: Iterate Steps 1 to 5 until the vector of data is generated.

Bivariate GBS Log-Linear Models
Consider a bivariate GBS log-linear regression model stated as with X = (x is ) ∈ R n×p being the model matrix of rank p, containing the values of p covariates, and Y = (Y ij ) ∈ R n×2 being the log-response matrix. Note that X and Y are connected by a coefficient matrix β = (β sj ) = (β 1 , β 2 ) ∈ R p×2 to be estimated, while E = (ε ij ) ∈ R n×2 is the error matrix. In addition, in the model defined in (2), let Y i , x i and ε i be the ith rows of Y, X and E, respectively. Thus, we have that where ε 1 , . . . , ε n are independent and identically distributed log-GBS 2 (α1 2×1 , 0 2×1 , Ψ, g (2) ), with 1 2×1 being a vector of ones. Consider a sample Y = (Y 1 , . . . , Y n ) from a bivariate GBS log-linear regression structure, with E(Y i ) = β x i , and y = (y 1 , . . . , y n ) being its respective observations. Hence, with the notations 'vec' and 'svec' for vectorization and vectorization of a symmetric matrix, respectively, the log-likelihood function for θ = (α, vec(β) , svec(Ψ) ) based on (3) is expressed as . . , n and j = 1, 2.
From (4), if g (2) is the bivariate Gaussian or t density generator, then the log-likelihood functions for θ are respectively stated as where c 1 and c 2 are constants independent of θ, and ξ ij , φ i are defined in (4). Multivariate log-GBS distributions are obtained from elliptic density generators, say g (2) . In this context, a result of interest is stated as with dg (2) (u)/du being the derivative of g (2) (u) with respect to u. If the function g (2) is a continuous and decreasing, it attains its maximum at u g , which is finite and positive. In addition, if g (2) is a continuous and differentiable function, u g is the solution to the equation ζ(u) + 1/u = 0, with ζ(u) being defined in (6). Note that the generator g (2) depends on a further shape parameter, which is denoted by ν, and it permits us to control the kurtosis of the distribution. Notice that u g equals to two for both Gaussian and t density generators. Thus, for the Gaussian and t density generators, we have, respectively, Consider the log-likelihood functions for θ defined in (4)-(5) and Ψ = Ψ(ρ), where ρ = svec(Ψ) = (ρ 1 , . . . , ρ l ) , with l = m(m − 1)/2 for m = 2. By taking the derivative of (θ; y) with respect to α, β, ρ, we obtain the gradient vector for θ stated by˙ = (˙ α ,˙ β ,˙ ρ ) , wherė with ζ i = ζ(MD i ), ζ expressed in (6), ) and D(X) is a block diagonal matrix with elements x i . To determine the ML estimates of the model parameters formulated in (2), we must equate the elements of the gradient vector stated in (7) to zero and, in this manner, obtain a homogeneous system of equations. Note that, for the log-BS-t distribution, as ν → ∞, one has −2ζ i approaching one, for all i = 1, . . . , n.
As this system cannot be solved analytically, the ML estimate θ of θ must be computed by using a non-linear optimization method to maximize the corresponding log-likelihood function. We use an iterative procedure for the optimization; more details about this procedure are provided below after the Hessian matrix is defined.
Observe that θ does not contain ν of the bivariate log-BS-t model, which must be fixed to obtain qualitative robustness according to [13][14][15][16]21]. Thus, we can work with a log-likelihood function profiled at ν. From [14], the influence function when using the t model is bounded only if ν is fixed, providing qualitatively robust parameter estimates. Nevertheless, the influence function is unbounded when ν is obtained with the ML estimation method. This indicates the non-robustness from the qualitative point of view, which should have a breakdown point equal to zero when analyzing its quantitative robustness, but this type of robustness will be explored in future studies.
The observed information matrix is stated as I(θ) = −¨ , with¨ being the Hessian matrix expressed as¨ with elements where, for k = 1, . . . , l, with Ψ(ρ) ρ k ρ s being stated as if k = s; whereas the case k = s conducts to ), whose elements are as given in (4). In order to obtain the maximized log-likelihood function, we use the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method, also named the quantum-quantum BFGS optimization algorithm, which is a good choice for solving non-linear systems of equations since, in most cases, the BFGS algorithm may attain the solution more rapidly than other algorithms. For more details on numerical analysis for statistics, including the BFGS algorithm, see [41]. The BFGS algorithm is implemented in the R software by the function optim. Note that the gradient vector and Hessian matrix are analytically computed from the expressions defined in (7) and (8), respectively, and not numerically from the optim function. In addition, warm start/initial value selection for the iterative procedure is obtained from: (i) the ordinary least square estimate j being computed from (i); and (iii) Ψ (0) = D( Σ (0) ) −1/2 Σ (0) D( Σ (0) ) −1/2 , where D is a diagonal matrix and ij )/2), for i = 1, . . . , n and j = 1, 2.
Note that the estimators α, β and ρ are consistent, under regularity conditions, and they follow asymptotically a bivariate normal model with means α, β and ρ, respectively, and covariance matrix Σ θ that can be obtained from the associated expected Fisher information matrix. Therefore, as n → ∞, we get with D − → meaning "convergence in distribution to", p * = 2p + l + 1 and J (θ) = lim n→∞ (1/n)I(θ), where I(θ) is the associated expected Fisher information matrix. Observe that I −1 (θ) is a consistent estimator of the variance-covariance matrix of θ. Empirically, the expected Fisher information matrix may be approximated by the observed Fisher information matrix generated from (8), whereas the diagonal elements of its inverse matrix may be used to approximate the standard errors (SE). Asymptotic inference for the bivariate GBS log-linear regression parameters may be conducted by the asymptotic normality stated in (9).

Diagnostics Analysis
Diagnostics are used to assess suitability and stability in the modeling. As mentioned, diagnostics can be evaluated by GOF methods and global/local influence techniques. Let Y ∼ log-GBS 2 (α, µ, Ψ, g (2) ). Then, we have the property: (2) ), that is, the generalized chi-squared distribution with two degrees of freedom; see details in [12]. From this property of the bivariate log-GBS distribution, we get the MD expressed as being useful, as mentioned, to assess outliers in bivariate regression and to test goodness of fit in these regressions. Observe that: (i) MD i ∼ χ 2 (2), that is, the MD has the central χ 2 distribution with two degrees of freedom, if g (2) is the bivariate Gaussian density generator; and (ii) MD i /2 ∼ F (2, ν); that is, it is related to the central F distribution with two degrees of freedom in the numerator and ν in the denominator, when g (2) follows the bivariate t density generator, for i = 1, . . . , n. Note from the gradient vectors defined in (7) that ζ i can be interpreted as a weight in relation to the MD i . Then, as this weight is inversely proportional to MD i for the bivariate BS-t model, if case i has a large MD, it should have a small weight in the ML estimate. Therefore, this procedure assigns less weight to outlying observations in the sense of the MD defined in (10). Consider (θ) as the log-likelihood function for θ of the model stated in (2), named the non-perturbed model, and w ∈ R q as the perturbation vector in the model, for w ∈ Ω, with Ω being a set of perturbations. Thus, (θ|w) is the log-likelihood function of the perturbed model, where θ w is the ML estimate of θ generated from (θ|w). In addition, consider w 0 ∈ Ω ∈ R q as a non-perturbation vector with w 0 = 0 q×1 or w 0 = 1 q×1 , so that (θ) = (θ|w 0 ). Supposing that (θ|w) is a twice continuously differentiable function in a neighborhood of ( θ, w 0 ), the idea is to compare the ML estimates θ and θ w by the local influence method to assess how inference is affected by the associated perturbation. The likelihood distance (LD) is expressed as which is employed to evaluate the influence of w. A large LD(w) in (11) indicates that θ and θ w are considerably different in terms of the contours of the non-perturbed log-likelihood function (θ). In this paper, the local behavior of the influence plot a(w) = (w , LD(w)) around w 0 is studied. The direction in which the LD locally changes most quickly is determined; that is, the maximum curvature of the surface a(w). For LD(w) stated in (11), the maximum curvature is defined by with C d = 2|d Fd|, the matrix F ∈ R n×n and d being the unit-length direction vector. In order to obtain C max given in (12) and the corresponding direction vector d max , we must compute with ∆(θ, w) ∈ R p * ×n being a matrix partitioned accordingly for the perturbed model generated from (2), called perturbation matrix, with elements given by . . , n, j = 1, . . . , p * , evaluated at θ = θ and w = w 0 , where, as mentioned, p * = 2p + l + 1. Recall that −¨ ( θ) ∈ R p * ×p * is the observed information matrix for the non-perturbed model. This matrix is stated as I(θ) = −¨ , with¨ being the Hessian matrix given in (8). Thus, d max is a unit-length eigenvector related to the largest absolute eigenvalue C max expressed in (12). If the absolute value of d max i is large, it indicates that case i is potentially influential. In addition to d max i , another direction of interest is d i = e in , which is associated with the direction of case i, with e in ∈ R n being a vector of zeros and a one at the ith position. Therefore, the normal curvature is The diagnostic technique stated in (14) is named total local influence [42,43]. By employing the formulation given in (2) and its perturbed version, it is possible to determine normal curvatures for local influence. In order to do this, it is necessary to obtain the observed information matrix −¨ ( θ), compute the perturbation matrix ∆( θ, w 0 ) and then calculate the eigenvector related to the largest absolute eigenvalue of F defined in (13) as a local influence indicator. The schemes to be employed in this research are: (i) case-weight perturbation, (ii) correlation-matrix perturbation, (iii) response perturbation and (iv) a continuous covariate perturbation; see details about these schemes in [10].

Definition of the Problem and Established Methods
It is essential to study the relationship between the exposure of atmospheric pollutants and their impact on health, especially in mega-cities where a considerable number of the population is exposed, including vulnerable age groups. The effects of contamination produced by coarse PM in Santiago were investigated in [4], concluding that, for every 50 µg/m 3 increase in PM10 level, hospital visits caused by respiratory symptoms in children under 2 years of age increased by 4-12% [44]. Most respiratory emergency visits in Santiago were significantly associated with atmospheric contamination, specifically, with particles emitted during the combustion of fixed or mobile sources, like vehicle traffic [45]. The effects of atmospheric alerts by means of a multiple linear regression model were studied in [44]. The authors determined that atmospheric quality regulations in Santiago helped to decrease significantly the pollutant levels, where PM2.5 and PM10 reductions were between 5-7% for the "alert" and 12% for "pre-emergency" categories.
Meteorological conditions are an uncontrollable key factor in the determination of variability of atmospheric contamination. In some cases, it can surpass the influence of some anthropogenic effects, such as those that originate from vehicle traffic [46]. The effect of meteorological parameters on PM has been studied using different statistical techniques, including multiple linear regression, generalized additive models, multivariate adaptive regression splines and neural networks. Considering the high number of statistical models that can be used to fit atmospheric contamination, it is very common to observe discrepancies among results [46].
In Chile, the primary air quality regulations for PM10 are established in Supreme Decree number 59/1998 of the National Environmental Commission (CONAMA in Spanish) [47].
In 1999, CONAMA commissioned a study to improve the air quality predictive methodology in the Metropolitan Region, which resulted in a new approach known as the Cassmassi model, named after its creator Joseph Cassmassi [48,49]. This model was developed from air quality data measured by the automatic monitoring network of atmospheric pollutants of the Metropolitan network (MACAM in Spanish) and altitude-based meteorological data from the central zone of the country, between 1-April and 17-September during the years 1997 and 1998.
In 2000, CONAMA replaced its old model with a new Cassmassi model [50]. This model predicts the maximum level of PM10 for the next day, in each station of the MACAM network classified as a monitoring station with population representativeness for PM10. The MACAM network has 11 monitoring stations, geographically located in certain zones of the Metropolitan region of Chile, with their corresponding numbers on the map as shown in Figure 1   In 2015, the Chilean Ministry of Environment presented a pollution predictive model that anticipated days with bad air quality, known as the "Air quality predictive WRF-MMA model for fine breathable PM2.5", where WRF denotes "weather research and forecasting" and MMA denotes "Ministry of Environment" (Ministerio de Medio Ambiente de Chile, in Spanish). This model estimates the maximum level of PM2.5 for the next day and it is capable of predicting critical events of PM2.5 contamination three days in advance for nine cities along central and southern Chile [51,52].
Currently in 2021, the Cassmassi and WRF-MMA models are used and evaluated by environmental experts in Chile each day. These predictive models for PM levels are based on a univariate multiple linear regression. Subsequently, the authority makes a decision whether to issue an environmental alert, pre-emergency or emergency in the corresponding zone. Note that the models used by the Ministry of Environment predict PM2.5 and PM10 separately. Given that these levels are highly correlated, they should be considered in only one predictive model, such as proposed in the present investigation.

Data, Variables and Model
In this study, data collected from the Pudahuel monitoring station from MACAM network were used during the year 2015 in the CEM period. The main reasons to work with 2015 data and the Pudahuel station are: (i) 2015 is the last year with the most validated measurements for each station; (ii) the Pudahuel station registered the highest levels of PM2.5 during 2015; (iii) the Pudahuel station is the most influential monitoring station in Santiago, informing administrative decisions based on predicted critical episodes [53]; and (iv) according to air quality regulation, if at least one monitoring station in Santiago reports situations defined as pre-emergency or emergency for PM10 and/or PM2.5, the authority will declare the condition of a critical episode in the city [54]. Hence, data from the Pudahuel station are considered relevant for pollutant investigation in the Santiago region. Meteorological and pollutant data for the Pudahuel station were obtained from the National Air Quality Information System (SINCA in Spanish) website of the Chilean Ministry of Environment, which provides air quality data for the entire country (http: //sinca.mma.gob.cl, accessed on 22 January 2021). Some variables used in this study were originally measured hourly and had to be transformed in order to represent daily measurements for modeling purposes. In addition, a binary variable was used indicating if the current day is a weekend/holiday or weekday. The covariates employed in our predictive models are: • average wind speed every 6 h of the present day between 0:00-5:59 (X 1 ), 6:00-11:59 (X 2 ), 12:00-17:59 (X 3 ) and 18:00-23:59 (X 4 ); • average temperature every 6 h of the present day between 0:00-5:59 (X 5 ), 6:00-11:59 (X 6 ), 12:00-17:59 (X 7 ) and 18:00-23:59 (X 8 ); • average relative humidity every 6 h of the present day between 0:00-5:59 (X 9 ), 6:00-11:59 (X 10 ), 12:00-17:59 (X 11 ) and 18:00-23:59 (X 12 ); • average PM2.5 level every 6 h of the present day between 0:00-5:59 (X 13 ), 6:00-11:59 (X 14  The response variables considered are: • maximum PM2.5 level for the next day (T 1 ); • maximum PM10 level for the next day (T 2 ).
The data that contain these covariates and the response variable are named "Chilean PM" data.
The bivariate predictive model proposed in this study describes the relationship among the response variables defined above, T 1 and T 2 , that represent PM2.5 and PM10 maximum levels for the next day, respectively, and a set of p = 34 covariates, also defined above. Then, the bivariate predictive model is expressed in matrix form as with X = (x is ) ∈ R n×(p+1) being the model design matrix of rank p + 1 = 35, containing the values of 34 covariates, and Y = (Y ij ) = (log(T ij )) ∈ R n×2 being the log-response matrix. In addition, X and Y are connected by a coefficient matrix β = (β sj ) = (β 1 , β 2 ) ∈ R 35×2 , and E = (ε ij ) ∈ R n×2 being the error matrix. Here, the rows of the error matrix (ε i ) of the model defined in (15) are considered to be random variables whose behavior is characterized by bivariate statistical distributions. In this study, models with errors following bivariate log-GBS distributions are proposed, namely ε i ∼ log-GBS 2 (α1 2×1 , 0 2×1 , Ψ, g (2) ), with Ψ = (ρ rs ) ∈ R 2×2 being the correlation matrix and g (2) the bivariate density generator. We estimate the parameters of the bivariate GBS regression models via the ML method, which we have implemented in R by using the BFGS method through the optim function. As mentioned, the gradient vector and the Hessian matrix are analytically computed.
For the model given in (15), the MD is as defined in (10). Furthermore, MD i ∼ χ 2 (2), when g (2) is the bivariate normal density generator, and MD i /2 ∼ F (2, ν), if g (2) is the bivariate t density generator, for i = 1, . . . , n. Substituting the ML estimator of θ in MD i ( θ), this measure possesses asymptotically the same distribution as MD i (θ). Note that, using the Wilson-Hilferty (WH) approximation, it is possible to transform this distance so that it follows a normal distribution. Consequently, it is possible to check for normality using GOF techniques [10]. We show diagnostic graphical plots for total local influence (C i ) to detect possible influential cases under the fitted models.

Data Exploratory Analysis
Next, the bivariate predictive model defined in (15), based on bivariate GBS distributions, is applied, including GOF techniques and diagnostics based on the MD using the R software. The R codes and data used in this application are available upon request. Table 1 reports a descriptive summary of the data, which includes minimum, median, maximum, range, mean, standard deviation (SD), coefficient of variation (CV), coefficient of skewness (CS) and coefficient of kurtosis (CK) for the response variables, during a CEM period for the Pudahuel monitoring station in the year 2015. Although a CEM period covers the days between 01-April-2015 to 31-August-2015, the first 7 days of April were not considered for this study because no data for PM10 levels were registered in the monitoring station. For this reason, the total number of data is n = 146 and not 153. The primary air quality regulation for PM2.5 and PM10 is 50 µg/Nm 3 and 150 µg/Nm 3 , respectively, as 24-h level. According to Table 1, the primary regulations are exceeded for both response variables. Continuing with the exploratory analysis of the data, in Figure 2, marginal asymmetric distributions for response variables T 1 and T 2 are observed, justifying the need to model these levels with positive-skew distributions as proposed in this study. In addition, we calculate correlations between the response variables and all quantitative covariates, with the binary variable X 34 being not included in the correlation matrix. First, we remove the covariates X 22 , X 26 , X 27 and X 28 that are highly correlated between them. This is supported by the variance inflation factor (VIF) greater than 10 in marginal models causing possible collinearity problems. Such VIF values are 49.9, 8356.2, 20,551.9 and 18,657.9, respectively; see details about the VIF in [17] (p. 118) and [55]. Second, based on the low correlation between some covariates and the response variables, we determine that only the following covariates are part of the bivariate predictive model: X 21 , X 23 , X 24 , X 31 , X 32 , X 33 and X 34 . In Figure 3, a scatterplot matrix of these covariates (except the binary variable X 34 ) and the response variables is shown. This figure is conformed by scatterplots for the variables in study and their corresponding correlation coefficient. From this figure, a high correlation can be identified between T 1 and T 2 , justifying the use of a bivariate model. Note that we employ these 2D scatterplots to support the linear relationships between response variables and covariates provided by correlation coefficients. However, we do not consider them to detect multivariate outliers due to limitations earlier mentioned. In Figure 3, a scatterplot matrix of these covariates (except the binary variable X 34 ) and the respons

Parameter estimation and model selection 360
In view of the exploratory analysis described previously, bivariate log-GBS distributions seem to be adequate for obtaining the predictive model to be used when data-driven decision making i the monitoring of PM environmental contamination in Santiago, Chile. Then, the predictive bivariat regression model to be applied is given by Continuing with the exploratory analysis of the data, in Figure 2, marginal asymmetric 344 distributions for response variables T 1 and T 2 are observed, justifying the need to model these levels 345 with positive-skew distributions as proposed in this study. In addition, we calculate correlations 346 between the response variables and all quantitative covariates, with the binary variable X 34 being 347 not included in the correlation matrix. First, we remove the covariates X 22 , X 26 , X 27 , and X 28 that 348 are highly correlated between them. This is supported by the variance inflation factor (VIF) greater 349 than ten in marginal models causing possible collinearity problems. Such VIF values are 49.9, 8356.2, 350 20551.9, and 18657.9, respectively; see details about the VIF in [17, p. 118] and [55]. Second, based on 351 the low correlation between some covariates and the response variables, we determine that only the 352 following covariates are part of the bivariate predictive model: X 21 , X 23 , X 24 , X 31 , X 32 , X 33 and X 34 .

353
In Figure 3, a scatterplot matrix of these covariates (except the binary variable X 34 ) and the response

Parameter estimation and model selection 360
In view of the exploratory analysis described previously, bivariate log-GBS distributions seem to be adequate for obtaining the predictive model to be used when data-driven decision making in the monitoring of PM environmental contamination in Santiago, Chile. Then, the predictive bivariate regression model to be applied is given by

Parameter Estimation and Model Selection
In view of the exploratory analysis described previously, bivariate log-GBS distributions seem adequate to obtain the predictive model to be used in data-driven decision making when monitoring environmental pollution in Santiago. Then, the predictive bivariate regression model to be applied is given by where Y = (Y ij ) = (log(T ij )) ∈ R 146×2 is the log-response matrix and ε i = (ε i1 , ε i2 ) ∼ log-GBS 2 (α1 2×1 , 0 2×1 , Ψ 2×2 , g (2) ). As mentioned, the parameters of the bivariate BS and BS-t models are estimated by the ML method, which has been implemented in the R software. A stepwise algorithm based on the BIC is used for variable selection within the set {X 21 , X 23 , X 24 , X 31 , X 32 , X 33 , X 34 }. The covariates were initially ordered according to the correlation with the response variables. Table 2 provides the results obtained by this variable selection algorithm for the bivariate BS regression model.  Next, parameter estimates (that is, the value that maximizes the log-likelihood function), estimated asymptotic SEs and p-values of the corresponding t-tests are obtained for each of the parameters of the bivariate BS regression model, where non-significant covariates were excluded at a 5% significance level. Its results are reported in Table 3, from where it is possible to note that the coefficients β 23 and β 32 should be removed when predicting T 1 ; however, both of them should be used when predicting T 2 . Further, coefficient β 24 should be removed when predicting T 2 , but considered when predicting T 1 . As mentioned, the MD can be considered to evaluate whether the proposed distributional assumption for the multivariate models is appropriate and also as a measure of global influence to identify multivariate outliers. In Figure 4a, an empirical probability versus theoretical probability (PP) plot is presented, with Kolmogorov-Smirnov (KS) acceptance regions at 5% for transformed MDs. The KS test, although not particularly sensitive, but very competitive with other tests, is the only test that can be linked to a graphical tool as the PP plot. A graphical tool is always more desirable than a test due to its easier interpretation. However, if the graphical GOF tool can be accompanied by a p-value associated with a GOF test, it is more informative. This is the reason why we have used both GOF tools [56]. From the PP plot, the BS model does not have a good fit, which is corroborated by a p-value of 0.001 of the KS test associated with this PP plot. From Figure 4b, observe that cases {64, 95, 119} appear as possible multivariate outliers in the BS model. These cases correspond to 10-June, 11-July, and 04-August of the year 2015, respectively.
Next, we adopt a bivariate BS-t regression model to describe the data and apply the same variable selection algorithm as with the bivariate BS regression model, within the set {X 21 ,X 23 , X 24 , X 31 , X 32 , X 33 , X 34 }. An important point to consider under a t model is whether the degrees of freedom, ν, are estimated or not. Various authors [10,[13][14][15][16]21] have worked on this issue and reported problems when estimating ν due to unboundedness and local maximum in the likelihood function. Thus, in order to overcome this difficulty, the parameter ν can be previously fixed or, otherwise, information for ν from the data can be obtained [14]. Then, to estimate the parameters of the bivariate BS-t regression model, we use the profiled log-likelihood function with fixed ν from 1 to 20. This procedure is known as the non-failing method and applied in each of the iterations of the stepwise algorithm, starting the procedure with ν = 4 and attaining an optimum at ν = 3 with the lowest BIC (113.1882); see Figure 5a. Table 4 provides the results obtained by this variable selection algorithm for the bivariate BS-t regression model.    Next, parameter estimates, estimated asymptotic SEs and p-values of the corresponding t-tests are obtained for each of the parameters of the bivariate BS-t regression model, where non-significant covariates were excluded at a 5% significance level, in this case, X 32 . We use this level to obtain the BS-t model and its results are reported in Table 5. The BS-t model is proposed as optimal parsimonious, where the estimated ρ is statistically significant and the coefficient β 23 must be discarded in the prediction of T 1 , but considered for T 2 .  Figure 5b presents a PP plot with KS acceptance regions at 5% for transformed MDs. The plot shows that the bivariate BS-t model has a better fit than the BS model, with p-value of 0.8502 of the KS test. In addition, we fit the bivariate normal regression for comparison with an established model and summarize the BIC values of this model and of the bivariate BS and BS-t regression models in Table 6. Marginal normal regression models are less suitable than the bivariate normal regression model, as expected, with their BIC values omitted here. From Table 6 and Figure 5a, we confirm that the BS-t regression with ν = 3 degrees of freedom and the indicated covariates is the most adequate model for describing the Chilean PM data.

Diagnostic Analytics
From Figure 5c, we can identify that cases {64, 95} appear as possible multivariate outliers in the BS-t model. These cases correspond to 10-June-2015 and 11-July-2015, respectively. As it is well known, outliers can or cannot be potentially influential cases, so that we now apply the local influence method for their evaluation.
In order to identify possible influential cases under the fitted model, diagnostic plots are presented for total local influence (C i ). The schemes to be employed in this research are: (i) case-weight perturbation, (ii) correlation-matrix perturbation, (iii) response perturbation and (iv) a continuous covariate perturbation; see details about these schemes in [10]. Figure 6a-d present index plots for C i (θ), C i (α), C i (β) and C i (ρ) under the case-weight perturbation scheme. For this scheme, we can distinguish that case {96} appears as high potentially influential onθ in Figure 6a. In addition, this same case has a high potential influence only onβ in Figure 6c. Furthermore, in Figure 6b and in Figure 6d, there is no influence of this case onα andρ, respectively. Figure 6e-h show index plots for C i (θ), C i (α), C i (β) and C i (ρ) under the correlation matrix perturbation scheme. These figures indicate that under this perturbation scheme no cases stand out as high potentially influential considering the BS-t model. Figure 6i-l present index plots for C i (θ), C i (α), C i (β) and C i (ρ) under the response variable perturbation scheme. From these figures, one can distinguish that case 74 has a high influence on PM10, when applying the BS-t regression model. Furthermore, case 74 also has some influence onβ for PM10, but not forα andρ. Figures 6m-p present index plots for C i (θ), C i (α), C i (β) and C i (ρ) under the covariate perturbation scheme. From these figures, we can observe that no cases appear as potentially influential cases in the bivariate BS-t model.

Analysis of Results
In summary, cases {64, 74, 95, 96} are identified as potentially influential data under the different perturbation schemes used, two of which cases {64, 95} are indicated also as possible outliers. These cases correspond to 10-June, 20-June, 11-July and 12-July of the year 2015, respectively. Case 64 (10-June-2015) was the day prior to the second highest PM2.5 level during the year, whereas case 74 (20-June-2015) was the day with the highest recorded level for PM2.5 and the second highest for PM10. Note that cases 95 and 96 (11-July-2015 and 12-July-2015) are 2 of 5 days with the largest measured rainfall (total precipitation) for the entire year, which might have also affected the low levels of PM2.5 and PM10 observed for case 96. Under these conditions, a much larger decrease was expected for PM levels than the predicted levels. Meteorological variables might affect the response variables in an indirect manner, as observed in the perturbation schemes.
Next, the prediction capacity of the bivariate BS-t regression model with respect to the primary quality guidelines for PM2.5 and PM10 is analyzed, based on which the degree of precision to detect critical episodes was determined. Table 7 provides the primary quality guidelines for PM2.5 and PM10 levels for 24 h.

PM2.5 Level PM10 Level Indication
[0, 50) [0, 150) good [50,80) [150, 195) regular [80,110) [195, 240) alert [110,170) [240, 330) pre-emergency 170 330 emergency The results for the corresponding predictive capacity of the BS-t model for the year 2015 are reported next. First, once again, observed versus predicted data during the 2015 CEM period for PM2.5 are analyzed. According to Figure 7, the model is capable of following the overall trend of the observed data. Nevertheless, just as for bivariate BS model, when an abrupt increase in the pollutant level is present from one day to another, it is not capable of predicting a value similar to the observed data. Note that cases presenting extreme residuals of low frequency in the histogram are those where large differences exist between the observed and predicted data. Table 8 contains the categorization of the observed and predicted measurements according to the primary air quality regulations of PM2.5 levels for the year 2015. Most underestimated cases occurs when the observed data are categorized as emergency.  Just as for the PM2.5 level, a general underestimation of the PM10 level is noted. According to Figure 8, for extremely high measurements, a value similar to the observed data cannot be predicted by the model. Note that cases presenting extreme residuals of low frequency in the histogram are those where medium differences exist between the observed and predicted data. Table 9 contains the categorization of the observed and predicted measurements according to the primary air quality regulations for PM10 levels for the year 2015. Most of the underestimated cases occur when the observed data are categorized as "pre-emergency" or "emergency".

Conclusions and Future Investigation
In this study, bivariate Birnbaum-Saunders log-linear models were fitted to predict the maximum PM2.5 and PM10 levels during critical episodes management in Santiago, Chile. The bivariate Birnbaum-Saunders-t model showed a better fit to the data and, consequently, more precise and robust results were obtained with respect to the Birnbaum-Saunders model. The proportion of accurate predictions about the corresponding observed categories, according to primary air quality regulations, was also better for the Birnbaum-Saunders-t model than for the Birnbaum-Saunders model, in both PM2.5 and PM10 levels. For the bivariate Birnbaum-Saunders model, statistically significant meteorological variables, at a 5% significance level, were: maximum level of PM2.5 of the present day, average wind speed of the present day, predicted temperature range for the next day, average relative humidity of the present day, total precipitation of the present day, average atmospheric pressure of the present day and the binary variable weekend/holiday. For the bivariate BS-t model, the statistically significant covariates, at a 5% significance level, were the same as those for the Birnbaum-Saunders model, except for total precipitation of the present day and average atmospheric pressure of the present day. The stepwise algorithm was used as a systematic variable selection tool to obtain the bivariate regression model based on the Bayesian information criterion. The Mahalanobis distance was employed to evaluate if the distributional assumption was appropriate for each model and also as global influence method to detect bivariate outliers. The local influence technique, under perturbation schemes of case-weight, correlation matrix, response variable and a continuous covariate, was utilized to identify possible influential cases under the fitted model. For the Birnbaum-Saunders-t model, predictions were superior for the maximum PM2.5 level than for the maximum PM10 level. Considering the categorization of PM2.5 estimates using the Birnbaum-Saunders-t model, it is worth mentioning that some alert and pre-emergency indications were overestimated in more relevant categories according to primary air quality regulations for PM2.5 levels. The regular, alert, pre-emergency and emergency categories obtained an 81.5%, 50.0%, 51.2% and 36.8% of assertiveness, respectively; see Table 8. For PM10 estimates using the Birnbaum-Saunders-t model, an 87.3% and 63.9% assertiveness were obtained for the regular and alert categories, while categorizations for pre-emergency and emergency were underestimated mainly under alert; see Table 8. Future research, which arose from the present applied investigation, is proposed as follows: (i) Incorporation in the modeling of temporal, spatial, functional and quantile regression structures, as well as measurement errors, and partial least squares, are suitable to be studied and can improve the predictive capability of the model [57][58][59][60][61][62][63]. (ii) Traditional robust estimation methods as well as the theoretical study of quantitative robustness are also of interest [64].
(iii) Other applications in the context of multivariate methods are in cluster analysis and principal component analysis, particularly when using principal components to remove the collinearity among covariates [65]. (iv) An interesting field of application is in the statistical learning and neural networks.
The methodology used in this applied investigation provides options to explore other theoretical and numerical topics related, which are in progress and we hope to report them in other articles.