Robust Analysis of PM2.5 Concentration Measurements in the Ecuadorian Park La Carolina

In this article, a robust statistical analysis of particulate matter (PM2.5) concentration measurements is carried out. Here, the region chosen for the study was the urban park La Carolina, which is one of the most important in Quito, Ecuador, and is located in the financial center of the city. This park is surrounded by avenues with high traffic, in which shopping centers, businesses, entertainment venues, and homes, among other things, can be found. Therefore, it is important to study air pollution in the region where this urban park is located, in order to contribute to the improvement of the quality of life in the area. The preliminary study presented in this article was focused on the robust estimation of both the central tendency and the dispersion of the PM2.5 concentration measurements carried out in the park and some surrounding streets. To this end, the following estimators were used: (i) for robust location estimation: α-trimmed mean, trimean, and median estimators; and (ii) for robust scale estimation: median absolute deviation, semi interquartile range, biweight midvariance, and estimators based on a subrange. In addition, nonparametric confidence intervals were established, and air pollution levels due to PM2.5 concentrations were classified according to categories established by the Quito Air Quality Index. According to these categories, the results of the analysis showed that neither the streets that border the park nor the park itself are at the Alert level. Finally, it can be said that La Carolina Park is fulfilling its function as an air pollution filter.


Introduction
Air pollution is a serious problem [1]. Many people die every year due to problems related to air pollution. Therefore, in order to reduce this type of pollution in an efficient way, it is important to create citizen awareness about it and take care of green areas and trees, build more urban parks, avoid the use of aerosols, and improve the transportation system, among other things.
Living in the city implies that citizens have to face the problem of air pollution very often. Therefore, it is important to design measurement and control systems that can eliminate or mitigate the problems that air pollution is causing to humans and nature. The smart city concept takes this idea into account, and one of the most harmful pollutants for humans is PM 2.5 (fine particulate matter with a diameter smaller than 2.5 µm) [2].
[...] the concentration of NO2 and NOx on the road was performed by using the ordinary least-squares regression method. The studies conducted in [30] could be used to better understand the effects that heavily traveled roads adjacent to sidewalks and houses have on people. All this helps to achieve a better urban design and to better predict the level of exposure that people have to certain variables that are harmful to human beings.
In addition, in order to study fine-scale variations of PM 2.5 concentrations and black carbon, in [31], "a three-point synchronous observation experiment" was carried out. Such an experiment was performed in Shanghai, China. Moreover, a characterization of concentrations of PM 2.5 and black carbon was performed in [31], and a generalized additive model was used to show the relationship that PM 2.5 concentrations and black carbon have with multiple factors. The results of [31] could contribute to the development of air pollution control strategies at roadsides.
In [32], in order to carry out indoor air quality predictions, a methodology based on modeling was presented. The predictions were made using artificial neural networks and the Personal-Exposure Activity Location Model [33,34]. The approach presented in [32] can be employed to determine exposures that urban workers have to air pollutants. The study presented in [32] was performed in Dublin, Ireland. In addition, in [35], a research work on particulate matter produced by "vehicle emissions at traffic intersections of street canyons in Hong Kong" was presented. In [35], detrended fluctuation analysis and autocorrelation analysis were used to study the distinguishing features of the concentration of particulate matter.
Furthermore, in [36], a model consisting of a wavelet neural network and a genetic algorithm was used to predict CO and PM 2.5 concentrations. This model was developed to estimate the concentrations of air pollutants in Shanghai. In addition, in [37], both parts of the urban park La Carolina (Quito, Ecuador) and some of the surrounding streets were modeled as random variables, which represented routes followed in some areas to measure the concentration of PM 2.5 in them. Then, nonparametric statistical inference techniques were used to classify the level of air pollution in the areas under study, based on the air quality indexes defined by the city of Quito [38]. In [37], a Kruskal-Wallis test was applied to test whether the above-mentioned random variables came from the same statistical population, and the Wilcoxon signed-rank test was used to perform the classification in accordance with the Quito air quality indexes [38]. The results of [37] showed that the air pollution level due to PM 2.5 concentrations in that urban park was not at the Alert level. However, there are important decisions that have to be taken by the city authorities in order to improve the air quality in the area under study. For each statistical model, regardless of whether it is a location model, a scale model, a linear regression model, or a model of any other type, there are different robust statistical tools that increase the reliability and accuracy of modeling and/or data analysis [39,40].
The main objective of this article is to perform the robust statistical analysis of PM 2.5 concentration measurements in La Carolina Park, which is one of the most important urban parks of Quito. In this article, in order to robustly estimate the location and scale parameters of the data, the following estimators were used: (i) for robust location estimation: α-trimmed mean, trimean, and median estimators; and (ii) for robust scale estimation: median absolute deviation, semi-interquartile range, biweight midvariance, and estimators based on a subrange [39-42].
The use of several of the above-mentioned estimators in the analysis of PM 2.5 concentrations in urban parks is novel. Its importance lies in the fact that, with few data and without requiring them to follow a normal distribution or any other parametric distribution, the central tendency and dispersion of the data can be estimated in a robust way. Thus, small variations in the data have little effect on the estimates.
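The location estimators named above can be illustrated with a minimal Python sketch. It is not the implementation used in this article: the readings are hypothetical, the function names are chosen here for illustration, and the quartile convention (the "inclusive" method of the standard library) is an assumption, since several conventions exist.

```python
import statistics


def trimmed_mean(data, alpha):
    """Alpha-trimmed mean: drop int(alpha * n) points from each tail, then average."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    x = sorted(data)
    k = int(alpha * len(x))  # number of observations removed per tail
    return statistics.fmean(x[k:len(x) - k])


def trimean(data):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


# Hypothetical PM2.5 readings (ug/m^3) with one extreme value on the right.
readings = [8.0, 9.0, 10.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 95.0]

mean_value = statistics.fmean(readings)            # pulled up by the extreme value
median_value = statistics.median(readings)
trimmed_value = trimmed_mean(readings, alpha=0.1)  # drops one point per tail
trimean_value = trimean(readings)
```

In this hypothetical sample, the single extreme reading moves the arithmetic mean far from the bulk of the data, while the three robust estimates stay close to each other.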
Sensors 2019, 19, 4648
In this article, in order to understand the characteristics of the data that were analyzed, the selection criteria of the study area (i.e., La Carolina Park, Quito) were established first. Second, a statistical summary of the data was produced, and the moments of first, second, third, and fourth order were computed, together with the median and range of the data. Also, graphs of the data were shown, and the data were smoothed in order to reduce the effect that each individual data point could have. In addition, empirical confidence intervals for the median of the data were obtained with a 95% confidence level, and nonparametric confidence intervals for the median of the data were also obtained with a 95% confidence level.
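One common way to obtain a nonparametric confidence interval for a median uses order statistics and the Binomial(n, 1/2) distribution. The sketch below shows that standard construction under the assumption of independent observations; it is not necessarily the procedure followed in this article, and the function name and sample are hypothetical.

```python
from math import comb


def median_ci(data, confidence=0.95):
    """Distribution-free confidence interval for the median.

    Returns (lower, upper, achieved_coverage), where the bounds are the k-th
    smallest and k-th largest observations and k is the largest integer with
    P(B <= k - 1) <= (1 - confidence) / 2 for B ~ Binomial(n, 1/2).
    """
    x = sorted(data)
    n = len(x)
    alpha = 1.0 - confidence
    k = 0       # order statistics skipped on each side so far
    tail = 0.0  # P(B <= k - 1)
    while k < n // 2:
        next_tail = tail + comb(n, k) / 2 ** n  # P(B <= k)
        if next_tail > alpha / 2:
            break
        tail = next_tail
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    return x[k - 1], x[n - k], 1.0 - 2.0 * tail


# Hypothetical PM2.5 readings (ug/m^3).
sample = [8.0, 9.0, 10.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 95.0]
lower, upper, coverage = median_ci(sample)
```

Because the binomial distribution is discrete, the achieved coverage is usually somewhat above the nominal 95%, which is why the function also returns it.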
Here, the air pollution levels of each variable under study were classified in accordance with the categories established by the city of Quito [38]. After that, the robust statistical analysis of location parameters and scale parameters of the data was performed. The results of this article are compatible with [37].
This article is structured as follows: The criteria that were taken into account to select the study area are explained in Section 2. The description of the PM 2.5 measurement instrument that was used in this article and of the data collection process is given in Section 3. Section 4 is devoted to the robust statistical analysis of the data. Finally, the conclusions are given in Section 5.

Selection Criteria of the Study Area: La Carolina Park
One of the fundamental criteria taken into consideration in choosing La Carolina Park for our study was that this urban park offers an extremely important environmental service for Quito: it is an air pollution filter. However, the characteristics of the park and its surroundings are not uniform.
Therefore, taking into consideration what has been said above, this article is focused on studying the area of La Carolina Park that possibly has the highest levels of PM 2.5 concentrations.
In order to select the study area of the park, the following four criteria were taken into account: (A) vehicular traffic; (B) number of public transport routes; (C) woodland density; and (D) land use.
Furthermore, with the purpose of performing the experimental measurements, a route was followed through the park and its surroundings. The route that was followed is shown in Figure 1.
In addition, this figure shows the six random variables (X1, X2, X3, Y1, Y2, Y3) that refer to the routes along which the specific measurements were performed. The above-mentioned four selection criteria are shown in Figure 2, and the explanation of Figures 1 and 2 is given below.
The six random variables shown in Figure 1, which refer to the routes along which the PM 2.5 concentration measurements were performed, represent the following:
• X1 refers to the route that was followed along the sidewalk that is in front of La Carolina Park and on Avenida de los Shyris (shown as Shyris Av. in Figure 1). Measuring PM 2.5 concentrations on this sidewalk is important, because the greatest commercial activity around the park is found here. On this sidewalk, there are many high-rise buildings with shopping centers, companies, bus stops, homes, and entertainment venues, among other things, which act as barriers that prevent a satisfactory dissipation of PM 2.5 concentrations. Therefore, the passers-by on this sidewalk are exposed to the air pollution generated by the cars and buses that circulate along Avenida de los Shyris. In addition, the polluted wind that collides with the buildings on this sidewalk bounces and creates whirls, which carry polluted air toward pedestrians.
• X2 refers to the route that was followed along the sidewalk on the border that separates La Carolina Park from Avenida de los Shyris. Measuring PM 2.5 concentrations on this sidewalk is important, because people walk along it to have more contact with green spaces. On this sidewalk, there are also bus stops, and people use it to go jogging and do other activities.
• X3 refers to the route that was followed along the imaginary line that crosses the park center and that is parallel to X1 and X2. Measuring PM 2.5 concentrations on this imaginary line is important, because it represents a region with trees, vegetation, and green spaces. Moreover, this region has no bus or car traffic. In addition, this region has many areas that have been specifically designed for citizens to practice outdoor sports in the park.
• Y1 refers to the route that was followed along Avenida República del Salvador (shown as Rep. Del Salvador Av. in Figure 1) and a path through the park that is an extension of Avenida República del Salvador. This path is used by people to go to the park center. The observation of this variable is important, because it takes into account parts of the park that have many trees and abundant vegetation.
• Y2 refers to the route that was followed along Avenida Portugal (shown as Portugal Av. in Figure 1) and a path through the park that is an extension of Avenida Portugal. This path is used by people to go to the park center. This variable represents an imaginary line that is perpendicular to X1, X2, and X3, and parallel to Y1. Furthermore, the observation of this variable is important, because it takes into account parts of the park that have very few trees and very little vegetation. However, this region has many areas that have been specifically designed for citizens to practice outdoor sports in the park.
• Y3 refers to the route that was followed along Avenida Naciones Unidas (shown as Naciones Unidas Av. in Figure 1), which is a significant pollution source. This fact justified the need to consider this variable in the study carried out in this article. For this variable, the PM 2.5 concentration measurements were performed on the sidewalk on the border that separates the park from Avenida Naciones Unidas.
The explanation of what is indicated in Figure 2A-D is as follows:
• Figure 2A corresponds to selection criterion (A) (i.e., vehicular traffic) and shows that the five main streets that border the park have high traffic (indicated in red). This area is part of the financial and commercial heart of the city. Therefore, it attracts a high volume of private vehicles. Only the street that borders the park at the northwest corner has low traffic (indicated in green), because it does not direct traffic to places of great interest. This part is generally used as a parking area.
• Figure 2B corresponds to selection criterion (B) (i.e., number of public transport routes) and shows that the streets that border the park have public transport lines; in Quito, public transport buses are among the most polluting types of vehicles. In particular, the two streets that connect the city in a north-south direction (i.e., Avenida Rio Amazonas, which is shown as Amazonas Av. in Figure 1, and Avenida de los Shyris) have seven and ten bus lines, respectively, followed by Avenida Naciones Unidas, which has five bus lines. The junction at which Avenida de los Shyris and Avenida Naciones Unidas intersect is the one that concentrates the most buses. In addition, it is possible that this junction has the highest air pollution emissions from bus traffic.
• Figure 2C corresponds to selection criterion (C) (i.e., woodland density) and reflects the fact that forest density (represented by green markers) is a key factor in improving air quality. Therefore, this criterion focused on the identification of the park area with the lowest tree density. This allows the identification of the most critical area, since having fewer trees may indicate that there are fewer natural systems that reduce pollution. The northern part of the park was the one that met this criterion. This area is associated with physical activities such as exercise and sports in spite of its lack of trees and plants, which results from a deliberate decision to create sports facilities rather than high-density green spaces. These facilities consist of volleyball courts, basketball courts, and soccer fields.
• Figure 2D corresponds to selection criterion (D) (i.e., land use) and reflects the fact that the study should take into account the people exposed to air pollution. Therefore, land use was estimated, especially on the ground floor of the buildings and commercial areas of the region that borders the park. To satisfy this criterion, the blocks that border the park and that have both a greater amount and diversity of activities on the ground floor were identified. The areas shaded in orange were the ones chosen according to this criterion. The orange areas located on Avenida de los Shyris showed the greatest amount and diversity of commercial premises. Furthermore, these blocks contained the highest percentage of food stores on the ground floor. In addition, the enclosures of these food stores do not have any filter mechanism for the air pollution that may be generated on this street.
It should also be mentioned that there are car parking areas within the park. However, although these parking areas are distributed equally throughout the park, with four areas to the south and four areas to the north, the difference in vegetation density between them is a factor that must be taken into account.
As a result of overlapping the above-mentioned four criteria, it was observed that the zone most prone to higher levels of air pollution, and where people would be more exposed because of the activities they perform, is the zone indicated in Figure 2 (1) (i.e., Avenida de los Shyris and Avenida Naciones Unidas), that is, the zone located in the northeast of the park. This fact justified the need to measure air pollution in that zone.
According to the four selection criteria that were mentioned at the beginning of this section, Figure 2 (2) shows the routes that were followed to perform the PM 2.5 concentration measurements within the most critical area of the park.

PM 2.5 Measurement Instrument and Data Collection Description
In this article, a portable CEL-712 Microdust Pro monitor paired with a GPS (Global Positioning System) device was used to perform the PM 2.5 concentration measurements. This measurement instrument was calibrated by CASELLA [43]. The calibration results were the following: Target Error < 15%.
In this article, the sampling time of the CEL-712 Microdust Pro monitor was 1 s, and the measurements were averaged to report a result every 10 s. Moreover, the GPS saved the coordinates of the route every minute.
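The averaging of 1 s samples into 10 s results can be sketched as follows. In the study, the averaging was presumably done by the instrument itself; this Python fragment only illustrates the arithmetic, with hypothetical data and the assumption that a trailing incomplete window is discarded.

```python
from statistics import fmean


def block_average(samples, block_size=10):
    """Average consecutive, non-overlapping blocks of `block_size` samples.

    A trailing partial block is discarded so that every output value
    covers a full window (an assumption of this sketch).
    """
    n_blocks = len(samples) // block_size
    return [
        fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]


# Hypothetical 1 s readings (ug/m^3): 25 samples give two full 10 s windows.
one_second = [10.0] * 10 + [20.0] * 10 + [30.0] * 5
ten_second = block_average(one_second)
```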
The above-mentioned equipment was used to carry out the measurements at a height of 1.5 m, with the particle inlet facing forward. The walking speed was approximately 2 km/h. Then, in order to build a pollution map in QGIS software, both the collected concentration data and the GPS data were used. In this research work, PM 2.5 concentration measurements were performed in October 2018 and January 2019. Moreover, all measurements were performed during the morning rush hours (8:00-10:00), as this is the time of day with the worst air quality conditions, due to the low planetary boundary layer height and reduced atmospheric mixing.
According to [44], the CEL-712 Microdust Pro has the highest measurement range of any occupational dust measurement instrument available on the market. In addition, due to the large data storage capacity of this equipment, up to 500 measurement results can be taken and stored. Moreover, the user can generate reports in an easy way using the intuitive report wizard. The key features of this instrument and applications can be found in [44]. A characteristic of the CEL-712 Microdust Pro that was important in order to carry out the research work presented in this article is that this instrument can be used for spot checks and walk-through surveys. In addition, this instrument has the advantage that the user can instantly see when and where excessive dust levels are occurring [44].
In accordance with [44], another characteristic of this instrument, which no other device has, is that the CEL-712 Microdust Pro uses an on-site calibration filter to provide a spot check of the linearity of the instrument.
Furthermore, the CEL-712 Microdust Pro was validated by collocation following the instructions of the U.S. Environmental Protection Agency (U.S. EPA) to evaluate low-cost sensors [45]. For the case under study, the U.S. EPA also recommends [46,47]. Here, the CEL-712 Microdust Pro was collocated for six hours with the Thermo Fisher Scientific BAM equipment of the Air Quality Network station of Quito, which is run by the Environmental Protection Agency of Quito (in Spanish, "Secretaria de Ambiente" (SA)). The experiment time had to be limited to six hours because of the battery lifetime and because the battery could not be replaced, as the monitoring station was located in a school. Figure 3 shows the raw Microdust Pro data and the hourly averages of the Microdust Pro and the BAM method. It can be seen that, while there is variation in the Microdust Pro data, the BAM data fall within one standard deviation of the Microdust Pro data. In accordance with [48], the validation was performed at times when the relative humidity was low [49].
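The agreement criterion used in the collocation comparison (the reference value lying within one standard deviation of the monitor data) can be expressed as a simple check. The following Python sketch is only illustrative: the function name and the values are hypothetical and are not the collocation data of this study.

```python
from statistics import fmean, stdev


def within_one_sd(monitor_values, reference_value):
    """True if the reference lies within mean +/- 1 sample SD of the monitor data."""
    center = fmean(monitor_values)
    spread = stdev(monitor_values)  # sample standard deviation (n - 1 denominator)
    return abs(reference_value - center) <= spread


# Hypothetical monitor readings for a single hour (ug/m^3).
monitor_hour = [12.0, 15.0, 9.0, 14.0, 10.0]

agrees = within_one_sd(monitor_hour, reference_value=13.0)
disagrees = within_one_sd(monitor_hour, reference_value=16.0)
```

In practice, the check would be repeated for each hour of the collocation period, comparing each hourly reference value with the monitor readings of the same hour.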
Before finishing this section, it is important to mention that studies aimed at conducting the evaluation and comparison of several PM 2.5 monitors can be found in [6]. In addition, research articles focused on the calibration and validation of low-cost particulate matter sensors can be found in [4,48].

Robust Data Analysis
This section is divided into three parts. First, a statistical summary of the data is presented. Second, a nonparametric inferential analysis is performed. Finally, robust estimators are used to analyze measures of central tendency and dispersion of the data of the variables under study. The latter is justified by the fact that robust estimators are relatively insensitive to small changes in the data.
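To make the robust dispersion measures concrete, the following Python sketch computes the median absolute deviation, the semi-interquartile range, and the biweight midvariance for a hypothetical sample. It follows the usual textbook definitions and is not the code used in this article: the MAD is left unscaled, the tuning constant c = 9 of the biweight midvariance is the customary default, and the quartile convention is an assumption.

```python
import statistics


def mad(data):
    """Median absolute deviation about the median (unscaled)."""
    m = statistics.median(data)
    return statistics.median([abs(x - m) for x in data])


def semi_interquartile_range(data):
    """Half the distance between the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / 2


def biweight_midvariance(data, c=9.0):
    """Biweight midvariance; points with |u| >= 1 receive zero weight."""
    m = statistics.median(data)
    scale = mad(data)
    if scale == 0:
        return 0.0
    num = 0.0
    den = 0.0
    for x in data:
        u = (x - m) / (c * scale)
        if abs(u) < 1:
            num += (x - m) ** 2 * (1 - u ** 2) ** 4
            den += (1 - u ** 2) * (1 - 5 * u ** 2)
    return len(data) * num / den ** 2


# Hypothetical PM2.5 readings (ug/m^3) with one extreme value on the right.
values = [8.0, 9.0, 10.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 95.0]

mad_value = mad(values)
siqr_value = semi_interquartile_range(values)
bmv_value = biweight_midvariance(values)
classical_variance = statistics.variance(values)  # inflated by the extreme value
```

For this sample, the extreme reading dominates the classical variance, whereas the three robust measures reflect the spread of the remaining observations.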

Statistical Summary of the Dataset
In accordance with [50], the dataset of this research work is analyzed in order to properly interpret the relationships within it. Furthermore, in this subsection, different types of graphs are used to convey the analysis and conclusions in a simple way. These graphs allow a simple comparison between the data and also highlight differences between the variables under study. According to [51], such graphs provide a clear and precise representation of the results.
To perform the analysis, a graph of each of the datasets was first considered, where these were treated as if they were a time series (see Figure 4). The order in which the data was collected is represented on the abscissa axis, while the value of the data is represented on the axis of the ordinates. Figure 4 shows the collected data separated by the lines that are parallel to the park, which are represented by the letter X, and the lines that are perpendicular to the park, which are represented by the letter Y (see Figures 1 and 2). The results of the PM 2.5 measurements that are represented in this figure are the observations of the variables represented in Figure 1, which have been explained in Section 2.

From Figure 4, it can be seen that the variable X1, which is the one with the most data, and the variable Y3 are the ones with the greatest fluctuations, while the variables X2 and X3 behave in a more linear way and with fewer variations, although the values of X3 are lower than those of X2. The other two variables, Y1 and Y2, apparently behave analogously to each other and differently from the rest. Below, a statistical summary of the data is shown in Table 1. This table includes measures of central tendency, variability, and shape. When comparing the summary statistics presented in Table 1 with those presented in [6], the difference between both studies is that in [6], the skewness and kurtosis were not studied. However, in this article, it is important to study these third and fourth moments in order to analyze whether the data come from heavy-tailed distributions. According to [40], heavy-tailed distributions are those probability distributions whose density tails are not bounded by the normal density tails.
Figure 5 shows the multiple box-plot of data from each variable, and Figure 6 shows the empirical 95% confidence intervals for the median [52], including both the mean and the median of the data.
Based on the previous figures, at first sight it is observed that the sample sizes of the six variables under study are different, with X1 having almost double the observations of each of the other variables. In addition, it is observed that the means are higher than the medians in all cases, except for Y2. This indicates the existence of extreme values on the right, which is corroborated by the multiple box-plot shown in Figure 5.
In addition, the values of the standard deviation shown in Table 1 confirm that the variability of X1 is similar to the variability of Y3, that the dispersion of the values of X2 and X3 is small, and that there is some similarity between the fluctuations of Y1 and the fluctuations of Y2.
Moreover, it can be seen that all the variables present a great lack of normality due to the high skewness values. All the skewness values shown in Table 1 are positive, so the distributions are lengthened to the right. The lack of normality is reaffirmed for most of the variables, since almost all the kurtosis values are much greater than 3, in particular the kurtosis values of X1 and Y2.
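The shape measures discussed here can be computed from central moments. The sketch below uses the population (biased) moment definitions, with kurtosis on the scale where the normal distribution equals 3; the statistical software behind Table 1 may apply bias corrections, so this is an assumption, and the sample values are hypothetical.

```python
from statistics import fmean


def central_moment(data, order):
    """Population central moment of the given order."""
    mu = fmean(data)
    return fmean([(x - mu) ** order for x in data])


def skewness(data):
    """Third standardized moment; positive for a longer right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Fourth standardized moment (non-excess); equals 3 for a normal distribution."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


# Hypothetical PM2.5 readings (ug/m^3) with one extreme value on the right.
skewed_sample = [8.0, 9.0, 10.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 95.0]
symmetric_sample = [1.0, 2.0, 3.0, 4.0, 5.0]

skew_right = skewness(skewed_sample)
kurt_right = kurtosis(skewed_sample)
skew_sym = skewness(symmetric_sample)
kurt_sym = kurtosis(symmetric_sample)
```

A single extreme value on the right is enough to produce a positive skewness and a kurtosis well above 3 in the hypothetical skewed sample, whereas the symmetric sample has zero skewness.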
Next, in order to see the trends of the variables under study, the moving average (MA) technique was used [53]. The MA technique smooths the time series data, reducing the influence of each individual observation.
In this article, MAs of size 5, 10, 15, and 20 were considered. For each variable, the MA technique behaved in the same way for all of these sizes: the ends were maintained and the curves were smoothed very little where the data were stable. Therefore, the MA of size 10 was selected to obtain representations analogous to those in Figure 4. Figure 7 shows the application of the MAs of different sizes to the variable $X_1$, and the graphs of the application of the MA of size 10 to the six variables are shown in Figure 8.
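As an illustration of the smoothing step, the following minimal Python sketch computes a simple unweighted moving average with a given window size. The function name `moving_average` and the choice of a trailing window are assumptions made here for illustration, since the article does not specify which MA variant was applied.

```python
def moving_average(series, window=10):
    """Simple unweighted moving average over consecutive windows (assumed variant)."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    running = sum(series[:window])          # sum of the first window
    smoothed = [running / window]
    for i in range(window, len(series)):
        # Slide the window one step: add the new value, drop the oldest one.
        running += series[i] - series[i - window]
        smoothed.append(running / window)
    return smoothed


# Toy series (not from the study), smoothed with a window of size 2.
print(moving_average([1, 2, 3, 4, 5], window=2))
```

A larger window averages more observations per point, so isolated peaks contribute less to each smoothed value.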
When analyzing Figures 7 and 8, it can be said that the conclusions are the same as for Figure 4. Specifically, the major fluctuations occur in the variables $X_1$ and $Y_3$, although these fluctuations occur through changes in the trend. Moreover, the trend changes do not occur at specific moments, but rather over time intervals. The similarity between the variables $X_2$ and $X_3$ is repeated, with $X_3$ having lower values. In addition, the similarity between $Y_1$ and $Y_2$ and their difference from the remaining variables are also repeated.

Nonparametric Inferential Statistical Analysis
In this part of the article, the first objective is to determine whether the six variables can be considered as samples from six continuous random variables with distribution functions $F_1, \ldots, F_6$, respectively. The location model for these samples assumes that the six distribution functions have the same shape as that of a random variable $Z$, whose distribution function is $F(x)$, and that the six distribution functions are

$$F_i(x) = F(x - \theta_i), \quad i = 1, \ldots, 6,$$

where $\theta_i$ $(i = 1, \ldots, 6)$ is a location parameter, which will be assumed to be an order statistic.
To assess whether the six samples come from populations that have a common median, a statistical hypothesis test will be established and confidence intervals will be obtained, both two-sided, in order to compare whether the six variables have sufficiently different medians. In addition, it will be analyzed whether the differences between the medians can be attributed to chance or to another cause [54][55][56].
In this article, observations were carried out on different groups of variables, which will be considered to be independent of each other, because they are values that come from different places [54].
Here, in accordance with [54][55][56], the test considered had the null hypothesis

$$H_0: M_e = M_0$$

against the alternative hypothesis

$$H_1: M_e \neq M_0,$$

where $M_e$ is the population median. Taking into account that the variable is continuous, if the null hypothesis is true and the sample data are consistent with the median value, half of the observations will be less than $M_0$ and the other half will be greater.
The test statistic will be $K$, which represents the number of sample observations greater than the value $M_0$. Although the order statistics $X_{(r)}$ of a sample do not have the same distribution as the original variable and are not independent of each other, their distributions can be expressed through binomial probabilities, since the number of observations that fall on either side of a fixed value follows a binomial distribution.
For the case under study, $K$ will be a binomial random variable with parameters $h$, which is the number of observations, and probability of success equal to $1/2$, that is, $K \sim \mathrm{Bin}(h, 1/2)$. Therefore, the null hypothesis, $H_0$, will be rejected if the test statistic, $K$, takes values greater than or equal to a certain constant, $K \geq k_{\alpha/2}$, or values less than or equal to another constant, $K \leq k'_{\alpha/2}$, where α is the significance level.
If two-sided confidence intervals are chosen, these intervals will be of the form $\left(X_{(k'_{\alpha/2}+1)}, X_{(k_{\alpha/2})}\right)$. By choosing those values so that the probability on the right is equal to the probability on the left, it can be shown that $k_{\alpha/2} = h - k'_{\alpha/2}$, where $k'_{\alpha/2}$ is the largest integer that verifies

$$\sum_{i=0}^{k'_{\alpha/2}} \binom{h}{i} \left(\frac{1}{2}\right)^{h} \leq \frac{\alpha}{2}.$$
The p-value of the statistical hypothesis test is equal to

$$2 \cdot \min\left\{ \sum_{i=0}^{K} \binom{N}{i} \left(\frac{1}{2}\right)^{N},\ \sum_{i=K}^{N} \binom{N}{i} \left(\frac{1}{2}\right)^{N} \right\},$$

where $N$ is the number of independent trials.

For the statistical hypothesis test with null hypothesis $H_0: M_e = M_0$, where $M_e$ is the population median, and with a significance level α = 5%, the lower and upper rejection limits and the length of the confidence interval were found. In addition, the p-value of the hypothesis test was found. In [6], a p-value lower than 0.05 was considered statistically significant. In the present article, a p-value lower than 0.05 was also considered statistically significant, because the rejection limits were calculated at α = 5%. These results are shown in Table 2. Also, Figure 9 shows both the nonparametric confidence intervals for the medians of the variables, with a 95% confidence level [54][55][56], and the medians of the variables.
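The sign-test procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used to produce Table 2: the function names `sign_test_p_value` and `median_confidence_interval` are hypothetical, observations equal to the hypothesized median are dropped (a common convention that the article does not discuss), and the interval endpoints follow the standard order-statistic construction.

```python
from math import comb


def sign_test_p_value(sample, m0):
    """Two-sided sign test of H0: median == m0; returns (K, p-value)."""
    values = [x for x in sample if x != m0]   # assumption: ties with m0 are dropped
    h = len(values)
    k = sum(1 for x in values if x > m0)      # K = number of observations above m0
    lower = sum(comb(h, i) for i in range(0, k + 1)) / 2 ** h   # P(K' <= k)
    upper = sum(comb(h, i) for i in range(k, h + 1)) / 2 ** h   # P(K' >= k)
    return k, min(1.0, 2 * min(lower, upper))


def median_confidence_interval(sample, alpha=0.05):
    """Nonparametric interval for the median built from two order statistics."""
    x = sorted(sample)
    h = len(x)
    cumulative, k_low = 0, -1
    for i in range(h + 1):
        cumulative += comb(h, i)
        if cumulative / 2 ** h <= alpha / 2:
            k_low = i                         # largest k with P(K <= k) <= alpha/2
        else:
            break
    if k_low < 0:
        raise ValueError("sample too small for this confidence level")
    # 1-based order statistics X_(k_low + 1) and X_(h - k_low).
    return x[k_low], x[h - k_low - 1]


# Toy data (not from the study): the integers 1..20.
data = list(range(1, 21))
print(median_confidence_interval(data))
print(sign_test_p_value(data, 3))
```

With 20 observations, the sketch selects the 6th and 15th order statistics as interval endpoints, which agrees with standard tables for the 95% sign-test interval.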
In view of these results, it can be said that the median of $Y_2$ is different from the median of any of the other variables, since the rest of the medians fall within its rejection region. The same can be said of $Y_3$, because the rejection region of the hypothesis test for this variable includes the rest of the medians. In addition, the hypothesis that the medians of $X_1$ and $X_2$ can be the medians of the populations $X_3$ and $Y_1$ is rejected, while the hypothesis that the medians of $X_3$ and $Y_1$ are equal cannot be rejected. Likewise, the hypothesis that the medians of the variables $X_1$ and $X_2$ are equal cannot be rejected.

The confidence intervals of smallest size are those corresponding to the variables $X_1$ and $X_2$, which indicates that these two variables are the ones that have the least variability. On the other hand, the confidence intervals of greatest size are those of the variables $Y_1$ and $Y_3$. Therefore, it can be inferred that these variables have the greatest variability. Furthermore, the other two variables (i.e., $X_3$ and $Y_2$) have intervals of intermediate length with respect to the other intervals. Therefore, the variability of these last two variables will also be intermediate with respect to the variability of the other variables.
In accordance with the above explanation and Figure 9, the variables under study have been classified into four groups. One group consists of $Y_2$, another group consists of $Y_3$, a third group consists of $X_1$ and $X_2$, and the last group consists of $X_3$ and $Y_1$.

Next, the nonparametric hypothesis tests performed to determine the category in which each of the six variables under study is located are analyzed. To this end, the categories established by the Quito Air Quality Index (QAQI) for air pollution by PM2.5 [38] were taken into consideration.

In accordance with [38], the air pollution categories are defined for the average concentration of PM2.5 over 24 h. The medians, the 95% confidence intervals for the medians, and the bands that delimit the three lowest categories of air pollution by PM2.5 concentration into which air quality in Quito is classified [38] are shown in Figure 10.

From Figure 10, it can be seen that the nonparametric confidence intervals for the medians of $X_3$ and $Y_1$ are contained in the Desirable level. Therefore, the null hypothesis that the medians of $X_3$ and $Y_1$ are at the Desirable level cannot be rejected at a significance level of α = 5%. Moreover, the hypothesis that these medians belong to the other levels of air quality can be rejected at the 95% confidence level.
Furthermore, the hypothesis that the medians of the variables $X_1$, $X_2$, and $Y_2$ are at the Acceptable level cannot be rejected, and the hypothesis that these medians belong to the other levels is rejected.
In addition, the hypothesis that the median of variable $Y_3$ belongs to the Acceptable level or the Caution level cannot be rejected, but the hypothesis that this median belongs to the other four levels is rejected.
In addition, taking into account the moments of orders two, three, and four shown in Table 1, although $X_2$ has the same median as $X_1$, it does not come from the same distribution as $X_1$. Therefore, if $X_3$ and $Y_1$ are considered to be elements of the same group, it can be said that the null hypothesis that the distributions of the five groups of variables are different from each other is rejected. Figures 11-16 show the observations of variables $X_1$, $X_2$, $X_3$, $Y_1$, $Y_2$, and $Y_3$, respectively, together with the 95% nonparametric confidence interval for the median and the limits that define the categories of the above-mentioned levels of air pollution by PM2.5 concentrations [38].
From Figures 11-16, it can be said that the only variable that at some point exceeds all the air quality limits is $X_1$. In addition, this variable has many observations outside its nonparametric confidence band for the median, which is a characteristic common to all the variables under study. In terms of exceeding the air quality limits, there is also the variable $Y_3$, which almost reaches the Alarm level, indicating significant variability.
With respect to $X_2$, it can be said that this variable practically does not exceed the Acceptable level. Furthermore, $X_3$ sometimes exceeds the Desirable level; however, most observations of $X_3$ remain below the Desirable level.
Finally, $Y_1$ exceeds the Acceptable level in only a few observations, and many of its observations are at the Desirable level, while a large part of the observations of $Y_2$ are above the Desirable level, and some exceed the Acceptable level while remaining below the Caution level.
Before finishing this subsection, the six variables under study are divided into two different groups, one consisting of the $X$ variables and the other consisting of the $Y$ variables. The $X$ variables are parallel to the park, and some of them cross it in this direction, while the $Y$ variables are perpendicular to the park, and some of them cross it in this direction.
Therefore, it is analyzed whether the air pollution by PM2.5 is more harmful for citizens when the park is crossed longitudinally or when it is crossed transversely. To this end, the Wilcoxon rank-sum test, adapted to compare the above-mentioned two groups against a one-sided alternative, is used [54].
For this case, the null hypothesis is

$$H_0: \text{air pollution due to group } Y = \text{air pollution due to group } X,$$

and the alternative hypothesis is

$$H_1: \text{air pollution due to group } Y > \text{air pollution due to group } X.$$

Table 3 shows the count data, that is, the number of observations that respond in a certain manner to air pollution by PM2.5 concentrations. This table shows the multivariate frequency distribution of the variables.

Table 3. Air pollution due to groups X and Y.
Groups | Desirable Level | Acceptable Level | Caution Level | Total
∅ is the empty set.
For the case under study, the Wilcoxon rank-sum test statistic [54] for the Y sample is $W_N = 11.5$, the expected value is $E(W_N) = 10.5$, and the variance is $\mathrm{var}(W_N) = 4.475$. Therefore, the approximate p-value, using the normal approximation to the distribution of $W_N$ with continuity correction [54], is 0.4066. As a result, the null hypothesis that air pollution due to group Y is equal to air pollution due to group X cannot be rejected at a significance level of α = 5%. Statistically speaking, no significant difference was found between the air pollution due to group Y and the air pollution due to group X.
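The p-value quoted above can be checked with a short Python sketch of the normal approximation with continuity correction for a one-sided (upper-tail) alternative. The function name `upper_tail_p_value` is hypothetical, and the statistic, expected value, and variance are taken directly from the text rather than recomputed from Table 3.

```python
import math


def upper_tail_p_value(w, expected, variance):
    """Approximate P(W >= w) with a normal approximation and continuity correction."""
    z = (w - 0.5 - expected) / math.sqrt(variance)
    # Upper tail of the standard normal distribution via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Values reported in the text for the Y sample.
p_value = upper_tail_p_value(11.5, 10.5, 4.475)
print(round(p_value, 4))  # approximately 0.4066
```

Since the resulting value is far above 0.05, the sketch is consistent with the decision not to reject the null hypothesis.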
The difference between the nonparametric statistical tools used in [26][27][28] and the analysis performed in this subsection is that in [26][27][28] the study focused only on obtaining estimates of the median of the analyzed data. To carry out the study presented in [26][27][28], the Kruskal-Wallis test and the Wilcoxon signed-rank test were used. In this subsection, however, a nonparametric statistical analysis procedure focused on estimating both the central tendency of the data and their dispersion was developed. The procedure developed in this article not only estimates the median of the data, but also analyzes the variability of the data, and uses all this information to classify the variables under study according to established categories of air pollution levels [38].

Robust Analysis of Location Parameters
A parameter, θ, is called the location parameter for the random variable Ψ if the density function can be written as a function of ψ − θ. Therefore, the distribution of the random variable Ψ − θ does not depend on θ. The location parameters usually indicate a value around which the bulk of the observations are grouped.
In this subsection, $\Psi_1, \ldots, \Psi_n$ was a sample and, in order to establish the estimators that were used, the sample order statistics [56] were considered: $\Psi_{(1)} \leq \Psi_{(2)} \leq \ldots \leq \Psi_{(n)}$. In addition, in order to estimate the center of symmetry of the distribution, some L-location estimators, which are linear combinations of order statistics, were considered. In short, the α-trimmed family [39,40] with 0 ≤ α < 0.5, given by Equation (1), was used:

$$T_n(\alpha) = \frac{1}{n - 2[n\alpha]} \sum_{i=[n\alpha]+1}^{n-[n\alpha]} \Psi_{(i)}, \qquad (1)$$

where $[\ldots]$ denotes the integer part. Moreover, other location estimators that will be considered are the mean and the median. In addition, the trimean [39,41] was used, which is given by Equation (2):

$$\mathrm{TRI} = \frac{Q_1 + 2Q_2 + Q_3}{4}, \qquad (2)$$

where $Q_i$ is the $i$-th quartile. Figures 17-22 show the α-trimmed mean of the variables, along with the mean, median, trimean, and 95% nonparametric confidence interval for the median.

In the graphs shown in Figures 17-22, the values of the α-trimmed mean function at low abscissas correspond to the extreme data of the distribution of values of the variable under study, and the values of the α-trimmed mean function at high abscissas correspond to the data close to the center of the distribution.
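The two L-location estimators described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names `trimmed_mean` and `trimean` are hypothetical, and the quartiles are computed with the "inclusive" interpolation rule of the standard library, which may differ from the quartile convention used in the article.

```python
import math
import statistics


def trimmed_mean(data, alpha):
    """Alpha-trimmed mean: drop [n*alpha] order statistics from each end."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    ordered = sorted(data)
    n = len(ordered)
    g = math.floor(n * alpha)          # integer part [n * alpha]
    kept = ordered[g:n - g]            # central order statistics
    return sum(kept) / len(kept)


def trimean(data):
    """Trimean: weighted average of the three quartiles (assumed quartile rule)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


# Toy data (not from the study) with one extreme value on the right.
sample = [1, 2, 3, 4, 100]
print(trimmed_mean(sample, 0.0))   # ordinary mean, pulled up by the outlier
print(trimmed_mean(sample, 0.2))   # one observation trimmed from each end
print(trimean(sample))
```

Evaluating `trimmed_mean` over a grid of α values between 0 and 0.5 gives a curve of the kind that the text describes for Figures 17-22.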
From Figures 17-22, it can be concluded that all the medians and trimeans are within the confidence band. Furthermore, the means of $X_2$, $X_3$, and $Y_2$ are within the confidence band. All this indicates the stability of these variables. In contrast, the means of $X_1$ and $Y_3$ are well above the upper limit of the confidence band, which suggests the greater variability of these variables.
In addition, the most influential observations for the value of the mean of $X_1$ are those with the highest pollution values, since the α-trimmed mean function is decreasing at low abscissa values, and toward the center of the distribution the values fall outside the confidence band. This shows again the great variability of $X_1$, which has already been observed through previous arguments.
In addition, $X_2$ is the variable whose location measurements behave most regularly. The observations that most influence the value of the median are the lowest, as the function is increasing at low abscissa values. Values near the center show many fluctuations, but these are much smaller than those observed in $X_1$.
Moreover, $X_3$ has many observations with very low values, which leads to the values of the α-trimmed mean function being far from the lower limit of the confidence band. As with $X_2$, the function values close to the center of the distribution fluctuate, and these fluctuations are similar to those of $X_2$ and lower than the fluctuations of the rest of the variables. The difference between $X_2$ and $X_3$ is that the air pollution values of $X_2$ are, for the most part, higher than those of $X_3$. This fact and the high variability of $X_1$ were already anticipated in Figure 4a.
When evaluating the α-trimmed mean function for $Y_1$, it is observed that the central values of the function behave as in the variables already analyzed. Specifically, it seems that the most decisive observations for the value of the median are grouped near it. At the extremes, the function behaves symmetrically, but toward the center of the data there is a lower level of pollution. All this again suggests high variability.
In addition, $Y_2$ has many observations with low values. Thus, the values of the α-trimmed mean function are far from the lower limit of the confidence band. When extreme values are suppressed, the function is increasing at low abscissa values, highlighting the presence of many observations well below the median. It is the only variable in which the median is greater than the mean.

Furthermore, $Y_3$ is the variable for which the α-trimmed mean function has the greatest range. The initial growth indicates a suppression of low values, which is corroborated by the fact that the function moves away from the lower limit of the confidence band. Once more than 60% of the extreme data were deleted in computing the α-trimmed mean function, the function decreased. This suggests that the central values of the distribution are greater than the median. What has been said here can also be seen in Figure 4b.
In [4], in order to achieve robust linear regression, the M-estimation method was used to reduce the influence of outliers in least squares fitting. According to [4], the M-estimation method was given in the form of a weight function of residuals, and its performance was satisfactory provided that the distribution of the response was normal and had no outliers. For the case under study in [4], the best model was the robust linear regression using the Talwar M-estimator.
However, in this research article, L-location estimators were used to estimate the central tendency of the data in a robust manner. Here, it was not assumed that the data fit a normal distribution. In fact, the skewness and kurtosis values shown in Table 1 show that the data do not fit a normal distribution. In addition, the box plots shown in Figure 5 show that some variables have many outliers. These features are characteristic of heavy-tailed distributions [40], which justified the need to use robust estimators in this article.
Despite the fact that the research objectives in [4] and in this article were different, the use of M-estimators and L-location estimators gave satisfactory results. In addition, these results demonstrate that these estimators can be applied in cases where it is required to robustly estimate the concentration of PM2.5 and the sample size is small, as in the case studied in this article.

Robust Analysis of Scale Parameters
Taking into account the obtained results, it was necessary to find an estimate of the dispersion of the variables under study. Although $X_2$ and $X_3$ could be left out of this analysis, because they do not have enough extreme observations, the estimation of the dispersion of all the variables was carried out.
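The measure that is commonly used to describe the variability of a sample of size $n$ of a random variable, Ψ, is the sample standard deviation, given by Equation (3):

$$S_{\Psi} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(\Psi_i - \bar{\Psi}\right)^2}, \qquad (3)$$

where $\bar{\Psi}$ is the sample mean. $S_{\Psi}$ satisfies both the shift invariance condition, $S_{\Psi+\lambda} = S_{\Psi}$ ∀ λ ∈ R, and the scale equivariance condition, $S_{\lambda\Psi} = |\lambda| S_{\Psi}$ ∀ λ ∈ R. According to [40], any statistic satisfying these two conditions is a dispersion estimate. The scale estimators used in this subsection were the following [39]:

3. Median Absolute Deviation:

$$\mathrm{MAD} = \operatorname{median}_{i} \left|\Psi_i - M_e\right|,$$

where $M_e$ is the median of the data.

4. Semi Interquartile Range:

$$\mathrm{SIR} = \frac{Q_3 - Q_1}{2},$$

where $Q_1$ is the first quartile and $Q_3$ is the third quartile.

5. Biweight midvariance:

$$S_{bi}(c) = \frac{\sqrt{n}\,\sqrt{\sum_{|u_i|<1} \left(\Psi_i - M_e\right)^2 \left(1 - u_i^2\right)^4}}{\left|\sum_{|u_i|<1} \left(1 - u_i^2\right)\left(1 - 5u_i^2\right)\right|},$$

where:

$$u_i = \frac{\Psi_i - M_e}{c \cdot \mathrm{MAD}}.$$

The family of scale estimators $S_{bi}(c)$ is based on an M-estimator and, according to [42], the scale measurement performed with this estimator has greater efficiency than conventional scale measurements for a wide range of distributions.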
Taking into account that the expected value of the MAD statistic is approximately $\frac{2}{3}\sigma$, where σ is the standard deviation of the population, if $c = k^2$ is chosen in Equation (8), which defines the biweight midvariance, then the observations that lie at a distance from the median of more than $\frac{2}{3}k^2\sigma$ will not be considered in finding the desired estimate. So, if c = 9, only those observations whose distance from the median is less than 6 times the standard deviation will be considered. As c increases, the values of $u_i$ decrease in absolute value, and the value of the denominator of $S_{bi}(c)$ increases. Therefore, this function decreases.
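The robust scale estimators listed above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code behind Table 4: the function names are hypothetical, the MAD is used without a normalizing constant, the quartiles follow the "inclusive" rule of the standard library, and the biweight estimator uses the commonly cited form with weights restricted to $|u_i| < 1$.

```python
import math
import statistics


def mad(data):
    """Median absolute deviation about the median (no normalizing constant)."""
    me = statistics.median(data)
    return statistics.median(abs(x - me) for x in data)


def semi_interquartile_range(data):
    """Half of the distance between the third and first quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / 2


def biweight_scale(data, c=9.0):
    """Biweight scale estimate S_bi(c); observations with |u_i| >= 1 get zero weight."""
    me = statistics.median(data)
    spread = mad(data)
    if spread == 0:
        raise ValueError("MAD is zero; the biweight scale is undefined")
    numerator = 0.0
    denominator = 0.0
    for x in data:
        u = (x - me) / (c * spread)
        if abs(u) < 1:
            numerator += (x - me) ** 2 * (1 - u * u) ** 4
            denominator += (1 - u * u) * (1 - 5 * u * u)
    return math.sqrt(len(data)) * math.sqrt(numerator) / abs(denominator)


# Toy data (not from the study) with one extreme value on the right.
sample = [1, 2, 3, 4, 100]
print(mad(sample), semi_interquartile_range(sample))
print(biweight_scale(sample, c=9.0), statistics.stdev(sample))
```

On this toy sample the single extreme value inflates the standard deviation, while the three robust estimates stay close to the spread of the central observations.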
The estimators based on a subrange, $C_n^{\alpha}$, are given by Equation (9), where 0 ≤ α < 0.5, $n$ is the length of the dataset whose spread is to be estimated, $\Psi_{(1)} \leq \Psi_{(2)} \leq \cdots \leq \Psi_{(n)}$ are the order statistics [56], $[\ldots]$ denotes the integer part, and $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function, evaluated at a probability value.

As α grows, the subtractions between the order statistics shown in Equation (9) are carried out between order statistics that are increasingly separated from each other. Therefore, the observations that are closer to the center of the dataset will be located close to the minimum of these subtractions, and the value of the denominator of Equation (9) will also increase. Due to this, the function $C_n^{\alpha}$ is less influenced by extreme observations as α grows. In accordance with [57], for α ≈ 0.5 the least median squares (LMS) estimator is obtained.

In this paper, the LMS estimator given by Equation (10) was used. This estimator has an expression that is analogous to that of $C_n^{\alpha}$, except for the quotient [57]. Table 4 shows the value of the statistics obtained for each variable. Figures 23-28 show estimates of the estimator $S_{bi}(c)$ for all variables, 0 < c < 18. Figures 29-34 show estimates of the estimator $C_n^{\alpha}$ for all variables, 0 < α < 0.5.

Taking into consideration what is shown in the graphs of Figures 23-34 and in Table 4, it can be said that for each variable, the estimate $S_{\Psi}$ is greater than the estimate $\mathrm{MAD}_{\mathrm{mean}}$, and in turn, $\mathrm{MAD}_{\mathrm{mean}}$ is greater than the other three estimates, noting that the estimates MAD, SIR, and LMS are more robust than the first two.
In addition, X 1 has a large value of S Ψ, which is much greater than the MAD mean estimate, and the latter is, in turn, approximately four times greater than the other three estimates: MAD, SIR, and LMS. Since these last three estimators are the most robust among all those considered, it can be said that the extreme observations, which in general take very high values although some are very close to zero, are very influential in the estimates that include them. The graphs of the functions S bi and C α n are very similar to each other for low abscissa values. Therefore, these estimates take most of the observations into account. Both functions show fluctuations, but then the two families of estimates stabilize around values between MAD and MAD mean.
With respect to X 2, the first thing to note is that the point estimates of scale MAD, MAD mean, SIR, and LMS are very similar to each other. The graph of the function S bi has two very pronounced maxima for low values of c, while the graph of the function C α n also has a maximum for low values of α, which is much lower than those two maxima. Moreover, both functions immediately stabilize around the point estimates. This indicates that there are not many extreme values that influence the scale estimates, and that all the scale estimates found are acceptable. Furthermore, the more robust scale estimates of X 1 (i.e., MAD, SIR, and LMS) are slightly greater than the respective scale estimates of the variable X 2.
With respect to X 3, it can be said that the values of the robust estimates (i.e., MAD, SIR, and LMS) of this variable are more than half the values of the non-robust estimates (i.e., MAD mean and S Ψ). The graphs of S bi and C α n have characteristics analogous to the respective graphs of X 2. It can be seen that S bi has a very pronounced maximum for low values of c, and that C α n also has a less pronounced maximum for low values of α. In addition, both functions immediately begin to fluctuate around the point estimates found. Therefore, it can be said that there are few extreme observations with high values that noticeably influence the estimates. However, unlike what happens in the case of X 2, the estimates obtained with the families of estimators are below MAD, for the case of C α n, and below MAD mean, for the case of S bi. Therefore, the estimators S bi and C α n are not so similar. Moreover, the most robust scale estimates are similar to each other in the case of X 1 and X 3, but these robust estimates have somewhat greater values than those corresponding to the case of X 2.
In addition, the point estimates of scale of Y 1 show appreciable differences. It can be seen that S Ψ is appreciably higher than the rest of the estimates. In addition, the MAD mean and SIR estimates are similar, the MAD value is half the value of the previous ones, and the LMS value is half the MAD value. All this indicates that there are influential observations in the scale estimates. The graphs of S bi and C α n are similar to the respective graphs of X 3, including in how they differ from those of X 2. Therefore, there are a few extreme high observations that produce estimates above the possible real value. The S bi family oscillates around MAD mean and the C α n family oscillates around MAD. Moreover, Y 2 has quite different point estimates of scale, although the most robust estimates (i.e., MAD, SIR, and LMS) are very similar to each other, and slightly greater than the point estimates of scale of the variables analyzed previously. The graphs of the functions S bi and C α n differ from the graphs of the rest of the variables. It can be seen that C α n is the graph that is most influenced by the extreme and high values of the variable. When the extreme observations are suppressed, the C α n estimates are around MAD mean. The point scale estimates seem somewhat greater than those of the variables previously analyzed, but lower than those of the variable Y 3.
Furthermore, leaving aside the S Ψ scale estimate of variable X 1, the point scale estimates of Y 3 are greater than all the remaining ones. In addition, if only the most robust estimates (i.e., MAD, SIR, and LMS) are considered, then the estimates of Y 3 are 50% higher than those that had been the highest so far; that is, those of the variable Y 2. In addition, it can be said that Y 3 has similar characteristics to other variables. For example, there were observations with extremely high values that greatly influenced the estimates that took them into account. In addition, the S bi family tends to be bounded by point estimates of scale and to oscillate around MAD mean. Moreover, the C α n family tends to oscillate, and it tends to MAD. Before moving on to the next section, it is important to highlight that, taking into account the previous comments, the variables under study can be classified according to their scale of variation. In this sense, X 2 has the smallest variation. Next come X 1 and X 3, whose variation is small but greater than that of X 2. Y 1 and Y 2 can be placed in third and fourth place, respectively. Finally, the variable with the greatest variation is Y 3, which also has most of the points of influence. The results of this paper are compatible with those of [37].
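As a minimal illustration of how the point scale estimators compared above react to extreme observations, the following Python sketch computes several of them for a small hypothetical sample. It assumes that MAD mean denotes the mean absolute deviation about the mean, that MAD is left unscaled, that quartiles are obtained by linear interpolation, and that S bi is the square root of the usual biweight midvariance with tuning constant c. These conventions, the sample, and all function names are assumptions and may differ from those used in the article. The classical standard deviation is shown for comparison, and the LMS estimator and the C α n family are omitted.

```python
import math
import statistics


def mean_abs_dev(x):
    """Mean absolute deviation about the mean (read here as 'MAD mean')."""
    data = list(x)
    center = statistics.fmean(data)
    return statistics.fmean([abs(v - center) for v in data])


def median_abs_dev(x):
    """Unscaled median absolute deviation about the median (MAD)."""
    data = list(x)
    center = statistics.median(data)
    return statistics.median([abs(v - center) for v in data])


def semi_interquartile_range(x):
    """Half the distance between the first and third quartiles (SIR)."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return (q3 - q1) / 2


def biweight_scale(x, c=9.0):
    """Square root of the biweight midvariance with tuning constant c."""
    data = list(x)
    center = statistics.median(data)
    mad = median_abs_dev(data)
    if mad == 0:
        raise ValueError("MAD is zero; the biweight weights are undefined")
    num = 0.0
    den = 0.0
    for v in data:
        u = (v - center) / (c * mad)
        if abs(u) < 1:  # observations with |u| >= 1 receive zero weight
            num += (v - center) ** 2 * (1 - u * u) ** 4
            den += (1 - u * u) * (1 - 5 * u * u)
    return math.sqrt(len(data) * num) / abs(den)


# Hypothetical sample with a single extreme high observation.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(statistics.stdev(sample))          # classical estimate, inflated by the outlier
print(mean_abs_dev(sample))              # also pulled upward by the outlier
print(median_abs_dev(sample))            # 2.5
print(semi_interquartile_range(sample))  # 2.25
print(biweight_scale(sample))            # the outlier receives zero weight
```

In this toy sample, the ordering of the estimates mirrors the pattern described for the measured variables: the standard deviation is largest, the mean absolute deviation comes next, and the estimators based on medians and quartiles remain close to the spread of the bulk of the data.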

Conclusions
In this article, data from PM2.5 concentration measurements performed in La Carolina Park, Quito, Ecuador, were analyzed using robust statistics techniques. First, a statistical summary of the data was shown. In addition, it was found that the distributions of the data were not normal and that they could be heavy-tailed.
In a preliminary analysis of the data, it was seen that the extreme observations of the variables under study corresponded to high values of air pollution by PM2.5 concentrations. In addition, it was seen that X 1 was the only variable that also had extreme values for low air pollution levels. Furthermore, the lack of normality was a characteristic common to all the variables.
From hypothesis tests and the establishment of nonparametric confidence intervals, it was concluded that the median of Y 3 was the greatest. It was also concluded that the median of Y 2 was different from the other medians, and that the medians of X 1 and X 2 were equal to each other but different from all other medians. In addition, it was shown that the medians of X 3 and Y 1 were equal to each other and smaller than the other medians.
From the analysis carried out in this article, it was observed that the numerical values of X 1 were the only ones that sometimes exceeded all air quality limits established by the Quito Air Quality Index (QAQI). All but one of the numerical values of X 2 were below the Acceptable level, and the numerical values of X 3 rarely exceeded the Desirable level. Few numerical values of Y 1 exceeded the Acceptable level, and many of its observations were at the Desirable level. Most of the observations of Y 2 were below the Caution level, and the vast majority were below the Acceptable level. The variable with the highest observations after Y 1 was Y 3, although its observations did not exceed the Alarm level. Furthermore, a Wilcoxon rank-sum test showed that air pollution due to X 1, X 2, and X 3 was equal to air pollution due to Y 1, Y 2, and Y 3, at the α = 0.05 significance level.
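To make the last comparison concrete, the following Python sketch shows one common form of the two-sided Wilcoxon rank-sum test, based on the normal approximation with average ranks for ties. It is only an illustration under stated assumptions: the article does not specify which variant of the test was applied, and the function names and the two small samples are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks (ties get the average rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = midranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    expected = n1 * (n + 1) / 2              # mean of W under the null hypothesis
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n1 * n2 * (n + 1 - tie_term) / 12
    z = (w - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p_value


# Hypothetical samples: interleaved values give no evidence of a shift.
group_a = [1, 3, 5, 7]
group_b = [2, 4, 6, 8]
w, z, p = rank_sum_test(group_a, group_b)
print(w, z, p)  # p is well above 0.05, so equality is not rejected
```

A p-value above 0.05 means that the hypothesis of equal distributions is not rejected at the α = 0.05 level, which is the sense in which two groups are said to be equal here. For small samples, an exact version of the test would normally be preferred over the normal approximation.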
In this paper, it was shown that neither the streets that border the park nor the park itself (i.e., X 1, X 2, X 3, Y 1, Y 2, and Y 3) were at the Alert level. According to the categories established by QAQI, the most critical case was Y 3 (Y 3 refers to a route along which measurements were performed on the street Avenida Naciones Unidas), which was at the Caution level. Therefore, measures that help to improve air quality in the region where La Carolina Park is located must be taken.
Once this first analysis was completed, it was decided to provide some robustness with respect to the estimates of both the central tendency of the data and its dispersion, because it is very important to give estimates that are practically immune at least to small variations in the data. Thus, unless a natural disaster occurs, such as an earthquake or a volcanic eruption, among others, regardless of the possible distribution of the data, estimates of the central tendency of the data and its dispersion will be limited by intervals that guarantee, with great certainty, that the true values of the magnitudes of the quantities under study are not very far from their estimates.
For measures of central tendency, the family of α-trimmed means and point estimates of the mean, median, and trimean were considered. For scale estimation, five point estimators were considered, not all of which are robust: standard deviation, mean absolute deviation, median absolute deviation, semi interquartile range, and least median of squares. Moreover, two families of robust estimators were considered: biweight midvariance estimators and estimators based on a subrange.
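The location estimators listed above can be sketched in a few lines of Python. The sketch assumes that the α-trimmed mean discards the ⌊αn⌋ smallest and the ⌊αn⌋ largest observations, and that the trimean is Tukey's weighted average of the quartiles, (Q1 + 2·median + Q3)/4, with quartiles obtained by linear interpolation. The article may use slightly different conventions, and the function names and the sample are hypothetical.

```python
import math
import statistics


def trimmed_mean(x, alpha):
    """Mean after dropping floor(alpha * n) observations from each end."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must satisfy 0 <= alpha < 0.5")
    data = sorted(x)
    k = math.floor(alpha * len(data))
    return statistics.fmean(data[k:len(data) - k])


def trimean(x):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


# Hypothetical sample with a single extreme high observation.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(statistics.fmean(sample))   # 14.5, pulled upward by the outlier
print(statistics.median(sample))  # 5.5
print(trimmed_mean(sample, 0.1))  # 5.5, one observation dropped from each end
print(trimean(sample))            # 5.5
```

Evaluating `trimmed_mean` over a grid of α values gives the kind of family of estimates whose dependence on the number of discarded observations is discussed for each variable.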
The results of this research showed that the robust estimates of location of X 1 depended strongly on the number of observations that were discarded, although all these estimates were between the Desirable level and the Acceptable level. On the other hand, the robust estimates of the scale of X 1 were around 0.006 units. These results indicated the possibility that X 1 follows a heavy-tailed distribution, which was corroborated by the high values of its skewness and kurtosis.
With respect to X 2, most of the location estimates were in the nonparametric confidence band established for that variable. Since X 2 had hardly any extreme observations, the values of the robust location and scale estimators of this variable were all similar. In this case, all the robust location estimates were at the Acceptable level, and the robust scale estimates were close to 0.003 mg/m³. X 1 and X 2 both have their location estimates at the Acceptable level; the scale of X 2 is half that of X 1, and their medians cannot be the medians of the other variables that have been analyzed.
The above agrees with the fact that what separates X 1 from X 2 is the street Avenida de los Shyris: X 2 refers to a route right next to La Carolina Park, whereas X 1 refers to a route through a highly commercial area. Therefore, the measurement results of X 1 are greater than those of X 2, which shifts the frequency distribution of X 1 toward higher values and makes the observations of X 1 with low numerical values appear as lower extreme data. This also entails a greater variability in the X 1 data compared to the X 2 data. This explanation indicates that the park is working as an air pollution filter.
Most of the robust location estimates of X 3 and Y 1 were in the confidence band established by nonparametric statistics, and all estimates were at the Desirable level. Therefore, the medians of the other variables cannot be medians of these two variables. In Y 1, there seemed to be a breaking point from which robust location estimates were near the mean on the left, and near the median on the right. In addition, the scale estimates of Y 1 were somewhat higher than the scale estimates of X 3. This happened because the range of observations of Y 1 extended further toward high pollution values than the range of X 3. These two variables were also similar in terms of geographical area, because the observations of both were taken in wooded areas of the park.
With respect to robust scale estimates of X 3 , it can be said that these were similar to the robust scale estimates of X 1 . In addition, robust scale estimates of Y 1 were significantly greater than robust scale estimates of X 3 . X 3 and Y 1 were the variables with the best air quality levels.
Furthermore, the robust location estimates of Y 2 that came from the α-trimmed means were above the nonparametric confidence interval, indicating that there were few high observations and many low observations. However, almost all the estimates were below the Acceptable level. Y 2 showed analogies with Y 1 in terms of the distribution of the data, but shifted toward higher values. Although the measurements of Y 2 were performed in the middle of the park, there was no wooded region in the area where they were taken. Once again, it is observed that the more trees and vegetation there are in an area, the lower the air pollution level in that area.
The robust scale estimates of Y 2 were similar to each other and larger than those of the other variables, except for Y 3 .
In addition, Y 3 showed robust location estimates from the α-trimmed means that were above the nonparametric confidence interval. Moreover, Y 3 is the variable that produced location estimates significantly greater than those of the other variables, falling between the Acceptable level and the Caution level. This variable was also the one with the greatest scale estimates.
The above is in accordance with the geographical situation in which the street Avenida Naciones Unidas is located and the route to which Y 3 refers. The street Avenida Naciones Unidas goes from east to west in the city, and it is well known that in Quito, the circulation speed of cars, buses, and trucks in that direction is slower than in any other direction. Therefore, there are more traffic jams when driving in that direction and, as a result, citizens on that street are much more exposed to air pollution.
At this point, it is important to say that in this article, robust confidence intervals were not included. Therefore, this is a task that remains pending for future research works. However, these confidence intervals are based on robust location and scale estimators, and these robust estimators in turn depend on the order statistics that were used to find the nonparametric confidence intervals. Therefore, it seems that there must be a close relationship between both kinds of confidence intervals.
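The dependence on order statistics mentioned above can be illustrated with the classical distribution-free confidence interval for the median. For a sorted sample x(1) ≤ ... ≤ x(n) from a continuous distribution, the interval [x(k), x(n−k+1)] covers the median with probability 1 − 2·P(B ≤ k − 1), where B follows a Binomial(n, 1/2) distribution. The sketch below picks the largest k whose coverage is at least the requested level. This is a standard construction shown under that assumption, not necessarily the exact procedure used in the article, and the function name and sample are hypothetical.

```python
from math import comb


def median_confidence_interval(data, confidence=0.95):
    """Distribution-free CI for the median based on order statistics.

    Returns (lower, upper, coverage), where coverage is the exact
    probability 1 - 2 * P(B <= k - 1) with B ~ Binomial(n, 1/2).
    """
    x = sorted(data)
    n = len(x)
    best = None
    cumulative = 0.0  # running value of P(B <= k - 1)
    for k in range(1, n // 2 + 1):
        cumulative += comb(n, k - 1) / 2 ** n
        coverage = 1 - 2 * cumulative
        if coverage < confidence:
            break
        best = (k, coverage)
    if best is None:
        raise ValueError("sample too small for the requested confidence level")
    k, coverage = best
    # x[k - 1] is the k-th order statistic; x[n - k] is the (n - k + 1)-th.
    return x[k - 1], x[n - k], coverage


# Hypothetical sample of ten observations.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(median_confidence_interval(sample))  # (2, 9, 0.978515625)
```

Because the interval endpoints are themselves observations, the achieved coverage is usually somewhat higher than the nominal level, as in the example, where a nominal 95% interval has exact coverage of about 97.9%.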
The results of this study confirm that the efficiency of all urban dynamics (that is, mobility, recreation, cultural activities, and work activity, among others) must be subject to appropriate environmental quality standards. In the case of La Carolina Park, the quality and diversity of the activities that can be carried out in the park, together with the urban offer of the sector in which this park is located (which is highly competitive in commercial, labor, hotel, banking, governmental, and educational activities, among others), make this urban park one of the most visited spaces in Quito. Nevertheless, two critical areas were identified in the study:

1. First critical area: In La Carolina Park itself, the area near the intersection of Avenida Naciones Unidas and Avenida de los Shyris. Various sports are practiced in this area, and these are activities that demand more oxygen. This part of the park also has less woodland to mitigate the pollution generated on these two avenues.

2. Second critical area: The paths along the roads that surround the park (most of them very wide), which from Monday to Friday are very busy with people who work and/or live in the area. Along these paths, the entrances to the premises located on the ground floor and the bus stops (specifically on Avenida Naciones Unidas, on Avenida de los Shyris, and on the northern part of Avenida Rio Amazonas) are very exposed to the air pollution generated there.
Therefore, as a result of the study carried out, it is recommended that these two areas have a new design and environmental management to reduce pollution emitting agents, and to mitigate and/or reduce the exposure of people to polluted air. As the reduction of pollution sources is subject to the urban structure of the city of Quito (which is a long, narrow city), and as there is also a need for more efficient and less polluting public transport, it is suggested that the urban decision makers, owners, and citizens in general take control (C) and mitigation (M) measures, such as:

• (C1) Urban decision makers: Control the entry of polluted air into the food preparation and sale premises located on the ground floor by regulating the design of the accesses, especially when the access is directly from a route with large vehicular flow.
• (C2) Owners (especially of food premises on ground floors): Control food processing so that it takes place in areas less exposed to polluted air.
• (C3) Owners (of premises and/or homes): Consider natural ventilation habits (preferably cross-ventilation) for the buildings, taking into account the hours when pollution levels are lower (from 5:00 to 6:00 and from 20:00 to 21:00).
• (M1) Urban decision makers: Mitigate exposure to pollution in the northeast part of the park by redesigning the placement of trees with more foliage at the edge of the park and promoting the protection of sports activities from pollution by a green filter. To achieve this goal, it is important that urban design can redirect activities that take the edge of the park as a unit of measurement, such as walking, running, and/or walking domestic animals.

Figures 1 and 2) that have allowed us to show the reader the urban park and its surroundings. R.Z. was responsible for the part of the article that has to do with the interpretation of the results from the point of view of air pollution due to PM2.5 concentrations. It is important to say that authorship was limited to those who have contributed substantially to the work reported.