Water Particles Monitoring in the Atacama Desert: SPC Approach Based on Proportional Data

Abstract: Statistical monitoring tools are well established in the literature, underpinning organizational cultures such as Six Sigma and Total Quality Management. Nevertheless, most of this literature relies on the normality assumption, e.g., based on the law of large numbers, and its limitations for truncated processes remain open questions in this field. This work was motivated by the recording of quantities related to water particles monitoring (relative humidity), an important source of moisture for the Copiapó watershed in the Atacama region of Chile (the Atacama Desert), data which present high asymmetry for rates and proportions. This paper proposes a new control chart for interval data on rates and proportions (symbolic interval data) when they are not the result of a Bernoulli process. The unit-Lindley distribution has many interesting properties, such as having only one parameter, and from it we develop the unit-Lindley chart for both classical and symbolic data. The performance of the proposed control chart is analyzed using the average run length (ARL), median run length (MRL), and standard deviation of the run length (SDRL) metrics, calculated through an extensive Monte Carlo simulation study. Results from the real data applications reveal the tool's potential to be adopted to estimate the control limits in a Statistical Process Control (SPC) framework.


Introduction
Control charts are often applied to monitor processes in many fields, including ecology [1], health [2] and industry [3]. This primary SPC tool is used for various types of data, such as counts [4], attributes [5] and rates/proportions [6]. The latter is a widespread type of data, having applications in the most diverse areas, for example, climate [7] and industry [8]. The most widely used control charts for this situation are the p and np charts, but to use them, the process data need to result from Bernoulli experiments [9].
However, there are cases (i.e., processes) where rates and proportions do not result from Bernoulli experiments (e.g., they come from individual measures or ratios of continuous quantities), despite assuming values in the range (0, 1). For these processes, the p and np charts are not applicable [7,10]. Therefore, alternatives are sought, some of which are the beta [11], Kumaraswamy [7], simplex and unit-gamma [10] charts for monitoring fraction data, although these are not suitable in all cases (e.g., under overdispersion). Thus, this article proposes an interesting and useful alternative to the previously mentioned control charts for monitoring processes with continuous data in the unit interval (e.g., rates, proportions or indices): the so-called unit-Lindley chart. Other works, such as [12][13][14][15], explored the unit transformation and showed the importance of the class of unit distributions. Moreover, stochastic phenomena can be approximated, with clear advantages of simplicity, by summarizing a large quantity of data with a few numerical values in the form of parametric models, enabling the modeller to learn about observable phenomena [16].
Recently, [17] introduced a one-parameter continuous probability distribution defined on the range (0, 1), named the unit-Lindley distribution. Such a model can describe processes involving data on rates and proportions. It also features many attractive properties, for example, a single parameter (thus being more straightforward than the previously mentioned two-parameter distributions, namely the beta, Kumaraswamy, simplex, and unit-gamma models), a convenient reparameterization in terms of the mean, and a closed-form expression for the maximum likelihood estimator. Thus, the unit-Lindley distribution motivated us to develop a new statistical control chart to monitor rates and proportions. It is worth pointing out that the main aim of the proposed unit-Lindley chart is to detect significant shifts in the process parameter (mean).
Our practical motivation came from monitoring the relative humidity in the arid conditions of the Atacama Desert, which presents extreme weather with a small precipitation rate and high temperature variation within the day regardless of the season. Three main climatic features of this region can be highlighted: (i) the (cold) Humboldt Current in the Pacific Ocean to the west; (ii) two mountain ranges: the Coastal range to the west, partly blocking the moisture coming from the Pacific Ocean, and the long and high Andes range to the east, which blocks the moisture coming from the Amazon Basin and the Atlantic Ocean; and (iii) its location in the air convection cells close to the Tropic of Capricorn (the southern, or tropical, limit of the southern Hadley Cell, characterized by dry air).
Despite this arid environment, a particular phenomenon allows the formation of the Camanchaca in some areas (around the cities of Copiapó and Huasco): a morning and night sea mist (or fog) that advances inland, penetrates deeply into the valleys, and can be a source of water [18] in a region that suffers scarcity of this resource. In a simplified way, marine stratocumulus form over the Pacific Ocean, presenting cold temperatures on the Chilean coast linked to the Humboldt Current [19]. This humid air is limited in its convection by a low thermal inversion caused by warm, dry air, linked to the tropical component of the Hadley Cell and located above the colder air close to the sea surface. The phenomenon is characteristic of the morning and night, as temperatures are lower during those times and pass more easily below the dew point, which corresponds to the condensation point of the water contained in the atmosphere. The clouds formed at low altitude can then, under the trade winds, move deep inland following the west-east oriented valleys.
A classical data representation is infeasible here, requiring a symbolic data representation, either set-valued (interval or multi-valued) or modal (weight or probability distribution), containing more complex information about the phenomenon (humidity in hyper-arid conditions). These variables are called "symbolic" and account for variability or uncertainty, making symbolic data analysis more comprehensive than classical data analyses [20,21]; it shall be extended here for SPC reasoning. Thus, this paper proposes a new statistical methodology to overcome the complexity of monitoring the humidity in Copiapó city, Chile, since it is located in the Atacama Desert and most days present the Camanchaca phenomenon.
The remainder of this paper is organized as follows. Section 2 describes the practical motivation and the data acquisition. In Section 3, we first revise the unit-Lindley distribution and some of its basic properties (Section 3.1), and then present the new control chart based on this distribution (Section 3.2). Section 4 provides simulation studies designed to assess the performance of the proposed unit-Lindley chart. Section 5 illustrates the usefulness of the unit-Lindley chart through several examples. Finally, Section 6 concludes the paper with a few remarks and discussions on future works.

The Data
In the Atacama Desert, a north-south geographic band located mainly in northern Chile, precipitation is only a few millimeters per year or sometimes non-existent, making it one of the driest places on Earth [22,23]. However, the vast expanses of the desert are punctuated by fertile valleys with rivers originating in the central Andes and flowing into the Pacific Ocean. Along these rivers, human populations have settled historically, exploiting, more and more, this rare and precious water, especially with the growing development of monoculture and the mining industry, logically interested in the special weather conditions and the mineral resources of the Cordillera.
The hydrological regime of the principal rivers of Atacama is characterized by ice sources: water flows from the peaks following the melting of snowfall, glaciers and permafrost located in the upper parts of the Andes range [24,25]. In the context of climate change, it is therefore essential to understand the hydrological cycle of these regions in order to set up a sustainable management policy [26]. Understanding the hydrological cycle requires the implementation of tools for forecasting river flows, relative humidity, groundwater reservoirs or any other water-related quantity, which inevitably needs an in-depth knowledge of the physical phenomena that govern the entire hydrological cycle and, more precisely, the complex interaction between atmosphere, climate, landforms, ice, snow and river flows.
Moreover, an important event occurs in this area: marine stratocumulus cloud banks that form on the Chilean coast, called Camanchaca, act daily as a passageway of "low clouds" right after sunrise, for a couple of hours at a time. This event is the source of water for many types of flora and fauna in the Atacama Desert.
To illustrate the relative humidity dynamics in the Atacama, satellite images acquired by the MODIS (Moderate Resolution Imaging Spectroradiometer) sensor from 2000 to 2020 were processed. This sensor is carried on two satellites: Terra (passing daily around 11:00 a.m. local time) and Aqua (around 3:00 p.m. local time). Figure 1 shows the statistical representation of the cloud occurrence over the Atacama region (Terra MODIS in panel B, and Aqua MODIS in panel C, adopting the same colour scale), shaded in dark-red for a high probability of cloud occurrence and in beige for a low probability. A high occurrence of cloud (moisture source) is noticeable from the Chilean coast to the beginning of the highlands (this limit is evident in yellow), shown through the digital elevation model for part of the Copiapó watershed presented in panel A.
Regarding the Camanchaca event, when monitoring the humidity in the Atacama region of Chile, and in Copiapó city in particular, it is relevant to know that water is a scarce element (as liquid or vapour). Nonetheless, water vapour flows through the city in the mornings, almost daily, carried by the dominant winds coming from the west.
The weather station from which the relative humidity data used in this study come is located on the campus of the University of Atacama (27.359S/70.353W) and belongs to the Chilean Meteorological Directorate (DMC). In addition to relative humidity, the measured variables are atmospheric pressure, temperature, global solar radiation, precipitation, and wind direction and speed, the last two measured at 10 meters above ground level. These data cover a period from 2016 to 2021 (at the time of writing this article) with, at best, a one-minute recording format. The relative humidity is recorded by a Vaisala HMP155A-L17-PT probe protected from solar radiation and including an air filter with a Teflon membrane, installed at 2 meters above ground level. The weather stations managed by the DMC are part of a Chilean national weather data network.
Relative humidity monitoring done for meteorological purposes could also be used to assess the potential of the Camanchaca as a source of water in a region that suffers substantial scarcity of this essential resource.

Methodology
In this section, we first revise the unit-Lindley distribution and some of its basic properties (Section 3.1). Then, we present a new control chart based on this distribution (Section 3.2).
Figure 2 summarizes visually the adopted methodology, where we first transform the data set records from hourly observations into four periods per day (considering their minimum and maximum values). That is, the symbolic representation is obtained by transforming each time window of 6 h of observation points into a pairwise (MIN, MAX) representation. Then, we present the proposed new statistical methodology, adopted as a symbolic interval data approach, through the unit-Lindley distribution and the control chart based on it. In this way, we intend to contribute to the understanding of the complexity of monitoring the humidity in Copiapó city.

The Unit-Lindley Distribution
Introduced by [17], a random variable Y is said to be unit-Lindley distributed with parameter θ > 0, denoted by Y ∼ UL(θ), if its cumulative distribution function (CDF) is given by F(y; θ) = 1 − [1 + θy/((1 + θ)(1 − y))] exp(−θy/(1 − y)), for 0 < y < 1. The corresponding probability density function (PDF) is f(y; θ) = [θ²/(1 + θ)] (1 − y)⁻³ exp(−θy/(1 − y)), which is unimodal with its maximum at y = 1 − θ/3 for θ < 3, and at y = 0 for θ ≥ 3.
If Y ∼ UL(θ), then the mean of Y is E[Y] = 1/(1 + θ), and the variance can be expressed in terms of the exponential integral function Ei(a, z) = ∫₁^∞ x^(−a) e^(−xz) dx [27], which can be computed using the expint(•) function of the expint package [28] in R.
The quantile function, Q(p; θ) = F⁻¹(p; θ), can be written in closed form in terms of the negative branch W₋₁ of the Lambert W function [29], which can be computed via the lambertWn(•) function of the pracma package [30] in R. We can easily estimate the parameter θ using the maximum likelihood method. Considering an observed random sample y = (y₁, y₂, . . ., yₙ) of size n from Y ∼ UL(θ), Mazucheli et al. [17] showed that the maximum likelihood estimator (MLE) of θ has a closed-form expression, obtained as the positive root of the quadratic score equation t(y) θ² − [n − t(y)] θ − 2n = 0, where t(y) = Σᵢ yᵢ/(1 − yᵢ). In order to achieve substantial bias reduction, especially for small and moderate sample sizes, the authors also derived a bias-corrected MLE of θ through the methodology proposed by Cox and Snell [31]. Mazucheli et al. [17] further presented an alternative and useful reparameterization of the unit-Lindley distribution, where µ = E[Y] = 1/(1 + θ) and, thus, θ = 1/µ − 1; the CDF, PDF and quantile function of Y ∼ UL(µ), 0 < µ < 1, follow by this substitution, and the authors also provided the MLE of µ and the corresponding bias-corrected MLE in closed form. It is worth noting that, in the original work of [17], there was a typo in the expression of this bias-corrected estimator, with t(y) instead of n in the denominator. Due to its simplicity and better interpretability, which makes it arguably a more appealing model to use in practice, we shall hereafter consider this reparameterized version of the unit-Lindley distribution, as well as the control chart limits expressed in terms of the mean parameter µ.
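For concreteness, the quantities above can be sketched in code. The following is a minimal Python sketch (the paper's own routines are in R, using the expint and pracma packages, and the function names here are ours): the quantile inverts the CDF through the negative branch of the Lambert W function, and the MLE is the positive root of the quadratic score equation.

```python
import numpy as np
from scipy.special import lambertw


def ul_pdf(y, theta):
    # unit-Lindley PDF: f(y; theta) = theta^2/(1+theta) * (1-y)^(-3) * exp(-theta*y/(1-y))
    return theta**2 / (1 + theta) * (1 - y) ** -3 * np.exp(-theta * y / (1 - y))


def ul_cdf(y, theta):
    # CDF obtained from the Lindley CDF under the transformation y = x/(1+x)
    u = y / (1 - y)
    return 1 - (1 + theta * u / (1 + theta)) * np.exp(-theta * u)


def ul_quantile(p, theta):
    # Invert F(y) = p via the negative branch W_{-1} of the Lambert W function
    z = -(1 + theta) * (1 - p) * np.exp(-(1 + theta))
    u = -1 - 1 / theta - lambertw(z, k=-1).real / theta
    return u / (1 + u)


def ul_mle_theta(y):
    # Closed-form MLE: positive root of t*theta^2 - (n - t)*theta - 2n = 0,
    # with t = sum(y_i / (1 - y_i)); the mean parameterization is mu = 1/(1+theta)
    y = np.asarray(y)
    n, t = y.size, np.sum(y / (1 - y))
    return ((n - t) + np.sqrt((n - t) ** 2 + 8 * n * t)) / (2 * t)
```

As a quick sanity check, samples drawn by inverse transform (`ul_quantile` applied to uniforms) have sample mean close to 1/(1 + θ), and the fitted MLE recovers the generating θ.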

Proposed Unit-Lindley Chart
Suppose that a process (e.g., a hydrological or environmental process) generates data (e.g., rates or proportions) according to a unit-Lindley distribution. That is, if Y denotes the monitored variable, then Y follows the unit-Lindley PDF in the mean parameterization. Also, consider that the probability of false alarm (or type I error) is α. Thus, we have P(Y < LCL | µ) = P(Y > UCL | µ) = α/2, where µ is the in-control process parameter (that is, the mean value of the quality characteristic in the in-control state), and LCL and UCL are the lower and upper control limits, respectively.
Following [32], the control limits and centerline (CL) of the proposed unit-Lindley chart are given by LCL = Q(α/2; µ), CL = Q(0.5; µ) and UCL = Q(1 − α/2; µ), where Q(·; µ) is the unit-Lindley quantile function. Table 1 shows these control limits for several values of µ, considering α = 0.1, 0.01 and 0.0027. Note that the latter α value corresponds to the standard three-sigma rule (Six Sigma program).
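Assuming the equal-tailed probability limits just described (with the centerline placed at the median), the chart limits can be computed directly from the quantile function. The helper names below are ours; this is a sketch, not the paper's R implementation.

```python
import numpy as np
from scipy.special import lambertw


def ul_quantile(p, mu):
    # unit-Lindley quantile in the mean parameterization (theta = 1/mu - 1),
    # inverting the CDF via the negative branch of the Lambert W function
    theta = 1 / mu - 1
    z = -(1 + theta) * (1 - p) * np.exp(-(1 + theta))
    u = -1 - 1 / theta - lambertw(z, k=-1).real / theta
    return u / (1 + u)


def ul_chart_limits(mu, alpha):
    # Equal-tailed probability limits: LCL = Q(alpha/2), CL = Q(0.5), UCL = Q(1 - alpha/2)
    return (ul_quantile(alpha / 2, mu),
            ul_quantile(0.5, mu),
            ul_quantile(1 - alpha / 2, mu))


# example: limits for an in-control mean of 0.5 under the three-sigma rule
lcl, cl, ucl = ul_chart_limits(mu=0.5, alpha=0.0027)
```

Replacing the known `mu` by an estimate computed from Phase 1 data yields the "trial control limits" mentioned below.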
When the parameter µ is unknown/unspecified, it can be estimated using all the available observations (data). Thus, we can replace µ by, e.g., its bias-corrected MLE in the expressions of the control limits above, obtaining the so-called "trial control limits" [9].

Statistical Performance
In this section, we use Monte Carlo (MC) simulation studies to evaluate the unit-Lindley chart's statistical performance, measured in terms of the average run length (ARL), median run length (MRL), and standard deviation of the run length (SDRL). All computational routines were implemented using the R software version 3.6.3 [33].
The ARL is a metric widely used to evaluate the efficiency of control charts. The in-control ARL (or ARL0) is defined as the average number of observations (or monitoring points) before a signal is given (that is, a single point falls outside the control limits), assuming that the process is in control. In contrast, the out-of-control ARL (or ARL1) is the average number of observations that are taken until a mean shift is identified when the process is out of control [34].
Since the run length (RL) is highly asymmetrically distributed, metrics other than the ARL, such as the SDRL and MRL, can be considered [10]. The SDRL is a useful measure to assess the spread (or dispersion) of the RL distribution, whereas the MRL refers to the midpoint of the RL distribution and is a more credible measure of a chart's performance, since it is less affected by the skewness of the RL distribution [35]. In addition, we will also use the in-control (SDRL0 and MRL0) and out-of-control (SDRL1 and MRL1) versions of these metrics.
Let Y be the result or output of a process that follows a unit-Lindley distribution reparameterized by its mean: Y ∼ UL(µ). Also, let µs be the shifted mean proportion parameter after a change occurs in µ, that is, Y ∼ UL(µs).
For the proposed unit-Lindley chart, the RL follows a geometric distribution, so the in-control ARL, SDRL and MRL are defined as ARL0 = 1/α, SDRL0 = √(1 − α)/α and MRL0 = log(0.5)/log(1 − α), while the out-of-control metrics are given by ARL1 = 1/ps, SDRL1 = √(1 − ps)/ps and MRL1 = log(0.5)/log(1 − ps), where ps = P(Y < LCL | µs) + P(Y > UCL | µs) is the probability of a signal under the shifted parameter µs. Moreover, we will also use the down and up versions of the in-control ARL, MRL and SDRL, which consider the occurrence of false alarms at the LCL (i.e., sample points falling below the lower limit) and at the UCL (i.e., sample points falling above the upper limit), respectively; these metrics are obtained by replacing α with the corresponding tail probability in the expressions above. In the usual Six Sigma program, α = 0.0027 and, therefore, ARL0 = 1/0.0027 ≈ 370, SDRL0 = √(1 − 0.0027)/0.0027 ≈ 370, and MRL0 = log(0.5)/log(1 − 0.0027) ≈ 256. This means, e.g., for the first measure, that even though the process is in control, an incorrect out-of-control signal (or false alarm) will be generated every 370 samples, on average [9]. On the other hand, values of ARL1 ≈ 1 are desired, mainly for large shifts in the process mean parameter.
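The geometric run-length formulas above can be sketched in a few lines of Python (our helper name); they map any signal probability, in-control or out-of-control, to the three metrics.

```python
import numpy as np


def rl_metrics(p_signal):
    # Geometric run length with per-sample signal probability p:
    # ARL = 1/p, SDRL = sqrt(1-p)/p, MRL = log(0.5)/log(1-p)
    arl = 1 / p_signal
    sdrl = np.sqrt(1 - p_signal) / p_signal
    mrl = np.log(0.5) / np.log(1 - p_signal)
    return arl, sdrl, mrl


# in-control metrics under the three-sigma false-alarm rate
arl0, sdrl0, mrl0 = rl_metrics(0.0027)
```

Evaluating `rl_metrics(0.1)` reproduces the α = 0.1 values used in the next subsection (ARL0 = 10, SDRL0 ≈ 9.487, MRL0 ≈ 6.579).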

In-Control Processes
Without loss of generality, in this subsection we consider unit-Lindley processes with mean parameter µ = 0.2, 0.5 and 0.8 (whose PDF plots are shown in Figure 3), as well as distinct values for the probability of false alarm, including α = 0.1 (which corresponds to ARL0 = 10, SDRL0 ≈ 9.487 and MRL0 ≈ 6.579). We also use different sample sizes (i.e., different numbers of Phase 1 observations) for each process: n = 10, 30, 50, 100 and 200. According to [9], in the Phase 1 study (or retrospective analysis), a process data set is collected and analyzed at once, building trial control limits to decide whether the process was in control when the first n observations were gathered. In the Phase 2 study (prospective analysis or process monitoring), the control chart constructed from a "clean" process data set showing control (reliable control limits) is used for monitoring future production.
The results were obtained from 5000 MC simulations (or replicates) with n* = 5000 Phase 2 observations each, performed for each scenario studied (that is, varying the number of Phase 1 observations, the mean parameter of the unit-Lindley distribution, and the probability of false alarm); for further information regarding these results, please contact the corresponding author. Despite some slight to moderate discrepancies, in some cases, between the theoretical (target) values of the performance measures and the values calculated through MC simulations, which are expected due to the effect of parameter estimation on control chart properties (see, e.g., [36][37][38]), the obtained results indicate the good performance of the proposed control chart.
The results were likewise obtained from 5000 MC simulations with n* = 5000 Phase 2 observations. Note that the values of the performance measures fall faster the higher the mean, especially as the shift size increases. With a 1% change in the mean, there is practically no difference in the metrics. At 10%, one can already see a significant drop, which is more pronounced for µ = 0.8. With a 20% change, the estimated values are close to one.
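The Monte Carlo scheme described above can be sketched as follows (a simplified Python version with our own helper names; the study itself was run in R). Unit-Lindley samples are drawn by inverse transform, the run length is the index of the first Phase 2 point outside the limits, and ARL/MRL/SDRL are summarized over replicates. For brevity, the limits here use the true in-control mean rather than Phase 1 estimates.

```python
import numpy as np
from scipy.special import lambertw


def ul_quantile(p, mu):
    # unit-Lindley quantile, mean parameterization (theta = 1/mu - 1)
    theta = 1 / mu - 1
    z = -(1 + theta) * (1 - p) * np.exp(-(1 + theta))
    u = -1 - 1 / theta - lambertw(z, k=-1).real / theta
    return u / (1 + u)


def simulate_run_lengths(mu0, mu_shift, alpha, n_reps=500, n_phase2=5000, seed=1):
    # Limits from the in-control mean mu0; Phase 2 points drawn under mu_shift
    rng = np.random.default_rng(seed)
    lcl = ul_quantile(alpha / 2, mu0)
    ucl = ul_quantile(1 - alpha / 2, mu0)
    rls = []
    for _ in range(n_reps):
        y = ul_quantile(rng.uniform(size=n_phase2), mu_shift)
        out = np.flatnonzero((y < lcl) | (y > ucl))
        # run length = first out-of-limits index (1-based); censored at n_phase2
        rls.append(out[0] + 1 if out.size else n_phase2)
    rls = np.asarray(rls)
    return rls.mean(), np.median(rls), rls.std()


# example: ARL1/MRL1/SDRL1 for a 20% upward shift of the mean, alpha = 0.1
arl1, mrl1, sdrl1 = simulate_run_lengths(mu0=0.5, mu_shift=0.6, alpha=0.1)
```

With `mu_shift = mu0` the simulated ARL should hover around the theoretical 1/α, while shifted means yield shorter run lengths.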

Comparison with Some Standard Control Charts
The unit-Lindley control chart theory introduced here may be applied in practical situations as a valuable and exciting alternative to, e.g., the well-known beta [11], simplex [10] and Kumaraswamy [7] charts, when the process data are continuous in the interval (0, 1).Thus, in-control proportion/rate/index data can be modelled well via the proposed unit-Lindley chart.
In this subsection, we apply the four above-mentioned control charts to sample data generated from the unit-Lindley, beta, simplex [39] and Kumaraswamy [40] distributions.The aim is to investigate, through simulations, the performance (in terms of the same in-control and out-of-control metrics used before) of these control charts when they are applied to process data that come from different distributions defined on the range (0, 1).
The simulations were carried out using the same settings as described in the previous subsections. Nevertheless, we consider only n = 200, n* = 5000 and α = 0.1. For the true data-generating process (which can be unit-Lindley, beta, simplex or Kumaraswamy distributed), we set the (in-control) mean parameter to µ = 0.2 (case 1), 0.5 (case 2) and 0.8 (case 3). It can be seen, among other things, that the proposed unit-Lindley chart has a good performance in all scenarios, despite being based on a distribution with a single parameter (and, thus, more straightforward than the other, two-parameter distributions).

Application
In this section, we apply the novel unit-Lindley chart to real data on the relative humidity of the air in the city of Copiapó, Chile. Located in the Atacama Desert, this important northern Chilean city covers 16,681.3 square kilometers and has 158,438 inhabitants [41].
Copiapó's economy is based on mineral exploitation and agriculture, which demands a significant volume of water in both activities.
The acquired data set is from the Copiapó Station, located at the University of Atacama. Data were obtained hourly from 21 December 2016 (day/month/year) to 2 February 2021 (n = 35,047 records). Figure 4 shows the dynamics of the data set: through the bimodality of the histogram in panel A, it is noticeable at first sight that at least two phenomena are happening at once, while panel B shows the dynamics of this series in light blue, with the solid line representing the time series (TS) average obtained using a LOESS (an acronym for "Locally Estimated Scatterplot Smoothing") smoothing method [42].
For many real applications, finding hidden structures or properties in complex data can be computationally costly or problematic due to messy sets, latent patterns, and noisy signals. Symbolic Data Analysis (SDA) is a paradigm from the Machine Learning and Statistics areas aiming to build, describe, analyze and extract new knowledge from more complex data structures. SDA intends to study and propose methodologies that can handle more complex data (or symbolic data), such as intervals, sets or histograms, in order to account for the variability and uncertainty that is often inherent to the data [43][44][45][46][47]. SDA starts by summarizing massive classical data and describing the new units of smaller and smoother data sets by symbolic variables.
Nascimento et al. [21] summarized a database containing more than 1.5 billion observations that are signals of a multi-channel electroencephalogram (EEG) corresponding to a frequency over time. A dynamic linear model for EEG interval data was introduced and applied to the database which, after being condensed, resulted in 15,000,000 total observations (that is, only 0.98% of the size of the original data set). This kind of model can incorporate dynamic events regarding more complex data, showing a competitive alternative for modelling a problem, given its flexibility and speed of convergence. Moreover, questions of dynamic accommodation in neuroscience research could be resolved to reveal brain activity patterns. Thus, to adopt the SDA approach for the meteorological data, this work uses them as interval data, in a minimum-maximum (MIN-MAX) representation per period, holding a physical interpretation regardless of the data compression. As a data fusion step, we first transform the data granularity by aggregating every 6 h of records into four daily periods: day part 1 (from midnight to 5:59 a.m. UTC), day part 2 (from 6:00 a.m. to 11:59 a.m. UTC), day part 3 (from midday to 5:59 p.m. UTC), and day part 4 (from 6:00 p.m. to 11:59 p.m. UTC). This is done in order to reduce the data noise (justified in terms of increasing the signal-to-noise ratio). The SDA representation thus transformed the 35,047 original records into 11,736 new ones (5868 observations for the MIN period, and 5868 observations for the MAX period). Table 2 outlines the statistical descriptions per year, highlighting in bold the highest values per statistic. Figure 5 shows the histograms of the minimum (left-hand panel) and maximum (right-hand panel) daily-period humidity, where the solid black line represents the unit-Lindley PDF adjusted to these data. The asymmetry present in these data is noticeable, as is the nonexistence of a whole rainy period (that is, 6 h of rain, which would imply an observation of minimum relative humidity equal to one). As the first SPC analysis, we devoted the records from December 2016 to December 2020 to the Phase 1 study (process parameter estimation, assessment of process stability and establishment of control limits). After that, we reserved all the records of 2021 for the Phase 2 study (online process monitoring). In Phase 1, we randomly selected 200 observation points to perform statistical inferences, and the results suggested that, for both processes (minimum and maximum daily humidity monitoring), the unit-Lindley distribution assumption is valid (with estimated mean parameter values 0.584 and 0.760 for the MIN and MAX periods/processes, respectively). Also, in comparison with the other distributions used (beta, simplex and Kumaraswamy), the Kolmogorov-Smirnov goodness-of-fit test (for details on such a test, see, e.g., [48]) corroborates the unit-Lindley adoption for both the minimum and maximum TS, as shown in Table 3.
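The 6-hour MIN-MAX aggregation can be sketched as follows. This is a Python/pandas illustration with synthetic values and column names of our choosing; the real records come hourly from the DMC station.

```python
import numpy as np
import pandas as pd


def to_interval_periods(hourly: pd.Series) -> pd.DataFrame:
    # Aggregate an hourly relative-humidity series (values in (0, 1)) into
    # 6-hour day periods, keeping the (MIN, MAX) symbolic interval of each bin.
    agg = hourly.resample('6h').agg(['min', 'max'])
    agg.columns = ['MIN', 'MAX']
    return agg


# toy example: two days of synthetic hourly humidity with a daily cycle
idx = pd.date_range('2021-01-01', periods=48, freq='h')
vals = 0.6 + 0.2 * np.sin(np.arange(48) / 24 * 2 * np.pi)
rh = pd.Series(np.clip(vals, 0.01, 0.99), index=idx)

intervals = to_interval_periods(rh)  # 4 periods/day over 2 days -> 8 intervals
```

Each row of `intervals` is one symbolic observation (MIN, MAX) for a day part, ready to feed the two unit-Lindley charts described next.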
Table 4 shows the control limits of both studied processes (using data from December 2016 to December 2020 only), based on the unit-Lindley distribution (unit-Lindley chart) and considering different tolerances to false alarms, or type I error (α). In Figure 6, we present the control charts for each period of the day (parts 1-4), adopting the unit-Lindley distribution for the minimum and maximum observations (as interval data, in blue) of those daily period records (solid black lines, MIN and MAX respectively), with three tolerance (α) levels (15% as a red thick dashed line, 10% as a red thin dashed line, and 1% as a solid red line). These control charts were developed for interval data in SPC, considering the estimated unit-Lindley LCL of the minimum humidity TS and the estimated unit-Lindley UCL of the maximum humidity TS.
As a next step, we developed a three-dimensional (3D) visualization considering both control limits for the maximum/minimum daily humidity. Figure 7 shows an SDA bivariate control chart for the humidity monitoring, adopting all the 2021 data records for online process monitoring (i.e., for the Phase 2 analysis). This plot contains the maximum humidity values on the z-axis, the minimum humidity values on the y-axis, and the time observation points on the x-axis (as dots). Moreover, the z- and x-axes (background series) project the maximum humidity TS, while the y- and x-axes (bottom series) project the minimum humidity TS. The bivariate plot (minimum-maximum control chart) results in a rectangular box as the control/expected behaviour. Therefore, any observation point that falls outside the expected dynamics (also called out-of-control) is coloured in red in the 3D plot (Phase 2), together with its projections. For instance, considering the maximum TS (UCL = 0.927), it is notable that the days 8 January 2021 (night), 11 January 2021 (dawn), 11 January 2021 (night), 20 January 2021 (dawn) and 20 January 2021 (night) were highlighted, exceeding the tolerance level of 15%. For the minimum TS (UCL = 0.840), 12 days (all during the night period) extrapolated the control limits (with ten observation points being from January 2021). Further investigations need to be conducted to appoint a stable tolerance level (α) for estimating these LCL and UCL, although the presented methodology shows itself competitive for making inferences about this process.

Concluding Remarks and Future Prospects
In this paper, we developed a new control chart based on the unit-Lindley distribution of [17], named the unit-Lindley chart, and studied its inferential properties. Moreover, we also showed the competitiveness of working with an interval data representation (as in SDA) to circumvent the missing data problem and the presence of noise.
As demonstrated by the simulation and empirical studies, the proposed control chart can be an efficient, exciting and valuable alternative to some well-known SPC tools when dealing with continuous process data in the interval (0, 1), e.g., indices, rates and proportions not resulting from Bernoulli experiments. For instance, the most common control charts used to monitor this kind of data are based on the beta, simplex and Kumaraswamy distributions, which have more parameters and show some limitations with overdispersed data, as confirmed through an extensive MC simulation study (using the ARL, SDRL and MRL metrics).
The developed parametric SPC tool enlightens prediction and opens new doors to discuss extreme events in the Atacama water particles monitoring through probabilistic reasoning. Further works shall explore the extension of this work to a unit-Lindley regression model (which enables the inclusion of trend and seasonal components and spatial dependence).

Figure 1 .
Figure 1. Statistics of the cloud occurrence over the Atacama region (panels B and C), shaded from a high probability of cloud occurrence in dark-red to a low probability in beige. The bottom left-hand map represents Terra MODIS (panel B), and Aqua MODIS (panel C) is in the bottom center map. The transition of the dark-red area, which occurs mainly during the dawn and morning and is mostly associated with the Camanchaca increasing the humidity of the coast of the Chilean third region up to the beginning of the highlands, turns into the low scale of humidity in the afternoons, represented by the full red map. Two bays are noticeable, in Copiapó and Huasco, as convergence points. In panel A, the digital elevation model for part of the Copiapó watershed is presented. Our data acquisition came from a weather station located at the University of Atacama in Copiapó, Chile (in the top left-hand picture).

Figure 2 .
Figure 2. Visual representation of the adopted methodology.

Figure 3 .
Figure 3. Unit-Lindley density function for the different parameter values considered in this study.

Figure 4 .
Figure 4. Humidity variation, collected per hour, from Copiapó (Chile) in the last five years. Panel A presents the histogram of the time series (TS), and panel B shows the dynamics of this series in light blue. Also, the solid line represents the TS average using a LOESS (an acronym for "Locally Estimated Scatterplot Smoothing") smoothing method [42].

Figure 5 .
Figure 5. Histograms of the minimum (left-hand panel) and maximum (right-hand panel) observations, after aggregating the daily humidity representation of the data into day periods (parts 1-4), with a solid black line representing the estimated PDF of the unit-Lindley distribution.

Figure 6 .
Figure 6. A visualization of the dynamics of Phase 1 (minimum and maximum TS), considering the observation points as four periods of the day, one in each graphic. Top-left panel: day period part 1 (from midnight to 5:59 a.m. UTC); top-right panel: day period part 2 (from 6:00 a.m. to 11:59 a.m. UTC); bottom-left panel: day period part 3 (from midday to 5:59 p.m. UTC); bottom-right panel: day period part 4 (from 6:00 p.m. to 11:59 p.m. UTC). Thus, the red lines represent three tolerance (α) levels (15% as a thick dashed line, 10% as a thin dashed line, and 1% as a solid line) for each estimated control limit (UCL for the maximum TS, and LCL for the minimum TS).

Figure 7 .
Figure 7. SDA bivariate control chart for the daily humidity in Phase 2 monitoring (records from 2021). In the 3D plot, the z-axis represents the maximum upper bound and the y-axis the minimum lower bound of the daily humidity (aggregated per period), adopting a certain tolerance level (α = 0.15, or 15%), whereas the x-axis is related to the time observation points (as dots). The estimated control limits are represented as a shaded box; observed out-of-control points are highlighted as red points, with their projections placed in the control chart projections. Thus, the TS projection on the bottom (x- and y-axes) is the control chart related to the minimum daily humidity, and the TS projection on the background (x- and z-axes) is the control chart of the maximum daily humidity.

Table 1 .
Control limits of the unit-Lindley chart for some values of µ and α.

Table 2 .
Statistical summary description, per year, highlighting in bold the highest values per statistic.

Table 3 .
The p-values from the Kolmogorov-Smirnov goodness-of-fit test for some continuous distributions defined on the range (0, 1) adjusted to the minimum and maximum daily relative humidity data.

Table 4 .
Control limits of the unit-Lindley chart for the minimum and maximum daily humidity monitoring.