Modeling PM2.5 Pollution Using a Truncated Positive Student’s-t Distribution: A Case Study in Chile

Héctor J. Gómez; Karol I. Santoro; Diego I. Gallardo; Paola E. Leal; Tiago M. Magalhães

doi:10.3390/math13233838

Abstract

This study revisits a recently proposed member of the truncated positive family of distributions, referred to as the positively truncated Student’s-t distribution. The distribution retains the structure of the classical Student’s-t distribution while explicitly incorporating a kurtosis parameter, yielding a flexible three-parameter formulation that governs location, scale, and tail behavior. A closed-form quantile function is derived, allowing a novel reparameterization based on the pth quantile and thereby facilitating integration into quantile regression models. The analytical tractability of the quantile function also enables efficient random number generation via the inverse transform method, which supports a comprehensive simulation study demonstrating the strong performance of the proposed estimators, particularly for the degrees-of-freedom parameter. The entire methodology is implemented in the tpn package for the R software. Finally, two real-data applications involving PM2.5 measurements—one without covariates and another with covariates—highlight the model’s robustness and its ability to capture heavy-tailed behavior.

Keywords:

truncated distribution; student’s-t; quantile regression; PM2.5

MSC:

62E10; 62E15; 62E17

1. Introduction

The analysis of environmental phenomena, such as atmospheric pollution, is of critical importance, particularly regarding the concentration of fine particulate matter (PM2.5), which comprises particles with an aerodynamic diameter smaller than 2.5 μm. This type of particulate matter poses a significant threat to human health, as it can penetrate deeply into the respiratory tract, reach the pulmonary alveoli, and even enter the bloodstream.

Numerous studies in the scientific literature have examined the adverse effects of PM2.5. For example, Castro et al. [1] investigated the relationship between PM2.5 concentrations and hospitalizations for decompensated heart failure (HF) in hospitals participating in the National Heart Failure Registry across the metropolitan area. Their findings indicate that HF patients, particularly those with a history of diabetes mellitus and/or hypertension, are more vulnerable to PM2.5 exposure.

Similarly, Fernández et al. [2] examined the prevalence of chronic obstructive pulmonary disease (COPD) and its association with PM2.5 exposure. Furthermore, Busch et al. [3] investigated the impact of this pollutant on mortality among older adults, while Matus and Oyarzún [4] assessed its effect on hospitalizations due to respiratory diseases in children. Together, these studies underscore the increasing emphasis on understanding and mitigating the health risks associated with air pollution.

In Latin America, Chile stands out as one of the countries with the highest levels of air pollution. According to Alvarez et al. [5] and the 2024 World Air Quality Report by IQAir, Chile ranks 62nd out of 138 countries in terms of fine particulate matter (PM2.5) pollution. Osorno, along with five other Chilean cities, is among the 15 most polluted in the region. In response to this situation, the Chilean government has implemented various decontamination plans since 2014 aimed at reducing environmental impacts. These measures include firewood-use restrictions, subsidies for cleaner heating systems, and initiatives to improve household thermal efficiency.

Since PM2.5 pollution levels can be highly variable, influenced by factors such as industrial activity, meteorological conditions, and seasonal fluctuations, it is necessary to apply flexible statistical methods to achieve more accurate characterization and inference. In this context, probabilistic models that can accommodate outliers and extreme observations are particularly valuable. Therefore, this paper proposes a modeling framework that captures these features by accounting for the heavy-tailed behavior and substantial dispersion typically observed in PM2.5 data.

The literature also includes several statistical studies that employ quantile regression as an analytical tool. For instance, Yang and Wu [6] developed a neural network model based on quantile regression to predict PM2.5 concentrations. Wu et al. [7] used this approach to analyze the distributional effects of PM2.5’s environmental components on male sperm quality. Similarly, Cao et al. [8] applied quantile regression in a study investigating the relationship between PM2.5 pollution and lung cancer mortality in China. Collectively, these studies underscore the effectiveness of quantile regression in capturing variation across the entire response distribution, thereby providing a deeper and more comprehensive understanding of the underlying phenomena.

Truncated distributions serve as ad hoc models for situations in which the range of a continuous or discrete random variable is restricted to a specific interval, excluding values outside that domain. Such models are particularly useful when data are partially observed or constrained by experimental design or natural censoring [9,10]. The appropriate application of truncated distributions improves parameter estimation and statistical inference in the presence of biased or incomplete data [11,12]. In this work, we focus on models supported on the positive real line. This study introduces a new probabilistic model based on the Truncated Positive Symmetric (TPS) family [13], a class of distributions derived from a symmetric density function g centered at zero, which is shifted and truncated to yield a distribution supported on the positive real line. Several well-known distributions arise as special cases of the TPS family. When g is the density of the normal distribution, the positive truncated normal (PTN) distribution is obtained. Likewise, if g corresponds to the density of the Laplace, Cauchy, or logistic distributions, the positive truncated Laplace (PTLa), positive truncated Cauchy (PTC), and positive truncated logistic (PTLo) distributions are obtained, respectively. This formulation allows for the construction of a broad class of positive-support distributions by adjusting only two parameters. The main strength of the TPS family lies in its ability to combine a location shift and truncation applied to a symmetric density, producing a flexible structure capable of representing both asymmetry and scale variability in the data.

Inspired by this structure, Gómez et al. [14] developed a computational package for this family, facilitating the generation of distributions suitable for practical applications. It is particularly useful in contexts that require positive support, such as modeling contamination levels, durations, or waiting times. However, a limitation of the original TPS family is its inability to directly control kurtosis, i.e., the heaviness of the distribution’s tails. This limitation is especially relevant in scenarios involving extreme events or outliers, which frequently occur in environmental pollution during critical episodes.

In this work, we propose an extension of the TPS family that uses the Student’s-t distribution as its base function. This symmetric distribution includes a degrees-of-freedom parameter,

ν

, which directly controls kurtosis. The resulting model provides simultaneous control over scale, asymmetry (shape), and kurtosis within a closed-form probability density function (pdf), offering a clear advantage over alternative approaches such as the Slash distribution or models with empirically specified tails [15,16,17]. Moreover, these properties facilitate the integration of the model within a quantile regression framework.

Given this context, it is essential to employ a distribution model that accurately captures both the heavy tails and skewness of PM2.5 data to produce reliable forecasts. Existing models applied to similar datasets in Asia generally fit average values well but fail to adequately account for extreme events [18,19,20]. The model proposed in this work aims to enhance understanding of pollution patterns, anticipate critical episodes, and inform the design of effective public policies, particularly those focused on protecting vulnerable populations, including children, the elderly, and individuals with respiratory conditions.

The remainder of this paper is organized as follows. In Section 2, we introduce the proposed distribution, the Truncated Positive Student’s-t (TPT) distribution, and discuss several key properties of the model. Section 3 describes the inference procedures, including a percentile-based estimation method and the derivation of the observed Fisher information matrix. In Section 4, we present a reparameterization of the model in terms of a quantile and detail its software implementation within the tpn package. Section 5 presents a simulation study to evaluate the performance of maximum likelihood (ML) estimators in finite samples. In Section 6, we apply the proposed models to two real-world datasets, both with and without covariates. Finally, Section 7 provides concluding remarks.

2. The TPT Distribution

Truncated distributions are well studied in the statistical literature because of their relevance for modeling bounded data. Early works [10,21] investigated the theoretical properties of truncated models, while [22] provided computational tools for their implementation, including routines in R via the LaplacesDemon package. These contributions established both the general form of truncated probability models and their key analytical properties.

Building on this framework, we define the TPT distribution as the positive truncation of the standard Student’s-t distribution. To ensure consistency with the truncated positive normal (TPN) model within the TPS family, we adopt the parametrization in terms of

σ

,

λ

, and

ν

, where

σ > 0

represents a scale parameter,

λ \in R

a location parameter, and

ν > 0

the degrees-of-freedom parameter controlling tail heaviness.

Definition 1.

To be more specific, let Y be a continuous random variable. We say that Y follows a Truncated Positive Student’s-t distribution, denoted by

Y \sim TPT (σ, λ, ν)

, if its probability density function (pdf) is given by

f (y; σ, λ, ν) = \frac{1}{σ T_{ν} (λ)} t_{ν} (\frac{y}{σ} - λ), y \geq 0 .

The cumulative distribution function (cdf) is given by

F (y; σ, λ, ν) = \frac{T_{ν} (\frac{y}{σ} - λ) + T_{ν} (λ) - 1}{T_{ν} (λ)}, y \geq 0 .

and the hazard function is given by

h (y; σ, λ, ν) = \frac{f (y)}{1 - F (y)} = \frac{t_{ν} (\frac{y}{σ} - λ)}{σ [1 - T_{ν} (\frac{y}{σ} - λ)]}, y \geq 0 .

(1)

Here,

t_{ν} (\cdot)

and

T_{ν} (\cdot)

denote the pdf and cdf of the Student’s-t distribution with ν degrees of freedom, respectively.

Remark 1.

Using the hypergeometric series representation of the cdf of the Student’s-t distribution, it is possible to obtain an expression equivalent to Equation (1), valid for

z^{2} < ν

with

z = \frac{y}{σ} - λ

, given by

h (y; σ, λ, ν) = \frac{2 Γ (\frac{ν + 1}{2}) {(1 + \frac{z^{2}}{ν})}^{- \frac{ν + 1}{2}}}{σ [\sqrt{π ν} Γ (\frac{ν}{2}) - 2 z Γ (\frac{ν + 1}{2}) {}_{2}F_{1} (\frac{1}{2}, \frac{ν + 1}{2}; \frac{3}{2}; - \frac{z^{2}}{ν})]},

where

{}_{2}F_{1} (\cdot)

denotes the hypergeometric function.

Figure 1 presents the pdf, cdf, and hazard function for the TPT

(σ = 0.5, λ = 1.5, ν)

model, considering various values of

ν

. The TPT model can exhibit heavier tails in the pdf compared to the TPN model, while the hazard function may assume more bell-shaped forms. Furthermore, we observe that, for specific values of

ν

, the pdf increases more gradually at specific points. This behavior arises because the TPT model, due to its heavy tails, converges more slowly to its limiting values, although all functions eventually tend toward one as z increases.

Figure 1. Pdf, cdf, and hazard function for the TPT (σ, λ, ν) model with different values of ν.

Observation 1.

The following distributions are special cases of the TPT distribution.

TPT $(σ, λ, ν \to \infty) \equiv$ TPN $(σ, λ)$ .
TPT $(σ, λ = 0, ν) \equiv$ HT $(σ, ν)$ , the half-Student’s-t distribution [23].
TPT $(σ, λ = 0, ν \to \infty) \equiv$ HN $(σ)$ , the half-normal distribution [24].

Figure 2 summarizes the relationships among the TPT distribution and its particular cases. Note that the TPT model includes several models known in the literature, such as the TPN, TPC (obtained for ν = 1), HT, and HN distributions.

Figure 2. Particular cases of TPT distribution.

2.1. Quantile Function

Truncated distributions admit quantile functions that can be directly derived from the cdf of the baseline distribution. In the case of the TPT distribution, the quantile function follows immediately from the general definition of a quantile for truncated models (see, e.g., [10]). It is presented here for completeness and consistency with the formulation of the model.

Definition 2.

Let

Y \sim TPT (σ, λ, ν)

. Then, the quantile function of Y is given by

Q (p) = σ [T_{ν}^{- 1} (T_{ν} (λ) [p - 1] + 1) + λ], p \in (0, 1),

(2)

where

T_{ν}^{- 1} (\cdot)

denotes the quantile function of the Student’s-t distribution with ν degrees of freedom.

Remark 2.

From the quantile function, the main quartiles of the TPT distribution can be written as

1.: First quartile: $Q (0.25) = σ [T_{ν}^{- 1} (1 - 0.75 T_{ν} (λ)) + λ] .$
2.: Median: $Q (0.5) = σ [T_{ν}^{- 1} (1 - 0.5 T_{ν} (λ)) + λ] .$
3.: Third quartile: $Q (0.75) = σ [T_{ν}^{- 1} (1 - 0.25 T_{ν} (λ)) + λ] .$

These expressions follow directly from the definition of the quantile function.
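The quantile function in Equation (2) can be sketched in base R with `qt` and `pt`. The names `tpt_quantile` and `tpt_cdf` are hypothetical helpers used for illustration; the check simply confirms that applying the cdf to the three quartiles returns the original probabilities.

```r
# Hypothetical helpers (not the tpn API)
tpt_quantile <- function(p, sigma, lambda, nu) {
  sigma * (qt(pt(lambda, df = nu) * (p - 1) + 1, df = nu) + lambda)
}

tpt_cdf <- function(y, sigma, lambda, nu) {
  Tl <- pt(lambda, df = nu)
  (pt(y / sigma - lambda, df = nu) + Tl - 1) / Tl
}

p <- c(0.25, 0.50, 0.75)
quartiles <- tpt_quantile(p, sigma = 0.5, lambda = 1.5, nu = 3)

# Round trip: F(Q(p)) should reproduce p
round_trip <- tpt_cdf(quartiles, sigma = 0.5, lambda = 1.5, nu = 3)
cbind(p = p, quartile = quartiles, F_of_Q = round_trip)
```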

2.2. Moments

The moments of the TPT distribution have been previously derived by [25], who provided closed-form expressions for general truncation limits

(a, b)

and location parameter

μ

. In this subsection, we restate the corresponding results for the specific case of the Truncated Positive Student’s-t (TPT) distribution by setting

μ = λ σ

,

a = 0

, and

b = \infty

.

Definition 3.

Let

Z \sim TPT (0, 1, ν)

. The non-central moments of Z are given by

E (Z^{k + 2}) = E_{η} (η^{- (k + 2) / 2} V^{k + 2}), k = - 1, 0, 1, 2, \dots

(3)

where

(k + 1) V^{k} - V^{k + 2} = - 2 ϕ (0)

and

E_{η} (\cdot)

denotes the expectation taken with respect to

η \sim G a m m a (ν / 2, 2 / ν)

. The moment exists only if

ν > k

. On the other hand, the non-central moments of

Y \sim TPT (σ, λ, ν)

can be computed as

μ_{k} = E (Y^{k}) = E (σ^{k} {(λ + Z)}^{k}) = σ^{k} \sum_{i = 0}^{k} (\binom{k}{i}) λ^{k - i} E (Z^{i}) = σ^{k} \{λ^{k} + \sum_{i = 1}^{k} (\binom{k}{i}) λ^{k - i} E (Z^{i})\} .

(4)

For instance, the first two non-central moments reduce to

1.: $μ_{1} = σ (\frac{k_{ν} (λ)}{T_{ν} (λ)} + λ), ν > 1;$
2.: $μ_{2} = \frac{σ^{2}}{T_{ν} (λ)} [λ^{2} T_{ν} (λ) + 2 λ k_{ν} (λ) + \frac{ν Γ (\frac{3}{2}) Γ (\frac{ν - 2}{2})}{\sqrt{π} Γ (\frac{ν}{2})} {}_{2}F_{1} (\frac{3}{2}, - \frac{ν}{2}; \frac{ν + 1}{2}; \frac{λ^{2}}{ν + λ^{2}})], ν > 2$

where

k_{ν} (λ) = \sqrt{\frac{ν}{π}} \frac{Γ (\frac{ν + 1}{2}) {(1 + λ^{2} / ν)}^{\frac{1 - ν}{2}}}{Γ (\frac{ν}{2}) (ν - 1)}

.
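As a quick numerical check of the closed-form mean, the sketch below compares the expression for the first moment with a direct numerical integration of y f(y). The helper names `k_nu`, `tpt_mean`, and `tpt_pdf` are hypothetical and the parameter values are arbitrary choices made for illustration.

```r
# Hypothetical helpers (not the tpn API)
k_nu <- function(lambda, nu) {
  sqrt(nu / pi) * gamma((nu + 1) / 2) * (1 + lambda^2 / nu)^((1 - nu) / 2) /
    (gamma(nu / 2) * (nu - 1))
}

tpt_mean <- function(sigma, lambda, nu) {
  stopifnot(nu > 1)  # the mean exists only for nu > 1
  sigma * (k_nu(lambda, nu) / pt(lambda, df = nu) + lambda)
}

tpt_pdf <- function(y, sigma, lambda, nu) {
  dt(y / sigma - lambda, df = nu) / (sigma * pt(lambda, df = nu))
}

mean_closed <- tpt_mean(sigma = 0.5, lambda = 1.5, nu = 5)
mean_numeric <- integrate(function(y) y * tpt_pdf(y, 0.5, 1.5, 5), 0, Inf)$value
c(closed_form = mean_closed, numeric = mean_numeric)
```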

2.3. Skewness and Kurtosis Coefficients

The TPT model can exhibit heavier tails than the TPN model. However, the traditional moment-based kurtosis coefficient of the TPT model exists only for

ν > 4

. Therefore, in the range where the model is most useful (where it produces the heaviest tails, i.e.,

ν \leq 4

), a moment-based comparison is not possible. For this reason, the skewness and kurtosis of the model are studied through quantile-based measures. A classical quantile-based measure of skewness was introduced by MacGillivray [26] and is given by

\begin{matrix} A (p) = \frac{Q (1 - p) + Q (p) - 2 Q (0.5)}{Q (1 - p) - Q (p)}, p \in (0, 1) . \end{matrix}

(5)

In particular, the MacGillivray skewness measure can describe the effect of the parameter

ν

on asymmetry. Figure 3 presents the asymmetry coefficient for different values of

λ

and

ν

.

Figure 3. Heat plots of the MacGillivray skewness coefficient and the Moors kurtosis coefficient for the TPT (σ = 1, λ, ν) model.

The kurtosis of the TPT distribution can also be studied using the Moors kurtosis coefficient [27] given by

\begin{matrix} K = \frac{Q (7 / 8) - Q (5 / 8) + Q (3 / 8) - Q (1 / 8)}{Q (3 / 4) - Q (1 / 4)} . \end{matrix}

(6)

It can be seen in [27] that for large values of (6), the distribution has heavy tails, while for small values, the model exhibits lighter tails. Figure 3 illustrates the behavior of the Moors kurtosis coefficient for the TPT distribution.

Table 1 reports the MacGillivray skewness and Moors kurtosis coefficients of the TPT model. Note that both coefficients increase as

ν

decreases.

Table 1. MacGillivray skewness and Moors kurtosis coefficients for different values of

ν

.
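Both quantile-based coefficients in Equations (5) and (6) depend only on the quantile function, so they can be sketched in a few lines of base R. The helper names and the parameter values (p = 0.1, σ = 1, λ = 1.5) are illustrative assumptions and are not taken from Table 1.

```r
# Hypothetical helpers (not the tpn API)
tpt_quantile <- function(p, sigma, lambda, nu) {
  sigma * (qt(pt(lambda, df = nu) * (p - 1) + 1, df = nu) + lambda)
}

macgillivray_skew <- function(p, sigma, lambda, nu) {
  Q <- function(u) tpt_quantile(u, sigma, lambda, nu)
  (Q(1 - p) + Q(p) - 2 * Q(0.5)) / (Q(1 - p) - Q(p))
}

moors_kurt <- function(sigma, lambda, nu) {
  Q <- function(u) tpt_quantile(u, sigma, lambda, nu)
  (Q(7 / 8) - Q(5 / 8) + Q(3 / 8) - Q(1 / 8)) / (Q(3 / 4) - Q(1 / 4))
}

nus <- c(1, 2, 5, 30)
coef_table <- data.frame(
  nu   = nus,
  skew = sapply(nus, function(v) macgillivray_skew(0.1, 1, 1.5, v)),
  kurt = sapply(nus, function(v) moors_kurt(1, 1.5, v))
)
coef_table
```

Because both measures are ratios of quantile differences, they do not depend on the scale parameter σ.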

2.4. Rényi Entropy

Rényi entropy is a measure of the uncertainty associated with a random variable. This measure is fundamental in several fields, including ecology and statistics, where it serves as an index of diversity. Rényi entropy is defined as follows.

R_{α} (y) = \frac{1}{1 - α} log (\int_{0}^{\infty} f {(y)}^{α} d y) .

Proposition 1.

Let

Y \sim TPT (σ, λ, ν)

. The Rényi entropy of order α for Y is given by

\begin{matrix} R_{α} (y) = \frac{1}{1 - α} log (\frac{Γ {(\frac{ν + 1}{2})}^{α}}{2 σ^{α - 1} T_{ν}^{α} (λ) ν^{α} π^{α / 2}} (B_{1} (\frac{α (ν + 1) - 1}{2}, \frac{1}{2}) - B_{C} (\frac{α (ν + 1) - 1}{2}, \frac{1}{2}))), \end{matrix}

(7)

where

C = {sin}^{2} (\sqrt{ν} tan (- λ))

and

B_{x} (a, b) = \int_{0}^{x} u^{a - 1} {(1 - u)}^{b - 1} d u

denotes the incomplete beta function.

Proof.

By definition, it follows that

\begin{matrix} R_{α} (z) = \frac{1}{1 - α} log (\frac{Γ^{α} (\frac{ν + 1}{2})}{σ^{α} T_{ν}^{α} (λ) {(ν π)}^{α / 2}} \int_{0}^{\infty} {(1 + \frac{1}{ν} {(\frac{z}{σ} - λ)}^{2})}^{- α (ν + 1) / 2} d z) . \end{matrix}

Taking the kernel of the integral and making the change in variable

u = \frac{z}{σ} - λ

and the trigonometric substitution

u = \sqrt{ν} tan (θ)

, we obtain

\begin{matrix} R_{α} (z) = σ \sqrt{ν} \int_{\sqrt{ν} tan (- λ)}^{π / 2} {(1 - {sin}^{2} (θ))}^{α (ν + 1) / 2 - 1} d θ . \end{matrix}

Using the change

u = {sin}^{2} (θ)

in last integral, it follows that

\begin{matrix} R_{α} (z) & = & \frac{σ \sqrt{ν}}{2} \int_{{sin}^{2} (\sqrt{ν} tan (- λ))}^{1} {(1 - u)}^{α (ν + 1) / 2 - \frac{1}{2} - 1} u^{\frac{1}{2} - 1} d u, \\ = & \frac{σ \sqrt{ν}}{2} (B_{1} (\frac{α (ν + 1) - 1}{2}, \frac{1}{2}) - B_{C} (\frac{α (ν + 1) - 1}{2}, \frac{1}{2})), \end{matrix}

(8)

Therefore, substituting Equation (8) into the definition yields the result in Equation (7). □
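The Rényi entropy can also be evaluated numerically straight from its definition, which gives an independent reference value for any closed-form expression. The sketch below is a minimal illustration that assumes the hypothetical helpers `tpt_pdf` and `renyi_numeric` and relies on `integrate` from base R; it does not implement Equation (7).

```r
# Hypothetical helpers (not the tpn API)
tpt_pdf <- function(y, sigma, lambda, nu) {
  dt(y / sigma - lambda, df = nu) / (sigma * pt(lambda, df = nu))
}

# Numerical Renyi entropy of order alpha (alpha > 0, alpha != 1)
renyi_numeric <- function(alpha, sigma, lambda, nu) {
  stopifnot(alpha > 0, alpha != 1)
  integral <- integrate(function(y) tpt_pdf(y, sigma, lambda, nu)^alpha,
                        lower = 0, upper = Inf)$value
  log(integral) / (1 - alpha)
}

r1 <- renyi_numeric(alpha = 2, sigma = 1, lambda = 1.5, nu = 3)
r2 <- renyi_numeric(alpha = 2, sigma = 2, lambda = 1.5, nu = 3)

# Rescaling by sigma shifts the entropy by log(sigma)
c(r1 = r1, r2 = r2, difference = r2 - r1, log_sigma = log(2))
```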

2.5. TPT Heavy-Tailed Distribution

The newly generated distribution is based on the Student’s-t distribution, which is heavy-tailed. We know that any probability distribution, specified by its cdf F(t), is a heavy right-tailed distribution (see Rolski et al. [28]) if

\begin{matrix} \underset{t \to \infty}{lim sup} (- \frac{log (1 - F (t))}{t}) = 0 . \end{matrix}

(9)

The following result shows that the TPT distribution is heavy right-tailed.

Proposition 2.

The cdf of the random variable

Y \sim TPT (σ, λ, ν)

is a heavy right-tailed distribution.

Proof.

Because the initial limit is of the indeterminate form

\infty / \infty

, applying L’Hôpital’s rule to the limit in Equation (9), we have that

\begin{matrix} \underset{x \to \infty}{lim sup} (- \frac{log (1 - F (x; σ, λ, ν))}{x}) & = & \underset{x \to \infty}{lim sup} (\frac{f (x; σ, λ, ν)}{1 - F (x; σ, λ, ν)}) \\ & = & \underset{x \to \infty}{lim sup} \frac{(ν + 1) (\frac{x}{σ} - λ)}{σ (ν + {(\frac{x}{σ} - λ)}^{2})} . \end{matrix}

Again, applying L’Hôpital’s rule, we obtain

\underset{x \to \infty}{lim sup} \frac{(ν + 1) (\frac{x}{σ} - λ)}{σ (ν + {(\frac{x}{σ} - λ)}^{2})} = \underset{x \to \infty}{lim sup} \frac{ν + 1}{2 σ (\frac{x}{σ} - λ)} = 0 .

□

3. Parameter Estimation

In this section, we present two methods for estimating the parameters of the TPT model. The first method is percentile-based, and the second is the maximum likelihood (ML) approach.

3.1. A Method Based on Percentiles

The motivation for this estimator is that it yields a model whose percentiles p align with the observed data, as represented by the empirical distribution. According to Klugman et al. [29], we introduce the following definitions.

Definition 4.

A percentile-matching estimate of

θ = (σ, λ, ν)

is any solution of the p equations

\begin{matrix} π_{g_{k}} (θ) = {\hat{π}}_{g_{k}}, k = 1, 2, \dots, p, \end{matrix}

where

g_{1}, g_{2}, \dots, g_{p}

are arbitrarily chosen percentiles. From the definition of a percentile, the equations can also be written as

\begin{matrix} F ({\hat{π}}_{g_{k}} | θ) = g_{k}, k = 1, 2, \dots, p . \end{matrix}

(10)

Definition 5.

The smoothed empirical estimate of a percentile is calculated as

\begin{matrix} {\hat{π}}_{g_{k}} = (1 - h) z_{j} + h z_{j + 1}, \end{matrix}

(11)

where

j = [(n + 1) g]

and

h = (n + 1) g - j

. Here, [·] indicates the greatest integer function and

z_{1} \leq z_{2} \leq \dots \leq z_{n}

are the order statistics from the sample.

For our particular problem, we choose the 25th, 50th, and 75th percentiles. The percentile-matching estimates are obtained by solving the following system of equations

\begin{matrix} \frac{T_{ν} (\frac{{\hat{π}}_{0.25}}{σ} - λ) + T_{ν} (λ) - 1}{T_{ν} (λ)} & = & 0.25; \\ \frac{T_{ν} (\frac{{\hat{π}}_{0.50}}{σ} - λ) + T_{ν} (λ) - 1}{T_{ν} (λ)} & = & 0.50; \\ \frac{T_{ν} (\frac{{\hat{π}}_{0.75}}{σ} - λ) + T_{ν} (λ) - 1}{T_{ν} (λ)} & = & 0.75 . \end{matrix}

These equations have no closed-form solution and must be solved numerically, for example with the nleqslv function available for R 4.5.2 [30].
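The percentile-matching procedure can be sketched as follows. This is an illustrative base R version under stated assumptions: the system is solved by minimizing the sum of squared residuals with `optim` (used here in place of nleqslv to avoid an extra dependency), the helper names are hypothetical, and the data are simulated with arbitrary parameter values.

```r
# Hypothetical helpers (not the tpn API)
tpt_cdf <- function(y, sigma, lambda, nu) {
  Tl <- pt(lambda, df = nu)
  (pt(y / sigma - lambda, df = nu) + Tl - 1) / Tl
}
tpt_quantile <- function(p, sigma, lambda, nu) {
  sigma * (qt(pt(lambda, df = nu) * (p - 1) + 1, df = nu) + lambda)
}

# Smoothed empirical percentile: j = floor((n + 1) g), h = (n + 1) g - j
smoothed_percentile <- function(z, g) {
  z <- sort(z)
  n <- length(z)
  j <- floor((n + 1) * g)
  h <- (n + 1) * g - j
  (1 - h) * z[j] + h * z[j + 1]
}

set.seed(1)
y <- tpt_quantile(runif(500), sigma = 2, lambda = 1, nu = 4)  # simulated data
g <- c(0.25, 0.50, 0.75)
pi_hat <- sapply(g, function(gk) smoothed_percentile(y, gk))

# Squared residuals of F(pi_hat | theta) = g; sigma and nu kept positive via exp()
objective <- function(par) {
  sum((tpt_cdf(pi_hat, exp(par[1]), par[2], exp(par[3])) - g)^2)
}
fit <- optim(c(log(median(y)), 0, log(5)), objective)
estimate <- c(sigma = exp(fit$par[1]), lambda = fit$par[2], nu = exp(fit$par[3]))
estimate
```

With three percentiles and three parameters the system is exactly determined, so the objective should be driven close to zero. The resulting estimate is mainly useful as a starting value for ML estimation.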

3.2. ML Estimation

Given

y_{1}, y_{2}, \dots, y_{n}

a random sample of size n from

T P T (σ, λ, ν)

, the log-likelihood function is given by

𝓁 (θ) = n (- log (T_{ν} (λ)) + log Γ (\frac{ν + 1}{2}) - log (σ) - \frac{1}{2} log (π ν) - log Γ (\frac{ν}{2})) - \frac{(ν + 1)}{2} \sum_{i = 1}^{n} log (1 + \frac{1}{ν} {(\frac{y_{i}}{σ} - λ)}^{2}) .

(12)

Therefore, the score function assumes the form

S (θ) = (S_{σ} (θ), S_{λ} (θ), S_{ν} (θ))

, where

\begin{matrix} S_{σ} (θ) & = - \frac{n}{σ} + \frac{(ν + 1)}{ν σ^{2}} \sum_{i = 1}^{n} \frac{y_{i} u_{i}}{d_{u_{i}}}, \end{matrix}

(13)

\begin{matrix} S_{λ} (θ) & = - \frac{n t_{ν} (λ)}{T_{ν}} + \frac{(ν + 1)}{ν} \sum_{i = 1}^{n} \frac{u_{i}}{d_{u_{i}}} and \end{matrix}

(14)

\begin{matrix} S_{ν} (θ) & = - n (\frac{T_{ν}^{(1)}}{T_{ν}} - \frac{1}{2} Ψ (\frac{ν + 1}{2}) + \frac{1}{2 ν} + \frac{1}{2} Ψ (\frac{ν}{2})) - \frac{1}{2} \sum_{i = 1}^{n} log (d_{u_{i}}) \end{matrix}

(15)

\begin{matrix} + \frac{ν + 1}{2 ν^{2}} \sum_{i = 1}^{n} \frac{u_{i}^{2}}{d_{u_{i}}}, \end{matrix}

(16)

where

d_{u} = 1 + \frac{1}{ν} u^{2}

,

u = \frac{y}{σ} - λ

,

T_{ν} = T_{ν} (λ)

, and

T_{ν}^{(n)} = \frac{\partial^{n} T_{ν} (λ)}{\partial ν^{n}}

. The ML estimator of

θ

, denoted by

\hat{θ}

, can be obtained by solving the likelihood equations

S (θ) = 0_{3}

, where

0_{3}

is a vector of length 3 containing zeros. Numerical methods, such as the Newton-Raphson procedure, can be employed to solve these equations. Alternatively, other optimization techniques, including the method proposed by MacDonald [31], may also be applied.
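A minimal sketch of ML estimation by direct numerical maximization is given below. It is not the implementation used by `est.tpt`; it assumes a hypothetical helper `tpt_loglik`, simulated data with arbitrary parameter values, and the general-purpose optimizer `optim`. The log-likelihood is coded from the pdf in Definition 1, so the kernel term enters with a negative sign.

```r
# Hypothetical helper (not the tpn API): log-likelihood for theta = (sigma, lambda, nu)
tpt_loglik <- function(theta, y) {
  sigma <- theta[1]; lambda <- theta[2]; nu <- theta[3]
  n <- length(y)
  u <- y / sigma - lambda
  n * (-log(pt(lambda, df = nu)) + lgamma((nu + 1) / 2) - log(sigma) -
         0.5 * log(pi * nu) - lgamma(nu / 2)) -
    (nu + 1) / 2 * sum(log1p(u^2 / nu))
}

# Simulated sample via the inverse transform method (illustrative values)
set.seed(123)
sigma0 <- 1; lambda0 <- 2; nu0 <- 5
y <- sigma0 * (qt(pt(lambda0, nu0) * (runif(500) - 1) + 1, nu0) + lambda0)

# Optimize over (log sigma, lambda, log nu) so that sigma and nu stay positive
nll <- function(par) -tpt_loglik(c(exp(par[1]), par[2], exp(par[3])), y)
start <- c(log(sd(y)), 0, log(5))
fit <- optim(start, nll)  # Nelder-Mead by default

mle <- c(sigma = exp(fit$par[1]), lambda = fit$par[2], nu = exp(fit$par[3]))
mle
```

The log transformation of σ and ν is a design choice that removes the need for box constraints; a Newton-Raphson scheme based on the score function would be an alternative.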

3.3. Observed Fisher Information Matrix

The asymptotic variance of

\hat{θ} = (\hat{σ}, \hat{λ}, \hat{ν})

can be estimated from the inverse of the Fisher information matrix, defined as

I (θ) = - E [\partial^{2} 𝓁 (θ) / \partial θ \partial θ^{⊤}]

, where

𝓁 (θ)

is the log-likelihood function of the TPT model given in (12). Under the usual regularity conditions [32],

I {(θ)}^{- 1 / 2} (\hat{θ} - θ) \overset{D}{\to} N_{3} (0_{3}, I_{3}), as n \to + \infty,

(17)

where

D

denotes convergence in distribution and

N_{3} (0_{3}, I_{3})

denotes the standard trivariate normal distribution. The elements of the matrix

\partial^{2} 𝓁 (θ) / \partial θ \partial θ^{⊤}

are given by

I_{σ σ} = \partial^{2} 𝓁 (θ) / \partial σ^{2}

,

I_{σ λ} = \partial^{2} 𝓁 (θ) / \partial σ \partial λ

, and so on. Explicitly, we have

\begin{matrix} I_{σ σ} & = & \frac{n}{σ^{2}} - \frac{(ν + 1)}{{(σ^{2} ν)}^{2}} \sum_{i = 1}^{n} \frac{y_{i} [y_{i} ν d_{u_{i}} + 2 u_{i} (σ d_{u_{i}} - y_{i} (\frac{y_{i}}{σ - λ}))]}{d_{u_{i}}^{2}}, \\ I_{σ λ} & = & - \frac{(ν + 1)}{(ν σ)} \sum_{i = 1}^{n} \frac{y_{i} (ν - u_{i}^{2})}{d_{u_{i}}^{2}}, \\ I_{σ ν} & = & \frac{1}{{(ν σ)}^{2}} \sum_{i = 1}^{n} \frac{y_{i} u_{i} (u_{i}^{2} - 1)}{d_{u_{i}}^{2}}, \\ I_{λ λ} & = & \frac{n (2 λ (ν + 1) {(1 + \frac{λ^{2}}{ν})}^{- 1} T_{ν} - ν T_{ν}^{(1)}) t_{ν} (λ)}{ν {(T_{ν})}^{2}} - \frac{(ν + 1)}{ν} \sum_{i = 1}^{n} \frac{ν + 3 u_{i}^{2}}{d_{u_{i}}^{2}} \\ I_{λ ν} & = & - \frac{n (t_{ν}^{(1)} (λ) T_{ν} - t_{ν} (λ) T_{ν}^{(1)})}{{(T_{ν})}^{2}} + \frac{1}{ν^{2}} \sum_{i = 1}^{n} \frac{u_{i} (ν d_{u_{i}} - (ν + 1))}{d_{u_{i}}} and \\ I_{ν ν} & = & - \frac{n (T_{ν}^{(2)} T_{ν} - {(T_{ν}^{(1)})}^{2})}{{(T_{ν})}^{2}} + \frac{n}{2} Ψ^{(1)} (\frac{ν + 1}{2}) + \frac{n}{2 ν^{2}} - \frac{n}{4} Ψ^{(1)} (\frac{ν}{2}) + \frac{1}{2 ν} \sum_{i = 1}^{n} \frac{u_{i}^{2}}{d_{u_{i}}} \\ + \sum_{i = 1}^{n} \frac{u_{i} (ν^{2} d_{u_{i}} - (ν + 1) u_{i} (2 ν - u_{i}^{2}))}{ν^{4} d_{u_{i}}^{2}} . \end{matrix}

where

t_{ν}^{(n)} (λ) = \frac{\partial^{n} t_{ν} (λ)}{\partial ν^{n}}

and

Ψ (\cdot)

and

Ψ^{(1)} (\cdot)

denote the digamma and trigamma functions, respectively.

In practice, it is not possible to obtain a closed-form expression for the expected value of the previous terms. However, the covariance matrix of the MLEs,

I {(θ)}^{- 1}

, can be consistently estimated by

I {(\hat{θ})}^{- 1}

, where

I (\hat{θ})

denotes the observed information matrix, which is computed as follows.

I (\hat{θ}) = - \frac{\partial^{2} 𝓁 (θ)}{\partial θ \partial θ^{⊤}} |_{θ = \hat{θ}} .

The asymptotic variances of

\hat{σ}

,

\hat{λ}

, and

\hat{ν}

are estimated by the diagonal elements of

I {(\hat{θ})}^{- 1}

, and their standard errors by the square root of the asymptotic variances.
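In practice, the observed information matrix can be approximated numerically rather than coded from the analytical second derivatives. The sketch below assumes simulated data, arbitrary starting values, and the `hessian = TRUE` option of `optim`; it illustrates the idea and is not the routine used in the tpn package.

```r
# Simulated sample via the inverse transform method (illustrative values)
set.seed(42)
sigma0 <- 1; lambda0 <- 2; nu0 <- 5
y <- sigma0 * (qt(pt(lambda0, nu0) * (runif(1000) - 1) + 1, nu0) + lambda0)

# Negative log-likelihood in the natural parametrization theta = (sigma, lambda, nu)
nll <- function(theta) {
  -sum(dt(y / theta[1] - theta[2], df = theta[3], log = TRUE) -
         log(theta[1]) - pt(theta[2], df = theta[3], log.p = TRUE))
}

fit <- optim(c(sd(y), 1, 4), nll, method = "L-BFGS-B",
             lower = c(1e-3, -Inf, 0.1), hessian = TRUE)

observed_info <- fit$hessian      # Hessian of -loglik at the estimate
cov_hat <- solve(observed_info)   # estimated covariance matrix of the MLEs
se <- sqrt(diag(cov_hat))         # standard errors
cbind(estimate = fit$par, se = se)
```

If the numerical Hessian is not positive definite (for example, when ν is poorly identified in a small sample), the square roots are not defined and the fit should be re-examined.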

4. TPT Model in a Quantile Regression Framework

Now, we introduce the TPT distribution for a quantile regression scheme. In this approach, the regression model is proposed to describe the conditional quantile of the response variable. Given the simple form of the quantile function for the TPT distribution, the model can be reparameterized in terms of its p-th quantile, denoted

ρ (p) = Q (p; σ, λ, ν)

.

Let

σ = \frac{ρ}{[T_{ν}^{- 1} (T_{ν} (λ) (p - 1) + 1) + λ]}

. Then, the pdf of the reparameterized TPT (henceforth, RTPT) model is given by

f (y; ρ, λ, ν) = \frac{d_{p} (λ, ν)}{ρ T_{ν} (λ)} t_{ν} (\frac{y d_{p} (λ, ν)}{ρ} - λ), y > 0,

(18)

where

d_{p} (λ, ν) = [T_{ν}^{- 1} (T_{ν} (λ) (p - 1) + 1) + λ]

,

ρ > 0

,

λ \in R

and

0 < p < 1

.

The cdf of the RTPT model is given by

\begin{matrix} F (y; ρ, λ, ν) = \frac{T_{ν} (\frac{y d_{p} (λ, ν)}{ρ} - λ) + T_{ν} (λ) - 1}{T_{ν} (λ)} . \end{matrix}
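The reparameterization can be checked numerically: by construction, the cdf of the RTPT model evaluated at ρ should equal p. The sketch below uses hypothetical helper names (`d_p`, `rtpt_pdf`, `rtpt_cdf`) and arbitrary parameter values.

```r
# Hypothetical helpers (not the tpn API)
d_p <- function(p, lambda, nu) {
  qt(pt(lambda, df = nu) * (p - 1) + 1, df = nu) + lambda
}

rtpt_pdf <- function(y, rho, lambda, nu, p = 0.5) {
  d <- d_p(p, lambda, nu)
  d / (rho * pt(lambda, df = nu)) * dt(y * d / rho - lambda, df = nu)
}

rtpt_cdf <- function(y, rho, lambda, nu, p = 0.5) {
  d <- d_p(p, lambda, nu)
  Tl <- pt(lambda, df = nu)
  (pt(y * d / rho - lambda, df = nu) + Tl - 1) / Tl
}

# The p-th quantile of RTPT(rho, lambda, nu) is rho itself
check <- rtpt_cdf(3, rho = 3, lambda = 1.5, nu = 4, p = 0.9)
check
```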

4.1. The Heterogeneous Case

Let

0 < p < 1

be the quantile level of interest. Assume that, for each observation, the p-th quantile of the distribution can be explained by a set of k covariates, say

x_{i}^{⊤} = (1, x_{i 1}, \dots, x_{i k})

,

i = 1, \dots, n

. We assume that

y_{i} \sim RTPT (ρ_{i} (p), λ (p), ν (p))

. For

p = 0.5

, the median regression is obtained as a particular case. The quantile of the distribution is linked with the covariates as

\begin{matrix} g (ρ_{i} (p)) = x_{i}^{⊤} β (p), \end{matrix}

(19)

where

β^{⊤} (p) = (β_{0} (p), β_{1} (p), \dots, β_{k} (p))

is a (k + 1)-dimensional vector of unknown regression parameters (

k < n

) and

g (\cdot)

is a link function, which is continuous, invertible and at least twice differentiable. We consider the specific case

g (u) = log (u)

, which is the most common function used to link a linear predictor for positive parameters.

4.2. Estimation

In the quantile regression model context for the RTPT distribution, the log-likelihood function for

Θ (p) = (β^{⊤} (p), λ (p), ν (p))

is given by

\begin{matrix} 𝓁 (Θ (p)) = n [log (d_{p} (λ (p), ν (p))) - log (T_{ν (p)} (λ (p)))] - \sum_{i = 1}^{n} log (ρ_{i} (p)) + \sum_{i = 1}^{n} log (t_{ν (p)} (\frac{y_{i} d_{p} (λ (p), ν (p))}{ρ_{i} (p)} - λ (p))) . \end{matrix}

(20)

The ML estimators of

Θ

can be obtained by maximizing

𝓁 (Θ (p))

with respect to

Θ (p)

. Under mild regularity conditions and when the sample size n is large, the asymptotic distribution of the ML estimator

\hat{Θ} (p) = ({\hat{β}}^{⊤} (p), \hat{λ} (p), \hat{ν} (p))

is approximately multivariate normal (of dimension

k + 3

) with variance-covariance matrix

K^{- 1} (Θ)

where

\begin{matrix} K (Θ) = E [- \frac{\partial^{2} 𝓁 (Θ)}{\partial Θ \partial Θ^{⊤}}] \end{matrix}

is the expected Fisher information matrix. Note that there is no closed-form expression for the matrix

K (Θ)

. Nevertheless, as shown in [33], the estimated observed Fisher information matrix given by

\begin{matrix} J (\hat{Θ}) = - \frac{\partial^{2} 𝓁 (Θ)}{\partial Θ \partial Θ^{⊤}} |_{Θ = \hat{Θ}} \end{matrix}

is a consistent estimator of the expected Fisher information matrix

K (Θ)

. Therefore, for large n, we can replace

K (Θ)

by

J (\hat{Θ})

.
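The quantile regression fit can be sketched in base R as follows. This is an illustration under stated assumptions, not the `est.tpt` implementation: the helper names are hypothetical, the data are simulated with a single covariate and arbitrary coefficients, the log link is used, and the negative log-likelihood is built from the RTPT pdf in Equation (18).

```r
# Hypothetical helpers (not the tpn API)
d_p <- function(p, lambda, nu) {
  qt(pt(lambda, df = nu) * (p - 1) + 1, df = nu) + lambda
}

# par = (beta_0, ..., beta_k, lambda, log nu); X includes the intercept column
rtpt_reg_nll <- function(par, y, X, p) {
  k1 <- ncol(X)
  beta <- par[seq_len(k1)]
  lambda <- par[k1 + 1]
  nu <- exp(par[k1 + 2])
  rho <- exp(drop(X %*% beta))  # log link: log(rho_i) = x_i' beta
  d <- d_p(p, lambda, nu)
  -sum(log(d) - log(rho) - pt(lambda, df = nu, log.p = TRUE) +
         dt(y * d / rho - lambda, df = nu, log = TRUE))
}

# Simulated data for the p = 0.75 quantile (illustrative values)
set.seed(7)
n <- 300; p <- 0.75
X <- cbind(1, runif(n))
beta0 <- c(1, 0.5); lambda0 <- 1; nu0 <- 4
sigma_i <- exp(drop(X %*% beta0)) / d_p(p, lambda0, nu0)
y <- sigma_i * (qt(pt(lambda0, nu0) * (runif(n) - 1) + 1, nu0) + lambda0)

# A closure avoids passing `p` through optim's `...` (it would partially match `par`)
nll <- function(par) rtpt_reg_nll(par, y, X, p)
start <- c(log(unname(quantile(y, p))), 0, 0, log(5))
fit <- optim(start, nll, control = list(maxit = 2000))
beta_hat <- fit$par[1:2]
beta_hat
```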

4.3. Computational Implementation

The simulation studies and practical applications were carried out using functions implemented in the R software, specifically in the tpn [14] package. This package provides tools for parameter estimation and regression under the TPT model, facilitating both empirical analysis and model-based inference.

To fit the model with the tpn package, proceed as follows.

1. Install and load the necessary packages:

`install.packages("tpn")`
`library(tpn)`

2. Load your data into R, with y as the response variable and x as the matrix of covariates (x may be omitted, which corresponds to the case without covariates), and call the est.tpt function to obtain the ML estimates for the TPT model, along with their standard errors and the AIC and BIC criteria:

`est.tpt(y, x, q=0.5)`

In this notation, q is the modelled quantile.

5. Simulation

In this section, we conduct a simulation study to assess the performance of the maximum likelihood (ML) estimators for the TPT model. Estimates are obtained using the function est.tpt described in Section 4.3. Additionally, Algorithm 1 outlines a procedure for generating random values from the TPT model, based on the inverse transform method.

Algorithm 1 Simulating values from the

TPT (σ, λ, ν)

distribution

Step 1: Simulate $U \sim U (0, 1)$ (i.e., the standard uniform distribution).
Step 2: Compute $Y = σ [T_{ν}^{- 1} (T_{ν} (λ) (U - 1) + 1) + λ]$
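The two steps of Algorithm 1 can be sketched in base R as follows. The function name `rtpt_sketch` is a hypothetical stand-in chosen so as not to suggest the signature of the package function; the comparison with the theoretical median is only a sanity check on simulated values.

```r
# Hypothetical sampler following Algorithm 1 (inverse transform method)
rtpt_sketch <- function(n, sigma, lambda, nu) {
  u <- runif(n)                                                       # Step 1
  sigma * (qt(pt(lambda, df = nu) * (u - 1) + 1, df = nu) + lambda)   # Step 2
}

set.seed(2025)
y <- rtpt_sketch(1e4, sigma = 1, lambda = 2, nu = 5)

# Theoretical median: Q(0.5) = sigma * (T^{-1}(1 - 0.5 T(lambda)) + lambda)
theoretical_median <- 1 * (qt(1 - 0.5 * pt(2, df = 5), df = 5) + 2)
c(sample_median = median(y), theoretical_median = theoretical_median)
```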

The package tpn also includes the function rtpt to draw values from the TPT model with specified parameters.

In the simulation study, we consider the parameter values

σ \in \{1, 10\}

,

λ \in \{0.5, 2, 5\}

, and

ν \in \{2, 5, 10\}

. Sample sizes of

n \in \{50, 100, 200, 500\}

are used. For each combination of

σ

,

λ

,

ν

, and n, 1000 replicates are generated, and the corresponding ML estimates and standard errors are computed using the est.tpt function. Table 2 summarizes the results, reporting the estimated bias (Bias), the mean standard error (SE), the root mean squared error (RMSE), and the empirical coverage probability (CP) for the 95% asymptotic confidence intervals.

Table 2. Bias, SE, RMSE and 95% CP for the ML estimators in the TPT distribution.

As the sample size increases, the bias, SE, and RMSE decrease, indicating that the ML estimators exhibit desirable properties even in finite samples. Furthermore, the SE and RMSE converge as n grows, suggesting that the standard errors are accurately estimated. With respect to the coverage probabilities (CPs), they remain close to the nominal level of 0.95, supporting the appropriateness of the normal approximation for inference on the ML estimators.

Observation 2.

The literature on estimating the degrees-of-freedom parameter ($ν$) of the Student’s t distribution highlights the substantial challenges associated with this task. In contrast, in our model, the maximum likelihood (ML) estimation of $ν$ exhibits relatively stable behavior. Nevertheless, potential difficulties in analyzing this parameter remain, as discussed by Lange et al. [34], and have also been noted in other studies, such as Gómez et al. [35], which examines the estimation of $ν$ from order statistics.

6. Application

In this section, we analyze a cross-sectional dataset obtained from the National Air Quality Information System (SINCA) of the Chilean Ministry of the Environment, which is publicly available at https://sinca.mma.gob.cl/index.php (accessed on 22 September 2025). The dataset consists of validated measurements from 39 monitoring stations across Chile on 20 June 2025, including concentrations of PM2.5 (μg/m³) and SO₂ (μg/m³). Unlike previous studies based on time series data, which are often affected by strong autocorrelation, this station-based snapshot mitigates temporal dependence issues. This design enables analysis of the joint distribution of pollutants at a specific point in time. Table 3 summarizes the descriptive statistics for SO₂, revealing its heavy-tailed nature and the presence of extreme values.

Table 3. Summary statistics for SO₂ concentrations (μg/m³) across 39 monitoring stations in Chile on 20 June 2025.

The main objective of this analysis is to fit the truncated positive t (TPT) model to explain PM2.5 levels in terms of SO₂ concentrations, focusing on the upper quantiles of the conditional distribution. This approach is motivated by the importance of upper-tail behavior for environmental policy, as extreme pollutant concentrations are directly associated with critical air-quality episodes. Previous studies (e.g., Nakamura et al. [36]) have shown that geographical and meteorological conditions in southern Chile, such as thermal inversions during the autumn and winter months, exacerbate pollutant accumulation and trigger such critical episodes. Consequently, employing heavy-tailed regression models, such as the RTPT, provides valuable insights to support policy decisions aimed at mitigating severe air pollution.

6.1. Univariate Analysis

In this first step, we analyze the distributional behavior of the SO₂ variable obtained from 39 monitoring stations in Chile on 20 June 2025, the first day of the winter season. Each observation corresponds to a valid pair of PM2.5 and SO₂ values from the same station; however, this section focuses exclusively on SO₂. Table 3 summarizes the main descriptive statistics. The results reveal pronounced skewness and exceptionally high kurtosis, suggesting that the distribution of SO₂ exhibits heavy tails and contains extreme observations.

Figure 4 presents the boxplot and violin plot for the SO₂ data. The plots clearly illustrate the influence of extreme values on the overall distribution. Given its heavy-tailed nature, we fitted several models to better capture its underlying distributional characteristics. Specifically, we considered the truncated positive Normal (TPN), the truncated positive t (TPT), and the Generalized Gamma (GG) distributions. According to the Akaike Information Criterion (AIC) [37], the TPT distribution provides the best fit among the candidates, achieving a lower AIC value than both the baseline TPN and the flexible GG model. This result indicates that incorporating the additional shape parameter $ν$ in the TPT distribution enhances its ability to model heavy tails while preserving the truncated structure. The corresponding parameter estimates are reported in Table 4.
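As a quick check of how the information criteria follow from the fitted log-likelihoods, the sketch below applies the standard definitions AIC = 2k − 2ℓ and BIC = k log(n) − 2ℓ to the log-likelihoods reported in Table 4 for the TPT and TPN fits (n = 39). The parameter counts (three for TPT, two for TPN) are read off Table 4; the function names are hypothetical, and small differences from the tabulated values are to be expected because the log-likelihoods are rounded to one decimal.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a fit with k free parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for a fit to n observations."""
    return k * math.log(n) - 2.0 * loglik


# (log-likelihood, number of parameters) as reported in Table 4.
fits = {"TPT": (-79.2, 3), "TPN": (-91.4, 2)}
n_obs = 39

for name, (ll, k) in fits.items():
    print(name, round(aic(ll, k), 1), round(bic(ll, k, n_obs), 1))
```

The model with the smaller criterion is preferred, so the extra parameter of the TPT is justified only if it raises the log-likelihood by more than the penalty it incurs.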

Figure 4. Boxplot of SO₂ concentrations (left panel) and violin plot of SO₂ concentrations (right panel).

Table 4. Estimated parameters (with standard errors in parentheses) for the fitted models of SO₂.

In summary, the results indicate that the TPT distribution provides the best overall fit among the models considered. Relative to the TPN distribution, the inclusion of the additional shape parameter $ν$ in the TPT model enables more flexible modeling of the heavy-tailed behavior observed in the SO₂ data. Likewise, compared with the GG distribution, the TPT achieves an effective trade-off between goodness of fit and parsimony, reinforcing its suitability within the family of truncated positive models. Figure 5 displays the Q–Q plots for the three candidate models, confirming that the TPT distribution aligns most closely with the empirical quantiles and provides a robust characterization of the observed SO₂ concentrations.

Figure 5. QQ-plots of SO₂ concentrations for three fitted models: TPN (left panel), TPT (center panel), and GG (right panel).

6.2. Quantile Regression

In the second step, we examine the conditional distribution of PM2.5 as a function of SO₂ concentrations using a quantile regression approach based on the truncated positive t model (RTPT). The analysis focuses on the 80th percentile ($p = 0.8$), which holds particular relevance for environmental policy. According to Chilean regulations (D.S. N° 12/2011 of the Ministry of the Environment), the 80th percentile of PM2.5 serves as a reference threshold for identifying critical pollution episodes. Therefore, this choice allows for a precise evaluation of how SO₂ concentrations influence PM2.5 levels in the upper tail of the distribution, corresponding to high-risk scenarios for public health.

The estimated conditional quantile model is given by

$\log(ρ_{i}(p)) = β_{int}(p) + β_{SO_2}(p) \times SO_{2,i}, \quad i = 1, \dots, 39,$

where $ρ_{i}(p)$ denotes the scale parameter at quantile p, while $λ(p)$ and $ν(p)$ are additional shape parameters of the TPT distribution. Parameter estimates and their standard errors are reported in Table 5.

Table 5. Parameter estimates (with standard errors in parentheses) and confidence interval (CI) for the RTPT quantile regression model at $p = 0.8$.

The positive estimate ${\hat{β}}_{SO_2} = 0.039$ indicates that higher SO₂ levels are associated with increased PM2.5 concentrations at the 80th percentile. In practical terms, this implies that as air quality conditions approach critical pollution episodes, SO₂ contributes to explaining the rise in PM2.5 levels within the upper tail of the distribution. Although the estimated coefficient is relatively small, the logarithmic link makes its effect multiplicative, so the absolute increase in the 80th percentile becomes more pronounced at elevated SO₂ concentrations, underscoring the importance of controlling sulfur dioxide emissions to mitigate severe fine particulate matter pollution episodes.
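To make the size of this effect concrete, the sketch below evaluates the fitted log-linear predictor at the point estimates in Table 5. It assumes that $ρ_{i}(p)$ is the conditional pth quantile of PM2.5, in line with the quantile-based reparameterization described in the abstract; the function names are hypothetical, and the SO₂ values used for illustration are the minimum, median, and maximum from Table 3.

```python
import math

# Point estimates reported in Table 5 for p = 0.8.
BETA_INT = 3.782
BETA_SO2 = 0.039


def fitted_q80(so2):
    """Fitted 80th percentile of PM2.5 (ug/m^3) at a given SO2 level (ug/m^3).

    Assumes rho_i(p) is the conditional p-th quantile (see lead-in).
    """
    return math.exp(BETA_INT + BETA_SO2 * so2)


def multiplicative_effect(delta_so2=1.0):
    """Factor by which the fitted percentile changes when SO2 rises by delta_so2."""
    return math.exp(BETA_SO2 * delta_so2)


for so2 in (0.21, 1.58, 46.16):  # min, median, max of SO2 in Table 3
    print(so2, round(fitted_q80(so2), 1))
print("per-unit factor:", round(multiplicative_effect(), 3))
```

Under this reading, each additional μg/m³ of SO₂ scales the fitted 80th percentile by the same factor, exp(0.039), which is roughly a 4% increase; the absolute change is therefore larger when the baseline level is already high.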

The shape parameters $\hat{λ} = 1.415$ and $\hat{ν} = 2.543$ offer additional insight into the data’s distributional features: they indicate moderately heavy tails, consistent with the presence of extreme values observed in the SO₂ sample. This finding further supports the suitability of the RTPT model over traditional regression approaches that assume light-tailed residuals.

The adequacy of the RTPT model was assessed using a set of diagnostic tools. Figure 6 presents the Q–Q plot of the quantile residuals, the likelihood displacement, and the generalized Cook’s distance. Panel (a) indicates that the residuals closely follow the theoretical quantiles of the standard normal distribution and remain within the confidence bands, suggesting that the RTPT specification adequately captures the conditional distribution of PM2.5 given SO₂. Panels (b) and (c) identify potentially influential observations, with station #32 emerging as the most influential according to Cook’s distance. Overall, these diagnostic results suggest that the RTPT model provides a robust fit, even in the presence of extreme observations. Finally, Figure 7 presents the estimated 80th percentile of PM2.5 as a function of SO₂ concentrations for the city of Osorno. In the Chilean context, this estimate can be interpreted as the threshold used to declare an environmental alert in the city.
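Quantile residuals of the kind shown in panel (a) are obtained by passing each observation through the fitted cdf and then through the standard normal quantile function, so that they behave like standard normal draws when the model is adequate. The sketch below illustrates the idea under two loudly stated assumptions: ν is fixed at 2 so that the Student's t cdf has a closed form, and the TPT cdf is taken as $F(y) = [T_{ν}(y/σ - λ) + T_{ν}(λ) - 1] / T_{ν}(λ)$, which is the form obtained by inverting Step 2 of Algorithm 1. The function names are hypothetical, and a single common σ is used, whereas in the regression model each station would have its own scale.

```python
import math
from statistics import NormalDist


def t2_cdf(x):
    """CDF of Student's t with nu = 2 (closed form)."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))


def tpt2_cdf(y, sigma, lam):
    """Assumed TPT cdf with nu = 2, for y > 0 (inverse of Algorithm 1, Step 2)."""
    return (t2_cdf(y / sigma - lam) + t2_cdf(lam) - 1.0) / t2_cdf(lam)


def quantile_residuals(ys, sigma, lam):
    """Map observations to the standard normal scale via the fitted cdf."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(tpt2_cdf(y, sigma, lam)) for y in ys]


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    print(quantile_residuals([0.3, 1.0, 2.5, 8.0], sigma=1.0, lam=0.5))
```

Plotting the sorted residuals against standard normal quantiles gives the Q–Q plot; systematic departures from the diagonal, especially in the upper tail, would point to a misspecified tail parameter.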

Figure 6. Diagnostic plots for the RTPT regression model at the 80th percentile ($p = 0.8$). Panel (a) shows the quantile residuals, indicating that the model adequately captures the distributional characteristics. Panel (b) presents the likelihood displacement, identifying potentially influential observations, with some stations exerting greater influence on the model fit. Panel (c) displays generalized Cook’s distance, highlighting the most influential points, with observation #32 being the most prominent.

Figure 7. Estimation of the 80th quantile for PM2.5 in Osorno in terms of SO₂.

In summary, the quantile regression results indicate that the impact of SO₂ becomes more pronounced under high-pollution scenarios, emphasizing the importance of jointly monitoring both pollutants. By focusing on the 80th percentile, the model provides insights into the factors driving the most severe episodes, precisely where public policy interventions can achieve the greatest effect.

7. Conclusions

This paper introduces the positively truncated Student’s-t distribution, a flexible model derived from the class of truncated distributions. It features separate parameters for scale, shape, and tail heaviness, allowing effective modeling of asymmetric and heavy-tailed data. Closed-form expressions for key analytical properties including the cumulative distribution, hazard function, quantiles, entropy, and moments are derived, which enables efficient random number generation via the inverse transform method. A quantile-based reparameterization is proposed to facilitate applications in quantile regression. Empirical analyses, both with and without covariates, demonstrate that the proposed distribution provides a superior fit and greater robustness than existing alternatives, particularly for high-kurtosis data. Future research could explore incorporating random effects and applying the model within a cure rate framework.

Author Contributions

Conceptualization, H.J.G., K.I.S. and D.I.G.; Methodology, H.J.G., K.I.S. and D.I.G.; Software, D.I.G. and T.M.M.; Formal analysis, K.I.S., D.I.G. and T.M.M.; Investigation, H.J.G., K.I.S. and P.E.L.; Writing—original draft, H.J.G., K.I.S., D.I.G., P.E.L. and T.M.M.; Writing—review & editing, H.J.G., K.I.S., D.I.G., P.E.L. and T.M.M.; Funding acquisition, H.J.G. and P.E.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the MINEDUC-UA project (code ANT 22991), during a research stay of K.I.S. at the Universidad Católica de Temuco.

Data Availability Statement

The data presented in this study are openly available in Sistema de Información Nacional de Calidad del Aire at https://sinca.mma.gob.cl/index.php (accessed on 22 September 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Castro, P.; Vera, J.; Cifuentes, L.; Wellenius, G.; Verdejo, H.; Sepúlveda, L.; Vukasovic, J.; Llevaneras, S. Polución por material particulado fino (PM 2.5) incrementa las hospitalizaciones por insuficiencia cardíaca. Rev. Chil. Cardiol. 2010, 29, 3.
Fernández, R.; Peña, R.; Bravo-Alvarado, J.; Maisey, K.R.; Reyes, E.P.; Ruiz-Plaza De Los Reyes, D.; Márquez-Reyes, R. Prevalence Distribution of Chronic Obstructive Pulmonary Disease (COPD) in the City of Osorno (Chile) in 2018, and Its Association with Fine Particulate Matter PM2.5 Air Pollution. Atmosphere 2024, 15, 482.
Busch, P.; Rocha, P.; Jin Lee, K.; Cifuentes, L.A.; Hui Tai, X. Short-term exposure to fine particulate pollution and elderly mortality in Chile. Commun. Earth Environ. 2024, 5, 469.
Matus, P.; Oyarzún, M. Impacto del material particulado aéreo (MP2,5) sobre las hospitalizaciones por enfermedades respiratorias en niños: Estudio de caso-control alterno. Rev. Chil. Pediatr. 2019, 90, 166–174.
Álvarez Escobar, B.; Castillo Farina, P.; Navarro-Riffo, J.; Muñoz Muñoz, C.; Boso Gaspar, A. Comportamientos de autoprotección frente a la contaminación del aire y factores psicosociales, Temuco, Chile. Rev. Int. Contam. Ambient 2022, 38, 11–26.
Yang, S.; Wu, H. A novel PM2.5 concentration probability density prediction model combines the least absolute shrinkage and selection operator with quantile regression. Nature 2022, 29, 78265–78291.
Wu, H.S.; Yu, X.L.; Wang, Q.L.; Zeng, Q.H.; Chen, Y.L.; Lv, J.Y.; Wu, Y.; Zhou, H.W.; Zhang, H.F.; Liu, M.; et al. Beyond the mean: Quantile regression to differentiate the distributional effects of ambient PM2.5 constituents on sperm quality among men. Chemosphere 2021, 285, 131496.
Cao, Q.L.; Rui, G.Q.; Liang, Y. Study on PM2.5 pollution and the mortality due to lung cancer in China based on a geographic weighted regression model. BMC Public Health 2018, 18, 925.
Cohen, C. Truncated and Censored Samples: Theory and Applications, 1st ed.; CRC Press: Boca Raton, FL, USA, 1991.
Johnson, N.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distribution; Wiley: New York, NY, USA, 1994; Volume 1.
Tallis, G.M. The Moment Generating Function of the Truncated Multi-Normal Distribution. J. R. Stat. Soc. Ser.-B-Methodol. 1961, 23, 223–229.
Gómez, H.J.; Santoro, K.I.; Ayma, D.; Cortés, I.E.; Gallardo, D.I.; Magalhães, T.M. A New Generalization of the Truncated Gumbel Distribution with Quantile Regression and Applications. Mathematics 2024, 12, 1762.
Gómez, H.J.; Santoro, K.I.; Barranco-Chamorro, I.; Venegas, O.; Gallardo, D.I.; Gómez, H.W. A Family of Truncated Positive Distributions. Mathematics 2023, 11, 4431.
Gallardo, D.I.; Gómez, H.J.; Gómez, Y.M. tpn: Truncated Positive Normal Model and Extensions. R Package Version 1.12. Available online: https://CRAN.R-project.org/package=tpn (accessed on 22 September 2025).
Olmos, N.M.; Osvaldo, V.; Gómez, Y.M.; Iriarte, Y.A. Confluent hypergeometric slashed-Rayleigh distribution: Properties, estimation and applications. J. Comput. Appl. Math. 2020, 328, 112548.
Reyes, J.; Iriarte, Y.A. A New Family of Modified Slash Distributions with Applications. Mathematics 2023, 11, 3018.
Gomez, Y.M.; Mateluna, D.I.G.; Castro, M.D. A regression model for positive data based on the slashed half-normal distribution. Revstat Stat. J. 2021, 19, 553–573.
Nourmohammad, E.; Rashidi, Y. Ground data analysis for PM2.5 Prediction using predictive modeling techniques. J. Air Pollut. Health 2025, 10, 61–82.
Wei, Q.; Chen, Y.; Zhang, H.; Jia, Z.; Yang, J.; Niu, B.; Xu, Z. PM2.5 concentration prediction using a whale optimization algorithm-based hybrid deep learning model in Beijing, China. Environ. Res. 2025, 371.
Wei, Q.; Chen, Y.; Zhang, H.; Jia, Z.; Yang, J.; Niu, B. Simulation and prediction of PM2.5 concentrations and analysis of driving factors using interpretable tree-based models in Shanghai, China. Environ. Res. 2025, 270.
DePriest, D.J. Using the singly truncated normal distribution to analyze satellite data. Commun. Stat.-Theo. Methods 1983, 12, 263–272.
Nadarajah, S.; Kotz, S. R Programs for Computing Truncated Distributions. J. Stat. Softw. 2006, 16, 1–8.
Psarakis, S.; Panaretoes, J. The folded t distribution. Commun. Stat.-Theory Methods 2007, 19, 2717–2734.
Leone, F.C.; Nelson, L.S.; Nottingham, R.B. The folded normal distribution. Technometrics 1961, 3, 543–550.
Kim, H.J. Moments of truncated Student-t distribution. J. Korean Stat. Soc. 2008, 37, 81–87.
MacGillivray, H.L.; Balanda, K.P. The Relationships Between Skewness and Kurtosis. Aust. J. Stat. 1988, 30, 319–337.
Moors, J.J. A quantile alternative for kurtosis. J. R. Stat. Soc. Ser. 1988, 37, 25–32.
Rolski, T.; Schmidli, H.; Schmidt, V.; Teugel, J. Stochastic Processes for Insurance and Finance; John Wiley & Sons: Hoboken, NJ, USA, 1999.
Klugman, S.A.; Panjer, H.H.; Willmot, G.E. Loss Models: From Data to Decisions, 4th ed.; Wiley: New York, NY, USA, 1998.
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021; Available online: https://www.R-project.org/ (accessed on 22 September 2025).
MacDonald, I.L. Does Newton-Raphson really fail? Stat. Methods Med. Res. 2014, 23, 308–311.
Pranab, K.S.; Singer, J.M.; de Lima, A.C.P. From Finite Sample to Asymptotic Methods in Statistics; Cambridge University Press: Cambridge, UK, 2010.
Lindsay, B.G.; Li, B. On second-order optimality of the observed Fisher information. Ann. Stat. 1997, 25, 2172–2199.
Lange, K.L.; Little, R.J.; Taylor, J. Robust Statistical Modeling Using the t Distribution. J. Am. Stat. Assoc. 1989, 83, 291–309.
Gómez, H.W.; Torres, F.J.; Bolfarine, H. Large-Sample Inference for the Epsilon-Skew-t Distribution. Commun. Stat. Methods. 2007, 36, 73–81.
Nakamura, A.; Nakatani, N.; Maruyama, F.; Noda, J. Characteristics of PM2.5 pollution in Osorno, Chile: Ion chromatography and meteorological data analyses. Atmosphere 2022, 13, 168.
Akaike, H. Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory; Petrov, B.N., Csáki, F., Eds.; Akadémiai Kiadó: Budapest, Hungary, 1973; pp. 267–281.

Figure 1. Pdf, cdf and hazard function for the TPT(σ, λ, ν) model with different combinations for ν.

Figure 2. Particular cases of TPT distribution.

Figure 3. Heat plots of the MacGillivray skewness coefficient and the Moors kurtosis coefficient for the TPT(σ = 1, λ, ν) model.

Table 1. MacGillivray skewness and Moors kurtosis coefficients for different values of ν.

$ν$	Skewness ( $\sqrt{β_{1}}$ )	Kurtosis ( $β_{2}$ )
0.1	0.9980806	1024.000048
0.3	0.8442262	10.081036
0.6	0.5866104	3.164537
1.0	0.4142136	2.000000
2.0	0.2710897	1.467730
5.0	0.1911341	1.267580
10.0	0.1669352	1.218203
30.0	0.1516583	1.189552
HN	0.1442924	1.176419

Table 2. Bias, SE, RMSE and 95% CP for the ML estimators in the TPT distribution.

				$n = 50$				$n = 100$				$n = 200$				$n = 500$
$σ$	$λ$	$ν$	Param.	Bias	SE	RMSE	CP	Bias	SE	RMSE	CP	Bias	SE	RMSE	CP	Bias	SE	RMSE	CP
1	0.5	2	$σ$	−0.012	0.337	0.234	0.919	0.010	0.228	0.181	0.931	0.008	0.156	0.133	0.941	0.004	0.095	0.080	0.953
			$λ$	0.009	0.611	0.438	0.975	−0.001	0.394	0.315	0.963	−0.010	0.268	0.236	0.951	0.001	0.162	0.137	0.965
			$ν$	0.369	1.666	1.121	0.925	0.221	0.912	0.682	0.938	0.128	0.566	0.449	0.957	0.043	0.321	0.268	0.956
		5	$σ$	−0.093	0.305	0.224	0.875	−0.026	0.238	0.163	0.928	−0.007	0.170	0.127	0.944	0.007	0.107	0.088	0.955
			$λ$	0.196	0.580	0.442	0.943	0.053	0.428	0.300	0.954	0.020	0.302	0.234	0.952	−0.007	0.189	0.164	0.952
			$ν$	0.195	7.609	3.105	0.823	0.621	5.366	2.716	0.923	0.606	3.572	2.252	0.904	0.472	1.956	1.603	0.925
		10	$σ$	−0.142	0.284	0.222	0.837	−0.085	0.217	0.159	0.894	−0.035	0.166	0.120	0.927	−0.011	0.108	0.080	0.942
			$λ$	0.264	0.570	0.453	0.920	0.137	0.415	0.307	0.933	0.063	0.305	0.226	0.946	0.019	0.195	0.151	0.949
			$ν$	−2.567	15.117	5.692	0.749	−1.461	12.827	5.223	0.795	0.138	11.748	5.000	0.865	0.695	7.271	4.120	0.899
	2	2	$σ$	0.000	0.221	0.176	0.945	0.011	0.155	0.133	0.936	−0.025	0.105	0.092	0.942	0.001	0.068	0.056	0.948
			$λ$	0.051	0.512	0.411	0.979	0.007	0.352	0.305	0.942	0.027	0.250	0.212	0.962	0.005	0.155	0.130	0.953
			$ν$	0.386	1.403	1.041	0.933	0.171	0.769	0.622	0.951	−0.095	0.413	0.323	0.921	0.037	0.293	0.239	0.964
		5	$σ$	−0.037	0.197	0.160	0.932	−0.001	0.137	0.118	0.943	−0.028	0.092	0.076	0.951	0.000	0.059	0.051	0.944
			$λ$	0.102	0.475	0.391	0.988	0.025	0.316	0.265	0.958	0.037	0.219	0.180	0.961	0.012	0.137	0.115	0.952
			$ν$	0.471	7.372	3.227	0.857	0.959	5.516	3.073	0.898	0.684	3.196	2.283	0.926	0.090	1.490	1.141	0.931
		10	$σ$	−0.066	0.185	0.147	0.926	−0.050	0.127	0.109	0.935	−0.029	0.089	0.072	0.936	−0.022	0.055	0.046	0.943
			$λ$	0.157	0.467	0.374	0.990	0.091	0.308	0.253	0.962	0.052	0.211	0.176	0.965	0.039	0.130	0.111	0.964
			$ν$	−1.943	15.722	5.696	0.788	−0.793	12.992	5.187	0.830	0.220	10.542	5.141	0.850	0.955	6.849	4.315	0.905
	5	2	$σ$	−0.006	0.209	0.175	0.944	0.008	0.146	0.123	0.951	0.007	0.102	0.089	0.943	0.000	0.064	0.055	0.941
			$λ$	0.141	1.106	0.901	0.951	0.011	0.740	0.619	0.954	−0.007	0.515	0.454	0.942	0.011	0.327	0.278	0.947
			$ν$	0.347	1.354	1.005	0.923	0.217	0.805	0.643	0.952	0.108	0.505	0.411	0.952	0.022	0.294	0.242	0.950
		5	$σ$	−0.032	0.184	0.143	0.937	−0.010	0.127	0.097	0.958	0.008	0.091	0.076	0.956	−0.001	0.056	0.047	0.956
			$λ$	0.238	1.022	0.761	0.978	0.083	0.667	0.495	0.969	−0.020	0.457	0.381	0.955	0.019	0.288	0.238	0.961
			$ν$	0.543	6.440	3.120	0.876	0.610	4.051	2.487	0.908	0.634	2.741	1.958	0.936	0.226	1.430	1.117	0.943
		10	$σ$	−0.068	0.174	0.132	0.934	−0.038	0.120	0.091	0.955	−0.013	0.084	0.064	0.954	−0.009	0.053	0.043	0.945
			$λ$	0.435	1.041	0.776	0.981	0.225	0.667	0.490	0.983	0.083	0.443	0.334	0.975	0.059	0.273	0.222	0.957
			$ν$	−1.926	13.034	5.464	0.753	−0.816	10.367	4.836	0.839	0.735	9.370	5.094	0.869	0.131	4.779	3.298	0.896
10	0.5	2	$σ$	−0.206	3.282	2.434	0.917	0.105	2.279	1.819	0.935	0.112	1.572	1.351	0.951	0.017	0.949	0.781	0.950
			$λ$	0.032	0.595	0.460	0.960	0.009	0.394	0.326	0.963	−0.018	0.269	0.224	0.973	0.004	0.161	0.136	0.958
			$ν$	0.380	1.606	1.137	0.917	0.246	0.940	0.748	0.932	0.161	0.584	0.494	0.959	0.036	0.318	0.251	0.961
		5	$σ$	−0.937	3.009	2.132	0.888	−0.377	2.345	1.714	0.907	−0.001	1.706	1.338	0.943	0.041	1.058	0.864	0.957
			$λ$	0.188	0.572	0.425	0.939	0.086	0.421	0.325	0.936	0.015	0.302	0.240	0.951	−0.001	0.188	0.158	0.964
			$ν$	−0.261	5.848	2.595	0.851	0.586	5.563	2.811	0.874	0.769	3.799	2.483	0.914	0.292	1.798	1.320	0.939
		10	$σ$	−1.416	2.774	2.159	0.860	−0.827	2.203	1.649	0.874	−0.425	1.628	1.207	0.923	−0.065	1.088	0.816	0.943
			$λ$	0.258	0.560	0.451	0.916	0.149	0.417	0.324	0.918	0.070	0.301	0.231	0.953	0.010	0.197	0.151	0.959
			$ν$	−3.324	11.342	5.394	0.734	−1.330	13.075	5.177	0.791	−0.251	10.391	4.695	0.845	0.902	7.594	4.408	0.897
	2	2	$σ$	−0.053	2.196	1.749	0.954	0.139	1.545	1.308	0.948	−0.025	1.070	0.894	0.943	−0.018	0.674	0.571	0.942
			$λ$	0.055	0.511	0.416	0.971	0.006	0.349	0.298	0.953	0.024	0.248	0.211	0.956	0.009	0.155	0.131	0.946
			$ν$	0.279	1.258	0.873	0.936	0.247	0.816	0.648	0.954	0.083	0.490	0.396	0.950	0.038	0.294	0.247	0.948
		5	$σ$	−0.581	1.907	1.532	0.934	−0.119	1.359	1.038	0.959	0.047	0.952	0.781	0.954	0.042	0.595	0.496	0.954
			$λ$	0.154	0.484	0.394	0.988	0.046	0.318	0.249	0.970	0.002	0.218	0.178	0.963	−0.005	0.136	0.112	0.954
			$ν$	−0.637	4.169	2.142	0.839	0.868	5.160	2.828	0.918	0.923	3.453	2.387	0.943	0.334	1.639	1.273	0.940
		10	$σ$	−0.909	1.797	1.588	0.876	−0.313	1.284	0.946	0.953	−0.118	0.896	0.680	0.963	−0.013	0.560	0.439	0.958
			$λ$	0.226	0.481	0.417	0.981	0.074	0.308	0.236	0.986	0.028	0.209	0.162	0.972	0.005	0.129	0.103	0.959
			$ν$	−4.217	7.292	5.314	0.700	−0.908	12.412	5.002	0.819	1.102	12.563	5.880	0.877	1.008	6.838	4.157	0.920
	5	2	$σ$	−0.222	2.035	1.711	0.942	0.100	1.459	1.257	0.949	0.064	1.023	0.836	0.952	0.016	0.643	0.538	0.945
			$λ$	0.204	1.108	0.911	0.955	−0.004	0.737	0.628	0.948	−0.009	0.517	0.430	0.949	0.003	0.327	0.270	0.952
			$ν$	0.168	1.126	0.788	0.923	0.223	0.815	0.677	0.931	0.091	0.500	0.407	0.948	0.026	0.296	0.250	0.938
		5	$σ$	−0.615	1.773	1.478	0.921	−0.038	1.294	1.042	0.955	0.008	0.903	0.756	0.955	0.016	0.566	0.494	0.944
			$λ$	0.398	1.046	0.819	0.978	0.050	0.669	0.531	0.965	0.016	0.461	0.384	0.950	0.000	0.287	0.250	0.944
			$ν$	−0.072	4.895	2.627	0.846	0.929	4.605	2.803	0.914	0.484	2.629	1.943	0.921	0.253	1.450	1.151	0.932
		10	$σ$	−0.997	1.669	1.472	0.901	−0.341	1.202	0.970	0.957	−0.175	0.832	0.625	0.962	−0.008	0.528	0.407	0.961
			$λ$	0.614	1.057	0.885	0.989	0.204	0.660	0.513	0.979	0.103	0.441	0.333	0.983	0.006	0.269	0.208	0.959
			$ν$	−3.348	8.215	5.223	0.727	−0.067	12.471	5.594	0.836	−0.337	7.155	4.064	0.867	1.284	5.946	4.049	0.916

Table 3. Summary statistics for SO₂ concentrations (μg/m³) across 39 monitoring stations in Chile on 20 June 2025.

n	Min	Median	Mean	Max	Variance	Skewness	Kurtosis
39	0.210	1.580	3.834	46.160	54.680	5.014	29.045

Table 4. Estimated parameters (with standard errors in parentheses) for the fitted models of SO₂.

Parameters	TPT (SE)	TPN (SE)	GG (SE)
$\hat{σ}$	0.440 (0.265)	174.877 (-)	0.870 (0.103)
$\hat{λ}$	2.720 (1.490)	−45.514 (-)	1.675 (0.317)
$\hat{ν}$	0.486 (0.183)	-	0.557 (0.197)
log-likelihood	−79.2	−91.4	−80.5
AIC	164.5	186.9	167.1
BIC	169.5	190.2	168.7

Table 5. Parameter estimates (with standard errors in parentheses) and confidence interval (CI) for the RTPT quantile regression model at $p = 0.8$.

	${\hat{β}}_{int}$	${\hat{β}}_{SO 2}$	$\hat{λ}$	$\hat{ν}$
Estimate	3.782 (0.186)	0.039 (0.021)	1.415 (0.596)	2.543 (1.929)
90% CI	(3.476; 4.088)	(0.004; 0.074)	(0.435; 2.395)	(0.730; 8.856)
95% CI	(3.417; 4.147)	(−0.002; 0.080)	(0.247; 2.583)	(0.575; 11.247)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
