The Exponential-Centred Skew-Normal Distribution

: Data from some research ﬁelds tend to exhibit a positive skew. For example, in experimental psychology, reaction times (RTs) are characterised as being positively skewed. However, it is not unlikely that RTs can take a normal or, even, a negative shape. While the Ex-Gaussian distribution is suitable to model positively skewed data, it cannot cope with negatively skewed data. This manuscript proposes a distribution that can deal with both negative and positive skews: the exponential-centred skew-normal (ECSN) distribution. The mathematical properties of the proposed distribution are reported, and it is featured in two non-synthetic datasets.


Introduction
Hohle [1] used the convolution between two random variables, exponential and Gaussian, to model simple reaction times (RTs) elicited via psychological experiments. The distributions resulting from such convolutions are known as exponential-Gaussian distributions or, simply, Ex-Gaussian (EG) distributions. Other distributions, though, have been proposed to model RT data. For example, it has been shown that the shifted Wald, shifted Birnbaum-Saunders [2] and shifted Gamma distributions [3] provide good fits to RT data (the Birnbaum-Saunders distribution has also been used to model neural activity [4], and a flexible version of it was proposed recently [5]. The slashed quasi-Gamma distribution (an extension of the generalised Gamma distribution) has been proposed to model data with varying skewness and high kurtosis [6]. To our knowledge, neither the flexible Birnbaum-Saunders nor the slashed quasi-Gamma distributions have been tested in the modelling of RT data; hence, this is a task for future research). Nonetheless, distributions of RTs resulting from psychological experiments are commonly adjusted with the EG distribution model. EG distributions were first introduced in an RT analysis carried out by McGill [7], who, when studying the RT cumulative distribution function as a result of different intensities of auditory stimuli, found that they had an exponential form with similar constant times. This led to the assumption that, at least, one component of the total RT had an exponential distribution, and since the time constants that the curves implied seemed to be almost independent of stimulus intensity, McGill [7] hypothesised that this component was the necessary time for the required motor response. Moreover, as the resulting distributions changed according to stimulus intensity, RTs are assumed to contain a non-exponential component that represents the decision time.
As part of a model designed to represent the disjunctive structure of RTs, Christie and Luce [8] assumed that the observed response time consisted of an exponentially distributed decision time plus a variable time or motor response time whose distribution was not specified. Hence, these two studies, by McGill [7] and Christie and Luce [8], offer opposite interpretations of the component with the exponential distribution.
Hohle [1] proposed a very simple hypothesis: the observed RT results from the sum or convolution of two independent random variables: (a) an exponentially distributed dominant variable with parameter τ, as proposed by McGill [7] and Christie and Luce [8], and (b) another variable with a normal distribution, N(µ, σ 2 ), which may be the sum of a number of component variables with similar variations.
Therefore, under this assumption, for Y 1 ∼ EXP(τ) and Y 2 ∼ N(µ, σ 2 ), where Y 1 and Y 2 are independent, it follows that the probability density function (PDF) of the random variable X = Y 1 + Y 2 resulting from the convolution of Y 1 and Y 2 is given by: This will be denoted by EG(τ, µ, σ 2 ). The cumulative distribution F EG (·), survival S EG (·), relative risk r EG (·) and hazard functions h EG (·) are given by: where S N (t) is the survival function of the normal distribution. Additionally, E(X) = τ + µ and Var(X) = τ 2 + σ 2 . and the asymmetry and kurtosis coefficients are given by: It can be then concluded that the EG model has positive asymmetry. Furthermore, when τ → 0, then E(X) → µ, Var(X) → σ 2 , skew → 0, and kurt → 3, which entails that: Although the EG distribution can be useful in modelling positively skewed data (e.g., RTs), in practice, it is not unlikely that data take other shapes (e.g., normal or negatively skewed). Hence, having more flexible distributions is key to model data appropriately. This article proposes a modified EG distribution in which the Gaussian component is replaced by a skew-normal (SN) distribution. Doing so gives origin to the exponential skew-normal distribution; a distribution capable of adapting to various data shapes. First, the statistical properties of this distribution are presented. Given singularity issues associated with the skew-normal distribution, a centred version of this distribution is then described. Next, the performance of the proposed distribution is illustrated via two published datasets. Finally, this distribution is considered in the bigger picture of data analyses and modelling.

Exponential Skew-Normal Distribution
The EG model exhibits positive asymmetry, because the exponential model is positive asymmetric since the second component of the convolution is symmetric around µ. In his proposal, Hohle [1] assumed that the second component of the convolution, from which the EG model results, follows a normal distribution that is located at µ and has a constant variation that represents the sum of a number of variables with similar variations that represents the necessary time for the required motor response, according to McGill's hypothesis [7], or the variable motor response time, according to Christie and Luce's [8].
It is assumed herein that the variable motor response time, or the necessary time for the motor response, follows a skew-normal distribution (see [9]) whose PDF is given by: where φ and Φ denote the standard normal density and cumulative distribution functions, respectively. The asymmetry is controlled by the parameter λ, and the notation Z ∼ SN(λ) is used herein; for λ = 0, it is the normal model. The asymmetry and kurtosis ranges of this distribution are (−0.9953, 0.9953) and [3, 3.8692), respectively (these values are extracted from [10]). The PDF of the location-scale model can be written as: with ξ ∈ R and η > 0 as the location-scale parameters. The notation SN(ξ, η, λ) is used herein. Then, for Y 1 ∼ exp(τ) and Y 2 ∼ SN(µ, σ, α), where Y 1 and Y 2 are independent, the PDF of the random variable X = Y 1 + Y 2 resulting from the convolution of Y 1 and Y 2 is given by: where δ = α √ 1+α 2 and: This distribution is denoted as ESN(τ, µ, σ, α). It can be observed that, if α = 0, then: that is, the EG distribution is obtained. Therefore, the exponential skew-normal (ESN) model contains the EG model as a special case, which entails that the ESN distribution is more flexible in terms of asymmetry than the EG model. The ESN convolution model can be written in terms of the cumulative distribution function of the extended skew-normal model (here E; see [10], p. 40) and setting λ 0 = δσ τ : where Φ E (x; λ 0 , δ) is given by . Figure 1 shows the ESN's density behaviour for certain parameter values; (a) and (b) present a comparison between the ESN and EG distributions (the EG ensues when ESN has α = 0); (c) illustrates the case of negatively skewed ESN distributions. The cumulative distribution function (CDF) of a random variable with an ESN distribution does not have a closed-form expression and should be calculated using a numerical approximation algorithm. It is given by: where: Moreover, the survival, hazard, and inverse risk functions are given by:

Moments
The moments of the ESN distribution can be obtained directly from the stochastic representation of the random variable X = X 1 + X 2 ∼ ESN(τ, µ, σ, α), where X 1 ∼ Exp(τ) and X 2 ∼ SN(µ, σ, α) are independent random variables. Thus, the r th moment of the random variable X = X 1 + X 2 is given by: Given that E(X r−j 1 ) = (r − j)!, the even moments of the SN distribution match those of the normal distribution and by expressing the odd moments of the SN distribution (see [11]), it results that: where: Hence, from the central moments given by E(X − E(X)) r , the expected value and variance of the ESN random variable are: Grushka [12] found that the asymmetry and kurtosis coefficients of the EG random variable depended on the scale parameters of the convolution distributions, i.e., τ and σ. As for the ESN model, the asymmetry (skew) and kurtosis (kurt) coefficients are given by the following expressions: where δ = α √ 1+α 2 and sgn(·) denotes the sign function. Therefore, in line with Grushka's findings [12], (skew) and (kurt)depend on τ σ and the asymmetry parameter α. A (0, ∞) × (−1000, 1000) grid is constructed for τ σ ∈ (0, ∞) and α ∈ (−1000, 1000) values and skew values within (−0.995267, 1.99999); these values are obtained numerically using R statistical software.
Therefore, the range of asymmetry of the ESN model is (−0.9953, 2], which makes it more flexible than the EG model and other asymmetric models such as the SN model [9] and the power-normal model [13,14]. Similarly, for the same range of τ σ and α values, the kurtosis coefficient is within the (0.006039, 8.99999) range (these values were obtained numerically using the Rstatistical software). Moreover, if τ σ → ∞; α → ±∞, then kurt → 9; while if τ σ → 0; α → ±∞, then kurt → 0.8692, and if τ σ → 0; α → 0, then kurt → 0. Therefore, the kurtosis range of the ESN model makes the ESN more flexible than asymmetric models, including the SN model [9], the power-normal model [13,14] and the SN Alpha-power model [15]. The moment-and characteristic-generating functions of the ESN model are given by: and: Note that, for a ∈ R and b > 0, . In other words, the ESN distribution is closed for the linear combination of the form a + bX.

Exponential-Centred Sn Distribution
The skew-normal model is a suitable alternative for datasets with a significant degree of asymmetry; however, its drawback is that it has a singular information matrix if its asymmetry parameter equals zero, i.e., α = 0 [9]. Regarding this case (in which α is in the proximity of zero), some authors have proposed a reparameterization [16]. Another option is to use the so-called centred skew-normal distribution [9], which is obtained as shown below.
let the Y random variable be defined as: For this random variable, it follows that: Given that the random variable Y is a linear combination of Z ∼ SN(α), Y also follows an SN distribution, and after some calculations, it results that: where: with c = {2/(4 − π)} 1/3 and −0.9953 < γ 1 < 0.9953 (these values are provided in [9]). This is known as the centred SN (CSN) distribution and is denoted by CSN(ξ, η, γ 1 ).
Ref. [9] showed that the information matrix of the CSN model can be written as: where I(θ 1 ) is the information matrix of the SN model; D the matrix of the derivatives of the parameter vector θ 1 = (µ, σ, γ 1 ) with respect to the parameter vector θ = (ξ, η, λ); [9] showed that I(θ) is a non-singular matrix.
Note that this centred representation of the SN model reduces the space range of the asymmetry parameter since γ 1 ∈ (−0.9953; 0.9953) (see [17]).
The observed information matrix of the parameter vector θ 1 = (τ, µ, σ, α) is defined as , minus the second derivative of the log-likelihood function with respect to the parameters, provided in Appendix B.
The information matrix of the ECSN model can be obtained as the relationship in Equation (13) using an appropriate matrix B = (∂θ 1 /∂θ) 4×4 .
Furthermore, the observed information matrix (Λ(θ)) and Fisher information matrix (K(θ)) of this model, for the parameter vector θ = (τ, ξ, η, γ 1 ), can be written in terms of the observed and expected information matrices of the ESN model, of the parameter vector θ 1 = (τ, µ, σ, α), in the form: where: The elements of this matrix D 3×3 can be found in their general form in Azzalini and Capitanio ( [17], p. 68). Note that when γ 1 → 0, the inferior sub-matrix of size 3 × 3 is nonsingular and converges to a diagonal matrix diag(σ 2 , σ 2 /2, 6). This event entails that the matrix K(θ) is nonsingular for 0 < τ < ∞. Therefore, in regular conditions and by applying the asymptotic theory to the maximum likelihood estimator vector of the parameter vector θ, Λ(θ) → K(θ). Then, using the same stochastic convergence argument, it follows thatθ converges to a normal distribution; that is: Hence, for the ECSN convolution model, the elements in the diagonal of ∆ −1 (θ) represent the estimates of the variances ofτ,ξ,η andγ 1 . The 100 × (1 − β)% confidence intervals can be estimated in the conventional θ j ∓ Z 1−β/2 σ jj form, where σ jj represents the standard error of the parameter θ j .

Reaction Times
Ref. [18] investigated the sensorimotor processing of words by using a masked priming (MP) task. The MP is a task commonly used in experimental psychology and in which a prime word (e.g., below) is "sandwiched" for a short time (e.g., 35 ms) between a preceding masking pattern (or forward mask, e.g., LKPFH; shown for, say, 200 ms) and a proceeding target word (e.g., above; shown until a response is made). In some variations, there can be a masking pattern between the prime and the target word (called backward mask), and this was the case in Ansorge et al.'s study (such a pattern was shown for approximately 35 ms). Traditionally, there are congruent and incongruent trials; in the former case, there is a match between the response of the participant and a correct or expected response, while in the latter, there is a mismatch between those responses. The time the participant takes in responding is known as the reaction time (RT) and is measured in milliseconds.
Twelve females and 12 males participated in an MP task, and their RTs were collected for both congruent and incongruent trials (see Experiment 2a). The descriptive statistics for the distribution of mean RTs during incongruent trials were as follows:X = 890.1149, s = 98.6875, b 1 = −0.5326, and b 2 = 4.1798. These statistics indicated that the distribution of RTs was negatively skewed; hence, an EG distribution would not provide an appropriate fit as its asymmetry is always positive. When the data were fitted with the ECSN, the following estimates were obtained:τ = 0.2927(0.00002), µ = 977.1546(0.0058),σ = 130.3170(0.0058) andα = −1.8182(0.0041). Considering the centred representation of the ECSN model, the estimated parameters were as follows:τ = 0.2927(0.00002), ξ = 886.0465(0.1350),ν = 93.1774(0.1319) andγ = −0.4012(0.0034). Therefore, the ECSN gave an optimal fit, and it was further corroborated via the Kolmogorov-Smirnov (D) test; D = 0.16667, p-value = 0.9024 (in this work, the D test is used to assess the suitability of a given statistical distribution to a dataset. However, other tests exist; e.g., Cramér-von Mises, Anderson-Darling, Shapiro-Wilk, χ 2 , Hosmer-Lemeshow, Kuiper, Kernelized Stein discrepancy, Moran, and Zhang's Z K , Z C and Z A tests). Figure 2 displays the distribution of mean RTs and the ECSN distribution's fits.
The following example is provided to illustrate that negatively skewed data can emerge in situations other than RTs. For example, in psychometrics, some rating scales can show a negative skew.

The Flourishing Scale
Ref. [19] validated the Flourishing Scale (FS) in a sample of adults (n = 275) with low and moderate levels of well-being. The FS rating scale is an eight item measure of social-psychological functioning that tackles aspects such as relationships, self-esteem, purpose and optimism.
The authors aimed to understand the distribution of the total Flourishing scores (X = TFS, X ∈ Z + ); a global measure of the core aspects of individuals' optimal social and psychological functioning. The descriptive statistics of the TFS showed an average ofX = 41.44, a standard deviation of s = 6.48, a skewness of skew = −1.4394 and a kurtosis of kurt = 5.8725. The negative skewness suggested that the EG model was not the most suitable option because its range of asymmetry was positive. In addition, the high kurtosis left little room for models such as the SN, whose ranges of kurtosis and asymmetry were [3, 3.86) and (−0.9953, 0.9953), respectively.
The EG, the skew-normal Alpha-power (SNAP) and the ECSN models were fitted to the data. The exponential representation τ exp(−τx) was used for those models. The four parameter model SN AP(µ, σ, λ, α), studied by [15], is more flexible than its SN and power-normal (PN) counterparts because when α = 1, the SN distribution was obtained, and when λ = 0, the PN distribution was obtained; a normal distribution ensues when λ = 0 and α = 1. The SNAP's ranges of asymmetry and kurtosis were [−1.4676, 0.9953) and [1.4672, 5.4386], respectively [15], which made it a good fit to the TFS data.
These models were compared using the Akaike information criterion (AIC) [20] and the Bayesian information criterion (BIC) of Schwarz [21], defined as: where p is the number of parameters of the vector θ of the specific model. Based on the results of the AIC and the BIC, the most suitable model for the TFS data was the ECSN model (see Table 1). Figure 3 further shows the good fit of the ECSN model. Figure 4 shows the low degree of fit of the EG (a) and the SNAP (b) models; but it is evident that the EG gave a better fit than SNAP (see also the AIC and BIC values in Table 1).
A null hypothesis test of no difference between the ECSN model and the EG model can be expressed as, H 01 : α = 0 versus H 11 : α = 0.
By using the likelihood-ratio test (models are nested), it is found that: −2 log(Λ 1 ) = −2(−903.665 + 865.955) = 75.4198, This was a value larger than the critical 5% chi-squared value, i.e., χ 2 1,95% = 3.8414. Therefore, the null hypothesis was rejected, and it could be concluded that the ECSN model-which involved one extra parameter, making it more flexible in terms of asymmetry-fit the data better than the EG model (compare Figure 3b and Figure 4a).    Furthermore, the D test was used to determine the best fit for the TFS data. The results were as follows: for the EG model, D = 0.200 (p-value = 0.000); for the SNAP model, D = 0.2290 (p-value = 0.000); and for the ECSN model, D = 0.0981 (p-value = 0.1411). The results thus corroborated that the ECSN model was suitable for the TFS data.

Discussion
Data can take different distributional shapes, but some shapes are more common in certain research fields (for more information about other distributions and their applications, see the Special Issues in Volume 11 Issue 11 (year 2019) and Volume 12 Issue 5 (year 2020) in this journal). RTs, for example, are a ubiquitous metric in experimental psychology, and it is well known that their distribution is positively skewed. However, that is not necessarily always the case, and normally or negatively shaped RTs can ensue. In those cases, the EG distribution, a distribution traditionally used to model RT data, is not able to fit the data well. In this article, a distribution capable of fitting traditional RT data and RT data that deviate from a positive skew was proposed.
It is important, however, to frame the proposed distribution within the larger field of data analysis and modelling. While most research across scientific fields tends to focus on average differences, not much attention is paid to investigate group differences in variation. Indeed, tests designed to account for both location and scale (e.g., the Cucconi test) are rarely used. Adopting a distributional modelling approach enables tackling data differently by considering, at a minimum, the data's location (e.g., mean) and scale (e.g., SD). To our knowledge, the only existing statistical approach able to model data in a proper distributional manner is the generalised additive models for location, scale and shape (GAMLSS).
To conclude, the ECSN distribution was proposed as a new statistical distribution suitable to adjust the shape of RT data. As the ECSN was a four parameter distribution, it could be used to model RT data's location, scale and shape. It is our hope to see this distribution implemented in the GAMLSS R package so that it is put to the test in research contexts. The necessary Rcodes and datasets to reproduce the estimations and plots reported here can be found at https://figshare.com/projects/ Two_new_distributions_for_the_modelling_of_RT_data/81743.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Score Function
The elements of the score function, obtained from (16), are given by: where: Therefore, for θ j = τ, µ, σ, α, the following is obtained: