Some New Facts about the Unit-Rayleigh Distribution with Applications

Abstract: The unit-Rayleigh distribution is a one-parameter distribution with support on the unit interval. It is defined as the so-called unit-Weibull distribution with a shape parameter equal to two. As a particular case among others, it seems that it has not been given special attention. This paper shows that the unit-Rayleigh distribution is much more interesting than it might seem at first glance, revealing closed-form expressions of important functions, and new desirable properties for application purposes. More precisely, on the theoretical level, we contribute to the following aspects: (i) we bring new characteristics on the form analysis of its main probabilistic and reliability functions, and show that the possible mode has a simple analytical expression, (ii) we prove new stochastic ordering results, (iii) we expose closed-form expressions of the incomplete and probability weighted moments at the basis of various probability functions and measures, (iv) we investigate distributional properties of the order statistics, (v) we show that the reliability coefficient can have a simple ratio expression, (vi) we provide a tractable expansion for the Tsallis entropy and (vii) we propose some bivariate unit-Rayleigh distributions. On a practical level, we show that the maximum likelihood estimate has a quite simple closed form. Three data sets are analyzed and adjusted, revealing that the unit-Rayleigh distribution can be a better alternative to standard one-parameter unit distributions, such as the one-parameter Kumaraswamy, Topp–Leone, one-parameter beta, power and transmuted distributions.


Introduction
In many applied scenarios, we are often confronted with the uncertainty of a phenomenon that can be quantified in a bounded range of values. For the sake of accuracy, proper modeling should take this information into account. As an immediate example, it is natural to model characteristics of the

[Figure: schematic titled "Usefulness of the unit-distributions". Box labels: unit-Rayleigh (UR) distribution, an unexplored special case of the unit-Weibull distribution; new motivations and interests of the UR distribution for application purposes; revelations on the flexibility of the UR pdf and hrf, also supported by a graphical analysis; closed-form expressions of many crucial functions and measures; applications to practical data sets, with the conclusion that the UR model may have a much better fit than well-known competitors.]

Corresponding Functions
As the basis, the unit-Rayleigh distribution is associated with the cdf given as
$$F(x) = \exp\left(-\beta [\log(x)]^2\right), \quad x \in (0, 1], \qquad (1)$$
where $\beta > 0$, and $F(x) = 0$ for $x \le 0$ and $F(x) = 1$ for $x > 1$. Thus defined, it is a special case of the unit-Weibull distribution introduced by [15] with shape parameter equal to 2. By construction, the unit-Rayleigh distribution is the distribution of the rv $\exp(-Y)$, where $Y$ denotes an rv following the Rayleigh distribution with scale parameter $\beta$, i.e., with cdf $F_Y(x) = 1 - \exp(-\beta x^2)$, with $x > 0$, and $F_Y(x) = 0$ otherwise. Up to a rescaling, the Rayleigh distribution also corresponds to the chi distribution with two degrees of freedom. The basics and properties of this distribution can be found in [23]. As a new fact, the unit-Rayleigh distribution is also the distribution of the rv $1/Z$, where $Z$ denotes an rv following the Benini distribution with truncation parameter equal to 1, i.e., with cdf $F_Z(x) = 1 - \exp\left(-\beta [\log(x)]^2\right)$, with $x > 1$, and $F_Z(x) = 0$ otherwise. The Benini distribution is a long-tailed distribution that can be viewed as a generalization of the Pareto distribution. We may refer to the former work of [24].
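Based on $F(x)$, the pdf of the unit-Rayleigh distribution is given as
$$f(x) = \frac{2\beta}{x}\,[-\log(x)]\,\exp\left(-\beta [\log(x)]^2\right), \quad x \in (0, 1), \qquad (2)$$
and $f(x) = 0$ for $x \notin (0, 1)$. The shape properties of this function are fundamental to evaluate the capability of the unit-Rayleigh model to fit data. This aspect will be discussed later. The survival function is obtained as
$$\bar{F}(x) = 1 - \exp\left(-\beta [\log(x)]^2\right), \quad x \in (0, 1],$$
and $\bar{F}(x) = 1$ for $x \le 0$ and $\bar{F}(x) = 0$ for $x > 1$, the cumulative hrf is specified as
$$H(x) = -\log\left[1 - \exp\left(-\beta [\log(x)]^2\right)\right], \quad x \in (0, 1],$$
and $H(x) = 0$ for $x \le 0$ and $H(x) = +\infty$ for $x > 1$, and the hrf is given as
$$h(x) = \frac{2\beta\,[-\log(x)]\,\exp\left(-\beta [\log(x)]^2\right)}{x\left[1 - \exp\left(-\beta [\log(x)]^2\right)\right]}, \quad x \in (0, 1), \qquad (3)$$
and $h(x) = 0$ for $x \notin (0, 1)$. The shape properties of the hrf are valuable indicators of some features of the unit-Rayleigh model. This point will be discussed later. We end this part by specifying the quantile function of the unit-Rayleigh distribution, obtained as
$$Q(y) = \exp\left(-\left[-\frac{1}{\beta}\log(y)\right]^{1/2}\right), \quad y \in (0, 1). \qquad (4)$$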
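The functions of this section translate directly into code. The following is a minimal sketch, assuming the forms $F(x) = \exp(-\beta[\log(x)]^2)$ and $Q(y) = \exp(-[-\log(y)/\beta]^{1/2})$ implied by the construction $X = \exp(-Y)$; the function names (`ur_cdf`, `ur_pdf`, `ur_hrf`, `ur_quantile`) are illustrative and do not come from any package.

```python
import math

def ur_cdf(x, beta):
    """Cdf of the unit-Rayleigh distribution: exp(-beta * log(x)^2) on (0, 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return math.exp(-beta * math.log(x) ** 2)

def ur_pdf(x, beta):
    """Pdf: derivative of the cdf, zero outside (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    lx = math.log(x)
    return -2.0 * beta * lx / x * math.exp(-beta * lx ** 2)

def ur_hrf(x, beta):
    """Hazard rate function: pdf divided by the survival function."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return ur_pdf(x, beta) / (1.0 - ur_cdf(x, beta))

def ur_quantile(y, beta):
    """Quantile function for y in (0, 1): inverse of the cdf."""
    return math.exp(-math.sqrt(-math.log(y) / beta))

if __name__ == "__main__":
    beta = 1.5
    for y in (0.1, 0.5, 0.9):
        x = ur_quantile(y, beta)
        print(y, x, ur_cdf(x, beta), ur_pdf(x, beta), ur_hrf(x, beta))
```

The quantile function gives inverse-transform sampling for free: applying `ur_quantile` to uniform random numbers on (0, 1) yields unit-Rayleigh variates.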

Analysis of the cdf
Basically, $F(x)$ is an increasing and differentiable function with respect to $x$ for $x \in (0, 1)$. We can express it as a function of the power distribution cdf $x^{\beta}$ as
$$F(x) = \left(x^{\beta}\right)^{-\log(x)}.$$
The following inequalities can be deduced: for any $x \in (0, \exp(-1))$, we have $-\log(x) > 1$ and hence $F(x) \le x^{\beta}$, and for $x \in (\exp(-1), 1)$, the reverse inequality holds: $F(x) \ge x^{\beta}$. In addition, we can remark that $F(x)$ is a decreasing function with respect to $\beta$; by setting $F(x; \beta) = F(x)$, for any $\beta_2 \ge \beta_1$, we have
$$F(x; \beta_2) \le F(x; \beta_1).$$
This inequality reveals a basic first-order stochastic dominance of the unit-Rayleigh distribution.
Moreover, one can note that, for any $x \in (0, 1)$,
$$\frac{\partial^2}{\partial \beta^2} F(x; \beta) = [\log(x)]^4 \exp\left(-\beta [\log(x)]^2\right) > 0,$$
implying that $F(x; \beta)$ is a convex function with respect to $\beta$.

Analysis of the pdf
In this section, we analyze f(x) as described in (2), including an analysis of its mode(s). Such a global analysis has been performed in [16] for the unit-Weibull distribution, in full generality. Here, we provide more specific details on this aspect for the unit-Rayleigh distribution, including the expression of the mode and its behavior as β varies.
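As a preliminary remark, let us note that, for any $\beta > 0$,
$$\lim_{x \to 0} f(x) = \lim_{x \to 1} f(x) = 0.$$
Since $f(x)$ is positive on $(0, 1)$, it is not monotonic; the points 0 and 1 are not modes of $f(x)$. Now, for $x \in (0, 1)$, we have
$$\frac{d f(x)}{dx} = \frac{2\beta}{x^2} \exp\left(-\beta [\log(x)]^2\right) \left\{ 2\beta [\log(x)]^2 + \log(x) - 1 \right\}.$$
Therefore, a critical point for $f(x)$, say $x_0$, satisfies $x_0 \in (0, 1)$ and $2\beta [\log(x_0)]^2 + \log(x_0) - 1 = 0$. After developments, we get
$$x_0 = \exp\left(-\frac{1 + \sqrt{8\beta + 1}}{4\beta}\right). \qquad (5)$$
Let us now study the nature of this critical point. Differentiating once more and evaluating at $x_0$, where the term in braces vanishes, we have
$$\left.\frac{d^2 f(x)}{dx^2}\right|_{x = x_0} = \frac{2\beta}{x_0^3} \exp\left(-\beta [\log(x_0)]^2\right) \eta, \quad \eta = 4\beta \log(x_0) + 1.$$
After developments, we obtain $\eta = -\sqrt{8\beta + 1} < 0$. We conclude that the point $x_0$ as defined by (5) is a maximum for the function $f(x)$; it is the (unique) mode of the unit-Rayleigh distribution. Therefore, the pdf of the unit-Rayleigh distribution is unimodal, "more or less bell-shaped".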
Let us now discuss the behavior of this mode. By setting $x_0(\beta) = x_0$, we have
$$\frac{d x_0(\beta)}{d\beta} = x_0(\beta)\, \frac{1 + 4\beta + \sqrt{8\beta + 1}}{4\beta^2 \sqrt{8\beta + 1}} > 0,$$
implying that $x_0(\beta)$ is an increasing function with respect to $\beta$. Furthermore, we have
$$\lim_{\beta \to 0} x_0(\beta) = 0, \quad \lim_{\beta \to +\infty} x_0(\beta) = 1,$$
meaning that the mode can take all the values of the interval $(0, 1)$. This result indicates a certain flexibility of the unit-Rayleigh distribution regarding its mode. A graphical illustration of the possible shapes of $f(x)$ is provided in Figure 2, considering the following values for $\beta$: 0.1, 0.5, 0.8, 1.5 and 4. We see in Figure 2 the "more or less bell shape" of the pdf, with a mode that increases with $\beta$. Note that the black curve, corresponding to the pdf defined with $\beta = 0.1$, is "highly spiked"; its increasing part does not appear in the figure because it is too sharp. In a sense, this graphical analysis completes the one performed in ([15] Figure 1, last subfigure).
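As a quick numerical illustration of the mode analysis, the sketch below evaluates the closed-form mode obtained by solving $2\beta[\log(x_0)]^2 + \log(x_0) - 1 = 0$ for the negative root of $\log(x_0)$, namely $x_0 = \exp(-(1 + \sqrt{8\beta + 1})/(4\beta))$, for the values of β used in Figure 2. The helper names are illustrative only.

```python
import math

def ur_pdf(x, beta):
    """Unit-Rayleigh pdf on (0, 1)."""
    lx = math.log(x)
    return -2.0 * beta * lx / x * math.exp(-beta * lx ** 2)

def ur_mode(beta):
    """Mode: negative root in log(x) of 2*beta*log(x)^2 + log(x) - 1 = 0."""
    return math.exp(-(1.0 + math.sqrt(8.0 * beta + 1.0)) / (4.0 * beta))

def critical_equation(x, beta):
    """Left-hand side of the critical point equation; zero at the mode."""
    lx = math.log(x)
    return 2.0 * beta * lx ** 2 + lx - 1.0

betas = [0.1, 0.5, 0.8, 1.5, 4.0]
modes = [ur_mode(b) for b in betas]

if __name__ == "__main__":
    for b, m in zip(betas, modes):
        print(f"beta = {b}: mode = {m:.5f}, pdf at mode = {ur_pdf(m, b):.4f}")
```

The printed modes increase with β, and the very small mode obtained for β = 0.1 is consistent with the "highly spiked" curve described above.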

Analysis of the hrf
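In this section, we analyze $h(x)$ as specified in (3). As far as we know, this aspect has been explored only graphically in ([15] Figure 2, last subfigure). Some new facts are discussed below. Firstly, for any $\beta > 0$, we have $\lim_{x \to 0} h(x) = 0$ and $\lim_{x \to 1} h(x) = +\infty$. Since $h(x)$ is positive, the point 0 is a minimum for $h(x)$. Now, for $x \in (0, 1)$, we have
$$\frac{d h(x)}{dx} = \frac{2\beta \exp\left(-\beta [\log(x)]^2\right)}{x^2 \left[1 - \exp\left(-\beta [\log(x)]^2\right)\right]^2}\, w(x), \quad w(x) = 2\beta [\log(x)]^2 - [1 - \log(x)]\left[1 - \exp\left(-\beta [\log(x)]^2\right)\right],$$
$w(x)$ being the difference of two positive functions. Since the factor multiplying $w(x)$ is positive, a critical point for $h(x)$, say $x_1$, satisfies $x_1 \in (0, 1)$ and $w(x_1) = 0$. This equation is not easy to study analytically. As a result, we propose some alternative arguments showing that the hrf can have various forms.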
Firstly, due to the presence of β as a factor of the first term, by dominance, for any x ∈ (0, 1), we have lim β→+∞ w(x) = +∞. This implies the existence of a β * such that, for β > β * , we have w(x) > 0 and, a fortiori, dh(x)/dx > 0. Hence, for β > β * , h(x) is an increasing function with respect to x. Now, assume that β is small, say β → 0. Then, standard equivalences give w(x) ∼ β [log(x)] 2 (1 + log(x)), and the equivalent function is equal to 0 if x = exp(−1) ∈ (0, 1). Therefore, in the case where β is small enough, at least one critical point exists. This new fact reveals that the hrf is not only an increasing function, as one might think at first sight of ( [15] Figure 2, last subfigure); nonmonotonic shapes are possible for "small or not too small" β.
We illustrate this new fact by some plots of h(x) in Figure 3, considering the following values for β: 0.1, 0.18, 0.3, 0.5 and 1.5. We see in Figure 3 that the hrf can be increasing with convex and concave properties. For the black line corresponding to the hrf defined with β = 0.1, a bathtub shape is observed. These observations confirm the flexible hazard rate of the unit-Rayleigh distribution.
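The shape claims for the hrf can be checked numerically without solving $w(x) = 0$. The sketch below evaluates $h(x) = f(x)/[1 - F(x)]$ on a grid and tests monotonicity; the grid $[0.05, 0.95]$ and the helper `is_increasing` are choices made here for illustration, so the result only describes the behavior on that grid.

```python
import math

def ur_hrf(x, beta):
    """Unit-Rayleigh hazard rate function on (0, 1)."""
    lx = math.log(x)
    cdf = math.exp(-beta * lx ** 2)
    pdf = -2.0 * beta * lx / x * cdf
    return pdf / (1.0 - cdf)

def is_increasing(values):
    """True when each value is strictly larger than the previous one."""
    return all(b > a for a, b in zip(values, values[1:]))

# Grid kept away from 0 and 1, where h(x) tends to 0 and +infinity.
grid = [i / 100 for i in range(5, 96)]
shape = {
    beta: is_increasing([ur_hrf(x, beta) for x in grid])
    for beta in (0.1, 0.18, 0.3, 0.5, 1.5)
}

if __name__ == "__main__":
    for beta, increasing in shape.items():
        label = "increasing" if increasing else "nonmonotonic"
        print(f"beta = {beta}: {label} on [0.05, 0.95]")
```

For β = 0.1 the hrf decreases and then increases again over the grid, which is the bathtub pattern seen in the plot, whereas for β = 1.5 it is increasing throughout.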

New Results
More mathematical results are developed in this section, all new.

Stochastic Order Results
We have already presented some stochastic order results involving the cdfs of the unit-Rayleigh and power distributions. More technical ones are described in the result below. Proposition 1. The following inequality holds: both being cdfs of unit-distributions.
Proof. The inequalities are immediate for x ≤ 0 and x ≥ 1. For x ∈ (0, 1), we can express F(x) as The desired result is a consequence of the following well-known inequalities: This concludes the proof of Proposition 1.
As far as we know, the cdfs F o (x) and F oo (x) described in Proposition 1 are not listed in the literature. They can be of independent interest for purposes out of the scope of this paper (modelling of proportion-type characteristics, constructions of new general families of continuous distributions, etc.).

Incomplete Moments
The incomplete moments of the unit-Rayleigh remain unexplored. We now fill this gap by providing their analytical expressions via comprehensive functions.

Proposition 2.
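Let $r$ be a nonnegative integer and $X$ be a rv following the unit-Rayleigh distribution. Then, the $r$-th incomplete moment of $X$ at $t \in (0, 1)$, defined by $m_r(t) = E\left[X^r I(X \le t)\right]$, is given as
$$m_r(t) = t^r \exp\left(-\beta [\log(t)]^2\right) - \frac{r}{2} \sqrt{\frac{\pi}{\beta}} \exp\left(\frac{r^2}{4\beta}\right) \mathrm{erfc}\left(-\sqrt{\beta} \log(t) + \frac{r}{2\sqrt{\beta}}\right),$$
where $I(A)$ denotes the indicator function over an event $A$ and $\mathrm{erfc}(a)$ is the complementary error function, $\mathrm{erfc}(a) = (2/\sqrt{\pi}) \int_a^{+\infty} \exp(-t^2)\, dt$.
Proof. We recall that the unit-Rayleigh distribution corresponds to the one of the rv $\exp(-Y)$, where $Y$ is a rv following the Rayleigh distribution with scale parameter $\beta$, i.e., with pdf $f_Y(x) = 2\beta x \exp(-\beta x^2)$, with $x > 0$, and $f_Y(x) = 0$ otherwise. Therefore, by an integration by parts, we have
$$m_r(t) = \int_{-\log(t)}^{+\infty} \exp(-rx)\, 2\beta x \exp(-\beta x^2)\, dx = t^r \exp\left(-\beta [\log(t)]^2\right) - r \int_{-\log(t)}^{+\infty} \exp\left(-\beta x^2 - r x\right) dx.$$
By applying the change of variable $y = \sqrt{\beta}\, x + r/(2\sqrt{\beta})$, that is $x = [y - r/(2\sqrt{\beta})]/\sqrt{\beta}$, and performing some calculus, we obtain
$$\int_{-\log(t)}^{+\infty} \exp\left(-\beta x^2 - r x\right) dx = \frac{1}{2} \sqrt{\frac{\pi}{\beta}} \exp\left(\frac{r^2}{4\beta}\right) \mathrm{erfc}\left(-\sqrt{\beta} \log(t) + \frac{r}{2\sqrt{\beta}}\right).$$
The result of Proposition 2 is obtained.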
From Proposition 2, by taking $r = 0$, we obtain $m_0(t) = F(t) = \exp\left(-\beta [\log(t)]^2\right)$ with $t \in (0, 1)$. The $r$-th raw moment of $X$ can be derived as
$$m_r = \lim_{t \to 1} m_r(t) = 1 - \frac{r}{2} \sqrt{\frac{\pi}{\beta}} \exp\left(\frac{r^2}{4\beta}\right) \mathrm{erfc}\left(\frac{r}{2\sqrt{\beta}}\right).$$
We thus rediscover the formula in ([15] Subsection 2.2). In addition, the incomplete moments of $X$ allow us to define a wide range of measures and functions involving the unit-Rayleigh distribution, such as mean deviations, mean residual life function, variance residual life function, reversed mean residual life function, Zenga curve, and so on. The complete list can be found in the book of [25], among others.
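For approximation purposes of the incomplete moments of $X$, for any $a \in \mathbb{R}$, we can use the well-known series expression and approximation of the function $\mathrm{erfc}(a)$ given as
$$\mathrm{erfc}(a) = 1 - \frac{2}{\sqrt{\pi}} \sum_{k=0}^{+\infty} \frac{(-1)^k a^{2k+1}}{k!\,(2k+1)} \approx 1 - \frac{2}{\sqrt{\pi}} \sum_{k=0}^{J} \frac{(-1)^k a^{2k+1}}{k!\,(2k+1)},$$
where $J$ denotes a large integer. Let us mention that some simpler approximations of $\mathrm{erfc}(a)$ exist, under the assumption that $a$ is "small enough" or "large enough" (see [26]). On the other hand, $\mathrm{erfc}(a)$ is implemented in all modern mathematical software, making the computation of $m_r(t)$ straightforward.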
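Since erfc is available in standard libraries, the incomplete moments are easy to evaluate. The sketch below is an illustration under a stated assumption: the closed form coded in `ur_incomplete_moment` is the one obtained here by integration by parts from $m_r(t) = E[X^r I(X \le t)]$ with $X = \exp(-Y)$, namely $t^r \exp(-\beta[\log t]^2) - (r/2)\sqrt{\pi/\beta}\, \exp(r^2/(4\beta))\, \mathrm{erfc}(-\sqrt{\beta}\log t + r/(2\sqrt{\beta}))$. It is compared with a plain Simpson rule on the Rayleigh scale; the function names and the truncation point of the integral are choices made for this example.

```python
import math

def ur_incomplete_moment(r, t, beta):
    """Closed form of E[X^r 1(X <= t)] via erfc (derived by integration by parts)."""
    a = -math.log(t)
    sb = math.sqrt(beta)
    first = t ** r * math.exp(-beta * a * a)
    second = (0.5 * r * math.sqrt(math.pi / beta)
              * math.exp(r * r / (4.0 * beta))
              * math.erfc(sb * a + r / (2.0 * sb)))
    return first - second

def simpson(g, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0

def ur_incomplete_moment_numeric(r, t, beta):
    """Same quantity as an integral over y = -log(x), truncated far in the tail."""
    a = -math.log(t)

    def integrand(y):
        return math.exp(-r * y) * 2.0 * beta * y * math.exp(-beta * y * y)

    return simpson(integrand, a, a + 12.0 / math.sqrt(beta))

if __name__ == "__main__":
    for r, t, beta in [(1, 0.5, 1.0), (2, 0.8, 0.5), (3, 1.0, 2.0)]:
        print(r, t, beta,
              ur_incomplete_moment(r, t, beta),
              ur_incomplete_moment_numeric(r, t, beta))
```

Taking `t = 1.0` returns the raw moment, and `r = 0` returns the cdf at `t`.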
The following result presents a new and simple series expansion of the incomplete moments, obtained by direct integration; no existing result on erfc(x) is used. It thus provides an alternative expression to the one presented in Proposition 2.

Proposition 3.
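Under the setting of Proposition 2, for any $t \in (0, 1)$, the following series expansion holds:
$$m_r(t) = \sum_{k=0}^{+\infty} \frac{(-1)^k r^k}{k!\, \beta^{k/2}}\, \Gamma\left(\frac{k}{2} + 1, \beta [\log(t)]^2\right),$$
where $\Gamma(a, x)$ is the incomplete upper gamma function defined by $\Gamma(a, x) = \int_x^{+\infty} t^{a-1} \exp(-t)\, dt$, with $a, x > 0$.
Proof. By making the change of variable $x = Q(y)$, with $Q(y)$ as defined in (4), we get
$$m_r(t) = \int_0^t x^r f(x)\, dx = \int_0^{F(t)} [Q(y)]^r\, dy = \int_0^{F(t)} \exp\left(-r \left[-\frac{1}{\beta} \log(y)\right]^{1/2}\right) dy.$$
By applying the Taylor series expansion of the exponential function, followed by the change of variable $z = -\log(y)$, we obtain
$$m_r(t) = \sum_{k=0}^{+\infty} \frac{(-1)^k r^k}{k!\, \beta^{k/2}} \int_0^{F(t)} [-\log(y)]^{k/2}\, dy = \sum_{k=0}^{+\infty} \frac{(-1)^k r^k}{k!\, \beta^{k/2}}\, \Gamma\left(\frac{k}{2} + 1, \beta [\log(t)]^2\right).$$
The proof of Proposition 3 follows from the above equalities.
From Proposition 3, by taking $r = 0$ with the convention $0^0 = 1$ in the sum, we rediscover $m_0(t) = \Gamma\left(1, \beta [\log(t)]^2\right) = \exp\left(-\beta [\log(t)]^2\right)$, with $t \in (0, 1)$. The $r$-th raw moment of $X$ can be derived as
$$m_r = \lim_{t \to 1} m_r(t) = \sum_{k=0}^{+\infty} \frac{(-1)^k r^k}{k!\, \beta^{k/2}}\, \Gamma\left(\frac{k}{2} + 1\right),$$
where $\Gamma(a)$ is the standard gamma function defined by $\Gamma(a) = \int_0^{+\infty} t^{a-1} \exp(-t)\, dt$, with $a > 0$. The following simple finite sum approximation holds:
$$m_r \approx \sum_{k=0}^{J} \frac{(-1)^k r^k}{k!\, \beta^{k/2}}\, \Gamma\left(\frac{k}{2} + 1\right),$$
where $J$ denotes a large integer.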
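The finite sum approximation of the raw moments can be compared with the erfc-based expression. The sketch below assumes the series $m_r = \sum_k (-r)^k \Gamma(k/2 + 1) / (k!\, \beta^{k/2})$, obtained by expanding $E[\exp(-rY)]$ with $Y$ Rayleigh, and the closed form $m_r = 1 - (r/2)\sqrt{\pi/\beta}\, \exp(r^2/(4\beta))\, \mathrm{erfc}(r/(2\sqrt{\beta}))$; the truncation level `J = 80` is an arbitrary choice for this example.

```python
import math

def ur_raw_moment_series(r, beta, J=80):
    """Truncated series for E[X^r]; Python's 0**0 == 1 matches the stated convention."""
    total = 0.0
    for k in range(J + 1):
        coefficient = (-r) ** k / (math.factorial(k) * beta ** (k / 2))
        total += coefficient * math.gamma(k / 2 + 1)
    return total

def ur_raw_moment_erfc(r, beta):
    """Closed form of E[X^r] in terms of the complementary error function."""
    return 1.0 - (0.5 * r * math.sqrt(math.pi / beta)
                  * math.exp(r * r / (4.0 * beta))
                  * math.erfc(r / (2.0 * math.sqrt(beta))))

if __name__ == "__main__":
    for r, beta in [(1, 1.0), (2, 1.5), (3, 4.0)]:
        print(r, beta, ur_raw_moment_series(r, beta), ur_raw_moment_erfc(r, beta))
```

The series alternates in sign, so for small β and large r the partial sums involve cancellation between large terms; in that regime the erfc form is the numerically safer choice.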

Probability Weighted Moments
The probability-weighted moments can be viewed as generalizations of raw moments. They appear quite naturally when we deal with the raw moments of order statistics. The closed forms of the probability weighted moments for the unit-Rayleigh distribution are given below.

Proposition 4.
Let r and s be two nonnegative integers and X be a rv following the unit-Rayleigh distribution. Then, the (r, s)-th probability weighted moment of X is given as erfc(x) being the complementary error function.
Proof. First of all, based on (1) and (2), let us notice that where f * (s) denotes the pdf of a rv Z following the unit-Rayleigh distribution with scale parameter β + s. Therefore Owing to Proposition 2 with β + s instead of β and t → 1, we have By combining the two equalities above, we conclude the proof of Proposition 4.
Clearly, we have m r = m r,0 . The probability-weighted moments will find applications in the next section.

Order Statistics
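The modeling of several physical systems involves the use of order statistics. In this section, the basic properties of the order statistics of the unit-Rayleigh distribution are discussed. The theory and details on order statistics in a general setting can be found in [27].
First, based on a well-known distributional result of order statistics, (1) and (2), the pdf of the $u$-th order statistic of $X$ in a random sample of size $n$ from the unit-Rayleigh distribution, say $X_{(u)}$, is
$$f_{X_{(u)}}(x) = \frac{n!}{(u-1)!\,(n-u)!} [F(x)]^{u-1} [1 - F(x)]^{n-u} f(x) = \frac{n!}{(u-1)!\,(n-u)!} \frac{2\beta}{x} [-\log(x)] \exp\left(-u\beta [\log(x)]^2\right) \left[1 - \exp\left(-\beta [\log(x)]^2\right)\right]^{n-u}, \quad x \in (0, 1). \qquad (6)$$
In particular, for the minimum and maximum order statistics, we get
$$f_{X_{(1)}}(x) = \frac{2n\beta}{x} [-\log(x)] \exp\left(-\beta [\log(x)]^2\right) \left[1 - \exp\left(-\beta [\log(x)]^2\right)\right]^{n-1}, \quad f_{X_{(n)}}(x) = \frac{2n\beta}{x} [-\log(x)] \exp\left(-n\beta [\log(x)]^2\right),$$
respectively. The raw moments of $X_{(u)}$ can be simply expressed via the probability weighted moments of the former unit-Rayleigh distribution. Indeed, from the first expression of $f_{X_{(u)}}(x)$ in (6) and the binomial formula, we can write
$$f_{X_{(u)}}(x) = \frac{n!}{(u-1)!\,(n-u)!} \sum_{k=0}^{n-u} \binom{n-u}{k} (-1)^k [F(x)]^{k+u-1} f(x),$$
and the $r$-th raw moment of $X_{(u)}$ is specified by
$$E\left(X_{(u)}^r\right) = \frac{n!}{(u-1)!\,(n-u)!} \sum_{k=0}^{n-u} \binom{n-u}{k} (-1)^k m_{r, k+u-1},$$
where $m_{r, k+u-1}$ is given by Proposition 4. From the raw moments of $X_{(u)}$, several measures can be derived, such as the skewness and kurtosis coefficients and the L-moments, which allow one to define the L-scale, L-skewness and L-kurtosis, among others.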

Reliability Coefficient
The reliability coefficient allows us to study the behavior of various random systems. It is defined as the probability that a hierarchy exists between two characteristics of the system with unknown values a priori. All the details can be found in [28]. Here, we show that the reliability coefficient can be expressed in a simple manner for the unit-Rayleigh distribution.
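Proposition 5. Let $U$ and $V$ be two independent rvs following the unit-Rayleigh distribution with scale parameters $\beta$ and $\beta_*$, respectively. Then, the corresponding reliability coefficient, defined by $R = P(U \le V)$, is given as
$$R = \frac{\beta_*}{\beta + \beta_*}.$$
Proof. Let $F(x; \beta)$ be the cdf of $U$, $f(x; \beta_*)$ be the pdf of $V$ and $f_*(x)$ be the pdf of the unit-Rayleigh distribution with scale parameter $\beta + \beta_*$. Then, we have
$$R = \int_0^1 F(x; \beta) f(x; \beta_*)\, dx = \frac{\beta_*}{\beta + \beta_*} \int_0^1 f_*(x)\, dx = \frac{\beta_*}{\beta + \beta_*}.$$
This proves Proposition 5.
From Proposition 5, we clearly have $R < 1/2$ for $\beta_* < \beta$, $R = 1/2$ for $\beta_* = \beta$, and $R > 1/2$ for $\beta_* > \beta$. The simple expression of $R$ is useful for statistical aims. In particular, by the invariance property, maximum likelihood estimates of the parameters $\beta$ and $\beta_*$, say $\hat{\beta}$ and $\hat{\beta}_*$, respectively, provide the maximum likelihood estimate for $R$ given as
$$\hat{R} = \frac{\hat{\beta}_*}{\hat{\beta} + \hat{\beta}_*}.$$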
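A simulation gives a quick sanity check of the ratio form of the reliability coefficient. The sketch below assumes $R = \beta_*/(\beta + \beta_*)$, which follows from $\int_0^1 F(x; \beta) f(x; \beta_*)\, dx$, and draws unit-Rayleigh variates by inverse transform; the sample size, the seed and the function names are arbitrary choices for this example.

```python
import math
import random

def ur_quantile(y, beta):
    """Quantile function of the unit-Rayleigh distribution, y in (0, 1]."""
    return math.exp(-math.sqrt(-math.log(y) / beta))

def reliability_closed_form(beta, beta_star):
    """R = P(U <= V) for independent U ~ UR(beta) and V ~ UR(beta_star)."""
    return beta_star / (beta + beta_star)

def reliability_monte_carlo(beta, beta_star, n=100_000, seed=0):
    """Monte Carlo estimate of P(U <= V) using inverse-transform sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # 1 - rng.random() lies in (0, 1], which avoids log(0).
        u = ur_quantile(1.0 - rng.random(), beta)
        v = ur_quantile(1.0 - rng.random(), beta_star)
        hits += u <= v
    return hits / n

if __name__ == "__main__":
    print(reliability_closed_form(1.0, 3.0), reliability_monte_carlo(1.0, 3.0))
```

The same two functions can be reused with estimated parameters to obtain the plug-in estimate of R.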

Tsallis Entropy
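Commonly, the Tsallis entropy is a measure of randomness of a random variable. One can refer to the study of [29] for discussions on the roles of various entropy measures in applied sciences, including the Tsallis entropy. The following result concerns a series expansion of this entropy measure in the context of the unit-Rayleigh distribution.
Proposition 6. Let $\tau \ne 1$ and $\tau > 0$. Then, the Tsallis entropy of a random variable $X$ following the unit-Rayleigh distribution can be expressed as
$$T_\tau = \frac{1}{\tau - 1} \left[1 - \frac{(2\beta)^{\tau - 1}}{\tau} \sum_{k=0}^{+\infty} \frac{(\tau - 1)^k}{k!} (\beta\tau)^{-(k + \tau - 1)/2}\, \Gamma\left(\frac{k + \tau + 1}{2}\right)\right].$$
Proof. We only need to treat the integral term in the definition of $T_\tau$, that is, $T_\tau = (\tau - 1)^{-1} \left[1 - \int_0^1 [f(x)]^\tau dx\right]$. Owing to (2), we have
$$\int_0^1 [f(x)]^\tau dx = (2\beta)^\tau \int_0^1 x^{-\tau} [-\log(x)]^\tau \exp\left(-\beta\tau [\log(x)]^2\right) dx.$$
Therefore, by making the change of variable $x = \exp(-y)$, i.e., $y = -\log(x)$, and by introducing a rv $Y$ following the Rayleigh distribution with scale parameter $\beta\tau$, we get
$$\int_0^1 [f(x)]^\tau dx = (2\beta)^\tau \int_0^{+\infty} y^\tau \exp\left((\tau - 1) y\right) \exp\left(-\beta\tau y^2\right) dy = \frac{(2\beta)^{\tau - 1}}{\tau} E\left[Y^{\tau - 1} \exp\left((\tau - 1) Y\right)\right].$$
Now, we use the Taylor series expansion of the exponential function and the following well-known moment property of the Rayleigh distribution: for any $\upsilon > -2$, $E(Y^\upsilon) = (\beta\tau)^{-\upsilon/2}\, \Gamma(1 + \upsilon/2)$. By putting the above equalities together, we obtain
$$\int_0^1 [f(x)]^\tau dx = \frac{(2\beta)^{\tau - 1}}{\tau} \sum_{k=0}^{+\infty} \frac{(\tau - 1)^k}{k!} (\beta\tau)^{-(k + \tau - 1)/2}\, \Gamma\left(\frac{k + \tau + 1}{2}\right).$$
The desired result follows by substituting this series expansion into the former definition of $T_\tau$.
From Proposition 6, the following approximation holds:
$$T_\tau \approx \frac{1}{\tau - 1} \left[1 - \frac{(2\beta)^{\tau - 1}}{\tau} \sum_{k=0}^{J} \frac{(\tau - 1)^k}{k!} (\beta\tau)^{-(k + \tau - 1)/2}\, \Gamma\left(\frac{k + \tau + 1}{2}\right)\right],$$
where $J$ is a large enough integer.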
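The truncated expansion can be checked against direct numerical integration. The sketch below rests on two assumptions made here: the usual definition $T_\tau = (\tau - 1)^{-1}[1 - \int_0^1 f(x)^\tau dx]$, and the series for the integral obtained from the change of variable $y = -\log(x)$ together with the Rayleigh moments $E(Y^\upsilon) = (\beta\tau)^{-\upsilon/2}\Gamma(1 + \upsilon/2)$. The truncation level and the integration bounds are illustrative.

```python
import math

def tsallis_series(tau, beta, J=80):
    """Tsallis entropy of the unit-Rayleigh distribution via the truncated series."""
    total = 0.0
    for k in range(J + 1):
        total += ((tau - 1.0) ** k / math.factorial(k)
                  * (beta * tau) ** (-(k + tau - 1.0) / 2.0)
                  * math.gamma((k + tau + 1.0) / 2.0))
    integral = (2.0 * beta) ** (tau - 1.0) / tau * total
    return (1.0 - integral) / (tau - 1.0)

def tsallis_numeric(tau, beta, n=6000):
    """Same entropy with the integral of f^tau computed by Simpson's rule in y = -log(x)."""
    def integrand(y):
        return ((2.0 * beta) ** tau * y ** tau
                * math.exp((tau - 1.0) * y - beta * tau * y * y))

    lo, hi = 0.0, 20.0 / math.sqrt(beta * tau)  # integrand is negligible beyond hi
    h = (hi - lo) / n
    total = integrand(lo) + integrand(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    integral = total * h / 3.0
    return (1.0 - integral) / (tau - 1.0)

if __name__ == "__main__":
    for tau, beta in [(2.0, 1.0), (3.0, 0.5)]:
        print(tau, beta, tsallis_series(tau, beta), tsallis_numeric(tau, beta))
```

The comparison is run for τ ≥ 2, where the integrand is smooth at the origin; for τ < 1 the Simpson rule loses accuracy near y = 0 and a finer or adaptive rule would be preferable.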

Some Bivariate Unit-Rayleigh Distributions
Now, we present some motivated ideas to construct bivariate unit-Rayleigh distributions, which are of interest for the modelling of conjoint characteristics with values on the unit interval. In order to keep a control on the structure of the marginal rvs, we propose to use the special probabilistic functions called copulas (see [30]).
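Similarly, one can also define a bivariate unit-Rayleigh distribution by using the Clayton copula. Thus, we can define the Clayton unit-Rayleigh distribution by the following cdf:
$$F(x, y) = \left[\max\left(F(x; \beta)^{-\lambda} + F(y; \beta_*)^{-\lambda} - 1,\ 0\right)\right]^{-1/\lambda},$$
where $\lambda \ge -1$ and $\lambda \ne 0$, and $F(x; \beta)$ and $F(y; \beta_*)$ are defined as (1) with the scale parameters $\beta$ and $\beta_*$, respectively. Thus, for $(x, y) \in (0, 1)^2$, we have
$$F(x, y) = \left[\max\left(\exp\left(\lambda\beta [\log(x)]^2\right) + \exp\left(\lambda\beta_* [\log(y)]^2\right) - 1,\ 0\right)\right]^{-1/\lambda}.$$
As the last example, we can define the Gumbel unit-Rayleigh distribution by the following cdf:
$$F(x, y) = \exp\left(-\left\{[-\log F(x; \beta)]^{\lambda} + [-\log F(y; \beta_*)]^{\lambda}\right\}^{1/\lambda}\right),$$
where $\lambda \ge 1$, and $F(x; \beta)$ and $F(y; \beta_*)$ are defined as (1) with the scale parameters $\beta$ and $\beta_*$, respectively. Hence, for $(x, y) \in (0, 1)^2$, we have
$$F(x, y) = \exp\left(-\left\{\beta^{\lambda} |\log(x)|^{2\lambda} + \beta_*^{\lambda} |\log(y)|^{2\lambda}\right\}^{1/\lambda}\right).$$
These bivariate extensions generate bivariate models that may be useful in the analysis of compositional data with values over $(0, 1)^2$, involving proportions and/or percentages. Concrete applications can be found in chemistry, demography, geology, high-throughput sequencing and surveys. Estimation of the model parameters can be performed via the multivariate likelihood estimation method (see [31]). Details on the statistical analysis of compositional data can be found in [32,33].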
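The copula constructions are straightforward to code once the marginal cdf is available. The sketch below assumes the standard Clayton and Gumbel copulas, $C(u, v) = [\max(u^{-\lambda} + v^{-\lambda} - 1, 0)]^{-1/\lambda}$ and $C(u, v) = \exp(-\{(-\log u)^\lambda + (-\log v)^\lambda\}^{1/\lambda})$, applied to unit-Rayleigh margins; the function names are illustrative, and extreme arguments (margins numerically equal to 0) are only handled in a basic way.

```python
import math

def ur_cdf(x, beta):
    """Unit-Rayleigh cdf, extended by 0 below the support and 1 above it."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return math.exp(-beta * math.log(x) ** 2)

def clayton_ur_cdf(x, y, beta, beta_star, lam):
    """Clayton copula (lam >= -1, lam != 0) applied to two unit-Rayleigh margins."""
    u, v = ur_cdf(x, beta), ur_cdf(y, beta_star)
    if u == 0.0 or v == 0.0:
        return 0.0
    base = u ** (-lam) + v ** (-lam) - 1.0
    return max(base, 0.0) ** (-1.0 / lam)

def gumbel_ur_cdf(x, y, beta, beta_star, lam):
    """Gumbel copula (lam >= 1) applied to two unit-Rayleigh margins."""
    u, v = ur_cdf(x, beta), ur_cdf(y, beta_star)
    if u == 0.0 or v == 0.0:
        return 0.0
    s = abs(math.log(u)) ** lam + abs(math.log(v)) ** lam
    return math.exp(-s ** (1.0 / lam))

if __name__ == "__main__":
    print(clayton_ur_cdf(0.4, 0.7, 1.0, 2.0, 2.0))
    print(gumbel_ur_cdf(0.4, 0.7, 1.0, 2.0, 1.5))
```

Setting one argument to 1 recovers the unit-Rayleigh cdf of the other variable, which is the control on the marginal structure that motivates the copula approach.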

Applications
This section shows the applicability of the unit-Rayleigh distribution in a data analysis framework, which has not received particular attention in [15] or [16]. We estimate the parameter $\beta$ by the maximum likelihood method, as done in [15] but with the shape parameter set equal to 2; it is not to be estimated. In this case, from $n$ observations of a rv following the unit-Rayleigh distribution, say $x_1, \ldots, x_n$, the maximum likelihood estimate of $\beta$ is defined by
$$\hat{\beta} = \frac{n}{\sum_{i=1}^{n} [\log(x_i)]^2}.$$
Based on (1)-(3), the estimated cdf, pdf and hrf are obtained by substituting $\hat{\beta}$ for $\beta$ in their respective expressions.
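The closed-form estimate is easy to check by simulation. The sketch below assumes the estimate $\hat{\beta} = n / \sum_i [\log(x_i)]^2$, which is the root of the score equation of the log-likelihood $\sum_i \{\log(2\beta) + \log(-\log x_i) - \log x_i - \beta [\log x_i]^2\}$; the true value β = 2, the sample size and the seed are arbitrary choices.

```python
import math
import random

def ur_quantile(y, beta):
    """Quantile function of the unit-Rayleigh distribution, y in (0, 1]."""
    return math.exp(-math.sqrt(-math.log(y) / beta))

def ur_mle(data):
    """Closed-form maximum likelihood estimate of beta."""
    return len(data) / sum(math.log(x) ** 2 for x in data)

def ur_loglik(beta, data):
    """Log-likelihood of beta for observations in (0, 1)."""
    return sum(
        math.log(2.0 * beta) + math.log(-math.log(x)) - math.log(x)
        - beta * math.log(x) ** 2
        for x in data
    )

rng = random.Random(1)
true_beta = 2.0
sample = [ur_quantile(1.0 - rng.random(), true_beta) for _ in range(5000)]
beta_hat = ur_mle(sample)

if __name__ == "__main__":
    print("estimate:", beta_hat)
    print("log-likelihood at estimate:", ur_loglik(beta_hat, sample))
```

Because the estimate depends on the data only through the sum of squared logarithms, it can be updated incrementally when new observations arrive.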
Thus, with the maximum likelihood method, we aim to compare the fit behavior of the unit-Rayleigh distribution with those of the following well-known one-parameter competitors.
• the one-parameter Kumaraswamy (Ku) distribution (or special Lehmann type II power distribution) defined by the cdf given as F Ku (x) = 0 for x ≤ 0 and F Ku (x) = 1 for x ≥ 1, with α > 0. See [7].
• the Topp-Leone (TL) distribution defined by the cdf specified by F TL (x) = 0 for x ≤ 0 and F TL (x) = 1 for x ≥ 1, where θ > 0. See [6].
• the one-parameter beta (B) distribution defined by the cdf given as
• the power (P) distribution defined by the cdf expressed by
• the transmuted (TM) distribution defined by the cdf expressed by We refer to [34] for all the characteristics of the transmuted distribution.
These distributions can be assimilated to semi-parametric statistical models for adjustment purposes. The following classical criteria are used to compare the fits: minus estimated log-likelihood ($-\hat{\ell}$), consistent Akaike information criterion (CAIC), Hannan-Quinn information criterion (HQIC), Akaike information criterion (AIC), Bayesian information criterion (BIC), Cramer-von Mises criterion (W) and Anderson-Darling criterion (A). The lower the values of these criteria, the better the fit. The R software developed by [35] is used, with the help of the function goodness.fit from the package AdequacyModel (see [36]).
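For readers who want to reproduce the likelihood-based criteria outside R, the sketch below computes them from an estimated log-likelihood. It is written in Python rather than R, and the formulas are assumptions: AIC = −2ℓ + 2k, BIC = −2ℓ + k log(n), HQIC = −2ℓ + 2k log(log(n)) are the usual definitions, while CAIC = −2ℓ + 2kn/(n − k − 1) is the form that, to the best of our understanding, is documented for `goodness.fit`; it should be checked against the package documentation before comparing with published tables. The W and A statistics are not covered here.

```python
import math

def information_criteria(loglik, n, k=1):
    """Likelihood-based criteria for a model with k parameters fitted to n observations.

    loglik is the maximized log-likelihood. The CAIC form is an assumption
    (see the lead-in); lower values indicate a better fit.
    """
    minus_two_loglik = -2.0 * loglik
    return {
        "AIC": minus_two_loglik + 2.0 * k,
        "CAIC": minus_two_loglik + 2.0 * k * n / (n - k - 1.0),
        "BIC": minus_two_loglik + k * math.log(n),
        "HQIC": minus_two_loglik + 2.0 * k * math.log(math.log(n)),
    }

if __name__ == "__main__":
    # Hypothetical maximized log-likelihood, for illustration only.
    for name, value in information_criteria(loglik=10.0, n=28, k=1).items():
        print(f"{name}: {value:.4f}")
```

Since all the competing models have a single parameter, the ranking by these four criteria coincides with the ranking by the maximized log-likelihood for a given data set.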
Data with values in (0, 1) can be of various natures, including percentages or proportions. Based on positive data x 1 , . . . , x n , one can suppose that a phenomenon can be modeled by a random variable U, with the upper bound of its theoretical support estimated by m = sup(x 1 , . . . , x n ) or any reasonable larger value. Then, we can consider the random variable X = U/m, which has support in (0, 1). In all situations, we can recover the distribution of U by multiplication by m a posteriori. In what follows, three data sets are considered. The first two data sets use the previous scheme, and the third one contains proportion-like data initially communicated with values in (0, 1).
First, we consider data of times to infection of kidney dialysis patients in months, as described by [37]. The "times of infection" data set is: {2.5, 2.5, 3.5, 3.5, 3.5, 4.5, 5.5, 6.5, 6.5, 7.5, 7.5, 7.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 12.5, 13.5, 14.5, 14.5, 21.5, 21.5, 22.5, 22.5, 25.5, 27.5}. We normalize these data by dividing them by 30, to get data between 0 and 1.
The second data set concerns the failure times of the air conditioning system of an airplane (in hours), as reported in [38]. The "failure times" data set is: {23, 261, 87, 7, 120, 14, 62, 47, 225, 71, 246, 21, 42, 20, 5, 12, 120, 11, 3, 14, 71, 11, 14, 11, 16, 90, 1, 16, 52, 95}. Again, we normalize these data by dividing them by 265, to get data between 0 and 1, and we work with the resulting data set.
The third data set is about the maximum flood levels of a particular river in Pennsylvania in millions of cubic feet per second (mlcf/s). It is reported in [39]. With this unit of measure, the data are of proportion type, belonging to (0, 1). The "flood levels" data set is: {0
The data sets are summarized in Table 1. Table 1 indicates that the times of infection data set is right-skewed, with small dispersion and negative kurtosis. This means that the curve of the unknown pdf behind these data is flatter than a normal pdf. Concerning the failure times data set, we can say that it is "significantly" right-skewed, with small dispersion and "significant" kurtosis. For the flood levels data set, we can say that it is right-skewed, with small dispersion and slightly positive kurtosis.
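The normalization scheme and the descriptive measures can be illustrated on the times of infection data listed above. The sketch below divides the observations by 30, computes moment-based skewness and excess kurtosis (population versions, which may differ slightly from the estimators behind Table 1), and evaluates the closed-form estimate $n / \sum_i [\log(x_i)]^2$ of β; variable and function names are illustrative.

```python
import math

# "Times of infection" data (months), as listed in the text.
times_of_infection = [
    2.5, 2.5, 3.5, 3.5, 3.5, 4.5, 5.5, 6.5, 6.5, 7.5, 7.5, 7.5, 7.5, 8.5,
    9.5, 10.5, 11.5, 12.5, 12.5, 13.5, 14.5, 14.5, 21.5, 21.5, 22.5, 22.5,
    25.5, 27.5,
]
upper_bound = 30.0  # normalization constant used in the text
x = [t / upper_bound for t in times_of_infection]

def central_moment(data, order):
    """Population central moment of the given order."""
    mean = sum(data) / len(data)
    return sum((v - mean) ** order for v in data) / len(data)

skewness = central_moment(x, 3) / central_moment(x, 2) ** 1.5
excess_kurtosis = central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0

# Closed-form maximum likelihood estimate of beta on the normalized data.
beta_hat = len(x) / sum(math.log(v) ** 2 for v in x)

if __name__ == "__main__":
    print("n =", len(x))
    print("skewness =", skewness)
    print("excess kurtosis =", excess_kurtosis)
    print("beta_hat =", beta_hat)
```

Quantities on the original scale, such as an estimated mode in months, are recovered by multiplying the corresponding unit-scale quantity by the normalization constant.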
So the nature of the three data sets differs in numerous aspects. This is also illustrated through the corresponding boxplots in Figure 4, presenting different quantile characteristics. Note that some extreme points are present. From Figure 5, we see that the TTT curve is concave, which corresponds to an increasing failure intensity for the times of infection data set. Figure 6 shows that the TTT curve is convex, then concave, suggesting a U-shaped failure intensity for the failure times data set. In Figure 7, the TTT curve is concave, indicating an increasing failure intensity for the flood levels data set. Thus, these TTT plots highlight the different nature of the failure intensity of these three data sets. It should also be noted that the increasing and U-shaped failure intensities are covered by the unit-Rayleigh model, which makes it suitable for the analysis of these data sets.
The goodness-of-fit measures for the models, as well as the maximum likelihood estimates (MLEs) and standard errors (SEs) of the parameters involved, are collected in Tables 2-4 for the times of infection, failure times and flood levels data sets, respectively. From Tables 2-4, the unit-Rayleigh model can be considered as the best model for the three data sets, because it has the smallest values for the CAIC, HQIC, AIC, BIC, W and A statistics. Figures 8-10 confirm this claim through a graphical approach. In them, we plot the estimated pdfs over the corresponding histograms for the times of infection, failure times and flood levels data sets, respectively. As anticipated, Figures 8-10 show the nice fits of the unit-Rayleigh model, which has captured the main characteristics of the data, contrary to most of the competitors.
We complete this graphical analysis by plotting the estimated hrfs of the unit-Rayleigh model only in Figures 11-13. As expected from the TTT plot in Figure 5, Figure 11 shows an increasing estimated hrf of the unit-Rayleigh model for the times of infection data set. Figure 12 indicates a U-shaped estimated hrf for the failure times data set, which is coherent with the observation made in Figure 6. Figure 13 reveals an increasing estimated hrf for the flood levels data set, as anticipated in Figure 7. We thus see the importance of the possible U-shape of the hrf of the unit-Rayleigh distribution, as mentioned above, for such modelling.
All the preceding points highlight the capacities of the unit-Rayleigh model in the adjustment of various data. A possible continuation of this work may be the use of the unit-Rayleigh distribution for the construction of general families of distributions, through composition techniques or others, or the construction of regression models including characteristics with values on the unit interval through an appropriate link function (see [40]). The presented bivariate versions of the unit-Rayleigh distribution can have applications in the treatment of compositional data with values over (0, 1) 2 (see [32]). All of these research directions remain to be developed; we leave them for future investigations.

Conclusions
In this article, we have shown that the unit-Rayleigh distribution is not only a special case of the unit-Weibull distribution like many others, discussing specific motivations, interests, theoretical results, and practical benefits. In particular, numerous important functions and measures have closed-form expressions that can be useful for various probability and statistical purposes. The most relevant theoretical contributions were a detailed analysis of the main functions, results on some stochastic ordering, the expressions of the incomplete and probability weighted moments, as well as those of the Tsallis entropy and reliability coefficient, various properties of the order statistics, and a list of potential bivariate extensions. An applied study has shown how the unit-Rayleigh distribution can be used in practice, with a quite simple estimation of the unique unknown parameter by the maximum likelihood technique. Based on three real data sets, we have shown empirically that it can be superior to other well-reputed one-parameter unit distributions, namely the one-parameter Kumaraswamy, Topp-Leone, one-parameter beta, power and transmuted distributions. We hope that this study will be able to convince applied statisticians, and readers in general, that the unit-Rayleigh distribution can be used effectively in different fields dealing with unit data.