1. Introduction
Standard probability distributions often fail to adequately describe real-world data, particularly when the data exhibit non-standard or complex structural properties. To address this issue, researchers have focused on developing broader families of statistical models that more accurately capture the complexities observed in empirical data. A common and effective approach to achieving greater flexibility involves introducing additional shape parameters into traditional probability distributions.
Shaw and Buckley [1] introduced the Quadratic Rank Transmutation Map (QRTM), a methodology for generating new probability distributions from existing ones through rank-based transformations. This framework has since inspired further generalizations. Merovci, Alizadeh, and Hamedani [2] expanded on this concept by proposing the Exponentiated Transmuted-G family. Subsequently, Moolath and Jayakumar [3] introduced the T-transmuted X family, further enriching this line of research.
Furthermore, Granzotto, Louzada, and Balakrishnan [4] presented a cubic extension of the QRTM, termed the Cubic Rank Transmutation Map (CRTM). More recently, Rahman et al. [5] proposed a modified cubic transmuted-G distribution, adding another layer of adaptability within this class of statistical models.
The Rayleigh distribution is a continuous probability distribution commonly used to model non-negative random variables in probability theory and statistics [6]. It is named after Lord Rayleigh (1842–1919). This distribution often arises when the overall magnitude of a vector is determined by its orthogonal components [6].
The Rayleigh distribution is a special case of the Weibull family and is widely applied in reliability analysis, life-testing, and survival analysis. Specifically, if $X \sim \mathrm{Rayleigh}(\sigma)$, then $X$ is equivalent to a Weibull random variable with shape parameter $k = 2$ and scale parameter $\lambda = \sigma\sqrt{2}$, i.e., $X \sim \mathrm{Weibull}(2, \sigma\sqrt{2})$.
Moreover, the square of a Rayleigh-distributed variable with parameter $\sigma = 1$ has the following well-known interpretations:
$X^2 \sim \chi^2_2$, the chi-squared distribution with 2 degrees of freedom;
Equivalently, $X^2 \sim \mathrm{Exp}(1/2)$, the exponential distribution with rate parameter $1/2$.
A notable characteristic of the Rayleigh distribution is its increasing hazard function, which makes it especially useful in certain reliability and survival contexts.
The Rayleigh distribution has a rich history, with early foundational contributions by Siddiqui [7,8] and Vickers [9]. Over the years, several authors have proposed generalizations to enhance its flexibility and applicability, including Beckmann [10], Kundu [11], and Voda [12]. More recently, Abd Elfattah et al. [13] explored parameter estimation techniques for the Rayleigh model under various censoring schemes, reflecting continued interest in adapting the model to real-world data scenarios.
However, in many practical situations, the traditional Rayleigh form may not adequately capture emerging data patterns, motivating the development of extended versions. Merovci [14,15] introduced the transmuted Rayleigh and transmuted generalized Rayleigh distributions by applying transmutation techniques to the classical Rayleigh model.
More recently, Mir and Ahmad [16] proposed the MTI Rayleigh distribution, designed to provide improved fit, particularly for datasets such as COVID-19 mortality figures. In a similar vein, Rivera et al. [17] developed the Scale Mixture of Rayleigh (SMR) distribution, which performs well in capturing data with strong skewness and heavy tails.
Definition 1 ([18]). A continuous random variable $X$ is said to follow a Rayleigh distribution with scale parameter $\sigma > 0$ if its probability density function (PDF) is given by
\[ g(x;\sigma) = \frac{x}{\sigma^{2}} \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right), \quad x \ge 0, \tag{1} \]
and its cumulative distribution function (CDF) is
\[ G(x;\sigma) = 1 - \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right), \quad x \ge 0. \tag{2} \]
Here, $x$ denotes a value of the random variable and $\sigma$ is the scale parameter.
Despite these advancements, the classical Rayleigh distribution remains limited in its ability to accommodate data exhibiting skewness or heavy tails. To address these shortcomings, recent studies have introduced structural extensions aimed at increasing flexibility and improving tail behavior.
One such advancement was proposed by Santoro et al. (2023) [19], who introduced a modified version of the Lomax–Rayleigh distribution using a Slash-type transformation. This modification was designed to increase kurtosis, thereby enhancing the model’s capacity to capture extreme values.
In a different direction, Haj Ahmad et al. (2024) [20] developed a discrete version of the generalized Rayleigh distribution. Utilizing a survival-based discretization approach, their model was tailored for count data, particularly data characterized by overdispersion. They investigated the model’s properties under both classical and Bayesian frameworks and demonstrated its effectiveness through applications to real datasets.
Further extending the Rayleigh family, Dong and Gui (2024) [21] applied the generalized Rayleigh model to stress–strength reliability analysis. Their focus was on estimating the stress–strength reliability measure using a sampling technique based on lower record ranked sets. The estimation procedures, developed under both likelihood and Bayesian paradigms, were enhanced with bootstrap confidence intervals, yielding improved precision over traditional sampling methods.
Motivated by these developments, we introduce a new generalization of the Rayleigh distribution: the record-based transmuted Rayleigh distribution of order 3 (rbt-Rayleigh). By incorporating two additional parameters, the proposed model offers increased flexibility while preserving a key reliability feature—the increasing failure rate (IFR)—under specific conditions. We evaluate the model using four distinct datasets and find that it consistently outperforms existing Rayleigh-type models, as assessed by standard criteria such as AIC, BIC, and the Kolmogorov–Smirnov statistic.
2. The Record-Based Transmuted Rayleigh Distribution of Order 3
Balakrishnan and He [22] introduced the record-based transmuted-G (RBT-G) generator of order 3, a flexible framework for constructing new probability models from any given baseline cumulative distribution. This generator includes two additional shape parameters that allow for better control over the distribution’s skewness and tail behavior. The CDF is expressed as
subject to the constraints
and
.
The corresponding probability density function (PDF) derived from this generator is given by
where
denotes the probability density function (PDF) associated with the cumulative distribution function (CDF)
.
By taking the Rayleigh distribution as the baseline, we develop a new and flexible model known as the record-based transmuted Rayleigh distribution of order 3 (rbt-Rayleigh). The corresponding PDF and CDF are obtained by substituting the Rayleigh CDF and PDF, given in Equations (1) and (2), into generator Formulas (3) and (4).
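Remark on the Gamma function. The Gamma function, denoted by $\Gamma(\cdot)$, is a classical extension of the factorial function to real and complex arguments. For any real number $z > 0$, it is defined by the integral
\[ \Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\,dt. \]
One of its key properties is that, for every positive integer $n$, we have
\[ \Gamma(n) = (n-1)!, \]
and more generally, it satisfies the recurrence relation
\[ \Gamma(z+1) = z\,\Gamma(z). \]
We make use of the following well-known integral identity involving the Gamma function:
\[ \int_0^{\infty} x^{m} \exp\!\left(-\beta x^{n}\right) dx = \frac{\Gamma\!\left(\frac{m+1}{n}\right)}{n\,\beta^{(m+1)/n}}, \tag{7} \]
where $\operatorname{Re}\beta > 0$, $\operatorname{Re} m > 0$, and $\operatorname{Re} n > 0$, as given in Gradshteyn and Ryzhik ([23], Eq. 3.326(2), p. 339).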
This identity is used in the proof of Proposition 1 and will also be used in Theorem 1 to derive the moment expressions.
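The identity in Equation (7) can be spot-checked numerically. The following is a minimal Python sketch, not part of the original analysis; the function names, the truncation point `upper`, and the step count are illustrative assumptions, and the Simpson rule is only adequate when the integrand has decayed well before `upper`.

```python
import math

def gamma_identity_rhs(m, n, beta):
    """Closed form Gamma((m+1)/n) / (n * beta**((m+1)/n)) from the identity."""
    g = (m + 1.0) / n
    return math.gamma(g) / (n * beta ** g)

def gamma_identity_lhs(m, n, beta, upper=20.0, steps=20000):
    """Composite Simpson approximation of the integral of
    x**m * exp(-beta * x**n) over [0, upper]; `steps` must be even."""
    h = upper / steps

    def integrand(x):
        return x ** m * math.exp(-beta * x ** n)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        # Simpson weights: 4 at odd nodes, 2 at even interior nodes.
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

# Example: m = 3, n = 2, beta = 1/2 is the second raw moment of a
# Rayleigh(1) variable, so both sides should be close to 2.
lhs = gamma_identity_lhs(3, 2, 0.5)
rhs = gamma_identity_rhs(3, 2, 0.5)
```

With $m = 3$, $n = 2$, and $\beta = 1/2$, the closed form reduces to $\Gamma(2)/(2 \cdot (1/2)^2) = 2$, which matches $E(X^2) = 2\sigma^2$ for a Rayleigh variable with $\sigma = 1$.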
Proposition 1. Let $f(x)$ and $F(x)$ denote the PDF and CDF of the record-based transmuted Rayleigh distribution of order 3 (rbt-Rayleigh), respectively, as defined in Equations (5) and (6). Then:
1. The PDF satisfies:
(a) $f(x) \ge 0$ for all $x \ge 0$.
(b) $\int_0^{\infty} f(x)\,dx = 1$.
2. The CDF satisfies:
(a) It is continuous on $(0, \infty)$ and right-continuous on $[0, \infty)$.
(b) It is non-decreasing on $[0, \infty)$.
(c) $\lim_{x \to 0^{+}} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.
Proof. 1a. From the explicit form of the density, we note that it is composed of three factors:
where
Clearly,
and
imply
for
, and
for all
x. From the parameter constraints
and
, it follows that
Thus,
for all
, and therefore:
By applying the integral identity from Equation (7), we obtain:
2a. The function
is a composition of exponential and polynomial terms, both of which are continuous on
. Hence,
is continuous and right-continuous on
.
2b. To verify monotonicity, we differentiate:
From part (1), we know
for all
, so
is non-decreasing on
.
2c. We now evaluate the limits:
By Theorem 3.20(d) from Rudin [24], which states that if $p > 0$ and $\alpha$ is real, then $\lim_{n \to \infty} n^{\alpha}/(1+p)^{n} = 0$, we conclude:
Hence:
Thus,
satisfies:
Therefore, $F$ is a valid cumulative distribution function. Consequently, the record-based transmuted Rayleigh distribution of order 3 satisfies all the necessary conditions to be a valid probability distribution under the given parameter constraints. □
Figure 1 and Figure 2 illustrate the variability in the shapes of the PDF and CDF for the record-based transmuted Rayleigh distribution of order 3.
The hazard rate function (HRF) of the rbt-Rayleigh distribution is given by $h(x) = f(x) / [1 - F(x)]$, where $f$ and $F$ are the PDF and CDF in Equations (5) and (6):
Since the baseline distribution in our model is the Rayleigh distribution, whose hazard function is strictly increasing (i.e., the Rayleigh distribution is IFR), it is of interest to examine whether this property is preserved under the record-based transmuted transformation of order 3.
This question has been addressed by Balakrishnan and He (see Section 3.3 in [22]), who proved that the resulting distribution retains the IFR property of the baseline if the transformation parameters satisfy the condition
Hence, in our case, the proposed distribution is IFR whenever this condition holds.
3. Quantile Function
The cumulative distribution function (CDF) of the rbt–Rayleigh distribution is given by
To compute the quantile function $Q(u)$, $0 < u < 1$, we must solve the nonlinear equation
\[ F(Q(u)) = u, \]
which does not admit a closed-form solution for general values of $a$, $b$, and $\sigma$. Therefore, the quantile function is computed numerically.
To address this, we implemented a root-finding algorithm in R that solves the equation above for a given probability level $u$. The corresponding R code is provided below and can be used to generate a full quantile table or compute specific quantiles such as the median or quartiles.
R Code for Computing the Quantile Function of the rbt–Rayleigh Distribution
Listing 1 presents the R code that numerically computes the quantile function of the rbt–Rayleigh distribution by solving the nonlinear equation.
Listing 1. R code for computing the quantile function of the rbt–Rayleigh distribution. [Listing image not reproduced.]
Table 1 reports the quantile values
of the rbt–Rayleigh distribution for selected probabilities, while
Figure 3 illustrates the quantile function
for
, using parameters
,
, and
.
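The root-finding step described in this section can be sketched in a few lines. The Python code below is an illustration only and is not a transcription of Listing 1. It assumes the record-based construction in which the variable is drawn from the first, second, or third upper record of the Rayleigh baseline with hypothetical mixing weights `(p1, p2, p3)` summing to one; the mapping from these weights to the paper's $a$ and $b$ is not reproduced here. Bisection is used in place of whatever root-finder the R listing employs.

```python
import math

def rbt3_cdf(x, sigma, weights):
    """CDF of a (p1, p2, p3)-mixture of the first three upper records from a
    Rayleigh(sigma) baseline (assumed parameterization, see lead-in)."""
    p1, p2, p3 = weights
    if x <= 0.0:
        return 0.0
    h = x * x / (2.0 * sigma * sigma)  # Rayleigh cumulative hazard
    return 1.0 - math.exp(-h) * (1.0 + (p2 + p3) * h + 0.5 * p3 * h * h)

def rbt3_quantile(u, sigma, weights, tol=1e-12):
    """Solve rbt3_cdf(x) = u for x by bisection."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    lo, hi = 0.0, sigma
    while rbt3_cdf(hi, sigma, weights) < u:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rbt3_cdf(mid, sigma, weights) < u:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Median and quartiles for one illustrative parameter choice.
quartiles = [rbt3_quantile(u, 1.0, (0.5, 0.3, 0.2)) for u in (0.25, 0.5, 0.75)]
```

With weights `(1, 0, 0)` the mixture collapses to the Rayleigh baseline, so the result can be compared against the closed-form Rayleigh quantile $\sigma\sqrt{-2\ln(1-u)}$.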
4. Moments
Theorem 1. If , then the moment of X is given by: Specifically, the mean and variance are obtained as follows: Proof. The result follows by applying the integral identity given in (
7). □
Throughout the remainder of the manuscript, we denote the r-th raw moment of the distribution by . This notation is used for expressing skewness and kurtosis in terms of the central moments.
Theorem 2. If , then the moment generating function of X, denoted by , is given by: Proof. By definition,
Since
one obtains
For any finite interval
, the function
is continuous on
and hence bounded. Therefore, there exists a constant
such that
Hence,
Since
the series
converges uniformly on
by the Weierstrass
M-test. Each term is continuous on
. By the Uniform Convergence Theorem for the Riemann integral, one obtains
The function
is integrable on
and tends to zero faster than any polynomial as
. Hence, letting
yields
From the explicit expression of the moments
one obtains
□
The values presented in Table 2 and Table 3 correspond to the mean and variance of the random variable $X$ computed for selected combinations of the parameters $a$, $b$, and $\sigma$.
5. Skewness and Kurtosis
In addition to the first two moments, which characterize the location and dispersion of a distribution, the third and fourth central moments provide insight into its shape. These are commonly summarized by the coefficients of skewness and kurtosis.
The coefficient of skewness measures the asymmetry of the distribution around its mean. A positive skewness indicates a longer right tail, whereas a negative value implies a longer left tail. For a random variable $X$ with mean $\mu = E(X)$ and central moments $\mu_k = E[(X-\mu)^k]$, the skewness is defined as:
\[ \gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}. \]
The coefficient of kurtosis, on the other hand, quantifies the heaviness of the tails and the sharpness of the peak relative to a normal distribution. It is given by:
\[ \gamma_2 = \frac{\mu_4}{\mu_2^{2}}. \]
For the proposed rbt-Rayleigh distribution, explicit expressions for the moments have been derived in Theorem 1. These can be directly substituted into the formulas above to compute the skewness and kurtosis as functions of the parameters $a$, $b$, and $\sigma$.
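The substitution of raw moments into the skewness and kurtosis formulas can be sketched as follows. This Python example is illustrative: the function names are hypothetical, and it is exercised on the plain Rayleigh raw moments $E(X^r) = \sigma^r 2^{r/2}\Gamma(1 + r/2)$ rather than on the rbt-Rayleigh moments of Theorem 1, whose explicit form is not reproduced here.

```python
import math

def skewness_kurtosis(m1, m2, m3, m4):
    """Skewness and (non-excess) kurtosis from the first four raw moments."""
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3                        # third central moment
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4   # fourth central moment
    return mu3 / var ** 1.5, mu4 / var ** 2

def rayleigh_raw_moment(r, sigma):
    """r-th raw moment of the Rayleigh(sigma) baseline."""
    return sigma ** r * 2.0 ** (r / 2.0) * math.gamma(1.0 + r / 2.0)

# Baseline check: both coefficients are free of sigma for the Rayleigh law.
raw = [rayleigh_raw_moment(r, 1.7) for r in (1, 2, 3, 4)]
skew, kurt = skewness_kurtosis(*raw)
```

For the Rayleigh baseline the results should agree with the known closed forms $2\sqrt{\pi}(\pi - 3)/(4-\pi)^{3/2}$ for skewness and $3 - (6\pi^2 - 24\pi + 16)/(4-\pi)^2$ for kurtosis, which gives a quick way to validate the conversion before applying it to other moment formulas.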
6. Harmonic Mean
Theorem 3. If , then the harmonic mean of X, defined as , is given by: Proof. We compute the expected value of the reciprocal:
Using the Gaussian integrals:
we conclude that:
□
7. Mean Deviations
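The mean deviation about the mean and the mean deviation about the median are defined by:
\[ \delta_1(X) = \int_0^{\infty} |x - \mu|\, f(x)\,dx, \qquad \delta_2(X) = \int_0^{\infty} |x - M|\, f(x)\,dx, \]
where $\mu = E(X)$ is the mean, and $M$ denotes the median of the distribution.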
Theorem 4. For the rbt-R distribution, the mean deviations and are given by:and Proof.
By substituting into Equations (11) and (12), we obtain the mean deviations. Here, erfc represents the complementary error function. □
8. Entropy
Entropy measures provide a formal means of quantifying the uncertainty inherent in probability distributions. Among them, Shannon entropy is the most widely used, while the Rényi entropy [25], a parametric generalization, offers a broader framework for analyzing distributional characteristics [26].
Let $X$ be a continuous random variable with probability density function $f(x)$. The Rényi entropy of order $\alpha$, where $\alpha > 0$ and $\alpha \neq 1$, is defined by
\[ H_{\alpha}(X) = \frac{1}{1-\alpha} \log \int_0^{\infty} f^{\alpha}(x)\,dx. \]
Theorem 5. Let . The Rényi entropy of order α for the rbt-R distribution is given by: Proof.
Applying the binomial expansion yields
For any finite interval
, the function
is continuous and hence bounded on
. Therefore, there exists a constant
such that
Since
converges absolutely, by the Weierstrass
M-test, the double series converges uniformly on
. Each term in the sum is continuous on
. By the Uniform Convergence Theorem for the Riemann integral, one obtains
The function
tends to zero faster than any polynomial as
. Thus, taking the limit
yields
Applying the integral identity
gives
Hence,
□
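As a sanity check on the technique, it may help to work the baseline case by hand. The sketch below assumes the plain Rayleigh density $g(x;\sigma) = (x/\sigma^{2})\,e^{-x^{2}/(2\sigma^{2})}$ with no transmutation and writes $H_{\alpha}$ for the Rényi entropy; both the symbol and the restriction to the baseline are choices made here for illustration and do not restate Theorem 5. The only tool used is the Gamma integral $\int_0^\infty x^{m} e^{-\beta x^{n}}dx = \Gamma\!\big(\tfrac{m+1}{n}\big)/\big(n\beta^{(m+1)/n}\big)$.

```latex
\begin{align*}
\int_0^{\infty} g^{\alpha}(x;\sigma)\,dx
  &= \sigma^{-2\alpha}\int_0^{\infty} x^{\alpha}
     \exp\!\left(-\frac{\alpha x^{2}}{2\sigma^{2}}\right)dx \\
  &= \sigma^{-2\alpha}\,\frac{\Gamma\!\left(\frac{\alpha+1}{2}\right)}{2}
     \left(\frac{2\sigma^{2}}{\alpha}\right)^{\frac{\alpha+1}{2}}
     \qquad \bigl(m=\alpha,\; n=2,\; \beta=\alpha/(2\sigma^{2})\bigr) \\
  &= \sigma^{1-\alpha}\,2^{\frac{\alpha-1}{2}}\,
     \alpha^{-\frac{\alpha+1}{2}}\,\Gamma\!\left(\tfrac{\alpha+1}{2}\right), \\[4pt]
H_{\alpha}
  &= \frac{1}{1-\alpha}\log\int_0^{\infty} g^{\alpha}(x;\sigma)\,dx \\
  &= \log\sigma - \tfrac{1}{2}\log 2
     + \frac{1}{1-\alpha}\left[\log\Gamma\!\left(\tfrac{\alpha+1}{2}\right)
     - \tfrac{\alpha+1}{2}\log\alpha\right].
\end{align*}
```

Letting $\alpha \to 1$ recovers the Shannon entropy of the Rayleigh distribution, $1 + \log(\sigma/\sqrt{2}) + \gamma_E/2$, where $\gamma_E$ is Euler's constant, which is a useful consistency check for any extended formula that should reduce to the baseline.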
9. Order Statistics
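Let $X_{1:n} \le X_{2:n} \le \cdots \le X_{n:n}$ denote the order statistics from an i.i.d. sample of size $n$ drawn from a continuous distribution with probability density function (PDF) $f(x)$ and cumulative distribution function (CDF) $F(x)$.
The PDF of the $k$-th order statistic is given by:
\[ f_{k:n}(x) = \frac{n!}{(k-1)!\,(n-k)!}\, [F(x)]^{k-1}\, [1-F(x)]^{n-k}\, f(x). \]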
If
, then
When $k = 1$, we obtain the PDF of the smallest observation in the sample:
\[ f_{1:n}(x) = n\, f(x)\, [1-F(x)]^{n-1}. \]
For the rbt-R distribution, this becomes:
When $k = n$, we obtain the PDF of the largest observation in the sample:
\[ f_{n:n}(x) = n\, f(x)\, [F(x)]^{n-1}. \]
For the rbt-R distribution, this becomes
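The general order-statistic density can be evaluated for any PDF/CDF pair. The Python sketch below is illustrative; `order_stat_pdf` and the Rayleigh helpers are hypothetical names, and the Rayleigh baseline is used only because the minimum of $n$ Rayleigh($\sigma$) variables is again Rayleigh with scale $\sigma/\sqrt{n}$, which gives an exact check. The rbt-R density and CDF could be passed in the same way.

```python
import math

def order_stat_pdf(x, k, n, pdf, cdf):
    """Density of the k-th order statistic in an i.i.d. sample of size n."""
    coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    big_f = cdf(x)
    return coef * big_f ** (k - 1) * (1.0 - big_f) ** (n - k) * pdf(x)

def rayleigh_pdf(x, sigma):
    return (x / sigma ** 2) * math.exp(-x * x / (2.0 * sigma ** 2))

def rayleigh_cdf(x, sigma):
    return 1.0 - math.exp(-x * x / (2.0 * sigma ** 2))

# Density of the sample minimum (k = 1) and maximum (k = n) at x = 1.
sigma, n = 2.0, 5
f_min = order_stat_pdf(1.0, 1, n, lambda t: rayleigh_pdf(t, sigma),
                       lambda t: rayleigh_cdf(t, sigma))
f_max = order_stat_pdf(1.0, n, n, lambda t: rayleigh_pdf(t, sigma),
                       lambda t: rayleigh_cdf(t, sigma))
```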
10. Maximum Likelihood Estimation
Let denote a random sample drawn from the record-based transmuted Rayleigh distribution of order 3, with parameters and , subject to the constraint .
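The likelihood function corresponding to this sample, with observations $x_1, \ldots, x_n$ and rbt-Rayleigh PDF $f$ as in Equation (5), is given by:
\[ L(\theta) = \prod_{i=1}^{n} f(x_i; \theta). \]
Taking logarithms, we obtain the log-likelihood function:
\[ \ell(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta). \]
To estimate $\theta = (\sigma, a, b)$, we differentiate $\ell(\theta)$ with respect to each parameter and set the score equations to zero:
\[ \frac{\partial \ell}{\partial \sigma} = 0, \qquad \frac{\partial \ell}{\partial a} = 0, \qquad \frac{\partial \ell}{\partial b} = 0. \]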
In general, this nonlinear system has no closed-form solution, and numerical methods such as the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm or Newton–Raphson are employed to maximize the log-likelihood subject to the parameter constraints.
Under standard regularity conditions, the maximum likelihood estimator is asymptotically normal. Specifically:
\[ \sqrt{n}\,(\hat{\theta} - \theta_0) \xrightarrow{d} N_3\!\left(0,\, I^{-1}(\theta_0)\right), \]
where $\theta_0$ is the true value of $\theta = (\sigma, a, b)$, and $I(\theta)$ is the Fisher information matrix:
\[ I(\theta) = -E\!\left[\frac{\partial^{2} \log f(X;\theta)}{\partial\theta\,\partial\theta^{\top}}\right]. \]
To confirm this result, we verify that the usual regularity conditions are satisfied:
Interior Point: The true parameter $\theta_0$ lies in the interior of the parameter space $\Theta$.
Differentiability: The log-likelihood is continuously differentiable on $\Theta$ for all $x > 0$.
Identifiability: The model is identifiable since each parameter combination yields a distinct density.
Fisher Information: The matrix $I(\theta)$ exists and is positive definite.
Finite Expectations: Expectations involving the first and second derivatives are finite due to the exponential tail of the density.
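To clarify notation, the gradient $\nabla \ell(\theta)$ and the Hessian $H(\theta)$ are given by:
\[ \nabla \ell(\theta) = \left(\frac{\partial \ell}{\partial \sigma},\ \frac{\partial \ell}{\partial a},\ \frac{\partial \ell}{\partial b}\right)^{\top}, \qquad H(\theta) = \left[\frac{\partial^{2} \ell}{\partial\theta_j\,\partial\theta_k}\right]_{j,k=1}^{3}. \]
The explicit expressions for the second-order partial derivatives used in this Hessian matrix are provided in Appendix A.
The observed information matrix is:
\[ J(\hat{\theta}) = -H(\hat{\theta}). \]
Its inverse gives the estimated variance–covariance matrix:
\[ \widehat{\operatorname{Var}}(\hat{\theta}) = J^{-1}(\hat{\theta}). \]
Approximate $100(1-\alpha)\%$ confidence intervals are constructed as:
\[ \hat{\theta}_j \pm z_{\alpha/2} \sqrt{\left[J^{-1}(\hat{\theta})\right]_{jj}}, \qquad j = 1, 2, 3, \]
where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.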
This formulation allows practitioners to assess parameter uncertainty and construct valid confidence intervals. Empirical examples using this procedure are provided in the following sections.
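The Wald-type interval construction above reduces to a few lines once a variance–covariance matrix is available. The Python sketch below is illustrative: `wald_intervals` is a hypothetical helper, and the estimates and covariance entries in the usage lines are made-up numbers, not values from the paper's tables.

```python
import math
from statistics import NormalDist

def wald_intervals(estimates, cov, level=0.95):
    """Approximate confidence intervals: estimate +/- z * sqrt(diagonal of cov).

    `cov` is the estimated variance-covariance matrix (inverse observed
    information), given as a list of rows.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # e.g. about 1.96 for 95%
    intervals = []
    for j, est in enumerate(estimates):
        se = math.sqrt(cov[j][j])
        intervals.append((est - z * se, est + z * se))
    return intervals

# Hypothetical three-parameter fit (sigma, a, b) with a made-up covariance matrix.
theta_hat = [1.0, 0.6, 0.2]
cov_hat = [[0.04, 0.00, 0.00],
           [0.00, 0.01, 0.00],
           [0.00, 0.00, 0.0025]]
cis = wald_intervals(theta_hat, cov_hat)
```

Because these intervals are symmetric, they can extend outside the admissible parameter region when an estimate lies close to a constraint boundary; in that case a transformed scale or a profile-likelihood interval may be preferable.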
11. Application to Real Data
To evaluate the practical performance of the proposed record-based transmuted Rayleigh distribution of order 3 (rbtR), we fitted it to four real-world datasets. For comparison purposes, we also fitted the Rayleigh distribution (R), the transmuted Rayleigh distribution (T-R), the generalized Rayleigh distribution (G-R) as introduced in Surles and Padgett [27], and the transmuted generalized Rayleigh distribution (TGR). Model comparison was conducted using multiple goodness-of-fit criteria, namely the Akaike Information Criterion (AIC), the corrected Akaike Information Criterion (AICc), the Bayesian Information Criterion (BIC), and the Kolmogorov–Smirnov distance (KS). Additionally, overlay plots of the estimated probability density functions and cumulative distribution functions were provided to facilitate a visual assessment of fit quality.
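The four comparison criteria can be computed from a maximized log-likelihood, the number of parameters, the sample size, and a fitted CDF. The Python sketch below uses the standard textbook definitions; the function names are hypothetical and the numbers in the usage lines are invented for illustration.

```python
import math

def information_criteria(loglik, k, n):
    """AIC, small-sample corrected AICc, and BIC for a model with k parameters
    fitted to n observations with maximized log-likelihood `loglik`."""
    aic = 2.0 * k - 2.0 * loglik
    aicc = aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)
    bic = k * math.log(n) - 2.0 * loglik
    return aic, aicc, bic

def ks_distance(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and
        # just before each jump.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d

# Illustrative values only.
aic, aicc, bic = information_criteria(loglik=-100.0, k=3, n=50)
d = ks_distance([0.25, 0.5, 0.75], lambda x: x)  # uniform(0, 1) reference CDF
```

Lower values indicate a better fit for all four quantities. AICc and BIC penalize the extra parameters of a three-parameter model more heavily than AIC, which matters when comparing against the one-parameter Rayleigh baseline.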
11.1. Dataset 1: Nicotine Yields (FTC, 1994)
Source: Federal Trade Commission (FTC), Cigarette Yields Report, 1994 (EconDataUS).
According to the FTC’s documentation, nicotine yields were measured using the Cambridge Filter Method—an approach the agency has endorsed since 1967 to ensure consistency across cigarette brands. The measurements, expressed in milligrams per cigarette, were rounded to the nearest 0.1 mg.
Data were collected from various manufacturers across more than 50 locations in the United States. The report includes results from the five dominant companies at the time: Philip Morris, R. J. Reynolds, Lorillard, Brown & Williamson, and Liggett Group. In the case of lesser-known brands, data were often submitted directly by the manufacturers, following FTC guidelines.
In addition to yield values, the report outlines sample collection protocols and standard smoking conditions, such as the 23 mm smoked butt length, which contribute to the reproducibility of the data. Previous studies by Sloan and Sublett [28] and Schultz and Spears [29] further support the accuracy of the laboratory techniques applied.
This dataset serves as a useful illustration for examining how well the proposed rbt-Rayleigh distribution performs in practice. Descriptive statistics are provided in Table 4, while parameter estimates and model selection criteria, obtained via maximum likelihood estimation, are summarized in Table 5 and Table 6.
A histogram of the observed nicotine yields is presented in Figure 4, while the empirical versus fitted CDFs are displayed in Figure 5. The log-likelihood contour plot, with $\sigma$ fixed at its MLE, is shown in Figure 6.
The observed Fisher information matrix (i.e., the negative of the Hessian matrix of the log-likelihood evaluated at the MLEs) under the rbt-Rayleigh distribution is estimated as:
The inverse of this matrix, denoted by
, provides the estimated variance–covariance matrix of the maximum likelihood estimators (MLEs):
Based on this, the approximate 95% confidence intervals for the parameters $\sigma$, $a$, and $b$ are computed as:
11.2. Dataset 2: Carbon Monoxide Emissions (FTC, 2007)
Source: U.S. Federal Trade Commission (FTC), “Nicotine, Tar, and CO Content of Domestic Cigarettes in 2007—Regular Brands, sorted by nicotine, tar, and CO.” Available at: https://www.econdataus.com/cigrs.html, accessed on 23 June 2025.
This dataset reports carbon monoxide (CO) emissions per cigarette, measured in milligrams, as published by the FTC in 2007. The data were extracted from the publicly available table titled “Regular Brands, sorted by CO”, which presents standardized yield values for a wide range of domestic cigarette brands.
Measurements were obtained using the Cambridge Filter Method, a standardized laboratory technique recommended by the FTC to ensure comparability across brands. CO emission values are rounded to the nearest 0.1 mg and cover major tobacco manufacturers in the U.S. market.
The distribution of CO emissions exhibits moderate right skew due to a small number of high-emission brands. The proposed rbt-Rayleigh model fits the observed data closely, capturing the distributional shape more effectively than alternative models. Among the models evaluated, it achieves the lowest Kolmogorov–Smirnov (KS) distance (0.037), indicating superior goodness-of-fit.
Descriptive statistics for CO emissions are presented in Table 7. Parameter estimates and log-likelihood values for the considered models are reported in Table 8, while goodness-of-fit criteria are summarized in Table 9. A histogram of the observed CO emission values is shown in Figure 7, the empirical versus fitted CDFs are displayed in Figure 8, and the log-likelihood contour plot, with $\sigma$ fixed at its MLE, is provided in Figure 9.
The estimated variance–covariance matrix of the MLEs under the rbt-R distribution is given by:
11.3. Dataset 3: Carbon-Fibre Breaking Stress (50 mm Gauge)
The carbon-fibre breaking stress values analyzed in this study correspond to tensile strength measurements (in GPa) collected from fibres with a gauge length of 50 mm, as reported by Lishamol and Jiju [30]. These measurements were obtained under controlled conditions from production samples, in accordance with standard testing procedures used to ensure that the fibres meet the necessary strength requirements for composite applications.
Of particular interest is the lower tail of the strength distribution—especially the first percentile—as reductions in this region may signal declining fibre quality and compromise the structural integrity of the resulting composite material.
This dataset serves as the third case study in our analysis (see Table 10). Descriptive statistics are visualized in Figure 10, and the empirical versus fitted CDFs are displayed in Figure 11. The proposed three-parameter rbt-Rayleigh distribution demonstrates a superior fit, achieving the lowest AIC, AICc, BIC, and KS values across competing models (Table 11 and Table 12). The corresponding log-likelihood surface (Figure 12) indicates a single well-defined maximum.
The estimated variance–covariance matrix of the MLEs under the rbt-R distribution is given by:
11.4. Dataset 4: Carbon-Fibre Breaking Stress (20 mm Gauge)
This dataset consists of tensile strength measurements for carbon fibres tested at a gauge length of 20 mm, as reported by Badar and Priest [31]. The strength values, expressed in gigapascals (GPa), were obtained under controlled laboratory conditions.
Using a shorter gauge length than that in Dataset 3 reduces the likelihood of encountering surface flaws in the tested segment. As a result, the measured strengths in this dataset tend to be slightly higher.
This distinction in testing setup provides a good opportunity to examine how the proposed rbt-Rayleigh distribution performs when the data come from a similar material but under different conditions. The observed breaking stress values for this dataset are presented in Table 13.
A summary of the descriptive statistics is given in Table 14, while parameter estimates and goodness-of-fit results appear in Table 15 and Table 16. Once again, the rbt-R model provides the best fit, outperforming all four competing distributions across all evaluation metrics (Table 15 and Table 16). This superiority is further supported by the graphical diagnostics shown in Figure 13, Figure 14 and Figure 15.
The estimated variance–covariance matrix of the MLEs under the rbt-R distribution is given by:
11.5. Summary of Results Across Datasets
The rbt-R distribution provides the best overall fit according to AIC, AICc, BIC, and KS criteria.
12. Random Sampling via Inverse Transform and Newton–Raphson
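To investigate the finite-sample properties of the rbt-R maximum likelihood estimators, synthetic data are generated using a combination of the inverse-transform method and Newton–Raphson root-finding. The procedure is as follows:
1. Fix the true parameter vector $\theta = (\sigma, a, b)$, set the sample size $n$, and choose an initial value $x^{(0)}$ (we use the Rayleigh quantile approximation below).
2. For each $i = 1, \ldots, n$, draw $u_i \sim U(0,1)$.
3. Compute
\[ x_i^{(0)} = \sigma \sqrt{-2 \ln(1 - u_i)}, \]
which provides a Rayleigh-based initial guess.
4. Solve the equation
\[ F(x) = u_i \]
iteratively via
\[ x_i^{(k+1)} = x_i^{(k)} - \frac{F\big(x_i^{(k)}\big) - u_i}{f\big(x_i^{(k)}\big)}, \]
where $f$ and $F$ denote the rbt-R density and CDF, respectively. Iteration stops when $\big|x_i^{(k+1)} - x_i^{(k)}\big| < \varepsilon$ for a small tolerance $\varepsilon$, or after 50 steps, whichever occurs first.
5. Upon convergence, set $x_i = x_i^{(k+1)}$. Repeat steps 2–5 for all $i$ to obtain the simulated sample $x_1, \ldots, x_n$.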
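The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the density and CDF use a mixture of the first three upper records of the Rayleigh baseline with hypothetical weights `(p1, p2, p3)` summing to one (the link to the paper's $a$ and $b$ is not reproduced here), the tolerance `1e-10` is an arbitrary choice, and a bisection fallback has been added to the plain Newton–Raphson update so that an overshoot into a region of near-zero density cannot derail the iteration.

```python
import math
import random

def rbt3_cdf(x, sigma, w):
    """Assumed CDF: mixture of the first three upper Rayleigh records."""
    p1, p2, p3 = w
    if x <= 0.0:
        return 0.0
    h = x * x / (2.0 * sigma * sigma)
    return 1.0 - math.exp(-h) * (1.0 + (p2 + p3) * h + 0.5 * p3 * h * h)

def rbt3_pdf(x, sigma, w):
    """Derivative of rbt3_cdf with respect to x."""
    p1, p2, p3 = w
    if x <= 0.0:
        return 0.0
    h = x * x / (2.0 * sigma * sigma)
    return (x / sigma ** 2) * math.exp(-h) * (p1 + p2 * h + 0.5 * p3 * h * h)

def rbt3_quantile_newton(u, sigma, w, tol=1e-10, max_iter=50):
    """Newton-Raphson solve of F(x) = u, started at the Rayleigh quantile."""
    x = sigma * math.sqrt(-2.0 * math.log1p(-u))  # Rayleigh-based initial guess
    lo, hi = x, 2.0 * x                           # the start lies at or left of the root
    while rbt3_cdf(hi, sigma, w) < u:
        hi *= 2.0
    for _ in range(max_iter):
        fx = rbt3_cdf(x, sigma, w) - u
        if fx < 0.0:
            lo = x
        else:
            hi = x
        d = rbt3_pdf(x, sigma, w)
        x_new = x - fx / d if d > 0.0 else 0.5 * (lo + hi)
        if not lo <= x_new <= hi:                 # safeguard: fall back to bisection
            x_new = 0.5 * (lo + hi)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def rbt3_sample(n, sigma, w, seed=None):
    """Inverse-transform sample of size n."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:                               # avoid the degenerate draw u = 0
            out.append(rbt3_quantile_newton(u, sigma, w))
    return out

sample = rbt3_sample(200, 1.5, (0.5, 0.3, 0.2), seed=1)
```

The Rayleigh start is a natural choice under this construction because the transmuted CDF never exceeds the baseline CDF, so the initial guess sits at or to the left of the target quantile.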
13. Monte Carlo Experiment
We evaluated the performance of the estimator at the true parameter values , in the sample sizes
For each n, we perform independent replications. In each replication:
Generate a sample of size n using the inverse-Newton method described above;
Obtain the MLEs via constrained optimization (L–BFGS–B);
Store the estimated triplet.
For each parameter
, we compute:
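The summary formulas are not reproduced in the text above. A common choice in Monte Carlo studies of this kind, assumed here purely for illustration, is to report the average estimate, the bias, and the mean squared error over the replications. The Python sketch below shows that calculation for one parameter; `mc_summary` is a hypothetical helper and the input values are invented.

```python
import math
from statistics import fmean

def mc_summary(estimates, true_value):
    """Average estimate, bias, MSE, and RMSE over Monte Carlo replications."""
    mean_est = fmean(estimates)
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return {
        "mean": mean_est,
        "bias": mean_est - true_value,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }

# Four made-up replications of an estimator whose true value is 1.0.
summary = mc_summary([0.9, 1.1, 1.0, 1.2], true_value=1.0)
```

Applying the same function to the stored triplets, one parameter at a time and for each sample size, would show whether bias and MSE shrink as $n$ grows, which is the behaviour expected of a consistent estimator.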
14. Conclusions
In this work, we present a new three-parameter extension of the classical Rayleigh distribution by applying the record-based transmuted-G (RBT-G) distribution of order 3, originally introduced by Balakrishnan and He. The resulting model, referred to as the rbt-Rayleigh distribution of order 3, offers increased flexibility to capture skewness and heavy-tailed behavior while retaining analytical tractability. Several analytical properties of the proposed distribution are derived, including the r-th raw moments, the moment generating function, the harmonic mean, the mean deviations, the Rényi entropy, the quantile function, and the order statistics. The model parameters are estimated using the maximum likelihood method, implemented via numerical optimization in the R programming environment.
To assess the model’s performance, the rbt-Rayleigh distribution is applied to four empirical datasets: two related to cigarette composition (nicotine content and carbon monoxide emissions), and two concerning carbon-fibre tensile strength. A comparative analysis is conducted using standard goodness-of-fit criteria: Akaike Information Criterion (AIC), corrected Akaike Information Criterion (AICc), Bayesian Information Criterion (BIC), and the Kolmogorov–Smirnov (KS) statistic. In all cases, the rbt-Rayleigh model demonstrates a superior fit relative to the classical Rayleigh, transmuted Rayleigh, generalized Rayleigh, and transmuted generalized Rayleigh distributions.