A Bounded Sine Skewed Model for Hydrological Data Analysis

Hussain, Tassaddaq; Shakil, Mohammad; Ahsanullah, Mohammad; Kibria, Bhuiyan Mohammad Golam

doi:10.3390/analytics4030019


by Tassaddaq Hussain ¹, Mohammad Shakil ²,*, Mohammad Ahsanullah ³ and Bhuiyan Mohammad Golam Kibria ⁴

¹ Department of Mathematics, Mirpur University of Science and Technology (MUST), Mirpur 10250, Pakistan

² Department of Mathematics, Miami Dade College, Hialeah, FL 33012, USA

³ Department of Management Sciences, Rider University, Lawrenceville, NJ 08648, USA

⁴ Department of Math & Stat, Florida International University, Miami, FL 33199, USA

* Author to whom correspondence should be addressed.

Analytics 2025, 4(3), 19; https://doi.org/10.3390/analytics4030019

Submission received: 16 May 2025 / Revised: 16 July 2025 / Accepted: 8 August 2025 / Published: 13 August 2025


Abstract

Hydrological time series frequently exhibit periodic trends, with variables such as rainfall, runoff, and evaporation rates often following annual cycles. Seasonal variations further contribute to the complexity of these data sets. A critical aspect of analyzing such phenomena is estimating realistic return intervals, making the precise determination of these values essential. Given this importance, selecting an appropriate probability distribution is paramount. To address this need, we introduce a flexible probability model specifically designed to capture periodicity in hydrological data. We thoroughly examine its fundamental mathematical and statistical properties, including the asymptotic behavior of the probability density function (PDF) and hazard rate function (HRF), to enhance predictive accuracy. Our analysis reveals that the PDF exhibits polynomial decay as $x \to \infty$, ensuring heavy-tailed behavior suitable for extreme events. The HRF demonstrates decreasing or non-monotonic trends, reflecting variable failure risks over time. Additionally, we conduct a simulation study to evaluate the performance of the estimation method. Based on these results, we refine return period estimates, providing more reliable and robust hydrological assessments. This approach ensures that the model not only fits observed data but also captures the underlying dynamics of hydrological extremes.

Keywords:

climate change; hazard rate function; link function; goodness-of-fit statistics; prediction; data analysis

MSC:

60E05; 62E15; 62F10

1. Introduction

Heavy-tailed distributions are widely studied in fields like actuarial science, engineering, and environmental science [1,2]. Recent hydrological research emphasizes improving tail behavior modeling to better predict extreme events such as floods, heavy rainfall, and glacier melt [3,4,5,6]. Applying heavy-tailed models to flood frequency analysis has proven effective in capturing the probability of rare, severe floods [7]. New distribution families, including those based on sine trigonometric functions, have been developed to enhance risk assessment and refine return period estimates [8,9]. The accurate forecasting of floods is crucial for risk management, especially given the periodic trends in rainfall and runoff. Using flexible, heavy-tailed distributions helps prevent the underestimation of extreme events, safeguarding infrastructure and public safety. Hydrological and environmental time series often display heavy-tailed, unimodal, and right-skewed traits [10]. These are usually modeled as independent stochastic processes, although their true distributions are unknown, leading to reliance on simplified models that may not capture real data complexities. Recent research highlights the need for advanced distribution families to better represent extreme events [11]. Modern techniques, including machine learning, help identify long-term trends, periodicities, and climate change impacts on floods, which are vital for risk assessment, early warnings, and infrastructure resilience planning amid rising flood risks [12,13,14]. Recent advances in probability distributions have introduced flexible families to address various statistical challenges [1]. The accurate estimation of return periods is crucial for designing hydraulic structures and managing flood risks. 
Flood time series analyses highlight the importance of modeling temporal variability and extremes using advanced statistical and machine learning techniques to identify trends, periodicities, and climate impacts [11,13]. These improve flood risk assessment and early warning systems. Flood frequency analysis (FFA), involving data screening, periodicity analysis, distribution selection, and parameter estimation, relies on modern methods to better capture complex flood behaviors influenced by climate and human factors [1,9,10,11,13,15,16,17,18,19,20,21]. A key challenge in FFA is the uncertainty in identifying the true probability distribution of flood events at specific sites; moreover, conventional FFA does not incorporate machine learning or heavy-tailed models [12]. Traditional models such as the Exponential, Log-Normal, Pearson Type-III, Weibull, Fréchet, and Gumbel [19,22,23,24] often struggle to capture complex hydrological features like seasonality, multiple peaks, or bounded extremes [25]. For example, the Gumbel distribution tends to underestimate tail risks, limiting its reliability [13]. While some studies [26,27] have incorporated sine functions to model seasonality, these approaches are limited in capturing tail behavior and complex flood dynamics. Similarly, sine-transformed distributions such as Sine-Weibull, Sine-Lomax, Sine-Exponential, and Sine-Burr [28,29,30,31,32] lack strong theoretical motivation and often fail to accurately model tail extremes and bounded or multimodal data [33]. Overall, these models fall short when data exhibit seasonality, multiple peaks, or bounded periodic extremes, highlighting the need for more flexible and theoretically grounded distribution families to improve flood risk assessment. To overcome these challenges, this section introduces the Sine-Skewed Distribution (SSD), a novel approach designed to enhance flood frequency modeling.
The SSD offers several key advantages: (i) improved tail flexibility to better capture extreme events, (ii) boundedness for realistic hydrological constraints, (iii) accommodation of periodic behavior in time series data, and (iv) the preservation of heavy-tailed characteristics where needed. We derive the fundamental properties of the SSD, including its Probability Density Function (PDF), Hazard Rate Function (HRF), and Quantile Function. Furthermore, we examine both its parametric and asymptotic behavior to ensure robust performance in hydrological applications. This comprehensive analysis positions the SSD as a promising alternative to traditional distributions in FFA.

Derivation of the Proposed Model

The derivation of the proposed model is based on the following steps:

The first step in this direction is the choice of the odd link function $D (\cdot)$, defined as $D (x; Θ) = \frac{1 - G (x)}{G (x)}$, which satisfies the following conditions: (i) $D (\cdot)$ is differentiable and monotonically non-increasing, and (ii) $D (x) \to a$ as $x \to 0$ and $D (x) \to b$ as $x \to \infty$, where $G (x)$ is the baseline cumulative distribution function (CDF).
Now, take the CDF of the log-logistic distribution with parameters $(θ, α)$ as the baseline function, which is defined on the interval $(0, \infty)$ as $G (x | α, θ) = \frac{x^{θ}}{x^{θ} + α^{θ}}$.
On incorporating the baseline distribution function into $D (x | α, θ)$, we get $D (x | α, θ) = \left(\frac{α}{x}\right)^{θ}$.
In order to address the common periodic fluctuations in hydrological time series, as highlighted in [19], we used the CDF of [34], $P (x) = \sin\left(\frac{π}{2} G (x | α, θ)\right)$, to generate a new class of distributions by modifying trigonometric functions, in particular by transforming the sine function into the Sin-$G$ class.
Finally, a new exponentiated Sin-$D$ class is proposed by first substituting $G (x | α, θ)$ into $D (x | α, θ)$ and then imposing the domain constraints needed to guarantee the function’s validity as a CDF. The CDF of this new family is

$F (x; α, β, θ) = \left(1 - \sin\left(\frac{π}{2} D (x | α, θ)\right)\right)^{β}, \quad x \geq α,$

(1)

which on incorporating $D (x | α, θ)$ in Equation (1) reduces to

$F_{SSD} (x; α, θ, β) = \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β}, \quad x > α,$

(2)

and its survival function becomes

$S_{SSD} (x; α, θ, β) = 1 - \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β}, \quad x > α,$

(3)

with a corresponding PDF as

$f (x; α, θ, β) = \frac{π α β θ}{2 x^{2}} \left(\frac{α}{x}\right)^{θ - 1} \cos\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right) \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β - 1}, \quad x > α, \; α, β > 0, \; θ > 0 .$

(4)

where $α$, $θ$ and $β$ are the parameters of the distribution. The parameter $α$ sets the lower bound of the support, so increasing it shifts the distribution to the right; $θ$ affects the rate at which the CDF approaches 1; and $β$ influences the overall shape and tail behavior. The PDF in Equation (4), which involves positive powers and cosine functions, is a skewed function whose shape depends on the parameters. For example, if $β > 1$, the distribution tends to be more concentrated near the lower bound $α$; similarly, if $β < 1$, the SSD might have a longer tail on the right. However, the oscillatory sine and cosine functions can produce complex shapes, potentially leading to negative skewness in some parameter regimes.
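Equations (2)–(4) translate directly into code. The following is a minimal sketch, assuming NumPy and SciPy are available; the function names `ssd_cdf`, `ssd_sf` and `ssd_pdf` and the parameter values are hypothetical choices made only for illustration. It also checks numerically that the density integrates to one and that it agrees with a finite-difference derivative of the CDF.

```python
import numpy as np
from scipy.integrate import quad


def ssd_cdf(x, alpha, theta, beta):
    """CDF of Equation (2); equals 0 for x <= alpha."""
    x = np.asarray(x, dtype=float)
    xs = np.maximum(x, alpha)                 # keep alpha/x <= 1 below the support
    u = 0.5 * np.pi * (alpha / xs) ** theta
    return np.where(x > alpha, (1.0 - np.sin(u)) ** beta, 0.0)


def ssd_sf(x, alpha, theta, beta):
    """Survival function of Equation (3)."""
    return 1.0 - ssd_cdf(x, alpha, theta, beta)


def ssd_pdf(x, alpha, theta, beta):
    """PDF of Equation (4); equals 0 for x <= alpha."""
    x = np.asarray(x, dtype=float)
    xs = np.maximum(x, alpha)
    ratio = alpha / xs
    u = 0.5 * np.pi * ratio ** theta
    with np.errstate(divide="ignore", invalid="ignore"):
        dens = (np.pi * alpha * beta * theta / (2.0 * xs ** 2)
                * ratio ** (theta - 1.0)
                * np.cos(u) * (1.0 - np.sin(u)) ** (beta - 1.0))
    return np.where(x > alpha, dens, 0.0)


# Hypothetical parameter values, chosen only for illustration.
alpha, theta, beta = 1.0, 2.0, 1.5

# The density should integrate to one over (alpha, infinity).
total_mass, _ = quad(lambda t: float(ssd_pdf(t, alpha, theta, beta)), alpha, np.inf)

# A central difference of the CDF should reproduce the PDF.
x0, h = 2.0, 1e-5
fd_slope = (ssd_cdf(x0 + h, alpha, theta, beta)
            - ssd_cdf(x0 - h, alpha, theta, beta)) / (2.0 * h)
```

The two checks are only sanity tests of the transcription of Equation (4); they do not replace the analytical derivation.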

The graphs show how different parameter settings affect the SSD’s PDF. The left plot as shown in Figure 1 covers a wide range and reveals predominantly right-skewed distributions peaking at lower values. The right plot, focusing on a narrower range, also demonstrates parameter-driven variations in shape and skewness. Overall, changing parameters significantly impact the distribution, underscoring the importance of proper parameter selection for accurate SSD modeling.

The remainder of the manuscript is structured in the following manner: In Section 2, an exploration of mathematical and statistical features is carried out. Section 3 is devoted to the comparison of methods of parameter estimation like maximum likelihood (MLE), Bayesian method of estimation (BME) and the L-moments estimation (LME) method along with a simulation study. In Section 4, flood data applications are studied and an analysis is made on the basis of goodness-of-fit measures, and the conclusions are presented in Section 5.

2. Exploring Mathematical and Statistical Features

In this section, we delve into the mathematical and statistical features that characterize the SSD-$G$ distribution class. We will analyze concepts such as PDF and HRF curve behaviors, quantiles, the moment-generating function, various moments (including conditional moments), mean deviation, Bonferroni and Lorenz measures, and order statistics.

2.1. Shape of the PDF and HRF Curves

Suppose $X$ follows the SSD distribution with the PDF defined in Equation (4); then, the logarithmic form of the density is expressed as

$\begin{aligned} \log f (x; α, β, θ) = {} & \log\left(\frac{π α β θ}{2}\right) - 2 \log x + (θ - 1) \log\left(\frac{α}{x}\right) + \log \cos\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right) \\ & + (β - 1) \log\left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right) . \end{aligned}$

Differentiating both sides of this equation with respect to $x$ and equating the result to zero gives the mode equation, which is solved numerically as portrayed in Table 1:

$\begin{aligned} \frac{d \log f (x; α, β, θ)}{d x} = {} & - \frac{2}{x} - \frac{θ - 1}{x} + \frac{π α θ (β - 1)}{2 x^{2}} \left(\frac{α}{x}\right)^{θ - 1} \frac{\cos\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)}{1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)} \\ & + \frac{π α θ}{2 x^{2}} \left(\frac{α}{x}\right)^{θ - 1} \tan\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right) = 0 . \end{aligned}$

Here, $α$ scales the mode linearly, and larger $α$ values shift the mode rightward. $θ$ governs tail behavior, and higher $θ$ values push the mode closer to $α$. $β$ controls skewness: for $β > 1$, the mode shifts right; for $β < 1$, left. Moreover, $x \to α^{+}$ implies $f (x) \to 0$ and hence $\log f (x) \to - \infty$. Similarly, as $x \to \infty$, Equation (4) gives $f (x) \sim \frac{π α^{θ} β θ}{2} x^{- (θ + 1)}$ and $\frac{d \log f (x)}{d x} \sim - \frac{θ + 1}{x}$, so the mean is finite only for $θ > 1$ and the variance is infinite for $θ \leq 2$; this polynomial tail is useful for modeling extreme events. The SSD distribution’s mode is analytically intractable but can be reliably computed numerically; see Table 1. The mode lies in the region $x > α$ and responds predictably to parameter changes, making it useful for modeling skewed, heavy-tailed data.
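Because the mode equation has no closed-form solution, a numerical search is needed. The sketch below is one possible way to do this, assuming SciPy's bounded scalar minimizer applied to $- \log f$; the helper names, the search interval `(alpha, 50 * alpha)` and the parameter values are illustrative assumptions, not the procedure used to produce Table 1.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def ssd_logpdf(x, alpha, theta, beta):
    """Logarithm of the SSD density in Equation (4), valid for x > alpha."""
    u = 0.5 * np.pi * (alpha / x) ** theta
    return (np.log(0.5 * np.pi * alpha * beta * theta) - 2.0 * np.log(x)
            + (theta - 1.0) * np.log(alpha / x)
            + np.log(np.cos(u)) + (beta - 1.0) * np.log1p(-np.sin(u)))


def ssd_dlogpdf(x, alpha, theta, beta):
    """Left-hand side of the mode equation, d log f / dx."""
    u = 0.5 * np.pi * (alpha / x) ** theta
    w = np.pi * alpha * theta * (alpha / x) ** (theta - 1.0) / (2.0 * x ** 2)
    return (-2.0 / x - (theta - 1.0) / x
            + (beta - 1.0) * w * np.cos(u) / (1.0 - np.sin(u))
            + w * np.tan(u))


def ssd_mode(alpha, theta, beta, upper_factor=50.0):
    """Numerical mode: maximise log f on (alpha, upper_factor * alpha).

    The upper search limit is an assumption; widen it if the optimum
    lands on the boundary.
    """
    res = minimize_scalar(lambda x: -ssd_logpdf(x, alpha, theta, beta),
                          bounds=(alpha * (1.0 + 1e-9), alpha * upper_factor),
                          method="bounded")
    return res.x


# Hypothetical parameter values, chosen only for illustration.
mode_a1 = ssd_mode(1.0, 2.0, 2.0)
mode_a2 = ssd_mode(2.0, 2.0, 2.0)   # should be about twice mode_a1
```

Since $X / α$ has a distribution that does not depend on $α$, doubling $α$ should double the mode, which gives a simple check on the search.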

Definition 1.

The hazard rate function (failure rate) of the Sine-Skewed Distribution (SSD) is given by

$h (x; α, β, θ) = \frac{f (x; α, β, θ)}{1 - F (x; α, β, θ)},$

which on the substitution of Equations (3) and (4) yields

$h (x; α, β, θ) = \frac{\frac{π α β θ}{2 x^{2}} \left(\frac{α}{x}\right)^{θ - 1} \cos\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right) \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β - 1}}{1 - \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β}} .$

The SSD hazard rate is flexible, accommodating both decreasing and non-monotonic behaviors. As $x \to α^{+}$, $h (x) \to 0$. As $x \to \infty$, the survival function behaves like $S (x) \sim \frac{π β}{2} \left(\frac{α}{x}\right)^{θ}$ while $f (x) \sim \frac{π α^{θ} β θ}{2 x^{θ + 1}}$, so that $h (x) \sim \frac{θ}{x}$. For $β \geq 1$, $h (x)$ is decreasing (DFR), and for $β < 1$, $h (x)$ may be non-monotonic. It is heavy-tailed with $h (x) \to 0$ as $x \to \infty$. The parameter $β$ controls the monotonicity: $β < 1$ allows a potential initial increase (useful for early-life failures), whereas $β \geq 1$ gives a decreasing failure rate (common in wear-out processes). The SSD distribution is also suitable for modeling reliability data with varying failure patterns. Similarly, the logarithmic form of the HRF yields

$\begin{aligned} \log h (x) = {} & \log\left(\frac{π α β θ}{2}\right) - 2 \log x + (θ - 1) \log\left(\frac{α}{x}\right) + \log \cos\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right) \\ & + (β - 1) \log\left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right) - \log\left(1 - \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β}\right) . \end{aligned}$

Consequently, $\lim_{x \to α^{+}} \log h (x) = - \infty$, and $\log h (x) = \log θ - \log x + O (x^{- θ})$ as $x \to \infty$. For $β \geq 1$, $\log h (x)$ is decreasing, and for $β < 1$, $\log h (x)$ may have one maximum.
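The hazard rate of Definition 1 can be evaluated numerically as the ratio $f / S$. The sketch below is an illustration only, with hypothetical function and parameter names; it uses `log1p` and `expm1` so that the survival function in the denominator stays accurate far in the tail, where $1 - (1 - \sin (\cdot))^{β}$ would otherwise suffer from cancellation. Because $S (x) \approx \frac{π β}{2} (α / x)^{θ}$ for large $x$, the product $x \, h (x)$ should approach $θ$, which the last lines check.

```python
import numpy as np


def ssd_hazard(x, alpha, theta, beta):
    """Hazard rate h(x) = f(x) / S(x) for x > alpha."""
    x = np.asarray(x, dtype=float)
    u = 0.5 * np.pi * (alpha / x) ** theta
    log_g = np.log1p(-np.sin(u))                 # log(1 - sin u)
    pdf = (np.pi * alpha * beta * theta / (2.0 * x ** 2)
           * (alpha / x) ** (theta - 1.0)
           * np.cos(u) * np.exp((beta - 1.0) * log_g))
    surv = -np.expm1(beta * log_g)               # 1 - (1 - sin u)^beta, stable for large x
    return pdf / surv


# Hypothetical parameter values, chosen only for illustration.
alpha, theta, beta = 1.0, 2.0, 1.5

grid = np.linspace(1.01, 20.0, 200)
h_grid = ssd_hazard(grid, alpha, theta, beta)
x_at_max = grid[np.argmax(h_grid)]               # location of the largest hazard on the grid

x_far = 1.0e4
tail_ratio = x_far * float(ssd_hazard(x_far, alpha, theta, beta))   # close to theta
```

Plotting `h_grid` against `grid` for several values of $β$ is a quick way to inspect the shapes discussed above.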

Similarly, the HRFs mainly show right-skewed, unimodal shapes with higher densities at lower SSD values; see Figure 2. Some curves suggest initially increasing failure rates (IFRs) that later decrease, while others indicate decreasing failure rates (DFRs). Overall, the distributions highlight how parameter variations influence failure behaviors. Table 1 and Figure 3 and Figure 4 show the relationship between the modal values and the parameters $θ$ and $β$ for fixed $α$ values (0.5 and 1.0). For $α = 0.5$, the mode and the mode/$α$ ratio exhibit a consistent increase with $θ$ across varying $β$ values (1.0 to 2.0), indicating that higher $β$ values amplify skewness and shift the peak further from $α$. This trend is particularly evident in the mode/$α$ ratio, which normalizes the mode by the scale parameter, demonstrating the proportional shift in the distribution’s shape. For $α = 1.0$, the behavior of the mode becomes more complex: it initially rises with $θ$ but stabilizes or slightly declines at higher $θ$ values (e.g., $θ > 2.5$), suggesting that the influence of $θ$ diminishes as $α$ increases. The mode/$α$ ratio for $α = 1.0$ further supports this observation, highlighting the dominant role of $β$ in determining the distribution’s peak when $α$ is larger. These findings underscore the interplay between $θ$ and $β$ in shaping the SSD’s modal characteristics, where $θ$ controls the tail behavior and $β$ governs the skewness intensity, collectively influencing the location of the mode relative to the scale parameter $α$.

2.2. Percentile Function

Percentile functions bridge probabilistic flood risk with actionable engineering metrics, outperforming classical methods in robustness and interpretability, especially for climate-adjusted extremes. Let $X$ be a continuous random variable with a CDF $F : \mathbb{R} \to [0, 1]$. The percentile function $Q (u)$ identifies the value $x$ such that the probability of a random draw from the distribution being less than or equal to $x$ is equal to $u$. The SSD percentile function $x_{u} = Q (u) = F_{SSD}^{- 1} (u)$, i.e., the inverse of the CDF in Equation (2), is derived as follows:

Step 1: Set the CDF equal to $u$

$\left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β} = u, \quad u \in (0, 1) .$
Step 2: Take the $β$-th root of both sides

$1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right) = u^{1 / β} .$
Step 3: Isolate the sine term

$\sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right) = 1 - u^{1 / β} .$
Step 4: Take the inverse sine (arcsine) of both sides

$\frac{π}{2} \left(\frac{α}{x}\right)^{θ} = \arcsin\left(1 - u^{1 / β}\right) .$
Step 5: Solve for $\left(\frac{α}{x}\right)^{θ}$

$\left(\frac{α}{x}\right)^{θ} = \frac{2}{π} \arcsin\left(1 - u^{1 / β}\right) .$
Step 6: Take the $θ$-th root of both sides

$\frac{α}{x} = \left(\frac{2}{π} \arcsin\left(1 - u^{1 / β}\right)\right)^{1 / θ} .$
Step 7: Solve for $x$ to obtain the percentile function

$x_{u} = α \left(\frac{2}{π} \arcsin\left(1 - u^{1 / β}\right)\right)^{- 1 / θ} .$
Step 8: Final percentile function

$Q (u) = α \left(\frac{2}{π} \arcsin\left(1 - u^{1 / β}\right)\right)^{- 1 / θ}, \quad u \in (0, 1) .$

Thus, we can write it as

$x_{u} = Q (u) = α \left(\frac{π}{2}\right)^{\frac{1}{θ}} \left(\arcsin\left(1 - u^{\frac{1}{β}}\right)\right)^{- 1 / θ}$

(5)

where $Q (u) = F^{- 1} (u)$ denotes the percentile function of $F (x)$ for $u \in (0, 1)$. The median $\tilde{X} = x_{0.5}$ is given by

$\tilde{X} = α \left(\frac{π}{2}\right)^{\frac{1}{θ}} \left(\arcsin\left(1 - (0.5)^{\frac{1}{β}}\right)\right)^{- 1 / θ} .$

The skewness measure is the Bowley skewness, defined by

$SK = \frac{Q (\frac{3}{4}) + Q (\frac{1}{4}) - 2 Q (\frac{1}{2})}{Q (\frac{3}{4}) - Q (\frac{1}{4})} .$

On the other hand, the Moors kurtosis (Moors (1988)), based on quantiles, is given by

$KU = \frac{Q (\frac{7}{8}) - Q (\frac{5}{8}) + Q (\frac{3}{8}) - Q (\frac{1}{8})}{Q (\frac{6}{8}) - Q (\frac{2}{8})},$

where $Q (\cdot)$ represents the percentile function. The measures $SK$ and $KU$ possess the usual characteristics. So, the SSD is positively skewed and behaves as leptokurtic for $α > 1$ and platykurtic for $α < 1$, which can be visualized from Figure 5.
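The closed-form percentile function in Equation (5) makes quantile-based summaries and random-number generation straightforward. The following sketch, with hypothetical function names and illustrative parameter values, evaluates $Q (u)$, the Bowley skewness and the Moors kurtosis, and draws a sample by inverse-transform sampling (plugging uniform variates into $Q$).

```python
import numpy as np


def ssd_cdf(x, alpha, theta, beta):
    """CDF of Equation (2) for x > alpha."""
    u = 0.5 * np.pi * (alpha / np.asarray(x, dtype=float)) ** theta
    return (1.0 - np.sin(u)) ** beta


def ssd_quantile(u, alpha, theta, beta):
    """Percentile function Q(u) of Equation (5), for u in (0, 1)."""
    u = np.asarray(u, dtype=float)
    return alpha * ((2.0 / np.pi) * np.arcsin(1.0 - u ** (1.0 / beta))) ** (-1.0 / theta)


def bowley_skewness(alpha, theta, beta):
    q1, q2, q3 = ssd_quantile([0.25, 0.5, 0.75], alpha, theta, beta)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


def moors_kurtosis(alpha, theta, beta):
    e = ssd_quantile(np.arange(1, 8) / 8.0, alpha, theta, beta)   # Q(1/8), ..., Q(7/8)
    return (e[6] - e[4] + e[2] - e[0]) / (e[5] - e[1])


def ssd_rvs(size, alpha, theta, beta, rng):
    """Inverse-transform sampling: plug uniform draws into Q(u)."""
    return ssd_quantile(rng.uniform(size=size), alpha, theta, beta)


# Hypothetical parameter values, chosen only for illustration.
alpha, theta, beta = 1.0, 2.0, 1.5
median = float(ssd_quantile(0.5, alpha, theta, beta))
sk = bowley_skewness(alpha, theta, beta)
ku = moors_kurtosis(alpha, theta, beta)
sample = ssd_rvs(1000, alpha, theta, beta, np.random.default_rng(2025))
```

A round trip through `ssd_cdf(ssd_quantile(u, ...), ...)` should return `u`, which is a convenient test of the eight derivation steps.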

2.3. Moments and Moment-Generating Function

Moments and moment-generating functions (MGFs) are essential tools in flood frequency analysis, providing insights into the distribution of extreme hydrological events. The first four moments (mean, variance, skewness, and kurtosis) capture key characteristics of flood distributions. The mean estimates average flood magnitudes, guiding infrastructure design such as spillway capacity. Variance measures the variability of flood peaks, with higher values indicating more volatile flood regimes, such as those influenced by monsoon patterns. Skewness assesses asymmetry in flood extremes, with positive skewness common in rainfall-driven floods. Kurtosis indicates the heaviness of distribution tails, helping identify basins prone to outliers, like those affected by snowmelt combined with rainfall. Now, let $X$ be a random variable with a PDF as given in Equation (4), which is parameterized by shape parameters ($α, β, θ$). The $r$th moment for a distribution within the SSD class can then be derived as follows:

$μ_{r}^{\prime} = E (X^{r}) = \int_{- \infty}^{\infty} x^{r} f (x; α, β, θ) d x,$

(6)

on incorporating Equation (4) in Equation (6), we get

$μ_{r}^{\prime} = E (X^{r}) = \frac{π α β θ}{2} \int_{α}^{\infty} x^{r - 2} \left(\frac{α}{x}\right)^{θ - 1} \cos\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right) \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β - 1} d x, \quad x > α, \; α, β > 0, \; θ > 0 .$

Now, on substituting $y = \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β}$, so that $x = α \left(\frac{π}{2}\right)^{1 / θ} \left(\arcsin\left(1 - y^{1 / β}\right)\right)^{- 1 / θ}$ and $d y = f (x; α, β, θ) d x$, we get

$μ_{r}^{\prime} = α^{r} \left(\frac{π}{2}\right)^{r / θ} \int_{0}^{1} \left(\arcsin\left(1 - y^{\frac{1}{β}}\right)\right)^{- r / θ} d y .$

Let

$ξ (β, θ, r; 0, 1) = \int_{0}^{1} \left(\arcsin\left(1 - y^{\frac{1}{β}}\right)\right)^{- r / θ} d y,$

which is evaluated by numerical integration, so the $r$th moment can be written as

$μ_{r}^{\prime} = α^{r} \left(\frac{π}{2}\right)^{r / θ} ξ (β, θ, r; 0, 1) .$

(7)

In this section, we have also conducted a numerical study, as portrayed in Table 2, to examine the existence of the mean and the shape of the SSD under different parameter values. Table 2 also shows that skewness is consistently positive across all configurations, suggesting all distributions are right-skewed. However, kurtosis values fall both above and below 3, indicating moderate to heavy-tailed behavior in these configurations. The extreme values in kurtosis and skewness for the first few configurations suggest potential outliers or heavy-tailed distributions. This could be characteristic of heavy-tailed distributions under specific parameter settings. Moreover, from Table 1, it is evident that $β$ has no effect on the moments, mean, variance, skewness, or kurtosis in this data set. This suggests that the moments are independent of $β$ under the given model. Increasing $α$ leads to higher mean, variance, skewness, and kurtosis; see Figure 6. Increasing $θ$ reduces the mean and variance but increases skewness and kurtosis (for fixed $α$). The distribution is right-skewed ($γ_{1} > 0$) and leptokurtic ($γ_{2} > 0$, heavier tails than normal). Now, we introduce the moment-generating function by defining it as

$M_{X} (t) = E (e^{t X}) = \int_{- \infty}^{\infty} e^{t x} f (x; α, β, θ) d x$

(8)

$M_{X} (t) = E (e^{t X}) = \sum_{r = 0}^{\infty} \frac{t^{r}}{r!} α^{r} \left(\frac{π}{2}\right)^{\frac{r}{θ}} \int_{0}^{1} \left(\arcsin\left(1 - y^{\frac{1}{β}}\right)\right)^{- r / θ} d y,$

$M_{X} (t) = E (e^{t X}) = \sum_{r = 0}^{\infty} \frac{t^{r}}{r!} μ_{r}^{\prime},$

where $μ_{r}^{\prime}$ is defined in Equation (7).
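The integral $ξ (β, θ, r; 0, 1)$ in Equation (7) has an endpoint singularity at $y = 1$, so a small change of variable helps numerical quadrature. The sketch below is one possible implementation under stated assumptions: it substitutes the angle $a = \arcsin (1 - y^{1 / β})$, which turns the integral into $\int_{0}^{π / 2} β \, a^{- r / θ} (1 - \sin a)^{β - 1} \cos a \, d a$ and shows that it converges only when $r < θ$. The function names and parameter values are hypothetical, and a direct integration of $x^{r} f (x)$ is included purely as a cross-check.

```python
import numpy as np
from scipy.integrate import quad


def xi(beta, theta, r, upper=1.0):
    """xi(beta, theta, r; 0, upper), computed in the angle a = arcsin(1 - y^(1/beta))."""
    a_low = 0.0 if upper >= 1.0 else np.arcsin(1.0 - upper ** (1.0 / beta))

    def integrand(a):
        one_minus_sin = 2.0 * np.sin(0.25 * np.pi - 0.5 * a) ** 2   # equals 1 - sin(a)
        return beta * a ** (-r / theta) * one_minus_sin ** (beta - 1.0) * np.cos(a)

    value, _ = quad(integrand, a_low, 0.5 * np.pi, limit=200)
    return value


def ssd_raw_moment(r, alpha, theta, beta):
    """r-th raw moment from Equation (7); the integral converges only for r < theta."""
    if r >= theta:
        raise ValueError("the r-th moment is infinite unless r < theta")
    return alpha ** r * (0.5 * np.pi) ** (r / theta) * xi(beta, theta, r)


def ssd_raw_moment_direct(r, alpha, theta, beta):
    """Cross-check: integrate x^r f(x) over (alpha, infinity) using Equation (4)."""
    def integrand(x):
        u = 0.5 * np.pi * (alpha / x) ** theta
        pdf = (np.pi * alpha * beta * theta / (2.0 * x ** 2)
               * (alpha / x) ** (theta - 1.0)
               * np.cos(u) * (1.0 - np.sin(u)) ** (beta - 1.0))
        return x ** r * pdf

    value, _ = quad(integrand, alpha, np.inf, limit=200)
    return value


# Hypothetical parameter values, chosen only for illustration (theta > 2 so that
# the first two moments exist).
alpha, theta, beta = 1.5, 4.0, 2.0
mean = ssd_raw_moment(1, alpha, theta, beta)
variance = ssd_raw_moment(2, alpha, theta, beta) - mean ** 2
```

Higher-order summaries such as skewness and kurtosis can be assembled from `ssd_raw_moment` in the same way, provided $θ$ exceeds the order of the moment involved.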

2.4. Conditional Moments

In flood risk analysis, conditional statistical methods such as conditional moments, the mean residual life (MRL) function, and mean inactivity time (MIT) are essential for understanding extreme hydrological events. The first partial moment helps develop flood risk curves, highlighting how rare but severe floods (e.g., top 10%) cause the most damages, and revealing patterns like the temporal clustering of high-risk periods. The MRL function predicts the expected severity of floods exceeding certain thresholds, aiding infrastructure planning, while MIT estimates the recovery times between floods, informing emergency preparedness. These tools support practical applications like flood insurance pricing and resource allocation. An example involving the Rhine River showed that a small percentage of floods caused the majority of economic losses, an insight that traditional return period analysis might miss. Overall, this framework offers policymakers refined means to assess and manage flood risks amid climate change. To facilitate this, the $r$th partial moment of the variable $X$, denoted as $\nabla_{r} (t)$ for any real $r > 0$, is defined as

$\nabla_{r} (t) = \int_{- \infty}^{t} x^{r} f (x; α, β, θ) d x,$

(9)

on substituting $y = \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β}$, we get

$\nabla_{r} (t) = \int_{0}^{\left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{t}\right)^{θ}\right)\right)^{β}} α^{r} \left(\frac{π}{2}\right)^{\frac{r}{θ}} \left(\arcsin\left(1 - y^{\frac{1}{β}}\right)\right)^{- r / θ} d y,$

$\nabla_{r} (t) = α^{r} \left(\frac{π}{2}\right)^{\frac{r}{θ}} \int_{0}^{\left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{t}\right)^{θ}\right)\right)^{β}} \left(\arcsin\left(1 - y^{\frac{1}{β}}\right)\right)^{- r / θ} d y,$

$\nabla_{r} (t) = α^{r} \left(\frac{π}{2}\right)^{\frac{r}{θ}} \int_{0}^{F (t; α, β, θ)} \left(\arcsin\left(1 - y^{\frac{1}{β}}\right)\right)^{- r / θ} d y,$

$\nabla_{r} (t) = α^{r} \left(\frac{π}{2}\right)^{\frac{r}{θ}} ξ (β, θ, r; 0, F (t; α, β, θ)),$

where $ξ (β, θ, r; 0, F (t; α, β, θ)) = \int_{0}^{F (t; α, β, θ)} \left(\arcsin\left(1 - y^{\frac{1}{β}}\right)\right)^{- r / θ} d y$. Now, the $r$th residual moment is an important characteristic of the model. For $r = 1$, it gives the expected additional lifetime given that a component has survived until time $t$. For a non-negative continuous random variable $X$ with the SSD($α, β, θ$) distribution, the $r$th residual life function is defined as

$E \left((X - t)^{r} \mid X > t\right) = \frac{1}{S (t; α, β, θ)} \int_{t}^{\infty} (x - t)^{r} \frac{π α β θ}{2 x^{2}} \left(\frac{α}{x}\right)^{θ - 1} \cos\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right) \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β - 1} d x,$

where $x > α$, $α, β > 0$, $θ > 0$. Now put $y = \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β}$; then, we get

$E \left((X - t)^{r} \mid X > t\right) = \frac{1}{S (t; α, β, θ)} \int_{\left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{t}\right)^{θ}\right)\right)^{β}}^{1} \left(α \left(\frac{π}{2}\right)^{\frac{1}{θ}} \left(\arcsin\left(1 - y^{\frac{1}{β}}\right)\right)^{- 1 / θ} - t\right)^{r} d y,$

by using the binomial expansion and writing $\int_{F (t)}^{1} = \int_{0}^{1} - \int_{0}^{F (t)}$, we have

$\begin{aligned} E \left((X - t)^{r} \mid X > t\right) = {} & \frac{1}{S (t; α, β, θ)} \sum_{k = 0}^{r} (- t)^{k} \binom{r}{k} \left(\frac{π}{2}\right)^{\frac{r - k}{θ}} α^{r - k} \\ & \times \left(ξ (β, θ, r - k; 0, 1) - \int_{0}^{\left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{t}\right)^{θ}\right)\right)^{β}} \left(\arcsin\left(1 - y^{\frac{1}{β}}\right)\right)^{- (r - k) / θ} d y\right), \end{aligned}$

which on simplification yields

$\begin{aligned} E \left((X - t)^{r} \mid X > t\right) = {} & \frac{1}{S (t; α, β, θ)} \sum_{k = 0}^{r} (- t)^{k} \binom{r}{k} \left(\frac{π}{2}\right)^{\frac{r - k}{θ}} α^{r - k} \\ & \times \left(ξ (β, θ, r - k; 0, 1) - ξ (β, θ, r - k; 0, F (t; α, β, θ))\right) . \end{aligned}$
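For practical work, the residual moments can also be obtained by integrating the defining expression directly. The sketch below does this with SciPy quadrature; the function names, the threshold `t` and the parameter values are hypothetical. For $r = 1$ it additionally uses the standard identity $E (X - t \mid X > t) = \frac{1}{S (t)} \int_{t}^{\infty} S (x) d x$ as an independent cross-check of the mean residual life.

```python
import numpy as np
from scipy.integrate import quad


def ssd_pdf(x, alpha, theta, beta):
    """PDF of Equation (4) for x > alpha."""
    u = 0.5 * np.pi * (alpha / x) ** theta
    return (np.pi * alpha * beta * theta / (2.0 * x ** 2)
            * (alpha / x) ** (theta - 1.0)
            * np.cos(u) * (1.0 - np.sin(u)) ** (beta - 1.0))


def ssd_sf(x, alpha, theta, beta):
    """Survival function of Equation (3), written to stay accurate in the tail."""
    u = 0.5 * np.pi * (alpha / x) ** theta
    return -np.expm1(beta * np.log1p(-np.sin(u)))


def residual_moment(r, t, alpha, theta, beta):
    """E[(X - t)^r | X > t] by direct integration (needs t > alpha and r < theta)."""
    num, _ = quad(lambda x: (x - t) ** r * ssd_pdf(x, alpha, theta, beta),
                  t, np.inf, limit=200)
    return num / ssd_sf(t, alpha, theta, beta)


def mean_residual_life(t, alpha, theta, beta):
    """MRL through the survival-function identity (needs theta > 1)."""
    num, _ = quad(lambda x: ssd_sf(x, alpha, theta, beta), t, np.inf, limit=200)
    return num / ssd_sf(t, alpha, theta, beta)


# Hypothetical parameter values and threshold, chosen only for illustration.
alpha, theta, beta, t = 1.0, 4.0, 2.0, 2.0
mrl_direct = residual_moment(1, t, alpha, theta, beta)
mrl_identity = mean_residual_life(t, alpha, theta, beta)
```

In a flood setting, `t` would play the role of a discharge threshold and `mrl_direct` the expected exceedance above it.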

2.4.1. Mean Deviation

Partial moments offer a valuable tool for quantifying the typical deviation of a population from its mean or median, providing insight into the spread around the distribution’s center. This methodology has broad applicability in fields like economics and insurance. For a random variable $X$ following the SSD distribution, the mean deviations about the mean $μ = E (X)$ and the median $\tilde{M}$ are formally defined as

$δ_{1} (X) = E | X - μ_{1}^{\prime} | = 2 μ_{1}^{\prime} F (μ_{1}^{\prime}) - 2 \nabla_{1} (μ_{1}^{\prime})$

(10)

and

$δ_{2} (X) = E | X - \tilde{M} | = μ_{1}^{\prime} - 2 \nabla_{1} (\tilde{M})$

(11)

respectively, where $μ_{1}^{\prime} = E (X)$, $\tilde{M} = \mathrm{median} (X) = Q (\frac{1}{2})$, and $\nabla_{1} (t)$ is the first partial moment given by Equation (9) with $r = 1$.

2.4.2. Bonferroni and Lorenz Curves

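For a positive random variable $X$, the Lorenz and Bonferroni curves at a given probability $p$ are expressed by $L (p) = \frac{1}{μ_{1}^{\prime}} \nabla_{1} (q)$ and $B (p) = \frac{1}{p μ_{1}^{\prime}} \nabla_{1} (q)$, respectively. In these definitions, $μ_{1}^{\prime} = E (X)$ is the expected value of $X$, $\nabla_{1} (\cdot)$ is the first partial moment of Equation (9), and $q = Q (p)$ represents the value of the percentile function of $X$ at probability $p$.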
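The two curves can be evaluated by combining the percentile function with a numerically integrated first partial moment. The following sketch assumes the standard definitions $L (p) = \nabla_{1} (Q (p)) / E (X)$ and $B (p) = L (p) / p$; the function names and parameter values are hypothetical, and $θ > 1$ is required so that the mean exists.

```python
import numpy as np
from scipy.integrate import quad


def ssd_pdf(x, alpha, theta, beta):
    """PDF of Equation (4) for x > alpha."""
    u = 0.5 * np.pi * (alpha / x) ** theta
    return (np.pi * alpha * beta * theta / (2.0 * x ** 2)
            * (alpha / x) ** (theta - 1.0)
            * np.cos(u) * (1.0 - np.sin(u)) ** (beta - 1.0))


def ssd_quantile(p, alpha, theta, beta):
    """Percentile function of Equation (5)."""
    return alpha * ((2.0 / np.pi) * np.arcsin(1.0 - p ** (1.0 / beta))) ** (-1.0 / theta)


def first_partial_moment(q, alpha, theta, beta):
    """Integral of x f(x) from alpha to q (q may be np.inf)."""
    value, _ = quad(lambda x: x * ssd_pdf(x, alpha, theta, beta), alpha, q, limit=200)
    return value


def lorenz(p, alpha, theta, beta):
    """Lorenz curve L(p): share of the total carried by the lowest fraction p."""
    mean = first_partial_moment(np.inf, alpha, theta, beta)
    q = ssd_quantile(p, alpha, theta, beta)
    return first_partial_moment(q, alpha, theta, beta) / mean


def bonferroni(p, alpha, theta, beta):
    """Bonferroni curve B(p) = L(p) / p."""
    return lorenz(p, alpha, theta, beta) / p


# Hypothetical parameter values, chosen only for illustration.
alpha, theta, beta = 1.0, 4.0, 2.0
probs = [0.25, 0.5, 0.9]
lorenz_values = [lorenz(p, alpha, theta, beta) for p in probs]
bonferroni_values = [bonferroni(p, alpha, theta, beta) for p in probs]
```

A Lorenz curve always lies on or below the diagonal, so `lorenz_values` should not exceed `probs` elementwise.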

2.5. Order Statistics

Order statistics are important statistical measurements derived from arranging a set of random observations. Consider $n$ independent random variables $X_{1}$, $X_{2}$, …, $X_{n}$ following the SSD distribution. When these variables are sorted in increasing order to form $X_{1:n} \leq X_{2:n} \leq \dots \leq X_{n:n}$, the resulting values are known as order statistics. These ordered data points are frequently applied in the reliability analysis of systems. The CDF for the $i$th order statistic is presented as follows:

$\begin{aligned} F_{i:n} (x) = {} & \frac{1}{B (i, n - i + 1)} \sum_{j = 0}^{n - i} \frac{(- 1)^{j}}{i + j} \binom{n - i}{j} F^{i + j} (x; α, β, θ) \\ = {} & \frac{1}{B (i, n - i + 1)} \sum_{j = 0}^{n - i} \frac{(- 1)^{j}}{i + j} \binom{n - i}{j} \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β (i + j)}, \quad x > α, \; α, β > 0, \; θ > 0 . \end{aligned}$

The corresponding PDF is expressed as

$\begin{aligned} f_{i:n} (x) = {} & \frac{f (x; α, β, θ)}{B (i, n - i + 1)} \sum_{j = 0}^{n - i} (- 1)^{j} \binom{n - i}{j} F^{i + j - 1} (x; α, β, θ) \\ = {} & \frac{π α β θ \left(\frac{α}{x}\right)^{θ - 1}}{2 x^{2} B (i, n - i + 1)} \cos\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right) \\ & \times \sum_{j = 0}^{n - i} (- 1)^{j} \binom{n - i}{j} \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x}\right)^{θ}\right)\right)^{β (i + j) - 1}, \quad x > α, \; α, β > 0, \; θ > 0 . \end{aligned}$

Then, the $r$th moment of the $i$th order statistic is given by

$\begin{aligned} μ_{i:n}^{(r)} = E (X_{i:n}^{r}) = {} & \int_{- \infty}^{\infty} x^{r} f_{i:n} (x) d x \\ = {} & \frac{1}{B (i, n - i + 1)} \sum_{j = 0}^{n - i} (- 1)^{j} \binom{n - i}{j} \int_{- \infty}^{\infty} x^{r} f (x) F^{i + j - 1} (x) d x \\ = {} & \frac{1}{B (i, n - i + 1)} \sum_{j = 0}^{n - i} (- 1)^{j} \binom{n - i}{j} μ_{r, i + j - 1}^{\prime}, \end{aligned}$

where the integral $μ_{r, i + j - 1}^{\prime} = \int_{- \infty}^{\infty} x^{r} f (x) F^{i + j - 1} (x) d x$ can be evaluated numerically.
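The alternating series for the order-statistic CDF is the binomial expansion of a regularized incomplete beta function evaluated at $F (x)$, which offers a convenient numerical check. The sketch below compares the two; the function names and the values of `n`, `i` and the parameters are hypothetical, and the comparison relies on SciPy's `betainc`, which returns the regularized incomplete beta function.

```python
import numpy as np
from scipy.special import beta as beta_fn, betainc, comb


def ssd_cdf(x, alpha, theta, beta):
    """CDF of Equation (2) for x > alpha."""
    u = 0.5 * np.pi * (alpha / x) ** theta
    return (1.0 - np.sin(u)) ** beta


def order_stat_cdf(x, i, n, alpha, theta, beta):
    """CDF of the i-th order statistic from the alternating series in the text."""
    F = ssd_cdf(x, alpha, theta, beta)
    j = np.arange(n - i + 1)
    terms = (-1.0) ** j / (i + j) * comb(n - i, j) * F ** (i + j)
    return terms.sum() / beta_fn(i, n - i + 1)


def order_stat_cdf_beta(x, i, n, alpha, theta, beta):
    """Same CDF through the regularized incomplete beta function I_F(i, n - i + 1)."""
    return betainc(i, n - i + 1, ssd_cdf(x, alpha, theta, beta))


# Hypothetical sample size, rank and parameter values, chosen only for illustration.
n, i = 6, 3
alpha, theta, beta = 1.0, 2.0, 1.5
points = [1.2, 2.0, 5.0]
series_values = [order_stat_cdf(x, i, n, alpha, theta, beta) for x in points]
beta_values = [order_stat_cdf_beta(x, i, n, alpha, theta, beta) for x in points]
```

For large `n` the alternating series can lose precision through cancellation, so the incomplete-beta form is the safer choice in that regime.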

3. Methods of Parameter Estimation and Simulation Study

This section employs maximum likelihood estimation (MLE) and studies the performance of the MLEs on the basis of their bias and mean square error (MSE). MLEs are asymptotically consistent, efficient, and normally distributed, which makes the method well suited to large samples; however, its performance can degrade under model misspecification. Because MLE excels in correctly specified models, we conducted a simulation study to evaluate the performance of the method, based on sample size, distributional characteristics, and modeling objectives. The Method of Moments (MoM) is not included in our study because the theoretical moments were hard to compute (leading to numerical instability) and the initial guesses were too far from the true parameters.

3.1. Method of Maximum Likelihood

Statistical inference typically relies on three approaches: point estimation, interval estimation, and hypothesis testing. Among the various parameter estimation techniques available, the likelihood method stands out for its versatility and desirable properties, particularly in constructing confidence regions, intervals, and test statistics. The asymptotic theory associated with these estimates simplifies calculations and performs effectively even with limited sample data. Statisticians often aim to estimate quantities like the density of a test statistic, which is influenced by sample size, to improve the accuracy of estimate distributions. The calculations for maximum likelihood estimates (MLEs) within distribution theory are straightforward whether approached conceptually or mathematically. This section will focus on estimating parameters using the MLE method based on the entire sample. Let $x_{1}, \dots, x_{n}$ be a random sample of size $n$ from the SSD distribution as defined in Equation (4). Let $V_{n} (ϕ) = \left(\frac{\partial \ell_{n}}{\partial α}, \frac{\partial \ell_{n}}{\partial β}, \frac{\partial \ell_{n}}{\partial θ}\right)^{T}$ be the $q \times 1$ score vector, where $ϕ = (α, β, θ)^{T}$ is the vector of the parameters. The log-likelihood function is given by

$\begin{aligned} \ell_{n} = {} & n \log\left(\frac{π}{2}\right) + n \log α + n \log β + n \log θ + (θ - 1) \sum_{i = 1}^{n} \log\left(\frac{α}{x_{i}}\right) - 2 \sum_{i = 1}^{n} \log x_{i} \\ & + \sum_{i = 1}^{n} \log \cos\left(\frac{π}{2} \left(\frac{α}{x_{i}}\right)^{θ}\right) + (β - 1) \sum_{i = 1}^{n} \log\left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x_{i}}\right)^{θ}\right)\right) . \end{aligned}$

(12)

The log-likelihood can be maximized by differentiating Equation (12) with respect to the parameters, i.e.,

$\begin{aligned} \frac{\partial \ell_{n}}{\partial α} = {} & \frac{n}{α} + \frac{n (θ - 1)}{α} - (β - 1) \sum_{i = 1}^{n} \frac{π θ \left(\frac{α}{x_{i}}\right)^{θ - 1} \cos\left(\frac{π}{2} \left(\frac{α}{x_{i}}\right)^{θ}\right)}{2 x_{i} \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x_{i}}\right)^{θ}\right)\right)} \\ & - \sum_{i = 1}^{n} \frac{π θ \left(\frac{α}{x_{i}}\right)^{θ - 1} \tan\left(\frac{π}{2} \left(\frac{α}{x_{i}}\right)^{θ}\right)}{2 x_{i}}, \end{aligned}$

$\frac{\partial \ell_{n}}{\partial β} = \frac{n}{β} + \sum_{i = 1}^{n} \log\left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x_{i}}\right)^{θ}\right)\right),$

$\begin{aligned} \frac{\partial \ell_{n}}{\partial θ} = {} & \frac{n}{θ} + \sum_{i = 1}^{n} \log\left(\frac{α}{x_{i}}\right) - (β - 1) \sum_{i = 1}^{n} \frac{π \left(\frac{α}{x_{i}}\right)^{θ} \log\left(\frac{α}{x_{i}}\right) \cos\left(\frac{π}{2} \left(\frac{α}{x_{i}}\right)^{θ}\right)}{2 \left(1 - \sin\left(\frac{π}{2} \left(\frac{α}{x_{i}}\right)^{θ}\right)\right)} \\ & - \sum_{i = 1}^{n} \frac{π}{2} \left(\frac{α}{x_{i}}\right)^{θ} \log\left(\frac{α}{x_{i}}\right) \tan\left(\frac{π}{2} \left(\frac{α}{x_{i}}\right)^{θ}\right) . \end{aligned}$

The MLEs of the parameters are obtained by solving the system of nonlinear equations, i.e.,

P_{n} (ϕ) = 0

. Since the estimators have no closed form, we solve these equations numerically using the Newton–Raphson method, as implemented in statistical packages such as Mathematica [12.0], R and Matlab.
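As an illustration of Equation (12) and its partial derivatives, the following minimal Python sketch evaluates the log-likelihood and the score vector and compares the two by central finite differences. The function names, the sample values and the parameter values are hypothetical and chosen only for the check; the root-finding step itself is not shown and would be left to a Newton–Raphson or similar routine.

```python
import math

def ssd_loglik(x, alpha, beta, theta):
    """Log-likelihood of Equation (12); assumes every x_i > alpha."""
    n = len(x)
    total = n * (math.log(math.pi / 2) + math.log(alpha)
                 + math.log(beta) + math.log(theta))
    for xi in x:
        r = alpha / xi
        u = 0.5 * math.pi * r ** theta
        total += (theta - 1) * math.log(r) - 2 * math.log(xi)
        total += math.log(math.cos(u)) + (beta - 1) * math.log(1 - math.sin(u))
    return total

def ssd_score(x, alpha, beta, theta):
    """Partial derivatives of the log-likelihood w.r.t. (alpha, beta, theta)."""
    n = len(x)
    d_alpha = n * theta / alpha  # equals n/alpha + n(theta - 1)/alpha
    d_beta = n / beta
    d_theta = n / theta
    for xi in x:
        r = alpha / xi
        u = 0.5 * math.pi * r ** theta
        du_dalpha = 0.5 * math.pi * theta * r ** (theta - 1) / xi
        du_dtheta = u * math.log(r)
        # common factor multiplying du/d(.) in the cosine and (1 - sine) terms
        w = (beta - 1) * math.cos(u) / (1 - math.sin(u)) + math.tan(u)
        d_alpha -= w * du_dalpha
        d_beta += math.log(1 - math.sin(u))
        d_theta += math.log(r) - w * du_dtheta
    return d_alpha, d_beta, d_theta

# Hypothetical data and parameter values, used only to check the derivatives.
x = [1.5, 2.0, 3.2, 4.1]
a, b, t = 1.0, 1.2, 1.3
h = 1e-6
numeric = (
    (ssd_loglik(x, a + h, b, t) - ssd_loglik(x, a - h, b, t)) / (2 * h),
    (ssd_loglik(x, a, b + h, t) - ssd_loglik(x, a, b - h, t)) / (2 * h),
    (ssd_loglik(x, a, b, t + h) - ssd_loglik(x, a, b, t - h)) / (2 * h),
)
analytic = ssd_score(x, a, b, t)
```

If the analytic and numeric gradients agree to several decimal places, the score expressions are consistent with the log-likelihood they were derived from.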

Performance Summary of MLEs

The results presented in Table 3 demonstrate the performance of maximum likelihood estimation (MLE) for the SSD distribution parameters across five distinct parameter configurations. For Set-I (

α = 0.2556

,

θ = 0.2286

,

β = 0.2182

), the estimators exhibit minimal bias, particularly for sample sizes

n \geq 50

, with mean squared error (MSE) values decreasing rapidly as the sample size increases. This suggests strong consistency properties for MLEs when parameter values are relatively small. In Set-II (

α = 0.2556

,

θ = 1.0286

,

β = 1.1184

), we observe persistent positive bias in

\hat{θ}

estimates (approximately 0.0116 for

n = 25

) and higher MSE values compared to Set-I, particularly for

β

(MSE = 16.6767 at

n = 25

). The slower convergence indicates that larger sample sizes (

n > 150

) may be required for reliable estimation when

θ

and

β

exceed 1. The most problematic case emerges in Set-III (

α = 1.2556

,

θ = 1.2286

,

β = 0.0188

), where

\hat{β}

shows severe overestimation (bias ≈ 8.5 across all sample sizes) with extraordinarily high MSE values (approximately 75). This suggests fundamental challenges in estimating extremely small

β

parameters, which is likely due to numerical instability in the likelihood function’s behavior near zero. Set-IV (

α = 1.0045

,

θ = 0.0285

,

β = 1.2185

) presents an anomalous case where

\hat{θ}

yields negative MSE values; since an MSE is non-negative by definition, these values point to computational artifacts in the optimization process. Meanwhile,

\hat{β}

maintains a consistently high bias (≈9.9) and MSE (>90), demonstrating particular sensitivity to the combination of small

θ

and large

β

. For Set-V (

α = 1.1551

,

θ = 1.3286

,

β = 1.1085

), we observe a moderate improvement in estimation quality with increasing sample size, though

\hat{θ}

and

\hat{β}

maintain higher MSE values (0.6870 and 4.3325, respectively at

n = 150

) compared to

\hat{α}

. The persistent underestimation of

α

(bias ≈−0.07 at

n = 150

) warrants further investigation into possible model misspecification.

These findings collectively highlight that the MLE performance for the SSD distribution is highly sensitive to the parameter space region with particular challenges emerging when the following apply:

$β$ approaches zero (numerical instability);
$θ$ is very small (optimization challenges);
Multiple parameters are large (slower convergence).

The results suggest that alternative estimation approaches or modified likelihood formulations may be necessary for certain parameter regimes, particularly when dealing with very small shape parameters.

3.2. Simulation Study

In order to find the estimators of SSD parameters, i.e.,

α, θ

and

β

, we have adopted the maximum likelihood method. In this context, we conducted a simulation study that involved five sets of parameters: Set-I:

α = 0.2556, θ = 0.2286, β = 0.2182

, Set-II:

α = 0.2556, θ = 1.0286, β = 1.1184

, Set-III:

α = 1.2556, θ = 1.2286, β = 0.0188

, Set-IV:

α = 1.0045, θ = 0.0285, β = 1.2185

, and Set-V:

α = 1.1551, θ = 1.3286, β = 1.1085

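. For each parameter set, the simulation proceeds as follows:
Generate 1000 samples of size n = 15, 25, 50, 75, 100, 150 from the SSD.
Compute the MLEs of $α, θ$ and $β$ for each sample by maximizing the log-likelihood function.
Calculate the bias $= \frac{1}{N} \sum_{j = 1}^{N} ({\hat{Θ}}_{j} - Θ)$ and the MSE $= \frac{1}{N} \sum_{j = 1}^{N} {({\hat{Θ}}_{j} - Θ)}^{2}$ over the $N = 1000$ replications.
Repeat for each component of $Θ = (α, θ, β)$.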
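The simulation steps can be sketched in Python as follows. This is a deliberately simplified illustration, not the procedure used for Table 3: samples are drawn by inverting the SSD distribution function, and only β is estimated, using the closed-form root of the score equation for β with α and θ held at their true values (an assumption made here to avoid a numerical optimizer). All function names are hypothetical.

```python
import math
import random

def ssd_quantile(p, alpha, beta, theta):
    """Inverse of F(x) = (1 - sin((pi/2) (alpha/x)^theta))^beta, 0 < p < 1."""
    s = math.asin(1.0 - p ** (1.0 / beta))
    return alpha * (2.0 * s / math.pi) ** (-1.0 / theta)

def ssd_sample(n, alpha, beta, theta, rng):
    """Inverse-transform sampling; uniform draws of exactly 0 are rejected."""
    out = []
    while len(out) < n:
        p = rng.random()
        if 0.0 < p < 1.0:
            out.append(ssd_quantile(p, alpha, beta, theta))
    return out

def beta_mle_given_alpha_theta(x, alpha, theta):
    """Root of the beta score equation: n/beta + sum log(1 - sin(u_i)) = 0."""
    s = sum(math.log(1.0 - math.sin(0.5 * math.pi * (alpha / xi) ** theta))
            for xi in x)
    return -len(x) / s

def bias_and_mse(n, alpha, beta, theta, reps=1000, seed=1):
    """Monte Carlo bias and MSE of the conditional MLE of beta."""
    rng = random.Random(seed)
    estimates = [
        beta_mle_given_alpha_theta(ssd_sample(n, alpha, beta, theta, rng),
                                   alpha, theta)
        for _ in range(reps)
    ]
    bias = sum(e - beta for e in estimates) / reps
    mse = sum((e - beta) ** 2 for e in estimates) / reps
    return bias, mse

# Set-II parameter values from the text, with n = 50.
bias, mse = bias_and_mse(50, alpha=0.2556, beta=1.1184, theta=1.0286)
```

Extending the sketch to the full study would require replacing the closed-form step with a joint numerical maximization over all three parameters.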

4. Application to Flood Data and Discussion

4.1. Flood Frequency Analysis

Flood analysis can be performed using either annual maximum series (AMS) or partial-duration series (PDS). AMS records the highest flood peak each year, while PDS includes all peaks exceeding a set base level, potentially yielding more peaks than years of data. The U.S. Water Resources Council (USWRC) has provided guidelines for flood frequency analysis in Bulletin Nos. 15, 17, 17A, 17B, and 17C. These guidelines are suitable for floods with an annual exceedance probability (AEP) of 0.10 or less; see [35]. For such quantiles, AMS provides a suitable sample and produces AEP estimates very similar to those from PDS. Furthermore, AMS is preferred for its wider availability and longer data records, while PDS can suffer from incomplete records due to challenges in defining the base threshold.

4.2. Data Sources and Competing Models

The core data sources suggested for flood frequency practice comprise systematic records, historical flood information, and paleoflood and botanical information. In this research, we adopted the systematic records and summarize them in Table 1; the flood measurements themselves are listed below. Furthermore, for a comprehensive study of the models and identification of a realistic return period, we have compared the SSD with well-known three-parameter distributions, namely the Kappa (Kappa(3)) and Gamma (GD(3)) distributions, and with two-parameter distributions, including the Weibull (WD(2)), Gamma (GD(2)), Extreme Value (EV(2)), Log Logistic (LLD(2)), Log Normal (LN(2)) and Gumbel (GuD(2)) distributions.

4.3. Goodness of Fit Measure

For comparison purposes, we have studied goodness-of-fit tests, which help us assess how well a statistical distribution fits a sample of data. The three common tests discussed are the Kolmogorov–Smirnov, Cramér–von Mises, and Anderson–Darling tests.

1. Kolmogorov–Smirnov (K-S) Test

It tests the maximum difference between the empirical distribution function (EDF) and the theoretical cumulative distribution function (CDF). Its test statistic is expressed as

D = sup_{x} | F_{n} (x) - F (x) |

where

F_{n} (x)

is the empirical distribution function and

F (x)

is the theoretical CDF. It is suitable for continuous distributions and sensitive to central deviations but less sensitive in the tails; see [36,37].

2. Cramér–von Mises (CvM) Test

It measures the squared distance between the EDF and the theoretical CDF across the domain. It is defined as

W^{2} = \int_{- \infty}^{\infty} {[F_{n} (x) - F (x)]}^{2} d F (x)

It presents more balanced sensitivity across the entire distribution; see [38,39].

3. Anderson–Darling (A-D) Test

It is a modified version of the CvM test that gives more weight to the tails. It is expressed as

A^{2} = - n - \frac{1}{n} \sum_{i = 1}^{n} [(2 i - 1) (ln F (X_{i}) + ln (1 - F (X_{n + 1 - i})))]

It is more powerful in detecting tail deviations; see [40]. For comprehensive details, readers are referred to [41,42,43].
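The three statistics above can be computed from the ordered sample as in the following minimal Python sketch. It assumes a fully specified continuous CDF passed in as a function; the function name and the example data are hypothetical. For the Cramér–von Mises statistic the sketch uses the usual computing form, which equals n times the integral given above.

```python
import math

def gof_statistics(data, cdf):
    """Return (K-S D, Cramer-von Mises W^2, Anderson-Darling A^2)."""
    x = sorted(data)
    n = len(x)
    f = [cdf(v) for v in x]  # fitted CDF at the order statistics
    # K-S: largest gap between the EDF steps and the fitted CDF
    d_plus = max((i + 1) / n - f[i] for i in range(n))
    d_minus = max(f[i] - i / n for i in range(n))
    ks = max(d_plus, d_minus)
    # CvM computing form: 1/(12n) + sum (F(x_(i)) - (2i - 1)/(2n))^2
    cvm = 1.0 / (12 * n) + sum(
        (f[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)
    )
    # A-D: tail-weighted sum over the order statistics
    ad = -n - sum(
        (2 * i + 1) * (math.log(f[i]) + math.log(1.0 - f[n - 1 - i]))
        for i in range(n)
    ) / n
    return ks, cvm, ad

# Hypothetical example: three observations tested against Uniform(0, 1).
ks, cvm, ad = gof_statistics([0.4, 0.1, 0.7], lambda v: v)
```

To test a fitted SSD, the `cdf` argument would be the SSD distribution function evaluated at the estimated parameters; p-values are not computed here.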

4.4. Information Criteria for Model Selection

Information criteria are used to compare and select statistical models by balancing goodness-of-fit and complexity; see [44,45,46]. These criteria are stated in turn below.

1. Akaike Information Criterion (AIC)

It is used to select the model with the best trade-off between fit and complexity but can overfit with small samples.

AIC = - 2 ln (L) + 2 k

where L is the likelihood function and k is the number of estimated parameters.

2. Corrected AIC (AICc)

It is usually recommended for small sample sizes.

AICc = AIC + \frac{2 k (k + 1)}{n - k - 1}

3. Bayesian Information Criterion (BIC)

It was developed by [45]; it generally penalizes complexity more heavily than AIC and therefore often prefers simpler models.

BIC = - 2 ln (L) + k ln (n)

4. Hannan–Quinn Information Criterion (HQIC)

It was proposed by [44] and is intermediate between AIC and BIC in penalizing model complexity. It is defined as

HQIC = - 2 ln (L) + 2 k ln (ln (n))

5. Consistent AIC (CAIC)

It is recognized as a variant that penalizes complexity even more heavily than BIC, adding one further unit of penalty per parameter and thus leading to more parsimonious models. It is expressed as

CAIC = - 2 ln (L) + k [ln (n) + 1]
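The five criteria depend only on the maximized log-likelihood ln(L), the number of estimated parameters k and the sample size n, so they can be computed together. The following Python sketch implements the formulas exactly as stated above; the function name and the input values are hypothetical.

```python
import math

def information_criteria(loglik, k, n):
    """Return AIC, AICc, BIC, HQIC and CAIC for a fitted model.

    loglik : maximized log-likelihood ln(L)
    k      : number of estimated parameters
    n      : sample size (AICc requires n > k + 1)
    """
    aic = -2.0 * loglik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)
    bic = -2.0 * loglik + k * math.log(n)
    hqic = -2.0 * loglik + 2.0 * k * math.log(math.log(n))
    caic = -2.0 * loglik + k * (math.log(n) + 1.0)
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "HQIC": hqic, "CAIC": caic}

# Hypothetical three-parameter fit to a sample of 52 observations.
ic = information_criteria(loglik=-100.0, k=3, n=52)
```

For every criterion, the model with the smallest value is preferred.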

Summary Table

Table 4 presents a comparison of these goodness-of-fit measures, which helps readers understand how they function.

4.5. Real Data Examples

Two real-world flood data sets constitute the AMF series: the first one measures the flow data for Mill Creek (Station 93) near Manhattan, IN for the period of 1940–1991 with measurements in cubic feet per second (cfs) taken from [10], and these data’s values are 4020, 3690, 2130, 2410, 3270, 1540, 2250, 2060, 5340, 4040, 2710, 2050, 5800, 3180, 2780, 2050, 5960, 2940, 2730, 1930, 4000, 2600, 2430, 1990, 3200, 2440, 2110, 2030, 5000, 2740, 2440, 2190, 4800, 2750, 2290, 1830, 5000, 2860, 2520, 1750, 8960, 2920, 2000, 1870, 3000, 2980, 1650, 1840, 3290, 2930, 2260, 3060. The second data set, which measures the peak in cubic meter per second (m³/s), was obtained from https://nrfa.ceh.ac.uk/data/station/peakflow/39008 (accessed on 24 January 2024). The National River Flow Archive (NRFA) provides access to its peak flow data of 96 stations, and the values of such data are 54.234, 81.635, 54.349, 78.382, 52.494, 59.305, 78.484, 51.479, 48.819, 52.674, 79.532, 65.128, 49.693, 74.867, 50.165, 49.347, 48.683, 72.413, 53.646, 47.325, 50.126, 54.261, 50.967, 64.294, 65.318, 64.898, 83.066, 54.338, 50.749, 56.469, 53.75, 83.059, 91.572, 72.319, 51.751, 62.626, 75.505, 62.157, 50.001, 57.514, 62.028, 56.744, 70.192, 56.502, 91.796, 70.87, 51.12, 55, 50.9, 49, 59.792, 55.786, 79.838, 63.85, 51.751, 76.892, 102.054, 49.259, 49.179, 56.34, 87.587, 59.024, 75.38, 57.788, 56.754, 54.145, 75.795, 54.746, 59.529, 60.135, 52.464, 51.439, 51.896, 66.552, 48.935, 48.781, 96.4, 48.366, 50.741, 97.989, 68.151, 76.208, 52.154, 64.358, 68.661, 51.555, 107.355, 99.092, 70.097, 54.992, 77.875, 76.292, 59.542, 50.16, 47.211, 51.5.

From Figure 7, the violin plot analysis shows that Data Set-I is more skewed with numerous high outliers, which can distort statistical measures like mean and variance. Data Set-II has a more balanced distribution with fewer and milder outliers, making it more reliable for inference. Overall, caution is needed when analyzing Data Set-I due to its extreme outliers, while Data Set-II is relatively more stable but still requires careful examination. Visualizations are essential before drawing conclusions.

Based on Table 1, the Shapiro–Wilk (SW) test for normality indicates that the data are not normal at the one percent level of significance, which is consistent with the presence of outliers visible in the violin plot in Figure 7 and the QQ-plots in Figure 8 and Figure 9. Since the points do not all fall along the reference (red) line, we cannot assume normality. Figure 10 shows two plots of autocorrelation for NRFA peaks, indicating significant correlations at certain lags. These suggest possible seasonal patterns or long-term dependencies, which warrant further analysis to understand the data and improve modeling.

4.6. Data Assumptions and Specific Concerns

For reliable statistical analysis of flood data, the information gathered must be both dependable and representative of past events. Consequently, evaluating the appropriateness and applicability of flood records is an essential part of flood frequency analysis. Typically, annual peak-flow data are considered a random sample of independent and identically distributed events. These peak-flow data are assumed to represent the characteristics of future floods. Essentially, the underlying process generating floods is expected to be stable or unchanging over time.

Non-stationarity is difficult to identify in peak-flow series. In this regard, we have applied the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test, whose null hypothesis is that an observable time series is stationary around a deterministic trend against the alternative of non-stationarity, and the Mann–Kendall (MK) test to check the AMS for independence and for a monotonic upward or downward trend over time. From Table 5, it is evident that the KPSS, MK and SW tests provide strong evidence in favor of level stationarity (which is also clear from the time series plots in Figure 10 and Figure 11 for both data sets), independent and identically distributed realizations, and non-normal behavior.

Table 6 presents Kendall’s rank correlations (

τ

), which are also calculated for determining the trends in an AMS, and portrayed in Figure 12 and Figure 13, indicating a negative and positive trend between years and volume for Data Set-I and -II, respectively, with higher p-values indicating no monotonic relationship between them.
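For readers who wish to reproduce a trend check of this kind, the following Python sketch computes the Mann–Kendall S statistic, Kendall's τ (the tau-a form, which ignores ties in the denominator), the tie-corrected normal approximation Z and a two-sided p-value. It is a generic textbook version, not necessarily the exact variant behind Table 5 and Table 6, and the function name and example series are hypothetical.

```python
import math
from collections import Counter

def mann_kendall(series):
    """Return (S, tau, Z, two-sided p-value) for a time-ordered series."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    # variance of S with the standard correction for groups of tied values
    tie_sizes = Counter(series).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    tau = s / (0.5 * n * (n - 1))
    return s, tau, z, p_value

# Hypothetical strictly increasing series: every pair is concordant.
s, tau, z, p_value = mann_kendall([1, 2, 3, 4, 5])
```

A large p-value, as reported for both data sets, gives no evidence of a monotonic trend.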

In addition, time series and autocorrelation plots are portrayed in Figure 10 and Figure 11. In both figures, the series in the left panel produces only a few lags in the right panel that exceed the confidence band of the ACF (blue dashed line). Because the autocorrelations die out as the lag increases, the series can be regarded as stationary. The few significant lags may nevertheless indicate seasonal patterns or long-term dependencies, which should be further analyzed for their impact on data structure and modeling.
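The reading of an ACF plot described here can be mimicked numerically. The Python sketch below computes the sample autocorrelation function and flags the lags whose autocorrelation falls outside the approximate 95% band ±1.96/√n, which is the usual white-noise band and is assumed here to correspond to the dashed lines in the figures. The function names and the example series are hypothetical.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (r_0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((v - mean) ** 2 for v in series)
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

def significant_lags(series, max_lag):
    """Lags k >= 1 whose autocorrelation lies outside +/- 1.96 / sqrt(n)."""
    band = 1.96 / math.sqrt(len(series))
    acf = sample_acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > band]

# Hypothetical alternating series: strongly negative lag-1 autocorrelation.
demo = [1.0, -1.0] * 10
acf = sample_acf(demo, 3)
flagged = significant_lags(demo, 3)
```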

While discussing the assumptions for the flood data sets, we have observed that the selected data sets satisfy the usual assumptions of stationarity, independence, and absence of a monotonic trend. We now discuss the goodness-of-fit statistics of the proposed and competing models. From Table 7, we observe that the selected data sets are leptokurtic and positively skewed. For the SSD model, the goodness-of-fit statistics are the smallest and the p-values are high, as indicated in Table 8 and Table 9. This supports the appropriateness of the proposed model, suggesting that the SSD is the best choice for such data sets and is likely robust to outliers. Additionally, the histograms in Figure 14 and the information criteria presented in Table 10 and Table 11 further reinforce the model’s suitability, demonstrating minimal information loss and the best fit for the SSD model. In contrast, distributions such as GuD(2) yield very low p-values, which are likely due to outliers inflating the test statistics. SSD and LDD(2) exhibit very high p-values, indicating that they may effectively accommodate the outliers or that the data have undergone preprocessing steps, such as scaling or winsorization.

The information matrix is a crucial metric that needs to be estimated to create confidence intervals around point estimates. It is derived from the matrix of second derivatives of the log-likelihood and serves as the basis for the variance–covariance matrix. This square matrix contains the variances and covariances of various variables. The diagonal elements reflect the variances of the individual variables, while the off-diagonal elements represent the covariances between all possible pairs of variables. This matrix is a valuable tool for assessing the relationships between different structures or variables. It aids in understanding patterns and dependencies in data, facilitating tasks such as dimensionality reduction, clustering, or regression analysis.

In this context, we have also computed the variance–covariance matrix for the SSD. The confidence interval bands for the estimates of

\hat{α}

,

\hat{β}

and

\hat{θ}

for the data sets can be observed in Figure 15, Figure 16 and Figure 17, respectively.

{COV}_{I} = \begin{array}{c|ccc} & \hat{α} & \hat{β} & \hat{θ} \\ \hline \hat{α} & 1814.91 & 33.3254 & - 9.4677 \\ \hat{β} & 33.3254 & 0.6021 & - 0.1627 \\ \hat{θ} & - 9.4677 & - 0.1627 & 0.0352 \end{array}, {COV}_{II} = \begin{array}{c|ccc} & \hat{α} & \hat{β} & \hat{θ} \\ \hline \hat{α} & 0.0611 & 0.0267 & - 0.3903 \\ \hat{β} & 0.0267 & 0.0057 & - 0.0403 \\ \hat{θ} & - 0.3903 & - 0.0403 & 0.1467 \end{array} .
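The diagonal entries of these matrices are the estimated variances, so standard errors and approximate Wald intervals follow directly. The Python sketch below uses the reported matrices with rows and columns ordered as (α̂, β̂, θ̂); the function names are hypothetical, and the point estimate passed to the interval function is a placeholder rather than a value from the paper.

```python
import math

# Reported variance-covariance matrices, ordered as (alpha, beta, theta).
COV_I = [
    [1814.91, 33.3254, -9.4677],
    [33.3254, 0.6021, -0.1627],
    [-9.4677, -0.1627, 0.0352],
]
COV_II = [
    [0.0611, 0.0267, -0.3903],
    [0.0267, 0.0057, -0.0403],
    [-0.3903, -0.0403, 0.1467],
]

def standard_errors(cov):
    """Square roots of the diagonal (variance) entries."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]

def wald_interval(estimate, se, z=1.96):
    """Approximate 95% interval: estimate +/- z * standard error."""
    return estimate - z * se, estimate + z * se

se_I = standard_errors(COV_I)
se_II = standard_errors(COV_II)
# Placeholder point estimate (hypothetical), shown only to illustrate the call.
lower, upper = wald_interval(1.0, se_II[1])
```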

On the other hand, the well-known loss-of-information criteria, such as Akaike’s information criterion (AIC), the corrected version (AICC), Bayesian information criterion (BIC), Hannan–Quinn information criterion (HQIC), and consistent AIC (CAIC) are recommended for model selection in cases where traditional goodness-of-fit metrics are unable to distinguish the better model due to overlapping results. The model’s applicability is further supported by Table 8 and Table 9, which clearly support the suggested model by showing the model with the least loss of information for both data sets. Moreover, distributions such as GuD(2), WD(2), and EVD(2) exhibit very low p-values, indicating a significant influence of outliers, particularly in the tails. Even more stable distributions like LND(2) and GD(2) fall below the 0.05 significance level, suggesting that outliers have caused them to be rejected. In contrast, only SSD maintains a relatively high and stable p-value, reflecting its robustness against deviations.

4.7. Hydrological Parameters

The AMF series is widely used in FFA for two main reasons. First, its accessibility, as most data are organized in a way that makes annual series readily available. Second, there is a straightforward theoretical foundation for extrapolating the frequency of AMF series data beyond the observed range (see [10,17,18]). Therefore, classical frequency analyses are applied to all AMF series. Since the SSD model is determined to be the most suitable based on the two data analyses mentioned above, we proceed to investigate other hydrological characteristics through return period estimation, utilizing the properties of this model.

Return Period

The likelihood of events, such as windstorms, tornadoes, and floods reoccurring at least once, is often expressed in terms of a return period length, which is typically denoted by

T

. This return period is the reciprocal of the probability of exceedance in a given year (see [10]). The relationship between exceedance probability and the annual return period can now be described as follows.

\begin{matrix} F (x_{T}) & = & P (X \leq x_{T}) = 1 - P (X > x_{T}) = 1 - \frac{1}{T}, \end{matrix}

which implies

\begin{matrix} p & = & P (X > x_{T}) = \frac{1}{T}, \end{matrix}

hence,

T = \frac{1}{p}

, where

F (x_{T}) = {(1 - Sin (\frac{π}{2} {(\frac{α}{x_{T}})}^{θ}))}^{β}

is the probability of non-exceedance and

x_{T}

is a high threshold whose probability of exceedance is

p

. Therefore, the return level

x_{T}

for the SSD can be obtained by the following expression

x_{T} = {(\frac{π}{2})}^{\frac{1}{θ}} α {(ArcSin [1 - {(1 - \frac{1}{T})}^{\frac{1}{β}}])}^{- 1 / θ}, α, β, θ > 0,

where

x_{T} > 0

and

T \geq 1

. Table 12 delivers estimates of the return level

x_{T}

and Table 13 yields estimated times of the recurrence of flood for the data set separately against the return periods

T = 5, 10, 20, 25, 30, 40, 50

years and return levels

x_{T}

where

T = \frac{1}{P (x_{T})}

, where

P (x_{T}) = SF (x_{T})

is the SF of the SSD given by

{SF}_{SSD} (x | α, θ, β) = 1 - {(1 - Sin (\frac{π}{2} {(\frac{α}{x})}^{θ}))}^{β}, x \geq α, α, β, θ > 0 .

In the computations, the unknown parameters are replaced by their estimates

\hat{α}

,

\hat{θ}

and

\hat{β}

that denote the MLEs of the SSD for the corresponding data set. Additionally, the plots in Figure 18 and Figure 19 for these data sets suggest that the proposed model yields realistic (neither too long nor too short) return periods when compared with the competing models. This comparison is also presented in Table 12 and Table 13, which indicate that the estimated 50-year flood discharge is about 7614.74 cfs and 121.354 m³/s for Data Set-I and -II, respectively. The competing models imply longer return periods for floods of the same magnitude, which affects the feasibility of the construction and administration of reservoirs. Finally, Table 14 and Table 15 present confidence intervals for the return period and return level estimates based on the non-central t-distribution.
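The return-level and return-period formulas of this subsection are inverses of each other, which the following Python sketch makes explicit. The parameter values are hypothetical placeholders, not the fitted estimates reported in the tables, and the function names are illustrative.

```python
import math

def ssd_cdf(x, alpha, beta, theta):
    """Non-exceedance probability F(x) for x >= alpha."""
    return (1.0 - math.sin(0.5 * math.pi * (alpha / x) ** theta)) ** beta

def return_level(T, alpha, beta, theta):
    """Level x_T exceeded on average once every T years (T > 1)."""
    s = math.asin(1.0 - (1.0 - 1.0 / T) ** (1.0 / beta))
    return (math.pi / 2.0) ** (1.0 / theta) * alpha * s ** (-1.0 / theta)

def return_period(x, alpha, beta, theta):
    """T = 1 / SF(x), the reciprocal of the exceedance probability."""
    return 1.0 / (1.0 - ssd_cdf(x, alpha, beta, theta))

# Hypothetical parameter values, for illustration only.
alpha, beta, theta = 1500.0, 2.0, 1.5
periods = [5, 10, 20, 25, 30, 40, 50]
levels = [return_level(T, alpha, beta, theta) for T in periods]
recovered = [return_period(x, alpha, beta, theta) for x in levels]
```

Feeding each computed level back into the return-period function should recover the original T, which is a quick consistency check on any fitted parameter set.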

5. Conclusions and Future Work

This article presents a flexible probability model, the lower bounded Sine-Skewed Distribution (SSD), and examines its mathematical and statistical properties, including the mode, hazard function, asymptotic distributions, quantiles, moments, and order statistics. The model’s parameters are estimated via maximum likelihood estimation (MLE). A comprehensive simulation study is conducted to assess the performance of the estimators, demonstrating that MLE provides reliable parameter estimates due to its consistency, asymptotic efficiency, and fulfillment of regularity conditions. The simulation results further confirm that MLE exhibits lower bias and mean squared error (MSE) as the sample size increases, reinforcing its suitability for practical applications. The model is applied to two real-world flood data sets for validation. The analysis confirms that the proposed SSD significantly improves flood frequency analysis, aiding in reservoir design for defined timeframes. Key assumptions for flood data, including independence, trend analysis, stationarity, outlier detection, and autocorrelation function (ACF) checks, are rigorously assessed. The model demonstrates robustness even in the presence of outliers, ensuring reliable flood risk assessments. Additionally, confidence intervals for return levels and return periods are constructed, enhancing the interpretability of extreme event predictions. The results highlight that the SSD, coupled with MLE-based inference, performs effectively in flood data analysis and hydrological applications, offering a valuable tool for water resource management and infrastructure planning. For future research, we plan to explore characterization issues of SSD, extend it to a bivariate framework for modeling correlated flood events, and apply it to broader hydrological and environmental data sets.
A multivariate version of SSD will also be developed to improve flood prediction and address complex environmental challenges. This extension aims to provide more reliable tools for policymakers and engineers managing water resources and disaster preparedness.

Author Contributions

Conceptualization, T.H. and M.S.; methodology, T.H. and M.A.; software, T.H. and M.S.; validation, T.H. and B.M.G.K.; formal analysis, T.H. and M.S.; writing—original draft preparation, T.H. and M.A.; writing—review and editing, T.H. and M.A.; visualization, M.S., T.H. and B.M.G.K.; supervision, M.A., B.M.G.K. and M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The study’s application section lists the data that were used along with their citations.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

Alkhairy, I.; Nagy, M.; Muse, A.H.; Hussam, E. The Arctan-X Family of Distributions: Properties, Simulation, and Applications to Actuarial Sciences. Complexity 2021, 2021, 4689010.
Nguyen, T.; Lee, S. Application of advanced heavy-tailed distributions in flood frequency analysis with climate change considerations. Stoch. Environ. Res. Risk Assess. 2023, 37, 987–1004.
Al-Babtain, A.A.; Shakhatreh, M.K.; Nassar, M.; Afify, A.Z. A new modified Kies family: Properties, estimation under complete and type-II censored samples, and engineering applications. Mathematics 2020, 8, 1345.
Allouche, M.; Girard, S.; Gobet, E. Estimation of extreme quantiles from heavy-tailed distributions with neural networks. Stat. Comput. 2024, 34, 12.
Kim, T.J.; Kwon, H.H.; Shin, Y.S. Frequency analysis of storm surge using Poisson-Generalized Pareto distribution. J. Korea Water Resour. Assoc. 2019, 52, 173–185.
Korkmaz, M.Ç. A new heavy-tailed distribution defined on the bounded interval: The logit slash distribution and its application. J. Appl. Stat. 2020, 47, 2097–2119.
Zhou, Z.; Liu, S.; Hu, Y.; Liang, Y.; Lin, H.; Guo, Y. Analysis of precipitation extremes in the Taihu Basin of China based on the regional L-moment method. Hydrol. Res. 2017, 48, 468–479.
Ahmad, Z.; Mahmoudi, E.; Dey, S. A new family of heavy tailed distributions with an application to the heavy tailed insurance loss data. Commun. Stat.-Simul. Comput. 2022, 51, 4372–4395.
Lyu, H.M.; Sun, W.J.; Shen, S.L.; Arulrajah, A. Flood risk assessment in metro systems of mega-cities using a GIS-based modeling approach. Sci. Total Environ. 2018, 626, 1012–1025.
Hamed, K.; Rao, A.R. Flood Frequency Analysis; CRC Press: Boca Raton, FL, USA, 2019.
Zhang, Q.; Gu, X.; Singh, V.P.; Xiao, M. Flood frequency analysis with consideration of hydrological alterations: Changing properties, causes and implications. J. Hydrol. 2014, 519, 803–813.
Chen, J.; Liu, Y.; Zhang, Q. Advances in flood frequency analysis: Incorporating machine learning and heavy-tailed models. Water Resour. Res. 2024, 60, e2023WR030233.
Li, C.; Sun, N.; Lu, Y.; Guo, B.; Wang, Y.; Sun, X.; Yao, Y. Review on urban flood risk assessment. Sustainability 2022, 15, 765.
Zhou, Y.; Guo, S.; Xu, C.Y.; Xiong, L.; Chen, H.; Ngongondo, C.; Li, L. Probabilistic interval estimation of design floods under non-stationary conditions by an integrated approach. Hydrol. Res. 2022, 53, 259–278.
Deng, J. Maximum entropy method for flood frequency analysis: A case study of the Grand River in Ontario, Canada. In IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2019; Volume 344, p. 012002.
Griffis, V.W.; Stedinger, J.R. Log-Pearson type 3 distribution and its application in flood frequency analysis. I: Distribution characteristics. J. Hydrol. Eng. 2007, 12, 482–491.
Hasan, I.F. Flood Frequency Analysis of Annual Maximum Streamflows at Selected Rivers in Iraq. Jordan J. Civ. Eng. 2020, 14, 573–586.
Hosking, J.R.M.; Wallis, J.R. Some statistics useful in regional frequency analysis. Water Resour. Res. 1993, 29, 271–281.
McCuen, R.H. Modeling Hydrologic Change: Statistical Methods; CRC Press: Boca Raton, FL, USA, 2016.
Nouri Gheidari, M.H. Comparisons of the L-and LH-moments in the selection of the best distribution for regional flood frequency analysis in Lake Urmia Basin. Civ. Eng. Environ. Syst. 2013, 30, 72–84.
Sagrillo, M.; Guerra, R.R.; Bayer, F.M. Modified Kumaraswamy distributions for double bounded hydro-environmental data. J. Hydrol. 2021, 603, 127021.
Boorman, D.B. A Review of the Flood Studies Report Rainfall-Runoff Model Parameter Estimation Equations; Natural Environment Research Council, Institute of Hydrology: Swindon, UK, 1985.
Cunnane, C. Statistical distribution for flood frequency analysis. In WMO Operational Hydrology; Report No. 33, WMO-No. 718; Operational Hydrology Report (WMO): Geneva, Switzerland, 1989.
Millington, N.; Das, S.; Simonovic, S.P. The Comparison of GEV, Log-Pearson Type 3 and Gumbel Distributions in the Upper Thames River Watershed Under Global Climate Models; Department of Civil and Environmental Engineering, the University of Western: London, ON, Canada, 2011.
Rowinski, P.M.; Strupczewski, W.G.; Singh, V.P. A note on the applicability of log-Gumbel and log-logistic probability distributions in hydrological analyses: I. Known pdf. Hydrol. Sci. J. 2002, 47, 107–122.
Wang, W.C.; Xu, D.M.; Chau, K.W.; Chen, S. Improved annual rainfall-runoff forecasting using PSO–SVM model based on EEMD. J. Hydroinformatics 2013, 15, 1377–1390.
Sapkota, L.P.; Kumar, P.; Kumar, V. A new class of sin-g family of distributions with applications to medical data. Reliab. Theory Appl. 2023, 18, 734–750.
Faruk, M.U.; Isa, A.M.; Kaigama, A. Sine-Weibull Distribution: Mathematical Properties and Application to Real Datasets. Reliab. Theory Appl. 2024, 19, 65–72.
Isa, A.M.; Bashiru, S.O.; Ali, B.A.; Adepoju, A.A.; Itopa, I.I. Sine-exponential distribution: Its mathematical properties and application to real dataset. UMYU Sci. 2022, 1, 127–131.
Isa, A.M.; Ali, B.A.; Zannah, U. Sine burr xii distribution: Properties and application to real data sets. Arid. Zone J. Basic Appl. Res. 2022, 1, 48–58.
Mustapha, B.A.; Isa, A.M.; Sule, O.B.; Itopa, I.I. Sine-Lomax distribution: Properties and applications to real data sets. FUDMA J. Sci. 2023, 7, 60–66.
Bakouch, H.S.; Hussain, T.; Chesneau, C.; Jónás, T. A notable bounded probability distribution for environmental and lifetime data. Earth Sci. Inform. 2022, 15, 1607–1620.
Merz, B.; Basso, S.; Fischer, S.; Lun, D.; Blöschl, G.; Merz, R.; Schumann, A. Understanding heavy tails of flood peak distributions. Water Resour. Res. 2022, 58, e2021WR030506.
Kumar, D.; Singh, U.; Singh, S.K. A New Distribution Using Sine Function Its Application to Bladder Cancer Patients Data. J. Stat. Appl. Pro. 2015, 4, 417–427.
England, J.F., Jr.; Cohn, T.A.; Faber, B.A.; Stedinger, J.R.; Thomas, W.O., Jr.; Veilleux, A.G.; Kiang, J.E.; Mason, R.R., Jr. Guidelines for Determining Flood Flow Frequency—Bulletin 17C, Techniques and Methods 4-B5; US Geological Survey: Reston, VA, USA, 2018; Chapter B5; 148p.
Kolmogorov, A. Sulla determinazione empirica di una legge di distribuzione. Giorn Dell’inst Ital Degli Att 1933, 4, 89–91.
Smirnov, N. Table for estimating the goodness of fit of empirical distributions. Ann. Math. Stat. 1948, 19, 279–281.
Cramér, H. On the composition of elementary errors. Skand. Aktuarietids 1928, 11, 13–74.
Von Mises, R. Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und Theoretischen Physik; Mary S. Rosenberg: New York, NY, USA, 1945.
Anderson, T.W.; Darling, D.A. Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Ann. Math. Stat. 1952, 23, 193–212.
Hussain, T.; Bakoush, S.H.; Rehman, U.Z.; Shan, Q. A Flexible Discrete Probability Model for Partly Cloudy Days. Colomb. J. Stat. 1997, 48, 1–21.
Hussain, T.; Bakouch, H.S.; Gharari, F. Environmental Data Analysis with a Versatile Model on Time Scales. Iran. J. Sci. 2025, 1–15.
Biçer, C.; Bakouch, H.S.; Biçer, H.D.; Alomair, G.; Hussain, T.; Almohisen, A. Unit Maxwell-Boltzmann Distribution and Its Application to Concentrations Pollutant Data. Axioms 2024, 13, 226.
Hannan, E.J.; Quinn, B.G. The determination of the order of an autoregression. J. R. Stat. Soc. Ser. B (Methodol.) 1979, 41, 190–195.
Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.

Figure 1. PDF graphs of SSD.

Figure 2. HRF graphs of SSD.

Figure 3. Modal plot-I.

Figure 4. Modal plot-II.

Figure 5. Skewness and kurtosis graphs of SSD by percentile.

Figure 6. Skewness and kurtosis graphs of SSD by moments.

Figure 7. Violin plot for the Data Set-I and -II.

Figure 8. QQ-plot for Data Set-I.

Figure 9. QQ-plot for Data Set-II.

Figure 10. Time series and ACF plots for Data Set-II.

Figure 11. Time series and ACF plots for Data Set-I.

Figure 12. Trend line plot for Data Set-I.

Figure 13. Trend line plot for Data Set-II.

Figure 14. Histogram for Data Set-I and -II.

Figure 15. Confidence interval belt of $\hat{α}$ for Data Set-I and -II.

Figure 16. Confidence interval belt of $\hat{β}$ for Data Set-I and -II.

Figure 17. Confidence interval belt of $\hat{θ}$ for Data Set-I and -II.

Figure 18. Competing models’ return periods for Data Set-I.

Figure 19. Competing models’ return periods for Data Set-II.

Table 1. Modal values of SSD.

$α$	$β$	$θ$	$X_{Mode}$	$\frac{X_{Mode}}{α}$
0.5	1.00	1.00	0.729332	1.458663
0.5	1.00	1.25	0.692820	1.385640
0.5	1.00	1.50	0.666296	1.332592
0.5	1.00	1.75	0.646167	1.292333
0.5	1.25	2.00	0.682479	1.364958
0.5	1.25	2.25	0.663688	1.327376
0.5	1.25	2.50	0.648399	1.296797
0.5	1.25	2.75	0.635717	1.271435
0.5	1.50	3.00	0.655265	1.310531
0.5	1.50	3.25	0.643580	1.287160
0.5	1.50	3.50	0.633528	1.267056
0.5	1.50	3.75	0.624789	1.249578
0.5	1.75	4.00	0.636909	1.273818
0.5	1.75	4.25	0.628827	1.257653
0.5	1.75	4.50	0.621644	1.243288
0.5	1.75	4.75	0.615219	1.230437
0.5	2.00	5.00	0.623408	1.246816
0.5	2.00	5.25	0.617429	1.234858
0.5	2.00	5.50	0.612001	1.224003
0.5	2.00	5.75	0.607053	1.214105
1.0	1.00	1.25	1.385641	1.385641
1.0	1.00	1.50	1.332592	1.332592
1.0	1.00	1.75	1.292334	1.292334
1.0	1.00	2.00	1.260749	1.260749
1.0	1.25	2.25	1.327376	1.327376
1.0	1.25	2.50	1.296798	1.296798
1.0	1.25	2.75	1.271435	1.271435
1.0	1.25	3.00	1.250060	1.250060

Table 2. Numerical study of descriptive statistics from SSD.

$α$	$θ$	$β$	Mean	Variance	Skewness	Kurtosis
0.5	1.0	1.0	1.2337	0.4516	0.6396	0.4067
0.5	1.0	1.25	1.2337	0.4516	0.6396	0.4067
0.5	1.0	1.5	1.2337	0.4516	0.6396	0.4067
0.5	1.0	2.0	1.2337	0.4516	0.6396	0.4067
0.5	1.25	1.0	1.1272	0.5379	0.7685	0.5886
0.5	1.25	1.25	1.1272	0.5379	0.7685	0.5886
0.5	1.25	1.5	1.1272	0.5379	0.7685	0.5886
0.5	1.25	2.0	1.1272	0.5379	0.7685	0.5886
0.5	1.5	1.0	1.0613	0.5906	0.8535	0.7289
0.5	1.5	1.25	1.0613	0.5906	0.8535	0.7289
0.5	1.5	1.5	1.0613	0.5906	0.8535	0.7289
0.5	1.5	2.0	1.0613	0.5906	0.8535	0.7289
0.5	2.0	1.0	0.9844	0.6479	0.9575	0.9148
0.5	2.0	1.25	0.9844	0.6479	0.9575	0.9148
0.5	2.0	1.5	0.9844	0.6479	0.9575	0.9148
0.5	2.0	2.0	0.9844	0.6479	0.9575	0.9148
1.0	1.0	1.0	2.4674	2.7899	1.8071	4.4036
1.0	1.0	1.25	2.4674	2.7899	1.8071	4.4036
1.0	1.0	1.5	2.4674	2.7899	1.8071	4.4036
1.0	1.0	2.0	2.4674	2.7899	1.8071	4.4036
1.0	1.25	1.0	2.2543	2.1537	1.6078	3.6193
1.0	1.25	1.25	2.2543	2.1537	1.6078	3.6193
1.0	1.25	1.5	2.2543	2.1537	1.6078	3.6193
1.0	1.25	2.0	2.2543	2.1537	1.6078	3.6193
1.0	1.5	1.0	2.1226	1.7629	1.4688	3.1402
1.0	1.5	1.25	2.1226	1.7629	1.4688	3.1402
1.0	1.5	1.5	2.1226	1.7629	1.4688	3.1402
1.0	1.5	2.0	2.1226	1.7629	1.4688	3.1402
1.0	2.0	1.0	1.9687	1.5916	1.3525	2.7537
1.0	2.0	1.25	1.9687	1.5916	1.3525	2.7537
1.0	2.0	1.5	1.9687	1.5916	1.3525	2.7537
1.0	2.0	2.0	1.9687	1.5916	1.3525	2.7537
1.25	1.0	1.0	3.0843	5.5426	2.2568	6.8014
1.25	1.0	1.25	3.0843	5.5426	2.2568	6.8014
1.25	1.0	1.5	3.0843	5.5426	2.2568	6.8014
1.25	1.0	2.0	3.0843	5.5426	2.2568	6.8014
1.25	1.25	1.0	2.8179	4.2149	2.0079	5.7171
1.25	1.25	1.25	2.8179	4.2149	2.0079	5.7171
1.25	1.25	1.5	2.8179	4.2149	2.0079	5.7171
1.25	1.25	2.0	2.8179	4.2149	2.0079	5.7171
1.25	1.5	1.0	2.6532	3.4423	1.8358	5.0376
1.25	1.5	1.25	2.6532	3.4423	1.8358	5.0376
1.25	1.5	1.5	2.6532	3.4423	1.8358	5.0376
1.25	1.5	2.0	2.6532	3.4423	1.8358	5.0376
1.25	2.0	1.0	2.4609	2.8008	1.6909	4.4275
1.25	2.0	1.25	2.4609	2.8008	1.6909	4.4275
1.25	2.0	1.5	2.4609	2.8008	1.6909	4.4275
1.25	2.0	2.0	2.4609	2.8008	1.6909	4.4275

Table 3. Mean bias and MSE of the MLEs for various sample sizes.

Set	n	Bias( $\hat{α}$ )	Bias( $\hat{θ}$ )	Bias( $\hat{β}$ )	MSE( $\hat{α}$ )	MSE( $\hat{θ}$ )	MSE( $\hat{β}$ )
I	15	−0.0223	0.0037	0.1477	0.1002	0.4651	4.0309
	25	−0.0114	0.0015	0.0961	0.0377	0.1462	1.1923
	50	−0.0035	0.0001	0.0801	0.0190	−0.0021	0.1015
	75	−0.0021	0.0000	0.0737	0.0114	−0.0190	0.0024
	100	−0.0015	0.0000	0.0677	0.0086	−0.0236	0.0019
	150	−0.0011	0.0000	0.0628	0.0066	−0.0260	0.0015
II	15	0.0113	0.0141	−0.0621	0.1599	1.4054	14.8283
	25	−0.0086	0.0116	−0.0119	0.1192	1.7738	16.6767
	50	−0.0233	0.0082	0.0163	0.0656	1.6010	13.6349
	75	−0.0243	0.0060	0.0359	0.0501	1.2596	9.7074
	100	−0.0234	0.0051	0.0343	0.0388	1.0495	7.5112
	150	−0.0199	0.0040	0.0295	0.0303	0.7834	5.0098
III	15	−0.0063	0.0010	7.7915	66.5011	0.1368	0.0201
	25	−0.0002	0.0001	8.5028	73.6786	0.1358	0.0190
	50	−0.0000	0.0000	8.4553	72.7072	0.1379	0.0195
	75	0.0000	0.0000	8.6431	75.99999	0.1360	0.0188
	100	0.0000	0.0000	8.6678	75.3438	0.1355	0.0185
	150	−0.0001	0.0000	8.6595	75.2051	0.1365	0.0189
IV	15	−0.0041	0.0005	9.2769	91.0090	−1.0936	1.1965
	25	−0.0112	0.0018	9.2276	90.4614	−1.0805	1.1694
	50	0.0007	0.0000	9.8235	97.1826	−1.0895	1.1872
	75	0.0007	0.0000	9.8791	97.9300	−1.0894	1.1869
	100	0.0008	0.0000	9.9400	98.9089	−1.0896	1.1874
	150	0.0008	0.0000	9.9436	98.9663	−1.0894	1.1869
V	15	−0.0694	0.1861	−0.0145	0.2764	1.9429	18.6212
	25	−0.0970	0.1507	0.0001	0.1944	1.9350	17.7271
	50	−0.1005	0.1117	0.0326	0.1124	1.5730	13.0220
	75	−0.1269	0.1036	0.0680	0.0950	1.6428	12.7965
	100	−0.0796	0.0702	0.0361	0.0650	0.9820	6.9775
	150	−0.0667	0.0506	0.0334	0.0438	0.6870	4.3325
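
As a rough illustration of how the bias and MSE columns of a table like Table 3 are typically obtained, the sketch below repeats an estimation experiment many times and summarizes the estimates. The SSD likelihood is not reproduced here, so the closed-form MLE of an exponential rate is used purely as a stand-in; the function names `bias`, `mse`, and `simulate_rate_mle`, the true rate 2.0, and the replication count are all assumptions made for illustration.

```python
import random
import statistics

def bias(estimates, true_value):
    """Mean bias: average estimate minus the true parameter value."""
    return statistics.fmean(estimates) - true_value

def mse(estimates, true_value):
    """Mean squared error: average squared deviation from the true value."""
    return statistics.fmean([(e - true_value) ** 2 for e in estimates])

def simulate_rate_mle(true_rate, n, replications, seed=0):
    """Stand-in experiment (not the SSD): MLE of an exponential rate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        estimates.append(1.0 / statistics.fmean(sample))  # closed-form MLE
    return estimates

# Same sample sizes as Table 3; the printed numbers belong to the stand-in model.
for n in (15, 25, 50, 75, 100, 150):
    est = simulate_rate_mle(true_rate=2.0, n=n, replications=500)
    print(n, round(bias(est, 2.0), 4), round(mse(est, 2.0), 4))
```

For the SSD itself, the line computing the closed-form MLE would be replaced by a numerical maximization of the SSD log-likelihood, with one bias and one MSE value recorded per parameter.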

Table 4. Comparison of Goodness of Fit Measures.

Criterion/Test	Use	Sensitivity	Sample Size
Kolmogorov–Smirnov	EDF vs. CDF	Center	Large
Cramér–von Mises	EDF vs. CDF	Entire distribution	Moderate
Anderson–Darling	EDF vs. CDF	Tails	All sizes
AIC/AICc	Model fit/complexity	Fit (AICc for small n)	All
BIC/CAIC	Simpler models	Strong penalty	Large
HQIC	Balanced selection	Moderate penalty	Moderate+
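
The three EDF-based statistics in Table 4 can be computed from the fitted CDF evaluated at the ordered sample. The sketch below uses the standard unmodified textbook forms of the Kolmogorov–Smirnov, Cramér–von Mises, and Anderson–Darling statistics; whether the starred statistics $A_{0}^{*}$ and $W_{0}^{*}$ reported later include a further adjustment is not stated in this section, so the code should be read as an assumption-laden illustration. The function name `edf_statistics` and the uniform CDF in the usage line are hypothetical.

```python
import math

def edf_statistics(sample, cdf):
    """Return (KS, Cramér–von Mises, Anderson–Darling) for a fitted CDF."""
    u = sorted(cdf(x) for x in sample)  # probability integral transform
    n = len(u)
    # KS: largest gap between the EDF steps and the fitted CDF
    ks = max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))
    # CvM: squared distance to the plotting positions (2i - 1) / (2n)
    cvm = 1.0 / (12 * n) + sum(
        (u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)
    )
    # AD: log terms give extra weight to both tails
    ad = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
        for i in range(n)
    ) / n
    return ks, cvm, ad

# Usage with a stand-in Uniform(0, 1) CDF; a fitted SSD CDF would go here.
ks, cvm, ad = edf_statistics([0.12, 0.35, 0.58, 0.81], lambda x: x)
print(ks, cvm, ad)
```

The logarithms in the Anderson–Darling sum are what make it sensitive to the tails, which matches the "Tails" entry in Table 4.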

Table 5. Nonparametric tests summary.

Data Sets	KPSS-Test	p-Value	MK-Test	p-Value	SW-Test	p-Value
I	0.1004	0.1000	−0.7891	0.4300	0.7879	0.0000
II	0.1688	0.1000	−0.7168	0.4735	0.8652	0.0000

Table 6. Kendall’s rank correlation ($τ$) test summary.

Data Sets	$\hat{τ}$	t-Test	p-Value
I	−0.0762	−0.7970	0.4254
II	0.0388	0.5602	0.5753

Table 7. Descriptive summary of data sets.

Data Set	Sample Size	Mean	Median	S.D.	Skewness	Kurtosis
I	52	3011.73	2720.00	1363.71	2.1199	8.6042
II	96	62.9785	57.134	14.4087	1.13741	3.53627

Table 8. MLEs and goodness-of-fit measures of Data Set-I.

Distribution	$\hat{α}$	$\hat{θ}$	$\hat{β}$	$A_{0}^{*}$	$W_{0}^{*}$	KS	p-Value
SSD	1166.68	3.2396	5.5955	0.1742	0.0305	0.0775	0.9311
Kappa(3)	4.4955	0.9577	2614.65	33.9922	7.3047	0.7408	0.0211
GD(3)	74.8171	0.0024	0.3089	1.0732	0.1750	0.1326	0.3552
WD(2)	2.3071	3404.18	-	2.6105	0.4431	0.1927	0.0525
GD(2)	6.7968	443.11	-	1.5066	0.2559	0.1549	0.1903
EVD(2)	2479.07	804.21	-	0.8917	0.1253	0.1016	0.6926
LLD(2)	4.9309	2693.13	-	0.5937	0.0595	0.0683	0.9263
LND(2)	7.9349	0.3668	-	0.9190	0.1479	0.1236	0.4427
GuD(2)	3798.06	2001.99	-	5.9406	1.1005	0.2765	0.0011

Table 9. MLEs and goodness-of-fit measures of Data Set-II.

Distribution	$\hat{α}$	$\hat{θ}$	$\hat{β}$	$A_{0}^{*}$	$W_{0}^{*}$	KS	p-Value
SSD	46.9895	4.2850	0.7397	0.6910	0.1038	0.0805	0.5689
Kappa(3)	687.6587	0.0009	152.93	1.9240	0.2988	0.1061	0.2357
GD(3)	38.8770	0.4668	0.7473	3.1205	0.5129	0.1482	0.0307
WD(2)	4.3382	68.8192	−	4.4453	0.7276	0.1771	0.0051
GD(2)	21.7629	2.8938	−	3.2649	0.5413	0.1520	0.0248
EVD(2)	56.7169	9.7213	−	2.6260	0.4097	0.12832	0.0875
LLD(2)	1.7122	51042.47	−	2.7160	0.3742	0.1216	0.1202
LND(2)	4.1196	0.2096	−	2.9166	0.4804	0.1450	0.0368
GuD(2)	70.8220	17.0419	−	6.3241	1.0715	0.2213	0.0002

Table 10. Information criterion for Data Set-I.

Distribution	$- l$	AIC	AICC	BIC	HQIC	CAIC
$SSD$	429.372	864.743	865.243	870.597	866.987	873.597
Kappa(3)	434.2351	870.8700	871.3700	875.7300	874.6200	871.3700
GD(3)	435.2270	878.4540	879.3050	886.2590	873.2020	879.3050
WD(2)	444.8790	893.7580	894.0020	897.6600	892.5060	894.0020
GD(2)	437.8450	879.6910	879.9360	883.5930	878.4390	879.9360
EVD(2)	434.3140	872.6290	872.8740	876.5310	871.3770	872.8740
LLD(2)	433.7180	871.4350	871.6800	875.3380	870.1840	871.6800
LND(2)	434.2490	872.4980	872.7430	876.4000	871.2460	872.7430
GuD(2)	467.7230	939.4460	939.6910	943.3480	938.1940	939.6910
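
The information criteria in Table 10 can be recomputed from the negative log-likelihood $-l$, the number of parameters $k$, and the sample size $n$. The sketch below assumes the usual textbook definitions, which this section does not state explicitly; the function name `information_criteria` is hypothetical. With the SSD values $-l = 429.372$, $k = 3$, and $n = 52$ (Table 7), these definitions agree with the SSD row of Table 10 to within rounding.

```python
import math

def information_criteria(neg_loglik, k, n):
    """Standard criteria from -log L, parameter count k and sample size n."""
    two_nll = 2.0 * neg_loglik
    log_n = math.log(n)
    return {
        "AIC": two_nll + 2 * k,
        "AICc": two_nll + 2 * k + 2 * k * (k + 1) / (n - k - 1),
        "BIC": two_nll + k * log_n,
        "HQIC": two_nll + 2 * k * math.log(log_n),
        "CAIC": two_nll + k * (log_n + 1),
    }

# SSD fit to Data Set-I: -l = 429.372 (Table 10), k = 3, n = 52 (Table 7)
ssd = information_criteria(neg_loglik=429.372, k=3, n=52)
for name, value in ssd.items():
    print(f"{name}: {value:.3f}")
```

Smaller values indicate a better trade-off between fit and complexity, so models are ranked by sorting on any one of these columns.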

Table 11. Information criterion for Data Set-II.

Distribution	$- l$	AIC	AICC	BIC	HQIC	CAIC
$SSD$	362.926	731.852	732.102	739.667	735.008	742.667
Kappa(3)	372.448	750.897	751.158	758.59	754.007	761.59
GD(3)	383.824	771.647	771.776	776.776	773.72	778.776
WD(2)	397.187	798.374	798.503	803.503	800.447	805.503
GD(2)	384.589	773.178	773.307	778.306	775.251	780.306
EVD(2)	376.169	756.337	756.466	761.466	758.41	763.466
LLD(2)	383.481	770.962	771.091	776.09	773.035	778.09
LND(2)	381.702	767.404	767.533	772.533	769.477	774.533
GuD(2)	412.409	828.817	828.946	833.946	830.891	835.946

Table 12. Return levels against specific time period ($T$).

Data Set	5	10	20	25	30	40	50
I	3647.86	4584.17	5716.35	6132.04	6492.69	7103.31	7614.74
II	71.2795	83.5847	98.0997	103.307	107.77	115.218	121.354

Table 13. Return period against some threshold values ($x_{T}$).

Data Set	5000	5500	6000	7000	8000	9000	10,000
I	13.1098	17.6994	23.3262	38.1651	58.5975	85.6291	120.297
Data Set	45	50	60	80	100	120	140
II	1.1239	1.1557	2.3682	8.2659	21.7269	47.6459	92.3923
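
Tables 12 and 13 are two views of the same relationship. Assuming the usual definition for block maxima, the return period of a threshold $x_{T}$ is $T = 1/(1 - F(x_{T}))$, and the return level for a given $T$ solves $F(x_{T}) = 1 - 1/T$. The sketch below implements both directions for an arbitrary CDF. Because the SSD CDF is not reproduced in this section, an exponential CDF with mean 3000 is used as a purely hypothetical stand-in, so the printed numbers are not expected to match the tables; the names `return_period`, `return_level`, and `example_cdf` are likewise assumptions.

```python
import math

def return_period(cdf, threshold):
    """T = 1 / (1 - F(x)): mean number of periods between exceedances of x."""
    return 1.0 / (1.0 - cdf(threshold))

def return_level(cdf, period, lo, hi, tol=1e-8):
    """Solve F(x) = 1 - 1/T for x by bisection on the bracket [lo, hi]."""
    target = 1.0 - 1.0 / period
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < target:
            lo = mid  # quantile lies to the right of mid
        else:
            hi = mid  # quantile lies at or to the left of mid
    return 0.5 * (lo + hi)

def example_cdf(x, mean=3000.0):
    """Hypothetical stand-in CDF (exponential); not the SSD."""
    return 1.0 - math.exp(-x / mean)

for T in (5, 10, 20, 50):
    x_T = return_level(example_cdf, T, lo=0.0, hi=1.0e5)
    print(T, round(x_T, 2), round(return_period(example_cdf, x_T), 4))
```

Bisection is used here only because it needs nothing beyond a monotone CDF; when a closed-form percentile function is available, the return level follows directly by evaluating it at $1 - 1/T$.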

Table 14. 95% confidence intervals for return level estimates.

Data Set	5	10	20	50
I	3647.86 ± 951.427	4584.17 ± 1308.83	5716.35 ± 1765.56	7614.74 ± 2579.83
II	71.2795 ± 108.305	83.5847 ± 141.606	98.0997 ± 182.433	121.354 ± 252.433

Table 15. 95% confidence intervals for return period estimates.

Data Set	5	10	20	50
I	(2.1798,10.1026)	(3.6618,22.0289)	(6.3433,47.0698)	(13.399, 47.0616)
II	(0,268.873)	(0,709.365)	(0, 1819.15)	(0, 6222.27)

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
