Article

A Bounded Sine Skewed Model for Hydrological Data Analysis

by Tassaddaq Hussain 1, Mohammad Shakil 2,*, Mohammad Ahsanullah 3 and Bhuiyan Mohammad Golam Kibria 4
1 Department of Mathematics, Mirpur University of Science and Technology (MUST), Mirpur 10250, Pakistan
2 Department of Mathematics, Miami Dade College, Hialeah, FL 51452, USA
3 Department of Management Sciences, Rider University, Lawrence Road, NJ 31111, USA
4 Department of Math & Stat, International University, Miami, FL 21961, USA
* Author to whom correspondence should be addressed.
Analytics 2025, 4(3), 19; https://doi.org/10.3390/analytics4030019 (registering DOI)
Submission received: 16 May 2025 / Revised: 16 July 2025 / Accepted: 8 August 2025 / Published: 13 August 2025

Abstract

Hydrological time series frequently exhibit periodic trends with variables such as rainfall, runoff, and evaporation rates often following annual cycles. Seasonal variations further contribute to the complexity of these data sets. A critical aspect of analyzing such phenomena is estimating realistic return intervals, making the precise determination of these values essential. Given this importance, selecting an appropriate probability distribution is paramount. To address this need, we introduce a flexible probability model specifically designed to capture periodicity in hydrological data. We thoroughly examine its fundamental mathematical and statistical properties, including the asymptotic behavior of the probability density function (PDF) and hazard rate function (HRF), to enhance predictive accuracy. Our analysis reveals that the PDF exhibits polynomial decay as $x \to \infty$, ensuring heavy-tailed behavior suitable for extreme events. The HRF demonstrates decreasing or non-monotonic trends, reflecting variable failure risks over time. Additionally, we conduct a simulation study to evaluate the performance of the estimation method. Based on these results, we refine return period estimates, providing more reliable and robust hydrological assessments. This approach ensures that the model not only fits observed data but also captures the underlying dynamics of hydrological extremes.

1. Introduction

Heavy-tailed distributions are widely studied in fields like actuarial science, engineering, and environmental science [1,2]. Recent hydrological research emphasizes improving tail behavior modeling to better predict extreme events such as floods, heavy rainfall, and glacier melt [3,4,5,6]. Applying heavy-tailed models to flood frequency analysis has proven effective in capturing the probability of rare, severe floods [7]. New distribution families, including those based on sine trigonometric functions, have been developed to enhance risk assessment and refine return period estimates [8,9]. The accurate forecasting of floods is crucial for risk management, especially given the periodic trends in rainfall and runoff. Using flexible, heavy-tailed distributions helps prevent the underestimation of extreme events, safeguarding infrastructure and public safety. Hydrological and environmental time series often display heavy-tailed, unimodal, and right-skewed traits [10]. These are usually modeled as independent stochastic processes, although their true distributions are unknown, leading to reliance on simplified models that may not capture real data complexities. Recent research highlights the need for advanced distribution families to better represent extreme events [11]. Modern techniques, including machine learning, help identify long-term trends, periodicities, and climate change impacts on floods, which are vital for risk assessment, early warnings, and infrastructure resilience planning amid rising flood risks [12,13,14]. Recent advances in probability distributions have introduced flexible families to address various statistical challenges [1]. The accurate estimation of return periods is crucial for designing hydraulic structures and managing flood risks. 
Flood time series analyses highlight the importance of modeling temporal variability and extremes using advanced statistical and machine learning techniques to identify trends, periodicities, and climate impacts [11,13]. These improve flood risk assessment and early warning systems. Flood frequency analysis (FFA), involving data screening, periodicity analysis, distribution selection, and parameter estimation, relies on modern methods to better capture complex flood behaviors influenced by climate and human factors [1,9,10,11,13,15,16,17,18,19,20,21]. A key challenge in FFA is the uncertainty in identifying the true probability distribution of flood events at specific sites; moreover, conventional FFA does not incorporate machine learning or heavy-tailed models [12]. Traditional models such as the Exponential, Log-Normal, Pearson Type-III, Weibull, Fréchet, and Gumbel [19,22,23,24] often struggle to capture complex hydrological features like seasonality, multiple peaks, or bounded extremes [25]. For example, the Gumbel distribution tends to underestimate tail risks, limiting its reliability [13]. While some studies [26,27] have incorporated sine functions to model seasonality, these approaches are limited in capturing tail behavior and complex flood dynamics. Similarly, sine-transformed distributions such as Sine-Weibull, Sine-Lomax, Sine-Exponential, and Sine-Burr [28,29,30,31,32] lack strong theoretical motivation and often fail to accurately model tail extremes and bounded or multimodal data [33]. Overall, these models fall short when data exhibit seasonality, multiple peaks, or bounded periodic extremes, highlighting the need for more flexible and theoretically grounded distribution families to improve flood risk assessment. To overcome these challenges, this section introduces the Sine-Skewed Distribution (SSD), a novel approach designed to enhance flood frequency modeling.
The SSD offers several key advantages: (i) improved tail flexibility to better capture extreme events, (ii) boundedness for realistic hydrological constraints, (iii) accommodation of periodic behavior in time series data, and (iv) the preservation of heavy-tailed characteristics where needed. We derive the fundamental properties of the SSD, including its Probability Density Function (PDF), Hazard Rate Function (HRF), and Quantile Function. Furthermore, we examine both its parametric and asymptotic behavior to ensure robust performance in hydrological applications. This comprehensive analysis positions the SSD as a promising alternative to traditional distributions in FFA.

Derivation of the Proposed Model

The derivation of the proposed model is based on the following steps:
  • The first step is the choice of the odd link function $D(\cdot)$, defined as $D(x;\Theta) = \frac{1 - G(x)}{G(x)}$, which satisfies the conditions (i) $D(\cdot)$ is differentiable and monotonically non-decreasing, and (ii) $D(x) \to a$ as $x \to 0$ and $D(x) \to b$ as $x \to \infty$, where $G(x)$ is the baseline cumulative distribution function (CDF).
  • Now, take the CDF of the log-logistic distribution as the baseline function with parameters $(\theta, \alpha)$, which is defined on the interval $(0, \infty)$ as $G(x \mid \alpha, \theta) = \frac{x^{\theta}}{x^{\theta} + \alpha^{\theta}}$.
  • On incorporating the baseline distribution function into $D(x \mid \alpha, \theta)$, we get $D(x \mid \alpha, \theta) = \left(\frac{\alpha}{x}\right)^{\theta}$.
  • In order to address the common periodic fluctuations in hydrological time series, as highlighted in [19], we used the [34] CDF in terms of $P(x) = \sin\left(\frac{\pi}{2} G(x \mid \alpha, \theta)\right)$ to generate a new class of distributions by modifying trigonometric functions, particularly the transformation of the Sin function into the Sin G-class function.
  • Finally, a new exponentiated Sin-D class is proposed by first substituting $G(x \mid \alpha, \theta)$ into $D(x \mid \alpha, \theta)$ and implementing the necessary domain constraints to guarantee the function's validity as a CDF for this new family:
    $$F(x; \alpha, \beta, \theta) = \left[1 - \sin\left(\frac{\pi}{2} D(x \mid \alpha, \theta)\right)\right]^{\beta}, \quad x \ge \alpha, \tag{1}$$
    which on incorporating $D(x \mid \alpha, \theta)$ in Equation (1) reduces to
    $$F_{\mathrm{SSD}}(x; \alpha, \theta, \beta) = \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta}, \quad x > \alpha, \tag{2}$$
    and its survival function becomes
    $$S_{\mathrm{SSD}}(x; \alpha, \theta, \beta) = 1 - \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta}, \quad x > \alpha, \tag{3}$$
with a corresponding PDF as
$$f(x; \alpha, \theta, \beta) = \frac{\pi \alpha \beta \theta}{2 x^{2}} \left(\frac{\alpha}{x}\right)^{\theta - 1} \cos\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right) \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta - 1}, \quad x > \alpha, \; \alpha, \beta > 0, \; \theta > 0, \tag{4}$$
where $\alpha$, $\theta$ and $\beta$ are the parameters of the distribution. The parameter $\alpha$ sets the lower bound and therefore shifts the distribution to the right, $\theta$ affects the rate at which the CDF approaches 1, and $\beta$ influences the overall shape and tail behavior. The PDF in Equation (4), which involves positive powers and cosine functions, is a skewed function whose shape depends on the parameters. For example, if $\beta > 1$, the distribution tends to be more concentrated near the lower bound $\alpha$; similarly, if $\beta < 1$, the SSD might have a longer right tail. However, the oscillatory sine and cosine functions can produce complex shapes, potentially leading to negative skewness in some parameter regimes.
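As an illustration of Equations (2) to (4), the following is a minimal Python sketch of the SSD CDF, survival function, and PDF. The function names (`ssd_cdf`, `ssd_sf`, `ssd_pdf`) and the use of NumPy are assumptions made here for illustration and are not part of the original study.

```python
import numpy as np

def ssd_cdf(x, alpha, theta, beta):
    """SSD CDF: [1 - sin((pi/2) * (alpha/x)**theta)]**beta for x > alpha, else 0."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    ok = x > alpha
    u = 0.5 * np.pi * (alpha / x[ok]) ** theta
    out[ok] = (1.0 - np.sin(u)) ** beta
    return out

def ssd_sf(x, alpha, theta, beta):
    """Survival function S(x) = 1 - F(x)."""
    return 1.0 - ssd_cdf(x, alpha, theta, beta)

def ssd_pdf(x, alpha, theta, beta):
    """SSD PDF for x > alpha (zero elsewhere)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    ok = x > alpha
    ratio = alpha / x[ok]
    u = 0.5 * np.pi * ratio ** theta
    out[ok] = (0.5 * np.pi * alpha * beta * theta / x[ok] ** 2
               * ratio ** (theta - 1.0)
               * np.cos(u)
               * (1.0 - np.sin(u)) ** (beta - 1.0))
    return out
```

All three functions return one-dimensional arrays, so a scalar input yields an array of length one. A quick sanity check is that a central finite difference of `ssd_cdf` should agree with `ssd_pdf` at interior points.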
The graphs show how different parameter settings affect the SSD’s PDF. The left plot as shown in Figure 1 covers a wide range and reveals predominantly right-skewed distributions peaking at lower values. The right plot, focusing on a narrower range, also demonstrates parameter-driven variations in shape and skewness. Overall, changing parameters significantly impact the distribution, underscoring the importance of proper parameter selection for accurate SSD modeling.
The remainder of the manuscript is structured in the following manner: In Section 2, an exploration of mathematical and statistical features is carried out. Section 3 is devoted to the comparison of methods of parameter estimation like maximum likelihood (MLE), Bayesian method of estimation (BME) and the L-moments estimation (LME) method along with a simulation study. In Section 4, flood data applications are studied and an analysis is made on the basis of goodness-of-fit measures, and the conclusions are presented in Section 5.

2. Exploring Mathematical and Statistical Features

In this section, we delve into the mathematical and statistical features that characterize the SSD G distribution class. We will analyze concepts such as PDF and HRF curve behaviors, quantiles, the moment-generating function, various moments (including conditional moments), mean deviation, Bonferroni and Lorenz measures, and order statistics.

2.1. Shape of the PDF and HRF Curves

Suppose $X$ follows the SSD with the PDF defined in Equation (4); then, the logarithmic form of the density is expressed as
$$\log f(x; \alpha, \beta, \theta) = \log\left(\frac{\pi \alpha \beta \theta}{2}\right) - 2 \log(x) + (\theta - 1) \log\left(\frac{\alpha}{x}\right) + \log \cos\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right) + (\beta - 1) \log\left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right],$$
on differentiating both sides of this log-density with respect to $x$ and then equating the result to zero, we get its mode, which is obtained numerically as portrayed in Table 1. Moreover, it is observed that
$$\frac{d \log f(x; \alpha, \beta, \theta)}{dx} = -\frac{2}{x} - \frac{\theta - 1}{x} + \frac{\pi \alpha (\beta - 1) \theta \left(\frac{\alpha}{x}\right)^{\theta - 1} \cos\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)}{2 x^{2} \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]} + \frac{\pi \alpha \theta \left(\frac{\alpha}{x}\right)^{\theta - 1} \tan\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)}{2 x^{2}} = 0.$$
Here, $\alpha$ scales the mode linearly, and larger $\alpha$ values shift the mode rightward. $\theta$ governs tail behavior, and higher $\theta$ values push the mode closer to $\alpha$. $\beta$ controls skewness: for $\beta > 1$, the mode shifts right; for $\beta < 1$, it shifts left. Moreover, $x \to \alpha^{+}$ implies $f(x) \to 0$, whereas $\log f(x) \to -\infty$. Similarly, $x \to \infty$ implies $f(x) \sim x^{-2}$ as well as a finite mean but infinite variance, which is useful for modeling extreme events, whereas $\frac{d \log f(x)}{dx} \sim -2 \log(x)$. The mode of the SSD is analytically intractable but can be reliably computed numerically; see Table 1. The mode exists for all $x > \alpha$ and responds predictably to parameter changes, making it useful for modeling skewed, heavy-tailed data.
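Since the mode has to be located numerically, a short Python sketch of one way to do so is given here. It maximizes the log-density directly with SciPy's bounded scalar minimizer rather than solving the derivative equation; the function names, the search interval (controlled by the hypothetical `upper_factor` argument), and the assumption that the density has a single interior mode for the chosen parameters are all illustrative choices, not the procedure used to produce Table 1.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ssd_logpdf(x, alpha, theta, beta):
    """Log-density of the SSD for x > alpha."""
    ratio = alpha / x
    u = 0.5 * np.pi * ratio ** theta
    return (np.log(0.5 * np.pi * alpha * beta * theta)
            - 2.0 * np.log(x)
            + (theta - 1.0) * np.log(ratio)
            + np.log(np.cos(u))
            + (beta - 1.0) * np.log1p(-np.sin(u)))

def ssd_mode(alpha, theta, beta, upper_factor=50.0):
    """Locate the mode by maximizing the log-density on (alpha, upper_factor * alpha).

    Assumes a single interior mode inside the search interval.
    """
    lower = alpha * (1.0 + 1e-9)   # stay strictly above the lower bound
    upper = alpha * upper_factor
    result = minimize_scalar(lambda x: -ssd_logpdf(x, alpha, theta, beta),
                             bounds=(lower, upper), method="bounded")
    return result.x
```

Because the density depends on $x$ only through $x/\alpha$ (up to a factor $1/\alpha$), the value returned for a given $(\theta, \beta)$ should scale proportionally with $\alpha$, which matches the linear scaling described above.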
Definition 1.
The hazard rate function (failure rate) of the Sine-Skewed Distribution (SSD) is given by
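$$h(x; \alpha, \beta, \theta) = \frac{f(x; \alpha, \beta, \theta)}{1 - F(x; \alpha, \beta, \theta)},$$
which on the substitution of Equations (1) and (3) yields
$$h(x; \alpha, \beta, \theta) = \frac{\frac{\pi \alpha \beta \theta}{2 x^{2}} \left(\frac{\alpha}{x}\right)^{\theta - 1} \cos\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right) \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta - 1}}{1 - \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta}}.$$
The SSD hazard rate is flexible, accommodating both decreasing and non-monotonic behaviors. As $x \to \alpha^{+}$, $h(x) \to 0$, and as $x \to \infty$, $h(x) \sim \frac{\pi \alpha^{\theta} \beta \theta}{2 x^{\theta + 1}}$. For $\beta \ge 1$, $h(x)$ is decreasing (DFR), and for $\beta < 1$, $h(x)$ may be non-monotonic. It is heavy-tailed with $h(x) \to 0$ as $x \to \infty$. The parameter $\beta$ controls the monotonicity: $\beta < 1$ allows a potential initial increase (useful for early-life failures), whereas $\beta \ge 1$ gives a decreasing failure rate (common in wear-out processes). The SSD is therefore also suitable for modeling reliability data with varying failure patterns. Similarly, the logarithmic form of the HRF yields
$$\log h(x) = \log\left(\frac{\pi \alpha \beta \theta}{2}\right) - 2 \log(x) + (\theta - 1) \log\left(\frac{\alpha}{x}\right) + \log \cos\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right) + (\beta - 1) \log\left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right] - \log\left\{1 - \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta}\right\}.$$
Here, $\lim_{x \to \alpha^{+}} \log h(x) = -\infty$, and as $x \to \infty$, $\log h(x) = \log\left(\frac{\pi \alpha^{\theta} \beta \theta}{2}\right) - (\theta + 1) \log x + O(x^{-2\theta})$. For $\beta \ge 1$, $\log h(x)$ is decreasing, and for $\beta < 1$, $\log h(x)$ may have one maximum.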
Similarly, the HRFs mainly show right-skewed, unimodal shapes with higher densities at lower SSD values; see Figure 2. Some curves suggest increasing failure rates (IFRs) initially, then decreasing, while others indicate decreasing failure rates (DFRs). Overall, the curves highlight how parameter variations influence failure behaviors. Table 1 and Figures 3 and 4 show how the modal values depend on the parameters $\theta$ and $\beta$ for fixed $\alpha$ values (0.5 and 1.0). For $\alpha = 0.5$, the mode and the mode/$\alpha$ ratio exhibit a consistent increase with $\theta$ across varying $\beta$ values (1.0 to 2.0), indicating that higher $\beta$ values amplify skewness and shift the peak further from $\alpha$. This trend is particularly evident in the mode/$\alpha$ ratio, which normalizes the mode by the scale parameter, demonstrating the proportional shift in the distribution's shape. For $\alpha = 1.0$, the behavior of the mode becomes more complex: it initially rises with $\theta$ but stabilizes or slightly declines at higher $\theta$ values (e.g., $\theta > 2.5$), suggesting that the influence of $\theta$ diminishes as $\alpha$ increases. The mode/$\alpha$ ratio for $\alpha = 1.0$ further supports this observation, highlighting the dominant role of $\beta$ in determining the distribution's peak when $\alpha$ is larger. These findings underscore the interplay between $\theta$ and $\beta$ in shaping the SSD's modal characteristics, where $\theta$ controls the tail behavior and $\beta$ governs the skewness intensity, collectively influencing the location of the mode relative to the scale parameter $\alpha$.
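To inspect hazard shapes such as those discussed for Figure 2, the HRF can be evaluated directly as the ratio of the PDF to the survival function. The sketch below is a hedged illustration: the name `ssd_hazard` and the use of `expm1`/`log1p` to reduce cancellation in the survival term are choices made here, not taken from the original study.

```python
import numpy as np

def ssd_hazard(x, alpha, theta, beta):
    """Hazard rate h(x) = f(x) / (1 - F(x)) of the SSD for x > alpha."""
    x = np.asarray(x, dtype=float)
    ratio = alpha / x
    u = 0.5 * np.pi * ratio ** theta
    base = 1.0 - np.sin(u)                      # F(x)**(1/beta)
    pdf = (0.5 * np.pi * alpha * beta * theta / x ** 2
           * ratio ** (theta - 1.0) * np.cos(u) * base ** (beta - 1.0))
    # 1 - base**beta written as -expm1(beta * log(base)) to limit
    # round-off when F(x) is very close to 1 (far right tail).
    survival = -np.expm1(beta * np.log1p(-np.sin(u)))
    return pdf / survival
```

Evaluating `ssd_hazard` on a grid of $x$ values for several $(\theta, \beta)$ pairs is one simple way to see whether a given configuration produces a decreasing or a non-monotonic curve.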

2.2. Percentile Function

Percentile functions bridge probabilistic flood risk with actionable engineering metrics, outperforming classical methods in robustness and interpretability, especially for climate-adjusted extremes. Let $X$ be a continuous random variable with a CDF $F: \mathbb{R} \to [0, 1]$. The percentile function $Q(u)$ identifies the value $x$ such that the probability of a random draw from the distribution being less than or equal to $x$ is equal to $u$. Inverting the SSD CDF gives $x_u = Q(u)$, where $Q(u) = F_{\mathrm{SSD}}^{-1}(u)$ is derived as follows:
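Step 1: Set the CDF equal to $u$
$$\left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta} = u, \quad u \in (0, 1).$$
Step 2: Take the $\beta$-th root of both sides
$$1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right) = u^{1/\beta}.$$
Step 3: Isolate the sine term
$$\sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right) = 1 - u^{1/\beta}.$$
Step 4: Take the inverse sine (arcsine) of both sides
$$\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta} = \arcsin\left(1 - u^{1/\beta}\right).$$
Step 5: Solve for $\left(\frac{\alpha}{x}\right)^{\theta}$
$$\left(\frac{\alpha}{x}\right)^{\theta} = \frac{2}{\pi} \arcsin\left(1 - u^{1/\beta}\right).$$
Step 6: Take the $\theta$-th root of both sides
$$\frac{\alpha}{x} = \left[\frac{2}{\pi} \arcsin\left(1 - u^{1/\beta}\right)\right]^{1/\theta}.$$
Step 7: Solve for $x$ to obtain the percentile function
$$x_u = \frac{\alpha}{\left[\frac{2}{\pi} \arcsin\left(1 - u^{1/\beta}\right)\right]^{1/\theta}}.$$
Step 8: Final percentile function
$$Q(u) = \alpha \left[\frac{2}{\pi} \arcsin\left(1 - u^{1/\beta}\right)\right]^{-1/\theta}, \quad u \in (0, 1).$$
Thus, we can write it as
$$x_u = Q(u) = \left(\frac{\pi}{2}\right)^{1/\theta} \alpha \left[\arcsin\left(1 - u^{1/\beta}\right)\right]^{-1/\theta},$$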
where $Q(u) = F^{-1}(u)$ denotes the percentile function of $F(x)$ for $u \in (0, 1)$. The median $\tilde{X} = x_{0.5}$ is given by
$$\tilde{X} = \left(\frac{\pi}{2}\right)^{1/\theta} \alpha \left[\arcsin\left(1 - (0.5)^{1/\beta}\right)\right]^{-1/\theta}.$$
The skewness measure is the Bowley skewness, defined by
$$\mathrm{SK} = \frac{Q\left(\frac{3}{4}\right) + Q\left(\frac{1}{4}\right) - 2 Q\left(\frac{1}{2}\right)}{Q\left(\frac{3}{4}\right) - Q\left(\frac{1}{4}\right)}.$$
On the other hand, the Moors kurtosis (Moors, 1988) based on quantiles is given by
$$\mathrm{KU} = \frac{Q\left(\frac{7}{8}\right) - Q\left(\frac{5}{8}\right) + Q\left(\frac{3}{8}\right) - Q\left(\frac{1}{8}\right)}{Q\left(\frac{6}{8}\right) - Q\left(\frac{2}{8}\right)},$$
where $Q(\cdot)$ represents the percentile function. The measures SK and KU possess the usual characteristics. So, the SSD is positively skewed and behaves as leptokurtic for $\alpha > 1$ and platykurtic for $\alpha < 1$, which can be visualized from Figure 5.
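The percentile function and the two quantile-based shape measures translate directly into code. The following Python sketch is illustrative only; the function names are assumptions introduced here.

```python
import numpy as np

def ssd_quantile(u, alpha, theta, beta):
    """Percentile function Q(u) of the SSD for u in (0, 1)."""
    u = np.asarray(u, dtype=float)
    return alpha * (2.0 / np.pi * np.arcsin(1.0 - u ** (1.0 / beta))) ** (-1.0 / theta)

def ssd_cdf(x, alpha, theta, beta):
    """SSD CDF for x > alpha (used here only to check the inversion)."""
    x = np.asarray(x, dtype=float)
    return (1.0 - np.sin(0.5 * np.pi * (alpha / x) ** theta)) ** beta

def bowley_skewness(alpha, theta, beta):
    """Bowley skewness based on the quartiles."""
    q1, q2, q3 = ssd_quantile([0.25, 0.5, 0.75], alpha, theta, beta)
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)

def moors_kurtosis(alpha, theta, beta):
    """Moors kurtosis based on the octiles."""
    o = ssd_quantile(np.arange(1, 8) / 8.0, alpha, theta, beta)  # o[k-1] = Q(k/8)
    return (o[6] - o[4] + o[2] - o[0]) / (o[5] - o[1])
```

Applying `ssd_cdf` to the output of `ssd_quantile` should return the original probabilities, which is a convenient check of the eight-step inversion above. The same `ssd_quantile` function can also be used for inverse-transform sampling by feeding it uniform random numbers.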

2.3. Moments and Moment-Generating Function

Moments and moment-generating functions (MGFs) are essential tools in flood frequency analysis, providing insights into the distribution of extreme hydrological events. The first four moments—mean, variance, skewness, and kurtosis—capture key characteristics of flood distributions. The mean estimates average flood magnitudes, guiding infrastructure design such as spillway capacity. Variance measures the variability of flood peaks, with higher values indicating more volatile flood regimes, such as those influenced by monsoon patterns. Skewness assesses asymmetry in flood extremes with positive skewness common in rainfall-driven floods. Kurtosis indicates the heaviness of distribution tails, helping identify basins prone to outliers, like those affected by snowmelt combined with rainfall. Now, let X be a random variable with a PDF as given in Equation (4), which is parameterized by shape parameters ( α , β , θ ). The rth moment for a distribution within the SSD class can then be derived as follows:
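$$\mu_r^{\prime} = E(X^{r}) = \int_{\alpha}^{\infty} x^{r} f(x; \alpha, \beta, \theta) \, dx,$$
on incorporating Equation (4) in Equation (6), we get
$$\mu_r^{\prime} = E(X^{r}) = \frac{\pi \alpha \beta \theta}{2} \int_{\alpha}^{\infty} x^{r - 2} \left(\frac{\alpha}{x}\right)^{\theta - 1} \cos\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right) \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta - 1} dx, \quad x > \alpha, \; \alpha, \beta > 0, \; \theta > 0.$$
Now, on substituting $\left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta} = y$, so that $x = Q(y)$ and $f(x)\,dx = dy$, we get
$$\mu_r^{\prime} = \alpha^{r} \left(\frac{\pi}{2}\right)^{r/\theta} \int_{0}^{1} \left[\arcsin\left(1 - y^{1/\beta}\right)\right]^{-r/\theta} dy,$$
let
$$\xi(\beta, \theta, r; 0, 1) = \int_{0}^{1} \left[\arcsin\left(1 - y^{1/\beta}\right)\right]^{-r/\theta} dy,$$
which is evaluated by numerical integration, so the $r$th moment can be written as
$$\mu_r^{\prime} = \alpha^{r} \left(\frac{\pi}{2}\right)^{r/\theta} \xi(\beta, \theta, r; 0, 1).$$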
In this section, we have also conducted a numerical study, as portrayed in Table 2, to examine the existence of the mean and the shape of the SSD under different parameter values. Table 2 also shows that skewness is consistently positive across all configurations, suggesting all distributions are right-skewed. However, kurtosis values both above and below 3 occur, indicating moderate to heavy-tailed behavior in these configurations. The extreme values in kurtosis and skewness for the first few configurations suggest potential outliers or heavy-tailed distributions. This could be characteristic of heavy-tailed distributions under specific parameter settings. Moreover, from Table 1, it is evident that $\beta$ has no effect on the moments, mean, variance, skewness, or kurtosis in this data set. This suggests that the moments are independent of $\beta$ under the given model. Increasing $\alpha$ leads to higher mean, variance, skewness, and kurtosis; see Figure 6. Increasing $\theta$ reduces the mean and variance but increases skewness and kurtosis (for fixed $\alpha$). The distribution is right-skewed ($\gamma_1 > 0$) and leptokurtic ($\gamma_2 > 0$, heavier tails than normal). Now, we introduce the moment-generating function by defining it as
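$$M_X(t) = E(e^{tX}) = \int_{\alpha}^{\infty} e^{tx} f(x; \alpha, \beta, \theta) \, dx,$$
$$M_X(t) = E(e^{tX}) = \sum_{r=0}^{\infty} \frac{t^{r}}{r!} \alpha^{r} \left(\frac{\pi}{2}\right)^{r/\theta} \int_{0}^{1} \left[\arcsin\left(1 - y^{1/\beta}\right)\right]^{-r/\theta} dy,$$
$$M_X(t) = E(e^{tX}) = \sum_{r=0}^{\infty} \frac{t^{r}}{r!} \mu_r^{\prime},$$
where $\mu_r^{\prime}$ is defined in Equation (7).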
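The $\xi$ integral in the moment formula can be evaluated with standard quadrature. The Python sketch below is an illustration under stated assumptions: it uses `scipy.integrate.quad`, and it rewrites the integral with the change of variable $w = 1 - y^{1/\beta}$ (so that $dy = -\beta (1 - w)^{\beta - 1} dw$), which is a choice made here to keep the endpoint behavior explicit. In this form the integrand behaves like $w^{-r/\theta}$ near $w = 0$, so the sketch only attempts the integral when $r < \theta$; the function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad

def xi(beta, theta, r):
    """xi(beta, theta, r; 0, 1), computed after substituting w = 1 - y**(1/beta)."""
    p = r / theta
    if p >= 1.0:
        # Integrand ~ w**(-p) near w = 0, which is not integrable for p >= 1.
        raise ValueError("the integral diverges unless r < theta")
    integrand = lambda w: np.arcsin(w) ** (-p) * beta * (1.0 - w) ** (beta - 1.0)
    value, _ = quad(integrand, 0.0, 1.0, limit=200)
    return value

def ssd_raw_moment(r, alpha, theta, beta):
    """r-th raw moment: alpha**r * (pi/2)**(r/theta) * xi(beta, theta, r; 0, 1)."""
    return alpha ** r * (np.pi / 2.0) ** (r / theta) * xi(beta, theta, r)
```

For example, `ssd_raw_moment(1, 1.0, 3.0, 1.5)` gives the mean for $\alpha = 1$, $\theta = 3$, $\beta = 1.5$, and can be cross-checked by integrating $x f(x)$ over $(\alpha, \infty)$ directly.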

2.4. Conditional Moments

In flood risk analysis, conditional statistical methods such as conditional moments, the mean residual life (MRL) function, and mean inactivity time (MIT) are essential for understanding extreme hydrological events. The first partial moment helps develop flood risk curves, highlighting how rare but severe floods (e.g., top 10%) cause the most damages, and revealing patterns like the temporal clustering of high-risk periods. The MRL function predicts the expected severity of floods exceeding certain thresholds, aiding infrastructure planning, while MIT estimates the recovery times between floods, informing emergency preparedness. These tools support practical applications like flood insurance pricing and resource allocation. An example involving the Rhine River showed that a small percentage of floods caused the majority of economic losses, an insight that traditional return period analysis might miss. Overall, this framework offers policymakers refined means to assess and manage flood risks amid climate change. Consequently, to facilitate this, the $r$th partial moment of the variable $X$, denoted as $\psi_r(t)$ for any real $r > 0$, is defined as
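$$\psi_r(t) = \int_{\alpha}^{t} x^{r} f(x; \alpha, \beta, \theta) \, dx,$$
on substituting $y = \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta}$, we get
$$\psi_r(t) = \int_{0}^{\left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{t}\right)^{\theta}\right)\right]^{\beta}} \alpha^{r} \left(\frac{\pi}{2}\right)^{r/\theta} \left[\arcsin\left(1 - y^{1/\beta}\right)\right]^{-r/\theta} dy,$$
$$\psi_r(t) = \alpha^{r} \left(\frac{\pi}{2}\right)^{r/\theta} \int_{0}^{\left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{t}\right)^{\theta}\right)\right]^{\beta}} \left[\arcsin\left(1 - y^{1/\beta}\right)\right]^{-r/\theta} dy,$$
$$\psi_r(t) = \alpha^{r} \left(\frac{\pi}{2}\right)^{r/\theta} \int_{0}^{F(t; \alpha, \beta, \theta)} \left[\arcsin\left(1 - y^{1/\beta}\right)\right]^{-r/\theta} dy,$$
$$\psi_r(t) = \alpha^{r} \left(\frac{\pi}{2}\right)^{r/\theta} \xi(\beta, \theta, r; 0, F(t; \alpha, \beta, \theta)),$$
where $\xi(\beta, \theta, r; 0, F(t; \alpha, \beta, \theta)) = \int_{0}^{F(t; \alpha, \beta, \theta)} \left[\arcsin\left(1 - y^{1/\beta}\right)\right]^{-r/\theta} dy$. Now, the $r$th residual moment is an important characteristic of the model. For $r = 1$, it gives the expected additional lifetime given that a component has survived until time $t$. For a non-negative continuous random variable $X$ with the SSD($\alpha, \beta, \theta$) distribution, the $r$th residual life function is defined as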
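$$E\left((X - t)^{r} \mid X > t\right) = \frac{1}{S(t; \alpha, \beta, \theta)} \int_{t}^{\infty} (x - t)^{r} \frac{\pi \alpha \beta \theta}{2 x^{2}} \left(\frac{\alpha}{x}\right)^{\theta - 1} \cos\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right) \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta - 1} dx,$$
where $x > \alpha$, $\alpha, \beta > 0$, $\theta > 0$. Now put $y = \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta}$; then, we get
$$E\left((X - t)^{r} \mid X > t\right) = \frac{1}{S(t; \alpha, \beta, \theta)} \int_{\left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{t}\right)^{\theta}\right)\right]^{\beta}}^{1} \left\{\left(\frac{\pi}{2}\right)^{1/\theta} \alpha \left[\arcsin\left(1 - y^{1/\beta}\right)\right]^{-1/\theta} - t\right\}^{r} dy,$$
by using the binomial expansion (for integer $r$), we have
$$E\left((X - t)^{r} \mid X > t\right) = \frac{1}{S(t; \alpha, \beta, \theta)} \sum_{k=0}^{r} (-t)^{k} \binom{r}{k} \left(\frac{\pi}{2}\right)^{\frac{r - k}{\theta}} \alpha^{r - k} \int_{F(t; \alpha, \beta, \theta)}^{1} \left[\arcsin\left(1 - y^{1/\beta}\right)\right]^{-(r - k)/\theta} dy,$$
which on simplification yields
$$E\left((X - t)^{r} \mid X > t\right) = \frac{1}{S(t; \alpha, \beta, \theta)} \sum_{k=0}^{r} (-t)^{k} \binom{r}{k} \left(\frac{\pi}{2}\right)^{\frac{r - k}{\theta}} \alpha^{r - k} \left[\xi(\beta, \theta, r - k; 0, 1) - \xi(\beta, \theta, r - k; 0, F(t; \alpha, \beta, \theta))\right].$$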

2.4.1. Mean Deviation

Partial moments offer a valuable tool for quantifying the typical deviation of a population from its mean or median, providing insights into the distribution's dispersion around its central tendency. This methodology has broad applicability in fields like economics and insurance. For a random variable $X$ following the SSD, the mean deviations around the mean $\mu = E(X)$ and the median $\tilde{M}$ are formally defined as
$$\delta_1(x) = E\left|X - \mu_1^{\prime}\right| = 2 \mu_1^{\prime} F(\mu_1^{\prime}) - 2 \psi_1(\mu_1^{\prime})$$
and
$$\delta_2(x) = E\left|X - \tilde{M}\right| = \mu_1^{\prime} - 2 \psi_1(\tilde{M}),$$
respectively, where $\mu_1^{\prime} = E(X)$, $\tilde{M} = \mathrm{median}(X) = Q\left(\frac{1}{2}\right)$, and $\psi_1(t)$ is the first partial moment, obtained from the $r$th partial moment of Section 2.4 with $r = 1$.

2.4.2. Bonferroni and Lorenz Curves

For a positive random variable $X$, the Lorenz and Bonferroni curves at a given probability $p$ are expressed by $L(p) = \frac{1}{\mu_1^{\prime}} \psi_1(q)$ and $B(p) = \frac{1}{p \mu_1^{\prime}} \psi_1(q)$, respectively. In these definitions, $\mu_1^{\prime} = E(X)$ is the expected value of $X$, $\psi_1(\cdot)$ is the first partial moment, and $q = Q(p)$ is the value of the percentile function of $X$ at probability $p$.
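The partial moment, the two mean deviations, and the Lorenz and Bonferroni curves can all be computed from one quadrature routine. The Python sketch below is a hedged illustration: it uses `scipy.integrate.quad`, evaluates the partial-moment integral after the change of variable $w = 1 - y^{1/\beta}$ (a choice made here), requires $\theta > 1$ so that the mean integral converges, and uses function names that are assumptions for illustration.

```python
import numpy as np
from scipy.integrate import quad

def ssd_cdf(t, alpha, theta, beta):
    """SSD CDF for a scalar t (zero at or below the lower bound alpha)."""
    if t <= alpha:
        return 0.0
    return (1.0 - np.sin(0.5 * np.pi * (alpha / t) ** theta)) ** beta

def ssd_quantile(u, alpha, theta, beta):
    """Percentile function Q(u)."""
    return alpha * (2.0 / np.pi * np.arcsin(1.0 - u ** (1.0 / beta))) ** (-1.0 / theta)

def partial_moment(r, t, alpha, theta, beta):
    """Integral of x**r f(x) over (alpha, t), using w = sin((pi/2)(alpha/x)**theta)."""
    if t <= alpha:
        return 0.0
    w_t = np.sin(0.5 * np.pi * (alpha / t) ** theta)   # w at x = t (0 when t is infinite)
    g = lambda w: np.arcsin(w) ** (-r / theta) * beta * (1.0 - w) ** (beta - 1.0)
    value, _ = quad(g, w_t, 1.0, limit=200)
    return alpha ** r * (np.pi / 2.0) ** (r / theta) * value

def raw_mean(alpha, theta, beta):
    """Mean of the SSD; the integral converges only for theta > 1."""
    if theta <= 1.0:
        raise ValueError("the mean integral diverges unless theta > 1")
    return partial_moment(1, np.inf, alpha, theta, beta)

def mean_deviations(alpha, theta, beta):
    """Mean deviation about the mean and about the median."""
    mu = raw_mean(alpha, theta, beta)
    med = ssd_quantile(0.5, alpha, theta, beta)
    about_mean = 2.0 * mu * ssd_cdf(mu, alpha, theta, beta) - 2.0 * partial_moment(1, mu, alpha, theta, beta)
    about_median = mu - 2.0 * partial_moment(1, med, alpha, theta, beta)
    return about_mean, about_median

def lorenz(p, alpha, theta, beta):
    """Lorenz curve: first partial moment at q = Q(p), divided by the mean."""
    q = ssd_quantile(p, alpha, theta, beta)
    return partial_moment(1, q, alpha, theta, beta) / raw_mean(alpha, theta, beta)

def bonferroni(p, alpha, theta, beta):
    """Bonferroni curve: Lorenz curve divided by p."""
    return lorenz(p, alpha, theta, beta) / p
```

A useful consistency check is that the mean deviation about the median never exceeds the mean deviation about the mean, and that the Lorenz curve lies below the diagonal.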

2.5. Order Statistics

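Order statistics are important statistical measurements derived from arranging a set of random observations. Consider $n$ independent random variables $X_1, X_2, \ldots, X_n$ following the SSD. When these variables are sorted in increasing order to form $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$, the resulting values are known as order statistics. These ordered data points are frequently applied in the reliability analysis of systems. The CDF for the $i$th order statistic is presented as follows:
$$F_{i;n}(x) = \frac{1}{B(i, n - i + 1)} \sum_{j=0}^{n-i} \frac{(-1)^{j}}{i + j} \binom{n - i}{j} F^{i + j}(x; \alpha, \beta, \theta) = \frac{1}{B(i, n - i + 1)} \sum_{j=0}^{n-i} \frac{(-1)^{j}}{i + j} \binom{n - i}{j} \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta (i + j)}, \quad x > \alpha, \; \alpha, \beta > 0, \; \theta > 0.$$
The corresponding PDF is expressed as
$$f_{i;n}(x) = \frac{f(x; \alpha, \beta, \theta)}{B(i, n - i + 1)} \sum_{j=0}^{n-i} (-1)^{j} \binom{n - i}{j} F(x; \alpha, \beta, \theta)^{i + j - 1} = \frac{\pi \alpha \beta \theta \left(\frac{\alpha}{x}\right)^{\theta - 1}}{2 x^{2} B(i, n - i + 1)} \cos\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right) \sum_{j=0}^{n-i} (-1)^{j} \binom{n - i}{j} \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x}\right)^{\theta}\right)\right]^{\beta (i + j) - 1}, \quad x > \alpha, \; \alpha, \beta > 0, \; \theta > 0.$$
Then, the $r$th moment of the $i$th order statistic is given by
$$\mu_{i:n}^{(r)} = E\left(X_{i:n}^{r}\right) = \int_{\alpha}^{\infty} x^{r} f_{i;n}(x) \, dx = \frac{1}{B(i, n - i + 1)} \sum_{j=0}^{n-i} (-1)^{j} \binom{n - i}{j} \int_{\alpha}^{\infty} x^{r} f(x) F^{i + j - 1}(x) \, dx = \frac{1}{B(i, n - i + 1)} \sum_{j=0}^{n-i} (-1)^{j} \binom{n - i}{j} \mu_{r, i + j - 1},$$
where $\mu_{r, i + j - 1} = \int_{\alpha}^{\infty} x^{r} f(x) F^{i + j - 1}(x) \, dx$, and this integral can be evaluated numerically.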
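The alternating-sum form of the order-statistic CDF can be checked against the equivalent regularized incomplete beta function, $F_{i;n}(x) = I_{F(x)}(i, n - i + 1)$, which is a standard identity for order statistics. The following Python sketch does this with SciPy; the function names are assumptions introduced for illustration.

```python
import numpy as np
from scipy.special import comb, beta as beta_fn, betainc

def ssd_cdf(x, alpha, theta, beta):
    """SSD CDF for a scalar x > alpha."""
    return (1.0 - np.sin(0.5 * np.pi * (alpha / x) ** theta)) ** beta

def order_stat_cdf_sum(x, i, n, alpha, theta, beta):
    """CDF of the i-th order statistic via the finite alternating sum."""
    F = ssd_cdf(x, alpha, theta, beta)
    j = np.arange(0, n - i + 1)
    terms = (-1.0) ** j / (i + j) * comb(n - i, j) * F ** (i + j)
    return terms.sum() / beta_fn(i, n - i + 1)

def order_stat_cdf_betainc(x, i, n, alpha, theta, beta):
    """Same CDF via the regularized incomplete beta function."""
    return betainc(i, n - i + 1, ssd_cdf(x, alpha, theta, beta))
```

For moderate $n$ the two versions agree closely; for large $n$ the alternating sum can lose accuracy through cancellation, so the incomplete beta form is likely the safer choice in practice.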

3. Methods of Parameter Estimation and Simulation Study

This section employs maximum likelihood estimation (MLE) and studies the performance of the MLEs on the basis of their bias and mean square error (MSE). MLEs are asymptotically consistent, efficient, and normally distributed, which makes the method well suited to large samples. However, its performance can degrade under model misspecification. Since MLE excels in correctly specified models, we conducted a simulation study to evaluate the performance of the method across sample sizes, distributional characteristics, and modeling objectives. In the case of Method of Moments (MoM) estimation, the theoretical moments were hard to compute (leading to numerical instability), and initial guesses were too far from the true parameters. So, we have not included it in our study.

3.1. Method of Maximum Likelihood

Statistical inference typically relies on three approaches: point estimation, interval estimation, and hypothesis testing. Among the various parameter estimation techniques available, the likelihood method stands out for its versatility and desirable properties, particularly in constructing confidence regions, intervals, and test statistics. The asymptotic theory associated with these estimates simplifies calculations and performs effectively even with limited sample data. Statisticians often aim to estimate quantities like the density of a test statistic, which is influenced by sample size, to improve the accuracy of estimate distributions. The calculations for maximum likelihood estimates (MLEs) within distribution theory are straightforward whether approached conceptually or mathematically. This section will focus on estimating parameters using the MLE method based on the entire sample. Let $x_1, \ldots, x_n$ be a random sample of size $n$ from the SSD as defined in Equation (4). Let $V_n(\phi) = \left(\frac{\partial \ell_n}{\partial \alpha}, \frac{\partial \ell_n}{\partial \beta}, \frac{\partial \ell_n}{\partial \theta}\right)^{T}$ be the $q \times 1$ score vector for the parameter vector $\phi = (\alpha, \beta, \theta)$. The log-likelihood function is given by
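$$\ell_n = n \log\left(\frac{\pi}{2}\right) + n \log(\alpha) + n \log(\beta) + n \log(\theta) + (\theta - 1) \sum_{i=1}^{n} \log\left(\frac{\alpha}{x_i}\right) + \sum_{i=1}^{n} \log \cos\left(\frac{\pi}{2}\left(\frac{\alpha}{x_i}\right)^{\theta}\right) - 2 \sum_{i=1}^{n} \log x_i + (\beta - 1) \sum_{i=1}^{n} \log\left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x_i}\right)^{\theta}\right)\right].$$
The log-likelihood can be maximized by differentiating Equation (12) with respect to the parameters, i.e.,
$$\frac{\partial \ell_n}{\partial \alpha} = \frac{n}{\alpha} + \frac{n (\theta - 1)}{\alpha} - (\beta - 1) \sum_{i=1}^{n} \frac{\pi \theta \cos\left(\frac{\pi}{2}\left(\frac{\alpha}{x_i}\right)^{\theta}\right) \left(\frac{\alpha}{x_i}\right)^{\theta - 1}}{2 \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x_i}\right)^{\theta}\right)\right] x_i} - \sum_{i=1}^{n} \frac{\pi \theta \left(\frac{\alpha}{x_i}\right)^{\theta - 1} \tan\left(\frac{\pi}{2}\left(\frac{\alpha}{x_i}\right)^{\theta}\right)}{2 x_i},$$
$$\frac{\partial \ell_n}{\partial \beta} = \frac{n}{\beta} + \sum_{i=1}^{n} \log\left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x_i}\right)^{\theta}\right)\right],$$
$$\frac{\partial \ell_n}{\partial \theta} = \frac{n}{\theta} + \sum_{i=1}^{n} \log\left(\frac{\alpha}{x_i}\right) - (\beta - 1) \sum_{i=1}^{n} \frac{\pi \cos\left(\frac{\pi}{2}\left(\frac{\alpha}{x_i}\right)^{\theta}\right) \log\left(\frac{\alpha}{x_i}\right) \left(\frac{\alpha}{x_i}\right)^{\theta}}{2 \left[1 - \sin\left(\frac{\pi}{2}\left(\frac{\alpha}{x_i}\right)^{\theta}\right)\right]} - \sum_{i=1}^{n} \frac{\pi}{2} \log\left(\frac{\alpha}{x_i}\right) \left(\frac{\alpha}{x_i}\right)^{\theta} \tan\left(\frac{\pi}{2}\left(\frac{\alpha}{x_i}\right)^{\theta}\right).$$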
The MLEs of the parameters are obtained by solving the system of nonlinear equations $V_n(\phi) = 0$. Since no closed form of the estimators is available, we solve these equations numerically by using the Newton–Raphson method via statistical packages such as Mathematica [12.0], R and Matlab.
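As a rough companion to the score equations, the Python sketch below codes the log-likelihood and a simplified fitting routine. It is not the Newton–Raphson procedure used in the study: it assumes, purely for illustration, that the lower bound $\alpha$ is known and fixed below the smallest observation, solves $\partial \ell_n / \partial \beta = 0$ in closed form for given $(\alpha, \theta)$, and then maximizes the resulting profile log-likelihood over $\theta$ with SciPy's bounded scalar minimizer. The function names and the default search bounds for $\theta$ are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ssd_loglik(x, alpha, theta, beta):
    """Log-likelihood of the SSD; -inf outside the admissible parameter region."""
    x = np.asarray(x, dtype=float)
    if alpha <= 0 or theta <= 0 or beta <= 0 or alpha >= x.min():
        return -np.inf
    u = 0.5 * np.pi * (alpha / x) ** theta
    return (x.size * np.log(0.5 * np.pi * alpha * beta * theta)
            + (theta - 1.0) * np.sum(np.log(alpha / x))
            + np.sum(np.log(np.cos(u)))
            - 2.0 * np.sum(np.log(x))
            + (beta - 1.0) * np.sum(np.log1p(-np.sin(u))))

def beta_hat(x, alpha, theta):
    """Root of the beta score equation for fixed alpha and theta:
    n / beta + sum(log(1 - sin(u_i))) = 0."""
    x = np.asarray(x, dtype=float)
    u = 0.5 * np.pi * (alpha / x) ** theta
    return -x.size / np.sum(np.log1p(-np.sin(u)))

def fit_theta_beta(x, alpha, theta_bounds=(0.05, 20.0)):
    """Maximize the profile log-likelihood over theta (alpha assumed known)."""
    objective = lambda th: -ssd_loglik(x, alpha, th, beta_hat(x, alpha, th))
    result = minimize_scalar(objective, bounds=theta_bounds, method="bounded")
    theta = result.x
    return theta, beta_hat(x, alpha, theta)
```

Profiling out $\beta$ reduces the search to one dimension, which tends to be more stable than a joint search; estimating $\alpha$ as well would require handling the constraint $\alpha < \min_i x_i$, which this sketch deliberately leaves out.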

Performance Summary of MLEs

The results presented in Table 3 demonstrate the performance of maximum likelihood estimation (MLE) for the SSD parameters across five distinct parameter configurations. For Set-I ($\alpha = 0.2556$, $\theta = 0.2286$, $\beta = 0.2182$), the estimators exhibit minimal bias, particularly for sample sizes $n \ge 50$, with mean squared error (MSE) values decreasing rapidly as the sample size increases. This suggests strong consistency properties for MLEs when parameter values are relatively small. In Set-II ($\alpha = 0.2556$, $\theta = 1.0286$, $\beta = 1.1184$), we observe persistent positive bias in $\hat{\theta}$ estimates (approximately 0.0116 for $n = 25$) and higher MSE values compared to Set-I, particularly for $\beta$ (MSE = 16.6767 at $n = 25$). The slower convergence indicates that larger sample sizes ($n > 150$) may be required for reliable estimation when $\theta$ and $\beta$ exceed 1. The most problematic case emerges in Set-III ($\alpha = 1.2556$, $\theta = 1.2286$, $\beta = 0.0188$), where $\hat{\beta}$ shows severe overestimation (bias ≈ 8.5 across all sample sizes) with extraordinarily high MSE values (approximately 75). This suggests fundamental challenges in estimating extremely small $\beta$ parameters, which is likely due to numerical instability in the likelihood function's behavior near zero. Set-IV ($\alpha = 1.0045$, $\theta = 0.0285$, $\beta = 1.2185$) presents an anomalous case where $\hat{\theta}$ yields negative MSE values, indicating potential computational artifacts in the optimization process. Meanwhile, $\hat{\beta}$ maintains a consistently high bias (≈ 9.9) and MSE (> 90), demonstrating particular sensitivity to the combination of small $\theta$ and large $\beta$. For Set-V ($\alpha = 1.1551$, $\theta = 1.3286$, $\beta = 1.1085$), we observe a moderate improvement in estimation quality with increasing sample size, though $\hat{\theta}$ and $\hat{\beta}$ maintain higher MSE values (0.6870 and 4.3325, respectively, at $n = 150$) compared to $\hat{\alpha}$. The persistent underestimation of $\alpha$ (bias ≈ −0.07 at $n = 150$) warrants further investigation into possible model misspecification.
These findings collectively highlight that the MLE performance for the SSD distribution is highly sensitive to the parameter space region with particular challenges emerging when the following apply:
  • β approaches zero (numerical instability);
  • θ is very small (optimization challenges);
  • Multiple parameters are large (slower convergence).
The results suggest that alternative estimation approaches or modified likelihood formulations may be necessary for certain parameter regimes, particularly when dealing with very small shape parameters.

3.2. Simulation Study

In order to find the estimators of the SSD parameters, i.e., $\alpha$, $\theta$ and $\beta$, we have adopted the method of maximum likelihood. In this context, we conducted a simulation study that involved five sets of parameters: Set-I: $\alpha = 0.2556$, $\theta = 0.2286$, $\beta = 0.2182$; Set-II: $\alpha = 0.2556$, $\theta = 1.0286$, $\beta = 1.1184$; Set-III: $\alpha = 1.2556$, $\theta = 1.2286$, $\beta = 0.0188$; Set-IV: $\alpha = 1.0045$, $\theta = 0.0285$, $\beta = 1.2185$; and Set-V: $\alpha = 1.1551$, $\theta = 1.3286$, $\beta = 1.1085$. The simulation proceeds as follows:
  • Generate 1000 samples of size n = 15, 25, 50, 75, 100, 150 from the given distribution.
  • Compute the MLEs of $\alpha$, $\theta$ and $\beta$ using the log-likelihood function.
  • Calculate bias $= (\hat{\Theta} - \Theta)$ and MSE $= (\hat{\Theta} - \Theta)^{2}$, each averaged over the 1000 replications.
  • Repeat for all values of $\Theta = (\alpha, \theta, \beta)$.
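The simulation steps above can be sketched in Python as follows. This is a simplified illustration, not the study's code: samples are drawn by inverse-transform sampling from the percentile function, and, to keep the example short and fast, only $\beta$ is estimated (from its score equation, with $\alpha$ and $\theta$ treated as known). That restriction, the function names, and the default seed are assumptions made here; a full replication would plug a three-parameter MLE routine into the same loop.

```python
import numpy as np

def ssd_rvs(n, alpha, theta, beta, rng):
    """Draw n SSD variates by inverse-transform sampling, x = Q(U)."""
    u = rng.random(n)
    return alpha * (2.0 / np.pi * np.arcsin(1.0 - u ** (1.0 / beta))) ** (-1.0 / theta)

def beta_hat_known(x, alpha, theta):
    """MLE of beta when alpha and theta are treated as known (illustrative)."""
    u = 0.5 * np.pi * (alpha / x) ** theta
    return -x.size / np.sum(np.log1p(-np.sin(u)))

def bias_mse(estimates, truth):
    """Average error and average squared error over the replications."""
    err = np.asarray(estimates, dtype=float) - truth
    return err.mean(), (err ** 2).mean()

def simulate_beta(n, reps, alpha, theta, beta, seed=0):
    """Monte Carlo bias and MSE of beta_hat_known for samples of size n."""
    rng = np.random.default_rng(seed)
    estimates = np.array([
        beta_hat_known(ssd_rvs(n, alpha, theta, beta, rng), alpha, theta)
        for _ in range(reps)
    ])
    return bias_mse(estimates, beta)

if __name__ == "__main__":
    for n in (15, 25, 50, 75, 100, 150):
        bias, mse = simulate_beta(n, 1000, alpha=1.1551, theta=1.3286, beta=1.1085)
        print(n, bias, mse)
```

The loop in the `__main__` block mirrors the sample sizes and the Set-V parameter values listed above; because only $\beta$ is estimated here, its output is not comparable with Table 3.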

4. Discussion to Flood Data Application

4.1. Flood Frequency Analysis

Flood analysis can be performed using either annual maximum series (AMS) or partial-duration series (PDS). AMS records the highest flood peak each year, while PDS includes all peaks exceeding a set base level, potentially yielding more peaks than years of data. The U.S. Water Resources Council (USWRC) has provided guidelines for flood frequency analysis in Bulletin Nos. 15, 17, 17A, 17B, and 17C. These guidelines are suitable for floods with an annual exceedance probability (AEP) of 0.10 or less; see [35]. For such quantiles, AMS provides a suitable sample and produces AEP estimates very similar to those from PDS. Furthermore, AMS is preferred for its wider availability and longer data records, while PDS can suffer from incomplete records due to challenges in defining the base threshold.

4.2. Data Sources and Competing Models

The principal data sources recommended for flood frequency analysis comprise systematic records, historical flood information, and paleoflood and botanical information. In this research, we adopted the systematic records and summarize them in Table 7; the flood measurements themselves are listed below. Furthermore, for a comprehensive study of the models and identification of a realistic return period, we compared the SSD with well-known three-parameter distributions, namely the Kappa (Kappa(3)) and Gamma (GD(3)) distributions, as well as two-parameter distributions, including the Weibull (WD(2)), Gamma (GD(2)), Extreme Value (EV(2)), Log Logistic (LLD(2)), Log Normal (LN(2)) and Gumbel (GuD(2)) distributions.

4.3. Goodness of Fit Measure

For comparison purposes, we have studied goodness-of-fit tests, which help us assess how well a statistical distribution fits a sample of data. The three common tests discussed are the Kolmogorov–Smirnov, Cramér–von Mises, and Anderson–Darling tests.
1. Kolmogorov–Smirnov (K-S) Test
It measures the maximum difference between the empirical distribution function (EDF) and the theoretical cumulative distribution function (CDF). Its test statistic is
$$D = \sup_x \left| F_n(x) - F(x) \right|,$$
where $F_n(x)$ is the empirical distribution function and $F(x)$ is the theoretical CDF. It is suitable for continuous distributions and is sensitive to deviations in the center of the distribution but less sensitive in the tails; see [36,37].
2. Cramér–von Mises (CvM) Test
It measures the squared distance between the EDF and the theoretical CDF, integrated across the domain. It is defined as
$$W^2 = \int_{-\infty}^{\infty} \left[ F_n(x) - F(x) \right]^2 \, dF(x).$$
It offers more balanced sensitivity across the entire distribution; see [38,39].
3. Anderson–Darling (A-D) Test
It is a modified version of the CvM test that gives more weight to the tails. It is expressed as
$$A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[ \ln F(X_i) + \ln\left( 1 - F(X_{n+1-i}) \right) \right],$$
where $X_1 \le \dots \le X_n$ are the ordered observations. It is more powerful in detecting tail deviations; see [40]. For comprehensive details, readers are referred to [41,42,43].
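The three statistics can be computed directly from the ordered sample. The following Python sketch is an illustration rather than the code used in the paper: it takes any fitted CDF as a callable and uses the standard computational forms of the K-S and CvM statistics. The CvM value is returned as the integral written above; many references report n times this quantity. The function name and the example data are hypothetical.

```python
import math


def edf_statistics(sample, cdf):
    """Returns (D, W2, A2) for a sample and a fully specified continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    u = [cdf(x) for x in xs]  # must lie strictly inside (0, 1) for the A-D logs

    # Kolmogorov-Smirnov: largest gap between the EDF steps and the CDF.
    d = max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))

    # Cramer-von Mises: n * W2 = 1/(12n) + sum (u_(i) - (2i-1)/(2n))^2 with 1-based i.
    n_w2 = 1.0 / (12 * n) + sum((u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))
    w2 = n_w2 / n

    # Anderson-Darling: (2i-1) with 1-based i becomes (2i+1) with 0-based i.
    a2 = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i])) for i in range(n)
    ) / n
    return d, w2, a2


if __name__ == "__main__":
    # Toy check against the uniform(0, 1) CDF; the data are illustrative only.
    print(edf_statistics([0.7, 0.1, 0.4], lambda x: x))
```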

4.4. Information Criteria for Model Selection

Information criteria are used to compare and select statistical models by balancing goodness-of-fit and complexity; see [44,45,46]. These criteria are stated one by one below.
1. Akaike Information Criterion (AIC)
It selects the model with the best trade-off between fit and complexity but can overfit with small samples.
$$\mathrm{AIC} = -2 \ln(L) + 2k,$$
where $L$ is the maximized value of the likelihood function and $k$ is the number of estimated parameters.
2. Corrected AIC (AICc)
It is usually recommended for small sample sizes.
$$\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1},$$
where $n$ is the sample size.
3. Bayesian Information Criterion (BIC)
It was developed by Schwarz [45]. It generally penalizes complexity more heavily than AIC and therefore often prefers simpler models.
$$\mathrm{BIC} = -2 \ln(L) + k \ln(n)$$
4. Hannan–Quinn Information Criterion (HQIC)
It was proposed by Hannan and Quinn [44] and is intermediate between AIC and BIC in penalizing model complexity. It is defined as
$$\mathrm{HQIC} = -2 \ln(L) + 2k \ln(\ln(n))$$
5. Consistent AIC (CAIC)
It is a variant that penalizes complexity even more heavily than BIC, leading to more parsimonious models. It is expressed as
$$\mathrm{CAIC} = -2 \ln(L) + k \left[ \ln(n) + 1 \right]$$
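Since all five criteria depend only on the maximized log-likelihood, the number of parameters k, and the sample size n, they can be evaluated together. The short Python sketch below transcribes the formulas above; the function name and the log-likelihood value in the usage example are hypothetical and are not results from the paper.

```python
import math


def information_criteria(log_lik, k, n):
    """log_lik: maximized log-likelihood ln(L); k: number of parameters; n: sample size."""
    aic = -2.0 * log_lik + 2.0 * k
    return {
        "AIC": aic,
        "AICc": aic + 2.0 * k * (k + 1) / (n - k - 1),
        "BIC": -2.0 * log_lik + k * math.log(n),
        "HQIC": -2.0 * log_lik + 2.0 * k * math.log(math.log(n)),
        "CAIC": -2.0 * log_lik + k * (math.log(n) + 1.0),
    }


if __name__ == "__main__":
    # Placeholder log-likelihood for a three-parameter model fitted to 52 observations.
    for name, value in information_criteria(-100.0, k=3, n=52).items():
        print(name, round(value, 4))
```

Smaller values indicate less information loss, so candidate models are ranked by the same criterion computed on the same data.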

Summary Table

Table 4 presents a comparison of these goodness-of-fit measures and information criteria, which helps readers understand how they function.

4.5. Real Data Examples

Two real-world flood data sets constitute the AMF series: the first one measures the flow data for Mill Creek (Station 93) near Manhattan, IN for the period of 1940–1991 with measurements in cubic feet per second (cfs) taken from [10], and these data’s values are 4020, 3690, 2130, 2410, 3270, 1540, 2250, 2060, 5340, 4040, 2710, 2050, 5800, 3180, 2780, 2050, 5960, 2940, 2730, 1930, 4000, 2600, 2430, 1990, 3200, 2440, 2110, 2030, 5000, 2740, 2440, 2190, 4800, 2750, 2290, 1830, 5000, 2860, 2520, 1750, 8960, 2920, 2000, 1870, 3000, 2980, 1650, 1840, 3290, 2930, 2260, 3060. The second data set, which measures the peak in cubic meter per second (m3/s), was obtained from https://nrfa.ceh.ac.uk/data/station/peakflow/39008 (accessed on 24 January 2024). The National River Flow Archive (NRFA) provides access to its peak flow data of 96 stations, and the values of such data are 54.234, 81.635, 54.349, 78.382, 52.494, 59.305, 78.484, 51.479, 48.819, 52.674, 79.532, 65.128, 49.693, 74.867, 50.165, 49.347, 48.683, 72.413, 53.646, 47.325, 50.126, 54.261, 50.967, 64.294, 65.318, 64.898, 83.066, 54.338, 50.749, 56.469, 53.75, 83.059, 91.572, 72.319, 51.751, 62.626, 75.505, 62.157, 50.001, 57.514, 62.028, 56.744, 70.192, 56.502, 91.796, 70.87, 51.12, 55, 50.9, 49, 59.792, 55.786, 79.838, 63.85, 51.751, 76.892, 102.054, 49.259, 49.179, 56.34, 87.587, 59.024, 75.38, 57.788, 56.754, 54.145, 75.795, 54.746, 59.529, 60.135, 52.464, 51.439, 51.896, 66.552, 48.935, 48.781, 96.4, 48.366, 50.741, 97.989, 68.151, 76.208, 52.154, 64.358, 68.661, 51.555, 107.355, 99.092, 70.097, 54.992, 77.875, 76.292, 59.542, 50.16, 47.211, 51.5.
From Figure 7, the violin plot analysis shows that Data Set-I is more skewed with numerous high outliers, which can distort statistical measures like mean and variance. Data Set-II has a more balanced distribution with fewer and milder outliers, making it more reliable for inference. Overall, caution is needed when analyzing Data Set-I due to its extreme outliers, while Data Set-II is relatively more stable but still requires careful examination. Visualizations are essential before drawing conclusions.
Based on Table 5, the Shapiro–Wilk (SW) test rejects normality for both data sets at the one percent level of significance, which is consistent with the presence of outliers visible in the violin plot in Figure 7 and the QQ-plots in Figure 8 and Figure 9. Since the points do not all fall along the reference line, we cannot assume normality. Figure 10 shows the time series and autocorrelation plots for the NRFA peaks, indicating significant correlations at certain lags. These suggest possible seasonal patterns or long-term dependencies, which warrant further analysis to understand the data and improve modeling.

4.6. Data Assumptions and Specific Concerns

For reliable statistical analysis of flood data, the information gathered must be both dependable and representative of past events. Consequently, evaluating the appropriateness and applicability of flood records is an essential part of flood frequency analysis. Typically, annual peak-flow data are considered a random sample of independent and identically distributed events. These peak-flow data are assumed to represent the characteristics of future floods. Essentially, the underlying process generating floods is expected to be stable or unchanging over time.
Non-stationary processes are difficult to identify in peak-flow series. In this regard, we applied the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test, whose null hypothesis is that an observable time series is stationary around a deterministic trend against the alternative of non-stationarity, and the Mann–Kendall (MK) test to assess independence and to detect a monotonic upward or downward trend over time in the AMS. From Table 5, it is evident that the KPSS, MK and SW tests provide evidence in favor of level stationarity (which is also clear from the time series plots of both data sets in Figure 10 and Figure 11), independent and identically distributed realizations, and non-normal behavior, respectively.
Table 6 presents Kendall’s rank correlations ( τ ), which are also calculated to detect trends in the AMS and are illustrated in Figure 12 and Figure 13. The estimates indicate a slight negative and a slight positive association between year and flow for Data Set-I and -II, respectively; the large p-values indicate that neither monotonic relationship is statistically significant.
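For readers who wish to reproduce a trend check of this kind, the following Python sketch implements the Mann–Kendall statistic with the usual tie correction and a normal approximation for the two-sided p-value. It is a generic textbook version, not the routine used to produce Table 5 or Table 6, and the example series is made up.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Returns (S, Z, two-sided p-value) for the Mann-Kendall trend test."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference

    # Variance of S, reduced for groups of tied values (groups of size 1 contribute 0).
    tie_sizes = Counter(series).values()
    var_s = (
        n * (n - 1) * (2 * n + 5)
        - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)
    ) / 18.0

    # Continuity-corrected standard normal score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value


if __name__ == "__main__":
    # Hypothetical annual peaks, for illustration only.
    print(mann_kendall([310.0, 150.0, 400.0, 220.0, 180.0, 260.0, 330.0, 210.0]))
```

A large p-value means the null hypothesis of no monotonic trend is not rejected, which is the situation described for both data sets.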
In addition, time series and autocorrelation plots are shown in Figure 10 and Figure 11. In each figure, the series (left panel) yields only a few lags in the ACF (right panel) that exceed the confidence bounds (blue dashed lines). Because the autocorrelations die out as the lag increases, the ACF plots are consistent with stationary series. The few significant lags may nevertheless indicate seasonal patterns or long-term dependencies, which should be analyzed further for their impact on data structure and modeling.
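The reading of an ACF plot can be made concrete with a short Python sketch. It computes the sample autocorrelations and flags the lags that fall outside the approximate 95% white-noise bounds of ±1.96/√n, which is the usual choice for the dashed lines in such plots; whether the figures use exactly these bounds is an assumption. Function names and the example series are hypothetical.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (r_0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Lags k >= 1 whose autocorrelation exceeds the +/- 1.96 / sqrt(n) bounds."""
    bound = 1.96 / math.sqrt(len(series))
    r = acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


if __name__ == "__main__":
    # Made-up series with a strong period-2 pattern, for illustration only.
    demo = [1.0, -1.0] * 10
    print([round(v, 3) for v in acf(demo, 4)])
    print(significant_lags(demo, 4))
```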
Having examined the assumptions for the flood data sets, we observe that the selected data sets satisfy the usual requirements of stationarity, independence, and absence of trend. We now discuss the goodness-of-fit statistics of the proposed and competing models. From Table 7, we observe that the selected data sets are leptokurtic and positively skewed. For the SSD model, the goodness-of-fit statistics are the smallest and the p-values are high, as indicated in Table 8 and Table 9. This supports the appropriateness of the proposed model and suggests that the SSD is the best choice among the candidates for such data and is likely robust to outliers. Additionally, the histograms in Figure 14 and the information criteria presented in Table 10 and Table 11 further reinforce the model’s suitability, showing the smallest information loss for the SSD model. In contrast, distributions such as GuD(2) yield very low p-values, which are likely due to outliers inflating the test statistics and causing the model to be rejected. The SSD and LLD(2) exhibit very high p-values, indicating that they may effectively accommodate the outliers or that the data have undergone preprocessing steps, such as scaling or winsorization.
The information matrix must be estimated to construct confidence intervals around the point estimates. It is derived from the matrix of second derivatives of the log-likelihood, and its inverse serves as the estimated variance–covariance matrix of the MLEs. In this square matrix, the diagonal elements are the variances of the individual parameter estimates, while the off-diagonal elements are the covariances between all possible pairs of estimates. The matrix therefore quantifies both the precision of each estimate and the dependence between estimates.
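In this context, we have also computed the variance–covariance matrix for the SSD. The confidence interval bands for the estimates $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\theta}$ for the data sets can be observed in Figure 15, Figure 16 and Figure 17, respectively. With rows and columns ordered as $\hat{\alpha}$, $\hat{\beta}$, $\hat{\theta}$, the matrices for Data Set-I and -II are
$$\mathrm{COV}_{\mathrm{I}} = \begin{pmatrix} 1814.91 & 33.3254 & 9.4677 \\ 33.3254 & 0.6021 & 0.1627 \\ 9.4677 & 0.1627 & 0.0352 \end{pmatrix}, \qquad \mathrm{COV}_{\mathrm{II}} = \begin{pmatrix} 0.0611 & 0.0267 & 0.3903 \\ 0.0267 & 0.0057 & 0.0403 \\ 0.3903 & 0.0403 & 0.1467 \end{pmatrix}.$$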
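As a brief illustration of how such a matrix feeds into interval estimates, the Python sketch below takes the diagonal entries printed for Data Set-I, converts them to standard errors, and forms a Wald-type interval (estimate ± z × standard error). The Wald form is an assumption made for illustration; the paper does not state which construction underlies Figure 15, Figure 16 and Figure 17. The point estimate in the usage example is a placeholder, not a reported value.

```python
import math

# Diagonal entries (variances) of the Data Set-I matrix as printed above,
# in the order alpha, beta, theta.
VARIANCES_SET_I = {"alpha": 1814.91, "beta": 0.6021, "theta": 0.0352}


def standard_errors(variances):
    """Square roots of the variances, keyed by parameter name."""
    return {name: math.sqrt(v) for name, v in variances.items()}


def wald_interval(estimate, variance, z=1.959964):
    """Approximate 95% interval: estimate +/- z * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    print(standard_errors(VARIANCES_SET_I))
    # beta_hat = 1.0 is a made-up placeholder, not an estimate reported in the paper.
    print(wald_interval(1.0, VARIANCES_SET_I["beta"]))
```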
On the other hand, the well-known loss-of-information criteria, such as Akaike’s information criterion (AIC), its corrected version (AICc), the Bayesian information criterion (BIC), the Hannan–Quinn information criterion (HQIC), and the consistent AIC (CAIC), are recommended for model selection in cases where traditional goodness-of-fit metrics are unable to distinguish the better model due to overlapping results. The model’s applicability is further supported by Table 10 and Table 11, in which the SSD shows the least loss of information for both data sets. Moreover, distributions such as GuD(2), WD(2), and EVD(2) exhibit very low p-values, indicating a significant influence of outliers, particularly in the tails. Even more stable distributions like LND(2) and GD(2) fall below the 0.05 significance level, suggesting that outliers have caused them to be rejected. In contrast, only the SSD maintains a relatively high and stable p-value, reflecting its robustness against deviations.

4.7. Hydrological Parameters

The AMF series is widely used in flood frequency analysis (FFA) for two main reasons. First, it is accessible, as most data are organized in a way that makes annual series readily available. Second, there is a straightforward theoretical foundation for extrapolating the frequency of AMF series data beyond the observed range (see [10,17,18]). Therefore, classical frequency analyses are applied to all AMF series. Since the SSD model was found to be the most suitable in the two data analyses above, we proceed to investigate other hydrological characteristics through return period estimation, utilizing the properties of this model.

Return Period

The likelihood that events such as windstorms, tornadoes, and floods recur is often expressed in terms of a return period, which is typically denoted by $T$. The return period is the reciprocal of the probability of exceedance in a given year (see [10]). The relationship between the exceedance probability and the annual return period can be described as follows:
$$F(x_T) = P(X \le x_T) = 1 - P(X > x_T) = 1 - \frac{1}{T},$$
which implies
$$p = P(X > x_T) = \frac{1}{T};$$
hence, $T = 1/p$, where $F(x_T) = 1 - \left[ \sin\left( \frac{\pi}{2} \left( \frac{\alpha}{x_T} \right)^{\theta} \right) \right]^{\beta}$ is the probability of non-exceedance and $x_T$ is a high threshold whose probability of exceedance is $p$. Therefore, the return level $x_T$ for the SSD can be obtained from the expression
$$x_T = \left( \frac{\pi}{2} \right)^{1/\theta} \alpha \left\{ \arcsin\left[ \left( 1 - \left( 1 - \frac{1}{T} \right) \right)^{1/\beta} \right] \right\}^{-1/\theta}, \quad \alpha, \beta, \theta > 0,$$
where $x_T > 0$ and $T \ge 1$. Table 12 gives estimates of the return level $x_T$, and Table 13 gives the estimated recurrence times of floods, for each data set separately, against the return periods $T = 5, 10, 20, 25, 30, 40, 50$ years and the return levels $x_T$, where $T = 1/P(x_T)$ and $P(x_T) = \mathrm{SF}(x_T)$ is the survival function (SF) of the SSD, given by
$$\mathrm{SF}_{\mathrm{SSD}}(x \mid \alpha, \theta, \beta) = 1 - \left\{ 1 - \left[ \sin\left( \frac{\pi}{2} \left( \frac{\alpha}{x} \right)^{\theta} \right) \right]^{\beta} \right\}, \quad x \ge \alpha, \quad \alpha, \beta, \theta > 0.$$
In computations, the unknown parameters are replaced by their estimates $\hat{\alpha}$, $\hat{\theta}$ and $\hat{\beta}$, the MLEs of the SSD for the corresponding data set. Additionally, the plots in Figure 18 and Figure 19 for these data sets suggest that the proposed model yields realistic (neither too long nor too short) return periods when compared with the competing models. This comparison is also presented in Table 12 and Table 13, which indicate that the 50-year return level is about 7614.74 cfs and 121.354 m3/s for Data Set-I and -II, respectively. The competing models imply longer return periods for floods of such magnitude, which affects the feasibility of constructing and administering reservoirs. Finally, Table 14 and Table 15 present confidence intervals for the return period and return level estimates based on the non-central t-distribution.
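The return level and return period relations can be checked numerically with a small Python sketch. It assumes the survival function SF(x) = [sin((π/2)(α/x)^θ)]^β for x ≥ α, as read from the expressions above, and inverts it for the return level. The parameter values in the usage example are the Set-V values from the simulation design, used here only as placeholders; the fitted estimates for the two flood data sets are not reproduced in this sketch, and the function names are hypothetical.

```python
import math


def ssd_sf(x, alpha, theta, beta):
    """Assumed SSD survival function for x >= alpha."""
    return math.sin(0.5 * math.pi * (alpha / x) ** theta) ** beta


def return_level(T, alpha, theta, beta):
    """x_T solving SF(x_T) = 1/T, valid for T >= 1."""
    angle = math.asin((1.0 / T) ** (1.0 / beta))
    return alpha * (0.5 * math.pi / angle) ** (1.0 / theta)


def return_period(x, alpha, theta, beta):
    """T = 1 / SF(x): average number of years between exceedances of x."""
    return 1.0 / ssd_sf(x, alpha, theta, beta)


if __name__ == "__main__":
    # Placeholder parameters (Set-V of the simulation study), not fitted estimates.
    alpha, theta, beta = 1.1551, 1.3286, 1.1085
    for T in (5, 10, 20, 25, 30, 40, 50):
        x_T = return_level(T, alpha, theta, beta)
        print(T, round(x_T, 4), round(return_period(x_T, alpha, theta, beta), 4))
```

Applying `return_period` to the output of `return_level` recovers T up to floating-point error, which is a convenient consistency check on the two formulas.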

5. Conclusions and Future Work

This article presents a flexible probability model, the lower bounded Sine-Skewed Distribution (SSD), and examines its mathematical and statistical properties, including the mode, hazard function, asymptotic distributions, quantiles, moments, and order statistics. The model’s parameters are estimated via maximum likelihood estimation (MLE). A comprehensive simulation study is conducted to assess the performance of the estimators across sample sizes and parameter settings. The MLEs are supported by consistency, asymptotic efficiency, and fulfillment of regularity conditions, and the simulation results indicate that they exhibit low bias and mean squared error (MSE) for most parameter settings, reinforcing their suitability for practical applications. The model is applied to two real-world flood data sets for validation. The analysis confirms that the proposed SSD improves flood frequency analysis, aiding in reservoir design for defined timeframes. Key assumptions for flood data, including independence, trend analysis, stationarity, outlier detection, and autocorrelation function (ACF) checks, are rigorously assessed. The model demonstrates robustness even in the presence of outliers, ensuring reliable flood risk assessments. Additionally, confidence intervals for return levels and return periods are constructed, enhancing the interpretability of extreme event predictions. The results highlight that the SSD, coupled with MLE-based inference, performs effectively in flood data analysis and hydrological applications, offering a valuable tool for water resource management and infrastructure planning. For future research, we plan to explore characterization issues of the SSD, extend it to a bivariate framework for modeling correlated flood events, and apply it to broader hydrological and environmental data sets.
A multivariate version of SSD will also be developed to improve flood prediction and address complex environmental challenges. This extension aims to provide more reliable tools for policymakers and engineers managing water resources and disaster preparedness.

Author Contributions

Conceptualization, T.H. and M.S.; methodology, T.H. and M.A.; software, T.H. and M.S.; validation, T.H. and B.M.G.K.; formal analysis, T.H. and M.S.; writing—original draft preparation, T.H. and M.A.; writing—review and editing, T.H. and M.A.; visualization, M.S., T.H. and B.M.G.K.; supervision, M.A., B.M.G.K. and M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The study’s application section lists the data that were used along with their citations.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. Alkhairy, I.; Nagy, M.; Muse, A.H.; Hussam, E. The Arctan-X Family of Distributions: Properties, Simulation, and Applications to Actuarial Sciences. Complexity 2021, 2021, 4689010.
  2. Nguyen, T.; Lee, S. Application of advanced heavy-tailed distributions in flood frequency analysis with climate change considerations. Stoch. Environ. Res. Risk Assess. 2023, 37, 987–1004.
  3. Al-Babtain, A.A.; Shakhatreh, M.K.; Nassar, M.; Afify, A.Z. A new modified Kies family: Properties, estimation under complete and type-II censored samples, and engineering applications. Mathematics 2020, 8, 1345.
  4. Allouche, M.; Girard, S.; Gobet, E. Estimation of extreme quantiles from heavy-tailed distributions with neural networks. Stat. Comput. 2024, 34, 12.
  5. Kim, T.J.; Kwon, H.H.; Shin, Y.S. Frequency analysis of storm surge using Poisson-Generalized Pareto distribution. J. Korea Water Resour. Assoc. 2019, 52, 173–185.
  6. Korkmaz, M.Ç. A new heavy-tailed distribution defined on the bounded interval: The logit slash distribution and its application. J. Appl. Stat. 2020, 47, 2097–2119.
  7. Zhou, Z.; Liu, S.; Hu, Y.; Liang, Y.; Lin, H.; Guo, Y. Analysis of precipitation extremes in the Taihu Basin of China based on the regional L-moment method. Hydrol. Res. 2017, 48, 468–479.
  8. Ahmad, Z.; Mahmoudi, E.; Dey, S. A new family of heavy tailed distributions with an application to the heavy tailed insurance loss data. Commun. Stat.-Simul. Comput. 2022, 51, 4372–4395.
  9. Lyu, H.M.; Sun, W.J.; Shen, S.L.; Arulrajah, A. Flood risk assessment in metro systems of mega-cities using a GIS-based modeling approach. Sci. Total Environ. 2018, 626, 1012–1025.
  10. Hamed, K.; Rao, A.R. Flood Frequency Analysis; CRC Press: Boca Raton, FL, USA, 2019.
  11. Zhang, Q.; Gu, X.; Singh, V.P.; Xiao, M. Flood frequency analysis with consideration of hydrological alterations: Changing properties, causes and implications. J. Hydrol. 2014, 519, 803–813.
  12. Chen, J.; Liu, Y.; Zhang, Q. Advances in flood frequency analysis: Incorporating machine learning and heavy-tailed models. Water Resour. Res. 2024, 60, e2023WR030233.
  13. Li, C.; Sun, N.; Lu, Y.; Guo, B.; Wang, Y.; Sun, X.; Yao, Y. Review on urban flood risk assessment. Sustainability 2022, 15, 765.
  14. Zhou, Y.; Guo, S.; Xu, C.Y.; Xiong, L.; Chen, H.; Ngongondo, C.; Li, L. Probabilistic interval estimation of design floods under non-stationary conditions by an integrated approach. Hydrol. Res. 2022, 53, 259–278.
  15. Deng, J. Maximum entropy method for flood frequency analysis: A case study of the Grand River in Ontario, Canada. In IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2019; Volume 344, p. 012002.
  16. Griffis, V.W.; Stedinger, J.R. Log-Pearson type 3 distribution and its application in flood frequency analysis. I: Distribution characteristics. J. Hydrol. Eng. 2007, 12, 482–491.
  17. Hasan, I.F. Flood Frequency Analysis of Annual Maximum Streamflows at Selected Rivers in Iraq. Jordan J. Civ. Eng. 2020, 14, 573–586.
  18. Hosking, J.R.M.; Wallis, J.R. Some statistics useful in regional frequency analysis. Water Resour. Res. 1993, 29, 271–281.
  19. McCuen, R.H. Modeling Hydrologic Change: Statistical Methods; CRC Press: Boca Raton, FL, USA, 2016.
  20. Nouri Gheidari, M.H. Comparisons of the L-and LH-moments in the selection of the best distribution for regional flood frequency analysis in Lake Urmia Basin. Civ. Eng. Environ. Syst. 2013, 30, 72–84.
  21. Sagrillo, M.; Guerra, R.R.; Bayer, F.M. Modified Kumaraswamy distributions for double bounded hydro-environmental data. J. Hydrol. 2021, 603, 127021.
  22. Boorman, D.B. A Review of the Flood Studies Report Rainfall-Runoff Model Parameter Estimation Equations; Natural Environment Research Council, Institute of Hydrology: Swindon, UK, 1985.
  23. Cunnane, C. Statistical distribution for flood frequency analysis. In WMO Operational Hydrology; Report No. 33, WMO-No. 718; Operational Hydrology Report (WMO): Geneva, Switzerland, 1989.
  24. Millington, N.; Das, S.; Simonovic, S.P. The Comparison of GEV, Log-Pearson Type 3 and Gumbel Distributions in the Upper Thames River Watershed Under Global Climate Models; Department of Civil and Environmental Engineering, the University of Western: London, ON, Canada, 2011.
  25. Rowinski, P.M.; Strupczewski, W.G.; Singh, V.P. A note on the applicability of log-Gumbel and log-logistic probability distributions in hydrological analyses: I. Known pdf. Hydrol. Sci. J. 2002, 47, 107–122.
  26. Wang, W.C.; Xu, D.M.; Chau, K.W.; Chen, S. Improved annual rainfall-runoff forecasting using PSO–SVM model based on EEMD. J. Hydroinformatics 2013, 15, 1377–1390.
  27. Sapkota, L.P.; Kumar, P.; Kumar, V. A new class of sin-g family of distributions with applications to medical data. Reliab. Theory Appl. 2023, 18, 734–750.
  28. Faruk, M.U.; Isa, A.M.; Kaigama, A. Sine-Weibull Distribution: Mathematical Properties and Application to Real Datasets. Reliab. Theory Appl. 2024, 19, 65–72.
  29. Isa, A.M.; Bashiru, S.O.; Ali, B.A.; Adepoju, A.A.; Itopa, I.I. Sine-exponential distribution: Its mathematical properties and application to real dataset. UMYU Sci. 2022, 1, 127–131.
  30. Isa, A.M.; Ali, B.A.; Zannah, U. Sine burr xii distribution: Properties and application to real data sets. Arid. Zone J. Basic Appl. Res. 2022, 1, 48–58.
  31. Mustapha, B.A.; Isa, A.M.; Sule, O.B.; Itopa, I.I. Sine-Lomax distribution: Properties and applications to real data sets. FUDMA J. Sci. 2023, 7, 60–66.
  32. Bakouch, H.S.; Hussain, T.; Chesneau, C.; Jónás, T. A notable bounded probability distribution for environmental and lifetime data. Earth Sci. Inform. 2022, 15, 1607–1620.
  33. Merz, B.; Basso, S.; Fischer, S.; Lun, D.; Blöschl, G.; Merz, R.; Schumann, A. Understanding heavy tails of flood peak distributions. Water Resour. Res. 2022, 58, e2021WR030506.
  34. Kumar, D.; Singh, U.; Singh, S.K. A New Distribution Using Sine Function Its Application to Bladder Cancer Patients Data. J. Stat. Appl. Pro. 2015, 4, 417–427.
  35. England, J.F., Jr.; Cohn, T.A.; Faber, B.A.; Stedinger, J.R.; Thomas, W.O., Jr.; Veilleux, A.G.; Kiang, J.E.; Mason, R.R., Jr. Guidelines for Determining Flood Flow Frequency—Bulletin 17C, Techniques and Methods 4-B5; US Geological Survey: Reston, VA, USA, 2018; Chapter B5; 148p.
  36. Kolmogorov, A.N. Sulla determinazione empirica di una legge di distribuzione. Giorn Dell’inst Ital Degli Att 1933, 4, 89–91.
  37. Smirnov, N. Table for estimating the goodness of fit of empirical distributions. Ann. Math. Stat. 1948, 19, 279–281.
  38. Cramér, H. On the composition of elementary errors. Skand. Aktuarietids 1928, 11, 13–74.
  39. Von Mises, R. Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und Theoretischen Physik; Mary S. Rosenberg: New York, NY, USA, 1945.
  40. Anderson, T.W.; Darling, D.A. Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Ann. Math. Stat. 1952, 23, 193–212.
  41. Hussain, T.; Bakoush, S.H.; Rehman, U.Z.; Shan, Q. A Flexible Discrete Probability Model for Partly Cloudy Days. Colomb. J. Stat. 1997, 48, 1–21.
  42. Hussain, T.; Bakouch, H.S.; Gharari, F. Environmental Data Analysis with a Versatile Model on Time Scales. Iran. J. Sci. 2025, 1–15.
  43. Biçer, C.; Bakouch, H.S.; Biçer, H.D.; Alomair, G.; Hussain, T.; Almohisen, A. Unit Maxwell-Boltzmann Distribution and Its Application to Concentrations Pollutant Data. Axioms 2024, 13, 226.
  44. Hannan, E.J.; Quinn, B.G. The determination of the order of an autoregression. J. R. Stat. Soc. Ser. B (Methodol.) 1979, 41, 190–195.
  45. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  46. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
Figure 1. PDF graphs of SSD.
Figure 2. HRF graphs of SSD.
Figure 3. Modal plot-I.
Figure 4. Modal plot-II.
Figure 5. Skewness and kurtosis graphs of SSD by percentile.
Figure 6. Skewness and kurtosis graphs of SSD by moments.
Figure 7. Violin plot for the Data Set-I and -II.
Figure 8. QQ-plot for Data Set-I.
Figure 9. QQ-plot for Data Set-II.
Figure 10. Time series and ACF plots for Data Set-II.
Figure 11. Time series and ACF plots for Data Set-I.
Figure 12. Trend line plot for Data Set-I.
Figure 13. Trend line plot for Data Set-II.
Figure 14. Histogram for Data Set-I and -II.
Figure 15. Confidence interval belt of $\hat{\alpha}$ for Data Set-I and -II.
Figure 16. Confidence interval belt of $\hat{\beta}$ for Data Set-I and -II.
Figure 17. Confidence interval belt of $\hat{\theta}$ for Data Set-I and -II.
Figure 18. Competing models’ return periods for Data Set-I.
Figure 19. Competing models’ return periods for Data Set-II.
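Table 1. Modal values of SSD.
| α | β | θ | X_Mode | X_Mode/α |
|---|---|---|---|---|
| 0.5 | 1.00 | 1.00 | 0.729332 | 1.458663 |
| 0.5 | 1.00 | 1.25 | 0.692820 | 1.385640 |
| 0.5 | 1.00 | 1.50 | 0.666296 | 1.332592 |
| 0.5 | 1.00 | 1.75 | 0.646167 | 1.292333 |
| 0.5 | 1.25 | 2.00 | 0.682479 | 1.364958 |
| 0.5 | 1.25 | 2.25 | 0.663688 | 1.327376 |
| 0.5 | 1.25 | 2.50 | 0.648399 | 1.296797 |
| 0.5 | 1.25 | 2.75 | 0.635717 | 1.271435 |
| 0.5 | 1.50 | 3.00 | 0.655265 | 1.310531 |
| 0.5 | 1.50 | 3.25 | 0.643580 | 1.287160 |
| 0.5 | 1.50 | 3.50 | 0.633528 | 1.267056 |
| 0.5 | 1.50 | 3.75 | 0.624789 | 1.249578 |
| 0.5 | 1.75 | 4.00 | 0.636909 | 1.273818 |
| 0.5 | 1.75 | 4.25 | 0.628827 | 1.257653 |
| 0.5 | 1.75 | 4.50 | 0.621644 | 1.243288 |
| 0.5 | 1.75 | 4.75 | 0.615219 | 1.230437 |
| 0.5 | 2.00 | 5.00 | 0.623408 | 1.246816 |
| 0.5 | 2.00 | 5.25 | 0.617429 | 1.234858 |
| 0.5 | 2.00 | 5.50 | 0.612001 | 1.224003 |
| 0.5 | 2.00 | 5.75 | 0.607053 | 1.214105 |
| 1.0 | 1.00 | 1.25 | 1.385641 | 1.385641 |
| 1.0 | 1.00 | 1.50 | 1.332592 | 1.332592 |
| 1.0 | 1.00 | 1.75 | 1.292334 | 1.292334 |
| 1.0 | 1.00 | 2.00 | 1.260749 | 1.260749 |
| 1.0 | 1.25 | 2.25 | 1.327376 | 1.327376 |
| 1.0 | 1.25 | 2.50 | 1.296798 | 1.296798 |
| 1.0 | 1.25 | 2.75 | 1.271435 | 1.271435 |
| 1.0 | 1.25 | 3.00 | 1.250060 | 1.250060 |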
Table 2. Numerical study of descriptive statistics from SSD.
Table 2. Numerical study of descriptive statistics from SSD.
α θ β MeanVarianceSkewnessKurtosis
0.51.01.01.23370.45160.63960.4067
0.51.01.251.23370.45160.63960.4067
0.51.01.51.23370.45160.63960.4067
0.51.02.01.23370.45160.63960.4067
0.51.251.01.12720.53790.76850.5886
0.51.251.251.12720.53790.76850.5886
0.51.251.51.12720.53790.76850.5886
0.51.252.01.12720.53790.76850.5886
0.51.51.01.06130.59060.85350.7289
0.51.51.251.06130.59060.85350.7289
0.51.51.51.06130.59060.85350.7289
0.51.52.01.06130.59060.85350.7289
0.52.01.00.98440.64790.95750.9148
0.52.01.250.98440.64790.95750.9148
0.52.01.50.98440.64790.95750.9148
0.52.02.00.98440.64790.95750.9148
1.01.01.02.46742.78991.80714.4036
1.01.01.252.46742.78991.80714.4036
1.01.01.52.46742.78991.80714.4036
1.01.02.02.46742.78991.80714.4036
1.01.251.02.25432.15371.60783.6193
1.01.251.252.25432.15371.60783.6193
1.01.251.52.25432.15371.60783.6193
1.01.252.02.25432.15371.60783.6193
1.01.51.02.12261.76291.46883.1402
1.01.51.252.12261.76291.46883.1402
1.01.51.52.12261.76291.46883.1402
1.01.52.02.12261.76291.46883.1402
1.02.01.01.96871.59161.35252.7537
1.02.01.251.96871.59161.35252.7537
1.02.01.51.96871.59161.35252.7537
1.02.02.01.96871.59161.35252.7537
1.251.01.03.08435.54262.25686.8014
1.251.01.253.08435.54262.25686.8014
1.251.01.53.08435.54262.25686.8014
1.251.02.03.08435.54262.25686.8014
1.251.251.02.81794.21492.00795.7171
1.251.251.252.81794.21492.00795.7171
1.251.251.52.81794.21492.00795.7171
1.251.252.02.81794.21492.00795.7171
1.251.51.02.65323.44231.83585.0376
1.251.51.252.65323.44231.83585.0376
1.251.51.52.65323.44231.83585.0376
1.251.52.02.65323.44231.83585.0376
1.252.01.02.46092.80081.69094.4275
1.252.01.252.46092.80081.69094.4275
1.252.01.52.46092.80081.69094.4275
1.252.02.02.46092.80081.69094.4275
Table 3. Mean bias and MSE of the MLEs for various sample sizes.
Set  n    Bias(α̂)  MSE(α̂)  Bias(θ̂)  MSE(θ̂)   Bias(β̂)  MSE(β̂)
I    15   −0.0223  0.0037  0.1477   0.1002   0.4651   4.0309
     25   −0.0114  0.0015  0.0961   0.0377   0.1462   1.1923
     50   −0.0035  0.0001  0.0801   0.0190   −0.0021  0.1015
     75   −0.0021  0.0000  0.0737   0.0114   −0.0190  0.0024
     100  −0.0015  0.0000  0.0677   0.0086   −0.0236  0.0019
     150  −0.0011  0.0000  0.0628   0.0066   −0.0260  0.0015
II   15   0.0113   0.0141  −0.0621  0.1599   1.4054   14.8283
     25   −0.0086  0.0116  −0.0119  0.1192   1.7738   16.6767
     50   −0.0233  0.0082  0.0163   0.0656   1.6010   13.6349
     75   −0.0243  0.0060  0.0359   0.0501   1.2596   9.7074
     100  −0.0234  0.0051  0.0343   0.0388   1.0495   7.5112
     150  −0.0199  0.0040  0.0295   0.0303   0.7834   5.0098
III  15   −0.0063  0.0010  7.7915   66.5011  0.1368   0.0201
     25   −0.0002  0.0001  8.5028   73.6786  0.1358   0.0190
     50   −0.0000  0.0000  8.4553   72.7072  0.1379   0.0195
     75   0.0000   0.0000  8.6431   75.99999 0.1360   0.0188
     100  0.0000   0.0000  8.6678   75.3438  0.1355   0.0185
     150  −0.0001  0.0000  8.6595   75.2051  0.1365   0.0189
IV   15   −0.0041  0.0005  9.2769   91.0090  −1.0936  1.1965
     25   −0.0112  0.0018  9.2276   90.4614  −1.0805  1.1694
     50   0.0007   0.0000  9.8235   97.1826  −1.0895  1.1872
     75   0.0007   0.0000  9.8791   97.9300  −1.0894  1.1869
     100  0.0008   0.0000  9.9400   98.9089  −1.0896  1.1874
     150  0.0008   0.0000  9.9436   98.9663  −1.0894  1.1869
V    15   −0.0694  0.1861  −0.0145  0.2764   1.9429   18.6212
     25   −0.0970  0.1507  0.0001   0.1944   1.9350   17.7271
     50   −0.1005  0.1117  0.0326   0.1124   1.5730   13.0220
     75   −0.1269  0.1036  0.0680   0.0950   1.6428   12.7965
     100  −0.0796  0.0702  0.0361   0.0650   0.9820   6.9775
     150  −0.0667  0.0506  0.0334   0.0438   0.6870   4.3325
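Bias and MSE figures of the kind shown in Table 3 are usually obtained by Monte Carlo replication. The following is a minimal sketch, assuming the usual definitions Bias = mean(estimate) − true value and MSE = mean((estimate − true value)²). The SSD density and sampler are not reproduced in this section, so an exponential distribution with its closed-form rate MLE stands in for it; `simulate_bias_mse`, the replicate count, and the seed are illustrative choices, not the authors' settings.

```python
import random
import statistics

def simulate_bias_mse(true_rate, n, replicates=2000, seed=1):
    """Monte Carlo bias and MSE of the exponential-rate MLE (stand-in model)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(replicates):
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        estimate = 1.0 / statistics.fmean(sample)  # MLE of the rate parameter
        errors.append(estimate - true_rate)
    bias = statistics.fmean(errors)
    mse = statistics.fmean([e * e for e in errors])
    return bias, mse

# Same sample sizes as Table 3; the true rate of 1.0 is an arbitrary choice.
results = {n: simulate_bias_mse(true_rate=1.0, n=n)
           for n in (15, 25, 50, 75, 100, 150)}
for n, (bias, mse) in results.items():
    print(f"n={n:3d}  bias={bias:+.4f}  MSE={mse:.4f}")
```

Because MSE = variance + bias², every MSE must be non-negative and at least as large as the squared bias, which gives a quick sanity check on any simulation table.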
Table 4. Comparison of Goodness of Fit Measures.
Criterion/Test       Use                   Sensitivity             Sample Size
Kolmogorov–Smirnov   EDF vs. CDF           Center                  Large
Cramér–von Mises     EDF vs. CDF           Entire distribution     Moderate
Anderson–Darling     EDF vs. CDF           Tails                   All sizes
AIC/AICc             Model fit/complexity  Fit (AICc for small n)  All
BIC/CAIC             Simpler models        Strong penalty          Large
HQIC                 Balanced selection    Moderate penalty        Moderate+
Table 5. Nonparametric tests summary.
Data Sets  KPSS-Test  p-Value  MK-Test  p-Value  SW-Test  p-Value
I          0.1004     0.1000   −0.7891  0.4300   0.7879   0.0000
II         0.1688     0.1000   −0.7168  0.4735   0.8652   0.0000
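The MK column in Table 5 appears to hold a standardized Mann–Kendall statistic, since its p-values agree with a two-sided normal tail. A minimal sketch of the test follows, assuming no tied observations (so the variance needs no tie correction) and the usual continuity correction. The function names are illustrative, and this is not necessarily the routine the authors used.

```python
import math

def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def mann_kendall(series):
    """Return (S, Z, p) for the Mann-Kendall trend test, assuming no ties."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z, normal_two_sided_p(z)

# Strictly increasing toy series: every pair is concordant.
s, z, p = mann_kendall(list(range(1, 11)))
print(s, round(z, 4), p)

# The statistics reported in Table 5 map to p-values through the normal tail.
print(round(normal_two_sided_p(-0.7891), 4), round(normal_two_sided_p(-0.7168), 4))
```

For both data sets the p-values are well above 0.05, so the test gives no evidence of a monotonic trend.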
Table 6. Kendall’s rank correlation (τ) test summary.
Data Sets  τ̂       t-Test   p-Value
I          −0.0762  −0.7970  0.4254
II         0.0388   0.5602   0.5753
Table 7. Descriptive summary of data sets.
Data Set  Sample Size  Mean     Median   S.D      SK       KU
I         52           3011.73  2720.00  1363.71  2.1199   8.6042
II        96           62.9785  57.134   14.4087  1.13741  3.53627
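A summary like Table 7 can be computed directly from a sample. The sketch below assumes moment-based skewness and non-excess kurtosis (m4/m2², equal to 3 for a normal distribution); the section does not say which convention the authors used, and `toy_flows` is a made-up sample, not either data set.

```python
import statistics

def describe(sample):
    """Mean, median, sample SD, moment skewness and non-excess kurtosis."""
    n = len(sample)
    mean = statistics.fmean(sample)
    m2 = sum((x - mean) ** 2 for x in sample) / n  # central moments
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(sample),
        "sd": statistics.stdev(sample),  # n - 1 denominator
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,        # assumed convention: 3 for a normal
    }

toy_flows = [41.0, 52.5, 57.0, 58.3, 61.2, 66.8, 74.1, 95.4]  # hypothetical values
summary = describe(toy_flows)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

A mean above the median together with positive skewness, as in both rows of Table 7, indicates a right-skewed sample.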
Table 8. MLEs and goodness-of-fit measures of Data Set-I.
Distribution  α̂       θ̂       β̂       A₀*      W₀*     KS      p-Value
SSD           1166.68  3.2396   5.5955   0.1742   0.0305  0.0775  0.9311
Kappa(3)      4.4955   0.9577   2614.65  33.9922  7.3047  0.7408  0.0211
GD(3)         74.8171  0.0024   0.3089   1.0732   0.1750  0.1326  0.3552
WD(2)         2.3071   3404.18  -        2.6105   0.4431  0.1927  0.0525
GD(2)         6.7968   443.11   -        1.5066   0.2559  0.1549  0.1903
EVD(2)        2479.07  804.21   -        0.8917   0.1253  0.1016  0.6926
LLD(2)        4.9309   2693.13  -        0.5937   0.0595  0.0683  0.9263
LND(2)        7.9349   0.3668   -        0.9190   0.1479  0.1236  0.4427
GuD(2)        3798.06  2001.99  -        5.9406   1.1005  0.2765  0.0011
Table 9. MLEs and goodness-of-fit measures of Data Set-II.
Distribution  α̂        θ̂       β̂      A₀*     W₀*     KS       p-Value
SSD           46.9895   4.2850   0.7397  0.6910  0.1038  0.0805   0.5689
Kappa(3)      687.6587  0.0009   152.93  1.9240  0.2988  0.1061   0.2357
GD(3)         38.8770   0.4668   0.7473  3.1205  0.5129  0.1482   0.0307
WD(2)         4.3382    68.8192  -       4.4453  0.7276  0.1771   0.0051
GD(2)         21.7629   2.8938   -       3.2649  0.5413  0.1520   0.0248
EVD(2)        56.7169   9.7213   -       2.6260  0.4097  0.12832  0.0875
LLD(2)        1.712251042.47     -       2.7160  0.3742  0.1216   0.1202
LND(2)        4.1196    0.2096   -       2.9166  0.4804  0.1450   0.0368
GuD(2)        70.8220   17.0419  -       6.3241  1.0715  0.2213   0.0002
Table 10. Information criteria for Data Set-I.
Distribution  l         AIC       AICC      BIC       HQIC      CAIC
SSD           429.372   864.743   865.243   870.597   866.987   873.597
Kappa(3)      434.2351  870.8700  871.3700  875.7300  874.6200  871.3700
GD(3)         435.2270  878.4540  879.3050  886.2590  873.2020  879.3050
WD(2)         444.8790  893.7580  894.0020  897.6600  892.5060  894.0020
GD(2)         437.8450  879.6910  879.9360  883.5930  878.4390  879.9360
EVD(2)        434.3140  872.6290  872.8740  876.5310  871.3770  872.8740
LLD(2)        433.7180  871.4350  871.6800  875.3380  870.1840  871.6800
LND(2)        434.2490  872.4980  872.7430  876.4000  871.2460  872.7430
GuD(2)        467.7230  939.4460  939.6910  943.3480  938.1940  939.6910
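The SSD row of Table 10 can be cross-checked from its first column. A minimal sketch, assuming that column holds the negative maximized log-likelihood and that the criteria follow their common textbook definitions (this section does not state them); `k` is the number of fitted parameters and `n = 52` is the Data Set-I sample size from Table 7.

```python
import math

def information_criteria(neg_loglik, k, n):
    """Common information criteria from the negative maximized log-likelihood."""
    deviance = 2.0 * neg_loglik
    aic = deviance + 2 * k
    return {
        "AIC": aic,
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),  # small-sample correction
        "BIC": deviance + k * math.log(n),
        "HQIC": deviance + 2 * k * math.log(math.log(n)),
        "CAIC": deviance + k * (math.log(n) + 1),
    }

# SSD fitted to Data Set-I: three parameters, n = 52.
ssd = information_criteria(neg_loglik=429.372, k=3, n=52)
for name, value in ssd.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions all five SSD entries are reproduced to within about 0.002, for example AIC ≈ 864.744 against the tabulated 864.743.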
Table 11. Information criteria for Data Set-II.
Distribution  l        AIC      AICC     BIC      HQIC     CAIC
SSD           362.926  731.852  732.102  739.667  735.008  742.667
Kappa(3)      372.448  750.897  751.158  758.59   754.007  761.59
GD(3)         383.824  771.647  771.776  776.776  773.72   778.776
WD(2)         397.187  798.374  798.503  803.503  800.447  805.503
GD(2)         384.589  773.178  773.307  778.306  775.251  780.306
EVD(2)        376.169  756.337  756.466  761.466  758.41   763.466
LLD(2)        383.481  770.962  771.091  776.09   773.035  778.09
LND(2)        381.702  767.404  767.533  772.533  769.477  774.533
GuD(2)        412.409  828.817  828.946  833.946  830.891  835.946
Table 12. Return levels for specified return periods (T).
Data Set  5        10       20       25       30       40       50
I         3647.86  4584.17  5716.35  6132.04  6492.69  7103.31  7614.74
II        71.2795  83.5847  98.0997  103.307  107.77   115.218  121.354
Table 13. Return periods for selected threshold values (x_T).
Data Set  5000     5500     6000     7000     8000     9000     10,000
I         13.1098  17.6994  23.3262  38.1651  58.5975  85.6291  120.297
Data Set  45       50       60       80       100      120      140
II        1.1239   1.1557   2.3682   8.2659   21.7269  47.6459  92.3923
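Tables 12 and 13 are two views of one relationship. A minimal sketch follows, assuming the standard convention T = 1/(1 − F(x_T)), so that x_T = F⁻¹(1 − 1/T). The SSD distribution function is not reproduced in this section, so a two-parameter Weibull stands in purely for illustration. Its parameters are the WD(2) estimates for Data Set-II from Table 9, read here as shape then scale (an assumption), and the function names are hypothetical. The printed levels are therefore Weibull-based and will not match the SSD values in Table 12.

```python
import math

def weibull_cdf(x, shape, scale):
    """Weibull distribution function; zero for non-positive x."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def weibull_quantile(p, shape, scale):
    """Inverse of the Weibull distribution function for 0 < p < 1."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

def return_period(x, cdf):
    """T = 1 / P(X > x)."""
    return 1.0 / (1.0 - cdf(x))

def return_level(T, quantile):
    """x_T = F^{-1}(1 - 1/T)."""
    return quantile(1.0 - 1.0 / T)

shape, scale = 4.3382, 68.8192  # WD(2) row of Table 9, used as a stand-in model

def cdf(x):
    return weibull_cdf(x, shape, scale)

def quantile(p):
    return weibull_quantile(p, shape, scale)

for T in (5, 10, 20, 50):
    x_T = return_level(T, quantile)
    # Feeding the level back in recovers the return period.
    print(T, round(x_T, 2), round(return_period(x_T, cdf), 2))
```

Read the other way, the return period of 2.3682 reported for the threshold 60 in Data Set II corresponds to an exceedance probability of 1/2.3682 ≈ 0.42 per period.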
Table 14. 95% confidence intervals for return level estimates.
Data Set  5                  10                 20                 50
I         3647.86 ± 951.427  4584.17 ± 1308.83  5716.35 ± 1765.56  7614.74 ± 2579.83
II        71.2795 ± 108.305  83.5847 ± 141.606  98.0997 ± 182.433  121.354 ± 252.433
Table 15. 95% confidence intervals for return period estimates.
Data Set  5                  10                 20                 50
I         (2.1798, 10.1026)  (3.6618, 22.0289)  (6.3433, 47.0698)  (13.399, 47.0616)
II        (0, 268.873)       (0, 709.365)       (0, 1819.15)       (0, 6222.27)
Hussain, T.; Shakil, M.; Ahsanullah, M.; Kibria, B.M.G. A Bounded Sine Skewed Model for Hydrological Data Analysis. Analytics 2025, 4, 19. https://doi.org/10.3390/analytics4030019