On Modeling Concrete Compressive Strength Data Using Laplace Birnbaum-Saunders Distribution Assuming Contaminated Information

Abstract: Compressive strength is a well-known measurement to evaluate the endurance of a given concrete mixture to stress factors, such as compressive loads. A suggested approach to assess the compressive strength of concrete is to assume that it follows a probability model from which its reliability is calculated. In reliability analysis, a probability distribution’s reliability function is used to calculate the probability of a specimen surviving to a certain threshold without damage. To approximate the reliability of a subject of interest, one must estimate the corresponding parameters of the probability model. Researchers typically formulate an optimization problem, which is often nonlinear, based on maximum likelihood theory to obtain estimates for the targeted parameters and then estimate the reliability. Nevertheless, there are additional nonlinear optimization problems in practice from which different estimators for the model parameters are obtained once they are solved numerically. Under normal circumstances, these estimators may perform similarly. However, some might become more robust under irregular situations, such as in the case of data contamination. In this paper, nine frequentist estimators are derived for the parameters of the Laplace Birnbaum-Saunders distribution and then applied to a simulated data set and a real data set. Afterwards, they are compared numerically via a comparative Monte Carlo simulation study. The resulting estimates for the reliability based on these estimators are also assessed in the latter study.


Introduction
Concrete is the most widely used construction material in the world. Concrete compressive strength is a measure used in determining the amount of resistance a structural element can offer to deformation. Compressive strength is a widely used measure to assess the performance of a given concrete mixture. This measure is vital because it is the main indicator of how well concrete can withstand the loads imposed on it. It tells us precisely whether a particular mix is suitable for the requirements of a particular project. Concrete withstands compressive loading remarkably well, which is why it is suitable for constructing arches, columns, dams, foundations, and tunnel linings, among other structures.
Researchers from different fields of science may attempt to describe phenomena of interest, such as concrete compressive strength, using either mathematical or probabilistic models. For instance, scientists describe the life of an object using a probabilistic model called a lifetime probability distribution. Such practice is common in the scientific community, especially in many science fields that involve reliability and reliability analyses. For example, in material science, the two-parameter Birnbaum-Saunders (BS) lifetime distribution can be used by analysts to model fatigue of materials due to periodic cyclic loading [1,2]. This nonnegative lifetime model has unimodal skewed probability density and hazard rate curves.
Unimodal hazard rate curves are common in practice; see, for example, Langlands et al. [3]. A non-negative continuous random variable T is said to follow the BS distribution if the corresponding cumulative distribution function (CDF) is given by:

F(t; α, β) = Φ((1/α)[√(t/β) − √(β/t)]), t > 0, (1)

where α is the shape parameter, β is the scale parameter, and Φ(·) is the CDF of the standard normal distribution. Desmond [4] provided a more general derivation of the BS distribution based on a biological model, which reinforced the physical rationale for the use of the BS distribution by relaxing the original presumptions of [1,2]. The BS distribution has desirable properties and a close relationship to the normal distribution. Consequently, at least a couple of hundred papers and a research monograph have already appeared describing the properties and developments of this distribution; see, for example, the comprehensive review by Balakrishnan and Kundu [5] in this connection. Recent applications of the BS distribution include Bourguignon et al. [6], Hassani et al. [7], and Kannan et al. [8], among others. The BS distribution belongs to a generalized family of distributions called the generalized BS distribution [9]. The generalization is obtained by replacing the Gaussian kernel in Equation (1) with kernels of symmetric distributions such as the Laplace and logistic distributions. Sampling plans from truncated life tests assuming the generalized BS distribution were developed in [10], while the generalized BS distribution was used to analyze air pollutant concentrations in [11]. For further details in this connection, see [12]. The Laplace BS (LBS) distribution is a BS distribution based on the Laplace kernel. Its properties and some associated estimation methods were studied by Zhu and Balakrishnan [13]. This paper expands upon their work by discussing additional methods to estimate the parameters of the LBS distribution assuming data contamination.
A positive random variable T is said to follow the LBS distribution if the corresponding CDF, reliability function, and probability density function are given by:

F(t; α, β) = F_L(ξ(t/β)/α), t > 0, (2)

S(t; α, β) = 1 − F_L(ξ(t/β)/α), (3)

and

f(t; α, β) = (1/(4αβ))[(β/t)^(1/2) + (β/t)^(3/2)] exp(−|ξ(t/β)|/α), (4)

respectively, such that α is a positive shape parameter, while β is a positive scale parameter, where:

ξ(x) = √x − 1/√x, x > 0,

and F_L(·) is the CDF of the standard Laplace distribution, i.e., F_L(z) = (1/2)e^z for z < 0 and F_L(z) = 1 − (1/2)e^(−z) for z ≥ 0. The LBS distribution and other lifetime models are characterized by essential statistical properties such as hazard (failure) rates and reliability. The latter concept represents the probability of a specimen surviving beyond a certain amount of time without failure. Approximating aspects of the life of objects of interest (e.g., reliability) requires estimating the model parameters. Estimating probability distribution parameters has been of great interest to scientists and has received much attention in the statistical literature. In practice, one can obtain various estimators for the model parameters. Therefore, several researchers have conducted comparative Monte Carlo simulation studies to numerically assess the estimators from different statistical and computational perspectives; see, for example, Gupta and Kundu [14], Alkasasbeh and Raqab [15], Mazucheli et al. [16], do Espirito Santo and Mazucheli [17], and Balakrishnan and Alam [18], among other research contributions.
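To make Equations (2)-(4) concrete, the following minimal Python sketch evaluates the CDF, reliability, and density of the LBS distribution; the function names are illustrative, not from [13]:

```python
import math

def xi(x):
    # xi(x) = sqrt(x) - 1/sqrt(x), the BS-type transformation
    return math.sqrt(x) - 1.0 / math.sqrt(x)

def laplace_cdf(z):
    # CDF of the standard Laplace distribution
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def lbs_cdf(t, alpha, beta):
    """Equation (2): CDF of the LBS distribution."""
    return laplace_cdf(xi(t / beta) / alpha)

def lbs_reliability(t, alpha, beta):
    """Equation (3): reliability (survival) function."""
    return 1.0 - lbs_cdf(t, alpha, beta)

def lbs_pdf(t, alpha, beta):
    """Equation (4): the Laplace kernel times the Jacobian term."""
    jac = ((beta / t) ** 0.5 + (beta / t) ** 1.5) / (4.0 * alpha * beta)
    return jac * math.exp(-abs(xi(t / beta)) / alpha)
```

Note that lbs_cdf(beta, alpha, beta) = 0.5 for any alpha, reflecting the fact that β is the population median of the LBS distribution.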
Due to the advancement of human civilization, researchers currently deal with various types and large amounts of data originating from the targeted phenomena. In practice, there is no guarantee that the data are complete and free of contamination by unusual values called outliers or extremes. In both cases, estimation efficiency is negatively impacted since the quality of the data becomes questionable. Under data contamination, the maximum likelihood method is not robust since the deviations caused by existing outliers can negatively affect the likelihood. Data contamination has motivated many researchers to propose alternative estimators to those obtained by maximum likelihood theory for various distributions; see, for example, Lawson et al. [19], Boudt et al. [20], Agostinelli et al. [21], and Wang et al. [22], among other papers.
The aim of this paper is twofold. Firstly, the LBS distribution is fitted to real concrete compressive strength data using nine estimation methods. Secondly, the performances of the resulting estimators from the considered estimation methods; namely, the modified moments estimators (MMEs), maximum likelihood estimators (MLEs), maximum product of spacings estimators (MPSEs), least-squares estimators (LSEs), weighted least-squares estimators (WLSEs), percentile estimators (PCEs), Cramér-von Mises estimators (CVMEs), Anderson-Darling estimators (ADEs), and right-tailed Anderson-Darling estimators (RADEs) are investigated using numerical applications and Monte Carlo simulations. The remaining parts of this article are organized as follows. Section 2 discusses the nine estimation methods of interest. Section 3 illustrates the practical application of the discussed estimators using a simulated data set and a real data set. Section 4 reports the outcomes of an extensive Monte Carlo simulation study comparing the performance of each estimator under different settings. Finally, remarks and future research directions conclude this paper in Section 5.

Reliability and Model Parameters Estimation
In this section, eight nonlinear optimization problems are formulated to obtain eight estimators for the α and β parameters mentioned in the introductory section. The ninth estimators are the MMEs, which have closed-form expressions [13]. Before establishing the targeted optimization problems, one must discuss some computational considerations for solving them.

Computational Considerations for the Optimization Process
The first important computational consideration is finding suitable starting values for solving the optimization problems. One of the important and practical aspects of the LBS distribution is that the scale parameter β is actually the median of the population. Hence, a reasonable starting value for this parameter is the sample median, i.e., β^(0) = median(t_1, . . . , t_n). Regarding the starting value for the shape parameter α, one can acquire such a value by using the relationship between the LBS distribution and the standard Laplace distribution. In fact, if T follows the LBS distribution with model parameters α and β, then:

Z = √(T/β) − √(β/T)

follows the Laplace distribution with location parameter equal to 0 and scale parameter equal to α. Hence, a starting value for α is obtained by determining the sample Z values, say z_1, . . . , z_n, based on the observed data t_1, . . . , t_n and the corresponding sample median β^(0), and then finding the mean absolute deviation (MAD) of z_1, . . . , z_n as a starting value for α. That is,

α^(0) = (1/n) Σ_{i=1}^{n} |z_i|.

The second computational consideration is the choice of the optimization algorithm from which one can obtain the estimators for the LBS distribution. Zhu and Balakrishnan [13] showed that the probability density function of the LBS distribution, Equation (4), is continuous but not differentiable at β. Consequently, estimators like the MLEs require a derivative-free optimization method, such as the Nelder-Mead algorithm [23]. Note that the latter algorithm is utilized to acquire the remaining estimators as well, to avoid any potential bias in the comparative computations.
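The starting-value recipe above can be sketched in Python as follows (a minimal illustration; the function names are ours, not from [13]):

```python
import math

def xi(x):
    # xi(x) = sqrt(x) - 1/sqrt(x)
    return math.sqrt(x) - 1.0 / math.sqrt(x)

def starting_values(t):
    """Starting values (alpha0, beta0) for the LBS optimization problems."""
    ts = sorted(t)
    n = len(ts)
    # beta0: sample median, since beta is the population median of the LBS law
    beta0 = ts[n // 2] if n % 2 else 0.5 * (ts[n // 2 - 1] + ts[n // 2])
    # z_i = xi(t_i / beta0) behaves like a Laplace(0, alpha) draw, so the
    # mean absolute deviation of the z values is a natural starting alpha
    z = [xi(ti / beta0) for ti in t]
    alpha0 = sum(abs(zi) for zi in z) / n
    return alpha0, beta0
```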

Maximum Likelihood Estimation
Zhu and Balakrishnan [13] obtained the MLEs for α and β and proved their existence and uniqueness. One approach to obtain such estimators is by solving the following maximization problem:

max_{α>0, β>0} ℓ(α, β) = Σ_{i=1}^{n} log f(t_i; α, β),

such that t_1, . . . , t_n is the observed random sample and f(·; α, β) is the density in Equation (4).
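As an illustration, the maximization problem above can be solved with the derivative-free Nelder-Mead routine in SciPy; the sketch below assumes SciPy is available, and the function names are illustrative:

```python
import math
from scipy.optimize import minimize

def lbs_logpdf(t, alpha, beta):
    # log-density of the LBS distribution: Laplace kernel times the Jacobian term
    xi = math.sqrt(t / beta) - math.sqrt(beta / t)
    jac = ((beta / t) ** 0.5 + (beta / t) ** 1.5) / (4.0 * alpha * beta)
    return math.log(jac) - abs(xi) / alpha

def lbs_mle(data, alpha0, beta0):
    """MLEs of (alpha, beta) via a derivative-free Nelder-Mead search,
    since the LBS density is not differentiable at t = beta."""
    def neg_loglik(theta):
        a, b = theta
        if a <= 0 or b <= 0:  # keep the search inside the parameter space
            return float("inf")
        return -sum(lbs_logpdf(t, a, b) for t in data)
    res = minimize(neg_loglik, x0=[alpha0, beta0], method="Nelder-Mead")
    return res.x[0], res.x[1]
```

In practice, alpha0 and beta0 would be the starting values discussed in Section 2.1.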

Least-Squares-Based Estimations
Given an observed random sample t_1, . . . , t_n from the LBS distribution with parameters α and β, suppose that t_{1:n} < · · · < t_{n:n} are the corresponding observed sample order statistics, and consider the following minimization problem:

min_{α>0, β>0} Σ_{i=1}^{n} w_i [F(t_{i:n}; α, β) − i/(n + 1)]².

The solutions of the above optimization problem are the LSEs for α and β given that w_1 = · · · = w_n = 1. However, if w_i = [(n + 1)²(n + 2)]/[i(n − i + 1)], then the solutions of the above minimization problem are the WLSEs of α and β [24].
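A minimal sketch of the (weighted) least-squares criterion follows, with the weights as given above (illustrative names; the criterion would then be minimized with Nelder-Mead as discussed in Section 2.1):

```python
import math

def lbs_cdf(t, alpha, beta):
    # Equation (2): CDF of the LBS distribution
    z = (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def ls_objective(theta, data, weighted=False):
    """(W)LSE criterion: (weighted) squared distance between the fitted CDF
    at the order statistics and the plotting positions i/(n + 1)."""
    alpha, beta = theta
    ts = sorted(data)
    n = len(ts)
    total = 0.0
    for i, t in enumerate(ts, start=1):
        w = (n + 1) ** 2 * (n + 2) / (i * (n - i + 1)) if weighted else 1.0
        total += w * (lbs_cdf(t, alpha, beta) - i / (n + 1)) ** 2
    return total
```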

Percentile Estimation
PCEs of α and β are acquired by fitting a linear model to the theoretical percentiles and the sample percentiles [25,26]. This method requires closed-form cumulative distribution and quantile functions. In the case of the LBS distribution, the cumulative distribution function is given by Equation (2), while the quantile function is defined as:

Q(u; α, β) = (β/4)[α Q_L(u) + √(α² Q_L²(u) + 4)]²,

such that 0 < u < 1, α, β > 0, and Q_L(u) is the quantile function of the standard Laplace distribution, i.e., Q_L(u) = − sgn(u − 0.5) log(1 − 2|u − 0.5|). The PCEs of α and β are the solutions of the following minimization problem:

min_{α>0, β>0} Σ_{i=1}^{n} [t_{i:n} − Q(p_{i:n}; α, β)]²,

where t_{1:n}, . . . , t_{n:n} are the observed sample order statistics, and p_{i:n} = i/(n + 1).
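The quantile function and the PCE criterion can be sketched as follows (illustrative names; the criterion is then minimized numerically):

```python
import math

def laplace_quantile(u):
    # Q_L(u) = -sgn(u - 0.5) * log(1 - 2|u - 0.5|), 0 < u < 1
    d = u - 0.5
    if d == 0.0:
        return 0.0
    return -math.copysign(1.0, d) * math.log(1.0 - 2.0 * abs(d))

def lbs_quantile(u, alpha, beta):
    """Quantile function of the LBS distribution."""
    w = alpha * laplace_quantile(u)
    return beta / 4.0 * (w + math.sqrt(w * w + 4.0)) ** 2

def pce_objective(theta, data):
    # sum of squared distances between order statistics and fitted percentiles
    alpha, beta = theta
    ts = sorted(data)
    n = len(ts)
    return sum((t - lbs_quantile(i / (n + 1), alpha, beta)) ** 2
               for i, t in enumerate(ts, start=1))
```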

Maximum Product of Spacing Estimation
Other estimators of α and β that depend on solving a maximization problem are the MPSEs. Recent research indicates that such estimates compete with the MLEs in terms of estimation efficiency and asymptotic properties [27-29]. Given the observed sample order statistics t_{1:n} < · · · < t_{n:n}, the MPSEs for the model parameters are determined numerically by solving the following maximization problem:

max_{α>0, β>0} (1/(n + 1)) Σ_{i=1}^{n+1} log D_i(α, β),

such that

D_i(α, β) = F(t_{i:n}; α, β) − F(t_{i−1:n}; α, β), i = 1, . . . , n + 1,

with F(t_{0:n}; α, β) ≡ 0 and F(t_{n+1:n}; α, β) ≡ 1.
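A minimal sketch of the MPSE criterion follows, written as a negative mean log-spacing to be minimized, which is equivalent to the maximization above; the small `eps` guard is our own practical safeguard against zero spacings, not part of the formal definition:

```python
import math

def lbs_cdf(t, alpha, beta):
    # Equation (2): CDF of the LBS distribution
    z = (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def mps_objective(theta, data):
    """Negative mean log-spacing; the MPSEs minimize this criterion."""
    alpha, beta = theta
    ts = sorted(data)
    # spacings D_i use F(t_{0:n}) = 0 and F(t_{n+1:n}) = 1
    F = [0.0] + [lbs_cdf(t, alpha, beta) for t in ts] + [1.0]
    eps = 1e-300  # practical guard against log(0) under ties
    return -sum(math.log(max(F[i] - F[i - 1], eps))
                for i in range(1, len(F))) / (len(ts) + 1)
```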

Goodness-of-Fit Estimations
The remaining three estimators are based on the idea of minimizing goodness-of-fit statistics, i.e., minimizing the difference between the estimated cumulative distribution function and its empirical counterpart. Examples of such statistics are the Cramér-von Mises, the Anderson-Darling, and the right-tailed Anderson-Darling statistics. The CVMEs of α and β are obtained by evaluating the following minimization problem:

min_{α>0, β>0} { 1/(12n) + Σ_{i=1}^{n} [F(t_{i:n}; α, β) − (2i − 1)/(2n)]² }.

On the other hand, the ADEs and RADEs of α and β are acquired as solutions of the following minimization problems:

min_{α>0, β>0} { −n − (1/n) Σ_{i=1}^{n} (2i − 1)[log F(t_{i:n}; α, β) + log S(t_{n+1−i:n}; α, β)] }

and

min_{α>0, β>0} { n/2 − 2 Σ_{i=1}^{n} F(t_{i:n}; α, β) − (1/n) Σ_{i=1}^{n} (2i − 1) log S(t_{n+1−i:n}; α, β) },

respectively, where S(t; α, β) = 1 − F(t; α, β).
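Sketches of the Cramér-von Mises and Anderson-Darling criteria follow (illustrative function names; the right-tailed variant is coded analogously):

```python
import math

def lbs_cdf(t, alpha, beta):
    # Equation (2): CDF of the LBS distribution
    z = (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def cvm_objective(theta, data):
    """Cramér-von Mises criterion minimized by the CVMEs."""
    alpha, beta = theta
    ts = sorted(data)
    n = len(ts)
    return 1.0 / (12.0 * n) + sum(
        (lbs_cdf(t, alpha, beta) - (2.0 * i - 1.0) / (2.0 * n)) ** 2
        for i, t in enumerate(ts, start=1))

def ad_objective(theta, data):
    """Anderson-Darling criterion minimized by the ADEs."""
    alpha, beta = theta
    ts = sorted(data)
    n = len(ts)
    total = 0.0
    for i in range(1, n + 1):
        Fi = lbs_cdf(ts[i - 1], alpha, beta)        # F(t_{i:n})
        Si = 1.0 - lbs_cdf(ts[n - i], alpha, beta)  # S(t_{n+1-i:n})
        total += (2.0 * i - 1.0) * (math.log(Fi) + math.log(Si))
    return -n - total / n
```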

Numerical Applications
In this section, a simulated data set is analyzed first; afterwards, a real data set is analyzed for the sake of illustration.

Simulated Data Analysis
Suppose 15 random observations are generated from the LBS distribution with α = β = 1, as shown in Table 1. The simulated data in Table 1 are obtained according to the following algorithm:
1. Generate a random sample U_1, . . . , U_n from the standard uniform distribution (i.e., U_i ∼ Uniform(0, 1), ∀i).
2. Set t_i = Q(u_i; α, β) for i = 1, . . . , n, where Q(·; α, β) is the quantile function of the LBS distribution given in the preceding section.
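The inversion algorithm above can be sketched as follows (the seed argument is only for reproducibility, and the names are illustrative):

```python
import math
import random

def laplace_quantile(u):
    # quantile function of the standard Laplace distribution
    d = u - 0.5
    if d == 0.0:
        return 0.0
    return -math.copysign(1.0, d) * math.log(1.0 - 2.0 * abs(d))

def lbs_sample(n, alpha, beta, seed=None):
    """Draw n LBS variates by inversion: t = Q(u; alpha, beta), u ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # Q_L is defined only on the open interval (0, 1)
            u = rng.random()
        w = alpha * laplace_quantile(u)
        out.append(beta / 4.0 * (w + math.sqrt(w * w + 4.0)) ** 2)
    return out
```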
Before obtaining the estimates for α and β, one must check their existence and uniqueness. Mathematically proving these requirements is beyond the scope of this study; nevertheless, one may examine them using graphical devices. Using extensive Monte Carlo simulations, a three-dimensional (3D) profile plot for each objective function in the preceding section is established, as shown in Figure 1. The 3D charts clearly indicate that there are regions in which global extrema exist and are expected to be unique for each objective function. The MMEs are obtained first for α and β as shown in [13], and then the remaining estimators are obtained successively, as shown in Table 2. The outcomes in the latter table are obtained assuming no contamination in the data, and assuming contamination in the upper 20% of the order statistics. On the other hand, based on the obtained estimates under both assumptions, Table 3 provides the actual reliability probabilities calculated from the complement of Equation (2) vs. their approximated counterparts using the different estimators. Both true and approximated reliability probabilities are evaluated at the sample minimum, the sample quartiles (Q_1, Q_2, Q_3), and the sample maximum. From the previous tables, one can easily observe that the PCEs for α and β provided the farthest approximations for the model parameters and the reliability probabilities. The performances of the remaining estimates are further assessed later.

Real Data Analysis
To illustrate the application of the considered estimation methods in practice, the concrete compressive strength data of [30] are considered for analysis. This data set was established from 17 different sources to check the reliability of a suggested strength model. The data were gathered on concrete comprising cement alongside fly ash, blast furnace slag, and superplasticizer. The data set consists of a single response variable, namely the compressive strength of concrete (in MPa), and 8 covariates. Using these data, various estimated models are obtained and compared by means of the Kolmogorov-Smirnov (KS) test. The latter one-sample testing procedure is used to test the null hypothesis that the distribution function of a given data set is that of the probability distribution of interest. To obtain the KS statistic, one must consider the following steps:
1. Obtain the estimates of the parameters α and β, denoted by α̂ and β̂.
2. Calculate the value of the KS statistic as follows:

KS = max_{1≤i≤n} { i/n − F(t_{i:n}; α̂, β̂), F(t_{i:n}; α̂, β̂) − (i − 1)/n },

and accordingly calculate the p-value to make a decision about the hypotheses.
Furthermore, since ties exist and the model parameters were estimated, the p-values for the KS statistics were obtained using B = 1000 parametric bootstrap samples. The steps to obtain the bootstrap p-value for the KS test are as follows:
1. For each method, obtain the estimates of the model parameters α and β; say, α̂ and β̂.
2. Use the estimates in the previous step and the algorithm in the preceding section to generate a random sample X*_1, . . . , X*_n from the LBS distribution with shape parameter α̂ and scale parameter β̂.
3. Fit the LBS distribution to the generated sample and calculate the corresponding KS statistic, KS^(j). Repeat Steps 2 and 3 for j = 1, . . . , B.
4. Calculate the p-value as follows:

p-value = (1/B) Σ_{j=1}^{B} I_j,

where I_j is an indicator function such that I_j = 1 if KS^(j) > KS and zero otherwise, for j = 1, . . . , B, and KS is the KS statistic obtained from the original data set.
Finally, it is important to mention that since ties were observed in the data, the MPSEs cannot be acquired directly. When ties exist, one may use a generalization of the maximum product of spacings method to obtain the required estimators; see Murage et al. [31] for additional details. Alongside the estimated parameters and the goodness-of-fit statistics, the reliability probability is calculated at 17 MPa, 28 MPa, and 70 MPa by substituting these values into Equation (3) and replacing the model parameters with the corresponding estimates. In practice, concrete compressive strength can fluctuate between 17 MPa and 28 MPa for residential concrete, while it can be as high as 70 MPa in the case of commercial constructions [32].
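The KS statistic and the parametric-bootstrap p-value described above can be sketched as follows; the names are illustrative, and `fit` stands for any routine returning parameter estimates, e.g., one of the nine estimators:

```python
import math
import random

def lbs_cdf(t, alpha, beta):
    # Equation (2): CDF of the LBS distribution
    z = (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def lbs_quantile(u, alpha, beta):
    # quantile function of the LBS distribution, 0 < u < 1
    d = u - 0.5
    ql = 0.0 if d == 0.0 else -math.copysign(1.0, d) * math.log(1.0 - 2.0 * abs(d))
    w = alpha * ql
    return beta / 4.0 * (w + math.sqrt(w * w + 4.0)) ** 2

def ks_statistic(data, alpha, beta):
    """KS distance between the empirical CDF and the fitted LBS CDF."""
    ts = sorted(data)
    n = len(ts)
    d = 0.0
    for i, t in enumerate(ts, start=1):
        F = lbs_cdf(t, alpha, beta)
        d = max(d, i / n - F, F - (i - 1) / n)
    return d

def bootstrap_pvalue(data, fit, B=1000, seed=0):
    """Parametric-bootstrap p-value of the KS test; `fit` maps a sample to
    (alpha_hat, beta_hat)."""
    rng = random.Random(seed)
    a, b = fit(data)
    ks0 = ks_statistic(data, a, b)
    n = len(data)
    exceed = 0
    for _ in range(B):
        # max(..., 1e-12) keeps u inside the open interval (0, 1)
        boot = [lbs_quantile(max(rng.random(), 1e-12), a, b) for _ in range(n)]
        a_j, b_j = fit(boot)
        if ks_statistic(boot, a_j, b_j) > ks0:
            exceed += 1
    return exceed / B
```

Any of the estimation criteria from Section 2 can be plugged in as `fit`; a crude stand-in (e.g., the starting values of Section 2.1) suffices to exercise the bootstrap machinery.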
Tables 4-7 respectively summarize the analyses of the considered data set assuming no contamination in the data, 20% upper data contamination (i.e., the upper 20% of order statistics are multiplied by 5), 20% lower data contamination (i.e., the lower 20% of order statistics are divided by 5), and 40% two-tailed data contamination, which is a mixture of the previous contamination cases. From the latter tables, one can note the following observations based on the values of the KS test statistics:
• When there is no data contamination, both the MLEs and MPSEs performed well in terms of goodness-of-fit.
• In the case of upper data contamination, the ADEs outperformed both the MLEs and MPSEs, which took second and third place, respectively.
• On the other hand, both the MLEs and MPSEs maintained their performance, followed by the ADEs, in the case of lower data contamination.
• In contrast, the WLSEs performed better than the MPSEs and MLEs when two-tailed data contamination exists.
• Overall, the MLEs and MPSEs provided the best results in terms of goodness-of-fit, and both endured data contamination, unlike their counterparts. This is most likely due to the fact that the sample size is large (1000+ units). Furthermore, the PCEs and MMEs did not perform well compared to their counterparts in all considered settings.
• Finally, according to the reliability probabilities estimated by the MLEs and MPSEs, one can conclude that the sampled specimens of [30] were suitable for residential buildings.

Simulation Outcomes
This section presents the outcomes of Monte Carlo simulation experiments based on 1000 random samples from the LBS distribution with different combinations of values for the shape parameter and sample sizes, assuming the contamination scenarios considered in the preceding section (no contamination, upper, lower, and two-tailed contamination). For each scenario, the simulation study assumes n = 10(10)100, α = 0.5(0.5)2.0, and β = 1, without loss of any generality. To measure estimation efficiency, the simulated bias and simulated root mean-squared error (RMSE) are calculated as

Bias(α̂) = (1/N) Σ_{i=1}^{N} (α̂_i − α) and RMSE(α̂) = √[(1/N) Σ_{i=1}^{N} (α̂_i − α)²],

such that N = 1000, while α̂_i (β̂_i) is an estimate of the model parameter α (β) based on simulation repetition i; the corresponding measures for β̂ are defined analogously. Furthermore, to measure the goodness-of-fit of the fitted models based on the nine estimators, the average absolute difference between the true and estimated reliability functions (D_abs) and the maximum absolute difference between the true and estimated reliability functions (D_max) are determined as

D_abs = (1/(nN)) Σ_{i=1}^{N} Σ_{j=1}^{n} |S(t_j; α, β) − S(t_j; α̂_i, β̂_i)| and D_max = (1/N) Σ_{i=1}^{N} max_{1≤j≤n} |S(t_j; α, β) − S(t_j; α̂_i, β̂_i)|,

respectively, such that S(t; α, β) = 1 − F(t; α, β), while α̂_i (β̂_i) is an estimate of the model parameter α (β) based on simulation repetition i. From a statistical perspective, an estimator is computationally consistent when its simulated bias tends to 0 as the sample size increases. Furthermore, when the simulated bias neither increases nor decreases when data contamination exists, one can conclude that the estimator is computationally robust. Here, Figures 2 and 3 clearly indicate that the most consistent and robust estimators for the model parameters α and β are the MPSEs and CVMEs, regardless of the sample size and the true value of α.
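The simulated bias and RMSE above can be computed with a short helper (a minimal sketch; the function name is illustrative):

```python
import math

def bias_and_rmse(estimates, true_value):
    """Simulated bias and RMSE over N Monte Carlo estimates of a parameter."""
    N = len(estimates)
    bias = sum(e - true_value for e in estimates) / N
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / N)
    return bias, rmse
```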
In computational statistics, an estimator is computationally efficient when the simulated RMSE tends to 0 as the sample size increases regardless of the existence of data contamination. Figures 4 and 5 suggest that the MPSEs, CVMEs, and the LSEs of α and β are the most efficient estimators compared to the other ones regardless of the simulation settings.
In goodness-of-fit analysis, whenever a pair of estimators yields D_abs and D_max values that tend to 0 as the sample size increases and are not negatively affected by data contamination, that pair of estimates provides the least difference between the true and estimated reliability functions. This is very important in practice since the aim is to find the best approximation of the reliability. Figures 6 and 7 again indicate that the MPSEs, CVMEs, and LSEs of α and β are the estimators that performed well in terms of goodness-of-fit.

Conclusions
In this paper, the estimation problem of the parameters and the reliability function of the Laplace Birnbaum-Saunders lifetime distribution is considered. Besides the method of maximum likelihood, eight classical frequentist estimation methods have been discussed for this purpose; namely, the modified moments, maximum product of spacings, least-squares, weighted least-squares, percentile, Cramér-von Mises, Anderson-Darling, and right-tailed Anderson-Darling estimation methods. Based on the assumption that the invariance property holds for the different estimation methods, the reliability function is also estimated using the different estimation methods. To compare the performance of the different estimators, a Monte Carlo simulation study is conducted. The practical application of the estimators is illustrated by analyzing a simulated data set and one real data set belonging to the compressive strength of concrete. Both the data analyses and the Monte Carlo simulation study indicated that all methods perform well when there is no contamination in the data. Once there is some contamination in the data, the maximum product of spacings, least-squares, and Cramér-von Mises estimates are notably robust compared to the other estimators, and the performance of the other methods improves as the sample size increases. Data contamination is not the only problem that faces researchers in practice. Data censoring is another practical challenge that needs to be addressed in future research since it negatively impacts estimation efficiency and robustness. Another important research direction is to compare the studied frequentist estimators to Bayesian estimation in terms of performance based on additional real experimental results that are available in the literature.