Applications of the Sine Modified Lindley Distribution to Biomedical Data

In this paper, the applicability of the sine modified Lindley distribution, recently introduced in the statistical literature, is highlighted through goodness-of-fit analyses of biomedical data. In particular, the distribution is shown to be useful for modeling three data sets: the times until target age of children receiving growth hormone treatment, the survival times of guinea pigs infected with tubercle bacilli, and the tumor sizes of lung cancer patients. We anticipate that our model will be effective in modeling the survival times of cancer-related diseases. The R codes for the figures, as well as information on how the data are processed, are provided.


Introduction
Survival is a major concern when people are diagnosed with cancer, COVID-19, or any other severe condition, and when clinical trials of a new treatment are conducted. Cancer is a general term that refers to a large group of diseases that can affect any part of the body. One defining feature of cancer is the rapid creation of abnormal cells that grow beyond their usual boundaries. These cells can then invade adjoining parts of the body and spread to other organs, a process known as metastasis. Widespread metastases are the leading cause of death from cancer.
To protect human lives, doctors seek to control the growth and size of these tumors where they arise. Tumor stage and survival time are two important aspects, because they help doctors determine the best treatment for their patients. As a result, determining the probability distributions of tumor size and survival duration is crucial for selecting the best treatment option.
Statistics can be used to analyze several quantities:

• the number of people who are diagnosed with and die from severe diseases each year;
• the number of people currently living after a diagnosis;
• the mean age at diagnosis;
• the number of people still alive at a given time after diagnosis.

Statistics also reveal differences among groups defined by age, sex, racial/ethnic group, geographic location, and other categories.
One way to analyze the properties of survival data or tumor sizes is to model the data statistically. Such modeling is essential for understanding biological data. Over the years, many researchers have developed discrete and continuous distributions for biological data:

• Ref. [1] developed the Marshall-Olkin inverse Lomax distribution (MO-ILD), which is used in modeling cancer stem cells.
• Ref. [2] used the weighted generalized quasi Lindley distribution to model COVID-19 data from Algeria and Saudi Arabia.
• Ref. [3] modeled the survival times of guinea pigs infected with virulent tubercle bacilli using the sine half-logistic inverse Rayleigh distribution.

With this motivation, we use the existing sine-modified Lindley (S-ML) distribution developed by [4] to model biomedical data, including cancer data. We also provide optimized open-source S-ML distribution codes for practitioners.

This paper is structured as follows. Section 2 reviews the existing S-ML distribution. Section 3 applies the distribution to the biomedical data sets and presents visualizations that support the numerical results. Section 4 concludes the study.

The S-ML Distribution
In this section, we briefly review the definitions and properties of three constructions: the sine-generated (S-G, or Sin-G in some references) family of distributions, the modified Lindley distribution, and the S-ML distribution. Families defined by trigonometric transformations have attracted considerable interest in recent years because of their applicability and flexibility across a range of contexts. The sinusoidal transformation underlying the S-G family was first studied by [5].

S-G Family of Distributions
The distribution function (DF) and probability density function (PDF) of the S-G family are given, respectively, by

$$F_{\text{S-G}}(y;\gamma) = \sin\left[\frac{\pi}{2}\, G(y;\gamma)\right], \qquad f_{\text{S-G}}(y;\gamma) = \frac{\pi}{2}\, g(y;\gamma) \cos\left[\frac{\pi}{2}\, G(y;\gamma)\right], \qquad y \in \mathbb{R},$$

where $G(y;\gamma)$ and $g(y;\gamma)$ are the DF and PDF of a baseline continuous distribution with parameter vector $\gamma$. This reference, or parent, distribution is chosen in advance by the practitioner according to the goals of the study. The S-G family is well known as an alternative way of generating new distributions from such a parent.

The S-G family introduces no extra parameters, yet the stochastic ordering $G(y;\gamma) \le F_{\text{S-G}}(y;\gamma)$ holds for every $y \in \mathbb{R}$. This follows because $\sin(\pi x/2) \ge x$ for all $x \in [0,1]$. The S-G family therefore makes it possible to build flexible statistical models that can handle a variety of data.

Recent works on the S-G family include:

• the sine Lindley and sine exponential distributions introduced by [6];
• the transformed S-G family studied by [7];
• the sine Topp-Leone-G family developed by [8];
• the sine Kumaraswamy-G family introduced by [9];
• the sine extended odd Fréchet-G family studied by [10];
• the sine power Lomax model by [11].
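The S-G construction can be sketched as a higher-order function that takes any baseline DF and PDF and returns the sine-transformed pair. The sketch below is a minimal standard-library Python version. The function name `sine_g` and the unit-rate exponential baseline in the example are illustrative assumptions; they are not part of the original R code.

```python
import math
from typing import Callable, Tuple

Fn = Callable[[float], float]

def sine_g(G: Fn, g: Fn) -> Tuple[Fn, Fn]:
    """Build the S-G DF and PDF from a baseline DF G and PDF g."""
    def F(y: float) -> float:
        # F_{S-G}(y) = sin(pi/2 * G(y))
        return math.sin(0.5 * math.pi * G(y))

    def f(y: float) -> float:
        # f_{S-G}(y) = (pi/2) * g(y) * cos(pi/2 * G(y)), the derivative of F
        return 0.5 * math.pi * g(y) * math.cos(0.5 * math.pi * G(y))

    return F, f

# Illustrative (hypothetical) baseline: exponential distribution with rate 1.
def G_exp(y: float) -> float:
    return 1.0 - math.exp(-y) if y > 0 else 0.0

def g_exp(y: float) -> float:
    return math.exp(-y) if y > 0 else 0.0

F_sexp, f_sexp = sine_g(G_exp, g_exp)

# Stochastic ordering check: G(y) <= F_{S-G}(y), since sin(pi*x/2) >= x on [0, 1].
grid = [0.05 * k for k in range(1, 200)]
print(all(G_exp(y) <= F_sexp(y) + 1e-15 for y in grid))  # prints True
```

Because the transform only composes functions, the same helper can wrap the modified Lindley DF and PDF to obtain the S-ML model.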
Ref. [4] applied the S-G family to a specific one-parameter distribution introduced by [12], the modified Lindley (ML) distribution. The result is the S-ML distribution.

Modified Lindley Distribution
The ML distribution proposed by [12] is obtained by applying the tuning function $e^{-\beta y}$, $\beta > 0$, to the Lindley distribution, with the goal of improving its capabilities in a variety of domains. The ML distribution is defined by the DF

$$G_{\text{ML}}(y;\beta) = 1 - \left(1 + \frac{\beta y}{1+\beta}\, e^{-\beta y}\right) e^{-\beta y}, \qquad y > 0,$$

and the corresponding PDF is

$$g_{\text{ML}}(y;\beta) = \frac{\beta}{1+\beta}\, e^{-2\beta y}\left[(1+\beta)\, e^{\beta y} + 2\beta y - 1\right], \qquad y > 0,$$

with $\beta > 0$, and $G_{\text{ML}}(y;\beta) = g_{\text{ML}}(y;\beta) = 0$ for $y \le 0$.
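As a sanity check on the ML formulas, the DF and PDF can be coded directly, and the PDF can be compared with a numerical derivative of the DF. The sketch below is a minimal Python version; the function names `ml_cdf` and `ml_pdf` are hypothetical.

```python
import math

def ml_cdf(y: float, beta: float) -> float:
    """G_ML(y; beta) = 1 - (1 + beta*y*exp(-beta*y)/(1+beta)) * exp(-beta*y) for y > 0."""
    if y <= 0:
        return 0.0
    e = math.exp(-beta * y)
    return 1.0 - (1.0 + beta * y * e / (1.0 + beta)) * e

def ml_pdf(y: float, beta: float) -> float:
    """g_ML(y; beta) = beta/(1+beta) * exp(-2*beta*y) * ((1+beta)*exp(beta*y) + 2*beta*y - 1)."""
    if y <= 0:
        return 0.0
    e = math.exp(-beta * y)
    # Same expression with exp(-2*beta*y) distributed, so no exp(+beta*y) overflow can occur.
    return beta / (1.0 + beta) * ((1.0 + beta) * e + (2.0 * beta * y - 1.0) * e * e)

# Compare the PDF with a central-difference derivative of the DF.
beta, h = 1.5, 1e-6
for y in (0.1, 0.5, 1.0, 3.0):
    numeric = (ml_cdf(y + h, beta) - ml_cdf(y - h, beta)) / (2 * h)
    print(f"y={y}: pdf={ml_pdf(y, beta):.6f}, dG/dy={numeric:.6f}")
```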
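The ML distribution adapts to increasing, reverse bathtub, and constant hazard rates. Its PDF can be written as a linear combination, with weights summing to one, of three densities: the exponential densities with rates β and 2β, and the gamma density with shape 2 and rate 2β.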
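Expanding the PDF makes the exponential and gamma components explicit. Here, $\text{Exp}(\lambda)$ denotes the exponential density $\lambda e^{-\lambda y}$, and $\text{Gamma}(2, 2\beta)$ denotes the density $(2\beta)^2 y e^{-2\beta y}$:

$$
\begin{aligned}
g_{\text{ML}}(y;\beta) &= \beta e^{-\beta y} - \frac{\beta}{1+\beta}\, e^{-2\beta y} + \frac{2\beta^2}{1+\beta}\, y e^{-2\beta y} \\
&= \underbrace{\beta e^{-\beta y}}_{\text{Exp}(\beta)} - \frac{1}{2(1+\beta)} \underbrace{2\beta e^{-2\beta y}}_{\text{Exp}(2\beta)} + \frac{1}{2(1+\beta)} \underbrace{4\beta^2 y e^{-2\beta y}}_{\text{Gamma}(2,\,2\beta)}.
\end{aligned}
$$

The weights $1$, $-1/[2(1+\beta)]$, and $1/[2(1+\beta)]$ sum to one. Because one weight is negative, this is a linear combination (a generalized mixture) rather than a mixture with nonnegative weights.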
The practical benefit is significant: for the three data sets analyzed in [12], the ML model outperforms the Lindley and exponential models. Two extensions of the ML distribution are the wrapped modified Lindley distribution proposed by [13] and the inverted modified Lindley distribution proposed by [14].

S-ML Distribution
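The DF and PDF of the S-ML distribution are given, respectively, by

$$F_{\text{S-ML}}(y;\beta) = \sin\left\{\frac{\pi}{2}\left[1 - \left(1 + \frac{\beta y}{1+\beta}\, e^{-\beta y}\right) e^{-\beta y}\right]\right\}, \qquad y > 0,$$

$$f_{\text{S-ML}}(y;\beta) = \frac{\pi}{2}\, \frac{\beta}{1+\beta}\, e^{-2\beta y}\left[(1+\beta)\, e^{\beta y} + 2\beta y - 1\right] \cos\left\{\frac{\pi}{2}\left[1 - \left(1 + \frac{\beta y}{1+\beta}\, e^{-\beta y}\right) e^{-\beta y}\right]\right\}, \qquad y > 0,$$

with $\beta > 0$, and $F_{\text{S-ML}}(y;\beta) = f_{\text{S-ML}}(y;\beta) = 0$ for $y \le 0$.

Different shapes of $f_{\text{S-ML}}(y;\beta)$ are obtained by varying β, and Figure 1 depicts the most representative of them. For smaller values of β, the PDF first increases and the distribution is unimodal. For larger values of β, the PDF is decreasing and leptokurtic. Overall, the S-ML PDF is adaptable: it can be unimodal or decreasing, and it is right-skewed.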
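Simulation from the S-ML model can be done by inverse-transform sampling. Because $F_{\text{S-ML}} = \sin(\pi G_{\text{ML}}/2)$, a uniform variate u maps to an S-ML variate by solving $G_{\text{ML}}(y;\beta) = (2/\pi)\arcsin(u)$. The ML DF has no closed-form inverse, so this Python sketch (standard library only) uses bisection. The inversion approach and all function names are illustrative assumptions, not the authors' R implementation.

```python
import math
import random

def ml_cdf(y, beta):
    if y <= 0:
        return 0.0
    e = math.exp(-beta * y)
    return 1.0 - (1.0 + beta * y * e / (1.0 + beta)) * e

def ml_pdf(y, beta):
    if y <= 0:
        return 0.0
    e = math.exp(-beta * y)
    return beta / (1.0 + beta) * ((1.0 + beta) * e + (2.0 * beta * y - 1.0) * e * e)

def sml_cdf(y, beta):
    # F_{S-ML}(y) = sin(pi/2 * G_ML(y))
    return math.sin(0.5 * math.pi * ml_cdf(y, beta))

def sml_pdf(y, beta):
    # f_{S-ML}(y) = (pi/2) * g_ML(y) * cos(pi/2 * G_ML(y))
    return 0.5 * math.pi * ml_pdf(y, beta) * math.cos(0.5 * math.pi * ml_cdf(y, beta))

def sml_quantile(u, beta, tol=1e-12):
    """Invert F_{S-ML} for u in [0, 1): solve G_ML(y) = (2/pi) * asin(u) by bisection."""
    target = 2.0 / math.pi * math.asin(u)
    lo, hi = 0.0, 1.0
    while ml_cdf(hi, beta) < target:  # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if ml_cdf(mid, beta) < target:  # G_ML is increasing, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sml_sample(n, beta, seed=None):
    """Draw n S-ML variates by inverse-transform sampling."""
    rng = random.Random(seed)
    return [sml_quantile(rng.random(), beta) for _ in range(n)]

print(sml_sample(5, beta=0.5, seed=1))
```

Bisection is slower than a Newton step but needs no derivative and cannot leave the bracket, which keeps the sketch robust for any β > 0.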
The S-ML distribution has also been shown to exhibit a non-monotonic hazard rate function (HRF) with an increasing-reverse bathtub-constant shape. This applicability and adaptability make the distribution appealing for modeling data from various fields. For weather and engineering data, [4] showed that the model performs well against twelve competing distributions. These competitors include the generalized beta type 2 distribution introduced by [15], the Lomax distribution studied by [16], and the lognormal distribution considered in [17].
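The HRF is $h(y;\beta) = f_{\text{S-ML}}(y;\beta)/[1 - F_{\text{S-ML}}(y;\beta)]$. Evaluating $1 - \sin(\cdot)$ directly loses precision in the right tail. With $S = 1 - G_{\text{ML}}$, the identities $\cos(\pi G/2) = \sin(\pi S/2)$ and $1 - \sin(\pi G/2) = 2\sin^2(\pi S/4)$ reduce the ratio to $h = (\pi/2)\,(g_{\text{ML}}/S)\,[S/\tan(\pi S/4)]$. Since the ML hazard $g_{\text{ML}}/S$ tends to β and $S/\tan(\pi S/4) \to 4/\pi$, this form shows that $h(y;\beta) \to 2\beta$ as $y \to \infty$. That limit is consistent with the constant right-hand part of the HRF shape. The function name `sml_hazard` is hypothetical.

```python
import math

def sml_hazard(y: float, beta: float) -> float:
    """S-ML hazard h = f / (1 - F), computed as (pi/2) * (g_ML/S) * S/tan(pi*S/4)."""
    if y <= 0:
        return 0.0
    e = math.exp(-beta * y)
    s = (1.0 + beta * y * e / (1.0 + beta)) * e  # S = 1 - G_ML(y)
    # ML hazard g_ML/S, with the common factor exp(-beta*y) cancelled to avoid underflow
    ml_hazard = beta / (1.0 + beta) * ((1.0 + beta) + (2.0 * beta * y - 1.0) * e) \
        / (1.0 + beta * y * e / (1.0 + beta))
    # S / tan(pi*S/4) tends to 4/pi as S -> 0; use the limit once S underflows to 0
    factor = s / math.tan(0.25 * math.pi * s) if s > 0 else 4.0 / math.pi
    return 0.5 * math.pi * ml_hazard * factor

for beta in (0.5, 2.0):
    print(beta, [round(sml_hazard(y, beta), 4) for y in (0.1, 0.5, 1, 2, 5, 20)])
```

At the origin, the same expression gives $h(0^+;\beta) = \pi\beta^2/[2(1+\beta)]$.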

Applications
Numerous distributions have been developed in the statistical literature on life-testing experiments. Some can model increasing or decreasing failure rates, others can model bathtub and upside-down bathtub failure rates, and still others can do both. Here, we compare the S-ML distribution with four competitors:

• the sine-Lindley (S-Lindley) distribution defined by [18];
• the sine-exponential (S-Expo) distribution studied by [6];
• the inverse Lindley (IL) distribution introduced by [19];
• the exponential (Expo) distribution as presented in [20].
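The DF and PDF of the competing models used against the S-ML model are displayed in Table 1.

Table 1. DF and PDF of the competitive models used against the S-ML model (y > 0, β > 0).

Model | DF | PDF
S-Lindley | $\sin\left\{\frac{\pi}{2}\left[1 - \frac{1+\beta+\beta y}{1+\beta}\, e^{-\beta y}\right]\right\}$ | $\frac{\pi}{2}\, \frac{\beta^2}{1+\beta}\, (1+y)\, e^{-\beta y} \cos\left\{\frac{\pi}{2}\left[1 - \frac{1+\beta+\beta y}{1+\beta}\, e^{-\beta y}\right]\right\}$
S-Expo | $\sin\left[\frac{\pi}{2}\left(1 - e^{-\beta y}\right)\right]$ | $\frac{\pi}{2}\, \beta e^{-\beta y} \cos\left[\frac{\pi}{2}\left(1 - e^{-\beta y}\right)\right]$
IL | $\left[1 + \frac{\beta}{(1+\beta)\, y}\right] e^{-\beta/y}$ | $\frac{\beta^2}{1+\beta}\, \frac{1+y}{y^3}\, e^{-\beta/y}$
Expo | $1 - e^{-\beta y}$ | $\beta e^{-\beta y}$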
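For fitting, the competing models can be collected in a single lookup table of (DF, PDF) pairs. This Python sketch assumes the standard one-parameter forms of the sine-Lindley, sine-exponential, inverse Lindley, and exponential distributions, each written with a single parameter β (`b` in the code). The functions are defined for y > 0 only. The names `MODELS`, `sine`, and `loglik` are hypothetical.

```python
import math

HALF_PI = 0.5 * math.pi

def lindley_cdf(y: float, b: float) -> float:
    # Lindley DF: 1 - (1 + b + b*y)/(1 + b) * exp(-b*y)
    return 1.0 - (1.0 + b + b * y) / (1.0 + b) * math.exp(-b * y)

def lindley_pdf(y: float, b: float) -> float:
    # Lindley PDF: b^2/(1 + b) * (1 + y) * exp(-b*y)
    return b * b / (1.0 + b) * (1.0 + y) * math.exp(-b * y)

def expo_cdf(y: float, b: float) -> float:
    return 1.0 - math.exp(-b * y)

def expo_pdf(y: float, b: float) -> float:
    return b * math.exp(-b * y)

def sine(cdf, pdf):
    """Apply the S-G transform to a baseline (cdf, pdf) pair with parameter b."""
    return (lambda y, b: math.sin(HALF_PI * cdf(y, b)),
            lambda y, b: HALF_PI * pdf(y, b) * math.cos(HALF_PI * cdf(y, b)))

MODELS = {
    "S-Lindley": sine(lindley_cdf, lindley_pdf),
    "S-Expo": sine(expo_cdf, expo_pdf),
    # Inverse Lindley: DF (1 + b/((1 + b)*y)) * exp(-b/y), PDF b^2/(1 + b) * (1 + y)/y^3 * exp(-b/y)
    "IL": (lambda y, b: (1.0 + b / ((1.0 + b) * y)) * math.exp(-b / y),
           lambda y, b: b * b / (1.0 + b) * (1.0 + y) / y ** 3 * math.exp(-b / y)),
    "Expo": (expo_cdf, expo_pdf),
}

def loglik(model: str, b: float, data) -> float:
    """Log-likelihood of a competing model at parameter b (data must be positive)."""
    pdf = MODELS[model][1]
    return sum(math.log(pdf(y, b)) for y in data)

print({name: round(loglik(name, 0.8, [0.5, 1.0, 2.0]), 4) for name in MODELS})
```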
• We begin by investigating the descriptive measures of the modeled data sets: the mean (µ), median (M), standard deviation (σ), skewness ($\gamma_1$), and kurtosis ($\gamma_2$).
• A statistical analysis is conducted on the data sets with the help of the statistical software [21]. It includes the maximum likelihood estimate ($\hat{\beta}$) of the parameter and its standard error (SE). It also includes the goodness-of-fit (GOF) measures: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the Anderson-Darling statistic ($A^*$), the Cramér-von Mises statistic ($w^*$), and the Kolmogorov-Smirnov statistic ($D_n$) with its corresponding p-value. The AIC and BIC are defined by

$$\text{AIC} = -2\,ll + 2k, \qquad \text{BIC} = -2\,ll + k \log n,$$

where ll denotes the log-likelihood function evaluated at the maximum likelihood estimate, n denotes the number of observations, and k represents the number of model parameters.
The model with the highest p-value and the lowest values of $D_n$, $w^*$, $A^*$, AIC, and BIC is considered the best fit for the data; it is highlighted in blue in the numerical tables that follow. The estimation is carried out in R.
• Finally, for a visual representation, the empirical probability density function (EPDF) plots and the empirical cumulative distribution function (ECDF) plots are displayed, accompanied by the box plot and the total time on test (TTT) plot. The box plot gives a visual summary of the descriptive measures of the data, and the TTT plot is useful for gaining information about the shape of the hazard function. In many real-world situations, qualitative information about the shape of the failure rate function can help in selecting a particular distribution. The TTT plot is convex for a decreasing HRF and concave for an increasing HRF.
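The estimation step can be sketched for the one-parameter S-ML model in three parts. First, maximize the log-likelihood in β. Second, approximate the SE from the observed information $-\ell''(\hat{\beta})$. Third, plug the maximized log-likelihood into AIC = -2ll + 2k and BIC = -2ll + k log n with k = 1.

This Python sketch (standard library only) uses a golden-section search on log β over an assumed bracket [1e-4, 1e3] and assumes the log-likelihood is unimodal there. It is not the R routine used in the paper, and the toy data are illustrative only.

```python
import math

def sml_logpdf(y: float, beta: float) -> float:
    """log f_S-ML(y; beta) for y > 0, computed in log space to avoid underflow in the tail."""
    e = math.exp(-beta * y)
    # log g_ML(y) = log(beta/(1+beta)) - beta*y + log((1+beta) + (2*beta*y - 1)*exp(-beta*y))
    log_g = math.log(beta / (1.0 + beta)) - beta * y + math.log((1.0 + beta) + (2.0 * beta * y - 1.0) * e)
    # S = 1 - G_ML(y); the cosine factor cos(pi/2 * G_ML) equals sin(pi/2 * S)
    log_s = -beta * y + math.log1p(beta * y * e / (1.0 + beta))
    s = math.exp(log_s)
    log_sin = math.log(math.sin(0.5 * math.pi * s)) if s > 1e-6 else math.log(0.5 * math.pi) + log_s
    return math.log(0.5 * math.pi) + log_g + log_sin

def loglik(beta: float, data) -> float:
    return sum(sml_logpdf(y, beta) for y in data)

def fit_sml(data, lo: float = 1e-4, hi: float = 1e3, tol: float = 1e-10) -> float:
    """Golden-section search for the maximizer of loglik over log(beta) in [log lo, log hi]."""
    a, b = math.log(lo), math.log(hi)
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = loglik(math.exp(c), data), loglik(math.exp(d), data)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = loglik(math.exp(c), data)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = loglik(math.exp(d), data)
    return math.exp(0.5 * (a + b))

def summarize(data) -> dict:
    beta = fit_sml(data)
    ll = loglik(beta, data)
    h = 1e-3 * beta
    d2 = (loglik(beta + h, data) - 2.0 * ll + loglik(beta - h, data)) / h ** 2
    n, k = len(data), 1
    return {
        "beta_hat": beta,
        "se": 1.0 / math.sqrt(-d2),  # observed-information standard error
        "loglik": ll,
        "AIC": -2.0 * ll + 2 * k,
        "BIC": -2.0 * ll + k * math.log(n),
    }

toy = [0.3, 0.8, 1.1, 1.7, 2.2, 2.9, 3.5, 4.8, 6.1, 7.4]  # toy values, not data from the paper
print(summarize(toy))
```

Searching on log β keeps the parameter positive without constraints. A general-purpose optimizer would serve equally well.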

Survival Times of Growth Hormone Medication
The first data set consists of the estimated time from growth hormone medication until the children reached the target age in the Programa Hormonal de Secretaria de Saude de Minas Gerais in 2009, as reported in [22].
A summary of the descriptive statistics is provided in Table 2, and the box and TTT plots are shown in Figure 2. Table 3 provides $\hat{\beta}$, its SE, and the GOF metrics for the survival times of growth hormone medication.
Statistical Analysis. Based on Table 2, the data are positively skewed and mesokurtic, which is consistent with the box plot in Figure 2. The TTT plot of the data set, also displayed in Figure 2, indicates an increasing HRF. The S-ML model is the best model according to all the model selection criteria, and it is compatible with this increasing HRF. As shown in Table 3, the S-ML model has the highest p-value and the lowest values of AIC, BIC, $A^*$, $w^*$, and $D_n$. The EPDF and ECDF plots are given in Figure 3.
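TTT plots such as the one in Figure 2 are usually based on the empirical scaled TTT transform. For ordered observations $y_{(1)} \le \dots \le y_{(n)}$, the transform plots $r/n$ against $T(r/n) = [\sum_{i=1}^{r} y_{(i)} + (n-r)\,y_{(r)}] / \sum_{i=1}^{n} y_{(i)}$, for $r = 1, \dots, n$. A concave curve above the diagonal suggests an increasing HRF. This minimal Python sketch computes the points and omits the plotting; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

def scaled_ttt(data: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical scaled TTT transform: points (r/n, T(r/n)) for r = 1..n."""
    y = sorted(data)
    n = len(y)
    total = sum(y)
    points = []
    partial = 0.0
    for r, y_r in enumerate(y, start=1):
        partial += y_r  # sum of the r smallest observations
        points.append((r / n, (partial + (n - r) * y_r) / total))
    return points

# Points above the diagonal (T(r/n) >= r/n) with a concave shape suggest an increasing HRF;
# points below the diagonal with a convex shape suggest a decreasing HRF.
pts = scaled_ttt([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(all(t >= u for u, t in pts))  # prints True for these evenly spaced values
```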
The plots in Figure 3 show that the S-ML and S-Lindley models fit the data set better than the S-Expo, IL, and Expo models.

Survival Times of Guinea Pigs Data
This data set was originally studied by [23] and has also been analyzed by [24]. It consists of the survival times of n = 72 guinea pigs injected with different doses of tubercle bacilli. The main interest is in predicting the survival times of the guinea pigs, because they are highly susceptible to human tuberculosis.
A summary of the descriptive statistics is provided in Table 4, and the box and TTT plots are displayed in Figure 4. Table 4 shows that the data are right-skewed and leptokurtic, as the box plot in Figure 4 also illustrates. The TTT plot of this data set, also shown in Figure 4, indicates an increasing HRF. The S-ML distribution is the best among the competing models when both the GOF criteria and the increasing HRF are considered. As Table 5 shows, it has the smallest values of the test statistics and GOF metrics, together with the highest p-value. The EPDF and ECDF plots are displayed in Figure 5.
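The descriptive measures reported in Tables 2, 4, and 6 can be computed as in this sketch. It assumes the $n-1$ standard deviation and two moment-based shape measures. The first is the skewness $\gamma_1 = m_3/m_2^{3/2}$. The second is the non-excess kurtosis $\gamma_2 = m_4/m_2^2$, for which a normal sample gives about 3. The software used in the paper may apply different conventions, such as bias corrections or excess kurtosis, so the values may differ slightly. The function name `describe` is hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence

def describe(data: Sequence[float]) -> Dict[str, float]:
    n = len(data)
    mu = statistics.mean(data)
    m2 = sum((y - mu) ** 2 for y in data) / n  # central moments with divisor n
    m3 = sum((y - mu) ** 3 for y in data) / n
    m4 = sum((y - mu) ** 4 for y in data) / n
    return {
        "mean": mu,
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # divisor n - 1
        "skewness": m3 / m2 ** 1.5,  # > 0: right-skewed
        "kurtosis": m4 / m2 ** 2,  # > 3: leptokurtic, about 3: mesokurtic
    }

print(describe([1, 1, 1, 2, 10]))  # toy values with a long right tail
```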
Figure 5 confirms this suitability, as the S-ML and S-Lindley densities closely follow the shape of the data. We conclude from Table 5 and Figure 5 that the S-ML model describes the survival times of guinea pigs well.

Size of Tumors in Lung Cancer Patients
A swelling or tumor arises when cells in the lungs grow at an abnormally fast rate, which can lead to lung cancer. A variety of indicators can be used to identify the tumor and determine whether it has spread to other organs. One of these indicators is the tumor stage, which helps doctors determine the best treatment for their patients, and tumor size is used to determine the stage. The data give the tumor sizes of 76 lung cancer patients at Tanta University's chest hospital. Sixty of these patients are in stage I, seven are in stage II, and the remaining nine are in stage III.
A summary of the descriptive statistics is provided in Table 6, and the box and TTT plots are shown in Figure 6. Table 6 shows that the data are right-skewed and leptokurtic, which the box plot in Figure 6 confirms. The TTT plot of this data set, also shown in Figure 6, indicates an increasing HRF. From Table 7, the S-ML model has the minimum value of $D_n$, the highest p-value, and the smallest AIC and BIC values. The EPDF and ECDF plots are illustrated in Figure 7.
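Given a fitted DF $F(\cdot;\hat{\beta})$, the statistic $D_n$ and the unadjusted Cramér-von Mises and Anderson-Darling statistics are simple functions of the ordered values $u_{(i)} = F(y_{(i)};\hat{\beta})$. This sketch uses the asymptotic Kolmogorov distribution for the p-value. Software that reports exact small-sample p-values, or the starred statistics $A^*$ and $w^*$ (which may include small-sample adjustments), can give somewhat different numbers. The p-value is also only approximate when β is estimated from the same data. All names are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence

def gof_stats(data: Sequence[float], cdf: Callable[[float], float]) -> Dict[str, float]:
    """KS D_n, Cramer-von Mises W^2 and Anderson-Darling A^2; cdf values must lie in (0, 1)."""
    u = sorted(cdf(y) for y in data)
    n = len(u)
    d_n = max(max(i / n - ui, ui - (i - 1) / n) for i, ui in enumerate(u, start=1))
    w2 = 1.0 / (12 * n) + sum((ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1))
    a2 = -n - sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
                  for i in range(1, n + 1)) / n
    return {"D_n": d_n, "p_value": ks_asymptotic_pvalue(math.sqrt(n) * d_n), "W2": w2, "A2": a2}

def ks_asymptotic_pvalue(x: float) -> float:
    """P(K > x) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2) for the Kolmogorov distribution."""
    if x < 0.2:
        return 1.0  # the series converges slowly here, and P(K > x) equals 1 to many digits
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, 101))
    return min(1.0, max(0.0, p))

# Example with a uniform DF on (0, 1) and evenly spaced points (a near-perfect fit).
n = 20
grid = [(i - 0.5) / n for i in range(1, n + 1)]
print(gof_stats(grid, lambda y: y))
```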
The plots in Figure 7 show that S-ML distribution captures the shape of the histogram of the data set. We can conclude from Table 7 and Figure 7 that the S-ML distribution can be used to model this data set related to the size of tumors in lung cancer patients.

Conclusions
In this paper, we have extended the applications of the sine-modified Lindley (S-ML) distribution developed by [4] to biomedical data. The distribution combines the benefits of the modified Lindley distribution and the S-G family. It was used to model three data sets: the times of children under growth hormone medication, the survival times of guinea pigs infected with tubercle bacilli, and the tumor sizes of lung cancer patients.

The best-fitting model was selected using the AIC, the BIC, and the test statistics $A^*$, $w^*$, and $D_n$, the last with its p-value. EPDF and ECDF plots support these metrics by showing how well the S-ML model fits the data, while the box and TTT plots describe the data themselves. The results indicate that the S-ML distribution fits these data sets better than the competing distributions. We believe it can also be used to model a range of other biological data.

We have also included the data sets and the R codes for all of the figures, estimations, and tests in the paper. We refer readers to Appendix A for these R codes.
Funding: This research received no external funding.

Acknowledgments:
We would like to thank the two referees for the constructive comments on the paper.

Conflicts of Interest:
The authors declare no conflict of interest.