1. Introduction
Entropy, as introduced by [1], is frequently used to measure the uncertainty in the probability distribution of a random variable (rv). The concept of entropy plays a fundamental role in statistical theory and its applications (see [2,3]). A well-known application of entropy in statistics is the test for normality, based on the property that the normal distribution has the highest entropy among all continuous distributions with a given variance (see [4]). A recent review of Shannon entropy and related measures, such as Rényi and Tsallis entropy, can be found in [5]. The differential entropy of a non-negative continuous rv $X$ with probability density function (pdf) $f$ can be defined as
$$H(X) = -\int_0^{\infty} f(x) \log f(x)\, dx.$$
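For instance, if $X$ follows an exponential distribution with rate $\lambda > 0$, so that $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, a direct calculation gives
$$H(X) = -\int_0^{\infty} \lambda e^{-\lambda x}\left(\log\lambda - \lambda x\right) dx = 1 - \log\lambda,$$
which is used below only as a running illustration.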
When a unit has survived up to a specific age $t$, understanding the distribution of its remaining lifetime is especially important in reliability and survival analysis. To deal with this, Ref. [6] introduced the notion of residual entropy. Later, Ref. [7] suggested a way to describe the lifetime distribution by using conditional Shannon entropy. Based on these, Refs. [6,7] studied certain ordering and aging characteristics of lifetime distributions. Ref. [8] expanded some results presented by [9]. Ref. [10] characterized a distribution using the functional relationship between residual entropy and the hazard rate function. Refs. [11,12,13] studied different generalizations of Ebrahimi's measure. For a non-negative rv $X$, which represents the lifetime of a component, the residual entropy function can be defined as
$$H(X;t) = -\int_t^{\infty} \frac{f(x)}{\bar F(t)} \log\frac{f(x)}{\bar F(t)}\, dx, \quad t \ge 0,$$
where $\bar F(t) = P(X > t)$ is the survival function (sf).
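Continuing the exponential illustration, the memoryless property implies that the residual lifetime of an exponential rv with rate $\lambda$ has the same distribution as the rv itself, so that $H(X;t) = 1 - \log\lambda$ for all $t \ge 0$.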
The measure called extropy is considered as the complementary dual of entropy. For a non-negative and absolutely continuous rv $X$, which represents the lifetime of a component with pdf $f$, the differential extropy of $X$, as introduced by [14], is defined as
$$J(X) = -\frac{1}{2}\int_0^{\infty} f^{2}(x)\, dx.$$
This measure provides an alternative way to assess the concentration or spread of the pdf over the domain of $X$, offering a different perspective on the uncertainty associated with the rv's lifetime. By utilizing extropy, researchers can quantify the relative uncertainty of one variable compared to another, which is particularly beneficial in areas such as reliability engineering, information theory, and decision-making processes.
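For instance, for the exponential distribution with rate $\lambda > 0$, a direct calculation gives
$$J(X) = -\frac{1}{2}\int_0^{\infty} \lambda^{2} e^{-2\lambda x}\, dx = -\frac{\lambda}{4},$$
so that more concentrated lifetimes (larger $\lambda$) correspond to smaller extropy.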
Extropy serves as a complementary measure to entropy for quantifying uncertainty in rvs, and it is particularly useful for comparing the uncertainties of two rvs. The uncertainty in an rv $X$ can be quantified by considering the difference between the outcomes of two independent repetitions of the same experiment. Let $X_1$ and $X_2$ denote such outcomes. Then, the difference $X_1 - X_2$ reflects the uncertainty associated with $X$, and the pdf of $X_1 - X_2$ is given by
$$g(u) = \int_0^{\infty} f(x)\, f(x+u)\, dx.$$
It follows that the probability of the event $|X_1 - X_2| \le \epsilon$, for small $\epsilon > 0$, can be approximated as $2\epsilon\, g(0) = 2\epsilon \int_0^{\infty} f^{2}(x)\, dx = -4\epsilon J(X)$. Therefore, if the extropy of $X$ is smaller than that of another rv $Y$, i.e., $J(X) \le J(Y)$, then two independent outcomes of $X$ are more likely to lie close together, and hence $Y$ possesses greater uncertainty than $X$ (see [15]); further foundational studies on extropy can be found in [16]. As the concept of $J(X)$ is not suitable for an rv that has already persisted for a certain period, Ref. [17] proposed the concept of residual extropy, defined as
$$J(X;t) = -\frac{1}{2\,\bar F^{2}(t)} \int_t^{\infty} f^{2}(x)\, dx, \quad t \ge 0,$$
where $\bar F(t)$ is the sf.
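For the exponential distribution with rate $\lambda > 0$, we have $\bar F(t) = e^{-\lambda t}$ and $\int_t^{\infty} f^{2}(x)\, dx = \tfrac{\lambda}{2} e^{-2\lambda t}$, so that $J(X;t) = -\lambda/4$ for all $t \ge 0$: the residual extropy of an exponential lifetime is constant and coincides with its extropy, again reflecting the memoryless property.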
Residual extropy was introduced as a tool to assess the uncertainty associated with the remaining lifetime of an rv. This measure has gained growing importance in areas such as survival analysis, reliability analysis, and actuarial science, as it provides crucial insights into the behavior and dynamics of systems and processes over time. In commercial and scientific fields, such as astronomical measurements of heat distribution in galaxies, extropic analysis offers valuable insights worthy of further exploration (see [18,19]). The concept of residual extropy has been studied in various contexts, including the work by [20], which focuses on $k$-record values. Ref. [21] proposed a kernel-based estimator for the residual extropy function under an $\alpha$-mixing dependence condition.
Data gathered for statistical analysis in various scientific disciplines often contain errors. In meteorology, weather data often contain inaccuracies caused by instrument limitations or unpredictable atmospheric changes. In the manufacturing industry, production data can contain errors due to machine problems or mistakes made during the inspection process. Similarly, in agriculture, measurements of crop yields can be unreliable because of differences in how samples are collected or because of changes in weather and other environmental conditions. Such discrepancies are known as measurement errors: they arise when the observed data differ from the actual, or true, values. This can occur for many reasons, such as faulty equipment, poor measuring tools, changes in the environment, or simple human mistakes during data collection. To deal with these issues, measurement error models are used. These models help us understand and correct the inaccuracies in the data, making it possible to obtain more accurate and meaningful results. They are especially important in fields such as manufacturing, agriculture, medicine, and the social sciences, where reliable data are crucial for making decisions and drawing conclusions.
The classical error model is used when we want to determine $X$ but cannot do so directly because of various types of measurement error. For example, measuring systolic blood pressure (SBP) can be affected by daily and seasonal variation, and errors may arise from machine recording issues, the way the measurement is taken, and so on. In such cases, it is often reasonable to assume an additive error model. For additional details and examples of classical error models, refer to [22]. Additive measurement error models have been widely studied due to their significance in handling contaminated data. Consider the case in which the variable of interest $X$ cannot be directly observed. Instead, an independent sample $W_1, \ldots, W_n$ is drawn according to the model
$$W_i = X_i + \epsilon_i, \quad i = 1, \ldots, n,$$
where the measurement error $\epsilon$ is independent of $X$. The primary goal is to estimate $f$, the unknown pdf of $X$. In this model, the distribution of $\epsilon$ is assumed to be exactly known; however, this may not always be possible. In such cases, the distribution of $\epsilon$ can be estimated from replicated measurements, as discussed by [23]. In cases where replicates are unavailable, estimating the measurement error distribution becomes more challenging. One approach is to estimate the error distribution from an independent validation data set, where a subset of the data includes both the contaminated observations and their corresponding error-free measurements; from this, the distribution of the measurement error can be estimated directly (see [22]). This strategy is particularly useful in industrial applications. Additionally, simulation-based deconvolution techniques have been proposed to estimate the error distribution even when it is unknown, such as the SIMEX (simulation-extrapolation)-based approach introduced by [24].
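To make the deconvolution idea concrete, the following is a minimal sketch of a deconvolution kernel density estimator of $f$, not the exact estimator developed in Section 2. It assumes Laplace$(0, b)$ errors with known scale $b$ and a Gaussian kernel, in which case the deconvolution kernel has the closed form $K^{*}(u) = \phi(u)\{1 + (b/h)^2(1 - u^2)\}$, where $\phi$ is the standard normal pdf and $h$ is the bandwidth; the function name, the error scale $b = 0.4$, and the bandwidth $h = 0.5$ are purely illustrative.

import numpy as np
from scipy.stats import norm

def deconvolution_kde(x_grid, w, h, b):
    # Deconvolution kernel density estimate of f from W = X + eps,
    # with eps ~ Laplace(0, b), a Gaussian kernel, and bandwidth h.
    u = (np.asarray(x_grid)[:, None] - np.asarray(w)[None, :]) / h
    k_star = norm.pdf(u) * (1.0 + (b / h) ** 2 * (1.0 - u ** 2))
    return k_star.mean(axis=1) / h   # (1 / (n h)) * sum_j K*((x - W_j) / h)

# Illustrative use: Weibull(2, 3) lifetimes contaminated by Laplace noise.
rng = np.random.default_rng(1)
x = 3.0 * rng.weibull(2.0, size=500)
b = 0.4                                  # assumed known error scale
w = x + rng.laplace(0.0, b, size=500)
grid = np.linspace(0.0, 10.0, 200)
f_hat = deconvolution_kde(grid, w, h=0.5, b=b)

Estimates of this type may take negative values in the tails; in practice they are often truncated at zero and renormalized.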
In the context of extropy, Ref. [25] extended the theory of extropy estimation to handle data contaminated by additive measurement error. Building on this, Ref. [26] developed the theory of past extropy estimation under measurement error, offering an estimation framework supported by asymptotic theory and simulation evidence. However, residual extropy, despite its practical relevance, has not yet been studied in the presence of measurement error.
This lack of literature presents a critical gap: while uncertainty about the future is often the focus in predictive tasks, we currently lack statistical tools to estimate residual extropy when the observed data are contaminated. Given the pervasiveness of measurement error in lifetime and reliability data, addressing this issue is essential to ensure accurate and reliable inference.
Therefore, this study is motivated by the need to develop a non-parametric estimation method for residual extropy in the presence of additive measurement error. Our approach leverages kernel-based deconvolution techniques to correct for the error and accurately estimate the residual extropy function. By doing so, we extend the existing body of work on extropy and contribute a novel and practical solution to uncertainty quantification under imperfect data conditions.
The structure of the paper is as follows: Section 2 introduces the estimator for residual extropy in the presence of measurement error in the data. In Section 3, we examine the asymptotic properties of the proposed estimator. Section 4 and Section 5 provide a comparison between the proposed estimator and the empirical estimator based on contaminated data, through simulation and data analysis, respectively. The paper concludes with Section 6.
4. Simulation
A simulation study was conducted to assess the performance of the proposed estimator, comparing it with that of the empirical estimator of residual extropy, which is computed from the contaminated sample $W_1, \ldots, W_n$ without correcting for the measurement error. The empirical estimator of the sf, denoted by $\bar F_n$ and based on the contaminated sample $W_1, \ldots, W_n$, is given by
$$\bar F_n(t) = \frac{1}{n}\sum_{i=1}^{n} I(W_i > t),$$
where $I(\cdot)$ denotes the indicator function.
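As a minimal illustration (the variable names are arbitrary), the empirical sf based on the contaminated observations can be computed as follows.

import numpy as np

def empirical_sf(w, t):
    # Empirical survival function based on the contaminated sample W_1, ..., W_n.
    return np.mean(np.asarray(w) > t)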
The simulation framework included the following scenarios and conditions, with the unobserved distribution assumed to be as follows:
Gamma distribution with shape parameter 2 and scale parameter .
Weibull distribution with shape parameter 2 and scale parameter 3.
Log-normal distribution with shape parameter 0 and scale parameter 1.
The measurement errors are assumed to follow normal and Laplace distributions, each with mean 0. The error scale parameter was chosen to achieve a specific level of contamination, as determined by the signal-to-noise ratio; we examined contamination levels of 15% and 30%. Convergence rates show how quickly an estimator approaches the true parameter or function as the sample size increases; they are important for understanding the efficiency and accuracy of the estimator under different error distributions and smoothness conditions. For the normal and Laplace error distributions, we applied the rule-of-thumb (ROT) bandwidth selection method, which sets the bandwidth proportional to the noise standard deviation and dependent on the sample size; the corresponding optimum bandwidth values for normal and Laplace errors, respectively, are taken as in [34].
In kernel density estimation with error-free data, it is generally accepted that the choice of the kernel function $K$ has little influence on the estimator's accuracy. On the other hand, when performing deconvolution kernel estimation with contaminated data, certain conditions concerning the structure of the deconvolution estimator must be fulfilled. For normal errors, we applied a second-order kernel whose characteristic function is symmetric and compactly supported (see [28,32]); in the case of Laplace errors, the standard normal kernel function was used. The estimators defined in Equations (9) and (13) were calculated for different sample sizes $n$, while keeping $t = 1$ fixed for each sample. Repeating this procedure for 1000 samples, the bias and mean squared error (MSE) were evaluated for various sample sizes $n$ = 50, 100, 200, 400, and 500.
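The replication scheme can be summarized by the following minimal sketch, in which estimate_residual_extropy is a placeholder for either estimator (Equation (9) or (13)), the Weibull(2, 3) scenario with normal errors is used for illustration, and the error standard deviation sigma_eps and the true value of $J(X; t=1)$ are assumed to be supplied.

import numpy as np

def mc_bias_mse(estimate_residual_extropy, true_value, n, sigma_eps,
                n_rep=1000, t=1.0, seed=0):
    # Monte Carlo bias and MSE of a residual-extropy estimator at fixed t.
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_rep)
    for r in range(n_rep):
        x = 3.0 * rng.weibull(2.0, size=n)            # Weibull(2, 3) lifetimes
        w = x + rng.normal(0.0, sigma_eps, size=n)    # additive normal measurement error
        estimates[r] = estimate_residual_extropy(w, t)
    bias = estimates.mean() - true_value
    mse = np.mean((estimates - true_value) ** 2)
    return bias, mse

Running this sketch for $n$ = 50, 100, 200, 400, and 500 yields bias and MSE values analogous to those reported in the tables below.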
Examining Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6, we notice a consistent pattern in which both bias and MSE decrease with increasing sample size. When the performance of the estimators is compared on the basis of the MSE, it can be concluded that the proposed estimator performs better than the empirical estimator based on the contaminated data.