A Confidence Set Analysis for Observed Samples: a Fuzzy Set Approach

Confidence sets are generally interpreted in terms of replications of an experiment. However, this interpretation is only valid before observing the sample. After the sample is observed, any confidence set contains the parameter value with probability zero or one. In this paper, we provide a confidence set analysis for an observed sample based on fuzzy set theory by using the concept of membership functions. We show that the traditional ad hoc thresholds (the confidence and significance levels) can be attained from a general membership function. The applicability of the newly proposed theory is demonstrated by using well-known examples from the statistical literature and an application in the context of contingency tables.


Introduction
Quantities of interest are typically surrounded by a number of uncertain events. According to statistical reasoning, probability measures are employed to model uncertain events and to make inferences about the quantities of interest. The relevant information is contained in the observed sample [1]. The statistical model is formally written as the triplet $(\mathcal{X}, \mathcal{F}, \mathcal{P})$, where $\mathcal{X} \subseteq \mathbb{R}^n$ is the sample space, $\mathcal{F}$ is the associated σ-field and $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ is a family of sampling probabilities indexed by a parameter θ, where $\Theta \subset \mathbb{R}^k$, with $k < \infty$, is a non-empty set called the parameter space. The quantities of interest are connected with the parameter θ, e.g., the expectation of some random quantity defined in the statistical model.
The inferential process about θ involves a summary of the information provided by the observed data using (minimal) sufficient statistics and their respective induced models, which concentrate the statistically relevant information. There are essentially two types of estimation theories, namely, point and set estimation theories; this paper focuses on the latter. For the univariate case, Neyman [2] provided a theory of confidence intervals, which is based on a random interval $\theta_1(X) \le \theta \le \theta_2(X)$ whose probability is greater than or equal to a given predefined value $\gamma = 1 - \alpha < 1$ (the confidence level), where X is the random sample. The most frequent interpretation states that if the experiment is repeated and a confidence interval is computed for each experiment, then the parameter θ is expected to lie in at least 100γ% of those observed confidence intervals. However, in practice, the experiment is performed once and just one confidence set is observed. This observed confidence set $\theta_1(x) \le \theta \le \theta_2(x)$ contains non-random values, since x is the observed value of X, so the probability that this observed confidence set contains any specific point or region is always zero or one [3]. Therefore, after observing the sample, the confidence sets cannot be interpreted in terms of frequencies (as Neyman proposed in 1935 [2]).
Fuzzy set theory, developed by Zadeh [4,5], allows generating possibility distributions by using confidence sets (see for example [6-8]). Probability measures are dominated by possibility measures in the following sense: events with zero possibility must have zero probability; however, not all events with positive possibility have positive probability [9]. That is, some events with positive possibility have zero probability. Therefore, possibility measures can provide information not featured by probability measures. We will show that, for a given observed sample, the related possibility distribution provides information about the structure of confidence sets.
The main contribution of our paper is to show how to generate a fuzzy number from a given confidence set and thereby make inferences about some parameter θ in the light of fuzzy set theory. Although this approach has already been discussed in the literature (see [7,8,10]), our approach is more general (e.g., the confidence region is formally defined, the parameter space is multidimensional, etc.) and oriented to the statistical community. In this context, the proposal presented in this paper is based on a general membership function proposed in Patriota [11]. As a consequence of this characterization, properties and comparisons of confidence sets are discussed under the scope of fuzzy set theory but from a statistical point of view.
The paper is organized as follows. In Section 2, a review of fuzzy set theory for statisticians is provided. Section 3 focuses on the connection between confidence sets and fuzzy sets through a membership function. Section 4 presents some examples of the results obtained and Section 5 provides an application to a real dataset. Finally, in Section 6, a discussion about different proposals existing in the literature for relating confidence theory and fuzzy theory is presented. Section 7 ends the paper with some remarks and conclusions.

A Brief Review of Fuzzy Theory
Fuzzy set theory provides a mathematical treatment of vague linguistic terms such as "about", "around", "close" and "short", among others. From the fuzzy theory viewpoint, numbers are idealizations of imprecise information expressed by means of numerical values. For example, when the height of an individual is measured, a numerical value is registered including some inaccuracies. Such inaccuracies may have been caused by the measurement instruments, human limitations, rounding, or biased prior information, among many other causes. If the "real" value of the height is represented by the number h, it would be more correct to say that the value of the height is approximately, and not exactly, h [12]; the word "approximately" is imprecise and can be modeled by fuzzy theory. As noted by Coppi et al. [13], fuzzy theory can add value to statistical methods because the uncertainty inherent to the observable world and its associated information sources are combined beyond the traditional probability theory. For example, Tanaka et al. [14] introduced the concept of fuzzy regression, while Wünsche et al. [15] characterized the least squares method for fuzzy random variables and Arabpour and Tata [16] developed some theoretical elements regarding parameter estimation in fuzzy regression models. The connection between the estimation of parameters and fuzzy theory has been studied by several authors. Geyer and Meeden [17] established a relation between the concept of p-value and fuzzy structures and Parchami et al. [18] introduced the concept of fuzzy confidence intervals. On the other hand, Casals et al. [19] studied fuzzy decision problems by relating the concepts of hypothesis testing and the nature of fuzzy information. Saade and Schwarzlander [20] and Saade [21] proposed a characterization of fuzzy hypothesis testing, while Watanabe and Imaizumi [22] related the concepts of hypothesis test statistics and fuzzy hypotheses. Arnold [23,24] related the concept of fuzzy hypothesis testing with conventional methods of real data analysis. Taheri and Behboodian [25] generalized the Neyman-Pearson approach for hypothesis testing under the fuzzy point of view and Filzmoser and Viertl [26] introduced the concept of fuzzy p-value for statistical hypotheses using fuzzy data. Recently, Patriota [11] provided an evidence measure for testing null hypotheses that is intrinsically related to fuzzy theory.

Overview of Fuzzy Theory
In order to make our paper self-contained, we provide an overview of fuzzy theory in this section. We only use some concepts and terminology of fuzzy set theory, mainly based on the works of [5,27,28].
Definition 1. Let Ω be a non-empty set. A fuzzy set A of Ω is a set of ordered pairs $A = \{(\omega, \mu_A(\omega)) : \omega \in \Omega\}$, where $\mu_A : \Omega \to [0, 1]$ is called the membership function for the fuzzy set A. In addition, the empty fuzzy set ∅ is characterized by $\mu_\emptyset(\omega) = 0$ for all ω ∈ Ω.
Fuzzy set theory extends traditional set theory by relaxing the concept of membership of elements in their respective sets. On the one hand, in ordinary set theory either ω ∈ A (membership one) or ω ∉ A (membership zero), that is, membership is binary. On the other hand, fuzzy set theory considers a degree of membership that ranges over the interval [0, 1], that is, ω is a member of A with a certain degree and this same element ω is also a member of $A^c$ with another degree. Probability theory is built on the usual set theory and provides a number in [0, 1] to describe the degree of uncertainty that ω ∈ A.
The main difference between probability theory and fuzzy theory lies in the definition of a set: the former considers traditional sets and the latter considers fuzzy sets. As will be seen in this section, the properties of fuzzy sets are very different from those of traditional sets. Fuzzy sets are widely applicable, for instance in language modeling [29] and image analysis [30], among many other fields. One simple example is that an object with gray color has a degree of blackness and a degree of whiteness, so it is much more informative to model this phenomenon inside the fuzzy set framework, setting membership degrees, than by setting a binary membership (traditional set theory).
From Definition 1, we can represent an ordinary set by using fuzzy notation. For instance, if $\Omega = \mathbb{R}^k$, then any usual subset $B \subseteq \mathbb{R}^k$ is represented by setting $\mu_B(\omega) = 1$ for all ω ∈ B and $\mu_B(\omega) = 0$ for all ω ∉ B. As a special case, let $\Omega = \mathbb{R}$ and B = (a, b) be an interval on the real line with a < b. Then, B can be written in terms of a fuzzy set as $B = \{(\omega, 1) : \omega \in (a, b)\} \cup \{(\omega, 0) : \omega \notin (a, b)\}$.
It is important to stress that membership and probability density functions are intrinsically different. For example, if π(ω) is a density function, i.e., π(ω) ≥ 0 for all ω ∈ Ω and $\int_\Omega \pi(\omega)\,d\omega = 1$, then we can obtain a membership function by defining $\mu_A(\omega) = C^{-1}\pi(\omega)$, provided that $C = \sup_{\omega \in \Omega} \pi(\omega) < \infty$. However, the converse is not necessarily true, since a membership function does not need to be integrable over Ω.
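The normalization by the supremum described above can be sketched in a few lines. This is only an illustration: the standard normal density is an arbitrary choice of π, and the helper names `normal_density` and `membership_from_density` are hypothetical.

```python
import math

def normal_density(w, mean=0.0, sd=1.0):
    """Normal density, used here only as an illustrative choice of pi."""
    z = (w - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def membership_from_density(density, sup_density):
    """Return mu(w) = density(w) / C, where C = sup density must be finite."""
    if not (0.0 < sup_density < math.inf):
        raise ValueError("the supremum of the density must be finite and positive")
    return lambda w: density(w) / sup_density

# For the normal density the supremum is attained at the mean.
C = normal_density(0.0)
mu = membership_from_density(normal_density, C)
```

Here `mu` reaches one at the mode and decays towards zero in the tails, but it no longer integrates to one, which is consistent with the remark that memberships and densities are different objects.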
For probability density functions, it is common to define a support to characterize the set of all points with positive density. For membership functions, we have the same definition to represent the set of all points with positive membership in the fuzzy set. Definition 2 formalizes this concept.
Definition 2. The support of a fuzzy set A is defined as $\mathrm{supp}(A) = \{\omega \in \Omega : \mu_A(\omega) > 0\}$.
Notice that an element ω has full membership in its respective fuzzy set when its membership is one. In this context, the element ω fully contains all features required by the fuzzy set. Definition 3 formalizes the set of all points with full membership, that is, all points whose membership is equal to one.
Definition 3. The core of a fuzzy set A is defined as $\mathrm{core}(A) = \{\omega \in \Omega : \mu_A(\omega) = 1\}$.
When the core has at least one element, we have a normal fuzzy set (see Definition 4).
Definition 4. A fuzzy set A is called normal if its core is nonempty. In other words, there is at least one point $\omega \in \mathbb{R}^k$ with $\mu_A(\omega) = 1$.
Let A be a normal fuzzy set. Then, the closer $\mu_A(\omega_0)$ is to one, the more we believe that $\omega_0$ lies in core(A), and the closer $\mu_A(\omega_0)$ is to zero, the more we believe that $\omega_0$ is not in core(A). That is, the degree of membership for an element can also be seen as a measure of uncertainty [29].
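Let A and B be two fuzzy sets with membership functions $\mu_A(\omega)$ and $\mu_B(\omega)$, respectively. According to Zadeh [5] (see also [31]), if $\Omega \subseteq \mathbb{R}^k$ the common operations are defined as follows: (i) union, $\mu_{A \cup B}(\omega) = \max\{\mu_A(\omega), \mu_B(\omega)\}$; (ii) intersection, $\mu_{A \cap B}(\omega) = \min\{\mu_A(\omega), \mu_B(\omega)\}$; (iii) complement, $\mu_{A^c}(\omega) = 1 - \mu_A(\omega)$; (iv) inclusion, $A \subseteq B$ if and only if $\mu_A(\omega) \le \mu_B(\omega)$ for all ω ∈ Ω.
From the above definitions, if we consider $\tilde{\Omega} = \{(\omega, 1) : \omega \in \Omega \subseteq \mathbb{R}^k\}$ as the universal fuzzy set, then for any fuzzy set A we have $A \subseteq \tilde{\Omega}$, provided that the membership function of A has domain Ω. In addition, if there exists ω ∈ Ω such that $0 < \mu_A(\omega) < 1$, then $A \cup A^c \neq \tilde{\Omega}$ and $A \cap A^c \neq \emptyset$. As the reader can see, these properties block the excluded middle and contradiction laws of classic set theory (for further details see [31-33]).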
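A short sketch may help to see why the excluded middle and contradiction laws fail. It assumes the standard max/min/one-minus operations attributed to Zadeh; the triangular membership and all function names are hypothetical choices made for illustration.

```python
def fuzzy_union(mu_a, mu_b):
    """Membership of the union: pointwise maximum."""
    return lambda w: max(mu_a(w), mu_b(w))

def fuzzy_intersection(mu_a, mu_b):
    """Membership of the intersection: pointwise minimum."""
    return lambda w: min(mu_a(w), mu_b(w))

def fuzzy_complement(mu_a):
    """Membership of the complement: one minus the membership."""
    return lambda w: 1.0 - mu_a(w)

def triangular(w):
    """Hypothetical membership on the real line: peak 1 at 0, support (-1, 1)."""
    return max(0.0, 1.0 - abs(w))

mu_a = triangular
mu_not_a = fuzzy_complement(mu_a)
a_or_not_a = fuzzy_union(mu_a, mu_not_a)          # would be identically 1 for ordinary sets
a_and_not_a = fuzzy_intersection(mu_a, mu_not_a)  # would be identically 0 for ordinary sets
```

At a point with partial membership, such as ω = 0.25, the union with the complement stays below one and the intersection with the complement stays above zero; at points with membership exactly zero or one, the classical laws are recovered.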
The concept of a fuzzy set is very broad and difficult to handle without some additional specifications. In this context, the next definitions allow us to specify fuzzy numbers, which are natural extensions of traditional numbers. However, the definition of a fuzzy number depends on fuzzy convexity, which is defined next (see [5] for more details).

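Definition 5. A fuzzy set A is convex if and only if $\mu_A(\lambda\omega_1 + (1 - \lambda)\omega_2) \ge \min\{\mu_A(\omega_1), \mu_A(\omega_2)\}$ for all $\omega_1, \omega_2 \in \Omega$ and all $\lambda \in [0, 1]$.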
Note that the concept of convexity under the fuzzy approach differs from the classic definition of convexity in functional analysis. More discussion about this concept will be presented in Section 3. A fuzzy interval A is a fuzzy set that satisfies the conditions of convexity and normality, so the core of a fuzzy interval is constituted by all elements with membership one. A fuzzy interval A is a fuzzy number when the cardinality of core(A) equals 1 [28]. Fuzzy numbers and fuzzy intervals are useful to represent imprecision for point and interval measures, respectively. As mentioned earlier, these concepts have multiple applications, e.g., in artificial intelligence, image processing, speech recognition, biological and medical science, operations research, decision analysis, information processing, economics, geography, psychology, linguistics, etc. More applications can be found in [34,35]. Figure 1a-c illustrates a general fuzzy set, a fuzzy interval and a fuzzy number, respectively.
Dubois et al. [27] defined the class of LR (left and right) membership functions defined over Ω = R, i.e., the class of membership functions that can be entirely characterized by three parameters, namely, (m, α, β), and two functions L and R. The next definition is related to the concept of LR-type fuzzy numbers.

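Definition 6. The fuzzy number A is of LR-type if there exist functions L (left) and R (right) and scalars α > 0 and β > 0 such that $\mu_A(\omega) = L\left(\frac{m - \omega}{\alpha}\right)$ for ω ≤ m and $\mu_A(\omega) = R\left(\frac{\omega - m}{\beta}\right)$ for ω ≥ m,
where m is called the center of A and α and β are called the left and right propagations, respectively.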
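As an illustration of the three-parameter description (m, α, β), the following sketch builds an LR-type membership function. It assumes the usual form in which the membership equals L((m − ω)/α) to the left of m and R((ω − m)/β) to the right; the linear shape function and the numerical values are hypothetical.

```python
def lr_membership(m, alpha, beta, L, R):
    """LR-type membership with center m and left/right propagations alpha, beta."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")

    def mu(w):
        if w <= m:
            return L((m - w) / alpha)
        return R((w - m) / beta)

    return mu

def linear_shape(u):
    """Shape function with value 1 at u = 0, decreasing to 0 at u = 1."""
    return max(0.0, 1.0 - u)

# Asymmetric triangular fuzzy number centered at 2 (illustrative values only).
mu = lr_membership(m=2.0, alpha=1.0, beta=3.0, L=linear_shape, R=linear_shape)
```

With α = β and L = R the membership becomes symmetric around m; other shape functions (for example exponential decay) can be passed in the same way.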
If α = β, A is called a symmetric fuzzy number.For a symmetric membership function, the equality In this paper, we use all definitions presented in this section to connect the classic statistical quantities with fuzzy theory.

Confidence Sets and Membership Functions
As mentioned in the Introduction, the main goal of this paper is to make inferences about some parameter θ using confidence sets under fuzzy set theory. For that reason, we start this section by presenting the definition of a general confidence set.
Under a parametric statistical model $(\mathcal{X}, \mathcal{F}, \mathcal{P})$, a confidence set for θ with confidence level 1 − α is a function $C_\alpha : \mathcal{X} \to 2^\Theta$, where $2^\Theta$ is the family of all subsets of Θ (the power set), satisfying
$P_\theta(C_\alpha(X) \ni \theta) \ge 1 - \alpha$ (1)
for every θ ∈ Θ, where $X \in \mathcal{X}$ is a random vector defined in the statistical model (see [36], p. 315). When $P_\theta(C_\alpha(X) \ni \theta) = 1 - \alpha$ for all θ ∈ Θ, the confidence set $C_\alpha$ is exact. Procedures to build confidence sets can be found in [37-41], among others. These procedures are in general based on pivotal quantities and likelihood-ratio statistics. Here, 1 − α is called the confidence level and α is called the significance level [42]. Intuitively, the interval width depends on the confidence level; for instance, for intervals built under normal distributions, the greater the confidence level, the wider the interval (see [43]).
After observing the sample, the confidence set $C_\alpha(x)$ is a fixed set and $P_\theta(C_\alpha(x) \ni \theta)$ is zero or one, where x is the observed sample, so the probability statements in Equation (1) are used just to construct a proper confidence set. Once the sample is observed, this confidence set is fixed. Therefore, the observed confidence sets cannot be interpreted in terms of probabilities (see [44], Section 3.1.2, p. 41, for further details). In this section we show that, although it is not possible to make probabilistic statements about observed confidence sets, we can interpret the observed confidence sets in terms of fuzzy sets. We present a general membership function that provides all information contained in an observed confidence set $C_\alpha(x)$, for all levels α ∈ [0, 1] (alpha-cuts).
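Lemma 1. Let $\mathcal{C} = \{C_\alpha(x)\}_{\alpha \in I}$, with $I \subseteq [0, 1]$, be a family of observed confidence sets for θ and define
$\mu_\Theta(\theta; \mathcal{C}) = \max\{0, \sup\{\alpha \in I : \theta \in C_\alpha(x)\}\}$, (2)
where sup{∅} = −∞. We use the short notation $\mu_\Theta$ when the family $\mathcal{C}$ is not the focus. Then, $\mu_\Theta$ is a membership function defined on Θ. The proof is straightforward.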
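To make the construction concrete, the sketch below evaluates a membership of the form max{0, sup{α : θ ∈ C_α(x)}} by brute force over a grid of significance levels, and compares it with the closed form 2[1 − Φ(√n |θ − x̄| / σ)] that this supremum yields for the usual normal-mean interval with known variance. The form of the membership is an assumption made for this sketch, and the function names and numerical inputs are hypothetical.

```python
from math import inf
from statistics import NormalDist

STD_NORMAL = NormalDist()

def normal_mean_interval(alpha, xbar, sigma, n):
    """(1 - alpha) interval xbar -/+ z_{1 - alpha/2} * sigma / sqrt(n)."""
    if alpha <= 0.0:
        return (-inf, inf)  # confidence level one: the whole real line
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    half = z * sigma / n ** 0.5
    return (xbar - half, xbar + half)

def membership_by_sup(theta, interval, grid_size=2000):
    """Approximate max{0, sup{alpha : theta in C_alpha(x)}} on a grid of alphas."""
    best = -inf  # convention: the supremum of the empty set is -infinity
    for i in range(grid_size + 1):
        alpha = i / grid_size
        lower, upper = interval(alpha)
        if lower <= theta <= upper:
            best = max(best, alpha)
    return max(0.0, best)

def membership_closed_form(theta, xbar, sigma, n):
    """Closed form of the same supremum for the normal-mean interval."""
    z = abs(theta - xbar) * n ** 0.5 / sigma
    return 2.0 * (1.0 - STD_NORMAL.cdf(z))
```

For example, with x̄ = 1, σ = 2 and n = 16, `membership_by_sup(1.8, lambda a: normal_mean_interval(a, 1.0, 2.0, 16))` agrees with `membership_closed_form(1.8, 1.0, 2.0, 16)` up to the grid resolution. The brute-force version is slow but works for any family of sets for which membership of θ can be checked.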
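For each proposed confidence region, we can represent the parameter space by the fuzzy set $\tilde{\Theta} = \{(\theta, \mu_\Theta(\theta)) : \theta \in \Theta\}$, where $\mu_\Theta(\theta)$ is given in Equation (2). Note that for different confidence sets $C_{1,\alpha}$ and $C_{2,\alpha}$ we have different memberships, namely $\mu_{\Theta_1}$ and $\mu_{\Theta_2}$. As a consequence, the resulting fuzzy sets will have different representations, namely $\tilde{\Theta}_1 = \{(\theta, \mu_{\Theta_1}(\theta)) : \theta \in \Theta\}$ and $\tilde{\Theta}_2 = \{(\theta, \mu_{\Theta_2}(\theta)) : \theta \in \Theta\}$, respectively. Additional information with respect to Equation (2) can be found in Mauris et al. [45] and more recently in [11]. Patriota [11] studied some relationships with p-values when the confidence region $C_\alpha$ is built under the likelihood-ratio statistic.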
The next result characterizes the core of Θ.
Theorem 1 characterizes the functional form of the core for the membership function defined in Equation (2), that is, those values in Θ that produce full membership.

Remark 1.
If the confidence set $C_\alpha(x)$ is centered at the maximum likelihood (ML) estimate $\hat{\theta}$, then we have that $\mu_\Theta(\hat{\theta}) = 1$. This means that the ML estimate is part of the core of the fuzzy set $\tilde{\Theta}$. We can interpret core($\tilde{\Theta}$) as the set of all parameter values for which the related probability distribution explains the observed data according to $\mathcal{C}$.
Next we define non-increasing confidence sets in terms of the significance level α.
Definition 7. Let $\mathcal{C} = \{C_\alpha(x)\}_{\alpha \in I}$ be a family of confidence sets. We say that $\mathcal{C}$ is a non-increasing family of confidence sets if $C_{\alpha_2}(x) \subseteq C_{\alpha_1}(x)$ whenever $\alpha_1 \le \alpha_2$, with $\alpha_1, \alpha_2 \in I$.
The next result relates the monotonicity property of confidence sets with the membership functions.
be two membership functions with the same core.We say that µ Θ 1 has total supremacy over µ Definition 8 allows us to compare two different confidence sets, e.g., we can determine if a confidence set is more conservative than another for all confidence levels.Definition 8 is similar to the definition of superiority given by Xie and Singh [46].Note also that if U is the family of all membership functions then T establishes an order relation in U and this relation is the order relation of the inclusion for fuzzy sets, (see for instance [47]).Notice that Definition 8 is strong and is not applicable in many situations, notably if two membership functions have different core sets.In order to make the supremacy concept less restrictive, allowing us to include more situations, we define the following operators.Definition 9. Let µ Θ 1 and µ Θ 2 two continuous membership functions.We say that ), and is denoted by ), and are denoted by ), and it is denoted by µ , where We call E r the r-up integral operator and E r the r-down integral operator.
Notice that, if µ Θ is integrable, then by Definition 9, it is straightforward that Example 1 describes how to analytically compute the quantities E r and E r .However, for more complex models, analytical solutions are virtually impossible, so these integrals have to be computed numerically by using any software (for instance, MAPLE, MATLAB, Ox, R, SAS).
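Example 1. Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample (the random variables are independent and identically distributed) from a normal population with mean θ and known variance $\sigma^2$, and let x be the observed sample. Here, $\Theta = \mathbb{R}$ and a (1 − α) confidence set for θ, using the pivotal quantity method, is given by $C_\alpha(x) = \left[\bar{x} - z_{1-\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{1-\alpha/2}\,\sigma/\sqrt{n}\right]$, where $\bar{x}$ is the sample mean and $z_q$ is the qth quantile of a standard normal distribution. The membership function is given by $\mu_\Theta(\theta) = 2\left[1 - \Phi\left(\sqrt{n}\,|\bar{x} - \theta|/\sigma\right)\right]$, where Φ(·) stands for the standard normal distribution function. Then, solving the equations $\mu_\Theta(c_{1r}) = \mu_\Theta(c_{2r}) = r$ with respect to $c_{1r}$ and $c_{2r}$, where $c_{1r} \le \bar{x} \le c_{2r}$, we obtain $c_{1r} = \bar{x} - z_{1-r/2}\,\sigma/\sqrt{n}$ and $c_{2r} = \bar{x} + z_{1-r/2}\,\sigma/\sqrt{n}$. Now, by using the identity [48] $\int_{-\infty}^{a} \Phi(x)\,dx = a\Phi(a) + \phi(a)$, where φ(·) stands for the standard normal density function, the r-down integral operator can be written in closed form, and similarly for the r-up integral operator.
Definition 9 will be used to identify more conservative confidence sets. It is possible to show that the relations induced by the r-up integral operator, the r-down integral operator and total supremacy satisfy the requirements to be order relations (reflexivity, antisymmetry and transitivity), and that the associated equivalences satisfy the requirements to be equivalence relations (reflexivity, symmetry and transitivity).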
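The calculation in Example 1 can be checked numerically. The sketch below rests on an assumption that should be stated plainly: it reads the r-up operator as the integral of the membership over {θ : µ(θ) ≥ r} = [c1r, c2r] and the r-down operator as the integral over {θ : µ(θ) ≤ r}; the closed forms in the code are derived under that reading and are not quoted from the text. Function names and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()

def membership(theta, xbar, sigma, n):
    """Membership 2 * (1 - Phi(sqrt(n) * |theta - xbar| / sigma))."""
    return 2.0 * (1.0 - STD.cdf(abs(theta - xbar) * sqrt(n) / sigma))

def cut_points(r, xbar, sigma, n):
    """Solve mu(c1) = mu(c2) = r with c1 <= xbar <= c2 (requires 0 < r <= 1)."""
    half = STD.inv_cdf(1.0 - r / 2.0) * sigma / sqrt(n)
    return xbar - half, xbar + half

def r_down_closed_form(r, sigma, n):
    """Integral of mu over the two tails {mu <= r} (assumed definition)."""
    z = STD.inv_cdf(1.0 - r / 2.0)
    return 4.0 * (sigma / sqrt(n)) * (STD.pdf(z) - z * r / 2.0)

def r_up_closed_form(r, sigma, n):
    """Integral of mu over [c1r, c2r]: total area minus the tail area."""
    total = 4.0 * (sigma / sqrt(n)) * STD.pdf(0.0)
    return total - r_down_closed_form(r, sigma, n)

def r_up_numeric(r, xbar, sigma, n, steps=4000):
    """Midpoint rule for the integral of mu over [c1r, c2r]."""
    c1, c2 = cut_points(r, xbar, sigma, n)
    h = (c2 - c1) / steps
    return h * sum(membership(c1 + (i + 0.5) * h, xbar, sigma, n)
                   for i in range(steps))
```

Under these assumptions the two operators add up to the total area under the membership, 4σφ(0)/√n, for every r, and neither depends on x̄, so curves of the operators against r compare only the spread of the membership functions.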
Theorem 3 establishes some relations among the three types of supremacies.
be two continuous and integrable membership functions.Then, where for all r ∈ [0, 1], if and only if there exist θ * ∈ Θ and k ∈ R such that Θ 1,k, * ⊆ Θ 2 , where the latter is the inclusion of fuzzy sets and Θ 1,k, * depends on the membership function µ Θ 1,k, * .This implies that ) for all r ∈ [0, 1].The proof of the converse is similar.If k = 0, by the equality of the cores, we have A confidence interval is said to be more conservative than another if the former interval's amplitude is greater than the latter's for a specific significance level.A procedure to generate a confidence interval is considered more conservative than another if the interval's amplitude is greater than the latter's for all significance levels.Below, we define the conservative concept for general confidence sets.
4. Let Θ 1 and Θ 2 be two fuzzy set representations over the parametric space Θ associated with C 1,α (x) and C 2,α (x Definitions 8-10 are tools for comparing confidence sets through their respective membership functions.The membership functions used are defined in the same parameter space.However, there are situations in which we are interested in comparing confidence sets from a partial vector parameter with confidence sets from a full parameter vector.Therefore, a membership function for partial parameter vectors is defined next. Let θ = (λ , ψ ) , where λ and ψ are vectors with dimensions k 1 and k 2 with k = k 1 + k 2 .Without loss of generality, let C * α (x) be a confidence set for λ and let Λ be the set in which λ varies.Then, a membership function for λ can be defined simply by The same properties of the membership (2) and the above definitions are valid for this partial membership function.

Examples
In this section we present some examples of confidence sets in order to illustrate the relation between a confidence set and its associated membership function. Moreover, we compute the r-up and r-down integral operators.

Exponential Distribution
Let X = (X 1 , . . ., X n ) be a random sample from an exponential distribution with rate θ, let x be the observed sample and x the sample mean.Then, a 1 − α level confidence interval for θ (a 1 ) by using the pivotal quantity (see [49], p. 267) is (a 2 ) by using the asymptotic approximation [50] is where χ 2 ν;q is the q-th quantile of a chi square distribution with ν degrees of freedom.Thus, from Lemma 1 we have the following: where χ 2 (•; ν) is the cumulative distribution function of a chi square distribution with ν degrees of freedom.
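For case (a1), a membership function can be sketched with the standard library only. The sketch assumes the equal-tailed interval based on the pivot 2nθx̄, which follows a chi square distribution with 2n degrees of freedom; under that assumption the largest α whose interval contains θ is 2 min{F(2nθx̄), 1 − F(2nθx̄)}, with F the chi square distribution function. The function names are hypothetical, and the closed-form distribution function used here is valid only for even degrees of freedom.

```python
from math import exp

def chi2_cdf_even(x, df):
    """Chi square CDF for even df = 2m: 1 - exp(-x/2) * sum_{k<m} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k      # term is now (x/2)^k / k!
        total += term
    return 1.0 - exp(-half) * total

def exponential_rate_membership(theta, xbar, n):
    """Membership of the rate theta from the equal-tailed pivotal interval."""
    if theta <= 0.0:
        return 0.0
    f = chi2_cdf_even(2.0 * n * xbar * theta, 2 * n)
    return 2.0 * min(f, 1.0 - f)
```

With n = 5 and a sample mean of 1, the values used for Figure 2, the points with membership 0.05 are the limits of the 95% interval under this construction. Note that the membership peaks where the pivot equals the median of the chi square distribution, which need not coincide with the ML estimate 1/x̄.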
Figure 2 presents the membership functions (panel a) and the graphs of the two integral operators as functions of r ∈ [0, 1] for both confidence intervals (panels b and c), considering n = 5 and $\bar{x} = 1$. In all plots, the solid and dotted lines represent cases ($a_1$) and ($a_2$), respectively.
In particular, Figure 2a shows that for some θ ∈ Θ * ⊂ Θ we have µ Θ 1 (θ) ≤ µ Θ 2 (θ) and for some That is, the order relation is not the same for all θ ∈ Θ.However, Definition 8 states that the order relation must be fulfilled for all θ ∈ Θ.In Figure 2b, it can be observed that the inequality E r (µ Θ 1 ) ≤ E r (µ Θ 2 ) holds for r ∈ [0.81;

Poisson Distribution
Now, let X = (X 1 , . . ., X n ) a random sample from a Poisson distribution with rate θ, and let x be the observed sample and x the sample mean.A 1 − α level confidence interval for θ (a 1 ) by using the pivotal quantity [51] is (a 2 ) by using the asymptotic approximation [52] is Thus, from Lemma 1 we have the following.
(a 1 ) The membership function for Figure 3a presents the membership functions.Note that in this case, the total supremacy does not occur (see Definition 8).Moreover, Figure 3b,c present the functions (r, E r ) and (r, E r ) for r ∈ [0, 1] for both confidence intervals respectively.From these figures, it can be concluded that µ Θ 1 has r-up-supremacy over µ Θ 2 for all r ∈ [0, 1].We considered n = 5 and x = 4, and as in the previous example, the solid and dotted lines represent cases (a 1 ) and (a 2 ), respectively.

Normal Distribution
Now we discuss the situation where the parameter of interest is a vector.For this, consider, X 1 , . . ., X n to be a random sample from a normal distribution with mean θ 1 and variance θ 2 .One approximate confidence set of significance level α for θ = (θ 1 , θ 2 ) can be defined by (see [53]): where s 2 is the sample variance with denominator n.From Lemma 1, the associated membership function is given by Here, Then, the r-down and r-up integral operators are given by where I A (θ) is the indicator function.These integrals can be computed numerically by using any software (for instance, MAPLE, MATLAB, Ox, R, SAS). Figure 4 presents the above membership function with n = 5, x = 5, s 2 = 5 and the r-down and the r-up integral operators for the respective membership function.
In Figure 4a two planes are plotted, at heights 0.3 and 0.8, to show the changes in the confidence regions. In Figure 4b the graphs of the r-down and r-up integral operators against r are plotted. It can be observed that the two operators coincide when r = 0.236, as indicated by the dash-dotted vertical line. Basically, when the practitioner wants to compare general confidence sets, it is sufficient to plot the graphs of both integral operators of $\mu_\Theta$ against r in order to analyze their behavior. Moreover, for univariate or bivariate confidence sets, the graph of $(\theta, \mu_\Theta(\theta))$ can be used in place of the usual confidence sets with a pre-fixed confidence level, since it brings much more information.

Confidence Set for Proportions in Bernoulli Trials
Following [54], we consider an experiment where N pairs of Bernoulli events, denoted as A and B, are observed. In this case, the outcomes are recorded as 1 (success) and 2 (failure), and the ith observed pair is denoted by $(Y_{i1}, Y_{i2})$. The results of the experiment can be summarized in a 2 × 2 contingency table. Each $n_{kl}$ corresponds to the number of pairs $(Y_{i1}, Y_{i2})$ with outcomes $Y_{i1} = k$ and $Y_{i2} = l$. In particular, we consider a data set from [55] related to the study of the effect of airway hyper-reactivity (AHR) before and after stem cell transplantation (SCT) in 21 children. The data are provided in Table 1. In this case, we are interested in a (1 − α) confidence interval for the difference of proportions $\theta = p_{12} - p_{21}$, where $p_{12}$ is the proportion related to having AHR before SCT and not having it after SCT, and $p_{21}$ is the proportion related to the opposite event, namely, not having AHR before SCT and having it after SCT. For constructing the confidence interval, we consider the following methods: Wald, Wald with continuity correction, Wald with Agresti-Min pseudo-frequency adjustment and Wald with Bonett-Price Laplace adjustment (for further details about these methods, see [54,56,57]).
For each method, α ∈ I = [0, 1] and $z_q$ is the qth quantile of a standard normal distribution. The membership functions are indexed by method as follows: (1) Wald's method; (2) Wald's method with continuity correction; (3) Wald's method with Agresti-Min pseudo-frequency adjustment; (4) Wald's method with Bonett-Price Laplace adjustment. Figure 5 shows the membership functions for each method. In this case, we have that $\mu_{\Theta_1} \preceq_T \mu_{\Theta_2}$. Therefore, by Theorem 3, we conclude that $\mathcal{C}_2 = \{C_{2,\alpha}(x)\}_{0 \le \alpha \le 1}$ is more conservative than $\mathcal{C}_1 = \{C_{1,\alpha}(x)\}_{0 \le \alpha \le 1}$. Consequently, Wald's method with continuity correction generates more conservative confidence intervals than Wald's method for all confidence levels. Similarly, $\mu_{\Theta_4} \preceq_T \mu_{\Theta_3}$, that is, Wald's method with Agresti-Min pseudo-frequency adjustment generates more conservative confidence intervals than Wald's method with Bonett-Price Laplace adjustment for all confidence levels. Moreover, core($\tilde{\Theta}_1$) = core($\tilde{\Theta}_2$) and core($\tilde{\Theta}_3$) = core($\tilde{\Theta}_4$); however, core($\tilde{\Theta}_1$) ≠ core($\tilde{\Theta}_3$). This means that Wald's method with and without a continuity correction gives the highest credibility to θ = −0.2857 for the difference of proportions, whereas Wald's method with Agresti-Min pseudo-frequency adjustment and with Bonett-Price Laplace adjustment gives it to a larger value, θ = −0.2608. The intersections of the continuous horizontal line with the membership functions shown in Figure 5 give the lower and upper limits of the 95% confidence interval for each method described previously. Note that the largest differences occur in the lower limits.
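The comparison described above can be sketched for two Wald-type intervals that share the same center. The sketch assumes that each membership has the normal shape 2[1 − Φ(|θ − θ*| / σ*)] and that the total-supremacy relation amounts to a pointwise inequality between memberships with a common core, which is consistent with the conclusion drawn above. The center −0.2857 comes from the text; both scale values are hypothetical and are not computed from Table 1.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def wald_type_membership(center, scale):
    """mu(theta) = 2 * (1 - Phi(|theta - center| / scale))."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return lambda theta: 2.0 * (1.0 - STD_NORMAL.cdf(abs(theta - center) / scale))

def alpha_cut(center, scale, alpha):
    """Set {theta : mu(theta) >= alpha}, i.e. the (1 - alpha) interval."""
    half = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) * scale
    return center - half, center + half

def pointwise_dominated(mu_1, mu_2, grid):
    """True if mu_1(theta) <= mu_2(theta) at every grid point."""
    return all(mu_1(t) <= mu_2(t) for t in grid)

CENTER = -0.2857                                       # value reported in the text
mu_narrow = wald_type_membership(CENTER, scale=0.10)   # hypothetical scale
mu_wide = wald_type_membership(CENTER, scale=0.12)     # hypothetical scale
GRID = [-1.0 + 2.0 * i / 400 for i in range(401)]      # theta in [-1, 1]
```

Here `pointwise_dominated(mu_narrow, mu_wide, GRID)` holds while the reverse does not, so every α-cut of the wider membership contains the corresponding cut of the narrower one. The 95% limits are read off with `alpha_cut(CENTER, scale, 0.05)`, the counterpart of the horizontal line at 0.05 in Figure 5.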
Note also that, in this situation, the concept of total supremacy cannot be used to compare the membership functions of $\tilde{\Theta}_1$ and $\tilde{\Theta}_3$, since they have different cores. For this reason, we use Definition 9. The expressions for the r-up and r-down integral operators are similar to those given in Example 1, since the membership functions have the same shape, with a center θ* and a scale σ* that depend on the method. Figure 6 shows the r-up and r-down integral operators for Wald's method with and without Bonett-Price Laplace adjustment. Note that $\mu_{\Theta_4}$ has r-down and r-up supremacy over $\mu_{\Theta_1}$. Then, the interval $C_{1,\alpha}(x)$ is up/down-more conservative than $C_{4,\alpha}(x)$ for all confidence levels. Finally, Figure 7 depicts these operators for Wald's method with and without Agresti-Min pseudo-frequency adjustment. In this case, we have that $\mu_{\Theta_3}$ has r-up and r-down supremacy over $\mu_{\Theta_1}$. Consequently, the interval $C_{1,\alpha}(x)$ is up/down-more conservative than $C_{3,\alpha}(x)$ for all confidence levels.

Confidence Region for Regression Coefficients in a Normal Linear Model
In this section, we consider a dataset from [58] on features of Australian athletes, available from the Australian Institute of Sport (AIS). This dataset has been analyzed previously by Arellano-Valle et al. [59], considering a linear regression model to study the relationship between lean body mass, height and weight of the Australian athletes. The model is given by $Lbm_i = \beta_0 + \beta_1 Ht_i + \beta_2 Wt_i + \epsilon_i$, where $Lbm_i$ is the lean body mass, $Ht_i$ is the height and $Wt_i$ is the weight associated with i = 1, . . ., 102 Australian male athletes. Table 2 presents a summary of the basic descriptive statistics for these variables. By assuming that the model error terms $\epsilon_i$, i = 1, . . ., n, are independent and normally distributed with constant variance $\sigma^2$, a confidence region for $\beta_1$ and $\beta_2$ (see [60] for more details) is given by an ellipsoid centered at $\hat{\theta}$ whose size is determined by a quantile of the Fisher-Snedecor distribution, where $\theta = (\beta_1, \beta_2)^T$, $\hat{\theta} = (\hat{\beta}_1, \hat{\beta}_2)^T$ is the maximum likelihood estimator for θ, $\hat{\sigma}^2$ is the maximum likelihood estimator for $\sigma^2$, X is the design matrix associated with the proposed regression model (assumed to be of full column rank) and F(q, p, s) is the qth quantile of the Fisher-Snedecor probability distribution with p and s degrees of freedom. The resulting membership function, based on the above confidence region, is expressed through the Fisher-Snedecor distribution with 2 and 100 degrees of freedom.
Figure 8 shows the membership function surface for the confidence ellipsoid for $\beta_1$ and $\beta_2$. This is the full information the confidence region can provide after observing this dataset. Notice that, by definition of the membership function above, $\mu_\Theta(\hat{\theta}) = 1$ and, by properties of the function F(q, p, s), $\hat{\theta}$ is the unique element of core($\tilde{\Theta}$). That is, $\hat{\theta}$ is the unique value with full membership in the set $\tilde{\Theta} = \{(\theta, \mu_\Theta(\theta)) : \theta \in \Theta\}$. Each θ such that $\mu_\Theta(\theta) = k$ has membership of k × 100%. Figure 9 depicts the contour curves of $\mu_\Theta(\theta)$ at 0.05 (solid line) and 0.1 (dashed line). The values of θ on the solid line have membership of 5%, while the values of θ on the dashed line have membership of 10%.
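A membership surface of this kind can be evaluated without special libraries, because the Fisher-Snedecor distribution with first degree of freedom equal to 2 has the closed-form survival function P(F > q) = (1 + 2q/s)^(−s/2). The sketch below assumes an elliptical region {θ : Q(θ)/2 ≤ F(1 − α, 2, s)}, where Q is a quadratic form in θ̂ − θ; the matrix `scaled_precision` and the estimate passed as `theta_hat` are hypothetical inputs, not quantities taken from the AIS data.

```python
def f_survival_2(q, s):
    """P(F > q) for F ~ Fisher-Snedecor(2, s); valid only for first df = 2."""
    if q < 0:
        raise ValueError("q must be non-negative")
    return (1.0 + 2.0 * q / s) ** (-s / 2.0)

def f_quantile_2(p, s):
    """p-th quantile of Fisher-Snedecor(2, s), inverting the survival above."""
    return (s / 2.0) * ((1.0 - p) ** (-2.0 / s) - 1.0)

def quadratic_form(theta, theta_hat, scaled_precision):
    """(theta_hat - theta)' M (theta_hat - theta) for a symmetric 2 x 2 matrix M."""
    d0 = theta_hat[0] - theta[0]
    d1 = theta_hat[1] - theta[1]
    (a, b), (c, d) = scaled_precision
    return a * d0 * d0 + (b + c) * d0 * d1 + d * d1 * d1

def region_membership(theta, theta_hat, scaled_precision, s=100):
    """Largest alpha whose elliptical region still contains theta."""
    q = quadratic_form(theta, theta_hat, scaled_precision) / 2.0
    return f_survival_2(q, s)
```

Under these assumptions the membership equals one only at the center of the ellipsoid, and the contour at level 0.05 is the boundary of the 95% region, in line with the contour curves described for Figure 9.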

Discussion
The connection between confidence sets and fuzzy theory has already been investigated in the fuzzy literature. As mentioned, some works on this topic have been published in recent years (see, for example, [6-8,10], among others). All of them are related to the idea of generating a possibility distribution from the information provided by confidence sets. The main goal of this proposal is to find, in the context of fuzzy theory, more information in terms of degrees of possibility about the parameter of interest. More information can be extracted from confidence sets (or confidence intervals) after the sample is observed than just one fixed set (or interval). Indeed, a large family of sets is available.
There are also other proposals with the purpose of making inferences about a parameter using confidence sets. These proposals are fiducial inference [61], Dempster-Shafer (DS) calculus [62], confidence distributions [46] and posterior Bayes. Some of them are carefully discussed by [63-65]. The main idea of fiducial inference is to consider a distribution for a parameter of interest using the sample information. The idea is to provide some probability statements about a parameter from the observed information, particularly, the information provided by a sufficient statistic. Fiducial distributions are often criticized because in some cases they do not integrate to one, i.e., they are not proper probability distributions [66,67]. Moreover, some pivotal quantities used to build a fiducial distribution generate inconsistent results [68-70]. Therefore, some alternatives have been proposed, such as confidence distributions and DS calculus.
The idea of confidence distributions is closely related to that of fiducial distributions. As Schweder and Hjort [71] established, "confidence distributions are the Neymanian interpretation of Fisher's fiducial distributions", and they can be defined as sample-dependent distributions able to represent confidence of all levels for a parameter of interest [46]. One of the most famous representatives of this class of distributions is the bootstrap distribution. It is important to remark that the confidence distribution is considered a distribution estimator and that it has to be interpreted in a frequentist framework, considering a fixed and non-random parameter. Moreover, it is possible to obtain confidence intervals, point estimates and hypothesis test results about the parameter of interest using this distribution.
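To make the last remark concrete, the following is a minimal Python sketch for the textbook case of a normal mean with known standard deviation, where the confidence distribution is the normal distribution centred at the sample mean with standard deviation sigma divided by the square root of n. This setting, the helper name `normal_mean_cd` and all numerical values are assumptions chosen for illustration; they are not taken from the examples of this paper.

```python
from statistics import NormalDist


def normal_mean_cd(x_bar, sigma, n):
    """Confidence distribution for a normal mean with known sigma (assumed setup)."""
    return NormalDist(mu=x_bar, sigma=sigma / n ** 0.5)


# Hypothetical summary statistics, for illustration only.
cd = normal_mean_cd(x_bar=5.0, sigma=2.0, n=16)

# Equal-tailed 95% confidence interval read off the quantiles of the distribution.
lower, upper = cd.inv_cdf(0.025), cd.inv_cdf(0.975)

# Median of the confidence distribution, used as a point estimate.
point = cd.inv_cdf(0.5)

# One-sided p-value for H0: theta <= 4, obtained as the mass assigned to that region.
p_value = cd.cdf(4.0)

print(lower, upper, point, p_value)
```

The same object therefore yields an interval, a point estimate and a test result, while the parameter itself is still treated as fixed and non-random.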
The DS calculus is based on the idea of converting observed data and pivotal relationships into upper and lower probability statements [72]. These statements are related to the probability of support for some subset of the parameter space, the probability of contradicting it and the probability of "do not know" about both of them. According to Hannig and Xie [72], a confidence distribution can be formally put into a DS framework. The main difference between DS calculus and fiducial and confidence distributions is the concept of degree of belief. While the latter are focused on providing an estimator of a parameter of interest in terms of a point and/or an interval or subset, DS calculus concentrates on obtaining different degrees of belief or confidence for a simple question (related to a parameter). For this purpose, a belief function is used rather than probabilities.
In Bayesian inference, a prior distribution over the parameter space is used in order to make inferences about the parameter of interest. Although a distribution is also obtained in this case, the notion of an unknown parameter considered in the approaches described above is replaced by the notion of a random parameter. Moreover, in this setup, the prior distribution summarizes the available information about this parameter. Other authors have comprehensively discussed the methods reviewed in this section. For a detailed discussion, we refer to [73][74][75].
Finally, our proposal uses fuzzy set theory to represent a confidence set and, in this context, to provide the statistical community with more information about a specific confidence set. We believe that this approach represents both the uncertainty and the imprecision that exist about the parameter of interest, by means of the possibility distribution obtained from a simple confidence set.

Conclusion and Final Remarks
In this paper, we revisited the connection between confidence sets and fuzzy sets through a membership function and, therefore, possibility distributions. Some elementary definitions and properties of fuzzy theory were revisited in order to assist readers not familiar with these non-standard tools. The connection indicates that the formalism of fuzzy theory can be used to interpret and give a basis for confidence sets after observing the sample. Zadeh [76] argued that probability theory and fuzzy theory are complementary rather than competitive. As stressed in the paper, after observing the sample, any confidence set has probability zero or one of containing the parameter value, so probabilistic post-data interpretations are not useful. We incorporated a new form of interpretation in which $\mathrm{core}(\tilde\Theta)$ is the set of all parameter values for which the related probability distributions best explain the observed data according to C. In terms of a belief statement: the closer $\theta$ is to $\mathrm{core}(\tilde\Theta)$, the more we believe that $P_\theta$ is the model that best explains the observed data, according to C. Moreover, the proposed membership function delivers much more information than a confidence set with a pre-fixed confidence level. Concepts of supremacy to compare different confidence sets and confidence procedures through membership functions were introduced and studied. Applications to usual examples were offered. This paper is part of a large project on the foundations of classical statistics. Other works are being developed on hypothesis testing, evidence measures, coherence and so forth.

Figure 1. Representations of (a) a general fuzzy set, (b) a fuzzy interval and (c) a fuzzy number.

Figure 2. (a) Membership functions for the exponential rate based on the pivotal quantity (solid line) and based on the normal approximation (dotted line); (b) the r-down integral operator for the respective memberships; (c) the r-up integral operator for the respective memberships. n = 5 and x = 1 are considered.

Figure 3. (a) Membership functions for the Poisson rate based on the pivotal quantity (solid line) and based on the normal approximation (dotted line); (b) the r-down integral operator for the respective memberships; (c) the r-up integral operator for the respective memberships. n = 5 and x = 4 are considered.

Figure 4. (a) Membership function surface for mean and variance in a normal population; (b) the r-down and the r-up integral operators for the memberships. We consider $n = 5$, $\bar{x} = 5$ and $s^2 = 5$.

Figure 5. Membership functions for different confidence intervals.

Figure 6. (a) Membership functions based on Wald's method, with and without Bonett-Price Laplace adjustment. (b) r-down integral operator for these memberships. (c) r-up integral operator for these memberships.

Table 1. Airway hyper-responsiveness (AHR) status before and after stem cell transplantation (SCT) in 21 children.

Table 2. Descriptive statistics of the Australian athletes dataset: sample mean $\bar{x}$, sample standard deviation $s$, and sample skewness and kurtosis coefficients $\sqrt{b_1}$ and $b_2$, respectively.