# Inference with the Median of a Prior


School of Intelligent Systems (IPM) and Amirkabir University of Technology (Dept. of Stat.), Tehran, Iran.

LSS (CNRS-Supélec-Univ. Paris 11),Supélec, Plateau de Moulon, 91192 Gif-sur-Yvette, France.

Author to whom correspondence should be addressed.

Received: 14 February 2006 / Accepted: 9 June 2006 / Published: 13 June 2006

We consider the problem of inference on one of the two parameters of a probability distribution when we have some prior information on a nuisance parameter. When a prior probability distribution on this nuisance parameter is given, the marginal distribution is the classical tool to account for it. If the prior distribution is not given, but we have partial knowledge such as a fixed number of moments, we can use the maximum entropy principle to assign a prior law and thus go back to the previous case. In this work, we consider the case where we only know the median of the prior and propose a new tool for this case. This new inference tool looks like a marginal distribution. It is obtained by first remarking that the marginal distribution can be considered as the mean value of the original distribution with respect to the prior probability law of the nuisance parameter, and then, by using the median in place of the mean.

We consider the problem of inference on a parameter of interest θ of a probability distribution when we have some prior information on a nuisance parameter ν, from a finite number of samples of this probability distribution. Assume that we know the expression of either the cumulative distribution function (cdf) F_{X|ν,θ}(**x**|ν, θ) or its corresponding probability density function (pdf) f_{X|ν,θ}(**x**|ν, θ), where **X** = (X_{1}, … ,X_{n})′ and **x** = (x_{1}, … ,x_{n})′. $\mathcal{V}$ is a random parameter on which we have a priori information and θ is a fixed unknown parameter. This prior information can either be in the form of a prior cdf F_{$\mathcal{V}$}(ν) (or a pdf f_{$\mathcal{V}$}(ν)) or, for example, only the knowledge of a finite number of its moments. In the first case, the marginal cdf
is the classical tool for doing any inference on θ. For example the Maximum Likelihood (ML) estimate, ${\hat{\theta}}_{ML}$ of θ is defined as
where f_{X|θ}(**x**|θ) is the pdf corresponding to the cdf F_{X|θ}(**x**|θ).

In the second case, the Maximum Entropy (ME) principle [4, 5] can be used to assign the probability law f_{$\mathcal{V}$}(ν) and thus to go back to the previous case, e.g. [1] page 90.

In this paper we consider the case where we only know the median of the nuisance parameter $\mathcal{V}$. If we had complementary knowledge about a finite support of the pdf of $\mathcal{V}$, then we could again use the ME principle to assign a prior and go back to the previous case, e.g. [3]. But if we are given only the median of $\mathcal{V}$ and the support is not finite, then, to our knowledge, there is no solution for this case. The main purpose of this paper is to propose one. To this end, in place of F_{X|θ}(**x**|θ) in (1), we propose a new inference tool ${\tilde{F}}_{\mathit{X}|\theta}$(**x**|θ) which can be used to infer on θ (we will show that ${\tilde{F}}_{\mathit{X}|\theta}$(**x**|θ) is a cdf under a few conditions). For example, we can define
where ${\tilde{f}}_{\mathit{X}|\theta}$(**x**|θ) is the pdf corresponding to the cdf ${\tilde{F}}_{\mathit{X}|\theta}$(**x**|θ).

This new tool is deduced from the interpretation of F_{X|θ}(**x**|θ) as the mean value of the random variable T = T($\mathcal{V}$; **x**) = F_{X|$\mathcal{V}$,θ}(**x**|$\mathcal{V}$, θ), as given by (1). Now, if in place of the mean value we take the median, we obtain the new inference tool ${\tilde{F}}_{\mathit{X}|\theta}$(**x**|θ), which is defined as
and can be used in the same way to infer on θ.

As far as the authors know, there is no work on this subject except recently presented conference papers by the authors [9, 8, 7]. In the first article we introduced an alternative inference tool to the total probability formula, which is called a new inference tool in this paper. We calculated this new inference tool directly (as in Example A in Section 2) and suggested a numerical method for its approximation. In the second one, we used this new tool for parameter estimation. Finally, in the last one, we reviewed the content of the two previous papers and mentioned its use for the estimation of a parameter with incomplete knowledge on a nuisance parameter in the one-dimensional case. In this paper we give more details and more results, with proofs under weaker conditions and a new outlook on the problem. We also extend the idea to the multivariate case. In the following, we first give a more precise definition of ${\tilde{F}}_{\mathit{X}|\theta}$(**x**|θ). Then we present some of its properties. For example, we show that under some conditions ${\tilde{F}}_{\mathit{X}|\theta}$(**x**|θ) has all the properties of a cdf, that its calculation is very easy, and that it depends only on the median of the prior distribution. Then we give a few examples and, finally, we compare the relative performances of these two tools for the inference on θ. Extensions and conclusion are given in the last two sections.

Hereafter in this section, to simplify the notation, we omit the parameter θ, and we assume that the random variables X_{i}, i = 1, … , n, and the random parameter $\mathcal{V}$ are continuous and real. We also use increasing and decreasing instead of non-decreasing and non-increasing, respectively.

Let **X** = (X_{1}, … , X_{n})′ have a cdf F_{X|ν}(**x**|ν) depending on a random parameter $\mathcal{V}$ with pdf f_{$\mathcal{V}$}(ν), and let the random variable T = T($\mathcal{V}$; **x**) = F_{X|ν}(**x**|$\mathcal{V}$) have a unique median for each fixed **x**. The new inference tool, ${\tilde{F}}_{\mathit{X}}(\mathit{x})$, is defined as the median of T:

To make our point clear we begin with the following simple example, called Example A. Let F_{X|$\mathcal{V}$}(x|ν) = 1 − e^{−νx}, x > 0, be the cdf of an exponential random variable with scale parameter ν > 0. We assume that the prior pdf of $\mathcal{V}$ is known and is also exponential with parameter 1, i.e. f_{$\mathcal{V}$}(ν) = e^{−ν}, ν > 0. We define the random variable T = F_{X|$\mathcal{V}$}(x|$\mathcal{V}$) = 1 − e^{−$\mathcal{V}$x}, for any fixed value x > 0. The random variable T, 0 ≤ T ≤ 1, has the following cdf

Therefore, the pdf of T is f_{T}(t) = $\frac{1}{x}$(1 − t)^{($\frac{1}{x}$ − 1)}, 0 ≤ t ≤ 1. Now, we can calculate the mean of the random variable T as follows

Let Med(T) be the median of the random variable T; it can be calculated by

The mean value of the random variable T is a cdf with respect to (wrt) x. This fact is always true, because E(T) is the marginal cdf of the random variable X, i.e. F_{X}(x). The marginal cdf is well known, well defined, and can also be calculated directly by (1). On the other hand, in this example, it is obvious that Med(T) is also a cdf wrt x, which is called ${\tilde{F}}_{X}$(x) in Definition 1, see Figure 1. However, we do not have a shortcut for calculating ${\tilde{F}}_{X}$(x) such as (1) for F_{X}(x).
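As a quick numerical sanity check of Example A (our own sketch, not part of the original paper; the helper names are ours), one can sample T = 1 − e^{−$\mathcal{V}$x} with $\mathcal{V}$ ~ Exp(1) and compare the empirical mean and median with the closed forms E(T) = x/(1 + x) and Med(T) = 1 − 2^{−x} implied by the computations above:

```python
import math
import random

def simulate_T(x, n=200_000, seed=0):
    """Sample T = F_{X|V}(x|V) = 1 - exp(-V x) with V ~ Exp(1)."""
    rng = random.Random(seed)
    return [1.0 - math.exp(-rng.expovariate(1.0) * x) for _ in range(n)]

def mean(vals):
    return sum(vals) / len(vals)

def median(vals):
    s = sorted(vals)
    n = len(s)
    return 0.5 * (s[n // 2 - 1] + s[n // 2]) if n % 2 == 0 else s[n // 2]

x = 1.5
t = simulate_T(x)
# Closed forms from the text:
#   E(T)   = F_X(x) = x / (1 + x)          (marginal cdf)
#   Med(T) = 1 - 2^{-x} = 1 - e^{-x ln 2}  (new inference tool)
print(mean(t), x / (1 + x))
print(median(t), 1.0 - 2.0 ** (-x))
```

Both pairs agree to Monte Carlo accuracy, illustrating that the mean of T recovers the marginal cdf while the median gives the new tool.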

In the following theorem and remark, we first show that, under a few conditions, ${\tilde{F}}_{\mathit{X}}$(**x**) has all the properties of a cdf. Then, in Theorem 2, we derive a simple expression for calculating ${\tilde{F}}_{\mathit{X}}$(**x**) and show that, in many cases, the expression of ${\tilde{F}}_{\mathit{X}}$(**x**) depends only on the median of the prior and can be calculated simply, see Remark 2. In Theorem 3 we state the separability property of ${\tilde{F}}_{\mathit{X}}$(**x**) versus the exchangeability of F_{X}(**x**).

Let **X** have a cdf F_{X|ν}(**x**|ν) depending on a random parameter $\mathcal{V}$ with pdf f_{$\mathcal{V}$}(ν) and the real random variable T = F_{X|ν}(**x**|$\mathcal{V}$) have a unique median for each fixed **x**. Then:

1. ${\tilde{F}}_{\mathit{X}}$(**x**) is an increasing function in each of its arguments.
2. If F_{X|ν}(**x**|ν) and F_{$\mathcal{V}$}(ν) are continuous cdfs, then ${\tilde{F}}_{\mathit{X}}$(**x**) is a continuous function in each of its arguments.
3. 0 ≤ ${\tilde{F}}_{\mathit{X}}$(**x**) ≤ 1.

Proof:

1. Let **y** = (y_{1}, … , y_{n})′ and **z** = (z_{1}, … , z_{n})′ with y_{j} < z_{j} for a fixed j and y_{i} = z_{i} for i ≠ j, 1 ≤ i, j ≤ n, and take Y = F_{X|$\mathcal{V}$}(**y**|$\mathcal{V}$) and Z = F_{X|$\mathcal{V}$}(**z**|$\mathcal{V}$). F_{X|$\mathcal{V}$} is an increasing function in each of its arguments, therefore Y ≤ Z. ${\tilde{F}}_{\mathit{X}}$(**y**) = k_{y} is the unique median of Y, and so k_{y} ≤ k_{z}, or equivalently, ${\tilde{F}}_{\mathit{X}}$(**x**) is increasing in its j-th argument.
2. Let **x**. = (x_{1}, … , x_{j − 1}, x., x_{j + 1}, … , x_{n})′ and **t** = (x_{1}, … , x_{j − 1}, t, x_{j + 1}, … , x_{n})′. By part 1, ${\tilde{F}}_{\mathit{X}}$(**x**) is an increasing function in each of its arguments, so its one-sided limits at x. exist. Further, F_{X|$\mathcal{V}$}(**x**|$\mathcal{V}$) is continuous wrt x_{j}, and ${\tilde{F}}_{\mathit{X}}$(**x**) is the unique median of F_{X|$\mathcal{V}$}(**x**|$\mathcal{V}$); therefore, by (3), ${\tilde{F}}_{\mathit{X}}$(**x**) is continuous.
3. ${\tilde{F}}_{\mathit{X}}$(**x**) is the median of the random variable T = F_{X|$\mathcal{V}$}(**x**|$\mathcal{V}$) with 0 ≤ T ≤ 1, and so 0 ≤ ${\tilde{F}}_{\mathit{X}}$(**x**) ≤ 1. ☐

By part 1 of Theorem 1, lim_{xj↑+∞} ${\tilde{F}}_{\mathit{X}}$(**x**) and lim_{xj↓−∞} ${\tilde{F}}_{\mathit{X}}$(**x**) exist and are finite, [11]. Therefore, ${\tilde{F}}_{\mathit{X}}$(**x**) is a continuous cdf if the conditions of Theorem 1 hold together with the following conditions, and we then call ${\tilde{F}}_{\mathit{X}}$(**x**) the marginal cdf of **X** based on the median. When ${\tilde{F}}_{X}$(x) is a one-dimensional cdf, the last condition follows from parts 1 and 3 of Theorem 1.

1. lim_{xj↓−∞} ${\tilde{F}}_{\mathit{X}}$(**x**) = 0 for any particular j,
2. lim_{x1↑+∞,… ,xn↑+∞} ${\tilde{F}}_{\mathit{X}}$(**x**) = 1,
3. ∆_{b1a1} … ∆_{bnan} ${\tilde{F}}_{\mathit{X}}$(**x**) ≥ 0, where a_{i} ≤ b_{i}, i = 1, … , n, and ∆_{bjaj} ${\tilde{F}}_{\mathit{X}}$(**x**) = ${\tilde{F}}_{\mathit{X}}$((x_{1}, … , x_{j − 1}, b_{j}, x_{j + 1}, … , x_{n})′) − ${\tilde{F}}_{\mathit{X}}$((x_{1}, … , x_{j − 1}, a_{j}, x_{j + 1}, … , x_{n})′) ≥ 0.

If L(ν) = F_{X|ν}(**x**|ν) is a monotone function wrt ν and $\mathcal{V}$ has a unique median ${F}_{\mathcal{V}}^{-1}\left(\frac{1}{2}\right)$, then ${\tilde{F}}_{\mathit{X}}$(**x**) = L(${F}_{\mathcal{V}}^{-1}\left(\frac{1}{2}\right)$).

Proof:

Let
be the generalized inverse of L, e.g. [10] page 39. Noting that
and by (2) we have,
where the last expression follows from
☐

If the conditions of Theorem 2 hold, then ${\tilde{F}}_{\mathit{X}}$(**x**) belongs to the family of distributions F_{X|$\mathcal{V}$}(**x**|ν), because ${\tilde{F}}_{\mathit{X}}$(**x**) = F_{X|ν}(**x**|${F}_{\mathcal{V}}^{-1}(\frac{1}{2})$). Therefore ${\tilde{F}}_{\mathit{X}}$(**x**) is a cdf and the conditions in Remark 1 hold.

${\tilde{F}}_{\mathit{X}}$(**x**) depends only on the median of the prior distribution, ${F}_{\mathcal{V}}^{-1}\left(\frac{1}{2}\right)$, while the expression of F_{X}(**x**) requires perfect knowledge of F_{$\mathcal{V}$}(ν). Therefore, ${\tilde{F}}_{\mathit{X}}$(**x**) is robust with respect to prior distributions having the same median.
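The robustness claim can be illustrated numerically (a sketch of ours in the setting of Example A; the choice of the second prior is ours): two quite different priors with the same median ln 2 — Exp(1) and a lognormal — give exactly the same ${\tilde{F}}_{X}$(x) by Theorem 2, while their marginal cdfs differ:

```python
import math
import random

def F_cond(x, nu):
    """Conditional cdf of Example A: F_{X|V}(x|nu) = 1 - exp(-nu x)."""
    return 1.0 - math.exp(-nu * x)

med = math.log(2)  # common median of both priors below

def F_tilde(x):
    # Theorem 2: for L monotone in nu, Med(T) = L(Med(V))
    return F_cond(x, med)

def marginal(x, sampler, n=100_000, seed=1):
    """Monte Carlo estimate of F_X(x) = E[F_cond(x, V)] under a prior sampler."""
    rng = random.Random(seed)
    return sum(F_cond(x, sampler(rng)) for _ in range(n)) / n

x = 2.0
# Two different priors sharing the median ln 2:
m_exp = marginal(x, lambda r: r.expovariate(1.0))                     # Exp(1)
m_logn = marginal(x, lambda r: r.lognormvariate(math.log(med), 1.0))  # lognormal
print(F_tilde(x))     # the same whichever of the two priors is assumed
print(m_exp, m_logn)  # the marginal cdfs differ
```

The new tool is invariant within this class of priors, while the classical marginal is not.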

If the median of T is not unique, then ${\tilde{F}}_{X}$(x) may not be a unique cdf wrt x. For example (called Example B), assume that, in Example A, $\mathcal{V}$ has the following cdf, Figure 2-left:
Then, T = T ($\mathcal{V}$; x) = F_{X|$\mathcal{V}$}(x|$\mathcal{V}$) = 1 − e^{−$\mathcal{V}$x} has the following cdf
Therefore, the median of T is an arbitrary point in the following interval (see Figure 2-right):

Let F_{X|ν}(**x**|ν) be the conditional cdf of **X** = (X_{1}, … , X_{n})′ given $\mathcal{V}$ = ν and let L_{(k1,… ,kr)}(ν) = F_{(X_{k1}, … , X_{kr})|$\mathcal{V}$}(x_{k1}, … , x_{kr}|ν) be a monotone function of ν for each {k_{1}, … , k_{r}} ⊆ {1, … , n}. Let also $\mathcal{V}$ have a unique median ${F}_{\mathcal{V}}^{-1}\left(\frac{1}{2}\right)$. If, for each {k_{1}, … , k_{r}} ⊆ {1, … , n},
i.e. **X** | $\mathcal{V}$ = ν has independent components, then

Proof:

Conditions of Theorem 2 hold and so, for each {k_{1}, … , k_{r}} ⊆ {1, … , n},
☐

If **X** | $\mathcal{V}$ = ν has independent components, then the marginal distribution of **X** does not, in general, have independent components. For example, in the general case,
It can be shown that, if **X** | $\mathcal{V}$ = ν has independent and identically distributed (iid) components, then the marginal distribution of **X** is exchangeable, see Example 1. We recall that, for identically distributed random variables, exchangeability is a weaker condition than independence.
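A small numerical illustration of Theorem 3 and this remark (our own sketch, in the setting of Example A with n = 2): the median-based tool factorizes into a product of one-dimensional tools, while the mean-based joint marginal cdf does not factor into the product of its marginals:

```python
import math
import random

def joint_T(x1, x2, v):
    """T = F_{X1,X2|V}((x1,x2)|v) = (1 - e^{-v x1})(1 - e^{-v x2})."""
    return (1.0 - math.exp(-v * x1)) * (1.0 - math.exp(-v * x2))

def median(vals):
    s = sorted(vals)
    n = len(s)
    return 0.5 * (s[n // 2 - 1] + s[n // 2]) if n % 2 == 0 else s[n // 2]

rng = random.Random(2)
x1, x2 = 1.0, 2.0
samples = [joint_T(x1, x2, rng.expovariate(1.0)) for _ in range(200_000)]

med_v = math.log(2)           # median of the Exp(1) prior
sep = joint_T(x1, x2, med_v)  # product of the two 1-D tools (Theorem 2 + 3)
print(median(samples), sep)   # these agree: separability

mean_joint = sum(samples) / len(samples)       # joint marginal cdf at (x1, x2)
prod_marg = (x1 / (1 + x1)) * (x2 / (1 + x2))  # product of marginal cdfs
print(mean_joint, prod_marg)  # these differ: no independence under the marginal
```

Since the joint conditional cdf is monotone in ν, its median is obtained by plugging in the prior median, which factorizes; the expectation does not.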

In the following we show that some families of distributions (e.g. [6]) have a monotone distribution function wrt their parameters, and so the calculation of ${\tilde{F}}_{\mathit{X}}$(**x**) is very easy using Theorem 2.

Let L(ν) = F_{X|$\mathcal{V}$}(**x**|$\mathcal{V}$). If ν is a real location parameter then L(ν) is decreasing wrt ν.

Proof:

Let ν_{1} < ν_{2} and ν be a location parameter. Then
☐

Let L(ν) = F_{X|$\mathcal{V}$}(x|ν). If ν is a scale parameter, then L(ν) is monotone wrt ν.

Proof:

Let ν_{1} < ν_{2}. If ν is a scale parameter, ν > 0, then
Therefore, L(ν) is an increasing function if x < 0 and is a decreasing function if x > 0, i.e. L(ν) is a monotone function wrt ν. ☐

The proof of the following lemma is straightforward.

Let X_{1}, … , X_{n} given $\mathcal{V}$ = ν be iid random variables and **X** = (X_{1}, … , X_{n})′. If L(ν) = F_{X_{1}|$\mathcal{V}$}(x|ν) is an increasing (a decreasing) function, then L*(ν) = F_{X|ν}(**x**|ν) is an increasing (a decreasing) function of ν.

In some cases we can show directly that L(.) is a monotone function. For example, in the exponential family this property can be proved by using differentiation. Let **X**|**η** be distributed according to an exponential family with pdf
where **η** = (η_{1}, … , η_{n})′ and **T** = (T_{1}, … , T_{n})′. It can be shown that L(**η**) = F_{X|**η**}(**x**|**η**) is a monotone function wrt each of its arguments in many cases by the following method. Let I_{y≤x} = 1 if y_{1} ≤ x_{1}, … , y_{n} ≤ x_{n} and 0 elsewhere, and note that differentiation under the integral sign is valid for the exponential family. Then
The last equality follows from a standard identity for exponential families, e.g. [6] page 27.

On the other hand, we can use stochastic ordering property of a family of distributions for showing that L(.) is a monotone function. A family of cdfs
where V is an interval on the real line, is said to have Monotone Likelihood Ratio (MLR) property if, for every ν_{1} < ν_{2} in V the likelihood ratio
is a monotone function of x. The property of MLR defines a very strong ordering of a family of distributions.

If $\mathcal{F}$ is an MLR family wrt x, then F_{X|$\mathcal{V}$}(x|ν) is an increasing (or a decreasing) function of ν for all x.

Proof:

See e.g. [12] page 124. ☐

A family of cdfs in (4) is said to be stochastically increasing (SI) if ν_{1} < ν_{2} implies F_{X|$\mathcal{V}$}(x|ν_{1}) ≥ F_{X|$\mathcal{V}$}(x|ν_{2}) for all x. For stochastically decreasing (SD) the inequality is reversed. This property is weaker than the MLR property (by Lemma 4), but stronger than the monotonicity of L(ν) = F_{X|$\mathcal{V}$}(x|ν) (because L(ν) is monotone for each fixed x). Therefore, we have

MLR ⇒ SI or SD ⇒ L(ν) is monotone

It can be shown that the converses of the above relations are not true.

In Theorem 1, we proved that ${\tilde{F}}_{X}$(x) is an increasing function. In the proof of this theorem we did not use the monotonicity property of L(ν) = F_{X|$\mathcal{V}$}(x|ν) wrt ν. For example (called Example C), assume that
is the mixture cdf of an exponential and a Cauchy cdf with parameter ν > 0. Figure 3-left shows the graphs of L(ν) = F_{X|$\mathcal{V}$}(x|ν) for different x. L(ν) is not monotone for some of the x values in this figure. If we assume that the prior pdf of $\mathcal{V}$ is known and is also exponential with parameter 1, then the median of the random variable T is still a cdf, see Figure 3-right.

In what follows, we use the following notations and expressions, [2] pages 422–427:
Exchangeable Normal: The random vector **X** = (X_{1}, … , X_{n})′ is said to have an exchangeable normal distribution, $\mathcal{E}$$\mathcal{N}$ (**x**; µ, σ^{2}, ρ), if its distribution is multivariate normal with the following mean vector and variance-covariance matrix
It can be shown that $\mathcal{E}$$\mathcal{N}$ (**x**; µ, σ^{2}, ρ) =
where

The first example we consider is
where we assume that the mean value ν is the nuisance parameter. Let X_{1}, … , X_{n} be iid copies of X (i.e. of X|$\mathcal{V}$ = ν, θ) and **X** = (X_{1}, … , X_{n})′; then:

- Prior pdf case f_{$\mathcal{V}$}(ν) = $\mathcal{N}$ (ν; ν_{0}, θ_{0}): Then we have
- Unique median knowledge case Median {$\mathcal{V}$} = ν_{0}: Then, by using Lemma 1 and Theorem 2 (because F_{X|$\mathcal{V}$,θ}(**x**|ν, θ) is a decreasing function wrt ν by Lemma 1), we have ${\tilde{F}}_{\mathit{X}|\theta}$(**x**|θ). Therefore, if f_{$\mathcal{V}$}(ν) = $\mathcal{N}$ (ν; ν_{0}, θ_{0}) or f_{$\mathcal{V}$}(ν) = $\mathcal{C}$ (ν; ν_{0}, θ_{0}), then ${\tilde{f}}_{\mathit{X}|\theta}$(**x**|θ) is given by (5), because the medians of these two distributions are both equal to ν_{0} (see Remark 3).
- Moments knowledge case E(|$\mathcal{V}$|) = ν_{0}: Then the ME pdf is given by $\mathcal{D}$$\mathcal{E}$ (ν; ν_{0}). In this case we cannot obtain an analytical expression for the marginal. Note that, if E($\mathcal{V}$) = ν_{0} or Median {$\mathcal{V}$} = ν_{0} and the support of $\mathcal{V}$ is R, the ME pdf does not exist.

The second example we consider is
where, this time, we assume that ν is the variance and is the nuisance parameter. Then:

- Prior pdf case f_{$\mathcal{V}$}(ν) = $\mathcal{I}$$\mathcal{G}$ (ν; α, β): Then it is easy to show that
but the MLE of θ based on f_{X|θ}(**x**|θ) cannot be calculated analytically.
- Unique median knowledge case Median {$\mathcal{V}$} = ν_{0}: Then, by using Lemma 2 and Theorem 2, we have
F_{X|ν,θ}(**x**|ν, θ) is a monotone function wrt ν (by using the derivative), and by Theorem 3 we have
- Moments knowledge case E(1/$\mathcal{V}$) = 1/ν_{0}: Then, knowing that the variance is a positive quantity, the ME pdf f_{$\mathcal{V}$}(ν) is an $\mathcal{I}$$\mathcal{G}$ (ν; 1, ν_{0}). In this case we have
but the MLE of θ based on f_{X|θ}(**x**|θ) cannot be calculated analytically.

In this example we consider $\mathcal{E}$$\mathcal{N}$ (**x**; ν, σ^{2}, ρ), where ν is the nuisance parameter. We can write $\mathcal{E}$$\mathcal{N}$ (**x**; ν, σ^{2}, ρ) as follows (exponential family),
where the coefficients can be determined. This pdf is a monotone function wrt θ_{3}, and so L(ν) is a monotone function. Let θ = (σ^{2}, ρ) and let the median of the prior pdf be ν_{0}; then

Suppose we are interested in estimating θ in Example 1. In the case that n = 1
and so the ML estimators (MLE) of θ based on these two pdfs are equal to
respectively. For n > 1, the MLE of θ based on f_{X|θ}(**x**|θ)
can be calculated numerically by the following simplified likelihood function,
where we assume that θ_{0} = 1. The MLE of θ based on ${\tilde{f}}_{\mathit{X}|\theta}$(**x**|θ) is equal to

Before comparing these two estimators (by considering a normal prior for ν), one can predict that $\hat{\theta}$ is better than $\tilde{\theta}$, because $\hat{\theta}$ uses more information (i.e. the known normal prior) than $\tilde{\theta}$, which uses only the median of the prior distribution. We may also recall that f_{X|θ}(**x**|θ) is the true pdf of the observations, obtained using the full prior knowledge on the nuisance parameter, while ${\tilde{f}}_{\mathit{X}|\theta}$(**x**|θ) is a pseudo pdf which includes only the knowledge of the median of the nuisance parameter.

The empirical Mean Square Errors (MSE) of the 4 estimators are plotted in Figure 4 for different sample sizes n. We denote by T the MLE of θ when ν = ν_{0}, and by T_{MaxEnt} the MLE of θ when the prior mean and variance are known.

In Figure 4-left we plot the graphs of MSE of $\hat{\theta}$, $\tilde{\theta}$, T and T_{MaxEnt}. In Table 1 we classify these 4 estimators and corresponding assumptions for n = 1. We see that, in Figure 4-left, $\hat{\theta}$ is better than $\tilde{\theta}$, especially for large sample size n, and T is the best.

In Figure 4-right we plot the graphs of the MSE wrt the median ν_{0}. This is useful for checking the robustness of the estimators wrt false prior information. We see that $\hat{\theta}$ is more robust than $\tilde{\theta}$ relative to ν_{0}, but both of them are dominated by T. In this case, samples are generated from a normal distribution with a random normal mean (median ν_{0}) and θ = 2, while we assume that ν has a standard normal prior distribution.

The simulations confirm the following logic: the more information we have, the better the estimation will be. In fact, for calculating T there is no nuisance parameter; for $\hat{\theta}$ we use the full prior distribution; for T_{MaxEnt} we use the prior mean and prior variance; and for $\tilde{\theta}$ we use only the median of the prior distribution.
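This ordering can be reproduced with a short Monte Carlo experiment for n = 1 (our own sketch of the simulation described above, with ν₀ = 0, θ₀ = 1, θ = 2): we draw X from the marginal N(ν₀, θ + θ₀) and compare the empirical MSEs of $\hat{\theta}$ and $\tilde{\theta}$ with the theoretical value 2(θ + 1)² + 1 from Table 1:

```python
import math
import random

rng = random.Random(3)
theta, nu0, theta0 = 2.0, 0.0, 1.0
n_rep = 200_000

# Draw X from the marginal of X | nu ~ N(nu, theta) with nu ~ N(nu0, theta0),
# i.e. X ~ N(nu0, theta + theta0); random.gauss takes a standard deviation.
xs = [rng.gauss(nu0, math.sqrt(theta + theta0)) for _ in range(n_rep)]

theta_hat = [max((x - nu0) ** 2 - theta0, 0.0) for x in xs]  # full prior known
theta_til = [(x - nu0) ** 2 for x in xs]                     # median only

def mse(est):
    return sum((e - theta) ** 2 for e in est) / len(est)

print(mse(theta_hat))  # uses more prior information: smaller MSE
print(mse(theta_til))  # close to the theoretical 2*(theta + 1)**2 + 1 = 19
```

As in Figure 4-left, $\hat{\theta}$ dominates $\tilde{\theta}$, since truncating at 0 removes part of the error that $\tilde{\theta}$ keeps.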

In this section, we show that the suggested new tool can be extended to other functions such as quantiles instead of the median, but not to functions such as the mode. For example, the mode of the random variable T = T($\mathcal{V}$; **x**) = F_{X|$\mathcal{V}$}(**x**|$\mathcal{V}$) in Definition 1, i.e.,
is not a cdf in Example A. The mode of T is (see Figure 1 top)
which is not a distribution function. If we assume k = 1, then Mod(T) is a degenerate cdf. In Figure 5 we plot the mean, median and mode of the random variable T. We see that they are cdfs; however, the cdf based on the mode is the extreme case of the other two.

As noted by one of the referees, the mode of the prior pdf is useful for introducing a pseudo cdf similar to our new inference tool ${\tilde{F}}_{\mathit{X}}$(**x**): that is, instead of using the result of Theorem 2, ${\tilde{F}}_{\mathit{X}|\theta}$(**x**|θ) = F_{X|ν,θ}(**x**|Med($\mathcal{V}$), θ), using ${\tilde{F}}_{\mathit{X}|\theta}^{\mathit{Mod}}$(**x**|θ) = F_{X|ν,θ}(**x**|Mod($\mathcal{V}$), θ). This method has been used for eliminating the nuisance parameter ν. In this case, Theorem 3, i.e. the separability property of the pseudo marginal distribution, also holds for ${\tilde{F}}_{\mathit{X}|\theta}^{\mathit{Mod}}$(**x**|θ). Note that the mode of the random variable T defined in (7) is not equal to ${\tilde{F}}_{\mathit{X}|\theta}^{\mathit{Mod}}$(**x**|θ) and may fail to be a cdf, as illustrated above. However, it may be a cdf, as in the following example pointed out by the referee. In Example A, let $\mathcal{V}$ − 1 have a binomial distribution $\mathcal{B}$(2, $\frac{3}{4}$), i.e. $\mathcal{V}$ is a discrete random variable with support {1, 2, 3}. Then E(T) = 1 − (e^{−x} + 6e^{−2x} + 9e^{−3x})/16 and Mod(T) = 1 − e^{−3x} are cdfs, see Figure 6.

On the other hand, we may extend the method presented in this paper to the class of quantiles (e.g., quartiles or percentiles). To make our point clear we consider the first and third quartiles of the random variable T in Example A (instead of the median, which is the second quartile). We denote the new inference tools based on the first and third quartiles by ${\tilde{F}}_{\mathit{X}|\theta}^{{Q}_{1}}$(**x**) and ${\tilde{F}}_{\mathit{X}|\theta}^{{Q}_{3}}$(**x**), respectively.

They can be calculated, analogously to (2), by
It can be shown that, in Example A, ${\tilde{F}}_{X}^{{Q}_{1}}$(x) = 1 − e^{x ln 0.75} and ${\tilde{F}}_{X}^{{Q}_{3}}$(x) = 1 − e^{x ln 0.25}. In Figure 7 we plot them.
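These quartile-based tools can be verified by simulation (our own sketch; since L(ν) = 1 − e^{−νx} is increasing in ν, the q-quantile of T = L($\mathcal{V}$) is L applied to the q-quantile of $\mathcal{V}$ ~ Exp(1)):

```python
import math
import random

def quantile(vals, q):
    """Empirical q-quantile of a sample."""
    s = sorted(vals)
    return s[int(q * len(s))]

x = 1.2
rng = random.Random(4)
t = [1.0 - math.exp(-rng.expovariate(1.0) * x) for _ in range(200_000)]

# Quartile-based tools from the text:
q1 = 1.0 - math.exp(x * math.log(0.75))  # first-quartile tool
q3 = 1.0 - math.exp(x * math.log(0.25))  # third-quartile tool
print(quantile(t, 0.25), q1)
print(quantile(t, 0.75), q3)
```

The empirical quartiles of T match the closed forms, supporting the claim that the construction extends to any quantile.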

In conclusion, it seems that the method can be extended to any quantile instead of the median, but its extension to other functions may need more care.

In this paper we considered the problem of inference on one set of parameters of a continuous probability distribution when we have some partial information on a nuisance parameter. We considered the particular case where this partial information is only the knowledge of the median of the prior and proposed a new inference tool which looks like the marginal cdf (or pdf) but whose expression needs only the median of the prior. We gave a precise definition of this new tool, studied some of its main properties, compared its application with the classical marginal likelihood in a few examples, and finally gave an example of its usefulness in parameter estimation.

The authors would like to thank the referees for their helpful comments and suggestions. The first author is grateful to the School of Intelligent Systems (IPM, Tehran) and the Laboratoire des signaux et systèmes (CNRS-Supélec-Univ. Paris 11) for their support.

- Berger, J. O. Statistical Decision Theory: Foundations, Concepts, and Methods; Springer: New York, 1980.
- Bernardo, J. M.; Smith, A. F. M. Bayesian Theory; Wiley: Chichester, UK, 1994.
- Hernández Bastida, A.; Martel Escobar, M. C.; Vázquez Polo, F. J. On maximum entropy priors and a most likely likelihood in auditing. Qüestiió 1998, 22(2), 231–242.
- Jaynes, E. T. Information theory and statistical mechanics I, II. Physical Review 1957, 106, 620–630 and 108, 171–190.
- Jaynes, E. T. Prior probabilities. IEEE Transactions on Systems Science and Cybernetics 1968, SSC-4(3), 227–241.
- Lehmann, E. L.; Casella, G. Theory of Point Estimation, 2nd ed.; Springer: New York, 1998.
- Mohammad-Djafari, A.; Mohammadpour, A. On the estimation of a parameter with incomplete knowledge on a nuisance parameter. AIP Conference Proceedings 2004, Vol. 735, pp. 533–540.
- Mohammadpour, A.; Mohammad-Djafari, A. An alternative criterion to likelihood for parameter estimation accounting for prior information on nuisance parameter. In Soft Methodology and Random Information Systems; Springer: Berlin, 2004; pp. 575–580.
- Mohammadpour, A.; Mohammad-Djafari, A. An alternative inference tool to total probability formula and its applications. AIP Conference Proceedings 2004, Vol. 735, pp. 227–236.
- Robert, C. P.; Casella, G. Monte Carlo Statistical Methods, 2nd ed.; Springer: New York, 2004.
- Rohatgi, V. K. An Introduction to Probability Theory and Mathematical Statistics; Wiley: New York, 1976.
- Zacks, S. Parametric Statistical Inference; Pergamon: Oxford, 1981.

| Assumptions | pdf of X\|θ based on prior information; MLE of θ | Simulated data pdf; MSE(θ) = E(MLE − θ)^{2} |
|---|---|---|
| Known parameter ν = ν_{0} | $\mathcal{N}$ (x; ν_{0}, θ); T = (X − ν_{0})^{2} | $\mathcal{N}$ (x; 0, θ); 2θ^{2} |
| Known prior f_{$\mathcal{V}$}(ν) = $\mathcal{N}$ (ν; ν_{0}, θ_{0}) | $\mathcal{N}$ (x; ν_{0}, θ + θ_{0}); $\hat{\theta}$ = max{(X − ν_{0})^{2} − θ_{0}, 0} | $\mathcal{N}$ (x; 0, θ + 1); E($\hat{\theta}$ − θ)^{2} |
| Known moments E($\mathcal{V}$) = ν_{0}, V($\mathcal{V}$) = $\frac{{\theta}_{0}}{2}$ | $\mathcal{N}$ (x; ν_{0}, θ + $\frac{{\theta}_{0}}{2}$); T_{MaxEnt} = max{(X − ν_{0})^{2} − $\frac{{\theta}_{0}}{2}$, 0} | $\mathcal{N}$ (x; 0, θ + 1); E(T_{MaxEnt} − θ)^{2} |
| Known unique median Median($\mathcal{V}$) = ν_{0} | $\mathcal{N}$ (x; ν_{0}, θ); $\tilde{\theta}$ = (X − ν_{0})^{2} | $\mathcal{N}$ (x; 0, θ + 1); 2(θ + 1)^{2} + 1 |

© 2006 by MDPI (http://www.mdpi.org). Reproduction for noncommercial purposes permitted.