Performance Analysis of Maximum Likelihood Estimation for Transmit Power Based on Signal Strength Model

We study the theoretical performance of Maximum Likelihood (ML) estimation for the transmit power of a primary node in a wireless network with cooperative receiver nodes. We first identify the condition under which the consistence of ML estimation via cooperative sensing is guaranteed. We then analyze the feasibility of this consistence condition for an ML function generated by independent yet not identically distributed random variables. Numerical experiments corroborate our theoretical findings.


Introduction
Estimating the transmit power and location of a node from observations at a set of cooperative receiver (or monitor) nodes is an important task in wireless networks. Such estimation is particularly essential for cooperative sensing based cognitive radio (CR) [1] networks, where it enables various sensing mechanisms, such as the estimation of the maximum interference-free transmit power of a frequency-agile radio in an Opportunistic Spectrum Sharing system [2]. As pointed out in [3], transmit power estimation is also a fundamental functional block for the detection of adverse behaviors, such as signal jamming attacks [4] and channel capturing [5].
As a special case of transmit power estimation, node positioning based on received signal strength observations has been extensively studied in [6][7][8], where the transmit power itself is assumed known. In [9], transmit power estimation was studied using an ad hoc optimization method. A simple yet rough estimator based on a geometric approach with a deterministic model equation was proposed in [3]; however, this method is not exact and may exhibit large deviations under some circumstances. In [2], a maximum likelihood (ML) estimation method was introduced to calculate the maximum interference-free transmit power of interest; however, the performance of this ML estimation has lacked a comprehensive analysis. In [10], deep theoretical results on the performance of ML estimation for transmit power were established under the assumption that the observation locations are random variables. The popular ML method has also been applied in [11,12] recently.
Specifically, we consider a wireless network consisting of a primary node, or transmit node, together with a set of receiver nodes listening to the signal transmitted by the primary node. We assume that the primary node transmits at a constant power level during the observation period and that the receiver nodes, whose locations are known a priori, can exchange their respective received power information with each other [2,3]. The problem is to estimate the transmit power and location of the primary node based on a lognormal shadowing model [2,13].
Mathematically, we consider an estimation problem for the observation model given by the lognormal shadowing model [2,13]:

s_i = s_p − 10 η log10 d_ip + w_i,   i = 1, 2, . . . , n,   (1)

where η is the path-loss exponent, d_ip is the distance between the i-th monitor location and the transmitter, and w_i is a zero-mean Gaussian noise N(0, σ²), i = 1, 2, . . . , n.
There are four constant parameters in total in model (1): the transmit power s_p, the path-loss exponent η, and the transmitter location (x_p, y_p). The other varying variables, apart from the noise, are the received power level s_i at the i-th monitor node and the monitor location (x_i, y_i), i = 1, 2, . . . , n. The estimation problems considered in the above references, e.g., [10], are all with respect to (s_p, x_p, y_p); i.e., the path-loss exponent is assumed known. In this paper, we consider the more general estimation problem with respect to the four-parameter vector θ = (η, s_p, x_p, y_p) of model (1), based on an available data set {s_i, x_i, y_i, i = 1, 2, . . . , n}. In light of the fact that the observation data may be collected from a finite number of monitor locations, it is reasonable to assume that the monitor locations (x_i, y_i) are deterministic, rather than random variables as in [10].
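As an illustration, the lognormal shadowing model above can be simulated directly; in the following sketch all numerical values (path-loss exponent, transmit power, noise level, coverage area) are our own illustrative choices, not taken from the paper, and the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (our assumptions, not from the paper):
eta, s_p = 2.5, 20.0        # path-loss exponent and transmit power (dB)
x_p, y_p = 2.0, 0.0         # transmitter location
sigma, n = 1.0, 500         # shadowing std dev and number of monitors

# Monitor locations: drawn once, then treated as fixed (deterministic)
xs = rng.uniform(-100.0, 100.0, n)
ys = rng.uniform(-100.0, 100.0, n)

# Lognormal shadowing model: s_i = s_p - 10*eta*log10(d_ip) + w_i
d = np.hypot(xs - x_p, ys - y_p)
s = s_p - 10.0 * eta * np.log10(d) + rng.normal(0.0, sigma, n)
```

Here `d` collects the distances d_ip, and `s` the noisy received power levels forming the data set used by the ML estimator.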
Because of the nonlinearity of the model with respect to the parameters to be identified, most classic methods for linear models in [14] are not applicable. The ML method is an exception: it is effective for any kind of parametric model as long as the distribution of the noise is known. Assuming that the noise is an independent identically distributed sequence obeying N(0, σ²), the derived optimization objective is the sum of the squared differences between the two sides of model Equation (1) (discarding the noise term on the right-hand side). The optimization problem is solved with the Matlab Optimization Toolbox (R2010b). In an iterative process, the previous estimate is taken as the starting point of the next search. In this contribution, we are motivated to analyze the performance of ML-based estimation of the parameter θ, since finding the root of the ML function is a standard optimization problem that can be solved by many mathematical or engineering software packages.
Although in practice an ML-based estimation method is always implemented with a finite data set, a natural concern is the asymptotic performance of the developed ML estimation model as the data volume n tends to infinity: if the ML estimate is biased even as n tends to infinity, how can one expect it to be effective for fixed n? Therefore, the major effort of this paper is to investigate the asymptotic performance of the ML estimate as n → ∞, i.e., when it succeeds and when it fails. It turns out that the monitor location set {(x_i, y_i), i = 1, 2, . . .} should be rich enough in a certain sense to guarantee that the n-th ML estimate, denoted θ_n, tends to θ as n → ∞. To our knowledge, a special case of this issue has been considered in [10], where the performance of ML estimation of the parameters (s_p, x_p, y_p) was analyzed under the assumption that {(x_i, y_i)} is a sequence of iid random variables. However, as mentioned above, it is more natural to view the receiver node locations {(x_i, y_i)} as deterministic. Hence, in this paper, we consider the performance of ML estimation of θ = (η, s_p, x_p, y_p) as the number n of receiver nodes tends to infinity, based on information from deterministically located receiver nodes.
The main contributions of this paper are as follows.
• We identify the condition under which the consistence of ML estimation for transmit power is guaranteed, i.e., θ_n → θ as n → ∞. As shown in Remark 1, to achieve correct estimation, the receiver nodes' location set {(x_i, y_i), i = 1, 2, . . .} should not lie entirely on a single criterion curve (a term given by Definition 2). The diversity of the location set guarantees an analogue of the persistent excitation condition, which is used for the convergence of the least squares algorithm [14].

• Theoretically, we consider the consistence issue for an ML function generated by independent yet not identically distributed random variables. A simple criterion for checking consistence is given in Theorem 3.
The rest of this paper is organized as follows. The model and the ML estimation for transmit power are presented in Section 2. An extreme case related to the consistence condition for ML estimation of transmitter power is studied in Section 3, which may help to build some intuition for the main results of the paper. Then, as the main practical contribution, the consistence condition itself is established and justified in Section 4. In Section 5, numerical experiments are designed to probe the performance of the ML estimation algorithm and are explained by the relevant theoretical results. The main theoretical contribution is in Section 6, where the theoretical preliminaries used to verify the results of the preceding sections are prepared. Concluding remarks are given in Section 7.

Maximum Likelihood Estimation for Transmit Power in Localized Signal Strength Model
As mentioned above, we limit our discussion to signal-strength (SS) based localization of a single primary transmitter in the geographic coverage area [2]. Let L = (x_p, y_p) denote the location of the primary transmitter. Suppose that a sequence of uncorrelated observed SS measurements, s_1, . . . , s_n, is available along with the corresponding position coordinates L_1, . . . , L_n, where L_i = (x_i, y_i), i = 1, . . . , n; denote the observation set by O(n) = {(s_i, L_i), i = 1, . . . , n}. The set of observations may be obtained in different ways. For example, consider a scenario in which n receiver nodes, located at positions L_1, . . . , L_n, collect the signal strength observations s_1, . . . , s_n at a given time. These receiver nodes exchange their data with each other, so that at least one of them receives the entire set O(n). Naturally, the observation set O(n) may also be obtained by measurements from a single receiver node at n different points in time along a trajectory as the node moves in the coverage area. In general, a given observation (s_i, L_i) may be obtained either from a measurement taken by the receiver node itself in the past, or from a measurement at another receiver node that shares the information between the two nodes. Indeed, we will show below that there is a set of requirements on the number and locations of the receiver nodes that must be satisfied for the consistence of ML estimation of transmit power.
The observation equation is given by the lognormal shadowing model [2,13]:

s_i = s_p − 10 η log10 d_ip + w_i,   i = 1, 2, . . . , n,   (2)

where the noise w_i follows a Gaussian distribution N(0, σ²), and

d_ip = ((x_i − x_p)² + (y_i − y_p)²)^{1/2}.   (3)

Let us now formulate the ML estimate for θ, i.e., the parameters η, s_p, x_p, y_p, based on the given observation data O(n). Mathematically, when the locations (x_i, y_i) are viewed as deterministic quantities, the random variable s_i has the Gaussian distribution N(s_p − 10 η log10 d_ip, σ²), i.e., its density is

f(s_i; x_i, y_i, θ) = (1/(√(2π) σ)) exp(−(s_i − s_p + 10 η log10 d_ip)² / (2σ²)).   (4)

Note that, although {s_i} is a sequence of independent random variables, they have different distributions, since the locations (x_i, y_i) generally differ with i.
Clearly, the corresponding ML function is

L_n(θ) = ∏_{i=1}^{n} f(s_i; x_i, y_i, θ),   (5)

and the log ML function can be written as

log L_n(θ) = −(n/2) log(2π σ²) − (1/(2σ²)) ∑_{i=1}^{n} (s_i − s_p + 10 η log10 d_ip)²,

so maximizing it is equivalent to minimizing

M_n(θ) = ∑_{i=1}^{n} (s_i − s_p + 10 η log10 d_ip)².   (6)

This turns out to be a nonlinear optimization problem in η, s_p, x_p, y_p, which can be solved by many mathematical software packages, such as GAMS (General Algebraic Modeling System) and the Matlab Optimization Toolbox, or by a specially designed program. As stated in the Introduction, we are motivated to investigate the asymptotic performance of the ML estimate as n increases. Precisely, we would like to know when the n-th ML estimate of θ = (η, s_p, x_p, y_p) tends to the true value as n tends to infinity.
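The equivalence between maximizing the Gaussian log-likelihood and minimizing the squared-error sum M_n(θ) can be checked numerically: the negative log-likelihood differs from M_n(θ)/(2σ²) only by a constant independent of θ. The sketch below is our own illustration (variable names and values are assumptions, not the paper's).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma, n = 1.5, 50
xs, ys = rng.uniform(-50, 50, n), rng.uniform(-50, 50, n)
s = rng.normal(10.0, 5.0, n)          # arbitrary data; the identity holds for any data

def mean_power(theta):
    eta, sp, xp, yp = theta
    d = np.hypot(xs - xp, ys - yp)
    return sp - 10.0 * eta * np.log10(d)

def neg_log_lik(theta):
    # minus the log of the Gaussian likelihood
    return -np.sum(norm.logpdf(s, loc=mean_power(theta), scale=sigma))

def M(theta):
    # squared-error objective, the sum of squared model residuals
    return np.sum((s - mean_power(theta)) ** 2)

theta = np.array([2.3, 12.0, 1.0, -3.0])
const = 0.5 * n * np.log(2.0 * np.pi * sigma**2)   # theta-independent constant
print(neg_log_lik(theta), const + M(theta) / (2.0 * sigma**2))
```

Since the two printed values agree for every θ, any minimizer of `M` is also a maximizer of the likelihood.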

Unique Solution Condition for a System of Observation Equations without Noise
In this section, we consider an extreme case related to the consistence condition for ML estimation of transmitter power, which may help to build some intuition for the main results of this paper. A similar idea can be found in the proof of Theorem 3 in [10]. By (2) and (6), at the true parameter θ we have

M_n(θ) = ∑_{i=1}^{n} w_i².   (7)

Thus, M_n(θ) = 0 if there is no noise in the model Equation (2). This further means that each term in the summation (6) equals zero, i.e.,

s_i = s_p − 10 η log10 d_ip,   i = 1, 2, . . . , n.   (8)

We naturally want these n equations in θ = (η, s_p, x_p, y_p) to have a unique solution. Mathematically, the solution is not unique if there exists another θ' = (η', s_p', x_p', y_p') ≠ θ such that the data set {(x_i, y_i)}_1^n simultaneously satisfies the following n equations:

s_i = s_p' − 10 η' log10 d_ip',   i = 1, 2, . . . , n.   (9)

Combining (8) and (9), one gets, for i = 1, 2, . . . , n,

10 η' log10 d_ip' − 10 η log10 d_ip = s_p' − s_p,   (10)

where d_ip' = ((x_i − x_p')² + (y_i − y_p')²)^{1/2} and d_ip is given by (3). Thus, the solution of the n equations in (8) is not unique if the data set {(x_i, y_i)}_1^n satisfies (10) for some θ' = (η', s_p', x_p', y_p') ≠ θ. Conversely, if the solution is unique, the whole location set can never satisfy (10) for any such θ'.
After replacing x_i, y_i by x, y in (10), a curve in the variables x, y is defined by

[(x − x_p)² + (y − y_p)²]^{η/2} = λ [(x − x_p')² + (y − y_p')²]^{η'/2},   (11)

with given parameters λ = 10^{(s_p − s_p')/10} > 0, (η, x_p, y_p) and (η', x_p', y_p'). For convenience, let us call the curve given by (11) a criterion curve with respect to (η, x_p, y_p), (η', x_p', y_p'), and λ. The class of criterion curves can be used to determine whether the solution of the system of n equations given by (8) is unique. Precisely, the condition guaranteeing uniqueness can be stated as follows: the data set {(x_i, y_i)}_1^n should not lie entirely on any single criterion curve. Below, we analyze some qualities of criterion curves in detail, first in a general setting and then in a special case.
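A quick numerical check that a criterion curve actually exists between two candidate parameter vectors: along the segment joining the two candidate transmitter locations, the noise-free received-power difference (the difference of the two sides of (10)) changes sign, so it vanishes somewhere on the segment. The sketch below reuses the locations (−2, 0), (2, 0) and exponents 1.5, 2.5 from the figures; equal transmit powers are our simplifying assumption.

```python
import numpy as np
from scipy.optimize import brentq

# Two candidate parameter vectors theta = (eta, s_p, x_p, y_p); the equal
# transmit powers are an illustrative assumption of ours.
eta1, sp1, P1 = 1.5, 20.0, np.array([-2.0, 0.0])
eta2, sp2, P2 = 2.5, 20.0, np.array([ 2.0, 0.0])

def power_diff(t):
    """Noise-free received-power difference at the point P1 + t*(P2 - P1)."""
    X = P1 + t * (P2 - P1)
    d1 = np.linalg.norm(X - P1)
    d2 = np.linalg.norm(X - P2)
    m1 = sp1 - 10.0 * eta1 * np.log10(d1)
    m2 = sp2 - 10.0 * eta2 * np.log10(d2)
    return m1 - m2

# The difference tends to +inf near P1 and -inf near P2, so a zero (a point
# of the criterion curve) lies strictly between the two candidate locations.
t_root = brentq(power_diff, 0.01, 0.99)
```

The sign change also illustrates the separation property: the criterion curve necessarily passes between the two candidate transmitter locations.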

Qualities of the Criterion Curve
Let us first point out a qualitative property of the criterion curve defined by (11). If η ≠ η', say η < η', then along the curve we have d^η = λ d'^{η'}, where d and d' denote the distances from a point of the curve to (x_p, y_p) and (x_p', y_p'), respectively. If the curve extended to infinity, then d/d' → 1, so (d/d')^η = λ d'^{η'−η} would force d' to stay bounded, a contradiction. This means that, for any given λ > 0, the criterion curve defined by (11) cannot extend to infinity. However, if η = η', the curve turns out to be a circle unless λ = 1. Another immediate observation is that the curve separates the two points (x_p, y_p) and (x_p', y_p'), due to the fact that the left-hand side of (11) is less than its right-hand side when substituting x = x_p and y = y_p into (11), with the inequality reversed when substituting x = x_p' and y = y_p'. Let us summarize these facts as a proposition below.

Proposition 1. The criterion curve defined by (11) is a straight line if η = η' and λ = 1. Otherwise, it is a bounded curve. In both cases, the curve separates the two points (x_p, y_p) and (x_p', y_p').
By Equation (11), a criterion curve is symmetric with respect to the straight line passing through the points (x_p, y_p) and (x_p', y_p'). To determine how many points a criterion curve has in common with this straight line, let us introduce a parametric representation of the line,

x = x_p + tα,  y = y_p + tβ,  t ∈ R,   (13)

where α = x_p' − x_p and β = y_p' − y_p. By substituting (13) into (11) and squaring both sides, we obtain

(t²)^η = δ ((t − 1)²)^{η'},   (14)

where δ = λ² (α² + β²)^{η'−η}. It is easy to see that Equation (14) (with respect to t) has at least two and at most four solutions for η ≠ η'. If η = η' and λ = 1, it has exactly one solution, t = 1/2; and it has two solutions for λ ≠ 1. Hence, there are three different structures for the criterion curves if η ≠ η', and two different structures if η = η'. Let us check this by numerical experiments. Setting (x_p, y_p) = (−2, 0), (x_p', y_p') = (2, 0), η = 1.5, η' = 2.5, we plot the criterion curves for λ = 0.05, 0.046487, and 0.045 in Figures 1-3, respectively. These values of λ are chosen because, under the given parameter setting, λ = 0.046487 is numerically found to be a critical value corresponding to the transition curve between the other two kinds of curves; the two values 0.05 and 0.045, on either side of the critical value, are chosen to exhibit the bifurcation phenomenon. The case η = η' will be analyzed extensively in the next subsection, so here we only summarize the case η ≠ η'.

Proposition 2. For η ≠ η', the criterion curves defined by Equation (11) can be divided into three kinds according to the number of intersection points between a criterion curve and the straight line determined by the two points (x_p, y_p) and (x_p', y_p'), i.e., the number of roots of Equation (14). Specifically, the three cases are as follows: (a) if the number is 2, the curve is a closed connected curve; a typical graph of this case is shown in Figure 1; (b) if the number is 3, the curve is a closed connected curve; a typical graph of this case is shown in Figure 2; (c) if the number is 4, the curve consists of two separated curves; a typical graph of this case is shown in Figure 3.

Qualities of the Criterion Curve if η Is Known
If the path-loss exponent η is known, the ML estimation is carried out for (s_p, x_p, y_p), as in [10]. By a similar reasoning process, the criterion curve becomes

(x − x_p)² + (y − y_p)² = λ² [(x − x_p')² + (y − y_p')²],   (15)

with given parameters λ = 10^{(s_p − s_p')/(10η)} > 0, (x_p, y_p) and (x_p', y_p'). Clearly, the curve is a straight line when λ = 1 and a circle when λ ≠ 1. Although these basic facts were already observed in [10], we describe some more general and deeper qualities in the following propositions.

Proposition 3. When λ ≠ 1, the circle defined by (15) is centered at

C = ( (x_p − λ² x_p') / (1 − λ²), (y_p − λ² y_p') / (1 − λ²) ),   (16)

and its radius equals λ/|1 − λ²| times the distance between (x_p, y_p) and (x_p', y_p'). When λ = 1, the straight line defined by (15) is perpendicular to the segment connecting (x_p, y_p) and (x_p', y_p') and passes through its midpoint.

In the circle case, the radius tends to infinity as λ tends to 1, which reflects the fact that the circle degenerates into a straight line in the limit.
With s_p and P ≜ (x_p, y_p) fixed, an important question is whether a given circle is a criterion curve and, if it is, how to find the corresponding P' ≜ (x_p', y_p'). As shown in Figure 4, given the point P and a circle centered at C with radius r, we wish to determine whether the circle is a criterion curve and, if so, to locate P'.
Let us rewrite Formula (16) in vector form:

C = (P − λ² P') / (1 − λ²).   (17)

Thus, we have

P' = (P − (1 − λ²) C) / λ².   (18)

Hence, P', shown in Figure 4, can be calculated directly from (18) once λ, P and C are known. Let us now deduce a formula for λ from the given information: the points P and C and the circle radius r. In the case λ > 1, shown in Figure 4, by the aforementioned facts it follows that |PC| = λ² |PP'| / (λ² − 1) and r = λ |PP'| / (λ² − 1); dividing the former relation by the latter, we derive

λ = |PC| / r.   (22)

Similar results can be derived in the case 0 < λ < 1. Therefore, we can now calculate P' by Formulas (18) and (22) whenever λ ≠ 1, i.e., r ≠ |PC|. In the case λ = 1, or r = |PC|, the corresponding curve is a straight line rather than a circle. Thus, if a given circle passes through the point P, it cannot be a criterion curve. Let us summarize these facts as a proposition below.

Proposition 4. Assume that points P = (x_p, y_p) and C = (x_c, y_c) and a circle centered at C with radius r > 0 are given. Then, the following assertions hold: (i) If the circle passes through the point P, i.e., r = |PC|, then the circle is not a criterion curve. Moreover, a straight line passing through P is not a criterion curve.
(ii) If r ≠ |PC|, then the circle is a criterion curve, and the corresponding P' = (x_p', y_p') in (15) can be calculated as

P' = (P − (1 − λ²) C) / λ²,   (23)

with λ = |PC| / r.
Another evident fact is that, under case (ii) of Proposition 4, exactly one of P and P' lies inside the circle while the other lies outside, as shown in Figure 4. This also explains why a circle passing through P itself fails to be a criterion curve. Moreover, three noncollinear points do not lie on any criterion curve if the circle determined by the three points passes through P.
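Propositions 3 and 4 can be verified numerically with the circle that appears in Example 3 of the experiments: taking P = (20, 0), C = (−2020/99, 0) and r = 400/99 should give λ = |PC|/r = 10 and P' = (−20, 0), and every point on the circle should be λ times as far from P as from P'. The vector formula for P' below is our reading of Formula (18).

```python
import numpy as np

P = np.array([20.0, 0.0])             # fixed transmitter location
C = np.array([-2020.0 / 99.0, 0.0])   # circle center (values from Example 3)
r = 400.0 / 99.0                      # circle radius

lam = np.linalg.norm(P - C) / r          # Proposition 4: lambda = |PC| / r
P2 = (P - (1.0 - lam**2) * C) / lam**2   # our reading of Formula (18) for P'

# Every point X on the circle should satisfy |XP| = lambda * |XP'|
phi = np.linspace(0.0, 2.0 * np.pi, 1000)
X = C[None, :] + r * np.stack([np.cos(phi), np.sin(phi)], axis=1)
ratio = np.linalg.norm(X - P, axis=1) / np.linalg.norm(X - P2, axis=1)
print(lam, P2, ratio.min(), ratio.max())
```

The constant distance ratio confirms that the circle is the Apollonius-type locus underlying the criterion curve for known η.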

Consistence Condition for ML Estimation of Transmitter Power
In this section, we deduce the consistence condition that guarantees convergence of the ML estimate of θ, i.e., θ_n → θ as n → ∞. Roughly speaking, the consistence condition is that the location set {(x_i, y_i)} should be rich enough in a certain sense. A criterion is given and proved.
Let us introduce the definition of the consistence condition, which is a requirement on the richness of the locations {(x_i, y_i)}, as indicated in the former section. The requirement is somewhat similar to the 'persistence of excitation' condition that guarantees convergence of a least squares algorithm.
Let us introduce a limit set of a sequence of point sets in terms of standard Euclidean distance.

Definition 1.
Denote a series of nonintersecting and successive index sets covering the whole set of natural numbers by {A_k, k = 1, 2, . . .}, and let S_k = {P_i, i ∈ A_k} denote the corresponding point sets. A point P ∈ R^n is called a limit point of the sequence of point sets {S_k} if, for every ε > 0, there exist j_0 > 0 and indices i_j ∈ A_j such that |P_{i_j} − P| < ε holds for all j > j_0. The set comprised of all such limit points is called the limit set of the set sequence {S_k}, denoted Lim(S_k).
For example, for an appropriate choice of the sets S_k, the limit set is Lim(S_k) = {0, −1, 1}. Below, we define the criterion curve rigorously; it has in fact already been introduced in Section 3.
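One concrete sequence with this limit set (our own construction, not necessarily the one intended in the original example) takes A_k = {3k−2, 3k−1, 3k} with S_k = {1/k, −1 + 1/k, 1 − 1/k}; the sketch below checks numerically that each point of {0, −1, 1} is approached by points of S_k as k grows.

```python
import numpy as np

def S(k):
    """Point set S_k = {1/k, -1 + 1/k, 1 - 1/k} (our illustrative choice)."""
    return np.array([1.0 / k, -1.0 + 1.0 / k, 1.0 - 1.0 / k])

limit_points = np.array([0.0, -1.0, 1.0])

# For each candidate limit point, the distance to the nearest point of S_k
# shrinks below any eps for all large k (here it equals 1/k).
dists = np.array([[np.abs(S(k) - p).min() for p in limit_points]
                  for k in range(1, 2001)])
print(dists[-1])   # distances at k = 2000
```

Since the nearest-point distance for every candidate equals 1/k, all three points are limit points and Lim(S_k) = {0, −1, 1}.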

Definition 2. A criterion curve is defined by the equation in x, y

[(x − x_p)² + (y − y_p)²]^{η/2} = λ [(x − x_p')² + (y − y_p')²]^{η'/2},   (24)

with given parameters λ = 10^{(s_p − s_p')/10} > 0, (η, x_p, y_p) and (η', x_p', y_p').
Theorem 1. Let the monitor locations be grouped into point sets S_k = {(x_i, y_i), i ∈ A_k} as in Definition 1, where each index set A_k contains at most m elements and m ≥ 3 is a positive integer. If the limit set Lim(S_k) does not lie in any single criterion curve given by (24), then the ML estimation for model (2) with likelihood function given by (5) is strongly consistent, i.e., the estimates converge to the true value with probability 1 as the number of data tends to infinity.
Proof. Theorem 3 is used to prove the above theorem. Since the model is assumed to be confined to a bounded local region, we only need to check the two conditions therein. By the likelihood function given by (5) and the density of s_i given by (4), we obtain the corresponding representation of the likelihood in terms of the density f, where the shorthand f on the right-hand side is used to keep the expressions brief. By the fact that all parameters belong to a compact set, condition (ii) of Theorem 3 is satisfied.
The continuity of f with respect to all variables is obvious. By Remark 5, we only need to verify (53). To show (53) for the density (4), it suffices to show that, for any different θ and θ', there exist s and (x, y) ∈ Lim(S_k) such that f(s; x, y, θ) ≠ f(s; x, y, θ').
Substituting (4) into (26), we find that such a pair exists exactly when there is no criterion curve containing all points of the limit set Lim(S_k). In other words, this is equivalent to saying that the limit set Lim(S_k) does not lie in any single criterion curve given by (24), which finishes the proof of the theorem.
Based on Theorem 1, the ML estimate of transmitter power may fail, even as the number of observations tends to infinity, if the whole location set {x_i, y_i}_1^∞ lies on a single criterion curve. Below are two typical cases when η is known.
(i). The set {x_i, y_i}_1^∞ lies on a straight line, which means λ = 1 and η' = η in (24). In this case, the ML estimator of the location (x_p, y_p) may tend to its mirror image with respect to the line, while the estimate of s_p still works, as in Example 2. Furthermore, if the set {x_i, y_i}_1^∞ is only asymptotically located on a straight line, the same phenomenon occurs, as shown in Example 4.
(ii). The set {x_i, y_i}_1^∞ lies on a circle, which means λ ≠ 1 and η' = η in (24). In this case, the ML estimate may tend to another location P' = (x_p', y_p') given by (23), and the estimate of s_p tends to s_p', as in Example 3.
However, even in the two cases above, it is still possible for the ML estimators to converge to the true value, because the true value is also a solution of the likelihood equation.

Numerical Algorithm and Experiments
Let us first develop a recursive numerical algorithm to minimize M_n(θ) in (6) using the Matlab function fminsearch. When the n-th ML estimate of θ has been found numerically to be θ_n, the next minimizer θ_{n+1} of M_{n+1}(θ) is searched numerically starting from θ_n. In other words, the starting point of the next search is θ_n, rather than some randomly selected value. Thus, the information obtained in the former step is reused in the next search, which may increase effectiveness.
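In Python, the same warm-start scheme can be sketched with SciPy's Nelder-Mead simplex method (the counterpart of Matlab's fminsearch); all numeric values below are illustrative assumptions of ours.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Illustrative true parameters theta = (eta, s_p, x_p, y_p) and synthetic data
eta, s_p, x_p, y_p, sigma = 2.5, 20.0, 2.0, 0.0, 0.5
xs = rng.uniform(-100, 100, 200)
ys = rng.uniform(-100, 100, 200)
s = s_p - 10 * eta * np.log10(np.hypot(xs - x_p, ys - y_p)) + rng.normal(0, sigma, 200)

def M(theta, k):
    """Objective M_k over the first k observations."""
    g, sp, xp, yp = theta
    d = np.hypot(xs[:k] - xp, ys[:k] - yp)
    return np.sum((s[:k] - sp + 10 * g * np.log10(d)) ** 2)

theta0 = np.array([2.0, 15.0, 0.0, 0.0])       # initial guess
theta_n = theta0.copy()
for k in (50, 100, 150, 200):                  # growing data set
    # Warm start: the previous estimate is the starting point of the next search
    res = minimize(M, theta_n, args=(k,), method='Nelder-Mead',
                   options={'maxiter': 5000, 'xatol': 1e-8, 'fatol': 1e-8})
    theta_n = res.x
```

Each solve begins where the previous one finished, mirroring the recursive scheme described above.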
For simplicity, the numerical experiments below are designed to investigate the performance of ML estimation of the parameters (s_p, x_p, y_p) instead of θ = (η, s_p, x_p, y_p); similar phenomena occur for the ML estimation of θ. The four numerical experiments below investigate the asymptotic performance of the ML estimation of (s_p, x_p, y_p) in model (2) as the number of observations tends to infinity. The first achieves a successful estimation, while the remaining three fail. The essential reason for the failures is that the likelihood function (5), or the equivalent function M_n(·) in (6), has multiple roots when certain conditions are not met. Note that, even in the three failure examples, the ML estimation may still work, since the true value is also a root of the likelihood function; whether it is found depends on the numerical algorithm. In Example 1, the location groups are chosen so that Lim(S_k) is comprised of four noncollinear and nonconcyclic points. By Theorem 1 and Remark 1, the ML estimation of (s_p, x_p, y_p) converges to the true values, as shown in Figure 5, which coincides with the numerical experiments.
The sets S_k do not need to contain four noncollinear and nonconcyclic points. In fact, an A_k consisting of three noncollinear points (or even three collinear points) may still work.

Example 2.
Under the same setting as Example 1, except that all locations (x_i, y_i) lie in the set {(x, y) : x = 0, y ∈ [−100, 100]}. The performance is shown in Figure 6.
The location set {(x_i, y_i), i = 1, 2, . . .} obviously lies on a criterion curve, since all points lie on the straight line x = 0. The corresponding P' = (x_p', y_p') = (−20, 0) in (15) follows from P = (20, 0). In addition, λ = 1 in this case, so, by (10), s_p' = s_p. This means that the ML estimates of s_p and y_p are unbiased, while the ML estimate of x_p may tend to x_p', as observed in the experiment. Since the mirror image of (20, 0) with respect to the line x = 0 is (−20, 0), the ML estimates of x_p may converge to −20, while the estimates of y_p and s_p still work, as shown in Figure 6.
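The mirror ambiguity of Example 2 can be reproduced directly: when all monitors lie on the line x = 0, the objective M_n takes exactly the same value at the true location (20, 0) and at its mirror image (−20, 0), for any realization of the data. Parameter values other than the locations are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
eta, s_p, sigma = 2.0, 40.0, 1.0       # assumed known exponent, power, noise

# All monitor locations on the criterion line x = 0
ys = np.linspace(-100.0, 100.0, 201)
xs = np.zeros_like(ys)

# Data generated from the true transmitter at (20, 0)
s = s_p - 10 * eta * np.log10(np.hypot(xs - 20.0, ys)) + rng.normal(0, sigma, ys.size)

def M(xp, yp, sp):
    d = np.hypot(xs - xp, ys - yp)
    return np.sum((s - sp + 10 * eta * np.log10(d)) ** 2)

# Identical objective at the true location and at its mirror image across x = 0:
print(M(20.0, 0.0, s_p), M(-20.0, 0.0, s_p))
```

Because every monitor is equidistant from (20, 0) and (−20, 0), no amount of data on this line can distinguish the two candidate locations.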
The performance of Example 3 is shown in Figure 7. The location set {(x_i, y_i), i = 1, 2, . . .} in Example 3 also lies on a criterion curve, since all points lie on the same circle, centered at (−2020/99, 0) with radius r = 400/99, as calculated by Proposition 4. The corresponding P' = (−20, 0) in (15) and λ = 10, so, by (10), s_p' = 70. This means that the ML estimate of y_p is unbiased, while the ML estimates of x_p and s_p may tend to x_p' and s_p', respectively, as observed in the experiment and shown in Figure 7.
In Example 4, the corresponding location set S_k in Theorem 1 consists of points with y ∈ [−100, 100] whose x-coordinates tend to 0. Clearly, Lim(S_k) is located on the line x = 0 and is thus contained in a single criterion curve, so a phenomenon similar to Example 2 occurs. Since the mirror image of (20, 0) with respect to the line x = 0 is (−20, 0), the ML estimates of x_p may converge to −20, while the estimates of y_p and s_p still work, as shown in Figure 8.
Similar sets of numerical experiments can be designed to make the ML estimation of θ = (η, s_p, x_p, y_p) fail in all components. The four simple examples above merely demonstrate some basic features of the proposed ML estimation algorithm.

Consistence Analysis of Maximum Likelihood Estimation
This section provides the theoretical preliminaries for the preceding analysis; it contains a general consistence analysis of the proposed ML estimation algorithm based on independent observations with different distributions. For the case of independent identically distributed observations, we refer to Theorem 1.3.3 in Chapter 1 of the book [15]; hence, Theorem 2 here can be taken as an extension of that counterpart result. Furthermore, we propose an easy way of checking the criterion in Theorem 3 below. Readers more concerned with practical applications are advised to focus on Theorem 3.
Although the ML method, suggested by R. A. Fisher, has attracted the interest of mathematicians studying its theoretical performance for a long time, quite basic issues, such as the one investigated in this paper, still remain open. Thus, this section establishes the basic strong consistence results as theoretical preparation.
Let (X, U, P_θ, Θ) be a statistical experiment, which means that (X, U, P_θ) is a probability space for every θ ∈ Θ. Let P_θ be absolutely continuous with respect to a measure µ on U, so that dP_θ/dµ = p(X, θ) plays the role of a density function. Let X be the observation data generated by the statistical experiment; then p(X, θ) can be taken as the likelihood function corresponding to the experiment and observation X. The statistic θ̂ defined by

θ̂ = arg max_{θ∈Θ} p(X, θ)   (28)

is called the ML estimator of the parameter θ based on the observation X. In the case of independent identically distributed (iid) observations X_1, . . . , X_n, where each X_i possesses the density f(x, θ) with respect to the measure µ, the ML estimator θ̂_n has the simple form

θ̂_n = arg max_{θ∈Θ} ∏_{i=1}^{n} f(X_i, θ).   (29)

Moreover, in many applications, the likelihood function depends on other varying quantities; e.g., the likelihood function (5) for the transmitter power model involves s_i and (x_i, y_i). This means that the ML estimator θ̂_n is generally of the form

θ̂_n = arg max_{θ∈Θ} ∏_{i=1}^{n} f(X_i, θ̄_i, θ),   (30)

where θ̄_i is a deterministic quantity (or vector), e.g., θ̄_i = [s_i, x_i, y_i]^T in (5). The problem now is to estimate the parameter θ from the available observations and the varying data {θ̄_i}. Naturally, some restrictions on the extra data set {θ̄_i} are required to guarantee consistence.
In the aforementioned iid case with likelihood function (29), the ML estimator θ̂_n is strongly consistent (see Theorem 1.3.3 in Chapter 1 of the book [15]), i.e., θ̂_n → θ with probability 1 as n → ∞. Below, we intend to obtain a similar result for the case with likelihood function (30).

Theorem 2.
Let Θ be a bounded closed set in R^k, and let f(x, θ̄_i, θ) be a continuous function of θ ∈ Θ for almost all x ∈ X, i = 1, 2, . . .. Let the following conditions be fulfilled for some real number q ≥ 2: (i) Denote by {A_j, j = 1, 2, . . .} a series of nonintersecting and successive index sets covering the natural numbers, with the cardinality of each A_j less than or equal to a constant positive integer m. For all θ ∈ Θ and all γ > 0, there exists a positive number κ_θ(γ) (which may depend on θ, q and γ) such that (31) holds. (ii) For any θ̄_i, δ > 0 and θ ∈ Θ, there exists a positive number ω_θ(δ) (which may depend on θ, q and δ) such that (32) holds. Then, for any fixed θ ∈ Θ, the ML estimator θ̂_n given by (30) tends to θ as n → ∞ with probability 1.
Let us briefly explain the two conditions. Condition (i) requires that, within each index set A_j, at least one density takes different values as θ changes. This means the data set {θ̄_i, i ∈ A_j} should be rich enough to make the density functions f(x, θ̄_i, θ) distinguishable for different values of θ. On the other hand, condition (ii) requires that the density function possesses a certain continuity with respect to θ.

Remark 2.
This theorem is a twofold extension of Theorem 4.3 in Chapter 1 of the book [15]. Firstly, the theorem in [15] deals with an iid likelihood function as in (29), which is just a special case of (30) without θ̄_i in the likelihood function. Secondly, a parameter q is introduced in conditions (i) and (ii), which includes the counterpart conditions therein as the special case q = 2, although q = 2 is usually a wise choice.
Before proving the theorem, let us introduce an extension of Young's inequality [16]; for its proof, we refer to [17].

Remark 4. This lemma develops an extension of the classic Young's inequality (see, e.g., [16]),

ab ≤ a^p/p + b^q/q,  a ≥ 0, b ≥ 0, 1/p + 1/q = 1, p > 0, q > 0,

which is the source of many important inequalities, e.g., Hölder's inequality and Minkowski's inequality. A corollary also follows from the proof. The proof of Theorem 2 below shares the main ideas of the counterpart proof for Theorem 4.3 in [15].

Proof of Theorem 2. Denote
Notice that the maximum operator in (31) of condition (i) is taken over the finite set A_j; thus, there exists an index i_j ∈ A_j at which the maximum is attained. Let Γ be a sphere of small radius δ located entirely in the region |u| > γ/2. We first estimate E sup_Γ Π_n(u)^{1/q}. Supposing that u_0 is the center of Γ, we bound sup_{u∈Γ} Π_n(u) from above. We then seek a suitable upper bound for the expectation of the left-hand side of (37). This can be done by splitting the right-hand side over two nonintersecting index subsets of i, denoted by A ≜ {i_j, j = 1, 2, . . . , n} and Ā ≜ ∪_{j=1}^{n} A_j − A. Taking expectation over the index set A in (37), and noticing that m is a constant integer, the required bound holds if δ is selected sufficiently small. Thus, the right-hand side of (44) tends exponentially to 0 as n → ∞. Fix γ > 0 and cover the exterior of the sphere |u| ≤ γ by M spheres Γ_j, j = 1, . . . , M, of radius δ with centers u_j. The number δ is chosen sufficiently small that all the spheres lie in the region |u| > γ/2 and that, for all j, the inequality m ω_{θ+u_j}(δ) ≤ (1/(2q)) κ_θ(γ/2) is satisfied. Then, by the definition of Π_n and (44), the deviation probabilities are summable, and the estimator converges with probability 1, which finishes the proof.
Once the two conditions of Theorem 2 are verified, the strong consistence of the ML estimation algorithm follows. However, the two conditions are inconvenient to check in applications. Below is an attempt to derive a simple and applicable criterion from the two conditions (i) and (ii) with q = 2.
Theorem 3. Let Θ be a bounded closed set in R^k, let {θ̄_i} belong to a compact set, let f(x, θ̄_i, θ) be a continuous function of θ ∈ Θ, x ∈ X, and θ̄_i, and let A_j be as in Theorem 2. Furthermore, assume that: (i′) for any different θ, θ′ ∈ Θ and every j, there exist a fixed x_0, an ε_0 > 0 and an i_0 ∈ A_j such that |f(x_0, θ̄_{i_0}, θ) − f(x_0, θ̄_{i_0}, θ′)| ≥ ε_0; (ii′) f(x, θ̄_i, θ) is continuously differentiable with respect to θ, with the derivative bounded uniformly in x and θ̄_i. Then, for any fixed θ ∈ Θ, the ML estimator θ̂_n given by (30) tends to θ as n → ∞ with probability 1.

Proof.
We only need to justify that the two conditions of Theorem 2 are satisfied with q = 2. Clearly, condition (ii′) is stronger than condition (ii) of Theorem 2, in view of the standard differential mean value theorem with respect to θ. Thus, we only need to verify that (i′) implies (i) as a special case.
Using the notion introduced in Definition 1, we further simplify condition (i′) of Theorem 3 for application.

Conclusions
We have investigated the consistence condition of Maximum Likelihood (ML) estimation for the transmit power of a primary node in a wireless network with cooperative receiver nodes. By the relevant theoretical analysis, the condition is that the location set of the receiver nodes should not lie (even asymptotically, in some sense) on a single criterion curve (a term given by Definition 2). In other words, the locations of the receiver nodes should be rich enough to guarantee the consistence of ML estimation. This condition is comparable to the persistent excitation condition used to guarantee the convergence of a least squares algorithm [14].
Moreover, we have established a theoretical analysis of the consistence condition for an ML function generated by independent yet not identically distributed random variables, and a simple criterion has been proposed for applications.
Relevant positive and negative numerical experiments were designed and conducted in order to verify the theoretical discoveries. All of these experiments have been analyzed and explained by the established theoretical results.
As future work, more general transmit power estimation scenarios are of great interest, for example, the case with more unknown parameters in the lognormal shadowing model, as well as the case in which several primary nodes are to be estimated simultaneously in the wireless network, as stated in [8]. It is believed that the rigorous theoretical results developed in this paper will remain useful and serve as a basis for handling these more general ML estimation scenarios.