Entropy vs. Energy Waveform Processing: A Comparison Based on the Heat Equation

Virtually all modern imaging devices collect electromagnetic or acoustic waves and use the energy carried by these waves to determine pixel values to create what is basically an “energy” picture. However, waves also carry “information”, as quantified by some form of entropy, and this may also be used to produce an “information” image. Numerous published studies have demonstrated the advantages of entropy, or “information imaging”, over conventional methods. The most sensitive information measure appears to be the joint entropy of the collected wave and a reference signal. The sensitivity of repeated experimental observations of a slowly-changing quantity may be defined as the mean variation (i.e., observed change) divided by mean variance (i.e., noise). Wiener integration permits computation of the required mean values and variances as solutions to the heat equation, permitting estimation of their relative magnitudes. There always exists a reference, such that joint entropy has larger variation and smaller variance than the corresponding quantities for signal energy, matching observations of several studies. Moreover, a general prescription for finding an “optimal” reference for the joint entropy emerges, which also has been validated in several studies.


Introduction
All applications of imaging technology begin with collection of a flux, either electromagnetic [1] or acoustic [2], and nearly all images produced by current technology, e.g., for remote sensing or medical imaging, are representations based on some type of scattered energy. The reasons for this are mainly historical, since the detectors are typically developed by physicists and electrical engineers who are intimately acquainted with the wave equation and transmission line theory where conservation of energy is a central concept. On a more practical level, transduction elements are frequently characterized in terms of energy conversion efficiencies, making it natural to think of the subsequent image formation process in terms of either electromagnetic or acoustic field energy. From this perspective, if the received energy arriving during some small interval of time must be reduced to a single number in order to compute an image pixel value, energy is the obvious choice.
For instance, in the field of medical ultrasonics, tumor detection and tissue classification with ultrasound remain a highly useful and clinically relevant approach (liver, kidney, prostate, breast, heart, eye, thyroid, pancreas, gall bladder, etc.). A number of teams, including those of Insana [3][4][5], Forsberg [6,7], Deng [8], and others, have pursued novel data acquisition and reduction schemes (spectral, cepstral, wavelet, elastographic, harmonic, etc.) to augment diagnostic power from traditional radio frequency data, and progress continues apace. Nevertheless, although hardware improvements in clinical imaging systems over the last 50 years have dramatically improved the ability of ultrasound to display tissue features, resident signal processing algorithms have not evolved much beyond fundamental presentation of the energy of backscattered compressional waves.
Since all detectors have a finite response time, their output is essentially an integral of the incoming flux that they measure taken over a very short time interval. As such, it is natural to ask if other integrals or functionals of the incident flux might also have utility if represented as images [9][10][11][12][13][14].
In fact, numerous experimental studies have demonstrated the utility of information theoretic quantities for this type of analysis of experimentally-measured waveforms. In the standard application of information theory, as initiated by Shannon, the random variables (i.e., waveforms, which we will denote f and g defined on [0, 1]) are assumed to have the same underlying distribution [15]. However, additional assumptions, such as differentiability, permit computation of distributions, both individual and joint, directly from the measured waveforms. Operationally, the ability to differentiate experimentally-measured waveforms, which contain noise, requires regularization. For our studies, this is accomplished using optimal smoothing splines [16]. Differentiability combined with regularization then permits computation of the distributions for individual waveforms, or pairs of waveforms in the case of joint distributions. Information-theoretic quantities may then be computed from these distributions in an approach that corresponds more closely to that initiated by Kolmogorov and Chaitin, where entropy is a measure of the intrinsic complexity of individual mathematical objects [17,18]. The underlying assumption in experimental applications of these quantities is that the waveforms, usually acquired in scattering measurements, faithfully capture the complexity of the interrogated scattering architectures. The merit of this strategy has been demonstrated in several experimental studies, which have investigated the sensitivities of several entropies for the detection of small changes in waveforms acquired in acoustic experiments [19][20][21][22][23][24].
To date, the most sensitive of these is a joint entropy of two waveforms, one acquired in an acoustic backscatter (i.e., an echo) measurement from an experimental specimen, f(t), and the other a reference, g(t), which may be obtained separately by experiment or by theoretical modeling [25]. To compute this entropy for differentiable waveforms f(t), g(t), we must calculate their joint distribution w f,g , which is not a function, but a tempered distribution [25]. Thus, calculations based on w f,g require "coarse-graining" on a uniform grid of C × C squares covering the x, y-plane to obtain a discrete joint probability distribution p C (j, k) from w f,g by integrating its product with smoothed versions of the characteristic functions of these squares (Equations 40 to 44 of [25]). This is followed by a limiting process where the grid size, C, is taken to zero. For instance, the calculation of the joint entropy begins with the following computation: (1) In the limit where C → 0 and f, g are piecewise differentiable functions on [0,1] (without any intervals of constancy), Equation (1) becomes [25]: (2) In this note we determine conditions on g(t) that maximize the sensitivity of H f,g to small changes in f(t). This requires a lengthy calculation.
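As an illustration only, the coarse-graining step described above can be mimicked numerically by binning sampled values of f and g on a square grid and computing the Shannon entropy of the resulting discrete joint distribution. This is a minimal sketch under stated assumptions: it uses a plain histogram rather than the smoothed characteristic functions of [25], the function name `coarse_joint_entropy` and the parameter `cells` (the number of grid cells per axis, so the cell width shrinks as `cells` grows) are hypothetical, and no limit of vanishing grid size is taken.

```python
import math


def coarse_joint_entropy(f_samples, g_samples, cells):
    """Shannon entropy (in nats) of the joint histogram of two sampled waveforms.

    Assumption: a plain histogram on a cells-by-cells grid stands in for the
    smoothed coarse-graining used in the paper.
    """
    if len(f_samples) != len(g_samples) or not f_samples:
        raise ValueError("need two non-empty sample lists of equal length")
    f_lo, f_hi = min(f_samples), max(f_samples)
    g_lo, g_hi = min(g_samples), max(g_samples)

    def cell_index(value, lo, hi):
        # Map a value to a cell number in 0..cells-1; a constant waveform
        # occupies a single cell.
        if hi == lo:
            return 0
        k = int((value - lo) / (hi - lo) * cells)
        return min(k, cells - 1)

    counts = {}
    for fv, gv in zip(f_samples, g_samples):
        key = (cell_index(fv, f_lo, f_hi), cell_index(gv, g_lo, g_hi))
        counts[key] = counts.get(key, 0) + 1

    n = len(f_samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

Sample pairs that spread evenly over four cells give an entropy of log 4, while a pair of constant waveforms gives zero.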
We will be comparing this sensitivity to that of the signal energy, E f , of f(t) defined by: (3) Numerous studies have shown that entropy signal receivers are always at least as sensitive as E f is to small changes in f (see Equation (5)). In fact, in many cases, it is actually much more sensitive [19][20][21][22][23][24][25].
A typical result for materials characterization is shown in Figure 1 [26], which shows images of a graphite/epoxy composite laminate scanned using a 2.5-MHz transducer on a 101 × 101 point grid. The backscattered ultrasound was digitized for off-line analysis. The peak-to-peak image was produced using the peak-to-peak amplitudes of the waveforms. The rest were produced using a moving window (128 points long) analysis to produce a stack of images corresponding to different depths whose minima were then projected onto a single image-plane to permit rapid analysis of the entire image set. This projection scheme is frequently used to reduce the amount of data that must be inspected or in the case where the defect is not confined to a narrow range of depths. The E f image was produced using Equation (3). The H f [20], I f, ∞ [27], H f,g ([25] or Equation (2)) and H f,gO are entropy images.

The Main Result
All of our studies have been based on the (fairly typical) situation where an experimentalist acquires waveforms of a few microseconds duration over a much greater period of time spanning minutes or longer. Thus, there are two time scales: the long experimental time scale and a much shorter measurement time scale (which we have parametrized on the interval [0, 1], the domain of f(t) and g(t)). If we denote a measurement at time t by M t , then the sensitivity is defined by [39]: (4) It is essentially the noise-normalized change in a physical quantity at different measurement times. The situation also describes measurements where data (i.e., f(t), g(t)) are acquired at different times between which the measurement device has been moved; for instance, Figure 1, where the experimental times map to different spatial locations in the experimental specimen.
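The idea of a noise-normalized change can be illustrated with a small numerical sketch. The exact form of Equation (4) is not reproduced here, so the estimator below is an assumption: it divides the absolute difference of the mean receiver values at two experimental times by their pooled sample standard deviation. The function name `sensitivity` is hypothetical.

```python
import math


def sensitivity(values_a, values_b):
    """Assumed estimator: |difference of means| / pooled standard deviation.

    values_a, values_b: repeated receiver outputs (e.g., energy or entropy)
    recorded at two experimental times. Each list needs at least two values.
    """
    mean_a = sum(values_a) / len(values_a)
    mean_b = sum(values_b) / len(values_b)
    var_a = sum((v - mean_a) ** 2 for v in values_a) / (len(values_a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in values_b) / (len(values_b) - 1)
    pooled = math.sqrt(0.5 * (var_a + var_b))
    if pooled == 0.0:
        raise ValueError("noise estimate is zero; sensitivity is undefined")
    return abs(mean_b - mean_a) / pooled
```

Under this convention a receiver is more sensitive when the same physical change produces a larger shift in its mean output relative to its scatter.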
For evaluation of receiver sensitivities in theoretical studies, we will maximize the similar quantity: (5) where the product σς(t) captures the impact of Gaussian noise on the experimental measurement (ς(t) is a functional of a Brownian path (see Equation (15)) and σ is a scalar representing the signal-to-noise ratio). On the other hand, the perturbing function εη(t), where ε is small and η ∈ C 1 [0, 1], models the effect of the variation of the scattering architecture, as captured in Figure 1. Together, these account for the specific form of the measured function, which we write as (f + εη +σς) (t); all of these functions will be discussed fully in the next section. The experimental change referred to prior to Equation (4) is then quantified mathematically by the directional derivative: (6) The measure of noise is: (7) where all of the means are computed as Wiener integrals as discussed below. Similar definitions hold for the signal energy E f .
After carefully defining the quantities f(t), η(t), σ, ς(t) in terms of a physical measurement model having finite time resolution width Δ ≪ 1, we calculate the variations δH f+σς,g [η] and δE f+σς [η] in Section 4. The mean values of these are then calculated in Section 5 as Wiener integrals. The joint entropy is considered first in Section 4.1. After deriving the mean variation as a Wiener integral, we rewrite it in Section 5.1.1 as a Lebesgue integral. This integral represents a family of solutions to the heat equation with initial conditions defined on the real line. However, the noise level plays the role usually held by time.
The results of the calculations may be summarized as:
In nearly all practical situations, such a t 0 exists. We prove this theorem by constructing an example reference at the end of Section 5.1.1.
There are actually many such g(t). A good way to construct examples is, roughly, to pick a zero, t 0 , of f′(t) and have g′(t) be a non-zero constant on one side of t 0 , while on the other side, it differs from f′(t) by only a small constant 1 ≫ ω > 0. In the limit as ω → 0, | δH f+σς,g [η]| > 2|η′(t 0 )| log[1/ω]/f″(t 0 ). Exact details are provided in the proof, where a bound on the size of g(t) is also provided.
On the other hand, the variation for the signal energy, provided by Equation (81), is calculated in Section 4.2. If f and η are bounded, then in the noise-free limit, it is (see Equation (51)): (8) which is bounded.
Thus, if η′(t 0 ) ≠ 0, there exists g, such that: (9) There are many theorems describing the behavior of solutions to the heat equation for small time discussed in Section 5.1.2. These results are used in Section 6 to prove:

The Physical Setup
We assume that we have an experimental system that is measured by some means, e.g., interrogated by waves, π(t), of some sort, such as microwave or ultrasonic radiation, and that we collect some portion of the waves scattered in this system to obtain both perturbed, f pert. (t), and unperturbed functions, f unpert. (t), for times normalized by the appropriate choice of units to lie between zero and one. The difference |f unpert. (t) − f pert. (t)| is assumed to be small and to be caused physically by a change in the system as measured at different spatial locations (as in Figure 1) and/or different times if the system is evolving. We also assume that there is Gaussian distributed noise in the system and that the noise functions for all measurements are represented by Brownian paths, which we will always denote by x(t) or, when there are multiple Brownian paths that must be distinguished, by x i (t). Moreover, we assume that these functions, which all have a variance of one, are scaled by an experimentally-determined signal-to-noise ratio, σ, which will typically have positive values much less than one-tenth.
If the system is linear, with initial impulse-response function Λ(t) and final perturbed impulse-response function Λ(t) + ελ(t), these functions may be represented mathematically as: (12) If the system is non-linear, these equations become approximate representations of f unpert. (t) and f pert. (t) in the case where the perturbation to the system is small.
All experimental systems have finite time resolution. In addition, an experimentalist may deliberately signal average successive measurements to cancel noise. Both of these facts may be expressed using convolution of ideal experimental measurements, such as those in Equation (12), with a finite measurement function, μ Δ (t) of limited time duration, i.e., 1]. This requires us to replace Equation (12) by: (13) and: (14) where, (15) with:
Hughes et al. Page 7
Physically, f(t) is the "well-behaved", i.e., noise-free and smooth, part of the linear response of the experimental system (as described by Λ(t)) to the probing "waveform", π(t 1 ), while η(t) is the noise-free and smooth change in the system, resulting from the perturbation of the system, as represented by the function λ(t).
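The effect of a finite measurement window can be sketched as a discrete convolution with a boxcar. The following is a minimal illustration, not the measurement model of Equations (13) to (15): the function name `boxcar_average` is hypothetical, the window is normalized to unit area (the normalization of μ Δ in the paper may differ), and the window is simply shortened at the start of the record.

```python
def boxcar_average(samples, width):
    """Trailing moving average over `width` samples.

    Each output is the mean of the current sample and up to width-1
    preceding samples; near the start the window is shortened.
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    out = []
    running = 0.0
    for i, value in enumerate(samples):
        running += value
        if i >= width:
            # Drop the sample that has just left the window.
            running -= samples[i - width]
        out.append(running / min(i + 1, width))
    return out
```

Averaging over a window of this kind suppresses noise that fluctuates faster than the window width while leaving slowly varying structure largely unchanged.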
We will additionally assume that, (17) for later use.
Later, we will also need ς′(t): (18) with: (19) We will also assume that μ Δ (t) is non-negative; typically, a unit step function of width less than one whose derivatives we will treat in the sense of distributions. We have from Equation (14): (20) which we will need for the evaluation of Equation (2).

Characteristics of the Measurement Window μ Δ (t) Relevant to the Variation and Variance of H f+εη+σς,g
Up to this point, we have not specified the experimental window function μ Δ (t). While many different choices for this function are possible, the most common is a unit height step function of width much smaller than the total time of the experimental measurement, which we have scaled to be one. Experimental conditions usually provide for at least sixteen sample points in this window. This is motivated by the experimental goal of selecting digitizer equipment and settings so that the experimental window function is very small compared to the scale over which the input waveform exhibits significant change. Consequently, for typical measurements, we have values of ‖m t ‖ ≪ 1/16. To make these comments more precise, we will define: (21) where χ (0,Δ] (t) is the characteristic function of (0, Δ]. This function has finite total variation and m t (1) = μ Δ (t − 1) = 0, for all t ∈ [0, 1], which we must always have for technical reasons (see the Appendix on Wiener integrals Equations (128) and (146)). Later, we will also need, for the calculation of the variance of joint entropy and its variation, the relation (jointly continuous in s and t): (22) and using ‖m t ‖ to denote the L 2 -norm on [0, 1]: (23)
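As a worked illustration of why overlapping windows matter, suppose (this is an assumption, since Equations (19) and (21) to (23) are not reproduced here) that m t (s) = μ Δ (t − s) with a unit-height window, and that both windows lie entirely inside [0, 1]. The inner product is then just the length of the overlap of two intervals of width Δ:

```latex
\langle m_{t_1}, m_{t_2} \rangle
  = \int_0^1 \chi_{(0,\Delta]}(t_1 - s)\,\chi_{(0,\Delta]}(t_2 - s)\,ds
  = \max\bigl(0,\ \Delta - |t_1 - t_2|\bigr),
\qquad \Delta \le t_1, t_2 \le 1 .
```

In particular, the inner product vanishes once |t 1 − t 2 | exceeds Δ and is continuous in both arguments. A different normalization of μ Δ would rescale this expression by a constant factor.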

Characteristics of the Measurement Window μ Δ (t) Relevant to the Variation and Variance of E f+εη+σς
If we define: (24) and: (25) Then, the last term in Equation (13) may be written as: (26) We also note for later use: (27) In particular, if 0 ≤ t 1 , t 2 ≤ 1, then we have the order relation: (28)

Calculation of the Variation, δH f+σς,g (η)
We now calculate the average change in Equation (2) when f(t) is perturbed by the function εη(t) + σς(t) as shown in Equation (20). The calculation is broken into two parts.

The First Term
To begin with, we will focus on the first term in Equation (2). Using the terms defined above this is: (29) Let A(ε) denote the set of points where . Similarly, denote the corresponding set of points where by B(ε). We will additionally assume that the sets of critical points of f(t) and g(t) are disjoint, so that χ A(ε) (t) ≠ 0 ⇒ g′(t) ≠ 0 and χ B(ε) (t) ≠ 0 ⇒ f′(t) ≠ 0. The indicator functions may be expressed in terms of Heaviside functions, H(t), as: (30) Using these conventions, the first term in Equation (29) becomes: (31) Now, (32) where we have used the operational relation for Dirac delta functions: xδ(x) = 0. In addition, if f′(t) + εη′(t) + σς′(t) ≠ 0 (which is true if t ∈ B(ε), since the sets of critical points of f(t) and g(t) are disjoint), we similarly have: (33) Using these relations, we differentiate Equation (31) with respect to ε as the first step in obtaining the variation, V I , of the first term:
(2), is obtained by first computing the derivative: (40) As in the case for the first term, the variations at ∂A(ε) and ∂B(ε) cancel in pairs, since |f′(t) + εη′(t) + σς′(t)| = |g′(t)| at the boundary points; hence, the derivative is: (41) Now, we set ε = 0 to obtain the variation, V II , for the logarithmic part: (42)

Total Variation
The total variation, V, is now: (43) where: (44) is continuous in u for fixed υ and continuous in υ, except for a discontinuity across υ = 0. Figure 3 contains a plot of a typical G(u, υ). Moreover, for fixed υ, G(u, υ) is an odd function of u, a fact that will be significant later.
For later use, we record the alternate forms of G(u, υ):

Author Manuscript
(45) and: (46) We also record for later use the first and second partial derivatives of G(u, υ) with respect to u, (47) which is continuous in u and, therefore, yields upon further differentiation with respect to u the distribution-free second partial derivative,

Calculation of the Variation, δE f+σς (η), for E f
We now wish to characterize the signal receiver for energy, which is given by Equation (3). The perturbed signal receiver value is: Following the usual steps, we differentiate with respect to ε: The variation δE f+σς [η] is now given by:
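The steps just described can be written out explicitly under one assumption: that Equation (3) is the conventional signal energy, E f = ∫ f(t)² dt over [0, 1]. With that definition (which is not reproduced in this text and should be checked against Equation (3)), the perturbed receiver value, its ε-derivative and the variation at ε = 0 are:

```latex
\begin{aligned}
E_{f+\varepsilon\eta+\sigma\varsigma}
  &= \int_0^1 \bigl(f(t) + \varepsilon\eta(t) + \sigma\varsigma(t)\bigr)^2\,dt, \\
\frac{d}{d\varepsilon} E_{f+\varepsilon\eta+\sigma\varsigma}
  &= 2\int_0^1 \bigl(f(t) + \varepsilon\eta(t) + \sigma\varsigma(t)\bigr)\,\eta(t)\,dt, \\
\delta E_{f+\sigma\varsigma}[\eta]
  &= 2\int_0^1 \bigl(f(t) + \sigma\varsigma(t)\bigr)\,\eta(t)\,dt .
\end{aligned}
```

The noise enters this variation only linearly, so averaging over zero-mean noise leaves an expression with no dependence on σ, and the result is bounded whenever f and η are bounded.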

Calculation of the Average Variation, 〈δH f+σς,g (η)〉, by Wiener Integration
We now compute the average variation or expectation value of the variation over the space of noise functions, i.e., we may average Equation (43) where: Finally, where the last integral is obtained by the change of variables: This can be further simplified as: where, based on the discussion above, typical experimental conditions will lead to σ being on the order of 10⁻⁶.

The Average Variation, 〈δH f+σς,g [η]〉, and the Heat Equation
We observe that the bracketed term in Equation (58) is the Green's function for the heat equation [41,42], defined on ℝ, i.e.,
Thus, we have: where: is a point on one member of a family of heat surfaces that are determined by the heat equations, with different (in fact, |g′(t)|-dependent) "initial conditions" (i.e., at s = 0) defined by G(z, |g′ (t)|). A typical example is shown in Figure 4.
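The qualitative behaviour of such a heat surface can be explored with a short numerical sketch. It rests on several assumptions: the heat equation is taken in the normalization u_s = u_zz, so the kernel is exp(−x²/(4s))/√(4πs) (the paper's scaling of s may differ), and the odd, bounded function `toy_initial` is a hypothetical stand-in for the initial condition G(z, |g′(t)|), which is not reproduced here. All function names are illustrative.

```python
import math


def heat_kernel(x, s):
    """Fundamental solution of u_s = u_zz on the real line, for s > 0."""
    return math.exp(-x * x / (4.0 * s)) / math.sqrt(4.0 * math.pi * s)


def heat_solution(initial, x, s, half_width=8.0, steps=4000):
    """u(x, s) as the convolution of the kernel with the initial condition.

    Midpoint rule on [x - L, x + L], where L is half_width standard
    deviations of the kernel (its standard deviation is sqrt(2 s)).
    """
    reach = half_width * math.sqrt(2.0 * s)
    dz = 2.0 * reach / steps
    total = 0.0
    for i in range(steps):
        z = x - reach + (i + 0.5) * dz
        total += heat_kernel(x - z, s) * initial(z)
    return total * dz


def toy_initial(z):
    """Hypothetical bounded, odd initial condition (NOT the paper's G)."""
    return -math.tanh(z)
```

For this toy initial condition the solution vanishes at z = 0 by odd symmetry, approaches the initial condition as s → 0⁺, and never exceeds the supremum of the initial data in magnitude, which is the behaviour attributed to the heat surfaces in the text.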
Since for small noise, s, the heat kernel is approximated well by a Dirac delta function, we might expect that in the limit s → 0 + : (63) in which case, Equation (60) becomes: which we will shortly show is unbounded for certain choices of g(t). A good way to construct such g(t) is to pick a zero of f′(t) and have g′(t) differ only slightly from f′(t) on one side of the zero and differ greatly from it on the other side. Referring back to Equation (78), if we assume that |g′(t)| ≩ 0, then G(z, |g′(t)|) ∈ L 2 [ℝ], and there are a large number of theorems describing the behavior of u |g′(t)| (x, s). In particular, a limiting form of Theorem 5 in Chapter 5 on page 67 of [42], where the vertical sides are pushed out to ±∞, guarantees that as s → 0, u |g′(t)| (z, s) → G(z, |g′(t)|). Figure 4 also illustrates the useful inequality: (65) With this result, we are now ready to prove:

Proof:
We shall consider the case f″(t 0 ) > 0; the other case is similar.
We extend g to be piecewise C 3 [0, 1] off (t 0 − ξ, t 0 + ξ) and such that g′(t) ≥ 1 for all t ∈ [0, 1]\[t 0 − ξ, t 0 + ξ), except possibly finitely many points where it does not exist. The exact details of the extension do not matter, since contributions from the extension will be negligible compared to those from (t 0 − ξ, t 0 + ξ), as long as we do not choose a reference g(t) that violates the conditions discussed after Equation (36). Figure 2 illustrates how the extension might be chosen in one case.
We note, however, that in order to keep the variance of the variation small, so that we maintain high sensitivity (Equation (5)), we must keep as indicated in Equation (109). This provides a practical bound on the magnitude of the variation.
We also record for later use the calculation of the maximum magnitudes of the first and second partial derivatives of G(u, υ) with respect to u as: (75) and: (76) Given the structure of the integral in Equation (61), we see that u |g′(t)| (f′(t), s) → 0 as s → ∞.
Moreover, (77) A later analysis of the signal energy E f will reveal a different "initial condition" (these "initial conditions" refer to the noise-free case), which characterizes that receiver.
The structure of G(z, |g′(t)|) is more clearly seen in Equation (46), which we recall: (78) We note that G(z, |g′(t)|) is determined by the mathematical form of the signal receiver, in this case H f,g , as well as the reference waveform g′(t).
As shown in Figure 4, the supremum of the initial conditions may be made arbitrarily large by making g′(t) smaller. The effect of this change is reflected in the structure of the solution (heat) surface, as shown in Figure 5. The figure also shows that for all z and s: (79) which is consistent with the maximum property for solutions to the heat equation (discussed in Chapter 2, Section 3 of [42]).

Calculation of the Average Variation 〈δE f+σς (η)〉
From Equation (51), the average variation is the Wiener integral: (80) which becomes: (81) which has no dependence on the noise level σ and, thus, no heat surface.
We also observe that the integral in Equation (81) is the variation for the signal energy receiver in the noise-free analysis.
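The absence of any σ-dependence in the averaged energy variation can be checked with a small Monte Carlo experiment. This is a sketch under assumptions: the variation is taken to be 2∫(f + σς)η dt (which presumes E f = ∫ f² dt), independent standard Gaussian samples stand in for ς(t) (in the paper ς is a functional of a Brownian path), and f = η = sin(2πt) is an arbitrary test choice. The function names are hypothetical.

```python
import math
import random


def energy_variation(f_vals, eta_vals, noise_vals, sigma, dt):
    """Riemann-sum approximation of 2 * integral of (f + sigma*noise) * eta dt."""
    return 2.0 * dt * sum(
        (f + sigma * n) * e for f, n, e in zip(f_vals, noise_vals, eta_vals)
    )


def mean_energy_variation(sigma, trials=2000, points=100, seed=0):
    """Average the variation over many zero-mean Gaussian noise realizations."""
    rng = random.Random(seed)
    dt = 1.0 / points
    times = [(i + 0.5) * dt for i in range(points)]
    f_vals = [math.sin(2.0 * math.pi * t) for t in times]    # test waveform
    eta_vals = [math.sin(2.0 * math.pi * t) for t in times]  # test perturbation
    total = 0.0
    for _ in range(trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(points)]
        total += energy_variation(f_vals, eta_vals, noise, sigma, dt)
    return total / trials
```

For this test pair the noise-free value is 2∫ sin²(2πt) dt = 1, and the sample mean stays close to that value as σ is increased, with only the scatter about it growing.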

I. Then, there exists a constant K 1 that depends on ‖η′‖ ∞ , but not on g or σ, such that: (82)
II. There exists K 2 > 0, that depends on ‖η‖ ∞ , but not on Δ or σ, such that: (83)
We break the proof into two parts.
Proof-Part I: Calculation of the variance of δH f+σς,g [η] (84) Recalling Equation (43), the Wiener integral on the right-hand side of Equation (84) becomes: Expanding the square as a double integral (and using Equation (18) for ς′(t) and Equation (19) for m t ), the Wiener integral may be written as: (86) where, since G(u, υ) is jointly continuous in u, υ, as before, we can interchange the order of integrations. The inner Wiener integral may be replaced by a Lebesgue integral, using Equation (146). It becomes: (87) where (from Equation (143)):

We now rewrite the right-hand side of Equation (92) in terms of solutions of the heat equation, as was done following Equation (60), and find that Equation (92) equals: (94) where: (95) where we have used the change of variables: (96) and Equation (61) to go from the first to the second equations.
We now observe that, by Equation (22), 〈m t 1 , m t 2 〉 = 0 when |t 1 − t 2 | > Δ, in which case: (97) and so, Equation (94) may be written as: The functions appearing in Equation (98) are based on integrals that have the form: (99) where limiting behavior as s i → 0 for i = 1, 2 may be found using a Laplace expansion, as described in Equation 6.4.35 of Bender and Orszag [43], and Equation (99) holds, provided h is C 5 in a neighborhood of a.
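For orientation, the small-s behaviour of an integral of this type can be written down explicitly in one common normalization. Assuming the heat kernel exp(−x²/(4s))/√(4πs), which may differ from the scaling used in Equation (99), and assuming h is smooth enough near a and does not grow too quickly, the even Gaussian moments give:

```latex
\int_{-\infty}^{\infty} \frac{e^{-(a - z)^2/(4s)}}{\sqrt{4\pi s}}\; h(z)\,dz
  = h(a) + s\,h''(a) + \frac{s^2}{2}\,h^{(4)}(a) + o(s^2),
\qquad s \to 0^+ .
```

The odd-order terms vanish by symmetry of the kernel, which is why only even derivatives of h appear and why corrections enter in integer powers of s.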
Consequently, we see that, after rewriting the first integral as a product of a t 1 and a t 2 integral and using Equations (90), the difference of the first and the last terms in Equation (98) is of order σ 4 in the limit s 1 , s 2 → 0 and may therefore be dropped.
Consequently, we focus attention on the remaining integrals. Equation (99) enables this refinement of the limiting form of Equation (63): (100) where the n-th-partial derivative of G(u, υ) with respect to the first argument is denoted by G (n) (u, υ) and is guaranteed to exist for all n at all but finitely many points by the assumption that |g′(t)| ≠ |f′(t)| at all but finitely many points. Moreover, since |g′(t)| is bounded away from zero by hypothesis, there is a uniform bound on G (4) (f′(t), |g′(t)|) off the set of points where |g′(t)| = |f′(t)|.
Using (105) and (90), the difference between the second and third integrals appearing in Equation (98) gives: (106) to accuracy O[σ 4 ]. This may be further simplified, introducing errors of O[2Δ] by truncating the upper bound of integration for t 1 in the last two integrals from 1 to 1 − 2Δ, so that by Equation (23), ‖m t ‖ = 1, and the last two pieces cancel, leaving: Focusing on the product of integrals in the second row, we see that at the cost of an additional error term of O[Δ], we may replace ‖m t ‖ by 1 in the t 1 integral and then use Schwarz's inequality to bound the integral by the norms shown. To bound the t 2 integral, we use Equations (17) and (75) to obtain: (107) Focusing next on the integrals in the first row of the inequality, we use Schwarz's inequality to bound the t 1 integral and Equations (17), (76) and (89) to bound the t 2 integral to obtain: (108) Since the reference satisfies the constraint: (109) where α > 0, then we have: (110) This simplifies to our final bound: (111) where is a constant independent of g, η, Δ and σ. Let , and we are done.
We now rewrite Ω as: (120) In the second integral the z 1 integration is of an odd integrand over a symmetric interval and, hence, vanishes, so that we obtain after some rearrangement and simplification of the remaining Gaussian integrals: where we have used Equation (116) to obtain the last equation.
Inserting the limiting form of the integral in Expression (117) for small positive s 1 , s 2 into Equation (112) and using Equation (119), we obtain: (123) where K 2 is a constant independent of Δ and σ, but depending on η.
This completes the proof of the theorem.

The First Wiener Integral
The first type of Wiener integral is described by the following: and let F(u) be a (real or complex) measurable function defined on −∞ < u < ∞. Then, a necessary and sufficient condition that: (126) be a Wiener measurable function of x(•) over C 0 [0, 1] is that: (127) be of class L 1 on −∞ < u < ∞. Moreover, if this condition is satisfied, We derive this equation below. Published results of a similar form may be found in [40], Theorem 29.7 (n = 1 case and assuming that ρ is normalized to unity). Other references are either Koval'chick [44] (page 106, Equation (13)) or Cameron and Martin [45] (page 393, Equation (6.3)). The same result is derived by an argument that will be familiar to many physicists in Paley, Wiener and Zygmund (see [46] Equation (2.11)), although the result derived there also assumes that the ρ is normalized to one. There is a difference of between the result contained in Equation (128)

The Second Wiener Integral
The second type of Wiener integral we need is: (131) where ρ i (1) = 0 for each index i and the ρ i are orthonormal. Similar versions of this integral appear in many sources, for instance in Koval'chick [44] (page 107, Equation (14)), which contains (after transcription into the modern conventions): (132) This form appears to be derivable from Equation (131) using integration by parts. Specifically, We would formally transform this into the form of Equation (131) by the following steps: First observe that the integral in Equation (133) is equal to: Next, use the facts that we have assumed that ρ i (1) = 0 and, additionally, that the Brownian paths are normalized according to x(0) = 0, so that (classical) integration by parts of the integrals in the first line of Equation (134) would yield: (135) Although this "derivation" shows that the two forms are equivalent, as desired, it overlooks the fact that the integrals in Equation (132) cannot be classical integrals. In fact, Wiener [47] (p. 68) states that integrals of the form: (136) are actually Itō integrals. The correct integration-by-parts formula in this case is: (137) where [X, Y] t is the quadratic covariation process with: where P ranges over partitions of the interval [0, t] and the norm, ‖P‖, of the partition, t 0 < ⋯ < t n , is the mesh, i.e., max{|t i − t i−1 | : i = 1, …, n}.
However, in the case where Y is of bounded variation: (139) so that Equation (137) reduces to the classical integration by parts formula. Thus, the derivation above is (accidentally) correct.
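The vanishing of the quadratic covariation for a bounded-variation integrand can be seen numerically. The sketch below simulates one discretized Brownian path on [0, 1] and compares the discrete sums for [X, X] and [X, Y], where Y(t) = t² is an arbitrary smooth (hence bounded-variation) test function. The function names, step count and seed are illustrative choices, and a single simulated path is of course not a proof.

```python
import math
import random


def brownian_path(steps, rng):
    """Discretized Brownian path on [0, 1] with x(0) = 0."""
    dt = 1.0 / steps
    path = [0.0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def discrete_covariation(xs, ys):
    """Sum of products of increments over a common partition."""
    return sum(
        (xs[i] - xs[i - 1]) * (ys[i] - ys[i - 1]) for i in range(1, len(xs))
    )


rng = random.Random(1)
steps = 20000
x = brownian_path(steps, rng)
y = [(i / steps) ** 2 for i in range(steps + 1)]  # smooth test function

qv_xx = discrete_covariation(x, x)  # approximates [X, X]_1 = 1
qv_xy = discrete_covariation(x, y)  # approximates [X, Y]_1 = 0
```

As the mesh is refined, the first sum settles near the length of the interval while the second shrinks toward zero, which is the reason the classical integration-by-parts formula remains valid in this case.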
The number of sources containing detailed derivations of these equations in English appears to be limited. The only source we have been able to locate is Paley, Wiener and Zygmund, which is completely self-contained and contains the equivalent of our Equation (131) (see Equation (2.14) in [46]), although the result derived there assumes that the measure is normalized to one and uses a slightly different notation for the Brownian paths.
We need to compute Wiener integrals like those on the left-hand side of Equation (131) in the case where the ρ k (t) are not orthonormal. Moreover, we only need to consider the special form: To apply Equation (131), we use the Gram-Schmidt process to obtain an orthonormal family ν 1 (t), ν 2 (t) from the original ρ 1 (t), ρ 2 (t). Using the short-hand notation: The Gram-Schmidt process is: (142) where: (143) which is expressed in matrix form: (144) or: (145) This permits us to rewrite Equation (140) as: to which (since the ν k (1) = 0, k = 1, 2) we may now apply Equation (131) to obtain:
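The two-function Gram-Schmidt step can be sketched on a discretized interval as follows. This is an illustration under assumptions: the L² inner product on [0, 1] is approximated by a midpoint sum, the function names are hypothetical, and ρ 1 (t) = 1 − t and ρ 2 (t) = (1 − t)² are arbitrary test functions chosen only because they vanish at t = 1.

```python
import math


def inner(u, v, dt):
    """Midpoint-sum approximation of the L2 inner product on [0, 1]."""
    return dt * sum(a * b for a, b in zip(u, v))


def gram_schmidt_pair(rho1, rho2, dt):
    """Orthonormalize two sampled functions; returns (nu1, nu2)."""
    norm1 = math.sqrt(inner(rho1, rho1, dt))
    if norm1 == 0.0:
        raise ValueError("rho1 is identically zero")
    nu1 = [a / norm1 for a in rho1]
    # Remove the component of rho2 along nu1, then normalize the remainder.
    projection = inner(rho2, nu1, dt)
    residual = [b - projection * a for a, b in zip(nu1, rho2)]
    norm2 = math.sqrt(inner(residual, residual, dt))
    if norm2 == 0.0:
        raise ValueError("rho1 and rho2 are linearly dependent")
    nu2 = [r / norm2 for r in residual]
    return nu1, nu2


points = 1000
dt = 1.0 / points
times = [(i + 0.5) * dt for i in range(points)]
rho1 = [1.0 - t for t in times]         # test function, vanishes at t = 1
rho2 = [(1.0 - t) ** 2 for t in times]  # test function, vanishes at t = 1
nu1, nu2 = gram_schmidt_pair(rho1, rho2, dt)
```

Because ν 1 and ν 2 are linear combinations of ρ 1 and ρ 2 , they inherit the property of vanishing at t = 1, which is what allows the orthonormal form of the Wiener integral to be applied to them.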

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
Materials characterization using entropy signal receivers. The derivation of the prescription for the reference g that permits the improvement in contrast between the H f,g and H f,gO images is the subject of this study. (Far left) Sample diagram showing intended circular defect shape and the actual defect shape as revealed by several entropy images and verified by eventual destructive examination.
(Top) An example of where boundary points can be balanced exactly, A 0 , …, A 5 ∈ ∂A(0), ∂B(0) = ∅. The cancellations may be enumerated as follows: the contribution to Equation (36) from A 0 , A 1 cancels that from A 4 , A 5 if ; the contribution from A 2 of −1 cancels that from A 3 , which is +1. C, D can always be scaled, so that C > |f′(t) + σς′(t)|, for t ∈ (0, t −1 ), and similarly for D with t ∈ (t 0 , 1). The addition of further pairs of zero-crossings for f′(t) + σς′(t) will lead to the same structure of canceling pairs. (Bottom)
A typical initial condition from Equation (78) is shown by the purple curve. Many different initial conditions, such as the one shown, define different "heat"-surfaces, which, in turn, define the average variation 〈δH f+σς,g [η]〉 via the inner z-integral in Equation (60). Also shown is the curve .
Typical solution surface for Equation (62) resulting from the initial conditions shown in Figure 4. The heavy purple line represents a typical u |g′(t)| (f′(t), s). Many lines, such as the one shown, each from a potentially different solution-surface (determined by different initial conditions, such as shown in Figure 4), determine the "heat"-surface that defines the average variation 〈δH f+σς,g [η]〉 via Equation (60). The resulting surface inherits properties of the different solution surfaces, e.g., for negative values of x, the "heat"-surface is positive; for positive values of x, it is negative; as s → ∞, the "heat"-surface decays to zero, as discussed after Equation (61). Surface shown for |g′(t)| = 0.5.