Half Way There: Theoretical Considerations for Power Laws and Sticks in Diffusion MRI for Tissue Microstructure

: In this article, we consider how differing approaches that characterize biological microstructure with diffusion weighted magnetic resonance imaging intersect. Without geometrical boundary assumptions, there are techniques that make use of power law behavior which can be derived from a generalized diffusion equation or intuited heuristically as a time dependent diffusion process. Alternatively, by treating biological microstructure (e.g., myelinated axons) as an amalgam of stick-like geometrical entities, there are approaches that can be derived utilizing convolution-based methods, such as the spherical means technique. Since data acquisition requires that multiple diffusion weighting sensitization conditions or b -values are sampled, this suggests that implicit mutual information may be contained within each technique. The information intersection becomes most apparent when the power law exponent approaches a value of 12 , whereby the functional form of the power law converges with the explicit stick-like geometric structure by way of conﬂuent hypergeometric functions. While a value of 12 is useful for the case of solely impermeable ﬁbers, values that diverge from 12 may also reveal deep connections between approaches, and potentially provide insight into the presence of compartmentation, exchange, and permeability within heterogeneous biological microstructures. All together, these disparate approaches provide a unique opportu-nity to more completely characterize the biological origins of observed changes to the diffusion attenuated signal.


Introduction
Diffusion is a fundamental transport process in many biological systems and is known to be sensitive to structure and inhomogeneity in the environment. The simplest description of diffusion is to assume a homogeneous environment in which the mean squared displacement (MSD) grows linearly with time and the distribution of particle displacements is Gaussian. However, when inhomogeneities are present in the environment, the interactions between diffusing particles and cellular structure lead to non-trivial displacement distributions, which most broadly can be classified as non-Gaussian. Advances in magnetic resonance imaging have made it possible to study diffusion processes in living tissue non-invasively [1] with considerable clinical impact [2,3]. Since the length scales probed by diffusing particles are orders of magnitude smaller than a typical scan voxel, measurements of diffusion contain quantitative microstructural information that is unique in comparison to other magnetic resonance (MR) modalities that measure relaxation, flow, or changes in blood oxygenation [4]. The most common experiments involve diffusion-weighted magnetic resonance imaging (DWI) of water, which enable measurement of local diffusion coefficients at well-defined diffusion times in living tissue. The most common application of DWI is in the human brain, where it has been well-established that there is a distribution of diffusivities within the neural tissue microstructure [5]. There have also been extensive applications in non-brain organs including kidney [6], liver [7], skeletal [8] and muscle [9].
Non-Gaussian dynamics are readily measured in diffusion-weighted MR data, but disentangling environmental information is a highly non-trivial process because biophysical complexity vastly exceeds the information content of the measured signal. This is a significant challenge in the development of practical modeling techniques to capture salient, relevant anatomical features. There is extensive research literature on microstructural inference, and many different techniques and formulations. These diverse techniques fall into two broad categories: 1.
Those which make strong assumptions about the environment and fit geometric models to the diffusion signal. Specifically, the signal is decomposed into a weighted sum of contributions from spheres, cylinders, and other geometric forms which restrict diffusion, and for which a closed-form of the diffusion in the compartment is available. These models typically assume that either there is no exchange between geometric compartments, and that in between compartments there is an additional "extracellular" compartment for which diffusion is assumed to be free or described by a tortuosity approximation. The choice of geometry and number of geometric compartments is made a priori (see, e.g., [10][11][12][13][14]).

2.
Those which derive mathematical descriptions of the diffusion signal to infer details of the microstructural environment with little or no constraints imposed on the geometrical boundary conditions (see, e.g., [15][16][17][18][19][20][21]). Within the mathematicallydriven approaches, there has been recent interest in generalizing the description of the diffusion process itself through continuous time random walk (CTRW) theory. Recently, the CTRW formalism has been applied to neural tissue to connect a model of diffusion-weighted signal with a physical interpretation of generalized diffusion dynamics that encode power law behavior [22][23][24][25].
Typically, presentation of these differing approaches is often designed to test or showcase feasibility, instead of investigating which technique is most appropriate for a particular application. Although a unifying framework has been proposed [26], navigating the conclusions of studies applying different models in different contexts can be a daunting prospect. A significant confound to the choice of analysis is that the diffusion signal is remarkably featureless. Diffusion-weighted water signals are smooth and typically monotonically decreasing. In contrast with spectral data [27], for example, there are few obvious features which can be used to unambiguously differentiate between different microstructural scenarios. As such, there is a question of degeneracy, that is to say, different signal models can explain the same data equally well. In some cases, there is the added problem of model uniqueness in which the same model can describe a single data set with more than one set of parameters. This is a serious emerging problem which has thus far received little attention in the literature.
Degeneracy immediately leads to the question of which model properties are fundamental to the description of the signal, and which are simply convenient (but non-unique) parameterizations. In other words, what statements can we make about the data which underpin all possible models, and what does that mean for the underlying tissue? A complete answer to this question is a considerable research undertaking, but there are existing theoretical directions which are suggestive of deeper connections. In particular, more than one approach involves power laws. The Random Permeable Barriers model (RPBM) [28] and approaches based on CTRW fractional diffusion [19,23] both suggest power law scaling, albeit for very different reasons.
That different approaches lead to similar forms suggest that power laws, and particularly the values of power law exponents, capture a fundamental property of the observed dynamics. It has been proposed that the value of power law exponents defines universality classes capturing the dimensionality and type of disorder present in the diffusion environment [28]. Currently, the relationships between the different power law exponents defined in different approaches are understudied. There has been initial work which derives the mathematical underpinning between two seemingly disparate techniques that characterize power law subdiffusion and diffusional kurtosis [25]. However, it is unclear that there is a connection between power law exponents and other modeling approaches which make strong geometric assumptions. In this work, we explore the connections between several important techniques in diffusion MRI: models which make an explicit assumption of stick structure [29], and those which make non-geometric assumptions, such as the CTRW (i.e., fractional) diffusion [19,23,25], and time dependent diffusivity [26,28]. We explore the theoretical connections between these different approaches using power laws as a unifying principle. We concentrate on the case where the mean squared displacement increases with the square root of time, as opposed to the simple linear Gaussian case, and find that this allows us to uncover some unexpected connections. The approach also points the way to an interesting new research direction.

Diffusion in Sticks
One of the possible approaches to higher-order diffusion image analysis is the assumption of diffusion in an explicit structure. This approach dates back to the work of Stanisz [30], who constructed a model of bovine optic nerve containing differently-shaped elliptical compartments defining geometrical boundary constraints in the diffusion equation. Solutions with these constraints can be expressed in a closed form solution that can be fitted to a set of measurements. Another approach which has been used by several authors makes use of a "stick" compartment. This is diffusion in a single direction which has some reduced value in all others. Typically, stick compartments act like Dirac delta functions, and have zero diffusivity in all but one direction. This model has a number of representations, including a tensor with only one non-zero eigenvalue [29]. Single stick models have been used previously [29] and are part of the Camino [31] compartment model hierarchy used in microstructure imaging [32].
Sticks can be used to construct other models or compartments. A common approach is to convolve the stick orientation with an orientational distribution as a model of structure with more than one direction. Since this is a convolution, this approach shares the same assumptions as spherical deconvolution methods that define a continuum of nonexchanging compartments which differ only in the variable(s) modeled by the chosen distribution. Examples of convolution-based modeling include an approach which constructs a distributed sticks compartment using a Bingham distribution [33,34], and another known as neurite orientation dispersion and density imaging (NODDI) that makes use of a Watson-distributed stick compartment [13]. It is also has been proposed to consider a uniform distribution in which there is no preferred orientation with stick distributed evenly in all directions (see [11], for example).
A compartment made up of non-exchanging sticks with no preferred orientation has a closed form, and the diffusion weighted signal, S, may be written as where erf represents the error function, b is the MR pulse sequence controlled parameter for diffusion sensitization, given by where q ∈ R + is a parameter with units of inverse length given by the details of the pulse sequence and t is the diffusion time, again a parameter of the pulse sequence (note that the units of b are s mm −2 , the inverse of those of a regular diffusivity). λ and λ ⊥ are the parallel and transverse diffusivities, respectively, and S 0 is the unweighted signal, which has the same (arbitrary) units as S. The diffusion-weighted signal itself is a real-valued quantity (with arbitrary units) measured by the MRI scanner as a result of applying a pulse sequence to the sample. In practice, each S is a function of the specifics of the pulse sequence, the scanner hardware, and the tissue being imaged. Diffusion-weighted images are acquired across a broad spatial region, potentially covering, say, a patients entire head. The signal as discussed in this work refers to individual voxel (i.e., three dimensional pixel) intensities from diffusion-weighted images. The signal decay expressions we are interested in describe voxel intensities as a function of diffusion-weighted scan parameters, which are typically summarized as a single real-valued parameter b to give a value to the overall sensitivity of the pulse sequence for the underlying diffusive process. Diffusion sensitivity is controlled by the action of pairs of magnetic field gradients sensitizing the scan to diffusive motion, and the diffusion time, which is the amount of time between the application of the gradient pulses. Diffusion imaging works by attenuating the measured MR signal by an amount proportional to the overall amount of diffusion per voxel. A good introduction to the approach and the associated MR physics can be found in [35], and a more formal treatment in [36].
Equation (1) is the form employed by [32], which they refer to as an "astrosticks" compartment. It is formally identical to the signal model derived by Kaden and coworkers in relation to the Spherical Means Technique (SMT) [34] who interpret it as the Fourier Transform of a Bingham distribution of sticks in the limit of no preferred direction. SMT aims to infer details of the kernel function (i.e., the signal from an individual stick) from diffusion-weighted data, which is interpreted as representative of the contribution of each individual stick. In its most general form, a perfect one-dimensional "stick" compartment is delta-function-like in that it has a non-zero diffusivity in a single orientation (only) and is zero in all others. However, Equation (1) makes slightly more relaxed assumptions where the diffusivity takes on a finite, non-zero value (λ ⊥ ) in all directions except one, and where λ λ ⊥ . In order to further approximate a perfect one-dimensional "stick", we can let λ ⊥ → 0, and let λ = D we recover the simpler formulation and obtain This form is comparatively unusual in the space of diffusion imaging models, and has features which are suggestive of connections elsewhere. The presence of the error function is notable in that it is unusual for a model of diffusion MRI. Different boundary conditions and assumptions lead to different expressions for the signal curve, which reflect different symmetries in the underlying equations and operators [37,38]. Free diffusion leads naturally to exponentials, for example, and assumptions involving cylinders often lead to Bessel functions (see, e.g., [32]). Error functions, however, are more commonly associated with probability theory, although they are also encountered in heat conduction [39]. In diffusion MRI, however, to the best of our knowledge, they are encountered only in the current context, and as a special case of fractional diffusion, which we explore below. We note also that Equation (3) may be expressed as a confluent hypergeometric function [33], which will be of interest later. Figure S1 in the Supplementary Materials is a plot of this function vs. b-value over a range of values for D.

Fractional Diffusion
Fractional diffusion assumes a generalized effective transport process described by a CTRW [40], in which displacement trajectory is governed by stochastic "step lengths" and "waiting times" drawn from separate power law distributions. CTRW theory is governed by the fractional diffusion equation where ∂ α /∂t α is the Riemann-Liouville fractional derivative in time for with order 0 < α ≤ 1, D α,β is the generalized diffusion constant with units of mm β s −α [22,23,41], and ∂ β ∂|x| β is the Riesz-Feller fractional derivative of order 0 < β ≤ 2. The absolute value in this definition is conventional for this derivative and indicates that derivatives in the positive and negative x direction are included. When α = 1 and β = 2, Equation (4) becomes the classical diffusion equation as a special case.
Furthermore, it should be noted that the Riemann-Liouville fractional derivative operator in Equation (4) can be written as where the * operator denotes a convolution and the Γ function is the generalized form of the factorial function defined for real numbers. The fractional derivative is simply the first derivative of a convolution of a power law time kernel of order α and the diffusion propagator [42]. Equation (4) can be solved using a Fourier-Laplace method with open boundaries. The transform p(x, t) → p(q, s) gives Inverting the Laplace transform such that p(q, s) → p(q, t) further gives where E α (·) is the one-parameter Mittag-Leffler function (MLF) [43][44][45]. This encodes the value of the power law exponent of the waiting time distribution as 0 < α ≤ 1. Equation (7) also contains the order of the spatial fractional derivative β as a stretching exponent on the spatial Fourier variable q. When the duration of the diffusion-encoding gradient pulses in the pulse sequence are chosen so as to be much less than the diffusion time, it can be shown that the q from the pulse sequence (Equation (2)) and the Fourier variable from Equation (7) coincide [36]. This is known as the q-space approximation. It underpins a great deal of models in the diffusion MRI literature [26] and we will make use of it here. In this work, we are interested in two special cases of fractional diffusion. Firstly, the time-fractional case where 0 < α ≤ 1 and β = 2, meaning that the Riesz-Feller derivative in Equation (4) reverts to a standard second-order partial derivative in x. Secondly, we will make use of the quasi-diffusion case [46] where we require that β = 2α. Here, the relationship between the exponents allows for the arguments in Equation (7) to be rewritten so that The units of D α,2α are then mm 2α s −α . If we define D α,2α = D α we recover a constant with units of regular diffusivity and can further write and hence p(q, t) = E α (−(bD) α ).
This identity will allow us to bridge from the fractional/quasi-diffusion case to other diffusion models (such as Equation (3)) which are parameterized by b-value. Figure S2 in the Supplementary Materials contains a plot of this function vs. b-value for several values of α. In terms of the diffusion signal measured by the scanner, the signal S is proportional to p(q, t) with the constant of proportionality given by S 0 , which is the signal intensity without diffusion sensitizing gradients. Hence, from here, we will replace p(q, t) with S S 0 .
The MLF is an entire function as can be represented as a power series When α = 1, through the identity Γ(k + 1) = k!, Equation (11) becomes, which is the Taylor series form of the exponential function. This coincides with the signal attenuation expression for free, Gaussian diffusion (i.e., S S 0 = exp(−bD) when z = bD as per the quasi-diffusion approximation when α = 1). This exponential is also the standard form for diffusion attenuation for free, Gaussian diffusion [36]).
A different special case occurs when the exponent α = 1 2 , which has been previously reported as a typical value for healthy white matter in the brain [25]. Under this circumstance, the diffusion-weighted signal, S, is modeled as where erfc represents the complementary error function. Immediately, it can be appreciated that there are formal similarities shared between Equation (3), representing a geometric description of astrosticks, and Equation (13), representing quasi-diffusion, as there is a presence of an error function and the function decays as 1 √ b in the limit of high b. One obvious difference, though, is that at b = 0, Equation (13) is continuous and does not diverge, whereas Equation (3) is only defined asymptotically at non-zero values of b.

Confluent Hypergeometric Functions
Clearly, Equations (3) and (13) are not equivalent. However, it is tempting to look for a mathematical connection between the two approaches. A promising place to start is that they are both examples of confluent hypergeometric functions. Confluent hypergeometric functions are a very broad class of special functions which unify most of the classical special functions, such as orthogonal polynomials, trigonometric functions, and exponentials as different special cases of a larger class (indeed, this is the original of the term "confluent", derived from the Latin for "flowing together") [39]. A short discussion of Confluent Hypergeometric functions can be found in Appendix A.
Equation (3) is a confluent hypergeometric function of the first kind and Equation (13) is a confluent hypergeometric function of the second kind Due to their interrelated definitions, it is possible to derive a number of different relationships between functions with different constant values, and also between the first and second kinds. A particularly useful relationship for our current purposes is Substituting z = bD (which here corresponds to the square of the quasi-diffusion parameter) as per the quasi-diffusion special case above, and setting µ = 1 2 , γ = 3 2 gives which results in exp(bD)Ψ 1, By dividing out the exponential term, the confluent hypergeometric function of the second kind (Ψ) is given as Equation (19) can be written as the MLF when α = 1 2 using the relationship in Equation (15) such that Equation (17) can be used to derive a number of basic relationships between functions, but for current purposes it is enough to note that the second term on the right hand side of Equation (20) contains the (lower) right hand side of Equation (14), and therefore the representation of astrosticks in Equation (3). The first term on the right hand side of Equation (19) contains a further confluent hypergeometric function (Φ) with indices 0 and 1 2 corresponding to a zeroth order Hermite polynomial, which is simply a constant. Finally, it is important to highlight that Equation (20) is defined at b = 0, whereas the explicit form of Equations (1) and (3) are divergent at b = 0. Therefore it can be seen that, asymptotically, the astrosticks and fractional models share special cases through a simple, invertible transform when the fractional exponent is equal to 1 2 . Additionally, we note that Equation (14) diverges as b → 0 whereas Equation (15) does not.

Power Laws
While it appears that a dispersed stick-like structure encodes a particular special case of the fractional model pertaining to subdiffusive behavior of power law exponent α = 1 2 , there is another interesting association to the RPBM effective media approach in heterogeneous environments [26,28]. In this section we will make use of the time-fractional case of fractional diffusion, setting β = 2. In this case, the diffusion constant has fractional units and to differentiate it from the regular diffusivity used above, we will denote it D α,2 (in line with the notation of Equation (4), rather than D). The RPBM model derives a time-dependence for diffusivity which is written down as follows where D ∞ is the long-time asymptotic diffusivity and D(0) is short time constant controlling the contribution of the power law time dependence with exponent θ. This defines a diffusion-like process with a transient time-dependence where the second term goes to zero in the long time limit, and the constant diffusivity D ∞ remains. D(t) has units of a regular diffusivity (i.e., mm 2 s −1 ), and there fore requires that D ∞ and D(0)t −θ also have these units. Since the D(0) term contains an explicit time dependence with a fractional power, we can see that D(0) must therefore have units of mm 2 s θ+1 . Figure S3 in the Supplementary Materials contains a plot of the signal decay expected from this form of a time-dependent diffusivity vs. b-value, assuming a monoexponential relationship with the signal, for several values of θ.
The classical Gaussian representation of mean squared displacement, r 2 (t) , is where d is the dimensionality of space. In an effort to generalize the MSD to include time dependency, D(t) from Equation (21) can be substituted to give Conversely, the equivalent form for the MSD in the case of subdiffusive fractional diffusion in the CTRW framework gives where D α,2 /Γ(1 + α) is the normalized fractional diffusion coefficient with units mm 2 s −α .
Since the classical Gaussian case in Equation (22) is linear in time, we can define the scaling properties of the diffusivity using the quantity where ν is a constant for simple, Gaussian diffusion displacement but exhibits a time dependency for heterogeneous, non-Gaussian displacements. This quantity for the RPBM and CTRW approaches is found by dividing Equations (23) and (24) by time, t, which gives and Differentiating each with respect to time gives and Equations (28) and (29) are equivalent provided the follow relationships hold and Hence, it is apparent that the fractional diffusivity is associated with the short-time constant defined in the RPBM, and the fractional exponent has a straightforward relationship with the RPBM's temporal exponent.
The link between exponents is of particular interest in that power laws in the timedependent diffusivity are known to be linked to different structural universality classes [28]. In particular, θ = 1 2 is associated with diffusion in the presence of randomly oriented sticks as presented above in Equations (1) and (3). This provides a further link between models: θ = 1 − α = 1 2 is associated with stick structure via universality class, and also via the functional form of the decay curve to the explicit geometric assumption of stick-structure. As such, we can see that the observation of a given structural universality class is equivalent to the assumption of an explicit geometry.
Finally, we also note that there has been previous work to derive a connection between power law exponents α [25] in case where β = 2 and the diffusional kurtosis K [47], showing that which can now also be computed from RPBM theory as and for stick-like structures where θ = 1 − α = 1 2 pertains to a kurtosis expression of 3 2 π − 3 or ∼ 1.71.

Conclusions
In all these connections, it appears power laws play a central role. Both the fractional and time-dependent approaches make assumptions of power law behavior in the signal, which is identified with exponents measured from a set of observations across a range of b-values or diffusion times. Approaches that make geometric assumptions are also implicitly assuming the same power law behavior, though in this case, the value of the exponent is fixed at 1 2 . This exponent is derived differently and is inherent to perfect stick-like structures, which illustrates that disparate sets of assumptions are nonetheless linked by the fundamentals of the measured diffusion process itself.
In addition to this theoretical connection, however, the approaches discussed here are also linked by similarities in their data requirements. Any model which does not assume Gaussian diffusion requires measurements of the behavior of the diffusion signal either at several different time points, several different sensitizations, or potentially even both, in order to provide information about nonlinear behavior in the diffusion-weighted signal. One interesting implication is that a particular multi-shell or multi-diffusion time acquisition can be analyzed in several different ways. Geometrical models can be fitted to the same data as models which concentrate on the diffusion process. With suitable scaling assumptions, a model of diffusion time dependence can be adapted to consider measurements as a function of b-value samples and vice versa (see, e.g., [48]). The different formulations provide different insights into the diffusion environment, and may potentially also allow more effective fitting constraints than is possible using only one approach in isolation. This may have important ramifications when addressing degeneracy both between models and where multiple parameter combinations in a single model are possible.
The analysis presented here clearly points to power laws being a powerful and versatile tool with which to interrogate data. We have restricted ourselves to the case where the time exponent is equal to 1 2 , but there is no reason to believe this is the only set of relationships of this kind. Different exponents correspond to other universality classes, other special cases of the Mittag-Leffler functions, and other relationships within the confluent hypergeometric functions. We look forward in the future to exploring more of these relationships and what they reveal about the differing assumptions and formulations of models in the literature. In particular, it is not clear that this stage what features of the data are fundamental, which are the result of the choice of a particular parameterization, and what is the most fundamental description of the variation in the data.
Theoretically, the approach considered here points away from the current trend to formulate models with ever more constraining assumptions and rather towards exploring which features of the data are fundamental and therefore may be robustly estimated. There is currently no accepted criteria by which a feature in a model can be said to be present in the data, and there are examples of situations in which different sets of model parameters can lead to predictions of the same set of measurements [49]. The search for commonalities in the behavior of the signal, as opposed to the search for new parameterizations, offers interesting new directions in the analysis of diffusion-weighted data for biological tissue and has the potential to provide further insight into what can and cannot be measured in clinical and research MR systems. Author Contributions: M.G.H. and C.I. planned and wrote this paper together. Links between models were discussed and developed jointly. Although individual ideas originated with one or other author, each critiqued and corrected the other's work. Both authors had been exploring these lines individually, and collaborated to bring the overall narrative together. The actual writing of the article was iterative, with each author contributing text which was them modified by the other until the manuscript converged on a form both jointly agreed was suitable for submission. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Confluent Hypergeometric Functions
Confluent hypergeometric functions can be separated broadly into two different groups, sometimes referred to as the first and second kinds. The following description follows a similar line to that of [39].
Confluent hypergeometric functions as solutions of the hypergeometric equation where u is an unknown function, the prime denotes differentiation with respect to the independent variable z ∈ R, and γ and µ are real-valued constants (γ / ∈ Z). Equation (A1) is not a unique parameterization, however, and an alternate form can be obtained by writing u = z 1−γ v which gives where χ = 2 − γ and η = 1 + µ − γ. Notice that Equation (A2) has the same overall form as Equation (A1) but with differently defined constants. So if v is a solution of Equation (A2), then a function of the form u = z 1−γ v is also a solution of Equation (A1) and we have two classes of related, linearly independent solutions [39]. Conventionally, solutions of Equation (A1) are referred to as confluent hypergeometric functions of the first kind, denoted Φ(µ, γ; z) whereas confluent hypergeometric functions of the second kind are expressed in terms of the sum of the u and z 1−γ v and denoted Ψ(η, χ; z).