A Review of Image Reconstruction Algorithms for Diffuse Optical Tomography

: Diffuse optical tomography (DOT) is a biomedical imaging modality that can reconstruct hemoglobin concentration and associated oxygen saturation by using detected light passing through a biological medium. Various clinical applications of DOT such as the diagnosis of breast cancer and functional brain imaging are expected. However, it has been difﬁcult to obtain high spatial resolution and quantiﬁcation accuracy with DOT because of diffusive light propagation in biological tissues with strong scattering and absorption. In recent years, various image reconstruction algorithms have been proposed to overcome these technical problems. Moreover, with progress in related technologies, such as artiﬁcial intelligence and supercomputers, the circumstances surrounding DOT image reconstruction have changed. To support the applications of DOT image reconstruction in clinics and new entries of related technologies in DOT, we review the recent efforts in image reconstruction of DOT from the viewpoint of (i) the forward calculation process, including the radiative transfer equation and its approximations to simulate light propagation with high precision, and (ii) the optimization process, including the use of sparsity regularization and prior information to improve the spatial resolution and quantiﬁcation.


Introduction
Diffuse optical tomography (DOT) is a biomedical imaging modality that reconstructs the distribution of the optical properties of scattering and absorption coefficients as tomographic images by employing light illumination and detection at the surface of a measured object [1][2][3][4][5][6][7].Near-infrared (NIR) light propagates deep into tissues and is mainly absorbed by oxy-/deoxyhemoglobin. Therefore, by acquiring DOT with NIR light images, the concentrations of oxy-/deoxyhemoglobin and related oxygen saturation can be determined.This permits diagnosis of diseases such as breast cancer [8][9][10][11][12] and those involving angiogenesis, and can be used to monitor activities in the human brain [13] and to observe the oxygen supply to the neonatal brain in the intensive care unit [14], both of which are reflected by changes in regional blood flow in the brain.The scattering coefficient can be an indicator of changes in the tissue conditions.
Although DOT appears to be a promising novel biomedical imaging technology, it involves technical problems attributed to light propagation accompanied by scattering and absorption.Unlike X-ray computed tomography (CT), image reconstruction employing backprojection does not function well for DOT because of diffusive light propagation.Therefore, DOT requires image reconstruction consisting of two processes.The first is the forward process, which calculates light propagation and predicts measurements.The second is an optimization process that minimizes the error between the actual and predicted measurements.
In the forward process, light propagation can be calculated with a given set of absorption and scattering coefficients by employing the radiative transfer equation (RTE) and various approximations of the RTE, including the photon diffusion equation (PDE), P N approximation, and Monte Carlo method.The predicted measurements at the surface of the object are obtained by solving these equations.In the optimization process, a given set of optical properties used in the forward process is updated to minimize the error between the predicted and actual measurements by employing various optimization methods such as the Newton-Raphson method, which minimizes the squared error.The use of regularization techniques and prior information is a good option in the optimization process for some applications.Various image reconstructions can be composed to solve the inverse problem by selecting and combining adequate methods for the forward and optimization processes.Because a strong scattering effect in biological tissues is essentially unavoidable, image reconstruction is crucial for realizing DOT in practical clinical use.
Studies on image reconstruction have attempted to resolve the technical problems of spatial resolution and quantification accuracy in DOT images of highly heterogeneous measured objects.Image reconstruction has been improved by the recent studies mentioned in this review, although these efforts have not been reflected in clinical applications.To overcome this low spatial resolution, sparsity regularization techniques related to compressed sensing technology have been introduced.The use of prior information about the structure inside the body obtained from other imaging modalities, such as X-ray CT and magnetic resonance imaging (MRI), has been proposed to improve spatial resolution.Spectral prior information improves the quantification of chromophore concentrations in multispectral DOT imaging.Moreover, changes in the circumstances surrounding DOT in the past two decades affect image reconstruction algorithms.Recent progress in computational technology, including artificial intelligence (AI) and high-performance supercomputers, may have altered image reconstruction of DOT.Image reconstruction schemes employing AI with deep learning in diffuse optical imaging [15][16][17][18][19] and the simulation of light propagation using supercomputers [20] have also been attempted in recent years.Progress in diffuse optics and related imaging technologies, including photoacoustic (PA) imaging, which allows high-resolution imaging of blood vessels deep inside the body [21,22], may promote reconsideration of the role of DOT and its image reconstruction.In such a changing situation, it is worth reviewing what has been achieved in DOT image reconstruction for research in diffuse optical imaging and related fields.This review provides useful information to select algorithms for clinical applications of DOT and will assist researchers working in emerging DOT-related research fields, including AI, high-performance computation, and different optical imaging technologies, to expand their research into DOT.
The general framework and theoretical aspects of DOT image reconstruction are described in detail in the literature [23,24].Previous review articles [2,[25][26][27] have provided highly informative guides for implementing DOT image reconstruction.Here, we review the methods for the forward and optimization processes with some studies on DOT image reconstruction, including some recent reports that were not included in previous reviews.The authors tried to cover as wide a range of topics and studies in this review as possible, although this may have inflated rather than simply fledged this review.Figure 1 illustrates the flow of image reconstruction for DOT and related topics for each of the processes mentioned in this review.

Measurement
For image reconstruction of DOT, the distribution of optical properties, such as the absorption coefficient μa(r) and the scattering coefficient μs(r)/the reduced scattering coefficient μs′(r) at position r in the images, must be calculated.The images of the optical properties can be translated into images of the physiological and functional properties, such as the hemoglobin concentration and oxygen saturation of blood and tissues.The optical properties are calculated by comparing the actual and predicted measurements.Therefore, image reconstruction for DOT begins with the measurement.
Detailed descriptions of the measurement for DOT image reconstruction can be found in the literature [1,[3][4][5][6][7].Light sources such as lasers and laser diodes are used for illumination of the surface of the medium to generate measurements of light propagating through the measured object and reaching its surface.The measurements are obtained using optical detectors that can measure the light intensity at the surface of the measured object.Optical detectors such as photomultiplier tubes connected to optical fibers on which the opposite side is attached to the measured object can be employed for the measurement.The types of measurements are usually categorized into time-domain (TD), frequency-domain (FD), and continuous-wave (CW) measurements with illumination using pulsed, intensity-modulated, and continuous light, respectively.Through TD measurement, the distribution of time-of-flight (DTOF) is obtained, and some characteristics, such as moments of DTOF (the mean time of flight of photons is the first moment of DTOF), integral transform, and the shape of the DTOF itself, can be used in image reconstruction.Through FD measurement, the intensity of light, modulation depth, and phase shift are used.The light intensity is used in the CW measurement.

Measurement
For image reconstruction of DOT, the distribution of optical properties, such as the absorption coefficient µ a (r) and the scattering coefficient µ s (r)/the reduced scattering coefficient µ s (r) at position r in the images, must be calculated.The images of the optical properties can be translated into images of the physiological and functional properties, such as the hemoglobin concentration and oxygen saturation of blood and tissues.The optical properties are calculated by comparing the actual and predicted measurements.Therefore, image reconstruction for DOT begins with the measurement.
Detailed descriptions of the measurement for DOT image reconstruction can be found in the literature [1,[3][4][5][6][7].Light sources such as lasers and laser diodes are used for illumination of the surface of the medium to generate measurements of light propagating through the measured object and reaching its surface.The measurements are obtained using optical detectors that can measure the light intensity at the surface of the measured object.Optical detectors such as photomultiplier tubes connected to optical fibers on which the opposite side is attached to the measured object can be employed for the measurement.The types of measurements are usually categorized into time-domain (TD), frequency-domain (FD), and continuous-wave (CW) measurements with illumination using pulsed, intensitymodulated, and continuous light, respectively.Through TD measurement, the distribution of time-of-flight (DTOF) is obtained, and some characteristics, such as moments of DTOF (the mean time of flight of photons is the first moment of DTOF), integral transform, and the shape of the DTOF itself, can be used in image reconstruction.Through FD measurement, the intensity of light, modulation depth, and phase shift are used.The light intensity is used in the CW measurement.

Prediction of the Measurement: Forward Process
The forward and optimization processes are described in detail in previous studies [2,[23][24][25][26][27][28].The predicted measurements to be compared with the actual measurements are obtained by computing the light measured at the surface of the object based on the theory of light propagation inside biological media.Light propagation involves the transport of photons, accompanied by scattering and absorption.In this case, the radiative transport equation (RTE) describes the light propagation and can be used to obtain predicted measurements with a set of given optical properties.Some approximation methods of the RTE, such as P N approximation, diffusion approximation (DA) with the photon diffusion equation (PDE), and the Monte Carlo (MC) method are used in the computation of light propagation to reduce the computational difficulties of the RTE.Except for the MC method, general numerical computation methods such as the finite element method (FEM) and the finite difference method are used to compute light propagation and predicted measurements.

Optimization Process for Nonlinear Image Reconstruction
Let F i,j (µ a , µ s ) be the mathematical operation for calculating the measured light with optical properties µ a and µ s and the given positions of the light sources and detectors to obtain the predicted measurements: µ s = (1 − g)µ s is defined by the anisotropy (asymmetry) factor, g, which is the average cosine of the scattering angle.Image reconstruction can be formulated as an optimization problem to find µ a and µ s while minimizing the cost function, which consists of the squared error and additive regularization term, as follows: where M i,j are the measurements with the ith source and jth detector, respectively, and w i,j is the weight adjusting the contribution of M i,j to the image reconstruction.R(µ a , µ s ) is a regularization term.The image-reconstruction process used to solve the inverse problem is generally ill-posed.Image reconstruction often suffers from the nonuniqueness of solutions and instability owing to noise contaminating the measurements.R(µ a , µ s ) is a function for evaluating the extent to which the solution (reconstructed image) satisfies the condition expected in the desired solution and for reducing the overfitting of F i,j to M i,j contaminated by noise.By minimizing R(µ a , µ s ) with a residual error, this ill-posed nature can be alleviated.Various types of R(µ a , µ s ), such as the squared L 2 -norm of the reconstructed image (Tikhonov regularization), L p -norm (0 ≤ p ≤ 1), and total variation, which is the norm of the gradient of the image, can be applied.g adjusts the regularization effect.The relationship between the optical properties and measurements as a result of light propagation is nonlinear.Therefore, image reconstruction is performed by solving Equation (1) by employing nonlinear optimization methods with iterative updating processes, such as the Newton-Raphson, quasi-Newton, and conjugate gradient methods.For the nonlinear optimization process, the gradient of the cost function f, the sum of the squared error term, with the regularization term in Equation ( 1) must be computed, as follows: where µ k is the optical property of interest at the kth position r k , where one of the nodes for the FEM to compute F i,j exists.The vector matrix formula in Equation ( 2) is often used.
where F and M are column vectors in which F i,j and M i,j are aligned, respectively.µ is a vector of µ k .W is the diagonal matrix with w i,j , and J is the Jacobian matrix with ∂F i,j /∂µ k as its elements, which can be calculated by the perturbation method or the adjoint method derived in the early important work on the DOT image reconstruction by Arridge and Schweiger [23,28].Using the Newton-Raphson method, iterative image reconstruction can be expressed as where H = ∇ 2 f (µ) is a Hessian matrix and the subscript t is the number of iterations.The updating process in Equation ( 4) is repeated until the solution converges.The Levenberg-Marquardt method, which is often employed in DOT image reconstruction, uses the following iteration to solve Equation ( 2), with W equal to an identity matrix and γ = 0.
where γ is a small positive value and I an identity matrix.The nonlinear optimization methods, such as the steepest descent and conjugate gradient methods, have been compared for DOT image reconstruction with DA [29].

Optimization Process for Linearized Image Reconstruction
This section describes the optimization process for image reconstruction using a linearized forward process.As a good approximation, the µ of the reconstructed image µ is obtained, and the measurements are approximated using the Taylor expansion as where F(µ) is the predicted measurement with µ (or the measurement equivalent to the predicted measurement) and δµ = µ − µ.When the series expansion is terminated in the first order, the forward process can be linearized, and the linear equation is obtained.The linearizations of the forward process for the absolute and logarithmic values of the measured light intensities are referred to as the Born and Rytov approximations, respectively.Then, the optimization process can be formulated in a manner similar to that in Equation (1) with regularization, as follows: where δM = M − F. When employing classical Tikhonov regularization with a weighting matrix L formulated as R(δµ) = Lδµ 2 , δµ is obtained by setting the derivative of the objective function of Equation (7) to zero, and the reconstructed image is expressed as µ = µ + δµ.Depending on the type of regularization term, a nonlinear optimization process is needed to solve Equation (7).The limitations of linearized image reconstruction have been previously discussed [30].The linearization approach is appropriate when changes in the optical properties are sufficiently small and exist in small regions.

Bayesian Approach
The measurements are always contaminated with additive random noise.Thus, the forward equation with a given optical property can be written as M = F(µ) + ε, where ε is the random additive noise and µ is the given optical property.Owing to the randomness of ε, M is a random variable with a conditional (prior) probability density function p(M|µ).Assuming µ and ε are stochastically independent, the joint probability is formulated as p(M, µ) = p(M|µ) • p(µ) = p(µ|M) • p(M), which is Bayes' rule.
In the Bayesian approach based on the above formula, image reconstruction is to find µ maximizing the conditional (posterior) function, p(µ|M) = p(M|µ) • p(µ) as a likelihood function under the condition of M determined, i.e., p(M) = 1.When ε is Gaussian random noise with a covariance matrix W and an average of zero, the log-likelihood of the posterior function is equivalent to Equation (1) with p(µ) = exp(−λ • R(µ)).Assuming that µ is a Gaussian random variable with a variance 1/λ, the image reconstruction incorporates classical Tikhonov regularization.
By choosing the probability density function, the Bayesian approach can take advantage of regularization methods and prior knowledge for image reconstruction, as shown in previous studies [31 -33]; some Bayesian approaches are introduced in the following sections.
The image reconstruction method can be characterized by the methods selected for the forward and optimization processes, particularly the equations employed in the forward process and the cost functions with a certain regularization and use of prior information in the optimization process.Several studies on image reconstruction algorithms are categorized and introduced in the following sections from the viewpoint of the forward and optimization processes.

Radiative Transfer Equation
Light propagation through biological tissues is mathematically described by the radiative transfer equation (RTE), as follows [23]: where c is the speed of light, I(r,s,t) the radiance (light intensity), and q(r,s,t) the light source term.r and s represent the position and direction of light, respectively.p(s•s ) is the scattering phase function, which represents the probability that the direction of light s is changed to s by the scattering event.The Henyey-Greenstein function is commonly used as p(s,s ) in DOT image reconstruction.Equation (10) represents the time-dependent case.The time-derivative term on the left side of Equation ( 10) is removed in the case of time-independent CW measurement.Fourier-transformed versions of Equation ( 10) are used for FD measurements.The computation of the RTE does not impose any limitations on the optical properties.Therefore, RTE can compute light propagation more accurately than other methods, such as DA, and can be the best choice for image reconstruction for objects that include voidlike regions, such as the trachea in the neck, which can affect DOT imaging of thyroid cancer [34,35].
Klose and Hielscher employed time-independent RTE for image reconstruction [36,37].For the computation of the RTE, they employed the upwind-difference discrete-ordinate method by discretizing the angular and spatial variables in the RTE.After experimental validation of the forward process with a rectangular phantom, including the void-like region [36], image reconstruction using the gradient method, referred to as model-based iterative image reconstruction (MOBIIR), was performed.Images of µ a and µ s were obtained in a phantom containing void-like regions [37].Abdoulaev also reported MOBIIR with a forward process using FEM [38].
Tarvainen et al. used the RTE in FD measurements during the forward process [39].The RTE was computed using the FEM, in which stream diffusion modification was utilized.In the optimization process, the cost function with total variation regularization (see Section 4.1) was minimized using the Gaussian-Newton method.Image reconstruction was successful in the 2D numerical simulations of various cases, including low-scattering blobs and low-scattering and low-absorption gaps.
Soloviev and Arridge proposed RTE-based image reconstruction for the FD measurement of an object, such as an embryo that consists of weakly and highly scattering regions [40].Assuming that µ a + µ s is significantly different between the weak and high scattering regions, an approximation of the RTE solution was derived for the forward process.The optimization process with Tikhonov regularization was demonstrated using 3D numerical simulations.
Machida et al. proposed image reconstruction with linearization employing the Rytov approximation accompanied by Green's function calculated using the F N method [41].The F N method was originally used to compute the 1D RTE using singular-eigenfunction expansion, which was extended to 3D computation by employing the rotated reference frame method [42].The image reconstruction method was examined in phantom experiments using CCD measurements.Machida also reported numerical simulations of image reconstruction based on the Born approximation and 3D FN computation of the RTE for a slab geometry with spatially oscillating structured light [43].
Image reconstruction employing RTE is particularly suitable for small measurement objects such as finger joints.Clinical applications for diagnosing rheumatoid arthritis have been reported [44][45][46].Image reconstructions based on the RTE have been reported in fluorescent diffuse optical tomography (FDOT) [47], which reconstructs the concentrations of fluorescent probes [48,49], and in quantitative photoacoustic tomography (QPAT) [50], which reconstructs the concentrations of exogenous and endogenous chromophores from photoacoustic pressure waves [51][52][53][54].FDOT and QPAT are used to image subjects that exist in shallow regions and in small objects, where DA is often invalid.Efficient and precise computational methods for RTE have been studied [20,55].

P N Approximation
To obtain a good approximation of the RTE, P N approximation [23] was attempted in the forward process of image reconstruction.The P N approximation is derived from an orthogonal function expansion using spherical harmonics in Equation (10), in which I(r,s,t), q(r,s,t), and p(s•s') [23] are formulated as where ψ l,m , q l,m , and Θ l,m are the coefficients, and Y l,m the spherical harmonics of order l and degree m, which are associated with the Legendre polynomial and orthonormal function.Y * l,m is the complex conjugate of Y l,m .Owing to the orthonormality of the spherical harmonics, the RTE, in which Equations ( 11)-( 13) are substituted, is decoupled into (N + 1)! simultaneous equations by multiplying Y * l,m when the spherical harmonic expansion is terminated at l = N.The P N approximation is computed with simultaneous equations.
Boas et al. compared P 3 approximation with DA for the determination of the optical properties from MC simulated measurements and from P 3 -approximated measurements in the FD measurement [56].µ a was more accurately determined using P 3 approximation, although the DA estimated µ s ' better than P 3 approximation in the highly forward-scattering case.Jiang and Paulsen derived a stable and computationally convenient higher-order DA (P 3 approximation) with second-order spatial derivative terms and compared the measurements with a cylindrical phantom and the computational results with the higher-order DA and DA using FEM [57].
Oliveria and Tahir developed a computer program based on their original code, called EVENT, to compute the P N approximation with FEM and compared the computational results with the P N approximation and with DA.They succeeded in reconstructing images from MC-simulated measurements by employing P 7 approximation in TD measurements [58].
Jiang developed an image reconstruction algorithm based on the third-order diffusion approximation (P 3 approximation) with Marquardt and Tikhonov regularizations and succeeded in image reconstruction using 2D numerical simulations [59].Yuan et al. implemented a 3D FEM computation of P 3 approximation and compared P 3 approximation with P 1 approximation and MC simulation.They reconstructed an image of a 3D numerical phantom mimicking a finger joint with cartilage between the bones [60].
Wright et al. performed image reconstruction implementing the FEM calculation of the P N approximation in the forward process in FD measurement [61].They employed the even-parity formulation of RTE, which was derived by introducing the light intensity, I ± (s) = {I(s) ± I(−s)}/2, and the light source, q ± (s) = {q(s) ± q(−s)}/2.Consequently, the RTE was transformed into the formula of I + including the second-order spatial derivative.The simultaneous equations of P N approximation, which are expressed in the vector matrix formula, are discretized using the FEM.Numerical simulations of image reconstruction employing the P 1 , P 3 , and P 5 approximations were performed, with the measurements computed using the P 7 approximation.It was demonstrated that the highly absorbing and scattering regions in the low-scattering medium were imaged more accurately in the reconstructed image using the higher-order P N approximation.

SP N Approximation
Klose and Larsen introduced a simplified spherical harmonics (SP N ) approximation, which was originally proposed for neutron transport, for CW light propagation in biological media [62].In the SP N approximation, the P N approximation of 1D light propagation in the planar geometry was directly applied to 3D light propagation by replacing the 1D differential operator, d/dx, by the 3D gradient operator, ∇.In the 1D P N approximation, the Legendre moments of radiance φ n (x) = P n (ŝ)I(x, ŝ)dΩ are defined.A set of four equations for the SP 7 approximation was formulated for composite moments ϕ m (m = 1, 2, 3, 4) defined by the sum of the series of ϕ n (n = 1, 2, . . .,6).The equations for the SP N approximation (N < 7) were readily derived from those for the SP 7 approximation, and the equation for SP 1 approximation was equivalent to the PDE described in Section 3.4.The SP N approximation was calculated using a set of coupled diffusion-like equations with second-order spatial derivative terms.The numerical simulations in [62] indicated that the precision of the approximation improved rapidly from N = 1 to 7. The computational time of the SP N approximation is significantly shorter than that for the RTE computation by the discrete ordinate method discretizing s.
Chu et al. developed a 3D FEM for SP N approximation in FD measurements [63].The computational results were compared with those of MC simulations.The SP N approximation (N > 1) provides a more accurate phase and amplitude of the modulated light than SP 1 approximation.Chu and Dehghani reported image reconstruction in FD measurements using the SP 7 approximation [64].The 2D image reconstructions of µ a and µ s from the phase and amplitude obtained by the numerical simulation with the SP 5 approximation were performed by employing the Jacobian matrix, J, calculated using the SP 1 and SP 5 approximations.Image reconstruction using the SP 5 approximation provided more accurate quantitative and qualitative images.
Domínguez and Bérubé-Lauzière reported image reconstruction using time-domain parabolic SP N equations [65,66].In addition to the aforementioned CW and FD cases, a system of diffusion-like equations for the SP 7 approximation was formulated for the TD measurement.Numerical simulations of image reconstruction employing N = 1, 3, 5, and 7 were performed.In the simultaneous reconstruction of the heterogeneous medium where the high-absorption inclusion and high-scattering inclusion existed, the reconstructed µ a and diffusion coefficient, D =1/{3(µ a + µ s )}, were evaluated.The SP N approximations (N > 1) reconstructed the high µ a inclusion and low D inclusions better than the SP 1 Appl.Sci.2023, 13, 5016 9 of 30 approximation, which underestimated µ a .The error in the reconstructed value with the SP 7 approximation was slightly larger than that with the SP 5 and SP 5 approximations [66].
Some studies on image reconstruction with SP N approximations for FDOT and QPAT have been reported [67][68][69].

Diffusion Approximation
In DA, light propagation in the TD measurement is described by the following PDE [23]: where D =1/{3(µ a + µ s ')} is the diffusion coefficient, Φ(r, t) = 4π I(r, s, t)dΩ the fluence rate (photon density), and q 0 (r, t) an isotropic light source.For the CW measurement, the first term in Equation ( 14) is removed.Equation ( 14) is the Fourier transform for the FD measurement.
To compute light propagation with the PDE, the Robin boundary condition, which is often used in image reconstruction, is as follows: where A is a parameter that depends on the internal reflection ratio, and n is the outward normal vector.Early studies by Arridge and Schweiger [28] established the standard nonlinear image reconstruction scheme for DOT.The computation of the PDE is typically performed using the FEM.The Jacobian matrix J appearing in Equation ( 3) is obtained using the adjoint method based on the reciprocity theorem [70,71].In the TD measurement, the perturbation η in Φ with µ s and µ a at the measured position r d associated with small changes ν and α in µ s and µ a , respectively, is calculated as where Ψ is Green's function of the adjoint PDE, which calculates the light propagation from r d with position ξ and time τ.From Equation ( 16), the entries of J with respect to µ s ' are computed with the inner product of the gradients of Ψ and Φ, discretized with the FEM, whereas those with respect to µ a are computed with the product of Ψ and Φ on the right-hand side of Equation ( 16).Various types of measurement exist for image reconstruction.Gao et al. compared reconstructed images from the measured intensity (CW measurement), mean time-of-flight (first moment), variance (second moment), skew (third moment), and full-time-resolved DTOF [72].The normalized DTOF provides a better image in terms of spatial resolution and quantification in 2D numerical experiments using DA, although the image reconstruction from full-time-resolved measurements required a 120-fold longer computational time than that from the mean time-of-flight and variance.DA image reconstruction from Laplacetransformed DTOF using a modified generalized pulse spectrum technique (mGPST) has also been proposed [73].
DA is obtained from the P 1 approximation with the assumption that the time derivative of the net flux vector, J = 4π sI(r, s, t)dΩ, equals zero, and the light source is isotropic.The DA is valid while µ a << µ s .It is not valid to use the PDE for the nonscattering void region.Moreover, DA requires spatiotemporal conditions in which the directions of photons are sufficiently randomized by many scattering events and the light intensity changes very slowly [23,74].
Although the limitation of DA requiring the abovementioned assumptions and conditions, which often seem difficult to be fulfilled, has been debated, image reconstruction employing PDE has been successfully applied in in vivo and clinical DOT studies [75][76][77][78][79]. Several studies on image reconstruction based on DA with various regularizations and inversion schemes have been reported because the computational burden of the PDE is reasonable.Most studies on the optimization process introduced in Section 4 employed DA.
Sophisticated open sources for the forward computation of light propagation and image reconstruction, such as TOAST++ [80,81] and NIRFAST [82,83], are currently publicly available and have contributed to countless studies on DOT applications and image reconstruction.

Hybrid Approach
Although some limitations exist, DA can significantly reduce the computational burden compared with the RTE computation.Taking advantage of DA, hybrid methods of RTE and DA have been studied [84][85][86].The RTE is used only in the spatial and temporal domains, wherein using DA can be invalid for hybrid approaches.
Tarvainen et al. succeeded in image reconstruction by employing a forward process coupling the RTE and PDE computations in 2D numerical simulations of the FD measurement [87].The measured circular object was divided into two circular subdomains.The RTE was used in one of the subdomains near the measured surface, whereas the PDE was used in the inner subdomain.The forward calculation coupling the RTE and PDE was implemented using FEM.The optimization problem with weighted Tikhonov regularization was solved using the Gauss-Newton method.It was demonstrated that the coupled RTE and DA models reconstructed low-scattering inclusions as well as the RTE model from the measurements simulated using the MC method, including Gaussian-distributed noise.

Monte Carlo Simulation
Generally, the MC method is a numerical technique used to solve deterministic problems such as partial differential equations using probabilistic methods [88,89].In the forward process of computing light propagation, the movements of photons with scattering and absorption by biological tissues were stochastically simulated.Monte Carlo modeling of light transport in multilayered tissues (MCML) developed by Wang et al. and other software available online are widely used [90][91][92][93].In the MC method, a photon packet with unit energy w is launched into a medium comprising layers with optical properties.The length of the free path in which the photon packet is not scattered is determined by an exponentially distributed random variable.The ratio of photons maintaining the direction and surviving the absorption at distance l is given by p(l) = µ t exp(−µ t l), where µ t = µ a + µ s .The energy of the photon packet decreases with absorption at the position reached by the photon packet.The absorbed energy (µ a /µ t )w is recorded at this position.The photon packet with the remaining energy (µ s /µ t )w continues to travel.The change in the direction of the photon packet by a scattering event is determined randomly using the Henry-Greenstein function p(s,s ).The photon packet is reflected randomly by obeying Fresnel reflectance at the boundaries of the layers.A large number of photon packets are traced such that the recorded absorbed light distribution converges to a good approximation of the light propagation.Comparisons between MC and DA have been reported in many studies [94][95][96][97].
Hayakawa et al. proposed the perturbation MC (pMC) method for the computation of the Jacobian matrix J to solve the inverse problem using a gradient-based nonlinear optimization process [98].In the pMC method, the standard MC method is first used to record the energies of the detected photon packets moving from the light source to detectors with scattering events.To compute the changes in the measurements, the recorded energy w of the photon traveling through the region with the perturbations of µ s → µ s , µ a → µ a , and µ t → µ t is modified as where L is the total length of the photon packet traveling through the region perturbed by ζ scattering events.The matrix J can be obtained using Equation (17).The performance of the pMC method was numerically examined to determine the optical properties of an object with two layers.Kumar and Vasu demonstrated image reconstruction for a heterogeneous medium with the background and inclusions having a low µ s of approximately 0.5 mm −1 , because of which the DA did not hold [99].A nonlinear iterative optimization method, called the conjugate gradient squared method, was employed.Matrix J was computed using the pMC method.µ a and µ s values of the inclusions were simultaneously reconstructed with/without prior information regarding the locations of the inclusions in the 2D numerical simulations.Image reconstruction using the pMC method succeeded in imaging inclusions that could not be reconstructed using DA.
Boas et al. simulated light propagation in a 3D realistic adult head model using the MC method [100].Then, using the Rytov approximation, which is a series expansion of the logarithm of Φ, the linearized forward process was formulated, as shown in Equation (7).In this study, the entries of matrix J to reconstruct the changes in µ a from the Rytov approximation were calculated using the MC method based on the adjoint method with a light source with the δ-function in the CW version, as follows [101]: where G(r 1 , r 2 ) is Green's function, which is the fluence rate at r 2 generated by the light source at r 1 ; r s,m and r d,m are the positions of the mth pair of the source and detector, respectively.r n is the n-th voxel that discretizes the medium to the source positions.By the image reconstruction using Equation ( 18) and standard Tikhonov regularization, the changes in regional blood flow in the somatosensory area during median nerve stimulation were imaged and superimposed on the magnetic resonance image of the subject's head.

Use of Regularization Minimizing Norms
Regularization methods were employed in DOT image reconstruction to solve the inverse problem and alleviate instability, which means that reconstruction is strongly affected by noise in the measurement and error in the forward process.The reconstructed image was disturbed by noise and errors.As shown in Equation ( 9), the classical Tikhonov regularization that minimizes the squared 2-norm ( 2 -norm or L 2 -norm) of the reconstructed image can be readily applied to image reconstruction.Tikhonov regularization with an appropriate γ is useful for obtaining a smooth image by reducing the effect of measurement noise.However, smoothness is often accompanied by a decline in spatial resolution.In such a case, the changes in the optical properties of the measured object are blurred and are reconstructed in a larger volume than the true one.As a result, the change is reconstructed as a smaller value than true one, which means that the concentration of the photon absorber such as hemoglobin is underestimated by image reconstruction with the Tikhonov reconstruction.Additionally, the effect of diffusive light propagation also causes a low spatial resolution, generally in the image reconstruction of optical imaging, as reported in the literature [102].
To improve the quality of the reconstructed image, the regularization method for minimizing the p-norm (0 ≤ p ≤ 1) of the reconstructed image has been used in recent years.
The p-norm is defined as , for the vector µ comprising K entries.Figure 2 illustrates the idea of the regularization minimizing p-norm [103].The dashed line represents the set of optical properties µ = (µ 1 , µ 2 ) at two positions which provide an identical and single measurement.The image reconstruction of two optical properties from a single measurement is underdetermined.By employing a certain value of the norm of the reconstructed image (solid line), the number of candidates of the solution can be reduced.The candidates exist on intersecting points of the dashed line and circle, which represent the set of µ with a certain value norm in the 2-norm case.By choosing smaller p, it is observed that the intersection points approach one or the other of the axes.Thus, one of entries of µ at a position takes a large value, and the other takes a value close to zero.This indicates that the solution becomes sparse.Generally, the DOT image reconstruction has a large number of unknown changes in µ a and µ s , which are localized in a smaller number of voxels (smaller region) with smaller value of p as well as in the case of Figure 2.Then, the blurriness of the image can be reduced.In the case of Figure 2, by minimizing the p-norm, the solution can be specified on the tangent point.A difficulty encountered while implementing the regularization minimizing p-norm is computing the gradient for the optimization because the cost function becomes nonconvex.There are several techniques to implement the regularization minimizing p-norm.
entries. Figure 2 illustrates the idea of the regularization minimizing p-norm [103].The dashed line represents the set of optical properties μ = (μ1, μ2) at two positions which provide an identical and single measurement.The image reconstruction of two optical properties from a single measurement is underdetermined.By employing a certain value of the norm of the reconstructed image (solid line), the number of candidates of the solution can be reduced.The candidates exist on intersecting points of the dashed line and circle, which represent the set of μ with a certain value norm in the 2-norm case.By choosing smaller p, it is observed that the intersection points approach one or the other of the axes.Thus, one of entries of μ at a position takes a large value, and the other takes a value close to zero.This indicates that the solution becomes sparse.Generally, the DOT image reconstruction has a large number of unknown changes in μa and μs, which are localized in a smaller number of voxels (smaller region) with smaller value of p as well as in the case of Figure 2.Then, the blurriness of the image can be reduced.In the case of Figure 2, by minimizing the p-norm, the solution can be specified on the tangent point.A difficulty encountered while implementing the regularization minimizing p-norm is computing the gradient for the optimization because the cost function becomes nonconvex.There are several techniques to implement the regularization minimizing p-norm.Süzen et al. proposed a compressed sensing method for DOT [104].In this study, the DOT images were reconstructed using measurements acquired via random sampling.Image reconstructions were performed using the regularization method, minimizing the L1 norm (1-norm) of the image.The optimization process was based on the literature [105], in which the L1-norm, i.e., the absolute value, was approximated as , u      + with parameter u smoothing the function, and the gradient was ap- Süzen et al. proposed a compressed sensing method for DOT [104].In this study, the DOT images were reconstructed using measurements acquired via random sampling.Image reconstructions were performed using the regularization method, minimizing the L 1 norm (1-norm) of the image.The optimization process was based on the literature [105], in which the L 1 -norm, i.e., the absolute value, was approximated as |µ| ≈ √ µ • µ + u, with parameter u smoothing the function, and the gradient was approximately computed as d|µ|/dµ ≈ µ/ √ µ • µ + u.L 1 -norm regularization was found to be more robust than Tikhonov (L 2 -norm) regularization in terms of reducing the number of measurement samples used in image reconstruction.
Shaw and Yalavarthy proposed image reconstruction with dynamic CW-domain DOT that employed image reconstruction with Rytov approximation for video-rate imaging [106].They applied the 1 -norm (1-norm) minimization implemented with the open-source YELL1 algorithm [107] derived from the alternating direction method (ADM), which minimizes the augmented Lagrangian function equivalent to Equation ( 8) by updating sequentially the optical properties δµ, the residual error norm e, and the Lagrange multiplier Γ [108].The update of δµ was performed using the product of the absolute value and the sign of the gradient with a threshold for minimizing the 1 -norm.Phantom experiments demonstrated that 1 -norm regularization improved the image contrast compared to traditional 2 -norm regularization.Kavuri et al. reported that L 1 -norm regularization improved DOT image reconstruction in terms of spatial resolution and depth localization in phantom experiments [109].
L 1 -norm regularization was implemented using the depth-compensation method.The Rytov approximation was employed to linearize image reconstruction.In this study, matrix M for depth compensation was computed using the singular value decomposition of matrix J.Then, matrix J in Equation ( 7) is replaced by JM.L 1 -norm regularization was implemented with a logarithmic barrier penalty function bounding the reconstructed values at a certain interval, which was minimized simultaneously with the cost function.
Cao et al. implemented sparsity regularization with L 1 -norm by employing an expectation minimization (EM) algorithm [110].They modeled the measurements (Rytov approximation) using additive noise and optical properties as random Gaussian variables.Subsequently, a Bayesian framework was applied for image reconstruction.The cost function was formulated as the log-likelihood function of the posterior probability to be maximized, which is equivalent to the minimization in Equation ( 1), where matrix W is composed of the variance in the Gaussian distribution of the measurement additive noise, and the regularization term is the L 1 -norm of the optical properties.The expectationmaximization (EM) algorithm consisted of E-and M-steps.In the E-step, the log-likelihood (cost function) is computed using the measurements and optical properties in each iteration.In the M-step, the log-likelihood is maximized.The soft threshold method was used, in which the gradient of the L 1 -norim was computed as sgn(δµ)|δµ| and as zero when |δµ| was smaller than the threshold.
Okawa et al. used the p -norm (0 < p ≤ 1) of a reconstructed image as a regularization term using the focal underdetermined system solver (FOCUSS) algorithm [111,112].The change in the absorption coefficient δµ a from the background was formulated to differentiate the p -norm as δµ ak = sgn(z k )|z k | 2/p at the k-th FEM node with parameter z k .Then, the optimization problem of the image reconstruction in Equation ( 1) was reformulated as follows: where z is a vector composed of z k (k = 1, 2, . . ., K). Images were reconstructed using the mean time-of-flight of the time-resolved measurements.Numerical and phantom experiments demonstrated that the reconstructed area with changes in the absorption coefficient decreased as p approached zero.Prakash et al. reported the p -norm (0 ≤ p ≤ 1) regularization for CW domain measurement [113].They implemented the majorize-minimization framework, which replaced the original nonconvex cost function with the p -norm (0 < p ≤ 1) by a sequence of convex cost functions equal to the original cost function at a certain point and larger than the original cost function at the other points.Moreover, the regularization minimizing smooth 0 -norm was introduced.The smooth 0 -norm was approximated as δµ a 0 = K − ∑ K k=1 ρ(δµ ak ) with ρ (δµ ak ) = exp(−δµ ak 2 /σ 2 ), which takes values of unity and zero when ∆µ ak > σ and ∆µ ak < σ, respectively.Improvements in image quality were demonstrated in a numerical experiment with irregular geometry and a gelatin phantom experiment with optical properties mimicking typical breast tissue.Various applications of sparsity regularization have been proposed [114,115].
Although regularization minimizing the norm of change in the optical properties effectively reduces blurriness and can locate the localized cerebral blood flow change caused by neural activity and cancer tissues with angiogenesis in the early stage, it is difficult to reconstruct a certain volume of a relatively large tissue/organ with nearly uniform optical properties.In such cases, the total variation (TV) method may be useful.The TV regularization method minimizes the norm of the image derivative and reconstructs the image using a nearly piecewise constant with jump discontinuities [116].The TV regularization term is represented as where µ is the reconstructed distribution of optical properties; |•| indicates the Euclidean (L 2 ) norm.In the actual calculation of the TV norm, a difference approximation of the gradient of µ is performed.By minimizing the gradient of µ (approximated with differences in µ on neighboring discretized nodes for numerical calculation using FEM), the image tends to be flat and has piecewise constant parts.Paulsen and Jiang applied the TV regularization method to the FD-domain DOT to reconstruct D and µ a simultaneously [117].Image reconstruction was attempted through numerical simulation with additive random noise and in the phantom experiment, where the ratios of the imaging target with a radius of 12.5 mm to the background were 2:1 and 10:1, respectively.Image reconstruction with TV regularization also reconstructed the area and values of D and µ a correctly, whereas image reconstruction without TV regularization was unable to correctly reconstruct the area and values.Douiri et al. compared Tikhonov, TV, and Huber regularizations [118].Although TV regularization reconstructs a volume with piecewise constant optical properties, it is difficult to reconstruct the changes in optical properties for a small volume included in a large volume.To overcome this difficulty and reconstruct small and large structures, the Hubert regularization term is formulated as follows: where σ is the parameter for switching Ψ, which is automatically adjusted during each iteration.In the flat homogeneous part with a small gradient of the reconstructed image, Ψ functioned similarly to the Tikhonov regularization to smooth the image while preserving the discontinuities with a large gradient.The numerical simulations demonstrated that the changes in µ a and µ s of the small area in a large flat part was reconstructed more clearly by Hubert regularization than by TV regularization.The Hubert regularization method was extended to include prior information on the edges of regions in an object imaged using different imaging modalities [119].

Use of Structural Prior Information
Because DOT provides functional information, the interpretation of DOT images becomes more meaningful by superimposing DOT images onto morphological images obtained by other imaging modalities, such as X-ray CT, MRI, and ultrasound imaging.Moreover, information regarding the location where the optical properties can change improves the precision and efficiency of image reconstruction by reducing the number of unknown variables to be reconstructed.
Schweiger and Arridge proposed a two-stage image reconstruction method that employs prior information [120].They assumed that the anatomical structure inside a measured object can be obtained using other imaging modalities.The averages of the optical properties in certain regions obtained by the segmentation of other images were reconstructed in the first stage.In the first stage, the optical properties of each region were assumed to be homogeneous.Subsequently, the details of the optical properties in each region were reconstructed in the second stage by employing the image obtained in the first stage as an initial estimate.It was demonstrated that the proposed two-stage method employing a segmented MR image of the brain provides a better image than image reconstruction with a homogeneous flat initial estimate.
Dehghani et al. incorporated prior information into the matrix J [121].The discretized positions (FEM node) in a segmented region were lumped by introducing a K × K matrix U that had elements U ξ,η = 1 when the ξ-th position was included in η-th segmented region, and was otherwise 0. The matrix J in Equation ( 9) was replaced by the matrix JU.The ill-posed nature of the inverse problem was relieved by grouping the discretized positions.The accuracy of the reconstructed values improved significantly in the 3D image reconstruction of small objects.The prior information that was introduced into U to fix a homogeneous change in the optical properties in a segmented region is called the hard prior information.
Ntziachristos et al. employed a hard prior method to quantify hemoglobin concentration and oxygen saturation of breast lesions in 14 subjects [122].They obtained prior structural information from the MR images.Image reconstructions of µ a at wavelengths of 780 and 830 nm were carried out for the regions of the tumor and other tissues (background).The cancerous tumor had a hemoglobin concentration almost 10 times higher than that of the background tissues and the oxygen saturation of cancerous tissues was approximately 10% lower than that of the background tissues.
Boverman et al. investigated the influence of imperfect prior information, which are errors in the segmentation of MIR images, on two-step image reconstruction combined with background estimation with nonlinear optimization and linearized image reconstruction with a hard prior method [123].By comparing the image reconstructed from the homogeneous initial guess with that from the initial guess obtained based on prior background information, it was demonstrated that image reconstruction with imperfect prior information can localize the abnormal region in the breast but causes a bias in the reconstructed image.
Di Sciacca et al. tried using hard prior information from ultrasound images in TD measurements with eight wavelengths ranging from 635 to 1060 nm [124].Digital phantoms of the breast were used to visualize benign and malignant lesions.The digital phantoms were generated using software from the Virtual Imaging Clinical Trials for Regulatory Evaluation (VICTRE) Project.B-mode ultrasound images of the breast were simulated using k-wave software [125] by employing phantoms.The optical properties were determined using Gaussian random variables, which had different averages and variances between the benign and malignant lesions.For image reconstruction, segmentations were performed to separate the lesion and normal regions based on the B-mode image.Two-region image reconstruction was performed, and the reconstructed optical properties were used for classification, in which 75% of the lesions were classified correctly.
In contrast, the prior information incorporated into the regularization term was called soft prior information [25].In this approach, prior information was incorporated into matrix L in Equation (9).One of the methods for employing L as prior information is to use the elements of L k,k = 1 when k = k , L k,k = 0 when k = k , and L k,k = -1/K when the k-th and k -th FEM nodes are in the same segmented region (Laplacian structure).Using this method, the difference in optical properties at positions in the same segmented region was minimized.Therefore, the optical properties tended to be homogeneous in each segmented region.
Yalavarthy et al. examined soft and hard prior methods [126,127] in numerical simulations with an MRI-based breast model and an experiment with a gelatin phantom.The Laplacian and Helmholtz structures were used as soft prior methods.In the Helmholts structure, which is a modified version of the Laplacian structure, L k,k = −1/(K + h/l) when the k-th and k -th FEM nodes are used in the same segmented region, where h and l are the distances between the nodes and the size of the imaging target (tumor), respectively.Compared with image reconstruction without prior information, the use of structural prior information dramatically improves the quantification and image quality of DOT.
Brooksby et al. used the soft prior method with MR to reconstruct the optical properties of the breast using FD measurements [128].In numerical simulations, the soft prior method with a tissue layer structure recovered 74% of the true values.Additionally, a good initial estimate of the optical properties provided 99% recovery.
Li et al. proposed a method that uses prior information on the structure obtained using other imaging modalities in the regularization term [129].Using the diagonal matrix L, which has a diagonal element of 1 corresponding to the voxel in the segmented tumor region and 0 corresponding to the normal region, the cost function with the regularization terms was formulated with linearization in the forward process as By taking a large value of the regularization parameter γ 1 and a small value of γ 2 , δµ in the tumor region can become relatively large, although the artifacts, which are noises that appear in the reconstructed image, can be enhanced.In this study, the spectral prior method described in Section 4.3 was also employed.
The Bayesian framework incorporates prior structural information.Guven et al. proposed a hierarchical Bayesian approach [130].Anatomical images, such as MRI and X-ray CT, were segmented into subimages composed of background tissues.In each subimage, we assumed that the absorption coefficient was Gaussian distributed with an unknown mean and variance, which are referred to as the hyperparameters.The FD measurements were then modeled with linearization employing the Rytov approximation and additive Gaussian noise.The hierarchical Bayesian approach estimated δµ a and the hyperparameters simultaneously by maximizing the log-likelihood of the posterior probability density function.
In a manner different from the Bayesian framework, Panagiotou et al. treated DOT image reconstruction with probability statistics to incorporate prior structural information [131].For the regularization term, they used the joint entropy and mutual information of the reconstructed and reference images obtained by different imaging modalities.The joint entropy was formulated as follows:

Use of Spectral Prior Information
While DOT can image the hemoglobin concentration and oxygen saturation of tissues by using the reconstructed µ a and µ s , DOT image reconstruction faces the cross-talk problem caused by nonuniqueness, which means that a reconstructed image with a combination of µ a and µ s different from the true values is obtained from the measurements because the different combinations of µ a and µ s cause the measurements to be identical or very close to each other.
Arridge and Lionheart [132] summarized the conditions for combinations of (µ a1 , D 1 ) and (µ a2 , D 2 ) that cause the same CW measurements, as follows: and where δ 0 is the measured region surrounded by Ω 1 including isotropic light sources.It was reported that the cross-talk problem was reduced by the image reconstruction method, which directly reconstructed the concentrations of photon absorbers using multispectral measurements because the spectral prior information of photon absorbers could function as a constraint to maintain the consistency of the spectra in the reconstructed images [133,134].In DOT image reconstruction employing prior spectral information, µ a at the o wavelengths, λ 1 , λ 2 , . . .λ o , was formulated with the concentrations of q photon absorbers, c(r) = [c 1 , c 2 , . . .c q ] T , as follows: where ε 1 , ε 2 , . . .ε q , are the extinction coefficients of the photon absorbers.In contrast, µ s is defined as r) , by assuming a simplified Mie-scattering coefficient, where a and b are the parameters depending on the size, refraction index, and concentration of the scatterers [133,135].By substituting the equations relating µ a and µ s to a, b, and c into Equation ( 1) or ( 8) and constructing matrix J, image reconstructions with prior spectral information were performed.Corlu et al. proposed an image reconstruction method for multispectral (four wavelengths) CW measurements using nonlinear gradient-based optimization [133].In this study, b was assumed constant, and a and c were reconstructed.To reduce the cross-talk problem, the condition of increasing nonuniqueness in the multispectral-CW DOT was derived based on the conditions in Equations ( 26)- (28).(a, c) and (a + δa, c + δc) yield the same measurements when the following equation holds.
where δc = [δc 1 , δc 2 , . . .δc q ] T and h(a . Equation (30) was rewritten using the vector matrix formula Ax = 1.The least-squares solution of the equation is The residual error norm, E = Ax 0 − 1 , was used as the measure of the ability to distinguish µ a from µ s .When the residual error norm was close to zero, the measurement involved cross-talk problems in image reconstruction.In the numerical simulation, it was demonstrated that the cross-talk between µ a and µ s was reduced by a combination of wavelengths that provided an E that was as large as possible.Simultaneously, the concentrations of oxy-and deoxyhemoglobin were clearly separated by the combination of wavelengths with the small condition number of matrix A of ε in Equation (30).The condition number is the ratio between the maximum and minimum singular values, and indicates the degree of separation ability among c 1 , c 2 , . . .c q .This image reconstruction method was applied to the FD DOT image of oxy-/deoxyhemoglobin, water, and µ s in a real breast tissue at four wavelengths: 690, 750, 786, and 830 nm [136].Li et al. used a spectral prior method for linearized image reconstruction [134].The linear equation, δM = Jδc, was solved for the changes in the concentrations of oxy-and deoxyhemoglobin, and δc was the regularized Moore-Penrose generalized solution, δc = J T (JJ T + λI) −1 δM.To compare the image reconstructions of δc using a spectral prior (direct method) and calculation from the reconstructed δµ a (indirect method), a numerical simulation and a blood phantom experiment were conducted.By measuring the contrast-to-noise (CNR) ratio (peak value of deoxyhemoglobin divided by the mean standard deviation of the reconstructed values in voxels) and cross-talk (changes in oxyhemoglobin concentration divided by deoxyhemoglobin concentration), we verified that the indirect method could improve the contrast-to-noise ratio and reduce cross-talk.
Li et al. employed the spectral prior method with a formulation of µ s that was different from other methods [137].Based on the Mie theory and the assumption that the size of the scattering particle is Gaussian distributed, the u-th scattering particle has a reduced scattering coefficient: where v is the particle size, n is the refractive index, α u is the average particle size, b is the standard deviation, and Q scat is the scattering efficiency.Then, µ s is formulated as where ϕ u is the volume fraction of the uth scattering particle.Equation ( 33) can be written as a vector matrix formula, similar to Equation (29).ϕ u was reconstructed with c simultaneously.This algorithm was examined in numerical simulations with various geometries including particles with sizes of 150, 1000, and 6000 nm in a phantom experiment using blood, intralipid, and agar powder.The algorithm was also tested on the clinical measurements of patients with breast cancer.

Other Important Topics: Regularization Parameter, Artifacts, Local Minima, and AI
In DOT image reconstruction, the selection of the regularization parameter γ is important, as is the case in other inverse problems.The reconstructed image cannot sufficiently approximate the true optical properties when γ is excessively large because the squared error term in Equation ( 2) is not sufficiently small.However, the measurement noise, which causes artifacts in the reconstructed image, disturbs the reconstructed image when γ is extremely small.Selecting an appropriate γ is one of the important aspects in image reconstruction.
The L-curve and generalized cross-validation (GCV) methods are often used to solve inverse problems.The L-curve method finds γ that minimizes the squared error term and regularization term in good balance by plotting the log of the squared residual error term versus the log of the regularization term with various γ, which often draws an L-shaped curve [111,114,115,138].γ allows the corner of the L-shaped curve to efficiently minimize both terms.
The GCV method in the case of Equation ( 9) with W = L = I selects γ that minimizes the GCV function, where δµ(γ) is the Tikhonov regularized image reconstructed using γ [139,140].The trace in Equation ( 34) is the sum of the diagonal elements in the matrix, which can be regarded as a measure of the degrees of freedom in the regularized image δµ [116].Various methods have been proposed to select γ.Correa et al. examined 10 methods to select g including heuristic, L-curve, GCV, unbiased predictive risk estimator, discrepancy principle, normalized cumulative periodogram (NCP), F-slope, quasi-optimality criterion, full width half maximum, contrast-to-noise ratio (CNR), and CNR•Ψ −1 methods [141].Linearized 3D image reconstruction was performed using a tissue-mimicking phantom made of epoxy resin with the imaging target.The optimal method was selected based on the relative error in the reconstructed image, objectivity, and the requirement of no prior knowledge.Although it is difficult to find the corner of the L-shaped curve owing to its high ill-posedness, the L-curve method was selected as the optimal method under the conditions of the study.
Jagannath and Yalavarthy compared the minimal residual method (MRM) with GCV [139].MRM updates δµ in a manner similar to that of the steepest gradient method to solve Equation ( 8) with W = I using Tikhonov regularization with L = I.MRM changes γ in each iteration to update δµ.The updating rule at the tth iteration is where the step size of s t = ∇ f t 2 / J∇ f t 2 + γ t ∇ f t 2 .γ t , which minimizes the squared residual error, was searched using a simplex-type method.The image reconstructions based on the Rytov approximation with CW measurements in the numerical simulations and the gelatin phantom experiment demonstrated that the MRM provided a higher spatial resolution and peaks closer to the true values than the GCV method.Prakash and Yalavarthy proposed the conjugate gradient-type least-square QR (LSQR) method to select γ [142].The LSQR method is typically used to solve linear equations efficiently and is applied to image reconstruction.The LSQR, MRM, L-curve, and GCV methods were compared using numerical simulations and gelatin phantom experiments.The LSQR and MRM methods improved the spatial resolution and quantitativeness of the reconstructed images.
Sun et al. examined methods to select γ automatically, such as the L-curve, GCV methods, MRM, projection error method (PEM), and model function method in nonlinear image reconstruction using Tikhonov regularization from the logarithm of light intensities in CW measurements [143].The PEM calculates γ at the tth iteration, as follows: The absolute bias error (ABE), which is the mean value of the error between the reconstructed and true values; CNR, which compares the values in the region of interest and background; and FWHM were evaluated through numerical simulations.PEM provided better CNR and ABE than the other methods, especially for the image reconstruction of a single target.MRM improved the spatial resolution more than the other methods.
Pogue et al. proposed γ as a spatial variant [144].Because of the diffusive nature of light propagation, the spatial resolution of the DOT image degrades as the distance from the target to the sources and detectors increases.A spatially constant γ can excessively smooth the image of a deeper target.For image reconstruction of a circular object with Tikhonov regularization, the spatially variant γ is formulated as follows: where r n j is the radial position of the jth pixel and R is the radius of the image.γ e and γ c are empirically determined parameters.In the numerical simulation of the FD measurements with three targets, the spatially variant γ reconstructs the target close to the center of the circular object more clearly than the spatially constant γ.The spatially variant γ provided a better image than the spatially constant one by evaluating the peak value of the reconstructed target, the FWHM, and the variance in the homogeneous region.Phantom experiments were also conducted.The artifacts appearing in the high-sensitivity region close to the sources and detectors were suppressed using this regularization method.
Another way to suppress the artifacts and influence of measurement noise on the reconstructed image is to reduce the measurement noise.Okawa et al. proposed a noise reduction method for TD measurement [145].The number of photons measured using the time-correlated single-photon counting method is a Poisson-distributed random variable with a mean equal to the noise-free photon count.Based on the probability density function of the DTOF measurements formulated using the Poisson distribution function, a noise-free DTOF that maximizes the posterior log-likelihood was estimated.Using the estimated noise-free DTOF, the artifacts and variations in the homogeneous region were reduced in the numerical and phantom experiments.
Errors in the measurement conditions cause a mismatch between the forward process and the actual measurement.Consequently, artifacts may appear in the image to compensate for this mismatch.Therefore, calibration of the sources and detectors is important for accurately measuring light intensities.Boas et al. demonstrated the simultaneous reconstruction of δµ and calibration factors [146].In this study, the intensity of the ith light source at position r s,i has a constant calibration factor s i and is formulated as s i •q 0 (r s,i ).Then, the measurement by the jth detector with calibration factor d j was expressed as s i • d j • M i,j .The image reconstruction of the calibration factors and δµ was performed by linearization using the Rytov approximation.The matrix J in Equation ( 7) can be rewritten to include the effects of s i and d j : The matrix was formulated using Green's function for slab geometry.Reconstructions of δµ and the calibration factors were performed using Tikhonov regularization.δµ was scaled to be dimensionless for the simultaneous reconstruction.In the numerical experiment with the simulated measurements generated with a randomly chosen calibration factor, no artifacts appeared in the image reconstructed by the proposed method, whereas the images reconstructed without considering the calibration factor were contaminated with artifacts.
Oh et al. reported image reconstruction with a calibration factor (coupling coefficient) in FD measurement using a Bayesian framework updating µ a and coupling coefficients sequentially [147].The forward process used the finite difference method.A numerical simulation was performed for a cubic object.The phantom experiments employed a lightemitting diode and a photodiode as the source and detector, respectively.A cylindrical object in a culture flask filled with intralipids was reconstructed.
Schweiger et al. reported the nonlinear reconstruction of µ a and µ s and the coupling coefficient to modify the errors in the amplitude and phase in FD measurements [148].µ a , µ s , and the coupling coefficients were scaled by averaging the initial estimates.The FEM was used in the forward process.The damped Gauss-Newton method was used in the optimization process.The 2D numerical simulation and a cylindrical phantom experiment with two targets were performed.In the phantom experiment, hair was placed between the optical fibers and the phantom surface to simulate functional brain imaging.
Fukuzawa et al. applied a calibration coefficient to TD measurements [149].The measured amplitude of the DTOF was influenced by changes in the contact condition of the optical fibers on the surface of the measured object, especially when the surface consisted of a soft material such as the skin of the scalp.The mGPST method using the Laplace-transformed DTOF was employed for image reconstruction.In addition to the numerical experiments, a phantom experiment employing a cylindrical polyacetal resin coated with soft silicone rubber demonstrated that the change in compression on the surface by the optical fibers caused an artifact at the surface in the reconstructed image.The calibration coefficient was applied to the in vivo measurement of the infant's head and reduced artifacts.
The estimation of the calibration coefficient prior to image reconstruction was proposed by Li and Jiang [150], employing a calibration matrix obtained from a homogeneous phantom, and by Tarvainen et al. [151] using a rotationally symmetric array of sourcedetector optical fibers in FD measurement.
In addition to the instability caused by noise, which is relieved by regularization, the reliability of a reconstructed image is influenced by its nonuniqueness.In particular, using the gradient-based optimization method mentioned in Section 2.3, an initial guess of the optical properties may lead to a local minimum of the cost function, which causes the reconstructed image to not approximate the true image.Li et al. used a combination of a genetic algorithm (GA) and a gradient-based method in the optimization process to simulate DOT image reconstruction for prostate cancer diagnosis [152].The GA randomly generated wide-ranging candidates for the reconstructed image, which were updated by mutation, crossover recombination, and selection [153].The updating rule mimics the biological evolutionary process.The initial guess selected by the GA was used in the nonlinear gradient-based image reconstruction.The FEM geometry, including the prostate, intraprostatic tumor, and rectum, was constructed to solve the PDE based on endorectal MRI images.
Jiang et al. used simulated annealing (SA) to prevent the reconstructed image from being trapped by local minima [154].In this method employing SA, which is a type of a Metropolis-Hastings Markov chain Monte Carlo method, δµ a was expressed as a discrete value as δµ a = δµ a max {(S i /M) + (1/2)} by employing the spin variable S i = ±1, ±2, . . ., ±M/2 at the ith position.The cost function f (S i ) with linearization by Rytov approximation and a type of Tikhonov regularization was used in the Boltzmann distribution with temperature T, formulated as p(S i ) ∝ exp(−f (S i )/T), as the probability density function to sample the spin variables.S i randomly generated in accordance with p(S i ) was adopted when the probability increased (i.e., f decreased).Starting from a high T, which provided a highly random selection of S i and cooling by decreasing T, the spins resulting in the reconstructed image were narrowed from a wide range of candidates.The numerical simulations demonstrated that the SA avoided the local minimum arising in a single-spin case [155] and that the DOT image was reconstructed in the multiple-spin case.
AI, as represented by deep learning, can provide a different approach to reconstructing DOT images.Deep learning comprises an artificial neural network (ANN) with several hidden layers.Various ANN architectures for deep learning, such as convolutional neural networks (CNN), generative adversarial networks, and recurrent neural networks (RNN), have been employed.The neural network outputs the data processed by artificial neuron units, the connections of which are adjusted by a learning process using training data that are given input and output pairs.Deep learning and its relationship with the DOT have been detailed in the literature [16,17].Yoo et al. proposed a deep-learning approach [15] that employed an ANN to solve the following Lippman-Schwinger equation for δµ: where δΦ = Φ − Φ 0 , and Φ 0 and G 0 are the fluence rate and Green's function of the CW version of the PDE with the squared diffusive wave number of the background substituted for µ a .Equation ( 39) can be inverted using a technique related to Hankel matrix decomposition, which is achieved using a fully data-driven neural network with an encoderdecoder structure.The training data were generated by numerical simulations using NIRFAST.This approach output the image of δµ better than the 1 -sparse regularization method and the iterative updating method based on Rytov approximation in the numerical, phantom, and in vivo mouse experiments.Mozumder et al. proposed an ANN output of the updated optical properties for the Gauss-Newton method used in a nonlinear optimization process [33].Image reconstruction was formulated using the FD PDE.The Bayesian approach was employed to incorporate prior information on the distribution of the optical properties.In the process to update the optical properties, CNN was employed.The update of the optical properties is expressed as where µ t is the distribution of µ a and µ s in the tth iteration, denoted by the subscript, G θ,t is the updating function achieved by the CNN, and δµ t is the gradient of the cost function.µ t and δµ t are the inputs to the CNN.The selection of the step size, which must be determined in the updating process of the nonlinear optimization, was reduced.Numerical experiments demonstrated that the proposed method worked better than the conventional Gauss-Newton method under modeling errors.The proposed method reduced crosstalk artifacts, whereas the Gauss-Newton method reconstructed images contaminated by artifacts.Takamizu et al. proposed image reconstruction using the long short-term memory (LSTM) deep learning method, which is an extension of the RNN that uses gates to selectively retain and forget information [156].The deep learning scheme is composed of two LSTM layers and a dense layer to reconstruct the positions with changes in µ a .The datasets generated using TD RTE were used to train the ANN.By comparing the DOT images reconstructed using the proposed method and the conventional nonlinear optimization method with DA, it was demonstrated that the proposed method reduced the blurriness of the region with a large µ a .
As shown in the above studies, the use of AI technology represented by deep learning can reduce computational costs in the forward and optimization processes using training data prior to practical image reconstruction.Forward calculation, particularly with RTE, which is repeatedly required in the nonlinear optimization process, requires massive computation, and becomes a bottleneck in the clinical application of DOT.AI can improve the quality of an image by confining the solution space of the inverse problem of DOT image reconstruction in accordance with appropriate prior information.

Discussion
The image-reconstruction algorithms reviewed in the previous sections were studied to address each technical problem.In actual clinical measurements, image reconstruction is required to consider these problems comprehensively.In particular, an improvement in the spatial resolution is essential for the diagnosis of cancers, which are important imaging targets of DOT.Studies on breast cancers using DOT and NIR spectroscopy indicated that malignant breast cancer had statistically different optical properties from normal tissues [8,10,12].Malignant tumors can have a higher total hemoglobin concentration and a larger scattering coefficient than normal tissues and benign tumors.In addition to breast cancer, cancers of other organs involve abnormal vascularization.Microvessel density in tumors, which can be imaged noninvasively by diffuse optical techniques, can be an indicator of tumor grade and stage, and a prognostic indicator [157].The progression and expansion of malignant tumors require nutrient supply and waste removal, which can be accomplished by microcirculation owing to angiogenesis and vasculogenic mimicry in tumors.Therefore, contrast with hemoglobin must be useful.
However, it is difficult to distinguish cancer from benign lesions using hemoglobin as a contrast agent because the hemoglobin concentration increases in both the carrier and benign lesions [12].One of the important characteristics useful for cancer diagnosis is tumor heterogeneity.Although the hemoglobin concentration is an important biomarker, it changes temporally and spatially in tumors.The peripheral area and invasive edge of the tumor can have more blood flow and higher hemoglobin concentrations.However, blood flow in growing tumors can decrease due to decreasing vessel density, severe structural and functional abnormalities of the vessels, and the development of necrosis [158].Heterogeneous hypoxic regions of tumors with low blood flow, which often exist in the core of tumors and resist radiotherapy, are of interest in various medical imaging studies [159].By imaging the heterogeneity of tumors quantitatively and qualitatively using an improved image reconstruction algorithm, DOT may be applied as expected in cancer diagnoses.
Therefore, the low spatial resolution of DOT due to diffusive light propagation should be overcome by image reconstruction for clinical use, although it may not be easy for the current DOT to reconstruct the heterogeneity within a tumor a few centimeters in size.In image reconstruction, the mismatch between the forward calculation and the actual light propagation degrades image quality.Therefore, high-precision computation in the forward process is desirable.The use of RTE is a promising direction, especially for imaging cancers and abnormalities such as thyroid cancer and rheumatic arthritis, which are near the illumination positions.The error of DA increases in regions with µ a > 0.01 mm −1 and a distance smaller than 5 mm from the illuminating position, according to studies [74,97] comparing DA and MC simulations.DA can increase the error in the void and low scattering regions, such as the trachea in the neck and cerebral spinal fluid layer in the head, as discussed in the literature on RTE, P N /SP N approximations, and the MC method.However, the use of the RTE with appropriate discretization is a time-consuming and computationally expensive process.Higher-order approximations using P N /SP N methods [57][58][59][60][61][62][63][64][65][66] and hybrid methods [87] can also be useful.
In the application of DOT, including cancer imaging and functional brain imaging, the imaging targets associated with changes in hemoglobin concentration must be selected from a heterogeneous background consisting of multiple organs and different tissues.The effects of background heterogeneity cannot be ignored under realistic scenarios.Therefore, in the optimization process, methods used in the literature, such as [118], which can manage broad and local changes in the optical properties simultaneously, are needed.To obtain a reliable image, prior structural information from other imaging modalities such as MRI, X-ray CT, and ultrasound imaging is indispensable.Therefore, methods introduced in the literature [120,123,126,130], which incorporate prior structural information and recover the heterogeneity inside the segmented organs, should be employed.Because of the difficulty in selecting the regularization parameter, two-stage image reconstruction [120,123] incorporates hard prior information as an initial estimate.In the second stage, to reconstruct heterogeneities in the segmented regions, combination use of the sparsity regularization methods using p-norm (0 ≤ p ≤ 1) [104,106,[109][110][111]113] can be useful to achieve the highresolution image.The studies indicated that appropriate regularization improved not only the spatial resolution but also the quantification.Other biomarkers, such as water, lipids, collagen, and the scattering coefficient are useful for diagnosis, as shown in [136].The spectral prior method improved the quantification performance of the DOT.
The combined use of DOT and other optical imaging methods is one approach for obtaining a higher spatial resolution.Some studies combining QPAT/FDOT and DOT have been reported [160][161][162][163][164][165][166].QPAT and FDOT are related to the DOT.Photoacoustic tomography (PAT) achieved a higher spatial resolution than DOT by exploiting the low scattering characteristic of ultrasound excited by NIR light [167].FDOT can enhance the contrast of malignant lesions using fluorescent molecules [47].Both PAT and FDOT highlighted tissue abnormalities.However, the background optical properties, which are not observed in the PA and FDOT images, can be imaged by DOT.Moreover, the background optical properties are useful for quantifying PAT/FDOT images because the image intensity of PAT/FDOT depends on the fluence rate of the excitation light, which in turn depends on the optical properties of the heterogeneous background.The reconstruction of the optical properties of the PA sources [50] and the fluence compensation modification of the PA image [168][169][170][171] by QPAT will be improved using background optical properties quantified by DOT, which will play a very important role in guaranteeing quantification by PAT/FDOT.In this context, the aforementioned two-stage DOT image reconstruction method is useful.A quantitative comparison between abnormalities imaged by PAT/FDOT and normal tissues imaged by DOT may provide useful information for diagnosis.

Conclusions
DOT with NIR light, which can image the concentrations of chromophores such as hemoglobin deep inside the body, is expected to be a new imaging modality for cancer diagnosis and functional brain imaging.Unlike X-ray CT, DOT image reconstruction must consider diffusive light propagation, which reduces the spatial resolution and quantification of images.Studies on image reconstruction have attempted to resolve the technical problems of spatial resolution and quantification accuracy in DOT images of highly heterogeneous measured objects.Various image reconstruction methods were reviewed here.
In the forward process, the use of the RTE, P N , SP N , diffusion approximations, and MC simulations has been reported.Improvements in computational performance in recent years have enabled us to base the high-precision forward simulation of light propagation on predicting measurements for DOT image reconstruction.To reduce the mismatch between the forward simulation and the actual measurement, which degrades the quality of the reconstructed image, RTE and its higher-order approximation are useful.To balance the required precision in clinical applications with computational time and cost, an appropriate forward computation method should be chosen.
During the optimization process, various methods, including regularization techniques and the use of structural and spectral prior information, have been proposed to overcome the technical problems of low spatial resolution, noise, and artifacts.Sparsity regularization techniques have improved spatial resolution.The use of prior information on the structures inside the body obtained from other imaging modalities, such as X-ray CT and MRI, is useful for treating heterogeneous backgrounds.Spectral prior information improves the quantification of chromophore concentrations in multispectral DOT imaging.For reliable image reconstruction in cancer diagnoses with high heterogeneity, the use of prior information and sparsity regularization must be effective.By considering the purpose of DOT in clinical use and the specificity of the diffuse optical measurement system, an effective optimization method can be developed.
Although the authors have reviewed important studies, many of them could have been missed owing to the limitations of the authors' knowledge and time.As discussed in this review, various efforts have been made during the past two decades to improve image reconstruction.Moreover, technologies surrounding DOT, such as AI, high-performance computation, and PA imaging, have begun in recent years to change the circumstances.Novel solutions that apply emerging technologies to improve image reconstruction and DOT applications in clinics can be anticipated by leveraging numerous previously reported studies.

Figure 1 .
Figure 1.Flow of DOT image reconstruction and topics related to the processes in image reconstruction mentioned in this review together with corresponding section numbers.

Figure 1 .
Figure 1.Flow of DOT image reconstruction and topics related to the processes in image reconstruction mentioned in this review together with corresponding section numbers.

Figure 2 .
Figure 2. Idea of the regularization minimizing p-norm of the reconstructed image with (a) p = 2 (Tikhonov regularization), (b) p = 1, and (c) 0 < p < 1.The solid lines represent the points of (μ1, μ2) with a constant value of the p-norm.The dashed lines represent the group of points providing identical measurements.The cross-section points of the solid and dashed lines are solutions with a certain value of the p-norm.By minimizing the norms, the tangent points are selected as the optimal reconstructed image.

Figure 2 .
Figure 2. Idea of the regularization minimizing p-norm of the reconstructed image with (a) p = 2 (Tikhonov regularization), (b) p = 1, and (c) 0 < p < 1.The solid lines represent the points of (µ 1 , µ 2 ) with a constant value of the p-norm.The dashed lines represent the group of points providing identical measurements.The cross-section points of the solid and dashed lines are solutions with a certain value of the p-norm.By minimizing the norms, the tangent points are selected as the optimal reconstructed image.