HoloDiffusion: Sparse Digital Holographic Reconstruction via Diffusion Modeling

Abstract: In digital holography, reconstructed image quality is limited primarily by the inability of a single small-aperture sensor to cover the entire field of a hologram. Multi-sensor arrays in synthetic aperture digital holographic imaging overcome this limitation by expanding the detection area. However, imaging accuracy is affected by the gap size between sensors and by the sensor resolution, especially when the number of sensors is limited. An image reconstruction method is proposed that combines physical constraints on the imaging object with a score-based diffusion model, aiming to enhance the imaging accuracy of digital holography with extremely sparse sensor arrays. Prior information about the sample is learned by the neural network in the diffusion model to obtain a score function, which alternates with the underlying physical model in constraining the iterative reconstruction. The results demonstrate that the structural similarity and peak signal-to-noise ratio of images reconstructed with this method are higher than those of the traditional method, and that the method generalizes well.


Introduction
Digital holography (DH) utilizes digital cameras instead of traditional optical recording materials to capture holograms and employs numerical methods to reconstruct both the amplitude and phase information of the light field emitted by objects [1][2][3]. It has become a crucial scientific tool in a wide range of applications, including three-dimensional recognition [4,5], microscopic imaging [6,7], and surface feature extraction [8,9].
Nevertheless, certain areas of DH still require improvement. The quality of the reconstruction is often limited by the field of hologram (FOH). The complete information of a large FOH cannot be detected by a single small-aperture sensor. The multiplexing method can enhance the representation of high-frequency information, which is available in either the frequency domain or the holographic domain [10,11]. The resolution of single-aperture DH can be enhanced through the self-extrapolation method [12,13]. However, the potential for improvement is restricted when employing smaller single-aperture sensors or capturing holograms in long-range imaging scenarios [14].
The synthetic aperture technique substantially broadens the detection area of high-order diffraction fringes [15,16]. As a prevalent technique for recovering complex amplitude light fields of targets, the Gerchberg-Saxton (GS) algorithm [17] restores the phase by alternating iterations between the spatial domain and the holographic domain. Fienup et al. improved the GS method by adding a feedback process for fast convergence [18]. To expand the effective detection area within the holographic field, Huang et al. used multiple sparse aperture sensors, facilitating large-scale information acquisition. They then proposed a self-restoration method for sparse aperture arrays (SRSAA) [19], designed to incrementally recover the missing information within the gaps between sensors. Based on the GS method, the SRSAA offers acceptable image reconstruction quality, but it is sensitive to the selection of the initial conditions. Because prior information about the light field distribution of the target is inadequately extracted and utilized, the method is more likely to fall into local optima. In addition, the quality of the reconstructed image is inherently constrained by the performance of the sensors.
Recently, the diffusion model [20], which has strong generative capabilities, has been proposed and has shown excellent performance in various generative modeling tasks, including medical image generation [21,22], image editing [23], and super-resolution imaging [24].
To achieve high-quality digital holographic imaging of an expansive FOH, the HoloDiffusion method is proposed, which incorporates the diffusion model into iterative digital holographic reconstruction. Prior information on the complex amplitude of the target light field is learned from an amplitude-phase image dataset via a diffusion model. The alternating iterations between the spatial domain and the holographic domain complement each other's information. The acquired prior information is utilized to bolster the reconstruction process. The distribution and energy constraints of objects are imposed on the spatial domain image to obtain high-quality images.
The rest of the paper is structured as follows. The basis of DH, the diffusion model, and the details of the proposed HoloDiffusion are described in Section 2. Experimental results under various conditions are presented in Section 3. The discussion and conclusion are in Sections 4 and 5, respectively.

Digital Holography
The hologram captured by the sensor can be depicted as follows:

$$H(x_h, y_h, z_h) = \left| O(x_h, y_h, z_h) + R(x_h, y_h, z_h) \right|^2$$

where $O(x_h, y_h, z_h)$ represents the object wave function and $R(x_h, y_h, z_h)$ signifies the reference wave function.
The transfer function of the object can be formulated as follows:

$$t(x_1, y_1, z_1) = \left(1 - a(x_1, y_1, z_1)\right) \exp\left(-i\varphi(x_1, y_1, z_1)\right)$$

where $a(x_1, y_1, z_1)$ delineates the attenuation of the incident wave and $\varphi(x_1, y_1, z_1)$ represents the phase introduced by the object. The transmission function $t(x_1, y_1, z_1)$ can also be expressed as $1 + g(x_1, y_1, z_1)$, where $g(x_1, y_1, z_1)$ characterizes the presence of the object and '1' is the transmittance in the absence of an object. According to the forward propagation of Fresnel diffraction, the plane wave $U(x_1, y_1)$ is modulated by $t$ when passing through the object and then propagates to the hologram plane; the resulting field depends on the distance between the object and the hologram, the wavelength $\lambda$, and the wavenumber $k = 2\pi/\lambda$. The backpropagation formula inverts this step and is written in terms of $U^*$, the complex conjugate of $U$, and $h(x_h, y_h, z_h)$, the hologram normalized by the one acquired when the reference wave directly illuminates the sensor without any object present.
The reconstruction of images is limited by the field of hologram. While the sparse aperture array self-recovery method offers a partial solution, it does not use deep learning techniques, and its reconstructions are suboptimal, which is especially evident for highly sparse sensor arrays. In pursuit of richer information and high-quality results, a method termed HoloDiffusion is introduced. This method tackles image reconstruction within highly sparse sensor arrays through the application of diffusion models, employing a score-based generative model to estimate the prior distribution of both amplitude and phase images.
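The forward propagation (FP) and backpropagation (BP) steps described above can be sketched numerically. The block below is a minimal illustration, not the paper's Fresnel integral: it swaps in the angular-spectrum transfer function, which is a common FFT-based way to propagate a sampled field. The function name `propagate`, the toy object, the 128-pixel grid, and the reading of the propagation distance as metres are all assumptions; the wavelength (500 nm) and pixel pitch (3.8 µm) are the values quoted elsewhere in the paper.

```python
import numpy as np

def propagate(field, wavelength, pitch, distance):
    """Propagate a square complex field by `distance` (negative = backward).

    Uses the angular-spectrum transfer function (an assumed stand-in for
    the Fresnel integral); evanescent components are dropped.
    """
    n = field.shape[0]
    freqs = np.fft.fftfreq(n, d=pitch)                # spatial frequencies
    fx, fy = np.meshgrid(freqs, freqs, indexing="ij")
    arg = 1.0 / wavelength**2 - fx**2 - fy**2
    propagating = arg > 0
    kz = 2.0 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    transfer = np.where(propagating, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Toy object following t = (1 - a) * exp(-i * phi); a = phi = 0 outside a square.
n = 128
a = np.zeros((n, n))
phi = np.zeros((n, n))
a[56:72, 56:72] = 0.9
phi[56:72, 56:72] = 1.0
t = (1.0 - a) * np.exp(-1j * phi)

wavelength, pitch, z = 500e-9, 3.8e-6, 2.4e-3         # z in metres is an assumption
wave_at_sensor = propagate(t, wavelength, pitch, z)   # FP: object plane -> sensor
hologram = np.abs(wave_at_sensor) ** 2                # the sensor records intensity only
recovered = propagate(wave_at_sensor, wavelength, pitch, -z)  # BP of the complex field
```

Backpropagating the full complex field recovers the object exactly; a real sensor only provides `hologram`, which is why the phase must be retrieved iteratively.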

Score-Based Generative Model
As illustrated in Figure 1, the score-based diffusion model considers the continuous distribution of data points over time in accordance with the gradual evolution of the diffusion process. It progressively transforms the data points into random noise through a forward stochastic differential equation (SDE). This process is subsequently reversed, reconstructing data from the noise. Hence, a neural network can be trained to estimate the gradient of the log data distribution (i.e., $\nabla_x \log p(x)$), enabling numerical solution of the reverse SDE.

The diffusion process $\{x(t)\}_{t=0}^{T}$ is parameterized by the continuous time variable $t \in [0, T]$, where $x(0) \sim p_0$, $x(T) \sim p_T$, $p_0$ is the data distribution, and $p_T$ is an unstructured prior distribution that carries no information about $p_0$, such as a Gaussian distribution with a fixed mean and variance. This diffusion process can be modeled as the solution of a forward SDE:

$$dx = f(x, t)\,dt + g(t)\,dw$$

where $w$ is the standard Wiener process, $f(x, t)$ is called the drift coefficient of $x(t)$, and $g(t)$ is the diffusion coefficient of $x(t)$.
Given that the reverse of a diffusion process is also a diffusion process [25], the reverse SDE can be formulated as follows:

$$dx = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] dt + g(t)\,d\bar{w}$$

where $\bar{w}$ is the standard Wiener process with time running from $T$ to 0, and $dt$ is an infinitesimal negative time step. Once the score $\nabla_x \log p_t(x)$ of each marginal distribution is known for all $t$, the reverse diffusion process can be derived from the above equation and simulated to sample from $p_0$.
Different SDEs can be constructed by selecting different functions $f(x, t)$ and $g(t)$. To achieve higher sample quality, the variance exploding (VE) SDE is adopted:

$$dx = \sqrt{\frac{d[\sigma^2(t)]}{dt}}\,dw$$

where $\sigma(t) > 0$ is a monotonically increasing function, typically configured as a geometric progression [20]. Beginning with samples $x(T) \sim p_T$, samples $x(0) \sim p_0$ can be obtained by reversing the process, which is expressed as the reverse-time VE-SDE:

$$dx = -\frac{d[\sigma^2(t)]}{dt} \nabla_x \log p_t(x)\,dt + \sqrt{\frac{d[\sigma^2(t)]}{dt}}\,d\bar{w}$$

Given that the true value of $\nabla_x \log p_t(x)$ is unknown, the reverse SDE is solved approximately by training a time-conditioned neural network $S_\theta(x, t) \approx \nabla_x \log p_t(x)$. During training, $\nabla_x \log p_t(x)$ is replaced by the score of the Gaussian perturbation kernel, $\nabla_x \log p_t(x(t) \mid x(0))$, which is centered at $x(0)$. The parameter $\theta$ is optimized as follows:

$$\theta^* = \arg\min_\theta \mathbb{E}_t \mathbb{E}_{x(0)} \mathbb{E}_{x(t) \mid x(0)} \left[ \left\| S_\theta(x(t), t) - \nabla_{x(t)} \log p_t(x(t) \mid x(0)) \right\|_2^2 \right]$$

Consequently, an approximate solution of the reverse SDE is obtained:

$$dx = -\frac{d[\sigma^2(t)]}{dt} S_\theta(x, t)\,dt + \sqrt{\frac{d[\sigma^2(t)]}{dt}}\,d\bar{w}$$

Then, the Euler discretization method is employed for the numerical solution of the SDE. The time variable $t$ is divided uniformly into $N$ intervals over the inclusive range $[0, 1]$, such that $0 = t_0 < \cdots < t_N = 1$ and $\Delta t = 1/N$.
The essence of training a diffusion model is to train a predictor to approximate the real noise distribution. The network uses a typical encoder-decoder structure with a U-Net architecture. In the encoder section, the U-Net progressively compresses the size of the image. In the decoder section, it gradually restores the image size. Additionally, residual connections are employed between the encoder and decoder so that the decoder does not lose the information from previous steps when inferring and recovering image details.
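The training target described above can be made concrete with a small sketch. It assumes a geometric noise schedule between 0.01 and 10 (the range quoted in the parameter section), uses the VE perturbation kernel $x(t) = x(0) + \sigma(t) z$, and weights the loss by $\sigma(t)^2$, which is a common choice but an assumption here. All function names are hypothetical, and `score_fn` stands in for the U-Net.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0   # range quoted in the parameter section

def sigma(t):
    """Geometric noise schedule on t in [0, 1] (assumed form)."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def perturb(x0, t, rng):
    """Sample x(t) from the Gaussian kernel centred at x(0); also return the noise."""
    z = rng.standard_normal(x0.shape)
    return x0 + sigma(t) * z, z

def kernel_score(xt, x0, t):
    """Score of the Gaussian perturbation kernel: -(x(t) - x(0)) / sigma(t)^2."""
    return -(xt - x0) / sigma(t) ** 2

def dsm_loss(score_fn, x0, t, rng):
    """Denoising score-matching loss at one time t, weighted by sigma(t)^2 (assumption)."""
    xt, _ = perturb(x0, t, rng)
    residual = score_fn(xt, t) - kernel_score(xt, x0, t)
    return float(sigma(t) ** 2 * np.mean(residual ** 2))

rng = np.random.default_rng(0)
x0 = np.ones((4, 4))
xt, z = perturb(x0, 0.5, rng)
# A network that reproduces the kernel score exactly would incur zero loss.
oracle = lambda x, t: kernel_score(x, x0, t)
loss_oracle = dsm_loss(oracle, x0, 0.5, np.random.default_rng(1))
```

In practice the loss is averaged over random `t` and over training images, and `score_fn` is the time-conditioned network $S_\theta$.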

Image Reconstruction Utilizing HoloDiffusion
In DH, given the constraints of the sensor pixel pitch, the diffracted beam originating from each point on the object can be regarded as a diffraction cone. The maximum spatial frequency that a sensor can capture is therefore limited. As illustrated in Figure 2a, the target, with its amplitude and phase, produces the holographic field through digital holographic imaging. The blue section in Figure 2b is obtained from sparse sampling by the sensors. Because a single full-field sensor cannot effectively capture the entire holographic field, a sparse sensor array is employed for collection. When the image is reconstructed from the sparse sensor array, the information lost in the sensor gaps removes frequency components that correspond to the entire scene. Hence, these sampling gaps influence the reconstructed amplitude and phase distributions of the object.
The digital holographic challenge posed by sparse aperture arrays can be transformed into a hologram recovery problem involving sparse sampling, as depicted in the following equation:

$$M = W \odot I$$

where $W$ represents the sparse sampling matrix determined by the sensor arrangement and $\odot$ denotes the Hadamard product, i.e., the element-wise multiplication of corresponding entries of two matrices. $I$ is the holographic field on the sensor and $M$ is the sparsely sampled hologram.
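As a small illustration of the sampling model $M = W \odot I$, the sketch below builds a binary mask for four square sensors. The exact sensor layout is an assumption: the four sensors are placed as a 2 × 2 block centred in the field and separated by `gap` pixels, using the 1200-pixel field and 450-pixel sensors quoted in the experiments. The function name `sensor_mask` is hypothetical.

```python
import numpy as np

def sensor_mask(field_size, sensor_size, gap):
    """Binary sampling matrix W: four square sensors in a centred 2 x 2 block.

    The layout (centred block, `gap` pixels between neighbours) is an assumption.
    """
    start = (field_size - 2 * sensor_size - gap) // 2
    if start < 0:
        raise ValueError("sensors and gap do not fit inside the field")
    w = np.zeros((field_size, field_size), dtype=bool)
    spans = [(start, start + sensor_size),
             (start + sensor_size + gap, start + 2 * sensor_size + gap)]
    for r0, r1 in spans:
        for c0, c1 in spans:
            w[r0:r1, c0:c1] = True
    return w

W = sensor_mask(field_size=1200, sensor_size=450, gap=90)
I = np.random.default_rng(0).random((1200, 1200))   # stand-in for the holographic field
M = W * I                                           # Hadamard product: keep sensor pixels only
```

Everything outside the sensors, including the cross-shaped gap, is zero in `M` and has to be filled in by the reconstruction.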
Inspired by the transformation of the above problem, the HoloDiffusion method is proposed to improve the quality of reconstructed images. A detailed flowchart of HoloDiffusion is illustrated in Figure 3.
In the iterative reconstruction stage, each iteration starts from the current hologram estimate, which is converted into amplitude and phase through BP:

$$O^{i} = \mathcal{F}^{-1}(\hat{I}^{i})$$

where $\mathcal{F}$ symbolizes the forward propagation (FP) process and $\mathcal{F}^{-1}$ corresponds to the backward propagation (BP) process. The estimated hologram is denoted by $\hat{I}^{i}$, and $O^{i}$ represents the estimated amplitude and phase. Here, the superscript $i$ serves as an iteration marker during the reconstruction process. At the commencement of the reverse SDE stage, $\hat{I}^{n-1} = M$.
For the amplitude, the absorption constraint is implemented. The amplitude is first inverted to give the absorption, because an object that absorbs light cannot have negative absorption. Wherever the absorption is negative, it is set to zero, and the phase at the corresponding location is set to zero as well. The result is then multiplied by a matrix such that the pixel values are 0 everywhere except in the support area. Using this prior information, the result is inverted again, and an amplitude with a background of 1 is obtained. Therefore, the absorption constraint for the amplitude $O_{amp}$ and the support constraint [26] for the phase $O_{pha}$ are utilized to eliminate the twin image, where the superscript $i$ is omitted for brevity and $P$ denotes the pixels outside the support area.
Specifically, the continuous distribution over time is considered with diffusion processes. By reversing the SDE, random noise can be converted into data for sampling. The numerical solver employed for the reverse SDE functions as the predictor. In particular, a sample from the prior distribution can be obtained through the reverse SDE presented in Equation (8), discretized in the following manner:

$$O^{i} = O^{i+1} + (\sigma_{i+1}^2 - \sigma_{i}^2)\, S_\theta(O^{i+1}, \sigma_{i+1}) + \sqrt{\sigma_{i+1}^2 - \sigma_{i}^2}\; z_{i+1}$$

where $i = n-1, \cdots, 1, 0$ indexes the discretization steps of the reverse-time SDE, $\sigma_i$ is the noise level at the $i$-th iteration, $z_{i+1} \sim \mathcal{N}(0, 1)$ is standard normal noise, and $S_\theta(O^{i+1}, \sigma_{i+1})$ is the score function given by the time-conditional neural network.
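To show how the discretized predictor behaves, here is a minimal sketch that runs it with a toy score in place of the trained network. The toy assumes the data distribution is a unit Gaussian, for which the score of the noise-perturbed distribution is known in closed form; this is an illustrative stand-in, not the paper's U-Net. The step count n = 500 and the noise range 0.01 to 10 follow the values quoted in the parameter section; the function names are hypothetical.

```python
import numpy as np

def reverse_diffusion_sample(score_fn, shape, sigmas, rng):
    """Run the discretized reverse VE-SDE from sigmas[-1] down to sigmas[0]."""
    n = len(sigmas) - 1
    x = sigmas[-1] * rng.standard_normal(shape)      # start from the wide Gaussian prior
    for i in range(n - 1, -1, -1):
        step = sigmas[i + 1] ** 2 - sigmas[i] ** 2   # sigma_{i+1}^2 - sigma_i^2
        z = rng.standard_normal(shape)
        x = x + step * score_fn(x, sigmas[i + 1]) + np.sqrt(step) * z
    return x

def toy_score(x, sigma):
    """Exact score of N(0, 1) data perturbed with noise level sigma (toy assumption)."""
    return -x / (1.0 + sigma ** 2)

n = 500
sigmas = np.geomspace(0.01, 10.0, n + 1)             # geometric noise schedule
samples = reverse_diffusion_sample(toy_score, (20000,), sigmas,
                                   np.random.default_rng(42))
```

With the exact toy score, the samples end up close to the unit Gaussian they were meant to come from, which is the behaviour the predictor relies on when the score is supplied by the trained network instead.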
After each predictor iteration, a fidelity operation is performed to ensure data consistency (DC): the regions of the estimated hologram covered by the sensors are replaced with the measured data $M$. The hologram produced by the fidelity operation advances to the next iteration. The reconstructed amplitude and phase are derived via BP from the hologram in the final iteration. The pseudo-code of the HoloDiffusion algorithm is given in Algorithm 1.
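The physical-constraint and fidelity steps can be sketched as two small array operations. This is one reading of the prose, under stated assumptions: amplitudes above 1 (negative absorption) and all pixels outside the support are reset to an amplitude of 1 and a phase of 0, and the DC step overwrites the sensor-covered pixels of the estimated hologram with the measurement. The function names and the tiny test arrays are hypothetical.

```python
import numpy as np

def apply_object_constraints(amp, pha, outside_support):
    """Absorption and support constraints (an assumed reading of the text).

    Pixels with amplitude > 1 (negative absorption) or outside the support
    are reset to the background: amplitude 1, phase 0.
    """
    amp = amp.copy()
    pha = pha.copy()
    reset = (amp > 1.0) | outside_support
    amp[reset] = 1.0
    pha[reset] = 0.0
    return amp, pha

def data_consistency(est_hologram, measured, mask):
    """Keep measured values where a sensor exists, the estimate elsewhere."""
    return np.where(mask, measured, est_hologram)

amp = np.array([[1.2, 0.5], [0.8, 0.9]])
pha = np.array([[0.3, 1.0], [0.2, 0.4]])
outside = np.array([[False, False], [True, False]])
amp_c, pha_c = apply_object_constraints(amp, pha, outside)

estimate = np.full((2, 2), 5.0)
measured = np.array([[1.0, 0.0], [0.0, 4.0]])
mask = np.array([[True, False], [False, True]])
fused = data_consistency(estimate, measured, mask)
```

In the full loop these two functions would sit on either side of the predictor step: constraints act on the amplitude and phase in the object plane, and the fidelity step acts on the hologram after forward propagation.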

Algorithm 1. HoloDiffusion
…
7: $O^{i} \leftarrow \mathcal{F}(\hat{I}^{i})$
8: End for
9: Return $O^{0}$

After the distribution of the image set has been learned, the amplitude and phase to be reconstructed are input into the predictor, which generates a reconstructed image; this counts as one iteration. The phase and amplitude are then converted into a hologram through FP, and the four sensor blocks of the hologram are overwritten with the measured data for fidelity. After many iterations, the final reconstructed amplitude and phase are obtained.

Data Specification
The dataset comprises 60,000 images, each with a resolution of 1200 × 1200 pixels and a pixel pitch of 3.8 µm. The central 28 × 28 pixel region of each image is generated from the MNIST dataset. For the amplitude, the digit portion is assigned a pixel value of 0.1 and the background a pixel value of 1. For the phase, the pixel values of the digit and the background are 1 and 0, respectively.
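The construction of one amplitude-phase pair can be sketched as follows. The paper does not state how the digit portion is separated from the background, so the threshold of 0.5 is an assumption, as are the function name `make_sample` and the synthetic bar pattern used here in place of a real MNIST digit.

```python
import numpy as np

def make_sample(digit, canvas=1200, amp_digit=0.1, threshold=0.5):
    """Embed a small digit image at the centre of an amplitude/phase pair.

    Amplitude: 0.1 on the digit, 1 on the background.
    Phase:     1 on the digit, 0 on the background.
    The threshold that defines the digit portion is an assumption.
    """
    mask = np.asarray(digit) > threshold
    h, w = mask.shape
    top, left = (canvas - h) // 2, (canvas - w) // 2
    amplitude = np.ones((canvas, canvas))
    phase = np.zeros((canvas, canvas))
    # Basic slices are views, so writing through them updates the full canvas.
    amp_window = amplitude[top:top + h, left:left + w]
    pha_window = phase[top:top + h, left:left + w]
    amp_window[mask] = amp_digit
    pha_window[mask] = 1.0
    return amplitude, phase

# Stand-in for a 28 x 28 MNIST digit: a vertical bar of 20 x 4 = 80 pixels.
digit = np.zeros((28, 28))
digit[4:24, 12:16] = 1.0
amplitude, phase = make_sample(digit)
```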

Model Training and Parameter Selection
The parameter selections are as follows: the wavelength is 500 nm, the side length of the object area is 0.001, the propagation distance is 0.0024, and the hologram side length is 0.001. During prior learning, 2000 noise levels are used; the added noise has a mean of 0 and a standard deviation drawn at random between 0.01 and 10. The random number seed is set to 42. The model is trained with the Adam algorithm at a learning rate of 0.0002. The method is implemented on a computer equipped with an NVIDIA TITAN GPU. To balance reconstruction quality against speed, the iteration number is set to n = 500 in the reconstruction stage.

Quantitative Indices
To quantitatively assess the quality of the reconstructed data, the mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) are employed.
The MSE quantifies the error between paired observations that represent the same phenomenon. It is defined as follows:

$$\mathrm{MSE} = \frac{1}{N_P} \left\| \hat{O} - O \right\|_2^2$$

where $N_P$ is the number of pixels in the reconstruction result, $\hat{O}$ is the estimated value of the reconstructed phase or amplitude, and $O$ is the ground truth for comparison. The closer the MSE is to zero, the closer the reconstructed image is to the reference image.
PSNR describes the relationship between the maximum possible power of a signal and the power of the corrupting noise. A higher PSNR means better reconstruction quality. PSNR is expressed as follows:

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\max(O)^2}{\mathrm{MSE}} \right)$$

SSIM measures the similarity between the ground truth and the reconstruction. It is represented as follows:

$$\mathrm{SSIM} = \frac{(2\mu_{\hat{O}}\mu_{O} + c_1)(2\sigma_{\hat{O}O} + c_2)}{(\mu_{\hat{O}}^2 + \mu_{O}^2 + c_1)(\sigma_{\hat{O}}^2 + \sigma_{O}^2 + c_2)}$$

where $\mu_{\hat{O}}$ and $\sigma_{\hat{O}}^2$ are the mean and variance of $\hat{O}$, $\mu_{O}$ and $\sigma_{O}^2$ are those of $O$, and $\sigma_{\hat{O}O}$ is the covariance of $\hat{O}$ and $O$. $c_1$ and $c_2$ are constants that keep the ratio stable.
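The three indices can be computed with a few lines of NumPy. The sketch below is a simplified illustration: SSIM is evaluated once over the whole image rather than over local windows, and the constants $c_1 = (0.01 L)^2$ and $c_2 = (0.03 L)^2$ with data range $L = 1$ are conventional defaults assumed here, not values taken from the paper. The function names are hypothetical.

```python
import numpy as np

def mse(est, ref):
    """Mean squared error over all pixels."""
    return float(np.mean((est - ref) ** 2))

def psnr(est, ref):
    """PSNR in dB, using the maximum of the reference as peak value.

    Undefined (division by zero) when est equals ref exactly.
    """
    return float(10.0 * np.log10(np.max(ref) ** 2 / mse(est, ref)))

def ssim_global(est, ref, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM over the whole image (simplifying assumption)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_e, mu_r = est.mean(), ref.mean()
    var_e, var_r = est.var(), ref.var()
    cov = np.mean((est - mu_e) * (ref - mu_r))
    num = (2 * mu_e * mu_r + c1) * (2 * cov + c2)
    den = (mu_e ** 2 + mu_r ** 2 + c1) * (var_e + var_r + c2)
    return float(num / den)

ref = np.linspace(0.0, 1.0, 100).reshape(10, 10)   # toy ground truth with peak value 1
est = ref + 0.1                                    # constant error of 0.1 per pixel
```

For this toy pair the MSE is 0.01, so the PSNR is 20 dB; the SSIM falls below 1 only because the means differ.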

Reconstruction at Gaps of Different Sizes
To evaluate the effectiveness of HoloDiffusion in reconstruction under various sensor gaps, experimental verification is carried out. Additionally, the reconstruction results are qualitatively and quantitatively compared with the SRSAA.
In the experiments, four sensors were used to detect the area corresponding to the blue sections, as shown in Figure 4. The four sensors were symmetrically distributed at the four corners of the holographic field, with gaps for every two sensors. The sensor size was specified as 450 × 450, and the size of the entire hologram was 1200 × 1200. In the iteration using the diffusion model, the reconstruction targets were cropped to amplitude and phase images with a resolution of 512 × 512 pixels due to the memory size limitations of the graphics card. Under these fixed conditions, the images reconstructed using the HoloDiffusion and SRSAA methods were compared under various gap conditions.
As depicted in Figure 5, the quality of the phase and amplitude reconstructed by both methods gradually deteriorates as the gap increases. However, HoloDiffusion produces clearer results than the SRSAA as the gap size increases, indicating improved performance with sparser sensor configurations. The SRSAA can achieve satisfactory reconstruction with a small gap size; as the gap widens, more artifacts become apparent in its results. Compared to the SRSAA, the amplitude and phase reconstructed using HoloDiffusion exhibit superior image quality. For instance, when the gap size is 90, the image reconstructed using HoloDiffusion appears clearer and exhibits significantly reduced artifacts. The image reconstructed using the SRSAA method loses most target pixel details, and the linked regions are disjointed. Conversely, the images reconstructed using the HoloDiffusion method closely resemble the ground truth while maintaining the details and leaving the structures unchanged.
As shown in Table 1, the average PSNR, SSIM, and MSE values of 100 images reconstructed from the MNIST dataset are recorded. HoloDiffusion achieves notable average PSNR gains of 6.16 dB, 6.38 dB, 10.44 dB, 12.28 dB, and 11.10 dB at the various gaps. Notably, when the gap size is 90, the PSNR of the reconstructed phase and amplitude reaches 35.51 dB and 41.62 dB, respectively. In comparison with the SRSAA method, the reconstruction results of HoloDiffusion also display higher SSIM values and smaller MSE values. Hence, at larger gap sizes, HoloDiffusion demonstrates significant advancements in suppressing noise and artifacts.

Reconstruction under Different Numbers of Sensors
To confirm the effectiveness and robustness of the HoloDiffusion method in reconstructing images with varying numbers of sensors (SN), a comparison is made between the HoloDiffusion method and the SRSAA method.
In the experiments in this section, the gap was 120, and the sensor size was 500. A different number of sensors is used for various distributions, as depicted in Figure 6. Under these fixed conditions, the images reconstructed by the HoloDiffusion and SRSAA methods are compared for different numbers of sensors.
As illustrated in Figure 7, a sharp decline in the quality of the reconstructed amplitude and phase is observed for both methods as the number of sensors decreases. While the SRSAA is capable of reconstructing a clear image with a high number of sensors, its performance deteriorates significantly as the sensor count decreases, deviating from the ground truth in terms of basic outlines and details. As the number of sensors decreases, the image reconstructed by HoloDiffusion is clearer than that of the SRSAA. Furthermore, when the basic structure and outline of the image remain unchanged, the reconstructed image can more closely approach the ground truth. Experiments with varying sensor numbers demonstrate that HoloDiffusion not only reconstructs image details more effectively but also suppresses the generation of artifacts and twin images.

The average PSNR, SSIM, and MSE values of 100 images reconstructed from the MNIST dataset with different numbers of sensors are documented in Table 2. In general, HoloDiffusion consistently outperforms the SRSAA across the various sensor counts. In particular, when the number of sensors is three, the PSNR values of the phase and amplitude reconstructed by HoloDiffusion are improved by 8.11 dB and 6.79 dB, respectively. Furthermore, the reconstruction results obtained using the HoloDiffusion method demonstrate higher SSIM values and smaller MSE values. Even with fewer sensors, HoloDiffusion suppresses noise and twin images to a significant extent.

Generalizability Verification on Cross-Dataset
A pre-trained diffusion model is employed to evaluate the generalization capabilitie of the model across various datasets.The effectiveness and robustness of both method are gauged.
For visual comparison, the reconstructed images are presented in Figure 8.The Ho loDiffusion method demonstrates fewer artifacts and maintains better continuity of imag features compared to the SRSAA on cross-dataset.Regardless of whether it is analyzin phase or amplitude information, the HoloDiffusion method showcases improved recon struction that more accurately mirrors the ground truth.As listed in Table 3, the HoloDiffusion method consistently outperforms the SRSAA in terms of performance, spanning almost all assessed datasets and metrics.Remarkably the HoloDiffusion method achieves substantially elevated PSNR values, exhibiting an im provement margin of nearly 14 dB in certain cases.This suggests a significantly enhance image reconstruction quality when employing the HoloDiffusion method.The SSIM ou comes indicate that HoloDiffusion excels in preserving structural integrity compared t the SRSAA.Furthermore, the lower MSE values imply a more accurate approximation t As listed in Table 3, the HoloDiffusion method consistently outperforms the SRSAA in terms of performance, spanning almost all assessed datasets and metrics.Remarkably, the HoloDiffusion method achieves substantially elevated PSNR values, exhibiting an improvement margin of nearly 14 dB in certain cases.This suggests a significantly enhanced image reconstruction quality when employing the HoloDiffusion method.The SSIM outcomes indicate that HoloDiffusion excels in preserving structural integrity compared to the SRSAA.Furthermore, the lower MSE values imply a more accurate approximation to the original image in HoloDiffusion, indicating fewer errors during the reconstruction process.
Collectively, these results emphasize the exceptional performance of the HoloDiffusion method in image reconstruction, highlighting its potential for robust application across diverse imaging scenarios.

Reconstruction at Different Sensor Sizes
To verify the effectiveness and robustness of the HoloDiffusion method in image reconstruction, sensor arrays of different sizes are used while maintaining a fixed gap.
As detailed in Figure 9, the images reconstructed using the SRSAA method show a significant loss of detail, and connected areas appear fragmented or broken. The images reconstructed using the HoloDiffusion method are very close to the ground truth and preserve its details and structures. As the sensor array size increases, the reconstructed images exhibit fewer artifacts and greater detail. In these comparisons, the HoloDiffusion method achieves visibly better results than the SRSAA.
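The sampling geometry described above (several sensors of a given size separated by a fixed gap) can be sketched as a binary mask over the hologram plane. The sketch below is only an illustration under stated assumptions: the function name `sensor_mask`, the centered 2 x 2 layout, the hologram-plane size of 1200, and the interpretation of sizes and gaps as pixel counts are all hypothetical and are not taken from the paper.

```python
import numpy as np


def sensor_mask(field_size, sensor_size, gap, grid=(2, 2)):
    """Boolean mask of a centered rows x cols sensor array.

    True marks pixels covered by a sensor; False marks the gaps and the
    uncovered border. All lengths are in pixels (an assumption).
    """
    rows, cols = grid
    height = rows * sensor_size + (rows - 1) * gap
    width = cols * sensor_size + (cols - 1) * gap
    if height > field_size or width > field_size:
        raise ValueError("sensor array does not fit in the hologram plane")

    top = (field_size - height) // 2
    left = (field_size - width) // 2
    mask = np.zeros((field_size, field_size), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            y = top + r * (sensor_size + gap)
            x = left + c * (sensor_size + gap)
            mask[y:y + sensor_size, x:x + sensor_size] = True
    return mask


# Hypothetical example: four sensors of size 500 with a gap of 180.
mask = sensor_mask(field_size=1200, sensor_size=500, gap=180)
coverage = mask.mean()  # fraction of the hologram plane that is recorded
```

Multiplying a full hologram by such a mask gives the sparse measurement; shrinking `sensor_size` at a fixed `gap` lowers `coverage`, which is the situation examined in this subsection.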

Reconstruction under Large Pixel Sizes
To evaluate the effectiveness of the HoloDiffusion method in reconstructing images with limited information, this section examines its performance using larger pixel sizes. Specifically, while maintaining the size of the single sensor, each pixel size is doubled from the original, which is equivalent to under-sampling.
For visual comparison, Figure 10 shows that the structure and contour of the amplitude and phase reconstructed by the HoloDiffusion method become closer to the ground truth as the sampling rate increases. The partial loss of image resolution can be further mitigated by using the HoloDiffusion method. In addition, at low sampling rates (SRs), the amplitude reconstructed by HoloDiffusion is of higher quality than the phase. It is hypothesized that this phenomenon can be attributed to the phase being more influenced by the amount of information than the amplitude. Table 5 tabulates the average PSNR, SSIM, and MSE values of 100 images reconstructed from the MNIST dataset. Compared to the SRSAA method, the HoloDiffusion method produces reconstruction results with superior SSIM and PSNR values. It can be seen that at a sampling rate of 4/5, images reconstructed via the HoloDiffusion method begin to demonstrate better results.
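The statement that doubling the pixel size at a fixed sensor size is equivalent to under-sampling can be illustrated with simple block averaging. This is a hedged sketch, not the sampling procedure used in the experiments: it assumes that a larger pixel records the mean intensity over the area of the smaller pixels it replaces, and the function name `bin_pixels` is hypothetical.

```python
import numpy as np


def bin_pixels(hologram, factor=2):
    """Simulate a sensor whose pixel pitch is `factor` times larger.

    Each output pixel is the mean of a factor x factor block of input
    pixels, so the sensor area is unchanged while the number of samples
    drops by factor**2.
    """
    hologram = np.asarray(hologram, dtype=np.float64)
    h, w = hologram.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the binning factor")
    blocks = hologram.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))
```

With `factor=2`, a sensor patch of 500 x 500 samples becomes 250 x 250 samples covering the same area, so fine fringes of the hologram are averaged out before reconstruction.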

Conclusions
In the context of digital hologram reconstruction, an algorithm founded on a diffusion model with robust generative abilities is proposed. The diffusion model is incorporated into the physics-based iterative reconstruction process, specifically for image rotation in the SRSAA method. This integration enables image generation on the amplitude image within the holographic domain. Specifically, amplitude and phase information for both channels of the holographic domain image is obtained using diffusion modeling. After under-sampling the image, the SRSAA is utilized to ensure the fidelity of the phase and amplitude. The phase and amplitude are then fed into the network, which makes its prediction based on the learned prior information. The numerical SDE solution is executed alternately during the iteration stage, allowing for the acquisition of generated sample data and facilitating efficient reconstruction. The image reconstruction and model generation capabilities were validated, and the method demonstrated superior reconstruction under four new image sampling methods. The results indicate that the model exhibits greater flexibility in handling complex holographic image reconstruction and has broader applicability to diverse digital images.

Figure 1. The figure shows that the data perturbed by noise is smoothed along the trajectory of an SDE. By estimating the score function $\nabla_x \log p_t(x)$ using an SDE, it is possible to approximate the reverse SDE and subsequently solve it, enabling the generation of image samples from noise.

Figure 2. The figure shows sparse-aperture digital holography. (a) The inset depicts the amplitude and phase of the target. (b) Holographic field and sparse sensor distribution; the blue part represents the position of the sensor.

Figure 3. The figure shows the proposed method for digital holographic reconstruction. (Top): Prior learning stage to learn the gradient distribution via denoising score matching. (Bottom): Iteration between the numerical SDE solver and the data-consistency step to achieve reconstruction.
During the prior learning stage, the gradient distribution of amplitude and phase is learned by denoising score matching. Notably, the amplitude and phase of the object $O$ are represented as the dual-channel matrix $O = [O_{amp}, O_{pha}]$. HoloDiffusion is trained with $O$ in high-dimensional space as the network input, resulting in the acquisition of the parameterized score function $S_\theta(O, t)$. In the iterative reconstruction stage, the process starts with a hologram $\hat{I}_{i+1}$, which is turned into phase and amplitude through BP. A conversion of the hologram into amplitude and phase is necessary:
Photonics 2024, 11, 388
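The dual-channel representation $O = [O_{amp}, O_{pha}]$ mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it takes a complex object-plane field as given (the BP step itself is not implemented), and the function names `field_to_channels` and `channels_to_field` are hypothetical rather than part of the published method.

```python
import numpy as np


def field_to_channels(field):
    """Split a complex field into a 2-channel array [amplitude, phase].

    The phase is wrapped to (-pi, pi], as returned by numpy.angle.
    """
    field = np.asarray(field, dtype=np.complex128)
    return np.stack([np.abs(field), np.angle(field)], axis=0)


def channels_to_field(channels):
    """Recombine [amplitude, phase] channels into a complex field."""
    amplitude, phase = channels
    return amplitude * np.exp(1j * phase)
```

Stacking amplitude and phase as two real channels lets a real-valued network process both quantities jointly, and the inverse mapping returns a complex field that the physical propagation model can act on.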

Figure 4. The figure shows the sensor distribution at different distances. The blue squares represent sensor arrays.

Figure 5. The figure shows the reconstruction results using different methods at sensor array sizes equal to 450 and different gaps. (a) Ground truth, (b) SRSAA, (c) residual image between (a) and (b), (d) HoloDiffusion, (e) residual image between (a) and (d).

Figure 6. The figure shows the distribution of different numbers of sensors. The blue squares represent sensor arrays. (a-c) represent the sampling conditions when the number of sensors is 2, 3, and 4, respectively.

Figure 7. The figure shows the reconstruction results using different methods with a sensor array size equal to 500, a gap equal to 120, and different numbers of sensors. (a) Ground truth, (b) SRSAA, (c) residual image between (a) and (b), (d) HoloDiffusion, (e) residual image between (a) and (d).

Figure 8. The figure shows the reconstruction results using different methods with a sensor array size equal to 500 and a gap equal to 120 on the cross-dataset test. (a) Ground truth, (b) SRSAA, (c) residual image between (a) and (b), (d) HoloDiffusion, (e) residual image between (a) and (d).

Figure 9. The figure shows the reconstruction results using different methods at different sensor array sizes and a gap equal to 180. (a) Ground truth, (b) SRSAA, (c) residual image between (a) and (b), (d) HoloDiffusion, (e) residual image between (a) and (d).

Figure 10. The figure shows the reconstruction results using different methods at different sampling rates with a sensor size equal to 500 and a gap equal to 120. (a) Ground truth, (b) SRSAA, (c) residual image between (a) and (b), (d) HoloDiffusion, (e) residual image between (a) and (d).

Table 1. The table shows the quantitative reconstruction results at different sensor gaps.

Table 2. The table shows the quantitative reconstruction results with different numbers of sensors.

Table 3. The table shows the quantitative reconstruction results on a cross-dataset.

Table 4 presents the average PSNR, SSIM, and MSE values for 100 images reconstructed from the MNIST dataset using a sensor count of four and a gap size of 180. The best PSNR, SSIM, and MSE values achieved using different methods are highlighted in bold. With sensor sizes of 350, 400, 450, and 500, HoloDiffusion demonstrates an average PSNR gain of 5.22 dB. Remarkably, when the sensor array size is 500, 11.10 dB can be achieved by the HoloDiffusion method. At the same time, compared with the SRSAA method, the reconstruction results of the HoloDiffusion method have higher SSIM values and smaller MSE values. Consequently, HoloDiffusion provides considerable improvements in noise and artifact suppression.

Table 4. The table shows the quantitative reconstruction results with different sensor array sizes.

Table 5. The table shows the quantitative reconstruction results at different sampling rates.