Article

Efficient Single-Exposure Holographic Imaging via a Lightweight Distilled Strategy

Jiaosheng Li, Haoran Liu, Zeyu Lai, Yifei Chen, Chun Shan, Shuting Zhang, Youyou Liu, Tude Huang, Qilin Ma and Qinnan Zhang

1 School of Photoelectric Engineering, Guangdong Polytechnic Normal University, Guangzhou 510665, China
2 School of Electronics and Information, Guangdong Polytechnic Normal University, Guangzhou 510665, China
3 School of Electrical and Photoelectronic Engineering, West Anhui University, Lu'an 237012, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Photonics 2025, 12(7), 708; https://doi.org/10.3390/photonics12070708
Submission received: 17 June 2025 / Revised: 8 July 2025 / Accepted: 13 July 2025 / Published: 14 July 2025
(This article belongs to the Special Issue Advancements in Optical Metrology and Imaging)

Abstract

Digital holography can capture and reconstruct 3D object information, making it valuable for biomedical imaging and materials science. However, traditional holographic reconstruction methods require phase-shifting operations in the time or space domain combined with complex computational processing, which limits their range of applications. The integration of deep learning (DL) with physics-informed methodologies has opened new avenues for tackling this challenge. However, most existing DL-based holographic reconstruction methods have high model complexity. In this study, we first design a lightweight model with few parameters through the synergy of depthwise separable convolution and the Swish activation function and then employ it as a teacher to distill a smaller student model. By reducing the number of network layers and using knowledge distillation to improve the performance of a simple model, high-quality holographic reconstruction is achieved with only one hologram, greatly reducing the number of parameters in the network model. This distilled lightweight method cuts computational expense dramatically, with a parameter count of just 5.4% of that of the conventional Unet-based method, thereby enabling efficient holographic reconstruction in resource-constrained settings.

1. Introduction

Digital holography achieves non-destructive acquisition of the 3D wavefront information of an object by recording the interference fringes between the object and reference beams [1,2]. Compared with traditional optical microscopy, it offers unique advantages in fields such as live-cell observation and micro/nano-particle tracking [3,4,5,6]. However, speckle noise and phase-wrapping effects introduced by coherent light sources severely constrain the quality of the reconstructed images. Traditional digital holography relies on mechanical phase-shifting devices to acquire multiple interference images step by step for complex amplitude reconstruction, resulting in high system complexity [7,8]. The time-division control mechanism introduces acquisition delays, significantly restricting real-time detection in dynamic scenarios. Although polarization-based phase-shifting techniques can improve real-time performance, they constrain the space-bandwidth product and require complex spatial matching systems [9,10].
In recent years, deep learning (DL) has demonstrated powerful feature representation capabilities and end-to-end processing efficiency in image processing tasks. By automatically learning hierarchical image features through multi-layer neural networks, DL enables robust and adaptive analysis of visual data [11,12]. DL-based methods bypass the need for rigorous analytical models by exploiting powerful data-driven modeling capabilities acquired through direct learning from input–output data pairs. The efficacy of a DL system is fundamentally determined by its architectural topology, which regulates inter-layer information propagation. In digital holography applications, most deployed deep neural networks (DNNs) are adapted from established architectures such as convolutional neural networks (CNNs) [13,14,15], deep generative models [16], and recurrent neural networks (RNNs) [17]. However, holograms typically contain substantial amounts of information and are captured at high resolution. To exploit this rich information, researchers often opt for deeper networks with more parameters and sophisticated components such as attention mechanisms [18,19], yet the large size of holographic images makes such computationally and memory-intensive models difficult to deploy. The development of lightweight network architectures in recent years has introduced efficient DL models that preserve high performance with fewer parameters and lower computational requirements [20,21]. These models are now finding increasing application in end-to-end frameworks for direct holographic reconstruction and computational holography [22,23].
Drawing from this, we introduce a lightweight distillation network (Ldnet) model designed to minimize the number of network layers and enhance the capabilities of simpler models via a knowledge distillation strategy, ultimately cutting resource usage and facilitating lightweight deployment. Knowledge distillation achieves complex-model performance with a simple model [24]: it transfers knowledge from a large, complex model (the teacher) to a smaller, simpler model (the student), offering greater efficiency and reduced resource consumption while maintaining a high level of performance [25]. We first designed a lightweight depthwise separable network (Lnet), which combines depthwise separable convolution with the Swish activation function while maintaining encoder–decoder symmetry. Through cross-layer skip connections, multi-scale feature fusion is achieved, effectively compensating for the high-frequency information loss caused by pooling operations and improving the reconstruction of high-frequency details. Furthermore, the model significantly compresses the number of parameters while enhancing deep feature representation through a dynamic channel expansion strategy in the bottleneck layer, ensuring both a lightweight design and robust performance. This ‘expansion’ is functional rather than structural: the number of physical channels remains unchanged; instead, the strategy dynamically adjusts the effective contribution of each channel through an adaptive feature refinement process. This dynamic behavior is injected into the student model (Ldnet) through the attention transfer loss during knowledge distillation. Subsequently, this model serves as the teacher to guide the training of a more compact student model, whose core advantage lies in combining a lightweight architecture with competitive performance. The designed lightweight distillation model achieves superior reconstruction quality while reducing parameters by approximately 95% compared to conventional Unet-based methods [15,26]. Moreover, through gradient accumulation, mixed-precision training, and EMA smoothing, training efficiency and stability are significantly improved, which holds promise for meeting the requirements of real-time holographic reconstruction on mobile devices.

2. Theoretical Framework and Network Design

2.1. Theoretical Framework

Assuming the recorded object is located on plane $x_O y_O$, with its complex amplitude distribution on this plane denoted as $O(x_O, y_O)$, the hologram recording plane is positioned at $x_R y_R$. Given the reference wave field $R(x_R, y_R)$ and the object-to-CCD separation distance $d_0$, the object wave field formed at the recording plane through Fresnel diffraction propagation can be expressed as:

$$O(x_R, y_R) = \iiint O(x_O, y_O)\,\frac{\exp\!\left[ik\sqrt{z_O^2 + (x_R - x_O)^2 + (y_R - y_O)^2}\right]}{i\lambda\sqrt{z_O^2 + (x_R - x_O)^2 + (y_R - y_O)^2}}\,\mathrm{d}x_O\,\mathrm{d}y_O\,\mathrm{d}z_O \tag{1}$$
where $\lambda$ denotes the wavelength of the incident light and $k$ represents the wavenumber ($k = 2\pi/\lambda$). Typically, the reference wave can be regarded as a plane wave with constant amplitude, expressed as $R(x_R, y_R) = A_R \exp(i\varphi_R)$, where $A_R$ indicates the amplitude of the reference wave and $\varphi_R$ its phase. The interference pattern intensity distribution recorded by the CCD is derived from the coherent superposition of the object wave and reference wave, mathematically expressed as:

$$I(x_R, y_R) = \left|R(x_R, y_R) + O(x_R, y_R)\right|^2 \tag{2}$$
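For intuition, Equations (1) and (2) can be evaluated numerically; the sketch below simulates a hologram with the angular spectrum method, which is numerically equivalent to the Fresnel propagation above at these scales. All parameters (wavelength, pixel pitch, propagation distance) and the toy object are illustrative assumptions, not values from this work.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Propagate a complex field over distance z with the angular spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)                  # spatial frequencies (1/m)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    # Free-space transfer function; evanescent components are suppressed.
    H = np.exp(1j * 2 * np.pi * z / wavelength * np.sqrt(np.maximum(arg, 0.0)))
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Assumed parameters: 632.8 nm He-Ne laser, 4.65 um pixel pitch, 10 cm distance.
wavelength, dx, z = 632.8e-9, 4.65e-6, 0.10
obj = np.zeros((256, 256), dtype=complex)
obj[96:160, 96:160] = 1.0                         # toy amplitude object O(x_O, y_O)

object_wave = angular_spectrum_propagate(obj, wavelength, dx, z)  # O(x_R, y_R)
reference_wave = 1.0                              # unit-amplitude plane wave, zero phase
hologram = np.abs(reference_wave + object_wave) ** 2              # Eq. (2)
```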
After acquiring the hologram and completing the phase-shifting operation, conventional holographic reconstruction techniques can be employed to recover the original image [7,8]. Departing from these classical approaches, we developed a lightweight knowledge distillation framework to establish the nonlinear mapping between holograms and their corresponding reconstructions. The dataset consists of 2400 hologram–image pairs for training, along with 310 samples each for testing and validation. By feeding a single hologram into the optimized network model, a high-quality original image can be reconstructed. As illustrated in Figure 1, a Mach–Zehnder phase-shifting interferometry system was built to collect holograms. The distillation architecture comprises a teacher model and a student model. Architecture development began with a lightweight depthwise separable network (Lnet) featuring encoder–decoder symmetry and cross-layer skip connections for multi-scale feature fusion. In addition, by dynamically expanding the number of channels in the bottleneck layer, the model enhances deep feature expression while remaining lightweight, significantly reducing the number of parameters. Using this lightweight network to extract features from holograms yields reconstructed data and trained weights. Leveraging knowledge distillation, we then employed this model as the teacher to supervise the training of a more compact student model. The student model learns from the teacher's reconstructed data and trained weights to perform another round of feature extraction on the hologram, producing a better reconstructed image with fewer parameters.

2.2. Teacher Model Architecture (Lnet)

The teacher model employs a modified Unet architecture specifically optimized for holographic image reconstruction. The core innovation involves replacing standard convolution with depthwise separable convolution, effectively reducing computational complexity while preserving feature representation capability. The encoder consists of four downsampling stages, each utilizing depthwise separable convolutions with Swish activation for feature extraction, followed by max pooling to form an expanding channel pathway (64→128→256→512). The bottleneck layer employs 1024-channel depthwise separable convolutions to encode global information. In the decoder, transposed convolutions perform upsampling, with skip connections to corresponding encoder-level feature maps for cross-layer information fusion and spatial detail recovery. The final output layer generates reconstruction results matching the input hologram dimensions through a 1 × 1 convolution.
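A minimal PyTorch sketch of this building block follows; it is illustrative rather than the authors' implementation, and details not stated in the text (kernel size, bias, normalization) are assumptions.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution, with Swish."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.SiLU()   # SiLU in PyTorch is the Swish activation x * sigmoid(x)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

# One encoder stage of the 64 -> 128 -> 256 -> 512 pathway:
# separable convolutions for feature extraction, then max pooling.
encoder_stage = nn.Sequential(
    DepthwiseSeparableConv(64, 128),
    DepthwiseSeparableConv(128, 128),
    nn.MaxPool2d(2),
)
```

For a 3 × 3 kernel, the separable form needs $9C_{in} + C_{in}C_{out}$ weights instead of $9C_{in}C_{out}$ for a standard convolution, which is where most of the parameter reduction reported in Table 1 comes from.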

2.3. Student Model Architecture

The student model adopts a lightweight architecture derived from the teacher network, preserving structural similarity while halving the channel numbers for significant computational savings. Incorporating a multi-level knowledge incremental mechanism, it achieves multi-dimensional knowledge transfer from low-level texture features to high-level semantic features by adding auxiliary heads (1 × 1 convolutions) at each decoder layer that project student features to the teacher's corresponding dimensions. This design enables multi-granularity knowledge distillation spanning low-level edge features to high-level semantic representations, allowing the compact student model to perform hierarchical feature matching with the teacher while preserving its lightweight advantage.
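The projection heads can be sketched as follows; the specific channel widths are assumptions based on the stated halving of the teacher's channels.

```python
import torch.nn as nn

# Assumed decoder widths: the student halves the teacher's channel counts.
teacher_channels = [512, 256, 128, 64]
student_channels = [c // 2 for c in teacher_channels]

# One 1x1 auxiliary head per decoder level projects student features up to the
# teacher's dimensionality so that feature-level distillation losses can be computed.
aux_heads = nn.ModuleList(
    [nn.Conv2d(s, t, kernel_size=1) for s, t in zip(student_channels, teacher_channels)]
)
```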

2.4. Knowledge Training Framework

The training framework employs a dynamically weighted hybrid loss function that combines the primary task loss with feature-level losses, incorporating a spatial weighting strategy to prioritize high-response regions. The framework's optimization also integrates EMA model updates, gradient accumulation, mixed-precision training, gradient clipping, and cosine annealing learning rate scheduling with restarts, effectively balancing model performance and computational efficiency. The Lnet training strategy is optimized with a composite loss function. The total loss is defined as (hyperparameter $\alpha = 0.5$):

$$L_{Total} = \alpha L_{L1} + (1 - \alpha) L_{Huber} \tag{3}$$
The L1 loss computes the mean absolute error between the predicted value $y_i$ and the label $x_i$:

$$L_{L1} = \frac{1}{N}\sum_{i=1}^{N}\left|x_i - y_i\right| \tag{4}$$
The Huber loss $L_{Huber}$ takes the form of a piecewise function (hyperparameter $\delta = 1.0$):

$$L_{Huber} = \frac{1}{N}\sum_{i=1}^{N}\begin{cases}\dfrac{1}{2}\left(x_i - y_i\right)^2 & \text{if } \left|x_i - y_i\right| < \delta \\[4pt] \delta\left(\left|x_i - y_i\right| - \dfrac{1}{2}\delta\right) & \text{if } \left|x_i - y_i\right| \ge \delta\end{cases} \tag{5}$$
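In PyTorch, Equations (3)–(5) map directly onto built-in losses, and the optimization techniques listed above can be combined in a compact training step. The sketch below is a plausible rendering under assumed settings (optimizer, learning rate, accumulation steps, clipping norm, EMA decay); none of these values come from the paper, and `loader` is a placeholder for the hologram/label data loader.

```python
import torch
import torch.nn as nn

l1, huber = nn.L1Loss(), nn.HuberLoss(delta=1.0)

def lnet_loss(pred, target, alpha=0.5):
    # Eq. (3): equally weighted combination of L1 (Eq. (4)) and Huber (Eq. (5)).
    return alpha * l1(pred, target) + (1.0 - alpha) * huber(pred, target)

model = nn.Conv2d(1, 1, 3, padding=1)               # stand-in for Lnet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())
ema = torch.optim.swa_utils.AveragedModel(          # EMA-smoothed copy of the weights
    model, avg_fn=lambda avg, cur, n: 0.999 * avg + 0.001 * cur)
accum_steps = 4                                      # gradient accumulation

for step, (holo, label) in enumerate(loader):
    with torch.autocast("cuda", enabled=torch.cuda.is_available()):  # mixed precision
        loss = lnet_loss(model(holo), label) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # gradient clipping
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        ema.update_parameters(model)
        scheduler.step()
```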
The loss function of the Ldnet model adopts a joint optimization framework combining the task loss with a multi-level distillation loss; the total loss is defined as:

$$L_{Total} = \alpha L_{task} + (1 - \alpha) L_{distill} \tag{6}$$
The hyperparameter $\alpha$ follows a dynamic weight adjustment algorithm that linearly modulates the ratio between the task loss and the distillation loss over the course of training. Initially, a ratio of 0.1:0.9 prioritizes knowledge transfer, gradually shifting to a task-oriented ratio of 0.7:0.3 in the later stages. This enables a smooth transition from imitation learning to autonomous optimization. The primary task loss $L_{task}$, which ensures fundamental reconstruction capability by directly constraining the pixel-wise error between the student's output and the ground-truth label, is defined as:

$$L_{task} = \frac{1}{N}\sum_{i=1}^{N}\left\|y_{student}^{(i)} - y_{target}^{(i)}\right\|_1 \tag{7}$$
The multi-level feature distillation loss $L_{distill}$ uses a stepped weight allocation across levels: the lowest-level feature weight is 0.1, increasing layer by layer to 0.4 at the highest level, which enhances the transfer of high-level semantic knowledge. Mathematically:

$$L_{distill} = 0.1\,L_{dec1} + 0.2\,L_{dec2} + 0.3\,L_{dec3} + 0.4\,L_{dec4} \tag{8}$$
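A small sketch of this weighting logic, assuming the linear $\alpha$ schedule runs over the full set of training epochs (the exact schedule granularity is not stated in the paper):

```python
def alpha_schedule(epoch: int, total_epochs: int) -> float:
    """Linearly anneal the task-loss weight in Eq. (6) from 0.1 to 0.7."""
    t = epoch / max(total_epochs - 1, 1)
    return 0.1 + (0.7 - 0.1) * t

LEVEL_WEIGHTS = (0.1, 0.2, 0.3, 0.4)   # Eq. (8): stepped weights, low to high level

def ldnet_total_loss(task_loss, dec_losses, epoch, total_epochs):
    alpha = alpha_schedule(epoch, total_epochs)
    distill = sum(w * l for w, l in zip(LEVEL_WEIGHTS, dec_losses))
    return alpha * task_loss + (1.0 - alpha) * distill
```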
Each per-level loss is computed for the decoder's four layers, with the feature similarity loss weighted by $\lambda_{feat} = 0.8$ and the attention transfer loss by $\lambda_{att} = 0.2$:

$$L_{dec} = \lambda_{feat} L_{\cos}^{(dec)} + \lambda_{att} L_{att}^{(dec)} \tag{9}$$
The cosine similarity $L_{\cos}^{(k)}$ of the normalized features is then computed and weighted by the teacher's attention map, effectively mitigating the optimization bias caused by feature scale discrepancies:

$$L_{\cos}^{(k)} = 1 - \frac{1}{N}\sum_{i=1}^{N} A_t^{(k,i)}\,\mathrm{cos\_sim}\!\left(\hat{f}_s^{(k,i)}, \hat{f}_t^{(k,i)}\right) \tag{10}$$
where $\hat{f}_s^{(k)}$ and $\hat{f}_t^{(k)}$ denote the channel-wise normalized student and teacher features, respectively:

$$\hat{f}_s^{(k)} = \frac{f_s^{(k)}}{\left\|f_s^{(k)}\right\|_2} \tag{11}$$

$$\hat{f}_t^{(k)} = \frac{f_t^{(k)}}{\left\|f_t^{(k)}\right\|_2} \tag{12}$$
The symbols $f_s^{(k)}$ and $f_t^{(k)}$ correspond to the feature maps output by the $k$-th decoder layer of the student and teacher models, respectively. The attention transfer loss $L_{att}^{(k)}$ in Equation (9), which constrains the student's attention maps to align with the teacher's, is defined as:

$$L_{att}^{(k)} = \frac{1}{N}\sum_{i=1}^{N}\left\|A_s^{(k,i)} - A_t^{(k,i)}\right\|_2^2 \tag{13}$$
where $A_s^{(k)}$ and $A_t^{(k)}$ denote the spatial attention maps of the student and teacher features, respectively, generated by taking the mean absolute value of the features across channels:

$$A_s^{(k)} = \mathrm{sigmoid}\!\left(\frac{1}{C}\sum_{c=1}^{C}\left|f_s^{(k,c)}\right|\right) \tag{14}$$

$$A_t^{(k)} = \mathrm{sigmoid}\!\left(\frac{1}{C}\sum_{c=1}^{C}\left|f_t^{(k,c)}\right|\right) \tag{15}$$
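Equations (9)–(15) translate into a few lines of PyTorch. The sketch below assumes (N, C, H, W) feature tensors and detaches the teacher features so gradients flow only through the student; both are standard choices rather than details given in the paper.

```python
import torch
import torch.nn.functional as F

def attention_map(feat):
    """Eqs. (14)-(15): sigmoid of the channel-wise mean absolute feature value."""
    return torch.sigmoid(feat.abs().mean(dim=1, keepdim=True))

def decoder_level_loss(f_s, f_t, lam_feat=0.8, lam_att=0.2):
    """Eq. (9) for one decoder level; f_s has already passed through its aux head."""
    f_t = f_t.detach()                      # teacher provides targets only
    a_s, a_t = attention_map(f_s), attention_map(f_t)
    # Eqs. (10)-(12): channel-wise L2 normalization, per-pixel cosine similarity,
    # weighted by the teacher's attention map.
    cos = F.cosine_similarity(F.normalize(f_s, dim=1), F.normalize(f_t, dim=1), dim=1)
    loss_cos = 1.0 - (a_t.squeeze(1) * cos).mean()
    # Eq. (13): MSE between student and teacher attention maps.
    loss_att = F.mse_loss(a_s, a_t)
    return lam_feat * loss_cos + lam_att * loss_att
```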
Experimental measurements of network performance reveal substantial differences among the three models in computational efficiency and storage demands, as quantified in Table 1. The conventional Unet serves as the benchmark, exhibiting substantial computational redundancy with a parameter count of 31.03 M, a computational complexity of 54.53 G FLOPs, and a model file size of 118.37 MB. The teacher model (Lnet) uses depthwise separable convolution and channel compression strategies to reduce the parameter count to 5.99 M (an 80.7% reduction), the computational complexity to 28 G FLOPs (a 48.6% reduction), and the model storage to 22.84 MB. While retaining the advantage of the Unet feature pyramid structure, it achieves intensive utilization of computing resources. Ldnet further optimizes the architecture, reducing the parameter count to 1.69 M (71.8% less than Lnet), the FLOPs to 8.23 G (70.6% less than Lnet), and the model size to only 6.44 MB. When running on resource-limited hardware such as a CPU, the frame rates of the three models are 2.98, 3.52, and 8.17 frames/s, respectively. Remarkably, through synergistic channel pruning and multi-stage distillation, Ldnet preserves the teacher model's representational capacity while operating with merely 5.4% of the original Unet's parameters. Notably, Ldnet achieves an 84.9% reduction in FLOPs compared to the conventional Unet, a roughly 6.6-fold decrease in floating-point operations per inference. This computational efficiency significantly enhances deployment feasibility on edge devices and provides a viable solution for real-time holographic reconstruction on mobile platforms. Parameter efficiency analysis shows that, moving from Unet to Lnet to Ldnet, the effective FLOPs per million parameters increased from 1.76 G to 4.88 G, a 2.77-fold improvement in computing resource utilization, fully reflecting the advantage of the teacher–student framework for model compression.
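The storage figures in Table 1 follow directly from the parameter counts at float32 precision (4 bytes per weight); a quick sanity check in Python (model objects are placeholders):

```python
import torch.nn as nn

def params_in_millions(model: nn.Module) -> float:
    """Total trainable parameter count, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# float32 weights occupy 4 bytes each:
# Unet:  31.03e6 * 4 / 2**20 = 118.4 MB   -> matches Table 1
# Lnet:   5.99e6 * 4 / 2**20 =  22.9 MB
# Ldnet:  1.69e6 * 4 / 2**20 =   6.4 MB
```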

3. Analysis and Verification of Experimental Results

We selected one set of results from the validation set, presented in Figure 2. Figure 2a shows the reconstructed label data, while Figure 2b–d show the reconstructions produced by Unet, Lnet, and Ldnet. The calculated structural similarity index measure (SSIM) values are 0.88, 0.89, and 0.92, and the peak signal-to-noise ratio (PSNR) values are 19.76, 20.42, and 21.57 dB, respectively. To display the reconstruction results more intuitively, the 80th row was extracted and plotted in Figure 2e–h. The curve analysis reveals noticeable discrepancies from the label in the reconstructions of the traditional Unet and the teacher model at the 80th row, whereas Ldnet's output maintains strong alignment with the ground-truth label. The quantitative results and curve analysis further prove that the proposed method can obtain high-precision holographic reconstruction results with fewer model parameters.
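These metrics are straightforward to reproduce; a minimal sketch with scikit-image, assuming 8-bit images (the paper does not state its bit depth):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(recon: np.ndarray, label: np.ndarray):
    """Return PSNR (dB), SSIM, and MAE for one reconstruction against its label."""
    psnr = peak_signal_noise_ratio(label, recon, data_range=255)
    ssim = structural_similarity(label, recon, data_range=255)
    mae = np.abs(label.astype(float) - recon.astype(float)).mean()
    return psnr, ssim, mae
```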
Subsequently, to further demonstrate the reliability of the results, we selected five additional sets of reconstruction results for different digits, as shown in Figure 3. The label images appear in the first column, with their corresponding experimental holograms in the second column. When processed by the three pre-trained networks, these holograms yield the reconstructions presented in columns three through five. Qualitative assessment confirms that all three networks successfully reconstruct accurate digit information. For quantitative evaluation, we computed the PSNR, SSIM, and mean absolute error (MAE) metrics across the five reconstruction sets, with the results plotted in Figure 4, Figure 5 and Figure 6. Clearly, the digit ‘1’ has the highest reconstruction quality across the three networks. The reconstruction results of Unet are comparable to those of Lnet, while Ldnet's reconstructions are the best across the various digits. The results demonstrate that Ldnet achieves higher reconstruction accuracy than the other two methods and aligns more closely with the ground truth, confirming its feasibility and superiority.

4. Discussion

To further investigate the impact of the model parameter count on network performance, in this section we halve the number of channels in Ldnet. The resulting model, denoted Ldnet_half, contains 0.47 M parameters with a computational complexity of 2.41 G FLOPs. After training with the same strategy, three sets of predicted digit results are presented in Figure 7. For quantitative analysis, we evaluated performance using three metrics: PSNR, SSIM, and MAE. The results, summarized in Table 2, show that Ldnet_half still achieves satisfactory reconstruction quality despite the reduced channel count, although its performance is slightly inferior to that of the original Ldnet model.
Additionally, the impact of the Swish activation function on network performance was further examined. An Lnet variant (Lnet_onlyDS) was designed that retains depthwise separable convolution but replaces the Swish activation with the same ReLU function used in Unet. Under identical training conditions, the network's predicted results are shown in Figure 8. As in the previous analyses, we conducted a quantitative evaluation of these results, presented in Table 3. The findings indicate that using depthwise separable convolution alone yields slightly inferior reconstruction quality compared to the original Lnet design, demonstrating the effectiveness and necessity of combining depthwise separable convolution with the Swish activation function for optimal performance.
Furthermore, we investigated the robustness of Ldnet to varying noise levels. We introduced Gaussian noise perturbations at 1%, 2%, 5%, 10%, and 15% during the imaging process, with the corresponding network outputs visualized in Figure 9. Quantitative analysis (Table 4) demonstrates that the reconstruction quality remains stable, maintaining a PSNR around 15 dB and SSIM consistently above 0.80 across all noise levels (1–15%). Notably, the PSNR and SSIM values show negligible variation with increasing noise levels, further confirming the network’s strong noise resistance capability.
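A simple way to reproduce this perturbation protocol is sketched below; scaling the noise standard deviation by the hologram's peak intensity is our assumption, as the paper does not specify how the 1–15% levels are defined.

```python
import numpy as np

def add_gaussian_noise(hologram: np.ndarray, level: float, rng=None) -> np.ndarray:
    """Add zero-mean Gaussian noise with std = level * peak hologram intensity."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, level * hologram.max(), size=hologram.shape)
    return np.clip(hologram + noise, 0.0, None)   # intensities stay non-negative

holo = np.random.rand(128, 128)                   # placeholder hologram
noisy = {lvl: add_gaussian_noise(holo, lvl) for lvl in (0.01, 0.02, 0.05, 0.10, 0.15)}
```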

5. Conclusions

In this paper, we propose an efficient single-exposure holographic imaging method implemented through a lightweight distillation network (Ldnet). Through the collaborative design of depthwise separable convolution and the Swish activation function, we first constructed a lightweight depthwise separable network (Lnet). This model uses cross-layer skip connections to achieve multi-scale feature fusion while maintaining an encoder–decoder symmetric structure, which not only effectively compensates for the high-frequency information loss caused by pooling operations but also significantly improves the reconstruction of high-frequency details. In addition, by dynamically expanding the number of channels in the bottleneck layer, the model enhances deep feature expression while remaining lightweight, significantly reducing the number of parameters. This model was then used as the teacher, combined with the knowledge distillation strategy, to guide the training of a student model with fewer parameters. Ultimately, feeding a single hologram into the pre-trained student model enables fast, high-quality holographic reconstruction. The proposed method requires only one hologram to achieve high-quality reconstruction, outperforming the output of the traditional Unet model while requiring only 5.4% of its parameters, and is expected to meet the real-time holographic reconstruction requirements of mobile devices. However, owing to the model's lightweight design, there is an upper limit to its expressive power, which may result in insufficient recovery of ultra-high-resolution or extremely high-frequency details. Moreover, as with any data-driven approach, generalization remains a major issue limiting the computational applications of deep learning. Although methods such as transfer learning can improve generalization, transfer learning for lightweight models requires a more involved process. To overcome this limitation, we will further investigate transfer learning methods specifically designed for lightweight models.

Author Contributions

Conceptualization, J.L. and H.L.; funding acquisition, J.L., C.S., Q.M. and Q.Z.; investigation, J.L. and Q.Z.; methodology, Q.Z.; software, H.L. and Z.L.; validation, J.L. and H.L.; writing—original draft, J.L., H.L. and Z.L.; writing—review and editing, Y.C., C.S., S.Z., Y.L., T.H. and Q.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Basic and Applied Basic Research Foundation of Guangdong Province, grant number 2024A1515011584; the National Natural Science Foundation of China, grant number 62205059 and 62273108; Special Projects in Key Fields of Ordinary Universities in Guangdong Province, grant number 2024ZDZX3047; Guangdong Province University Characteristic Innovation Project, grant number 2024KTSCX195; the Youth Project of Guangdong Artificial Intelligence and Digital Economy Laboratory (Guangzhou), grant number PZL2022KF0006; and Scientific Research Projects of Anhui Universities, grant number 2023AH052641.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Javidi, B.; Carnicer, A.; Anand, A.; Barbastathis, G.; Chen, W.; Ferraro, P.; Goodman, J.W.; Horisaki, R.; Khare, K.; Kujawinska, M. Roadmap on digital holography. Opt. Express 2021, 29, 35078–35118.
2. Tan, J.; Liu, J.; Wang, X.; He, Z.; Su, W.; Huang, T.; Xie, S. Large depth range binary-focusing projection 3D shape reconstruction via unpaired data learning. Opt. Lasers Eng. 2024, 181, 108442.
3. Anand, A.; Javidi, B. Digital holographic microscopy for automated 3D cell identification: An overview. Chin. Opt. Lett. 2014, 12, 060012.
4. Hong, J. A review of 3D particle tracking and flow diagnostics using digital holography. Meas. Sci. Technol. 2025, 36, 032005.
5. Tan, J.; Niu, H.; Su, W.; He, Z. Structured light 3D shape measurement for translucent media base on deep Bayesian inference. Opt. Laser Technol. 2025, 181, 111758.
6. Huang, H.; Huang, H.; Zheng, Z.; Gao, L. Insights into infrared crystal phase characteristics based on deep learning holography with attention residual network. J. Mater. Chem. A 2025, 13, 6009–6019.
7. Kim, M.K. Phase-Shifting Digital Holography; Springer: Berlin/Heidelberg, Germany, 2011; Volume 162, pp. 95–108.
8. Li, F.; Fang, H.; Jing, H.; Su, Y. Multi-image encryption based on three-step phase-shifting digital holography. J. Opt. 2025, 1–13.
9. Kakue, T.; Yonesaka, R.; Tahara, T.; Awatsuji, Y.; Nishio, K.; Ura, S.; Kubota, T.; Matoba, O. High-speed phase imaging by parallel phase-shifting digital holography. Opt. Lett. 2011, 36, 4131–4133.
10. Bai, F.; Lang, J.; Gao, X.; Zhang, Y.; Cai, J.; Wang, J. Phase shifting approaches and multi-channel interferograms position registration for simultaneous phase-shifting interferometry: A review. Photonics 2023, 10, 946.
11. Rivenson, Y.; Wu, Y.; Ozcan, A. Deep learning in holography and coherent imaging. Light Sci. Appl. 2019, 8, 85.
12. Li, J.; Wu, B.; Liu, T.; Zhang, Q. URNet: High-quality single-pixel imaging with untrained reconstruction network. Opt. Lasers Eng. 2023, 166, 107580.
13. Zhang, Z.; Zheng, Y.; Xu, T.; Upadhya, A.; Lim, Y.J.; Mathews, A.; Xie, L.; Lee, W.M. Holo-UNet: Hologram-to-hologram neural network restoration for high fidelity low light quantitative phase imaging of live cells. Biomed. Opt. Express 2020, 11, 5478–5487.
14. Ren, Z.; Xu, Z.; Lam, E.Y. Learning-based nonparametric autofocusing for digital holography. Optica 2018, 5, 337–344.
15. Nagahama, Y. Phase retrieval using hologram transformation with U-Net in digital holography. Opt. Contin. 2022, 1, 1506–1515.
16. Kang, J.W.; Park, B.S.; Kim, J.K.; Kim, D.W.; Seo, Y.H. Deep-learning-based hologram generation using a generative model. Appl. Opt. 2021, 60, 7391–7399.
17. Huang, L.; Liu, T.; Yang, X.; Luo, Y.; Rivenson, Y.; Ozcan, A. Holographic image reconstruction with phase recovery and autofocusing using recurrent neural networks. ACS Photonics 2021, 8, 1763–1774.
18. Chen, Y.; Zhang, Q.; Liu, T.; Li, J. Single-pixel deep phase-shifting incoherent digital holography. Opt. Express 2024, 32, 35939–35951.
19. Xu, G.; Feng, J.; Lyu, J.Y.; Dian, S.; Jin, B.; Liu, P. A coarse-to-fine attention-guided autofocusing for holography under high noisy scenes with explainable neural network. Opt. Lasers Eng. 2025, 190, 108945.
20. Mehta, S.; Rastegari, M. MobileViT: Light-Weight, General-Purpose, and Mobile-Friendly Vision Transformer. arXiv 2021, arXiv:2110.02178.
21. Liu, Y.; Xue, J.; Li, D.; Zhang, W.; Chiew, T.K.; Xu, Z. Image recognition based on lightweight convolutional neural network: Recent advances. Image Vis. Comput. 2024, 146, 105037.
22. Zeng, T.; So, H.K.-H.; Lam, E.Y. RedCap: Residual encoder-decoder capsule network for holographic image reconstruction. Opt. Express 2020, 28, 4876–4887.
23. An, Q.; Liu, X.; Men, G.; Dou, J.; Di, J. Frequency-domain learning-driven lightweight phase recovery method for in-line holography. Opt. Express 2025, 33, 5890–5899.
24. Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
25. Xiang, J.; Colburn, S.; Majumdar, A.; Shlizerman, E. Knowledge distillation circumvents nonlinearity for optical convolutional neural networks. Appl. Opt. 2022, 61, 2173–2183.
26. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the 18th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015; Part III; pp. 234–241.
Figure 1. Lightweight distilled network (Ldnet) structure for digital holographic reconstruction.
Figure 2. Comparison results of the digit ‘8’ in the experiment. (a) Reconstructed label data; Reconstruction results of (b) Unet, (c) Lnet, and (d) Ldnet; The 80th row extracted from: (e) reconstructed label data; (f) Unet reconstruction results; (g) Lnet reconstruction results; and (h) Ldnet reconstruction results.
Figure 3. Reconstruction results of different digits.
Figure 4. Comparative PSNR performance of Unet, Lnet, and Ldnet in different digit reconstructions.
Figure 5. Comparative SSIM performance of Unet, Lnet, and Ldnet in different digit reconstructions.
Figure 6. Comparative MAE performance of Unet, Lnet, and Ldnet in different digit reconstructions.
Figure 7. Reconstruction results of different digits for the Ldnet and Ldnet_half models.
Figure 8. Reconstruction results of different digits for the Lnet and Lnet_onlyDS models.
Figure 9. Noise analysis results. The top row presents holograms under varying noise conditions, while the bottom row displays their corresponding reconstructed results output by the Ldnet model at different noise levels.
Table 1. Model parameter count, FLOPs, model size, and frame rate for the three deep learning methods.

| Network | Params (M) | FLOPs (G) | Model Size (MB) | FPS (frames/s, CPU) |
|---|---|---|---|---|
| Unet | 31.03 | 54.53 | 118.37 | 2.98 |
| Lnet | 5.99 | 28 | 22.84 | 3.52 |
| Ldnet | 1.69 | 8.23 | 6.44 | 8.17 |
Table 2. PSNR, SSIM, and MAE for the Ldnet and Ldnet_half models.

| Index | Network | Digit ‘9’ | Digit ‘3’ | Digit ‘6’ |
|---|---|---|---|---|
| PSNR (dB) | Ldnet_half | 19.02 | 20.45 | 18.37 |
|  | Ldnet | 22.13 | 21.48 | 20.34 |
| SSIM | Ldnet_half | 0.8946 | 0.9109 | 0.8822 |
|  | Ldnet | 0.9259 | 0.9129 | 0.8983 |
| MAE | Ldnet_half | 7.42 | 6.29 | 8.34 |
|  | Ldnet | 5.14 | 5.89 | 6.85 |
Table 3. PSNR, SSIM, and MAE for the Lnet and Lnet_onlyDS models.

| Index | Network | Digit ‘4’ | Digit ‘0’ | Digit ‘1’ |
|---|---|---|---|---|
| PSNR (dB) | Lnet_onlyDS | 16.38 | 16.17 | 21.20 |
|  | Lnet | 19.06 | 17.83 | 22.70 |
| SSIM | Lnet_onlyDS | 0.8584 | 0.8121 | 0.9254 |
|  | Lnet | 0.8747 | 0.8329 | 0.9307 |
| MAE | Lnet_onlyDS | 11.39 | 14.08 | 5.06 |
|  | Lnet | 8.39 | 11.19 | 4.33 |
Table 4. PSNR and SSIM of the reconstructed results output by the Ldnet model at different noise levels.

| Noise Level | Without Noise | 1% | 2% | 5% | 10% | 15% |
|---|---|---|---|---|---|---|
| PSNR (dB) | 17.39 | 15.36 | 15.34 | 15.31 | 15.31 | 14.90 |
| SSIM | 0.8739 | 0.8030 | 0.8024 | 0.8029 | 0.8044 | 0.8008 |
