Wavefront Reconstruction Using Two-Frame Random Interferometry Based on Swin-Unet

Abstract: Due to its high precision, phase-shifting interferometry (PSI) is a commonly used method for testing optical components in interferometers. However, traditional PSI is susceptible to environmental factors and is costly, with the piezoelectric ceramic transducer (PZT) being a major contributor to the high cost of interferometers. In contrast, two-frame random interferometry does not require multiple precise phase shifts. It needs only one random phase shift, which reduces control costs and time requirements and mitigates the impact of environmental factors (mechanical vibrations and air turbulence) when acquiring multiple interferograms. A novel method for wavefront reconstruction using two-frame random interferometry based on Swin-Unet is proposed. In addition, an established algorithm is improved to develop a new wavefront reconstruction method named Phase U-Net plus (PUN+). By training Swin-Unet and PUN+ with a large amount of simulated data generated by physical models, both methods accurately compute the wrapped phase from two interferograms with an unknown phase step (except for multiples of π). The superior performance of both methods is demonstrated by reconstructing phases from both simulated and real interferograms, in comprehensive comparisons with several classical algorithms. The proposed Swin-Unet outperforms PUN+ in reconstructing both the wrapped phase and the unwrapped phase.


Introduction
PSI is one of the most popular techniques in optical metrology [1], known for its high robustness and accuracy. Traditional PSI requires multiple interferograms with fixed and known phase shifts [2][3][4], but acquiring them is time-consuming and susceptible to mechanical vibrations, environmental turbulence, and temperature variations. Therefore, it is desirable to minimize the number of interferograms. However, single-frame interferometry requires additional prior information to resolve the phase ambiguity. Takeda et al. [5] proposed a Fourier transform-based method that introduces a large spatial carrier frequency by adding significant tilt to the test object or reference surface, which allows the phase to be separated from other information in the frequency domain. However, the method is not suitable for interferograms with closed fringes and suffers from fringe densification due to the high tilt. Ge et al. [6] specified the concavity and convexity of the phase and were able to recover a monotonic phase from a single interferogram with closed fringes, but the concavity or convexity of the phase is difficult to determine. Unlike single-frame interferometry, two-frame random interferometry introduces an unknown phase step between two interferograms, which effectively resolves the phase ambiguity and provides better reconstruction accuracy while significantly reducing detection costs and shortening capture time compared to traditional PSI. Because it achieves a good balance between capture time and reconstruction accuracy, two-frame random interferometry has received extensive research attention.
Kreis et al. [7] proposed a Fourier transform-based two-frame phase-shifting wavefront reconstruction, known as the Kreis method, which demodulates the phase from two interferograms without introducing sign ambiguity. The original Kreis method does not include a pre-filtering step, making it sensitive to noise in practical applications. Vargas et al. [8] proposed a two-frame reconstruction method based on regularized optical flow (OF), which calculates the motion direction of the fringes from the two interferograms, applies the spiral phase transform to one interferogram to obtain the wrapped phase, and combines it with the fringe motion direction map to eliminate sign ambiguity. However, OF requires subtracting the direct current (DC) component of the interferograms in advance. Vargas et al. [9] proposed a self-tuning (ST) two-frame phase-shifting method that does not need to know the phase step between interferograms: the quadrature filter is tuned sequentially at a predetermined discrete set of frequencies within [0, 2π] to obtain the reconstructed wrapped phase. The ST method also requires subtracting the DC component of the interferograms before wavefront reconstruction. In addition, Vargas et al. [10] proposed a two-frame reconstruction method based on Gram-Schmidt (GS) orthogonalization. The GS method demodulates the wrapped phase by treating the interferograms as independent vectors. It offers high efficiency and accuracy but likewise requires subtracting the DC component in advance.
In recent years, with the development of artificial neural networks, the typical U-shaped convolutional neural network U-Net [11], consisting of a symmetric encoder-decoder with skip connections, has been proposed and applied to biomedical image segmentation. The convolution in U-Net is local and does not effectively utilize global information until many convolution layers have been applied. Inspired by the tremendous success of Transformers in natural language processing (NLP), researchers have attempted to introduce Transformers into the field of computer vision (CV). Liu et al. [12] proposed Swin Transformer for image recognition tasks, whose self-attention mechanism has a natural advantage in extracting global information. Inspired by Swin Transformer, Cao et al. [13] proposed Swin-Unet for medical image segmentation, combining U-Net with Swin Transformer and achieving higher image segmentation accuracy.
The remarkable achievements of deep learning in CV have motivated researchers to explore its application in optical metrology. Different from traditional "physics-based" approaches, deep learning-supported optical metrology is based on "data-driven" principles. Deep learning has advanced several tasks in optical metrology, such as enhancement [14], denoising [15][16][17], and phase unwrapping [18][19][20][21]. Li et al. [22] proposed a two-frame reconstruction method based on the Phase U-Net (PUN), which accurately estimates the wrapped phase from two interferograms. PUN requires normalization of the interferograms and offers higher accuracy than other two-frame reconstruction methods. To meet the higher precision requirements of wrapped phase reconstruction in two-frame random interferometry and to further improve the reconstruction accuracy, we propose new two-frame reconstruction methods inspired by PUN [22] and Swin-Unet [13], which only require normalization of the interferograms in advance. Concretely, our contributions can be summarized as follows: (1) Swin-Unet has been constructed for wavefront reconstruction from two-frame phase-shifting interferograms, which needs only one random phase shift, reducing control costs and time requirements and mitigating the impact of environmental factors; (2) PUN+, based on the original PUN, has been proposed. It uses bilinear interpolation for up-sampling, eliminating the need for transpose convolution, and applies ReLU in the final convolution layer instead of Softmax or ELU. Experimental results have confirmed the effectiveness of these changes; (3) Simulations and experiments indicate that both proposed methods outperform the traditional methods and the deep learning method (PUN) in wavefront reconstruction.

The Process of Proposed Method
As shown in the blue part of Figure 1a, the proposed method reconstructs the wrapped phase from two randomly shifted interferograms. During network training, as shown in the green part, the mean squared error (MSE) loss between the predicted results and the ground truth is computed, and its gradients are backpropagated. The model parameters are then adjusted and optimized by the adaptive moment estimation (ADAM) optimizer. During network testing, only the blue part is needed, and the green part is not used. Figure 1b illustrates the unwrapped phase being recovered from the wrapped phase by the unwrapping algorithm ("unwrap" function in MATLAB R2018b [22]).

Theoretical Background
Two-frame random phase-shifting interferometry obtains two interferograms by altering the optical path difference between the reference wavefront and the testing wavefront [23]. The intensities of the two interference patterns are expressed as follows:

$$I_1(x, y) = a(x, y) + b(x, y)\cos[\phi(x, y)] \quad (1)$$

$$I_2(x, y) = a(x, y) + b(x, y)\cos[\phi(x, y) + \delta] \quad (2)$$

where $I_1(x, y)$ and $I_2(x, y)$ represent the intensities of the original and phase-shifted interferograms, respectively, at the coordinate point (x, y); a(x, y) and b(x, y) represent the DC component and the modulation component, respectively; ϕ(x, y) represents the phase of the testing wavefront at the coordinate point (x, y); and δ represents the random phase step (except for 0 and π rad). Accurately generating the original phase ϕ(x, y) is crucial in simulations, and Zernike polynomials [24] play a key role in this process.
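As a small illustration of this intensity model, the sketch below generates a pair of interferograms from a toy phase map. It assumes the standard two-beam form I1 = a + b cos(ϕ) and I2 = a + b cos(ϕ + δ) with constant a and b; the function name `two_frame_intensities` and the tilted-plane test phase are hypothetical and chosen only for the example.

```python
import math

def two_frame_intensities(phi, delta, a=1.0, b=1.0):
    """Return (I1, I2) for a 2-D list of phase values.

    Assumed model: I1 = a + b*cos(phi), I2 = a + b*cos(phi + delta),
    with constant DC component a and modulation b.
    """
    i1 = [[a + b * math.cos(p) for p in row] for row in phi]
    i2 = [[a + b * math.cos(p + delta) for p in row] for row in phi]
    return i1, i2

# Toy tilted-plane phase on a 4 x 4 grid (illustrative only).
phi = [[0.5 * x + 0.25 * y for x in range(4)] for y in range(4)]
i1, i2 = two_frame_intensities(phi, delta=1.0)

# With delta = pi the second frame is just 2a - I1, so it carries no
# information beyond the first frame.
i1_pi, i2_pi = two_frame_intensities(phi, delta=math.pi)
```

Under this model, a step of 0 duplicates the first frame and a step of π only mirrors it about the DC level, which is consistent with both values being excluded from the random phase step.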
Zernike polynomials are a sequence of orthogonal and linearly independent polynomials defined on the unit circle. Their orthogonality enables them to represent any square-integrable function within the unit disk and makes the coefficients of different polynomials independent of each other, which is advantageous in eliminating interference from accidental factors. In addition, Zernike polynomials and Seidel aberration coefficients can be easily correlated. Thus, any continuous wavefront shape can be represented by a linear combination of Zernike polynomials [25], whose coefficients can be calculated using methods such as least squares fitting [26], Gram-Schmidt [27], and cubic B-spline [28]. The original phase ϕ(x, y) generated by Zernike polynomials is expressed as follows:

$$\phi(x, y) = \sum_{r=1}^{L} Z_r U_r$$

where L is the index of the highest-order term, $Z_r$ represents the Zernike coefficient of each term, and $U_r$ represents the Zernike polynomials, whose expression is as follows:

$$U_r = U_n^{n-2m}(\rho, \theta) = R_n^{n-2m}(\rho) \begin{Bmatrix} \sin \\ \cos \end{Bmatrix} [(n - 2m)\theta]$$

where $R_n^{n-2m}(\rho)$ is the radial polynomial, ρ represents the length of the vector between the coordinate point (x, y) and the origin, and θ represents the angle between that vector and the x-axis. When (n − 2m) > 0, sin[(n − 2m)θ] is used, and when (n − 2m) ≤ 0, cos[(n − 2m)θ] is used. The continuous orthogonality of Zernike polynomials is expressed as follows:

$$\int_0^1 \int_0^{2\pi} U_n^l(\rho, \theta)\, U_m^k(\rho, \theta)\, \rho \, d\rho \, d\theta = 0, \quad (n, l) \neq (m, k)$$

where $U_n^l(\rho, \theta)$ and $U_m^k(\rho, \theta)$ represent Zernike polynomials. To illustrate that Zernike polynomials can accurately represent continuous wavefront shapes, we measure and analyze a plane mirror with a ZYGO GPI-XP/D4 laser interferometer and the software MetroPro® Version 8.3.5 [29] to obtain the real phase map. Figure 2a,b shows the real phase map and the phase map generated by Zernike polynomials, respectively.
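The linear combination and the orthogonality property can be checked numerically with a short sketch. It uses the common (n, m) indexing with the standard radial polynomial, cosine terms for m ≥ 0 and sine terms for m < 0, which is an assumption and may differ from the term ordering and sign convention used in this paper; all function names are hypothetical.

```python
import math

def radial(n, m, rho):
    """Standard Zernike radial polynomial R_n^|m|(rho); n - |m| must be even."""
    m = abs(m)
    total = 0.0
    for k in range((n - m) // 2 + 1):
        num = (-1) ** k * math.factorial(n - k)
        den = (math.factorial(k)
               * math.factorial((n + m) // 2 - k)
               * math.factorial((n - m) // 2 - k))
        total += num / den * rho ** (n - 2 * k)
    return total

def zernike(n, m, rho, theta):
    """Unnormalized Zernike term: cosine for m >= 0, sine for m < 0 (assumed convention)."""
    r = radial(n, m, rho)
    return r * math.cos(m * theta) if m >= 0 else r * math.sin(-m * theta)

def phase(coeffs, rho, theta):
    """Phase as a linear combination; coeffs maps (n, m) -> coefficient."""
    return sum(z * zernike(n, m, rho, theta) for (n, m), z in coeffs.items())

def inner_product(nm1, nm2, steps=100):
    """Midpoint-rule estimate of the integral of U1 * U2 over the unit disk."""
    d_rho, d_theta = 1.0 / steps, 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        rho = (i + 0.5) * d_rho
        for j in range(steps):
            theta = (j + 0.5) * d_theta
            total += zernike(*nm1, rho, theta) * zernike(*nm2, rho, theta) * rho
    return total * d_rho * d_theta

# Defocus (2, 0) against astigmatism (2, 2) and spherical (4, 0): both near zero.
cross_a = inner_product((2, 0), (2, 2))
cross_b = inner_product((2, 0), (4, 0))
```

The two cross terms vanish up to discretization error, while the self inner product of the defocus term stays finite, which is the independence of coefficients that the text relies on.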
After the original phase is generated, the wrapped phase is computed as the corresponding ground truth for the network. The wrapped phase ϕ_w represents the phase angle of the original phase, which can be calculated by the MATLAB R2018b function "angle" [18] and then mapped directly to the target interval [−π/2, π/2].

The left-side feature extraction network of PUN+ (Figure 3) consists of four convolution blocks, each containing two sets of 3 × 3 convolutions, batch normalization, and rectified linear unit (ReLU) activation. Additionally, a downsampling layer using max pooling is included. Batch normalization accelerates network convergence and mitigates overfitting.
The right-side feature fusion network consists of four convolution blocks and an upsampling layer. The upsampled feature maps are combined with multi-scale feature maps from the left-side network through skip connections. The purpose of the feature fusion is to compensate for the loss of spatial information during downsampling. Different from PUN and U-Net, we use bilinear interpolation for upsampling. Bilinear interpolation upscales the image by estimating the pixel value at each target coordinate from the surrounding pixel values and their relative positions. It maintains the smoothness and details of the image to some extent and achieves a good balance between computational cost and scaling accuracy compared to other interpolation algorithms.
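The interpolation step described above can be sketched in a few lines. The version below maps the corners of the output grid onto the corners of the input grid, which is one common convention and an assumption here, since the text does not state which alignment PUN+ uses; `bilinear_upsample` is a hypothetical name.

```python
def bilinear_upsample(img, scale=2):
    """Upsample a 2-D list by bilinear interpolation (corner-aligned grids assumed)."""
    h, w = len(img), len(img[0])
    out_h, out_w = h * scale, w * scale
    out = []
    for i in range(out_h):
        # Source row coordinate and its two neighbouring rows.
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            # Source column coordinate and its two neighbouring columns.
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

# A 2 x 2 map with values 2*y + x is reproduced exactly on the finer grid.
small = [[0.0, 1.0], [2.0, 3.0]]
large = bilinear_upsample(small)
```

Because the operation has no trainable weights, it adds no parameters to the decoder, in contrast to a transpose convolution.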
The bridging network in the middle comprises a single convolution block that connects the left-side and right-side networks. Different from the original U-Net and PUN, which employ softmax and ELU activations in the output layer, respectively, the proposed PUN+ employs ReLU activation to predict the wrapped phase. The output pixel values of the network are expected to fall within the range [0, 1], allowing for direct mapping to the target range [−π/2, π/2]. Since ReLU activation produces only non-negative values, we can ensure that the network's output remains within the desired range by applying a suitable threshold.
For training, PUN+ employs the MSE loss function which measures the difference between the predicted phase and the ground truth.
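A minimal sketch of the output stage and loss is given below. It assumes that the "suitable threshold" is an upper clip at 1 and that the mapping from [0, 1] to [−π/2, π/2] is linear; both are assumptions, and the helper names are hypothetical.

```python
import math

def relu(x):
    """Rectified linear unit: negative inputs become 0."""
    return max(0.0, x)

def to_phase(y):
    """Map a raw output to [-pi/2, pi/2].

    Assumption: ReLU followed by an upper clip at 1, then a linear map
    from [0, 1] to [-pi/2, pi/2].
    """
    clipped = min(relu(y), 1.0)
    return (clipped - 0.5) * math.pi

def mse(pred, target):
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

raw_outputs = [-3.0, 0.0, 0.5, 1.0, 7.0]
phases = [to_phase(v) for v in raw_outputs]
loss = mse([1.0, 2.0], [1.0, 4.0])
```

Values below 0 and above 1 saturate at the ends of the target range, so only outputs inside [0, 1] carry phase information under these assumptions.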

Swin-Unet
The Swin-Unet architecture, shown in Figure 4, is a Transformer-based network inspired by U-Net. It consists of an encoder, a bottleneck, a decoder, and skip connections, and comprises 12 Swin Transformer blocks.
The encoder consists of a linear embedding layer, three sets of two consecutive Swin Transformer blocks, and a patch merging layer. The input image is divided into non-overlapping patches of size 4 × 4, with each patch having a feature dimension of 32. The linear embedding layer projects the patch features to a specified dimension, generating patch tokens. The tokenized patches, with a resolution of H/4 × W/4, undergo feature representation learning by two consecutive Swin Transformer blocks. The Swin Transformer blocks maintain the feature dimension and resolution. The patch merging layer then downsamples the patches by a factor of 2, reducing the resolution to H/8 × W/8. The process is repeated three times.
The decoder comprises Swin Transformer blocks and a patch-expanding layer. The patch-expanding layer performs 2× upsampling, expanding the feature map from a resolution of H/32 × W/32 to H/16 × W/16 while halving its dimension. Similar to U-Net, the output features from the patch-expanding layer are fused with multi-scale features from the encoder through skip connections, mitigating the spatial information loss caused by downsampling. The final patch-expanding layer uses 4× upsampling to restore the feature map resolution to the original (W × H) while maintaining the same dimension. A linear projection layer is applied to generate pixel-level predictions.
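The resolutions and feature dimensions quoted above can be traced with simple arithmetic. The sketch assumes that the two interferograms are stacked as two input channels (which gives 4 × 4 × 2 = 32 features per patch), that patch merging doubles the feature dimension as in the original Swin Transformer design, and that the embedding dimension is 96 and the input is 224 × 224; the last two values are illustrative assumptions, not settings stated in this paper.

```python
def swin_unet_shapes(height, width, in_channels=2, patch=4, embed_dim=96, merges=3):
    """Return the raw patch feature size and (H, W, C) at each encoder stage.

    Assumptions: each patch merging halves the resolution and doubles the
    channel dimension; embed_dim=96 is a hypothetical choice.
    """
    patch_dim = patch * patch * in_channels
    h, w, c = height // patch, width // patch, embed_dim
    stages = [(h, w, c)]
    for _ in range(merges):
        h, w, c = h // 2, w // 2, c * 2
        stages.append((h, w, c))
    return patch_dim, stages

patch_dim, stages = swin_unet_shapes(224, 224)
for h, w, c in stages:
    print(f"{h} x {w} tokens, dimension {c}")
```

The decoder walks the same list in reverse: each 2× patch-expanding step doubles the resolution and halves the dimension until the H/4 × W/4 stage, after which the final 4× expansion restores the input resolution.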
The encoder, bottleneck, and decoder are interconnected, with the bottleneck consisting of two consecutive Swin Transformer blocks that preserve the feature map's dimension and resolution. Figure 4 illustrates the details of two consecutive Swin Transformer blocks. Each block includes a layer normalization (LN) layer, a window-based multi-head self-attention (W-MSA) module or a shifted window-based multi-head self-attention (SW-MSA) module, residual connections, and a multi-layer perceptron (MLP). W-MSA incorporates window partitioning into the conventional multi-head self-attention (MSA), while SW-MSA additionally incorporates window shifting operations. Using this window partitioning mechanism, two consecutive Swin Transformer blocks can be represented as follows:

$$\hat{z}^{l} = \text{W-MSA}(\text{LN}(z^{l-1})) + z^{l-1}$$

$$z^{l} = \text{MLP}(\text{LN}(\hat{z}^{l})) + \hat{z}^{l}$$

$$\hat{z}^{l+1} = \text{SW-MSA}(\text{LN}(z^{l})) + z^{l}$$

$$z^{l+1} = \text{MLP}(\text{LN}(\hat{z}^{l+1})) + \hat{z}^{l+1}$$

where $z^{l}$ and $z^{l+1}$ are the outputs of the MLP, and $\hat{z}^{l}$ and $\hat{z}^{l+1}$ are the outputs of W-MSA and SW-MSA, respectively. Following reference [12], the self-attention mechanism is expressed as follows:

$$\text{Attention}(Q, K, V) = \text{SoftMax}\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V$$

where $Q, K, V \in \mathbb{R}^{M^2 \times d}$ represent the query, key, and value matrices, respectively, $M^2$ represents the number of patches in a window, d represents the dimension of the query or key, and B is the relative position bias term of reference [12].
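The attention computed inside one window can be sketched without any deep learning library. The example assumes the scaled dot-product form with an optional additive bias, SoftMax(QKᵀ/√d + B)V, for a single head; the function names are hypothetical, and the learned projections that produce Q, K, and V are omitted.

```python
import math

def softmax(row):
    """Numerically stable softmax of a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def window_attention(q, k, v, bias=None):
    """Single-head attention over the tokens of one window.

    q, k, v: lists of token vectors (the M^2 rows of Q, K, V).
    bias: optional M^2 x M^2 additive bias matrix.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            + (bias[i][j] if bias else 0.0)
            for j, kj in enumerate(k)
        ]
        weights = softmax(scores)
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

zeros = [[0.0, 0.0], [0.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
uniform = window_attention(zeros, zeros, values)      # equal weights -> row mean
masked = window_attention(zeros, zeros, values,
                          bias=[[0.0, -1e9], [-1e9, 0.0]])  # each token sees itself
```

Restricting the sum to the tokens of one window is what keeps the cost linear in image size, and the shifted windows of SW-MSA let information cross window borders in the following block.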

Simulation Dataset
Following Section 2.2, we generate the simulated training data using ZYGO's Zernike polynomials, employing a linear combination of nine Zernike polynomials to generate the initial phase. To avoid generating excessively dense fringes, we choose the 2nd to the 10th Zernike terms (excluding Piston) and randomly assign coefficients ranging from −9 to +9. This approach enables us to generate a variety of original phase patterns with different types of aberrations while maintaining manageable fringe densities in the interferograms.
To ensure that the trained network has a strong generalization ability and performs well in various scenarios, we use a large number of simulated phases and their corresponding interferograms generated from physical models for network training. By incorporating diverse variations in the training data, we aim to enhance the network's ability to handle different situations and improve its overall performance. Based on Equations (1), (2) and (7), we generate a total of 24,000 pairs of interferograms and their corresponding wrapped phases. These data are used as inputs and targets for the network after normalization preprocessing. The phase step for each pair of interferograms is a random value between 0 and π rad (excluding 0 and π rad). The dataset is divided into a training set (90%) and a testing set (10%). The interferograms are normalized between 0 and 1 to be the inputs of the neural networks, while the predicted wrapped phase is the output. Figure 5a shows an example of an original phase generated by the Zernike polynomials, Figure 5b,c shows the corresponding two-frame interferograms with random phase step, and Figure 5d shows the corresponding wrapped phase. To better simulate real-world conditions, random Gaussian white noise is added to the interferograms. Based on the local means and local variances method (LMLVM) [30], we compute the signal-to-noise ratio (SNR) of testing interferograms with different noise levels. Figure 6 shows simulated interferograms with different noise levels. In addition, we set the DC component and modulation component of the interferograms to follow Gaussian distributions. Figure 7 shows simulated interferograms with different background intensities and modulations.
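The data preparation steps described above can be sketched as follows. The code assumes the two-beam intensity model with constant a and b, wraps the phase with atan2 (equivalent to the angle of exp(jϕ)), applies min-max normalization to each interferogram, and splits sequentially; the toy phase map, the noise level, and all function names are hypothetical.

```python
import math
import random

def wrap(phi):
    """Wrap a phase value to the principal interval, like the angle of exp(j*phi)."""
    return math.atan2(math.sin(phi), math.cos(phi))

def normalize(img):
    """Min-max normalize a 2-D list to [0, 1]."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in img]

def make_sample(phi, rng, noise_std=0.05, a=1.0, b=1.0):
    """Build one (frame1, frame2, wrapped target, step) training sample."""
    delta = 0.0
    while not 0.0 < delta < math.pi:      # exclude the endpoints 0 and pi
        delta = rng.uniform(0.0, math.pi)
    frame1 = [[a + b * math.cos(p) + rng.gauss(0.0, noise_std) for p in row]
              for row in phi]
    frame2 = [[a + b * math.cos(p + delta) + rng.gauss(0.0, noise_std) for p in row]
              for row in phi]
    target = [[wrap(p) for p in row] for row in phi]
    return normalize(frame1), normalize(frame2), target, delta

def split(samples, train_fraction=0.9):
    """Sequential train/test split."""
    n_train = round(len(samples) * train_fraction)
    return samples[:n_train], samples[n_train:]

rng = random.Random(0)
phi = [[0.3 * x * x + 0.2 * y for x in range(8)] for y in range(8)]
f1, f2, target, delta = make_sample(phi, rng)
train, test = split(list(range(24000)))   # 21,600 training and 2,400 testing indices
```

The step δ is returned only for inspection; the network never sees it, which is what makes the phase step "unknown" at inference time.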
Our proposed method is trained on simulated data generated from a physical model. Therefore, the predictive performance of the proposed method heavily relies on the content of the interferograms in the dataset, including but not limited to the settings of random phase shifts, background light intensity, modulation, and noise levels. To achieve better predictive results, it is essential to construct a large dataset for training. The training process poses a challenge in terms of computational resources and time, requiring the use of high-performance GPUs or TPUs and a lengthy training duration.

Accuracy Test
To begin with, we conduct thorough validation to assess the feasibility and accuracy of the proposed method. The trained networks are tested on the testing dataset. In Figure 8, the demodulation results of Kreis [7], OF [8], ST [9], GS [10], PUN [22], PUN+, and Swin-Unet are presented for a testing image with SNR = 43.5 dB and step = 1 rad. In the magnified detail images, it can be observed that the details reconstructed by PUN+ and Swin-Unet are closer to the ground truth. For further comparison, Table 1 provides the corresponding root mean square errors (RMSEs) between the reconstructed results and the ground truth. The RMSEs of Kreis, OF, ST, GS, and PUN are 0.5255 rad, 0.5234 rad, 0.2534 rad, 0.2792 rad, and 0.1418 rad, respectively. Our proposed PUN+ and Swin-Unet, whose reconstruction errors are 0.0921 rad and 0.0719 rad, respectively, outperform the other five existing reconstruction methods. While PUN is currently the most accurate method available, the reconstruction RMSEs of PUN+ and Swin-Unet are approximately 35% and 49% lower than that of PUN, respectively (that is, about 65% and 51% of the PUN value). Furthermore, we investigate the accuracy of the proposed method under phase steps ranging from 0 to π rad (excluding 0 and π rad). We compute the RMSEs between the wrapped phase obtained by the seven two-frame reconstruction methods and the ground truth, using interferograms without additional noise. The results are plotted in Figure 9. Under noise-free conditions, the fluctuation ranges of PUN+ (brown line) and Swin-Unet (light blue line) do not exceed 0.055 rad and 0.045 rad, respectively, as the phase step changes from 0 to π rad, showing overall stable performance. Additionally, PUN+ and Swin-Unet consistently exhibit RMSEs below 0.1205 rad and 0.1124 rad, respectively, over the same range. Compared to PUN (purple line), the proposed methods consistently achieve significantly lower RMSEs for different phase steps. Moreover, they also perform better near the singular phase π. Unlike OF (red line) and ST (blue line), which experience a rapid decline in reconstruction accuracy near the singular phase, resulting in a jump in RMSE, our proposed methods maintain better performance. These findings demonstrate that the trained Swin-Unet consistently outperforms the other methods, which affirms the effectiveness and precision of the proposed method. Notably, when the phase step between the two interferograms is greater than π rad, our proposed method can still accurately predict the wrapped phase, because the interferogram obtained by adding (π + ∆δ) to the original phase is equivalent to that obtained by adding (π − ∆δ) to the original phase (0 < ∆δ < π). Moreover, based on Equations (1) and (2), it is evident that the intensity expression is a periodic function with a period of 2π. As a result, the proposed method can be applied to two-frame interferograms with a phase shift of any random value greater than 0 (except integer multiples of π). To address scenarios with negative phase shift, predefined PZT movement directions are commonly used as a preventive approach.
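The error metric and the comparison with PUN reduce to a few lines of arithmetic. The sketch below computes a plain RMSE over all pixels (no re-wrapping of the difference, which is an assumption about how the reported values were obtained) and relates the RMSEs quoted above for step = 1 rad; the function names are hypothetical.

```python
import math

def rmse(pred, truth):
    """Root mean square error between two 2-D lists of equal shape."""
    diffs = [(p - t) ** 2
             for row_p, row_t in zip(pred, truth)
             for p, t in zip(row_p, row_t)]
    return math.sqrt(sum(diffs) / len(diffs))

def relative_reduction(baseline, value):
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline

# RMSEs quoted in the text for SNR = 43.5 dB and step = 1 rad.
pun, pun_plus, swin_unet = 0.1418, 0.0921, 0.0719

for name, value in (("PUN+", pun_plus), ("Swin-Unet", swin_unet)):
    ratio = value / pun
    print(f"{name}: {ratio:.2f} of the PUN RMSE, "
          f"{relative_reduction(pun, value):.0%} lower")
```

The ratio and the reduction are complementary: an RMSE that is about 65% of the PUN value is about 35% lower than it.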

Anti-Noise Performance
In real measurement environments, noise is present in interferograms and cannot be avoided, making it necessary to test the noise resistance of the proposed method. Gaussian white noise is added to randomly selected two-frame interferograms in the testing set, so that their SNR varies from 13.9 dB to 43.5 dB. For the convenience of analysis, the DC component and modulation component are both set to 1. Figure 10 shows testing two-frame interferograms with SNR = 13.9 dB. As shown in Figure 11, we perform wavefront reconstruction on the low-SNR interferograms with Kreis, OF, ST, GS, PUN, PUN+, and Swin-Unet. It is clear that in high-noise conditions, PUN+ and Swin-Unet achieve better reconstruction performance than the traditional methods. Figure 12 depicts the RMSEs between the wrapped phase obtained by different methods and the ground truth as the SNR varies from 13.9 dB to 43.5 dB. Table 1 provides the RMSEs for Kreis, OF, ST, GS, PUN, PUN+, and Swin-Unet at SNR levels of 13.9 dB, 28.7 dB, and 43.5 dB. Compared to PUN and PUN+, Swin-Unet has an acceptable number of network parameters [31] (27.14 M) and the fewest FLOPs (7.72 G). When the SNR varies from 28.7 dB to 43.5 dB, the proposed PUN+ (brown line) and Swin-Unet (light blue line) consistently outperform the other methods, with average RMSEs of 0.1031 rad and 0.09617 rad, respectively. Conversely, when the interferograms contain significant noise (13.9 dB), the reconstruction accuracy of nearly all methods drops. However, PUN+ (brown line) and Swin-Unet (light blue line) still exhibit RMSEs of 0.1840 rad and 0.1647 rad, respectively, which remain lower than those of the other methods. This simulation confirms the superior performance of PUN+ and Swin-Unet across the entire tested SNR range. It should be noted that PUN+ yields higher RMSEs than Swin-Unet, indicating that its noise resistance is slightly weaker. In practical measurements, pre-denoising of the interferograms is necessary to achieve optimal reconstruction accuracy.
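To make the noise levels concrete, the sketch below adds Gaussian white noise so that an interferogram reaches a requested SNR. It uses a simple global definition, SNR = 10·log10(signal variance / noise variance), which is an assumption for illustration and is not the LMLVM estimator [30] used in this paper; the function names are hypothetical.

```python
import math
import random

def noise_std_for_snr(img, snr_db):
    """Noise standard deviation that yields snr_db for the given image.

    Assumed definition: SNR = 10 * log10(var(signal) / var(noise)).
    """
    flat = [v for row in img for v in row]
    mean = sum(flat) / len(flat)
    variance = sum((v - mean) ** 2 for v in flat) / len(flat)
    return math.sqrt(variance / 10 ** (snr_db / 10.0))

def add_gaussian_noise(img, snr_db, rng):
    """Return a noisy copy of img with zero-mean Gaussian white noise."""
    std = noise_std_for_snr(img, snr_db)
    return [[v + rng.gauss(0.0, std) for v in row] for row in img]

rng = random.Random(1)
clean = [[1.0 + math.cos(0.4 * x + 0.1 * y) for x in range(16)] for y in range(16)]
noisy_low_snr = add_gaussian_noise(clean, 13.9, rng)
noisy_high_snr = add_gaussian_noise(clean, 43.5, rng)
```

Under this definition the noise standard deviation at 13.9 dB is roughly 30 times larger than at 43.5 dB, since the two levels differ by 29.6 dB.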

Low Modulation Test
In the actual measurement process, due to the influence of stray light and uneven illumination, interferograms often exhibit a bright central region and a dark surrounding area. For two-frame phase-shifting interferometry, we assume that the background intensity and modulation of the two interferograms are spatially inconsistent and temporally consistent. Spatial inconsistency means that the background intensity and modulation follow different Gaussian distributions and are not constant across the image. Temporal consistency means that the PZT only needs to move once between the two interferograms, within a very short time, so that the two interferograms are affected in an extremely similar way. The background intensity and modulation are therefore not functions of time.
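These two assumptions can be expressed directly in a simulation sketch: the background and modulation are Gaussian-shaped maps over the image (spatial inconsistency), and the same two maps are reused for both frames (temporal consistency). The peak values, the width, and the function names below are hypothetical, and the two-beam intensity model is assumed.

```python
import math

def gaussian_map(size, peak, sigma):
    """Gaussian-shaped map on a size x size grid with coordinates scaled to [-1, 1]."""
    c = (size - 1) / 2.0
    return [[peak * math.exp(-(((x - c) / c) ** 2 + ((y - c) / c) ** 2)
                             / (2.0 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def low_modulation_pair(phi, delta, a_peak=1.0, b_peak=0.5, sigma=0.8):
    """Two frames sharing one background map a(x, y) and one modulation map b(x, y)."""
    size = len(phi)
    a = gaussian_map(size, a_peak, sigma)
    b = gaussian_map(size, b_peak, sigma)
    i1 = [[a[y][x] + b[y][x] * math.cos(phi[y][x]) for x in range(size)]
          for y in range(size)]
    i2 = [[a[y][x] + b[y][x] * math.cos(phi[y][x] + delta) for x in range(size)]
          for y in range(size)]
    return i1, i2, a, b

phi = [[0.6 * x + 0.3 * y for x in range(5)] for y in range(5)]
i1, i2, a, b = low_modulation_pair(phi, delta=1.2)
```

Because a(x, y) is shared, it cancels in the difference of the two frames, leaving only the modulation and the phase terms.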
As shown in Figure 13, we generate random phase-shifted two-frame interferograms with different background intensities, modulations, and a moderate amount of Gaussian white noise. The residual maps between the reconstruction results obtained by different methods and the ground truth are illustrated in Figure 14. Table 2 provides the corresponding RMSEs. It can be clearly seen that under low modulation (29.7 dB), the proposed PUN+ and Swin-Unet perform better than the other methods and lose fewer details, with RMSEs of 0.1329 rad and 0.1166 rad, respectively. As the modulation of the interferograms increases, the reconstruction errors of all methods show a decreasing trend. However, the reconstruction errors of PUN+ and Swin-Unet are consistently lower than those of the other methods, with the reconstruction accuracy of Swin-Unet slightly higher than that of PUN+.

Experiment
To further evaluate the effectiveness of the proposed methods, we conduct experiments on interferograms with random phase step acquired with the experimental setup shown in Figure 15. The phase step between the two interferograms is an unknown constant. Therefore, the linear phase shift error induced by the linear drift of the PZT is effectively eliminated in two-frame interferometry. Non-linear phase shift errors caused by vibration exist in the form of harmonics. To address this issue, we use a vibration isolation platform and a high-precision PZT so that the phase shift between the two interferograms is as close as possible to an unknown constant, minimizing harmonic errors. In the actual measurement, we capture multiple phase-shifting interferograms with ZYGO's GPI-XP/D4 laser interferometer. Figures 16a and 17a show the real interferograms with random phase step, and the corresponding wrapped phases calculated by the traditional thirteen-step PSI [32], which serve as the ground truth, are shown in Figures 16i and 17i, respectively. Then, in Figure 16, the wrapped phases are calculated from the first set of real interferograms using the Kreis, OF, ST, GS, PUN, PUN+, and Swin-Unet methods, and both PUN and the proposed methods appear to perform well. In the locally magnified image, the PUN result is not smooth along the edges and contours, while PUN+ and Swin-Unet capture contour details that are closer to the ground truth. For ease of analysis, we compute the RMSEs (compared to traditional PSI) of the seven reconstruction methods, which are 0.4235 rad, 0.4193 rad, 0.3505 rad, 0.3214 rad, 0.2650 rad, 0.2519 rad, and 0.2304 rad, respectively. The proposed PUN+ and Swin-Unet perform better than PUN, reducing the RMSE by about 0.01 rad and 0.03 rad, respectively.
Figure 17 shows the wrapped phases calculated from the second set of real interferograms using the Kreis, OF, ST, GS, PUN, PUN+, and Swin-Unet methods. PUN, PUN+, and Swin-Unet perform better than the others. In the locally enlarged image, the wrapped phase map reconstructed by PUN exhibits overall blurriness in the high-frequency signals, with a significant loss of fine details. In contrast, the wrapped phase maps reconstructed by PUN+ and Swin-Unet show clear and well-preserved fine details. In particular, the reconstruction results of Swin-Unet are remarkably close to the ground truth. The RMSEs between the wrapped phase obtained by the seven reconstruction algorithms (Kreis, OF, ST, GS, PUN, PUN+, and Swin-Unet) and the ground truth are 0.4383 rad, 0.4372 rad, 0.3788 rad, 0.3459 rad, 0.2804 rad, 0.2619 rad, and 0.2543 rad, respectively. Although PSI cannot be considered the actual ground truth, it is still reliable regardless of the noise present in the interferograms.
Furthermore, to verify the reconstruction accuracy of the demodulated results, we use the classic and simple "unwrap" function in MATLAB R2018b to unwrap the wrapped phase and obtain the unwrapped phase, as shown in Figures 18 and 19. We also compare the unwrapped results obtained from the seven reconstruction methods with the ground truth. For the first set of experimental data, the RMSEs of the Kreis, OF, ST, GS, PUN, PUN+, and Swin-Unet methods are 0.5413 rad, 0.5804 rad, 0.2801 rad, 0.2929 rad, 0.1510 rad, 0.0682 rad, and 0.0546 rad, respectively. For the second set of experimental data, the RMSEs of the Kreis, OF, ST, GS, PUN, PUN+, and Swin-Unet methods are 0.7562 rad, 0.7066 rad, 0.2025 rad, 0.1382 rad, 0.1176 rad, 0.0602 rad, and 0.0449 rad, respectively. The wrapped phase of PUN exhibits noticeable distortions in edge regions and high-frequency bands, leading to misalignment during unwrapping and further amplifying the RMSE. In contrast, PUN+ and Swin-Unet effectively mitigate the distortion in high-frequency regions.
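The sensitivity of unwrapping to local distortions follows from how a simple one-dimensional unwrapping rule works: whenever two neighbouring samples differ by more than π, a multiple of 2π is added to all following samples. The sketch below implements this rule in plain Python as an illustration; it is a hypothetical stand-in, not the MATLAB implementation used in the experiments.

```python
import math

def unwrap(wrapped):
    """Unwrap a 1-D phase sequence by removing jumps larger than pi."""
    out = [wrapped[0]]
    offset = 0.0
    for prev, cur in zip(wrapped, wrapped[1:]):
        diff = cur - prev
        if diff > math.pi:
            offset -= 2.0 * math.pi     # downward correction after an upward jump
        elif diff < -math.pi:
            offset += 2.0 * math.pi     # upward correction after a downward jump
        out.append(cur + offset)
    return out

# A smooth ramp whose true step between samples (0.5 rad) is below pi.
true_phase = [0.5 * i for i in range(20)]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]
recovered = unwrap(wrapped)
```

Because each correction is carried forward to every later sample, a single distorted pixel near a phase jump shifts the rest of the line by 2π, which is consistent with the amplified RMSE observed for PUN after unwrapping.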

Conclusions
In conclusion, a novel approach based on Swin-Unet is proposed for wavefront reconstruction to accurately estimate the wrapped phase from two interferograms. Additionally, we have proposed PUN+, building upon the foundation of PUN [22], for wavefront reconstruction. The methods are evaluated on both simulated and real interferograms and compared against the classical Kreis [7], OF [8], ST [9], GS [10], and PUN [22] methods. The simulation and experimental results verify the accuracy of the proposed PUN+ and Swin-Unet and demonstrate that they achieve superior demodulation results compared to the above methods while operating at an acceptable time cost. Furthermore, by unwrapping the obtained wrapped phase in experiments, we show that the original phase obtained from both methods still maintains higher accuracy. Overall, the comprehensive performance of the proposed Swin-Unet is superior to that of PUN+. The proposed Swin-Unet is a promising approach for wavefront reconstruction based on two-frame random interferometry.

Figure 1. The workflow of two-frame PSI based on Swin-Unet; (a) the blue part represents the wrapped phase prediction stage, whose input is two frames of interferograms with random step, and the green part is the Swin-Unet parameter update stage, where the Swin-Unet weights are updated by computing the MSE between the predicted wrapped phase and the ground truth; (b) the blue part is the unwrapping stage, in which the unwrapping algorithm is used to recover the unwrapped phase.

Figure 2. (a) Real phase map, (b) phase map generated by Zernike polynomials.

The Architecture of Neural Networks

PUN+
Based on the inspiration from PUN and the original U-Net, we have developed the PUN+ framework depicted in Figure 3, which comprises three main components: the left-side feature extraction network, the right-side feature fusion network, and the bridging network in the middle.

Figure 3. The architecture of PUN+. Orange boxes correspond to multi-channel feature maps, and gray-dotted boxes represent copied feature maps. The number of channels is shown on top of each box, and the feature map size is denoted at the lower left edge of the box.

Figure 4. The architecture of Swin-Unet, which consists of four main components: encoder, bottleneck, decoder, and skip connections, constructed from Swin Transformer blocks.

Network Training
PUN+ and Swin-Unet are implemented using Python 3.9 and PyTorch 1.11.1. Training and testing are conducted on a PC with an NVIDIA GeForce RTX 3090 GPU and a Xeon Platinum 8260C CPU. Both training processes run for 300 epochs on a dataset of 24,000 pairs of images, and the batch size of the dataloader is 32.

Figure 5. (a) An original phase generated by the Zernike polynomials method, (b,c) two-frame interferograms with random phase step generated from the original phase, (d) wrapped phase generated from the original phase.

Figure 7. The interferograms with different background intensities and modulations.

Figure 13. Two-frame interferograms with different background intensities, modulations, and noise levels. Background intensities, modulations, and noise levels of the interferograms are identical in each column. The interferograms in different columns have different background intensities, modulations, and noise levels (from left to right, SNR varies from 29.7 dB to 33.0 dB, and background intensity and modulation increase from low to high). Among them, the interferograms (29.7 dB) have the lowest background intensity and modulation, while the interferograms (33.0 dB) have the highest.

Figure 14. Residual maps between the predicted results reconstructed by different methods and the ground truth. The residual maps for Kreis, OF, ST, GS, PUN, PUN+, and Swin-Unet are presented sequentially from top to bottom.

Figure 15. Experimental setup including computer, laser interferometer, tested mirror, fixture, and vibration isolation platform.

Figure 16. Evaluation of different two-frame methods with the first set of experimental data. (a) Real interferograms with random phase shift; the wrapped phase obtained from the real interferograms by different reconstruction methods: (b) Kreis, (c) OF, (d) ST, (e) GS, (f) PUN, (g) PUN+, (h) Swin-Unet; (i) ground truth.

Figure 17. Evaluation of different two-frame methods with the second set of experimental data. (a) Real interferograms with random phase shift; the wrapped phase obtained from the real interferograms by different reconstruction methods: (b) Kreis, (c) OF, (d) ST, (e) GS, (f) PUN, (g) PUN+, (h) Swin-Unet; (i) ground truth.

Table 1. Performance comparison of the reconstruction of different methods for step = 1 rad; FLOPs and parameters of the deep learning methods.

Table 2. Performance comparison of the reconstruction of different methods under different background intensities and modulations.