Interferometric Wavefront Sensing System Based on Deep Learning

Abstract: At present, most wavefront sensing methods analyze the wavefront aberration from light intensity images taken in dark environments. Under general conditions, however, these methods are limited by interference from external light sources. In recent years, deep learning has achieved great success in the field of computer vision, and it has been widely used in research on image classification and data fitting. Here, we apply deep learning algorithms to an interferometric system to detect the wavefront under general conditions. This method can accurately extract the wavefront phase distribution and analyze aberrations, and experiments verify that it not only has higher measurement accuracy and faster calculation speed but also performs well in noisy environments.


Introduction
When light propagates over long distances in space, factors such as atmospheric turbulence and humidity often distort its wavefront. Wavefront aberration has long been an important factor affecting the imaging quality of optical systems. Adaptive optics (AO) is the most effective measure to overcome and compensate wavefront aberration [1]; it has a wide range of applications in astronomy, microscopy, radar, and other research fields. The wavefront detector is an important part of an adaptive optics system. It analyzes the degree of the aberration and converts it into a control signal for the corrector, which automatically compensates for the aberration and thereby improves the imaging quality of the optical system. As technology develops, many fields place higher demands on measurement, requiring both faster measurement speed and higher accuracy. However, traditional wavefront detection methods such as the Hartmann-Shack method [2] and the Fourier spectroscopy method [3] have been unable to meet these requirements. Therefore, new methods are needed to make up for the shortcomings of traditional technologies.
Machine learning, including deep learning, has become an increasingly active research topic. It is inspired by the biological neuron model of the human brain and can learn from examples to solve problems of function approximation or pattern classification [4]. Machine learning was applied to optics early on: a multilayer perceptron was used to measure the optical phase distortion caused by air turbulence [5], and this method was later used in the wavefront reconstruction system of the Hubble Telescope [6]. Methods applying neural networks to adaptive optics for wavefront detection and reconstruction were also proposed [7]. Recently, convolutional neural networks have been shown to realize the wavefront reconstruction of a point source [8]. Furthermore, Zernike coefficients were computed with a deep neural network framework based on the wavefront of the point source and extended source received by the wavefront detector [9].
At present, most wavefront sensing methods use a detector to directly record the light intensity distribution of the wavefront [8][9][10] and then analyze the phase distribution and wavefront aberration. However, these methods only work well under dark conditions, because any interference from external light sources will be recorded by the detector, which directly affects the measurement result. Therefore, the application of these methods is very limited. In this study, neural network algorithms were applied to the interference system, which can realize the measurement of the distorted wavefront in the general environment. Finally, it is verified by experiments that this method not only has higher measurement accuracy and faster calculation speed but also has good performance in noisy environments.

Wavefront Detecting System
A wavefront is the surface formed by the points in space where the light has the same phase, and the wavefront aberration refers to the difference between the actual wavefront and the ideal wavefront [11]. In wavefront detection, it is crucial to obtain the distorted wavefront phase distribution first; the wavefront aberration can then be analyzed further. The phase-shifting (PS) method [12] is a common phase measurement method in various interference systems. In industry, wavefront detection based on phase-shifting interferometry is widely used to detect optical surface defects, because it offers a large measurement field and high measurement accuracy. The phase-shifting method generates multiple interference fringe patterns by stepping the phase of the reference beam, and it extracts the phase distribution of the measuring beam from the relationships between the fringe patterns. Compared with other phase extraction methods such as the Fourier transform method [13] and the wavelet transform method [14], the phase-shifting method is quite insensitive to background intensity, and it achieves pixel-wise phase measurement with higher resolution and accuracy. Therefore, a phase-shifting interference system used for wavefront detection is not disturbed by external light sources and can complete the measurement task in a general environment, which removes the constraint of the dark environment.
Here, we designed a phase-shifting interference system to detect the distorted wavefront caused by defects of optical components, as shown in Figure 1. The laser emits a beam with a wavelength of 532 nm through the beam expander and collimator (B) as the illumination source. To avoid excessive laser intensity, we placed an adjustable neutral density filter (ND) at the exit port to adjust the intensity. The parallel beam incident on the beam splitter (BS) is divided into two parts; one passes through the BS and is normally incident on the mirror M1, which is controlled by a precision linear stage (PLS), as the reference beam, and the other is reflected by mirror M2 to the deformable mirror M3 as the measuring beam. The reference beam and the measuring beam are reflected by M1 and M3, respectively, and they recombine and interfere at the BS. The interference fringes are finally recorded by the camera.
Appl. Sci. 2020, 10, 8460
The intensity of the fringe pattern can be expressed as Equation (1):
$$I(x, y) = I_r(x, y) + I_o(x, y) + 2\sqrt{I_r(x, y)\, I_o(x, y)}\,\cos\left[\phi_r(x, y) - \phi_o(x, y)\right] + I_b(x, y) \tag{1}$$
where I(x, y) is the intensity of the fringe pattern, I_r(x, y) is the reference beam intensity, I_o(x, y) is the measuring beam intensity, I_b(x, y) is the background noise, ϕ_r(x, y) represents the reference beam phase distribution, and ϕ_o(x, y) represents the measuring beam phase distribution. Here, x and y refer to the horizontal and vertical coordinates of each pixel in the interference field area recorded by the camera.
In this system, the deformable mirror deforms when subjected to external forces, which changes the phase value at each point on the measuring wavefront, thereby simulating the distortion of light after long-distance transmission. When no external force is applied, it is equivalent to a plane mirror, and the reflected wavefront is still a plane. In this case, the optical path difference between the reference beam and the measuring beam is equal at all points of the interference field. When the deformable mirror is subjected to external forces, the reflected measuring wavefront is no longer an ideal plane but is distorted with a certain aberration. The interference field recorded by the camera then shows bright and dark fringes:
$$I(x, y) = I_r(x, y) + I_o(x, y) + 2\sqrt{I_r(x, y)\, I_o(x, y)}\,\cos\left[\phi_r(x, y) - \phi_o(x, y) - \Delta\phi(x, y)\right] + I_b(x, y) \tag{2}$$
where ∆ϕ(x, y) represents the phase delay caused by M3 at the point (x, y) on the measuring wavefront. By applying different forces, we can get wavefronts with different distortions. After that, the four-step phase-shifting method is used to change the phase of the reference beam by π/2 each time, so that four interference fringe patterns are obtained. Their intensities can be expressed as:
$$I_n(x, y) = I_r(x, y) + I_o(x, y) + 2\sqrt{I_r(x, y)\, I_o(x, y)}\,\cos\left[\phi_r(x, y) + \frac{(n-1)\pi}{2} - \phi_o(x, y)\right] + I_b(x, y) \tag{3}$$
where the index n = 1, 2, 3, 4. The phase change of the reference beam is controlled by the PLS, and each movement distance is equal to 1/8 of the laser wavelength; because the reference beam travels to M1 and back, a displacement of λ/8 changes the optical path by λ/4, that is, the phase by π/2. Using the orthogonality of trigonometric functions, ϕ_r(x, y) − ϕ_o(x, y) can be obtained:
$$\phi_r(x, y) - \phi_o(x, y) = \arctan\frac{I_4(x, y) - I_2(x, y)}{I_1(x, y) - I_3(x, y)} \tag{4}$$
When M3 deforms, we can use the same method to obtain the phase distribution of the distorted wavefront interference fringe pattern:
$$\phi_r(x, y) - \left[\phi_o(x, y) + \Delta\phi(x, y)\right] = \arctan\frac{I'_4(x, y) - I'_2(x, y)}{I'_1(x, y) - I'_3(x, y)} \tag{5}$$
where I′_n(x, y) are the four fringe patterns recorded with M3 deformed. Equations (1)-(5) represent the process of the phase-shifting method. Since the reference beam is normally incident on the mirror M1, the value of ϕ_r(x, y) is equal everywhere in the interference field and can be regarded as a constant. Therefore, a single phase-shifting measurement is enough to obtain the phase value ϕ_o(x, y) + ∆ϕ(x, y) of the measuring beam.
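The four-step recovery described above can be illustrated for a single pixel. The following is a minimal sketch, not the authors' implementation: the function names, the intensity values, and the sign convention of the π/2 step (added to the reference phase) are assumptions made for illustration.

```python
import math

def fringe(i_r, i_o, i_b, delta, n):
    """Intensity of frame n (n = 1..4) for a phase difference delta.

    Assumes the reference phase is advanced by (n - 1) * pi / 2.
    """
    step = (n - 1) * math.pi / 2.0
    return i_r + i_o + 2.0 * math.sqrt(i_r * i_o) * math.cos(delta + step) + i_b

def four_step_phase(i1, i2, i3, i4):
    """Wrapped phase difference from four frames; atan2 keeps the quadrant."""
    return math.atan2(i4 - i2, i1 - i3)

delta_true = 1.2  # hypothetical phase difference at one pixel, in radians
frames = [fringe(1.0, 0.8, 0.3, delta_true, n) for n in range(1, 5)]
delta_est = four_step_phase(*frames)
```

Because the background term appears in all four frames, it cancels in both differences, which is why the estimate does not depend on the ambient light level in this model.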

Wavefront Analysis Neural Network
Extracting the wavefront phase from the fringe patterns raises a problem. The phase value calculated by the inverse trigonometric function arctan is limited to (−π, π], so it is not the true phase; that is, phase wrapping occurs. Many methods have been proposed to solve this problem, including Goldstein's branch cut algorithm [15], the quality-guided algorithm [16], and the mask cut algorithm [17]. They all belong to the path-tracking algorithms. This type of algorithm places very high requirements on the quality of the input image: it needs to select a suitable integration path in the phase map and bypass noisy areas to prevent error propagation.
In general conditions, noise is inevitable. The true phase map calculated by this type of algorithm is prone to large errors; thus, it is difficult to guide wavefront correction. When we detect the distorted wavefront, phase unwrapping is an indispensable part. Traditional algorithms have low accuracy when facing noisy images. Some algorithms with anti-noise capability often require a lot of calculation time, making it difficult for wavefront detection systems to achieve dynamic monitoring of the wavefront.
After we get the true phase map, fitting the wavefront with mathematical expressions can better guide wavefront aberration analysis and correction. In optical measurement research, it is very common to use Zernike polynomials as the basis functions for wavefront fitting. The reason is that Zernike polynomials fit the optical wavefront with high accuracy, they are orthogonal over the circular domain, and most optical instruments have circular apertures. Moreover, there is a certain correspondence between the Zernike polynomials and the Seidel aberrations. If we use orthogonal Zernike polynomials as the basis functions to fit the phase change of the measuring beam wavefront, it can be expressed as Equation (6):
$$\Delta\phi(x, y) = \sum_{i=1}^{M} a_i Z_i(x, y) \tag{6}$$
where the index i = 1, 2, . . . , M, Z_i is the i-th Zernike polynomial, and a_i is the corresponding coefficient.
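To make the Zernike expansion concrete, the sketch below evaluates a wavefront as a weighted sum of a few low-order terms. It is only an illustration: the four polynomials are written in unnormalized Cartesian form, and their ordering and the coefficient values are assumptions, not the indexing used in this work.

```python
# A few low-order Zernike polynomials on the unit disk (unnormalized).
# The ordering below is an illustrative assumption.
def piston(x, y):
    return 1.0

def tilt_x(x, y):
    return x

def tilt_y(x, y):
    return y

def defocus(x, y):
    return 2.0 * (x * x + y * y) - 1.0

ZERNIKE = [piston, tilt_x, tilt_y, defocus]

def wavefront(coeffs, x, y):
    """Evaluate sum_i a_i * Z_i(x, y) at one point of the unit disk."""
    return sum(a * z(x, y) for a, z in zip(coeffs, ZERNIKE))

coeffs = [0.0, 0.5, 0.0, 1.0]          # hypothetical coefficients a_i
center = wavefront(coeffs, 0.0, 0.0)   # only defocus contributes: -1.0
edge = wavefront(coeffs, 1.0, 0.0)     # tilt 0.5 plus defocus 1.0
```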
To obtain the unknown coefficients a_i, many methods have been proposed, such as the least-squares method, the Gram-Schmidt orthogonalization method [18], and the Householder transformation method [19]. However, these methods suffer from poor fitting accuracy or slow fitting speed, which makes them difficult to use in actual wavefront detection systems. Traditional methods can no longer meet the requirements of high-precision dynamic wavefront measurement. To obtain an algorithm with high accuracy and faster calculation speed, we introduce state-of-the-art deep convolutional neural networks.

Models
The whole system includes two neural network modules, as shown in Figure 2. The role of net1 is to unwrap the phase calculated by the phase-shifting method. As mentioned in the previous section, the wrapped phase value is limited to (−π, +π]; the wrapped phase and the true phase are related as formulated in Equation (7):
$$\varphi_t(x, y) = \varphi_w(x, y) + 2\pi k(x, y) \tag{7}$$
Here, φ_t(x, y) is the true phase, φ_w(x, y) is the wrapped phase, and k(x, y) represents the wrap count of each pixel, k(x, y) ∈ Z. In the wrapped phase, there are obvious dividing lines between regions with different wrap counts. Therefore, we can use image segmentation to separate these regions. Inspired by SegNet [20], net1 adopts an encoder-decoder structure, which can segment these regions well and give their classification results. The last layer of net1 is a softmax layer, and the output is a feature map with 11 channels, which represents the probabilities of 11 classes for each pixel. We extract the category index n(x, y) with the largest probability value as the final classification result of the pixel. There is a one-to-one correspondence between the classification result n(x, y) and the wrap count k(x, y), as formulated in Equation (8):
$$k(x, y) = n(x, y) - 5 \tag{8}$$
where n(x, y) ∈ {0, 1, . . . , 9, 10} and k(x, y) ∈ {−5, −4, . . . , 4, 5}. In general conditions, the measuring beam phase change caused by the deformable mirror is small, and the wrap count k(x, y) corresponding to its wrapped phase is usually in the range of [−1, 1]. However, to make this method applicable to scenarios with large phase changes, we generated some wavefront maps with larger phase values in the training dataset, and the wrap counts calculated from these maps can reach the range of [−5, 5].
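The decoding step described above (largest class probability, then wrap count, then true phase) can be sketched for one pixel as follows. This is a minimal illustration; the function name and the probability values are hypothetical, and a real network output would be processed for every pixel at once.

```python
import math

def unwrap_pixel(wrapped, class_probs):
    """Turn 11 class probabilities into a wrap count and an unwrapped phase."""
    # Index of the most probable class, n in {0, ..., 10}.
    n = max(range(len(class_probs)), key=class_probs.__getitem__)
    k = n - 5                                  # wrap count, Equation (8)
    return wrapped + 2.0 * math.pi * k, k      # true phase, Equation (7)

probs = [0.0] * 11       # hypothetical softmax output for one pixel
probs[5] = 0.1
probs[6] = 0.9
phase, k = unwrap_pixel(0.5, probs)
```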
[Figure 2, panel (a): wavefront analysis process]
Since the unwrapped phase of net1 is not completely correct, it may wrongly estimate the wrap count at individual points, causing these points to deviate strongly from the true value, as shown in Figure 3b,c. To solve this problem, we propose a smoothing method to further process the output result, as formulated in Equations (9) and (10). Through a global search, we take the value V(x, y) of each pixel and subtract from it the average value V′(x, y) of its 3 × 3 neighborhood:
$$\gamma = V(x, y) - V'(x, y) \tag{9}$$
If the difference γ is greater than the threshold T or less than −T, then 2π is subtracted from or added to the value of the pixel:
$$V(x, y) = \begin{cases} V(x, y) - 2\pi, & \gamma > T \\ V(x, y) + 2\pi, & \gamma < -T \end{cases} \tag{10}$$
T is an empirical value, and it works better when equal to 4. After 10 iterations, most of the bad points are corrected except for some areas with dense bad points, as shown in Figure 3d.
Comparing Figure 3c,d, we find that the remaining bad points are clustered into regions, and the values of all points in a region differ from the true values by the same amount 2πk. To further correct these regions, we use the Sobel operator to detect the edges and then extract the whole regions. We calculate the average value of a region and subtract from it the average value of the adjacent edge pixels outside the region. According to this difference and Equation (10), the correct value can be obtained. Eventually, we can get a well-restored wavefront, as shown in Figure 3e,f.
Due to the long-distance transmission, the wavefront captured by the detector will have some noise, which severely affects the aberration analysis. To reduce the influence of noise, we use a 3 × 3 Gaussian kernel with a sigma of 1 to filter the restored wavefront. After that, we can get a wavefront distribution, which is conducive to the subsequent aberration analysis.
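The neighborhood-based correction of isolated bad points can be sketched in a few lines. This is an illustrative reading of the procedure rather than the authors' code: it assumes the 3 × 3 mean includes the center pixel, that windows are truncated at the image border, and that all pixels are updated from the previous iteration's values.

```python
import math

def smooth_once(v, threshold=4.0):
    """One pass: shift a pixel by 2*pi if it deviates from its 3x3 mean."""
    rows, cols = len(v), len(v[0])
    out = [row[:] for row in v]
    for r in range(rows):
        for c in range(cols):
            window = [v[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, rows))
                      for cc in range(max(c - 1, 0), min(c + 2, cols))]
            gamma = v[r][c] - sum(window) / len(window)
            if gamma > threshold:
                out[r][c] = v[r][c] - 2.0 * math.pi
            elif gamma < -threshold:
                out[r][c] = v[r][c] + 2.0 * math.pi
    return out

def smooth(v, threshold=4.0, iterations=10):
    """Repeat the correction pass a fixed number of times."""
    for _ in range(iterations):
        v = smooth_once(v, threshold)
    return v

# Hypothetical smooth phase ramp with one pixel whose wrap count is off by one.
phase = [[0.1 * (r + c) for c in range(5)] for r in range(5)]
bad = [row[:] for row in phase]
bad[2][2] += 2.0 * math.pi
fixed = smooth(bad)
```

A single isolated error is removed in the first pass; dense clusters of bad points shift their own neighborhood mean, which is consistent with the text's remark that such regions need the separate edge-based correction.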
The role of net2 is to make a further analysis based on the true phase map and fit the coefficients of the Zernike polynomial required for wavefront reconstruction and correction. It adopts the structure of a deep convolutional neural network as a whole, and five residual blocks [21] are added to accelerate the learning process and improve the fitting accuracy. Finally, two fully connected layers are used, and we directly output the first 36 Zernike coefficients.

Training Data Generated
The neural network only works well after being trained, and it needs a lot of training data. According to the function of net1, the wrapped phase is needed as the input, and the wrap count corresponds to the output label. We also need the true phase map as the final result reference. Specifically, the true phase map was simulated by performing arithmetic operations (addition or subtraction) on 7-15 two-dimensional Gaussian functions with random mean and variance. Then, wrap the true phase to get the wrapped phase. Finally, according to Equation (7), the wrap count can be calculated.
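The generation of one net1 training sample (true phase, wrapped phase, and class label) can be sketched as below. This is a minimal illustration under stated assumptions: the amplitude range, width range, image size, and random seed are hypothetical choices, picked so that the wrap count stays within [−5, 5]; only the use of 7-15 signed Gaussians comes from the text.

```python
import math
import random

def gaussian_phase_map(size, n_gauss, rng):
    """Sum of signed 2-D Gaussians with random centre and width."""
    terms = [(rng.choice((-1.0, 1.0)) * rng.uniform(0.5, 2.0),  # signed amplitude
              rng.uniform(0, size - 1), rng.uniform(0, size - 1),  # centre
              rng.uniform(size / 8, size / 3))                     # sigma
             for _ in range(n_gauss)]
    return [[sum(a * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * s * s))
                 for a, cx, cy, s in terms)
             for x in range(size)] for y in range(size)]

def wrap_with_count(phi):
    """Return (wrapped phase in (-pi, pi], wrap count k) for a true phase."""
    k = math.ceil((phi - math.pi) / (2.0 * math.pi))
    return phi - 2.0 * math.pi * k, k

rng = random.Random(0)
true_phase = gaussian_phase_map(size=21, n_gauss=rng.randint(7, 15), rng=rng)
pairs = [[wrap_with_count(p) for p in row] for row in true_phase]
wrapped = [[w for w, _ in row] for row in pairs]       # network input
labels = [[k + 5 for _, k in row] for row in pairs]    # class index n = k + 5
```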
Net2 needs the true phase map as the input and Zernike coefficients as the output label. However, the Zernike coefficients calculated from the true phase map contain errors and cannot be used directly as the training labels of net2, while reconstructing wavefronts from random Zernike coefficients does not produce the type of true phase we desire. To ensure that the training data of net1 and net2 are of the same type, we propose a novel method to generate the training data. First, the covariance matrix method is used to fit the first 36 Zernike coefficients corresponding to the true phase map, as formulated in Equations (11)-(24). For N points on a wavefront, the phase value of each point can be expressed as:
$$W_i = a_0 + a_1 Z_{1i} + a_2 Z_{2i} + \cdots + a_j Z_{ji} + \cdots + a_n Z_{ni} \tag{11}$$
where the index i = 1, · · · , N, n = 36, and Z_ji represents the value of the j-th Zernike polynomial at the i-th point. The average value of the N points is expressed as:
$$\bar{W} = a_0 + a_1 \bar{Z}_1 + a_2 \bar{Z}_2 + \cdots + a_j \bar{Z}_j + \cdots + a_n \bar{Z}_n \tag{12}$$
Here, $\bar{Z}_j$ is defined as $\bar{Z}_j = \frac{1}{N}\sum_{i=1}^{N} Z_{ji}$. Subtracting Equation (12) from Equation (11) gives:
$$U_{mi} = a_1 U_{1i} + a_2 U_{2i} + \cdots + a_j U_{ji} + \cdots + a_n U_{ni} \tag{13}$$
where $U_{mi} = W_i - \bar{W}$ and $U_{ji} = Z_{ji} - \bar{Z}_j$. According to the Gram-Schmidt orthogonalization method [18], linear combinations of the U_j can be used to construct a set of polynomials P_j that are orthogonal on the discrete data points:
$$U_m = C_1 P_1 + C_2 P_2 + \cdots + C_j P_j + \cdots + C_n P_n \tag{14}$$
where P_k and P_j are orthogonal for j ≠ k, $\sum_{i=1}^{N} P_{ji} P_{ki} = 0$. Then, we construct the polynomials P_j according to the following relation:
$$\begin{pmatrix} U_1 \\ U_2 \\ \vdots \\ U_n \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \alpha_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ \alpha_{n1} & \alpha_{n2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} P_1 \\ P_2 \\ \vdots \\ P_n \end{pmatrix} \tag{15}$$
Due to the orthogonality of P_1 and P_2, we can get the value of α_21 as follows:
$$\alpha_{21} = \frac{A_{21}}{A_{11}}$$
Here, A_jk represents the covariance of Z_j and Z_k, which is also the covariance of U_j and U_k. The expression of A_jk is as follows:
$$A_{jk} = \frac{1}{N}\sum_{i=1}^{N}\left(Z_{ji} - \bar{Z}_j\right)\left(Z_{ki} - \bar{Z}_k\right) = \frac{1}{N}\sum_{i=1}^{N} U_{ji} U_{ki}$$
By analogy, all the values of α_jk with j > k follow from the orthogonality of the P_k:
$$\alpha_{jk} = \frac{\sum_{i=1}^{N} U_{ji} P_{ki}}{\sum_{i=1}^{N} P_{ki}^2}$$
According to Equation (15), α_jk is equal to zero when j < k and equal to one when j = k. Thus, we can get the polynomials P_j. Then, we use the least-squares method to find the values of the coefficients C_j that minimize the variance σ²:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(U_{mi} - \sum_{j=1}^{n} C_j P_{ji}\right)^2, \qquad \frac{\partial \sigma^2}{\partial C_j} = 0 \;\Rightarrow\; C_j = \frac{\sum_{i=1}^{N} U_{mi} P_{ji}}{\sum_{i=1}^{N} P_{ji}^2}$$
Combining Equations (13) and (15), we can get the relationship between C_j and a_j, which can be expressed as:
$$\begin{pmatrix} C_1 & C_2 & \cdots & C_n \end{pmatrix} = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix} \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \alpha_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & 0 \\ \alpha_{n1} & \alpha_{n2} & \cdots & 1 \end{pmatrix}$$
that is,
$$a_j = C_j - \sum_{i=j+1}^{n} \alpha_{ij} a_i, \quad (j = 1, 2, \cdots, n-1).$$
When the index j = n, a_n is equal to C_n. According to Equation (12), a_0 is equal to $\bar{W} - \sum_{j=1}^{n} a_j \bar{Z}_j$. Thus, we can get the values of the first 36 Zernike coefficients of the wavefront.
Since the number of detected points is limited, the Zernike coefficients fitted here are not accurate. So, we reconstructed the wavefront from the obtained Zernike coefficients and generated a new wavefront phase map. In this way, we not only get a phase map of the desired type but also ensure that the corresponding Zernike coefficients are accurate. We also added 20-50% salt and pepper noise with random values within [−2, 2] to a small part of the training dataset of net1 to improve its anti-noise ability. A training dataset consisting of 10,000 pairs was generated, and 100 pairs were generated for testing. The image size is 201 × 201 pixels. Figure 4 shows the data used in the training process.
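The fitting procedure above (centering, orthogonalization on the sample points, projection, and back-substitution) can be sketched as follows. It is a small illustration under assumptions, not the authors' implementation: the function name, the three basis functions, the 5 × 5 sample grid, and the test coefficients are all hypothetical, and the orthogonalization is written in the form U_j = P_j + Σ α_jk P_k.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def mean(v):
    return sum(v) / len(v)

def covariance_fit(basis, w):
    """Fit w_i ~ a0 + sum_j a_j * basis[j][i]; returns [a0, a1, ..., an]."""
    n = len(basis)
    z_bar = [mean(z) for z in basis]
    w_bar = mean(w)
    U = [[zi - zb for zi in z] for z, zb in zip(basis, z_bar)]  # centred basis
    Um = [wi - w_bar for wi in w]                               # centred data
    P, alpha = [], [[0.0] * n for _ in range(n)]
    for j in range(n):                    # Gram-Schmidt on the sample points
        p = U[j][:]
        for k in range(j):
            alpha[j][k] = dot(U[j], P[k]) / dot(P[k], P[k])
            p = [pi - alpha[j][k] * pk for pi, pk in zip(p, P[k])]
        alpha[j][j] = 1.0
        P.append(p)
    C = [dot(Um, p) / dot(p, p) for p in P]   # least-squares projections
    a = [0.0] * n
    for j in reversed(range(n)):              # back-substitution, a_n = C_n
        a[j] = C[j] - sum(alpha[i][j] * a[i] for i in range(j + 1, n))
    a0 = w_bar - sum(aj * zb for aj, zb in zip(a, z_bar))
    return [a0] + a

pts = [(-1 + 0.5 * i, -1 + 0.5 * j) for i in range(5) for j in range(5)]
basis = [[x for x, y in pts],
         [y for x, y in pts],
         [2 * (x * x + y * y) - 1 for x, y in pts]]
true_a = [0.3, 1.5, -0.7, 0.2]   # hypothetical a0..a3
w = [true_a[0] + sum(a * b[i] for a, b in zip(true_a[1:], basis))
     for i in range(len(pts))]
fitted = covariance_fit(basis, w)
```

For noise-free data built from the same basis, the fit returns the generating coefficients up to rounding, which is a convenient sanity check before applying it to measured phase maps.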

Training Networks
The two networks are trained separately. Net1 carries out a classification task, so it is appropriate to choose cross-entropy as the loss function, as formulated in Equation (25):
$$\mathrm{Loss} = -\frac{1}{M}\sum_{m=1}^{M}\sum_{x, y} \log p(x, y) \tag{25}$$
M is the number of samples in each batch of training data, and p(x, y) is the probability value of the correct classification result. The Adam optimizer [22] was used; the learning rate is initially set to 0.001, and the learning rate decay rate is 0.99. The batch size is set to 8, and the maximum number of training epochs is 20.
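A pixel-wise cross-entropy of this kind can be sketched without any framework. The version below assumes the loss sums −log p(x, y) over all pixels of a sample and averages over the M samples of the batch; the function name and the tiny uniform example are hypothetical.

```python
import math

def pixelwise_cross_entropy(batch_probs, batch_labels):
    """-(1/M) * sum over samples and pixels of log p(correct class).

    batch_probs[m][r][c] is a list of class probabilities for one pixel;
    batch_labels[m][r][c] is the index of the correct class.
    """
    total = 0.0
    for probs, labels in zip(batch_probs, batch_labels):      # one sample
        for prob_row, label_row in zip(probs, labels):        # one image row
            for p, n in zip(prob_row, label_row):             # one pixel
                total -= math.log(p[n])
    return total / len(batch_probs)

uniform = [1.0 / 11] * 11              # an untrained, uniform 11-class output
batch_probs = [[[uniform, uniform]]]   # M = 1 sample of 1 x 2 pixels
batch_labels = [[[5, 6]]]
loss = pixelwise_cross_entropy(batch_probs, batch_labels)
```

With a uniform output every pixel contributes log 11, so this value is a useful reference point: a trained network should fall well below it.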
The output of net2 is the first 36 Zernike coefficients; to evaluate the quality of the fitting result, we choose the mean square error (MSE) as the loss function, as formulated in Equation (26):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{26}$$
where y_i is the logits value, ŷ_i is the label value, and n is the number of output values. The Adam optimizer was used; the learning rate is initially set to 0.0001, and the learning rate decay rate is 0.99. The batch size is set to 8, and the maximum number of training epochs is 40. Both networks were implemented using the TensorFlow framework (Google) and were computed on an i7 8700k CPU (Intel) and an RTX 2070 GPU (NVIDIA).

Experiment
To validate the effectiveness of the proposed deep learning wavefront detection method, we successively conducted simulation experiments and real experiments in a general environment. First, we conducted a simulation experiment. We extracted two sets of wrapped phases from the test dataset of net1, one with a signal-to-noise ratio (SNR) of 9 and one with an SNR of 3. Then, we input these data separately to net1, the branch cut algorithm, and the quality-guided phase unwrapping algorithm to get the unwrapped phases. We compared these results with the real phase and evaluated net1 against the two other methods according to the errors. The output results are shown in Figures 5 and 6.
From Figure 5, we can see that there are some errors between the output calculated by each method and the true phase. These errors are mainly caused by incorrect estimation of the wrap counts of points in the wrapped phase, so that the calculated wavefronts differ from the real wavefront by 2πk at these points. These errors appear as spikes in the difference maps. The more spikes there are and the larger their values, the greater the error of the output result and the poorer the performance of the method. According to Figure 5 and Table 1, all three methods perform well when dealing with a wrapped phase with less noise (SNR = 9). Among them, Goldstein's branch cut algorithm has the highest accuracy, with a root mean square error (RMSE) of 0.06 and a calculation time of 1.94 s. Net1 uses the least computation time: it takes only 1.04 s, and its error is comparable to that of Goldstein's branch cut algorithm, with an RMSE of 0.07. Compared with the other two, the quality-guided algorithm performs poorly: its calculation time is long (12.48 s), and the accuracy of its output is relatively low, with an RMSE of 0.22.
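The link between wrap-count spikes and the reported RMSE can be illustrated with a short calculation. This is a hypothetical example, not data from the experiments: a flat 10 × 10 reference phase with a single pixel whose wrap count is off by one.

```python
import math

def rmse(estimate, reference):
    """Root mean square error between two equally sized 2-D maps."""
    sq = [(e - r) ** 2
          for est_row, ref_row in zip(estimate, reference)
          for e, r in zip(est_row, ref_row)]
    return math.sqrt(sum(sq) / len(sq))

reference = [[0.0] * 10 for _ in range(10)]
estimate = [row[:] for row in reference]
estimate[3][4] += 2.0 * math.pi      # one spike of height 2*pi
spike_rmse = rmse(estimate, reference)
```

One wrong pixel in a hundred already gives an RMSE of 2π/10, about 0.63, which shows how strongly a few misclassified wrap counts dominate this metric.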
When dealing with a noisy wrapped phase map (SNR = 3), as shown in Figure 6, the accuracy of the output results of all three methods is reduced. As the noise becomes denser, more spikes appear in the difference maps, and their values may become larger. Goldstein's branch cut algorithm, which performed well before, degrades markedly here: from Table 2, the accuracy of its output is greatly reduced, with an RMSE of 1.48. For the quality-guided algorithm, the increased noise causes the calculation error to propagate and amplify, so that errors occur over whole regions. The deep learning method remains stable; it performs well on the noisy image, with an RMSE of 0.37.
After that, we selected two sets of distorted wavefronts from the test dataset and used net2 and the least-squares method to fit their first 36 Zernike coefficients, as shown in Figure 7. The least-squares method is faster; it takes only 0.5 s to complete the calculation. However, its fitting accuracy is poor, and there is a large deviation between the fitting result and the true value; the RMSE is 1.94 and 1.13, respectively. This is because Zernike polynomials are not orthogonal on the discrete sampling points, which makes the system of equations ill-conditioned and greatly affects the fitting accuracy. Net2 is slower; it takes about 0.9 s to complete the fitting process. However, the trained neural network fits the corresponding Zernike coefficients better, with RMSEs of 0.16 and 0.17, which makes it a better choice for analyzing the wavefront and guiding its correction.
Finally, to see how the wavefront detection system performs in a real environment, we built a detection system and conducted a series of wavefront detection experiments, as shown in Figure 8. A laser diode (Changfu Technology, Beijing, China) with a power of 150 mW and a wavelength of 532 nm was used as the light source. The laser beam passes through the beam expander and collimator and is divided into two parts by the beam splitter. The reference beam is normally incident on the mirror M1, which is controlled by a precision linear stage (Physik Instrumente, Karlsruhe, Germany), and the measuring beam is incident on the deformable mirror. By adjusting the tightening screws, we can apply different forces to the deformable mirror to generate different distorted wavefronts. Then, following the four-step phase-shifting method, we moved the precision linear stage in one direction four times, changing the reference beam phase by π/2 each time. Finally, the fringe patterns were recorded by the camera (Daheng Image Vision, Beijing, China), as shown in Figure 9.
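The four-step acquisition described above can be sketched in simulation. This is an illustrative example under stated assumptions, not the acquisition code used in the experiment: it assumes the reference beam returns along its incident path, so that a mirror displacement d changes the optical path by 2d, and it assumes frames of the form I_k = a + b cos(φ + kπ/2) for k = 0 to 3. The background, modulation, test phase, and the sign convention of the arctangent are hypothetical choices. Under the double-pass assumption, a π/2 phase step at 532 nm corresponds to a stage displacement of λ/8 = 66.5 nm.

```python
import numpy as np

WAVELENGTH = 532e-9  # m, laser wavelength quoted in the text

# Assumption: double pass at normal incidence, so the phase change for a
# mirror displacement d is 4*pi*d / wavelength.
phase_step = np.pi / 2
stage_step = phase_step * WAVELENGTH / (4 * np.pi)  # equals wavelength / 8

def four_step_frames(phase, background=1.0, modulation=0.8):
    """Simulate the four frames I_k = a + b*cos(phase + k*pi/2), k = 0..3."""
    return [background + modulation * np.cos(phase + k * phase_step)
            for k in range(4)]

def wrapped_phase(i1, i2, i3, i4):
    """Four-step demodulation: i4 - i2 = 2b*sin(phase), i1 - i3 = 2b*cos(phase)."""
    return np.arctan2(i4 - i2, i1 - i3)

# Hypothetical test wavefront: a defocus-like phase spanning several fringes.
axis = np.linspace(-1.0, 1.0, 128)
x, y = np.meshgrid(axis, axis)
true_phase = 12.0 * (x**2 + y**2)

frames = four_step_frames(true_phase)
phi_w = wrapped_phase(*frames)  # wrapped into [-pi, pi]

print(f"stage step per frame: {stage_step * 1e9:.1f} nm")
```

Because the background and modulation terms cancel in the two differences, the recovered wrapped phase does not depend on uniform ambient light added equally to all four frames, which is consistent with the robustness to external light sources that motivates the interferometric approach.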
Using the trigonometric relationships between the four phase-shifted intensities, we can calculate the wrapped phase corresponding to the wavefront (Figure 10a). We then input the wrapped phase into net1 to obtain the unwrapped phase (Figure 10b). In addition, we processed the wrapped phase with Goldstein's branch cut algorithm and compared the result with that of net1, as shown in Figure 10c,d. The outputs of the two methods are almost identical, which indirectly shows that net1 can effectively process the wrapped phase. To further analyze the wavefront aberration, we input the unwrapped phase into net2 and fitted the first 36 Zernike coefficients, as shown in Figure 10e and Table 3. To assess the accuracy of the fit, we used the fitted Zernike coefficients to reconstruct the wavefront (Figure 10f) and calculated the difference between the reconstructed wavefront and the real wavefront in the circular domain, as shown in Figure 10g. The difference map in Figure 10g shows that the overall error is small and largely confined within ±0.5; the RMSE is 0.6133. At individual points on the border of the wavefront, the error is large; our analysis attributes this to the detection error of the camera. These experiments in a real environment show that the wavefront detection system can satisfactorily complete wavefront detection and analysis, which indicates that it is an accurate and effective wavefront detection method.
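The residual analysis in the circular domain can be sketched as follows. This is a minimal illustration with hypothetical data, not the evaluation code or the measured maps of Figure 10: the function names, the grid size, and the error values are all assumptions. It computes the RMSE and the fraction of pixels whose absolute error stays within a bound, using only pixels inside the circular aperture, and shows how a single large outlier raises the RMSE while most pixels remain within the bound.

```python
import numpy as np

def circular_mask(size):
    """Boolean mask of the unit disc inscribed in a size x size grid."""
    axis = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(axis, axis)
    return x**2 + y**2 <= 1.0

def residual_stats(reconstructed, reference, mask, bound=0.5):
    """Return (RMSE, fraction of pixels with |error| <= bound) inside the mask."""
    err = (reconstructed - reference)[mask]
    rmse = float(np.sqrt(np.mean(err**2)))
    within = float(np.mean(np.abs(err) <= bound))
    return rmse, within

# Hypothetical data: a uniform error of 0.1 over the whole aperture.
mask = circular_mask(101)
reference = np.zeros((101, 101))
reconstructed = np.full((101, 101), 0.1)
rmse_uniform, within_uniform = residual_stats(reconstructed, reference, mask)

# Same data with one large error near the aperture border (hypothetical outlier).
reconstructed_outlier = reconstructed.copy()
reconstructed_outlier[50, 95] = 5.0
rmse_outlier, within_outlier = residual_stats(reconstructed_outlier, reference, mask)

print(f"uniform error: RMSE = {rmse_uniform:.4f}, within bound = {within_uniform:.4f}")
print(f"with outlier:  RMSE = {rmse_outlier:.4f}, within bound = {within_outlier:.4f}")
```

Reporting both quantities separates the typical error over the aperture from the effect of a few isolated points, such as the border points discussed above.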

Conclusions
In this work, we present a new deep learning wavefront sensing method. First, we designed an interference system to detect the distorted wavefront and generated multiple interference fringe patterns by the phase-shifting method, which greatly improved the anti-noise ability and removed the dependence of the wavefront detection system on a dark environment. Second, we designed a neural network system to analyze the detected wavefront. The two neural network modules in the system were used to unwrap the phase and fit the Zernike coefficients. Compared with traditional algorithms, the wavefront analysis neural network system not only has higher measurement accuracy but also greatly reduces the calculation time. Finally, simulations and experiments have demonstrated the effectiveness of the system under different working conditions. In the future, we wish to apply the new wavefront detection system to radar research to achieve dynamic measurement and correction of the wavefront.

Conflicts of Interest:
The authors declare no conflict of interest.