A Fringe Phase Extraction Method Based on Neural Network

In optical metrology, the output is usually in the form of a fringe pattern, from which a phase map can be generated and the phase information can be converted into the desired parameters. This paper proposes an end-to-end fringe phase extraction method based on a neural network. The method uses the U-net neural network to directly learn the correspondence between the gray levels of a fringe pattern and the wrapped phase map, which is simpler than existing deep learning methods. Results on simulated and experimental fringe patterns verify the accuracy and robustness of the method. While it yields the same accuracy, the proposed method features easier operation and a simpler principle than the traditional phase-shifting method and is faster than the wavelet transform method.


Introduction
Optical metrology has been widely used in various areas, such as 3D sensing, machine vision, intelligent robot control, industry monitoring, and dressmaking. In optical metrology, the output is usually in the form of a fringe pattern, from which a phase map can be determined. Once the phase map has been obtained, it can be converted into the desired parameters, such as the shape of the object, and in-plane or out-of-plane deformation. For instance, the fringe projection technique [1][2][3][4] is often used to measure the 3D-profile of objects [2]. When the fringe pattern is projected on a measured free surface, the phase of the fringe pattern is modulated by the height distribution of the object. A method to extract the phase map from the deformed fringe pattern is thus needed.
Thus far, many methods for phase calculation have been developed, including temporal phase-shifting [4,5], spatial phase-shifting [6,7], and Fourier transform [8]. Phase-shifting is a pointwise technique and is sensitive to noise sources such as CCD random noise, environmental vibration, and air disturbance. The temporal phase-shifting method requires four images per stage, which is unsuitable for real-time measurement, and the spatial phase-shifting method requires a complex optical path. The Fourier transform technique, in contrast, is a global transform method and is hence more tolerant to noise. However, as the transform is global, a frequency band accurately containing the effective information of the measured object needs to be determined to avoid large calculation errors. Some improvements have been proposed to overcome the shortcomings of the simple Fourier transform method. A windowed Fourier ridges algorithm [9][10][11] and a windowed Fourier filtering algorithm have been proposed to achieve a low standard deviation for local frequencies and phase distributions in fringe pattern analysis. The Morlet wavelet transform has also been used for phase extraction on different types of fringe patterns [12][13][14].
In this research, we propose a fringe phase extraction method based on the neural network. As an important part of machine learning, neural networks have been widely used in various fields, such as object recognition [15][16][17][18], object segmentation [19], and

Principle
The U-net is an end-to-end deep neural network that takes an image of any size (fringe patterns here) as input and outputs a specified image (corresponding wrapped phase maps here). The process of forward propagation, the training method, and the output principle used for the neural network are described in Sections 2.2-2.4, respectively. As our ultimate goal is to obtain a stable network model with effective parameters, a large amount of training data, including fringe patterns and corresponding wrapped phase maps, is needed in advance. Once this stable network model has been obtained, the wrapped phase map of an arbitrary fringe pattern can be obtained directly. The neural network method was programmed in Python based on the TensorFlow framework, and run on a desktop computer equipped with an Intel i5-4460 CPU and a GeForce GTX 1080 graphics card.

U-Net Neural Network
The size of the input fringe pattern is 512 × 512 pixels, and the output maintains the same size. The network features a contracting path, a transition path, and an expansive path. The contracting path is used to extract features of the fringe pattern, and the expansive path converts these features into the corresponding wrapped phase map. As the contracting layers deepen, low-dimensional features, including the gray gradient of every pixel, are changed into high-dimensional features, including the location and the local gradient. More layers in each path mean more connection parameters, allowing the network to fit more complex non-linear mapping relationships.
The principle of the contracting path is the same as that of the CNN [33]. The contracting path includes four repeated down-sampling operations. Every down-sampling operation contains two convolution blocks and a pooling block. The number of feature channels doubles after every two convolution blocks, and the image dimensions are halved after a pooling operation because the stride is two pixels.
The down-sampling operation is illustrated in Figure 1. As Figure 1 shows, the convolution kernel shifting stride is (1, 1) along the two dimensions, meaning that the convolution kernels shift by 1 pixel along the x and y directions each time and are multiplied with the image. Each convolution block consists of a convolution layer and an activation function, and its principle [34] can be described by Formula (1):

$$v_{i,r_2}^{x,y} = f\left(b_{i,r_2} + \sum_{r_1=1}^{R}\sum_{p=0}^{P-1}\sum_{q=0}^{Q-1} w_{i,r_2,r_1}^{p,q}\, v_{i-1,r_1}^{x+p,\,y+q}\right), \qquad (1)$$

where $v_{i,r_2}^{x,y}$ represents the value of the output at (x, y) for the $r_2$-th feature channel map of the i-th layer, $v_{i-1,r_1}^{x+p,y+q}$ represents the value of the result at (x + p, y + q) for the $r_1$-th feature channel map of the (i − 1)-th layer, and R is the total number of feature channels in the (i − 1)-th convolution layer. $b_{i,r_2}$ is a common bias term for the $r_2$-th feature channel map of the i-th layer. $w_{i,r_2,r_1}^{p,q}$ represents the weight of the convolution kernel at (p, q), and P × Q is the size of the convolution kernels in pixels, which is 3 × 3 in all the convolution blocks of the contracting path. f represents the activation function, for which rectified linear units (ReLUs) [33] are used. The principle of the ReLU is described by Formula (2):

$$f(x) = \max(0, x). \qquad (2)$$

The pooling block employs the max-pooling method; the region of every pooling operation is 2 × 2 pixels, which takes the maximum pixel value in this small region. Additionally, the pooling stride is (2, 2) along the x and y directions so as to reduce the image size by a factor of 2. The expansive path includes four repeated up-sampling operations, and it aims to enlarge the image and decode the convolution process. The up-sampling operation, which includes a transposed convolution block, a merge block, and two convolution blocks, is illustrated in Figure 2. This operation doubles the size of the image and reduces the number of feature channels by half.
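As an illustrative sketch of Formulas (1) and (2), the following NumPy code implements a single-channel valid convolution with stride (1, 1), the ReLU activation, and 2 × 2 max-pooling with stride (2, 2). This is not the authors' TensorFlow implementation; the function names and the single-channel simplification are ours.

```python
import numpy as np

def relu(x):
    # Formula (2): f(x) = max(0, x)
    return np.maximum(0.0, x)

def conv2d(image, kernel, bias=0.0):
    """Valid 2D convolution with stride (1, 1) followed by ReLU, in the
    spirit of Formula (1), simplified to one input/output channel."""
    P, Q = kernel.shape
    H, W = image.shape
    out = np.zeros((H - P + 1, W - Q + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            out[x, y] = np.sum(kernel * image[x:x + P, y:y + Q]) + bias
    return relu(out)

def max_pool(image, size=2, stride=2):
    """2x2 max-pooling with stride (2, 2): halves each spatial dimension."""
    H, W = image.shape
    out = np.zeros((H // stride, W // stride))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            out[x, y] = image[x * stride:x * stride + size,
                              y * stride:y * stride + size].max()
    return out
```

In the real network, the sum in Formula (1) also runs over all R input feature channels; the loop above keeps only the spatial part for readability.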
The transposed convolution operation is similar to the convolution operation, but it enlarges the image from the previous block. Zero-value pixels are inserted between neighboring image pixels, and a convolution operation is then applied to the up-sampled image. The stride of the transposed convolution layer is 2 × 2 pixels, which means that one zero pixel is inserted between neighboring image pixels, doubling the image size. The convolution kernel size in the transposed convolution layer is also 3 × 3 pixels.
The merge block is an image mosaic process. Once the result of the transposed convolution layer has been obtained, it is spliced together with the feature image from the corresponding level of the contracting path. The principle of the two convolution blocks is the same as in the down-sampling operation, but they reduce the number of feature channels by half.
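The zero-insertion view of the stride-2 transposed convolution and the merge (splice) step can be sketched as follows. `zero_upsample` and `merge` are hypothetical helper names of ours, and a real transposed convolution would follow the zero insertion with a learned 3 × 3 convolution:

```python
import numpy as np

def zero_upsample(image):
    """Insert one zero pixel between neighboring pixels, the zero-insertion
    step of a stride-2 transposed convolution: doubles both dimensions."""
    H, W = image.shape
    up = np.zeros((2 * H, 2 * W))
    up[::2, ::2] = image
    return up

def merge(expansive_feat, contracting_feat):
    """Merge block: splice the up-sampled feature map together with the
    feature map from the corresponding contracting-path level (channel-wise)."""
    return np.concatenate([contracting_feat, expansive_feat], axis=-1)
```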
All the convolution kernel values are initialized with random numbers from a truncated Gaussian distribution, and the bias values are initialized as constants.
The whole process of network propagation is shown in Figure 3. The process features a contracting path (left), a transition path, and an expansive path (right). The size of the input fringe pattern is 512 × 512 pixels. After one down-sampling operation, the size of the image is changed to 256 × 256 pixels and the number of feature channels to 64. By repeating this process four times, the size of the image is reduced to 32 × 32 pixels and the number of feature channels changes to 512.
The transition path consists of two convolution blocks, after which the number of feature channels doubles. The size of the image remains 32 × 32 pixels and the number of feature channels changes to 1024.
Then, the result is passed through the expansive path, which includes four repeated up-sampling operations. The size of the image is 64 × 64, 128 × 128, 256 × 256, and 512 × 512 pixels after each up-sampling, with 512, 256, 128, and 64 feature channels, respectively. Finally, a convolution operation with a kernel size of 1 × 1 pixels is applied. The size of the image remains 512 × 512 pixels, and the number of feature channels changes to 256.
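The size and channel bookkeeping described above can be checked with a few lines of plain Python (a sketch that only traces shapes, not an actual network):

```python
def unet_shape_trace():
    """Trace the feature-map size and channel count through the network
    (bookkeeping only; channel counts follow the Figure 3 description)."""
    size, channels = 512, 1                 # input fringe pattern
    trace = []
    for ch in (64, 128, 256, 512):          # contracting path: 4 down-samplings
        size //= 2
        channels = ch
        trace.append((size, channels))
    channels *= 2                           # transition path doubles channels
    trace.append((size, channels))
    for ch in (512, 256, 128, 64):          # expansive path: 4 up-samplings
        size *= 2
        channels = ch
        trace.append((size, channels))
    channels = 256                          # final 1x1 convolution
    trace.append((size, channels))
    return trace
```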

Network Training
Note that the wrapped phase results are periodic; when one output channel and an MSE loss function are used, the result easily becomes trapped in a local optimal solution in which all output values tend to 0. Thus, the problem is converted into a classification problem. The result of the network is a three-dimensional matrix of size 512 × 512 × 256 along the x, y, and feature channel directions, respectively. For every pixel, the values along the feature channel direction represent the probability of the output being 0 to 255. A softmax function is used to reset the result to meet the requirements of a probability distribution, so that cross-entropy [35], also known as multi-class log loss, can be used as the loss function.
When the output of a pixel along the feature channel direction is $q_1, q_2, \cdots, q_n$, the result of the softmax function can be described by

$$\hat{q}_k = \frac{e^{q_k}}{\sum_{j=1}^{n} e^{q_j}}, \qquad k = 1, 2, \cdots, n,$$

where n represents the number of feature channels, which here is 256. The reset probability distribution $\hat{q}$ meets the following condition:

$$\sum_{k=1}^{n} \hat{q}_k = 1, \qquad 0 < \hat{q}_k < 1.$$

The ground truth g of each pixel, originally between −π and π, is scaled up to between 0 and 255. The probability distribution of the training label p is given according to the formula

$$p_k = \begin{cases} 1, & k = g,\\ 0, & k \neq g. \end{cases}$$

Cross-entropy is defined by [35]

$$C = -\frac{1}{wh}\sum_{x=1}^{w}\sum_{y=1}^{h}\sum_{k=1}^{n} p_k(x, y) \log \hat{q}_k(x, y),$$

where p represents the training label, $\hat{q}$ represents the calculated result, n represents the number of feature channels, and w and h represent the width and height of the fringe pattern, respectively. The smaller the cross-entropy, the closer the actual and the calculated results.
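A minimal NumPy sketch of the softmax, the one-hot label construction from the scaled ground truth, and the cross-entropy loss, assuming the standard forms of these functions (the helper names are ours, not the authors'):

```python
import numpy as np

def softmax(q):
    """Softmax along the feature channel (last) axis; the max-shift is a
    standard numerical-stability trick and does not change the result."""
    e = np.exp(q - q.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def phase_to_label(phase, n=256):
    """Scale ground-truth phase from [-pi, pi] to an integer class 0..n-1,
    then one-hot encode it along the feature channel direction."""
    g = np.round((phase + np.pi) / (2 * np.pi) * (n - 1)).astype(int)
    return np.eye(n)[g]

def cross_entropy(p, q_hat):
    """Multi-class log loss averaged over the w*h pixels; p is the one-hot
    training label and q_hat the softmax output."""
    eps = 1e-12  # guard against log(0)
    return -np.mean(np.sum(p * np.log(q_hat + eps), axis=-1))
```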


The backpropagation algorithm [36] is used to back-propagate the error into the network, and adaptive moment estimation (Adam) [37]-based optimization is used to optimize the weights of the convolution kernels (w) and the bias terms (b) of all layers. An input is first propagated through the network. Then, the difference between the calculated and the desired output is backpropagated from the output layer to the first layer of the network, adjusting the network weights in the opposite direction of the derivative of the network error with respect to each individual weight. By repeating this procedure multiple times for each sample in the training set, the network can be taught to map the inputs to the correct outputs. The batch size was 10, the number of epochs was 1000, and the learning rate was 10^−4.
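A single Adam update can be sketched in NumPy using the standard Adam formulas with the default β₁ = 0.9 and β₂ = 0.999; the paper specifies only the learning rate (10⁻⁴), batch size, and epoch count, so this is an illustrative sketch rather than the authors' TensorFlow code:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a weight tensor w given its error gradient,
    as used here to optimize the kernels w and bias terms b of all layers."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

In training, this update is applied to every kernel and bias after each mini-batch of 10 fringe patterns, for 1000 epochs.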

Output Principle
The values of each pixel along the feature channel direction represent the probability of the output being 0 to 255. As Figure 4 shows, the output is the position corresponding to the maximum probability, so the output value is between 0 and 255. Note that the number of feature channels in the last convolution operation can be chosen to be more than 256, corresponding to higher resolution but longer calculation time. Finally, the output is restored to between −π and π.
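This output rule can be sketched as an argmax over the channel direction followed by a rescale back to [−π, π]; `decode_output` is a hypothetical name, and the exact rescaling convention is our assumption:

```python
import numpy as np

def decode_output(logits, n=256):
    """For every pixel, pick the channel with maximum probability (Figure 4),
    then restore the 0..n-1 class index to a phase in [-pi, pi]."""
    cls = np.argmax(logits, axis=-1)            # integer in 0..n-1
    return cls / (n - 1) * 2 * np.pi - np.pi    # back to radians
```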

Simulation Image
Numerical simulations were carried out to test the performance of the proposed algorithm. From simple to complex, we used three equations to simulate fringe patterns, and the size of the patterns was set to 512 × 512 pixels.
The second and third kinds of fringe patterns were generated according to Equations (8) and (9). In total, 1600 fringe patterns with different fringe shapes were obtained through image cropping and rotation of the generated patterns:

$$I(x, y) = 50 + 50\cos\!\left(\mathrm{peaks}(1000) + \frac{20\pi}{1000}\, y\right), \qquad y = 1, 2, \cdots, 1000, \qquad (8)$$

$$I(x, y) = 50 + 50\cos\!\left(\frac{50 - (x - 200)^2 - (y - 200)^2}{10\,000}\right), \qquad x, y = 1, 2, \cdots, 1000, \qquad (9)$$

where x represents the height in pixels and y represents the width. Figure 5 shows some simulation patterns: (a), (b), and (c) represent fringe patterns generated according to Equations (7), (8), and (9), respectively. Of all the simulation patterns, 100 fringe patterns were selected to evaluate the trained network and the rest were used to train the model. The gray level of the fringe patterns was set as the input to the network, and the wrapped phase data calculated by the four-step phase-shifting method were set as the output.
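A reduced-size sketch of this data generation pairs a circular fringe pattern in the spirit of Equation (9) with its four-step phase-shifting label φ = arctan2(I₄ − I₂, I₁ − I₃); the 200 × 200 grid and the center at (100, 100) are our simplifications for brevity:

```python
import numpy as np

# Circular fringes in the spirit of Equation (9), on a reduced grid.
x = np.arange(1, 201, dtype=float).reshape(-1, 1)   # height coordinate
y = np.arange(1, 201, dtype=float).reshape(1, -1)   # width coordinate
phase = (50 - (x - 100) ** 2 - (y - 100) ** 2) / 10_000.0

# Four fringe patterns shifted by pi/2 each, as in four-step phase-shifting.
I1, I2, I3, I4 = (50 + 50 * np.cos(phase + s)
                  for s in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2))

# Wrapped-phase label: I1 - I3 = 100 cos(phi), I4 - I2 = 100 sin(phi).
wrapped = np.arctan2(I4 - I2, I1 - I3)   # wrapped phase in (-pi, pi]
```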


Experimental Image
The fringe projection technique is often used to determine the 3D-profile of an object. When the fringe pattern is projected on a measured free surface, its phase is modulated by the height distribution of the object. We used the fringe patterns captured from the fringe projection experiment to verify the ability of the neural network to extract the phase map. Figure 6a,b show the schematic layout and physical diagram of the experiment, respectively.
The optical path adopted oblique projection and vertical shooting. The digital fringe projector chosen was a Vivitek-D5158HD with a resolution of 1920 × 1280 pixels. The camera used was a Basler ace 1600-20 g with a resolution of 1600 × 1200 pixels. The optical axes of the projection and the receiving end intersected at point O. Moreover, the camera and the digital fringe projector were at the same height L. Due to modulation by the object's height, light that was supposed to arrive at point B was cast on point E, but the light point recorded by the camera was A. Finally, the height information of the object was recorded in the fringe pattern.
A 1-mm-thick disk was chosen as the measurement object. The position and angle of the disk were altered to obtain different fringe patterns. To obtain a sufficient number of images to meet the big data requirement of network training, data extension methods such as image cropping, translation, and rotation were used. One thousand fringe patterns were obtained. Figure 7 shows one of these and its corresponding phase-shifting fringe patterns.
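The data extension can be sketched with NumPy operations such as cropping, `np.roll` translation, and `np.rot90` rotation; `augment` is a hypothetical helper of ours, and the crop and shift ranges are arbitrary illustrative choices:

```python
import numpy as np

def augment(pattern, rng):
    """Data extension used to enlarge the training set: random crop,
    horizontal translation (np.roll), and 90-degree rotation."""
    h, w = pattern.shape
    top = rng.integers(0, h // 4)
    left = rng.integers(0, w // 4)
    cropped = pattern[top:top + h // 2, left:left + w // 2]   # random crop
    shifted = np.roll(cropped, shift=rng.integers(0, 10), axis=1)
    return np.rot90(shifted, k=rng.integers(0, 4))
```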
Another complex object, a facial mask from opera, was measured, as shown in Figure 8. The same operations were performed as before to yield another 1000 experimental patterns.
Michelson interferometry is widely used to measure out-of-plane displacement. An MI-based measurement system was set up to obtain different interferograms, and Figure 9 shows a schematic drawing of the measurement system. A light beam was emitted from the He-Ne laser generator and expanded into parallel light beams after going through a spatial filter and a convex lens. The parallel light beams were then divided into two identical parts by a beam splitter (BS), and one each was introduced to the reference arm and the objective arm. In the objective arm, the light beam propagated onto the surface of the object and was reflected. The reference arm had a reflector coupled with a PZT used for phase-shifting. Finally, the two reflected light beams were returned to the BS and interfered on the surface of a CCD. The phase of the captured interferogram recorded the out-of-plane displacement information of the object. By changing the fringe interval or the position of the reflector, 100 interferograms were obtained directly, and the other 900 interferograms were obtained through data extension.
Figure 10a,b show the physical diagram and some interferograms, respectively.
The 100 fringe patterns from the fringe projection experiment and 100 interferograms were selected to evaluate the trained network, and the remainder was used to train the model. During training, the wrapped phase dataset used as the output of the neural network was calculated by the four-step phase-shifting method.

Results on Simulation Image
The 100 simulation fringe patterns were used to test the accuracy and robustness of the method. Figure 11 shows some simulated fringe patterns and the corresponding calculated wrapped phase maps. Figure 11b was calculated by the four-step phase-shifting method and Figure 11c was obtained through the trained neural network. The results show that the two types of measurement aligned well, and patterns with different fringe intervals and types yielded correct values. To further illustrate the accuracy of the proposed method, the wavelet transform method was used for comparison. The two-dimensional wavelet transform was implemented according to the approach proposed by Wang [38], and the window-modifying parameter was chosen to be 2. The error maps from the different methods are shown in Figure 11d,e, respectively. The overall error level verified the high accuracy.
To evaluate the accuracy of this method, we defined two types of error: systematic error (E) and standard deviation error (S) [31].
E is defined as

$$E = \frac{1}{MN}\sum_{j=1}^{M}\sum_{i=1}^{N}\left(d_{j,i}^{\,\mathrm{cal}} - d_{j,i}^{\,\mathrm{real}}\right),$$

where $d_{j,i}^{\mathrm{real}}$ represents the wrapped phase data of the i-th pixel in the j-th calculated image, and $d_{j,i}^{\mathrm{cal}}$ represents the calculated phase data of the i-th pixel in the j-th calculated image, which was also wrapped. M represents the number of calculated images, and N represents the number of pixels in an image. E reflects the average error, and its best possible score is zero.
S is defined as

$$S = \sqrt{\frac{1}{MN}\sum_{j=1}^{M}\sum_{i=1}^{N}\left(d_{j,i}^{\,\mathrm{cal}} - d_{j,i}^{\,\mathrm{real}} - E\right)^2},$$

where S reflects the deviation of the measurements from the mean error and is related to random error. For the phase data of the simulation fringe patterns, the value of E was 0.03 rad and that of S was 0.07 rad.
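Interpreting E as the mean signed difference and S as the standard deviation about that mean (our reading of the definitions), both metrics reduce to a few NumPy lines:

```python
import numpy as np

def error_metrics(d_real, d_cal):
    """Systematic error E (mean signed difference) and standard-deviation
    error S over all pixels of all calculated images."""
    diff = np.asarray(d_cal, dtype=float) - np.asarray(d_real, dtype=float)
    E = diff.mean()
    S = np.sqrt(((diff - E) ** 2).mean())
    return E, S
```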


Results on Experimental Image
The 100 fringe patterns obtained from the fringe projection experiment were used to test the accuracy and robustness of the method. Figure 12 shows some patterns and the resulting wrapped phase maps. The patterns in Figure 12a were chosen from different projection experiments that used an empty background, a disk, and different parts of a mask as the measured object. One hundred interferograms were also used to test the accuracy and robustness of the method. Some interferograms and their wrapped phase map results are shown in Figure 13. The results shown in Figures 12b and 13b were calculated by the four-step phase-shifting method, and those shown in Figures 12c and 13c were obtained through the trained neural network. The comparison shows that the results of the two methods were consistent. They also show that both the experimental fringe projection patterns and the interferograms yielded correct results, verifying the robustness of the trained neural network and the feasibility of the machine learning method. Additionally, error maps are presented in Figures 12d,e and 13d,e. The errors of the proposed method were obviously reduced, demonstrating its improved performance in measuring complex objects under environmental noise. Thus, the proposed method offers higher noise resistance than the wavelet transform method.
By referring again to the results of the four-step phase-shifting method, we used E and S to evaluate the accuracy of the experimentally obtained fringe patterns.
For the wrapped phase data of the experimental fringe patterns, the value of E was 0.10 rad and S was 0.08 rad. For the wrapped phase data of the interferograms, the value of E was 0.22 rad and S was 0.24 rad. The results verified the precision of this trained neural network and the accuracy of the learning-based method, as well as the method's ability to resist environmental noise. We also see that the error increased as the quality of the fringe pattern decreased.


Computation Efficiency
In order to illustrate the computation efficiency of the proposed method, the wavelet transform method, which also needs only a single fringe pattern, was used for comparison. The wavelet transform method was likewise programmed in Python. Ten simulation patterns, ten fringe projection patterns, and ten interferograms were calculated, and the resulting values of E and S are shown in Table 1. The average calculation time per pattern is recorded in Table 2. Table 2 shows that, once the neural network has been determined, its calculation time is independent of fringe pattern quality and depends only on the network structure and the input size. At the same accuracy, the proposed method is 20 times faster than the wavelet transform method.
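The per-pattern timings of Table 2 can be measured with a simple harness such as the sketch below. The two callables are placeholders standing in for the trained network and the wavelet transform method (neither is reproduced here); only the timing methodology is illustrated.

```python
import time
import numpy as np

def average_time(method, patterns, repeats=3):
    """Average wall-clock time (seconds) per pattern for `method`."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        for p in patterns:
            method(p)
    return (time.perf_counter() - t0) / (repeats * len(patterns))

# Placeholder callables standing in for the two methods under test;
# any function that takes a single fringe pattern fits this harness.
cheap = lambda p: np.arctan2(np.sin(p), np.cos(p))           # light per-pixel op
heavy = lambda p: [np.abs(np.fft.fft2(p)) for _ in range(5)]  # heavier transform

patterns = [np.random.rand(128, 128) for _ in range(10)]
t_cheap = average_time(cheap, patterns)
t_heavy = average_time(heavy, patterns)
```

Averaging over repeated runs of the whole pattern set, as here, smooths out per-call timer jitter when individual calls are fast.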

Discussion
From Figure 12c, we find that different measured objects yield correct results regardless of changes in their position or orientation. Figure 13c shows that the trained neural network can also be applied to interferograms of lower pattern quality. To verify the network's ability to handle more complex fringe patterns with different fringe shapes and spacings, fringe patterns obtained from data extensions such as image extension and rotation were calculated, and consistent results were obtained, as shown in Figure 14. These results further verify the robustness of the trained neural network.
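The data extensions mentioned above (rotation and related geometric transforms of fringe patterns) can be sketched with plain NumPy. The synthetic fringe and the particular set of variants are illustrative assumptions, not the paper's exact augmentation pipeline.

```python
import numpy as np

def augment(pattern):
    """Generate rotated and mirrored variants of a fringe pattern,
    changing fringe orientation without altering pixel statistics."""
    variants = [pattern]
    for k in (1, 2, 3):                  # 90, 180, 270 degree rotations
        variants.append(np.rot90(pattern, k))
    variants.append(np.fliplr(pattern))  # horizontal mirror
    variants.append(np.flipud(pattern))  # vertical mirror
    return variants

# An illustrative tilted fringe pattern on a 64 x 64 grid.
x, y = np.meshgrid(np.arange(64), np.arange(64))
fringe = 128 + 100 * np.cos(0.3 * x + 0.1 * y)
batch = augment(fringe)  # six variants of the same fringe
```

Because these transforms only permute pixels, each variant keeps the same gray-level distribution while presenting the network with new fringe orientations.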

This learning-based method is an improving process: the more varied the fringe patterns it learns, the more complex the models it can handle. A sufficiently large dataset can enable this neural network to adapt to many kinds of fringe patterns. Its precision was verified against the four-step phase-shifting method on different fringe patterns; however, this end-to-end neural network needs only one original fringe pattern to obtain the corresponding wrapped phase map.
This method requires a considerable amount of training data, which makes data preparation time-consuming. However, training can be a continuous process. Once a batch of fringe patterns has been trained, the obtained neural network can be saved, and the next batch of images can then be trained on the saved network, which can significantly reduce the time needed for training.
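The batch-by-batch training scheme can be illustrated with a toy NumPy model: train on one batch, save the weights, then resume on the next batch from the checkpoint. A linear least-squares model stands in for the U-net here, and `train_batch` is a hypothetical helper, not the paper's training code.

```python
import os
import tempfile
import numpy as np

def train_batch(w, X, y, lr=0.1, steps=50):
    """A few gradient-descent steps on least squares, standing in
    for one training batch of the real network."""
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(100, 3)), rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y1, y2 = X1 @ true_w, X2 @ true_w

ckpt = os.path.join(tempfile.mkdtemp(), "weights.npy")

# Batch 1: train from scratch, then save the model state.
w = train_batch(np.zeros(3), X1, y1)
np.save(ckpt, w)

# Batch 2: resume from the checkpoint instead of retraining everything.
w = train_batch(np.load(ckpt), X2, y2)
```

The second call starts from the saved weights, so each new batch refines the existing model rather than restarting training, which is the point of the scheme described above.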

Conclusions
In this paper, we proposed a fringe wrapped-phase extraction method based on the U-net neural network that obtains the wrapped phase map directly from a single fringe pattern. The results on simulated and experimental fringe patterns verified the accuracy, the efficiency, and the robustness of this method. At the same accuracy, the proposed method offers easier operation and a simpler principle than the traditional phase-shifting method, and a faster speed than the wavelet transform method.