A Novel Approach to Droplet’s 3D Shape Recovery Based on Mask R-CNN and Improved Lambert–Phong Model

To meet the demand for extracting the three-dimensional shapes of droplets in microelectronic packaging, life science, and related fields, and to address the complex calculation and slow running speed of conventional shape from shading (SFS) illumination reflection models, this paper proposes a Lambert–Phong hybrid model algorithm that recovers the 3D shapes of micro-droplets, using the mask regions with convolutional neural network features (Mask R-CNN) method to extract the highlight region of the droplet surface. The method combines the fast running speed of the Lambertian model with the high accuracy of the Phong model in reconstructing the highlight region. First, the Mask R-CNN network is used to segment the highlight region of the droplet and obtain its coordinate information. Then, different reflection models are constructed for the different reflection regions of the droplet, and Taylor expansion and Newton iteration are applied to the reflection models to obtain the final height at every position. Finally, a three-dimensional reconstruction experimental platform is built to analyze the accuracy and speed of the algorithm on a synthesized hemisphere image and an actual droplet image. The experimental results show that the proposed algorithm based on Mask R-CNN achieved better precision and a shorter running time. Hence, this paper provides a new approach for real-time measurement of 3D droplet shape in the dispensing state.


Introduction
In microelectronic high-speed dispensing, detecting the 3D shape of droplets online is a necessary precondition for studying the micro-jetting effect and realizing adaptive control of the dispensing process [1–4]. There is also a large demand for online 3D droplet shape detection in many other areas. For example, it is useful for the droplets formed by microfluidic chips in biological and biomedical applications [5–8], where 3D shape detection enables the microfluidic chip to control the volume of the droplets more precisely, thereby improving the accuracy of the entire system. However, due to the high viscosity and non-Newtonian behavior of most types of adhesives, tipping and tailing of the adhesive and unevenness of the substrate lead to irregular droplet shapes. The conventional stereo vision method and the structured light approach suffer from problems such as poor real-time performance, low precision, and difficulty in compatibility [9,10]. In this context, the use of three-dimensional vision to reconstruct the 3D shape of droplets emerges as a reliable method. The three-dimensional reconstruction method based on monocular vision can be used to derive depth information. Considering the slow calculation speed of the conventional SFS hybrid reflection method, this paper proposes a Lambert-Phong hybrid reflection model that combines the fast solution speed of the Lambertian method with the high accuracy of the Phong method in the highlight region. First, the algorithm denoises the input droplet image. Then, the Mask R-CNN deep learning neural network is used to segment the highlight area, so as to obtain the highlight area coordinates for the next step. Next, the optimization algorithm is linearized separately for the highlight region and the non-highlight region at the pixel level. Then, the two parts are combined to realize the 3D shape reconstruction of the droplet. Finally, the precision and running speed of the algorithm on the composite image and the real image are analyzed. The algorithm flow is shown in Figure 1.
Micromachines 2018, 9, x FOR PEER REVIEW

Highlight Region Segmentation Based on Mask R-CNN
Mask R-CNN is a conceptually simple and flexible method for object instance segmentation that shares its first stage with Faster R-CNN: a region proposal network (RPN) is used for region of interest (ROI) extraction [30]. While predicting the box offset and class for each ROI, it also outputs a binary mask. Unlike Faster R-CNN, this method does not require a compression operation. Note that a fully convolutional network (FCN) can be applied to each ROI to predict a segmentation mask, since the mask directly represents the correspondence between pixels by convolution. Figure 2 shows the specific flow of this method:


• Input of the normalized image into the main network. To facilitate the generation of the mask, fixed 512 × 512 images that have undergone median filtering and normalization are input into the network [31].

• Feature extraction and generation of regions of interest. The image is sent to the main network to extract features, and then the region proposal network is used to find the regions of interest. Subsequently, a layer called ROIAlign is adopted that accurately aligns the extracted features with the input to improve the accuracy of the object mask.

• Proposing the box offset, the class, and the mask. An n × n sliding window is used to generate a one-dimensional fully connected feature in the fifth convolutional layer of the network. Ultimately, three branches are generated [32], which contain the information to predict: reg-layer, cls-layer, and object mask. The first two branches are used for bounding-box regression and classification in parallel. The third branch is used to output the binary mask of the highlight feature, called "Star".
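The preprocessing mentioned in the first step (median filtering followed by normalization) can be sketched as follows. This is a minimal illustration using plain Python lists to stay dependency-free; the 3 × 3 window, the min-max normalization to [0, 1], and the function names `median_filter3` and `normalize` are assumptions, since the text does not specify them. Resizing to 512 × 512 is not shown.

```python
from statistics import median

def median_filter3(img):
    """Apply a 3 x 3 median filter to a 2D list of grey levels.

    Border pixels use the part of the window that lies inside the image
    (an assumption; other border policies are equally plausible).
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            window = [img[a][b]
                      for a in range(max(i - 1, 0), min(i + 2, h))
                      for b in range(max(j - 1, 0), min(j + 2, w))]
            out[i][j] = median(window)
    return out

def normalize(img):
    """Min-max normalize a 2D list of grey levels to the range [0, 1]."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1  # avoid division by zero on a constant image
    return [[(v - lo) / span for v in row] for row in img]
```

A single bright outlier surrounded by uniform pixels is removed by the median step, which is the usual reason for filtering before segmentation.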
Once Mask R-CNN training is complete, the test image is input into the network, which outputs the highlight position information. The diffuse reflection area and the highlight area of the droplet under test can then be distinguished. The red area in Figure 3 is the highlight area identified in the experiment.
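How the highlight position information might be passed on to the reconstruction step can be sketched as below. The sketch assumes the network's mask branch has already been thresholded to a binary 2D mask; the function name `highlight_coordinates` and the small example mask are hypothetical and only stand in for real network output.

```python
def highlight_coordinates(mask):
    """Return the (row, col) pairs of all pixels flagged as highlight."""
    return [(i, j)
            for i, row in enumerate(mask)
            for j, value in enumerate(row)
            if value]

# Hypothetical 4 x 4 binary mask standing in for the Mask R-CNN output.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
coords = highlight_coordinates(mask)
```

Pixels in `coords` would be treated as the highlight region, and all remaining droplet pixels as the diffuse reflection region.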

3D Shape Recovery Based on Combined Optimization Model
In the case of an ideal diffuse reflector, only the diffuse reflection component is considered. The reflection model is as follows [33]:
where (x, y) is the position of the corresponding pixel, (p, q) is the gradient information of the image pixels, R(p, q) is the Lambertian model reflection function, E_l(x, y) is the luminance information of the image after normalization, and ρ_l is the surface reflection coefficient of the diffuse reflection component.
In fact, most object surface reflections can be considered a linear combination of diffuse and specular components, since both exist on the surface of the object. This relationship is described by the Phong model:
where ρ_l and ρ_s are the surface reflection coefficients of the diffuse and specular components, respectively, w ∈ [0, 1] is the smoothing factor, n_s is the specular reflection factor, (p_m, q_m, −1) is the direction vector of the light source, and (p_k, q_k, −1) is the source direction vector, that is, the direction vector on the intersection of the light source and the camera.
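For reference, the two reflection models are commonly written in gradient space as sketched below. The forms are chosen to be consistent with the symbol definitions above, but they are an assumption rather than the authors' confirmed notation; in particular, the way the weight w is split between the diffuse and specular terms, and the use of (p_k, q_k, −1) inside the specular term, are assumed.

```latex
% Assumed standard forms, consistent with the symbol definitions in the text.
% Lambertian model: surface normal (p, q, -1), light direction (p_m, q_m, -1).
E_l(x, y) = R(p, q)
  = \rho_l \, \frac{p\,p_m + q\,q_m + 1}
                   {\sqrt{p^2 + q^2 + 1}\,\sqrt{p_m^2 + q_m^2 + 1}}

% Phong model: diffuse term plus a specular lobe with exponent n_s.
% The (1 - w) / w weighting is an assumption.
E(x, y) = (1 - w)\,\rho_l \, \frac{p\,p_m + q\,q_m + 1}
                                  {\sqrt{p^2 + q^2 + 1}\,\sqrt{p_m^2 + q_m^2 + 1}}
        + w\,\rho_s \left( \frac{p\,p_k + q\,q_k + 1}
                                {\sqrt{p^2 + q^2 + 1}\,\sqrt{p_k^2 + q_k^2 + 1}} \right)^{n_s}
```

Under this reading, setting w = 0 removes the specular lobe and leaves the Lambertian term.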
It is obvious that this equation adds a specular component to the Lambertian reflection model. We then combine the two models and introduce k_h as a highlight factor to obtain an optimized model:
where k_h = 1 corresponds to the highlight region, in which the Phong model is used, and k_h = 0 corresponds to the diffuse reflection region, in which the Lambertian reflection model is used. Therefore, Equation (3) is used to calculate and identify the different regions of the Lambert-Phong hybrid model. In the diffuse reflection region, the Lambertian model is solved by the linearization method:
Then, Taylor expansion is applied:
Therefore, the following equation can be obtained:
Given an initial value, iterative calculation can then be performed to obtain the final iteration result z_l = z^n_{i,j}, which is the height of each point in the image. Similarly, the Phong hybrid model is used to solve the highlight region. Because E(x, y) = R(p, q), E_x = R_x, and E_y = R_y, the original equation is corrected by introducing the image gradient weighting coefficient:
Then, the following equation is obtained:
A new expression of the objective function can be obtained from the central-difference discretization of the target equation.
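The linearize-then-iterate idea for the Lambertian region can be illustrated at a single pixel. The sketch below is not the authors' implementation: it assumes backward differences with unit pixel spacing (p = z − z_left, q = z − z_up), a normalized image with ρ_l = 1, and a small damping term `eps` to guard against a vanishing derivative. The function names are hypothetical.

```python
import math

def lambert_reflectance(p, q, pm, qm):
    """Lambertian reflectance for gradient (p, q) and light (pm, qm, -1)."""
    num = p * pm + q * qm + 1.0
    den = math.sqrt(p * p + q * q + 1.0) * math.sqrt(pm * pm + qm * qm + 1.0)
    return max(num / den, 0.0)

def newton_height(E, z_left, z_up, pm, qm, z0=0.0, iters=20, eps=1e-9):
    """Solve f(z) = E - R(p, q) = 0 for one pixel height by Newton iteration.

    Assumes p = z - z_left and q = z - z_up, so dp/dz = dq/dz = 1.
    """
    z = z0
    b = math.sqrt(pm * pm + qm * qm + 1.0)
    for _ in range(iters):
        p, q = z - z_left, z - z_up
        a = math.sqrt(p * p + q * q + 1.0)
        num = p * pm + q * qm + 1.0
        f = E - lambert_reflectance(p, q, pm, qm)
        # df/dz = -(dR/dp + dR/dq), from differentiating R(p, q).
        df = -((pm + qm) / (a * b) - (p + q) * num / (a ** 3 * b))
        # First-order Taylor (Newton) step; eps damps a near-zero derivative.
        z -= f * df / (df * df + eps)
    return z

# Brightness produced by a known height, then recovered from z0 = 0.
E_target = lambert_reflectance(0.3, 0.3, 0.5, 0.5)
z_rec = newton_height(E_target, z_left=0.0, z_up=0.0, pm=0.5, qm=0.5)
```

In a full reconstruction the same update would be swept over all pixels of the diffuse region, with neighbor heights taken from the previous iteration.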
Once this equation is obtained, the Newton iteration method is used to solve for the height z; the structure of the equation is similar to that of the Lambertian linearization method.
Finally, the following equation is established:
where β_{i,j} is the harmonic coefficient of the pixel, which is generally set as 1/13. On completion of the Newton iteration, the convergence value z_p = z^{n+1}_{i,j} is obtained, which is the final height of the 3D contour solved with the Phong hybrid reflection model. Finally, the height value solved by the optimization algorithm of this paper is obtained by combining the height values obtained by the two models:
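One plausible reading of the combination step is a per-pixel selection driven by the highlight factor, z = k_h · z_p + (1 − k_h) · z_l with k_h ∈ {0, 1}. This form is an assumption inferred from the definition of k_h; the function name `combine_heights` is hypothetical.

```python
def combine_heights(z_lambert, z_phong, highlight_mask):
    """Pick the Phong height where k_h = 1 and the Lambertian height elsewhere."""
    return [
        [zp if k else zl for zl, zp, k in zip(row_l, row_p, row_k)]
        for row_l, row_p, row_k in zip(z_lambert, z_phong, highlight_mask)
    ]

# Tiny illustrative inputs (not measured data).
z_l = [[1.0, 2.0], [3.0, 4.0]]
z_p = [[10.0, 20.0], [30.0, 40.0]]
k_h = [[0, 1], [0, 0]]
z = combine_heights(z_l, z_p, k_h)
```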

Three-Dimensional Shape Recovery Based on Combined Optimization Model
As shown in Figure 4, the droplet image acquisition experimental platform designed in this paper was mainly composed of a front camera, a coaxial light source, a turntable, a side camera for precision calibration, a power supply, a computer, and various connecting frames.
To facilitate the experiment, the droplet was placed at the center of the turntable, and the positions of the light source and the front camera were adjusted to make them perpendicular to the turntable. Meanwhile, the position of the side camera was adjusted to make it parallel to the turntable. The side camera was used to collect the side-view image of the droplet, which was taken as the theoretical height data for the side view of the droplet. These theoretical height data were compared with the height data reconstructed by the algorithm and served as the standard for evaluating the accuracy of the final experimental data.

Precision Analysis of Synthetic Image
It is difficult to measure the true 3D shape of a real droplet. Therefore, a composite image was tested first. In this section, a composite hemisphere is used for 3D reconstruction. The equation for the hemisphere is as follows:
where (x_o, y_o, −1) is the position coordinate of the center of the ball, z is the height of the corresponding position (x, y), and r is the radius. Here, we assign r = 50 pix. The composite hemisphere is shown in Figure 5.
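A ground-truth height map for such a hemisphere can be generated as sketched below, assuming the usual form z = sqrt(r² − (x − x_o)² − (y − y_o)²) inside the base circle and z = 0 outside. The image size of 129 × 129 pixels and the centered placement are assumptions made for illustration; only r = 50 pix comes from the text.

```python
import math

def hemisphere_height(size=129, r=50.0, center=None):
    """Height map of a hemisphere of radius r (pixels) resting on a plane."""
    if center is None:
        c = (size - 1) / 2.0  # assumed: hemisphere centered in the image
        center = (c, c)
    x0, y0 = center
    heights = []
    for y in range(size):
        row = []
        for x in range(size):
            d2 = (x - x0) ** 2 + (y - y0) ** 2
            # Height is zero wherever the pixel lies outside the base circle.
            row.append(math.sqrt(max(r * r - d2, 0.0)))
        heights.append(row)
    return heights

z_true = hemisphere_height()
```

The middle row of `z_true` gives the largest cross section, which is the profile against which a reconstruction would be compared.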
Figure 6a is a grayscale image corresponding to 14.3% high gloss. The parameters were as follows: n_s = 10, w = 0.5, k_h = 1, ρ_l = 0.857, ρ_s = 0.143. The direction of the light source (p_m, q_m, −1) was (0, 0, −1).
As shown in Figure 6b, the Lambertian reflection model was sensitive to highlight regions; even in the composite image there was a distorted area. However, its reconstruction was better in the diffuse reflection region. Figure 6c illustrates that the linearized Phong model could deal effectively with the highlight regions, but it consumed too much time. Therefore, following the combined optimization of this paper, Mask R-CNN was used to segment the highlight region of the composite image, as shown in Figure 6d. The coordinate information obtained from the segmentation can be used by the 3D shape reconstruction algorithm to improve the accuracy of reconstruction.
We solved the linearized Phong model for the highlight region and the linearized Lambertian model for the non-highlight region. The results of the three-dimensional reconstruction are shown in Figure 6e,f. We combined these two results to obtain the final result of the algorithm. Figure 6g shows the best solution, as there was no distortion in either the highlight or the non-highlight region, and the calculation was relatively fast. The average relative error of height and the root mean square error of height over the maximum cross section of the composite image were then computed. The formulas are as follows:
where ARE is the height average relative error, RMSE is the height root mean square error, n is the total number of processed images, m is the total number of pixels of the largest cross section of the composite sphere, Z_a is the height value after the reconstruction, and z is the actual height value.
The running time and accuracy are shown in Table 1. Table 1 indicates that the Lambert-Phong model had the smallest ARE value among the three compared models, and a slightly larger RMSE value than the Phong model. The CPU time was only 0.73761 s, which is much shorter than that of the Phong model. Thus, we conclude that the proposed Lambert-Phong model inherits the high efficiency of the Lambertian model and the high accuracy of the Phong model.
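The two error measures can be sketched under their usual definitions: ARE as the mean of |Z_a − z| / z and RMSE as the square root of the mean of (Z_a − z)², both taken over the m cross-section pixels of the n images. These definitions are assumed from the symbol descriptions above. Skipping pixels whose actual height is zero in the relative error is a further assumption, and `height_errors` is a hypothetical name.

```python
import math

def height_errors(reconstructed, actual):
    """Return (ARE, RMSE) over n cross-section profiles of m heights each.

    Pixels with zero actual height are excluded from ARE, since the
    relative error is undefined there (an assumption of this sketch).
    """
    rel_sum, sq_sum = 0.0, 0.0
    rel_count, count = 0, 0
    for prof_rec, prof_act in zip(reconstructed, actual):
        for za, z in zip(prof_rec, prof_act):
            sq_sum += (za - z) ** 2
            count += 1
            if z != 0:
                rel_sum += abs(za - z) / abs(z)
                rel_count += 1
    are = rel_sum / rel_count if rel_count else float("nan")
    rmse = math.sqrt(sq_sum / count)
    return are, rmse

# Illustrative values only, not data from the paper.
are, rmse = height_errors([[11.0, 18.0]], [[10.0, 20.0]])
```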
Figure 6. (b) 3D shape of the composite image and the side cross section contrast diagram using the Lambertian model algorithm (dark blue is the cross section of reconstruction, light blue is the cross section of the real shape); (c) 3D shape of the composite image and the side cross section contrast diagram using the Phong hybrid model algorithm; (d) highlight detection results of the composite image; (e) 3D shape of the composite image at the highlight region using the Phong hybrid model algorithm; (f) 3D shape of the composite image at the non-highlight region using the Lambertian model algorithm; (g) 3D shape of the composite image and the side cross section contrast diagram using the Lambert-Phong model algorithm.
From the above experimental data, there was a large error when the diffuse reflection model was used in the highlight region, because it ignores the specular reflection components of that region, whereas the Lambert-Phong optimization model developed in this study solves the distortion and time problems of the two algorithms, and its reconstruction was better than that of either single algorithm.

Precision Analysis of Real Image
In this section, the above algorithm is applied to the 3D shape reconstruction of real droplets, and the experimental results are compared with those of the other algorithms. Since the 3D shape of real tiny droplets is difficult to measure accurately, this paper evaluates the accuracy against the maximum external contour of the side image.
The 3D shape in Figure 7b demonstrates that the diffuse reflection model solved by linearization could not accurately describe the specular component of the object surface. Therefore, a large distortion occurred in the highlight region, which greatly affected the accuracy of reconstruction. Comparison of the results of the Phong hybrid model shown in Figures 6c and 7c demonstrates that, although the reconstruction of both the highlight region and the non-highlight region was better in the composite image, distortion appeared at the droplet's boundary in the real image, which reduced the accuracy to a certain extent. The results of the combined Lambert-Phong optimization model proposed in this paper are shown in Figure 7d. The experimental result was more accurate, the reflection characteristics of the droplet surface were expressed more accurately, and the reconstruction error was reduced. Hence, it can be concluded that the Lambert-Phong model was applicable to the droplet in this case.
The data of the three algorithms are compared next, and the maximum cross-sectional height average relative error and height root mean square error are calculated. As shown in Figure 3a, the cross section of the real image was taken with a-b as the cross-sectional line. The cross section contrast diagrams of the three algorithms are shown in Figure 7b–d, and the errors are listed in Table 2. It can be seen from the error table that the accuracy of the algorithm proposed in this paper was higher than that of the diffuse reflection model and the Phong hybrid model, and therefore it is suitable for the three-dimensional reconstruction of industrial droplets. In the real image, the solution speed of this model was still faster than that of the other models, so the model is applicable to the droplets' experimental environment.

Conclusions and Prospect
In this paper, a novel approach based on Mask R-CNN and an improved Lambert-Phong model is proposed to reconstruct the 3D shape of micro-droplets with high accuracy and efficiency. First, Mask R-CNN is used to segment the highlight region and the diffuse reflectance region of the droplets. Then, the Lambertian and Phong models are used to reconstruct the 3D shapes of the diffuse reflectance region and the highlight region, respectively. Finally, the two results are combined to obtain the final 3D shape of the droplets. In the experiment, the reconstruction errors of the proposed Lambert-Phong model, 3.81% for the composite image and 8.06% for the actual droplet image, were both smaller than those of the algorithms based on a single model, which shows that the proposed algorithm had good experimental precision and running speed. This study provides an effective way of monitoring the volume and shape of droplets in microelectronic dispensing in real time. In the future, the 3D shape recovery of irregularly shaped droplets and a droplet jetting volume control method based on the proposed shape recovery algorithm will be explored in depth.
