The Application of Deep Learning and Image Processing Technology in Laser Positioning

In this study, machine vision technology was used to precisely position the highest-energy point of the laser spot to facilitate the subsequent joining of product workpieces in a laser welding machine. The displacement stage could place workpieces into the superposition area and allow the parts to be joined. With deep learning and a convolutional neural network training program, the system could improve both the positioning accuracy and the working efficiency of the machine. A bi-analytic deep learning localization method was proposed in this study. A camera was used for real-time monitoring. The first step was to use a convolutional neural network to perform a large-scale preliminary search and locate the laser light spot region. The second step was to increase the optical magnification of the camera, re-image the spot area, and then use template matching to perform high-precision repositioning. The integrity of the target spot was determined from the aspect ratio of the search result area. For a complete laser spot, the centroid was calculated; for an incomplete laser spot, invariant moments were computed instead. Based on the result, the precise position of the highest energy of the laser spot could be obtained even from an incomplete laser spot image. The required displacement could be calculated by overlapping the highest-energy point of the laser spot with the center of the image.


Introduction
As science and technology advance rapidly, parts keep getting smaller; joining miniature parts requires higher precision than joining general parts, and the production time for each part is fixed in mass production. If the solder at the junction fails to absorb adequate energy because the laser radiation is offset, the weld may fail, and if the high-energy region of the laser irradiates other positions of the part, the part may be rejected outright. Therefore, it is necessary to search for and position the highest-energy region of the laser.
Many studies use different hand-crafted features to represent the visual content of images and seek an appropriate similarity measure that brings the similarity of low-level features closer to the similarity of high-level concepts. Deep convolutional learning often improves both feature extraction and the efficiency of similarity measurement [1]. A classifier was trained on features extracted from sample images by a neural network before matching [2], the position of the laser spot in the input image was found [3,4], the center spot was searched for, and the laser spot position was moved to overlap the image center of a CCD (Charge-coupled Device) camera to complete positioning. However, when images covering a large range were taken, incomplete spots often appeared. Precise positioning was still possible by re-imaging the spot, but this took a lot of time. The purpose of this study was therefore to combine machine vision with a deep learning algorithm to build a positioning system that could align the laser's center spot automatically and quickly.
The traditional template matching method has many defects. For example, every time the template is changed, rematching and weight updating are required, which wastes much time, and when the matched image contains an area that resembles the template image but is not the target area, false matching may occur. It must also be considered that different setups use different databases, imaging systems, and parameters. A high-resolution image provides precise and easy position detection, but in real-time systems it leads to high computational cost [5]. Hierarchical matching can be used to address the problem of recognition accuracy [6]. However, none of the existing classification schemes encompasses all possible features, owing to the vast variability in feature geometry and topology [7]. The purpose of this study was to use deep learning to enhance feature recognition and shorten the time needed to obtain a result, so as to remedy these defects.

Template Matching and Invariant Moments
Template matching is used to look for the part of an image that matches the template image. First, the sizes of the read-in template image and the matched image are calculated and stored. If the size of the matched image is not an exact multiple of the template size, the right and bottom of the image are padded with zeros. The image is then passed to an algorithm that determines the weight of the template image at each point in the matched image, and the maximum weight is located to give the position of the upper left corner of the template image in the matched image. Given the length and width of the template image, the region of the template image can then be indicated in the matched image [8][9][10][11].
The template matching method has numerous algorithms. This study used the Sum of Squared Differences (SSD), expressed as Equation (1) [11,12]:

$$D(i,j)=\sum_{s=1}^{M}\sum_{t=1}^{N}\left[S(i+s-1,\,j+t-1)-T(s,t)\right]^{2} \tag{1}$$

where D is the resulting region after matching is completed, S is the m × n matched image base, and T is the M × N image template, where 1 ≤ i ≤ m − M + 1, 1 ≤ j ≤ n − N + 1.
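The SSD search of Equation (1) can be sketched in a few lines. This is a minimal illustration in plain Python rather than the authors' implementation: the function name `ssd_match` and the toy arrays are hypothetical, indices are 0-based, and the best match is taken where D is smallest (with SSD as the score, a "maximum weight" would correspond to a similarity such as the negated SSD).

```python
def ssd_match(search, template):
    """Exhaustive SSD template matching.

    search   : m x n image S, given as a list of rows
    template : M x N template T, given as a list of rows
    Returns (i, j, D), where D[i][j] is the sum of squared differences
    with the template's upper-left corner at row i, column j (0-based)
    and (i, j) is the position with the smallest D.
    """
    m, n = len(search), len(search[0])
    M, N = len(template), len(template[0])
    D = []
    best = None  # (score, i, j) of the best position seen so far
    for i in range(m - M + 1):
        row = []
        for j in range(n - N + 1):
            d = 0
            for s in range(M):
                for t in range(N):
                    diff = search[i + s][j + t] - template[s][t]
                    d += diff * diff
            row.append(d)
            if best is None or d < best[0]:
                best = (d, i, j)
        D.append(row)
    return best[1], best[2], D


# Toy data (hypothetical): the template appears exactly once in S.
S = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 3, 9, 0],
    [0, 0, 0, 0, 0],
]
T = [
    [1, 2],
    [3, 9],
]
i, j, D = ssd_match(S, T)
print(i, j, D[i][j])  # prints: 1 2 0
```

Every candidate position is rescored whenever the template changes, which is the recomputation cost the text attributes to template matching.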
As the weight must be calculated again for the matched image each time the template is changed, template matching takes considerable time; however, its accuracy is much higher than that of the starting point search method, so this method was used in this study [13] (Figure 1).

In the continuous case, the image function is f(x, y). The (p + q)th-order geometric moment of the image (standard moment) is defined as:

$$m_{pq}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x^{p}y^{q}f(x,y)\,dx\,dy,\qquad p,q=0,1,2,\ldots \tag{2}$$

According to the 0th-order and first-order origin moments, the barycentric coordinates $(\bar{x},\bar{y})$ of the target area can be obtained:

$$\bar{x}=m_{10}/m_{00},\qquad \bar{y}=m_{01}/m_{00} \tag{3}$$

The (p + q)th-order central moment is defined as:

$$\mu_{pq}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x-\bar{x})^{p}(y-\bar{y})^{q}f(x,y)\,dx\,dy \tag{4}$$

As the central moment is constructed by taking the centroid of the target area as the origin, the moment is always computed from the positions of points in the target area relative to that centroid. It is therefore independent of the position of the target area and has translation invariance.
For discrete digital images, the integral is replaced by a summation:

$$m_{pq}=\sum_{y=1}^{N}\sum_{x=1}^{M}x^{p}y^{q}f(x,y),\qquad p,q=0,1,2,\ldots \tag{5}$$

where N and M are the height and width of the image, respectively; the central moment is discretized in the same way, $\mu_{pq}=\sum_{y=1}^{N}\sum_{x=1}^{M}(x-\bar{x})^{p}(y-\bar{y})^{q}f(x,y)$. In order to cancel the effect of scale change on the central moment, each central moment is normalized using the 0th central moment µ00 to obtain the normalized central moment:

$$\eta_{pq}=\frac{\mu_{pq}}{\mu_{00}^{\rho}},\qquad \rho=\frac{p+q}{2}+1$$

The 0th moment represents the mass (area) of the target area. If the scale of the target area changes, the 0th central moment changes with it, so normalizing by it gives the moment scale invariance.
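The discrete moments above translate directly into code. The sketch below is illustrative only: the function names and the 4 × 4 test images are hypothetical, x is taken as the 1-based column index and y as the 1-based row index, and plain Python lists stand in for image arrays.

```python
def raw_moment(img, p, q):
    """m_pq = sum over y, x of x^p * y^q * f(x, y), with 1-based pixel indices."""
    return sum(
        (x ** p) * (y ** q) * v
        for y, row in enumerate(img, start=1)
        for x, v in enumerate(row, start=1)
    )


def centroid(img):
    """Barycentric coordinates (x_bar, y_bar) = (m10 / m00, m01 / m00)."""
    m00 = raw_moment(img, 0, 0)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00


def central_moment(img, p, q):
    """mu_pq: the same sum, with coordinates taken relative to the centroid."""
    xb, yb = centroid(img)
    return sum(
        ((x - xb) ** p) * ((y - yb) ** q) * v
        for y, row in enumerate(img, start=1)
        for x, v in enumerate(row, start=1)
    )


def normalized_central_moment(img, p, q):
    """eta_pq = mu_pq / mu_00 ** ((p + q) / 2 + 1)."""
    mu00 = central_moment(img, 0, 0)
    return central_moment(img, p, q) / mu00 ** ((p + q) / 2 + 1)


# The same 2 x 2 blob at two different positions (hypothetical data).
spot = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
shifted = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(centroid(spot), centroid(shifted))
print(central_moment(spot, 2, 0), central_moment(shifted, 2, 0))
```

The two centroids differ, while the central moments of the two images agree, which is the translation invariance described in the text.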
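Seven invariant moments M1~M7 are constructed using the second- and third-order normalized central moments:

$$\begin{aligned}
M_{1}&=\eta_{20}+\eta_{02}\\
M_{2}&=(\eta_{20}-\eta_{02})^{2}+4\eta_{11}^{2}\\
M_{3}&=(\eta_{30}-3\eta_{12})^{2}+(3\eta_{21}-\eta_{03})^{2}\\
M_{4}&=(\eta_{30}+\eta_{12})^{2}+(\eta_{21}+\eta_{03})^{2}\\
M_{5}&=(\eta_{30}-3\eta_{12})(\eta_{30}+\eta_{12})\left[(\eta_{30}+\eta_{12})^{2}-3(\eta_{21}+\eta_{03})^{2}\right]\\
&\quad+(3\eta_{21}-\eta_{03})(\eta_{21}+\eta_{03})\left[3(\eta_{30}+\eta_{12})^{2}-(\eta_{21}+\eta_{03})^{2}\right]\\
M_{6}&=(\eta_{20}-\eta_{02})\left[(\eta_{30}+\eta_{12})^{2}-(\eta_{21}+\eta_{03})^{2}\right]+4\eta_{11}(\eta_{30}+\eta_{12})(\eta_{21}+\eta_{03})\\
M_{7}&=(3\eta_{21}-\eta_{03})(\eta_{30}+\eta_{12})\left[(\eta_{30}+\eta_{12})^{2}-3(\eta_{21}+\eta_{03})^{2}\right]\\
&\quad-(\eta_{30}-3\eta_{12})(\eta_{21}+\eta_{03})\left[3(\eta_{30}+\eta_{12})^{2}-(\eta_{21}+\eta_{03})^{2}\right]
\end{aligned}$$

The seven invariant moments compose a set of characteristic quantities with rotation, scaling, and translation invariances.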
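As a rough check of these invariances, the seven moments can be computed for a small shape and for rotated and shifted copies of it. The sketch below assumes the standard Hu formulation of M1~M7; the function name `hu_moments` and the L-shaped test pattern are hypothetical, and pixel indices are 0-based (central moments do not depend on the choice of origin).

```python
def hu_moments(img):
    """Seven invariant moments of a grayscale image given as a list of rows."""
    pts = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pts)
    xb = sum(x * v for x, _, v in pts) / m00
    yb = sum(y * v for _, y, v in pts) / m00

    def eta(p, q):
        # Normalized central moment; mu_00 equals m_00.
        mu = sum((x - xb) ** p * (y - yb) ** q * v for x, y, v in pts)
        return mu / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03          # recurring sums
    c, d = n30 - 3 * n12, 3 * n21 - n03  # recurring differences
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


# Hypothetical asymmetric test pattern.
shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
rotated = [list(r) for r in zip(*shape[::-1])]          # 90 degree rotation
moved = [[0] * 7] * 2 + [[0, 0] + row for row in shape]  # shifted down and right

for name, img in (("original", shape), ("rotated", rotated), ("moved", moved)):
    print(name, [round(m, 6) for m in hu_moments(img)])
```

The three printed rows should agree to within floating-point error. Scale invariance holds only approximately on a pixel grid, so it is not checked here.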

Convolutional Neural Network
The major difference between the Convolutional Neural Network (CNN) and the traditional multilayer perceptron network is the convolutional and pooling layers. These two layers enable the CNN to recognize image (or voice) details, whereas other neural networks only extract data for computation. The three layers of the CNN are briefly described below.

Convolutional Layer
The convolutional layer scans the picture from top to bottom with a fixed-size filter to obtain various local features as the input of the next layer. After processing by the ReLU function, values smaller than 0 are output as 0, while values larger than 0 are output unchanged. The result is the feature map. Each point in the feature map can be regarded as a feature of a region in the original pattern and is transmitted to the next layer. The convolutional layer of the CNN concentrates on obtaining these local features of the pattern [14,15].
All the convolution outputs are then transformed by a nonlinear activation function f, as in Equation (9) [14]:

$$x_{j}^{l}=f\left(\sum_{i\in M_{j}}x_{i}^{l-1}*W_{ij}^{l}+B_{j}^{l}\right) \tag{9}$$

The convolution operations between the same convolutional layer and the previous layer share the same weights, where l is the convolutional layer, W is the shared weight, M is the set of feature maps taken from the previous layer, j indexes one output feature map, B is the bias in the layer, and (*) represents the convolution operation.
If a traditional deep neural network (DNN) is used to recognize an image, the original two-dimensional picture must be flattened into a one-dimensional vector. Each pixel is regarded as a feature value and fed into the DNN architecture for analysis. Thus, the input pixels lose their original spatial arrangement. The purpose of the convolutional layer of the CNN is to maintain the spatial arrangement of the image and use local image patches as the input features [14].
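A single-channel version of the convolution-plus-ReLU step can be sketched as follows. This is a simplified illustration, not the network used in the study: the function name `conv2d_relu`, the kernels, and the 3 × 3 input are hypothetical, there is no padding, the stride is 1, and (as in most CNN libraries) the kernel is applied without flipping.

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Slide `kernel` over `image`, add `bias`, and apply ReLU to each sum."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            for s in range(kh):
                for t in range(kw):
                    # The same kernel weights are reused at every position.
                    acc += image[i + s][j + t] * kernel[s][t]
            row.append(max(0.0, acc))  # ReLU: negatives become 0
        feature_map.append(row)
    return feature_map


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(conv2d_relu(image, [[-1, 0], [0, 1]]))  # [[4.0, 4.0], [4.0, 4.0]]
print(conv2d_relu(image, [[1, 0], [0, -1]]))  # [[0.0, 0.0], [0.0, 0.0]]
```

Because each output value is computed from a small neighbourhood, the feature map keeps the two-dimensional layout that flattening would discard.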

Pooling Layer
The function of the pooling layer is to shrink the input picture, reducing the dimensionality of each feature map while maintaining important features. This has many advantages, such as reducing the parameters for subsequent layers, accelerating system running, and reducing over-fitting.
Similar to the convolutional layer, the pooling layer uses a filter to extract the values of various regions for operation; however, its output does not pass through an activation function (the activation function of the convolutional layer is ReLU).
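The pooling average value after each convolutional layer is calculated by Equation (11) [14]:

$$p_{ij}=\frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}x_{(i-1)M+m,\,(j-1)N+n} \tag{11}$$

where x is the input feature map, i and j are the positions of the output map, and M and N are the pooling sample sizes in two orthogonal dimensions.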
If the input image size has an odd number of pixels or is not an exact multiple of the filter size (for example, when the filter is not 2 × 2), it can be handled using peripheral zero fill or by abandoning edges.
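Average pooling with both edge-handling options can be sketched as below. The function name `average_pool` and its `pad` flag are hypothetical, the windows do not overlap, and zero fill is applied on the bottom and right edges only, which is an assumption about what "peripheral zero fill" means here.

```python
def average_pool(fmap, M=2, N=2, pad=True):
    """Average each non-overlapping M x N window of `fmap`.

    pad=True  : zero-fill the bottom/right edges up to a multiple of the window
    pad=False : drop the incomplete windows at the bottom/right edges
    """
    h, w = len(fmap), len(fmap[0])
    if pad:
        H = -(-h // M) * M  # round up to a multiple of M
        W = -(-w // N) * N  # round up to a multiple of N
        fmap = [row + [0] * (W - w) for row in fmap]
        fmap += [[0] * W for _ in range(H - h)]
    else:
        H, W = (h // M) * M, (w // N) * N
    return [
        [
            sum(fmap[i + m][j + n] for m in range(M) for n in range(N)) / (M * N)
            for j in range(0, W, N)
        ]
        for i in range(0, H, M)
    ]


fmap = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(average_pool(fmap))             # [[3.0, 2.25], [3.75, 2.25]]
print(average_pool(fmap, pad=False))  # [[3.0]]
```

Zero fill keeps every input value but biases the edge averages towards 0, whereas dropping edges discards the last row and column of this 3 × 3 input.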

Fully Connected Layer
The fully connected layer refers to a general neural network. After the convolutional layer extracts features and the pooling layer reduces the image parameters, the feature information is fed into the fully connected layer for classification. In a convolutional layer, each neuron is connected only to the pixels covered by the filter, and the connection weights are identical and shared within the layer. In the fully connected layer, however, each connection has an independent and dissimilar weight, so this layer consumes considerable computing resources [14].
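The cost difference between shared and independent weights can be made concrete by counting parameters. The layer sizes below are hypothetical and chosen only for illustration; they are not the dimensions of the network used in this study.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Shared weights: one kernel (plus one bias) per output feature map."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels


def fc_params(n_inputs, n_outputs):
    """Independent weights: one weight per connection, plus one bias per output."""
    return (n_inputs + 1) * n_outputs


# Hypothetical sizes: 3 x 3 kernels, 1 input channel, 8 feature maps of 32 x 32,
# followed by a 2-class fully connected layer over the flattened maps.
print(conv_params(3, 3, 1, 8))    # 80
print(fc_params(32 * 32 * 8, 2))  # 16386
```

Even with only two outputs, the fully connected layer in this toy configuration holds roughly 200 times as many parameters as the convolutional layer that feeds it.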

Experimental Results
When the CCD camera obtains an image, the light spot is located somewhere in the image, so the range of the light spot image center is searched preliminarily, as shown in Figure 2. After the region of the light spot is found by the program, the center point of the light spot region is displaced to overlap the CCD image center, and the magnification of the CCD is increased to coincide with the size of the light spot region. The light spot region is thus separated, and the first search is completed [16][17][18].
This system uses a convolutional neural network to search for the laser spot region. The neural network is trained using images of multiple complete light spots and incomplete light spot modes so that the program can judge the position of the light spot in the image accurately and rapidly.
When the light spot region has been positioned in the image successfully, the first stage uses the center of the light spot region as the image center. This center may deviate from the actual light spot center, and the system takes into account that the positioned laser spot region may not be a complete light spot, as shown in Figure 3. The distance between the CCD image center and the light spot center is measured, and the two centers are shifted until they overlap.
There are two processing modes, and whether the found light spot is complete is judged according to the aspect ratio of the light spot region. If it is a complete light spot, the centroid of the light spot is extracted using the centroid method, the distance between the CCD image center and the light spot center is calculated, and the image center is moved to overlap the light spot center. If it is not a complete light spot, the invariant moments are calculated for the incomplete light spot region in the image and matched with the preset light spot source image to find the position of the partial light spot region centroid within the light spot. The distance between the CCD image center and the light spot center is then calculated, and the image center is moved to overlap the light spot center.
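The branching between the two modes, and the offset that brings the image center onto the spot center, might be sketched as follows. The source does not state the aspect-ratio threshold, so the `tolerance` of 0.1 around a ratio of 1 is a hypothetical choice, as are the function names, the sign convention of the offset, and the example image size; the pixel size of 0.11 µm is the value reported with Table 2.

```python
PIXEL_SIZE_UM = 0.11  # 1 pixel = 0.11 µm, as reported with Table 2


def choose_mode(width, height, tolerance=0.1):
    """Pick the positioning mode from the aspect ratio of the detected region.

    Assumption: a complete spot is roughly as wide as it is tall, so a ratio
    within `tolerance` of 1 is treated as complete (threshold is hypothetical).
    """
    aspect_ratio = width / height
    if abs(aspect_ratio - 1.0) <= tolerance:
        return "centroid"
    return "invariant_moments"


def offset_to_center(image_size, spot_center):
    """Pixel offset (dx, dy) from the image center to the spot center."""
    width, height = image_size
    return (spot_center[0] - (width - 1) / 2,
            spot_center[1] - (height - 1) / 2)


def to_micrometres(offset_px, pixel_size_um=PIXEL_SIZE_UM):
    """Convert a pixel offset to a stage displacement in micrometres."""
    return tuple(component * pixel_size_um for component in offset_px)


print(choose_mode(1027, 1000))  # aspect ratio 1.027 -> centroid
print(choose_mode(60, 100))     # elongated region  -> invariant_moments
dx, dy = offset_to_center((641, 481), (420, 140))
print(dx, dy)                   # 100.0 -100.0
print(to_micrometres((dx, dy)))
```

Only regions that fail the aspect-ratio test are sent to the more expensive invariant-moment matching.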
The positioning results of this system with different numbers of training samples are described below. A total of 25, 50, and 100 positive samples were used to train the neural network.
Figure 4 shows the program results of using 25 positive samples and 50 negative samples to train the neural network. According to Figure 4a, the program could position the complete light spot successfully, but there was still misrecognition. The incomplete light spot in Figure 4b could not be recognized successfully. In order to improve the results, the sample number was increased and the program was trained again.
Figure 5 shows the program results of using 50 positive samples and 100 negative samples to train the neural network. Figure 5a shows that the misrecognition of complete light spots was reduced, but the program was unable to isolate the light spot region alone. Figure 5b shows that there were many misrecognitions, but the position of incomplete light spots could be recognized. In order to further increase the accuracy, the sample number was increased a second time and the network was trained again.

After the neural network was trained using 100 positive samples and 200 negative samples, the program could position the region of complete light spots successfully. The misrecognition probability was reduced; however, it was still unable to position incomplete light spots.
Table 1 shows the results of using three different sample numbers and changing the feature region size to train the neural network to recognize 100 target images. According to the data, the accuracy increased significantly with the sample number. The result candidates contained both correct results and misrecognitions, and the number of target images minus the number of result candidates gives the number of targets that the program failed to recognize.
According to Table 1, the system still had misrecognitions. This could be because the complete light spots and incomplete light spots were recognized using the same neural network, or because the light spots were complete but only partial regions were positioned. In addition, when the laser was shot at the platform, the brightness decreased outwards from the laser center spot, and the laser was partially scattered by particles in the air. Visual recognition was difficult because bright patches could appear in regions that should be black. If the pattern of these bright regions was close to the light spot samples, misrecognition could occur.
The region of the complete light spot was obtained successfully from the result of using 100 positive samples. According to the system flow, the CCD image center was moved to the center of the positioned region, and the region was set as the template image, as shown in Figure 6. The aspect ratio of the region was calculated; the aspect ratio of the template calculated by the program was 1.027. When the light spot was identified as complete, the invariant moments of the template image were calculated and the first moment was taken, i.e., the barycentric coordinates of the light spot region, which are indicated in the image, as shown in Figure 7. In the final step, the distance between the image center and the light spot center was calculated, and the CCD camera was moved. The image center and light spot center were overlapped, and the light spot center was aligned for the second time, as shown in Figure 8.

Figure 9 shows the gray level histogram of the laser spot. The positions of the light spot center and image center can be observed, and the distance between the two centers can be calculated from the coordinate relationship.

Table 2 shows the mean errors between the image center and light spot center when the system performed light spot positioning with different sample numbers, where 1 pixel = 0.11 µm. It was observed that as the number of positive samples increased and the feature was enhanced, the center of the first positioning came closer to the center of the light spot, and the time needed to shift to the light spot center could be shortened. Figure 10 is a recognized incomplete light spot map. First, it was recognized as being range independent, and the invariant moments were calculated using the image center as a reference point, as shown in Table 3.

When the invariant moments of the recognition region had been obtained, similarity matching was performed against the invariant moment matrix of all points in the complete light spot, obtained in advance, and the coordinate value of the maximum matching weight was exported, as shown in Figure 11. According to the above results, the position of the incomplete light spot image center within the complete light spot was obtained. Finally, once the distance between the image center and the complete light spot centroid was calculated and the CCD was moved accordingly, the center point search was completed (Figure 12).
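The lookup against the precomputed invariant moments can be sketched as a nearest-neighbour search. The source does not specify the similarity measure, so this sketch assumes that the "maximum matching weight" corresponds to the smallest Euclidean distance between moment vectors; the function name `best_match`, the two-component vectors, and all numerical values are hypothetical.

```python
import math


def best_match(query, database):
    """Return the (x, y) whose stored moment vector is closest to `query`.

    database : dict mapping (x, y) positions in the complete light spot to
               invariant-moment vectors computed in advance
    query    : invariant-moment vector of the incomplete spot region
    """
    return min(database, key=lambda xy: math.dist(query, database[xy]))


# Hypothetical precomputed entries (only two moments per point for brevity).
database = {
    (10, 12): [0.30, 0.020],
    (40, 44): [0.21, 0.004],
    (70, 30): [0.26, 0.011],
}
query = [0.22, 0.005]
print(best_match(query, database))  # (40, 44)
```

Because the database is built once from the complete spot, each new image costs one moment calculation and one lookup rather than a full rescan.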

Conclusions
In terms of the center spot search, matching with invariant moments was much faster than ordinary template matching. First, because invariant moments reduce the image to numerical descriptors, the information to be processed at one time was 72% less than for template matching, which operates on the images themselves. Second, the invariant moments of every point of the source image can be calculated in advance and saved in a database, so at run time only the invariant moments of the matched region need to be calculated and compared with the database for similarity. In contrast, template matching generates a weighting matrix during matching, and the weighting matrix must be created again whenever the template is changed, which takes considerable time.
The proposed system uses the aspect ratio to check whether the light spot is complete. The positioning method is selected according to whether the light spot image is complete; thus, the complex computation need not be applied to all images, and the load on the overall system is reduced.

Figure 2. Schematic diagram of laser spot positioning.

Figure 3. (a) Image of regional center that is not the light spot center; (b) incomplete laser spot image.

Figure 8. Schematic diagram of the second alignment.

Figure 9. Complete light spot gray level histogram.

Figure 10. Incomplete light spot recognition map.

Figure 12. Relative positions of incomplete light spot image center and light spot centroid.

Table 1. Experiment data of light spot positioning system.

Table 2. Mean error of light spot positioning with different sample numbers.

Table 3. Incomplete light spot invariant moments.