High Precision Dimensional Measurement with Convolutional Neural Network and Bi-Directional Long Short-Term Memory (LSTM)

In modern industry, high-precision dimensional measurement plays a pivotal role in product inspection, and sub-pixel edge detection is its core algorithm. Traditional interpolation and moment methods have achieved some success, but they still have shortcomings. For example, their accuracy remains limited by the resolution of the image sensor, and their predictions can be affected by image noise. Motivated by the recent success of deep learning, we propose a sub-pixel edge detection method based on a convolutional neural network (CNN) and bi-directional long short-term memory (LSTM). First, a one-dimensional visual geometry group-16 (VGG-16) network is employed to extract edge features. Then, a transformation operation is developed to generate sequence information. Lastly, a bi-directional LSTM with fully connected layers is introduced to output edge positions. Experimental results on our steel plate dataset demonstrate that our method achieves higher accuracy and better anti-noise ability than traditional methods.


Introduction
A charge-coupled device (CCD) is an important piece of digital imaging equipment. With the rapid development of CCD sensors and computer hardware, high-precision dimensional measurement systems based on machine vision have been gradually adopted by industries such as automobile manufacturing, iron and steel manufacturing, and electronic manufacturing [1]. Figure 1 shows examples of industries that apply dimensional measurement to their products. In these dimensional measurement systems, edge detection is the core algorithm. Because of the high precision requirements and the cost of high-resolution CCD sensors, the accuracy of pixel-level edge detection is insufficient, and sub-pixel edge detection becomes an effective way to further improve the performance.
In this work, we take the steel plate production line of the Taiyuan iron and steel industry as our research subject. Figure 1b shows the actual scene of the production line. Sheared steel plates are conveyed along the rollers, and the task is to design a system that measures their length. Figure 2 shows a sketch of the designed dimensional measurement system. Steel plate images are collected by two industrial-grade array CCD sensors, and the length of the steel plate is calculated as the sum of the field interval in the middle and the lengths of the steel plate within the fields of the two CCD sensors. Therefore, the core issue is the high-precision measurement of edge positions. Here, we use the images from one camera to test sub-pixel edge detection methods.
The concept of sub-pixel edge detection was first introduced by Hueckel [2]. Sub-pixel edge detection includes three main methods: the fitting method, the interpolation method, and the moment method. The fitting methods acquire the edge location by fitting the grey values to a hypothetical edge model. The method proposed by Ye [3] adopts the Gaussian edge function obtained by convolving the ideal edge model. The interpolation methods [4][5][6] interpolate the grey values of the pixels to increase the available information and locate the sub-pixel edge positions. For the moment method, Ghosal first applied the Zernike orthogonal moment to edge detection, which only requires three masks to be calculated [7][8][9][10]. The method proposed by Xie [11] improves the Zernike orthogonal moment with the Roberts operator and Otsu's method. However, given the vast field-of-view needed for steel plates, those traditional methods cannot achieve satisfactory accuracy under the limited image resolution. Furthermore, some images captured by the CCD sensor contain noise such as scars and retro-reflective targets, which may affect the prediction results.
In recent years, the convolutional neural network (CNN) has achieved great success in the computer vision field [12][13][14][15][16][17][18][19], including feature extraction, classification, and regression. CNN-based models learn network parameters directly from data using backpropagation and have more hidden layers, which gives them a more powerful nonlinear fitting ability. In the early stage, CNNs mainly focused on classification tasks. LeNet was introduced by LeCun in 1998 [20] and was designed for the recognition of handwritten characters. After that, AlexNet [21], ZF-Net [22] and GoogLeNet [23] were proposed at the ImageNet large scale visual recognition competition (ILSVRC), all of which achieved strong results. VGG was proposed by the visual geometry group of Oxford University in 2015 [24,25]; it has more than 10 hidden layers, a smaller filter size, and a more robust feature extraction ability. However, image noise may still affect the prediction results. Long short-term memory (LSTM) [26], which performs well in analyzing sequence information, has been proposed to handle this problem.
LSTM is a widely used deep learning algorithm that aims to process and analyze sequence data. LSTM has been used in applications such as natural language processing and stock market prediction [27][28][29][30][31][32][33][34]. LSTM is developed from the recurrent neural network (RNN) [35][36][37]. The traditional RNN suffers from the long-term dependency problem: it cannot connect pieces of information when the gap between them grows. LSTM avoids the long-term dependency problem through gate structures that keep or drop information. Moreover, the bi-directional LSTM model fuses a forward LSTM and a backward LSTM to connect both past and future information [38][39][40]. In our case, the position of each edge point in an image is related to the positions of both the preceding and the following edge points. Therefore, bi-directional LSTM is a more appropriate option to further optimize the edge positions. For image noise, which may affect the extraction of edge points, bi-directional LSTM can learn edge information from adjacent unaffected edge points to rectify incorrect prediction results. Recent studies have combined the advantages of CNN and bi-directional LSTM for practical applications [41,42].
To further improve the accuracy of dimensional measurements under the limitation of image resolution, and inspired by the analysis above, we propose a novel sub-pixel edge detection model based on CNN and bi-directional LSTM, which has both high precision and anti-noise ability. Our model adopts a one-dimensional visual geometry group-16 (VGG-16) network to extract edge point features from the images. Then, a transformation module is developed to generate sequence information, followed by a bi-directional LSTM that equips the model with anti-noise ability. In the end, a fully connected layer is employed to output the final prediction results. Experiments on our steel plate dataset demonstrate that the proposed model outperforms traditional methods and achieves an overall mean absolute error (MAE) of only 0.112 at the low image resolution of 512 × 612 pixels.
The main contributions of this work are listed as follows:
1. We propose a sub-pixel edge detection method based on deep learning for high precision dimensional measurements.
2. We adopt CNN to extract features from images and introduce the anti-noise ability by adding bi-directional LSTM.
3. We offer a sub-pixel edge detection dataset of steel plate used in training and testing sub-pixel edge detection methods.
The remainder of the paper is organized as follows: Section 2 describes different components of the proposed sub-pixel edge detection system. Section 3 introduces the dataset, preprocessing methods, training protocol, and results. Section 4 is the discussion. Section 5 is the conclusion.

Methods
In this paper, we propose a sub-pixel edge detection method based on deep learning for high-precision dimensional measurements. Our work aims to predict accurate edge positions of steel plates under the limitation of image resolution. To obtain low-resolution input data and the corresponding sub-pixel ground truth, two preprocessing steps are required. First, we downsample each steel plate image by a factor of four and collect the pixel values of 90 horizontal lines, taken at equal intervals within the steel plate region, as the input data. Then, we measure the edge position of each line manually at the original resolution and divide the position by four to obtain the sub-pixel ground truth. Our method and the comparison methods are therefore all tested on the low-resolution images.
After the preprocessing steps, our proposed sub-pixel edge detection network is trained on the training data. The training procedure is based on the gradient descent algorithm, which updates the network parameters according to the gradients of the loss function to improve the performance. Then, the trained model with the best parameters is chosen to generate predictions on the test data. The pipeline of our sub-pixel edge detection system is illustrated in Figure 3.

Building Blocks
In this section, we introduce the three building blocks of the proposed model. We first apply a one-dimensional VGG-16 to extract edge features in parallel from the 90 collected lines of pixel values. Then, a transformation module is developed to generate sequence information. Lastly, a bi-directional LSTM is employed to introduce the anti-noise ability and make the prediction results more accurate. The architecture of the proposed network is shown in Figure 4.

VGG-16 as Feature Extractor
The purpose of applying VGG-16 in our model is to extract edge features from the input lines of pixel values. VGG nets demonstrated excellent feature extraction performance at the ImageNet large scale visual recognition competition in 2014. Increasing the depth of the convolution layers significantly improves feature extraction, and the use of small convolution kernels reduces the number of parameters. In this work, the inputs are 90 one-dimensional vectors. To extract edge features from them, we adopt a VGG-16 with a one-dimensional kernel in each convolution and max-pooling layer. The 90 input vectors are treated as a batch, so the one-dimensional VGG-16 extracts their edge features in parallel with the same weights. Figure 5 shows the details of the VGG-16 model used in this work.
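To illustrate what sharing the same weights across a batch of lines means, the following minimal sketch applies a single one-dimensional convolution, ReLU, and max-pooling block to every line of a toy batch. It is not the VGG-16 used in this work: the kernel size of 3, the zero padding, the hand-picked gradient kernel, and all function names are assumptions made only for illustration.

```python
def conv1d_same(line, kernel, bias=0.0):
    """1-D cross-correlation with zero padding, so the output keeps the input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(line) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel)) + bias
        for i in range(len(line))
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing remainder shorter than `size` is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def extract_features(batch, kernel):
    """Apply the same conv -> ReLU -> pool block (same kernel) to every line in the batch."""
    return [max_pool1d(relu(conv1d_same(line, kernel))) for line in batch]


# Two toy lines with a dark-to-bright step at different positions.
batch = [
    [0, 0, 0, 0, 10, 10, 10, 10],
    [0, 0, 10, 10, 10, 10, 10, 10],
]
edge_kernel = [-1.0, 0.0, 1.0]  # illustrative gradient kernel, not a learned weight
features = extract_features(batch, edge_kernel)
print(features)
```

Because the kernel is shared, the response to the step simply moves with the edge position from one line to the next, which is the property that lets one set of weights serve all 90 lines.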

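Transformation Module
The transformation module converts the features extracted by the one-dimensional VGG-16 into sequence information for the bi-directional LSTM. Let $V_i$ denote the feature of the $i$-th input line, with width $w$, height $h$, and $c$ channels, and let $N$ be the number of lines. The transformation consists of four steps:
1. Expand the dimension of each $V_i$; the new form of $V_i$ is $(1, w, h, c)$.
2. Concatenate all the $V_i$ along the first axis; the output is denoted as $V$, and its form is $(N, w, h, c)$.
3. Expand the dimension of $V$; the new form is $(1, N, w, h, c)$.
4. Compress the width, height, and channel into one dimension; the form of the final output is $(1, N, w \times h \times c)$.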
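The reshaping steps above can be sketched with plain nested lists. This is only an illustration of the shape bookkeeping under assumed toy sizes ($N = 3$, $w = h = c = 2$); the helper names `transform` and `shape` are hypothetical and do not come from the original implementation.

```python
def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def transform(features):
    """Turn N features, each nested as [w][h][c], into a (1, N, w*h*c) sequence."""
    # Giving every V_i a leading axis of size 1 and concatenating along that
    # axis is equivalent to stacking the N features: (N, w, h, c).
    stacked = list(features)
    # Flatten width, height, and channel of every feature into one dimension.
    flattened = [[x for row in v for col in row for x in col] for v in stacked]
    # Add the leading axis of size 1: (1, N, w*h*c).
    return [flattened]


# Toy features: N = 3 lines, each with w = 2, h = 2, c = 2.
N, w, h, c = 3, 2, 2, 2
features = [
    [[[i * 100 + a * 4 + b * 2 + ch for ch in range(c)] for b in range(h)] for a in range(w)]
    for i in range(N)
]
sequence = transform(features)
print(shape(features), "->", shape(sequence))
```

The result has one sequence with $N$ time steps, so each edge point becomes one step of the input to the recurrent layer.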

Bi-Directional LSTM
The bi-directional LSTM network is introduced to improve accuracy and add anti-noise ability. The LSTM network performs well in handling context information, and the bi-directional LSTM accesses not only past context information but also future context information. In this work, the 90 edge points in one image are logically connected, and the value of each point differs little from those of the preceding and following points. Therefore, bi-directional LSTM is more appropriate for our task than the vanilla LSTM. Figure 6 shows the details of the LSTM cell. The input of the LSTM is a time sequence, and for the input at any moment, there are three steps in an LSTM cell.
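The first step is to decide which context information to discard from the cell state of the previous moment, using a sigmoid layer called the forget gate layer. Its output $f_t$ decides which information to discard and is formulated as follows:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
where $W_f$ is the weight of the forget gate layer, $b_f$ is the bias, $\sigma$ is the sigmoid operation, $h_{t-1}$ is the output of the previous moment, and $x_t$ is the current input. Then, another sigmoid layer called the input gate layer is applied to decide which information should be retained. The product $i_t \cdot \tilde{C}_t$ decides which information to keep, and the expressions of $i_t$ and $\tilde{C}_t$ are as follows:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
where $W_i$ and $W_C$ are the weights of the input gate layer, and $b_i$ and $b_C$ are the biases. Lastly, the third step is to decide which information should be output, using a layer called the output gate layer. $h_t$ decides which information to output, and its expressions are as follows:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \cdot \tanh(C_t)$$
where $W_o$ and $b_o$ are the weight and bias of the output gate layer, and $C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t$ is the updated cell state.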
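The three steps of the LSTM cell, and the way a bi-directional layer combines a forward and a backward pass, can be sketched as follows. This is a minimal illustration of the standard LSTM formulation with tiny hand-set weights, not the trained network; the parameter dictionary layout and the helpers `make_params`, `run_direction`, and `bidirectional` are assumptions introduced for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Compute W @ v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step: forget gate, input gate with candidate state, output gate."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]
    i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]
    c_hat = [math.tanh(a) for a in affine(p["W_C"], p["b_C"], z)]
    o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]
    c_t = [ft * cp + it * ch for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]
    h_t = [ot * math.tanh(ct) for ot, ct in zip(o, c_t)]
    return h_t, c_t


def run_direction(xs, p, hidden_size):
    """Run the cell over a sequence, starting from zero states."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def bidirectional(xs, fwd, bwd, hidden_size):
    """Concatenate forward outputs with backward outputs aligned to the same positions."""
    forward = run_direction(xs, fwd, hidden_size)
    backward = run_direction(xs[::-1], bwd, hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def make_params(weight_row, bias):
    """Hypothetical helper: the same weight row and bias for every gate (hidden size 1)."""
    p = {name: [list(weight_row)] for name in ("W_f", "W_i", "W_C", "W_o")}
    p.update({name: [bias] for name in ("b_f", "b_i", "b_C", "b_o")})
    return p


# With all-zero parameters every gate equals 0.5 and the candidate state is 0,
# so the cell state is simply halved.
h_t, c_t = lstm_step([0.0], [0.0], [2.0], make_params([0.0, 0.0], 0.0))

# A toy sequence of three one-dimensional inputs.
outputs = bidirectional([[0.2], [0.4], [0.1]], make_params([0.5, -0.3], 0.1),
                        make_params([0.5, -0.3], 0.1), hidden_size=1)
```

Each position in `outputs` holds one value that has seen the preceding inputs and one that has seen the following inputs, which is what allows a neighbouring edge point on either side to influence the prediction.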

With these three steps, the proposed model obtains anti-noise ability and produces more accurate predictions. The detailed configurations of the network are shown in Table 1.

MSE as Loss Function
The loss function is a necessary component of a deep learning network: it calculates the deviation between the prediction and the ground truth, and the network parameters are optimized through backpropagation of this deviation. In this work, we adopt the mean square error (MSE) loss function at the end of the proposed network. The expression of MSE is:
$$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{X}_i - X_i \right)^2$$
where $\hat{X}_i$ is the predicted value, $X_i$ is the ground truth, and $m$ is the batch size. During the training phase, the proposed network is trained by the stochastic gradient descent (SGD) algorithm to minimize the MSE loss.

Dataset
We evaluate our sub-pixel edge detection system on steel plate images collected at the Taiyuan Iron and Steel industry. At the end of the production line, all the finished steel plates need to be measured by the system. The dataset is captured by an industrial-grade array CCD sensor on the stainless-steel cold rolling production line, with both the roller and the steel plate in the image. The CCD sensor model is Point Grey FL3-U3-120S3C-C, and the resolution is 2048 × 2448. All the images have corresponding ground truth; 241 images are used for training and 62 images for testing. Figure 7 shows samples of the dataset.

Preprocessing the Dataset
To verify the superiority of the proposed method with the limitation of image resolution, all the images in the dataset need to be preprocessed. It is unnecessary to calculate all edge positions for length measurement and getting a fixed number of edge positions at equal interval is sufficient. Therefore, the preprocessing includes two steps: downsampling and collecting one-dimensional horizontal vectors from each image.
First, we downsample the original images by four times to obtain low-resolution images with the size of 512 × 612. Then, we select the region of interest (ROI) that covers the steel plate and pick 90 one-dimensional horizontal vectors at equal interval as the input data. The resolution of each vector is 1 × 612.
Moreover, different from pixel-level edge detection, it is impossible to obtain absolute ground truth of sub-pixel edge position. Thus, the edge position of each selected vector is calculated manually on the original images and divided by four. In this way, the error of ground truth is within one-fourth of a pixel, and the accuracy can be guaranteed to the greatest extent. Finally, every 90 selected vectors and the corresponding ground truth from one image are considered as one set of input data to the proposed model.
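The preprocessing can be sketched as follows. This is a minimal illustration under stated assumptions: the downsampling is implemented as averaging over non-overlapping 4 × 4 blocks (the downsampling filter is not specified here), the ROI bounds `top` and `bottom` are hypothetical values, and all function names are introduced only for this example.

```python
def downsample(image, factor=4):
    """Average non-overlapping factor x factor blocks (an assumed downsampling choice)."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[r * factor + dr][c * factor + dc]
                for dr in range(factor) for dc in range(factor)) / factor ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def pick_rows(top, bottom, count=90):
    """Row indices of `count` horizontal lines at equal intervals inside the ROI."""
    step = (bottom - top) / (count - 1)
    return [round(top + k * step) for k in range(count)]


def to_subpixel(edge_full_res, factor=4):
    """Convert an edge position measured at full resolution to low-resolution coordinates."""
    return edge_full_res / factor


# Toy 8 x 16 image whose grey value equals the column index.
image = [[float(col) for col in range(16)] for _ in range(8)]
small = downsample(image)

rows = pick_rows(top=100, bottom=400)   # hypothetical ROI bounds in the 512 x 612 image
ground_truth = to_subpixel(1234)        # an edge at full-resolution column 1234
print(len(small), len(small[0]), len(rows), ground_truth)
```

An integer edge position at the original resolution maps to a multiple of 0.25 in the downsampled image, which is why the ground-truth error stays within one-fourth of a pixel.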

Training Protocol and Metrics
The proposed sub-pixel edge detection model is implemented on the Google TensorFlow deep learning platform with one NVIDIA GTX1080Ti GPU (11 GB RAM). During the training procedure, the learning rate starts at 0.0001 and decays by 5% every 241 iterations. The total number of iterations is 80,000.
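The learning rate schedule can be written as a short function. The sketch below assumes a staircase decay, in which the rate is multiplied by 0.95 once every 241 iterations rather than decayed continuously; the function name and this staircase interpretation are assumptions, not details confirmed by the training code.

```python
def learning_rate(iteration, base_lr=1e-4, decay=0.05, decay_every=241):
    """Staircase schedule: multiply by (1 - decay) after every `decay_every` iterations."""
    return base_lr * (1.0 - decay) ** (iteration // decay_every)


for it in (0, 240, 241, 482, 80000):
    print(it, learning_rate(it))
```

Since the training set contains 241 images, a decay interval of 241 iterations corresponds to one pass over the training images if each iteration processes one image.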
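The metrics used to evaluate the proposed sub-pixel edge detection model involve three criteria: mean absolute error (MAE), MSE, and root mean square error (RMSE). The formulas are as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| m(x_i) - x_i \right|$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( m(x_i) - x_i \right)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( m(x_i) - x_i \right)^2}$$
where $n$ is the total number of edge points, $m(x_i)$ is the prediction, and $x_i$ is the ground truth.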
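As a small worked example of the three metrics, the sketch below evaluates them on four made-up edge positions. The numbers are illustrative only and are not taken from the steel plate dataset.

```python
import math


def mae(pred, truth):
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def mse(pred, truth):
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def rmse(pred, truth):
    return math.sqrt(mse(pred, truth))


# Illustrative predictions and ground truth (in pixels of the low-resolution image).
pred = [100.25, 101.0, 99.5, 100.0]
truth = [100.0, 100.5, 100.0, 100.0]
print(mae(pred, truth), mse(pred, truth), rmse(pred, truth))
```

The errors here are 0.25, 0.5, -0.5, and 0, so MAE is 0.3125, MSE is 0.140625, and RMSE is 0.375. RMSE is larger than MAE because squaring gives more weight to the larger deviations.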

Experimental Results
To better evaluate the proposed sub-pixel edge detection model, traditional methods including the interpolation method and the moment method are adopted as baselines for comparison. For the interpolation method, we use the quadratic interpolation algorithm, while for the moment method, the Sobel-Zernike operator is employed. Table 2 shows the detailed results of the proposed model on our steel plate dataset. We evaluate the model on the four-times downsampled images, and Figure 8 shows examples of the prediction results. The first row shows the steel plate images; the second row shows the predictions and the corresponding ground truth, where the vertical coordinate represents the serial number of the edge points and the horizontal coordinate represents the position of the edge point.
In the proposed model, VGG-16 is applied as the feature extraction block. To demonstrate the suitability of VGG-16 for extracting edge point features, we also explore VGG-19, ResNet, and DenseNet for comparison. Table 3 shows the details of the comparison. As can be seen, using VGG-16 as the feature extraction block yields a lower prediction error than the other CNN-based models.

The Effect of the Fully Connected Layers in VGG-16
In this work, we adopt a one-dimensional VGG-16 with the standard structure as the feature extractor, which ends with two fully connected layers of 4096 output channels each. Unlike the convolution layer, the fully connected layer does not share weights. To evaluate the effect of the fully connected layers, we conducted comparative experiments with two additional models. In the first model, the two fully connected layers are replaced by one convolution layer with a kernel size of 1 × 38 and 4096 output channels, while in the second model the output channels of the two fully connected layers are changed to 2048. As shown in Table 4, our model has the best accuracy. There are two likely reasons: weight sharing reduces the nonlinear fitting ability of the convolution layer, and fewer channels hinder feature extraction.

The Importance of Bi-Directional LSTM
LSTM is an effective model for sequence information, and bi-directional LSTM can process the information at the present moment according to both past and future information. This characteristic enables bi-directional LSTM to further optimize the edge positions and provides the anti-noise ability. To better evaluate the effect of adding bi-directional LSTM to our model, we conducted additional experiments comparing it with a model using vanilla LSTM and a model without any type of LSTM. As shown in Table 5, the proposed model with bi-directional LSTM achieves the best accuracy, and the relative error of its predictions is 40% lower than that of the model without any type of LSTM. The results demonstrate that bi-directional LSTM is effective and performs better than vanilla LSTM. For the edge points that may be influenced by image noise, the proposed model can rectify the prediction results according to the context information. Figure 9 shows examples of prediction results on images with noise. The first column shows the images with noise such as scars and retro-reflective targets, and the second column shows the predictions from the proposed model and the interpolation method. In the areas with noise, the interpolation method incorrectly takes the noise point as the edge point, while the proposed model is not affected by the noise.
The comparison above indicates that the bi-directional LSTM can reduce the effect of image noise and is also helpful to improve the accuracy of predictions.

The Comparison to Other Methods
There are three main sub-pixel edge detection methods: the fitting method, the interpolation method, and the moment method. Among them, the interpolation method with the quadratic interpolation algorithm and the moment method with the Sobel-Zernike operator are widely used in sub-pixel edge detection. To evaluate the proposed dimensional measurement model, we apply those two methods for comparison. Figure 10 shows samples of the comparison between the quadratic interpolation method and our model, where the first column shows the steel plate images captured by the CCD sensor and the second column shows the ground truth and the predictions from the different methods. Results on our steel plate dataset (Table 6) demonstrate that the proposed model predicts edge positions more accurately than the other methods.
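For readers unfamiliar with the interpolation baseline, the following sketch shows one common form of quadratic interpolation for sub-pixel edge location: a parabola is fitted through the gradient magnitude at the strongest pixel and its two neighbours, and the vertex of the parabola gives the sub-pixel position. The exact variant used in the comparison is not specified here, so the central-difference gradient and the function name are assumptions.

```python
def subpixel_edge_quadratic(line):
    """Locate an edge on a 1-D grey-value line (length >= 5) with sub-pixel precision."""
    # Central-difference gradient magnitude; grad[j] belongs to pixel j + 1.
    grad = [abs(line[i + 1] - line[i - 1]) / 2.0 for i in range(1, len(line) - 1)]
    # Strongest interior gradient sample (first one in case of ties).
    k = max(range(1, len(grad) - 1), key=grad.__getitem__)
    g_l, g_c, g_r = grad[k - 1], grad[k], grad[k + 1]
    # Vertex of the parabola through the three samples around the peak.
    denom = g_l - 2.0 * g_c + g_r
    offset = 0.0 if denom == 0 else 0.5 * (g_l - g_r) / denom
    return (k + 1) + offset


# A blurred step whose true edge lies midway between pixels 3 and 4.
line = [0, 0, 0, 2, 8, 10, 10, 10]
position = subpixel_edge_quadratic(line)
print(position)
```

For this symmetric toy edge the estimate is 3.5. Because the method relies only on the local gradient peak of a single line, a scar or reflection with a stronger gradient than the true edge would be selected instead, which is consistent with the noise sensitivity discussed above.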

Conclusions
In this paper, we propose a novel network based on CNN and bi-directional LSTM to perform sub-pixel edge detection on steel plate images. The main contributions of this work include introducing a one-dimensional VGG-16 to extract edge features and applying bi-directional LSTM to handle sequence information. The purpose of our work is to locate edge positions on steel plate images more accurately under the limitation of resolution while providing anti-noise ability. The comparison with traditional sub-pixel methods indicates that our method achieves a significant improvement over existing methods.
Author Contributions: Y.W. conceived the idea; Y.W. designed the algorithm; Q.C. performed the experiments and analyzed the data; Y.W. and Q.C. wrote the paper; M.D. and J.L. revised the paper. All authors read and approved the submitted manuscript.