Segmenting the Semi-Conductive Shielding Layer of Cable Slice Images Using the Convolutional Neural Network

As an important part of the aerial insulated cable, the semiconductive shielding layer is made of a typical polymer material and improves the transmission performance of the cable; its structural parameters directly affect cable quality. Image processing of the semiconductive layer therefore plays an essential role in measuring these parameters. However, images of the semiconductive layer are often disturbed by cutting marks, which seriously affect the measurements. In this paper, a novel method based on the convolutional neural network is proposed for image segmentation. A deep fully convolutional network with skip connections serves as the main framework. The inception structure and residual connection are employed to fuse features extracted from receptive fields of different sizes. Finally, an improved weighted loss function and a refinement algorithm are used for pixel classification. Experimental results show that the proposed algorithm achieves better performance than current algorithms.


Introduction
The semiconductive shielding layer is an important part of the aerial insulated cable; it is made of polymer material to balance the electric field distribution and avoid partial discharge. In trying to improve the reliability of this material when used in cables above 10 kV, more accurate and intelligent methods for structural parameter measurement are required during the polymer manufacturing process.
Traditionally, a series of cursory measuring points are selected by the naked eye and then measured manually with some tools such as a vernier caliper. Although optical instruments such as microscopes and projectors have been introduced in recent years, it is still hard to find the real weak points in insulation and shielding layers manually. Manual measuring methods are low in efficiency, poor in repeatability, and cumbersome in operation, while the results are usually affected by human factors.
In recent years, methods based on machine vision have been developed to solve the problems in manual measurement mentioned above. Cui [1] compared several edge detection operators used for contour extraction of cable slices and found that the binary morphology operator yields the best performance in edge feature detection. Fan [2] proposed a system to obtain the insulation contours by means of total variation denoising and the binary morphology operator. Feng [3] adopted the spindle transformation and multi-scale gradient to improve the precision of edge location. An improved Sobel-Zernike moment positioning method was proposed by Xia [4] to enhance the speed of the sub-pixel location method, while Bian [5] proposed an improved sub-pixel interpolation algorithm for cable thickness measurement.
Although there are existing works on cable structural parameter measurement, most of them focus only on the insulation layer and sheath layer of conventional, regular cables. Parameter measurement of the semiconductive shielding layer still relies mainly on manual measurement combined with a projector, which has shortcomings and cannot meet the increasing requirements of cable security.
The fast segmentation of the semiconductive shielding layer is a key step in the process of vision measurement, but it is hard to extract features by traditional image processing methods because the semiconductive layer image regions are often seriously disturbed by the cutting marks.
In our research, an FCN-based algorithm is presented to acquire the image region of the semiconductive shielding layer automatically. After analyzing the characteristics of aerial insulated cable images, the inception structure and residual connection are used to capture details in the network, and an improved weighted loss is proposed to locate the outer border of the semiconductive shielding layer so that pixels similar to the foreground can also be identified. Some regions are still mispredicted during segmentation, so a refinement step is finally proposed to remove the interference regions according to prior knowledge and to ensure a unique region for the semiconductive shielding layer.

Image Analysis of Aerial Insulated Cable Slices
The first step in measuring the structural parameters of an aerial insulated cable is sampling. A cable slice is then cut from the sample. Samples with a conductor (a) and cable slices without a conductor (b,c) are shown in Figure 1. The image of an aerial insulated cable slice (Figure 1b,c) consists of two parts, the insulation layer region and the semiconductive shielding layer region, as shown in Figure 1c.
As a whole, the structure of the insulated cable slice is circular, while the internal part is a sawtooth shape. In terms of spatial position, the semiconductive layer is located in the inner layer and closely adheres to the insulation layer.
Three typical aerial insulated cable slices, which were taken under different illumination, are shown in Figure 2a-c. In each image, the typical region is marked by a red rectangle for analysis, and the selected regions are enlarged in the lower right corner. As is seen in Figure 2a,c, the colors of the two layers are so similar that it is difficult to distinguish their actual boundary. As is seen in Figure 2b,c, the two layers fit very closely, and it is difficult to distinguish their boundary. According to the cutting process, the cutter will leave parallel stripes on the surface of the cable slice when a cable slice is made. As seen from Figure 2b,c, the enlarged image regions display the evident parallel stripes, which cover the whole image regions, and change the original texture structure of the slice images. The regular image segmentation methods include threshold segmentation, edge detection, etc. The Canny operator is a widely used edge detection algorithm with good performance. The results of the Canny operator for edge extraction in Figure 2a-c are shown in Figure 2d-f. It is simple to detect the inner and outer edges, but it is hard to obtain the boundary between the two layers completely, because the gray value of these parallel stripes is close to that of the semiconductive shielding layer.
At the same time, these images are illuminated differently. When the light is dim, as shown in Figure 2a, both the cutting marks and the semiconductive shielding layer are difficult to recognize. When the light is bright, both the cutting marks and the semiconductive shielding layer are clear. When the brightness is medium, as shown in Figure 2b, there are still many cutting marks, and the segmentation task cannot be completed, as shown in Figure 2e.
In summary, because the two layers have similar colors and the cutting marks interfere, it is difficult to locate the region of the semiconductive shielding layer. Therefore, an FCN-based method is proposed to solve these problems.

Convolution Block
A convolution block usually consists of convolution, batch normalization (BN) [16], and an activation function. The convolution filters slide over the image to extract features. BN [16] commonly follows the convolution to accelerate the convergence of the network and to help prevent overfitting. The activation function introduces non-linear decision boundaries to the network; the rectified linear unit (ReLU) is often employed in deep learning applications, as it alleviates the vanishing gradient problem and is considerably faster to compute than the alternatives.
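The ordering convolution, then BN, then ReLU can be illustrated with a small NumPy sketch. This is not the paper's TensorFlow implementation: the function names are hypothetical, the convolution is a naive stride-1 padded loop, and BN is simplified to per-channel normalization over a single feature map without learned scale and shift.

```python
import numpy as np

def conv2d_same(x, kernels):
    """Naive stride-1 padded convolution (cross-correlation).
    x: (H, W, C_in); kernels: (k, k, C_in, C_out) with odd k."""
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero padding keeps H and W
    h, w, _ = x.shape
    out = np.zeros((h, w, kernels.shape[-1]))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + k, j:j + k, :]           # (k, k, C_in) window
            out[i, j] = np.tensordot(patch, kernels, axes=3)
    return out

def batch_norm(x, eps=1e-5):
    """Simplified BN: zero mean, unit variance per channel (no learned parameters)."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def conv_block(x, kernels, use_relu=True):
    """Convolution -> BN -> optional ReLU."""
    y = batch_norm(conv2d_same(x, kernels))
    return relu(y) if use_relu else y

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 3))        # toy 8 x 8 input with 3 channels
kernels = rng.normal(size=(3, 3, 3, 4))   # four 3 x 3 filters
features = conv_block(image, kernels)     # spatial size preserved, 4 channels
```

The `use_relu` flag is included only so the same helper can express a block whose output skips the activation.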

Inception Structure
The inception structure [6] combines convolutions with kernel sizes of 1 × 1, 3 × 3, and 5 × 5 and a pooling operation in parallel, as shown in Figure 3. According to Szegedy et al. [6], it can increase the width of the network and improve its adaptability to scale. Szegedy et al. [7] also showed that the number of parameters can be reduced by applying two stacked 3 × 3 convolutions instead of one 5 × 5 convolution.
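The parameter saving from replacing one 5 × 5 convolution with two stacked 3 × 3 convolutions can be checked with simple arithmetic. The sketch below assumes equal input and output channel counts and ignores biases; the channel count of 64 is an arbitrary illustrative value.

```python
def conv_params(kernel_size, c_in, c_out):
    """Number of weights in a square convolution, ignoring biases."""
    return kernel_size * kernel_size * c_in * c_out

def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)

channels = 64                                          # illustrative value only
one_5x5 = conv_params(5, channels, channels)           # 25 * C * C weights
two_3x3 = 2 * conv_params(3, channels, channels)       # 18 * C * C weights
saving = 1 - two_3x3 / one_5x5                         # fraction of weights saved

# Both options cover the same 5 x 5 input window.
assert receptive_field([3, 3]) == receptive_field([5])
```

With these assumptions the stacked version uses 18/25 of the weights, a saving of 28%, while covering the same 5 × 5 receptive field.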

Residual Connection
In the residual connection algorithm [17], shortcuts are used as the identity mapping to propagate the gradients of the network. A residual unit that introduces shortcuts between convolutions is shown in Figure 4, and it adds an identity mapping that converts the original output to F(x) + x, where x denotes the input features.
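A residual unit can be sketched in a few lines. Here `residual_fn` is a hypothetical stand-in for the convolutions F inside the unit; the point is only that the shortcut adds the input x back to F(x).

```python
import numpy as np

def residual_unit(x, residual_fn):
    """Return F(x) + x, where residual_fn plays the role of F."""
    return residual_fn(x) + x

x = np.array([1.0, -2.0, 3.0])

# If F outputs zeros, the unit reduces to the identity mapping.
identity_out = residual_unit(x, lambda v: np.zeros_like(v))

# A small learned correction is added on top of the input.
corrected = residual_unit(x, lambda v: 0.1 * v)
```

Because the unit only has to learn the difference from the identity, the gradient can always flow through the shortcut term.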

U-Shaped Structure
The U-shaped structure [18] consists of a contracting path and an expansive path and is beneficial to detail extraction and to training on small datasets. The standard UNet [18] has five stages: it downsamples the images to 1/16 and then upsamples stage-by-stage for pixel-level prediction. Max-pooling layers are used to downsample the feature maps, while deconvolution is used to restore the resolution. Skip connections are used to fuse more low-level features at each stage.

Improved Network Architecture
The proposed network architecture is illustrated in Figure 5. It is designed on the basis of the U-shaped architecture [18], which consists of a contracting path and an expansive path. There are five stages in our architecture, and the information is passed from the contracting path to the expansive path at the corresponding stage through skip connections.
The contracting path gets the input images and outputs the feature maps of high-level semantics. Blocks with five stages and their feature map sizes are shown in Table A1. The first inception block extracts 32 feature maps from the input image, and then, another inception block follows. The resolution of the feature map is gradually reduced by using max-pooling stage-by-stage, while the channels of the feature map are doubled via the inception block following the max-pooling until it reaches 512.
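The feature map sizes along the contracting path follow from the description above. The sketch below assumes a 224 × 224 input (the size of the dataset images), 32 channels at the first stage, 2 × 2 max-pooling between stages, and channel doubling after each pooling; the function name is hypothetical and the values are derived from the text rather than copied from Table A1.

```python
def contracting_path_shapes(input_size=224, base_channels=32, stages=5):
    """Return (stage, height, width, channels) for each stage of the contracting path."""
    shapes = []
    size, channels = input_size, base_channels
    for stage in range(1, stages + 1):
        if stage > 1:
            size //= 2       # 2 x 2 max-pooling with stride 2 halves the resolution
            channels *= 2    # the inception block after pooling doubles the channels
        shapes.append((stage, size, size, channels))
    return shapes

shapes = contracting_path_shapes()
for stage, h, w, c in shapes:
    print(f"stage {stage}: {h} x {w} x {c}")
```

Under these assumptions the last stage has 14 × 14 feature maps, which is 1/16 of the input resolution, with 512 channels.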
A residual connection is added to fuse the features extracted by the two preceding blocks. The proposed inception block is shown in Figure 6. It consists of three parallel branches with different receptive fields, and these branches are followed by a 1 × 1 convolution block that extracts hybrid features. The first branch contains a single 1 × 1 convolution block, the second a single 3 × 3 convolution block, and the third two stacked 3 × 3 convolution blocks in place of a 5 × 5 convolution block. All of these convolutions use a stride of 1 × 1 with padding. In the first inception block of each stage, the convolution block contains convolution, BN, and ReLU, while the second inception block outputs without ReLU. Max-pooling uses 2 × 2 kernels with a stride of 2 × 2, so the feature maps are downsampled to 1/16 of the original image size after the five stages.
In the expansive path, the feature maps with high-level semantics and low resolution are upsampled until they are restored to the size of the original image. The restored features from the previous stage are concatenated with the features from the contracting path at the corresponding stage. These skip connections pass low-level information directly to higher levels and yield richer features. Features at different scales are captured by the repeated application of the inception block at each stage, and residual connections help the gradient propagate efficiently, so the network can learn to build a more precise output from this information. Starting with the deconvolution at Stage 4, the blocks and their feature map sizes are shown in Table A2. Each deconvolution doubles the resolution of the feature maps from the previous stage and halves the number of channels; hence, its spatial dimension is consistent with the feature maps transferred by the skip connection. After processing by an inception block, the information is fused, and the number of channels is halved. The information is further consolidated in the next inception block by the residual connection. Finally, the network fuses the information and outputs a pixel-level prediction.
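The channel and resolution bookkeeping of the expansive path can be traced in the same way. The sketch assumes a 14 × 14 × 512 feature map at the bottom of the network and skip features with the same channel count as the deconvolution output; the function name is hypothetical and the numbers are derived from the text, not copied from Table A2.

```python
def expansive_path_shapes(bottom=(14, 14, 512), stages=4):
    """Trace (height, width, channels) through deconvolution, concatenation, and inception."""
    h, w, c = bottom
    trace = []
    for stage in range(stages, 0, -1):
        h, w, c = h * 2, w * 2, c // 2   # deconvolution: double resolution, halve channels
        concat_channels = c * 2          # concatenate with skip features of c channels
        trace.append({
            "stage": stage,
            "deconv": (h, w, c),
            "concat": (h, w, concat_channels),
            "inception": (h, w, c),      # inception block fuses and halves the channels
        })
    return trace

trace = expansive_path_shapes()
for step in trace:
    print(step)
```

At Stage 4, for example, the deconvolution output is 28 × 28 × 256, the concatenation has 512 channels, and the inception block reduces it back to 256; at Stage 1 the resolution is back to 224 × 224.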

Improved Loss Function
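The binary cross-entropy (BCE) loss is commonly used in binary classification and is defined as follows [19]:

$$\mathrm{BCE}(p, y) = \begin{cases} -\log(p), & y = 1 \\ -\log(1 - p), & \text{otherwise} \end{cases}$$

where $y \in \{0, 1\}$ is the pixel label, with foreground ($y = 1$) and background ($y = 0$), and $p \in [0, 1]$ is the probability of the foreground estimated by the model, computed by a sigmoid function over the output feature map. For convenience, $p_t$ is defined as

$$p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases}$$

so that the BCE can be rewritten as $\mathrm{BCE}(p, y) = \mathrm{BCE}(p_t) = -\log(p_t)$.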
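As a small numerical illustration of the BCE written in terms of p_t, the following NumPy sketch evaluates the per-pixel loss for three toy pixels. The clipping constant `eps` is an implementation detail added here to avoid log(0) and is not part of the definition.

```python
import numpy as np

def p_t(p, y):
    """Probability assigned to the true class: p for foreground, 1 - p for background."""
    return np.where(y == 1, p, 1.0 - p)

def bce(p, y, eps=1e-7):
    """Per-pixel binary cross-entropy, -log(p_t)."""
    pt = np.clip(p_t(p, y), eps, 1.0)   # clip to avoid log(0)
    return -np.log(pt)

p = np.array([0.9, 0.2, 0.6])   # predicted foreground probabilities
y = np.array([1, 0, 1])         # ground-truth labels
loss = bce(p, y)                # equals -log([0.9, 0.8, 0.6])
```

The confident correct predictions (p_t of 0.9 and 0.8) incur a small loss, while the uncertain one (p_t of 0.6) is penalized more.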
In the task of semiconductive shielding layer segmentation, pixels in the image are divided into the foreground and background. As shown in Figure 7, the background pixels far outnumber the foreground pixels, which leads to an imbalance between the classes. Furthermore, a basic UNet model was established, and its segmentation results are shown in Figure 8. The red rectangles mark the regions in the original images and prediction images where the prediction is obviously wrong. As seen from Figure 8a,d, the model mistook the outer part of the insulation layer for the semiconductive shielding layer. As seen from Figure 8b,e, the boundaries of the two layers are difficult to separate properly. Figure 8c,f shows that the model misidentifies parts of the insulation layer near the boundary as the semiconductive shielding layer. The outer edge of the insulation layer is similar to the semiconductive shielding layer in shape and color, so the network is prone to predicting it as the foreground. These factors greatly interfere with the segmentation results. To solve the problem of class imbalance, a typical method is to introduce a weight factor into the loss function [18,19]. Ronneberger et al. [18] used a weight map forcing the model to pay more attention to the border, and Lin et al. [19] added a modulating factor making the model focus on hard examples.
As shown in Figure 9, the background outside the cable slice is the easiest to identify, while the pixels on the boundary between the two layers are the hardest. Inspired by existing methods, a weighted loss is proposed to force the network to pay more attention to hard examples. In this loss, x is the pixel position on the output feature map, and ω is the weight map introduced to address the problems mentioned above. The weight map is pre-computed for each training datum and contains three components: a class balance weight map $\omega_c$, a position weight map $\omega_p$, and a hard example penalty weight map $\omega_h$. The parameters $k_i \in \mathbb{N}$ (i = 1, 2, 3) adjust the proportion of the corresponding terms in ω.
$\omega_p$ and $\omega_h$ are computed from signed distances. $d_i$ and $d_o$ are the minimum distances from pixel position x to the inner contour and the outer contour of the semiconductive shielding layer, respectively. If the pixel is outside the contour, the distance is taken as negative; otherwise, it is positive. The position of pixel x can therefore be obtained from the sign and value of the distance, and $\omega_p$ is computed according to this position. $d_h$ is the minimum distance between position x and the outer contour of the insulation layer. $\sigma_i \in \mathbb{N}$ (i = 1, 2, 3) are constants for the calculation of the effective distance. As shown in Figure 10, d has different signs in different regions, and the weight map, also shown in Figure 10, is calculated according to its values.
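A minimal sketch of a weight-map-driven BCE is given below. Two points are assumptions made for illustration only: the three component maps are combined linearly with coefficients k_1, k_2, k_3, and the loss is the weight-normalized sum of the per-pixel BCE terms. The component maps themselves are taken as given inputs (here toy arrays), since their distance-based construction is not reproduced in this sketch; the function names are hypothetical.

```python
import numpy as np

def combine_weight_maps(w_c, w_p, w_h, k=(1, 5, 5)):
    """Assumed linear combination of class-balance, position, and hard-example maps."""
    return k[0] * w_c + k[1] * w_p + k[2] * w_h

def weighted_bce(p, y, w, eps=1e-7):
    """Weight-normalized BCE (the normalization is an assumption of this sketch)."""
    pt = np.clip(np.where(y == 1, p, 1.0 - p), eps, 1.0)
    return float(np.sum(w * -np.log(pt)) / np.sum(w))

y = np.array([[1, 0], [0, 0]])                 # toy labels
p = np.array([[0.6, 0.1], [0.4, 0.1]])         # toy predictions
w_c = np.ones((2, 2))                          # toy class-balance map
w_p = np.zeros((2, 2))                         # toy position map
w_h = np.array([[0.0, 0.0], [1.0, 0.0]])       # penalize one hard background pixel

uniform_loss = weighted_bce(p, y, np.ones((2, 2)))
w = combine_weight_maps(w_c, w_p, w_h)
hard_loss = weighted_bce(p, y, w)
```

In this toy case the hard background pixel (predicted 0.4) dominates the weighted loss, so `hard_loss` exceeds `uniform_loss`, which is the intended effect of emphasizing hard examples.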

Prediction Refinement
The edge of the insulation layer is very similar to that of the semiconductive shielding layer in terms of geometry and texture.
As shown in Figure 11, some isolated regions, marked with red rectangles, are predicted as the foreground because the network is prone to classifying pixels on the outer edge of the insulation layer as the semiconductive layer. However, there is only one continuous semiconductive layer in one slice image. Thus, an approach based on the morphological properties of the target region is designed to remove the noise areas. First, the network prediction is thresholded by a function $T(p_t)$ with an appropriate threshold value $t_o \in [0, 1]$. Then, all the connected domains are found, and the aspect ratio AR of each region is calculated from its minimum enclosing rectangle, where $R_h$ and $R_w$ represent the height and width of the rectangle, respectively. Finally, small noise regions are removed according to the area of each region, and the remaining regions are selected according to the aspect ratio with a threshold $t_{ar}$.
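The refinement steps described above (thresholding, connected regions, area filtering, aspect-ratio selection) can be sketched as follows. Several choices are assumptions for illustration: the aspect ratio is taken as the shorter side divided by the longer side of an axis-aligned bounding box (a simplification of the minimum enclosing rectangle), regions are 4-connected, the comparison `>=` is used for both thresholds, and `min_area` is a hypothetical parameter.

```python
from collections import deque
import numpy as np

def connected_regions(mask):
    """Return one list of (row, col) pixels per 4-connected foreground region."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0, c0] or seen[r0, c0]:
                continue
            queue, pixels = deque([(r0, c0)]), []
            seen[r0, c0] = True
            while queue:                      # breadth-first flood fill
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not seen[rr, cc]:
                        seen[rr, cc] = True
                        queue.append((rr, cc))
            regions.append(pixels)
    return regions

def refine(prob, t_o=0.5, t_ar=0.9, min_area=4):
    """Keep regions that are large enough and whose bounding box is nearly square."""
    mask = prob >= t_o                        # threshold the network prediction
    refined = np.zeros(mask.shape, dtype=bool)
    for pixels in connected_regions(mask):
        rows, cols = zip(*pixels)
        height = max(rows) - min(rows) + 1
        width = max(cols) - min(cols) + 1
        aspect = min(height, width) / max(height, width)   # assumed AR definition
        if len(pixels) >= min_area and aspect >= t_ar:
            refined[list(rows), list(cols)] = True
    return refined

prob = np.zeros((12, 12))
prob[2:10, 2:10] = 0.9     # square ring standing in for the shielding layer
prob[3:9, 3:9] = 0.0
prob[11, 0:6] = 0.8        # elongated false region (aspect ratio 1/6)
prob[0, 0] = 0.7           # single-pixel noise
refined = refine(prob)
```

In this toy example only the ring survives: the elongated strip fails the aspect-ratio test, and the isolated pixel fails the area test.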

Implementation
The proposed method is implemented with the TensorFlow framework, and the experimental environment included an Intel Core i5 3.4 GHz CPU, an NVIDIA GeForce GTX1060 GPU, and a 64 bit Windows operating system. The details of the experiment are given below:
• Dataset: A platform consisting of an industrial camera, a telecentric lens, and an auxiliary light source was set up to collect data, since it is difficult to collect the images with the semiconductive shielding layer by normal illumination. Two-hundred fifty-four images were collected from different aerial insulated cable sections under different lighting conditions, and then, the semiconductive shielding layers were manually labeled at the pixel level. The dataset was trimmed to a uniform size of 224 × 224 and was divided into a training set with 148 images, a validation set with 28 images, and a test set with 78 images. Cable slice images and their masks are shown in Figure 12. In the case of this task, the shift and rotation invariance, as well as the robustness to illumination variations were primarily considered. The content of this part is supplemented by two aspects.
• Evaluation metrics: Intersection over union (IoU), Dice coefficients, and pixel precision were used to evaluate the segmentation results. Let TP (true positive) be the number of pixels with the actual target predicted as the target, FP (false positive) be the number of pixels with the actual background predicted as the target, and FN (false negative) be the number of pixels with the actual target predicted as the background. Let KP be the number of pixels predicted as the target and KG be the number of pixels labeled as the target. The higher these metrics are, the better the model performs.
• Parameter configuration: The weight maps were pre-calculated with the parameters σ 1 = 9, σ 2 = 25, σ 3 = 4, k 1 = 1, and k 2 = k 3 = 5. During the training phase, the sigmoid function was used to give the probability that each pixel belongs to the foreground, since there were only two classes in this task: foreground and background. The weighted binary cross-entropy loss function was optimized by gradient descent with an initial learning rate of 0.001. The network was trained with a mini-batch size of four for more than 100 epochs, until the validation IoU no longer increased significantly. For the refinement step, we set t o = 0.5 and t ar = 0.9.
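The evaluation metrics listed above can be computed from the pixel counts TP, FP, FN, KP, and KG. The sketch below uses the standard definitions IoU = TP / (TP + FP + FN), Dice = 2·TP / (KP + KG), and precision = TP / KP; these formulas are assumed here as the conventional ones, and the function name is hypothetical.

```python
import numpy as np

def segmentation_metrics(pred, target):
    """Compute IoU, Dice, and precision for binary masks (standard definitions assumed)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))     # target pixels predicted as target
    fp = int(np.sum(pred & ~target))    # background pixels predicted as target
    fn = int(np.sum(~pred & target))    # target pixels predicted as background
    kp, kg = tp + fp, tp + fn           # pixels predicted / labeled as target
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    dice = 2 * tp / (kp + kg) if (kp + kg) else 0.0
    precision = tp / kp if kp else 0.0
    return {"iou": iou, "dice": dice, "precision": precision}

pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
metrics = segmentation_metrics(pred, target)   # TP = 2, FP = 1, FN = 1
```

For this toy pair of masks, TP = 2, FP = 1, and FN = 1, so the IoU is 0.5 and both the Dice coefficient and the precision are 2/3.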

Results and Discussion
The proposed network structure has five stages; the number of stages was varied to find the optimal number. As shown in Table 1, the performance of the network improved gradually as stages were added, but it hardly improved further beyond five stages. To validate the effectiveness of the improved structure, several submodels were built for comparison: S1, S2, and S3. S1 is the standard UNet. In S2, the inception structure is introduced as the main building block of the network, and in S3, the residual connection is added on the basis of S2. Table 2 summarizes the performance of these models, and the results indicate that the performance is enhanced by introducing these two structures. To validate the contribution of the prediction refinement step, the predictions output by the models above were each refined. There was an approximately 1% improvement for rough predictions, but little for the more robust models. Nevertheless, the step did help for some predictions: small regions that were incorrectly predicted were filtered out without disturbing the target boundary. We also tried fully connected CRFs [20] for refinement, but they made the results worse because of the cutting marks.
The segmentation results of our method and the standard UNet model are compared in Figure 13. The first row shows cable slices of different sizes under different illumination, the second row shows the results of the proposed method, and the last row shows the results predicted by the standard UNet. The proposed method still produces good segmentation results under severe interference from cutting marks, and it is robust to illumination variation.
Figure 13. Segmentation results. The first row is the input images. The second row is the prediction of our method. The last row is the results obtained by the standard UNet.
According to the analysis in the previous sections, it is difficult to accurately segment the semiconductive shielding layer region from these images by traditional edge detection methods. It can be seen from Figure 13 that, by comparison, the method based on the deep neural network adapts well to illumination changes and cutting mark interference, and the semiconductive shielding layer region is successfully segmented.
It can be seen from the fourth column of Figure 13 that a small part of the outer edge region of the insulation layer is retained in the segmentation results of the standard UNet model, but this part of the region is correctly predicted by the proposed method.
As seen from the last two columns in Figure 13, the standard UNet model incorrectly predicts the area of the ground insulation layer near the semiconductor shielding layer as the semiconductor shielding layer. If such a prediction is used for measurements, there will be a serious error. The proposed method obtained relatively correct segmentation results.
From the edge details of the segmentation results, such as the third column and the fifth column, the internal and external edges in the segmentation results of the comparison method are relatively rough, while the segmentation results of the proposed method are closer to the actual edge situation.
Therefore, although both methods successfully segment the semiconductive layer area, the proposed method performs better in detail.

Conclusions
In this paper, a semiconductive shielding layer segmentation method based on the convolutional neural network is proposed, and it is a typical application for polymer materials. The main novelties of this study are as follows. First, the inception structure is introduced to make the network more robust to scale. Second, the residual connection is employed to improve the U-shaped structure. Third, a weighted loss function is proposed especially for this task to force the network to pay more attention to the pixels that are difficult to classify. Finally, the prediction refinement step based on prior knowledge is proposed to refine the network prediction results. The experimental results demonstrate that the proposed method can deal with the task of semiconductive shielding layer segmentation. In the future, we intend to improve our method based on other architectures, such as DenseNet [21], richer features [22], and CRF-RNN [23].

Conflicts of Interest:
The authors do not have any competing interests to declare.
