ACA-Net: An Adaptive Convolution and Anchor Network for Metallic Surface Defect Detection

Chen, Faquan; Deng, Miaolei; Gao, Hui; Yang, Xiaoya; Zhang, Dexian

doi:10.3390/app12168070


by Faquan Chen ^1,2, Miaolei Deng ^2,3, Hui Gao ^1,2, Xiaoya Yang ^2,3 and Dexian Zhang ^2,3,*

¹ School of Mechanical and Electrical Engineering, Henan University of Technology, Zhengzhou 450001, China

² Henan International Joint Laboratory of Grain Information Processing, Zhengzhou 450001, China

³ College of Information Science and Engineering, Henan University of Technology, Zhengzhou 450001, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(16), 8070; https://doi.org/10.3390/app12168070

Submission received: 13 June 2022 / Revised: 8 August 2022 / Accepted: 9 August 2022 / Published: 12 August 2022

(This article belongs to the Section Surface Sciences and Technology)


Abstract: Metallic surface defect detection is critical to ensuring the quality of industrial products. Recently, many advanced surface defect detection algorithms have been proposed. Most of these algorithms rely on convolutional neural networks (CNNs) and an anchoring scheme. However, a convolution unit only samples the input feature maps at fixed shapes and locations. Similarly, a set of anchors is uniformly predefined with fixed scales and shapes, which increases the difficulty of bounding box regression. Therefore, we propose an adaptive convolution and anchor network for metallic surface defect detection, named ACA-Net. Specifically, an adaptive convolution and anchor (ACA) module is proposed, which mainly consists of an adaptive convolution and an adaptive anchor. Firstly, an adaptive convolution module (ACM) is designed, which adaptively determines the location and shape of each convolution unit. In addition, a multi-scale feature adaptive fusion (MFAF) is proposed, which is used in ACM to extract and integrate multi-scale features. Then, an adaptive anchor module (AAM) is proposed to yield more suitable anchor boxes by adaptively adjusting their shapes. Extensive experiments on the NEU-DET and GC10 datasets validate the performance of the proposed approach. On the NEU-DET dataset, ACA-Net achieves an Average Precision (AP) 1.8% higher than that of GA-RetinaNet. Furthermore, the proposed ACA module is also adopted in GA-Faster R-CNN, improving the AP by 1.2% on the NEU-DET dataset.

Keywords: surface defect detection; convolutional neural network; feature fusion; anchor box; computer vision

1. Introduction

Due to many factors in industrial production, such as the production process and equipment limitations, defects appear on the surface of some metallic products. Metallic surface defects have a strongly adverse effect on the quality of industrial products. Thus, surface defect detection technologies are crucial for ensuring the quality of industrial production. At present, surface defect detection for metallic products still relies mainly on manual visual inspection during production. However, manual visual inspection is unreliable and time-consuming. Therefore, to meet growing demand, the desired way to guarantee product quality is to detect surface defects automatically with computer vision technologies.

A surface defect detection system mainly includes an image acquisition module, an image processing module, an image analysis module, a data management module, and a man-machine interface module. This paper focuses only on the surface defect detection algorithm. The fundamental task of computer vision can be defined as the process of discovering what is present in an image and where it is [1]. Similarly, surface defect detection can be categorized into image-level defect classification, region-level defect inspection, and pixel-level defect segmentation [2]. The difference among these three categories is shown in Figure 1. Image-level surface defect classification [3,4] recognizes and labels surface defects but does not indicate where they are. Region-level defect inspection [5], similar to generic object detection [6] in computer vision, determines the class and the location simultaneously. This localization information is crucial for the follow-up quality assessment system. Pixel-level defect segmentation [5,7] can not only recognize and locate surface defects, but also describe their shapes. However, its shortcoming is that pixel-wise annotation is very time-consuming, so it is not feasible in the industrial field. Against this background, we focus on region-level inspection: in this paper, surface defect detection is regarded as a region-level defect inspection task. From the perspective of computer vision, the defect detection task is essentially object detection for defect images.

Current surface defect detection methods can be divided into two main categories, i.e., traditional methods and deep-learning-based methods. Most early surface defect detection methods use hand-crafted features to achieve good performance [8,9]. However, these methods depend heavily on expertise, and they are very sensitive to illumination, background, and other conditions [10]. In recent years, with the rapid development of the convolutional neural network (CNN) [11], surface defect detection methods based on deep learning have become mainstream [12,13]. Although many remarkable breakthroughs have been made, current surface defect detectors still face many challenges. Against this background, we discuss three important problems, i.e., how to better extract and integrate multi-scale features, how to enhance the feature extraction capability of CNNs, and how to generate more suitable anchors.

Feature fusion is a critical part of modern surface defect detectors; it fuses the features from different layers, scales, or branches. Many recent works demonstrate that the performance of detectors can be effectively improved by using a feature fusion module [14,15,16]. However, these features are usually fused via simple operations, such as addition or concatenation. This is not the best choice, because features from different branches do not contribute equally to the response. Furthermore, these simple fusion operations fail to take both channel and spatial attention into account simultaneously. As a result, these methods are unable to determine effectively which features to emphasize or suppress, and where. In this work, a multi-scale feature adaptive fusion (MFAF) is proposed. MFAF can adaptively extract and integrate the features from different layers and scales, and takes both channel and spatial attention into account. Specifically, MFAF makes use of a cascade of convolution layers with different kernels and dilation rates to capture multi-scale contextual information. Then, the channel attention weights of the features at different scales are extracted by the SEWeight [17], and these channel weights are re-calibrated with the Softmax operation to establish long-range channel dependency. Finally, an improved spatial attention module (SAM) [18] is utilized. Thus, channel-wise and spatial-wise features with different receptive fields can be acquired.

In recent years, CNNs have achieved outstanding success in computer vision [6]. Meanwhile, many significant CNN architectures have been proposed, such as VGG [19], Inception-v4 [20], ResNet [21], and HRNet [22]. However, these CNN backbones still suffer from some drawbacks. For example, a convolution unit only samples the feature map at fixed shapes and locations. As a result, the receptive field of a convolution unit cannot be adjusted according to different surface defects. Inspired by deformable convolutional networks (DCN) [23], we design an adaptive convolution module (ACM) with fewer parameters and lower computational cost to enhance the capability of CNNs to extract features of surface defects. Specifically, 4-channel offsets are added to the traditional convolution, and the offsets are used to adjust the positions and the shapes of convolution units, respectively. The offsets are learned from the input feature maps via a convolutional block. To further enhance the capability of ACM, MFAF is employed to obtain the offsets.

Recently, the success of most state-of-the-art (SOTA) detectors [24,25,26] has derived from the use of anchors. However, these anchor-based detectors suffer from some drawbacks. For instance, a set of anchor boxes is uniformly predefined with fixed scales and shapes, which increases the difficulty of bounding box regression. Inspired by Guided Anchoring (GA) [25], a more efficient adaptive anchor module (AAM) is proposed in this paper. Specifically, AAM first identifies the locations that may contain surface defects and then determines the anchor shapes at those locations. In addition, a location-guided shape adaptation component is designed, which utilizes the location information to yield more suitable anchors. Therefore, AAM is able to yield more accurate anchors for different surface defects adaptively, rather than relying blindly on hand-designed preset anchor boxes. Moreover, AAM predicts only one anchor at each location of the feature map, which effectively decreases the computational cost.

In summary, the principal contributions of this work are as follows:

(1) A multi-scale feature adaptive fusion (MFAF) is proposed. It can adaptively extract and integrate the features from different layers and scales, and meanwhile take both channel and spatial attention into account.

(2) An adaptive convolution module (ACM) is proposed in this work, which can adjust the location and shape of the convolution unit according to surface defects with different shapes and locations.

(3) An adaptive anchor module (AAM) is proposed, which is capable of adjusting the shapes and scales of anchors for different surface defects instead of blindly using fixed preset anchors.

2. Related Work

2.1. Traditional Detection Approaches

Generally, traditional surface defect detection mainly consists of two parts: a feature extractor and a classifier. These methods use hand-crafted features such as the scale-invariant feature transform (SIFT) [27] and the histogram of oriented gradients (HOG) [28]. The features are then fed into a classification model, e.g., a support vector machine (SVM) [29]. For instance, Ghorai et al. [30] evaluate the performance of a number of wavelet features and use an SVM to localize defects. To resist noise, Song et al. [31] improve the local binary pattern (LBP) and then adopt an SVM to classify hot-rolled steel strip surface defects. Chu et al. [32] propose four types of statistical features, and then apply an enhanced twin support vector machine classifier to recognize steel surface defects. Wang et al. [33] extract HOG features and then introduce a random forest algorithm to perform defect classification. These traditional surface defect detection methods have achieved noticeable improvements. However, they inherit the drawbacks of hand-crafted features. For instance, these features depend heavily on expertise. Moreover, these methods still cannot achieve satisfactory detection accuracy and robustness.

2.2. Deep-Learning-Based Detection Approaches

From the perspective of computer vision, the surface defect detection task is essentially object detection for defect images. With the rapid development of CNNs [11], generic object detection technology has achieved outstanding success [6]. Generally, an object detector mainly consists of three parts: a backbone, a neck, and a head [24]. The backbone, such as VGG [19] or ResNet [21], is employed to extract features from an input image. The neck, such as the feature pyramid network (FPN) [34], is often utilized to combine feature maps from different stages of the backbone. Finally, the head is used for predicting the classes and bounding boxes of objects. Detection heads can be categorized into two kinds, i.e., one-stage and two-stage methods. Generally, two-stage detectors are more accurate but have lower inference speeds, whereas one-stage detectors are much faster but have relatively poor accuracy [6]. The R-CNN series is the most representative family of two-stage object detection methods; it includes Faster R-CNN [35], GA-Faster R-CNN [25], and TridentNet [26]. As for one-stage object detection, the most representative models include You Only Look Once (YOLO) [36], the Single Shot MultiBox Detector (SSD) [37], and RetinaNet [38].

With the development of computer vision, many works [12,13] demonstrate that generic object detection methods based on deep learning can be transferred to defect detection. Recently, most state-of-the-art surface defect detectors rely on an anchoring scheme. For instance, Li et al. [39] adopt SSD as the meta structure to achieve real-time and accurate detection of surface defects. A cascaded autoencoder (CASAE) architecture is designed in [40] to detect metallic surface defects. Faster R-CNN [35] is improved in [13] to detect workpiece surface defects. Wei et al. [41] propose a YOLOv3-based defect detector for identifying railway defects. Cheng et al. [12] improve RetinaNet with difference channel attention and spatial feature fusion for steel surface defect detection. Despite this success, many challenges remain in the surface defect detection field. For instance, the receptive field of a convolution unit cannot be adjusted according to different defects. In addition, a set of anchor boxes is uniformly predefined with fixed scales and shapes, which increases the difficulty of bounding box regression.

2.3. Feature Fusion

Different from the neck (e.g., FPN [34]) of a detector, feature fusion in this paper is utilized to fuse the features from different layers, scales, or branches. Many recent works demonstrate that the performance of detectors can be effectively improved by using a feature fusion module [14,15,16]. Many excellent feature fusion modules have been proposed, such as SPP [14], Receptive Fields Block (RFB) [15], and Hybrid Pooling-Atrous (PHA) [16]. Although many outstanding breakthroughs have been made, the feature fusion operation in the methods mentioned above is implemented via simple concatenation or addition. This is not the best choice, because features from different branches do not contribute equally to the response. Therefore, Zhang et al. [17] design Pyramid Split Attention (PSA) to extract multi-scale spatial information, and then capture recalibrated channel attention weights with the Softmax operation to establish a long-range channel dependency. However, the PSA module lacks spatial attention, which is important for determining where an object is.

2.4. Convolutional Neural Networks

In recent years, CNNs have achieved outstanding breakthroughs, such as VGG [19], ResNet [21], and HRNet [22]. Benefiting from the rapid development of CNNs, surface defect detection technology has achieved outstanding success [41]. However, CNNs still face many challenges. For instance, the receptive field, shape, and location of the convolution unit cannot be adjusted according to objects with different shapes. To address these problems, Dai et al. [23] propose DCN, which adds 2D offsets to the sampling locations of standard convolution. DCN can freely adjust the deformation of the sampling grid according to different objects. However, many samples may extend well beyond the region of the object, which causes the features to be influenced by irrelevant areas of the image. Furthermore, this method leads to significant computational cost. For instance, 18-channel offsets are needed for a $3 \times 3$ convolution unit.
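The 18 channels follow from the kernel size: a $k \times k$ kernel has $k^{2}$ sampling locations, and DCN [23] attaches one 2D offset to each of them. As a quick worked check for $k = 3$ (the symbol $N_{\text{offset}}$ is introduced here only for illustration):

```latex
N_{\text{offset}} = 2 \cdot k^{2} = 2 \cdot 3 \cdot 3 = 18
```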

2.5. Anchor Boxes

Recently, the success of most state-of-the-art surface defect detectors relies on anchoring schemes. In two-stage detectors, such as Faster R-CNN [35], anchors are utilized to predict proposals. In one-stage detectors, such as YOLOv3 [42], SSD [37], and RetinaNet [38], anchors are instead employed to predict the final bounding boxes directly. However, the aspect ratios and scales of predefined anchors are uniformly fixed and cannot be adjusted adaptively according to different objects. To better leverage anchor boxes, some excellent semi-anchor-free methods have been proposed. For instance, rather than using all of the preset fixed anchors, Guided Anchoring (GA) [25] predicts only one variable anchor at each location that may contain objects. However, GA only utilizes the anchor shape information to refine features; the anchor location information is not efficiently utilized.

3. Materials and Methods

3.1. Overall Network Architecture

An adaptive convolution and anchor network, ACA-Net, is designed in this paper. Specifically, an adaptive convolution and anchor (ACA) module is proposed, which consists of ACM and AAM. Meanwhile, an MFAF is designed and used in ACM. Firstly, the overall network architecture of the proposed ACA-Net is described in this section. Then, the proposed MFAF is introduced. Next, we introduce the proposed ACM in detail. Finally, the details of the proposed AAM are described.

The overall architecture of the proposed ACA-Net is illustrated in Figure 2. ACA-Net mainly consists of three parts: (a) a backbone to extract features from the input image; (b) a neck to combine feature maps from different stages of the backbone; (c) a head to predict the classes and bounding boxes (BBox) of defects. Specifically, ResNet-50 [21] pretrained on the ImageNet [43] dataset is used as the backbone. It is worth noting that ResNet-50 is modified for detecting surface defects in the same way as in RetinaNet. ACM improved by MFAF is employed at stages conv3–conv5 of ResNet-50 to enhance the modeling capability. Secondly, FPN [34] is adopted as the neck on top of the backbone to integrate multi-level features. Finally, we adopt a head similar to that of RetinaNet [38]. The box subnet (BSubnet) and the classification subnet (CSubnet) are utilized to yield regression features (RF) and classification features (CF), respectively. Note that $RF$ and $CF$ are processed by AAM to yield more suitable anchors and to produce the new features $RF'$ and $CF'$. Afterwards, $RF'$ and $CF'$ are used for regressing BBoxes and predicting the classes of surface defects, respectively.

3.2. Multi-Scale Feature Adaptive Fusion

As shown in Figure 3, the proposed multi-scale feature adaptive fusion (MFAF) mainly consists of two parts: multi-scale feature extraction (MFE) and feature adaptive fusion (FAF). In the figure, ⊗ denotes element-wise multiplication, © denotes concatenation in the channel dimension, and in_ch and mid_ch denote the numbers of channels of the feature maps. To capture multi-scale contextual information, MFE makes use of a cascade of convolution layers with different kernels and dilation rates. Specifically, MFE consists of two ordinary convolution layers with $1 \times 1$ and $3 \times 3$ kernels, respectively, and a $3 \times 3$ atrous convolution layer with a dilation rate of 3. The ordinary $3 \times 3$ convolution layer is placed before the atrous convolution layer. Then, the three kinds of features from the ordinary and atrous convolutions are concatenated in the channel dimension. Mathematically, the multi-scale feature map is obtained by concatenation as

$$F = \mathrm{Cat}(F_{1}, F_{2}, F_{3}) \tag{1}$$

where $F_{1}$, $F_{2}$ and $F_{3}$ are the three kinds of feature maps with different receptive fields, and $F$ is the multi-scale feature map.
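To make "different receptive fields" concrete, the following minimal sketch computes the receptive field of each branch. It assumes stride 1 everywhere and that $F_{3}$ is taken after the ordinary $3 \times 3$ layer followed by the atrous layer; the helper names are illustrative and not taken from the paper.

```python
def effective_kernel(kernel: int, dilation: int = 1) -> int:
    """Span covered by one (possibly dilated) kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field of a stride-1 cascade of (kernel, dilation) layers."""
    rf = 1
    for kernel, dilation in layers:
        rf += effective_kernel(kernel, dilation) - 1
    return rf


# Assumed branch layout (stride 1 everywhere, names are illustrative):
branches = {
    "F1": [(1, 1)],            # 1x1 convolution
    "F2": [(3, 1)],            # 3x3 convolution
    "F3": [(3, 1), (3, 3)],    # 3x3 convolution followed by 3x3 atrous, rate 3
}
fields = {name: receptive_field(layers) for name, layers in branches.items()}
print(fields)  # {'F1': 1, 'F2': 3, 'F3': 9}
```

Under these assumptions the three branches see $1 \times 1$, $3 \times 3$ and $9 \times 9$ input windows, which is the multi-scale context that the concatenation in Equation (1) combines.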

After obtaining the multi-scale features, FAF is used to fuse them adaptively. Firstly, the channel attention weights of the three kinds of features are extracted separately by the SEWeight [17]. As illustrated in Figure 3b, it uses a global average pooling layer and two fully-connected (FC) layers. The vector of channel attention weights can be represented as

$$Z_{i} = \mathrm{SEWeight}(F_{i}), \quad i = 1, 2, 3 \tag{2}$$

where $Z_{i}$ is the channel attention weight vector of the feature map at the $i$-th scale. The multi-scale attention weight vector is obtained by concatenation as

$$Z = Z_{1} \; © \; Z_{2} \; © \; Z_{3} \tag{3}$$

where © is the concatenation operator and $Z$ is the multi-scale attention weight vector. Then, these channel weights are re-calibrated with the Softmax operation to establish long-range channel dependency. A soft assignment weight is obtained as

$$\mathit{att}_{i} = \mathrm{Softmax}(Z_{i}) = \frac{\exp(Z_{i})}{\sum_{j = 1}^{3} \exp(Z_{j})} \tag{4}$$

where $\mathit{att}_{i}$ is the re-calibrated channel weight at the $i$-th scale. Analogously, the whole channel attention weight is given by

$$\mathit{att} = \mathit{att}_{1} \; © \; \mathit{att}_{2} \; © \; \mathit{att}_{3} \tag{5}$$

where $\mathit{att}$ denotes the final multi-scale channel weights after the Softmax operation. Thus, the channel-wise features are obtained by multiplying the re-calibrated attention weights $\mathit{att}$ with the feature map $F$:

$$Y = F \otimes \mathit{att} \tag{6}$$

where ⊗ denotes element-wise multiplication, and $Y$ is the channel-wise multi-scale feature map.
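The channel re-calibration in Equations (2)–(6) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ReLU and Sigmoid activations inside `se_weight` follow the usual squeeze-and-excitation design, the FC weights are random, and all sizes are hypothetical.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def se_weight(feat, w1, w2):
    """Eq. (2): global average pooling, FC + ReLU, FC + Sigmoid (assumed activations).

    feat: (C, H, W); w1: (C_r, C); w2: (C, C_r). Returns a (C,) vector.
    """
    squeezed = feat.mean(axis=(1, 2))          # global average pooling
    hidden = np.maximum(w1 @ squeezed, 0.0)    # first FC layer with ReLU
    return sigmoid(w2 @ hidden)                # second FC layer with Sigmoid


def fuse_channels(feats, fc_weights):
    """Eqs. (3)-(6): softmax across the three branches, then re-weight the features."""
    z = np.stack([se_weight(f, w1, w2) for f, (w1, w2) in zip(feats, fc_weights)])
    e = np.exp(z - z.max(axis=0, keepdims=True))     # numerically stable softmax
    att = e / e.sum(axis=0, keepdims=True)           # (3, C), sums to 1 over branches
    y = np.concatenate([f * a[:, None, None] for f, a in zip(feats, att)], axis=0)
    return y, att


rng = np.random.default_rng(0)
C, H, W, reduced = 8, 5, 5, 2                        # hypothetical sizes
feats = [rng.standard_normal((C, H, W)) for _ in range(3)]
fc_weights = [(rng.standard_normal((reduced, C)), rng.standard_normal((C, reduced)))
              for _ in range(3)]
y, att = fuse_channels(feats, fc_weights)
print(y.shape, att.shape)  # (24, 5, 5) (3, 8)
```

Because the Softmax runs across the three branches, the weights of one channel index always sum to 1 over the scales, so the branches compete for each channel instead of being added with equal importance.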

Finally, these channel-wise features are processed by the improved SAM. As illustrated in Figure 3b, we add an extra $1 \times 1$ convolution layer compared with the original SAM [18]. The spatial information from the average-pooling, max-pooling and $1 \times 1$ convolution layers is processed by a $7 \times 7$ convolution layer with the Sigmoid function to obtain the spatial attention weights. Mathematically, the final feature is given by

$$\mathit{Out} = \mathrm{SAM}(Y) \tag{7}$$

where $\mathit{Out}$ is the channel-wise and spatial-wise feature map with multiple receptive fields, and $\mathrm{SAM}$ denotes the improved SAM operation.
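A rough NumPy sketch of such a spatial attention step is given below. Several details are assumptions rather than statements from the paper: the $1 \times 1$ convolution is taken to reduce $Y$ to a single channel, the pooling runs along the channel axis as in the original SAM, the $7 \times 7$ convolution is zero-padded, and the resulting weights are multiplied back onto $Y$. All kernel values are random placeholders.

```python
import numpy as np


def conv2d_same(maps, kernel):
    """Zero-padded, stride-1 convolution of (C, H, W) maps to a single (H, W) map."""
    c, h, w = maps.shape
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(maps, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((h, w))
    for r in range(h):
        for col in range(w):
            out[r, col] = np.sum(padded[:, r:r + k, col:col + k] * kernel)
    return out


def improved_sam(y, w_1x1, w_7x7):
    """Spatial attention from average-pooled, max-pooled and 1x1-conv maps.

    y: (C, H, W); w_1x1: (C,); w_7x7: (3, 7, 7).
    """
    avg_map = y.mean(axis=0)                       # average pooling over channels
    max_map = y.max(axis=0)                        # max pooling over channels
    conv_map = np.tensordot(w_1x1, y, axes=1)      # assumed: 1x1 conv down to one channel
    stacked = np.stack([avg_map, max_map, conv_map])
    weights = 1.0 / (1.0 + np.exp(-conv2d_same(stacked, w_7x7)))  # Sigmoid
    return y * weights, weights                    # assumed: weights rescale y


rng = np.random.default_rng(0)
y = rng.standard_normal((24, 6, 6))                # hypothetical channel-wise features
out, weights = improved_sam(y, rng.standard_normal(24) * 0.1,
                            rng.standard_normal((3, 7, 7)) * 0.1)
print(out.shape, weights.shape)  # (24, 6, 6) (6, 6)
```

The single $(H, W)$ weight map is shared by all channels, so this step decides where to emphasize features, complementing the channel weights of Equation (6), which decide what to emphasize.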

3.3. Adaptive Convolution Module

Traditional 2D convolution samples the input feature map on a regular grid, which means that the sampling locations and the receptive field size of a regular convolution are fixed. For example, in regular convolution, the sampling grid of a $3 \times 3$ kernel can be defined as

$$p_{k} \in \{(-1, -1), (-1, 0), \dots, (1, 1)\}. \tag{8}$$

In this example, $K = 3 \times 3 = 9$ is the number of sampling locations. Let $x(p)$ denote the features at location $p$ of the input feature map. The features at $p$ in the output feature map can then be expressed as

$$y(p) = \sum_{k = 1}^{K} w_{k} \cdot x(p + p_{k}) \tag{9}$$

where $y(p)$ denotes the features at location $p$ of the output feature map, and $w_{k}$ is the learnable weight. Because the sampling locations and receptive fields of regular convolution are fixed, these convolution filters cannot appropriately cover surface defects with various shapes.

To solve the issues above, we propose an adaptive convolution module (ACM). For simplicity, the ACM is described in 2 dimensions. As illustrated in Figure 4, the regular convolution kernels in ACM can be adjusted with learnable offsets containing 4 elements $\{\Delta x, \Delta y, \Delta w, \Delta h\}$. $\Delta x, \Delta y \in (-0.5, 0.5)$ and $\Delta w, \Delta h \in (-1, 1)$ are used for adjusting the sampling locations and the shapes (i.e., width and height) of regular convolution kernels, respectively. The sampling location offsets can be expressed as

$$\Delta xy = (\Delta x, \Delta y) \tag{10}$$

The sampling shape offsets can be expressed as

$$\Delta wh_{k} \in \{(-\Delta w, -\Delta h), (-\Delta w, 0), (-\Delta w, \Delta h), \dots, (\Delta w, \Delta h)\} \tag{11}$$

Hence, in ACM, the features at location $p$ of the output feature map become

$$y(p) = \sum_{k = 1}^{K} w_{k} \cdot x(p + p_{k} + \Delta xy + \Delta wh_{k}) \tag{12}$$

As $\Delta xy$ and $\Delta wh_{k}$ are typically fractional, the value of $y(p)$ is calculated via bilinear interpolation, just as in [23]. In this way, for a $3 \times 3$ kernel, ACM can adjust the regular convolution kernels with only 4-dimensional offsets. Therefore, no heavy computation is introduced by ACM. To further enhance ACM, the offsets are obtained via an MFAF block with mid_ch = 8. The spatial resolution of the offset field is the same as that of the input feature map. During training, the convolutional kernels and the offsets are learned simultaneously. In this paper, ACM is utilized in all the $3 \times 3$ convolutional layers in the conv3–conv5 stages of ResNet-50 [21].
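The sampling rule of Equation (12) can be illustrated for a single output location with the sketch below. It is a toy, single-channel version under stated assumptions: $x$ is taken as the column axis and $y$ as the row axis, samples outside the map count as zero, and the offsets are passed in by hand rather than predicted by an MFAF block. The function names are illustrative.

```python
import numpy as np


def bilinear(x, row, col):
    """Bilinearly sample a 2D map at a fractional position (zero outside the map)."""
    h, w = x.shape
    r0, c0 = int(np.floor(row)), int(np.floor(col))
    fr, fc = row - r0, col - c0
    value = 0.0
    for r, wr in ((r0, 1.0 - fr), (r0 + 1, fr)):
        for c, wc in ((c0, 1.0 - fc), (c0 + 1, fc)):
            if 0 <= r < h and 0 <= c < w:
                value += wr * wc * x[r, c]
    return value


def acm_response(x, weights, p, dx, dy, dw, dh):
    """Eq. (12) for a 3x3 kernel at p = (row, col) with offsets (dx, dy, dw, dh)."""
    total = 0.0
    for i, gy in enumerate((-1, 0, 1)):          # vertical grid step
        for j, gx in enumerate((-1, 0, 1)):      # horizontal grid step
            row = p[0] + gy + dy + gy * dh       # shift by dy, stretch by dh
            col = p[1] + gx + dx + gx * dw       # shift by dx, stretch by dw
            total += weights[i, j] * bilinear(x, row, col)
    return total


x = np.tile(np.arange(8.0), (8, 1))              # toy map: value equals column index
weights = np.ones((3, 3))
regular = acm_response(x, weights, (4, 4), 0.0, 0.0, 0.0, 0.0)    # plain 3x3 convolution
adapted = acm_response(x, weights, (4, 4), 0.25, 0.0, 0.5, 0.5)   # shifted, widened kernel
print(regular, adapted)
```

With all four offsets set to zero the function reduces to the regular convolution of Equation (9). The whole $3 \times 3$ grid is moved and stretched by the same four numbers, in contrast to the per-sample offsets of DCN [23].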

3.4. Adaptive Anchor Module

Inspired by GA [25], an adaptive anchor module (AAM) is proposed in this work, which yields more suitable anchor boxes for different surface defects. Given an image $I$, the probability of a surface defect can be expressed as

$$p(x, y, w, h \mid I) = p(x, y \mid I) \, p(w, h \mid x, y, I) \tag{13}$$

where $(x, y)$ denotes the center of the surface defect, and $(w, h)$ denotes its width and height, respectively. This equation indicates that the shape of a surface defect is closely related to its location. Following this formulation, AAM first identifies locations where the centers of surface defects are likely to exist and then predicts the shapes at these locations. Furthermore, we devise a location-guided shape adaptation component in this work, which utilizes the location information to yield more suitable anchors.

As demonstrated in Figure 5, given a feature map from the neck of the detector, the subnets CSubnet and BSubnet are first utilized to yield classification features (CF) and regression features (RF), respectively. Next, a $1 \times 1$ convolution with the Sigmoid function is employed to yield a probability map, which indicates the locations where the centers of surface defects are likely to exist. The probability map has the same size as CF. The value of each entry of the probability map denotes the probability that the center of a surface defect exists at that location. Based on the probability map, the regions where surface defects may exist can be selected. Therefore, there is no need to take the excluded regions into account.
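The location step can be sketched as follows: a $1 \times 1$ convolution with Sigmoid produces the probability map, and only locations above a threshold are kept. The threshold value, the feature sizes and the random kernel are illustrative assumptions; the text above does not specify how the regions are selected.

```python
import numpy as np


def center_probability(features, w_1x1, bias=0.0):
    """1x1 convolution followed by Sigmoid: (C, H, W) features -> (H, W) probabilities."""
    logits = np.tensordot(w_1x1, features, axes=1) + bias
    return 1.0 / (1.0 + np.exp(-logits))


def active_locations(prob_map, threshold):
    """Indices (row, col) whose center probability exceeds the threshold."""
    return np.argwhere(prob_map > threshold)


rng = np.random.default_rng(0)
features = rng.standard_normal((16, 10, 10))      # hypothetical feature map
w_1x1 = rng.standard_normal(16) * 0.5              # hypothetical 1x1 kernel
prob_map = center_probability(features, w_1x1)
kept = active_locations(prob_map, threshold=0.5)   # threshold chosen for illustration
print(prob_map.shape, len(kept), "of", prob_map.size, "locations kept")
```

Only the kept locations need an anchor, which is where the saving over a dense grid of preset anchors comes from.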

Then, the shapes of the anchors at each location where surface defects may exist are determined. As shown in Equation (13), the shape of an anchor is closely related to its location. Therefore, a location-guided shape adaptation module is designed in this work to yield more suitable anchors. Specifically, the shape adaptation module consists of two parts: (1) the probability map is fed into two $3 \times 3$ convolutions to yield shape offsets; (2) a $3 \times 3$ deformable convolution (DC) is utilized to improve the regression features by using the shape offsets. Through these operations, the shapes of anchors are adapted via location information. Afterwards, the shapes of anchors are predicted via a $1 \times 1$ convolutional layer. It is also worth noting that just one anchor box is placed at each available location.

Finally, feature adaptation is adopted to transform the original feature maps (i.e., CF and RF) according to the anchor shape information. It utilizes a $1 \times 1$ convolutional layer to obtain location offsets from the shape map, and then applies a $3 \times 3$ deformable convolution to the original features, similar to the shape adaptation module.

A multi-task loss function as in GA [25] is utilized to optimize the framework end to end. It can be defined as

$$L = \lambda_{1} L_{loc} + \lambda_{2} L_{shape} + L_{cls} + L_{reg} \tag{14}$$

where $L_{loc}$ is a focal loss [38] used to optimize the locations. For $L_{shape}$, we use a variant of the bounded IoU loss as in [25] to optimize only the anchor shapes. $L_{cls}$ is a focal loss [38] and $L_{reg}$ is the $L_{1}$ loss [44]. In our experiments, $\lambda_{1} = 1$ and $\lambda_{2} = 0.1$.
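As a small illustration of how the terms of Equation (14) combine, the sketch below implements a binary focal loss and the weighted sum. The values `alpha=0.25` and `gamma=2.0` are the defaults commonly used with RetinaNet-style focal loss and are assumed here, since this section does not state them; averaging over elements is also a simplification, and the individual loss values in the usage lines are made up.

```python
import numpy as np


def focal_loss(prob, target, alpha=0.25, gamma=2.0):
    """Binary focal loss averaged over elements; prob and target share one shape."""
    prob = np.clip(prob, 1e-7, 1.0 - 1e-7)
    p_t = np.where(target == 1, prob, 1.0 - prob)
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def total_loss(l_loc, l_shape, l_cls, l_reg, lambda1=1.0, lambda2=0.1):
    """Eq. (14): weighted sum of the anchor losses and the detection losses."""
    return lambda1 * l_loc + lambda2 * l_shape + l_cls + l_reg


prob = np.array([0.9, 0.1, 0.6])       # made-up predicted probabilities
target = np.array([1, 0, 1])           # made-up labels
l_loc = focal_loss(prob, target)
print(l_loc, total_loss(l_loc, l_shape=0.5, l_cls=0.3, l_reg=0.4))
```

With $\lambda_{2} = 0.1$, the shape term contributes only a tenth of its raw value, so the anchor shape branch cannot dominate the detection losses.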

3.5. Dataset

Surface defect images are affected by a variety of factors, such as environment, light, noise, and camera. Furthermore, image quality seriously affects the performance of surface defect detectors. However, we focus on how to improve the detection accuracy, not on how to acquire good images. Therefore, we only use images of good quality. In this work, the NEU-DET [45] dataset is used. NEU-DET consists of six kinds of surface defects from hot-rolled steel plates: patches, rolled-in scale, crazing, inclusion, pitted surface, and scratches. Each kind of defect includes 300 images, and an image may contain several defects. Annotations are provided in NEU-DET, which mark the class and location of each defect in an image. Some examples of defect images are shown in Figure 6. Each yellow box is the ground-truth bounding box of a defect. The NEU-DET dataset contains about 5000 ground-truth boxes. Conventionally, NEU-DET is divided into a training set containing 1260 images and a testing set containing 540 images.

4. Results and Discussions

4.1. Implementation Details

ResNet-50 [21] with FPN [34] is used as the backbone network of ACA-Net. NEU-DET is a small dataset, which is not enough to support the training of complex neural networks. Therefore, ResNet-50 is pretrained on the ImageNet [46] dataset. All of the surface defect detectors are trained on a Tesla V100 32 GB GPU with a batch of 2 images. Stochastic gradient descent (SGD) is used for training with a learning rate of 0.002, a momentum of 0.9, and a weight decay of 0.0001. We train for 36 epochs in an end-to-end manner, and decrease the learning rate at the 28th and 34th epochs. The models are trained with the MMDetection toolkit [47].
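The learning-rate schedule described above can be sketched as a step function. The decay factor is not stated in the text; `gamma=0.1` is assumed here as a commonly used default, and the epoch index is taken to be zero-based, so both are illustrative choices rather than the authors' settings.

```python
def learning_rate(epoch, base_lr=0.002, milestones=(28, 34), gamma=0.1):
    """Step schedule: multiply the rate by gamma once per milestone already reached.

    epoch is zero-based; gamma=0.1 is an assumed decay factor.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


for epoch in (0, 27, 28, 33, 34, 35):
    print(epoch, learning_rate(epoch))
```

Under these assumptions the rate stays at 0.002 for most of the 36 epochs and is reduced twice near the end of training.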

The accuracy of all detectors is evaluated with the COCO [48] standard metric, average precision (AP). The speed of each detector is evaluated in frames per second (FPS), i.e., the number of images that a detector can process per second. The IoU strategy in [25] is employed to determine the positive and negative samples among the anchors. The $AP$ is evaluated at different IoU thresholds ($IoU \in [0.5 : 0.95]$) over all kinds of surface defects. It can be defined as

$$AP = \frac{1}{K} \sum_{n = 1}^{K} \int P_{n}(R_{n}) \, dR_{n} \tag{15}$$

where $R_{n}$ denotes the recall for class $n$, $P_{n}$ is the corresponding precision, and $K$ is the number of classes. The precision ($P$) and recall ($R$) are defined as follows

$$P = \frac{TP}{TP + FP} \tag{16}$$

$$R = \frac{TP}{TP + FN} \tag{17}$$

where $TP$, $FP$, and $FN$ denote the numbers of true positive samples, false positive samples, and false negative samples, respectively. Table 1 shows all of the evaluation metrics used in our experiments.
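The quantities behind Equations (16) and (17) can be illustrated with the short sketch below. The boxes and the counts are made-up numbers, and the threshold list assumes the usual COCO step of 0.05 between 0.5 and 0.95; the full COCO AP computation (matching detections by score and integrating the precision-recall curve) is not reproduced here.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def precision(tp, fp):
    return tp / (tp + fp)    # Eq. (16)


def recall(tp, fn):
    return tp / (tp + fn)    # Eq. (17)


# Assumed COCO-style thresholds 0.50, 0.55, ..., 0.95 over which AP is averaged.
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))     # intersection 1, union 7
matched = [t for t in iou_thresholds if overlap >= t]
print(round(overlap, 3), matched)             # 0.143 []
print(precision(8, 2), recall(8, 8))          # 0.8 0.5
```

A detection counts as a true positive at a given threshold only if its IoU with a ground-truth box reaches that threshold, so the same detection can be a true positive at 0.5 and a false positive at 0.75.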

4.2. Ablation Studies

In order to verify the effectiveness of the proposed modules, ablation experiments are conducted. The results are shown in Table 2. We take the GA-RetinaNet [25] with ResNet-50 [21] and FPN [34] as our baseline.

Firstly, ACM (without MFAF) is applied to GA-RetinaNet. The AP increases from 35.5% to 36.1%. Meanwhile, $AP_{S}$ and $AP_{M}$ are improved by 4.1% and 1.0%, respectively. This result indicates that the modeling capability of a CNN can be enhanced by adjusting the location and shape of the convolution unit, especially for small defects. However, $AP_{L}$ drops by 0.6%. A large receptive field is important for detecting large defects, but the receptive field of the shape offsets in ACM is too small for them. We believe that this is the main reason for the drop in $AP_{L}$.

Then, MFAF is used in ACM to adaptively integrate multi-scale features. With MFAF, the AP is improved by 0.5%. Meanwhile, $AP_{S}$, $AP_{M}$ and $AP_{L}$ are 1.9%, 0.2% and 2.1% higher than before, respectively. This result demonstrates that MFAF improves the detection performance for surface defects of all sizes.

Finally, AAM is used to replace the original GA [25]. As shown in Table 2, AAM raises the AP from 36.6% to 37.3%. This result supports our earlier view: the shape of an anchor is closely related to its location, and location information helps AAM yield more suitable anchors. In addition, $AP_{75}$ improves by 0.8 points, which further indicates that AAM yields more suitable anchor boxes for defects. $AP_{S}$ and $AP_{M}$ improve by 1.5 and 0.7 points, respectively. However, it is also worth noting that $AP_{L}$ drops by 1.2 points. We argue that the offsets in AAM have receptive fields that are too small, which compromises the ability to locate large objects.
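The step-by-step changes quoted in this subsection can be recomputed from Table 2. The sketch below is only an arithmetic check: the values are copied from Table 2, while the row labels and the helper name `step_deltas` are hypothetical.

```python
# Metric order follows the columns of Table 2.
METRICS = ("AP", "AP50", "AP75", "APS", "APM", "APL")

# Values copied from Table 2; the row labels are illustrative only.
TABLE_2 = [
    ("baseline", (0.355, 0.728, 0.290, 0.231, 0.283, 0.437)),
    ("+ACM", (0.361, 0.736, 0.311, 0.272, 0.293, 0.431)),
    ("+ACM+MFAF", (0.366, 0.732, 0.324, 0.291, 0.295, 0.452)),
    ("+ACM+MFAF+AAM", (0.373, 0.746, 0.332, 0.306, 0.302, 0.440)),
]


def step_deltas(rows):
    """Change of every metric, in percentage points, relative to the previous row."""
    deltas = {}
    for (_, prev), (name, cur) in zip(rows, rows[1:]):
        deltas[name] = {
            m: round((c - p) * 100, 1) for m, p, c in zip(METRICS, prev, cur)
        }
    return deltas


if __name__ == "__main__":
    for name, delta in step_deltas(TABLE_2).items():
        print(name, delta)
```

Besides reproducing the changes discussed above, the same computation shows that $AP_{50}$ moves from 0.736 to 0.732 when MFAF is added, a 0.4-point decrease that can be read directly from Table 2.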

4.3. Comparison

To further confirm the effectiveness of the proposed method, we apply the proposed ACA module to the two-stage detector GA-Faster R-CNN [25] and compare our methods with SOTA one-stage and two-stage detectors. Two-stage detectors commonly achieve better detection accuracy than one-stage detectors [6] but have lower inference speeds. Table 3 shows that ACA-Net achieves an AP of 37.3%, 1.8 points higher than the baseline GA-RetinaNet, and that ACA-Faster R-CNN achieves an AP of 40.3%, 1.2 points higher than the baseline GA-Faster R-CNN. These results indicate that both our one-stage and two-stage methods are comparable to most of the SOTA methods in accuracy, and they confirm the effectiveness of the proposed ACA module. However, ACA-Net achieves an FPS of 7.6, 1.0 FPS lower than the baseline GA-RetinaNet, and the FPS of ACA-Faster R-CNN decreases from 7.5 to 6.0 compared with GA-Faster R-CNN. Nonetheless, our aim is to verify the effectiveness of the proposed ACA module rather than to achieve SOTA performance in both accuracy and speed with many tricks. Moreover, both ACA-Net and ACA-Faster R-CNN are far faster than manual visual inspection.
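To make the speed figures easier to interpret, the following sketch shows one way throughput in FPS could be measured and converted to an average per-image latency. It is a minimal illustration and not the timing procedure used for Table 3: `infer` stands for any hypothetical callable that runs a detector on one image, and the warm-up count is an arbitrary choice.

```python
import time


def measure_fps(infer, images, warmup=2):
    """Images processed per second by `infer`, a hypothetical detector callable."""
    for img in images[:warmup]:
        infer(img)  # warm-up runs are excluded from the timing
    start = time.perf_counter()
    for img in images:
        infer(img)
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against zero
    return len(images) / elapsed


def latency_ms(fps):
    """Average per-image latency, in milliseconds, implied by a throughput in FPS."""
    return 1000.0 / fps
```

For instance, `latency_ms(7.6)` is about 131.6 ms per image for ACA-Net, compared with about 116.3 ms for the 8.6 FPS of GA-RetinaNet, so the 1.0 FPS difference corresponds to roughly 15 ms per image.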

For a more intuitive comparison, some detection examples on the NEU-DET dataset are shown in Figure 7. Both ACA-Net and ACA-Faster R-CNN detect all of the surface defects in these examples with high accuracy, which further supports the effectiveness of our methods.

4.4. Additional Experiments

To further demonstrate the effectiveness of the proposed methods, we carry out experiments on the GC10 dataset from AI Studio [51]. GC10 is a small dataset that includes 2294 images from weld assemblies. It covers ten classes, namely nine kinds of surface defects and a welding line: Crease, Crescent gap, Inclusion, Oil spot, Punching, Rolled pit, Silk spot, Waist folding, Water spot, and Welding line. GC10 provides annotations that mark the class and location of each defect in the images. Conventionally, GC10 is divided into a training set containing 1836 images and a testing set containing 458 images.
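The split sizes above correspond to roughly 80% of the images for training (1836 of 2294) and 20% for testing. The text does not state how the images are assigned to each part, so the sketch below simply assumes a seeded random shuffle; the function name and the seed are hypothetical.

```python
import random


def split_indices(n_images, n_train, seed=0):
    """Shuffle image indices reproducibly and cut them into train and test parts."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # assumption: a plain random split
    return indices[:n_train], indices[n_train:]


# Sizes taken from the text: 2294 images, 1836 of them for training.
train_idx, test_idx = split_indices(2294, 1836)
```

Fixing the seed keeps the partition identical across runs, which matters when detectors are compared on the same testing set.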

Table 4 shows that all of the accuracy metrics improve compared with the baseline GA-RetinaNet; for example, the AP increases from 28.1% to 29.3%. ACA-Net achieves an FPS of 5.9 on the GC10 dataset, only 0.2 FPS lower than the baseline GA-RetinaNet, and is far faster than manual visual inspection. These results further confirm the effectiveness of our methods.

5. Conclusions

In this work, we propose an adaptive convolution and anchor network (ACA-Net) for metallic surface defect detection, which can produce more robust features and yield high-quality anchor boxes. Specifically, an adaptive convolution module (ACM) is designed, which adaptively adjusts the location and shape of each convolution unit to enhance the modeling capability of the CNN. In addition, a multi-scale feature adaptive fusion (MFAF) is proposed to extract and integrate multi-scale features adaptively. Then, we design an adaptive anchor module (AAM) that generates more suitable anchor boxes for different surface defects. Extensive experimental results on the NEU-DET and GC10 datasets demonstrate the effectiveness of our approach.

Author Contributions

Conceptualization, F.C. and M.D.; methodology, F.C.; validation, F.C., M.D. and H.G.; formal analysis, F.C. and H.G.; investigation, X.Y. and H.G.; resources, F.C. and H.G.; data curation, F.C.; writing—original draft preparation, F.C., X.Y.; writing—review and editing, F.C.; visualization, F.C.; supervision, D.Z.; project administration, M.D.; funding acquisition, D.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by National Key R&D Program of China (No. 2018******02); Major Public Welfare Special Projects of Henan Province (No. 201300311200).

Data Availability Statement

Publicly available datasets were analyzed in our work. The NEU-DET dataset can be found here: http://faculty.neu.edu.cn/songkechen/zh_CN/zdylm/263270/list/, accessed on 17 December 2021. The GC10 dataset can be found here: https://aistudio.baidu.com/aistudio/datasetdetail/89446, accessed on 2 August 2022.

Conflicts of Interest

There is no conflict of interest in this article.

Abbreviations

CNN	convolutional neural networks
ACA-Net	adaptive convolution and anchor network
ACA	adaptive convolution and anchor
ACM	adaptive convolution module
MFAF	multi-scale feature adaptive fusion
AAM	adaptive anchor module
NEU-DET	Northeastern University Detection
AP	Average Precision
DC	Deformable convolutional
DCN	Deformable convolutional networks
SOTA	state-of-the-art
GA	Guided Anchoring
SIFT	scale-invariant feature transform
HOG	histogram of oriented gradient
SVM	support vector machines
LBP	local binary pattern
FPN	feature pyramid network
YOLO	you only look once
SSD	Single Shot MultiBox Detector
SPP	Spatial pyramid pooling
RFB	Receptive Fields Block
PHA	Hybrid Pooling-Atrous
PSA	Pyramid Split Attention
VGG	Very deep convolutional networks for large-scale image recognition
ResNet	Residual Network
CF	classification features
RF	regression features
BBox	bounding boxes
MFE	multi-scale feature extract
FAF	feature adaptive fusion
SGD	Stochastic gradient descent
COCO	Microsoft Common Objects in Context

References

[1] Marr, D. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information; MIT Press: Cambridge, MA, USA, 2010.
[2] Dong, H.; Song, K.; He, Y.; Xu, J.; Yan, Y.; Meng, Q. PGA-Net: Pyramid feature fusion and global context attention network for automated surface defect detection. IEEE Trans. Ind. Inform. 2019, 16, 7448–7458.
[3] Lin, Z.; Ye, H.; Zhan, B.; Huang, X. An efficient network for surface defect detection. Appl. Sci. 2020, 10, 6085.
[4] Xu, X.; Zheng, H.; Guo, Z.; Wu, X.; Zheng, Z. SDD-CNN: Small data-driven convolution neural networks for subtle roller defect inspection. Appl. Sci. 2019, 9, 1364.
[5] Fang, X.; Luo, Q.; Zhou, B.; Li, C.; Tian, L. Research progress of automated visual surface defect detection for industrial metal planar materials. Sensors 2020, 20, 5136.
[6] Wu, X.; Sahoo, D.; Hoi, S.C. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64.
[7] Aslam, Y.; Santhi, N.; Ramasamy, N.; Ramar, K. Localization and segmentation of metal cracks using deep learning. J. Ambient Intell. Humaniz. Comput. 2021, 12, 4205–4213.
[8] Deutschl, E.; Gasser, C.; Niel, A.; Werschonig, J. Defect detection on rail surfaces by a vision based system. In Proceedings of the IEEE Intelligent Vehicles Symposium, Parma, Italy, 14–17 June 2004; pp. 507–511.
[9] Ngan, H.Y.; Pang, G.K.; Yung, S.P.; Ng, M.K. Wavelet based methods on patterned fabric defect detection. Pattern Recognit. 2005, 38, 559–576.
[10] Xian, Y.; Liu, G.; Fan, J.; Yu, Y.; Wang, Z. YOT-Net: YOLOv3 Combined Triplet Loss Network for Copper Elbow Surface Defect Detection. Sensors 2021, 21, 7260.
[11] LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
[12] Cheng, X.; Yu, J. RetinaNet with difference channel attention and adaptively spatial feature fusion for steel surface defect detection. IEEE Trans. Instrum. Meas. 2020, 70, 1–11.
[13] Wang, H.; Wang, J.; Luo, F. Study on Surface Defect Detection of Metal Sheet and Strip using Faster R-CNN with Multilevel Feature. Mech. Sci. Technol. Aerosp. Eng. 2021, 2, 262–269.
[14] He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
[15] Liu, S.; Huang, D. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400.
[16] Zhang, Q.; Xiao, T.; Huang, N.; Zhang, D.; Han, J. Revisiting feature fusion for RGB-T salient object detection. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 1804–1818.
[17] Zhang, H.; Zu, K.; Lu, J.; Zou, Y.; Meng, D. EPSANet: An Efficient Pyramid Squeeze Attention Block on Convolutional Neural Network. arXiv 2021, arXiv:2105.14447.
[18] Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
[19] Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
[20] Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
[21] He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
[22] Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep high-resolution representation learning for human pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703.
[23] Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 764–773.
[24] Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
[25] Wang, J.; Chen, K.; Yang, S.; Loy, C.C.; Lin, D. Region proposal by guided anchoring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2965–2974.
[26] Paz, D.; Zhang, H.; Christensen, H.I. Tridentnet: A conditional generative model for dynamic trajectory generation. In Proceedings of the International Conference on Intelligent Autonomous Systems, Singapore, 22–25 June 2021; Springer: Cham, Switzerland, 2022; pp. 403–416.
[27] Wei, X.; Yang, Z.; Liu, Y.; Wei, D.; Jia, L.; Li, Y. Railway track fastener defect detection based on image processing and deep learning techniques: A comparative study. Eng. Appl. Artif. Intell. 2019, 80, 66–81.
[28] Resendiz, E.; Hart, J.M.; Ahuja, N. Automated visual inspection of railroad tracks. IEEE Trans. Intell. Transp. Syst. 2013, 14, 751–760.
[29] Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2011, 2, 1–27.
[30] Ghorai, S.; Mukherjee, A.; Gangadaran, M.; Dutta, P.K. Automatic defect detection on hot-rolled flat steel products. IEEE Trans. Instrum. Meas. 2012, 62, 612–621.
[31] Song, K.; Yan, Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Appl. Surf. Sci. 2013, 285, 858–864.
[32] Chu, M.; Gong, R.; Gao, S.; Zhao, J. Steel surface defects recognition based on multi-type statistical features and enhanced twin support vector machine. Chemom. Intell. Lab. Syst. 2017, 171, 140–150.
[33] Wang, Y.; Xia, H.; Yuan, X.; Li, L.; Sun, B. Distributed defect recognition on steel surfaces using an improved random forest algorithm with optimal multi-feature-set fusion. Multimed. Tools Appl. 2018, 77, 16741–16770.
[34] Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
[35] Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the Annual Conference on Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; Volume 28.
[36] Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
[37] Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 21–37.
[38] Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
[39] Li, Y.; Huang, H.; Xie, Q.; Yao, L.; Chen, Q. Research on a surface defect detection algorithm based on MobileNet-SSD. Appl. Sci. 2018, 8, 1678.
[40] Tao, X.; Zhang, D.; Ma, W.; Liu, X.; Xu, D. Automatic metallic surface defect detection and recognition with convolutional neural networks. Appl. Sci. 2018, 8, 1575.
[41] Wei, X.; Wei, D.; Suo, D.; Jia, L.; Li, Y. Multi-target defect identification for railway track line based on image processing and improved YOLOv3 model. IEEE Access 2020, 8, 61973–61988.
[42] Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
[43] Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 60, 25.
[44] Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1440–1448.
[45] Yi, L.; Li, G.; Jiang, M. An end-to-end steel strip surface defects recognition system based on convolutional neural networks. Steel Res. Int. 2017, 88, 1600068.
[46] Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.-F. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
[47] Chen, K.; Wang, J.; Pang, J.; Cao, Y.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Xu, J. MMDetection: Open mmlab detection toolbox and benchmark. arXiv 2019, arXiv:1906.07155.
[48] Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 740–755.
[49] Chen, Q.; Wang, Y.; Yang, T.; Zhang, X.; Cheng, J.; Sun, J. You only look one-level feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13039–13048.
[50] Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding yolo series in 2021. arXiv 2021, arXiv:2107.08430.
[51] AI Studio. Available online: https://aistudio.baidu.com/aistudio/index (accessed on 2 August 2022).

Figure 1. Three kinds of surface defect detection tasks. (a) Image-level defect classification task. (b) Region-level defect detection. (c) Pixel-level defect segmentation.

Figure 2. Architecture of the proposed ACA-Net.

Figure 3. The multi-scale feature adaptive fusion (MFAF). (a) The architecture of MFAF. (b) SEWeight. (c) The improved SAM.

Figure 4. Adaptive convolution module (ACM).

Figure 5. Adaptive anchor module (AAM).

Figure 6. Defect images in NEU-DET. The yellow box is a groundtruth of a defect. (a) Crazing. (b) Inclusion. (c) Patches. (d) Pitted surface. (e) Rolled-in scale. (f) Scratches.

Figure 7. Comparison of detection results on NEU-DET. (a) Groundtruth, (b) YOLOv3, (c) YOLOF, (d) YOLOX-l, (e) SSD, (f) GA-RetinaNet, (g) ACA-Net (ours), (h) TridentNet, (i) Faster R-CNN, (j) GA-Faster R-CNN, (k) ACA-Faster R-CNN (ours).

Table 1. Evaluation metrics.

Alias	IoU	Area	Meaning
$AP$	0.50–0.95	all	$AP$ at IoU thresholds from 0.50 to 0.95
$AP_{50}$	0.50	all	$AP$ at 0.50 IoU threshold
$AP_{75}$	0.75	all	$AP$ at 0.75 IoU threshold
$AP_{S}$	0.50–0.95	small	$AP$ for small objects: $area < 32^{2}$
$AP_{M}$	0.50–0.95	medium	$AP$ for medium objects: $32^{2} < area < 96^{2}$
$AP_{L}$	0.50–0.95	large	$AP$ for large objects: $area > 96^{2}$
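The area ranges and the IoU range in Table 1 can be sketched in a few lines of Python. The 0.05 step between IoU thresholds follows the COCO convention and is not stated in the table, and assigning a box whose area equals exactly $32^{2}$ or $96^{2}$ to the larger bucket is an assumption of this sketch; the names used are hypothetical.

```python
# Ten IoU thresholds 0.50, 0.55, ..., 0.95 (step of 0.05 per the COCO convention).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def size_bucket(width, height):
    """Assign a box to the small, medium, or large group of Table 1 by pixel area."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:  # assumption: boundary areas fall into the larger bucket
        return "medium"
    return "large"
```

For example, a 50 × 50 box has an area of 2500 pixels and is therefore counted under $AP_{M}$.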

Table 2. Effects of the proposed modules.

ACM	MFAF	AAM	$AP$	$AP_{50}$	$AP_{75}$	$AP_{S}$	$AP_{M}$	$AP_{L}$
			0.355	0.728	0.290	0.231	0.283	0.437
✓			0.361	0.736	0.311	0.272	0.293	0.431
✓	✓		0.366	0.732	0.324	0.291	0.295	0.452
✓	✓	✓	0.373	0.746	0.332	0.306	0.302	0.440

Table 3. Comparison results on the NEU-DET dataset.

	Method	Backbone	$AP$	${AP}_{50}$	${AP}_{75}$	${AP}_{S}$	${AP}_{M}$	${AP}_{L}$	$FPS$
one-stage	YOLOv3 [42]	Darknet-53	0.284	0.661	0.209	0.337	0.238	0.270	18.8
	YOLOF [49]	ResNet-50	0.330	0.675	0.265	0.303	0.277	0.372	18.6
	YOLOX-l [50]	CSPDarknet	0.373	0.726	0.326	0.330	0.309	0.456	16.6
	SSD [37]	VGG-16	0.319	0.692	0.237	0.200	0.266	0.395	18.1
	GA-RetinaNet [25]	ResNet-50	0.355	0.728	0.290	0.231	0.283	0.437	8.6
	ACA-Net (ours)	ResNet-50	0.373	0.746	0.332	0.306	0.302	0.440	7.6
two-stage	TridentNet [26]	ResNet-50	0.401	0.753	0.390	0.371	0.337	0.474	4.9
	Faster R-CNN [35]	ResNet-50	0.383	0.731	0.370	0.289	0.320	0.459	10.6
	GA-Faster R-CNN [25]	ResNet-50	0.391	0.756	0.370	0.343	0.325	0.465	7.5
	ACA-Faster R-CNN (ours)	ResNet-50	0.403	0.764	0.380	0.440	0.336	0.461	6.0

Table 4. Comparison results on the GC10 dataset.

Method	Backbone	$AP$	${AP}_{50}$	${AP}_{75}$	${AP}_{S}$	${AP}_{M}$	${AP}_{L}$	$FPS$
GA-RetinaNet [25]	ResNet-50	0.281	0.589	0.257	0.050	0.172	0.267	6.1
ACA-Net (ours)	ResNet-50	0.293	0.605	0.268	0.100	0.190	0.272	5.9

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
