## 1. Introduction

A fabric is a textile material, short for “textile fabric” [1], that is manufactured with textile fibers and widely used in daily life. A fabric defect is a flaw on the fabric surface resulting from the manufacturing process [2]. Unlike other processes, quality inspection of the fabric surface is highly important to textile manufacturers before products reach customers. Traditionally, visual inspection performed by experienced human inspectors has been used to ensure fabric quality, as shown in Figure 1a. Limited by factors such as human fatigue and inattentiveness, visual detection methods can hardly provide reliable and stable results [3]. Moreover, these results are usually subjective and cannot be quantified.

In recent years, with the rapid development of machine vision and digital image processing techniques, automated fabric defect inspection has become popular and has gradually displaced the traditional manual method. As shown in Figure 1b, automated inspection of fabric defects usually involves three steps: image acquisition, defect detection and post-processing. The image acquisition procedure is mainly responsible for the digital image capture of defective samples, and generally, a line-scan charge coupled device (CCD) camera can be used. The defect detection procedure is performed to localize and segment the flawed regions, and sometimes, it also includes quantitative analysis. The last procedure refers to all subsequent processes after defect detection, e.g., defect type classification and defect grade assessment. In this paper, we mainly concentrate on the defect detection step, which is considered more challenging in fabric quality inspection.
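The three procedures form a simple sequential pipeline. The following Python skeleton is only an illustration of that data flow; the function names, the synthetic image, and the fixed threshold are hypothetical stand-ins, not part of any cited system:

```python
import numpy as np

def acquire_image(height=256, width=256, seed=0):
    """Stand-in for line-scan CCD capture: returns a synthetic grayscale image."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 256, size=(height, width), dtype=np.uint8)

def detect_defects(image, threshold=200):
    """Stand-in detector: returns a binary mask of candidate defect pixels."""
    return (image.astype(np.float32) > threshold).astype(np.uint8)

def post_process(mask):
    """Stand-in post-processing: summarizes defect area, e.g., for grading."""
    return {"defect_pixels": int(mask.sum()), "defect_ratio": float(mask.mean())}

image = acquire_image()
mask = detect_defects(image)
report = post_process(mask)
```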

Three main challenges exist in the defect detection task. First, there is a broad range of different fabrics, which usually exhibit varied characteristics. As making general algorithms compatible with various texture types is difficult, instability in the traditional fabric defect detection methods may occur. According to [2,5], all fabrics can be classified using up to 17 established wallpaper groups, namely $p1$, $p2$, $p3$, $p3m1$, $p31m$, $p4$, $p4m$, $p4g$, $pm$, $pg$, $pmg$, $pgg$, $p6$, $p6m$, $cm$, $cmm$ and $pmm$, which have lattices based on parallelogram, rectangular, rhombic, square, or hexagonal shapes. In Figure 2, we show some common defective fabric samples with different 2-D patterned textures. All this variability increases the complexity of the defect detection problem, making it difficult to devise a generalized method. Second, the categories and characteristics of fabric defects themselves are generally varied. Currently, more than 70 categories of fabric defects defined by the textile industry exist [2]. These defects can be caused by different factors, such as machine malfunctions, yarn problems, and oil stains [6]. As shown in Figure 3, these defects can have vastly different manifestations in the same category of fabric (the $p1$ group). Some defects, e.g., the ones in Figure 3a,c,d,f, appear as regions of low contrast, nonuniform brightness, or irregular shape, which further contributes to the difficulties. Third, collecting large numbers of fabric defect samples, especially some rare types, is extremely difficult in industry, resulting in a data imbalance or a complete failure for some traditional supervised methods.

Though these issues are challenging, numerous researchers have devoted substantial efforts to them. Considering the different appearances of inspected fabrics, Ngan et al. [2] broadly categorized the methods into two main groups, non-motif-based and motif-based methods. Most works focus on the non-motif-based group [7,8,9,10,11,12]. Specifically, Bissi et al. [13] presented an algorithm for automated texture defect detection in uniform and structured fabrics based on a complex symmetric Gabor filter bank and principal component analysis (PCA). Experimental results using the TILDA textile texture database have verified the robustness and computation-saving performance of this method. However, this method does not generalize well for some slightly complex patterned textures. Harinath et al. [14] proposed a wavelet transform-based method for fabric defect detection, which is well suited for quality inspection due to its multi-resolution feature extraction. However, similar spectral approaches are usually computationally demanding. Qu et al. [15] proposed a defect detection algorithm for fabrics with complex textures based on a dual-scale over-complete dictionary. This method can enhance the self-adaptability of defect detection by considering large variations in the defect sizes. It also achieved excellent detection performance on comparison datasets. However, this method requires the inspected images to be aligned with the training samples in the dictionary. In addition, it is not efficient in detecting low-contrast defects. In addition to these non-motif-based methods, only a few studies have conducted the defect detection task by considering elementary fabric motifs as a basic manipulation unit (motif-based) [2]. These methods usually require a defect-free ground truth for comparison of the motifs, or they analyze the energy of motif subtraction to highlight defects [16]. Such methods are not robust and can be time consuming. In addition, they are generally not suitable for the p1 group (this group refers to fabrics composed of one fundamental lattice with a single motif). Therefore, we will mainly concentrate on the non-motif-based models in later discussions and experiments.

In this paper, we present a novel unsupervised learning-based model that is capable of coping with different types of defects in the p1 and non-p1 groups. This model is a multi-scale convolutional denoising autoencoder (MSCDAE) architecture. The inputs into the network at each scale are generated by a Gaussian pyramid to cope with defects of different sizes. Instead of considering elementary motifs as a basic manipulation unit, this model trains multiple convolutional denoising autoencoder (CDAE) networks with randomly sampled image blocks (also known as image patches) from defect-free samples. In fabric samples of the same type, these image patches, which do not contain defective areas, are usually highly similar. Therefore, after training, the CDAE network is capable of modeling the distribution of these defect-free image patches in the patch domain. Filters in the trained CDAE network will be sensitive to similar patches and thus show large responses to them. For patches that contain defective areas, their appearances and distributions in the patch domain will usually be quite different. The trained model may therefore be less sensitive to them, and relatively small responses will be generated. By measuring the residual between the response and the original input, direct pixel-wise prediction can easily be conducted. Finally, by synthesizing prediction results from multiple pyramid layers, the final inspection representation for a candidate sample can be generated.
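The per-scale preprocessing described above, building a Gaussian pyramid and randomly sampling defect-free training patches, can be sketched in a few lines of numpy. The 5-tap binomial blur, patch size, and patch count below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def gaussian_pyramid(image, levels=3):
    """Gaussian pyramid: blur with a 5-tap binomial kernel, then downsample
    by 2 at each level (a common approximation of OpenCV's pyrDown)."""
    kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    pyramid = [image.astype(np.float64)]
    for _ in range(levels - 1):
        img = pyramid[-1]
        # Separable blur: filter rows, then columns (reflect-padded).
        blurred = np.apply_along_axis(
            lambda r: np.convolve(np.pad(r, 2, mode="reflect"), kernel, "valid"),
            1, img)
        blurred = np.apply_along_axis(
            lambda c: np.convolve(np.pad(c, 2, mode="reflect"), kernel, "valid"),
            0, blurred)
        pyramid.append(blurred[::2, ::2])  # downsample by a factor of 2
    return pyramid

def sample_patches(image, patch_size=16, n_patches=64, seed=0):
    """Randomly sample square training patches from a defect-free image."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    ys = rng.integers(0, h - patch_size + 1, n_patches)
    xs = rng.integers(0, w - patch_size + 1, n_patches)
    return np.stack([image[y:y + patch_size, x:x + patch_size]
                     for y, x in zip(ys, xs)])

image = np.random.default_rng(1).random((128, 128))  # stand-in fabric sample
pyramid = gaussian_pyramid(image, levels=3)
patches = sample_patches(pyramid[0])  # one CDAE is trained per pyramid level
```

Each pyramid level would then feed its own CDAE, so that small defects are caught at fine scales and large defects at coarse scales.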

The main contributions of this paper are summarized in the following points.

- We propose a new non-motif-based method, MSCDAE, which offers good compatibility for fabric defect detection. This learning-based model is suitable for both the p1 and non-p1 types of fabrics, and experimental results have verified its good performance.
- The multi-pyramid and CDAE architectures in this model are novel. Specifically, processing in a multi-scale manner with pyramids helps capture sufficient textural properties, which are often data independent. In addition, the CDAE network distinguishes defective and defect-free patches easily through reconstruction residual maps, which are intuitive to interpret.
- The model is trained in an unsupervised way, so no labeled ground truth or human intervention is needed. Furthermore, only defect-free samples are required for training. All these properties make the method easier to apply in practice.

The remainder of this paper is organized as follows. In Section 2, we briefly review the fabric defect detection methods and the foundations of the CDAE network. The reconstruction residual of this network serves as the core indicator of direct pixel-wise prediction for defect detection in later experiments. Then, in Section 3, procedures of the proposed MSCDAE model are described in detail, and steps to train this model and test candidate images are summarized. In Section 4, the overall performance of the proposed method is analyzed and compared with other well-known defect inspection methods. Relevant points about parameter selection are also discussed in this section. Implementation details of the proposed method are illustrated in Section 5. Finally, we give our conclusions in Section 6.

## 2. Related Works and Foundations

As previously stated, traditional fabric defect detection methods can be categorized into two main groups: non-motif-based and motif-based groups. According to [2,6,8,11], the majority of these methods belong to the first group. In this group, methods can be further subdivided into six categories: statistical, spectral, model-based, learning-based, structural and hybrid approaches [2], as shown in Figure 4. The statistical approach is a widely used method that seeks to distinguish defective and defect-free regions according to their different statistical characteristics, e.g., similarity, uniformity, and regularity. Typical statistical approaches include the auto-correlation metric, co-occurrence matrix and fractal dimension methods. The spectral approach is another widely used method for defect inspection that highlights the difference between defective and defect-free regions in the frequency domain. Fourier transformation, wavelet transformation and Gabor transformation are all typical spectral methods [17,18]. Though effective, these methods are usually computationally demanding. Model-based and structural approaches are relatively less common, possibly because these two approaches are highly data dependent. The models designed or the texture primitives utilized vary for different types of patterned textures. The hybrid approach is a combined architecture that integrates two or more strategies for defect inspection [11,13,19]. Hybrid approaches are usually capable of coping with different types of defects, and their inspection performance tends to be more effective and robust. However, these methods also have high time and computation costs. Moreover, it is generally difficult to design such a generalized approach. Motif-based approaches are methods that consider elementary motifs as basic manipulation units. These methods generally have better generality than the non-motif-based methods. However, currently, they cannot tackle the patterned textures of the p1 group, e.g., plain and twill fabrics, very well. Additionally, they tend to be sensitive to noise and nonuniform illumination in situations in which the working environment is relatively poor. Here, we will mainly concentrate on the learning-based approach.
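To make the spectral idea concrete (this is an illustration of the general principle, not a reimplementation of any cited method), the following numpy sketch suppresses the dominant Fourier components of a synthetic periodic "fabric" so that an aperiodic defect stands out in the residual; the removed-fraction quantile is an arbitrary choice:

```python
import numpy as np

def spectral_residual(image, remove_fraction=0.1):
    """Illustrative Fourier-domain detector: zero out the strongest frequency
    components (the periodic fabric background) and reconstruct the rest.
    Aperiodic defects spread over many frequencies and so survive the filter."""
    spectrum = np.fft.fft2(image)
    magnitude = np.abs(spectrum)
    # Frequencies above the (1 - remove_fraction) quantile are treated as
    # the repetitive background and suppressed.
    cutoff = np.quantile(magnitude, 1.0 - remove_fraction)
    filtered = np.where(magnitude >= cutoff, 0.0, spectrum)
    return np.abs(np.fft.ifft2(filtered))

# Periodic "fabric" texture with a small bright blob as a defect.
y, x = np.mgrid[0:64, 0:64]
fabric = 0.5 + 0.5 * np.sin(2 * np.pi * x / 8)
fabric[30:34, 30:34] += 1.0
residual = spectral_residual(fabric)  # large only near the defect
```

The computational cost of the two FFTs on every inspected frame is one reason such spectral methods are described above as computationally demanding.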

Learning-based approaches, especially methods with deep neural network architectures, are very promising for defect inspection. In recent years, many studies have investigated this field and explored better strategies for defect inspection [20,21,22,23]. However, the majority of these studies use supervised learning, which often requires large amounts of labeled defective samples for model training [23]. The autoencoder (AE) network is a typical unsupervised method that has been widely used in shape retrieval [24], scene description [25], target recognition [26,27] and object detection [28]. It can be trained without any labeled ground truth or human intervention. Since it is also an important component of the proposed model, the foundations and developments of this network are briefly described below.

AE networks are based on an encoder-decoder paradigm that is usually trained in an unsupervised fashion and in a fully connected form. Convolutional autoencoder (CAE) networks differ from conventional AE networks in that they retain the structure information of the 2-D images and the weights are shared among local positions. The architecture of a typical CAE contains an encoder part with convolutional and pooling layers and an analogous decoder part with deconvolutional and upsampling layers. The encoder and decoder parts can be defined as transitions $\varphi $ and $\psi $ such that:

$$\varphi :\mathcal{X}\to \mathcal{F},\qquad \psi :\mathcal{F}\to \mathcal{X}$$

where $\mathbf{x}\in {\mathbb{R}}^{d}=\mathcal{X}$ refers to an image patch in the $\mathcal{X}$ domain, and $\mathbf{z}=\varphi \left(\mathbf{x}\right)\in {\mathbb{R}}^{p}=\mathcal{F}$ refers to the corresponding hidden layer map in the $\mathcal{F}$ domain. Assume that ${\mathbf{x}}^{\prime}$ denotes the reconstruction; then, the encoder and decoder processes can be expanded as:

$$\mathbf{z}=\sigma \left(\mathbf{W}\circ \mathbf{x}+\mathit{b}\right),\qquad {\mathbf{x}}^{\prime}={\sigma}^{\prime}\left({\mathbf{W}}^{\prime}\circ \mathbf{z}+{\mathit{b}}^{\prime}\right)$$

where “∘” is the convolution process, $\mathbf{W}$ and ${\mathbf{W}}^{\prime}$ are the weight matrices, $\mathit{b}$ and ${\mathit{b}}^{\prime}$ are the bias vectors for the encoder and decoder, respectively, and $\sigma $ and ${\sigma}^{\prime}$ are the nonlinear mapping processes, specifically, the convolutional, pooling, deconvolutional, and upsampling processes. In particular, the pooling and upsampling processes are usually conducted in the form of max-pooling and max-unpooling [29]. The CAE model can be trained to minimize the reconstruction error (such as the mean squared error):

$$\mathcal{L}\left(\mathbf{x},{\mathbf{x}}^{\prime}\right)=\frac{1}{N}\sum_{i=1}^{N}\sqrt{{\parallel {\mathbf{x}}_{i}-{\mathbf{x}}_{i}^{\prime}\parallel}^{2}}+\lambda \mathcal{R}$$

where N is the number of samples, $\lambda $ is a constant that balances the relative contributions of the reconstruction term and the regularization term $\mathcal{R}$ (e.g., a weight decay penalty on $\mathbf{W}$ and ${\mathbf{W}}^{\prime}$), and $\sqrt{{\parallel {\mathbf{x}}_{i}-{\mathbf{x}}_{i}^{\prime}\parallel}^{2}}$ is the reconstruction residual of the i-th image patch.
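A toy numpy sketch may make the encoder/decoder notation concrete. It uses a single untrained random filter per stage, omits the pooling/unpooling steps, and stands in for no particular trained model; all names and sizes are illustrative:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'same'-size 2-D convolution (zero-padded), standing in for the
    '∘' operation in the encoder/decoder equations."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2,), (kw // 2,)), mode="constant")
    out = np.zeros_like(image, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel[::-1, ::-1])
    return out

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
x = rng.random((16, 16))                   # input patch
W, b = rng.normal(0, 0.1, (3, 3)), 0.0     # toy encoder weights / bias
W2, b2 = rng.normal(0, 0.1, (3, 3)), 0.0   # toy decoder weights / bias

z = sigmoid(conv2d(x, W) + b)         # encoder: z = σ(W ∘ x + b)
x_rec = sigmoid(conv2d(z, W2) + b2)   # decoder: x' = σ'(W' ∘ z + b')

# Per-patch reconstruction residual, sqrt(||x - x'||^2), as in the loss.
residual = np.sqrt(np.sum((x - x_rec) ** 2))
```

Training would adjust W, b, W2, and b2 by gradient descent to drive this residual down on defect-free patches.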

The CDAE network is slightly different from the CAE in that it takes partially corrupted inputs for model training and aims to recover the original undistorted inputs. This is done by first corrupting the initial input $\mathbf{x}$ into $\tilde{\mathbf{x}}$ by means of a stochastic mapping $\tilde{\mathbf{x}}\sim q\left(\tilde{\mathbf{x}}|\mathbf{x}\right)$. Assume that ${\tilde{\mathbf{x}}}^{\prime}$ is the reconstruction of the corrupted data $\tilde{\mathbf{x}}$; then, the loss in the CDAE model is measured by the reconstruction error $\mathcal{L}\left(\mathbf{x},{\tilde{\mathbf{x}}}^{\prime}\right)$, as shown in Figure 5b. The concrete form of $\mathcal{L}\left(\mathbf{x},{\tilde{\mathbf{x}}}^{\prime}\right)$ is similar to that of $\mathcal{L}\left(\mathbf{x},{\mathbf{x}}^{\prime}\right)$ in Equation (3). In general, the CDAE model is conducted in a stacked form, which allows hierarchical feature extraction from unlabeled samples, as shown in Figure 5c. A stochastic gradient descent algorithm [30] can be easily used to optimize all of these neural network models.
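The stochastic mapping $q(\tilde{\mathbf{x}}|\mathbf{x})$ is commonly implemented as masking noise that zeroes random pixels. A minimal numpy sketch follows; the corruption level is an illustrative choice, and the loss here is measured against the clean input, as in the CDAE, using the identity in place of a trained reconstruction purely to show the bookkeeping:

```python
import numpy as np

def corrupt(x, corruption_level=0.3, seed=0):
    """Stochastic mapping q(x~|x): masking noise that randomly zeroes a
    fraction of the input pixels, a common choice for denoising autoencoders."""
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) >= corruption_level  # keep ~70% of pixels
    return x * mask

x = np.random.default_rng(1).random((16, 16))  # clean input patch
x_tilde = corrupt(x)                           # corrupted input fed to the CDAE

# A trained CDAE would map x_tilde to a reconstruction x_tilde'; the loss
# L(x, x_tilde') compares it against the *clean* input. With the identity
# standing in for the network, the loss is just the energy of the removed pixels:
loss = np.sqrt(np.sum((x - x_tilde) ** 2))
```

Because the target is the uncorrupted patch, minimizing this loss forces the network to model the regular fabric texture rather than memorize its input.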