An Algorithm for Surface Defect Identification of Steel Plates Based on Genetic Algorithm and Extreme Learning Machine

Defects on the surface of steel plates are one of the most important factors affecting the quality of steel plates. It is of great importance to detect such defects through online surface inspection systems, whose ability of defect identification comes from self-learning through training samples. Extreme Learning Machine (ELM) is a fast machine learning algorithm with a high accuracy of identification. ELM is implemented by a hidden matrix generated with random initialization parameters, while different parameters usually result in different performances. To solve this problem, an improved ELM algorithm combined with a Genetic Algorithm was proposed and applied for the surface defect identification of hot rolled steel plates. The output matrix of the ELM’s hidden layers was treated as a chromosome, and some novel iteration rules were added. The algorithm was tested with 1675 samples of hot rolled steel plates, including pockmarks, chaps, scars, longitudinal cracks, longitudinal scratches, scales, transverse cracks, transverse scratches, and roll marks. The results showed that the highest identification accuracies for the training and the testing set obtained by the G-ELM (Genetic Extreme Learning Machine) algorithm were 98.46% and 94.30%, respectively, which were about 5% higher than those obtained by the ELM algorithm.


Introduction
Surface defect detection techniques are widely applied in industrial scenarios [1]. The surface quality inspection of steel plates has passed through three stages of development: manual visual inspection, traditional non-destructive testing, and machine vision detection. Manual visual inspection commonly uses the stroboscopic method [2], which sets up a high-frequency flashing light source above the production line and uses the persistence of human vision to achieve high-speed inspection of steel plates. This kind of detection is harmful to the human body and can result in eye fatigue as well as a higher false inspection rate. Due to the high-frequency flashing, the workers cannot inspect all parts of the steel surface, so a large number of defects are ignored.
The traditional non-destructive detection techniques include eddy current testing, infrared detection, magnetic flux leakage detection, and laser detection. Since these techniques are limited by their detection principles, only a small number of defect types can be detected. At the same time, the resolution of the acquired images is not high enough, so these techniques cannot effectively evaluate product quality. With the development of computer image processing technology, machine vision with a Charge Coupled Device (CCD) has become widely used in industrial visual inspection, and the invention of high-speed CCD cameras enables fast image acquisition; in the 1980s, some organizations began developing such machine vision inspection systems [3].
A number of improved ELM-based algorithms have been developed, such as I-ELM (incremental ELM) [20], OS-ELM (on-line sequential ELM) [21], EI-ELM (enhanced incremental ELM) [22], OP-ELM (optimally pruned ELM) [23], EM-ELM (error minimized ELM) [24], EOS-ELM (ensemble OS-ELM) [25], and so on. In the field of defect identification, ELM also plays a significant role. Li et al. [26] employed ELM for the identification of glass bottle defects and achieved high identification rates. Zhang et al. [27] applied ELM for the identification of solder joint defects. On the other hand, as the weights and biases of ELM are randomly chosen, the results differ between training sessions: even when a training result is good enough, it is not stable, and the random weights generally do not reach the optimum of the current network. This limits the application of ELM. In this paper, an improved ELM algorithm named G-ELM is proposed, which can effectively avoid the instability caused by randomization with the help of a Genetic Algorithm [28].
The remainder of the paper is organized as follows: Section 2 briefly describes the principles of the ELM algorithm and the Genetic Algorithm. Section 3 introduces the G-ELM algorithm, including its principles, elements, and implementation. The advantages of the proposed algorithm are as follows: (1) the GA helps to eliminate the instability caused by random parameter initialization; (2) some new iteration rules are added to improve the efficiency of the evolution; and (3) some evolution methods are proposed to speed up convergence. In Section 4, the online detection system is introduced and some defect origins are discussed. In Section 5, the original ELM and G-ELM are compared and analyzed experimentally. Section 6 concludes the paper.

ELM Algorithm
The ELM algorithm was proposed by Huang Guangbin [20] as a kind of single hidden layer feedforward neural network (SLFN). The main idea of ELM is to assign the SLFN input weights and biases randomly, with an infinitely differentiable activation function at the hidden nodes, so that the optimal output weights can be obtained by a generalized matrix inverse operation. Training the network takes only a few simple steps and is very fast. However, the random weights and biases result in instability across different random initializations.
Suppose we have M mutually exclusive samples (x_i, y_i), x_i ∈ R^d, y_i ∈ R; then an SLFN network with N hidden nodes can be expressed as follows:

$$\hat{y}_j = \sum_{i=1}^{N} \beta_i \, f(w_i \cdot x_j + b_i), \quad j = 1, \ldots, M,$$

where f is the activation function, and w_i, b_i, β_i are the input weights, bias, and output weight of the ith neuron of the hidden layer, respectively. If the SLFN can perfectly predict the data, that is, the difference between the predicted value ŷ_j and the ground truth y_j is 0, then the above equation can be written as:

$$\sum_{i=1}^{N} \beta_i \, f(w_i \cdot x_j + b_i) = y_j, \quad j = 1, \ldots, M.$$

The equation can be abbreviated as

$$H\beta = Y,$$

where H is the output matrix of the hidden layer, defined as follows:

$$H = \begin{bmatrix} f(w_1 \cdot x_1 + b_1) & \cdots & f(w_N \cdot x_1 + b_N) \\ \vdots & \ddots & \vdots \\ f(w_1 \cdot x_M + b_1) & \cdots & f(w_N \cdot x_M + b_N) \end{bmatrix}_{M \times N},$$

with β = [β_1, ..., β_N]^T and Y = [y_1, ..., y_M]^T. Given a randomly initialized input layer and the training data x_i ∈ R^d, the output matrix of the hidden layer H can be calculated. With H and the target outputs y_i ∈ R, the output weights are obtained as β = H†Y, where H† is the Moore-Penrose pseudoinverse of H [29].
In general, the flow of the ELM algorithm is as follows:
Step 1: Given the training set (x_i, y_i), x_i ∈ R^d, y_i ∈ R, the activation function f : R → R, and the number of hidden nodes N.
Step 2: Randomize the initial input weights w_i and biases b_i, i ∈ [1, N].
Step 3: Calculate the hidden layer output matrix H.
Step 4: Calculate the output weights β = H†Y.
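The four steps above can be sketched in a few lines of Python (an illustrative implementation, not the authors' Matlab code; the helper names and the sigmoid activation are assumptions):

```python
import numpy as np

def elm_train(X, Y, n_hidden=200, rng=None):
    """Minimal ELM training sketch.

    X: (M, d) training features; Y: (M, c) targets.
    Returns random input weights w, biases b, and solved output weights beta.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    w = rng.uniform(-1.0, 1.0, size=(n_hidden, d))  # Step 2: random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)       # Step 2: random biases
    H = 1.0 / (1.0 + np.exp(-(X @ w.T + b)))        # Step 3: hidden output (sigmoid f)
    beta = np.linalg.pinv(H) @ Y                    # Step 4: beta = H†Y (Moore-Penrose)
    return w, b, beta

def elm_predict(X, w, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ w.T + b)))
    return H @ beta
```

Because the only trained parameters (β) are obtained by a single pseudoinverse solve, training is very fast; the price, as discussed above, is that each random draw of w and b yields a different model.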

Genetic Algorithm
The Genetic Algorithm [28] is an algorithm that simulates the evolution of species in nature. It is widely applied to optimization and search problems. Inspired by biology, concepts such as mutation, chromosome crossover, and selection are applied in the Genetic Algorithm.
The flow of the Genetic Algorithm is as follows:
Step 1: Produce an initial generation G_0.
Step 2: Produce offspring from the current generation by mutation, chromosomal crossover, selection, or other operations, denoted as G_{i+1} = f_m(G_i).
Step 3: Evaluate the fitness of the new generation, checking whether f_t(G_{i+1}) < ε.
Step 4: If the condition in Step 3 holds, the calculation is complete; otherwise, return to Step 2 to continue producing offspring.
Step 5: If the maximum number of iterations is reached, the algorithm exits.
In the above process, G_i represents the ith generation, f_m represents the mutation function, i_max represents the maximum number of iterations, f_t represents the fitness function, and ε is a customized small value greater than 0.
There are two important aspects of the Genetic Algorithm. One is the mutation operator used to generate offspring; the other is the fitness operator used to determine which offspring survive.
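A minimal mutation-and-selection loop of this kind might look as follows (all names and the toy problem are illustrative, not taken from the paper):

```python
import numpy as np

def genetic_search(fitness, init, mutate, eps=1e-3, i_max=1000, rng=None):
    """Generic Genetic-Algorithm loop sketch (mutation + selection only).

    fitness: individual -> non-negative score to minimize (the fitness operator).
    init:    rng -> initial individual G_0.
    mutate:  (individual, rng) -> candidate offspring (the mutation operator).
    """
    rng = np.random.default_rng(rng)
    g = init(rng)                        # Step 1: initial generation G_0
    for _ in range(i_max):               # Step 5: stop at i_max iterations
        if fitness(g) < eps:             # Steps 3-4: fitness below eps -> done
            break
        child = mutate(g, rng)           # Step 2: produce offspring by mutation
        if fitness(child) < fitness(g):  # selection: keep the fitter individual
            g = child
    return g

# Toy usage: minimize ||x - target|| by small random mutations.
target = np.array([1.0, -2.0, 0.5])
best = genetic_search(
    fitness=lambda x: float(np.linalg.norm(x - target)),
    init=lambda r: r.normal(size=3),
    mutate=lambda x, r: x + r.normal(scale=0.1, size=3),
    eps=1e-2, i_max=20000, rng=0)
```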

G-ELM
The input weights and biases of the ELM are combined to form an input matrix:

$$X = [w \mid b] = \begin{bmatrix} w_{1,1} & \cdots & w_{1,d} & b_1 \\ \vdots & & \vdots & \vdots \\ w_{N,1} & \cdots & w_{N,d} & b_N \end{bmatrix}_{N \times (d+1)}.$$

The matrix X is considered as a single individual, or chromosome, and each element in it is a minimum mutation unit. Taking the characteristics of the matrix X into account, some operations, such as crossover, are not compatible with this problem.
Mutation is a key point of this algorithm; to increase the correctness and success rate of the mutation, some novel mutation rules that produce new offspring more effectively are added to the G-ELM.

The G-ELM Procedure
The process is as follows. Given the number of hidden nodes N and the number of training samples M, with N ≤ M and normally N much smaller than M.
A single training sample is (x_i, y_i), x_i ∈ R^d, y_i ∈ R, where d is the number of features of the sample. ELM randomizes an initial input weight matrix iw of size N × d and a bias column vector b of length N. This initialization is repeated m times, so m initial parents are obtained, called the initial parent group. The fitness function is defined as the training error between the predicted values ŷ_i and the ground truths y_i. Then set ε and the maximum iteration limit k_max. If f_fitness(G_0) < ε or k = k_max, terminate the training; otherwise enter the evolution iteration. When the elements of one generation G_k change from one status to another, it is called a mutation operation. Every time a mutation operation proceeds, certain rules need to be followed.
Suppose the kth generation is G_k, and the elements in the matrix are g_{i,j} (1 ≤ i ≤ N, 1 ≤ j ≤ d+1). The mutation operator O_m consists of the following steps: Step 1: Choose all selectable elements, excluding the locked ones.
Step 2: Determine the number of variation elements according to the mutation rate v t .
Step 3: Determine the new element value g_{i,j} according to the variation fluctuation range r.
Step 4: Mutate m times with different randomized parameters.
Step 5: Choose the model of smallest fitness among m models as the candidate for the generation k + 1.
Step 6: Check whether this candidate performs better than the current generation G_k and, if so, choose it to be G_{k+1}; otherwise, give up the current group of offspring and restart from G_k to produce a new set of offspring.
Step 7: Cycle the above steps until the training condition is reached or the maximum number of iterations is reached.
The detailed procedure is shown in Figure 1.
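The mutation loop of Steps 1-7 can be sketched as follows (a simplified illustration: the superior-gene locking and restart rules are omitted, and all names and default values are assumptions, not the authors' code):

```python
import numpy as np

def gelm_train(Xtr, Ytr, n_hidden=20, m=10, v=0.1, r=0.5, k_max=200, rng=0):
    """G-ELM sketch. Chromosome: the N x (d+1) matrix [w | b].

    Each iteration mutates m copies of the parent, solves beta = H†Y for each,
    and keeps the fittest candidate only if it beats the parent (Steps 4-6).
    """
    rng = np.random.default_rng(rng)
    d = Xtr.shape[1]

    def hidden(chrom):
        w, b = chrom[:, :d], chrom[:, d]
        return 1.0 / (1.0 + np.exp(-(Xtr @ w.T + b)))     # sigmoid activation

    def fitness(chrom):
        H = hidden(chrom)
        beta = np.linalg.pinv(H) @ Ytr                    # ELM closed-form solve
        return float(np.linalg.norm(H @ beta - Ytr))      # training residual

    chrom = rng.uniform(-1, 1, size=(n_hidden, d + 1))    # initial parent G_0
    best_fit = fitness(chrom)
    for _ in range(k_max):
        cands = []
        for _ in range(m):                                # Steps 4-5: mutate m times
            child = chrom.copy()
            mask = rng.random(child.shape) < v            # Step 2: mutation rate v
            child[mask] += rng.uniform(-r, r, mask.sum()) # Step 3: fluctuation range r
            cands.append((fitness(child), child))
        f, c = min(cands, key=lambda t: t[0])             # fittest of the m candidates
        if f < best_fit:                                  # Step 6: accept only if better
            best_fit, chrom = f, c
    return chrom, best_fit
```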

Mutation Operation Rules
In order to speed up the evolution rate, some new mutation methods are proposed, including superior gene selection, dynamic variability, and regional mutation.

(1) Superior gene selection: Whenever a new offspring is generated, it is assumed that all the elements of variation become an element group M. If it is obvious that element group M is the main reason for this successful mutation, then in order to preserve these "better genes", the next generation of variation elements will not contain the elements in element group M. Experimentation shows that this practice can greatly improve the efficiency of the algorithm.
(2) Dynamic variability: Set the base mutation rate v_b, the highest mutation rate v_h, and the variation step rate v_s. For each iteration t, the mutation rate v_t is defined as:

$$v_t = \begin{cases} v_b + t \cdot v_s, & v_b + t \cdot v_s \le v_h, \\ v_b, & \text{otherwise}. \end{cases}$$

When one generation reaches a bottleneck of evolution, the mutation rate v_t is changed actively to improve the success rate of evolution. However, a mutation rate that is too high, such as 0.7, also makes it difficult to mutate successfully. The mutation rate is therefore gradually increased until the highest rate v_h is reached, after which it returns to the base mutation rate v_b so as not to fall into a local optimum. In general, the training accuracy improves within a limited number of iterations.
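This schedule can be sketched as a small helper, assuming a linear climb by v_s per stalled iteration with a wrap back to v_b once the ceiling v_h is passed (the paper's exact formula may differ):

```python
def mutation_rate(t, v_b=0.1, v_s=0.05, v_h=0.3):
    """Dynamic variability sketch: the rate climbs by v_s per stalled step t,
    then wraps back to the base rate v_b after reaching the ceiling v_h."""
    steps = round((v_h - v_b) / v_s) + 1   # number of distinct rates per cycle
    return v_b + (t % steps) * v_s
```

With v_b = 0.1, v_s = 0.05, and v_h = 0.3, the rate cycles through 0.1, 0.15, 0.2, 0.25, 0.3 and then resets, matching the described climb-and-reset behavior.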

Surface Inspection of Hot Rolled Steel Plates
There are three main procedures of the steel production industry, including continuous casting, hot rolling, and cold rolling. In every procedure, the surface status of steel products varies with different features. This paper is focused on the surface defect identification of hot rolled steel plates, and all samples were collected from several surface inspection systems installed on hot rolling steel plate lines, which were developed by Xu et al. [30]. Due to the hostile environment of hot rolling lines, the contrast and sharpness of samples are often not very good.
The system is installed after the hot straightener, where the surface temperatures of the steel plates are about 600-800 °C. As demonstrated in Figure 2, the hot rolled steel plates are illuminated with green linear laser lighting. The wavelength of the lasers is 532 nm, which is very far from the spectrum of high-temperature radiation. Furthermore, a narrow-band color filter with a central wavelength of 532 nm is installed in front of each camera lens, and only laser light with a wavelength of 532 nm reflected by the steel plates is allowed to enter the cameras. The surface of the high-temperature steel plates is thus imaged with high quality, and defects are visible in the images. In a surface inspection system for a 5000 mm steel plate production line, eight line-scanning CCD cameras with 4096 pixels are used, four of which acquire images of the top side of the steel plates, while the other four cover the bottom side. Each camera view is 1200 mm, and four cameras can cover 4700 mm of the width after subtracting the overlaps between adjacent cameras. The resolution of images in the width direction is 1200/4096 ≈ 0.3 mm/pixel. The resolution in the length direction is the distance between two adjacent lines captured by the camera; to keep the two directions at the same resolution, this distance is also 0.3 mm. A rotary encoder is installed on the roller to acquire the real-time speed of the production line, and all cameras of the system are triggered once every 0.3 mm by the encoder. As the height of an image is 1024 pixels, the length of plate covered by one image is 1024 × 0.3 ≈ 307 mm.
Normally, a hot rolled steel plate is about 25 m long, and about 8 × 25,000/307 ≈ 648 images are needed to cover the whole length and width of both sides of the steel plate.
One of the most common defects of hot rolled steel plates is the scratch, including the longitudinal scratch (Figure 3e) and the transverse scratch (Figure 3h). Scratches are usually caused by contact with hard objects or corners, or by the relative movement of steel plates and rollers resulting from a change in velocity.

Surface Defects of Steel Plates
Another frequent kind of defect is the scale (Figure 3f). Some scales cover the steel surface, while others are rolled into the steel plates. Identification of scales is very difficult because of their diverse shapes and distributions.
Roll marks (Figure 3i) are another common kind of defect. They are periodic, as they are caused by foreign matter and pits on the rollers. By calculating the cycle of the roll marks, the problematic roller can be identified.

Experiments
The dataset consists of 1675 samples of hot rolled steel plates, which were collected with the surface inspection system of a 5000 mm hot rolling line. Of these, 836 samples are used as the training set, while the other 839 samples are used as the testing set. During each experiment, only the training samples are used to optimize the model. The testing set is much larger than is typical (normally about 10% of the total samples) in order to test the generalization of the model. As illustrated in Section 4.1, there are nine types of defects in the dataset, including pockmarks, chaps, scars, longitudinal cracks, longitudinal scratches, scales, transverse cracks, transverse scratches, and roll marks. The image size is 128 × 128 pixels. Feature extraction is carried out with the original Local Binary Pattern (LBP) operator [31], yielding a 256-dimensional feature vector, so the features of all training samples form a matrix of 836 × 256. The algorithm was developed with Matlab (R2016b, Version 9.1, MathWorks Inc., Natick, MA, USA) and run on a MacBook Air computer (2011 model, Apple Inc., Cupertino, CA, USA; CPU 1.4 GHz Intel Core i5, memory 4 GB 1600 MHz).
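The original LBP operator mentioned above can be sketched as follows (a minimal illustration of the 3 × 3, 256-bin variant in Python; not the authors' Matlab implementation, and the fixed neighbour ordering is an assumption):

```python
import numpy as np

def lbp_histogram(img):
    """Original 3x3 LBP sketch: threshold each pixel's 8 neighbours against
    the centre pixel, read the 8 bits as a byte (0-255), and histogram the
    codes into a 256-dimensional normalized feature vector."""
    img = np.asarray(img, dtype=np.int32)
    c = img[1:-1, 1:-1]                       # centre pixels (border excluded)
    # 8 neighbours in a fixed clockwise order, each contributing one bit
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.int32) << bit)
    hist = np.bincount(code.ravel(), minlength=256)
    return hist / hist.sum()                  # 256-dimensional feature vector
```

Applied to a 128 × 128 defect image, this returns one 256-dimensional row of the 836 × 256 training matrix described above.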

Comparison between ELM and G-ELM
In order to verify the efficiency of the proposed G-ELM, the training set is also used to train the ELM model. The ELM and G-ELM algorithms both use 200 hidden nodes, and the number of G-ELM iterations is 1000. The first-generation individual is selected from the best of 20 random ELM models. Table 1 gives the experimental results of the G-ELM algorithm compared with the ELM algorithm. In Table 1, v_b and v_h are the minimum and maximum boundaries of the evolution mutation rate. Compared to the results of ELM, the improved G-ELM has better accuracy for both the training set and testing set because of its additional mutation operation rules. For example, the worst accuracies for the training set and testing set by G-ELM are 94.98% and 89.93%, which are 1.78% and 0.30% higher than those of ELM, respectively. Moreover, the accuracies for both the training set and the testing set are high, which means that the generalization of G-ELM is sufficiently strong.
For G-ELM, with the increase of the values of v_b and v_h, the accuracies of the training set and testing set continue to increase until the maximum values are reached, after which they decrease, as shown in Figure 4. In Table 1, when the value of v_b increases from 0.001 to 0.1, the accuracy for the training set improves from 94.98% to 98.46%, which is the highest accuracy. However, the accuracy decreases from 98.46% to 94.74% when v_b increases to 0.6. This can be explained by the fact that the lower the mutation rate, the fewer the mutation elements, which makes it more difficult to mutate successfully. When the mutation rate is too high, there are too many mutation elements, causing the loss of superior genes. Only an appropriate mutation rate combined with reasonable mutation rules can achieve a higher rate of successful mutation while preserving effective genes.

Analysis of the Performance of G-ELM
In this section, the performance of G-ELM is discussed, including the training history and the number of iterations. Figure 5 shows the training history of the training set and testing set at v_b = 0.1, v_h = 0.3, in which there are 10 successful evolutions after about 7000 iterations in total. It can be seen that during the 10 generations, the accuracy of the training set increases more steadily than that of the testing set. Moreover, in the 10th generation, the accuracies of both the training set and the testing set are the highest, meaning that the model is optimal. For the first six generations, the accuracy of the testing set rises rapidly. However, the accuracy of the testing set fluctuates significantly from the sixth generation to the 10th generation because the generalization ability is weakened. Therefore, the accuracy at the 10th generation is optimal.
Note that as the number of generations increases, it becomes harder to achieve a successful evolution. In Table 2, the larger the number of generations, the more iterations are needed: for instance, the eighth generation needs 830 iterations, while the later generations require up to 5690 iterations. When the number of generations is very large, it becomes very difficult to achieve further evolution. With the increase in the number of generations, the accuracies of the training set and testing set become higher, since the model moves closer to the optimal solution. From the first generation to the 10th, the accuracy of the testing set increases from 90.93% to 94.43%. Furthermore, in Table 2, some accuracies of the training set are the same; for example, the accuracy of the seventh generation is the same as that of the fifth and ninth generations, because the generalization ability of the model is unstable, especially when the number of iterations is high.

There is an increase in time consumption, especially from the seventh to the 10th generation, where the CPU time increases from 7.35 s to 250.95 s. Under certain circumstances, such as online detection, there is not enough time to search for the global optimum, so a limited number of iterations is also acceptable. In this case, the seventh generation is a highly cost-effective solution in terms of accuracy and time.

Conclusions
An improved ELM algorithm named G-ELM was proposed and applied to the defect identification of steel plates. The G-ELM algorithm employs some additional mutation rules, which can offset the uncertainties caused by ELM randomization. The training results of G-ELM vary considerably with the mutation rate. Results of experiments with nine typical defect samples showed that the G-ELM algorithm effectively improved the identification accuracy of the ELM algorithm. Under the conditions of v_b = 0.1 and v_h = 0.3, the G-ELM algorithm performs best. The highest identification accuracies for the training and testing set obtained by the G-ELM algorithm were 98.46% and 94.30%, respectively, which were about 5% higher than those obtained by the ELM algorithm.
Acknowledgments: This work is sponsored by The National Natural Science Foundation of China (No. 51674031).
Author Contributions: Siyang Tian conceived, designed and performed the experiments; Ke Xu contributed experiment data, materials and experiment equipments; Siyang Tian and Ke Xu analyzed the data; Siyang Tian wrote the paper; Ke Xu revised the paper.

Conflicts of Interest:
The authors declare no conflict of interest.
