Using Entropy for Welds Segmentation and Evaluation

In this paper, a methodology for evaluating weld quality is developed, based on weld segmentation using entropy and on evaluation by conventional and convolution neural networks. In contrast to the conventional neural networks, the convolution neural networks in our experiments require no image preprocessing (entropy-based weld segmentation) or special data representation. The experiments were performed on 6422 weld image samples, and the performance results of both types of neural network are compared to conventional methods. In all experiments, the neural networks implemented and trained using the proposed approach delivered excellent results, with a success rate of nearly 100%. The best results were achieved by convolution neural networks, which required almost no preprocessing of the image data.


Introduction
The Fourth Industrial Revolution (Industry 4.0) has opened space for research and development of new manufacturing methods, systems and equipment based on innovations such as computing intelligence, autonomous robots, big data, augmented reality, process simulation, quality management systems, etc. [1].
Weld evaluation is a very important quality control step in many manufacturing processes. Without this technological process, it would be almost impossible to produce welded constructions with the current efficiency, whether in terms of time, price, or material consumption. Welds must therefore be inspected to ensure they meet the specified quality level. In order to detect the possible presence of various weld defects, proper sensing, monitoring and inspection methods are necessary for quality control. A very effective and non-destructive method for weld evaluation is visual inspection. Using this method, the inspection process can be automated to a certain level and performed by computer systems [2,3].
Visual inspection of a weld is an important non-destructive method for weld quality diagnostics that enables checking the welded joint and its various parameters. This examination is carried out first and is able to detect various defects [4].
In this paper, we focus on indirect visual evaluation, which allows the evaluation process to be automated. Indirect inspection can also be applied in places that are not directly accessible, for example the inner surface of a pipeline, the interior of pressure vessels, car body cavities, etc. It also eliminates errors of human judgment and removes errors caused by workers for reasons such as fatigue, inattention or lack of experience.
The improved beamlet transformation for weld toe detection described in [5,6] considers images which are corrupted by noise. The authors aim at detecting the edge borders of welds. Another approach, developed in the context of Industry 4.0, is based on using a visual system for weld recognition and a neural network cloud computing for real-time weld evaluation, both implemented on a single-board low-cost computer. That evaluation system was successfully verified on welding samples corresponding to a real welding process and considerably contributes to weld diagnostics in the industrial processes of small- and medium-sized enterprises. In [18], the same authors use a single-board computer able to communicate with an Android smartphone, which is a very good interface for a worker or his shift manager. The basic result of that paper is a proposal of a weld quality evaluation system consisting of a single-board computer in combination with an Android smartphone.
This paper deals with the development of a software system for visual weld quality evaluation based on weld segmentation using entropy and evaluation by conventional and convolution neural networks. The performance results are compared to the conventional methods (weld segmentation based on entropy and evaluation using conventional neural networks with and without weld segmentation). Most experiments of the proposed method apply to the weld metal; however, one experiment with convolution neural networks also applies to the zones adjacent to the weld. In total, 6422 real and adjusted laboratory samples of welds are used for the experiments. The paper is organized in five sections: Section 2 deals with the preparation of input data for the neural network. Section 3 describes the configuration of the used neural networks and their training process. In Section 4, the results of the experiments are presented. In Section 5, we discuss the results.

Preparation of Input Data for the Neural Network
The input data for the proposed diagnostic system were grayscale laboratory samples of metal sheet welds in JPEG format. The samples were pre-classified as OK (correct) and NOK (incorrect) (Figures 1 and 2). Defective weld samples (NOK) include samples with various surface defects such as an irregular weld bead, excess weld metal, craters, undercut, etc. The weld images are captured under the same illumination and have the same resolution of 263 × 300 pixels. The total number of evaluated sample images was 6422.
However, for several reasons, the image resolution of 263 × 300 pixels is not suitable for a conventional neural network: a large amount of memory must be allocated (on the order of gigabytes for thousands of frames, even at a relatively low resolution) and the network training is time-consuming.

Several suitable options for data processing that eliminate the above problems are presented next. First, the weld/background segmentation is described. Segmentation provides two outputs: the weld mask and the segmented weld itself. Three transformations of the weld mask into a one-dimensional feature vector are then described. The feature vectors serve as inputs for the multilayer perceptron (MLP)/radial basis function (RBF) neural networks. Finally, the size of the segmented/unsegmented weld image is reduced when it is used as input for a conventional neural network (if a CNN is applied, no size reduction is needed).

Weld Segmentation
The sample images depict the weld itself and the background (the metal sheet). The background does not affect the evaluation of the weld and is masked from the images by the proposed algorithm. The simplified flowchart of the algorithm is shown in Figure 3.

After reading the images, the local entropy of each pixel is computed according to [19]:

H = −∑ p_ij · log2(p_ij),

where p_ij represents the probability function for the pixel [i, j].
This value contains information about the complexity/unevenness around the pixel. The neighbourhood radius was set to 8 pixels. To compute the entropy, the filters.rank.entropy function from the Python library scikit-image was used. The resulting local entropy matrix effectively finds the edges and texture complexity in the image. The results of filtering can be seen in Figure 4.
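For illustration, the local entropy computation can be sketched in pure NumPy. This is a simplified variant of what filters.rank.entropy computes: it uses a square (2r + 1) × (2r + 1) window instead of a disk-shaped neighbourhood, and the function name local_entropy is ours, not from the implementation used in the experiments:

```python
import numpy as np

def local_entropy(img, radius=8):
    # H = -sum(p * log2(p)) over the grey-level histogram of the
    # neighbourhood of each pixel (square window; skimage uses a disk)
    h, w = img.shape
    pad = np.pad(img, radius, mode='reflect')
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            counts = np.bincount(win.ravel(), minlength=256)
            p = counts[counts > 0] / win.size
            out[i, j] = -np.sum(p * np.log2(p))
    return out
```

A flat region yields zero entropy, while textured regions (such as the weld bead) yield high values. This plain double loop is slow on 263 × 300 images; the optimized rank-filter implementation in scikit-image is far more efficient.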
As the entropy values were too detailed for our application, blur filtering was applied. The anisotropic blur filter from the imager library was used, as it removes noise and unimportant details while preserving edges better than other types of blur filters. The blur filter with an amplitude of 250 was applied (Figure 5).
The next step is thresholding. In the image matrix, the value 1 (white) represents weld pixels and the value 0 (black) represents the background. Thresholding was implemented using the function threshold from the imager library. The optimal threshold value was computed automatically using the k-means method (Figure 6).
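The automatic threshold selection can be illustrated by a two-class 1-D k-means on the pixel values. This is a sketch of the idea, not the imager implementation; the function name is ours:

```python
import numpy as np

def kmeans_threshold(values, iters=20):
    # two-class 1-D k-means: iterate cluster means, return the midpoint
    # between the two converged centers as the threshold
    v = np.asarray(values, dtype=float).ravel()
    c0, c1 = v.min(), v.max()
    for _ in range(iters):
        mid = (c0 + c1) / 2
        lo, hi = v[v <= mid], v[v > mid]
        if len(lo) == 0 or len(hi) == 0:
            break
        c0, c1 = lo.mean(), hi.mean()
    return (c0 + c1) / 2
```

Pixels above the returned threshold are assigned to the weld (value 1), the rest to the background (value 0).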
The thresholding result may have some imperfections: small blobs and unfilled areas. Unfilled areas are removed using the inverted output of the function bucketfill (imager library). It is applied on the background of the weld and finds all pixels of the background. The remaining pixels are filled with the value 1 (white) (Figure 7a).
Very small blobs were removed using the function clean (imager library). This function reduces the size of objects using morphological erosion and then increases it again. As a result, very small objects are removed and the shape of larger objects is simplified (Figure 7b).
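The effect of clean corresponds to a morphological opening (erosion followed by dilation). A minimal sketch with a 4-neighbourhood structuring element (helper names are ours):

```python
import numpy as np

def erode(mask, k=1):
    # a pixel survives only if it and its 4 neighbours are all set
    m = mask.astype(bool)
    for _ in range(k):
        p = np.pad(m, 1, constant_values=False)
        m = (p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2]
             & p[1:-1, 2:] & p[1:-1, 1:-1])
    return m

def dilate(mask, k=1):
    # a pixel is set if it or any of its 4 neighbours is set
    m = mask.astype(bool)
    for _ in range(k):
        p = np.pad(m, 1, constant_values=False)
        m = (p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2]
             | p[1:-1, 2:] | p[1:-1, 1:-1])
    return m

def clean(mask, k=1):
    # morphological opening: erosion followed by dilation
    return dilate(erode(mask, k), k)
```

Objects smaller than the structuring element vanish during erosion and never come back, while larger objects keep their core and get a simplified outline.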
However, larger blobs were not removed in the previous step. To find the largest object in the image, the function split_connected (imager library) was used (Figure 8).
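Selecting the largest object can be sketched as connected-component labelling by flood fill, keeping the component with the most pixels (a simplified stand-in for split_connected plus a size comparison):

```python
import numpy as np
from collections import deque

def largest_component(mask):
    # label 4-connected components via BFS flood fill,
    # then keep only the component with the largest pixel count
    m = np.asarray(mask, dtype=bool)
    labels = np.zeros(m.shape, dtype=int)
    sizes = {}
    current = 0
    for i, j in zip(*np.nonzero(m)):
        if labels[i, j]:
            continue  # already visited
        current += 1
        labels[i, j] = current
        queue = deque([(i, j)])
        size = 0
        while queue:
            a, b = queue.popleft()
            size += 1
            for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                x, y = a + da, b + db
                if (0 <= x < m.shape[0] and 0 <= y < m.shape[1]
                        and m[x, y] and not labels[x, y]):
                    labels[x, y] = current
                    queue.append((x, y))
        sizes[current] = size
    if not sizes:
        return np.zeros_like(m)
    return labels == max(sizes, key=sizes.get)
```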
The segmentation result, i.e., the mask and the masked weld, can be seen in Figure 9.
Figure 6. (a) Input for thresholding; (b) output: a mask.
Figure 7. Step 4: filling holes (a) and morphological simplification (b).

Vector of Sums of Subfields in the Mask
The first representation of the mask is a vector whose entries are sums over subfields. For input images with a resolution of 263 × 300 pixels, a subfield of 50 × 50 pixels was selected, which corresponds to 36 values. The function for the vector calculation is shown in Algorithm 1.
The function ceiling rounds a number up to the next higher integer. By dividing the index (i, j) by the size of the subfield and subsequently applying the function ceiling, we obtain the subfield index indI/indJ for the selected index i/j. The function as.vector retypes the resulting two-dimensional array into a vector by writing the matrix elements column-wise into a vector. An example of the retyping can be seen in Figures 10 and 11.
Graphs for OK and NOK welds (Figure 12) can be compared in Figure 13: the OK mask graph has every third value (representing the subfields in the image center) at its maximum. The values of the NOK weld graph are distributed over more columns and do not reach the maximum values. The main drawback of this representation is that it can be used only for images of the same size. The benefit is a substantial reduction of the input data (in our case, the number of mask pixels has been reduced 50²-times).
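Under the assumption that Algorithm 1 sums the mask pixels per subfield and flattens the result column-wise (as R's as.vector does), it can be sketched as:

```python
import numpy as np
from math import ceil

def subfield_sums(mask, size=50):
    # sum of mask pixels in each size x size subfield; edge subfields
    # may be smaller when the image size is not a multiple of `size`
    h, w = mask.shape
    out = np.zeros((ceil(h / size), ceil(w / size)))
    for i in range(h):
        for j in range(w):
            out[i // size, j // size] += mask[i, j]
    return out.flatten(order='F')  # column-wise, like R's as.vector
```

For a 263 × 300 mask this yields a 6 × 6 grid, i.e., the 36-element feature vector mentioned above.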

Histogram Projection of the Mask
A histogram projection is a vector containing the sums of the columns and rows of the input image matrix (Figure 14). In the case of an image mask, these sums represent the numbers of white pixels. Thus, the length of the vector corresponds to the sum of the height and width of the image.
In the graphs (Figures 15 and 16) showing the histogram projection of the mask, the difference between correct and defective welds is visible. The projection of the correct weld mask is more even: the sums by columns rise and fall evenly, and the sums per row have only small variations. On the other hand, the histogram projection of the defective weld mask has many irregularities. The disadvantage of this representation is that it cannot be used for input images of different resolutions, and the resulting projection vector is much larger than the other representations. The advantage is its easy implementation and calculation.
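The histogram projection itself is a short computation over the mask matrix; a sketch:

```python
import numpy as np

def histogram_projection(mask):
    # concatenate row sums and column sums of a binary mask;
    # the vector length equals image height + image width
    m = np.asarray(mask)
    return np.concatenate([m.sum(axis=1), m.sum(axis=0)])
```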

Vector of Polar Coordinates of the Mask Boundary
The next representation of a weld mask in this paper is the vector of polar coordinates of the mask boundary. To transform the weld masks, an algorithm has been proposed and implemented. Its main steps are described below.
The first step is to find the x, y coordinates of the mask boundary using the function boundary (imager library). Then, the coordinates of the center of the object [cx, cy] are calculated as the mean of the n boundary coordinates:

cx = (1/n) ∑ xk,  cy = (1/n) ∑ yk.

In the next step, the position of the object is normalized (the center is moved to the position [0, 0]) according to the found coordinates. Then, for each boundary point, the coordinates are converted from Cartesian to polar [r, α] (i.e., distance from the center and angle). According to the Pythagorean theorem, the distance is calculated as follows:

r = sqrt(x² + y²).

The calculation of the angle is realized by Algorithm 2:

Algorithm 2. Calculation of the angle from Cartesian coordinates
procedure Angle(x, y)
    z ← x + 1i * y
    a ← 90 − arg(z) / π * 180
    return round(a mod 360)
end procedure

If the resulting number of coordinates is less than 360, the missing angle values are completed and the corresponding distances are calculated from the surrounding values by linear interpolation using the na_approx function (zoo library). The result is a vector with 360 elements, whose indices correspond to the angle values in degrees and whose values are the distances r. The resulting graphs of the OK and NOK weld masks (Figure 17) are shown in Figures 18 and 19. The representation in the form of polar coordinates for the OK weld visibly differs from the NOK one. The big jumps and variations in the graph are caused by large irregularities in the weld shape. The advantage of this representation is that it can be used for any input mask resolution; the disadvantage is its more complicated calculation. Generally, the mask representations contain information only about the shape of the weld, which can be considered a disadvantage, because texture information is important input data for the neural network.
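Algorithm 2 and the distance computation translate directly to Python. polar_boundary below is our simplified sketch: it skips the linear-interpolation step (na_approx) and leaves missing angles at zero:

```python
import math

def angle(x, y):
    # Algorithm 2: z = x + i*y, a = 90 - arg(z) in degrees, rounded mod 360
    # (angle measured clockwise from the positive y-axis)
    a = 90 - math.degrees(math.atan2(y, x))
    return round(a % 360)

def polar_boundary(points):
    # center the boundary points on their mean, then store the distance
    # of each point at the vector index given by its angle in degrees
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    r = [0.0] * 360
    for x, y in points:
        dx, dy = x - cx, y - cy
        r[angle(dx, dy) % 360] = math.hypot(dx, dy)  # r = sqrt(dx^2+dy^2)
    return r
```

Note that round() may yield 360 for angles just below a full turn, hence the extra % 360 when indexing.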

Data Preparation for Neural Network
Weld images and feature vectors were stored in two data structures of type list. The first list contained the welds classified as NOK (incorrect); the second list, the welds classified as OK (correct). For the neural networks, it was necessary to combine the data, i.e., to transform and randomly mix them. For the MLP and RBF networks, each input vector had to be assigned a classification value of 0 (incorrect) or 1 (correct). Then, the vectors were merged together and their elements were randomly mixed. Next, L2-normalization was applied to the data. Finally, 85% training and 15% test samples were selected randomly. For the convolution neural networks, the images were reduced 5-times and then converted to a three-dimensional array data structure. In the arrays, the dimensions were transposed to correspond to the following structure: [number of images × length × height]. A vector of zeros with the same length as the first dimension corresponded to the first array (the array of NOK welds); a vector of ones corresponded to the second array (the array of OK welds). The arrays and vectors were merged into a common list and their elements were mixed randomly. Then, 77% training, 15% test and 8% validation samples were selected.
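The L2-normalization and the random 85/15 train/test split used for the MLP/RBF experiments can be sketched as follows (function names and the seed are ours):

```python
import random

def l2_normalize(vec):
    # divide by the Euclidean norm so that sum(v_i^2) == 1
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else list(vec)

def train_test_split(samples, train_frac=0.85, seed=42):
    # shuffle reproducibly, then cut off the first train_frac portion
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```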

Configuration and Training of Neural Networks
Several neural network architectures were configured for comparison and testing. Their parameters were changed during the experiments, and the results of the experiments were compared and evaluated. The RBF and MLP networks were configured in the Stuttgart Neural Network Simulator for the R language (RSNNS library), the MLP networks were also configured in the Keras library, and the convolution networks were configured in the Keras and MXNet libraries.

RBF Network
To implement the RBF network, the RSNNS library was chosen (it is the only one of the considered libraries in which an RBF network template is available). Three RBF networks were configured using the function rbf (RSNNS library).


MLP Network
Experiments with training and testing of the MLP networks showed that a one-layer architecture is sufficient for our data representation. The performance of the network was very good, and the difference from multiple hidden layers was negligible. To maintain objectivity, the MLP networks had the same configuration in both libraries. The sigmoid activation function and the randomized weight initialization functions were used. For the training, the error backpropagation algorithm with a learning parameter of 0.1 was used.
The implementation in the RSNNS library uses the mlp function for configuration and training. Configuration details are in Figures 23-25.

The implementation of the MLP network in the Keras library required a detailed list of layers in the code. Two layer_dense layers were used; the first one defines the hidden layer with the ReLU activation function, and the second one defines the output layer of size 2 (two output categories) using the softmax activation function (Figure 26).
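The two-layer structure just described (a hidden ReLU layer followed by a two-unit softmax output) can be illustrated with a plain forward pass. This is a numpy sketch, not the Keras code from the paper; the input and hidden sizes are assumptions, and only the two-unit output matches the described configuration:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 16, 8, 2  # n_out = 2: the OK/NOK output categories

# Randomly initialized weights, as in the configurations described above.
W1, b1 = rng.normal(0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(0, 0.1, (n_hidden, n_out)), np.zeros(n_out)

x = rng.random((4, n_in))          # a batch of 4 feature vectors
probs = softmax(relu(x @ W1 + b1) @ W2 + b2)
print(probs.shape)                 # (4, 2)
print(probs.sum(axis=1))           # each row sums to 1
```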

Convolution Neural Network
For an objective comparison of the Keras and MXNet libraries, the same convolution network architecture was used in both libraries at first; however, in the MXNet library, training such a neural network was too slow. Thus, we designed our own architecture with a better training-time performance. The discussion of the results is provided in Section 4.
The architecture of the convolution network 1 is shown in Figure 27 and visualized in Figure 28. The architecture includes a list of all layers and the sizes of the output structures for both NNs. Two pairs of convolution and pooling layers were used, with the convolution applied twice before the first pooling layer. The input image size was 56 × 60. The number of convolution filters was 32 at the beginning, rising to 64 in the later convolution layers. Dropout was used between some layers to prevent overtraining of the neural network by deactivating a certain percentage of randomly selected neurons. At the end, a flatten layer was used to convert the resulting structure into a one-dimensional vector used as an input for a simple MLP network with one hidden layer containing 256 neurons.
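The output sizes listed for each layer follow from the standard shape formula out = ⌊(in − k)/s⌋ + 1 for an unpadded convolution or pooling layer with kernel size k and stride s. The sketch below walks the 56 × 60 input through a hypothetical conv/pool stack; the kernel and stride values are illustrative assumptions, not the exact configuration from Figure 27:

```python
def layer_out(size, kernel, stride):
    """Output size of an unpadded ('valid') convolution or pooling layer."""
    return (size - kernel) // stride + 1

def stack_shape(h, w, layers):
    """Propagate an (h, w) input through a list of (kernel, stride) layers."""
    for kernel, stride in layers:
        h, w = layer_out(h, kernel, stride), layer_out(w, kernel, stride)
    return h, w

# Hypothetical stack: two 3x3 convolutions (stride 1), a 2x2 pooling layer,
# one more 3x3 convolution, and a final 2x2 pooling layer.
layers = [(3, 1), (3, 1), (2, 2), (3, 1), (2, 2)]
print(stack_shape(56, 60, layers))  # (12, 13)
```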

Parameters of individual layers are shown in the diagram in Figure 28. For example, the convolution layer (red) contains the list 3 × 3 (filter size), 3 × 3 (stride), 32 (number of filters).
The architecture of the convolution network 2 is visualized in Figure 29. Two pairs of convolution and pooling layers were used; however, in this case a double convolution occurs only in the second layer. There is also a difference in the design of the convolution, where the stride parameter (the step of the filter) is (3, 3). Dropout was used only in two places.

Results
This section presents the results of code profiling, weld segmentation, and the evaluation of the neural networks.

Code Profiling
Profiling was done using the profvis library at the level of individual code lines. The output is an interactive visualization listing memory usage in MB and computing time in ms for each code line. An example can be seen in Figure 30.
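profvis is specific to R; as a rough Python analogue of this kind of per-call measurement, the standard library's cProfile reports call counts and cumulative times per function (a sketch of the general technique, not the tooling used in the paper; the function names are hypothetical stand-ins):

```python
import cProfile
import io
import pstats

def entropy_filter(n):
    # Stand-in for an expensive image-processing step.
    return sum(i * i for i in range(n))

def segment(n):
    return entropy_filter(n) + 1

profiler = cProfile.Profile()
profiler.enable()
segment(200_000)
profiler.disable()

# Render a report sorted by cumulative time, like a profvis flame view.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
report = out.getvalue()
print("entropy_filter" in report)  # the hot function appears in the report
```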


Profiling was performed on a desktop computer with the parameters listed in Table 1 (the graphics card was not used).

Table 1. Technical specifications of PC.

Operating System: Windows 7 Professional 64-bit

Results of Data Preparation and Segmentation
Segmentation was successful for all tested weld samples. For some defective (NOK) welds which consisted of several parts or contained droplets, only the largest continuous weld surface was segmented, which was considered a correct segmentation for the proposed methodology. Segmentation examples are shown in Figure 31.

The segmentation time is an important indicator in the comparison of results. The results of profiling different parts of the segmentation process can be seen in Figure 32. Code profiling was carried out using a computer with the technical specification shown in Table 1. Segmentation was performed by concatenating the outputs of the functions load.image, grayscale, entropyFilter, createMask, and segmentWeld. Almost all functions in this section of the program completed very quickly (within 30 ms) except for the entropyFilter function, which took an average of 158 ms to complete. Since this function is the most important part of the segmentation algorithm, the time was acceptable. The average time to complete the whole segmentation was 194 ms, and the average amount of memory allocated was 74.76 MB. For the MLP and RBF networks, the next step was to transform masks into feature vectors. The profiling results of the functions performing the three types of transformations can be seen in Figure 33.
The results show that these functions are optimal, taking up minimal memory and time. The mean values for computing the vector of sums of subfields in the mask are 16 ms and 0.1 MB; for the histogram projection vector, they are less than 10 ms and less than 0.1 MB (an estimate of the profiling tool). The presented results are also shown in Table 2.
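The pipeline named above (grayscale image → entropy filter → mask → segmented weld) can be sketched in a few lines. This is an illustrative numpy version of local-entropy thresholding, not the paper's R implementation; the window size, threshold, and synthetic image are assumptions:

```python
import numpy as np

def local_entropy(img, win=5):
    """Shannon entropy of the gray-level histogram in a win x win window."""
    h, w = img.shape
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + win, j:j + win]
            _, counts = np.unique(patch, return_counts=True)
            p = counts / counts.sum()
            out[i, j] = -(p * np.log2(p)).sum()
    return out

def create_mask(entropy_map, threshold=1.0):
    """Binary mask of high-entropy (textured, weld-like) regions."""
    return entropy_map > threshold

# Synthetic test image: flat background with a noisy 'weld' stripe.
rng = np.random.default_rng(3)
img = np.zeros((20, 30), dtype=np.uint8)
img[8:13, :] = rng.integers(0, 256, (5, 30))  # textured band

mask = create_mask(local_entropy(img))
print(mask[10, 15], mask[2, 15])  # True False: stripe detected, background not
```

A real implementation would follow this with largest-connected-component extraction, matching the behavior described above for multi-part NOK welds.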

Criteria for Evaluation of Neural Network Results
As the main criterion for the evaluation of results, the confusion matrix was chosen. The main diagonal of the confusion matrix contains the numbers of correctly classified samples, and the antidiagonal contains the numbers of incorrectly classified samples; the smaller the values in the antidiagonal, the more successful the prediction model. In a binary classification, this matrix contains four values (Figure 34): TP (true positive), FP (false positive), FN (false negative), and TN (true negative).
The accuracy was computed from the confusion matrix and is expressed as the ratio of correctly classified samples to all samples; see Equation (5) [20].
Accuracy is an objective criterion only if the FN and FP values are similar. A more objective criterion for comparing results is the F-score. The F-score is calculated as the harmonic mean of the precision and the recall (sensitivity) values [20]; the best score corresponds to F-score = 1. To visualize the success of the neural network classification, the ROC (receiver operating characteristic) curve was chosen. It shows the recall (sensitivity) value depending on the value of 1 − specificity at a variable threshold [20] (Figure 35). The ROC curve of the best possible classifier is rectangular with the vertex at [0, 1].
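Assuming Equation (5) and the F-score follow the standard confusion-matrix definitions, these criteria compute directly from the four entries; a small sketch (the 534/461 counts are taken from one of the perfectly classified matrices reported later):

```python
def metrics(tp, fp, fn, tn):
    """Standard accuracy, precision, recall and F-score from a 2x2 confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f_score = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f_score

# A perfectly classified test set gives accuracy and F-score of 1.
print(metrics(tp=534, fp=0, fn=0, tn=461))  # (1.0, 1.0, 1.0, 1.0)
```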

Results of Neural Network Classification
We configured and tested neural networks for all data representations (15 experiments in total). For better clarity, the experiment results are labelled using the labels from Table 3.
The first tests were carried out for the RBF and MLP networks with the input data formats according to Table 3. The resulting confusion matrices for the RBF networks are as follows:
From the matrices (10) it is evident that the RBF networks performed badly when classifying NOK welds; they were often classified as OK. ROC curves of the trained RBF networks are depicted in Figure 36. The results show that the MLP implementation in the RSNNS library was more successful compared with the Keras library. The networks had no problem classifying correct (OK) or incorrect (NOK) welds, and the FP and FN values were approximately similar. The resulting calculated accuracy and F-scores shown in Table 4 describe the performance of the trained neural networks. The results show that the MLP networks are much more successful; with the default initialization weights, the RBF networks were less successful. From a practical point of view, MLP networks are more suitable for weld evaluation.
It was hard to compare the results for the MLP networks, as they provided similar results for all data representations. The RBF network achieved significantly better results with the vector of sums of subfields in the mask data representation.
It was found that using the same network configuration in the two libraries yields slightly different results. The implementation in the RSNNS library was almost 100% successful and was therefore considered the best candidate for practical use.
Training profiling for the RSNNS library was done next. Although training in the Keras library allocated less memory, the training time was several times longer than in the case of the RSNNS library. Using the vector of sums of subfields in the mask, the MLP network training in RSNNS took less than one second, while in the Keras library it took tens of seconds. The list of training profiling results is shown in Table 5. The comparison of the convolution neural networks was again based on the confusion matrices, ROC curves, accuracy and F-scores. The inputs of the networks were plain images of welds without any filtration and masked welds without background (black background). The confusion matrices are as follows:

cnn-ker-seg11 = (534, 0; 0, 461),

The classification error of the convolution neural networks was minimal; therefore, the ROC curve was evaluated as ideal for all experiments, with indistinguishable differences. For all neural nets, the ROC curve was the same (Figure 38).
The resulting accuracy and F-scores, along with the number of epochs needed to train the networks, are listed in Table 6. For the convolution networks, the changes of accuracy after each epoch for both the training data (blue line) and the validation data (green line) are shown in Figure 39. The charts show that training with non-segmented weld images started at a lower accuracy and the learning was slower (Figure 40). The progress of training in the Keras library was more uniform, without steps; the graphs can be seen in Figures 41 and 42. The success rate of all networks was higher than 99%. The decisive factor for the comparison was the code profiling results shown in Table 7.
It can be concluded that the network with the architecture shown in Figure 29 in Section 3.3, implemented using the MXNet library, was the fastest. With a training time of 12,170 ms and 100% success also for non-segmented data, it is considered the best choice for practical use.
Although the MLP network (mlp-rsn-sum04) was similarly successful and several times faster in training, the preparation of the representation in the form of the vector of sums of subfields in the mask took considerably more time. The number of training samples was approximately 5400, the average time to obtain the mask of one sample was 164 ms, and the vector calculation took 16 ms, i.e., 180 ms per sample and about 972 s in total.

Profiling Single Weld Diagnostics
In practice, neural network training is not a frequent process. Usually, the network is trained once and then deployed for prediction. Therefore, at the end, we decided to evaluate the prediction of one weld for the most successful models. The provided results represent the average of five independent tests. The list can be seen in Table 8, along with the average image preparation time and the memory required to prepare the weld input image for the specific diagnostic model. The diagnostic profiling results confirmed that the best solution was the classification of the weld using the convolution net with the architecture shown in Figure 29 in Section 3.3. The average image loading time, including the 5× reduction, was only 14 ms, and the evaluation time was 14 ms.

Discussion
The aim of this paper was to develop a neural-network-based methodology to evaluate the quality of welds. Several types of neural networks implemented in several software libraries were compared with respect to performance. It was necessary to prepare the data (images of welds) in a format suitable for neural network processing. For some types of networks (convolution), the input data preparation was minimal (segmentation or no segmentation), while for other networks (MLP, RBF), sophisticated data preprocessing was required (filtering, equalizing and segmenting the image based on entropy). Each library required its own input data format, which also had to be taken into account during programming. The main result of the paper is the confirmation that convolutional neural networks can be used for weld quality evaluation without image preprocessing and, when no segmentation is used, they can evaluate not only the weld metal but also the adjacent zones.
The neural networks were configured experimentally to achieve the best performance, and the obtained results were compared. In all cases, the neural networks implemented and trained using the proposed approach delivered excellent results, with a success rate of nearly 100%. Thus, we can recommend any of the tested libraries for solving the weld quality evaluation problem. The best results were achieved by the convolution neural networks, which required almost no pre-processing of the image data. The longer training time of these networks is acceptable in practical usage.
In summary, based on the achieved experimental results, convolution neural networks have shown to be a promising approach for weld evaluation and will be applied in future research dealing with the evaluation of images from real welding processes.