Classification of Coarse Aggregate Particle Size Based on Deep Residual Network

Abstract: Traditional aggregate particle size detection mainly relies on manual batch sieving, which is time-consuming and inefficient. To achieve rapid automatic detection of aggregate particle sizes, a mechanical symmetric classification model of coarse aggregate particle size, based on a deep residual network, is proposed in this paper. First, aggregate images are collected by an optical vertical projection acquisition platform; the collected images are corrected, and their geometric parameters are extracted. Second, various digital image processing methods, such as size correction and morphological processing, are used to improve image quality and enlarge the image dataset of different aggregate particle sizes. Then, a deep residual network model (ResNet50) is built and trained on the aggregate image dataset to achieve accurate classification of aggregate sizes. Finally, the proposed model is compared with the traditional classification model based on a single geometric particle size feature: its accuracy is nearly 20% higher than that of the traditional method, reaching 0.833. The proposed model realizes the automatic classification of coarse aggregate particle size, which can significantly improve the efficiency of automatic aggregate detection.


Introduction
Presently, coarse aggregate gradation is one of the most important technical indicators of asphalt concrete pavement aggregate and has a significant impact on pavement performance [1][2][3]. Because the quality of aggregate varies among the many aggregate processing manufacturers, before using mixed aggregate it is necessary to take samples to detect the particle size and gradation of the aggregate and ensure that it meets requirements. A common aggregate particle size detection method is the vibration sieving method [4]; however, it can only be used for the sampling inspection of aggregate size range classifications and aggregate gradation during construction. It cannot detect aggregate sizes on a conveyor belt in real time. Therefore, it is essential to study a fast and accurate real-time automatic method for coarse aggregate particle size classification, enabling the automatic real-time detection of aggregate gradation.
In recent years, many scholars have carried out research on the grading of aggregate particle sizes and grading detection methods for coarse aggregate [5][6][7]. Tafesse et al. [8] developed an image analysis system for determining the three-dimensional size and shape distribution of coarse aggregate particles. The analysis results were similar to those obtained by physical sieving, which proves that the system can be used for construction site detection. Hu et al. [9] used computerized tomography for the 3D reconstruction of internal asphalt mixture aggregate, and identified and separated aggregate using MATLAB. The experimental results showed that the calculated aggregate volume information was consistent with the aggregate gradation of the asphalt mixture. Li et al. [10] designed an all-weather real-time grading detection system for mineral mixtures, which used a photoelectric imaging platform with a minimum boundary algorithm and a dimension feature calculation method to calculate aggregate gradation in real time, providing data references for construction detection. Wang [11] proposed an improved image method to calculate aggregate particle size, and selected the aggregate equivalent ellipse model as the primary model for aggregate particle size calculation, which had an accuracy of 80.7%. Sun et al. [12] used a Gocator 3D intelligent sensor to collect aggregate 3D point cloud data, and then extracted 3D feature parameters from the reconstructed 3D aggregate through the cascade filtering method and the greedy triangulation algorithm. Their test results showed that the proposed three-dimensional shape index of road aggregate can be used for a comprehensive and quantitative evaluation of the morphological characteristics of aggregates with different lithologies, particle sizes, and shapes. Liu et al. [13] proposed a new method, called the virtual cutting method, to evaluate the angularity index values of 3D point-cloud coarse aggregate images, with the aim of characterizing the angularity of aggregates on conveyor belts. Compared with 2D and 3D projection methods, the virtual cutting method can quantify the angularity of a single aggregate, or of aggregates in piles on conveyor belts, based on 3D point-cloud images.
Recently, research on aggregate characteristic analysis using machine learning and deep learning has been increasing year on year [14]. Pei et al. [15] proposed a neural network model for calculating aggregate particle size. The model used multiple geometric feature factors of two-dimensional images to calculate aggregate particle size. Compared with the single-feature aggregate particle size calculation model, the calculation accuracy and sieving efficiency were significantly improved. Pei et al. [16] used extreme gradient boosting (XGBoost) [17,18] to automatically classify aggregate shape, and proposed a fusion feature importance analysis method to select feature parameters. These research schemes, based on artificial intelligence algorithms, provide new research ideas and a specific theoretical basis for the automatic detection of aggregates. Sun et al. [19] introduced the multiple features of coarse aggregate 27 (MFCA27) dataset, which contains 27 aggregate features based on the aggregate three-dimensional (3D) top-surface object, and proposed a Gaussian process classifier (GPC) for aggregate sieve-size class measurement. The results demonstrated that the GPC was the best-performing method for the datasets with two- or three-dimensional feature sets in terms of accuracy and robustness.
From the above research, although aggregate size grading efficiency has been improved to a certain extent, most aggregate size detection methods build models by actively extracting geometric features from images, and the selection of geometric parameters directly affects the calculation accuracy of the model. Therefore, combining a deep residual network [20][21][22], we propose a coarse aggregate size classification model that takes images as direct input and does not require the active selection of geometric parameters. The proposed method mainly addresses problems existing in aggregate classification, such as low precision, slow speed, and the inability to rapidly detect aggregates on a conveyor belt.
The remainder of this article is organized as follows. Section 2 introduces the coarse aggregate dataset, including aggregate image acquisition and aggregate image preprocessing. Section 3 describes the proposed method in detail, including the network structure and experimental settings. Section 4 introduces the evaluation index and discusses the experimental results compared with other common methods. Finally, Section 5 draws the conclusions of this paper.

Coarse Aggregate Image Acquisition
The experimental device was an optical vertical projection acquisition platform based on secondary development, as shown in Figure 1. The optical acquisition platform comprised an industrial computer module, an industrial camera module, and a luminous platform module, and used the vertical irradiation method to collect aggregate images. To ensure that the difference between each batch of aggregate information was controlled within a small range, the industrial camera module was fixed on a pre-designed acquisition platform, and the vertical distance between the camera and the luminous platform was strictly controlled at 0.45 m ± 10 mm. The simple optical projection acquisition method suffers from uneven illumination, which makes it difficult to meet the accuracy requirements of the aggregate image dataset. Therefore, a high-intensity LED with a photosensitive sensor was used to accurately compensate for the illumination of the acquisition environment.

Coarse Aggregate Image Preprocessing
According to the Test Methods of Aggregate for Highway Engineering (JTG E42-2005) [23], the aggregate was divided into the ranges 2.36~4.75 mm, 4.75~9.50 mm, 9.50~13.20 mm, 13.20~16.00 mm, etc. A total of 2400 aggregates were screened out according to this criterion using the vibration sieving method. To ensure consistent labeling of the collected data and convenient batch processing later, five aggregates were captured per image in a linear arrangement, with a horizontal interval of 5 cm ± 5 mm between aggregates, as shown in Figure 2. The 5 cm arrangement of the aggregates was intended to facilitate the segmentation of multiple sets of images, making it easier to quickly and accurately create single aggregate image datasets. In practice, the aggregates do not need to be aligned at 5 cm intervals; they only need to be out of contact with each other. Once the aggregates were placed, software built with Python controlled the camera to take the aggregate images. A total of 480 images were produced, covering the four particle size grades of 4.75, 9.5, 13.2, and 16.0 mm according to the minimum particle size distribution range. Due to image distortion in the optical acquisition platform, Zhang's calibration method [24] was used to correct the distortion of the industrial camera module, which can effectively reduce the error caused by lens distortion. The checkerboard used for calibration was 9 × 12, with a black and white square spacing of 20 mm. After distortion correction, the parameter unit was the pixel distance. To convert the parameter units into actual sizes, the distortion-corrected checkerboard image was used to calibrate and calculate the actual physical size. Specifically, corner detection was first applied to the checkerboard image, and the coordinates of 8 × 11 Harris corner points were obtained.
Then, the pixel length of each black and white square edge was calculated from the image coordinates of the Harris corner points. Combined with the known actual physical size of the black and white squares, the conversion ratio between pixel distance and actual physical distance was obtained.
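As a minimal sketch of the calibration step above, the conversion ratio can be estimated by averaging the pixel length of every square edge in the detected corner grid. The function name `mm_per_pixel` and the synthetic corner grid are illustrative, not from the paper:

```python
import numpy as np

def mm_per_pixel(corners, square_mm=20.0):
    """Estimate the pixel-to-millimetre conversion ratio from detected
    checkerboard corners (rows x cols x 2 array of image coordinates).
    Averages the pixel length of every horizontal and vertical square edge."""
    corners = np.asarray(corners, dtype=float)
    # Edge lengths along the corner-grid columns and rows, in pixels.
    horiz = np.linalg.norm(np.diff(corners, axis=1), axis=-1)
    vert = np.linalg.norm(np.diff(corners, axis=0), axis=-1)
    mean_edge_px = np.concatenate([horiz.ravel(), vert.ravel()]).mean()
    return square_mm / mean_edge_px

# Synthetic 8 x 11 corner grid with 25 px square edges (illustrative values):
ys, xs = np.mgrid[0:8, 0:11]
grid = np.stack([xs * 25.0, ys * 25.0], axis=-1)
ratio = mm_per_pixel(grid)  # 20 mm / 25 px = 0.8 mm per pixel
```

Averaging over all edges, rather than using a single pair of corners, reduces the influence of residual distortion and corner-detection noise.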
The overall process of image-quality improvement is shown in Figure 3. To eliminate the influence of low-frequency noise introduced during image acquisition, the gray value of each pixel was replaced with the median gray value of its 3 × 3 neighborhood to smooth the image. A 3 × 3 Sobel operator was then used for edge detection to obtain the edge contour of a single aggregate, with the vertical gradient computed as shown in Equation (1):

G_y = S_y ∗ I (1)

Here, S_y is the vertical gradient kernel, I is the original image, and G_y is the edge image; the horizontal gradient G_x is obtained analogously with the horizontal kernel S_x. The aggregate profile produces a strong response in both the horizontal and vertical gradients, while the noise response is weak. To minimize the impact of noise, the superposition weight w was set to 0.8, as shown in Equation (2):

G = w (|G_x| + |G_y|) (2)
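The weighted Sobel superposition described above can be sketched as follows. The paper does not give its implementation, so this assumes the standard 3 × 3 Sobel kernels and reads the superposition as a weighted sum of absolute gradients; `correlate3` is a minimal hand-rolled stand-in for a library filter:

```python
import numpy as np

SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal kernel
SY = SX.T                                                         # vertical kernel

def correlate3(img, k):
    """3x3 cross-correlation with zero padding (stand-in for a library filter)."""
    p = np.pad(img.astype(float), 1)
    out = np.zeros(img.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def sobel_edges(img, weight=0.8):
    """Weighted superposition of horizontal and vertical Sobel gradients:
    G = w * (|Sx * I| + |Sy * I|), with w = 0.8 as in the paper."""
    gx = correlate3(img, SX)
    gy = correlate3(img, SY)
    return weight * (np.abs(gx) + np.abs(gy))
```

On a vertical step image the response concentrates at the intensity boundary and vanishes in flat regions, which is the behavior the aggregate contour extraction relies on.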
Finally, cavity filling based on a contour search was used to quickly fill the edge image, and morphological erosion and dilation were performed to remove the remaining tiny noise. Each image was cropped according to the minimum enclosing rectangle of the aggregate to obtain a single aggregate image, as shown in Figure 4. After the above image processing, 2400 single aggregate images were obtained from the 480 original coarse aggregate images, including 600 aggregate images each at the 4.75, 9.5, 13.2, and 16.0 mm grades, which were used as the aggregate image dataset. According to the characteristics of the aggregate images and the conversion ratio calculated during preprocessing, the unified image size was set to 512 × 512. Since the relative size of the aggregates is an essential basis for determining the aggregate size range, directly scaling the images would destroy this relative size; instead, the boundary of each single aggregate image was expanded to reach the 512 × 512 image size. To ensure sufficient data for subsequent model training, the 2400 single aggregate images were rotated and flipped to augment the data. Finally, the aggregate image dataset was constructed, comprising 4800 aggregate images: 1200 each at the 4.75, 9.5, 13.2, and 16.0 mm grades.
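The boundary expansion and rotation/flip augmentation above can be sketched as follows. The helper names are hypothetical, and `augment` enumerates all eight rotation/flip variants; the paper's 2400-to-4800 expansion implies only a subset of such transforms was kept:

```python
import numpy as np

def pad_to_size(img, size=512):
    """Expand the border of a single-aggregate crop to size x size without
    scaling, so the relative aggregate size is preserved."""
    h, w = img.shape[:2]
    top = (size - h) // 2
    left = (size - w) // 2
    out = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    out[top:top + h, left:left + w] = img
    return out

def augment(img):
    """All rotation/flip variants of a square image (8 per input)."""
    variants = []
    for k in range(4):
        rot = np.rot90(img, k)
        variants.append(rot)
        variants.append(np.fliplr(rot))
    return variants
```

Zero padding matches a dark background from the luminous-platform segmentation; a different fill value could be used if the background gray level differs.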

Deep Residual Network
As the depth of a deep convolutional neural network increases, problems such as exploding gradients, vanishing gradients, and network degradation occur, causing model accuracy to decrease as depth increases. The deep residual network, constructed from residual modules combined with batch normalization and the ReLU activation function, can effectively solve these problems. The residual module directly transmits the information of shallow convolution layers to deep convolution layers through shortcut connections, increasing the complexity of the information and improving the network's fitting ability.
There are two common residual modules, as shown in Figure 5a,b. Figure 5a shows a two-layer residual module connecting two 3 × 3 convolution layers. Figure 5b shows a three-layer residual module connecting a 1 × 1 convolution layer, a 3 × 3 convolution layer, and a 1 × 1 convolution layer. As can be seen from the residual module structure in Figure 5a, the input X is transmitted directly to the output Y through the shortcut connection to realize an identity mapping, which effectively alleviates the network degradation caused by increased network depth.
To ensure that the input and output channels of the residual module are consistent, a 1 × 1 convolution layer is added to the shortcut connection to construct the channel balance residual module, as shown in Figure 6. Multiple residual modules are connected with channel balance residual modules to form residual blocks, and four residual blocks are cascaded to form a basic deep residual network. Varying the number and structure of the residual modules yields different deep residual networks, such as ResNet18, ResNet34, ResNet50, ResNet101, and ResNet152; the network structures are shown in Table 1. Comprehensively considering detection accuracy and detection speed, ResNet50 was selected as the primary research network and fine-tuned according to the aggregate image dataset. The specific structure of the adjusted ResNet50 is shown in Figure 7. In Figure 7b, n is the number of three-layer residual modules in each residual block; for the four residual blocks of the adjusted ResNet50, n is 3, 2, 4, and 1, respectively.
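A minimal PyTorch sketch of the three-layer residual module with a channel-balancing 1 × 1 shortcut follows. It assumes the standard ResNet bottleneck conventions; it is not the authors' exact code, and the class name `Bottleneck` is ours:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Three-layer (1x1 -> 3x3 -> 1x1) residual module. When the input and
    output channel counts differ, a 1x1 convolution on the shortcut balances
    the channels, as in the channel balance residual module of Figure 6."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Identity shortcut when shapes match, 1x1 projection otherwise.
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))
```

The addition before the final ReLU is the shortcut connection that carries shallow features directly to deeper layers.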

Network Structure
For the classification task of aggregate particle size, the ResNet50 structure was used to construct the main part of the classification network. The classification network comprises an input module, feature extraction module, global pooling module, and fully connected module.
To adapt to the aggregate dataset, the size of the network input module was 512 × 512. To further reduce computation and improve convergence speed, the network used a single-channel grayscale image as input. To improve feature extraction capability, the network adopted the cascade of an initial convolution layer, an initial pooling layer, and the four residual blocks of ResNet50 as its feature extraction module. The initial convolution layer was composed of a convolution layer, batch normalization, and a ReLU activation function, with a 7 × 7 convolution kernel and a stride of 2. The initial pooling layer was a max pooling layer with a 2 × 2 pooling kernel and a stride of 2. The residual blocks used the three-layer symmetric convolution structure residual module suitable for deep networks, and each convolution layer in the residual block was replaced with a complex convolution layer consisting of batch normalization, a ReLU activation function, and the convolution layer, to further improve operation accuracy and speed. In the feature extraction module, a residual connection was adopted for each residual block, so the residual block input was added to the residual block output through a shortcut connection. To ensure that the residual block input size matched the output feature size, a complex convolution layer with a 1 × 1 convolution kernel and a stride of 2 was added on the shortcut connection.
The global pooling module consisted of a global average pooling layer between the feature extraction module and the fully connected module. It compresses and reduces the dimension of the feature maps from the feature extraction module to meet the input requirements of the fully connected module. The fully connected module consisted of three fully connected layers, one Dropout layer, and one SoftMax layer. The numbers of neurons in the three fully connected layers were 256, 32, and 4, converting and compressing 2048 input features into 4 output features. The Dropout layer was located between the first and second fully connected layers to prevent overfitting. The SoftMax layer converted the four output features into output probabilities, and the particle size of the input aggregate image was judged according to the output probability of the fully connected module. The specific network structure is shown in Figure 8, where s is the convolution stride, n1, n2, and n3 are the numbers of convolution output channels, and n5 is the number of three-layer residual modules in the residual block. The 4800 aggregate images in the aggregate image dataset were divided into training, test, and validation datasets at a ratio of 3:1:1. The model training hyperparameter settings are shown in Table 2; they were chosen based on several experiments, combined with experience from [25,26]. Common loss functions include zero-one loss, mean square error loss, and cross-entropy loss; cross-entropy loss [26] is the one most commonly used for classification models. The optimizer was SGD with a momentum of 0.9 [27,28], which is widely used in model training. The number of epochs was set to 50 according to the complexity of the experimental data and the convergence needs of the model. The initial learning rate was set to 0.01 [25,26,29], which is also a commonly used setting.
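The 3:1:1 split described above can be sketched as follows; the function name `split_indices` and the fixed seed are illustrative, as the paper does not specify how the shuffle was seeded:

```python
import numpy as np

def split_indices(n, ratios=(3, 1, 1), seed=0):
    """Shuffle n sample indices and split them by the given ratios
    (training : test : validation = 3 : 1 : 1 in the paper)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    total = sum(ratios)
    bounds = np.cumsum([r * n // total for r in ratios])[:-1]
    return np.split(idx, bounds)

train_idx, test_idx, val_idx = split_indices(4800)
# 4800 images -> 2880 training, 960 test, 960 validation
```

Shuffling before splitting keeps the four particle size grades roughly balanced across the three subsets; a stratified split per grade would guarantee exact balance.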

Experiment and Analysis
Aggregate images are processed by the coarse aggregate particle size classification model to obtain classification results. The results can be divided into four categories according to the label and the actual processing result, and a confusion matrix is constructed, as shown in Table 3. The sum of the values in each column represents the true quantity of that class; within a column, the value in each row represents the quantity the model predicted as the corresponding class. For the confusion matrix as a whole, the larger the values on the diagonal, the more correct the network classification. Table 3 shows the numerical interpretation of the confusion matrix for the two-category case (distinguished by positive and negative). The coarse aggregate size classification model and the ResNet50 classification network were tested with 360 single aggregate images from the test set, including 90 aggregate images of each size range.
According to the ResNet50 output results and the actual aggregate image labels, a confusion matrix was obtained, as shown in Figure 9. Figure 9 shows the prediction results of the aggregate images by ResNet50 in the form of a confusion matrix. The larger the diagonal values and the smaller the remaining values, the higher the prediction accuracy of the model. Taking the first column of Figure 9 as an example, there are 90 aggregate images of the 4.75 mm grade in total: the 78 in the first row represents 78 aggregate images predicted to be the 4.75 mm grade, the 12 in the second row represents 12 aggregate images predicted to be the 9.5 mm grade, and the 0 in the third and fourth rows means that no images were predicted to be the 13.2 or 16.0 mm grades.
The precision and recall of ResNet50 were calculated from the confusion matrix, as shown in Figure 10. The average accuracy of ResNet50 for coarse aggregate particle size classification was 0.781. According to Figures 9 and 10, the classification precision for the 9.5 mm and 13.2 mm grades was 0.68 and 0.72, respectively, far lower than that for the 4.75 mm and 16.0 mm grades, indicating that ResNet50 had a poor feature extraction effect for aggregate images at the 9.5 mm and 13.2 mm grades.
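The per-class precision, recall, and overall accuracy computations above can be sketched as follows. The example matrix is illustrative: only its first column (78, 12, 0, 0 out of 90 true 4.75 mm images) follows Figure 9, and the remaining entries are hypothetical values chosen so the column sums are 90 and the overall accuracy matches the reported 0.833:

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, and overall accuracy from a confusion matrix whose
    columns are true classes and rows are predicted classes (as in Table 3)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=1)   # correct predictions per predicted-class row
    recall = tp / cm.sum(axis=0)      # correct predictions per true-class column
    accuracy = tp.sum() / cm.sum()
    return precision, recall, accuracy

# Columns: true 4.75, 9.5, 13.2, 16.0 mm grades (90 images each).
cm = np.array([[78,  9,  0,  0],
               [12, 70, 14,  0],
               [ 0, 11, 68,  6],
               [ 0,  0,  8, 84]])
precision, recall, accuracy = per_class_metrics(cm)  # accuracy = 300/360
```

With this layout, precision divides each diagonal entry by its row sum and recall by its column sum; swapping the axis arguments would silently exchange the two metrics, so the row/column convention of Table 3 matters.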
Based on the output results of the coarse aggregate particle size classification model and the actual aggregate image labels, a confusion matrix was obtained, as shown in Figure 11. Figure 11 shows the prediction results of the 360 single aggregate images by the coarse aggregate particle size classification model in the form of a confusion matrix. The precision and recall of the model were calculated from the confusion matrix, as shown in Figure 12. The overall accuracy of the coarse aggregate particle size classification model was 0.833, higher than that of ResNet50 (ACC = 0.781). According to Figures 11 and 12, the classification precision for the 4.75 and 16.0 mm grades is above 0.9, and compared with ResNet50, the classification precision for the 9.5 and 13.2 mm grades is also improved.
To further verify the effectiveness of the coarse aggregate size classification model, its results were compared with those of existing aggregate grading calculation methods on the aggregate image dataset. The accuracy of the proposed model is 0.153 and 0.255 higher than that of the equivalent elliptic short-diameter model (ACC = 0.68) and the equivalent rectangular short-edge model (ACC = 0.578), respectively. The results show that the coarse aggregate particle size classification model achieves a higher accuracy on the aggregate image dataset, and its performance is significantly better than that of the traditional geometric models.

Conclusion
In view of the problems existing in the specification classification of aggregates before construction, such as low accuracy and slow speed, a coarse aggregate size classification model based on a deep residual network is designed in this paper. The main conclusions are summarized as follows. The coarse aggregate images are collected by the optical vertical projection acquisition platform, and the single aggregate images are rotated and flipped to increase their number, thereby constructing the aggregate image dataset. An improved deep residual network model based on ResNet50 is designed and applied to this dataset. The experimental results indicate that the accuracy of the proposed model reaches 0.833 on the test dataset, realizing the automatic grading of coarse aggregate specifications and improving detection speed and accuracy to some extent. Meanwhile, it solves problems such as the slow speed of traditional manual measurement and the low classification accuracy of the single geometric model.
The proposed model can significantly improve the accuracy and speed of coarse aggregate size classification, and is of significance to the intelligent detection of aggregate size distribution and gradation.
In the future, our research will focus on how to isolate the unqualified aggregates identified by the proposed method using a manipulator or other devices. This idea will be applied to the aggregate production process, which can improve aggregate quality.