Multilabel Image Classification Based Fresh Concrete Mix Proportion Monitoring Using Improved Convolutional Neural Network

Proper and accurate mix proportion is crucial for concrete in service to perform its structural functions in a specific environment and structure. To date, neither existing testing methods nor previous studies have addressed real-time, full-scale monitoring of fresh concrete mix proportion during manufacturing, and this gap hinders green manufacturing and construction safety. In this study, a method based on multilabel image classification with improved convolutional neural networks (CNNs) is presented for mix proportion monitoring. Carefully planned, uniformly distributed, wide-coverage and high-quality images of concrete mixtures were collected as a dataset during experiments. Because the original networks are tailored for single-label multiclassification tasks whereas mix proportions are determined by multiple parameters, four CNNs were improved or fine-tuned according to two solutions for multilabel image classification. Various metrics for evaluating training and testing all indicated that the four network models showed outstanding learning and generalization ability. The best-performing model was embedded into an executable application and equipped with hardware to establish a fresh concrete mix proportion monitoring system. This system was deployed to terminals and combined with mechanical and weighing sensors to establish an integrated intelligent sensing system. Real-time, full-scale monitoring of fresh concrete mix proportion, as well as sensing of and warning about inaccurate mix proportions, can be achieved simply by taking pictures and feeding them into this sensing system, instead of conducting laboratory experiments on retained specimens.


Introduction
Mix proportion refers to the components of concrete and their proportions. Both theory and engineering practice currently adopt property-based mix proportion design methods; consequently, in a specific environment and structure, the properties and structural functions of concrete in service are strongly affected by its mix proportion. Even a seemingly negligible error in mix proportion can result in a large difference in concrete service behavior [1].
Modern concrete has been, to date, the most widely used building material in all kinds of construction engineering since its invention because of its irreplaceable advantages. Owing to rapid urbanization and the explosive growth of the global infrastructure market, the consumption of cement-based concrete is considerable; global concrete consumption is estimated to exceed 3.3 × 10^10 t annually [1]. On the other hand, China national standards [2,3] …

NMSCA (mm): 10, 20, 31.5

A total of 75 mix proportions could be obtained from the combinations of w/b, s/a and NMSCA. A few extreme combinations that could not be adopted in engineering practice were excluded, and 67 experiments were conducted. The exclusions were made from the perspective of compressive strength and fluidity: concrete mixtures with small w/b and large s/a always have poor fluidity, and those with large w/b usually have low compressive strength, since compressive strength shows a close linear relationship with the reciprocal of w/b once the type of cement is determined. The eight excluded mix proportions are listed in Table 2.

Materials
Ordinary Portland cement produced in Tangshan, Hebei province, was selected in this study. Its basic physical and mechanical properties are shown in Table 3; after inspection, these properties and the content of chemical components conform to China national standard [25]. River sand of moderate fineness produced in Qinhuangdao, Hebei province, was selected as fine aggregate; its major physical properties conform to China national standard [26] and are listed in Table 4. Crushed stone was chosen as coarse aggregate, and its major physical properties are also given in Table 4. For NMSCA = 20 mm and 31.5 mm, continuous size grading was adopted following [27] to improve concrete service behavior, as listed in Table 5.

Experiment Setup and Image Collection
All experiments in this study were conducted in Beijing from May to July 2019. An HJW60 concrete mixer produced in Wuxi was used for concrete mixing. Well-mixed concrete mixtures were poured into a metal tray measuring 800 mm × 800 mm.
To ensure even illumination of the collected images and reduce the influence of lighting, indoor light sources were controlled, and a 1-m-high metal plate was fixed on each side of the tray to block uneven light. A hand-held image acquisition device took pictures at a fixed height of 400 mm above the bottom of the tray to reduce random error. Image collection was completed within 120 s after pouring to avoid slump loss, maintain freshness and stay in line with manufacturing practice.
The 67 image sets obtained from the 67 experiments contain a total of 8340 images; the largest set contains 152 images and the smallest 82.

Image Preprocessing, Data Augmentation and Dataset Segmentation
A few blurred images and images with excessive noise were deleted. From each set, one image was randomly selected as a testing image, forming a testing set of 67 images. Data augmentation was applied to the remaining images, expanding each set to 200 images in order to improve the generalization ability of the CNN models and prevent the networks from overfitting. Because the existing images already covered all of the mixture poured in the tray, no further images were added, to avoid the networks memorizing exact details, since more images lead to more repeated details. The data augmentation operations include rotation, horizontal shift, vertical shift, shear, zoom and horizontal flip.
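The augmentation step can be sketched as one random affine transform per generated image. This is a minimal NumPy sketch, not the authors' pipeline: the paper names the operations but not their ranges, so the ranges below (±15° rotation, ±10% shift, shear factor ±0.1, zoom 0.9 to 1.1, 50% flip probability), the nearest-neighbour resampling with zero fill and the helper names `random_affine` and `expand_set` are all hypothetical.

```python
import numpy as np

def random_affine(img, rng, max_rot_deg=15.0, max_shift=0.1, max_shear=0.1,
                  zoom_range=(0.9, 1.1), flip_prob=0.5):
    """Randomly rotate, shift, shear, zoom and horizontally flip an H x W (x C) image.

    Uses inverse mapping with nearest-neighbour sampling; output pixels whose
    source falls outside the image are filled with zeros. Ranges are illustrative.
    """
    h, w = img.shape[:2]
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    zoom = rng.uniform(*zoom_range)
    shear = rng.uniform(-max_shear, max_shear)
    tx = rng.uniform(-max_shift, max_shift) * w   # horizontal shift in pixels
    ty = rng.uniform(-max_shift, max_shift) * h   # vertical shift in pixels

    # Forward linear part (about the image centre): zoom * rotation * shear.
    rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    shr = np.array([[1.0, shear], [0.0, 1.0]])
    a_inv = np.linalg.inv(zoom * rot @ shr)

    # For every output pixel, find the source pixel: src = A^-1 (out - c - t) + c.
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    src = a_inv @ np.stack([xs.ravel() - cx - tx, ys.ravel() - cy - ty])
    src_x = np.rint(src[0] + cx).astype(int)
    src_y = np.rint(src[1] + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)

    flat_in = img.reshape((h * w,) + img.shape[2:])
    flat_out = np.zeros_like(flat_in)
    flat_out[valid] = flat_in[src_y[valid] * w + src_x[valid]]
    out = flat_out.reshape(img.shape)
    if rng.random() < flip_prob:
        out = out[:, ::-1].copy()                 # horizontal flip
    return out

def expand_set(images, target, rng):
    """Append augmented copies of randomly chosen originals until `target` images exist."""
    out = list(images)
    while len(out) < target:
        out.append(random_affine(images[rng.integers(len(images))], rng))
    return out

rng = np.random.default_rng(0)
originals = [rng.random((32, 48, 3)) for _ in range(5)]   # stand-ins for real photos
expanded = expand_set(originals, 200, rng)
```

Writing all five geometric operations as a single affine map makes it explicit how they compose; a production pipeline would more likely use an existing image-augmentation library with interpolation better than nearest neighbour.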
The 200 images in each set were shuffled and randomly divided into a training set (75%) and a validation set (25%). Therefore, the training set contains 10,050 images (67 × 150) and the validation set 3350 images (67 × 50).
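Because the split is made within each set of 200 images, every mix proportion contributes exactly 150 training and 50 validation images, which is where the totals come from. A minimal sketch with placeholder identifiers; the function name `split_sets` and the fixed seed are hypothetical.

```python
import random

def split_sets(sets, train_frac=0.75, seed=0):
    """Shuffle each image set and split it into training and validation parts.

    `sets` maps a mix-proportion class name to a list of image identifiers;
    the split is done per set, so every class keeps the same 75/25 ratio.
    """
    rng = random.Random(seed)
    train, val = [], []
    for name, items in sets.items():
        items = list(items)
        rng.shuffle(items)
        n_train = round(train_frac * len(items))
        train += [(x, name) for x in items[:n_train]]
        val += [(x, name) for x in items[n_train:]]
    return train, val

# 67 sets of 200 images each, as in the study (identifiers are placeholders).
sets = {f"class_{c:02d}": [f"img_{c:02d}_{k:03d}" for k in range(200)] for c in range(67)}
train, val = split_sets(sets)
print(len(train), len(val))  # 10050 3350
```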

Deep Learning Theory and CNN
Deep learning methods are representation-learning methods. They obtain multiple levels of representation by composing simple but nonlinear modules, each of which transforms the representation at one level into a more abstract and complex representation at the next level [28]. Instead of relying on manually extracted features, deep learning methods allow a machine to be fed with raw data; a number of filters then extract the topological features hidden in the input data, and the needed representations are discovered automatically. This is the essence of deep learning methods [28,29].
The CNN is a typical, outstanding and widely adopted basic structure in deep learning. LeCun et al. [30] published a paper establishing the basic framework of the CNN and later improved it using gradient-based optimization for document recognition [31,32]. A typical CNN is structured as a series of blocks: the first few blocks consist of combinations of convolutional and pooling layers, and the last block usually consists of fully connected layers and a classification model [28,29,33]. The four key ideas behind CNNs that take advantage of the properties of natural signals are shared weights, local connections, pooling and the use of many layers [28].
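Three of these ideas can be seen in a single convolution, ReLU and pooling block. In the minimal NumPy sketch below, the same kernel is reused at every position (shared weights), each output value depends only on a small input patch (local connections), and pooling summarizes neighbouring responses. The input, filter and function names are illustrative; real CNN layers add multiple channels, padding, strides and learned weights.

```python
import numpy as np

def conv2d_valid(x, k):
    """2-D cross-correlation (what CNN 'convolution' layers compute), stride 1, no padding.

    The same kernel k is applied at every position (shared weights), and each
    output depends only on a kh x kw input patch (local connections).
    """
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def relu(x):
    return np.maximum(x, 0.0)

x = np.arange(36, dtype=float).reshape(6, 6)   # toy single-channel "image"
k = np.array([[-1.0, 0.0], [0.0, 1.0]])        # diagonal-difference filter
fmap = max_pool(relu(conv2d_valid(x, k)))      # 6x6 -> 5x5 -> 2x2
```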
Many other approaches have also been applied successfully in the deep learning community, and these networks are tailored to a variety of tasks. To name a few, recurrent neural networks (RNN) [34] and long short-term memory (LSTM) [35,36] are suited to modeling sequential data and to sequence recognition and prediction; the region-based convolutional neural network (R-CNN) [37] and its variants, as well as you only look once (YOLO) [38], tackle object detection; and SegNet [39] is tailored for semantic segmentation. Such models have more complex structures and modules, such as the memory blocks in LSTM, to cope with more complicated problems [40,41], and complex structures consume considerable computing resources. Classification is one of the most basic deep learning tasks, so neural networks for classification usually have simple but effective structures with lower computational cost.

Multilabel Image Classification
Real-world objects always have multiple semantic meanings simultaneously. Traditional supervised learning assumes, for simplicity, that each real-world object is represented by a single instance and associated with a single label, which hinders the explicit expression of such multiplicity; the paradigm of multilabel learning emerged to overcome this limitation [46]. Compared with single-label classification, multilabel classification requires more information, but it is closer to human cognition.
To cope with the challenge of an exponentially sized output space, it is crucial to exploit label correlation information effectively. Existing strategies can be roughly grouped into three families based on the order of correlations considered. First-order strategies tackle the learning task label by label and thus ignore the coexistence of other labels. Second-order strategies consider pairwise relations between labels. High-order strategies consider relations among more than two labels [46].
In the machine learning community, solutions to multilabel image classification are usually considered from two perspectives. Problem transformation methods, which include label-based and instance-based transformations, fit data to algorithms by transforming multilabel tasks into other well-established learning scenarios, especially single-label classification. Representative algorithms include binary relevance [47], classifier chains [48], calibrated label ranking [49] and random k-labelsets [50]. Algorithm adaptation methods fit algorithms to data by adapting successful learning techniques to handle multilabel data directly. Typical algorithms include ML-kNN [51], ML-DT [52], rank-SVM [53] and CML [54].
In particular, in deep learning, the application of CNNs to multilabel image classification has widened rapidly because of the prominent merits mentioned above. Following the key idea of problem transformation methods, a CNN can be used directly for multilabel classification tasks; such methods are called "multilabel CNN" [46,55]. CNNs can also be improved or combined with other algorithms; notable methods include HCP [56], CNN-RNN [57] and RLSD [58]. Such improved CNNs usually apply a feature-selection strategy called the attention mechanism. In addition, new algorithms represented by GCN [59] have been developed in recent years.

Establishing and Improving of Network Models
In this study, four CNN models were established according to the solutions for multilabel image classification described above and were trained and compared; the best-performing one was then selected for mix proportion monitoring in manufacturing and engineering applications.

CNN Models Based on Problem Transformation
The multilabel classification task for w/b, s/a and NMSCA was transformed into a multiclassification task over mix proportion "instances". Specifically, the classes of the three labels were combined according to the 67 experiments to obtain 67 mix proportion instances, and concrete mixture images were classified into 67 classes at the instance level. Two original multiclassification CNN structures were applied to classify the mix proportion instances.
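This instance-level transformation is what the multilabel literature often calls label powerset: each distinct combination of labels becomes one ordinary class. A minimal sketch, assuming class names follow the w/b_s/a_NMSCA form used for the confusion matrices (e.g. 0.3_25_10); the helper names are hypothetical and the example combinations are illustrative, not necessarily among the 67 experiments.

```python
def instance_name(wb, sa, nmsca):
    """Format one mix proportion as 'w/b_s/a_NMSCA', e.g. 0.3_25_10."""
    return f"{wb:g}_{sa:g}_{nmsca:g}"

def build_instance_classes(mixes):
    """Map every distinct (w/b, s/a in %, NMSCA in mm) combination to one class index.

    After this mapping a three-label problem becomes an ordinary single-label
    problem with one class per mix proportion (67 classes in the study).
    """
    names = sorted({instance_name(*m) for m in mixes})
    return {name: idx for idx, name in enumerate(names)}

# Illustrative combinations; the duplicate collapses into one class.
example = build_instance_classes([(0.3, 25, 10), (0.3, 25, 20), (0.3, 25, 31.5), (0.3, 25, 10)])
```

The drawback discussed in the next section follows directly from this mapping: images sharing, say, w/b = 0.3 end up in several unrelated classes.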
The AlexNet structure was adopted as the first basic CNN structure because it was a breakthrough in the development of CNNs and has a simple but efficient network configuration. AlexNet was trained on two independent GPUs simultaneously, so its layers and filters are structured in two groups. Furthermore, in the second, fourth and fifth convolutional layers, the convolution kernels deployed on a given GPU are connected only to the kernels on the same GPU in the previous layer. Such a structure is not suited to limited hardware. In addition, the original AlexNet classifies input images into 1000 classes, far more than the number of mix proportion instances. Therefore, the AlexNet structure was fine-tuned as follows to adapt it to the mix proportion classification task: (1) the structure for two parallel GPUs was discarded, the convolution kernels separately deployed on the two GPUs were merged, and local response normalization (LRN) was replaced by batch normalization (BN); and (2) the last fully connected layer and the output layer were reset to classify concrete mixture images into 67 classes. For further discussion, this fine-tuned AlexNet-based CNN is named Net-I.
AlexNet has a considerable number of parameters, exceeding 61 million, because it was trained on the ImageNet dataset [60], whose numbers of classes and images are far larger than those of the concrete mixture dataset. Since training a network with too many parameters on a relatively small dataset may lead to overfitting and hence to reduced generalization ability, a simpler CNN with fewer parameters was established with reference to the AlexNet structure. The input image size, network depth, number of layers, and number and size of filters were considered together. After a trial-and-error process, the detailed structure of the simple CNN was determined as listed in Table 6. This self-established simple network is named Net-II for further discussion.
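The parameter budget can be checked with a short count. This sketch assumes the filter widths of the original AlexNet (96, 256, 384, 384, 256), 227 × 227 inputs so that the last feature map is 6 × 6 × 256, and it counts only convolution and fully connected weights and biases (normalization parameters are ignored); the function names are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weights + biases of a k x k convolution; grouping splits the input channels."""
    return c_out * (c_in // groups) * k * k + c_out

def fc_params(n_in, n_out):
    """Weights + biases of a fully connected layer."""
    return n_in * n_out + n_out

def alexnet_params(num_classes, grouped=False):
    """AlexNet conv + FC parameter count.

    grouped=True mimics the original two-GPU split (groups=2 in conv2, conv4, conv5);
    grouped=False corresponds to merging the kernels onto one device.
    """
    g = 2 if grouped else 1
    convs = (conv_params(3, 96, 11) + conv_params(96, 256, 5, g)
             + conv_params(256, 384, 3) + conv_params(384, 384, 3, g)
             + conv_params(384, 256, 3, g))
    fcs = (fc_params(256 * 6 * 6, 4096) + fc_params(4096, 4096)
           + fc_params(4096, num_classes))
    return convs + fcs

print(alexnet_params(1000))                # 62378344 (kernels merged)
print(alexnet_params(1000, grouped=True))  # 60965224 (original grouping)
print(alexnet_params(67))                  # 58555843 (67-class output)
```

Under these assumptions almost 55 million of the parameters sit in the first two fully connected layers, and resetting the output layer to 67 classes changes only the last term; exact totals for any particular implementation depend on its layer widths and on whether normalization parameters are counted.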

CNN Models Based on Algorithm Adaption
Although the method discussed above, adapting the dataset to an original multiclassification CNN, is simple and convenient to perform, its prominent defect lies in sample loss. Specifically, images of a given class of one label, such as w/b = 0.3, are not gathered so that the networks can learn the features of concrete mixture images of that class; instead, they are distributed across 10 sets of images with different mix proportions. Moreover, the network models can only be evaluated at the level of mix proportion instances, and it cannot be estimated how the networks perform with regard to each of the three labels.
To tackle these defects, the original CNNs tailored for multiclassification tasks need to be improved. First, for one sample, the confidence scores of different classes output by the Softmax activation function used in the original CNNs are coupled with each other and always sum to 1. However, in our multilabel classification task, the three labels w/b, s/a and NMSCA are not mutually exclusive. Therefore, the Softmax function was replaced by the Sigmoid function as the activation of the output layer. For the confidence score $P\left(y^{(i)} = j \mid x^{(i)}; w\right)$ of the j-th class being a proper label of the i-th example, the Sigmoid and Softmax functions are defined as Equations (1) and (2), respectively:

$$P\left(y^{(i)} = j \mid x^{(i)}; w\right) = \frac{1}{1 + e^{-w_j^{T} x^{(i)}}} \quad (1)$$

$$P\left(y^{(i)} = j \mid x^{(i)}; w\right) = \frac{e^{w_j^{T} x^{(i)}}}{\sum_{k=1}^{n} e^{w_k^{T} x^{(i)}}} \quad (2)$$

for i = 1, 2, ..., m, where m is the number of examples in the current batch, n is the number of classes, w denotes the weights, and $w_j^{T} x^{(i)}$ is the output of the previous layer for the j-th class.
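The practical difference between the two activations can be seen on a small batch of logits. In this minimal NumPy sketch (logit values are illustrative), Softmax couples the classes so that each row sums to 1, while Sigmoid scores each class independently, so three classes, one each from w/b, s/a and NMSCA, can all be close to 1 at the same time.

```python
import numpy as np

def softmax(z):
    """Scores over n classes that are coupled and sum to 1 for each example."""
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    """An independent score in (0, 1) for every class."""
    return 1.0 / (1.0 + np.exp(-z))

# z[i, j] = w_j^T x^(i): logits for m = 2 examples and n = 4 classes.
z = np.array([[4.0, 3.0, 4.0, -2.0],
              [0.0, 0.0, 0.0, 0.0]])
print(softmax(z).sum(axis=1))   # each row sums to 1
print(sigmoid(z).sum(axis=1))   # rows are not constrained to sum to 1
```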
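Sensors 2020, 20, 4638 9 of 23

Subsequently, categorical cross entropy was replaced with binary cross entropy as the loss function so that each output label is treated as an independent Bernoulli distribution. The binary and categorical cross-entropy loss functions are defined as Equations (3) and (4), respectively:

$$J(w) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[1\left\{y^{(i)}=j\right\}\log\frac{1}{1+e^{-w_j^{T}x^{(i)}}} + \left(1-1\left\{y^{(i)}=j\right\}\right)\log\left(1-\frac{1}{1+e^{-w_j^{T}x^{(i)}}}\right)\right] \quad (3)$$

$$J(w) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{n}1\left\{y^{(i)}=j\right\}\log\frac{e^{w_j^{T}x^{(i)}}}{\sum_{k=1}^{n}e^{w_k^{T}x^{(i)}}} \quad (4)$$

where the index k in the denominator of Equation (4) runs over all n classes independently of the index j of the outer sum $\sum_{j=1}^{n} 1\{\cdot\}$. The term $1\{y^{(i)} = j\}$ is an indicator function that returns 1 if the j-th class is a ground-truth label of the i-th example and 0 otherwise.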
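Both losses can be written in a few lines of NumPy. This is a minimal sketch that averages over the m examples of a batch (the 1/m reduction is an assumption about how the batch loss is aggregated) and clips probabilities to avoid log(0); the function names are hypothetical.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Every class is an independent Bernoulli variable.

    y: (m, n) 0/1 targets (several ones per row allowed); p: (m, n) sigmoid scores.
    """
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p), axis=1))

def categorical_cross_entropy(y, p, eps=1e-12):
    """y: (m, n) one-hot targets; p: (m, n) softmax scores. Only the true class contributes."""
    p = np.clip(p, eps, 1.0)
    return -np.mean(np.sum(y * np.log(p), axis=1))

# One example with two positive classes: BCE also penalises scores on the negatives.
bce = binary_cross_entropy(np.array([[1, 0, 1, 0]]), np.array([[0.9, 0.2, 0.8, 0.1]]))
cce = categorical_cross_entropy(np.array([[0, 1, 0]]), np.array([[0.2, 0.7, 0.1]]))
```

The design point is visible in the formulas: categorical cross entropy ignores the scores of non-target classes (the Softmax coupling handles them), whereas binary cross entropy explicitly pushes every non-label class towards 0, which is what an independent-output multilabel head needs.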
Essentially, the above two improvements transform the multilabel classification problem into a set of binary classification tasks for each example; that is, for each of the 13 classes in total, the CNN models determine whether it is or is not a proper label of a given image. Because this transformation is realized by adapting the CNN models to the dataset, these improved CNN models are still regarded as algorithm-adaptation-based models.
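With Sigmoid outputs and binary cross entropy, each training target becomes a 13-dimensional 0/1 vector with one unit set per label. The 5/5/3 split used below is inferred from the counts in the text rather than stated there: NMSCA has 3 classes (10, 20 and 31.5 mm), the 13 classes and 75 full-factorial combinations then require w/b and s/a to have 5 classes each (5 + 5 + 3 = 13 and 5 × 5 × 3 = 75). The unit ordering and the name `multi_hot` are assumptions.

```python
import numpy as np

# Classes per label, in an assumed output order: 5 + 5 + 3 = 13 output units.
GROUP_SIZES = {"w/b": 5, "s/a": 5, "NMSCA": 3}

def multi_hot(class_idx):
    """Encode one image's labels as a 13-dim 0/1 target vector.

    class_idx maps each label to the 0-based index of its class within that label;
    exactly one unit per label is set, so every target contains three ones.
    """
    target = np.zeros(sum(GROUP_SIZES.values()))
    offset = 0
    for label, size in GROUP_SIZES.items():
        idx = class_idx[label]
        if not 0 <= idx < size:
            raise ValueError(f"{label} index {idx} out of range")
        target[offset + idx] = 1.0
        offset += size
    return target

t = multi_hot({"w/b": 0, "s/a": 2, "NMSCA": 1})
```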
It is worth noting that the above improvements do not consider label correlations. Specifically, the CNN models output the three classes with the highest confidence scores, but it cannot be guaranteed that these come from three different mix proportion labels. To cope with this shortcoming, an additional module was added after the output layer so that the network models select the class with the highest confidence score separately from w/b, s/a and NMSCA as the final outputs.
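The selection module can be sketched as an argmax taken separately inside each label's block of outputs. The paper does not detail the module, so this sketch assumes the 13 sigmoid scores are ordered as w/b (5), s/a (5), NMSCA (3); the group layout, scores and name `decode_per_label` are hypothetical.

```python
import numpy as np

GROUP_SIZES = [("w/b", 5), ("s/a", 5), ("NMSCA", 3)]

def decode_per_label(scores):
    """Pick the highest-scoring class separately within each label group.

    A plain top-3 over all 13 scores could return, e.g., two w/b classes and
    no s/a class; restricting the argmax to each group rules that out.
    Returns {label: (class index within the label, score)}.
    """
    scores = np.asarray(scores, dtype=float)
    result, offset = {}, 0
    for label, size in GROUP_SIZES:
        block = scores[offset:offset + size]
        k = int(np.argmax(block))
        result[label] = (k, float(block[k]))
        offset += size
    return result

scores = np.array([0.90, 0.85, 0.1, 0.0, 0.0,   # w/b
                   0.2, 0.3, 0.1, 0.0, 0.0,     # s/a
                   0.1, 0.4, 0.0])              # NMSCA
naive_top3 = set(np.argsort(scores)[-3:].tolist())   # {0, 1, 11}: two w/b classes, no s/a
decoded = decode_per_label(scores)
```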
Corresponding to Net-I and Net-II, both the AlexNet structure and the self-established structure were improved for multilabel image classification. The improved AlexNet and self-established structures are named Net-III and Net-IV, respectively, for further discussion.
Moreover, tuning training hyperparameters is usually a major challenge in deep learning tasks, and manual trial-and-error processes may consume considerable time. Therefore, Bayesian optimization [61,62] was adopted to search for optimal hyperparameters in the parameter space, namely the initial learning rate, batch size and number of epochs. Approximate search ranges of the three hyperparameters were determined by preliminary training in advance. To accelerate convergence of the search, several discrete values within these approximate ranges, rather than continuous closed intervals, were specified to form three search sets, as listed in Table 7. The remaining major training options were fixed as listed in Table 8. The adjustments and improvements of the four CNN models are summarized in Table 9, and the four established CNN models are visualized in Figures 1 and 2.

Table 9. Adjustments or improvements of four CNN models.
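Bayesian optimization over discrete search sets can be sketched with a Gaussian-process surrogate and the expected-improvement acquisition. The paper does not state which surrogate, kernel or acquisition function was used, so the GP with an RBF kernel, expected improvement, the rank-based [0, 1] encoding, the candidate values and the toy objective below are all assumptions; in the study, each objective evaluation would be a full training run returning the validation error.

```python
import itertools
import math
import numpy as np

def gp_posterior(X, y, Xs, length=0.3, noise=1e-4):
    """Posterior mean and std of a zero-mean GP with a unit-variance RBF kernel."""
    def rbf(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * d2 / length ** 2)
    L = np.linalg.cholesky(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)                                   # (n_observed, n_candidates)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    var = np.clip(1.0 - (v ** 2).sum(axis=0), 1e-12, None)
    return Ks.T @ alpha, np.sqrt(var)

def expected_improvement(mu, sd, best):
    """Expected improvement for minimisation."""
    z = (best - mu) / sd
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sd * pdf

def bayes_opt_discrete(objective, grids, n_init=3, n_iter=10, seed=0):
    """Minimise objective(*values) over the Cartesian product of discrete sets."""
    rng = np.random.default_rng(seed)
    cands = list(itertools.product(*grids))
    # Encode each hyperparameter by its rank within its own set, scaled to [0, 1].
    enc = np.array([[g.index(v) / max(len(g) - 1, 1) for g, v in zip(grids, c)]
                    for c in cands])
    tried = [int(i) for i in rng.choice(len(cands), size=n_init, replace=False)]
    ys = [objective(*cands[i]) for i in tried]
    for _ in range(n_iter):
        remaining = [i for i in range(len(cands)) if i not in tried]
        if not remaining:
            break
        y = np.array(ys)
        y_std = (y - y.mean()) / (y.std() + 1e-12)    # standardise observed errors
        mu, sd = gp_posterior(enc[tried], y_std, enc[remaining])
        nxt = remaining[int(np.argmax(expected_improvement(mu, sd, y_std.min())))]
        tried.append(nxt)
        ys.append(objective(*cands[nxt]))
    best = int(np.argmin(ys))
    return cands[tried[best]], ys[best]

# Hypothetical stand-in for "train the CNN and return the validation error".
def toy_validation_error(lr, batch, epochs):
    return (math.log10(lr) + 3.0) ** 2 + ((batch - 64) / 64) ** 2 + ((epochs - 30) / 30) ** 2

grids = [[1e-4, 3e-4, 1e-3, 3e-3, 1e-2], [16, 32, 64, 128], [10, 20, 30, 40]]
best, err = bayes_opt_discrete(toy_validation_error, grids, n_init=3, n_iter=12)
```

Restricting the search to a small discrete grid keeps the number of candidate points (80 here) manageable, which is one reason a discrete search set can converge in few evaluations.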

Results
All programs in this study were run in MATLAB R2020a and Python 3.8 on a desktop computer equipped with a 2.6 GHz Intel i7-4720 CPU, 16 GB RAM, an x64-based processor and an NVIDIA GeForce GTX 960M GPU.

Optimal Hyperparameters Selected by Bayesian Optimization
Bayesian optimization was performed by minimizing the classification error on the validation set during training; that is, the best-performing CNN model was selected based on validation accuracy. The optimal training hyperparameters selected by Bayesian optimization are listed in Table 10.

Metrics and Evaluations of Training and Validation Set
With the optimal hyperparameters, the training and validation accuracies achieved by the four CNN models are listed in Table 11. Accuracies on both the training and validation sets are higher than 99% for all four CNN models, and the validation accuracies of Net-III and Net-IV reached 100%. Notably, both the training and validation accuracies of the two algorithm-adaptation-based models, Net-III and Net-IV, are higher than those of Net-I and Net-II. Moreover, Net-II and Net-IV, which use the self-established network structure, achieved higher accuracies than Net-I and Net-III.
Furthermore, as the accuracy curves in Figure 3 show, Net-III and Net-IV had higher initial training and validation accuracies than the multiclassification-based Net-I and Net-II. Net-I and Net-III converged more slowly, and their curves fluctuate more significantly. Meanwhile, the training time of the two CNNs with the self-established structure is much shorter than that of Net-I and Net-III, since the self-established CNNs have only 27 million parameters, less than half as many as the AlexNet structure.
For images in the validation set, confusion matrices were drawn to illustrate the classification results visually. Figures 4 and 5 show partial confusion matrices of Net-I and Net-II. Confusion matrices are not needed for Net-III and Net-IV because they are not essentially multiclassification models and their validation accuracies reached 100%. (Class names are shown in the form w/b_s/a_NMSCA; for example, 0.3_25_10 means w/b = 0.3, s/a = 25% and NMSCA = 10 mm. The same applies hereinafter.) The two confusion matrices show only the classes involved in misclassifications; classes in which all images were correctly classified are omitted. The element $a_{ij}$ in each cell of the matrix is the number of images classified into the i-th class that belong to the j-th class. The color of each cell represents its proportion of the sum of all elements in its column, that is, its share of the images with the corresponding ground-truth label, and thus the proportion of correct or incorrect classification. As shown in Figures 4 and 5, most images lie on the diagonal of the matrices, meaning that they are correctly classified. Of the 3350 validation images, Net-I misclassified 33 images in seven classes and Net-II misclassified four images in two classes; such classification ability is acceptable in engineering and manufacturing practice. These data also show that Net-II performs better than Net-I on the validation set.
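The matrix convention described above (rows are predicted classes, columns are ground-truth classes, colors are column shares) can be sketched in NumPy. The class indices are toy values and the function names are hypothetical.

```python
import numpy as np

def confusion_matrix(pred, true, n_classes):
    """a[i, j] = number of images classified into class i whose ground truth is class j."""
    a = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(a, (np.asarray(pred), np.asarray(true)), 1)
    return a

def column_shares(a):
    """Each cell as a fraction of its column total, i.e. of one ground-truth class."""
    totals = a.sum(axis=0, keepdims=True)
    return np.divide(a, totals, out=np.zeros(a.shape), where=totals > 0)

pred = [0, 0, 1, 1, 1, 2]
true = [0, 0, 0, 1, 1, 2]
a = confusion_matrix(pred, true, 3)
n_misclassified = a.sum() - np.trace(a)   # off-diagonal images
```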
In summary, all four CNN models showed strong learning ability on the training and validation sets, and Net-IV performed best during training and validation.

Metrics and Evaluations of Testing Set
Generalization ability of CNN models was evaluated and compared using testing set composed of 67 images.

Evaluations and Comparisons of Net-I and Net-II
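Net-I and Net-II are evaluated using the $F_1$ value, which is commonly applied to multiclassification tasks. $F_1$ is defined based on two parameters, precision and recall, which are defined as follows:

$$\mathrm{precision} = \frac{1}{n}\sum_{j=1}^{n}\frac{\left|\left\{x^{(i)} \mid y_j \in Y^{(i)} \wedge y_j \in h\left(x^{(i)}\right),\ 1 \le i \le m\right\}\right|}{\left|\left\{x^{(i)} \mid y_j \in h\left(x^{(i)}\right),\ 1 \le i \le m\right\}\right|} \quad (5)$$

$$\mathrm{recall} = \frac{1}{n}\sum_{j=1}^{n}\frac{\left|\left\{x^{(i)} \mid y_j \in Y^{(i)} \wedge y_j \in h\left(x^{(i)}\right),\ 1 \le i \le m\right\}\right|}{\left|\left\{x^{(i)} \mid y_j \in Y^{(i)},\ 1 \le i \le m\right\}\right|} \quad (6)$$

where n is the number of classes, m is the number of examples, $x^{(i)}$ denotes the feature vector extracted from the i-th example, $Y^{(i)}$ denotes the set of ground-truth labels associated with $x^{(i)}$, $y_j$ is the j-th class label, $|\cdot|$ returns the cardinality of a set, and $h(x)$ is the classifier that returns the set of proper labels of x. Precision and recall are usually considered jointly in practice. $F_\beta$ is an integrated version of precision and recall with the balancing factor $\beta > 0$, defined as Equation (7):

$$F_\beta = \frac{\left(1+\beta^2\right)\cdot\mathrm{precision}\cdot\mathrm{recall}}{\beta^2\cdot\mathrm{precision}+\mathrm{recall}} \quad (7)$$

When $\beta = 1$, $F_\beta$ returns the harmonic mean of precision and recall, known as $F_1$:

$$F_1 = \frac{2\cdot\mathrm{precision}\cdot\mathrm{recall}}{\mathrm{precision}+\mathrm{recall}} \quad (8)$$

The precision, recall and $F_1$ values of Net-I and Net-II are listed in Table 12. All three indices of Net-II equal 1 because this network correctly classified all testing images; those of Net-I are slightly lower.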
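For single-label predictions, where each example has one true and one predicted class, the macro-averaged precision, recall and $F_\beta$ described above reduce to per-class counts. This is a minimal sketch; treating a class that is never predicted (or never present) as contributing 0 is an assumption, and `macro_prf` is a hypothetical name.

```python
import numpy as np

def macro_prf(true, pred, n_classes, beta=1.0):
    """Per-class precision and recall averaged over classes, then F_beta.

    true, pred: integer class indices, one per example.
    """
    true, pred = np.asarray(true), np.asarray(pred)
    precisions, recalls = [], []
    for j in range(n_classes):
        tp = np.sum((true == j) & (pred == j))      # correct predictions of class j
        n_pred, n_true = np.sum(pred == j), np.sum(true == j)
        precisions.append(tp / n_pred if n_pred else 0.0)
        recalls.append(tp / n_true if n_true else 0.0)
    p, r = float(np.mean(precisions)), float(np.mean(recalls))
    b2 = beta ** 2
    f = (1 + b2) * p * r / (b2 * p + r) if p + r > 0 else 0.0
    return p, r, f

p, r, f1 = macro_prf([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
```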
Specifically, Tables 13 and 14 list the top-3 confidence scores and corresponding classes for testing images that were misclassified and for images for which the network models returned a low confidence score for the ground-truth label (the labels corresponding to each confidence score are given in brackets). The two tables show that Net-I misclassified seven of the 67 testing images and returned a low confidence score (lower than 60%) for the ground-truth label of three images, whereas Net-II correctly classified all testing images and returned a confidence score lower than 60% for the ground-truth label of only one image. The ground-truth labels of these two kinds of images are highly consistent with those of the misclassified validation images.

Table 13. Testing images which were misclassified by Net-I and images for which Net-I returned a low confidence score for the ground-truth label.

Columns of Tables 13 and 14: ground truth label (w/b, s/a (%), NMSCA (mm)); top-3 confidence scores (%) and corresponding classes.
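The screening behind Tables 13 and 14 can be sketched as two small checks on a model's output vector: report the top-3 classes, and flag an image if it is misclassified or its ground-truth class scores below 60%. The class names follow the w/b_s/a_NMSCA form used in the text; the scores and function names are hypothetical.

```python
import numpy as np

def top_k(scores, class_names, k=3):
    """The k highest-scoring classes of one image as (class name, confidence %) pairs."""
    order = np.argsort(scores)[::-1][:k]
    return [(class_names[i], round(100.0 * float(scores[i]), 2)) for i in order]

def needs_review(scores, true_idx, threshold=0.60):
    """True if the image is misclassified or its ground-truth class scores below threshold."""
    misclassified = int(np.argmax(scores)) != true_idx
    return bool(misclassified or scores[true_idx] < threshold)

names = ["0.3_25_10", "0.3_25_20", "0.3_25_31.5"]   # class names in w/b_s/a_NMSCA form
scores = np.array([0.55, 0.40, 0.05])              # hypothetical softmax output
print(top_k(scores, names))
print(needs_review(scores, true_idx=0))  # True: correct class, but below 60%
```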
To summarize, the generalization performance of Net-II is much better than that of Net-I. Overfitting appeared in the CNN model using the AlexNet structure, which is in line with the considerations discussed in Section 4.3 when establishing the network models.

Evaluations and Comparisons of Net-III and Net-IV
Multilabel classification tasks usually cannot directly use existing evaluation metrics for multiclassification problems, so here we introduce the mean interpolated average precision (MiAP), which is commonly applied in the information retrieval community. MiAP is also defined based on precision and recall, but these two indices are defined differently here from those in Section 5.3.1. For the j-th class, after the m examples are sorted in descending order of their confidence scores $\hat{y}_j^{(i)}$, precision and recall are redefined as Equations (9) and (10):

$$P_j(k) = \frac{1}{k}\sum_{i=1}^{k} y_j^{(i)} \quad (9)$$

$$R_j(k) = \frac{\sum_{i=1}^{k} y_j^{(i)}}{\sum_{i=1}^{m} y_j^{(i)}} \quad (10)$$

for i = 1, 2, ..., k and k = 1, 2, ..., m, where m is the sample size, $y_j^{(i)}$ denotes the one-hot encoded label of $x^{(i)}$ for the j-th class (1 if the j-th class is a ground-truth label of $x^{(i)}$ and 0 otherwise), $\hat{y}_j^{(i)}$ returns the confidence score of $y_j^{(i)}$, and the sums over i = 1, ..., k include the examples with the top-k confidence scores.
Rather than treating as positives only the examples with the highest confidence score, or those above a fixed threshold, this procedure in effect varies the confidence threshold as k goes from 1 to m: after the confidence scores are sorted in descending order, the examples ranked at or above the k-th example are treated as positives.
The number of recall values obtained for the j-th class equals $\left|\left\{x^{(i)} \mid y_j \in Y^{(i)},\ 1 \le i \le m\right\}\right|$, the number of examples whose ground truth includes the j-th class, and several precision values can correspond to one recall level. Interpolation is used to calculate the average precision, which simplifies the calculation and jointly reflects the precision and recall of the models. The interpolated precision $P_{\mathrm{interp}}$ at a given recall level R is defined as the highest precision P found at any recall level $R' \ge R$:

$$P_{\mathrm{interp}}(R) = \max_{R' \ge R} P\left(R'\right) \quad (11)$$

Therefore, for the j-th class, the interpolated average precision (iAP) can be calculated as Equation (12):

$$\mathrm{iAP}_j = \frac{1}{\left|\left\{x^{(i)} \mid y_j \in Y^{(i)},\ 1 \le i \le m\right\}\right|}\sum_{R \in \mathcal{R}_j} P_{\mathrm{interp}}(R) \quad (12)$$

where $\mathcal{R}_j$ is the set of recall levels obtained for the j-th class. The MiAP of a CNN model is then the mean over all n classes:

$$\mathrm{MiAP} = \frac{1}{n}\sum_{j=1}^{n}\mathrm{iAP}_j \quad (13)$$

Figure 6 illustrates the P-R curves of Net-III and Net-IV, and the MiAP and the iAP of each class are listed in Table 15. Both CNNs achieved high MiAP and iAP, so the classification task was accomplished. From the perspective of labels, the NMSCA label had higher P-R curves, indicating that the networks generalized best with respect to NMSCA and performed slightly worse when identifying w/b and s/a. Comparing the CNN models, the MiAP and iAP of Net-III are lower than those of Net-IV: the generalization ability of the network with the AlexNet structure is relatively poor, and overfitting may have occurred, which is consistent with the considerations discussed in Section 4.3.

MiAP is a label-based ranking metric that only reflects a model's classification performance on each class. In manufacturing and engineering practice, however, attention usually focuses on the error between the result identified by the model and the actual situation for a given sample. Although both CNNs are essentially classifiers that output the confidence score of a specific class being a proper label of a given image, the three labels in the present study are quantifiable indices.
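The ranking-based metric can be sketched per class: sort by confidence, compute precision and recall over the top-k examples, take the interpolated precision at each recall level, and average. This minimal NumPy sketch follows the definitions in the text but assumes that the recall levels averaged in Equation (12) are those reached at each positive example; the function names and toy data are hypothetical.

```python
import numpy as np

def interpolated_ap(y_true, scores):
    """iAP of one class.

    y_true: 0/1 indicators (1 if the class is a ground-truth label of the example);
    scores: confidence scores for this class. Assumes at least one positive.
    """
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = y_true[order]                          # ground truth in descending-score order
    k = np.arange(1, len(hits) + 1)
    precision = np.cumsum(hits) / k               # precision over the top-k examples
    recall = np.cumsum(hits) / hits.sum()         # recall over the top-k examples
    levels = np.unique(recall[hits == 1])         # one recall level per positive
    # Interpolated precision: highest precision at any recall level >= R.
    p_interp = [precision[recall >= r].max() for r in levels]
    return float(np.mean(p_interp))

def mean_iap(Y_true, S):
    """MiAP: mean of iAP over all classes (columns of Y_true and S)."""
    return float(np.mean([interpolated_ap(Y_true[:, j], S[:, j]) for j in range(Y_true.shape[1])]))

Y = np.array([[1, 0], [0, 1], [1, 0], [0, 0]])
S = np.array([[0.9, 0.1], [0.8, 0.9], [0.7, 0.2], [0.1, 0.3]])
miap = mean_iap(Y, S)
```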
Hence, "identified values" of w/b, s/a and NMSCA for each testing image can be calculated as the weighted average of the numeric values of the classes within one mix proportion label, using their confidence scores as weights. Correspondingly, "actual values" are the real w/b, s/a and NMSCA values given by the ground-truth label. Identified values can be compared with actual values by various indices to evaluate their errors. In this study, the mean absolute percentage error (MAPE) and the absolute fraction of variance ($R^2$) were applied for error evaluation. For each label, MAPE and $R^2$ are defined as Equations (14) and (15):

$$\mathrm{MAPE} = \frac{100\%}{m}\sum_{i=1}^{m}\left|\frac{a_i - p_i}{a_i}\right| \quad (14)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(a_i - p_i\right)^2}{\sum_{i=1}^{m}p_i^2} \quad (15)$$

where $a_i$ and $p_i$ denote the actual and identified values of the i-th example, and m is the sample size. The actual and identified values of each testing image are illustrated in Figure 7; most points lie on or close to the diagonal of the diagram. The MAPE and $R^2$ of each label, and the average MAPE and $R^2$ of the two networks, are listed in Table 16.
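The identified-value computation and the two error indices can be sketched as follows. Two details are assumptions not stated in the text: the confidence scores within a label are normalized to sum to 1 before weighting, and the absolute fraction of variance is taken as 1 minus the sum of squared errors over the sum of squared identified values. The NMSCA levels (10, 20 and 31.5 mm) come from the text; the scores and function names are hypothetical.

```python
import numpy as np

def identified_value(scores, class_values):
    """Confidence-weighted average of one label's numeric class values.

    Assumption: the label's confidence scores are normalised to sum to 1.
    """
    s = np.asarray(scores, dtype=float)
    return float(np.dot(s, np.asarray(class_values, dtype=float)) / s.sum())

def mape(actual, identified):
    """Mean absolute percentage error, in percent."""
    a, p = np.asarray(actual, dtype=float), np.asarray(identified, dtype=float)
    return float(100.0 * np.mean(np.abs((a - p) / a)))

def r2_abs_fraction(actual, identified):
    """Absolute fraction of variance: 1 - sum((a - p)^2) / sum(p^2) (assumed form)."""
    a, p = np.asarray(actual, dtype=float), np.asarray(identified, dtype=float)
    return float(1.0 - np.sum((a - p) ** 2) / np.sum(p ** 2))

nmsca_levels = [10.0, 20.0, 31.5]                          # NMSCA classes in mm
v = identified_value([0.02, 0.96, 0.02], nmsca_levels)     # hypothetical scores, ~20.03 mm
```

Because the NMSCA classes are numerically far apart, a small confidence leak to a neighbouring class shifts the identified value more than it would for w/b or s/a, which matches the explanation given for the NMSCA results.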
Clearly, the MAPE of each label is small, close to 0, for both CNNs, and the R² values are close to 1. In contrast to the MiAP results, the NMSCA label did not perform best under MAPE and R² compared with w/b and s/a. This is because the numeric differences between the three NMSCA classes are larger than those between the w/b or s/a classes, so a similar spread of confidence over neighboring classes shifts the weighted-average identified value further from the actual value. Regarding the networks, the comparison agrees with the MiAP results: Net-IV had better generalization ability. The research methodology, result evaluations and comparisons described above are summarized as a flow chart in Figure 8.

Comparative Study
The above results indicate that all four CNN models learned the features of mix proportion in fresh concrete images and showed outstanding learning and generalization ability. Moreover, the identification time of testing images is within 1 s, enabling real-time monitoring of mix proportion. Based on the comprehensive comparisons, Net-IV, which showed the best overall performance, was chosen as the representative achievement of the present study and named ConcMPNet.
ConcMPNet was compared with the methods proposed in Refs. [17,18,19,20] and with the existing testing method, as shown in Table 17. ConcMPNet requires the simplest procedure, and only ConcMPNet enables real-time monitoring, supports green manufacturing and avoids wasting resources. The comparative study thus highlights the merit and effectiveness of the presented multilabel-image-classification and CNN-based monitoring method.
Note to Table 17: for Refs. [17,18], R² is compared as the evaluation metric; for Ref. [19], the percentage of successful classification is adopted (see that study for details); Ref. [20] used R for evaluation; for the existing laboratory testing method, an assurance rate of no less than 95% is required for concrete compressive strength.

Establishment of the Mix Proportion Monitoring and Integrated Intelligent Sensing System
ConcMPNet was embedded into an executable application with a purpose-designed visual interactive interface. The packaged application was equipped with hardware such as a high-definition camera and deployed to terminals to establish the fresh concrete mix proportion monitoring system, which can also be regarded as an intelligent sensor. The monitoring system has two parallel modes of operation for sensing: taking real-time pictures and loading images stored on the terminals. The user interface of the monitoring system is shown in Figure 9.
This system was combined with existing mechanical and weighing sensors to establish an integrated intelligent sensing system. When production equipment fails to produce a concrete mixture with the calculated mix proportion, the integrated intelligent sensing system can send warning messages to fixed and mobile terminals. In this way, real-time and full-scale mix proportion monitoring, together with sensing of and warning about inaccurate mix proportions, can be achieved during manufacturing. To summarize, the workflow and research process of the present study, covering the establishment, improvement, training, testing and selection of the CNN models and the establishment of the integrated intelligent sensing system, are illustrated in Figure 10.
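The warning step described above can be pictured as a comparison between the identified mix proportion and the calculated (design) mix proportion. The sketch below is hypothetical: the paper does not state its warning criterion, so the function name `mix_warnings`, the relative-tolerance rule and all numeric values are illustrative assumptions, and printing stands in for sending messages to terminals.

```python
from typing import Dict, List


def mix_warnings(
    design: Dict[str, float],
    identified: Dict[str, float],
    rel_tolerance: Dict[str, float],
) -> List[str]:
    """Return one warning per label whose identified value deviates from the
    design value by more than that label's (hypothetical) relative tolerance."""
    messages: List[str] = []
    for label, target in design.items():
        deviation = abs(identified[label] - target) / abs(target)
        if deviation > rel_tolerance[label]:
            messages.append(
                f"WARNING: {label} identified as {identified[label]:.3g}, "
                f"design value {target:.3g} (deviation {deviation:.1%})"
            )
    return messages


if __name__ == "__main__":
    # Hypothetical design values, identified values and tolerances.
    design = {"w/b": 0.45, "s/a": 0.40, "NMSCA": 20.0}
    identified = {"w/b": 0.50, "s/a": 0.40, "NMSCA": 20.0}
    tolerance = {"w/b": 0.05, "s/a": 0.05, "NMSCA": 0.05}
    for msg in mix_warnings(design, identified, tolerance):
        print(msg)  # a deployed system would push this to fixed/mobile terminals
```

A relative tolerance is used here only because the three labels have very different magnitudes; per-label absolute limits taken from a production specification would work equally well.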

Figure 9. User interface of concrete mix proportion monitoring system.

Conclusions
In the present study, a novel deep-learning-based method was presented for real-time and full-scale monitoring of fresh concrete mix proportion. As a typical data-driven, learning-based approach, its key insight lies in feeding elaborately planned, uniformly distributed, wide-coverage and high-quality data to CNN models, which extract implicit features and generalize them to newly fed data for identification. Taking engineering and manufacturing practice into account, w/b, s/a and NMSCA were selected as the mix proportion variables; they also serve as the labels of the concrete mixture images and as the targets of classification and mix proportion monitoring.
A total of 67 experiments were conducted, and 67 corresponding sets of images were collected and annotated with different combinations of classes in the three mix proportion labels. Four CNN models, based on the problem transformation and algorithm adaptation approaches, were established, improved, trained and tested. The training and validation accuracies of all four networks are above 99%. On the testing set, the F1 score of Net-I is above 0.85 and that of Net-II reaches 1; the MiAP values of Net-III and Net-IV are both above 0.9, their MAPEs are small and their R² values are close to 1. All four networks therefore showed outstanding learning and generalization ability. After comparison, Net-IV was chosen as the representative achievement and named ConcMPNet. ConcMPNet was embedded into an executable application and equipped with hardware to establish a fresh concrete mix proportion monitoring system. This system was deployed to terminals and combined with existing mechanical and weighing sensors to establish an integrated intelligent sensing system, with which real-time and full-scale mix proportion monitoring, together with sensing of and warning about inaccurate mix proportions, can be achieved during manufacturing.
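The two families of multilabel solutions mentioned above differ mainly in how an image's three labels are encoded as training targets. This section does not state which encodings Net-I to Net-IV used, so the sketch below shows two common choices as assumptions: label powerset (a problem-transformation scheme that turns each class combination into one single-label class) and multi-hot targets (a typical algorithm-adaptation target for independent sigmoid outputs). The class names and the number of classes for w/b and s/a are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def powerset_classes(labels: Sequence[Sequence[str]]) -> Dict[Tuple[str, ...], int]:
    """Problem transformation (label powerset): every combination of one class
    per label becomes a single class of an ordinary multiclass problem."""
    return {combo: idx for idx, combo in enumerate(product(*labels))}


def multi_hot(sample: Sequence[str], labels: Sequence[Sequence[str]]) -> List[int]:
    """Algorithm adaptation target: one binary entry per class, concatenated
    label by label, suitable for independent sigmoid outputs."""
    vec: List[int] = []
    for chosen, classes in zip(sample, labels):
        if chosen not in classes:
            raise ValueError(f"unknown class {chosen!r}")
        vec.extend(1 if c == chosen else 0 for c in classes)
    return vec


if __name__ == "__main__":
    # Hypothetical class names for the w/b, s/a and NMSCA labels.
    labels = [["wb1", "wb2", "wb3"], ["sa1", "sa2"], ["d1", "d2", "d3"]]
    print(len(powerset_classes(labels)))            # 18
    print(multi_hot(("wb2", "sa1", "d3"), labels))  # [0, 1, 0, 1, 0, 0, 0, 1]
```

The powerset keeps a standard single-label softmax network unchanged but multiplies the number of output classes, whereas the multi-hot target keeps one output per class at the cost of modifying the network's output layer and loss.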
The contributions of this research are as follows:
• The improved CNN model was embedded in an executable application for fresh concrete mix proportion monitoring, overcoming the lack of real-time and full-scale monitoring that has not, to date, been addressed by existing testing methods or other studies;
• The presented CNN model could likely be scaled up to other concrete mixture identification tasks. The well-trained model could be applied for transfer learning, which has received much attention in recent years because it effectively reduces computational costs. The success of transfer learning depends critically on the target-task dataset being similar to the original training dataset, yet existing successful CNNs were not trained on professional concrete mixture datasets. Transfer learning from our CNN model therefore offers a potential route for future concrete identification tasks of this kind. For a specific kind of concrete mixture, such as recycled concrete or environmentally friendly concrete, monitoring could be implemented simply by feeding the collected dataset to our CNN models and carrying out transfer learning;
• As multidisciplinary research, this study introduced a state-of-the-art method to the long-established, traditional engineering manufacturing community and widened the application fields of intelligent methods and deep learning techniques, providing a solid foundation for future work in both disciplines.
The prominent merit of the presented method is that it can monitor fresh concrete mix proportion in real time simply by taking pictures, which previous studies and existing methods could not achieve. However, the proposed multilabel-image-classification-based method is not intended to be a cure-all. The convenience of monitoring comes at the expense of a large number of prior experiments and onerous data collection. This shortcoming is shared by all data-driven methods: more precise identification requires more data and larger datasets.
Future work could combine CNNs with other intelligent algorithms, such as ANNs, to predict concrete properties, especially mechanical properties, along the route "image-mix proportion-property". Moreover, combining the proposed system with other approaches, such as chemical composition sensors, is a promising step toward a more intelligent and precise sensing system for monitoring the early properties of concrete.