Computerized Data Interpretation for Concrete Assessment with Air-Coupled Impact-Echo: An Online Learning Approach

Developing efficient Artificial Intelligence (AI)-enabled systems to substitute for the human role in non-destructive testing is an emerging topic of considerable interest. In this study, we propose a novel impact-echo analysis system using online machine learning, which aims at achieving near-human performance for the assessment of concrete structures. Current computerized impact-echo systems commonly employ lab-scale data to validate the models. In practice, however, the echo patterns can be far more complicated due to the varying geometric shapes and materials of structures. To deal with a large variety of unseen data, we propose a sequential treatment for echo characterization. More specifically, the proposed system can adaptively update itself to approach human performance in impact-echo data interpretation. To this end, a two-stage framework is introduced, comprising echo feature extraction and a model updating scheme. Various state-of-the-art online learning algorithms are reviewed and evaluated for the task. For experimental validation, we collected 10940 echo instances from multiple inspection sites, each annotated by human experts with a healthy/defective condition label. The results demonstrate that the proposed scheme achieves favorable echo pattern classification accuracy with high efficiency and low computation load.


Introduction
Aging infrastructure poses significant challenges to human society. Efficient non-destructive evaluation (NDE) is indispensable for ensuring the safety of these critical structures.
In this study, we focus on the hammering impact-echo test, which is one of the most conventional NDE methods for the assessment of concrete structures owing to its low cost and high efficiency [1,2]. In hammering inspection, a field engineer generates a surface impact using a hand-held hammer and then determines the structural condition by listening to the echoes. Such judgment is evidently subjective and relies on individual experience, which leaves the evaluation results open to human error. Extensive research efforts have been made to develop efficient computerized echo investigation systems that reduce human effort in hammering echo analysis and eliminate human-made errors [4,11,14]. [...] and the condition assessment result is presented. Figure 1 shows the schematic flow.

This paper mainly addresses the issue of devising an efficient machine learning algorithm for impact-echo. Current learning schemes applied to the impact-echo test are restricted to the standard batch setting, which assumes that both training and test echo samples reside in the same feature space with static statistical characteristics; hence, model training can be performed over a pre-collected laboratory-scale echo database [11,15,16]. In practice, however, this assumption does not hold. The patterns of the echo signal can change significantly with the specifications of the concrete structure under evaluation, such as material, shape and years of service [1]. From the viewpoint of machine learning, these factors make the posterior distribution of the test echoes drift from that of the pre-collected training samples, thus degrading the echo analysis performance.
In this study, we adopt an alternative hypothesis, which admits that the pre-collected training echo dataset covers only a small range of the complete distribution. Moreover, we propose a new formulation of echo pattern classification under the online learning paradigm, in which efficient model updating schemes are exploited to minimize the cumulative prediction loss suffered along the continuous input of echo data with expert labels. Online learning is a well-established learning scheme with both theoretical and practical appeal [17,18], and it is particularly well suited to hammering impact-echo, since large-scale echo data can be accessed only sequentially.
It is noteworthy that our ultimate goal is to develop an efficient hammering echo investigation system with near-human accuracy. In a hammering test, humans are capable of discerning the defect-induced echoes of various concrete structures by auditory perception. In this study, we propose an AI-enabled computing system that adopts a binary classification formulation, producing labels that indicate healthy or defective concrete. At the validation stage, a loss between the predicted results and the expert labels is employed to compare the performance of the proposed approach with that of humans. The main contributions of the proposed approach can be summarized as follows:

• The objective of this study is to build an efficient impact-echo analysis system for concrete structure assessment. To this end, a novel online learning framework is proposed, which can effectively characterize discriminant information from large-scale echo spectrum data in an incremental way.

Related work
In this section, we review previous studies on hammering impact-echo methods for concrete condition assessment. The review has two parts: the first covers fundamental research on the impact-echo method, and the second covers recent advances in developing computerized hammering echo investigation systems with machine learning techniques.

Impact-echo Method and Air-coupled Hammering Inspection
The initial literature describing impact-echo dates from the 1970s; subsequently, more studies were carried out on both theoretical and experimental aspects [2,3,6]. Echo signal analysis is commonly performed through Fourier analysis. Although advanced methods have been employed in recent works, such as using the wavelet transform [5] to alleviate the poor time-frequency resolution of Fourier spectra and fusing impulse response with 3D laser scanning for delamination detection [10], Fourier analysis still dominates in practice owing to its efficient implementation. In Figure 2, we present one example clip of a hammering echo waveform and its Fourier spectrum. A well-known formula for locating a void beneath the surface of concrete is proposed in [1]:

$$f_{peak} = \frac{\beta C_p}{2d} \qquad (1)$$

where $f_{peak}$ denotes the peak frequency of the echo signal spectrum, $C_p$ is the velocity of the longitudinal wave, $\beta$ is a constant equal to 0.96 for plate-shaped structures according to [1], and $d$ represents the depth of the internal void. However, some recent studies reveal that the applicability of Equation (1) is constrained by the size and flatness of the defect area; e.g., if the void is not parallel to the surface, the echo resonance behaves differently and Equation (1) fails to estimate the void depth [9]. In addition, to facilitate engineers' use of the impact-echo technique, imaging methods for the impact-echo test have attracted much research interest; for example, in [7] a depth spectrum is proposed that maps the spectral peak of the echo signal to the depth of the defect. Impact-echo was initially a contact inspection method, in which fixing the transducer is quite time-consuming, especially when dealing with large structures. To enhance efficiency, the use of an air-coupled sensor in impact-echo was suggested [8]: a dedicated air-coupled sensor is employed to capture the acoustic echo from the concrete structure, and experimental results show that the air-coupled sensor is comparable to contact sensors for delamination detection and grouting quality evaluation tasks. Owing to its high diagnostic accuracy and favorable stability, the impact-echo method remains an active research topic in the non-destructive testing field, and research efforts on the topic continue.

The last five years have seen remarkable progress in machine learning research, and a growing trend has emerged to develop human-level machine learning systems that relieve people from laborious and exhausting tasks in the structural health monitoring field [4,14]. To substitute for the human role in hammering echo interpretation, great efforts have been made to establish data-driven machine learning systems that discern anomalous echoes from healthy ones [11,15,16]. We review the current status of computerized impact-echo system development as follows.
The early systems commonly tackle the echo investigation problem with statistical pattern classification, in which the echo spectrum is used as the feature vector and various conventional classifiers, such as Gaussian mixture models (GMM) [11], Artificial Neural Networks (ANN) [12] and Support Vector Machines (SVM) [13], are employed to characterize the discriminant information of healthy/defective echoes. In recent years, significant progress has been made in noise-robust echo feature representation learning. Advanced echo signal descriptors based on the bag-of-words (BoW) model [15] and sparse coding approaches [16] have proved effective for anomalous echo identification under hostile acoustic environments. It is noteworthy that these studies commonly assume that all training and test echo instances are sampled from the same population; the experimental datasets were also confined to the laboratory scale. Directly applying an echo analysis model trained on lab-scale data to practical impact-echo tests is expected to be problematic, because the pre-collected training data is too limited to provide sufficient discriminant information for complex real-world echo patterns.
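As a small worked illustration of Equation (1) from the impact-echo review above, the relation can be inverted to estimate the depth of a reflecting interface from a measured peak frequency. The sketch below is illustrative only: the function name `void_depth` and the numeric values (wave speed, peak frequency) are hypothetical and are not taken from the measurements in this study.

```python
def void_depth(f_peak_hz, c_p_m_per_s, beta=0.96):
    """Invert f_peak = beta * C_p / (2 d) to get the depth d in meters."""
    if f_peak_hz <= 0:
        raise ValueError("peak frequency must be positive")
    return beta * c_p_m_per_s / (2.0 * f_peak_hz)


# Hypothetical values for illustration: C_p = 4000 m/s, f_peak = 9.6 kHz.
depth_m = void_depth(9600.0, 4000.0)
print(f"estimated depth: {depth_m:.3f} m")
```

The estimate is only meaningful under the conditions of Equation (1), i.e. a plate-like geometry and a reflector roughly parallel to the surface.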

The Proposed Online Learning Framework for Hammering Echo Pattern Analysis
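We introduce the details of the proposed online machine learning-enabled hammering echo analysis system in this section. The processing flow is shown in Figure 3. We assume that the echo instances are received as a stream $\{x_t, y_t\}$, $t \in \{1, \dots, T\}$, and that all echoes have been annotated by a professional inspector with healthy or anomalous labels. Notably, during data collection, the material, shape, and other specifications of the target structures can vary from place to place. For instance, in Figure 3 the inspection place shifts from a bridge to a tunnel, while the index ranges of the captured echo data/labels are $1 \sim t_1$ and $t_1 + 1 \sim t_1 + t_2$, respectively. Our goal is to continuously update the echo analysis model so as to achieve near-human performance for anomalous echo identification.
To this end, it is crucial to devise an efficient online learning scheme; various state-of-the-art algorithms have been reviewed and compared. The details are presented as follows.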

Echo Feature Extraction
For decades, spectral analysis has been the dominant approach to hammering echo analysis [3], [6]: the Fourier transform (FT) is employed to generate the spectrum of the echo signal, on which further pattern investigation can then be performed. The Fourier analysis can be expressed as:

$$x(f) = \int_{-\infty}^{\infty} s(t)\, e^{-j 2 \pi f t}\, dt$$

where $s(t)$ is the echo waveform to be analyzed and $x(f)$ is the extracted echo spectrum. In the following, we use $x_t \in \mathbb{R}^d$ to denote the hammering echo spectrum collected at time stamp $t$, with $d$ frequency bins. Our pattern classification process is based on the Fourier spectrum representation of the echo signal. In Figure 4, we present two examples of echo Fourier spectra, collected from normal and defective concrete structures, respectively. According to the plots, differences in the spectral distributions can be clearly observed. In addition, in the low-frequency region below 500 Hz, high noise power can be seen that is unrelated to the hammered specimen. We employed a high-pass filter to eliminate the ambient noise present in the band below 500 Hz. Meanwhile, to reduce the echo feature dimension, we also discard the frequency bands above 15 kHz, since no discriminant information was found there in the echo data.
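The feature extraction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single 1024-sample frame at 44.1 kHz (the values reported in the data collection and experimental settings), a Hann window (an assumed choice), the magnitude of the discrete Fourier transform as the spectrum, and band limiting done by simply selecting frequency bins rather than by a time-domain filter. The function name `echo_spectrum_feature` and the synthetic test signal are hypothetical.

```python
import numpy as np

FS = 44100                    # sampling rate in Hz
N_FFT = 1024                  # Fourier analysis window length
F_LO, F_HI = 500.0, 15000.0   # retained frequency band in Hz


def echo_spectrum_feature(s, fs=FS, n_fft=N_FFT, f_lo=F_LO, f_hi=F_HI):
    """Return (freqs, x): magnitude spectrum of one frame, restricted to [f_lo, f_hi]."""
    frame = np.asarray(s, dtype=float)[:n_fft]
    frame = frame * np.hanning(len(frame))        # assumed window choice
    spec = np.abs(np.fft.rfft(frame, n=n_fft))    # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    keep = (freqs >= f_lo) & (freqs <= f_hi)      # drop low-frequency noise and high bands
    return freqs[keep], spec[keep]


# Synthetic signal (not real echo data): a 3 kHz tone plus strong 100 Hz hum.
t = np.arange(N_FFT) / FS
s = np.sin(2 * np.pi * 3000 * t) + 2.0 * np.sin(2 * np.pi * 100 * t)
freqs, x = echo_spectrum_feature(s)
peak_hz = freqs[np.argmax(x)]
print(len(x), round(float(peak_hz), 1))
```

With these settings the bin spacing is 44100/1024 Hz (about 43 Hz), so the retained band corresponds to a feature vector of a few hundred dimensions, and the 100 Hz hum falls outside it.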

Online Learning Algorithms in Evaluation
As the core feature of this research, online learning algorithms are employed to deal with the case in which hammering echo data (with expert annotations) arrives incrementally with time stamps.
Concretely, at time $t$, the online learning algorithm processes the input hammering echo feature vector and expert label, i.e. $\{x_t, y_t\}$, in three steps. First, it predicts the label $\hat{y}_t \in \{-1, +1\}$, where the two values represent defective and healthy status, respectively. Then, the predicted label $\hat{y}_t$ is compared with the true label $y_t \in \{-1, +1\}$ using a well-defined loss function $l(y_t, \hat{y}_t)$. Finally, if the computed prediction loss exceeds a threshold, the classification model is updated analytically. In this way, the cumulative mistakes over the whole data stream can be minimized. In this section, we first present a general algorithmic framework of online machine learning for hammering echo analysis.
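The three-step protocol above (predict, suffer loss, update) can be sketched generically as follows. This is a schematic illustration rather than the system's code: the function `run_online`, its callback arguments, and the toy majority-label learner are all hypothetical names introduced here, and the 0/1 loss with a zero threshold is an assumed choice.

```python
def run_online(stream, predict, loss, update, threshold=0.0):
    """Run the online protocol and return the cumulative mistake count after each round."""
    mistakes, history = 0, []
    for x_t, y_t in stream:
        y_hat = predict(x_t)          # step 1: predict a label in {-1, +1}
        l_t = loss(y_t, y_hat)        # step 2: compare with the expert label
        if y_hat != y_t:
            mistakes += 1
        if l_t > threshold:           # step 3: update only when the loss is large enough
            update(x_t, y_t)
        history.append(mistakes)
    return history


# Toy learner for illustration: predicts whichever label it has been corrected toward.
state = {"balance": 0}


def predict(x_t):
    return 1 if state["balance"] >= 0 else -1


def zero_one_loss(y_t, y_hat):
    return float(y_t != y_hat)


def update(x_t, y_t):
    state["balance"] += y_t


stream = [([0.0], -1), ([0.0], -1), ([0.0], -1), ([0.0], 1)]
history = run_online(stream, predict, zero_one_loss, update)
print(history)
```

The algorithms reviewed in this section differ only in how `predict` and `update` are defined; the surrounding loop stays the same.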

Perceptron
The perceptron algorithm is the earliest method for online learning [20]. Given linearly separable data, the method converges, in a finite number of updates, to a hyperplane that separates the classes. The prediction function of the perceptron is very simple, $\hat{y}_t = \mathrm{sign}(w_t^\top x_t)$, and the updating rule is applied only when a mistake occurs:

$$w_{t+1} = \begin{cases} w_t + y_t x_t, & \hat{y}_t \neq y_t \\ w_t, & \text{otherwise} \end{cases}$$

The perceptron algorithm has neither parameters nor optimization constraints. It also has several limitations. First, it can only classify linearly separable sets of vectors. If the class-conditional data distribution is inherently nonlinear, the perceptron will never reach a point where all vectors are classified correctly. Second, since no constraint is applied during model training, the perceptron is vulnerable to noise. To alleviate these problems, substantial modifications have been proposed; representative works can be found in [18].
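A minimal sketch of the mistake-driven perceptron update described above is given below. The function name `perceptron_step`, the tie-breaking of a zero score toward +1, and the synthetic two-dimensional data are assumptions made for illustration; no echo data is involved.

```python
import numpy as np


def perceptron_step(w, x_t, y_t):
    """One perceptron round: predict with sign(w^T x_t), update only on a mistake."""
    y_hat = 1 if w @ x_t >= 0 else -1   # ties are broken toward +1 (assumed convention)
    if y_hat != y_t:
        w = w + y_t * x_t
    return y_hat, w


# Toy linearly separable stream (synthetic).
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = np.zeros(2)
for _ in range(3):                      # a few passes over the toy stream
    for x_t, y_t in zip(X, y):
        _, w = perceptron_step(w, x_t, y_t)
print(w)
```

On separable data such as this, the weight vector stops changing once every sample is classified correctly, which is the finite-update behavior mentioned above.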

Online Gradient Descent (OGD)
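Gradient descent updating is another efficient approach for online learning [21]. In this evaluation, we selected the logistic loss to measure the prediction error:

$$l(w; (x_t, y_t)) = \log\left(1 + \exp(-y_t w^\top x_t)\right)$$

Subsequently, the updating rule can be represented as:

$$w_{t+1} = w_t - \eta_t \nabla l(w_t; (x_t, y_t)) = w_t + \eta_t \frac{y_t x_t}{1 + \exp(y_t w_t^\top x_t)}$$

where $\eta_t > 0$ denotes the learning rate at round $t$.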
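One round of online gradient descent with the logistic loss can be sketched as below. The function name `ogd_logistic_step`, the constant step size used in the demonstration, and the toy sample are assumptions for illustration; in practice a decaying step size is a common choice.

```python
import numpy as np


def ogd_logistic_step(w, x_t, y_t, eta):
    """One OGD round with logistic loss log(1 + exp(-y w^T x)); returns (loss, new w)."""
    margin = y_t * (w @ x_t)
    loss = np.logaddexp(0.0, -margin)             # stable evaluation of log(1 + e^{-margin})
    grad = -y_t * x_t / (1.0 + np.exp(margin))    # gradient of the loss with respect to w
    return loss, w - eta * grad


# Toy example (synthetic): one sample seen twice with a constant step size.
x_demo, y_demo = np.array([1.0, 2.0]), 1
loss_before, w1 = ogd_logistic_step(np.zeros(2), x_demo, y_demo, eta=1.0)
loss_after, _ = ogd_logistic_step(w1, x_demo, y_demo, eta=1.0)
print(float(loss_before), float(loss_after))
```

Unlike the perceptron, this rule updates on every round, with a step that shrinks smoothly as the margin grows.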

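Passive-Aggressive Learning Algorithm (PA)
The Passive-Aggressive (PA) algorithm is a state-of-the-art first-order online learning approach. The optimization formulation can be expressed as follows:

$$w_{t+1} = \arg\min_{w} \frac{1}{2} \|w - w_t\|^2 \quad \text{s.t.} \quad l(w; (x_t, y_t)) = 0$$

where the loss function is based on the hinge loss:

$$l(w; (x_t, y_t)) = \max\{0, 1 - y_t w^\top x_t\}$$

The updating rule can be derived analytically:

$$w_{t+1} = w_t + \tau_t y_t x_t, \qquad \tau_t = \frac{l(w_t; (x_t, y_t))}{\|x_t\|^2}$$

In addition, several variants of the PA method have been investigated [22]. The core idea is to add a penalty induced by a slack variable $\xi$ to handle non-separable cases.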
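The closed-form passive-aggressive update can be sketched as follows, assuming the basic (hard-margin) variant with hinge loss and step size equal to the loss divided by the squared norm of the input. The function name `pa_step` and the toy sample are hypothetical; the slack-variable variants mentioned above are not covered.

```python
import numpy as np


def pa_step(w, x_t, y_t):
    """One basic PA round: stay passive if the margin is at least 1, else correct minimally."""
    loss = max(0.0, 1.0 - y_t * (w @ x_t))     # hinge loss
    sq_norm = x_t @ x_t
    if loss > 0.0 and sq_norm > 0.0:
        tau = loss / sq_norm                   # smallest step that removes the loss
        w = w + tau * y_t * x_t
    return loss, w


# Toy example (synthetic): after one update the sample sits exactly on the margin.
x_demo, y_demo = np.array([3.0, 4.0]), -1
loss0, w1 = pa_step(np.zeros(2), x_demo, y_demo)
margin_after = y_demo * (w1 @ x_demo)
print(loss0, float(margin_after))
```

The update is "aggressive" in that a single step fully satisfies the margin constraint on the current sample, and "passive" in that nothing changes when the constraint already holds.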

The Second-Order Perceptron (SOP)
To better characterize the structure of the hammering echo data, advanced second-order online learning approaches have been developed. Unlike the first-order algorithms above, second-order online learning is designed to exploit the underlying relationships between features. Concretely, it assumes that the weight vector follows a Gaussian distribution $w \sim \mathcal{N}(\mu, \Sigma)$. At the initialization stage, the two additional quantities are commonly set to $w_1 = 0$ and $\Sigma_1 = aI$. Furthermore, the prediction function is given by:

$$\hat{y}_t = \mathrm{sign}\left(w_t^\top \tilde{\Sigma}_t x_t\right), \qquad \tilde{\Sigma}_t = \Sigma_t - \frac{\Sigma_t x_t x_t^\top \Sigma_t}{1 + x_t^\top \Sigma_t x_t}$$

The following update is performed when the predicted label is inconsistent with the true label:

$$w_{t+1} = w_t + y_t x_t, \qquad \Sigma_{t+1} = \tilde{\Sigma}_t$$

A representative work on the second-order perceptron can be found in [23].
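A minimal sketch of a second-order perceptron round is given below. It assumes a common formulation in which the matrix Σ tracks the inverse of the regularized correlation matrix of the mistaken samples and is refreshed with a rank-one (Sherman-Morrison) update; whether this matches the exact variant evaluated here is an assumption. The function name `sop_step` and the toy inputs are hypothetical.

```python
import numpy as np


def sop_step(w, Sigma, x_t, y_t):
    """One second-order perceptron round; returns (y_hat, w, Sigma)."""
    Sx = Sigma @ x_t
    # Rank-one update that accounts for the current sample before predicting.
    Sigma_tilde = Sigma - np.outer(Sx, Sx) / (1.0 + x_t @ Sx)
    y_hat = 1 if w @ Sigma_tilde @ x_t >= 0 else -1
    if y_hat != y_t:                 # mistake-driven update of both w and Sigma
        w = w + y_t * x_t
        Sigma = Sigma_tilde
    return y_hat, w, Sigma


# Toy example (synthetic) with initialization w_1 = 0, Sigma_1 = a * I, a = 1.
x_demo, y_demo = np.array([1.0, 0.0]), -1
y_hat1, w1, Sigma1 = sop_step(np.zeros(2), np.eye(2), x_demo, y_demo)
y_hat2, w2, Sigma2 = sop_step(w1, Sigma1, x_demo, y_demo)
print(y_hat1, y_hat2)
```

Each round costs on the order of d squared operations for a d-dimensional spectrum, compared with order d for the first-order methods.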

The Confidence-Weighted learning algorithm (CW)
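The CW method is an advanced second-order online learning algorithm [24]. In contrast to the SOP approach, CW minimizes the Kullback-Leibler divergence between the new weight distribution and the old one, under the constraint that the probability of correct classification reaches a confidence level $\eta$. The updating rule of CW is shown below:

$$(\mu_{t+1}, \Sigma_{t+1}) = \arg\min_{\mu, \Sigma} D_{KL}\left(\mathcal{N}(\mu, \Sigma) \,\|\, \mathcal{N}(\mu_t, \Sigma_t)\right) \quad \text{s.t.} \quad \Pr_{w \sim \mathcal{N}(\mu, \Sigma)}\left[y_t (w^\top x_t) \geq 0\right] \geq \eta$$

A closed-form solution can be derived as:

$$\mu_{t+1} = \mu_t + \alpha_t y_t \Sigma_t x_t, \qquad \Sigma_{t+1} = \Sigma_t - \beta_t \Sigma_t x_t x_t^\top \Sigma_t$$

where the updating coefficients can be calculated as follows:

$$\alpha_t = \max\left\{0, \frac{1}{\upsilon_t \zeta}\left(-m_t \psi + \sqrt{\frac{m_t^2 \phi^4}{4} + \upsilon_t \phi^2 \zeta}\right)\right\}, \qquad \beta_t = \frac{\alpha_t \phi}{\sqrt{u_t} + \upsilon_t \alpha_t \phi}$$

with $u_t = \frac{1}{4}\left(-\alpha_t \upsilon_t \phi + \sqrt{\alpha_t^2 \upsilon_t^2 \phi^2 + 4 \upsilon_t}\right)^2$, $\upsilon_t = x_t^\top \Sigma_t x_t$, $m_t = y_t (\mu_t^\top x_t)$, $\phi = \Phi^{-1}(\eta)$ (the inverse of the standard normal cumulative distribution function), $\psi = 1 + \phi^2/2$ and $\zeta = 1 + \phi^2$. A more detailed discussion of the parameter settings can be found in [24].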
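The confidence-weighted update can be sketched as follows, assuming the exact closed-form variant in which the constraint requires the signed margin of the mean to be at least φ times the standard deviation of the margin, with φ the standard normal quantile of the confidence level η. The function name `cw_step`, the value η = 0.7 and the toy sample are illustrative assumptions, not settings reported in this study.

```python
import math
from statistics import NormalDist

import numpy as np


def cw_step(mu, Sigma, x_t, y_t, eta=0.7):
    """One confidence-weighted round (assumed exact closed form); returns (mu, Sigma)."""
    phi = NormalDist().inv_cdf(eta)          # requires 0.5 < eta < 1 so that phi > 0
    psi, zeta = 1.0 + phi**2 / 2.0, 1.0 + phi**2
    Sx = Sigma @ x_t
    v = float(x_t @ Sx)                      # variance of the margin (confidence)
    m = float(y_t * (mu @ x_t))              # signed margin of the mean
    alpha = max(0.0, (-m * psi + math.sqrt(m**2 * phi**4 / 4.0 + v * phi**2 * zeta)) / (v * zeta))
    if alpha > 0.0:                          # constraint violated: move mean, shrink covariance
        u = 0.25 * (-alpha * v * phi + math.sqrt(alpha**2 * v**2 * phi**2 + 4.0 * v)) ** 2
        beta = alpha * phi / (math.sqrt(u) + v * alpha * phi)
        mu = mu + alpha * y_t * Sx
        Sigma = Sigma - beta * np.outer(Sx, Sx)
    return mu, Sigma


# Toy example (synthetic): after the update the constraint holds with equality.
x_demo, y_demo, eta_demo = np.array([1.0, 2.0]), 1, 0.7
mu1, Sigma1 = cw_step(np.zeros(2), np.eye(2), x_demo, y_demo, eta=eta_demo)
phi_demo = NormalDist().inv_cdf(eta_demo)
gap = y_demo * (mu1 @ x_demo) - phi_demo * math.sqrt(x_demo @ Sigma1 @ x_demo)
print(float(gap))
```

The covariance shrinks along the direction of each processed sample, so frequently observed spectral directions receive progressively smaller updates.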

Adaptive Regularization of Weight Vectors (AROW)
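Regularization is regarded as a useful technique for enhancing both the accuracy and the robustness of online learning algorithms. The AROW method adds an adaptive regularizer to restrict sudden changes of the weights during online learning [25]. The formulation of AROW is presented as follows:

$$(\mu_{t+1}, \Sigma_{t+1}) = \arg\min_{\mu, \Sigma} D_{KL}\left(\mathcal{N}(\mu, \Sigma) \,\|\, \mathcal{N}(\mu_t, \Sigma_t)\right) + \frac{1}{2\gamma} l^2(\mu; (x_t, y_t)) + \frac{1}{2\gamma} x_t^\top \Sigma x_t$$

where $l^2(\mu; (x_t, y_t)) = \left(\max\{0, 1 - y_t (\mu^\top x_t)\}\right)^2$ and $\gamma$ is a regularization parameter. The updating coefficients can be obtained by solving the optimization problem:

$$\mu_{t+1} = \mu_t + \alpha_t y_t \Sigma_t x_t, \qquad \Sigma_{t+1} = \Sigma_t - \beta_t \Sigma_t x_t x_t^\top \Sigma_t$$

$$\beta_t = \frac{1}{x_t^\top \Sigma_t x_t + \gamma}, \qquad \alpha_t = \max\{0, 1 - y_t \mu_t^\top x_t\}\, \beta_t$$

Soft Confidence-Weighted Learning (SCW)
SCW is a more advanced second-order learning algorithm that improves on the original CW by adding the capability to handle non-separable cases, and also improves on AROW by adding the adaptive margin property [26]. The loss suffered in classifying an input echo is defined as $l_t = \max\{0, 1 - y_t w_t^\top x_t\}$. If $l_t > 0$, the classification model is updated using the closed-form rule given in [26].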

Hammering echo data visualization
Data visualization is widely recognized as an integral part of modern data analysis systems, as it makes complex data more accessible, understandable and usable. In our hammering echo pattern investigation system, we incorporate a data visualization function so that end-users can browse and understand the distribution of the massive echo data. We adopted principal component analysis (PCA), a fundamental and standard method for visualizing data. The basic principle of PCA is to find a low-dimensional linear subspace in which the variance of the projected data is maximized. The detailed procedure can be found in [19]. We present the visualization results for the whole hammering echo dataset in the experimental analysis section.
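The PCA projection used for visualization can be sketched as below, assuming the standard computation through the singular value decomposition of the centered data matrix. The function name `pca_project` and the synthetic data are hypothetical; rows stand in for echo spectra and columns for frequency bins.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto the leading principal components.

    Returns (Z, explained): the low-dimensional scores and the fraction of
    total variance captured by each retained component.
    """
    Xc = X - X.mean(axis=0)                            # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # principal axes are the rows of Vt
    Z = Xc @ Vt[:n_components].T
    explained = (S**2)[:n_components] / np.sum(S**2)
    return Z, explained


# Synthetic data (not echo spectra): one direction has much larger variance.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0])
Z, explained = pca_project(X_demo)
print(Z.shape, explained)
```

Plotting the two columns of `Z` with one color per expert label gives the kind of scatter plot used to inspect how well the two classes separate.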

Data Collection
In this section, we introduce the hammering echo dataset created to evaluate the proposed system. First, we present the impact-echo hardware used for data collection in Figure 5; Table 1 shows more detailed specifications. For echo data recording, the sampling rate was set to 44.1 kHz and the resolution was fixed at 16-bit depth. We visited 12 inspection sites to capture echo data. Meanwhile, binary expert annotations, i.e. one label per echo indicating a normal or anomalous condition, were also collected. In Figure 6, we show photos of two inspection sites. The defective areas were tagged in pink by the inspector.
In addition, we marked multiple parallel lines in yellow, which indicate the hammering trace.
The scanning speed was around 80 centimeters per minute (cm/min). The hammering area varied with location. As a result, we obtained 10940 annotated echo instances, of which 9349 are normal and 1591 are anomalous. The dataset lays the foundation for the subsequent numerical analysis.

Experimental settings
At the echo feature extraction stage, we set the Fourier analysis window length to 1024.
A band-pass filter is applied to focus on frequencies ranging from 500 to 15000 Hz. At the online learning stage, parameter tuning plays a key role in achieving accurate hammering echo pattern classification. In this study, we evaluate 7 state-of-the-art online learning approaches on the massive real hammering echo data. The detailed parameter settings are presented in Table 1. The experiments were performed in the same vein as the real scenario, in which the labeled data was fed to the online learning system sequentially. The experiments were conducted over 20 random permutations of the whole echo dataset. In each trial, we divided the dataset into 15 subsets. During online learning, we recorded the evaluation results, i.e. echo classification accuracy and computation time cost, each time one subset had been processed. This information helps us understand the learning behavior of each algorithm. Finally, the results are presented as averages over the 20 trials.
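The evaluation protocol described above (random permutations, checkpoints after each subset, averaging over trials) can be sketched as follows. This is a schematic harness under stated assumptions, not the code used in the experiments: the function `evaluate_online`, its `step` and `init_state` callbacks, and the trivial always-healthy predictor are hypothetical. The labels reuse only the class counts reported for the dataset; the features are placeholders.

```python
import numpy as np


def evaluate_online(X, y, step, init_state, n_runs=20, n_ticks=15, seed=0):
    """Mean cumulative mistake rate at n_ticks checkpoints over n_runs random permutations.

    Assumes len(y) >= n_ticks so that the checkpoints are distinct.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    ticks = np.linspace(n / n_ticks, n, n_ticks).round().astype(int)
    curves = np.zeros((n_runs, n_ticks))
    for r in range(n_runs):
        state, mistakes, k = init_state(), 0, 0
        for i, idx in enumerate(rng.permutation(n), start=1):
            y_hat, state = step(state, X[idx], y[idx])
            mistakes += int(y_hat != y[idx])
            if i == ticks[k]:               # one more subset has been processed
                curves[r, k] = mistakes / i
                k += 1
    return curves.mean(axis=0)


def always_healthy(state, x_t, y_t):
    """Trivial baseline that never updates and always predicts +1 (healthy)."""
    return 1, state


# Labels with the reported class counts (+1 healthy, -1 defective); placeholder features.
y_demo = np.array([1] * 9349 + [-1] * 1591)
X_demo = np.zeros((len(y_demo), 1))
curve = evaluate_online(X_demo, y_demo, always_healthy, init_state=lambda: None)
print(len(curve), round(float(curve[-1]), 4))
```

Because the two classes are imbalanced, the final value of this trivial baseline equals the fraction of defective samples, which is a useful reference point when reading cumulative mistake rates.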

Echo Data Visualization
As introduced in section 3.3, data visualization is a useful approach to understanding the data. In Figure 7, we present the distribution of the echo dataset using principal component analysis (PCA). In the visualization, the binary class labels are shown in different colors: the normal echoes are marked in black and the flaw-induced ones in red. From the distribution plot, we have two major findings:

1. Damage-induced echoes produce a more scattered distribution than healthy ones. This is reasonable because damaged concrete usually generates a more complex echo spectrum.
2. The boundary between normal and anomalous echoes is not clear; in other words, the two class distributions are strongly non-linearly related. From the machine learning perspective, methods designed to deal with inherently nonlinear data may perform better.

Based on this understanding of the collected data, we proceed to the algorithmic analysis as follows.

Empirical Evaluation Results
In this part, we present the results of the experimental validation. The comparison is drawn on three aspects: echo pattern classification accuracy, processing efficiency and computation complexity.

For the first aspect, accuracy, we adopted two metrics: the mistake rate transition curve and the cumulative classification error rate. It can be anticipated that, as more data is examined, the mistake rate decreases monotonically. To assess the performance of the online echo analysis models, we present the cumulative error rate after the whole online learning process was completed. Figure 8 exhibits the overall error statistics during the online learning process. By examining the overall mistakes, we found that second-order algorithms, i.e. SOP, CW, AROW and SCW-II, usually outperform first-order algorithms, including perceptron, OGD and PA; in addition, margin-based algorithms, such as CW and SCW-II, usually outperform non-margin-based methods.

For the second criterion, computation efficiency, we present the cumulative time costs of all seven online learning algorithms under evaluation in Figure 9. We found that the first-order schemes exhibit superior efficiency owing to their simpler formulation. The SOP method, one of the earliest second-order approaches, took the longest processing time. The confidence-weighted learning methods, including both CW and SCW, achieved a favorable balance between accuracy and efficiency.

To demonstrate the complexity of the online learning algorithms, we further show the cumulative number of updates. In general, fewer update steps indicate that the algorithm efficiently establishes a more robust classification hyperplane that can accommodate shifts in the input feature distribution. By examining Figure 10, we can see that first-order methods usually produce a smaller number of updates; however, their classification accuracies are inferior. The AROW scheme makes a significantly larger number of updates, which can induce a high processing time cost.

To clarify the comparison, we further prepared Table 2 to summarize the key experimental results, including the cumulative mistake rate, the number of support vectors (SVs) and the CPU run time. This quantitative information is complementary to the charts above. From the table, we found that SCW-II outperformed all other methods in echo pattern classification accuracy. Meanwhile, the method achieved superior efficiency among all the second-order online learning algorithms. We also investigated the number of SVs used by the different learning schemes. SVs are defined as the samples used to determine the max-margin hyperplane for classification. Since the echo data is highly non-linear, a large number of SVs is used at the classification stage. In summary, among all the compared algorithms, SCW-II produced the best accuracy; for the other metrics, including the number of updates and the running time cost, it also outperformed the other second-order methods. Therefore, the method can be regarded as the best choice among those evaluated for online learning of hammering echoes.

It is noteworthy that errors may exist in the echo labels, because practical hammering inspection usually takes location information into account. This can be regarded as performing region-based smoothing over the individual expert echo labels nearby. In contrast, our quantitative evaluation was conducted point-wise. This may be one major reason why the error rate exceeds 10%. Overall, the empirical evaluation validated the effectiveness of the online learning approach for hammering echo pattern investigation.

Conclusions
In this paper, we attempted to develop an efficient learning framework that can mimic human performance in hammering echo data investigation. Field inspectors determine the health condition of concrete through auditory perception of hammering echoes. In this study, we formulate such process

Figure 1. The general processing flow of computerized hammering echo investigation

Figure 3. Flow chart of online learning formulation for hammering echo investigation

Figure 4. Examples of hammering echo spectrum: normal echo (left) and defective case (right)

Figure 5. The low-cost impact-echo system (left) and microphone directivity illustration (right)

The first parameter C governs the trade-off between the fitting loss term and the regularization term in machine learning model training. In the second-order algorithms, the parameter a = 1 is used to initialize the covariance matrix, i.e. Σ = a * I, where I is the identity matrix. Parameter η is used to define the loss function in the confidence-weighted learning algorithms, i.e. in CW and SCW-II.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 9 February 2018 doi:10.20944/preprints201802.0076.v1

Figure 7. Visualization of hammering echo dataset using principal component analysis (PCA)

Figure 8. Summary of online cumulative classification error rate

Figure 10. Comparison of number of updating steps

• Unlike conventional studies, which commonly conduct experiments on laboratory-scale data, a massive hammering echo database was created during this study, comprising more than ten thousand echo samples collected from different types of concrete structures. Moreover, each echo instance was annotated by professional inspectors with a healthy/defective label. The database lays a solid foundation for validating the learning schemes.

material, shape, and other specifications of target structures can vary from place to place. For instance, in Figure 3 the inspection place shifted from bridge to tunnel, meanwhile the captured echo data/label index ranges varied from 1 ∼ t_1 to t_1 + 1 ∼ t_1 + t_2, respectively. Our goal is to continuously update

Table 1. Summary of the parameter setting by algorithms

Figure 6. Photos of two working sites for hammering echo data capture

Table 2. Summary of the parameter setting by algorithms