Long-Term Prediction of Crack Growth Using Deep Recurrent Neural Networks and Nonlinear Regression: A Comparison Study

Abstract: Cracks in a building can potentially result in financial and life losses. Thus, it is essential to predict when the crack growth is reaching a certain threshold, to prevent possible disaster. However, long-term prediction of the crack growth in newly built facilities or existing facilities with recently installed sensors is challenging because only the short-term crack sensor data are usually available in the aforementioned facilities. In contrast, we need to obtain equivalently long or longer crack sensor data to make an accurate long-term prediction. Against this background, this research aims to make a reasonable long-term estimation of crack growth within facilities that have crack sensor data with limited length. We show that deep recurrent neural networks such as LSTM suffer when the prediction's interval is longer than the observed data points. We also observe a limitation of simple linear regression if there are abrupt changes in a dataset. We conclude that segmented nonlinear regression is suitable for this problem because of its advantage in splitting the data series into multiple segments, with the premise that there are sudden transitions in data.


Introduction
Infrastructure (e.g., tunnels, roads, communication systems, and power plants) is essential to social and economic activities in contemporary civilization. Investment in infrastructure development has had a crucial impact on economic improvement. Private and public sectors can utilize the infrastructure to increase the efficiency of goods transportation and services. Conversely, infrastructure failures can cause financial losses and casualties. Therefore, it is necessary to prevent those losses before a fatal disaster happens.
Specifically, in this paper, we consider the problem of detecting crack growth in a building structure. Exponential crack growth in a building structure (e.g., concrete and steel frame) could make the structure unstable, thereby damaging the whole structure. However, generally speaking, predicting such a property in the short term is not a viable option. Moreover, decision-makers typically consider a long-term projection to formulate their policies. Hence, predicting long-term crack growth is essential in preventing infrastructure disruption.
We approach this problem with segmented nonlinear regression as an estimator for the long-term prediction of crack growth in infrastructure. We perform diverse experiments to evaluate the effectiveness of segmented nonlinear regression. We then compare the results with three other models: long short-term memory (LSTM), Sequence-to-Sequence (Seq2Seq) LSTM, and simple linear regression. We find that our implementation of segmented regression is comparable to the Seq2Seq LSTM model. Moreover, our segmented regression model outperforms the LSTM model and the linear regression model.

Crack Sensor Data
To evaluate our approach, we aggregate crack sensor data from various locations in South Korea. We consider two kinds of infrastructure for this study: an underground facility and a bridge. We collect the sensor data within different time frames and with different sampling rates. The data used in this research were provided by Infranics Co., Ltd. in Seoul, Korea, a software company working with KT Corporation (formerly Korea Telecom, Korea's largest telecommunications company). For confidentiality reasons, we do not disclose the types and the positions of the sensors.
In this paper, we report the data in millimeter measurements with varying sample rates. Due to the continuous form of the data, we perform several time-series related data preprocessing methods, described in Section 3.1.

Recurrent Neural Network (RNN)
Recurrent Neural Network (RNN) [1] is a feed-forward neural network with the addition of feedback connections. The output of a unit in an RNN model at the current timestep is supplied back to the unit of the RNN model as an additional input for the subsequent step. This architecture innately enables the RNN model to learn temporal dependence in the data. For this reason, we consider RNN as one of our approaches to perform long-term prediction in this research.
In this section, we briefly discuss the Jordan Network [2], one of the widely recognized classical RNN architectures. For each time step $t$ with input $x_t$, hidden state $h_t$, and output $y_t$, the architecture and operation of the Jordan Network can be expressed by Equations (1) and (2), where $W_h$ and $U_h$ are the hidden state parameter matrices, $W_y$ and $U_y$ are the output parameter matrices, $b_h$ is a parameter vector, and tanh is the hyperbolic tangent activation function that provides non-linearity. The gradients of the above units (Equations (1) and (2)) are computed using the back-propagation through time (BPTT) algorithm [3]. BPTT unfolds the RNN's hidden units backward over time and back-propagates the gradients of the hidden units at the corresponding time steps. The parameters of the hidden units are updated according to the gradients in the respective time steps. Several studies have reported that RNNs struggle to learn long-term dependencies [4,5]. One of the main reasons is the vanishing and exploding gradient problem [5], which can occur when a gradient-based learning algorithm fits a model to data with long-term dependencies. Long short-term memory (LSTM) addresses this issue by introducing gates that control the flow of information [6]. In this work, we consider the LSTM and Sequence-to-Sequence (Seq2Seq) LSTM architectures as our long-term estimator candidates.
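To make the feedback connection of the Jordan Network concrete, the following is a minimal NumPy sketch of the common textbook formulation, in which the previous output is fed back into the hidden layer. It is an illustration under stated assumptions and not necessarily identical to Equations (1) and (2): the output bias `b_y`, the tanh output activation, the random weights, and the function names are all assumptions, and the matrix $U_y$ mentioned above is not used here.

```python
import numpy as np

def jordan_step(x_t, y_prev, params):
    """One step of a textbook Jordan network: the previous OUTPUT is fed back."""
    W_h, U_h, b_h, W_y, b_y = params
    h_t = np.tanh(W_h @ x_t + U_h @ y_prev + b_h)  # hidden state
    y_t = np.tanh(W_y @ h_t + b_y)                 # output (tanh is an assumption)
    return h_t, y_t

def jordan_forward(xs, params, output_dim):
    """Run the network over a sequence, starting from a zero output."""
    y = np.zeros(output_dim)
    outputs = []
    for x_t in xs:
        _, y = jordan_step(x_t, y, params)
        outputs.append(y)
    return np.stack(outputs)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d, h, o = 3, 4, 2
params = (
    rng.normal(size=(h, d)),  # W_h
    rng.normal(size=(h, o)),  # U_h
    np.zeros(h),              # b_h
    rng.normal(size=(o, h)),  # W_y
    np.zeros(o),              # b_y
)
xs = rng.normal(size=(5, d))
ys = jordan_forward(xs, params, output_dim=o)
```

Because each output depends on the previous output, back-propagating through this loop multiplies gradients across time steps, which is where the vanishing and exploding gradient problem arises.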

Long Short-Term Memory (LSTM)
Long short-term memory (LSTM) uses a gating mechanism to determine how much information is preserved and how much is forgotten. A basic LSTM unit consists of a cell state $c_t$, a forget gate $f_t$, an input gate $i_t$, and an output gate $o_t$. LSTM uses the cell state $c_t$ as a memory that stores information from all computations. To prevent the storage of unnecessary information in a long sequence of inputs, the forget gate $f_t$ discards a fragment of it. Using the input gate $i_t$, LSTM can select inputs that maximize outputs. The purpose of the output gate $o_t$ is to decide which part of the cell state $c_t$ is relevant to the output. Finally, the hidden state $h_t$ of a basic LSTM is the element-wise product of $o_t$ and the hyperbolic tangent of $c_t$. For every time step $t$, the forward pass of an LSTM unit can be written in compact form based on the descriptions above, where $x_t \in \mathbb{R}^d$ is the input; $W \in \mathbb{R}^{h \times d}$, $U \in \mathbb{R}^{h \times h}$, and $b \in \mathbb{R}^h$ are the weight matrices and bias vector specific to each function; and $\sigma$ and tanh are the sigmoid and hyperbolic tangent functions, respectively.
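For reference, the widely used formulation of the LSTM forward pass that matches the description above can be sketched as follows. The per-gate subscripts on $W$, $U$, and $b$, the candidate cell state $\tilde{c}_t$, and the symbol $\odot$ for the element-wise product are notational assumptions; the original equations may be numbered or arranged differently.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The last line is the element-wise product of the output gate and the hyperbolic tangent of the cell state, as stated in the text.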

Sequence-to-Sequence LSTM
Sequence-to-Sequence (Seq2Seq) LSTM is an architecture that uses two LSTM sections, one acting as an encoder and the other as a decoder [7]. This architecture has been applied to many natural language processing (NLP) tasks (e.g., machine translation [8] and summarization [9]). The encoder reads an input sequence and extracts the necessary information. Then, the decoder produces an output prediction based on information from the hidden state vector of the encoder. Lastly, the decoder, which is trained with the teacher forcing method [10], uses the actual output $o_t$ as the input $x_{t+1}$ in the next time step. Figure 1 illustrates the architecture of Seq2Seq LSTM.

Regression Analysis
In this section, we briefly describe two methods of regression analysis that we implement in this study to find the relationship of changes in crack sensor values over time.

Linear Regression
Linear regression is a regression analysis method that maps the independent variable $x$ to the dependent variable $y$, such that $y_i = \alpha + \beta x_i + \varepsilon_i$ for $i = 1, \dots, n$, where $\alpha$ and $\beta$ are the intercept and slope parameters, respectively; $\varepsilon_i$ is the residual; and $n$ is the number of observations. Various methods, such as Ordinary Least Squares (OLS) or gradient descent, can be used to estimate the two parameters by minimizing an appropriate loss function. A widely used loss function is the mean squared error (MSE), which is the mean of the squared residuals between the true label $y$ and the model output $\hat{y}$.
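As a small illustration of OLS for a single independent variable, the sketch below estimates the intercept and slope in closed form and evaluates the MSE. The function names and the toy values are assumptions made for this example and are not taken from the sensor data.

```python
def fit_ols(xs, ys):
    """Closed-form OLS estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                 # slope
    alpha = mean_y - beta * mean_x   # intercept
    return alpha, beta

def mse(xs, ys, alpha, beta):
    """Mean of the squared residuals between labels and fitted values."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up toy series: hour index versus crack width in millimeters.
hours = [0, 1, 2, 3, 4]
widths = [0.10, 0.12, 0.14, 0.16, 0.18]
alpha, beta = fit_ols(hours, widths)
loss = mse(hours, widths, alpha, beta)
```

Because the toy values lie exactly on a line, the fit recovers an intercept of about 0.10 and a slope of about 0.02 with a loss close to zero.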

Segmented Nonlinear Regression
Segmented nonlinear regression, also known as piecewise regression, is a regression analysis method that partitions the independent variable data into segments and fits each segment using linear regression methods. Dynamic programming (DP) [11] and greedy merging [12][13][14][15] are the existing approaches for finding the breakpoint locations of the input data and fitting line equations for each segment. Although DP [11] has shown promising results, it is not practical for massive numbers of data points because of its quadratic running time, $O(n^2)$ [13,14].
In this paper, we implement the greedy merging approach by Acharya et al. [14], which achieves efficient computation at the cost of a slight decrease in precision. Algorithm 1 is the pseudocode for the entire process of segmented nonlinear regression with greedy merging. The algorithm initially partitions the $n$ data points into $n$ segments of length 1. Then, it uses the least squares method to calculate the loss of a linear function for merging each neighboring pair of segments. Acharya et al. define an error for merging two adjacent segments $S_{2c-1}$ and $S_{2c}$, where $c$ is the index of the pair of segments, $y_c$ is the true value of the data points of the two segments, $X_c$ are the input data points of the two segments, $\alpha_c$ is the weight of the least squares fit, $X_c \alpha_c$ is the prediction, and $S$ is the set of all segments. In simple terms, the merging error is the mean-squared error of the pair of segments minus the variance $s^2$ times the length of the pair of segments. Consequently, Algorithm 1 merges every pair whose merging error is not among the $\tau$ largest, where $\tau$ is a hyperparameter. The merging process continues until the number of segments reaches the target number of segments $T$.
Algorithm 1. Segmented nonlinear regression with greedy merging (excerpt).
Let $m_i$ be the current number of segments.
Let $B$ be the set of the other indices. /* Keep the segments with large merging errors unmerged. */
return $S$ // the least squares fit to the data on every segment in $S_i$
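The greedy merging procedure described above can be sketched in plain Python as follows. This is a simplified illustration and not Acharya et al.'s exact algorithm: it uses the sum of squared residuals as the fit error, forces at least one merge per round so that the loop always terminates, and never merges below the target number of segments. All function names and the toy data are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope); constant fit if x has no spread."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def squared_error(xs, ys):
    """Sum of squared residuals of the least-squares line on (xs, ys)."""
    intercept, slope = fit_line(xs, ys)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def greedy_merge(xs, ys, target_segments, tau, s=0.0):
    """Return a list of (start, end, intercept, slope), end exclusive."""
    segments = [(i, i + 1) for i in range(len(xs))]  # n segments of length 1
    while len(segments) > target_segments:
        # Merging error of every adjacent pair (S_{2c-1}, S_{2c}).
        pairs = []
        for c in range(len(segments) // 2):
            start, end = segments[2 * c][0], segments[2 * c + 1][1]
            err = squared_error(xs[start:end], ys[start:end]) - s ** 2 * (end - start)
            pairs.append((err, c))
        # Keep the tau largest errors unmerged, but always merge at least one
        # pair and never drop below the target (simplifying assumptions).
        n_merge = min(max(len(pairs) - tau, 1), len(segments) - target_segments)
        to_merge = {c for _, c in sorted(pairs)[:n_merge]}
        merged = []
        for c in range(len(segments) // 2):
            left, right = segments[2 * c], segments[2 * c + 1]
            if c in to_merge:
                merged.append((left[0], right[1]))
            else:
                merged.extend([left, right])
        if len(segments) % 2:  # carry over an unpaired trailing segment
            merged.append(segments[-1])
        segments = merged
    return [(a, b, *fit_line(xs[a:b], ys[a:b])) for a, b in segments]

# Made-up toy series with an abrupt jump at index 20.
xs = list(range(40))
ys = [0.01 * x if x < 20 else 1.0 + 0.05 * (x - 20) for x in xs]
fits = greedy_merge(xs, ys, target_segments=4, tau=2)
```

Because pairs are formed at fixed positions from the left, this sketch does not guarantee that a segment boundary falls exactly on every abrupt change. It only guarantees a contiguous partition into the requested number of segments, each with its own least-squares line.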

Data Preprocessing
To evaluate our approach, we gathered data from four crack sensors in two different facilities. We label the data from the first position of the first facility as Crack A1, and the data from the second position of the first facility as Crack A2. Similarly, we label the data from the first and second positions of the second facility as Crack B1 and Crack B2, respectively. We collected the data for each crack sensor over a distinct time frame. Specifically, we collected the Crack A1 and Crack A2 data from 25 February 2019 to 31 August 2019, and from 1 June 2019 to 31 July 2019, respectively. We collected the Crack B1 and Crack B2 data from 21 June 2019 to 31 August 2019 and from 13 June 2019 to 31 August 2019, respectively.
We discovered two issues in the sensor data at all facilities. The first issue is that the sampling rates are not consistent, and the second is that there are many missing values. We conjecture that these problems are caused by either transmission errors or device glitches. To establish a consistent sampling rate, we downsample the data to hourly intervals by computing the mean value in each hour. To address the missing values, we apply linear interpolation between the two data points on either side of each gap. Figure 2 shows the preprocessed and interpolated Crack A1 data. Overall, preprocessing yields 4513 data points for Crack A1, 1465 data points for Crack A2, 1729 data points for Crack B1, and 1907 data points for Crack B2. Since we are dealing with time-series data, we split each series chronologically such that the first 90% forms the training set and the last 10% forms the test set. Figure 3 illustrates the general data splitting strategy used in this paper. We will detail this strategy in the next section.
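The preprocessing steps above (hourly mean downsampling, linear interpolation of gaps, and a chronological 90/10 split) can be sketched with the Python standard library as follows. The function names and the toy readings are assumptions; the actual pipeline used for the sensor data is not specified beyond the description above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def hourly_mean(readings):
    """Average raw (timestamp, value) readings within each clock hour."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}

def fill_gaps(hourly):
    """Build a gap-free hourly series, linearly interpolating missing hours."""
    hours = sorted(hourly)
    series = []
    for left, right in zip(hours, hours[1:]):
        steps = int((right - left) / timedelta(hours=1))
        for k in range(steps):
            frac = k / steps
            series.append(hourly[left] + frac * (hourly[right] - hourly[left]))
    series.append(hourly[hours[-1]])
    return series

def chronological_split(series, train_fraction=0.9):
    """First part for training, remainder for testing (no shuffling)."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

# Made-up readings with irregular sampling and a gap at hours 02:00 and 03:00.
readings = [
    (datetime(2019, 6, 1, 0, 10), 1.0),
    (datetime(2019, 6, 1, 0, 40), 3.0),
    (datetime(2019, 6, 1, 1, 5), 4.0),
    (datetime(2019, 6, 1, 4, 30), 10.0),
]
series = fill_gaps(hourly_mean(readings))
train, test = chronological_split(series)
```

In this toy case the two readings in the first hour are averaged, the two missing hours are filled on the straight line between the 01:00 and 04:00 values, and the last point becomes the test set.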

Sliding Window
Since LSTM and Seq2Seq LSTM need fixed-length sequences as inputs for training and inference, we reshape the data with a fixed-length sliding window. The fixed-length window slides through the preprocessed data and forms several batches of fixed-length sequences. Then, we feed these batches into both the LSTM and Seq2Seq LSTM models as inputs for training. Batch training enables fast and efficient computation. Additionally, the batches produced by the sliding window ensure that the model can capture the lagged dependencies in the whole sequence of data.
As an illustration, consider a sequence of toy data ($x_1, \dots, x_8$) and a fixed-length sliding window with a length of 6 for LSTM. The window in the first batch covers $x_1$ to $x_6$. Then, the window slides to the next sequence, comprising $x_2$ to $x_7$. Afterward, the window proceeds to the next sequence, containing $x_3$ to $x_8$. Lastly, we set the last data point in each batch as the ground truth that we compare with the output of LSTM.
In our experiment, we choose 100 as the length of the sliding window in the training setup for LSTM, and select the 100th data point as the ground truth. We illustrate this sliding window mechanism in Figure 4. As mentioned above, we label the last data point in each batch as the ground truth to calculate the gradient.
Figure 4. Illustration of the sliding window operation for LSTM on toy data. Blue boxes denote the input data for LSTM, red boxes denote the ground truth to be predicted by LSTM, and orange rectangles denote the sliding windows.
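The sliding window for LSTM on the toy data can be reproduced with a few lines of Python. This is a sketch of the windowing logic only; the function name is an assumption, and integers 1 to 8 stand in for $x_1$ to $x_8$.

```python
def lstm_windows(series, window):
    """Slide a fixed-length window one step at a time.

    Each sample is (inputs, target): the last point of the window is the
    ground truth and the points before it are the model inputs.
    """
    samples = []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        samples.append((chunk[:-1], chunk[-1]))
    return samples

toy = [1, 2, 3, 4, 5, 6, 7, 8]   # stands in for x_1 ... x_8
samples = lstm_windows(toy, window=6)
# samples[0] == ([1, 2, 3, 4, 5], 6)
# samples[1] == ([2, 3, 4, 5, 6], 7)
# samples[2] == ([3, 4, 5, 6, 7], 8)
```

With the experimental setting of a window of length 100, the same function would use the first 99 points of each window as inputs and the 100th as the ground truth.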
As for Seq2Seq LSTM, we perform a fixed-length sliding window operation similar to the one described above, but with a slight change. We divide the sequence sampled by the sliding window in half, which produces two new sequences. We select the first sequence as the input for the encoder, and the second as the ground truth to compare with the decoder's prediction. We set the stride of the window equal to the length of the ground truth to prevent the network from peeking into the future in any batch. In the experiment, we use a sliding window with a length of 256 and divide it into two sequences of length 128 each. Figure 5 illustrates the sliding window process in Seq2Seq LSTM on the toy data.
Figure 5. Illustration of the sliding window operation for Seq2Seq LSTM on toy data. Blue boxes denote the input data for the encoder, red boxes denote the ground truth to be predicted by the decoder, and orange rectangles denote the sliding windows.
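A corresponding sketch for the Seq2Seq LSTM windowing is given below. The window is split in half and the stride equals the length of the ground-truth half. The function name and the toy window length of 4 are assumptions chosen so that the eight toy points produce several samples.

```python
def seq2seq_windows(series, window):
    """Split each window into (encoder_input, decoder_target) halves.

    The window advances by the length of the target half, so the target of
    one sample becomes the encoder input of the next.
    """
    half = window // 2
    samples = []
    for start in range(0, len(series) - window + 1, half):
        encoder_input = series[start:start + half]
        decoder_target = series[start + half:start + window]
        samples.append((encoder_input, decoder_target))
    return samples

toy = [1, 2, 3, 4, 5, 6, 7, 8]   # stands in for x_1 ... x_8
pairs = seq2seq_windows(toy, window=4)
# pairs == [([1, 2], [3, 4]), ([3, 4], [5, 6]), ([5, 6], [7, 8])]
```

With the experimental setting, `window=256` would give encoder inputs and decoder targets of length 128 each and a stride of 128.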
Note that we do not perform data reshaping for the linear regression method and the segmented nonlinear regression method we have applied, because we let them process one data point at one time step.

Model Hyperparameters
To make a fair comparison between the LSTM and Seq2Seq LSTM models, we train both models with similar hyperparameters. Both models consist of two stacked LSTMs, 64 hidden units, and a fully connected layer as an output layer. We train both models for 10,000 epochs and choose the best-performing model based on the loss value. We adopt the root mean square error (RMSE) as the loss function for the LSTM and Seq2Seq LSTM models. We set the initial learning rate of both models to 0.0001.
For regression analysis, we fit linear regression and segmented nonlinear regression with OLS to estimate the parameters. We carry out a hyperparameter search for segmented nonlinear regression and find that the model with $T = 8$ and $\tau = 4$ shows the highest performance. We also set the hyperparameter $s$ to 0 in segmented nonlinear regression after discovering that it can introduce too high a variance in the experiment results. Since we handle time-series data, we use the parameters of the last segment of the segmented nonlinear regression model to perform inference, because the test set is the direct continuation of the training set.
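Inference with the last segment's parameters amounts to extending that segment's line beyond the end of the training set. The sketch below assumes an hourly time index and uses hypothetical values for the intercept, slope, and last training index; none of these numbers come from the experiments.

```python
HOURS_PER_YEAR = 24 * 365  # hourly samples per (non-leap) year

def extrapolate(last_segment, last_index, horizon):
    """Predict `horizon` future steps from the last segment's (intercept, slope)."""
    intercept, slope = last_segment
    return [intercept + slope * (last_index + k) for k in range(1, horizon + 1)]

# Hypothetical last-segment parameters and last training index.
last_segment = (0.05, 0.00001)
forecast = extrapolate(last_segment, last_index=1317, horizon=10 * HOURS_PER_YEAR)
```

A 10-year forecast at hourly resolution contains 87,600 points, and a positive slope in the last segment yields monotonically increasing predictions.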

Hardware Systems
For a fair comparison, we perform all data preprocessing and experiments on the same hardware system. We use an Intel Xeon E5-2620 central processor, which has eight cores and 16 threads. To handle a huge dataset, we utilize 64 GB of RAM. Finally, we use an Nvidia 1080Ti GPU with 11 GB of memory to accelerate the training and inference of the LSTM and Seq2Seq LSTM models.

Evaluations
We choose the root mean square error (RMSE) as our quantitative evaluation measure for the models' predictions on the test set. A lower RMSE score indicates a better-performing model. We also analyze the long-term prediction capability of all models with 10-year growth predictions on the Crack A2 data. Figure 6a shows that the LSTM model's predictions are constant and that their discrepancies from the ground truth are visible. The same figure shows that the model produced a constant value of 0.078 through the end of the Crack A2 test set, while the ground truth varies with a highest value of 0.076. We observe that the model performed poorly, with similar behavior in the remaining models. As a result, the LSTM models achieved the worst RMSE score compared with the other models. We believe this outcome is because the model failed to capture the lagged dependencies of the training data.
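For clarity, the RMSE measure used throughout the evaluation can be computed as in the following sketch. The function name and the example values are assumptions for illustration and are not taken from the reported results.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between ground truth and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up example: a constant prediction against a varying ground truth.
truth = [0.070, 0.072, 0.074, 0.076]
constant_prediction = [0.078] * len(truth)
score = rmse(truth, constant_prediction)
```

A constant prediction that sits above every ground-truth value, as in this toy case, gives a strictly positive RMSE even when it is close to the highest observed value.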

Seq2Seq LSTM
In Figure 6b, we observe that the Seq2Seq LSTM model successfully emulated the Crack A2 test set from 26 July 2019 to 27 July 2019. However, the model overshoots on the remaining test set, with a margin of 0.024 at the last data point. We suspect that the model behaved this way because it could not effectively learn temporal dependencies in data that exhibit abrupt changes. Although the model exhibited overshooting, it achieved the best RMSE score among the models. Table 1 shows that the Seq2Seq LSTM model outperforms the LSTM, linear regression, and segmented nonlinear regression models for all the test sets.

Linear Regression
We observe that the linear regression model is relatively effective on the Crack A1 and Crack A2 test sets. However, its predictions are inferior to those of the other algorithms on the Crack B1 and Crack B2 test sets, as shown in Table 1. We assume that the model could not make a compelling prediction owing to a sudden change in the training set, as observed in Figure 7d. In the figure, the linear regression model's predictions deviate from the ground truth on the Crack B1 training set. Therefore, the linear regression model produced an unsatisfactory result on the Crack B1 test set with an RMSE score of 0.12429, whereas the Seq2Seq LSTM model obtained 0.00253.

Segmented Nonlinear Regression
We find that the segmented nonlinear regression model considerably outperformed the linear regression model and the LSTM model based on the RMSE score. The model also rivals the Seq2Seq LSTM model's result on the Crack A2 test set. From Table 1, we find that the segmented nonlinear regression model significantly surpassed the linear regression model on every test set. Figure 7d shows that the model can approximate broken segments effectively, despite abrupt changes in the Crack B1 training set.

Long-Term Prediction Analysis
In addition to quantitative evaluations based on the RMSE metric, we analyze the long-term prediction capability of each model. We conduct experiments in which individual models produce two kinds of prediction: (1) 10 years and (2) eight months on each sensor. We present the prediction results on Crack A2 in Figure 8, where each model obtains similar performance in terms of the RMSE metric. We discover that the outputs of the LSTM model and the Seq2Seq LSTM model are indistinguishable, as both models output constant values after a few steps. Figure 8b shows that the Seq2Seq LSTM model's predictions grew in a logarithmic fashion for less than a month, after which it generated fixed outputs for the rest of its prediction. We observe that the LSTM model performed similarly, except that it produced static outputs from the beginning. As a result, both models failed to emulate the crack's growth in the long term, as in our experiment for 10-year prediction. Figure 8a shows these results on Crack A2 data. Furthermore, the training process required a tremendous amount of time for both models. Similarly, the inference process for the 10-year prediction took a long time for the LSTM and Seq2Seq LSTM models, as we report in Table 2. For these reasons, we consider that the LSTM model and the Seq2Seq LSTM model may not be suitable for long-term predictions.
Table 2. Average time consumption of the training process and 10-year inference for each model on every sensor. We report the duration for both cases in h:mm:ss format, where h is hours, mm is minutes, and ss is seconds. The fastest time is in bold.
Prediction Model | Training Duration | Inference Duration
Contrary to the LSTM model and the Seq2Seq LSTM model, the linear regression model requires a small amount of training and inference time. On average, the model took only a minute and three seconds to produce 10-year predictions. Although the linear regression model achieved the fastest training and inference time among the models, we discover that it produced decreasing outputs in the 10-year and eight-month predictions, as shown in Figure 8a and Figure 8b, respectively. We conjecture that this decrease is a result of the model's inaccurate predictions. Figure 6c supports our assumption: it shows that the model produced linearly decreasing values, while the ground truth abruptly increases. Thus, we consider that the linear regression model may be inadequate for long-term prediction as well.
We observe that the segmented nonlinear regression model generated increasing values, while the other models produced either constant or decreasing outputs in Figure 8a,b. Moreover, Table 1 shows that the model's predictions had comparable performance to those of the Seq2Seq LSTM model on each sensor, despite the presence of several abrupt changes. Above all, the segmented nonlinear regression model required short training and inference times that are comparable to those of the linear regression model. Considering the factors mentioned above, it is reasonable to conclude that the segmented nonlinear regression model is an acceptable choice for the long-term prediction of crack growth when there are multiple abrupt changes in the data.

Conclusions and Future Work
In this study, we explore several methods for the long-term prediction of crack growth in two separate facilities that are in active use. We evaluate the performance of LSTM, Seq2Seq LSTM, linear regression, and segmented nonlinear regression models on the test set, with RMSE as our evaluation metric. We also analyze 10-year and eight-month predictions of the models on each crack sensor. Our experiment results show that the LSTM model has the worst performance among the models in terms of RMSE score. On the other hand, the Seq2Seq LSTM model achieves the highest performance on the same test sets. While the linear regression model is comparable to the segmented nonlinear regression model on the Crack A1 and Crack A2 test sets, the segmented nonlinear regression model obtains more satisfactory results when there are abrupt changes apparent in the data, such as in Crack B1 and Crack B2. The long-term prediction results show that the LSTM model and the Seq2Seq LSTM model are unsuitable as predictors because they fail to emulate the crack growth and instead give a constant output. Additionally, both models require a tremendous amount of time for training and inference. Finally, our analysis of long-term prediction points to segmented nonlinear regression as the most suitable candidate for the predictor, since it is computationally efficient and exhibits comparably high performance.
In future work, we would like to consider other research directions for more robust prediction. The first is to incorporate temperature as an independent variable of the prediction, since we assume that temperature causes rapid expansion and shrinkage of the materials, which leads to the growth of the crack. The second is to account in the regression analysis for the external shocks that cause abrupt changes in the crack sensor readings. Finally, we plan to extend our machine learning and deep learning-based research to accommodate the influence on the stress distribution in the cracked region using the geometry of the structure and loading cases [16][17][18][19].

Data Availability Statement:
The data that support the findings of this study are available from Infranics Co., Ltd., but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Infranics Co., Ltd.