We first clustered the measured displacement data obtained from each measuring point using a Gaussian Mixture Model (GMM) and improved the model with the Iterative Self-Organizing Data Analysis (ISODATA) algorithm. The displacement data of the 24 measuring points selected for the case study were classified into five groups. We then used the random coefficient model to fit the data of each class.

#### 3.1. Clustering of the Monitoring Data Based on ISODATA-GMM

As introduced in Section 2, the displacement is mainly induced by three components: the water pressure component ${\delta}_{H}$, the temperature component ${\delta}_{T}$ and the aging component ${\delta}_{\theta}$. To build a clustering criterion that represents the spatial and temporal characteristics of the measuring points, we have to discuss these three factors separately. The water pressure component ${\delta}_{H}$ and the temperature component ${\delta}_{T}$ mainly depend on the location of the measuring point and the geometrical size of the dam. For concrete dams, the spatial relations between the measuring points can be represented by each point's distance to the dam foundation $d$. The temporal characteristic of a measuring point mainly affects the aging component ${\delta}_{\theta}$. We first separated the aging component ${\delta}_{\theta}$ from the measured time series. The temporal characteristic can be described by two factors: one is the maximum absolute value of the aging sequence $\lambda$, and the other is the degree of convergence of the data series $\xi$, expressed by $\xi =\left|{\displaystyle \frac{{c}_{1}}{{c}_{2}}}\right|$, where ${c}_{1}$ and ${c}_{2}$ are the aging term coefficients. Therefore, we use $d$, $\lambda$ and $\xi$ as the clustering criteria to represent the spatial and temporal characteristics of the measuring points.
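The three criteria can be collected into one feature vector per measuring point. The sketch below assumes the aging component and its coefficients have already been separated out; the function and argument names are illustrative, not from the paper.

```python
import numpy as np

def clustering_features(aging_series, c1, c2, dist_to_foundation):
    """Return the feature vector (d, lambda, xi) for one measuring point.

    aging_series: separated aging component delta_theta as a 1-D array;
    c1, c2: fitted aging-term coefficients; dist_to_foundation: distance d.
    All names here are illustrative assumptions.
    """
    d = dist_to_foundation                 # spatial feature
    lam = np.max(np.abs(aging_series))     # max absolute value of the aging sequence
    xi = abs(c1 / c2)                      # degree of convergence xi = |c1 / c2|
    return np.array([d, lam, xi])

features = clustering_features(np.array([0.1, 0.4, -0.6, 0.55]),
                               c1=1.2, c2=3.0, dist_to_foundation=25.0)
```

Each point's feature vector then serves as one sample for the ISODATA-GMM clustering below.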

Gaussian Mixture Model (GMM) based clustering assumes that the data come from several sub-datasets which are modelled separately, and that the whole dataset is a mixture of these sub-datasets. The resulting model is a finite mixture model. When the data are multivariate continuous observations, the parametrized component density is usually a multidimensional Gaussian density.

For a one-dimensional dataset, we assume that the probability distribution of a random variable $x$ follows a mixture of two Gaussian distributions, as described in Equation (6):

where $k=1$ and $k=2$ index the two Gaussian distributions; the $k$th prior probability is $\left\{{p}_{1}=1/2,{p}_{2}=1/2\right\}$; $\left\{{\mu}_{k}\right\}$ and $\sigma$ are the means and the common standard deviation of the two Gaussian distributions, respectively. We write $\theta \equiv \left\{\left\{{\mu}_{k}\right\},\sigma \right\}$ to collect these parameters.

The dataset ${\left\{{x}_{n}\right\}}_{n=1}^{N}$, which contains $N$ points, is assumed to be an independent sample from this distribution. ${k}_{n}$ denotes the unknown class tag of the $n$th point.

In the case that $\left\{{\mu}_{k}\right\}$ and $\sigma$ are known, the posterior probability of the class tag ${k}_{n}$ of the $n$th point can be written as:
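This posterior is simply Bayes' rule applied to the two weighted component densities. A minimal sketch (the function and argument names are our own, not from the paper):

```python
import numpy as np

def posterior_class_tags(x, mu, sigma, priors=(0.5, 0.5)):
    """Posterior P(k_n = k | x_n, theta) for a two-component 1-D mixture.

    Assumes known means mu = (mu1, mu2), a shared standard deviation sigma,
    and the prior probabilities p_k; names are illustrative.
    """
    x = np.asarray(x, dtype=float)[:, None]        # shape (N, 1)
    mu = np.asarray(mu, dtype=float)[None, :]      # shape (1, 2)
    p = np.asarray(priors, dtype=float)[None, :]
    # Unnormalised p_k * N(x_n | mu_k, sigma^2); the shared normalising
    # constant of the Gaussians cancels in the ratio below.
    dens = p * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return dens / dens.sum(axis=1, keepdims=True)  # rows sum to 1

resp = posterior_class_tags([0.0, 1.0], mu=(0.0, 1.0), sigma=0.5)
```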

If $\left\{{\mu}_{k}\right\}$ is unknown and $\sigma$ is known, we may infer $\left\{{\mu}_{k}\right\}$ from the data series ${\left\{{x}_{n}\right\}}_{n=1}^{N}$. We hence derive an iterative algorithm for $\left\{{\mu}_{k}\right\}$ that maximizes the likelihood:

The derivative of the natural logarithm of the likelihood $L$ with respect to $\left\{{\mu}_{k}\right\}$ is:

where ${p}_{k|n}\equiv P\left({k}_{n}=k|{x}_{n},\theta \right)$ is the posterior probability of the class tag (see Equation (7)). Ignoring the terms in $\frac{\partial}{\partial {\mu}_{k}}P\left({k}_{n}=k|{x}_{n},\theta \right)$, the second derivative with respect to $\left\{{\mu}_{k}\right\}$ can be approximated as:

Then, the initial ${\mu}_{1}$, ${\mu}_{2}$ are iterated to ${\mu}_{1}^{\prime}$, ${\mu}_{2}^{\prime}$ using the approximate Newton–Raphson steps:
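Under the approximation above (dropping the derivative of the responsibilities), each Newton–Raphson step reduces to a responsibility-weighted mean of the data. A sketch under that reading; the function name and the synthetic data are illustrative:

```python
import numpy as np

def update_means(x, mu, sigma, priors=(0.5, 0.5), n_iter=50):
    """Iterate the approximate Newton-Raphson step for the two means."""
    x = np.asarray(x, dtype=float)
    mu = np.array(mu, dtype=float)
    for _ in range(n_iter):
        # Responsibilities p_{k|n} given the current means
        d = np.exp(-0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2)
        d = d * np.asarray(priors)
        r = d / d.sum(axis=1, keepdims=True)
        # Approximate step: mu_k' = sum_n p_{k|n} x_n / sum_n p_{k|n}
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    return mu

# Toy data: two well-separated groups of points
data = np.concatenate([np.full(20, -2.0), np.full(20, 2.0)])
mu_hat = update_means(data, mu=(-1.0, 1.0), sigma=1.0)
```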

We now extend the model to the multidimensional dataset (a mixture of multiple Gaussian distributions). The Gaussian mixture density can be written as:

where $k$ is the index of the Gaussian component; $i$ is the index of the data dimension; $n$ is the index in the data sequence; $I$ is the total number of data dimensions; ${\pi}_{k}$ is the component weight; ${\mu}_{i}^{\left(k\right)}$ is the mean of the Gaussian distribution; ${\sigma}_{i}^{\left(k\right)}$ is the variance of the Gaussian distribution; ${x}_{i}^{\left(n\right)}$ is the data point. The iterative formula of ${\mu}_{i}^{\left(k\right)}$ has been presented in Equation (11). The iterative formulas of the variance ${\sigma}_{i}^{\left(k\right)}$ and the weight ${\pi}_{k}$ are as follows:

Once the iteration converges, GMM clustering classifies the dataset into several classes.
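The mean, variance and weight updates above have the same form as one pass of the expectation–maximization algorithm for a diagonal-covariance Gaussian mixture. A sketch under that reading; the function name and toy data are illustrative:

```python
import numpy as np

def em_step(X, mu, var, pi):
    """One update pass for a diagonal-covariance Gaussian mixture.

    X: (N, I) data; mu, var: (K, I) means and variances sigma_i^(k);
    pi: (K,) weights. After computing responsibilities, each parameter
    is refreshed as a responsibility-weighted statistic.
    """
    N = X.shape[0]
    # Log density of each point under each component (diagonal covariance)
    log_d = -0.5 * (((X[:, None, :] - mu[None]) ** 2) / var[None]
                    + np.log(2 * np.pi * var[None])).sum(axis=2)
    log_d += np.log(pi)[None, :]
    r = np.exp(log_d - log_d.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)          # responsibilities, shape (N, K)
    Nk = r.sum(axis=0)                         # effective counts per component
    mu_new = (r.T @ X) / Nk[:, None]
    var_new = (r.T @ (X ** 2)) / Nk[:, None] - mu_new ** 2
    pi_new = Nk / N
    return mu_new, var_new, pi_new

# Toy data: two tight groups in 2-D
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
mu1, var1, pi1 = em_step(X, np.array([[0.5, 0.5], [4.5, 4.5]]),
                         np.ones((2, 2)), np.array([0.5, 0.5]))
```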

However, GMM clustering still has some defects. The number of classes and the number of data points in each class are unknown before clustering; hence, the iteration may produce a class containing only one or two data points, which may cause the final results to diverge.

To solve this problem, we introduce the Iterative Self-Organizing Data Analysis (ISODATA) algorithm to realize the following functions: (a) split a class into two when its variance is too large, (b) delete a class when its number of samples falls below a specified threshold, and (c) merge two classes when their centers are too close.
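A minimal sketch of these three adjustments over a list of clusters; the threshold values, split rule and names below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def isodata_adjust(clusters, max_var=4.0, min_size=3, min_dist=1.0):
    """One pass of the three ISODATA adjustments.

    clusters: list of (N_i, I) arrays of feature vectors.
    """
    # (b) delete clusters with too few samples
    clusters = [c for c in clusters if len(c) >= min_size]
    # (a) split a cluster whose largest per-dimension variance is too big,
    #     here by cutting at the median of that dimension
    out = []
    for c in clusters:
        v = c.var(axis=0)
        dim = int(np.argmax(v))
        if v[dim] > max_var and len(c) >= 2 * min_size:
            mask = c[:, dim] > np.median(c[:, dim])
            out += [c[mask], c[~mask]]
        else:
            out.append(c)
    # (c) merge a cluster with the first other cluster whose centre is too close
    merged = []
    while out:
        c = out.pop()
        for j, other in enumerate(out):
            if np.linalg.norm(c.mean(axis=0) - other.mean(axis=0)) < min_dist:
                c = np.vstack([c, out.pop(j)])
                break
        merged.append(c)
    return merged

# Toy input: two near-identical clusters (merged) and one singleton (deleted)
result = isodata_adjust([np.zeros((5, 2)), np.zeros((5, 2)) + 0.1,
                         np.full((1, 2), 10.0)])
```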

Figure 2 shows the flow chart of ISODATA.

#### 3.2. Random Coefficient Model

As shown in Figure 3, the monitoring data are two-dimensional, containing both time series data and cross-sectional data. The data on one panel represent the cross-section displacement data at a certain time, and each grid on the panel stands for a monitoring point. The monitoring data of the dam's cross section at an indicated time can therefore be considered as a two-dimensional panel.
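To make this panel structure concrete, the sketch below simulates such data under the random-coefficient assumption $\beta_i = \beta + \gamma_i$ used later in this section; all dimensions and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 5, 100, 2                   # panels, observations per panel, regressors
beta = np.array([1.0, -0.5])          # common mean coefficient vector beta
Delta = 0.05 * np.eye(K)              # covariance of the deviations gamma_i

X, Y = [], []
for i in range(N):
    gamma_i = rng.multivariate_normal(np.zeros(K), Delta)  # individual deviation
    X_i = rng.normal(size=(T, K))                          # explanatory variables x_kit
    u_i = 0.1 * rng.normal(size=T)                         # random interference term u
    Y.append(X_i @ (beta + gamma_i) + u_i)                 # beta_i = beta + gamma_i
    X.append(X_i)
```

Pooled least squares on the stacked data recovers $\beta$ on average, but it ignores the block structure of the error covariance, which is what motivates the generalized least squares estimator discussed below.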

Here, Equation (15) expresses the regression of a panel whose coefficients do not vary with time:

where ${y}_{it}$ is the two-dimensional dam displacement data; ${x}_{kit}$ is the two-dimensional data of the explanatory variables; $t$ is the time index; $i$ is the cross section index; $k$ is the explanatory variable index; ${\beta}_{ki}$ is independent of time and can be divided into ${\beta}_{k}$ and ${\gamma}_{ki}$; $\beta ={({\beta}_{1},\cdots ,{\beta}_{K})}^{{}^{\prime}}$ is the common mean coefficient vector; $\gamma ={({\gamma}_{1i},\cdots ,{\gamma}_{Ki})}^{{}^{\prime}}$ is the deviation of the individual coefficients from the common mean; $u$ is a random interference term. Ref. [20] assumed that ${\beta}_{i}=\beta +{\gamma}_{i}$ is a random variable and deduced the following assumptions in Equation (16):

By stacking the $NT$ observations, we can write the equation in matrix form (Equation (17)):

where $u={({u}_{1}^{\prime},\cdots ,{u}_{N}^{\prime})}^{\prime}$, $\gamma ={({\gamma}_{1}^{\prime},\cdots ,{\gamma}_{N}^{\prime})}^{\prime}$, $N$ is the number of panels, and $T$ is the number of observations in each panel. The covariance matrix of the compound error term $\tilde{X}\gamma +u$ is block diagonal, and its $i$-th diagonal block is ${\psi}_{i}={X}_{i}\Delta {X}_{i}^{\prime}+{\sigma}_{i}^{2}{I}_{T}$. According to [20], the OLS estimator of $\beta$ is biased; however, once $\frac{1}{NT}{X}^{\prime}X$ converges to a non-zero constant matrix, OLS still yields a consistent but inefficient estimate. The optimal linear unbiased estimator of $\beta$ is the generalized least squares estimator:

The variance of the estimator is:

${\widehat{\beta}}_{GLS}$ follows an asymptotic normal distribution and is the efficient estimator of $\beta$. The random coefficient model constrains the coefficients of the explanatory variables to follow asymptotic normal distributions instead of treating them as free variables, and hence captures the correlation between adjacent monitoring points.
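A sketch of this GLS estimator, assuming $\Delta$ and ${\sigma}_{i}^{2}$ are known (in practice they would themselves be estimated); the function name and the simulation values are illustrative:

```python
import numpy as np

def gls_random_coeff(X_list, y_list, Delta, sigma2_list):
    """GLS estimate of beta for the random coefficient model.

    Uses Psi_i = X_i Delta X_i' + sigma_i^2 I_T as the i-th diagonal
    block of the compound-error covariance. Returns (beta_hat, cov),
    where cov is the variance of the estimator.
    """
    K = X_list[0].shape[1]
    A = np.zeros((K, K))
    b = np.zeros(K)
    for X_i, y_i, s2 in zip(X_list, y_list, sigma2_list):
        T = X_i.shape[0]
        Psi_inv = np.linalg.inv(X_i @ Delta @ X_i.T + s2 * np.eye(T))
        A += X_i.T @ Psi_inv @ X_i
        b += X_i.T @ Psi_inv @ y_i
    cov = np.linalg.inv(A)
    return cov @ b, cov

# Illustrative simulation: 5 panels with beta_i = beta + gamma_i
rng = np.random.default_rng(1)
Delta = 0.05 * np.eye(2)
beta_true = np.array([1.0, -0.5])
X_list, y_list = [], []
for _ in range(5):
    Xi = rng.normal(size=(200, 2))
    bi = beta_true + rng.multivariate_normal(np.zeros(2), Delta)
    y_list.append(Xi @ bi + 0.1 * rng.normal(size=200))
    X_list.append(Xi)
beta_hat, cov = gls_random_coeff(X_list, y_list, Delta, [0.01] * 5)
```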

The distribution density of a monitoring point strongly depends on its features, such as its location. Hence, we cluster the measuring points according to their spatial and temporal characteristics. Using the ISODATA-GMM method introduced in Section 3.1, measuring points with similar spatial and temporal characteristics are classified into the same group; the coefficients within the same cluster can then be considered to follow the same normal distribution.