In this section, the two proposed TDE schemes are introduced. We first derive the optimum scheme by applying joint ML to TDE. To present the mathematical concept simply, the explanations assume that the noise signal z(n) is a random signal, independent and identically distributed across trials, following a normal distribution. However, the complexity of the optimum TDE is quite large; thus, to reduce complexity, we also propose a sub-optimum TDE, a simplified version of the optimum TDE that assumes x(n) is a random signal common to every measurement trial. The two schemes estimate only the time delays of the trials, so they avoid the intertwined problem of the conventional schemes.
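The trial model underlying both schemes can be sketched as follows. The generator below assumes the usual formulation y_i(n) = x(n − d_i) + z_i(n), where x is the latent ERP waveform, d_i the per-trial delay, and z_i i.i.d. Gaussian noise; all names and parameter values are illustrative, not taken from the original.

```python
import numpy as np

def simulate_trials(x, delays, sigma_z, rng):
    """Generate noisy, delayed trial signals y_i(n) = x(n - d_i) + z_i(n)."""
    N = len(x)
    trials = []
    for d in delays:
        shifted = np.roll(x, d)                    # circular shift stands in for the delay
        noise = rng.normal(0.0, sigma_z, N)        # i.i.d. Gaussian noise per trial
        trials.append(shifted + noise)
    return np.array(trials)                        # shape (I, N)

rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * np.arange(100) / 25)        # toy ERP-like waveform
y = simulate_trials(x, delays=[0, 2, 5], sigma_z=0.1, rng=rng)
```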

#### 3.1. Proposed Optimum TDE Scheme

The proposed TDE scheme, by taking the joint ML, aims to find the most probable point of the time delay vector consisting of the time delays of all trials. The delay vector can be obtained by maximizing the joint pdf of the delays as:
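The equation body itself is missing from this extraction; a plausible form, assuming the standard multivariate Gaussian likelihood over the stacked observation (a sketch consistent with the definitions that follow, not a verbatim reproduction of the original Equation (3)):

```latex
\widehat{d} = \underset{\overline{d}}{\arg\max}\;
\frac{1}{(2\pi)^{NI/2}\left|\Sigma_{\overline{d}}\right|^{1/2}}
\exp\!\left(-\frac{1}{2}\,(\mathbf{y}-\mu_{x})^{\mathrm{T}}\,
\Sigma_{\overline{d}}^{-1}\,(\mathbf{y}-\mu_{x})\right)
```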

where **y** = [**y**_{1}, **y**_{2}, …, **y**_{I}]^{T}, **y**_{i} = [y_{i}(1), y_{i}(2), …, y_{i}(N)], $\overline{d} = [{\overline{d}}_{1},{\overline{d}}_{2},\cdots ,{\overline{d}}_{I}]$ is the delay vector, **y**^{T} is the transpose of **y**, $\widehat{d}$ is the estimated delay vector of $\overline{d}$, and ${\Sigma}_{\overline{d}}=E\{(\mathbf{y}-{\mu}_{x}){(\mathbf{y}-{\mu}_{x})}^{\mathrm{T}}\}$ is the covariance matrix, which differs according to $\overline{d}$. Note that previous TDE studies calculate a cost function for each trial delay, whereas the cost function of the proposed scheme is based on all trials, i.e., on the delay vector. This means that the proposed scheme focuses on the relative delays across trials instead of the best delay in each trial.

${\Sigma}_{\overline{d}}$ is composed of sub-matrices as in:

where ${r}_{\overline{d}}^{i,j}=E\{({\mathbf{y}}_{i}-{\mu}_{x}){({\mathbf{y}}_{j}-{\mu}_{x})}^{\mathrm{T}}\}$ is the covariance sub-matrix for the i-th and the j-th trial signals. The sub-matrix can be expressed as:

where Cov(y_{i}(n), y_{j}(k)), the covariance between the n-th sample of the i-th trial and the k-th sample of the j-th trial, is determined by the statistical properties of x(n) and by (d_{i} − d_{j}). This shows that the optimum TDE can be applied with only the statistical information of the ERP signals; it does not require information about the instantaneous ERP signal, i.e., it needs the autocorrelation of the ERP signal rather than an estimate of the ERP signal itself. Note that the autocovariance function of the ERP signal can be estimated from the measured signal: for example, averaging the autocovariance functions of the individual trial signals provides an estimated autocorrelation. Since the autocorrelation and autocovariance functions reflect only how fast the ERP changes, they are easier to estimate than the signals themselves, which in the conventional schemes must be obtained as a template signal. In addition, the time delay vector $\overline{d}$ contains the delays of all the trials. The joint ML cost function in Equation (3) is maximized when the overall time delays are well estimated, resulting in better performance than conventional methods that estimate individual trials as in Equation (2).
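Under the stationarity assumption implied above, Cov(y_i(n), y_j(k)) depends only on the ERP autocovariance evaluated at the delay-compensated lag, plus a noise term on the main diagonal. A minimal sketch of assembling ${\Sigma}_{\overline{d}}$ from an assumed autocovariance function follows; the exponential autocovariance shape and all parameter values are illustrative, not from the original.

```python
import numpy as np

def covariance_matrix(r_x, delays, sigma_z2, N):
    """Assemble Sigma_d from the ERP autocovariance r_x(tau), a candidate
    delay vector, and white noise of variance sigma_z2 on the diagonal.
    Assumes Cov(y_i(n), y_j(k)) = r_x((n - d_i) - (k - d_j))."""
    I = len(delays)
    sigma = np.zeros((I * N, I * N))
    for i, di in enumerate(delays):
        for j, dj in enumerate(delays):
            for n in range(N):
                for k in range(N):
                    val = r_x((n - di) - (k - dj))
                    if i == j and n == k:
                        val += sigma_z2       # noise only on the main diagonal
                    sigma[i * N + n, j * N + k] = val
    return sigma

# Illustrative colored autocovariance: exponentially decaying correlation.
r_x = lambda tau: np.exp(-abs(tau) / 5.0)
S = covariance_matrix(r_x, delays=[0, 2, 1], sigma_z2=0.1, N=20)
```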

#### 3.2. Proposed Sub-Optimum TDE Scheme

The optimum scheme performs best from the joint ML point of view. However, it involves great complexity because of the calculation of ${\Sigma}_{\overline{d}}^{-1}$ and $\left|{\Sigma}_{\overline{d}}\right|$ in Equation (3). Hence, we propose a sub-optimum TDE with a lower computational burden, assuming that x(n) is a random signal but common to each measurement trial.

The exponential term in Equation (3) plays the key role in the joint ML calculation. Taking the logarithm approximately simplifies Equation (3) as:
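The simplified equation is not preserved in this extraction; the step is the standard Gaussian log-likelihood manipulation, which presumably yields (a sketch, with the determinant term dropped under the approximation that it varies little with $\overline{d}$):

```latex
\widehat{d} = \underset{\overline{d}}{\arg\max}
\left[-\tfrac{1}{2}\ln\left|\Sigma_{\overline{d}}\right|
      -\tfrac{1}{2}\,(\mathbf{y}-\mu_{x})^{\mathrm{T}}
       \Sigma_{\overline{d}}^{-1}(\mathbf{y}-\mu_{x})\right]
\approx \underset{\overline{d}}{\arg\min}\;
(\mathbf{y}-\mu_{x})^{\mathrm{T}}\Sigma_{\overline{d}}^{-1}(\mathbf{y}-\mu_{x})
```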

For further simplification, ${\Sigma}_{\overline{d}}^{-1}$ is approximated by retaining only its dominant elements. The following example explains the simplification.

For an intuitive understanding of the covariance matrix simplification, we consider a simple case where x(n) is a white Gaussian random signal with variance ${\sigma}_{x}^{2}$, and the noise z_{i}(n) is zero-mean white Gaussian noise with variance ${\sigma}_{z}^{2}$. In the case of the white random signal x(n), ${r}_{\overline{d}}^{i,j}$ has non-zero values on the upper or lower diagonal determined by (d_{i} − d_{j}). The main diagonal elements have the non-zero value $({\sigma}_{x}^{2}+{\sigma}_{z}^{2})$ for i = j, and the shifted sub-diagonal elements have ${\sigma}_{x}^{2}$ for i ≠ j. The non-zero elements on the shifted diagonal represent the relative delay between the i-th and the j-th trial. For example, when there are three trial data sequences of three points each (I = N = 3), ${\sigma}_{x}^{2}$ = 1, and ${\sigma}_{z}^{2}$ = 0.1, the covariance matrix of the delay set $\overline{d}$ = [0, 0, 1] is a 9 × 9 matrix represented as:
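The structure of this 9 × 9 matrix can be reproduced numerically; the sketch below builds it directly from the white-signal assumption, with the parameter values of the worked example (I = N = 3, ${\sigma}_{x}^{2}$ = 1, ${\sigma}_{z}^{2}$ = 0.1, $\overline{d}$ = [0, 0, 1]).

```python
import numpy as np

# Parameters from the worked example: I = N = 3, sigma_x^2 = 1, sigma_z^2 = 0.1.
I, N = 3, 3
sx2, sz2 = 1.0, 0.1
d = [0, 0, 1]

S = np.zeros((I * N, I * N))
for i in range(I):
    for j in range(I):
        for a in range(N):
            for b in range(N):
                # White x(n): covariance survives only when the delayed
                # sample indices coincide, i.e. a - d[i] == b - d[j].
                if a - d[i] == b - d[j]:
                    S[i * N + a, j * N + b] = sx2
                if i == j and a == b:
                    S[i * N + a, j * N + b] += sz2
```

The block for trials 1 and 2 (equal delays) comes out as an identity-like diagonal, while the block for trials 1 and 3 has its diagonal shifted by one position, reflecting (d_1 − d_3) = −1.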

The main diagonal elements in Equation (7) have $({\sigma}_{x}^{2}+{\sigma}_{z}^{2})$ = 1.1, and the sub-matrix ${r}_{\overline{d}}^{i,j}$ with i ≠ j has non-zero elements according to the delay set. For example:

and this matrix has ${\sigma}_{x}^{2}$ = 1 on its diagonal since the 1st and 2nd trial delays in $\overline{d}$ are the same. However, the non-zero terms in ${r}_{\overline{d}}^{1,3}$ are shifted from the main diagonal entries due to the delay difference (d_{i} − d_{j}), i.e., they are shifted by 1 because (d_{1} − d_{3}) = −1, as:

For all the sub-matrices, the element in the a-th row and b-th column of ${r}_{\overline{d}}^{i,j}$ can be expressed as:

where $\delta (n)$ is the Kronecker delta function. Then ${\Sigma}_{\overline{d}}^{-1}$ is presented as:
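The element-wise expression itself is missing from this extraction; a form consistent with the shifted-diagonal structure just described would be (an assumed reconstruction, not the verbatim original):

```latex
{\left[ r_{\overline{d}}^{i,j} \right]}_{a,b}
= \sigma_{x}^{2}\,\delta\!\big((a - d_{i}) - (b - d_{j})\big)
+ \sigma_{z}^{2}\,\delta(a-b)\,\delta(i-j)
```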

From Equation (11), we observe that ${\Sigma}_{\overline{d}}^{-1}$ has non-zero elements at the same positions as ${\Sigma}_{\overline{d}}$. As Equation (11) shows, ${\Sigma}_{\overline{d}}^{-1}$ preserves the location of the sub-diagonal shift given by the delay set, and this determines how (**y** − μ_{x})^{T}, ${\Sigma}_{\overline{d}}^{-1}$, and (**y** − μ_{x}) are multiplied. For example, the metric in Equation (6) can be obtained by approximating ${\Sigma}_{\overline{d}}^{-1}$ as:

The zero-valued elements in ${\Sigma}_{\overline{d}}^{-1}$ need not be calculated, since their multiplication results are zero, which reduces the computational complexity. Furthermore, if we set the non-zero elements in ${\Sigma}_{\overline{d}}^{-1}$ to a constant (i.e., −1 as in Equation (12)), multiplication is required only for the corresponding delay values in Equation (6). In addition, the main diagonal in Equation (12) can be disregarded in the computation of Equation (6), since it is common irrespective of $\overline{d}$. Finally, the sub-optimum scheme can be represented as:

In the sub-optimum scheme, as shown in Equation (14), $\widehat{d}$ is obtained using only the correlation of the measured signal **y**. An exhaustive search or a stochastic optimization can be applied to find $\widehat{d}$. Hence, among the relative delay vectors of all the trials, the optimum joint ML solution chooses the best relative trial delay vector satisfying Equation (3), and the sub-optimum solution chooses a close-to-best relative trial delay vector, from an ML point of view.
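An exhaustive search of this kind can be sketched as follows. Since the body of Equation (14) is not preserved here, the code assumes the sub-optimum metric reduces to the sum of pairwise trial cross-correlations evaluated at the candidate relative delays; the function name, the circular-shift alignment, and all parameter values are illustrative.

```python
import itertools
import numpy as np

def suboptimum_tde(y, search_range):
    """Exhaustive search over delay vectors, scoring each candidate by the
    sum of pairwise trial correlations at the corresponding relative delays.
    Sketch only: the exact form of Equation (14) is assumed, not reproduced."""
    I, N = y.shape
    best_d, best_metric = None, -np.inf
    for d in itertools.product(range(search_range), repeat=I):
        metric = 0.0
        for i in range(I):
            for j in range(i + 1, I):
                # Correlate trial i with trial j shifted by their relative delay.
                metric += np.dot(y[i], np.roll(y[j], d[i] - d[j]))
        if metric > best_metric:
            best_metric, best_d = metric, d
    return np.array(best_d)

# Usage with synthetic trials: a random waveform delayed by [0, 1, 2] plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = np.array([np.roll(x, d) + rng.normal(0.0, 0.05, 100) for d in [0, 1, 2]])
d_hat = suboptimum_tde(y, search_range=4)
```

Only relative delays are identifiable, so the estimate is checked up to a common offset.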

In Equation (11), the values of the non-zero elements differ, e.g., 6.8 ≠ 5.2 ≠ 0.9 and −3.2 ≠ −4.8, but as the matrix size increases with larger N and I, these differences decrease, resulting in smaller variance among the non-zero elements.

Figure 1 shows a 3D color display of the covariance matrix ${\Sigma}_{\overline{d}}$ and the inverse covariance matrix ${\Sigma}_{\overline{d}}^{-1}$ of signals with N = 100 and I = 3. Figure 1a,b show ${\Sigma}_{\overline{d}}$ of a colored random signal and a real EEG signal, respectively, and Figure 1c,d show their ${\Sigma}_{\overline{d}}^{-1}$, respectively. As seen in Figure 1c, the main diagonal and sub-diagonal values are similar, meaning that ${\Sigma}_{\overline{d}}^{-1}$ can be approximated to have constant non-zero terms as in Equation (12). A similar tendency can be observed for real EEG signals, which are not white random signals, as seen in Figure 1b,d. This implies that the sub-optimum scheme can be applied to EEG signals containing ERP signals, although the scheme is derived under a white random signal assumption. In addition, if x(n) is a white random signal, Equations (3) and (13) will return almost the same result. On the other hand, the performance of the sub-optimum scheme will be worse than that of the optimum scheme if the ERP signal does not follow white random signal characteristics, due to the larger approximation error of ${\Sigma}_{\overline{d}}^{-1}$. Since the sub-optimum scheme does not require complicated operations such as matrix inversion, its computational complexity is reduced dramatically: the complexity of the sub-optimum scheme is O(R^{I}I^{2}R), far smaller than that of the optimum scheme, O(R^{I}(IR)^{2.373}), where R (R ≤ N) is the search range of the delay candidates. Hence, for faster execution, the sub-optimum scheme can be applied.
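To make the complexity gap concrete, the two stated bounds can be compared numerically; the exponent 2.373 corresponds to fast matrix inversion via fast matrix multiplication, and the sizes chosen below are purely illustrative.

```python
# Compare the stated complexity bounds of the two schemes for modest sizes.
I, R = 3, 10  # trials and delay search range (illustrative values)

sub_optimum = R**I * I**2 * R          # O(R^I * I^2 * R)
optimum = R**I * (I * R) ** 2.373      # O(R^I * (I R)^2.373)

ratio = optimum / sub_optimum          # how many times more work the optimum scheme does
```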