Pattern Recognition of Gene Expression with Singular Spectrum Analysis

Drosophila segmentation as a model organism is one of the most highly studied. Among many maternal segmentation coordinate genes, bicoid protein pattern plays a significant role during Drosophila embryogenesis, since this gradient determines most aspects of head and thorax development. Despite the fact that several models have been proposed to describe the bicoid gradient, due to its association with considerable error, each can only partially explain bicoid characteristics. In this paper, a modified version of singular spectrum analysis is examined for filtering and extracting the bicoid gene expression signal. The results with strong evidence indicate that the proposed technique is able to remove noise more effectively and can be considered as a promising method for filtering gene expression measurements for other applications.


Introduction
Segmentation is a crucial stage of many animal body plans.One of the best understood examples of segmentation is the patterning along the future head to tail axis of the Drosophila melanogaster.Since most of the developmental regulators identified in Drosophila are conserved genes, the Drosophila model has a profound influence on other fields, including medicine [1].During Drosophila embryogenesis the segmented body plan is established through a cascade of both maternally and zygotically expressed segmentation genes [2].
Specifically, in the modern formulation, cells or nuclei would respond to different concentration thresholds of the morphogen by the expression of specific target genes, thereby translating the differential morphogen concentration into patterns of gene expression [3].The zygotic segmentation genes were first identified in a large scale screen for embryonic lethal mutations that affect the body pattern [4].Bicoid, which encodes a homeobox transcription factor, was the first gene to be identified that fulfils the criteria of such a morphogen [5].
Of the many maternal coordinate genes, here we are concerned only with those in the anterior groups that regulates zygotic segmentation genes, namely bicoid (bcd) (the italic lower case bcd represents the gene and Bcd refers to protein).In Drosophila melanogaster, bcd mRNA, transcribed in the mother, is deposited at the anterior end of the fertilized egg, and the Bcd protein is translated and diffuses to form an anteroposterior (AP) concentration gradient [6].This gradient determines most aspects of head and thorax development.In Bcd-deficient embryos, the thoracic segments are developed from more anterior regions than normal, and less of the body is devoted to the head [1].
Once the Bcd gradient is established, target genes, such as the hunchback and Krüppel gap genes, are transcribed at different concentrations of Bcd and thus at different positions in the embryo [7].Several models have been proposed to describe the Bcd gradient in the Drosophila embryo since its discovery in 1988.However, none of those are fully consistent with all available experimental observations; each can only partially account for and explain experimental findings on the Bcd gradient [8].Among all extensive studies, the most widely accepted model was formulated before the identification of Bcd [5], which is called the simple diffusion model [3] (also referred to as the SDD model).The SDD model has been widely used to explain the exponential gradient of Bcd [9]: where A is the amplitude, x is distance from the anterior, and λ is the length parameter [10].
It should be noted that λ is obtained by fitting an exponential model to the Bcd intensity profile, and computing the position at which the concentration has dropped to 1/exp of the maximal value at the anterior (at x = 0) [8].However, one of the challenging issues in analysing Bcd is its association with both intrinsic and extrinsic noise, which limits the performance of the models and techniques, and consequently affects the accuracy of results [11,12].Therefore, employing an effective method for dealing with noisy Bcd is essential.
There are two main approaches for fitting a model to a noisy Bcd.The first approach ignores the presence of noise and directly fits a model to noisy Bcd data (like the SDD model).The second approach, which is often more effective, seeks to reduce the Bcd noise level via filtering methods, and then a model is fitted to the filtered data [13].It should be noted that the selection of an appropriate filtering method is crucial for the effectiveness of the second approach.For the standard filtering methods, assuming stationarity for the data, linearity of the model and normality of the residuals can provide only an approximation of the true situation [18].Therefore, a method that does not depend on these assumptions could be very useful for modelling and filtering Bcd data.Singular spectrum analysis (SSA) is a very good example as it is nonparametric and therefore does not require any assumptions [14] and basic SSA has been applied since 2006 for extracting the Bcd signal from its noisy datasets; for example see [2,6,[19][20][21]23]. Accordingly, we introduce a new method of SSA for filtering and signal extraction of Bcd, which is a powerful technique for filtering noisy signal in time series analysis and signal processing.
The basic SSA method consists of two complementary stages: decomposition and reconstruction.Each stage includes two separate steps.At the first stage the Bcd is decomposed into the sum of a small number of independent and interpretable components, such as a slowly varying trend and a structureless noise [16].At the second stage, the noise-free Bcd is reconstructed [17].
Here, the SSA technique based on minimum variance is utilized as it has been shown to be more effective than the basic version of it [22].This enables extraction of the Bcd signal more accurately in comparison with the basic version [2,6,23].
In recent years the SSA technique has been developed and applied to many biomedical and genetic data (see, for example, [24,25]).It has also been used for filtering microarray gene expression data [15,26], curve fitting in growth curve models [13], and detecting the gene expression pattern in [2,6,23].It has also been used for reliably imaging radioactive seeds [27].
The rest of the paper is organised as follows: the SSA technique is briefly described in Section 2. The empirical results are represented in Section 3, and finally Section 4 is devoted to summary and conclusions.

LS and MV Estimators
A short description of the SSA based on minimum variance technique is given below, and in doing so we mainly follow [22] where a more detailed description is made available.
Consider a noisy signal vector Y N = (y 1 , . . ., y N ) T of length N .
where S N represents the signal component and where L is some integer called the window length (we can assume L ≤ N/2).Define the so-called "trajectory matrix": The columns X j of X, considered as vectors, lie in an L-dimensional space R L .It is obvious that: where S and E represent Hankel matrices of the signal S N and noise E N , respectively.The singular value decomposition (SVD) of the trajectory matrix X can be written as: where U ∈ R L×K is the matrix consists of the normalized eigenvector U i corresponding to the eigenvalue λ i (i = 1, . . ., L), V ∈ R K×K , is the matrix contains the principal components defined as The diagonal elements of Σ are called singular value of X, and their set is called the singular value spectrum.
The signal subspace methods are based on the assumption that the vector space of the noisy time series (signal) can be split in mutually orthogonal noise and "signal+noise" subspaces.Thus, by adapting the weights of the different singular components, an estimate of the Hankel matrix X, which corresponds to noise reduced series, can be achieved: where W is the diagonal matrix containing the weights.Now, the problem is choosing the weight matrix W. Next we consider the problem of choosing this matrix using different criteria.The SVD of the matrix X can be written as: where We can also represent the SVD of the Hankel matrix of the signal s T as: It is clear that the Hankel matrix S cannot be reconstructed exactly if it is perturbed by noise.

LS Estimate of S
Let us consider the assumption that the matrix X L×K is rank deficient, i.e., rank X = r and r < L < K.The simplest estimate of S is obtained when we approximate S by a matrix of rank r in the LS sense: where ∥ .∥ F is the Frobenius norm.That is, the LS estimate is obtained by setting the smallest singular value to zero (λ r+1 = 0, . . ., λ L = 0) in Equation ( 7): The S LS estimate removes the noise subspace but keeps the noisy signal uncorrelated in the "signal+noise" subspace.The disadvantage of LS is that the performance of the LS estimator is crucially dependent on the estimation of the signal rank r.That is, selecting singular values in LS is a binary approach.The main advantage of the LS estimate is that one does not need to consider any assumptions either about the signal or the noise.

MV Estimate of S
The aims of the noise reduction can be considered as follows: (1) separate the "signal+noise" subspaces from the noise-only subspace; (2) remove the noise-only subspace; (3) ideally, remove the noise components in the "signal+noise" subspaces.The first two steps can be achieved by the least squares estimate, while the MV estimate allows us to achieve the third aim as well.However, one should consider some assumptions to obtain the MV estimate [22] and if the assumptions are met, one can obtain the MV estimate as follows [28,31].Given the matrix X, with rank X = rank N = L and also rank S = r.Find the matrix P ∈ R K×K that minimises min ∥ XP − S ∥ 2 F and the solution is obtained by P = (X T X) −1 X T S. Therefore, the MV estimate of S is XP = X(X T X) −1 X T S. Using the SVD of the X, we can obtain XP = UU T S.
That is, the MV estimate of S can be interpreted as an orthogonal projection of S onto the column space of X because UU T is the associated projection matrix.Note also that rank (XP) = rank (S) = r.Let us now consider an alternative form of the SVD of the matrix X using the SVD of S Equation ( 8) as follows: As it appears from Equation ( 11), the middle matrix is diagonal, and the left and right matrices have orthonormal columns.Therefore, Equation (11) can be considered as an alternative form of the SVD of X, and the singular values of X are: Hence, the singular values in Σ 2 can be interpreted as a noise threshold, which permits estimating σ noise from Σ 2 in Equation (12).We can also consider the following submatrices: Now, the MV estimate of S:

Weight Matrix W
Let us consider again the weight matrix W based on the LS and MV estimates.As it is appears from Equations ( 10) and ( 14), the left and right singular vectors, U 1 and V 1 , of LS and MV estimates are the same, but the singular values are different.The LS and MV estimates can be defined based on the weight matrix W r×r as follows: where Here the capability of the SSA technique based on the minimum variance estimator, in reconstructing, is tested by applying the technique to the simple exponential series: where ϵ n is the not normally distributed noise series.In total 200 observations were generated and we added different level of noise to each point of the original series.The simulation was repeated 1000 times.Different values of window length L were considered to examine the sensitivity of the SSA technique for different L. Based on the theoretical results L should be large enough but not greater than N/2 [29].Here, for reconstruction the first two eigenvalues are selected and the ratio of root mean square error (RRMSE) is used to compare two noise reduction methods [30]: where, s i are the estimated values of s i , obtained with SSA M V and s i are the estimated values of s i obtained by SSA LS and N is the series length.If RRMSE <1, then the SSA M V procedure outperforms the alternative method.Conversely, RRMSE >1 would indicate that the performance of the corresponding SSA M V procedure is worse than the competing method.
Figure 1 shows the RRMSE of SSA M V /SSA LS .As it appears, the SSA M V has better performance for small window length and the RRMSE tends to 1 as the window length increases, confirming that both methods have similar performance for large window length.Also it can be seen that there is a gradual increase in RRMSE with window length.For a window length of 3, the performance of SSA M V is up to 14% better than SSA LS in reconstruction.However, there is no significant discrepancy between the performance of SSA M V and SSA M V for window length greater than 36.For more detailed information see [22].

Simulated Series
In this paper, a series of simulated data are used to evaluate the performance of two different approaches for extracting Bcd signal from its noisy datasets.
The first part assesses the SDD model as the most common model in studying Bcd, which follows an exponential curve.In this approach the performance of SDD is evaluated before and after filtering.The second part studies the efficiency of using SSA M V directly for trend extraction.
For starting the simulation, an exponential curve drawn from the SDD model was considered as the benchmark, then random errors ϵ of a normal distribution with zero mean and variance σ 2 ϵ were added to this curve to obtain a noisy datasets.This simulation was repeated 10, 000 times.Finally, by fitting the nonlinear model to the noisy simulated Bcd series, parameters A and λ were estimated.
Table 1 presents the values of estimated parameters and the related standard deviation.As it appears, following the application of SSA, these values are more robust and accurate than the values prior to filtering.Figure 2 also shows the normal distribution of the estimated parameters A and λ before and after filtering.As it appears, parameters show different values in noisy and noise-free series, which confirms that noise reduction does help to obtain a better model.Thus, if the user is inclined to apply a parametric model such as SDD, it is strongly suggested that they first reduce the noise with any noise reduction method (here for example SSA) and then fit a model to the noise reduced Bcd, or they can just apply SSA, which is a powerful nonparametric method for both noise reduction and trend extraction.For the SSA analysis, since the length of each series is 100 (one observation per egg length percentage), the window length of 50 was chosen for trend extraction and reconstruction.Choosing L = 50 and performing the SVD step of the trajectory matrix X, 50 eigentriples were obtained, ordered by their contribution (share) in the decomposition stage.We shall say that the series Y N is not complex if Y N is well approximated by a series with small rank d.For example, series y i = e αi (i = 1, ... , N) has rank 1.For all 2 ≤ L ≤ N − 1, y i = by i−1 where b = e α .It should be noted that the number of eigentriples selected as corresponding to the series S N has to be at least d.For example, if y i = e α + ϵ i then the window length L should be at least 2 and the first eigentriple is enough for reconstructing the original series if ||S N || ≫ ||E N ||.Accordingly, the first eigentriple is selected for filtering and trend extraction.
Furthermore, the ratio of Root Mean Square Error (RRMSE) is used to measure the difference between the estimated values (fitted by SDD model and reconstructed by SSA) and the actual values.
Table 2 shows the results.As it appears from Table 2, a significant reduction in the RMSE value is achieved by SSA, confirming that these results are more accurate than the estimated points by SDD model.For all simulation runs (here 5 times) SSA outperforms the SDD model ranging from 42% to 46%.For instance, for the case 1, first row in Table 2, the extracted signal by SSA is 45% more accurate than the SDD model.The expression level of Bcd gene in wild-type Drosophila melanogaster embryos is measured by using fluorescently tagged antibodies.The data used in this study is available via [32].A complete explanation on the method, data, biological characteristics and different cleavage cycles can be found in [2,33,34].

The Results
Since a natural hint for grouping is the matrix of the absolute values of the w-correlations, calculated w-correlations for different time classes are presented in Table 3.If the absolute value of the w-correlations is small, then Bcd signal and its corresponding noise are almost w-orthogonal, but if it is large, then they are far from being w-orthogonal, and are therefore not very well separable.As it can be seen from the results, the Bcd signal can be effectively separated from the noise component, confirming the technique proposed in this paper performs well in noise reduction and pattern recognition.Table 3. w-correlations between signal and residuals for Bcd gene expression series.Figures 3 and 4 show the extracted trend obtained by SSA-MV.The black and red colours indicate original data and SSA trend, respectively.As it can be seen from Figures 3 and 4, as the series length increases the fluctuation in data increases, confirming more noise in AP position.Here we can see that the trend component is precisely extracted for all time classes, indicating the robustness of SSA to outliers.This can be considered as another advantage of SSA over classical methods, including the parametric approach currently used for Bcd analysis.Time Class 14 (8)

Figure 1 .
Figure 1.The RRMSE of SSA M V /SSA LS in the reconstruction of noisy exponential series.

Figure 2 .
Figure 2. Distribution of the estimated parameters of A and λ for noisy Bcd and noise-reduced Bcd (thick line).

Figure 3 .
Figure 3. Temporal dynamics of Bcd expression, cleavage cycles 10 to 13. Red and green lines show the trend extracted by SSA M V and SDD, respectively.

Figure 4 .
Figure 4. Temporal dynamics of Bcd expression, cleavage cycle 14.Red and green lines show the trend extracted by SSA M V and SDD, respectively.

Table 1 .
Standard deviation of parameters before and after applying SSA.

Table 2 .
The RMSE results achieved by SSA and SDD using noisy simulated data.