Article

LJELSR: A Strengthened Version of JELSR for Feature Selection and Clustering

1 School of Information Science and Engineering, Qufu Normal University, Rizhao 276826, China
2 Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen 518055, China
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2019, 20(4), 886; https://doi.org/10.3390/ijms20040886
Submission received: 4 December 2018 / Revised: 11 January 2019 / Accepted: 7 February 2019 / Published: 18 February 2019
(This article belongs to the Section Molecular Informatics)

Abstract

Feature selection and sample clustering play an important role in bioinformatics. Traditional feature selection methods separate sparse regression and embedding learning. Later, Joint Embedding Learning and Sparse Regression (JELSR) was proposed to effectively identify the significant features of genomic data. However, because genomic data contain substantial redundancy and noise, the sparseness achieved by this method is insufficient. In this paper, we propose a strengthened version of JELSR, called LJELSR, which adds an L1-norm constraint to the regularization term of the original model to further improve its sparseness. We then provide a new iterative algorithm to obtain a convergent solution. The experimental results show that, compared to previous methods, our method achieves state-of-the-art performance in both identifying differentially expressed genes and clustering samples on different genomic datasets. Additionally, the selected differentially expressed genes may be of great value in medical research.

1. Introduction

With the emergence of deep sequencing technologies, considerable genomic data have become available. Genomic data are typically high-dimensional, small-sample data: the number of genes is large while the number of samples is small, which easily introduces interference during feature selection and makes the samples difficult to interpret directly [1]. Additionally, these data contain a large number of superfluous and extraneous genes that severely interfere with the analysis of biological processes. In fact, only a small minority of genes with biological meaning contribute to disease research [2]. Accordingly, identifying these key genes from massive, high-dimensional genomic data is both a hotspot and a difficulty in current research. Studies have shown that such key genes can be extracted effectively by embedding learning [3]. Furthermore, cluster analysis classifies samples or genes based on the similarity of data points, which helps to determine cancer subtypes accurately. Studies have also demonstrated that combining embedding learning and sparse regression benefits both cluster analysis and feature selection [4,5].
Feature selection picks out k features from m-dimensional data (m > k) to optimize a specific index of the system [6]. First, information on characteristic genes is extracted to form a reduced dataset, achieving dimensionality reduction. Then, genes associated with disease are identified from the low-dimensional data for medical research. Feature selection is studied extensively because of its usefulness and practicality. However, traditional feature selection approaches have some issues: (1) the manifold structure, which reflects the internal geometric structure of the data, is not fully considered [7]; and (2) they rely solely on statistical strategies, which can affect the accuracy and reliability of the results. Moreover, the steps of traditional feature selection, such as embedding learning and sparse regression, are performed independently [8,9]. A better performance can be achieved by combining these two independent steps. Following this idea, Hou et al. proposed feature selection via Joint Embedding Learning and Sparse Regression (JELSR) [5]. This method addresses the above issues well, and its feature selection performance surpasses that of traditional methods. Nevertheless, one problem remains: since the L2,1-norm penalty only sparsely constrains the rows of the data, the sparsity of the method is far from satisfactory. If the sparseness is insufficient, too many unrelated genes are taken into account, which can cause serious errors. Hence, a more efficient sparse method is needed to strengthen the previous approach.
LASSO (Least Absolute Shrinkage and Selection Operator) was first proposed by Robert Tibshirani [10]. It constructs an L1-norm penalty to obtain a more refined model and compresses some coefficients to exactly zero. Both the L1-norm constraint and the L2,1-norm constraint produce sparse effects, but the sparsity of the L1-norm constraint is element-wise and scattered, while the L2,1-norm constraint can only produce row sparseness, as shown in Figure 1. Combining the L1-norm and L2,1-norm constraints generates sparsity within rows and enhances the correlation between the rows and columns of the matrix [11], as also shown in Figure 1, so the influence of redundant and irrelevant genes can be reduced. In this paper, to obtain a more sparse effect, we propose a new method, LJELSR, that adds an L1-norm constraint on the sparse regression matrix of JELSR. First, a graph is constructed to depict the inherent structure of the data. Then, the graph matrix is embedded into the subsequent steps of feature ranking and data regression. Because the constructed graph captures the similarity between data points, the clustering effect is improved to some degree. Finally, to obtain a more sparse result, we combine embedding learning and sparse regression with the L1-norm through linear weighting to complete feature selection and cluster analysis.
The major merits of our work are shown below:
  • More zero values are produced by adding an L1-norm constraint on the sparse regression matrix, so that the results are sparser.
  • The internal geometric structure of data is preserved along with dimensionality reduction by embedding learning to reduce the occurrence of inaccurate results.
  • Although an exact solution cannot be obtained by our method, we provide a convergent iterative algorithm to get the optimal results.
The rest of this paper is organized as follows. Section 2 details the comparative experiments and the analysis of the experimental results on different datasets. Section 3 describes the related materials and presents the methodology of LJELSR. A conclusion is given in Section 4.

2. Results

LJELSR, JELSR [5], ReDac [12], and SMART [11] are used to select differentially expressed genes and to test the performance of the proposed method on different genomic data; JELSR, ReDac, and SMART serve as comparison methods.

2.1. Datasets

To validate the effectiveness of our method, the LJELSR, JELSR, ReDac, and SMART methods are run on three datasets: ALL_AML, colon cancer, and ESCA. The ALL_AML dataset includes acute lymphoblastic leukemia (ALL) and acute myelogenous leukemia (AML) [13], and ALL is further divided into T-cell and B-cell subtypes. The colon cancer dataset is taken from [14]; to facilitate clustering, its samples are organized into two categories, namely diseased and normal samples. Additionally, the esophageal carcinoma (ESCA) dataset is downloaded from TCGA (The Cancer Genome Atlas), a publicly available repository accessible at https://tcgadata.nci.nih.gov/tcga/. Some details of the three datasets are listed in Table 1.

2.2. Parameters Selection

In our method, four parameters are involved: three balance parameters α1, α2, β and the nearest neighbor number q. For ALL_AML, α1, α2, and β are selected from {10^5, 10^7, 10^9, 10^11, 10^13, 10^15, 10^17, 10^19, 10^21, 10^23, 10^25}; for colon, from {10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 10^0, 10^1, 10^2, 10^3, 10^4, 10^5}; and for ESCA, from {10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^10}. The optimal parameters for each dataset are obtained by five-fold cross-validation. In addition, according to the existing literature [15,16] and extensive experiments, the results are better when q is set to 5 or 6; in our experiments, we set q to 5.
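As an illustration of this tuning procedure, the following is a minimal sketch of how the grid search could be organized; evaluate_ljelsr is a hypothetical helper (not part of the paper) that would run LJELSR with the given parameters and return a five-fold cross-validated score.

```python
import itertools
import numpy as np

# Candidate grid for the colon cancer dataset described above; the other
# datasets use their own exponent ranges.
param_grid = [10.0 ** e for e in range(-5, 6)]

def grid_search(X, evaluate_ljelsr, q=5):
    """Exhaustive search over (alpha1, alpha2, beta) combinations."""
    best_score, best_params = -np.inf, None
    for a1, a2, b in itertools.product(param_grid, param_grid, param_grid):
        score = evaluate_ljelsr(X, alpha1=a1, alpha2=a2, beta=b, q=q)
        if score > best_score:
            best_score, best_params = score, (a1, a2, b)
    return best_params, best_score
```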

2.3. Evaluation Metrics

In this study, two metrics are employed to assess all algorithms: the p-value and the clustering accuracy (ACC) [17]. First, the p-value reflects the strength of the association between the selected genes and the disease: the smaller the p-value, the stronger the association. The p-values are obtained with the ToppFun tool, a public gene list functional enrichment analysis tool, and the p-value cutoff is set to 0.01 throughout the experiments. Second, a higher ACC indicates a better algorithm. The value of ACC is obtained by the following formula:
$\mathrm{ACC} = \frac{\sum_{i=1}^{n} \delta(s_{i}, \mathrm{map}(c_{i}))}{n}$ (1)
where c_i is the cluster label assigned to the original data point x_i and s_i is its true label. In addition, δ(x, y) equals 1 if x = y and 0 otherwise, and map(·) is a mapping function that maps each cluster label to the best-matching class label.
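The following is a minimal sketch of how the ACC in (1) can be computed; here map(·) is realized with the Hungarian algorithm on the cluster-class confusion matrix, a common choice that the paper does not specify.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true_labels, cluster_labels):
    """ACC = sum_i delta(s_i, map(c_i)) / n, with map(.) chosen to maximize
    the number of correctly matched samples."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(cluster_labels)
    # Confusion matrix: rows are clusters, columns are true classes.
    count = np.zeros((len(clusters), len(classes)), dtype=int)
    for i, c in enumerate(clusters):
        for j, s in enumerate(classes):
            count[i, j] = np.sum((cluster_labels == c) & (true_labels == s))
    # Maximizing matched samples is minimizing the negated counts.
    row_ind, col_ind = linear_sum_assignment(-count)
    return count[row_ind, col_ind].sum() / true_labels.size
```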

2.4. Feature Selection Analysis

2.4.1. Experimental Results and Analysis on ALL_AML Dataset

For the sake of fairness, LJELSR, JELSR, ReDac, and SMART are each used to extract 100 differentially expressed genes from the ALL_AML dataset, so that their performance can be compared. To verify the effectiveness of the algorithm, the selected genes are submitted to ToppFun, and the resulting p-values are sorted in ascending order. We then pick out the first ten terms and list them in Table 2, where the best p-values are shown in bold font. It can be seen from Table 2 that the p-values obtained by LJELSR are lower than those obtained by the other three methods. Hence, the performance of LJELSR surpasses that of the other three methods.
To further illustrate the relationship between the selected genes and ALL_AML, the selected differentially expressed genes are checked in GeneCards, a comprehensive database of human genes that provides information on disease associations, gene expression, gene function, and so on. The top five differentially expressed genes associated with ALL_AML obtained by LJELSR are listed in Table 3, together with their official names and related diseases. As can be seen from Table 3, the official name of "CD7" is "CD7 Molecule". CD7 was originally identified in T cells of acute lymphoblastic leukemia [18]. It is a membrane glycoprotein of human T lymphocytes and thymocytes and plays an essential role in T-cell and B-cell interactions during early lymphoid development. Therefore, loss of CD7 can affect T-cell expression and has a great impact on T-cell leukemia [19]. The official name of "MYB" is "MYB Proto-Oncogene, Transcription Factor"; as the name indicates, MYB is both a proto-oncogene and a transcription factor, and its duplication can cause leukemia [20]. The analysis of the other genes is similar. In all, these genes are directly or indirectly related to leukemia; the table only displays partial functions of some of these genes.

2.4.2. Experimental Results and Analysis on Colon Cancer Dataset

In this section, the analytical approach and procedure are the same as for the previous dataset. The 100 genes extracted by each method are tested with the Gene Ontology (GO) analysis tool ToppFun. The p-values obtained by the four methods are arranged in ascending order; we single out the first ten and list them in Table 4, with the best p-values indicated in bold typeface. From Table 4, it is obvious that the p-values obtained by LJELSR are smaller than those obtained by the other three methods. Therefore, the performance of our method is clearly better than that of the other three methods.
In addition, the differentially expressed genes selected by LJELSR are checked in GeneCards. The top five genes associated with colon cancer obtained by LJELSR are listed in Table 5, together with their official names and related diseases. From Table 5, the official name of "ACTB" is "Actin Beta". This gene encodes one of six different actin proteins; changes in the gene alter the protein and can affect certain biological processes. ACTB is associated with many cancers and plays a major role in lung and colorectal cancer, among others [21]. Andersen et al. found that colon cancer is affected by ACTB [22]. "WWOX" is the abbreviation of "WW Domain Containing Oxidoreductase". Unlike traditional tumor suppressor genes, its effect on cellular function is more complicated and extensive. The expression level of WWOX differs between the HT29 and SW480 colon cancer cell lines [23]. Moreover, the expression of WWOX can lead to apoptosis, while defects in this gene are associated with multiple types of cancer. The analysis of the remaining genes is similar. Table 5 only shows brief descriptions of some genes associated with colon cancer.

2.4.3. Experimental Results and Analysis on ESCA Dataset

In this subsection, we use the ESCA dataset, which differs from the two datasets above. To further confirm the effectiveness of the algorithm, the experiment is run on these data and the selected differentially expressed genes are submitted to ToppFun for GO analysis. We rank all the p-values in ascending order and list the first ten in Table 6, where the best results are highlighted. From Table 6, the p-values obtained by LJELSR are mostly smaller than those obtained by the other three methods. Therefore, on the whole, LJELSR outperforms the other three methods.
Additionally, the selected differentially expressed genes are checked in GeneCards. The top five genes associated with ESCA obtained by LJELSR, their official names, and related diseases are displayed in Table 7. From Table 7, "ERBB2" corresponds to the official name "Erb-B2 Receptor Tyrosine Kinase 2". ERBB2 is a 185 kDa membrane receptor encoded by the proto-oncogene ERBB-2 and is a member of the epidermal growth factor receptor family. It has been verified that its amplification is closely related to the occurrence of esophageal cancer [24]. "KRT5" is the abbreviation of "Keratin 5", a member of the keratin gene family, which encodes the corresponding protein. Changes in KRT5 affect the expression of this gene family and cause complex diseases; for example, mutations in KRT5 and KRT14 cause epidermolysis bullosa simplex in a large proportion of patients [25]. The analysis of the other genes is similar. Table 7 only shows some functions of selected genes; detailed information on the remaining genes can be obtained from GeneCards.

2.4.4. Differentially Expressed Genes Comparing by Methods

In this subsection, the selected differentially expressed genes are analyzed further. For the three datasets mentioned above, we explore the common differentially expressed genes and the unique differentially expressed genes obtained by the different methods on the same dataset. We select 100 genes for each algorithm and match them against the officially published disease-causing gene pool, which can be downloaded directly from GeneCards, to obtain the verified genes. Table 8 shows the common and unique differentially expressed genes obtained by the four methods on the ESCA dataset. The bold italic entries indicate the common differentially expressed genes found by all four methods, and the underlined italic entries indicate the unique differentially expressed genes found by LJELSR but not by the other methods. The "Number" column gives the total number of verified genes. As can be seen from the table, LJELSR selects more proven genes than the other methods; ANXA1, MUC6, FN1, PKM, CD24, GLUL, PLEC, PIGR, ACTB, PABPC1, LYZ, and SPRR1B are the common differentially expressed genes, and the unique differentially expressed genes of LJELSR include FSCN1, ITGB4, LAMC2, HLA-B, LAMB3, HLA-C, SLC7A5, ENO1, and so on. These genes play an important role in studying the relationship between genes and diseases; details can be obtained from GeneCards, and some of the genes have been explained above. In summary, LJELSR has an advantage over the other methods in that it can identify more, and more valuable, differentially expressed genes.

2.5. Clustering Analysis

Kmeans is a simple algorithm for solving the clustering problem; it classifies the samples in a relatively straightforward way by fixing the number of clusters in advance [26]. For all methods, different datasets require different numbers of sample clusters. The Kmeans method is used to cluster the samples of the ALL_AML, colon cancer, and ESCA datasets, with three, two, and two clusters, respectively. The different methods are run on each dataset to obtain ACC values, and the details are shown in Table 9, where the largest values are marked in bold typeface. From Table 9, we can conclude that the ACC values obtained by LJELSR are almost always higher than those of the other methods. Based on the above analysis, we can summarize that the efficiency of our method is higher than that of the other methods.
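As an illustration, the sample clustering step could look like the sketch below, with scikit-learn's KMeans standing in for the Kmeans procedure; X is the gene-by-sample matrix (possibly restricted to the selected genes), and clustering_accuracy refers to the ACC routine sketched in Section 2.3.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_samples(X, n_clusters, seed=0):
    """Cluster the samples (columns of the gene x sample matrix X) with Kmeans;
    ALL_AML uses 3 clusters, colon and ESCA use 2."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(X.T)  # transpose so that each row is one sample

# Example: ACC on ALL_AML-style data with known labels y.
# acc = clustering_accuracy(y, cluster_samples(X, n_clusters=3))
```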

3. Materials and Methods

3.1. Related Notations and Definitions

In this paper, m_i and m^j denote the i-th row and the j-th column of a matrix M, respectively. Let X ∈ R^{m×n} be the original data matrix, where each row represents a gene (feature) and each column represents a sample. For an arbitrary matrix M, the L_{r,s} norm is defined as:
$\|M\|_{r,s} = \left( \sum_{i=1}^{n} \left( \sum_{j=1}^{m} |m_{ij}|^{r} \right)^{\frac{s}{r}} \right)^{\frac{1}{s}}$ (2)
where r and s are positive numbers. When r = s = 1, (2) becomes the expression of the L1 norm, i.e., the LASSO penalty:
$\|M\|_{1} = \sum_{i=1}^{n} \sum_{j=1}^{m} |m_{ij}|$ (3)
Similarly, when r = 2 and s = 1, (2) becomes the expression of the L2,1 norm, i.e.,
$\|M\|_{2,1} = \sum_{i=1}^{n} \left( \sum_{j=1}^{m} |m_{ij}|^{2} \right)^{\frac{1}{2}}$ (4)
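As a small NumPy illustration of these definitions (with rows as the outer index, matching (3) and (4)): the L1 norm penalizes individual entries, whereas the L2,1 norm sums the Euclidean norms of the rows, so minimizing it tends to zero out entire rows.

```python
import numpy as np

def l1_norm(M):
    # ||M||_1: sum of the absolute values of all entries, as in (3)
    return np.abs(M).sum()

def l21_norm(M):
    # ||M||_{2,1}: sum of the Euclidean norms of the rows, as in (4)
    return np.sqrt((M ** 2).sum(axis=1)).sum()

M = np.array([[0.0, 3.0],
              [0.0, 0.0],
              [-4.0, 0.0]])
print(l1_norm(M), l21_norm(M))  # 7.0 7.0 (the all-zero row contributes nothing)
```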
Additionally, the inherent geometric structure of the data is taken into account when processing the data, to lessen the appearance of inaccurate results in practical applications. Considering its benefits, it is also incorporated into our work. Therefore, we first construct a q-nearest-neighbor graph G with n vertices in the data space, where each vertex corresponds to a data point [15] and q is the number of nearest neighbors. w_ij represents the correlation between two data points x_i and x_j, and all w_ij make up the weight matrix W. There are many strategies to compute W; the following three are frequently employed [15].
(1) 0-1 weighting:
$w_{ij} = \begin{cases} 1, & \text{if } x_i \in N_q(x_j) \text{ or } x_j \in N_q(x_i) \\ 0, & \text{otherwise} \end{cases}$ (5)
where N_q(x_i) denotes the set of the q nearest neighbors of the data point x_i.
(2) Heat kernel weighting:
$w_{ij} = \begin{cases} e^{-\frac{\|x_i - x_j\|^2}{\sigma}}, & \text{if } x_i \in N_q(x_j) \text{ or } x_j \in N_q(x_i) \\ 0, & \text{otherwise} \end{cases}$ (6)
where σ is a suitable constant whose value is set based on prior experience.
(3) Dot-product weighting:
$w_{ij} = \begin{cases} x_i^{T} x_j, & \text{if } x_i \in N_q(x_j) \text{ or } x_j \in N_q(x_i) \\ 0, & \text{otherwise} \end{cases}$ (7)
These three strategies suit different situations. Since the 0-1 weighting is the simplest to compute, it is commonly used for computing the weight matrix; heat kernel weighting is widely applied to image data; and dot-product weighting is frequently used in the information retrieval community for processing documents [15].
Next, we define a diagonal matrix D whose diagonal values are given as d_ii = Σ_j w_ij. The graph Laplacian matrix L is defined as L = D − W [27].
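A minimal sketch of this graph construction with the 0-1 weighting in (5), assuming the columns of X are the data points:

```python
import numpy as np

def knn_graph_laplacian(X, q=5):
    """Build the 0-1 weighted q-nearest-neighbor graph on the columns of X
    and return W, D, and the graph Laplacian L = D - W."""
    n = X.shape[1]
    # Pairwise squared Euclidean distances between the data points.
    sq = (X ** 2).sum(axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * X.T @ X
    W = np.zeros((n, n))
    for i in range(n):
        neighbors = np.argsort(dist[i])[1:q + 1]   # q nearest points, excluding x_i
        W[i, neighbors] = 1.0
    W = np.maximum(W, W.T)        # x_i in N_q(x_j) OR x_j in N_q(x_i)
    D = np.diag(W.sum(axis=1))    # d_ii = sum_j w_ij
    return W, D, D - W
```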

3.2. Joint Embedding Learning and Sparse Regression (JELSR)

Traditional feature selection methods perform embedding learning and sparse regression independently. To further improve on them, Hou et al. proposed the JELSR method [5]. It uses linear approximation weights and L2,1-norm regularization to combine embedding learning and sparse regression into a single objective function; the sparse regression matrix is then used to complete feature selection [5]. The objective function of JELSR is written as follows:
$\min_{P, Y} \; \mathrm{tr}(YLY^{T}) + \beta\left(\|P^{T}X - Y\|_{2}^{2} + \alpha\|P\|_{2,1}\right), \quad \text{s.t. } YY^{T} = I_{k\times k}$ (8)
where Y ∈ R^{k×n} is the low-dimensional embedding matrix; P ∈ R^{m×k} is the sparse regression matrix; α and β are two balance parameters; and k is the dimension of the low-dimensional space.

3.3. The Proposed Method

Assuming that the expression levels of the genes are within the normal range, the sparser the sparse regression matrix, the easier it is to find the differentially expressed genes, and the more accurate the identification results become.
Compared with previous methods, the JELSR method proposed by Hou et al. [5] performs well in feature selection. However, the sparse regression matrix inevitably contains some redundant values and artificial noise, so the sparseness of this method is far from satisfactory. Consequently, a more effective sparse method is needed to improve the performance. Therefore, we propose the LJELSR method, which improves both the sparseness of the algorithm and the accuracy of the results.
The objective function of LJELSR is:
$\min_{P, Y} \; \mathrm{tr}(YLY^{T}) + \beta\left(\|P^{T}X - Y\|_{2}^{2} + \alpha_{1}\|P\|_{1} + \alpha_{2}\|P\|_{2,1}\right), \quad \text{s.t. } YY^{T} = I_{k\times k}$ (9)
where Y ∈ R^{k×n} is the low-dimensional embedding matrix; P ∈ R^{m×k} is the sparse regression matrix; and α1, α2, and β are three balance parameters.

3.4. Optimization

Since the objective function contains both the L1-norm and the L2,1-norm constraints, it is difficult to obtain the optimal solution directly. Inspired by Wang et al. [11] and Nie et al. [28], an iterative strategy is introduced to solve the above problem. We now explain the specific optimization procedure of our method in detail.
Before solving the optimization problem, we introduce two diagonal matrices U ∈ R^{m×m} and Ũ ∈ R^{m×m}, whose j-th diagonal values are defined as follows:
$U = \sum_{i=1}^{k} U_{i}, \quad (U_{i})_{jj} = \frac{1}{2|p_{ji}|} \quad (j = 1, \dots, m)$ (10)
$\tilde{U}_{jj} = \frac{1}{2\|p_{j}\|_{2}}$ (11)
where U_i denotes the i-th diagonal matrix, (U_i)_jj denotes the j-th diagonal value of U_i, p_j denotes the j-th row of P, and Ũ_jj denotes the j-th diagonal value of Ũ.
To prevent the denominators from vanishing, a small constant ε is introduced into the diagonal matrices U and Ũ, respectively, that is:
$U = \sum_{i=1}^{k} U_{i}, \quad (U_{i})_{jj} = \frac{1}{2\max(|p_{ji}|, \varepsilon)} \quad (j = 1, \dots, m)$ (12)
$\tilde{U}_{jj} = \frac{1}{2\max(\|p_{j}\|_{2}, \varepsilon)}$ (13)
Since the partial derivatives of ‖P‖_1 and tr(P^T U P) with respect to P are identical, ‖P‖_1 can be replaced by tr(P^T U P). Analogously, ‖P‖_{2,1} is replaced by tr(P^T Ũ P). Therefore, (9) can be rewritten as:
$\min_{P, U, \tilde{U}, Y} \; \mathrm{tr}(YLY^{T}) + \beta\left(\|P^{T}X - Y\|_{2}^{2} + \alpha_{1}\mathrm{tr}(P^{T}UP) + \alpha_{2}\mathrm{tr}(P^{T}\tilde{U}P)\right), \quad \text{s.t. } YY^{T} = I_{k\times k}$ (14)
We firstly optimize the matrix P . We denote
$\mathcal{L}(P, U, \tilde{U}) = \|P^{T}X - Y\|_{2}^{2} + \alpha_{1}\mathrm{tr}(P^{T}UP) + \alpha_{2}\mathrm{tr}(P^{T}\tilde{U}P)$ (15)
When the two diagonal matrices U and Ũ are fixed, we compute the partial derivative of L(P, U, Ũ) with respect to P and set it equal to zero, which gives the following equation:
$\frac{\partial \mathcal{L}(P)}{\partial P} = 2XX^{T}P - 2XY^{T} + 2\alpha_{1}UP + 2\alpha_{2}\tilde{U}P = 0$ (16)
namely,
$P = \left(XX^{T} + \alpha_{1}U + \alpha_{2}\tilde{U}\right)^{-1}XY^{T}$ (17)
To facilitate optimization, we introduce an auxiliary variable A = XX^T + α1 U + α2 Ũ. Substituting (17) into (14) then gives
$\mathcal{L}(P, U, \tilde{U}, Y) = \mathrm{tr}\left(Y\left(L + \beta I_{n\times n} - \beta X^{T}A^{-1}X\right)Y^{T}\right)$ (18)
Since Y is subject to the orthogonality constraint YY^T = I_{k×k}, the optimization problem for Y becomes
$\arg\min_{Y} \; \mathrm{tr}\left(Y\left(L + \beta I_{n\times n} - \beta X^{T}A^{-1}X\right)Y^{T}\right), \quad \text{s.t. } YY^{T} = I_{k\times k}$ (19)
When A and L are fixed, we update Y in (19) by the eigen-decomposition of the matrix G = L + β I_{n×n} − β X^T A^{-1} X. We first choose the k smallest eigenvalues of G and then take the corresponding eigenvectors to constitute the new matrix Y = (Y_1, Y_2, …, Y_k) [29]. In addition, the diagonal matrices U and Ũ are updated by (12) and (13) when P is fixed. U and Ũ are each initialized as an identity matrix.

3.5. Feature Selection

According to the above update rules, the sparse regression matrix P is obtained after the iterations of LJELSR converge. Then, to identify the differentially expressed genes, we analyze the matrix P in detail. First, the absolute value of each element of P is taken. Second, we sum the absolute values over each row of P and obtain a new vector, as follows.
$\bar{P} = (\bar{P}_{1}, \bar{P}_{2}, \dots, \bar{P}_{m})^{T}$ (20)
$\bar{P}_{i} = \sum_{j=1}^{k} |P_{ij}|$ (21)
Then, we sort the entries of this vector in descending order to obtain a new vector, as follows:
$\tilde{P} = (\tilde{P}_{1}, \tilde{P}_{2}, \dots, \tilde{P}_{m})^{T}$ (22)
Finally, the genes corresponding to the first l values are selected as differentially expressed genes for further analysis (l < m). By and large, the larger the value of an element, the more important the corresponding gene. The selected genes are then submitted to ToppFun and GeneCards for analysis; these tools are publicly accessible at https://toppgene.cchmc.org/enrichment.jsp and http://www.genecards.org/, respectively.
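Before summarizing the full procedure, the ranking step in (20)-(22) can be sketched as follows for a learned sparse regression matrix P:

```python
import numpy as np

def select_features(P, l):
    """Rank genes by the row-wise sum of the absolute values of P and
    return the indices of the top-l genes."""
    scores = np.abs(P).sum(axis=1)      # row sums of |P|, as in (21)
    order = np.argsort(scores)[::-1]    # descending order, as in (22)
    return order[:l]
```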
In summary, the procedure of LJELSR is shown in Algorithm 1.
Algorithm 1. Procedure of LJELSR.
Input: Data matrix X; neighborhood size q; balance parameters α1, α2, β; dimensionality of embedding k; feature selection number l.
Output: Selected feature index set {P_1, P_2, …, P_m}.
Stage one: Graph construction
Construct the weight matrix W; compute the diagonal matrix D and the graph Laplacian matrix L = D − W.
Stage two: Alternating optimization
Initialize U = Ũ = I_{m×m};
Loop
Fix A and L, update Y by (19);
Fix U and Ũ, update P by (17);
Fix P, update U and Ũ by (12) and (13);
until convergence
Stage three: Feature selection
Rank the genes by the row-wise sums of the absolute values of P and select the top l genes.
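The following is a minimal NumPy sketch of stage two of Algorithm 1 under the update rules (12), (13), (17), and (19); it assumes X (genes x samples), the graph Laplacian L from stage one, and the parameters defined above. It is only an illustrative reading of the algorithm, not the authors' reference implementation.

```python
import numpy as np

def ljelsr(X, L, k, alpha1, alpha2, beta, n_iter=50, eps=1e-8):
    """Alternating optimization of LJELSR (stage two of Algorithm 1)."""
    m, n = X.shape
    U = np.eye(m)
    U_tilde = np.eye(m)
    for _ in range(n_iter):
        # A = X X^T + alpha1*U + alpha2*U_tilde appears in (17) and (19).
        A = X @ X.T + alpha1 * U + alpha2 * U_tilde
        # Update Y by (19): rows of Y are the eigenvectors of
        # G = L + beta*(I - X^T A^{-1} X) with the k smallest eigenvalues.
        G = L + beta * (np.eye(n) - X.T @ np.linalg.solve(A, X))
        _, vecs = np.linalg.eigh(G)        # eigenvalues in ascending order
        Y = vecs[:, :k].T                  # k x n, satisfies Y Y^T = I
        # Update P by the closed form (17): P = A^{-1} X Y^T.
        P = np.linalg.solve(A, X @ Y.T)    # m x k
        # Update the diagonal reweighting matrices by (12) and (13).
        U = np.diag((1.0 / (2.0 * np.maximum(np.abs(P), eps))).sum(axis=1))
        row_norms = np.sqrt((P ** 2).sum(axis=1))
        U_tilde = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
    return P, Y
```

The returned P is then passed to the stage-three ranking step sketched in Section 3.5.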

3.6. Convergence Analysis

In this study, an alternating algorithm is utilized to carry out the iterative updates of the proposed method. We now analyze the convergence behavior of LJELSR. The lemma given below was proposed by Nie et al. [28].
Lemma 1.
For any non-zero vectors p, p_t ∈ R^k, the following inequality holds:
$\|p\|_{2} - \frac{\|p\|_{2}^{2}}{2\|p_{t}\|_{2}} \leq \|p_{t}\|_{2} - \frac{\|p_{t}\|_{2}^{2}}{2\|p_{t}\|_{2}}$
The convergence result of the proposed method is explained by the following theorem:
Theorem 1.
The value of the objective function decreases monotonically in each iteration of Algorithm 1. A detailed proof of Theorem 1 is given in Appendix A.

4. Conclusions

In this paper, we propose a new feature selection method that adds an L1-norm constraint on the sparse regression matrix of the JELSR method. First, the four methods are run to select differentially expressed genes and to cluster the samples of the ALL_AML, colon cancer, and ESCA datasets, respectively. Second, the materials related to this paper are presented, and our method and the corresponding optimization strategy are given. Finally, the experimental results lead to the conclusion that the proposed method performs better than the other methods.

Author Contributions

The study was jointly designed by S.-S.W. and J.-X.L. Both the experimental execution of the LJELSR method and the writing of the manuscript were completed by S.-S.W. The data analysis was contributed by M.-X.H. and C.-M.F. The final manuscript was read and approved by all authors.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61872220 and 61572284.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof. 
In accordance with the objective function of the proposed method, the objective value at iteration t can be written as
$\mathcal{L}(P^{t}, Y^{t}) = \mathrm{tr}\left(Y^{t}L(Y^{t})^{T}\right) + \beta\left(\left\|(P^{t})^{T}X - Y^{t}\right\|_{2}^{2} + \alpha_{1}\|P^{t}\|_{1} + \alpha_{2}\|P^{t}\|_{2,1}\right) = \mathrm{tr}\left(Y^{t}L(Y^{t})^{T}\right) + \beta\,\mathrm{tr}\left(\left((P^{t})^{T}X - Y^{t}\right)^{T}\left((P^{t})^{T}X - Y^{t}\right)\right) + \beta\left(\alpha_{1}\sum_{i=1}^{k}(p_{i}^{t})^{T}U_{i}^{t}p_{i}^{t} + \alpha_{2}\,\mathrm{tr}\left((P^{t})^{T}\tilde{U}^{t}P^{t}\right)\right)$
Consequently, since (P^{t+1}, Y^{t+1}) minimizes problem (14) with U^t and Ũ^t fixed, we obtain
$\mathrm{tr}\left(Y^{t+1}L(Y^{t+1})^{T}\right) + \beta\,\mathrm{tr}\left(\left((P^{t+1})^{T}X - Y^{t+1}\right)^{T}\left((P^{t+1})^{T}X - Y^{t+1}\right)\right) + \beta\left(\alpha_{1}\sum_{i=1}^{k}(p_{i}^{t+1})^{T}U_{i}^{t}p_{i}^{t+1} + \alpha_{2}\,\mathrm{tr}\left((P^{t+1})^{T}\tilde{U}^{t}P^{t+1}\right)\right) \leq \mathrm{tr}\left(Y^{t}L(Y^{t})^{T}\right) + \beta\,\mathrm{tr}\left(\left((P^{t})^{T}X - Y^{t}\right)^{T}\left((P^{t})^{T}X - Y^{t}\right)\right) + \beta\left(\alpha_{1}\sum_{i=1}^{k}(p_{i}^{t})^{T}U_{i}^{t}p_{i}^{t} + \alpha_{2}\,\mathrm{tr}\left((P^{t})^{T}\tilde{U}^{t}P^{t}\right)\right)$

that is,

$\mathrm{tr}\left(Y^{t+1}L(Y^{t+1})^{T}\right) + \beta\,\mathrm{tr}\left(\left((P^{t+1})^{T}X - Y^{t+1}\right)^{T}\left((P^{t+1})^{T}X - Y^{t+1}\right)\right) + \alpha_{1}\beta\sum_{i=1}^{m}\sum_{j=1}^{k}\left(|p_{ij}^{t+1}| + \frac{(p_{ij}^{t+1})^{2}}{2|p_{ij}^{t}|} - |p_{ij}^{t+1}|\right) + \alpha_{2}\beta\sum_{c=1}^{m}\left(\|p_{c}^{t+1}\|_{2} + \frac{\|p_{c}^{t+1}\|_{2}^{2}}{2\|p_{c}^{t}\|_{2}} - \|p_{c}^{t+1}\|_{2}\right) \leq \mathrm{tr}\left(Y^{t}L(Y^{t})^{T}\right) + \beta\,\mathrm{tr}\left(\left((P^{t})^{T}X - Y^{t}\right)^{T}\left((P^{t})^{T}X - Y^{t}\right)\right) + \alpha_{1}\beta\sum_{i=1}^{m}\sum_{j=1}^{k}\left(|p_{ij}^{t}| + \frac{(p_{ij}^{t})^{2}}{2|p_{ij}^{t}|} - |p_{ij}^{t}|\right) + \alpha_{2}\beta\sum_{c=1}^{m}\left(\|p_{c}^{t}\|_{2} + \frac{\|p_{c}^{t}\|_{2}^{2}}{2\|p_{c}^{t}\|_{2}} - \|p_{c}^{t}\|_{2}\right)$

and hence

$\mathrm{tr}\left(Y^{t+1}L(Y^{t+1})^{T}\right) + \beta\,\mathrm{tr}\left(\left((P^{t+1})^{T}X - Y^{t+1}\right)^{T}\left((P^{t+1})^{T}X - Y^{t+1}\right)\right) + \beta\left(\alpha_{1}\sum_{i=1}^{m}\sum_{j=1}^{k}|p_{ij}^{t+1}| + \alpha_{2}\sum_{c=1}^{m}\|p_{c}^{t+1}\|_{2}\right) \leq \mathrm{tr}\left(Y^{t}L(Y^{t})^{T}\right) + \beta\,\mathrm{tr}\left(\left((P^{t})^{T}X - Y^{t}\right)^{T}\left((P^{t})^{T}X - Y^{t}\right)\right) + \beta\left(\alpha_{1}\sum_{i=1}^{m}\sum_{j=1}^{k}|p_{ij}^{t}| + \alpha_{2}\sum_{c=1}^{m}\|p_{c}^{t}\|_{2}\right)$
According to Lemma 1, the last step is established. Thus, this algorithm monotonically decreases the objective value in each iteration. □

References

  1. Church, G.M.; Gilbert, W. Genomic sequencing. Proc. Natl. Acad. Sci. USA 1984, 81, 1991–1995. [Google Scholar] [CrossRef] [PubMed]
  2. Liao, J.C.; Boscolo, R.; Yang, Y.-L.; Tran, L.M.; Sabatti, C.; Roychowdhury, V.P. Network component analysis: Reconstruction of regulatory signals in biological systems. Proc. Natl. Acad. Sci. USA 2003, 100, 15522–15527. [Google Scholar] [CrossRef] [PubMed]
  3. Constantinopoulos, C.; Titsias, M.K.; Likas, A. Bayesian feature and model selection for gaussian mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1013–1018. [Google Scholar] [CrossRef] [PubMed]
  4. Nie, F.; Zeng, Z.; Tsang, I.W.; Xu, D.; Zhang, C. Spectral embedded clustering: A framework for in-sample and out-of-sample spectral clustering. IEEE Trans. Neural Netw. 2011, 22, 1796–1808. [Google Scholar] [PubMed]
  5. Hou, C.; Nie, F.; Yi, D.; Wu, Y. Feature selection via joint embedding learning and sparse regression. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2011), Barcelona, Spain, 16–22 July 2011; pp. 1324–1329. [Google Scholar]
  6. D’Addabbo, A.; Papale, M.; Di Paolo, S.; Magaldi, S.; Colella, R.; d’Onofrio, V.; Di Palma, A.; Ranieri, E.; Gesualdo, L.; Ancona, N. Svd based feature selection and sample classification of proteomic data. In Knowledge-Based Intelligent Information and Engineering Systems; Springer: Berlin/Heidelberg, Germany, 2008; pp. 556–563. [Google Scholar]
  7. Cai, D.; He, X.; Han, J. Spectral regression for efficient regularized subspace learning. Proceedings 2007, 149, 1–8. [Google Scholar]
  8. Zhao, Z.; Wang, L.; Liu, H. Efficient spectral feature selection with minimum redundancy. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10), Atlanta, GA, USA, 11–15 July 2010; pp. 673–678. [Google Scholar]
  9. Cai, D.; Zhang, C.; He, X. Unsupervised feature selection for multi-cluster data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 25–28 July 2010; pp. 333–342. [Google Scholar]
  10. Tibshirani, R.J. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  11. Wang, H.; Nie, F.; Huang, H.; Risacher, S.; Ding, C.; Saykin, A.J.; Shen, L. Sparse multi-task regression and feature selection to identify brain imaging predictors for memory performance. In Proceedings of the International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 557–562. [Google Scholar]
  12. Zhao, Q.; Meng, D.; Xu, Z. A recursive divide-and-conquer approach for sparse principal component analysis. arXiv, 2012; arXiv:1211.7219. [Google Scholar]
  13. Brunet, J.P.; Tamayo, P.; Golub, T.R.; Mesirov, J.P. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. USA 2004, 101, 4164–4169. [Google Scholar] [CrossRef] [PubMed]
  14. Alon, U.; Barkai, N.; Notterman, D.A.; Gish, K.; Ybarra, S.; Mack, D.; Levine, A.J. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 1999, 96, 6745–6750. [Google Scholar] [CrossRef] [PubMed]
  15. Cai, D.; He, X.; Han, J.; Huang, T.S. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1548–1560. [Google Scholar] [PubMed]
  16. Shang, R.; Wang, W.; Stolkin, R.; Jiao, L. Non-negative spectral learning and sparse regression-based dual-graph regularized feature selection. IEEE Trans. Cybern. 2017, 1–14. [Google Scholar] [CrossRef] [PubMed]
  17. Wu, M.; Schölkopf, B. A local learning approach for clustering. In Proceedings of the International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4–7 December 2006; pp. 1529–1536. [Google Scholar]
  18. Aruffo, A.; Seed, B. Molecular cloning of two cd7 (T-cell leukemia antigen) cdnas by a cos cell expression system. EMBO J. 1987, 6, 3313–3316. [Google Scholar] [CrossRef] [PubMed]
  19. Liu, T.Y.; Chen, C.Y.; Tien, H.F.; Lin, C.W. Loss of cd7, independent of galectin-3 expression, implies a worse prognosis in adult T-cell leukaemia/lymphoma. Histopathology 2009, 54, 214–220. [Google Scholar] [CrossRef] [PubMed]
  20. Lahortiga, I.; De Keersmaecker, K.; Van Vlierberghe, P.; Graux, C.; Cauwelier, B.; Lambert, F.; Mentens, N.; Beverloo, H.B.; Pieters, R.; Speleman, F. Duplication of the myb oncogene in t cell acute lymphoblastic leukemia. Nat. Genet. 2007, 39, 593–595. [Google Scholar] [CrossRef] [PubMed]
  21. Guo, C.; Liu, S.; Wang, J.; Sun, M.-Z.; Greenaway, F.T. Actb in cancer. Clin. Chim. Acta 2013, 417, 39–44. [Google Scholar] [CrossRef] [PubMed]
  22. Andersen, C.L.; Jensen, J.L.; Ørntoft, T.F. Normalization of real-time quantitative reverse transcription-pcr data: A model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 2004, 64, 5245–5250. [Google Scholar] [CrossRef] [PubMed]
  23. Nowakowska, M.; Pospiech, K.; Lewandowska, U.; Piastowska-Ciesielska, A.W.; Bednarek, A.K. Diverse effect of wwox overexpression in ht29 and sw480 colon cancer cell lines. Tumor Biol. 2014, 35, 9291–9301. [Google Scholar] [CrossRef] [PubMed]
  24. Dahlberg, P.S.; Jacobson, B.A.; Dahal, G.; Fink, J.M.; Kratzke, R.A.; Maddaus, M.A.; Ferrin, L.J. Erbb2 amplifications in esophageal adenocarcinoma. Ann. Thorac. Surg. 2004, 78, 1790–1800. [Google Scholar] [CrossRef] [PubMed]
  25. Bolling, M.; Lemmink, H.; Jansen, G.; Jonkman, M. Mutations in krt5 and krt14 cause epidermolysis bullosa simplex in 75% of the patients. Br. J. Dermatol. 2011, 164, 637–644. [Google Scholar] [CrossRef] [PubMed]
  26. Xu, J.; Liu, H. Web user clustering analysis based on kmeans algorithm. In Proceedings of the International Conference on Information NETWORKING and Automation, Kunming, China, 18–19 October 2010. [Google Scholar]
  27. Zhou, D.; Huang, J.; Schölkopf, B. Learning with hypergraphs: Clustering, classification, and embedding. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 3–6 December 2007; pp. 1601–1608. [Google Scholar]
  28. Nie, F.; Huang, H.; Cai, X.; Ding, C.H. Efficient and robust feature selection via joint ℓ2, 1-norms minimization. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; pp. 1813–1821. [Google Scholar]
  29. Hou, C.; Nie, F.; Li, X.; Yi, D.; Wu, Y. Joint embedding learning and sparse regression: A framework for unsupervised feature selection. IEEE Trans. Cybern. 2014, 44, 793–804. [Google Scholar] [PubMed]
Figure 1. The diagrammatic sketch of different norms.
Table 1. Details of the three datasets.
Datasets | Genes | Samples | Classes | Description
ALL_AML | 5000 | 38 | 3 | acute lymphoblastic leukemia and acute myelogenous leukemia
colon | 2000 | 62 | 2 | colon cancer
ESCA | 20,502 | 192 | 2 | esophageal carcinoma
Table 2. The p-values of different methods applied to the ALL_AML dataset.
ID | LJELSR | JELSR | ReDac | SMART
GO:0006955 | 2.249 × 10−20 | 3.157 × 10−18 | 8.447 × 10−14 | 1.900 × 10−10
GO:0050776 | 8.768 × 10−18 | 1.312 × 10−15 | 7.228 × 10−12 | 1.000 × 10−9
GO:0045321 | 1.306 × 10−16 | 1.808 × 10−14 | 1.154 × 10−11 | 2.500 × 10−10
GO:0001775 | 1.548 × 10−15 | 1.604 × 10−13 | 6.621 × 10−11 | 1.590 × 10−10
GO:0051251 | 2.005 × 10−15 | 4.157 × 10−14 | 9.117 × 10−12 | 4.380 × 10−12
GO:0007159 | 2.098 × 10−15 | 2.754 × 10−15 | 3.712 × 10−11 | 1.510 × 10−10
GO:0002682 | 2.477 × 10−15 | 2.725 × 10−14 | 7.817 × 10−12 | 5.870 × 10−8
GO:0046649 | 3.964 × 10−15 | 5.426 × 10−14 | 3.164 × 10−10 | 1.540 × 10−11
GO:0016337 | 7.034 × 10−15 | 8.993 × 10−14 | 5.253 × 10−11 | 1.940 × 10−11
GO:0070486 | 7.220 × 10−15 | 9.350 × 10−15 | 1.299 × 10−11 | 5.400 × 10−10
Table 3. The top five genes selected by LJELSR for the ALL_AML dataset.
Gene | Gene Official Name | Related Diseases
CD34 | CD34 Molecule | Dermatofibrosarcoma Protuberans and Hypercalcemic Type Ovarian Small Cell Carcinoma
CD7 | CD7 Molecule | Pityriasis Lichenoides Et Varioliformis Acuta and T-Cell Leukemia
MYB | MYB Proto-Oncogene, Transcription Factor | Acute Basophilic Leukemia and Angiocentric Glioma
CXCR4 | C-X-C Motif Chemokine Receptor 4 | Whim Syndrome and Human Immunodeficiency Virus Infectious Disease
CTSG | Cathepsin G | Papillon-Lefevre Syndrome and Cutaneous Mastocytosis
Table 4. The p-values of different methods for the colon dataset.
ID | LJELSR | JELSR | ReDac | SMART
GO:0006614 | 1.612 × 10−17 | 9.836 × 10−14 | 2.677 × 10−14 | 1.016 × 10−12
GO:0006613 | 3.970 × 10−17 | 2.060 × 10−13 | 5.617 × 10−14 | 1.970 × 10−12
GO:0045047 | 5.074 × 10−17 | 2.519 × 10−13 | 6.872 × 10−14 | 2.360 × 10−12
GO:0072599 | 8.161 × 10−17 | 3.720 × 10−13 | 1.016 × 10−13 | 3.348 × 10−12
GO:0022626 | 3.104 × 10−16 | 8.999 × 10−13 | 2.465 × 10−13 | 1.125 × 10−11
GO:0000184 | 3.751 × 10−16 | 1.301 × 10−12 | 3.568 × 10−13 | 1.029 × 10−11
GO:0003735 | 5.428 × 10−16 | 6.787 × 10−13 | 1.412 × 10−13 | 3.310 × 10−12
GO:0070972 | 6.180 × 10−16 | 1.960 × 10−12 | 5.384 × 10−13 | 1.488 × 10−11
GO:0019083 | 9.571 × 10−16 | 1.932 × 10−12 | 1.547 × 10−11 | 3.005 × 10−10
GO:0044445 | 1.372 × 10−15 | 1.487 × 10−12 | 3.106 × 10−13 | 8.772 × 10−12
Table 5. The top five genes selected by LJELSR for the colon dataset.
Gene | Gene Official Name | Related Diseases
MUC3A | Mucin 3A, Cell Surface Associated | Cap Polyposis and Hypertrichotic Osteochondrodysplasia
ACTB | Actin Beta | Dystonia, Juvenile-Onset and Baraitser-Winter Syndrome 1
WWOX | WW Domain Containing Oxidoreductase | Spinocerebellar Ataxia, Autosomal Recessive 12 and Epileptic Encephalopathy, Early Infantile, 28
SPI1 | Spi-1 Proto-Oncogene | Inflammatory Diarrhea and Interdigitating Dendritic Cell Sarcoma
RPS24 | Ribosomal Protein S24 | Diamond-Blackfan Anemia 3 and Diamond-Blackfan Anemia
Table 6. The p-values of different methods for the ESCA dataset.
ID | LJELSR | JELSR | ReDac | SMART
GO:0005198 | 2.772 × 10−27 | 8.941 × 10−18 | 1.096 × 10−18 | 1.036 × 10−4
GO:0070161 | 7.504 × 10−21 | 3.347 × 10−14 | 2.733 × 10−23 | 6.527 × 10−14
GO:0030055 | 1.400 × 10−19 | 1.111 × 10−10 | 3.576 × 10−17 | 1.619 × 10−12
GO:0005912 | 6.655 × 10−19 | 1.946 × 10−11 | 4.401 × 10−20 | 3.498 × 10−12
GO:0005925 | 1.319 × 10−18 | 7.671 × 10−11 | 2.103 × 10−17 | 1.061 × 10−12
GO:0005924 | 1.746 × 10−18 | 9.246 × 10−11 | 2.747 × 10−17 | 1.313 × 10−12
GO:0005615 | 5.538 × 10−16 | 8.395 × 10−20 | 3.985 × 10−15 | 2.117 × 10−17
GO:0030054 | 5.243 × 10−14 | 4.156 × 10−10 | 5.243 × 10−14 | 4.510 × 10−9
GO:0005200 | 3.076 × 10−13 | 1.301 × 10−10 | 2.062 × 10−13 | 2.358 × 10−4
GO:0042060 | 7.442 × 10−13 | 2.650 × 10−9 | 4.536 × 10−12 | 8.790 × 10−11
Table 7. The top five genes selected by LJELSR for the ESCA dataset.
Gene | Gene Official Name | Related Diseases
ERBB2 | Erb-B2 Receptor Tyrosine Kinase 2 | Glioma Susceptibility 1 and Ovarian Cancer, Somatic
KRT14 | Keratin 14 | Epidermolysis Bullosa Simplex, Koebner Type and Epidermolysis Bullosa Simplex, Recessive 1
KRT5 | Keratin 5 | Epidermolysis Bullosa Simplex, Dowling-Meara Type and Epidermolysis Bullosa Simplex, Weber-Cockayne Type
KRT19 | Keratin 19 | Anal Canal Adenocarcinoma and Thyroid Cancer
KRT4 | Keratin 4 | White Sponge Nevus 1 and White Sponge Nevus Of Cannon, Krt4-Related
Table 8. Differentially expressed genes of four methods for the ESCA dataset.
Methods | Differentially Expressed Genes | Number
LJELSR | ERBB2, KRT14, KRT5, KRT19, KRT4, TFF1, FSCN1, KRT13, ITGB4, ANXA1, MUC6, LAMC2, HLA-B, KRT16, JUP, KRT17, LAMB3, ATP4A, DSP, LAMA3, FOS, FN1, CTSB, MYH11, HLA-C, LDHA, PKM, SLC7A5, PSCA, SERPINA1, S100A7, S100A9, CRNN, S100A8, B2M, DMBT1, CD24, ENO1, TNC, KRT15, GLUL, HSPA1A, NDRG1, LCN2, COL17A1, CEACAM6, REG1A, PLEC, GAPDH, PIGR, AGR2, ANPEP, PKP1, ACTB, FLNA, PI3, FTL, CTSE, PABPC1, PGC, ALDOA, EEF2, KRT6B, LYZ, CLDN18, SPRR1B, KRT6C, PPP1R1B, PGA3, COL3A1, C3, REG1B, PERP, KRT6A, PGA5, CES1, PGA4, EIF1 | 78
JELSR | MUC1, KRT14, KRT5, KRT19, KRT8, KRT4, TFF1, FN1, MUC6, S100A8, CTSD, SPRR3, ATP4A, HSPB1, FOS, CEACAM5, GJB2, H19, EZR, KRT16, KRT13, CTSB, JUP, ANXA1, TFF2, S100A9, SFN, KRT17, PSCA, S100A2, MUC5B, COL1A1, CD24, PKM, DSC3, GLUL, MALAT1, REG1A, ACTN4, DSG3, LCN2, DSP, S100A7, B2M, MYH9, PKP1, S100A11, PIGR, HSPA8, PLEC, TRIM29, EEF1A1, ATP1B1, AGR2, LYZ, ACTB, PGC, PABPC1, SPRR2A, SPRR1B, SPRR1A, CA2, REG4, P4HB, CLDN18, CTSE, EEF2, CREB3L1, KRT6A, A2M, PGA3 | 71
ReDac | KRT14, KRT5, KRT4, FN1, CRNN, TGM3, MUC6, S100A8, IL1RN, SPRR3, ATP4A, HSPB1, MAL, KRT16, KRT13, JUP, THBS1, ANXA1, S100A9, PSCA, ECM1, CD24, PKM, FTL, HSPG2, DES, GLUL, MALAT1, PPL, EMP1, ACTN4, MYH11, CSTA, GAPDH, TAGLN, DSP, B2M, MYH9, GSN, PKP1, S100A11, PIGR, HSPA8, FLNA, PLEC, MYLK, CSTB, TRIM29, EEF1A1, RPL3, LYZ, PSAP, ACTB, ALDOA, PGC, PABPC1, SYNM, SPRR2A, SPRR1B, SPRR1A, REG4, P4HB, EEF2, KRT6A, ACTG2, PGA3 | 66
SMART | ERBB2, CCND1, GSTP1, CD44, KRT19, MUC4, MUC2, GRB7, TFF1, HLA-A, FN1, SOD2, ITGA6, NDRG1, SPP1, SERPINA1, MUC6, CTSD, HSPB1, CEACAM5, H19, CTSB, F5, ITGB1, ANXA1, SDC1, DMBT1, CLU, LDHA, CD24, APP, PKM, FTL, HSPG2, TNC, GLUL, MALAT1, NTS, LCN2, MYH9, PIGR, FLNA, CD55, PLEC, TSPAN8, EEF1A1, AGR2, LYZ, GPX2, ACTB, DSG2, PABPC1, SPRR1B, REG4, SCD, CLDN18, FAT1 | 57
Table 9. ACC of different methods for different datasets.
Methods | ALL_AML | Colon | ESCA
LJELSR | 81.579 | 64.520 | 96.354
JELSR | 81.579 | 61.290 | 95.833
ReDac | 68.421 | 61.290 | 95.313
SMART | 44.740 | 63.980 | 94.790
Kmeans | 78.530 | 53.420 | 96.350
