# Independent Component Analysis for Unraveling the Complexity of Cancer Omics Datasets


## Abstract


## 1. Introduction

## 2. Methodology of ICA Application to Cancer Omics Data

#### 2.1. Brief Introduction into Matrix Factorization Applied to Omics Data

[…] $\mathbf{a}_k$ and $\mathbf{s}_k$ vectors gives a one-rank matrix of the same dimension as $\mathbf{X}$):

[…] $\mathbf{a}_k$ and $\mathbf{s}_k$ such that:

[…] $\mathbf{a}_k$ and $\mathbf{s}_k$ will be called a component throughout this review. Therefore, a component is represented by a vector $\mathbf{s}_k$ of size m containing weights of omics variables (genes, proteins, CpG sites, etc.). At the same time, a component is associated with a vector $\mathbf{a}_k$ of size N, containing the contributions of the component to the measured samples. We will use this notation and these meanings of the $\mathbf{a}_k$ and $\mathbf{s}_k$ vectors throughout the review.
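Using only the definitions given here ($\mathbf{a}_k$ of size N, $\mathbf{s}_k$ of size m, and p components), a rank-p factorization can be sketched in symbols as follows. The least-squares (Frobenius-norm) objective is an assumption made for illustration and is not necessarily the exact form of the problem the text labels (**).

$$
\mathbf{X} \approx \sum_{k=1}^{p} \mathbf{a}_k \mathbf{s}_k^{T},
\qquad
\mathbf{X} \in \mathbb{R}^{N \times m},\;
\mathbf{a}_k \in \mathbb{R}^{N},\;
\mathbf{s}_k \in \mathbb{R}^{m},
\qquad
\Big\lVert \mathbf{X} - \sum_{k=1}^{p} \mathbf{a}_k \mathbf{s}_k^{T} \Big\rVert_F^{2} \to \min .
$$

Each term $\mathbf{a}_k \mathbf{s}_k^{T}$ is an N × m matrix of rank one, which is why the sum has the same dimensions as $\mathbf{X}$.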

[…] $\mathbf{a}_k$ and $\mathbf{s}_k$. For example, the terms “loadings”, “activations”, “factor strength” or “sample-associated weights” have been used to denote the elements of $\mathbf{a}_k$ vectors. The matrix composed of the $\mathbf{a}_k$ vectors is sometimes called the “mixing matrix” and denoted as $\mathbf{A}$. The elements of $\mathbf{s}_k$ vectors have been called “weights of the component” or “signals” and the matrix composed of them (denoted as $\mathbf{S}$) is sometimes called the “signal matrix”. Moreover, $\mathbf{s}_k$ vectors themselves are frequently referred to as “components” or “factors”.

[…] $\mathbf{s}_k$ vector is frequently named a metagene [16]; in the case of other data types one can use similar naming, e.g., a metaCpG for the analysis of DNA methylation profiles. Hereafter, we will use the term metagene (or metagene weights for the individual elements) to refer to the vector $\mathbf{s}_k$ even when describing the application of ICA to other data types. Similarly, the $\mathbf{a}_k$ vectors are sometimes called metasamples, and we will adopt this term in the text (referring to the individual vector elements as metasample weights); see Figure 1a.

[…] $\mathbf{X}$ matrix is known; the $\mathbf{a}_k$ and $\mathbf{s}_k$ vectors are unknown. As such, the problem of matrix factorization (**) is heavily underdetermined, and additional constraints need to be introduced on the $\mathbf{a}_k$ and $\mathbf{s}_k$ vectors in order to find its solution. First of all, it can be required that all $\mathbf{a}_k$ vectors have length one.

[…] $\mathbf{a}_k$ vectors: ($\mathbf{a}_i$, $\mathbf{a}_j$) = 0, for i ≠ j, and that the solution of (**) should give the same result for different orders of matrix decomposition p, i.e., the $\mathbf{a}_k$ and $\mathbf{s}_k$ vectors computed for the order p = p’ would be the same as for the decomposition of order p’’ > p’. In this case, solving (**) is equivalent to computing the singular value decomposition (SVD) of $\mathbf{X}$ and gives a set of principal components. There exist several ways to introduce PCA, as reviewed in Reference [17].
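The two properties named above (mutually orthogonal unit-length $\mathbf{a}_k$ and nested solutions across decomposition orders) can be checked numerically with a truncated SVD. This is a minimal sketch on random toy data; the helper name `pca_components`, the matrix sizes, and the choice to put the singular values into the $\mathbf{s}_k$ vectors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 20, 100                      # samples x omics variables (toy sizes)
X = rng.normal(size=(N, m))
X = X - X.mean(axis=0)              # center each variable before PCA

# Thin SVD: X = U @ diag(sigma) @ Vt
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

def pca_components(p):
    """Return the first p (a_k, s_k) pairs, with unit-length a_k as columns of A."""
    A = U[:, :p]                        # columns a_k are orthonormal
    S = sigma[:p, None] * Vt[:p, :]     # rows s_k carry the scale
    return A, S

A3, S3 = pca_components(3)
A5, S5 = pca_components(5)

# Lower-order solutions are contained in higher-order ones
nested = np.allclose(A3, A5[:, :3]) and np.allclose(S3, S5[:3])
# (a_i, a_j) = 0 for i != j, and each a_k has length one
orthonormal = np.allclose(A5.T @ A5, np.eye(5))
# Residual of the order-5 approximation
rank5_error = np.linalg.norm(X - A5 @ S5)
```

Because the components are ordered by singular value, the order-5 residual equals the norm of the discarded singular values, which is the sense in which PCA components are naturally ranked.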

[…] $\mathbf{a}_k$ and $\mathbf{s}_k$ vectors would be non-negative. This constrains the problem (**) and leads to NMF. The simplest approach to solving (**) with these constraints is to repeatedly apply non-negative least squares regression, considering $\mathbf{a}_k$ as unknown at one iteration and $\mathbf{s}_k$ as unknown at the next iteration, until convergence to a local minimum.
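The alternating scheme described above can be sketched with `scipy.optimize.nnls`, which solves one non-negative least squares problem at a time. The function name `nmf_alternating_nnls`, the random initialization, and the fixed iteration count (used here instead of a convergence test) are assumptions made for this illustration; it is not the implementation of any particular NMF package.

```python
import numpy as np
from scipy.optimize import nnls

def nmf_alternating_nnls(X, p, n_iter=50, seed=0):
    """Approximate a non-negative X (N x m) as A @ S with A >= 0 and S >= 0."""
    rng = np.random.default_rng(seed)
    N, m = X.shape
    A = rng.random((N, p))          # columns play the role of a_k
    S = rng.random((p, m))          # rows play the role of s_k
    errors = []
    for _ in range(n_iter):
        # A fixed: solve min ||A s - x_col|| with s >= 0 for each column of X
        for j in range(m):
            S[:, j], _ = nnls(A, X[:, j])
        # S fixed: solve min ||S^T a - x_row|| with a >= 0 for each row of X
        for i in range(N):
            A[i, :], _ = nnls(S.T, X[i, :])
        errors.append(np.linalg.norm(X - A @ S))
    return A, S, errors

rng = np.random.default_rng(1)
X = rng.random((15, 3)) @ rng.random((3, 40))   # non-negative toy matrix of rank 3
A, S, errors = nmf_alternating_nnls(X, p=3)
```

Each half-step solves its subproblem exactly, so the reconstruction error cannot increase between iterations; the result still depends on the initialization, which is why only a local minimum is reached.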

[…] $\mathbf{s}_k$ (or sometimes, vectors $\mathbf{a}_k$) have to represent maximally mutually independent distributions, for different k. Perfect independence would mean that the joint probability distribution $P(s_1,\dots,s_p)$ can be factorized as $P(s_1,\dots,s_p) = P_1(s_1) \times P_2(s_2) \dots \times P_p(s_p)$. Here, one assumes that the elements of vectors $\mathbf{s}_k$ are i.i.d. samples of the underlying probability distributions $P_k(s_k)$.

[…] $\mathbf{a}_k$ vectors and the components cannot be naturally ranked. The NMF components contain only non-negative elements, which makes the intuitive picture of the additive action of metagenes simpler to interpret, while in PCA and ICA some metagenes can cancel the action of other metagenes if they are summed up with different signs.

#### 2.2. ICA Algorithms

[…] $\mathbf{s}_k$ vectors). It can be shown that maximizing the entropy of joint distributions of pairs of $\mathbf{s}_k$ leads to minimizing their mutual information.

[…] $\mathbf{s}_k$ distributions [20]. Quantification of non-Gaussianity for continuous distributions involves negentropy (or Gibbs free energy, in physics). Negentropy measures the departure from Gaussianity of a random vector of density $P(u)$ by comparing its entropy to the entropy of a normal distribution with the same mean and variance. The entropy is defined with a negative sign ($S=-\int P(u)\log P(u)\,du$) and the negentropy is, therefore, a non-negative function reaching zero only for a normal distribution. For the mathematical details, we refer the reader to the classical works [19,20].
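Written out, the comparison described above takes the following form. The symbol $J$ and the subscript "gauss" are notational assumptions introduced here for illustration; $S$ is the entropy as defined in the text.

$$
J(u) = S(u_{\mathrm{gauss}}) - S(u) \;\ge\; 0,
\qquad
S(u) = -\int P(u)\,\log P(u)\,du ,
$$

where $u_{\mathrm{gauss}}$ denotes a Gaussian variable with the same mean and variance as $u$. The inequality holds because, among all distributions with a given variance, the Gaussian has the largest entropy.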

[…] $\mathbf{s}_k$ or $\mathbf{a}_k$ vectors is always finite in real-life applications, one needs a way to approximate it effectively from finite samples. For this purpose, various surrogate functions (called non-linearity functions) have been proposed, one of the most popular of which involves the kurtosis. Empirically, kurtosis was found to be an appropriate choice of non-linearity in the analysis of transcriptomic data. Other types of non-linearity functions have been suggested; however, the appropriate choice of non-linearity for applying ICA to different kinds of omics measurements remains an open question. The two most popular ICA algorithms based on non-Gaussianity maximization are fastICA [20] and joint approximate diagonalization of eigen-matrices (JADE) [21]. Most of the recent applications of ICA to omics data were based on fastICA, which uses approximate Newton iterations to optimize a non-Gaussianity measure. However, other approaches to computing independent components have been used, such as the product density estimation-based method (ProDenICA), claimed to have higher sensitivity to a wider range of source distributions than fastICA [22,23].
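To make the role of kurtosis as a non-Gaussianity measure concrete, the sketch below estimates the excess kurtosis of three simulated samples. The function name `excess_kurtosis` and the choice of test distributions are assumptions for illustration; this is only the surrogate measure itself, not the fastICA or JADE optimization.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: mean(z^4) - 3 for standardized z (0 for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)
n = 100_000
k_gauss = excess_kurtosis(rng.normal(size=n))      # expected to be close to 0
k_laplace = excess_kurtosis(rng.laplace(size=n))   # heavy tails, theoretical value 3
k_uniform = excess_kurtosis(rng.uniform(size=n))   # light tails, theoretical value -1.2
```

A metagene dominated by a few genes with very large weights behaves like the heavy-tailed case, so maximizing the absolute kurtosis favors such sparse, interpretable weight distributions.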

[…] $\mathbf{a}_k$ and $\mathbf{s}_k$ vector elements. This allows combining the useful properties of the non-negative mixture problem with the requirement for mutual independence of the components. A kernel version of ICA has been developed [25] and sparse ICA was proposed in Reference [26], but neither has yet found wide application in omics data analysis (though kernel ICA was exploited in Reference [27]). Finally, tensorial ICA was developed in References [28,29] and recently applied to the joint analysis of gene expression, copy number changes, and DNA methylation data from colon cancer with some promising results (see more in Section 3.5).

#### 2.3. Various Ways to Apply ICA to Omics Data

[…] $\mathbf{s}_k$) or metasamples (vectors $\mathbf{a}_k$). Technically, the first case corresponds to the application of the ICA algorithm to the initial matrix $\mathbf{X}$ containing samples as rows and omics variables as columns, and the second case corresponds to the application of ICA to the transposed matrix $\mathbf{X}^{T}$. Surprisingly, both ways of applying ICA to omics data are widespread, and it sometimes takes some effort to figure out in which way ICA was applied. Some studies aim at maximizing the non-Gaussianity of metagenes [2,33,34,35], while others maximize the non-Gaussianity of metasamples [36,37]. Empirically, it was shown that maximizing the non-Gaussianity of metagenes is clearly preferable in gene expression analyses to maximizing the non-Gaussianity of metasamples [38]. This choice leads to much better reproducibility of metagenes in independent datasets as well as to better interpretability of the components computed within the same dataset.

[…] $\mathbf{a}_k$ and $\mathbf{s}_k$ can be inverted simultaneously without changing the definition of the component. Some methods (such as BIODICA or DeconICA) avoid this ambiguity by assuming that the heaviest tail of the $\mathbf{s}_k$ distribution should correspond to positive values, which usually gives satisfactory results. In Reference [43], each ICA component was characterized by two sets of top contributing genes, from the negative and the positive side of the metagene weight distribution. The largest such set was called a dominating module and the final orientation of the component was chosen to make the weights of the dominating module positive. In other cases, labeling of samples can be used in order to select one of the two possible signs of $\mathbf{a}_k$ and $\mathbf{s}_k$. In this case, the orientation is chosen based on the values of the $\mathbf{a}_k$ vectors. For example, in a disease study, one can require that any component be oriented towards aggravation of the disease condition (e.g., from normal samples to more aggressive cancer stages). This approach was recently used for quantifying disease comorbidity using ICA [44].
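The heaviest-tail convention can be sketched as a small helper that flips both vectors together. The function name `orient_component` and the use of the third central moment as the measure of tail asymmetry are assumptions made for illustration; they are not the rules implemented in BIODICA or DeconICA.

```python
import numpy as np

def orient_component(a, s):
    """Flip the signs of a_k and s_k together so the heavier tail of s_k is positive."""
    a = np.asarray(a, dtype=float)
    s = np.asarray(s, dtype=float)
    centered = s - s.mean()
    # Third central moment as a simple proxy for which tail is heavier
    # (an illustrative assumption, not a package-specific rule).
    if np.mean(centered ** 3) < 0:
        return -a, -s
    return a, s

a = np.array([0.5, -0.2, 0.1])                    # metasample weights
s = np.array([0.1, -0.2, 0.0, 0.3, -6.0, 0.2])    # one extreme negative metagene weight
a_new, s_new = orient_component(a, s)
```

Because both vectors change sign at once, the rank-one matrix formed by their outer product is unchanged, which is exactly why the sign of a component is arbitrary in the first place.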

#### 2.4. Assessment and Comparison with Other Matrix Factorization Methods

#### 2.5. Estimating the Number of Independent Components

[…] $\mathbf{s}_k$ distributions are characterized by the presence of one or a few weights with exceptionally large values, separated by a significant gap from the other values. Secondly, it was shown that a certain level of over-decomposition of transcriptomic datasets, i.e., choosing the number of independent components several times larger than the MSTD, does not drastically change the definition of most of the components within the MSTD range. At the same time, it was observed that increasing the number of independent components over the MSTD value sometimes leads to biologically meaningful splitting of the components. For example, a component within the MSTD range which was associated with the total level of immune infiltrate in the tumoral microenvironment splits, in higher-order decompositions, into three components which can be associated with the presence of T-cells, B-cells, and macrophages [34,42].

#### 2.6. Methods for Interpretation of Independent Components

[…] $\mathbf{s}_k$ vectors (e.g., applying the hypergeometric test or overrepresentation analysis (Webgestalt 2017) to the set of the most contributing genes, or Gene Set Enrichment Analysis to the whole ranking defined by $\mathbf{s}_k$), using large-scale collections of reference gene sets. The distribution of gene weights from $\mathbf{s}_k$ vectors can be projected on top of genome-wide biological network reconstructions where the network edges represent different types of interactions or regulations between genes and/or proteins. This can be further used for various types of network-based analyses, leading to the determination of biological network “hotspot” areas and eliminating the need for a reference gene set collection [50]. The $\mathbf{s}_k$ vectors (resulting from the analysis of transcriptomic or methylome data) can be projected onto the genome and subjected to peak-calling analysis, which can sometimes lead to associating a component with genomic alterations [33].

[…] $\mathbf{a}_k$ are used to associate components with sample annotations such as clinical data (tumor stage, molecular classification, time label, sample processing data, etc.). Metasamples can also be associated with clinically relevant molecular data, such as mutations in known cancer drivers for a particular cancer type, and with known labels for molecular tumor subtype.

[…] $\mathbf{s}_k$ vector can be projected on cancer-specific biological network maps such as the Atlas of Cancer Signaling Network (ACSN) using user-friendly Google Maps-based online platforms such as NaviCell and MINERVA [51,52]. Functional enrichment analysis results of ICA metagenes can be visualized using maps representing functional redundancy between reference gene sets, such as InfoSigMap or enrichment maps [53].

## 3. Applications of ICA in Cancer Research

#### 3.1. Applications to Data Preprocessing, Classification, Dimensionality Reduction, and Clustering

[…] $\mathbf{s}_k$ distribution is analyzed to determine a set of the most contributing genes (e.g., those characterized by the most extreme absolute values in $\mathbf{s}_k$). The simplest idea is to select the variables (e.g., genes) exceeding a threshold of p standard deviations, with some choice for p (typically, p ≥ 3). A combined set of the genes contributing most to the different ICs can be used to define a subset of data for further analysis.
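The thresholding rule can be sketched as follows. The function name `top_contributing`, the gene names, and the simulated weights are hypothetical; measuring the deviation from the mean of the weights (rather than from zero) is also an assumption of this sketch.

```python
import numpy as np

def top_contributing(s, names, p=3.0):
    """Return names whose weight lies more than p standard deviations from the mean."""
    s = np.asarray(s, dtype=float)
    z = (s - s.mean()) / s.std()
    return [name for name, zi in zip(names, z) if abs(zi) > p]

rng = np.random.default_rng(0)
names = [f"gene_{i}" for i in range(1000)]   # hypothetical gene identifiers
s = rng.normal(scale=0.1, size=1000)         # bulk of near-zero metagene weights
s[10], s[20] = 5.0, -4.0                     # two planted extreme weights
selected = top_contributing(s, names, p=3.0)
```

Taking the absolute value selects genes from both tails; keeping the positive and negative sides as separate sets is a simple variation when the two tails are interpreted differently.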

[…] $\mathbf{a}_k$ vector to the dates of sample preparation. Another component frequently identified in the analysis of transcriptomic data is related to GC-content, which might reflect the influence of GC-content on the RNA amplification step common to both microarray-based and sequencing-based methodologies. In Reference [39], a small dataset of three primary melanoma tumors and two matched controls, characterized at the level of transcriptome and miRNA, was merged with a large reference melanoma dataset from the Cancer Genome Atlas. The ICA decomposition was performed separately for the merged transcriptomic and miRNA data. For both molecular data types, it was possible to identify the independent components capturing technical differences among platforms while focusing the analysis on biologically meaningful factors whose quantification was comparable among platforms.

#### 3.2. ICA for Unraveling Functional Subsystems of a Living Cell or a Cell Ecosystem

[…] $\mathbf{s}_k$ vector (weights associated with the omics variables) or metagene. The level of activation (or inhibition) of an identified functional subsystem $\mathbf{s}_k$ can be read in the corresponding metasample vector $\mathbf{a}_k$. The same is relevant for an independent component associated with a technical factor intensity.

[…] $\mathbf{s}_k$) to the definition of the subsystem, which can be positive or negative. However, variables with close-to-zero weights can be neglected in the subsystem definition. An important characteristic of a metagene is the set of the most contributing genes (see discussion in the previous section). The most contributing genes are useful to characterize the functional subsystem and to identify whether this subsystem corresponds to an existing known one. After determining the sets of the most contributing genes for each metagene (functional subsystem), one can check whether a gene is associated with the subsystem exclusively or contributes to several of them. This analysis can be used to identify potential coupling between the subsystems and their concrete mechanisms (see further discussion). Sometimes it is convenient to distinguish two gene sets per metagene, comprising the largest and the smallest weights, from the positive and the negative sides of the $\mathbf{s}_k$ distribution.

[…] $\mathbf{a}_k$ is the unimodal or bi- or multi-modal character of the distribution. In the case of well-defined bimodality of a metasample, one can stratify the distribution of samples into two groups, with respect to the nature of the functional subsystem identified. A typical example of this kind is the identification of the functional subsystem of proliferation in single-cell RNA-Seq data, where the corresponding metasamples frequently have two modes, corresponding to proliferative and non-proliferative cell states.

[…] $\mathbf{s}_k$ vectors were matched with each other through correlation in order to reveal the functional modules implicated in colon cancer tumor cells and the variability of the tumoral microenvironment.
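Matching metagenes through correlation can be sketched as below for two sets of components defined over the same variables. The function name `match_by_correlation`, the use of Pearson correlation, and the best-match-per-row rule are assumptions for illustration, not the procedure of the cited study.

```python
import numpy as np

def match_by_correlation(S1, S2):
    """For each row (metagene) of S1, find the most correlated row of S2.

    S1 has shape (p1, m) and S2 has shape (p2, m), over the same m variables.
    Returns a list of (index_in_S1, index_in_S2, correlation).
    """
    p1 = S1.shape[0]
    # corrcoef stacks the rows of both inputs; the off-diagonal block
    # holds the cross-correlations between S1 rows and S2 rows.
    C = np.corrcoef(S1, S2)[:p1, p1:]
    best = np.abs(C).argmax(axis=1)      # component signs are arbitrary, so compare |r|
    return [(i, int(j), float(C[i, j])) for i, j in enumerate(best)]

rng = np.random.default_rng(0)
S1 = rng.laplace(size=(3, 500))
# Second set: the same metagenes, reordered, one sign-flipped, with added noise
S2 = np.vstack([-S1[2], S1[0], S1[1]]) + 0.1 * rng.normal(size=(3, 500))
matches = match_by_correlation(S1, S2)
```

Comparing absolute correlations keeps a sign-flipped copy of a metagene matched to its counterpart, and the stored signed value records the relative orientation.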

#### 3.3. Applications to Unsupervised Cell Type Deconvolution

#### 3.4. ICA Applications to Single-Cell Omics Data Analysis

#### 3.5. Multi-Omics ICA Applications in Cancer Research

[…] $\mathbf{a}_k$ vectors (metasamples). Such an approach was recently applied in Reference [39] to a set of melanoma bulk samples, profiled at the level of transcriptome and microRNA expression. Similarly, in a recent study [90], 77 breast and 84 ovarian cancer samples, profiled simultaneously at the transcriptome and proteome level, were analyzed using stabilized ICA, followed by integrating the discovered associations with clinical data and molecular pathways.

[…] $X_{ijk}$ has dimensions “number of samples × number of genes × number of data types”. For example, the $X_{i=4,\,j=5,\,k=2}$ element of the tensor can indicate the DNA methylation level of the promoter of gene 5 in sample 4, and $X_{i=4,\,j=5,\,k=1}$ could indicate the expression of the same gene in the same sample. In the case of tensor factorization, the resulting components are matrices rather than vectors, with dimensions “number of genes versus number of data types” (for metagenes) and “number of data types versus number of samples” (for metasamples). The existence of correlations among different data types within the same matrix-component indicates coupling among several levels of molecular description captured by tensorial ICA.
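The tensor layout and the indexing example above can be sketched with a NumPy array. The toy sizes, the random values, and the ordering of data types (expression first, methylation second) are assumptions chosen to mirror the k = 1 and k = 2 cases in the text.

```python
import numpy as np

n_samples, n_genes, n_types = 10, 8, 2
EXPRESSION, METHYLATION = 0, 1          # hypothetical ordering (k = 1 and k = 2 in the text)

rng = np.random.default_rng(0)
expr = rng.normal(size=(n_samples, n_genes))    # toy expression matrix
meth = rng.uniform(size=(n_samples, n_genes))   # toy promoter methylation matrix

# Stack the per-type matrices along a third axis: samples x genes x data types
X = np.stack([expr, meth], axis=2)

# The text uses 1-based indices; NumPy is 0-based, so sample 4 / gene 5 is [3, 4].
meth_gene5_sample4 = X[3, 4, METHYLATION]
expr_gene5_sample4 = X[3, 4, EXPRESSION]
```

This layout requires every data type to be summarized per gene and measured on the same samples, which is the precondition for stacking the matrices into one tensor.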

#### 3.6. Correlations and Interactions among Functional Subsystems Defined by ICA

[…] $\mathbf{a}_k$ vectors. This analysis confirmed previously identified comorbidity patterns based on the analysis of individual gene expression profiles (related to the role of immune system and mitochondrial metabolism) and suggested new molecular mechanisms of comorbidity between lung cancer and Alzheimer’s disease such as estrogen receptor signaling pathway or the involvement of cadherins.

## 4. Discussion

## Funding

## Conflicts of Interest

## Abbreviations

| Abbreviation | Definition |
|---|---|
| ICA | Independent Component Analysis |
| PCA | Principal Component Analysis |
| NMF | Non-Negative Matrix Factorization |
| MSTD | Maximally Stable Transcriptomic Dimension |
| fMRI | functional Magnetic Resonance Imaging |
| TCGA | The Cancer Genome Atlas |
| BIC | Bayesian Information Criterion |
| FOBI | Fourth-Order Blind Identification |
| SEQC | Sequencing Quality Control consortium |
| t-SNE | t-Distributed Stochastic Neighbor Embedding |

## References

1. Liebermeister, W. Linear modes of gene expression determined by independent component analysis. Bioinformatics **2002**, 18, 51–60.
2. Lee, S.-I.; Batzoglou, S. Application of independent component analysis to microarrays. Genome Biol. **2003**, 4, R76.
3. Saidi, S.A.; Holland, C.M.; Kreil, D.P.; MacKay, D.J.C.; Charnock-Jones, D.S.; Print, C.G.; Smith, S.K. Independent component analysis of microarray data in the study of endometrial cancer. Oncogene **2004**, 23, 6677–6683.
4. Frigyesi, A.; Veerla, S.; Lindgren, D.; Höglund, M. Independent component analysis reveals new and biologically significant structures in micro array data. BMC Bioinform. **2006**, 7, 290.
5. Wang, Q.; Li, Q.; Mi, R.; Ye, H.; Zhang, H.; Chen, B.; Li, Y.; Huang, G.; Xia, J. Radiomics nomogram building from multiparametric MRI to predict grade in patients with glioma: A cohort study. J. Magn. Reson. Imaging **2019**, 49, 825–833.
6. Levine, A.B.; Schlosser, C.; Grewal, J.; Coope, R.; Jones, S.J.M.; Yip, S. Rise of the machines: Advances in deep learning for cancer diagnosis. Trends Cancer **2019**, 5, 157–169.
7. Tandel, G.S.; Biswas, M.G.; Kakde, O.; Tiwari, A.S.; Suri, H.; Turk, M.; Laird, J.R.; Asare, C.K.; Ankrah, A.N.; Khanna, N.; et al. A Review on a deep learning perspective in brain cancer classification. Cancers (Basel) **2019**, 11, 111.
8. Hosny, A.; Parmar, C.; Quackenbush, J.; Schwartz, L.H.; Aerts, H.J.W.L. Artificial intelligence in radiology. Nat. Rev. Cancer **2018**, 18, 500–510.
9. Gao, Z.; Wu, S.; Liu, Z.; Luo, J.; Zhang, H.; Gong, M.; Li, S. Learning the implicit strain reconstruction in ultrasound elastography using privileged information. Med. Image Anal. **2019**, 58, 101534.
10. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature **2017**, 542, 115–118.
11. Ehteshami Bejnordi, B.; Veta, M.; Johannes van Diest, P.; Van Ginneken, B.; Karssemeijer, N.; Litjens, G.; Van der Laak, J.A.W.M.; Hermsen, M.; Manson, Q.F.; Balkenhol, M.; et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA **2017**, 318, 2199–2210.
12. Chaudhary, K.; Poirion, O.B.; Lu, L.; Garmire, L.X. Deep learning–based multi-omics integration robustly predicts survival in liver cancer. Clin. Cancer Res. **2018**, 24, 1248–1259.
13. Schmidt, C. M.D. Anderson breaks with IBM Watson, raising questions about artificial intelligence in oncology. J. Natl. Cancer Inst. **2017**, 109, 5.
14. Gorban, A.N.; Mirkes, E.M.; Tyukin, I.Y. How Deep should be the depth of convolutional neural networks: A backyard dog case study. Cognit. Comput. **2019**, 1–10.
15. Karhunen, J.; Oja, E.; Wang, L.; Vigario, R.; Joutsensalo, J. A class of neural networks for independent component analysis. IEEE Trans. Neural Netw. **1997**, 8, 486–504.
16. Brunet, J.P.; Tamayo, P.; Golub, T.R.; Mesirov, J.P. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. USA **2004**, 101, 4164–4169.
17. Gorban, A.N.; Zinovyev, A.Y. Principal graphs and manifolds. In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques; IGI Global: Hershey, PA, USA, 2008; ISBN 9781605667669.
18. Zinovyev, A.; Kairov, U.; Karpenyuk, T.; Ramanculov, E. Blind source separation methods for deconvolution of complex signals in cancer biology. Biochem. Biophys. Res. Commun. **2013**, 430, 1182–1187.
19. Bell, A.J.; Sejnowski, T.J. An information-maximization approach to blind separation and blind deconvolution. Neural Comput. **1995**, 7, 1129–1159.
20. Hyvärinen, A.; Oja, E. Independent component analysis: Algorithms and applications. Neural Netw. **2000**, 13, 411–430.
21. Cardoso, J.-F. High-order contrasts for independent component analysis. Neural Comput. **1999**, 11, 157–192.
22. Zhou, W.; Altman, R.B. Data-driven human transcriptomic modules determined by independent component analysis. BMC Bioinform. **2018**, 19, 327.
23. Risk, B.B.; Matteson, D.S.; Ruppert, D.; Eloyan, A.; Caffo, B.S. An evaluation of independent component analyses with an application to resting-state fMRI. Biometrics **2014**, 70, 224–236.
24. Krumsiek, J.; Suhre, K.; Illig, T.; Adamski, J.; Theis, F.J. Bayesian Independent Component Analysis Recovers Pathway Signatures from Blood Metabolomics Data. J. Proteome Res. **2012**, 11, 4120–4131.
25. Bach, F.R. Kernel independent component analysis. J. Mach. Learn. Res. **2002**, 3, 1–48.
26. Zibulevsky, M.; Pearlmutter, B.A. Blind source separation by sparse decomposition in a signal dictionary. Neural Comput. **2001**, 13, 863–882.
27. Teschendorff, A.E.; Journée, M.; Absil, P.A.; Sepulchre, R.; Caldas, C. Elucidating the altered transcriptional programs in breast cancer using independent component analysis. PLoS Comput. Biol. **2007**, 3, e161.
28. Virta, J.; Taskinen, S.; Nordhausen, K. Applying fully tensorial ICA to fMRI data. In Proceedings of the 2016 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), Philadelphia, PA, USA, 6 December 2016; pp. 1–6.
29. Virta, J.; Li, B.; Nordhausen, K.; Oja, H. Independent component analysis for tensor-valued data. J. Multivar. Anal. **2017**, 162, 172–192.
30. Bach, F.R.; Jordan, M.I. Beyond independent components: Trees and clusters. J. Mach. Learn. Res. **2003**, 4, 1205–1233.
31. Meyer-Bäse, A.; Theis, F.J.; Lange, O.; Puntonet, C.G. Tree-Dependent and topographic independent component analysis for fMRI analysis. In International Conference on Independent Component Analysis and Signal Separation; Springer: Berlin/Heidelberg, Germany, 2004; pp. 782–789.
32. Avila Cobos, F.; Vandesompele, J.; Mestdagh, P.; De Preter, K. Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics **2018**, 34, 1969–1979.
33. Biton, A.; Bernard-Pierrot, I.; Lou, Y.; Krucker, C.; Chapeaublanc, E.; Rubio-Pérez, C.; López-Bigas, N.; Kamoun, A.; Neuzillet, Y.; Gestraud, P.; et al. Independent component analysis uncovers the landscape of the bladder tumor transcriptome and reveals insights into luminal and basal subtypes. Cell Rep. **2014**, 9, 1235–1245.
34. Kairov, U.; Cantini, L.; Greco, A.; Molkenov, A.; Czerwinska, U.; Barillot, E.; Zinovyev, A. Determining the optimal number of independent components for reproducible transcriptomic data analysis. BMC Genom. **2017**, 18, 712.
35. Kong, W.; Vanderburg, C.R.; Gunshin, H.; Rogers, J.T.; Huang, X. A review of independent component analysis application to microarray gene expression data. Biotechniques **2008**, 45, 501–520.
36. Meng, C.; Zeleznik, O.A.; Thallinger, G.G.; Kuster, B.; Gholami, A.M.; Culhane, A.C. Dimension reduction techniques for the integrative analysis of multi-omics data. Brief. Bioinform. **2016**, 17, 628–641.
37. Barillot, E.; Calzone, L.; Hupe, P.; Vert, J.-P.; Zinovyev, A. Computational Systems Biology of Cancer; Taylor & Francis: Abingdon, UK, 2012; ISBN 9781439831441.
38. Cantini, L.; Kairov, U.; de Reyniès, A.; Barillot, E.; Radvanyi, F.; Zinovyev, A. Assessing reproducibility of matrix factorization methods in independent transcriptomes. Bioinformatics **2019**.
39. Nazarov, P.V.; Wienecke-Baldacchino, A.K.; Zinovyev, A.; Czerwińska, U.; Muller, A.; Nashan, D.; Dittmar, G.; Azuaje, F.; Kreis, S. Independent component analysis provides clinically relevant insights into the biology of melanoma patients. BMC Med. Genom. **2019**, 395145.
40. Chiappetta, P.; Roubaud, M.C.; Torrésani, B. Blind source separation and the analysis of microarray data. J. Comput. Biol. **2005**, 11, 1090–1109.
41. Himberg, J.; Hyvarinen, A. Icasso: Software for investigating the reliability of ICA estimates by clustering and visualization. In Proceedings of the 2003 IEEE XIII Workshop on Neural Networks for Signal Processing (IEEE Cat. No.03TH8718), Toulouse, France, 17–19 September 2003; pp. 259–268.
42. Czerwinska, U.; Cantini, L.; Kairov, U.; Barillot, E.; Zinovyev, A. Application of independent component analysis to tumor transcriptomes reveals specific and reproducible immune-related signals. In Proceedings of the Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018; Volume 10891 LNCS, pp. 501–513.
43. Engreitz, J.M.; Daigle, B.J.; Marshall, J.J.; Altman, R.B. Independent component analysis: Mining microarray data for fundamental human gene expression modules. J. Biomed. Inform. **2010**, 43, 932–944.
44. Greco, A.; Sanchez Valle, J.; Pancaldi, V.; Baudot, A.; Barillot, E.; Caselle, M.; Valencia, A.; Zinovyev, A.; Cantini, L. Molecular inverse comorbidity between Alzheimer’s disease and lung cancer: New insights from matrix factorization. Int. J. Mol. Sci. **2019**, 20, 3114.
45. Stein-O’Brien, G.L.; Arora, R.; Culhane, A.C.; Favorov, A.V.; Garmire, L.X.; Greene, C.S.; Goff, L.A.; Li, Y.; Ngom, A.; Ochs, M.F.; et al. Enter the matrix: Factorization uncovers knowledge from omics. Trends Genet. **2018**, 34, 790–805.
46. Way, G.P.; Zietz, M.; Himmelstein, D.S.; Greene, C.S. Sequential compression across latent space dimensions enhances gene expression signatures. bioRxiv **2019**. bioRxiv:573782.
47. Cangelosi, R.; Goriely, A. Component retention in principal component analysis with application to cDNA microarray data. Biol. Direct **2007**, 2, 2.
48. Ceruti, C.; Bassis, S.; Rozza, A.; Lombardi, G.; Casiraghi, E.; Campadelli, P. DANCo: An intrinsic dimensionality estimator exploiting angle and norm concentration. Pattern Recognit. **2014**, 47, 2569–2581.
49. Albergante, L.; Bac, J.; Zinovyev, A. Estimating the effective dimension of large biological datasets using Fisher separability analysis. In Proceedings of the International Joint Conference on Neural Networks, Budapest, Hungary, 14–17 July 2019.
50. Kuperstein, I.; Grieco, L.; Cohen, D.P.A.; Thieffry, D.; Zinovyev, A.; Barillot, E. The shortest path is not the one you know: Application of biological network resources in precision oncology research. Mutagenesis **2015**, 30, 191–204.
51. Bonnet, E.; Viara, E.; Kuperstein, I.; Calzone, L.; Cohen, D.P.A.; Barillot, E.; Zinovyev, A. NaviCell Web Service for network-based data visualization. Nucleic Acids Res. **2015**, 43, W560–W565.
52. Gawron, P.; Ostaszewski, M.; Satagopam, V.; Gebel, S.; Mazein, A.; Kuzma, M.; Zorzan, S.; McGee, F.; Otjacques, B.; Balling, R.; et al. MINERVA-a platform for visualization and curation of molecular interaction networks. NPJ Syst. Biol. Appl. **2016**, 2, 16020.
53. Cantini, L.; Calzone, L.; Martignetti, L.; Rydenfelt, M.; Blüthgen, N.; Barillot, E.; Zinovyev, A. Classification of gene signatures for their information value and functional redundancy. NPJ Syst. Biol. Appl. **2018**, 4, 2.
54. Grossmann, P.; Stringfield, O.; El-Hachem, N.; Bui, M.M.; Rios Velazquez, E.; Parmar, C.; Leijenaar, R.T.; Haibe-Kains, B.; Lambin, P.; Gillies, R.J.; et al. Defining the biological basis of radiomic phenotypes in lung cancer. Elife **2017**, 6, e23421.
55. Zhang, X.W.; Yap, Y.L.; Wei, D.; Chen, F.; Danchin, A. Molecular diagnosis of human cancer type by gene expression profiles and independent component analysis. Eur. J. Hum. Genet. **2005**, 13, 1303–1311.
56. Huang, D.-S.; Zheng, C.-H. Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics **2006**, 22, 1855–1862.
57. Zheng, C.H.; Huang, D.S.; Kong, X.Z.; Zhao, X.M. Gene Expression Data Classification Using Consensus Independent Component Analysis. Genom. Proteomics Bioinform. **2008**, 6, 74–82.
58. Aziz, R.; Verma, C.K.; Srivastava, N. A novel approach for dimension reduction of microarray. Comput. Biol. Chem. **2017**, 71, 161–169.
59. Nascimento, M.; Silva, F.F.E.; Sáfadi, T.; Nascimento, A.C.C.; Ferreira, T.E.M.; Barroso, L.M.A.; Ferreira Azevedo, C.; Guimarães, S.E.F.; Serão, N.V.L. Independent Component Analysis (ICA) based-clustering of temporal RNA-seq data. PLoS ONE **2017**, 12, e0181195.
60. Han, H.; Li, X.L. Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery. BMC Bioinform. **2011**, 12, S7.
61. Trapnell, C.; Cacchiarelli, D.; Grimsby, J.; Pokharel, P.; Li, S.; Morse, M.; Lennon, N.J.; Livak, K.J.; Mikkelsen, T.S.; Rinn, J.L. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. **2014**, 32, 381–386.
62. Aynaud, M.-M.; Mirabeau, O.; Gruel, N.; Grossetete-Lalami, S.; Boeva, V.; Durand, S.; Surdez, D.; Saulnier, O.; Zaidi, S.; Gribkova, S.; et al. Transcriptional programs define intratumoral heterogeneity of Ewing sarcoma at single cell resolution. bioRxiv **2019**. bioRxiv:623710.
63. Gorban, A.N.; Pokidysheva, L.I.; Smirnova, E.V.; Tyukina, T.A. Law of the minimum paradoxes. Bull. Math. Biol. **2011**, 73, 2013–2044.
64. Gorban, A.N.; Tyukina, T.A.; Smirnova, E.V.; Pokidysheva, L.I. Evolution of adaptation mechanisms: Adaptation energy, stress, and oscillating death. J. Theor. Biol. **2016**, 405, 127–139.
65. Segal, E.; Friedman, N.; Koller, D.; Regev, A. A module map showing conditional activity of expression modules in cancer. Nat. Genet. **2004**, 36, 1090–1098.
66. Galon, J.; Mlecnik, B.; Bindea, G.; Angell, H.K.; Berger, A.; Lagorce, C.; Lugli, A.; Zlobec, I.; Hartmann, A.; Bifulco, C.; et al. Towards the introduction of the ‘Immunoscore’ in the classification of malignant tumours. J. Pathol. **2014**, 232, 199–209.
67. Becht, E.; Giraldo, N.A.; Lacroix, L.; Buttard, B.; Elarouci, N.; Petitprez, F.; Selves, J.; Laurent-Puig, P.; Sautès-Fridman, C.; Fridman, W.H.; et al. Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biol. **2016**, 17, 218.
68. Newman, A.M.; Liu, C.L.; Green, M.R.; Gentles, A.J.; Feng, W.; Xu, Y.; Hoang, C.D.; Diehn, M.; Alizadeh, A.A. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods **2015**, 12, 453–457.
69. Aran, D.; Hu, Z.; Butte, A.J. xCell: Digitally portraying the tissue cellular heterogeneity landscape. Genome Biol. **2017**, 18, 220.
70. Racle, J.; de Jonge, K.; Baumgaertner, P.; Speiser, D.E.; Gfeller, D. Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. Elife
**2017**, 6, e26476. [Google Scholar] [CrossRef] [PubMed] - Gaujoux, R.; Seoighe, C. Semi-supervised Nonnegative Matrix Factorization for gene expression deconvolution: A case study. Infect. Genet. Evol.
**2012**, 12, 913–921. [Google Scholar] [CrossRef] [PubMed] - Nelms, B.D.; Waldron, L.; Barrera, L.A.; Weflen, A.W.; Goettel, J.A.; Guo, G.; Montgomery, R.K.; Neutra, M.R.; Breault, D.T.; Snapper, S.B.; et al. CellMapper: Rapid and accurate inference of gene expression in difficult-to-isolate cell types. Genome Biol.
**2016**, 17, 201. [Google Scholar] [CrossRef] [PubMed] - Kotliar, D.; Veres, A.; Nagy, M.A.; Tabrizi, S.; Hodis, E.; Melton, D.A.; Sabeti, P.C. Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq. Elife
**2019**, 8, e43803. [Google Scholar] [CrossRef] [PubMed] - Wang, N.; Hoffman, E.P.; Chen, L.; Chen, L.; Zhang, Z.; Liu, C.; Yu, G.; Herrington, D.M.; Clarke, R.; Wang, Y. Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues. Sci. Rep.
**2016**, 6, 18909. [Google Scholar] [CrossRef] [PubMed] - Czerwinska, U. Unsupervised deconvolution of bulk omics profiles: Methodology and application to characterize the immune landscape in tumors. Ph.D. Thesis, University Paris Decartes, Paris, France, 2018. [Google Scholar]
- Su, Z.; Łabaj, P.P.; Li, S.; Thierry-Mieg, J.; Thierry-Mieg, D.; Shi, W.; Wang, C.; Schroth, G.P.; Setterquist, R.A.; Thompson, J.F.; et al. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat. Biotechnol.
**2014**, 32, 903–914. [Google Scholar] - Teschendorff, A.E.; Zheng, S.C. Cell-type deconvolution in epigenome-wide association studies: A review and recommendations. Epigenomics
**2017**, 9, 757–768. [Google Scholar] [CrossRef] - Teschendorff, A.E.; Zhuang, J.; Widschwendter, M. Independent surrogate variable analysis to deconvolve confounding factors in large-scale microarray profiling studies. Bioinformatics
**2011**, 27, 1496–1505. [Google Scholar] [CrossRef] - Dirkse, A.; Golebiewska, A.; Buder, T.; Nazarov, P.V.; Muller, A.; Poovathingal, S.; Brons, N.H.C.; Leite, S.; Sauvageot, N.; Sarkisjan, D.; et al. Stem cell-associated heterogeneity in Glioblastoma results from intrinsic tumor plasticity shaped by the microenvironment. Nat. Commun.
**2019**, 10, 1787. [Google Scholar] [CrossRef] [PubMed] - Francesconi, M.; Di Stefano, B.; Berenguer, C.; de Andrés-Aguayo, L.; Plana-Carmona, M.; Mendez-Lago, M.; Guillaumet-Adkins, A.; Rodriguez-Esteban, G.; Gut, M.; Gut, I.G.; et al. Single cell RNA-seq identifies the origins of heterogeneity in efficient cell transdifferentiation and reprogramming. Elife
**2019**, 8, e41627. [Google Scholar] [CrossRef] [PubMed] - Qiu, X.; Mao, Q.; Tang, Y.; Wang, L.; Chawla, R.; Pliner, H.A.; Trapnell, C. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods
**2017**, 14, 979–982. [Google Scholar] [CrossRef] [PubMed][Green Version] - Zhu, D.; Zhao, Z.; Cui, G.; Chang, S.; Hu, L.; See, Y.X.; Lim, M.G.L.; Guo, D.; Chen, X.; Poudel, B.; et al. Single-Cell Transcriptome Analysis Reveals Estrogen Signaling Coordinately Augments One-Carbon, Polyamine, and Purine Synthesis in Breast Cancer. Cell Rep.
**2018**, 25, 2285–2298. [Google Scholar] [CrossRef] [PubMed] - Butler, A.; Hoffman, P.; Smibert, P.; Papalexi, E.; Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol.
**2018**, 36, 411–420. [Google Scholar] [CrossRef] [PubMed] - DeTomaso, D.; Yosef, N. FastProject: A tool for low-dimensional analysis of single-cell RNA-Seq data. BMC Bioinform.
**2016**, 17, 315. [Google Scholar] [CrossRef] [PubMed] - Kondratova, M.; Czerwińska, U.; Sompairac, N.; Amigorena, S.D.; Soumelis, V.; Barillot, E.; Zinovyev, A.; Kuperstein, I. A multiscale signalling network map of innate immune response in cancer reveals signatures of cell heterogeneity and functional polarization. Nat. Commun.
**2019**. In Press. [Google Scholar] - Tirosh, I.; Izar, B.; Prakadan, S.M.; Wadsworth, M.H.; Treacy, D.; Trombetta, J.J.; Rotem, A.; Rodman, C.; Lian, C.; Murphy, G.; et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science
**2016**, 352, 189–196. [Google Scholar] [CrossRef][Green Version] - Macaulay, I.C.; Svensson, V.; Labalette, C.; Ferreira, L.; Hamey, F.; Voet, T.; Teichmann, S.A.; Cvejic, A. Single-Cell RNA-sequencing reveals a continuous spectrum of differentiation in hematopoietic cells. Cell Rep.
**2016**, 14, 966–977. [Google Scholar] [CrossRef] - Risso, D.; Perraudeau, F.; Gribkova, S.; Dudoit, S.; Vert, J.-P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun.
**2018**, 9, 284. [Google Scholar] [CrossRef] - Forget, A.; Martignetti, L.; Puget, S.; Calzone, L.; Brabetz, S.; Picard, D.; Montagud, A.; Liva, S.; Sta, A.; Dingli, F.; et al. Aberrant ERBB4-SRC signaling as a hallmark of group 4 medulloblastoma revealed by integrative phosphoproteomic profiling. Cancer Cell
**2018**, 34, 379–395. [Google Scholar] [CrossRef] [PubMed] - Liu, W.; Payne, S.H.; Ma, S.; Fenyö, D. Extracting pathway-level signatures from proteogenomic data in breast cancer using independent component analysis. Mol. Cell. Proteom.
**2019**, 18, S169–S182. [Google Scholar] [CrossRef] [PubMed] - Teschendorff, A.E.; Jing, H.; Paul, D.S.; Virta, J.; Nordhausen, K. Tensorial blind source separation for improved analysis of multi-omic data. Genome Biol.
**2018**, 19, 76. [Google Scholar] [CrossRef] [PubMed][Green Version] - Sefta, M. Comprehensive Molecular and Clinical Characterization of Retinoblastoma. Ph.D. Thesis, Université Paris-Saclay, 2015. [Google Scholar]
- Renard, E.; Teschendorff, A.E.; Absil, P.-A. Capturing confounding sources of variation in DNA methylation data by spatiotemporal independent component analysis. In Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 23–25 April 2014; pp. 195–200. [Google Scholar]
- Ma, Z.; Teschendorff, A.; Yu, H.; Taghia, J.; Guo, J. Comparisons of non-gaussian statistical models in DNA methylation analysis. Int. J. Mol. Sci.
**2014**, 15, 10835–10854. [Google Scholar] [CrossRef] [PubMed] - Kong, W.; Mou, X.; Deng, J.; Di, B.; Zhong, R.; Wang, S.; Yang, Y.; Zeng, W. Differences of immune disorders between Alzheimer’s disease and breast cancer based on transcriptional regulation. PLoS ONE
**2017**, 12, e0180337. [Google Scholar] [CrossRef] [PubMed] - Scheffer, M.; Carpenter, S.R.; Lenton, T.M.; Bascompte, J.; Brock, W.; Dakos, V.; van de Koppel, J.; van de Leemput, I.A.; Levin, S.A.; van Nes, E.H.; et al. Anticipating Critical Transitions. Science
**2012**, 338, 344–348. [Google Scholar] [CrossRef][Green Version] - Mesleh, A.M. Lung cancer detection using multi-layer neural networks with independent component analysis: A comparative study of training algorithms. Jordan J. Biol. Sci.
**2017**, 10, 239–249. [Google Scholar] - Han, G.; Liu, X.; Zhang, H.; Zheng, G.; Soomro, N.Q.; Wang, M.; Liu, W. Hybrid resampling and multi-feature fusion for automatic recognition of cavity imaging sign in lung CT. Futur. Gener. Comput. Syst.
**2019**, 99, 558–570. [Google Scholar] [CrossRef]

**Figure 1.** Independent component analysis (ICA) is a standard tool for reducing the complexity of omics datasets in cancer biology. (**a**) ICA belongs to the family of matrix factorization methods, which approximate a 2D matrix by a product of two much smaller matrices containing, in the case of omics data, metagenes and metasamples. (**b**) ICA can be considered as a rotation of PCA axes after data “whitening” (i.e., orienting the Gaussian ellipsoid along the coordinate axes and scaling them to unit variance). (**c**) The major types of applications of ICA in cancer biology. (**d**) The number of publications in PubMed mentioning ICA and the number of publications simultaneously mentioning ICA and “tumor” or “cancer”.
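
The whitening step mentioned in panel (b) can be illustrated with a minimal NumPy sketch. It is not taken from the review: the toy data, the function name `whiten`, and the rotation angle are assumptions chosen for illustration. The sketch shows that after whitening the covariance is the identity, and that any further rotation keeps it so, which is why ICA can be viewed as searching among rotations of the whitened PCA axes.

```python
import numpy as np

def whiten(X):
    """PCA-whiten samples-by-features data so that its covariance is the identity."""
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)             # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # principal axes and their variances
    return (Xc @ eigvecs) / np.sqrt(eigvals)   # rotate onto the axes, scale to unit variance

rng = np.random.default_rng(0)
# Hypothetical toy data: a correlated 2D Gaussian cloud (the "ellipsoid")
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
Z = whiten(X)

# Any rotation of whitened data is still white (angle chosen arbitrarily)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Z_rot = Z @ R

print(np.round(np.cov(Z, rowvar=False), 6))
print(np.round(np.cov(Z_rot, rowvar=False), 6))
```

Because second-order statistics cannot distinguish between these rotations, ICA algorithms rely on a non-Gaussianity criterion to pick one of them.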

**Figure 2.** Features of ICA applied to a synthetic dataset (**a**) and two real-life datasets, the breast cancer transcriptomic datasets of The Cancer Genome Atlas (TCGA) and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) (**b**,**c**). (**a**) ICA is able to disentangle (or deconvolute) two intersecting Gaussian distributions with coinciding means whose principal axes form a sharp angle; (**b**) ICA decomposition of order 100 of the TCGA and METABRIC datasets. Each component, represented as a metagene, was correlated to either the immune infiltration-related or the proliferation-related meta-metagene derived from Reference [33]. This analysis shows that only one of the components was strongly correlated to the cell cycle, while several can be associated with the ICA-derived immune infiltration signature (this probably signifies the ability of ICA to deconvolute the major immune cell types in an unsupervised manner; see Reference [42]); (**c**) correlation matrix between the metagenes of independent components extracted from the TCGA and METABRIC datasets separately. It shows that some components computed for different datasets have a strong and unique association with each other, indicating the high reproducibility of the ICA results (e.g., see Reference [38]).

**Figure 3.** Interpretation of ICA components using histopathology imaging of bladder tumor cross-sections. Each metasample produced by ICA defined a ranking, which was used to sort the images. Visual inspection reveals a clear trend in the images towards an increase of certain elements (presence of smooth muscle cells, myofibroblasts (cancer-associated fibroblasts), dividing cells). Two example images per component, selected from the top and the bottom of the rankings, are shown here. Green rhombuses designate normal samples. Black circles designate cells of interest: muscle cell (left), myofibroblast (middle), cells in mitosis (right). The figure is reproduced from the Supplementary Materials of Reference [33] with permission.

**Figure 4.** Use of ICA components in meta-analysis of multiple omics datasets. (**a**) Pairwise comparison of two sets of ICA metagenes leads to an asymmetric correlation matrix (same as in Figure 2c), which can be converted to a graph by applying a threshold and selecting the maximal correlations. If two components are maximally correlated with each other, then such a correlation defines a reciprocal best hit (RBH). (**b**) Graph of maximal correlations (reciprocal or not) exceeding a certain threshold among components computed for 22 cancer transcriptomic datasets. Each node is a component, and an edge denotes a correlation. Color reflects the cancer type (e.g., red is bladder cancer). Communities in this graph define highly reproducible cancer type-specific and universal latent factors. The figure is reproduced with permission from Reference [33].
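
The reciprocal best hit rule described in panel (a) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the implementation used in Reference [33]: the function name `reciprocal_best_hits`, the default threshold of 0.3, the use of absolute Pearson correlation (since the sign of an ICA component is arbitrary), and the toy metagenes are all hypothetical.

```python
import numpy as np

def reciprocal_best_hits(S1, S2, threshold=0.3):
    """Find reciprocal best hits between two sets of metagenes.

    S1, S2: arrays of shape (genes, k1) and (genes, k2), rows in the same gene order.
    Returns a list of (index_in_S1, index_in_S2, correlation) tuples.
    """
    k1 = S1.shape[1]
    # Cross block of the Pearson correlation matrix: shape (k1, k2)
    corr = np.corrcoef(S1.T, S2.T)[:k1, k1:]
    abs_corr = np.abs(corr)                 # assumption: component sign is ignored
    best_in_2 = abs_corr.argmax(axis=1)     # best partner in set 2 for each component of set 1
    best_in_1 = abs_corr.argmax(axis=0)     # best partner in set 1 for each component of set 2
    hits = []
    for i, j in enumerate(best_in_2):
        # Keep the pair only if the match is mutual and exceeds the threshold
        if best_in_1[j] == i and abs_corr[i, j] >= threshold:
            hits.append((i, int(j), float(corr[i, j])))
    return hits

# Hypothetical toy metagenes: S2 is a permuted, partly sign-flipped, noisy copy of S1
rng = np.random.default_rng(1)
S1 = rng.normal(size=(200, 3))
S2 = np.column_stack([S1[:, 2], -S1[:, 0], S1[:, 1]]) + 0.05 * rng.normal(size=(200, 3))
hits = reciprocal_best_hits(S1, S2)
for i, j, r in hits:
    print(f"component {i} of set 1 <-> component {j} of set 2, r = {r:.2f}")
```

Applying the same function to every pair of datasets and treating each returned tuple as an edge would give a graph of the kind shown in panel (b), restricted to reciprocal correlations.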

**Figure 5.** Examples of utility of ICA for unsupervised deconvolution of cell types. (**a**) Application of ICA to the Sequencing Quality Control consortium (SEQC) dataset [76], containing measurements of two reference transcriptomic profiles of cell lines and their mixtures in known proportions. The first two ICs identify the types and the effect of the platform. (**b**) Correlation graph among selected components from ICA applied to six non-redundant breast cancer transcriptomic datasets. Three cliques formed in the graph correspond to major immune cell types. The thickness of the edges reflects the absolute correlation value. The “Immune” meta-metagene was defined in Reference [33] as the one associated with the presence of immune infiltrate in a tumor. This figure was reproduced with permission from Reference [42].

**Figure 6.** Application of ICA in single cell data analysis of tumors (study of glioblastoma from Reference [79]). (**a**) t-distributed stochastic neighbor embedding (t-SNE) visualization of the data reveals a strong batch effect. Grey and red/blue dots represent cells from the same cell line, analyzed in two batches (batch 1: grey dots; batch 2: red and blue dots). The green dots show a cell population from a different cell line added to the dataset for comparison. (**b**) t-SNE visualization of the data after eliminating the signal contained in one IC associated with the batch effect. (**c**) In ICA decompositions of scRNA-Seq data from cancer studies, there usually exist two components associated with phases of the cell cycle (G1/S, DNA replication, and G2/M, mitosis). Here the loadings of two such components are visualized. Black arrows show the regions where the labeled genes are highly expressed. Yellow arrows show the assumed direction of progression through the cell cycle.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Sompairac, N.; Nazarov, P.V.; Czerwinska, U.; Cantini, L.; Biton, A.; Molkenov, A.; Zhumadilov, Z.; Barillot, E.; Radvanyi, F.; Gorban, A.; Kairov, U.; Zinovyev, A. Independent Component Analysis for Unraveling the Complexity of Cancer Omics Datasets. *Int. J. Mol. Sci.* **2019**, *20*, 4414.
https://doi.org/10.3390/ijms20184414
