Article

Natural Methods of Unsupervised Topological Alignment

by Maksim V. Kukushkin 1,2,*, Mikhail S. Arbatskiy 1, Dmitriy E. Balandin 1 and Alexey V. Churov 1
1 Russian Clinical Research Center of Gerontology, Pirogov Russian National Research Medical University, Ministry of Healthcare of the Russian Federation, 129226 Moscow, Russia
2 Institute of Applied Mathematics and Automation, Kabardino-Balkarian Scientific Center, Russian Academy of Sciences, 360000 Nalchik, Russia
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(24), 3968; https://doi.org/10.3390/math13243968
Submission received: 16 October 2025 / Revised: 29 November 2025 / Accepted: 8 December 2025 / Published: 12 December 2025
(This article belongs to the Special Issue Advances in Biological Systems with Mathematics)

Abstract

In this paper, we present a comparative analysis of methods of topological alignment and extract the main mathematical principles forming the base of the concept. The main narrative is devoted to the so-called coupled methods dealing with data sets of various natures. As the main theoretical result, we obtain harmonious generalizations of the graph Laplacian and kernel-based methods, with the central idea of finding a natural structure coupling data sets of various natures. Finally, we discuss prospective applications and consider far-reaching generalizations related to hypercomplex numbers.

1. Introduction

In many papers, the method of coupling images of heterogeneous data sets is based on the invention of a special technique involving artificial constructions that at best do not reflect, and at worst contradict, the mathematical nature of the mapping; by "mathematical nature" we mean abstract classical mathematical notions such as a space, an algebraic structure, etc. Dealing with classical mathematical notions, we have an opportunity to develop the theory harmoniously, using well-known methods as a fundamental base for inventing new ones, which can in their turn be classified in accordance with the accepted terminology. Quite the contrary is the case when we involve artificial technical formulas that allow us to achieve a concrete goal but are likely to prevent the development of the theory in general, for in this case we know nothing of the subsequent influence of the construction on the scheme of reasoning, and it is reasonable, due to the technical obstacles, to expect the theory to come to a dead end.
Manifold alignment is a class of algorithms that create a mapping of heterogeneous data sets into a common, lower-dimensional latent space. The central point is to find a mapping that reveals the entire structure of the initial manifold. It is clear that a metric reflecting the distance plays a great role here, as does the mathematical nature of the mapping; the latter, however, should be harmoniously connected with applications. Diagonal integration represents a joint analysis of multi-omics biological data—genomic, transcriptomic, proteomic, and metabolomic. A critical view of existing methods and tools for diagonal integration must be supported by experience in analyzing individual data types. The scientific team has analyzed genomic data to identify damaging mutations [1], scRNA-seq transcriptomic data of cell lines [2], proteomic data [3], small non-coding RNA data [4], as well as metabolomic profiles. The accumulated experience of working with various types of omics data creates a foundation for developing methods for their integration.
Experimental methods on individual cells reveal transcriptomic and epigenetic heterogeneity between cells, but the question of their relationship remains open. Note that various types of topological alignment include the principal concept of a relationship between images, or sometimes between preimages. The relationship is provided by an artificial structure, generally represented by an operator, with the idea of spreading a unified structure over data sets of different natures. The literature survey presented below produces arguments creating prerequisites for a comprehensive investigation.
In the paper [5], an approach for integrating several types of measurements of observed data on a single cell is considered. However, the stated results are not based on a detailed theory; moreover, the concept of connecting heterogeneous data is not represented. In the paper [6], the authors present the manifold alignment method that is used to integrate several types of measurements of observed data performed on various aliquots of a given cell population. The MMD-MA (Maximum Mean Discrepancy Manifold Alignment) method minimizes the discrepancy between images of different modalities in a common latent space using the maximum mean discrepancy as a measure of dissimilarity, which provides a theoretically sound approach. However, minimizing the discrepancy can lead to loss of the information specific to each modality, since a smooth representation does not reflect the real heterogeneity of the biological data. Moreover, the connection between heterogeneous data sets is based on a mathematical construction obtained by virtue of optimization, where artificially constructed penalty functions are involved, which does not fully preserve the inherent structure of the data sets.
In the paper [7], the authors present a modification of the Gromov–Wasserstein distance-based manifold alignment method [8], which combines heterogeneous multiomic data sets of a single cell in order to describe and represent common and data set-specific cellular structures in different modalities. The Pamona (Partial Gromov–Wasserstein distance-based manifold alignment) method applies partial optimal alignment of manifolds, allowing not all elements to be compared between modalities, but only those for which reliable correspondences can be found. However, the method requires additional criteria to determine the threshold for the reliability of correspondences and may lead to the loss of potentially informative connections due to the specific approach to establishing correspondences.
In the paper [9], the authors combine two clustering processes so that in clustering cells corresponding to the scRNA-seq sample, the information from the scATAC-seq sample could also be used, and vice versa. The authors formulate this problem of coupled clustering as an optimization problem and present a method for solving it called coupled non-negative matrix factorization.
In the paper [10], a new algorithm for unsupervised topological alignment of multiomic integration for a single cell is presented. The method does not require any information on correspondence between cells or between measurements. At the first stage, the method embeds the internal low-dimensional structure of each data set of a single cell into a matrix, the elements of which reflect the distances corresponding to cells within the same data set, and then aligns the cell images by comparison of distance matrices using the matrix optimization method. Finally, the method projects separate, non-comparable measurements corresponding to single cell data sets into a latent embedding space to provide comparability of measurements by virtue of aligned cell images.
In the paper [11], the topological alignment method is formulated for two modalities, where a modality is understood as a set of features for a sample of cells; the information on the features is reflected in a matrix whose columns represent cells and whose rows represent various features. Since the objects in the two data sets are different, an additional technical tool is required for the method, the so-called transition matrix connecting the heterogeneous data sets. The transition matrix is understood as a projection of one data set onto the space corresponding to another data set. The transition matrix is obtained from the profiles of scATAC-seq due to the information given in the bodies of genes [12,13,14,15].
Summarizing the descriptions given above, we come to the conclusion that in each case there is a technically involved unnatural structure providing a relationship between the images of the data sets. Generally, the rather abstract term "relationship" means the property of preservation of the proportion between the distances in the initial and latent spaces. However, from the mathematical point of view, the constructed mapping should inherit the abstract properties of the involved structure. Thus, classical mathematical structures, such as the structures defined on the Hilbert space or generated by the properties of well-known operators acting in the Hilbert space (here we use a notion including Euclidean spaces), generate mappings having classical properties. The corresponding methods of topological alignment are called natural. On the other hand, the practical relevance of the methods considered above appears just due to mathematical tools applied in some complicated order that may not be harmonious from the fundamental theory point of view. The corresponding methods of topological alignment are called unnatural. In this paper, we study natural methods, namely graph Laplacian and kernel-based methods of topological alignment. Generally, we consider a natural algebraic structure such as the finite-dimensional unital algebra of hypercomplex numbers and produce concrete reasonings corresponding to the field of complex numbers, showing the relevance of the approach. The real and imaginary parts of a complex number correspond to the coupled data sets, respectively; thus, the construction of a mapping defined on the complex Hilbert space includes a naturally coupled structure. The main achievement is an attempt to construct an abstract qualitative theory, creating an opportunity to consider a number of heterogeneous data sets endowed with the coupled natural structure, let alone far-reaching modifications and generalizations.
The paper is organized as follows: in Section 2, we consider notations and some well-known facts; in Section 3, we introduce the idea of the coupled unsupervised manifold alignment; in Paragraph 3.1, we consider a coupled Laplacian mapping and prove auxiliary propositions; in Paragraph 3.2, we consider the unsupervised manifold alignment via the reproducing kernel; in Paragraph 3.3, we consider prospective applications to the modern methods; and in Section 4, we finalize the theoretical results and discuss prospective generalizations.

2. Preliminaries

Throughout the paper, we consider finite-dimensional matrices over the field of complex numbers, $A \in \mathbb{C}^{n\times m}$, i.e., of dimension $(n\times m)$, $n, m \in \mathbb{N}$, and use the notation $A = \{a_{sj}\} \in \mathbb{C}^{n\times m}$, $s = 1, 2, \dots, n$, $j = 1, 2, \dots, m$, for a matrix. We use the standard notation $A^T = \{a_{js}\}$ for the transpose operation. Consider a matrix $A = \{a_{sj}\} \in \mathbb{C}^{n\times m}$, $a_{sj}\in\mathbb{C}$; denote
$$a_{s\cdot} := (a_{s1}, a_{s2}, \dots, a_{sm})^T, \qquad a_{\cdot j} := (a_{1j}, a_{2j}, \dots, a_{nj})^T.$$
We consider a complex linear vector space $\mathbb{C}^n$ consisting of the set of column matrices whose elements are complex numbers, $x = (x_1, x_2, \dots, x_n)^T$, $x_j \in \mathbb{C}$, $j = 1, 2, \dots, n$. Analogously, we define the real linear vector space $\mathbb{R}^n$. A complex linear vector space supplied with the structure of the inner (scalar) product given below is called the complex Euclidean space:
$$(x, y)_{\mathbb{C}^n} := \sum_{j=1}^n x_j \bar{y}_j, \quad x, y \in \mathbb{C}^n.$$
Note that the complex Euclidean space represents a particular case of a unitary space, i.e., a linear space over the field of complex numbers. Define the norm in the general sense in a space with an inner product as follows: $\|x\| = \sqrt{(x,x)}$. Consider a matrix $A = \{a_{sj}\} \in \mathbb{C}^{n\times m}$, and denote $\bar{A} = \{\bar{a}_{sj}\}$. In accordance with the orthogonal decomposition theorem, we have
$$\mathbb{C}^n = \mathbb{C}^m \oplus \mathbb{C}^{n-m}, \quad n > m,$$
i.e., for an arbitrary element $x \in \mathbb{C}^n$ there exists a unique pair of elements $x_1 \in \mathbb{C}^m$, $x_2 \in \mathbb{C}^{n-m}$ such that the decomposition $x = x_1 + x_2$ holds. In this way we can put the element $x_1$ into correspondence with the element $x$; it is called the orthogonal projection of the space $\mathbb{C}^n$ onto the subspace $\mathbb{C}^m$. Given an arbitrary set $X$, a totally ordered set $Y$, and a function $f: X \to Y$, the argmin over some subset $\Omega \subset X$ is defined by the following expression:
$$\operatorname*{argmin}_{x \in \Omega} f(x) := \{\, y \in \Omega : f(y) \le f(x),\ \forall x \in \Omega \,\}.$$
Consider a set of elements $\{x_1, x_2, \dots, x_n\}$ belonging to a normed space. Let us construct a weighted graph $G$ having $n$ vertices, where one vertex corresponds to one element, and a set of edges connecting adjacent vertices of the graph. We number the vertices according to the given order, i.e., the $j$-th vertex corresponds to $x_j$. We suppose that the vertices $s$ and $j$ are adjacent to each other if the elements $x_s$ and $x_j$ are related, for instance, close in some sense. The question is how to define the rather vague notion of closeness in a more concrete way [16].
Here, we consider two variants. The first one is the so-called $\varepsilon$-neighborhood method, which postulates the following sense of closeness: the vertices $s$ and $j$ are adjacent to each other if
$$\|x_s - x_j\| < \varepsilon,$$
where the norm is understood in the abstract sense; however, the Euclidean norm is mostly used in applications. Note that this type of closeness is geometrically motivated, and the relationship is naturally symmetric. However, it often leads to a graph with several connected components, and it is difficult to choose a convenient value of $\varepsilon$ that avoids a disconnected graph.
The second variant is the so-called $n$-nearest neighbors method, which postulates the following sense of closeness: the vertices $s$ and $j$ are adjacent to each other if $x_s$ belongs to the set of $n$ nearest neighbors (in the sense of the norm) of the element $x_j$, or $x_j$ belongs to the set of $n$ nearest neighbors of the element $x_s$.
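For concreteness, the two closeness rules can be sketched in NumPy as follows (an illustrative sketch; the helper names are ours, not the paper's):

```python
import numpy as np

def eps_adjacency(X, eps):
    """Epsilon-neighborhood rule: vertices s and j are adjacent iff ||x_s - x_j|| < eps."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (d < eps) & ~np.eye(len(X), dtype=bool)   # exclude self-loops

def knn_adjacency(X, k):
    """k-nearest-neighbors rule, symmetrized with 'or': s ~ j iff x_s is among the
    k nearest neighbors of x_j, or x_j is among the k nearest neighbors of x_s."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                      # an element is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]                # indices of the k nearest neighbors
    A = np.zeros_like(d, dtype=bool)
    A[np.repeat(np.arange(len(X)), k), nn.ravel()] = True
    return A | A.T                                   # the 'or' makes the relation symmetric
```

Note that the `or` symmetrization guarantees a symmetric adjacency matrix even though the raw nearest-neighbor relation is not symmetric.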
Having constructed the graph in accordance with one of the above methods, we can choose weights to form the weight matrix. The first choice relates to the heat kernel: we assume that if the vertices $s$ and $j$ are connected, in symbols $s \sim j$, then the weight matrix is formed from the elements
$$W_{sj} = e^{-t^{-1}\|x_s - x_j\|^2}, \quad t \in \mathbb{R}\setminus\{0\};$$
in the contrary case, we assume that $W_{sj} = 0$. Thus, we have
$$W_{sj} = \begin{cases} e^{-t^{-1}\|x_s - x_j\|^2}, & s \sim j, \\ 0, & s \nsim j. \end{cases}$$
In accordance with this definition, we obtain the adjacency matrix of the weighted graph $W = \{W_{sj}\} \in \mathbb{C}^{n\times n}$. The substantiation for choosing this weight is presented in the papers [17,18]. The second choice is the so-called simple weight: we assume that $W_{sj} = 1$ if the vertices $s$ and $j$ are connected and $W_{sj} = 0$ if $s$ and $j$ are disconnected, i.e.,
$$W_{sj} = \begin{cases} 1, & s \sim j, \\ 0, & s \nsim j. \end{cases}$$
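Given a Boolean adjacency matrix `A` as above, both weight choices reduce to a few lines (again an illustrative sketch):

```python
import numpy as np

def heat_kernel_weights(X, A, t):
    """Heat-kernel weights W_sj = exp(-||x_s - x_j||^2 / t) on the edges of A, 0 elsewhere."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.where(A, np.exp(-d2 / t), 0.0)

def simple_weights(A):
    """Simple weights: W_sj = 1 on edges, 0 otherwise; avoids choosing the parameter t."""
    return A.astype(float)
```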
This kind of simplicity gives the opportunity to avoid choosing the value of the parameter $t$. Note that a matrix $A \in \mathbb{C}^{n\times m}$ generates the finite-dimensional operator $A: \mathbb{C}^m \to \mathbb{C}^n$. We use the following notation for the Hermitian components of the operator:
$$\operatorname{Re} A = \frac{A + A^*}{2}, \quad \operatorname{Im} A = \frac{A - A^*}{2i}, \qquad \bar{A} := \{\bar{a}_{sj}\}, \quad \operatorname{Re}(A) := \{\operatorname{Re} a_{sj}\}, \quad \operatorname{Im}(A) := \{\operatorname{Im} a_{sj}\};$$
the latter matrices are called the real and imaginary parts of the matrix $A$, respectively. Detailed information on the properties of the Hermitian components is given in the paper [19]. Denote by $\mathrm{D}(A)$, $\mathrm{R}(A)$, $\mathrm{N}(A)$ the domain of definition, the range, and the kernel (null space) of the operator $A$, respectively.
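The distinction between the operator components $\operatorname{Re} A$, $\operatorname{Im} A$ (which are Hermitian matrices) and the entrywise parts $\operatorname{Re}(A)$, $\operatorname{Im}(A)$ can be checked numerically (an illustrative sketch):

```python
import numpy as np

def hermitian_components(A):
    """Operator components Re A = (A + A*)/2 and Im A = (A - A*)/(2i).
    Both are Hermitian, and A = Re A + i Im A; they differ in general from the
    entrywise parts Re(A) = A.real and Im(A) = A.imag."""
    re = (A + A.conj().T) / 2
    im = (A - A.conj().T) / 2j
    return re, im
```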

3. Coupled Unsupervised Manifold Alignment

The manifold alignment represents a solution to the problem of alignment and at the same time forms the basis for finding a unified representation of different data sets. The principal concept of manifold alignment is to use the relationships between objects within each data set to obtain information on the relationships between data sets and, eventually, to create a mapping of the initially different data sets into a common latent space. The approaches described in this paragraph deal with different data sets having the same underlying structure. The basic low-dimensional representation is extracted by modeling the local geometry in the complex vector space using the generalized graph Laplacian operator associated with the data sets naturally coupled on the complex plane. The alignment of manifolds can apparently be considered as a method of reducing the dimension of the latent space [20], where the goal is to find a low-dimensional embedding of different data sets that preserves any known correspondences between them. Although we consider unsupervised methods, they can easily be modified to semi-supervised or supervised methods by means of a corresponding term reflecting the distance from a chosen initial point in the sense of the given norm. In a reduced, simplified form, we can formulate the coupled problem of dimension reduction as follows: given $n$ elements
$$x_1, x_2, \dots, x_n \in \mathbb{C}^l, \quad x_j = (x_{1j}, x_{2j}, \dots, x_{lj})^T, \quad x_{sj}\in\mathbb{C}, \quad l\in\mathbb{N},$$
we are challenged to find a set of elements
$$y_1, y_2, \dots, y_n \in \mathbb{C}^m, \quad 1 \le m \ll l,$$
so that the element $y_j$ is the image of the element $x_j$ under the sought mapping, i.e., $x_j \mapsto y_j$. Moreover, we are interested in finding a mapping for which the images would be located close to each other in some sense. In the case corresponding to the Euclidean space, the desired mapping can be created via the subspace generated by the generalized eigenvectors of the graph Laplacian operator. Below, we present a detailed analysis of the algorithm solving this problem in the case corresponding to the complex Euclidean space, which simultaneously solves the problem of coupling different data sets.

3.1. Coupled Laplacian Mapping

Throughout the paper, we consider a symmetric matrix $W \in \mathbb{C}^{n\times n}$, $W = \{W_{sj}\}$, having non-negative real and imaginary parts, and define a diagonal matrix
$$D = \{D_{sj}\}, \quad D_{sj} = \begin{cases} \sum_{k=1}^n W_{kj}, & s = j, \\ 0, & s \ne j. \end{cases}$$
Further, we assume that $D_{jj} \ne 0$, $j = 1, 2, \dots, n$, which does not restrict the appropriate class of matrices due to the given sense of the elements. Consider an operator $L := D - W$. Note that $L$ is not self-adjoint; the verification is left to the reader. However, it has the following property: $L^* = \bar{L}$. Since $D$ is a diagonal matrix, it commutes with an arbitrary matrix of the corresponding dimension. In accordance with the symmetry property, we have
$$\operatorname{Re} L_{ss} = \sum_{j \ne s} \operatorname{Re} W_{sj}, \quad s = 1, 2, \dots, n;$$
therefore, the matrix $\operatorname{Re}(L)$ is symmetric and diagonally dominant. Hence, it is positive semidefinite, i.e., $\operatorname{Re}(L) \ge 0$, from which it follows that $\operatorname{Re} L \ge 0$. Analogous reasoning leads to the fact that $\operatorname{Im} L \ge 0$. The following lemma establishes a technical tool relevant to the further reasoning.
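A minimal sketch of the construction $L = D - W$; the positive semidefiniteness of the real and imaginary parts can then be verified via their eigenvalues (the function name is ours):

```python
import numpy as np

def graph_laplacian(W):
    """Generalized graph Laplacian L = D - W, with D diagonal and
    D_jj equal to the j-th column sum of W."""
    D = np.diag(W.sum(axis=0))
    return D, D - W
```

For a symmetric $W$ with non-negative real and imaginary parts, both $\operatorname{Re}(L)$ and $\operatorname{Im}(L)$ are diagonally dominant with non-negative diagonal, so their spectra lie in $[0, \infty)$.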
Lemma 1. 
Assume that $W \in \mathbb{C}^{n\times n}$; then the following relation holds:
$$\operatorname{tr}\{A^* L A\} = \sum_{j=1}^m (L a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n} = \frac{1}{2}\sum_{s,j=1}^n \|a_{s\cdot} - a_{j\cdot}\|^2_{\mathbb{C}^m}\, W_{sj}.$$
Proof. 
By direct calculation, we have
$$\operatorname{tr}\{A^* L A\} = \sum_{q=1}^m \sum_{s,j=1}^n L_{sj}\, \bar{a}_{sq}\, a_{jq}. \tag{1}$$
It is clear that
$$\sum_{s,j=1}^n L_{sj}\bar{a}_{sq}a_{jq} = \sum_{s=1}^n L_{ss}|a_{sq}|^2 + \sum_{\substack{s,j=1\\ s\ne j}}^n L_{sj}\bar{a}_{sq}a_{jq} = \sum_{s=1}^n D_{ss}|a_{sq}|^2 - \sum_{s=1}^n W_{ss}|a_{sq}|^2 + \sum_{\substack{s,j=1\\ s\ne j}}^n L_{sj}\bar{a}_{sq}a_{jq} =$$
$$= \sum_{s=1}^n D_{ss}|a_{sq}|^2 - \sum_{s=1}^n W_{ss}|a_{sq}|^2 - \sum_{\substack{s,j=1\\ s\ne j}}^n W_{sj}\bar{a}_{sq}a_{jq} = \sum_{s=1}^n D_{ss}|a_{sq}|^2 - \sum_{s,j=1}^n \bar{a}_{sq} W_{sj} a_{jq} = (L a_{\cdot q}, a_{\cdot q})_{\mathbb{C}^n}.$$
On the other hand,
$$\operatorname{Re}\sum_{s,j=1}^n |a_{sq} - a_{jq}|^2\, W_{sj} = \operatorname{Re}\sum_{s,j=1}^n \left(|a_{sq}|^2 + |a_{jq}|^2 - 2\operatorname{Re}\{a_{sq}\bar{a}_{jq}\}\right) W_{sj} =$$
$$= 2\operatorname{Re}\left(\sum_{s=1}^n \operatorname{Re} D_{ss}\, |a_{sq}|^2 - \sum_{s,j=1}^n a_{sq}\bar{a}_{jq}\operatorname{Re} W_{sj}\right) = 2\,(a_{\cdot q}, \operatorname{Re}(L)\, a_{\cdot q})_{\mathbb{C}^n} = 2\operatorname{Re}(L a_{\cdot q}, a_{\cdot q})_{\mathbb{C}^n}.$$
Analogously, we get
$$\operatorname{Im}\sum_{s,j=1}^n |a_{sq} - a_{jq}|^2\, W_{sj} = \operatorname{Im}\sum_{s,j=1}^n \left(|a_{sq}|^2 + |a_{jq}|^2 - 2\operatorname{Re}\{a_{sq}\bar{a}_{jq}\}\right) W_{sj} =$$
$$= 2\left(\sum_{s=1}^n \operatorname{Im} D_{ss}\, |a_{sq}|^2 - \sum_{s,j=1}^n \operatorname{Re}\{a_{sq}\bar{a}_{jq}\}\operatorname{Im} W_{sj}\right) = 2\operatorname{Re}(a_{\cdot q}, \operatorname{Im}(L)\, a_{\cdot q})_{\mathbb{C}^n} = 2\operatorname{Im}(L a_{\cdot q}, a_{\cdot q})_{\mathbb{C}^n}.$$
Combining the above relations, we get
$$\sum_{q=1}^m (L a_{\cdot q}, a_{\cdot q})_{\mathbb{C}^n} = \frac{1}{2}\sum_{q=1}^m\sum_{s,j=1}^n W_{sj}|a_{sq}-a_{jq}|^2 = \frac{1}{2}\sum_{s,j=1}^n W_{sj}\sum_{q=1}^m |a_{sq}-a_{jq}|^2 = \frac{1}{2}\sum_{s,j=1}^n \|a_{s\cdot}-a_{j\cdot}\|^2_{\mathbb{C}^m}\, W_{sj}.$$
Using Formula (1), we obtain the desired result. □
Theorem 1. 
Assume that $W \in \mathbb{C}^{n\times n}$, $A \in \mathbb{C}^{n\times n}$, $A^* D A = e^{i\theta}\cdot I$; then the following relation holds:
$$\operatorname{tr}\{A^* L A\} = 2 e^{i\theta}\cdot \operatorname{tr}\{D^{-1} L\}.$$
Proof. 
Consider the decomposition
$$D = \operatorname{Re}(D) + i \operatorname{Im}(D).$$
Note that $D_{jj} \ne 0$, $j = 1, 2, \dots, n$; therefore the matrix $D$ is invertible and, being diagonal, commutes (as does its inverse) with an arbitrary matrix of the corresponding dimension. Thus, applying Lemma 1, we get
$$\frac{1}{2}\operatorname{tr}\{A^* L A\} = \sum_{j=1}^n (L a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n} = \sum_{j=1}^n (D^{-1} L D\, a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n} =$$
$$= \sum_{j=1}^n (D^{-1}L\operatorname{Re}(D)\, a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n} + i\sum_{j=1}^n (D^{-1}L\operatorname{Im}(D)\, a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n}.$$
Using the condition $\operatorname{Re}(W), \operatorname{Im}(W) \ge 0$ and the symmetry property, we can define the nonnegative self-adjoint square roots and rewrite the previous relation in the form
$$\frac{1}{2}\operatorname{tr}\{A^* L A\} = \sum_{j=1}^n \left(D^{-1}L\sqrt{\operatorname{Re}(D)}\, a_{\cdot j}, \sqrt{\operatorname{Re}(D)}\, a_{\cdot j}\right)_{\mathbb{C}^n} + i\sum_{j=1}^n \left(D^{-1}L\sqrt{\operatorname{Im}(D)}\, a_{\cdot j}, \sqrt{\operatorname{Im}(D)}\, a_{\cdot j}\right)_{\mathbb{C}^n}. \tag{2}$$
Note that the condition $A^* D A = e^{i\theta}\cdot I$ can be rewritten in the form
$$\psi_{sq} := \sum_{j=1}^n D_{jj}\,\bar{a}_{js}\, a_{jq} = \delta_{sq}\, e^{i\theta}, \quad s, q = 1, 2, \dots, n,$$
where $A^* D A = \{\psi_{sq}\}$. It is clear that $A^* \bar{D} A = e^{-i\theta}\cdot I$; therefore,
$$A^* \operatorname{Re}(D) A = \frac{1}{2}\left(A^* D A + A^* \bar{D} A\right) = \cos\theta\cdot I, \qquad A^* \operatorname{Im}(D) A = \sin\theta\cdot I.$$
The latter can be rewritten in the form
$$\left(\sqrt{\operatorname{Re}(D)}\, a_{\cdot s}, \sqrt{\operatorname{Re}(D)}\, a_{\cdot j}\right)_{\mathbb{C}^n} = \cos\theta\cdot\delta_{sj}, \qquad \left(\sqrt{\operatorname{Im}(D)}\, a_{\cdot s}, \sqrt{\operatorname{Im}(D)}\, a_{\cdot j}\right)_{\mathbb{C}^n} = \sin\theta\cdot\delta_{sj}, \quad s, j = 1, 2, \dots, n. \tag{3}$$
It follows from these conditions that the sets
$$\sqrt{\operatorname{Re}(D)}\, a_{\cdot j}, \qquad \sqrt{\operatorname{Im}(D)}\, a_{\cdot j}, \quad j = 1, 2, \dots, n,$$
are orthogonal in $\mathbb{C}^n$; moreover, in accordance with Equation (3), they become orthonormal if we consider the corresponding multipliers. Applying the well-known theorem on the trace of a finite-dimensional operator and taking into account Equation (2), we obtain the desired result. □

3.1.1. Generalized Eigenvectors

Assume additionally that the matrix $W \in \mathbb{C}^{n\times n}$ is normal; let us show that the corresponding operator $L = D - W$ is normal. We have
$$L^* L = (H_1 - iH_2)(H_1 + iH_2) = H_1^2 + H_2^2 + iH_1H_2 - iH_2H_1,$$
where $H_1 := \operatorname{Re} L = \operatorname{Re}(D) - \operatorname{Re}(W)$, $H_2 := \operatorname{Im} L = \operatorname{Im}(D) - \operatorname{Im}(W)$. Note that $\operatorname{Re}(D)$ and $\operatorname{Im}(D)$ are diagonal; therefore, they commute with an arbitrary matrix of the corresponding dimension. Hence, using the commutativity of the Hermitian components of a normal operator, i.e., $H_1 H_2 = H_2 H_1$, we obtain the desired result.
Having generalized the corresponding self-adjoint operator [21], we will call the operator $L$ the generalized graph Laplacian operator. Consider the following generalized eigenvalue problem:
$$L e = \lambda D e, \quad e \in \mathbb{C}^n, \quad \lambda \in \mathbb{C}, \tag{4}$$
where $L$ is the generalized graph Laplacian operator. Assume that the following condition is imposed upon the solutions of problem (4):
$$(D e, e)_{\mathbb{C}^n} = e^{i\theta}, \quad \theta \in (0, \pi/2);$$
then the solutions are called the generalized eigenvectors of the operator $L$. Denote by $\mathfrak{N}(L)$ the subspace generated by the generalized eigenvectors of the operator $L$. We have the following decomposition:
$$\mathbb{C}^n = \mathfrak{N}(L) \oplus \mathrm{N}(L). \tag{5}$$
To prove this fact, we should use the decomposition formula for an arbitrary bounded operator acting in $\mathbb{C}^n$, i.e.,
$$\mathbb{C}^n = \mathrm{R}(L) \oplus \mathrm{N}(L^*).$$
Since the operator $D^{-1}L$ is normal, we have $\mathrm{N}(L) = \mathrm{N}(L^*) = \mathrm{R}(L)^{\perp}$; the latter leads to Equation (5). Denote
$$\xi := \dim\{\mathfrak{N}(L)\}, \qquad \mu := \dim\{\mathrm{N}(L)\};$$
then, in accordance with Equation (5), we have $\xi + \mu = n$. Consider the set of generalized eigenvectors
$$L e_1 = \lambda_1 D e_1, \quad L e_2 = \lambda_2 D e_2, \quad \dots, \quad L e_n = \lambda_n D e_n,$$
numbered in accordance with the order of the corresponding eigenvalues:
$$|\lambda_1| < |\lambda_2| < \dots < |\lambda_\xi|, \qquad \lambda_{n-\mu+1} = \lambda_{n-\mu+2} = \dots = \lambda_n = 0.$$
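Numerically, the generalized eigenpairs can be obtained with `scipy.linalg.eig`, which accepts the pencil $(L, D)$. The sketch below (helper name is ours) also orders the pairs by $|\lambda|$ and applies a real rescaling enforcing $|(De, e)| = 1$; the full phase condition $(De, e) = e^{i\theta}$ then holds whenever $\arg(De, e) = \theta$ already, as happens for the coupled matrices considered further below:

```python
import numpy as np
from scipy.linalg import eig

def generalized_eigenpairs(L, D):
    """Solve L e = lambda D e and order the pairs so that |lambda_1| <= |lambda_2| <= ...
    Each eigenvector (column of E) is rescaled by a real factor so that |(D e, e)| = 1."""
    lam, E = eig(L, D)                      # generalized eigenpairs of the pencil (L, D)
    order = np.argsort(np.abs(lam))
    lam, E = lam[order], E[:, order]
    for j in range(E.shape[1]):
        q = E[:, j].conj() @ D @ E[:, j]    # current value of (D e, e)
        E[:, j] /= np.sqrt(abs(q))          # real rescaling: |(D e, e)| becomes 1
    return lam, E
```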
We have the following implication:
$$L e_j = \lambda_j D e_j \ \Rightarrow\ L^* e_j = \bar{\lambda}_j D e_j, \quad j = 1, 2, \dots, \xi.$$
This fact can be proved easily if we notice that $D^{-1}L$ is a normal operator. Let us show that the generalized eigenvectors satisfy the following condition:
$$(D e_s, e_j)_{\mathbb{C}^n} = e^{i\theta}\cdot\delta_{sj}. \tag{6}$$
For this purpose, note that the case $s = j$ holds by definition; assume that $s \ne j$ and consider the following reasoning:
$$\lambda_s (D e_s, e_j)_{\mathbb{C}^n} = (L e_s, e_j)_{\mathbb{C}^n} = (e_s, L^* e_j)_{\mathbb{C}^n} = (e_s, \bar{\lambda}_j D e_j)_{\mathbb{C}^n} = \lambda_j (D e_s, e_j)_{\mathbb{C}^n}.$$
Taking into account the fact that $\lambda_s \ne \lambda_j$, we obtain the desired result. Note that the multiplier $e^{i\theta}$ defines a unitary operator, and it is clear that the operator $L_1 := e^{-i\theta} L$ is normal. Denote by $\{\beta_j\}_1^{\xi}$, $\{\gamma_j\}_1^{\xi}$ the sets of the eigenvalues $\{\lambda_j\}_1^{\xi}$ enumerated as follows:
$$\{\beta_j, \gamma_j\} \leftrightarrow \{\lambda_p, \lambda_q\}: \quad \operatorname{Re}\beta_j = \operatorname{Re}\lambda_p, \quad \operatorname{Im}\gamma_j = \operatorname{Im}\lambda_q, \quad j, p, q \in \{1, 2, \dots, \xi\},$$
$$\operatorname{Re}\beta_j \le \operatorname{Re}\beta_{j+1}, \qquad \operatorname{Im}\gamma_j \le \operatorname{Im}\gamma_{j+1}, \quad j = 1, 2, \dots, \xi - 1.$$
It is clear that we can enumerate the sets of generalized eigenvectors in accordance with the given order, so that the sets
$$g_1, g_2, \dots, g_\xi, \qquad s_1, s_2, \dots, s_\xi$$
correspond to the real and imaginary parts of the eigenvalues, respectively. Denote by
$$N_\xi := \mathfrak{N}(L), \qquad N_m, \quad m = 1, 2, \dots, \xi,$$
the subspace generated by the generalized eigenvectors $e_1, e_2, \dots, e_m$. The proof of the following theorem represents a modified concept of the gradient descent method.
Theorem 2. 
Assume that the matrix $W \in \mathbb{C}^{n\times n}$ is normal; then the following relations hold:
$$\min_{A^* D A = e^{i\theta} I} \sum_{j=1}^m \operatorname{Re}(L_1 a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n} = \sum_{j=1}^m \operatorname{Re}\beta_j, \qquad \min_{A^* D A = e^{i\theta} I} \sum_{j=1}^m \operatorname{Im}(L_1 a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n} = \sum_{j=1}^m \operatorname{Im}\gamma_j. \tag{7}$$
Proof. 
Consider the following reasoning:
$$e^{-i\theta}\sum_{j=1}^m (L a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n} = e^{-i\theta}\sum_{j=1}^m \left(\sum_{s=1}^{\xi} c_{sj}\beta_s D g_s,\ \sum_{s=1}^{\xi} c_{sj} g_s\right)_{\mathbb{C}^n} = \sum_{j=1}^m\sum_{s=1}^{\xi} |c_{sj}|^2 \beta_s = \sum_{s=1}^{\xi}\beta_s \sum_{j=1}^m |c_{sj}|^2,$$
where we used the decomposition
$$a_{\cdot j} = \sum_{s=1}^{\xi} c_{sj}\, g_s.$$
Therefore, taking the real parts on both sides of the equality, we get
$$\operatorname{Re}\sum_{j=1}^m (L_1 a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n} = \sum_{s=1}^{\xi} \operatorname{Re}\beta_s \sum_{j=1}^m |c_{sj}|^2.$$
Note that
$$e^{i\theta}\cdot\delta_{jk} = (D a_{\cdot j}, a_{\cdot k})_{\mathbb{C}^n} = \left(\sum_{s=1}^{\xi} c_{sj} D g_s,\ \sum_{s=1}^{\xi} c_{sk} g_s\right)_{\mathbb{C}^n} = e^{i\theta}\sum_{s=1}^{\xi} c_{sj}\bar{c}_{sk}; \qquad \sum_{s=1}^{\xi} c_{sj}\bar{c}_{sk} = \delta_{jk}.$$
Thus, the first relation in Equation (7) can be rewritten in the form
$$\min_{C^* C = I}\ \sum_{s=1}^{\xi} \operatorname{Re}\beta_s \sum_{j=1}^m |c_{sj}|^2, \qquad \sum_{j=1}^m\sum_{s=1}^{\xi} |c_{sj}|^2 = m,$$
where $C := \{c_{sj}\} \in \mathbb{C}^{\xi\times m}$. Denote
$$x_s := \sum_{j=1}^m |c_{sj}|^2;$$
then the expression corresponding to the main problem can be rewritten in the form
$$f_\xi(x_1, x_2, \dots, x_\xi) := \sum_{s=1}^{\xi} x_s \operatorname{Re}\beta_s, \qquad \sum_{s=1}^{\xi} x_s = m, \tag{8}$$
where the function $f_\xi$ is a function of several real variables defined on $\mathbb{R}^\xi$. Substituting $x_\xi = m - \sum_{s=1}^{\xi-1} x_s$, we get
$$g_{\xi-1}(x_1, x_2, \dots, x_{\xi-1}) := \sum_{s=1}^{\xi-1} x_s\operatorname{Re}\beta_s + \operatorname{Re}\beta_\xi\left(m - \sum_{s=1}^{\xi-1} x_s\right) = \sum_{s=1}^{\xi-1} x_s\operatorname{Re}(\beta_s - \beta_\xi) + m\operatorname{Re}\beta_\xi,$$
$$\sum_{s=1}^{\xi-1} x_s \le m.$$
Therefore,
$$\nabla g_{\xi-1} = (c_{1\xi}, c_{2\xi}, \dots, c_{\xi-1\,\xi})^T, \qquad c_{s\xi} := \operatorname{Re}(\beta_s - \beta_\xi) < 0.$$
Combining this fact with the properties of the hyperplane, we get
$$\min_{x\in R_m^{\xi-1}} g_{\xi-1}(x) = \min_{x\in T_m^{\xi-1}}\ \sum_{s=1}^{\xi-1} x_s\operatorname{Re}\beta_s = \min_{x\in T_m^{\xi-1}} f_{\xi-1}(x),$$
where
$$R_m^n := \left\{x\in\mathbb{R}^n: \sum_{s=1}^n x_s \le m\right\}, \qquad T_m^n := \left\{x\in\mathbb{R}^n: \sum_{s=1}^n x_s = m\right\}, \quad n, m\in\mathbb{N}.$$
Implementing the same reasoning, we get
$$g_{\xi-2}(x_1, x_2, \dots, x_{\xi-2}) := \sum_{s=1}^{\xi-2} x_s\operatorname{Re}\beta_s + \operatorname{Re}\beta_{\xi-1}\left(m - \sum_{s=1}^{\xi-2} x_s\right) =$$
$$= \sum_{s=1}^{\xi-2} x_s\operatorname{Re}(\beta_s - \beta_{\xi-1}) + m\operatorname{Re}\beta_{\xi-1}, \qquad \sum_{s=1}^{\xi-2} x_s \le m.$$
Therefore, the constant vector $\nabla g_{\xi-2}$ has negative coordinates. Analogously to the above, we get
$$\min_{x\in R_m^{\xi-2}} g_{\xi-2}(x) = \min_{x\in T_m^{\xi-2}} f_{\xi-2}(x).$$
Implementing the same reasoning, we come to the problem
$$\min_{x\in T_m^m} f_m(x).$$
On the other hand, we have
$$f_m(x) = \sum_{s=1}^m \operatorname{Re}\beta_s\sum_{j=1}^m |c_{sj}|^2, \qquad \sum_{s=1}^m \bar{c}_{sk}\, c_{sj} = \delta_{jk}, \quad j, k = 1, 2, \dots, m.$$
Note that the last condition can be rewritten in the form
$$C^* C = I, \quad C \in \mathbb{C}^{m\times m}, \quad C C^* = I,$$
i.e., $C$ is a unitary matrix. Thus, in accordance with the well-known property of unitary matrices, the latter equality can be rewritten in the form
$$\sum_{j=1}^m c_{sj}\bar{c}_{kj} = \delta_{sk}, \quad s, k = 1, 2, \dots, m.$$
Substituting this relation into Equation (8), we obtain
$$\min_{A^* D A = e^{i\theta} I}\sum_{j=1}^m \operatorname{Re}(L_1 a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n} = \min_{x\in T_m^m} f_m(x) = \sum_{s=1}^m \operatorname{Re}\beta_s.$$
The proof corresponding to the imaginary part is absolutely analogous. □
The following lemma represents a technical tool for constructing mappings having the required properties.
Lemma 2. 
Assume that the matrix $W \in \mathbb{C}^{n\times n}$ is normal and that the following relations hold:
$$\operatorname{Re}\beta_j = \operatorname{Re}\lambda_p, \quad \operatorname{Im}\gamma_j = \operatorname{Im}\lambda_q, \quad j, p, q \in \{1, 2, \dots, m\};$$
then, for a matrix $Y \in \mathbb{C}^{n\times m}$ satisfying the conditions
$$\{y_{\cdot j}\}_1^m \subset N_m, \qquad (D y_{\cdot s}, y_{\cdot j})_{\mathbb{C}^n} = e^{i\theta}\cdot\delta_{sj}, \quad s, j = 1, 2, \dots, m,$$
the following relation holds:
$$\operatorname*{argmin}_{A^* D A = e^{i\theta} I}\ \operatorname{tr}\{A^* L A\} = Y, \quad A \in \mathbb{C}^{n\times m}.$$
Proof. 
Applying Lemma 1 and Theorem 2, we obtain
$$\min_{A^* D A = e^{i\theta} I}\left|\operatorname{tr}\{A^* L A\}\right|^2 = \min_{A^* D A = e^{i\theta} I}\left|\sum_{j=1}^m (L_1 a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n}\right|^2 =$$
$$= \min_{A^* D A = e^{i\theta} I}\left[\left(\sum_{j=1}^m \operatorname{Re}(L_1 a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n}\right)^2 + \left(\sum_{j=1}^m \operatorname{Im}(L_1 a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n}\right)^2\right] \ge$$
$$\ge \min_{A^* D A = e^{i\theta} I}\left(\sum_{j=1}^m \operatorname{Re}(L_1 a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n}\right)^2 + \min_{A^* D A = e^{i\theta} I}\left(\sum_{j=1}^m \operatorname{Im}(L_1 a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n}\right)^2 =$$
$$= \left(\sum_{j=1}^m \operatorname{Re}\beta_j\right)^2 + \left(\sum_{j=1}^m \operatorname{Im}\gamma_j\right)^2 = \left(\sum_{j=1}^m \operatorname{Re}\lambda_j\right)^2 + \left(\sum_{j=1}^m \operatorname{Im}\lambda_j\right)^2.$$
The latter relation holds due to the one-to-one correspondence between the eigenvalues and their real or imaginary parts, numbered in order of increasing values. In accordance with the well-known theorem on the trace of a finite-dimensional operator, analogously to the proof of Theorem 1, taking into account Equations (2) and (3), we get
$$\left|\sum_{j=1}^m (L_1 y_{\cdot j}, y_{\cdot j})_{\mathbb{C}^n}\right|^2 = \left|\sum_{j=1}^m \left(D^{-1}L_1\sqrt{\operatorname{Re}(D)}\, y_{\cdot j}, \sqrt{\operatorname{Re}(D)}\, y_{\cdot j}\right)_{\mathbb{C}^n} + i\sum_{j=1}^m \left(D^{-1}L_1\sqrt{\operatorname{Im}(D)}\, y_{\cdot j}, \sqrt{\operatorname{Im}(D)}\, y_{\cdot j}\right)_{\mathbb{C}^n}\right|^2 =$$
$$= \left|\sum_{j=1}^m \left(D^{-1}L_1\sqrt{\operatorname{Re}(D)}\, e_j, \sqrt{\operatorname{Re}(D)}\, e_j\right)_{\mathbb{C}^n} + i\sum_{j=1}^m \left(D^{-1}L_1\sqrt{\operatorname{Im}(D)}\, e_j, \sqrt{\operatorname{Im}(D)}\, e_j\right)_{\mathbb{C}^n}\right|^2 =$$
$$= \left|\sum_{j=1}^m (L_1 e_j, e_j)_{\mathbb{C}^n}\right|^2 = \left(\sum_{j=1}^m \operatorname{Re}(L_1 e_j, e_j)_{\mathbb{C}^n}\right)^2 + \left(\sum_{j=1}^m \operatorname{Im}(L_1 e_j, e_j)_{\mathbb{C}^n}\right)^2 =$$
$$= \left(\sum_{j=1}^m \operatorname{Re}\lambda_j\right)^2 + \left(\sum_{j=1}^m \operatorname{Im}\lambda_j\right)^2.$$
The latter relation leads to the desired result. □

3.1.2. Restriction of the Dimension

Consider the sets of elements
$$u_1, u_2, \dots, u_n \in \mathbb{R}^l, \qquad v_1, v_2, \dots, v_n \in \mathbb{R}^p, \quad l, p \in \mathbb{N}, \tag{9}$$
and assume that the graph $G$ is constructed in accordance with one of the methods described in the preliminaries, i.e., the $\varepsilon$-neighborhood or $n$-nearest neighbors method. Thus, we can put weight matrices $W^{(1)}, W^{(2)}$ into correspondence with the sets in the given order, where
$$W^{(1)}_{sj} = \begin{cases} e^{-t^{-1}\|u_s - u_j\|^2_{E^l}}, & s\sim j, \\ 0, & s\nsim j,\end{cases} \qquad W^{(2)}_{sj} = \begin{cases} e^{-t^{-1}\|v_s - v_j\|^2_{E^p}}, & s\sim j, \\ 0, & s\nsim j.\end{cases}$$
The following matrix is called the matrix of the coupled mapping:
$$W = \alpha H + i\beta H, \quad \alpha, \beta > 0, \quad \alpha + \beta = 1, \tag{10}$$
where
$$H = \{H_{sj}\} \in \mathbb{C}^{n\times n}, \quad H_{sj} = \eta W^{(1)}_{sj} + \mu W^{(2)}_{sj}, \quad \eta, \mu > 0, \quad \eta + \mu = 1.$$
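A sketch of the construction of the coupled matrix (the helper name is ours); the normality of $W$ and the value of $\arg(Dz, z)_{\mathbb{C}^n}$ can be checked directly:

```python
import numpy as np

def coupled_matrix(W1, W2, alpha, eta):
    """Matrix of the coupled mapping, Equation (10): W = alpha*H + i*beta*H,
    H = eta*W1 + mu*W2, with beta = 1 - alpha and mu = 1 - eta."""
    H = eta * W1 + (1 - eta) * W2
    return (alpha + 1j * (1 - alpha)) * H
```

Since $W$ is a real symmetric matrix multiplied by the scalar $\alpha + i\beta$, its Hermitian components commute and $W$ is normal; likewise, $(Dz, z)_{\mathbb{C}^n}$ is a positive number times $\alpha + i\beta$, so its argument is $\arctan(\beta/\alpha)$.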
Since the Hermitian components of the operator $W$ commute, we conclude that the operator $W$ is normal; this fact can also be established by direct calculation. Thus, we can apply the propositions proved in the previous paragraphs. Note that the values of the parameters $\alpha, \beta, \eta, \mu$ reflect the influence of the corresponding components on the image. Let us construct the corresponding generalized graph Laplacian operator $L = D - W$. It is not hard to prove that relation Equation (6) holds with $\theta = \arctan(\beta/\alpha)$; we should notice that, in accordance with Equation (10), we have
$$\theta = \arg(Dz, z)_{\mathbb{C}^n} = \arctan\frac{\operatorname{Im}(Dz, z)_{\mathbb{C}^n}}{\operatorname{Re}(Dz, z)_{\mathbb{C}^n}} = \arctan\frac{\beta}{\alpha}, \quad z \in \mathbb{C}^n.$$
Analogously, we get
$$\arg(Lz, z)_{\mathbb{C}^n} = \arctan\frac{\beta}{\alpha}, \quad z \in \mathbb{C}^n.$$
Note that the Hermitian components of the operator $L$ are diagonally dominant matrices; therefore, they are positive semidefinite. We have
$$\operatorname{Re}(L e_j, e_j)_{\mathbb{C}^n} > 0, \quad \operatorname{Im}(L e_j, e_j)_{\mathbb{C}^n} > 0, \quad j = 1, 2, \dots, \xi.$$
Using the relation $(L e_j, e_j)_{\mathbb{C}^n} = \lambda_j e^{i\theta}$, $j = 1, 2, \dots, \xi$, we easily obtain $\arg(L z, z)_{\mathbb{C}^n} = \theta$; therefore, $\arg\lambda_j = 0$, $j = 1, 2, \dots, \xi$. Without loss of generality, assume that $l \ge p$. Then we can put a subset of $\mathbb{C}^l$ into correspondence with the sets of Equation (9) in the following way: $x_j = u_j + i v_j$, $j = 1, 2, \dots, n$, where we assume that the element $v_j$ has zero coordinates with indexes greater than the value $p < l$.
Thus, if we find a mapping from \mathbb{C}^l into \mathbb{C}^m, 1 \leq m \leq p, i.e., F : \mathbb{C}^l \to \mathbb{C}^m, then due to the correspondence between the real and imaginary parts of the complex numbers we naturally obtain the coupled mappings
F_1 : \mathbb{R}^l \to \mathbb{R}^m, \quad F_2 : \mathbb{R}^p \to \mathbb{R}^m.
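The coupled construction above can be illustrated by a minimal numerical sketch (plain NumPy; the toy data, the parameter values, and the choice t < 0 for a decaying affinity are our own illustrative assumptions). It builds W = αH + iβH with H = ηW^{(1)} + μW^{(2)}, forms L = D − W, and checks normality of W together with the relation arg(Lz, z) = arctan(β/α).

```python
import numpy as np

def gaussian_affinity(X, t=-1.0):
    """Weight matrix exp(t^{-1}||x_s - x_j||^2) with zero diagonal (t < 0 assumed)."""
    d2 = np.sum((X[:, None] - X[None, :]) ** 2, -1)
    W = np.exp(d2 / t)
    np.fill_diagonal(W, 0.0)          # W_ss = 0
    return W

def coupled_laplacian(U, V, alpha=0.6, beta=0.4, eta=0.5, mu=0.5):
    H = eta * gaussian_affinity(U) + mu * gaussian_affinity(V)
    W = (alpha + 1j * beta) * H       # W = alpha*H + i*beta*H
    D = np.diag(W.sum(1))
    return W, D, D - W                # generalized graph Laplacian L = D - W

rng = np.random.default_rng(0)
U, V = rng.normal(size=(6, 4)), rng.normal(size=(6, 3))
W, D, L = coupled_laplacian(U, V)
print(np.allclose(W @ W.conj().T, W.conj().T @ W))   # W is normal
z = rng.normal(size=6) + 1j * rng.normal(size=6)
# arg(Lz, z) = arctan(beta/alpha), since L = (alpha + i*beta)(D_H - H)
print(np.isclose(np.angle(np.vdot(z, L @ z)), np.arctan(0.4 / 0.6)))
```

Both checks hold because the real and imaginary Hermitian components of W are proportional to the same real symmetric matrix H.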
Assume that the matrix W satisfies Equation (10); consider the following problem
\operatorname{argmin}_{A^{*} D A = e^{i\theta} I} \; \sum_{s,j=1}^{n} \| a_{s} - a_{j} \|^{2}_{\mathbb{C}^m} \, W_{sj}, \quad A \in \mathbb{C}^{n \times m}.
Thus, we arrive at the following theorem.
Theorem 3. 
The solution of Equation (11) is represented by a set \{y_j\}_{1}^{n} \subset \mathbb{C}^m satisfying the following conditions
\{ y_{j\cdot} \}_{1}^{m} \in N_m, \quad (D y_{s\cdot}, y_{j\cdot})_{\mathbb{C}^n} = e^{i\theta} \, \delta_{sj}, \quad s, j = 1, 2, \ldots, m.
Moreover, the following coupled mapping is defined
u_j \mapsto \operatorname{Re} y_j, \quad v_j \mapsto \operatorname{Im} y_j, \quad j = 1, 2, \ldots, n.
Proof. 
The proof follows immediately from Lemmas 1 and 2.   □
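A solution of the kind described in Theorem 3 can be obtained numerically from the generalized eigenproblem for the Hermitian part of the construction. The following sketch (plain NumPy; toy data and variable names are our own assumptions) solves L_H y = λ D_H y via the symmetric reduction D_H^{-1/2} L_H D_H^{-1/2} and verifies the D-orthonormality of the resulting embedding coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)
U, V = rng.normal(size=(8, 5)), rng.normal(size=(8, 3))

def affinity(X, t=-1.0):
    d2 = np.sum((X[:, None] - X[None, :]) ** 2, -1)
    A = np.exp(d2 / t)
    np.fill_diagonal(A, 0.0)
    return A

H = 0.5 * affinity(U) + 0.5 * affinity(V)     # H = eta*W1 + mu*W2
D = np.diag(H.sum(1))
L = D - H                                     # Laplacian of the Hermitian part
Dih = np.diag(1.0 / np.sqrt(np.diag(D)))      # D^{-1/2}
lam, Q = np.linalg.eigh(Dih @ L @ Dih)        # symmetric generalized eigenproblem
Y = Dih @ Q                                   # columns satisfy Y^T D Y = I
m = 2
E = Y[:, 1:m + 1]                             # skip the trivial (constant) eigenvector
print(np.allclose(E.T @ D @ E, np.eye(m)))    # D-orthonormality of the solution
```

The columns of E then furnish the coordinates y_j; the coupled mappings arise once the data are identified with complex points x_j = u_j + i v_j.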

3.1.3. Probabilistic Approach

It is remarkable that Theorem 3 creates a prerequisite for further optimization problems. In this regard, the Kullback–Leibler divergence is the most suitable tool for measuring, in practice, how well the characteristics of the elements are preserved. The generalized graph Laplacian method projects the distinct unmatched features across single-cell multi-omics data sets into a common embedding space. The aim is to preserve the intrinsic low-dimensional structures and to align the cells simultaneously. The next reasonable step is to apply an analog of the t-distributed stochastic neighbor embedding method [22]. Stochastic Neighbor Embedding (SNE) [23] starts by converting the high-dimensional Euclidean distances between data elements into conditional probabilities that represent similarities. The similarity of the data element x_j to the data element x_s (s, j = 1, 2, …, n) is the conditional probability p_{j|s} that x_s would pick x_j as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at x_s. For nearby data elements, p_{j|s} is relatively high, while for widely separated data elements, p_{j|s} is almost infinitesimal for reasonable values of the variance σ_s of the Gaussian. Formally, the conditional probability p_{j|s} is defined as follows
p_{j|s} := \frac{e^{-\|x_s - x_j\|^2 / 2\sigma_s^2}}{\sum_{m \neq s} e^{-\|x_s - x_m\|^2 / 2\sigma_s^2}},
where σ_s is the variance of the Gaussian [22] centered on the data element x_s. Since we are only interested in modeling pairwise similarities, we set p_{s|s} = 0. For the low-dimensional counterparts y_s and y_j of the high-dimensional data elements x_s and x_j, it is possible to compute a similar conditional probability, denoted by q_{j|s}. We set the variance of the Gaussian employed in the calculation of the conditional probabilities q_{j|s} to the value 2^{-1/2}. Therefore, we model the similarity of the map element y_j to the map element y_s by
q_{j|s} := \frac{e^{-\|y_s - y_j\|^2}}{\sum_{m \neq s} e^{-\|y_s - y_m\|^2}}.
Analogously to the previous case, we set q_{s|s} = 0. In accordance with the description given in [22], if the map elements y_s and y_j correctly model the similarity between the high-dimensional data elements x_s and x_j, the conditional probabilities p_{j|s} and q_{j|s} will be equal. Motivated by this observation, the SNE method aims to find a low-dimensional data representation that minimizes the mismatch between p_{j|s} and q_{j|s}. A natural measure of the faithfulness with which q_{j|s} models p_{j|s} is the Kullback–Leibler divergence
\mathrm{KL}(P_s \,\|\, Q_s) = \sum_{j=1}^{n} p_{j|s} \log \frac{p_{j|s}}{q_{j|s}},
where P s represents the conditional probability distribution corresponding to the data element x s over all other data elements. The symbol Q s represents the conditional probability distribution corresponding to the map element y s over all other map elements. The indefinite value of the function for elements with the same indexes is defined as follows
p_{s|s} \log \frac{p_{s|s}}{q_{s|s}} := 0, \quad s = 1, 2, \ldots, n.
The SNE method minimizes the sum of Kullback–Leibler divergences over all data elements using a gradient descent method. The objective function is represented by the following expression
\sum_{s=1}^{n} \mathrm{KL}(P_s \,\|\, Q_s).
Since the Kullback–Leibler divergence is not symmetric, different types of error in the pairwise similarities are not weighted equally. In particular, there is a large cost for using widely separated map elements to represent nearby data elements, i.e., for using a small q_{j|s} to model a large p_{j|s}, but only a small cost for using nearby map elements to represent widely separated data elements. This observation leads to the conclusion that the cost function focuses on preserving the local structure of the data in the mapping. In accordance with [22], in most cases there does not exist a single value of σ_s that is appropriate for all data elements in the data set, since the density of the data is variable. The method presented in [22] provides a scheme of reasonings that allows one to choose a suitable value of the parameter σ_s. Although we consider the symmetric case below, which corresponds to a constant value of σ_s, we outline the general idea of the selection strategy for the variance σ_s.
In accordance with [22], in the general case it is not likely that there is a single value of σ_s that is optimal for all data elements, because the density of the data is likely to vary. In dense regions, a smaller value of σ_s is usually more appropriate than in sparser regions. Any particular value of σ_s induces a probability distribution P_s over all of the other data elements. This distribution has an entropy that increases as σ_s increases. In accordance with [23], SNE performs a binary search for the value of σ_s that produces a distribution P_s with a fixed perplexity [22] specified by the user. The perplexity is defined as follows
\mathrm{Perp}(P_s) = 2^{H(P_s)},
where H ( P s ) is the Shannon entropy defined as follows
H(P_s) := -\sum_{j=1}^{n} p_{j|s} \log_2 p_{j|s}, \quad p_{j|s} = \frac{e^{-\|x_s - x_j\|^2 / 2\sigma_s^2}}{\sum_{m \neq s} e^{-\|x_s - x_m\|^2 / 2\sigma_s^2}}.
As mentioned above, the perplexity is specified by the user [22,23], which gives a key to obtaining the corresponding set \{\sigma_s\}_{1}^{n}.
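The binary search for σ_s can be sketched as follows (plain NumPy; the toy data, the target perplexity, and the log-scale bisection are our own illustrative assumptions). It exploits the monotonic growth of the entropy H(P_s) in σ_s.

```python
import numpy as np

def cond_probs(d2_row, s, sigma):
    """Conditional probabilities p_{j|s} from squared distances to element s."""
    p = np.exp(-d2_row / (2 * sigma ** 2))
    p[s] = 0.0                               # p_{s|s} = 0
    return p / p.sum()

def sigma_for_perplexity(d2_row, s, target, iters=60):
    """Bisect (in log scale) for sigma_s with Perp(P_s) = 2^{H(P_s)} = target."""
    lo, hi = 1e-8, 1e8
    for _ in range(iters):
        sigma = np.sqrt(lo * hi)
        p = cond_probs(d2_row, s, sigma)
        h = -np.sum(p[p > 0] * np.log2(p[p > 0]))   # Shannon entropy H(P_s)
        if 2.0 ** h > target:
            hi = sigma                       # entropy grows with sigma: shrink it
        else:
            lo = sigma
    return sigma

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
d2 = np.sum((X[:, None] - X[None, :]) ** 2, -1)
sigma0 = sigma_for_perplexity(d2[0], 0, target=10.0)
p0 = cond_probs(d2[0], 0, sigma0)
h0 = -np.sum(p0[p0 > 0] * np.log2(p0[p0 > 0]))
print(round(2.0 ** h0, 2))                   # close to the target perplexity 10
```

The same search is run once per data element in the non-symmetric case; in the symmetric setting considered below, a single σ suffices.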
The fact that the solution of Equation (11) is not unique allows us to extend the theoretical investigation and apply the probabilistic approach presented above. Thus, the idea is to choose a concrete solution from the set of solutions belonging to the generalized eigenvector subspace of the desirable dimension.
This solution should satisfy the penalty term given by Kullback–Leibler divergence. Here, we use the following notations
X = (x_1, x_2, \ldots, x_n)^T, \quad Y = (y_1, y_2, \ldots, y_n)^T.
In accordance with Theorem 3, we have
X_1 \mapsto Y_1, \quad X_2 \mapsto Y_2,
where
X 1 : = Re X , Y 1 : = Re Y , X 2 : = Im X , Y 2 : = Im Y .
Consider the constructions related to the symmetric SNE [22]; in this case, we have p_{j|s} = p_{s|j}, q_{j|s} = q_{s|j}; thus, we can use the unified notations p_{sj}, q_{sj}. The pairwise similarities are given by
P_s^{(1)} := \{p^{(1)}_{sj}\}, \quad p^{(1)}_{sj} := \frac{e^{-\|\operatorname{Re}(x_s - x_j)\|^2 / 2\sigma^2}}{\sum_{m \neq s} e^{-\|\operatorname{Re}(x_s - x_m)\|^2 / 2\sigma^2}}, \qquad Q_s^{(1)} := \{q^{(1)}_{sj}\}, \quad q^{(1)}_{sj} := \frac{e^{-\|\operatorname{Re}(y_s - y_j)\|^2}}{\sum_{m \neq s} e^{-\|\operatorname{Re}(y_s - y_m)\|^2}}.
Analogously, we define P s ( 2 ) , Q s ( 2 ) , where
p^{(2)}_{sj} := \frac{e^{-\|\operatorname{Im}(x_s - x_j)\|^2 / 2\sigma^2}}{\sum_{m \neq s} e^{-\|\operatorname{Im}(x_s - x_m)\|^2 / 2\sigma^2}}, \qquad q^{(2)}_{sj} := \frac{e^{-\|\operatorname{Im}(y_s - y_j)\|^2}}{\sum_{m \neq s} e^{-\|\operatorname{Im}(y_s - y_m)\|^2}}.
Using the assumptions related to the equal indexes and the indefinite value of the function given above, consider the following optimization problem
min Y 1 , Y 2 F ( Y 1 , Y 2 ) ,
where the functional is defined as follows
F(Y_1, Y_2) = \sum_{s=1}^{n} \mathrm{KL}(P_s^{(1)} \,\|\, Q_s^{(1)}) + \sum_{s=1}^{n} \mathrm{KL}(P_s^{(2)} \,\|\, Q_s^{(2)}) + \zeta \, \|Y_1 - Y_2\|_F^2, \quad \zeta \geq 0.
The penalty term reflects the distance between the images; however, it can be regulated by the parameters α, β given in Paragraph 3.1.2. Moreover, the problem corresponding to the value ζ = 0 is of interest in itself due to the exclusive natural structure of the complex plane.
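A runnable sketch of the coupled objective F(Y_1, Y_2) is given below (our own minimal implementation in plain NumPy; the toy data and the row-wise normalization are illustrative assumptions).

```python
import numpy as np

def pair_probs(Z, sigma2=None):
    """Row-normalized similarities; Gaussian with variance sigma2 (or fixed for Q)."""
    d2 = np.sum((Z[:, None] - Z[None, :]) ** 2, -1)
    P = np.exp(-d2 / (2 * sigma2)) if sigma2 else np.exp(-d2)
    np.fill_diagonal(P, 0.0)
    return P / P.sum(1, keepdims=True)

def kl_rows(P, Q):
    """Sum of row-wise Kullback-Leibler divergences, with 0*log(0/q) := 0."""
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

def objective(X, Y1, Y2, sigma2=1.0, zeta=0.1):
    """F(Y1, Y2) = KL(P1||Q1) + KL(P2||Q2) + zeta * ||Y1 - Y2||_F^2."""
    P1, P2 = pair_probs(X.real, sigma2), pair_probs(X.imag, sigma2)
    Q1, Q2 = pair_probs(Y1), pair_probs(Y2)
    return kl_rows(P1, Q1) + kl_rows(P2, Q2) + zeta * np.sum((Y1 - Y2) ** 2)

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 4)) + 1j * rng.normal(size=(10, 4))   # coupled data
Y = rng.normal(size=(10, 2))
print(objective(X, Y, Y) >= 0.0)     # the zeta-term vanishes for Y1 = Y2
```

In practice the minimization over (Y_1, Y_2) would be performed by gradient descent, as in the SNE family of methods.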

3.1.4. Characteristic of the Mapping

Consider an abstract approach to a function creating a weight matrix; assume that
W s j = ψ ( x s , x j ) , s , j = 1 , 2 , , n ,
where ψ is a complex-valued function. Thus, having imposed on the function ψ the corresponding conditions formulated in the previous paragraphs, we can create a mapping satisfying the required properties. It is remarkable that by virtue of the function ψ we can give sense, dictated by the relations between the elements x_s and x_j, to the values W_{sj}, i.e., the value W_{sj} reflects the degree of a relationship between the elements x_s and x_j. Apparently, we can state that Theorem 1 provides an indicator of the function ψ in some sense. Assume that the conditions of Theorem 1 hold; applying Lemma 1, we observe the following relation
\operatorname{tr}\{A^{*} L A\} = \frac{1}{2} \sum_{s,j=1}^{n} \| a_s - a_j \|^2_{\mathbb{C}^n} \, W_{sj} = \sum_{j=1}^{n} (L a_{\cdot j}, a_{\cdot j})_{\mathbb{C}^n}, \quad A^{*} D A = e^{i\theta} I.
On the other hand, in accordance with Theorem 1, we have
\operatorname{tr}\{A^{*} L A\} = 2 e^{i\theta} \operatorname{tr}(D^{-1} L) = 2 e^{i\theta} \sum_{j=1}^{n} \left( 1 - \frac{W_{jj}}{D_{jj}} \right),
here we should notice the obvious fact that |W_{jj}/D_{jj}| \leq 1. Therefore,
\sum_{s,j=1}^{n} \| a_s - a_j \|^2_{\mathbb{C}^n} \, W_{sj} = 4 e^{i\theta} \sum_{j=1}^{n} \left( 1 - \frac{W_{jj}}{D_{jj}} \right), \quad A^{*} D A = e^{i\theta} I.
Motivated by this phenomenon, we introduce the functional f(ψ) as an indicator of the influence of the function ψ on the constructed mapping, where
f(\psi) := e^{i\theta} \sum_{j=1}^{n} \left( 1 - \frac{W_{jj}}{D_{jj}} \right).
A simple heuristic reasoning leads us to the conclusion that the smaller the values W_{sj}, s ≠ j, indicating a relationship between the elements, the more efficient the corresponding mapping from the point of view of the elements' closeness. It is rather clear that Theorem 1 can be used to fully reveal the properties of a function that are relevant within the context. In order to observe the particular case given by the weight matrix corresponding to distributions, assume that ψ(x_s, x_j) = p_{sj}, s, j = 1, 2, …, n; then, in accordance with the assumptions accepted above, D_{ss} = 1, W_{ss} = 0; therefore, f(ψ) = n e^{iθ}. The latter relation reflects an intrinsic connection between the weight matrix and the dimension of the complex Euclidean space. This rather simple observation fully reveals the dependence of the mapping on the function ψ, since the last equality can be used as a benchmark.
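The indicator functional and the benchmark value n·e^{iθ} for a distribution-type weight matrix can be checked directly (a minimal sketch; the uniform row-stochastic matrix is our own illustrative example).

```python
import numpy as np

def f_indicator(W, theta):
    """f(psi) = e^{i*theta} * sum_j (1 - W_jj / D_jj), with D_jj the row sums."""
    D = W.sum(axis=1)
    return np.exp(1j * theta) * np.sum(1.0 - np.diag(W) / D)

# For a weight matrix built from a distribution (W_ss = 0, D_ss = 1),
# the benchmark value n * e^{i*theta} is recovered.
n, theta = 5, np.arctan(0.4 / 0.6)
P = np.full((n, n), 1.0 / (n - 1))
np.fill_diagonal(P, 0.0)                 # row-stochastic, zero diagonal
print(np.isclose(f_indicator(P, theta), n * np.exp(1j * theta)))
```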

3.2. Unsupervised Manifold Alignment via the Reproducing Kernel

Nowadays, many methods of sequencing individual cells are available, but difficulties arise when applying several different sequencing methods to the same cell. In the paper [6], the authors present the unsupervised manifold alignment algorithm MMD-MA for integrating multiple measurements performed on various aliquots of a given cell population.
The idea of the method is to map the cell measurement data produced by the various methods into a common latent space.
According to the algorithm, single-cell data corresponding to several types of measurements are aligned by optimizing an objective function consisting of three components: (1) the maximum mean discrepancy (MMD), which imposes constraints on the mapping into the latent space; the “distances” between images should be minimal in the sense of some function determined by the MMD; (2) a distortion term, a construction that preserves the structure of the initial data under the mapping; and (3) a penalty function that eliminates the trivial solution. It should be noted that the MMD-MA method does not require any information on the correspondence between the cell data processing methods. The requirements for the considered type of mapping are rather weak. They allow the algorithm to integrate single-cell measurements of heterogeneous features, such as gene expression, DNA accessibility, chromatin structure, methylation, and imaging data. In the paper [6], the authors demonstrate the relevance of the MMD-MA method in simulated experiments and on a set of real data, including single-cell gene expression and methylation data.
In this paragraph, we study the construction of the mapping based on the concept of the reproducing kernel. In order to implement the classical approach, additional information from the theory of Hilbert spaces is provided. The issues related to the concept of the reproducing kernel in a Hilbert space are considered, and a theoretical justification of the opportunity to construct a suitable mapping is given. Further, we study the objective function consisting of three terms considered in the paper [6]. In conclusion, some ideas on the possibility of developing the mathematical concept of the method are presented.

3.2.1. Reproducing Kernel Hilbert Space

Let us define a complex linear space of elements of an arbitrary nature as a set of elements equipped with a linear operation acting into the initial set, i.e., linear combinations of elements belong to the initial set. A linear space with a given inner (scalar) product is called a unitary space. The inner product naturally defines a metric on the elements of a linear space. However, it may happen that some limits of sequences do not belong to the initial set of the linear space. Recall that a unitary space containing all its limiting elements is called a Hilbert space. There are some discrepancies in the definitions of the Hilbert space: in some sources there is an additional condition of infinite dimension, i.e., an infinite-dimensional complete unitary space is called a Hilbert space. As can be seen from the definition given above, the Euclidean space is a finite-dimensional unitary space over the field of real numbers. Further, we will denote the abstract Hilbert space by H and the corresponding inner product by ( f , g )_H, f, g ∈ H.
Let A be a set of elements of an arbitrary nature and let H be a system of complex functionals defined on the set A and forming a Hilbert space H. A complex functional of two variables K ( x , y ), x, y ∈ A, is called the reproducing kernel of the space H if the following condition is satisfied: for an arbitrary fixed value x, the functional K ( x , y ) is an element of the space H, in symbols K_x(y) := K ( x , y ) ∈ H; moreover, for an arbitrary functional f ∈ H, we have
f ( x ) = ( f , K x ) H .
It is remarkable that there exist Hilbert spaces of functionals defined on a set for which a reproducing kernel does not exist. A Hilbert space containing a reproducing kernel is called a reproducing kernel Hilbert space (RKHS). The best-known representatives of RKHS are the Bergman space and the Dirichlet space, i.e., spaces of holomorphic functions on the unit disk.
The following theorem establishes a criterion of the reproducing kernel existence.
Theorem 4. 
A Hilbert space H is an RKHS if and only if for an arbitrary element y ∈ A there exists a constant C_y such that
| f(y) | \leq C_y \| f \|_{\mathcal{H}}, \quad f \in \mathcal{H}.
We need the following theorem.
Theorem 5. 
(Moore–Aronszajn) Suppose that K ( x , y ) is a Hermitian symmetric, positive definite kernel on the set A × A. Then there exists a unique Hilbert space H of functions defined on A for which K is the reproducing kernel. The space is defined as follows: consider the linear span of the functionals K_x, x ∈ A; completing this set with respect to the norm generated by the following inner product, we obtain the desired RKHS
\left( \sum_{j=1}^{n} \alpha_j K_{x_j}, \; \sum_{j=1}^{m} \beta_j K_{y_j} \right)_{\mathcal{H}} := \sum_{j=1}^{n} \sum_{k=1}^{m} \alpha_j \overline{\beta_k} \, K(x_j, y_k), \quad \alpha_j, \beta_k \in \mathbb{C}.
Note that the axioms of the inner product can be verified directly taking into account the positive definiteness of the kernel K . According to this definition, we have
(K_x, K_y)_{\mathcal{H}} = K(x, y), \ x, y \in A; \qquad f = \sum_{j=1}^{\infty} \alpha_j K_{x_j}, \ x_j \in A, \ f \in \mathcal{H},
where convergence is understood in the sense of the norm of the Hilbert space H .
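The axioms can be illustrated numerically (a sketch; the Gaussian kernel and the random points are our own illustrative assumptions): the Gram matrix of a positive definite kernel is positive semidefinite, and the induced norm ||f||² = Σ_{j,k} α_j \overline{α_k} K(x_j, x_k) is a nonnegative real number.

```python
import numpy as np

def K(x, y, t=1.0):
    """Gaussian kernel, an assumed illustrative choice of a positive definite kernel."""
    return np.exp(-np.sum((x - y) ** 2) / t)

rng = np.random.default_rng(4)
pts = rng.normal(size=(6, 3))
G = np.array([[K(p, q) for q in pts] for p in pts])   # Gram matrix K(x_j, x_k)
alpha = rng.normal(size=6) + 1j * rng.normal(size=6)
# ||f||^2 for f = sum_j alpha_j K_{x_j}, via the Theorem 5 inner product
norm2 = alpha @ G @ alpha.conj()
print(np.min(np.linalg.eigvalsh(G)) > -1e-12,         # Gram matrix is PSD
      abs(norm2.imag) < 1e-12, norm2.real >= 0.0)     # norm is real and nonnegative
```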

3.2.2. Kernel-Based Mapping

Consider the sets of elements X_1, X_2; in general, these can be sets of elements of an arbitrary nature. The elements of the sets correspond to objects (cells, samples) having a set of features (various measurements reflecting one or more properties of the object). Thus, a correspondence naturally arises between a set of objects and a set of vectors whose coordinates are quantitative characteristics of the features. We will consider the subsets
\mathcal{X}_s = \{ x_1^{(s)}, x_2^{(s)}, \ldots, x_{n_s}^{(s)} \} \subset X_s, \quad s = 1, 2.
The numbers n 1 , n 2 reflect the number of objects under consideration. Note that we do not assume the existence of any similarity between these subsets. At the same time, the idea of further reasonings is to construct a mapping of these subsets into a certain space in which their images would be comparable.
As the main tool for the further study, we use positive definite, Hermitian symmetric kernels γ_s ( x , y ), x, y ∈ X_s, taking values in C. The biological meaning of these mathematical constructions may be to quantify the similarity between objects. Let us introduce the following notation
K^{(s)} = \{ K^{(s)}_{qj} \} := \{ \gamma_s ( x_q^{(s)}, x_j^{(s)} ) \}, \quad q, j = 1, 2, \ldots, n_s, \ s = 1, 2,
i.e., K^{(s)} ∈ C^{n_s × n_s}. Since, in accordance with the assumptions made, both kernels are Hermitian symmetric and positive definite, using Theorem 5 we can construct the Hilbert spaces H_s (RKHS) of functionals defined on X_s for which γ_s are the reproducing kernels.
Following the idea of finding mappings into some space in which the images of X s would be comparable, consider the following functionals
z_l^{(s)}(x) = \sum_{j=1}^{n_s} \alpha_{lj}^{(s)} \gamma_s ( x_j^{(s)}, x ), \quad x \in X_s, \ l = 1, 2, \ldots, p, \ p \in \mathbb{N}.
Note that in accordance with Theorem 5, these functionals are elements of the space H_s. Thus, the following operator is defined
F^{(s)} : X_s \to \mathbb{C}^p, \quad F^{(s)} x = z^{(s)}(x), \quad x \in X_s,
where
z ( s ) ( x ) = z 1 ( s ) ( x ) , z 2 ( s ) ( x ) , , z p ( s ) ( x ) T .
It is clear that the correspondence can be implemented in this way regardless of the nature of the set X_s; this universality underlies the idea of manifold alignment. Below, for convenience of notation, we omit the index s; however, the further reasonings are correct in both cases ( s = 1, 2 ). Denote
A C p × n , A : = α q j .
Note that
A K = \{ z_q ( x_j ) \} \in \mathbb{C}^{p \times n}.
Using Equation (13) and applying Theorem 5, we get
A K A^{*} = \{ v_{qj} \}, \quad v_{qj} = \sum_{k=1}^{n} z_q(x_k) \, \overline{\alpha_{jk}} = \sum_{k=1}^{n} \overline{\alpha_{jk}} \sum_{m=1}^{n} \alpha_{qm} \gamma(x_m, x_k) = \left( \sum_{m=1}^{n} \alpha_{jm} K_{x_m}, \; \sum_{k=1}^{n} \alpha_{qk} K_{x_k} \right)_{\mathcal{H}} = (z_j, z_q)_{\mathcal{H}}.
Applying these reasonings in the both cases corresponding to the value of the index s , we can rewrite the latter relations in the form
A^{(s)} K^{(s)} = \{ z_q^{(s)} ( x_j^{(s)} ) \} \in \mathbb{C}^{p \times n_s}, \quad A^{(s)} K^{(s)} A^{(s)*} = \{ ( z_j^{(s)}, z_q^{(s)} )_{\mathcal{H}_s} \} \in \mathbb{C}^{p \times p}.
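These identities are straightforward to verify numerically (a sketch; the Gaussian kernel and the random coefficients are our own illustrative assumptions): rows of A define the functionals z_q = Σ_m α_{qm} K_{x_m}, A K collects their values z_q(x_j), and A K A* collects the RKHS inner products of the functionals.

```python
import numpy as np

def gamma(x, y, t=1.0):
    """Illustrative Gaussian kernel standing in for gamma_s."""
    return np.exp(-np.sum((x - y) ** 2) / t)

rng = np.random.default_rng(5)
n, p = 7, 2
X = rng.normal(size=(n, 3))                     # the subset {x_j}
Kmat = np.array([[gamma(a, b) for b in X] for a in X])
A = rng.normal(size=(p, n)) + 1j * rng.normal(size=(p, n))

def z(q, x):                                    # z_q(x) = sum_m a_{qm} gamma(x_m, x)
    return sum(A[q, m] * gamma(X[m], x) for m in range(n))

vals = A @ Kmat                                 # (A K)_{qj} = z_q(x_j)
print(np.isclose(vals[1, 3], z(1, X[3])))
inner = A @ Kmat @ A.conj().T                   # (A K A*)_{qj} = (z_j, z_q)_H
print(np.allclose(inner, inner.conj().T))       # Gram of the functionals is Hermitian
```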
The next challenge is to select the mapping parameters in order to minimize the distances, understood in the sense of a function of the Euclidean metric, between the images. Using the Gaussian radial basis function (RBF), we will regulate the matrices A^{(s)} under a certain condition connecting them, determined by the relation between the elements belonging to X_s. Thus, we can determine the similarity between the elements of the sets X_s, s = 1, 2, in the sense of the distance between them given by the following formula
G^{(s,k)}_{qj} := G \left( z^{(s)} ( x_q^{(s)} ), \, z^{(k)} ( x_j^{(k)} ) \right), \quad s, k = 1, 2, \ q = 1, 2, \ldots, n_s, \ j = 1, 2, \ldots, n_k,
where the Gaussian RBF is defined as follows
G(u, v) := e^{t^{-1} \| u - v \|^2_{\mathbb{C}^p}}, \quad u, v \in \mathbb{C}^p, \ t \in \mathbb{R} \setminus \{0\}.
The choice of the parameter t is determined by the concrete application. Note that the term distance is used in a heuristic sense, since the construction does not satisfy the axioms of a metric space; the verification is left to the reader. It is clear that using the Gaussian RBF we can establish similarity, in the sense of the distance, between the elements of the sets X_s for identical and different values of s. This approach leads to the following function, reflecting both the relationship between elements within the sets and the relationship between elements of different sets
G \left( A^{(1)}, A^{(2)} \right) = \frac{1}{n_1^2} \sum_{i,j=1}^{n_1} G^{(1,1)}_{ij} - \frac{2}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} G^{(1,2)}_{ij} + \frac{1}{n_2^2} \sum_{i,j=1}^{n_2} G^{(2,2)}_{ij}.
In accordance with [6], the last formula is related to the MMD construction [24]. However, we use a distinct notation, since the similarity of the definitions is rather vague.
Taking into account the specifics of the applications, the authors of the paper [6] consider that it is not sufficient to find the minimum of the function G with respect to the matrices A^{(s)}. Thus, an additional term characterizing distortion is introduced in order to ensure the preservation of relations between the images of elements, i.e.,
\operatorname{dis} \left( A^{(s)} \right) = \left\| K^{(s)} - K^{(s)} A^{(s)*} A^{(s)} K^{(s)} \right\|_F,
where the Frobenius norm is used. This expression quantifies how much the matrix K ( s ) differs from the matrix of inner products after mapping. The restriction imposed on dis A ( s ) inherently guarantees that the distortion between the data in the initial space and the corresponding data after mapping should be small. Consider the following penalty function
\operatorname{pen} \left( A^{(s)} \right) = \left\| I_p - A^{(s)} K^{(s)} A^{(s)*} \right\|_F.
Note that in accordance with Equation (14), we have
I_p - A^{(s)} K^{(s)} A^{(s)*} = \left\{ \delta_{qj} - ( z_j^{(s)}, z_q^{(s)} )_{\mathcal{H}_s} \right\}, \quad q, j = 1, 2, \ldots, p.
The latter relation characterizes the influence of the penalty function: it favors a mapping whose coordinates are close to orthonormal. In order to ensure that the mapping to C^p satisfies this property, we add the penalty function as a summand.
Taking into account the above, we arrive at the problem of finding the minimum of the objective function
\operatorname{argmin}_{A^{(1)}, A^{(2)}} \; G \left( A^{(1)}, A^{(2)} \right) + \sum_{s=1}^{2} \left( \lambda_1 \operatorname{dis} \left( A^{(s)} \right) + \lambda_2 \operatorname{pen} \left( A^{(s)} \right) \right).
The solution can be found by the gradient descent method [6]. In practice, it is necessary to specify several parameters to solve the optimization problem: the dimension p of the space C^p, the parameter of the Gaussian RBF, and the parameters λ_1, λ_2. In the paper [6], the authors assume that the parameters p, λ_1, λ_2 are set by the user.
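The full objective can be assembled as follows (a self-contained sketch in plain NumPy; the toy data, real-valued A^{(s)}, and the sign convention t < 0 for a decaying Gaussian RBF are our own assumptions, not the implementation of [6]).

```python
import numpy as np

def rbf(U, V, t=-1.0):
    """Gaussian RBF matrix G(u, v) = exp(t^{-1}||u - v||^2), t < 0 for decay."""
    d2 = np.sum((U[:, None] - V[None, :]) ** 2, -1)
    return np.exp(d2 / t)

def objective(A1, A2, K1, K2, lam1=1.0, lam2=1.0):
    Z1, Z2 = (A1 @ K1).T, (A2 @ K2).T          # images z^(s)(x_j) as rows
    n1, n2 = Z1.shape[0], Z2.shape[0]
    G = (rbf(Z1, Z1).sum() / n1 ** 2           # MMD-like term
         - 2.0 * rbf(Z1, Z2).sum() / (n1 * n2)
         + rbf(Z2, Z2).sum() / n2 ** 2)
    total = G
    for A, K in ((A1, K1), (A2, K2)):
        dis = np.linalg.norm(K - K @ A.T @ A @ K)               # distortion term
        pen = np.linalg.norm(np.eye(A.shape[0]) - A @ K @ A.T)  # penalty term
        total += lam1 * dis + lam2 * pen
    return total

rng = np.random.default_rng(6)
n1, n2, p = 8, 9, 2
X1, X2 = rng.normal(size=(n1, 4)), rng.normal(size=(n2, 3))
K1, K2 = rbf(X1, X1), rbf(X2, X2)              # kernel Gram matrices
A1, A2 = 0.1 * rng.normal(size=(p, n1)), 0.1 * rng.normal(size=(p, n2))
val = objective(A1, A2, K1, K2)
print(np.isfinite(val) and val >= 0.0)         # each summand is nonnegative
```

Minimization over (A^{(1)}, A^{(2)}) would then be performed by gradient descent, e.g., with an automatic-differentiation framework.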
Now assume that the set of elements X s is endowed with the complex structure. In this case, we have a mapping naturally induced by mapping Equation (12), i.e.
F^{(s)} : \operatorname{Re} X_s \to \mathbb{R}^p, \quad F^{(s)} : \operatorname{Im} X_s \to \mathbb{R}^p.
Apparently, we obtain a mapping implementing the manifold alignment where a union of four data sets coupled in the natural way is considered. The advantage of the kernel-based approach is the following. Firstly, we should remark that the set X_s may or may not have a topological structure: it can be an infinite-dimensional space endowed with the Hilbert space structure or a finite set without any structure. This assumption allows great freedom in modifications of the method. In particular, interest arises in the case when X_s is endowed with the quaternion structure H. In this case, the problem is to modify the reproducing kernel construction in order to obtain the desirable mapping. Further generalizations in this direction lead us to the hypercomplex numbers. The arbitrary structure of the preimages is fully adapted to the implementation of the idea of uniting several data sets of various natures. However, the concept of the mapping requires a concrete technique that remains to be invented in the general case corresponding to the hypercomplex numbers.

3.3. Prospective Application to the Modern Methods

The mixOmics method [25] represents a specialized package for the R language, developed for multivariate analysis of biological data with emphasis on data exploration, dimensionality reduction, and results visualization. The input data structure in mixOmics assumes a matrix X of N samples by P predictors with continuous values and a categorical outcome vector y, which is automatically converted to an indicator matrix Y of N samples by K classes. The output comprises a comprehensive set of results, including latent components for projecting samples into a lower-dimensional space, loading vectors demonstrating each feature’s contribution, molecular signatures as selected features, and classification and prediction results for new samples, as well as various visualization types, including sample plots, variable plots, correlation networks, and heatmaps. All methods in the package are implemented as projection techniques where samples are summarized through H latent components defined as linear combinations of the original predictors. This approach ensures not only effective data dimensionality reduction but also preserves result interpretability, which is critically important for understanding biological mechanisms and identifying significant biomarkers in omics studies. However, the general concept of the mapping requires the unified structure presented in this paper; the kernel-based method can serve as a universal method for abstract data sets.
The MultiVI method [26] represents a deep generative model for probabilistic analysis of multimodal single-cell data, designed to integrate different molecular modalities into a unified representation. This method is developed for joint analysis of transcriptome, chromatin accessibility, and surface protein expression data from individual cells, even when information for some cells is available only for one or several modalities. MultiVI finds application in cellular diversity studies and characterization of cell types and states, as well as in data integration tasks from different laboratories and sequencing technologies. The method can combine single-cell RNA sequencing data (scRNA-seq), chromatin accessibility analysis (scATAC-seq), and surface protein measurements using antibodies (CITE-seq). MultiVI effectively works with both fully paired data, where all modalities are measured in the same cells, and partially paired or unpaired data, where different cells are characterized by different sets of modalities. The data processing workflow in MultiVI consists of several sequential stages. First, the method uses modality-specific encoders based on deep neural networks to create latent representations of each modality, including accounting for batch effects and technical data features. Then these representations are combined into a joint latent space through averaging with the application of a distance penalty between modalities. At the final stage, modality-specific decoders reconstruct observations from the latent representation using appropriate distributions for each data type. Input data is provided in count matrix format for each modality, where rows correspond to cells and columns to features (genes, genomic regions, or proteins). MultiVI generates low-dimensional joint cell representations, normalized and batch-corrected values for all modalities, as well as uncertainty estimates for imputed values. 
Here, we should remark that it is possible to involve the unified natural structure to create a low-dimensional representation. Using the coupled generalized Laplacian mapping, we can form an intermediate subspace of images and then apply the invented algorithm in order to implement the coupling more accurately.
The MUON method [27] represents a multimodal analytical platform for omics data, developed for organizing, analyzing, visualizing, and sharing multimodal biological data. This tool is designed to solve computational challenges arising when working with multi-omics experiments, including efficient storage, indexing, and seamless access to large data volumes, tracking biological and technical metadata, as well as processing dependencies between different omics layers. MUON can combine various types of omics data, including single-cell RNA sequencing (scRNA-seq), chromatin accessibility data (scATAC-seq), epitope profiling (CITE-seq), and DNA methylation, as well as spatial omics data. The platform supports trimodal analyses such as scNMT-seq or TEA-seq and can process arbitrary numbers of data modalities. The processing workflow in MUON consists of several sequential stages. First, preprocessing of individual modalities occurs, including quality control, sample filtering, data normalization, and feature selection for analysis. Then dimensionality reduction methods are applied, which can work with individual modalities (e.g., principal component analysis) or jointly process multiple modalities through approaches such as multi-omics factor analysis (MOFA) or weighted nearest neighbors (WNN). At the next stage, cell neighborhood graphs are constructed based on obtained representations, which can use information from individual modalities or combined multimodal representations. The final stage includes creating nonlinear embeddings through UMAP-type methods and clustering for cell type identification. Input data is provided in count matrix format for each modality along with corresponding metadata. MUON uses the MuData container, which represents a hierarchical data structure where each omics modality is stored as an AnnData object. 
Output data includes processed count matrices, low-dimensional representations, cell embeddings, neighborhood graphs, cluster labels, and differential analysis results, which can be visualized and used for biological interpretation of results. Apparently, the concept of embedding used in the method admits application of the methods elaborated in the paper.
The PolarBear method [28] represents a semi-supervised machine learning model for predicting missing modalities and aligning single-cell data between different types of omic measurements. PolarBear’s main task lies in solving the problem of integrating multimodal single-cell data when most cells have only one type of measurement available (e.g., only transcriptome or only chromatin accessibility), while comprehensive co-assay data measuring multiple modalities in a single cell are available in limited quantities. The method can be applied in cellular regulation studies, cellular heterogeneity analysis, differential gene expression studies between cell types, identification of cell-specific regulatory elements in tumor samples, and generating hypotheses about biological processes in modalities inaccessible for direct measurement. PolarBear can combine scRNA-seq and scATAC-seq data, working with various types of co-assay technologies, including CAR-seq, SNARE-seq, Paired-seq, and SHARE-seq. Principally, the method can be adapted to work with other types of omic data since it does not require feature correspondence between different measurement modalities. Data processing in PolarBear occurs in two main stages. At the first stage, the method trains two separate beta-variational autoencoders (beta-VAE) for each data modality, using both paired co-assay data and much more numerous unimodal data from public databases. The autoencoders learn to create stable latent cell representations independent of sequencing depth and batch effects. For scRNA-seq, the autoencoder assumes that gene read counts follow a zero-inflated negative binomial distribution, while for scATAC-seq the Bernoulli distribution is used for binary chromatin accessibility peaks. 
At the second stage, a fully connected translator layer is added between the trained autoencoders, which is trained in supervised mode exclusively on co-assay data to translate between the latent representations of the two modalities in both directions. Input data is provided as raw gene count matrices for scRNA-seq and binarized peak count matrices for scATAC-seq, with additional information about batches and sequencing depth for each cell. Output data includes predicted profiles in the missing modality, sequencing-depth-normalized expression estimates, latent cell representations for alignment between modalities, differential expression analysis results, and cell-specific marker genes, providing possibilities for biological interpretation and subsequent analysis of integrated multimodal data. The alignment between modalities creates a prerequisite for applying the results of this paper. However, the semi-supervised nature of the model and the restriction to a single type of measurement may create some obstacles to the direct application of the methods discussed in the paper.
The sciCAN method [29] integrates single-cell chromatin accessibility and gene expression data using cycle-consistent adversarial networks. The method is designed to combine scATAC-seq and scRNA-seq data into a unified representation without requiring prior information about cell correspondence between modalities. It can be applied to hematopoietic hierarchy analysis, studying cellular responses to CRISPR perturbations, constructing joint developmental cell trajectories, transferring cell type labels between modalities, and other integrative single-cell data analyses. The method combines single-cell RNA sequencing and single-cell ATAC sequencing data, transforming chromatin accessibility peak matrices into gene activity matrices to ensure compatibility with gene expression data. Input data are normalized by a logarithmic transformation with a pseudocount, after which the top 3000 highly variable genes are identified for each modality and used as features for integration. Data processing in sciCAN consists of two main components: representation learning and modality alignment. Encoder E projects high-dimensional data from both modalities into a joint low-dimensional space, using a noise contrastive estimation loss function to learn a discriminative representation. For modality alignment, two separate discriminator networks are used: Drna distinguishes the modality source in the latent space, while Datac works with generator G to create connections between modalities through adversarial training with the addition of cycle-consistency losses. The architecture includes fully connected layers with batch normalization and ReLU activation for the encoder, three-layer multilayer perceptrons with sigmoid activation for the discriminators, and a two-layer decoder for the generator.
Additionally, a linear transformation of the 128-dimensional latent representation to a 32-dimensional output and a 25-dimensional SoftMax-activated output for NCE loss computation is applied. Input data are provided as gene expression and gene activity matrices after preprocessing and normalization, while the output is a 128-dimensional joint latent cell representation used for subsequent integrative analysis, including trajectory construction, label transfer, and clustering of cells from both modalities in a unified feature space. In this case, we should emphasize that the combination of scATAC-seq and scRNA-seq data can be considered in the framework of the paper's results. In particular, heterogeneous data can be successfully coupled by the kernel-based method.
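The noise contrastive estimation objective mentioned above can be sketched as an InfoNCE-style loss: each embedding is pulled toward its positive counterpart (the diagonal of a similarity matrix) and pushed away from the other samples in the batch. This is a generic numpy illustration of the contrastive principle, with our own function name and temperature value, not sciCAN's exact loss.

```python
import numpy as np

def info_nce_loss(z, z_pos, temperature=0.5):
    """InfoNCE-style contrastive loss between an embedding batch z and a
    batch of positives z_pos; row i of z_pos is the positive for row i of z."""
    # L2-normalize so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_pos = z_pos / np.linalg.norm(z_pos, axis=1, keepdims=True)
    logits = z @ z_pos.T / temperature           # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))        # positives sit on the diagonal
```

Minimizing such a loss makes the representation discriminative: matched pairs score higher than mismatched ones, which is the property the encoder E needs before the adversarial alignment step.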
The SCIM method (Single-Cell data Integration via Matching) is a scalable approach for integrating single-cell data obtained using different profiling technologies, without requiring feature correspondence between modalities. The method is designed to recover correspondences between cells measured by different technologies when paired correspondences between data sets are lost due to cell consumption during profiling. SCIM can be applied to integrate any single-cell omics technologies, including scRNA-seq, CyTOF, proteomics, and genomics, provided there is a common latent structure in the data. The method can combine data from two or more technologies that measure non-overlapping feature sets, for example, expression of different gene sets, gene expression with image characteristics, or any other single-cell measurements originating from the same cell suspension. SCIM consists of two main processing stages. First, an integrated, technology-invariant latent space is created using an autoencoder/decoder framework with an adversarial objective function, where separate encoder and decoder networks are trained for each technology and a single discriminator acts in the latent space to ensure that representations from different technologies are indistinguishable. Then a bipartite matching scheme is applied for pairwise connection of cells between technologies, using their low-dimensional latent representations in an efficient bipartite matching algorithm based on the minimum-cost maximum-flow problem. Input data are provided in cell-by-feature matrix format for each technology, where the features are specific to the profiling technology but can represent gene expression, protein levels, or other measurements.
Output data are eight-dimensional latent cell representations in a common feature space and paired correspondences between cells from different technologies obtained through the bipartite matching algorithm, allowing the true observed signals of each cell pair to be used for any subsequent analysis while maintaining technology-invariant data integration. It should be noted that the specifics of the conditions, such as the use of various profiling technologies and the need for paired correspondences between data sets, determine the relevance of applying the paper's results and of the subsequent comparative analysis.
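The second SCIM stage, matching cells across technologies by their latent coordinates, can be illustrated in a few lines. SCIM formulates the matching as a minimum-cost maximum-flow problem; for two equal-sized sets, the balanced special case reduces to the linear assignment problem, which is what this sketch solves (function name is ours).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_cells(latent_a, latent_b):
    """Pair cells across two technologies by their latent coordinates.

    Builds the matrix of pairwise Euclidean distances between the two
    latent embeddings and returns the minimum-total-cost one-to-one matching.
    """
    cost = cdist(latent_a, latent_b)          # pairwise distances in latent space
    rows, cols = linear_sum_assignment(cost)  # optimal balanced assignment
    return list(zip(rows, cols))
```

If the two embeddings really share a latent structure, cells that originate from the same population end up close in the latent space, and the assignment recovers the correspondences lost during profiling.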
From the point of view of applications, the aim of this paper is to develop methods for the diagonal integration of multimodal biological data for a comprehensive study of biological processes, in particular the aging process. The prospective project based on the paper's fundamental results plans to develop new mathematical methods of diagonal integration that take into account the identified limitations of existing approaches. We propose to combine and jointly analyze several different types of data, including genomic, transcriptomic, proteomic, metabolomic, and lipidomic data. The aging process acts as a suprasystem factor for diagonal integration, being a fundamental biological phenomenon that manifests itself at all levels of molecular organization. A comprehensive analysis of the integrated data will make it possible to identify new pharmaceutical targets for therapeutic interventions in the aging process, to identify multidimensional biomarkers of aging with increased predictive power, and to establish causal relationships between molecular changes at various levels of biological organization in the context of age-associated processes.

4. Conclusions

Having analyzed the main mathematical principles forming the concept of unsupervised topological alignment, we presented a natural algebraic structure coupling heterogeneous data sets as well as their images. In this regard, a harmonious generalization of the graph Laplacian method was obtained over the field of the complex numbers. The kernel-based method, with the central idea of finding an appropriate structure coupling images of data sets of various natures, was extended to the complex vector space. The prospective theoretical results appeal to more complicated algebraic structures, allowing us to couple naturally an arbitrary number of data sets and their images. Thus, the quaternion structure can be considered a further generalization that leads to the hypercomplex numbers and more abstract mathematical objects. It was shown that the kernel-based methods are well suited to implementing the idea of uniting data sets of various natures. The main obstacle to further development is that a mapping requires a concrete technique, namely, generalizations of the well-known results of operator theory to modules over a hypercomplex structure. Finally, we presented a detailed analysis of the modern biological methods, supplied with comments on possible applications of the proposed approach. The authors believe that the presented theoretical approach is fundamentally novel, while the obtained conclusions admit relevant biological applications.
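To illustrate the coupling idea summarized above in its simplest form, the following numpy sketch encodes two same-shape data sets as the real and imaginary parts of a single complex array and evaluates a Gaussian kernel on the complex coordinates, so that one kernel matrix couples both modalities. This is our own illustrative construction under that assumption, not the paper's formal one; the function name and parameters are ours.

```python
import numpy as np

def coupled_gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel on the complexified data Z = X + iY.

    X and Y are (cells x features) arrays of the same shape, one per
    modality; |z_i - z_j|^2 combines both modalities in one distance,
    giving a real, symmetric, positive semi-definite kernel matrix.
    """
    Z = X + 1j * Y
    diff = Z[:, None, :] - Z[None, :, :]
    sq_dist = np.sum(np.abs(diff) ** 2, axis=-1)
    return np.exp(-sq_dist / (2 * sigma ** 2))
```

Because the squared modulus of a complex difference is an ordinary squared distance in twice the dimension, the resulting matrix inherits all the standard kernel properties and can feed any kernel-based alignment pipeline.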

Author Contributions

Conceptualization, M.V.K., M.S.A., D.E.B. and A.V.C.; methodology, M.V.K. and A.V.C.; formal analysis, M.V.K.; investigation, M.V.K. and M.S.A.; resources, M.S.A.; data curation, M.S.A., D.E.B. and A.V.C.; writing—original draft, M.V.K.; supervision, A.V.C.; project administration, M.S.A. All authors have read and agreed to the published version of the manuscript.

Funding

State Assignment of Russian National Research Medical University № 125022602916-4.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mironenko, I.V.; Kryukova, O.V.; Buianova, A.A.; Churov, A.V.; Arbatsky, M.S.; Kubrikova, A.A.; Petrusenko, Y.S.; Repinskaia, Z.A.; Shmitko, A.O.; Ilyina, G.A.; et al. ACE-Dependent Alzheimer’s Disease: Circulating ACE Phenotypes in Heterozygous Carriers of Rare ACE Variants. Int. J. Mol. Sci. 2025, 26, 9099. [Google Scholar] [CrossRef] [PubMed]
  2. Arbatskiy, M.; Balandin, D.; Churov, A.; Varachev, V.; Nikolaeva, E.; Mitrofanov, A.; Bekyashev, A.; Tkacheva, O.; Susova, O.; Nasedkina, T. Intratumoral Cell Heterogeneity in Patient-Derived Glioblastoma Cell Lines Revealed by Single-Cell RNA-Sequencing. Int. J. Mol. Sci. 2024, 25, 8472. [Google Scholar] [CrossRef] [PubMed]
  3. Kulebyakina, M.; Basalova, N.; Butuzova, D.; Arbatsky, M.; Chechekhin, V.; Kalinina, N.; Tyurin-Kuzmin, P.; Kulebyakin, K.; Klychnikov, O.; Efimenko, A. Balance between Pro and Antifibrotic Proteins in Mesenchymal Stromal Cell Secretome Fractions Revealed by Proteome and Cell Subpopulation Analysis. Int. J. Mol. Sci. 2024, 25, 290. [Google Scholar] [CrossRef] [PubMed]
  4. Basalova, N.; Sagaradze, G.; Arbatskiy, M.; Evtushenko, E.; Kulebyakin, K.; Grigorieva, O.; Akopyan, Z.; Kalinina, N.; Efimenko, A. Secretome of Mesenchymal Stromal Cells Prevents Myofibroblasts Differentiation by Transferring Fibrosis-Associated microRNAs within Extracellular Vesicles. Cells 2020, 9, 1272. [Google Scholar] [CrossRef]
  5. Welch, J.D.; Hartemink, A.J.; Prins, J.F. MATCHER: Manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics. Genome Biol. 2017, 18, 138. [Google Scholar] [CrossRef]
  6. Liu, J.; Huang, Y.; Singh, R.; Vert, J.P.; Noble, W.S. Jointly Embedding Multiple Single-Cell Omics Measurements. Algorithms Bioinform. 2019, 143, 10. [Google Scholar]
  7. Cao, K.; Hong, Y.; Wan, L. Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona. Bioinformatics 2022, 38, 211–219. [Google Scholar] [CrossRef]
  8. Chapel, L.; Alaya, M.Z.; Gasso, G. Partial optimal transport with applications on positive-unlabeled learning. Adv. Neural Inf. Process. Syst. 2020, 33, 2900–2910. [Google Scholar]
  9. Duren, Z.; Chen, X.; Zamanighomi, M.; Zeng, W.; Satpathy, A.T.; Chang, H.Y.; Wang, Y.; Wong, W.H. Integrative analysis of single-cell genomics data by coupled nonnegative matrix factorizations. Proc. Natl. Acad. Sci. USA 2018, 115, 7723–7728. [Google Scholar] [CrossRef]
  10. Cao, K.; Bai, X.; Hong, Y.; Wan, L. Unsupervised topological alignment for single-cell multi-omics integration. Bioinformatics 2020, 36, 148–156. [Google Scholar] [CrossRef]
  11. Dou, J.; Liang, S.; Mohanty, V.; Cheng, X.; Kim, S.; Choi, J.; Li, Y.; Rezvani, K.; Chen, R.; Chen, K. Unbiased integration of single cell multi-omics data. bioRxiv 2020. [Google Scholar] [CrossRef]
  12. Korsunsky, I.; Millard, N.; Fan, J.; Slowikowski, K.; Zhang, F.; Wei, K.; Baglaenko, Y.; Brenner, M.; Loh, P.; Raychaudhuri, S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 2019, 16, 1289–1296. [Google Scholar] [CrossRef] [PubMed]
  13. Rosenberg, A.B.; Roco, C.M.; Muscat, R.A.; Kuchina, A.; Sample, P.; Yao, Z.; Graybuck, L.T.; Peeler, D.J.; Mukherjee, S.; Chen, W.; et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 2018, 360, 176–182. [Google Scholar] [CrossRef] [PubMed]
  14. Stuart, T.; Butler, A.; Hoffman, P.; Hafemeister, C.; Papalexi, E.; Mauck, W.M., III; Hao, Y.; Stoeckius, M.; Smibert, P.; Satija, R. Comprehensive integration of single-cell data. Cell 2019, 177, 1888–1902. [Google Scholar] [CrossRef] [PubMed]
  15. Wang, C.; Sun, D.; Huang, X.; Wan, C.; Li, Z.; Han, Y.; Qin, Q.; Fan, J.; Qiu, X.; Xie, Y.; et al. Integrative analyses of single-cell transcriptome and regulome using MAESTRO. Genome Biol. 2020, 21, 198. [Google Scholar] [CrossRef]
  16. Belkin, M.; Matveeva, I.; Niyogi, P. Regularization and semi-supervised learning on large graphs. In Learning Theory. COLT 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 624–638. [Google Scholar]
  17. Ek, C.H. Shared Gaussian Process Latent Variables Models. Ph.D. Thesis, Oxford Brookes University, Oxford, UK, 2004. [Google Scholar]
  18. Ham, J.H.; Lee, D.D.; Saul, L.K. Semisupervised alignment of manifolds. In Proceedings of the 10th International Conference on Artificial Intelligence and Statistics, Bridgetown, Barbados, 6–8 January 2005; pp. 120–127. [Google Scholar]
  19. Kukushkin, M.V. Schatten Index of the Sectorial Operator via the Real Component of Its Inverse. Mathematics 2024, 12, 540. [Google Scholar] [CrossRef]
  20. Ham, J.H.; Lee, D.D.; Saul, L.K. Learning high dimensional correspondences from low dimensional manifolds. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, USA, 21–24 August 2003. [Google Scholar]
  21. Belkin, M.; Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 2003, 15, 1373–1396. [Google Scholar] [CrossRef]
  22. Van Der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  23. Hinton, G.E.; Roweis, S.T. Stochastic Neighbor Embedding. Adv. Neural Inf. Process. Syst. 2002, 15, 833–840. [Google Scholar]
  24. Chwialkowski, K.; Ramdas, A.; Sejdinovic, D.; Gretton, A. Fast two-sample testing with analytic representations of probability measures. Adv. Neural Inf. Process. Syst. 2015, 28, 1981–1989. [Google Scholar]
  25. Rohart, F.; Gautier, B.; Singh, A.; Lê Cao, K.-A. mixOmics: An R package for ‘omics feature selection and multiple data integration. PLoS Comput. Biol. 2017, 13, e1005752. [Google Scholar] [CrossRef]
  26. Ashuach, T.; Gabitto, M.I.; Koodli, R.V.; Saldi, G.A.; Jordan, M.I.; Yosef, N. MultiVI: Deep generative model for the integration of multimodal data. Nat. Methods 2023, 20, 1222–1231. [Google Scholar] [CrossRef]
  27. Bredikhin, D.; Kats, I.; Stegle, O. MUON: Multimodal omics analysis framework. Genome Biol. 2022, 23, 42. [Google Scholar] [CrossRef] [PubMed]
  28. Zhang, R.; Meng-Papaxanthos, L.; Vert, J.-P.; Noble, W.S. Multimodal Single-Cell Translation and Alignment with Semi-Supervised Learning. J. Comput. Biol. 2022, 29, 1198–1212. [Google Scholar] [CrossRef]
  29. Xu, Y.; Begoli, E.; McCord, R.P. sciCAN: Single-cell chromatin accessibility and gene expression data integration via cycle-consistent adversarial network. npj Syst. Biol. Appl. 2022, 8, 33. [Google Scholar] [CrossRef]