1. Introduction
Human brain connectomics, driven by the increasing availability of large-scale neuroimaging datasets [1], has emerged in recent years as a prominent field of research. This field has the potential to address many of the open questions about the structure and function of the human brain. Notably, connectomics-based analyses have revealed meaningful differences between healthy and diseased conditions [2,3]. However, to further assess the reliability of such findings and to capture individual-specific characteristics that may be overlooked in group-level studies, the concept of a “brain connectivity fingerprint” has gained growing interest [4,5,6].
Functional connectivity fingerprinting of the brain refers to the ability to identify an individual’s functional connectome (FC) from a set of FCs obtained in repeated functional magnetic resonance imaging (fMRI) sessions. The existence of a brain fingerprint has been established over the last decade through work on fMRI and electroencephalography (EEG) data [7,8]. Such studies have shown that the functional connectome of the brain varies between individuals, therefore serving, to some extent, as a fingerprint. In the literature, FC fingerprinting studies have been conducted with varying approaches, such as principal component analysis (PCA) [9,10], sparse dictionary learning (SDL) [11], geodesic distance in regularized FCs [12], and correlation distance in FC tangent space projections [13]. In [9], the authors show that individual connectivity profiles can be reconstructed through an optimal linear combination of PCA-derived orthogonal components. In [10], the authors perform PCA on a subset of “learning” FCs to obtain an eigenspace onto which the “validation” FCs are projected, thus enabling the identification of the fMRI condition or the participant to which a “validation” FC belongs. In [11], SDL is used to refine FC profiles, leading to higher distinctiveness in FCs relative to raw connectivity. PCA and SDL operate on 2-dimensional data, while Tucker decomposition is designed for higher-dimensional data structures called tensors, allowing it to analyze complex relationships across multiple variables in a dataset by decomposing it into a core tensor and factor matrices along each dimension. In essence, Tucker decomposition may be thought of as a “higher-order PCA”.
Using the fact that FCs estimated as correlation matrices lie on or inside a Symmetric Positive Definite (SPD) manifold, Venkatesh and colleagues proposed using geodesic distance to compare FCs [14]. In a follow-up study [12], the authors explored how optimally regularized FCs maximize the individual fingerprint of participants, measured by the geodesic distance between their FCs. A limitation of this approach is that the geodesic distance between FCs of size $M \times M$ is a map $d: \mathbb{R}^{M \times M} \times \mathbb{R}^{M \times M} \rightarrow \mathbb{R}_{\geq 0}$ that collapses each pair of FCs into a single scalar. Hence, even though the geodesic distance provides a global measure of similarity between FCs, it does not directly highlight the specific features that make the individual’s functional connectivity unique. As an alternative to using FCs, tangent FCs have demonstrated a high capacity to predict cognition and behavior [15,16,17]. Most recently, ref. [13] analyzed the effects of tangent FCs with respect to fingerprinting. In that study, a high degree of fingerprinting was achieved not only across sessions of a unique participant, but also for matching the sessions of twins.
To simultaneously overcome the drawbacks of the studies mentioned above, we propose utilizing tensor decomposition for FC fingerprinting. Tensor decomposition enables projecting high-dimensional data into a lower-dimensional space, preserving its structure while independently extracting meaningful information from each dimension.
Tensors are multidimensional arrays with applications in various fields, including signal processing, computer vision, and neuroscience [18]. In brain connectomics, tensors enable modeling and analyzing the functional and structural connections within the brain by reducing the dimensionality of complex, interrelated, and high-dimensional data through tensor decomposition. In [19], the authors studied the dynamics of FCs to understand the process of formation and dissolution of brain functional networks through tensor decomposition techniques. In another study [20], it was demonstrated how the analysis of tensor components enables the extraction of unbiased and interpretable descriptions of single-trial dynamics across many trials through low-dimensional representations of neural data. In [21], the authors discuss challenges associated with interpreting the brain connectivity patterns derived from tensor decomposition. To our knowledge, only [22] has considered using tensors for brain connectivity fingerprinting; however, that study was based on structural connectivity. The task of identifying subjects through their functional connectomes poses the additional challenge of dynamic changes and functional reconfigurations happening at a fast rate in response to cognitive stimuli.
Through this study, the effectiveness of tensor-based methods in uncovering FC fingerprints is assessed. The adopted fingerprinting framework can be broadly described as follows:
Each participant underwent a total of sixteen data acquisition sessions, two for each of the considered fMRI conditions. Using the BOLD time series from each acquisition, an FC matrix of dimension Number of Brain Regions × Number of Brain Regions is estimated, thus yielding a total of sixteen FC matrices for each participant.
For each condition, a tensor of dimensions Number of Brain Regions × Number of Brain Regions × Number of Participants is constructed by concatenating all participants’ FCs derived from the first acquisition session. Similarly, for each condition, a second tensor is obtained by concatenating all participants’ FCs derived from the second acquisition session.
The obtained tensors are then decomposed via Tucker decomposition [23], yielding a core tensor and three factor matrices. The first two factor matrices contain cohort-level functional connectivity information and will hereafter be referred to as “brain parcellation factor matrices”. The third factor matrix contains participant-specific information and will hereafter be referred to as the “participant factor matrix”. The participant factor matrix, acting as a “fingerprint” of each participant, is used to match participants’ FCs corresponding to different data acquisition sessions. The accuracy with which different sessions are correctly matched is quantified by a metric termed the matching rate [24].
The aims of this study are to: (i) assess the impact of Tucker decomposition on functional connectome fingerprinting in within- and between-fMRI-condition settings for different parcellation granularities; (ii) estimate the optimal levels of compression of brain parcellation-specific and participant-specific information that maximize fingerprinting; and (iii) analyze how sampling of resting-state time points prior to Tucker decomposition affects fingerprinting.
The remainder of the article is organized as follows. In Materials and Methods, we: (i) describe the data set used in this study as well as the preprocessing procedures; (ii) introduce the adopted tensor notation; (iii) introduce Tucker decomposition; and (iv) describe matching rate, the fingerprinting measurement used in this study. In Results, we: (i) provide a comparative analysis of fingerprints across parcellation granularities, (ii) compare the fingerprinting performance of different dimensionality reduction methods across parcellation granularities, (iii) disclose optimal levels of compression of parcellation-specific and participant-specific information that maximize within- and between-condition fingerprints, and (iv) present the findings for the different strategies of sampling time points in resting-state time series. In the Discussion, we (i) discuss our findings and (ii) highlight some limitations of our study and make suggestions for future work. Lastly, in Conclusions, we summarize the presented work.
2. Materials and Methods
In this section, a description of the dataset, tensor notation, and fingerprinting framework is presented. First, we give an overview of the HCP dataset, the brain parcellation granularities considered, and the preprocessing procedures. Then, the adopted tensor notation is introduced, along with commonly used tensor decomposition techniques. Lastly, we introduce the metric used to quantify FC fingerprinting and provide a detailed description of the fingerprinting framework.
2.1. Dataset Description
The HCP Dataset [25] has been widely used as a standard in the neuroimaging literature for a broad range of research domains [7,26,27,28,29] due to its large-scale, high-quality, and open-access data gathered from a large and diverse cohort of participants. This study utilizes data from eight fMRI conditions included in the publicly available Young-Adult Human Connectome Project (HCP) dataset. The original study was approved by the Washington University Institutional Review Board and, per HCP protocol, written informed consent was obtained from all subjects by the HCP Consortium. To minimize the impact of hereditary influences on fingerprinting, a subset of 426 unrelated individuals was selected. The demographic information of the participants can be found in Table 1. The fMRI conditions analyzed in this study include resting-state (RS), Emotion processing, Gambling (GAM), Language (LAN), Motor (MOT), Relational processing (REL), Social cognition (SOC), and Working Memory (WM). Each participant completed two sessions per condition, corresponding to separate left-to-right (LR) and right-to-left (RL) acquisitions, which are designated as test and retest sessions. Resting-state scans were acquired in four sessions (“REST1” and “REST2”) across two separate days, though only the two REST1 sessions were considered in this study.
2.2. Brain Parcellations
In this study, we utilized the Schaefer parcellation functional brain atlas to analyze the human cortex [30]. This parcellation was derived from resting-state fMRI data of 1489 participants registered with surface alignment, and was generated using a gradient-weighted Markov random field approach that integrates local gradient and global similarity methods. The Schaefer parcellation is available at nine levels of granularity, ranging from 100 to 900 regions in increments of 100, in both volumetric and grayordinate space. Since the grayordinate versions of these parcellations are already in the same surface space as the HCP fMRI data, mapping them onto the fMRI scans is straightforward. Moreover, surface-based mapping ensures better alignment between the parcellations and fMRI data compared to volumetric mapping. Therefore, we applied surface-based mapping to project the Schaefer parcellations onto the fMRI data. To ensure comprehensive brain coverage, we incorporated 14 subcortical regions into each parcellation, as provided in the HCP dataset (file: Atlas_ROI2.nii.gz). This file was converted from NIFTI to CIFTI format using the HCP Workbench 1.5 software (Saint Louis, MO, USA) (wb_command -cifti-create-label). As a result, the Schaefer-200 parcellation, for instance, ultimately included 214 brain regions.
2.3. Preprocessing
The preprocessing of fMRI data followed the “minimal” preprocessing pipeline provided by the HCP, which includes artifact removal, motion correction, and alignment to a standardized template [31]. Further details on this pipeline can be found in previous studies [26,32]. We enhanced this minimal pipeline by incorporating additional preprocessing steps, as described in [13]. Specifically, for resting-state fMRI data, we applied the following procedures: (i) regressed out the global gray matter signal from voxel-wise time series [31]; (ii) applied a first-order Butterworth bandpass filter in both forward and reverse directions within the frequency range of 0.001–0.08 Hz [31]; and (iii) z-scored and averaged voxel time series within each brain region, while excluding time points that deviated more than three standard deviations from the mean (processed using the Workbench software, wb_command -cifti-parcellate). The same preprocessing steps were applied to all task-based fMRI conditions, except that the bandpass filter was adjusted to a broader frequency range (0.001–0.250 Hz) [33], as the optimal filtering range remains uncertain [34].
2.4. Estimation of Whole-Brain FCs
The functional connectivity between pairs of brain regions was estimated by computing Pearson’s correlation (corr function, MATLAB 2023a), which results in a symmetric $M \times M$ correlation matrix, with $M$ being the number of brain regions for a given parcellation. Throughout this article, this correlation matrix is referred to as the FC. For each participant, we computed a whole-brain FC for each of the two sessions (test and retest), each fMRI condition (all seven tasks and resting-state), and all parcellation granularities.
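A minimal sketch of this step using NumPy in place of MATLAB’s corr; the region-by-time-point layout and the names are assumptions.

```python
import numpy as np

def estimate_fc(region_ts):
    """Estimate a whole-brain FC as an M x M Pearson correlation matrix.

    region_ts : array of shape (timepoints, M), one column per brain region.
    """
    # np.corrcoef correlates rows, so transpose to correlate regions
    return np.corrcoef(region_ts.T)

# Example: 1200 time points, M = 214 regions (Schaefer-200 plus 14 subcortical regions)
rng = np.random.default_rng(0)
fc = estimate_fc(rng.standard_normal((1200, 214)))
assert fc.shape == (214, 214) and np.allclose(fc, fc.T)     # symmetric M x M matrix
```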
2.5. Tensor Notation
We refer to multidimensional data structures as tensors. Mathematically, tensors can be described as objects that lie in, and originate from, the tensor product of $N$ vector spaces. The number $N$ of vector spaces from which a tensor originates defines the order (number of modes) of the tensor. Throughout this work, we adopt the following notation: scalars are denoted by lower- or uppercase letters, e.g., $x$ or $X$; vectors are denoted by boldface lowercase letters, e.g., $\mathbf{x}$; matrices are denoted by boldface capital letters, e.g., $\mathbf{X}$; and higher-order tensors are denoted by boldface calligraphic letters, e.g., $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. Entries of a tensor are denoted by lowercase letters with subscripts, e.g., $x_{i_1 i_2 \cdots i_N}$, with $1 \leq i_n \leq I_n$ for all $n = 1, \ldots, N$. Tensor fibers are constructed by fixing every index of the tensor but one; therefore, tensors have as many types of fibers as modes. For example, third-order tensors have column (mode-1), row (mode-2), and tube (mode-3) fibers, which are denoted by $\mathbf{x}_{:jk}$, $\mathbf{x}_{i:k}$, and $\mathbf{x}_{ij:}$, respectively, with colons denoting all the entries of a mode. A tensor slice is defined by fixing all indices of the tensor except two. For a third-order tensor, we can define the slices $\mathbf{X}_{i::}$, $\mathbf{X}_{:j:}$, and $\mathbf{X}_{::k}$.
The process of reshaping a tensor into a matrix is known as matricization. The mode-$n$ matricization of a tensor $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is given by the matrix $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_N)}$, and its columns correspond to the mode-$n$ fibers of $\boldsymbol{\mathcal{X}}$. The $n$-mode matrix product between a tensor $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ and a matrix $\mathbf{U} \in \mathbb{R}^{J \times I_n}$ is denoted by $\boldsymbol{\mathcal{X}} \times_n \mathbf{U}$ and is of size $I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N$. The outer product between two vectors $\mathbf{a} \in \mathbb{R}^{I}$ and $\mathbf{b} \in \mathbb{R}^{J}$ is represented by $\mathbf{a} \circ \mathbf{b}$ and is of dimension $I \times J$, while the inner product of two same-sized tensors $\boldsymbol{\mathcal{X}}, \boldsymbol{\mathcal{Y}} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is defined as $\langle \boldsymbol{\mathcal{X}}, \boldsymbol{\mathcal{Y}} \rangle = \sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} x_{i_1 \cdots i_N} \, y_{i_1 \cdots i_N}$, and the norm of $\boldsymbol{\mathcal{X}}$ is defined as $\| \boldsymbol{\mathcal{X}} \| = \sqrt{\langle \boldsymbol{\mathcal{X}}, \boldsymbol{\mathcal{X}} \rangle}$.
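These operations map directly onto a few NumPy calls. The helpers below are an illustrative sketch: mode indices are 0-based, and the column ordering of the matricization may differ from the paper’s convention, which does not affect the spanned column space.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n matricization: rows indexed by the chosen mode, columns by the remaining modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_n_product(X, U, mode):
    """n-mode product X x_n U for a matrix U with U.shape[1] == X.shape[mode]."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, mode)), 0, mode)

# Third-order example
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
U = rng.standard_normal((3, 5))              # maps the second mode (size 5) down to size 3
Y = mode_n_product(X, U, mode=1)             # resulting shape: (4, 3, 6)
print(unfold(X, 2).shape)                    # (6, 20): tube fibers of X as columns
print(np.sqrt(np.sum(X * X)))                # tensor norm ||X|| = sqrt(<X, X>)
```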
2.6. Tensor Decomposition
In recent years, tensors have become increasingly popular in the fields of signal processing, machine learning, and neuroscience due to their capacity to model complex high-order relationships among objects [35,36,37]. Tensor decomposition enables projecting high-dimensional data into a lower-dimensional space while preserving the original structure of the data. For the purpose of brain fingerprinting, tensor decomposition has the potential of extracting unique features from each participant’s fMRI data acquisition session, thus facilitating subject distinctiveness. Several tensor decomposition algorithms can be found in the literature, each with its own characteristics and applications. The most commonly used ones are the CANDECOMP/PARAFAC (CP) decomposition [38,39] and the Tucker decomposition [23].
The CP decomposition of a tensor $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ factorizes it into a sum of $R$ rank-one tensors, where a rank-one tensor denotes the outer product between $N$ vectors. Equation (1) shows the CP decomposition of $\boldsymbol{\mathcal{X}}$:

$$\boldsymbol{\mathcal{X}} \approx \sum_{r=1}^{R} \lambda_r \, \mathbf{a}^{(1)}_r \circ \mathbf{a}^{(2)}_r \circ \cdots \circ \mathbf{a}^{(N)}_r, \quad (1)$$

where $R$ denotes the rank of the decomposition. The vectors $\mathbf{a}^{(n)}_r$ are typically assumed to be normalized, with a corresponding scaling factor of $\lambda_r$. Equation (1) can also be expressed in the simplified form $\boldsymbol{\mathcal{X}} \approx [\![\boldsymbol{\lambda}; \mathbf{A}^{(1)}, \mathbf{A}^{(2)}, \ldots, \mathbf{A}^{(N)}]\!]$, in which $\mathbf{A}^{(n)} = [\mathbf{a}^{(n)}_1 \; \mathbf{a}^{(n)}_2 \; \cdots \; \mathbf{a}^{(n)}_R]$, for $n = 1, \ldots, N$, are referred to as factor matrices, and $\boldsymbol{\lambda} = [\lambda_1, \ldots, \lambda_R]^{\top}$ is a vector containing all scaling factors $\lambda_r$.
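To make Equation (1) concrete, the sketch below assembles a rank-$R$ third-order tensor from unit-norm factor vectors and their scaling factors; all names and sizes are illustrative.

```python
import numpy as np

def cp_reconstruct(lmbda, A, B, C):
    """Sum of R rank-one tensors: X ~ sum_r lambda_r * (a_r outer b_r outer c_r),
    where the columns of A, B, C hold the vectors a_r, b_r, c_r."""
    return np.einsum("r,ir,jr,kr->ijk", lmbda, A, B, C)

# Rank-3 example with normalized factor columns and explicit scaling factors
rng = np.random.default_rng(0)
R = 3
A = rng.standard_normal((10, R)); A /= np.linalg.norm(A, axis=0)
B = rng.standard_normal((12, R)); B /= np.linalg.norm(B, axis=0)
C = rng.standard_normal((8, R));  C /= np.linalg.norm(C, axis=0)
lmbda = rng.uniform(1.0, 5.0, size=R)
X = cp_reconstruct(lmbda, A, B, C)           # shape (10, 12, 8), CP rank at most 3
```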
The Tucker decomposition [23] decomposes a tensor into a core tensor multiplied by a matrix along each of the tensor modes. For $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, its $n$-rank, denoted by $\operatorname{rank}_n(\boldsymbol{\mathcal{X}})$, is defined as the column rank of its mode-$n$ matricization $\mathbf{X}_{(n)}$. In other words, the $n$-rank is the number of linearly independent vectors that span the basis of the mode-$n$ fibers of $\boldsymbol{\mathcal{X}}$. Equation (2) shows the Tucker decomposition of $\boldsymbol{\mathcal{X}}$:

$$\boldsymbol{\mathcal{X}} \approx \boldsymbol{\mathcal{G}} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \cdots \times_N \mathbf{A}^{(N)}, \quad (2)$$

where $\mathbf{A}^{(1)} \in \mathbb{R}^{I_1 \times R_1}$, $\cdots$, $\mathbf{A}^{(N)} \in \mathbb{R}^{I_N \times R_N}$ are column-wise orthonormal factor matrices, and $R_1, \ldots, R_N$ are the ranks of the decomposition, where $R_n \leq \operatorname{rank}_n(\boldsymbol{\mathcal{X}})$ for $n = 1, \ldots, N$. If $R_n < \operatorname{rank}_n(\boldsymbol{\mathcal{X}})$ for at least one mode $n$, we refer to the decomposition as a truncated Tucker decomposition and refer to it as a rank-$(R_1, R_2, \ldots, R_N)$ decomposition. The tensor $\boldsymbol{\mathcal{G}} \in \mathbb{R}^{R_1 \times R_2 \times \cdots \times R_N}$ is referred to as the core tensor, and its entries represent the level of interaction between the different factors. Note that the CP decomposition can be understood as a special case of the Tucker decomposition in which the Tucker core is reduced to a hyper-diagonal tensor (all non-diagonal entries are equal to zero) and $R_1 = R_2 = \cdots = R_N = R$. For simplicity, consider the third-order tensor $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$. The Tucker decomposition of $\boldsymbol{\mathcal{X}}$ is obtained by solving the optimization problem (3), which seeks to minimize the norm of the difference between the true and estimated tensors:

$$\min_{\boldsymbol{\mathcal{G}}, \mathbf{A}, \mathbf{B}, \mathbf{C}} \; \left\| \boldsymbol{\mathcal{X}} - \boldsymbol{\mathcal{G}} \times_1 \mathbf{A} \times_2 \mathbf{B} \times_3 \mathbf{C} \right\|. \quad (3)$$

The decision variables of this problem are the core tensor $\boldsymbol{\mathcal{G}} \in \mathbb{R}^{R_1 \times R_2 \times R_3}$ and the factor matrices $\mathbf{A} \in \mathbb{R}^{I_1 \times R_1}$, $\mathbf{B} \in \mathbb{R}^{I_2 \times R_2}$, and $\mathbf{C} \in \mathbb{R}^{I_3 \times R_3}$.
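For the third-order case, Equation (2) can be written as a single einsum over the core and the three factor matrices; the sketch below (illustrative names only) reconstructs such a model, and the norm of the difference between a data tensor and this reconstruction is the objective in (3).

```python
import numpy as np

def tucker_reconstruct(G, A, B, C):
    """Reconstruct X ~ G x_1 A x_2 B x_3 C for a third-order Tucker model,
    with core G of shape (R1, R2, R3) and factors A (I1 x R1), B (I2 x R2), C (I3 x R3)."""
    return np.einsum("pqr,ip,jq,kr->ijk", G, A, B, C)

# Rank-(2, 3, 2) core with column-orthonormal factors obtained via QR
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 3, 2))
A = np.linalg.qr(rng.standard_normal((10, 2)))[0]
B = np.linalg.qr(rng.standard_normal((12, 3)))[0]
C = np.linalg.qr(rng.standard_normal((8, 2)))[0]
X_hat = tucker_reconstruct(G, A, B, C)       # shape (10, 12, 8)
```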
2.7. Tucker Decomposition of Functional Connectomes
For each of the eight fMRI conditions analyzed in this study, we represent the data as a third-order tensor $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{M \times M \times N}$. Here, $M$ is equal to the granularity of the brain parcellation and $N$ corresponds to the total number of participants. Given the symmetry of FC matrices, we refer to $\boldsymbol{\mathcal{X}}$ as a semi-symmetric tensor, meaning it remains invariant under permutation of two (or more) indices. In our case, $x_{ijk} = x_{jik}$ for all $i, j \in \{1, \ldots, M\}$ and $k \in \{1, \ldots, N\}$. All analyses performed in this work take as input a semi-symmetric tensor constructed by concatenating participants’ FCs obtained in one fMRI scanning session (either test or retest).
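A minimal sketch of assembling such a semi-symmetric tensor from individual FC matrices (NumPy, with illustrative names and a toy cohort size).

```python
import numpy as np

def build_fc_tensor(fc_list):
    """Stack symmetric M x M FC matrices into an M x M x N tensor (mode 3 indexes participants)."""
    return np.stack(fc_list, axis=-1)

# Example: N = 5 participants, M = 214 regions, synthetic symmetric "FCs"
rng = np.random.default_rng(0)
fcs = [np.corrcoef(rng.standard_normal((214, 400))) for _ in range(5)]
X = build_fc_tensor(fcs)                               # shape (214, 214, 5)
assert np.allclose(X, X.transpose(1, 0, 2))            # semi-symmetry in the first two modes
```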
Once the data have been structured as a semi-symmetric tensor, we can produce a low-rank estimation of the FCs through either of the previously mentioned tensor decomposition methods. Due to the lack of interactions between components, the results of CP decomposition are generally easier to interpret [21] than those of Tucker decomposition. However, this lack of interaction often leads CP to produce less accurate approximations of the original tensor, as measured by the norm of the difference between the original and estimated tensors. In contrast, Tucker decomposition leverages its core tensor to capture interactions between components, enabling it to approximate the original tensor with greater precision [40]. Considering the interpretability/accuracy trade-off in the context of brain fingerprinting, we focus on Tucker decomposition.
Several methods have been developed to estimate the Tucker decomposition. Notably, we highlight the Sequentially Truncated Higher-Order Singular Value Decomposition (ST-HOSVD) [41,42] and the Higher-Order Orthogonal Iteration (HOOI) [43]. For a given tensor $\boldsymbol{\mathcal{X}}$, ST-HOSVD sequentially computes the factor matrices via truncated eigenvalue decompositions of the Gram matrices of the mode-$n$ matricizations (for $n = 1, \ldots, N$) of $\boldsymbol{\mathcal{X}}$, while iteratively updating (truncating) $\boldsymbol{\mathcal{X}}$ at each step. The output of the algorithm is a core tensor and a set of column-wise orthogonal factor matrices. Similarly to PCA, the components of the factor matrices capture most of the variance across each of the tensor modes. The Tucker estimation of $\boldsymbol{\mathcal{X}}$ via ST-HOSVD is shown in Algorithm 1.
Algorithm 1 Sequentially Truncated Higher-Order Singular Value Decomposition (ST-HOSVD)
Input: tensor $\boldsymbol{\mathcal{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and target ranks $(R_1, \ldots, R_N)$
1: for $i = 1$ to $N$ do
2:  Matricize the tensor along mode $i$ to obtain $\mathbf{X}_{(i)}$
3:  Compute the Gram matrix $\mathbf{S} \leftarrow \mathbf{X}_{(i)} \mathbf{X}_{(i)}^{\top}$
4:  Compute the eigenvalue decomposition of $\mathbf{S}$ and sort the eigenvalues in descending order
5:  $\mathbf{A}^{(i)} \leftarrow$ the $R_i$ eigenvectors corresponding to the largest sorted eigenvalues of $\mathbf{S}$
6:  $\boldsymbol{\mathcal{X}} \leftarrow \boldsymbol{\mathcal{X}} \times_i \mathbf{A}^{(i)\top}$
7: end for
8: Output: core tensor $\boldsymbol{\mathcal{G}} \leftarrow \boldsymbol{\mathcal{X}}$ and factor matrices $\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}$
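The NumPy sketch below follows Algorithm 1 (Gram matrix, eigendecomposition, truncation, projection); it is an illustrative implementation rather than the authors’ code, and the example tensor, sizes, and ranks are arbitrary.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n matricization (0-based mode index)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def mode_n_product(X, U, mode):
    """n-mode product X x_n U."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, mode)), 0, mode)

def st_hosvd(X, ranks):
    """Sequentially truncated HOSVD of X for the given target ranks.

    Returns the core tensor and a list of column-orthonormal factor matrices."""
    G = X.copy()
    factors = []
    for mode, r in enumerate(ranks):
        Xn = unfold(G, mode)                            # matricize the current tensor
        S = Xn @ Xn.T                                   # Gram matrix of the unfolding
        eigvals, eigvecs = np.linalg.eigh(S)            # eigenvalues in ascending order
        U = eigvecs[:, np.argsort(eigvals)[::-1][:r]]   # r leading eigenvectors
        factors.append(U)
        G = mode_n_product(G, U.T, mode)                # truncate (project) this mode
    return G, factors

# Example: a random semi-symmetric 214 x 214 x 50 "FC tensor", ranks (100, 100, 30)
rng = np.random.default_rng(0)
X = rng.standard_normal((214, 214, 50))
X = (X + X.transpose(1, 0, 2)) / 2                      # enforce symmetry in the first two modes
G, (B1, B2, C) = st_hosvd(X, ranks=(100, 100, 30))
print(G.shape, B1.shape, C.shape)                       # (100, 100, 30) (214, 100) (50, 30)
```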
On the other hand, HOOI utilizes an Alternating Least Squares (ALS) approach to estimate each of the factor matrices by sequentially solving sub-problems of the form in (4):

$$\max_{\mathbf{A}^{(n)}} \; \left\| \boldsymbol{\mathcal{X}} \times_1 \mathbf{A}^{(1)\top} \times_2 \mathbf{A}^{(2)\top} \cdots \times_N \mathbf{A}^{(N)\top} \right\| \quad \text{subject to } \mathbf{A}^{(n)} \text{ column-wise orthonormal}, \quad (4)$$

where the remaining factor matrices are held fixed and all factor matrices are commonly initialized via the ST-HOSVD of $\boldsymbol{\mathcal{X}}$. By iteratively optimizing each factor matrix while keeping the others fixed, HOOI provides a better fit than ST-HOSVD, as measured by the norm of the difference between the true and estimated tensors, but at a higher computational cost. However, since HOOI is not guaranteed to converge to a global optimum, nor even to a stationary point [43,44], and does not provide substantial fingerprinting improvements in comparison to ST-HOSVD (see Table 2), ST-HOSVD is the chosen algorithm to compute the Tucker decomposition of FCs.
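For comparison, a compact sketch of HOOI following the ALS scheme in (4): each factor is refreshed from the leading left singular vectors of the unfolding of the tensor projected onto the other modes’ factors. It reuses the unfold, mode_n_product, and st_hosvd helpers from the previous sketch and is likewise only illustrative.

```python
import numpy as np

def hooi(X, ranks, n_iter=10):
    """Higher-Order Orthogonal Iteration for the Tucker fit, initialized via ST-HOSVD."""
    _, factors = st_hosvd(X, ranks)                 # ST-HOSVD initialization
    N = X.ndim
    for _ in range(n_iter):
        for n in range(N):
            Y = X
            for m in range(N):
                if m != n:                          # project onto all other factor subspaces
                    Y = mode_n_product(Y, factors[m].T, m)
            U, _, _ = np.linalg.svd(unfold(Y, n), full_matrices=False)
            factors[n] = U[:, :ranks[n]]            # leading left singular vectors
    G = X
    for n in range(N):                              # core tensor from the converged factors
        G = mode_n_product(G, factors[n].T, n)
    return G, factors

# Example (using X from the ST-HOSVD sketch above):
# G, factors = hooi(X, ranks=(100, 100, 30), n_iter=5)
```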
When applied to tensors that exhibit partial (or full) symmetries, HOSVD preserves the symmetric structure of the tensor [41]. Hence, for a tensor $\boldsymbol{\mathcal{X}}_{\text{test}} \in \mathbb{R}^{M \times M \times N}$ consisting of one session (e.g., the test session) of participants’ FC matrices, the Tucker decomposition of $\boldsymbol{\mathcal{X}}_{\text{test}}$ can be reformulated as:

$$\min_{\boldsymbol{\mathcal{G}}, \mathbf{B}, \mathbf{C}} \; \left\| \boldsymbol{\mathcal{X}}_{\text{test}} - \boldsymbol{\mathcal{G}} \times_1 \mathbf{B} \times_2 \mathbf{B} \times_3 \mathbf{C} \right\|, \quad (5)$$

where the factor matrices $\mathbf{B} \in \mathbb{R}^{M \times R_B}$ and $\mathbf{C} \in \mathbb{R}^{N \times R_C}$ obtained via HOSVD contain, respectively, brain parcellation and participant-specific information, and the ranks $R_B$ and $R_C$ express the compression levels of brain parcellation and participant-specific information. While solving the optimization problem presented in (5), the brain parcellation ranks were chosen over a grid with a fixed step size and, in addition to these parcellation ranks, we also performed a full-rank decomposition (e.g., rank 414 for a parcellation granularity of 414). For all parcellation granularities explored, the participant ranks were set to a common, fixed set of values.
Under the hypothesis that the functional connectivity patterns of a participant are, to some extent, reproducible across scanning sessions, we fix the core tensor $\boldsymbol{\mathcal{G}}$ and the brain parcellation factor matrix $\mathbf{B}$ derived from the Tucker decomposition of the tensor $\boldsymbol{\mathcal{X}}_{\text{test}}$ and estimate the participant factor matrix $\widehat{\mathbf{C}}$ of the tensor $\boldsymbol{\mathcal{X}}_{\text{retest}}$ comprising FCs from another data acquisition session (e.g., the retest session). By doing so, we aim to detect a consistent presence of underlying cohort-level functional connectivity patterns across different data acquisition sessions for each participant. The optimization problem shown in (6),

$$\widehat{\mathbf{C}} = \arg\min_{\mathbf{C}} \; \left\| \boldsymbol{\mathcal{X}}_{\text{retest}} - \boldsymbol{\mathcal{G}} \times_1 \mathbf{B} \times_2 \mathbf{B} \times_3 \mathbf{C} \right\|, \quad (6)$$

admits a closed-form solution given by

$$\widehat{\mathbf{C}} = \mathbf{X}_{(3)} \left[ \mathbf{G}_{(3)} \left( \mathbf{B} \otimes \mathbf{B} \right)^{\top} \right]^{\dagger},$$

where $\dagger$ denotes the Moore–Penrose inverse [45] of a matrix, $\otimes$ denotes the Kronecker product, and $\mathbf{X}_{(3)}$ and $\mathbf{G}_{(3)}$ denote the mode-3 matricization of $\boldsymbol{\mathcal{X}}_{\text{retest}}$ and $\boldsymbol{\mathcal{G}}$, respectively.
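A NumPy sketch of this closed-form estimate, assuming a single brain parcellation factor $\mathbf{B}$ shared by the two brain-region modes; the synthetic example builds a small semi-symmetric Tucker model, perturbs it, and recovers the participant factors. Names and sizes are illustrative.

```python
import numpy as np

def unfold(X, mode):
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def estimate_retest_participant_factors(X_retest, G, B):
    """Closed-form least-squares estimate C_hat = X_(3) [ G_(3) (B kron B)^T ]^dagger,
    with the core G and brain parcellation factor B fixed from the test-session fit."""
    Z = unfold(G, 2) @ np.kron(B, B).T               # fixed part of the mode-3 model
    return unfold(X_retest, 2) @ np.linalg.pinv(Z)   # Moore-Penrose inverse solves the LS problem

# Synthetic check: rank-(4, 4, 3) model with M = 20 regions and N = 10 participants
rng = np.random.default_rng(0)
M, N, R_B, R_C = 20, 10, 4, 3
B = np.linalg.qr(rng.standard_normal((M, R_B)))[0]
C = rng.standard_normal((N, R_C))
G = rng.standard_normal((R_B, R_B, R_C))
G = (G + G.transpose(1, 0, 2)) / 2                   # semi-symmetric core
X_test = np.einsum("pqr,ip,jq,kr->ijk", G, B, B, C)
X_retest = X_test + 0.01 * rng.standard_normal(X_test.shape)
C_hat = estimate_retest_participant_factors(X_retest, G, B)
print(np.max(np.abs(C_hat - C)))                     # small error, at the level of the added noise
```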
2.8. Fingerprinting Quantification
To quantify fingerprinting, we used a measure denominated the matching rate [24], computed from an identifiability matrix $\mathbf{I} \in \mathbb{R}^{N \times N}$, where the entry $I_{jk}$ denotes the Pearson’s correlation between the $j$-th row of the participant factor matrix $\mathbf{C}$ (test session) and the $k$-th row of the estimated participant factor matrix $\widehat{\mathbf{C}}$ (retest session). The main diagonal entries of $\mathbf{I}$ represent similarity levels between different imaging sessions of the same participant. By hypothesis, we expect those entries to be higher than the off-diagonal entries, which represent similarity levels between imaging sessions of different participants. The matching rate is a variation of the identification (ID) rate used in [7] that accounts for the fact that each participant is present only once in the test and retest sets.
The matching rate in (7) is the average frequency with which a participant’s test session is most highly correlated to their retest session, and their retest session is most highly correlated to their test session (note that one does not necessarily imply the other). For matching rates, we impose that once a test session is paired with a retest session, it can no longer be chosen for a new pairing. The relative frequency of successful participant matchings in both directions is then averaged,

$$\text{Matching rate} = \frac{1}{2} \left( \frac{m_{\text{test} \rightarrow \text{retest}}}{N} + \frac{m_{\text{retest} \rightarrow \text{test}}}{N} \right), \quad (7)$$

yielding a value in the range $[0, 1]$, where 0 indicates a failure to correctly match any of the participants’ FCs, and 1 indicates success in matching all participants’ FCs correctly. An algorithmic description of the computation of the matching rate is presented in Algorithm 2.
Algorithm 2 Matching Rate Computation
Input: identifiability matrix $\mathbf{I} \in \mathbb{R}^{N \times N}$
1: $\mathbf{A} \leftarrow \mathbf{I}$
2: $m_1 \leftarrow 0$, $m_2 \leftarrow 0$
3: for $i = 1$ to $N$ do
4:  $k \leftarrow \arg\max_{k'} A_{i k'}$ (retest session best matching test session $i$)
5:  if $k = i$ then
6:   $m_1 \leftarrow m_1 + 1$
7:  end if
8:  $A_{i,:} \leftarrow -\infty$
9:  $A_{:,k} \leftarrow -\infty$
10: end for
11: $\mathbf{A} \leftarrow \mathbf{I}$
12: for $j = 1$ to $N$ do
13:  $k \leftarrow \arg\max_{k'} A_{k' j}$ (test session best matching retest session $j$)
14:  if $k = j$ then
15:   $m_2 \leftarrow m_2 + 1$
16:  end if
17:  $A_{:,j} \leftarrow -\infty$
18:  $A_{k,:} \leftarrow -\infty$
19: end for
20: Output: matching rate $= \frac{1}{2}\left(\frac{m_1}{N} + \frac{m_2}{N}\right)$
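A NumPy sketch consistent with Algorithm 2: Pearson correlations between the rows of the two participant factor matrices are matched greedily without replacement in both directions, and the two success frequencies are averaged. Function and variable names are illustrative.

```python
import numpy as np

def matching_rate(C_test, C_retest):
    """Matching rate between the rows of two participant factor matrices (value in [0, 1])."""
    N = C_test.shape[0]
    ident = np.corrcoef(C_test, C_retest)[:N, N:]     # I[j, k] = corr(test row j, retest row k)

    def one_direction(I_mat):
        A = np.array(I_mat, dtype=float)
        hits = 0
        for i in range(N):
            k = int(np.argmax(A[i]))                  # best remaining match for session i
            hits += (k == i)                          # correct if it is the same participant
            A[i, :] = -np.inf                         # session i has been paired
            A[:, k] = -np.inf                         # the chosen session cannot be reused
        return hits / N

    # test -> retest direction, then retest -> test direction (transpose)
    return 0.5 * (one_direction(ident) + one_direction(ident.T))

# Example: two noisy versions of the same participant factor matrix match almost perfectly
rng = np.random.default_rng(0)
C = rng.standard_normal((50, 30))
print(matching_rate(C, C + 0.1 * rng.standard_normal(C.shape)))
```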
2.9. Fingerprinting Framework Adapted to Tucker Decomposition
The proposed fingerprinting framework consists of five key steps: (i) given a data acquisition session (either test or retest) of an fMRI condition, construct a tensor that contains all participants’ FCs; (ii) decompose the tensor via Tucker decomposition to obtain a core tensor, a brain parcellation factor matrix, and a participant factor matrix; (iii) estimate the other session’s participant factor matrix based on the decomposition of the given session; (iv) obtain an identifiability matrix by computing pairwise Pearson’s correlations between the rows of both participant factor matrices; and (v) calculate the matching rate for the obtained identifiability matrix. A schematic representation of our framework is presented in Figure 1.
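As a usage illustration, the toy run below chains the five steps with the helper functions defined in the earlier sketches (build_fc_tensor, st_hosvd, estimate_retest_participant_factors, matching_rate) on synthetic FCs; it is a sketch of the workflow, not the authors’ pipeline, and the sizes and ranks are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 60, 40                                            # toy numbers of regions and participants

# (i) Build test- and retest-session FC tensors from synthetic symmetric "FCs"
def random_fc():
    return np.corrcoef(rng.standard_normal((M, 2 * M)))

test_fcs = [random_fc() for _ in range(N)]
retest_fcs = [0.9 * fc + 0.1 * random_fc() for fc in test_fcs]   # noisy retest copies
X_test, X_retest = build_fc_tensor(test_fcs), build_fc_tensor(retest_fcs)

# (ii) Tucker decomposition of the test tensor via ST-HOSVD
G, (B, _, C_test) = st_hosvd(X_test, ranks=(30, 30, 20))

# (iii) Estimate the retest participant factor matrix with the core and B held fixed
C_retest = estimate_retest_participant_factors(X_retest, G, B)

# (iv)-(v) Identifiability matrix and matching rate (both handled inside matching_rate)
print(matching_rate(C_test, C_retest))
```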
In the following section, we discuss how the matching rate is affected by parcellation granularity, decomposition rank, and the scanning length of fMRI conditions, both in within- and between-condition scenarios.
4. Discussion
Table 2 shows that, across all conditions, Tucker decomposition-based methods (ST-HOSVD and HOOI) consistently achieve the highest matching rates compared to FCs and other data-driven techniques. Both methods significantly outperform PCA and CP, particularly in conditions with fewer time points, such as Emotion and Relational, where FCs, whether used directly or in combination with PCA, underperform. The results highlight the effectiveness of Tucker decomposition techniques in enhancing fingerprinting accuracy even for fMRI conditions with a short scanning length.
The results presented in Figure 2 provide meaningful insights into the influence of parcellation granularity on fingerprinting accuracy. The observed matching rate plateau at higher granularities indicates that, beyond a certain threshold, increasing the parcellation resolution offers diminishing matching rate returns, which could be due to an increase in noise in the overall data of each participant. This is consistent with prior findings by Finn et al. (2015) [7], who showed that measures from FCs estimated with long scan sessions provide meaningful information about individuals even with moderate parcellation granularity. The significantly higher matching rates for resting-state compared to tasks align with previous research showing that resting-state data capture more stable and individualized connectivity patterns due to their longer scanning length and more consistent brain-wide activity [9]. However, it has also been shown that, when accounting for scanning length, resting-state has a lower fingerprint than tasks [13]. Interestingly, the lower performance observed for Emotion reflects the challenges of identifying individuals from shorter or more transient brain states, where brief tasks lead to less reliable fingerprinting. The lack of a strict linear relationship between task length and matching rates across intermediate conditions suggests that certain cognitive tasks elicit more distinct and stable connectivity patterns, regardless of their duration. This nuanced relationship highlights the complexity of functional connectome dynamics, emphasizing the need for both sufficient data length and appropriate parcellation resolution to maximize fingerprinting reliability.
As shown in Figure 3, compressing the brain parcellation information is detrimental to fingerprinting, as the highest matching rates were obtained with a brain parcellation rank of 414. In contrast, compressing the participant-specific dimension largely preserves the matching rate, as shown by the high matching rates obtained with participant ranks as low as 150 for Relational, Gambling, Social, Language, Working Memory, and resting-state, and of 300 for Emotion and Motor. The results from this analysis imply that the dimension of the participant-specific information, represented by the participant rank, can be considerably compressed while preserving fingerprints. This indicates that there is some redundancy in inter-participant variability, with not every participant adding new information to the data. However, it is important to emphasize that the compressibility of the participant dimension is likely cohort-size dependent and must, therefore, be reexamined when dealing with different cohort sizes.
The within-condition results shown in Figure 4A confirm the presence of functional connectome fingerprints and demonstrate that the proposed framework is particularly effective in uncovering them compared to non-decomposed functional connectomes. Notably, the obtained fingerprinting results were substantially higher than those obtained with FCs, especially for a parcellation granularity of 214, where FC-based fingerprinting performance was notably low (ranging from 37% to 83%). Noting that resting-state achieves matching rates of 100% in Figure 4A, Figure 4B presents a comparison between tasks and resting-state when the FCs of both are computed with an equal number of time points. The higher matching rates obtained for Emotion, Relational, and Gambling in comparison to resting-state suggest that, to some degree, the fingerprints derived from resting-state stem from the considerably longer duration of resting-state scans relative to fMRI tasks.
Unlike during fMRI tasks, participants in the resting-state condition are not engaged with any specific stimulus; therefore, resting-state FCs encode “baseline” functional couplings among brain regions. Between-condition analyses thus allow us to assess the extent to which these couplings (captured by the resting-state brain parcellation factor matrix) can be recovered when participants engage in tasks. Even though extracting fingerprints in this setting is inherently more challenging than in the within-condition setting, we were able to substantially increase matching rates relative to using the original FCs. Figure 5 shows that the results are more sensitive to the parcellation and participant ranks, with a clear benefit in setting both to the maximum possible value in many of the tasks. Furthermore, Figure 6 shows that ST-HOSVD provided an even greater improvement in matching rates relative to FCs than in the within-condition fingerprints. This supports the rationale that baseline functional couplings exist and can be effectively uncovered even while participants are engaged in a task.
We also explored the effect of reducing the resting-state BOLD time series duration to match the duration of each task. Doing so enables determining whether the entire scanning length is necessary to extract the “key features” that facilitate obtaining between-condition fingerprints. From the two sampling procedures carried out in this study, it is clear that randomly sampling time points is the superior strategy for fingerprinting. Comparing the results of Figure 7A,B, we see that randomly sampling time points of resting-state scans is not only more effective than sampling them consecutively, but also as effective as constructing resting-state FCs using the full scan length (as shown in Figure 5). This result aligns with the fingerprinting improvements seen when sub-sampling frames of edge-based time series [46].
The results presented in Figure 8 highlight the effectiveness of the proposed ST-HOSVD-based framework in enhancing FC fingerprinting relative to using FCs directly. Panels A and B illustrate the identifiability matrices and participant similarity distributions for Emotion and resting-state, respectively. The ST-HOSVD framework considerably improves the separation between within-participant and between-participant similarity, as evidenced by the clearer diagonal structure in the identifiability matrices and the sharper peak of the within-participant similarity distribution. This indicates that the participant factor matrix derived from ST-HOSVD better captures the individual-specific features of FC patterns compared to vectorized FCs. Panels C, D, and E display qualitative examples in which the proposed framework fails to correctly match the Emotion FCs of one participant, but succeeds in matching the FCs of participants that cannot be matched using FCs directly. Visually, it is easy to see similarities between the FCs of participants 2 and 3; however, the same cannot be said about the FCs of participant 6. While these results demonstrate the improved fingerprinting capability of the ST-HOSVD-based framework, potential limitations exist. Factors such as variations in FC stability across tasks or inter-individual differences in connectivity patterns could influence the framework’s performance. Additionally, differences in cohort size, scanner parameters, or preprocessing pipelines may affect the generalizability of the results and should be further analyzed in future studies.
Both CP and Tucker decomposition are commonly used tensor decomposition techniques for dimensionality reduction and feature extraction purposes. In the context of fingerprinting, CP falls short for two key reasons. First, the assumption that the original high-dimensional data can be reconstructed using non-interacting components is too restrictive, as we know that there are innate interactions between brain regions from a functional connectivity standpoint [47]. Second, because CP is a single-rank decomposition, we cannot freely explore the dynamics of compressing the different dimensions of the data. Conversely, Tucker’s core tensor plays a pivotal role in capturing the interactions between components while giving us the flexibility to explore how different levels of compression of the brain parcellation and participants’ information affect fingerprinting, thus overcoming both drawbacks of CP decomposition. However, drawing neuroscientific insights directly from the factor matrices derived from the Tucker decomposition of FCs is non-trivial due to the existence of the core tensor [21], which captures several interactions between the components of each factor matrix.
FC fingerprinting, while having the potential to provide valuable insights in clinical and forensic settings, raises significant ethical concerns regarding privacy, bias, and potential misuse. The sensitive nature of neuroimaging data, which can reveal information about an individual’s cognitive state, mental health, or even predispositions to certain conditions, makes it highly vulnerable to privacy breaches. Without proper anonymization, such data could be exploited for unauthorized profiling or discrimination. To safeguard privacy, robust anonymization techniques, such as de-identification and differential privacy, should be implemented. Additionally, data security measures, including encryption during storage and transmission, as well as strict access controls, are essential to prevent unauthorized usage. Bias is another concern, as models trained on non-representative datasets may lead to inaccurate or unfair identifications, particularly in forensic applications. Ensuring diverse, unbiased datasets and regularly auditing algorithms for fairness can help mitigate this risk. Addressing these ethical implications is crucial to prevent the misuse of neuroimaging fingerprinting and protect individual rights.
Our study has limitations. As discussed above in detail, interpreting the factor matrices derived from Tucker decomposition is not straightforward due to the presence of a core tensor that captures complex interactions between all factor matrices [21]. Additionally, the proposed fingerprinting framework does not allow for incremental updates to the core and factor matrices when FCs from new participants are introduced; rather, the entire Tucker decomposition and fingerprinting framework would need to be recomputed. Our study also opens several avenues for further research. When preprocessing fMRI BOLD data, there is a large number of pipelines, steps, and parameters that can be used, with each specific configuration possibly leading to different FC estimations and, ultimately, to differences in fingerprinting. Further work could assess the specific impact of such decisions (e.g., global signal regression, bandpass filtering) on the association between Tucker decomposition and matching rates. Also, while Pearson’s correlation is the most widely used coupling method for fMRI time series to estimate functional connectivity, other alternatives, such as mutual information, should be considered in order to assess the impact of different coupling methodologies when using decomposition methods to assess fingerprinting. To improve the interpretability of Tucker decomposition, future work could explore strategies to extract meaningful neuroscientific insights from the core tensor and factor matrices. One promising approach is to impose sparsity constraints on the core tensor using $\ell_1$ or related sparsity-inducing regularization, which could help isolate dominant functional connectivity patterns shared across individuals. Simultaneously, the participant factor matrix would reveal the contribution of each underlying pattern to an individual’s FC, thereby enhancing interpretability. Another possible path for deriving neuroscientific insights is to perform post-hoc statistical analyses by correlating the components of the factor matrices with cognitive or behavioral measures. Doing so would help bridge the gap between Tucker-based decomposition methods and neuroscientific interpretation, allowing for a better understanding of how extracted patterns relate to individual differences in cognition, behavior, or clinical state.