A Novel Longitudinal Phenotype–Genotype Association Study Based on Deep Feature Extraction and Hypergraph Models for Alzheimer’s Disease

Traditional image genetics primarily uses linear models to investigate the relationship between brain image data and genetic data for Alzheimer’s disease (AD) and does not take into account the dynamic changes in brain phenotype and connectivity data across time between different brain areas. In this work, we proposed a novel method that combined Deep Subspace reconstruction with Hypergraph-Based Temporally-constrained Group Sparse Canonical Correlation Analysis (DS-HBTGSCCA) to discover the deep association between longitudinal phenotypes and genotypes. The proposed method made full use of dynamic high-order correlation between brain regions. In this method, the deep subspace reconstruction technique was applied to retrieve the nonlinear properties of the original data, and hypergraphs were used to mine the high-order correlation between two types of rebuilt data. The molecular biological analysis of the experimental findings demonstrated that our algorithm was capable of extracting more valuable time series correlation from the real data obtained by the AD neuroimaging program and finding AD biomarkers across multiple time points. Additionally, we used regression analysis to verify the close relationship between the extracted top brain areas and top genes and found the deep subspace reconstruction approach with a multi-layer neural network was helpful in enhancing clustering performance.


Introduction
Alzheimer's disease (AD) is a progressive neurodegenerative disease and has a devastating financial and emotional impact on individuals and their families [1]. Memory loss, cognitive decline, and behavioral problems are the main clinical signs of AD, which will continue to progress over time [2]. Regional variations in the disease's progression will occur at various times and in various parts of the brain [3]. Numerous studies have discovered that these regional variations may be influenced by different single nucleotide polymorphisms (SNPs) at different times. The study of how complicated genetics affects temporal patterns in imaging genetics has gained popularity [4,5]. To realize early intervention and treatment for AD and stop AD's progression, finding biomarkers in the disease's progression is advantageous [6]. To determine the correlation between imaging genetics data, the majority of early imaging genetics relied on linear models, such as the Sparse Canonical Correlation Analysis (SCCA) algorithm [7]. SCCA is a standard high-dimensional data analysis method based on CCA [8]. By adjusting the weights of different properties of various modal data, it can maximize the correlation of two different modal data. Based on SCCA, Fang et al. incorporated the concept of a joint sparse model and fusion graph lasso. They presented a Joint Sparse Canonical Correlation Analysis (JSCCA) technique to increase the accuracy of the algorithm for the joint analysis of high-dimensional imaging data [9]. Furthermore, Kim et al. extended JSCCA to include connectivity-based penalties, validating the robustness of the algorithm to select potential biomarkers associated with Parkinson's disease on both simulated and real imaging genetics datasets [10]. The link between SNPs and quantitative traits (QTs) can also be examined using the sparse additive model (SpAM). Compared to traditional linear models, it can produce a more realistic impact when taking into account the nonlinear effect of SNP [11]. However, the majority of these methods focus on the relationship between quantitative traits (QTs) and SNPs from a single time point, which may disregard the wealth of historical data that the evolving QT carries through numerous time points. In order to explore the interaction between QTs and genetic data in brain imaging, several researchers have taken into account the time structure of longitudinal imaging data. The most typical method is to improve the SCCA algorithm by including sparse terms to help in the identification of pertinent markers and the construction of a diagnostic model, enabling the algorithm to locate a few strong correlation pairs in high-dimensional imaging genetics data. A valuable temporally constrained group sparse canonical correlation analysis (TGSCCA) algorithm was proposed by Hao et al. [12] to identify the relationship between SNPs and longitudinal sMRI phenotypes. Du et al. [4] proposed the multi-task sparse canonical correlation analysis (T-MTSCCA) to analyze how SNPs change brain QT over time using longitudinal neuroimaging data. Recent research has begun to take joint multimodal longitudinal regression and classification into account to find AD biomarkers and associations between data from various modalities [13].
Considering that linear models may not be able to reveal how genetic factors affect the structure and function of the brain through complex nonlinear relationships, and most of the current methods for analyzing image genetic data only apply QTs from a single time point (e.g., baseline), in this study, we proposed a novel algorithm named DS-HBTGSCCA that combines the deep subspace reconstruction and hypergraph-based Time Constraint Group SCCA (TGSCCA) algorithms to analyze the bivariate correlation between reconstructed longitudinal structural magnetic resonance imaging (sMRI) data and gene expression data. Firstly, the deep subspace reconstruction algorithm was used to reconstruct the original data, and the unique expression patterns of different label samples were identified. Specifically, the deep subspace reconstruction approach transferred each sample to a nonlinear feature space via a multilayer feedforward neural network to extract the nonlinear link between regions of interest (ROIs) and gene variants. Furthermore, it was innovative to maintain the high-order structural relationship between the identical modal samples by using the hypergraph's Laplacian matrix and K nearest neighbors (KNN) to embed the hyperedge and treat each feature as a vertex. Finally, we performed a biological significance analysis and regression analysis of top brain regions on the top risk brain regions and top genes mined, and verified the superiority of the reconstructed data.

Data Source and Preprocessing
The gene expression data and longitudinal brain imaging data of 210 non-Hispanic white participants were downloaded from the Alzheimer's Disease Neuroimaging Disease (ADNI) website (https://adni.loni.usc.edu/ (accessed on 15 March 2022)), including 10  in this paper needs to collect sMRI and gene expression data of the same subject at multiple time points, the interval between time points needs to be roughly consistent. Therefore, only 10 AD samples meet our screening requirements. The demographics of the participants are listed in Table 1. From the ADNI database, we retrieved the sMRI data for the samples with the time distributions of baseline (T1), 6th month (T2), 12th month (T3), and 24th month (T4). After selecting, we eliminated brain imaging samples presenting illegible structures and samples having distorted images, leaving us with 210 samples in total. The four brain images of each sample then underwent head motion correction segmentation, registration, and feature extraction using the SPM12 program in MATLAB. After extracting 122 ROIs using the Anatomical Automatic Labeling (AAL) template, 32 cerebellar structures were eliminated and 90 ROIs were kept as the final characteristics of the brain image.

Preprocessing of Gene Expression Data
For the original gene expression data downloaded from the ADNI website, differential expression analysis was performed on 210 samples (diseased group: 148; healthy control group: 62) using the limma package of R [14], and genes with p < 0.05 were included, and finally 413 significantly differential expression genes were obtained as features of gene expression data.

Deep Subspace Reconstruction Algorithm
Because of the large amount of noise in biological data, the estimation bias will be introduced when the biological data are constrained by a network or graph. Therefore, considering the multi-subspace structure of the bottom data layer, samples with the same label are gathered in the same subspace, and different samples are distributed in different subspaces. The objective function of the deep subspace reconstruction algorithm [15,16] is defined as: where W x is the self-expression coefficient matrix, and X is the feature matrix of the original data. Data label information is a crucial piece of prior knowledge, and it is possible to find greater correlation between imaging and genetic data by putting tags into the algorithm. On this basis, the following objective function can be obtained by incorporating the tag information: where ∼ X represents the sample set of all labels, x Nonlinearity, high dimension, and small sample size have always been the limitations of SCCA when trying to calculate the correlation between two variables. Therefore, in our study, the deep subspace reconstruction algorithm was used to reconstruct longitudinal sMRI data and gene expression data. By introducing a multilayer feedforward neural network, each sample's brain network QTs and gene expression data can be built in a nonlinear space. The schematic diagram of deep subspace reconstruction is depicted in Figure S1 of the Supplementary Material.

Hypergraph Learning
Ordinary graphs can be used to describe the binary relationship between two objects, whereas hypergraphs [17] have the ability to connect more than two vertices together through hyperedges to exploit the relationship between multiple related objects. By using the prior knowledge between nodes, one can create hyperedges and, subsequently, hypergraphs. High-order correlations between samples belonging to the same modal can be maintained using the Laplace matrix of hypergraphs.
We set G(V, E, W), which represents a hypergraph, where V is the set of vertices, E is the set of hyperedges, and W is the set of weights of the hyperedges. Each hyperedge e i (i = 1, 2, . . . N e ) is assigned a weight a(e i ). Figure S2 of the Supplementary Material is a hypergraph, in which each hyperedge can connect multiple nodes. For the given hypergraph G, its incidence matrix H is defined as follows: Based on the incidence matrix H, the degree v of each vertex and the degree e of each hyperedge are, respectively, defined as follows: In addition, let D v and D e represent the diagonal matrices corresponding to the degrees of vertices and the degrees of hyperedges, respectively. W represents a diagonal matrix containing hyperedge weights.
In this paper, we adopt the method proposed in [18] to compute the hypergraph Laplacian matrix: Among them, L h is the hypergraph Laplacian matrix, where I is the identity matrix and

The Proposed Optimization Algorithm
This paper merged the hypergraph matrix of two kinds of omics data after deep subspace reconstruction into the TGSCCA algorithm and induced the new algorithm (DS-HBTGSCCA) to find the high-order relationship between omics data from the sample relationship provided by different omics data (the high-order relationship between longi-tudinal image data and gene expression data is shown in Figure S3). The system frame diagram is shown in Figure 1.
This paper merged the hypergraph matrix of two kinds of omics data after deep subspace reconstruction into the TGSCCA algorithm and induced the new algorithm (DS-HBTGSCCA) to find the high-order relationship between omics data from the sample relationship provided by different omics data (the high-order relationship between longitudinal image data and gene expression data is shown in Figure S3). The system frame diagram is shown in Figure 1. The original gene expression data * and longitudinal imaging phenotype data * are reconstructed in depth subspace to obtain X and , where * = , … , , … , ∈ is the gene expression data of N subjects, * = , … , , … , ∈ is longitudinal imaging phenotype data, and are the hypergraph matrices of X and Y, respectively. Hypergraph constraints can be rewritten with P ( ) (i = 1,2) as follows: where and are the Laplace matrices of X and Y, respectively, and = , = .
Based on the TGSCCA model [13], the optimization algorithm function can be written as: where T is the number of time points and = , … , , … , ∈ .

Performance Index of the Algorithm
In this paper, we take the canonical correlation coefficient (CCC) as the performance index of the algorithm. The calculation formula of this index is given below.
Among them, stands for CCC of correlation analysis of two kinds of data at the time.
( ) stands for Pearson CCC. The original gene expression data X * and longitudinal imaging phenotype data Y * t are reconstructed in depth subspace to obtain X and Y t , where

Regression Index
. . , y nt , . . . , y Nt ] T ∈ R N×q is longitudinal imaging phenotype data, L h 1 and L h 2 are the hypergraph matrices of X and Y, respectively. Hypergraph constraints can be rewritten with P (H i ) (i = 1,2) as follows: where B 1 and B 2 are the Laplace matrices of X and Y, respectively, and Based on the TGSCCA model [13], the optimization algorithm function can be written as: where T is the number of time points and

Performance Index of the Algorithm
In this paper, we take the canonical correlation coefficient (CCC) as the performance index of the algorithm. The calculation formula of this index is given below.
Among them, CCC t stands for CCC of correlation analysis of two kinds of data at the t time. Corr(·) stands for Pearson CCC.
2.6. Regression Index R 2 score is recommended to evaluate the benefits and drawbacks of regression models. The formula for calculating R 2 _score is: Biomolecules 2023, 13, 728 6 of 18 It can be simplified to: where MSE stands for mean squared error and Var stands for variance.

Statistical Analysis
Subjects' age statistics were presented in the form of mean ± standard deviation (SD) and statistically analyzed using the t-test of SPSS 20.0 software. Figure 1 was drawn by using online software called ProcessOn (https://www.processon.com (accessed on 30 November 2022)).   models. The formula for calculating R2_score is: It can be simplified to: where MSE stands for mean squared error and Var stands for variance.

Statistical Analysis
Subjects' age statistics were presented in the form of mean ± standard de (SD) and statistically analyzed using the t-test of SPSS 20.0 software. Figure 1 was by using online software called ProcessOn (https://www.processon.com (accessed November 2022)).

On Simulation Data
To evaluate the performance of the proposed DS-HBTGSCCA, we simulated neuroimaging datasets T (i = 1,2,3,4) and genetic dataset X with n samples, where T has p features, and X includes q features. We set n = 200, p = 90, and q = 450. Assuming that the two data are correlated, α represents the sMRI feature weight vector with p elements, and β represents the gene feature weight vector with q elements, where each element independently obeys a uniform distribution U(−1, −0.5) ∪ U(0.5,1). The correlation between sMRI and genes can be simulated by using a latent variable ε with a normal distribution N(0, σ ). For sMRI data, T = βϵ + e represents correlated voxels, T = e represents uncorrelated voxels and e represents noise with the normal distribution N(0, σ ). T = T + ∆v (∆v ~ N (0, 0.1)) and i = 1,2,3. For gene expression data, X = αε + e and X = e are generated to represent correlated and uncorrelated voxels, respectively. The noise level of the simulation data set was set from 1 to 5, which was employed to compare the performance of the proposed algorithm and the TGSCCA. Then, on the generated 200 simulation data samples, the training set and the test set were divided according to 4:1, and the model was trained on the training set to obtain the optimal hyperparametric solution. The parameter tuning range is {0.0001, 0.001, 0.01, 0.1, 1}. Each loop's average of the CCCs from the four epochs and its used hyperparameters were released at the end of each loop. Next, the trained model from the test set was utilized to calculate the correlation of each period, and the corresponding weight and CCC were then calculated. Taking TGSCCA as the control group, the CCCs of the two algorithms in four peri-

On Simulation Data
To evaluate the performance of the proposed DS-HBTGSCCA, we simulated neuroimaging datasets T i (i = 1, 2, 3, 4) and genetic dataset X with n samples, where T i has p features, and X includes q features. We set n = 200, p = 90, and q = 450. Assuming that the two data are correlated, α represents the sMRI feature weight vector with p elements, and β represents the gene feature weight vector with q elements, where each element independently obeys a uniform distribution U(−1, −0.5) ∪ U(0.5, 1). The correlation between sMRI and genes can be simulated by using a latent variable ε with a normal distribution N 0, σ 2 . For sMRI data, T i = β + e represents correlated voxels, T i = e represents uncorrelated voxels and e represents noise with the normal distribution N 0, σ 2 e . T i+1 = T i + ∆v (∆v ∼ N (0, 0.1)) and i = 1, 2, 3. For gene expression data, X = αε + e and X = e are generated to represent correlated and uncorrelated voxels, respectively. The noise level of the simulation data set was set from 1 to 5, which was employed to compare the performance of the proposed algorithm and the TGSCCA. Then, on the generated 200 simulation data samples, the training set and the test set were divided according to 4:1, and the model was trained on the training set to obtain the optimal hyperparametric solution. The parameter tuning range is {0.0001, 0.001, 0.01, 0.1, 1}. Each loop's average of the CCCs from the four epochs and its used hyperparameters were released at the end of each loop. Next, the trained model from the test set was utilized to calculate the correlation of each period, and the corresponding weight and CCC were then calculated. Taking TGSCCA as the control group, the CCCs of the two algorithms in four periods under different noise environments were obtained (Tables S1-S5). In addition, we drew corresponding line charts through the data of Tables S1-S5, so as to better observe the changes in the data ( Figure 2).
As shown in Figure 2, we evaluated the CCC of DS-HBTGSCCA and TGSCCA at different noise levels as a performance indicator. The results showed that the two algorithms obtained the largest CCC when noise = 2. In T1, T2, and T3 periods, our algorithm was better than TGSCCA. Only in the T4 period, the CCC of TGSCCA was greater than DS-HBTGSCCA, but there were not many differences in the two algorithms. In other noise cases, the CCC of DS-HBTGSCCA was larger than that of TGSCCA. This indicated that our approach was more effective at detecting longitudinal joints than TGSCCA and had good robustness.

On Real Imaging Genetic Data
This experiment made use of the real data in Section 2.1. To prove the superiority of the reconstructed data, we conducted experiments on the optimized model using both the original and the reconstructed data. The dividing method and parameter setting of the training set and the test set of the two data were consistent with the simulated data. Tables 2 and 3 are CCCs on the reconstructed and the original data using two methods on ADNI data, respectively. Table 1 shows that the correlation of DS-HBTGSCCA was higher than the TGSCCA algorithm's correlation in the T1 and T2 periods. The difference between the two in the T3 period was not statistically significant, and our algorithm's CCC was lower than the TGSCCA algorithm's correlation in the T4 period. Meanwhile, the CCCs of the reconstructed data for both algorithms were higher than the CCCs of the original data, further demonstrating that the data reconstructed by the deep subspace reconstruction method and hypergraphs can more accurately depict the relationship between longitudinal sMRI data and gene expression data. In a word, the correlation performance of the DS-HBTGSCCA algorithm was better than TGSCCA in real data at most times.

Top ROIs and Top Genes Identification
The heatmaps of the brain's regions and the gene weight values obtained using the two different methods on reconstructed data are shown in Figure 4. DS-HBTGSCCA can uncover consistent patterns at four time points in the process of examining brain risk regions and risk genes, whereas the TGSCCA algorithm can only be employed at certain time points. This implies that DS-HBTGSCCA improved the correlation performance between longitudinal sMRI data and gene expression data and that our method was highly stable for detecting longitudinal joints.
The top 10 ROIs on the reconstructed longitudinal sMRI, as determined by our method and TGSCCA, are listed in Table 4, along with their weights (absolute values). Compared with TGSCCA, DS-HBTGSCCA could identify the brain regions with the same weights in four time points because of the innovative addition of deep subspace reconstruction to reconstruct the original data and the high-order correlation of the reconstructed data by hypergraph mining, which signifies that DS-HBTGSCCA could jointly select brain regions with the same capture time pattern in ROIs from adjacent time points and had strong robustness. The top 20 genes determined by DS-HBTGSCCA are provided in Table 5, together with their canonical weights (absolute values) based on reconstructed gene expression data.   Figure 4 displays a graphic mapping of the top 10 ROIs on the reconstructed longitudinal sMRI. To determine whether there was a connection between nodes, the Pearson CCC (PCC) between the top 10 ROIs was calculated, and the mean value of PCC (0.9798) was calculated. In this figure, the size of the nodes was represented by the weight calculated in accordance with CCA. The two brain regions with values greater than the mean had a connectivity and relationship with one another.
In order to study the influence of time series ROI changes, we performed an independent sample t-test between the four periods of T1-T4 for the top 10 brain regions mentioned and found that only two brain regions (left middle temporal (p = 0.025) and right inferior temporal (p = 0.04)) between T1 and T4 had significant differences. Then, we draw the heatmaps of the two brain regions with significant differences and the top 20 genes for HC and AD in T1 and T4 ( Figure 5). Figure 6 shows the enrichment analysis results of the Top 10 genes to explore the relationship between the top 20 differentially expressed genes selected by the DS-HBTGSCCA and see their functions clearly.

Regression Results Using Selected Top Markers
In this section, we utilized gene expression data to forecast the expression of gray matter volume in real data in order to predict the regression response. The smaller the error of response, the closer the relationship between selected genes and gray matter volume, and the stronger the representativeness. Regression analysis was carried out specifically using Bayesian regression, linear regression, and ridge regression. The gray matter volume of the top 10 brain regions was regressed using the top 10 to top 100 genes, respectively. The mean R 2 _score obtained from the reconstructed data by the three regression methods is given in Table 6.

Effectiveness Verification Results of Deep Subspace Reconstruction
To evaluate the influence of the deep neural network on subspace clustering performance, we tested the clustering effect of multi-omics data with and without a multilayer neural network and used six different indices (contour coefficient, F-score, Precision, Recall, normalized mutual information (NMI), adjusted rand index (Adj-RI)) to evaluate the clustering performance ( Table 7). The higher the value of the indices, the better the clustering effect.
We also evaluated the effectiveness of the algorithm reconstruction by using the Kmeans algorithm to cluster based on the reconstructed T1, T2, T3, T4, and gene data. Then, we applied t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimension for visualization ( Figure S5). We introduced three clustering indicators (silhouette_score, calinski_harabasz_score, and davies_bouldin_score) to evaluate the reconstruction effect (Tables S6-S8). Among them, the bigger the silhouette_score and calinski_harabasz_score, the smaller the davies_bouldin_score.
Based on Tables 2 and 3, we discovered that the reconstructed data have higher CCCs across epochs on our model than the original data. Subsequently, we identified temporally consistent brain areas and genes. Taking T1 period as an example for longitudinal images, we drew heatmaps of the top 10 ROIs and top 20 genes identified by DS-HBTGSCCA on the reconstructed and original data (Figure 7). Note: The data above were represented as mean ± standard deviation (SD) of subspace clustering algorithm with different initialization after 20 times.

Biological Significance Analysis of Top ROIs
MRI can be used to evaluate the pattern of alterations in the gray and white matter of AD brains to advance AD research. Based on the effectiveness of the DS-HBTGSCCA, the top 10 ROIs have mostly been documented to be associated with AD. For example, researchers discovered that the left middle temporal lobes were closely associated to AD in the experiment evaluating the effect of tandospirone citrate on the treatment of AD [19]. Left precuneus, right precuneus, left precentral gyrus, and right postcentral gyrus were identified to be associated to AD by creating a multi-layer network model of NC, MCI, and AD [20]. In MCI patients, the right language network may be damaged, and the right middle temporal gyrus is abnormal [21]. In healthy middle-aged individuals, AD risk genes also change the default mode network's (DMN) functional connection mode, which results in differences in the left middle frontal lobe [22].

Correlation Analysis between Time-Series Related ROIs and Top Genes
We found a correlation between the top brain regions with time series and top genes. It was clear that more powerful correlation pairs were captured in the transition from HC to AD samples ( Figure 5). The middle temporal gyrus is involved in many different brain functions, such as thinking and recognizing faces. The alexia and agraphia of Chinese characters are also caused by injury to the posterior part of the middle temporal gyrus in the left cerebral hemisphere [23]. The inferior temporal gyrus is associated with abnormal semantic memory disorders (such as AD) [24]. The inhibitor of translation initiation of the EIF4EBP2 gene involves synaptic plasticity, which can affect learning and memory formation through similarity. RTN4.1 is a member of the RTN4 family. RTN4-related diseases include temporal lobe epilepsy (TLE), which is a chronic disease of the nervous system, and its related pathways include AD and miRNA effect (https://www.genecards. org/cgi-bin/carddisp.pl?gene=RTN4&keywords=RTN4 (accessed on 16 March 2023)). Research has shown that TLE is related to memory impairment and memory loss. The loss of pyramidal cells in TLE will cause verbal memory defects, and the loss of right neurons is more prominent in non-verbal (visual-spatial memory loss). Therefore, it can be inferred that EIF4EBP2 is related to the inferior temporal gyrus, which can cause memory impairment, and RTN4 is related to the middle temporal gyrus, which can cause alexia. Future research can find out the expression of these two genes in neurodegenerative diseases such as AD and determine their relationship [25][26][27].

Biological Significance Analysis of Top Genes
Many human diseases are intrinsically caused by genes. It is possible to forecast AD, administer the proper pharmacological intervention, halt the progression of AD, and significantly contribute to AD clinical research by researching the association between genes and AD and finding genes connected to AD. All of the aforementioned studies conclusively demonstrated that the top 20 genes (FCER1G, AIF1.1, EIF4EBP2, MORF4L1, PRR13, CCNY.2, TUBB1, MAPKAPK3.1, IER3, PECAM1, CELF2.1, CORO1C.2, RTN4.1, XRCC5), which account for 70% of the disease, were either directly or indirectly associated to the development of AD. FCER1G, which has a strong negative correlation with the right middle temporal of AD in T1 period, is closely related to the pathophysiology of Aβ, and the expression level of FCER1G affects the progress of AD [28]. Inflammation and apoptosis may all lead to AD [29][30][31]. MAPKAPK3.1 and CORO1C.2 were negatively correlated with right middle temporal in AD samples at T1 and T4, in which MAPKAPK3 was involved in regulating the inflammatory response of mammals [32], while CORO1C was related to apoptosis. It can be inferred that MAPKAPK3.1 belonging to mapkapk3 family and CORO1C.2 belonging to coro1c family may also participate in the pathological process of AD. Evidence has also shown that other genes with no obvious positive and negative correlation may play an important role in AD. Sanfilippo et al. [33] found that the expression level of PECAM1 in the brain of AD patients was regulated. PRR13 negatively regulates thrombospondin-1 (TSP1) expression at the level of transcription. This downregulation was shown to reduce apoptosis. The studies in [34,35] all confirmed that TSP-1 is a potential therapeutic target of AD pathogenesis. Future research needs to examine how the expression of other genes changes throughout time to affect how AD develops.
The results of GO analysis showed that these genes' pathways are almost entirely related to AD, which demonstrated that our approach had a substantial benefit in terms of feature selection ( Figure 6). It was obvious from the figure that these pathways' biological processes were most intensive when they relate to neutrophils. The migration of neutrophils to Aβ could impair cognitive function [36,37]. Additionally, a crucial component of treating AD is inhibiting neutrophil elastase and neutrophil transport [38][39][40]. Overactivation of neutrophils is an essential feature of AD [41]. Secondly, we realized a biological pathway associated with platelets, which are crucial to the pathophysiology of AD. Platelets create Aβ and other amyloid precursor protein (APP) secretase products [42]. Thrombocyte degranulation releases APP (sAPP) secreted by recombination, which will lead to negative feedback regulation during platelet activation [43]. Compared with the control group, AD patients retain more APP in their activated platelets [44]. The pathophysiology of AD may also be impacted by serotonin [45]. Other biological mechanisms that are implicated to AD include phagocytosis and the response to hydroperoxide [46][47][48].

Regression Analysis
According to Table 4, the gray matter volume of the brain was regressed using three conventional regression techniques and their R2 scores were all fairly close to 1. The result demonstrated that all three regression approaches had outstanding model fit and strong regression performance. It also clearly indicated how closely the amount of gray matter in the brain correlates with the genes appointed by the proposed algorithm. We confirmed the validity of the approach in this work for detecting genetic connections with QTs of MRI-derived ROI measurements in the ADNI Cohort.
The contour coefficient of the subspace clustering algorithm through a multilayer neural network was larger than that of the subspace clustering algorithm without a multilayer neural network in the gene, T2, T3, and T4 periods (Table 7). Additionally, in the T1 period, the contour coefficients of the two were similar. The mean ± standard deviation of five indices, such as F-score, which had passed through the subspace clustering algorithm of the multilayer neural network 20 times with different initialization, was basically larger than that of the subspace clustering algorithm without the multilayer neural network in the gene, T1, T2, T3 and T4 periods. It was obvious that the multilayer neural network enabled the subspace clustering technique to achieve superior clustering performance. Figure 7 illustrated how little disturbance was present in the heatmap created using the reconstructed data, regardless of whether it represented brain regions or gene expression data, and how this results in a more pronounced pattern, demonstrating how the features selected by the reconstructed data were more stable and could be chosen more precisely. The distinguishing characteristics attested to the efficiency of the deep subspace reconstruction algorithm.

Conclusions
We proposed a combined method of deep subspace reconstruction and hypergraphbased TGSCCA (DS-HBTGSCCA), which investigated the nonlinear features and high-order correlation between longitudinal sMRI data and gene expression data. It was successfully applied to the identification of AD biomarkers. We compared the performance of TGSCCA and DS-HBTGSCCA on simulated data sets and real data sets, respectively. The experimental results showed that the biological significance analysis of the top 10 brain regions and top 20 genes discovered by DS-HBTGSCCA confirmed that this method could accurately detect the AD-related pathogenic ROIs and genes, and that two risk brain regions (left middle temporal and right inferior temporal) with significant differences in time sequence were found in the top 10 brain regions. In addition, a regression analysis of the top brain areas was carried out, and the effectiveness of the deep subspace strategy was verified.
However, there are some limitations in this study. There was a small number of AD subjects compared to other groups. An unbalanced number of different types of samples often leads to deviations in results. In future research, we aim to use SMOTE and other resampling methods and collect more samples to overcome this problem. In addition, more genetic and clinical data will be taken into account in imaging genetics research to explore complex relationships between the data. Meanwhile, developing deep learning techniques for studying the correlation between longitudinal images and genetic data and recognizing nonlinear changes in features over time would be an interesting research direction.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/biom13050728/s1, Figure S1: The schematic diagram of deep subspace reconstruction; Figure S2: A hypergraph model (A is a hypergraph, in which each hyperedge can connect multiple nodes; B is the index matrix of A); Figure S3: Heatmaps of high-order correlation of multiomics Data. A, B, C, D and E represent the heatmap of high-order correlation among T1, T2, T3, T4 and gene data, respectively, and the depth of color in the heatmap represents the high-order similarity between samples; Figure S4: The scatter plot of t-SNE dimension reduction before and after brain imaging and gene reconstruction in different periods; Table S1: When noise = 1, the CCCs of the two methods in four different periods; Table S2: When noise = 2, the CCCs of the two methods in four different periods; Table S3: When noise = 3, the CCCs of the two methods in four different periods; Table S4: When noise = 4, the CCCs of the two methods in four different periods; Table S5: When noise = 5, the CCCs of the two methods in four different periods; Table S6: Evaluation of reconstruction effect by using silhouette_score; Table S7: Evaluation of reconstruction effect by using calinski_harabasz_score; Table S8: Evaluation of reconstruction effect by using davies_bouldin_score.