Mining High-Level Imaging Genetic Associations via Clustering AD Candidate Variants with Similar Brain Association Patterns

Brain imaging genetics examines associations between imaging quantitative traits (QTs) and genetic factors such as single nucleotide polymorphisms (SNPs) to provide important insights into the pathogenesis of Alzheimer’s disease (AD). The individual level SNP-QT signals are high dimensional and typically have small effect sizes, making them hard to be detected and replicated. To overcome this limitation, this work proposes a new approach that identifies high-level imaging genetic associations through applying multigraph clustering to the SNP-QT association maps. Given an SNP set and a brain QT set, the association between each SNP and each QT is evaluated using a linear regression model. Based on the resulting SNP-QT association map, five SNP–SNP similarity networks (or graphs) are created using five different scoring functions, respectively. Multigraph clustering is applied to these networks to identify SNP clusters with similar association patterns with all the brain QTs. After that, functional annotation is performed for each identified SNP cluster and its corresponding brain association pattern. We applied this pipeline to an AD imaging genetic study, which yielded promising results. For example, in an association study between 54 AD SNPs and 116 amyloid QTs, we identified two SNP clusters with one responsible for amyloid beta clearances and the other regulating amyloid beta formation. These high-level findings have the potential to provide valuable insights into relevant genetic pathways and brain circuits, which can help form new hypotheses for more detailed imaging and genetics studies in independent cohorts.


Introduction
Alzheimer's Disease (AD) is a complex neurodegenerative disorder characterized by continuous cognitive impairment and eventual amyloid plaques, neurofibrillary tangles, and atrophy patterns in the brain [1][2][3]. As the most common type of dementia, AD is responsible for approximately 5.8 million dementia cases in US [4]. AD has a heritability ranging from 60% to 80% estimated from the twin study [5]. The most widely used approach to identify AD genetic basis is to perform a genome-wide association study (GWAS) or GWAS-based meta-analysis on case-control phenotypes. Over 50 AD-related single nucleotide polymorphisms (SNPs) have been identified [6,7].
Many previous AD studies use GWAS and pathway enrichment analysis to explore the genetic basis of the AD diagnosis [3,[7][8][9][10][11][12][13][14]. However, these case-control genetic association studies cannot directly reveal the biological pathways from genetic determinants, molecular signatures, and brain traits to cognitive and clinical outcomes. To bridge this gap, brain imaging genetics [15][16][17] is emerging as a new research field, where quantitative traits (QTs) extracted from brain imaging data are used as intermediate phenotypes to study genetics. These imaging QTs have the potential to not only link genetics with disease outcomes but also capture neuropathological heterogeneity of AD [18,19].
Conventional brain imaging genetics studies perform massive pairwise association analyses between each SNP-QT pair. These individual level SNP-QT signals are high dimensional and typically have small effect sizes, making them hard to be detected and replicated. To bridge this gap, some studies attempt to interpret these results on a macroscopic level or derive high-level understandings. For example, Yao et al. used a two-dimensional enrichment analysis to address this challenge, grouping similar brain regions and genes together via a biclustering approach [20]. Yao's work identified various high-level two-dimensional imaging genetic modules, which were predefined based on the brain transcriptome data from Allen Human Brain Atlas.
In this work, instead of using the knowledge-driven, predefined imaging genetic modules, we propose an alternative data-driven approach to identify high-level imaging genetic patterns. Based on the detailed SNP-QT associations, we develop a graph-cut algorithm to cluster similar SNPs together so that SNPs within the same cluster tend to have similar associations with QTs across the brain. We construct multiple SNP networks based on different similarity measurements. Each similarity network can be viewed as a weighted graph with a specific similarity measure defined as the edge weight. We employ a multigraph clustering method derived from min-max graph cut to discover SNP clusters that take into consideration of all the studied similarity measures. After that, functional annotation is performed for each identified SNP cluster and its corresponding brain association pattern to provide valuable biological insights at a high level.
We applied this pipeline to an AD imaging genetic study, which yielded promising results. For example, in an association study between 54 AD SNPs and 116 amyloid QTs, we identified two SNP clusters with one responsible for amyloid beta clearances and the other regulating amyloid beta formation. These high-level findings have the potential to provide valuable insights into relevant genetic pathways and brain circuits, which can help form new hypotheses for subsequent imaging and genetics studies in independent cohorts.

Data Description
Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu) [21]. The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early AD. For up-to-date information, see www.adniinfo.org. In this study, participants (N = 971) include 202 AD, 218 late MCI (LMCI), 296 early MCI (EMCI), and 255 healthy control (HC) subjects. The baseline structural magnetic resonance imaging (MRI) scans, AV45, and FDG positron-emission tomography (PET) scans, genotyping data, demographic information and clinical assessments are downloaded from the ADNI database (adni.loni.usc.edu). Table A1 shows participant characteristics.

Data Preprocessing
The genotyping data were downloaded and analyzed using PLINK v1.90 [22]. We perform quality control using the following criteria: genotyping call rate > 95%, minor allele frequency > 5%, and Hardy Weinberg Equilibrium > 1.00 × 10 −6 . Then, we select 54 risk variants identified by recent AD genome-wide association studies (GWAS) or GWAS meta-analysis [3,6,7]. Table A2 shows the list of risk variants investigated in this study.
Structural MRI scans are processed with voxel-based morphometry (VBM) using the Statistical Parametric Mapping (SPM) software. All scans are aligned to a T1-weighted template image, segmented into gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) maps, normalized to the standard Montreal Neurological Institute (MNI) space as 2 × 2 × 2 mm 3 voxels. The GM maps are extracted and smoothed with an 8mm FWHM kernel. We then extract the average regional GM measurements from 116 regions-ofinterests (ROIs) defined by the automated anatomical labeling (AAL) atlas.
Preprocessed F-18 florbetapir (AV45) PET scans are collected and aligned to the Montreal Neurological Institute space as 2 × 2 × 2 mm voxels using SPM. Standard uptake value ratio is computed by intensity normalization based on a cerebellar crus reference region. We then extract the average regional AV45 measurements from 116 AAL ROIs.
The (18)F-fluorodeoxyglucose (FDG) PET measurements are also registered into the same MNI space as 2 × 2 × 2 mm 3 voxels by SPM. We then extract the average regional FDG measurements from 116 AAL ROIs. Figure 1 shows the flowchart of the analyses performed in this study, including six steps.

Method Overview
Step 1 generates detailed SNP-QT association maps for five different subject sets examined in our prior study [23], respectively. Step 2 constructs five SNP similarity networks using different scoring functions.
Step 3 performs multigraph clustering on the five SNP networks with a range of cluster numbers.
Step 4 examines the clustering quality of each cluster through Silhouette analysis. Based on the Silhouette scoring results, two cluster groups are selected for the subsequent analysis in Steps 5 and 6. We perform functional annotation for (1) each identified SNP cluster in Step 5 using pathway analysis and (2) its corresponding brain association pattern in Step 6 using Neurosynth and Neurovault.

Step 1: Imaging Genetic Association Analysis
The relationship between each ROI-based imaging QT and each SNP can be obtained by performing a linear regression. Let G be a set of SNPs and Y be a set of imaging QTs (AV45, FDG and VBM). We perform a linear regression model to estimate the additive effect of each SNP g ∈ G on each QT y ∈ Y. The analysis is performed for all possible SNP-QT pairs for each of the five comparison groups (i.e., EMCI vs. HC, LMCI vs. HC, AD vs. HC, MCI vs. HC, ALL vs. HC) within each of the three imaging modalities (i.e., AV45, FDG, and VBM). The regression is repeated 54 × 116 times. The linear regression model is defined as follows: where Z = (z 1 , · · · , z k ) T includes the variables whose effects we want to exclude, such as age, sex, and education; α and Γ = (γ 1 , · · · , γ k ) are the coefficients; is the error term. Our goal is to estimate α and also test if the SNP g has a significant effect (i.e., α = 0) on each QT y ∈ Y.
Thus, in Step 1 we generate an ROI-based p-value map to quantify the significance of SNP effects on imaging data. Specifically, in this work, each element of the significance map records the "negative log p-value" −log 10 (p) at the corresponding ROI. At the end of this step, we have 5 SNP-QT maps of size 54 (number of studied SNPs) × 116 (number of ROIs) for each of the three modalities. Step 1 generates detailed SNP-QT association maps (54 SNPs by 116 QTs) for five different subject sets examined in our previous study [23], respectively.
Step 2 transforms the SNP-QT map to SNP networks by applying different similarity scoring functions to each pair of 116-dimensional SNP vectors.
Step 3 uses multigraph min-max cut algorithm to generate an optimal clustering result scoring analysis in Step 4. In Step 5, the SNPs in each cluster are mapped to nearest genes and uploaded to enrichR for Elsevier pathway analysis to identify relevant biological pathways. In Step 6, Neurovault and Neurosynth are used to functionally annotate the average brain association pattern for all the SNPs in each cluster.

Step 2: SNP Networks with Different Similarity Measurements
Step 1 explores the lower level relationship between imaging and genetic data. In order to aggregate the individual effects of multiple SNP-ROI pairs to high-level imaging genetic patterns, we transform the SNP-QT maps to an SNP network that models the SNP similarity in terms of their effects on all the QTs across the entire brain. From Step 1, a 54-by-116 SNP-QT map is constructed for each of the five comparison groups within each of the three modalities. For each SNP, there is a 116 dimensional feature representation that maps its effect on the brain. The similarity measurement is applied on all pairs of 116-dimensional normalized SNP vectors to create a 54-by-54 SNP network. Five scoring functions shown in Table 1 are used, resulting in five distinct 54-by-54 SNP networks for each comparison group. The three SNP networks formed by the Pearson correlation, the Spearman correlation, and the cosine similarity are normalized by taking the absolute value of the entry, respectively. The two SNP networks formed by the Manhattan and Euclidean distances are transformed to normalized similarity networks by taking a Gaussian radial basis function centered at distance = 0 with a standard deviation of (maximum-minimum)/3, respectively. After normalization, all the entries in each 54-by-54 SNP network have a value between 0 and 1. Table 1. Assume the 54-by-116 genetic-imaging matrix is X. Scoring functions are applied to X i and X j ∈ R 116 , 116-dimensional row vectors of X that maps the effect of a given SNP to 116 brain regions of interest (ROIs). Assume X ik denotes the i-th row and k-th column entry of X. Note that the Manhattan distance and Euclidean distance need to be transformed to the corresponding similarity measures using a Gaussian radial basis function in the third column.

Scoring Function Normalized Similarity
Pearson correlation

Step 3: Multigraph Min-Max Graph Clustering
Although an SNP network describes the similarity between each pair of SNPs, a high-level understanding can be obtained by grouping similar SNPs together and study their collective effects. From Step 2, five 54-by-54 normalized similarity SNP networks are created for each comparison group within each of the three modalities. The network can be viewed as a graph so that the connected components output from graph cut algorithms are viewed as network clusters. Ding et al. proposed a min-max graph cut algorithm that improves cluster quality and balance by minimizing similarity between pairwise subgraphs and maximizing similarity within each subgraph [24]. The min-max graph cut takes a single similarity network as input, so it clusters one network and examines the effect of one scoring function. Wang et al. generalized the single-graph min-max graph cut into multigraph min-max graph cut, which is used in this study to evaluate the combined effect of five scoring functions [25]. The objective functions of both min-max graph cut models are shown in Table 2. In this study, multigraph min-max graph cut algorithm is implemented through a gradient descent method with convergence conditions. The implication of multigraph min-max clustering is that it combines the effects of multiple scoring functions at the same time. The clustering results of multigraph min-max graph cut algorithm have features that resemble the clustering results of single-graph min-max clustering from the best scoring function. Multigraph min-max clustering with five 54-by-54 SNP networks as inputs is performed on the number of clusters ranged from 2 to 9 to produce clustering results for each comparison group within each modality. Table 2. Objective functions of single-graph and multigraph clustering. A is the adjacency matrix, which is equivalent to the similarity network in this study. D is the diagonal matrix of A. Q is the output clustering labels. K is the number of clusters.

Graph Cut Algorithm for Cluster Analysis Objective Function
Single-graph min-max cut Step 4: Silhouette Scoring Analysis The goal of this step is to determine the optimal number of clusters. Silhouette refers to a method of interpretation and validation of consistency within clusters of data and provides a graphical representation of cluster quality [26]. The Silhouette value has a range between -1 and 1. A value close to 1 indicates good clustering quality: the objects are close to assigned clusters and far from neighbor clusters. A value close to -1 suggests that the number of clusters selected is not appropriate. The scoring functions are listed in Table 3. The Silhouette scoring analysis is performed on the clustering results of multigraph clustering with number of cluster ranged from 2 to 9. The normalized similarity networks in Step 3 are transformed to distance matrices by converting a similarity measure of x into a distance measure of 1 − x. For a given number of clusters, there are 5 similarity measurements × 5 comparison groups within each of the three modalities. The 5 × 5 = 25 Silhouette scores are averaged for comparison. The clustering result with the highest averaged Silhouette score is selected for further analysis. The Silhouette scoring analysis is also performed on the clustering results of single-graph clustering with number of cluster ranging from 2 to 9. The 5 Silhouette scores from 5 comparison groups are averaged and compared with the averaged Silhouette score of the multigraph clustering to analyze the effectiveness of multigraph clustering. Table 3. Silhouette scoring functions. Let C I be the cluster which node i belongs to.

Step 5: EnrichR Elsevier Pathway Analysis
A high-level result of two SNP groups is produced from previous analysis. The genetic domain of each SNP group can be analyzed through the pathway analysis using Enrichr. Enrichr is an integrative web-based and mobile software application that includes new gene-set libraries, an alternative approach to rank enriched terms, and various interactive visualization approaches to display enrichment results using the JavaScript library, Data Driven Documents (D3) [27][28][29]. The software can also be embedded into any tool that performs gene list analysis. The 54 AD-related SNPs in this study are mapped to their closest gene, upstream or downstream. The SNP cluster from multigraph clustering are mapped to a group of genes and uploaded to EnrichR for pathway analysis. The elsevier pathway analysis results of each SNP cluster are recorded and compared because it contains various AD-related pathways.

Step 6: Neurovault Brain Region Analysis
After analyzing the genetic domain, the brain pattern corresponding to each SNP cluster can be analyzed through mapping the average effect of each SNP group onto the brain. This brain association pattern can be analyzed by Neurovault and Neurosynth [30], which gives us functional and structural information of the affected brain regions. NeuroVault is an open-science neuroinformatics online repository of brain statistical maps atlases and parcellations [30]. Neurosynth is a platform for large-scale, automated synthesis of functional magnetic resonance imaging (fMRI) data. It takes thousands of published articles reporting the results of fMRI studies and outputs brain maps with calculated correlation coefficients given the uploaded MRI data. The SNPs that are grouped together are expected to affect similar brain regions; thus, the averaged SNP effect on 116 QTs from each SNP group is calculated and mapped onto the brain. The resulting brain map is functionally annotated using NeuroVault and Neurosynth.  Table A2. The order of ROIs on the horizontal axis follows the list shown in Table A3. Normalization was performed so that each row has a squared norm of 1. The vertical axis follows the SNP order listed in Table A2. The horizontal axis follows the ROI order listed in Table A3.

Imaging Genetic Association Maps
Each entry of the map corresponds to −log 10 (p-value) from the linear regression before normalization. After an initial SNP-QT map is created, each 116-dimensional vector of a given SNP is normalized such that the Euclidean norm is 1. This step is performed so that each SNP is represented as a directional unit vector to facilitate subsequent analysis.
While such an imaging genetic map describes detailed associations for each SNP-QT pair, it is not straightforward to detect any general trend in these maps. The goal of the subsequent steps is to extract high-level information from these maps and help provide biological interpretation to aid biomarker discovery and therapeutic target identification.

Multigraph vs. Single-Graph Silhouette Analysis
The multigraph vs. single-graph averaged Silhouette scores are shown in Figure 3. The multigraph averaged Silhouette score is calculated by taking the mean of 25 Silhouette scores (5 scoring functions × 5 comparison groups) from the multigraph clustering result at a given number of clusters for a given modality. The single-graph averaged Silhouette score is calculated by also taking the mean of 5 × 5 = 25 Silhouette scores. Instead of using the same clustering result across five scoring functions for the multigraph case, a single-graph clustering is performed on each of the scoring functions. The Silhouette scores are calculated based on the clustering result of a specific scoring function.
A higher Silhouette score indicates a better clustering quality. A lower number of clusters is preferred in this study when the Silhouette scores are similar since our goal is to provide a high-level understanding. As a result, cluster number = 2 is chosen for the subsequent analyses. In the subsequent analyses, we report the multigraph results of clustering SNPs into two groups, which is the optimal case for both AV45 and VBM.

Clustering Results
The SNP networks constructed by the normalized cosine scoring function are shown in  The similarity network entries are normalized so that the minimum is 0 and the maximum is 1. Each SNP has a maximum similarity of 1 with itself as observed from the diagonal. Good partition of SNPs is indicated by strong similarity within each cluster and weak similarity between the clusters. A balanced size of the two clusters is preferred so that we can identify multiple high-level patterns instead of one single high-level pattern coupled with a small number of outliers; therefore, the clustering result on the AV45 measures for the LMCI vs. HC comparison group as well as the clustering result on the VBM measures for the AD vs. HC comparison group are selected for subsequent analysis.

Case Study: Example AV45 Result
Among all the results in modality AV45, the most balanced one is generated by analyzing the LMCI vs. HC comparison group, and this result is shown in Table A4. The functional annotation and pathway analysis of the identified SNP clusters and the corresponding brain maps are shown in Figure 5. The SNPs in each of the two groups are mapped to their closest genes and uploaded as two gene sets to enrichR. The Elsevier pathway analysis is used in this study because multiple AD related pathways are included in this pathway, which is helpful for understanding AD pathogenesis. The average normalized brain significance maps corresponding to two SNP groups are shown in Figure 5c. Neurosynth analysis results of these two brain maps are shown in Figure 5d.

Case Study: Example VBM Result
Among all the results in the modality VBM, the most significant and balanced result is generated by analyzing the AD vs. HC comparison group, and this result is shown in Table A5. The functional annotation and pathway analysis of the identified SNP clusters and the corresponding brain maps are shown in Figure 6. The analysis is similar to the previous case study on the AV45 measures for the LMCI vs. HC comparison group. This clustering result has a lower Silhouette score (0.158) than that in the previous case study (0.293). So a less distinct pattern is observed in the network, along with less differentiated pathways, brain regions, and brain map visualization.

Comparison between Single-Graph and Multigraph Clusterings
In this study, multiple scoring functions have been selected to evaluate the similarity between different AD-related SNPs in terms of their effects on 116 ROIs across the brain. Each scoring function quantifies the similarity between SNPs from a specific perspective. Multigraph clustering is used to output a clustering result that combines the effects of multiple scoring functions. The purpose of building SNP-SNP networks through different scoring methods is to evaluate the SNP similarity in terms of their effects on 116 ROIs traits across the brain from multiple perspectives. Given two vectors (1, 2, 3) and (0.001, 0.002, 0.003), their Pearson correlation, Spearman correlation, and cosine similarity are all 1 (corresponding to the largest similarity), since they focus on comparing the vector directionality instead of the vector magnitude; however, their Manhattan distance and Euclidean distance are very sensitive to the vector magnitude, and thus are both large, leading to very small similarity. Our multigraph approach combines the effects of all these scoring functions, and takes into consideration both vector directionality and magnitude when performing multigraph clustering.
Several single-graph and multigraph clusterings with a varying number of clusters from 2 to 9 are performed. Averaged Silhouette analysis scores are used to quantify clustering quality under a given cluster condition. In Figure 3, the plot of averaged Silhouette analysis for single-graph shows that clustering quality improves in general as the number of clusters increases for FDG and VBM; however, for AV45 a higher number of clusters leads to a lower cluster quality. There is an inconsistency in the optimal number of clusters for different imaging modalities. The goal of this study is to acquire a high-level understanding of imaging genetic associations. Despite the inconsistency of clustering quality, a large number of clusters also makes subsequent analysis complicated. Only a few brain regions and pathways will be present when the number of SNPs in each cluster decreases, which downgrades the high-level understanding back to individual level analysis.
With these difficulties addressed in single graph clustering, the use of multigraph clustering is very promising for various reasons. The first advantage of multigraph clustering is that at a given number of clusters, it is able to selectively use scoring functions that behave well. For example, at cluster number = 2, the Pearson and Spearman methods have low Silhouette scores (<0.062) across all three modalities, while the Manhattan, Euclidean, and cosine methods have high ones (>0.11). In this case, the multigraph clustering yields an average Silhouette score of 0.1016 (Figure 3), resulting in prominent patterns when mapped to Manhattan, Euclidean, and cosine networks (e.g., Figure 5a).
The second advantage of multigraph clustering for this study is that it behaves the best for AV45 and VBM at the number of clusters = 2 (see Figure 3). As discussed above, a small number of clusters is great for high-level analysis. For FDG, the Silhouette score for the cluster number of 2 is also close to the score for the cluster number of 8. So the result for the cluster number of 2 is reported for all three modalities in this study and coupled with subsequent functional annotation and pathway analysis.
The third advantage of multigraph clustering is that the analysis is more efficient and consistent than a collection of single-graph clusterings. Instead of doing five singlegraph clusterings with inconsistent results among different scoring functions, multigraph clustering is able to return a single set of clustering result. This feature provides a novel way of analysis for future studies with a large number of candidate evaluation functions and no prior knowledge of their performances.

AV45 Clustering Result
In the AV45 row of Figure 4, comparison group AD vs. HC and ALL vs. HC both have one cluster group of 1 SNP and another cluster group of 53 SNPs. The two clusters can be viewed as one group because the multigraph clustering algorithm explicitly enforces each cluster to be nonempty. While these two results are not significant, rs11278892 with its minor allele G is classified to be the most distant from the other 53 SNPs.
Comparison group EMCI vs. HC has one cluster group of 2 SNPs and another cluster group of 52 SNPs. Again, this can be roughly viewed as a single group. The smaller cluster group contains rs4575098 and rs4663105. There is no prior research of rs4575098, but rs4663105 mapped to BINI gene was identified as having a significant association among APOE 4+ and 4− subjects [31]. Future research can be conducted on the association between rs4575098 and rs4663105 as well as their collective role in early MCI development.
Comparison group LMCI vs. HC has the most balanced cluster group for AV45 with one cluster of 20 SNPs and another cluster of 34 SNPs (with APOE rs429358). The partition will provide us with insights of how two groups of SNPs each plays a different role in the LMCI stage. This finding is promising given that (1) LMCI is the transitional stage between EMCI and AD, (2) there are no significant partitions at EMCI and AD, and (3) there is a significant pattern at LMCI. This suggests a potential stage-specific imaging genetic pattern during AD progression, which warrants further investigation. See Section 4.5 for additional discussion on the functional annotation of this high-level imaging genetic pattern.

FDG Clustering Result
In the FDG row of Figure 4, for the smaller cluster group, EMCI vs. HC group has rs10498633 and rs12881735, LMCI vs. HC group has rs10498633 and rs12881735, and AD vs. HC group has rs6656401, rs2093760, and rs4844610. The MCI vs. HC group has eight SNPs and the ALL vs. HC group has six SNPs. In general, the clustering patterns in the networks do not seem as significant as AV45 and VBM. The Silhouette score of FDG (0.076) is also lower than AV45 (0.102) and VBM (0.0879); yet, there is one observation of the results: rs10498633 present in both EMCI and LMCI smaller cluster groups. Previous studies have shown that rs10498633 in SLC24A4 was significantly associated with anisotropy, total number and length of fibers, including some connecting brain hemispheres [32].

VBM Clustering Result
In the VBM row of Figure 4, comparison group MCI vs. HC has one group of 2 SNPs (rs4236673 and rs9331896) and another group of 52 SNPs. Comparison group ALL vs. HC has one group of 1 SNP (rs9271058) and another group of 53 SNPs. These cases can be viewed as having one group instead of two partitions.
Comparison group EMCI vs. HC has a smaller group of six SNPs: rs10808026, rs7810606, rs10498633, rs12881735, rs12590654, and rs113260531. Comparison group LMCI vs. HC has a smaller group of five SNPs: rs4236673, rs9331896, rs10498633, rs12881735, and rs12590654. The SNPs rs10498633, rs12881735, and rs12590654 lie in the intersection of these two groups, potentially having an impact throughout the MCI stage. As mentioned in the FDG section, rs10498633 is also found to be distant from the other AD-related SNPs for VBM modality, which reinforces its unique role associated with anisotropy in the MCI stage.
Comparison group AD vs. HC has the most balanced cluster result with one group of 16 SNPs and another group of 38 SNPs. This provides us with insights about how the two groups of AD-related SNPs each play a different role in AD patients. Functional annotation of this high-level imaging genetic pattern are discussed in Section 4.6.

AV45 Case Study
In Figure 5a,b, the Elsevier pathway analysis reveals some promising results on our genetic analysis of AV45 measures in the LMCI vs. HC comparison: (1) the pathway of amyloid beta clearance in AD is enriched by genes associated with the SNP Group 1, and (2) the pathway of amyloid beta formation in AD is enriched by genes associated with the SNP Group 2. AD pathogenesis is widely believed to be driven by the production and decomposition of β-amyloid peptide [33]. The disease state of AD is closely related to the solubility and the quantity of β-amyloid. Our pathway analysis suggests that the SNPs in Group 1 have potential to be related to the decomposition of amyloid beta while the SNPs in Group 2 to be related to its production. Since AD is characterized by accumulation of β-amyloid, it warrants further investigation that the SNPs involved here can be studied as suppressors and/or promoters to minimize the amount of β-amyloid present [34].
A relevant observation from our pathway analysis is Group 1's association with amyloid beta and APP intracellular transport in AD and amyloid beta traffic and degradation in extracellular matrix in AD and Group 2's association with APP processing. β-amyloid is released by sequential proteolytic processing of the amyloid precursor protein, so the inhibition of APP processing and the excitation of intracellular transport, traffic, and degradation together minimize the accumulation of β-amyloid in the extracellular matrix.
Another indicator of Group 1's role on β-amyloid is the MBP immunal pathway, which is responsible for amyloid beta degradation [35]. The most correlated pathway of Group 2 is complement activation in AD. Complement proteins are integral components of amyloid plaques and cerebral vascular amyloid in AD patient brains, which can be found at the earliest of amyloid deposition [36]. The complement activation also coincides with the clinical expression of Alzheimer's dementia. Aside from the two group's direct associations with β-amyloid, the pathway analysis also shows that AD is correlated with different diseases including Tangier Disease, cancer, psoriasis, and asthma. Previous studies have shown that Tangier Disease is caused by mutations of ABAC1, which is closely related to β-amyloid [37].
In Figure 5c,d, the most correlated brain regions associated with SNP Group 1 include cerebellar, cerebellum, vi, lobules, and vermis (see https://neurosynth.org/analyses/ terms/, accessed on 16 June 2022 for definition of these terms). Cerebellar and cerebellum are responsible for motor functions and balance. It is also associated with the visual system. Vermis and some subsequent correlated brain regions are also associated with maintaining posture. So, this group is primarily associated with brain regions that are responsible for balance, motor functions, and visual functions. Group 2 is correlated with prefrontal, medial prefrontal, medial, prefrontal cortex, and social. All these regions control cognitive ability, memory management, and emotional impulse. The affected brain regions and their respective functions of two groups of SNPs show a great difference, demonstrating the promise of our clustering result. Figure 6a,b shows the results of Elsevier pathway analysis on our genetic study of VBM measures in the AD vs. HC comparison. SNP Group 1 is associated with complement activation in AD and various pathways that is associated with the immune system and systematic lupus erythematosus, which is a disease categorized by the immune system attacking its own tissues. SNP Group 2 is associated with amyloid clearance and formation pathways, which has an ambiguous downstream function compared with the AV45 results. Thus previous AV45 result shows a better partition, which can also be verified by visually inspecting the SNP networks and comparing the averaged Silhouette scores (0.1015 vs. 0.0879).

VBM Case Study
In Figure 6c,d, the brain association pattern corresponding to SNP Group 1 includes cerebellum, cerebellar, vi, lobules, and putamen. Cerebullum and cerebellar govern motor functions and balance (see https://neurosynth.org/analyses/terms/, accessed on 16 June 2022 for definition of these terms). The putamen is involved in learning and motor control, including speech articulation, language functions, and cognitive functions. Similar to the Group 1 result of the AV45 analysis above, this group is associated with balance, motor functions, and visual functions. The brain association pattern corresponding to SNP Group 2, on the other hand, is related to premotor, parietal motor, movements, and primary motor. The primary function of the premotor cortex is to assist in integration of sensory and motor information of the performance of an action. The parietal lobes integrate somatosensory signals and information from different modalities. The difference between the two brain maps in this case is less significant than the AV45 analysis above.

Conclusions
A data-driven analysis pipeline has been proposed in this work to identify highlevel imaging genetic patterns. Based on the detailed SNP-QT associations, we develop a graph-cut algorithm to cluster similar SNPs together so that SNPs within the same cluster tend to have similar associations with QTs across the brain. We construct multiple SNP networks based on different similarity measurements. Each similarity network can be viewed as a weighted graph with a specific similarity measure defined as the edge weight. We employ a multigraph clustering method derived from min-max graph cut to discover SNP clusters that take into consideration of all the studied similarity measures. After that, functional annotation is performed for each identified SNP cluster and its corresponding brain association pattern to provide valuable biological insights at a high level.

Index
Name Index Name Index Name Index Name