Artificial Intelligence-Assisted Identification of Genetic Factors Predisposing High-Risk Individuals to Asymptomatic Heart Failure

Heart failure (HF) is a global pandemic and public health burden that affects one in five of the general population during their lifetime. For high-risk individuals, early detection and prediction of HF progression reduces hospitalizations, reduces mortality, improves quality of life, and reduces associated medical costs. Using an artificial intelligence (AI)-assisted genome-wide association study of a single nucleotide polymorphism (SNP) database from 117 asymptomatic high-risk individuals, we identified a SNP signature composed of 13 SNPs. These were annotated and mapped to six protein-coding genes (GAD2, APP, RASGEF1C, MACROD2, DMD, and DOCK1), a pseudogene (PGAM1P5), and six non-coding RNA genes (LINC01968, LINC00687, LOC105372209, LOC101928047, LOC105372208, and LOC105371356). The SNP signature performed well when predicting HF progression, with an accuracy rate of 0.857 and an area under the curve of 0.912. Intriguingly, analysis of the protein connectivity map revealed that DMD, RASGEF1C, MACROD2, DOCK1, and PGAM1P5 appear to form a protein interaction network in the heart. This suggests that, together, they may contribute to the pathogenesis of HF. Our findings demonstrate that a combination of AI-assisted identification of SNP signatures and clinical parameters is able to effectively identify asymptomatic high-risk subjects who are predisposed to HF.


Introduction
Heart failure (HF) is an important public health problem that is associated with high morbidity, high mortality, and a substantial burden on healthcare [1]. It is considered to be a progressive disorder and is caused by a range of different risk factors, which leads to a heterogeneous pathophysiology. The symptoms of HF include effort intolerance and/or fluid retention, as well as dyspnea, fatigue, and pulmonary congestion. The American College of Cardiology and American Heart Association (ACC/AHA) have categorized HF into four stages: Stage A is defined as having risk factors for HF only; Stage B is defined as having structural heart disease without any current or prior symptoms of HF; Stage C is defined as symptomatic HF; and Stage D is HF refractory to treatment [2].
Stage B HF, which is asymptomatic, is a risk factor for mortality. According to the ACC/AHA, Stage B HF is best characterized by an increase in the left ventricular (LV) mass (LVM), an increase in the left atrial dimensions, the presence of LV geometric patterns indicative of adverse remodeling (i.e., concentric remodeling and/or both concentric and eccentric hypertrophy), and a reduction in the LV ejection fraction (LVEF) or diastolic dysfunction. However, because Stage B HF subjects are asymptomatic, it is highly unlikely that this clinically silent population will routinely receive a detailed examination of their heart. This leads to a five-fold increase in mortality risk among such individuals, as well as to the transition from Stage B HF to Stage C HF, which is associated with elevated rates of hospitalization and death [3]. The ACC/AHA guidelines have emphasized the importance of the appropriate treatment of Stage B HF subjects to prevent the development of symptomatic HF [2] and have highlighted the need for early detection strategies to identify these clinically asymptomatic Stage B HF subjects.
Risk prediction for a complex disease such as asymptomatic Stage B HF is currently a challenge and an unmet clinical need. It is well documented that the risk factors for HF include environmental factors, metabolic derangements, and genetic factors [2], and there is an increasing appreciation that HF has a strong underlying heritable component. This strengthens the importance of discovering the genetic factors that contribute to the underlying mechanism in an attempt to reveal novel targets for the prevention and treatment of HF. Notably, recent studies of single nucleotide polymorphisms (SNPs) have suggested that there is an association between certain genetic factors and an increased risk of HF [4,5]. However, the role of those factors and the molecular mechanism(s) behind the pathogenesis of HF remain incompletely understood.
Artificial intelligence (AI) techniques have been applied previously to cardiovascular diseases, namely for model prediction of the presence of HF, estimation of the HF subtype, and assessment of the severity of HF [6]. Currently, most studies of AI-assisted HF prediction have used clinical features and focused on the recognition of subtypes for prognosis [7,8], such as destabilizations, re-hospitalizations, and mortality [9,10]. In addition, AI and machine-learning techniques have great potential for delineating complex biological processes, in particular those involving interactions between the multiple genetic factors and biochemical pathways that accelerate the development of HF. Accordingly, in this study, we applied an AI-assisted methodology to identify the genetic factors in a high-risk population that are potentially associated with asymptomatic Stage B HF. This involved carrying out genome-wide SNP screening. Furthermore, we also performed protein connectivity mapping of the genes in which the SNPs are located in order to pinpoint their potential role in the molecular pathogenesis of HF in terms of functional connectivity and protein-protein interaction networks.

Study Subjects
Between February 2019 and November 2019, 162 prospectively recruited participants from the Northeastern Taiwan Community Medicine Research Cohort (NTCMRC, ClinicalTrials.gov Identifier: NCT04839796) were enrolled and examined in the cardiology outpatient clinic of Chang Gung Memorial Hospital, Keelung. All subjects received a clinical examination, blood tests, electrocardiogram (ECG) and echocardiography evaluation, and had a complete personal and past medical history recorded. The 10-year and lifetime risk of atherosclerotic cardiovascular disease (ASCVD) was calculated for each subject [11]. The inclusion criteria were an age of over 30 years and a 10-year ASCVD risk ≥20%. The exclusion criteria were known clinical HF at Stage C or D; atrial fibrillation (identified by a previous diagnosis of atrial fibrillation or paroxysmal atrial fibrillation, or by documented atrial fibrillation on ECG or during the echocardiography exam); and/or pregnancy. Signed informed consent was obtained from all participants. The study protocol conforms to the ethical guidelines of the 1975 Declaration of Helsinki and was approved by the Institutional Review Board of Chang Gung Medical Foundation (IRB No: 201800802B0 and 202000077B0A3).
Cells 2021, 10, 2430

Clinical Assessment
At recruitment, all participants provided a detailed personal history, received a full physical examination, completed a questionnaire, and underwent various biochemical tests, including assays for N-terminal pro-brain natriuretic peptide (NT-proBNP), high-sensitivity troponin T (hs-TnT), and high-sensitivity C-reactive protein (hs-CRP), as well as a two-dimensional (2D) echocardiography examination. Blood pressure was taken as the average of two seated measurements. Heart rate was measured via a resting 12-lead ECG. Body mass index was calculated as weight divided by height squared and expressed as kg/m². Diabetes mellitus was defined as a fasting glucose of ≥126 mg/dL, a random glucose of ≥200 mg/dL, or the use of hypoglycemic medications. A previous history of coronary heart disease was defined as angina pectoris with a positive exercise test result; a history of myocardial infarction; angiographic evidence of significant (>75%) coronary artery stenosis after administration of 50-200 µg of intra-coronary nitroglycerine; a history of percutaneous coronary revascularization; or coronary artery bypass grafting. Current smoking status was defined as having smoked more than 100 cigarettes in their lifetime and having smoked within 1 month before enrollment.

Biochemical Analysis
Blood specimens were collected in citrate-treated tubes at recruitment. After centrifuging for at least 15 min, the plasma component was frozen and shipped on dry ice to the core laboratory center of our hospital, at which the samples were stored at −80 °C for the subsequent measurement of cytokines and inflammatory markers. Plasma hs-CRP was measured in duplicate by an enzyme-linked immunosorbent assay on the basis of purified protein and polyclonal anti-C-reactive protein antibodies (IMMULITE hs-CRP, Diagnostic Products Corporation, Los Angeles, CA, USA). The lower limit of this assay was 0.10 mg/L and the coefficient of variation was ≤5% at the 0.20 mg/L C-reactive protein level. The plasma concentrations of hs-TnT and NT-proBNP were measured using appropriate sandwich enzyme-linked immunosorbent assay kits and the monoclonal antibody targeting the relevant cytokine (R&D Systems, Inc., Minneapolis, MN, USA). Other related biomarkers, namely leptin and adiponectin, were measured using commercially available enzyme-linked immunosorbent assays (Boster Biological Technology, Pleasanton, CA, USA).

Echocardiography
Echocardiography was performed within one month of recruitment. A comprehensive transthoracic Doppler echocardiography was performed using a commercially available machine (Vivid E9 system, General Electric, Boston, MA, USA) with an M5S probe. LV end-diastolic and end-systolic volumes were measured from the apical two-chamber and four-chamber views, and the LVEF was calculated using the modified biplane Simpson's rule. The LVM index was measured according to the American Society of Echocardiography formula. Conventional Doppler parameters were measured according to a standardized examination procedure; these were the early (E) and late diastolic transmitral flow velocity; the deceleration time of E; the average of the septal annular mitral early diastolic, late diastolic, and systolic tissue velocities (E'); and the ratio of E/E'. The pulmonary artery systolic pressure was calculated using the modified Bernoulli equation from the tricuspid regurgitation peak jet velocity and the estimated right atrial pressure (from the respiratory variation of the inferior vena cava diameter). Two-dimensional strain analysis was performed using custom 2D strain-imaging software (EchoPac, GE Ultrasound, Boston, MA, USA). The endocardial borders were traced from the end-systolic frame of the 2D images. Interactive software then automatically tracked myocardial motion and divided each image into six segments. Numerical and graphical displays of the deformation parameters (reflecting the average value for the tracking of all acoustic markers in each segment) were then generated for all six segments from each view. Longitudinal peak systolic strain was acquired for the apical two-chamber, three-chamber, and four-chamber views. If further shortening occurred after the end of systole, this was measured as the peak strain. Global longitudinal strain was calculated as the average longitudinal strain of the segments of the two-chamber, three-chamber, and four-chamber views.
All echocardiographic examinations were performed and analyzed by one experienced physician (Dr. Ning-I Yang), whose measurements have shown excellent reproducibility [12] and who was blinded to the subject data.
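The derived echocardiographic quantities described in this section follow standard textbook definitions, which can be written compactly as below. The symbols (EDV, ESV, v_TR, RAP, ε_i, N) are introduced here purely for illustration and are not the authors' notation; the exact variants implemented by the vendor software may differ.

```latex
\begin{align}
\mathrm{LVEF} &= \frac{\mathrm{EDV} - \mathrm{ESV}}{\mathrm{EDV}} \times 100\% \\
\mathrm{PASP} &\approx 4\,v_{\mathrm{TR}}^{2} + \mathrm{RAP} \\
\mathrm{GLS} &= \frac{1}{N}\sum_{i=1}^{N} \varepsilon_{i}
\end{align}
```

Here EDV and ESV are the biplane LV end-diastolic and end-systolic volumes, v_TR is the tricuspid regurgitation peak jet velocity in m/s (giving a pressure gradient in mmHg), RAP is the estimated right atrial pressure in mmHg, and ε_i is the peak longitudinal strain of segment i, with N the number of segments included across the three apical views.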

Definition of Stage A and Stage B Heart Failure
The preclinical stages of HF were assessed based on the clinical history and echocardiographic data. Stage A HF was defined as the presence of risk factors such as arterial hypertension, type 2 diabetes, obesity, metabolic syndrome, a documented clinical history of atherosclerotic disease, or the use of cardiotoxins when there was no evidence of structural heart disease or of the signs/symptoms of HF. Stage B HF was defined as the presence of structural heart disease or the detection of diastolic dysfunction on the echocardiographic examination. Stage B subjects needed to fulfill at least one of the following criteria: (1) LV hypertrophy as defined by an LV mass index of >95 g/m² in women or of >115 g/m² in men; (2) LV dilatation as defined by an LV end-diastolic volume index of >95 mL/m²; (3) concentric remodeling as defined by a relative wall thickness of >0.42; (4) asymptomatic LV dysfunction, including an LVEF of <50% and/or diastolic dysfunction (≥grade II) without clinical signs and/or symptoms of HF; and (5) more than mild mitral or aortic regurgitation.
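The five Stage B criteria listed above amount to a simple rule set, which can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `Echo` record, its field names, and the integer grading scales are hypothetical, and the sketch assumes the subject has already been confirmed to be free of HF signs and symptoms.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Echo:
    """Hypothetical per-subject echo summary (field names are illustrative)."""
    sex: str                        # "F" or "M"
    lv_mass_index: float            # g/m^2
    lv_edv_index: float             # mL/m^2
    relative_wall_thickness: float  # dimensionless
    lvef: float                     # percent
    diastolic_grade: int            # 0 = normal, 1-3 = grade I-III
    regurgitation_grade: int        # worst of mitral/aortic: 0 none, 1 mild, 2+ more than mild


def stage_b_criteria(e: Echo) -> List[str]:
    """Return the names of the Stage B criteria met by an asymptomatic subject."""
    met = []
    lvh_cutoff = 95.0 if e.sex == "F" else 115.0  # sex-specific LV mass index cutoff
    if e.lv_mass_index > lvh_cutoff:
        met.append("LV hypertrophy")
    if e.lv_edv_index > 95.0:
        met.append("LV dilatation")
    if e.relative_wall_thickness > 0.42:
        met.append("concentric remodeling")
    if e.lvef < 50.0 or e.diastolic_grade >= 2:
        met.append("asymptomatic LV dysfunction")
    if e.regurgitation_grade > 1:
        met.append("more than mild regurgitation")
    return met


def is_stage_b(e: Echo) -> bool:
    """A subject is Stage B if at least one criterion is met."""
    return bool(stage_b_criteria(e))
```

Returning the list of criteria met, rather than only a Boolean, makes it easy to audit why a given subject was assigned to Stage B.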

DNA Extraction of White Blood Cells
Peripheral venous blood was collected from the subjects and processed on the same day. Each blood sample was centrifuged at 3000 rpm for 10 min at 4 °C to separate the serum from cells. Genomic DNA was then isolated from peripheral white blood cells using the phenol/chloroform DNA extraction method after lysis of red blood cells. Finally, precipitation and washing using 95% isopropanol followed by 80% alcohol were used to obtain the total genomic DNA.

Whole-Genome SNP Analysis
To identify single nucleotide polymorphisms (SNPs), we genotyped the genomic DNA using Axiom™ Genome-Wide TWB 2.0 array plates, which included 686,463 SNPs. Genotyping analyses were performed on 117 high-risk (ASCVD risk ≥20%) subjects, including 83 Stage A and 34 Stage B asymptomatic HF subjects. The stages of HF were defined by the echocardiographic examination results. Among the SNPs identified, those with a minor allele frequency of 0 or with a missing rate of more than 10% were excluded from further analysis. In total, 392,885 SNPs were available for further analyses.
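The two exclusion rules stated above (a minor allele frequency of 0, or a missing rate above 10%) can be sketched as a small filter. This is an illustrative sketch only: the genotype encoding (0, 1, or 2 copies of one allele, `None` for a missing call), the function names, and the SNP identifiers in the usage example are assumptions, not the pipeline actually used with the array software.

```python
from typing import Dict, List, Optional

# Assumed encoding: per-subject allele counts (0, 1, 2), or None for a missing call.
Genotypes = List[Optional[int]]


def missing_rate(genotypes: Genotypes) -> float:
    """Fraction of subjects with no genotype call for this SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    counted_allele_freq = sum(called) / (2 * len(called))
    return min(counted_allele_freq, 1.0 - counted_allele_freq)


def passes_qc(genotypes: Genotypes, max_missing: float = 0.10) -> bool:
    """Keep a SNP unless it is monomorphic or missing in more than 10% of subjects."""
    return missing_rate(genotypes) <= max_missing and minor_allele_frequency(genotypes) > 0.0


def filter_snps(snps: Dict[str, Genotypes]) -> Dict[str, Genotypes]:
    """Return only the SNPs that pass both quality-control rules."""
    return {name: g for name, g in snps.items() if passes_qc(g)}


# Usage with made-up SNP names and genotypes
toy = {
    "snp_ok": [0, 1, 2, 1, 0, 0, 1, 0, 0, 0],
    "snp_monomorphic": [0] * 10,
    "snp_missing": [None, None, 1, 0, 0, 0, 0, 0, 0, 0],
}
kept = filter_snps(toy)
```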

AI-Assisted Discovery of Candidate SNPs
For the model training and testing, all machine-learning analyses were performed using R Version 3.5.3 (using the randomForest, e1071, glmnet, rpart, caret, and cvAUC packages). We used three supervised algorithms to select important features, namely the random forest (RF), support vector machine (SVM), and least absolute shrinkage and selection operator (LASSO) methods, with the input dataset having a train-to-validation split ratio of 80:20. The SNPs were ranked by summing the number of times each SNP was selected across 100 bootstrapped random samples and the three machine-learning methods. The minimum number of features needed was taken as the number that gave the highest area under the curve (AUC) and accuracy rate; these metrics were calculated for four different machine-learning models (RF, SVM, LASSO, and the decision tree).
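The ranking-by-selection-count idea described above can be sketched in a few lines. The authors worked in R with RF, SVM, and LASSO; the sketch below is written in Python for illustration, uses a deliberately simple stand-in selector (`mean_difference_selector`, a hypothetical helper) in place of those three methods, and draws repeated random 80:20 splits, which is one possible reading of the resampling scheme.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Set, Tuple

# A selector takes training data and returns the indices of the features it selects.
Selector = Callable[[List[Sequence[int]], List[int]], Set[int]]


def mean_difference_selector(top_k: int) -> Selector:
    """Stand-in for RF/SVM/LASSO: keep the top_k features whose mean differs most by class."""
    def select(X: List[Sequence[int]], y: List[int]) -> Set[int]:
        n_features = len(X[0])
        scores = []
        for j in range(n_features):
            pos = [row[j] for row, label in zip(X, y) if label == 1]
            neg = [row[j] for row, label in zip(X, y) if label == 0]
            if not pos or not neg:
                scores.append(0.0)
            else:
                scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        return set(ranked[:top_k])
    return select


def rank_by_selection_counts(X: List[Sequence[int]], y: List[int],
                             selectors: List[Selector], n_rounds: int = 100,
                             train_frac: float = 0.8, seed: int = 0) -> List[Tuple[int, int]]:
    """Sum, over all rounds and selectors, how often each feature is selected."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    indices = list(range(len(X)))
    n_train = int(round(train_frac * len(X)))
    for _ in range(n_rounds):
        rng.shuffle(indices)            # new random 80:20 split each round
        train = indices[:n_train]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        for select in selectors:
            counts.update(select(X_train, y_train))  # one vote per selected feature
    return counts.most_common()         # (feature index, total votes), most voted first


# Usage with toy genotypes: feature 0 separates the classes, features 1 and 2 do not
X_toy = [[2, 1, 0]] * 10 + [[0, 1, 0]] * 10
y_toy = [1] * 10 + [0] * 10
ranking = rank_by_selection_counts(X_toy, y_toy, [mean_difference_selector(1)])
```

Features that survive many resamples and several selection methods accumulate the most votes, which is the sense in which the top-ranked SNPs are described as reproducible.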

Protein Interaction Network
The locations of the AI-identified SNP signatures were mapped to their relevant genes. To explore the regulatory mechanisms and potential pathways in which the genes may be involved, the six protein-coding genes from the thirteen signature genes were subjected to protein-protein interaction analyses using the BioGRID database [13]. The results are displayed as a graphical network using the open-source software Cytoscape [14].
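One simple way to read hubs off an interaction list of the kind exported from an interaction database is to look for non-seed proteins that interact with two or more of the seed genes. The sketch below assumes the interactions are available as undirected pairs; the function name, the threshold, and the toy node names are hypothetical and do not reproduce the BioGRID/Cytoscape workflow or any real interaction.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_hubs(edges: Iterable[Tuple[str, str]], seeds: Iterable[str],
              min_seed_links: int = 2) -> Dict[str, Set[str]]:
    """Return non-seed nodes that interact with at least min_seed_links seed genes."""
    seed_set = set(seeds)
    neighbors = defaultdict(set)
    for a, b in edges:                  # treat interactions as undirected
        neighbors[a].add(b)
        neighbors[b].add(a)
    hubs = {}
    for node, linked in neighbors.items():
        if node in seed_set:
            continue
        seed_partners = linked & seed_set
        if len(seed_partners) >= min_seed_links:
            hubs[node] = seed_partners
    return hubs


# Usage with placeholder node names (not real interaction data)
toy_edges = [("SEED_A", "HUB_X"), ("SEED_B", "HUB_X"), ("SEED_C", "OTHER_Y")]
hubs = find_hubs(toy_edges, seeds=["SEED_A", "SEED_B", "SEED_C"])
```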

Statistics
Independent two-sample t-tests were used to compare differences in continuous variables between the groups, and the results are presented as means ± standard deviations (SD). The χ² test was used to examine the distribution of the categorical variables between the groups, and the results are expressed as frequencies and percentages. A multiple logistic regression model was used to determine the strength of association between the selected parameters and the presence of Stage B HF. The statistical software used for this study was SPSS 25.0 (IBM Corporation, Armonk, NY, USA).

Clinical, Biochemical, and Echocardiography Data
A total of 162 subjects were recruited into the study between February 2019 and November 2019. The demographic and clinical baseline characteristics of the subjects are shown in Table 1. Of these, 113 subjects (70%) had Stage A HF and 49 subjects (30%) had Stage B (asymptomatic) HF. Both groups had a male predominance and the ASCVD risk scores in both groups were in the high category (>20%). The risk factors for HF included coronary artery disease as well as hypertension and diabetes, and these were similar for both groups. Biochemical analysis showed that the Stage B HF group had higher levels of NT-proBNP (Figure 1A; 117.35 ± 114.93 vs. 78.30 ± 87.55, p = 0.040) and adiponectin (13.13 ± 10.32 vs. 8.74 ± 8.86, p = 0.011). Our results demonstrated that elevated serum levels of NT-proBNP and adiponectin correlated with the progression of HF in asymptomatic patients. Despite the beneficial effects of adiponectin on cardiometabolic traits, increased adiponectin has also been found during HF progression, the so-called adiponectin paradox [15]. Echocardiography analysis showed that both groups had adequate LV systolic function and right ventricle systolic function, as shown by the LVEF, global longitudinal strain analysis, and tricuspid annular plane systolic excursion parameters. Since Stage B HF is defined as having LV hypertrophy, dilatation, increased relative wall thickness, and/or LV dysfunction, the corresponding echocardiographic parameters were greater in the Stage B group, as would be expected (Table 2; Figure 1B). In the multiple logistic regression analysis, an elevated NT-proBNP level was associated with the presence of Stage B HF (odds ratio: 1.005, 95% confidence interval 1.000-1.010, p = 0.032; Table 3).
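An odds ratio this close to 1 can look negligible, so a short worked calculation may help. Assuming the reported odds ratio refers to a one-unit increase in NT-proBNP (the unit is not stated in this section) and that the logistic model is linear in NT-proBNP, the odds ratio for an increase of Δ units is the per-unit value raised to the power Δ. The choice of Δ = 100 below is purely illustrative.

```latex
\begin{align}
\mathrm{OR}(\Delta) &= \mathrm{OR}_{\text{per unit}}^{\,\Delta} = e^{\beta \Delta},
  \qquad \beta = \ln(1.005) \approx 0.00499 \\
\mathrm{OR}(100) &= 1.005^{100} \approx 1.65
\end{align}
```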

AI-assisted Identification of SNP Signature: Model Selection, Performance, and Validation
Genome-wide association studies (GWAS) have been able to identify thousands of SNPs that are linked to complex human diseases. To categorize loci associated with asymptomatic HF, namely Stage B, we combined AI-assisted analysis with GWAS using the data from the 117 high-risk (ASCVD risk ≥20%) subjects. This study comprised three parts, which were as follows: (1) the performance of GWAS on 83 subjects with Stage A HF and 34 subjects with asymptomatic Stage B HF using the Axiom Genome-Wide TWB 2.0 Array; (2) feature selection; and (3) model derivation and validation.
Figure 1 (partial caption): ... and LAVI (B3, 51 mL/m²). The red arrow indicates that the echocardiographic pictures were taken at the end-diastolic phase in A2 and B2, and at the end-systolic phase in A3 and B3.
During the feature selection process, only SNPs with an adjusted p-value < 1 × 10⁻⁷ (0.05/392,885) and an odds ratio > 1 among the 392,885 SNPs, with p-values calculated by the χ² test and adjusted using the Bonferroni correction, were used for the correlation analysis of the HF stage. The feature importance measures from the different machine-learning algorithms, namely the feature importance in the random forest (RF) approach, the weighted support vectors in the support vector machine (SVM) approach, and a shrinkage coefficient >0 in the least absolute shrinkage and selection operator (LASSO) approach, were integrated into a single feature-importance ranking. For model derivation and validation, subjects were randomly assigned to either a training (80%) or a validation (20%) set after selection of the significant SNPs by these three machine-learning (RF, SVM, and LASSO) methods. The above process was repeated 100 times using different combinations of subjects to form the training and validation sets. The SNPs were ranked using their summed selection counts from the 100 random samples and the three machine-learning methods (Figure 2A). Although we did not have a large sample size, we used three screening processes to obtain reliable results. Firstly, the adjusted p-value (Bonferroni correction) was used to identify the SNPs with strong evidence. Secondly, an odds ratio greater than 1 was required so that the selected alleles could reasonably be interpreted as risk alleles for the disease. Thirdly, the bootstrap (resampling) method with multiple replications was used to approximate the true population distribution. In addition, different variable selection methods, including machine-learning and statistical methods, were used to rank the importance of the SNPs. Notably, our data revealed that the top-ranked SNPs were frequently picked up across the variable selection methods and the bootstrapped random samples (reproducibility).
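The per-SNP screen described in this paragraph (a χ² test, a Bonferroni-corrected threshold, and an odds ratio above 1) can be sketched for a single 2 × 2 table as follows. This is an illustrative sketch rather than the authors' code: the table layout (risk-genotype carriers versus non-carriers, Stage B versus Stage A), the function names, and the example counts are assumptions. Note that 0.05/392,885 evaluates to about 1.27 × 10⁻⁷, slightly above the stated cutoff of 1 × 10⁻⁷.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi2_p_value_1df(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio (a*d)/(b*c); infinite if b*c is 0, undefined (nan) if a*d is also 0."""
    if b * c == 0:
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts for one SNP: a = carriers in Stage B, b = carriers in Stage A,
# c = non-carriers in Stage B, d = non-carriers in Stage A (34 Stage B, 83 Stage A in total)
a, b, c, d = 30, 5, 4, 78
threshold = bonferroni_threshold(0.05, 392885)
p_value = chi2_p_value_1df(chi2_2x2(a, b, c, d))
selected = p_value < threshold and odds_ratio(a, b, c, d) > 1
```

Comparing the raw p-value with α divided by the number of tests is equivalent to comparing a Bonferroni-adjusted p-value with α.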
This means that, although the training and validation samples were different in every bootstrap replication, the top-ranked SNPs consistently showed their importance in distinguishing between Stage A and Stage B patients. Of the four machine-learning models tested (RF, SVM, LASSO, and the decision tree), the SVM model gave the best performance in terms of AUC and accuracy rate. The top 20 SNPs selected by AI had the best prediction performance, with an accuracy rate of 0.899 and an AUC of 0.931 when differentiating between Stage A and Stage B (Table 4; Figure 2B-D). Among the top 20 SNPs, 13 SNPs had previously been annotated and mapped to protein-coding genes, pseudogenes, or non-coding RNA (ncRNA) genes. Furthermore, using these 13 annotated SNPs, the prediction still performed well, with an accuracy rate of 0.857 and an AUC of 0.912 when differentiating between Stage A and Stage B subjects (Figure 2E).
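For readers less familiar with the two metrics reported above, the sketch below shows how an accuracy rate and an AUC can be computed from validation-set labels and model scores. It is a plain-Python illustration (the study itself used R packages), with label 1 standing for Stage B and 0 for Stage A; the function names and the toy scores are assumptions.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of subjects whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Usage with made-up labels and scores
y_toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(y_toy_labels, toy_scores)
toy_acc = accuracy(y_toy_labels, [1 if s >= 0.5 else 0 for s in toy_scores])
```

Accuracy depends on the chosen score cutoff (0.5 in the toy usage), whereas the AUC summarizes ranking quality over all possible cutoffs, which is why the two are reported together.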

The AI-Selected SNP Signature Identifies Asymptomatic High-Risk Subjects That Are Predisposed towards Progression to Heart Failure
The haplotype distribution, expression patterns, and functions of the genes containing the 13 SNPs that were identified using the Stage A and Stage B subjects are summarized in Table 5. These thirteen genes include six protein-coding genes, one pseudogene, and six ncRNA genes. The SNPs in five of the protein-coding genes (GAD2, APP, RASGEF1C, MACROD2, and DMD) and in the pseudogene (PGAM1P5) are intron variants. The SNP located in the protein-coding gene DOCK1 is an upstream variant. When the protein-coding genes were examined, we were able to make the following observations (Figure 3A): (1) in the GAD2, APP, and RASGEF1C genes, the incidences of the G/G genotypes were significantly increased in the Stage B subjects (p < 0.001); (2) in the MACROD2 gene, the incidence of the A/A genotype was significantly increased in the Stage B subjects (p < 0.001); and (3) in the DMD and DOCK1 genes, the incidences of the C/T genotypes were significantly increased in the Stage B subjects (p < 0.001). Notably, expression of four of these protein-coding genes, namely RASGEF1C, MACROD2, DMD, and DOCK1, as well as of the pseudogene PGAM1P5, can be detected in the heart [16], suggesting that these candidate genes might play a role in the pathogenesis of HF.
When the non-coding RNA (ncRNA) genes were examined, two SNPs were intron variants of long intergenic ncRNA genes (LINC01968 and LINC00687), while the remaining four SNPs were intron variants of ncRNA genes that have uncharacterized functions (LOC105372209, LOC101928047, LOC105372208, and LOC105371356).
When the ncRNA genes were examined, we were able to make the following observations (Figure 3B): (1) in LINC01968 and LOC105372208, the incidences of the C/T genotypes were significantly increased in Stage B subjects (p < 0.001); (2) in LINC00687, the incidence of the C/C genotype was significantly increased in Stage B subjects (p < 0.001); (3) in LOC105372209, LOC105372210, and LOC105371356, the incidences of the A/G genotypes were significantly increased in Stage B subjects (p < 0.001); and (4) in LOC101928047 and LOC101928004, the incidences of the T/T genotypes were significantly increased in Stage B subjects (p < 0.001). It should also be noted that the expression of three of these ncRNAs, namely LOC105372209, LOC101928047, and LOC105372208, is detectable in the heart [17], which suggests that they might also be involved in HF progression.

The Protein Interaction Network of the Genes in Which the AI-Identified SNPs Are Located
To explore the potential role of the identified protein-coding genes in which the SNPs are located in the pathogenesis of HF in the high-risk Stage B subjects, we conducted a protein-protein interaction network analysis in order to identify any potential pathways that might affect cardiac function. Interestingly, five hub proteins, namely TERF1, TERF2, TRIM25, KIAA1429 (also known as VIRMA), and PRKACA, were identified in the protein-protein interaction network (marked in blue). This indicates that these proteins are likely to serve as hubs that connect DMD, RASGEF1C, MACROD2, DOCK1, and PGAM1P5 in the heart (Figure 4). In addition, within the protein interaction network, DMD seems to connect with APP, which is expressed in the brain, whereas DOCK1 seems to connect with GAD2, which is expressed in the pancreas (Figure 4). Currently, we cannot rule out the possibility that these two interactions might occur in vivo in cardiomyocytes via a novel mechanism that involves the transport of proteins or peptides from outside the heart. For example, Aβ, the product of APP, might be released from the brain and delivered to the heart via the circulation, where it could interact with DMD and thereby affect cardiac function.
Figure 4. Protein-protein interaction network of the genes containing the AI-identified SNPs. The protein-protein interaction network was established using the protein-coding genes that contain a SNP. The red dots indicate the genes containing the AI-identified SNPs and the blue dots indicate the proteins that are connected to the proteins containing SNPs within the network. PGAM1P5 is annotated as a pseudogene. Abbreviations include VIRMA: vir-like m6A methyltransferase-associated; PRKACA: protein kinase cAMP-activated catalytic subunit alpha; TERF1: telomeric repeat binding factor 1; TERF2: telomeric repeat binding factor 2; and TRIM25: tripartite motif containing 25.

Discussion
The need to halt the progression of asymptomatic pre-clinical HF cannot be overemphasized, and thus there is an urgent need for new diagnostic and management tools. The early identification of Stage B HF subjects can be challenging, despite the fact that there is a clear association between traditional risk factors and the development of HF [18]. Moreover, the majority of individuals with hypertension and prior myocardial infarction do not eventually develop new-onset HF, a concept that is often referred to as the "prevention paradox". In this study, we have shown that AI-assisted machine-learning is able to identify SNPs that are potentially associated with the risk of Stage B preclinical HF in high-risk individuals.

The Genomics of Heart Failure
The presence of heritable, polygenic components related to symptomatic and asymptomatic cardiovascular disease has long been recognized [19]. Many loci are associated with cardiovascular risk factors and diseases, and these have provided insights into the possible mechanisms that underlie the disease. However, there remains a great need for efforts to translate these genomic mechanisms into clinical practice. The application of GWAS has made it possible to identify loci that are possibly related to the occurrence of HF and to the mortality associated with HF. However, due to the heterogeneous pattern of HF, very few GWAS findings have been replicated. Based on the results of GWAS, most cardiovascular diseases seem to be influenced by a large number of loci, and these variants themselves are seldom the causal variants of the disease. Interestingly, the aggregation of these minor loci accounts for 10% to 36% of the inherited variation in hyperlipidemia [20], type 2 diabetes [21], myocardial infarction [22], and HF [23,24]. In this study, we hypothesized that genetic factors have an important role in the progression of asymptomatic HF in high-risk subjects. Using a traditional approach, our study subjects were screened and we confirmed that the high-risk subjects were without any identifiable clinical manifestations of HF. The subjects included in this study were homogeneous and without other obvious cardiac diseases. In previous studies, the lack of homogeneity has been a potential source of bias. Our results provide a possible precision medicine approach to the early identification of individuals who are asymptomatic but at high risk of HF. Current guidelines lack guidance on how to precisely predict the progression to HF, and there is a lack of specific preventive measures or treatments that can be used to help these asymptomatic high-risk patients.

Integrate Artificial Intelligence into the Traditional Prediction Model
In the era of precision medicine, AI or machine-learning algorithms have a number of advantages over the traditional regression model approach [25,26]. Currently, machine-learning or AI is utilized for predicting the prognosis of HF, and this has used large or multifaceted datasets such as electronic health records [27] or multi-omics data [28]. However, these datasets are biased due to the heterogeneous causes of HF. These prediction models created a "black-box" algorithm that shows only limited improvement over traditional logistic regression prediction models [29]. The application of AI (machine-learning) needs to be incorporated into the traditional approach rather than the two being treated as mutually exclusive. In this study, we screened high-risk asymptomatic subjects using a traditional scoring system, and this was then followed by AI-assisted prediction using whole-genome SNPs. In previous studies, the AUCs of the prognosis prediction generated by the identified SNPs combined with traditional risk factors ranged from 0.56 to 0.77 [30-32]. Our approach using traditional risk factors, AI-assisted analysis, and a combination of 13 novel SNPs gave an AUC of 0.91. Our findings demonstrate that using an integrated application of traditional and AI-assisted approaches dramatically improves HF prediction.

Individual SNPs and Heart Failure Progression
Prior studies using subjects with European ancestry have identified associations between HF and candidate SNPs in the introns of PITX2, ABO, ACTN2, MYOZ1, SYNPO2L, BAG3, and CDKN1A [23,33]. However, such an analysis of a combined database constructed from a number of different cohorts increases the heterogeneity of the etiology and clinical manifestation of HF, which in turn reduces the statistical power [33]. Other studies focusing on a Han population found an association between the prognosis of heterogeneous cardiomyopathy and SNPs associated with LGALS3 [34], AGCT, SLC25A13, HRG, APOB, SOD3, SYNM, and TLN2 [30]. In the present study, three organ-specific clusters of SNPs were identified (Figure 4). Specifically, APP and GAD2 are mainly expressed in the brain and pancreas, respectively, while the other genes, namely DMD, MACROD2, PGAM1P5, RASGEF1C, and DOCK1, are mainly expressed in the heart. The GAD2 polymorphism has been reported to be associated with eating behaviors among women [35] and the risk of obesity [36,37]. Obesity is a well-known risk factor for HF progression. Another gene associated with obesity found in our signature is the APP gene, which is upregulated in mitochondria and regulates mitochondrial function [38]. Another gene, MACROD2, which is one of three mono ADP-ribosylases in humans, has been reported to act as a transcriptional regulator of adipogenesis and obesity in a Han population [39]. The DOCK1 gene, an atypical Rac activator, has been associated with obesity in a Yup'ik population [40] and is required for cardiovascular development [41]. Mutation of DMD results in muscular dystrophy that can be complicated by the presence of HF and irregular heart rhythms [42]. Finally, there is increasing evidence that lncRNAs are able to affect the expression of protein-coding genes by competitively binding to shared miRNAs, which then reduces the miRNA-mediated degradation of protein-coding transcripts. Several studies have found that lncRNA-associated competing endogenous RNA cross-talk is involved in the pathogenetic processes of cardiovascular disease [43,44].

Natriuretic Peptides and the Brain
Cardiovascular disorders share many risk factors with Alzheimer's disease and other memory disorders. NT-proBNP has been found to be an independent risk marker for incident dementia and Alzheimer's disease [45], with higher levels of NT-proBNP being significantly associated with a smaller total grey matter volume [46]. One possible explanation for the relationship between NT-proBNP and dementia is that individuals with an elevated level of NT-proBNP are more likely to suffer from both clinically identified and silent brain ischemic events. It is interesting to note, however, that changes in NT-proBNP are still associated with dementia even after adjusting for CVD risk factors and stroke [47]. Another plausible explanation is that NT-proBNP acts as a marker of myocardial stress, whereby it reflects the mechanisms leading to progressive subclinical cardiac dysfunction with concomitant myocardial [48] and retinal microvascular damage [49]. Finally, it should be noted that the level of NT-proBNP is also elevated during stroke and has been found to be associated with increased mortality from stroke [50].

The Cardiac Natriuretic Peptide System
In our study population, we found that Stage B subjects had a higher level of NT-proBNP. It is well known that cardiac hormones and their prohormones are involved in cardiovascular homeostasis via the regulation of natriuresis, diuresis, vasodilatation, and the inhibition of the renin-angiotensin-aldosterone system (RAAS). BNP is a natriuretic hormone that was initially identified in the brain but is released mainly from the heart, particularly in response to a high ventricular filling pressure [51]. Cleavage of the prohormone pro-BNP produces two fragments: the biologically active 32 amino acid BNP and the biologically inactive 76 amino acid NT-proBNP. These natriuretic peptides play an important role in the diagnosis of patients who are suspected to have HF [52]. It has been shown previously that, when an individual is at risk of HF, BNP-based screening and collaborative care are able to reduce the combined rates of LV systolic-diastolic dysfunction and clinical HF [53]. In addition, NT-proBNP-guided RAAS antagonist and beta-blocker therapy in diabetic subjects has been shown to be beneficial and to help prevent cardiac events [54].

The Heart-to-Brain Connection
Alzheimer's disease (AD) and HF with a preserved ejection fraction are age-related disorders that can coexist; they also have common risk factors, a similar epidemiological stratification, and involve common triggers, including oxidative stress, inflammation, and hypoxia. An examination of elderly AD patients has identified the presence of subclinical heart disease, including LV hypertrophy, aortic valve thickening, and aortic regurgitation. The hallmark of AD is the deposition of amyloid plaques, which consist primarily of a 40-42 amino acid peptide called amyloid-β (Aβ). These peptides aggregate into fibrils that then form an ordered β-sheet structure. The amyloid precursor protein (APP) is the precursor of AD-related Aβ [55]. Aβ deposition in the walls of the cerebral blood vessels is a hallmark lesion of cerebral amyloid angiopathy. In addition, APP and amyloid beta precursor-like protein 2 (APLP2) have also been found to be expressed in cardiomyocytes when heart pathology is present [56]; thus, Aβ may play a role in cardiomyocyte degeneration during HF [57]. Inclusions in the cardiomyocytes of an aging heart are described as basophilic degeneration of the heart; this has been found to be correlated with age, the degree of myocardial fibrosis in individuals with arterial hypertension, and the severity of cerebral amyloid angiopathy. The fragments detected as part of cardiac basophilic degeneration indicate the presence of a specific inclusion body pathology that is related to amyloid precursor protein metabolism. The severity of cerebral amyloid angiopathy has been found to be related to the amyloid precursor protein-derived amyloid β-protein, which suggests a possible link between myocardial and cerebrovascular amyloid precursor protein-related lesions [58].

Limitations and Future Perspectives
This study has several limitations. First, this is a cross-sectional study involving a limited number of subjects from north-east Taiwan; the results therefore need to be validated using both larger-scale cohorts and long-term follow-up, and the findings may not be applicable to Western populations. Second, the 13-SNP signature identified by AI-assisted whole-genome SNP analysis has only been shown to be associated with HF progression in high-risk subjects. Thus, more studies are needed to clarify how these SNPs affect the functions of the corresponding genes and how any relevant changes are involved in the mechanisms underlying HF progression.

Conclusions
This study demonstrates the potential of employing AI machine-learning models to augment traditional methods when predicting genetic predispositions to HF in a high-risk population. Knowledge of the major SNPs associated with preclinical HF provides insights into the relationships between complex pathways and also highlights various key genes that are potential targets for risk stratification, therapy, and drug development. Our results demonstrate that the application of traditional risk stratification, followed by AI-assisted analysis, is able to raise prediction performance when used on well-characterized, homogeneous, and phenotypically identified subjects.

Data Availability Statement: Due to ethical restrictions, the data presented in this study are available on request from the corresponding author.