Molecular Autopsy of Sudden Cardiac Death in the Genomics Era

Molecular autopsy is the process of investigating sudden death through genetic analysis. It is particularly useful in cases where traditional autopsy is negative or shows only non-diagnostic features, i.e., in sudden unexplained deaths (SUDs), which are often due to an underlying inherited arrhythmogenic cardiac disease. The final goal of molecular autopsy in SUD cases is to aid medico-legal inquiries and to guide cascade genetic screening of the victim's relatives. Early attempts at molecular autopsy relied on Sanger sequencing, which, despite being accurate and easy to use, has a low throughput and can only be employed to analyse a small panel of genes. Conversely, the recent adoption of next-generation sequencing (NGS) technologies has allowed exome- or genome-wide examination, increasing the detection of pathogenic variants and enabling the discovery of new genotype-phenotype associations. NGS has nonetheless brought new challenges to molecular autopsy, especially regarding the clinical interpretation of the large number of variants of unknown significance detected in each individual.


Introduction
Sudden cardiac death (SCD) refers to the unexpected death of an apparently healthy individual due to an underlying cardiac disease, occurring within one hour of the onset of symptoms or, if unwitnessed, in someone known to be in good health up to 24 h before the event [1]. SCD represents 15-20% of all deaths in the general population, with an annual incidence ranging from 40 to 100 per 100,000 person-years [2].
HCM is characterized by unexplained left ventricular hypertrophy, myocyte disarray, and fibrosis. HCM is due to mutations in genes encoding for sarcomere proteins (like MYBPC3 and MYH7) inherited in an autosomal dominant pattern with incomplete of SCD from a holistic point of view [10,11], in this review we aimed to specifically focus on the use of next-generation sequencing (NGS) technologies for molecular autopsy, reporting both the latest data on NGS application in this context as well as technical aspects regarding data processing and interpretation.

Methods
We searched PubMed and Scopus in March 2021 with the terms 'molecular autopsy' OR 'next generation sequencing' OR 'whole exome sequencing' OR 'whole genome sequencing' AND 'sudden death' OR 'sudden cardiac death' OR 'sudden unexplained death'. The present paper is a narrative review, therefore no formal criteria for study selection were adopted.

Molecular Autopsy
"Molecular autopsy" refers to the process of post-mortem genetic testing on DNA extracted from blood or tissue collected at autopsy to detect a genetic cause responsible for a SUD. The identification of a potentially pathogenic mutation prompts the screening of surviving relatives, with profound implications for their future clinical management [12]. Moreover, the demonstration of a pathophysiological substrate likely responsible for an otherwise unexplained death represents an invaluable element for medico-legal inquiries. Nonetheless, a molecular autopsy may be negative or inconclusive even when the most advanced technologies for genetic sequencing are employed. Indeed, not all cases of SUD can be ascribed to genetically determined conditions, and genetic variants of unknown significance are commonly found. Relating the identified gene variant to the phenotype of the deceased and studying the segregation of the variant within the family may be important to establish a definite genotype-phenotype association [13].

Sanger Sequencing
First-generation sequencing technologies are primarily represented by Sanger sequencing, which uses oligonucleotide primers to target previously known DNA regions. Sanger sequencing analysis is performed by comparing the patient's electropherogram against a control one. This approach is easy to employ and has nearly complete accuracy for the identification of genetic variants [14]. It was the gold standard for genetic research for almost 3 decades and was also used for the sequencing of the first human genome (the Human Genome Project), completed in 2003 [15].
Historically, molecular autopsy studies relied on Sanger sequencing to test a few channelopathy-associated genes [10,11]. Despite its simplicity and accuracy, this approach has a high cost per sample and sequences only one DNA fragment at a time; hence, it is a low-throughput technique (i.e., with low efficiency) and is poorly suited to large-scale genetic screening. This approach is therefore inevitably associated with some loss of information about other potential disease-causing genes or gene modifiers, resulting in an overall low "diagnostic yield". This term, also known as "mutation detection yield", indicates the probability that a disease-causing variant is identified and represents a good measure of the efficiency of a genetic test [14] (Table 1).

Sanger Sequencing in Sudden Cardiac Death
In 1999, Ackerman et al. performed the first molecular autopsy by identifying a novel LQTS pathogenic mutation (KCNQ1) in a 19-year-old who died of anoxic encephalopathy after a near-drowning [16]. Some years later, Chugh et al. tested 5 LQTS-associated genes (KCNQ1, KCNH2, SCN5A, KCNE1, and KCNE2) in 12 SUD cases, and identified the same KCNH2 missense mutation in 2 subjects (17%) [17]. Di Paolo et al. later reported an LQTS-associated mutation in 2 out of 10 cases of juvenile SUD, with a mutation detection yield of 20% [18]. Subsequent studies indicated a lower diagnostic yield: 15% (5 out of 33 SUD cases) in Skinner et al. [19] and 11% (5 out of 44 SUD cases) in Winkel et al. [20], both analysing LQTS genes. Tester et al. conducted the largest Sanger-based molecular autopsy study so far, including 173 SUD cases tested for the following genes: KCNQ1, KCNH2, SCN5A, KCNE1, KCNE2, RYR2. Mutations in RYR2 were found in 12% of subjects and potentially pathogenic variants in genes associated with LQTS in 15%. Notably, SUD cases with a family history of cardiac events showed a significantly higher mutation prevalence (37% vs. 19%), and the diagnostic yield was even higher (45%) among SUD cases aged < 50 years and with a family history of premature SCD [21].
Overall, these studies showed that a significant proportion of SUD derives from a fatal arrhythmic event caused by a channelopathy, but inter-study mutation detection yield was highly variable, likely reflecting a significant heterogeneity in terms of population examined, DNA source (blood vs. paraffin-embedded tissue), number of genes analysed, and criteria for the attribution of variant pathogenicity.
Nonetheless, the evidence from these studies was deemed sufficient by a consensus document by the Heart Rhythm Society and the European Heart Rhythm Association (HRS/EHRA) on genetic testing for cardiomyopathies and channelopathies to state that comprehensive or targeted (RYR2, KCNQ1, KCNH2, and SCN5A) ion channel genetic testing may be considered in SUD cases to determine the cause of death and facilitate the screening of potentially at-risk relatives, especially when LQTS or CPVT is suspected (class IIb, level of evidence C) [4].
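As a worked check of the yields quoted in this section, the mutation detection yield is simply the proportion of tested cases in which a putatively disease-causing variant was found. The short Python sketch below recomputes the percentages from the case counts reported above; the function name `diagnostic_yield` is an illustrative choice, not a term from the cited studies.

```python
def diagnostic_yield(positive_cases: int, tested_cases: int) -> float:
    """Return the mutation detection yield as a percentage of tested cases."""
    if tested_cases <= 0:
        raise ValueError("tested_cases must be positive")
    if not 0 <= positive_cases <= tested_cases:
        raise ValueError("positive_cases must lie between 0 and tested_cases")
    return 100.0 * positive_cases / tested_cases


# Case counts as reported in the text for four Sanger-based studies.
studies = {
    "Chugh et al. [17]": (2, 12),
    "Di Paolo et al. [18]": (2, 10),
    "Skinner et al. [19]": (5, 33),
    "Winkel et al. [20]": (5, 44),
}

for name, (positive, tested) in studies.items():
    print(f"{name}: {positive}/{tested} = {diagnostic_yield(positive, tested):.0f}%")
```

With such small cohorts a single additional positive case shifts the yield by several percentage points, which is one reason the inter-study figures are difficult to compare.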

Next-Generation Sequencing
Massively parallel sequencing technologies, better known as next-generation sequencing (NGS) technologies, were designed to overcome the limitations of first-generation sequencing [14]. NGS simultaneously analyses millions of small polynucleotide fragments of 50 to 250 base pairs (bp), called "short reads", allowing for high-throughput sequencing. Sample DNA is cut into fragments of 1000 to 10,000 bp and the sequencer reads 50-250 bp from either end of each fragment. Each read is "paired" with the read from the opposite end of the fragment ("paired-end" reads); then, alignment algorithms line up the short reads against the "human reference sequence" (the most widely adopted framework for clinical and research genome sequencing, derived from different ethnic groups) to reconstruct the original DNA sequence. Afterwards, dedicated software searches for mismatches between the reads and the reference sequence that may indicate a variant of interest, although the attribution of clinical significance requires further investigation. This process is very fast and cost-effective, allowing the entire genome to be sequenced in a matter of a few days using just a limited amount of DNA [14,22,23] (Table 1).
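The align-then-compare principle described above can be illustrated with a deliberately simplified Python sketch. It places one short read on a toy reference by brute-force, ungapped comparison and lists the mismatching positions. The sequences are invented for illustration, and production aligners use indexed data structures and handle gaps and paired-end information, none of which is modelled here.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read: str, reference: str) -> tuple[int, int]:
    """Return (position, mismatches) of the best ungapped placement of the read."""
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        mismatches = hamming(read, reference[pos:pos + len(read)])
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


def list_mismatches(read: str, reference: str, pos: int) -> list[tuple[int, str, str]]:
    """Return (reference position, reference base, read base) for each mismatch."""
    return [
        (pos + i, reference[pos + i], base)
        for i, base in enumerate(read)
        if base != reference[pos + i]
    ]


# Toy data: the read carries one substitution relative to the reference.
reference = "ACGTTGCAAGCTTAGGCTAACGTTAGC"
read = "AGCTTAGCCTAA"

position, n_mismatches = align_read(read, reference)
candidates = list_mismatches(read, reference, position)
print(position, n_mismatches, candidates)
```

A single mismatching read is not a variant call: as discussed in the following sections, a candidate is only retained when enough independent reads support it.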
NGS technology provides accurate and reliable data for most parts of the genome and has been extensively validated against Sanger sequencing [24]. Guidelines for its use as a diagnostic test in clinical practice are available [25].
There are 3 commercially available NGS platforms (Roche/454, Illumina/Solexa and ABI/SOLiD), with Illumina/Solexa being the most widely used (Supplementary Figure S1). An NGS test can be designed to target a restricted gene panel, the whole exome (WES), the whole genome (WGS), or even the transcriptome (RNA sequencing, RNA-seq).
Gene panels can range from tens to thousands of genes and are the preferred test when a specific condition or group of diseases is suspected. Indeed, gene panels are usually selected among genes previously associated with a particular phenotype ("core disease gene list"). This approach aims at maximizing sensitivity, specificity and coverage for the selected genes, hence usually has a higher diagnostic yield than WES or WGS [23]. In case of a less clear phenotype, as in SUD, broader gene panels may be preferable, and WES may show a higher diagnostic yield. The decision on which genes to include in the panel is left to the individual laboratory [23]. SCD studies commonly include genes associated with both channelopathies and cardiomyopathies [26]. The cost of a gene panel is variable depending on customization, but usually lower than WES [23].
WES examines all ~22,000 known protein-coding genes, which constitute 1-2% of the entire genome. WES is employed for genetic testing of phenotypes with a broad differential diagnosis or as a second-line test when targeted genetic panels have been inconclusive. The diagnostic yield of WES depends on the tested population and the availability of family members, with a mutation detection yield up to 50% in highly selected cohorts.
WGS covers a great deal of the entire genome, providing information on regulatory, intronic, and intergenic regions. The indications for WGS are similar to those for WES. Sequencing coverage is more uniform than with WES, but the large amount of data produced limits its applicability due to storage and analytical issues. WGS also has a higher cost than WES or gene panels [23].
RNA-seq provides information on targeted RNA transcripts or even the whole transcriptome with an overall accuracy superior to microarrays [27].

Variant Calling, Filtering, Prioritization and Interpretation
NGS provides a large number of variants which need further filtration and prioritization for clinical interpretation, a process which may differ slightly among individual laboratories, but whose general outline is described below (Figure 1). Several bioinformatic tools are employed in a multistep analysis that produces different files: FASTQ contains the base calls of all the reads produced and the quality score of each base; BAM (Binary Aligned/Mapped file) provides read alignment over the reference genome; VCF (Variant Call Format file) includes the chromosomal position, name, and reference genome of each variant.
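To make the FASTQ description concrete, the following Python sketch parses a single four-line FASTQ record and converts its quality string into per-base Phred scores. It assumes the Phred+33 ASCII encoding, the usual convention in current FASTQ files, and uses an invented record; the helper names are illustrative.

```python
def parse_fastq_record(lines: list[str]) -> dict:
    """Parse one four-line FASTQ record into id, sequence and Phred scores."""
    header, sequence, separator, quality = lines
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a valid FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    return {
        "id": header[1:],
        "sequence": sequence,
        # Phred+33: the quality score is the ASCII code of the character minus 33.
        "phred": [ord(char) - 33 for char in quality],
    }


def error_probability(phred: int) -> float:
    """Convert a Phred score Q into the base-calling error probability 10^(-Q/10)."""
    return 10 ** (-phred / 10)


# Invented record for illustration only.
record = parse_fastq_record(["@read_001", "GATTACA", "+", "IIII5I#"])

for base, q in zip(record["sequence"], record["phred"]):
    print(base, q, f"{error_probability(q):.4f}")
```

A Phred score of 20 corresponds to a 1% chance that the base call is wrong and a score of 40 to 0.01%, which is why low-quality bases are usually trimmed or down-weighted before alignment.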
"Variant calling" is the process of identifying mismatches between the reference genome and the reads aligned over it. There may be errors due to sequencing and alignment mistakes, and specific statistical tools are dedicated to "filter" variants based on the likelihood that a detected mismatch represents a true gene variant or a technical error. Variants are usually identified based on a quality score consisting of a read coverage (i.e., alignment of bases to a specific nucleotide position) ≥ 30-fold and a read percentage (the proportion of bases differing from the reference sequence) ≥ 20. Missense mutations due to single nucleotide polymorphisms are easier to detect through NGS, whereas the probability of finding insertions and deletions of DNA (indels) is inversely proportional to the size of the indel due to higher frequency of alignment errors [22].

Figure 1.
Next-generation sequencing molecular autopsy. DNA of the deceased is extracted from the blood or tissues and then processed via next-generation sequencing tools (usually employing targeted gene panels or exome sequencing), which are faster and more cost-efficient for testing large numbers of genes compared to traditional Sanger sequencing. Once sample DNA sequencing is completed, detected variants must undergo several processes of filtration and prioritization which significantly reduce their number to a small relevant selection for clinical interpretation. ACMG, American College of Medical Genetics and Genomics; DANN, deleterious annotation of genetic variants using neural networks; EDTA, ethylenediaminetetraacetic acid; FATHMM, functional analysis through hidden Markov models; FFPET, formalin-fixed and paraffin-embedded tissue; GERP, genomic evolutionary rate profiling; MAF, minor allele frequency; OMIM, online Mendelian inheritance in man; Polyphen2, polymorphism phenotyping 2; PROVEAN, protein variation effect analyzer; Sift, sorting intolerant from tolerant; WebGestalt, web-based gene set analysis toolkit.
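The technical filter described before Figure 1 (read coverage ≥ 30-fold and a variant read percentage ≥ 20, interpreted here as 20% of reads) can be sketched in Python as follows. The `Call` record and the example read counts are hypothetical; real pipelines apply such thresholds to fields stored in the VCF file and usually combine them with further quality metrics.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """A candidate variant call (hypothetical minimal record)."""
    gene: str
    depth: int      # total reads covering the position
    alt_reads: int  # reads supporting the non-reference base


def passes_technical_filter(call: Call, min_depth: int = 30,
                            min_alt_fraction: float = 0.20) -> bool:
    """Keep calls with sufficient coverage and a sufficient fraction of variant reads."""
    if call.depth == 0 or call.depth < min_depth:
        return False
    return call.alt_reads / call.depth >= min_alt_fraction


# Hypothetical calls for illustration only.
calls = [
    Call("KCNQ1", depth=120, alt_reads=58),  # well covered, roughly heterozygous
    Call("RYR2", depth=18, alt_reads=9),     # too few reads at this position
    Call("SCN5A", depth=200, alt_reads=12),  # only 6% variant reads
]

retained = [call.gene for call in calls if passes_technical_filter(call)]
print(retained)
```

The thresholds are exposed as parameters because, as noted in the text, the exact filtering criteria may differ among laboratories.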
After this "technical filtration", variants must undergo a "biological filtration". Indeed, rare variants must be differentiated from the large number of missense mutations with no biological relevance present in the general population, described as "background noise". The ratio of rare variants in the sample DNA to background noise is referred to as the "signal-to-noise ratio" [28]. Variants can be filtered for a predefined gene list and/or for a specific frequency (e.g., minor allele frequency (MAF) < 0.1% for rare variants) in human genetic databases. In early genomic studies, the absence of a variant in a healthy control population was deemed sufficient to infer its potential pathogenicity, but the novelty of a mutation is no longer considered a reliable criterion for clinical interpretation [28]. Nonetheless, the type of gene involved (e.g., a channelopathy- or cardiomyopathy-associated gene) may provide clues to the clinical relevance of a variant. Another important criterion for biological filtration is the type of mutation, i.e., missense vs. nonsense. Missense mutations are common in unaffected individuals, and a causal genotype-phenotype link is more difficult to establish. On the contrary, nonsense mutations (e.g., deletions, insertions and splice-site disrupting mutations) are more likely to produce abnormal proteins and to subsequently have a clinical impact. Accordingly, "nonsense" mutations are rarer and less likely to be found in apparently healthy individuals [28].
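A minimal Python sketch of this biological filtration is given below, under stated assumptions: the gene list is a small illustrative subset of genes named in this review, the MAF values are invented, and a variant absent from the population database (`maf=None`) is treated as rare. As stressed above, passing this filter only nominates a variant for interpretation and does not imply pathogenicity.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subset of genes mentioned in this review, not a validated panel.
GENE_LIST = {"KCNQ1", "KCNH2", "SCN5A", "RYR2", "MYBPC3", "MYH7"}
MAF_THRESHOLD = 0.001  # 0.1%, the example cut-off for rare variants


@dataclass
class Variant:
    gene: str
    maf: Optional[float]  # None when absent from the population database
    consequence: str      # e.g. "missense" or "nonsense"


def passes_biological_filter(variant: Variant) -> bool:
    """Keep variants in a gene of interest that are rare in the general population."""
    in_gene_list = variant.gene in GENE_LIST
    is_rare = variant.maf is None or variant.maf < MAF_THRESHOLD
    return in_gene_list and is_rare


# Hypothetical variants for illustration only.
variants = [
    Variant("KCNH2", 0.00002, "missense"),
    Variant("SCN5A", 0.02, "missense"),  # too common in the population
    Variant("TTN", 0.0001, "missense"),  # gene outside the illustrative list
    Variant("RYR2", None, "nonsense"),
]

kept = [v for v in variants if passes_biological_filter(v)]
# Review non-missense variants first, since they are rarer in healthy individuals.
kept.sort(key=lambda v: v.consequence == "missense")
print([(v.gene, v.consequence) for v in kept])
```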
After filtration, the variants within a VCF file must be prioritized, i.e., the likelihood that a variant has a functional significance must be established. There are numerous approaches to prioritize variants (Figure 1), and guidelines have been published to standardize this process [29,30]. Previous description of a variant is an important criterion to guide the interpretation of its clinical significance. Databases like ClinVar or OMIM collect information on previously assessed variants. ClinVar provides a categorical level of evidence for each variant found in the scientific literature. Indeed, not all prior reports are robust, and mutations may undergo reclassification as knowledge expands [23]. For example, Campuzano et al. recently reassessed a cohort of 104 subjects and 17 SCD cases diagnosed before 2010 with inherited arrhythmogenic syndromes, finding that more than 70% of rare variants associated with these conditions had changed their classification [31]. In silico tools (DANN, Mutation Taster, FATHMM, MutationAssessor, Polyphen2, Sift, PROVEAN) can be used to predict the effect of a genetic mutation on the protein. Mutations that profoundly alter the protein structure, or that replace an amino acid in a critical domain with one that has completely different chemical properties, are more likely to cause a functional change. In particular, amino acid substitutions in protein domains conserved in other human proteins with similar function (paralogs) or in the same protein in other species (orthologs) are usually more clinically relevant. Dedicated software such as GERP++ or PhyloP can assess intra- and interspecific conservation of DNA sequences [23,28]. Although not always easy to obtain, co-segregation of phenotype with genotype within families is one of the most useful approaches to assess the pathogenicity of a variant.
Most of the previously described approaches are applicable to Mendelian disorders where a single mutation is involved, but there is now evidence that in some cases, particularly in complex phenotypes like SCD, multiple variants may contribute to disease expression [23,28]. Some online tools, like WebGestalt, may be employed to assess if a particular combination of variants may be associated with a particular phenotype. Finally, the functional consequences of mutations can be fully assessed through in vitro cellular expression systems or transgenic animal models. The major drawbacks of these functional studies are their cost and time to obtain results, which make them not suitable for routine evaluation of genetic findings [23,28].
To conclude, the American College of Medical Genetics and Genomics (ACMG) recommends the use of standard terminology to classify variants: 'pathogenic', 'likely pathogenic', 'likely benign', 'benign', and 'variant of unknown significance' (VUS) [29]. Given the complexity of NGS-related bioinformatic analysis, the continuous advances in the field, and the profound impact that an attribution of pathogenicity of a specific variant can have on the management of an individual, the translation of genetic data into the clinical setting requires specific expertise.
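The five-tier terminology can be represented as a simple enumeration, as in the Python sketch below. The helper `supports_cascade_screening` encodes an assumption made for illustration only, namely that 'pathogenic' and 'likely pathogenic' findings are the ones carried forward to family screening; it is not an ACMG rule and does not replace expert interpretation.

```python
from enum import Enum


class VariantClass(Enum):
    """Standard terminology for variant classification [29]."""
    PATHOGENIC = "pathogenic"
    LIKELY_PATHOGENIC = "likely pathogenic"
    VUS = "variant of unknown significance"
    LIKELY_BENIGN = "likely benign"
    BENIGN = "benign"


def supports_cascade_screening(classification: VariantClass) -> bool:
    """Illustrative assumption: only (likely) pathogenic variants guide relatives' testing."""
    return classification in {VariantClass.PATHOGENIC, VariantClass.LIKELY_PATHOGENIC}


for classification in VariantClass:
    print(f"{classification.value}: {supports_cascade_screening(classification)}")
```

Because classifications may change as evidence accumulates, storing the class together with the date and evidence used makes later reassessment easier.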

Sample Collection
Blood and fresh frozen tissues are the preferred sources for DNA extraction for genetic analysis. Indeed, the HRS/EHRA consensus paper on genetic testing for channelopathies and cardiomyopathies recommends the collection of "DNA-friendly samples (5-10 mL whole blood in ethylenediaminetetraacetic acid (EDTA) tube, blood spot card, or a frozen sample of heart, liver, or spleen) for subsequent genetic testing" [4]. These samples should be stored refrigerated (<4 weeks) or frozen at −20 °C to −80 °C (>4 weeks) in order not to compromise DNA integrity [32]. Similar recommendations can be found in the recently published Asia Pacific Heart Rhythm Society (APHRS)/HRS consensus paper on the investigation of decedents with SUD and patients with sudden cardiac arrest [33]. Although blood sample storage for future reanalysis is now common practice in SCD assessment, such samples are not always available for historical SCD cases, limiting their re-examination. On the contrary, formalin-fixed and paraffin-embedded tissue (FFPET) samples, which are usually prepared for histological analysis, are widely accessible, even for old SCD cases, and may constitute a valuable alternative. However, the process of formalin fixation alters DNA through crosslinking and degradation into fragments with an average length of ~150 bp. Sanger sequencing, which relies on a read length >250 bp, is therefore difficult to perform on FFPET-derived DNA [34]. NGS, thanks to its shorter read length, can overcome these limitations. In 2017, Baudhuin et al. were the first to use FFPET samples for genomic evaluation of 4 cases with a clinical phenotype suggestive of an inherited cardiovascular disorder [35]. The same year, Bagnall et al. were the first to demonstrate the feasibility of NGS on FFPET samples from juvenile SCD cases [36].
A recent study compared results of NGS analysis of 12 SCD cases between FFPET and corresponding non-formalin fixed samples (RNA-later-preserved tissues or bloodstain card): all pathogenic variants, likely pathogenic variants or VUS identified in the nonfixed samples were also confirmed in FFPET samples with a variable degree of confidence, but the latter provided more false positives and negatives, particularly when formalin fixation was longer than 8 days [37]. Therefore, caution is advised for the use of FFPET-derived DNA for genomic studies.

Sequencing-Related Issues
NGS does not characterize all areas of the genome with the same precision. Capture approaches for selective sequencing (such as in WES and targeted panel sequencing) and the sequencing chemistry itself can lead to uneven DNA coverage, which may cause misinterpretation of variants. For example, areas of the genome rich in cytosine and guanine nucleotides are harder to sequence because the higher-energy bonds between the DNA strands make them less accessible to the replication reaction. Since certainty decreases in regions with low coverage, variant calls from these regions are more likely to be discarded [22,23]. NGS is also prone to alignment errors, most commonly affecting areas of insertions or deletions, or regions of the DNA with repeated sequences that are longer than the short reads [22,23]. All these issues are potential sources of false negative results, and underscore that NGS technologies are not perfect, despite continuous improvement in the speed and accuracy of analysis and sequencing.
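Regions at risk of uneven coverage because of their base composition can be located with a simple sliding-window GC calculation, sketched below in Python. The window size, step and the 0.65 threshold are arbitrary illustrative choices, and the sequence is invented.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of bases that are guanine or cytosine."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    return sum(base in ("G", "C") for base in sequence) / len(sequence)


def high_gc_windows(sequence: str, window: int = 10, step: int = 10,
                    threshold: float = 0.65) -> list[tuple[int, float]]:
    """Return (start, GC fraction) for each window at or above the threshold."""
    flagged = []
    for start in range(0, len(sequence) - window + 1, step):
        fraction = gc_fraction(sequence[start:start + window])
        if fraction >= threshold:
            flagged.append((start, fraction))
    return flagged


# Invented sequence with a GC-rich stretch in the middle.
sequence = "ATATATATATGCGCGCGCGCATATATATAT"
print(high_gc_windows(sequence))
```

Flagged windows are candidates for closer inspection of read depth before a negative result in that region is trusted.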

Variants of Unknown Significance
NGS technologies have significantly increased the number of variants which can be detected in a single individual; therefore, classification of mutations is of the utmost importance to establish a causal link between genotype and phenotype. Although several tools are available for variant prioritization, most of them (like large co-segregation studies or functional evaluation of mutations) are not routinely applicable to SUD cases; therefore, the expansion of gene testing leads to an increase in the detection of VUS (mostly missense) [28]. This is currently considered the main drawback of NGS molecular autopsy, since these VUS cannot be used to infer causative relationships or for the screening of the victim's relatives. However, with the more widespread use of NGS and the consequent accumulation of data on SCD-associated variants, as well as the development of more sophisticated tools for predicting the effect of single or combined mutations, it is expected that numerous VUS will be reclassified in the near future [23,31].

Next-Generation Sequencing in Sudden Cardiac Death
NGS offers the possibility of an "exome molecular autopsy", changing the perspective from the screening of single genes or small panels to testing large, multigene panels.
In 2014, Bagnall et al. performed WES for the first time in 28 juvenile SUD cases and identified 3 rare variants in major LQTS-associated genes, but the expansion of the panel to other channelopathy- and cardiomyopathy-associated genes led to further identification of 6 rare variants [38]. In a subsequent study, the same group performed gene panel analysis (including either 69, 98 or 101 genes) on 51 SUD cases and WES (filtering for 59 cardiac-related genes) on another 62 SUD cases, finding a mutation in a clinically relevant cardiac gene in 31 cases (27%) [8].
Hata et al. employed a panel of 70 genes to evaluate 25 SUD cases with either normal hearts or non-diagnostic structural abnormalities. They identified 5 known variants and 10 novel variants predicted to have a "high" pathogenic potential after in silico analysis. Mutations involved 3 channelopathy-associated genes (RYR2, CACNA1C, and ANK2), 3 HCM- or DCM-associated genes (MYH7, LDB3, and PRKAG2), 5 ACM-related genes (PKP2, JUP, DSG2, DSP, and TMEM43), and 2 cardiac transcription factor genes (TBX5 and GATA4). Interestingly, combined heterozygous rare variants were found in 3 of the 25 cases, and 2 subjects carried 3 or more variants [39]. These data support the notion that the paradigm of "one gene-one disease" may not apply to all SUD cases, which may sometimes result from the interaction of multiple mutations, as also postulated in a case report by our group [40].
A study on 59 SUD cases included subjects with autopsy-negative hearts and others with subtle cardiac structural abnormalities not meeting the diagnostic criteria for any specific cardiomyopathy; WES followed by restriction to 135 genes associated with inherited cardiac disorders had a diagnostic yield of 29%, with 7 probands (12%) carrying very rare (MAF < 0.02%) or novel possibly pathogenic variants, and 10 (17%) carrying previously published rare (MAF 0.02-0.5%) disease-causing mutations; the higher number of genes tested led to an increase in the detection of VUS, which were found in 19 (34%) of probands [41]. Hertz et al. employed a panel consisting of 100 channelopathy- and cardiomyopathy-related genes to screen 52 SCD cases with subtle cardiac abnormalities. Variants with "likely functional effects" were identified in 15 cases (29%), of whom 2 (4%) had more than one variant in at least one gene. These mutations were detected with a similar frequency in genes associated with cardiomyopathies (47%) or channelopathies (53%) [42]. These findings confirm the hypothesis that cardiomyopathies may sometimes present with SCD before development of a "diagnostic" phenotype, but also suggest that channelopathies should not be excluded based solely on the presence of minimal structural changes at autopsy.
Ripoll-Vera et al. filtered for a very broad number of genes (194 to 380) related to arrhythmic sudden death for the molecular autopsy of 62 SCDs, obtaining an overall detection yield for pathogenic or probably pathogenic mutations of 31%, at the cost of finding a VUS in about 34% of cases [43]. Dewar et al. published one of the largest studies to date employing a panel of 71 genes in 191 SUDs aged < 5 years. A potentially pathogenic mutation was found in 12 children (6.3%), a novel variant with in silico pathogenic prediction in 15 (7.9%), and a VUS in 36 (18.9%) [44]. Lahrouchi et al. instead employed a panel of 71 genes in 302 SUD cases aged 1-64 years, excluding subjects with evidence of even subtle structural disease. Forty (13%) subjects carried a pathogenic or likely pathogenic mutation, while 42% carried a VUS. Most mutations involved genes associated with LQTS and CPVT, but cardiomyopathy-related genes were also represented. Notably, in surviving relatives the diagnostic yield increased from 26% to 39% thanks to the combination of molecular autopsy and clinical evaluation [26].
Several other post-mortem NGS studies have been recently conducted [45-58], with variable mutation detection rates, not always comparable due to the heterogeneity of genes screened, cases analysed, and methods employed for variant prioritization. Overall, compared with Sanger sequencing-based molecular autopsy, NGS studies have highlighted that cardiomyopathy genes may play a role in some SUD cases, especially in the presence of subtle non-diagnostic cardiac abnormalities, but even in their absence. Moreover, NGS studies have shown the possible coexistence of more than one pathogenic mutation potentially responsible for SCD. Nonetheless, the expansion of genes tested has only slightly improved the overall diagnostic yield in SUD (on average from 20% to nearly 35%), mostly because of the major role still played by the 5 most common channelopathy-related genes (KCNH2, KCNJ2, KCNQ1, RYR2, and SCN5A), which should be tested in all SUD cases. In addition, with the increasing number of genes tested, the majority of additional variants identified tend to be VUS. The reclassification of VUS would be the key to significantly improve the diagnostic yield of NGS molecular autopsy.

Conclusions
Molecular autopsy is a fundamental aid to forensic examination, aiming at establishing a genetic diagnosis when traditional autopsy is inconclusive, with the final goal of aiding medico-legal investigations and guiding cascade genetic screening of the victim's relatives. The diagnostic yield of the molecular autopsy is on average 20% with classical Sanger sequencing, but increases up to 35% and more with targeted NGS or WES, at the cost of detecting a larger number of VUS. Despite its undeniable advantages, the relatively low mutation detection yield of molecular autopsy currently prevents it from being a stand-alone tool in the assessment of SUD, which always requires a comprehensive clinical evaluation. Nonetheless, molecular autopsy through genomic technologies offers the possibility to store data for future reassessments which may unveil novel genotype-phenotype associations, thus supporting an extensive use of this approach.

Funding:
JUvenile Sudden cardiac deaTh in the Pisa territory: JUST know and treat.

Conflicts of Interest:
The authors declare no conflict of interest.