Comprehensive Molecular Analysis of DMD Gene Increases the Diagnostic Value of Dystrophinopathies: A Pilot Study in a Southern Italy Cohort of Patients

Duchenne/Becker muscular dystrophy (DMD/BMD) is an X-linked neuromuscular disease due to pathogenic sequence variations in the dystrophin (DMD) gene, one of the largest human genes. More than 70% of DMD gene defects result from genomic rearrangements principally leading to large deletions, while the remaining are small nucleotide variants, including nonsense and missense variants, small insertions/deletions or splicing alterations. Considering the large size of the gene and the wide mutational spectrum, the comprehensive molecular diagnosis of DMD/BMD is complex and may require several laboratory methods, thus increasing the time and costs of the analysis. In an attempt to simplify DMD/BMD molecular diagnosis workflow, we tested an NGS method suitable for the detection of all the different types of genomic variations that may affect the DMD gene. Forty previously analyzed patients were enrolled in this study and re-analyzed using the next generation sequencing (NGS)-based single-step procedure. The NGS results were compared with those from multiplex ligation-dependent probe amplification (MLPA)/multiplex PCR and/or Sanger sequencing. Most of the previously identified deleted/duplicated exons and point mutations were confirmed by NGS and 1 more pathogenic point mutation (a nonsense variant) was identified. Our results show that this NGS-based strategy overcomes limitations of traditionally used methods and is easily transferable to routine diagnostic procedures, thereby increasing the diagnostic power of DMD molecular analysis.


Introduction
Duchenne muscular dystrophy (DMD-OMIM number 310200), as well as the allelic Becker form (BMD-OMIM number 300376), is a lethal, rapidly progressive neuromuscular disease, whose characteristic trait is the degeneration of skeletal, smooth, and cardiac muscles leading to progressive muscle fiber damage and loss of muscle function [1][2][3]. An elevated serum level of creatine kinase is an early hallmark of the disease, which begins in childhood (onset at four or five years of age). The estimated incidence of the severe DMD form is about 1:3300 males; females are usually asymptomatic, though a small percentage may show a mild disease-related phenotype [2][3][4][5]. DMD/BMD is an X-linked recessive disease caused by sequence alterations occurring in the DMD gene (OMIM *300377) encoding the dystrophin protein [6,7]. Dystrophin is an important protein that anchors the cytoskeleton to the plasma membrane through F-actin. In the absence of dystrophin, muscle cells become more permeable; the extracellular matrix enters the cells, leading to the destruction and progressive death of these cells, which are replaced by adipose tissue [8].
DMD is one of the largest known human genes; it encompasses 79 exons, spanning approximately 2.4 Mb [9]. In addition to its large size, DMD is characterized by a complex mutational spectrum, since more than 7000 pathogenic variants are known to date. About 70% of DMD mutations result from genomic rearrangements (GRs) that lead mainly to large deletions and, to a lesser extent, to duplications involving one or more gene exons [10]. The remaining mutations are small nucleotide variants (SNVs), including nonsense and missense variants, small insertions/deletions (INDELs) or splicing alterations, which can occur anywhere along the gene [1,6,11,12]. Thus, DMD molecular analysis is complex and requires multiple analytical steps to reach sufficient diagnostic sensitivity. However, confirmation of the DMD/BMD clinical suspicion should be as quick as possible to ensure appropriate patient care, carrier identification, family planning, prenatal diagnosis and, most importantly, prompt access to personalized treatment [13,14].
Next generation sequencing (NGS)-based approaches are currently used for routine molecular diagnostics [6,15,16]. Indeed, these approaches have shown their reliability and accuracy in the analysis of single disease-causing genes, panels of genes related to a disease of interest, and the whole exome/genome [17][18][19][20][21][22][23]. Consequently, NGS is redefining the standards for the detection of SNVs related to the onset of human diseases.
Recently, it has been proposed that the high sequence coverage typical of NGS applications may be used to estimate the presence of copy number variations (CNVs), often associated with GRs [24,25]. In this way, NGS may allow, in a single analytic procedure, the complete diagnosis of a disease of interest by detecting both SNVs and GRs [26]. In this context, we have recently shown that an NGS-based method, coupled with specific bioinformatic tools, was able to identify large heterozygous deletions/duplications in different disease-associated genes [23,27].
Here, we aimed to verify whether the same analytic strategy could be effective in improving the molecular diagnosis of DMD/BMD. For this purpose, we tested an NGS method suitable for the comprehensive detection of all the different types of genomic variants affecting the DMD gene. Our results show that the proposed strategy overcomes the limitations of traditionally used methods and is easily transferable to routine diagnostic procedures.

Study Population and DNA Samples
Forty DNA samples were selected among those who underwent a DMD molecular analysis at the CEINGE molecular diagnostic core lab. In particular, 31 DMD/BMD patients (males, aged from 3 to 61 y), and 9 DMD carriers (females, aged from 13 to 62 y) were included in this study. All patients gave their written informed consent to the molecular analysis. The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of University of Naples Federico II (protocol code 77/21, 26 March 2021).
To assess the accuracy of the proposed NGS-based method in mutation detection, patients were selected among those carrying different DMD mutations distributed along the whole DMD gene sequence. In particular, 4 carried point mutations identified by Sanger sequencing; 32 patients harbored large deletions/duplications identified by multiplex PCR and/or MLPA analysis; and 4 were wild type after the multiplex PCR and/or MLPA analysis. NGS-DMD screening was carried out blind.

DMD Gene Traditional Molecular Analysis
Genomic DNA was obtained from peripheral blood samples collected in EDTA using standard procedures. For CNV detection, quantitative fluorescence multiplex PCR and/or MLPA analyses were carried out. In particular, 4 multiplex PCR reactions/patient were performed on a 2700 thermal cycler (Applied Biosystems Inc., Foster City, CA, USA) to analyze the 24 DMD exons corresponding to CNV hot-spots, using labeled primers and an internal standard reference added to each multiplex PCR as a control. The obtained PCR products were separated by capillary gel electrophoresis on the ABI PRISM 3130 XL genetic analyzer (Applied Biosystems, Foster City, CA, USA), as previously reported [28]. The MLPA test was performed to screen all the dystrophin gene exons using the SALSA MLPA probe sets P034/P035 (MRC-Holland, Amsterdam, the Netherlands), according to the manufacturer's instructions. Ligation and amplification steps were performed using a 2700 thermal cycler (Applied Biosystems Inc., Foster City, CA, USA). Next, all the amplified fragments were separated using capillary electrophoresis on an ABI PRISM 3130 XL genetic analyzer and data analysis was carried out using the Coffalyser software (MRC-Holland, Amsterdam, the Netherlands).
Sanger sequencing was used to detect point mutations already identified within the family. PCR amplification was carried out on a 2700 Thermal Cycler (Applied Biosystems Inc., Foster City, CA, USA) to specifically amplify the DMD exon carrying the pathogenic mutation. Next, direct sequencing was performed using an ABI 3100 capillary sequencer (Applied Biosystems Inc., Foster City, CA, USA). Sanger electropherograms were visualized using the SeqMan tool (DNASTAR, Inc., Madison, WI, USA).

NGS Analysis
All the selected DNA samples, already analyzed as described above, were quantified using the Thermo Scientific™ NanoDrop spectrophotometer, and quality-assessed on agarose gel (0.8% agarose and 0.1 µg/mL ethidium bromide). Next, DNA indexed libraries were prepared using Multiplicom's DMD MASTR assay protocol (Multiplicom, Niel, Belgium), following the manufacturer's instructions. Briefly, for each sample, 200 ng of genomic DNA were used as input. All the DMD target regions were amplified in 4 separate multiplex PCR amplification reactions. A second round of amplification (Universal PCR) was performed to uniquely tag all the amplicons obtained from the same DNA sample with a specific barcode sequence (INDEX) and the p5-p7 adaptors, required for sample multiplexing and the downstream sequencing reactions. Each indexed amplicon library was then purified using the Agencourt AMPure XP beads (Beckman Coulter, Inc., Brea, CA, USA), and verified for quality using the Agilent 2100 Bioanalyzer with a DNA 1000 LabChip (Agilent Technologies, Santa Clara, CA, USA). All the 40 indexed libraries were pooled in equimolar amounts and sequenced on the MiSeq system (Illumina Inc., San Diego, CA, USA) using the Illumina Reagent Kit V2 (500 cycles PE, 2 × 250 bp). Sanger sequencing was used to confirm identified causative point mutations or doubtful variants.
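As an illustration of the equimolar pooling step, the following minimal Python sketch converts a library concentration to molarity and derives the volume of each library to pool. It assumes the commonly used approximation of 660 g/mol per base pair of double-stranded DNA; the function names, library concentrations, mean amplicon sizes, and target amount per library are hypothetical and are not taken from this study or from the manufacturer's protocol.

```python
def library_molarity_nM(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/µL) to nanomolar.

    Assumes an average mass of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (660.0 * mean_size_bp)


def equimolar_volumes_ul(libraries: dict, fmol_per_library: float) -> dict:
    """Volume (µL) of each library contributing the same amount (fmol).

    `libraries` maps a sample name to (concentration in ng/µL, mean size in bp).
    Note that 1 nM equals 1 fmol/µL, so volume = fmol / nM.
    """
    volumes = {}
    for name, (conc, size) in libraries.items():
        volumes[name] = fmol_per_library / library_molarity_nM(conc, size)
    return volumes


# Hypothetical readings for two indexed libraries.
example_libraries = {
    "sample_A": (6.6, 500.0),   # about 20 nM
    "sample_B": (13.2, 500.0),  # about 40 nM
}
example_volumes = equimolar_volumes_ul(example_libraries, fmol_per_library=40.0)
```

In this sketch the more concentrated library contributes half the volume of the less concentrated one, so that both are represented by the same number of molecules in the pool.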

NGS Sequence Data Analysis
The downstream sequencing data analysis was carried out using two different bioinformatic software packages: (i) the Sequence Pilot software (version 4.2, JSI Medical Systems GmbH, Kippenheim, Germany), used to detect point mutations; and (ii) the Sophia DDM® software (version 4.8.1.3, Sophia Genetics SA, Saint Sulpice, Switzerland), used to verify the presence of point mutations and identify large GRs (LGRs).
In particular, to detect possible SNVs and small INDELs in the sequenced samples, the SeqNext module of the JSI SeqPilot maps all the obtained sequencing reads against the DMD reference sequence (ENSG00000198947 gene reference and ENST00000357033 transcript from the Ensembl database). Reads corresponding to each sample were sorted according to their INDEX and variants were called only if their frequency was more than 10% considering the combined reads, excluding homopolymers (repetition of 6 or more identical nucleotides), according to the manufacturer's instructions and the default parameters suggested by the JSI SeqPilot software. All the detected variants were annotated and classified according to their biological and clinical significance using different databases, such as Ensembl (http://www.ensembl.org, accessed on 12 October 2021), ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/, accessed on 12 October 2021), and Varsome (https://varsome.com/, accessed on 12 October 2021).
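The calling criteria described above (variant frequency above 10% of the combined reads, with homopolymer stretches of 6 or more identical nucleotides excluded) can be sketched in Python as follows. This is only an illustrative filter and not the SeqNext implementation: the `Candidate` structure, the function names, the 0-based coordinate convention, and the toy reference sequence are assumptions made for the example.

```python
import re
from dataclasses import dataclass

MIN_FREQ = 0.10        # variants are kept only if frequency is more than 10%
HOMOPOLYMER_LEN = 6    # runs of 6 or more identical nucleotides are excluded


@dataclass
class Candidate:
    position: int      # 0-based index into the reference string (assumption)
    alt_reads: int     # reads supporting the variant
    total_reads: int   # combined reads covering the position


def in_homopolymer(reference: str, position: int,
                   min_run: int = HOMOPOLYMER_LEN) -> bool:
    """True if `position` lies inside a run of at least `min_run` identical bases."""
    pattern = r"(.)\1{%d,}" % (min_run - 1)
    for match in re.finditer(pattern, reference):
        if match.start() <= position < match.end():
            return True
    return False


def passes_filter(candidate: Candidate, reference: str) -> bool:
    """Apply the frequency threshold and the homopolymer exclusion."""
    if candidate.total_reads == 0:
        return False
    frequency = candidate.alt_reads / candidate.total_reads
    return frequency > MIN_FREQ and not in_homopolymer(reference, candidate.position)


# Toy reference with a run of six A's at positions 4-9 (hypothetical).
toy_reference = "ACGTAAAAAAGTC"
```

With this toy reference, a candidate at position 2 supported by 30 of 200 reads is retained, whereas one supported by 10 of 200 reads, or one falling inside the run of A's, is discarded.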
The Sophia Genetics DDM® software, in addition to calling SNVs, can deduce the sex of each patient on the basis of the homozygous or heterozygous status of the called variants. Further, the software implements an algorithm able to automatically select, based on coverage similarities, a set of reference samples among those sequenced in the same run. Then, using these reference samples, the coverage is normalized by sample and by target region, and CNV calling is performed using a hidden-Markov-model algorithm [11]. The Sophia DDM software classifies all samples as rejected, medium-noise, or low-noise, based on the residual coverage noise after the normalization and CNV calling steps. No CNV results were reported for the rejected samples, but plots of the coverage profile for those samples were still included for illustration purposes.
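The principle of normalizing coverage by sample and by target region against a set of reference samples can be illustrated with the minimal Python sketch below. It is not the Sophia DDM algorithm: median-based scaling is used here as one simple choice, a fixed-threshold classification replaces the hidden-Markov-model step, and the function names, thresholds, and read counts are all hypothetical.

```python
from statistics import median


def normalize_by_sample(coverage: dict) -> dict:
    """Scale each region's read count by the sample's median region count."""
    scale = median(coverage.values())
    return {region: count / scale for region, count in coverage.items()}


def copy_ratios(sample: dict, references: list) -> dict:
    """Ratio of the sample's normalized coverage to the reference baseline.

    The baseline of each region is the median of the normalized coverage
    observed in the reference samples, which corrects for regions that
    amplify systematically better or worse than others.
    """
    sample_norm = normalize_by_sample(sample)
    refs_norm = [normalize_by_sample(ref) for ref in references]
    ratios = {}
    for region, value in sample_norm.items():
        baseline = median(ref[region] for ref in refs_norm)
        ratios[region] = value / baseline
    return ratios


def call_state(ratio: float, loss: float = 0.75, gain: float = 1.25) -> str:
    """Crude threshold call (hypothetical cut-offs, not an HMM)."""
    if ratio < loss:
        return "loss"
    if ratio > gain:
        return "gain"
    return "normal"


# Hypothetical read counts: region ex2 amplifies at half efficiency in all samples.
reference_samples = [
    {"ex1": 200, "ex2": 100, "ex3": 200, "ex4": 200},
    {"ex1": 80, "ex2": 40, "ex3": 80, "ex4": 80},
    {"ex1": 100, "ex2": 50, "ex3": 100, "ex4": 100},
]
test_sample = {"ex1": 120, "ex2": 30, "ex3": 120, "ex4": 120}
example_ratios = copy_ratios(test_sample, reference_samples)
```

In this toy case region ex2 has a ratio of 0.5, the value expected for a heterozygous deletion in a female carrier, while the other regions have a ratio of 1.0 despite the different sequencing depth of each sample.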
In non-rejected samples, individual target regions were classified into three categories: high-confidence, medium-confidence, and undetermined. The performance of the CNV module is higher for longer CNVs than for shorter CNVs and higher for deletions than for duplications. In most cases, single-amplicon duplications would not be missed, but labeled as "undetermined". However, it is recommended to re-test, for validation purposes, all the GRs found using an independent test, e.g., MLPA. The entire workflow is summarized in Figure 1.
Figure 1. After genomic DNA quality and quantity assessment, each sample was analyzed according to traditional molecular techniques (A) and NGS (B). In particular, MLPA and/or Sanger sequencing were carried out to detect DMD pathogenic mutations (A). The same samples were analyzed blindly by NGS (B). DNA libraries were prepared with an amplicon-based protocol for each study subject. The obtained libraries (corresponding to 40 individual samples) were sequenced in one sequencing run using the MiSeq system. NGS sequence data analysis was carried out using two different pipelines. Finally, NGS results were compared to those obtained with conventional diagnostic procedures.

Results
The 40 patients included in this study were analyzed following the routine diagnostic procedure for DMD (multiplex PCR and/or MLPA and/or Sanger sequencing), as described under the methods section. Moreover, they were analyzed blindly with the NGS-based strategy described above to assess its performance in diagnostic settings.

NGS Sequence Coverage
Each amplicon library was successfully analyzed as described under Materials and Methods. In total, we obtained 8,706,134 read pairs, of which 90.60% (7,887,326 reads) were mapped. Of the reads obtained, 90.23% were on target, while 1.12% and 8.65% were "off-target" and unmapped reads, respectively. The on-target coverage distribution per sample is reported in Supplementary Table S1.

Point Mutation Identification
Fifty-eight total variants, scattered along the whole DMD gene, were detected. Among these, 39 (67%) were intronic and 19 (33%) were exonic variants. Among the 19 coding sequence variants, 11 (58%) were missense, 3 (16%) were nonsense variants, 3 (16%) were synonymous, and 2 (10%) were frameshift variants. All the detected variants were classified according to their clinical significance using the ClinVar database and/or American College of Medical Genetics (ACMG) classification (Figure 2A, Table 1).
As reported in Table 1, 3 pathogenic nonsense variants were detected: 2 were already identified by the previous Sanger sequencing analysis (c.3259C>T, p.Gln1087* and c.2414C>G, p.Ser805*), the other one (c.583C>T, p.Arg195*) was detected in a DMD patient (ID-5) who resulted wild type after the previous multiplex PCR and MLPA analysis. Sanger sequencing was carried out to confirm this NGS data (Figure 2B). In addition, two frameshift mutations causing a premature stop codon were also identified confirming previous Sanger results.
Table 2. Comparison of the results found by combining MLPA and multiplex-PCR, or by Sanger sequencing approaches ("Previous results"), and those obtained by next generation sequencing ("NGS results"), in the 40 subjects enrolled in this study. For the CNV analysis four samples were "REJECTED" and five samples showed "UNDETERMINED" regions, due to a high background noise level. In the NGS point mutation's results column, a pathogenic mutation not previously identified by the traditional diagnostic flowchart is highlighted in bold.
[Table 2 column headers: Sample ID, Gender, Phenotype, mPCR Results, NGS Results, NGS CNV Results]

CNVs Detection
CNV analysis was performed on the whole cohort of samples using the Sophia Genetics software (Table 1). The sex of each sample was successfully identified by the software. Twelve samples, automatically selected by the software, were used as control references. One hundred forty duplicated target regions and 110 deleted target regions were detected. The average coverage per target region was 1623×, with an average residual noise of 0.0913.
Of the 40 analyzed samples, 24 were classified as low-noise, 12 as medium-noise, and only 4 were rejected from the CNV analysis due to the background noise level (Figure S1, Table S2). Five samples showed some undetermined regions, probably corresponding to the presence of duplications in a single amplicon. Indeed, when the previous multiplex PCR/MLPA results were compared with the Sophia software results, all the undetermined regions proved to be duplicated exons (Figure S2, Table 2). Three out of 4 rejected samples had deletions that therefore escaped the Sophia software analysis. However, although the software rejected these samples, the deletions are easily recognized through a visual inspection with the IGV software (see also Supplementary Materials, Figure S1).
CNV results in the remaining 34 patients were concordant with the previous multiplex PCR and/or MLPA data. In detail, 11 samples presented duplications, 16 showed deletions and seven were wild type (Figures 3 and S3-S5). In addition, the Sophia Genetics software identified in two patients (ID-19 and ID-30) a single deleted/duplicated exon that was not confirmed by MLPA; these can be considered as single dropouts (Table 2 and Figures S3-S5).

Integrative Genomics Viewer (IGV) Analysis
To make sure the Sophia software did not miss any gene deletions, for each sample we performed a visual check of the reads corresponding to each DMD exon by IGV analysis. This control was carried out by comparing the .bam file of each sample with that of a reference male genome with no GRs in the DMD gene. In three patients "rejected" by Sophia DDM (ID-28, ID-43, ID-44 in Table 2), the IGV inspection revealed the deleted exons.
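The reasoning behind this visual check can be sketched in a few lines of Python: since males carry a single X chromosome, a deleted DMD exon is expected to show essentially no reads when compared with a reference male. The sketch below is only an illustration of that comparison and does not reproduce IGV; the per-exon read counts are assumed to have been extracted beforehand, and the function names, the 5% cut-off, and the example exon numbers are hypothetical.

```python
def absent_exons(sample_depth: dict, reference_depth: dict,
                 max_fraction: float = 0.05) -> list:
    """Exons whose depth, relative to a reference male, is close to zero.

    Both arguments map an exon number to a read count. Depths are rescaled
    by the total read count so that samples sequenced at different depths
    are comparable.
    """
    sample_total = sum(sample_depth.values())
    if sample_total == 0:
        return []  # no reads at all: nothing can be inferred
    scale = sum(reference_depth.values()) / sample_total
    flagged = []
    for exon in sorted(reference_depth):
        ref = reference_depth[exon]
        if ref == 0:
            continue  # exon not covered in the reference either
        if sample_depth.get(exon, 0) * scale / ref < max_fraction:
            flagged.append(exon)
    return flagged


def group_consecutive(exons: list) -> list:
    """Merge sorted exon numbers into (first, last) runs of adjacent exons."""
    runs = []
    for exon in exons:
        if runs and exon == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], exon)
        else:
            runs.append((exon, exon))
    return runs


# Hypothetical read counts for a male sample and a reference male.
reference_male = {44: 100, 45: 100, 46: 100, 47: 100, 48: 100}
male_sample = {44: 90, 45: 0, 46: 1, 47: 0, 48: 95}
deleted_runs = group_consecutive(absent_exons(male_sample, reference_male))
```

In this toy example the three exons with no or nearly no reads are reported as a single contiguous run, which mirrors how a multi-exon deletion appears as a gap in the read pile-up of an affected male.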

Discussion
Herein, we report the validation of an NGS-based approach able to identify, with a single strategy, both SNVs and GRs in the DMD gene.
The study group, including 40 unrelated patients, was previously screened using traditional approaches (multiplex PCR and/or MLPA for large deletions/duplications identification and Sanger sequencing for known point mutations) and re-analyzed blindly using NGS. All the male patients were affected by DMD, whereas the nine carrier females were asymptomatic.
We found a total of 58 SNVs, including three nonsense pathogenic or likely pathogenic variants. One of them, c.583C>T (p.Arg195*, rs398123999), was identified in a patient who had resulted wild type at the previous traditional screening, which did not include the search for SNVs. Four patients were rejected from the CNV analysis but were recovered through IGV inspection; for 36 individuals a CNV report was obtained. Five patients showed undetermined regions, in particular due to the presence of duplications, as specified in the Sophia Genetics CNV analysis report. All the variants previously found using Sanger sequencing or MLPA were confirmed with our strategy.
The molecular diagnosis of DMD/BMD has traditionally been considered a lengthy and complex process [9,16]. Indeed, the procedure entailed MLPA and/or multiplex PCR to detect large deletions/duplications and Sanger sequencing to detect SNVs, thus identifying all the possible DMD mutations [29]. However, considering the large size of the gene and the fact that SNVs occur more rarely than GRs, DMD Sanger analysis is not offered by all laboratories, with a consequent lack of diagnostic sensitivity and underestimation of the relative weight that SNVs may have in DMD pathogenesis. Moreover, in most cases Sanger sequencing is performed only after muscle biopsy reveals dystrophin deficiency, with consequent delays in obtaining information that is crucial to identify the healthy carriers within the family and to offer at-risk couples proper genetic counseling and the opportunity to take advantage of pre-implantation and/or prenatal diagnostic procedures [30][31][32]. In addition, the recent development of several emerging therapies, based on the repair and/or restoration of specific mutations, requires the availability of even more accurate, sensitive, and specific molecular techniques for fast and comprehensive molecular scanning of the DMD gene [16,33].
NGS offers an opportunity to fill existing gaps in the molecular diagnosis of DMD/BMD, as also recently proposed [34]. Thus, the method we described herein can be easily implemented in routine diagnostic practice, with the advantage not only of reducing the time and cost of the analysis but also of detecting all the possible kinds of mutations affecting DMD (Figure 4). This increased sensitivity will, in turn, improve the clinical care of affected patients and their families and, not least, reduce the number of muscle biopsies to be performed in the often very young patients. Future perspectives concern the use of NGS as the first-tier strategy, followed by MLPA for negative patients in order to exclude putative undetected duplications. To this end, we are expanding our case study by also enrolling undiagnosed patients. The drawbacks of NGS may be summarized as follows: (i) the methodology does not analyze deep intronic regions; (ii) the cost-benefit assessment, which depends on the number of samples tested in each sequencing run, should be calculated by each laboratory depending on patient influx and urgency issues.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/diagnostics11101910/s1, Table S1: Percentage of target regions at different values of sequencing coverage, Table S2: General run statistics of the CNV analysis; Figure S1: Panel of rejected samples for CNV analysis, Figure S2: Panel of samples with undetermined regions, Figure S3: NGS-based deletion detection in DMD by Sophia Genetics Software, Figure S4: NGS-based duplication detection in DMD by Sophia Genetics Software, Figure S5: Panel of wild-type samples for CNV analysis.