Application of Combined Long Amplicon Sequencing (CoLAS) for Genetic Analysis of Neurofibromatosis Type 1: A Pilot Study

Detailed analysis of gene mutations in neurofibromatosis type 1 (NF1) remains difficult because of the large gene size, the broad mutation spectrum, and the various effects of mutations on mRNA splicing. These problems cannot be solved simply by sequencing the entire coding region using next-generation sequencing (NGS). We recently developed a new strategy, named combined long amplicon sequencing (CoLAS), which simultaneously analyses the whole genomic DNA region and the full-length cDNA of a disease-causative gene by long-range PCR-based NGS. In this study, CoLAS was adapted for NF1 genetic analysis and applied to 20 patients (five previously reported and 15 newly recruited patients, including suspicious cases) to optimise the method and to verify its efficacy and benefits. Among the new cases, CoLAS detected 10 mutations, including three unreported mutations and one mosaic mutation, and also quantified various splicing abnormalities and allelic expression ratios. In addition, heterozygosity mapping using polymorphisms, including those in introns, showed that copy number monitoring of the entire NF1 gene region was possible in the majority of patients tested. Moreover, when a chromosomal-level microdeletion was suspected from heterozygosity mapping, it could be detected directly by breakpoint-specific long PCR. In conclusion, CoLAS not only detected the causative mutation but also accurately elucidated the entire structure of the NF1 gene, its mRNA expression, and its splicing status, which supports its usefulness in the genetic analysis of NF1.


Introduction
Neurofibromatosis type 1 (NF1) is one of the most common autosomal-dominant disorders, occurring with an incidence of one in 2500-3000 individuals, independent of ethnicity, race, and gender [1]. Half of the affected individuals have a de novo NF1 mutation, while the other half carry a mutation inherited as a familial trait [2]. The clinical features of NF1 are characterised by multiple café au lait macules, Lisch nodules in the iris, and fibromatous tumours of the skin. Less common but potentially more serious manifestations include plexiform neurofibromas, malignant peripheral nerve sheath tumours, optic pathway and other central nervous system gliomas, scoliosis, tibial dysplasia, and vasculopathy [3]. The diagnostic criteria of NF1, according to the National Institutes of Health Consensus Conference in 1988, are generally accepted worldwide in current routine clinical practice [4]. The causative gene of the disease, NF1, was first identified in 1990 by Wallace et al. [5]. NF1 is located on chromosome 17q11.2, spans approximately 290 kb of genomic DNA, contains 58 exons, and encodes a 220-250-kDa cytoplasmic protein called neurofibromin [2]. The main role of neurofibromin is to act as a negative regulator of the RAS proto-oncogene. Neurofibromin acts as a guanosine triphosphatase (GTPase)-activating protein (GAP), maintaining RAS in the inactive GDP-bound form by accelerating the conversion of GTP-RAS to GDP-RAS through the NF1 GAP-related domain (NF1-GRD). Individuals with the disorder have an increased susceptibility to the development of benign and malignant tumours, because RAS is overactivated as a result of the NF1 loss-of-function mutation [2,3]. Legius syndrome is important for the differential diagnosis of NF1 [6].
This autosomal-dominant inherited disease, caused by SPRED1 mutation, can be clinically indistinguishable from NF1, since it also produces café au lait macules and axillary or inguinal freckling; in contrast to NF1, however, it does not produce neurofibromas or malignant tumours. Café au lait macules are often the only symptom in young patients with NF1, and a distinction between these two syndromes is required for prognostic estimation. The only reliable method for differential diagnosis is genetic testing. SPRED1, located on chromosome 15q14, belongs to the RAS-MAPK pathway and is involved in the inactivation of RAS together with neurofibromin [7].
Although the causative gene was identified more than 30 years ago, the genetic analysis of NF1 remains a challenge for several reasons. First, NF1 is a large gene that lacks mutation hotspots, so all 58 exons must be screened merely to detect mutations in the coding region. In addition, aberrant splicing due to deep intronic mutations [8] and intragenic [9] or chromosomal-level large deletions [10] have also been reported. Furthermore, a significant proportion of patients have mosaic mutations [11]. Additionally, 10 or more NF1 pseudogenes are present in the human genome, and these highly similar DNA sequences hinder accurate NF1 mutation analysis [12]. The combination of multiplex ligation-dependent probe amplification (MLPA) and/or other methods of screening for large deletions with RNA- or DNA-based Sanger sequencing achieved very high NF1 mutation detection rates: 93% and 97%, respectively, in large cohorts from Korea [9] and France [13]. However, multistep screening and large amounts of Sanger sequencing require labour and time. In recent years, with the advent of next-generation sequencing (NGS), mass sequencing has become easier and faster [14,15], and it has also become possible to simplify the analytical procedure. Pasmant et al. showed that DNA-based targeted NGS using a multiplex PCR approach (230 amplicons of ~150 bp) could detect point mutations and copy number alterations simultaneously [14]. These achievements are noteworthy, but there is still room for improvement in the NGS methods currently in use. Sabbagh et al. showed that a significant proportion of NF1 missense mutations (30%) were deleterious by affecting pre-mRNA splicing [13]. Additionally, recent studies have shown that many mRNAs bearing premature termination codons (PTCs) escape nonsense-mediated mRNA decay (NMD) beyond the canonical rule that PTCs in the last or penultimate exon evade NMD [16].
Therefore, the true effect of NF1 missense and protein-truncating mutations, i.e., whether protein expression remains possible, cannot be determined without the simultaneous testing of DNA and RNA. In addition, the diagnosis of large deletion/duplication mutations from altered exon copy numbers in MLPA and NGS does not determine the mutation itself, namely the breakpoint sequence on the DNA. From this point of view, there is a need for a method that can detect a wide range of mutations and determine their nature in more detail in a single NGS-based experimental system.
A similar situation has been observed in another neurocutaneous syndrome, namely tuberous sclerosis complex. To solve this problem, we recently developed multi-modular long-range PCR-based NGS analysis methods and their combined application, combined long amplicon sequencing (CoLAS) [17]. In this pilot study, CoLAS was applied to NF1 gene diagnosis in five previously reported patients with known mutations (three point mutations and two chromosomal-level microdeletions) [12] and in 15 patients newly recruited for this study. Furthermore, we optimised the method and evaluated its accuracy and utility.

Patient and Sample
The patients participating in this study included those clinically diagnosed by the NIH criteria [4] (definite cases) and those suspected of having NF1 on the basis of partial symptoms that did not fulfil the diagnostic criteria (suspicious cases). The first five patients were cases in which we previously reported NF1 mutations [12], and their samples were used to optimise NF1 CoLAS and validate its accuracy. The remaining 15 patients were newly recruited for the present study. Written informed consent was obtained from all patients or parents who participated in this study, and the study design was approved by the Ethics Review Board of Kanazawa Medical University (No. G160, 25 August 2020).

Genomic DNA and Total RNA Extraction and Full-Length cDNA Synthesis
In this study, all DNA samples used were extracted from peripheral whole blood using a rapid extraction method [18]. This method is capable of extracting very-high-molecular-weight DNA, which is suitable for long-range PCR amplification. The DNA amount and the optical density (OD) A260/280 ratio were measured using a Nanodrop instrument (Thermo Fisher Scientific, Waltham, MA, USA). The total RNA from peripheral blood mononuclear cells (PBMC) was extracted using TRIzol (Thermo Fisher Scientific). RNA concentrations and OD ratios were measured on a Nanodrop instrument, and the RNA integrity number was measured using a TapeStation 4200 and High-Sensitivity RNA ScreenTape (Agilent Technologies, Santa Clara, CA, USA). Full-length double-stranded cDNA was synthesised from 50 ng of total RNA using a SMART-Seq® HT Kit (Takara Bio USA, Mountain View, CA, USA), according to the manufacturer's standard protocols.

Long-Range PCR and RT-PCR
We set up several types of long-range PCR-based NGS (long amplicon sequencing; LAS) and combined them (CoLAS) for the NF1 genetic analysis (see also Supplementary Materials). The entire NF1 genomic region of about 290 kb on chromosome 17 (NC_000017.11:g.31089750_31378015) was amplified by four sets of multiplex long-range PCR for Multiplex LAS (MuLAS) (Figure 1A), and the entire NF1 mRNA was amplified by RT-PCR with 8 PCR primer sets for reverse-transcribed LAS (rLAS) (Figure 1B).
To cover the entire genomic region of NF1, 23 overlapping long PCR primer sets were designed, with amplicon lengths from 4102 bp to 21,982 bp. The primer sets were divided into four groups (A-D) for multiplex PCR. For RT-PCR, although the SMART-Seq HT kit synthesises full-length cDNA, NF1 expression in PBMC is very low, and eight consecutive primer sets were required to amplify the entire mRNA. NF1 type-1 microdeletion break point-specific long-range PCR primers were also prepared according to the literature [20] (Table 1). Moreover, we designed SPRED1 very long PCR primers, which covered all coding exons and the surrounding genomic regions.
Table 1 (notes): Large genomic deletion (type-1 microdeletion)-specific primers; # NC_000017.11 (chr17, GRCh38.p13).
For the NF1 multiplex long PCR amplification, each reaction of the four multiplex groups contained 1 µL of 20-ng/µL genomic DNA in a 10-µL reaction volume, and two-step PCR cycles were performed with the KOD Multi&Epi enzyme (TOYOBO, Osaka, Japan): after an initial denaturation step at 94 °C for 2 min, there were 35 cycles of 98 °C for 10 s and 68 °C for 10 min, for a total run time of 6 h and 11 min. To obtain uniform amplification products in the multiplex long PCR, the concentration of each primer was adjusted experimentally. The concentrations of all primers started from 0.1 µM, and the sequencing results were observed: the concentration of a primer set with a high depth was decreased, and the concentration of a primer set with a low depth was increased. By repeating this step, the concentrations were optimised, as shown in Table 1, after 9, 4, 3, and 2 rounds of trials for groups A, B, C, and D, respectively.
For the NF1 RT-PCR, 1 µL of synthesised cDNA was added to a 10-µL reaction volume with a final concentration of 0.4 µM for each primer, and three-step PCR cycles were performed with the KOD Multi&Epi enzyme: after an initial denaturation step at 94 °C for 2 min, there were 35 cycles of 98 °C for 10 s, 62 °C for 10 s, and 68 °C for 1 min. For the SPRED1 very long PCR and the NF1 type-1 microdeletion break point-specific PCR, 1 µL of 20-ng/µL genomic DNA was added to a 10-µL reaction volume with a final concentration of 0.15 µM for each primer, and touchdown PCR cycles were performed with the KOD One enzyme (TOYOBO): 3 cycles of 98 °C for 10 s and 74 °C for 10 min, 3 cycles of 98 °C for 10 s and 72 °C for 10 min, 3 cycles of 98 °C for 10 s and 70 °C for 10 min, and 25 cycles of 98 °C for 10 s and 68 °C for 10 min. The total run time was 5 h and 58 min. Although full-length double-stranded cDNA was synthesised from the total RNA of PBMC using the SMART-Seq® HT Kit, it was difficult to amplify the full-length NF1 cDNA with a single primer set. The maximum length that could be amplified varied according to the sample, and uniform and stable amplification was finally obtained for all samples by dividing the cDNA into 8 segments (Figure 1B). The low expression level of the NF1 gene in blood (low target amount) and the higher-order structures that long cDNA molecules readily adopt may affect the amplification efficiency.
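The iterative primer balancing described above can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than the procedure actually used: the function name, the 20% adjustment step, the tolerance band around the mean depth, and the example depths are all hypothetical; only the starting concentration of 0.1 µM and the direction of adjustment (lower for high depth, higher for low depth) come from the text.

```python
def adjust_primer_concentrations(conc_um, depth, step=0.2, tolerance=0.5):
    """Return primer concentrations (µM) after one hypothetical tuning round.

    conc_um:   current concentration per primer set
    depth:     mean sequencing depth observed per primer set
    step:      assumed fractional change applied per round
    tolerance: assumed band around the mean depth that is left untouched
    """
    mean_depth = sum(depth.values()) / len(depth)
    updated = {}
    for name, conc in conc_um.items():
        ratio = depth[name] / mean_depth
        if ratio > 1 + tolerance:      # over-represented amplicon: lower primer
            conc = conc * (1 - step)
        elif ratio < 1 - tolerance:    # under-represented amplicon: raise primer
            conc = conc * (1 + step)
        updated[name] = round(conc, 4)
    return updated


# Hypothetical first round: all primer sets start at 0.1 µM.
start = {"A1": 0.1, "A2": 0.1, "A3": 0.1}
observed_depth = {"A1": 900, "A2": 300, "A3": 100}
print(adjust_primer_concentrations(start, observed_depth))
```

Repeating such a round until no primer set falls outside the tolerance band mirrors the repeated trials reported for groups A-D, although the real adjustments were made by hand from the sequencing results.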

Library Preparation and Sequencing
Long PCR products of each sample were pooled and purified using AMPure XP beads (Beckman Coulter, Brea, CA, USA) at a 0.4× ratio, and an NGS library was prepared using a Nextera Flex DNA kit (Illumina, San Diego, CA, USA), according to the manufacturer's protocols. Libraries were quantified using an HS Qubit dsDNA assay (Thermo Fisher Scientific) and a TapeStation 4200. Size distributions were checked on a TapeStation 4200 using High-Sensitivity D1000 ScreenTape. A 12.5-pM library was sequenced on an Illumina MiSeq system (2 × 250 cycles), according to the standard Illumina protocols (Illumina).

In Silico Analysis of Missense Variants
For missense mutations whose pathological significance has not been determined, an in silico analysis was performed with Variant Annotation Integrator (http://genome.ucsc.edu/cgi-bin/hgVai?hgsid=723950191_wYiZpDqQkfTaYfcMalCmmczL927z, last accessed on 30 May 2021) and Combined Annotation-Dependent Depletion (CADD) (https://cadd.gs.washington.edu/, last accessed on 30 May 2021). Variant Annotation Integrator is a simplified version of dbNSFP [29] that runs on the website and outputs the results of multiple mutation prediction programmes at once. These programmes include SIFT (https://sift.bii.a-star.edu.sg/, last accessed on 7 July 2021), PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/, last accessed on 7 July 2021), Mutation Taster (http://www.mutationtaster.org/, last accessed on 7 July 2021), Mutation Assessor (http://mutationassessor.org/r3/, last accessed on 7 July 2021), and the Likelihood ratio test (http://www.genetics.wustl.edu/jflab/lrt_query.html, last accessed on 7 July 2021). CADD is a framework that integrates multiple annotations into one metric by contrasting the variants that survived natural selection with simulated mutations. The results are expressed as PHRED-like scaled C-scores, and a value greater than 20 is usually adopted as the cut-off. CADD ranks a variant relative to all possible substitutions in the human genome; a C-score greater than 20 means that the variant is ranked within the top 1% of the most deleterious variants [30]. In this study, we tentatively defined a missense variant as potentially pathogenic when all of the following conditions were met: its frequency in the general population in dbSNP was less than 0.001, the majority of the programmes in Variant Annotation Integrator determined the variant to be pathogenic, and the PHRED-like scaled C-score of CADD was above 20.
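The three-part rule above, together with the PHRED-like scaling that makes a C-score of 20 correspond to the top 1%, can be written as a short Python sketch. The function names, the dictionary of tool calls, and the example values are hypothetical; reading "majority" as strictly more than half of the programmes is an assumption, and the scaling formula (minus ten times the base-10 logarithm of the rank fraction) is the standard PHRED convention rather than something stated in this text.

```python
import math


def phred_scaled_rank(rank, total):
    """PHRED-like score of a variant ranked `rank` (1 = most deleterious) of `total`."""
    return -10 * math.log10(rank / total)


def could_be_pathogenic(pop_freq, tool_calls, cadd_score):
    """Apply the tentative rule: rare, majority of tools damaging, CADD above 20.

    tool_calls maps a programme name to True when it predicts 'pathogenic'.
    'Majority' is taken here as strictly more than half (an assumption).
    """
    damaging = sum(1 for call in tool_calls.values() if call)
    return (
        pop_freq < 0.001
        and damaging > len(tool_calls) / 2
        and cadd_score > 20
    )


# A variant in the top 1% of all substitutions scores 20; the top 0.1% scores 30.
print(round(phred_scaled_rank(1, 100), 6), round(phred_scaled_rank(1, 1000), 6))

# Hypothetical variant, not taken from the study data.
calls = {"SIFT": True, "PolyPhen-2": True, "MutationTaster": True,
         "MutationAssessor": False, "LRT": False}
print(could_be_pathogenic(0.0001, calls, 24.1))
```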

Heterozygosity Mapping of NF1 Region
Since MuLAS analyses the entire NF1 genomic region, including intron sequences, it is possible to determine whether the copy number of the region is maintained at two by detecting heterozygous variants. In addition, a large deletion or insertion within a long-range PCR region where heterozygosity is maintained should be detectable, with its break points accurately analysed by Pindel software [28]. However, no intragenic large deletions or insertions of NF1 were detected in this study.
There are many polymorphisms in introns, but many false-positive heterozygous calls are also made, because the error call rate of NGS in repetitive DNA sequences is high [17]. To eliminate these false positives, two samples with previously reported chromosomal-level large deletions, NF_04 and NF_05, were used [12]. A total of 177 heterozygous variants called by HaplotypeCaller in these samples could be classified as false positives, because these samples have only one allele of NF1 and heterozygous variants cannot occur. Most of them were due to repetitive sequences, and some seemed to be sequence-dependent PCR errors. True heterozygous variant candidates were extracted in each sample by removing these false positives from the heterozygous calls made by HaplotypeCaller. Finally, true heterozygous variants were determined by visual inspection with IGV. Since many SNPs are common to multiple samples, the number requiring visual inspection decreased as the number of samples increased. For heterozygosity mapping of the NF1 region, the number of heterozygous variants in each long-range PCR region was plotted for each patient.
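The false-positive filter and the per-amplicon counting can be sketched in Python as follows. This is only an illustration under stated assumptions: variants are represented as hypothetical (position, ref, alt) tuples, the amplicon names and coordinates are invented, and the manual IGV review step is not modelled.

```python
def true_het_candidates(sample_calls, hemizygous_calls):
    """Drop heterozygous calls that also appear in single-allele (deletion) samples.

    Any 'heterozygous' call in a sample carrying only one NF1 allele must be an
    artefact, so the same call in another sample is treated as a false positive.
    """
    return sorted(set(sample_calls) - set(hemizygous_calls))


def count_per_amplicon(variants, amplicons):
    """Count variants inside each amplicon; overlapping amplicons both count it."""
    counts = {name: 0 for name in amplicons}
    for pos, _ref, _alt in variants:
        for name, (start, end) in amplicons.items():
            if start <= pos <= end:
                counts[name] += 1
    return counts


# Hypothetical coordinates and calls, for illustration only.
amplicons = {"R1": (100, 500), "R2": (450, 900)}
artefacts = [(120, "A", "G"), (700, "C", "T")]           # seen in deletion samples
sample = [(120, "A", "G"), (300, "G", "A"), (470, "T", "C"), (700, "C", "T")]

candidates = true_het_candidates(sample, artefacts)
print(candidates)
print(count_per_amplicon(candidates, amplicons))
```

In this toy input the variant at position 470 lies in the overlap of the two amplicons and is therefore counted for both.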

Detection of Chromosomal-Level Large Deletion by Break Point-Specific Long-Range PCR
According to previous studies, two recurrent chromosomal microdeletions are found in patients with NF1 [31]. These microdeletions have specific break points located in paralogous regions flanking NF1: proximal NF1-REP-a and distal NF1-REP-c for the 1.4-Mb type-1 microdeletion and SUZ12P1 and SUZ12 for the 1.2-Mb type-2 microdeletion. Type-1 microdeletion is a major type (about 80%), while type-2 microdeletion is relatively rare (about 10%). NF_04 and NF_05 are known to have a type-1 microdeletion, according to a previous study. The proximal break point of the type-1 microdeletion is located between LRRC37BP1 and SUZ12P1, and the distal break point is located on the telomere side of LRRC37B [32], and the region containing break points can be amplified by long PCR (Table 1) [20].

Assessing Features of NF1 Splicing Mutations
Since RT-PCR uses full-length cDNA as a template, although it is amplified in 8 fragments, it shows the overall mutant allele expression and splicing state of NF1 mRNA. In addition, PCR amplification of the two alleles at heterozygous sites is not always equal [33], but the difference in amplification at the DNA level (MuLAS) can be used to correct the allele expression ratio at the mRNA level (rLAS).
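One plausible way to express this correction is to divide the mutant-to-wild-type read ratio observed in rLAS by the same ratio observed in MuLAS, so that any allele-specific amplification bias cancels out. The Python sketch below assumes exactly that; the exact formula used in the study is not given in the text, and the function name and read counts are hypothetical.

```python
def corrected_mutant_expression(dna_mut, dna_wt, rna_mut, rna_wt):
    """Mutant-allele expression relative to wild type, corrected for PCR bias.

    Assumption: the bias seen at the DNA level (where both alleles are truly
    present 1:1 in a heterozygote) applies equally to the cDNA amplification.
    """
    dna_ratio = dna_mut / dna_wt   # expected to be 1.0 without amplification bias
    rna_ratio = rna_mut / rna_wt   # observed expression ratio in rLAS
    return rna_ratio / dna_ratio


# Hypothetical read counts at one heterozygous site.
ratio = corrected_mutant_expression(dna_mut=90, dna_wt=110, rna_mut=30, rna_wt=170)
print(f"mutant allele expressed at {ratio:.1%} of the wild-type allele")
```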

Sanger Sequencing
To validate the NGS results, each NF1 exon PCR was performed as previously reported [12], and direct DNA sequencing was performed using the BigDye Terminator v3.1 cycle sequencing kit and ABI PRISM 3100xl Genetic analyser (Thermo Fisher Scientific).

Statistical Analysis
Differences in NF1 allelic expression were analysed by unpaired t-tests, and p < 0.05 (two-tailed) was considered statistically significant.

Clinical Features of the Patients with Neurofibromatosis Type 1
A total of 20 patients with neurofibromatosis type 1, including suspicious cases, participated in this study. A summary of the clinical features of the included patients is given in Table 2. Patients NF_06, 09, and 16 were suspected of having NF1 on the basis of café au lait macules and central nervous system symptoms, while NF_13 had only multiple café au lait macules. In these patients, other NF1 symptoms were not present, and the patients did not meet the clinical diagnostic criteria. However, these patients may lack cutaneous and/or plexiform neurofibromas because of their young ages.

CoLAS for Genetic Analysis of NF1
Multiplex DNA amplicons of NF1 were pooled and sequenced by NGS (Multiplex LAS; MuLAS). As a result, uniform coverage was obtained (Figure 1C). Additionally, NF1 RT-PCR amplicons were pooled and subjected to junction sequence analysis by NGS (reverse-transcribed LAS; rLAS), and the splicing state of the full-length NF1 mRNA was reconstructed (Figure 1D). Four primer sets for very long-range PCR (single amplicons of around 20 kb) were set up for the SPRED1 gene for the differential diagnosis of Legius syndrome (Figure 1E), and NGS was performed (very LAS; vLAS) (Figure 1F).
As a result of sequencing, all long-range PCR products were amplified almost evenly, and sequence reads were formed seamlessly. Therefore, the coverage rate for the target sequence (defined as the percentage of bases with a depth of 40 or more on the IGV among the bases between each PCR primer) reached 100%. The average depth of MuLAS and vLAS was 200X (at least 40X), and rLAS was 600X (at least 100X). We evaluated the CoLAS running costs at approximately ¥10,000 (Japanese yen) per sample, including DNA and RNA extraction, cDNA synthesis, library preparation, NGS sequencing, and Sanger sequencing.
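The coverage rate defined above (the percentage of target bases with a depth of 40 or more) is simple to compute from a per-base depth vector. The Python sketch below is a minimal illustration; the function name and the toy depth list are hypothetical, and in practice the depths would come from the alignment rather than from a hand-written list.

```python
def coverage_rate(depths, min_depth=40):
    """Percentage of target bases whose depth is at least `min_depth`."""
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


# Toy per-base depths between two PCR primers (not study data).
depths = [200, 180, 45, 39, 40]
print(coverage_rate(depths))
```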

Point Mutation Detection of NF1
MuLAS detected various point mutations at the DNA level (Table 3). In the three cases with mutations previously reported by Sanger sequencing, namely NF_01-03, the same mutations were also detected in this study [12]. Of the 15 newly analysed patients, 8 out of 11 clinically definite cases were found to have variants that were thought to be disease-causative mutations. NF_07, 08, 17, and 19 had protein-truncating mutations, and two of them were newly identified in this study. NF_20 had a known start codon mutation [13,34], and NF_10, 11, and 12 had infrequent missense variants with varying degrees of clinical significance, according to ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/ (accessed on 30 May 2021)). On the other hand, protein-truncating mutations were identified in two out of the four suspicious cases (NF_06 and 16), one of which was a novel mutation.
An in silico analysis was performed for missense mutations whose pathological significance has not been determined (Table 4). To varying degrees, these mutations were determined to be potentially pathogenic.
These NF1 variants detected by NGS were validated by Sanger sequencing (Figure 2), and all of them were confirmed. In NF_10, the missense variant c.2033C>T p.Pro678Leu was detected together with a synonymous variant, c.2034G>A p.Pro678=, which is a common SNP (rs2285892) with a high allele frequency in the general population, A = 0.439172 (55146/125568, TOPMED). Sanger sequencing cannot distinguish whether these two variants are located on the same allele (in cis) or on different alleles (in trans). Because NGS sequences on a molecule-by-molecule basis, it showed that these variants are present on different alleles (in trans): every c.2033C>T and c.2034G>A substitution appeared in different reads, and no read contained both. The nonsense mutation of NF_17, c.625C>T p.Gln209Ter, was considered to be mosaic. GATK's HaplotypeCaller detected a low frequency of the mutant allele (T = 0.25), and Sanger sequencing also detected a low T peak.
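The read-level reasoning behind the in cis / in trans call can be illustrated with a small Python sketch. It assumes a simplified, hypothetical representation in which each read is a dictionary mapping a cDNA position to the base observed there; real reads would be parsed from the alignment, and the toy reads below are not study data.

```python
def phase_two_sites(reads, site_a, alt_a, site_b, alt_b):
    """Classify two heterozygous variants as 'cis', 'trans', or 'ambiguous'.

    Only reads spanning both sites are informative. Variants are in cis when
    the alternative bases always co-occur, and in trans when they never do.
    """
    both = only_a = only_b = 0
    for read in reads:
        if site_a not in read or site_b not in read:
            continue  # read does not span both sites
        has_a = read[site_a] == alt_a
        has_b = read[site_b] == alt_b
        if has_a and has_b:
            both += 1
        elif has_a:
            only_a += 1
        elif has_b:
            only_b += 1
    if both and not (only_a or only_b):
        return "cis"
    if only_a and only_b and not both:
        return "trans"
    return "ambiguous"


# Toy reads covering c.2033 and c.2034 (hypothetical).
reads = [
    {2033: "T", 2034: "G"},   # carries c.2033C>T only
    {2033: "C", 2034: "A"},   # carries c.2034G>A only
    {2033: "T", 2034: "G"},
    {2033: "C"},              # does not span both sites, ignored
]
print(phase_two_sites(reads, 2033, "T", 2034, "A"))
```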
Curr. Issues Mol. Biol. 2021, 1, FOR PEER REVIEW

Table notes: DD, developmental disorder; Epi, epilepsy; MR, mental retardation; Mo, quasi-Moya Moya disease; and UBO, unidentified bright object in brain MRI. * These patients were previously reported [12]. # According to the National Institutes of Health Consensus Conference in 1988 [4].
Table legend: Each cell is colour-coded depending on the degree of pathogenicity: blue for low, yellow for moderate, and red for high.

Heterozygosity Mapping of the NF1 Region
Heterozygous variants were extracted in each sample, and the numbers in each long-range PCR region were counted (Table 5). In 12 samples, the presence of single-nucleotide variants (SNV) or short insertion/deletion variants (Indel) demonstrated heterozygosity of all long-range PCR regions, i.e., the presence of the entire NF1 gene on both alleles. It should be noted that adjacent long-range PCR regions overlap. Therefore, when a region lies between two regions in which two gene copies have been confirmed by heterozygous variants, the existence of two copies is guaranteed for that region even without a heterozygous variant in it, because both the forward and reverse PCR primers used to amplify it are located in the adjacent regions where two copies were confirmed. In the remaining samples, there was an area where the copy number could not be determined from the MuLAS data alone. In 18 samples, excluding the NF1 type-1 microdeletion samples (NF_04 and 05), the determination rate of heterozygosity was 85.3% for all the primer sets and 86.9% for the primer sets containing the NF1 exons.
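The inference for a region sandwiched between two confirmed neighbours can be sketched in Python as follows. The function names, the status labels, and the example counts are hypothetical, and treating inferred regions as "determined" when computing a rate is an assumption about how the reported percentages were derived, not something stated in the text.

```python
def two_copy_status(het_counts):
    """Label each amplicon (in genomic order) from its heterozygous-variant count.

    'confirmed'    : at least one heterozygous variant in the amplicon
    'inferred'     : no variant, but both overlapping neighbours are confirmed
    'undetermined' : copy number cannot be decided from these data alone
    """
    confirmed = [n > 0 for n in het_counts]
    status = []
    for i, ok in enumerate(confirmed):
        if ok:
            status.append("confirmed")
        elif 0 < i < len(confirmed) - 1 and confirmed[i - 1] and confirmed[i + 1]:
            status.append("inferred")
        else:
            status.append("undetermined")
    return status


def determination_rate(status):
    """Percentage of amplicons whose two-copy state is confirmed or inferred."""
    determined = sum(1 for s in status if s != "undetermined")
    return 100.0 * determined / len(status)


# Hypothetical counts for six consecutive overlapping amplicons.
status = two_copy_status([3, 0, 2, 0, 0, 1])
print(status)
print(round(determination_rate(status), 1))
```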

Detection of Chromosomal Level Large Deletion by Break Point-Specific Long-Range PCR
The break point PCR for the type-1 microdeletion was performed, and a product of approximately 4 kb was obtained specifically in NF_04 and 05 (Figure 3A). The PCR product was microdeletion-specific and was not found in the other samples. Moreover, we also performed an NGS analysis of this PCR product, and it mapped to two regions 1420 kb apart on both sides of NF1, as expected (Figure 3B).

Assessing Features of NF1 Splicing Mutations
The analysis using rLAS detected splicing anomalies in three samples. In NF_02, very complex aberrant splicing events caused by the exon 21 splicing donor site mutation NM_000267.3:c.2850+1G>T were observed (Figure 4A). Three different aberrant splicing events within exon 21, using three different GT sequences as new splicing donor sites, were detected at various frequencies. The first two were out-of-frame deletions (232 and 144 bp), while the last one was an in-frame deletion (90 bp). The junction read numbers of these aberrant splicing events were 266, 28, and 27, respectively, compared with 325 for normal splicing between exons 21 and 22. Additionally, exon 21 skipping was observed with a junction read number of 54. The overall splicing anomaly is described at the RNA level as NM_000267.3: r.[2410_2850del;2619_2850del;2707_2850del;2761_2850del]. Among them, exon 21 skipping and r.2619_2850del were not detected in a previous study that used random primed cDNA and local RT-PCR [12].
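The junction read numbers reported for NF_02 can be turned into relative frequencies, and the deletion lengths can be recomputed from the RNA-level coordinates, with a few lines of Python. The read counts and coordinates are those given in the text; the function names are hypothetical, and expressing each junction as a fraction of all listed junction reads is an illustrative choice, not necessarily how the frequencies were reported in the study.

```python
def junction_fractions(counts):
    """Fraction of junction reads supporting each splicing event."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def deletion_length(start, end):
    """Length in bases of an r.start_end deletion (inclusive coordinates)."""
    return end - start + 1


# Junction read numbers for NF_02 as stated in the text.
junction_reads = {
    "normal exon 21-22": 325,
    "r.2619_2850del": 266,
    "r.2707_2850del": 28,
    "r.2761_2850del": 27,
    "r.2410_2850del (exon 21 skipping)": 54,
}

for name, frac in junction_fractions(junction_reads).items():
    print(f"{name}: {frac:.1%}")

for start in (2410, 2619, 2707, 2761):
    print(f"r.{start}_2850del removes {deletion_length(start, 2850)} bases")
```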

As previously reported in NF_03, the seemingly missense mutation NM_000267.3:c.4402A>G p.(Ser1468Gly) is actually a splice mutation that creates a new splicing acceptor site in exon 33 and a stop codon immediately after it, NM_000267.3:c.4402A>G r.4368_4402del p.Phe1457Ter [12]. In the previous study, it was not possible to quantify the efficiency of this abnormal splicing. Therefore, it was unclear whether the effect of the missense amino acid substitution p.Ser1468Gly would remain. In the present study, this quantification was possible by using NGS. Comparing the DNA (MuLAS) and RNA (rLAS) levels, it was shown that the mutant base (G) is contained in only 3.5% of mRNA (corrected by the allele ratio at the DNA level), because the G base is almost completely spliced out at the new acceptor site (Figure 4B, upper part). The sashimi plot also showed that the number of aberrant splicing junctions was nearly equal to the number of normal splicing junctions from the wild-type allele (Figure 4B, lower part). Overall, the major effect of this single-base substitution was demonstrated to be protein truncation associated with aberrant splicing.
In this study, we also detected another splicing anomaly in the NF_16 nonsense mutation, NM_000267.3:c.889A>T p.Lys297Ter. This A-to-T substitution is located at the first base of exon 9. rLAS demonstrated that the mutant base (T) expression level in mRNA was reduced to 27.7% (corrected by the allele ratio at the DNA level), and there was a minor population of exon 9 skipping ( Figure 4C).
Moreover, the expression ratio of the mutant allele to the wild-type allele at the mRNA level (corrected by the allele ratio at the DNA level) was compared between the protein-truncating mutations (frameshift, nonsense, and splicing mutations) and the missense mutations. For NF_02, the polymorphism detected in this sample, NM_000267.3:c.8151G>A p.Pro2717=, was used, because the mutant base was spliced out and was not included in the mRNA. NF_03 was classified as a protein-truncating mutation based on the above discussion. NF_17 was excluded because of its mosaic mutation. Protein-truncating mutations tended to show a lower allele expression level than missense mutations, and a significant difference was observed (p = 0.019) when synonymous polymorphisms from samples in which no mutation was identified were added (Figure 4D). This suggests that protein-truncating mRNA may be degraded by nonsense-mediated mRNA decay (NMD). However, this difference in mRNA expression is largely due to two samples, NF_02 and NF_03, and it should be noted that there are exceptional samples, such as NF_08, in which the mRNA level is hardly decreased despite a frameshift mutation in exon 27. This suggests that cases of NMD evasion that do not follow the canonical rules do exist. Additionally, NF_20 carries a start codon mutation, NM_000267.3:c.1A>G p.Met1Val, which was classified as a missense mutation, but the mRNA level of the mutant allele may be decreased. However, no splicing abnormalities were observed in rLAS.
In conclusion, the abnormal splicing and allelic expression of NF1 mRNA could be observed accurately at the same time by combining MuLAS and rLAS data. Therefore, CoLAS is a useful tool that can comprehensively explore the effects of the point mutations on mRNA transcription.

SPRED1 Mutation Analysis
For the differential diagnosis of Legius syndrome, SPRED1 vLAS was added for the patients with no identified NF1 mutations (NF_09, 13, 14, 15, and 18) and for the patients with missense variants of undetermined pathological significance (NF_10, 11, and 12). No SPRED1 mutations, including point mutations and large deletions/insertions, were detected in these patients. Additionally, all long amplicons contained heterozygous polymorphisms, confirming that the entire SPRED1 gene region was maintained in two copies in all of these patients.

Cases with No Detected NF1 Mutation
Patients NF_09 and 13 did not meet the diagnostic criteria for NF1, and their parents did not wish to pursue genetic testing. For NF_14, the heterozygous mapping by MuLAS was incomplete, so MLPA was additionally performed at another facility, but no copy number abnormality was detected in the NF1 exons. In the patients who met the clinical diagnostic criteria but in whom no NF1 mutation could be identified (NF_14, 15, and 18), CoLAS ruled out point mutations, large deletions/insertions, and splicing mutations, including deep intronic mutations, which leaves the possibility of low-frequency mosaic mutations.

Discussion
The diagnosis of a typical NF1 patient is straightforward based on the clinical diagnostic criteria, but genetic testing has a distinct significance. Young patients often present only with café au lait macules, and if there is no family history, the diagnosis cannot be confirmed until other symptoms appear. On the other hand, optic glioma is commonly found in young children with NF1 and sometimes results in visual impairment. Additionally, the frequency of optic glioma is known to show a genotype-phenotype correlation in which NF1 mutations located in the 5′ third of the gene carry a significantly higher risk [35]. Another important point is that the variability of phenotypic expressivity in NF1 is extremely large, forming a wide spectrum from very severe cases to almost asymptomatic mild cases. Mosaic mutations, which are present in a non-negligible proportion of patients, are one reason for phenotypic mildness. Therefore, even in adult patients, the diagnosis of atypical cases depends on genetic testing. Indeed, in this study, definitive NF1 mutations were detected in two young patients who did not fulfil the clinical diagnostic criteria (NF_06 and NF_16).
Recently, clinical genetic testing has come to be performed mainly by NGS. However, in most cases, it is limited to massively parallel sequencing of the coding regions and exon/intron boundaries, together with copy number analyses, which still leaves gaps in genetic testing. In NF1, the causative gene is not only large, but the pattern of gene mutations is also diverse. In addition, mutations at the DNA level have various effects on mRNA splicing. Furthermore, determining the breakpoint sequences of large intragenic deletions/duplications is not easy. Therefore, we developed a method that can detect as many types of gene mutations as possible and analyse their nature in detail on a single NGS platform. CoLAS, which simultaneously analyses the entire genomic region and the full-length cDNA of a specific gene based on long-range PCR, has already improved the genetic analysis of the tuberous sclerosis complex [17]. In this pilot study, CoLAS was applied to the genetic diagnosis of NF1 and again demonstrated its high utility.
There are many advantages to using long-range PCR as the basis for an NGS analysis. Pseudogene sequences can be excluded by designing target-specific PCR primers, and accurate sequences can also be obtained for introns. Both are difficult with the capture probe method, and ordering PCR primers is less expensive than making a new capture probe for a specific gene. Long-range PCR products are likely to contain heterozygous bases, allowing copy number confirmation and the detection of intragenic structural abnormalities (large deletions/insertions) simultaneously. Multiplexing the PCR reduces the amount of template DNA required. In this study, the NF1 genomic region spanning about 290 kb was covered by only four PCR reactions (a total of 80 ng of genomic DNA). By sequencing the full-length cDNA, the splicing state and allele expression of the entire gene can be understood, and splicing abnormalities associated with deep intronic mutations can be detected by comparison with the DNA sequence.
In the present study, NF1 mutations were identified in 10 of the 15 newly analysed patients. It was possible to show quantitatively that a single point mutation can cause several splicing abnormalities at the same time and that mutations that appear to be missense or nonsense mutations can cause splicing abnormalities. The mapping of the NF1 copy number using heterozygous polymorphisms confirmed the presence of two copies across the entire region of the NF1 gene in 12 out of 18 patients, excluding two cases of type-1 microdeletion, and the determination rate of heterozygosity was 86.9% in the primer set containing NF1 exons. If no heterozygous polymorphism is found in the NF1 region, a microdeletion is suspected, and it was shown that a type-1 microdeletion can be directly demonstrated by breakpoint-specific long-range PCR.
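The logic of heterozygous mapping can be sketched in a few lines. This is only an illustration under stated assumptions: how the determination rate was computed in this study is not detailed in this section, and the heterozygosity window (alternate allele fraction between 0.3 and 0.7), the function names, and the amplicon data are hypothetical.

```python
def amplicon_is_heterozygous(alt_fractions, low=0.3, high=0.7):
    """True if at least one SNP in the amplicon looks heterozygous.

    The 0.3-0.7 window is an assumed threshold, not a value from the paper.
    """
    return any(low <= fraction <= high for fraction in alt_fractions)


def determination_rate(amplicons):
    """Share of amplicons in which two copies are supported by a heterozygous SNP.

    `amplicons` maps an amplicon name to the alternate-allele fractions
    of the polymorphisms observed in it.
    """
    if not amplicons:
        raise ValueError("no amplicons given")
    confirmed = sum(amplicon_is_heterozygous(f) for f in amplicons.values())
    return confirmed / len(amplicons)


# Hypothetical amplicons: A and C carry heterozygous SNPs, B and D do not.
example = {
    "amplicon_A": [0.48, 1.0],
    "amplicon_B": [1.0, 0.99],
    "amplicon_C": [0.52],
    "amplicon_D": [],
}
print(determination_rate(example))
```

An amplicon without any heterozygous polymorphism is not proof of a deletion; it only means that two copies could not be confirmed, which is why an additional method such as MLPA or breakpoint-specific PCR is needed in those regions.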
Additionally, this pilot study revealed the current limitations of NF1-CoLAS. First of all, with any type of LAS, it is difficult to accurately measure the alterations of gene copy numbers. Accordingly, we tried heterozygous mapping using intron SNPs, but some patients had regions where heterozygosity could not be proved, and an additional analysis by MLPA was required. The low frequency of polymorphisms in some patients may be due to the effect of recruiting a single ethnic group (Japanese).
Next, the detection of mosaic mutations is incomplete. In patients with no identified NF1 mutations, the possibility of low-frequency mosaic mutations remains. Increasing the NGS read depth may allow low-frequency mosaic mutations to be detected but, at the same time, increases the risk of false positives associated with PCR errors. Some PCR errors occur in a DNA sequence-dependent manner, and others occur randomly. To avoid false positives, it is necessary to perform the experiments in duplicate, extract the mosaic mutations common to both, and then remove the known sequence-specific false positives [17]. However, creating a list of sequence-dependent false positives requires the analysis of more samples, which is an issue to consider in the future.
This method could also be useful in analysing somatic mutations in tumours involving the NF1 gene. In that case, a frozen tumour sample is required to extract high-molecular-weight DNA and RNA. In addition, more accurate mosaic mutation detection is required.
Finally, in rLAS, the mRNA expression pattern of NF1 was analysed in lymphocytes; however, NF1 expression is expected to be tissue-specific, and how accurately the lymphocyte data reflect the aberrant expression in the affected target organ (nervous system) is unknown.
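The duplicate-and-filter strategy described above can be expressed as simple set operations. The following is a minimal sketch, assuming each variant call is represented as a (position, reference base, alternate base) tuple; the function name and all example variants are hypothetical and do not come from the study data.

```python
def shared_candidate_mosaics(replicate_1, replicate_2, known_false_positives):
    """Keep variants called in both replicates, minus known recurrent artefacts.

    Random PCR errors are unlikely to recur at the same site in two
    independent experiments, while sequence-dependent errors recur and
    therefore have to be removed with an explicit exclusion list.
    """
    common = set(replicate_1) & set(replicate_2)
    return common - set(known_false_positives)


# Hypothetical low-frequency calls as (position, ref, alt).
rep1 = [(1001, "A", "G"), (2050, "C", "T"), (3300, "G", "A")]
rep2 = [(1001, "A", "G"), (3300, "G", "A"), (4100, "T", "C")]
artefacts = [(3300, "G", "A")]  # assumed sequence-dependent false positive

print(shared_candidate_mosaics(rep1, rep2, artefacts))
```

The exclusion list is the weak point of this design: it can only be built from sites that recur across many unrelated samples, which is why more samples are needed before it becomes reliable.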

Conclusions
Although the number of samples is limited, this pilot study showed that CoLAS is an excellent method for the precise genetic analysis of NF1. This NGS application can not only identify disease-causative mutations (10 out of 15 new patients, including suspicious cases) but also accurately show the detailed individual genomic structures, including chromosomal-level microdeletions, quantitative mRNA transcription, and the splicing state of the NF1 gene. However, issues remain to be addressed, such as determining gene copy numbers, identifying low-frequency mosaic mutations, and assessing tissue-specific gene expression.

Informed Consent Statement: Written informed consent was obtained from all the patients or parents who participated in this study.