Expanding the Mutation Spectrum in ABCA4: Sixty Novel Disease Causing Variants and Their Associated Phenotype in a Large French Stargardt Cohort

Here we report novel mutations in ABCA4 with the underlying phenotype in a large French cohort with autosomal recessive Stargardt disease. The DNA samples of 397 index subjects were analyzed in exons and flanking intronic regions of ABCA4 (NM_000350.2) by microarray analysis and direct Sanger sequencing. At the end of the screening, at least two likely pathogenic mutations were found in 302 patients (76.1%) while 95 remained unsolved: 40 (10.1%) with no variants identified, 52 (13.1%) with one heterozygous mutation, and 3 (0.7%) with at least one variant of uncertain significance (VUS). Sixty-three novel variants were identified in the cohort. Three of them were variants of uncertain significance. The other 60 mutations were classified as likely pathogenic or pathogenic, and were identified in 61 patients (15.4%). The majority of those were missense (55%) followed by frameshift and nonsense (30%), intronic (11.7%) variants, and in-frame deletions (3.3%). Only patients with variants never reported in literature were further analyzed herein. Recruited subjects underwent complete ophthalmic examination including best corrected visual acuity, kinetic and static perimetry, color vision test, full-field and multifocal electroretinography, color fundus photography, short-wavelength and near-infrared fundus autofluorescence imaging, and spectral domain optical coherence tomography. Clinical evaluation of each subject confirms the tendency that truncating mutations lead to a more severe phenotype with electroretinogram (ERG) impairment (p = 0.002) and an earlier age of onset (p = 0.037). Our study further expands the mutation spectrum in the exonic and flanking regions of ABCA4 underlying Stargardt disease.

Given the large number of variants reported in ABCA4 [12], most of them being polymorphisms, the identification of true disease-causing mutations is often challenging [5,13,14].
With the advent of new analytical approaches, such as next generation sequencing (NGS), the detection rates of ABCA4 mutations have greatly increased since its discovery in 1997. Nevertheless, homozygous or compound heterozygous mutations are regularly detected in no more than 70-75% of STGD1 patients, while a significant number of patients carry only a single ABCA4 mutation or none at all [15][16][17][18][19]. On the other hand, several deep intronic variants in non-coding regions were reported to segregate with STGD1 affecting correct splicing mechanisms [19][20][21][22][23].
Interestingly, all genotyping studies on ABCA4 report a broad mutation spectrum and high allelic heterogeneity [6,12,24,25,26]. This phenomenon may be partially explained by the ethnic variability of the studied populations [12]. Therefore, screening large populations to identify novel variants remains relevant [27,28]: it can help to explore not only the genetic architecture of ABCA4 pathology, but also the genotype/phenotype correlation. Much effort today is directed towards a more specific correlation tailored to single variants [24,28,29,30,31], even though their low frequency makes this challenging. Reporting novel ABCA4 variants with a clear and definite correlation with STGD1 would also help the proper selection of patients for therapeutic clinical trials by providing a higher degree of confidence in the molecular diagnosis [32,33].
To further explore the genetic characteristics of ABCA4 and broaden the spectrum of its pathogenic variants, we analyzed a large French cohort of 397 STGD1 patients using a combination of microarray analysis and Sanger sequencing. The purpose of this study was to report all the novel variants identified and to evaluate their pathogenicity by in silico analysis. The clinical features were also analyzed and correlated with the genetic results.

Results
Sequencing of ABCA4 in our cohort of 397 STGD1 patients identified a large number of allelic variants. Microarray technology (ABCR600, ASPER Biotech, Inc., Tartu, Estonia) was used to screen 211 subjects for already known mutations and identified at least two pathogenic variants in 76 of them. All cases left unsolved by the microarray, together with the remaining subjects of the cohort, were analyzed by direct Sanger sequencing. At the end of the screening, at least two likely pathogenic mutations were found in 302 patients (76.1%), while 95 remained unsolved: 40 (10.1%) with no variants identified, 52 (13.1%) with one heterozygous mutation, and 3 (0.7%) with at least one variant of uncertain significance (VUS) (Figure 1).
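The screening outcome can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the text; the dictionary labels and the helper name `share` are illustrative choices, not part of the study's pipeline.

```python
# Counts as reported in the text for the 397 index patients.
cohort = 397
outcomes = {
    "at least two likely pathogenic mutations": 302,
    "no variant identified": 40,
    "one heterozygous mutation": 52,
    "at least one VUS": 3,
}

# The four categories should partition the cohort,
# and the last three should add up to the 95 unsolved cases.
assert sum(outcomes.values()) == cohort
assert cohort - outcomes["at least two likely pathogenic mutations"] == 95


def share(count, total=cohort):
    """Return the unrounded percentage of the cohort."""
    return 100.0 * count / total


for label, count in outcomes.items():
    print(f"{label}: {count}/{cohort} = {share(count):.2f}%")
```

Printing two decimals makes the rounding visible: 3/397 is about 0.76%, so the reported 0.7% may have been chosen so that the four percentages total 100%.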

Novel Variants
Sixty-three novel ABCA4 variants were identified in 63 index patients (30 males); 60 variants (in 61 subjects) were likely pathogenic or pathogenic. The other 3 variants (in 3 subjects) were VUS (Table 1, uncertain). CIC03678 was a carrier of two novel mutations, one likely pathogenic and one VUS. The 60 likely pathogenic or pathogenic variants comprised 33 missense and 6 nonsense variants, 11 deletions, 1 one-base pair (bp) duplication, 1 four-bp duplication, 1 insertion, and 7 variants presumably affecting splicing (5 single nucleotide substitutions, 1 single-bp deletion, and 1 eight-bp deletion) (Table 1), which co-segregated with the phenotype in tested available family members. Co-segregation analysis allowed the identification of complex alleles with in cis mutations in 18 patients. Nine additional patients had more than two variations, but it was impossible to establish phase and determine each allele due to the lack of DNA from additional family members.
All genotypic information about patients and co-segregation analysis are reported in Table 2. Four missense mutations (i.e., p.(Cys519Tyr), p.(Glu1271Gln), p.(Gly1592Asp), and p.(Asp2095Gly)) were analyzed also with the splicing predictor tools due to their proximity with a putative splice site. None of them was predicted to affect splicing significantly but they were predicted as disease-causing by the missense prediction in silico algorithms.
CIC08932 inherited the complex allele p.(Asn96Lys; Gly1961Glu) from the unaffected mother, but the novel variant p.(Gly1592Asp) was not found in the unaffected father (Table 2). This variant could be de novo in the index patient, or CIC08933 may not be the biological father. Unfortunately, it was impossible to get this information from the subject, and our informed consent did not cover single nucleotide polymorphism analysis for paternity testing.
Three novel variants found in three index patients (p.(Asn956Lys), p.(Arg1137Gly), and c.5899−3T>C) were considered VUS. Conservation and pathogenic prediction data for each variant are reported in Supplementary Tables S1 and S2.
All novel variants found in the study and their respective position on ABCA4 are represented in Figure 2.

Genotype-Phenotype Correlation
Genotype-phenotype correlation analysis was performed on the 60 patients carrying at least one likely pathogenic variant on each allele. The three subjects carrying a VUS were excluded from the analysis. All clinical features are presented in the Supplementary Table S3.
Patients were divided in two genotype groups: (1) patients with at least one null mutation (NM), i.e., frame-shift or nonsense mutations, splicing variants affecting the first two nucleotides adjacent to an exonic sequence and/or other variants proven to cause a premature stop or incomplete formation of the whole protein in previous studies (e.g., c.5461−10T>C [51]); (2) patients with two or more missense variants (MM) [52].
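The grouping rule above can be sketched as a small function. This is a minimal illustration under stated assumptions: the `Variant` class, the `kind` labels, and the function name are hypothetical, and a genotype that fits neither definition simply returns `None` because the text does not say how such cases were handled.

```python
from dataclasses import dataclass

# Hypothetical labels for the variant types the text counts as "null":
# frame-shift, nonsense, splice variants at the first two intronic
# nucleotides, and variants previously proven to truncate the protein.
NULL_KINDS = {"frameshift", "nonsense", "canonical_splice", "proven_truncating"}


@dataclass(frozen=True)
class Variant:
    hgvs: str   # variant name as written in the paper
    kind: str   # hypothetical type label, e.g. "missense"


def genotype_group(variants):
    """Return "NM", "MM", or None for a patient's list of variants."""
    # Group 1: at least one null mutation.
    if any(v.kind in NULL_KINDS for v in variants):
        return "NM"
    # Group 2: two or more missense variants (a homozygous missense
    # change is passed in as two entries, one per allele).
    if sum(v.kind == "missense" for v in variants) >= 2:
        return "MM"
    # Not covered by the two definitions in the text.
    return None


# CIC07308 carries a frame-shift plus p.(Gly1961Glu), so falls in NM.
print(genotype_group([
    Variant("p.(Ile351Leufs*23)", "frameshift"),
    Variant("p.(Gly1961Glu)", "missense"),
]))
```

Checking for a null mutation first mirrors the wording "at least one null mutation": a patient with one truncating and one missense allele is assigned to NM, not MM.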
The two genotypic groups were not statistically different for age (mean of 34.3 ± 17.8 years for NM group and 33.9 ± 18.3 years for MM group; p = 0.94) nor duration of disease (mean of 16.9 ± 14.6 years for NM group and 10.8 ± 10.9 years for MM group; p = 0.095).
Mean age of onset was 18.5 ± 12.59 years and 23.1 ± 14.8 years in NM and MM groups, respectively (p = 0.037).
Right eye (RE) and left eye (LE) did not differ significantly for best corrected visual acuity (BCVA) (p = 0.81), central retinal thickness (CRT) (p = 0.09), or macular volume (MV) (p = 0.137); therefore, we used data from the right eye for comparison between genotypic groups.

Figure 4.
Distribution of the two genotypic subtypes across electroretinography (ERG) groups (left) and peripapillary sparing (right). For ERG classification, group I has abnormal multifocal (mf-) ERG with normal full-field (ff-) ERG; group II has mf-ERG abnormalities with cone dysfunction (assessed with light-adapted 30 Hz flicker and light-adapted 3.0); group III has additional rod dysfunction (assessed using dark-adapted 0.01 and dark-adapted 10.0). The peripapillary area was considered spared (Yes in the graph) if no alterations were found in the fundus autofluorescence within 0.6 mm from the optic disc edge. This area was not considered spared when flecks or ellipsoid zone (EZ) and/or retinal pigment epithelium (RPE) atrophy were present (see Table 3 for classifications). Photoreceptor dysfunction (ERG groups II and III) and involvement of the peripapillary area by flecks or atrophy are more prevalent in the group of patients with at least one null mutation.

Discussion
This work is a longitudinal study, started in 2007, which used a combination of microarray analysis and direct Sanger sequencing to identify ABCA4 mutations in a large French cohort with a clinical diagnosis of STGD1 disease. At that time, targeted NGS and whole exome sequencing (WES) were not available in our laboratory. The rate of bi-allelic variants detected in the population using this approach was 75.6%, which is in accordance with previously reported data (about 75%) [15][16][17][18][19].
The authors believe that the same results would have been obtained using only direct Sanger sequencing on the entire cohort. Initially, patients were analyzed with microarray to evaluate the prevalence of known mutations, an approach commonly used at the time to reduce screening costs. However, the high number of unsolved cases led us to further investigate the cohort with direct Sanger sequencing. Microarray analysis alone was enough to solve 76 cases, which were not further investigated with the Sanger technique. This may have led to an underestimation of the prevalence of additional novel mutations and/or complex alleles in this cohort [28,58].
The relatively high number of unsolved cases may be related to a combination of different factors: (1) The clinical overlap of distinct genetic entities could have led to uncertainties in patient selection. Therefore, the screening of other genes (e.g., ELOVL4, PROM1, PRPH2, etc.) responsible for overlapping phenotypes may help to solve such cases; (2) Large rearrangements within the ABCA4 locus have been shown to occur in ~0.5% to ~2% of STGD1 cases in previous reports, and they would not have been detected by our mutational screening [22,24]; (3) Mutations in noncoding regions of the ABCA4 gene locus have been proposed as a common source for a second causative mutation in patients with a typical Stargardt phenotype. In particular, the occurrence of deep intronic variants in subjects with a single ABCA4 mutation may range from ~2% to ~18% in different reports [19][20][21][22]27,59]. This may have led not only to an underestimation of bi-allelic cases, but also of the number of novel variants found in the cohort, whose report was the main aim of this study. However, the number of novel variants affecting deep intronic regions should be relatively small. In a recent report, Schulz et al. [27] screened 237 STGD1 patients showing a single ABCA4 variant for deep intronic variants. In that cohort, ten different sequence variants were found, only two of which were not previously reported [27]. Adding, first, the screening of selected noncoding regions known to be "hot spots" for pathogenic variants (e.g., introns 30 and 36 [27]) and, second, whole genome sequencing (WGS) analysis will be the next steps to increase the number of solved cases in our cohort.
Among the 305 patients with bi-allelic variants, we found a total of 240 disease-causing mutations, of which 60 (25%) were never described before (Table 2).
The high number of new variants found in this cohort may seem surprising considering that several studies had already been conducted on the western and central European populations [5,15,17,27,43]. These findings are probably related to the extensive allelic heterogeneity of ABCA4 and the possible contribution of ethnic differences [12]. Among the 61 patients with novel likely pathogenic or pathogenic variants, three subjects had eastern European origins and nine subjects had non-European origins. The novel variant p.(Pro1761Arg) is carried by three subjects with no ties of consanguinity and with non-European origins: Armenia (CIC07960), Turkey (CIC09601), and Algeria (CIC07036). This may suggest a higher prevalence of this mutation in the south Mediterranean area. Both CIC07960 and CIC09601 harbor the variant in cis with p.(Arg2106Cys). Interestingly, the two subjects share the same pattern distribution of the lesion, but CIC07960 has an earlier age of onset and an important foveal involvement, probably due to the associated truncating mutation p.(Leu1274Serfs*8) (Figure 5).
The novel variant p.(Ile351Leufs*23) was carried by three unrelated French subjects, indicating a possible higher prevalence in this population. While CIC01080 and CIC04235 share an early onset advanced disease with spread fundus atrophy, CIC07308 has a normal ff-ERG and no posterior atrophy, probably due to the second mutation p.(Gly1961Glu), known for its "milder" pathologic effect [30,31].
Among the likely pathogenic novel variants three were homozygous: p.(Tyr340Cys), p.(Val434Gly), and p.(Gly2146Valfs*36).
CIC07725, harboring p.(Tyr340Cys), had early-onset disease (nine years of age) with normal ff-ERG, centripetal spread of flecks and foveal EZ band involvement. CIC09625, harboring p.(Val434Gly), has an early onset disease (nine years of age) with a central lesion without flecks and with foveal sparing. Even though they share a similar age of onset and duration of the disease (one and two years, respectively), CIC09625 shows a "milder" phenotype (Figure 6). These two missense mutations may have a distinct impact on the protein function even though they are located in the same domain (first ABCA4 extracellular domain). Alternatively, genetic or environmental modifiers may explain the phenotypic variability.
As expected, CIC06396, homozygous for p.(Gly2146Valfs*36), showed a more advanced phenotype with an early onset disease (nine years of age), extensive fundus RPE atrophy and photoreceptor impairment (Figure 6).
CIC05824 harbors the novel intronic variant c.5899−3T>C, which is not predicted to affect splicing by the algorithms used since the nucleotide is poorly evolutionarily conserved (Supplementary Table S2). Nevertheless, we cannot exclude the possibility that this variant may affect splicing in vivo, since it has already been proven that ABCA4 can carry non-canonical splice site variants with important effects on the protein [19][20][21][22][23]59].
The VUS p.(Asn956Lys) harbored by CIC03678 is predicted to be pathogenic only by SIFT, and the amino acid residue is not conserved (Table 1, Supplementary Table S2). Asparagine and lysine share similar characteristics of hydropathy and polarity. Genetic analysis of CIC03679, the unaffected son of CIC03678, showed that he does not carry p.(Asn956Lys); however, in the absence of other family members available for co-segregation analysis, the significance of this variant remains unclear.
Variant c.3409A>G, p.(Arg1137Gly), carried by CIC03545, is predicted to be pathogenic by SIFT and MutationTaster although the residue was not conserved, being substituted by a glutamine, a threonine or a lysine in the conservation analysis (Table 1, Supplementary Table S2). However, all these amino acids are polar while glycine is not, hence an effect on the three-dimensional structure of the protein cannot be completely excluded.
Several hypomorphic alleles have been associated with STGD1 [27,29]. Even though many of them were considered benign because of their high frequency, these alleles could be pathogenic when they are in trans with severe variants, explaining many of the unsolved cases with only one or no mutations detected on ABCA4 [27,29]. The presence of these alleles could help in classifying accompanying ABCA4 mutations as severe mutations and in genotype-phenotype correlations.
For example, Zernant et al. [29] recently reported that the hypomorphic variant p.(Asn1868Ile) accounts for more than 50% of the missing causal alleles in monoallelic cases and about 80% of late-onset cases, and distinguishes ABCA4-related disease from AMD [29]. In our sub-cohort of 61 patients, this variant appeared in 14 subjects. Most of these cases presented p.(Asn1868Ile) in cis with other disease-causing mutations (12 subjects), and the phenotypes were typically consistent with the effect of the overall genotype, with mainly early onset and severe disease as documented by Zernant et al. [29]. Only two subjects (CIC08283 and CIC04197) had p.(Asn1868Ile) alone, in trans with other pathogenic variants, resulting in a milder phenotype. Further functional analysis should be performed in order to confirm the pathogenicity of these variants.
Among the 63 patients carrying novel variants, we performed genotype-phenotype correlation on 60 of them. The three subjects carrying a VUS were excluded from the analysis. In line with the genotype-phenotype model previously proposed [24], we observed an earlier age of onset and more diffuse photoreceptor dysfunction (assessed by ff-ERGs) in the NM group. These findings are in accordance with the previous literature [12,52]. Loss of peripapillary sparing was more frequent in the NM group, which also showed a higher prevalence of non-group I ff-ERG; this is consistent with previous publications reporting a loss of this sign in non-group I Stargardt disease [60].
Overall, in our group of 60 patients harboring novel likely pathogenic ABCA4 changes, we observed a tendency for a worse phenotype when a truncating variant is present.
In this report we wanted to prioritize the characterization of the patients carrying novel variants. Even though it is only a subgroup of subjects the results seem strong and consistent with previous literature [12] and they probably reflect the genotype-phenotype correlation of the entire cohort. Further analysis including a full clinical assessment of patients with known variants will be necessary to confirm these findings.
However, even with a large cohort of patients, it is impossible to establish a precise phenotype-genotype correlation only on the basis of the type of mutation. In fact, two different missense mutations involving residues at different locations, with distinct effects on the quaternary structure of ABCA4, can produce very different phenotypes [12,61]. Nowadays, the main efforts are aimed at finding correlations between specific variants and clinical phenotype, although the rarity of many alleles makes this difficult. For this reason, we tried to provide an exhaustive clinical characterization of each subject showing at least one novel mutation among the two mutant alleles of ABCA4 (see Supplementary Table S3).

Materials and Methods
Patients with a presumed diagnosis of STGD1 disease were recruited at the Reference Center for rare diseases, Referet of the Quinze-Vingts hospital, Paris. Informed consent was obtained from each patient after explanation of the study and its potential outcome. The study protocol adhered to the tenets of the Declaration of Helsinki and was approved by a national ethics committee (CPP Ile de France V, Project number 06693, N° EUDRACT 2006-A00347-44, 11 December 2006). All the patients and available family members were asked to donate a blood sample for genetic screening for ABCA4 mutations. Subjects with at least one novel variant were called back to undergo a complete phenotype analysis as described below.

Mutation Analysis
Total genomic DNA was extracted from peripheral whole blood samples by standard salting out procedures according to the manufacturer's recommendation (Puregen Kit; Qiagen, Courtaboeuf, France). The first consecutive 211 subjects were screened for known ABCA4 mutations by microarray analysis on a commercially available microarray (ABCR600, ASPER Biotech, Inc., Tartu, Estonia) [48]. Among them, samples which were excluded for known variants were further investigated for variants in the coding exons and their flanking regions of ABCA4 by PCR and direct Sanger sequencing. The DNA of the other patients included in the study was directly sequenced by Sanger sequencing. The 50 coding exons and flanking intronic regions were amplified in 48 fragments (ABCA4 RefSeq NM_000350) using oligonucleotides reported in Supplementary Table S4, a commercially available polymerase (HotFire, Solis Biodyne, Tartu, Estonia), and 1.5 mM MgCl2, at an annealing temperature of 58 °C for 1 min. The PCR products were enzymatically purified (ExoSAP-IT, USB Corporation, Cleveland, OH, USA, purchased from GE Healthcare, Orsay, France), sequenced, and investigated as previously reported [62].
Nucleotide numbering reflects cDNA numbering with +1 corresponding to the A of the ATG translation initiation codon in the reference sequence, according to journal guidelines (www.hgvs.org/mutnomen).
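The numbering convention can be illustrated with a minimal sketch. The function name and the `atg_index` input are hypothetical and are not part of the pipeline used in this study; intronic and 3' untranslated positions are deliberately not handled.

```python
def cdna_position(transcript_index: int, atg_index: int) -> str:
    """Return a cDNA-style position for a 0-based index in a transcript.

    atg_index is the 0-based index of the A of the ATG start codon
    (a hypothetical input that depends on the reference transcript).
    """
    offset = transcript_index - atg_index
    # The A of the ATG is numbered +1 and there is no position 0,
    # so bases upstream of the ATG are numbered -1, -2, ...
    if offset >= 0:
        return f"c.{offset + 1}"
    return f"c.{offset}"
```

With the start codon at index 10, for example, index 10 maps to c.1 and index 9 maps to c.-1.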
All variants were classified following the American College of Medical Genetics and Genomics (ACMG) standards and guidelines and the Association for Molecular Pathology (AMP) Clinical Practice Guidelines and Reports based on previous publications (as compiled in the Human Gene Mutation Database (HGMD) [63] and in the Leiden Open (source) Variation Database (LOVD) [64]), population data, computational data and functional data. The Exome Aggregation Consortium (ExAC) database was used to check variant frequency.
Evolutionary conservation was investigated using the 46-way Vertebrate Multiz Alignment and Conservation of the University of California Santa Cruz (UCSC) genome browser [71,72].
For missense mutations, an amino acid residue was considered highly conserved if the same residue was present in all species or differed in just one species among fishes or reptiles, moderately conserved if it differed in 2 to 5 species (inclusive), and not conserved if it differed in more than 5 species or in at least one primate.
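These conservation thresholds can be written as a short decision rule. The sketch below is illustrative only: the function name, the clade labels, and the alignment data structure are assumptions, and the handling of a single differing species outside fishes, reptiles, and primates (a case the criteria above do not address) is our own choice.

```python
from typing import Dict, Tuple

# Hypothetical clade labels; the text names only fishes, reptiles, and primates.
TOLERATED_SINGLE_CHANGE_CLADES = {"fish", "reptile"}


def conservation_level(reference_residue: str,
                       alignment: Dict[str, Tuple[str, str]]) -> str:
    """Classify conservation from a mapping species -> (clade, residue)."""
    differing_clades = [
        clade for clade, residue in alignment.values()
        if residue != reference_residue
    ]
    # Any primate difference, or more than 5 differing species: not conserved.
    if "primate" in differing_clades or len(differing_clades) > 5:
        return "not conserved"
    if not differing_clades:
        return "highly conserved"
    # A single difference is tolerated only in a fish or a reptile.
    if (len(differing_clades) == 1
            and differing_clades[0] in TOLERATED_SINGLE_CHANGE_CLADES):
        return "highly conserved"
    # 2 to 5 differing species. Assumption: a single difference in another
    # clade (for example a bird) is also placed here.
    return "moderately conserved"
```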
A novel sequence variant was considered pathogenic if it represented a nonsense variant or a small insertion, deletion, or duplication inducing a frameshift. In the case of a missense change, a novel sequence variant was considered to be likely pathogenic if it was either predicted to be deleterious by all three prediction algorithms or if it affected a highly or moderately evolutionarily conserved amino acid residue and was predicted to be pathogenic by at least one of the algorithms mentioned above [73]. In all other cases, a variant was classified as a VUS.
The same criteria described above, but considering the DNA sequence, were applied to study the conservation of nucleotide residues and the pathogenicity of variants at putative or non-canonical splice sites (±10 bases from exon boundaries).
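The classification rules for novel variants can be summarized as follows. This is a minimal sketch under stated assumptions: the function name, the consequence labels, and the argument names are hypothetical, and the sketch encodes only the rules stated in this section, not the full ACMG/AMP assessment.

```python
def classify_novel_variant(consequence: str,
                           deleterious_calls: int = 0,
                           conservation: str = "not conserved",
                           n_algorithms: int = 3) -> str:
    """Apply the stated rules to one novel variant.

    consequence: "nonsense", "frameshift", "missense", "splice_region", ...
    deleterious_calls: number of prediction algorithms calling it deleterious.
    conservation: "highly conserved", "moderately conserved", "not conserved".
    """
    # Nonsense and frameshift-inducing variants are considered pathogenic.
    if consequence in {"nonsense", "frameshift"}:
        return "pathogenic"
    # Missense changes, and splice-region changes judged on nucleotide
    # conservation, follow the same two-branch rule.
    if consequence in {"missense", "splice_region"}:
        if deleterious_calls == n_algorithms:
            return "likely pathogenic"
        conserved = conservation in {"highly conserved", "moderately conserved"}
        if conserved and deleterious_calls >= 1:
            return "likely pathogenic"
    # Everything else is a variant of uncertain significance.
    return "VUS"
```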
The phenotype was classified following all clinical criteria summarized in Table 3. Color fundus photographs were acquired for each index patient and stratified into 4 groups according to a previously described classification [54].
A confocal scanning laser ophthalmoscope (Heidelberg Retina Angiograph [HRA] II; Heidelberg Engineering, Dossenheim, Germany) was used to acquire SW-AF images (488 nm excitation) and NIR-AF images (787 nm excitation). Using the eye-tracking function of the HRA and averaging at least 30 single frames, 2 high-quality images (30° and 50° fields of view, respectively) were taken for each modality and each eye. Subsequently, we divided patients into 5 groups according to a previously described classification [55]: group 1: central lesion with jagged borders; group 2: central lesion with extensive fundus changes; group 3: central lesion with smooth borders and a hyperautofluorescent ring-like halo in SW-AF and NIR-AF; group 4: central lesion with smooth borders and no hyperautofluorescent NIR-AF ring; group 5: small discrete central lesion better visualized in NIR-AF. When RPE atrophy was present, its area was measured using RegionFinder software (Heidelberg Engineering, Dossenheim, Germany; software version 2.5.8.0) on SW-AF images [76]. The 50° images were used to establish the extent of fundus abnormalities and to evaluate the peripapillary area [56].
A single horizontal high-resolution OCT B-scan (9 mm) was obtained together with a simultaneously acquired infra-red (IR) fundus image in a transverse plane through the fovea. This image was used to evaluate the preservation of the ellipsoid zone (EZ) through the fovea [57] and possible foveal sparing [38]. A cube of OCT scans was then obtained in high-speed mode with the automated real-time mode activated and set to 10 (creating a mean image of ten repeated identical B-scans to improve the signal-to-noise ratio). The cube scan protocol comprised 49 B-scan lines covering an area of 20° × 20° centered on the fovea. As previously described, a trained operator manually corrected any segmentation errors made by the software [77,78]. When the quality of segmentation was adequate, the central retinal thickness (CRT) and the total macular volume (TMV) were recorded and compared against the data of 30 normal subjects (15 subjects younger than 30 years and 15 subjects older; Supplementary Table S5).
All patients underwent electrophysiological assessment, including ff-ERG and mf-ERG, incorporating the minimum standards of the International Society for Clinical Electrophysiology of Vision (ISCEV) with the following stimulations: dark-adapted dim flash 0.01 cd·s/m²; dark-adapted bright flash 10.0 cd·s/m²; light-adapted 3.0 cd·s/m² 30 Hz flicker ERG; and light-adapted 3.0 cd·s/m² at 2 Hz. The patient data were compared against those of 30 healthy subjects (15 younger than 30 years old and 15 older; Supplementary Table S1). The limits of ERG normality were defined for all the components of the ERG as the mean value ± 2 standard deviations. All the components of the ERG from each eye were taken into account when classifying patients into the three ERG groups defined by Lois et al. [53]: Group 1 had abnormal mf-ERG with normal ff-ERG; Group 2 had mf-ERG abnormalities with cone dysfunction (assessed with light-adapted 30 Hz flicker and light-adapted 3.0); Group 3 had additional rod dysfunction (assessed using dark-adapted 0.01 and dark-adapted 10.0). When the ERG group differed between the two eyes of a patient, the overall classification was based on the more severe eye.
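The normality limits and the grouping rule can be sketched as follows. This is an illustration rather than the analysis code used in the study: the function names are hypothetical, the use of the sample standard deviation is an assumption, and treating any rod dysfunction as Group 3 is a simplification for a case the grouping above does not describe (rod dysfunction without cone dysfunction).

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def normal_limits(control_values: Sequence[float]) -> Tuple[float, float]:
    """Limits of normality as mean +/- 2 SD of the control values."""
    centre = mean(control_values)
    spread = stdev(control_values)  # assumption: sample SD (n - 1)
    return centre - 2 * spread, centre + 2 * spread


def is_abnormal(value: float, limits: Tuple[float, float]) -> bool:
    """True when a patient value falls outside the normal limits."""
    low, high = limits
    return value < low or value > high


def eye_erg_group(cone_abnormal: bool, rod_abnormal: bool) -> int:
    """ERG group for one eye, assuming the mf-ERG is already abnormal."""
    if rod_abnormal:
        return 3  # simplification: any rod dysfunction counts as Group 3
    if cone_abnormal:
        return 2
    return 1


def patient_erg_group(right_eye_group: int, left_eye_group: int) -> int:
    """The overall group follows the more severe eye."""
    return max(right_eye_group, left_eye_group)
```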

Statistical Analysis
All data analysis was conducted using IBM SPSS Statistics software v. 21.0 (Chicago, IL, USA). For BCVA, MV, and CRT we used the Wilcoxon signed-rank test to assess the agreement between eyes.
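For readers unfamiliar with the signed-rank test, the rank sums on which it is based can be computed as in the sketch below. It is not the SPSS procedure used here: the function name is hypothetical, zero differences are dropped and tied absolute differences receive average ranks (one common convention), and no p value is computed.

```python
from typing import Sequence, Tuple


def signed_rank_sums(right: Sequence[float],
                     left: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, W-) for paired right-eye and left-eye measurements."""
    # Drop pairs with no difference between eyes.
    diffs = [r - l for r, l in zip(right, left) if r != l]
    ordered = sorted(diffs, key=abs)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Find the run of tied absolute differences starting at i.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Average of the 1-based ranks i+1 .. j.
        ranks[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    w_minus = sum(ranks[abs(d)] for d in diffs if d < 0)
    return w_plus, w_minus
```

Similar values of W+ and W- indicate no systematic difference between the two eyes.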
Differences between the two genotype groups were analyzed using the Wilcoxon-Mann-Whitney test for age of onset, age at visit, duration of the disease, BCVA, MV, and CRT. The Goodman-Kruskal Gamma test was used to analyze the difference between NM and MM groups for fundus stage, NIR-FAF, and IR-FAF groups, the presence of RPE atrophy, foveal sparing, peripapillary sparing, EZ band integrity, and ERG group distribution. For these descriptive variables, when there was discordance between the right and left eye, we considered the eye with the worse phenotype. A p value below 0.05 was considered statistically significant.
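The Goodman-Kruskal gamma statistic underlying the second test is the difference between concordant and discordant pairs divided by their sum, with tied pairs ignored. The sketch below computes the statistic only (not its p value); the function name and the numeric coding of the groups are assumptions and do not reproduce the SPSS procedure.

```python
from itertools import combinations
from typing import Sequence


def goodman_kruskal_gamma(x: Sequence[int], y: Sequence[int]) -> float:
    """Gamma = (C - D) / (C + D) over all pairs of subjects.

    x: ordinal genotype group coded as integers (hypothetical coding).
    y: ordinal phenotype grade coded as integers (for example ERG group).
    """
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Pairs tied on x or on y contribute to neither count.
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)
```

A value near +1 would indicate that the group coded with the higher integer tends to have the higher phenotype grade, a value near -1 the opposite, and a value near 0 no monotonic association.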

Conclusions
Despite certain limitations in terms of detecting large CNVs (<1% of all disease-associated alleles) [79] or mutations in non-coding regions, and the lack of functional analysis, our study broadens the spectrum of ABCA4 mutations with 60 likely pathogenic or pathogenic variants, all associated with STGD1.

Acknowledgments: The authors thank Rand Wilcox Vanden Berg for writing assistance and proofreading. DNA samples included in this study originate from the NeuroSensCol DNA bank, dedicated to research in neurosensory disorders (PI: JA Sahel, coPI I Audo, partner with CHNO des Quinze-Vingts, Inserm and CNRS).

Conflicts of Interest:
The authors declare no conflict of interest.