Bi-Allelic Novel Variants in CLIC5 Identified in a Cameroonian Multiplex Family with Non-Syndromic Hearing Impairment

DNA samples from five members of a multiplex non-consanguineous Cameroonian family, segregating prelingual and progressive autosomal recessive non-syndromic sensorineural hearing impairment, underwent whole exome sequencing. We identified novel bi-allelic compound heterozygous pathogenic variants in CLIC5. The variants identified, i.e., the missense [NM_016929.5:c.224T>C; p.(L75P)] and the splicing (NM_016929.5:c.63+1G>A), were validated using Sanger sequencing in all seven available family members and co-segregated with hearing impairment (HI) in the three hearing impaired family members. The three affected individuals were compound heterozygous for both variants, and all unaffected individuals were heterozygous for one of the two variants. Both variants were absent from the genome aggregation database (gnomAD), the Single Nucleotide Polymorphism Database (dbSNP), and the UK10K and Greater Middle East (GME) databases, as well as from 122 apparently healthy controls from Cameroon. We also did not identify these pathogenic variants in 118 unrelated sporadic cases of non-syndromic hearing impairment (NSHI) from Cameroon. In silico analysis showed that the missense variant CLIC5-p.(L75P) substitutes a highly conserved amino acid residue (leucine), and is expected to alter the stability, the structure, and the function of the CLIC5 protein, while the splicing variant CLIC5-(c.63+1G>A) is predicted to disrupt a consensus donor splice site and alter the splicing of the pre-mRNA. This study is the second report, worldwide, to describe CLIC5 involvement in human hearing impairment, and thus confirms CLIC5 as a novel non-syndromic hearing impairment gene that should be included in targeted diagnostic gene panels.


Genomic DNA samples were extracted from peripheral blood, using the chemagic extraction protocol, in the division of Human Genetics, University of Cape Town, South Africa. Additionally, a group of 118 unrelated Cameroonian individuals living with sporadic NSHI of putative genetic origin (Table S1) were recruited, to investigate the frequencies of pathogenic variants that could be found. All hearing impaired family members were previously investigated for variants in GJB2 (through direct sequencing of the entire coding region of GJB2), and GJB6-D13S1830 deletion (using a multiplex polymerase chain reaction), and were negative [6].
Genes 2020, 11, 1249
A total of 122 ethno-linguistically matched Cameroonian controls without personal or familial history of HI were randomly recruited among blood donors at The Central Hospital of Yaoundé, Cameroon.

Whole Exome Sequencing and Data Analysis
DNA samples from five family members were exome sequenced at Omega Bioservices (Norcross, GA, USA); these samples were obtained from two affected individuals (Figure 1A; II.1 and II.3), their parents (I.1 and I.2), and one unaffected sibling (II.4). Library preparation was performed with an Illumina Nextera Rapid Capture Exome Kit® (Illumina, San Diego, CA, USA) following the manufacturer's instructions, and the resulting libraries were hybridized with a 37 Mb probe pool to enrich exome sequences. Sequencing was performed on an Illumina HiSeq 2500 sequencer using the paired-end 150 bp run format. Sequencing data were processed using the Illumina DRAGEN Germline Pipeline v3.2.8. Briefly, high-quality reads were aligned to the human reference genome GRCh37/hg19 using the DRAGEN software version 05.021.408.3.4.12, and, after sorting and duplicate marking, variants were called, and individual genomic variant call format (gvcf) files were generated. Joint single nucleotide variant (SNV) and Insertion/Deletion (Indel) variant calling was performed using the genome analysis toolkit (GATK) software v4.0.6.0 [17]. The sex of each individual was verified using PLINK v1.9 [18]. Familial relationships for all members were verified via Identity-by-Descent sharing (PLINK v1.9) and the Kinship-based INference for Gwas (KING) algorithm [18,19].

Annotation and Filtering Strategy
Variants were annotated and filtered using ANNOVAR [20] and custom scripts. Variants were first prioritized based on the inheritance model, considering both autosomal recessive (AR) and autosomal dominant (AD) modes of inheritance. Subsequently, rare variants with a minor allele frequency (MAF) <0.005 (for AR) and <0.0005 (for AD) in all populations of the genome aggregation database (gnomAD) were retained. Known pathogenic HI variants listed in ClinVar were also retained, regardless of their frequencies. dbNSFP v3.0 was used to annotate the identified variants with 17 bioinformatic tools that predict deleterious effects [21]. Coding variants were evaluated using Sorting Intolerant from Tolerant (SIFT), polymorphism phenotyping v2 (PolyPhen-2) × 2, MutationAssessor, the likelihood ratio test (LRT), Mendelian clinically applicable pathogenicity (M-CAP) score, Rare Exome Variant Ensemble Learner (REVEL), MutPred, protein variation effect analyzer (PROVEAN), MetaSVM, and MetaLR, while MutationTaster, Eigen, Eigen-PC, functional analysis through Hidden Markov models (FATHMM-MKL), combined annotation dependent depletion (CADD) score, and deleterious annotation of genetic variants using neural networks (DANN) score were used to annotate both coding and non-coding variants [21].
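The frequency filter described above can be illustrated with a minimal sketch. It is not the custom scripts used in the study: the `Variant` record, its field names, and the example entries are hypothetical, and only the two MAF thresholds and the ClinVar exception come from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text: MAF < 0.005 for AR, MAF < 0.0005 for AD.
MAF_THRESHOLDS = {"AR": 0.005, "AD": 0.0005}


@dataclass
class Variant:
    # Hypothetical record; the field names are illustrative only.
    name: str
    gnomad_max_maf: float     # highest MAF across gnomAD populations (0.0 if absent)
    clinvar_pathogenic: bool  # known pathogenic HI variant listed in ClinVar


def passes_frequency_filter(variant: Variant, model: str) -> bool:
    """Keep rare variants, plus known ClinVar pathogenic variants at any frequency."""
    if variant.clinvar_pathogenic:
        return True
    return variant.gnomad_max_maf < MAF_THRESHOLDS[model]


# Made-up example entries, used only to exercise the filter.
variants = [
    Variant("absent_from_gnomad", 0.0, False),
    Variant("rare_for_AR_only", 0.002, False),
    Variant("common_but_in_clinvar", 0.02, True),
    Variant("common", 0.02, False),
]
kept_ar = [v.name for v in variants if passes_frequency_filter(v, "AR")]
kept_ad = [v.name for v in variants if passes_frequency_filter(v, "AD")]
```

Using the highest population-specific MAF is one way to express "in all populations": a variant is retained only if it is rare in every gnomAD population.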
Adaptive boosting (ADA) and random forest (RF) scores derived from dbscSNV v1.1 were used to predict the deleterious effect of variants within splicing consensus regions (−3 to +8 at the 5′ splice site and −12 to +2 at the 3′ splice site) [21,22]. We used phyloP, Genomic Evolutionary Rate Profiling (GERP), SiPhy, and phastCons scores to estimate the evolutionary conservation of the nucleotides and amino acid (aa) residues at which the variants occurred [21,23,24]. The hereditary hearing loss homepage (HHL), online Mendelian inheritance in man (OMIM), human phenotype ontology (HPO), and ClinVar databases were used to determine if there were any existing associations between the identified variants and genes and HI. Candidate variants were considered when: (1) they occurred in known HI genes (and genes expressed in the inner ear); (2) they had a predicted effect on protein function or pre-mRNA splicing (nonsense, missense, start-loss, frameshift, splicing, etc.); and (3) they co-segregated with the HI phenotype within the family.
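The splicing consensus windows quoted above can be expressed as a small helper. This is a sketch under a stated assumption about sign convention: at the 5′ (donor) site negative offsets are exonic and positive offsets intronic, while at the 3′ (acceptor) site negative offsets are intronic and positive offsets exonic, with no position 0. The function name is hypothetical.

```python
# Windows quoted in the text: -3 to +8 at the 5' site, -12 to +2 at the 3' site.
SPLICE_WINDOWS = {
    "donor": (-3, 8),      # 5' splice site
    "acceptor": (-12, 2),  # 3' splice site
}


def in_splice_consensus(site: str, offset: int) -> bool:
    """Return True if an offset relative to the exon-intron boundary lies in the window.

    Assumption: offsets skip 0, as in HGVS-style intronic numbering (..., -1, +1, ...).
    """
    if offset == 0:
        return False
    low, high = SPLICE_WINDOWS[site]
    return low <= offset <= high


# The splicing variant reported in this study sits at donor position +1.
donor_plus_one = in_splice_consensus("donor", 1)
```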

Sanger Sequencing
Sanger sequencing was performed for all the available family members (I.1, I.2, II.1, II.2, II.3, II.4, and II.5; Figure 1A), 118 unrelated sporadic NSHI cases from Cameroon (Table S1), and 122 apparently healthy controls that were previously recruited as blood donors at The Central Hospital of Yaoundé. Primers to target our variants of interest in exon 3 (forward 5′-GAAGGAACATACTGGGGCGA-3′; reverse 5′-AGCGCATTTTTGTTAGGCAGA-3′) and at the exon 1-intron 1 boundary (forward 5′-CTCTGAGCGAAAGAGAGAAAGAG-3′; reverse 5′-ACTTGTTGCTCCCACGACC-3′) of the CLIC5 gene were validated using NCBI BLAST. The optimal annealing and extension temperatures for the PCR were 60 °C and 70 °C for 30 s and 1 min, respectively. PCR-amplified DNA products were Sanger sequenced using a BigDye™ Terminator v3.1 Cycle Sequencing Kit and an ABI 3130XL Genetic Analyzer® (Applied Biosystems, Foster City, CA, USA) in the Division of Human Genetics, University of Cape Town, South Africa. Sequencing chromatograms were manually checked using FinchTV v1.4.0, and aligned in UGENE v34.0 to the CLIC5 reference sequence (ENSG00000112782; retrieved from Ensembl browser).
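As a rough sanity check on the primers listed above, their GC fraction and an approximate melting temperature can be computed. The text does not state how primer temperatures were chosen; the Wallace rule used here (2 °C per A/T, 4 °C per G/C) is a simple textbook approximation substituted for illustration, and the function names and dictionary keys are hypothetical.

```python
# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "exon3_forward": "GAAGGAACATACTGGGGCGA",
    "exon3_reverse": "AGCGCATTTTTGTTAGGCAGA",
    "exon1_intron1_forward": "CTCTGAGCGAAAGAGAGAAAGAG",
    "exon1_intron1_reverse": "ACTTGTTGCTCCCACGACC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate melting temperature in deg C using the Wallace rule.

    This is a coarse estimate for short oligonucleotides, not the method
    used in the study.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Summary per primer: (length, GC fraction, approximate Tm).
summary = {
    name: (len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
    for name, seq in PRIMERS.items()
}
```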

Evolutionary Conservation of Amino Acids and Secondary Structure Analysis
We performed a multiple sequence alignment (MSA) of human CLIC5 with similar non-human proteins to provide more evidence on the evolutionary conservation of the amino acid residue at which our candidate missense variant occurred. A PSI-BLAST search of CLIC5 against the non-redundant protein database was performed. Non-redundant, non-synthetic CLIC5 proteins from all the different species in the 500 BLAST hits were manually retrieved as FASTA files. The MSA was performed using CLUSTAL Omega v1.2.4 [25] and the MSA file was visualized using Jalview v2.10.5 [26]. Furthermore, PSIPRED v4.0 [27] and Swiss-Model [28] were used to assess the secondary structural features of both protein forms. Additionally, the InterPro [29] database was queried via the InterProScan web service [30] to identify domains and potential domain changes for both protein forms separately.
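Conservation at a single alignment column can be summarized as the fraction of non-human sequences that share the human residue. The sketch below assumes the MSA has already been read into a dictionary of equal-length aligned strings; the toy alignment is invented purely to exercise the function and does not represent CLIC5 sequences.

```python
def column_identity(alignment: dict, reference: str, column: int) -> float:
    """Fraction of non-reference sequences matching the reference residue.

    `column` is a 0-based index into the aligned (gapped) sequences.
    """
    ref_residue = alignment[reference][column]
    others = [seq for name, seq in alignment.items() if name != reference]
    matches = sum(1 for seq in others if seq[column] == ref_residue)
    return matches / len(others)


# Invented toy alignment (NOT real CLIC5 data): column 1 is fully conserved.
toy_alignment = {
    "human": "ALKG",
    "species_a": "ALRG",
    "species_b": "SLKG",
}
conserved = column_identity(toy_alignment, "human", 1)
variable = column_identity(toy_alignment, "human", 0)
```

A value of 1.0 at the column holding the variant residue would correspond to the "highly conserved" description used for a position shared by all retrieved species.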

Protein Modelling
Three-dimensional modelling was performed on the longest isoform of the CLIC5 gene as follows: a homology model of the longest isoform (410 amino acids) of wild-type and mutant CLIC5 [NM_001114086.1: c.701T>C:p.(L234P)] was constructed using the program MODELLER based on the available crystal structure of human chloride intracellular channel protein 5 (PDB ID: 6Y2H) as a template [31]. PYMOL viewer was used for structural visualization and image processing.
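The correspondence between the two notations for the missense variant, c.224T>C; p.(L75P) in NM_016929.5 and c.701T>C; p.(L234P) in NM_001114086.1, can be checked with simple arithmetic. The sketch assumes standard coding-DNA numbering (position 1 is the A of the start codon) and that the 410-aa isoform differs from the 251-aa isoform only by an N-terminal extension; the latter is an assumption made for illustration, although it is consistent with the numbers reported.

```python
def codon_number(cdna_pos: int) -> int:
    """1-based codon index containing a 1-based coding-DNA position."""
    return (cdna_pos - 1) // 3 + 1


def position_in_codon(cdna_pos: int) -> int:
    """Position (1, 2, or 3) of a coding-DNA base within its codon."""
    return (cdna_pos - 1) % 3 + 1


short_codon = codon_number(224)   # residue number in the 251-aa isoform
long_codon = codon_number(701)    # residue number in the 410-aa isoform
residue_offset = long_codon - short_codon
length_difference = 410 - 251     # isoform lengths quoted in the article
```

Under these assumptions the residue offset equals the difference in isoform length, and the substituted base falls at the second position of the codon in both transcripts.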

Participants Phenotypes
A total of seven individuals from "Family 24" were recruited, including three affected individuals (II.1: 36 years old, II.2: 32 years old, and II.3: 25 years old), their parents (I.1: 61 years old, and I.2: 55 years old), and two unaffected siblings (II.4: 18 years old, and II.5: 16 years old) (Figure 1A). The most likely mode of inheritance for the NSHI is AR. From the medical history, no environmental factors were identified as a possible cause of HI, and no HI participant had a history of ophthalmological (blurred or distorted vision, photophobia, eye pain, etc.) or neurological (vertigo, dizziness, etc.) symptoms. Additionally, no vestibular, neurologic, or any other systemic abnormalities were detected by physical examination. A history of prelingual and progressive HI was described for all three affected pedigree members; however, before this study, no formal audiological assessment had been performed for any of the family members. Audiological assessment of the three affected individuals revealed bilateral profound sensorineural HI (Figure 1B).

WES Identification of Candidate Gene and Variants
The average target region coverage was about 225×, with 96.30% of the target region covered to a depth of 10× or more. After applying the filtering criteria described in the methods section, two candidate variants were found to occur in a known HI gene (CLIC5; MIM:607293) and to co-segregate with the HI phenotype. These two variants, which occurred in a compound heterozygous state, are the missense variant NM_016929.5:c.224T>C and the splice-site variant NM_016929.5:c.63+1G>A. The NM_016929.5:c.224T>C variant leads to the substitution of a leucine by a proline residue at position 75 [NM_016929.5:p.(L75P)] and was predicted to be damaging by 16 of the 17 bioinformatics tools used (Table S2). The NM_016929.5:c.63+1G>A variant, which occurs in a canonical donor splice site, was predicted to be damaging by most of the tools that can be used to evaluate non-coding variants, including MutationTaster, FATHMM-MKL, Eigen-PC, CADD, and DANN (Table S2). Both variants were predicted to occur at conserved positions of the genome, and both were absent from the gnomAD, UK10K, and Greater Middle East (GME) variome project databases, as well as from the Single Nucleotide Polymorphism Database (dbSNP) (Table S2). Based on the Human Splicing Finder server (HSF v3.1) and NNSPLICE 0.9, the variant NM_016929.5:c.63+1G>A is predicted to break the consensus 5′ donor site "AAGGTAGGT" (which the variant alters to "AAGATAGGT") and probably alter the splicing of the pre-mRNA. The NM_016929.5:c.63+1G>A variant might therefore alter normal protein synthesis and function through various mechanisms. Based on the American College of Medical Genetics and Genomics (ACMG) guidelines for the interpretation of sequence variants, both variants were classified as pathogenic (NM_016929.5:c.63+1G>A: PVS1, PP1-S, PM2, and PP3 and NM_016929.5:c.224T>C: PM2, PP3, PM3, PP1, and PP1-S) [32,33].
In addition to CLIC5, only the CEP250 gene showed compound heterozygous variants that co-segregated with hearing impairment (Table S3); these variants were synonymous and therefore unlikely to be the cause of the disease.
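The effect of c.63+1G>A on the donor motif quoted above can be reproduced with a few lines of string handling. This sketch only checks the canonical GT dinucleotide; it does not reimplement HSF or NNSPLICE scoring. It assumes that the first three bases of the quoted nine-base motif are exonic (positions −3 to −1), which is consistent with the mutant motif given in the text, and the function names are hypothetical.

```python
WILD_TYPE_DONOR = "AAGGTAGGT"  # consensus 5' donor site quoted in the text
EXON_BASES = 3                 # assumption: first three bases are exonic (-3..-1)


def apply_intronic_substitution(motif: str, plus_pos: int, ref: str, alt: str,
                                exon_bases: int = EXON_BASES) -> str:
    """Substitute the base at intronic position +plus_pos of a donor motif."""
    index = exon_bases + plus_pos - 1
    if motif[index] != ref:
        raise ValueError(f"expected {ref} at +{plus_pos}, found {motif[index]}")
    return motif[:index] + alt + motif[index + 1:]


def has_canonical_gt(motif: str, exon_bases: int = EXON_BASES) -> bool:
    """True if intronic positions +1 and +2 carry the canonical GT dinucleotide."""
    return motif[exon_bases:exon_bases + 2] == "GT"


mutant_donor = apply_intronic_substitution(WILD_TYPE_DONOR, 1, "G", "A")
wild_type_is_canonical = has_canonical_gt(WILD_TYPE_DONOR)
mutant_is_canonical = has_canonical_gt(mutant_donor)
```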

Sanger Sequencing Confirmation of Variants
Sanger sequencing confirmed these candidate variants and their co-segregation with the HI phenotype (Figure 1A,C). The three affected individuals (II.1, II.2, and II.3) were compound heterozygous for both variants, the father (I.1) and an unaffected daughter (II.4) were heterozygous for the missense variant, and the mother (I.2) and the other unaffected daughter (II.5) were both heterozygous for the splice-site variant (Figure 1A). Neither of these variants was detected in the 122 controls or 118 sporadic NSHI cases (Table S1).
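The segregation pattern reported above can be checked programmatically. The genotypes below are transcribed from the text, with each individual represented by the set of variant alleles carried in the heterozygous state; the function name and data layout are hypothetical, and the check is a simplified illustration of autosomal recessive compound heterozygous co-segregation.

```python
# Variant alleles carried (heterozygous) by each family member, as reported.
GENOTYPES = {
    "I.1": {"missense"},             # father
    "I.2": {"splice"},               # mother
    "II.1": {"missense", "splice"},
    "II.2": {"missense", "splice"},
    "II.3": {"missense", "splice"},
    "II.4": {"missense"},
    "II.5": {"splice"},
}
AFFECTED = {"II.1", "II.2", "II.3"}


def cosegregates_as_compound_het(genotypes, affected, parents=("I.1", "I.2")):
    """Check that both variants together track with affection status.

    Requires that every affected member carries both variants, that no
    unaffected member does, and that each parent carries a different
    single variant (so the two variants are inherited in trans).
    """
    both = {"missense", "splice"}
    affected_ok = all(genotypes[p] == both for p in affected)
    unaffected_ok = all(genotypes[p] != both for p in genotypes if p not in affected)
    father, mother = parents
    in_trans = (
        len(genotypes[father]) == 1
        and len(genotypes[mother]) == 1
        and genotypes[father] != genotypes[mother]
    )
    return affected_ok and unaffected_ok and in_trans


family_segregates = cosegregates_as_compound_het(GENOTYPES, AFFECTED)
```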

Evolutionary Conservation of Amino Acids
The NCBI PSI-BLAST search of CLIC5 (NP_058625.2) against the non-redundant protein database found the variant position p.(L75P) to be highly conserved across all non-human species retrieved in the top 500 BLAST hits (Figure 2). As expected, there was substantial conservation across an extensive aa block (on which the variant resides) which forms the thioredoxin/glutathione S-transferase (GST) N-terminal binding domain. This was consistent with the GERP and PhyloP scores for conservation, indicating a strong evolutionary and functional constraint on the region.

Protein Modelling: Secondary Structure Analysis and Domain Search
A significant attenuation of the protein's secondary structural features was predicted for the NM_016929.5(CLIC5):p.(L75P) variant using the PSIPRED v4.0 server: the β4 strand was abolished (Figures 3 and S1, red box), and the lengths of several β strands and helices were changed (Figure S1, black boxes). Using Swiss-Model, a similar distortion in the secondary structure of the mutant protein was observed, namely a shortening of the β4 strand, although no β-strand loss was apparent. A domain search with InterProScan (InterPro v80.0) predicted the loss of the N-terminal GST domain due to the variant (Figure S2). This domain loss was also predicted to lead to the abrogation of CLIC5's protein binding function (GO:0005515). Model parameters were refined and showed improvement in model qualities (Table S4).
Finally, we performed 3D modelling of the wild-type and mutant long isoform of CLIC5 (Figure 3). The NM_016929.5:c.224T>C missense variant is located in a β-sheet in the extracellular domain of the long isoform of CLIC5 [NM_001114086.1:c.701T>C:p.(L234P)] (Figure 3c). We found a local perturbation in the hydrophobic interactions of residues near position 234 of the CLIC5 protein (Figure 3d,f). Pro234 shortens the nearby β-sheet conformation in the mutant protein, as shown in Figure 3f. A difference in surface charge distribution between the wild-type and mutant proteins was also observed (Figure 3e,g).

Discussion
This study is, to our knowledge, the first report highlighting the association of HI with CLIC5 variants in individuals of African ancestry, and the second to demonstrate this association globally. Thus, the data confirm CLIC5 as a novel HI gene. Both pathogenic variants reported are novel, i.e., the missense variant (NM_016929.5:c.224T>C) and the splicing variant (NM_016929.5:c.63+1G>A), and were not found in 118 unrelated sporadic cases of NSHI, reinforcing the genetic and locus heterogeneity of HI and the importance of investigating diverse populations, particularly the understudied African populations, to enhance and refine HI disease-gene curation. The contribution of CLIC5 to NSHI in humans was first described with the identification of a homozygous nonsense variant [NM_016929.5:c.96T>A; p.(Cys32Ter)] that abrogated the protein function and co-segregated with ARNSHI in a Turkish family [14]. The two affected individuals from that Turkish family presented an early-onset sensorineural HI, which started mildly and progressed to severe-to-profound HI. This HI phenotype is similar to that described in the present study, as our three affected participants described a history of prelingual HI, and presented profound sensorineural HI at the time of the study [14]. The corresponding mutant mouse model (jbg mice) has a deletion in the mouse ortholog of CLIC5, resulting in impaired hearing and vestibular dysfunction [15]. CLIC5 was also studied in 69 unrelated Spanish and 50 predominantly Dutch patients with ARNSHI, and no pathogenic or likely pathogenic (PLP) variants were identified [14]. In the present study, we did not find any clinical evidence of vestibular or renal dysfunction, unlike what was previously reported in the Turkish family [14], as well as in the corresponding mutant mouse model (jbg mice), which was also shown to have abnormalities in the foot processes of the kidney podocytes leading to proteinuria [34,35].
Biological exploration of the kidney functions of affected Cameroonian individuals with PLP variants in CLIC5 should be performed. In addition to the inner ear and kidney abnormalities, the jbg-mutant mice also exhibited emphysema-like lung pathology, hyperactivity, and gastric haemorrhage [14,36]. Additional studies on more families and populations worldwide are needed to refine the phenotype of CLIC5-induced HI in humans.
CLIC5 (mapped on 6p21.1 locus) encodes a protein that belongs to the chloride intracellular ion channel (CLIC) family [37]. The encoded protein (CLIC5) was shown to be highly expressed in the inner ear, and important for sensorineural hearing [15]. CLIC5 protein associates with actin-based cytoskeletal structures and may play a role in multiple processes, including hair cell stereocilia formation [15]. The main function of CLIC5A in the ear is the stabilization of membrane-actin filament linkages at the base of hair cell stereocilia [15]. Therefore, a variant that abrogates CLIC5A or destabilizes its activity would lead to the destabilization of actin-based complexes, fusion, and the elongation of hair cell stereocilia, and consequently, impaired hearing [14,38]. The missense NM_016929.5(CLIC5):p.(L75P) variant reported in this study is predicted to lead to the loss of the N-terminal GST domain. This is in turn expected to abrogate CLIC5's protein binding function (GO:0005515), and is therefore likely to affect binding to ERM proteins. Interaction of CLIC5 with the actin-based cytoskeleton is dependent upon its protein-protein interaction with ERM proteins [38].
There are three isoforms of CLIC5 [39]: the canonical isoform CLIC5B (410 aa), CLIC5A (251 aa), and CLIC5C (205 aa). All three isoforms show evidence of expression in the human inner ear, with CLIC5A (251 aa) showing the highest expression [40]. The splice site variant we identified in this study is predicted to affect two of these three isoforms, including CLIC5A [NM_016929.5:c.63+1G>A (251 aa; CLIC5A); NM_001256023.1:c.63+1G>A (205 aa; CLIC5C)]. This splice site variant is located at the canonical 5′ donor splice site of exon 1 of these two isoform transcripts (position +1) and is predicted to lead to a loss of the consensus 5′ donor site. The missense variant reported in this study [NM_016929.5: p.(L75P)] is predicted to affect all three isoforms of CLIC5 as a missense change.
Although the variants identified in the present study are predicted to be pathogenic (Table S2) and to affect the structure and function of the protein (Figure 2, Figures S1 and S2), more studies in other populations will likely inform and strengthen HI gene-disease pair curation globally, as illustrated by this case report.

Conclusions
We identified bi-allelic novel compound heterozygous pathogenic variants in CLIC5 (MIM:607293), the missense variant [NM_016929.5:c.224T>C; p.(L75P)] and the splicing variant (NM_016929.5:c.63+1G>A), that co-segregated with non-syndromic autosomal recessive hearing impairment in three affected members of a non-consanguineous family from Cameroon. This study is the second report, worldwide, to describe the CLIC5-HI gene-disease pair in humans, and thus confirms CLIC5 as a novel NSHI gene that should be included in targeted diagnostic gene panels. Our study emphasizes the urgent need to use WES to investigate hearing impairment in understudied African populations, in order to improve our understanding of hearing pathobiology.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2073-4425/11/11/1249/s1, Table S1: Demographic and clinical characteristics of isolated NSHI cases screened for the identified CLIC5 pathogenic variants. Mean age = 10.92 ± 4.84 (3-31) years, Table S2: Description of pathogenic variants identified in CLIC5, Table S3: Synonymous likely benign variants identified in the CEP250 gene, Table S4: Model parameters before and after refinement showing improvement in protein model qualities, Figure S1: Secondary structure prediction of CLIC5 using the 251 amino acids isoform (NM_016929.5). Boxes indicate positions of difference between the wild-type (CLIC5A:p.75L) and mutant (CLIC5A:p.75P). Red boxes show loss of the fourth strand in the wild-type, while black boxes show changes in the lengths of strands and helices, Figure S2: Domains of CLIC5A:p.75L (wild-type) and CLIC5A:p.75P (mutant) predicted by InterPro, based on the 251 amino acids isoform (NM_016929.5). The GST N-terminal domain is lost in the mutant and its protein-binding activity is abolished.

Funding:
This study was possible thanks to funding from the Wellcome Trust, grant number 107755Z/15/Z to GAA and AW (co-applicants); NIH, USA, grant number U01-HG-009716 to AW, the African Academy of Science/Wellcome Trust, grant number H3A/18/001 to AW, and the National Institute of Deafness and other Communication Disorders grants R01 DC01165, DC003594 and DC016593 to S.M.L. The funders were not involved in study design, data collection and analysis, decision to publish, or preparation of the manuscript.