A Study on the Genetics of Primary Ciliary Dyskinesia

Primary ciliary dyskinesia (PCD) is a poorly understood disorder. It is primarily autosomal recessive and is prevalent in tribal communities of the United Arab Emirates due to consanguineous marriages. This retrospective study aimed to assess the pathogenicity of the genetic variants of PCD in indigenous patients with significant clinical respiratory problems. Pathogenicity scores of variants obtained from the chart review were consolidated using the Ensembl Variant Effect Predictor. The multidimensional dataset of scores was clustered into three groups based on their pathogenicity. Sequence alignment and the Jensen–Shannon Divergence (JSD) were generated to evaluate the amino acid conservation at the site of the variation. One-hundred and twelve variants of 28 genes linked to PCD were identified in 66 patients. Twenty-two variants were double heterozygous, two triple heterozygous, and seven homozygous. Of the thirteen novel variants, two, c.11839 + 1G > A in dynein, axonemal, heavy chain 11 (DNAH11) and p.Lys92Trpfs in dynein, axonemal, intermediate chain 1 (DNAI1) were associated with dextrocardia with situs inversus, and one, p.Gly21Val in coiled-coil domain-containing protein 40 (CCDC40), with absent inner dynein arms. Homozygous C1orf127:p.Arg113Ter (rs558323413) was also associated with laterality defects in two related patients. The majority of variants were missense involving conserved residues with a median JSD score of 0.747. Homology models of two deleterious variants in the stalk of DNAH11, p.Gly3102Asp and p.Leu3127Arg, revealed structural importance of the conserved glycine and leucine. These results define potentially damaging PCD variants in the region. Future studies, however, are needed to fully comprehend the genetic underpinnings of PCD.


Introduction
Primary ciliary dyskinesia (PCD), also known as 'motile ciliopathy' or 'motile ciliary dysfunction', is a heterogeneous clinical entity, predominantly due to biallelic genetic variants. The disorder is associated with frequent respiratory infections, laterality defects (e.g., situs inversus in 50% of patients), and infertility (due to impaired functions of the Fallopian tubes and spermatozoa) [1,2]. Affected children typically have chronic or frequent respiratory complaints (e.g., rhinitis, otitis media, sinusitis, wet cough, wheezing, bronchitis, pneumonia, and bronchiectasis) from early infancy [3,4]. Its diagnosis is challenging, as variations in over 50 genes are known to be associated with PCD [5]. In addition, hundreds of proteins are involved in the axonemes, excluding the membrane-bound ones. Furthermore, some of the reported pathogenic variants are associated with normal nasal ciliary ultrastructure (e.g., those involving the dynein axonemal heavy chain 11 (DNAH11), coiled-coil domain-containing protein 65 (CCDC65), etc.) [6,7].
involving PCD in the Arab world is still limited [26], and further population studies are highly justified. This report assesses the pathogenicity of variants in PCD-related genes found in native Emirati children with clinically significant respiratory problems.

Methods
The pediatric pulmonary service at Tawam Hospital, Al Ain, UAE is a tertiary center that serves children with respiratory problems. This study was a retrospective review of the genetic investigations performed at Tawam Hospital. It analyzed the results of the genetic tests-chromosomal microarray, comprehensive pulmonary disease panel (includes the entire coding plus 10 base pairs of the flanking intron regions of 92 single genes, see Supplementary Table S1, https://www.centogene.com/science/centopedia/ comprehensive-pulmonary-disease-panel.html, accessed on 17 September 2021), singlegene sequencing, and diagnostic WES-of 66 pediatric patients with chronic or frequent respiratory infections and positive variants in PCD genes. The PCD-related genes included in the Panel were: CCDC39, CCDC40, DNAAF1, DNAAF2, DNAH11, DNAH5, DNAH9, DNAI1, DNAI2, DNAL1, NME8, RSPH1, RSPH4A, and RSPH9.
The study was approved by 'Tawam Human Research Ethics Committee' (T-HREC); reference numbers: SA/AJ/566 (19 April 2018 and 11 December 2019) and AA/AJ/653 (19 June 2019). Informed consent to participate in this retrospective data collection for the reported variants was exempted. All methods were performed in accordance with the relevant guidelines and regulations. The data were collected from 2013 to 2019; 120 patients had genetic investigations, and 66 had positive results.
The pediatric pulmonary service at Tawam Hospital receives over four thousand outpatient visits per year. Genetic testing is usually performed for children with chronic or recurrent unexplained and clinically significant respiratory problems. Typically, these children undergo work-up that includes sweat chloride test, chest computed tomography (CT) scan, and gastrointestinal studies. When necessary, the pediatric pulmonary team will also request a comprehensive pulmonary disease panel or targeted mutation (when known disease-causing variant is present in the family). If this investigation is negative or a genetic condition is suspected, WES will be then requested by the genetic team, usually with chromosomal microarray (CMA) or duplication/deletion studies. The variants reported here include positive reports found in the comprehensive pulmonary disease panel (n = 48), WES (n = 16), and/or single gene sequencing (n = 8); see Table 1 and Supplementary  Table S2. In addition, chromosomal microarray (n = 15), duplication/deletion (n = 5; results were negative), and other specific tests (e.g., the nasal scrape biopsies performed in two children) are also included. It is worth emphasizing that the 66 children studied had at least the comprehensive pulmonary disease panel, WES, or single gene sequencing. Eight children had both the comprehensive pulmonary disease panel and WES; the outcome of these tests is also summarized in Table 1 and Supplementary Table S2. Due to hospital arrangements, many of these tests were performed by Centogene AG (Germany). The reported WES coverage was 99.40%, and variants were confirmed by Sanger sequencing. Variants of HYDIN were also verified by Sanger sequencing, but primer designs were not available to distinguish between HYDIN and the pseudogene HYDIN2. The term 'double heterozygous' was used here since, in most patients, parental data were not available to phase the variants.

Information from Public Databases
Variant information, available in public databases, was consolidated using Ensembl Variant Effect Predictor (VEP; https://www.ensembl.org/Tools/VEP; Accessed on 3 February 2020) [30]. Extracted data included effect of variation, codon and amino acid changes (where applicable), known variations from dbSNP (Single Nucleotide Polymorphism Database), functional consequence, exome allele frequency from gnomAD, splicing prediction from SpliceAI, as well as score and pathogenicity prediction from the algorithms ranked REVEL (rare exome variant ensemble learner), ranked MetaLR (meta-analytic logistic regression), ranked MetaSVM (meta-analytic support vector machine), Condel (consensus deleterious) and scaled CADD (combined annotation-dependent depletion) available in dbNSFP (One-Stop Database of Functional Predictions and Annotations for Human Non-synonymous and Splice Site) version 4.0a. Clinical assessments included only those with functional evidence supportive. ACMG (American College of Medical Genetics and Genomics) classification was obtained from https://www.varsome.com (Accessed on 3 February 2020).

Multiple Sequence Alignment
Multiple sequence alignment was performed to evaluate conservation of amino acids at the sites reported here and to compute Jensen-Shannon Divergence (JSD) scores. Amino acid sequences, where available, of proteins from Homo sapiens (human), Pan troglodytes (chimpanzee), Mus musculus (house mouse), Rattus norvegicus (Norway rat), Canis lupus familiaris (dog), Equus caballus (horse), Bos taurus (bovine), Xenopus tropicalis (frog), Gallus gallus (chicken), and Danio rerio (zebrafish) were downloaded from NCBI RefSeq (Supplementary Table S3). Sequences were imported into Geneious 9.1.8 (Biomatters Ltd., Auckland, New Zealand) and multiple sequence alignment was performed using MUS-CLE [31]. Aligned sequences were exported in FASTA format.

Jensen-Shannon Divergence (JSD)
Using multiple sequence alignment of the sequences from the species mentioned above, conservation of protein residue at the sites reported here was predicted using the JSD method [32]. This task was performed using the online tool available at https: //compbio.cs.princeton.edu/conservation/score.html (Accessed on 3 December 2019). The default settings were used for this purpose.

Clustering and Multidimensional Scaling
The variants were classified into three clusters-possibly pathogenic, uncertain, and possibly benign-by k-means clustering using the k-means function in R version 3.6.0, using a combination of predictors (the pathogenicity predictions from all algorithms mentioned in 2.1). The clustering methods yielded p values of <0.0001 on the Kruskal-Wallis test between the three groups (possibly pathogenic, uncertain, and possibly benign) for each of the scoring systems (see Figure 1 and Table 2). To reduce the dimensionality of the dataset, multidimensional scaling (MDS) was performed with the pathogenicity scores using the cmdscale function in R.

Structural Modeling
Homology models of two deleterious variants in the stalk region of DNAH11, p.Gly3102Asp, and p.Leu3127Arg were generated. For comparative modeling, the stalk region of the human cytoplasmic dynein 1 (Protein Data Bank, PDB, ID: 5NUG) structure was used. Sequences of the stalk region were aligned, and models of the wild-type (WT) and the two variants were generated using Schrödinger Prime 2019-4 (Schrödinger, LLC, New York, NY, USA).

Jensen-Shannon Divergence (JSD)
Using multiple sequence alignment of the sequences from the species mentioned above, conservation of protein residue at the sites reported here was predicted using the JSD method [32]. This task was performed using the online tool available at https://compbio.cs.princeton.edu/conservation/score.html (Accessed on 3 December 2019). The default settings were used for this purpose.

Clustering and Multidimensional Scaling
The variants were classified into three clusters-possibly pathogenic, uncertain, and possibly benign-by k-means clustering using the k-means function in R version 3.6.0, using a combination of predictors (the pathogenicity predictions from all algorithms mentioned in 2.1). The clustering methods yielded p values of <0.0001 on the Kruskal-Wallis test between the three groups (possibly pathogenic, uncertain, and possibly benign) for each of the scoring systems (see Figure 1 and Table 2). To reduce the dimensionality of the dataset, multidimensional scaling (MDS) was performed with the pathogenicity scores using the cmdscale function in R. Here, MDS reduces the dimensionality of the data and projects it on two abstract coordinates, which visualize the 'closeness' of points. The points were clustered into three groups using k-means clustering and colored red, blue, and green. The differences between the three groups (red, blue, and green) for each prediction scoring tool are significant (p < 0.0001, Kruskal-Wallis H test). (B) A dot plot of the five pathogenicity predictor scores arranged according to the clusters identified in (A); the CADD scores were divided by 33 to aid data comparison. Horizontal lines are mean. (C) A scatter plot of REVEL versus Condel scores of the missense variants. The three k-means clusters obtained are colored in red (likely pathogenic), blue (uncertain), and green (likely benign). The difference between the clusters was significant (p < 0.0001, Kruskal-Wallis H test). (D) A dot plot of the five pathogenicity predictor scores arranged according to the clusters identified in (C). Horizontal lines are mean. Here, MDS reduces the dimensionality of the data and projects it on two abstract coordinates, which visualize the 'closeness' of points. The points were clustered into three groups using k-means clustering and colored red, blue, and green. The differences between the three groups (red, blue, and green) for each prediction scoring tool are significant (p < 0.0001, Kruskal-Wallis H test). (B) A dot plot of the five pathogenicity predictor scores arranged according to the clusters identified in (A); the CADD scores were divided by 33 to aid data comparison. Horizontal lines are mean. (C) A scatter plot of REVEL versus Condel scores of the missense variants. The three k-means clusters obtained are colored in red (likely pathogenic), blue (uncertain), and green (likely benign). The difference between the clusters was significant (p < 0.0001, Kruskal-Wallis H test). (D) A dot plot of the five pathogenicity predictor scores arranged according to the clusters identified in (C). Horizontal lines are mean. Table 2. Studied variants of PCD. Exome allele frequency is from gnomAD; * indicates if homozygotes have been reported. JSD (Jensen-Shannon Divergence) scores range from 0 to 1.0, with higher values indicating better conservation; 'Conserved' refers to complete conservation of the amino acid at the site over the 10 studied species (see Methods) and 'figure numbers' to the multiple sequence alignment. MetaLR (meta-analytic logistic regression that integrates variant pathogenicity scores and allele frequency to predict deleteriousness) ranked scores range from 0 to 1; higher scores are a more likelihood of pathogenicity. MetaSVM (meta-analytic support vector machine that integrates multiple omics data) ranked scores range from 0 to 1; higher scores are a more likelihood of pathogenicity. CADD (combined annotation-dependent depletion) refers to PHRED-scaled CADD scores that range from 1 to 99 (e.g., a score of 30 suggests the variant is top 0.1%). REVEL (rare exome variant ensemble learner that integrates data from several pathogenicity predictor software) ranked scores range from 0 to 1; higher scores are a more likelihood of pathogenicity. Condel (consensus deleterious generated from SIFT (made from sequence homology and biochemistry of the alternate residue) and PolyPhen2 (made from sequence homology and known database of the protein secondary structure)) scores range from 0.0 (tolerated) to 1.0 (deleterious). 'Scatter' refers to k-means clusters in the scatter plot of REVEL versus Condel scores (see Figure 1C) and 'MDS' refers to k-means clusters in the MDS plot of MetaLR, MetaSVM, CADD, REVEL, and Condel scores (see Figure 1A); red indicates possibly pathogenic, blue uncertain, and green possibly benign. American College of Medical Genetics and Genomics (ACMG) classification was from https://www.varsome.com (Accessed on 29 November 2020). Clinical assessment is for variants with supportive functional evidence. Patient phenotypes are summarized in Table 3 Figure S2E -                 Six variations of DNAH11 were in the stem, three in the AAA3, one between the AAA3 and AAA4, one in the AAA4, three in the stalk, one in the AAA5, one between the AAA5 and AAA6, one in the AAA6, and two in the C-terminus (Figure 3). The majority involved highly conserved residues (Table 2). Homology models of the two deleterious variants, Gly3102Asp and Leu3127Arg, in the stalk region of DNAH11 showed that these amino acids are also conserved in the homologous cytoplasmic dynein that was used for the modeling (Figure 4). In Gly3102Asp, the conserved achiral glycine gives flexibility to the helical structure and the aspartate is likely to disrupt this flexibility with its bulkier negatively charged sidechain. In Leu3127Arg, the hydrophobic leucine keeps the stalk intact, and the charged arginine is likely to disrupt this assembly.

Statistics
The analyses (including means and standard deviations) were performed using SPSS statistical package (Version 20; IBM Corp., Armonk, NY, USA). The Kruskal-Wallis H test (non-parametric, k independent samples) test was used to compare groups of variants. A p value of <0.05 was considered significant.

Results
One hundred and twelve variants of 28 PCD genes were found in 66 patients. Table 2 summarizes the results of the 112 studied variants of PCD; these variants were found in the 66 pediatric patients with chronic or frequent respiratory infections and positive variants in PCD genes. Table 3 lists the 18 patients with 'double or triple heterozygous variants' or 'homozygous variants' in PCD genes.
Eighty-three variants were missense, thirteen intronic (splice site), four nonsense, four frameshift, six synonymous, one duplication, and one inframe deletion. The majority of the missense variants involved conserved residues with a JSD score (mean ± SD) of 0.724 ± 0.102 (median, 0.747).
Variants in DNAH5 (19 of 112) and DNAH11 (21 of 112) were the greatest numbers identified here. Eight variations of DNAH5 were in the stem, one in the AAA2 (ATPases associated with a variety of cellular activities 2), one between the AAA2 and AAA3, one in the AAA4, and one in the C-terminus (Figure 2). Six of the missense variations involved highly conserved residues (Table 2).
Six variations of DNAH11 were in the stem, three in the AAA3, one between the AAA3 and AAA4, one in the AAA4, three in the stalk, one in the AAA5, one between the AAA5 and AAA6, one in the AAA6, and two in the C-terminus (Figure 3). The majority involved highly conserved residues (Table 2). Homology models of the two deleterious variants, Gly3102Asp and Leu3127Arg, in the stalk region of DNAH11 showed that these amino acids are also conserved in the homologous cytoplasmic dynein that was used for the modeling (Figure 4). In Gly3102Asp, the conserved achiral glycine gives flexibility to the helical structure and the aspartate is likely to disrupt this flexibility with its bulkier negatively charged sidechain. In Leu3127Arg, the hydrophobic leucine keeps the stalk intact, and the charged arginine is likely to disrupt this assembly.
Unfortunately, lack of suitable structures and missing regions in homologous structures limit modeling other variants. Structural information for the regions of interest in the other proteins is also not available. Therefore, further analysis of their missense variations is limited to information obtained from the multiple sequence alignment of proteins from various species (Supplementary Figures S1-S5). Selected variants with high or conflicting prediction scores are discussed below.
The three variations of DNAH1 involved conserved residues (Figure S1A-C) and had high pathogenicity scores. DNAH6 Ile213 was conserved in the 10 species considered ( Figure S1D), and its substitution with valine was benign. DNAH8 Ile223 showed permissible valine substitutions ( Figure S1E), which could support CADD score (22.7) for the Ile223Thr variant (Table 2).
CCDC39 Lys359 was conserved ( Figure S2B), which could support the pathogenic scores of MetaLR, CADD, and Condel for Lys359Thr. CCDC39 Asn473 was also conserved ( Figure S2C), which could support CADD score (25.5) for Asn473Asp. CCDC39 Thr594 was not conserved (Figure S2D), which supported CADD (11.1) and REVEL (0.341) scores for Thr594Ile (Table 2). Unfortunately, lack of suitable structures and missing regions in homologous structures limit modeling other variants. Structural information for the regions of interest in the other proteins is also not available. Therefore, further analysis of their missense variations is limited to information obtained from the multiple sequence alignment of proteins from various species (Supplementary Figures S1-S5). Selected variants with high or conflicting prediction scores are discussed below.
The three variations of DNAH1 involved conserved residues (Figure S1A-C) and had high pathogenicity scores. DNAH6 Ile213 was conserved in the 10 species considered (Figure S1D), and its substitution with valine was benign. DNAH8 Ile223 showed permissible valine substitutions ( Figure S1E), which could support CADD score (22.7) for the Ile223Thr variant (Table 2).
CCDC40 Gly21 was substituted only by aspartate or glutamate in four of the nine studied species ( Figure S2H); phenotypically, the novel Gly21Val is probably pathogenic ( Table 3, Patient 8) despite its benign computational pathogenicity predictions ( Table 2). CCDC40 Asp284 resided in a region where the sequence was conserved and a conservative substitution to a glutamate was observed in only two no-mammalian species (chicken and zebrafish) studied here ( Figure S2K); thus, histidine might not be tolerated at this position. Asp284His, on the other hand, had conflicting predictions of pathogenicity (Table 2). This variant was detected in a double heterozygous state in a symptomatic child with recurrent sinusitis (Table 3, Patient 7). Thus, future studies are needed to confirm whether this variant is disease causing. DNAAF1 Glu402 was substituted by isoleucine in zebrafish ( Figure S3B). This might support the benign scores for Glu402Val, as valine could usually substitute for isoleucine. DNAI2 Ser229 was conserved ( Figure S4D), which might support CADD score (21.1) for CCDC40 Gly21 was substituted only by aspartate or glutamate in four of the nine studied species ( Figure S2H); phenotypically, the novel Gly21Val is probably pathogenic ( Table 3, Patient 8) despite its benign computational pathogenicity predictions ( Table 2). CCDC40 Asp284 resided in a region where the sequence was conserved and a conservative substitution to a glutamate was observed in only two no-mammalian species (chicken and zebrafish) studied here ( Figure S2K); thus, histidine might not be tolerated at this position. Asp284His, on the other hand, had conflicting predictions of pathogenicity (Table 2). This variant was detected in a double heterozygous state in a symptomatic child with recurrent sinusitis (Table 3, Patient 7). Thus, future studies are needed to confirm whether this variant is disease causing. DNAAF1 Glu402 was substituted by isoleucine in zebrafish ( Figure S3B). This might support the benign scores for Glu402Val, as valine could usually substitute for isoleucine. DNAI2 Ser229 was conserved ( Figure S4D), which might support CADD score (21.1) for Ser229Ala. DRC1 Arg694 was conserved ( Figure S5C), which might support the pathogenicity of Arg694Thr. HYDIN Pro3213 was substituted by different amino acids in two species ( Figure S5E); the novel variant Pro3213Arg had conflicting predictions of pathogenicity (Table 2), but it was clinically likely pathogenic (Table 3, Patients 17-18). RSPH4A Tyr217 was conserved ( Figure S5H), which supported the pathogenic scores of Condel and CADD for Tyr217Ser. RSPH4A Ile470 was also conserved except for the valine substitution ( Figure S5I), which supported the deleterious Condel score (0.781) for Ile470Met. ARMC4 Arg570 was conserved ( Figure S5K), which supported the CADD score (20.3) for Arg570Gln. ARMC4 Arg629 was conserved ( Figure S5L), which supported Condel and CADD scores for Arg629His. CEP104 Glu698 was conserved ( Figure S5M), which might support CADD score (23.3) for the novel variant Glu698Lys (Tables 2 and 3, Patient 9).
C1orf127 Arg113Ter was found in homozygous state in two related children (double cousins) with complex congenital heart disease and lateralization defect. The parents were heterozygous and asymptomatic (had situs solitus on chest radiographs and echocardiograms), suggesting autosomal recessive inheritance. Further studies are needed to understand the precise function of C1orf127 protein.

Novel Variants
Thirteen variants were novel, mostly identified in symptomatic children ( Table 3). Eight of the novel variants were missense, two frameshift, one nonsense, one duplication, and one splice donor site (Table 2). CCDC40:p.Gly21Val was found in homozygous state in two symptomatic sisters with biopsy-proven significant microtubular disorganizations, including distorted dynein arms and absent inner dynein arms, thus, confirming its pathogenicity ( Table 3, Patient 8). It is worth noting that CCDC40 Gly21 was not conserved ( Figure S2H), which accounts for the relatively low Condel and CADD scores ( Table 2). DNAH5:p.Leu2413Pro (Condel: 0.842, CADD: 29.2) was found in a heterozygous state with heterozygous DNAAF5:p.Arg263Gln (Condel: 0) in a child with significant respiratory infections from early infancy. DNAH5 Leu2413 was conserved ( Figure 2J). Consistently, the scores (including MDS) of DNAH5:p.Leu2413Pro predict pathogenicity. This variant was detected in a child with significant respiratory infections from early infancy. Therefore, other unidentified conditions could be responsible for the observed phenotype in this patient. DRC1:p.Glu382Asp had a Condel score of 0.022, which was consistent with the presence of aspartate at this position in various species ( Figure S5B). DNAH11:p.Glu414Gly was found in heterozygous state with two other heterozygous variants (CCDC39:p.Arg853Cys and DNAAF3:p.Ile18Ser) in a child with multiple anomalies. Its high Condel (0.896) and CADD (24.2) scores were consistent with the conserved DNAH11 Glu414 ( Figure 3B). DNAH11:p.Leu3080Met involved the highly conserved DNAH11 Leu3080 ( Figure 3M), supporting its Condel score of 0.987. It was found in a symptomatic child with heterozygous STAT3:p.Met660Thr, known to cause 'hyper-IgE recurrent infection syndrome 1, autosomal dominant'. Thus, future studies are needed to determine whether these variants are disease causing or not. DNAH11:c.11839+1G>A, with potential loss of splice donor site predicted by SpliceAI (donor loss delta score: 1; i.e., could be pathogenic), was found in a double heterozygous state with DNAH11:p.Arg2744Cys (Condel: 0.906) in a child with situs inversus/dextrocardia, suggesting pathogenicity (Table 3, Patient 15). DNAI1:p.Lys92Trpfs was found in a heterozygous state with the X-linked recessive OFD1:p.Lys976Thr. OFD1 is linked to ciliary dysfunction (Joubert syndrome 10, X-linked recessive). ACMG classification from VarSome for OFD1:p.Lys976Thr is uncertain significance, but computational predictors indicate that it is 'possibly damaging' ( Table 2). The affected girl had situs inversus and chronic respiratory infections, likely resulting from X-inactivation.

Discussion
Here, we analyzed the pathogenicity of genetic variants of PCD genes in the UAE. Two noticeable findings are the novel pathogenic variants of PCD in the community, and the clustering of up to seven distinct variants in some patients. High rates of inbreeding are known to increase the genomic homozygosity past prediction [33].
No individual or meta predictor is able to clearly predict the pathogenicity of all clinically relevant variants [34]. MDS is a statistical approach adopted in the interpretation of datasets involving several variables (ttp://www.stat.yale.edu/~lc436/papers/JCGS-mds. pdf; accessed on 25 October 2021). Hence, here we employed MDS to segregate and cluster variants based on multiple pathogenicity scores. As evident from Figure 1A, the reduction in dimensionality into two coordinates, while still preserving the distance between these points in multi-dimensional space, facilitated better visualization and separation of these variants into three groups. These groups were designated as likely pathogenic, uncertain, or likely benign. Overlaps, however, were observed between the range of individual predictors ( Figure 1B). Figure 1B also indicated that the meta predictors REVEL and Condel were able to segregate the three categories in this dataset. Thus, a scatter plot was generated based on REVEL and Condel scores ( Figure 1C), where the three groups clearly cluster separately ( Figure 1D). In the absence of clear evidence of the pathogenicity of variants (e.g., functional assays), such analysis could assist with inferring potential pathogenicity of variants. Thus, care must be taken while interpreting in silico pathogenicity scores in isolation.
The occurrence of one or more autosomal recessive disorders of PCD in the offspring of a couple is a function of the number of shared variants. This probability is estimated by binomial distribution, which shows a 0.4375 chance for two shared variants and a 0.8665 chance for seven shared variants. Thus, although the incidence rate in the UAE is unknown, PCD appears to be frequent in the community and needs preventive measures.
It is worth noting that DNAH1:p.Tyr3688Cys (as an example) was found in heterozygous state in a child with dextrocardia; the only other variant reported in the genes of interest in this patient was DNAH6:p.Ile213Val (Condel: 0.00). Other unidentified recessive conditions (not reported or detected by the WES test) could be responsible for the observed phenotype in this patient. Future studies are needed to ratify the inheritance of some of these variants, as autosomal recessive diseases are more amenable to genetic prevention than autosomal dominant ones.
As explained in the Results, CCDC40:p.Gly21Val (Table 3, Patient 8) is identified in homozygous state in two siblings with biopsy-proven significant microtubular disorganizations, including distorted dynein arms and absent inner dynein arms, confirming its pathogenicity. The computational variant effect predictions, however, are 'benign' ('tolerated'). This discrepancy in 'in silico prediction' is serious, especially with respect to genetic screening. In addition, many variants may be missed or incorrectly called by genome sequencing approaches. Thus, future studies are also needed to further investigate the pathogenicity of some of these variants (e.g., parental studies to help evaluate effects of single heterozygous variants) and improve the yield of genetic investigations.
Many of the PCD variants were found in children with other congenital anomalies and developmental delay (e.g., Patients 1, 2, and 9 in Table 3). Unidentified variations in other genes, thus, may have also contributed to the overall phenotype, given the high incidence of recessive disorders in our population. Future studies, thus, are needed to uncover all contributing genetic variants to the disease in any given patient. Exome sequencing study was performed on individuals with PCD from Saudi Arabia [24]. The study identified similar variants in CCDC39, CCDC40, DNAAF5, DNAH5, DNAI1, RSPH4A, and RSPH9. The study also identified other PCD variants that involved genes not in the present cohort, such as: PKD1L1, MCIDAS, CCDC151, CCNO, CYP21A2, ITCH, MCIDAS, and CEP164 [24].
The majority of the variants reported here are heterozygous and are found with other multiple heterozygous variants in symptomatic children. Further studies are needed to uncover whether individuals with heterozygous variants in PCD genes could be symptomatic, especially with genes that may show monoallelic expression [35,36]. Undoubtedly, the observed PCD alleles could result in complex phenotypes, as shown in Table 3. As recently shown, the immunofluorescence analysis is a reliable diagnostic tool for PCD, and could be used to fulfill this purpose [13].
Genetic variations from Arabian populations are underrepresented in publicly databases, such as ClinVar and Genome Aggregation Database (gnomAD). Often, this shortage impacts the ability of clinicians and genetic counsellors to draw meaningful conclusions about the variants identified. Thus, studies that catalogue variations in the community are of importance, especially in regions of a high rate of consanguinity that potentially exacerbates the prevalence of recessive disorders. This holds true for PCD, where the diagnosis still remains challenging [26]. Thus, this study describes the variations in PCD-related genes identified through a retrospective chart review of patients at a tertiary hospital in the UAE. Based on our current practice and commercial sequencing arrangement, most of the variants were identified from a comprehensive pulmonary disease panel, which covers only 14 (Supplementary Table S1) of the over 45 genes associated with PCD [10,37].
Hence, it is possible that other variations in PCD-related genes could have been overlooked. Despite this limitation, 112 variations, 13 of which are novel, have been identified in this study. Nonetheless, when PCD (ciliopathy) is suspected, it is prudent to use a panel that covers all the genes known to be associated with this condition. As 25% of the described variations in this study have been designated as 'variants of uncertain significance' in the publicly databases, it underscores the need for further studies to elucidate their role in PCD. Additionally, studies involving parents and siblings could also assist phasing and establishing the role of double and triple heterozygous variants identified here. Large-scale population-based studies using WES-or PCD-specific panels may be necessary to provide a comprehensive coverage of variations that underpin PCD this disease, and potentially guide future genetic screening programs and premarital counselling.
This investigation was retrospective and aimed at assessing variants identified in the community. Ideally, we would have preferred additional functional studies. Therefore, future investigations should aim at further characterization of these variants, including functional analyses.

Conclusions
The clinical spectrum of PCD is far from being well understood. Therefore, reports on variants associated with the disease are highly desirable. This condition is especially common in the Arabian Peninsula [24]. Our results describe several damaging variants in PCD genes in various Arabian tribes residing in the UAE. Considering the small local population of the UAE (about one million), these results could support genetic screening and counseling programs to prevent the disease. More work, however, is needed to understand the scope of the genetic underpinnings of PCD. Further studies addressing the limitations and methods of improving in silico analysis of genetic variants (e.g., including parental and sibling studies) are also necessary. Methods that analyze variants within the structure of a functioning cilium are warranted. As previously stated, PCD variants are distinctive to families, and healthcare providers need to be familiar with their significance in the community [24].
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/jcm10215102/s1, Figure S1: Multiple sequence alignment of dynein axonemal heavy chain proteins (other than DNAH5 and DNAH11); Figure S2: Multiple sequence alignment of coiled-coil domain-containing (CCDC) proteins; Figure S3: Multiple sequence alignment of dynein axonemal assembly factor (DNAAF) proteins; Figure S4: Multiple sequence alignment of dynein axonemal intermediate chain (DNAI) proteins; Figure S5: Multiple sequence alignment of other proteins; Table S1: PCD genes covered by the Centogene ® comprehensive pulmonary disease panel; Table S2: Details of genetic tests done; Table S3: NCBI RefSeq accession of protein sequences used for multiple sequence alignment.  Informed Consent Statement: Consent to participate was waived by the ethics committee that approved this study.