Whole Genome Sequencing Indicates Heterogeneity of Hyperostotic Disorders in Dogs

Craniomandibular osteopathy (CMO) and calvarial hyperostotic syndrome (CHS) are proliferative, non-neoplastic disorders affecting the skull bones in young dogs. Different forms of these hyperostotic disorders have been described in many dog breeds. However, an incompletely dominant causative variant for CMO affecting splicing of SLC37A2 has been reported so far only in three Terrier breeds. The purpose of this study was to identify further possible causative genetic variants associated with CHS in an American Staffordshire Terrier, as well as CMO in seven affected dogs of different breeds. We investigated their whole-genome sequences (WGS) and filtered variants using 584 unrelated genomes, which revealed no variants shared across all affected dogs. However, filtering for private variants of each case separately yielded plausible dominantly inherited candidate variants in three of the eight cases. In an Australian Terrier, a heterozygous missense variant in the COL1A1 gene (c.1786G>A; p.(Val596Ile)) was discovered. A pathogenic missense variant in COL1A1 was previously reported in humans with infantile cortical hyperostosis, or Caffey disease, resembling canine CMO. Furthermore, in a Basset Hound, a heterozygous most likely pathogenic splice site variant was found in SLC37A2 (c.1446+1G>A), predicted to lead to exon skipping as shown before in SLC37A2-associated canine CMO of Terriers. Lastly, in a Weimaraner, a heterozygous frameshift variant in SLC35D1 (c.1021_1024delTCAG; p.(Ser341ArgfsTer22)) might cause CMO due to the critical role of SLC35D1 in chondrogenesis and skeletal development. Our study indicates allelic and locus heterogeneity for canine CMO and illustrates the current possibilities and limitations of WGS-based precision medicine in dogs.


Animals
Blood samples from a CHS-affected American Staffordshire Terrier and seven dogs affected by CMO of seven different breeds (Supplementary Table S1) were collected. Genomic DNA was isolated from EDTA blood samples using the Maxwell RSC Whole Blood DNA kit (Promega, Dübendorf, Switzerland). All dogs were diagnosed by respective veterinarians on the basis of clinical signs and imaging. Skull radiographs of four and CT scan images of two of the CMO-affected dogs were obtained.

Whole-Genome Resequencing
WGS data of the eight affected dogs were obtained after preparation of a PCR (polymerase chain reaction)-free fragment library at an average 26× coverage. Fastq-files were mapped to the dog reference genome assembly CanFam3.1. Single nucleotide and small indel variants were called and NCBI (National Center for Biotechnology Information) annotation release 105 was used to predict their functional effects, as described previously [15]. Private variants were identified by comparison with the variant catalog of 576 publically available dogs of 120 various breeds, as well as 8 wolves (Supplementary Table S1), provided by the Dog Biomedical Variant Database Consortium (DBVDC) [15]. The description of the sequence variants is in accordance with the recommendations of the Human Genome Variation Society [22]. Integrative genomics viewer (IGV) [23] was used for visual inspection and validation of the candidate variants found in the affected dogs' WGS data.

Availability of Data and Material
The WGS data of all affected as well as comparison cohort dogs/wolves have been made freely available at the European Nucleotide Archive (ENA). All ENA accession numbers are given in the Supplementary Table S1. Genes 2020, 11, 163 4 of 12

Phenotype
We investigated a CHS-affected American Staffordshire Terrier and seven CMO-affected dogs of different breeds (Australian Terrier, Basset Hound, Cairn Terrier, Curly Coated Retriever, German Wirehaired Pointer, Old English Sheepdog, and Weimaraner). The diagnosis was made by veterinarians observing the typical clinical signs including persistent and sharp pain when opening mouth, painful swelling of the jaw, fever, and in some cases also severe pain of ulna and radius bones. Age of onset of the clinical signs ranged from 3 to 7 months of age, and skull radiographs/computed tomographic (CT) images were available for six of the affected dogs (Supplementary Table S1). The imaging findings of one dog was consistent with CHS and in four dogs with CMO (Figure 1), in the remaining dog, no changes were identified on radiographs, but clinical signs were consistent with the diagnosis of CMO. Due to the self-regressive course of the disorder, the remission of clinical signs appeared in all cases soon after the diagnosis and symptomatic treatment. On the basis of the previously published gene test, all eight collected dogs were genotyped negative for the reported CMO-causing variant in SLC37A2 (XM_005619600.3:c.1332C>T) [11]. In addition, for all eight dogs, the region of the SLC37A2 gene was also manually inspected in IGV to confirm no candidate variants were missed in the functional annotation and to rule out obvious, large structural variants including copy number variation affecting this gene.
Genes 2020, 11, 163 4 of 12 We investigated a CHS-affected American Staffordshire Terrier and seven CMO-affected dogs of different breeds (Australian Terrier, Basset Hound, Cairn Terrier, Curly Coated Retriever, German Wirehaired Pointer, Old English Sheepdog, and Weimaraner). The diagnosis was made by veterinarians observing the typical clinical signs including persistent and sharp pain when opening mouth, painful swelling of the jaw, fever, and in some cases also severe pain of ulna and radius bones. Age of onset of the clinical signs ranged from 3 to 7 months of age, and skull radiographs/computed tomographic (CT) images were available for six of the affected dogs (Supplementary Table S1). The imaging findings of one dog was consistent with CHS and in four dogs with CMO (Figure 1), in the remaining dog, no changes were identified on radiographs, but clinical signs were consistent with the diagnosis of CMO. Due to the self-regressive course of the disorder, the remission of clinical signs appeared in all cases soon after the diagnosis and symptomatic treatment. On the basis of the previously published gene test, all eight collected dogs were genotyped negative for the reported CMO-causing variant in SLC37A2 (XM_005619600.3:c.1332C>T) [11]. In addition, for all eight dogs, the region of the SLC37A2 gene was also manually inspected in IGV to confirm no candidate variants were missed in the functional annotation and to rule out obvious, large structural variants including copy number variation affecting this gene.

Identification of Private Candidate Variants
We considered both the autosomal dominant and autosomal recessive modes of inheritance when analyzing the WGS data. Filtering for private variants present in a homozygous or heterozygous state in all eight affected dogs and absent in 584 publicly available dog and wolf genomes revealed no variants shared across all cases. However, filtering for private variants of each of the cases separately yielded possible dominantly inherited candidate protein-changing variants in three cases (Figures 2-4). In the remaining five affected dogs, no variants in obvious functional candidate genes were found. Therefore, we report a catalog of each dog's private variants (Supplementary Tables S2-S9). The total number of variants per genome, when compared to the CanFam3.1 reference, ranged from 4.9 million in the American Staffordshire Terrier to 5.6 million in the Basset Hound (Table 1). After filtering against the public genomes, on average ≈9200 variants were left with the least count observed in American Staffordshire Terrier (4582 variants) and the highest in the Basset Hound (13,370 variants), as shown in Table 1. In total, most of the called variants were located in non-coding intergenic (50.7%) or intronic (48.3%) regions, whereas on average 64.5 (34-96) protein-changing (0.7%) variants were found in each genome. In the CMO-affected Australian Terrier, a missense variant in exon 26 of the alpha 1 chain of type I collagen (COL1A1) gene was discovered. The detected variant (NM_001003090.1:c.1786G>A) alters the encoded amino acid residue 596 of COL1A1 (NP_001003090.1:p.(Val596Ile)) with a predicted moderate impact on the resulting protein ( Figure 2). The missense variant affects the triple helix domain of COL1A1 protein, which typically consists of the invariant Gly-X-Y repeat motif. The amino acid is conserved across species but the valine to isoleucine substitution was predicted neutral and tolerated by in silico prediction tools [25,26]. In humans, a similar missense variant has been described in exon 41, also changing the X residue in the triple repeat motif (Figure 2). There is no non-synonymous variant in the human COL1A1 coding region reported at the corresponding position in the gnomAD [24]. synonymous variant in the human COL1A1 coding region reported at the corresponding position in the gnomAD [24]. In the CMO-affected Basset Hound, a splice site variant was found in the solute carrier family 37 member 2 (SLC37A2) gene. This single nucleotide variant (XM_005619600.3:c.1446+1G>A) affects an evolutionary strongly conserved donor splice site at the beginning of intron 16 and was predicted to disrupt splicing ( Figure 3). The human consensus sequence for the U2-type GT-AG donor splice site corresponds to a perfect base pairing to the 5′ end of U1 small nuclear RNA [27]. The variant changes this canonical dinucleotide sequence ( Figure 3c) and therefore most likely eliminates the splice site and leads to skipping of exon 16 and a shortened transcript. One human variant (rs1278346722) alters the same position of the human SLC37A2 donor splice site as the dog variant; however, the guanine is exchanged by cytosine (c.1425+1G>C), not by adenine as found here. This human variant is present in only 2 out of 125,127 patients in a heterozygous state (allele frequency of 7.99 x 10 -6 ) [24]. In the CMO-affected Basset Hound, a splice site variant was found in the solute carrier family 37 member 2 (SLC37A2) gene. This single nucleotide variant (XM_005619600.3:c.1446+1G>A) affects an evolutionary strongly conserved donor splice site at the beginning of intron 16 and was predicted to disrupt splicing (Figure 3). The human consensus sequence for the U2-type GT-AG donor splice site corresponds to a perfect base pairing to the 5 end of U1 small nuclear RNA [27]. The variant changes this canonical dinucleotide sequence ( Figure 3c) and therefore most likely eliminates the splice site and leads to skipping of exon 16 and a shortened transcript. One human variant (rs1278346722) alters the same position of the human SLC37A2 donor splice site as the dog variant; however, the guanine is exchanged by cytosine (c.1425+1G>C), not by adenine as found here. This human variant is present in only 2 out of 125,127 patients in a heterozygous state (allele frequency of 7.99 × 10 −6 ) [24].
Lastly, a frameshift variant in the solute carrier family 35 member D1 (SLC35D1) gene (XM_003434643.2:c.1021_1024delTCAG) was found in a CMO-affected Weimaraner. The schematic representation indicates that the 4 bp deletion is located in exon 12 and leads to a frameshift affecting the last 15 amino acids of the SLC35D1 protein, and was predicted to produce a novel 22 amino acid long C-terminus of SLC35D1 (XP_003434691.1:p.(Ser341ArgfsTer22); Figure 4). There is no non-synonymous variant in the human SLC35D1 coding region at the corresponding position described in the gnomAD [24]. Lastly, a frameshift variant in the solute carrier family 35 member D1 (SLC35D1) gene (XM_003434643.2:c.1021_1024delTCAG) was found in a CMO-affected Weimaraner. The schematic representation indicates that the 4 bp deletion is located in exon 12 and leads to a frameshift affecting the last 15 amino acids of the SLC35D1 protein, and was predicted to produce a novel 22 amino acid long C-terminus of SLC35D1 (XP_003434691.1:p.(Ser341ArgfsTer22); Figure 4). There is no nonsynonymous variant in the human SLC35D1 coding region at the corresponding position described in the gnomAD [24].  [11] (blue), as well as the newly reported variant at the beginning of intron 16 (red). Note that the SLC37A2 gene is annotated on the reverse complementary strand. (c) Wild type and mutant allele compared to the human consensus sequence for the U2-type GT-AG donor splice sites [27]. The uppercase letters indicate exon 16 and lowercase letters intron 16 sequence, whereas the subscript numbers show the percentage of the respective conserved nucleotide in 186,630 investigated U2-type GT-AG human 5'-splice site motifs. The base at position +1 is a highly conserved G (100%) [27].

Discussion
On the basis of clinical signs and/or radiography analyses, eight dogs of eight different breeds were diagnosed with different forms of hyperostotic disorders such as CHS or CMO. Because all eight affected dogs were homozygous wild type for the previously described variant causing CMO in three different Terrier breeds [11], whole-genome sequencing of the individual cases was performed to investigate the underlying genetics.
No variants shared across all 8 affected dogs and absent from 584 publically available genomes

Discussion
On the basis of clinical signs and/or radiography analyses, eight dogs of eight different breeds were diagnosed with different forms of hyperostotic disorders such as CHS or CMO. Because all eight affected dogs were homozygous wild type for the previously described variant causing CMO in three different Terrier breeds [11], whole-genome sequencing of the individual cases was performed to investigate the underlying genetics.
No variants shared across all 8 affected dogs and absent from 584 publically available genomes were discovered, which was in accordance with the expectation of several breed-specific types. The total number of private variants varied greatly between the individual genomes. This might be partially the consequence of different levels of inbreeding in different breeds, as well as the variable numbers of breed controls present in the studied dataset [15]. By prioritization of functional candidate genes, three possible causative variants in COL1A1, SCL37A2, and SLC35D1 were identified when searching for private variants in the Australian Terrier, the Basset Hound, and the Weimaraner, respectively.
The identified COL1A1:c.1786G>A variant in a CMO-affected Australian Terrier was predicted to result in a substitution of valine by isoleucine in the triple helix domain of alpha 1 chain of type I collagen (COL1A1). The structure of connective tissues includes collagens, proteoglycans, and non-collagenous proteins, as well as enzymes that are necessary for precise extracellular matrix assembly and degradation. Mutations in genes encoding these distinct components of the matrix result in similar phenotypes [28]. The triple-helical domain in collagen proteins consists of a Gly-X-Y repeating motif that is essential for the helix folding [29]. Mutations of the Gly residues lead to more severe defects such as osteogenesis imperfecta (OMIM 120150, OMIA 002126-9615, OMIA 002127-9913). On the other hand, substitutions of the X or Y residues result in milder phenotypes and are associated with Ehlers-Danlos syndrome and Caffey disease (OMIM 120150) [28]. The autosomal dominant form of infantile cortical hyperostosis in humans (OMIM 114000) is described as a collagenopathy caused by a single heterozygous missense variant in COL1A1 with incomplete penetrance. This variant affects the X residue in a triple-helical domain of COL1A1 (Figure 2), but the exact pathogenesis of how it leads to self-limiting hyperostotic bone lesions is still unclear [3]. Therefore, we speculate that the identified missense variant in canine COL1A1 might represent a candidate variant of uncertain significance for CMO in the affected Australian Terrier. However, as the predicted amino acid substitution involves two chemically similar, hydrophobic amino acids, this result should be considered preliminary and requires further confirmation.
The detected splice site variant in SLC37A2 is the most likely pathogenic variant for CMO in the affected Basset Hound. SLC37A2 has been shown to be associated with CMO in Terriers [11]. Belonging to the glucose-phosphate transporter family, it is expressed in bone-related tissues including bone marrow or osteoclasts. In mice, Slc37a2 was shown to have an important role in osteoclast differentiation and function [30]. The previously identified synonymous variant causes altered splicing, resulting in a frameshift and premature stop codon [11]. This likely affects the function of SLC37A2 and leads to disturbed glucose homeostasis in osteoclasts in the developing bones, potentially explaining the hyperostosis phenotype of CMO-affected Terriers [11]. The process of splicing consists of the recognition of exon-intron boundaries by spliceosome complex and the subsequent excision of an intron and is highly conserved across eukaryotes [31]. Splice sites are classified in four major subtypes on the basis of components of the spliceosome and the introns' terminal dinucleotides. The majority of all human splice sites are the U2-type GT-AG subtype. The highly conserved bases at the first and second positions of the 5' motif of U2-type splice sites are G and T (100%) [27]. The herein reported splice site variant (c.1446+1G>A) detected in a CMO-affected Basset Hound changes this canonical dinucleotide sequence. Therefore, it most likely disrupts the correct splicing of intron 16.
The SLC35D1 protein-changing variant (c.1021_1024delTCAG) found in a CMO-affected Weimaraner leads to a frameshift affecting the last 15 amino acids of the SLC35D1 protein. The solute carrier family 35 member D1 (SLC35D1) gene encodes another nucleotide-sugar transporter, which has a crucial role in chondroitin sulfate biosynthesis [32]. Chondroitin sulfate chains are an important part of cartilage proteoglycans, which are needed for chondrogenesis and skeletal development [33]. Impairment of the proteoglycan synthesis was confirmed in the Slc35d1-knockout mice, which showed severe chondrodysplasia, including hypoplastic craniofacial bones and extremely short long bones. The heterozygous Slc35d1 +/mice were phenotypically apparently normal [32]. In humans, autosomal recessive mutations in SLC35D1 have been described in a rare lethal skeletal dysplasia (OMIM 269250). On the basis of these comparative data, it seems possible that the canine frameshift variant represents a likely causative variant for CMO phenotype in the affected Weimaraner. This might be either due to haploinsufficiency or a dominant negative function of the mutant protein on the residual wildtype SLC35D1.
The success rate of WGS-based identification of rare disease-causing genes is difficult to determine because about two-thirds of studied human diseases discover too many credible candidate variants and in the remaining one-third there are none [13]. Similarly, the comparisons described herein, using the recently presented DBVDC variant catalog, reduced the ≈5,000,000 variants per genome identified from WGS to <100 protein-changing variants per genome. Even though the DBVDC variant catalog is a great resource for identification of potential causal variants in dogs, the numbers of identified private protein-changing variants per genome are probably overestimated because of the still imperfect status and annotation of the current CanFam3.1 reference genome, and prioritization of causal variants remains challenging. For example, there is a gap in the reference assembly in the 5'-region of SLC37A2. Therefore, sequencing of additional genomes of well-phenotyped breed-matched obligate carriers and/or affected dogs may help unravel the complete molecular etiology of canine hyperostotic disorders such as CHS or CMO in the remaining cases by reducing the number of private candidate variants shared in cases of the same breed. De novo mutations causing autosomal dominant disorders are easier to identify when parents' data is available because each individual carries fewer variants that are not also found in the parents [16]. Unfortunately, for the examined cases in this study, we had no access to samples of the parents to evaluate the possibility of spontaneously occurring mutations causing the hyperostotic disorder. Furthermore, analyzing the transcriptome of biopsies or even better relevant cells from affected tissue by RNAseq methods would be an excellent option to get further insights into the development of this transient disease phenotype. Despite the availability of the molecular methods, studying possible temporal and/or cell-specific expression changes remains challenging due to the practical and ethical limitations of accessing the appropriate samples in privately-owned dogs. Considering the relatively mild and self-limiting phenotype, some cases may remain undiagnosed, which together with lower penetrance possibly prevented the identification of further disease-causing variants. Comparison cohort dogs used for variant filtering were assumed to be non-affected. However, in light of the fact that the affected individuals with CHS or CMO spontaneously recover, we cannot rule out the possibility that pathogenic alleles were present in the publically available genomes, which would obviously limit the power of the WGS-based precision medicine approach.

Conclusions
We propose that the novel splicing variant SLC37A2:c.1446+1G>A is most likely pathogenic for CMO. We report two other candidate causative variants of uncertain significance with less evidence for pathogenicity in compelling functional candidate genes (COL1A1 and SLC35D1). Further functional studies are needed to confirm the causality of the described variants, whereas additional genetic and/or environmental factors might also influence the manifestation of the disorder. In addition, the detection of structural variants such as larger-sized indels or segmental duplications is strongly limited. The sensitivity of our variant detection might have been compromised by the presence of gaps in the reference genome and the employed short-read sequencing technology. Consequently, long-read sequencing improved reference assembly and annotation quality, and better algorithms for indels and structural variant calling might overcome these issues in the future. Nevertheless, our study demonstrates allelic and locus heterogeneity of different forms of canine hyperostotic disorders and indicates the current possibilities and limitations of precision medicine in dogs.