A Functional Single-Nucleotide Polymorphism Upstream of the Collagen Type III Gene Is Associated with Catastrophic Fracture Risk in Thoroughbred Horses

Simple Summary Bone fractures are a major welfare concern in Thoroughbred racehorses and often lead to euthanasia. In this study, we established a cell model to investigate the genetic factors involved in fracture risk. We found that collagen type III is expressed at lower levels in cells from horses with a high genetic risk of fracture and identified a novel DNA variant that is significantly associated with fracture and can affect the expression level of collagen type III. Abstract Fractures caused by bone overloading are a leading cause of euthanasia in Thoroughbred racehorses. The risk of fatal fracture has been shown to be influenced by both environmental and genetic factors but, to date, no specific genetic mechanisms underpinning fractures have been identified. In this study, we utilised a genome-wide polygenic risk score to establish an in vitro cell system to study bone gene regulation in horses at high and low genetic risk of fracture. Candidate gene expression analysis revealed differential expression of COL3A1 and STAT1 genes in osteoblasts derived from high- and low-risk horses. Whole-genome sequencing of two fracture cases and two control horses revealed a single-nucleotide polymorphism (SNP) upstream of COL3A1 that was confirmed in a larger cohort to be significantly associated with fractures. Bioinformatics tools predicted that this SNP may impact the binding of the transcription factor SOX11. Gene modulation demonstrated SOX11 is upstream of COL3A1, and the region binds to nuclear proteins. Furthermore, luciferase assays demonstrated that the region containing the SNP has promoter activity. However, the specific effect of the SNP depends on the broader genetic background of the cells and suggests other factors may also be involved in regulating COL3A1 expression. In conclusion, we have identified a novel SNP that is significantly associated with fracture risk and provide new insights into the regulation of the COL3A1 gene.


Introduction
Bone fractures caused by overloading rather than direct trauma can occur in Thoroughbred racehorses and, due to the complexities involved in treating fractures in a large animal that must remain weight-bearing, many of these can be catastrophic and result in euthanasia.Although the incidence of catastrophic fracture is low (0.38 per 1000 stars in the UK [1]), bone fractures are the leading cause of deaths occurring on the racecourse [2], with approximately 60 horses/year suffering from a fatal fracture during a race in the UK [1].
Fractures therefore have a significant welfare and economic impact on the racing industry.Many environmental risk factors have been identified for catastrophic fracture [3][4][5][6][7], but there is also a genetic contribution to risk.The heritability of distal limb fracture has been found to range from 0.21-0.37 in UK Thoroughbreds (flat and National Hunt) [8], and multiple genomic locations have been identified that may contribute to fracture risk, including a 5 Mb region on equine chromosome 18 (ECA18) in UK flat and national hunt Thoroughbreds [9,10].A candidate SNP study using SNPs within this region also confirmed an association with fractures in Japanese flat racing Thoroughbreds [11].Similarly, both genetic [12][13][14][15] and environmental [16,17] risk factors have been identified for stress fractures in humans.However, fracture risk is a complex trait resulting from the effects of multiple genes, and no specific functional mechanisms underpinning fracture have been defined in either species.To enable better prevention of catastrophic fractures, we need to develop an improved understanding of the biological processes affected by genetic factors.
However, studying the genetic mechanisms and biological processes predisposing horses to fracture is difficult to perform in vivo due to the large variation in exposure to environmental factors of fracture risk (e.g., training and racing histories) and the fact that bone tissue can only be isolated postmortem.Small animal models are very valuable for defining gene functions [32] but are unlikely to be useful in studying specific DNA variants, as causal variants are often poorly conserved between species [33].To overcome these limitations, cell models have been widely used to study human diseases [34].Often, these are simple inherited diseases, but for complex genetic diseases, cells from affected patients have been compared with cells from unaffected donors [35][36][37].However, more recently, these models have been refined by the selection of cells based on their polygenic risk score [38,39].In contrast to humans, the use of cell models to study inherited diseases in horses has not been widely reported.
The aim of this study was to develop a cell model to study the genetic basis of fracture risk in horses.We first established genome-wide polygenic risk scores (Genomic Best Linear Unbiased Prediction, GBLUP) [40,41] for fractures.The risk scores were then utilized to create an in vitro disease model using bone cells from horses at genetically high and low risk of fracture to identify candidate genes that were differentially expressed.Whole-genome sequencing was performed to identify potential regulatory DNA variants, and further cell models were then used to investigate the mechanisms of gene regulation.

Materials and Methods
An overview of the experimental approach is shown in Figure 1.

Skin Fibroblast Cell Isolation and Culture
Equine skin fibroblasts were isolated postmortem from 44 Thoroughbred horses (of unknown fracture status) and 3 Welsh Mountain ponies to establish the cell models.Thoroughbred samples were used exclusively in experiments that determined the effects of genetic variation for fracture.Welsh Mountain pony samples were used only in experiments that were not specifically related to fracture risk.All animals had been euthanised for reasons unrelated to this study and with the consent of the Animal Health Trust Ethical Review Committee (AHT_02_2012) and Royal Veterinary College Clinical Research Ethical Review Board (URN 2021 2035-2).Briefly, small dermal tissue pieces were collected into standard culture media (Dulbecco's modified Eagle's medium (DMEM) containing 10% foetal bovine serum, 2 mM L-glutamine, 100 U/mL penicillin, and 100 µg/mL streptomycin (all ThermoFisher, Loughborough, UK) plus 2.5 µg/mL amphotericin B and stored at 4 • C for up to 24 h.Tissue samples were then dissected into small pieces and digested in 1 mg/mL collagenase from C. histolyticum (Sigma-Aldrich, Dorset, UK) in standard culture media plus 2.5 µg/mL amphotericin B overnight at 37 • C, 5% CO 2 .Dissociated cells were washed and cultured in standard culture media (without amphotericin B) and passaged using 0.25% trypsin-EDTA (Sigma-Aldrich) every 3-5 days.

Skin Fibroblast Cell Isolation and Culture
Equine skin fibroblasts were isolated postmortem from 44 Thoroughbred horses (of unknown fracture status) and 3 Welsh Mountain ponies to establish the cell models.Thoroughbred samples were used exclusively in experiments that determined the effects of genetic variation for fracture.Welsh Mountain pony samples were used only in experiments that were not specifically related to fracture risk.All animals had been euthanised for reasons unrelated to this study and with the consent of the Animal Health Trust Ethical Review Committee (AHT_02_2012) and Royal Veterinary College Clinical Research Ethical Review Board (URN 2021 2035-2).Briefly, small dermal tissue pieces were collected into standard culture media (Dulbecco's modified Eagle's medium (DMEM) containing 10% foetal bovine serum, 2 mM L-glutamine, 100 U/mL penicillin, and 100 µg/mL streptomycin (all ThermoFisher, Loughborough, UK) plus 2.5 µg/mL amphotericin B and stored at 4 °C for up to 24 h.Tissue samples were then dissected into small pieces and digested in 1 mg/mL collagenase from C. histolyticum (Sigma-Aldrich, Dorset, UK) in standard culture media plus 2.5 µg/mL amphotericin B overnight at 37 °C, 5% CO2.Dissociated cells were washed and cultured in standard culture media (without amphotericin B) and passaged using 0.25% trypsin-EDTA (Sigma-Aldrich) every 3-5 days.

Polygenic Risk Scoring of Thoroughbred Skin Fibroblasts
Genome-wide Complex Traits Analysis (GCTA) [42] was used on Equine SNP50 genotyping data from 269 cases and 253 controls from the previous study [9] ("the training

Polygenic Risk Scoring of Thoroughbred Skin Fibroblasts
Genome-wide Complex Traits Analysis (GCTA) [42] was used on Equine SNP50 genotyping data from 269 cases and 253 controls from the previous study [9] ("the training data set") to derive BLUP (Best Linear Unbiased Prediction) allele effects for all individual SNPs across the genome (genomic BLUP, GBLUP).This is equivalent to deriving the heritability explained by each individual SNP, a prerequisite for polygenic risk score analysis [40].DNA was extracted from skin fibroblasts using the QIAamp DNA mini kit (Qiagen, Manchester, UK) and genotyped on the Geneseek Genomic Profiler Equine Plus chip (based on Illumina's Equine SNP70 platform) (performed by Neogen, Ayr, UK), which includes over 70,000 evenly distributed SNPs, including those on the Illumina Equine SNP50 chip.Polygenic risk scores for the skin fibroblast samples were calculated by applying the PLINK (http://pngu.mgh.harvard.edu/purcell/plink/(accessed on 12 August 2020) [43] score function to the fibroblast genotypes, in conjunction with the estimated SNP BLUP effects obtained from the training data set, applying the GBLUP protocol as described by Chung [41].Additionally, we carried out genotyping and scoring of 382 DNA samples that had been collected from blood samples taken from Thoroughbreds (flat and National hunt) of unknown fracture status.These Thoroughbreds therefore represent the general population.Having risk scores from a large number of horses representing the general Thoroughbred population enabled us to then select the skin samples (from the separate bank of 44 samples that were collected) that represented the ends of the risk spectrum (see Figure 2).These samples were from male horses.All DNA samples were used with the approval of the Royal Veterinary College Clinical Research Ethical Review Board (URN 2020 2004-2).
described by Chung [41].Additionally, we carried out genotyping and scoring of 382 DNA samples that had been collected from blood samples taken from Thoroughbreds (flat and National hunt) of unknown fracture status.These Thoroughbreds therefore represent the general population.Having risk scores from a large number of horses representing the general Thoroughbred population enabled us to then select the skin samples (from the separate bank of 44 samples that were collected) that represented the ends of the risk spectrum (see Figure 2).These samples were from male horses.All DNA samples were used with the approval of the Royal Veterinary College Clinical Research Ethical Review Board (URN 2020 2004-2).

Osteoblast Differentiation
To develop a relevant cell model, we promoted osteoblast-like cell differentiation of the primary equine fibroblast cells.Once cells reached approximately 70% confluency,

Osteoblast Differentiation
To develop a relevant cell model, we promoted osteoblast-like cell differentiation of the primary equine fibroblast cells.Once cells reached approximately 70% confluency, they were cultured in osteogenic media (standard culture media plus 10 nM dexamethasone, 28 µM ascorbic acid, and either 2 mM or 10 mM β-glycerophosphate (all Sigma-Aldrich)) for 21 days at 37 • C, 5% CO 2 , with the media replaced every 3-4 days.

Histological Staining
To determine matrix mineralisation following osteoblast-like cell differentiation, cells were stained with von Kossa (Abcam, Cambridgeshire, UK) according to the manufacturer's instructions.Calcium deposition was detected by incubating the cells with 2% Alizarin red S pH 4.2 for 5 min.Hydroxyapatite deposition was detected using the OsteoImage bone mineralisation assay (Lonza, Slough, UK).

RNA Extraction, cDNA Synthesis, and Quantitative PCR
RNA was extracted from skin cells pre and post osteogenic differentiation and following SOX11 overexpression and knockdown.RNA was extracted using Tri-reagent (Sigma-Aldrich) and purified using an RNeasy mini kit (Qiagen) according to the manufacturer's instructions.Genomic DNA contamination was removed using Invitrogen™ DNA-free™ DNA Removal Kit (ThermoFisher).A total of 1 µg of RNA was used to prepare cDNA using the Sensifast cDNA synthesis kit (Bioline, London, UK).In total, 2 µL aliquots of cDNA (corresponding to 20 ng of cDNA) were used in qPCR using SYBR green containing supermix (Bioline) on the Biorad CFX96 thermal cycler (Biorad, Hertfordshire, UK) in duplicate.Primers to equine-specific genes (note SOX11 primers detect both the human and equine genes) were designed using NCBI primer blast (https://www.ncbi.nlm.nih.gov/tools/primer-blast/)(accessed on 25 March 2019) and mfold (http://www.unafold.org/)(accessed on 25 March 2019) programs to produce amplicons of 50-150 bp devoid of secondary structures at Tm 60 • C and a melting temperature ™ of 58-62 • C and. Primer sequences are in Table S1.qPCR cycling parameters were 95 • C for 10 min, followed by 45 cycles of 95 • C for 15 s, 60 • C for 15 s, and 72 • C for 15 s.Gene expression was measured relative to the 18 s rRNA housekeeping gene using the 1/2 ∆∆CT [44].The 18 s rRNA was selected as a housekeeping gene based on its stability across the samples and experimental conditions being tested.Stabilities of 18 s rRNA, GAPDH and ACTB housekeeping genes were evaluated across the experiments using RefFinder [45] (https://www.heartcure.com.au/reffinder,(accessed on 12 February 2020) Supplementary Figure S1).

Immunocytochemistry
Skin fibroblasts pre and post osteogenic differentiation were cultured on gelatin-coated coverslips (Sigma-Aldrich), fixed in 3% paraformaldehyde for 20 min, and permeabilised for 1 h with 0.1% triton-X-100 prior to blocking in 2.5% normal horse serum (Vector Laboratories, Peterborough, UK) for 20 min at room temperature.Primary antibody incubations were performed at 4 • C overnight prior to incubation with the secondary antibody.All antibodies were used at optimised concentrations in blocking solution (Table S2).Coverslips were mounted onto glass sides using Vectashield Hardset mounting media with DAPI (Vector Laboratories) and imaged using a Nikon Eclipse Ti2 series microscope (Nikon, Surrey, UK).For RUNX2 and STAT1 staining, the intensity of nuclear staining was quantified using ImageJ [46].

Whole-Genome Sequencing
Whole-genome sequencing to identify novel DNA variants associated with fractures was performed on two Thoroughbred fracture cases and two Thoroughbred controls.The DNA samples were isolated as part of the previous study [9] used to generate the polygenic risk scores (i.e., part of the discovery cohort used to develop the "training data set" as described in Section 2.2).The fracture cases sustained distal limb fractures (one horse had a sesamoid fracture and one horse had a lateral condylar fracture) that required euthanasia while racing on a UK racecourse.The fracture controls were uninjured horses over four years of age with no history of fracture and had been racing during the same time period as the cases.Whole-genome sequencing was performed by Source Bioscience (Cambridge, UK) on 100 bp (base pair) PE (paired end) reads using an Illumina HiSeq 2000 to generate 30-36× coverage per sample.Reads were aligned to the reference genome (Equcab 3.0) and variant calls made using GATK (HaplotypeCaller) [47].Base quality score recalibration, and indel realignment and duplicate removal were carried out and SNP discovery was performed according to GATK Best Practices [48].Genomic Variant Call Format (VCF) files were combined into a single VCF file and Variant Effect Predictor (VEP) [49] used for crossgenome analysis.The effect of SNPs on transcription factor binding was predicted using TomTom [50].The minor allele frequencies were calculated from publicly available datasets, including horses representing multiple breeds (http://gong_lab.hzau.edu.cn/Animal_SNPAtlas [51] (accessed on 4 June 2023)), 101 Japanese Thoroughbreds [52], and other individual breeds, including US/European Thoroughbreds [53].The genotypes of five SNP heterozygotes and five SNP homozygotes were confirmed using the Integrative Genome Viewer [54].Histone modifications in the region containing the SNP were confirmed using ENCODE data [55] and FANNGmine (version 1.3, https://faangmine.rnet.missouri.edu/(accessed on 5 June 2023)).

Overexpression of SOX11
To determine the role of SOX11 in regulating COL3A1, SOX11 overexpression assays were performed.First, 1 × 10 6 skin fibroblasts (from one Thoroughbred and two Welsh Mountain ponies as this experiment is independent of breed) were suspended in 400 µL of standard culture media lacking penicillin/streptomycin and either 20 µg of pCMV6-AC-GFP human SOX11 (NM_003108; Origene, Rockville, MD, USA) or pCAG-eGFP [56], which was a gift from Peter Andrews (University of Sheffield, Sheffield, UK).The human and equine SOX11 proteins are 90% identical.Electroporation was performed in a 4 mm gap cuvette with one 35 ms pulse of 170 V using a BTX EM830 (BTX, San Diego, CA, USA).Then, 48 h after electroporation, antibiotic resistance was used to select stably transfected cells using 1 mg /mL G418 or 3.5 µg/mL puromycin (both Sigma-Aldrich), respectively, for up to 14 days.Following selection, undifferentiated cells were harvested for analysis or differentiated into osteoblasts for 21 days as above prior to analysis.

ELISA
To determine COL3A1 protein levels, whole-cell protein extract was prepared from undifferentiated skin cells (overexpressing SOX11 or control) using three rounds of freezethawing in extraction buffer containing 20 mM Hepes pH 7.9, 450 mM NaCl, 0.4 mM EDTA, 25% glycerol, 1 mM PMSF, and the supernatant collected by centrifugation.An equinespecific COL3A1 ELISA (EKU11662, Biomatik, DE, USA) was subsequently used to measure the amount of COL3A1 in 300 ng of total whole-cell extract.Measurements were made on a microplate absorbance reader (Mplex (Infinite M200); Tecan, Männedorf, Switzerland).

Electrophoretic Mobility Shift Assay (EMSA)
To determine if the DNA region containing the COL3A1 SNP could bind to proteins nuclear protein extract (NPE) was extracted from skin fibroblasts derived from two different Welsh Mountain ponies (the experiment is independent of breed) that were overexpressing human SOX11 (as described above) using NE-PER Nuclear and Cytoplasmic Extraction Reagents (Thermofisher) according to the manufacturer's instructions.EMSAs were carried out in accordance with the protocol of LightShift chemiluminescent EMSA kit (Thermofisher).For this purpose, 20 fmol of biotinylated doublestranded oligonucleotides (equine sequences containing either the reference or alternative DNA variant), 5 ′ -biotin-GAATCACTCAAATATTTGCCTGTCAGGTCC-3 ′ or 5 ′ -biotin-GAATCACTCAAATATTGGCCTGTCAGGTCC-3 ′ , were incubated in 20 µL of reaction mixture (10× binding buffer, 2.5% (v/v) glycerol, 5 mM MgCl 2 , 50 ng/µL poly (dI•dC), 0.05% NP-40, 50 mM KCl, 10 mM EDTA) at room temperature for 20 min.For the shift assay, labelled probes were incubated with a total of 30 µg of NPE.For the competitor assay, the nuclear extract was preincubated with a 200-fold molar excess of unlabelled probes before adding the labelled probe.For the supershift assay, the nuclear extract was incubated for 1 h at room temperature with 5.8 µg of rabbit anti-SOX11 (sab4200450) (Sigma) after adding the labelled probe.A total of 5 µL of loading buffer was then added to stop the reaction.The samples were loaded onto Invitrogen™ DNA Retardation Gels (6%) (ThermoFisher) and run in 0.5X TBE for 1 h at 100 V.The gel was transferred to Invitrogen™ BrightStar™-Plus Positively Charged Nylon Membrane (Thermofisher) for 30 min at 380 mA.DNA membrane was cross-linked for 10-15 min on a transilluminator equipped with 312 nm UV bulbs.The detection was performed using LightShift chemiluminescent EMSA kit (Thermofisher) according to the manufacturer's instructions.Technical controls for the assay were carried out using Epstein-Barr Nuclear Antigen probes and protein extracts as provided in the kit.

Construction of the Luciferase Reporter Plasmids
To determine if the DNA region containing the COL3A1 SNP had promoter activity, luciferase assays were performed.The luciferase reporter plasmid was generated using a region 5 ′ upstream of the COL3A1 gene (ECA18:65,598,913-65,602,132 EquCab 3.0) from −3186 bp to +34 bp relative to the transcriptional start site (see Supplementary Figure S2).The sequence carrying the reference allele (TT) was PCR-amplified from genomic homozygous TT allele Thoroughbred DNA using the primers: Forward 5 ′ -ctattggtaccGAGGAATTGGTCACAGGAATCAC-3 ′ and Reverse 5 ′ -ctggaagcttGCAGTTCAAAGTAGCACCATC-3 ′ , showing restriction sites for KpnI and HindIII, respectively.The alternative allele (GG) fragment was created from the reference allele (TT) PCR product using the Q5 ® Site-Directed Mutagenesis Kit (New England BioLabs, Ipswich, MA, USA).The sequence of primers carrying out the alternative allele, Forward 5 ′ -CTCAAATATTGGCCTGTCAGG-3 ′ and Reverse 5 ′ -TGATTCCTGTGACCAATTC-3 ′ , were designed using online NEB primer design software, NEBaseChanger™ (https:// nebasechanger.neb.com/(accessed on 12 February 2021).pGL4.10[Luc2](#E6651 Promega, Southampton, UK) and PCR products were digested with KpnI and HindIII, and columnpurified.The KpnI-HindIII fragments with the reference or alternative allele were then cloned into the previously digested pGL4.10[Luc2]vector.In each step, PCRs were per-formed by Q5 High-fidelity DNA Polymerase (New England Biolabs), and PCR products were verified by gel electrophoresis and sequencing (performed by DNA Sequencing and Service, University of Dundee, UK).In each case, ligation products were transformed in DH5α competent cells (ThermoFisher), which were then plated onto ampicillin-containing (100 µg/mL) LB agar (Sigma-Aldrich), and individual colonies were analysed by restriction enzyme digest and sequencing after 24 h of growth at 37 • C.

Luciferase Assays
Skin fibroblasts from six Thoroughbreds (0.3 × 10 5 ) with a range of overall risk scores were transfected in 24-well plates using Lipofectamine3000 DNA transfection reagent (ThermoFisher).The cells in each well were co-transfected with 1.4 µg of reference allele (TT) or alternative allele (GG) COL3A1-5 ′ UTR firefly luciferase reporter vector and 0.14 µg of internal control Renilla luciferase vector, pGL4.74[hRluc/TK](Promega), as the control for transfection efficiency.pGL4.10[Luc2]vector with no insertion was used as a promoterless negative control.Then, 48 h after transfection, cells were harvested and assayed using Dual-Glo ® luciferase assay system (Promega).The luciferase activities were measured using luminescence reader (Mplex (Infinite M200); Tecan).Promoter activity was expressed as relative firefly luciferase activity after normalization to Renilla luciferase activity.

Statistical Analysis
Statistical analysis of qPCR, immunocytochemical quantification, ELISA, and luciferase data was performed using SPSS (v29.0,IBM Corp, Armonk, NY, USA).Shapiro-Wilks normality testing was performed to confirm a normal distribution of the data (raw or log-transformed data), and Levene's test was used to confirm equal variance between the groups.For comparisons of two groups, Student's t-test (unpaired, two-tailed) was used, and for more than two groups, ANOVA was used with Tukey's post hoc test.A chi-square test was used to measure the association of the SNP with fractures.In all statistical tests, a significance threshold of p < 0.05 was set.

Application of a Polygenic Risk Score to Select Thoroughbred Cells for Use
Skin fibroblasts were isolated from 44 Thoroughbred horses of unknown fracture status, DNA was extracted, and cells banked.Genotyping of the samples was performed to calculate their relative polygenic risk score for fracture (using their genotypes across the 50,000 SNPs in the Equine SNP50).We then selected the three samples with the lowest genetic risk and the three samples with the highest genetic risk for use in osteoblast differentiation.These samples were at the ends of the risk spectrum when compared with the scores from the 382 Thoroughbreds representing the general population (Figure 2).

Skin Fibroblasts Can Differentiate into Osteoblast-like Cells In Vitro
To generate a relevant cell type to study, we differentiated the skin fibroblasts into osteoblast-like cells.Osteoblast differentiation was carried out using two concentrations of β-glycerophosphate, 2 mM, and 10 mM.We demonstrated that after 21 days of culture, the skin fibroblasts expressed a range of different osteoblast-associated genes, and this was not significantly affected by the level of β-glycerophosphate (Figure 3A).Furthermore, both 2 mM and 10 mM β-glycerophosphate result in the production of a mineralised matrix (Figure 3B).No mineralised matrix was produced under normal culture conditions (Supplementary Figure S3). of β-glycerophosphate, 2 mM, and 10 mM.We demonstrated that after 21 days of culture, the skin fibroblasts expressed a range of different osteoblast-associated genes, and this was not significantly affected by the level of β-glycerophosphate (Figure 3A).Furthermore, both 2 mM and 10 mM β-glycerophosphate result in the production of a mineralised matrix (Figure 3B).No mineralised matrix was produced under normal culture conditions (Supplementary Figure S3).Images are representative of six biological replicates using cells at passages 5-6.

Candidate Gene Expression Analysis Reveals a Significant Difference in STAT1 and COL3A1 Gene Expression in Skin-Derived Osteoblasts from High-and Low-Risk Thoroughbred Horses
Previous work has demonstrated that a region on ECA18 was significantly associated with fracture risk [9].This region contains 27 protein-coding genes with a known function.Of these, 11 have previously published associations with bone formation or fracture, as described in the introduction.We therefore measured the expression of these 11 candidate genes in skin-derived osteoblasts from the three horses at the highest and three horses at the lowest genetic risk of fracture (as shown in Figure 2).STAT1 was expressed at significantly greater levels in cells derived from horses at high risk of fracture than in horses at low risk of fracture, and COL3A1 was expressed at significantly greater levels in cells derived from horses at low risk of fracture (Figure 4).No expression of ZNF804A was detected, and no significant differences in the expression of the other genes between high-and low-risk horses were found.
STAT1 can sequester the essential regulator of osteoblast differentiation RUNX2 in the cytoplasm of cells.We therefore examined the cellular localisation of STAT1 and RUNX2 in either the cells derived from high-and low-risk horses, but no significant differences were observed (Figure 5).Although STAT1 is ubiquitously expressed in both pre-and postosteogenic differentiation, we have previously demonstrated the specificity of the STAT1 antibody to the equine protein [57,58].
the lowest genetic risk of fracture (as shown in Figure 2).STAT1 was expressed at significantly greater levels in cells derived from horses at high risk of fracture than in horses at low risk of fracture, and COL3A1 was expressed at significantly greater levels in cells derived from horses at low risk of fracture (Figure 4).No expression of ZNF804A was detected, and no significant differences in the expression of the other genes between highand low-risk horses were found.STAT1 can sequester the essential regulator of osteoblast differentiation RUNX2 in the cytoplasm of cells.We therefore examined the cellular localisation of STAT1 and RUNX2 in either the cells derived from high-and low-risk horses, but no significant differences were observed (Figure 5).Although STAT1 is ubiquitously expressed in both preand postosteogenic differentiation, we have previously demonstrated the specificity of the STAT1 antibody to the equine protein [57,58].To identify novel SNPs that may be responsible for the differential expression of STAT1 and COL3A1, we carried out preliminary whole-genome sequencing on two fracture cases and two controls.This did not reveal any exonic SNPs in the associated region on chromosome 18 that were predicted to affect protein structure or function.However, there were a large number of intronic/intergenic variants.One variant was located 3154 bp upstream of the transcriptional start site (TSS) of COL3A1 in (position ECA18:65598945 T>G EquCab 3.0).Both cases were GG, one control was TT and one control was GT.This region is highly conserved between humans and horses, and ENCODE data demonstrate that the human region has markers of an enhancer/promoter (e.g., multiple transcription factor binding sites and histone H3 lysine 4 methylation, H3K4me).The Functional Annotation of Animal Genomes (FAANG) database also demonstrates the presence of H3K4me and H3K27ac (histone H3 lysine 27 acetylation) in the region covering the SNP in adipose tissue, ovaries, and skin.
The Genomic Evolutionary Rate Profiling (GERP) conservation score [59] of the SNP is −0.28.The minor allele frequency (MAF) was 0.32 in our 91 case horses and 0.22 in our 86 controls (i.e., 0.27 on average).Across 25 different breeds of horse [51], the MAF was 0.13.In many breeds, the MAF was either very low or absent (absent in Arabians, Belgians, Clydesdales, Icelandics, Shetlands, and Welsh ponies), whereas in US/European Thoroughbreds, it was 0.53, similar to Japanese Thoroughbreds (0.5).It was also very high in Quarter horses (0.66) (Supplementary Table S3).The genotype of ten Thoroughbreds with heterozygous or homozygous genotypes was confirmed to be accurate using the Integrative Genome Viewer.

WGS Revealed a SNP in the Region Upstream of COL3A1 That Is Predicted to Result in the Loss of a SOX11 Binding Site and Is Significantly Associated with Fracture Risk
To identify novel SNPs that may be responsible for the differential expression of STAT1 and COL3A1, we carried out preliminary whole-genome sequencing on two fracture cases and two controls.This did not reveal any exonic SNPs in the associated region on chromosome 18 that were predicted to affect protein structure or function.However, there were a large number of intronic/intergenic variants.One variant was located 3154 bp upstream of the transcriptional start site (TSS) of COL3A1 in (position ECA18:65598945 T>G EquCab 3.0).Both cases were GG, one control was TT and one control was GT.This region is highly conserved between humans and horses, and ENCODE data demonstrate that the human region has markers of an enhancer/promoter (e.g., multiple transcription factor binding sites and histone H3 lysine 4 methylation, H3K4me).The Functional Annotation of Animal Genomes (FAANG) database also demonstrates the presence of H3K4me and H3K27ac (histone H3 lysine 27 acetylation) in the region covering the SNP in adipose tissue, ovaries, and skin.
The Genomic Evolutionary Rate Profiling (GERP) conservation score [59] of the SNP is −0.28.The minor allele frequency (MAF) was 0.32 in our 91 case horses and 0.22 in our 86 controls (i.e., 0.27 on average).Across 25 different breeds of horse [51], the MAF was 0.13.In many breeds, the MAF was either very low or absent (absent in Arabians, Belgians, Clydesdales, Icelandics, Shetlands, and Welsh ponies), whereas in US/European Thoroughbreds, it was 0.53, similar to Japanese Thoroughbreds (0.5).It was also very high in Quarter horses (0.66) (Supplementary Table S3).The genotype of ten Thoroughbreds with heterozygous or homozygous genotypes was confirmed to be accurate using the Integrative Genome Viewer.
The SNP is predicted to change a transcription factor binding motif from a SOX9/10/11 binding site to a KLF13 binding site (Figure 6A).Allelic discrimination assays The SNP is predicted to change a transcription factor binding motif from a SOX9/10/11 binding site to a KLF13 binding site (Figure 6A).Allelic discrimination assays were performed (Figure 6B,C) and demonstrated that the G variant is significantly associated with fractures (p < 0.01).

SOX11 Is Expressed in Undifferentiated and Osteoblast-Differentiated Skin Fibroblasts but Is
Not Differentially Expressed between High-and Low-Risk Horses SOX9 and 10 are both members of the SOXE group and have overlapping functions [60].SOX9 is a master regulator of cartilage differentiation and is usually decreased in osteoblast differentiation [61].In contrast, SOX11, a member of the SOXC group [62], is required for osteoblast precursor survival, proliferation, and differentiation [63].We demonstrated that SOX11 is expressed in skin fibroblasts before and after differentiation into osteoblast-like cells, but there is no significant difference in SOX11 expression levels between high-and low-risk horses (Figure 7).
were performed (Figure 6B,C) and demonstrated that the G variant is significantly associated with fractures (p < 0.01).

SOX11 Is Expressed in Undifferentiated and Osteoblast-Differentiated Skin Fibroblasts but Is Not Differentially Expressed between High-and Low-Risk Horses
SOX9 and 10 are both members of the SOXE group and have overlapping functions [60].SOX9 is a master regulator of cartilage differentiation and is usually decreased in osteoblast differentiation [61].In contrast, SOX11, a member of the SOXC group [62], is required for osteoblast precursor survival, proliferation, and differentiation [63].We demonstrated that SOX11 is expressed in skin fibroblasts before and after differentiation into osteoblast-like cells, but there is no significant difference in SOX11 expression levels between high-and low-risk horses (Figure 7).

SOX11 Modulation in Skin Fibroblasts Results in Significant Changes in the Expression of COL3A1
We next determined if SOX11 can regulate COL3A1 expression by modulating its expression.Skin fibroblasts were stably transfected to overexpress the human SOX11 gene, as the human and equine SOX11 proteins are 90% identical.In the undifferentiated skin fibroblasts, this resulted in a large, significant increase in SOX11 expression (Figure 8Ai) and a significant increase in COL3A1 expression (Figure 8Aii).This suggests that SOX11

SOX11 Modulation in Skin Fibroblasts Results in Significant Changes in the Expression of COL3A1
We next determined if SOX11 can regulate COL3A1 expression by modulating its expression.Skin fibroblasts were stably transfected to overexpress the human SOX11 gene, as the human and equine SOX11 proteins are 90% identical.In the undifferentiated skin fibroblasts, this resulted in a large, significant increase in SOX11 expression (Figure 8A(i)) and a significant increase in COL3A1 expression (Figure 8A(ii)).This suggests that SOX11 regulates COL3A1.Following osteoblast differentiation of the skin fibroblasts, there was no significant difference in the expression of COL3A1 in cells overexpressing SOX11 compared with the control cells.This is because the relative level of SOX11 overexpression was reduced after differentiation, which may reflect a lower activity of the CMV promoter (used to drive overexpression) in osteoblasts (Figure 8A).No differences were observed in the total protein level or cellular localisation of COL3A1 in undifferentiated skin fibroblasts following SOX11 overexpression (Supplementary Figure S4).However, the lentiviral transfection of undifferentiated skin fibroblasts to express an shRNA to SOX11 produced a significant decrease in COL3A1 (Figure 8B).Taken together, these data demonstrate that SOX11 can regulate COL3A1 expression.

The Region Containing the SNP Binds to Nuclear Proteins from Equine Cells
To determine if the region of DNA containing the novel SNP could bind to nuclear proteins, EMSA was performed using nuclear extracts from undifferentiated skin fibroblasts overexpressing SOX11.We observed a gel shift indicating protein binding to the probe, whether the probe contained the TT control allele or the fracture-associated GG allele.However, the addition of a SOX11 antibody did not result in a supershift (Figure

The Region Containing the SNP Binds to Nuclear Proteins from Equine Cells
To determine if the region of DNA containing the novel SNP could bind to nuclear proteins, EMSA was performed using nuclear extracts from undifferentiated skin fibroblasts overexpressing SOX11.We observed a gel shift indicating protein binding to the probe, whether the probe contained the TT control allele or the fracture-associated GG allele.However, the addition of a SOX11 antibody did not result in a supershift (Figure 9).EMSA revealed that a probe containing either the wild-type (TT) or variant (GG) allele was capable of binding to a nuclear protein as seen by the shift (arrow).The use of an unlabelled competitor probe prevented this shift from occurring.The addition of an antibody to SOX11 did not result in a detectable supershift.Image representative of results observed using nuclear extracts from fibroblasts isolated from two different ponies.The boxed area represents a technical control for the assay, carried out using a biotin-labelled probe for Epstein-Barr Nuclear Antigen (EBNA, labelled E in the figure), an unlabelled probe, and an EBNA protein extract.The image is of the entire full-size blot and is not cropped.

The Region Containing the SNP Has Promoter Activity, and the SNP Affects Reporter Gene Expression in a Genetic-Background-Dependent Manner
To determine if the region of DNA containing the novel SNP had any promoter activity, we carried out luciferase assays in undifferentiated skin fibroblasts.We demonstrated that the region upstream of the equine COL3A1 gene, including the SNP, does have promoter activity (Figure 10).However, the effect of the SNP is variable and depends on the genetic background of the individual.In skin fibroblasts derived from some horses, the presence of the SNP caused a reduction in promoter activity (Figure 10A,B).In contrast, in other horses, the presence of the SNP caused an increase in promoter activity (Figure 10C,D) and, in some horses, the presence of the SNP had no significant effect on promoter activity (Figure 10E,F).There was no correlation between the result and the overall risk score of the horse from which the cells were derived.Each assay was carried out on three to six technical replicates performed simultaneously.However, additional replicates were also performed using the same donor horse but on replicates set up at three different times.For the same donor cell line, the results are consistent between these independent experiments (Supplementary Figure S5).EMSA revealed that a probe containing either the wild-type (TT) or variant (GG) allele was capable of binding to a nuclear protein as seen by the shift (arrow).The use of an unlabelled competitor probe prevented this shift from occurring.The addition of an antibody to SOX11 did not result in a detectable supershift.Image representative of results observed using nuclear extracts from fibroblasts isolated from two different ponies.The boxed area represents a technical control for the assay, carried out using a biotin-labelled probe for Epstein-Barr Nuclear Antigen (EBNA, labelled E in the figure), an unlabelled probe, and an EBNA protein extract.The image is of the entire full-size blot and is not cropped.

The Region Containing the SNP Has Promoter Activity, and the SNP Affects Reporter Gene Expression in a Genetic-Background-Dependent Manner
To determine if the region of DNA containing the novel SNP had any promoter activity, we carried out luciferase assays in undifferentiated skin fibroblasts.We demonstrated that the region upstream of the equine COL3A1 gene, including the SNP, does have promoter activity (Figure 10).However, the effect of the SNP is variable and depends on the genetic background of the individual.In skin fibroblasts derived from some horses, the presence of the SNP caused a reduction in promoter activity (Figure 10A,B).In contrast, in other horses, the presence of the SNP caused an increase in promoter activity (Figure 10C,D) and, in some horses, the presence of the SNP had no significant effect on promoter activity (Figure 10E,F).There was no correlation between the result and the overall risk score of the horse from which the cells were derived.Each assay was carried out on three to six technical replicates performed simultaneously.However, additional replicates were also performed using the same donor horse but on replicates set up at three different times.For the same donor cell line, the results are consistent between these independent experiments (Supplementary Figure S5).We were not able to perform the luciferase assays in the skin cells following osteogenic differentiation as the cells cannot be transfected.

Discussion
In this study, we employed polygenic risk scores using genome-wide information to facilitate the establishment of an in vitro system to study bone gene regulation in horses at high and low genetic risks of fracture.These samples were not part of the original cohort of cases and controls used to derive the risk score.The validation of the polygenic risk score in a new cohort of cases and controls is currently ongoing.Skin fibroblasts were selected as the starting cell type due to their accessibility and ease of culture, which allowed us to create a bank of cells from a large number of horses representing the general Thoroughbred population and select cells which were at very high or very low genetic risk.Primary osteoblasts could not be used, as they have very limited proliferative capacity in vitro [64].Skin fibroblasts have previously been shown to share properties with MSCs in vitro and can be driven to differentiate into mineralising osteoblasts [65,66].We confirm this finding using horse skin fibroblasts.In future studies, it may be possible to use other cells as the starting population, such as primary MSCs [67] or induced pluripotent stem cells (iPSCs) [68][69][70].We selected skin fibroblasts only from male Thoroughbreds to ensure we captured samples at the ends of our risk spectrum and because some previous studies have shown an association of catastrophic fracture with sex [3,6,71], but future studies using larger numbers of female and male cells would be warranted.Our cell-based system was established to allow us to control for environmental variability, but it is well known that other factors can influence cell behaviour in vitro, e.g., donor age.This study only used cells from young Thoroughbred horses between the ages of two and four years old, but we cannot rule out that other biases may have existed in the cells which affected the results.We were not able to perform the luciferase assays in the skin cells following osteogenic differentiation as the cells cannot be transfected.

Discussion
In this study, we employed polygenic risk scores using genome-wide information to facilitate the establishment of an in vitro system to study bone gene regulation in horses at high and low genetic risks of fracture.These samples were not part of the original cohort of cases and controls used to derive the risk score.The validation of the polygenic risk score in a new cohort of cases and controls is currently ongoing.Skin fibroblasts were selected as the starting cell type due to their accessibility and ease of culture, which allowed us to create a bank of cells from a large number of horses representing the general Thoroughbred population and select cells which were at very high or very low genetic risk.Primary osteoblasts could not be used, as they have very limited proliferative capacity in vitro [64].Skin fibroblasts have previously been shown to share properties with MSCs in vitro and can be driven to differentiate into mineralising osteoblasts [65,66].We confirm this finding using horse skin fibroblasts.In future studies, it may be possible to use other cells as the starting population, such as primary MSCs [67] or induced pluripotent stem cells (iPSCs) [68][69][70].We selected skin fibroblasts only from male Thoroughbreds to ensure we captured samples at the ends of our risk spectrum and because some previous studies have shown an association of catastrophic fracture with sex [3,6,71], but future studies using larger numbers of female and male cells would be warranted.Our cell-based system was established to allow us to control for environmental variability, but it is well known that other factors can influence cell behaviour in vitro, e.g., donor age.This study only used cells from young Thoroughbred horses between the ages of two and four years old, but we cannot rule out that other biases may have existed in the cells which affected the results.
To differentiate the skin fibroblasts into osteoblasts, we tested two concentrations of β-glycerophosphate.An amount of 10 mM of β-glycerophosphate has been commonly applied to equine-induced pluripotent stem cells (iPSCs) [72], equine MSCs [73][74][75], and human and mouse skin fibroblasts [65,66].However, there have been reports that levels of β-glycerophosphate over 2 mM can lead to nonphysiological mineral deposition [76].In this study, we saw no significant effect of β-glycerophosphate concentration on gene expression or matrix mineralisation, but additional biological replicates may enable the detection of smaller differences in differentiation.
We used the skin-derived osteoblasts to carry out a candidate gene expression analysis of genes that lie within a region on ECA18 that has previously been shown to be associated with catastrophic fracture [9].All of the candidate genes were expressed in the skin-derived osteoblasts, with the exception of ZNF804A, which was previously suggested to be a candidate gene for fracture risk [9] but was not expressed in our experimental system.Of the candidate genes examined, COL3A1 and STAT1 were differentially expressed in the high-and low-risk cells.In the future, analyses using RNA sequencing to examine global gene expression profiles would be beneficial.As we had a limited number of samples (three high-risk and three low-risk), future work using larger cohorts may detect other differentially expressed genes.However, we were limited in the samples we had available as we wanted to ensure that we selected cells that were at the ends of the risk spectrum [38].
STAT1 −/− mice have a significant increase in bone mineral density, mineral content, and bone growth [30].This is due to increased osteoblast differentiation [28].STAT1 acts to sequester the essential regulator of osteoblast differentiation RUNX2 in the cytoplasm of cells.Therefore, in the absence of STAT1, more RUNX2 enters the nucleus to activate osteoblast-associated gene transcription [28].Furthermore, it has been shown in mice that STAT1 inhibition accelerates fracture healing [29].RUNX2 expression and activity need to be tightly controlled, as overexpression of Runx2 leads to mice that have osteopenia and suffer from multiple fractures [77].Therefore, in our study, the higher level of STAT1 observed in the high-risk cells may contribute to the misregulation of other bone-specific genes and predisposition to fracture.However, we were unable to demonstrate any differences in the cellular localisation of STAT1 or RUNX2 in the high-or low-risk samples.Furthermore, our WGS was not able to detect any putative, functional SNPs in the vicinity of the STAT1 gene.This may reflect the small number of WGS samples used.In the future, it would be beneficial to perform WGS on more samples to increase our power to detect SNPs and/or other variants (e.g., structural variants).However, our results may also suggest that other factors are regulating the expression of STAT1, such as long-range enhancers.Further work to determine the mechanisms behind the differential expression of STAT1 and identify any downstream effects is therefore required.COL3A1 −/− mice are not viable.However, COL3A1 −/+ mice have reduced bone volume, bone density, and trabecular thickness, and COL3A1 −/− MSCs have reduced osteoblast differentiation compared with wild-type MSCs [22].In humans, the loss of the function mutations of COL3A1 results in Ehlers-Danlos Syndrome, and these patients have a tendency to fracture, low bone mass, and abnormal bone structures [78].Therefore, in our study, the lower levels of COL3A1 in the cells derived from horses at high risk of fracture may contribute to their risk.COL3A1 regulation is not well-defined, but our WGS data revealed an SNP 3154 bp upstream of the COL3A1 transcriptional start site.As our WGS was performed on only two cases and two controls, we carried out allelic discrimination genotyping in a further 86 controls and 91 cases and found that the SNP was significantly associated with fracture.
The SNP exists in a region which contains histone modifications in multiple equine tissues, suggesting promoter/enhancer function.No histone modifications were found in the FAANG database for bone tissue, but these tissues had lower-quality metrics, and the data were more limited than for other tissues [79].Although its GERP score does not indicate high levels of evolutionary conservation, there are limitations to using this approach to identify deleterious mutations, particularly in noncoding sites [80].The MAF of the SNP varied between breeds, with most breeds showing very low frequencies and Thoroughbreds and Quarter horses showing much higher frequencies.The incidence of catastrophic fracture is high in both of these breeds [81].Interestingly, the MAF appeared to be lower in our UK Thoroughbreds compared with variant catalogues of international Thoroughbreds, which may reflect some population differences.The higher allele frequencies in some Thoroughbred populations and the small difference in the cases versus the controls in the UK population (0.32 versus 0.22) suggest it is unlikely to be a variant of high effect.However, complex diseases are most likely explained by multiple variants of small-to-moderate effect size, and while minor alleles are often the risk alleles [82], the major allele within a breed or population can sometimes be the risk allele [83,84].
Bioinformatics tools suggested that the SNP may decrease SOX11 binding and promote KLF13 binding.KLF13 is a transcription factor involved in erythropoiesis [85].It has also been shown to act as a tumour suppressor in colorectal cancer cells [86] and to be a novel regulator of heart development [87].It is not clear if KLF13 has any role in bone formation or function, and we did not measure KLF13 expression in this study.However, KLF13 has been shown to be upregulated by dexamethasone during osteoblast differentiation [88] and is expressed in developing mouse skeletons [89].The potential effect of SNPs on KLF13 binding and KLF13 regulation of COL3A1 therefore warrants future investigation.
Although SOX11 is involved in bone differentiation [63,90], it has not previously been shown to regulate COL3A1.In skin fibroblasts, the stable knockdown of SOX11 through an integrated shRNA resulted in a significant decrease in COL3A1 expression.Likewise, the overexpression of SOX11 produced a significant increase in COL3A1 gene expression.This was not reflected at the protein level; however, protein and RNA levels do not always correlate, particularly in response to perturbation [91].This may suggest that COL3A1 is also regulated at the level of translation.Following osteogenic differentiation, no significant effect of SOX11 overexpression was observed, but this likely reflects the lower levels of SOX11 overexpression following differentiation.Promoter activity has been correlated with cell type, and CMV in particular shows variable expression in different cells [92].Future work to test the activity of different promoters in equine osteoblast cells is required to overcome this limitation.
The promoter region of the equine COL3A1 gene has not previously been defined.EMSA using equine nuclear extracts from SOX11 overexpressing cells demonstrated protein binding to the region of DNA containing the SNP.Protein binding was observed using DNA containing both the control (TT) and alternative (GG) allele.Fainter binding was observed in the presence of the GG allele, but we cannot rule out the possibility that this reflects technical variation in the assay rather than a true reduction in binding.We were unable to detect a supershift using an antibody to SOX11 and are therefore unable to confirm the identity of the binding proteins.This may reflect a genuine lack of SOX11 binding to our region of interest, or it may reflect a lack of detection.SOX11 has previously been reported to contain a conserved domain that inhibits DNA binding in vitro and prevents detectable binding using EMSAs [62,93], although other studies have detected SOX11 binding using in vitro translated proteins [94].We used nuclear extracts from SOX11 overexpressing cells to ensure that robust levels of SOX11 were present.However, the SOX11 protein is tagged with GFP.This may interfere with SOX11 antibody binding and may account for the lack of supershift.Alternatively, it may reflect the weak binding of SOX11 under the conditions used in this in vitro assay.The GG allele is predicted to result in both the loss of the SOX11 binding site and the gain of a KLF13 binding site.As both proteins are of a similar size, it is not possible to determine if the binding proteins to the TT probe are the same as those binding to the GG probe without further analysis.Chromatin immunoprecipitation is required to determine if SOX11 can bind to the COL3A1 region within cells.
The SOX11 knockdown and overexpression assays were performed using cells from both Thoroughbreds and Welsh Mountain ponies and produced similar results.The EMSA was performed using cells from Welsh Mountain ponies.Welsh Mountain ponies do not carry the COL3A1 variant (Supplementary Table S3) and do not commonly have fractures due to bone overloading and are therefore may be at low genetic risk of fracture, or this may reflect the disciplines they are used in (i.e., not racing).In this study, pony cells were only used in experiments that were independent of the breed of horse, as they do not determine the effect of genetic variants but only demonstrate that SOX11 can regulate COL3A1 expression and that the region of interest can bind to nuclear proteins.Such experiments are usually performed in either immortalized/cancerous cell lines or in laboratory animal cells to understand human biology [63,90].This is because gene functions are well-conserved between species and the outcomes are unlikely to be breed-specific.
Luciferase assays in undifferentiated skin cells demonstrated that the proximal region (from −3186 to +34) of COL3A1 containing the SNP of interest does have promoter activity.We attempted to perform the luciferase assays in an osteoblast cell type, but it was not possible to transfect the skin cells following osteoblast differentiation, possibly due to the presence of the mineralised matrix they produce.To our knowledge, there are no reports of a successful transfection of cells following osteogenic culture and mineralisation.There are also very few reports on the successful culture of primary equine osteoblasts, and we did not have access to suitable tissue samples to derive these cells.Attempts to use primary equine osteoblasts for these experiments were therefore outside the scope of this paper.
Furthermore, the luciferase assays demonstrated that the effect of the SNP was dependent on the genetic background of the horse, with different effects observed using cells from different horses.This reflects the complexity of studying complex genetic conditions, even in an in vitro system where environmental effects are minimised.The results suggest that multiple genetic modifications may interplay to regulate COL3A1 expression, and further work to identify other pathways that may converge on COL3A1 is required.In addition, the findings demonstrate that although the SNP was significantly associated with fractures, it is unlikely to be a better predictor of fracture risk than the use of a polygenic risk score that utilizes a genome-wide approach.
The major limitation of this study was the small sample sizes, as described.In addition, the luciferase assays, EMSAs, and SOX11 modulation experiments were only successfully performed in the undifferentiated fibroblasts due to the technical limitations described and the lack of an equine osteoblast cell line.However, this work is the first demonstration that SOX11 is capable of regulating COL3A1 and demonstrates that a specific region upstream of COL3A1 has promoter activity in vitro.Furthermore, the fibroblasts used have osteogenic differentiation potential.Bone remodelling and fracture repair involve the activities of both mature osteoblasts and progenitor cells [95], and future work to understand the regulation and role of COL3A1 in both cell types during bone remodelling is warranted.

Conclusions
Increasing our understanding of the genetic factors and biological processes involved in fracture risk may lead to novel methods to reduce risk in the future.Here, we established a cell-based system to measure the expression of candidate genes present in a fractureassociated region on ECA18 [9,11].We demonstrated that one of the genes within this region, COL3A1, is differentially expressed in cells from horses at high and low genetic risk of fracture and identified a novel SNP upstream of COL3A1 that lies in a promoter region.The SNP may affect SOX11 binding, and we demonstrate that SOX11 modulates COL3A1 expression.As a complex disease, this SNP is likely to play only a small role in fracture risk, but the findings demonstrate the feasibility of using cell models to identify novel DNA variants and biological processes associated with equine diseases.

Supplementary Materials:
The following supporting information can be downloaed at: https: //www.mdpi.com/article/10.3390/ani14010116/s1,Table S1: List of primer sequences used in qPCR; Table S2: Antibodies used in immunocytochemistry; Table S3: Minor allele frequencies of the SNP in different breeds;   Informed Consent Statement: Not applicable.

Animals 2024 , 24 Figure 1 .
Figure 1.An overview of the experimental approach used demonstrating the order of the experiments.Red cells indicate cells from horses with a high polygenic risk score (PRS), and blue cells indicate cells from horses with a low PRS.Grey indicates cells from horses and ponies with either unknown or a range of risk scores.Created with BioRender.com(accessed on 6 June 2023).

Figure 1 .
Figure 1.An overview of the experimental approach used demonstrating the order of the experiments.Red cells indicate cells from horses with a high polygenic risk score (PRS), and blue cells indicate cells from horses with a low PRS.Grey indicates cells from horses and ponies with either unknown or a range of risk scores.Created with BioRender.com(accessed on 6 June 2023).

Figure 2 .
Figure 2. The polygenic risk scores of 382 Thoroughbreds of unknown fracture status (representing the general population) with arrows indicating the scores for the skin cells selected as the three highest (red) and three lowest (blue) risks.

Figure 2 .
Figure 2. The polygenic risk scores of 382 Thoroughbreds of unknown fracture status (representing the general population) with arrows indicating the scores for the skin cells selected as the three highest (red) and three lowest (blue) risks.

Figure 3 .
Figure 3. Skin fibroblasts can differentiate into mineralising osteoblasts: (A) Box and whisker plots showing osteoblast-associated gene expression in skin fibroblasts from six individual Thoroughbred horses.There are no significant differences in the expression of osteoblast associated genes following 21 days of differentiation in media with either 2 mM (light grey bars) or 10 mM β-glycerophosphate (dark grey bars).Expression is shown relative to the housekeeping gene.(B) A mineralised matrix is deposited following osteoblast differentiation in either 2 mM or 10 mM β-glycerophosphate.Positive staining for calcium deposition is shown in black for von Kossa and red for Alizarin Red S. Positive hydroxyapatite staining detected under a fluorescent light is shown in green.Scale bars = 100 µm.Images are representative of six biological replicates using cells at passages 5-6.

Figure 3 .
Figure 3. Skin fibroblasts can differentiate into mineralising osteoblasts: (A) Box and whisker plots showing osteoblast-associated gene expression in skin fibroblasts from six individual Thoroughbred horses.There are no significant differences in the expression of osteoblast associated genes following 21 days of differentiation in media with either 2 mM (light grey bars) or 10 mM β-glycerophosphate (dark grey bars).Expression is shown relative to the housekeeping gene.(B) A mineralised matrix is deposited following osteoblast differentiation in either 2 mM or 10 mM β-glycerophosphate.Positive staining for calcium deposition is shown in black for von Kossa and red for Alizarin Red S. Positive hydroxyapatite staining detected under a fluorescent light is shown in green.Scale bars = 100 µm.Images are representative of six biological replicates using cells at passages 5-6.

Figure 4 .
Figure 4. Expression of candidate genes from the fracture-associated region on ECA18 in skin-cellderived osteoblasts.There is a significant difference in the expression of STAT1 and COL3A1 between cells derived from high-and low-risk Thoroughbred horses.Expression is shown relative to the housekeeping gene.Error bars represent the s.e.m of cells derived from three individual horses, each differentiated into osteoblasts in duplicate (once in 2 mM β-glycerophosphate and once in 10 mM β-glycerophosphate). * p < 0.05.N.D = expression not detected.Cells were at passages 5-6.

Figure 4 .
Figure 4. Expression of candidate genes from the fracture-associated region on ECA18 in skin-cellderived osteoblasts.There is a significant difference in the expression of STAT1 and COL3A1 between cells derived from high-and low-risk Thoroughbred horses.Expression is shown relative to the housekeeping gene.Error bars represent the s.e.m of cells derived from three individual horses, each differentiated into osteoblasts in duplicate (once in 2 mM β-glycerophosphate and once in 10 mM β-glycerophosphate). * p < 0.05.N.D = expression not detected.Cells were at passages 5-6.

3. 4 .
WGS Revealed a SNP in the Region Upstream of COL3A1 That Is Predicted to Result in the Loss of a SOX11 Binding Site and Is Significantly Associated with Fracture Risk

Figure 5 .
Figure 5. STAT1 and RUNX2 have similar cellular localisations in cells derived from high-risk and low-risk horses: (A) Immunocytochemistry demonstrates STAT1 (red) expression in undifferentiated Thoroughbred skin fibroblasts and following osteoblast differentiation, but RUNX2 (red) is only present following differentiation.DAPI staining of the nucleus is shown in blue.Scale bar = 50 µm.Images are representative of replicates using cells derived from three different donors.(B) Quantification of the intensity of the staining in the nuclei shows no significant differences between cells derived from high-and low-risk horses.Error bars represent the s.e.m from cells derived from three different donors.

Figure 5 .
Figure 5. STAT1 and RUNX2 have similar cellular localisations in cells derived from high-risk and low-risk horses: (A) Immunocytochemistry demonstrates STAT1 (red) expression in undifferentiated Thoroughbred skin fibroblasts and following osteoblast differentiation, but RUNX2 (red) is only present following differentiation.DAPI staining of the nucleus is shown in blue.Scale bar = 50 µm.Images are representative of replicates using cells derived from three different donors.(B) Quantification of the intensity of the staining in the nuclei shows no significant differences between cells derived from high-and low-risk horses.Error bars represent the s.e.m from cells derived from three different donors.

Figure 6 .
Figure 6.An associated SNP of interest lies upstream of COL3A1.(A) A SNP identified through preliminary whole-genome sequencing is predicted to cause the loss of a SOX9/10/11 binding site and the gain of a KLF13 binding site.Red arrow indicates the position of the SNP.Analysis output from TomTom.(B) Representative allelic discrimination plot of the COL3A1 upstream SNP.Blue indicates TT homozygotes, green indicates TG heterozygotes, and red indicates GG homozygotes.(C) The percentage distribution of the genotypes in 86 Thoroughbred horses without a fracture (controls) and 91 Thoroughbred horses with catastrophic fractures (cases).

Figure 6 .
Figure 6.An associated SNP of interest lies upstream of COL3A1.(A) A SNP identified through preliminary whole-genome sequencing is predicted to cause the loss of a SOX9/10/11 binding site and the gain of a KLF13 binding site.Red arrow indicates the position of the SNP.Analysis output from TomTom.(B) Representative allelic discrimination plot of the COL3A1 upstream SNP.Blue indicates TT homozygotes, green indicates TG heterozygotes, and red indicates GG homozygotes.(C) The percentage distribution of the genotypes in 86 Thoroughbred horses without a fracture (controls) and 91 Thoroughbred horses with catastrophic fractures (cases).Animals 2024, 14, x FOR PEER REVIEW 13 of 24

Figure 7 .
Figure 7. SOX11 expression.SOX11 is expressed in Thoroughbred skin fibroblasts pre and post differentiation into osteoblast-like cells.There are no significant differences in expression between cells derived from high-and low-risk horses.Expression is shown relative to the housekeeping gene.Error bars represent the s.e.m of cells derived from three individual horses.Cells were at passages 2-6.

Figure 7 .
Figure 7. SOX11 expression.SOX11 is expressed in Thoroughbred skin fibroblasts pre and post differentiation into osteoblast-like cells.There are no significant differences in expression between cells derived from high-and low-risk horses.Expression is shown relative to the housekeeping gene.Error bars represent the s.e.m of cells derived from three individual horses.Cells were at passages 2-6.

Animals 2024 , 24 Figure 8 .
Figure 8. (A) SOX11 stable modulation results in significant changes in COL3A1 expression: (Ai) Stable transfection of a SOX11 overexpression plasmid (pSOX11) results in a significant increase in SOX11 expression in equine skin fibroblasts (one Thoroughbred, two ponies), pre and post osteogenic differentiation, compared with control cells (pGFP).(Aii) This results in a significant increase in COL3A1 expression in the undifferentiated cells (pre osteogenic culture).(B) Lentiviral expression of a shRNA to SOX11 reduces SOX11 (Bi) and COL3A1 (Bii) expression in undifferentiated skin fibroblasts (three Thoroughbred, one pony).Expression is shown relative to the housekeeping gene.Error bars represent the s.e.m from cells derived from three or four different donors.* p < 0.05.Cells between passages 5 and 10 were used in these experiments.

Figure 8 .
Figure 8. (A) SOX11 stable modulation results in significant changes in COL3A1 expression: (A(i)) Stable transfection of a SOX11 overexpression plasmid (pSOX11) results in a significant increase in SOX11 expression in equine skin fibroblasts (one Thoroughbred, two ponies), pre and post osteogenic differentiation, compared with control cells (pGFP).(A(ii)) This results in a significant increase in COL3A1 expression in the undifferentiated cells (pre osteogenic culture).(B) Lentiviral expression of a shRNA to SOX11 reduces SOX11 (B(i)) and COL3A1 (B(ii)) expression in undifferentiated skin fibroblasts (three Thoroughbred, one pony).Expression is shown relative to the housekeeping gene.Error bars represent the s.e.m from cells derived from three or four different donors.* p < 0.05.Cells between passages 5 and 10 were used in these experiments.

Figure 9 .
Figure 9.The region upstream of COL3A1 containing the SNP can bind to a nuclear protein.EMSA revealed that a probe containing either the wild-type (TT) or variant (GG) allele was capable of binding to a nuclear protein as seen by the shift (arrow).The use of an unlabelled competitor probe prevented this shift from occurring.The addition of an antibody to SOX11 did not result in a detectable supershift.Image representative of results observed using nuclear extracts from fibroblasts isolated from two different ponies.The boxed area represents a technical control for the assay, carried out using a biotin-labelled probe for Epstein-Barr Nuclear Antigen (EBNA, labelled E in the figure), an unlabelled probe, and an EBNA protein extract.The image is of the entire full-size blot and is not cropped.

Figure 9 .
Figure9.The region upstream of COL3A1 containing the SNP can bind to a nuclear protein.EMSA revealed that a probe containing either the wild-type (TT) or variant (GG) allele was capable of binding to a nuclear protein as seen by the shift (arrow).The use of an unlabelled competitor probe prevented this shift from occurring.The addition of an antibody to SOX11 did not result in a detectable supershift.Image representative of results observed using nuclear extracts from fibroblasts isolated from two different ponies.The boxed area represents a technical control for the assay, carried out using a biotin-labelled probe for Epstein-Barr Nuclear Antigen (EBNA, labelled E in the figure), an unlabelled probe, and an EBNA protein extract.The image is of the entire full-size blot and is not cropped.

Figure 10 .
Figure 10.The region upstream of COL3A1 containing the SNP has promoter activity, and the effect of the SNP varies with the donor cells used.Luciferase assays showing the relative light unit (RLU) for the 3220 bp region upstream of COL3A1 containing either the reference allele (TT) or the alternative allele (GG) compared with a promoter-less control (C−.Error bars represent the s.e.m of three to six technical replicates.Graphs (A-F) represent the assay being performed in skin fibroblasts from six different donor Thoroughbred horses with a range of polygenic risk scores.The risk score for each horse is displayed above each graph.* p < 0.05.

Figure 10 .
Figure 10.The region upstream of COL3A1 containing the SNP has promoter activity, and the effect of the SNP varies with the donor cells used.Luciferase assays showing the relative light unit (RLU) for the 3220 bp region upstream of COL3A1 containing either the reference allele (TT) or the alternative allele (GG) compared with a promoter-less control (C−).Error bars represent the s.e.m of three to six technical replicates.Graphs (A-F) represent the assay being performed in skin fibroblasts from six different donor Thoroughbred horses with a range of polygenic risk scores.The risk score for each horse is displayed above each graph.* p < 0.05.
Figure S1: Output from RefFinder showing the stability testing of three housekeeping genes; Figure S2: Diagram of the luciferase assay construct relative to the COL3A1 gene transcriptional start site.
Figure S3: Undifferentiated skin fibroblasts cultured under normal conditions (without osteogenic factors) show no positive staining to indicate matrix mineralization;

Figure S4 :
Figure S4: COL3A1 protein levels do not change in response to SOX11 overexpression; Figure S5: Luciferase assays showing the relative light unit (RLU) for the 3220 bp region upstream of COL3A1 containing either the reference allele (TT) or the alternative allele (GG) compared to a promoterless control (C−).Author Contributions: D.J.G. conceived and designed the study; E.P.L. performed the majority of the experimental work; A.B. carried out the WGS analysis, skin fibroblast banking, and polygenic risk scoring; R.E.M. performed the overexpression work and A.C.R. performed the immunocytochemistry and analysis for STAT1 and RUNX2; S.C.B. designed the polygenic risk scoring protocol; S.A.D.-A.performed the minor allele frequency analysis; D.J.G. supervised the project, secured the funding, and drafted the manuscript.All authors have read and agreed to the published version of the manuscript.Funding: This study was kindly funded by the Horserace Betting Levy Board (vet/prj/792).A.B. was funded by the Anne Duchess of Westminster Charitable Trust, which also funded the whole-genome sequencing.The Alborada Trust fund A.C.R. Institutional Review Board Statement: Equine skin fibroblasts were taken from horses that had been euthanised for reasons unrelated to this study and with the consent of the Animal Health Trust Ethical Review Committee (AHT_02_2012) and Royal Veterinary College Clinical Research Ethical Review Board (URN 2021 2035-2).