Genome-Wide Association Analyses of Equine Metabolic Syndrome Phenotypes in Welsh Ponies and Morgan Horses

Equine metabolic syndrome (EMS) is a complex trait for which few genetic studies have been published. Our study objectives were to perform within breed genome-wide association analyses (GWA) to identify associated loci in two high-risk breeds, coupled with meta-analysis to identify shared and unique loci between breeds. GWA for 12 EMS traits identified 303 and 142 associated genomic regions in 264 Welsh ponies and 286 Morgan horses, respectively. Meta-analysis demonstrated that 65 GWA regions were shared across breeds. Region boundaries were defined based on a fixed-size or the breakdown of linkage disequilibrium, and prioritized if they were: shared between breeds or across traits (high priority), identified in a single GWA cohort (medium priority), or shared across traits with no SNPs reaching genome-wide significance (low priority), resulting in 56 high, 26 medium, and seven low priority regions including 1853 candidate genes in the Welsh ponies; and 39 high, eight medium, and nine low priority regions including 1167 candidate genes in the Morgans. The prioritized regions contained protein-coding genes which were functionally enriched for pathways associated with inflammation, glucose metabolism, or lipid metabolism. These data demonstrate that EMS is a polygenic trait with breed-specific risk alleles as well as those shared across breeds.


Introduction
Equine metabolic syndrome (EMS) is now best described as a clustering of risk factors often leading to laminitis, which is the primary clinical concern due to the painful and often career ending outcome of this disease [1]. Although laminitis itself is not fatal, in the best interest of the patient, the severity and crippling pain often leads to a decision of euthanasia [2]. The key component of EMS is insulin dysregulation, which are derangements in the balanced relationship between plasma insulin, glucose and lipids, and manifests clinically as baseline hyperinsulinemia, an exaggerated or prolonged insulin or glucose response post oral or intravenous carbohydrate challenge, tissue insulin resistance, or hypertriglyceridemia [1].
insulin, INS-OST and triglycerides and ACTH were log transformed. Glucose, GLU-OST, and NH and GH ratios were normally distributed and did not need to be adjusted.
Trait measurements were adjusted to account for known confounding covariates using the residuals from a linear mixed effects model in the R software program Linear and Nonlinear Mixed Effects Models (nlme), with sex and age included as fixed effects and farm as a random effect [16]. For each trait, within breed GWA were performed from the imputed SNP genotype data using a custom code for an improved mixed linear regression analysis [9]. This algorithm utilizes a three-step process, which combines a Bayesian Sparse Linear Mixed Model (BSLMM) [17] available in the software program Genome-wide Efficient Mixed Model Association (GEMMA) [18] and a linear mixed model implemented in FaST-LMM [19] (see Appendix B for a full description of this model). The threshold for genome-wide significance was based on the effective number of independent tests (SNPs not in linkage disequilibrium [LD]) as calculated by the Genetic Type 1 Error Calculator [20]. In the Welsh ponies, this value was 841,750 SNPs, resulting in a Bonferroni-corrected threshold for genome-wide significance of 5.98 × 10 −08 . For the Morgan horses, the effective number of independent tests was 657,030 SNPs, resulting in a Bonferroni corrected threshold for genome-wide significance of 7.61 × 10 −08 . The suggestive threshold for both breeds was set at 1.00 × 10 −05 [21,22].
Principal components analysis (PCA) revealed population stratification in the Welsh pony cohort based on clustering of the registered sections ( Figure A1). To account for this population substructure, and avoid over-fitting the model, three separate GWA were performed using the full cohort (n = 264), sections A, B, C and D (n = 238) and sections A and B (n = 220). In order to maximize sensitivity, the union of the GWA results from all three cohorts was used (Appendix A).

Meta-analysis
A GWA meta-analysis was performed with the software program METASOFT [23] using the Morgan horse and Welsh pony GWA summary data from the 688,471 SNPs that were shared between breeds. Briefly, the METASOFT algorithm uses a random effects model which adjusts for heterogeneity between studies by allowing the effect size of the alternative allele to vary between populations. Unlike other random effects models, where both the null and alternative models assume heterogeneity, METASOFT uses a likelihood ratio test that assumes heterogeneity only under the alternative model [24]. The effective number of shared SNPs was 306,023 in the Morgan horses and 307,349 in the Welsh ponies as calculated by GEC. For a more conservative p-value, the threshold for genome-wide significance was determined using the effective number of SNPs for the Welsh ponies (0.05/307,349) and set at 1.63 × 10 −07 . The suggestive threshold was set at 1.00 × 10 −05 [21,22]. To be considered a region of interest identified on meta-analysis (MA-ROI), at least one SNP needed to exceed the threshold for genome-wide significance.

Prioritization of GWA Regions and Identification of Positional Candidate Genes
All GWA regions where SNPs exceeded the suggestive threshold for significance were reviewed. To be considered within a single region, consecutive SNPs on the same chromosome had to be within 500 kb of each other [24,25]. Regions of interest had to contain a minimum of five SNPs exceeding the suggestive threshold, with at least one SNP exceeding the threshold for genome-wide significance.

Fixed-Size Regions
The boundaries of the fixed-size region were defined as 500 kb 5 of the base pair position of the minimum SNP within the region and 500 kb 3 of the base pair position of the maximum SNP [24][25][26][27][28][29]. A region was identified as shared if it was within the boundaries of another region and prioritized as described below.

LD-Bound Regions
To define the boundaries of the LD-bound region, the software program Plink was utilized to calculate the pairwise LD measures for all SNPs within the region [14]. Window size was set at 1 Mb from the minimum and maximum SNP within the region; a pairwise calculation for LD with the test SNP was performed for all SNPs within the window. The threshold for SNPs within LD was set at an r 2 of greater than 0.3 [27]. A custom code was used to identify regions where LD for all SNPs dropped below 0.3 and spanned at least 100 kb both 5 and 3 to the widest peak of LD within the window, which was used to define the boundaries of the region. If LD did not drop for at least 100 kb on either side of the LD peak, window size was increased by 1 Mb until the region could be defined. An LDbound region was identified as shared if it was within the boundaries of another LD-bound region and prioritized as described below.

Prioritization
Regions were prioritized if they were identified as shared between breeds on meta-analysis (MA-ROI), shared across traits within a single GWA cohort (for example, a region shared between insulin and adiponectin in the Morgan horses), or breed specific. The prioritized regions were categorized as high, medium or low priority ( Figure 1) as follows:

•
High priority: Region was identified as an MA-ROI or it was shared across traits with at least one region being considered an ROI. • Medium priority: Region was identified as an ROI in at least one GWA cohort. • Low priority: Region was shared across traits, but no regions met the criteria to be considered an ROI.

•
If a region met the criteria for more than one category (for example a region identified as a MA-ROI and was also shared across traits but not an ROI) then the region was assigned the higher priority level.
Genes 2019, 10, x FOR PEER REVIEW 5 of 23 side of the LD peak, window size was increased by 1 Mb until the region could be defined. An LDbound region was identified as shared if it was within the boundaries of another LD-bound region and prioritized as described below.

Prioritization
Regions were prioritized if they were identified as shared between breeds on meta-analysis (MA-ROI), shared across traits within a single GWA cohort (for example, a region shared between insulin and adiponectin in the Morgan horses), or breed specific. The prioritized regions were categorized as high, medium or low priority ( Figure 1) as follows:


High priority: Region was identified as an MA-ROI or it was shared across traits with at least one region being considered an ROI.  Medium priority: Region was identified as an ROI in at least one GWA cohort.  Low priority: Region was shared across traits, but no regions met the criteria to be considered an ROI.  If a region met the criteria for more than one category (for example a region identified as a MA-ROI and was also shared across traits but not an ROI) then the region was assigned the higher priority level.

Identification of Positional Candidate Genes and Functional Enrichment Analysis
Positional candidate genes were identified using the Bioconductor/ R software package biomaRt [30] with EquCab3 as the reference genome [31]. Positional candidate genes were defined as all protein coding genes, pseudogenes, and RNA genes within each GWA region as defined by the fixedsize region or the LD-bound regions as described above. Based on the comparison between the fixedsize and LD-bound regions, protein-coding genes within all prioritized LD-bound regions were

Identification of Positional Candidate Genes and Functional Enrichment Analysis
Positional candidate genes were identified using the Bioconductor/R software package biomaRt [30] with EquCab3 as the reference genome [31]. Positional candidate genes were defined as all protein coding genes, pseudogenes, and RNA genes within each GWA region as defined by the fixed-size

Shared Regions Across Welsh Ponies and Morgan Horses and Across Traits
Identification of the shared regions between the Morgan horses and the Welsh pony cohort from the boundaries of the fixed region obtained from the GWA results identified one shared region for laminitis status, one shared region for ACTH, and one shared region for insulin-OST ( Figure 2). The boundaries defined by LD identified the above shared regions as well as an additional shared region for GH on ECA 22.
Meta-analysis across breeds identified the four shared regions above, as well as an additional 56 regions and five unique regions (regions not identified in either breed as significant on GWA), for a total of 65 shared regions of interest (MA-ROI). MA-ROI included two for insulin and four for glucose post oral sugar challenge, three for insulin, two for glucose, four for NEFA, seven for adiponectin, five for leptin, 15 for NH, eight for GH, and 12 for laminitis status. Unique regions were found for insulin (one MA-ROI) and glucose (one MA-ROI) post oral sugar test and NH (three MA-ROI). Across the MA-ROI, three regions were also shared across traits. No MA-ROI were identified for plasma triglyceride levels (Table 1).
Of the 56 regions identified on meta-analysis that were only significant in one breed-specific GWA, 30 (22 ROI) were called in at least one Welsh pony cohort and 26 (20 ROI) were called in the Morgan horses. Twenty-eight of the MA-ROI contained less than five SNPs of which 11 were single SNP regions. Comparison of the results using a fixed effects model identified 32 of the 65 MA-ROI (Table 1). For the Morgan horses, 88 of the 142 regions were eliminated from further prioritization (Table  S8). This resulted in 54 regions being prioritized and 1104 positional candidate genes with 38 high priority regions containing 801 positional candidate genes, eight medium priority regions containing 139 positional candidate genes, and eight low priority regions containing 164 positional candidate genes. Accounting for the 10 shared regions resulted in 44 unique regions and 963 positional candidate genes (Tables S8 and S9). and the red line represents the genome-wide significant threshold (7.61e−08 in the Morgan horses and 5.98e−08 the Welsh ponies). In the section A, B, C and D Welsh ponies (not shown) and A and B Welsh ponies (B), the same region on ECA10 exceeds the suggestive threshold, which was also identified as an ROI in the Morgan horses (A). GWA metaanalysis identified this region as shared across both breeds.   To be considered an MA-ROI, at least one SNP had to exceed the threshold for genome-wide significance (1.6 × 10 −07 ) on meta-analysis. Provided is the base pair position of the lowest (Min_SNP) and highest (Max_SNP) SNP, as well as the number of SNPs per region which exceeded the suggestive (Sugg_SNPs) and genome-wide significance (Sign_SNPs) threshold. Regions shared across two traits in the meta-analysis are represented by the corresponding highlighted chromosomes (Chr). Regions which were statistically significant using a standard fixed effects models (FE) are indicated by an X.

Prioritization of GWA Results and Identification of Positional Candidate Genes Based on Fixed-Size Regions in Welsh Ponies
For the Welsh pony cohorts, 189 of the 303 regions were eliminated from further prioritization. For the remaining 114 regions, 46 regions were considered high priority regions and contained 890 positional candidate genes, 34 regions were considered medium priority regions and contained 719 positional candidate genes, and 35 regions were considered low priority regions and contained 289 positional candidate genes. Accounting for the 19 shared regions resulted in 91 unique regions and 1511 positional candidate genes (Table S7).

Prioritization of GWA Results and Identification of Positional Candidate Genes Based on Fixed-Size Regions in Morgan Horses
For the Morgan horses, 88 of the 142 regions were eliminated from further prioritization (Table S8). This resulted in 54 regions being prioritized and 1104 positional candidate genes with 38 high priority regions containing 801 positional candidate genes, eight medium priority regions containing 139 positional candidate genes, and eight low priority regions containing 164 positional candidate genes. Accounting for the 10 shared regions resulted in 44 unique regions and 963 positional candidate genes (Tables S8 and S9).

Prioritization of LD-Bound GWA Regions, Identification of Positional Candidate Genes, and Network Analysis in Welsh Ponies
In the Welsh ponies, the LD-bound regions identified five additional regions shared across traits (ECA1 for adiponectin and INS-OST, ECA5 for insulin and leptin, ECA6 for leptin and GH, ECA9 for INS-OST and NEFA, and ECA18 for insulin and GH). However, the LD-bound regions did not identify six regions as shared across traits that were identified with the fixed-size boundaries (ECA4 for leptin and GH, ECA10 for NH and GH, ECA14 for leptin and laminitis status, ECA19 for ACTH and laminitis status, ECA 28 for insulin and INS-OST, and ECA28 for adiponectin and leptin). This resulted in 214 regions being removed and 89 regions being prioritized with 56 high priority regions containing 1567 positional candidate genes (Table 2). Further, 26 regions were given medium priority and contained 620 positional candidate genes and seven regions were given low priority and contained 30 positional candidate genes for a total of 2217 positional candidate genes. Accounting for the 18 shared regions resulted in 1853 positional candidate genes (Table S13).
Across nine EMS traits, functional enrichment analysis of the protein-coding genes within the prioritized regions identified enrichment (adjusted p-value < 0.05) for Gene Ontology (GO) terms associated with inflammation and fatty acid metabolism, including: interleukin-1 receptor binding (GH), lipid antigen binding (insulin), RAGE receptor binding (insulin and leptin), toll-like receptor binding (insulin and leptin), fatty-acyl-coA-synthase (INS-OST), cytokine activity (ACTH, NEFA, GH and NH), and regulation of NK-kappa signaling (GH, glucose, insulin, laminitis status, leptin, NEFA and NH). Enrichment for protein-coding genes associated with the GO terms thiosulfate sulfurtransferase activity was identified for NEFA concentrations (p-value = 4.9 × 10 −02 ).  Final region boundaries were based on LD and are indicated by the lowest base pair position (Min) and the highest base pair position (Max). The total number of genes includes all protein-coding genes, pseudogenes, and RNA genes identified for region based on EquCab3. Shared regions across prioritized traits are indicated by highlighted chromosomes.

Prioritization of LD-Bound GWA Regions, Identification of Positional Candidate Genes, and Network Analysis in Morgan Horses
The LD-bound GWA regions for the Morgan horse identified three additional regions shared across traits (ECA 21 for triglycerides and adiponectin, ECA 6 for adiponectin and INS-OST, and ECA 19 for NH and laminitis status), but did not identify two regions as shared across traits that were identified with the fixed boundaries (ECA 20 for adiponectin and insulin and ECA 24 for insulin and NEFA) (Table S14). This resulted in 39 high priority regions containing 1142 positional candidate genes (Table 3). Further, eight regions were assigned medium priority and contained 155 positional candidate genes, and nine regions were assigned low priority and contained 176 positional candidate genes for a total of 1473 positional candidate genes. Accounting for the 12 shared regions resulted in 1167 positional candidate genes (Tables S14 and S15).
Functional enrichment analysis of the protein-coding genes within the prioritized regions identified enrichment (adjusted p-value < 0.05) for the Kyoto Encyclopedia of Genes and Genomes (KEGG) terms PI3K-AKT signaling pathway (ACTH, adiponectin, GH, glucose-OST, INS-OST, laminitis status, leptin, NEFA, and NH) and the Wnt-signaling pathway (ACTH, adiponectin, INS-OST, NH, and TG). Enrichment was also identified for the GO terms arylesterase activity for adiponectin (adjusted p-value = 1.3 × 10 −02 ) and G-protein-coupled receptor activity for leptin (adjusted p-value = 1.2 × 10 −21 ). Final region boundaries were based on LD and are indicated by the lowest base pair position (Min) and the highest base pair position (Max). The total number of genes includes all protein-coding genes, pseudogenes, and RNA genes identified for region based on EquCab3. Shared regions across prioritized traits are indicated by highlighted chromosomes.

Discussion
In this study, we used high density SNP genotype data and GWA in two high risk breeds to identify hundreds of regions of the genome contributing to 12 EMS traits. Both fixed (500 kb) and linkage disequilibrium-based approaches were used to identify the boundaries of genomic regions identified on GWA and from there positional candidate genes within these regions. Within breed prioritization of the LD-bound regions was used to begin the process of organizing and managing this large list, and resulted in 56 high, 26 medium, and seven low priority regions, for a total of 1853 candidate genes in the Welsh ponies; and 39 high, eight medium, and nine low priority regions, for a total of 1167 candidate genes in the Morgan horses. Meta-analysis demonstrated that 65 of these regions were shared across breeds. These data support the hypothesis that EMS is a polygenic trait with both across breed and breed-specific genetic variants and predict that identifying even the major functional variants and risk loci will be a challenging task.
Comparison of the regions identified from the within breed GWA identified fewer shared regions across breeds than we anticipated. This could indicate that breed differences account for more of the risk alleles for EMS than previously thought, or that additional regions were shared but not identified in the breed-specific GWA, which can occur for several reasons. First, if the allele frequency of the variant is low in one breed, then it will not be detected on that GWA. Second, the effect of the variant on a trait can vary between breeds. The within-breed population sizes were powered to detect variants of moderate to high effect but would not find variants of low effect [33,34]. Third, variations in recombination of the ancestral chromosome can lead to differences in marker alleles between populations [35]. Depending on the markers represented on the genotyping array, the variant may be identified in one breed but not the other. Increasing the power of the study by performing across-breed GWA could identify more shared regions between breeds. However, combining data can lead to the inclusion of additional population substructure and unknown confounding variables into the model [36]. Therefore, we used meta-analysis to increase the number of individuals within the analysis and to improve the power to find unique associations, variants of low effect, and additional shared regions across breeds [37,38], and identified 65 shared regions of which five were unique (not identified in either breed specific GWA).
Selective breeding can lead to population stratification within breeds [39], and not accounting for this population stratification can lead to spurious associations on GWA [40]. For this data, principal components analysis revealed population stratification in the Welsh pony cohort based on clustering of the registered sections ( Figure A1). This was not unexpected as the Welsh pony sections are distinct subpopulations based on pedigree and conformation. Mixed linear models are a common way to account for population stratification and relatedness in GWA with the inclusion of a genetic relationship matrix (GRM) [18,41]. However, the Welsh ponies presented a unique challenge since, although the GRM would account for genetic similarities between Welsh pony sections, it would not account for all the phenotypic (conformation traits) variation between sections. On the other hand, including both the GRM and section as a covariate would lead to over-fitting of the model by accounting for relatedness both as a random effect (GRM) and fixed effect (section). Therefore, to account for population stratification within our Welsh pony cohort while maximizing sensitivity to identify genetic variations contributing to EMS both within and across sections, we chose to perform the GWA using the full data set and then subset the data to the section A, B, C and D ponies, and the section A and B ponies. Notably, there is likely a degree of population segregation in the section A and B which we are not fully controlling with a mixed model analysis, and ideally, we would have included the section A, B, C and D ponies as a separate GWA cohorts in order to account for all population substructure; however, our sample size made these analyses unfeasible due to the limited statistical power.
To reduce false positives, regions were prioritized based on sharing across breeds and traits. Regions shared across breeds (MA-ROI) were given high priority, as these regions were not breed specific and likely to be found in other high-risk breeds. Regions shared across traits with at least one ROI were also assigned high priority since many components and downstream effects of the endocrine system are highly interrelated; therefore, a variant affecting multiple traits would be expected to have a larger biological effect then a variant affecting a single trait. An ROI identified in one GWA cohort was assigned medium priority as these regions were likely breed or section (Welsh pony) specific and, based on the power of our study, variants of moderate to high effect. Finally, regions that were not an ROI but shared across traits were assigned low priority, because these regions were identified across multiple GWA and were less likely to be false positives and/or regions that contain variants of low effect.
Our prioritization removed 71% of the 303 GWA regions for the Welsh ponies and 58% of the 142 GWA regions in the Morgan horses. Of the removed regions, 49% were single SNP regions, 38% were regions with less than five SNPs, and 13% were regions with greater than or equals to five SNPs but no SNPs which exceeded the threshold for genome wide significance. Given: (i) the large percentage of single or low SNP regions that were removed, (ii) the high-density genotype data used in these analyses, and (iii) the use of the max gamma value for BSLMM (improving sensitivity at the cost of specificity), it is likely most of these regions were false positives. However, we utilized Bonferroni corrected p-values which tend to be more conservative corrections [42]; therefore, some of the removed regions may harbor genetic variants associated with EMS but represent variants with very low effect or poorly annotated regions of the genome (relative decreased number of SNPs in that region). Increasing the number of individuals, or represented Welsh pony sections, would improve the power of the study to determine which of these regions were true or false positives.
In order to identify candidate genes, we first used a fixed boundary of 500 kb 5 of the SNP identified on GWA with the lowest base pair position and 3' of the SNP with the highest base pair position. 500 kb was chosen based on the average distance for LD to breakdown in Thoroughbred horses [26,27,29]. Although LD decay varies between horse breeds [28], using the more conservative Thoroughbred estimate gave a higher likelihood that we would capture all variants within LD (r 2 > 0.3) of the marker SNPs in our cohorts. From the fixed boundaries, 1511 and 963 positional candidate genes were identified in the Welsh ponies and Morgan horses, respectively.
Estimates of LD decay are based on the average r 2 across chromosomal segments and do not represent specific regions of the genome [26,28]. Newer variants or variants within regions under selection will have longer LD blocks whereas older/ancestral variants will have shorter LD blocks due to longer periods of recombination. Therefore, using a fixed region has the potential to exclude causal variants or to include candidate genes that are not in LD with the marker SNPs. To more precisely call positional candidate genes for GWA regions, we calculated LD using the squared correlation coefficients between SNPs. SNPs within LD were defined as an r 2 > 0.3 [28]. Boundaries were identified based on gaps of LD, i.e. where all SNPs dropped below 0.3 for a span of 100 kb 5 (defined the start of the LD block) and 3' (defined the end of LD block) to the widest peak of LD.
Across all Welsh pony cohorts, 70% of the boundaries identified by LD were smaller than those identified by the fixed-size region, with an average difference of 645.4 kb (range of 11.4 kb to 1.7 Mb); whereas, in the Morgan horses, 57% of the LD boundaries were smaller than that of the fixed regions, with an average difference of 566.6 kb (range of 51.5 kb to 2.2 Mb). The large percentage of fixed-size boundaries, likely overestimating the region size, is not surprising given that the fixed-size regions were based on data from Thoroughbreds which have one of the highest inbreeding coefficients and LD amongst horse breeds [24,28]. Ponies and Morgan horses were identified to have LD similar to Quarter Horses [28], a breed with a high level of genetic diversity. For the remaining regions, the LD boundaries were an average of 1. Results of the functional enrichment analysis of the prioritized regions based on LD provided support for our approach. In the Welsh ponies, nine traits were enriched for pathways associated with inflammation or fatty acid metabolism. This is not surprising given that chronic, low-grade inflammation and dyslipidemia are common components of EMS. The latter results are also consistent with human GWA studies of MetS which have consistently identified a large number of variants related to lipid metabolism [43][44][45]. In addition, the GO term thiosulfate sulfurtransferase (Tst, Rhodanese) activity was enriched in positional genes identified for plasma NEFA concentrations. Tst encodes a mitochondrial enzyme which is involved in mitochondrial energetics and removal of reactivate-oxygen species and was identified as a candidate obesity-resistant gene in the polygenic Lean mouse model [46]. In this study, the authors evaluated Tst mRNA concentrations in multiple cross-ethnic human populations and found that they were higher in lean individuals compared with obese individuals or individuals with type 2 diabetes and negatively correlated with body mass index [46].
In the Morgan horses, the KEEG terms for PI3K/AKT and Wnt signaling pathways were functionally enriched across several traits. Both of these pathways have significant roles in metabolism, with the PI3K/AKT signaling pathway being essential for glucose homeostasis and lipid metabolism [47,48], and the Wnt signaling pathway regulating body mass, glucose metabolism, de novo lipogenesis, and low-density lipoprotein clearance [49]. Further, the GO term arylesterase (ARE) activity was enriched for plasma adiponectin concentrations. ARE activity has been linked to the paraoxonase-1 gene which has roles in lipoprotein oxidative damage prevention and being protective against metabolic syndrome [50,51]. Lastly, serum leptin concentrations were enriched for the GO term G-protein-coupled receptor activity, which have been found to be directly involved in pancreatic β-cell destruction and insulin resistance and remain potential targets for drug therapy for metabolic syndrome and type II diabetes [52].

Conclusions
In conclusion, the results of these data provide strong evidence that EMS is a complex, polygenic syndrome with dozens of risk alleles contributing to the phenotype. Prioritization of the hundreds of regions identified on the GWA of 12 individual traits led to the identification of thousands of positional candidate genes which contained protein-coding genes which were functionally enriched for pathways associated with glucose metabolism, lipid metabolism and inflammation. However, further work is required to narrow down the candidate gene pool. Nonetheless, these data were an important first step in the identification of the genetic risk alleles associated with EMS. In addition, this data concludes that a single variant genetic test will not provide enough information to accurately predict an individual's risk for EMS, and a future DNA test will require a panel of genetic variants including both shared and breed specific variants.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/11/893/s1, Figure S1: Linkage disequilibrium (LD) for neck-to-height-ratio (NH) on equine chromosome 4 (ECA4) in the Morgan horses, Table S1: Summary table for GWA regions for each of the 12 EMS traits in Welsh ponies and Morgan horses, Table S2: Summary table of the shared regions across two or three cohorts for each of the 12 EMS traits from the Welsh pony GWA, Table S3: Shared regions from the Welsh pony GWA, Table S4: Prioritization of the GWA results of the full Welsh pony cohort based on fixed-size regions, Table S5: Prioritization of the GWA results of the section A, B, C and D Welsh ponies based on fixed-size regions, Table S6: Prioritization of the GWA results of the section A and B Welsh ponies based on fixed-size regions, Table S7: Final prioritization of the GWA results based on a fixed-size region for the Welsh ponies, Table S8: Prioritization of the GWA results of the Morgan horses based on fixed-size regions, Table S9: Final prioritization of the GWA results based on a fixed-size region for the Morgan horses, Table S10: Prioritization of the GWA results of the full Welsh pony cohort based on LD-bound regions, Table S11: Prioritization of the GWA results of the section A, B, C and D Welsh ponies based on LD-bound regions, Table S12: Prioritization of the GWA results of the section A and B Welsh ponies based on LD-bound regions, Table S13: Final prioritization of the GWA results based on an LD-bound region for the Welsh ponies, Table S14: Prioritization of the GWA results of the Morgan horses based on LD-bound regions, Table S15: Final prioritization of the GWA results based on an LD-bound region for the Morgan horses. The Welsh Pony and Cob Society (http://wpcs.uk.com) registers Welsh ponies into six sections based on pedigree, withers height and conformation as follows: section A (sire and dam must both be section A, and the pony can be up to 50 inches for withers height), section B (either sire and dam are both section B or one parent can be a section A, and the pony can be up to 58 inches for withers height), section C (at least one parent must be C or D and the pony can be up to 54 inches for withers height), section D (at least one parent must be C or D and the pony must be over 54 inches for withers height), section H (either the sire or dam is a registered Welsh pony, and there are no height restrictions) and section P (either sire or dam is at least 50% Welsh pony with no height restrictions).

Acknowledgments:
The authors would like to acknowledge the numerous owners, trainers, and referring veterinarians who participated in the sampling of horses for this study.

Conflicts of Interest:
The authors declare no conflict of interest.

A.1. Welsh Pony Population Structure
The Welsh Pony and Cob Society (http://wpcs.uk.com) registers Welsh ponies into six sections based on pedigree, withers height and conformation as follows: section A (sire and dam must both be section A, and the pony can be up to 50 inches for withers height), section B (either sire and dam are both section B or one parent can be a section A, and the pony can be up to 58 inches for withers height), section C (at least one parent must be C or D and the pony can be up to 54 inches for withers height), section D (at least one parent must be C or D and the pony must be over 54 inches for withers height), section H (either the sire or dam is a registered Welsh pony, and there are no height restrictions) and section P (either sire or dam is at least 50% Welsh pony with no height restrictions).

A.2. GWA Results
For the Welsh ponies, GWA across all 12 traits resulted in 130 regions where at least one SNP exceeded the suggestive threshold, of which 33 were identified as ROI (entire Welsh pony cohort; n = 264), 139 regions where at least one SNP exceeded the suggestive threshold, of which 23 were identified as ROI (section A, B, C, and D Welsh ponies; n = 238), and 82 regions where at least one SNP exceeded the suggestive threshold, of which 13 were identified as ROI (section A and B Welsh ponies; n = 220) (Table S1).
Across all 12 traits, 38 regions were shared with two of the Welsh pony GWA cohorts and 5 regions were shared with all three of them. Fifteen of the 43 shared regions contained at least one GWA where the region met the criteria to be considered an ROI (Tables S2 and S3). The 43 shared regions represented 18.9%, 26.6%, and 30.5% of the total regions identified in the full cohort, the  (Table S1).
Across all 12 traits, 38 regions were shared with two of the Welsh pony GWA cohorts and 5 regions were shared with all three of them. Fifteen of the 43 shared regions contained at least one GWA where the region met the criteria to be considered an ROI (Tables S2 and S3). The 43 shared regions represented 18.9%, 26.6%, and 30.5% of the total regions identified in the full cohort, the section A, B, C and D Welsh ponies, and the section A and B Welsh ponies, respectively. Eight regions had an ROI identified in the full cohort (24.2% of the total ROI for this cohort), six regions had an ROI identified in the section A, B, C and D ponies (26.1% of the total ROI for this cohort), and six regions had an ROI identified in the section A and B ponies (46.2% of the total ROI identified in this cohort). For example, analysis of ACTH identified five shared regions. The region on equine chromosome (ECA) 5 was shared across all three cohorts but was only identified as an ROI in the GWA of the full cohort ( Figure A2 and Table S3).

Appendix A.3. Prioritization of GWA Results and Identification of Positional Candidate Genes Based on Fixed-Size Regions in Welsh Ponies
For the full Welsh pony cohort, 78 of the 130 regions were eliminated from further prioritization, 35 were categorized as high priority, 12 were categorized as medium priority and five were categorized as low priority (Table S4). For the section A, B, C and D Welsh ponies, 94 of the 139 regions were eliminated from further prioritization, 19 were categorized as high priority, 19 were categorized as medium priority and eight were categorized as low priority (Table S5). For the section A and B Welsh ponies, 57 of the 82 regions identified on GWA were eliminated from further prioritization, nine were categorized as high priority, 10 were categorized as medium priority and six were categorized as low priority (Table S6).

A.5. Shared Regions Across Welsh Ponies and Morgan Horses
Identification of the shared regions between the Morgan horses and at least one Welsh pony cohort from the boundaries of the fixed region obtained from the GWA results identified one shared region for laminitis status (all ponies), one shared region for ACTH (Morgan horses with section A, B, C, and D ponies), and one shared region for INS-OST (for Morgan horses with both the section A, B.

C.
A. Figure A2. Manhattan plots of the genome-wide association results for ACTH in (A) full Welsh pony cohort, (B) the section A, B, C and D Welsh ponies, and (C) the section A and B Welsh ponies. The equine chromosomes (ECA) are plotted on the x-axis and the −log10 of the p-value is plotted on the y-axis. The blue line indicates the suggestive threshold (1.0 × 10 −05 ) and the red line represents the genome-wide significant threshold (5.9 × 10 −08 ). In all three GWA, the same region on ECA5 exceeds the suggestive threshold but is only identified as an ROI in the full cohort (A).

Appendix A.5. Shared Regions Across Welsh Ponies and Morgan Horses
Identification of the shared regions between the Morgan horses and at least one Welsh pony cohort from the boundaries of the fixed region obtained from the GWA results identified one shared region for laminitis status (all ponies), one shared region for ACTH (Morgan horses with section A, B, C, and D ponies), and one shared region for INS-OST (for Morgan horses with both the section A, B, C, and D and section A and B ponies; Figure 2). The boundaries defined by the LD-region identified the above shared regions as well as an additional shared region for GH on ECA 22 between the Morgan horses and the full Welsh pony cohort. Within breed GWA were performed from the imputed SNP genotype data using a custom code for an improved mixed linear regression analysis [9]. This algorithm utilizes a three-step process, which combines a Bayesian Sparse Linear Mixed Model (BSLMM) [18] available in the software program Genome-Wide Efficient Mixed Model Association (GEMMA) [19] and a linear mixed model implemented in FaST-LMM [20]. In step one, the genome is divided by chromosome and SNPs are placed into 500 kb bins. Based on results from BSLMM, the SNP with the highest model frequency and the two adjacent SNPs were chosen to represent the corresponding bin. In step two, a likelihood ratio test was performed to determine if inclusion of the top ranked bin as a random effect will improve the null model (model with sex and age as fixed effects and farm as a random effect). If the model was improved, the alternative model became the null model and the next highest-ranked bin was tested. If the model was not improved, the bin was discarded, and the next highest-ranked bin was tested against the null hypothesis. After all bins were evaluated, SNPs which improved the model were utilized to create the select SNP genetic relationship matrix (GRM). In step three, all imputed SNPs were tested for an effect on the trait using FastLMM with the inclusion of the select SNP GRM in place of the standard all-SNP GRM. For the GWA, the tested SNP, and SNPs within 1Mb of the tested SNP, were removed from the select SNP GRM to avoid proximal contamination.
The number of iterations for the Markov chain Monte Carlo (MCMC) implemented in BSLMM has not been previously assessed [9,17], and our initial results with the default of 550 thousand (k) iterations with 10 k burn-in iterations provided inconsistent results across seeds. Therefore, we took appropriate steps to determine the number of iterations for the MCMC to converge and provide consistent results across seeds. First, to assess the concordance of SNPs identified by BSLMM, we performed this step using 10 million (M) iterations with 100 K burn-in iterations, which was repeated across ten different seeds. SNPs with a beta value greater than zero (i.e. the posterior mean for SNPs which were estimated to have a large effect on the outcome variable) were extracted from the dataset for each seed. For this subset of SNPs, the intersect between seeds was determined, and correlations between gamma values (proportion of iterations that the SNP was estimated to have a large effect) were calculated. For 10 M iterations, the total number of SNPs with a beta value greater than zero ranged from 486,937 to 497,207 SNPs. Approximately 50% of the SNPs were shared between two seeds,~13% were shared between four seeds, and~3% were shared across all ten seeds. Pearson's correlation coefficient between gamma values were minimal at <0.01. Thus, this process was repeated using 20 M iterations (200 K burn-in iterations) and then increasing in 10 M and 100 k increments up to 100 M iterations (1 M burn-in iterations). Although SNP concordance improved as the number of iterations increased, the gain plateaued after 50 M iterations where all SNPs had a beta value greater than zero. In addition, the Pearson's correlation coefficient for gamma values was still poor at 0.20 at 100 M iteration. Computational time was extensive at 30 and 60 days to complete the 50 M and 100 M iterations, respectively, utilizing six processors per node and running seeds in parallel [53].
Previous studies have averaged the values of MCMC estimates across repeated chains [54]. For this analysis the goal was to maximize sensitivity; therefore, using data from the 10 M iterations, the max gamma value across all ten seeds was chosen to represent each SNP in which beta was greater than zero. These values were then used to choose the most influential SNP per 500 kb bins (step 1). To assess the repeatability of these results, this process was repeated using 10 different seeds at 10M iterations and 20 M iterations. Although differences were present, most regions were shared across all three results (Table A1). Thus, to maximize computational efficiency and sensitivity, we used the max gamma value across 10 seeds obtained from 10 M iterations (100 K burn-in) and prioritized regions of interest (see Section 2.5. Prioritization of GWA Regions and Identification of Positional Candidate Genes).

Appendix B.2. Fixed and Random Effects
Age and sex were included in our model as fixed effects based on epidemiological studies which identified both as risk factors for EMS [55,56]. Season [57,58], diet [3,59], exercise [59,60], and endocrine-disrupting chemicals [61] have also been identified as environmental risk factors for EMS, but a large percentage of environmental variation has yet to be explained [9]. Further, several studies have produced conflicting findings as to the effect of season on EMS biochemical measurements [62,63], as well as the long-term effect of high non-structural carbohydrate diets on insulin sensitivity [64,65]. Therefore, we included farm as a random variable to account for both known and unknown environmental risk factors, as well as non-independent sampling of our data (each farm was required to have one control and one horse with EMS to be included in the study).  Abbreviations: Sugg (total number of SNPs which exceeded the suggested threshold for genome-wide significance), Sign (total number of SNPs which exceed the threshold for genome-wide significance).