Molecular Characterisation of Equine Herpesvirus 1 Isolates from Cases of Abortion, Respiratory and Neurological Disease in Ireland between 1990 and 2017

Multiple locus typing based on sequencing heterologous regions in 26 open reading frames (ORFs) of equine herpesvirus 1 (EHV-1) strains Ab4 and V592 was used to characterise 272 EHV-1 isolates from 238 outbreaks of abortion, respiratory or neurological disease over a 28-year period. The analysis grouped the 272 viruses into at least 10 of the 13 unique long region (UL) clades previously recognised. Viruses from the same outbreak had identical multi-locus profiles. Sequencing of the ORF68 region of EHV-1 isolates from 222 outbreaks established a divergence into seven groups and network analysis demonstrated that Irish genotypes were not geographically restricted but clustered with viruses from all over the world. Multi-locus analysis proved a more comprehensive method of strain typing than ORF68 sequencing. It was demonstrated that when interpreted in combination with epidemiological data, this type of analysis has a potential role in tracking virus between premises and therefore in the implementation of targeted control measures. Viruses from 31 of 238 outbreaks analysed had the proposed ORF30 G2254/D752 neuropathogenic marker. There was a statistically significant association between viruses of the G2254/D752 genotype and both neurological disease and hypervirulence as defined by outbreaks involving multiple abortion or neurological cases. The association of neurological disease in those with the G2254/D752 genotype was estimated as 27 times greater than in those with the A2254/N752 genotype.


Introduction
Equid alphaherpesvirus 1 (EHV-1), commonly known as equine herpesvirus 1, is the most economically and clinically significant equine herpesvirus [1]. EHV-1 belongs to the Alphaherpesvirinae subfamily, genus Varicellovirus [2]. The virus has a global distribution in horse populations, causing several clinical syndromes including mild to severe respiratory disease, abortion, neonatal foal death, chorioretinitis, and neurological disorders often referred to as equine herpesvirus myeloencephalopathy (EHM) [3][4][5][6]. The neurological form of the disease is less common than abortion or respiratory disease but can result in fatalities [6]. Primary infection occurs via the […] sporadic cases versus multiple case outbreaks were also examined for the presence of the putative neurological marker.

Multi-Locus Analysis
Phylogenetic analysis (see tree in Figure 1 and alignment in Supplementary Figure S1) was performed using an artificial peptide consisting of concatenated amino acids of UL and US based on 31 non-synonymous substitutions between Ab4 and V592 and seven additional mutations identified by analysis of viruses characterised in this study and published sequences [34,35]. The analysis grouped the 272 viruses characterised into 10 of the 13 UL clades identified by Bryant et al. (2018) [35]. This approach did not distinguish the single clade 2 representative NY03, which grouped with clade 1 viruses, or the single clade 12 representative Suffolk/91/94, which grouped with the clade 10 viruses. Similar to the UK, the majority of Irish isolates (118 isolates from 106 outbreaks from 1991 to 2017) clustered in clade 7 with viruses from the United States and Australia. No Irish viruses belonged to clade 4, which contains the single representative strain RACL11.
The potential usefulness of multi-locus analysis to corroborate or disprove a hypothesis based on epidemiological data was demonstrated in selected outbreaks (see Supplementary Table S1). The hypothesis that multiple cases of neurological disease identified on two different sport horse premises (141 and 142) were linked by horse movement was supported by the identical multi-locus profiles of the viruses isolated (IRL/350/2011 and IRL/331/2011). Similarly, IRL/394/2009 was isolated on premises 121 from a mare with neurological disease that had returned home from a public stud farm, premises 122. Subsequent investigation indicated that IRL/394/2009 had an identical multi-locus profile to virus IRL/626/2009, which was circulating sub-clinically on premises 122. Multi-locus typing also corroborated the hypothesis that in one case of neurological disease the source of virus was reactivation of a latent infection. The yearling of a mare that was present on premises 1 during a severe neurological outbreak travelled to premises 22 the following year to be castrated. A pony that shared a field with the yearling post castration presented with neurological signs approximately two weeks later. The virus isolated from the incoordinate pony had the same genetic profile as the viruses associated with EHV-1 outbreaks on premises 1, supporting the hypothesis that reactivated virus from the yearling was the source of infection. In contrast, multi-locus typing provided evidence to disprove the hypothesis that horse movement from premises 15, where IRL/176/1994 was isolated prior to diagnosis of EHV-1 abortion and multiple cases of neurological disease, was the source of virus to a public stud (premises 16) where a single mare developed neurological disease. The virus isolated from the single case on the stud farm, IRL/206/1994 (clade 7), was readily distinguishable from IRL/176/1994 (clade 11), indicating that premises 15 was highly unlikely to be the source.
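The comparison logic used in these outbreak investigations can be sketched as follows. This is a minimal illustration only: the site labels and residues in the usage note are hypothetical placeholders rather than the study's typed profiles, and the function names `differing_sites` and `consistent_with_link` are assumptions introduced here. As in the text, identical profiles are treated as consistent with a shared source, not as proof of one.

```python
from typing import Dict, List

# A multi-locus profile: site label (e.g. "ORF30:752") -> amino acid residue.
Profile = Dict[str, str]


def differing_sites(a: Profile, b: Profile) -> List[str]:
    """Return the sorted site labels at which two profiles differ.

    Only sites typed in both isolates are compared.
    """
    shared = set(a) & set(b)
    if not shared:
        raise ValueError("profiles share no typed sites; cannot compare")
    return sorted(site for site in shared if a[site] != b[site])


def consistent_with_link(a: Profile, b: Profile) -> bool:
    """True when two isolates are identical at every shared typed site."""
    return not differing_sites(a, b)
```

For example, two hypothetical isolates typed as `{"ORF11:235": "M", "ORF30:752": "N"}` would be consistent with a link, whereas a third typed as `{"ORF11:235": "R", "ORF30:752": "D"}` would differ from them at both sites and so argue against a common source.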
For routine molecular epidemiological investigation in a diagnostic laboratory, alignment of representative EHV-1 isolates from all clades (see Figure 2) identified that a limited number of non-synonymous sites in six ORFs could be targeted to distinguish the clades identified in this study. Clade 7, the most prominent clade circulating, is readily identified using ORF11 (R235M). The same ORF can be used to identify clade 9 viruses. ORF13 identifies clades 6 and 8 using the non-synonymous changes A405T and A499T, respectively. Sequencing of ORF30 in addition to ORFs 11 and 13 identifies clades 10 and 11. Inclusion of ORFs 37, 52 and 76 identifies the remaining clades 1, 3, 5, and 13. This approach has the potential to be used globally as the clades are not specific to Irish isolates but include viruses from Asia, Australia, North America, and Europe. Furthermore, the assay could be modified in the future by employing a multiplex PCR to amplify several target fragments simultaneously prior to sequencing. This would be less labour-intensive and more cost-effective.
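A partial sketch of how such a targeted scheme might be encoded is given below. It covers only the three substitutions named explicitly above (ORF11 R235M, ORF13 A405T, ORF13 A499T) and assumes that the substituted residue (M or T) is the one that marks the clade; that direction is an assumption, not something stated in the text. The markers for the remaining clades are not specified here and are deliberately left out, so any other genotype is returned as unresolved. The names `MARKERS` and `assign_clade` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]  # (ORF name, amino acid position, numbered as in V592)

# Site -> (residue ASSUMED to mark the clade, clade number).
# Only the substitutions named in the text are listed.
MARKERS: Dict[Site, Tuple[str, int]] = {
    ("ORF11", 235): ("M", 7),
    ("ORF13", 405): ("T", 6),
    ("ORF13", 499): ("T", 8),
}


def assign_clade(residues: Dict[Site, str]) -> Optional[int]:
    """Return the clade indicated by exactly one marker, else None.

    None means either no listed marker is present (further ORFs would
    need to be sequenced) or the markers conflict.
    """
    hits = {
        clade
        for site, (marker_residue, clade) in MARKERS.items()
        if residues.get(site) == marker_residue
    }
    return hits.pop() if len(hits) == 1 else None
```

Returning `None` for conflicting markers rather than picking one keeps an unexpected genotype visible for follow-up sequencing.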
Neurological disease was associated with viruses from nine of the ten clades identified. Outbreaks with multiple cases of abortion were also associated with viruses from nine clades, but outbreaks with multiple neurological cases were restricted to viruses from six clades. A significant association between clade and neurological disease was found (p-value = 0.02), driven by high expression in clades 1 and 3. Furthermore, a statistical association between clade and hypervirulence was found (p-value = 0.027), driven by high expression in clades 1 and 8.

Figure 2. Multi-locus sequence analysis of representative Irish EHV-1 isolates using 38 amino acid differences in 26 open reading frames (ORFs). Amino acid differences (n = 38, including triplet in ORF14) between EHV-1 strains Ab4 and V592 and representative Irish isolates across 26 ORFs. Amino acid positions are numbered according to V592. Colours are used to highlight different UL clades [35]. EHV-1 representatives from 10 clades are shown. Shading is used to highlight amino acid difference at that site. The putative neurological marker at ORF30 variable site N752/D752 is highlighted in grey. The number of isolates with the same genotype for each clade is summarised by years, counties, premises, and cases of EHV-1 examined. Neuro indicates number of isolates from cases of neurological disease. ¹ ND indicates not determined. ² Includes repeated samples (n = 3) from the same case. ³ Indicates ORF68 not assigned group. (---) represents gap in sequence. * represents an Italian isolate.
Thirty-five viruses from 31 outbreaks (i.e., 13% of the outbreaks included in the study) had the G2254/D752 change in the polymerase gene associated with neuropathogenicity [21]. These viruses clustered in several different clades; eight from six outbreaks were clade 1, one was clade 3, five were clade 7, 14 from 13 outbreaks were clade 8, two were clade 9, three from two outbreaks were clade 11, and two were clade 13 (see Table 1). A statistical association between clade and the G2254/D752 genotype was found (p-value < 0.001), driven by high expression in clades 1 and 8. Eighteen of the 35 G2254/D752 viruses (51.4%) were isolated from cases of neurological disease. The remainder were from cases of abortion or neonatal foal death. Six were from five outbreaks of multiple abortion. In this study seven viruses characterised from cases of neurological disease lacked the neuropathogenic marker, i.e., had the A2254/N752 genotype. However, to the best of our knowledge all were single cases. The association between the G2254/D752 genotype and neurological disease was found to be highly significant (p-value < 0.001). The odds of developing neurological disease after viral infection for those with the G2254/D752 genotype were estimated as 27 times greater than for those with the A2254/N752 genotype (odds ratio (OR) = 26.9; 95% confidence interval (CI): 10-75). Multiple outbreaks of neurological disease were 38 times more likely than single outbreaks if associated with the neuropathogenic marker (OR = 38.3; 95% CI: 2-820). Furthermore, there was a statistically significant association between the G2254/D752 marker and hypervirulent disease expression as defined by outbreaks involving multiple abortion or neurological cases (p-value < 0.001). The odds of hypervirulence with the G2254/D752 genotype were determined as almost five times greater than with an A2254/N752 strain (OR = 4.8; 95% CI: 2.2-10.5).
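For readers unfamiliar with how an odds ratio and its confidence interval are obtained from a 2 × 2 table, a minimal sketch is given below using the standard Wald (log odds ratio) method. The method and the table used in the usage note are assumptions: the statistical procedure and unit of analysis behind the published estimates are not described in this section, so the sketch is not expected to reproduce OR = 26.9. The function name `odds_ratio_ci` is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval for a 2 x 2 table.

    a: marker present, disease present   b: marker present, disease absent
    c: marker absent,  disease present   d: marker absent,  disease absent
    z = 1.96 gives an approximate 95% interval. All cells must be non-zero.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

As an illustration at the level of individual viruses, the counts quoted above give 18 neurological and 17 non-neurological G2254/D752 viruses, and 7 neurological A2254/N752 viruses; if the remaining 230 of the 272 viruses are taken to be non-neurological A2254/N752 viruses (an assumption), the call would be `odds_ratio_ci(18, 17, 7, 230)`. The wide intervals reported in the text reflect the small counts in some cells, which inflate the reciprocal terms in the standard error.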

ORF68
A ~1185 bp nucleotide sequence of ORF68 for 222 isolates representative of 222 EHV-1 outbreaks over the 28-year period was determined to assess the relationship between ORF68 groups and clades identified by multi-locus analysis. Sequence analysis of this polymorphic region of ORF68 identified isolates in each of Nugent's six groups and one of her two unassigned groups (see alignment Figure 3). ORF68 representative sequences have been deposited in GenBank with accession numbers MH976701-MH976709. Of the 222 isolates analysed using this grouping system, only one Irish isolate belongs to group 1. Similar to Ab4, IRL/497/1997 encodes 8 G's in the homopolymeric tract, giving rise to a 418 amino acid (aa) long US2 protein compared with 303 aa for EHV-1 isolates in other groups. Twenty-six isolates (11.7%) belong to group 2. The majority of isolates (112; 50.45%) belong to group 3 and have the characteristic SNP T719. However, two previously unidentified nucleotide changes at SNP A87 and SNP A821 in group 3 viruses may represent two new sub-groups. Twenty-seven (12.16%) and 20 isolates (9%) belong to groups 4 and 5, respectively. All the group 5 isolates identified had the SNP C626 in addition to the group 5 characteristic SNPs G710 and A713. Twenty-two (9.9%) isolates belong to the V592-like group 6. A further 14 isolates (6.3%) which contained A629 and T755 SNPs were categorised as belonging to Nugent's unassigned group.
Figure 3. Regions of sequence variation in ORF68 for representative Irish isolates. Analysis of 222 EHV-1 isolates showed they belong to seven of the groups previously described [21]: groups 1-6 and one of the two unassigned groups. N indicates the number of isolates characterised with a particular sequence in the group. Dots indicate sequence identity, while group-specific single nucleotide polymorphisms (SNPs) are highlighted. Vertical dashed lines represent breaks in continuous sequence where no changes occurred. The numbers above the alignment indicate the nucleotide positions according to the ORF68 sequence of strain Ab4 (group 1), which contains 8 G residues in the homopolymeric tract (nucleotides 732-739). Symbol (-) denotes nucleotide deletion and (*) denotes includes one Italian strain.
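The group proportions quoted for the 222 ORF68 sequences can be checked with a short calculation. The sketch below simply recomputes each percentage from the counts given in the text; the names `ORF68_GROUP_COUNTS` and `group_percentages` are hypothetical, and rounding to two decimal places is a choice made here (the text reports some values to one decimal place or as whole numbers).

```python
from typing import Dict

# Number of isolates per ORF68 group, as stated in the text.
ORF68_GROUP_COUNTS: Dict[str, int] = {
    "group 1": 1,
    "group 2": 26,
    "group 3": 112,
    "group 4": 27,
    "group 5": 20,
    "group 6": 22,
    "unassigned": 14,
}


def group_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each group's share of the total, in percent (2 decimals)."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


percentages = group_percentages(ORF68_GROUP_COUNTS)
```

The counts sum to the 222 isolates analysed, and the recomputed shares agree with the rounded figures in the paragraph above.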
Network analysis (Figure 4) demonstrated that there was no correlation between the sequence of the polymorphic region of ORF68 and the location of the country where the virus was isolated, as the majority of Irish isolates clustered with viruses from all over the world. The single Irish group 1 isolate, IRL/497/1997, clusters with four UK isolates (one of which is Ab4) and one Japanese respiratory isolate in node A. Twenty-five group 2 Irish isolates cluster in node B with viruses from Europe, Japan, North America, Argentina, and Australia. One hundred and twelve group 3 isolates share the largest node C with isolates from Europe, North America, and Australia. A further 27 group 4 isolates cluster with European, North American, Ethiopian, and Indian strains in node D. All 20 Irish group 5 isolates share node M with North American strains. Node K contains only three strains from the UK, including reference strain V592 in addition to 22 Irish isolates. Finally, 14 Irish strains belonging to Nugent's unassigned group shared node F with a single UK strain and 21 Polish strains only. The single Italian isolate analysed belonged to node B (group 2).
There was a correlation between ORF68 groups and clades identified by multi-locus analysis. The single ORF68 group 1 isolate IRL/497/1997, which, like Ab4, has G8 in the homopolymeric tract, clustered with clade 1 viruses. Within ORF68 group 2 there were representatives from clades 1, 3, 5, and 13. Group 3 contained all the clade 7 viruses and 10 of the 24 clade 11 viruses characterised. Six clade 11 viruses had an additional SNP A87 which was absent in clade 7 viruses. Group 4 contained all clade 8 and clade 10 viruses identified. Groups 5 and 6 contained clade 6 and 9 viruses, respectively. The unassigned group described by Nugent et al. (2006) [21] with SNPs A629 and T755 was composed of 14 clade 11 viruses.
Figure 4. […] Supplementary Table S4 for sequence information). Nodes, labelled with capital letters (A to Y), represent the same ORF68 sequence and are coloured based on the geographical origin of the sample. The area of each circle is in proportion […]

Discussion
The present study is the first to document the molecular characterisation of EHV-1 clinical isolates in Ireland over a 28-year period. Two hundred and sixty-nine viruses detected in Ireland and a further three viruses isolated in Italy were included in the multi-locus analysis. Our investigation established that genetic characterisation has the potential to be a useful aid in the management of EHV-1 outbreaks, based on identification of the G2254/D752 polymerase genotype, the UL clade assignation, and to a lesser extent, ORF68 sequencing.
Multi-locus typing of EHV-1 was initiated by Nugent et al. (2006) [21] and extended in this study to allow comparison to recently proposed UL clades [35]. In the original multi-locus study, analysis of a panel of twenty-five isolates (12 neurological and 13 non-neurological) using the amino acid differences between EHV-1 reference strains Ab4 and V592 led to the proposal that ORF68 analysis could be used for distinguishing isolates without having to type multiple loci [21]. Consequently, multi-locus analysis has not been widely used by other investigators who have concentrated on ORF68 sequencing. In this study, multi-locus typing of 272 EHV-1 isolates established a correlation to 11 of the 13 EHV-1 clades proposed by the recent UK genotyping study [35]. The 272 isolates characterised clustered in 10 of the 11 clades. Our results concur with those of Bryant et al. (2018) [35] in demonstrating that clade 7 viruses predominate, and that simultaneous co-circulation of clades occurs in Ireland. In the next generation sequencing (NGS) study by Bryant et al. (2018) [35] network analysis suggested that recombination had occurred between EHV-1 strains. This has also been observed in herpes simplex virus 1 (HSV-1) strains [36]. Furthermore, inter-species recombination has been detected in field samples between EHV-1 and EHV-4 [34,37], EHV-1 and EHV-9 [38], and between EHV-1 and EHV-8 [35]. Thus, although the genotyping method based on the targeted multi-locus approach used in this study and also recently developed for VZV [39] may be more practical than NGS for surveillance purposes, it has the limitation that it does not allow the detection of possible recombination crossovers in the unanalysed parts of the genome [40]. Whole genome sequencing (WGS) of viruses is increasingly important in clinical settings but is not yet routinely used in the majority of veterinary diagnostic laboratories.
As sequencing costs continue to decrease, specialised bioinformatic resources become more accessible and methods are standardised, WGS using NGS methods is likely to be more widely applied in veterinary medicine, providing molecular epidemiology studies with greater accuracy [41]. Meanwhile, however, targeted PCR amplification and Sanger sequencing offer a rapid and robust alternative for the detection of virus variants. This study indicated that analysis of a single SNP identifies viruses of the most common EHV-1 clade in the UK and Ireland, and that only six ORFs need to be targeted to discriminate between ten clades.
In this study the multi-locus analysis proved very useful to support or negate the epidemiological data in the tracking of virus between selected premises. Viruses from the same outbreak had identical profiles, whereas viruses identified on the same premises in different years were rarely identical, suggesting reintroduction rather than reactivation or persistent circulation. In three outbreaks on premises with an epidemiological link the viruses had identical profiles. This included a case of suspected reactivation of virus from an outbreak in the previous year in a different province. In addition to providing support for epidemiological links, the multi-locus analysis also provided evidence against the hypothesis that a premises with multiple neurological cases was the source of virus linked to a single case on another premises.
In the future it is envisaged that molecular typing will become routine in our laboratory and as several of the EHV-1 clades are not geographically restricted, this approach can be used in other countries. Molecular evidence corroborating equestrian events or specific premises as the source of virus will assist in the implementation of targeted movement restrictions, quarantine and other control measures. The results of such analysis are not proof of a causal link but add strength and depth to a clinical advisory service. If for example, viruses of the same clade are isolated from cases on different premises linked directly or indirectly to return of horses from a training centre or public stud farm it becomes incumbent on the owner of that centre or farm to communicate a possible risk to clients. In terms of clinical management an informed decision may be taken to quarantine horses on the public premises until there is no further evidence of circulating virus or clients may implement extra biosecurity measures in relation to transport, isolation and monitoring of horses returning from that centre. Demonstration that repeated outbreaks of EHV-1 on individual premises are due to viruses of different clades can also contribute to clinical management. Owners of premises that suffer repeated incidences of EHV-1 associated disease frequently focus on reactivation of latent virus and identification and removal of a "carrier". Molecular evidence to the contrary facilitates the introduction of improved management practices with respect to vaccination, separation of broodmares from younger stock and sport horses and temporary isolation of visiting mares.
Since Nugent et al. (2006) [21] put forward the hypothesis that variants with the G2254/D752 substitution in the DNA polymerase have increased likelihood of association with neuropathogenicity, there has been international focus on the characterisation of the genotypes of EHV-1 isolates and allelic discrimination assays have been widely used to distinguish between neuropathogenic and non-neuropathogenic strains [42,43]. Many studies concentrate on the retrospective investigation of archived viruses and the prevalence data generated must be interpreted with caution due to sampling and storage bias. In this study, 35 of 272 viruses from 238 outbreaks had the G2254/D752 genotype, suggesting a prevalence of 12.9% in Ireland. However, this prevalence may reflect sample bias for the isolates chosen for characterisation, prior to the introduction of routine genotyping in 2005. Thus, the true prevalence is likely to be nearer 9%, calculated from the years 2005-2017. The findings indicate that the vast majority of Irish viruses have the A2254/N752 genotype. It has been demonstrated in some studies that horses infected with a G2254/D752 variant such as Ab4 show higher levels of virus shedding than horses infected with the A2254/N752 variant such as V592 [16,19,44,45,46]. Subsequently, it was suggested that this may indicate a selective advantage of the G2254/D752 strains which could favour an increase in prevalence such as that reported in the United States from 3.3% in the 1960s to 19.4% since the year 2000 [26,47,48]. In recent years there has been an increase in the number of severe EHM outbreaks reported in other countries including France, which subscribes to a Tripartite Agreement for the free movement of horses without health checks between Ireland, France, and the UK [49,50]. However, neither a parallel increase in the incidence of EHM nor an increase in the detection of the G2254/D752 variant has been observed in Ireland.
The abundance of A2254/N752 variants in the majority of field studies globally suggests that the proposed selective advantage of the G2254/D752 variant has not resulted in strain displacement in the wider equine population [25,35,48,49,51,52].
Viraemia is essential for the spread of the virus from peripheral blood mononuclear cells (PBMCs) to endothelial cells lining the blood vessels in the CNS or the pregnant uterus [11,53]. Horses experimentally infected with neuropathogenic strains develop a cell-associated viraemia greater in magnitude and longer in duration than with non-neuropathogenic virus strains [16,19,54] and G2254/D752 strains are more successful in the infection of PBMCs and the establishment of viraemia compared to A2254/N752 strains [46,55]. In this study there was a statistically significant association between viruses of the G2254/D752 genotype and hypervirulent disease expression as defined by outbreaks involving multiple abortions or neurological cases. International findings related to the association of EHV-1 genotype with pathogenic phenotypes vary. Nugent et al. (2006) examined 131 EHV-1 isolates from nine countries [21]. Of the 49 neurological isolates examined, 42 (86%) had the G2254/D752 genotype whereas 78/82 (95%) of non-neurological isolates had the A2254/N752 genotype. Following this study several large outbreaks of EHM documented in the literature were associated with the G2254/D752 genotype including outbreaks in Croatia [56], France [49], Germany [57,58], Canada [44] and the first reported outbreak of EHM in New Zealand [59]. However, studies in other countries showed that not all horses with EHM were infected with a strain of the G2254/D752 genotype and that A2254/N752 variants are also associated with neurological disease [24,26,52,60]. Similarly, the G2254/D752 genotype was associated with non-neurological/abortion outbreaks in Europe [25,51,60] and the Americas [48,52]. In this study, the likelihood of neurological disease was 27 times greater when the causal virus was of the G2254/D752 genotype rather than the A2254/N752 genotype. 
However, the onset of neurological disease cannot be fully attributed to this virus polymorphism and it is suggested that other viral pathogenicity determinants such as glycoprotein D and host factors such as age, gender, immunity and hormonal status may contribute to disease severity [46,61]. More recently Brosnahan et al. (2018) [62] investigated the role of host genetics and identified a SNP in an intron of a platelet-related gene associated with EHM.
Since 2006, when it was first proposed by Nugent et al. (2006) [21] that the ORF68 polymorphic region was a putative molecular marker for epidemiological studies, this region has been commonly used for genotyping of EHV-1 isolates in different countries: Australia [24], Ethiopia [31], Hungary [30], India [29], Japan [33], and Poland [32]. The study by Nugent et al. (2006) [21] identified six major groups (1-6) and two unassigned groups based on analysis of 106 global isolates and proposed that certain strain groups were geographically restricted. Sequence analysis in this study showed that all Irish isolates segregated into the six groups and one of the two unassigned groups described by Nugent et al. (2006) [21]. The majority of viruses characterised internationally also support this ORF68 grouping system. Cuxson et al. (2014) classified 52 Australian isolates as group 2 or 3 and two as group 5 [24]. Ninety-one Ethiopian isolates were restricted to group 4 [31], eight Indian isolates clustered within groups 4 and 5 [29] and a Japanese isolate was classified as group 2 [33]. However, several of these studies also reported a small number of viruses that could not be classified within the original proposed groups [24,29,33]. Studies in Eastern Europe identified further polymorphism. A study of 38 viruses from cases of abortion in Poland assigned three to group 3, four to group 4, and 22 to one of the unassigned groups [21] but nine were classified in two novel groups [32]. Similarly, in Hungary only 23 of 35 isolates fitted with the originally described groups (groups 2, 3, and 4) and four new groups were proposed [30]. None of the viruses in this study grouped in the novel groups proposed.
The original hypothesis that ORF68 groups are geographically restricted is not supported by the results of our study or those of other investigators. For example, Nugent et al. (2006) [21] found that all group 5 isolates came from outbreaks in North America; however, 9% (20/222) of Irish isolates belong to this group, which has been demonstrated to include viruses from Australia [24] and India [29]. Network analysis of Irish isolates with international strains showed that Irish isolates clustered within seven nodes with isolates from several different geographic regions. In agreement with the conclusions from studies in Hungary and Poland, this suggests that ORF68 is not a suitable global marker [30,32]. However, this type of strain variation has been demonstrated to be a useful adjunct to epidemiological data when investigating disease outbreaks on multiple premises [56,57,63]. In this study ORF68 genetic analysis of Irish isolates substantiated virus tracking by multi-locus typing. However, analysis of 222 isolates by both ORF68 sequence and multi-locus typing indicated that although both are useful molecular epidemiological tools, multi-locus typing is more accurate. The ORF68 grouping system groups together some viruses from different clades which are readily distinguishable by multi-locus analysis. As more EHV-1 strains are sequenced internationally, additional polymorphisms and new clades are likely to be identified, with the potential to further refine epidemiological investigations in identifying transmission pathways.
In conclusion, this is the first study to explore the genetic diversity of EHV-1 in Ireland, the third largest producer of thoroughbred foals in the world [64]. The contribution of genetic characterisation to our understanding of viral pathogenesis, development of diagnostics, implementation of evidence-based management strategies, and predictions of likely outcome and disease spread is increasing. The data relating to over 250 EHV-1 isolates presented here adds depth to our knowledge of circulating genotypes and illustrates that tracking of virus by genetic analysis when used in combination with epidemiological data gives valuable insights and support for targeted preventive measures. An example of a targeted preventive measure is where on acceptance of data implicating mare sales as the transmission pathway for EHV-1 abortions at geographically disparate locations, a sales company introduces a new condition of sale that all pregnant mares are vaccinated against EHV-1. Consistent with previous studies globally, our results indicated that infection with a strain of the G2254/D752 genotype will not inevitably result in neurological disease. Nevertheless, the strong association with hypervirulence observed in this study suggests that it would be of benefit to veterinarians to be aware that horses in their care are at increased risk of developing EHM or multiple abortions when a virus of this genotype is detected.

Viruses
EHV-1 viruses archived at the Virology Unit of the Irish Equine Centre between 1990 and 2017 were retrieved along with the clinical histories available. The clinical samples, which had been stored at −70 °C, included nasal secretions and tissue homogenates from cases of neurological disease, abortion, and neonatal foal deaths. Two hundred and sixty-nine isolates originated from Ireland. Three isolates from two severe abortion outbreaks in Italy were also included. The samples were retrospectively allocated a unique reference number derived from the country, laboratory number and year of collection. A summary of the numbers of isolates analysed in this study is given in Table 3. An overview of all samples included in this study is given in Supplementary Table S1. Two hundred and seventy-two isolates (269 horses) from 238 outbreaks on 220 premises were genetically characterised by multi-locus typing. Representative viruses from 222 of these 238 outbreaks were also characterised by ORF68 sequencing. The majority of the viruses were recovered from clinical samples: nasal secretions from respiratory/neurological disease (n = 10) and tissues (lungs, liver, spleen, allantochorion, and amniotic cord) from cases of abortion/neonatal foal death (n = 237). Twenty-five samples with a low concentration of virus (brain tissue from cases of neurological disease (n = 2), nasal swabs (n = 14) and multiple tissues (n = 9)) were amplified by a single passage in cell culture. For culture isolation of these viruses, 25 cm³ tissue culture flasks of near confluent rabbit kidney (RK-13) cells were inoculated with 0.5 mL of tissue homogenate/nasal fluid. The cells were maintained in 5 mL of maintenance medium (supplemented with 2% foetal calf serum) at 37 °C in an atmosphere of 5% CO2 [65]. The monolayer was examined for the presence of cytopathic effect (CPE).
Where sequences were compared between DNA samples prepared directly from the sample and tissue culture isolates from the same outbreak, they were found to be identical (n = 10).

Extraction of DNA
DNA was extracted from 200 µL of tissue homogenates/nasal secretions/RK-13 infected cells using the QIAamp DNA Mini kit (Cat No: 51306, Qiagen) according to the manufacturer's instructions. Alternatively, DNA was extracted from 100 µL of sample by an automated method using the KingFisher Flex Magnetic Particle Processor instrument (Thermo Scientific) with the LSI MagVet Universal Isolation Kit (Life Technologies) as per the kit manufacturer's guidelines.

PCR of multiple loci of EHV-1
The complete genome sequences of Ab4 (AY665713.1) and V592 (AY464052.1) were aligned using ClustalW [66]. Non-synonymous changes between the two genomes were identified in the protein coding regions (as described previously [21]) and primers were designed with similar annealing temperatures to amplify the loci of sequence variation between Ab4 and V592 (see Table 4). Primers were not designed for ORF24 and ORF71 as these contained repeat regions considered to be of limited use for epidemiological studies [21]. Primer sequences are detailed in Supplementary Table S5. Post-amplification, 5 µL of each PCR product was analysed on a 1.2% agarose gel (Sigma) stained with 0.003% SYBR Safe (Invitrogen). Reactions were purified using the QIAquick PCR Purification kit (catalogue no. 28106, Qiagen) or the QIAquick Gel Extraction kit (catalogue no. 28706, Qiagen). Purified PCR products were sequenced using Sanger dideoxynucleotide sequencing technology (MRC-University of Glasgow Centre for Virus Research, Glasgow, UK; GATC-Biotech, Cologne, Germany).

Multi-Locus Sequence Analysis
Nucleotide sequences obtained from targeted multi-locus sequence analysis were aligned to individual ORFs of reference strains Ab4 (neuropathogenic, G2254/D752 strain) and V592 (non-neuropathogenic, A2254/N752 strain) using SeqMan. Comparative analysis of predicted partial amino acid sequences was carried out for each isolate using the ClustalW [66] accessory application in BioEdit sequence alignment editor version 7.2.5 [67]. Twenty-eight complete and 78 partial EHV-1 genome sequences that had been included in the study by Bryant et al. (2018) [35] were mined from GenBank [68] (Supplementary Table S2). Nucleotide sequences of individual ORFs were translated using ClustalW implemented in BioEdit. Multiple amino acid sequence alignments were produced for individual ORFs with Ab4 as the reference sequence. Positions of amino acid sequence variation were recorded and tabulated for the sequenced isolates and EHV-1 genome sequences.
Thirty-one non-synonymous substitutions between Ab4 and V592 in 26 ORFs in UL and US were examined. Amino acid alignments of complete genome sequences and isolates sequenced in this study identified seven substitutions additional to those between Ab4 and V592, in ORFs 11 (R235M), 13 (A405T, E492K, T493I, A499T), and 14 (R628K, S692N), which were also included in the analysis. A concatenated amino acid sequence based on these 38 amino acid differences was constructed for each isolate and for the EHV-1 genome sequences. The resulting 38-aa artificial peptide sequences (n = 321) were aligned using ClustalW (Supplementary Figure S1). Representative sequences (n = 126, including 96 isolates sequenced in this study) were aligned in MUSCLE [69] implemented in MEGA7 version 7.0.14 [70]. The phylogeny of these 126 sequences was inferred by the Maximum Likelihood method based on the Jones-Taylor-Thornton (JTT) matrix-based model [71], with bootstrap values determined over 100 iterations. The topology of the tree was examined for UL clade resolution based on the study of Bryant et al. (2018) [35].
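The construction of the artificial peptide can be illustrated with a minimal Python sketch. It is not the workflow used in the study (which relied on ClustalW, BioEdit and MEGA7); the data layout, the function names and the example isolates are hypothetical, and only the seven typing positions named above are listed rather than all 38. Residues are assumed to be indexed by their position in the Ab4 reference protein.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative subset of typing sites: (ORF, 1-based amino acid position).
# Only the seven positions named in the text are listed; the study used 38.
TYPING_SITES: List[Tuple[str, int]] = [
    ("ORF11", 235),
    ("ORF13", 405), ("ORF13", 492), ("ORF13", 493), ("ORF13", 499),
    ("ORF14", 628), ("ORF14", 692),
]

# Hypothetical layout: ORF name -> {reference position: residue observed}.
Proteins = Dict[str, Dict[int, str]]


def artificial_peptide(proteins: Proteins, sites=TYPING_SITES) -> str:
    """Concatenate the residue found at each typing site.

    A '-' is written where no sequence is available for a site.
    """
    return "".join(proteins.get(orf, {}).get(pos, "-") for orf, pos in sites)


def group_by_profile(isolates: Dict[str, Proteins]) -> Dict[str, List[str]]:
    """Group isolate names that share an identical artificial peptide."""
    groups = defaultdict(list)
    for name, proteins in isolates.items():
        groups[artificial_peptide(proteins)].append(name)
    return dict(groups)


# Hypothetical isolates, for illustration only (not data from the study).
example_isolates: Dict[str, Proteins] = {
    "isolate_A": {
        "ORF11": {235: "R"},
        "ORF13": {405: "A", 492: "E", 493: "T", 499: "A"},
        "ORF14": {628: "R", 692: "S"},
    },
    "isolate_B": {
        "ORF11": {235: "R"},
        "ORF13": {405: "A", 492: "E", 493: "T", 499: "A"},
        "ORF14": {628: "R", 692: "S"},
    },
    "isolate_C": {  # ORF14 not sequenced in this made-up example
        "ORF11": {235: "M"},
        "ORF13": {405: "A", 492: "E", 493: "T", 499: "A"},
    },
}

profiles = group_by_profile(example_isolates)
```

Isolates that share a profile string fall into the same group, which mirrors the observation that viruses from the same outbreak had identical multi-locus profiles. A real analysis would also need a rule for isolates with missing positions.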

PCR of EHV-1 ORF68
PCR primers were designed to amplify ORF68 (US2) (detailed in Table 5) based on EHV-1 reference sequences Ab4 (GenBank accession AY665713.1) and V592 (GenBank accession AY464052.1) using the online application Primer3 [72]. Amplification was performed using the G-Storm (Gene Technologies) with the PCRx Enhancer System (catalogue no. 11495-017, Invitrogen), which is designed for the amplification of problematic and/or GC-rich templates. The reaction mixture consisted of 1× PCRx Amplification Buffer, 1.5 mM MgSO4, 2× PCRx Enhancer Solution, 0.4 µM each primer, 5 U of Taq DNA Polymerase (5 U/µL, Invitrogen), 0.2 mM dNTP Mixture (Applied Biosystems), 10 µL of template DNA, and nuclease-free water to a final volume of 50 µL. Initial denaturation was carried out at 95 °C for 5 min, followed by amplification with 40 cycles of 95 °C for 30 s, 50 °C for 1 min, and 72 °C for 2 min, and a final elongation at 72 °C for 10 min.
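As a small worked example, the cycling profile above can be written down as data and the programmed hold time summed. This is only an arithmetic sketch in Python; the variable and function names are hypothetical, and ramp times between temperatures (which depend on the instrument) are not included.

```python
# Cycling profile transcribed from the text: (step, temperature in °C, hold in s).
INITIAL_DENATURATION = ("initial denaturation", 95, 5 * 60)
CYCLE_STEPS = [
    ("denaturation", 95, 30),
    ("annealing", 50, 60),
    ("extension", 72, 2 * 60),
]
N_CYCLES = 40
FINAL_ELONGATION = ("final elongation", 72, 10 * 60)


def total_hold_seconds() -> int:
    """Sum the programmed hold times, excluding temperature ramping."""
    per_cycle = sum(hold for _, _, hold in CYCLE_STEPS)
    return INITIAL_DENATURATION[2] + N_CYCLES * per_cycle + FINAL_ELONGATION[2]


total_minutes = total_hold_seconds() / 60

# Volume of enzyme stock needed: 5 U of Taq from a 5 U/µL stock.
taq_volume_ul = 5 / 5
```

Under these figures the programmed holds come to 155 min per run, and the 5 U of polymerase corresponds to 1 µL of the 5 U/µL stock in each 50 µL reaction.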

ORF68 Sequence Analysis
The four overlapping sequence regions of the 1313 bp ORF68 amplicon were assembled for each isolate using SeqMan version 5.01 (DNASTAR). Nucleotide sequence alignments were performed using the ClustalW [66] application in BioEdit [67]. Sequences were aligned using the Ab4 strain as a reference to identify variable positions and to group the isolates as per Nugent et al. (2006) [21]. Isolates from 222 of the 238 outbreaks, an outbreak being defined as an occurrence of one or more cases in an epidemiological unit, were included in the analysis. The remaining 16 outbreaks were excluded because only incomplete ORF68 sequence data were obtained. ORF68 sequence data for a further 219 EHV-1 isolates were retrieved from GenBank (sequence information can be found in Supplementary Table S4). The ORF68 alignment, 464 bp in length, was converted to nexus file format using Seqret in EMBOSS [73]. An international median-joining haplotype network of EHV-1 ORF68 sequences, colour-coded by geographic location, was constructed in PopART version 1.7 [74] as described previously for Polish EHV-1 strains [32].
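The idea of reading variable positions against a reference and collapsing identical sequences into haplotypes can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the sequences are already aligned and of equal length, gaps and ambiguity codes are not handled, and the toy sequences and names are hypothetical. It does not reproduce the ORF68 grouping scheme of Nugent et al. (2006) or the median-joining algorithm used by PopART.

```python
from collections import defaultdict
from typing import Dict, List


def variable_positions(reference: str, aligned: Dict[str, str]) -> List[int]:
    """Return 1-based alignment columns where any sequence differs from the reference."""
    columns = []
    for i, ref_base in enumerate(reference):
        if any(seq[i] != ref_base for seq in aligned.values()):
            columns.append(i + 1)
    return columns


def haplotypes(aligned: Dict[str, str], columns: List[int]) -> Dict[str, List[str]]:
    """Group sequence names by the bases they carry at the given columns."""
    groups = defaultdict(list)
    for name, seq in aligned.items():
        key = "".join(seq[c - 1] for c in columns)
        groups[key].append(name)
    return dict(groups)


# Toy alignment for illustration only (not ORF68 data).
toy_reference = "ACGTACGTAC"
toy_alignment = {
    "seq_1": "ACGTACGTAC",  # identical to the reference
    "seq_2": "ACGAACGTAC",  # differs at column 4
    "seq_3": "ACGAACGTAT",  # differs at columns 4 and 10
    "seq_4": "ACGAACGTAC",  # same haplotype as seq_2
}

toy_columns = variable_positions(toy_reference, toy_alignment)
toy_haplotypes = haplotypes(toy_alignment, toy_columns)
```

In this toy case the four sequences collapse into three haplotypes defined by two variable columns. A haplotype network then connects such haplotypes by the number of differences between them.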

Statistical Methods
A chi-squared test for association and proportion was used to test the null hypothesis that there was no difference in the relative proportions of the G2254/D752 genotype between isolates originating from neurological and non-neurological outbreaks; between isolates originating from single cases of neurological disease and outbreaks with multiple neurological cases; and between isolates originating from hypervirulent disease expression, defined as multiple abortions or multiple cases of neurological disease, and those from sporadic cases. The same statistical test was used to investigate the statistical significance of the G2254/D752 genotype, hypervirulent disease expression, and neurological disease across the clades where Irish strains resided. Odds ratios for neurological disease, hypervirulent disease expression, and multiple neurological cases with the putative neurological marker were estimated with a 95% confidence interval. The data were summarised in 2 × 2 contingency tables and analysis was conducted in the R statistical software package version 3.5.1. Statistical significance was set at α = 0.05.
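For readers unfamiliar with these calculations, the following Python sketch shows a Pearson chi-squared test and an odds ratio with a 95% confidence interval for a single 2 × 2 table. It is an illustration only: the study used R, the exact R functions and options are not stated, and the counts below are hypothetical rather than data from this study. The sketch assumes no continuity correction (R's chisq.test applies one to 2 × 2 tables by default) and uses the log (Woolf) method for the interval, which requires all four cells to be non-zero.

```python
import math
from typing import Tuple


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared statistic and p-value for the table [[a, b], [c, d]].

    One degree of freedom, no continuity correction (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.959964) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (log-based) confidence interval; z = 1.96 gives 95%."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only:
# rows = marker present / absent, columns = outcome present / absent.
a, b, c, d = 12, 8, 10, 70

chi2, p_value = chi_squared_2x2(a, b, c, d)
odds_ratio, ci_lower, ci_upper = odds_ratio_ci(a, b, c, d)
significant = p_value < 0.05  # alpha as stated in the text
```

With small expected cell counts, an exact test or a corrected statistic may be more appropriate than the uncorrected chi-squared shown here.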

Supplementary Materials:
The following are available online at http://www.mdpi.com/2076-0817/8/1/7/s1, Figure S1: Alignment of 322 EHV-1 artificial peptide sequences; Table S1: Summary of EHV-1 strains genotyped and listed in order of UL clades; Table S2: Accession codes for EHV-1 sequences used in alignment; Table S3: Premises where more than one isolate was characterised by multi-locus sequence typing; Table S4: Details of EHV-1 ORF68 sequences used in network analysis; Table S5: Primer sequences used for multi-locus amplification.