Every year, millions of horses are transported over long distances by road, sea, and air [1]. Horses may be transported for various purposes and, unlike other farmed species, many times in their lives [3], and travel conditions and the related welfare consequences differ depending on the situation. Although transport is a potential source of physical and psychological stress for all horses, the risks of mortality, disease, and injury are generally higher for low-value animals, such as meat horses, which are often transported in unsatisfactory conditions [4]. Moreover, several countries have no plants that slaughter horses for human consumption. As a result, each year hundreds of thousands of horses are subjected to a gruelling cross-border journey (>8 h of travel) that ends in slaughter [2]. Long journeys increase the risk of welfare problems and often obscure information about the conditions of transport [8].
The effects of transportation on the welfare of horses include anxiety-related behaviours, aggression, exhaustion, injury, respiratory and gastrointestinal disease, dehydration, pyrexia, and immunosuppression [3]. About 1% of horses die en route [4], but a greater percentage are euthanised later because of severe injuries sustained during the journey, or have non-visible injuries, such as bruising, that are only recognisable post-mortem [11]. Behavioural and physiological responses, as well as injury rates, are affected by management factors such as vehicle specification, journey duration, and driver experience [12], and also by the physical fitness of the horse, its temperament, and its coping strategies [3]. For example, Grandin et al. recommended that aggressive horses be segregated during transport because fighting was documented as a major cause of injury [17]. Fazio et al. evaluated the stress responses of stallions with different temperaments and found that nervous stallions had a poor capacity to adapt to transport, probably as a result of adrenocortical depletion [18]. On the other hand, physiological changes and problem behaviours decrease with repeated transport, suggesting that transport-induced stress responses are reduced in horses that are habituated to the situation [19].
The influence of prior handling on the degree of transport stress experienced by horses has not been extensively studied. It is known, however, that adaptive abilities and coping strategies differ between broken (tamed) and unbroken (untamed) horses [21]. The response of unbroken horses to challenging situations is characterised by high arousal, fear, excitability, and a negative emotional state, which increase their risk of distress and travel-related pathologies. Knowles et al. [5] confirmed that transport management of unbroken ponies should take into account their ethology and their physiological responses to stressors. For example, unlike handled horses, untamed ponies are better transported as a group [5]. It is likely that their strong herding instinct underlies this difference [5]. The same authors recognised the importance of a tool that could predict an individual horse's response to transport stress before the journey begins, but they did not find a strong relationship between pre-transport behaviour and aggressive behaviour during transport. However, rating the reactions of untrained horses during a novel object or handling test may be a better predictor of their temperament, emotional responses, and coping strategies [23].
The European Union recognises that broken and unbroken horses have different needs and adaptive capacities. For this reason, Regulation EC 1/2005 on the protection of animals during transport includes stricter rules for the transportation of unbroken animals: they must not be transported on journeys longer than eight hours, tied during transport, or transported in individual bays, but must instead travel in groups of no more than four. The Regulation defines an unbroken horse as one that “cannot be tied or led by a halter without causing avoidable excitement, pain or suffering”. However, this definition is not accompanied by verification procedures, so in practice there is still no test to determine whether a horse is broken or unbroken. Consequently, even when violations are suspected during roadside inspections, no fines can be imposed, because official veterinarians have no means of establishing whether a horse is broken. The European Parliament has expressed serious concerns about horse welfare during transportation and acknowledges that there is still a high level of regulatory non-compliance, mainly related to unbroken horses [2]. It is therefore essential to provide official veterinarians with a tool that allows them to categorise horses and, consequently, to direct transporters towards the correct transport procedures. A reliable test, and the resulting reduction in the number of horses that travel under inappropriate conditions, would prevent many injuries and substantial suffering.
The objective of this study was to develop and validate a behavioural test to identify whether a horse is broken or unbroken. The study was based on the hypothesis that horses show different behavioural and physiological responses to being approached, haltered, and led, depending on their prior level of tameness.
This study describes the development and validation of a test (the BUT) that, for the first time, allows horses to be classified as broken or unbroken. Our results confirmed the hypothesis that horses with different levels of prior handling react differently to being approached, haltered, and handled. The BUT is based on scoring the horse's behaviour when it is approached, haltered, and handled in a standardised way. Each horse receives a score from 0 to 4, where 0 indicates the worst response and 4 the best. Thus, low BUT scores were assigned to horses that showed nervousness and avoidance behaviours, or that could not be approached, haltered, and led, whereas high BUT scores were assigned to horses that showed fewer avoidance behaviours and could be approached, haltered, and led easily. We established a threshold that classifies a horse as broken (BUT score ≥ 2) or unbroken (BUT score < 2). This simple test could fill a legislative gap: although Regulation EC 1/2005 includes different rules for the transport of broken and unbroken horses, no tool for classifying horses has previously been available. Because of their greater reactivity, unbroken animals are at increased risk of injury and disease. If the BUT were included in future transport regulations worldwide, it would help ensure that the correct transport procedures are followed for these animals and would help officials to verify regulatory compliance. Routine application of the BUT, before departure, to a randomly selected sub-group of the horses in a consignment could therefore help safeguard the horses' welfare.
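The classification rule stated above can be expressed as a short function. This is a minimal sketch, assuming the BUT score is recorded as an integer from 0 to 4; the function name `classify_but` is hypothetical, and the only rule it encodes is the reported threshold (broken if the score is ≥ 2, unbroken otherwise).

```python
def classify_but(score: int) -> str:
    """Map a BUT score (0-4) to the binary category reported in the text.

    Hypothetical helper: broken if score >= 2, unbroken if score < 2.
    """
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("BUT score must be an integer")
    if not 0 <= score <= 4:
        raise ValueError("BUT score must lie between 0 and 4")
    return "broken" if score >= 2 else "unbroken"


# Illustration over the whole scale: 0 and 1 -> unbroken, 2 to 4 -> broken
for s in range(5):
    print(s, classify_but(s))
```

Rejecting out-of-range values, rather than clamping them, is a design choice of this sketch so that data-entry errors are not silently classified.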
The current study followed the rigorous validation process required to confirm the reliability and validity of behavioural rating scales [36]. Agreement analyses showed that the AHT, the HT, and the BUT all have excellent inter-observer and intra-observer reliability. The highest agreement indices were obtained for score 0 and the lowest for score 1. These findings are not surprising: a score of 0 indicates not only high arousal (e.g., aggressive responses, fear, excitation) but also test failure (the animal could not be haltered or led), a situation that is well defined and unambiguous. A score of 1, by contrast, indicates an intermediate arousal level (moderate reluctance and one or more avoidance behaviours), where subjective judgement could have a greater influence. This is the first test developed to assess horses' level of prior tameness, so there is no literature with which to compare the results directly. However, similar values of inter-observer agreement have been reported for tests evaluating the human–animal relationship [31] and for some pain scales [25]. Czycholl et al. [49] recently evaluated the inter-observer reliability of the indicators proposed by the AWIN protocol for horses, including behavioural tests scored on 3- or 2-point scales. They reported acceptable-to-good agreement for all indicators but highlighted that behavioural responses such as fear and avoidance, as well as approach tests, may show low reliability in horses because, as in other species [50], horses may show incongruent behaviours (e.g., the simultaneous presence of curiosity and fear could lead a horse to approach, run away, and then return). Although defining an exact level of arousal may be problematic, our experience shows that rating horses' behavioural responses with the BUT is a reliable and easy way to judge the prior handling and tameness of unfamiliar horses.
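The section does not state which agreement statistic underlies the reported indices. As an illustration only, the sketch below computes a linearly weighted Cohen's kappa, a common choice for ordinal scales such as the 0–4 BUT score, for two observers scoring the same horses. The function name and the example ratings are hypothetical. Dichotomising the scores (score equal to c versus any other score) gives a per-score kappa, one way of examining why agreement could be higher for score 0 than for score 1.

```python
def weighted_kappa(rater_a, rater_b, categories=(0, 1, 2, 3, 4), weights="linear"):
    """Cohen's weighted kappa for two raters scoring the same items.

    Disagreement weights are |i - j| / (k - 1) ("linear") or its square
    ("quadratic"); with only two categories this equals unweighted kappa.
    """
    cats = list(categories)
    k = len(cats)
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    pos = {c: i for i, c in enumerate(cats)}

    # Observed joint proportions of (rater A category, rater B category)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[pos[a]][pos[b]] += 1.0 / n

    # Marginal proportions for each rater
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d ** 2

    obs_dis = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - obs_dis / exp_dis


# Hypothetical BUT scores given by two observers to the same eight horses
obs1 = [0, 0, 1, 2, 3, 4, 4, 1]
obs2 = [0, 0, 2, 2, 3, 4, 3, 1]
print("weighted kappa:", round(weighted_kappa(obs1, obs2), 3))

# Per-score agreement: dichotomise as "score == c" versus "any other score"
for c in (0, 1):
    a = [s == c for s in obs1]
    b = [s == c for s in obs2]
    print("score", c, "kappa:", round(weighted_kappa(a, b, categories=(False, True)), 3))
```

The same function could be applied to intra-observer (same observer, repeated scoring) or test–retest pairs; whether the study used kappa, an intraclass correlation, or another index is not stated here.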
Test–retest reliability data were obtained by repeating the BUT on a subset of horses three weeks after the first session. Although test–retest reliability was lower than inter- and intra-observer reliability, it was good for the AHT and the HT, and excellent for the BUT. This was expected, as test–retest agreement is commonly accepted to be influenced by many factors, including changes in test conditions, the physical or mental state of the animal, and its learning experience [36]. In the present study, the main sources of variability affecting test–retest agreement were likely to be intra-observer variation and changes in the horses' behavioural responses. Even in a well-defined test situation, test–retest reliability is very sensitive to the animal's affective state and mood on the day. For example, a positive emotional state could shorten the latency to approach humans and reduce aggressive and fearful behaviours [52], leading to a higher BUT score. Conversely, a negative emotional state leads to fearful and cautious reactions [52], which could result in a lower BUT score. In the repeated BUTs, although three weeks had passed between the two tests, a positive or negative emotional response could also be linked to the horse's memory of its previous BUT experience [50]. Pain is another factor that could influence a horse's response to human approach [31]. To limit this bias, only horses that appeared healthy on visual clinical evaluation were used. Moreover, although the tester and test area were the same in the two repetitions and no major changes had occurred on the farms, identical test conditions could not be guaranteed. For example, changes in the social dynamics of the herd or in the farmer's handling between the two sessions could have affected the horses' behaviour and, therefore, the test scores. Most horses were tested in the presence of other animals, which could also confound the results. None of these factors could be controlled in the present study, and this may explain why test–retest reliability was lower than inter- and intra-observer reliability. Despite these caveats, the agreement indices indicate that the horses' responses to the BUT were consistent over time. We suggest, however, that field assessors take into account the environmental and psychological context of the test when scoring, as the stability of the BUT across different situations has not been confirmed. We also suggest that the tester wear protective equipment and always stop the test at the first signs of distress or aggressive behaviour.
Our BUT demonstrated both construct and criterion validity. The correlation between the BUT and the recorded physiological and behavioural measures, which are known to be related to the level of taming [21], confirmed its construct validity [36]. Some authors have claimed that the different reactions of “naïve” and trained horses to stimuli are related to activation of different areas of the brain [22]. In particular, broken horses do not usually show negative reactions when exposed to humans or novel environments, whereas unbroken horses typically show a high level of emotion and display stress-related behavioural (aggression, fear, vocalisation, defaecation, and so on) and physiological (changes in heart rate, blood pressure, hormones, respiration rate, and so on) responses [21]. Several authors [23] have shown that, during novel object and handling tests, broken horses show a smaller increase in heart rate, approach the tester sooner, and are caught more quickly than unbroken horses. It has also been shown that emotional reactivity continues to decrease, and the human–horse relationship continues to improve, as the number of training and handling sessions increases [60]. Our correlation analyses confirmed that a high BUT score indicates a broken horse, as it was associated with a lower respiratory rate (RR), a shorter avoidance distance, and shorter times to approach, halter, and handle the horse. Conversely, horses with a low BUT score could be defined as unbroken because they showed a high RR, a long avoidance distance, and long test times. However, there was no correlation between BUT score and heart rate (HR) or eye temperature (ET). HR has previously been used to assess the personality and reactivity of horses [23], including unbroken horses [58]. The inconsistency between our results and those of previous studies is likely to be due to the amount of missing HR data in our study. Measuring HR required close contact with the animal, so it could not be collected for many horses, particularly unbroken ones, because the test was stopped if the horse showed signs of distress (e.g., a flight response). Moreover, HR in horses increases during physical exercise, and after the BUT it could have been difficult to distinguish between emotional and physical causes of an increase in HR. Eye temperature, on the other hand, has been used as an indicator of arousal in horses [33], but it can be influenced by many factors, especially when measured in the field [34]. We tried to standardise some of these factors, but this was not always possible under field conditions. Indeed, ET was still strongly influenced by ambient temperature and measurement distance, which could act as confounding factors masking its association with the animal's level of reactivity. In the context of the present study, parameters that do not require close contact with the animal seem not only feasible but also fit for purpose. Higher respiratory rates, the presence of avoidance behaviour, and longer approach times in unbroken horses indicate activation of the sympathetic nervous system and a “fight-or-flight” response. They also suggest that these horses perceive approach by a human as a dangerous situation that triggers a negative emotional state and a high degree of arousal [24]. Fearful and stressed horses are more likely to develop transport-related respiratory disease after an 8 h journey [62], so unbroken horses may be at higher risk on journeys of this length. The criterion validity was investigated by using the expert's judgement as the ‘gold standard’ against which to compare the BUT score. The expert evaluated the horse in the test area and classified it as broken or unbroken based on its ability to respond to pressure, used as negative reinforcement [28]. Our findings confirm the responsiveness and predictive value of the BUT, as the likelihood of a horse being broken increased substantially with each additional BUT score point. The responsiveness of the BUT score also suggests that it could be used to indicate different levels of prior taming. For example, a BUT score of 0 could be defined as “no taming”, while scores of 2 and 4 could be defined as “moderate” and “good” levels of taming, respectively.
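The correlation analysis described above relates an ordinal score to continuous measures, for which a rank correlation is a natural choice. The text does not name the coefficient used, so the following is an assumed illustration: Spearman's rho computed as the Pearson correlation of average ranks (which handles the many tied BUT scores), applied to hypothetical BUT scores and respiratory rates. In practice, `scipy.stats.spearmanr` computes the same coefficient together with a p-value.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for t in range(start, end + 1):
            ranks[order[t]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: BUT score and respiratory rate (breaths/min) for eight horses
but_scores = [0, 0, 1, 1, 2, 3, 4, 4]
resp_rate = [40, 36, 32, 35, 24, 20, 16, 18]
print("Spearman rho:", round(spearman(but_scores, resp_rate), 3))
```

A negative rho here corresponds to the direction reported above: higher BUT scores going with lower respiratory rates. The sketch assumes neither variable is constant, since a zero-variance sample would make the coefficient undefined.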
The utility of the BUT as a tool for widespread use lies in its ability to discriminate, with high sensitivity and specificity, between broken and unbroken horses. Specifically, a horse would be classified as broken if its BUT score was ≥2 and as unbroken if it was <2. Our statistical approach thus confirmed the criterion validity of the BUT and suggested a rigorous procedure for applying it. After adequate training, official veterinarians would be able to score a horse's behaviour objectively using the BUT, decide whether it is broken or unbroken, and advise on the transport procedures that should be put in place. Beyond the binary classification (broken vs. unbroken), which should direct personnel towards specific transport procedures, the BUT score may indicate the horse's level of taming. This information could accompany the animal throughout its transport and could also be relevant to human safety, because the fearful and aggressive reactions that characterise a low level of taming have been identified as the major cause of horse-related accidents [21].
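The text reports high sensitivity and specificity for the ≥ 2 threshold but does not describe how the cut-off was chosen. One common approach, shown here only as an assumed illustration, is to compare each candidate cut-off against the expert's classification (the gold standard) and select the one that maximises Youden's J = sensitivity + specificity − 1. ‘Broken’ is treated as the positive class, and the function names and data are hypothetical.

```python
def sens_spec(scores, expert_broken, cutoff):
    """Sensitivity and specificity of the rule 'score >= cutoff means broken'.

    'Broken' (according to the expert) is treated as the positive class.
    """
    tp = fn = tn = fp = 0
    for score, is_broken in zip(scores, expert_broken):
        predicted_broken = score >= cutoff
        if is_broken and predicted_broken:
            tp += 1
        elif is_broken:
            fn += 1
        elif predicted_broken:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one broken and one unbroken horse")
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, expert_broken, candidates=(1, 2, 3, 4)):
    """Candidate cut-off with the highest Youden's J (sensitivity + specificity - 1)."""
    def youden(c):
        sens, spec = sens_spec(scores, expert_broken, c)
        return sens + spec - 1
    return max(candidates, key=youden)


# Hypothetical BUT scores and expert classifications (True = broken)
scores = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
expert = [False, False, False, True, True, True, True, True, True, True]
c = best_cutoff(scores, expert)
print("best cut-off:", c, "sens/spec:", sens_spec(scores, expert, c))
```

If two cut-offs tie on J, `max` returns the first candidate; that tie-breaking rule is a choice of this sketch, not of the study.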
The results achieved so far indicate that the BUT could be a reliable and valid tool. In contrast, when the observers were asked to classify horses using the definition in Regulation EC 1/2005, all agreement analyses showed that this classification system had poor reliability. It follows that the definition of unbroken horses, as written in the current legislation, is unclear. This could have led to confusion and, consequently, to the transport of unbroken horses over long distances and in inappropriate conditions [2]. This highlights the need to include a better definition of unbroken horses in the ongoing revision of the current legislation. However, we would also like to question the terminology currently used to describe horses' prior level of handling and training (i.e., taming). The term ‘unbroken’ was used in this study because it is the term used in Regulation EC 1/2005 to describe untamed horses. Yet the converse state (‘broken’) implies that the animal has been ‘defeated’, ‘beaten’, ‘overpowered’, or ‘vanquished’, terminology that is outdated at a time when humane handling and training procedures are prevailing worldwide. The term ‘broken’ is also used to indicate that a horse has been trained for riding or driving, which is irrelevant in the context of this legislation. We therefore suggest that the term ‘unbroken’ be replaced with ‘unhandled’ or ‘untamed’ in the updated version of Regulation EC 1/2005.
Our findings need to be interpreted with caution because this study has several limitations. The BUT was applied to a single draught horse breed, and all horses were tested in their paddock. Consequently, our findings need to be confirmed by applying the BUT to a larger population of horses housed in both familiar and unfamiliar environments. Untamed meat horses are often moved into unfamiliar pens and held there at a low space allowance before loading, so the BUT should also be evaluated in this real-world setting, where other practical problems may arise that require the described procedures to be refined. Nevertheless, even though our results are preliminary, they confirmed that tamed and untamed horses react differently when approached, haltered, and led. These differences in reactivity and in their relationship with humans suggest that, as is currently the case under Regulation EC 1/2005, different transport procedures must be followed for these two groups of horses. This would help to reduce the distress that can be associated with transporting horses with different levels of prior taming. Because a robust procedure for identifying these animals is needed, our findings suggest that the BUT could be included in legislation on the protection of animal welfare during live transport. This would allow personnel to determine, before shipping, whether a horse is broken/tamed or unbroken/untamed, paralleling the current requirement for a pre-transport assessment of fitness for travel. The BUT would add some time to the preparation phase of transport; however, this small investment may be crucial to safeguard the welfare of the travelling horses as well as that of horse handlers, who are often injured during loading and unloading. Horse and human health and welfare are interconnected, and applying the BUT may therefore benefit both.