Investigation of a Standardized Qualitative Behaviour Assessment and Exploration of Potential Influencing Factors on the Emotional State of Dairy Calves

Simple Summary
Although the welfare of dairy calves is of public and scientific concern, no standardized protocol exists to assess the emotional state of these animals. Therefore, this study aimed to investigate and establish a calf-specific term list for Qualitative Behavior Assessment (QBA), a technique that has already been validated for assessing emotional states in many animal species. The statistical analyses showed that agreement can be reached among observers, that term scores varied across farms, and that the assessed emotional states could be linked to some explanatory farm factors. Overall, calves showed a neutral emotional state and benefited from certain farm factors. We conclude that the assessment should be used more widely to gain more insight into calves' welfare and into how their emotional state can be improved to a positive one.

Abstract
Assessing the emotional states of dairy calves is an essential part of welfare assessment, but standardized protocols are absent. The present study aimed to assess the emotional states of dairy calves and to establish a reliable standard procedure using Qualitative Behavioral Assessment (QBA) with 20 defined terms. Video material was used to compare results across multiple observers. Further, live observations were performed on 49 dairy herds in Denmark and Italy. Principal Component Analysis (PCA) was used to evaluate observer agreement and to identify the QBA dimensions (principal components, PC). To achieve an overall welfare judgment, PC1 scores were converted into the Welfare Quality (WQ) criterion 'Positive Emotional State'. Finally, the influence of farm factors on the WQ criterion was evaluated using mixed linear models. PCA summarized the QBA descriptors as PC1 'Valence' and PC2 'Arousal' (explained variation 40.3% and 13.3%, respectively). The descriptors with the highest positive loadings were Happy (0.92) on PC1 and Nervous (0.72) on PC2. The WQ criterion score (WQ-C12) averaged 51.1 ± 9.0 points (0: worst to 100: excellent state), and 'Number of calves', 'Farming style', and 'Breed' explained 18% of its variability. We conclude that the 20 terms achieved a high proportion of explained variation, providing a differentiated view of the emotional state of calves. The defined term list requires thorough observer training to achieve agreement.


Introduction
Positive indicators in animal welfare assessment schemes are still limited, although they are now considered essential in many ways. Qualitative Behavioral Assessment (QBA) focuses on animals' demeanor and body language to identify the underlying emotional state. QBA is frequently used to assess animals' emotional states on-farm and in experiments, using either free choice profiling (FCP) [1] or term lists as behavioral descriptors [2]. Validity aspects of QBA were investigated in […] 2015 compared to an average of 65.9 cows in 2000, with larger herds facing increased challenges regarding mortality [18], similar to their Nordic neighbors [19,20]. Additionally, herd size has been associated with other management practices, such as grazing or organic production status in smaller farms [21], an issue also evident in Italian dairy production [22]. Larger production sites require more employees. In some farms, however, a larger herd only implies a greater number of animals per stockperson, which could negatively affect the welfare of the entire herd, as discussed by Simon et al. [23]. Within the last decades, the employment of a large number of non-native workers on Danish farms for varying periods has led to a high staff turnover, challenging management through language and cultural barriers and possibly different attitudes towards calf handling and rearing. To our knowledge, no previous studies have investigated the potential influence of these farm factors on the emotional state of calves. Likewise, no studies evaluating observer agreement on QBA video material using a prespecified term list for dairy calves have been published so far. Furthermore, information on the emotional states of calves in dairy herds, especially in terms of aggregated scores at farm level (WQ-C12), is desirable for future purposes.
Therefore, this study aimed to investigate a QBA procedure comprising 20 terms by extracting the main dimensions of the calves' emotional states with Principal Component Analysis (PCA) and by testing its reliability through the comparison of observer agreement on video material. Further, we performed the QBA aggregation procedure and integrated score calculation to obtain Welfare Quality criterion scores (WQ-C12) at farm level. Finally, we investigated farm factors potentially related to the emotional state of dairy calves in order to identify explanatory variables for WQ-C12.

QBA Procedure
The WQ protocol for calves and heifers developed earlier [16] and the proposed QBA description were used for the on-farm assessments of emotional state. The proposed QBA was a term list with 20 descriptors, 'Active, Relaxed, Uncomfortable, Calm, Content, Tense, Enjoying, Indifferent, Frustrated, Friendly, Bored, Positively occupied, Inquisitive, Irritable, Nervous, Boisterous, Uneasy, Sociable, Happy, Distressed', which were used to score behavior after an observation period of 20 min. Each term was scored on a 125 mm continuous scale (Visual Analogue Scale), with the left end representing the 'minimum' and the right end the 'maximum', by marking the scale at the point that fitted the observation. A mark at the far left of the scale (0 mm) indicated that the expressed quality of the specific behavior/term was "entirely absent in any of the animals seen", whereas the 'maximum' at 125 mm meant that "the expressed quality of the specific behavior/term was constantly obvious across all animals seen during the observation", as is relevant for herd observations [2]. Additionally, observers had to integrate a further consideration into their scoring: the endpoints of the scale were to be imagined as the range of expression of a behavioral quality that is possible for the respective species at the given age and sex. Hence, 'Inquisitive' would be expected at a different level in adult cows than in calves. Observers were trained and tested to ensure calibration and correct use of the scale. Initially, this included an exchange of experiences amongst observers about impressions from different husbandry systems or circumstances that led to minimum or maximum scores for certain behavioral expressions, illustrating potential magnitudes for their colleagues and enhancing understanding of the possible range of each dimension.
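As an illustration only, a completed score sheet can be represented as a mapping from each of the 20 terms to a mark in millimeters on the 125 mm scale and checked before analysis. This is a minimal sketch assuming scores are entered as millimeter values; the names `QBA_TERMS`, `VAS_LENGTH_MM`, and `validate_sheet` are hypothetical and not part of the study's materials.

```python
# Hypothetical helpers for one QBA score sheet (not the study's own software).
QBA_TERMS = (
    "Active", "Relaxed", "Uncomfortable", "Calm", "Content", "Tense",
    "Enjoying", "Indifferent", "Frustrated", "Friendly", "Bored",
    "Positively occupied", "Inquisitive", "Irritable", "Nervous",
    "Boisterous", "Uneasy", "Sociable", "Happy", "Distressed",
)
VAS_LENGTH_MM = 125.0  # 0 mm = quality entirely absent, 125 mm = constantly obvious


def validate_sheet(sheet: dict) -> dict:
    """Check that all 20 terms are scored once on the 0-125 mm scale.

    Returns the scores as floats in the fixed term order of QBA_TERMS.
    """
    missing = [t for t in QBA_TERMS if t not in sheet]
    unknown = [t for t in sheet if t not in QBA_TERMS]
    if missing or unknown:
        raise ValueError(f"missing terms: {missing}; unknown terms: {unknown}")
    for term in QBA_TERMS:
        mm = float(sheet[term])
        if not 0.0 <= mm <= VAS_LENGTH_MM:
            raise ValueError(f"{term}: {mm} mm lies outside 0-125 mm")
    return {t: float(sheet[t]) for t in QBA_TERMS}


# Usage: a sheet with every term marked at 60 mm passes; a 130 mm mark would raise.
example = validate_sheet({t: 60 for t in QBA_TERMS})
```

Returning the scores in a fixed term order keeps every farm's sheet aligned column by column for the later multivariate analysis.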

Video Sessions and Observers
The reliability of the chosen QBA term list was tested with a panel of eleven trained observers (7 female, 4 male), who were asked to score QBA in one session for 20 videos of calves, each 1-2 min long. All observers had a university education in Animal Science or Veterinary Medicine and came from across Europe. All participants had at least completed a week-long course on QBA in cattle (cows and calves), including on-farm and video practice and score calculations with the WQ-QBA lists for dairy cattle and dairy calves. Prior to the video session, all terms were discussed and calibrated again amongst all observers according to the definitions given in Table 1. All observers scored the videos on QBA paper sheets with the above-mentioned term list and transferred their scores (in mm) to a provided electronic data sheet.

QBA Application On-Farm, Observers, and Farms
Before the on-farm assessments started, four observers familiarized themselves with the terms of the above-mentioned list and their descriptions (Table 1), ensuring that they understood the terms and used them in a similar context. The four observers applying the QBA, all female veterinarians, had previously been trained in the WQ protocol for dairy cattle and in the welfare assessment of dairy calves, including the new QBA system. They achieved sufficient agreement with a silver standard (a previously trained and certified person) and with each other (r > 0.7). For logistical reasons and because of the limited resources of the study, each farm was visited by only one observer, and observers assessed farms in their own affiliated country. For the same reasons, farms were recruited by the observers themselves in their own countries, depending first on farmers' willingness to join voluntarily; second, observers tried to include a wide variation in farm size, breed, farming style, and location in order to capture a potentially wide range of emotional states in calves. The farm sample was therefore a convenience sample, as is expected in on-farm studies where no other approach is feasible. Nonetheless, it covered a wide range of husbandry conditions for the calves and therefore, as expected, a variety of possible QBA results.
Animals 2019, 9, 757 5 of 12
The on-farm assessments were carried out on dairy farms in Denmark (40) and Italy (9), comprising 31 conventional and 18 organic farms, with herd sizes ranging from 106 to 608 lactating cows and an average of 48.2 calves per farm. Calves were mainly housed individually or in pairs during the first month and later moved to group housing. Breeds present on the farms were Holstein (25), other dairy breeds such as Jersey or Brown Swiss (13), or mixed, meaning mixed breeds and/or mixed herds (11). On 13 of the farms, weaned calves grazed on pasture; farms keeping males had on average 12 bulls; and the number of full-time equivalent workers ranged from 1 to 3.5. The assessment was carried out in the early morning (around milking time for the cows and before morning feeding for the calves) as the first evaluation upon arrival at the farm. The calves were observed at herd level for 20 min in total, including all animals (male and female) aged 0-180 days present at the time of inspection. If animals were housed in several groups, the observation time was split between groups, with a maximum of 8 observation points. In principle, animals were observed at group level even if housed individually or in pairs, in observational segments as large as possible and visible. After finishing the observations, the observers walked away from the animals so as not to observe them any further and recorded their scores on a paper version of the QBA sheet.
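The text does not state exactly how the 20 min were divided when calves were housed in several groups. The sketch below assumes, purely for illustration, an equal split across observation points capped at 8; the function name `observation_schedule` and the rule of keeping the first 8 listed groups are hypothetical.

```python
def observation_schedule(groups: list, total_min: float = 20.0, max_points: int = 8) -> dict:
    """Split the total observation time equally over at most `max_points` groups.

    Assumption (not from the study): if there are more groups than allowed
    observation points, only the first `max_points` groups in the list are used.
    Group names are assumed to be unique.
    """
    if not groups:
        raise ValueError("at least one group is required")
    points = groups[:max_points]
    minutes_each = total_min / len(points)
    return {group: minutes_each for group in points}


# Five groups -> 4.0 min each; ten groups -> eight points of 2.5 min each.
five = observation_schedule(["pen1", "pen2", "pen3", "pen4", "pen5"])
ten = observation_schedule([f"pen{i}" for i in range(1, 11)])
```

In practice, the choice of which groups to observe would follow the visibility and housing layout on each farm rather than list order.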

Statistics
The data collected on-farm were analyzed using Principal Component Analysis (PCA). The analysis was based on a correlation matrix without rotation, and two components were extracted (after pre-exploration of an unlimited number of components, selection was based on eigenvalues > 1 adjusted by the explained variance to avoid under- or over-factoring). Analyses were performed in R [24] using the libraries rcmdr, psych, Deducer, and DeducerExtra. To confirm the methodology, the dataset was sent to an external researcher experienced with QBA and PCA, who obtained the same PCA results and term plot using a different statistical program (Minitab).
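A minimal sketch of an unrotated PCA on the correlation matrix, written with NumPy rather than the R libraries named above, may help make the procedure concrete. It assumes a complete farms × terms matrix with no missing values and no constant columns; the function name `correlation_pca` is hypothetical, and the data in the usage lines are simulated, not study data. The eigenvalue count is the plain Kaiser criterion (eigenvalue > 1) without the adjustment by explained variance described above.

```python
import numpy as np


def correlation_pca(scores: np.ndarray, n_components: int = 2):
    """Unrotated PCA on the correlation matrix of a farms x terms matrix (mm).

    Returns eigenvalues (descending), proportion of variance per component,
    the number of eigenvalues > 1 (Kaiser criterion), term loadings, and
    standardized component scores for the first `n_components` components.
    Note: the sign of each component is arbitrary.
    """
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    corr = np.corrcoef(scores, rowvar=False)        # terms x terms correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()              # eigvals.sum() equals the number of terms
    n_kaiser = int((eigvals > 1.0).sum())
    vecs, lams = eigvecs[:, :n_components], eigvals[:n_components]
    loadings = vecs * np.sqrt(lams)                  # correlation of each term with each PC
    pc_scores = z @ vecs / np.sqrt(lams)             # unit-variance scores per farm
    return eigvals, explained, n_kaiser, loadings, pc_scores


# Usage with simulated data: 49 farms x 20 terms scored on a 0-125 mm scale.
rng = np.random.default_rng(0)
demo = rng.uniform(0, 125, size=(49, 20))
eigvals, explained, n_kaiser, loadings, pc_scores = correlation_pca(demo)
```

Working on the correlation matrix standardizes every term, so terms that happen to be scored over a wider part of the 125 mm line do not dominate the components.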
Similarly, the scores given by the 11 observers to the calves in the videos were subjected to PCA using a correlation matrix without rotation. The scores of the 20 videos on the first two principal components were tested for inter-observer reliability using Kendall's coefficient of concordance (W). Kendall's W ranges from 0 (no agreement) to 1 (complete agreement), with values above 0.6 indicating substantial agreement. Subsequently, the inter-observer reliability of each descriptor was calculated separately using the intraclass correlation coefficient (ICC).
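For readers unfamiliar with Kendall's W, the sketch below computes it from an observers × videos matrix, for instance each observer's PC1 score for each of the 20 videos, using W = 12S / (m²(n³ − n)), where m is the number of observers, n the number of videos, and S the sum of squared deviations of the videos' rank sums from their mean. It is a minimal NumPy illustration, not the software used in the study; tied values receive average ranks, but the tie correction of the denominator is omitted on the assumption that ties are rare for continuous PC scores. The function names are hypothetical.

```python
import numpy as np


def average_ranks(x: np.ndarray) -> np.ndarray:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kendalls_w(ratings: np.ndarray) -> float:
    """Kendall's W for an (observers x videos) matrix, without tie correction."""
    m, n = ratings.shape
    ranks = np.vstack([average_ranks(row) for row in ratings])  # rank videos per observer
    rank_sums = ranks.sum(axis=0)
    s = float(((rank_sums - rank_sums.mean()) ** 2).sum())
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


# Usage: 11 observers scoring 20 videos identically give W = 1.
identical = np.tile(np.arange(20, dtype=float), (11, 1))
w_identical = kendalls_w(identical)
```

Because W works on ranks within each observer, it ignores systematic offsets (one observer scoring every video higher than another) and rewards only agreement on the ordering of the videos.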

Procedures with the On-Farm QBA Scores
The PC1 score obtained from the PCA for each farm was analyzed as the dependent variable in a linear regression model with the 20 terms as predictors, to estimate the term weights (reported as "estimates" in Table 1) following the approach of Budaev [25]. The WQ criterion score (WQ-C12), defined in the WQ cattle protocol [12], was calculated using the following formula:

$$\text{WQ-C12} = \sum_{k=1}^{20} w_k N_k + \text{constant}$$

where $N_k$ is the value (in mm) obtained by a farm for a given term $k$, $w_k$ is the weight (= estimate, given in Table 1) attributed to term $k$, and 'constant' is a fixed value that is the same for every farm (= intercept, given in Table 1). The WQ-C12 was used to interpret the calves' welfare state. This index of 'Emotional state' was analyzed with a linear model, with given farm characteristics and management choices as predictors (factors). Factors that did not vary between farms (e.g., age at separation from the cow, birth in a calving box) or that had too much missing information (>80%, e.g., bedding type of calves) were dropped. The following factors were considered in the analysis: indoor or outdoor housing of young calves (HOUSING_Y) and older calves (HOUSING_O), whether weaned calves were on pasture (WEANED_PASTURE), the number of calves (CALVES_NO), cows (COWS_NO), young heifers (NO_YOUNGSTOCK), and bulls (NO-BULLS) on the farm, the prevalent cattle breed (BREED), the farming style, 'organic' or 'conventional' (FARMSTYLE), and the available manpower (MANPOWER). After pre-elimination of correlated factors, a backward/forward stepwise procedure based on the Akaike Information Criterion (AIC) identified the best-fitting model. Residuals were checked graphically using the qqplot function in R.
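The weighting step and the weighted sum of term scores plus a constant can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' code: ordinary least squares via `numpy.linalg.lstsq` stands in for the regression, the function names `term_weights` and `weighted_index` are hypothetical, and the data are simulated. Because unrotated PC1 scores are a linear combination of the standardized term scores, and standardization is itself linear in the raw millimeter values, a regression on all 20 raw terms can recover PC1 exactly, as the simulated example shows.

```python
import numpy as np


def term_weights(term_mm: np.ndarray, pc1: np.ndarray):
    """OLS regression of farm PC1 scores on the 20 term scores (mm).

    Returns (intercept, weights); these play the roles of 'constant' and w_k.
    """
    X = np.column_stack([np.ones(len(pc1)), term_mm])  # intercept column + terms
    coef, *_ = np.linalg.lstsq(X, pc1, rcond=None)
    return float(coef[0]), coef[1:]


def weighted_index(n_mm: np.ndarray, weights: np.ndarray, constant: float) -> float:
    """Sum over k of w_k * N_k, plus the constant, for one farm."""
    return float(np.dot(weights, n_mm) + constant)


# Usage with simulated data: PC1 built as an exact linear function of the terms,
# so the regression recovers the weights and the index reproduces PC1.
rng = np.random.default_rng(1)
terms = rng.uniform(0, 125, size=(49, 20))
true_w = rng.normal(0, 0.01, size=20)
pc1 = terms @ true_w - 0.5
const, w = term_weights(terms, pc1)
farm0 = weighted_index(terms[0], w, const)
```

Expressing PC1 as fixed weights on raw millimeter scores means a new farm can be scored with the published estimates alone, without rerunning the PCA.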

PCA for Video and On-Farm Data
Analysis of the video data on the agreement of the 11 observers revealed good overall agreement on the QBA for dairy calves (Cronbach's Alpha 0.83; ICC 0.83). The PCA explained 29.15% (PC1) and 16.34% (PC2) of the variance, with eigenvalues of 5.83 and 3.39, respectively. Agreement between observers for individual descriptors was good for 14 terms (Cronbach's Alpha 0.74-0.97) and moderate for five terms (Cronbach's Alpha 0.46-0.64); the term 'Distressed' could not be analyzed (further details in Table 2). PCA of the on-farm data summarized the QBA descriptors on two main components, PC1 and PC2, with eigenvalues of 8.065 and 2.662, explaining 40.3% and 13.3% of the variation, respectively. Farm PC1 scores ranged from −2.67 to 1.49 and PC2 scores from −2.17 to 2.42. The PC1 and PC2 loadings of the terms are given in Table 1, and their distribution across the four quadrants is shown in Figure 1.
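Because the on-farm PCA was run on a correlation matrix, each of the 20 standardized terms contributes a variance of 1, so the total variance is 20 and a component's share of the explained variation is its eigenvalue divided by 20. For the on-farm data this reproduces the reported percentages:

$$\frac{\lambda_1}{20} = \frac{8.065}{20} \approx 0.403, \qquad \frac{\lambda_2}{20} = \frac{2.662}{20} \approx 0.133, \qquad \frac{\lambda_1 + \lambda_2}{20} = \frac{10.727}{20} \approx 0.536$$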
On an individual level, farm results were widely dispersed across the two dimensions, as shown in the PCA graph 'farm results' in Figure 2, indicating that two dimensions are satisfactory and sensitive enough to discriminate between farms and depict different emotional states of calf herds.

Welfare Quality Scores for the Emotional State of Calves and Associated Factors
The PC1 score of each farm, translated into the WQ criterion score (WQ-C12), gave a mean of 51.1 ± 9.0 points (0 points = worst to 100 points = excellent situation), indicating neutral states on average in the participating farms. However, as farm scores ranged from 28.84 to 66.19 points, the average did not represent the actual emotional state of every calf herd in the study. Modeling the between-farm variance of WQ-C12 with farm factors showed that CALVES_NO, FARMSTYLE, and BREED remained in the model, explaining WQ-C12 best (p < 0.05; adjusted r² = 0.18). Organic farms achieved significantly higher estimates (+6.77; p < 0.01). Regarding farm size, each additional calf had a significant but small positive effect (+0.05; p < 0.05). The effects of BREED were not significant; in our study, Holsteins had the highest (+3.75; p = 0.21) and Jersey the lowest estimates (−1.75; p = 0.61).
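To put the two significant estimates in proportion, assuming the model contains only additive main effects (no interaction terms are reported) and using the rounded estimates above, the advantage of organic farms corresponds to the effect of roughly 135 additional calves, whereas 20 additional calves add only about one point:

$$\frac{6.77}{0.05} \approx 135 \text{ calves}, \qquad 20 \times 0.05 = 1.0 \text{ point}$$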

PCA and Achieved Dimensions
Firstly, this study aimed to investigate the dimensions created by Principal Component Analysis (PCA) using 20 terms describing different qualities of behavior in dairy calves. As in other studies applying this procedure, PC1 was found to summarize terms describing 'Valence' and PC2 'Activity', comparable to PC1 'Mood' and PC2 'Activity' for donkeys [7], cows [6], or pigs [10]. The distribution of terms across the four quadrants and their meaningful loadings (>0.5) on PC1 and PC2 [25] also fitted the expressed behavior. For example, the term 'Frustrated', loading very negatively on 'Valence' but positively on 'Activity', was well aligned with expectations of how young calves modulate their behavior, as described by this set of terms. Furthermore, the components can be seen as a good summary of the overall emotional state, as terms that are close in their expressed quality also group together in the quadrants, such as 'Happy' and 'Content', or 'Indifferent' and 'Distressed'. Only one term, 'Boisterous', did not load substantially on either axis, indicating that it is not essential for describing the emotional states of calves. It could therefore be argued that it should be replaced with another term, or removed and the analysis repeated. On the other hand, there might be farms in the future where this term shows a greater response, particularly as our case-to-variable ratio of 2.25 was lower than the ratios reported in other studies by Budaev [25]. Besides that, breed differences in temperament might cause this vaguer response. Since the team came from different countries with different linguistic backgrounds and communicated and scored QBA in English, there was room for slightly different interpretations of the term 'Boisterous', especially in young animals.
This might be mitigated in the future by intensive use of written QBA descriptors and definitions, such as those published for donkeys by Minero et al. [7] and those given in Table 1 of this paper. This approach would probably help but cannot replace training and live agreement between observers. However, as shown by the weight assigned to 'Boisterous', it did not have a major impact on WQ-C12 and should be kept in the term list until further studies and more data are available. A similar argument could be made for the negative term 'Distressed', which describes a severe condition and thus was not recorded by the majority of observers in this study, but which would be very useful if QBA were applied under extremely poor welfare conditions or to discriminate between such conditions.

Agreement of Observers in Video Study
For most of the terms, the current training regime and the observers' previous knowledge of QBA seem to have been sufficient for good agreement. However, the terms with low agreement, which largely described negative states such as 'Indifferent', 'Frustrated', 'Irritable', or 'Uneasy', suggest that observers perceived different valences. Disagreement can have two causes: different perception or different scoring. Well-known limitations of video sessions are the quality of the footage, the restricted viewing angle, and the limited overall impression of the animals available to the observer. In short clips, it might be difficult for some observers to focus immediately on the animals' demeanor. However, short clips are preferred because a minimum number of videos is necessary for the analysis, and a long video session would cause observer fatigue.
A further source of disagreement is a very low or disguised level, or the absence, of a certain quality throughout the videos, or an unclear demarcation between terms. When the observers were presented with the inter-observer reliability results of the video session, some remarked that parts of the footage were not good enough to see much of the animals' body language. This might have caused disagreement and highlights the importance of footage quality. At the same time, it underlines the importance of training to address the second source of error, improper scoring: what cannot be seen in the animals should not be scored, nor estimated instead. A general point in QBA scoring is the mental reference for the scale from minimum to maximum that observers build up with experience in order to classify the animals at hand with respect to a certain term. In this video session, this was perhaps difficult for some observers, particularly those with limited experience of a wide variety of emotional states in animals, and this aspect might need more attention in the training material.
For proper scoring, observers also needed to be trained for situations in which different animals show different degrees of a quality. The extent to which observers were able to implement this cannot be analyzed, so it remains a potential source of the disagreement in the above-mentioned terms. However, as most terms were scored with good agreement, the study showed that agreement can in principle be trained and achieved. Future training sessions should therefore focus on the terms with lower agreement so far and provide excellent video footage.

Emotional State of Dairy Calves on the Farms
As stated earlier, this was the first study in dairy calves to aggregate the PC1 score into a WQ-C12, thereby scaling QBA results from 0 (poor) to 100 (excellent) points on a welfare scale. Direct comparisons are therefore lacking. Nevertheless, the farms in this study reached, on average, a neutral to slightly positive welfare state, based on the expert opinions of Welfare Quality [12]. PC1 scores of veal calves from 24 Italian farms [4] showed a larger range than in our study (−5.08 to 3.88). Translated to WQ-C12, this would correspond to 13.6 to 86.9 points and an average of 52.5, pointing towards a neutral to positive emotional state. Another QBA study on 63 beef bull farms with three assessments per farm, using almost the same terms as for the dairy calves, revealed average PC1 scores ranging from −4.8 to 4.8. This corresponds to 15.2 to 93.2 points for WQ-C12, with an average of 48 points, indicating a neutral emotional state in beef bulls [8]. Although the present study had a larger sample size than the veal study and a smaller one than the beef study, it found a narrower range of WQ-C12. This might be explained by the fact that Danish dairy calf husbandry, which dominated this study, was more uniform, so that scores were more homogeneous than in Italian veal calves and in beef bulls from Austria, Germany, and Italy [4,8]. However, QBA was still able to discriminate between the dairy calf herds of this study, in correspondence with certain farm factors, which argues for sufficient variation and sensitivity of the defined term list.

Associated Factors on the Welfare Quality Scores for Emotional State of Calves
Finally, as mentioned before, we investigated certain explanatory farm factors and their potential association with the emotional state of dairy calves at farm level. Positive effects on the emotional state of dairy calves were found for herd size (i.e., number of calves), organic production, and breed. However, breed had no significant effect in the univariable analyses. Nonetheless, breed can be seen in relation to herd size, as the larger Danish herds are most commonly Holstein herds, which was also reflected in the study herds. Additionally, a large number of Jersey bull calves are culled shortly after birth, leaving only heifer calves on the dairy farms for assessment. The associations found in the present study, in particular the effects of organic production, are well aligned with other findings. Pair or group housing of calves is mandatory on organic farms, which leads to decreased fear levels [26,27]. Furthermore, enhanced social interaction aids calves with feed uptake during the transition period at weaning [28,29].
To our knowledge, studies investigating potential risk factors for an aggregated emotional state on cattle farms are rare. Brscic et al. [4] found differences in the descriptive terms 'Active' and 'Lively' according to the housing of veal calves at 3 weeks of age (single vs. group housed). In contrast, the present study found no influence of housing type, in either very young or older calves. Ellingsen et al. [17] analyzed the effects of four stockperson handling styles ('Calm/Patient', 'Dominating/Aggressive', 'Positive interactions', and 'Insecure/Nervous') and demonstrated an influence on the positive and negative moods of dairy calves. Although stockpersons' handling was not observed in our study because of study limitations, it would be valuable to include it in future studies on risk factors for emotional state, and it would probably increase the variance explained by the model. Including stockmanship is also supported by a study by Norwegian colleagues [30], who investigated the effects of farm factors and stockpersons' handling style on QBA in goats. They found that a positive attitude towards petting goats was significantly negatively related to the QBA term 'Aggressive' and positively related to 'Inquisitive/Interested'. In this study, we also administered questionnaires to some of the farmers regarding their attitudes towards animal welfare; these are currently under investigation and will be part of our next study.

Conclusions
We conclude that with the 20 terms used, good overall observer agreement can be achieved with trained persons. However, the analysis showed that some terms would benefit from improved video training of observers to reach a common understanding. We observed a relatively high proportion of explained variation (53.6%) on PC1 ('Valence') and PC2 ('Activity'), providing a differentiated view of the emotional state of calves. WQ-C12 scores pointed to a neutral to positive emotional state, and 18% of their variance could be related to the number of calves on a farm, farming style, and prevalent breed, leaving ample scope for further investigation into which on-farm factors might influence the emotional state.