Selection of Meat Inspection Data for an Animal Welfare Index in Cattle and Pigs in Denmark

Simple Summary
Despite being important to the general public, the monitoring of animal welfare is not systematic. The Danish political parties agreed in 2012 to establish national animal welfare indices for cattle and pigs, and here we assess the potential for using data from the systematic meat inspection to contribute to such indices. We demonstrate that although a number of recordings may be relevant for animal welfare, differences in recording practices between slaughterhouses can be so large that correction is not deemed feasible. For example, significant differences in tail fractures in pigs and sows were recorded between abattoirs, despite the fact that this condition should be easier to diagnose than, e.g., the more consistently recorded “chronic arthritis” in cows. The study findings suggest that some recordings may be useful for inclusion in animal welfare indices, but that their relevance should be assessed along with the recording practices if included. Furthermore, factors such as appropriate behaviour are also important to monitor as part of the welfare of both cattle and pigs.

Abstract
National welfare indices of cattle and pigs are being constructed in Denmark, and meat inspection data may be used to contribute to these. We select potentially welfare-relevant abattoir recordings and assess the sources of variation within these with a view towards inclusion in the indices. Meat inspection codes were pre-selected based on expert judgement of having potential animal welfare relevance. Random effects logistic regression was then used to determine the magnitude of variation derived at the level of the farm or abattoir, of which farm variation might be associated with welfare, whereas abattoir variation is most likely caused by differences in recording practices. Codes were excluded from use in the indices based on poor model fit or a large abattoir effect. There was a large abattoir effect for most of the codes modelled, and these codes were deemed inappropriate to carry forward to the welfare index. A few were found to be potentially useful for a welfare index: eight for slaughter pigs, 15 for sows, five for cattle <18 months of age, and six for older cattle. The absolute accuracy of each code/combination could not be assessed, only the relative variation between farms and abattoirs.


Introduction
In 2012, a joint agreement between the political parties represented in the Danish parliament decided to establish animal welfare indices [1]. The purpose of developing national indices for cattle and pigs was to enable surveillance of the state of animal welfare nationally and, in the longer term, to identify areas where animal welfare can be improved. Animal welfare is, however, a multifactorial concept with different stakeholders traditionally thought to emphasise different aspects [2][3][4]. To create an index that is transparent, it was decided to choose a hedonistic approach to animal welfare. This approach places the emphasis on the experiences of the animal [5], with the consequence that e.g., disease or reduced growth are only taken into account if they have an impact on the affective state of the animal. This is the same approach as the one taken in the EU-project Welfare Quality [6]. The indices were to be constructed using farm visits, but in order to make the monitoring as efficient and cheap as possible, there was also a desire to include register data whenever possible.
Meat inspection is carried out routinely on all cattle and pig carcasses according to legislation from the EU and Denmark [7,8] in order to safeguard food safety and animal welfare at slaughter. The meat inspection data may also be used for purposes such as creation of an index of animal welfare. A number of challenges exist prior to such use. For example, not all meat inspection parameters recorded for food safety reasons are necessarily relevant in relation to animal welfare at the farm; some are related to acute disease conditions, which may have occurred during transport, and some are fairly non-specific recordings. Furthermore, recording practices and thresholds may differ between slaughterhouses [9][10][11], which may result in differences between slaughterhouses in the sensitivity and specificity of the meat inspection data in relation to the intended target conditions. Finally, rare conditions may be difficult to appraise statistically, although they are of sufficient severity to highly motivate inclusion in a welfare index.
The objectives of the present study were to provide a statistical assessment of meat inspection data to (a) select codes of relevance to an animal welfare index based on prevalence and welfare impact; (b) assess the contribution of each slaughterhouse to the variation in prevalence of each relevant meat inspection variable; and (c) provide estimates of a correction factor for each slaughterhouse for each of the relevant meat inspection codes.

Materials and Methods
Meat inspection data for 2012 were provided by the Danish Veterinary and Food Administration (Glostrup, Denmark) and used for the data analyses. The meat inspections are done by official technicians as laid down in the EU legislation [7]. A specific protocol is given in a government circular [8], according to which an official veterinarian has the overall responsibility for the recording as specified in the EU legislation. Observations are recorded electronically at the carcass inspection station, verified by government veterinarians and uploaded to a meat inspection database located with the Danish Food and Agricultural Council (Axelborg, Copenhagen V, Denmark). The data were summarised into the number of animals slaughtered and the prevalence of each code, for each combination of farm of origin, abattoir, animal type (pig, sow, calf, cow), and slaughter date. Data were provided from all major pig (n = 9) and sow (n = 3) abattoirs, including 5381 pig farms and 1781 sow farms. Slaughterhouses processing relatively few cattle were excluded, i.e., all slaughterhouses with fewer than 10,000 cattle slaughtered in 2012 were not included in the following analyses. This resulted in data from eight slaughterhouses being used, with a total of 10,718 farms providing data for cows and 7019 farms providing data on calves. Cows and calves were slaughtered in the same abattoirs, whereas pigs and sows were slaughtered in separate plants. Because the purpose of the study was to create an index reported annually, observations from all dates were then combined at the level of farm, abattoir, code and animal type. This was referred to as a "batch", i.e., a batch consisted of the number of pigs, sows, cattle <18 months, or cattle ≥18 months of age slaughtered at a specific abattoir from a specific farm within 2012.
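The aggregation into annual batches described above can be illustrated with a minimal Python sketch. The record layout (the keys farm, abattoir, animal_type, code, slaughter_date, n_slaughtered and n_positive) and the example values are hypothetical and chosen only for illustration; the original analysis was carried out in R and its data format is not described here.

```python
from collections import defaultdict


def aggregate_to_batches(records):
    """Sum per-date records into annual batches.

    A batch is one combination of farm, abattoir, animal type and code.
    The record keys used here are hypothetical, not the database schema.
    """
    batches = defaultdict(lambda: {"n_slaughtered": 0, "n_positive": 0})
    for r in records:
        key = (r["farm"], r["abattoir"], r["animal_type"], r["code"])
        batches[key]["n_slaughtered"] += r["n_slaughtered"]
        batches[key]["n_positive"] += r["n_positive"]
    return dict(batches)


# Invented example: one farm delivering pigs to two abattoirs during 2012.
daily_records = [
    {"farm": "F1", "abattoir": "S1", "animal_type": "pig", "code": "120",
     "slaughter_date": "2012-01-10", "n_slaughtered": 100, "n_positive": 2},
    {"farm": "F1", "abattoir": "S1", "animal_type": "pig", "code": "120",
     "slaughter_date": "2012-06-02", "n_slaughtered": 200, "n_positive": 3},
    {"farm": "F1", "abattoir": "S2", "animal_type": "pig", "code": "120",
     "slaughter_date": "2012-03-15", "n_slaughtered": 50, "n_positive": 1},
]

batches = aggregate_to_batches(daily_records)
for key, counts in batches.items():
    print(key, counts)
```

In this sketch the slaughter date is deliberately dropped from the key, which mirrors the decision to combine all dates within the year.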

Exclusion of Codes
Some irrelevant "commercial codes" (such as information about contamination, missing organs and slaughter line issues) were excluded from the data. Specific meat inspection codes were also excluded where they were not deemed relevant to the purpose of the study, which was to assess changes in on-farm welfare of cattle and pigs, excluding transport to the abattoir and slaughter. Consequently, codes were excluded due to (a) possibly being related to transport; (b) being acute conditions, which could have occurred during transport; (c) being central nervous system (CNS) conditions, as these are relatively unspecific and difficult to assess at the abattoir; (d) not being related to animal welfare (when using the hedonistic definition mentioned previously); and (e) being non-specific conditions. Further, codes were excluded if they had a low prevalence combined with a low impact on welfare.
All individual codes were 3-digit (listed in Appendix A). Codes that were judged to be equivalent as far as animal welfare was concerned were collapsed into a single category. For example, all included codes associated with liver conditions in cattle were collapsed (374, 375, 377, 379, 381 to 374375377379381), and abscesses were collapsed to 570577580584585 irrespective of whether they occurred in the front part (570), mid-part (577), rear part (580), extremities (584) or head (585). If an animal had one of these conditions, it was classified as having the condition. The decisions were based on consensus between three of the authors (Hans Houe, Søren Saxmose Nielsen, Björn Forkman) and other experts (Sine Andreassen and Anne Marie Michelsen). See Appendix A, Table A1 (pigs) and Table A2 (cattle) for specific descriptions of the individual codes.
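The collapsing of equivalent codes into a single category can be sketched as follows. This is only an illustration in Python: the two categories are the examples given in the text, the function name and data layout are hypothetical, and the full list of categories per species is given in the tables rather than here.

```python
# Example categories taken from the text; the full set differs by species.
CATEGORIES = {
    "374375377379381": {"374", "375", "377", "379", "381"},  # liver conditions (cattle)
    "570577580584585": {"570", "577", "580", "584", "585"},  # abscesses
}


def collapse_codes(animal_codes, categories=CATEGORIES):
    """Return the set of labels for one animal after collapsing.

    An animal with any member code of a category receives the category
    label once; codes outside every category are kept unchanged.
    """
    labels = set()
    for code in animal_codes:
        for label, members in categories.items():
            if code in members:
                labels.add(label)
                break
        else:
            labels.add(code)
    return labels


# An animal with abscesses in the front part (570) and head (585) plus an
# unrelated single code (412) counts once for the abscess category.
example = collapse_codes({"570", "585", "412"})
print(sorted(example))
```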

Estimation of Abattoir Effects for Each Code and Category
Random effects logistic regression using R [12] was done as described in detail in Denwood et al. [13]. Briefly, the random effects logistic regression models were fitted using the glmer function in the lme4 package in R [14]. The random effects model with binomial response was used to assess the relative variance explained by the farm of origin, abattoir, and residual extra-binomial variance at the level of "batch" observation (interaction of Farm and Abattoir). Models were fitted separately for each combination of animal type and code. To assess if abattoir and farm effects were present, the statistical significance of the random effects of Abattoir and Farm was individually tested using a numerical approach as described by Lewis et al. [15] and Denwood et al. [13]; where these were not deemed to be significant, they were removed.
Animal type/code combinations with either fewer than 50 positive batches, or no batches with more than 1 positive animal, were not analysed using the random effects model (where batch as previously defined is the number of pigs, sows or cattle of a given type slaughtered at a specific abattoir from a specific farm). These datasets contain insufficient information for the random effects results to be numerically stable. Model fit was assessed against the distribution of deviance statistics from data generated using the fitted model. The general form of the model is as follows:

$$Y_i \sim \mathrm{Binomial}(N_i, p_i), \qquad \mathrm{logit}(p_i) = A + B_i + C_f + D_k$$

where the subscript $i$ denotes each observed combination of farm and abattoir, $f$ denotes the farm associated with batch $i$, and $k$ denotes the abattoir associated with batch $i$. The explanatory variables consist of a common intercept $A$ and random effect of batch $B$ (which were included for every model), and random effects of farm $C$ and abattoir $D$ (which were tested for significance as discussed above). The response variable $Y_i$ (the number of observed positive recordings for batch $i$) was described using a Binomial distribution, according to the fitted probability $p_i$ and total number of recordings $N_i$. The 95% confidence intervals for the estimates within the random effects associated with each farm and abattoir were generated using a parametric bootstrap approach. We note that a subset of these data has already been presented to illustrate the statistical methodology developed to analyse the data [13], but here we consider the welfare implications of the analyses rather than the statistical methods themselves, and also widen the scope to include both pigs and cattle.
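The rule for deciding which animal type/code combinations were analysed with the random effects model can be written as a small check. The following Python sketch is an illustration only; the function name and the input format (a list with the number of positive animals per batch) are assumptions, not part of the original R analysis.

```python
def eligible_for_model(positives_per_batch, min_positive_batches=50):
    """Return True if a dataset meets both criteria stated in the text.

    positives_per_batch: number of positive animals in each batch for one
    animal type/code combination (hypothetical input format).
    A dataset is analysed only if it has at least `min_positive_batches`
    batches with a positive recording AND at least one batch with more
    than one positive animal.
    """
    n_positive_batches = sum(1 for n in positives_per_batch if n > 0)
    has_multiple_positive = any(n > 1 for n in positives_per_batch)
    return n_positive_batches >= min_positive_batches and has_multiple_positive


# 60 positive batches, but never more than one positive animal: excluded.
only_singletons = [1] * 60 + [0] * 500
# 50 positive batches, one of which has two positive animals: analysed.
just_enough = [1] * 49 + [2] + [0] * 500

print(eligible_for_model(only_singletons), eligible_for_model(just_enough))
```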
The resulting random effect coefficients (on the logit scale) for codes where a statistically significant abattoir effect was identified were subsequently used to divide the modelled codes into those where: (i) correction of slaughterhouse effects might be useful for further use of the code; (ii) correction for slaughterhouse effect would be deemed controversial; and (iii) correction would be deemed inappropriate. For the first category, random effect coefficients of between −1 and 1 were deemed potentially useful to generate correction factors (under the assumption that they had acceptable sensitivity and specificity; this assumption is not assessed in this article). Any correction should be done on the logit scale, but for explanatory purposes, a random effect coefficient of 1 on the logit scale corresponds to a correction of approximately 2.7 times the average, and a random effect coefficient of −1 corresponds to a correction of 0.37 times the average (these approximations are only accurate for prevalences <20%; otherwise a correction has to be done on the logit scale). For larger random effect estimates it is likely that there is a systematic difference in recording procedure between slaughterhouses, so if the absolute random effect coefficient was between 1 and 2 (prevalences differing by a factor of approximately 2.7 to 7.4 between abattoirs), then correction was deemed questionable; and if >2 then it was deemed inappropriate.
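The thresholds and the approximate multipliers quoted above follow from exponentiating the logit-scale coefficient. A minimal Python sketch is given below; the function names and category labels are hypothetical, and the handling of coefficients exactly equal to 1 or 2 is an assumption, since the text does not specify which class the boundaries belong to.

```python
import math


def classify_abattoir_effect(coef):
    """Classify a logit-scale abattoir random effect coefficient.

    Boundary handling (<= 1, <= 2) is an assumption made for this sketch.
    """
    size = abs(coef)
    if size <= 1:
        return "correction potentially useful"
    if size <= 2:
        return "correction questionable"
    return "correction inappropriate"


def approx_multiplier(coef):
    """Approximate multiplicative effect on prevalence.

    Only reasonable for low prevalences (the text gives <20%).
    """
    return math.exp(coef)


for coef in (-1.0, 1.0, 2.0):
    print(coef, round(approx_multiplier(coef), 2), classify_abattoir_effect(coef))
```

The multipliers reproduce the values in the text: exp(1) is about 2.7, exp(−1) about 0.37 and exp(2) about 7.4.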

Code Selection
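The pig and sow data originally included 76 non-commercial meat inspection codes, of which 42 were retained as single codes or in categories (Tables 1 and 2). The cattle data originally included 84 meat inspection codes, of which 44 were retained as single codes or in categories (Tables 3 and 4).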

Descriptive Statistics
The prevalence of each code and code combination for slaughter pigs and sows is given in Tables 1 and 2, respectively. The prevalence of each code and code combination for cattle is given in Tables 3 and 4.

Table 2. Prevalence (number and %) of selected slaughter recording codes in sows slaughtered at the three largest sow slaughterhouses (S10-S12) in Denmark in 2012.

Pig and Sow Data
Eleven codes were removed from each of the pig and sow datasets because of poor model fit, which was primarily a result of low numbers of observations (Table 5). Of the remaining 31 codes or combinations for each animal group, there was evidence of Abattoir-only variance for two sow codes, Farm-only variance for five of each of the sow and slaughter pig codes, and both sources of variance for 33 combinations (eight combinations had neither random effect term fitted). For example, for code 120 in pigs, the variance effect due to abattoirs was 0.29, the farm effect was 0.38 and the residual 0.15. Thus, the farm effect was biggest, but there was still considerable difference between slaughterhouses (all abattoir and farm random effects terms presented are statistically significant). However, for sows, the slaughterhouse effect appeared to be largest (0.36 vs. 0.26), meaning that the slaughterhouse effect seemed to be larger than that of disease. Figure 1 shows a graphical summary of the random effects.

Calf and Cow Data
Twenty-four and 19 codes were removed from the calf and cow datasets, respectively, due to absent or poor model fit, with 20 codes in calves and 25 codes in cows producing acceptable model fits (Table 6). Of the remaining combinations, there was evidence of Abattoir-only variance for eight, Farm-only variance for five, and both levels of variance for 13 combinations (12 combinations had neither random effect term fitted). A summary graph illustrating the results is shown in Figure 2.

Figure 1. Individual estimates for the variance partition effect of each abattoir (95% confidence intervals shown as bars) for each code in pigs (S1-S9, blue) and sows (S10-S12, pink).

There is substantially more agreement for the abattoir random effect estimates for the cattle data than for the pig data. However, there is still some variation in the magnitude of random effects estimates between codes, suggesting that caution should be taken when interpreting codes. There is a striking similarity between the estimates produced for calf and cow data, especially for disease codes 271289, 412, 570577580584585 and 602604.

Including both the codes and categories with an abattoir effect and those without, (a) four codes and four categories (15 codes in total) were deemed potentially useful in pigs; (b) 10 codes and five categories (23 codes in total) were deemed potentially useful in sows; (c) two codes and three categories (14 codes in total) were deemed potentially useful in cattle <18 months; and (d) five categories (17 codes in total) were deemed potentially useful in cattle ≥18 months of age (Table 7). The potentially useful codes with descriptions are listed in Table 8.

Discussion
This study provides estimates of the differences in meat inspection recording due to farm and abattoir effects for a selection of meat inspection codes from three sow, nine pig and eight cattle abattoirs. "Farm"-associated variation is considered to be due to differences in health or welfare conditions at farms, whereas "abattoir"-associated variation might be considered to occur due to differences in recording at different abattoirs. However, it should be noted that a proportion of this variation may also be due to any systematic difference in the average prevalence of disease between the subsets of farms that primarily send animals to a specific abattoir for slaughter.
Among 76 meat inspection codes in pigs and sows, 42 were used as single codes or in categories in the random effect analyses. Thirty-one codes could be modelled in pig abattoirs and 31 could be modelled in sow abattoirs, but the codes were not exactly the same because different conditions were more prevalent in some types of animals than others. A farm and an abattoir effect existed for all of these 31 pig codes, and an abattoir effect existed for all but six codes/categories (132 (skinny), 230 (endocarditis), 379381 (liver conditions) and 600601 (tail-bite or associated infection)) in sows.
Among 84 meat inspection codes in cattle, 44 were used as single codes or in categories. Twenty codes could be modelled for calves and 25 for adult cattle. There was a significant abattoir effect for all but one code (532 (chronic arthritis or arthrosis)) in adult cattle.
There does not seem to be a great deal of consistency in abattoir effects between different disease codes in either pigs or sows, although some pairs of codes (for example Codes 336 (gastric ulcers) and 120 (circulatory affection) in pigs) do show some agreement. A similar analysis conducted using 2013 and 2014 data also revealed some variation from year to year (data not shown). There are also substantial differences in the estimate for the variance partition due to abattoir between disease codes, indicating that it is not likely to be feasible to use a single correction factor for all disease codes, if correction factors were to be used to even out the observed bias. For example, abattoir S10 was above average for five, and below for 11 codes and code categories, while abattoir S5 was above average for 13 and below average for seven codes and code categories (Figure 1). The individual random effect estimate for each abattoir can be interpreted as the effect of the abattoir on the reported prevalence of each code after accounting for differences between farms. This effect is relative to an "average" abattoir with an effect size (i.e., random effect estimate) of 0, so it can be used as the basis of a correction factor by multiplying the estimate by −1 and adding this to the logit of the average prevalence to come up with an expected logit prevalence at each abattoir. For prevalence <20%, which is true of almost all relevant slaughter codes, this can be reasonably approximated using the exponential of the abattoir effect multiplied by the observed prevalence. Obviously, these estimates are conditional on the 2012 data being fully representative of future observations, and no effect of date/time of year has been accounted for, so the correction factors can only safely be applied to a dataset representing a full calendar year of observations.
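A logit-scale correction and its low-prevalence approximation can be compared in a short Python sketch. This is one possible reading of the paragraph above, not the authors' implementation: the function names are hypothetical, and the sign convention (subtracting the abattoir effect from the logit of the prevalence observed at that abattoir) is an assumption made for the illustration. The numerical inputs are invented.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def corrected_prevalence(observed, abattoir_effect):
    """Remove an abattoir effect on the logit scale (assumed sign convention)."""
    return inv_logit(logit(observed) - abattoir_effect)


def approx_corrected_prevalence(observed, abattoir_effect):
    """Multiplicative shortcut; only reasonable for low prevalences."""
    return observed * math.exp(-abattoir_effect)


# Invented values: the two approaches nearly agree at 2% prevalence and
# diverge clearly at 50% prevalence for the same abattoir effect of 0.5.
for observed in (0.02, 0.50):
    exact = corrected_prevalence(observed, 0.5)
    approx = approx_corrected_prevalence(observed, 0.5)
    print(observed, round(exact, 4), round(approx, 4))
```

The comparison shows why the text restricts the multiplicative shortcut to low prevalences: at higher prevalences the odds and the prevalence are no longer close, so the correction has to be applied on the logit scale.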
For some codes, the results presented here suggest a considerable and significant difference in recording levels between abattoirs. The magnitude of the differences between abattoirs was most frequently observed in the range −1 to 1 (on the logit scale), but for some codes and categories the differences were somewhat larger or substantially larger (Table 7). For these codes, there would seem to be some structural differences in the recording procedures, and consequently applying a simple correction factor without understanding the major underlying differences in recording procedure may not be a sensible or viable approach. When the differences are smaller, then use of a correction factor to "even out" small variations between abattoirs may be useful to allow a more robust comparison of observed farm prevalence. There are some farms that only use one slaughterhouse, which should not be a problem for slaughterhouse effects, as every slaughterhouse receives animals from more than one farm. However, it constitutes a challenge that batch and farm effects confound each other for some farms, where a farm has a single batch and therefore two random effect levels for a single observation. Therefore, we may have challenges in separating the farm and batch effects, and interpretation of the data should focus on the abattoir effect, not any potential farm effect. It is also important to note that the random effects components presented are only estimates, and represent only indications of relative differences between welfare indicators and between abattoir and farm effects. Although it is theoretically possible to obtain confidence intervals for these via a procedure such as parametric bootstrapping, this is not computationally feasible for this dataset. We also note the increased potential for shrinkage for the abattoir random effect relative to that for farm due to the large difference in the number of abattoirs (eight for cattle, nine for pigs and three for sows) vs. farms (10,718 farms for adult cattle, 7019 farms for calves, 5381 farms for pigs and 1781 farms for sows). This means that the variation between abattoirs is likely to be somewhat underestimated relative to that between farms. However, this does not affect our conclusions because of the focus on the abattoirs, not the farms.
Table 8 provides a list of meat inspection codes and descriptions for those codes and categories where there was no detected abattoir effect or where the effect was within −1 and 1 on the logit scale, i.e., they were within 2.7 times higher or lower than the mean prevalence. The listed conditions all have some relation to animal welfare, but we have refrained from specifying how much they would eventually contribute. This is dealt with in the weighting and aggregation in other parts of the main project. Furthermore, this study does not indicate whether the conditions are recorded accurately. Differences in accuracy of recording practices are likely to be the main cause of differences between slaughterhouses resulting in the high abattoir effects; differences in recording accuracy have also been demonstrated for clinical recordings [16]. It can be speculated that the conditions not recorded by some meat inspectors are those that are considered to be least severe, although there are no data in the present study to support this. The conditions listed in Table 8 are those that are more specific, and this supports the notion that they may be more accurately recorded. However, a condition such as gastric ulcers (code 336) in pigs might also be considered fairly specific and easy to diagnose, but there is still quite a large difference between the slaughterhouses. Chronic pericarditis (code 222) is also fairly specific and appears to be recorded relatively similarly in adult cattle across slaughterhouses, but this is not the case in pigs and sows, where the prevalence can still be high in some slaughterhouses (e.g., 5.1% in pigs in S1) but not in others (0.006% in pigs in S6). Use of the data would depend on a farm effect, because this effect should reflect the differences in the conditions.
A number of additional requirements must be met if the data are to be used for national animal welfare monitoring. Firstly, the recordings should measure animal welfare with some level of accuracy and should be objective, consistent over time and feasible to implement. A basic assumption for use of the correction factors is that the time period used is representative. The recording level can differ within the same abattoir over time, as we have previously demonstrated [10]. However, if the correction factors are updated regularly, e.g., annually, then this is only of minor importance. A more important assumption is that farmers do not send specific pigs (with e.g., higher or lower perceived prevalence of welfare-related conditions) to specific slaughterhouses, which would mean that true prevalence is made artificially high or low by the correction. Another example may be if certain types of pigs associated with particularly good or bad welfare are predominantly slaughtered at a particular slaughterhouse. For example, organic pigs are often slaughtered at specific slaughterhouses such as S4, and they may have different levels of disease. This could lead to e.g., a high prevalence at the abattoir slaughtering these specific pigs. Slaughterhouse S4 had a higher prevalence of codes 131 (emaciated), 132 (skinny), 222 (chronic pericarditis), 361 (hernias) and 505507 (healed tail and rib fractures), none of which is likely to be associated specifically with organic production. Farmers probably do not send pigs to slaughterhouses in any kind of balanced way, but we have no possible means to estimate this at the moment. For now, we have to accept that we cannot differentiate low slaughterhouse sensitivity from a slaughterhouse to which everyone sends the healthy animals, i.e., we assume that the distribution of true disease is random between slaughterhouses, which may be unrealistic for some conditions, but not for others, because of spatial effects on disease prevalence. However, it is not possible to judge this from the data at hand.
It should be noted that approximately 20% of sows are slaughtered in abattoirs not included in this study, while this is the case for less than 1% of slaughter pigs. Almost all cattle slaughtered in Denmark during 2012 were also included. However, it was not possible to correct for any imbalances in the data, which are observational in nature. The next steps in any data aggregation are also important but will not be covered here, as they are beyond the scope of the present paper. A thorough analysis has been included and published in a report from the Danish Veterinary and Food Administration including technical appendices [17].
Use of the data for an animal welfare index would also presume that all animals are slaughtered in Denmark. A high proportion of piglets are exported, and the number of sows slaughtered outside Denmark is also significant. Such animals would therefore not contribute to an animal welfare index.

Conclusions
We recommend proceeding with the codes and categories listed in Table 8, as they have some relation to animal welfare and differences in recording between abattoirs seem minimal to moderate. However, the accuracy of recording has not been assessed, and the magnitude of the relation to animal welfare has not been assessed either, although a qualitative assessment has been done. A full assessment would not be feasible. The codes and categories not included in Table 8 should not be used without further addressing differences between slaughterhouses. Last but not least, if the codes and categories are included in indices used for national governance, it should be recalled that they are numeric simplifications of complex concepts [18].