Automatic Assessment of Keel Bone Damage in Laying Hens at the Slaughter Line

Simple Summary Keel bone damage (KBD) is very prevalent in commercial laying hen flocks with a wide range of affected hens/flock. It can cause pain, and affected hens have been found to be less mobile. The assessment of this animal welfare indicator provides important feedback for the farmer about flock health and consequently on the need for interventions. However, the assessment of keel bone damage is time-consuming, and prior training is needed in order to gain reliable results. Optical detection methods can be a means to automatedly score hens at the slaughter line with high sample sizes and in a standardized way. We developed and validated an automatic 3D camera-based detection system. While it generally underestimates the presence of KBD due to the purely visual assessment and technical constraints, it nevertheless shows good accuracy and high correlation of prevalences with those visually determined by a trained human assessor. Therefore, this system opens up opportunities to better monitor and combat a severe animal welfare problem in the long-term. Abstract Keel bone damage (KBD) can be found in all commercial laying hen flocks with a wide range of 23% to 69% of hens/flock found to be affected in this study. As KBD may be linked with chronic pain and a decrease in mobility, it is a serious welfare problem. An automatic assessment system at the slaughter line could support the detection of KBD and would have the advantage of being standardized and fast scoring including high sample sizes. A 2MP stereo camera combined with an IDS imaging color camera was used for the automatic assessment. A trained human assessor visually scored KBD in defeathered hens during the slaughter process and compared results with further human assessors and automatic recording. In a first step, an algorithm was developed on the basis of assessments of keel status of 2287 hens of different genetics with varying degrees of KBD. In two optimization steps, performance data were calculated, and flock prevalences were determined, which were compared between the assessor and the automatic system. The proposed technique finally reached a sensitivity of 0.95, specificity of 0.77, accuracy of 0.86 and precision of 0.81. In the last optimization step, the automatic system scored on average about 10.5% points lower KBD prevalences than the human assessor. However, a proposed change of scoring system (setting the limit for KBD at 0.5 cm deviation from the straight line) would lower this deviation. We conclude that the developed automatic scoring technique is a reliable and potentially valuable tool for the assessment of KBD.


Introduction
The keel, also called the breast bone, is the biggest bone in laying hens. It runs axially along the midline and extends outward, perpendicular to the plane of the ribs. Keel bone damage (KBD) can occur in non-cage systems in high prevalence. It has been found by visual inspection and palpation of hens in 100% of flocks with a wide range of 3-97% hens/flock affected [1][2][3][4]. KBD can consist of either deviations or fractures or both, but without dissection of the bone or other methods like histology or radiology, it is not possible to reliably distinguish mere deviations from fractures. In fact, results from Scholz et al. [5] suggest that most deviations are accompanied or caused by fractures. A correlation of distinct deviations and fractures could also be shown, e.g., by Fleming et al. [6] and Saraiva [7]. A host of factors contributes to its occurrence related to management and rearing, nutrition and housing [8]. Keel bone fractures are a serious welfare problem that involves pain [9,10] and impairment of mobility [2,11,12]. However, laying hen farmers are often not informed about the keel bone status in their flock and therefore are unaware of this welfare problem.
KBD can be assessed on-farm by palpation and visual inspection, but the handling and assessment of a sufficient number of animals in order to achieve a reliable estimation of the flock prevalence is an obstacle in the implementation of routine welfare monitoring on-farm using animal-based indicators (e.g., [13] for cattle, [14] for pigs). Nevertheless, increasingly, retailers and, in some countries like Germany, welfare law [15] require animalbased welfare monitoring concerning a number of potential welfare problems. In the search for less time-consuming assessment methods, the bottleneck slaughterhouse comes into particular focus, as at this point a high number of animals or even the whole flock can easily be inspected. The results provide retrospective information about on-farm welfare, as long as indicators are not largely influenced by catching and transport, which could possibly cause further keel bone fractures. Furthermore, assessments can be automated, which has the further potential advantage of better standardization and consequently increased reliability of assessments. The latter is important because repeatability of KBD assessment between different assessors is demanding to achieve and requires intensive training [16,17].
For an automated assessment, image processing techniques can be applied. They are a cheap, non-contact and cost-effective tool, and have already been used to monitor diverse livestock health and behavior measures (reviewed by [18][19][20]). Automatic, image-based assessment systems are routinely applied to, e.g., the evaluation of foot pad lesions in broilers and turkeys [21][22][23]. Even though some of the keel bone fractures, which are not associated with deviations from the straight line or clearly visible callus formation, are not detectable by visual inspection (and only to a limited degree by palpation), a major part of KBD, including deviations and thereby fractures, is clearly visible and therefore a good candidate for image processing.
The current research project aimed to develop and validate a measurement system for an automated image-based recording of KBD in laying hens at the slaughterhouse in order to allow a time-efficient and reliable assessment and thereby to contribute to an improvement of laying hen welfare.

Slaughterhouse and Animals
Development of the automated system took place in the years 2017 to 2019, with validation at a poultry slaughterhouse in Germany, involving laying hens of different genetics (e.g., Lohmann Brown, Lohmann Selected Leghorn, Novogen) at the end of lay. The commercial slaughter process comprised manual unloading from the transport crates and shackling, electrical stunning in a water bath, bleeding to death and afterward plucking by a roller fitted with rubber studs. Visual KBD assessments were always carried out after defeathering.

KBD Assessment by Human Assessors
KBD assessment was carried out by one trained person via visual inspection (score 0: intact, no visible callus or dislocation; no lateral, dorsal or ventral deviation from straight axis ≥ 0.2 cm; score 1: damaged, visible callus; dislocation or deviation from axis > 0.2 cm). At the slaughter line, the assessor was standing at around 30 cm distance from the carcasses. Chain speed slightly differed but was not faster than one carcass/two seconds. For the assessor, it was possible to follow the carcasses for five more seconds.
Inter-assessor reliability was tested in ten different pairs of one trained person and an experienced or inexperienced further assessor. Five persons were untrained and nine trained. The assessor pairs simultaneously scored 46 to 400 randomly selected hens (a total of 2273 hens) at the slaughter line as well as outside the slaughter line during the normal slaughter process.

Camera Vision System
A 2MP stereo camera with an active pattern projector to enhance the 3D reconstruction and a 5MP color cam were used in combination for the automatic assessment. The stereo camera included two monochrome cameras with a resolution of 1280 × 1024 pixels and an 8mm lens and working range of 480-1000 mm. The 3D image was improved by the projection of a blue pattern onto the carcass (Figure 1). The cameras were fitted in a stainless-steel housing with an IP69 similar protection class. The image recording was triggered by an infrared light barrier when there was a hook in front of the camera. At the focal point, which was 650 mm for this camera, the Z accuracy was set to 0.715 mm. The picture of the carcass was immediately processed after the picture was taken. The 3D reconstruction was performed on a NVidia Geforce GTX 1080 TI graphics card due to the high belt speed. The evaluation computer was also equipped with an Intel i7-6700 processor and 8 GB RAM. The algorithm for the 3D reconstruction was provided by the camera manufacturer and can be modified using various parameters. For the evaluation of the 3D data, the image processing library Halcon of the company MVTec was used.

KBD Assessment by Human Assessors
KBD assessment was carried out by one trained person via visual inspection (score 0: intact, no visible callus or dislocation; no lateral, dorsal or ventral deviation from straight axis ≥ 0.2 cm; score 1: damaged, visible callus; dislocation or deviation from straight axis > 0.2 cm). At the slaughter line, the assessor was standing at around 30 cm distance from the carcasses. Chain speed slightly differed but was not faster than one carcass/two seconds. For the assessor, it was possible to follow the carcasses for five more seconds.
Inter-assessor reliability was tested in ten different pairs of one trained person and an experienced or inexperienced further assessor. Five persons were untrained and nine trained. The assessor pairs simultaneously scored 46 to 400 randomly selected hens (a total of 2273 hens) at the slaughter line as well as outside the slaughter line during the normal slaughter process.

Camera Vision System
A 2MP stereo camera with an active pattern projector to enhance the 3D reconstruction and a 5MP color cam were used in combination for the automatic assessment. The stereo camera included two monochrome cameras with a resolution of 1280 × 1024 pixels and an 8mm lens and working range of 480-1000 mm. The 3D image was improved by the projection of a blue pattern onto the carcass ( Figure 1). The cameras were fitted in a stainless-steel housing with an IP69 similar protection class. The image recording was triggered by an infrared light barrier when there was a hook in front of the camera. At the focal point, which was 650 mm for this camera, the Z accuracy was set to 0.715 mm. The picture of the carcass was immediately processed after the picture was taken. The 3D reconstruction was performed on a NVidia Geforce GTX 1080 TI graphics card due to the high belt speed. The evaluation computer was also equipped with an Intel i7-6700 processor and 8 GB RAM. The algorithm for the 3D reconstruction was provided by the camera manufacturer and can be modified using various parameters. For the evaluation of the 3D data, the image processing library Halcon of the company MVTec was used.

Development and Validation of the Automated System
To identify the keel, the breast area was first determined on the Z-image (Figure 2a). In this elliptical region, a line was drawn by the user, presenting the main axis by calculating the center region ( Figure 2b). The actual course of the keel was extracted using height information. The maximum and mean horizontal distances between the ideal and actual keel line were calculated ( Figure 2c). The keel was classified as damaged if the mean distance from the ideal line was ≥0.2 cm (Figure 2d). Algorithm development was carried out in the following stages:

Development and Validation of the Automated System
To identify the keel, the breast area was first determined on the Z-image (Figure 2a). In this elliptical region, a line was drawn by the user, presenting the main axis by calculating the center region (Figure 2b). The actual course of the keel was extracted using height information. The maximum and mean horizontal distances between the ideal and actual keel line were calculated (Figure 2c). The keel was classified as damaged if the mean distance from the ideal line was ≥0.2 cm (Figure 2d). Algorithm development was carried out in the following stages: Initial phase: Altogether, 2287 hens of different genetics with varying degrees of KBD from more than 12 slaughter batches were temporarily taken from the slaughter line on two different days and hung in a self-made scaffold. KBD was assessed by one trained person by visual inspection, standing directly in front of the carcass. In case of doubt, a ruler was used to measure the deviation. After the assessment by the observer, the carcass was recorded with the camera system. Based on these assessments, the initial algorithm was developed.
Learning phase: The camera system was installed at the slaughter line. Five slaughter batches were assessed by the trained assessor by visual inspection and the automated system. Resulting prevalences of KBD were compared. In total, 2170 hens were assessed by the assessor and 25,748 by the camera during this step. Based on these results, an optimization of the camera and algorithm was carried out.
Optimization step 1: Performance of the system in comparison to the visual scoring by the trained assessor was evaluated using the criteria sensitivity, specificity, accuracy and precision. This was done in two optimization steps on 189 and 60 marked carcasses with differing damage levels. In addition, KBD prevalences of six batches were assessed by a human assessor and an automated system. Moreover, algorithm optimization was based on deviating assessments between the human assessor and the automated system from a scoring of a further 300 carcasses.
Optimization step 2: Again, 93 and 79 marked carcasses were scored and prevalences of four slaughter batches determined. During the last batch, the assessor used a modified threshold for keel bone damage: deviation from straight axis of ≥0.5 cm (in contrast to ≥0.2 cm).
In total, KBD prevalences from 15 batches for the comparison of the prevalence ascertained by assessor and automated system, and individual assessments of 421 birds out of four different batches for calculation of performance criteria (sensitivity, specificity, accuracy and precision), were included in the validation. Initial phase: Altogether, 2287 hens of different genetics with varying degrees of KBD from more than 12 slaughter batches were temporarily taken from the slaughter line on two different days and hung in a self-made scaffold. KBD was assessed by one trained person by visual inspection, standing directly in front of the carcass. In case of doubt, a ruler was used to measure the deviation. After the assessment by the observer, the carcass was recorded with the camera system. Based on these assessments, the initial algorithm was developed.

Statistics
Learning phase: The camera system was installed at the slaughter line. Five slaughter batches were assessed by the trained assessor by visual inspection and the automated system. Resulting prevalences of KBD were compared. In total, 2170 hens were assessed by the assessor and 25,748 by the camera during this step. Based on these results, an optimization of the camera and algorithm was carried out.
Optimization step 1: Performance of the system in comparison to the visual scoring by the trained assessor was evaluated using the criteria sensitivity, specificity, accuracy and precision. This was done in two optimization steps on 189 and 60 marked carcasses with differing damage levels. In addition, KBD prevalences of six batches were assessed by a human assessor and an automated system. Moreover, algorithm optimization was based on deviating assessments between the human assessor and the automated system from a scoring of a further 300 carcasses.
Optimization step 2: Again, 93 and 79 marked carcasses were scored and prevalences of four slaughter batches determined. During the last batch, the assessor used a modified threshold for keel bone damage: deviation from straight axis of ≥0.5 cm (in contrast to ≥0.2 cm).
In total, KBD prevalences from 15 batches for the comparison of the prevalence ascertained by assessor and automated system, and individual assessments of 421 birds out of four different batches for calculation of performance criteria (sensitivity, specificity, accuracy and precision), were included in the validation. with TP = true positives (damaged keels recorded as damaged), TN = true negatives (non-damaged keels recorded as intact), FP = false positives (non-damaged keels recorded as damaged keels), FN = false negatives (damaged keels recorded as non-damaged keels). The PABAK was also calculated regarding the agreement between human assessor and automated recording.
These analyses were done in Excel 2016. In addition, the relation between KBD prevalences determined by the automated system or human assessor was investigated by Pearson correlation analysis (SPSS Version 27).

KBD Assessment by Human Assessors
Inter-assessor reliability was higher with trained compared to untrained assessors (mean PABAK: 0.76 versus 0.72), but was substantial in all cases except one (with a moderate result) and almost perfect in four cases (Table 1). More detailed data for inter assessor reliability are provided in the Supplementary Materials (Table S1). Table 1. Results of inter-assessor reliability tests (prevalence-adjusted, bias-adjusted kappa: PABAK) for the assessment of keel bone damage in laying hen carcasses by visual inspection between ten different pairs of one trained person and an experienced or inexperienced further assessor.

Validation of the System
Performance of the system improved during optimization. It finally reached performance levels of above 80% for all criteria except for specificity with 77% (Tables 2 and 3). More detailed data of performance assessments are provided in the Supplementary Materials (Table S2).  KBD prevalences ascertained by the human assessor in the 15 flocks during learning and optimization phases ranged between 23.0% and 68.9% (mean: 42.1%, median: 39.2%). After the modification of score 1 in the last assessment round to: deviation from straight axis of ≥0.5 cm (in contrast to ≥0.2 cm)the prevalence was reduced by 11.4% points to a minimum prevalence of 18.8%. At the same time, the lowest difference between automated and human assessment of 1.5% points was achieved here. Otherwise, differences ranged between a maximum of 32.5% points during the learning phase and a minimum of 1.5% points in optimization step 2 (mean: 7.9% points in optimization step 2; Table 4). Overall phases prevalences ascertained by human and automated assessment showed a substantial correlation (r = 0.687, p = 0.003, n = 16, assessment round 15b not included) Table 4. Prevalence (%) of keel bone damage in laying hens at the slaughter line assessed by a trained person and an automated camera system and differences between the determined prevalences (% points).

Discussion
The visual scoring of animal-based measures by a human assessor is an important method in livestock welfare assessments. However, it is time-consuming and laborintensive [18] and requires a minimum of training. To our knowledge, this project was the first attempt to develop and evaluate a system for the automatic detection of KBD at the slaughterhouse.
In the present project, the keel status was purely visually evaluated, which leads to an underestimation of the damage, since smaller deviations, dislocations and callus deposits can only be detected by palpation or other methods such as radiography. In addition, we found that the camera further underestimates the occurrence of KBD. In part, the number of damaged keels that were not correctly recognized by the camera system might have been due to the movement of the carcass during imaging. However, the exposure time of the camera was very low (<2 ms), so problems due to motion blur are unlikely. By using a 3D camera with a higher resolution, accuracy could be improved, but this would more than double the costs of the system, possibly rendering it less attractive for commercial application. Another option may be an increase of framerate from the current one image for each shackle to more images of each chicken from different perspectives, integrating the different 3D models into one big model with more information and a higher resolution. However, through optimization of the algorithm, we were able to already reach a high sensitivity of 95% and a specificity that approached 80%. The agreement between human observer and automatic system was also substantial when expressed as PABAK and comparable to agreements between different human observers. In addition, the substantial correlation between the differently achieved prevalences by automated system and human assessor proves that the farmers will still receive a fair reflection of the relative magnitude of the welfare problem in their flock. The general underestimation of KBD cases is a little problematic at the currently rather high prevalence levels in practice but should be considered in the interpretation of the results, especially when it should come to setting up target values. In a further validation step, both for the automated system and the human assessment by visual inspection and palpation, additional scoring results should be compared to true KBD ascertained by the scoring of dissected bones, radiographs or other kinds of bone visualizations. Moreover, possible influences of catching and transport on KBD should be determined.
A change of the scoring scheme to the definition of a damaged keel with deviations and dislocations ≥0.5 cm led to a considerably better agreement between assessor and camera as well as to an improvement of sensitivity, specificity, accuracy and precision. At the same time, it will likely also improve inter-assessor reliability further, although predominantly, substantial to almost perfect agreement was already reached. Regarding the inexperienced assessors, these results were unexpected but may be due to the merely visual assessment being easier than the assessment via palpation. In case of a system failure at the slaughterhouse, this and the higher threshold for scoring a KBD would make it easier for slaughterhouse personnel to reliably determine KBD prevalences by visual inspection. Therefore, we recommend changing the definition of visually assessed KBD accordingly and setting a deviation from the straight line of the keel bone of 0.5 cm as the limit for the scoring of KBD (except if callus material is clearly visible). This, furthermore, corresponds to possible discussions about the severity of the welfare problem if the keel bone shows only deviations and no fractures. Presently, no information about the relation between keel bone deviations and pain is available. However, a more severe deviation is more likely associated with a fracture. Scholz et al. [5] histologically found indications of fractures (callus material or new bone tissue) in 51% of macroscopically assessed keel bones with slight deformations, in 80% of keel bones with moderate deviations and in all keel bones with severe deviations. Thus, deformities of ≥0.5 cm are more likely associated with fractures and therefore with pain. However, more research is needed to gain a better understanding of the types and implications of damage that is represented by only small deviations from the straight line of up to 0.5 cm. More investigation would also be needed on the kinds and levels of fractures that cannot be detected by the automated system as well as by palpation technique. This includes fractures of the tip of the keel that are also difficult to score reliably by human assessors [25]. Here, open questions further relate to causes and welfare effects of these fractures.
The proposed image processing technique in this study can also be improved in a fully automated way by using robust machine learning techniques.
However, the developed system is already able to correctly identify KBD on a satisfactory level. It has the advantage that high sample sizes can easily be achieved and that it opens up the possibility to monitor the development of a severe welfare problem that currently does not receive enough attention in practice.
The disadvantage of retrospective monitoring at the slaughterhouse is that it does not allow an immediate intervention in order to improve the welfare of the individual animal [26]. However, KBD is usually affected by long-term causes such as generally fragile bones, unfavorable housing equipment like metal perches, narrow corridors in aviaries, poor lighting conditions and high laying performance, which cannot be remedied with a simple and quick countermeasure. Automatic recording would help the laying hen farmers gain first knowledge about KBD prevalence in their flocks and thus determine the need for long-term changes. Besides, it would also be an easy method to gain information about the development of KBD in general over years and could support investigations about KBD. Therefore, it could be a highly valuable instrument for the monitoring of hen welfare and provide farmers with important information for their management decisions as well as researchers for future epidemiological welfare studies.

Conclusions
The successful development of an automatic camera-based detection technique allows the reliable recording of keel bone damage (KBD) at the slaughter line. The system generally underestimates the presence of KBD due to the purely visual assessment and technical constraints. Nevertheless, it shows good accuracy and a substantial correlation of