Deep Learning and Thermal Imaging Approaches for the Assessment of Feather Coverage in Cage-Free Laying Hens

Dahal, Samin; Paneru, Bidur; Dhungana, Anjan; Chai, Lilong

doi:10.3390/agriengineering8020068

Department of Poultry Science, University of Georgia, Athens, GA 30602, USA

* Author to whom correspondence should be addressed.

AgriEngineering 2026, 8(2), 68; https://doi.org/10.3390/agriengineering8020068

Submission received: 4 January 2026 / Revised: 1 February 2026 / Accepted: 12 February 2026 / Published: 14 February 2026

(This article belongs to the Section Livestock Farming Technology)

Abstract

The feather coverage of a laying hen is an important indicator of both its productivity and welfare. Conventional manual feather scoring procedures are laborious, subjective, and stressful for the hens. Thermography offers a modern alternative for addressing these problems. Thermal cameras capture radiative heat loss, which is comparatively greater from featherless areas. Studies have been conducted to establish a standard temperature range that corresponds to featherless areas. However, such temperature-based approaches have been inconsistent with each other. In contrast, this study used deep learning techniques to automatically assess dorsal feather scores using thermal images. Thermal images (n = 1575) of the dorsal body of cage-free laying hens with varying degrees of feather damage were captured. Manual feather scoring was performed, classifying each image into a feather score (0–2) according to the increasing severity of feather loss. After filtering out images of lower quality, a total of 1222 images were selected. Two types of computer vision models, a classification model and an object detection model, were trained and evaluated. A custom convolutional neural network (CNN) was trained to classify thermal images into feather score categories. Additionally, we trained and optimized You Only Look Once (YOLO) object detection models to detect areas of feather damage and predict the feather score. The CNN model achieved an overall accuracy of 0.81, with high precision for severe feather loss. Among the YOLO-based object detection models, YOLO11n performed best, achieving a precision of 0.81, a recall of 0.73 and a mean average precision (mAP) at 0.5 intersection over union (IoU) of 0.84. Results show the potential of combining thermal imaging with deep learning techniques to perform objective, automatic, and scalable feather scoring procedures. Future studies should focus on data diversity, multiple part scoring, and semantic segmentation for robust performance.

Keywords:

cage-free; computer vision; deep learning; poultry welfare; thermal imaging

1. Introduction

Feather coverage in laying hens is a key indicator of both productivity and welfare status. Hens with poor feather coverage have exposed skin and therefore lose more heat to the environment, especially in cold conditions [1]. To compensate for this heat loss, hens with poor feather coverage increase thermoregulatory heat production [2]. Consequently, they consume more feed to generate additional heat. Studies have shown that hens with good feather coverage have lower feed intake and higher egg production [3,4,5]. Therefore, hens with better feather coverage have better feed efficiency and lower production costs [6], which can mean greater profit for farmers.

Poor feather coverage in hens can indicate various environmental, nutritional, infectious, or behavioral issues [7]. These factors are strongly associated with the overall welfare status of hens. Feather coverage assessment has been found to be a reliable method for evaluating feather pecking activity in hens [8,9]. Therefore, feather coverage assessment can also indicate the welfare status of hens. A study used feather coverage of cage-free laying hens as an indicator to assess the welfare impact of various improvement initiatives [10]. In addition, several established welfare assessment protocols such as LayWel (2006) [11], Welfare Quality Network (2019) [12], and AssureWel (2013) [13] include feather scoring as one of the indicators of hen welfare.

Although widely used, manual feather scoring of hens has various drawbacks. It tends to be laborious, time-consuming, and subjective. Such methods require a substantial workforce, especially on a large-scale farm. In addition, handling hens for feather scoring can be stressful, affecting the hens’ welfare. To overcome such shortcomings, thermal imaging techniques have been used. Thermal cameras detect the temperature of a target by evaluating its radiative heat loss [14]. Since featherless areas lack insulation, they radiate more heat to the environment, which can be detected and quantified using thermal imaging techniques.

Previous studies have used thermal imaging to quantify feather scores by correlating thermal temperature with feather coverage [15,16,17]. However, the temperature ranges corresponding to each feather score category varied across studies, making standardization challenging. Cook et al. (2006) reported featherless areas to be within the temperature range of 28 to 31 °C [15]. However, in another study, Zhao et al. (2013) observed such areas to exceed 35.9 °C [16]. The authors mentioned variations in environmental parameters such as temperature and humidity, which can influence the thermal temperature reading. This makes establishing a standard temperature threshold for feather scoring challenging, especially under varying environmental conditions. To overcome the limitations, analyzing the thermal image instead of relying solely on specific temperature ranges might be more effective. This can be achieved with the help of computer vision techniques. Deep learning models can be used to extract and learn distinctive features that differentiate feathered and featherless regions.

Precision poultry farming technologies, such as computer vision-based systems, are becoming increasingly important in addressing the current issues related to animal welfare and production efficiencies. A convolutional neural network (CNN) is a type of deep learning model, commonly used to learn specific features from images for classification. In a recent study, a CNN was applied to classify poultry feathers based on colors, with an accuracy of 93.71% [18]. Another important deep learning model is You Only Look Once (YOLO), which is primarily used for object detection. Detection models like YOLO not only classify the objects of interest but also determine the location of the object within an image. YOLO has become one of the most popular computer vision algorithms and is increasingly being applied in poultry research [19]. Since its introduction, the YOLO model has evolved significantly, with each version bringing key improvements to enhance performance and efficiency.

In the current study, we explored the use of thermal image-based deep learning models to automatically assess feather scores according to the severity of the feather loss on the dorsal body of laying hens. First, we developed a custom CNN model for feather score classification. Subsequently, we trained and optimized YOLO models to detect featherless areas with the assignment of their respective feather scores. The objective of this study was to compare the performance of these deep learning methods in quantifying feather loss using thermal imaging techniques.

2. Materials and Methods

2.1. Experimental Setup

The study was conducted at the University of Georgia (UGA) Poultry Research Facility in a cage-free research setting. Day-old Lohmann LSL-Lite chicks were obtained from Hy-Line North America (Mansfield, GA, USA). The hens were housed in three identical rooms, each measuring 7.3 m in length, 6.1 m in width, and 3 m in height. Environmental conditions were maintained in accordance with Lohmann LSL-Lite management guidelines. All animal use and management procedures of the study were approved by the Institutional Animal Care and Use Committee (IACUC) at UGA (AUP #: A2023 02-024). Hens were offered ad libitum feed that was manufactured at the UGA feed mill.

2.2. Image Acquisition

Thermal images of 25 hens were captured weekly in each of the three rooms from the first week of May, when the birds were 80 weeks of age, to the third week of June, when they reached 86 weeks of age. To minimize variation, individual hens were placed in the same designated position and location for capturing thermal images. A total of 1575 thermal images of the dorsal body of hens were captured using a FLIR T530 thermal camera (FLIR Systems, Wilsonville, OR, USA) mounted on a tripod at a fixed height of 0.8 m above the hen. The air temperature was approximately 20 °C, and the camera emissivity was set to 0.98, consistent with previous similar studies [16]. To avoid residual heat effects, a 5 min interval was maintained between successive hen placements. All images were captured in the early morning to minimize environmental variation.

Manual feather scoring was performed by experienced researchers following the Welfare Quality Network (2019) [12] protocol for laying hens, with the criteria and number of images for each score in Table 1. Hens were categorized as score 0 for minimal or no feather loss, score 1 for moderate feather loss, and score 2 for severe feather loss. Inter-observer agreement was greater than 95%, showing high consistency. Images were then classified into their respective feather score categories for further use. Representative examples of each class are presented in Figure 1.

2.3. Model Training

A total of 1222 images were selected from the original image dataset after excluding images of poor quality. A custom CNN model was developed for the classification task. First, each image was resized to 128 × 128 pixels and converted to a tensor. The model consists of two convolutional layers with ReLU activation and max pooling, followed by two fully connected layers. The dataset was split into 80% for training and 20% for validation. The model was trained for 15 epochs using the Adam optimizer with a learning rate of 0.001. The CNN developed for feather score classification consists of the following layers in sequence:

Input layer: accepts RGB images resized to 128 × 128 pixels (3 channels).
Convolution 1: 16 filters, 3 × 3 kernel, stride 1, padding ‘same’, ReLU activation.
MaxPool 1: 2 × 2 pool size, stride 2, reducing spatial dimensions to 64 × 64 × 16.
Convolution 2: 32 filters, 3 × 3 kernel, stride 1, padding ‘same’, ReLU activation.
MaxPool 2: 2 × 2 pool size, stride 2, reducing spatial dimensions to 32 × 32 × 32.
Flatten: converts 3D feature maps into a 1D vector (32 × 32 × 32 = 32,768 features).
Fully connected: 128 units, ReLU activation.
Output: 3 units, linear activation followed by softmax for multi-class classification.
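
As an illustrative check of the layer dimensions listed above, the following minimal Python sketch traces the feature-map shape through each layer and counts the trainable parameters. It is not the training code used in the study: the helper names are hypothetical, and bias terms are assumed in every convolutional and fully connected layer.

```python
# Illustrative shape and parameter trace for the layer list above.
# Assumption: every convolutional and fully connected layer has a bias term.

def conv_same(shape, c_out, k=3):
    """k x k convolution, stride 1, 'same' padding: height and width are kept."""
    h, w, c_in = shape
    params = c_out * (k * k * c_in) + c_out  # weights + biases
    return (h, w, c_out), params


def max_pool(shape, pool=2):
    """pool x pool max pooling with stride = pool: height and width are halved."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0


def dense(n_in, n_out):
    """Fully connected layer: n_in * n_out weights plus n_out biases."""
    return n_out, n_in * n_out + n_out


def trace_cnn(input_shape=(128, 128, 3), n_classes=3):
    """Return a list of (layer name, output shape, parameter count)."""
    rows = []
    shape, p = conv_same(input_shape, 16)
    rows.append(("Convolution 1", shape, p))
    shape, p = max_pool(shape)
    rows.append(("MaxPool 1", shape, p))
    shape, p = conv_same(shape, 32)
    rows.append(("Convolution 2", shape, p))
    shape, p = max_pool(shape)
    rows.append(("MaxPool 2", shape, p))
    n_flat = shape[0] * shape[1] * shape[2]
    rows.append(("Flatten", (n_flat,), 0))
    n, p = dense(n_flat, 128)
    rows.append(("Fully connected", (n,), p))
    n, p = dense(n, n_classes)
    rows.append(("Output", (n,), p))
    return rows


layers = trace_cnn()
for name, shape, params in layers:
    print(f"{name:16s} {str(shape):16s} {params:>9,d}")
print("Total parameters:", sum(p for _, _, p in layers))
```

Under these assumptions, nearly all of the roughly 4.2 million parameters sit in the first fully connected layer (32,768 × 128 weights), while the two convolutional layers together contribute only about five thousand.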

For the detection task, bounding box annotations were made using CVAT.ai [20] by drawing a rectangle around each featherless area of the hen. Following the scoring protocol shown in Table 1, only scores 1 and 2 were annotated. Score 0 was not annotated because it corresponds to the absence of featherless areas. In the model output, images with no detected regions can therefore be classified as score 0.

The dataset was split into training, validation, and test sets in the ratio of 70:20:10. Three models, YOLO11n, YOLO11l, and YOLOv8l, were trained with augmentation, early stopping, and batch size adjustment for optimization. Ultimately, YOLO11n [21] trained for 100 epochs demonstrated the best performance and was selected for further testing. The detection models were trained on an NVIDIA RTX 4000 (Ada Generation) GPU using CUDA version 12.5. The classification models were run on Google Colab using a T4 GPU hardware accelerator.
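
A 70:20:10 split of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name, the file names, the random seed, and the rounding rule are all hypothetical, and the sketch does not stratify by feather score.

```python
import random


def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle items reproducibly and cut them into train/val/test lists.

    Assumption: the train and validation sizes are rounded down and the
    test set receives the remainder.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the 1222 selected thermal images.
paths = [f"img_{i:04d}.jpg" for i in range(1222)]
train, val, test = split_dataset(paths)
print(len(train), len(val), len(test))  # prints: 855 244 123
```

With 1222 images and this rounding rule, the three subsets contain 855, 244, and 123 images. The exact subset sizes used in the study are not reported here and may differ slightly.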

2.4. Performance Metrics of Models

The models were evaluated based on standard deep learning metrics. For the classification task, accuracy and the confusion matrix were used to evaluate the model. For the object detection task, precision, recall and mean average precision at 50% intersection over union (mAP50) were used to evaluate the model.

2.4.1. Accuracy

Accuracy is a commonly used metric for classification tasks. It measures the proportion of correct predictions out of the total number of predictions made. It can be obtained by dividing the sum of true positives (TP) and true negatives (TN) by the sum of TP, TN, false positives (FP), and false negatives (FN).

\text{Accuracy}\ (A) = \frac{TP + TN}{TP + TN + FP + FN}

2.4.2. Precision

Precision is the proportion of correct positive predictions out of the total positive predictions made by the model. It can be obtained by dividing TP by the sum of TP and FP.

\text{Precision}\ (P) = \frac{TP}{TP + FP}

2.4.3. Recall

Recall is the proportion of actual positive instances that the model correctly predicted as positive. It can be obtained by dividing TP by the sum of TP and FN.

\text{Recall}\ (R) = \frac{TP}{TP + FN}
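
The three count-based metrics defined in Sections 2.4.1 to 2.4.3 can be computed directly from the TP, TN, FP, and FN counts. The sketch below is a minimal illustration; the function names and the example counts are hypothetical and are not results from this study.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Share of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positive instances that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
tp, tn, fp, fn = 40, 40, 10, 10
print(accuracy(tp, tn, fp, fn))  # 80 / 100
print(precision(tp, fp))         # 40 / 50
print(recall(tp, fn))            # 40 / 50
```

The guards against a zero denominator are a design choice for the sketch: they return 0.0 when a class has no predictions or no actual instances, instead of raising an error.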

2.4.4. mAP50

mAP50 is the mean, across classes, of the average precision (AP) computed at an intersection over union (IoU) threshold of 0.5. AP is the area under the precision-recall curve, and IoU is a measure of the overlap between the bounding box predicted by the model and the ground truth bounding box. mAP50 is a more comprehensive evaluation metric because it considers both precision and recall.

\text{Average precision}\ (AP) = \int_{0}^{1} P(R)\, dR

\text{Mean average precision}\ (mAP) = \frac{1}{n} \sum_{i=1}^{n} AP_{i}

n = \text{number of classes}
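
To make the definitions above concrete, the following sketch computes the IoU of two boxes, approximates the AP integral as the area under a precision-recall curve, and averages AP over classes. It is an illustration under stated assumptions: boxes are given as (x1, y1, x2, y2) corner coordinates, detections are already sorted by descending confidence and already matched to ground truth at IoU ≥ 0.5, and the integral uses all-point interpolation. Evaluation toolkits differ in their interpolation details, so values may not match a given library exactly. All function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(is_tp, n_ground_truth):
    """Area under the precision-recall curve for one class.

    is_tp: one boolean per detection, sorted by descending confidence,
           True if the detection matched a ground truth box at IoU >= 0.5.
    """
    tp = fp = 0
    recalls, precisions = [], []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing in recall (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values (n = number of classes)."""
    return sum(ap_per_class) / len(ap_per_class)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))            # overlap 1, union 7
print(average_precision([True, False, True], 2))  # 0.5 * 1 + 0.5 * 2/3
```

In the second example, three detections for a class with two ground truth boxes give recall steps at 0.5 and 1.0, with envelope precisions of 1 and 2/3, so the area is 5/6.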

3. Results

3.1. Classification

3.1.1. Performance Metrics

As shown in Figure 2a, the training accuracy of the classification model increased steadily during the first eight epochs, after which it plateaued. A similar trend is observed in the loss curve shown in Figure 2b, with losses stabilizing after approximately eight epochs. By the end of training, the validation accuracy reached 0.81, indicating good overall performance.

The micro-averaged precision, recall, and F1-score all reached 0.81, showing strong overall predictive performance. The weighted average F1-score was also 0.81, which shows that the model performed well on average when accounting for the class distribution. However, the macro-averaged F1 score was only 0.68. This indicated variation in performance across different classes, which might be due to class imbalance in the dataset. The model achieved a precision of 0.91, a recall of 0.9, and an F1-score of 0.9 for class 2. Therefore, this model is very effective in detecting hens with a feather score of 2. For class 0, the model reached a recall of 0.78, precision of 0.58, and F1-score of 0.67. For class 1, the model did not perform well with a precision of 0.61, recall of 0.4, and F1-score of 0.48.
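
The gap between the micro- and macro-averaged scores can be checked from the per-class precision and recall values reported above. The short sketch below recomputes each F1-score as the harmonic mean of precision and recall and then takes the unweighted mean across classes; the variable names are illustrative. The weighted average is not recomputed because it requires the number of validation images per class, which is not restated in this section.

```python
# Per-class (precision, recall) as reported in the text above.
reported = {
    0: (0.58, 0.78),
    1: (0.61, 0.40),
    2: (0.91, 0.90),
}


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


per_class_f1 = {c: f1_score(p, r) for c, (p, r) in reported.items()}
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for c, f1 in per_class_f1.items():
    print(f"class {c}: F1 = {f1:.2f}")
print(f"macro-averaged F1 = {macro_f1:.2f}")
```

The recomputed values round to 0.67, 0.48, and 0.90 for classes 0, 1, and 2, and their unweighted mean rounds to 0.68, in agreement with the reported figures. Because the macro average gives each class equal weight, the weak class 1 score pulls it well below the micro average of 0.81.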

3.1.2. Confusion Matrix

The confusion matrix in Figure 3 provides a normalized, class-wise evaluation of the model. The model demonstrated a high class-wise accuracy of 0.90 (the proportion of images of that class predicted correctly) in detecting severe feather loss, while performance for moderate feather loss was lower, with a class-wise accuracy of only 0.40. For hens with minimal or no feather loss, performance was relatively better, with a class-wise accuracy of 0.78. Misclassification analysis shows that 19% of images with score 0 and 26% of images with score 1 were incorrectly predicted as score 2. However, misclassification as score 1 was relatively uncommon, with only 3% of score 0 images and 5% of score 2 images incorrectly predicted as score 1. Additionally, 34% of score 1 images and 5% of score 2 images were incorrectly predicted as score 0.
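
The percentages above can be arranged as a row-normalized confusion matrix, in which each row is a true class and each column a predicted class. The sketch below is an illustrative reconstruction from the values stated in the text, not the data behind Figure 3; the helper function and the example counts passed to it are hypothetical.

```python
def row_normalize(counts):
    """Divide each row of a count matrix by its row total."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in counts]


# Rows: true score 0, 1, 2. Columns: predicted score 0, 1, 2.
# Values are the proportions quoted in the text.
confusion = [
    [0.78, 0.03, 0.19],
    [0.34, 0.40, 0.26],
    [0.05, 0.05, 0.90],
]

# Each row of a row-normalized matrix sums to 1.
for row in confusion:
    assert abs(sum(row) - 1.0) < 1e-9

# The diagonal holds the share of each true class predicted correctly,
# which is the per-class recall.
per_class_recall = [confusion[i][i] for i in range(3)]
print(per_class_recall)

# The score 2 column shows how often each true class was predicted as severe.
predicted_as_score_2 = [row[2] for row in confusion]
print(predicted_as_score_2)

# Hypothetical raw counts, only to show how normalization works.
print(row_normalize([[8, 2], [1, 9]]))
```

Read this way, the diagonal entries (0.78, 0.40, 0.90) coincide with the per-class recall values reported in Section 3.1.1.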

3.2. Detection

For the detection task, YOLO11n trained for 100 epochs demonstrated the best performance. The results from this model are reported below.

3.2.1. Performance Metrics

Performance metrics of the YOLO11n model across the training epochs are presented in Figure 4. The model achieved a precision of 0.81, meaning 81% of the positive predictions made by the model were correct. The model achieved a recall of 0.73, meaning that it detected 73% of all actual positive instances. The model reached an mAP50 of 0.84, meaning that the average precision at an IoU threshold of 0.5, averaged over the classes, was 0.84.

The training loss curves of the detection model are shown in Figure 5. The training bounding box loss steadily decreased from 1.62 to 0.83, showing improved bounding box localization over epochs. Similarly, the training classification loss decreased from 2.92 to 0.52, showing improved class prediction performance. The training distribution focal loss decreased from 1.29 to 0.8, showing improved bounding box regression. Similarly, the validation loss curves of the detection model are given in Figure 6. The validation bounding box loss decreased from 1.31 to 1.03, the validation classification loss decreased from 2.84 to 0.71, and the validation distribution focal loss decreased from 1.04 to 0.97. However, compared with the initial phase, the validation loss curves did not decrease as rapidly beyond approximately 75 epochs. The limited decrease in validation loss beyond 75 epochs was one of the reasons for training the model for only 100 epochs.

3.2.2. Confusion Matrix

The confusion matrix in Figure 7 provides a normalized, class-wise evaluation of the detection model. The model achieved a high class-wise accuracy of 0.96 in detecting severe feather loss, while performance for moderate feather loss was comparatively lower, with a class-wise accuracy of only 0.75. Score 2 was overpredicted in some cases, with 5% of true score 1 regions incorrectly predicted as score 2. Misclassification as score 1 was less common, with only 1% of score 2 regions incorrectly predicted as score 1. Additionally, 20% of score 1 and 3% of score 2 regions were incorrectly predicted as background, meaning they were not detected as areas with feather loss.

This confusion matrix does not report an accuracy for minimal or no feather loss because, in the detection model, areas without annotations are treated as background. Only the areas with feather loss were annotated for model training. Since fully feathered areas were not annotated, they are considered background instead of a separate class. If class 0 were included in the matrix, it would dominate the values. Therefore, it was omitted.

3.2.3. Model Test

Representative examples of prediction performed using the YOLO11n model in test images are presented in Figure 8, Figure 9 and Figure 10 for score 0, 1 and 2, respectively. These figures show the model predicting the feather score using the features of the feather loss region from the thermal images of the dorsal body of a hen. The confidence threshold was set to 0.65, meaning all the predicted bounding boxes with confidence scores of 65% or above are displayed. If multiple scores are predicted within an image, the final feather score is determined by selecting the maximum detected score. An image with no feather loss region as shown in Figure 8 can be predicted as score 0.
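
The image-level decision rule described above (keep detections with confidence of at least 0.65, take the maximum detected score, and assign score 0 when nothing is detected) can be sketched as follows. The function name and the (score, confidence) input format are hypothetical and do not correspond to the output format of any particular detection library.

```python
def image_feather_score(detections, conf_threshold=0.65):
    """Reduce the detections for one image to a single feather score.

    detections: iterable of (score, confidence) pairs, where score is 1 or 2.
    Returns the highest score among detections at or above the threshold,
    or 0 if no detection passes the threshold.
    """
    kept = [score for score, conf in detections if conf >= conf_threshold]
    return max(kept) if kept else 0


print(image_feather_score([]))                      # no detections
print(image_feather_score([(1, 0.70), (2, 0.90)]))  # both kept, maximum wins
print(image_feather_score([(2, 0.50), (1, 0.80)]))  # score 2 box is below threshold
```

One consequence of this rule is that a low-confidence detection of severe feather loss is discarded entirely, so the image-level score depends on the chosen threshold as well as on the detections themselves.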

4. Discussion

The findings highlight the potential of deep learning-based techniques for automatic feather loss assessment in laying hens using thermal images. Zhang et al. (2023) [22] developed a computer vision-based feather damage monitoring system utilizing color, thermal, and depth imaging, which focused on quantifying feather damage and achieved a coefficient of determination of 0.946 and a root mean square error of 2.015 mm between the predicted depth and manual feather damage measurement. In contrast, our approach demonstrates feather loss classification and detection using thermal imaging alone. The model eliminates the need to manually define a temperature range, a methodological advance over earlier temperature-based studies [15].

In both the detection and classification models, severe feather loss was consistently identified with high reliability. The majority class might be more favored by the model. In contrast, the prediction of moderate feather loss was the least reliable. This might be because of relatively fewer images with a moderate feather score in the training dataset [23]. Additionally, hens with moderate feather scores might be more difficult to distinguish, as their features are often not as extreme as those of hens with minimal or severe feather loss. This challenge was consistent with the findings from the study by Schreiter and Freick (2022) [17], who showed excellent accuracy for severe feather loss but low sensitivity for differentiating moderate feather loss, reaching as low as 31.7% for score 0 vs. 1. Misclassification might also arise from slight feather ruffling or thermal artifacts, which might be similar to featherless areas in the obtained image. Both the classification and detection models showed good results in the assessment of feather scores in laying hens. However, distinct patterns and limitations are present, providing a way for their application and areas for improvement.

4.1. Classification

The custom CNN model converged, as indicated by the stabilization of accuracy and loss after around eight epochs. This rapid convergence may have reduced the risk of overfitting. The validation accuracy and the micro-averaged F1-score both reaching 0.81 show that the model can reliably distinguish feather loss classes overall. However, the reduced macro-averaged F1-score of 0.68 indicates a difference in class-wise performance, which might be due to dataset imbalance and the inherent challenge of distinguishing the feather score using thermal images. Classes representing feather score 0 and feather score 2 exhibited relatively better performance than the class representing feather score 1. The model achieved greater precision, recall and F1-score for the severe feather loss class. Additionally, the confusion matrix showed better accuracy for the score 2 class. Conversely, the model struggled with the moderate feather loss class, achieving lower precision, recall, F1-score and accuracy. There were relatively few misclassifications of severe feather loss as moderate or no feather loss. However, misclassification of score 0 and score 1 as score 2 indicates that some healthy or minimally affected hens might be mistakenly flagged as severely feather-damaged, which might lead to unnecessary intervention.

4.2. Detection

The YOLO11n model showed strong performance in localizing and classifying feather loss regions in hens. Most of the feather loss regions predicted by the model correspond to true positive instances, as indicated by its good precision. Similarly, recall indicated that the model could detect a large proportion of feather loss areas, though some regions might remain undetected. The high mAP50 shows the model’s ability to accurately detect feather loss regions and classify their severity. These results are supported by the decreasing training and validation losses within the limited number of epochs, which helped prevent overfitting. Previous studies have reported higher performance for YOLO-based models in other poultry applications, with YOLOv11n reaching a precision of 0.977, a recall of 0.92, and an mAP50 of 0.96 for the detection of dead chickens using RGB images [19]. Similar to the classification model, severe feather loss was detected with greater accuracy while moderate feather loss exhibited comparatively lower accuracy.

The detection model performed better in detecting moderate feather loss, achieving an accuracy of 0.75 compared with 0.40 in the classification model. The detection-based approach might have additional benefits over the classification-based approach, as it also localizes features in the feather loss region instead of relying on the features of the whole image. However, the CNN model requires significantly less inference time per image than the YOLO model, which makes it suitable for rapid screening when speed is a priority. Classification models are simple and fast and do not need detailed annotations, which is especially useful in large-scale monitoring. From a practical perspective, combining both classification and detection models might offer a more robust pipeline for automatic feather loss monitoring in laying hens.

4.3. Limitations and Future Directions

The models were trained on a limited dataset of 1222 images, which restricts the diversity of the images. In addition, the dataset was obtained from a single facility with birds of the same breed under controlled environmental conditions. As shown in Figure 11, one cause of misclassification was variation in posture during image capture. When hens were spreading their wings, regions between the wings and the body were exposed, appearing as localized hot spots in thermal images even if the hens had full feather coverage. Such patterns might have led the models to overestimate feather loss severity. Various physiological factors that can influence the thermographic measurements were not incorporated into the model, including natural molting status, metabolic activity, laying status, hydration level, and skin emissivity variation. The current design reduces variability but limits the robustness of the model. Data obtained from hens of diverse ages, breeds, environmental conditions, body parts and locations can be used for training to enhance model robustness. Additionally, multimodal artificial intelligence systems that integrate thermal, visual, environmental, acoustic, and physiological data have been shown to enhance welfare assessment accuracy and generalizability compared with a unimodal approach such as that of the current study [24].

The model performance indicates room for improvement. Future studies should explore weighted loss functions [25], targeted data augmentation [26], and oversampling strategies [27], which can improve performance, especially for the minority class. In a related study by Elmessery et al. (2023) [28], YOLO-based models demonstrated excellent results, reaching an mAP50 of 0.988 and an F1-score of 0.972, with better overall performance using thermal images than with normal RGB images. This exceptional performance might be due to a large dataset of 10,000 images.
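
As one illustration of the weighted loss idea mentioned above, class weights are often set inversely proportional to class frequency, so that errors on rare classes cost more during training. The sketch below uses the common "balanced" heuristic, w_c = N / (K × n_c), where N is the total number of images, K the number of classes, and n_c the number of images in class c. The class counts are hypothetical placeholders, not the counts from Table 1, and this weighting was not used in the present study.

```python
def balanced_class_weights(class_counts):
    """Inverse-frequency class weights: w_c = N / (K * n_c)."""
    n_total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_total / (n_classes * n) for c, n in class_counts.items()}


# Hypothetical image counts per feather score (placeholders only).
counts = {0: 200, 1: 150, 2: 650}
weights = balanced_class_weights(counts)
for c, w in weights.items():
    print(f"score {c}: weight = {w:.3f}")
```

With these placeholder counts, the rarest class receives the largest weight and the most frequent class the smallest, while the weighted total number of images stays equal to N, so the overall scale of the loss is unchanged.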

In the YOLO-based detection model, hens with complete feather coverage were not annotated and were instead treated as background. This strategy was adopted assuming that focusing only on the featherless areas would be more effective instead of learning from the entire chicken image, which might introduce artifacts and increase model errors. This strategy also reduced annotation time and effort. However, lack of annotations specific to full feather coverage may lead to false negatives, especially in complex farming environments. To address this limitation, future work should investigate alternative strategies, such as introducing a separate class for full feather coverage and training models based on annotations of the entire hen instead of only featherless areas.

Integrating the thermal imaging system and deep learning with robotics in farming can potentially reduce labor cost and minimize handling stress in hens. It can be used as an early detection tool for feather loss, helping prevent feather pecking to improve welfare and production.

5. Conclusions

The study demonstrated the potential of deep learning-based approaches to assess the feather score of the dorsal body in laying hens using a thermal imaging technique. A custom-built CNN was developed to classify images into feather score categories, while a YOLO-based object detection model was trained and optimized to detect regions of bare skin and classify the feather score. The classification model achieved an accuracy of around 0.81, and the YOLO11n object detection model achieved a precision of 0.81, a recall of 0.73, and an mAP50 of 0.84. A classification model can be used when localization of the feather loss region is not required, while a detection model can be used when a precise feather loss region is required. Integrating the models into a real-time monitoring system can enhance objective and practical feather scoring techniques in commercial poultry production. Future research should focus on expanding the dataset size and diversity by including hens of different ages, breeds, housing systems, and environmental conditions to further improve the robustness and accuracy of the model.

Author Contributions

Conceptualization: L.C.; methodology: S.D. and L.C.; formal analysis: S.D.; investigation: S.D., B.P., A.D. and L.C.; resources: L.C.; writing—original draft preparation: S.D., B.P., A.D. and L.C.; writing—review and editing: S.D., B.P., A.D. and L.C.; visualization: S.D., B.P., A.D. and L.C.; supervision: L.C.; project administration: L.C.; funding acquisition: L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the USDA-NIFA AFRI (2023-68008-39853), the Georgia Research Alliance (Venture Fund), UGA Office of Global Engagement Grant, and the UGA Institute for Integrative Precision Agriculture.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

CNN	Convolutional Neural Network
GPU	Graphics Processing Unit
IoU	Intersection over Union
mAP	Mean Average Precision
mAP50	Mean Average Precision at 50% Intersection over Union Threshold
ReLU	Rectified Linear Unit
UGA	University of Georgia
YOLO	You Only Look Once

References

Richards, S.A. The influence of loss of plumage on temperature regulation in laying hens. J. Agric. Sci. 1977, 89, 393–398. [Google Scholar] [CrossRef]
Nichelmann, M.; Baranyiová, E.; Goll, R.; Tzschentke, B. Influence of feather cover on heat balance in laying hens (Gallus domesticus). J. Therm. Biol. 1986, 11, 121–126. [Google Scholar] [CrossRef]
Lesson, S.; Morrison, W.D. Effect of feather cover on feed efficiency in laying birds. Poult. Sci. 1978, 57, 1094–1096. [Google Scholar] [CrossRef]
Peguri, A.; Coon, C. Effect of feather coverage and temperature on layer performance. Poult. Sci. 1993, 72, 1318–1329. [Google Scholar] [CrossRef]
Glatz, P.C. Effect of poor feather cover on feed intake and production of aged laying hens. Asian-Australas. J. Anim. Sci. 2001, 14, 553–558. [Google Scholar] [CrossRef]
Hagger, C.; Marguerat, C.; Steiger-Staf, D.; Stranzinger, G. Plumage Condition, Feed Consumption, and Egg Production Relationships in Laying Hens. Poult. Sci. 1989, 68, 221–225. [Google Scholar] [CrossRef]
Leeson, S.; Walsh, T. Feathering in commercial poultry II. Factors influencing feather growth and feather loss. World’s Poult. Sci. J. 2004, 60, 52–63. [Google Scholar] [CrossRef]
Bilcik, B.; Keeling, L.J. Changes in feather condition in relation to feather pecking and aggressive behaviour in laying hens. Br. Poult. Sci. 1999, 40, 444–451. [Google Scholar] [CrossRef] [PubMed]
Schwarzer, A.; Rauch, E.; Erhard, M.; Reese, S.; Schmidt, P.; Bergmann, S.; Plattner, C.; Kaesberg, A.; Louton, H. Individual plumage and integument scoring of laying hens on commercial farms: Correlation with severe feather pecking and prognosis by visual scoring on flock level. Poult. Sci. 2022, 101, 102093. [Google Scholar] [CrossRef]
Mullan, S.; Szmaragd, C.; Cooper, M.; Wrathall, J.; Jamieson, J.; Bond, A.; Atkinson, C.; Main, D. Animal welfare initiatives improve feather cover of cage-free laying hens in the UK. Anim. Welf. 2016, 25, 243–253. [Google Scholar] [CrossRef]
LayWel. Manual for Self-Assessment of the Welfare of Laying Hens on Farm; University of Bristol: Bristol, UK, 2006. [Google Scholar]
Welfare Quality Network. Assessment Protocol for Laying Hens, Version 2.0; Welfare Quality Network: Lelystad, The Netherlands, 2019; Available online: https://www.welfarequalitynetwork.net/media/1294/wq_laying_hen_protocol_20_def-december-2019.pdf (accessed on 1 December 2025).
AssureWel. AssureWel Laying Hen Assessment Protocol, Version 4; AssureWel: Bristol, UK, 2013; Available online: http://www.assurewel.org/Portals/2/Documents/Laying%20hens/AssureWel%20Laying%20Hen%20Assessment%20Protocol.pdf (accessed on 1 December 2025).
McCafferty, D.J. Applications of thermal imaging in avian science. Ibis 2013, 155, 4–15. [Google Scholar] [CrossRef]
Cook, N.J.; Smykot, A.B.; Holm, D.E.; Fasenko, G.; Church, J.S. Assessing feather cover of laying hens by infrared thermography. J. Appl. Poult. Res. 2006, 15, 274–279. [Google Scholar] [CrossRef]
Zhao, Y.; Xin, H.; Dong, B. Use of infrared thermography to assess laying-hen feather coverage. Poult. Sci. 2013, 92, 295–302. [Google Scholar] [CrossRef]
Schreiter, R.; Freick, M. Research Note: Is infrared thermography an appropriate method for early detection and objective quantification of plumage damage in white and brown feathered laying hens? Poult. Sci. 2022, 101, 102022. [Google Scholar] [CrossRef] [PubMed]
Niu, J.; Li, T.; Qi, K.; Liu, Y.; Deng, H.; Hu, Y.; Xu, D.; Wu, L.; Amevor, F.K.; Wang, Y.; et al. Research note: Application of convolutional neural networks for feather classification in chickens. Poult. Sci. 2025, 104, 105254. [Google Scholar] [CrossRef]
Bumbálek, R.; Umurungi, S.N.; Ufitikirezi, J.d.D.M.; Zoubek, T.; Kuneš, R.; Stehlík, R.; Lin, H.-I.; Bartoš, P. Deep learning in poultry farming: Comparative analysis of Yolov8, Yolov9, Yolov10, and Yolov11 for dead chickens detection. Poult. Sci. 2025, 104, 105440. [Google Scholar] [CrossRef] [PubMed]
CVAT.ai Corporation. Computer Vision Annotation Tool (CVAT), Version 2.46.0; Intel Corporation: Santa Clara, CA, USA, 2025. [Google Scholar] [CrossRef]
Jocher, G.; Qiu, J. Ultralytics YOLO11. 2024. Available online: https://github.com/ultralytics/ultralytics (accessed on 1 December 2025).
Zhang, X.; Zhang, Y.; Geng, J.; Pan, J.; Huang, X.; Rao, X. Feather Damage Monitoring System Using RGB-Depth-Thermal Model for Chickens. Animals 2023, 13, 126. [Google Scholar] [CrossRef] [PubMed]
Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimed. Tools Appl. 2022, 82, 9243–9275. [Google Scholar] [CrossRef]
Essien, D.; Neethirajan, S. Multimodal AI systems for enhanced laying hen welfare assessment and productivity optimization. Smart Agric. Technol. 2025, 12, 101564. [Google Scholar] [CrossRef]
Du, J.; Zhou, Y.; Liu, P.; Vong, C.-M.; Wang, T. Parameter-Free Loss for Class-Imbalanced Deep Learning in Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 3234–3240. [Google Scholar] [CrossRef]
Yang, J.; Li, Z.; Gu, Z.; Li, W. Research on floating object classification algorithm based on convolutional neural network. Sci. Rep. 2024, 14, 32086. [Google Scholar] [CrossRef] [PubMed]
Taskiran, S.F.; Turkoglu, B.; Kaya, E.; Asuroglu, T. A comprehensive evaluation of oversampling techniques for enhancing text classification performance. Sci. Rep. 2025, 15, 21631. [Google Scholar] [CrossRef] [PubMed]
Elmessery, W.M.; Gutiérrez, J.; El-Wahhab, G.G.A.; Elkhaiat, I.A.; El-Soaly, I.S.; Alhag, S.K.; Al-Shuraym, L.A.; Akela, M.A.; Moghanm, F.S.; Abdelshafie, M.F. YOLO-Based Model for Automatic Detection of Broiler Pathological Phenomena through Visual and Thermal Images in Intensive Poultry Houses. Agriculture 2023, 13, 1527. [Google Scholar] [CrossRef]

Figure 1. Representative thermal images of hens with feather scores of 0 (left), 1 (center), and 2 (right). The images show dorsal views with the head at the top and the tail at the bottom. Colors represent surface temperature on a rainbow color scale, with blue indicating cooler areas, red indicating warmer areas, and white indicating the highest temperatures.

Figure 2. Training and validation performance of the classification model over 15 epochs. (a) Training and validation accuracy curves. (b) Training and validation loss curves.

Figure 3. Normalized confusion matrix for the classification model with three feather score classes: 0, 1 and 2. Here, rows represent the predicted labels, and columns represent the true labels. The values on the diagonal indicate class-wise accuracy.
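The layout described in this caption can be reproduced with a short sketch. It assumes the matrix is normalized per true class (each column sums to 1), which is consistent with the diagonal being read as class-wise accuracy. The function name and the example labels are hypothetical and are not taken from the study.

```python
def column_normalized_confusion(y_true, y_pred, n_classes):
    """Confusion matrix with rows = predicted label, columns = true label.

    Each column is divided by its total, so entry [i][i] is the fraction of
    true class i that was predicted as class i (assumed normalization).
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[p][t] += 1  # row index = prediction, column index = truth

    col_sums = [sum(counts[r][c] for r in range(n_classes)) for c in range(n_classes)]
    return [
        [counts[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n_classes)]
        for r in range(n_classes)
    ]


# Hypothetical labels for the three feather score classes (0, 1, 2)
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 1]

cm = column_normalized_confusion(y_true, y_pred, n_classes=3)
diagonal = [cm[i][i] for i in range(3)]  # class-wise accuracy under this normalization
```

With these made-up labels, the diagonal gives the share of each true class that was classified correctly, which is how the diagonal values in the figure are read.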

Figure 4. YOLO11n performance metrics over 100 epochs. The graph shows precision, recall, and mAP50 across the epochs.

Figure 5. YOLO11n training loss over 100 epochs: (a) box loss, (b) classification loss, and (c) distribution focal loss.

Figure 6. YOLO11n validation loss over 100 epochs: (a) box loss, (b) classification loss, and (c) distribution focal loss.

Figure 7. Normalized confusion matrix for the detection model with two feather score classes: 1 and 2. Here, columns represent the true labels, and rows represent the predicted labels. The values on the diagonal indicate class-wise accuracy.

Figure 8. Thermal image (°C) of a hen with no feather loss (score 0), showing no bounding box prediction obtained using the trained YOLO11n model.

Figure 9. Thermal image (°C) of a hen with moderate feather loss (score 1), showing bounding box prediction and its confidence score (0.68) predicted using the trained YOLO11n model.

Figure 10. Thermal image (°C) of a hen with severe feather loss (score 2), showing bounding box prediction and its confidence score (0.85) predicted using the trained YOLO11n model.

Figure 11. Example of a challenging misclassification case: (a) RGB image of a hen with full feather coverage (feather score 0), and (b) thermal image (°C) of the same hen, in which localized thermal hot spots between the wings and body led the CNN model to incorrectly classify the image as feather score 1.

Table 1. Feather score with the criteria and the corresponding number of images used.

Feather Score	Criteria	Number of Images
0	No or minimal feather wear	221
1	One or more featherless areas < 5 cm diameter	189
2	At least one featherless area ≥ 5 cm diameter	812
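The counts in Table 1 are unevenly distributed across the three scores. The sketch below computes each class's share of the 1222 images and, as an illustration only, a set of inverse-frequency class weights. The weighting formula is a common heuristic and is an assumption here; it is not stated to be the method used in this study.

```python
# Image counts per feather score, taken from Table 1
counts = {0: 221, 1: 189, 2: 812}

total = sum(counts.values())  # total number of selected images
n_classes = len(counts)

# Share of the dataset contributed by each feather score
shares = {score: n / total for score, n in counts.items()}

# Assumed heuristic (not from the paper): weight = total / (n_classes * class_count),
# so rarer classes receive proportionally larger weights
weights = {score: total / (n_classes * n) for score, n in counts.items()}

for score in counts:
    print(f"score {score}: {counts[score]} images, "
          f"{shares[score]:.1%} of data, weight {weights[score]:.2f}")
```

Score 2 accounts for roughly two thirds of the images, so under this heuristic it receives the smallest weight and score 1 the largest.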

