YOLO-Based Model for Automatic Detection of Broiler Pathological Phenomena through Visual and Thermal Images in Intensive Poultry Houses

Abstract: The increasing broiler demand due to overpopulation and meat imports presents challenges in poultry farming, including management, disease control, and chicken observation in varying light conditions. To address these issues, the development of AI-based management processes is crucial, especially considering the need for detecting pathological phenomena in intensive rearing. In this study, a dataset consisting of visual and thermal images was created to capture pathological phenomena in broilers. The dataset contains 10,000 images with 50,000 annotations labeled as lethargic chickens, slipped tendons, diseased eyes, stressed (beaks open), pendulous crop, and healthy broiler. Three versions of the YOLO-based algorithm (v8, v7, and v5) were assessed, utilizing augmented thermal and visual image datasets with various augmentation methods. The aim was to develop thermal- and visual-based models for detecting broilers in complex environments, and secondarily, to classify pathological phenomena under challenging lighting conditions. After training on acknowledged pathological phenomena, the thermal YOLOv8-based model demonstrated exceptional performance, achieving the highest accuracy in object detection (mAP50 of 0.988) and classification (F1 score of 0.972). This outstanding performance makes it a reliable tool for both broiler detection and pathological phenomena classification, attributed to the use of comprehensive datasets during training and development, enabling accurate and efficient detection even in complex environmental conditions. By employing both visual- and thermal-based models for monitoring, farmers can obtain results from both thermal and visual viewpoints, ultimately enhancing the overall reliability of the monitoring process.

The image labels cover six classes: lethargic chickens, slipped tendons, diseased eyes, stressed (beaks open) chickens, pendulous crops, and healthy chickens (Figure 2). The annotation, splitting, preprocessing, and augmentation of the images were accomplished using the "Roboflow" software [44]. This software provides the necessary tools to transform raw photos into a custom computer vision model and use it in applications [45,46]. To outline the disease phenomena in broilers accurately, the labeled regions were drawn tightly so that the surrounding background area was minimized. Examples of labeled broiler diseases in both optical and thermal images are shown in Figure 2. The annotated images were saved in XML format files. All chicken images were annotated across the six classes. The dataset was split into a training set (2.9 K images), a validation set (269 images), and a test set (145 images). Overfitting is a critical concern in deep learning models; techniques such as regularization (e.g., L1 or L2 regularization), dropout, early stopping, and data augmentation are commonly used to address it. So that overfitting can be tested reliably, no image is repeated across the training, validation, and test sets [47]. Preprocessing used Auto-Orient (applied) and Resize (stretch to 640 × 640); the dataset was exported in the YOLOv7 PyTorch format and downloaded as a zip folder to the local computer.
Poultry houses have a complex environment where sidelight, backlight, slight occlusion, and strong occlusion degrade the images of broilers, causing false or missed detections of targets. The training images should therefore cover more scenes so that the model can extract robust features and overcome the interference of complex scenes [48]. However, limitations arise when dealing with diseased broilers, as the availability of images is restricted by disease-related constraints, the limited number of affected broilers, and time constraints in capturing the images. Consequently, current advancements in deep learning research focus on enhancing existing data to augment the training dataset and improve neural network generalization.
In Figure 2 (column 2), chickens can be observed panting, which involves opening their beaks and breathing rapidly. Chickens use this behavior to dissipate internal heat, similar to how dogs pant. Panting and rapid breathing are considered early indicators of heat stress in chickens [49]. Figure 2 (column 4) illustrates a condition known as tendon rupture or perosis, a metabolic disease causing leg weakness in chickens, ducks, and turkeys. It usually occurs in poultry under six weeks of age, resulting in flattened and enlarged hocks [50]. Lastly, Figure 2 (column 1) shows a pendulous crop, often also called a spastic crop.
The thermal analysis of the infected regions in this study is presented in Figure 2, where two approaches were employed: the first is shown in row C and the second in row D. The first technique identifies the afflicted area and then determines its maximum, minimum, and average temperatures. The second uses a temperature mask to help reveal the heat distribution pattern and the region where the temperature is most concentrated.
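The two thermal analysis approaches can be illustrated with a minimal sketch. It assumes the thermal image is already available as a grid of temperatures in degrees Celsius; the function names, the sample grid, and the 37.0 threshold are hypothetical and are not taken from the UTi165A software.

```python
from statistics import mean


def region_stats(temps, box):
    """Return (max, min, mean) temperature inside box = (row0, col0, row1, col1).

    The box is end-exclusive, like a Python slice.
    """
    r0, c0, r1, c1 = box
    values = [temps[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return max(values), min(values), mean(values)


def temperature_mask(temps, threshold):
    """Binary mask marking pixels whose temperature is at or above threshold."""
    return [[1 if t >= threshold else 0 for t in row] for row in temps]


# Hypothetical 4 x 4 thermal grid with a warm 2 x 2 region in the middle.
temps = [
    [30.0, 30.5, 31.0, 30.0],
    [30.5, 38.0, 39.0, 30.5],
    [30.0, 38.5, 40.0, 31.0],
    [30.0, 30.5, 31.0, 30.0],
]

# First approach: statistics over a manually identified afflicted area.
hot_max, hot_min, hot_mean = region_stats(temps, (1, 1, 3, 3))

# Second approach: a mask that exposes the heat distribution pattern.
mask = temperature_mask(temps, 37.0)
```

In practice the afflicted area would come from the analyst's selection in the thermal software, and the mask threshold would depend on the camera calibration and the ambient temperature.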

Thermal Image Processing
This section describes the methodology and image processing techniques used in this study. Infrared cameras can identify points of heat concentration in chickens. However, obtaining clear thermal images is only the first step in thermography. The real challenges lie in the subsequent processing and interpretation of these images to transform them into meaningful thermograms, which can then serve as the basis for efficient optimization measures on the objects captured thermographically. Powerful analysis software is essential for analyzing and evaluating thermal images. In this study, the UTi165A software [51] was used to process and annotate the thermal images captured with the UTi165A thermal camera. The process involved several steps, as shown in Figure 3. Once the thermal images are imported, they can be adjusted and analyzed using the software's tools. Adjustments may include the color palette, temperature range, image enhancement, and other settings that improve the visibility and interpretation of thermal patterns; this step is very useful for bringing out thermal patterns and showing the temperature gradient within the image. The UTi165A software also provides annotation tools that place markers on the image to highlight specific regions, objects of interest, or anomalies. In this study, these markers were not used for YOLO training; they served only to guide the annotation of each anomaly as a specific pathological phenomenon in the Roboflow software. After processing and annotating the thermal images, the work can be saved within the UTi165A software to preserve the changes. The software can save the images with annotations overlaid or save the annotations separately as metadata associated with the images. The processed images help the YOLO-based model train more accurately on thermal images.

Feature Extraction from Thermal Images
In this study, the extracted features from thermal images are primarily based on patterns and pixel intensity rather than colors. Since thermal images capture temperature distributions rather than visible light, color information is not as relevant for feature extraction in this context. The YOLO-based model analyzes the thermal images by focusing on patterns and variations in pixel intensity, which corresponds to temperature differences. It utilizes convolutional layers to detect distinct thermal patterns indicative of different pathological phenomena. It learns to recognize sharp transitions, temperature gradients, hotspots, and other thermal pattern characteristics of specific conditions, without relying on predefined rules. Through iterative learning, the model automatically extracts discriminative thermal features for accurate classification of broiler diseases.

Experimental Environment and Hyper-Parameter Settings
The computer used in this research had an 8-core, 16-thread Intel(R) Core(TM) i7-10870H CPU (10th generation) operating at a clock speed of 2.21 GHz with a turbo speed of 5 GHz, a 16 MB cache, and a maximum supported memory of 128 GB (DDR4-2933). The GPU was an NVIDIA GeForce RTX 3060 with 3840 CUDA cores and 6 GB of video memory. The operating system was Windows 10, and the software versions were PyTorch 1.8.1, Python 3.8, and CUDA 11 [52,53]. For training, the number of epochs was set to 100, the batch size to 8, and the input size to 416 × 416 pixels. Batch Normalization (BN) layers were used for regularization during weight updates. The momentum factor was set to 0.937 and the weight decay rate to 0.0005; weight decay is a regularization method that adds a penalty, here with a coefficient of 0.0005, to the loss function to avoid overfitting. The initial learning rate was set to 0.01 during the training process (Table 1).
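To make the role of the momentum and weight decay settings concrete, the following is a minimal sketch of one stochastic gradient descent update using the values reported above. It treats the reported 0.01 as the initial learning rate, which is an assumption about the source's wording, and the function name `sgd_step` is hypothetical; the study's actual optimizer is the one built into the YOLO training code.

```python
def sgd_step(weights, grads, velocity, lr=0.01, momentum=0.937, weight_decay=0.0005):
    """One SGD update with momentum and L2 weight decay on plain Python lists.

    lr=0.01 is assumed to be the initial learning rate of the study.
    """
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w   # gradient of the L2 penalty pulls w toward 0
        v = momentum * v + g       # momentum accumulates past gradients
        new_velocity.append(v)
        new_weights.append(w - lr * v)
    return new_weights, new_velocity


# With a zero data gradient, only the weight decay term shrinks the weight.
w1, v1 = sgd_step([1.0], [0.0], [0.0])

# A second step shows the velocity carrying over from the first.
w2, v2 = sgd_step(w1, [0.0], v1)
```

Even with no gradient from the data, each step shrinks the weight slightly, which is how the 0.0005 penalty discourages large weights.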

Data Augmentation
In this study, YOLO-based models trained on the raw visual and thermal datasets initially performed unsatisfactorily. To overcome this, suitable augmentation techniques were sought to create a more diverse and higher-quality dataset for model training [54]. Numerous augmentation techniques were used to enhance generalizability and prevent model overfitting, all implemented using Roboflow. Horizontal and vertical mirroring, rotating (90°, 180°, and 270°), blurring, and adding noise are examples of conventional data augmentation techniques [55]. The conventional image augmentation techniques are outlined in Table 2.
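The conventional augmentations named above can be sketched on a tiny grayscale image stored as nested lists. This is only an illustration of the geometric and noise operations; the function names are hypothetical, and the study itself applied these transformations through Roboflow rather than through custom code.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate_90_clockwise(img):
    """Rotate by 90 degrees clockwise; apply two or three times for 180 or 270."""
    return [list(col) for col in zip(*img[::-1])]


def add_noise(img, amplitude, rng):
    """Add uniform integer noise in [-amplitude, amplitude], clipped to 0..255."""
    return [
        [min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in row]
        for row in img
    ]


img = [[1, 2, 3],
       [4, 5, 6]]

mirrored = flip_horizontal(img)
upside_down = flip_vertical(img)
rot90 = rotate_90_clockwise(img)
rot180 = rotate_90_clockwise(rot90)
noisy = add_noise(img, 2, random.Random(0))
```

When such transformations are applied to detection data, the bounding boxes must be transformed in the same way as the pixels, which annotation tools handle automatically.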

Mosaic Data Augmentation
Mosaic data augmentation is a recently introduced technique that combines several images into one and thereby greatly enriches the backgrounds against which objects are detected [56]. Such techniques can expand the available datasets and strengthen the resilience of detection models in intricate scenes [36]. In this research, the thermal and visual datasets for broiler and disease detection in different complicated scenes were augmented by combining the conventional and Mosaic techniques to develop a reliable detection model.
As shown in Figure 4, the Mosaic data augmentation steps are as follows. First, a batch of image data was randomly extracted from the broiler pathological phenomena dataset. Then, four images from this batch were arbitrarily chosen, scaled, arranged, and spliced into a new image, and this procedure was repeated as many times as the batch size. Lastly, the YOLOv8-based algorithm was trained using the Mosaic-augmented data, which is appropriate for small object detection [39,57].
The augmented training set was ultimately composed of 1600 images for each combination of various augmentation methods, resulting in a total of 9600 images specifically for pathological phenomena (Figure 5). To achieve this, the Roboflow online program was configured to triple the image data for each augmentation technique.
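The splicing step of Mosaic augmentation can be sketched as follows. This is a deliberately simplified version that assumes four equally sized images tiled into a fixed 2 × 2 grid; the full technique also rescales and crops the images around a random center. The function names and the box format `(class, x0, y0, x1, y1)` in pixels are hypothetical.

```python
import random


def mosaic_2x2(images, boxes_per_image):
    """Tile four equally sized images into a 2 x 2 grid and shift their boxes.

    images: four H x W nested lists; boxes: (cls, x0, y0, x1, y1) in pixels.
    """
    h, w = len(images[0]), len(images[0][0])
    top = [images[0][r] + images[1][r] for r in range(h)]
    bottom = [images[2][r] + images[3][r] for r in range(h)]
    # Pixel offset of each tile: top-left, top-right, bottom-left, bottom-right.
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]
    boxes = []
    for (dx, dy), img_boxes in zip(offsets, boxes_per_image):
        for cls, x0, y0, x1, y1 in img_boxes:
            boxes.append((cls, x0 + dx, y0 + dy, x1 + dx, y1 + dy))
    return top + bottom, boxes


def sample_mosaic(dataset, rng):
    """Pick four (image, boxes) pairs at random and splice them together."""
    picks = rng.sample(dataset, 4)
    return mosaic_2x2([img for img, _ in picks], [b for _, b in picks])


# Four hypothetical 2 x 2 images filled with 0, 1, 2, 3, one box per image.
tiles = [[[k] * 2 for _ in range(2)] for k in range(4)]
labels = [[(k, 0, 0, 2, 2)] for k in range(4)]
canvas, boxes = mosaic_2x2(tiles, labels)

random_canvas, random_boxes = sample_mosaic(list(zip(tiles, labels)), random.Random(0))
```

Because each spliced image shows objects from four source scenes at once, a single training sample exposes the detector to more varied backgrounds and smaller apparent object sizes.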


Architecture of YOLOv8
The YOLOv8 architecture was revised to achieve higher object detection precision in complex scenes than previous versions of YOLO, as shown in Figure 6. The updated architecture includes a backbone consisting of a series of convolutional layers that extract features at different resolutions; these features are then passed through a neck module, where they are consolidated before being fed into the detection head. To evaluate how well models generalize to new domains, the RF100 benchmark, a set of 100 sample datasets chosen from Roboflow Universe, was used: the small version of YOLOv8 was evaluated alongside YOLOv5 and YOLOv7, and YOLOv8 achieved a better overall mAP. There are five versions of YOLOv8, from the smallest, YOLOv8n, with a mAP of 37.3, to the largest, YOLOv8x, with a mAP of 53.9 on COCO [58].

Figure 6. YOLOv8 network architecture includes four generic modules of the input terminal, backbone, head, and prediction.


Improved YOLOv8 with Anchor Free
The YOLO series of models has undergone multiple iterations over time, with each new version building on the previous ones to overcome limitations and improve performance. The YOLOv8 architecture adopts an anchor-free approach, similar to YOLOX, for object detection. This approach eliminates the need for predefined anchors or reference points, resulting in more efficient and adaptable object detection across different scales and aspect ratios. During training, loss functions are employed to optimize the model's parameters by minimizing the discrepancy between predicted values and ground truth annotations. YOLOv8 uses loss functions similar to those of YOLO versions 5 and 7, including box loss and classification loss. However, it does not use objectness loss and instead employs distributional focal loss, which treats the continuous distribution of box locations as a discretized probability distribution. Box locations are thus represented as probability distributions rather than precise coordinates, providing a different perspective on object detection. Anchor boxes were a challenging component of older YOLO models. Anchor-free detection lowers the number of box predictions, which speeds up object detection by easing the post-processing of candidate detections after inference [42].
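The idea of treating a box location as a discretized probability distribution can be illustrated with a small sketch. It assumes one box-edge offset is predicted as logits over a few integer bins, decodes the offset as the expected bin index, and computes the distribution focal loss as a cross-entropy on the two bins that bracket the continuous target. The function names and the use of four bins are hypothetical choices for illustration, not the configuration of the trained models.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(logits):
    """Decode a box-edge offset as the expectation over bins 0..n-1."""
    probs = softmax(logits)
    return sum(i * p for i, p in enumerate(probs))


def distribution_focal_loss(logits, target):
    """Cross-entropy on the two bins bracketing the target (0 <= target < n-1)."""
    probs = softmax(logits)
    left = int(math.floor(target))
    right = left + 1
    w_left, w_right = right - target, target - left
    return -(w_left * math.log(probs[left]) + w_right * math.log(probs[right]))


uniform = [0.0, 0.0, 0.0, 0.0]    # no preference among the four bins
peaked = [-10.0, 5.0, 5.0, -10.0]  # mass concentrated on bins 1 and 2

offset_uniform = expected_offset(uniform)
offset_peaked = expected_offset(peaked)
loss_uniform = distribution_focal_loss(uniform, 1.5)
loss_peaked = distribution_focal_loss(peaked, 1.5)
```

Both distributions decode to the same offset of 1.5, but the peaked one incurs a lower loss, so the loss rewards concentrating probability near the true location.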

Detection Evaluation Metrics: Mean Average Precision, mAP
The diagram in Figure 7 illustrates how mAP is computed. The calculation begins by recording the detections of each class, then calculates precision, recall, and average precision (AP) using 11-point interpolation, and finally computes mAP. mAP is a metric for measuring the performance of object detection and segmentation systems. Intersection over Union (IoU) is a widely employed metric, ranging from 0 to 1, that evaluates the localization accuracy of object detection and segmentation algorithms as the ratio of the overlapping area of two regions to the area of their union, enabling a quantitative assessment of algorithm performance (Figure 8).
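As an illustration of the IoU definition, the following minimal sketch computes the metric for two axis-aligned boxes given as `(x0, y0, x1, y1)` corner coordinates. The function name and the box format are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x0, y0, x1, y1) format."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


partial = iou((0, 0, 2, 2), (1, 1, 3, 3))    # overlap 1, union 7
identical = iou((0, 0, 2, 2), (0, 0, 2, 2))
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))
```

A detection is then counted as a true positive when its IoU with a ground truth box reaches the chosen threshold, for example 0.5 for mAP50.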

Performance Assessors
The standard performance measures used to determine the accuracy of the trained classification model are precision, recall, F1 score, and accuracy [59]. These measures depend on four outcomes of the model's predictions, defined as follows: (1) TP: true positives, where the model predicted a label that matches the ground truth (actual data). (2) TN: true negatives, where the model did not predict a label and the label is not part of the ground truth. (3) FP: false positives, where the model predicted a label that is not part of the ground truth (type I error). (4) FN: false negatives, where the model did not predict a label that is part of the ground truth (type II error).
Once the model predictions are available, the performance metrics follow directly from these counts:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

F1 Score
The F1 score measures the balance between precision and recall, and the optimal confidence threshold is the one at which precision and recall yield the highest F1 score. When the F1 score is high, precision and recall are both high, and vice versa.

Average Precision (AP)
Precision reflects the model's ability to distinguish negative samples: the higher the precision, the stronger that ability. Recall measures how well a model can locate positive samples: the higher the recall, the greater the model's capacity to detect them. The F1 score combines the two, and a model with a higher F1 score is more reliable. The average precision (AP), calculated independently for each category, is the average of the highest precision values over all recall levels. Precision is an intuitive evaluation metric, but on its own it captures only part of the picture; mAP, recall, and F1 score were therefore introduced for a comprehensive evaluation. The following formulas were used to determine the precision and mAP [60]:

$$AP = \frac{1}{11} \sum_{r \in \{0, 0.1, \ldots, 1\}} p_{\mathrm{interp}}(r) \tag{5}$$

The precision at each recall level r is interpolated by taking the maximum precision measured at any recall that reaches or exceeds r:

$$p_{\mathrm{interp}}(r) = \max_{\tilde{r} \ge r} p(\tilde{r})$$

The IoU threshold is set to 0.5 [61]. When an object is detected more than once, the detection with the highest confidence is the positive sample and the others are negative samples. The precision values at the 11 equally spaced recall points that divide the horizontal axis from 0 to 1 into 10 intervals were read from the smoothed precision-recall curve, and their average was taken as the final AP using Equation (5).
The mAP is a metric used to evaluate object detection models such as Fast R-CNN, YOLO, and Mask R-CNN. The AP of each class is calculated over recall values from 0 to 1. The mAP builds on the following sub-metrics: confusion matrix, IoU, recall, and precision. It is obtained by calculating the AP for each class and then averaging over the S classes:

$$mAP = \frac{1}{S} \sum_{i=1}^{S} AP_i$$
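The metrics of this section can be sketched in a few lines. The example below computes precision, recall, and F1 from prediction counts, the 11-point interpolated AP from a precision-recall curve, and the mAP over classes. The function names and the sample values are hypothetical, and the interpolation uses the common "recall at least r" convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


def average_precision_11pt(recalls, precisions):
    """11-point interpolated AP from matching lists of recall and precision."""
    total = 0.0
    for i in range(11):
        r = i / 10
        # Interpolated precision: best precision at any recall >= r.
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        total += max(candidates) if candidates else 0.0
    return total / 11


def mean_average_precision(ap_per_class):
    """Average the per-class AP values over all classes."""
    return sum(ap_per_class) / len(ap_per_class)


p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
ap = average_precision_11pt([0.5, 1.0], [1.0, 0.5])
m_ap = mean_average_precision([ap, 1.0])

# F1 for the precision and recall reported for the thermal YOLOv8-based model.
reported_f1 = f1_score(0.988, 0.956)
```

Applying `f1_score` to the reported precision of 0.988 and recall of 0.956 gives approximately 0.972, in agreement with the F1 score stated for the thermal YOLOv8-based model.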

Train YOLOv8 Model
Once the dataset was annotated and classified into different disease categories using the Roboflow software, it was exported in the YOLOv8 PyTorch format, which consists of YOLO TXT annotation files and a YAML configuration file. All formatted data were downloaded as a zip folder for YOLOv8 training and model development. The YOLOv8 model was trained on a local machine with the following settings: image size, 640; batch size, 8; number of training epochs, 100; and initial weights, yolov8s.pt. The flowchart in Figure 9 illustrates the model development stages.
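For readers unfamiliar with the YOLO TXT annotation format used for training, the following sketch converts one pixel-coordinate bounding box into a YOLO TXT line, which stores the class index followed by the box center, width, and height, each normalized by the image size. The function name and the sample box are hypothetical; in this study the conversion was performed by the Roboflow export rather than by custom code.

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x0, y0, x1, y1) to a 'class cx cy w h' YOLO TXT line."""
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2 / img_w   # normalized box center, x
    cy = (y0 + y1) / 2 / img_h   # normalized box center, y
    w = (x1 - x0) / img_w        # normalized box width
    h = (y1 - y0) / img_h        # normalized box height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Hypothetical box covering the central quarter of a 640 x 640 image, class 2.
line = to_yolo_line(2, (160, 160, 480, 480), 640, 640)
```

Each image has one such text file with one line per annotated object, and the YAML file lists the dataset paths and class names.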



Experimental Analysis of Chicken Detection and Diseases Classifications
The whole dataset consisted of 10,000 thermal and visual images with 50,000 annotated frames, divided into training (80%), testing (10%), and validation (10%) sets. The model was trained on the entire dataset for broiler detection for 100 epochs with a batch size of 8, taking approximately 9.6 h. The graphs in Figures 10 and 11 show how the model's performance improved, with metrics for both the training and validation sets, including the classification (cls_loss), objectness (obj_loss), distribution focal (dfl_loss), and box (box_loss) losses. These losses assess the model's ability to locate broilers accurately, determine their class, and detect pathological phenomena, with the focal loss function addressing class imbalance during training. Focal loss modifies the cross-entropy loss to concentrate learning on hard, misclassified samples by applying a scaling factor that decays to zero as confidence in the correct class rises. Precision, mean average precision, and recall improved rapidly before plateauing after approximately 37 epochs (Figure 10A). The validation objectness and box losses decreased until about 35 to 40 epochs.
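The scaling behavior of focal loss described above can be shown with a short sketch. It compares the plain cross-entropy loss with the focal loss for a well-classified and a poorly classified sample; the focusing parameter `gamma=2.0` is a commonly used value assumed here for illustration, not a setting reported in this study.

```python
import math


def cross_entropy(p_correct):
    """Cross-entropy loss given the predicted probability of the correct class."""
    return -math.log(p_correct)


def focal_loss(p_correct, gamma=2.0):
    """Cross-entropy scaled by (1 - p)^gamma, which vanishes as p approaches 1."""
    return (1.0 - p_correct) ** gamma * cross_entropy(p_correct)


easy_ce, easy_fl = cross_entropy(0.9), focal_loss(0.9)   # confident, correct
hard_ce, hard_fl = cross_entropy(0.1), focal_loss(0.1)   # misclassified

# Fraction of the cross-entropy loss that survives the scaling.
easy_ratio = easy_fl / easy_ce
hard_ratio = hard_fl / hard_ce
```

The easy sample keeps only about 1% of its cross-entropy loss while the hard sample keeps about 81%, so training is dominated by the difficult cases.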
Two forms of mAP are reported in Figures 10 and 11. The first, mAP50, is the mean average precision at an IoU threshold of 0.5. The developed YOLOv7-based model achieves an average value of 0.95 using the raw visual and thermal image datasets (Figure 10A). The second, mAP50-95, is averaged over IoU thresholds from 0.5 to 0.95 with a step of 0.05. The YOLOv7-based model achieved values above 0.7, which indicates that the model performs well for broiler detection under various lighting circumstances. However, when identifying pathological phenomena, the YOLOv7-based model's performance and loss metrics show lower efficacy (Figure 10B), suggesting the need for a qualified and sufficient dataset for reliable and precise detection. To address this, two types of image data augmentation, traditional and Mosaic techniques, were used in this study (Tables 3 and 4), coinciding with the release of the new YOLOv8 version.
… through raw data (without augmentation process) of thermal image datasets, over 100 training epochs.
Figure 11 shows that the classification loss during model training (train/cls_loss) steadily decreased over the epochs, starting at 4.5 and reaching 0.21. In contrast, the validation classification loss (val/cls_loss) started at 3.1 and ended at 0.6, about three times the final train/cls_loss. This significant difference suggests that the augmentation process is primarily responsible for the variation between the two losses.
Figure 11. Plots of training and validation sets of broiler pathological phenomena classification model created through YOLOv8, second model (thermal-based), through thermal images dataset augmented using the Mosaic technique, box, classification, and distributional focal (dfl) losses, and performance metrics of precision, recall, and mean average precision over the training epochs.

Model Comparison and the Influence of Different Dataset Augmentation Methods
The YOLOv8-based model, trained with thermal image dataset augmented via the Mosaic augmentation method (Table 3 and (Table 3), with mAP50 and mAP50-95 reaching 0.988 and 0.857, respectively, indicating exceptional performance in identifying pathological phenomena identification in complex scenes. While the Mosaic method significantly improved the thermal image dataset, its impact on the visual image dataset was different, resulting in substantial improvements with mAP50 and mAP50-90 reaching 0.829 and 0.679, respectively.
Traditional augmentation methods were used individually or in combination to augment both thermal and visual image dataset. Then, the augmented dataset was used to train the YOLOv8-based model. The YOLOv8-based model training results are illustrated in Table 4. YOLO versions 8, 7, and 5 took 0.863, 0.992, and 0.139 h for training of the augmented thermal image dataset via the Mosaic method, respectively. The bounding box augmentation was applied to enhance the detection of quickly moving objects; the fundamental notion of bounding box augmentation is to change the information inside the bounding box by varying, for instance, the brightness and blur of an object relative to its The evaluation metrics precision and recall evaluates the accuracy and completeness of the model in detecting and labeling specific phenomena in broilers. According to Tables 3 and 4, the YOLOv8-based thermal image model for pathological phenomena detection achieved high precision and recall of 0.988 and 0.956, respectively, resulting in an optimal F1 score of 0.972, indicating good balance between precision and recall.  There are two distinct interpretations of mAP, as seen in Figures 10 and 11. The first one, mAP50, represents the mean average precision at an IoU threshold of 0.5. The developed YOLOv7-based model achieves an average value of 0.95 using the raw visual and thermal images datasets ( Figure 10A). The other form, mAP50-90, is calculated at thresholds of IoU from 0.5 to 0.95 with a step of 0.5. The Yolov7-based model achieved values above 0.7, which indicates that the model performs well for the broiler detection under various lighting circumstances. However, when identifying pathological phenomena, the YOLOv7-based model's performance and losses metrics show lower efficacy indices ( Figure 10B), suggesting the need for a qualified and sufficient dataset for reliable and precise detection. 
To address this, two types of image data augmentation, traditional and Mosaic, were applied in this study (Tables 3 and 4), coinciding with the release of YOLOv8. Figure 11 shows that the classification loss during training (train/cls_loss) decreased steadily over the epochs, from 4.5 to 0.21. In contrast, the validation classification loss (val/cls_loss) started at 3.1 and ended at 0.6, roughly three times the final train/cls_loss. This significant difference suggests that the augmentation process is primarily responsible for the gap between the two losses.

Model Comparison and the Influence of Different Dataset Augmentation Methods
The YOLOv8-based model, trained on the thermal image dataset augmented via the Mosaic method (Table 3 and Figure 11), exhibited a gradual reduction in training and validation loss over 100 epochs. The performance metrics improved rapidly during epochs 25-40 and then stabilized close to one. For instance, the mAP50 metric reached 0.99 at epoch 99, with consistent performance from epoch 73 until the end of training (Figure 11).
The Mosaic method proved to be the most effective technique for augmenting the thermal image dataset in YOLO-based training for identifying broiler pathological phenomena. The YOLOv8-based model achieved the highest performance metrics compared with the earlier versions 7 and 5 (Table 3), with mAP50 and mAP50-95 reaching 0.988 and 0.857, respectively, indicating exceptional performance in identifying pathological phenomena in complex scenes. While the Mosaic method significantly improved results on the thermal image dataset, its effect on the visual image dataset was different, although still substantial, with mAP50 and mAP50-95 reaching 0.829 and 0.679, respectively. YOLO versions 8, 7, and 5 took 0.863, 0.992, and 0.139 h, respectively, to train on the thermal image dataset augmented via the Mosaic method.
Traditional augmentation methods were used individually or in combination to augment both the thermal and visual image datasets, and the augmented datasets were then used to train the YOLOv8-based model. The training results are presented in Table 4. Bounding box augmentation was applied to enhance the detection of quickly moving objects; its fundamental idea is to change the information inside the bounding box by varying, for instance, the brightness and blur of an object relative to its background. Bounding-box-level augmentation thus creates additional training data by altering only the bounding boxes of a video frame. Thermal and visual image datasets augmented using the bounding box method were rotated by 90°, adjusted to a brightness setting of 25, or subjected to a box-shear process (Table 4).
With the grayscale augmentation method, the input image is randomly transformed into a single-channel grayscale output image. This leads the model to place less emphasis on color as a signal; however, grayscale may not be appropriate when developing a detection model for objects of only one color.
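The bounding-box-level brightness change described above can be sketched as follows. This is an illustrative example only: it assumes a grayscale image stored as a list of pixel rows, interprets the brightness setting as a percentage, and uses a hypothetical function name; the augmentation in this study was performed with the Roboflow software, whose internal implementation is not reproduced here.

```python
def brighten_box(image, box, percent):
    """Return a copy of a grayscale image with the pixels inside `box`
    scaled by (1 + percent / 100); pixels outside the box are unchanged.

    image: list of rows, each a list of integers in 0..255
    box:   (x1, y1, x2, y2) in pixel coordinates, x2 and y2 exclusive
    """
    x1, y1, x2, y2 = box
    factor = 1.0 + percent / 100.0
    out = [row[:] for row in image]  # copy so the source image is preserved
    for y in range(y1, y2):
        for x in range(x1, x2):
            # Clamp to the valid 8-bit intensity range.
            out[y][x] = min(255, max(0, round(image[y][x] * factor)))
    return out


# A uniform 4 x 4 frame with one annotated broiler in the centre.
frame = [[100] * 4 for _ in range(4)]
augmented = brighten_box(frame, (1, 1, 3, 3), 25)
for row in augmented:
    print(row)
```

Because only the annotated region changes, the object's contrast with its background differs between the original and augmented frames while the bounding box coordinates, and hence the labels, stay the same.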
Grayscale augmentation is different from grayscaling as a preprocessing step: as an augmentation, it is applied randomly to only a portion of the images in the training dataset. Combining grayscale augmentation with 90° rotation and box-shear of the bounding box for the visual image dataset increases the trained model's precision to 0.901 (Table 4). However, augmenting the visual image dataset via the brightness 30 method increases the precision to 0.969, and mAP50 and mAP50-95 to 0.654 and 0.412, respectively. The model trained on the visual image dataset augmented via a combination of flip, rotation, blur, and cutout methods achieves a precision, mAP50, and mAP50-95 of 0.914, 0.573, and 0.349, respectively. The lower recall of the models trained on datasets augmented via Shear + Brightness + Noise and via Cutout indicates that these models identify only a fraction of the pathological phenomena, namely 0.539 and 0.606, respectively. However, their higher precisions of 0.903 and 0.964, respectively, show that these models have a higher capacity to correctly label the pathological phenomena they do identify. The F1 score gives an overall impression of model performance in the classification process. For instance, these two models have F1 scores of 0.675 and 0.744, respectively.
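The F1 scores quoted in this section follow from the reported precision and recall values through the standard harmonic mean, F1 = 2PR / (P + R). The short check below reproduces them; the function name is hypothetical and the inputs are the values reported in the text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs reported in the text.
reported = {
    "YOLOv8 thermal, Mosaic": (0.988, 0.956),
    "Visual, Shear + Brightness + Noise": (0.903, 0.539),
    "Visual, Cutout": (0.964, 0.606),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

The harmonic mean is pulled toward the smaller of the two inputs, which is why the models with high precision but recall near 0.5-0.6 end up with F1 scores well below their precision.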
In general, these traditional augmentation trials indicate that the brightness augmentation method has the greatest impact on the quality of the visual image dataset, enhancing precision, recall, and mAP50 (see Table 4). The other augmentation methods have only a limited effect on some performance measures for the visual image dataset. Comparing the two image types, visual images benefit more than thermal images from the traditional augmentation techniques. This is due to the predominantly white color of broilers inside the poultry house, which makes visual images more responsive to traditional augmentation than thermal images. In contrast, the Mosaic augmentation technique enhances thermal images more than visual images because of the temperature gradient found in thermal images, which fits the working principle of Mosaic augmentation. Mosaic augmentation combines multiple thermal images into a single composite image. This results in a larger field of view and provides additional context for the model to learn from. By including multiple broilers or surroundings in the image, the model can capture a more comprehensive understanding of the thermal patterns and the relationships between different areas, thereby improving its ability to detect and classify abnormalities. This explains why the Mosaic method dramatically enhances the performance metrics of the YOLOv8-based model trained on the augmented thermal image dataset: the temperature gradient it exploits does not exist in visual images (Table 3). Augmenting the visual image datasets with the Mosaic method can nevertheless raise mAP50 and mAP50-95 to suitable levels for both the YOLOv8- and YOLOv7-based models, although their precision remains at 0.861 and 0.802, respectively.
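The compositing idea behind Mosaic augmentation can be sketched in a simplified form. The example below assumes four equally sized grayscale tiles placed in a fixed 2 x 2 grid and shifts each tile's bounding boxes by the tile offset. The Mosaic implementations shipped with YOLO additionally use a random centre point, scaling, and cropping, which are omitted here, and the function name and data layout are hypothetical.

```python
def mosaic_2x2(tiles, boxes_per_tile):
    """Combine four equally sized images into one 2 x 2 composite.

    tiles:          four images (lists of pixel rows), ordered top-left,
                    top-right, bottom-left, bottom-right
    boxes_per_tile: for each tile, a list of (x1, y1, x2, y2, label)
    Returns the composite image and the boxes in composite coordinates.
    """
    if len(tiles) != 4 or len(boxes_per_tile) != 4:
        raise ValueError("exactly four tiles and four box lists are required")
    h, w = len(tiles[0]), len(tiles[0][0])
    if any(len(t) != h or any(len(r) != w for r in t) for t in tiles):
        raise ValueError("all tiles must have the same size")

    offsets = [(0, 0), (w, 0), (0, h), (w, h)]  # (dx, dy) of each quadrant
    canvas = [[0] * (2 * w) for _ in range(2 * h)]
    merged = []
    for tile, boxes, (dx, dy) in zip(tiles, boxes_per_tile, offsets):
        for y in range(h):
            canvas[dy + y][dx:dx + w] = tile[y]  # paste one row of the tile
        for x1, y1, x2, y2, label in boxes:
            # Shift each annotation into the composite coordinate system.
            merged.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy, label))
    return canvas, merged


# Four tiny 2 x 2 "thermal" tiles with constant intensities 1..4.
tiles = [[[v] * 2 for _ in range(2)] for v in (1, 2, 3, 4)]
boxes = [[(0, 0, 2, 2, "lethargic")], [(0, 0, 2, 2, "healthy")], [], []]
composite, labels = mosaic_2x2(tiles, boxes)
for row in composite:
    print(row)
print(labels)
```

Each composite training sample thus shows objects from four source images at once, which is the additional context referred to above.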

Developed Models Capacity
The YOLOv8-based model created in this study for detecting broiler pathological phenomena is more precise and accurate with the infrared thermal camera. Broiler activity can be monitored during light hours through the YOLOv8-based model for the surveillance optical camera. The quality of the captured images is affected by illumination intensity, and previously developed models have faced challenges in detecting objects under various lighting situations. These challenges can be addressed by integrating optical and thermal cameras. The thermal camera is capable of capturing images in diverse lighting and weather conditions, offering a wider field of view to capture a greater variety of objects with enhanced clarity. Thus, the quality of the image acquisition devices, either thermal or optical, affects the algorithm training. Consequently, both thermal and optical images make up the thermal- and visual-based datasets used in this study to develop thermal- and visual-based YOLOv8 models that work concurrently, providing different perspectives on the broilers' states. The main motivation for using both thermal and optical cameras is the requirement to obtain more precise data represented in two types of images. This is especially important for intensive poultry farms where broilers must be continuously monitored. With this approach, the proposed models are more reliable for monitoring broilers around the clock in various local or microclimate conditions. Images of the broilers were taken from roughly all viewpoints and at multiple locations and orientations. The classes recognized in the poultry house of the Faculty of Agriculture at Kafrelsheikh University were sick broilers, i.e., lethargic, slipped tendon, diseased eye, stressed (beaks open), and pendulous crop, together with healthy broilers. The model results show thermal and optical detection of different broiler cases (Figure 12).

Conclusions
The proposed model is appropriate for detecting and classifying the pathological phenomena of broilers in intensive poultry houses that require round-the-clock monitoring during the production season to avoid dangers. The environment inside these poultry houses is not maintained consistently and presents complex scenes with sidelight, backlight, slight and strong occlusions, and daytime and nighttime illumination. Production equipment and conditions, such as heaters, fans, feeders, drinking lines, dust, and others, affect light intensity at different locations inside the house. Poultry production cares for the birds from the first day they enter the poultry house until they reach the proper harvesting size. Three versions of the YOLO-based algorithm were tested to identify the best one. The developed YOLOv8-based model demonstrates enhanced and reliable performance in broiler detection and pathological phenomena classification compared to the other versions of YOLO. The developed model was trained on broiler detection at various ages and sizes. Six classes were recognized: five pathological phenomena, namely stressed (beaks open), diseased eyes, slipped tendons, pendulous crop, and lethargic, together with healthy broilers. The developed model was trained using images captured with infrared thermal and visual cameras. Experiments were performed using raw thermal and visual datasets to train YOLOv7 for broiler detection and pathological phenomena classification. The performance measures of the YOLOv7-based model trained with raw datasets show acceptable levels for broiler detection. In contrast, the model used for pathological phenomena classification shows the lowest performance on the thermal image dataset, with mAP50 and mAP50-95 of 0.478 and 0.278, respectively.
For this reason, data augmentation methods are necessary to enhance the quality of the thermal and visual image datasets. Different augmentation methods were applied individually or in combination to find the most suitable one. The thermal image dataset augmented via the Mosaic method yields the highest performance metrics in training the YOLOv8-based model, with an mAP50 of 0.988, an mAP50-95 of 0.857, an F1 score of 0.972, a precision of 0.988, and a recall of 0.956. Therefore, this model has the most efficient capacity for broiler detection and pathological phenomena classification under the environmental conditions studied.
Overall, the implementation of the YOLOv8-based model in intensive poultry production offers significant benefits, enabling timely and accurate monitoring to avoid potential dangers during the production season. With its ability to handle complex scenes and diverse lighting conditions, this model contributes to improved poultry welfare and efficient disease control. The findings of this study open avenues for further advancements in precision livestock farming and demonstrate the potential of AI-based detection systems in enhancing poultry production management and animal care.

Institutional Review Board Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available in the main manuscript.