1. Introduction
Layer chicken farming is an important part of agriculture and has shown steady growth worldwide [1]. The current multi-tier cage farming system significantly reduces costs. However, as farming scale and stocking density increase, relying solely on human experience to assess the health status of layer chickens becomes prone to omissions and misjudgments. Against the backdrop of Industry 4.0, technologies such as the Internet of Things (IoT), deep learning, and intelligent agricultural equipment are increasingly integrated into farming systems [2]. Digital twin technology also demonstrates significant potential in the intelligent management of large-scale caged layer farming [3].
Beak deformities in layer chickens represent one of the primary phenotypic traits affecting their health. Accidental deformities can result in pain, stress, and aggressive interactions among chickens, whereas deformities resulting from pathological causes may be hereditary and consequently compromise the quality of chick hatching [4]. Relying on human expertise is prone to misjudgments and may also trigger stress responses in chickens [5]. In contrast, machine vision technology enables non-contact, precise measurements and early disease detection, offering rapid and non-invasive advantages [6].
Current deep learning-based object detection models can be categorized into two types: end-to-end single-stage models, exemplified by the single-shot multibox detector (SSD) and You Only Look Once (YOLO), and two-stage models, characterized by a prior bounding-box proposal process and represented by Faster-RCNN. In 2019, Zhuang et al. proposed the IFSSD model for identifying diseased chickens in flocks based on digital image processing and deep learning, achieving an accuracy of 99.7% [7]. In 2023, Guo et al. proposed an improved YOLOv5-based model for monitoring egg-laying behavior in layer chickens, which integrates an attention mechanism with a bidirectional feature pyramid network and achieved a mean average precision (mAP) of 96.7% [8]. To overcome the challenges in recognizing the head features of caged chickens, Chen et al. proposed a convolutional neural network (CNN) model combined with an adaptive luminance adjustment algorithm in the same year, which achieved fine-grained detection of head status in caged chickens [9]. In 2024, Yang et al. proposed a dead chicken detection method for caged chickens based on an improved YOLOv7 model, achieving a counting accuracy of 89.6% [10]. Moreover, research on visual recognition in poultry farming, both domestically and internationally, has predominantly focused on broiler behavior and dead chicken detection, while studies concerning chicken beak analysis remain scarce and application platforms are lacking.
Digital twin technology has been increasingly adopted across industries, including agriculture, owing to its ability to integrate multiple technologies for managing complex systems. In crop production, Liang et al. [11] developed a digital twin-based paddy field monitoring system, and Zhang et al. [12] designed a plant factory digital twin system combining IoT with multi-variable environmental control.
In poultry farming, digital twin technology is primarily applied to status monitoring, visualization, and decision support. Currently, most research on digital twins has focused on large animals, such as pigs and cattle. Jo et al. [13] proposed the concept of smart pig farming by integrating digital twins, the Internet of Things (IoT), and big data to enhance animal welfare. Jeong et al. [14] developed a digital twin model aimed at optimizing heating control in pig housing. Moreover, Neethirajan et al. [15] highlighted the potential of integrating digital twins with machine vision in livestock systems. Guo et al. [16] further established an unmanned pig farming system that combines computer vision, IoT, and intelligent modeling. Han et al. [17] constructed a digital twin model for cattle using farm sensor data to support physiological monitoring and prediction. Despite these advancements, the application of digital twins in small-sized poultry remains significantly underexplored.
Most previous studies have primarily focused on behavior recognition in layer chickens, while the detection of beak abnormalities has received comparatively little attention. Furthermore, existing inspection systems often rely solely on the physical capabilities of inspection robots, lacking integration with intelligent digital platforms, which restricts their potential for comprehensive perception and management in poultry farming. To address these gaps, this study proposes an enhanced beak shape detection model based on YOLOv8, which is specifically optimized for deployment on Jetson Nano edge devices, enabling the real-time detection and counting of beak anomalies. In addition, a digital twin system for caged layer chicken farms has been developed using cloud-edge collaborative technology, facilitating the synchronized mapping of inspection results into a virtual environment to improve intelligent management capabilities. The main contributions of this study can be summarized as follows:
- (1) The GSConv module from the slim-neck design paradigm is introduced into the neck of YOLOv8 to reduce computational complexity and make the model more suitable for edge deployment, while also alleviating, to a certain extent, the problems of inconspicuous features and occlusion.
- (2) An Efficient Multi-Scale Attention (EMA) mechanism is added before the SPPF (Spatial Pyramid Pooling-Fast) module in the backbone to enhance the model's ability to detect the small-target features of layer chickens' beaks.
- (3) A cloud-edge collaborative data transmission framework is built, integrating InfluxDB and the EMQX cloud server, to support the real-time updating and visualization of the digital twin model for large-scale poultry farms.
2. Materials and Methods
2.1. Experimental Base
The experiment was conducted at the farm base of Yongxiang Breeder Co. in Jinhu County, Huai'an City, Jiangsu Province, China (33.01° N, 119.02° E). The internal scene is shown in Figure 1. The farm consists of a one-way straight walkway and rows of multi-tier cages on a relatively flat concrete floor. Each row of cages consists of 3–4 tiers with multiple compartments, each holding 3–4 laying hens. The cage structures are uniform and fixed, and farm staff traditionally follow a serpentine path for feeding and inspection. However, this manual management approach presents several challenges, including prolonged exposure to high ammonia levels, which can cause eye and respiratory irritation, and the need for extensive labor, as staff typically spend 3–4 h per day inspecting the entire farm. Replacing manual inspections with automated inspection robots can significantly reduce labor costs, improve inspection frequency, and minimize health risks for workers. Image data for this study were collected from 9:00 a.m. to 11:00 a.m. from 11 to 17 December 2024, a period when the chickens are most active, typically feeding with their heads extended outside the cages, providing clear visibility of their beaks. To minimize stress and avoid disruptive behaviors, the image acquisition process was conducted under natural lighting conditions, without artificial light intervention.
2.2. Image Acquisition and Processing System
The inspection robot used in this study is built on the R550 (AKM) PLUS mobile chassis (Dongguan Wheel Technology Co., Ltd., Dongguan, China). This chassis is equipped with a depth camera and a suspended Ackermann steering structure, supporting key functions such as laser radar-based mapping, navigation, obstacle avoidance, acoustic source localization, wireless communication with the robotic arm, and real-time image acquisition. It also includes a powerful processing subsystem for on-site data handling. The robot measures 560 mm in width, 770 mm in length, and 1000 mm in height, with a maximum travel speed of 1.0 m/s. The image acquisition subsystem comprises a Hikvision DS-E12a camera paired with an NVIDIA Jetson Nano edge computing device (NVIDIA Co., Santa Clara, CA, USA), enabling high-speed data processing at the farm site.
The farm layout includes 1.2 m wide channels between each row of cages, providing sufficient space for the robot to navigate while capturing images. To optimize image quality for beak detection, the cameras on each tier were positioned at a top-down angle, allowing clearer views of the chickens’ beaks as they extend their heads from the cages to feed. This positioning strategy enhances the visibility of beak morphology, ensuring that the captured images provide comprehensive data for training the deep learning model. To increase the diversity of the training set, images were collected from multiple channels within the farm, capturing a wide range of beak shapes and conditions. The collected images serve as inputs for the subsequent deep learning model, which is designed to detect and count abnormal beak features in layer chickens. The detection results are then transmitted to an EMQX cloud server for real-time visualization and monitoring through a digital twin platform, facilitating intelligent farm management.
Figure 2 illustrates the complete image acquisition and processing system.
2.3. Beak Trait Definition and Dataset Construction
Based on the research findings and the objectives of this study, the beak shapes of caged layer chickens are broadly classified into two main categories: normal and abnormal. A normal beak is characterized by symmetrical alignment between the upper and lower mandibles, with a tight fit and no noticeable displacement when the beak is closed. Given the primary focus of this study on detecting abnormal beak shapes, abnormal beaks are further divided into two subcategories: mild and severe. This classification captures varying degrees of deformation, ranging from slight asymmetry or minor overgrowth (mild) to significant misalignment or severe structural damage (severe). Detailed definitions and representative examples of these three beak shape categories are presented in Table 1.
Each cage in the farm houses 3–4 chickens, and the inspection robot randomly captured approximately 500 chicken images from various rows and tiers within the facility. After eliminating invalid images, such as those affected by motion blur, target loss, and severe occlusion, a total of 1248 high-resolution (1920 × 1080 pixels) raw images were obtained, including approximately 480 images of normal chickens. To improve the overall quality of the dataset and provide stable data support for the digital twin system, a series of preprocessing steps was applied to the images. Under natural lighting conditions, the phenotypic features of layer chickens in lower-tier cages appeared unclear due to low contrast and shadow-induced detail loss, which might introduce outliers during training and affect model performance. Therefore, low-light images were enhanced and brightened. First, a bilateral filter with a window size of 5 was applied for denoising to effectively reduce noise in the dark regions. Then, adaptive histogram equalization was performed to enhance image details and textures. Finally, the images were converted to the HSV color space, and gamma correction was applied to the brightness channel with a coefficient of 0.6 to further improve luminance. As shown in Figure 3, the comparison between the original and processed images highlights the significant improvement in image clarity. In particular, Figure 3a,b demonstrate that the dark details have been effectively brightened, making beak features more prominent. Additionally, the grayscale histograms in Figure 3c,d reveal a more balanced and uniformly distributed pixel intensity in the processed images, further confirming the effectiveness of the preprocessing steps.
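The enhancement chain described above can be summarized in a short OpenCV sketch. The bilateral-filter window size (5) and gamma coefficient (0.6) follow the text; the CLAHE settings and the choice of the LAB luminance channel for histogram equalization are illustrative assumptions rather than the authors' exact implementation.

```python
import cv2
import numpy as np

def enhance_low_light(img_bgr, gamma=0.6):
    """Sketch of the low-light preprocessing chain (assumed parameterization)."""
    # 1. Bilateral filtering with a 5-pixel window to suppress noise in dark regions
    denoised = cv2.bilateralFilter(img_bgr, d=5, sigmaColor=75, sigmaSpace=75)

    # 2. Adaptive histogram equalization (CLAHE) on the luminance channel
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

    # 3. Gamma correction (gamma = 0.6) on the V channel in HSV space to brighten
    hsv = cv2.cvtColor(equalized, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    lut = np.array([np.clip((i / 255.0) ** gamma * 255.0, 0, 255)
                    for i in range(256)], dtype=np.uint8)
    v = cv2.LUT(v, lut)
    return cv2.cvtColor(cv2.merge((h, s, v)), cv2.COLOR_HSV2BGR)
```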
During the actual sample collection process, it was observed that the number of layer chickens with normal beaks significantly outnumbered those with abnormal beaks, resulting in a pronounced class imbalance in the multi-class object detection task. This imbalance can lead to overfitting on the majority (normal) class and underfitting on the minority (abnormal) classes, reducing the overall effectiveness of the model. To address this issue, data augmentation techniques were applied primarily to images of chickens with abnormal beaks. These techniques included translation, scaling, rotation, mirroring, blurring, and noise addition, effectively increasing the proportion of abnormal samples and enhancing the model's robustness in real-world scenarios. This approach helps reduce the risk of local overfitting and improves the model's ability to generalize across diverse conditions. As a result, the final dataset was expanded to 1853 images, providing a more balanced representation of normal and abnormal beak shapes. Examples of the augmented images are shown in Figure 4.
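As an illustration, the offline augmentation applied to abnormal-beak images can be sketched as follows. The transform types match the text, while the probabilities and magnitude ranges are hypothetical, and the corresponding YOLO bounding-box labels would need to be transformed consistently with the image.

```python
import cv2
import numpy as np
import random

def augment(img):
    """Offline augmentation for abnormal-beak images (illustrative parameter ranges)."""
    h, w = img.shape[:2]
    # Random translation, scaling, and rotation via a single affine warp
    tx, ty = random.uniform(-0.1, 0.1) * w, random.uniform(-0.1, 0.1) * h
    scale = random.uniform(0.9, 1.1)
    angle = random.uniform(-15, 15)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    m[:, 2] += (tx, ty)
    out = cv2.warpAffine(img, m, (w, h))
    if random.random() < 0.5:                      # horizontal mirroring
        out = cv2.flip(out, 1)
    if random.random() < 0.3:                      # mild Gaussian blur
        out = cv2.GaussianBlur(out, (5, 5), 0)
    if random.random() < 0.3:                      # salt-and-pepper noise
        mask = np.random.rand(h, w)
        out[mask < 0.01] = 0
        out[mask > 0.99] = 255
    return out
```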
The annotation tool LabelImg was used to label the three beak shape categories in each image using the bounding box method, as illustrated in Figure 5. This process generated two types of annotation formats: YOLO (.txt) and VOC (.xml). In the .txt format, each line represents a single object within the image, containing the class label, normalized center coordinates, and the normalized width and height of the bounding box. This format is optimized for the YOLO detection framework. The .xml files, on the other hand, are structured according to the PASCAL Visual Object Classes (VOC) format, making them suitable for input into a broader range of experimental algorithms and model architectures.
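For illustration, a YOLO-format .txt label file for an image containing two targets might contain the two lines below, where each line is "class x_center y_center width height" with coordinates normalized to the image size; the class indices and values are hypothetical.

```
0 0.512 0.334 0.061 0.048
2 0.278 0.401 0.072 0.055
```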
The dataset consists of 1853 images of layer chickens, with each image annotated with 1 to 5 targets. According to the statistics, the dataset includes 2539 normal beak targets, 214 mild abnormal beak targets, and 1023 severe abnormal beak targets. To prepare for model training and evaluation, the dataset is randomly divided into training, validation, and test sets with a ratio of 7:2:1.
2.4. Detection Model of Beak Trait in Caged Layer Chickens
Since the detection model must be integrated into the digital twin, both detection speed and accuracy were taken into account when selecting the base model. Although models represented by the RCNN series can achieve high detection accuracy, their structure is complex and their detection speed cannot meet the demand. Single-stage models represented by the YOLO series, by contrast, have achieved large improvements in inference speed and accuracy over successive iterations. Therefore, after comprehensively comparing the performance of multiple models on public datasets [18], the YOLO series was chosen as the base network for improvement. In particular, compared with previous versions, YOLOv8 introduces improvements of varying degrees in the backbone, neck, and head [19]. To lighten the model structure and increase detection speed while ensuring detection accuracy, YOLOv8 adopts fewer feature parameters and dynamic data augmentation techniques, making the model more suitable for deployment on edge computing devices for chicken beak anomaly detection [20]. In addition, YOLOv8 incorporates the distribution focal loss (DFL) in its loss function to deal with the category imbalance problem. It adjusts the factors in the focal loss according to the sample distribution of each category so as to focus on rare categories more effectively, improving the model's learning ability and generalization performance and providing favorable conditions for abnormal trait detection in caged chickens.
In this study, the Efficient Multi-Scale Attention (EMA) mechanism is incorporated before the SPPF module of the backbone network. This method strategically reshapes partial channels into the batch dimension to preserve channel-wise information, thereby enhancing the model's capacity for capturing the fine-grained morphological features of layer chicken beaks. Such architectural optimization effectively mitigates challenges posed by low feature saliency and inter-hen occlusions in intensive farming environments [21]. The structure of the EMA mechanism is shown in Figure 6. The integration of the EMA mechanism into YOLOv8 significantly enhances its detection capability for small targets, such as beak deformities in layer chickens [22], while concurrently reducing computational overhead. This optimization renders the model more suitable for deployment on resource-constrained embedded devices.
To reduce the amount of model computation, the GSConv module from the slim-neck design paradigm [23] was used in this study to replace the ordinary convolutional blocks in the neck of the baseline model. In traditional convolutional neural networks, spatial information is progressively transformed into channel-wise representations through successive operations, and this conversion inevitably causes partial semantic information loss during each stage of spatial compression and channel expansion. The structure of the GSConv module is shown in Figure 7. It combines the high accuracy of standard convolution with the computational efficiency of depthwise separable convolution, balancing accuracy and computational cost when detecting beak abnormalities in layer chickens and facilitating edge deployment while preserving fine-grained features [24]. In the neck of YOLOv8, the number of channels has already reached its maximum and no further spatial compression is required, so placing this module here minimizes information loss while reducing the amount of model computation. The finalized improved YOLOv8 network structure is shown in Figure 8.
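A minimal PyTorch sketch of the GSConv idea follows: a standard convolution produces half of the output channels, a cheap depthwise convolution produces the other half, and a channel shuffle mixes the two groups. The exact kernel sizes, normalization, and activation choices are assumptions based on the slim-neck design [23], not the authors' implementation.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Sketch of a GSConv block (assumed layer details)."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.dense = nn.Sequential(                     # standard ("dense") convolution branch
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.depthwise = nn.Sequential(                 # lightweight depthwise convolution branch
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        x1 = self.dense(x)
        x2 = self.depthwise(x1)
        y = torch.cat((x1, x2), dim=1)                  # concatenate the two branches
        # Channel shuffle: interleave channels from the dense and depthwise branches
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```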
2.4.1. Experimental Development Environment
The experimental model was developed and trained on cloud computing infrastructure provided by the AutoDL computing platform, utilizing an Ubuntu 20.04 LTS operating system environment. The implementation employed Python 3.8 as the programming language with the PyTorch 1.11.0 framework. Hardware acceleration was achieved through an NVIDIA RTX 4090 GPU featuring 24 GB of VRAM, coupled with an Intel(R) Xeon(R) Platinum 8352V processor (16 virtual CPU cores @ 2.10 GHz base frequency) and 120 GB of system memory. Computational workflows were accelerated using the CUDA 11.3 parallel computing architecture with cuDNN 8.2.1 deep neural network library optimizations.
2.4.2. Model Evaluation Metrics
To holistically evaluate model performance across multiple dimensions, this study employs four principal metrics: precision (P), recall (R), mean average precision (mAP), and F1-score. Each metric is defined by Equations (1)–(5):

$$P = \frac{TP}{TP + FP} \tag{1}$$

$$R = \frac{TP}{TP + FN} \tag{2}$$

$$AP = \int_{0}^{1} P(R)\,dR \tag{3}$$

$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i \tag{4}$$

$$F1 = \frac{2 \times P \times R}{P + R} \tag{5}$$

where TPs (true positives) denote instances correctly identified as positive-class samples, whereas FPs (false positives) represent negative-class samples erroneously classified as positive, and FNs (false negatives) correspond to positive-class samples misclassified as negative. AP (average precision) is derived as the area under the PR (precision–recall) curve, encapsulating the precision–recall trade-off across varying confidence thresholds. With the total number of object categories denoted as N (specifically N = 3 in this experimental configuration), the class-specific AP value for the i-th category is designated as AP_i. The mAP metric serves as a comprehensive indicator of multi-class recognition capability, computed as the arithmetic mean of AP values across all N categories, where elevated mAP scores reflect enhanced model robustness. The F1-score provides a balanced assessment of classification accuracy and completeness, with higher values indicating an optimal equilibrium between these competing objectives.
2.5. Design of Digital Twin System
The rise of digital twin technology provides a new intelligent management solution for large-scale farms, satisfying managers' demand for concise and intuitive remote monitoring. Unreal Engine 5 (UE5), originally a powerful commercial game engine with strong interactive capabilities and cross-platform support, has gradually become a powerful tool for researchers building digital twins [25,26].
2.5.1. Data Acquisition and Transmission
Monitoring beak traits is not the only focus in caged layer chicken farms; environmental parameters also play a critical role in the health and well-being of the chickens. Elevated levels of temperature, humidity, and CO2 can cause heat stress, while harmful gases like H2S and NH3 pose direct threats to animal welfare. To address these concerns, this study synchronized and presented six key environmental parameters—temperature, humidity, CO2, H2S, NH3, and light intensity—through a digital twin system using an inspection robot. The video streams from the inspection robot are transmitted to the digital twin via the Real-Time Streaming Protocol (RTSP), enabling the remote monitoring of both environmental and beak-related conditions.
Due to variations in data sources and formats, this system establishes a cloud-edge collaborative data exchange mechanism by utilizing the JSON format to encapsulate various types of data. JSON is widely used in IoT communication for data formatting, as it allows for easy cross-platform parsing and processing. In this study, different data sources are standardized into records, with each record containing a message name, timestamp, and specific data items, as illustrated in Table 2.
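Following the structure in Table 2, a single environmental-parameter record might be encapsulated as sketched below; the message name and numeric values are hypothetical examples.

```python
import json, time

# Hypothetical environmental record following the structure in Table 2
record = {
    "name": "environment",
    "timestamp": int(time.time()),
    "data": {"temperature": 18.6, "humidity": 62.4, "h2s": 0.8,
             "co2": 1450, "nh3": 12.3, "light": 310},
}
payload = json.dumps(record)   # serialized JSON string to be published over MQTT
```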
While message queuing telemetry transport (MQTT) serves as the underlying lightweight communication protocol for this system, Erlang MQTT Broker X (EMQX) is deployed as the core broker to manage data transmission and routing. Built on Erlang/OTP, EMQX supports the publish/subscribe model and Quality of Service level 2 (QoS2) and offers a graphical interface, scalability, and multiple security mechanisms. Two message topics are defined based on different data sources, as shown in Table 3. In terms of transmission security, EMQX provides a variety of client authentication methods to ensure that clients connecting to the server are authorized. This study uses password authentication: the username and password required to establish a connection are set in the management panel, and all clients must include the corresponding fields when requesting a connection. In addition, EMQX supports SSL/TLS-encrypted connections, which ensure the security and integrity of data communication at the transport layer. In this study, the OpenSSL tool was used to generate digital certificates, and the corresponding configuration items were set in the deployment panel; when a client establishes a connection, it carries the authentication information in its request and communicates over the encrypted port through a secure channel. For time-series data, EMQX forwards messages via its rule engine to InfluxDB, a time-series database built on the time-structured merge (TSM) engine, which supports large-scale data with low latency and high throughput [27].
To enable the data-driven operation of the digital twin system, this study constructs a data acquisition and transmission framework, as illustrated in Figure 9. The framework encompasses three core processes: multi-source data collection, standardized data encapsulation, and cloud-edge collaborative communication. An inspection robot serves as the primary data acquisition platform, as it is equipped with a Jetson Nano module that offers high responsiveness and efficient bandwidth utilization. This configuration supports the real-time acquisition of environmental and video data while minimizing latency and reducing cloud-side computational load [28,29].
2.5.2. Twin Scene Rendering
To visualize and interact with the acquired multi-source data, this study uses the Blender (Blender 3.5, Blender Foundation, Amsterdam, NH, The Netherlands) and Cero (Cero 11, Maxon Computer GmbH, Friedrichsdorf, HE, Germany) modeling software to construct three-dimensional (3D) static models of each element of the chicken farm. Two rows of cages and an aisle are selected to construct an equal-scale digital twin scene in UE5 with lighting, physics simulation, gravity, and mass, as shown in Figure 10.
As described in Section 2.5.1, the system uses EMQX to manage data transmission. Building on this, UE5 subscribes to relevant topics and parses messages from various data sources, which serve as inputs to the digital twin model, enabling real-time updates. As shown in Figure 11, this study uses the MQTT Monster plugin in UE5. Through blueprint logic, the system completes client setup, topic subscription, and message listening. Since all data are formatted in JSON, callback functions are triggered when messages arrive. The data are deserialized and passed to the corresponding model components through the event distribution system to update the twin in real time.
The core of a digital twin lies in creating a platform that can simulate various behaviors and events in the real world. Interactivity serves as a key bridge that enables user interaction with the twin model. From the perspective of poultry farm managers, the interactive design in UE5 should prioritize ease of use and operational efficiency. Therefore, model interaction is designed to rely on conventional keyboard and mouse inputs. Interaction design requirements and user interface elements are summarized in Figure 12.
2.5.3. Model Twin Implementation
While real-time rendering enables intuitive interaction, intelligent perception in the digital twin system further relies on embedded machine vision models. To achieve real-time integration from edge-based beak shape detection to twin visualization, this study designs and implements a complete deployment and mapping process, covering model conversion, device integration, data transmission, and platform visualization. The overall structure is shown in Figure 13.
Based on the improved YOLOv8 model described above, this study deploys the beak shape detection model on the Jetson Nano platform to enable the real-time detection of caged layer chickens and digital twin mapping. Considering the limited computing resources of the Jetson Nano, the model is converted from the .pt format to the .engine format using the official YOLOv8 export tool and accelerated with TensorRT. TensorRT, developed by NVIDIA, is a high-performance deep learning inference library compatible with mainstream frameworks that significantly improves inference efficiency on edge devices. In addition, YOLOv8 officially provides the BoT-SORT tracker, which allows target tracking to be invoked during inference by loading the detection model, greatly facilitating the anomaly detection and counting of caged chicken beaks in this study.
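Under the Ultralytics YOLOv8 tooling, the export and tracking steps described above can be sketched as follows; the weight file names, confidence threshold, and class-index mapping for abnormal beaks are assumptions for illustration.

```python
from ultralytics import YOLO

# One-time conversion on the Jetson Nano: PyTorch weights -> TensorRT engine
model = YOLO("improved_yolov8_beak.pt")           # hypothetical path to the trained weights
model.export(format="engine", half=True)          # writes improved_yolov8_beak.engine

# Inference with the accelerated engine and the built-in BoT-SORT tracker
engine = YOLO("improved_yolov8_beak.engine")
for result in engine.track(source=0, tracker="botsort.yaml", stream=True, conf=0.25):
    classes = result.boxes.cls.tolist() if result.boxes is not None else []
    abnormal = sum(1 for c in classes if int(c) in (1, 2))   # assumed indices for mild/severe
    print(f"abnormal beaks in frame: {abnormal}")
```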
After deployment, a USB camera is connected to the system as the input device, and image acquisition is performed by the inspection robot. Once inference is completed, the detection results are formatted as JSON and transmitted to the EMQX cloud server via the MQTT protocol. The UE5 client of the digital twin platform subscribes to the "/robot/detect" topic, receives and parses the inspection data in real time, and displays the number of chickens with abnormal beaks. In practical applications of the system, detection reliability is improved by conducting multiple inspections on the same day, effectively reducing the rate of missed beak targets.
3. Results
3.1. Comparison of Model Effect Before and After Improvement
The network architecture of the YOLOv8 model was optimized via the aforementioned enhancement strategy, and the comparative loss curves during training are illustrated in Figure 14. Both the baseline and modified architectures achieve stable convergence, yet the improved YOLOv8 model demonstrates accelerated feature learning with faster convergence and smoother optimization trajectories. Notably, the reduced discrepancy between training and validation losses indicates superior generalization capability when processing unseen data.
The detection performance of both the YOLOv8 and improved YOLOv8 architectures was rigorously evaluated on the same test set, with representative detection results presented in Figure 15. As shown in the figure, the improved YOLOv8 model exhibits superior prediction confidence over the baseline YOLOv8 architecture when detecting wire-occluded beak targets, demonstrating enhanced reliability in challenging occlusion scenarios. Furthermore, critical performance differentiation is observed in identifying morphologically subtle "Mild"-class deformities: while the baseline YOLOv8 model erroneously classified mildly deformed beaks as normal variants, the improved YOLOv8 model achieves precise recognition through advanced feature discrimination. This capability stems from the network's strengthened feature extraction capacity, which effectively suppresses interference from non-critical morphological variations. However, some challenges remain, including missed detections when the beak targets are distant or under low-light conditions. It is worth noting, though, that these distant targets are typically located on the adjacent side of the coop and can still be clearly detected as the inspection robot moves closer.
3.2. Ablation Experiments
To evaluate the efficacy of individual architectural refinements, this study conducts a comprehensive ablation analysis under identical experimental configurations. Performance comparisons were conducted by sequentially modifying individual modules against the aforementioned evaluation metrics, with model speed quantified through training iteration speed (it/s) and inference speed (img/s). Additionally, this study incorporated comparative experiments by replacing the EMA module with mainstream attention mechanisms, namely the squeeze-and-excitation (SE) module and the convolutional block attention module (CBAM). The evaluation results are documented in Table 4 and Figure 16.
As shown in the experimental results in the preceding figure and table, replacing the standard convolution blocks with the GSConv module in the baseline YOLOv8 model leads to notable improvements in detection performance, particularly for mild-class beak deformities. Specifically, the F1-score and precision for mild-class targets increase by 12.5% and 5.8%, respectively, compared to the baseline model. This substantial performance gain confirms that integrating the GSConv module within the neck network layer effectively preserves critical information during hierarchical feature propagation and cross-scale fusion, enhancing the model’s ability to capture fine-grained features. Additionally, the GSConv-based model demonstrates superior computational efficiency, achieving a 6.4% reduction in parameter count compared to the original architecture, further highlighting its advantages in lightweight edge computing scenarios.
Models incorporating the three attention modules show improved accuracy (mAP, F1-score, precision, and recall), as these mechanisms filter out non-critical features based on importance. The EMA-enhanced model achieves the highest mAP values for normal-class and severe-class targets, while the SE-integrated architecture demonstrates minimal overall performance gains: the SE module, which exclusively focuses on channel-wise information, achieves the poorest accuracy despite its lower computational overhead. The CBAM, which simultaneously processes spatial and channel information, achieves superior performance but causes significant parameter expansion and reduced detection speed. Therefore, the integrated GSConv-EMA architecture is optimal for this digital twin system's implementation, achieving a balanced optimization between holistic detection performance and real-time operational efficiency.
3.3. Comparison of Different Models’ Performance
To further validate the effectiveness of the improved detection model, this study evaluates the recognition performance of the improved YOLOv8 against four representative object detection algorithms: YOLOv5n, YOLOv7-tiny, YOLOv8n, and Faster-RCNN. For a fair comparison, all models were run in the same environment and tested on the same dataset. The results, presented in Table 5, show that the improved YOLOv8 model outperforms its counterparts. This is attributed to its targeted optimization for beak recognition, which enhances the model's ability to represent beak shape features effectively.
As shown in the table, the improved YOLOv8 and Faster-RCNN models achieve the highest detection accuracy in terms of precision, recall, and F1-score. However, as a two-stage detection framework, Faster-RCNN suffers from a significantly larger model size (44.8 MB) and the lowest inference speed (7 img/s), making it unsuitable for real-time applications.
Compared with YOLOv7-tiny, the improved YOLOv8 model demonstrates superior recognition performance. Although YOLOv7-tiny achieves the fastest inference speed (25 img/s), its detection accuracy is limited due to its lightweight design, as reflected in a relatively low F1-score of 0.72 and an mAP of 78.8%.
It is observed that the improved YOLOv8 model also outperforms the baseline model YOLOv8n and the classic YOLOv5n model, with mAP improvements of 2.4% and 3.2%, respectively. Although its inference speed (19 img/s) is slightly lower than that of the other YOLO variants, the improvement in detection accuracy makes it a more balanced and practical solution.
In summary, the improved YOLOv8 model achieves accurate detection of abnormal beak conditions while maintaining acceptable inference speed, meeting the requirements of real-time feedback in digital twin systems for poultry farming.
3.4. Field Experiment of the Improved YOLOv8 Model
To test the actual operation of the digital twin system designed and developed in this study and to verify the practical performance of the layer chicken beak detection model, experiments were conducted at the aforementioned chicken farm, as shown in Figure 17a. The experimental network environment is covered by a wireless LAN with a bandwidth of 150 Mbps, and the network is relatively stable. The digital twin scenario runs on a local Windows 10 operating system; the hardware environment is an RTX 3060 graphics card with 8 GB of video memory, an 11th Gen Intel(R) Core(TM) i5-11400 processor with a base frequency of 2.60 GHz, and 16 GB of RAM. After connecting the inspection robot's edge computing device and the local machine to this LAN for the experiment, the twin scenario running interface is shown in Figure 17b.
We first tested the connectivity of the topic interfaces by connecting the sending and receiving clients to the EMQX proxy server and sending ten test cases for each topic to verify that messages could be successfully received. The test results are shown in Table 6, where all topics were able to communicate properly.
The digital twin maturity model requires the system to have high real-time performance and stability, so this section further tests the latency and stability of the interfaces. The test methodology is to measure message latency and loss rate while each topic transmits continuously for about 30 min in a real scenario. Data latency is checked by calculating the difference between the timestamp attributes at the sender and receiver of a message, and data loss is checked by setting counters at the sender and receiver. In addition, message inflow and outflow statistics of the proxy server can be viewed in the administration panel of EMQX. Before the test started, different sending frequencies were set for messages of different topics to simulate message transmission during actual operation. The test results are shown in Table 7. In terms of message transmission delay, the topics in this study achieve high real-time performance, close to the millisecond-level wireless transmission standard of the industrial IoT; in terms of interface stability, all data were successfully received after 30 min of continuous communication.
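The latency and loss checks described here can be reproduced with a small receiver-side probe; the broker address is a placeholder, the sender is assumed to embed a Unix timestamp in seconds in each message, and the sender and receiver clocks are assumed to be synchronized.

```python
import json, time
import paho.mqtt.client as mqtt

received, delays_ms = 0, []

def on_message(client, userdata, msg):
    global received
    received += 1                                                   # loss check: compare with the sender's counter
    record = json.loads(msg.payload)
    delays_ms.append((time.time() - record["timestamp"]) * 1000)   # one-way delay in milliseconds

probe = mqtt.Client("latency-probe")                                # hypothetical test client
probe.on_message = on_message
probe.connect("emqx.example.com", 1883)
probe.subscribe("/robot/detect", qos=2)
probe.loop_start()
time.sleep(30 * 60)                                                 # observe for roughly 30 minutes
probe.loop_stop()
print(f"received={received}, mean delay={sum(delays_ms) / max(len(delays_ms), 1):.1f} ms")
```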
Next, the detection performance of the improved YOLOv8 model deployed on the Jetson Nano was evaluated. Prior to detection, the elevation bar height and camera angle were adjusted, and the distance between the robot and the target cage was set to 30 cm to ensure that clear images were captured. At the beginning of the inspection, the corresponding program is launched, and the topic "/robot/detect" is published. The real-time detection results are shown in Figure 18. As illustrated on the left, the beak shapes of each layer chicken in the current frame are accurately identified, with the number of detections and the processing time displayed in the lower command-line window. The terminal on the right displays the published number of detected abnormal beaks. Meanwhile, statistical results are successfully visualized in the virtual environment. During the experiment, the robot carried out detection tasks across all three levels of the cages: upper, middle, and lower.
4. Discussion
This study focuses on the recognition of beak shape in caged layer chickens, classifying the beak conditions into three categories: normal, mildly abnormal, and severely abnormal. To enhance the ability to distinguish between these categories, a task-oriented detection model, YOLOv8+GSConv+EMA, is proposed. A multi-layer digital twin platform for caged layer farms is also developed, with the YOLOv8+GSConv+EMA model deployed on the edge computing device Jetson Nano to achieve real-time object detection and counting, enabling a dynamic twin mapping of the farm environment.
Owing to its advantages in parameter efficiency, detection accuracy, and operational stability, YOLOv8 is selected as the base architecture for further structural enhancements in this study. Specifically, the GSConv module is used to replace the standard convolution block in the neck to balance detection precision and computational efficiency. Furthermore, an EMA mechanism is incorporated before the SPPF module in the backbone to strengthen the network's focus on critical feature regions. Compared with the baseline model, the improved algorithm achieves a 12.5% increase in F1-score for mildly abnormal cases. In evaluations against other mainstream object detection models, YOLOv8+GSConv+EMA exhibits superior overall performance while maintaining a compact model size (6.3 MB) and a real-time processing speed of 19 images per second.
To meet the requirements for real-time responsiveness and data management in digital twin applications, an MQTT-based EMQX cloud server is constructed to manage data flow and message distribution across the platform. Multi-source heterogeneous data are uniformly packaged in JSON format, enabling low-latency communication through EMQX. Meanwhile, InfluxDB is used as a time-series database to optimize the storage of large-scale monitoring data. Together, these components constitute a complete data-driven framework that enables the mapping from physical perception to the digital twin model. In terms of digital twin implementation, Cero and Blender are used for 3D geometry modeling, while UE5 serves as the rendering engine. Finally, with YOLOv8+GSConv+EMA deployed on Jetson Nano, real-time detection and statistical counting of abnormal beaks are achieved, offering visual support and technical infrastructure for intelligent farm management.
Although the proposed YOLOv8+GSConv+EMA-based method for recognizing beak shape anomalies in caged layer chickens has achieved satisfactory results in experiments, there are still some limitations.
Firstly, the YOLOv8+GSConv+EMA model may miss detections when small-scale beak targets are fully occluded by dense metal cage wires, especially in the lower levels of multi-tier cages, where target overlap and uneven lighting further increase detection difficulty. Future work will explore advanced occlusion-handling techniques and lighting compensation strategies to improve robustness under complex conditions. Secondly, the current study relies on the tracking and counting function integrated into the YOLOv8 framework, which must run in the Ultralytics environment; this increases the memory consumption of the edge device and is not conducive to multitasking and parallel processing. Future research could address this issue by improving hardware performance or employing specialized, lightweight target-tracking algorithms to enhance the scalability of the digital twin framework. Thirdly, the digital twin system may encounter several challenges in practical commercial applications. Specifically, its real-time performance and system stability could be affected in larger-scale deployments, and development costs may also rise accordingly. In future work, we plan to enhance the system design by optimizing hardware selection and refining the underlying algorithms to address these issues.
5. Conclusions
In this study, a specialized beak shape recognition and detection model for caged layer chickens is proposed based on the improved YOLOv8 model. Specifically, in the neck, the standard convolutional block is replaced with the GSConv module to improve feature fusion efficiency. In addition, an EMA mechanism is integrated into the backbone to improve fine-grained feature extraction. Meanwhile, this study utilizes IoT and cloud-edge collaboration technology to establish a digital twin platform that brings the chicken beak detection model into practical use. The innovation of this study is that a new caged chicken detection model is proposed, which effectively improves the extraction of inconspicuous chicken beak features and alleviates the problem of small-target occlusion to a certain extent. In addition, this study combines deep learning and digital twin technology to propose a new remote monitoring solution for caged chicken farms, with the purpose of optimizing the manual inspection method. The final YOLOv8+GSConv+EMA model achieves 91% precision, 91% recall, and a 93% F1-score, outperforming other object detection models. Notably, it achieves an mAP of 92.7%, a 2.4% improvement over the baseline model. These results indicate that YOLOv8+GSConv+EMA can effectively recognize the beak shape of caged laying chickens and is expected to find practical application in poultry farming. In the future, we will try to further improve the feature extraction and generalization ability of the model and apply it to a wider range of livestock and poultry breeding fields. For larger farms, we will try to upgrade the robot so that it can capture images of multiple cage tiers simultaneously, and adopt higher-performance edge computing devices to support multi-task parallel processing.
Author Contributions
Conceptualization, H.L. and X.Z. (Xiuguo Zou); methodology, H.L. and X.Z. (Xiuguo Zou); validation, H.L., H.C., J.L. and X.Z. (Xiuguo Zou); formal analysis, H.L., H.C., J.L. and Y.L.; investigation, H.L., H.C., J.L. and Y.L.; data curation, H.L., H.C., J.L., Q.Z., T.L. and X.Z. (Xinyu Zhang); writing—original draft preparation, H.L., H.C. and Y.L.; writing—review and editing, Y.L., Y.Q. and X.Z. (Xiuguo Zou); visualization, H.L., H.C. and J.L.; supervision, Y.Q. and X.Z. (Xiuguo Zou); project administration, X.Z. (Xiuguo Zou); funding acquisition, X.Z. (Xiuguo Zou). All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Key Research and Development Program of China (Grant No. 2024YFD2000302) and the International Science and Technology Cooperation Program of Jiangsu Province (Grant No. BZ2023013).
Institutional Review Board Statement
Not applicable.
Data Availability Statement
Data are contained within the article.
Acknowledgments
We are thankful to Sunyuan Wang, Zhenyu Liang, and Chengrui Xin, who have contributed to our field data collection and primary data analysis.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Zhang, L.; Liu, H. World layer hens industry development history and trend forecast. Chin. J. Anim. Sci. 2012, 3, 12–16. [Google Scholar]
- Yang, D.; Cui, D.; Ying, Y. Development and trends of chicken farming robots in chicken farming tasks: A review. Comput. Electron. Agric. 2024, 221, 108916. [Google Scholar] [CrossRef]
- Wang, L. Digital Twins in Agriculture: A Review of Recent Progress and Open Issues. Electronics 2024, 13, 2209. [Google Scholar] [CrossRef]
- Zhu, J.; Chen, J.; Liu, R.; Zheng, M.; Hu, J. Analysis of Major Chemical Components in Chicken Beak Tissue. China Poult. 2012, 34, 59–60. [Google Scholar]
- Wu, D.; Cui, D.; Zhou, M.; Ying, Y. Information perception in modern poultry farming: A review. Comput. Electron. Agric. 2022, 199, 107131. [Google Scholar] [CrossRef]
- Zhao, N.; Li, X.; Jiang, Y.; Wang, Z.; Bi, Y.; Chen, G.; Chang, G. Application of Image Recognition Technology in the Field of Chicken Breeding. J. Agric. Sci. Technol. 2023, 25, 13–22. [Google Scholar]
- Zhuang, X.; Zhang, T. Detection of sick broilers by digital image processing and deep learning. Biosyst. Eng. 2019, 179, 106–116. [Google Scholar] [CrossRef]
- Guo, Y.; Regmi, P.; Ding, Y.; Bist, R.; Chai, L. Automatic detection of brown hens in cage-free houses with deep learning methods. Poult. Sci. 2023, 120, 102784. [Google Scholar] [CrossRef]
- Chen, J.; Yao, W.; Shen, M.; Liu, L. Fine-grained detection of caged-hen head states using adaptive Brightness Adjustment in combination with Convolutional Neural Networks. Int. J. Agric. Biol. Eng. 2023, 16, 208–216. [Google Scholar] [CrossRef]
- Yang, J.; Zhang, T.; Fang, C.; Zheng, H.; Ma, C.; Wu, Z. A detection method for dead caged hens based on improved YOLOv7. Comput. Electron. Agric. 2024, 226, 109388. [Google Scholar] [CrossRef]
- Liang, C. Research and Implementation of Rice Field Environmental Monitoring System Based on Digital Twin. Master’s Thesis, Hunan Agricultural University, Changsha, China, 2020. [Google Scholar]
- Zhang, Z.; Zhu, Z.; Gao, G.; Qu, D.; Zhong, J.; Jia, D.; Pan, S. Design and research of digital twin system for multi-environmental variable mapping in plant factory. Comput. Electron. Agric. 2023, 213, 108243. [Google Scholar] [CrossRef]
- Jo, S.K.; Park, D.H.; Park, H.; Kwak, Y.; Kim, S.H. Energy planning of pigsty using digital twin. In Proceedings of the 19th International Conference on Information and Communication Technology Convergence (ICTC 2019), Jeju-si, Republic of Korea, 1 August 2019; pp. 723–725. [Google Scholar]
- Jeong, D.; Jo, S.K.; Lee, I.B.; Shin, H.; Kim, J.G. Digital Twin Application: Making a Virtual Pig House Toward Digital Livestock Farming. IEEE Access 2023, 11, 121592–121602. [Google Scholar] [CrossRef]
- Neethirajan, S.; Kemp, B. Digital twins in livestock farming. Animals 2021, 11, 1008. [Google Scholar] [CrossRef] [PubMed]
- Guo, J.; Kong, Y.; Liu, S. Construction and application of digital unmanned system in pig farming. J. Huazhong Agric. Univ. 2024, 43, 288–296. [Google Scholar]
- Han, X.; Lin, Z.; Clark, C.; Vucetic, B.; Lomax, S. AI Based Digital Twin Model for Cattle Caring. Sensors 2022, 22, 7118. [Google Scholar] [CrossRef]
- Tong, K.; Wu, Y.; Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vis. Comput. 2020, 97, 103910. [Google Scholar] [CrossRef]
- Badgujar, C.M.; Poulose, A.; Gan, H. Agricultural object detection with You Only Look Once (YOLO) Algorithm: A bibliometric and systematic literature review. Comput. Electron. Agric. 2024, 223, 109090. [Google Scholar] [CrossRef]
- Ma, N.; Su, Y.; Yang, L.; Li, Z.; Yan, H. Wheat Seed Detection and Counting Method Based on Improved YOLOv8 Model. Sensors 2024, 24, 1654. [Google Scholar] [CrossRef]
- Huo, S.; Duan, H.; Xu, Z. An improved multi-scale YOLOv8 for apple leaf dense lesion detection and recognition. IET Image Process. 2024, 18, 4913–4927. [Google Scholar] [CrossRef]
- Zheng, H.; Jiang, L.; Wen, L. A water surface residual feed detection algorithm based on improved YOLOv8n. Fish. Mod. 2025, 52, 80–88. [Google Scholar]
- Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A lightweight-design for real-time detector architectures. J. Real-Time Image Process. 2024, 21, 62. [Google Scholar] [CrossRef]
- Wang, C.; He, D.; Zhu, B. Lightweight rice field weed detection algorithm based on YOLO-RW. J. Nanjing Agric. Univ. 2025, 3, 1–15. [Google Scholar]
- Qiu, H.; Zhang, H.; Lei, K.; Zhang, H.; Hu, X. Forest digital twin: A new tool for forest management practices based on Spatio-Temporal Data, 3D simulation Engine, and intelligent interactive environment. Comput. Electron. Agric. 2023, 215, 108416. [Google Scholar] [CrossRef]
- Sørensen, J.V.; Ma, Z.; Jørgensen, B.N. Potentials of game engines for wind power digital twin development: An investigation of the Unreal Engine. Energy Inform. 2022, 5, 39. [Google Scholar] [CrossRef]
- Huynh, D.V.; Khosravirad, S.R.; Masaracchia, A.; Dobre, O.A.; Duong, T.Q. Edge Intelligence-Based Ultra-Reliable and Low-Latency Communications for Digital Twin-Enabled Metaverse. IEEE Wirel. Commun. Lett. 2022, 11, 1733–1737. [Google Scholar] [CrossRef]
- Li, H.; Wang, H.; Liu, G.; Wang, J.; Evans, S.; Li, L.; Wang, X.; Zhang, S.; Wen, X.; Nie, F.; et al. Concept, system structure and operating mode of industrial digital twin system. Comput. Integr. Manuf. Syst. 2021, 27, 3373–3390. [Google Scholar]
- Cai, T.; Liu, P.; Niu, D.; Shi, J.; Li, L. The Embedded IoT Time Series Database for Hybrid Solid-State Storage System. Sci. Program. 2021, 1, 9948533. [Google Scholar] [CrossRef]
Figure 1.
Interior scene of a caged egg-laying chicken farm.
Figure 2.
Image acquisition and processing system.
Figure 3.
Comparison of images before and after enhancement: (a) original image; (b) brightened image; (c) grayscale histogram of the original image; (d) grayscale histogram of the brightened image.
Figure 4.
Examples of image augmentation: (a) translation; (b) scaling; (c) rotation; (d) mirroring; (e) blurring; (f) salt-and-pepper noise.
Figure 5.
Process of LabelImg labeling.
Figure 6.
EMA mechanism structure diagram.
Figure 7.
GSConv structure diagram.
Figure 8.
Improved YOLOv8 network structure diagram.
Figure 9.
Data acquisition and transmission workflow.
Figure 10.
Physical and digital twin scenarios for large-scale caged layer chicken farms: (a) physical scenario; (b) digital twin scenario.
Figure 11.
Message subscription and parsing.
Figure 12.
Requirements for interactivity design.
Figure 13.
Model deployment and the twin realization process.
Figure 14.
Comparison of loss curves before and after YOLOv8 model improvement: (a) YOLOv8 model loss curve; (b) improved YOLOv8 model loss curve.
Figure 15.
Comparison of detection results before and after YOLOv8 model improvement: (a) YOLOv8 model detection results; (b) improved YOLOv8 model detection results.
Figure 16.
Comparison of accuracy results of ablation experiments.
Figure 17.
Experimental environment display: (a) field experiment diagram; (b) twin scene runtime interface.
Figure 18.
Field experiment effect diagram: (a) real-time detection effect diagram; (b) twin mapping effect.
Table 1.
Definitions of the three beak shape categories in caged layer chickens.
Table 2.
Standardized JSON message structures.
Data Source | Data Format
---|---
Environmental Parameters | {name, timestamp, data:{temperature, humidity, h2s, co2, nh3, light}}
Model detection results | {name, timestamp, data:{abnormal}}
Table 3.
Message topics and their significance.
Topic Name | Message Level of QoS | Meaning
---|---|---
/robot/environment | 2 | The physical robot publishes data collected by the sensor array.
/robot/detect | 2 | The physical robot publishes the model detection results of abnormal beak shapes.
Table 4.
Comparison of model speeds and sizes with different module combinations.
Model | Iteration Speed (it/s) | Detection Speed (img/s) | Model Size (MB)
---|---|---|---
YOLOv8 | 5.61 | 20 | 6.2
YOLOv8+GSConv | 5.46 | 19 | 5.8
YOLOv8+GSConv+SE | 5.44 | 18 | 6.2
YOLOv8+GSConv+CBAM | 5.29 | 15 | 6.5
YOLOv8+GSConv+EMA | 5.36 | 19 | 6.3
Table 5.
Comparison of evaluation metrics for multi-model experiments.
Models | Precision | Recall | F1-Score | mAP (%) | Model Size (MB) | Detection Speed (img/s)
---|---|---|---|---|---|---
Faster-RCNN | 0.92 | 0.91 | 0.93 | 92.4 | 44.8 | 7
YOLOv7-tiny | 0.69 | 0.75 | 0.72 | 78.8 | 12.1 | 25
YOLOv8n | 0.86 | 0.89 | 0.90 | 90.3 | 6.2 | 20
YOLOv5n | 0.81 | 0.86 | 0.86 | 89.5 | 3.9 | 22
YOLOv8+GSConv+EMA | 0.91 | 0.91 | 0.93 | 92.7 | 6.3 | 19
Table 6.
Topic interface connectivity tests.
Sender | Receiver | Topic | Results
---|---|---|---
Inspection robot | UE5 | /robot/environment | Pass
Inspection robot | UE5 | /robot/detect | Pass
Table 7.
Topic interface latency and stability tests.
Topic | Frequency of Messages | Number of Messages Sent | Number of EMQX Message Inflows | Number of EMQX Message Outflows | Number of Messages Received | Average Delay (ms)
---|---|---|---|---|---|---
/robot/environment | 2.0 s/message | 902 | 902 | 902 | 902 | 15
/robot/detect | 1.0 s/message | 1839 | 1839 | 1839 | 1839 | 15
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).