Article

Improved BiLSTM-TDOA-Based Localization Method for Laying Hen Cough Sounds

1 Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
2 National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
3 College of Computer and Information Engineering, Tianjin Agricultural University, Tianjin 300384, China
4 Beijing Wordbrain Information Technology Co., Ltd., Beijing 101206, China
* Author to whom correspondence should be addressed.
Agriculture 2026, 16(1), 28; https://doi.org/10.3390/agriculture16010028
Submission received: 29 October 2025 / Revised: 10 December 2025 / Accepted: 15 December 2025 / Published: 22 December 2025
(This article belongs to the Special Issue Modeling of Livestock Breeding Environment and Animal Behavior)

Abstract

Cough sounds are a key acoustic indicator for detecting respiratory diseases in laying hens, which have become increasingly prevalent with the intensification of poultry housing systems. As an important early signal, cough sounds play a vital role in disease prevention and precision health management through timely recognition and spatial localization. In this study, an improved BiLSTM-TDOA method was proposed for the accurate recognition and localization of laying hen cough sounds. Nighttime audio data were collected and preprocessed to extract 81 acoustic features, including formant parameters, MFCCs, LPCCs, and their first- and second-order derivatives. These features were then input into a BiLSTM-Attention model, which achieved a precision of 97.50%, a recall of 90.70%, and an F1-score of 0.9398. An improved TDOA algorithm was then applied for three-dimensional sound source localization, yielding mean absolute errors of 0.1453 m, 0.1952 m, and 0.1975 m along the X, Y, and Z axes across 31 positions. The results demonstrated that the proposed method enables accurate recognition and 3D localization of abnormal vocalizations in laying hens, providing a novel approach for early detection, precise control, and intelligent health monitoring of respiratory diseases in poultry houses.

1. Introduction

The poultry industry is fundamental to global food production, and maintaining chicken health is critical for sustainable farming [1]. Advances in artificial intelligence provide effective tools for precision poultry management, facilitating automated monitoring and early disease detection [2]. With the continuous expansion of poultry farming and increasing stocking densities, the incidence of respiratory diseases in laying hens is rising [3]. Early detection, timely diagnosis, and rapid intervention before diseases spread are essential for reducing economic losses and ensuring farming efficiency [4]. However, most farms still rely on manual inspection by farm personnel, which is not only inefficient [5] but also induces stress in the flock and makes it difficult to identify abnormal individuals accurately, undermining disease control [6]. Coughing, a characteristic symptom of respiratory diseases, can be reliably captured at night, providing a basis for early diagnosis [7]. Therefore, intelligent monitoring and recognition of typical cough sounds, combined with spatial localization of abnormal vocalizations, offers new insights for disease diagnosis and health management in poultry facilities [8,9].
Research on abnormal sound recognition in livestock and poultry is relatively well developed. Various methods, including backpropagation (BP) neural networks [10], support vector machines (SVMs) [11], data fusion [12], and bidirectional long short-term memory (BiLSTM) networks [13], have been applied to poultry audio recognition, achieving accuracies of 91–98.5% and indicating substantial progress in abnormal sound recognition in poultry. However, owing to the challenges of audio acquisition and limitations in data quality, research on laying hen cough recognition remains limited. Early studies on abnormal audio localization focused on two-dimensional (2D) positioning of cough sounds in livestock barns. Silva et al. employed a seven-microphone array with the time difference of arrival (TDOA) method, achieving an average error of 0–1.5 m and a maximum standard deviation of 0.4 m [14], demonstrating effective 2D localization. Du et al. used a four-element linear microphone array combined with the phase transform (PHAT) algorithm to regionally localize laying hen calls in a planar simulated space, achieving 74.7% overall regional accuracy [15]. Although linear arrays can estimate the sound-emitting region, they cannot accurately determine the specific position of the sound source in three-dimensional (3D) space. Overall, most existing studies on livestock and poultry sound source localization are conducted in planar farming environments, with experiments designed around specific simulated farm scenarios. Therefore, further research is required to improve localization techniques for complex 3D structures such as multi-tiered cage systems [16].
In laying hen farming, integrating abnormal audio recognition with precise 3D localization provides novel technological support for poultry health monitoring and enhanced animal welfare. Although sound source localization techniques have been widely applied in engineering fields [17,18,19], their application to abnormal laying hen sound localization remains limited [20]. The primary methods include SRP-PHAT [21,22] and TDOA [23,24]. In practice, SRP-PHAT typically estimates only the sound source direction and demands substantial computational resources [25], limiting its effectiveness for accurate 3D localization in poultry houses. In contrast, TDOA offers advantages such as real-time performance, low computational cost, and robustness, making it suitable for precise 3D sound source localization in this study [26]. This study aims to achieve nighttime recognition and localization of laying hen coughs. Existing studies indicate that, during nighttime production management, veterinarians typically inspect the henhouse after lights-off and assess flock health by monitoring vocalizations and behaviors [15,27]. Meanwhile, hens exhibit substantially reduced activity after lights-off, with markedly fewer normal vocalizations such as feeding sounds, movement noises, and routine calls, and the background noise level also decreases accordingly. Therefore, this study focuses on nighttime recordings. Nighttime audio data were preprocessed, and acoustic features were extracted, including formant parameters, Mel-frequency cepstral coefficients (MFCCs) [28], linear predictive cepstral coefficients (LPCCs) [29], and their first- and second-order derivatives, resulting in a total of 81 features. These features were used as inputs to a BiLSTM-Attention model [30] for accurate cough recognition. After detecting cough events, an improved TDOA method was applied to estimate the 3D coordinates of the sound source, with the optimal position serving as the output, achieving precise 3D localization of laying hen coughs.
The main contributions of this study were as follows:
(1)
Acoustic feature fusion-based recognition: For laying hen cough sounds, a fusion of extracted acoustic features, including formants, MFCCs, LPCCs, and their first- and second-order derivatives, was employed. An attention mechanism was integrated into the BiLSTM-Attention model to enhance the network’s focus and capture key acoustic features of coughs, thereby achieving high-accuracy recognition.
(2)
Improved TDOA-based 3D localization in a laying-hen house: In the TDOA-based sound source localization framework, a novel combination of PHAT-weighted peak refitting and global grid search strategies was proposed to significantly improve time delay estimation and spatial search performance. Considering the acoustic environment of 3D cage structures in poultry houses, a 3D localization algorithm was optimized; unlike conventional 2D plane-based methods, it incorporates the vertical dimension to account for height-related propagation differences and enhance overall spatial accuracy.

2. Materials and Methods

2.1. Materials

2.1.1. Sound Recognition Dataset of Laying Hens

The sound recordings were obtained from a laying hen farm (Huadu Yukou Poultry Co., Ltd., Pinggu District, Beijing, China) between 19 December and 28 December 2024. This study was conducted in a commercial poultry house, where audio recordings were collected in the central area between the two laying-hen rearing aisles from a single batch of approximately 17,000 48-week-old “Jinghong No. 1” laying hens. Continuous nighttime monitoring was carried out in the henhouse, and a total of 756 nighttime audio segments were manually annotated and extracted from the recordings for analysis. These samples comprised 260 instances of normal nocturnal vocalizations, 229 instances of cough sounds, and 267 instances of environmental noises generated by hens interacting with metal cages in a multilayer cage system; together these constituted the sound recognition dataset of laying hens. The spectrograms [31] of the three audio categories used in this study are presented in Figure 1.

2.1.2. Setup of Localization Experimental Platform and Dataset Construction

During the localization experiment conducted on 5 January 2025, a laboratory-based simulation was performed. Considering the 3D cage structure of a commercial laying hen house, which includes aisles for personnel and equipment access, the simulated experiment faithfully replicated the spatial layout of the rearing environment [32]. A central aisle was incorporated within the experimental space. The cough sound-emitting device was mounted on a tripod with adjustable height to simulate coughs from different cage positions, enabling the microphone array to capture audio from multiple spatial locations and realistically reproduce the distribution of coughing hens. To ensure realism and operability, the central aisle was excluded from data collection points. The simulated experimental space measured 4 × 4 × 2 m, and by continuously varying the position of the sound-emitting device, the spatial distribution of coughing behavior was simulated. The cough audio used corresponded to that shown in Figure 1c.
For localization, this study employed the xCORE VocalFusion™ Speaker series XVF3100/3000 voice processors (XMOS, Shenzhen, China) as the microphone array for data acquisition. These devices are compact, easy to install and maintain, and minimally interfere with normal henhouse operations, providing reliable sound source localization performance and allowing integration with mobile robots for future fixed-point monitoring. The microphone array consisted of four microphones in a quaternary cross-shaped configuration, positioned in the experimental space at the following 3D coordinates (units: m): (−0.0215, +0.03725, +0.059), (+0.0215, +0.03725, +0.059), (+0.0215, −0.03725, +0.059), and (−0.0215, −0.03725, +0.059). During each recording session, all four microphones simultaneously output single-channel audio streams. The experimental setup is illustrated in Figure 2, with the microphone array centrally positioned. The sound-emitting device was mounted on a tripod and placed at different positions within the rearing space to simulate coughing behavior. A total of 31 positions were set, with 20 audio samples collected per position, resulting in 620 samples and ensuring the stability and reliability of the experimental results.

2.2. Methods

The research framework of this study is shown in Figure 3, illustrating the complete process from laying hen audio collection to cough audio recognition and localization. Once the hen audio is identified as a cough sound, the audio undergoes sound source localization. During the localization process, features are extracted from the four-channel audio collected by the microphone array for computational analysis. After obtaining the time delay values, a global search is performed to determine the final location of the abnormal audio source.

2.2.1. Acoustic Feature Extraction of Laying Hens

To characterize the vocalizations of laying hens, three types of acoustic features were extracted: MFCCs, LPCCs, and formant parameters. Both the MFCC and LPCC features include static coefficients and their first-order (Δ) and second-order (Δ²) derivatives to capture temporal dynamics.
(1)
MFCC extraction
The MFCC features were extracted by first applying pre-emphasis to enhance high-frequency components, followed by framing and windowing the signal using a Hamming window. Each frame was then transformed using the Short-Time Fourier Transform [33], and the resulting spectral energies were passed through a Mel filterbank and logarithmically scaled. The Discrete Cosine Transform [34] was subsequently applied to obtain the MFCCs as follows:
$$\mathrm{MFCC}_c = \sum_{m=1}^{M} \log(E_m) \cdot \cos\!\left[\frac{\pi c}{M}\left(m - \frac{1}{2}\right)\right], \quad c = 1, 2, \ldots, C$$
where $E_m$ is the energy of the $m$-th Mel filter, $M$ is the total number of Mel filters, and $C$ is the number of MFCCs. To capture temporal dynamics, the first- and second-order differences of the MFCCs were computed as:
$$\Delta \mathrm{MFCC}_c(n) = \mathrm{MFCC}_c(n) - \mathrm{MFCC}_c(n-1), \qquad \Delta^2 \mathrm{MFCC}_c(n) = \Delta \mathrm{MFCC}_c(n) - \Delta \mathrm{MFCC}_c(n-1)$$
where $n$ denotes the frame index. These differences reflect the dynamic changes in the vocal signal over time, enhancing the representation of temporal patterns.
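For illustration, a minimal sketch of this pipeline using the librosa library is shown below. The paper does not name its toolkit, librosa computes the deltas as a smoothed local slope rather than the simple frame difference above, and n_mfcc = 13 is an assumed setting rather than the paper's:

```python
import numpy as np
import librosa

def extract_mfcc_features(wav_path, n_mfcc=13):
    """MFCCs plus first/second-order dynamics for one audio clip."""
    y, sr = librosa.load(wav_path, sr=16000)        # 16 kHz, matching the array hardware
    y = librosa.effects.preemphasis(y, coef=0.97)   # boost high-frequency components
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                window="hamming")   # framing + Hamming window + STFT + Mel + DCT
    d1 = librosa.feature.delta(mfcc, order=1)       # ΔMFCC
    d2 = librosa.feature.delta(mfcc, order=2)       # Δ²MFCC
    return np.vstack([mfcc, d1, d2])                # shape: (3 * n_mfcc, n_frames)
```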
(2)
LPCC extraction
The LPCC features were extracted by first computing the linear predictive coding (LPC) coefficients $a_n$ of the audio signal, which represent the vocal tract characteristics. The LPC coefficients were then converted to cepstral coefficients (LPCCs) using:
$$c_n = a_n + \sum_{k=1}^{n-1} \frac{k}{n}\, c_k\, a_{n-k}, \quad n = 1, 2, \ldots, p$$
where $p$ is the LPC order. This formula recursively converts the LPC coefficients into cepstral coefficients, capturing the spectral envelope of the vocal signal. To model temporal dynamics, the first- and second-order differences of the LPCCs were computed as:
$$\Delta \mathrm{LPCC}_c(n) = \mathrm{LPCC}_c(n) - \mathrm{LPCC}_c(n-1), \qquad \Delta^2 \mathrm{LPCC}_c(n) = \Delta \mathrm{LPCC}_c(n) - \Delta \mathrm{LPCC}_c(n-1)$$
where $n$ denotes the frame index. These differences capture the temporal evolution of the vocal tract features.
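A direct implementation of this recursion might look as follows; obtaining the LPC coefficients via librosa.lpc is an assumption (the paper does not name a toolkit), and LPC sign conventions vary between libraries, so this is a sketch rather than a drop-in routine:

```python
import numpy as np
import librosa

def lpcc_from_frame(frame, order=12):
    """LPC coefficients via librosa, then the cepstral recursion above."""
    a = librosa.lpc(frame.astype(float), order=order)  # returns [1, a_1, ..., a_p]
    c = np.zeros(order + 1)
    for n in range(1, order + 1):
        # c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k}
        c[n] = a[n] + sum((k / n) * c[k] * a[n - k] for k in range(1, n))
    return c[1:]                                       # LPCC_1 ... LPCC_p
```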
(3)
Formant extraction
Formant frequencies, representing the resonant characteristics of the vocal tract, were estimated by performing LPC analysis and solving for the roots of the LPC polynomial. The angles of the complex roots, $\theta_i$, were then converted to formant frequencies using:
$$F_i = \frac{\theta_i \cdot f_s}{2\pi}, \quad i = 1, 2, 3$$
where $f_s$ is the sampling frequency. This provides the first three formant frequencies (F1–F3), which are key resonant frequency features of the vocalizations.
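A minimal sketch of this root-solving procedure, again assuming librosa.lpc for the LPC analysis and an illustrative model order of 12:

```python
import numpy as np
import librosa

def first_three_formants(frame, fs=16000, order=12):
    """Estimate F1-F3 from the roots of the LPC polynomial."""
    a = librosa.lpc(frame.astype(float), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]             # one root per complex-conjugate pair
    theta = np.angle(roots)                       # root angles θ_i
    freqs = np.sort(theta * fs / (2 * np.pi))     # F_i = θ_i · f_s / (2π)
    return freqs[:3]                              # lowest three resonances
```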

2.2.2. BiLSTM-Attention Cough Sound Classification of Laying Hens

To classify cough sounds of laying hens, a bidirectional Long Short-Term Memory (BiLSTM) network was employed, which can capture both past and future temporal dependencies in sequential audio features.
(1)
BiLSTM model structure
The BiLSTM network consists of $L$ stacked layers with hidden size $H$. Given an input feature sequence $S = \{s_1, s_2, \ldots, s_T\}$, the forward and backward LSTM outputs at time step $t$ are:
$$\overrightarrow{h}_t = \mathrm{LSTM}_{\mathrm{forward}}(s_t, \overrightarrow{h}_{t-1}), \qquad \overleftarrow{h}_t = \mathrm{LSTM}_{\mathrm{backward}}(s_t, \overleftarrow{h}_{t+1})$$
The final hidden representation is the concatenation of both directions:
$$h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$$
(2)
Attention mechanism
To focus on the most informative frames, an attention mechanism computes a weight $\alpha_t$ for each time step:
$$\alpha_t = \frac{\exp\!\left(\tanh(W h_t + b)\right)}{\sum_{k=1}^{T} \exp\!\left(\tanh(W h_k + b)\right)}$$
where $h_t$ is the hidden state of the BiLSTM at time step $t$, and $W$ and $b$ are learnable parameters. The context vector is obtained as a weighted sum of the hidden states:
$$C = \sum_{t=1}^{T} \alpha_t h_t$$
In our implementation, the attention layer consists of two linear layers with a Tanh activation in between, which outputs a scalar score for each time step. The scores are normalized using softmax to produce the attention weights $\alpha_t$. This allows the model to dynamically focus on the most informative frames in the audio sequence. The context vector $C$ is then used for classification, ensuring that the BiLSTM model emphasizes key temporal segments that are most relevant to distinguishing the three vocalization classes. By weighting important time steps more heavily, the attention mechanism improves sequence representation, enhances interpretability, and increases classification performance compared with using only the last hidden state of the BiLSTM.
(3)
Classification layer
The context vector $C$ is passed through a fully connected layer and softmax to output the probability of each class:
$$\hat{y} = \mathrm{softmax}(W_c\, C + b_c)$$
where $\hat{y}$ represents the predicted probability for each category: cough, normal, or environmental noise.
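The following PyTorch sketch shows one way to assemble the BiLSTM, attention, and classification layers described in (1)-(3). The layer sizes here are illustrative assumptions; the actual hyperparameters used in this study are given in Table 1:

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """BiLSTM with the two-layer Tanh attention and softmax classifier described above."""

    def __init__(self, n_features=81, hidden=128, n_layers=2, n_classes=3):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden, n_layers,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Sequential(            # two linear layers with Tanh in between
            nn.Linear(2 * hidden, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),             # scalar score per time step
        )
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                     # x: (batch, frames, features)
        h, _ = self.bilstm(x)                 # h_t = [h_fwd; h_bwd], shape (batch, frames, 2H)
        alpha = torch.softmax(self.attn(h), dim=1)   # attention weights α_t over time
        context = (alpha * h).sum(dim=1)      # context vector C = Σ α_t h_t
        return self.classifier(context)       # class logits; softmax applied in the loss
```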
(4)
Model training
The model is trained using cross-entropy loss:
$$L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$
where $y_i$ is the one-hot ground-truth label and $C$ is the number of classes. Optimization is performed using the AdamW algorithm with weight decay. Early stopping and dropout are applied to prevent overfitting, ensuring robust classification performance. The specific parameter settings used in this study are summarized in Table 1. The dataset was split into a training set, a validation set, and a test set, with 60% of the data used for training, 20% for validation, and 20% for independent performance evaluation.
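A minimal training sketch consistent with this setup is shown below. The synthetic data, learning rate, weight decay, and batch size are placeholders for illustration, not the paper's settings (see Table 1):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 600 clips x 100 frames x 81 features (illustration only).
X = torch.randn(600, 100, 81)
y = torch.randint(0, 3, (600,))               # 0 = cough, 1 = normal, 2 = noise
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = BiLSTMAttention()                     # sketch defined above
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
criterion = torch.nn.CrossEntropyLoss()       # the cross-entropy loss L above

for epoch in range(50):
    for xb, yb in loader:
        optimizer.zero_grad()
        criterion(model(xb), yb).backward()
        optimizer.step()
    # Early stopping: in practice, validation loss would be monitored here and
    # training halted after several epochs without improvement, per the paper.
```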

2.2.3. Sound Source Localization of Laying Hen Coughs

In the process of 3D sound source localization based on the TDOA method, there are two main steps: the first step is to calculate the time delay between different microphones using the signals received by each microphone in the array, thereby obtaining the time delay estimates; the second step is to combine the known spatial coordinates of the microphones with the time delay information to estimate the precise location of the sound source, thus achieving a complete mapping from time delay measurement to 3D position determination.
(1)
Time delay estimation method
The first step in sound source localization is time delay estimation, the core task of which is to use the signals received by each microphone element in the array to calculate the time differences of arrival between different elements, thereby providing the foundational data for the subsequent 3D sound source position calculation. In this study, a four-microphone array is used, and the arrival time of the source signal at each microphone is denoted by $t_i$ ($i = 1, 2, 3, 4$). First, a time delay can be computed for every pair of microphones, yielding six time-difference-of-arrival values in total ($\tau_{12}, \tau_{13}, \tau_{14}, \tau_{23}, \tau_{24}, \tau_{34}$). On this basis, one pair ($\tau_{12}$) is selected as an example and its computation process is described in detail. To obtain the overall time delay estimates, the generalized cross-correlation (GCC) [35] method and its phase-weighted variant are employed; implementation details are given below.
The GCC method is a frequency-domain time delay estimation technique. Its basic idea is to transform the cross-correlation function of two signals into the frequency domain, compute the cross-power spectrum, and then apply an inverse integral transform to obtain the time-domain cross-correlation function, thereby determining the time delay between the signals. Let the signals received by two microphones be $x_1(t)$ and $x_2(t)$; their cross-power spectrum is expressed as:
$$G_{12}(\omega) = X_1(\omega)\, X_2^{*}(\omega)$$
where $X_1(\omega)$ and $X_2(\omega)$ are the Fourier transforms of $x_1(t)$ and $x_2(t)$, respectively, and $*$ denotes the complex conjugate. The generalized cross-correlation function can be written as:
$$R_{12}^{\Psi}(\tau) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \Psi_{12}(\omega)\, G_{12}(\omega)\, e^{j\omega\tau}\, d\omega$$
When the weighting function $\Psi_{12}(\omega) = 1$, this formula reduces to the standard GCC. The time delay between the signals can be obtained by locating the peak of the cross-correlation function:
$$\tau_{12} = \arg\max_{\tau} R_{12}^{\Psi}(\tau)$$
To further enhance robustness in low signal-to-noise ratio and reverberant environments, the phase transform (PHAT) weighting method is introduced. PHAT relies on the phase information of the cross-power spectrum while discarding the magnitude, so that time delay estimation depends solely on phase, reducing the influence of amplitude fluctuations and noise on peak detection. The PHAT weighting function is defined as:
$$\Psi_{12}(\omega) = \frac{1}{|G_{12}(\omega)|}$$
Substituting this into the generalized cross-correlation formula gives the PHAT-weighted cross-correlation function:
$$R_{12}^{\mathrm{PHAT}}(\tau) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{G_{12}(\omega)}{|G_{12}(\omega)|}\, e^{j\omega\tau}\, d\omega$$
In this formula, the magnitude normalization ensures that the TDOA information comes entirely from the phase characteristics of the signals. By locating the peak of $R_{12}^{\mathrm{PHAT}}(\tau)$, a more accurate time delay estimate can be obtained:
$$\tau_{12} = \arg\max_{\tau} R_{12}^{\mathrm{PHAT}}(\tau)$$
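A compact GCC-PHAT sketch following the equations above (FFT-based cross-power spectrum, magnitude normalization, inverse transform, peak picking); this is a generic implementation under the stated assumptions, not the authors' code:

```python
import numpy as np

def gcc_phat(sig, refsig, fs=16000):
    """Estimate the delay of `sig` relative to `refsig` via PHAT-weighted GCC."""
    n = len(sig) + len(refsig)                       # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(refsig, n=n)
    G = SIG * np.conj(REF)                           # cross-power spectrum G12(ω)
    cc = np.fft.irfft(G / (np.abs(G) + 1e-12), n=n)  # PHAT weighting: phase only
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # zero lag at center
    shift = np.argmax(np.abs(cc)) - max_shift        # discrete argmax of R12^PHAT(τ)
    return shift / fs, cc                            # delay in seconds, correlation
```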
To further improve the accuracy of time delay estimation, especially in the presence of broadened peaks or potential misidentification in challenging acoustic conditions, a local peak refinement strategy is applied, referred to here as Refined GCC-PHAT. In this approach, the preliminary peak obtained from the PHAT-weighted cross-correlation function is further adjusted by fitting a quadratic function to the local neighborhood around the peak. This allows calculation of a small offset $\Delta\tau$, which corrects the initial estimate. The refined time delay is then expressed as:
$$\tau_{12} = \tau_{\mathrm{peak}} + \Delta\tau$$
This refinement effectively reduces the impact of peak broadening and minor fluctuations, yielding a more precise estimate of TDOA and providing a reliable foundation for subsequent 3D sound source localization.
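The paper does not specify the size of the fitted neighborhood; a common choice, assumed here, is a three-point parabolic fit around the discrete peak. The sketch below operates on the correlation returned by the gcc_phat sketch above:

```python
import numpy as np

def refine_delay(cc, fs=16000):
    """Correct the discrete correlation peak with a parabolic offset Δτ."""
    c = np.abs(cc)
    k = int(np.argmax(c))
    offset = 0.0
    if 0 < k < len(c) - 1:                      # both neighbors needed for the fit
        y0, y1, y2 = c[k - 1], c[k], c[k + 1]
        denom = y0 - 2.0 * y1 + y2
        if abs(denom) > 1e-12:
            offset = 0.5 * (y0 - y2) / denom    # vertex of the fitted parabola
    max_shift = len(cc) // 2                    # zero lag sits at the center
    return (k + offset - max_shift) / fs        # τ12 = τ_peak + Δτ, in seconds

# Usage with the sketch above:
#   _, cc = gcc_phat(x1, x2)
#   tau12 = refine_delay(cc)
```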
(2)
Sound Source Position Estimation
After estimating the time delay between two microphone signals, the 3D spatial search method can be further employed to determine the precise location of the sound source. This approach involves dividing the possible space where the sound source may exist into a finite 3D search region. At each discrete grid point, the theoretical distance difference between the sound source and each microphone pair is calculated and compared with the estimated distance difference obtained via GCC-PHAT, thereby identifying the location that best matches the estimated values. Let the sound source position be $\mathbf{p} = (x, y, z)$; the theoretical distance difference between the $i$-th and $j$-th microphones is:
$$\Delta r_{ij}(\mathbf{p}) = \|\mathbf{p} - \mathbf{m}_i\| - \|\mathbf{p} - \mathbf{m}_j\|$$
where $\mathbf{m}_i$ and $\mathbf{m}_j$ represent the 3D coordinates of the $i$-th and $j$-th microphones, respectively. The observed distance difference obtained from the GCC-PHAT time delay estimate is:
$$\Delta d_{ij} = c \cdot \Delta t_{ij}$$
where $c$ is the speed of sound and $\Delta t_{ij}$ is the time difference between the microphones. To quantify the matching degree of each spatial point, a cost function $J(\mathbf{p})$ is defined as the sum of squared errors across all microphone pairs:
$$J(\mathbf{p}) = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left(\Delta r_{ij}(\mathbf{p}) - \Delta d_{ij}\right)^2$$
In practical computation, the 3D space is discretized into multiple candidate grid points, each corresponding to a potential sound source location $\mathbf{p}$. By evaluating $J(\mathbf{p})$ at each grid point, the point minimizing the cost function is taken as the optimal estimated source location:
$$\mathbf{p}_{\mathrm{best}} = \arg\min_{\mathbf{p}} J(\mathbf{p})$$
The advantage of the spatial search method lies in its intuitive nature and independence from initial values. By traversing the entire search space, a global optimum can be guaranteed. This approach provides a complete mapping from time delay estimation to the 3D sound source position. When combined with the high-precision time delay measurement of GCC-PHAT, it effectively improves the accuracy and robustness of sound source localization and provides reliable spatial information for subsequent sound source behavior analysis and abnormal event monitoring.
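A minimal grid-search sketch implementing the cost function $J(\mathbf{p})$ above; the search bounds and step size are assumptions chosen to match the 4 × 4 × 2 m experimental space, and a coarse-to-fine search could replace the brute-force loop for speed:

```python
import numpy as np

def grid_search_source(mics, taus, c=343.0,
                       bounds=((-2.0, 2.0), (-2.0, 2.0), (0.0, 2.0)), step=0.05):
    """Exhaustive minimization of J(p) over a discrete 3D grid.

    mics: (4, 3) array of microphone coordinates (m).
    taus: {(i, j): estimated delay in seconds} for all six microphone pairs.
    """
    axes = [np.arange(lo, hi + step, step) for lo, hi in bounds]
    best_p, best_cost = None, np.inf
    for x in axes[0]:
        for y in axes[1]:
            for z in axes[2]:
                p = np.array([x, y, z])
                cost = sum(
                    (np.linalg.norm(p - mics[i]) - np.linalg.norm(p - mics[j])
                     - c * tau) ** 2                  # (Δr_ij − Δd_ij)²
                    for (i, j), tau in taus.items())
                if cost < best_cost:                  # keep the global minimum
                    best_cost, best_p = cost, p
    return best_p
```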

2.2.4. Evaluation Methods

To comprehensively evaluate the performance of the laying hen cough sound recognition and localization models developed in this study, task-specific evaluation metrics were selected for the recognition and localization tasks. For the recognition task, precision, recall, and F1-score were used to assess the model’s classification performance for cough sounds. Additionally, the proposed method was compared with commonly used classification algorithms, and both the overall accuracy and the recognition accuracy of each type of nighttime vocalization were reported to provide a comprehensive assessment of the model’s recognition capability.
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{F1\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives.
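For reference, these per-class metrics can be computed with scikit-learn; the labels below are toy values for illustration only:

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy labels: 0 = cough, 1 = normal, 2 = environmental noise.
y_true = [0, 0, 1, 2, 1, 0, 2, 2]
y_pred = [0, 1, 1, 2, 1, 0, 2, 0]
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], average=None)   # per-class precision/recall/F1
```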
In the sound source localization process, the procedure primarily consists of two stages: time delay estimation and position search. During the time delay estimation stage, the miniature microphone array used in this study provides time delay differences at millisecond-level precision, which makes quantitative comparison of localization accuracy somewhat challenging. Therefore, one set of data was randomly selected from the 31 experimental positions for result visualization, providing an intuitive assessment of time delay estimation accuracy. In the position search stage, the accuracy of time delay estimation directly affects the subsequent localization results. Therefore, in this study, 3D coordinates were calculated using 20 datasets from 31 experimental positions (a total of 620 samples), and the stability and accuracy of the proposed sound source localization method were quantitatively evaluated by computing the mean absolute error (MAE) of the results.

3. Results and Discussion

3.1. Recognition Results of Laying Hen Sounds

To evaluate the effectiveness of the proposed BiLSTM-Attention model, we conducted a comparative analysis with several baseline models, including SVM, MLP, BP, LSTM, and BiLSTM. Table 2 presents the performance comparison of all models on the training set, which provides a fair assessment of their learning capacities under identical training conditions. The training results reveal clear performance differences among the models. Our BiLSTM-Attention model achieved the highest macro F1-score of 0.8922, with macro precision and recall of 89.03% and 89.93%, respectively. In contrast, traditional models showed varying levels of performance: SVM (macro F1 = 0.8213), MLP (0.7391), BP (0.8062), LSTM (0.8192), and BiLSTM (0.8467). These results indicate that while baseline models can recognize basic audio categories, they are less capable of capturing complex acoustic patterns. Notably, the BiLSTM model (0.8467) outperformed the unidirectional LSTM (0.8192), confirming the advantage of bidirectional processing for temporal feature extraction.
Cough sound recognition was the core focus of this study, as accurate detection is essential for subsequent 3D sound source localization. On the training set, the BiLSTM-Attention model achieved a cough detection F1-score of 0.9623, with 98.08% precision and 94.44% recall—outperforming all baseline models. For normal vocalizations and environmental sounds, the model maintained balanced performance with F1-scores of 0.8673 and 0.8471, respectively, demonstrating robust recognition across all three categories. To assess generalization capability, we evaluated the BiLSTM-Attention model on an independent test set (Table 3). The model achieved a macro F1-score of 0.8918, with precision and recall of 89.59% and 88.88%. Performance remained strong across all categories: cough (F1 = 0.9398, precision = 97.50%), normal vocalizations (0.8544), and environmental sounds (0.8814). Critically, the performance drop from training to testing was only 0.04%, indicating excellent generalization ability. The minimal generalization gap distinguishes our approach from traditional models, which often exhibit larger performance degradation on unseen data. This stability is particularly valuable in real-world nighttime poultry house environments, where consistent performance is essential for reliable health monitoring. The smooth convergence curves in Figure 4 further confirm the model’s training stability and reliability.
These results demonstrate that the BiLSTM-Attention model not only exhibits superior learning capacity on the training set but also maintains high performance on independent test data. The integration of attention mechanisms with bidirectional LSTM architecture enables robust feature extraction and temporal dependency modeling, making the model suitable for practical deployment in agricultural monitoring scenarios.

3.2. Improved TDOA Localization of Laying Hen Coughs

3.2.1. Analysis of Cough Sound Time Delay Estimation Based on TDOA

The TDOA-based sound source localization process comprises two main stages. First, the time delays between each pair of microphones are estimated. Second, these estimated delays are used to determine the source location within the defined spatial range. In this study, a four-microphone array was employed to capture the cough sounds of laying hens. For each experimental dataset, six time delay differences (τ12, τ13, τ14, τ23, τ24, τ34) were calculated. To visually assess the computational accuracy of the three time delay estimation algorithms, Figure 5 presents the estimated time delays across 31 experimental datasets. The horizontal axis denotes the indices of the time delay pairs, while the vertical axis represents the estimated values. By comparing the results obtained using the original GCC method, the PHAT-weighted GCC method, and the improved PHAT method, the differences in accuracy and stability among the three algorithms can be clearly observed.
Overall, the results indicate that time-delay estimates obtained using the original GCC method exhibit considerable variability across measurement points, with some values deviating significantly from the true delays, reflecting its susceptibility to noise and limited robustness. By introducing PHAT weighting, which normalizes the magnitude of the cross-power spectrum and makes the estimation rely primarily on phase information, the computational accuracy improves substantially. Although most PHAT-weighted estimates are close to the true values, noticeable deviations remain at certain points. With the improved PHAT method, which involves local quadratic fitting and fine adjustment of preliminary peaks, the estimation accuracy is further refined, bringing the results closer to the true values and significantly increasing numerical stability, reducing anomalous fluctuations.
Based on the visualization of the three methods, the improved PHAT weighting method demonstrates clear superiority at 25 measurement points (points 3, 4, 6–17, 19–22, 24, 25, 27–31). This improvement is primarily attributable to the incorporation of local quadratic fitting and fine adjustment of preliminary peak positions, which effectively suppress noise and multipath interference in the cross-correlation function and enhance both estimation accuracy and numerical stability. In addition, because the method relies on phase information rather than amplitude, it provides more reliable time-delay estimation under low signal-to-noise ratio conditions. As a result, the improved PHAT method yields delay estimates that are consistently closer to the true values and markedly outperforms both the original GCC method and the standard PHAT method. At points 1, 2, 5, 18, 23, and 26, however, the improved PHAT method performs slightly worse than GCC or standard PHAT. These deviations, although present, are minor and do not significantly impact the overall accuracy or subsequent localization performance. Numerical analysis further indicates that when the true time-delay difference approaches zero, the improved PHAT method may introduce estimation errors due to the sensitivity of local fitting. For example, at point 8, τ12 is 0.0011069, but the estimate is −0.0063476; at point 18, τ24 is −0.0000353, with an estimate of 0.0034954; and at point 20, τ13 is 0.0000283, with an estimate of −0.0015434. In contrast, the original GCC and standard PHAT methods generally output values close to zero under such conditions, thereby avoiding these small deviations. Overall, despite occasional inaccuracies at extremely small delay differences, the improved PHAT method achieves superior performance in both accuracy and robustness, providing a more reliable computational foundation for the time-delay estimation required in laying hen cough sound localization.
From a numerical perspective, the device configuration used in this study was selected based on practical application constraints, ultimately employing equipment with low storage requirements and a 16 kHz sampling rate. Under the limited sampling resolution, both the original GCC and the standard PHAT methods compute time delays strictly according to discrete sampling intervals, inevitably introducing quantization errors and generating deviations from the true experimental values. As shown in Table 4 for measurement point 21, the time delay estimates produced by these methods are restricted to discrete values such as 0, 0.0625, 0.125, and 0.1875, indicating that low sampling rates hinder the capture of finer temporal variations in acoustic propagation. When the true delays align closely with sampling intervals (e.g., τ12, τ14, τ23, τ34), the GCC and PHAT methods provide reasonable estimates, and the improved PHAT method further enhances accuracy. However, when the true delay falls between sampling points (e.g., τ13), significant errors arise. For instance, for a true value of 0.0031888, the GCC method outputs 0.0625, while the PHAT method returns 0, which is an improvement over GCC but still noticeably less accurate than the improved PHAT method. Such “staircase” quantization effects undermine the continuity of actual acoustic propagation, leading to accumulated errors and positional offsets in the subsequent localization stage. Despite these constraints, the improved PHAT method demonstrates strong robustness and adaptability under low sampling rates, effectively mitigating the limitations of traditional approaches. This indicates that even with resource-constrained hardware, appropriate algorithmic optimization can achieve a reasonable balance among cost, power consumption, and localization accuracy. Overall, while the improved PHAT method performs slightly worse than GCC or standard PHAT in a few instances, it achieves consistently higher accuracy and stability across most estimation scenarios, making it well suited for time-delay estimation and 3D localization of laying hens in this study.
During the visualization of the results, the GCC method exhibited excessively large numerical deviations in time delay estimates at certain measurement points, with four points showing particularly pronounced discrepancies. For instance, at point 10, the true value of τ13 was −0.1129728 ms, whereas the computed value was 6.0625 ms; at point 12, τ13 was −0.1480608 ms, with an estimate of −2.25 ms; at point 16, τ13 and τ14 were −0.1717706 ms and −0.1066484 ms, while the computed values were 7.6875 ms and 0.5 ms; and at point 20, τ24 was 0.1606474 ms, compared with a computed value of −6 ms. These extreme deviations were excluded from the visualization. Even after removing these anomalous cases, the GCC method remains generally unstable. This instability arises from high sensitivity to signal coherence and low SNR during peak detection. Under low SNR, noise introduces spurious peaks in the cross-correlation function, leading to erroneous peak selection, while insufficient coherence diminishes the main peak height, blurs its position, and increases uncertainty. Multipath effects further generate reflected peaks, weakening the main peak energy and shifting estimated delays, collectively causing large deviations at certain points and undermining overall reliability. To address these limitations, PHAT weighting and its improved variant were applied to suppress noise and multipath interference, enhancing both accuracy and robustness. As observed in the figures, the PHAT-based methods correct extreme GCC deviations and progressively align the estimated peaks with the true values.

3.2.2. Analysis of Cough Sound Position Search Based on TDOA

This process mainly corresponded to the second stage of the TDOA method, which involved position estimation. The 3D coordinates of the localization results were presented in Table 5. Since three different time delay estimation methods were employed during the localization process, three corresponding sets of localization results were obtained. The localization procedure was divided into two stages: time delay estimation and position search. Errors introduced during the time delay estimation stage were further amplified in the position search stage, making the final localization results a more direct reflection of the impact of different time delay estimation algorithms on localization accuracy, thereby providing a basis for evaluating the performance of each method. As shown in Table 5, considering the overall 3D localization results, both the PHAT-weighted method and its improved variant demonstrated significant advantages in terms of localization accuracy and stability. The results obtained by these two methods exhibited a more concentrated distribution, smaller fluctuations, and higher overall consistency, indicating strong robustness and anti-interference capability in complex acoustic environments.
In contrast, the localization results based on the original GCC method were less stable, exhibiting notable deviations and large errors at multiple points, being strongly affected by noise and multipath effects. Specifically, the distribution of the 3D coordinates revealed that errors were more likely near quadrant boundaries (e.g., azimuth angles around 90° and 270°), while small time delay differences between microphones increased sensitivity to environmental noise and minor phase perturbations, causing instability. Although error patterns varied across points for all three algorithms, deviations were generally concentrated near quadrant transitions. For example, GCC deviations occurred at points 11 and 27, PHAT deviations at points 10 and 26, and improved PHAT deviations at point 8. Overall, GCC localization errors were larger than those of the PHAT-based methods, demonstrating the latter’s superior robustness in suppressing noise, mitigating multipath effects, and enhancing algorithmic stability. Further analysis of the GCC-based 3D results showed that the quadrant estimation errors at points 5, 7, 10, 14, 16, and 20 were not caused by quadrant proximity but more likely by instability in time delay estimation under high-noise conditions or by cumulative errors. Minor errors during time delay estimation in the GCC method were amplified during position inversion, resulting in significant localization deviations. PHAT weighting substantially suppressed such errors, yielding localization results closer to the true source positions, while the improved PHAT method further reduced quadrant estimation errors, enhancing both localization accuracy and spatial consistency.
Table 5 presents the averaged 3D coordinates for all 31 measurement points; however, the tabulated data alone did not intuitively reflect the specific error characteristics. Therefore, Figure 6 illustrates the absolute localization errors (MAE) in the X, Y, and Z directions for 20 sets of data across the 31 measurement points, providing a more direct comparison of the accuracy performance among different localization methods. Overall, the mean absolute errors of the localization results obtained using the GCC method, PHAT weighting, and the improved PHAT method across the 31 points were as follows: X direction, 0.7217, 0.2478, and 0.1453 m; Y direction, 0.7776, 0.3037, and 0.1952 m; and Z direction, 0.6498, 0.3341, and 0.1975 m, respectively. It was evident that the GCC method exhibited relatively large errors, with average deviations exceeding 0.6 m in all three directions, making it unsuitable for precise localization of laying hen cough sounds. The introduction of PHAT weighting reduced the 3D localization errors to approximately 0.3 m, while the further improved PHAT method decreased the errors to about 0.1–0.2 m, demonstrating a substantial improvement in localization accuracy. A detailed analysis showed that the original GCC method displayed considerable instability and pronounced deviations in the localization results. Specifically, data sets 5, 7, 10, 14, 16, and 20 had localization errors exceeding 1 m along the X, Y, and Z axes, which resulted from inaccurate time delay estimation. The application of PHAT weighting significantly reduced the overall errors, producing more stable localization results. Compared with the conventional PHAT-weighted method, the improved PHAT approach achieved even greater stability and further error reduction, indicating higher precision and reliability in practical localization scenarios.
Based on the specific error values of the 3D coordinates, all experimental points with errors exceeding 1.5 m were labeled as “>1.5 m” in the figure to highlight large deviations while minimizing their influence on the statistical results of the PHAT and improved PHAT methods. Figure 6a illustrates the distribution of mean absolute errors (MAE) along the X-axis. Overall, the GCC method exhibited substantial errors. Among the 31 data sets, 10 exceeded 0.8 m and 4 surpassed 1.5 m. These results indicate that the GCC method did not satisfy the accuracy requirements for cough sound localization in laying hens. The application of PHAT weighting significantly reduced the errors, with all data controlled within 0.8 m. After employing the improved PHAT method, the maximum error further decreased to below 0.4 m, effectively mitigating large deviations and substantially enhancing localization accuracy. Analysis of the individual data sets revealed a general downward trend in errors from GCC to PHAT to improved PHAT, with the most pronounced reduction occurring from GCC to PHAT. However, for data sets 3, 4, 12, 16, 17, 19, 25, and 27, the PHAT method produced slightly lower errors than the improved PHAT method; for sets 6, 8, 30, and 31, the improved PHAT method exhibited the largest error within the respective set. Nonetheless, these differences were minor and remained within acceptable limits, and the overall localization accuracy still demonstrated a clear and consistent advantage for the improved PHAT method.
Figure 6b illustrates the MAE distribution along the Y-axis. In general, errors along the Y-axis were slightly higher than those along the X-axis. The GCC method continued to exhibit large errors, with five data sets exceeding 1.5 m, whereas the PHAT-weighted and improved PHAT methods substantially reduced these errors. Similarly to the X-axis results, most data sets showed a gradual decrease in error magnitude with methodological improvement, although certain fluctuations persisted. This indicated that localization accuracy along the Y-axis was more affected by the microphone array layout and the acoustic propagation paths. Specifically, in data sets 7, 10, 12, 23, 26, and 31, the improved PHAT method produced slightly higher errors than the standard PHAT method. However, the differences did not exceed 0.3 m and remained within normal fluctuation ranges. In data sets 1, 25, and 29, the errors of the improved PHAT method exceeded those of the PHAT method by approximately 0.3 m, due to local reflections and phase perturbations within the acoustic field. In contrast, data sets 11, 21, 24, and 27 exhibited noticeably higher errors when using the PHAT method than the other two methods, with set 11 reaching 1.177 m and sets 21 and 24 exceeding 0.8 m. These large errors had a more pronounced impact on the overall results than the minor fluctuations observed for the improved PHAT method, further validating its effectiveness in suppressing extreme deviations and enhancing Y-axis localization accuracy.
Figure 6c illustrates the MAE distribution along the Z-axis. Overall, the stability of source position estimation in the Z direction was slightly lower than that in the horizontal plane (X and Y axes). For most experimental sets, the improved PHAT method maintained Z-axis errors within 0.3 m, demonstrating good stability and precision. However, in experimental sets 8, 16, and 25, the improved method exhibited noticeably higher errors than the other two methods, with set 25 showing an increase of approximately 0.4 m compared with the unmodified method. This observation indicates that, under certain source spatial distributions or incidence angles, vertical localization accuracy might fluctuate, reflecting the algorithm’s sensitivity to the height distribution of the microphone array and the angle of wave incidence. Nevertheless, in sets 6, 14, and 19, the improved method effectively reduced initially large errors, while in sets 3, 5, 17, 18, 20, and 24, the reduction was even more pronounced, significantly mitigating the previously large deviations.
Overall, despite occasional fluctuations in Z-axis errors, the improved PHAT method exhibited clear advantages in enhancing vertical localization accuracy under complex acoustic conditions. A comprehensive analysis indicated that the GCC method was prone to significant errors under multipath reflections, background noise, and non-stationary signal conditions. PHAT weighting improved robustness by suppressing amplitude information and reinforcing phase consistency, whereas the improved PHAT method further introduced fitting functions and noise-adaptive weighting, effectively correcting cross-correlation peak offsets and enhancing localization accuracy and stability. Although minor fluctuations remained in certain spatial directions, the overall results demonstrated that the proposed method outperformed conventional approaches in 3D source localization, particularly in reducing large errors and improving vertical-axis accuracy.

4. Limitations

This study focuses on the recognition and localization of nighttime cough sounds in laying hen houses and proposes a multi-feature fusion BiLSTM-Attention method for cough recognition together with an improved TDOA-based localization algorithm. In the recognition stage, the BiLSTM-Attention network utilizes bidirectional temporal modeling and an attention mechanism to enhance the extraction of salient acoustic cues, achieving robust and high-precision identification of cough events even with limited samples. Experimental results demonstrate a precision of 97.50% and a recall of 90.70%, clearly outperforming traditional feature-based machine learning models and confirming the effectiveness of multi-feature fusion combined with attention mechanisms in avian vocalization analysis. In the localization stage, the improved TDOA algorithm performs nonlinear fitting optimization on the estimated delays and incorporates a full-space search strategy to achieve 3D positioning of cough sources. Across 31 spatial test points using a miniature microphone array, the algorithm consistently maintained a mean absolute error below 0.2 m, demonstrating high accuracy, stability, and applicability in complex poultry house acoustics. Overall, the proposed framework provides a reliable methodological foundation for cough sound recognition and spatial localization in laying hen houses.
Despite the satisfactory experimental outcomes, several limitations remain. First, on-site audio collection was restricted by production constraints and the need to minimize animal stress, resulting in a dataset of limited scale and quality, which is inadequate for training deeper neural network models. Second, although the localization experiments were designed to approximate the geometric characteristics of laying hen houses, discrepancies from real acoustic environments persist. In practical settings, the sound field is more complex and affected by multiple concurrent sources, background noise, and reflection effects, which may increase time-delay estimation errors and consequently reduce localization accuracy.

5. Conclusions

To address the lack of effective integration between cough sound recognition and localization in existing studies, this work proposes a BiLSTM recognition framework coupled with a TDOA localization algorithm for laying hen coughs. In the recognition stage, multiple acoustic features—formants, MFCCs, LPCCs, and their first- and second-order derivatives—are fused and fed into a BiLSTM-Attention model, achieving a precision of 97.50%, a recall of 90.70%, and an F1-score of 0.9398. In the localization stage, a miniature microphone array enables individual-level 3D positioning of hen vocalizations. The estimated time delays are refined via quadratic fitting and integrated with a full-space search strategy, yielding mean absolute errors of 0.1453 m, 0.1952 m, and 0.1975 m along the X, Y, and Z axes across 31 positions. By linking abnormal audio event recognition with subsequent localization and designing experiments that closely reflect real poultry house conditions, this study achieves accurate 3D source localization under stacked cage-rearing systems and provides new methodological insights for integrated recognition and localization of abnormal avian vocalizations. Future work will focus on: (1) expanding the dataset by collecting more diverse laying hen vocalizations to improve model generalization; and (2) conducting full-scale validation in real poultry houses to assess the model’s stability and real-time performance under dynamic multi-source acoustic conditions.

Author Contributions

Conceptualization, F.Q. and L.Y.; methodology, F.Q.; software, Z.R. and X.D.; validation, F.Q., L.Y. and Y.Z. (Yanrong Zhuang); formal analysis, Y.Z. (Yanrong Zhuang); investigation, C.L. and H.Z.; resources, Q.L.; data curation, Y.W. (Yue Wu) and Y.Z. (Yujie Zhao); writing—original draft preparation, F.Q.; writing—review and editing, L.Y. and Y.W. (Yuxin Wang); visualization, X.D.; supervision, Q.L.; project administration, L.Y.; funding acquisition, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Reform and Development Project of Beijing Academy of Agriculture and Forestry Sciences (GGFZ20250407); the National Natural Science Foundation of China (Grant No. 32573284); and the Beijing Innovation Team-Smart Agriculture Industry Technology System, grant number BAIC10-2025-E04.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Haiqing Zhang was employed by the company Beijing Wordbrain Information Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Wu, Z.; Willems, S.; Liu, D.; Norton, T. How AI Improves Sustainable Chicken Farming: A Literature Review of Welfare, Economic, and Environmental Dimensions. Agriculture 2025, 15, 2028. [Google Scholar] [CrossRef]
  2. Natsir, M.H.; Mahmudy, W.F.; Tono, M.; Nuningtyas, Y.F. Advancements in artificial intelligence and machine learning for poultry farming: Applications, challenges, and future prospects. Smart Agric. Technol. 2025, 12, 101307. [Google Scholar] [CrossRef]
  3. Ahmed, N.Q.; Dawood, R.A.; Al-Mastafa, A.; Jassim, S.T.; Ali, A.; Ahmed, A.M.; Hammady, F.J. Investigating the correlation between stocking density and respiratory diseases in poultry. J. Anim. Health Prod. 2025, 13, 65–72. [Google Scholar] [CrossRef]
  4. He, P.; Chen, Z.; Yu, H.; Hayat, K.; He, Y.; Pan, J.; Lin, H. Research progress in the early warning of chicken diseases by monitoring clinical symptoms. Appl. Sci. 2022, 12, 5601. [Google Scholar] [CrossRef]
  5. Italiya, J.; Ahmed, A.A.; Abdel-Wareth, A.A.; Lohakare, J. An AI-based system for monitoring laying hen behavior using computer vision for small-scale poultry farms. Agriculture 2025, 15, 1963. [Google Scholar] [CrossRef]
  6. Neethirajan, S. Rethinking poultry welfare—Integrating behavioral science and digital innovations for enhanced animal well-being. Poultry 2025, 4, 20. [Google Scholar] [CrossRef]
  7. Zhou, H.; Zhu, Q.; Norton, T. Cough sound recognition in poultry using portable microphones for precision medication guidance. Comput. Electron. Agric. 2025, 237, 110541. [Google Scholar] [CrossRef]
  8. Manikandan, V.; Neethirajan, S. AI-driven bioacoustics in poultry farming: A critical systematic review on vocalization analysis for stress and disease detection. Preprint 2025. [Google Scholar]
9. Liu, M.; Chen, H.; Zhou, Z.; Du, X.; Zhao, Y.; Ji, H.; Teng, G. Development of an intelligent service platform for a poultry house facility environment based on the Internet of Things. Agriculture 2024, 14, 1277.
10. Yu, L.; Du, T.; Yu, Q.; Liu, T.; Meng, R.; Li, Q. Recognition method of laying hens' vocalizations based on multi-feature fusion. Trans. Chin. Soc. Agric. Eng. 2022, 53, 259–265. (In Chinese with English abstract)
11. Rizwan, M.; Carroll, B.T.; Anderson, D.V.; Daley, W.; Harbert, S.; Britton, D.F.; Jackwood, M.W. Identifying rale sounds in chickens using audio signals for early disease detection in poultry. In Proceedings of the 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Washington, DC, USA, 7–9 December 2016; pp. 55–59.
12. Banakar, A.; Sadeghi, M.; Shushtari, A. An intelligent device for diagnosing avian diseases: Newcastle, infectious bronchitis, avian influenza. Comput. Electron. Agric. 2016, 127, 744–753.
13. Cuan, K.; Zhang, T.; Li, Z.; Huang, J.; Ding, Y.; Fang, C. Automatic Newcastle disease detection using sound technology and deep learning method. Comput. Electron. Agric. 2022, 194, 106740.
14. Silva, M.; Ferrari, S.; Costa, A.; Aerts, J.; Guarino, M.; Berckmans, D. Cough localization for the detection of respiratory diseases in pig houses. Comput. Electron. Agric. 2008, 64, 286–292.
15. Du, X.; Lao, F.; Teng, G. A sound source localisation analytical method for monitoring the abnormal night vocalisations of poultry. Sensors 2018, 18, 2906.
16. Yu, L.; Zhuang, Y.; Qiu, F.; Ding, X.; He, J.; Zhao, Y.; Yang, G.; Wu, Y.; Zhao, C.; Li, Q. Research progress of audio information technology in agricultural field. Trans. Chin. Soc. Agric. Eng. 2025, 56, 1–26. (In Chinese with English abstract)
17. Bottigliero, S.; Milanesio, D.; Saccani, M.; Maggiora, R. A low-cost indoor real-time locating system based on TDOA estimation of UWB pulse sequences. IEEE Trans. Instrum. Meas. 2021, 70, 5502211.
18. Motie, S.; Zayyani, H.; Salman, M.; Bekrani, M. Self UAV localization using multiple base stations based on TDoA measurements. IEEE Wirel. Commun. Lett. 2024, 13, 2432–2436.
19. Kita, S.; Kajikawa, Y. Fundamental study on sound source localization inside a structure using a deep neural network and computer-aided engineering. J. Sound Vib. 2021, 513, 116400.
20. Fan, W.; Peng, H.; Yang, D. The application and challenges of advanced detection technologies in poultry farming. Poult. Sci. 2025, 104, 105870.
21. García-Barrios, G.; Gutiérrez-Arriola, J.M.; Sáenz-Lechón, N.; Osma-Ruiz, V.J.; Fraile, R. Analytical model for the relation between signal bandwidth and spatial resolution in steered-response power phase transform (SRP-PHAT) maps. IEEE Access 2021, 9, 121549–121560.
22. Ramamurthy, A.; Unnikrishnan, H.; Donohue, K.D. Experimental performance analysis of sound source detection with SRP PHAT-β. In Proceedings of the IEEE Southeastcon 2009, Atlanta, GA, USA, 5–8 March 2009; pp. 422–427.
23. Wang, L.; Hon, T.; Reiss, J.D.; Cavallaro, A. Self-localization of ad-hoc arrays using time difference of arrivals. IEEE Trans. Signal Process. 2015, 64, 1018–1033.
24. Gustafsson, F.; Gunnarsson, F. Positioning using time-difference of arrival measurements. In Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, China, 6–10 April 2003; p. 553.
25. Krishnaveni, V.; Kesavamurthy, T. Beamforming for direction-of-arrival (DOA) estimation—A survey. Int. J. Comput. Appl. 2013, 61, 4–11.
26. Heydari, Z.; Mahabadi, A. Scalable real-time sound source localization method based on TDOA. Multimed. Tools Appl. 2023, 82, 23333–23372.
27. Ma, H.; Xin, P.; Ma, J.; Yang, X.; Zhang, R.; Liang, C.; Liu, Y.; Qi, F.; Wang, C. End-to-end detection of cough and snore based on ResNet18-TF for breeder laying hens: A field study. Artif. Intell. Agric. 2026, 16, 412–422.
28. Ali, S.; Tanweer, S.; Khalid, S.S.; Rao, N. Mel frequency cepstral coefficient: A review. In Proceedings of the 2nd International Conference on ICT for Digital, Smart, and Sustainable Development (ICIDSSD 2020), New Delhi, India, 27–28 February 2020.
29. Kamarudin, N.; Al-Haddad, S.; Khmag, A.; Hassan, A.B.; Hashim, S.J. Analysis on Mel frequency cepstral coefficients and linear predictive cepstral coefficients as feature extraction on automatic accents identification. Int. J. Appl. Eng. Res. 2016, 11, 7301–7307.
30. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The performance of LSTM and BiLSTM in forecasting time series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292.
31. Oppenheim, A.V. Speech spectrograms using the fast Fourier transform. IEEE Spectr. 1970, 7, 57–62.
32. Hartcher, K.M.; Jones, B. The welfare of layer hens in cage and cage-free housing systems. World's Poult. Sci. J. 2017, 73, 767–782.
33. Özhan, O. Short-Time Fourier Transform. In Basic Transforms for Electrical Engineering; Springer International Publishing: Cham, Switzerland, 2022; pp. 441–464.
34. Wang, X.; Xu, X.; Sun, K.; Jiang, Z.; Li, M.; Wen, J. A color image encryption and hiding algorithm based on hyperchaotic system and discrete cosine transform. Nonlinear Dyn. 2023, 111, 14513–14536.
35. Ashtekar, S.; Kumar, P.; Kumar, R. Study of generalized cross correlation techniques for direction finding of wideband signals. In Proceedings of the 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 8–10 April 2021; pp. 707–714.
Figure 1. Original audio spectrograms of laying hen sounds. (a) Normal sound; (b) Environmental sound; (c) Cough sound.
Figure 2. Experimental setup for sound source localization of laying hen cough sounds.
Figure 3. Framework diagram of the laying hen cough audio recognition and localization system. The asterisks (*) in the figure indicate the improvements made in this study.
Figure 4. Validation performance of the BiLSTM-Attention model. (a) Accuracy curve; (b) Loss curve.
Figure 5. Comparison of different TDOA methods and their results in the time delay estimation stage. (a) Experiments 1–4; (b) Experiments 5–8; (c) Experiments 9–12; (d) Experiments 13–16; (e) Experiments 17–20; (f) Experiments 21–24; (g) Experiments 25–28; (h) Experiments 29–31.
Figure 6. Comparison of 3D localization errors for three methods. (a) Comparison of localization errors on X-axis for three methods; (b) Comparison of localization errors on Y-axis for three methods; (c) Comparison of localization errors on Z-axis for three methods.
Table 1. Hyperparameter settings of the BiLSTM-Attention model.

| Parameter | Value |
|---|---|
| Input size | 81 |
| Hidden size | 512 |
| LSTM layers | 2 |
| Dropout rate | 0.5 |
| Optimizer | AdamW |
| Learning rate | 1 × 10⁻⁴ |
| Batch size | 16 |
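For readers implementing a comparable classifier, the following is a minimal PyTorch sketch of a two-layer bidirectional LSTM with attention pooling, configured with the Table 1 hyperparameters. The additive-attention form, the dropout placement between LSTM layers, and all layer names are illustrative assumptions; the article does not publish its exact implementation.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Hypothetical sketch of a BiLSTM-Attention classifier.

    Hyperparameters follow Table 1; the attention pooling and dropout
    placement are assumptions, not the authors' exact design.
    """

    def __init__(self, input_size=81, hidden_size=512, num_layers=2,
                 num_classes=3, dropout=0.5):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, bidirectional=True,
                            dropout=dropout)
        # Score each time step, softmax over time, pool to one vector.
        self.attn = nn.Linear(2 * hidden_size, 1)
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):                       # x: (batch, frames, 81)
        h, _ = self.lstm(x)                     # h: (batch, frames, 1024)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        context = (w * h).sum(dim=1)            # weighted temporal pooling
        return self.fc(context)                 # logits: cough/normal/environment

model = BiLSTMAttention()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # AdamW, lr per Table 1
logits = model(torch.randn(16, 100, 81))        # batch size 16 (Table 1), 100 frames
```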
Table 2. Training results of different models across sound categories.

| Model | Class | Precision | Recall | F1-Score |
|---|---|---|---|---|
| SVM [11] | cough | 89.71% | 84.72% | 0.8714 |
| | normal | 80.00% | 73.17% | 0.7643 |
| | environment | 77.38% | 89.04% | 0.8280 |
| | macro | 82.36% | 82.31% | 0.8213 |
| MLP | cough | 86.96% | 86.96% | 0.8696 |
| | normal | 60.27% | 84.62% | 0.7040 |
| | environment | 84.85% | 51.85% | 0.6437 |
| | macro | 77.36% | 74.47% | 0.7391 |
| BP [10] | cough | 90.48% | 82.61% | 0.8636 |
| | normal | 72.73% | 76.92% | 0.7477 |
| | environment | 80.00% | 81.48% | 0.8073 |
| | macro | 81.07% | 80.34% | 0.8062 |
| LSTM | cough | 84.21% | 87.67% | 0.8591 |
| | normal | 91.80% | 69.14% | 0.7887 |
| | environment | 73.33% | 90.41% | 0.8098 |
| | macro | 83.12% | 82.41% | 0.8192 |
| BiLSTM [13] | cough | 91.94% | 85.07% | 0.8837 |
| | normal | 78.95% | 77.92% | 0.7843 |
| | environment | 84.27% | 90.36% | 0.8721 |
| | macro | 85.05% | 84.45% | 0.8467 |
| BiLSTM-Attention | cough | 98.08% | 94.44% | 0.9623 |
| | normal | 90.74% | 83.05% | 0.8673 |
| | environment | 78.26% | 92.31% | 0.8471 |
| | macro | 89.03% | 89.93% | 0.8922 |
Table 3. Test results of the BiLSTM-Attention model.

| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| cough | 97.50% | 90.70% | 0.9398 |
| normal | 84.62% | 86.27% | 0.8544 |
| environment | 86.67% | 89.66% | 0.8814 |
| macro | 89.59% | 88.88% | 0.8918 |
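As a consistency check on Tables 2 and 3: each F1-score is the harmonic mean of the corresponding precision and recall, and each macro row is the unweighted mean over the three classes. A minimal sketch, using the Table 3 values:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Cough row of Table 3: precision 97.50%, recall 90.70%
print(round(f1_score(0.9750, 0.9070), 4))        # 0.9398, matching the table

# Macro scores are unweighted means over the three classes; for precision:
per_class_precision = [0.9750, 0.8462, 0.8667]   # cough, normal, environment
print(round(sum(per_class_precision) / 3, 4))    # 0.8960 ≈ 89.59% in the table
```

The small discrepancy in the macro precision (0.8960 vs. 89.59%) presumably arises because the table was computed from unrounded per-class values.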
Table 4. Time delay estimation results for measurement point 21.

| Time Delay Difference | True Value (ms) | Re-PHAT (ms) | PHAT (ms) | GCC (ms) |
|---|---|---|---|---|
| τ₁₂ | −0.0779007 | −0.0690572 | −0.0625 | −0.0625 |
| τ₁₃ | 0.0031888 | 0.0000808 | 0 | 0.0625 |
| τ₁₄ | 0.0818758 | 0.0619728 | 0.0625 | 0.0625 |
| τ₂₃ | 0.0810895 | 0.0687775 | 0.0625 | 0.0625 |
| τ₂₄ | 0.1597765 | 0.1502702 | 0.1250 | 0.1250 |
| τ₃₄ | 0.0786870 | 0.0725368 | 0.0625 | 0.0625 |
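For context on Table 4: GCC takes the delay from the plain cross-correlation peak, while PHAT normalizes the cross-power spectrum by its magnitude before the inverse transform, so only phase drives the peak. Below is a minimal NumPy sketch of standard GCC-PHAT time delay estimation between two microphone channels; it does not reproduce the paper's Re-PHAT weighting, which is the proposed improvement, and the 16 kHz sampling rate in the example is an illustrative assumption.

```python
import numpy as np

def gcc_phat(sig, ref, fs, interp=16):
    """Standard GCC-PHAT delay estimate of `sig` relative to `ref` (seconds).

    The cross-power spectrum is normalized by its magnitude (PHAT
    weighting); `interp` upsamples the correlation for sub-sample lags.
    """
    n = len(sig) + len(ref)                     # zero-pad to avoid wrap-around
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-15                      # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=interp * n)          # upsampled cross-correlation
    max_shift = interp * n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift   # peak index -> signed lag
    return shift / float(interp * fs)

# Illustrative check: recover a known 0.5 ms delay at an assumed 16 kHz rate
fs = 16000
ref = np.random.randn(1600)                     # 0.1 s of noise
sig = np.roll(ref, 8)                           # delay by 8 samples = 0.5 ms
print(gcc_phat(sig, ref, fs) * 1e3)             # ~0.5 (ms)
```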
Table 5. Three-dimensional coordinate results for all measurement points.

| Group | True Position (m) | Re-PHAT (m) | PHAT (m) | GCC (m) |
|---|---|---|---|---|
| 1 | (1.752, 1.633, 1.936) | (1.953, 2, 1.662) | (1.443, 1.571, 1.395) | (1.002, 1.553, 1.953) |
| 2 | (1.901, 1.901, 1.961) | (1.966, 1.998, 1.757) | (1.48, 1.564, 1.194) | (0.749, 1.766, 1.993) |
| 3 | (0.99, 1.025, 1.851) | (1.31, 0.925, 1.908) | (1.3, 0.75, 1.45) | (1.444, 1.974, 0.082) |
| 4 | (0.978, 1.086, 1.839) | (0.791, 0.842, 1.675) | (0.85, 0.8, 1.9) | (1.4, 1.4, 1.95) |
| 5 | (0.609, 1.014, 1.221) | (0.667, 1.029, 1.236) | (1.35, 1.55, 1.75) | (−0.278, −0.562, 0.063) |
| 6 | (0.308, 1.009, 1.221) | (0.15, 0.66, 0.816) | (0.3, 1.55, 1.8) | (0.3, 1.55, 1.8) |
| 7 | (0, 0.992, 1.221) | (0, 1.093, 1.44) | (0, 0.95, 1.35) | (2.168, −1.006, 0) |
| 8 | (0.009, 1, 0.334) | (−0.05, 1.185, 0.592) | (0, 0.6, 0.35) | (0, 0.6, 0.35) |
| 9 | (−0.105, 1.994, 0.352) | (−0.245, 1.897, 0.59) | (−0.25, 1.85, 0.65) | (−0.048, 1.023, 0.508) |
| 10 | (−0.16, 1.007, 1.851) | (−0.1, 1.085, 1.735) | (0, 0.95, 1.35) | (0.713, −1.122, 0) |
| 11 | (−0.268, 1.376, 1.136) | (−0.413, 1.61, 1.105) | (−0.061, 0.199, 0.127) | (0, 0.6, 0.35) |
| 12 | (−0.354, 1.029, 1.445) | (−0.298, 1.14, 1.773) | (−0.25, 1, 1.7) | (−0.439, 1.711, 0.092) |
| 13 | (−0.998, 1.728, 1.397) | (−1.363, 1.976, 1.35) | (−1.473, 1.823, 1.149) | (−0.234, 1.259, 1.33) |
| 14 | (−0.744, 1.047, 1.447) | (−0.908, 1.215, 1.87) | (−1.093, 1.362, 1.823) | (0.05, 0.05, 0.1) |
| 15 | (−1.76, 1.887, 1.136) | (−1.664, 1.795, 1.318) | (−1.5, 1.55, 1.05) | (−0.091, 1.185, 1.544) |
| 16 | (−1.087, 1.013, 1.447) | (−1.445, 1.079, 1.85) | (−1.3, 0.75, 1.45) | (1.119, −1.996, 0) |
| 17 | (−1.655, 1.159, 1.021) | (−1.726, 1.16, 1.014) | (−1.6, 1.3, 1.55) | (−0.303, 1.274, 1.576) |
| 18 | (−1.879, 1.085, 1.975) | (−1.93, 1.125, 1.898) | (−1.3, 0.75, 1.45) | (−1.205, 0.712, 1.488) |
| 19 | (−2.007, −1.023, 0.627) | (−1.853, −0.788, 1.212) | (−1.961, −0.906, 1.315) | (−1.3, −0.75, 1.45) |
| 20 | (−1.751, −1.011, 1.839) | (−1.43, −0.79, 1.785) | (−1.3, −0.75, 1.45) | (1.332, 1.726, 0) |
| 21 | (−1.738, −1.044, 1.906) | (−1.417, −0.783, 1.812) | (−0.95, −0.55, 1.55) | (−0.85, −0.8, 1.9) |
| 22 | (−1.786, −1.499, 1.699) | (−1.66, −1.393, 1.825) | (−1, −1.516, 1.72) | (−1.177, −1.054, 1.731) |
| 23 | (−1.582, −1.886, 1.923) | (−1.515, −1.742, 1.804) | (−1.35, −1.55, 1.75) | (−0.279, −0.559, 0.559) |
| 24 | (−0.559, −1.098, 1.024) | (−0.523, −0.995, 0.945) | (−1.1, −1.9, 1.95) | (−0.7, −1.8, 2) |
| 25 | (−0.474, −1.017, 1.641) | (−0.242, −0.595, 1.13) | (−0.25, −1, 1.7) | (−0.25, −0.9, 1.85) |
| 26 | (−0.106, −1.012, 1.35) | (−0.05, −0.755, 1.132) | (0, −0.95, 1.35) | (−0.25, −0.9, 1.85) |
| 27 | (0, −1.963, 1.975) | (0.093, −1.566, 1.819) | (0, −1.115, 1.584) | (−0.3, −1.55, 1.8) |
| 28 | (0.218, −0.944, 1.678) | (0.211, −0.826, 1.643) | (0.25, −1, 1.7) | (0.25, −1, 1.7) |
| 29 | (1.012, −1.652, 1.856) | (0.816, −1.275, 1.583) | (1.35, −1.55, 1.75) | (0.717, −1.12, 1.391) |
| 30 | (0.057, −1.082, 0.526) | (0.15, −1.4, 0.75) | (0, −0.6, 0.35) | (0, −0.6, 0.35) |
| 31 | (0.95, −1, 1.352) | (0.7, −0.5, 1.15) | (0.95, −0.55, 1.55) | (0.95, −0.55, 1.55) |
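The per-axis mean absolute errors quoted in the abstract (0.1453 m, 0.1952 m, and 0.1975 m for Re-PHAT on the X, Y, and Z axes) follow from Table 5 by averaging |estimate − truth| over the 31 points on each axis. A minimal sketch, assuming the table's coordinate columns have been entered as (31, 3) arrays:

```python
import numpy as np

# Enter the Table 5 columns as (31, 3) arrays; only the first two rows
# are shown here for brevity.
true_xyz = np.array([
    [1.752, 1.633, 1.936],
    [1.901, 1.901, 1.961],
    # ... remaining 29 true positions from Table 5 ...
])
rephat_xyz = np.array([
    [1.953, 2.000, 1.662],
    [1.966, 1.998, 1.757],
    # ... remaining 29 Re-PHAT estimates from Table 5 ...
])

# Mean absolute error along each axis, averaged over all measurement points
mae_x, mae_y, mae_z = np.mean(np.abs(rephat_xyz - true_xyz), axis=0)
print(mae_x, mae_y, mae_z)  # with all 31 rows: ~0.1453, ~0.1952, ~0.1975
```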
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
