Research on Evaluating the Filtering Method for Broiler Sound Signal from Multiple Perspectives

Simple Summary Broiler sound signals can reflect their own health status, such as crowing when they are hungry, coughing when they are in pain or ill, purring when there are foreign bodies in their throat, and flapping wings when they are fighting with each other. Many studies have been carried out to improve animal welfare based on this. However, the reality is that in addition to the emotional sounds mentioned above, there still exists back noise in the broiler sound signal collected in the farm, such as the noise of people walking and ventilation equipment. Therefore, it is necessary to adopt effective signal filtering methods to filter out these noises, and further obtain high-quality broiler sound signals for animal welfare research. In this study, five of the most classic and effective signal filtering methods were used to filter the back noise in broiler sound signals, and different evaluation indicators from two angles were used for “scoring” the filtering effect of each method in order to select the outstanding performers. These studies have laid the foundation for the follow-up study of broiler health monitoring. Abstract Broiler sounds can provide feedback on their own body condition, to a certain extent. Aiming at the noise in the sound signals collected in broiler farms, research on evaluating the filtering methods for broiler sound signals from multiple perspectives is proposed, and the best performer can be obtained for broiler sound signal filtering. Multiple perspectives include the signal angle and the recognition angle, which are embodied in three indicators: signal-to-noise ratio (SNR), root mean square error (RMSE), and prediction accuracy. The signal filtering methods used in this study include Basic Spectral Subtraction, Improved Spectral Subtraction based on multi-taper spectrum estimation, Wiener filtering and Sparse Decomposition using both thirty atoms and fifty atoms. In analysis of the signal angle, Improved Spectral Subtraction based on multi-taper spectrum estimation achieved the highest average SNR of 5.5145 and achieved the smallest average RMSE of 0.0508. In analysis of the recognition angle, the kNN classifier and Random Forest classifier achieved the highest average prediction accuracy on the data set established from the sound signals filtered by Wiener filtering, which were 88.83% and 88.69%, respectively. These are significantly higher than those obtained by classifiers on data sets established from sound signals filtered by other methods. Further research shows that after removing the starting noise in the sound signal, Wiener filtering achieved the highest average SNR of 5.6108 and a new RMSE of 0.0551. Finally, in comprehensive analysis of both the signal angle and the recognition angle, this research determined that Wiener filtering is the best broiler sound signal filtering method. This research lays the foundation for follow-up research on extracting classification features from high-quality broiler sound signals to realize broiler health monitoring. At the same time, the research results can be popularized and applied to studies on the detection and processing of livestock and poultry sound signals, which has extremely important reference and practical value.


Introduction
Relevant studies have shown that animal sounds contain a lot of emotional information, such as crowing when they are hungry [1], coughing when they are in pain or ill [2,3], purring when there are foreign bodies in their throat, and flapping wings when they are fighting with each other [4]. Therefore, sounds can be used to obtain feedback on the animal's own body conditions and emotion changes and can be used as an auxiliary method to evaluate animal welfare [5]. Compared with traditional physiological index detection, such as obtaining animal blood for detection, the sound detection method has the advantages of no stress, no contact, and continuous collection. Therefore, researchers can evaluate the comfort level of animals' living environment and health status by collecting their sounds in the farm and carry out further measures to improve animal welfare [6,7]. Many researchers have carried out related research on this. For example, Marx G et al. [8] studied chicks' voices in their grouping process, including distress calls, short peeps, warblers, and pleasure notes, and found that the chicks' voices would change with their growth environment and social skills. Aydin, A. et al. [9] studied broiler feed intake and designed an advanced sound detection system for detecting broilers' continuous pecking behavior. The system used real-time sound processing technology to accurately measure the broilers' feed intake at the group level. Carpentier, Lenn et al. [10] studied chicken sneezes under different noise sources. By extracting the features that can characterize a chicken's sneezes, the number of sneezes in a fixed-duration chicken sound signal can be monitored, and then the chicken's health can be obtained. Yang et al. [11] studied sounds from different sources in commercial broiler houses. By using the Maximum Overlap Discrete Wavelet Transform (MODWT), the Fast Fourier Transform (FFT) method, and the Gaussian regression model, the sound frequency ranges of six common sounds, including bird vocalization, fan, feed system, heater, flapping wing, and dustbathing, were determined. Du et al. [12] developed a monitoring system by using a Kinects microphone array. It is a feasible method to detect the abnormal situation of poultry at night by monitoring the number and regional distribution of sounds. Huang et al. [13] proposed a new detection method for bird flu based on audio analysis. By analyzing and processing the frequency domain of the extracted broiler sound signal, the Mel-Frequency Cepstral Coefficients (MFCC) was determined to be used as the distinguishing standard of healthy and infected broilers, and a parameter-optimized Support Vector Machine (SVM) was used to quickly predict broiler health.
It can be seen that broiler sound signals are the basis of various studies. Therefore, obtaining high-quality broiler sound signals is very important to carry out research on broilers' growth condition and health status. Many researchers have used various signal filtering methods to process the sound signals collected in the farms in order to obtain a reliable data source for following research. For example, in his research mentioned above, Carpentier, Lenn et al. [10] used Spectral Subtraction to filter the sound signal before extracting features from broilers' sneezes. Cheng et al. [14] proposed a new broiler sound recognition method based on Sparse Representation. Specifically, it used Sparse Representation to reconstruct the sound signal, to complete signal filtering and feature extraction, and also used the SVM as the classification model to recognize the broiler sound in different environments. Aiming at the noise generated by fans and other ventilation equipment in the perch breeding mode, Yu et al. [15] used a variety of signal filtering methods to remove fan noise from the laying hens' sound signal and used a pattern recognition algorithm to identify the specific sound type from the filtered sound signal.
The research of broiler sound signal filtering method is still being explored. In recent years, the author's research group has been committed to introducing machine learning methods into the research of signal classification and recognition [16][17][18][19][20][21], and has further applied the research results to conduct research on broiler health monitoring, which realized effective recognition of a fixed-duration sound signal collected in the broiler farm by constructing a classifier with good performance; i.e., one that is able to identify the specific sound types and their numbers in a fixed-duration sound signal, including crows, coughs, purrs, and flapping wings. By calculating the ratio of the number of coughs to the number of all sound types, the cough rate can be obtained. Then, we can judge the broilers' health in the current area by comparing the value of the cough rate. It can be seen that the prediction performance of the classifier directly determines the judgment of the broilers' health. The data set for training the classifier is extracted from a large number of broiler sound signals. Therefore, the quality of the sound signal determines the quality of the data set, and the quality of the data set affects the prediction performance of the classifier. It can be seen that the broiler sound signal filtering method proposed in this study is an important element which is used to efficiently filter the collected sound signals to form a high-quality data set. In addition, this research combined the signal-to-noise ratio (SNR) and the root mean square error (RMSE) to comprehensively evaluate the processing effects of different methods from the signal angle, and prediction accuracy was used to evaluate their processing effects from the recognition angle. Finally, both the signal angle and the recognition angle were considered comprehensively, and the best broiler sound signal filtering method was obtained.

Methods
On the basis of summarizing the previous studies and consulting a large amount of literatures, this article took five signal filtering methods as the research objects, including Basic Spectral Subtraction, Improved Spectral Subtraction based on multi-taper spectrum estimation, Wiener filtering, and Sparse Decomposition using thirty atoms and fifty atoms [22][23][24][25][26], to evaluate their filtering effect on broiler sound signal from multiple perspectives. In this chapter, the author briefly describes the principles of the five methods.

Basic Spectral Subtraction
Basic Spectral Subtraction is a signal filtering method introduced in the early days. It is based on the assumption of additive noise and obtains a pure signal by subtracting noise components from a noisy signal [27]. Suppose that the time sequence of the sound signal is x(n), the i-th frame signal after windowing and framing is x i (m), and the frame length is N. Any frame signal x i (m) after Fast Fourier Transform (FFT) is: Calculate the absolute value of X i (k), obtain the amplitude of each frame signal as |X i (k) |, and then calculate its phase angle as: Save the amplitude and phase angle of each frame signal. Knowing that the duration of the preamble silent frame (noise segment) is IS, and the corresponding frame number is NIS, the average energy value of the noise segment can be calculated as: The calculation formula of Basic Spectral Subtraction is further obtained as: In the formula, a and b are two constants; a is an over-subtraction factor, and b is a gain compensation factor.
After the Spectral Subtraction is obtained, the amplitude of each frame signal is |X i (k)|, and the phase angle of each frame signal is X i angle (k). By performing Inverse Fast Fourier Transform (IFFT) on |X i (k)| and X i angle (k), the signal sequence after Basic Spectral Subtraction filtering can be obtained asx i (m). Among them, the insensitivity of sound signal to the phase is used, and the phase angle information before spectral subtraction is directly used in the signal after spectral subtraction. The principle of Basic Spectral Subtraction is shown in Figure 1. sound signal to the phase is used, and the phase angle information before spectral subtraction is directly used in the signal after spectral subtraction. The principle of Basic Spectral Subtraction is shown in Figure 1.

Improved Spectral Subtraction
In Basic Spectral Subtraction, the periodogram method is commonly used when estimating the power spectrum of the signal, while the traditional periodogram method uses only one data window [28]. However, in the process of using this method, it will inevitably bring about a large estimation deviation and estimation variance, resulting in the enhanced signal containing large amounts of noise with certain rhythmic fluctuations and sounds similar to music [29]. For this reason, Thomson [30] proposed an improved multitaper spectrum estimation method in 1982, it used multiple orthogonal data windows to obtain a direct spectrum for the same data sequence, and then averaged them to obtain the spectrum estimate, and thus a smaller estimate variance could be obtained. Therefore, the multi-taper spectrum is a more accurate spectrum estimation method than the periodogram method, and using Improved Spectral Subtraction method based on it can effectively reduce the "music noise" in the signal.
The definition of the multi-taper spectrum is: In the formula, L is the number of data windows and is the spectrum of the − ℎ data window, which is defined as: In the formula, ( ) is the signal sequence and N is the sequence length. ( ) is the − ℎ data window function, which meets the mutual orthogonality between data windows.
Among them, the data window is a set of orthogonal discrete ellipsoid sequences, also called Slepian windows. The specific steps of Improved Spectral Subtraction based on multi-taper spectrum estimation are as follows.
Step 1: As known that the time sequence of the sound signal is ( ), the − ℎ frame signal after windowing and framing is ( ), and there is overlap between adjacent frames.
Step 2: Perform FFT on the frame signal ( ), calculate the amplitude spectrum | ( )| and phase spectrum ( ) respectively, and perform smoothing between adjacent frames to calculate the average amplitude spectrum | ( )| as:

Improved Spectral Subtraction
In Basic Spectral Subtraction, the periodogram method is commonly used when estimating the power spectrum of the signal, while the traditional periodogram method uses only one data window [28]. However, in the process of using this method, it will inevitably bring about a large estimation deviation and estimation variance, resulting in the enhanced signal containing large amounts of noise with certain rhythmic fluctuations and sounds similar to music [29]. For this reason, Thomson [30] proposed an improved multitaper spectrum estimation method in 1982, it used multiple orthogonal data windows to obtain a direct spectrum for the same data sequence, and then averaged them to obtain the spectrum estimate, and thus a smaller estimate variance could be obtained. Therefore, the multi-taper spectrum is a more accurate spectrum estimation method than the periodogram method, and using Improved Spectral Subtraction method based on it can effectively reduce the "music noise" in the signal.
The definition of the multi-taper spectrum is: In the formula, L is the number of data windows and S mt is the spectrum of the k-th data window, which is defined as: In the formula, x(n) is the signal sequence and N is the sequence length. a k (n) is the k-th data window function, which meets the mutual orthogonality between data windows.
Among them, the data window is a set of orthogonal discrete ellipsoid sequences, also called Slepian windows. The specific steps of Improved Spectral Subtraction based on multi-taper spectrum estimation are as follows.
Step 1: As known that the time sequence of the sound signal is x(n), the i-th frame signal after windowing and framing is x i (m), and there is overlap between adjacent frames.
Step 2: Perform FFT on the frame signal x i (m), calculate the amplitude spectrum |X i (k)| and phase spectrum θ i (k) respectively, and perform smoothing between adjacent frames to calculate the average amplitude spectrum |X i (k)| as: Take the i-th frame signal as the center, and take M frame signals before and after, so there are 2M + 1 frame signals for mean value calculation. In practice, M is usually taken as one, that is, the average is carried out in three frame signals.
Step 3: Perform multi-taper spectrum estimation on the frame signal x i (m) to obtain the multi-taper spectrum power spectrum density P(k, i) (i represents the i-th frame signal, k represents the k-th spectral line): In the formula, PMTM is the multi-taper spectrum power spectral density estimation of the signal.
Step 4: The multi-taper spectrum power spectrum density estimation value is also smoothed between adjacent frames, and the smooth power spectrum density P y (k, i) is calculated as: Similarly, take the i-th frame signal as the center, and take M frame signals before and after, so there are 2M + 1 frame signals for mean calculation. In practice, M is usually taken as one, that is, the average is carried out in three frame signals.
Step 5: Knowing that the duration of the preamble silent frame (noise segment) is IS, and the corresponding frame number is NIS, the average power spectral density value of the noise can be calculated as P n (k). The calculation formula is: Step 6: The gain factor g(k, i) is calculated by using the spectral subtraction relationship as: In the formula, α and β are two constants, α is an over-subtraction factor, and β is a gain compensation factor. Properly selected, α can effectively remove music noise, but too large a value of α will cause sound distortion.
Step 7: Through the gain factor g(k, i) and the average amplitude spectrum |X i (k)|, the amplitude spectrum after spectrum subtraction can be obtained as: Combine the subtracted amplitude spectrum |X i (k)| with the phase spectrum θ i (k) in step 2 to perform IFFT. Restore the frequency domain |X i (k)| to the time domain to obtain the noise-reduced sound signalx i (m) as: The principle of Improved Spectral Subtraction is shown in Figure 2.

Wiener Filtering
Wiener filtering is based on the statistical independence of sound and noise, and uses the Minimum Mean Square Error (MMSE) criterion for signal filtering [31,32]. Suppose the noisy sound signal is: In the formula, ( ) and ( ) represent pure sound signal and noise, respectively. Wiener filtering is to design a filter ℎ( ) for which, when the input is ( ), the output is: In the formula, ( ) minimizes the mean square error = [{ ( ) −( )} ] of ( ) and ( ) according to MMSE.
According to the principle of orthogonality, the best ℎ( ) must satisfy Equation (17) for all : Incorporating Equation (16) into Equation (17) and taking the Fourier Transform, the spectrum estimator of Wiener filtering can be derived as: In the formula, ( ) is the power spectral density of sound signal, and ( ) is the power spectral density of noise. The signal filtering is realized by minimizing the mean square error of | ( ) − ( )| . The above-mentioned Wiener filtering process (input-output relationship) is shown in Figure 3. Based on this, the specific implementation steps of Wiener filtering are as follows:

Wiener Filtering
Wiener filtering is based on the statistical independence of sound and noise, and uses the Minimum Mean Square Error (MMSE) criterion for signal filtering [31,32]. Suppose the noisy sound signal is: In the formula, s(n) and d(n) represent pure sound signal and noise, respectively. Wiener filtering is to design a filter h(n) for which, when the input is y(n), the output is: In the formula,ŝ(n) minimizes the mean square error ε = E[{s(n) −ŝ(n)} 2 ] of s(n) and d(n) according to MMSE.
According to the principle of orthogonality, the best h(n) must satisfy Equation (17) for all m: Incorporating Equation (16) into Equation (17) and taking the Fourier Transform, the spectrum estimator of Wiener filtering can be derived as: In the formula, P s (k) is the power spectral density of sound signal, and P d (k) is the power spectral density of noise. The signal filtering is realized by minimizing the mean square error of |x(n) −x(n)| 2 . The above-mentioned Wiener filtering process (input-output relationship) is shown in Figure 3.

Wiener Filtering
Wiener filtering is based on the statistical independence of sound and noise, and uses the Minimum Mean Square Error (MMSE) criterion for signal filtering [31,32]. Suppose the noisy sound signal is: In the formula, ( ) and ( ) represent pure sound signal and noise, respectively. Wiener filtering is to design a filter ℎ( ) for which, when the input is ( ), the output is: In the formula, ( ) minimizes the mean square error = [{ ( ) −( )} ] of ( ) and ( ) according to MMSE.
According to the principle of orthogonality, the best ℎ( ) must satisfy Equation (17) for all : Incorporating Equation (16) into Equation (17) and taking the Fourier Transform, the spectrum estimator of Wiener filtering can be derived as: In the formula, ( ) is the power spectral density of sound signal, and ( ) is the power spectral density of noise. The signal filtering is realized by minimizing the mean square error of | ( ) − ( )| . The above-mentioned Wiener filtering process (input-output relationship) is shown in Figure 3. Based on this, the specific implementation steps of Wiener filtering are as follows: Based on this, the specific implementation steps of Wiener filtering are as follows: Step 1: As known that the time sequence of the sound signal is x(n), the i-th frame signal after windowing and framing is x i (m), and there is overlap between adjacent frames as well.
Step 2: Perform FFT on the frame signal x i (m), calculate the amplitude spectrum |X i (k)| and phase spectrum θ i (k), respectively, save them, and further calculate the power spectrum as: Step 3: Knowing that the duration of the preamble silent frame (noise segment) is IS, and the corresponding frame number is NIS, the average power spectral density of the noise segment can be calculated as P n (k), and the average noise amplitude spectrum can be calculated as |X n (k, i)|: Step 4: We use endpoint detection to obtain segment and non-segment frames. For segment frames, we modify the average noise power spectrum (variance) and the average noise amplitude spectrum. Among them, the average power spectrum of noise is λ d (k). For non-segment frames, the prior SNRξ i (k) and the posterior SNR γ i (k) are calculated first, and then the transfer function H i (k) of Wiener filtering is determined.
In the formula, Y i (k) is the spectral value of the noisy signal at the corresponding frequency point.ξ In the formula, Step 5: Calculate the outputŜ i (k) of each frame signal after Wiener filtering as: Step 6: After performing Inverse Fourier Transform (IFT) on the amplitude spectrum S i (k) and the phase spectrum θ i (k), the sound signal restored to the time domain can be obtained, which is the filtered signalx i (m):

Sparse Decomposition
Analyzing from the perspective of sparse decomposition, the noisy signal contains two parts, namely the signal and the noise, and the signal is the sparse component in the noisy signal. The signal has a certain structure, and its structural characteristics are consistent with the atomic characteristics. While noise is random and uncorrelated, so it has no structural characteristics [33]. If the meaningful (the enough larger energy) atoms can be extracted from the signal, the extracted part will be used as the signal. If we cannot continue to extract meaningful atoms from the signal residual, it is considered that all the signal residuals are noises.
In the iterative process of signal sparse decomposition, each decomposition iterative process selects the atomic vector g r k with the largest inner product of the signal or signal residual R k f . This is actually equivalent to the maximum correlation between the atomic vector g r k and the signal or residual signal R k f , and g r k is the closest to R k f . Sparse Decomposition is the process of continuously tracking and extracting the atomic vector that best matches the original signal and its residual signal. These extracted atomic vectors can be regarded as the components of the original signal.
The key to performing signal filtering by using Sparse Decomposition is how to determine what kind of atom represents the signal, which is meaningful. In the signal sparse decomposition process based on the Matching Pursuit (MP) algorithm, this problem is the determination of the iterative stop condition. Generally speaking, there are two ways to determine the stop of iteration. One method is to use hard thresholds. That is, the iterative stop condition of Sparse Decomposition is set as the upper limit M of the number of iterations, and then the linear combination of M atoms is used to approximate the original noisy signal, and the approximate representation of the signal is taken as the denoised signal of the original noisy signal, and the residual signal is used as noise. Another method is the coherence ratio threshold method. That is, the sparse decomposition iteration stop condition is set as a lower limit of the coherence ratio [34].
The Orthogonal Matching Pursuit (OMP) algorithm based on a Genetic Algorithm (GA), which is continuously developed on the basis of the MP algorithm, was determined as Sparse Decomposition method specifically used in this article, and the first iterative stopping method was also determined. Through the summary of a large number of previous experiments, Sparse Decomposition method of using thirty atoms and fifty atoms was determined to achieve better filtering effects than using other atom numbers. The process of Sparse Decomposition by using GA-OMP is shown in Figure 4. that best matches the original signal and its residual signal. These extracted atomic vectors can be regarded as the components of the original signal.
The key to performing signal filtering by using Sparse Decomposition is how to determine what kind of atom represents the signal, which is meaningful. In the signal sparse decomposition process based on the Matching Pursuit (MP) algorithm, this problem is the determination of the iterative stop condition. Generally speaking, there are two ways to determine the stop of iteration. One method is to use hard thresholds. That is, the iterative stop condition of Sparse Decomposition is set as the upper limit M of the number of iterations, and then the linear combination of M atoms is used to approximate the original noisy signal, and the approximate representation of the signal is taken as the denoised signal of the original noisy signal, and the residual signal is used as noise. Another method is the coherence ratio threshold method. That is, the sparse decomposition iteration stop condition is set as a lower limit of the coherence ratio [34].
The Orthogonal Matching Pursuit (OMP) algorithm based on a Genetic Algorithm (GA), which is continuously developed on the basis of the MP algorithm, was determined as Sparse Decomposition method specifically used in this article, and the first iterative stopping method was also determined. Through the summary of a large number of previous experiments, Sparse Decomposition method of using thirty atoms and fifty atoms was determined to achieve better filtering effects than using other atom numbers. The process of Sparse Decomposition by using GA-OMP is shown in Figure 4. When GA-OMP performs signal reconstruction, in order to better describe the timevarying characteristics of the signal, a complete time frequency atomic dictionary is usually used. Since the Gabor dictionary has better time frequency characteristics, we chose it as the over-complete dictionary in this article. The formula is: When GA-OMP performs signal reconstruction, in order to better describe the timevarying characteristics of the signal, a complete time frequency atomic dictionary is usually used. Since the Gabor dictionary has better time frequency characteristics, we chose it as the over-complete dictionary in this article. The formula is: In the formula, g r (t) = e −πt 2 is the Gaussian window function, γ = (s, µ, v, ω) is the time frequency parameter, s is the expansion factor, µ is the translation factor, v is the frequency factor, and ω is the phase factor.

Verification and Analysis
The broiler sound signals were collected in the No. This area is about two meters away from the broiler colony breeding area. It should be noted that because the sound pitch of broilers is small, and multiple broilers will vocalize at the same time, it is difficult to directly use the audio collection system to collect the broiler sound signals in the broiler group breeding area. Most of these sound signals overlap, which is not conducive to the development of this research. Therefore, the author built an experimental area far away from the broiler group breeding area for the successful collection of broiler sound signals. In this way, the problem of overlap broiler sound signals had been solved, and the built experimental area under the same noisy environment. The noise here includes the sound made by breeders walking in the walkway, and by the vertically placed ventilation equipment.
The specific breed of broiler in the farm is white-feather broiler. Generally, their incubation period is twenty-one days (about three weeks) and the growth period is fifty days (about seven weeks). The test subjects selected by the author are four white-feather broilers of the same batch of eight weeks old, of which two are healthy and the other two are diseased. The author put them into the experimental area one by one and used the audio collection system to collect sound signals. The audio collection system used the chassis of National Instruments Co., Ltd. (Austin, TX, U.S.), and its specific model is PXI-1050. For the chassis, the model of the internal controller is PXI-8196, the model of the sound capture card is NI4472B (8-channels synchronous acquisition, 24 bits resolution, 102.4 kS/s sampling frequency). The sound sensor selects the model of MPA201 from BSWA TECHNOLOGY CO., LTD (Beijing, China), the response frequency is 20 Hz to 20,000 Hz, and the sensitivity is 50 mV/Pa. The sound recording software chooses NI Sound and Vibration Assistant 2010, the sampling frequency is 32 kHz and the sampling accuracy is 16 bits, and monophonic acquisition can be performed. The audio collection system collects fixed-duration sound signals in a time unit of five minutes, the interval between two adjacent sound signals is thirty seconds, and the collected sound signals are stored in the computer in the "*.tdms" format. From reference [35], the specific sound collection and procession information mentioned above can be seen. For each five-minute sound signal, the author performed pulse extraction and endpoint detection in MATLAB. By setting the detection threshold to be 1.5 times the total energy of the leading non-segment frame, the effective signal pulse can be extracted. Through the artificial recognition of the broiler sound signals saved after pulse extraction and endpoint detection, the specific sound types that can be detected include crows, coughs, purrs, and flapping wings. Among them, the crow is the natural short cheeping during the growth of healthy broilers, and this sound is relatively loud and sharp. The cough is the prolonged abnormal croak of diseased broilers, and this sound is relatively low. The purr is the snoring sound made when the broiler is asleep, and this sound is continuously fluctuating. The flapping wings sound is produced by inciting the friction and vibration between the wings and the air when the broiler is moving. The sound amplitude is large and the continuous time is long.
The author chose ten sound signals with better recording effects (that is, the sound signal contains as many crows, coughs, purrs, and flapping wings as possible), and used the audio processing software Adobe Audition 14.0 to intercept the ten-second-duration sound signals from each of the selected ten sound signals, which were used as the data source for the verification and analysis of this research. On this basis, the author designed the software programs of Basic Spectral Subtraction, Improved Spectral Subtraction based on multi-taper spectrum estimation, Wiener filtering, Sparse Decomposition using thirty atoms and fifty atoms, running on MATLAB, and processed the intercepted ten sound signals separately. The sound signals processed by the five signal filtering methods were evaluated from both the signal angle and the recognition angle, and the best-performing method was obtained through comprehensive analysis.

Multiple Perspective Evaluation Indicators
From the signal angle, this article mainly selects two indicators, signal-to-noise ratio (SNR) and root mean square error (RMSE), to evaluate the processing effects of five filtering methods. Suppose the noisy audio signal is x(n) = s(n) + d(n), where x(n) is the sound signal captured by voice recorder, and s(n) and d(n) represent the pure signal and noise, respectively. Then, the SNR refers to the ratio of signal to noise in a fixed-duration sound signal. Often, the larger the SNR of the sound signal processed by a certain signal filtering method, the better the processing effect of the current method. Its expression is: In the formula, ∑ N n=0 s(n) 2 represents the energy of the sound signal, ∑ N n=0 [x(n) − s(n)] 2 represents the energy of the noise.
The RMSE refers to the root mean square of the variance between the sound signal before and after signal filtering. Often, the smaller the RMSE of the sound signal processed by a certain signal filtering method, the closer the filtered sound signal is to the original noise-free signal and the better the processing effect of the current method. Its expression is: In the formula, x(n) is the sound signal before filtering, and f (n) is the sound signal after filtering.
From the recognition angle, this article extracted 39-dimensional MFCC features from the sound signals processed by five signal filtering methods, thereby we established five data sets for training classifiers accordingly. In this article, the kNN classifier and the Random Forest classifier under the default parameter configuration were used to make predictions on the five data sets separately, and five prediction accuracies were obtained. Therefore, on which data set the classifier achieved the highest prediction accuracy indicated that the overall quality of that data set was the best, further indicating that the quality of the sound signal that constitutes the data set is the best, and the processing effect of the corresponding signal filtering method is the best.
Suppose the data set is D = {(x 1 , y 1 ), (x 2 , y 2 ), · · · , (x m , y m )}, where y i is the true label corresponding to the data x i , and f (x i ) is the label predicted by the classifier f . The prediction accuracy can be expressed as the ratio of the number of data that are correctly predicted to the total number of data, that is: In the formula, I is the indicator function, and when f (x i ) = y i , I( f (x i ) = y i ) = 1.

Signal Angle
The author designed the software programs of Basic Spectral Subtraction, Improved Spectral Subtraction based on multi-taper spectrum estimation, Wiener filtering, and Sparse Decomposition running in MATLAB. Among them, the software program of Sparse Decomposition method contains using thirty atoms and fifty atoms. It should be noted that the selection of thirty atoms and fifty atoms is based on the author's extensive research in the previous period. In this article, five signal filtering methods were performed on the same intercepted ten sound signals, and the SNR and RMSE of the processed sound signals were calculated. The calculation results are shown in Table 1. In Table 1, the BSS represents Basic Spectral Subtraction, the ISS represents Improved Spectral Subtraction based on multi-taper spectrum estimation, the WF represents Wiener filtering, and the SD represents Sparse Decomposition using different atoms. The above abbreviations will not be repeated in the following table.
It can be seen from Table 1 that the SNR obtained on the sound signal that was filtered by using Improved Spectral Subtraction is greater than that obtained by using Basic Spectral Subtraction. The RMSE obtained on the sound signal that was filtered by using Improved Spectral Subtraction is slightly smaller than that obtained by using Basic Spectral Subtraction. Considering that the larger the SNR or the smaller the RMSE of the sound signal before and after signal filtering, the better the filtering effect, from the signal angle, the filtering effect of Improved Spectral Subtraction is therefore better than that of Basic Spectral Subtraction, which is in line with theoretical reasoning. At the same time, the SNR obtained on the sound signal that was filtered by using fifty-atom Sparse Decomposition is greater than that obtained by using thirty-atom Sparse Decomposition. The RMSE obtained on the sound signal that was filtered by using fifty-atom Sparse Decomposition is smaller than that obtained by using thirty-atom Sparse Decomposition. It is also concluded that fifty-atom Sparse Decomposition is better than thirty-atom Sparse Decomposition. Considering from the principle perspective, this shows that, while using thirty atoms, it not only sparse the noise in the noisy signal, but also sparse a part of the original noise-free signal. The SNR obtained on the sound signal that was filtered by using Wiener filtering is between the two Spectral Subtraction methods and the two Sparse Decomposition methods. From comprehensive analysis of the signal angle, Improved Spectral Subtraction based on multi-taper spectrum estimation has achieved the largest SNR and the smallest RMSE, indicating that this method has achieved the best filtering effect.

Recognition Angle
As mentioned above, five signal filtering methods were performed on the same intercepted ten sound signals, but we no longer calculate their SNR and RMSE at this time. Instead, for each processed sound signal, the author first performed framing processing on it, and then combined the endpoint detection method to obtain the start frame and end frame of one single specific sound type in one sound signal, and the combination of the start frame and the end frame of multiple specific sound types in one sound signal can be finally obtained. For example, after the sound signal framing processing and endpoint detection is completed, the 10-th frame is the start frame of the crow, and the 55-th frame is the end frame. This is the first specific sound type in this sound signal. It can also be obtained that the 67-th frame is the start frame of the cough, and the 135-th frame is the end frame. This is the second specific sound type in this sound signal. By analogy, multiple specific sound types in this sound signal can be obtained. For each specific sound type, pieces of feature data can be extracted from all signal frames between its start frame and end frame. Therefore, a total of forty-six feature data can be extracted from the 10-th frame to the 55-th frame, and the label of the data is "crow", and a total of sixty-eight feature data can be extracted from 67-th frame to 135-th frame, and also the label of the data is "cough". By analogy, feature data corresponding to multiple specific sound types in this sound signal can be obtained. By merging the feature data of the same label, multiple feature data corresponding to the four labels (crow, cough, purr, and flapping wing) can be extracted from one intercepted sound signal. Similarly, for ten intercepted sound signals processed by the same filtering method, multiple feature data corresponding to four labels can also be extracted from the ten processed sound signals to form the data set corresponding to the current filtering method. Therefore, this research can finally form five data sets corresponding to five signal filtering methods, and their specific description is shown in Table 2. It should be noted that the label of the data is marked by manual identification. Specifically, we invited experienced breeders to identify and record the specific sound types that appear in chronological order in the ten sound signals to ensure the reliability of data labeling. It should be noted that the features extracted from each frame signal are the 39-dimensional MFCC features. The MFCC used here is a cepstrum parameter extracted from the Mel-Frequency domain. It first maps the linear spectrum of the signal to the Mel nonlinear spectrum based on auditory perception, and then converts it to the cepstrum [36,37]. Equation (29) is to convert from frequency to Mel scale, and Equation (30) is to return from Mel scale to frequency. The block diagram of MFCC feature extraction is shown in Figure 5.
f mel = 2595log 10 (1 + f /700) (30) f = 700 10 f mel  As shown in Figure 5, we generally take the 1 − to 13 − ℎ cepstrum coefficients after Discrete Cosine Transform (DTC) as the standard 13-dimensional MFCC parameters, which reflect the static characteristics of the sound signal. The dynamic characteristics of the sound signal can be obtained by the difference of the static characteristics. The firstorder difference of static characteristics reflects the speed of the sound signal change, while the second-order difference reflects the acceleration of the sound signal change. We combined the standard MFCC parameters with the first-order difference and the secondorder difference to obtain a total of 39-dimensional MFCC features.
We used the kNN classifier and Random Forest classifier under the default parameter configuration to make predictions on the above five data sets respectively. In order to obtain relatively accurate prediction results, all experimental tests were run on the same computer. Each data set was performed ten predictions, and the average of the ten predictions was taken as the final prediction result to reduce the random influence. The final prediction results are shown in Table 3. It is worth noting that because the kNN classifier and the Random Forest classifier had achieved significant prediction effects in the entire research system that this research belongs to, this article selected them to make predictions on each data set from the recognition angle. Specifically, the kNN classifier adopts the idea of majority voting in principle, and it has achieved the best overall prediction performance in the entire research system, while the Random Forest classifier adopts the ideal of ensemble learning, and has also achieved better prediction performance. This article comprehensively analyzes the prediction results of the two, and evaluates the processing effects of the five signal filtering methods from the recognition angle. The specific descriptions of the prediction results achieved by the two classifiers on the five data sets are shown in Tables 4 and 5.  As shown in Figure 5, we generally take the 1-st to 13-th cepstrum coefficients after Discrete Cosine Transform (DTC) as the standard 13-dimensional MFCC parameters, which reflect the static characteristics of the sound signal. The dynamic characteristics of the sound signal can be obtained by the difference of the static characteristics. The firstorder difference of static characteristics reflects the speed of the sound signal change, while the second-order difference reflects the acceleration of the sound signal change. We combined the standard MFCC parameters with the first-order difference and the secondorder difference to obtain a total of 39-dimensional MFCC features.
We used the kNN classifier and Random Forest classifier under the default parameter configuration to make predictions on the above five data sets respectively. In order to obtain relatively accurate prediction results, all experimental tests were run on the same computer. Each data set was performed ten predictions, and the average of the ten predictions was taken as the final prediction result to reduce the random influence. The final prediction results are shown in Table 3. It is worth noting that because the kNN classifier and the Random Forest classifier had achieved significant prediction effects in the entire research system that this research belongs to, this article selected them to make predictions on each data set from the recognition angle. Specifically, the kNN classifier adopts the idea of majority voting in principle, and it has achieved the best overall prediction performance in the entire research system, while the Random Forest classifier adopts the ideal of ensemble learning, and has also achieved better prediction performance. This article comprehensively analyzes the prediction results of the two, and evaluates the processing effects of the five signal filtering methods from the recognition angle. The specific descriptions of the prediction results achieved by the two classifiers on the five data sets are shown in Tables 4 and 5.  It can be seen from the above table that, when the kNN classifier and the Random Forest classifier were used to make predictions on the five data sets, respectively, the average prediction accuracy achieved by the two classifiers on the data set established from the sound signal filtered by Improved Spectral Subtraction are both higher than that achieved by Basic Spectral Subtraction, which are about 0.6% and 1.3% higher, respectively. The average prediction accuracy achieved by the two classifiers on the data set established from the sound signal that filtered by the thirty-atom Sparse Decomposition is equivalent to the fifty-atom Sparse Decomposition. The two classifiers both achieved the highest average prediction accuracy on the data set established from the sound signal that was filtered by Wiener filtering, being respectively 88.83% and 88.69%, which are significantly higher than that achieved by other four signal filtering methods. Therefore, from the recognition angle, Improved Spectral Subtraction based on multi-taper spectrum estimation is better than Basic Spectral Subtraction, which is consistent with the conclusion drawn from the signal angle. In addition, Wiener filtering has achieved the best filtering effect, and the obtained prediction accuracy was about 10% higher than other four signal filtering methods.

Comprehensive Analysis
On the basis of analyzing and drawing preliminary conclusions from the signal angle and the recognition angle, respectively, we have to conduct a comprehensive analysis. From the signal angle, Improved Spectral Subtraction based on multi-taper spectral estimation has achieved the best signal filtering effect, while from the recognition angle, Wiener filtering has achieved the best signal filtering effect. Considering that Wiener filtering has achieved obvious performance advantages from the recognition angle, but its performance is very general from the signal angle, we have to consider it. We take one sound signal processed by Wiener filtering to draw a time-domain signal diagram, and intercept one specific sound type to draw the time-domain signal diagram, as shown in Figure 6. calculated SNR was higher than Improved Spectral Subtraction, and the newly calculated RMSE was close to Improved Spectral Subtraction and was significantly better than the other four signal filtering methods. Considering that the larger the SNR or the smaller the RMSE of the sound signal before and after signal filtering, the better the filtering effect, at this time we can conclude that the filtering effect achieved by Wiener filtering and Improved Spectral Subtraction was equivalent from the signal angle. Therefore, when we comprehensively considered from both the signal angle and the recognition angle, the processing effect of Wiener filtering is the best. Moreover, the entire research system described in this study is ultimately aimed towards building a classifier with good prediction performance to predict specific sound types and their numbers in a fixed-duration sound signal, while Wiener filtering has an absolute performance advantage when comparing from the recognition angle. Finally, this research determined that Wiener filtering was the best signal filtering method in this article.
(a) Time-domain waveform after Wiener filtering (b) Time-domain waveform of one specific sound type

Discussion
This research has made a comprehensive consideration from both the signal angle and the recognition angle, and found that Wiener filtering is the best method for broiler sound signal filtering. In fact, in the signal processing field, adaptive filtering methods are often used to filter noisy signals. This research also used this to perform exploratory research in an early stage, but the actual filtering effect is not good. Specifically, when eval- Therefore, when we calculate the SNR and RMSE of the sound signal processed by Wiener filtering, the calculation result was incorrect. This is also the reason why the processing effect of Wiener filtering is general from the signal angle. It should be noted that there are two main reasons for this noise. The first aspect is that the estimated value given by Wiener filtering spectrum estimator at the initial time is deviated from the true value, so a period of error noise will be generated. The second aspect is that the working principle of the Wiener filtering spectrum estimator is constantly adjusting the initial estimated value of the signal to approximate the true value. However, at the initial time, it is in a state of adjustment, which is unbalanced, and this noise is also a manifestation of the initial unstable state. In the signal processing field, this noise is usually not taken into consideration. Therefore, we filtered out the leading noise in the sound signal after Wiener filtering and recalculated the SNR and RMSE of the intercepted ten sound signals. Finally, the new average SNR was 5.6108, and the new RMSE was 0.0551. It can be seen that the newly calculated SNR was higher than Improved Spectral Subtraction, and the newly calculated RMSE was close to Improved Spectral Subtraction and was significantly better than the other four signal filtering methods. Considering that the larger the SNR or the smaller the RMSE of the sound signal before and after signal filtering, the better the filtering effect, at this time we can conclude that the filtering effect achieved by Wiener filtering and Improved Spectral Subtraction was equivalent from the signal angle. Therefore, when we comprehensively considered from both the signal angle and the recognition angle, the processing effect of Wiener filtering is the best. Moreover, the entire research system described in this study is ultimately aimed towards building a classifier with good prediction performance to predict specific sound types and their numbers in a fixed-duration sound signal, while Wiener filtering has an absolute performance advantage when comparing from the recognition angle. Finally, this research determined that Wiener filtering was the best signal filtering method in this article.

Discussion
This research has made a comprehensive consideration from both the signal angle and the recognition angle, and found that Wiener filtering is the best method for broiler sound signal filtering. In fact, in the signal processing field, adaptive filtering methods are often used to filter noisy signals. This research also used this to perform exploratory research in an early stage, but the actual filtering effect is not good. Specifically, when evaluating it from both the signal angle and the recognition angle, it performed even worse than Basic Spectral Subtraction (the worst performer in this article). Sparse Decomposition selected in this research is specifically an Orthogonal Matching Pursuit based on Genetic Algorithm, which is developed on the basis of the MP algorithm and OMP algorithm. The signal Sparse Decomposition is essentially to express the signal in a concise and clear mathematical language, that is, it is hoped that the number of atoms used in the signal is as low as possible, and the signal is not distorted. As such, Sparse Decomposition (or Sparse Representation) is essentially an optimization problem, and the greedy algorithm is a common method of solving this type of problem. The MP algorithm is the earliest and representative greedy algorithm [38,39]. Its main idea is to use as few atoms as possible to linearly represent the input signal from a given over-complete dictionary based on a certain similarity measurement criterion, so as to achieve an approximation of the input signal. The disadvantage of the MP algorithm is that in the optimization process, an atom may be repeatedly selected, which leads to poor convergence of the algorithm [40]. It should be noted that the direction selection of the MP algorithm is sub-optimal, not optimal. This is because the residual is only perpendicular to the current projection direction, so in the next projection, it is likely to be projected to the original direction again [41]. In order to solve the problem of selecting the optimal projection direction of the MP algorithm, the author further introduced the Orthogonal Matching Pursuit (OMP) algorithm.
The OMP algorithm was proposed by Y.C. Pati and R. Rezaiifar [42]. Compared with the MP algorithm, the advantage of the OMP algorithm is that it first processes all the atoms through the Gram-Schmidt regularization method, and then uses the residual signal to project the selected atoms, which ensures that the result of each iteration is the global optimal solution. However, from the analysis of the decomposition time, because the OMP algorithm needs to be orthogonalized every time before projecting on the selected atom, the calculation amount is relatively large. In this research, when the OMP algorithm was selected to carry out the preliminary research, although the signal filtering effect was improved, the calculation efficiency was still very low. Thus, this research introduced a Genetic Algorithm to optimize the OMP algorithm and formed a GA-OMP algorithm for signal sparse decomposition. The Genetic Algorithm proposed by John Holland is a bionic algorithm whose core idea is to simulate the process of mutation and natural selection. Through natural selection, it weeds out bad solutions. Through the reproduction mechanism, the optimal solution is retained. A Genetic Algorithm is an efficient parallel global search algorithm which defines a parameter to be optimized, the absolute value of the signal or signal residual, and the inner product of the atom as the fitness function. This is a good solution to the problem of finding the optimal matching atom in each iteration of the OMP algorithm when the signal is sparsely represented.
In addition, in the signal filtering method of this article, we choose the first method to determine the iterative stop condition of Sparse Decomposition. The author checked the relevant literature and found that Cheng et al. [14] had also conducted the research on broiler signal filtering. In her research, when Sparse Decomposition is used for signal filtering, the second method is selected to determine the iterative stop condition, and the achieved signal filtering effect is excellent. In the next step, the author considers it, and compares it with the existing Wiener filtering and the sub-optimal Improved Spectral Subtraction based on multi-taper spectral estimation to analyze whether there is a greater performance improvement.
Moreover, when analyzing the five signal filtering methods from the signal angle, the calculation results are as shown in Table 1. Among them, part of the SNR calculated by Sparse Decomposition has a negative value. For example, the SNR calculated from the NO.1 filtered sound signal that uses the thirty-atom Sparse Decomposition is −0.7658. This shows that the quality of the sound signal filtered by the thirty-atom Sparse Decomposition is even worse than that without signal filtering. This indicates that Sparse Decomposition method that uses the current atoms not only sparse the noise in the noisy signal, but also sparse the noise-free signal. That is to say, the noise in this sound signal was relatively small, so in the sparse process, only a small part of the noise was processed, and most of it was the processing of the noise-free signal. When adjusting the number of atoms, this situation will be effectively suppressed. For example, when using fifty-atom Sparse Decomposition to filter the NO.1 filtered sound signal, the calculated SNR is adjusted to −0.1859, which is significantly improved compared to the previous value. The same situation also appeared in the NO.2 and NO.2 filtered sound signals. In fact, the author tried to increase the number of atoms to eighty during the research process. Although the SNR obtained was no longer negative, the degree of reduction of the noisy signal was already high, and the signal filtering effect was basically insignificant. Thus, the author will conduct an in-depth study on this part later.
After obtaining the best method for broiler sound signal filtering, for the next step the author will use this as a basis to carry out research on broiler health monitoring with reference to the classification algorithm in machine learning to improve the broiler welfare. Specifically, the author will extract multiple classification features that can distinguish between crows, coughs, purrs, and flapping wings from the filtered broiler sound signals, and build a machine learning data set. Classifiers based on different classification algorithms will be trained on the data set, the inherent parameters of the best performed classifier will be optimized, and the final classifier for the classification of specific sound types (crows, coughs, purrs, and flapping wings) will be obtained. In this way, for a piece of broiler sound signal recorded in the broiler farm, after signal filtering and feature extraction, a data set to be predicted can be formed. We can apply the classifier to make the predictions, and then obtain the specific number of crows, coughs, purrs, and flapping wings in each signal. By comparing the ratio of the number of coughs to the number of all sound types, the cough rate used to judge broiler health is obtained, and the judgment of broiler health is completed. This is in principle consistent with the fact that the breeders judge the cough frequency of the broilers in the current area on the spot. According to the judgment result, protective measures can be taken to improve broiler welfare.

Conclusions
Research on evaluating the filtering method for broiler sound signals from multiple perspectives in this article was conducted from both the signal angle and the recognition angle, and the filtering effects of Basic Spectral Subtraction, Improved Spectral Subtraction based on multi-taper spectrum estimation, Wiener filtering, thirty-atom Sparse Decomposition, and fifty-atom Sparse Decomposition were evaluated. From the signal angle, it can be concluded that Improved Spectral Subtraction based on multi-taper spectrum estimation has obtained the highest average SNR of 5.5145 and the smallest average RMSE of 0.0508. From the recognition angle, it can be concluded that the kNN classifier and Random Forest classifier have achieved the highest average prediction accuracy on the data set after Wiener filtering, which were 88.83% and 88.69%, respectively. Taking into account the unavoidable presence of leading noise in the sound signal processed by Wiener filtering, and after removing the leading noise, Wiener filtering obtained the highest average SNR of 5.6108, obtained a new RMSE of 0.0551, and performed well in the five signal filtering methods. Therefore, a comprehensive analysis was finally carried out from both the signal angle and the recognition angle, and Wiener filtering was determined as the best method for broiler sound signal filtering. This research is the basis for the author to carry out follow-up systematic research and has important practical value for realizing the monitoring of broiler health in the farm. At the same time, the research results can be extended and applied to the research on the detection and processing of livestock and poultry sound signals.