Field-Applicable Pig Anomaly Detection System Using Vocalization for Embedded Board Implementations

Abstract: Failure to quickly and accurately detect abnormal situations, such as the occurrence of infectious diseases, in pig farms can cause significant damage to the pig farms and the pig farming industry of the country. In this study, we propose an economical and lightweight sound-based pig anomaly detection system that is applicable even in small-scale farms. The system consists of a pipeline structure, starting from sound acquisition to abnormal situation detection, and can be installed and operated in an actual pig farm. It has the following structure that makes it executable on the embedded board TX-2: (1) a module that collects sound signals; (2) a noise-robust preprocessing module that detects sound regions in the signals and converts them into spectrograms; and (3) a pig anomaly detection module based on MnasNet, a lightweight deep learning method, to which the 8-bit filter clustering method proposed in this study is applied, reducing its size by 76.3% while maintaining its identification performance. The proposed system recorded an F1-score of 0.947, a stable pig abnormality identification performance, even in various noisy pigpen environments, and the system's execution time allowed it to perform in real time.


Introduction
In Korea, the livestock industry accounts for 41.8% of an agriculture sector worth approximately 40 billion dollars, and pig breeding makes up the largest portion of that share. Despite the industry's importance, Korea's marketed-pigs per sow per year (MSY) is merely 17.8, which is very low compared with 31 in Denmark, a country with an advanced livestock industry [1]. One of the main reasons for this low performance is that the few managers of small and medium-sized pig farms cannot effectively and meticulously manage numerous pigs housed in enclosed pigpens with very poor air quality, living conditions that harm animal welfare and, unfortunately, still exist in countries such as Korea whose livestock industries are less developed. Consequently, owing to financial and practical constraints, it is difficult for those farms to quickly and accurately detect two of the most common causes of swine mortality: pig respiratory diseases and aggressive behaviors between pigs.
Recently, several studies have reported the detection of abnormalities in livestock using sound sensors, which can safeguard animal health and welfare without straining farm budgets. These methods are not only cost effective, as sound sensors are cheaper than other sensors, but also more practical and animal friendly, because they collect data continuously 24 h a day without causing the animals any stress or discomfort [2][3][4][5][6][7][8]. In the case of chickens, studies employing sound data include an analysis of hens' vocalizations under temperature-induced stress [2]; the detection of respiratory diseases from broilers' sneezing and coughing [3,4]; and an attempt to detect avian influenza from chicken sound data [5]. Similarly, for cattle, sound has been used to detect estrus [6] and respiratory diseases [7,8] in cows. In short, sound data are valuable because they carry information useful for detecting abnormalities in livestock.
In this study, we focus on a new method to quickly and accurately detect abnormalities in pigs based on sound data, in order to effectively manage and eliminate anomalies in livestock. Studies that detect abnormalities using sound also exist in the swine research field. Table 1 summarizes the qualitative characteristics of recent studies [9][10][11][12][13][14][15][16] that focused on sound-based pig abnormality detection. These studies are largely divided into those detecting coughing sounds caused by diseases and those detecting screams caused by stress, because a failure in the early detection of respiratory diseases or of aggressive behaviors among weaning pigs caused by social conflict results in serious financial damage [14][15][16][17]. In early studies, abnormal pig sounds were detected using the statistical significance of time-domain or frequency-domain features [9][10][11][12]. However, recent studies have employed machine learning techniques to improve the accuracy of abnormality detection [13,16].

[Table 1. Qualitative characteristics of recent sound-based pig abnormality detection studies [9-16]: target sound (e.g., cough caused by disease, cough caused by air quality), analysis method (statistical analysis or machine learning), feature domain (time or frequency), and whether noise robustness and real-time processing were addressed.]

There were shortcomings and restrictions in the applied usage of previous studies, because their main purpose was solely to verify whether the vocalization of pigs could be employed to detect abnormal behaviors in the pigpen. Some of the most relevant limitations of existing sound-based pig abnormality detection studies include the following:

1. The majority of studies presented methods that can only be performed and reproduced in restricted laboratory environments.

2. Only a few studies have applied automatic detection and localization of pig sound events in untrimmed sound data, without manual editing.

3. Although one study reported the effects of noise on cough sound detection performance [10], studies that attempted to detect abnormal situations while guaranteeing robustness to noise are rare.

4. No previous study explored the feasibility of implementing a real-time, economical abnormality detection system in a low-cost computing environment for small and medium-sized farms. The term "real-time" appears in one paper [10], but no measurement of execution time was provided that would allow the real-time claim to be verified.
However, when the research team interviewed managers of small and medium-sized pig farms to better understand their needs, the managers put forward the following conditions for a pig abnormality detection system: (1) low price; (2) 24-hour monitoring; (3) a low false alarm rate combined with high abnormality detection performance; (4) noise robustness, if the system relies on sound, because there is a considerable amount of noise in piggeries; and (5) periodically replaceable sensors, because data collection functionality deteriorates with constant exposure to gases such as ammonia, hydrogen sulfide, and methane generated in pigpens with poor air circulation.
In this study, we propose a low-cost, real-time, sound-based pig abnormality monitoring system that can be installed in real pigpens and operate 24 h a day in an embedded environment with limited computing resources, without needing a personal computer (PC). First, the system employs an adaptive context attachment model (ACAM)-based noise-robust voice activity detection (VAD) algorithm, which can effectively detect sound regions even in noisy environments, to detect the sound regions in the data received from the sound sensor [18][19][20][21]. Then, each detected sound region is converted into a spectrogram, containing both frequency and time information, before being fed to the lightweight deep learning model MnasNet [22]. The filters (kernels) of the neural network are pruned using the filter clustering method proposed in this study, which improves the processing speed while maintaining the abnormality detection performance. In addition, we used a convolutional neural network (CNN)-based deep learning structure because it guarantees effective abnormality detection performance even in various noisy environments [23][24][25]. The remainder of this paper is organized as follows. In Section 2, we describe the noise-robust sound-based pig anomaly detection system that is deployed on an embedded board and can process data in real time. In Section 3, we present the performance and experimental results of the proposed system. In Section 4, we draw conclusions and discuss future research.
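The pipeline just described can be sketched end to end as follows. All function names and bodies here are illustrative placeholders, not the authors' code: the real system uses the ACAM-based VAD and a pruned MnasNet in place of these toy stand-ins.

```python
# Illustrative end-to-end sketch of the proposed pipeline
# (sound acquisition -> VAD -> spectrogram -> classifier).

def detect_sound_regions(signal, chunk=16, threshold=0.1):
    """VAD stand-in: keep any chunk that is not near-silent."""
    return [signal[i:i + chunk] for i in range(0, len(signal), chunk)
            if any(abs(s) > threshold for s in signal[i:i + chunk])]

def to_spectrogram(region):
    """Spectrogram stand-in: here just the magnitude envelope."""
    return [abs(s) for s in region]

def classify(spectrogram):
    """Classifier stand-in for the pruned MnasNet."""
    return "abnormal" if max(spectrogram) > 0.5 else "normal"

def monitor(signal):
    """Run the full pipeline over one buffer of sensor samples."""
    return [classify(to_spectrogram(r)) for r in detect_sound_regions(signal)]
```

The value of the pipeline framing is that each stage can be replaced independently, which is how the embedded deployment described in Section 2 is organized.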

Embedded Board-Based Real-Time Pig Abnormality Detection System
The structure of the sound-based real-time pig anomaly detection system proposed in this study is illustrated in Figure 1. The data acquisition, preprocessor, and anomaly detector modules were implemented on the embedded TX-2 board [26].

Data Acquisition and Preprocessor
Various sounds produced by pigs were collected using the audio sensor installed in the pigpen and then transmitted to the preprocessing module, where an end point detector was employed to locate the regions of the signal in which sound is present. In general, traditional techniques using the time-domain or frequency-domain characteristics of a signal perform poorly at detecting sound regions when the signal-to-noise ratio (SNR) of the sound signal is low [19]. In addition, threshold-based end point detection is highly vulnerable to background noise [19]. However, in this study, pig sounds had to be acquired from pigpens where various environmental noises (such as the footsteps of pigs and the music played inside pigpens) were constantly present.
In this study, to detect sounds inside piggeries, we applied a VAD algorithm [18] that uses a deep learning-based pattern-matching approach and guarantees noise-robust sound detection performance. This VAD model is based on ACAM, and its attention mechanism further improves sound detection, even in noisy situations [19][20][21]. During initialization, the algorithm converts the sound signal into overlapped frames of 25 ms with 10 ms shifts and then adds context information before feeding the signal to the decoder. Thereafter, through the decoder, attention, encoder, and long short-term memory (LSTM)-based core processes, the algorithm determines whether the corresponding frame region contains sound (see Figure 2). In this study, we modified certain parameters of the algorithm and used it to acquire the sounds generated by pigs in pigpens. The specifics of the algorithm and its user-defined parameters are fully described in [18]. Once a sound region was detected in the signal, it was converted into a spectrogram and transmitted to the anomaly detector.
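As a concrete illustration of the framing step, the following sketch splits a signal into 25 ms frames with a 10 ms shift. The 16 kHz sample rate is an illustrative assumption for this example only (the pig recordings themselves are sampled at 44.1 kHz).

```python
def frame_signal(signal, sample_rate=16000, win_ms=25, hop_ms=10):
    """Split a 1-D signal into overlapped frames (25 ms window,
    10 ms shift), as in the VAD front end described above."""
    win = int(sample_rate * win_ms / 1000)   # samples per frame
    hop = int(sample_rate * hop_ms / 1000)   # samples between frame starts
    return [signal[i:i + win]
            for i in range(0, len(signal) - win + 1, hop)]
```

At 16 kHz this yields 400-sample frames starting every 160 samples, so consecutive frames overlap by 60%.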

Anomaly Detector
In this module, the CNN-based MnasNet structure generates sound features and classifies them to detect anomalies in pig sounds. Because the module is deployed on an embedded system, the smaller the filter computation of the deep learning structure, the better. To this end, the number of filters in the basic MnasNet structure was controlled using a filter clustering method.

MnasNet
The CNN algorithm is considered an important breakthrough in the image classification field, and models based on it have shown a remarkable increase in image recognition performance, leading to the CNN algorithm being employed in various fields of study [27][28][29][30]. Recently, attempts to run such high-performance CNN models in low-computing environments, such as mobile devices, have been reported [31,32]. Representative hand-crafted CNN models for low-computing environments include MobileNet and MobileNetV2, which demonstrated stable identification performance in a mobile environment [31,32]. In addition, studies on neural architecture search (NAS), which automatically generates models suited to specific target problems using reinforcement learning (RL) rather than hand-crafting a CNN model, have been conducted [33,34]. Based on this concept, studies have attempted to apply NAS to a mobile environment rather than a PC environment, and the representative result is known as mobile neural architecture search (MNAS) [22]. Unlike NAS, which emphasizes only the accuracy of the generated model, the MNAS search process also considers the hardware on which the generated model will be deployed. The MNAS search process is optimized using Equation (1) [22].
To generate an optimized model m, MNAS maximizes the objective

    ACC(m) × [LAT(m)/T]^w    (1)

where ACC(m) is the accuracy of m, LAT(m) is the latency measured on the target hardware, T is the target latency, and w is an exponent that determines the tradeoff between ACC and LAT.
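The objective above fits in a one-line function. The form ACC(m) × (LAT(m)/T)^w and the default w = -0.07 follow the published MnasNet paper [22]; this is only a sketch of the reward computed for each candidate model m.

```python
def mnas_objective(acc, lat, target_lat, w=-0.07):
    """Multi-objective reward maximized by the MNAS controller:
    ACC(m) * (LAT(m) / T) ** w.  With w < 0, models slower than the
    target latency T are penalized; w = -0.07 is the soft-constraint
    value reported in the MnasNet paper."""
    return acc * (lat / target_lat) ** w
```

A model exactly at the target latency keeps its raw accuracy as reward; one twice as slow loses roughly 5% of it, so the search trades a little accuracy for large latency gains.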
Tan et al. [22] set up MNAS to generate an optimized model capable of performing image recognition on a Google Pixel 1 mobile phone, resulting in MnasNet, a lightweight model that showed higher efficiency and accuracy than models specifically hand-crafted for mobile environments (MobileNet and MobileNetV2) [22,35]. In this study, we employed the MnasNet model proposed by Tan et al. [22] as a sound-based pig anomaly detector.
The structure of MnasNet employed in the experiment is illustrated in Figure 3. The mobile bottleneck convolution (MBConv) and separable convolution (SepConv) layers used in MobileNetV2 are used here as well [22,28]. Each block receives an input tensor of shape H × W × F (H refers to height, W to width, and F to the number of channels). The MBConv block expands the number of channels F by three times (MBConv3; H × W × 3F) or six times (MBConv6; H × W × 6F) before going through depthwise convolution (DWConv), after which the number of channels is restored to F. The hierarchical structure of MnasNet is composed of repeating blocks with different channel expansion ratios (MBConv3; MBConv6), filter sizes (3 × 3; 5 × 5), and numbers of filters. In Figure 3, the symbols ×2/×3/×4 on the right side of each layer numbered 1-5 represent the number of times that specific block is repeated. In this study, a spectrogram image of size 128 × 128 × 3 was used as input. It was dimensionally reduced to 4 × 4 × 320 (the output of the last MBConv block) and fed to a fully connected (FC) layer. Thereafter, the FC layer was used to calculate the probability of belonging to each class to obtain the classification result.
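The channel bookkeeping inside an MBConv block can be traced with simple shape arithmetic. This helper is purely illustrative (stride 1 assumed, spatial dimensions unchanged); it only mirrors the expand/depthwise/project sequence described above.

```python
def mbconv_shapes(h, w, f, expansion):
    """Trace tensor shapes through an MBConv block (stride 1):
    a 1x1 conv expands F channels by the expansion ratio (3 or 6),
    depthwise convolution keeps the channel count, and a final 1x1
    conv projects back down to F channels."""
    expanded = (h, w, f * expansion)    # 1x1 expansion conv
    after_dw = expanded                 # depthwise conv: channels unchanged
    projected = (h, w, f)               # 1x1 projection back to F
    return [expanded, after_dw, projected]
```

For the last block of the network in Figure 3, an MBConv6 on a 4 × 4 × 320 tensor briefly widens to 1920 channels before projecting back to 320.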

Filter Clustering and Pruning
Although MnasNet is an optimized model for the target hardware, it is sometimes necessary to expand the model to obtain higher accuracy or reduce it to decrease computing power consumption. The easiest model scaling method is to reduce the overall latency by resizing the input image fed to MnasNet from 224 × 224 to 128 × 128 [22,32]. Another method removes the convolution layer filters at a fixed rate [33,[36][37][38][39]]. MnasNet employs a depth multiplier (DM) as a model scaling hyper-parameter that removes filters, decreasing or increasing the number of channels in each layer of the model to control its size [22]. If the DM is set to 0.5, the number of filters in each layer is reduced by half, thereby reducing the latency.
Recently, a study was reported that, instead of removing filters at a fixed rate before training (as the DM does), applied a clustering method to remove filters of low importance from each layer of a trained neural network model [29]. This clustering method was first applied to the You Only Look Once (YOLO) detector [40], and the results proved that the network size was effectively reduced while identification performance was maintained. In this study, an 8-bit filter clustering algorithm is proposed to further improve the model compression ratio of the filter clustering algorithm proposed in [29], which relied on a 9-bit filter, while maintaining detection performance. The algorithm proceeds in the following order:

1. Except for the center of the filter, the weights in the 3 × 3 filters belonging to a specific layer of the deep learning model are converted to binary values: 0 if the weight is less than the value at the center of the filter, and 1 if it is greater than or equal to the center value (see Figure 4). Then, as shown in Figure 4c, the 8-bit binary pattern value of each filter is obtained by reading these binary values as an 8-bit binary number. These 8-bit binary pattern values allow a maximum of 256 patterns.

2. After defining the 256 patterns that can be generated through 8-bit binary filtering as individual clusters, all filters belonging to a specific layer of the deep learning model are classified into their corresponding clusters. For example, if a specific filter that has undergone the process described in Figure 4 has the binary pattern 11010110 (in base 2), it is classified into the 214th cluster.

3. After clustering all the filters in a specific layer, the l2-norm of each filter is calculated using the original filter values shown in Figure 4a, before binary patterning is applied.

4. In each cluster, the filter with the highest l2-norm value is considered the most relevant to identification performance [37,41] and is retained, whereas all the remaining filters are removed because they are regarded as less important and unlikely to affect the model's performance.

5. Steps 1 to 4 are performed on all the convolutional layers of the deep learning model. After the algorithm is applied to the entire network, only the highly important and relevant filters among the 3 × 3 filters of each layer remain. This improves the speed of the deep learning network by reducing its size while maintaining its classification performance. The 8-bit filter clustering algorithm for convolutional layers consisting only of 3 × 3 filters proposed in this study is described in Algorithm 1.
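Steps 1 to 4 above can be sketched directly in code. The scan order of the eight surrounding weights (row-major, skipping the center) is our assumption for illustration; the paper's Figure 4 fixes the actual order used.

```python
def binary_pattern(filt):
    """Steps 1-2: threshold the eight non-center weights of a 3x3
    filter against the center weight and read the result as an
    8-bit cluster index (0-255)."""
    center = filt[1][1]
    ring = [filt[0][0], filt[0][1], filt[0][2], filt[1][0],
            filt[1][2], filt[2][0], filt[2][1], filt[2][2]]
    bits = ''.join('1' if v >= center else '0' for v in ring)
    return int(bits, 2)

def l2_norm(filt):
    """Step 3: l2-norm of the original (non-binarized) filter weights."""
    return sum(v * v for row in filt for v in row) ** 0.5

def cluster_and_prune(filters):
    """Step 4: group filters by pattern; keep only the filter with the
    largest l2-norm in each cluster, pruning the rest."""
    clusters = {}
    for f in filters:
        clusters.setdefault(binary_pattern(f), []).append(f)
    return [max(fs, key=l2_norm) for fs in clusters.values()]
```

Two filters that share a binary pattern land in the same cluster, and only the higher-norm one survives, which is exactly why at most 256 filters can remain per layer.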

[Algorithm 1. The 8-bit filter clustering algorithm for convolutional layers composed of 3 × 3 filters: all 3 × 3 filters in each convolution layer are pruned except the cluster-representative filters.]
The 8-bit filter clustering algorithm proposed in this paper is designed for the 3 × 3 filters of convolutional layers. However, unlike YOLO, which includes only 3 × 3 filters, the MnasNet structure also has convolutional layers that use 5 × 5 filters, which makes it impossible to apply the 8-bit filter clustering algorithm to all layers of MnasNet directly. To solve this problem, in the convolutional layers composed of MnasNet's 5 × 5 filters, the DWConv layers inside MBConv (see Figure 3c) were replaced with a stack of two 3 × 3 DWConv layers covering the same receptive field. This allows the proposed 8-bit clustering algorithm to be applied to all layers of the neural network while minimizing changes to the existing MnasNet structure, which helps increase the compression ratio of the model. However, the following should be noted when changing the MBConv structure and applying the filter clustering method: DWConv, which plays the same role as the depthwise separable convolution layer proposed by Chollet [42], has a 1:1 dependency mapping with the Conv 1 × 1 layer located at the top of the MBConv block to which it belongs. If filters belonging to the DWConv layer in MBConv are removed according to the filter clustering result while the corresponding filters of the upper Conv 1 × 1 layer are kept, this dependency is broken. To solve this problem, whenever a filter belonging to the DWConv layer of MBConv is removed, the filter of the Conv 1 × 1 layer mapped 1:1 with the removed filter is also removed. The process of compressing MnasNet by applying the 8-bit filter clustering algorithm is shown in Algorithm 2.
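The substitution above relies on the standard identity that two stacked 3 × 3 convolutions (stride 1) cover the same 5 × 5 receptive field as a single 5 × 5 convolution. A quick arithmetic check:

```python
def stacked_receptive_field(kernel, layers):
    """Effective receptive field of `layers` stacked stride-1
    convolutions: each layer adds (kernel - 1) to the field
    seen by one output unit."""
    rf = 1
    for _ in range(layers):
        rf += kernel - 1
    return rf
```

For depthwise filters the swap also trims parameters per channel: two 3 × 3 kernels hold 18 weights versus 25 for one 5 × 5 kernel, in addition to making every layer eligible for 8-bit clustering.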

Data Collection and Datasets
The data were obtained from 36 pigs (Yorkshire, Landrace, and Duroc), each weighing 25-35 kg and kept inside four pigpens (dimensions of 1.8 × 4.8 m, temperature of 23 °C) at pig farms located in Chungnam, Korea. A previous study [13] details the data collection and organization of the targeted respiratory diseases, including Mycoplasma hyopneumoniae (MH), porcine reproductive and respiratory syndrome (PRRS), and postweaning multisystemic wasting syndrome (PMWS). When labeling the data, in situations where the data included irrelevant sounds caused by pigs' footsteps or by aggressiveness and attacks among the pigs, the recorded video was analyzed along with the sound to ensure that the label accurately matched the class. The sound regions were detected using the algorithm proposed by Kim and Hanh [18] through the VAD system previously mentioned. The detected sound data were 0.127 to 2.627 s long, and the sample rate was 44,100 Hz.
To check the detection performance of pig abnormalities in noisy situations, white Gaussian noise (SNR: 20, 15, 10, 5, and 0 dB) and environmental noise (radio operation, door opening, weak footsteps, and strong footsteps) were synthesized with the pig sounds. The radio sound refers to the music played inside the pigsty to suppress stress in pigs and maintain their psychological state at a stable level. The strong footsteps are sounds made by several pigs running around excitedly in the pigsty, and the weak footsteps are those made by a few pigs walking or running around under normal circumstances. Lastly, the sound of the door opening is the one that occurs when the manager enters or leaves the pigpen. Table 2 lists basic information related to the environmental noise, and Figure 5 displays examples of signals for various sounds that can be produced by a pig.
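Synthesizing white Gaussian noise at a target SNR amounts to scaling the noise power relative to the measured signal power. A minimal sketch (the fixed seed is illustrative, used only for reproducibility):

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise to a clean signal at a target SNR in dB,
    as done for the 20/15/10/5/0 dB test conditions: noise power is the
    signal power divided by 10^(SNR/10)."""
    if rng is None:
        rng = np.random.default_rng(0)
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

At SNR 0 dB the synthesized noise carries as much power as the pig sound itself, which is why that condition is the hardest in the experiments below.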

End Point Detection
To detect sound-based pig anomalies, the first step is to localize the sounds in the signal acquired through the sound sensor installed in the pig house. In this study, the VAD algorithm proposed by Kim and Hanh [18] was used for that purpose, allowing the detection of all sounds generated in pig houses. The settings used to detect sound were as follows: the length of the fast Fourier transform (FFT) window was 512, the window size was 0.025 s, the hop size was 0.01 s, and the threshold was set to 0.75. Figure 6 depicts a 12.669 s signal containing five pig coughs and the result of cough sound detection in that signal. The results indicate that the continuous coughing sounds (events 1-4) and the coughing sound with a small signal amplitude (event 5) were all effectively detected. In addition, the time taken to detect the sound regions in the 12.669 s signal on the TX-2 embedded board (CPU: ARM Cortex-A57, GPU: Pascal with 256 CUDA cores, RAM: 8 GB) was 4.391 s. Each detected sound region was converted into a spectrogram and then input to the MnasNet-based abnormality detector. The Librosa Python package 0.7.2 [43] with its default settings was used to convert the sound signals to spectrograms. At this stage, the time required to convert a 2.005 s sound signal into a spectrogram on the TX-2 board was 1.095 s.
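For reference, a magnitude spectrogram of the kind fed to the detector can be computed with a plain short-time FFT. This NumPy sketch only mirrors what the Librosa call produces (a Hann-windowed STFT); the n_fft of 512 matches the FFT window above, while the 441-sample hop (0.01 s at 44.1 kHz) is an illustrative choice rather than Librosa's default.

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=441):
    """Magnitude spectrogram via a short-time FFT: Hann-window each
    frame, take the real FFT, and stack |STFT| as (freq, time)."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T
```

The resulting (n_fft/2 + 1) × frames array is what gets resized to the 128 × 128 input expected by MnasNet.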

Pig Anomaly Classification Results
The data used for sound-based pig anomaly detection comprised 100 samples of cough, 110 of grunt, 70 of MH, 150 of PMWS, 140 of PRRS, and 140 of scream, totaling 710 samples. The dataset was divided in an 8:2 ratio into a training set (568 samples) and a testing set (142 samples). Furthermore, to confirm whether abnormal situations could be detected robustly in various noisy situations, five levels of white Gaussian noise and four environmental noises were synthesized with the original test data (142 × 9 samples).
In the first experiment, MnasNet was trained only with the original training data, which did not contain noise. As mentioned in the Introduction, CNN-based deep learning structures are known to be robust to noise, but it is still necessary to secure even more robust anomaly detection performance. Consequently, in the second experiment, the original training data and data obtained by synthesizing white Gaussian noise at SNR 0 with that data were both used to train the MnasNet model. Subsequently, an experiment was conducted to confirm the effectiveness of the filter clustering technique proposed in this study on the corresponding MnasNet structure. Then, another experiment, in which the DM option was applied to MnasNet models before training, was conducted for performance comparison. For MnasNet, Keras 2.2.4 [44] and TensorFlow 1.12.0 [45] were used, with the Adam optimizer (decay rates β1 = 0.9 and β2 = 0.999), a learning rate of 0.001, and a batch size of 142. The first experiment was trained for 80 epochs and the second for 100 epochs, and default settings were used for the remaining training hyper-parameters. After filter clustering was applied to MnasNet, additional training was performed on the pruned model for fine-tuning. The evaluation index used in the experimental results is the F1-score, which is calculated as follows [46]:

    Precision = TP/(TP + FP), Recall = TP/(TP + FN),
    F1-score = 2 × (Precision × Recall)/(Precision + Recall),

where true positive (TP) represents data accurately classified as true, false positive (FP) represents data inaccurately identified as true, and false negative (FN) represents data inaccurately identified as false. Precision indicates how much of the data predicted as a specific class actually belongs to it, and recall indicates the rate at which a specific class is accurately detected. The DM values in the first three experiments represent the rate at which the filters are maintained. DM 1.0 represents training performed without pruning any filters of MnasNet, which we will refer to as the basic structure of MnasNet for the remainder of the paper, whereas DM 0.75 and DM 0.5 remove MnasNet filters at rates of 25% and 50%, respectively, before training. The remaining three experiments are the result of applying the filter clustering technique to the trained model of the basic MnasNet structure: the first uses the model resulting from applying the initial filter clustering algorithm [29], the second applies the 8-bit filter clustering technique only to the convolutional layers comprising 3 × 3 filters of MnasNet, and the last applies the 8-bit filter clustering technique to all layers of MnasNet to identify abnormalities in pig sounds. The experimental results indicated that when DM was set to 0.75 or 0.5, the model's identification performance dropped significantly and could not be maintained. In contrast, the three experiments using the filter clustering technique showed that identification performance was well maintained despite the decrease in the number of filters in the neural network. This demonstrates that MnasNet's identification performance is not affected by the removal of filters that are not relevant to identification, in contrast to removing filters at a fixed rate using the DM. However, for SNRs of 15, 10, 5, and 0 (strong white Gaussian noise) and for the door opening noise (environmental noise), identification performance was generally low for all algorithms.
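The metric above can be computed directly from the three counts; a minimal implementation (no zero-division guard, which production code would need):

```python
def f1_score(tp, fp, fn):
    """F1-score from the counts defined above:
    precision = TP/(TP+FP), recall = TP/(TP+FN),
    F1 = 2 * precision * recall / (precision + recall)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because F1 is the harmonic mean of precision and recall, it only approaches 1 when false positives and false negatives are both rare, matching the farms' requirement of a low false alarm rate together with high detection performance.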
Unlike the previous experiments, the experimental results listed in Table 4 are from training performed on a dataset containing both clean data and data synthesized with white Gaussian noise at SNR 0. As shown in Table 4, compared with the previous experimental results, the F1-score increased considerably, by 0.107 to 0.277, and the pigs' abnormalities were stably identified not only under white Gaussian noise but also in environments containing environmental noise. In particular, the average F1-score of the 8-bit filter clustering method proposed in this study is 0.947, the highest identification result, which is 0.025 higher than that obtained using the basic structure of MnasNet (DM 1.0). In addition, as presented in Table 5, the number of parameters of the neural network is 646,624, and the execution time for detecting a pig's abnormality from a spectrogram image on the TX-2 board is 0.253 s/image. While this model is 76.3% smaller than the basic structure of MnasNet, its execution time is 0.220 s faster, which shows that the proposed method produces the most optimized model. Thus, this model can be executed in real time and, as shown in Table 4, provides the best identification results when the 8-bit filter clustering is applied to all layers of MnasNet. The results confirmed that it effectively detected abnormalities related to pigs' respiratory diseases and the screams resulting from attacks between pigs.
Figure 7 illustrates the compressed MnasNet structure after pruning the filters that are irrelevant to identification performance using the 8-bit filter clustering method. As previously described in Section 2.2.2, to apply the 8-bit filter clustering method to all layers of MnasNet, the MBConv layers composed of 5 × 5 filters were changed to comprise two hierarchical 3 × 3 filters. Therefore, the interior of the existing MBConv structure was changed to a structure with two DWConvs, named MBConv2, as shown in Figure 7d. In addition, because layers with the same number of filters were repeated within MBConv in the original MnasNet model, we represented them as one block and added the symbols ×2/×3/×4 on the right side of the layers (see Figure 3a). However, when the 8-bit filter clustering algorithm is applied to MnasNet, the number of filters in each repetition of an MBConv block changes; therefore, the repeated blocks can no longer be collapsed into a single representation and are drawn individually in Figure 7.

Conclusions
Failure to quickly and accurately detect abnormalities occurring in pigpens (porcine respiratory diseases, aggressive behaviors among pigs, etc.) can cause considerable damage to pig farms and the national economy. In particular, unlike large-scale enterprise farms, small and medium-sized farms are relatively ill-prepared to handle such abnormal situations. To provide them with a suitable solution, we proposed a system that employs sound data to effectively detect abnormal situations in pigs. The system was designed to run in real time on an embedded TX-2 board using a low-cost sound sensor, instead of relying on relatively expensive video sensors and general-purpose PCs, so that small farms with limited budgets can adopt it without a significant financial burden. In addition, the system was implemented to be robust against the various noises generated inside pigpens, so that it can be applied in real-life pig farms.
The proposed system consists of a pipeline connecting the entire process from sound acquisition to the detection of anomalies in pigs: (1) effective acquisition of sound signals from a sensor mounted in an environment where noise may occur; (2) detection of sound regions in the signal and conversion to a spectrogram; and (3) application of the 8-bit filter clustering algorithm proposed in this paper to MnasNet, a lightweight deep learning model, to remove filters that do not affect identification performance. The result is a model 76.3% lighter than the original MnasNet, which receives the spectrogram as input to detect and identify abnormal pig situations. The abnormality identification experiment demonstrated an F1-score of 0.947, the best identification performance, even in pigpens where various noises were generated. In addition, the execution time of the abnormality identification algorithm on the TX-2 board was 0.253 s per image, 0.220 s faster than the basic MnasNet model, allowing real-time execution. In a future study, we intend to implement a more reliable pig abnormality monitoring system by combining sound and video data acquired from sensors installed in pigpens.

Figure 1. Overall structure of the pig anomaly detection system used to detect postweaning multisystemic wasting syndrome (PMWS), porcine reproductive and respiratory syndrome (PRRS), Mycoplasma hyopneumoniae (MH) and screams.


Algorithm 1. 8-bit filter clustering for convolutional layers consisting only of 3 × 3 filters
Input: pre-trained weight W
Output: filter-clustered weight W_FC
Initialize: 3 × 3 filter f in the 3 × 3 convolution layer; save filter list fl; L2-norm list L
for i = 1 to number of filters in the 3 × 3 convolution layer do
    b = 1, c = 0
    for j = 1 to 9 do
        if f
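The inner loop of Algorithm 1 is truncated in this excerpt, so the exact encoding rule cannot be read off. The sketch below assumes the natural reading of the algorithm: each 3 × 3 filter is mapped to an 8-bit code derived from the signs of its eight non-center weights (giving up to 256 clusters), and filters within a cluster are sorted by L2 norm so that low-norm near-duplicates can be pruned. All names here are illustrative:

```python
import math

def cluster_code(f):
    """Map a 3x3 filter (flat list of 9 weights, row-major) to an 8-bit
    code built from the signs of its 8 non-center weights (assumed rule)."""
    off_center = f[:4] + f[5:]            # drop the center weight
    code = 0
    for w in off_center:
        code = (code << 1) | (1 if w > 0 else 0)
    return code                           # value in 0..255

def cluster_filters(filters):
    """Group filter indices by code; sort each cluster by descending L2
    norm so the strongest representative of each pattern comes first."""
    clusters = {}
    for i, f in enumerate(filters):
        clusters.setdefault(cluster_code(f), []).append(i)
    for idxs in clusters.values():
        idxs.sort(key=lambda i: -math.sqrt(sum(w * w for w in filters[i])))
    return clusters
```

Pruning would then keep only the leading filters of each cluster and record the dropped indices so that dependent layers can be trimmed to match (as in Algorithm 2).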

Algorithm 2. Compression of MnasNet
Input: pre-trained weight of MnasNet W
Output: filter-clustered weight of MnasNet W_FC
Initialize: convolution layer of MnasNet l
for i = 1 to number of layers in MnasNet do
    if l[i].filter_size == 9 then
        execute the 8-bit filter clustering algorithm for l[i]
    if l[i] == depthwise convolution then
        prune the corresponding filters of l[i − 1]
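Algorithm 2's control flow amounts to a single pass over the layer list. The sketch below uses an illustrative dict-based layer interface (the paper's actual data structures are not shown in this excerpt): when a depthwise 3 × 3 layer loses filters, the corresponding filters of the preceding layer are pruned as well, since depthwise filters pair one-to-one with their input channels:

```python
def compress_mnasnet(layers, cluster_and_prune):
    """One pass over the layers: cluster every 3x3 layer, and mirror any
    pruning in a depthwise layer onto the layer that feeds it."""
    for i, layer in enumerate(layers):
        if layer["filter_size"] == 9:              # 3x3 filters
            removed = cluster_and_prune(layer)     # set of pruned indices
            if layer["type"] == "depthwise" and i > 0:
                prev = layers[i - 1]
                prev["filters"] = [f for j, f in enumerate(prev["filters"])
                                   if j not in removed]
    return layers
```

Mirroring the pruning onto the preceding layer keeps the channel counts consistent, which is what allows the 76.3% reduction without architectural mismatch.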

Figure 6. Sound detection result in a pigsty using the VAD algorithm.

Table 2. Basic statistical information about the environmental noise in the pigsty.

Table 3 presents the experimental results of identifying abnormalities in pigs after training with only clean data (containing no synthesized noise). The table shows the results of three experiments with MnasNet using different DM values and three experiments related to filter clustering.

Table 5. Comparison between the number of parameters of the pig-anomaly detector models and their execution times on the TX-2 board (train: clean + SNR 0 synthesized dataset).

Table 6 displays a confusion matrix of the results after applying the 8-bit filter clustering method, shown in Table 4, to all layers of MnasNet.

Table 6. Confusion matrix for the identification of pig abnormalities (test: clean + all synthesized noise datasets).

Table 8. Comparison of respiratory disease identification performance between the proposed method and that used by Chung et al. [13].