Fast Pig Detection with a Top-View Camera under Various Illumination Conditions

Abstract: The fast detection of pigs is a crucial aspect for a surveillance environment intended for the ultimate purpose of the 24 h tracking of individual pigs. Particularly, in a realistic pig farm environment, one should consider various illumination conditions such as sunlight, but such consideration has not been reported yet. We propose a fast method to detect pigs under various illumination conditions by exploiting the complementary information from depth and infrared images. By applying spatiotemporal interpolation, we first remove the noise caused by sunlight. Then, we carefully analyze the characteristics of both the depth and infrared information and detect pigs using only simple image processing techniques. Rather than exploiting highly time-consuming techniques, such as frequency, optimization, or deep learning-based detections, our image processing-based method can guarantee a fast execution time for the final goal, i.e., intelligent pig monitoring applications. In the experimental results, pigs could be detected effectively through the proposed method for both accuracy (i.e., 0.79) and execution time (i


Introduction
Caring for group-housed pigs is an important issue that can be resolved by detecting or managing problems early with regard to their health and welfare [1][2][3][4][5][6]. In particular, it is necessary to minimize the potential damage to individual pigs from infectious diseases or other health problems. Because of the small number of farm workers, however, it is very challenging to care for individual pigs in a large pig farm.
Recently, several studies have reported surveillance techniques for automatic pig monitoring systems. In this study, we focus on pig monitoring systems with a top-view camera under various illumination conditions in a realistic pig farm environment. The illumination problem has been addressed either by applying image processing techniques that are time-consuming or by using thermal/depth cameras, which are known to be less sensitive to illumination. Indeed, we previously reported results for pig detection with Kinect-based depth information [38,39]. The depth information obtained from low-cost sensors, however, is susceptible to sunlight, and a fast solution to the illumination problem caused by sunlight has not yet been reported.
Symmetry 2019, 11, 266
In this study, we propose a method that is not only low cost but also fast for detecting pigs through a top-view camera under various illumination conditions. First, we exploit the infrared and depth information that is concurrently obtained from a low-cost camera, such as Intel RealSense [43]. The accuracy of the depth information measured by the RealSense camera degrades significantly when covering a large area (i.e., a pig room). Furthermore, sunlight through a window in the daytime generates considerable noise in both the depth and infrared information. Thus, we integrate the two kinds of information complementarily to resolve the low pixel accuracy and the illumination noise such as sunlight. Second, we apply only simple but effective image processing techniques in order to satisfy real-time execution for pig detection. Decreasing the computational workload of the detection task through simple image processing also makes it possible to complete intermediate-level vision tasks, such as pig tracking, and high-level vision tasks, such as behavior analysis for pigs.
The rest of the paper is organized as follows: Section 2 describes previous pig detection methods. Section 3 explains the proposed method to detect pigs under various illumination conditions. The experimental results for pig detection are presented in Section 4, and Section 5 concludes the paper.

Background
The contribution of this study is a step toward our ultimate goal of automatically analyzing pig behavior over 24 h by individually recognizing each pig through pig detection. Previous studies segmented touching pigs and tracked individual pigs [41,42], but the most important task for the ultimate goal of 24 h pig monitoring is accurate pig detection. For example, Figure 1 shows various illumination conditions in a realistic pig farm environment. With an infrared camera, the gray values of pigs located at the four corners are generally darker than those of pigs located at the center (see Figure 1a). In addition, the accuracy of the depth information obtained from a low-cost depth camera decreases quadratically as the distance increases [44]. Thus, the differences between the infrared and depth images are apparent. Sunlight through a window in the daytime makes it especially difficult to separate pigs from the neighboring wall and floor (see Figure 1b) with both infrared and depth information. Clearly, the critical problem in consistently separating and tracking pigs for automatic behavior analysis is to precisely detect pigs under various illumination conditions.
In order to enhance the low-contrast image shown in Figure 1a or the sunlight image shown in Figure 1b, we adopted contrast limited adaptive histogram equalization (CLAHE) [45], which is one of the most widely used techniques for enhancing low contrast, as in bio/medical applications (e.g., CT/MRI imaging). Note that histogram equalization (HE) [46] is also one of the most commonly employed techniques for improving image contrast, but it may make the foreground undetectable because of the excessive change in brightness. Then, we adopted the Otsu algorithm [47] to detect objects from the gray images based on thresholding.
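The thresholding step named above can be illustrated with a minimal pure-Python sketch of Otsu's method, which picks the gray level that maximizes the between-class variance of the histogram. The function names `otsu_threshold` and `binarize` are hypothetical, 8-bit gray values are assumed, and CLAHE is not reproduced here; this is not the authors' implementation.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the darker class
    sum0 = 0  # sum of gray levels in the darker class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # darker class is still empty
        if w1 == 0:
            break     # brighter class is empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, t):
    """Label pixels brighter than t as foreground (1), others as background (0)."""
    return [[1 if p > t else 0 for p in row] for row in image]
```

Because a single global threshold is chosen from the whole histogram, dark pigs in the corners and sunlit regions can fall on the wrong side of it, which is consistent with the failure cases discussed for Figure 2.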
Figure 2 shows the results of Otsu after CLAHE. From the infrared images, it is difficult to detect the dark pig in the low-contrast image (see the red box shown in Figure 2a) or the possible boundary lines between the pig and the neighboring wall and floor in the sunlight image (see the red box shown in Figure 2b). The illumination problems with infrared images may be solved by using depth images. From depth images, however, it is difficult to detect the pig completely, owing to the inaccurate pixel values of the depth images (see the green area shown in Figure 2; we call this the "missing pig-pixel problem" for the purpose of explanation). If we can exploit this complementary information from infrared and depth images, we can detect pigs more accurately under various illumination conditions.
Figure 2. (a) The results of Otsu [47] after contrast limited adaptive histogram equalization (CLAHE) [45] with a low-contrast image and (b) the results of Otsu [47] after CLAHE [45] with a sunlight image.
We summarize some of the previous approaches used for pig monitoring in Table 1. Although online monitoring applications must satisfy real-time requirements, many previous studies either did not describe the processing speed or did not satisfy those requirements. Furthermore, some of the methods considered the illumination problem by applying time-consuming image processing techniques (i.e., "management of various illumination = Yes" in Table 1), whereas others did not (i.e., "management of various illumination = No" in Table 1). Also, some of the methods tried to avoid the illumination problem by using thermal/depth cameras, which are known to be less sensitive to illumination. However, none of the previous methods reported results of pig detection with sunlight images (i.e., "management of sunlight = No" in Table 1). For example, we extend our previous research on detecting standing pigs [38,39] in order to additionally detect lying pigs with sunlight images, which is very difficult with depth information alone.
In addition to pig detection, some studies on object detection have combined multiple data modalities (i.e., multi-sensor fusion) [48][49][50][51][52]. For example, References [48] and [49] proposed an object detection method using both color and infrared information. In Reference [50], the fusion of grayscale and thermal information was employed for foreground detection. As another example, Reference [52] proposed a background subtraction method for detecting moving objects using color and depth information.
To the best of our knowledge, this is the first report on detecting pigs in real time by exploiting complementary information (i.e., without any time-consuming techniques, such as frequency-, optimization-, or deep learning-based detections) with sunlight images obtained from a low-cost camera. That is, we propose a fast pig detection method with reasonable accuracy, with the ultimate goal of achieving a "complete" real-time vision application from low-level vision tasks to intermediate- and high-level vision tasks, by carefully balancing the tradeoff between computational workload and detection accuracy as well as exploiting both depth and infrared information.

Proposed Method
As in Reference [39], pigs in a pen can be detected by analyzing the depth information of the background and foreground (e.g., the floor, wall, and pigs), because depth information is less sensitive to illumination. However, it is challenging to detect the pigs precisely from depth information alone, because the depth values obtained from a low-cost camera are inaccurate. For example, lying pigs cannot be detected with a fixed threshold derived from the inaccurate depth information. Meanwhile, infrared information has the advantage of accurate pixel values, so the pigs can be separated from the background using simple image processing techniques. If some pigs are located at the four corners of the pen, however, they cannot be detected accurately because the gray values at the corners are darker than those at the center. Furthermore, if sunlight enters the pig pen during the daytime, it introduces considerable noise into both the depth and infrared information and thus makes it difficult to detect the pigs accurately. It is therefore necessary to exploit the complementary information from both the depth and infrared images in low-contrast and sunlight environments.
In this study, we propose a fast pig detection method, denoted 'FastPigDetect', that works under various illumination conditions by using the advantages of both depth (i.e., less sensitive to illumination) and infrared (i.e., more accurate pixel values) information. First, the region of interest (ROI) is set to exclude unnecessary regions, such as a feeder or another pig pen. Then, we remove not only the noise generated by sunlight in the daytime but also other noise arising from the pen environment by applying spatiotemporal interpolation to both the depth and infrared information. In the next step, the neighboring background (i.e., the floor and wall) is segmented from the pigs by analyzing the depth information, and the contrast of the infrared information is improved with a contrast enhancement technique to "roughly" detect pigs. Finally, the pigs in the pen are "precisely" detected by integrating both the depth and infrared information with simple image processing techniques. With the advantages of the two kinds of information, the pigs can be detected effectively in low-contrast as well as sunlight environments. Figure 3 shows the overall procedure for pig detection under various illumination conditions.
We define some terminology used in the proposed method for readability. Table 2 describes the terms for each procedure of the pig detection method.
In Section 3.1.1, we localize the pigs in the pig pen by removing noise from the depth information. From D_input, we first set the ROI to exclude the unnecessary regions (i.e., a feeder or another pen). Then, we apply 4 × 4 window spatiotemporal interpolation [39] as a preprocessing step to remove noise (i.e., undefined pixels), such as that caused by sunlight in the daytime. Note that because the noise generated by intense sunlight is similar to the large moving noise described in Reference [39], the spatiotemporal interpolation is conducted iteratively until the noise is removed. To find the location of each pig, the pixel frequencies of both the foreground (i.e., the pigs) and the background (i.e., the floor and wall) are calculated through histogram analysis of D_interpolate. We note that the background area is larger than the area of any pig in the pen. Accordingly, the most frequent pixel value can be selected as the background value, i.e., the value used to separate the foreground and background of the depth image from each other. By setting the most frequent pixel value as the threshold for background segmentation, the roughly localized pigs in D_localize can be obtained. However, the depth values of the wall may be the same as those of the pigs, depending on the size of the pigs, so the floor can be removed with the threshold but the wall cannot.
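The two depth preprocessing ideas above, filling undefined pixels from a spatiotemporal window and using the most frequent depth value as the background threshold, can be sketched as follows. This is an illustrative simplification rather than the interpolation of Reference [39]: it assumes undefined depth is reported as 0, that a smaller depth value means a point closer to the ceiling camera, and that the window mean is an acceptable fill value. All function names are hypothetical.

```python
from collections import Counter

UNDEFINED = 0  # assumption: the sensor reports 0 where depth is unknown


def interpolate(frames, win=4):
    """Fill undefined pixels of the latest frame.

    Each undefined pixel receives the mean of the defined pixels found in
    the same win x win block across all given frames (space and time).
    """
    current = [row[:] for row in frames[-1]]
    h, w = len(current), len(current[0])
    for by in range(0, h, win):
        for bx in range(0, w, win):
            ys = range(by, min(by + win, h))
            xs = range(bx, min(bx + win, w))
            valid = [f[y][x] for f in frames for y in ys for x in xs
                     if f[y][x] != UNDEFINED]
            if not valid:
                continue  # nothing to borrow; a later pass may fill it
            fill = round(sum(valid) / len(valid))
            for y in ys:
                for x in xs:
                    if current[y][x] == UNDEFINED:
                        current[y][x] = fill
    return current


def background_depth(depth):
    """Most frequent defined depth value, taken as the floor depth."""
    counts = Counter(v for row in depth for v in row if v != UNDEFINED)
    return counts.most_common(1)[0][0]


def localize(depth, threshold):
    """Mark defined pixels nearer to the camera than the floor as foreground."""
    return [[1 if UNDEFINED < v < threshold else 0 for v in row]
            for row in depth]
```

As the text notes, a wall pixel with the same depth as a pig would also be marked as foreground by `localize`, which is why a separate background model is needed.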

Removing Noises and Localizing Pigs
In order to resolve this problem, we model the background to produce D_background and apply the frame difference between D_background and D_input to remove the wall. Before modeling D_background, it is necessary to understand the characteristics of depth information. The depth is likely to be measured inaccurately depending on the distance between the sensor and the pigs/background. For example, even if a pig located at a corner of the pen is the same size as a pig located at the center, their depth values differ subtly because the pig in the corner is farther from the sensor than the pig in the center. Likewise, the depth values at different locations of the background may differ according to their distance from the camera. Thus, background modeling is required to calibrate the depth values at every location, and the model is then used in background subtraction to calibrate the depth values of each pig and the background.
To model the background of the pig pen, we exploit all of the depth videos recorded during a 24 h period. First, every frame is divided into the floor and the other parts (i.e., the wall and pigs) using the threshold for background segmentation selected by histogram analysis. Because the depth values of the wall may be the same as those of the pigs, the pigs and the wall are placed in the same category. Here, we define the floor and the other parts as the floor background and the other background, respectively. In the next step, the floor background and the other background are independently updated with the depth values of the floor and of the other parts over the 24 h videos. After updating each background, the depth values of the other background are copied into the regions of the floor background that were not updated, because only the depth values of the floor are updated from D_input through the threshold.
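The background modeling procedure above can be sketched in a few lines. The update rule is not specified in the text, so this sketch assumes a per-pixel running mean; it also assumes undefined depth is 0 and that floor pixels are those at or beyond the segmentation threshold. The function name `model_background` is hypothetical.

```python
def model_background(frames, threshold):
    """Build a depth background from many frames.

    Floor pixels (depth >= threshold) and other pixels (wall and pigs,
    depth < threshold) are accumulated separately. Locations that never
    showed the floor are filled from the 'other' background, which for a
    static camera corresponds mostly to the wall.
    """
    h, w = len(frames[0]), len(frames[0][0])
    floor_sum = [[0] * w for _ in range(h)]
    floor_cnt = [[0] * w for _ in range(h)]
    other_sum = [[0] * w for _ in range(h)]
    other_cnt = [[0] * w for _ in range(h)]

    for frame in frames:
        for y in range(h):
            for x in range(w):
                v = frame[y][x]
                if v == 0:
                    continue  # undefined depth (assumption)
                if v >= threshold:
                    floor_sum[y][x] += v
                    floor_cnt[y][x] += 1
                else:
                    other_sum[y][x] += v
                    other_cnt[y][x] += 1

    background = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if floor_cnt[y][x]:
                background[y][x] = floor_sum[y][x] / floor_cnt[y][x]
            elif other_cnt[y][x]:
                background[y][x] = other_sum[y][x] / other_cnt[y][x]
    return background
```

In this sketch a pig that walks over a floor pixel does not contaminate the floor estimate, because its depth values are accumulated in the other background and used only where the floor was never observed.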
The frame difference with the modeled D_background is applied to every D_interpolate, and by applying histogram equalization and the Otsu algorithm [47] to the resulting image, D′_localize, in which the pigs are localized with the wall removed, can be obtained. Applying the two localized images, D_localize and D′_localize, to the infrared images allows for the robust detection of the pigs in sunlight and low-contrast conditions. Figure 4 shows the results of localizing the pigs through the depth information in low-contrast and sunlight conditions, respectively. The missing pig-pixel problem could be solved effectively (compare the green areas shown in Figures 2 and 4).
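A minimal sketch of the frame difference step is given below. The text binarizes the difference image with histogram equalization followed by Otsu; to keep the example short, a fixed threshold `min_diff` is swapped in as a stand-in, and both that parameter and the function names are hypothetical.

```python
def frame_difference(depth, background):
    """Absolute per-pixel difference between a depth frame and the background.

    Undefined depth pixels (assumed to be 0) produce no difference.
    """
    return [[abs(b - d) if d != 0 else 0 for d, b in zip(drow, brow)]
            for drow, brow in zip(depth, background)]


def localize_by_difference(depth, background, min_diff):
    """Mark pixels that differ from the background by at least min_diff."""
    diff = frame_difference(depth, background)
    return [[1 if v >= min_diff else 0 for v in row] for row in diff]
```

A wall pixel has nearly the same depth in the frame and in the background model, so its difference is close to zero and it drops out, whereas a pig standing or lying on the floor changes the depth at its location and is retained.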

Procedure with Infrared Information
In fact, the pigs in the pen can be detected using characteristics of the infrared information I_input, such as its accurate pixel values. However, pigs may not be detected in I_input under illumination conditions such as sunlight and low contrast. In other words, because the infrared image is affected by illumination, the localized images obtained from the depth information, D_localize and D′_localize, should be exploited to detect the pigs accurately.
In the same way as for the depth images, the ROI of I_input is set in order to exclude the unnecessary regions in the pen, and then the 4 × 4 spatiotemporal interpolation is performed to remove noise such as sunlight. Note that the spatiotemporal interpolation is performed only once, because the pixel values in I_input are not correctly interpolated, owing to their sensitivity to illumination conditions. Then, histogram equalization (HE) [46] is performed to resolve the low contrast of I_interpolate, which makes the contrast consistent across the image. After HE, the Otsu algorithm is applied to I_contrast to roughly localize the pigs. Figure 5 shows the results of localizing the pigs in I_localize, obtained from I_contrast, in low-contrast and sunlight conditions. However, the pigs in I_localize cannot be detected accurately because HE also coordinates the contrast of the pixels in the floor, wall, and sunlight regions. That is, even though all the pigs in the pen can be confirmed, the noise is not totally removed. This noise can be removed by exploiting the complementary information from the infrared images (i.e., I_localize) and depth images (i.e., D_localize and D′_localize) simultaneously.
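Histogram equalization, as used above to turn I_interpolate into I_contrast, can be written as a lookup table built from the cumulative histogram. The sketch below is a standard textbook formulation in pure Python, assuming 8-bit gray values; the function name `equalize_histogram` is hypothetical and the code is not taken from the paper.

```python
def equalize_histogram(image, levels=256):
    """Spread the gray values of an image over the full range [0, levels - 1]."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        return [row[:] for row in image]  # constant image: nothing to stretch

    # Map each gray level through the normalized cumulative distribution.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
           for c in cdf]
    return [[lut[p] for p in row] for row in image]
```

Because the mapping is global, regions such as the floor, the wall, and sunlit patches are stretched together with the pigs, which matches the observation above that noise remains after HE and Otsu.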


Detecting Pigs Using both Depth and Infrared Information
In order to detect only the pigs, DI_1 is produced by conducting an intersection operation between D_localize and I_localize, to which HE and the Otsu algorithm have been applied. Figure 6 shows the result of the intersection operation, which detects the pigs by removing the noise generated from the floor, wall, and sunlight. Although the pigs can be identified in low-contrast and sunlight conditions, some pigs cannot be distinguished from the detected background because all of the pixels in the infrared image are consistently coordinated by HE.
Nevertheless, there is a problem in that the wall and floor are still detected in DI_1: first, the wall is removed neither in I_localize with HE and the Otsu algorithm nor in D_localize; second, the center of the floor is also detected, largely because HE coordinates the contrast of all pixels. To separate the pigs from the wall and floor, D′_localize, i.e., the frame difference image between D_input and D_background, is used. Because the wall in D′_localize is mostly removed through background subtraction and the pigs are roughly localized in the image, the pigs can be detected by performing an intersection between DI_1 and D′_localize, which yields DI_2, where most of the wall and floor are removed.
Given DI_2, post-processing with some image processing techniques is performed to detect the pigs accurately. In order to remove the noise remaining in DI_2, an erosion operation is conducted to remove or minimize small noise that is adjacent to the objects or generated by the intersection operation. Then, all of the objects are labeled through connected component analysis (CCA), the area of each object is calculated, and each object is kept or removed according to its size. After removing the noise according to size, the pigs can be precisely detected by using a dilation operation to recover their shapes. Figure 7 shows that the pigs are finally detected by applying the proposed method to D_input and I_input in various illumination conditions. Finally, the proposed method is described in Algorithm 1 as follows.
Step 2: Detecting pigs with depth and infrared information collectively
    Erode DI_2 to remove and minimize noises;
    Conduct CCA to remove the minute noises in DI_2;
    Dilate DI_2 to recover shapes of the pigs;
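The combination and post-processing steps (intersection, erosion, size filtering by connected component analysis, and dilation) can be sketched on binary masks as below. This is a simplified illustration under stated assumptions: a 3 × 3 structuring element, 4-connectivity, and a hypothetical `min_area` parameter, none of which are given in the text. All function names are hypothetical.

```python
from collections import deque


def intersect(a, b):
    """Pixel-wise AND of two binary masks."""
    return [[x & y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def _window(mask, y, x):
    """Values in the 3 x 3 neighbourhood of (y, x), clipped at the border."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def erode(mask):
    """Keep a pixel only if its whole 3 x 3 neighbourhood is foreground."""
    return [[int(all(_window(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3 x 3 neighbourhood is foreground."""
    return [[int(any(_window(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def remove_small_components(mask, min_area):
    """Label 4-connected components and drop those smaller than min_area."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_area:
                for cy, cx in component:
                    out[cy][cx] = 1
    return out


def detect_pigs(depth_mask, diff_mask, infrared_mask, min_area):
    """DI_1 = depth AND infrared; DI_2 = DI_1 AND difference; then clean up."""
    di1 = intersect(depth_mask, infrared_mask)
    di2 = intersect(di1, diff_mask)
    cleaned = remove_small_components(erode(di2), min_area)
    return dilate(cleaned)
```

Eroding before the size filter removes isolated pixels cheaply, and dilating afterwards restores the outline that erosion shaved off the remaining objects.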

Experimental Setup and Resources for the Experiment
The following experimental setup was used to conduct our pig detection method: Intel Core i7-7700K 4.20 GHz (Intel, Santa Clara, CA, USA), NVIDIA GeForce GTX1080 Ti 11 GB VRAM (NVIDIA, Santa Clara, CA, USA), 32 GB RAM, Ubuntu 16.04.2 LTS (Canonical Ltd, London, UK), and OpenCV 3.4 [53] for image processing. We installed an Intel RealSense low-cost camera (D435 model, Intel, Santa Clara, CA, USA) [43] on a ceiling at a height of 3.2 m in a 2.0 m × 4.9 m pig pen located at Chungbuk National University, Korea.
In the pig pen, a total of nine pigs (Duroc × Landrace × Yorkshire) were raised, with an average initial body weight (BW) of 92.5 ± 5.9 kg. We simultaneously obtained infrared and depth videos from the installed camera, which had a resolution of 1280 × 720 and 30 frames per second (FPS). Figure 8 displays the whole monitoring setup with the camera in the pig pen.
We used the depth and infrared images obtained from the camera during a 24 h period.Because it was extremely difficult to create the ground-truth image 24 h videos (i.e., 2,592,000 frames were obtained from 24 h videos of 30 FPS), our method for detecting the pigs was applied to three frames per ten minutes (i.e., total of 432 frames) selected from each video.Meanwhile, as explained in Section 2, various illumination conditions in the depth and infrared videos were confirmed, such as low contrast and sunlight.In particular, the illumination issues of low-contrast and sunlight conditions were evidently found when the pigs were located at the corners in the pen or when sunlight appeared at the specific time (08:00-10:00 a.m.).Thus, we detected the pigs while considering the issues for the illumination conditions.3.4 [53] for image processing.We installed an Intel RealSense low-cost camera (D435 model, Intel, Santa Clara, CA, USA) [43] on a ceiling at a height of 3.2 m in a 2.0 m 4.9 m pig pen located in Chungbuk National University, Korea.
In the pig pen, a total of nine pigs (Duroc × Landrace × Yorkshire) were raised, with an average initial body weight (BW) of 92.5 ± 5.9 kg.We simultaneously obtained infrared and depth videos from the installed camera, which had a resolution of 1,280 720 and 30 frames per second (FPS).Figure 8 displays the whole monitoring setup with the camera in the pig pen.We used the depth and infrared images obtained from the camera during a 24 h period.Because it was extremely difficult to create the ground-truth image 24 h videos (i.e., 2,592,000 frames were obtained from 24 h videos of 30 FPS), our method for detecting the pigs was applied to three frames per ten minutes (i.e., total of 432 frames) selected from each video.Meanwhile, as explained in Section 2, various illumination conditions in the depth and infrared videos were confirmed, such as low contrast and sunlight.In particular, the illumination issues of low-contrast and sunlight conditions were evidently found when the pigs were located at the corners in the pen or when sunlight appeared at the specific time (08:00-10:00 a.m.).Thus, we detected the pigs while considering the issues for the illumination conditions.
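The frame counts quoted above follow directly from the recording parameters. The short check below reproduces the arithmetic; the variable names are illustrative only.

```python
FPS = 30                # frames per second of the recorded videos
HOURS = 24              # length of the recording period
PIGS_IN_PEN = 9         # number of pigs raised in the pen

frames_per_video = HOURS * 60 * 60 * FPS        # all frames in one 24 h video
ten_minute_slots = HOURS * 60 // 10             # number of ten-minute intervals
evaluated_frames = ten_minute_slots * 3         # three frames per interval
pig_instances = evaluated_frames * PIGS_IN_PEN  # pigs to detect over all frames

print(frames_per_video, evaluated_frames, pig_instances)
```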

Detection of Pigs under Various Illumination Conditions
Initially, we modeled D_background in an independent procedure so that the frame difference between D_background and D_interpolate could be computed. To remove or minimize the noises caused by the illumination conditions in the depth and infrared information, a spatiotemporal interpolation technique was applied to 1296 frames extracted from each video. Note that because the spatiotemporal interpolation technique interpolates three frames into one frame, 1296 frames were needed to detect the pigs in 432 frames. Simple image processing techniques were then applied in each domain to the D_interpolate and I_interpolate derived from the interpolation technique.
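As a rough illustration of merging three frames into one, the sketch below fills invalid pixels first from the same pixel in the other frames and then from valid 3 × 3 neighbours. The exact interpolation scheme is defined in the method section rather than here, so this sketch makes its own assumptions: an invalid pixel is marked by the value 0, valid values are averaged, and the function name `interpolate_three_frames` is hypothetical.

```python
def interpolate_three_frames(frames, invalid=0):
    """Merge consecutive frames into one, filling invalid pixels (assumed scheme)."""
    h, w = len(frames[0]), len(frames[0][0])
    merged = [[invalid] * w for _ in range(h)]

    # Temporal pass: average the valid values of each pixel over the frames.
    for y in range(h):
        for x in range(w):
            valid = [f[y][x] for f in frames if f[y][x] != invalid]
            if valid:
                merged[y][x] = sum(valid) / len(valid)

    # Spatial pass: fill pixels that are still invalid from valid 3x3 neighbours.
    out = [row[:] for row in merged]
    for y in range(h):
        for x in range(w):
            if merged[y][x] != invalid:
                continue
            near = [merged[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                    if merged[ny][nx] != invalid]
            if near:
                out[y][x] = sum(near) / len(near)
    return out
```

Under this scheme, a pixel that is invalid in all three frames still receives a value as long as at least one of its neighbours is valid in the merged frame.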
For the depth information, a histogram analysis was first performed to obtain D_localize from D_interpolate. Here, the depth value corresponding to the background converged to 53 in the histogram, so the threshold for segmenting the background was defined as 53. D_localize was then derived by binarizing D_interpolate with this threshold. In the second step, the frame difference between D_background and D_interpolate was computed and binarized with the Otsu algorithm to derive D′_localize. Note that because the parameter for localizing the pigs may change continuously owing to the inaccurate depth values, the Otsu algorithm should be used to determine the parameter automatically for every image. For the infrared information, I_contrast was obtained by applying HE to I_interpolate. Similar to the depth procedure, the Otsu algorithm was used to determine the parameter for segmenting the background, so that I_localize, in which the pigs were localized, was obtained from I_contrast. DI_1 and DI_2 were then obtained by intersecting these localized images, which removed the noises resulting from the illumination conditions. Finally, a morphology operation and CCA were applied to DI_2 as post-processing steps for refining the detected pigs. As the size of each noise calculated by CCA was less than 100, the noises were simply removed with the threshold defined as 100. After that, a dilation operation was conducted three times to sufficiently recover the shape of the pigs, and as a result, all of the pigs in the pen could be accurately detected. Figure 9 illustrates the pigs detected by the proposed method from the 24 h recorded videos. In Figure 9, only one detection result per hour is displayed because of the large number of frames in the 24 h videos.
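The localization and intersection steps can be sketched in pure Python as follows. This is an illustrative sketch rather than the authors' code: the helper names are hypothetical, the infrared input is assumed to be already contrast-equalized, images are assumed to hold integer gray levels in 0-255, and pixels above a threshold are treated as foreground, following the comparison used in Algorithm 1.

```python
def otsu_threshold(image, levels=256):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]            # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0          # pixels with value > t
        if w0 == 0 or w1 == 0:
            continue
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(image, threshold):
    return [[1 if v > threshold else 0 for v in row] for row in image]


def abs_diff(a, b):
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def intersect(a, b):
    return [[p & q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def detect_candidates(d_interp, d_background, i_contrast, depth_threshold=53):
    """Compute DI_1 and DI_2 from the interpolated depth and infrared images."""
    d_localize = binarize(d_interp, depth_threshold)            # fixed threshold
    diff = abs_diff(d_interp, d_background)                     # background subtraction
    d_prime_localize = binarize(diff, otsu_threshold(diff))     # Otsu on the difference
    i_localize = binarize(i_contrast, otsu_threshold(i_contrast))
    di1 = intersect(d_localize, i_localize)
    di2 = intersect(di1, d_prime_localize)
    return di1, di2
```

The fixed depth threshold of 53 is the value reported for this pen; the Otsu thresholds are recomputed for every image, which is the reason given in the text for using the Otsu algorithm.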

Evaluation of Detection Performance
To evaluate the pig detection performance of the proposed method, we compared its detection results with those of state-of-the-art deep learning-based methods, namely YOLO9000 [54] (i.e., a bounding box-based object detection method) and DeepLab [55] (i.e., a pixel-level semantic segmentation method). In particular, YOLO9000 was selected among many bounding box-based object detectors because it is known to be very fast and reasonably accurate (owing to its "you only look once" design). DeepLab was also selected among many pixel-level semantic segmentors because it is known to be fast and accurate (owing to its atrous convolution). Because YOLO9000 is a bounding box-based object detector and DeepLab is a pixel-level semantic segmentor, YOLO9000 is expected to be faster but less accurate than DeepLab. Note that because the depth information was inaccurate, as described in Section 2, we used only the infrared information for training and testing (i.e., detecting the pigs) with the deep learning-based methods. Before executing the deep learning-based methods, we found that the ground-truth images were hard to generate and that the available ground truth was insufficient for training the deep learning-based methods. Thus, we generated 2592 ground-truth data through data augmentation by flipping the input data vertically and horizontally. Note that the input image resolution could be increased to detect more objects [56]. However, for a fair comparison, the dataset for training and testing both YOLO9000 and DeepLab had the same resolution as the data used in the proposed method.
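Flip-based augmentation must transform each image and its ground truth identically. The sketch below shows this for a pixel-level mask. Which flipped variants were combined to reach 2592 samples is not stated, so the set returned by the hypothetical `augment` function (original, vertical flip, horizontal flip) is an assumption.

```python
def flip_vertical(img):
    """Mirror the rows (top <-> bottom)."""
    return [row[:] for row in img[::-1]]


def flip_horizontal(img):
    """Mirror the columns (left <-> right)."""
    return [row[::-1] for row in img]


def augment(image, mask):
    """Return (image, mask) pairs with the same flip applied to both."""
    return [
        (image, mask),
        (flip_vertical(image), flip_vertical(mask)),
        (flip_horizontal(image), flip_horizontal(mask)),
    ]
```

For a bounding box-based detector such as YOLO9000, the box coordinates would have to be mirrored in the same way as the mask is here.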
In the case of YOLO9000, we produced a model from the training data, which was composed of 2592 infrared frames. The hyperparameters used for training YOLO9000 were as follows: 0.001 for the learning rate, 0.0005 for the decay, the default anchor parameters, 0.9 for the momentum, leaky ReLU as the activation function, and 10,000 for the epoch. In the case of DeepLab, we also produced a model from the same training dataset as for YOLO9000. The hyperparameters used for training DeepLab were as follows: 0.006 for the learning rate, 0.0005 for the decay, 0.9 for the momentum, ReLU as the activation function, and 30,000 for the epoch. In the training step of each method, a model pretrained through ResNet with the COCO dataset was exploited. We then used 432 test frames, covering the sunlight and low-contrast conditions in the pen as well as normal conditions. In the test step, YOLO9000 generated bounding boxes on the pigs and DeepLab conducted semantic segmentation between the foreground (i.e., pigs) and background (i.e., floor and wall). However, unlike the proposed method, neither method could detect some of the pigs located at the corners or in the sunlit area. Figure 10 shows the pigs detected by each method under the various illumination conditions.
To compare the performance of the proposed method and the deep learning-based methods, we calculated the precision, recall, and detection accuracy (denoted as ACC and defined as the intersection-over-union [57]) for each method using the following equations:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (1)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (2)$$

$$\mathrm{ACC} = \frac{TP}{TP + FP + FN} \qquad (3)$$

where true positive (TP) means a pixel on the pig predicted as the pig, false positive (FP) means a pixel on the background predicted as the pig, and false negative (FN) means a pixel on the pig predicted as the background. In Figure 10, for example, the falsely detected pixels are marked in red (FP, i.e., background detected as pig) and green (FN, i.e., pig detected as background), respectively, for each detection method. In the experimental results, the precision was measured as 0.79 (YOLO9000), 0.91 (DeepLab), and 0.92 (proposed method). The recall was 0.64 (YOLO9000), 0.88 (DeepLab), and 0.86 (proposed method). Lastly, the detection accuracy (i.e., ACC) was measured as 0.54 (YOLO9000), 0.79 (DeepLab), and 0.79 (proposed method), as shown in Table 3. By carefully fusing the depth and infrared information, the proposed method could thus provide an accuracy comparable to or higher than those of the deep learning-based methods.
In addition, the execution time of each method was measured in order to verify the real-time requirement for pig detection. As shown in Table 3, YOLO9000 could provide faster results than DeepLab. By applying simple but effective image processing techniques without any time-consuming techniques, the proposed method could provide much faster results than YOLO9000. Note that the deep learning-based methods have a huge number of weights to be computed and thus required tens or hundreds of milliseconds to detect the pigs in one image, even with a powerful GPU. In contrast, the execution time of the proposed method was measured with a single CPU core. If the simple pixel-level operations of the proposed method are parallelized, its execution speed can be improved further.
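The pixel-level metrics can be computed from a predicted mask and a ground-truth mask as in the following sketch. The function name `pixel_metrics` and the convention of returning 0.0 when a denominator is zero are choices made for this example, not part of the paper.

```python
def pixel_metrics(pred, truth):
    """Return (precision, recall, ACC) for two binary masks of equal size.

    ACC is the intersection-over-union: TP / (TP + FP + FN).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1        # pig pixel predicted as pig
            elif p:
                fp += 1        # background pixel predicted as pig
            elif t:
                fn += 1        # pig pixel predicted as background
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    acc = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, acc
```

For a single mask with at least one true positive, the three values are linked by ACC = 1 / (1/Precision + 1/Recall − 1). Averages taken over many frames need not satisfy this identity exactly.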
For real-time video stream applications such as 24 h pig monitoring or autonomous driving [58], the accuracy must be maximized while the real-time constraint is satisfied. Generally, there is a tradeoff between accuracy and the computational resources required. That is, a higher accuracy requires more computational resources, whereas fewer computational resources lead to a lower accuracy. Thus, the tradeoff between accuracy (i.e., ACC) and processing speed (i.e., FPS) should be analyzed for the 24 h pig monitoring application. Similar cases have been analyzed by the video compression community to control the power consumption of an embedded computer and to maximize the compressed video quality [59,60]. For the purpose of explanation, we define the "real-time accuracy" (denoted as ACC_RealTime) as follows.
To derive the collective (i.e., ACC vs. FPS) performance of a method X, we first represent the performance of method X in the two-dimensional domain of FPS (i.e., x axis) and ACC (i.e., y axis), as shown in Figure 11a:

$$\mathrm{Performance} = (X_{FPS}, X_{ACC}), \quad \text{where } 0 < X_{ACC} < 1. \qquad (4)$$

Then, we make two assumptions: first, the upper limit of X_ACC with unlimited computational resources (i.e., X_FPS = 0) is 1 (see the black point at (0, 1) in Figure 11a); second, each computational step of method X contributes equally to the accuracy. In addition, the real-time criterion for a video stream application such as 24 h pig monitoring is set to 30 FPS (see the dashed line in Figure 11a). We estimate the real-time accuracy of method X, i.e., its detection accuracy at 30 FPS, from the straight line through the two points (0, 1) and (X_FPS, X_ACC) by using Equations (5) and (6):

$$\mathrm{ACC}_{RealTime}(X) = 1 + \alpha \times 30 \qquad (5)$$

where

$$\alpha = \frac{X_{ACC} - 1}{X_{FPS}}. \qquad (6)$$

For example, we can represent the performance of three methods (i.e., A, B, and C) in the two-dimensional domain of FPS and ACC, as shown in Figure 11a. Because A_FPS < 30 and B_FPS > 30, ACC_RealTime(A) < A_ACC and ACC_RealTime(B) > B_ACC. However, the real-time accuracy of method C is undefined because the estimated ACC_RealTime(C) < 0.
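The real-time accuracy can be evaluated numerically under the linear model implied by the two points (0, 1) and (X_FPS, X_ACC). The sketch below assumes that model, and the function name `real_time_accuracy` is illustrative. The first call uses the speed and accuracy reported for the proposed method (114 FPS, ACC 0.79); the slow method in the second call is hypothetical and only demonstrates the undefined case.

```python
def real_time_accuracy(fps, acc, target_fps=30.0):
    """Estimate the accuracy at target_fps on the line through (0, 1) and (fps, acc).

    Returns None when the estimate is negative, i.e., when the real-time
    accuracy is undefined for that method.
    """
    if fps <= 0 or not 0.0 < acc < 1.0:
        raise ValueError("expected fps > 0 and 0 < acc < 1")
    slope = (acc - 1.0) / fps          # accuracy change per additional FPS
    estimate = 1.0 + slope * target_fps
    return estimate if estimate >= 0.0 else None


proposed = real_time_accuracy(114, 0.79)       # faster than 30 FPS: estimate above 0.79
hypothetical_slow = real_time_accuracy(5, 0.79)  # hypothetical method: undefined (None)
```

A method that runs faster than 30 FPS is credited with a real-time accuracy above its measured ACC, and a method that runs slower is penalized, which matches the ordering described for methods A and B.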
This means that method C cannot satisfy the real-time requirement because its accuracy is too low relative to its resource consumption. Figure 11b shows the possible area where ACC_RealTime can be defined. That is, the real-time accuracies of the proposed and YOLO9000 methods could be defined sufficiently, whereas that of the DeepLab method could be defined only marginally, with a very low real-time accuracy on our experimental setup. In particular, precise detection of the pigs using DeepLab was very difficult in low-contrast and sunlight environments, and the huge computational workload required for semantic segmentation was a burden for real-time pig detection. In contrast, the proposed method (i.e., FastPigDetect) could provide a reasonable accuracy with a much smaller computational workload. As described in Section 1, our final goal is a complete and automatic real-time monitoring application that involves both intermediate- and high-level vision tasks. That is, pig detection should be performed as fast as possible to leave time for the subsequent intermediate- and high-level vision tasks. With less time-consuming techniques, it becomes possible to establish such a real-time pig monitoring application.
Although the FastPigDetect method could detect the pigs in real time by applying simple image processing to the complementary infrared and depth modalities, a pig detection method whose parameters are optimized automatically still needs to be developed. That is, generalization to other image modalities or to data from other pig rooms may not be possible with the current method because its parameters for the depth and infrared information were optimized for our experimental pig room only. In fact, the contribution of the proposed method is the real-time detection of pigs under various illumination conditions, including intense sunlight, in a pig room. For commercial products, it is important to develop a general pig detection algorithm whose parameters are determined automatically. In particular, the parameters for the RoI setting or the morphological operations (e.g., dilation/erosion) should be optimized according to the structure of a pig room (e.g., shape of the floor/wall or size of the pig room) or the camera installation environment (e.g., installation height). Although this generalization capability is outside the scope of this study, it is required for commercial products and will be interesting future work.
Furthermore, our proposed method exploited both infrared and depth information, whereas the deep learning-based methods used only infrared information. In the previous study [39], a YOLO model trained with depth information was used to detect only standing pigs in a pig room. When the YOLO model was trained with depth information only, however, its detection performance for both standing and lying pigs was not acceptable. Because detecting lying pigs is much more difficult than detecting standing pigs owing to the inaccurate depth values, we conducted the training and testing with YOLO using only infrared information to accomplish the goal of this study. As shown in Figure 1, the infrared information has more accurate pixel values than the depth information. Multimodal learning, however, raises several research issues. Thus, as future work, we will explore how much the pig detection accuracy of the deep learning-based methods improves when depth information is used as an additional channel alongside the infrared information, as well as when the deep learning architecture is fine-tuned.

Conclusions
In a surveillance environment on a realistic pig farm, fast pig detection is important for efficiently managing the health care of the pigs. Nevertheless, pigs cannot always be detected accurately because of the various illumination conditions in a realistic pig farm. With an infrared camera, for example, the gray values of pigs located at the four corners are generally darker than those of pigs located at the center. In particular, sunlight through a window in the daytime makes it difficult to separate pigs from the neighboring wall and floor.
In this study, we concentrated on detecting pigs in real time under various illumination conditions, with the final goal of consistently monitoring and analyzing the behaviors of individual pigs over 24 h. In other words, we proposed a pig detection method for daytime and nighttime that uses less time-consuming techniques. As an initial preprocessing step, a spatiotemporal interpolation technique was applied to remove the noise caused by sunlight. Then, we detected pigs by carefully fusing the depth and infrared information and applying image processing techniques. In particular, we applied only simple but effective image processing techniques (i.e., without any time-consuming techniques, such as frequency-, optimization-, or deep learning-based detection) with both previous and current frame information so that the final goal of intelligent pig monitoring can run in real time.
Based on the experimental results for 432 video frames (including 3888 pigs) over 24 h, we confirmed that all 3888 pigs could be detected correctly (while the accuracy with ground-truth was 0.79) in real time (i.e., 114 FPS).Compared with the state-of-the-art deep learning-based methods, the proposed method could detect pigs more accurately and more quickly.We will extend this study to develop a real-time tracking system for individual pigs over 24 h for the management of individual pigs as the final goal.

Figure 1. Various illumination conditions: (a) at 7 a.m. (for the purpose of explanation, we denote this kind of image as a low-contrast image) and (b) at 9 a.m. (for the purpose of explanation, we denote this kind of image as a sunlight image).

Figure 2. The difficulties of pig detection under various illumination conditions: (a) the results of Otsu [47] after contrast limited adaptive histogram equalization (CLAHE) [45] with a low-contrast image and (b) the results of Otsu [47] after CLAHE [45] with a sunlight image.

Figure 3. The overall procedure of the proposed method.


Figure 4. The localization of the pigs with a low-contrast image (at 7 a.m.) and a sunlight image (at 9 a.m.). D_localize and D′_localize are generated through the threshold from a histogram analysis of the depth values and through background subtraction using the modeled D_background, respectively.

Figure 5. The localization of the pigs in the infrared information at low-contrast and sunlight conditions. The bold box indicates the noises detected by histogram equalization (HE) in I_localize. Although the pigs can be identified at low-contrast and sunlight conditions, some pigs are unidentified by the detected background because all of the pixels in I_interpolate are consistently coordinated by HE.

Figure 6. The result of the intersection operation between D_localize and I_localize to detect the pigs.

Figure 7. The detection result of all pigs by using depth information and infrared information.

Algorithm 1. Pig detection algorithm under various illumination conditions
Input: Depth and infrared images
Output: Detected pig image
Step 1: Removing noises and localizing pigs with depth and infrared information individually
Procedure with depth information:
    Generate D_background by modeling the background during the 24 h videos;
    D_interpolate = SpatTempIntp(D_input);
    threshold = HistAnalysis(D_interpolate);
    for y = 0 to height:
        for x = 0 to width:
            if D_interpolate(x, y) > threshold:
                D_localize(x, y) = 255;
            else:
                D_localize(x, y) = 0;
    D′_localize = BackgroundSubtract(D_interpolate, D_background);
    D′_localize = Otsu(D′_localize);
Procedure with infrared information:
    I_interpolate = SpatTempIntp(I_input);
    I_contrast = HistEqualization(I_interpolate);
    I_localize = Otsu(I_contrast);

Figure 8. The experimental setup with a RealSense low-cost camera.

Figure 9. The results of pig detection under various illumination conditions.

Figure 10. The results of each method for pig detection: (a) the results with a low-contrast image and (b) the results with a sunlight image.

Figure 11. A comparison of each method by using various performance metrics: (a) the illustration of the real-time accuracy in the two-dimensional domain of processing speed (i.e., FPS) and accuracy (i.e., ACC) and (b) the comparison of the real-time accuracy between the proposed method and deep learning-based methods. The shaded area shows the range of ACC for which ACC_RealTime can be defined.

Table 1. Some of the pig detection results (published during 2009-2018).

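Table 2. Definitions of the key terminologies of the proposed method.
Depth
    D_input: Depth input image
    D_background: Depth background image through modeling during 24 h videos
    D_interpolate: Depth interpolated image through spatiotemporal interpolation
    D_localize: Depth image where pigs are localized through threshold
    D′_localize: Depth image where pigs are localized through background subtraction and Otsu
Infrared
    I_input: Infrared input image
    I_interpolate: Infrared interpolated image with spatiotemporal interpolation
    I_contrast: Infrared image where the contrast is coordinated by histogram equalization
    I_localize: Infrared image where pigs are localized by Otsu algorithm
Depth + Infrared
    DI_1: Intersection image between D_localize and I_localize
    DI_2: Intersection image between DI_1 and D′_localize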

Table 3. A comparison of the average performance.