Article

Development of an Algorithm for Predicting Broiler Shipment Weight in a Smart Farm Environment

1 Graduate School of Artificial Intelligence, Jeonju University, Jeonju-si 55069, Republic of Korea
2 Artificial Intelligence Research Center, Jeonju University, Jeonju-si 55069, Republic of Korea
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(5), 539; https://doi.org/10.3390/agriculture15050539
Submission received: 10 January 2025 / Revised: 3 February 2025 / Accepted: 25 February 2025 / Published: 1 March 2025
(This article belongs to the Section Digital Agriculture)

Abstract

Broiler body weight is key information for monitoring growth progress and adjusting the breeding schedule, and predicting live weight at the time of shipment is essential for producing high-quality broilers that meet consumer demand. To this end, we analyze broiler weight data automatically measured in a smart broiler house equipped with an intelligent system and predict the weight up to the time of shipment. To estimate an accurate daily representative value of broiler body weight, the K-means clustering method and the kernel density estimation method were applied, and the growth trends generated by each method were used as training data for the Prophet predictor, the double exponential smoothing predictor, the ARIMA predictor, and the Gompertz growth model. The experimental results show that the K-means + Prophet combination achieved the best prediction performance among the algorithm combinations proposed in this paper. The prediction results of the presented algorithm can be used to analyze the growth progress of broilers in actual broiler houses and provide meaningful evidence for adjusting the breeding schedule with the time of shipment in mind.

1. Introduction

The body weight of broilers is key information for understanding their growth progress, and predicting it is essential for maximizing meat production. Cobb 500, the most widely bred broiler breed, accounting for approximately half of all farmed chickens worldwide [1], can reach a live weight of 1 kg at 21 days of age and 2 kg at 32 days of age [2]. These characteristics allow it to be raised flexibly in many countries, where it is grown to the locally preferred consumption weight before being processed and consumed. For example, the Republic of Korea prefers a weight of about 1500 g, which suits the widely consumed Korean Fried Chicken and smoked product forms [3], and the rearing period from the start of rearing to shipment is set by predicting the optimal rearing time for meat of the preferred quality. The shipping time is determined by the growth progress in the broiler farm, and accurate on-farm weight information before shipping is required to determine the optimal shipping time in light of the broilers' daily weight gain [4]. Predicting the weight at the time of shipping, and the weight gain until then, is thus a problem directly tied to the farm's income and to the sales of the slaughter company responsible for distribution and processing. However, it is practically difficult to measure the exact weight of broilers manually every day. Cobb 500 is a fast-growing, high-efficiency breed, but it is also prone to mortality; external physical stimulation is a major cause of death, and the stress caused by manual weighing can significantly increase the mortality rate [5]. In addition, weight measurement personnel continuously exposed to the low oxygen concentration and the high dust and ammonia gas concentrations inside the chicken house face conditions harmful to the human body [6].
In this paper, we study a prediction algorithm that uses broiler weight data automatically collected from smart broiler farms in Namwon and Wanju, Republic of Korea, to predict the exact weight at the time of shipment without manual measurement. Most existing studies use manually measured, refined data to predict broiler shipment weight. In this study, we collect broiler weight data from actual smart barns and propose a new data refinement and prediction process that effectively removes outliers and noise. To this end, a live weight measuring device was installed inside the barn to collect data automatically, and we identified problems in which broilers were poorly positioned on the measurement board and feces, feed, carcasses, and feathers were measured as noise. The key contribution of this study is a daily representative value estimation method that overcomes the limitations of the conventional simple average-based approach and can be applied to diverse data sets containing outliers. Automatically collected data contain numerous outliers that are difficult to remove with traditional detection methods, and a prediction made on data that still include them incurs large errors. Therefore, this study simplifies data whose complexity is inflated by outliers to obtain clear trend information and addresses the problem through time series prediction, an approach rarely used in existing studies. We developed a method that effectively removes outliers and separates meaningful data without complex preprocessing by applying clustering and density estimation techniques. In this way, time series data consisting of representative daily weight values that explain broiler growth patterns were created and used as optimized input data for the prediction model. We applied a clustering method and a density analysis method to the data containing outliers and noise and implemented a prediction algorithm based on the Prophet model distributed by Meta (formerly Facebook). The results were compared with the predictions of the Gompertz growth curve, a growth modeling function used in many existing studies, and of the double exponential smoothing and ARIMA models, which are representative time series predictors. By proposing an automated refinement and prediction process for broiler weight data, this study improves the reliability of weight monitoring and shipping time prediction in a smart livestock environment. It is expected to complement labor-intensive measurement methods and support more efficient data-based decision making.

2. Background

2.1. Previous Studies on Broiler Weight Analysis

Research on analyzing the growth of broilers, explaining the growth curve, and predicting the growth of broilers has been ongoing for a long time.
W. B. Roush et al. (2006) modeled the growth curve of broilers using manually measured broiler weight data from 1 to 70 days. They noted that broilers have a gradual initial growth period, a rapid growth period, and a stable growth period. They classified the collected data set into even-day data and odd-day data and used them for learning and experiments with the Gompertz model and the Neural Network model. The experimental results showed that both models were suitable for explaining the broiler growth curve, and the Neural Network model was slightly superior, recording a mean squared error (MSE) of 382.2 and a mean absolute percentage error (MAPE) of 2.983 [7].
M. Topal et al. (2008) conducted a comparative study using weekly broiler weight data from 0 to 6 weeks, manually measured once a week, to select the nonlinear function that best explains the growth curve of broilers. Among the Weibull, MMF, Gompertz, Bertalanffy, and logistic models, the model that best explained broiler growth was the Weibull model, which recorded a coefficient of determination (R-squared) of 1.0 and a mean absolute percentage error (MAPE) of 0.03 [8].
Ahmad H.A. et al. (2009) generated representative weights from 0 to 7 weeks using manually measured weight data of broilers. The generated representative weights were used as training data for a Neural Network model performing backpropagation and the model’s suitability was tested. The trained Neural Network model recorded a coefficient of determination (R-squared) of 0.998 and explained the growth curve of broilers appropriately [9].
Pinto, M. A. et al. (2020) used the Gompertz growth model to analyze the growth curves of broilers fed various nutrient ratios. The study aimed to select the nutrient ratio that produced the fastest growth rate in Cobb 500 broilers using four comparison groups fed different nutrient ratios. The Gompertz growth model was recognized as a good fit, with an R-squared of 0.99 for all four data sets, and the comparison group fed HCl, SO4, and calcium pidolate showed the highest growth rate [10].
Chumthong, R. et al. (2021) analyzed the growth curve of Blackbone chickens in Thailand using 3280 chickens manually measured at two-week intervals. Three nonlinear growth models, Gompertz, logistic, and Von Bertalanffy, were fitted to the two-week measurements, and the Von Bertalanffy model showed the best fit for both males and females, recording an R-squared of 0.9 [11].
Alijani, S. et al. (2021) collected 45 days of manual weight data from 823 broilers to compare the growth curves of ascitic and healthy broilers and observe the growth differences between them. The logistic, Gompertz, Richards, Lopez, and Von Bertalanffy models were applied to describe the growth curve. For healthy broilers, the Gompertz model showed an R-squared of 0.99 and an AIC of 68 and was recognized as a good fit. For ascitic broilers, however, its AIC rose to 85.4, indicating a poorer fit, and the growth model that best explained ascitic broilers was the Richards nonlinear curve [12].
Hangan, B.A. et al. (2022) modeled four commercial broiler breeds raised in tropical regions using Gompertz and polynomial growth curves. To track broiler growth, body weight changes were manually measured from day 1 to day 42, and the differences between the representative body weight and the modeled body weight over a five-week period were compared. The study reported that the Gompertz growth model had an average error rate of less than 0.05 compared to the polynomial growth curve and was suitable for weekly growth modeling [13].
Previous studies on broiler growth have used manually measured data. By contrast, the data covered in this paper contain a large amount of noise that hinders accurate analysis and is difficult to remove through ordinary preprocessing. Other studies have therefore examined how to store accurate weights at measurement time and how to refine already-stored data.
Park H. et al. (2018) studied a method for automatically recording broiler weight by analyzing video footage of broilers. The mean-shift clustering algorithm was used to determine the exact number of broilers standing on the scale plate in the footage, and the weight displayed by the scale was detected and stored using a CNN model trained on the MNIST data set, thereby recording daily weight information automatically. The study reported that weight data were extracted with an accuracy of 91.09% from 358 test images, and that recognition accuracy was low when the birds crowded together [14].
Chun-Yao Wang et al. (2021) identified the problem that automatic weight measuring scales installed in broiler houses contained a large number of outliers and conducted a study to remove outliers. GMM (Gaussian Mixture Model) clustering was applied to create a single-head weight cluster and the daily weight representative value was estimated using a cluster selection method using the bootstrap algorithm. In this study, it was noted that as broilers grow the proportion of outliers mixed into the data increases due to various factors, and it was reported that the final data obtained could be fitted to the Gompertz growth model to obtain a representative weight value corrected with an error rate of less than 5% [15].
Oh Y. et al. (2024) studied the application of kernel density estimation to estimate an accurate daily representative value from automatically measured broiler weight data. To improve the accuracy of kernel density estimation, an optimal bandwidth learning method suited to broiler analysis was introduced and the results were compared with shipping weights. The study focused on the difficulty of extracting a representative weight value from an automatic weight measurement system, analyzed the data density, and reported that updating the bandwidth of the kernel density estimate is a way to select the correct representative value [16].
Table 1 summarizes the previous studies described above. Many of them used relatively small amounts of refined, manually measured data to explain broiler growth progress and focused on identifying factors beneficial or detrimental to growth. Such work is well suited to explaining growth factors, but it may be insufficient for accurately describing the growth curve, which changes from flock to flock, in an actual smart barn environment. Therefore, this study emphasizes prediction performance, focusing on the MAPE index, which indicates prediction accuracy, rather than the R-squared index, which measures how exactly the growth curve is expressed.

2.2. Clustering Method and Density Analysis Method for Estimating the Weight Representative Value

2.2.1. Clustering

Clustering is an unsupervised learning method that measures the similarity between data points and groups those with high similarity. Clustering is broadly divided into density-based and distance-based approaches: density-based clustering forms cluster groups from regions of high data density, while distance-based clustering judges similarity and forms cluster groups based on the distance between data points. In this paper, we adopted the distance-based approach and used the K-means clustering algorithm [17].
The K-means clustering algorithm is easy to implement and understand and has the advantage of a fast computation speed, which makes it suitable for the large data sets analyzed in this paper. Starting from center points placed at random locations, K-means performs an initial clustering based on the distance between each center point and the data points arranged in Euclidean space: each data point is assigned to the cluster of its nearest center point. Each center point is then updated to the coordinates that minimize the sum of squared errors to its assigned data points, so that an appropriate cluster center is learned. This step is repeated, and learning terminates when the center points no longer change, or change only very little (Equation (1)).
$$\underset{C}{\arg\min} \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2, \qquad \mu_i = \frac{1}{\lvert C_i \rvert} \sum_{x \in C_i} x \qquad (1)$$
The K-means clustering algorithm sets the number of clusters to classify data before learning. This can be divided into a method of directly determining the number of clusters experimentally based on the data and a method of selecting the optimal number of clusters by judging the change in clustering quality each time the number of clusters is increased. The silhouette score analysis method [18] can be applied as a method of selecting the optimal number of clusters.
Silhouette score analysis is a method of analyzing the silhouette coefficient (Equation (2)), which is an indicator of clustering quality, as the number of clusters increases. The silhouette coefficient can be calculated by considering the cluster cohesion calculated as the average distance between the data points belonging to the cluster to which data point i belongs (Equation (3)) and the inter-cluster separation calculated as the smallest value among the average distances between the data points of other clusters to which i does not belong (Equation (4)).
$$s(i) = \frac{b(i) - a(i)}{\max\left(a(i),\, b(i)\right)} \qquad (2)$$

$$a(i) = \frac{1}{\lvert C_k \rvert - 1} \sum_{x \in C_k,\, x \neq i} \sqrt{\sum_{j=1}^{n} \left(i_j - x_j\right)^2} \qquad (3)$$

$$b(i) = \min_{C \neq C_k} \frac{1}{\lvert C \rvert} \sum_{x \in C} \sqrt{\sum_{j=1}^{n} \left(i_j - x_j\right)^2} \qquad (4)$$
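As an illustration of Equations (1)-(4) in practice, the sketch below selects the number of clusters for one day's weight measurements by maximizing the average silhouette score; the function name, the candidate range k_range, and the use of scikit-learn are illustrative assumptions, not the study's exact implementation.

```python
# A minimal sketch, assuming daily_weights is a 1-D array of one day's
# measurements (g): the number of clusters is chosen by the silhouette
# score, and the cluster centers become the day's candidate representatives.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_daily_weights(daily_weights, k_range=range(2, 6)):
    X = np.asarray(daily_weights, dtype=float).reshape(-1, 1)
    best_score, best_model = -1.0, None
    for k in k_range:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        score = silhouette_score(X, model.labels_)  # Equation (2), averaged
        if score > best_score:
            best_score, best_model = score, model
    # Cluster centers (Equation (1)) serve as candidate representative weights.
    return sorted(best_model.cluster_centers_.ravel())
```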

2.2.2. Kernel Density Estimation

Kernel density estimation (KDE) is a nonparametric density estimation method that estimates the probability density function (PDF) by replacing each data point with a kernel function and summing them [19]. The simplest nonparametric density estimator is the histogram, which divides the data into bins and represents the distribution by counting the data points in each bin, so its representation is discontinuous across bin boundaries. By treating each data point as a kernel function instead, kernel density estimation expresses the probability density function continuously. Figure 1 illustrates the difference between a histogram and kernel density estimation: the histogram represents the distribution with discrete bars, whereas kernel density estimation assumes each data point follows a Gaussian distribution, shown as red dotted curves in Figure 1, and sums these to construct the final continuous density estimate.
Kernel density estimation represents the data with kernel functions to construct a continuous probability density function. The kernel function is a non-negative, symmetric function that integrates to 1, and the Gaussian kernel is commonly used (Equation (5)). The estimated density is obtained by summing all the kernel functions and normalizing by the total number of data points so that the total probability sums to 1 (Equation (6)).
$$K(u) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}u^2} \qquad (5)$$

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \qquad (6)$$
$h$ stands for the bandwidth, a very important parameter in kernel density estimation. The smaller the bandwidth, the sharper the distribution, which reveals more detailed structure but is more susceptible to noise and overfitting. Silverman's rule of thumb is a representative method for estimating the optimal bandwidth for kernel density estimation with the Gaussian kernel [20]. It is an empirical rule that derives the optimal bandwidth $h$ through theoretical analysis and approximation. According to Silverman's rule of thumb, the optimal bandwidth $h$ is calculated from the sample size $n$ and the standard deviation $\sigma$ as in Equation (7).

$$h = \left(\frac{4\sigma^5}{3n}\right)^{1/5} \approx 1.06\,\sigma\, n^{-1/5} \qquad (7)$$
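To make Equations (5)-(7) concrete, the sketch below estimates a day's weight density with a Gaussian KDE and extracts the local maxima of the density as candidate representative weights; the function and variable names are hypothetical, and SciPy's built-in "silverman" factor is its own approximation of Equation (7).

```python
# A sketch, with hypothetical names: a Gaussian KDE with Silverman's
# bandwidth (Equations (5)-(7)) is evaluated on a grid, and the local
# maxima of the density become candidate representative weights.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import argrelextrema

def density_candidates(weights, grid_points=2000):
    w = np.asarray(weights, dtype=float)
    kde = gaussian_kde(w, bw_method="silverman")  # SciPy's Silverman factor
    grid = np.linspace(w.min(), w.max(), grid_points)
    density = kde(grid)
    peaks = argrelextrema(density, np.greater)[0]  # indices of local maxima
    return grid[peaks]  # candidate representative weights (g)
```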

2.3. Growth Trend Representation Method and Time Series Approach for Shipping Weight Prediction

2.3.1. Gompertz Growth Model

The Gompertz growth model is a nonlinear function proposed by Benjamin Gompertz [21] and is a representative growth model that can explain the S-shaped growth curve along with the logistic growth model. The Gompertz growth model has the characteristic that the growth rate gradually increases in the beginning and then gradually decreases after the inflection point where the growth rate reaches its maximum. The decreased growth rate converges to the limit value after a certain period of time, indicating the end of growth. Due to these characteristics, the Gompertz function is an effective growth model for explaining biological growth and aging processes.
$$\mathrm{Gompertz}(x) = A\, e^{-B e^{-Cx}} \qquad (8)$$

Formula (8) is the Gompertz growth model. $A$ represents the growth saturation value, $B$ the initial value of the function, and $C$ a parameter that controls the steepness of the growth curve.
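A minimal sketch of fitting Equation (8) by nonlinear least squares follows; the synthetic data, initial guesses, and the day-35 query are placeholders for illustration, not values from the study.

```python
# A minimal sketch of fitting the Gompertz model of Equation (8) by least
# squares; the data are synthetic stand-ins for daily representative weights.
import numpy as np
from scipy.optimize import curve_fit

def gompertz(x, A, B, C):
    # A: saturation weight, B: initial-value parameter, C: growth-rate parameter
    return A * np.exp(-B * np.exp(-C * x))

rng = np.random.default_rng(0)
days = np.arange(1, 29, dtype=float)  # ages in days
weights = gompertz(days, 2300, 4.2, 0.12) + rng.normal(0, 15, days.size)

params, _ = curve_fit(gompertz, days, weights, p0=(2500, 4, 0.1), maxfev=10000)
print(gompertz(35.0, *params))  # extrapolated live weight at day 35 (g)
```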

2.3.2. Double Exponential Smoothing

Exponential smoothing is a representative time series forecasting technique that predicts future values from past data. It can be designed with relatively simple formulas and calculations and is quickly applicable to a wide range of short-term forecasting problems. Its defining characteristic is that it assigns past data progressively smaller weights over time: the most recent observations are treated as the most important, and the importance of older data decays. Simple exponential smoothing suits time series without a trend (a sustained increase or decrease) or seasonality (a repeating periodic component); double exponential smoothing is applied to data with a trend component.
Double exponential smoothing is an extended method of simple exponential smoothing and is a time series forecasting method that can reflect trends. Double exponential smoothing can perform more sophisticated forecasting by simultaneously considering levels and trends. Double exponential smoothing models the main elements of time series data through level smoothing and trend smoothing and then combines the two models to perform forecasting.
$$L_t = \alpha Y_t + (1 - \alpha)\left(L_{t-1} + T_{t-1}\right) \qquad (9)$$

Equation (9) is the level smoothing formula. $L_t$ is the level at time $t$ and $Y_t$ is the actual observation at time $t$. $L_{t-1}$ and $T_{t-1}$ are the level and trend at time $t-1$, respectively, and $\alpha$ is a smoothing constant with $0 \leq \alpha \leq 1$. The smoothing constant determines how strongly data from the previous time point are reflected in explaining the current one. The starting level uses the first data point, $L_0 = Y_0$, and the starting trend generally uses the difference between the first two data points, $T_0 = Y_1 - Y_0$. Level smoothing is the process of estimating the current level of the data.
$$T_t = \beta\left(L_t - L_{t-1}\right) + (1 - \beta)\, T_{t-1} \qquad (10)$$

Formula (10) is the trend smoothing formula, which determines how much importance is given to past trend changes. The closer $\beta$ is to 1, the more strongly recent trends are reflected; the closer it is to 0, the more strongly past trends are reflected.

$$\hat{Y}_{t+h} = L_t + h\, T_t \qquad (11)$$

Formula (11) combines level smoothing and trend smoothing to forecast $h$ steps ahead from a given time $t$. Double exponential smoothing obtains future predictions by updating the level and trend separately; it can describe the progression of complex time series simply and perform predictions with an easy implementation.
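The following sketch implements Equations (9)-(11) directly; the smoothing constants and the short sample series are illustrative, whereas in this study $\alpha$ and $\beta$ are fitted by least squares.

```python
# A direct implementation of Equations (9)-(11) (Holt's linear method);
# alpha, beta, and the sample series below are placeholder values.
def double_exponential_smoothing(y, alpha, beta, horizon):
    level, trend = y[0], y[1] - y[0]  # L0 = Y0, T0 = Y1 - Y0
    for t in range(1, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (level + trend)      # Eq. (9)
        trend = beta * (level - prev_level) + (1 - beta) * trend  # Eq. (10)
    return [level + h * trend for h in range(1, horizon + 1)]     # Eq. (11)

# e.g., forecast 7 days ahead from a short representative-weight series
forecast = double_exponential_smoothing([30, 45, 62, 85, 110, 140], 0.6, 0.3, 7)
```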

2.3.3. ARIMA

The ARIMA (autoregressive integrated moving average) forecasting model is a time series model that uses past data to predict current and future values [22]. It is based on the ARMA (autoregressive moving average) model, which combines the AR (autoregression) and MA (moving average) models. ARIMA applies differencing to transform a non-stationary time series into a stationary one so that the ARMA model can be applied. Here, stationarity means that the statistical characteristics of the data remain constant over time: a stationary series has a constant mean and variance, with trend components and periodically repeating seasonal components removed. If one round of differencing does not achieve stationarity, differencing can be repeated to second or higher order.
The AR (autoregression) model uses autoregression to explain the current value as a linear combination of past values, as shown in Equation (12). Here, $y_t$ is the current stationary value, $y_{t-1}, y_{t-2}, \ldots, y_{t-p}$ are the past stationary values, and $\phi_1, \phi_2, \ldots, \phi_p$ are the AR coefficients that weight each past observation. $c$ is a constant term, $\epsilon_t$ represents white noise, and $p$ is the number of past time points to be reflected.

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t \qquad (12)$$

The MA (moving average) model explains the current value based on past errors. In Equation (13), $\theta_1, \theta_2, \ldots, \theta_q$ are the MA coefficients representing the influence of past errors on the current value, $\epsilon_t$ is the current error, $\epsilon_{t-1}, \epsilon_{t-2}, \ldots, \epsilon_{t-q}$ are the past error terms, and $q$ is the order of the model. The MA model captures the impact of past errors on present values, incorporating random, unpredictable fluctuations through those errors.

$$y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} \qquad (13)$$

The ARIMA model combines three components, autoregression, differencing, and moving average, to model the current value from the past values and past errors of the stationary (differenced) series, as shown in Equation (14). The orders of autoregression, differencing, and moving average are written as ARIMA($p$, $d$, $q$).

$$y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_t \qquad (14)$$
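As a brief illustration, the sketch below fits an ARIMA($p$, $d$, $q$) model with statsmodels and forecasts ahead; the series and the order (1, 2, 1) are placeholders rather than values selected in this study.

```python
# An illustrative sketch: fit an ARIMA(p, d, q) with statsmodels and
# forecast ahead. The series and the order are assumptions, not the
# order chosen by the study's order-selection procedure.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

series = np.array([30, 44, 61, 83, 108, 139, 174, 215, 262], dtype=float)
model = ARIMA(series, order=(1, 2, 1)).fit()
print(model.forecast(steps=7))  # weight forecasts for the next 7 days
```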

2.3.4. Prophet

Prophet is a time series analysis library developed by Meta (formerly Facebook) in 2017 [23]. The Prophet model effectively explains time series data that include missing values, outliers, seasonality, and trend changes. It models time series data with three factors, trend, seasonality, and holidays, and captures whatever those factors cannot explain as the error term $\epsilon_t$, as shown in Equation (15).

$$y(t) = g(t) + s(t) + h(t) + \epsilon_t \qquad (15)$$
$g(t)$ is a growth model describing the trend, a function that captures long-term change in the time series. Prophet's linear or nonlinear trend function expresses the data flexibly and describes its growth trend effectively. Prophet's curved trend function is based on the logistic growth model. Prophet learns changepoints where the trend shifts sharply and adjusts the slope of the trend function after each changepoint, yielding a piecewise logistic growth model that applies the curved trend more flexibly. The piecewise logistic growth model is given in Equation (16), where an indicator function $a_j(t)$ determines whether time $t$ is before or after the changepoint $s_j$; if $t$ is past the changepoint, the rate of the logistic growth model is adjusted. Here, $\boldsymbol{\delta}$ adjusts the amount of change in the slope of the logistic growth model, and $\boldsymbol{\gamma}$ adjusts its inflection point.

$$g(t) = \frac{C}{1 + \exp\!\left(-\left(k + \mathbf{a}(t)^{\top}\boldsymbol{\delta}\right)\left(t - \left(m + \mathbf{a}(t)^{\top}\boldsymbol{\gamma}\right)\right)\right)}, \qquad a_j(t) = \begin{cases} 1 & \text{if } t \geq s_j \\ 0 & \text{otherwise} \end{cases} \qquad (16)$$
Prophet's piecewise logistic growth model adjusts the parameters of the logistic growth model at changepoints and can describe growth curves that a plain logistic model cannot, but this flexibility makes it vulnerable to overfitting on noise and outliers. The Prophet library provides the changepoint_prior_scale parameter, which controls how strongly the slope and inflection point are adjusted at changepoints, thereby preventing the growth model from overfitting to noise and outliers.
$s(t)$ describes seasonality, a regular pattern that repeats periodically. Prophet models seasonality by decomposing the periodic components of the data with a Fourier series. By default Prophet decomposes weekly and annual seasonality components, and a user-defined seasonality with a specific period is accounted for by substituting that period into the variable $P$ in Equation (17).

$$s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right) \qquad (17)$$
$h(t)$ describes the holiday effect, the change that a specific date or period causes in the time series; it is a component introduced to model the impact of particular events on the data. The holiday component is modeled as in Equation (18), where $Z(t)$ is an indicator vector marking whether time point $t$ belongs to the holiday sets $D_1, D_2, \ldots, D_L$.

$$Z(t) = \left[\mathbf{1}(t \in D_1), \ldots, \mathbf{1}(t \in D_L)\right], \qquad h(t) = Z(t)\,\boldsymbol{\kappa}, \qquad \boldsymbol{\kappa} \sim \mathrm{Normal}\!\left(0, \nu^2\right) \qquad (18)$$

3. Materials and Methods

3.1. Data Collection

The broiler body weight data used in this study were measured in actual smart barns with a broiler weight measuring device developed by Emotion Co., Ltd. (Jeonju, Jeollabuk-do, Republic of Korea). Three weight measuring devices were installed in each broiler barn to record, every second, the scale number, cage number, data output time, body weight, the temperature and humidity inside the barn, and the concentrations of carbon dioxide, ammonia, and particulate matter in the barn air; the measurements were stored in a cloud database for verification. Table 2 presents the specifications of the broiler weight measuring device. The scale number, cage number, data output time, and body weight were extracted from the stored data to create a body weight data set, and shipping weight information was extracted from the shipping performance records mapped to the data set to create the training and verification data. A photograph of the broiler weight measuring device is shown in Figure 2.
This study was conducted on 36 actual smart farms in Namwon, Wanju, and Gimje, Republic of Korea, and the algorithm was applied to 110 data sets collected from January 2023 to September 2024. The list of the 110 data sets used in the experiment is given in Table 3. Each data set stores approximately 7,200,000 to 10,368,000 rows measured over a breeding period of roughly 30 to 40 days, from the start of breeding to the shipping date. The differences in data volume stem mainly from variation in the breeding period, since the shipping time is adjusted to the broilers' growth rate, and secondarily from missing values caused by device malfunctions during collection and from schedule changes such as cancelled shipments.
The file name of each saved data set follows the format breeding start date + KF (Kokofarm, the device name) + FarmID + HouseID_sensorData.csv, and each data set has a farmID column for the farm number, a houseID column for the barn number, a scaleID column for the scale number, a getTime column for the collection time, and a weight (g) column for the broiler weight in grams. The ID column combines the collection farm, barn, scale number, and collection time. Of these, the weight (g) column provides the data used to predict future time points. The data used in this paper follow the format shown in Figure 3.
The live weight measuring device placed inside the broiler house records broiler weight automatically, but the saved data contain various outliers and noise. As shown on the left side of Figure 4, these outliers and noise hinder visual analysis. When the transparency of the scatter plot on the left of Figure 4 is adjusted, groups of data with distinct densities become visible among the outliers and noise, as shown on the right.
Figure 5 shows images taken by a camera attached to a broiler live weight measuring device to analyze the reason for the formation of a specific-density group confirmed through a scatter plot graph. It was confirmed that noise from various factors was measured, such as cases where broiler feathers, feed, and feces were detected above the measuring load cell plate, cases where excessive weight was detected during the process of broilers landing on the plate, cases where broilers were poorly positioned on the load cell plate, and cases where multiple broilers were detected while climbing on the load cell plate.
In this paper, we utilize time series data of weight data collected for prediction. However, as mentioned above, the data are distorted by outliers and noise. Therefore, we refine the data mixed with noise and outliers, select meaningful daily representative values, and then use them as data for prediction.

3.2. Algorithm Composition and Design

The algorithm presented in this paper for predicting the live weight of broilers at the time of shipment follows three sequential steps: the daily weight estimation step, the weight representative value selection step, and the shipment weight prediction step. The data used in this paper are divided into daily groups and, after the daily weight estimation and weight representative value selection steps, outliers are processed to form daily growth time series data. These time series are then input to Prophet in the shipment weight prediction step, and the prediction results are compared with those of ARIMA, double exponential smoothing, and the Gompertz growth model. The flow of the algorithm is shown in Figure 6.
The daily weight estimation step generates a list of representative values estimated to be the weight of one chicken from among the outliers and noise mixed into the data. This step is based on the observation that the scatter plot of the data commonly forms regions of distinct density [16]. When the full-period data are divided into daily units, between one and five dense sections appear per day, and it is assumed that one of these sections corresponds to the weight of a single chicken. The section with the highest density cannot simply be taken as the single-broiler weight section. The daily weight estimation step therefore interprets the dense sections of the daily data using a clustering method and a density analysis method.
The clustering method divides the data through distance-based clustering and takes the cluster centers as the list of representative values. Distance-based clustering may not be ideal for interpreting data density, but the large volume of data handled in this paper compensates for this: when dense sections are divided into clusters, the clusters show high internal cohesion. The silhouette coefficient evaluates internal cohesion and the separation between clusters, and by setting the number of clusters with the highest silhouette coefficient as the parameter, optimal clustering can be performed that separates the dense weight sections.
In this study, the K-means algorithm is applied as the distance-based clustering algorithm. K-means is a representative distance-based method with the advantages of simple implementation and fast computation, characteristics suited to this study's requirement of repeatedly clustering the roughly 9,000,000 data points stored in a single data set. In addition, in this step it is more important to determine a proper list of representative weight values than to segment the density precisely; even with somewhat low segmentation precision, distance-based clustering over a large volume of data can estimate representative values well. The dependence of K-means on the number of clusters can be used effectively to explain the dense weight sections if an appropriate parameter is set. In this study, the number of clusters is selected using silhouette scores, which consider cohesion within clusters and separation between clusters, and this method can separate the clusters effectively and obtain the optimal density sections. Figure 7 shows an example of the clustering results: colors distinguish the data points belonging to each cluster, and the cluster centers belonging to the representative value list are marked as red data points.
The density analysis method estimates the probability density function of the daily weight data using kernel density estimation (KDE) and obtains a list of representative values for the high-density regions by collecting the local maxima of the density distribution. Kernel density estimation is an effective nonparametric estimator that allows the distribution of broiler weight data to be analyzed precisely and outliers and representative values to be analyzed at the same time [24]. Since the weight region of maximum density is likely not the representative weight of a single bird, kernel density estimation is an effective way to distinguish the density intervals of the data and analyze multiple density regions. For visualization alongside the scatter plot, the density distribution was rotated clockwise. The data were divided at the local minima of the distribution to confirm their density structure; the local maxima are marked as red data points and saved as the list of daily representative values. Figure 8 shows the data separated using kernel density estimation: the red points are the local maxima, and the data points are colored according to the regions bounded by local minima.
The weight representative value selection step selects one representative value from the list of daily candidates obtained in the daily weight estimation step. Together with the preceding step, it is essential for refining the heavily contaminated broiler weight data into a form in which a clear growth trend can be analyzed. The value selected in this step is the daily weight representative value, representing the broiler weight on that date. The daily representative values of sequentially selected dates form a time series, and the resulting broiler growth trend is used as training data for prediction. The most important task in this step is to select, from the daily candidate list, a value close to the actual weight of one broiler. Because broiler weight trends upward over time, this paper uses a filtering method that estimates the representative value at the current time by referring to the values selected at past times. Figure 9 illustrates an example of this step: the series is formed on a daily basis and depends on previous time points. After this step, the selected data form a consistent trend, shown as the red progression in Figure 9, where the tracked cluster centers (red data points) are connected by a red arrow.
This step begins by projecting the representative weight of one broiler at the current time from the representative weight at the previous time (Equation (19)), where $\theta$ is an empirically set growth constant. The projected representative weight $\hat{y}_t$ is compared with the list of candidate representative weights, and the candidate closest to $\hat{y}_t$ is selected as the representative weight of one broiler (Equation (20)). In the data sets used in this paper the weight at the start of breeding is commonly 30 g, so $y_1 = 30$ is applied.

$$\hat{y}_t = y_{t-1} \times \theta \qquad (19)$$

$$\underset{x_i}{\arg\min}\left(\left| \hat{y}_t - x_i \right|\right) \qquad (20)$$
To minimize the selection of incorrect daily weight representative values in this step, outliers are filtered with a weighted moving average filter, a recursive filtering method. The weighted moving average filter can correct an incorrectly selected daily representative value and is a very important step for generating an accurate growth trend of broilers. It explains the current value as a weighted average of past values, where each past value is weighted according to its importance: in this paper, the closer a past time point is to the present, the higher its weight. Giving the highest weight to the most recent points reflects the broilers' growth and allows a daily representative value to be selected with noise appropriately removed. Equation (21) expresses $\hat{y}_t$ as the weighted-moving-average-filtered value, which suppresses the intervention of outliers, multiplied by the growth constant $\theta$. Here, $n$ is the window, the number of past points to reflect.

$$\hat{y}_t = \frac{w_1 y_{t-1} + w_2 y_{t-2} + \cdots + w_n y_{t-n}}{\sum_{i=1}^{n} w_i} \times \theta \qquad (21)$$
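The sketch below combines Equations (19)-(21): a weighted moving average of past selections is scaled by the growth constant and the candidate nearest the projection is chosen for each day. The value of $\theta$, the window size, and the linear weights are assumptions for illustration, not the study's tuned settings.

```python
# A sketch of the representative value selection of Equations (19)-(21),
# with assumed theta, window size, and linearly increasing weights.
import numpy as np

def select_representatives(daily_candidates, theta=1.15, window=3):
    selected = [30.0]  # y1 = 30 g at the start of breeding
    for candidates in daily_candidates:  # one candidate list per day
        past = selected[-window:]
        w = np.arange(1, len(past) + 1)  # heavier weight on recent days
        y_hat = np.dot(w, past) / w.sum() * theta        # Eq. (21)
        # Eq. (20): choose the candidate closest to the projection
        selected.append(min(candidates, key=lambda x: abs(x - y_hat)))
    return selected[1:]
```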
In Figure 10 and Figure 11, the blue data points are candidates for the daily weight representative value matching the x-axis, which is the time series axis. The daily weight representative value selected through this process is indicated as a red data point, and when the daily weight representative values are connected it can be seen that they form a growth trend of broilers.
The shipment weight prediction step predicts the shipment weight of broilers by time series prediction using the daily weight representative values obtained in the previous two steps, i.e., the broiler growth trend. In this study, the growth trend is used to train a predictor up to a certain point, and the shipment weight is predicted N days ahead; the shipment performance record attached to each data set provides the actual shipment weight for comparison with the prediction. The predictor is trained on the growth trend up to the point where the daily representative value exceeds 1000 g for two consecutive days. The live weight measuring device generally had difficulty weighing broilers once they exceeded 1000 g: the density characteristics seen earlier weakened, and outliers and noise from overlapping wings and mismeasurement increased. Therefore, once the representative value exceeds 1000 g the data are considered unreliable, and the shipment weight prediction step is triggered when the representative value has exceeded 1000 g for two or more days. The prediction model used for shipping weight was the Prophet model and, to benchmark its performance, the Gompertz growth, double exponential smoothing, and ARIMA models were applied and their results compared.
The piecewise logistic growth model was used as the growth trend ($g(t)$) modeling function of the Prophet model. Prophet's piecewise logistic trend function is a regression function that can express the S-shaped growth of broilers and effectively describes their rapid growth period and growth convergence period. The growth trend used in this study, generated from the list of daily weight representative values via the daily weight estimation step and the weight representative value selection step, is relatively inaccurate and uncertain compared to precisely hand-measured data. Prophet's growth trend function is more flexible and more robust to outlier intervention than the general logistic growth curve. In addition, Prophet reflects periodic characteristics through seasonal decomposition of the data's periodic components and absorbs whatever the above factors cannot explain into the residual, reducing the influence of outliers. The assumed growth convergence value of the piecewise logistic growth was set to 2500 g, and changepoint_prior_scale, the parameter that sets the sensitivity of the changepoints, was set to 0.3.
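A sketch of the Prophet configuration just described, with logistic growth capped at 2500 g and changepoint_prior_scale = 0.3, is shown below; the training series and dates are synthetic placeholders rather than actual farm data.

```python
# A sketch of the configuration described above: piecewise logistic growth
# with the study's stated 2500 g cap and changepoint_prior_scale = 0.3.
# The 'ds'/'y' columns are Prophet's required names; the data are synthetic.
import pandas as pd
from prophet import Prophet

train = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=28, freq="D"),
    # stand-in for the daily representative weights from the previous steps
    "y": [30 * 1.15 ** t for t in range(28)],
})
train["cap"] = 2500  # growth saturation assumed in this study

m = Prophet(growth="logistic", changepoint_prior_scale=0.3)
m.fit(train)

future = m.make_future_dataframe(periods=7)  # forecast 7 days past training
future["cap"] = 2500
forecast = m.predict(future)
print(forecast[["ds", "yhat"]].tail(7))  # predicted weights near shipment
```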
Gompertz, used for comparison with Prophet, has been evaluated in previous studies as an appropriate model for the broiler growth curve. In a study modeling the growth characteristics of Korean native chickens it achieved an R-squared of 0.97 [25], and in a study on growth modeling with broiler weight data it showed the best fit among the logistic, Gompertz, Weibull, and Von Bertalanffy models, with an R-squared of 0.99 and the lowest MSE score [26]. In a previous study [8] the Weibull function modeled growth best, but Gompertz also recorded a MAPE of 0.049 and was recognized as appropriate for the broiler growth model. In our experiments, the Gompertz model's parameters, the convergence limit $A$, initial value $B$, and slope $C$, were optimized by gradient descent to minimize the mean squared error, and the model was fitted to the time series up to the point where the broiler weight reached 1000 g.
The growth pattern of broilers is a dynamic system that changes over time. Growth occurs gradually, and the weight at each time point is strongly affected by the weight at the previous time point. Not all broilers grow identically; each individual grows independently under various factors. This makes growth time-dependent, and the independent growth of individuals appears as a nonlinear, complex growth pattern [27]. In this study, the excessively noisy raw data were replaced with representative values, simplifying the training data into a comparatively simple growth pattern. To explain this effectively, we use time series predictors that can produce effective predictions with a small number of parameters.
Double exponential smoothing is a forecasting technique that explains time-dependent changes with past data, is robust to hard-to-predict fluctuations, and is well suited to predicting short-term changes over time [28]. It reflects both the level and the trend of the time series and smooths complex changes as it updates. Because the data handled in this study have been filtered and simplified in the preceding steps and show a clear trend, we adopt double exponential smoothing as one method for predicting shipping weight. Its main parameters, the level smoothing constant $\alpha$ and the trend smoothing constant $\beta$, were set per data set by applying the least squares method to minimize the smoothing residuals.
The ARIMA model is a representative time series forecasting model that is effective for both short-term and long-term forecasting. It performs well on linear time series, forecasting future values through autoregression on past observations and past errors, and is regarded as particularly strong for short-term forecasting of series with weak periodic patterns [29]. ARIMA has some limitations in reflecting the nonlinear growth pattern of the early period when broilers grow rapidly, but the input data of this study, time series of daily weight representative values from which hard-to-explain noise has been removed, have weak periodic components and were produced by recursive filtering that depends on past data. ARIMA was therefore selected as a comparison model for Prophet on the strength of its autoregressive pattern modeling. The Hyndman–Khandakar algorithm [30] is used to search for the optimal order of the ARIMA model; it automatically determines the orders $d$ (differencing), $p$ (AR), and $q$ (MA). The algorithm begins by applying the KPSS test to determine the order of differencing: the KPSS statistic is computed from the cumulative sum and variance of the residuals of the series and compared to a threshold to judge stationarity, and the differencing order is increased while the data fail the test. Once stationarity is satisfied, the orders $p$ and $q$ are determined by AIC (Akaike Information Criterion) evaluation, an index that weighs model fit against complexity. In Equation (22), $k$ is the number of model parameters, i.e., the model's complexity, and $L$ is the model's maximum likelihood.

$$\mathrm{AIC} = 2k - 2\ln(L) \qquad (22)$$
The Hyndman–Khandakar algorithm determines the optimal orders by setting up initial model candidates from plausible values of $p$ and $q$, training and testing an ARIMA model for each, and selecting the candidate with the lowest AIC value.
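For illustration, the pmdarima library implements an automatic order search in the spirit of the Hyndman–Khandakar algorithm (a KPSS test for $d$, a stepwise AIC search for $p$ and $q$); the sketch below uses a placeholder series and is not the study's exact tooling.

```python
# A sketch of automatic order selection with pmdarima's auto_arima
# (KPSS test for d, stepwise AIC search for p and q). The series is
# a synthetic placeholder.
import numpy as np
import pmdarima as pm

series = np.array([30, 44, 61, 83, 108, 139, 174, 215, 262], dtype=float)
model = pm.auto_arima(series, test="kpss", information_criterion="aic",
                      stepwise=True, suppress_warnings=True)
print(model.order)                # selected (p, d, q)
print(model.predict(n_periods=7))  # 7-day-ahead forecasts
```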
Figure 12 presents the results obtained by applying each prediction model. The broiler weight data were represented by blue scatter points, with the alpha value reduced to 0.002. The daily weight representative values up to 1000 g are indicated by pink points, while the predicted values of the fitted prediction models corresponding to the trend of the daily weight representative values are shown as red plots. These predicted values can be compared with the actual shipment weight, represented in black.

3.3. Performance Evaluation of the Algorithm

In this paper, we aim to predict the live weight of broilers at the time of shipment using the proposed algorithm. For this purpose, 110 weight data sets with different growing houses, growing periods, and shipment times were used. This paper aims to compare the performance of each proposed algorithm to select the optimal algorithm for predicting the shipment weight of broilers. The prediction performance of each of the 110 weight data sets was compared by the error percentage from the actual shipment weight label, and the MAE, MAPE, and RMSE were used as evaluation indices to evaluate the performance of the algorithm itself.
MAE (mean absolute error) is one of the indicators for evaluating the prediction performance of a model. It averages the absolute errors between the predicted value and the actual value and intuitively shows how much the predicted value differs from the actual value. The smaller the value, the smaller the error, and the closer it is to 0, the better the model’s performance can be interpreted.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
MAPE (mean absolute percentage error) is a prediction evaluation index that expresses the relative error between the predicted and actual values as a percentage, allowing prediction accuracy to be judged intuitively. The closer the value is to 0, the more accurate the prediction.
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100$$
RMSE (root mean squared error) is a prediction evaluation index that shows the difference between the predicted value and the actual value. The smaller the value, the higher the prediction accuracy. RMSE is sensitive to large errors, and the RMSE score can increase significantly when large errors occur in some data sets.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
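The three indices can be computed directly as below, assuming y_true and y_pred are NumPy arrays of actual and predicted shipment weights (g).

```python
# Direct implementations of the three evaluation indices.
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```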
In this paper, we analyze the mean error and the standard deviation of the error to evaluate the prediction performance of the designed algorithm and to confirm the prediction stability. The mean error can intuitively confirm the prediction error reflecting negative and positive numbers, and the standard deviation of the error can confirm the stability of the prediction result.

4. Results

4.1. Experimental Results

4.1.1. Experiments in the Daily Weight Estimation Step

The clustering method using K-means was performed with the number of clusters that showed the largest silhouette coefficient, evaluating internal cohesion and the separation between clusters. Although the initial center positions can strongly affect K-means results, the experiments showed that on the large data sets covered in this paper clustering converged to the same result from any starting point. The daily weight estimation step performed with K-means generally divided the data into three-to-four dense sections from about 0-to-10 days of age and into two-to-three dense sections from about 11-to-27 days of age. From about day 28, when broiler weight exceeded 1000 g, until shipment, the cluster division became finer.
The density analysis method used a Gaussian kernel, with the bandwidth set by Silverman's rule of thumb, an optimal bandwidth approximation that considers the size and standard deviation of the data. The estimated probability density function shows a sharply peaked multimodal distribution until about 0-to-10 days of age, whose modes can be interpreted as the dense weight region of one bird, of two birds, and so on. After that, a bimodal distribution with clearly separated local maxima was observed until about 11-to-22 days of age, and from about 23 days of age until shipment a multimodal distribution with narrowly spaced local maxima formed. These results are shown in Figure 13 and Figure 14.

4.1.2. Experiments in the Weight Representative Value Selection Step

Applying the list of daily weight representative values produced by the clustering method to the weight representative value selection step gave relatively simple and intuitive results. Since the number of selectable daily candidates is relatively small, there were cases where incorrect clustering results caused values slightly off the smooth growth trend to be selected as outliers. However, values deviating greatly from the growth trend were rarely selected, and the selected values formed a relatively smooth growth trend.
The weight representative value selection step using the density analysis method showed irregular changes compared with the growth trend generated through the clustering method, indicating that its division of the density intervals is more detailed than that of the clustering method. A more detailed candidate list is more likely to contain an accurate representative weight for a single broiler, but it also raises the chance that a value deviating significantly from the growth trend is selected as an outlier. Although such outliers are corrected by the weighted moving average filter, many data sets showed representative value selections that formed a comparatively fluctuating growth trend. These results are shown in Figure 15 and Figure 16.

4.1.3. Experiments in the Shipping Weight Prediction Step

In this paper, the daily weight representative values generated by the K-means and KDE methods are used as training data for each of the four predictors. A total of eight combinations (K-means + Prophet, K-means + Gompertz growth model, K-means + double exponential smoothing, K-means + ARIMA, KDE + Prophet, KDE + Gompertz growth model, KDE + double exponential smoothing, and KDE + ARIMA) are compared, and time series prediction is performed on the daily weight information of the 110 prepared data sets. Each model learns the series up to the day on which the daily representative weight first exceeds 1000 g and predicts from that point until the shipping date of the data set. Prediction accuracy was evaluated with the MAE, MAPE, and RMSE, and the stability and reliability of each model were confirmed by analyzing the mean and standard deviation of the errors. These results are shown in Figure 17 and Figure 18.
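For illustration, the sketch below shows how one of the eight combinations (a Prophet predictor trained on a daily representative series up to the 1000 g cutoff) can be assembled; the synthetic growth series and model settings are assumptions, not the exact experimental configuration.

```python
import pandas as pd
from prophet import Prophet

# Daily representative weights (g), here a stand-in exponential growth trend
history = pd.DataFrame({
    "ds": pd.date_range("2023-01-26", periods=28, freq="D"),
    "y":  [42 * 1.16 ** d for d in range(28)],
})

# Train until the representative value first exceeds 1000 g ...
train = history[history["y"].cummax() <= 1000]
horizon = len(history) - len(train)

# ... then forecast the remaining days up to shipment
m = Prophet(daily_seasonality=False, weekly_seasonality=False,
            yearly_seasonality=False)
m.fit(train)
future = m.make_future_dataframe(periods=horizon)
forecast = m.predict(future)
print(forecast[["ds", "yhat"]].tail(1))  # predicted shipment-day weight
```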
Table 4 shows part of the results comparing predicted and actual values for the four predictors combined with the K-means and KDE methods. Of the full results, 10 data sets between January 2023 and March 2023 were extracted for comparison. Predicted values are truncated to integers for readability, and the error percentage between the predicted and actual values is given in parentheses. The Prophet predictor showed relatively high accuracy; in particular, when trained on the daily representative values generated by the K-means method, it achieved an error percentage as low as 0.02%. The Gompertz growth curve, in contrast, predicted values significantly lower than the actual weights, and it can be concluded that the Gompertz growth curve is not suitable for use as a predictor here.
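One plausible reason for this underestimation is that fitting the three Gompertz parameters only on the pre-1000 g portion of the curve leaves the asymptotic weight poorly constrained. The sketch below, using SciPy's curve_fit on a synthetic trend, shows the truncated-fit setup; the parameterization, initial guesses, and data are assumptions for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, A, b, k):
    """Gompertz growth curve: asymptotic weight A, displacement b, rate k."""
    return A * np.exp(-b * np.exp(-k * t))

# Illustrative daily representative weights (g) up to the 1000 g cutoff
days = np.arange(0, 29)
obs = 42 * np.exp(0.13 * days)            # stand-in training trend
train = obs[obs <= 1000]
t_train = days[: len(train)]

# Fit on the truncated trend, then extrapolate to a shipment day
params, _ = curve_fit(gompertz, t_train, train,
                      p0=(2500, 4, 0.1), maxfev=10000)
print(gompertz(35, *params))              # predicted shipment-day weight
```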
Table 5 presents the prediction performance indices of the eight models applied in the experiment. Among them, the K-means + Prophet model showed the best scores, with an MAE of 79.65, a MAPE of 4.92, and an RMSE of 104.84. The second-best model was K-means + double exponential smoothing, which recorded a MAPE of 9.82 and an RMSE of 207.99. The third-best was the KDE + Prophet model: its MAE of 158.91 was better than that of K-means + double exponential smoothing, but its MAPE and RMSE were slightly worse. Because the RMSE penalizes large errors far more heavily than small ones, this indicates that the KDE + Prophet model produced many predictions with large errors compared with K-means + double exponential smoothing. The worst-performing model was KDE + Gompertz, which showed the worst scores on all performance indices; its RMSE was nearly four-times that of the best model, K-means + Prophet.
The models to which kernel density estimation was applied showed a higher standard deviation of error than the models to which K-means was applied. We attribute this to the weight representative value selection step frequently producing a growth trend that differs significantly from the actual one. A predictor trained on an incorrect growth trend makes incorrect predictions regardless of its own capability, and when incorrect growth trends occur frequently among the test data sets, both the error and its standard deviation grow large.
Conversely, the models with K-means applied showed a low standard deviation of error, suggesting that the growth trends generated in the weight representative value selection step were closer to the actual growth trends than those of the KDE-based models.
Among the models trained on the same K-means growth trend, the Prophet predictor gave the best results, followed by the double exponential smoothing, ARIMA, and Gompertz models, in that order.
The Prophet predictor predicted the live weight at the time of shipment best when combined with the K-means method and recorded the third-best MAE and MAPE when combined with the kernel density estimation method. Considering the RMSE, which reflects large errors sensitively, we conclude that the size of the prediction error depends on the accuracy of the learned growth trend.
When the accuracy of the learned trend is high, the double exponential smoothing model is the next best choice; and although the ARIMA predictor's performance was relatively low, it still achieved a MAPE below 15.00.
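For completeness, a minimal sketch of the two remaining predictors, double exponential smoothing (Holt's linear trend method) and ARIMA via statsmodels, is shown below; the training series, forecast horizon, and ARIMA order are assumptions for the example.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import Holt
from statsmodels.tsa.arima.model import ARIMA

# Illustrative daily representative weights (g) up to the 1000 g cutoff
train = pd.Series([42 * np.exp(0.13 * d) for d in range(25)])
horizon = 7  # remaining days until shipment

# Double exponential smoothing (Holt's linear trend method)
des = Holt(train).fit()
print(np.asarray(des.forecast(horizon))[-1])   # shipment-day forecast

# ARIMA; the (1, 2, 1) order here is an assumption for the sketch
arima = ARIMA(train, order=(1, 2, 1)).fit()
print(np.asarray(arima.forecast(horizon))[-1])
```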

5. Conclusions

In this paper, we discussed a method of dividing data with mixed outliers and noise into meaningful density sections and predicted broiler live weight at the time of shipment from daily broiler weight estimates. Using 110 broiler weight data sets, we focused on the dense sections that arise in data mixed with outliers and noise and applied K-means clustering and kernel density estimation to estimate a representative value of daily broiler weight. Assuming that exactly one true representative broiler weight exists in each day's list of candidates, a weighted moving average filter was applied to select it; the selected daily representative values were then interpreted as time series data, and the Prophet, double exponential smoothing, ARIMA, and Gompertz models were applied to compare the predicted live weights at the time of shipment. The comparison showed that the Prophet predictor performed relatively well and that the K-means clustering method may be the more suitable choice for generating the growth trends used as training data.
This study assumes that at least one of the dense sections of broiler weight data corresponds to the weight of a single broiler, but actual data may violate this assumption. In future work, we plan to study a more accurate and valid data storage method for the measurements produced by the broiler live weight measuring devices placed inside a broiler house, along with a method for minimizing the dispersion of the stored data.

Author Contributions

The author B.L. collected and organized the data, conceived the approach to the problem addressed in this paper, and analyzed the data. He also designed the methodology and wrote the first draft of this paper. The author J.S. conceptualized this study and designed the methodology. He also managed the project and reviewed and revised the first draft of this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data set generated and analyzed in this paper cannot be publicly released due to the proprietary rights of Emotion Co., Ltd., Republic of Korea. However, it is available from the corresponding author upon reasonable request for research purposes.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Torrella, K. How a Shipping Error More than a Century Ago Launched the $30 Billion Chicken Industry. Vox. Available online: https://www.vox.com/future-perfect/2023/2/10/23589333/cecile-steele-chicken-meat-poultry-eggs-delaware (accessed on 10 February 2023).
2. Cobb500 Broiler. Performance & Nutrition Supplement (2022) (Report); Cobb-Vantress: Siloam Springs, AR, USA, 2022; p. 4.
3. Korea Rural Economic Institute. Agricultural Outlook 2022; Korea Rural Economic Institute: Naju, Republic of Korea, 2022; Chapter 18; pp. 756–758.
4. Kim, G.-W.; Kim, J.-H.; Kim, H.-Y.; Kim, B.-K.; Park, H.-B.; Choe, J.; Kim, J.-H. Analysis of Marketing Performances according to Raising Environment in Broilers. Korean J. Poult. Sci. 2019, 46, 25–30.
5. Jang, H. Causes of Stress—A Major Disruptor of Broiler Production. Korean J. Poult. Sci. 1997, 29, 140–143.
6. Choi, H.-C. Dust Generation and Removal Method in the Premises. Monthly Foltree; National Institute of Animal Science, Rural Development Administration: 2010. Available online: https://www.nias.go.kr/front/soboarddown.do?cmCode=M090814151125016&boardSeqNum=120&fileSeqNum=123 (accessed on 10 January 2025).
7. Roush, W.; Dozier, W.; Branton, S. Comparison of Gompertz and Neural Network Models of Broiler Growth. Poult. Sci. 2006, 85, 794–797.
8. Topal, M.; Bölükbaşı, Ş. Comparison of Nonlinear Growth Curve Models in Broiler Chickens. J. Appl. Anim. Res. 2008, 34, 149–152.
9. Ahmad, H.A. Poultry growth modeling using neural networks and simulated data. J. Appl. Poult. Res. 2009, 18, 440–446.
10. Pinto, M.A.; Pereira, R.A.C.; Figueiredo, M.E. Growth curves of broilers fed different nutritional relationships using the Gompertz model. J. Anim. Sci. Technol. 2020, 62, 61–67.
11. Chumthong, R.; Sripha, S.; Tanaka, T. Using non-linear models to describe growth curves for Thai black-bone chickens. Anim. Sci. Technol. 2021, 65, 847–856.
12. Alijani, S.; Nematzadeh, R.; Hasanpur, K.; Varnaseri, H. Comparison of different mathematical functions for fitting growth curves of ascitic and healthy chickens. Ank. Üniv. Vet. Fak. Derg. 2021, 68, 289–295.
13. Hagan, B.A.; Asumah, C.; Yeboah, E.D.; Lamptey, V.K. Modeling the growth of four commercial broiler genotypes reared in the tropics. Trop. Anim. Health Prod. 2022, 54, 75.
14. Park, H.; Kim, N.; Han, Y.; Hahn, H. Implementation of Poultry Weight Measuring System using Object Segmentation based on Mean-shift Clustering. J. Inst. Electron. Inf. Eng. 2018, 55, 55–64.
15. Wang, C.-Y.; Chen, Y.-J.; Chien, C.-F. Industry 3.5 to empower smart production for poultry farming and an empirical study for broiler live weight prediction. Comput. Ind. Eng. 2021, 151, 106931.
16. Oh, Y.; Lyu, P.; Ko, S.; Min, J.; Song, J. Enhancing Broiler Weight Estimation through Gaussian Kernel Density Estimation Modeling. Agriculture 2024, 14, 809.
17. Lloyd, S.P. Least squares quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137.
18. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
19. Loftsgaarden, D.O.; Quesenberry, C.P. A nonparametric estimate of a multivariate density function. Ann. Math. Stat. 1965, 36, 1049–1051.
20. Silverman, B.W. Density Estimation for Statistics and Data Analysis; CRC Press: Boca Raton, FL, USA, 1986.
21. Sharpe, D.H.; DeMichele, D.W. A Gompertzian model of population growth with density-dependent feedback. Ecol. Monogr. 1977, 47, 351–380.
22. Box, G.E.P.; Jenkins, G.M. Time Series Analysis: Forecasting and Control; Holden-Day: San Francisco, CA, USA, 1970.
23. Taylor, S.J.; Letham, B. Forecasting at scale. Am. Stat. 2018, 72, 37–45.
24. Johansen, S.V.; Bendtsen, J.D.; R.-Jensen, M.; Mogensen, J. Broiler weight forecasting using dynamic neural network models with input variable selection. Comput. Electron. Agric. 2019, 159, 97–109.
25. Kigon, K.; Sik, C.E.; Hwan, S.S. A Study on Growth Pattern in a New Synthetic Korean Native Commercial Chicken by Sex and Strains. Korean J. Poult. Sci. 2022, 49, 229–237.
26. Şengül, T.; Çelik, Ş.; Şengül, A.Y.; İnci, H.; Şengül, Ö. Investigation of growth curves with different nonlinear models and MARS algorithm in broiler chickens. PLoS ONE 2024, 19, e0307037.
27. Taylor, C.; Guy, J.; Bacardit, J. Prediction of growth in grower-finisher pigs using recurrent neural networks. Biosyst. Eng. 2022, 220, 114–134.
28. Shabir, F.; Abdullah, A.I.; Asrul, B.E.W.; Alifah, S.; Nur, A. Implementation of the Double Exponential Smoothing Method for Determining the Planting Period on a Strawberry Plantation (in Indonesian). Telemat. J. Inform. Dan Teknol. Inf. 2022, 19, 259–270.
29. Albeladi, K.; Zafar, B.; Mueen, A. Time Series Forecasting using LSTM and ARIMA. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 313–320.
30. Hyndman, R.J.; Khandakar, Y. Automatic time series forecasting: The forecast package for R. J. Stat. Softw. 2008, 27, 1–22.
Figure 1. Histograms and KDE.
Figure 2. (a) Broiler live weight measuring machine; (b) Example of a broiler live weight scale measuring weight.
Figure 3. Example of the data sets used.
Figure 4. Broiler weight data scatter plot graph transparency adjustment.
Figure 5. (a) Normal measurement (1); (b) Normal measurement (2); (c) Multiple simultaneous measurement (1); (d) Multiple simultaneous measurement (2); (e) Wing overlapping; (f) Noise measurement.
Figure 6. Algorithm design and implementation.
Figure 7. Results of clustering application.
Figure 8. Results of applying KDE.
Figure 9. Example of the cluster selection step.
Figure 10. Example of the performing weight selection step (K-means representative value list).
Figure 11. Example of the performing weight selection step (KDE representative value list).
Figure 12. (a) Shipment weight prediction by Prophet; (b) Shipment weight prediction by Gompertz; (c) Shipment weight prediction by double exponential smoothing; (d) Shipment weight prediction by ARIMA.
Figure 13. 20230413074211_KF005101 (comparison of K-means and KDE).
Figure 14. 20230804210351_KF001601 (comparison of K-means and KDE).
Figure 15. 20230413074211_KF005101 (weight representative value selection step comparison).
Figure 16. 20230804210351_KF001601 (weight representative value selection step comparison).
Figure 17. 20230413074211_KF005101 prediction results. (a) K-means + forecast; (b) KDE + forecast.
Figure 18. 20230804210351_KF001601 prediction results. (a) K-means + forecast; (b) KDE + forecast.
Table 1. Review of broiler growth curve modeling studies.

| Researchers (Year) | Data Collection Method | Analysis Model | Key Findings |
| W. B. Roush et al. (2006) [7] | Manual measurement (1–70 days) | Gompertz model, Neural Network | Both models explained growth well; the Neural Network performed better (MSE: 382.2, MAPE: 2.983). |
| M. Topal et al. (2008) [8] | Manual measurement (weekly, 0–6 weeks) | Weibull, MMF, Gompertz, Bertalanffy, logistic models | The Weibull model performed best (R²: 1.0, MAPE: 0.03). |
| Ahmad, H.A. et al. (2009) [9] | Manual measurement (0–7 weeks) | Neural Network | The Neural Network effectively explained the growth curve (R²: 0.998). |
| Pinto, M.A. et al. (2020) [10] | Not mentioned (four different comparison groups) | Gompertz model | The Gompertz model had an R² of 0.99 for all four data sets; the group fed HCl, SO4, and calcium pidolate showed the highest growth rate. |
| Chumthong, R. et al. (2021) [11] | Manual measurement (2-week intervals) | Gompertz, logistic, Von Bertalanffy models | The Von Bertalanffy model best fitted both male and female black-bone chickens, with an R² of 0.9. |
| Alijani, S. et al. (2021) [12] | Manual measurement (1–45 days) | Logistic, Gompertz, Richards, Lopez, Von Bertalanffy | The Gompertz model showed an R² of 0.99 and an AIC of 68 for healthy broilers; for ascitic broilers, the Richards model was the best fit, with an AIC of 85.4. |
| Hagan, B.A. et al. (2022) [13] | Manual measurement (1–42 days) | Gompertz, polynomial growth models | The Gompertz model had an average error rate below 0.05 and was suitable for weekly growth modeling. |
| Park, H. et al. (2018) [14] | Video-based automatic measurement | Mean-shift clustering, CNN | Automatic measurement accuracy: 91.09%; accuracy was lower in crowded conditions. |
| Wang, C.-Y. et al. (2021) [15] | Automatic measurement (with outliers) | GMM clustering, bootstrap algorithm, Gompertz | The Gompertz model fitted with a corrected representative weight error rate < 5%. |
| Oh, Y. et al. (2024) [16] | Automatic measurement (with outliers) | Kernel density estimation | Optimized bandwidth selection improved representative weight accuracy. |
Table 2. Details of a broiler weight meter.

| Item | Details |
| Device Name | Emotion Co., Ltd.'s Kokofarm broiler live weight meter |
| Protocol | RS-485 |
| Data Types | Broiler Weight Sensor Data |
| Weight Unit | Gram (g) |
| Collecting Data | Broiler Live Weight (Weight Scale Hit): 1 hit/1 s |
Table 3. List of the data sets used.

| ID | File Name (Date + KokoFarm + Farm ID + House ID) |
| 1 | 20230104130541_KF008102_sensorData.csv |
| 2 | 20230117075028_KF010101_sensorData.csv |
| 3 | 20230123080647_KF002101_sensorData.csv |
| 4 | 20230126152117_KF001602_sensorData.csv |
| 5 | 20230129085932_KF005505_sensorData.csv |
| 6 | 20230129090602_KF005504_sensorData.csv |
| … | … |
| 110 | 20240902000000_KF010102_sensorData.csv |
Table 4. Predictor experiment results (January–March 2023).

| Data | Model | Prophet | ARIMA | D_ES | Gompertz | Actual Weight |
| 20230126152117_KF001602 | K-means (error%) | 1540 (3.84) | 1184 (26.03) | 1219 (23.86) | 1271 (20.64) | 1602 |
| | KDE (error%) | 1876 (17.11) | 1194 (25.43) | 1450 (9.44) | 1396 (12.80) | |
| 20230127112722_KF001603 | K-means (error%) | 1253 (0.30) | 1253 (0.29) | 1299 (3.96) | 1054 (15.65) | 1250 |
| | KDE (error%) | 2188 (75.07) | 1341 (7.33) | 1461 (16.94) | 894 (28.40) | |
| 20230129115417_KF005511 | K-means (error%) | 1538 (4.19) | 1285 (19.94) | 1335 (16.87) | 1265 (21.17) | 1606 |
| | KDE (error%) | 1376 (14.28) | 1327 (17.32) | 1378 (14.17) | 1307 (18.60) | |
| 20230202114437_KF001601 | K-means (error%) | 1369 (0.02) | 1288 (5.98) | 1390 (1.50) | 1004 (26.68) | 1370 |
| | KDE (error%) | 1881 (37.33) | 1334 (2.60) | 1375 (0.36) | 1435 (4.78) | |
| 20230213120922_KF002503 | K-means (error%) | 1304 (2.75) | 1236 (7.78) | 1229 (8.31) | 1081 (19.36) | 1341 |
| | KDE (error%) | 1801 (34.34) | 1233 (7.98) | 1168 (12.86) | 1248 (6.88) | |
| 20230213150747_KF002501 | K-means (error%) | 1339 (4.18) | 1294 (7.39) | 1274 (8.82) | 1096 (21.59) | 1398 |
| | KDE (error%) | 1723 (23.26) | 1202 (13.97) | 1255 (10.17) | 979 (29.95) | |
| 20230219151207_KF005207 | K-means (error%) | 1696 (0.90) | 1618 (5.49) | 1627 (4.95) | 1395 (18.45) | 1712 |
| | KDE (error%) | 1736 (1.41) | 1420 (17.00) | 1599 (6.54) | 1472 (13.98) | |
| 20230219151242_KF005202 | K-means (error%) | 1459 (1.61) | 1360 (8.34) | 1401 (5.55) | 1228 (17.20) | 1484 |
| | KDE (error%) | 1644 (10.84) | 1360 (8.35) | 1514 (2.08) | 1241 (16.32) | |
| 20230303174810_KF010102 | K-means (error%) | 1532 (1.79) | 1481 (5.03) | 1538 (1.39) | 1517 (2.73) | 1560 |
| | KDE (error%) | 1733 (11.09) | 1777 (13.94) | 1698 (8.88) | 1871 (19.96) | |
| 20230303175712_KF010101 | K-means (error%) | 1400 (3.40) | 1321 (2.39) | 1379 (1.91) | 1333 (1.50) | 1354 |
| | KDE (error%) | 1343 (0.77) | 1244 (8.11) | 1297 (4.16) | 1278 (5.60) | |
Table 5. Predictive performance of the entire model.

| Model | MAE | MAPE | RMSE | Std of Error | Mean Error |
| K-means + Prophet | 79.65 | 4.92 | 104.84 | 102.18 | 25.41 |
| K-means + ARIMA | 228.73 | 14.10 | 272.41 | 173.88 | 210.36 |
| K-means + D_ES | 160.51 | 9.82 | 207.99 | 146.55 | 148.25 |
| K-means + Gompertz | 289.25 | 17.88 | 340.13 | 192.34 | 281.12 |
| KDE + Prophet | 158.91 | 10.34 | 234.59 | 221.86 | −79.41 |
| KDE + ARIMA | 216.03 | 13.43 | 262.95 | 207.04 | 163.40 |
| KDE + D_ES | 230.80 | 14.41 | 392.70 | 350.82 | 179.60 |
| KDE + Gompertz | 299.59 | 18.82 | 436.65 | 400.75 | 177.53 |