Application of Machine Learning Algorithms for On-Farm Monitoring and Prediction of Broilers’ Live Weight: A Quantitative Study Based on Body Weight Data

A non-invasive automatic broiler weight estimation and prediction method based on machine learning was developed to address the high labor costs and stress responses caused by traditional broiler weighing in large-scale broiler production. Machine learning algorithms are a data-driven strategy that enables computer systems to make predictions and judgments based on patterns and regularities learned from data. To estimate the current weight of individual live broilers on farms, machine learning algorithms such as the Gaussian mixture model, Isolation Forest, and Ordering Points To Identify the Clustering Structure (OPTICS) are used to filter and extract data features through a two-stage clustering and noise reduction process. Real-time weight prediction was achieved by combining polynomial fitting with gray models and adjusting the model parameters based on prediction accuracy feedback. The symmetric mean absolute percentage error (SMAPE), a metric commonly used to evaluate predictive performance, was computed between the model's predicted value on the day of slaughter and the true value measured manually. In experiments on 111 datasets, 7.21% of the SMAPE values were less than or equal to 0.03, 28.83% were greater than 0.03 and less than or equal to 0.1, and 31.53% were greater than 0.1 and less than or equal to 0.2. Considering implementation cost and estimation accuracy, this method can serve as a prediction scheme for broiler weight monitoring in large-scale rearing environments.


Introduction
Broilers' body weight is an important indicator of their health, and effective broiler body weight monitoring is a problem that must be solved in large-scale broiler farming. Meanwhile, large poultry companies typically cover a vertical chain from breeding farms and chicken farms to slaughterhouses and distributors, forming a complete broiler supply chain in order to maximize profits. Due to the lack of a younger generation of workers willing to work in the broiler industry, companies in Korea contract with hundreds of farms to meet market demand for broilers of acceptable specifications and to meet the production capacity of their slaughterhouses. Because of the large price difference between qualified and substandard broilers, agribusinesses must increase the qualification rate, i.e., broiler production within a specified weight and size range, to avoid potential profit losses. Farmers, on the other hand, can earn additional incentive gains based on the number of broilers that meet the weight standards. As a result, on-farm real-time monitoring of live broiler weights and slaughter time control are critical for revenue management.
Broiler body weight estimation and monitoring have been extensively studied and can be categorized into four main directions:

i. Traditional growth curves and growth models based on fitting mathematical functions. For example, Topal et al. [1] studied the fitting and prediction of avian weight-age relationships and compared the goodness of fit of the MMF, Weibull, logistic, Gompertz, and von Bertalanffy models. Moharrery et al. [2] proposed a methodology to study and predict the growth characteristics of commercial broilers and indigenous chickens using a nonlinear function, applying several statistical methods to evaluate the fit of the function and differences in growth parameters. Rizzi et al. [3] investigated growth patterns and sex differences in poultry meat production by comparing models such as linear, logistic, Gompertz, and Richards models, and the fit analysis revealed a flexible growth function. Mouffok et al. [4], comparing and evaluating the fit and predictive performance of different models for the Cobb500 strain of meat birds, found that the Gompertz model was more accurate in estimating body weight in the early stages.

ii. Live weight estimation methods based on digital image processing. For example, Wet et al. [5] analyzed images of broilers using commercial software and established a nonlinear regression equation to estimate body weight from the statistical relationship between surface area, girth, and body weight; this approach was found to be less accurate than image analysis of pigs. Chedad et al. [6] proposed a method to estimate the body weight of chickens by image analysis and showed that automated weighing systems tend to underestimate the actual body weight of chickens at the end of the growth period. Bazlur et al. [7] developed a linear equation for estimating broiler body weight from digital images of body surface area, validated on a random sample of 100 broilers; the highest error between manually measured and estimated weight was 16.47%, and the lowest was 0.04%. Mortensen et al. [8] predicted broiler weight using a 3D camera and image processing algorithms; the average relative error between predicted and true weight on the test dataset was 7.8%, and the absolute prediction error grew in the later stages of breeding as chicken density increased. Amraei et al. [9] used machine vision techniques to extract weight-related features and artificial neural network algorithms to predict body weight, with prediction errors mainly below 50 g.

iii. Body weight monitoring methods based on audio analysis. For example, Aydin et al. [10] determined the feed intake of chickens by detecting the birds' pecks and comparing them with feed intake measured by a weighing system; they found a linear correlation between the number of pecks and feed intake, with 93% of pecks accurately identified. Fontana et al. [11] developed a tool that automatically detects the growth status of broiler chickens of varying ages from the frequency of their calls; statistical analysis showed a significant correlation between the age and weight of the chickens and the peak frequency (PF) of their calls. Fontana et al. [12] applied SAS 9.3 procedures, including PROC TTEST, PROC CORR, and PROC REG, to perform regression analyses and statistical tests, which indicated a notable correlation between the sound frequency, age, and body weight of broilers. Fontana et al. [13] studied the use of sound analysis to predict broiler body weight and found a considerable correlation between age and body weight; they also established that frequency analysis of chickens' calls may be disrupted by filters and ambient noise during the final stages of broiler growth. Abdel-Kafy et al. [14] used statistical analysis software and regression modeling to predict the body weight of turkeys from recordings of their vocalizations and corresponding body weights; the results demonstrated a decrease in vocalization frequency with age.

iv. Direct predictive modeling based on other sensor data (nutritional intake, ventilation, temperature, humidity, etc.) or weight data. For example, Johansen et al. [15] predicted broiler weight with a dynamic neural network model trained using an LM optimization algorithm, with input variables selected by mutual information and the joint probability density function estimated by kernel density estimation; the system achieved an average root-mean-square prediction error of 66.8 g. Lee et al. [16] developed an automated chicken weighing system composed of weighing scales and workstations: the scale was built from an aluminum plate and a 5 kg load cell, weight data were transmitted wirelessly to the workstation via a transmission module, and the workstation collected data every 15 s and compared the daily average weight with a reference value to monitor the growth and development of the chickens. Weihong Ma et al. [17] introduced an effective dynamic-weighing value-extraction method combining an improved amplitude-limited filtering algorithm with a BP neural network model that analyzes age, daily weight gain, average speed, and preprocessed weight values. Chunyao Wang et al. [18] proposed a data-driven framework employing Gaussian mixture modeling, bootstrap resampling, and weighted averaging to enhance the accuracy of monitoring and predicting live chicken weights, reducing the weighing error from 6% to less than 3%. Birzniece et al. [19] used a long short-term memory (LSTM) artificial neural network for broiler weight prediction based on environmental factors including temperature, gas concentration, humidity, broiler weight, and feed consumption.
A detailed enumeration and analysis of the methods, evaluation indicators, and main contributions that have appeared in previous research is provided in Table 1 below.
As shown in Table 1, in the traditional field of growth pattern and function fitting, researchers primarily rely on nonlinear regression to estimate different growth curve models for broiler chickens (e.g., Gompertz, Richards, and logistic) in order to describe and predict growth trends. However, this method has some drawbacks. It can be challenging to find a function applicable to diverse broiler breeds or breeding cycles, so fitted functions are often only suitable for characterizing the growth pattern of a specific breed at a certain stage. When evaluating models, more emphasis is typically placed on goodness of fit, such as the coefficient of determination (R²), than on the accuracy of single-point or overall live weight predictions. Although some studies report a mean absolute percentage error (MAPE) of 4% for prediction errors, these values were measured on small datasets of no more than 100 chickens. Rizzi et al. [3] compared the growth of an Italian commercial hybrid (Berlanda), a local Italian breed called Padovana (in two color varieties), and their crosses.
The birds were reared from 1 day to 180 days of age in an environmentally controlled breeder house.
Male and female chicks from five different genotypes were used to compare growth patterns using linear, logistic, Gompertz, and Richards growth models.
The most successful models, Gompertz and Richards, exhibited adjusted coefficient of determination (R²) values of 99.51% for the commercial strain and 99.12% for the native chickens.
The growth rates of the studied genotypes were lower than those of the commercial hybrids.
The Gompertz and Richards growth models gave better estimates of weight parameters than the logistic model.
Charef Eddine Mouffok, Semara L., Farida Belkasmi et al. [4] describe a retrospective analysis of 50 broiler chicks divided into three weight classes: light, middle, and heavy.
They highlight the use of goodness-of-fit criteria to evaluate model accuracy and compare six mathematical models: Gompertz, Richards, logistic, Weibull, von Bertalanffy, and exponential.
The total coefficient of determination (R²) values of the Gompertz, logistic, von Bertalanffy, and WLS models across the three weight classes were all 0.954.
They conclude that the Gompertz model is most suitable for describing the growth curve up to four weeks of age, while the logistic, von Bertalanffy, and WLS models accurately describe the growth curve after one month of age.
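The growth models compared in these studies can be fitted with ordinary nonlinear least squares. Below is a minimal sketch using SciPy and a common Gompertz parameterization; the age-weight observations are synthetic, for illustration only, and are not values from the cited studies:

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, a, b, k):
    """Gompertz growth curve: asymptotic weight a, shape b, growth rate k."""
    return a * np.exp(-b * np.exp(-k * t))

# Synthetic age (days) and body weight (g) observations, for illustration only
age = np.array([0.0, 7, 14, 21, 28, 35, 42])
weight = np.array([42.0, 180, 470, 900, 1400, 1900, 2300])

# Fit by nonlinear least squares from a rough initial guess
params, _ = curve_fit(gompertz, age, weight, p0=(3000, 4, 0.1), maxfev=10000)
fitted = gompertz(age, *params)
r_squared = 1 - np.sum((weight - fitted) ** 2) / np.sum((weight - weight.mean()) ** 2)
```

Comparing candidate models (logistic, von Bertalanffy, etc.) then reduces to refitting with a different curve function and comparing R² or an information criterion, which is essentially the evaluation the studies above report.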

Image processing and computer vision
Lourens de Wet, Erik Vranken, Jean-Marie Aerts, and Daniel Berckmans [5] used commercial software to analyze captured images and determine body size from surface area and peripheral pixel count. Nonlinear regression analysis was used to determine the relationship between body weight and image characteristics.
The study used digital image processing techniques to estimate the live weight of broiler chickens and discussed the challenges and limitations of image analysis, such as variations in lighting, animal movement, occlusion, and background clutter.
The body weight of the chickens was estimated with an average relative error of about 11% from the image surface and 16% from the image periphery.
The authors suggested the possibility of using image sequences for behavioral characterization and real-time observation systems. Mortensen et al. [8] used a low-cost 3D camera (Kinect) with its own infrared light source and an image processing pipeline: a range-based watershed algorithm for segmentation, extraction of weight descriptors, and weight prediction with a Bayesian artificial neural network, compared and evaluated against four other models.
The study used a commercial broiler house of 48,000 broilers (Ross 308) during the last 20 days of the breeding period, a test set of 83 broilers, manually annotated images, and a traditional platform scale for reference weights to explore different 1D, 2D, and 3D features for weight prediction.
A relative mean error of 7.8% was achieved on a separate test set, and the range of absolute errors was 20-100 g in the first half of the period and 50-250 g in the second half. Larger errors were observed at the end of the rearing period as the broiler density increased.
The system shows promise as a non-intrusive, robust solution for weighing broilers in commercial production environments, with potential for additional applications. In the feed-intake study [10], sounds were recorded using a microphone attached to the feeding pen, and each hen was deprived of food for four hours before the experiment. Feed intake was automatically recorded using a weighing system, and feed wastage was manually collected and weighed. The results of the algorithm were compared with reference feed intake values obtained by weighing and video observation.
Twelve individual 28-day-old male broiler chickens (Ross 308) were used. Pecking sounds were recorded for 15 min during each trial. A total of 36 trials were performed, with three laboratory trials per broiler.
The algorithm correctly identified 93% of pecking sounds but had a false positive rate of 7%. The coefficient of determination (R²) between the number of pecks and feed uptake was 0.995, and that between feed intake and pecking frequency was 0.985.
The study provided a non-invasive and automated method to measure feed intake in broiler chickens, and the real-time data produced by the algorithm have potential applications in the study of broilers' feeding behavior and welfare. In the sound-analysis studies, there was a strong positive correlation (0.97, p-value < 0.001) between the weight and age of the broilers, a strong negative correlation (0.95, p-value < 0.001) between the PF of the sounds and the age of the broilers, and a significant negative correlation (0.80, p-value < 0.001) between the frequency of the vocalizations and the weight of the broilers.
A strong positive correlation was observed between weight and age. PF showed significant negative correlations with both age and weight, and could be used as an early warning or continuous monitoring signal to assess the health and status of broiler chickens. The correlation coefficient between expected and observed weights was high and positive (R² = 0.96, p ≤ 0.001), and the regression model between expected and observed weights was highly significant (R² = 0.93, p ≤ 0.001).
The identified model for predicting weight as a function of peak frequency confirmed that birds' weight could be predicted by frequency analysis of sounds emitted at the farm level.
Regression models were developed to predict weight and PF from age, and weight from PF. Pooled data were analyzed using ANOVA to test for differences between the age, weight, and PF variables, and regression models based on the pooled data were used to predict weights and PFs.
Four trials were conducted in Egypt to record the sounds and weights of turkeys during an 11-day growth period. A total of 2200 sounds were manually analyzed and labeled using peak frequency (PF).
The correlation between turkeys' weight and age was high and positive (R² = 0.96, p < 0.0001). The correlation between the PF of turkey vocalizations and their age was high and negative (R² = 0.97, p < 0.001), as was that between the PF of the vocalizations and the turkeys' weight (R² = 0.97, p < 0.001).
The RMSE values during calibration and validation differed by 5.1%.
Audio monitoring provides a non-contact method of monitoring turkeys' growth, eliminating manual handling, and is potentially useful for farmers seeking to automate turkey growth monitoring. The automated chicken weighing system [16] uses a wireless sensor network (WSN) for data transmission; it includes rugged aluminum plate scales with 5 kg load cells, and the weight data are transmitted to a workstation via a wireless transceiver module.
The scales were designed to measure the chickens' weight accurately. They were made of aluminum plates and equipped with 5 kg load cells to ensure robustness and accuracy, and they were placed inside the three pens for 38 days to record and monitor the chickens' day-to-day development.
From day 1 to day 12, the hens' development rate was as planned. However, beginning on day 13, the development rate was 3.38% to 12.21% slower than projected, causing the animals to take 40-42 days to reach 1.8 kg.
The study reported the successful development of an automated chicken weighing system using a wireless sensor network (WSN) capable of collecting real-time weight data from broiler chickens. Chun-Yao Wang, Ying-Jen Chen, and Chen-Fu Chien [18] proposed a data-driven framework for weight monitoring and prediction in the broiler industry. The weight monitoring module estimates live broiler body weight using a Gaussian mixture model (GMM); it employs a bootstrap resampling approach to decrease cluster noise, identify outliers, and compute the cluster-weighted mean as a single individual average weight. To model broiler growth and anticipate future weight, the weight prediction module employs mathematical growth functions, notably the Gompertz function. The cumulative mean absolute percentage error (Cu-MAPE) is used as a model fit indicator.
Empirical studies were carried out in six broiler farms to validate the proposed approach.
For each batch, the estimated value was compared with manual weighing on four reference days, i.e., day 14, day 21, day 28, and the day of delivery.
For batch 165-1, the error on day 28 was greater than 8%; for batch 165-2, the error on day 21 was 7.55%, while that on day 28 was 11.2%; for batch 165-4, the error on day 14 was the highest of all, at about 16.34%; for batch 165-9, the error on day 14 was 5.2%, while that on day 21 was 7.75%.As regards the error on the day of delivery, several batches had an error of less than 3%, with some of them even less than 1%.
The proposed data-driven framework for weight monitoring and prediction in the broiler industry was successfully implemented and validated. The study demonstrated the practicality of the approach and highlighted the potential for Industry 3.5 solutions in the agricultural sector.

In the widely popular fields of image processing and computer vision (referenced in Table 1), researchers commonly use commercial image processing software or artificial neural networks to establish the relationship between broilers' area in images and their actual weight. These methods often involve the fundamental task of distinguishing the broiler from the background, frequently using the Otsu method. However, this technique is highly sensitive to noise and may struggle to accurately segment images that lack distinct bimodal histograms or exhibit multimodal characteristics. Furthermore, research in this domain requires expensive camera equipment, usually incorporating infrared night vision for capturing scenes in low-light conditions and high pixel resolution for accurate calculations on high-definition images; some studies even employ 3D cameras to capture spatial information. Additionally, both image processing and neural network training are time-consuming tasks, and the former also requires complex data annotation work. Finally, these models were evaluated on small test datasets of fewer than 100 chickens; for example, the best-performing Bayesian neural network model achieved a root-mean-square error (RMSE) value of 82.37 when tested on a dataset of 30 broiler chickens.
As indicated in Table 1, in the increasingly popular field of audio processing and correlation exploration, researchers commonly utilize audio processing software in conjunction with multiple regression models to estimate the relationship between broilers' age, weight, and peak frequency (PF). The first challenge they face is removing background noise. Furthermore, expensive recording equipment is necessary for this type of research, requiring high-resolution recording capabilities to capture more detail and improve audio quality; these devices often incorporate noise suppression technology to minimize the interference of background noise. Data preprocessing and annotation work are also time-consuming and labor-intensive, similar to video processing. Finally, when conducting model evaluation, most studies focus on correlation analyses, examining the p-values between age, weight, and PF, with little attention given to direct prediction accuracy measurement.
As observed in Table 1, in the limited domain of direct live weight prediction for broilers using other sensors, researchers have attempted to use alternative types of sensors to directly capture data on factors influencing body weight (such as temperature, carbon dioxide concentration, and ammonia concentration), establishing neural networks or regression models to directly estimate broiler weight. The first step is to reduce data noise through preprocessing, filtering out the real weight data of individual broilers. It should be noted that the resampling technique mentioned in previous studies originally served as a method of expanding the dataset, not as a noise reduction algorithm. The sensor equipment required for this approach is relatively simple, and model calculations at the same scale are faster than those of indirect data processing methods based on images and sounds. Finally, although the best prediction accuracy error can be as low as 3%, these results are based on a limited number of datasets and small samples.
Based on previous research, we believe that live weight monitoring and prediction for broilers in large-scale breeding has not yet been truly studied, because the data used do not reflect the tens or even hundreds of thousands of chickens raised in practice. Moreover, directly using weight sensors to calculate the live weight of broilers is undoubtedly the simplest and most economical method, and the construction of the IoT system and the model calculations are relatively simple. The only difficulty is extracting individual body weight data from highly contaminated data. Our research is based on large datasets and combines popular machine learning methods from current artificial intelligence to provide advanced solutions to the above challenges.
The solution consists of two stages: non-invasive automatic real-time monitoring, and prediction of broiler weight. The first stage uses a multi-clustering machine learning algorithm to continuously eliminate outliers and obtain the average weight of a single individual. In the second stage, short- and long-term prediction models are flexibly established according to the feeding cycle to improve prediction accuracy. An experimental study using 113 large datasets provided by Harim's cooperative farms, the largest and leading broiler company in South Korea, demonstrated the practical feasibility of this approach. Figure 1 shows the general process framework used in this study, from data collection to model building. It includes a hardware IoT system and a software algorithm system. The IoT system consists of three parts: electronic scales fixed on the farm floor, a database that stores the scales' local measurement data, and a cloud database that aggregates all of the datasets. The software constructs two types of models, for monitoring and prediction, based on the weight datasets (organized by farm and feeding cycle) from the cloud database.
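The gray-model component of the prediction stage can be illustrated with the standard GM(1,1) formulation. The sketch below is an assumption about the general form, not the authors' exact implementation: it fits a short series of daily mean weights and scores a one-step-ahead forecast with SMAPE (here on the common 0-2 scale; the paper's exact normalization may differ):

```python
import numpy as np

def gm11_forecast(x0, steps=1):
    """Gray model GM(1,1): fit the series x0 and forecast `steps` values ahead."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                          # accumulated generating operation
    z1 = 0.5 * (x1[1:] + x1[:-1])               # background (mean) sequence
    B = np.column_stack([-z1, np.ones(len(z1))])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]   # fit x0[k] = -a*z1[k] + b
    k = np.arange(len(x0) + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a  # time-response function
    x0_hat = np.concatenate([[x1_hat[0]], np.diff(x1_hat)])  # de-accumulate
    return x0_hat[len(x0):]                     # forecasted values only

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error (0-2 scale)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))

# Illustrative daily mean weights (g); forecast the next day and score it
history = [100.0, 130.0, 170.0, 220.0, 285.0, 370.0]
pred = gm11_forecast(history, steps=1)
error = smape([480.0], pred)
```

In the described scheme, such a model would be refitted as each new monitoring interval arrives, with the SMAPE feedback guiding parameter adjustment between the polynomial and gray components.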

Materials and Methods
The objective of this study was to explore an automated and cost-effective method, based on machine learning algorithms, that uses weight sensors as the sole source of data collection to determine the live weight of individual broilers. To this end, several machine learning clustering algorithms were introduced to develop a three-stage methodology for processing and analyzing the data, allowing the average weight of an individual chicken to be derived from noisy and outlier-ridden data. Machine learning itself is a collection of algorithms that automatically learn and extract patterns and features from large amounts of data. It can be divided into supervised learning, which uses labels (i.e., known correct answers), unsupervised learning, which does not, and reinforcement learning, which seeks to maximize cumulative utility; unsupervised learning algorithms are often used in data analysis and mining. The overall processing flow is shown in Figure 2 (Phases 2-4). In the first phase, all obvious outliers in the original data, including negative and zero values, are replaced by data within the valid range, using a multiple-imputation machine learning algorithm. In the second stage, the entire dataset is divided into monitoring units at fixed time intervals, and the mean body weights of the individuals in each unit block are calculated using multiple clustering, anomaly detection, and other machine learning algorithms. In the third stage, the mean weight for the next time interval is predicted using the compiled list of individual weights.
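The per-unit clustering and anomaly detection step can be sketched with scikit-learn. The parameters below (contamination rate, number of mixture components) are illustrative assumptions rather than the study's tuned values, and OPTICS could stand in for either stage as a density-based alternative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.mixture import GaussianMixture

def estimate_unit_weight(readings, n_components=3, contamination=0.1, seed=0):
    """Estimate the mean individual weight for one monitoring unit (time window).

    Stage 1: Isolation Forest flags gross outliers (e.g., multi-bird or
    partial-contact readings). Stage 2: a Gaussian mixture is fitted over the
    surviving readings, and the mean of the heaviest-weighted component is
    taken as the unit's individual weight. Parameters are illustrative.
    """
    X = np.asarray(readings, dtype=float).reshape(-1, 1)
    keep = IsolationForest(contamination=contamination,
                           random_state=seed).fit_predict(X) == 1
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X[keep])
    return float(gmm.means_[np.argmax(gmm.weights_)][0])
```

Applied to a window of per-second readings where most values cluster around the true single-bird weight and a minority reflect two birds on the scale, the dominant mixture component tracks the single-bird cluster.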


Broiler Sample and Live Weight Data Collection
This study was conducted at 36 farms. A regular supply of Cobb 500 chicks was provided by the firm; the supplier was Cobb-Vantress, Inc. of Siloam Springs, AR, USA. The breed's minimum theoretical weight was 42 g (0 days of age), and the maximum was 4641 g (56 days of age). KOKOFARM electronic scales (model: KOKOFARM; provider: EMOTION Co., Ltd. of Jeonju, Korea; manufacturer: CAS Co., Ltd. of Yangju, Korea) were used for precise measurements. We established and consistently calibrated monitoring systems at the front, middle, and rear of each farm before introducing a new group of breeders. Each scale was equipped with six types of sensors, including weight sensors with a maximum range of 1500 g, a division value of 1 g, and a measurement error range of 5%. The breeding cycle on the farms lasts 28 to 31 days, during which 30,000 broilers are raised in a single batch. The theoretical body weights of the broilers ranged from 42 to 2094 g based on the actual breeding cycle, and within this range the sample of broilers was representative. The weight sensors produced one reading per second, and the accumulated readings were sent to the data server every minute for storage in CSV format. The raw data, consisting of a time column and three weight values, were obtained by filtering redundant sensor data and synchronizing timestamps. The program was developed using Python 3.9, with core algorithmic libraries including scikit-learn 1.1.2, SciPy 1.9.1, and missingpy 0.2.0. All experiments were conducted on a server equipped with an Intel(R) Xeon(R) Gold 5218R CPU at 2.10 GHz, an NVIDIA A100 80 GB PCIe GPU, 251 GB of RAM, and the Ubuntu Linux operating system. The experimental farm layout, broilers, and electronic scales are shown in Figure 3 below.
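Dividing the raw stream into fixed-interval monitoring units, as done in the later stages, can be sketched with pandas. The column names `time` and `scale_1`..`scale_3` are hypothetical, since the paper does not specify the CSV schema:

```python
import pandas as pd

def to_monitoring_units(df, minutes=30):
    """Group per-second scale readings into fixed-interval monitoring units.

    `df` is expected to hold a datetime 'time' column plus one weight column
    (grams) per scale location; these column names are illustrative and not
    taken from the paper. Returns {interval start -> array of readings}.
    """
    long = df.melt(id_vars="time",
                   value_vars=["scale_1", "scale_2", "scale_3"],
                   value_name="weight_g")
    long["unit"] = long["time"].dt.floor(f"{minutes}min")
    return {unit: g["weight_g"].to_numpy() for unit, g in long.groupby("unit")}
```

A raw file could be loaded first with, e.g., `pd.read_csv(path, parse_dates=["time"])` (the path and schema being assumptions); each resulting unit's readings would then feed the clustering and noise reduction step.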

Broiler Sample and Live Weight Data Collection
This study was conducted at 36 farms.A regular supply of Cobb 500 chicks was provided by the firm, and Cobb-Vantress, Inc. of Siloam Springs, AR, USA, was the output company.The breed's minimum theoretical weight was 42 g (0 days of age), and the maximum weight was 4641 g (56 days of age).KOKOFARM electronic scales (model: KOKOFARM, provider: EMOTION Co., Ltd. of Jeonju, Korea, output company: CAS Co., Ltd. of Yangju, Korea) were used for precise measurements.We established and consistently calibrated monitoring systems at the front, middle, and rear locations of each farm before introducing a new group of breeders.Each scale was equipped with six types of sensors, including weight sensors, with a maximum range of 1500 g, a divisional value of 1 g, and a measurement error range of 5%.The breeding cycle on the farms lasts for 28 to 31 days, during which 30,000 broilers are raised in a single batch.The theoretical body weights of the broilers ranged from 42 to 2094 g based on the actual breeding cycle, and within this range the sample of broilers was representative.The sensors for body weight produced one reading per second, which was sent to the data server for storage in a CSV file format every minute of accumulation.The raw data, consisting of time and scale data, were obtained by filtering redundant sensor data and synchronizing timestamps with three weight values.The program was developed using Python 3.9, with core algorithmic libraries such as scikit-learn 1.1.2,SciPy 1.9.1, missingpy 0.2.0, and others being employed.All experiments were conducted on the server, which was equipped with an Intel (R) Xeon (R) Gold 5218R CPU operating at 2.10 GHz, an NVIDIA A100 80 GB PCIe, 251 GB RAM, and the Ubuntu Linux operating system.The experimental farm layout, broilers, and electronic scales are shown in Figure 3 below.


Data Preparation and Preprocessing
The raw data were time-series data with one time column (in seconds) and three columns of body weight values (in grams), i.e., three body weight measurements at the same time corresponding to different farm locations. Due to the unpredictable individual and group behavior of chickens, as well as uncontrollable external environmental factors such as earthquakes, strong winds, and hardware short-circuits, the collected data inevitably contained negative numbers, zeros, and other noisy readings, such as values corresponding to 1.5, 2.5, or even 3.5 chickens. Even when the weight of a single chicken was accurately identified, the readings were mixed with many outliers due to the sensitivity of the sensors. Therefore, this noise and these outliers needed to be removed before analyzing the data.

Critical-Value-Based Data Processing for Noise Reduction
We can define thresholds based on observation or prior experience to eliminate data that appear invalid. For example, setting the lower threshold to a very small positive number, such as 1, and deleting values below it filters out all negative numbers and zeros. Similarly, a higher threshold can be used to remove noisy data that exceed the upper limit of the observed values. The advantages of this method are that it is fast, effective, and has a low computational overhead. The disadvantage is that it requires the user to have sufficient a priori knowledge of the data's value range and labeling standards.
After processing, the cut data segment, as shown in Figure 4, contains the true weight data. This is particularly useful when there is a lot of noise outside the threshold.
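As a minimal illustration, this critical-value filtering can be sketched in Python with pandas; the bounds below are illustrative assumptions, not the paper's exact thresholds:

```python
import pandas as pd

def threshold_filter(weights: pd.Series,
                     lower: float = 1.0, upper: float = 2500.0) -> pd.Series:
    """Replace readings outside [lower, upper] grams with NaN.

    A lower bound of 1 g removes zeros and negatives; the upper bound
    (an assumed value) should exceed any plausible single-bird weight.
    Nulling rather than dropping keeps the series aligned in time for
    the later imputation step.
    """
    return weights.where((weights >= lower) & (weights <= upper))

# Example: raw one-second readings from one scale
raw = pd.Series([0, -3, 812, 815, 3999, 810])
clean = threshold_filter(raw)
print(clean.notna().sum())  # 3 valid readings remain
```

Marking invalid readings as NaN rather than deleting rows matches the paper's strategy of replacing noise with nulls that are later imputed.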


MissForest-Based Approach to Null-Filling
Noise values are not removed immediately, in order to maintain the timeliness of the data; instead, they are replaced with null values and then replenished periodically. This method attempts to maximize the value of the data. The MissForest [20] approach, known for its robustness, is used to impute the noisy data, especially in cases where the missing data are large and dispersed. It is a modern nonparametric approach based on random forests, a widely used nonlinear modeling tool. Its advantages include its ability to capture interactions and nonlinear characteristics within data variables, along with its flexibility to handle a wide range of data formats, including mixed numerical and categorical types. Missing value imputation involves filling in all missing data with the median or mean for continuous values, training a random forest model on the entire dataset, predicting the missing values, and iterating until convergence. For the set of continuous variables N, the convergence criterion is defined as follows:

\Delta_N = \sum_{j \in N} (X^{imp}_{new} - X^{imp}_{old})^2 \Big/ \sum_{j \in N} (X^{imp}_{new})^2, (1)

where X^{imp}_{new} denotes the new interpolation matrix and X^{imp}_{old} is the previous one. Following the two phases of processing in the first stage, we obtained input data that could be used to begin the second stage of data analysis.
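The iterative random-forest imputation can be sketched as follows. The paper uses missingpy's MissForest; here, scikit-learn's IterativeImputer with a random-forest estimator serves as a comparable stand-in, since both repeat random-forest predictions of the missing entries until convergence (the data and all settings are illustrative):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
# Synthetic three-scale weight table with ~20% of readings nulled by
# the threshold filter (values are illustrative, not the paper's data)
X = rng.normal(800, 20, size=(200, 3))
mask = rng.random(X.shape) < 0.2
X_missing = X.copy()
X_missing[mask] = np.nan

# Initialize with column means, then iteratively re-predict missing
# entries with a random forest, as MissForest does
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=10, random_state=0),
    max_iter=5, random_state=0,
)
X_filled = imputer.fit_transform(X_missing)
print(np.isnan(X_filled).sum())  # 0: all gaps filled
```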

Time Interval Segmentation and Average Weight Calculation
To achieve real-time monitoring of the average body weight of individual chickens while reducing the computational overhead, the data from the preprocessing stage were split by temporal frequency.Direct clustering or finding the mean was difficult and imprecise due to the continuity of the data.Multiple clustering techniques and anomaly detection methods were used with data blocks as processing units for continuous cleaning and screening.

Segmentation and Gaussian-Mixture-Model-Based Data Modeling
Data segmentation using equal time-frequency intervals is straightforward and, therefore, is not described individually. We can divide the data collected in real time into days or hours (in this case, 3 h) and then move on to clustering.
Because the dataset included various possibilities, like multiple chickens on the platform or just a wing touch, we employed a Gaussian mixture model (GMM) [21] to segment the data by decomposing them into individual Gaussian distribution components. As per the central limit theorem, if a random variable is composed of many tiny and independent elements, the variable is considered to follow a Gaussian distribution. Consequently, numerous incidental variables can be described approximately by uni- or multivariate Gaussian distributions. Specific weights merge multiple Gaussian models into a single model, and distinct data points possess diverse likelihoods of belonging to each Gaussian model, creating the GMM. Based on the time-weight value pairs, a two-dimensional GMM was utilized, as presented in Equations (2) and (3):

p(X) = \sum_{k=1}^{K} p_k \, N(X \mid \mu_k, \Sigma_k), (2)

N(X \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (X - \mu)^{T} \Sigma^{-1} (X - \mu) \right), (3)

where p_k is the weight of the kth component, \sum_{k=1}^{K} p_k = 1, and N(X \mid \theta) is the probability density function of the kth component. The data dimension is denoted by d (2 in this study), X and \mu are both d-dimensional vectors represented by a matrix with one row and d columns, and \Sigma is a nonsingular d \times d covariance matrix.
The expectation maximization (EM) method was utilized to estimate parameters, and upon completion of the iterations, the raw dense data were divided into clusters, each representing a unique parameter model of the initial data distribution.
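A brief sketch of the EM-fitted two-dimensional GMM on one synthetic 3-hour block (the data and K = 2 are illustrative choices for brevity; the paper selects the number of components via BIC):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic (time in hours, weight in grams) pairs for one 3 h block:
# single-bird readings plus a smaller double-bird noise component
t = rng.uniform(0, 3, 300)
w = np.concatenate([rng.normal(800, 15, 200),    # one chicken on the scale
                    rng.normal(1600, 30, 100)])  # two chickens at once
X = np.column_stack([t, w])

# EM parameter estimation happens inside fit()
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(X)
labels = gmm.predict(X)
print(sorted(np.round(gmm.means_[:, 1])))  # component mean weights
```

Each fitted component then stands for one mode of the raw data (valid single-bird readings, multi-bird contacts, etc.), which the next steps filter and average.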

Isolation-Forest-Based Outlier Sieving
After breaking down the initial data into smaller subsets and patterns, it is necessary to determine the representative value of each subset by calculating averages. Centralization is crucial to prevent excessive variation in each set. To exclude outliers from each cluster and enable the calculation of a more general and robust average, the Isolation Forest method [22] was employed. Isolation is the action of separating anomalous samples from the rest of the data. These samples have two distinct characteristics: firstly, they make up a minimal proportion of the overall data, and secondly, their values differ significantly from those of the surrounding samples. The Isolation Forest algorithm is a rapid and efficient anomaly detection algorithm. It counts the hyperplane splits necessary to isolate an instance and then assesses the anomaly of that instance. Owing to their miscellaneous and sporadic characteristics, anomalous points are typically isolated close to the root node of the constructed tree structure, while normal points are more likely to be isolated at the deeper end of the tree, which, in turn, leads to a smaller depth for anomalies in the isolation tree. Consequently, if points in an Isolation Forest made up of multiple isolation trees have short path lengths, they are considered abnormal.
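The per-cluster outlier screening can be sketched with scikit-learn's IsolationForest; the contamination setting and the injected readings are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def robust_cluster_mean(points: np.ndarray,
                        contamination: float = 0.05) -> float:
    """Mean weight of one GMM component after Isolation Forest screening.

    fit_predict returns 1 for inliers and -1 for outliers; only inlier
    weights (column 1) enter the average.
    """
    labels = IsolationForest(contamination=contamination,
                             random_state=0).fit_predict(points)
    return float(points[labels == 1][:, 1].mean())

rng = np.random.default_rng(2)
cluster = np.column_stack([rng.uniform(0, 3, 100),
                           rng.normal(800, 10, 100)])
cluster[:3, 1] = [1500, 1480, 1520]   # inject spurious two-bird readings
print(round(robust_cluster_mean(cluster)))
```

Without the screening step, the three injected readings would pull the cluster mean upward by roughly 20 g; after isolation they are excluded and the mean stays near the true single-bird value.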

Average-and OPTICS-Based Multiple Clustering
The individual Gaussian components, excluding outliers, were averaged to convert the mixed data into discrete feature points. These representative values included both true and noisy values. The density-based soft clustering algorithm Ordering Points To Identify the Clustering Structure (OPTICS) [23] was then used to re-cluster the above discrete values into clusters of varying densities. The average weight of a single chicken within the current time window was determined by selecting the value with the highest weight from the cluster with the most objects. This decision was based on our assumption that clusters with the most objects meeting the required density are more likely to contain accurate values: the actions that reliably trigger the sensors during data collection produce the most readings, whereas temporary, random triggers produce fewer. Valid and accurate data typically arise when the sensor is in a stable condition (e.g., triggered continuously). OPTICS is a density-based clustering algorithm that can identify clusters of any shape and detect anomalies in the data. It is not sensitive to its initial parameters and can find noise points effectively. Additionally, OPTICS provides results as an ordered sequence of points covering all possible classification cases, making it versatile for various data scenarios. Multiple iterations are possible depending on the size of the data.
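A compact sketch of this second-stage OPTICS screening over the weighted representative points (all values, min_samples, and the eps cut are illustrative assumptions; DBSCAN-style label extraction is used here for determinism):

```python
import numpy as np
from sklearn.cluster import OPTICS

# Representative (time, weight) points from the first-stage GMM step,
# each carrying a share = its cluster's fraction of all points
# (values below are illustrative, not the paper's data)
reps = np.array([[0.5, 798], [1.2, 801], [2.1, 803], [2.8, 800],
                 [0.9, 1604], [1.7, 40]])
shares = np.array([0.25, 0.30, 0.20, 0.15, 0.07, 0.03])

# Density-based reclustering; sparse representatives become noise (-1)
opt = OPTICS(min_samples=3, cluster_method="dbscan", eps=50).fit(reps)
labels = opt.labels_

# Pick the densest cluster, then the representative with the top share
sizes = {l: (labels == l).sum() for l in set(labels) if l != -1}
best = max(sizes, key=sizes.get)
idx = np.flatnonzero(labels == best)
estimate = reps[idx[np.argmax(shares[idx])], 1]
print(estimate)  # estimated average single-bird weight for this window
```

The two sparse representatives (a double-bird reading and a wing touch) fall outside the dense cluster and are discarded; the surviving cluster's highest-share value becomes the window's weight estimate.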
We determined average broiler body weights using algorithmic model analysis based on dual clustering combining GMM and OPTICS.Farmers can monitor body weights in real time based on these data, which can be used for predictive modeling as they accumulate.

Adaptive Forecasting Combining Multinomial Regression with Gray Models
Once enough real-time weight measurements have been collected, a new model can be created to forecast weight at a future point in time. The hourly body weight data obtained during the monitoring stage are sampled at a fixed frequency to produce daily data. The application of polynomial functions for modeling biological growth curves has been a longstanding area of research; however, this method typically requires additional data to attain a high level of precision. To fill the prediction gaps during data accumulation, we employed a gray model, which demands fewer data and provides good accuracy.

Multinomial-Regression-Based Medium-and Long-Term Forecasting
Polynomial regression, a subset of multiple linear regression, estimates the relationship as an nth-order polynomial [24]. To determine the best function match for the data, the widely used least squares method (LSM) [25], a mathematical optimization technique, minimizes the sum of squared errors (also known as residuals). That is, a function is selected from the set of nth-order polynomial functions with strong predictive ability for both known and unknown data. Fitting a mathematical function is frequently used to model animals' growth characteristics. Based on prior research, the Gompertz function was chosen to fit the accumulated data for up-to-date weight estimates as of the anticipated delivery date. Equation (4) below gives the function,

f(t) = a e^{-b e^{-ct}}, (4)

where a signifies the maximum limit, b signifies the displacement along the x-axis (shifting the graph to the left or right), c signifies the growth rate (scaling along the y-axis), and e is Euler's number (e = 2.71828...). However, as chickens' growth is influenced by environmental conditions, health status, and farmers' feeding strategies, the initially fitted curve may not accurately predict the average weight of future broilers. Therefore, to enhance the computational accuracy, this study incorporated a model adjustment mechanism based on cumulative symmetric mean absolute percentage error (SMAPE) feedback. SMAPE is a symmetry-based measure of the percentage error between predicted and actual values that is particularly useful for evaluating predictive model performance in regression settings. SMAPE is assessed according to Equation (5):

\mathrm{SMAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|A_t - P_t|}{(|A_t| + |P_t|)/2}, (5)

where A_t denotes the measured value and P_t denotes the predicted value. The absolute deviation between A_t and P_t is divided by half the sum of the absolute values of A_t and P_t; the resulting value for each fitted point t is then averaged over the number of fitted points n.
The chicken growth cycle usually spans 28 to 32 days, consisting of a starter phase, a growing phase, and a terminal phase. The growth function cannot be fitted until the chickens have completed the growing period, due to inadequate data at the beginning. This study began by fitting a mathematical curve to the weight monitoring data from day 1 to day 14, which covered the starter and grower periods. After the initial model fitting, the deviation between the value projected by the mathematical curve and the value measured by weight monitoring was computed using SMAPE for each period. Whenever the cumulative SMAPE of the current growth function surpasses a specific threshold, the program adaptively refits a new growth function.
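The Gompertz fit with SMAPE feedback can be sketched as follows (synthetic data; the 0.05 threshold and all parameter values are assumptions, as the paper does not list them):

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, a, b, c):
    """Gompertz growth: a = upper limit, b = x-shift, c = growth rate."""
    return a * np.exp(-b * np.exp(-c * t))

def smape(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(actual - predicted)
                   / ((np.abs(actual) + np.abs(predicted)) / 2))

# Synthetic day 1-14 weights drawn from an assumed growth curve
days = np.arange(1, 15)
weights = gompertz(days, 2800, 4.2, 0.08) \
    + np.random.default_rng(3).normal(0, 2, 14)

# Initial least-squares fit over the starter + grower period
params, _ = curve_fit(gompertz, days, weights, p0=(3000, 4, 0.1),
                      maxfev=10000)

# Feedback: refit from all data so far once cumulative SMAPE crosses
# the threshold (0.05 is an assumed value)
if smape(weights, gompertz(days, *params)) > 0.05:
    params, _ = curve_fit(gompertz, days, weights, p0=params, maxfev=10000)

print(round(gompertz(15, *params)))  # one-day-ahead forecast
```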

Gray-Model-Based Short-Term Forecasting
To compensate for the late starting point of the polynomial fitting prediction model (generally, two weeks of data are required to achieve high accuracy), and because the time series was only dedicated to data fitting rather than law discovery, a gray model with more flexible application requirements was introduced (generally, only four data points are sufficient). Gray models (GMs) are fuzzy long-term descriptions of the development patterns of things, created by building gray differential prediction models with limited information [26]. They are generally used to determine the degree of dissimilarity among the development trends of system factors, i.e., correlation analysis, and to apply gray generation to the original data to find the pattern of system changes and generate a data series with strong regularity; a corresponding differential equation model is then established to predict future development trends. This study's prediction method is the GM(1,1) model, a type of gray prediction model [27], where G denotes gray, M denotes model, the first 1 in parentheses denotes that the differential equation is first-order, and the second 1 denotes that it has only one variable. Because each fit uses the four newest data points from its neighborhood, there is no feedback adjustment.
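The GM(1,1) forecast can be sketched as a textbook implementation (not the paper's exact code): accumulate the series, fit the first-order gray differential equation by least squares, and restore a one-step forecast by differencing.

```python
import numpy as np

def gm11_next(x0: np.ndarray) -> float:
    """One-step-ahead GM(1,1) forecast from a short series (>= 4 points)."""
    n = len(x0)
    x1 = np.cumsum(x0)                        # accumulated generating sequence
    z1 = 0.5 * (x1[1:] + x1[:-1])             # mean generating sequence
    # Least-squares estimate of x0(k) = -a * z1(k) + b
    B = np.column_stack([-z1, np.ones(n - 1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    # Time response function of the whitened differential equation
    x1_hat = lambda k: (x0[0] - b / a) * np.exp(-a * k) + b / a
    return x1_hat(n) - x1_hat(n - 1)          # restore by first difference

# Illustrative four-day window of estimated daily weights (grams)
window = np.array([520.0, 590.0, 668.0, 757.0])
print(round(gm11_next(window)))
```

Because the window grows roughly geometrically (about +13% per day), the gray model extrapolates an exponential trend and forecasts the next day in the mid-800 g range.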
Combining the above methods, we achieved automatic real-time monitoring and prediction of the average body weight of a single chicken.

Results of Data Pre-Processing
Typically, a CSV file from the server stores data from one farm during one breeding cycle, with a size of about 50 MB and 70 columns. The first 10 columns contain time (in minutes), ammonia concentration, and so on, while the last 60 columns are broiler weight data, one reading per second, with the number of rows varying according to the time duration. The example data have 133,346 rows. The time columns and body weight data were filtered with Pandas and then grouped by electronic scales 1, 2, and 3; finally, the timestamps in minutes were matched to each body weight value in seconds, as shown in Figure 5 below. The weight data extraction code, the timestamp matching code, and the data analysis code were combined to create an end-to-end automatic computational model.
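A hedged sketch of this extraction step with pandas; the column names and layout below are assumptions, not the paper's exact schema:

```python
import pandas as pd

# Toy frame: one minute row plus scale 1's 60 per-second readings
# (columns s1_00 .. s1_59; names are assumed for illustration)
raw = pd.DataFrame({"minute": [0],
                    **{f"s1_{i:02d}": [800 + i] for i in range(60)}})

# Reshape to one reading per row and derive a second-level timestamp
long = raw.melt(id_vars="minute", var_name="slot", value_name="weight")
long["second"] = long["minute"] * 60 + long["slot"].str.slice(3).astype(int)
long = long[["second", "weight"]].sort_values("second").reset_index(drop=True)
print(len(long), long["weight"].iloc[0])
```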


Analysis of Data Preprocessing
From previous research, we know that electronic scales collect a large number of zeros when no broiler walks onto the scale or when there is no contact at all, and when the scale is touched, various other noise values are generated. Thus, after removing the noise values, there is a large number of missing values, almost one-fifth of the data, as shown in Figure 7 below. Since the proportion of missing values is usually large and gaps destroy the real-time performance, they have to be filled in. We used MissForest, which offers excellent performance, for multiple imputation of the table rich in missing values, and we used boxplots to observe five statistics (the minimum, first quartile (Q1), median, third quartile (Q3), and maximum) of all data before and after imputation, finding that the overall distribution of the data did not change significantly, as shown in Figure 8. This demonstrated both the need for imputation and the effectiveness of the algorithm.

Results of the Calculation of the Average Live Weight of Individual Broilers
It is difficult to weigh each chicken during a feeding cycle, especially when almost every farm has thousands of chickens. The traditional method is to weigh a random sample of chickens in different areas of the farm each week to represent the broiler growth on the sampled farm on that day. However, such a small-sample measurement is usually biased and inaccurate. In any case, the truck collects the entire batch of broilers at once on the day of delivery (usually between days 28 and 32), after which the gross weight of all of the broilers is calculated. In this study, a checkpoint was set every five days to validate the proposed framework, with days 5, 10, 15, 20, 25, and 30 serving as reference days to calculate differences and compare results, as shown in Figures 9 and 10. However, due to the high cost of such measurements, only two datasets of this kind were available, in addition to 111 datasets containing only the delivery-day measurement data, which were used for validation as shown in Figure 11. Although all of the broilers were measured at one time in this case, the reference value of the delivery-date data was limited, due to human interference during transportation. In particular, in order to avoid fines, staff usually remove unqualified broilers at will. Nevertheless, the validation results of this method on the 111 datasets showed that 67.57% of the datasets had calculation errors below 0.2, of which 40 farms had SMAPE values below 0.1 and 7 farms had SMAPE values below 0.03. This demonstrates the practical feasibility of the method.

Analysis of the Calculation of the Average Live Weight of Individual Broilers
As our experimental data included both time and weight values, we used a two-dimensional Gaussian mixture model to perform a first round of clustering on the data after the above filling. Clearly, it makes sense to consider the time values of these rapidly growing and changing broilers. The number of components k, a hyperparameter of the Gaussian mixture model, was determined by calculating the Bayesian information criterion (BIC); here, the BIC values were calculated exhaustively for the first 20 components under four covariance types (diagonal, spherical, tied, and full), as shown in Figure 12 below. Observing the decreasing trend of the BIC and the final calculation results, we found it reasonable to set the k value to 20, which not only reduces the computational overhead of the system but also ensures high accuracy. The cluttered data were thus decomposed into twenty or fewer components, as shown in Figure 13 below.
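The exhaustive BIC scan for selecting k can be sketched as follows (synthetic data; the paper scans the first 20 components, while the range here is shortened for brevity):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Illustrative (time, weight) block with a few latent components
X = np.concatenate([
    np.column_stack([rng.uniform(0, 3, 150), rng.normal(m, 12, 150)])
    for m in (780, 820, 1600)
])

# Exhaustive BIC over component counts and covariance types; lower
# BIC is better, trading fit quality against model complexity
best = None
for cov in ("diag", "spherical", "tied", "full"):
    for k in range(1, 8):                  # the paper scans up to 20
        gmm = GaussianMixture(n_components=k, covariance_type=cov,
                              random_state=0).fit(X)
        bic = gmm.bic(X)
        if best is None or bic < best[0]:
            best = (bic, cov, k)
print(best[1], best[2])
```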

Analysis of the Calculation of the Average Live Weight of Individual Broilers
As our experimental data included both time and weight values, we used a two-dimensional Gaussian mixture model to perform a first-round clustering on the data after the above filling.Obviously, it makes sense to discuss the time values of these rapidly growing and changing broilers.The number of hyperparametric components k of the Gaussian mixture model was determined by calculating the Bayesian information criterion (BIC) (here, the BIC values were calculated exhaustively for the first 20 components   A powerful Isolation Forest was introduced to make the data of the components more centralized.Outliers, in general, were fewer than regular observations and differ in terms of values (they were far from regular observations in the feature space), as shown in Figure 14 below.The average of each filtered component was then calculated as the representative value of the overall data, as shown by the white five-pointed stars in Figure 13 above.However, these representative values were not equalized.Instead, weights were calculated for each value based on the ratio of the total number of values in its corresponding cluster to the total number of values in all clusters.A powerful Isolation Forest was introduced to make the data of the components mo centralized.Outliers, in general, were fewer than regular observations and differ in term of values (they were far from regular observations in the feature space), as shown in Figu 14 below.The average of each filtered component was then calculated as the represent tive value of the overall data, as shown by the white five-pointed stars in Figure 13 abov However, these representative values were not equalized.Instead, weights were calc lated for each value based on the ratio of the total number of values in its correspondin cluster to the total number of values in all clusters.A powerful Isolation Forest was introduced to make the data of the components more centralized.Outliers, in general, were fewer than regular observations 
and differ in terms of values (they were far from regular observations in the feature space), as shown in Figure 14 below.The average of each filtered component was then calculated as the representative value of the overall data, as shown by the white five-pointed stars in Figure 13 above.However, these representative values were not equalized.Instead, weights were calculated for each value based on the ratio of the total number of values in its corresponding cluster to the total number of values in all clusters.To screen the target values, we performed quadratic clustering of the above feature values based on the effective density (e.g., the electronic scale was restored from the perturbed state to the stationary state to obtain the effective measurement data).The range of feature data was significantly reduced after two iterations of the OPTICS process.Then, as shown in Figure 15 below, the datum with the highest weight among the feature values was selected as the target weight value.To screen the target values, we performed quadratic clustering of the above feature values based on the effective density (e.g., the electronic scale was restored from the perturbed state to the stationary state to obtain the effective measurement data).The range of feature data was significantly reduced after two iterations of the OPTICS process.Then, as shown in Figure 15 below, the datum with the highest weight among the feature values was selected as the target weight value.To screen the target values, we performed quadratic clustering of the above feature values based on the effective density (e.g., the electronic scale was restored from the perturbed state to the stationary state to obtain the effective measurement data).The range of feature data was significantly reduced after two iterations of the OPTICS process.Then, as shown in Figure 15 below, the datum with the highest weight among the feature values was selected as the target weight value.

Results of the Prediction of the Average Live Weight of Individual Broilers
We labeled both the medium- and long-term prediction results using multinomial regression and the short-term prediction results based on the gray model GM(1,1) in real time on the same image, as shown in Figure 16.
Agriculture 2023, 13, x FOR PEER REVIEW


Analysis of the Prediction of the Average Live Weight of Individual Broilers
In the polynomial fit, predicted values for day 15 were first generated by fitting the calculated weights for the first 14 days of age; each predicted value was then compared with the actual calculated value for the next day, and the SMAPE value was computed, and so on until the day of delivery. As shown in Figure 17 below, if the cumulative SMAPE value exceeded the set threshold, the model was refitted using all data prior to the current day of age.

In the GM (1, 1) model, each predicted value is the result of a prediction modeled on the weight values of the previous 4 days of age. In addition, each prediction is labeled with a confidence level for the breeder's reference, as shown in Figure 18 below.
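A minimal GM (1, 1) one-step-ahead forecast over a 4-day window can be sketched as below; the daily weight series is toy data, not from the paper:

```python
import numpy as np

def gm11_next(x0):
    """Predict the next value of series x0 with the gray model GM(1,1)."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                    # accumulated generating series
    z1 = 0.5 * (x1[1:] + x1[:-1])         # background (mean) values
    B = np.column_stack([-z1, np.ones(len(z1))])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]  # develop./gray coeffs
    n = len(x0)
    # Time-response function of the accumulated series, then inverse
    # accumulation to restore the original-scale prediction for step n.
    xhat = lambda k: (x0[0] - b / a) * np.exp(-a * k) + b / a
    return float(xhat(n) - xhat(n - 1))

daily = [980.0, 1055.0, 1130.0, 1210.0, 1290.0]   # toy daily weights (g)
# Each forecast uses only the previous 4 days, as described above.
pred = gm11_next(daily[-5:-1])
print(f"predicted next-day weight ~ {pred:.0f} g")
```

Because GM (1, 1) needs only a handful of recent points, it suits short-term forecasting on sparse daily representatives, complementing the longer-horizon polynomial fit.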

Conclusions
Although broiler rearing has become highly scientific and standardized, broiler weight management has long depended largely on farmers' experience and expertise. In this study, machine learning algorithms were used to develop cost-effective solutions that improve the accuracy of broiler weight monitoring and prediction based on a simple electronic scale, supporting intelligent production decisions and thereby saving labor and resources, improving production, and increasing income.
The innovation of this study is the use of machine learning algorithms based on Gaussian mixture models, Isolation Forest, and OPTICS, which simulate the natural real-life process of weighing with electronic scales and obtaining reliable readings through a two-stage clustering and noise reduction process. In addition, a polynomial fitting model and a gray model were combined to achieve real-time weight prediction over long periods of time. Instead of using video cameras and image processing algorithms, we pioneered a new low-cost scheme for calculating and predicting the average body weight of an individual broiler, using only a simple electronic scale and data analysis algorithms, with program code that can be run on an ordinary computer and graphs that can be understood by the average person.
Future research could incorporate more factors that can be collected on modern broiler farms to efficiently estimate broiler weights in real time and derive broiler weight distributions, informing decisions related to broiler feeding, health improvement, harvesting, and chicken meat production planning for overall resource management and income optimization. In addition, this system could also be applied to the weight estimation of other farm animals.

… for monitoring and prediction based on the weight dataset (in terms of farms and feeding cycles) from the cloud database.

Figure 4 .
Figure 4. Principle of critical-value noise reduction.

Figure 3 .
Figure 3. Laboratory space, equipment, and materials: (A) Scheme of the length, width, and height of the experimental farm and location of the three electronic scales; (B) Cobb 500 broiler; (C) KOKO-FARM electronic scale; (D) farm environment during feeding operations.


Figure 5 .
Figure 5. Sensor data vs. raw data: (A) Sensor data CSV file downloaded from the server containing data from 5 additional sensors and weight sensor data; (B) raw data used as preprocessing inputs. The raw data consisting of timestamps (in seconds) and weight values (in grams) after the preprocessing are shown in Figure 6 below.


Figure 6 .
Figure 6. Raw data after preprocessing: (A) Noise reduction results based on critical values; (B) using MissForest to fill all null values.


Figure 7 .
Figure 7. Visualization of the missing values of the three scales' data: j01 represents the first scale's data, with a missing data volume of 602,452 lines (seconds); j02 represents the second scale, and j03 represents the third; the full data volume is 2,667,360 lines.


Figure 8 .
Figure 8. Visualization of raw data, data based on critical-value noise reduction, and populated data: The 1st denoised data represent the critical-value noise reduction operation.

Figure 9 .
Figure 9. Comparison of the measured values (1) at 5-day intervals with the results calculated by the algorithm.


Figure 10 .
Figure 10. Comparison of the measured values (2) at 5-day intervals with the results calculated by the algorithm.


Figure 11 .
Figure 11. Comparison of the delivery-day measurements of the other 111 datasets with the results calculated by the algorithm.

Figure 12 .
Figure 12. BIC values for different variance types; * marks the position of the minimum value.

Figure 13 .
Figure 13. Two-dimensional Gaussian mixture model clustering results. Here, the number of components corresponding to the BIC minimum was set to 19. The different colors represent different clusters, and the white pentagrams are the means for each cluster.


Figure 14 .
Figure 14. Outliers filtered by Isolation Forest; the purple Xs are outliers.

Figure 15 .
Figure 15. Secondary clustering based on OPTICS. The boxes indicate the first iteration, the X indicates the second iteration, and the red pentagram indicates the final weight.


Figure 16 .
Figure 16. Monitoring and prediction results. Here, the yellow line is the individual broiler body weight value calculated by the body weight monitoring model, the purple pentagram (with the purple text) is the representative body weight value for each day, the green pentagram is the true mean body weight value measured manually on the day of slaughter, the blue line is the polynomial regression prediction value, and the red line is the GM (1, 1) regression prediction value.


Figure 17 .
Figure 17. Body weight predicted by a polynomial regression model with a feedback regulation mechanism.


Figure 18 .
Figure 18. Short-term forecasting based on the GM (1, 1) model.

Table 1 .
Summary of previous studies' methodology, scale, findings, and conclusions.