Real-Time Monitoring of Animals and Environment in Broiler Precision Farming—How Robust Is the Data Quality?

Increasing digitalization in animal farming, commonly addressed as Precision Livestock Farming (PLF), offers benefits in terms of productivity, sustainability, reduced labor and improved monitoring of animal welfare. However, the large amounts of collected data must be stored, processed and evaluated in a proper way. In practice, challenges of continuous and exact data collection can arise, e.g., from air pollutants like dust occluding cameras and sensors, degrading material, the ever-present commotion caused by animals, workers and machines, regularly required maintenance or weak signal transmission. In this study, we analyzed the quality of multi-source spatio-temporal data from a broiler house with 8100 birds over a period of 31 months collected by the Farmer Assistant System (FAS). This is a ceiling-suspended robot equipped with several sensors and cameras that continuously collect data while moving through the barn. The data analysis revealed numerous irregularities: missing values, outliers, repetitive measurements, systematic errors, and temporal and spatial inconsistencies. About 40–50% of all records collected with the early version of the FAS had to be sorted out. The newer version of FAS provided cleaner data, although still about 10–20% of the data had to be removed. Our study has shown that where sophisticated technological systems meet a challenging environment, a thorough and critical review of data completeness and quality is crucial to avoid misinterpretation. The pipeline developed here is designed to help developers and farmers detect failures in signal processing and localize the problem in the hardware components. Scientists, industrial developers and farmers should work more closely together to develop new PLF technologies to more easily advance digitization in agriculture.


Introduction
The global demand for food is steadily increasing along with the growing population in a world of limited natural supplies. For this reason, sustainable livestock production goes hand in hand with an animal species' efficiency in making use of the planet's resources, for which the broiler chicken is a suitable candidate due to its favorable feed conversion ratio. However, sustainable livestock production further has to meet economic interests and consumer expectations while minimizing the environmental impact [1]. A good balance must be found to satisfy various prospects, since, for example, high productivity may come at the expense of broiler welfare [2]. A reasonable response is to employ and improve on computer-aided technologies.
Precision Livestock Farming (PLF) is a growing sector in animal farming and the production of food derived from animals. It comprises a variety of digital technologies that can record and store large amounts of data from animals and their environment. The high potential of digitalization in livestock production (digital applications) generates enthusiasm and promises benefits such as increased productivity and sustainability, reduced costs and working time, relief in workload, easier surveillance and improvement of standards in animal welfare [3,4]. At the same time, critical voices have been raised, ranging from social, cultural, ethical and environmental to political concerns [5]. However, what is often overlooked in these debates are the various technical hurdles in an agricultural environment that need to be overcome to bring innovations to realization in the first place. Researchers are often not familiar with the daily challenges that farmers face [6].
Digitalization in agriculture is also directly tied to the collection of large amounts of data. The handling of 'big data' can be loosely characterized by the so-called three Vs: volume, variety and velocity. These refer to the ever-increasing size of data stored, the complexity of the acquired data types and the speed of data generation, transfer or processing. Additional Vs of relevance have been proposed, such as veracity, value and visualization [7–9]. PLF is one such field where the handling of big data becomes increasingly relevant. Broiler chickens are kept in large flocks in an environment that farmers cannot possibly observe themselves at all times. Automation is already widespread in modern broiler houses, for example, to control the climate in terms of temperature and humidity, to regulate the intensity and duration of light exposure or to ensure food and water supply with the help of stationary sensors [10]. Furthermore, the support of advanced PLF systems allows automated, continuous and more objective monitoring and assessment of animal welfare in a non-intrusive fashion throughout entire barns [10,11]. Both Fernandez et al. and Yang et al. used stationary ceiling-mounted cameras to investigate parameters such as broiler activity and density, the latter including additional experimental setups, an elevated platform and a robotic vehicle [11,12].
Here, we present another such system, the Farmer Assistant System (FAS), a moving robot equipped with multiple sensors and cameras that dynamically and continuously monitor the entire environment of the broiler house. The FAS presents data in (nearly) real-time via mobile alerts or through interaction with a cloud platform, providing an intuitive user interface. Because the FAS provides an overview of the entire farm at all times, it is of great value for farmers and vets, allowing fast retrospective action, such as the identification of weak points in the farm design and the establishment of preventive measures for the future to improve animal welfare (Faromatics, Vilanova i la Geltrú, Spain, User Manual, v. 2.2.0).
However, with these developments, the challenges in handling big data are becoming ever more imposing. Different types of data are collected, which differ in structure, size and complexity. The data of interest need to be acquired and transmitted consistently and reliably. Furthermore, the data must be easily accessible to the end-user. While the end-user has insight into some basic data analyses and graphs provided by the manufacturer of the FAS, farmers must interpret the results themselves and may want to use the data for their own purposes. To make appropriate use of the data and draw reasonable conclusions, it is essential to work with clean data, i.e., data free of inconsistencies that may impede data analysis. Yet, the aforementioned potential of digitalization in agriculture leads many to strive for fast implementation of new systems while neglecting data quality and thus data usability [13,14].
We therefore present an approach for processing the spatio-temporal multi-source data that the FAS collects to describe air quality (temperature, relative humidity, airspeed, ammonia and carbon dioxide) and for cleaning these data with respect to missing values, outliers, repetitive measurements, systematic errors and spatial and temporal inconsistencies. In doing so, we identify and point out typical issues that may occur when employing new technologies in an agricultural setup.

Animal Housing
All data were collected at the farm for education and research of the University of Veterinary Medicine Hannover (Foundation), Ruthe, Germany, over a period of 31 months (June 2020-December 2022). The FAS was installed in a broiler house that holds up to 8100 broiler chickens per fattening period on a floor space of approximately 480 m². In total, 18 fattening periods were completed during that time, with the first 13 fattening periods lasting 33-34 days. The remaining five fattening periods comprised 40-43 days due to a change in broiler line from Ross 308 (fattening period no. 1-13) to Ross Ranger (fattening period no. 14-15) to Hubbard 757 (fattening period no. 16-18). The birds were kept according to the Animal Welfare Livestock Farming Ordinance (TierSchNutztV) Section 4. The regulation was last amended on June 30, specifying the permissible values for temperature, relative humidity, ammonia and carbon dioxide concentrations, light and animal stocking density. Drinking water and food were provided ad libitum, whereas the type of feed was adjusted progressively on the basis of four different developmental stages. Air temperature was controlled by stationary sensors inside the barn and adjusted downward over time from approximately 34 °C to 23 °C (Ross 308) or 34 °C to 20 °C (Ross Ranger and Hubbard 757). The time window for light exposure spanned 24 h on the first day, with 20 and 18 h on the following two days. Thereafter, the duration of light exposure was held constant at 16 h in alternation with a dark period of 8 h [15,16].

Farmer Assistant System
The FAS operates as a ceiling-suspended robot, attached to and moving along an enclosed rail line (Figure 1). It moves at a speed of approximately 540 m/h, while continuous recording takes place at a level of approximately 60-70 cm above the floor. The total length of the route amounts to approximately 182 m across the barn. As a rule of thumb, the FAS requires about 1 h of charging for 1 h of recording, with data being transmitted every 15 min. The FAS records several parameters such as thermal distribution and air quality, with the latter described by airspeed, humidity, NH3 and CO2 content. While the FAS is equipped with additional sensors to capture light and sound distribution as well as camera systems that provide an automatic evaluation of images, these were not included in this study due to the additional complexity of the data structure. The behavior of the broiler chickens remained unaffected by the FAS [15,16].
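As a rough plausibility check, the nominal duration of one round can be derived from the route length and travel speed stated above. The calculation below is only a sketch: it assumes a constant speed without stops, and the variable names are illustrative.

```python
# Nominal round duration from the approximate figures given in the text.
# Assumption: constant speed along the whole route, no stops or charging.
ROUTE_LENGTH_M = 182.0   # approximate length of the rail route
SPEED_M_PER_H = 540.0    # approximate travel speed of the FAS

round_minutes = ROUTE_LENGTH_M / SPEED_M_PER_H * 60.0
print(f"Nominal round duration: {round_minutes:.1f} min")
```

Under these assumptions, one round takes roughly 20 min, so noticeably shorter or much longer rounds point to interruptions or recording faults.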

Dataset Structure and Analysis
The retrieval and structure of the raw data are illustrated in Figure 2. The FAS continuously collects and stores data while moving through the barn (Figure 2A). After completing a round, the records are appended to an existing data table (Figure 2B). The raw data for the parameters collected (temperature, ammonia, CO2, humidity and airspeed) are stored as separate files (Figure 2C). Each record includes the round number, date and time, x/y/z position (coordinates within the barn), as well as the measured value of the respective parameter. The last record at the end of each round, when the FAS enters the battery charging position (hereafter named 'marker position'), is denoted by characteristic entries and indicates the start of a new round. An example of the basic raw data structure is presented in Supplemental Table S1. All data from one fattening period together make up one dataset (DS). At the end of a fattening period, a complete dataset containing all data tables can be downloaded.
To make use of the raw data provided, several processing steps are necessary. The workflow is explained in more detail below:
(1) The column with the round numbers is supposed to help narrow the focus to a particular round. Each round was scanned for inconsistencies in round number assignment. For that, each data table was split into rounds based on the records that serve as 'marker positions'. Within each split, the records whose round number differed from the most prevalent round number were reassigned to the most prevalent round number. Furthermore, it was ensured that each split contained a unique round number. The number of updated records was counted for each file.
(2) Next, duplicate records (identical timestamp, x/y/z positions and measurement values) were identified and removed.
(3) Measurements for different parameters (NH3, CO2, airspeed, humidity and temperature) were stored in separate files. To assess the degree of overlap between files corresponding to the same dataset, all files were merged based on identical timestamps and x/y/z positions.
(4) Then, the 'marker positions' were removed, as these contain non-informative values that solely indicate the start of a new round.
(5) Thereafter, each round was examined for outliers in x/y positions that did not fit into the order of the FAS route. For this purpose, the x/y positions of each record were compared to the previous and the next record. Records with a distance of more than five units to both the previous and the next record were deemed likely technical errors and removed.
(6) Afterwards, each round was searched for repeated measurements, i.e., the repeated occurrence of the same x/y positions within one round. The values of all such measurements were averaged.
(7) Subsequently, the validity of the sensor data was examined by calculating the standard deviation over all measurements by parameter per round. All rounds with zero standard deviation (i.e., an invariant value across all records) were assumed to be technical errors and removed.
(8) Then, the time difference from the first record to the last record was calculated for each round. An acceptable time window was defined based on the results (20-30 min), and records from rounds outside of that time window were removed.
(9) Finally, the number of records per round was counted and, once again based on the results, an acceptable threshold was defined (≥150 records), with records from rounds below this threshold removed.
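Step (1) of the workflow can be illustrated with a short sketch. It is written in Python rather than the R code used in the study, and the record layout is hypothetical: the keys `round` and `is_marker` are assumptions, since the characteristic entries that identify a 'marker position' are not specified here.

```python
from collections import Counter

def reassign_round_numbers(records):
    """Split records at marker rows and give each split its most prevalent round number.

    `records` is a list of dicts with the hypothetical keys 'round' and 'is_marker'.
    Returns (corrected_records, number_of_updated_records).
    """
    splits, current = [], []
    for rec in records:
        current.append(rec)
        if rec["is_marker"]:      # a marker row closes the current round
            splits.append(current)
            current = []
    if current:                    # trailing records without a closing marker
        splits.append(current)

    corrected, updated = [], 0
    for split in splits:
        # Most prevalent round number within this split (majority vote).
        majority = Counter(r["round"] for r in split).most_common(1)[0][0]
        for rec in split:
            if rec["round"] != majority:
                updated += 1
            corrected.append({**rec, "round": majority})
    return corrected, updated
```

The sketch covers only the majority-vote reassignment; the additional check that no two splits share the same round number would have to be added separately.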
For each dataset, namely DS1-DS18, numbered in chronological order, the total number of unique timestamps among all files (NH3, CO2, airspeed, humidity and temperature) was counted. The datasets contained 48,617 (DS7) to 131,527 (DS17) records and spanned 32 to 65 days (Table 1). Notably, records were sometimes obtained outside of a fattening period. The percentage of data removed by each processing step was calculated in relation to the total number of unique timestamps for each dataset.
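The bookkeeping behind these percentages can be sketched as follows. The function and the step names are illustrative assumptions, and the numbers in the usage example are invented for demonstration, not taken from the study.

```python
def percent_removed(total_unique, remaining_after_step):
    """Per-step and cumulative removal percentages, relative to the initial total.

    `remaining_after_step` maps a step name to the number of records left
    after that step, in processing order (hypothetical structure).
    """
    per_step, previous = {}, total_unique
    for step, remaining in remaining_after_step.items():
        per_step[step] = 100.0 * (previous - remaining) / total_unique
        previous = remaining
    cumulative = 100.0 * (total_unique - previous) / total_unique
    return per_step, cumulative

# Invented example: 1000 unique timestamps, 900 left after merging,
# 850 left after the duration filter.
steps, total = percent_removed(1000, {"merge": 900, "duration": 850})
print(steps, total)
```

Expressing every step relative to the same initial total keeps the per-step percentages additive, so they sum to the cumulative share of removed data.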

Maintenance on the FAS
During scheduled maintenance performed by the manufacturer, the FAS was replaced with a new machine in calendar week 42 of 2021 (during data collection for DS12, Table 1). The farm personnel carried out minor modifications and resolved system failures, which occurred at irregular intervals, under the guidance of customer support via remote servicing.

Descriptive Statistics
Analyses were performed in the programming environment 'R' [17]. To facilitate the processing of large data tables, the 'data.table' package was used [18]. Visualization of data and their distributions was carried out using the 'ggplot2' package [19]. Comparisons between means of fattening periods (DS) were carried out using the procedure GLM of SAS, version 9.4 (Statistical Analysis System, Cary, NC, USA, 2023).

Round Number Assignment and Data Merging
Across all datasets, round numbers had to be corrected for between 0% (DS3) and 63.6% (DS6) of all records; in most datasets, fewer than 10% of records required a new round number. The number of updated records often differed between files from the same dataset, most notably in DS8-DS11. In more recent datasets, DS13-DS18, the number of updated records was generally low (<4%) and more consistent among different files from the same dataset (Figure 3). Merging the respective files, including the removal of duplicate records, resulted in a data loss of up to 31.2% (DS4). For DS12-DS18, a gradual decrease in data loss due to merging was observed, from 10.1% (DS12) to 3.6% (DS18).
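Merging on identical timestamps and x/y/z positions corresponds to an inner join, in which any record missing from at least one file is lost. The sketch below illustrates this under stated assumptions: the tuple layout is hypothetical, and the loss is expressed relative to the union of all keys, which may differ from the exact reference quantity used in the study.

```python
def merge_on_key(tables):
    """Inner-join per-parameter tables on (timestamp, x, y, z).

    `tables` maps a parameter name to a list of
    (timestamp, x, y, z, value) tuples (hypothetical layout).
    Returns (merged_rows, loss_in_percent_of_all_keys).
    """
    # Index each table by its key; exact duplicates collapse to one entry.
    keyed = {name: {row[:4]: row[4] for row in rows} for name, rows in tables.items()}
    key_sets = [set(index) for index in keyed.values()]
    common = set.intersection(*key_sets)   # keys present in every file
    all_keys = set.union(*key_sets)        # keys present in any file
    merged = [
        {"key": key, **{name: keyed[name][key] for name in keyed}}
        for key in sorted(common)
    ]
    loss_pct = 100.0 * (len(all_keys) - len(common)) / len(all_keys)
    return merged, loss_pct
```

A key that is absent from even a single parameter file drops out of the merged table, which is why files of unequal size or with misaligned timestamps translate directly into data loss.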

Erroneous Coordinates and Repeated Measurements
After data merging, the record that served as a 'marker position' was removed from each round, accounting for approximately 0.5% of all records across all datasets. Next, records with incorrect coordinates were removed. These outliers accounted for between 0% and approximately 1.3% of records across all datasets. Thereafter, records with repeated measurements at identical coordinates were averaged. This reduced the number of records by a further 0.9% (DS10) to 12% (DS7).
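The two operations described here, removing position outliers and averaging repeated measurements, can be sketched as below. This is an illustration only: the tuple layout is hypothetical, Euclidean distance is an assumption (the distance measure is not specified), and the first and last record of a round are kept because they have only one neighbour.

```python
import math
from collections import defaultdict

def drop_position_outliers(points, max_dist=5.0):
    """Remove records farther than `max_dist` from BOTH neighbouring records.

    `points` is a list of (x, y, value) tuples in recording order for one round.
    """
    kept = []
    for i, (x, y, value) in enumerate(points):
        if 0 < i < len(points) - 1:
            d_prev = math.dist((x, y), points[i - 1][:2])
            d_next = math.dist((x, y), points[i + 1][:2])
            if d_prev > max_dist and d_next > max_dist:
                continue           # isolated position, treated as a technical error
        kept.append((x, y, value))
    return kept

def average_repeats(points):
    """Average the values recorded at identical x/y positions within one round."""
    groups = defaultdict(list)
    for x, y, value in points:
        groups[(x, y)].append(value)
    return [(x, y, sum(vals) / len(vals)) for (x, y), vals in groups.items()]
```

Requiring a large distance to both neighbours avoids discarding the valid records that happen to sit directly before or after an erroneous one.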

Analysis of Standard Deviation
The distribution of the standard deviation among all records per round was generally comparable between all datasets (Figure 4). The standard deviation of the airspeed measurements was consistently higher in DS9-DS18 in contrast to DS1-DS8, which coincides with an increase in the operating speed of the FAS. Furthermore, some datasets contained entire rounds with a zero standard deviation (i.e., an invariant measurement value) across all parameters: DS1, DS3, DS4, DS5 and DS13 with 16, 11, 5, 109 and 28 rounds affected, respectively. All such rounds were removed and accounted for up to 15.5% (DS5) of the entire dataset.
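Detecting rounds with an invariant signal is straightforward once the measurements are grouped by round and parameter. The sketch below assumes a hypothetical nested dictionary as input and flags a round only if every parameter shows zero standard deviation.

```python
from statistics import pstdev

def invariant_rounds(data):
    """Return the round numbers in which every parameter has zero standard deviation.

    `data` maps round number -> {parameter name: list of values}
    (hypothetical structure).
    """
    flagged = []
    for rnd, params in data.items():
        if params and all(
            len(values) > 1 and pstdev(values) == 0 for values in params.values()
        ):
            flagged.append(rnd)
    return sorted(flagged)
```

Rounds with a single record are deliberately not flagged here, since a standard deviation of zero is trivially true for one value and says nothing about sensor validity.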

Time Window and Field Usage
The majority of all rounds took up to about 27 min in earlier datasets (DS2-DS7) and up to about 22 min in more recent datasets (DS13-DS18). Furthermore, the time window appeared slightly more fine-tuned starting from DS13 (Figure 5). Based on this, all rounds that lasted less than 20 or more than 30 min (a total of 306 and 310 rounds, respectively) were removed. By doing so, 12.4% (DS9) to 0.21% (DS18) of all records were deleted. Notably, some extreme outliers were observed across all datasets, with 18 rounds lasting less than 1 min and 25 rounds lasting more than 1000 min.
Figure 5. The majority of all rounds were within a time window of approximately 21 to 27 min. Rounds that took less than 20 min or more than 30 min were removed from further analyses. For improved visibility, the range of the x-axis was adjusted, hiding data from approx. 7.5% (outliers) of all rounds.
Afterward, the number of recorded fields per round was counted, ranging mostly from about 150 to 200 fields, with newer datasets (DS13-DS18) being narrowed down to about 170 to 185 fields (Figure 6). Accordingly, all rounds with fewer than 150 fields recorded (in total 993 rounds) were removed. This applied to 22% (DS2) to as little as 0% (DS16, DS18) of all records.
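Both criteria, the 20-30 min duration window and the minimum of 150 records, can be expressed as a single check per round. The sketch below combines them for brevity, whereas the study applied them one after the other; the function name and input layout are assumptions.

```python
from datetime import datetime, timedelta

def keep_round(timestamps, min_minutes=20, max_minutes=30, min_records=150):
    """Apply the duration window and the minimum record count to one round.

    `timestamps` is a list of datetime objects belonging to a single round.
    """
    duration = max(timestamps) - min(timestamps)
    in_window = timedelta(minutes=min_minutes) <= duration <= timedelta(minutes=max_minutes)
    return in_window and len(timestamps) >= min_records

# Invented example: 170 records spaced 8 s apart (about 22.5 min in total).
start = datetime(2022, 1, 1, 8, 0)
example = [start + timedelta(seconds=8 * i) for i in range(170)]
print(keep_round(example))
```

Keeping the thresholds as parameters makes it easy to revisit them if the operating speed or route of the robot changes.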

Final Route
For the latest dataset DS18, the field records along the rail line are illustrated (Figure 7). In total, 189 fields were recorded over all rounds. Of these, 175 fields were recorded in at least 90% of all rounds, six fields in at least 50% but less than 90%, and eight fields in less than 50% of all rounds. In this dataset, only one field was recorded in 100% of all rounds. All 175 fields recorded in at least 90% of rounds were also found in DS13-DS18. A summary of the distribution of each measurement parameter is shown in Supplemental Table S2. Earlier datasets (DS1-DS12) transmitted different coordinates and were therefore excluded.
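The classification of fields by how often they were recorded can be sketched as a simple binning step. The input structure and the counts in the usage example are hypothetical and serve only to show the thresholds of 90% and 50%.

```python
def classify_field_usage(rounds_per_field, total_rounds):
    """Bin each field by the share of rounds in which it was recorded.

    `rounds_per_field` maps a field identifier to the number of rounds
    containing that field (hypothetical structure).
    """
    bins = {">=90%": 0, ">=50%": 0, "<50%": 0}
    for count in rounds_per_field.values():
        share = 100.0 * count / total_rounds
        if share >= 90:
            bins[">=90%"] += 1
        elif share >= 50:
            bins[">=50%"] += 1
        else:
            bins["<50%"] += 1
    return bins

# Invented example with four fields and 100 rounds.
print(classify_field_usage({"a": 100, "b": 95, "c": 60, "d": 10}, 100))
```

Fields in the top bin form the set of positions that are recorded reliably enough to be compared across rounds.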

Figure 7. FAS route and field usage for the DS18 dataset. The color of the fields reflects the percentage of rounds in which that field was recorded: ≥90% (black), ≥50% (orange), <50% (red). Non-colored fields were never recorded. Arrows along the black-colored fields indicate a robust FAS route.

Overall Amount of Removed Data
The percentage of data removed by each filtering step over all datasets is summarized in Figure 8. From earlier datasets (DS1-DS11), a maximum of 54.8% (DS7) and a minimum of 39.3% (DS5) of all records were removed. In turn, in more recent datasets (DS12-DS18), the percentage of data being removed ranged from 22.9% (DS12) to 8.7% (DS18).

Discussion
In this study, the quality of large datasets from autonomous digital sensor systems was investigated based on correct annotation, missing values, outliers, repetitive measurements, systematic errors, as well as temporal and spatial irregularities.Cleaning the data is crucial for deducing correct results that lead to useful conclusions and reasonable solutions.Each processing step will be discussed one by one in more detail and possible sources of errors that might have contributed to these are highlighted.
Before further data processing, the round numbers had to be corrected. Some rounds contained single entries that carried a different round number but were otherwise identical. Correcting such records made it easier to identify and remove duplicates that might have caused conflicts in the subsequent data merging step. Furthermore, several entire rounds shared the same round number, especially in DS6 and DS7. Here, round numbers for successive rounds were often not updated (i.e., the start of a new round was not recognized by the machine), so that a large share of records (up to 64%) needed to be corrected. Ignoring such errors would compromise any later round-wise analysis, as records from different rounds would be mixed up. These technical errors were reported to have mostly come from the FAS not recognizing the loading station. In fact, one of the greater challenges in PLF is the installation of reliable and durable technologies [20].
Loss of data due to merging can be explained by files of unequal size as well as unmatched timestamps. Simultaneous acquisition of data by multiple sensors can result in misaligned signals if, for example, the hardware components have slightly different sample rates. There was a discrepancy of up to 12,319 records (DS4) between files from the same dataset. In newer datasets, from DS12 onwards, data loss due to merging was relatively small (3.6–10.1%), implying that data acquisition among different parameters was more consistent and better synchronized. To further decrease the data loss from merging, missing data could be imputed [21]. One could argue that data merging is not necessary, as each parameter can be analyzed separately. However, one promising advantage of PLF over conventional systems is that data is acquired continuously via multiple sensors while covering a larger area at a higher resolution. Combined, this allows for the development of new, more complex models. Investigating relationships among the data may then reveal previously unrecognized patterns. In this way, a more comprehensive picture of the atmosphere in the barn emerges. This could, for instance, be useful for (re-)evaluating animal welfare [9,22,23]. For example, it is known that many variables affect the health of broilers [24]. Clearer characterization of such risk factors may not only help to identify weak points in the barn to improve the housing design [25] but also support decision making in the choice of the broiler line that is better suited to less controllable factors in the environment [26,27]. In this sense, the FAS also has high research value, as insight into the data allows for retrospective deductions and measures for improvement.
The removal of records flagged as 'marker positions' was necessary, as they served no analytical purpose and would not contribute to the actual data analysis. While they made up only a small portion of the total number of records, with a similar distribution across all datasets, irregularities were observed. In some datasets, these 'marker positions' appeared unexpectedly and interrupted a round that was not yet complete. Similarly, seemingly regular records sometimes displayed coordinates that did not fit into the sequence of the FAS route, calling their validity into question. Again, malfunctions in electrical components are often major contributors to such effects. Overall, such errors were small in quantity but may affect the data analysis. Furthermore, as large amounts of data with increasing complexity have to be stored in PLF applications, any reasonable attempt to condense the data is meaningful [28].
As the FAS had no integrated automatic obstacle avoidance at the time of this study, the machine sometimes came to a standstill, leading to repeated measurements at the same position. Other technical issues such as maintenance problems may have contributed as well. To counteract these data artifacts, repeated measurements were averaged, but the time required to finish a round might also have increased significantly in such cases. A drone may be more capable of avoiding obstacles. Yet, in an experiment studying the avoidance distance of broilers towards drones and aerial rail systems under various conditions, Parajuli and colleagues observed that the broilers were less reactive towards the latter, presumably due to more predictable movements [29]. Our own research has also shown that the FAS had no effect on the behavior of the broiler chickens [15,16].
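The removal of 'marker positions' and the averaging of repeated measurements can be illustrated with a minimal sketch. It is not the pipeline used in this study: the record layout (round number, time, x, y, z, value), the function names and the example values are assumptions made for illustration only, and consecutive records at the same field within a round are simply treated as repeats.

```python
from itertools import groupby
from statistics import mean

# Hypothetical record layout: (round_no, time, x, y, z, value)
MARKER_COORD = -1  # assumption: marker rows carry -1 for x, y and z


def drop_markers(records):
    """Remove 'marker position' rows, which carry no measurement."""
    return [r for r in records if not (r[2] == r[3] == r[4] == MARKER_COORD)]


def average_repeats(records):
    """Collapse consecutive records taken at the same field within a round.

    The first time stamp of each run is kept and the values are averaged.
    """
    cleaned = []
    same_field = lambda r: (r[0], r[2], r[3], r[4])
    for (round_no, x, y, z), run in groupby(records, key=same_field):
        run = list(run)
        cleaned.append((round_no, run[0][1], x, y, z, mean(r[5] for r in run)))
    return cleaned


# Invented example values (temperature in °C)
raw = [
    (1, "10:00:00", 0, 0, 2, 21.0),
    (1, "10:00:08", 1, 0, 2, 22.0),
    (1, "10:00:16", 1, 0, 2, 24.0),    # robot stalled: same field again
    (1, "10:00:24", -1, -1, -1, 0.0),  # marker position
    (2, "10:25:00", 0, 0, 2, 21.5),
]
cleaned = average_repeats(drop_markers(raw))
```

Including the round number in the grouping key keeps the last field of one round from being merged with the first field of the next once the marker row between them has been dropped.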
Some datasets contained entire rounds with zero standard deviation (constant value) within a single parameter or across all parameters together. While the FAS contains buffer memory for temporary storage, it is noteworthy that the FAS required regular manual cleaning of several components, such as the sensors or the charger, and that, at earlier time points, the internet connection on the farm was not always reliable; both may have contributed to such errors. Also, a few records contained exceptionally high or low measurement values. Here, an error in data transmission or sensor functionality may be assumed, but the data was not removed, as each outlier should be discussed with an experienced farmer, e.g., to define appropriate thresholds or to take relevant measures within the barn. Ethical aspects concerning the effect of automation on the human-animal relationship, such as "alienation of laborers" [30] or "deskilling in the farmer" [31], have been rigorously discussed, yet the farmer's knowledge remains invaluable, with PLF systems solely assisting farmers in their judgment.
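As a minimal sketch of how such invariant rounds could be flagged for a single parameter, the following example groups values by round and reports rounds whose standard deviation is zero. The function name, the (round number, value) layout and the example values are assumptions for illustration and do not reproduce the pipeline used here.

```python
from collections import defaultdict
from statistics import pstdev


def invariant_rounds(records):
    """Return the round numbers whose values do not vary at all.

    `records` is an iterable of (round_no, value) pairs for one parameter.
    Rounds with a single value are skipped, as no spread can be computed.
    """
    by_round = defaultdict(list)
    for round_no, value in records:
        by_round[round_no].append(value)
    return sorted(
        round_no
        for round_no, values in by_round.items()
        if len(values) > 1 and pstdev(values) == 0
    )


# Invented temperature values (°C): round 1 is constant, round 2 varies
temperature = [
    (1, 20.5), (1, 20.5), (1, 20.5),
    (2, 20.1), (2, 20.7), (2, 21.0),
]
flagged = invariant_rounds(temperature)
```

Running the same check per parameter and intersecting the results would distinguish rounds that are constant in one parameter from rounds that are constant across all parameters.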
The time needed for the FAS to complete a round improved from older to newer datasets as the operating speed was adjusted slightly. Some rounds lasted longer, most likely because the FAS was stalled, while others were much shorter, indicating interrupted rounds. It is also possible that timestamps were not always transmitted accurately. The number of fields recorded per round was much more consistent in newer datasets. Rounds with far fewer fields than expected were removed, as the FAS appeared to have occasionally skipped fields. However, many such rounds had already been removed by the previous filtering step, as rounds with few recorded fields often also fell outside the defined time window.
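The two round-level filters can be sketched as follows, using the thresholds given in the captions of Figures 5 and 6 (rounds of 20 to 30 min with at least 150 recorded fields). The function name, the per-round summary layout and the example dates are hypothetical and serve only as an illustration.

```python
from datetime import datetime, timedelta

MIN_DURATION = timedelta(minutes=20)  # shorter rounds count as interrupted
MAX_DURATION = timedelta(minutes=30)  # longer rounds count as stalled
MIN_FIELDS = 150                      # fewer fields count as incomplete


def round_is_valid(start, end, n_fields):
    """Check a round against the time window and the minimum field count."""
    duration = end - start
    return MIN_DURATION <= duration <= MAX_DURATION and n_fields >= MIN_FIELDS


FMT = "%d-%m-%Y %H:%M:%S"  # date (DD-MM-YYYY) and time (HH:MM:SS)

# Invented per-round summaries: round_no -> (first record, last record, fields)
rounds = {
    1: ("01-03-2022 10:00:00", "01-03-2022 10:24:10", 182),  # regular round
    2: ("01-03-2022 10:30:00", "01-03-2022 11:12:00", 176),  # too long
    3: ("01-03-2022 11:20:00", "01-03-2022 11:31:00", 70),   # too short
}
valid = [
    round_no
    for round_no, (start, end, n_fields) in rounds.items()
    if round_is_valid(
        datetime.strptime(start, FMT), datetime.strptime(end, FMT), n_fields
    )
]
```

In this example, round 3 fails both criteria, which mirrors the observation that rounds with few recorded fields were often already removed by the time-window filter.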
Despite all the data processing steps, it is still necessary to impute, or at least acknowledge, missing measurements, because most fields will not be recorded across all rounds. Although large amounts of data are collected, the importance of a single record or round should not be overestimated. In contrast to conventional stationary systems, PLF offers higher temporal and spatial resolution [32]. Conventional systems often only provide daily average values. Nevertheless, the atmosphere in most broiler barns is already well regulated by traditional computer-assisted systems, and little change is expected most of the time. The more rounds the FAS collects per day, the better the representation of the overall situation in the broiler house. However, a smaller number of rounds may already be sufficient to capture the fluctuations that follow the daily rhythm of the broiler chickens, while reducing the required data storage. Therefore, endless round-the-clock recording is neither desired nor reasonable.
After installing the FAS, many issues first had to be addressed, as summarized in Figure 8. Most datasets from DS1 to DS11 had roughly 40% or more of their records removed by the data pre-processing pipeline presented here. The system required continuous refinement, which lasted more than a year before a significant improvement in data quality was finally observed. From DS12 onwards, the amount of data noise gradually decreased. By presenting this arduous progress in the development of a digitalized system in a typical agricultural setup, we want to draw explicit attention to the many obstacles that are often optimistically overlooked at the start of a project and that require a lot of time and many resources to overcome. Multi-source sensor data will always require critical review and some degree of post-processing due to irregular and incidental errors stemming from electrical components [33]. Therefore, close communication between industry, researchers and farmers is well advised in order to account for environmental factors, such as the effect of air quality (acidity, dust, temperature, etc.) on electrical components and the associated maintenance, as well as animal behavior and farmers' routines. A reliable internet connection needs greater consideration when planning digitalized assistance in an agricultural context.

Conclusions
In this study, we presented a descriptive statistical evaluation of the quality of data collected by the Farmer Assistant System (FAS). To this end, we developed a pipeline that identifies common interferences in signal processing such as missing values, outliers, repetitive measurements, systematic errors, and spatial and temporal inconsistencies. We show that the environmental conditions of the broiler house clearly impaired the intended operation of the rather novel ceiling-suspended machine, resulting in impractical data collection over an extended time period. Technical improvements made it possible to significantly reduce the required data discard rate to less than 10%. In addition to reliably working robots, instruments and sensors, the future challenge will be interference-free data transmission and, in particular, a permanent critical check of data completeness and quality. This study demonstrates the importance of cooperation between scientists, industrial developers and farmers to adapt hardware tasks to environmental conditions. We further recommend that manufacturers disclose data quality more explicitly.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su152115527/s1, Table S1: Minimal example of the basic raw data structure (dataset DS18; temperature measurements; modified for demonstration purposes to fit multiple examples of data noise into one table). Each row represents a record with the assigned round number, the date (DD-MM-YYYY) and time (HH:MM:SS), the field (defined by x/y/z-coordinates) at which a measurement was acquired and the measurement value itself. A column with row numbers was added for orientation. Row number 7 contains atypical entries with the x/y/z-coordinates denoted by (−1) and the measurement value denoted by (0). This is the last record of a round and serves as a starting indicator, separating the former round from the next. It can be seen that row number 6 (modified) was wrongfully assigned to the next round. At the same time, row number 6 is a repeated measurement with timestamps and coordinates identical to those of row number 5. Row number 9 (modified) contains another wrongfully assigned round number. Row number 11 (modified) contains x/y-coordinates that do not fit into the expected sequence of the FAS route. Table S2: Mean, median, standard deviation, variance, number of measurements, minimum and maximum per field and parameter (ammonia (ppm), carbon dioxide (ppm), temperature (°C), airspeed (m/s) and relative humidity (%)) across the datasets DS13-DS18.

Figure 1. Installation of the Farmer Assistant System (FAS) to monitor birds and environment in a broiler barn. (A) Example image of the ceiling-suspended robot in operation. (B) Construction layout that illustrates the route of the robot (red line) throughout the stable along the food (brown line) and drink supply (blue line).

Figure 2. Data retrieval and storage. (A) The FAS moves along the rail line while continuously collecting and storing data. (B) After completing a round, the data is appended to an already existing data table. (C) Each parameter is stored in a separate data table. At the end of a fattening period, a complete dataset containing all data tables can be downloaded.

Figure 4. Standard deviation per round for all parameters (airspeed, ammonia, carbon dioxide, relative humidity and temperature) across all datasets. The datasets DS1, DS3, DS4, DS5 and DS13 contained 16, 11, 5, 109 and 28 entire rounds, respectively, with zero standard deviation for all parameters. For improved visibility, the range of the y-axis was pseudo-log transformed.

Figure 5. Density plots displaying the amount of time for a complete round across all datasets. The majority of all rounds were within a time window of approximately 21 to 27 min. Rounds that took less than 20 min or more than 30 min were removed from further analyses. For improved visibility, the range of the x-axis was adjusted, hiding data from approx. 7.5% (outliers) of all rounds.

Figure 6. Density plots displaying the number of fields recorded for a complete round across all datasets. In the majority of all rounds, approximately 150 to 200 fields were recorded. Rounds that contained fewer than 150 records were removed from further analyses.

Figure 8. Summary of total amount (%) of removed records from each dataset (DS1-DS18) as a result of various filtering steps. In total, between 8.7% (DS18) and 54.8% (DS7) of the records were deleted. A significant improvement in data quality can be seen from DS12 onwards, with fewer duplicates and merging losses, incomplete rounds, invariant rounds and repeated measurements. Records outside the time windows clustered in DS8-DS12.

°C (Ross 308) or 34 °C to 20 °C (Ross Ranger & Hubbard 757). The time window for light exposure spanned 24 h on the first day, followed by 20 and 18 h on the following two days. Thereafter, the duration of light exposure was held constant at 16 h in alternation with a dark period of 8 h [15,16].

Table 1. General information about the raw data of each dataset. The datasets are numbered increasingly from DS1 to DS18. Sometimes, the data tables contained records with timestamps outside of the fattening period.