Article

KINLI: Time Series Forecasting for Monitoring Poultry Health in Complex Pen Environments

1 Data Science & Artificial Intelligence, Fraunhofer Institute for Applied Information Technology FIT, 53757 Sankt Augustin, Germany
2 Digital Supply Chain Research Group, Offenburg University of Applied Sciences, 77652 Offenburg, Germany
3 Institute for Machine Learning and Analytics, Offenburg University of Applied Sciences, 77652 Offenburg, Germany
4 Faculty of Mathematics and Computer Science, FernUniversität in Hagen, 58084 Hagen, Germany
* Authors to whom correspondence should be addressed.
Animals 2025, 15(21), 3180; https://doi.org/10.3390/ani15213180
Submission received: 29 August 2025 / Revised: 15 October 2025 / Accepted: 29 October 2025 / Published: 31 October 2025

Simple Summary

The paper presents the KINLI project, which applies machine learning and deep learning techniques to time series forecasting for monitoring turkey health in poultry farms. Using a real-world dataset from turkey barns—characterized by noisy, incomplete, and irregular sensor data (e.g., food intake, water intake, environmental factors)—the study evaluates a wide range of forecasting models. These include statistical approaches (ARIMA and Prophet), classical machine learning models (XGBoost and LSTM), transformer-based architectures (Informer, Autoformer and FEDformer), and emerging time series foundation models (PatchTST, TimeLLM, and TimesFM). The authors compare models in terms of forecasting accuracy and practical usability, especially in settings with limited technical expertise. Results show that while deep learning models such as PatchTST perform best overall, simpler models can still offer reliable predictions with minimal setup. Large language models (LLMs) show potential but suffer from computational inefficiency and “pattern deterioration”. Ultimately, the study concludes that robust yet easy-to-use forecasting tools are essential for real-world agricultural applications, where automation and low maintenance are critical.

Abstract

We analyze how to perform accurate time series forecasting for monitoring poultry health in a complex pen environment. To this end, we make use of a novel dataset consisting of a collection of real-world sensor data in the housing of turkeys. The dataset comprises features such as food intake, water intake, and various environmental values, which come with high variance, sensor defects, and unreliable timestamps. In this paper, we investigate different state-of-the-art forecasting algorithms to predict different features, as well as a variety of deep learning models such as different transformer models and time series foundational models. We evaluate both their forecasting accuracy as well as the efforts required to run the models in the first place. Our findings show that some of these aforementioned algorithms are able to produce satisfactory forecasting results on this highly challenging dataset while still remaining easy to use, which is key in a tech-distant industry such as poultry farming.

1. Introduction

Time series analysis is an important research field [1]. Since time series arise whenever information is connected over time, time series analysis has become ubiquitous in many data-driven application domains. From analyzing sensor data to understanding financial data, time series analysis has adopted various machine learning methods. Among the many applications of time series analysis, time series forecasting, which aims to predict future values from present and past values of a time series, has become a frequently encountered operation of predictive analysis. This challenging operation has been tackled by many different approaches, from arithmetic algorithms to deep learning models.
In this paper, we focus on the KINLI project, which aims to apply machine learning along the supply chain of the meat industry, from the hatching, nursing, and feeding of animals to the analysis of meat quality and processing. In doing so, KINLI cooperates with turkey farmers to integrate information about turkeys and assemble systems that enable farmers to analyze their turkeys' condition, ultimately aiming to increase animal health by catching issues with the animals early. To this end, KINLI collects sensor data from turkey barns and conducts an initial outlier analysis prior to forecasting. Naturally, forecasting the future values of the collected sensor data would allow KINLI to assess arising issues with the turkey population ahead of time. But creating meaningful forecasts in this field proves to be a challenge. Although the KINLI project has compiled a dataset of sensor data from turkey populations in multiple different barns, the dataset's quality is highly challenging in terms of uncertainty. Each of these barns varies in multiple aspects: not just in size, but also in ventilation systems and sensor implementations. Moreover, sensor failures and system outages occur frequently. This results in a dataset riddled with quality issues.
Accurate time series forecasting is vital in the domain of animal farming to detect arising issues early, before the animals are impacted negatively. In turkey farming, the health of the animals develops so quickly that even a few hours of missing information can have detrimental effects on the herd. By making use of time series forecasting, farmers will be able to detect issues in advance. This paper evaluates a variety of forecasting algorithms on the proposed KINLI turkey dataset. We deliberately forgo any data cleaning procedures and instead forecast the raw dataset. This is because turkey farmers have little to no expertise in the data science domain, requiring any technical solution to be nearly fully automated, down to maintenance, data cleaning, and the training of ML models. We also want to ensure that new barns and new farmers can be added without a long training and optimization phase; the goal is thus a model that needs little or no adaptation. Our aim is clearly a solution that is practical and easy to use rather than one focused solely on achieving the most accurate results. Therefore, this paper provides a comprehensive evaluation of multiple forecasting algorithms, from statistical algorithms to complex deep learning frameworks, on the raw dataset. We examine why certain models achieve accurate forecasts despite the dataset's low quality, while others fall short. We also indicate which changes to the models or the overall process would lead to better results.
The remainder of this paper is structured as follows: In Section 2 we present a brief overview of time series forecasting and respective approaches. In Section 3 we describe our methods and the used dataset. The resulting findings are presented in Section 4. A detailed discussion of the results of the individual models is presented in Section 5. The impact on the health of turkeys during fattening is discussed in Section 6.

2. Materials and Methods

In this paper, we focus on univariate forecasting, which considers the following problem:
Definition 1. 
Given a series of time stamps $T$, let $X \in \mathbb{R}^{1 \times T}$ be a univariate time series of numeric values. The univariate time series forecasting problem concerns finding a function or algorithm $f(\cdot): X \to Y$ with $Y \in \mathbb{R}^{1 \times \hat{T}}$ such that all time stamps in $\hat{T}$ are temporally located after the time stamps in $T$ and $Y = \hat{X}$, where $\hat{X} \in \mathbb{R}^{1 \times \hat{T}}$.
In practice, forecasting is difficult: finding the right algorithm is hard, and the forecast values $Y$ typically differ from the ground truth $\hat{X}$. This deviation is evaluated using error metrics, such as the mean squared error or the mean absolute error, computed over all values of $Y$ compared to $\hat{X}$.
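The evaluation described above can be sketched in a few lines. This is a minimal, library-free illustration of the two metrics; the example values are hypothetical and not taken from the KINLI dataset.

```python
def mse(forecast, truth):
    """Mean squared error over all forecast values compared to the ground truth."""
    return sum((y - x) ** 2 for y, x in zip(forecast, truth)) / len(truth)

def mae(forecast, truth):
    """Mean absolute error over all forecast values compared to the ground truth."""
    return sum(abs(y - x) for y, x in zip(forecast, truth)) / len(truth)

# Hypothetical forecast vs. observed values
y_hat  = [10.0, 12.0, 14.0]
x_true = [11.0, 12.0, 13.0]
print(mse(y_hat, x_true))  # 0.666...
print(mae(y_hat, x_true))  # 0.666...
```

Both metrics are reported throughout the results section; MSE penalizes large outlier errors more strongly than MAE.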

2.1. Forecasting Algorithms

Statistical forecasting algorithms are mathematical methodologies employed to estimate future values or trends by utilizing historical data. These algorithms examine the underlying patterns, relationships, and trends present within the data to generate predictions. Their performance relies on the availability of a comprehensive time series. Notable examples of such algorithms include ARIMA [2] and Prophet [3]. Additionally, these algorithms require periodic retraining, the frequency of which is determined by the temporal granularity of the time series data.

2.2. Forecasting with ML Models

Classic machine learning models for time series forecasting, such as XGBoost [4] and LSTM [5], use a supervised approach to learn the inherent properties of a time series for prediction. As a consequence, these models struggle to capture the complete length of a time series during training and prediction, and they sometimes ignore temporal dependencies altogether. They must be trained once on the target time series; once trained, this can be sufficient, but the learned model does not transfer to other series, so transfer learning is not possible.
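The supervised approach mentioned above typically relies on a sliding window that turns the series into tabular (features, target) pairs. The following is an illustrative sketch, not the project's exact pipeline; the window size is chosen arbitrarily for the example.

```python
def sliding_window(series, window):
    """Each feature row holds `window` past values; the target is the next value.

    This tabular form is what a supervised learner such as XGBoost consumes.
    """
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return X, y

series = [1, 2, 3, 4, 5, 6]
X, y = sliding_window(series, window=3)
print(X)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(y)  # [4, 5, 6]
```

Because each row only sees `window` past values, long-range temporal dependencies outside the window are invisible to the model, which is the limitation noted above.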

2.2.1. Time Series Forecasting Using Transformers

Transformer models are a class of machine learning models originally developed to solve natural language problems, such as translation or named entity recognition. The development of the attention layer [6] allowed these models to succeed in this regard. By scaling up model size and training on large amounts of data, these large language models (LLMs) were able to solve most natural language processing tasks. OpenAI developed the first large language models, starting with GPT-2 [7] and upscaling to GPT-3 [8], followed by the release of ChatGPT [9] and GPT-4 [10]. LLMs possess capabilities beyond natural language processing, including general pattern recognition and reasoning abilities [11,12,13,14]. Due to this, LLMs are also being adapted to other fields.
Experiments with transformer models in time series analysis began shortly after the release of the architecture. However, they faced two problems that made their use for time series forecasting inefficient. (i) High memory complexity: due to the transformer's memory complexity of $O(L^2)$ for an input sequence of $L$ tokens, it is difficult to make proper use of it in time series forecasting. (ii) Insensitivity to locality: because the transformer has no recurrent features, it cannot properly model important properties of a time series, such as seasonalities.
These issues were first addressed with LogTransformer [15], an encoder–decoder transformer that incorporates an adjusted attention layer, improving the memory complexity to $O(L(\log L)^2)$ by introducing convolutional kernels inside the layer, together with a timestamp encoding for time series. This led to a series of improved transformer models designed specifically for long-sequence time series forecasting. Informer [16] improved the memory complexity to $O(L \log L)$. Autoformer [17] introduced time series decomposition strategies that provide the model with better information about the time series. Finally, FEDformer [18] improved the memory complexity to $O(L)$, with the following components:
1. Mixture-of-experts seasonal-trend decomposition (MOE Decomposition): a set of filters of different sizes extracts multiple trend components from the input and combines them into a final trend. This replaces the standard transformer's feed-forward layers.
2. Frequency enhanced block: the time series is Fourier-transformed, random frequency components are selected, and the result is transformed back. This block is used at the beginning of both the encoder and the decoder.
3. Frequency enhanced attention: the input is Fourier-transformed and components are randomly selected for the attention computation. This replaces cross-attention.
More improvements in time series forecasting with transformer models include the following: not yet established encoder–decoder transformers [19,20], encoder models [21], and decoder models [22], as well as improvements in model training with different time scales [23] and forecasts with exogenous variables [24].
Beyond these forecasting models, the attention mechanism has been studied for multiple time series analysis tasks independently, such as next frame prediction [25], univariate time series forecasting [26], forecasting with different window sizes [27], and as components in non-transformer networks [28,29].

2.2.2. Time Series Forecasting Using Foundation Models

The transformer models previously described are focused on being as compact and efficient as possible, but they need to be trained on the specific time series targeted for forecasting. With the emergence of LLMs, however, the question of a time series foundational model arose: a pre-trained model that can perform any time series analysis task without the explicit necessity of being trained or fine-tuned in the conventional way. However, establishing a time series foundational model faces multiple challenges: (i) Cross-domain differences: time series data are heterogeneous in terms of dimensions and domain-specific characteristics. (ii) Language and prompt barrier: using pre-trained LMs for time series analysis poses challenges, as the language of the LM is not directly adapted to the time series domain, and the effectiveness of prompts additionally requires domain-specific information. (iii) Generalization conflicts with specificity: a foundational model is supposed to leverage its generalized knowledge of time series for any given specific series, where it exhibits issues with identifying and analyzing domain-specific information.
An example of domain-specific challenges is research into forecasting the stock market, which is highly dependent on news sources relevant to the stock. Here, LLMs are frequently used for sentiment analysis of said news articles [30,31,32]. Another use is in the task of modeling and describing stocks [33,34].
This led to research on the creation of time series foundational models [35,36,37,38]. But these efforts were limited by the low availability of high-quality time series data to train such models on. TimesFM [39] is the first time series foundational model trained purely on time series data. Another approach involves utilizing existing foundation models, which showed first promising results: using GPT-2 and BERT attention blocks and training only the embeddings and feed-forward layers on the time series achieves competitive forecast accuracy [40]. PatchTST [41] introduced the idea of patching the time series; by splitting the time series into multiple smaller patches, it enhances the model's ability to process the entire series without requiring massive computational and memory resources. LLM4TS [42] freezes the attention and feed-forward layers of the LM but still needs to fine-tune the other layers inside the model to reach adequate forecasting accuracy. Time-LLM [43], on the other hand, demonstrates that adapting existing LMs for time series forecasting without any fine-tuning is possible by using an elaborate patch reprogramming of the numerical values, allowing the LM to obtain a textual representation of the time series patches. UniTime [44] further refines the patching by creating a time series tokenizer and proposing a Language-TS transformer that combines the forecasting prompt with the tokenized time series data; this is further refined with dynamic prompt adaptation in Time-FFM [45], with further improvements to time series tokenization by Chronos [46].
However, all previously described approaches use patching and additional strategies to adapt time series to LMs. When using LLMs, it has been shown that it is possible to simply feed the time series to the LLM and achieve high forecast accuracy. Purely prompt-based time series forecasting has been researched with LMs such as T5, Bart, and RoBERTa [47]. LLMTime [48] then used GPT-3 [8] and LLaMA-2 [49] as the basis of their forecasting model, but further research has remained sparse.

3. Test Setup/Method Description

3.1. The KINLI Dataset

The KINLI dataset comprises multiple sensor values originating from turkey farming barns. It is part of a system that monitors the state of each barn and uses forecasts to detect possible issues early. The dataset encompasses the typical challenges of real-world datasets. There are sensor errors where no values are recorded, as well as outlier values caused, for example, by the farmer cleaning the system without turning off the sensor. Furthermore, the timestamps of the sensor recordings are inconsistent, with a value recorded roughly every 10 min. This is due to the way the data are collected: they are not read via an interface but via a web scraper, whose execution time differs for each cycle and which therefore cannot achieve an exact interval. Direct access to the database is not possible. As each barn was added to the system sequentially, no two barns have the same number of recorded sensor values. Older recordings are also even less frequent, sometimes only once or twice per hour. The dataset contains 14 months of data from 20 different turkey pens. The values are reset to zero at the end of each day, and the time series starts again from the beginning. The data come from barns with animals of the BUT 6 variety, recorded from week 6 to week 22 of fattening. Only toms (male turkeys) are used for fattening.
This dataset is intriguing to use precisely because of its imperfections, a snippet of which can be seen in Figure 1. Common benchmarks for forecasting algorithms are datasets with very few errors, especially not of this kind. This will give an overview of how these forecasting algorithms perform in a real-life setting.
Another concern is the amount of pre-processing necessary to make proper forecasts on this dataset. Agriculture and animal farming are not widely digitalized industries, at most relying on proprietary software that offers no analytical capabilities and suffers from extreme vendor lock-in. Because of that, we aim to use as little pre-processing as possible so that setting up and deploying the forecasting solution is as easy as possible for turkey farmers, even without extensive knowledge of machine learning and data science. In addition, every barn differs in structure, technical equipment, location, and climate, and every farmer has a different approach. Each turkey fattening cycle is also different, and the cycles of the individual barns neither start at the same time nor have the same length.
Regarding the task of time series forecasting, we consider the following values:
  • Water/Day tracks the amount of water distributed to turkeys in a barn. This sensor accumulates until a final value is reached at 23:59:59 and resets for the next day. Normally it accumulates slowly until 06:00:00. Then the turkeys wake up and start drinking, which leads to a steeper rise until 23:00:00 when the turkeys go to sleep. This value is calculated by dividing the total consumption by the number of animals present.
  • Food/Day tracks the amount of food distributed to turkeys in a barn. This sensor also accumulates until a final value is reached at 23:59:59 and resets for the next day. Unlike the water sensor, the feed pump only runs a few minutes per day, meaning the sensor data remains unchanged for long times and then shoots up as the food pump runs for a few minutes to refill the food for the turkeys. A critical time is when the turkeys wake up and start eating food, between 06:00:00 and 08:00:00. This value is calculated by dividing the total consumption by the number of animals present.
  • Water/Food is the ratio of water to food consumed by the turkeys, based on the previous two sensor values. This indicator is vital for detecting diseases in turkeys, as a sick herd will stop eating.
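The derivation of the three targets from the cumulative daily counters can be sketched as follows. This is a hedged reconstruction for illustration only; the function and variable names are ours, not identifiers from the KINLI code base, and the example quantities are invented.

```python
def per_animal(cumulative_total, n_animals):
    """Normalize a cumulative daily counter by the number of animals present."""
    return cumulative_total / n_animals

def water_food_ratio(water_per_animal, food_per_animal):
    """Water/Food ratio; an abnormal value can indicate a sick herd that stops eating."""
    if food_per_animal == 0:
        return float("nan")  # the feed pump has not run yet today
    return water_per_animal / food_per_animal

# Hypothetical totals for one barn at end of day
water = per_animal(12000.0, n_animals=4000)  # Water/Day target
food  = per_animal(6000.0, n_animals=4000)   # Food/Day target
print(water, food, water_food_ratio(water, food))  # 3.0 1.5 2.0
```

Note that both counters reset to zero at 23:59:59, so the ratio is only meaningful once the feed pump has run at least once on the current day.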

3.2. Experiment Set-Up

The framework from [18] was used for the training and inference of the models. A GPU cluster, scheduled via SLURM, was used for all deep learning models as well as for the statistical and ML models. Inference for the non-open-source models was performed via the respective vendors' APIs. The mean squared error and mean absolute error were used as metrics. The experiments were carried out several times, and the respective mean values were calculated. Data were only transformed to the extent needed to bring them into the appropriate form for the respective model; the steps for this are explained in the model section. Strategies for the LLM models can be found in Appendix A, and hyperparameters for the models in Appendix B. Where a batch size applies, it is set to 32.

3.3. Considered Algorithms and Models

  • Statistical algorithms: ARIMA [2] and Prophet [3] are frequently used for time series forecasts and often show better results than deep learning models. SARIMA was not chosen, as there are no fixed lengths for the individual cycles or seasons. These models do not require any scaling of the data. To perform the tests, the model must be refitted after each prediction step, which makes the tests significantly longer.
  • Machine learning algorithms: XGBoost [4], a Gradient Boosting Model, was chosen because it is easy to implement and offers fast training and inference times. It is frequently used in industry for time series, with good results. The model does not require any scaling of the data. A sliding window was used to transform the data to make them accessible for XGBoost. This allowed the time series to be presented as tabular data. A multi-output regressor from [50] was also used to predict several time steps at once. This was not necessary for the EOD forecast.
  • Deep neural networks: We use three types of simple linear models as a baseline for deep learning algorithms. These are called ‘Linear’ which is just a single linear layer, ‘NLinear’ which applies normalization to the linear layer, and ‘DLinear’ which uses time series decomposition and a moving window trend, similar to Autoformer [17], before the linear layer.
  • Recurrent neural networks: LSTM [5] models can recognize long-range dependencies and with automatic feature extraction, no pre-processing other than scaling is necessary. Before the advent of Transformer models, LSTM models were the industry standard for sequential data.
  • Transformer models: When it comes to transformer models, we first use a basic transformer model [6], and then three of the specialized forecasting models: Informer [16], Autoformer [17], and FEDformer [18].
  • Time Series Foundation Models: We test PatchTST [41], which introduces patching, as well as TimeLLM [43], which combines patching with reprogramming to allow forecasting with proprietary LLMs. And at last we use TimesFM [39] as the first time series foundational model trained purely on time series data.
  • LLM forecasting models: Following the results of LLMTime [48], we adapt their approach to our tests by using smaller open-weight LLMs of no more than 10 billion parameters. In addition, we implement several prompting strategies for time series forecasting with LLMs instead of only feeding raw values into the LLM; the strategies used can be found in Appendix A. We focus on smaller LLMs because these models, were they to be implemented in KINLI, would have to run locally on the farmer's own server infrastructure, which is not particularly powerful; smaller models are also more cost-efficient. We use Falcon-7b [51] as a small, unremarkable, perhaps even outdated LLM to compare against more advanced LLMs. The other LLMs we test are from the Llama series [49,52], namely Llama-3-8b, Llama-3.1-8b, Llama-3.2-1b, and Llama-3.2-3b [53], since they are known for good performance at their size. We only use base models, as they did not receive extensive RLHF, which should have a positive impact on forecasting performance [48].
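The per-step refitting mentioned for the statistical algorithms above amounts to a rolling-origin evaluation loop. The sketch below illustrates the loop shape only; a naive last-value "model" stands in for the actual ARIMA/Prophet refit, which would be far more expensive.

```python
def naive_fit_forecast(history):
    """Placeholder for an ARIMA/Prophet refit + 1-step forecast.

    Here it simply forecasts the last observed value (naive forecast).
    """
    return history[-1]

def rolling_forecast(series, start):
    """Forecast one step at a time, refitting on the extended history after each step."""
    history = list(series[:start])
    forecasts = []
    for t in range(start, len(series)):
        forecasts.append(naive_fit_forecast(history))  # refit and forecast step t
        history.append(series[t])                      # observe truth, extend history
    return forecasts

series = [1.0, 2.0, 3.0, 4.0]
print(rolling_forecast(series, start=2))  # [2.0, 3.0]
```

Because a full refit runs inside the loop for every predicted step, the total test time grows with the forecast horizon, which is exactly why these tests take significantly longer.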

3.4. The End-of-Day Forecast

To be able to detect problems in the barn and intervene immediately, we aim to forecast the end-of-day (eod) value. By comparing the forecast for each day over the entire growing cycle of the turkeys (cycle value) against the forecast of the eod value of the current day using today's sensor data, it is possible to detect potential problems when the eod value is too far below or above the cycle value. This gives us the opportunity to recognize problems or illnesses as early as possible, possibly even before they fully develop. Importantly, an exact prediction is not necessary; what matters is how large the deviation will or can be. This test concerns forecasting the eod value using the sensor data available at a given time of the current day, forecasting the subsequent values up to the eod value. We consider forecasting the end-of-day value from (1) 22:00:00, (2) 18:00:00, (3) 12:00:00, and (4) 08:00:00, as well as (5) the critical time when the turkeys are just waking up, forecasting the 08:00:00 value using all values up to 06:00:00.
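The comparison described above can be reduced to a simple deviation check. This sketch is our illustration of the idea; the 15% tolerance is an arbitrary example value, not a threshold used in the KINLI system.

```python
def eod_alert(forecast_eod, cycle_eod, tolerance=0.15):
    """Flag a potential problem when today's forecast end-of-day value deviates
    too far (relative) from the expectation derived from the growing cycle.

    The tolerance of 15% is an illustrative assumption.
    """
    deviation = abs(forecast_eod - cycle_eod) / cycle_eod
    return deviation > tolerance

print(eod_alert(80.0, 100.0))  # 20% below the cycle expectation -> True (alert)
print(eod_alert(95.0, 100.0))  # within tolerance -> False (no alert)
```

As the text notes, the forecast itself need not be exact; only the magnitude of the deviation matters for raising an alert.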

3.5. Long-Sequence Forecast

After evaluating these forecasting algorithms on the end-of-day test, we choose promising algorithms to perform a series of long-sequence forecasts, where the sequence length is increased to take multiple days of sensor values into account before making the end-of-day forecast. Each of these tests aims to forecast the end-of-day value from 08:00:00, as this is the most challenging end-of-day forecast.

4. Results

This section presents the results of our tests. The following Figure 2 shows the result of the different models for each of the forecasting horizons and the different prediction values. The results for all models can be found in Appendix B.
Machine learning and deep learning models that need to train on the dataset all have issues with outliers impacting their forecasts. As a result, even on a seemingly simple target such as (Water/Day), where the sensor value roughly follows an upward trend each day, these models forecast large jumps between consecutive values, even if the overall trend of the forecast is correct. Models with better metrics usually forecast less severe jumps between values. Examples can be seen in Figure 3. Statistical models struggle with the first predictions, always making a very large jump up or down; they then approach the ground-truth curve again in phases, only to be completely wrong at the end of the day by ending the daily pattern too early.
The linear models serve as a baseline for deep learning model performance. As they simply consist of a single linear layer, we consider any more complex deep learning model that has worse accuracy to be a failure. On all three basic tests, this is only the case for TimeLLM. All other models are better. On the end-of-day tests, linear models improve comparatively as the forecasting length increases. The normalization linear model (NLinear) performs particularly well in all three end-of-day tests. However, in the test from 06:00:00 to 08:00:00, the normalization linear model is worse.
The specialized transformer models all face a noticeable forecast accuracy issue on the unscaled dataset, which is most extreme for the vanilla Transformer and Informer. On the unscaled dataset, these two models perform very poorly, while on the scaled dataset, they are among the best models of each test. Other models do not show such a significant difference between scaled and unscaled versions. The sensitivity to scaling is inverted on the (Water/Food) forecasting test, where these models perform better on the unscaled dataset.
The LLMs' performance varies heavily between the different tests. In the basic test, the LLMs achieve good metrics on all three targets. On the end-of-day tests, the LLMs still perform decently on the 18:00:00 test but drop sharply on the 12:00:00, 08:00:00, and critical-time tests. On the (Water/Day) forecast, the LLMs show seemingly normal performance, but all have highly similar metrics on (Food/Day) and (Water/Food). This is due to an issue with pure LLM forecasting that we call “pattern deterioration”: LLMs have a habit of simplifying the forecast pattern down to a flat line. Due to the shape of the (Food/Day) data, this process is instant, which is why all these models have the same metrics. On (Water/Food) the deterioration still occurs, but since this target always remains around the same value, a flat line after a few values is not a bad forecast, while on (Water/Day), the pattern deteriorates into an upward trend line, which is why the LLMs perform decently there. This is shown in Figure 4.
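A flat-lined forecast of the kind just described is easy to detect heuristically. The following check is our own illustration, not tooling from the paper; the tail length of five values mirrors the "four to five values" observation from the discussion section and is otherwise an arbitrary choice.

```python
def has_flatlined(forecast, tail=5):
    """True if the last `tail` forecast values have collapsed to one repeated value,
    i.e., the forecast shows the flat-line form of pattern deterioration."""
    if len(forecast) < tail:
        return False
    return len(set(forecast[-tail:])) == 1

print(has_flatlined([3.1, 3.4, 3.3, 3.3, 3.3, 3.3, 3.3]))  # True: tail is constant
print(has_flatlined([3.1, 3.4, 3.3, 3.5, 3.2, 3.6, 3.4]))  # False: tail still varies
```

Such a check could be used to discard or re-prompt deteriorated LLM forecasts automatically, though it cannot recover the lost pattern.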
One unique issue of the LLM-based forecasting is the topic of their compute time. During testing, it has been shown that forecasting the time series one value at a time, up to the forecasting length, results in the highest forecast accuracy. However, that means that the computation time scales with the forecasting length, while these models already have the highest inference time of all tested models. All other models are much faster and less resource hungry in comparison.
TimeLLM performs worse than the direct LLM forecast with the same backbone model on the basic forecast, for all models tested. This is peculiar because TimeLLM uses patch reprogramming to give the LLM a text representation of the time series, which should improve accuracy and avoid issues such as pattern deterioration. On the end-of-day tests, TimeLLM's performance improves relative to the LLMs due to training, and on the 12:00:00 and 08:00:00 tests it even reaches the upper midfield with GPT-2 as the backbone. We surmise that TimeLLM is sensitive to the backbone model used and cannot properly exploit newer LLMs, which is why its performance is best with GPT-2. In addition, TimeLLM is scaling-sensitive: although the difference is not as severe as for the specialized transformer models, TimeLLM shows the same preference for scaled data, and the same inverse preference for unscaled data on (Water/Food).
PatchTST on the other hand performs well in all tests, being among the best performing models. It is also not scale sensitive.
The performance of TimesFM is initially very good but drops heavily on the end-of-day tests. Being a model that did not receive fine-tuning on the dataset, the model’s forecasting output is heavily impacted by the length of the input sequence. Therefore, as the timestamp gets closer to 00:00:00 and the forecasting length increases, TimesFM has a particularly hard performance drop. The (Water/Day) end-of-day test from 08:00:00 in Figure 5 showcases the issue with TimesFM particularly well, as it is not able to pick up on the increased water intake from the waking turkeys, forecasting a lower rise in line with values from when the turkeys were still sleeping.

Long Sequence Forecasts

To gain further insights into the performance of the forecasting algorithms on our dataset, we performed a series of long sequence forecasts on (Water/Day), with the sequence lengths being 256, 512, and 1024 in 10 min time steps. We were especially interested in the performance of models that did not train on the data, such as the direct LLM forecasts and TimesFM, and compared them against the performance of trained models. Table 1 shows an overview of the results of these forecasts.
The models that train on the entire dataset show no significant improvements in their metrics. For the models that do not train, TimesFM and the LLMs, performance does increase significantly, but only up to a limit. Figure 6 shows that TimesFM improves its MAE to 241.0 at a sequence length of 256, but increasing the sequence length further produces no additional gains. Although this is a significant improvement over its performance in the end-of-day test, where the MAE exceeds 300, it remains among the worst-performing models in these tests.
Figure 7 shows that the LLMs also benefit significantly from the added sequence length but are not able to catch up to the trained models either. The additional context also does not mitigate the previously described issue of pattern deterioration.
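The MAE and MSE figures compared throughout these tests follow the standard definitions; a minimal sketch (the toy series below is illustrative, not taken from the dataset):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between forecast and observations."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mse(y_true, y_pred):
    """Mean squared error; penalizes large misses more strongly."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

# toy example: a flat forecast against a rising series,
# the shape a deteriorated LLM forecast typically takes
obs = [100, 110, 120, 130]
flat = [100, 100, 100, 100]
print(mae(obs, flat))  # 15.0
print(mse(obs, flat))  # 350.0
```

Because MSE squares the residuals, a single large miss (e.g., a missed morning water-intake spike) inflates MSE far more than MAE, which is why the two rankings occasionally disagree in the result tables.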

5. Discussion

In this section, we discuss the results of our forecasting tests.

5.1. Issues with LLMs in Time Series Forecasting

When we adapted these LLMs for time series forecasting, we identified the following recurring issues:
  • Unrelated output: The model output is unrelated to the given prompt. Usually, the output either contains no usable numeric values at all or deteriorates to the point of making the forecast meaningless. An example is shown in Figure 8. The risk of unrelated output increases with each non-numeric token in the input prompt; accordingly, the most successful forecasts include no text prompt at all, only time series values. We also found that the LLama models generate unrelated output more often than Falcon-7b.
  • Precision Deterioration: The output values become imprecise as the forecasting horizon grows. As shown in Figure 9, after generating some values with the correct precision, the model starts reducing the precision and never recovers, eventually reducing the output to a simple set of single-digit numbers. To combat this, we filter the model's output to accept only values with the required precision or higher and discard any lower-precision values. As a consequence, settings that retain high precision, such as rounding to 12 digits or keeping the original 16-digit precision, cause more forecasts to fail.
  • Pattern Deterioration: As mentioned previously, the output values present less and less complex patterns as the forecasting horizon grows, until the model settles on a single value that is repeated over and over. This usually happens quickly, within just four to five values, and can be even faster if the time series already contains repeated values. This is shown in Figure 9. Pattern deterioration remains the key challenge open-weight LLMs of this size face when forecasting a time series directly.
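The latter two failure modes can be caught automatically before a forecast is accepted. A minimal sketch, with illustrative function names and thresholds rather than our exact production checks:

```python
def decimal_places(token: str) -> int:
    """Number of digits after the decimal point in a numeric token."""
    return len(token.split(".")[1]) if "." in token else 0

def filter_precision(tokens, min_decimals=6):
    """Keep only values meeting the required precision; discard the
    lower-precision values produced by precision deterioration."""
    return [float(t) for t in tokens if decimal_places(t) >= min_decimals]

def has_pattern_deterioration(values, tail=5):
    """Flag a forecast whose last `tail` values collapsed to a constant."""
    return len(values) >= tail and len(set(values[-tail:])) == 1

raw = ["0.856776", "0.867993", "0.9", "0.882301"]
print(filter_precision(raw))  # [0.856776, 0.867993, 0.882301]
print(has_pattern_deterioration([0.1, 0.2, 0.5, 0.5, 0.5, 0.5, 0.5]))  # True
```

A forecast that loses too many values to the precision filter, or that trips the deterioration check, is counted as failed rather than scored.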

5.2. Issues with ARIMA

ARIMA does not perform well in any test, because the data do not exhibit strong statistical structure: seasonal and recurring patterns are only weakly represented. The irregular intervals between individual timestamps cause additional problems. ARIMA's performance also depends heavily on its hyperparameters, which we did not tune extensively in this scenario. Even if the tests had performed well, the need to constantly retrain the model on the limited data available would remain a major problem.

5.3. Issues with Prophet

Prophet performs relatively well on the data, but problems arise with longer forecasting lengths. For deployment, Prophet has the drawback that it must always be completely retrained on new data, which is not feasible in our scenario: we want the model to work out of the box, without retraining or refitting for a different barn.

5.4. Issues with XGBoost

The main problem with XGBoost is that its hyperparameters matter a great deal, so considerable optimization is required. Pre-processing the data so that they can be used for time series prediction is also very time-consuming.

5.5. Issues with LSTM

There are too little data for the LSTM to be trained well. Another problem is that LSTMs often require hyperparameter tuning, which was not performed in detail in this scenario given the limited number of pre-processing steps. Training the model is also very slow and resource-intensive.

5.6. Issues with Specialized Transformer Models

An interesting observation about the specialized transformer models is that the newer FEDformer [18] and Autoformer [17] usually perform worse than Informer and the regular Transformer. We surmise that the time series decomposition introduced by Autoformer is at fault, as FEDformer further extends this component. These models were optimized for memory complexity by observing patterns in the attention layer during forecasting and enhancing attention with these patterns. However, such patterns do not appear when forecasting our dataset without pre-processing, leading to information loss and therefore lower forecasting accuracy compared with Informer or even the vanilla Transformer. All transformer models exhibit 'erratic' forecasts due to the high number of inaccuracies and anomalies in our dataset.

5.7. Issues with TimeLLM

The performance of TimeLLM [43] stands in contrast to PatchTST [41], which performs well in all tests. This demonstrates that patching the time series is not the issue; on the contrary, it is well suited to forecasting messy data like ours. Instead, TimeLLM's performance can be explained by its two main differences from PatchTST. The model's reliance on pre-trained LLMs that are not directly adapted to time series forecasting is the main cause of the discrepancy, as it is in line with the raw LLM forecasting performance. The second issue is the patch reprogramming used to adapt the patches to the LLMs, which are best at processing text. Patch reprogramming succeeds in eliminating the issues LLMs face when forecasting directly, such as pattern deterioration. However, the adaptation leads to worse performance than the LLMs in cases where the simpler pattern of the LLM forecast happens to fit, such as (Water/Day) or (Water/Food). The comparative improvement over the LLMs at longer forecasting lengths can be attributed to the fact that this model still trains on the entire dataset, eliminating the key weakness of short sequence lengths.
Given TimeLLM's overall poor performance and training requirements, we would not recommend using this model in production at this time.

5.8. Issues with TimesFM

With TimesFM [39], the main issue is the short sequence length of the end-of-day tests. Since the model does not train on the dataset, its performance is directly related to the amount of information fed into it, so the short sequence length leads to poor forecast accuracy. The 06:00:00-to-08:00:00 test demonstrates that TimesFM is affected more by the length of the input sequence than by the forecasting length, as performance remains poor.

5.9. Possibilities with Pre-Processing

Significantly better results could be achieved by pre-processing the data. However, we decided against this because the aim was to test the models virtually out of the box. A farmer should later have the option of adding a new barn, or using the system in general, without major preparations. Data availability in this domain is also low, which rules out training models or pre-processing based on dataset-specific metrics.

5.10. Fine-Tuning of LLM Models

Fine-tuning the LLMs would significantly improve their results, but at a high cost in resources. It remains questionable whether an individual farmer would be able to fine-tune such a model on their existing data. Using the entire dataset across barns, better results are certainly achievable. However, such training would contradict the original idea that no, or only minor, adjustments should be made to a model.

6. Conclusions

6.1. Result Discussion-Impact on Poultry Health/KINLI

Achieving the daily dose is very important in turkey fattening; larger deviations can indicate disease or poor water or feed quality. The earlier it becomes apparent that the daily dose will not be reached, the faster countermeasures can be taken. In turkey fattening, a window of 5–6 h is crucial for reacting; within this period, sick animals can often still be saved. The earlier the daily dose can be predicted, the faster diseases can be recognized, and problems with the water or feed supply can likewise be identified. It can also be determined whether the procedure needs to be changed within a fattening cycle, because the animals may react differently to feed or water quality than in previous cycles. To achieve this, however, the farmer needs an approximate forecast of the animals' consumption as early as possible. Ultimately, this ensures that the animals remain healthy and that turkey fattening can be carried out sustainably in terms of resource consumption. In the end, it is not crucial that the prediction is as accurate as possible, but that it can be applied quickly and easily, including to new barns, without a long data collection and pre-processing phase. At this point, however, it must be clearly stated that testing across different barns and fattening cycles is still pending before we can ensure that farmers can work with the system and that the health of the animals actually improves as a result.

6.2. Conclusions

We were able to identify the best models for our application: Informer and PatchTST. Although XGBoost and Prophet also offered good forecast accuracy, their hyperparameter tuning and regular retraining requirements make them unattractive in poultry farming, where there are usually very little data to train or adapt time series forecasting models. Models that did not train on the dataset, such as the LLMs and TimesFM, were easier to use but failed to achieve satisfactory forecast accuracy. TimeLLM did reach good accuracy with GPT-2 as the backbone but was more costly to train. After evaluating these models on the dataset, we decided to focus on PatchTST in the production environment. With it, we will be able to make forecasts with satisfactory accuracy to help detect problems in the poultry pen ahead of time, without a long data collection, pre-processing, and adaptation phase. We have also demonstrated that, with very little effort in data preparation and training, LLMs can be used to forecast consumption data in turkey farming. However, the results are not yet accurate enough for fully precise predictions, though they are good enough to test in practice. The predictions are made available via a REST API and incorporated into the visualizations of the sensor values, making them easily accessible to farmers. At the same time, each predicted value is compared with a threshold value for the respective fattening period, and if it exceeds or falls below this threshold by too much, an email is sent to the farmer.
Further research plans include testing smaller models that have been published in the meantime, as well as retraining small LLMs to achieve better results. There are also plans to train smaller models into a kind of foundation model, which would make it much easier to adapt the system to new pens.
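The threshold comparison behind the email alert can be sketched as follows; the bounds and the returned message format are illustrative, not the production configuration:

```python
def check_forecast(predicted_total, lower, upper):
    """Compare a forecast daily total against per-fattening-period
    bounds and return an alert message if it falls outside them."""
    if predicted_total < lower:
        return f"ALERT: forecast {predicted_total} below lower bound {lower}"
    if predicted_total > upper:
        return f"ALERT: forecast {predicted_total} above upper bound {upper}"
    return None  # within bounds, no email is sent

# hypothetical bounds for one fattening period
msg = check_forecast(predicted_total=480, lower=550, upper=900)
print(msg)  # ALERT: forecast 480 below lower bound 550
```

In production, a non-`None` result would be handed to the mailing component; the forecast itself is served unchanged through the REST API.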

6.3. Concluding Remarks and Novel Contributions

In conclusion, this work contributes to the research on time series forecasting by systematically evaluating a broad spectrum of forecasting paradigms—from classical statistical models to modern deep learning and foundation architectures—on sensor data collected from real turkey barns. This setting represents a highly complex and practically relevant application domain, where sensor failures, irregular data intervals, and environmental variability pose significant challenges for predictive modeling. By focusing on poultry health monitoring, the study demonstrates how time series forecasting can support early detection of anomalies and improve animal welfare in modern livestock management systems. Furthermore, we empirically observe a degradation effect in LLM-based forecasting, where large language models tend to simplify temporal dynamics into flatter trajectories over longer forecast horizons—a behavior that may limit their applicability for real-world long-term monitoring tasks. Overall, the findings highlight both the opportunities and the limitations of applying foundation models in agriculture, and provide a foundation for future research on data-driven animal health monitoring under realistic field conditions.

Author Contributions

Conceptualization, C.I.P. and T.Z.; methodology, C.I.P. and T.Z.; software, C.I.P. and T.Z.; validation, C.I.P. and T.Z.; formal analysis, C.I.P. and T.Z.; investigation, C.I.P. and T.Z.; resources, C.I.P. and T.Z.; data curation, T.Z.; writing—original draft preparation, C.I.P. and T.Z.; writing—review and editing, C.B. and T.L.; visualization, C.I.P. and T.Z.; supervision, C.B. and T.L.; project administration, C.I.P.; funding acquisition, T.L. All authors have read and agreed to the published version of the manuscript.

Funding

The project is supported by funds of the Federal Ministry of Agriculture, Food and Regional Identity (BMLEH) based on a decision of the Parliament of the Federal Republic of Germany via the Federal Office for Agriculture and Food (BLE) under the strategy for digitalisation in agriculture.

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the subject of the study merely requiring data analysis without direct animal involvement.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to data protection reasons and due to business confidentiality.

Acknowledgments

Different Generative AI models were used as subjects of the study. GenAI was used to adjust phrasing and spelling errors.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Prompt Strategies for LLM Based Forecasts

We used an experimental setup that allowed us to test a variety of prompt and data injection strategies with the LLMs and compare how they perform.

Appendix A.1. Spacing

Introducing spacing between the individual digits of input values, as suggested in [48], so that the tokenizer tokenizes each digit individually.
Table A1. Spacing strategies.
No Spacing | Spacing
0.856776 | 0 . 8 5 6 7 7 6
We were unable to corroborate the claim that added spacing increases performance for certain tokenizers. Instead, forecasting performance was always worse when spaces were added between the digits.
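The spacing transform from Table A1 can be sketched in one line; the function name is illustrative:

```python
def space_digits(value: str) -> str:
    """Insert a space between every character so the tokenizer sees
    each digit (and the decimal point) as its own token."""
    return " ".join(value)

print(space_digits("0.856776"))  # 0 . 8 5 6 7 7 6
```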

Appendix A.2. Input Length

Shortening the input from the original sequence length to half or a quarter of it. This did not have any positive effects, and we recommend always giving the LLM as much information as possible.

Appendix A.3. Data Format

The time series is given to the LLM either as a closed list of values enclosed by brackets, as an open list that ends in a comma, or as a tabular view with timestamps; the open-ended formats are intended to encourage the LLM to continue generating numbers.
Table A2. Data formats.
Open List | Closed List | Tabular
..., 0.856776, 0.867993, 0.882301, | [..., 0.856776, 0.867993, 0.882301] | ..., 12:34:26; 0.867993, 12:44:51; 0.882301
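The three formats of Table A2 can be sketched as small formatting helpers; function names and the fixed six-decimal precision are illustrative:

```python
def open_list(values):
    """Open list ending in a trailing comma, nudging the LLM to continue."""
    return ", ".join(f"{v:.6f}" for v in values) + ","

def closed_list(values):
    """Closed, bracket-enclosed list."""
    return "[" + ", ".join(f"{v:.6f}" for v in values) + "]"

def tabular(pairs):
    """Timestamp; value pairs separated by commas."""
    return ", ".join(f"{ts}; {v:.6f}" for ts, v in pairs)

vals = [0.856776, 0.867993, 0.882301]
print(open_list(vals))    # 0.856776, 0.867993, 0.882301,
print(closed_list(vals))  # [0.856776, 0.867993, 0.882301]
print(tabular([("12:34:26", 0.867993), ("12:44:51", 0.882301)]))
```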

Appendix A.4. Rounding

This entails rounding the input down from 16 digits to 12, 8, 7, 6, 4, or 2 digits after the decimal point. We found that not rounding the values leads to failure due to precision deterioration. Depending on the LLM, the amount of rounding necessary may vary, but for open-weight LLMs in the 7b-parameter range, the best forecasting performance was reached with both 7 and 6 digits.

Appendix A.5. Integer Transformation

Removing the decimal point to turn the numbers into simple integers, with two possible strategies: removing leading 0s or keeping them. Removing leading 0s encourages precision deterioration, usually leading to a failure to produce output values with the proper precision. Between keeping the original value and keeping leading 0s, performance depends on the original values. For values that frequently have two or more digits before the decimal point, no integer transformation is recommended; for values with only a single digit before the decimal point, such as most scaled datasets, the simple integer transformation is better.
Table A3. Integer transformation.
Original Value | Simple Integers | Complex Integers
0.985543 | 0985543 | 985543
1.112436 | 1112436 | 1112436
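The rounding and integer transformations can be sketched together; function names are illustrative, and the mapping of "simple" to keeping leading zeros follows our reading of Table A3:

```python
def round_series(values, digits=6):
    """Round inputs from full precision down to `digits` decimals."""
    return [round(v, digits) for v in values]

def to_integer(value: str, keep_leading_zero: bool = True) -> str:
    """Drop the decimal point; optionally also strip leading zeros
    (keep them for the 'simple', strip for the 'complex' variant)."""
    digits = value.replace(".", "")
    return digits if keep_leading_zero else digits.lstrip("0")

print(to_integer("0.985543"))         # 0985543
print(to_integer("0.985543", False))  # 985543
print(to_integer("1.112436"))         # 1112436
```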

Appendix A.6. Prediction Task

This means providing either the data as is, without a task, or a simple task description: predict the next 16 steps at once, predict missing steps until the prediction length is reached, or predict only the single next point. All tasks follow the prompt-as-prefix format that has been shown to be effective [43]. Our prompts can be found in Table A4. Introducing a task prompt affects the observed LLMs differently. For Falcon-7b, the prompts that ask the LLM to always predict the next 16 values, regardless of currently missing values, yield slightly better performance than no task prompt, while the other two tasks are slightly worse; overall, the differences between the prompts are very small for Falcon-7b. With LLama-3-8b, using no prediction task prompt performs best, predicting the next value and always predicting 16 are similar, and predicting the next missing values is worst. The performance differences are larger for LLama-3-8b than for Falcon-7b but still small. As a result, the best strategies for both models are distributed across the prediction strategies, with the exception of missing-value prediction.
Table A4. Prompt strategies used.
Prompt Type | Prompt
None | {data}
Predict 1 | Predict the next value of the following numeric sequence: {data}
Predict 16 | Predict the next 16 values of the following numeric sequence: {data}
Predict missing | Predict the next {missing} values of the following numeric sequence: {data}
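Assembling these prompt-as-prefix prompts can be sketched as follows; the dictionary keys are illustrative labels for the rows of Table A4:

```python
PROMPTS = {
    "none": "{data}",
    "predict_1": "Predict the next value of the following numeric sequence: {data}",
    "predict_16": "Predict the next 16 values of the following numeric sequence: {data}",
    "predict_missing": "Predict the next {missing} values of the following numeric sequence: {data}",
}

def build_prompt(kind, values, missing=None):
    """Fill one of the Table A4 templates with the serialized series."""
    data = ", ".join(str(v) for v in values)
    return PROMPTS[kind].format(data=data, missing=missing)

print(build_prompt("predict_16", [0.856776, 0.867993]))
```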

Appendix A.7. Knowledge Enhancement

Providing simple statistics about the previous 96 steps serves as a possible knowledge enhancement (KE). We tested settings that either list all knowledge enhancements or pick only some of them; the full list can be found in Table A5. Full KE and basic KE had negative effects on forecasting accuracy, while minimal KE and no KE show minimal performance differences and are effectively equivalent. The effectiveness of KE on these LLMs is limited due to the previously outlined issue of pattern deterioration.
Table A5. List of possible knowledge enhancements, and where they are used.
Information | Used by Full KE | Used by Basic KE | Used by Minimal KE
Minimal value | yes | yes | yes
Maximal value | yes | yes | yes
Mean value | yes | yes | no
Median value | yes | yes | no
Standard deviation | yes | yes | no
General trend | yes | yes | yes
Variance | yes | no | no
Range value | yes | no | no
25th percentile | yes | no | no
75th percentile | yes | no | no
Mode value | yes | no | no
Kurtosis value | yes | no | no
Skewness value | yes | no | no
Autocorrelation | yes | yes | no
Value count | yes | yes | yes
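Computing the statistics for the smaller KE settings can be sketched as follows; the exact phrasing injected into the prompt is illustrative, and the basic-KE sketch omits the autocorrelation term for brevity:

```python
import statistics

def minimal_ke(window):
    """Minimal-KE summary of the previous steps:
    min, max, general trend, and value count (cf. Table A5)."""
    trend = "rising" if window[-1] > window[0] else "falling"
    return (f"minimum: {min(window)}, maximum: {max(window)}, "
            f"trend: {trend}, count: {len(window)}")

def basic_ke(window):
    """Basic KE additionally includes mean, median, and std. dev."""
    return (minimal_ke(window)
            + f", mean: {statistics.mean(window):.3f}"
            + f", median: {statistics.median(window)}"
            + f", stdev: {statistics.stdev(window):.3f}")

print(minimal_ke([1.0, 1.2, 1.5, 1.4]))
```

The resulting summary string is prepended to the serialized time series; as noted above, only the minimal variant avoided hurting forecast accuracy.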

Appendix B. Results for the Individual Prediction Tasks

Table A6. Best results of model evaluation on raw water per day.
Dataset | Length | Model | Settings | MSE | MAE
Water/Day | 16 | LLama-3.1-8b | Round7, pred 16a | 57,840.102 | 68.47949
Water/Day | 16 | LLama-3.2-1b | Round6 | 58,941.66 | 64.88672
Water/Day | 16 | LLama-3.2-3b | Round6, minimal KE | 58,611.02 | 69.29980
Water/Day | 16 | LLama-3-8b | Round7, pred16a | 58,413.547 | 69.79590
Water/Day | 16 | Falcon-7b | Round7, pred1 | 57,588.238 | 72.88477
Water/Day | 16 | FEDformer | dm1024 nh8 el4 dl4 df2048 fc3 | 101,843.289 | 225.293
Water/Day | 16 | Autoformer | dm512 nh8 el4 dl4 df2048 fc2 | 138,818.594 | 322.072
Water/Day | 16 | Informer | dm1024 nh8 el4 dl4 df2048 fc3 | 627,469.875 | 743.984
Water/Day | 16 | Transformer | dm1024 nh8 el4 dl4 df2048 fc3 | 627,754.438 | 744.168
Water/Day | 16 | DLinear | - | 81,499.10 | 194.8440
Water/Day | 16 | NLinear | - | 77,887.39 | 186.6709
Water/Day | 16 | Linear | - | 89,925.68 | 209.81595
Water/Day | 16 | PatchTST | - | 35,381.09 | 109.3663
Water/Day | 16 | TimeLLM | Llama-2-7b | 88,190.01 | 219.6265
Water/Day | 16 | TimeLLM | GPT2 | 56,871.81 | 160.2766
Water/Day | 16 | TimeLLM | Llama-3.2-1b | 104,556.6 | 234.1954
Water/Day | 16 | TimeLLM | Llama-3.2-3b | 72,011.2 | 192.8992
Water/Day | 16 | TimeLLM | Llama-3.1-8b | 59,742.5 | 167.828
Water/Day | 16 | TimesFM | no fine tuning | 62,865.28 | 84.95092
Water/Day | 16 | ARIMA | 96, 2, 1 | 333,688.7494 | 462.066
Water/Day | 16 | Prophet | seasonality = daily, mode = multiplicative | 27,151.4590 | 114.79
Water/Day | 16 | XGBoost | estimators = 700, depth = 6 | 34,543.4298 | 75.05
Water/Day | 16 | LSTM | - | 132,292.36 | 299.84
Table A7. Best results of model evaluation on raw food per day.
Dataset | Length | Model | Settings | MSE | MAE
Food/Day | 16 | LLama-3.1-8b | Round6 | 8853.478 | 51.54004
Food/Day | 16 | LLama-3.2-1b | Round6 | 8853.478 | 51.54004
Food/Day | 16 | LLama-3.2-3b | Round6 | 8853.478 | 51.54004
Food/Day | 16 | LLama-3-8b | Round6 | 8853.478 | 51.54004
Food/Day | 16 | Falcon-7b | Round6, pred16a, integers with 0s | 8788.886 | 51.10449
Food/Day | 16 | FEDformer | dm1024 nh8 el4 dl4 df2048 fc3 | 4287.314 | 44.77009
Food/Day | 16 | Autoformer | dm512 nh8 el4 dl4 df2048 fc3 | 13,844.835 | 87.94547
Food/Day | 16 | Informer | dm1024 nh8 el4 dl4 df2048 fc3 | 28,238.459 | 139.49284
Food/Day | 16 | Transformer | dm1024 nh8 el4 dl4 df2048 fc3 | 33,873.613 | 157.26195
Food/Day | 16 | DLinear | - | 11,288.98 | 73.32854
Food/Day | 16 | NLinear | - | 10,931.58 | 73.36824
Food/Day | 16 | Linear | - | 12,085.05 | 77.60903
Food/Day | 16 | PatchTST | - | 1343.072 | 23.36775
Food/Day | 16 | TimeLLM | Llama-2-7b | 8176.939 | 66.14028
Food/Day | 16 | TimeLLM | GPT2 | 5703.720 | 51.06726
Food/Day | 16 | TimeLLM | Llama-3.2-1b | 9821.357 | 68.67682
Food/Day | 16 | TimeLLM | Llama-3.2-3b | 18,011.233 | 105.4492
Food/Day | 16 | TimeLLM | Llama-3.1-8b | 14,496.47 | 90.37284
Food/Day | 16 | TimesFM | no fine tuning | 9961.349 | 55.96097
Food/Day | 16 | ARIMA | 96, 0, 6 | 61,003.7543 | 198.2594
Food/Day | 16 | Prophet | seasonality = daily, mode = multiplicative | 8190.93964 | 70.3218606
Food/Day | 16 | XGBoost | estimators = 700, depth = 6 | 10,768.6810 | 50.45
Food/Day | 16 | LSTM | - | 15,114.26 | 95.41
Table A8. Best results of model evaluation on raw water per food.
Dataset | Length | Model | Settings | MSE | MAE
Water/Food | 16 | LLama-3.1-8b | Round6, integers with 0s | 0.708403 | 0.566563
Water/Food | 16 | LLama-3.2-1b | Round6 | 0.685307 | 0.557363
Water/Food | 16 | LLama-3.2-3b | Round7, pred 1, minimal KE | 0.668494 | 0.560215
Water/Food | 16 | LLama-3-8b | Round6, pred1, minimal KE | 0.730340 | 0.556641
Water/Food | 16 | Falcon-7b | Round7, pred16a | 0.441695 | 0.437510
Water/Food | 16 | FEDformer | dm512 nh8 el2 dl1 df2048 fc3 | 1.869564 | 1.134118
Water/Food | 16 | Autoformer | dm512 nh8 el2 dl1 df2048 fc3 | 1.076691 | 0.827988
Water/Food | 16 | Informer | dm512 nh8 el4 dl4 df2048 fc3 | 0.218783 | 0.327771
Water/Food | 16 | Transformer | dm512 nh8 el4 dl4 df2048 fc3 | 0.098450 | 0.242006
Water/Food | 16 | DLinear | - | 0.869079 | 0.756642
Water/Food | 16 | NLinear | - | 0.949909 | 0.789085
Water/Food | 16 | Linear | - | 0.770473 | 0.709996
Water/Food | 16 | PatchTST | - | 0.951654 | 0.725133
Water/Food | 16 | TimeLLM | Llama-2-7b | 1.469922 | 1.047710
Water/Food | 16 | TimeLLM | GPT2 | 1.046001 | 0.860559
Water/Food | 16 | TimeLLM | Llama-3.2-1b | 1.053665 | 0.870521
Water/Food | 16 | TimeLLM | Llama-3.2-3b | 1.074678 | 0.890120
Water/Food | 16 | TimeLLM | Llama-3.1-8b | 1.848694 | 1.189474
Water/Food | 16 | TimesFM | no fine tuning | 0.395222 | 0.404078
Water/Food | 16 | ARIMA | 96, 0, 6 | 31.72367 | 2.781132
Water/Food | 16 | Prophet | seasonality = daily, mode = multiplicative | 62.935840 | 6.58161
Water/Food | 16 | XGBoost | estimators = 700, depth = 6 | 23.39255 | 2.00468
Water/Food | 16 | LSTM | - | 0.6246 | 0.5562
Table A9. Best results of model evaluation on EOD 18-24 raw water per day.
Dataset | Length | Model | Settings | MSE | MAE
Water/Day | 35 | LLama-3.1-8b | Round6, integers leading 0 | 63,745.386 | 96.94633
Water/Day | 35 | LLama-3-8b | Round6, minimal KE, integer leading 0 | 64,702 | 97.65088
Water/Day | 35 | Falcon-7b | Round7, minimal KE, integer leading 0 | 61,410.516 | 108.23203
Water/Day | 35 | LLama-3.2-1b | Round7, integers leading 0 | 63,554.199 | 101.94559
Water/Day | 35 | LLama-3.2-3b | Round6, pred1 | 63,255.336 | 99.65455
Water/Day | 35 | FEDformer | dm1024 nh8 el4 dl4 df2048 fc3 | 44,349.059 | 122.17598
Water/Day | 35 | Autoformer | dm1024 nh8 el4 dl4 df2048 fc3 | 71,051.375 | 195.52776
Water/Day | 35 | Informer | dm1024 nh8 el4 dl4 df2048 fc3 | 636,301.12 | 752.30859
Water/Day | 35 | Transformer | dm1024 nh8 el4 dl4 df2048 fc3 | 635,873.75 | 752.04865
Water/Day | 35 | Linear | - | 47,395.52 | 153.22598
Water/Day | 35 | NLinear | - | 34,418.758 | 100.29855
Water/Day | 35 | DLinear | - | 45,670.336 | 146.07356
Water/Day | 35 | PatchTST | - | 30,045.186 | 88.355118
Water/Day | 35 | TimeLLM | GPT2 | 33,996.527 | 121.36325
Water/Day | 35 | TimeLLM | LLAMA | 27,202.824 | 103.54283
Water/Day | 35 | TimeLLM | LLama-3.1-8b | 33,246.043 | 135.12067
Water/Day | 35 | TimeLLM | LLama-3.2-1b | 44,537.82 | 157.351
Water/Day | 35 | TimeLLM | LLama-3.2-3b | 50,407.285 | 148.35048
Water/Day | 35 | TimesFM | - | 69,906.184 | 155.88891
Water/Day | 35 | ARIMA | 96, 0, 6 | 317,326.68748 | 451.2992161
Water/Day | 35 | Prophet | seasonality = daily, mode = multiplicative | 50,732.796193 | 164.133299
Water/Day | 35 | XGBoost | estimators = 700, depth = 6 | 55,122.3666 | 120.274808
Water/Day | 35 | LSTM | - | 226,152.1797 | 429.03188101
Table A10. Best results of model evaluation on EOD 18-24 raw food per day.
Dataset | Length | Model | Settings | MSE | MAE
Food/Day | 35 | LLama-3.1-8b | Round6 | 16,931.818 | 87.94107
Food/Day | 35 | LLama-3-8b | Round6 | 16,931.818 | 87.94107
Food/Day | 35 | Falcon-7b | Round6 | 16,931.818 | 87.94107
Food/Day | 35 | LLama-3.2-1b | Round6 | 16,931.818 | 87.94107
Food/Day | 35 | LLama-3.2-3b | Round6 | 16,931.818 | 87.94107
Food/Day | 35 | FEDformer | dm1024 nh8 el4 dl4 df2048 fc3 | 11,468.257 | 71.640099
Food/Day | 35 | Autoformer | dm1024 nh8 el4 dl4 df2048 fc3 | 17,393.186 | 92.439812
Food/Day | 35 | Informer | dm1024 nh8 el4 dl4 df2048 fc3 | 205,548.92 | 425.98523
Food/Day | 35 | Transformer | dm1024 nh8 el4 dl4 df2048 fc3 | 205,643.06 | 426.08453
Food/Day | 35 | Linear | - | 11,846.948 | 75.168358
Food/Day | 35 | NLinear | - | 6607.9839 | 48.441002
Food/Day | 35 | DLinear | - | 11,733.617 | 74.472069
Food/Day | 35 | PatchTST | - | 5418.8311 | 36.753399
Food/Day | 35 | TimeLLM | GPT2 | 8365.8789 | 61.055107
Food/Day | 35 | TimeLLM | LLAMA | 7521.2197 | 48.208687
Food/Day | 35 | TimeLLM | LLama-3.1-8b | 7828.5215 | 56.618542
Food/Day | 35 | TimeLLM | LLama-3.2-1b | 12,181.811 | 79.305267
Food/Day | 35 | TimeLLM | LLama-3.2-3b | 11,142.699 | 66.228477
Food/Day | 35 | TimesFM | - | 15,674.906 | 71.358452
Food/Day | 35 | ARIMA | 96, 0, 6 | 58,600.825657 | 194.563275
Food/Day | 35 | Prophet | seasonality = daily, mode = multiplicative | 20,569.6108 | 124.62285
Food/Day | 35 | XGBoost | estimators = 700, depth = 6 | 8432.62189 | 52.20021
Food/Day | 35 | LSTM | - | 94,945.739959 | 280.33244
Table A11. Best results of model evaluation on EOD 18-24 raw water per food per day.
Dataset | Length | Model | Settings | MSE | MAE
Water/Food | 35 | LLama-3.1-8b | Round7, integers leading 0 | 0.530766 | 0.403786
Water/Food | 35 | LLama-3-8b | Round7, minimal KE, integers leading 0 | 0.543724 | 0.404424
Water/Food | 35 | Falcon-7b | Round7 | 0.342801 | 0.304580
Water/Food | 35 | LLama-3.2-1b | Round7, pred1, integers leading 0 | 0.474231 | 0.384112
Water/Food | 35 | LLama-3.2-3b | Round7 | 0.567289 | 0.429888
Water/Food | 35 | FEDformer | dm512 nh8 el4 dl4 df2048 fc2 | 12.721095 | 1.6575569
Water/Food | 35 | Autoformer | dm512 nh8 el4 dl4 df2048 fc2 | 20.563625 | 2.1462989
Water/Food | 35 | Informer | dm512 nh8 el4 dl4 df2048 fc3 | 0.5531605 | 0.5554001
Water/Food | 35 | Transformer | dm512 nh8 el2 dl1 df2048 fc3 | 0.5594669 | 0.5674024
Water/Food | 35 | Linear | - | 6.7625961 | 1.5873865
Water/Food | 35 | NLinear | - | 7.9038653 | 1.5324063
Water/Food | 35 | DLinear | - | 7.772625 | 1.8593231
Water/Food | 35 | PatchTST | - | 8.894248 | 1.6568558
Water/Food | 35 | TimeLLM | GPT2 | 21.213932 | 2.5848339
Water/Food | 35 | TimeLLM | LLAMA | 14.042688 | 2.2479351
Water/Food | 35 | TimeLLM | LLama-3.1-8b | 17.969053 | 2.6226354
Water/Food | 35 | TimeLLM | LLama-3.2-1b | 111.3106 | 5.1977305
Water/Food | 35 | TimeLLM | LLama-3.2-3b | 42.425476 | 3.2953618
Water/Food | 35 | TimesFM | - | 0.5520953 | 0.4224015
Water/Food | 35 | ARIMA | 96, 0, 6 | 26.8833622 | 2.445425
Water/Food | 35 | Prophet | seasonality = daily, mode = multiplicative | 0.212537 | 0.40059
Water/Food | 35 | XGBoost | estimators = 700, depth = 6 | 1.100196 | 0.396809
Water/Food | 35 | LSTM | - | 5.0412 | 1.80235
Table A12. Best results of model evaluation on EOD 12-24 on raw water per day.
Dataset | Length | Model | Settings | MSE | MAE
Water/Day | 72 | LLama-3.1-8b | Round7 | 127,219.5 | 179.63584
Water/Day | 72 | LLama-3-8b | Round6 | 400,398.2 | 232.81554
Water/Day | 72 | Falcon-7b | Round7 | 99,966.3 | 188.24978
Water/Day | 72 | LLama-3.2-1b | Round6 | 215,273.1 | 211.50824
Water/Day | 72 | LLama-3.2-3b | Round6 | 128,601.8 | 183.02409
Water/Day | 72 | FEDformer | dm1024 nh8 el4 dl4 df2048 fc3 | 67,055.773 | 150.94092
Water/Day | 72 | Autoformer | dm512 nh8 el4 dl4 df2048 fc2 | 70,955.727 | 180.86993
Water/Day | 72 | Informer | dm1024 nh8 el4 dl4 df2048 fc3 | 456,403.38 | 610.25104
Water/Day | 72 | Transformer | dm1024 nh8 el4 dl4 df2048 fc3 | 455,644.03 | 609.69543
Water/Day | 72 | Linear | - | 99,236.414 | 236.90807
Water/Day | 72 | NLinear | - | 39,264.383 | 117.4375
Water/Day | 72 | DLinear | - | 67,357.328 | 177.82516
Water/Day | 72 | PatchTST | - | 30,868.742 | 99.920082
Water/Day | 72 | TimeLLM | GPT2 | 36,441.277 | 115.92726
Water/Day | 72 | TimeLLM | LLAMA | 48,275.848 | 135.40031
Water/Day | 72 | TimeLLM | LLama-3.1-8b | 69,104.68 | 189.53014
Water/Day | 72 | TimeLLM | LLama-3.2-1b | 62,497.641 | 170.28175
Water/Day | 72 | TimeLLM | LLama-3.2-3b | 41,100.262 | 129.60506
Water/Day | 72 | TimesFM | - | 97,844.638 | 192.79371
Water/Day | 72 | ARIMA | 96, 0, 6 | 316,569.758632 | 450.2248
Water/Day | 72 | Prophet | seasonality = daily, mode = multiplicative | 32,817.351578 | 131.943
Water/Day | 72 | XGBoost | estimators = 700, depth = 6 | 126,232.35895 | 164.597509
Water/Day | 72 | LSTM | - | 158,568.514725 | 329.969951
Table A13. Best results of model evaluation on EOD 12-24 on raw food per day.
Dataset | Length | Model | Settings | MSE | MAE
Food/Day | 72 | LLama-3.1-8b | Round6 | 43,614.63 | 168.08182
Food/Day | 72 | LLama-3-8b | Round6 | 43,614.63 | 168.08182
Food/Day | 72 | Falcon-7b | Round6 | 43,614.63 | 168.08182
Food/Day | 72 | LLama-3.2-1b | Round6 | 43,614.63 | 168.08182
Food/Day | 72 | LLama-3.2-3b | Round6 | 43,614.63 | 168.08182
Food/Day | 72 | FEDformer | dm1024 nh8 el4 dl4 df2048 fc3 | 15,813.656 | 96.876381
Food/Day | 72 | Autoformer | dm512 nh8 el2 dl1 df2048 fc3 | 16,074.986 | 95.071198
Food/Day | 72 | Informer | dm512 nh8 el4 dl4 df2048 fc2 | 11,043.87 | 71.900696
Food/Day | 72 | Transformer | dm1024 nh8 el4 dl4 df2048 fc3 | 8197.9727 | 60.516228
Food/Day | 72 | Linear | - | 32,399.889 | 142.77518
Food/Day | 72 | NLinear | - | 13,294.714 | 70.150322
Food/Day | 72 | DLinear | - | 16,815.316 | 94.899933
Food/Day | 72 | PatchTST | - | 8156.7891 | 49.072395
Food/Day | 72 | TimeLLM | GPT2 | 14,227.891 | 81.751991
Food/Day | 72 | TimeLLM | LLAMA | 15,507.049 | 88.383728
Food/Day | 72 | TimeLLM | LLama-3.1-8b | 10,399.145 | 68.121513
Food/Day | 72 | TimeLLM | LLama-3.2-1b | 10,741.901 | 67.632515
Food/Day | 72 | TimeLLM | LLama-3.2-3b | 11,974.566 | 76.812675
Food/Day | 72 | TimesFM | - | 26,058.219 | 107.60436
Food/Day | 72 | ARIMA | 96, 0, 6 | 58,525.7958 | 194.235154
Food/Day | 72 | Prophet | seasonality = daily, mode = multiplicative | 13,867.42374 | 99.8531971
Food/Day | 72 | XGBoost | estimators = 700, depth = 6 | 10,311.58899 | 61.2380111
Food/Day | 72 | LSTM | - | 61,487.02896 | 208.947407
Table A14. Best results of model evaluation on EOD 12-24 on raw water per food.
Dataset | Length | Model | Settings | MSE | MAE
Water/Food | 72 | LLama-3.1-8b | Round6 | 29.75216 | 1.868294
Water/Food | 72 | LLama-3-8b | Round6 | 30.96716 | 1.946766
Water/Food | 72 | Falcon-7b | Round6 | 26.77479 | 1.512218
Water/Food | 72 | LLama-3.2-1b | Round7 | 31.18094 | 1.781682
Water/Food | 72 | LLama-3.2-3b | Round6 | 31.00623 | 1.874792
Water/Food | 72 | FEDformer | dm512 nh8 el2 dl1 df2048 fc3 | 47.567806 | 3.6072099
Water/Food | 72 | Autoformer | dm512 nh8 el4 dl4 df2048 fc2 | 47.90118 | 3.6683018
Water/Food | 72 | Informer | dm1024 nh8 el4 dl4 df2048 fc3 | 4.2334499 | 0.7910022
Water/Food | 72 | Transformer | dm512 nh8 el2 dl1 df2048 fc3 | 4.6746197 | 0.9696611
Water/Food | 72 | Linear | - | 18.464836 | 2.503298
Water/Food | 72 | NLinear | - | 31.623196 | 2.3313577
Water/Food | 72 | DLinear | - | 17.509649 | 2.6272669
Water/Food | 72 | PatchTST | - | 19.467442 | 2.3075855
Water/Food | 72 | TimeLLM | GPT2 | 128.50887 | 5.5041637
Water/Food | 72 | TimeLLM | LLAMA | 26.97904 | 2.9874995
Water/Food | 72 | TimeLLM | LLama-3.1-8b | 49.222603 | 3.8950772
Water/Food | 72 | TimeLLM | LLama-3.2-1b | 71.824181 | 4.582509
Water/Food | 72 | TimeLLM | LLama-3.2-3b | 110.70022 | 5.2826319
Water/Food | 72 | TimesFM | - | 1.9870864 | 0.8731274
Water/Food | 72 | ARIMA | 96, 0, 6 | 27.13487 | 2.46185
Water/Food | 72 | Prophet | seasonality = daily, mode = multiplicative | 0.23685 | 0.40524
Water/Food | 72 | XGBoost | estimators = 700, depth = 6 | 9.53428099 | 0.7338821
Water/Food | 72 | LSTM | - | 16.30125233 | 2.889976
Table A15. Best results of model evaluation on EOD 08-24 on raw water per day.
| Dataset | Length | Model | Settings | MSE | MAE |
|---|---|---|---|---|---|
| Water/Day | 94 | LLama-3.1-8b | Round6 | 134,045.0 | 229.66190 |
| Water/Day | 94 | LLama-3-8b | Round6, minimal KE | 132,584.4 | 228.46576 |
| Water/Day | 94 | Falcon-7b | Round6 | 160,179.5 | 281.47873 |
| Water/Day | 94 | LLama-3.2-1b | Round6 | 181,587.5 | 268.71460 |
| Water/Day | 94 | LLama-3.2-3b | Round7 | 136,423.6 | 237.23671 |
| Water/Day | 94 | FEDformer | dm1024 nh8 el4 dl4 df2048 fc3 | 154,820.84 | 205.27248 |
| Water/Day | 94 | Autoformer | dm1024 nh8 el4 dl4 df2048 fc3 | 106,292.59 | 231.01073 |
| Water/Day | 94 | Informer | dm1024 nh8 el4 dl4 df2048 fc3 | 375,410.12 | 538.77234 |
| Water/Day | 94 | Transformer | dm1024 nh8 el4 dl4 df2048 fc3 | 375,288.69 | 538.6684 |
| Water/Day | 94 | Linear | | 151,499.34 | 290.81503 |
| Water/Day | 94 | NLinear | | 81,248.977 | 191.62004 |
| Water/Day | 94 | DLinear | | 81,626.609 | 196.50258 |
| Water/Day | 94 | PatchTST | | 56,002.957 | 136.36386 |
| Water/Day | 94 | TimeLLM | GPT2 | 62,078.516 | 158.99992 |
| Water/Day | 94 | TimeLLM | LLAMA | 64,618.055 | 171.73296 |
| Water/Day | 94 | TimeLLM | LLama-3.2-1b | 68,709.617 | 178.30528 |
| Water/Day | 94 | TimeLLM | LLama-3.2-3b | 69,128.562 | 180.73517 |
| Water/Day | 94 | TimesFM | | 167,282.64 | 308.11604 |
| Water/Day | 94 | ARIMA | 96, 0, 6 | 314,858.7631 | 448.9585 |
| Water/Day | 94 | Prophet | seasonality = daily, mode = multiplicative | 23,627.412932 | 107.2348 |
| Water/Day | 94 | XGBoost | estimators = 700, depth = 6 | 37,955.39699 | 108.960082 |
| Water/Day | 94 | LSTM | | 54,214.290572 | 193.1003371 |
Table A16. Best results of model evaluation on EOD 08-24 on raw food per day.
| Dataset | Length | Model | Settings | MSE | MAE |
|---|---|---|---|---|---|
| Food/Day | 94 | LLama-3.1-8b | Round6 | 106,993.9 | 273.90292 |
| Food/Day | 94 | LLama-3-8b | Round6 | 106,993.9 | 273.90292 |
| Food/Day | 94 | Falcon-7b | Round6 | 106,993.9 | 273.90292 |
| Food/Day | 94 | LLama-3.2-1b | Round6 | 106,993.9 | 273.90292 |
| Food/Day | 94 | LLama-3.2-3b | Round6 | 106,993.9 | 273.90292 |
| Food/Day | 94 | FEDformer | dm1024 nh8 el4 dl4 df2048 fc3 | 63,938.793 | 203.65533 |
| Food/Day | 94 | Autoformer | dm512 nh8 el4 dl4 df2048 fc2 | 34,084.82 | 144.66563 |
| Food/Day | 94 | Informer | dm1024 nh8 el4 dl4 df2048 fc3 | 112,796.54 | 289.76105 |
| Food/Day | 94 | Transformer | dm1024 nh8 el4 dl4 df2048 fc3 | 112,717.09 | 289.63867 |
| Food/Day | 94 | Linear | | 55,614.445 | 187.25125 |
| Food/Day | 94 | NLinear | | 54,986.531 | 183.73718 |
| Food/Day | 94 | DLinear | | 24,125.869 | 119.84232 |
| Food/Day | 94 | PatchTST | | 9165.3945 | 64.747391 |
| Food/Day | 94 | TimeLLM | GPT2 | 11,608.899 | 75.451408 |
| Food/Day | 94 | TimeLLM | LLAMA | 10,372.419 | 66.009544 |
| Food/Day | 94 | TimeLLM | LLama-3.2-1b | 18,642.949 | 98.507324 |
| Food/Day | 94 | TimeLLM | LLama-3.2-3b | 10,981.992 | 69.961159 |
| Food/Day | 94 | TimesFM | | 108,090.62 | 275.52387 |
| Food/Day | 94 | ARIMA | 96, 0, 6 | 58,241.02083 | 193.74541 |
| Food/Day | 94 | Prophet | seasonality = daily, mode = multiplicative | 10,607.051844 | 83.265704 |
| Food/Day | 94 | XGBoost | estimators = 700, depth = 6 | 6665.1874871 | 52.4635754 |
| Food/Day | 94 | LSTM | | 18,821.261799 | 119.3361072 |
Table A17. Best results of model evaluation on EOD 08-24 on raw water per food.
| Dataset | Length | Model | Settings | MSE | MAE |
|---|---|---|---|---|---|
| Water/Food | 94 | LLama-3.1-8b | Round6 | 2064.6154 | 25.13210 |
| Water/Food | 94 | LLama-3-8b | Round6 | 2407.4668 | 26.75854 |
| Water/Food | 94 | Falcon-7b | Round6 | 1017.1714 | 18.91526 |
| Water/Food | 94 | LLama-3.2-1b | Round7 | 1668.2446 | 23.90957 |
| Water/Food | 94 | LLama-3.2-3b | Round7, integers lead 0 | 18.56463 | 2.491583 |
| Water/Food | 94 | FEDformer | dm512 nh8 el2 dl1 df2048 fc3 | 107.50979 | 5.9102168 |
| Water/Food | 94 | Autoformer | dm512 nh8 el2 dl1 df2048 fc3 | 103.11153 | 5.7164969 |
| Water/Food | 94 | Informer | dm512 nh8 el4 dl4 df2048 fc3 | 9.4249907 | 1.1121691 |
| Water/Food | 94 | Transformer | dm1024 nh8 el4 dl4 df2048 fc3 | 9.66008 | 0.9591334 |
| Water/Food | 94 | Linear | | 32.426571 | 3.2802207 |
| Water/Food | 94 | NLinear | | 532.24011 | 12.311095 |
| Water/Food | 94 | DLinear | | 33.396038 | 3.2424126 |
| Water/Food | 94 | PatchTST | | 48.766281 | 3.8374641 |
| Water/Food | 94 | TimeLLM | GPT2 | 54.385811 | 4.5503159 |
| Water/Food | 94 | TimeLLM | LLAMA | 65.39135 | 4.5003433 |
| Water/Food | 94 | TimeLLM | LLama-3.2-1b | 154.21689 | 6.2390552 |
| Water/Food | 94 | TimeLLM | LLama-3.2-3b | 56.128967 | 4.3136649 |
| Water/Food | 94 | TimesFM | | 362.32859 | 9.1224683 |
| Water/Food | 94 | ARIMA | 96, 0, 6 | 2,727,683,668 | 2.46731 |
| Water/Food | 94 | Prophet | seasonality = daily, mode = multiplicative | 15.434231 | 0.858939 |
| Water/Food | 94 | XGBoost | estimators = 700, depth = 6 | 95.4685184 | 4.0301748 |
| Water/Food | 94 | LSTM | | 103.346606 | 5.0909842 |
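For the classical baselines, the Settings column abbreviates the usual library hyperparameters. The sketch below maps the notation onto the standard constructor arguments of statsmodels, Prophet, and XGBoost; the keyword names are the common library parameter names, assumed rather than taken from the project's code.

```python
# ARIMA "96, 0, 6": order (p, d, q) = 96 autoregressive lags, no differencing,
# and 6 moving-average terms (e.g. statsmodels' ARIMA(endog, order=arima_order)).
arima_order = (96, 0, 6)

# Prophet "seasonality = daily, mode = multiplicative" as constructor kwargs.
prophet_kwargs = {"daily_seasonality": True, "seasonality_mode": "multiplicative"}

# XGBoost "estimators = 700, depth = 6" as XGBRegressor kwargs.
xgb_kwargs = {"n_estimators": 700, "max_depth": 6}
```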
This test forecasts the values between 06:00:00 and 08:00:00, using the start-of-day value as input. This window is critical, as it covers the time when the turkeys are just waking up.
Table A18. Best results of model evaluation on EOD 06-08 on raw water per day.
| Dataset | Length | Model | Settings | MSE | MAE |
|---|---|---|---|---|---|
| Water/Day | 12 | LLama-3.1-8b | Round6 | 4250.3726 | 38.61718 |
| Water/Day | 12 | LLama-3-8b | Round6, minimal KE | 4225.8384 | 38.21875 |
| Water/Day | 12 | Falcon-7b | Round6, minimal KE | 4263.4897 | 38.72395 |
| Water/Day | 12 | LLama-3.2-1b | Round7 | 4262.8374 | 38.78515 |
| Water/Day | 12 | LLama-3.2-3b | Round7 | 4235.9282 | 38.51432 |
| Water/Day | 12 | FEDformer | dm512 nh8 el2 dl1 df2048 fc3 | 4264.6294 | 46.152172 |
| Water/Day | 12 | Autoformer | dm1024 nh8 el4 dl4 df2048 fc3 | 4363.5503 | 45.404327 |
| Water/Day | 12 | Informer | dm1024 nh8 el4 dl4 df2048 fc3 | 43,233.488 | 177.9469 |
| Water/Day | 12 | Transformer | dm1024 nh8 el4 dl4 df2048 fc3 | 43,026.168 | 177.40883 |
| Water/Day | 12 | Linear | | 14,518.612 | 92.228477 |
| Water/Day | 12 | NLinear | | 7100.5547 | 57.105877 |
| Water/Day | 12 | DLinear | | 11,145.277 | 76.910149 |
| Water/Day | 12 | PatchTST | | 2217.4202 | 23.792437 |
| Water/Day | 12 | TimeLLM | GPT2 | 4171.312 | 44.646255 |
| Water/Day | 12 | TimeLLM | LLAMA | 22,659.805 | 115.91663 |
| Water/Day | 12 | TimeLLM | LLama-3.1-8b | 6940.5845 | 62.111256 |
| Water/Day | 12 | TimeLLM | LLama-3.2-1b | 9221.1641 | 72.825569 |
| Water/Day | 12 | TimeLLM | LLama-3.2-3b | 33,299.645 | 127.93154 |
| Water/Day | 12 | TimesFM | | 31,208.614 | 113.44911 |
| Water/Day | 12 | ARIMA | 96, 0, 6 | 314,669.637 | 448.9622 |
| Water/Day | 12 | Prophet | seasonality = daily, mode = multiplicative | 6706.82274 | 64.3016 |
| Water/Day | 12 | XGBoost | estimators = 700, depth = 6 | 36,346.71904 | 68.90266777 |
| Water/Day | 12 | LSTM | | 34,974.1794334 | 173.6668 |
Table A19. Best results of model evaluation on EOD 06-08 on raw food per day.
| Dataset | Length | Model | Settings | MSE | MAE |
|---|---|---|---|---|---|
| Food/Day | 12 | LLama-3.1-8b | Round6 | 606.5755 | 18.601563 |
| Food/Day | 12 | LLama-3-8b | Round6 | 606.5755 | 18.601563 |
| Food/Day | 12 | Falcon-7b | Round6 | 606.5755 | 18.601563 |
| Food/Day | 12 | LLama-3.2-1b | Round6 | 606.5755 | 18.601563 |
| Food/Day | 12 | LLama-3.2-3b | Round6 | 606.5755 | 18.601563 |
| Food/Day | 12 | FEDformer | dm1024 nh8 el4 dl4 df2048 fc3 | 3667.6523 | 53.018097 |
| Food/Day | 12 | Autoformer | dm1024 nh8 el4 dl4 df2048 fc3 | 1733.0547 | 30.34667 |
| Food/Day | 12 | Informer | dm1024 nh8 el4 dl4 df2048 fc3 | 2889.9724 | 27.680847 |
| Food/Day | 12 | Transformer | dm512 nh8 el4 dl4 df2048 fc2 | 3113.7939 | 29.272665 |
| Food/Day | 12 | Linear | | 2254.8643 | 36.60244 |
| Food/Day | 12 | NLinear | | 1876.588 | 32.061527 |
| Food/Day | 12 | DLinear | | 2043.4894 | 33.071217 |
| Food/Day | 12 | PatchTST | | 744.31055 | 17.705568 |
| Food/Day | 12 | TimeLLM | GPT2 | 2182.7136 | 36.632145 |
| Food/Day | 12 | TimeLLM | LLAMA | 829.46259 | 19.06687 |
| Food/Day | 12 | TimeLLM | LLama-3.1-8b | 1670.5341 | 29.97143 |
| Food/Day | 12 | TimeLLM | LLama-3.2-1b | 849.55096 | 22.064957 |
| Food/Day | 12 | TimeLLM | LLama-3.2-3b | 926.60138 | 22.062468 |
| Food/Day | 12 | TimesFM | | 63,453.254 | 186.62499 |
| Food/Day | 12 | ARIMA | 96, 0, 6 | 58,720.45933 | 193.85355 |
| Food/Day | 12 | Prophet | seasonality = daily, mode = multiplicative | 718.2456 | 20.33473 |
| Food/Day | 12 | XGBoost | estimators = 700, depth = 6 | 950.585651 | 21.6275 |
| Food/Day | 12 | LSTM | | 21,673.792321 | 141.724656 |
Table A20. Best results of model evaluation on EOD 06-08 on raw water per food.
| Dataset | Length | Model | Settings | MSE | MAE |
|---|---|---|---|---|---|
| Water/Food | 12 | LLama-3.1-8b | Round6 | 65.23934 | 3.512001 |
| Water/Food | 12 | LLama-3-8b | Round6 | 65.56091 | 3.466901 |
| Water/Food | 12 | Falcon-7b | Round7 | 64.68107 | 3.4630734 |
| Water/Food | 12 | LLama-3.2-1b | Round7 | 65.25236 | 3.502396 |
| Water/Food | 12 | LLama-3.2-3b | Round6 | 65.26582 | 3.495886 |
| Water/Food | 12 | FEDformer | dm512 nh8 el4 dl4 df2048 fc3 | 76.843819 | 4.4923301 |
| Water/Food | 12 | Autoformer | dm512 nh8 el2 dl1 df2048 fc3 | 74.666283 | 4.2812572 |
| Water/Food | 12 | Informer | dm1024 nh8 el4 dl4 df2048 fc3 | 211.39128 | 7.1620622 |
| Water/Food | 12 | Transformer | dm1024 nh8 el4 dl4 df2048 fc3 | 192.72319 | 6.7343025 |
| Water/Food | 12 | Linear | | 313.64319 | 9.3907223 |
| Water/Food | 12 | NLinear | | 95.085266 | 4.3722696 |
| Water/Food | 12 | DLinear | | 325.85709 | 9.6598787 |
| Water/Food | 12 | PatchTST | | 141.24937 | 4.6548424 |
| Water/Food | 12 | TimeLLM | GPT2 | 217.675 | 7.4429684 |
| Water/Food | 12 | TimeLLM | LLAMA | 165.22368 | 6.4503074 |
| Water/Food | 12 | TimeLLM | LLama-3.1-8b | 172.60667 | 5.9101272 |
| Water/Food | 12 | TimeLLM | LLama-3.2-1b | 147.49525 | 5.9639492 |
| Water/Food | 12 | TimeLLM | LLama-3.2-3b | 169.36555 | 5.6974831 |
| Water/Food | 12 | TimesFM | | 75.693913 | 4.0768743 |
| Water/Food | 12 | ARIMA | 96, 0, 6 | 27.3321 | 2.46678 |
| Water/Food | 12 | Prophet | seasonality = daily, mode = multiplicative | 66.66029 | 6.018013 |
| Water/Food | 12 | XGBoost | estimators = 700, depth = 6 | 104.99004 | 4.794791 |
| Water/Food | 12 | LSTM | | 388.5438684 | 10.96561 |
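All of the appendix tables rank models by MSE and MAE over the forecast horizon. As a point of reference, a minimal pure-Python sketch of both metrics follows; the intake values below are hypothetical toy numbers, not taken from the KINLI dataset.

```python
def mse(y_true, y_pred):
    """Mean squared error over a forecast horizon."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error over a forecast horizon."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical 4-step water-intake forecast (ml) against observed values.
observed = [1200.0, 1350.0, 1500.0, 1600.0]
forecast = [1180.0, 1400.0, 1450.0, 1650.0]
print(mse(observed, forecast))  # 1975.0
print(mae(observed, forecast))  # 42.5
```

Since MAE can never exceed the root of MSE, any (MSE, MAE) pair in the tables can be sanity-checked with `mae <= mse ** 0.5`.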

References

  1. Kim, J.; Kim, H.; Kim, H.; Lee, D.; Yoon, S. A Comprehensive Survey of Time Series Forecasting: Architectural Diversity and Open Challenges. arXiv 2024, arXiv:2411.05793. [Google Scholar] [CrossRef]
  2. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis, 5th ed.; Wiley Series in Probability and Statistics; John Wiley & Sons: Nashville, TN, USA, 2015. [Google Scholar]
  3. Taylor, S.J.; Letham, B. Forecasting at Scale. Am. Stat. 2018, 72, 37–45. [Google Scholar] [CrossRef]
  4. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
  5. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  6. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems, Proceedings of the Conference and Workshop on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; NeurIPS Foundation: La Jolla, CA, USA, 2017; Volume 30. [Google Scholar]
  7. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
  8. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems, Proceedings of the Conference and Workshop on Neural Information Processing Systems 2020, Vancouver, BC, Canada, 6–12 December 2020; NeurIPS Foundation: La Jolla, CA, USA, 2020; Volume 33, pp. 1877–1901. [Google Scholar]
  9. OpenAI. Introducing ChatGPT. 2022. Available online: https://openai.com/index/chatgpt/ (accessed on 4 October 2023).
  10. OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
  11. Qin, Y.; Hu, S.; Lin, Y.; Chen, W.; Ding, N.; Cui, G.; Zeng, Z.; Huang, Y.; Xiao, C.; Han, C.; et al. Tool learning with foundation models. arXiv 2023, arXiv:2304.08354. [Google Scholar] [CrossRef]
  12. Bubeck, S.; Chandrasekaran, V.; Eldan, R.; Gehrke, J.; Horvitz, E.; Kamar, E.; Lee, P.; Lee, Y.T.; Li, Y.; Lundberg, S.; et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv 2023, arXiv:2303.12712. [Google Scholar] [CrossRef]
  13. Mirchandani, S.; Xia, F.; Florence, P.; Ichter, B.; Driess, D.; Arenas, M.G.; Rao, K.; Sadigh, D.; Zeng, A. Large language models as general pattern machines. arXiv 2023, arXiv:2307.04721. [Google Scholar] [CrossRef]
  14. Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.H.; Le, Q.; Zhou, D. Chain of Thought Prompting Elicits Reasoning in Large Language Models. arXiv 2022, arXiv:2201.11903. [Google Scholar]
  15. Li, S.; Jin, X.; Xuan, Y.; Zhou, X.; Chen, W.; Wang, Y.X.; Yan, X. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. In Advances in Neural Information Processing Systems, Proceedings of the Conference and Workshop on Neural Information Processing Systems 2019, Vancouver, BC, Canada, 8–14 December 2019; NeurIPS Foundation: La Jolla, CA, USA, 2019; Volume 32. [Google Scholar]
  16. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 11106–11115. [Google Scholar]
  17. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. In Advances in Neural Information Processing Systems, Proceedings of the Conference and Workshop on Neural Information Processing Systems 2021, Online, 6–14 December 2021; NeurIPS Foundation: La Jolla, CA, USA, 2021; Volume 34, pp. 22419–22430. [Google Scholar]
  18. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286. [Google Scholar]
  19. Cao, H.; Huang, Z.; Yao, T.; Wang, J.; He, H.; Wang, Y. InParformer: Evolutionary Decomposition Transformers with Interactive Parallel Attention for Long-Term Time Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 6906–6915. [Google Scholar]
  20. Liu, Y.; Wu, H.; Wang, J.; Long, M. Non-stationary transformers: Exploring the stationarity in time series forecasting. In Advances in Neural Information Processing Systems, Proceedings of the Conference and Workshop on Neural Information Processing Systems 2022, New Orleans, LA, USA, 28 November–9 December 2022; NeurIPS Foundation: La Jolla, CA, USA, 2022; Volume 35, pp. 9881–9893. [Google Scholar]
  21. Sasal, L.; Chakraborty, T.; Hadid, A. W-Transformers: A Wavelet-based Transformer Framework for Univariate Time Series Forecasting. In Proceedings of the 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), Nassau, Bahamas, 12–14 December 2022; IEEE: New York, NY, USA, 2022; pp. 671–676. [Google Scholar]
  22. Haugsdal, E.; Aune, E.; Ruocco, M. Persistence initialization: A novel adaptation of the transformer architecture for time series forecasting. Appl. Intell. 2023, 53, 26781–26796. [Google Scholar] [CrossRef]
  23. Shabani, A.; Abdi, A.; Meng, L.; Sylvain, T. Scaleformer: Iterative multi-scale refining transformers for time series forecasting. arXiv 2022, arXiv:2206.04038. [Google Scholar]
  24. Wang, Y.; Wu, H.; Dong, J.; Qin, G.; Zhang, H.; Liu, Y.; Qiu, Y.; Wang, J.; Long, M. Timexer: Empowering transformers for time series forecasting with exogenous variables. In Advances in Neural Information Processing Systems, Proceedings of the Conference and Workshop on Neural Information Processing Systems 2024, Vancouver, BC, Canada, 9–14 December 2024; NeurIPS Foundation: La Jolla, CA, USA, 2024; Volume 37, pp. 469–498. [Google Scholar]
  25. Cholakov, R.; Kolev, T. Transformers predicting the future. Applying attention in next-frame and time series forecasting. arXiv 2021, arXiv:2108.08224. [CrossRef]
  26. Lara-Benítez, P.; Gallego-Ledesma, L.; Carranza-García, M.; Luna-Romera, J.M. Evaluation of the transformer architecture for univariate time series forecasting. In Advances in Artificial Intelligence, Proceedings of the 19th Conference of the Spanish Association for Artificial Intelligence, CAEPIA 2020/2021, Málaga, Spain, 22–24 September 2021; Proceedings 19; Springer: Berlin/Heidelberg, Germany, 2021; pp. 106–115. [Google Scholar]
  27. Shi, J.; Jain, M.; Narasimhan, G. Time series forecasting (tsf) using various deep learning models. arXiv 2022, arXiv:2204.11115. [Google Scholar] [CrossRef]
  28. Lim, B.; Arık, S.Ö.; Loeff, N.; Pfister, T. Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 2021, 37, 1748–1764. [Google Scholar] [CrossRef]
  29. HajiAkhondi-Meybodi, Z.; Mohammadi, A.; Hou, M.; Rahimian, E.; Heidarian, S.; Abouei, J.; Plataniotis, K.N. Multi-Content Time-Series Popularity Prediction with Multiple-Model Transformers in MEC Networks. arXiv 2022, arXiv:2210.05874. [Google Scholar]
  30. Sousa, M.G.; Sakiyama, K.; de Souza Rodrigues, L.; Moraes, P.H.; Fernandes, E.R.; Matsubara, E.T. BERT for stock market sentiment analysis. In Proceedings of the 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), Portland, OR, USA, 4–6 November 2019; IEEE: New York, NY, USA, 2019; pp. 1597–1601. [Google Scholar]
  31. Sonkiya, P.; Bajpai, V.; Bansal, A. Stock price prediction using BERT and GAN. arXiv 2021, arXiv:2107.09055. [Google Scholar] [CrossRef]
  32. Qin, J.; Zong, L. TS-BERT: A Fusion Model for Pre-training Time Series-Text Representations. 2021. Available online: https://openreview.net/forum?id=Fia60I79-4B (accessed on 28 August 2025).
  33. Yu, X.; Chen, Z.; Ling, Y.; Dong, S.; Liu, Z.; Lu, Y. Temporal Data Meets LLM–Explainable Financial Time Series Forecasting. arXiv 2023, arXiv:2306.11025. [Google Scholar]
  34. Xie, Q.; Han, W.; Zhang, X.; Lai, Y.; Peng, M.; Lopez-Lira, A.; Huang, J. PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance. arXiv 2023, arXiv:2306.05443. [Google Scholar] [CrossRef]
  35. Ericsson, L.; Gouk, H.; Loy, C.C.; Hospedales, T.M. Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Process. Mag. 2022, 39, 42–62. [Google Scholar] [CrossRef]
  36. Zhang, X.; Zhao, Z.; Tsiligkaridis, T.; Zitnik, M. Self-supervised contrastive pre-training for time series via time-frequency consistency. In Advances in Neural Information Processing Systems, Proceedings of the Conference and Workshop on Neural Information Processing Systems 2022, New Orleans, LA, USA, 28 November–9 December 2022; NeurIPS Foundation: La Jolla, CA, USA, 2022; Volume 35, pp. 3988–4003. [Google Scholar]
  37. Deldari, S.; Xue, H.; Saeed, A.; He, J.; Smith, D.V.; Salim, F.D. Beyond just vision: A review on self-supervised representation learning on multimodal and temporal data. arXiv 2022, arXiv:2206.02353. [Google Scholar] [CrossRef]
  38. Woo, G.; Liu, C.; Kumar, A.; Xiong, C.; Savarese, S.; Sahoo, D. Unified training of universal time series forecasting transformers. arXiv 2024, arXiv:2402.02592. [Google Scholar] [CrossRef]
  39. Das, A.; Kong, W.; Sen, R.; Zhou, Y. A decoder-only foundation model for time-series forecasting. arXiv 2023, arXiv:2310.10688. [Google Scholar]
  40. Zhou, T.; Niu, P.; Wang, X.; Sun, L.; Jin, R. One Fits All: Power General Time Series Analysis by Pretrained LM. arXiv 2023, arXiv:2302.11939. [Google Scholar]
  41. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A time series is worth 64 words: Long-term forecasting with transformers. arXiv 2022, arXiv:2211.14730. [Google Scholar]
  42. Chang, C.; Peng, W.; Chen, T.F. LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters. arXiv 2023, arXiv:2308.08469. [Google Scholar] [CrossRef]
  43. Jin, M.; Wang, S.; Ma, L.; Chu, Z.; Zhang, J.Y.; Shi, X.L.; Chen, P.Y.; Liang, Y.; Li, Y.F.; Pan, S.; et al. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. arXiv 2023, arXiv:2310.01728. [Google Scholar]
  44. Liu, X.; Hu, J.; Li, Y.; Diao, S.; Liang, Y.; Hooi, B.; Zimmermann, R. Unitime: A language-empowered unified model for cross-domain time series forecasting. In Proceedings of the ACM on Web Conference 2024, Singapore, 13–17 May 2024; pp. 4095–4106. [Google Scholar]
  45. Liu, Q.; Liu, X.; Liu, C.; Wen, Q.; Liang, Y. Time-FFM: Towards LM-Empowered Federated Foundation Model for Time Series Forecasting. arXiv 2024, arXiv:2405.14252. [Google Scholar]
  46. Ansari, A.F.; Stella, L.; Turkmen, C.; Zhang, X.; Mercado, P.; Shen, H.; Shchur, O.; Rangapuram, S.S.; Arango, S.P.; Kapoor, S.; et al. Chronos: Learning the Language of Time Series. arXiv 2024, arXiv:2403.07815. [Google Scholar] [CrossRef]
  47. Xue, H.; Salim, F.D. PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting. arXiv 2022, arXiv:2210.08964. [Google Scholar] [CrossRef]
  48. Gruver, N.; Finzi, M.; Qiu, S.; Wilson, A.G. Large Language Models Are Zero-Shot Time Series Forecasters. arXiv 2023, arXiv:2310.07820. [Google Scholar] [CrossRef]
  49. Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288. [Google Scholar] [CrossRef]
  50. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  51. Penedo, G.; Malartic, Q.; Hesslow, D.; Cojocaru, R.; Cappelli, A.; Alobeidli, H.; Pannier, B.; Almazrouei, E.; Launay, J. The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data, and web data only. arXiv 2023, arXiv:2306.01116. [Google Scholar] [CrossRef]
  52. Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar] [CrossRef]
  53. Meta. LLama-3-8b on Huggingface. 2024. Available online: https://huggingface.co/meta-llama/Meta-Llama-3-8B (accessed on 12 May 2024).
Figure 1. Visualization of the KINLI dataset.
Figure 2. Forecast MAE ordered by forecasting length: 12 corresponds to the 06:00:00 to 08:00:00 test, 16 to the 22:00:00 to end-of-day test, 35 to the 18:00:00 test, 72 to the 12:00:00 test, and 94 to the 08:00:00 test.
Figure 3. Forecasts of Informer: water per day in ml (left), food per day in g (center), water per food (right).
Figure 4. Forecasts of LLMs: water per day in ml (left), food per day in g (center), water per food (right).
Figure 5. Forecasts of TimesFM on water per day in ml: 18:00:00 (left), 12:00:00 (center), 08:00:00 (right).
Figure 6. TimesFM long-sequence forecasts for Water/Day in ml (08:00:00 to 24:00:00) capture the shape of the series better, but still fail to improve accuracy.
Figure 7. LLM long-sequence forecasts for Water/Day in ml (08:00:00 to 24:00:00) still show pattern deterioration.
Figure 8. Example of unrelated output with input (top) and output (bottom).
Figure 9. Examples of output deterioration with input (top) and precision deterioration (left) and pattern deterioration (right).
Table 1. Test results of long sequence forecasts on 08:00:00 to end-of-day (Water/Day) for select models.
| Dataset | Sequence Length | Forecasting Length | Model | MSE | MAE |
|---|---|---|---|---|---|
| Water/Day | 50 | 94 | LLama-3.2-3b | 144,172.3 | 246.67404 |
| Water/Day | 50 | 94 | Informer | 53,442.754 | 138.82674 |
| Water/Day | 50 | 94 | PatchTST | 56,044.457 | 136.47639 |
| Water/Day | 50 | 94 | TimeLLM-GPT-2 | 59,466.441 | 157.79073 |
| Water/Day | 50 | 94 | TimesFM | 167,285.69 | 308.22034 |
| Water/Day | 256 | 94 | LLama-3.2-3b | 123,470.1 | 196.3396 |
| Water/Day | 256 | 94 | Informer | 49,009.93 | 137.2414 |
| Water/Day | 256 | 94 | PatchTST | 43,613.63 | 119.1713 |
| Water/Day | 256 | 94 | TimeLLM-GPT-2 | 53,003.62 | 143.8254 |
| Water/Day | 256 | 94 | TimesFM | 106,913.9 | 241.1781 |
| Water/Day | 512 | 94 | LLama-3.2-3b | 124,592.6 | 201.5027 |
| Water/Day | 512 | 94 | Informer | 52,098.79 | 149.1073 |
| Water/Day | 512 | 94 | PatchTST | 50,142.46 | 143.5388 |
| Water/Day | 512 | 94 | TimeLLM-GPT-2 | 70,224.34 | 190.5589 |
| Water/Day | 512 | 94 | TimesFM | 106,130.4 | 243.7242 |
| Water/Day | 1024 | 94 | LLama-3.2-3b | 94,902.53 | 175.6282 |
| Water/Day | 1024 | 94 | Informer | 51,740.08 | 149.0719 |
| Water/Day | 1024 | 94 | PatchTST | 48,542.24 | 139.8018 |
| Water/Day | 1024 | 94 | TimeLLM-GPT-2 | 48,614.13 | 142.5252 |
| Water/Day | 1024 | 94 | TimesFM | 106,130.4 | 243.7242 |
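Comparisons like Table 1 reduce to a group-by-minimum over (sequence length, model) pairs. A minimal sketch of that reduction follows; the MAE values are illustrative placeholders, only loosely inspired by the table, and the variable names are our own.

```python
# (sequence_length, model) -> MAE; placeholder values for illustration only.
results = {
    (50, "PatchTST"): 136.5, (50, "Informer"): 138.8, (50, "TimesFM"): 308.2,
    (256, "PatchTST"): 119.2, (256, "Informer"): 137.2, (256, "TimesFM"): 241.2,
}

best = {}  # sequence_length -> (model, MAE) with the lowest error per length
for (seq_len, model), err in results.items():
    if seq_len not in best or err < best[seq_len][1]:
        best[seq_len] = (model, err)

print(best)  # {50: ('PatchTST', 136.5), 256: ('PatchTST', 119.2)}
```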
Share and Cite

Pack, C.I.; Zeiser, T.; Beecks, C.; Lutz, T. KINLI: Time Series Forecasting for Monitoring Poultry Health in Complex Pen Environments. Animals 2025, 15, 3180. https://doi.org/10.3390/ani15213180
