Communication

PLF-Mamba: Analyzing Individual Milk Yield Dynamics Under Data Scarcity Using Selective State Space Models

1 Department of Electronics and Communications Engineering, Kwangwoon University, Seoul 01897, Republic of Korea
2 Valiantdata Inc., Seongnam-si 13590, Republic of Korea
3 Department of Defense Acquisition Program, Kwangwoon University, Seoul 01897, Republic of Korea
* Author to whom correspondence should be addressed.
Agriculture 2026, 16(3), 389; https://doi.org/10.3390/agriculture16030389
Submission received: 14 January 2026 / Revised: 3 February 2026 / Accepted: 4 February 2026 / Published: 6 February 2026

Abstract

Real-world dairy farming datasets are often noisy (e.g., missing or corrupted sensor signals) and contain only short labeled sequences, making conventional correlation analysis and feature prioritization unreliable. We present a robust learning framework that identifies head-specific informative sensor features and predicts daily milk yield by combining reinforcement learning (RL)-based dynamic feature gating with the Mamba architecture. The RL policy samples a binary feature mask that suppresses uninformative or corrupted signals to maximize prediction reward, while the Mamba predictor captures long-range dependencies with linear computational complexity. Experiments on the MMCows dataset demonstrate that the proposed framework achieves an average R² of 0.656 and exhibits substantially lower head-wise variance than Transformer-based baselines, indicating robustness to individual heterogeneity. Ablation studies show that RL-based gating is essential: removing the gating module (No-RL) causes performance to collapse (R² < 0). Overall, the proposed approach provides a practical solution for digital livestock farming that mitigates noise and data scarcity while improving robustness across heads.

1. Introduction

Digital livestock farming increasingly leverages multi-modal sensing systems to monitor animal behavior, health status, and production outcomes in commercial dairy operations [1,2,3,4,5]. Wearable and environmental sensors—such as inertial measurement units (IMUs), ultra-wideband (UWB) localization tags, and barn climate stations—produce continuous data streams that support management decisions, including nutrition adjustment, early health risk screening, and productivity management. Among these applications, accurate forecasting of daily milk yield is particularly valuable because it enables timely interventions and more efficient resource allocation at the farm level [6,7].
Despite its importance, reliable milk yield prediction from on-farm sensor data remains challenging [8,9,10]. A key obstacle is data quality. In commercial barns, sensor streams can include measurement noise and systematic distortions, such as intermittent dropouts, missing values, and outliers (e.g., abrupt spikes). These issues may arise from device detachment or misalignment, animal interference, harsh environmental conditions, and unstable wireless communication. A second obstacle is the mismatch between sensor sampling rates and label availability. While sensors may be sampled at fine temporal granularity (e.g., every 0.1 s), milk yield records are typically available only once per day. Consequently, after temporal alignment, aggregation, and windowing, the number of effective supervised training samples can be limited. Finally, substantial cow-to-cow heterogeneity means that a one-size-fits-all model may fail to extract stable and informative patterns from noisy signals, leading to poor generalization across individuals [11,12].
Prior work has investigated milk productivity from complementary perspectives, providing useful foundations for herd- and farm-level management. For example, Kurtuluş et al. examined how biological and environmental factors (including feeding-related variables) relate to milk production and derived management settings using a statistical optimization framework [13]. Soumri et al. modeled heat-stress effects on milk production traits using THI indices and random regression, capturing between-cow variability in production responses [14]. While these studies provide valuable population-level insights, they do not fully reflect the practical conditions of continuous sensor streams on commercial farms. In particular, (i) sensor malfunctions or communication disruptions can introduce mechanical noise into the measurements, and (ii) the signals most informative for prediction may shift across cows and situations depending on the animal’s physiological state (e.g., heightened susceptibility to heat stress).
These considerations motivate individualized modeling in PLF, especially for small-to-medium herds where managers often aim to support targeted, cow-specific interventions. While prior models can identify general risk factors, they often assume a relatively stable relationship between sensor features and milk yield. In practice, cows may respond differently to the same environmental load: for a heat-sensitive cow, yield changes may be closely linked to THI, whereas for another cow, yield variation may be better explained by behavior-related indicators associated with feeding and mobility. Our work addresses this gap by learning individualized, context-dependent sensor prioritization from multi-modal streams collected in a commercial farm setting.
In such noisy and data-scarce settings, conventional deep learning approaches—particularly Transformer-based models—can exhibit limitations. Transformers often require large-scale training data to generalize reliably; however, in PLF scenarios with limited labeled samples, they may overfit to noise and spurious correlations rather than capturing stable physiological dynamics [15,16,17,18,19]. In addition, the quadratic computational cost of self-attention with respect to sequence length can reduce practicality for long sensor sequences and resource-constrained environments [20,21].
To address these challenges, we propose PLF-Mamba, a two-stage framework that combines dynamic sensor channel prioritization with efficient temporal modeling for robust milk yield prediction under noisy and data-scarce conditions. Specifically, an RL-based gating module adaptively suppresses unreliable or uninformative modalities, and the resulting sequences are modeled by a Mamba predictor based on a selective state space model (SSM). The core innovation lies in this coupling: by mitigating context-irrelevant distortions at the input level, the Mamba backbone can more effectively focus on learning stable physiological dynamics from limited supervision. This design leverages SSM-style temporal dynamics with linear-time sequence modeling, which is well suited to long sensor streams under practical computational constraints [22,23,24,25].
The main contributions of this work are as follows:
  • We propose an integrated RL–Mamba learning framework tailored to PLF data characterized by sparse daily labels, real-farm measurement distortions, and strong individual heterogeneity.
  • We show that the proposed approach reduces cow-wise performance variability compared with Transformer and other baseline models, mitigating performance collapse on difficult, noise-affected cows.
  • Through ablation studies, we demonstrate that dynamic feature gating is a key component for maintaining predictive stability when supervision is limited and sensor streams contain frequent distortions.

2. Materials and Methods

2.1. Dataset Description and Preprocessing

This study utilizes the MMCows dataset [26], a multi-modal dairy farming dataset collected at a commercial dairy farm, comprising continuous monitoring data from 10 Holstein cows (Bos taurus L.). The data used in this study were collected on 21 October 2025. Since this study focuses on sensor modality analysis, only cows with complete and synchronized multi-modal sensor streams together with daily milk yield records were included. Cows for which only image data and milk yield information were available were excluded from the analysis.

2.1.1. Sensor Modalities

The data acquisition system captures three categories of physiological and behavioral signals. To accurately reflect real-world farm conditions, we explicitly account for the susceptibility of each modality to specific environmental and biological noise:
  • Behavioral Metrics (IMU and UWB):
    A neck-mounted sensor node records tri-axial acceleration and ultra-wideband (UWB) signals. Over 10-min intervals, we compute the standard deviation of the acceleration magnitude (Activity) and the variance of the UWB signal (Roaming). While these metrics represent movement intensity, they are prone to mechanical noise caused by non-locomotive behaviors, such as scratching against fences (grooming) or physical interactions with herd mates, which can produce high-variance artifacts unrelated to actual activity levels.
  • Physiological Status (Bolus): An indwelling reticulorumen bolus measures core body temperature (CBT) and internal pressure. Although these reflect metabolic status, the raw signals are frequently distorted by transient biological events, most notably rapid water intake (which causes sharp temperature drops) or rumen contractions, creating temporary deviations from the true physiological state.
  • Environmental Context (THI): A local weather station logs the temperature–humidity index (THI), a standard metric for assessing heat stress. While generally stable, micro-climate variations within the barn (e.g., proximity to fans or sprinklers) can introduce spatial discrepancies between the sensor reading and the cow’s actual thermal experience.

2.1.2. Data Preprocessing and Windowing

All sensor streams were resampled to a uniform 10-min resolution. Missing values caused by temporary transmission failures were imputed using linear interpolation followed by forward and backward filling.
Noise handling strategy. We distinguish (i) technical missingness due to communication failures from (ii) context-dependent real-farm artifacts and outliers (e.g., abrupt spikes or step changes) that can arise from animal behavior and physiological events, as described in Section 2.1.1. While interpolation addresses connectivity gaps, we did not apply aggressive heuristic filtering (e.g., smoothing or hard-threshold removal) to eliminate such artifacts. The rationale is that distinguishing between “noise” (e.g., fence scratching) and meaningful but transient physiological/behavioral changes (e.g., estrus-related locomotion) often requires contextual information. Therefore, these artifact-prone regimes are preserved in the input tensor to be handled by the proposed RL gating mechanism, rather than being discarded during preprocessing.
Following this, each feature was standardized using Z-score normalization, x′ = (x − μ)/σ, where μ and σ were computed exclusively from the training split to prevent information leakage. We formulated milk yield prediction as a supervised time-series regression problem. Samples were constructed using a sliding window with length T = 60 steps (10 h) and stride S = 6 steps (1 h). The input tensor is defined as X ∈ ℝ^(N×T×D), where N is the total number of windows, T = 60 is the sequence length, and D = 5 denotes the feature dimension (Activity, Roaming, CBT, Pressure, THI). The target variable Y corresponds to the daily milk yield (kg) recorded on the day immediately following each input window.
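As a concrete illustration, the windowing and leakage-free standardization described above can be sketched in a few lines of NumPy. The function names `make_windows` and `zscore` are ours, not part of the MMCows tooling:

```python
import numpy as np

def make_windows(series, T=60, S=6):
    """Slice a (time, D) sensor matrix into overlapping windows:
    T = 60 steps of 10 min (10 h), stride S = 6 steps (1 h)."""
    starts = range(0, series.shape[0] - T + 1, S)
    return np.stack([series[i:i + T] for i in starts])  # shape (N, T, D)

def zscore(train, other):
    """Standardize both splits with statistics computed from the
    training split only, preventing information leakage."""
    mu = train.mean(axis=(0, 1))
    sigma = train.std(axis=(0, 1)) + 1e-8
    return (train - mu) / sigma, (other - mu) / sigma
```

The stride of 6 steps yields overlapping windows, which increases the number of effective supervised samples under the daily-label constraint discussed above.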

2.2. Proposed Framework: PLF-Mamba

We propose PLF-Mamba, a two-stage hierarchical framework designed to robustly predict milk yield under noisy and data-scarce conditions. As illustrated in Figure 1, the system consists of (1) a dynamic feature selector trained via reinforcement learning (RL) [27] to filter input signals and (2) a temporal predictor based on the Mamba architecture to model long-term dependencies.

2.3. Stage 1: Dynamic Feature Selection via Policy Gradient

2.3.1. Problem Formulation

In livestock monitoring, the predictive utility of sensor features is non-stationary and varies across individuals and environmental contexts. For example, activity metrics may be informative during estrus-related behavioral changes but largely uninformative during prolonged resting periods. Consequently, a fixed feature subset is suboptimal. We therefore formulate feature selection as a sequential decision problem and learn a stochastic gating policy that adapts to the statistical context of the current window.

2.3.2. Justification for Algorithm Selection

Although reinforcement learning (RL) offers a range of advanced algorithms such as Deep Q-Networks (DQN) [28,29] and Proximal Policy Optimization (PPO) [30,31], we deliberately adopt the REINFORCE algorithm (vanilla policy gradient) because it matches the structure and constraints of our feature-gating problem.
First, our feature selection for each window is naturally formulated as a contextual bandit rather than a sequential Markov decision process (MDP). The gating decision for a given window does not affect future windows and therefore does not involve state-transition dynamics or long-horizon credit assignment. As a result, the additional machinery of PPO or actor–critic methods, which is primarily designed to handle temporal credit assignment, is unnecessary.
Second, the action space is inherently combinatorial, consisting of 2^D possible feature subsets. Value-based methods such as DQN are typically designed to choose a single action from a discrete action set, and extending them to multi-feature selection would require either an exponentially large output layer or specialized, complex architectures. In contrast, a policy gradient formulation can naturally parameterize the gating policy as independent Bernoulli variables, enabling efficient optimization in high-dimensional combinatorial spaces.
Finally, REINFORCE avoids the need for an auxiliary value (critic) network, which keeps the overall framework simple and computationally lightweight—a key requirement for reliable on-farm deployment.
Table 1 provides a compact summary of sections (i)–(iv); below, we present only the essential details.

2.3.3. (i) Policy Network

We model channel selection as a stochastic gating policy. Let x_t ∈ ℝ^(T×D) denote an input window and let s_t ∈ ℝ^D denote a window-level state summary (e.g., the channel-wise mean of x_t over time). The policy network π_θ maps s_t to selection probabilities p_t ∈ [0, 1]^D. A binary channel mask is then sampled and applied:
m_t ∼ Bernoulli(p_t), x̃_t = x_t ⊙ m_t,
where ⊙ denotes element-wise multiplication broadcast along the temporal axis.
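A minimal NumPy sketch of this gating step, sampling a Bernoulli channel mask and broadcasting it along the temporal axis. In the full framework, p_t is produced by the learned policy network rather than fixed; the function name here is illustrative:

```python
import numpy as np

def gate_window(x, p, rng):
    """Sample m ~ Bernoulli(p) per channel and zero out unselected
    channels across the whole window: x_tilde = x * m.
    x : (T, D) window; p : (D,) selection probabilities."""
    m = (rng.random(p.shape) < p).astype(float)   # binary channel mask
    return x * m, m                               # broadcast along time axis

rng = np.random.default_rng(0)
x = rng.standard_normal((60, 5))   # Activity, Roaming, CBT, Pressure, THI
x_tilde, m = gate_window(x, np.full(5, 0.8), rng)
```

Because the mask is constant within a window, a suppressed channel contributes nothing to the predictor for that entire 10-h context.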

2.3.4. (ii) Reward Function and Policy Optimization

The gating policy is trained to maximize expected return while encouraging sparse channel usage. We define the step reward as validation performance penalized by the fraction of selected channels:
R_t = R²_val − λ · (1/D) ‖m_t‖₁,
where λ controls the sparsity strength. Because the sampling step m_t ∼ Bernoulli(p_t) is non-differentiable, we optimize θ using the REINFORCE policy gradient estimator:
∇_θ J(θ) ≈ Σ_t ∇_θ log π_θ(m_t | s_t) · (R_t − b_t),
where b_t is a moving-average baseline used to reduce the variance of the gradient estimate. (We use K Monte Carlo samples per update; details are provided in the implementation settings.)
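Under the independent-Bernoulli parameterization, the score function has the closed form ∇_z log π(m|s) = m − p for logits z, which makes the REINFORCE update compact. Below is a toy sketch with a linear policy and a moving-average baseline; the reward function, learning rate, and all variable names are illustrative, not the paper's actual settings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reinforce_step(W, s, reward_fn, b, lr, rng):
    """One REINFORCE update for Bernoulli channel gates.
    Policy: p = sigmoid(W s). For independent Bernoulli gates,
    grad_W log pi(m|s) = outer(m - p, s), scaled by the baselined
    reward (R - b) to reduce gradient variance."""
    p = sigmoid(W @ s)
    m = (rng.random(p.shape) < p).astype(float)   # sample a mask
    R = reward_fn(m)            # e.g. validation R^2 minus sparsity penalty
    W = W + lr * (R - b) * np.outer(m - p, s)     # policy gradient ascent
    b = 0.9 * b + 0.1 * R                         # moving-average baseline
    return W, b

rng = np.random.default_rng(0)
s = np.ones(2)                                    # fixed toy state summary
W, b = np.zeros((5, 2)), 0.0
reward = lambda m: m[0] - 0.2 * m[1:].sum()       # channel 0 is "informative"
for _ in range(2000):
    W, b = reinforce_step(W, s, reward, b, lr=0.2, rng=rng)
```

After training, the selection probability of the rewarded channel rises well above the penalized ones, mirroring how the gating policy learns to prioritize informative modalities.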

2.3.5. (iii) Selective State Space Model (SSM)

Given the gated sequence x̃_{1:T}, we perform temporal prediction with a selective state space model (SSM). The underlying continuous-time dynamics are
h′(t) = A h(t) + B x(t), y(t) = C h(t),
where h(t) is the latent state and A, B, C are learned projections. For discretely sampled data, we discretize the system via zero-order hold (ZOH) with time scale Δ:
Ā = exp(ΔA), B̄ = (ΔA)⁻¹ (exp(ΔA) − I) · ΔB.
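For the diagonal state matrices used in Mamba-style SSMs, the ZOH expressions reduce to elementwise operations. A small NumPy sketch of the discretization (our own, for illustration):

```python
import numpy as np

def zoh_discretize(A, B, delta):
    """Zero-order-hold discretization for a diagonal state matrix:
      A_bar = exp(delta * A)
      B_bar = (delta*A)^{-1} (exp(delta*A) - I) * delta * B
    A, B : (N,) vectors of diagonal entries; delta : scalar > 0."""
    dA = delta * A
    A_bar = np.exp(dA)
    B_bar = (A_bar - 1.0) / dA * delta * B
    return A_bar, B_bar

A = np.array([-1.0, -0.1])   # stable (negative) continuous-time eigenvalues
B = np.array([2.0, 1.0])
A_bar, B_bar = zoh_discretize(A, B, delta=0.5)
```

Two sanity properties follow from the formulas: for small Δ, B̄ ≈ ΔB and Ā ≈ I (an Euler step), and for a constant input x the discrete recurrence h_{k+1} = Ā h_k + B̄ x converges to the continuous steady state −Bx/A.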

2.3.6. (iv) Selective Scan Mechanism and Biological Interpretation

A key feature of Mamba is input-dependent parameterization that enables multi-scale dynamics. At each step t, the time-scale parameter Δ_t and the SSM projections B_t and C_t are generated from the current input:
Δ_t = Softplus(Linear_Δ(x_t)), B_t = Linear_B(x_t), C_t = Linear_C(x_t).
Intuitively, Δ_t controls the effective memory: larger Δ_t emphasizes short-term variations, whereas smaller Δ_t preserves longer-term trends. This selective scan helps attenuate transient measurement noise while retaining persistent physiological patterns, and it maintains linear computational complexity O(T) in the sequence length.
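The recurrence above can be sketched as a per-step loop. This is a simplification: a real Mamba implementation uses a parallel scan and per-channel states, and the weight matrices below are stand-ins for the Linear_Δ, Linear_B, Linear_C projections (the pooled scalar drive `x[t].mean()` is our simplification, not the actual architecture):

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def selective_scan(x, A, W_delta, W_B, W_C):
    """Linear-time O(T) scan over a gated window x of shape (T, D).
    At each step, delta_t, B_t, C_t are generated from the input x_t,
    the diagonal SSM is discretized via ZOH, and the state advances.
    A : (N,) diagonal entries; W_delta : (D,); W_B, W_C : (N, D)."""
    T, _ = x.shape
    h = np.zeros(A.shape[0])
    y = np.zeros(T)
    for t in range(T):
        delta = softplus(W_delta @ x[t]) + 1e-6   # input-dependent, > 0
        B_t, C_t = W_B @ x[t], W_C @ x[t]
        dA = delta * A
        A_bar = np.exp(dA)                        # ZOH for diagonal A
        B_bar = (A_bar - 1.0) / dA * delta * B_t
        h = A_bar * h + B_bar * x[t].mean()       # advance latent state
        y[t] = C_t @ h                            # readout
    return y
```

Because each step does constant work per state dimension, the cost grows linearly with T, in contrast to the quadratic cost of self-attention.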

2.4. Implementation Details

The proposed framework was implemented in PyTorch (v2.9.1) [32], a widely used deep-learning library that supports reproducible training and efficient GPU acceleration for sequence models. The architecture specifications and hyperparameter settings are summarized in Table 2. The RL agent and the Mamba predictor were trained jointly, using distinct learning rates to reflect their different optimization dynamics.

2.5. Validation Strategy

To rigorously evaluate generalization capability and prevent temporal data leakage, we employed a Day–Block Split strategy. We partitioned the data by holding out entire 24 h blocks; for each run, 20% of the available days were randomly selected as validation. This protocol ensures that validation is conducted on completely unseen temporal contexts.
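One way to implement this protocol is to sample held-out days and route every window from those days into validation; the function name and interface below are ours:

```python
import numpy as np

def day_block_split(window_days, val_frac=0.2, rng=None):
    """Day-Block Split: hold out entire 24-h blocks by sampling
    val_frac of the unique days and assigning every window from those
    days to validation. window_days : (N,) day index of each window.
    Returns boolean masks (train_mask, val_mask)."""
    if rng is None:
        rng = np.random.default_rng(0)
    days = np.unique(window_days)
    n_val = max(1, int(round(val_frac * len(days))))
    val_days = rng.choice(days, size=n_val, replace=False)
    val_mask = np.isin(window_days, val_days)
    return ~val_mask, val_mask
```

Splitting at the day level ensures that overlapping windows from the same day never straddle the train/validation boundary, which is the leakage the protocol is designed to prevent.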
To assess the statistical significance of performance differences between the proposed PLF-Mamba and baseline models, we employed the Wilcoxon signed-rank test. This choice is motivated by two considerations:
1. Non-normality under small sample size: With a limited cohort (N = 10 head), the normality assumption required for parametric paired tests (e.g., a paired t-test) is difficult to verify and may not hold.
2. Paired comparison with effect magnitude: Because all models are evaluated on the same set of heads (paired samples), the Wilcoxon signed-rank test is more appropriate than the Mann–Whitney U test, which assumes independent samples. Moreover, unlike the sign test, it accounts for the magnitude of within-head performance differences, providing a more informative comparison.
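The signed-rank statistic itself is simple to compute: rank the non-zero absolute paired differences (averaging tied ranks) and take the smaller of the positive- and negative-rank sums. In practice one would use `scipy.stats.wilcoxon`, which also supplies the p-value; the NumPy sketch below shows only the statistic:

```python
import numpy as np

def signed_rank_stat(a, b):
    """Wilcoxon signed-rank statistic T for paired samples a, b:
    T = min(sum of ranks of positive differences,
            sum of ranks of negative differences)."""
    d = np.asarray(a, float) - np.asarray(b, float)
    d = d[d != 0.0]                        # zero differences are dropped
    abs_d = np.abs(d)
    ranks = np.empty(len(d))
    for i, v in enumerate(abs_d):          # tie-averaged ranks of |d|
        ranks[i] = (abs_d < v).sum() + 1 + ((abs_d == v).sum() - 1) / 2.0
    return min(ranks[d > 0].sum(), ranks[d < 0].sum())
```

A small T relative to its null distribution indicates a consistent paired difference; with N = 10 head, exact critical values are available from standard tables.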

3. Results

3.1. Preliminary Analysis: Heterogeneity and Non-Stationarity

Before evaluating predictive performance, we conducted a preliminary analysis to examine whether a fixed feature set is suitable for multi-cow monitoring under noisy and limited-length data conditions.

3.1.1. Individual Variability in Feature Prioritization

Figure 2 visualizes the feature selection behavior of the RL agent across cows. The heatmap reveals substantial heterogeneity: for instance, Roaming is frequently selected for Cow C04, whereas Activity is more consistently prioritized for Cow C05. These distinct patterns indicate that informative sensor modalities are highly cow-dependent, supporting the need for a dynamic, individualized gating mechanism rather than a static feature set.

3.1.2. Limitations of Simple Correlation Analysis

Figure 3 further illustrates that simple linear correlation analysis is often insufficient to characterize sensor–yield relationships. In several cows, dominant features exhibit weak or inconsistent Pearson correlations with milk yield, suggesting non-stationarity and potential non-linear effects. This observation motivates the adoption of learning-based approaches capable of capturing complex temporal dependencies beyond fixed linear associations.

3.2. Quantitative Performance Comparison

We compared the proposed PLF-Mamba framework against Transformer, LSTM, and MLP baselines. Additionally, we analyzed a “No-RL” model—an ablation variant excluding the reinforcement learning module—to rigorously evaluate the contribution of the dynamic feature gating mechanism.
Table 3 reports the R² scores for all 10 cows (10 head). PLF-Mamba achieved the best average performance (R² = 0.656), slightly above the Transformer (0.640) and LSTM (0.633). The MLP baseline performed substantially worse (0.548), highlighting the importance of temporal modeling for milk yield forecasting. Notably, removing the RL module (No-RL) led to a pronounced performance drop (average R² = −0.759), suggesting that, without selective gating, indiscriminate use of all sensor channels can overwhelm the predictor under noise-affected farm conditions.

Stability and Robustness Analysis

A key advantage of PLF-Mamba is its improved stability across cows. As evident in Table 3 and summarized in Figure 4, the Transformer baseline exhibits high variance; while it excels on specific cows (e.g., 0.891 on C03), it suffers from severe degradation on challenging cases (e.g., 0.336 on C05). This results in a wide performance gap (Max–Min spread of 0.555 for Transformer). In contrast, PLF-Mamba maintains more consistent performance with a tighter spread, effectively reducing large failures in challenging cases (e.g., 0.599 on C05). We attribute this robustness to the synergy between the RL agent, which suppresses unreliable or uninformative channels, and the Mamba architecture, which efficiently models long-range dependencies even in data-scarce regimes.

3.3. Qualitative Analysis

Figure 5 presents qualitative reconstruction results for milk yield. Note that the target variable is presented in Z-score scaled units (denoted here as scaled milk yield) to standardize comparisons, distinct from fat–protein-corrected milk (FPCM) normalization often used in dairy science. The model’s predictions (red dashed lines) closely track the ground truth (black solid lines) across all cows, capturing both gradual trends and day-to-day fluctuations. Notably, even for cows with more volatile yield patterns, the model captures peaks and troughs with limited lag, supporting its potential utility for practical monitoring.

4. Discussion

4.1. Why Mamba Outperforms Transformer in Livestock Monitoring

Across our experiments, the Mamba-based predictor exhibits more consistent performance across cows than the Transformer baseline (Table 3), especially for individuals whose sensor streams are strongly affected by noise and non-stationarity. We attribute this performance gap to the alignment between the model’s inductive bias and the biological nature of lactation dynamics.
Transformer architectures rely on global self-attention, treating time steps as discrete tokens and computing pairwise interactions. Although this architecture has strong representational capacity, it does not impose an explicit inductive bias toward continuous temporal evolution. In data-scarce PLF settings, this flexibility can make it difficult to distinguish robust physiological dynamics from transient measurement noise, which may contribute to high variance and overfitting to spurious correlations [15,17].
In contrast, Mamba is grounded in state space models (SSMs), which model the underlying system through continuous-time evolution:
h′(t) = A h(t) + B x(t).
This formulation aligns with the view that milk production is driven by continuous metabolic processes (e.g., digestion, hormonal regulation, and thermal adaptation) rather than isolated events. The recurrent state h ( t ) acts as a latent physiological memory that propagates information over time, capturing persistent trends (e.g., gradual acclimation to heat load) while potentially attenuating high-frequency perturbations in sensor streams [22].
This perspective also helps explain the interaction with our RL module. In recurrent dynamics, corrupted inputs can propagate through the latent state and degrade downstream predictions. By dynamically suppressing uninformative channels before they enter the SSM dynamics, the RL agent encourages the latent-state evolution to be driven primarily by more reliable and context-relevant signals.

4.2. The Critical Role of Dynamic Feature Gating

The ablation study highlights the importance of dynamic feature gating. When the RL module is removed (No-RL), average performance degrades substantially (R² = −0.759), suggesting that indiscriminately feeding all available sensor data is detrimental in farm environments where measurement reliability fluctuates.
A deeper analysis indicates that this challenge is related to non-stationarity in feature relevance. In biological systems, the predictive value of a given signal can vary with the animal’s condition. For instance, accelerometer-derived activity may be informative during estrus-related locomotion changes, whereas during heat stress or illness it may become less reliable relative to internal temperature or other physiological indicators.
Our RL policy addresses this by treating sensor selection as a context-aware adaptation rather than a fixed rule. By penalizing the selection of irrelevant channels, the agent learns to prioritize signals that are most informative under the current context, analogous to condition-dependent prioritization in livestock health monitoring.

4.3. Implications for Digital Livestock Farming

The observed heterogeneity in feature selection challenges the assumption that a single “universal formula” can model all animals. As shown in the correlation analysis (Figure 3), the relationship between features (e.g., Activity) and milk yield varies markedly across individuals (from r = −0.76 to r = +0.18), suggesting that a single global linear regression (y = ax + b) is unlikely to generalize well.
Instead, our results support individualized, dynamic sensor prioritization. To illustrate this, we highlight Cow C10 as an example in the sensor selection heatmap (Figure 2). Compared with motion-related channels, the gating policy for C10 more frequently prioritizes THI-related context and bolus-derived physiological signals, indicating that the most informative modalities can differ by individual.
A plausible biological interpretation is that, when thermal load increases, milk yield may be more constrained by thermoregulation than by physical activity. In such periods, internal temperature can be more directly informative than motion signals. Although the heatmap does not identify specific physiological states, the learned prioritization pattern is consistent with known responses to heat stress and supports the value of individualized, context-dependent sensing. Notably, the RL policy can discover such prioritization patterns without explicit annotation of physiological states, which can contribute to stable predictions when simple correlation-based heuristics are insufficient.

4.4. Limitations and Future Work

First, the dataset is limited to 10 cows from a single farm, which may constrain external validity. Generalization could be affected by factors such as breed, parity, and feeding regimes (e.g., TMR vs. grazing). Future work will validate the framework on larger, multi-farm datasets to assess robustness under diverse management environments.
Second, regarding computational efficiency, although Mamba offers linear-time inference ( O ( T ) ) in principle, the current training procedure involves an RL optimization loop. Future work will examine inference latency under practical hardware constraints and explore lighter alternatives, such as differentiable gating or policy distillation, to facilitate real-time deployment.

5. Conclusions

In this study, we proposed PLF-Mamba for individualized daily milk yield prediction in digital livestock farming under practical constraints, including inter-animal heterogeneity, temporal non-stationarity, sensor noise, and limited labeled data. PLF-Mamba integrates a deep reinforcement learning (DRL)-based feature-gating module with a selective state space model (Mamba), enabling input-dependent modality prioritization and efficient temporal modeling.
Experiments on multi-modal sensor data collected from 10 dairy cows support the following conclusions:
1. Accuracy and animal-level robustness. PLF-Mamba achieved the highest mean predictive performance (R² = 0.656) among the evaluated models and exhibited improved robustness relative to the Transformer baseline (R² = 0.640). In particular, the Transformer showed markedly degraded performance on specific cows with highly volatile signals, whereas PLF-Mamba maintained more stable performance across individuals. These results suggest that selective state space dynamics can be advantageous in small, noisy biological time-series settings.
2. Contribution of dynamic gating (ablation evidence). In the ablation setting where the RL-based gating module was removed while keeping the temporal predictor unchanged, performance decreased substantially (average R² = −0.759). This observation indicates that, for the studied dataset, learning directly from raw multi-modal streams without adaptive feature selection is insufficient, and that dynamic gating is a key component of PLF-Mamba under data scarcity and noise.
3. Individual-specific modality prioritization. Analysis of the learned gating policy revealed that the prioritized sensor modalities differ across cows and over time. While this does not establish causality, it provides model-based evidence consistent with the premise that informative signals for milk yield forecasting are individual- and context-dependent. Such prioritization patterns may serve as a useful diagnostic for domain experts when interpreting model behavior and planning targeted sensing or management actions.
Overall, the results indicate that combining selective state space modeling with adaptive feature gating is a viable alternative to attention-based architectures for individualized forecasting in small-scale, high-noise PLF datasets. The present study is limited to a single-farm cohort with a small sample size; therefore, the extent of generalization to other breeds, management systems, feeding strategies, seasons, and sensor configurations remains to be established. Future work will (i) validate PLF-Mamba on larger multi-farm and multi-season datasets, (ii) study transfer and calibration strategies for deployment to new herds, and (iii) further investigate biological interpretability by aligning modality prioritization patterns with independent records (e.g., health, reproduction, and feeding events). We also plan to evaluate computational efficiency and implement edge-oriented inference for real-time on-farm decision support.

Author Contributions

Conceptualization, J.K. and C.-B.S.; methodology, J.K. and C.-B.S.; software, J.K.; validation, J.K. and C.-B.S.; formal analysis, J.K.; investigation, J.K.; resources, J.K. and C.-B.S.; data curation, J.K.; writing—original draft preparation, J.K.; writing—review and editing, C.-B.S. and J.K.; visualization, J.K.; supervision, C.-B.S.; project administration, C.-B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Industrial Fundamental Technology Development Program (RS-2025-25455819, Development of AI-Based Manufacturing Process Optimization Technology for Quality Control of Small-Batch, Multi-Variety Dairy Products) funded by the Ministry of Trade, Industry and Energy of Korea.

Data Availability Statement

The original data presented in the study are openly available in MMCows at https://github.com/neis-lab/mmcows (accessed on 20 November 2025).

Acknowledgments

The present research was conducted under the Research Grant of Kwangwoon University in 2025.

Conflicts of Interest

Author Jonghyun Kim was employed by the company Valiantdata Inc. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. Overall architecture of the proposed PLF-Mamba framework. The system consists of (1) an RL-based dynamic feature selector that filters input sensor data and (2) a Mamba-based temporal predictor that estimates milk yield from the selected features.
Figure 2. AI-driven sensor prescription heatmap. Values indicate the selection frequency of sensor features by the RL agent. The distinct selection patterns (e.g., C04 prioritizing Roaming vs. C05 prioritizing Activity) illustrate substantial individual heterogeneity in feature relevance.
Figure 3. Daily aggregated correlation analysis. Scatter plots show weak and inconsistent linear relationships between representative sensor features and milk yield across cows, indicating that fixed linear models may be insufficient to capture heterogeneous and non-stationary milk production dynamics.
Figure 4. Overall performance comparison (Average R 2 with standard deviation). PLF-Mamba shows improved stability (tighter error bars) compared to the higher-variance Transformer baseline. The pronounced degradation of the No-RL ablation highlights that dynamic feature gating is important for maintaining robust performance in noise-affected environments.
Figure 5. Ground truth (black) and model reconstruction (red dashed) of milk yield (Z-score scaled) across all cows. The proposed model tracks diverse individual patterns with high fidelity, capturing both stable trends (e.g., C04) and abrupt fluctuations (e.g., C09).
Table 1. Explanatory overview of the proposed PLF-Mamba framework components.
| Module | Purpose | Core Formulation (Summary) | Role & Interpretation |
|---|---|---|---|
| (i) Policy Network | Select informative sensor channels dynamically based on the input context. | p_t = π_θ(s_t); m_t ∼ Bernoulli(p_t); x̃_t = x_t ⊙ m_t | Stochastic gating: suppresses irrelevant signals (noise) before temporal modeling. |
| (ii) Optimization | Train the gate to balance accuracy and sparsity without explicit labels. | R_t = R²_val − λ · (1/D) · ‖m_t‖₁; updated via REINFORCE | Efficiency: penalizes redundant sensors while maximizing predictive reward. |
| (iii) Selective SSM | Model temporal dependencies efficiently on the gated sequence. | h′(t) = A h(t) + B x̃(t); y(t) = C h(t) (discretized via ZOH) | Linear complexity: captures long-term patterns without quadratic cost. |
| (iv) Selective Scan | Adapt the time scale and projections to the current input context. | Δ_t = Softplus(Linear(x̃_t)); input-dependent B_t, C_t | Multi-scale modeling: large Δ_t for rapid events, small Δ_t for slow trends. |
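The interaction between rows (i) and (ii) of Table 1 can be illustrated with a minimal REINFORCE loop. This is a toy sketch, not the paper's implementation: the state-conditioned MLP policy of Table 2 is reduced to five free per-feature logits, the validation term R²_val is replaced by a synthetic reward in which only feature 0 is informative, λ and the learning rate are arbitrary choices, and a moving-average baseline (a standard REINFORCE variance-reduction trick not listed in the table) is added so that the toy converges quickly.

```python
import math
import random

random.seed(0)

D = 5       # number of sensor features, matching Table 2
LAM = 0.1   # sparsity weight lambda (illustrative value, not from the paper)
LR = 0.5    # policy learning rate (illustrative)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def reward(mask):
    """Toy stand-in for the validation R^2 term: only feature 0 is
    informative here, so the 'predictor' scores 0.7 iff it is kept."""
    r2 = 0.7 if mask[0] else 0.0
    return r2 - LAM * sum(mask) / D   # R_t = R^2_val - lambda * (1/D) * ||m_t||_1

theta = [0.0] * D   # per-feature gate logits (simplified from the MLP policy)
baseline = 0.0      # moving-average reward baseline (variance reduction)

for _ in range(2000):
    p = [sigmoid(t) for t in theta]
    mask = [1 if random.random() < pi else 0 for pi in p]  # m_t ~ Bernoulli(p_t)
    r = reward(mask)
    adv = r - baseline
    baseline = 0.9 * baseline + 0.1 * r
    # REINFORCE: for p = sigmoid(theta), d log P(m) / d theta = m - p
    for i in range(D):
        theta[i] += LR * adv * (mask[i] - p[i])

probs = [sigmoid(t) for t in theta]  # learned keep-probability per feature
```

Under this synthetic reward, the keep-probability of the informative feature saturates near one while the four noise gates decay toward zero, qualitatively mirroring the per-cow selection-frequency patterns shown in Figure 2.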
Table 2. Architecture specifications and hyperparameter settings. The input dimension D corresponds to the five sensor features.
| Module | Component | Configuration/Parameters | Activation |
|---|---|---|---|
| RL Agent | Input state (s_t) | Dimension: D = 5 (mean statistics) | |
| | Hidden layer 1 | Linear (5 → 128) | ReLU |
| | Hidden layer 2 | Linear (128 → 64) | ReLU |
| | Output layer (p_t) | Linear (64 → 5) | Sigmoid |
| | Action sampling | m_t ∼ Bernoulli(p_t) | Stochastic |
| Mamba Predictor | Input embedding | Linear (5 → d_model), d_model = 64 | |
| | SSM backbone | Stacked layers: L = 2; state dim N_ssm = 16; conv kernel size: 4 | SiLU |
| | Pooling | Global average pooling | |
| | Prediction head | Linear (64 → 1) | Linear |
| Training | Optimizer | AdamW | |
| | Learning rate | RL: 10⁻², Mamba: 10⁻³ | |
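Rows (iii) and (iv) of Table 1 can likewise be condensed into a scalar-state sketch. All values here are hypothetical: a scalar A = −1 with scalar B, C and a one-parameter Δ projection stand in for the d_model = 64, N_ssm = 16 configuration of Table 2. The point is only to show the zero-order-hold (ZOH) discretization and the input-dependent step Δ_t = Softplus(Linear(x̃_t)).

```python
import math

def softplus(z):
    return math.log1p(math.exp(z))

def selective_scan(x, A=-1.0, B=1.0, C=1.0, w_delta=1.0, b_delta=0.0):
    """Scalar-state sketch of rows (iii)-(iv) of Table 1: the continuous
    SSM h'(t) = A h(t) + B x(t), y(t) = C h(t), discretized by zero-order
    hold with an input-dependent step Delta_t = softplus(w * x_t + b)."""
    h, ys = 0.0, []
    for xt in x:
        dt = softplus(w_delta * xt + b_delta)  # Delta_t depends on the input
        Abar = math.exp(dt * A)                # ZOH: exp(Delta * A)
        Bbar = (Abar - 1.0) / A * B            # ZOH: A^{-1} (exp(Delta*A) - 1) B
        h = Abar * h + Bbar * xt               # h_t = Abar h_{t-1} + Bbar x_t
        ys.append(C * h)
    return ys

# With A = -1, B = C = 1, the steady-state gain is -C*B/A = 1, so the
# output converges to the level of a constant input.
response = selective_scan([1.0] * 50)
```

Because Δ_t grows with the input, a large input yields a small Ā = exp(Δ_t A) and the state forgets quickly (rapid events), while a small input keeps Ā near one so the state integrates slowly (long-term trends); this is the multi-scale behavior summarized in the table.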
Table 3. Comparison of prediction performance ( R 2 ). C01–C10 denote individual cow identifiers (one ID per cow). PLF-Mamba achieves the highest average score and demonstrates improved robustness. While baseline models can perform strongly on specific cows, they exhibit substantial performance degradation on challenging cases, whereas PLF-Mamba maintains more consistent predictive accuracy across individuals.
| Cow ID | PLF-Mamba | Transformer | LSTM | MLP | No-RL |
|---|---|---|---|---|---|
| C01 | 0.682 | 0.559 | 0.638 | 0.386 | 0.271 |
| C02 | 0.717 | 0.761 | 0.795 | 0.500 | −0.042 |
| C03 | 0.647 | 0.891 | 0.740 | 0.659 | 0.335 |
| C04 | 0.650 | 0.709 | 0.558 | 0.602 | −1.254 |
| C05 | 0.599 | 0.336 | 0.445 | 0.449 | 0.037 |
| C06 | 0.645 | 0.642 | 0.503 | 0.346 | −1.097 |
| C07 | 0.659 | 0.511 | 0.684 | 0.732 | −1.164 |
| C08 | 0.867 | 0.879 | 0.831 | 0.610 | −2.862 |
| C09 | 0.490 | 0.502 | 0.576 | 0.680 | −1.708 |
| C10 | 0.609 | 0.608 | 0.556 | 0.515 | −0.106 |
| Average | 0.656 | 0.640 | 0.633 | 0.548 | −0.759 |
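As a consistency check, the column averages of Table 3 and the dispersion underlying the error bars of Figure 4 can be recomputed from the per-cow values. Note that `pstdev` below is the population standard deviation; if the paper's error bars use the sample standard deviation instead, both columns scale by the same factor, so the ordering between models is unchanged.

```python
import statistics

# Per-cow R^2 values transcribed from Table 3 (rows C01-C10).
scores = {
    "PLF-Mamba":   [0.682, 0.717, 0.647, 0.650, 0.599, 0.645, 0.659, 0.867, 0.490, 0.609],
    "Transformer": [0.559, 0.761, 0.891, 0.709, 0.336, 0.642, 0.511, 0.879, 0.502, 0.608],
    "LSTM":        [0.638, 0.795, 0.740, 0.558, 0.445, 0.503, 0.684, 0.831, 0.576, 0.556],
    "MLP":         [0.386, 0.500, 0.659, 0.602, 0.449, 0.346, 0.732, 0.610, 0.680, 0.515],
    "No-RL":       [0.271, -0.042, 0.335, -1.254, 0.037, -1.097, -1.164, -2.862, -1.708, -0.106],
}

averages = {m: statistics.mean(v) for m, v in scores.items()}   # Table 3 "Average" row
spread = {m: statistics.pstdev(v) for m, v in scores.items()}   # head-wise dispersion
```

The recomputed averages match the reported row after rounding (0.656, 0.640, 0.633, 0.548, −0.759), and PLF-Mamba's head-wise spread (about 0.09) is roughly half the Transformer's (about 0.17), consistent with the robustness claim.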
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
