We propose Meta-LSTM-Affine, a batch-statistics-free normalization framework that unifies recurrent memory and meta-learning to achieve both temporal and task-level adaptability. The key idea is to replace BN’s batch-statistics-driven normalization step with an LSTM-based affine parameter generator (APG), and further enhance its adaptability with meta-learning strategies.
3.2. LSTM-Based Affine Parameter Generator
At each step, we first extract a compact descriptor via global average pooling (GAP), as shown in Equation (2). This descriptor is fed into an LSTM [25], which maintains hidden and cell states, as shown in Equation (3), where the recurrent weights are the LSTM parameters. The hidden state is then projected into the affine parameters (a scale and a shift), as shown in Equation (4). The final affine-transformed output is obtained as shown in Equation (5). Thus, the scale and shift evolve with temporal dynamics while remaining independent of batch statistics. We deliberately keep the LSTM parameters fixed during meta-adaptation, as updating recurrent parameters under few-shot or streaming settings may lead to unstable temporal dynamics and overfitting. Therefore, we restrict adaptation to the lightweight projection head, which enables efficient task-specific refinement while preserving the temporal structure learned by the LSTM. Although the input samples are not strictly sequential, the LSTM serves as a mechanism to model gradual distributional adaptation across samples within an episode.
The overall workflow of the LSTM-based affine parameter generator is illustrated in Figure 1 [4], highlighting its role as a batch-statistics-free alternative to BN.
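The data flow described in this subsection (GAP descriptor, LSTM step, projection to a scale and a shift, affine output) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the names `TinyLSTMCell`, `apg_forward`, `W_proj`, and `b_proj`, the zero initial states, the gate ordering, and the per-channel form `gamma * x + beta` are all assumptions made for the example.

```python
import math
import random

def gap(feature_map):
    """Global average pooling: feature_map[channel][position] -> one mean per channel."""
    return [sum(ch) / len(ch) for ch in feature_map]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class TinyLSTMCell:
    """Standard LSTM cell; the gates (i, f, g, o) are stacked in one weight matrix."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        self.W = rand_matrix(4 * hid_dim, in_dim + hid_dim, rng)
        self.b = [0.0] * (4 * hid_dim)

    def step(self, z, h, c):
        # The input is the concatenation of the descriptor z and the previous hidden state h.
        pre = [p + b for p, b in zip(matvec(self.W, z + h), self.b)]
        H = self.hid_dim
        i = [sigmoid(v) for v in pre[0:H]]
        f = [sigmoid(v) for v in pre[H:2 * H]]
        g = [math.tanh(v) for v in pre[2 * H:3 * H]]
        o = [sigmoid(v) for v in pre[3 * H:4 * H]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

def apg_forward(feature_maps, cell, W_proj, b_proj):
    """Run the generator over a sequence; return outputs and (gamma, beta) per step."""
    channels = len(feature_maps[0])
    h = [0.0] * cell.hid_dim  # assumed zero initial states
    c = [0.0] * cell.hid_dim
    outputs, affines = [], []
    for fmap in feature_maps:
        z = gap(fmap)                      # compact descriptor
        h, c = cell.step(z, h, c)          # recurrent memory update
        proj = [p + b for p, b in zip(matvec(W_proj, h), b_proj)]
        gamma, beta = proj[:channels], proj[channels:]
        # Assumed affine form: per-channel scale and shift, no batch statistics involved.
        outputs.append([[gamma[k] * v + beta[k] for v in fmap[k]] for k in range(channels)])
        affines.append((gamma, beta))
    return outputs, affines

rng = random.Random(0)
C, H = 3, 4
cell = TinyLSTMCell(C, H, rng)
W_proj = rand_matrix(2 * C, H, rng)
b_proj = [1.0] * C + [0.0] * C  # bias chosen so gamma starts near 1 and beta near 0
features = [[[rng.gauss(0, 1) for _ in range(5)] for _ in range(C)] for _ in range(4)]
outputs, affines = apg_forward(features, cell, W_proj, b_proj)
```

Each step here consumes one sample's feature map, so the scale and shift depend only on that sample and on the recurrent state, never on statistics computed across a batch.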
3.3. Meta-Learning Integration
While the LSTM-based APG provides temporal adaptation, its projection parameters and recurrent weights remain static after training, limiting task-level generalization under distribution shifts, as in few-shot learning (FSL) or source-free domain adaptation (SFDA). This limitation is particularly problematic in FSL, where each new task has its own distribution and requires rapid adaptation. To address this, we introduce three meta-learning mechanisms that incorporate the support set into the adaptation process.
- (a) Meta-Initialization
Instead of starting the LSTM with fixed hidden and cell states, we generate task-specific initial states from the support set, as shown in Equation (6), using a learnable meta-network trained end-to-end during meta-training to produce task-adaptive initializations. This design ensures that each task begins from an initialization aligned with its underlying distribution. The loss on the support set and the loss on the query set share the same formulation, as shown in Equation (7). Each loss consists of (1) a standard task classification loss computed from the classifier output and (2) a temporal smoothness penalty, scaled by a regularization weight, that encourages the affine parameters to evolve smoothly across timesteps, as defined in Equation (8). Intuitively, this regularization discourages abrupt changes in the generated affine parameters between consecutive timesteps, stabilizing feature modulation under non-stationary input streams and complementing the temporal modeling provided by the LSTM. Meta-Initialization primarily addresses task-level distribution mismatch at the sequence onset, without altering the temporal dynamics governed by the LSTM.
- (b) Meta-Conditioning
A task embedding is extracted from the support set by a learnable meta-network, for example via class prototypes or attention-based pooling. The affine parameters are then generated as shown in Equation (9), where the hidden-state term captures temporal patterns, while the additional term injects task-specific information. Because Meta-Conditioning injects task context as a static conditioning signal, it does not involve parameter updates and therefore complements, rather than overlaps with, gradient-based refinement. Thus, the affine parameters depend on both the temporal memory and the task embedding, enabling per-task adaptation without modifying the backbone. Importantly, the conditional matrix is learned during meta-training but kept fixed during meta-test inference, while adaptation arises solely from the task-dependent embedding.
- (c) Meta-Update
A lightweight inner-loop adaptation is applied only to the projection head, while keeping the backbone and LSTM fixed. Given the support set loss, one or a few gradient steps yield updated projection parameters, as shown in Equation (10), where the inner step size is optimized during meta-training and remains fixed at inference, rather than being treated as a hyperparameter or generated by a meta-network. This design allows rapid refinement of the affine projection head using only the support set, while avoiding adaptation of the entire backbone or LSTM generator. This localized adaptation strategy follows recent observations in meta-learning that fast task-level adaptation is most effective when confined to task-specific affine or projection layers, rather than deep feature extractors.
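The Meta-Update step can be illustrated with a single gradient step on a linear projection head. The sketch below is only an illustration under stated assumptions: the frozen LSTM is represented by a fixed list of hidden states, the support loss is replaced by a toy mean-squared error (the paper uses a classification loss plus a smoothness penalty), and the names `project`, `support_loss`, `meta_update`, and `alpha` are hypothetical.

```python
def project(W, b, h):
    """Linear projection head: W h + b."""
    return [sum(w * x for w, x in zip(row, h)) + bk for row, bk in zip(W, b)]

def support_loss(W, b, hidden_states, targets):
    """Toy stand-in for the support loss: mean squared error of the head output."""
    total = 0.0
    for h, y in zip(hidden_states, targets):
        out = project(W, b, h)
        total += sum((o - t) ** 2 for o, t in zip(out, y))
    return total / len(hidden_states)

def meta_update(W, b, hidden_states, targets, alpha, steps=1):
    """Return adapted copies of (W, b); the meta-learned inputs are left untouched."""
    W = [row[:] for row in W]
    b = b[:]
    n = len(hidden_states)
    for _ in range(steps):
        gW = [[0.0] * len(W[0]) for _ in W]
        gb = [0.0] * len(b)
        # Analytic gradient of the mean squared error with respect to W and b.
        for h, y in zip(hidden_states, targets):
            out = project(W, b, h)
            for k, (o, t) in enumerate(zip(out, y)):
                err = 2.0 * (o - t) / n
                gb[k] += err
                for j, hj in enumerate(h):
                    gW[k][j] += err * hj
        # Gradient step with inner step size alpha; only the head changes.
        W = [[w - alpha * g for w, g in zip(rw, rg)] for rw, rg in zip(W, gW)]
        b = [bk - alpha * g for bk, g in zip(b, gb)]
    return W, b

# Hidden states stand in for the outputs of the frozen backbone and LSTM.
hidden_states = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.3]]
targets = [[1.0, 0.0], [1.2, 0.1], [0.8, -0.1]]
W0 = [[0.0, 0.0], [0.0, 0.0]]
b0 = [0.0, 0.0]
W1, b1 = meta_update(W0, b0, hidden_states, targets, alpha=0.1)
before = support_loss(W0, b0, hidden_states, targets)
after = support_loss(W1, b1, hidden_states, targets)
```

Returning copies rather than updating in place mirrors the episodic setting, where each task starts again from the same meta-learned head.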
It is important to clarify that the meta-networks introduced in our design are not responsible for training or generating the parameters of the LSTM itself. Instead, they act as auxiliary modules that enhance task adaptability at different stages of the adaptation process. Meta-Initialization provides task-specific initial hidden and cell states, Meta-Conditioning injects task-level information into the affine generation process, and Meta-Update refines the affine projection parameters through lightweight gradient-based adaptation. Together, these mechanisms align initialization, conditioning, and refinement, enabling efficient task-level adaptation under diverse distribution shifts.
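The two gradient-free mechanisms, Meta-Initialization and Meta-Conditioning, can be sketched in a few lines. Everything below is an assumption made for illustration: the paper does not specify the meta-network architectures here, so `meta_init` uses a mean-pooled support descriptor followed by a linear map, `task_embedding` averages class prototypes, and `conditioned_affine` adds a linear task term to the hidden-state projection.

```python
import math

def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def meta_init(support_descriptors, W_h, W_c):
    """Map the pooled support descriptor to initial LSTM states (h0, c0)."""
    s = mean_vector(support_descriptors)
    h0 = [math.tanh(v) for v in matvec(W_h, s)]  # keep h0 in the usual (-1, 1) range
    c0 = matvec(W_c, s)
    return h0, c0

def task_embedding(support_descriptors, labels):
    """Average of class prototypes (per-class mean descriptors)."""
    by_class = {}
    for z, y in zip(support_descriptors, labels):
        by_class.setdefault(y, []).append(z)
    prototypes = [mean_vector(zs) for zs in by_class.values()]
    return mean_vector(prototypes)

def conditioned_affine(h, e, W_proj, U_cond, b_proj, channels):
    """Temporal term W_proj h plus task term U_cond e, split into (gamma, beta)."""
    out = [a + c + b for a, c, b in zip(matvec(W_proj, h), matvec(U_cond, e), b_proj)]
    return out[:channels], out[channels:]

support = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
h0, c0 = meta_init(support, W_h=[[0.5, 0.0], [0.0, 0.5]], W_c=[[0.1, 0.0], [0.0, 0.1]])
e = task_embedding(support, labels)
gamma, beta = conditioned_affine(
    h=[0.5, -0.5],
    e=e,
    W_proj=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]],
    U_cond=[[0.0, 0.0], [0.0, 0.0], [0.1, 0.0], [0.0, 0.2]],
    b_proj=[1.0, 1.0, 0.0, 0.0],
    channels=2,
)
```

Neither function changes any parameter at test time: the task enters only through the support-derived inputs `h0`, `c0`, and `e`.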
The overall episodic meta-training procedure is summarized in Algorithm 1, while the conceptual roles of the three meta-learning mechanisms are illustrated in Figure 2. Meta-Initialization performs a one-shot mapping from the support set to task-specific initial states at the beginning of each episode, aligning the temporal dynamics with the task distribution. Meta-Conditioning injects the task embedding into the affine projection process, enabling task-aware modulation of the generated affine parameters. Meta-Update applies a lightweight inner-loop refinement to the projection head parameters based on the support set loss, as implemented in Algorithm 1, allowing rapid task-level adaptation without modifying the backbone or the recurrent generator.
In addition, temporal smoothness regularization is applied to stabilize the evolution of affine parameters across steps, as described in Algorithm 2, while the inference procedure is summarized in Algorithm 3.
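The temporal smoothness regularization can be sketched as a penalty on the change in affine parameters between consecutive steps, set to zero at the first step. The squared L2 difference, the summation over steps, and the names `smoothness_penalty`, `episode_loss`, and `lam` are assumptions for illustration; the exact form is the one defined in the paper's equations.

```python
def smoothness_penalty(prev, curr):
    """Squared L2 change in (gamma, beta) between consecutive steps; 0 at the first step."""
    if prev is None:
        return 0.0
    (g0, b0), (g1, b1) = prev, curr
    gamma_term = sum((a - b) ** 2 for a, b in zip(g1, g0))
    beta_term = sum((a - b) ** 2 for a, b in zip(b1, b0))
    return gamma_term + beta_term

def episode_loss(task_losses, affines, lam):
    """Accumulate task loss plus lam times the smoothness penalty over all steps."""
    total, prev = 0.0, None
    for l_task, curr in zip(task_losses, affines):
        total += l_task + lam * smoothness_penalty(prev, curr)
        prev = curr
    return total

# Each entry is (gamma, beta) for one step; the values are made up for the example.
affines = [
    ([1.0, 1.0], [0.0, 0.0]),
    ([1.1, 0.9], [0.0, 0.2]),
    ([1.1, 0.9], [0.0, 0.2]),
]
task_losses = [0.7, 0.5, 0.4]
loss = episode_loss(task_losses, affines, lam=0.5)
```

In this toy episode only the transition from the first to the second step is penalized, because the third step repeats the second step's parameters.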
All three mechanisms are modular and can be independently activated depending on the target scenario (e.g., FSL or SFDA), without requiring any modification to the backbone network or the underlying LSTM architecture.
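| Algorithm 1. Episodic Meta-Training for Meta-LSTM-Affine |
| Input: task distribution; backbone parameters; LSTM parameters; projection head; optional Meta-Init network, Meta-Conditioning network, and conditional matrix; inner step size; smoothness weight |
| Output: trained parameters of the components listed above |
| 1. repeat # for each episode |
| 2. Sample a task with a support set and a query set; |
| 3. if the Meta-Initialization module is used then |
| 4. Generate the initial hidden and cell states from the support set with the Meta-Init network; |
| 5. else |
| 6. Use the default initial hidden and cell states; |
| 7. if the Meta-Conditioning module is used then |
| 8. Compute the task embedding from the support set with the Meta-Conditioning network; |
| 9. else |
| 10. Omit the task-conditioning term; |
| 11. # Initialize inner projection parameters: |
| 12. Copy the projection head parameters into the inner-loop parameters; |
| 13. # Support set forward (inner loop): |
| 14. Initialize the LSTM states to the initial states; and clear the support loss; |
| 15. for each support sample do |
| 16. # Compute descriptor |
| 17. Extract features and compute the GAP descriptor; update the LSTM states; |
| 18. # Generate affine parameters |
| 19. Project the hidden state (and the task embedding, if enabled) into the affine parameters with the inner-loop projection parameters; |
| 20. Apply the affine parameters to obtain the output; |
| 21. Compute task loss and smoothness penalty; # using Algorithm 2 |
| 22. Add the task loss and the weighted smoothness penalty to the support loss; # update loss |
| 23. Carry the LSTM states and affine parameters forward to the next step; |
| 24. end for |
| 25. # Meta-Update (if enabled): |
| 26. Compute the gradient of the support loss with respect to the inner-loop projection parameters; |
| 27. Take a gradient step with the inner step size to obtain the updated projection parameters; |
| 28. # Query set forward (outer loop): |
| 29. Reset the LSTM states to the initial states; clear the query loss; |
| 30. for each query sample do |
| 31. Extract features and compute the GAP descriptor; update the LSTM states; |
| 32. # Generate affine parameters |
| 33. Project the hidden state (and the task embedding, if enabled) into the affine parameters with the updated projection parameters; |
| 34. Apply the affine parameters to obtain the output; |
| 35. Compute task loss and smoothness penalty; # using Algorithm 2 |
| 36. Add the task loss and the weighted smoothness penalty to the query loss; # update loss |
| 37. Carry the LSTM states and affine parameters forward to the next step; |
| 38. end for |
| 39. Set the outer objective to the accumulated query loss; # Compute outer objective |
| 40. # Outer update: |
| 41. Apply gradient descent on the outer objective to the meta-trained parameters (including optional components, if learnable) |
| 42. until convergence or maximum episodes reached |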
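| Algorithm 2. Temporal Smoothness Regularization |
| 1. for every step after the first: |
| 2. set the penalty to the change in the affine parameters relative to the previous step; |
| 3. for the first step: |
| 4. set the penalty to 0. |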
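| Algorithm 3. Meta-Test (Inference) |
| Input: trained parameters (the output of Algorithm 1); new task with a support set and a query set. |
| 1. Meta-Init: If enabled, compute the initial states from the support set; else use the default initial states; |
| 2. Meta-Conditioning: If enabled, compute the task embedding from the support set; otherwise omit the task-conditioning term; |
| 3. Meta-Update (optional): Apply one light inner-step update to the projection head using the support set; |
| 4. Query inference: Reset states to the initial states, then generate the affine parameters and final predictions. |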
3.4. Bi-Level Training Objective
Meta-training follows an episodic paradigm [7,18]. For each episode, the support set is used to perform task adaptation (Meta-Initialization, Meta-Conditioning, or Meta-Update), while the query set evaluates performance. The outer-loop meta-objective minimizes the query-set loss, as given in Equation (11). Its first term is the standard task classification loss computed on the query set using the normalized features produced by the affine parameters. Its second term, defined in Equation (12), is the temporal smoothness regularization that encourages the affine parameters to vary smoothly across timesteps, scaled by a regularization weight. A single-level objective was insufficient to simultaneously enforce temporal smoothness and task-level adaptation, motivating the use of a bi-level formulation.
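For concreteness, one plausible way to write the outer objective and the smoothness term is given below. The notation is assumed for illustration and is not taken from the paper: $Q$ is the query set, $\phi'$ the projection parameters after the inner update, $\gamma_t$ and $\beta_t$ the scale and shift at step $t$, $T$ the number of steps, and $\lambda$ the regularization weight. The squared L2 norm is likewise an assumption; the authoritative definitions are Equations (11) and (12).

```latex
\mathcal{L}_{\mathrm{meta}}
  = \mathcal{L}_{\mathrm{task}}\bigl(Q;\, \phi'\bigr)
  + \lambda\, \mathcal{L}_{\mathrm{smooth}},
\qquad
\mathcal{L}_{\mathrm{smooth}}
  = \sum_{t=2}^{T} \Bigl( \lVert \gamma_t - \gamma_{t-1} \rVert_2^2
  + \lVert \beta_t - \beta_{t-1} \rVert_2^2 \Bigr)
```

The bi-level structure enters through $\phi'$: it is obtained from the support set in the inner loop, and the outer loop differentiates the query loss with respect to the meta-trained parameters that produced it.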
During meta-training, tasks are constructed episodically with disjoint support and query sets. Class splits for training, validation, and testing strictly follow standard Few-Shot Learning protocols to ensure fair evaluation and prevent data leakage across tasks.