Meta-LSTM-Affine: A Memory-Based Meta-Adaptive Affine Modeling Framework for Non-Stationary Systems

Kao, Yang-Ta; Tu, Ching-Ting; Lin, Hwei Jen; Tokuyama, Yoshimasa

doi:10.3390/electronics15101990

¹ Department of Commerce Technology and Management, Chihlee University of Technology, New Taipei City 220305, Taiwan

² Department of Applied Mathematics, National Chung Hsing University, Taichung City 40227, Taiwan

³ Department of Computer Science and Information Engineering, Tamkang University, New Taipei City 251301, Taiwan

⁴ Department of Media and Image Technology, Faculty of Engineering, Tokyo Polytechnic University, Tokyo 164-0012, Japan

* Authors to whom correspondence should be addressed.

Electronics 2026, 15(10), 1990; https://doi.org/10.3390/electronics15101990

Submission received: 5 April 2026 / Revised: 27 April 2026 / Accepted: 6 May 2026 / Published: 8 May 2026

(This article belongs to the Section Systems & Control Engineering)

Abstract

Modeling non-stationary systems with dynamically evolving data distributions remains a fundamental challenge in modern learning and optimization problems. In this work, we adopt a generalized notion of non-stationarity, where distribution shifts across tasks and domains are treated as forms of non-stationary processes. This perspective allows us to study non-stationary behavior in controlled settings such as Few-Shot Learning (FSL) and Source-Free Domain Adaptation (SFDA), where data distributions vary across episodes or domains. Conventional normalization and feature modulation strategies often rely on batch-level statistics, leading to unstable behavior under small-batch, streaming, and distribution-shifted conditions. To address these limitations, we propose Meta-LSTM-Affine, a memory-based meta-adaptive affine modeling (normalization) framework that unifies recurrent temporal memory and meta-learning for robust feature modulation. Unlike batch-statistics-driven normalization, our method employs an LSTM-based affine parameter generator (APG) to dynamically produce channel-wise scale and shift parameters based on both current inputs and historical context. To further enhance task-level adaptability, we introduce three lightweight meta-learning mechanisms—Meta-Initialization, Meta-Conditioning, and Meta-Update—that enable rapid cross-task adaptation without modifying the backbone. A bi-level training strategy with temporal smoothness regularization ensures stable affine parameter dynamics under distributional shifts. We validate Meta-LSTM-Affine on FSL and SFDA benchmarks, including Omniglot, MiniImageNet, TieredImageNet, Office-31, MNIST, SVHN, and USPS. Experimental results show that our method consistently outperforms existing approaches such as BN, MetaBN, MetaAFN, and LSTM-Affine, achieving improved stability and adaptation performance with minimal additional computational overhead. 
Overall, Meta-LSTM-Affine provides a stable and efficient affine modeling mechanism for learning under generalized non-stationary conditions without relying on batch-level statistics. This generalized formulation of non-stationarity allows us to study distributional changes in controlled and widely used benchmark settings, while maintaining relevance to real-world scenarios such as streaming data, continual learning, and time-evolving environments.

Keywords:

affine modeling; batch-statistics-free normalization; meta-adaptive optimization; recurrent memory; affine parameter generation; non-stationary systems; streaming inference; few-shot learning; source-free domain adaptation

1. Introduction

Affine modeling (normalization) techniques are indispensable components in modern deep neural networks, as they stabilize feature distributions and facilitate faster convergence and improved generalization. Among them, Batch Normalization (BN) [1] remains one of the most widely used methods, normalizing activations across the batch dimension and applying a learned affine transformation. Despite its popularity, BN exhibits inherent limitations when batch statistics are unreliable or unavailable, such as in small-batch training, streaming inference, and distribution-shifted scenarios. In these cases, the mismatch between training and testing statistics often leads to unstable representations and degraded performance.

In this work, we consider a generalized notion of non-stationarity, where data distributions may evolve not only over time but also across tasks or domains. From this perspective, distribution shifts encountered in Few-Shot Learning (FSL) and Source-Free Domain Adaptation (SFDA) can both be interpreted as forms of non-stationary processes, in which the underlying data distribution varies across episodes or domains. This viewpoint enables us to study non-stationary behavior in controlled and widely adopted benchmark settings, while maintaining relevance to real-world scenarios such as streaming data, continual learning, and time-evolving environments.

Several studies have attempted to address these limitations. For example, MetaBN [2] adapts BN statistics via meta-learning, while Meta-Affine Normalization (MetaAFN) [3] proposes a batch-statistics-free affine transformation framework that learns adaptive parameters without relying on batch statistics. More recently, LSTM-Affine [4] replaced fixed affine parameters with a recurrent generator, leveraging temporal memory to capture sequential patterns across inputs. However, existing batch-statistics-free designs remain complementary but incomplete: MetaAFN removes batch dependency via adaptive affine modulation, yet it does not explicitly enforce temporal consistency across sequential inputs, whereas LSTM-Affine captures temporal dynamics but struggles to generalize across tasks due to its fixed initialization and projection mechanisms—limitations that are particularly critical in Few-Shot Learning (FSL) [5,6,7,8,9,10,11] and Source-Free Domain Adaptation (SFDA) [12,13,14].

To address these complementary limitations, we propose Meta-LSTM-Affine, a unifying normalization framework that combines the batch-statistics-free design of MetaAFN, the recurrent memory of LSTM-Affine, and meta-learning for rapid task-level adaptation. Specifically, we introduce three lightweight meta-adaptation mechanisms—Meta-Initialization, Meta-Conditioning, and Meta-Update—that enhance task-specific adaptability without requiring backbone modification or incurring significant inference cost. Furthermore, we provide theoretical analysis of temporal smoothness regularization to stabilize affine parameter dynamics under non-stationary shifts.

Extensive experiments on benchmark datasets for both FSL and SFDA demonstrate that Meta-LSTM-Affine consistently outperforms BN, MetaBN, MetaAFN, and LSTM-Affine, achieving faster task-level adaptation, improved stability, and robust performance under distribution shifts. We further show that the proposed design incurs minimal inference overhead, making it suitable for deployment in settings where batch statistics are unreliable or unavailable.

Specifically, our contributions are threefold:

(1): Unified batch-statistics-free normalization framework: We propose Meta-LSTM-Affine, which reformulates normalization as dynamic affine parameter generation and represents a unified framework that integrates recurrent temporal memory with meta-learning to address both Few-Shot Learning (FSL) and Source-Free Domain Adaptation (SFDA) under a generalized non-stationary setting.
(2): Lightweight meta-adaptation mechanisms: We introduce three modular strategies—Meta-Initialization, Meta-Conditioning, and Meta-Update—that enable rapid task-level adaptation while keeping the backbone and recurrent generator fixed, resulting in efficient and stable adaptation.
(3): Stability via temporal smoothness regularization: We incorporate a temporal smoothness objective to stabilize the evolution of affine parameters under non-stationary shifts, and provide empirical analysis demonstrating its effectiveness across standard FSL and SFDA benchmarks.

Compared with prior works such as MetaAFN and LSTM-Affine, the proposed framework integrates temporal modeling and task-level meta-adaptation within a unified batch-statistics-free formulation.

2. Related Work

Normalization techniques have been extensively explored for stabilizing training and improving generalization in deep neural networks. Prior research can be broadly grouped into several major directions, including traditional normalization methods and their variants, meta-learning-based adaptive normalization, batch-statistics-free approaches, and recurrent or memory-enhanced mechanisms. Below, we provide a brief overview of each category.

2.1. Batch Normalization and Its Variants

Batch Normalization (BN) [1] has been the dominant normalization method in deep neural networks, normalizing activations using batch-wise mean and variance, followed by a learnable affine transformation. While BN significantly accelerates training and improves generalization, its reliance on batch statistics makes it less reliable in small-batch training or when facing domain shifts at test time. Several variants have been proposed to mitigate this limitation. Instance Normalization (IN) [15] computes statistics per instance and channel, making it popular in style transfer tasks. Layer Normalization (LN) [16] normalizes across all channels of a layer, which is particularly effective in sequence models such as transformers. Group Normalization (GN) [17] aggregates statistics over groups of channels, offering stable performance in small-batch regimes. Despite these improvements, these methods still depend on fixed affine transformations learned during training, which may be suboptimal under distribution shifts.

2.2. Meta-Learning for Adaptive Normalization

Meta-learning has emerged as a powerful paradigm to enable rapid task-level adaptation [7,18,19]. In normalization, MetaBN [2] extends BN by adapting affine parameters through a meta-learning framework, improving performance in few-shot learning scenarios. Other works introduce feature-wise modulation mechanisms conditioned on auxiliary information [20,21], allowing scale and shift parameters to dynamically adapt to task-specific contexts. More recent approaches explore meta-modulation strategies [22] and conditional normalization [21] to incorporate task embeddings into normalization layers.

Beyond normalization, meta-learning has been extensively studied in Few-Shot Learning (FSL). Gradient-based meta-learners such as MAML [7] and metric-learning approaches, including Matching Networks [10], Prototypical Networks [5], and Relation Networks [11], demonstrate that task-conditioned adaptation enables rapid generalization from limited samples. These studies confirm the strong effectiveness of meta-learning in FSL and motivate its integration into normalization modules. Moreover, recent works extend meta-learning beyond FSL into domain adaptation [12,13], where task-specific modulation helps models generalize to unseen target domains. This naturally motivates the exploration of Source-Free Domain Adaptation (SFDA) [14], where no source data is available during adaptation.

2.3. Batch-Statistics-Free Normalization

An alternative line of research questions the necessity of batch statistics altogether. Batch-statistics-free approaches directly learn or generate affine parameters to modulate features [3,23,24]. For example, Meta Affine Transformation (MetaAFN) [3] entirely removes the whitening stage, replacing it with a lightweight meta-network that adaptively generates channel-wise scale and shift. By avoiding dependency on batch statistics, MetaAFN has shown robustness in both few-shot learning and source-free domain adaptation. Subsequent studies further explored learnable adaptive normalization [23] and sample normalization tailored for few-shot detection [24]. While batch-statistics-free methods effectively reduce the risk of mismatched statistics, many designs rely on static affine parameters, which may be insufficient for highly dynamic or sequential settings.

2.4. Recurrent and Memory-Based Normalization

Recurrent architectures provide an alternative means of adaptation by leveraging temporal information. Long Short-Term Memory (LSTM) [25] networks can maintain hidden states across sequential inputs, making them well-suited for scenarios with gradual distribution shift. Our prior work, LSTM-Affine [4], integrates an LSTM-based generator to produce affine parameters that evolve with temporal context, resulting in more stable adaptation compared to static batch-statistics-free methods. While effective for in-episode sequential adaptation, LSTM-Affine lacks explicit task-level meta-adaptation, limiting its ability to rapidly generalize across different tasks or domains.

2.5. Toward Meta-Recurrent Normalization

Bridging batch-statistics-free design, recurrent adaptation, and meta-learning remains largely unexplored. Recent works such as MetaDiff [26] and domain generalization surveys [27] highlight the potential of combining meta-learning with advanced architectures for better generalization. However, existing meta-learning methods [18,19] do not explicitly integrate temporal memory, while recurrent generators [4,25] have yet to be combined with meta-learning mechanisms. This motivates the development of Meta-LSTM-Affine, a unified framework that incorporates batch-statistics-free efficiency [3], recurrent temporal modeling [4], and meta-learning-driven adaptability across both FSL and SFDA tasks [12,13,14].

3. The Proposed Method

We propose Meta-LSTM-Affine, a batch-statistics-free normalization framework that unifies recurrent memory and meta-learning to achieve both temporal and task-level adaptability. The key idea is to replace BN’s batch-statistics-driven normalization step with an LSTM-based affine parameter generator (APG), and further enhance its adaptability with meta-learning strategies.

3.1. Reformulating Normalization as Affine Transformation

Let $x_{t} \in \mathbb{R}^{C \times H \times W}$ denote the activation map at step $t$, where $C$ is the number of channels. Standard BN [1] applies the transformation shown in Equation (1), where $\mu_{B}$ and $\sigma_{B}$ are batch statistics, and $\gamma, \beta \in \mathbb{R}^{C}$ are learnable parameters.

$$\hat{x}_{t} = \frac{x_{t} - \mu_{B}}{\sigma_{B}}, \qquad f_{t} = \gamma \odot \hat{x}_{t} + \beta \tag{1}$$

MetaAFN [3] showed that the reliance on batch statistics can be removed by directly generating $\gamma$ and $\beta$ using a lightweight meta-network conditioned on input features. Similarly, LSTM-Affine [4] demonstrated that a recurrent generator can produce temporally consistent $\gamma_{t}$, $\beta_{t}$, adapting to gradual distribution shifts. In Meta-LSTM-Affine, we unify these ideas by incorporating meta-learning into the recurrent generator, enabling both temporal consistency and rapid task-level adaptation.
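The contrast between Equation (1) and a batch-statistics-free affine modulation can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions: the function names `batch_norm` and `generated_affine` are hypothetical, the stabilizing constant `eps` is the usual implementation detail and does not appear in Equation (1), and the per-sample scale and shift values are placeholders rather than outputs of a trained generator.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Equation (1) for x of shape (B, C, H, W); statistics taken over B, H, W.
    eps is an added numerical safeguard (assumption, not part of Eq. (1))."""
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]

def generated_affine(x, gamma, beta):
    """Batch-statistics-free modulation with per-sample gamma, beta of shape (B, C)."""
    return gamma[:, :, None, None] * x + beta[:, :, None, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 5, 5))
gamma, beta = np.ones(3), np.zeros(3)

# The BN output for sample 0 changes when the rest of the batch changes ...
bn_full = batch_norm(x, gamma, beta)[0]
bn_pair = batch_norm(x[:2], gamma, beta)[0]
bn_depends_on_batch = not np.allclose(bn_full, bn_pair)

# ... whereas the generated-affine output for sample 0 does not.
g = np.full((4, 3), 1.5)   # placeholder per-sample scales
b = np.full((4, 3), 0.1)   # placeholder per-sample shifts
af_full = generated_affine(x, g, b)[0]
af_pair = generated_affine(x[:2], g[:2], b[:2])[0]
affine_is_batch_independent = np.allclose(af_full, af_pair)
```

The comparison makes the motivation concrete: under BN, the representation of a sample depends on which other samples share its batch, while a generated affine transform depends only on the sample and its own parameters.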

3.2. LSTM-Based Affine Parameter Generator

At each step $t$, we first extract a compact descriptor $\bar{x}_{t} \in \mathbb{R}^{C}$ via global average pooling (GAP), as shown in Equation (2). This descriptor is fed into an LSTM [25], which maintains hidden and cell states $(h_{t}, c_{t})$, as shown in Equation (3), where $\Theta$ denotes the LSTM parameters. The hidden state $h_{t} \in \mathbb{R}^{d}$ is then projected into affine parameters, as shown in Equation (4), where $W_{p} \in \mathbb{R}^{2C \times d}$ and $b_{p} \in \mathbb{R}^{2C}$. The final affine-transformed output $f_{t}$ is obtained as shown in Equation (5). Thus, $\gamma_{t}$, $\beta_{t}$ evolve with temporal dynamics while remaining independent of batch statistics. We deliberately keep the LSTM parameters $\Theta$ fixed during meta-adaptation, as updating recurrent parameters under few-shot or streaming settings may lead to unstable temporal dynamics and overfitting. Therefore, we restrict adaptation to the lightweight projection head, which enables efficient task-specific refinement while preserving the temporal structure learned by the LSTM. Although the input samples are not strictly sequential, the LSTM serves as a mechanism to model gradual distributional adaptation across samples within an episode.

The overall workflow of the LSTM-based affine parameter generator is illustrated in Figure 1 [4], highlighting its role as a batch-statistics-free alternative to BN.

$$\bar{x}_{t} = \mathrm{GAP}(x_{t}) \tag{2}$$

$$(h_{t}, c_{t}) = \mathrm{LSTM}(\bar{x}_{t}, h_{t-1}, c_{t-1}; \Theta) \tag{3}$$

$$[\gamma_{t}, \beta_{t}] = W_{p} h_{t} + b_{p} \tag{4}$$

$$f_{t} = \gamma_{t} \odot x_{t} + \beta_{t} \tag{5}$$
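Equations (2) to (5) can be traced step by step in a minimal NumPy sketch. Several details below are assumptions made only for illustration: the function names `lstm_cell` and `apg_step`, the gate ordering inside the stacked LSTM weights, the random parameter values, and the choice to initialize $b_{p}$ so that $\gamma_{t}$ starts near one and $\beta_{t}$ near zero. It is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_bar, h, c, Theta):
    """One standard LSTM step. Theta = (W, R, b) with the four gates stacked
    in the order input, forget, candidate, output (an illustrative layout)."""
    W, R, b = Theta
    d = h.shape[0]
    z = W @ x_bar + R @ h + b                      # shape (4d,)
    i, f = sigmoid(z[:d]), sigmoid(z[d:2 * d])
    g, o = np.tanh(z[2 * d:3 * d]), sigmoid(z[3 * d:])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def apg_step(x, h, c, Theta, W_p, b_p):
    """Eqs. (2)-(5) for one activation map x of shape (C, H, W)."""
    C = x.shape[0]
    x_bar = x.mean(axis=(1, 2))                    # Eq. (2): GAP descriptor
    h, c = lstm_cell(x_bar, h, c, Theta)           # Eq. (3): recurrent update
    gb = W_p @ h + b_p                             # Eq. (4): projection to 2C values
    gamma, beta = gb[:C], gb[C:]
    f = gamma[:, None, None] * x + beta[:, None, None]   # Eq. (5)
    return f, gamma, beta, h, c

rng = np.random.default_rng(0)
C, d, T = 3, 8, 5
Theta = (0.1 * rng.normal(size=(4 * d, C)),
         0.1 * rng.normal(size=(4 * d, d)),
         np.zeros(4 * d))
W_p = 0.1 * rng.normal(size=(2 * C, d))
b_p = np.concatenate([np.ones(C), np.zeros(C)])    # assumption: start near identity

h, c = np.zeros(d), np.zeros(d)
xs = rng.normal(size=(T, C, 4, 4))                 # a short stream of activation maps
gammas = []
for x in xs:
    f, gamma, beta, h, c = apg_step(x, h, c, Theta, W_p, b_p)
    gammas.append(gamma)
```

Because the hidden state is carried from one sample to the next, each $\gamma_{t}$, $\beta_{t}$ depends on the current descriptor and on the samples seen earlier in the stream, and no batch statistic enters the computation.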

3.3. Meta-Learning Integration

While the LSTM-based affine parameter generator (APG) provides temporal adaptation, its projection parameters $W_{p}$, $b_{p}$ and recurrent weights $\Theta$ remain static after training, limiting task-level generalization under distribution shifts (e.g., FSL or SFDA). This limitation is particularly problematic in Few-Shot Learning (FSL), where each new task has its own distribution and requires rapid adaptation. To address this, we introduce three meta-learning mechanisms that incorporate the support set $S$ into the adaptation process.

(a): Meta-Initialization

Instead of starting the LSTM with fixed hidden and cell states, we generate task-specific initial states from the support set, as shown in Equation (6), where $f_{\varphi}(\cdot)$ is a learnable meta-network trained end-to-end during meta-training to produce task-adaptive initializations. This design ensures that each task begins from an initialization aligned with its underlying distribution. The loss on the support set $L_{S}$ and the loss on the query set $L_{Q}$ share the same formulation, as shown in Equation (7). Each loss consists of (1) a standard task classification loss $l_{t}$ computed from the classifier output, and (2) a temporal smoothness penalty $r_{t}$, weighted by $\lambda$, that encourages the affine parameters $(\gamma_{t}, \beta_{t})$ to evolve smoothly across timesteps, as defined in Equation (8). Intuitively, this regularization discourages abrupt changes in the generated affine parameters between consecutive timesteps, stabilizing feature modulation under non-stationary input streams and complementing the temporal modeling provided by the LSTM. Meta-Initialization primarily addresses task-level distribution mismatch at the sequence onset, without altering the temporal dynamics governed by the LSTM.

$$(h_{0}, c_{0}) = f_{\varphi}(S) \tag{6}$$

$$L_{R} = \sum_{t=1}^{T} (l_{t} + \lambda r_{t}), \qquad R \in \{S, Q\} \tag{7}$$

$$r_{t} = \mathrm{Smooth}(\gamma_{t}, \beta_{t}; \gamma_{t-1}, \beta_{t-1}) = \begin{cases} \lVert \gamma_{t} - \gamma_{t-1} \rVert_{2}^{2} + \lVert \beta_{t} - \beta_{t-1} \rVert_{2}^{2} & \text{for } t > 1 \\ 0 & \text{otherwise} \end{cases} \tag{8}$$
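A short NumPy sketch shows how the smoothness penalty of Equation (8) enters the episode loss of Equation (7). The helper names `smooth` and `episode_loss` and the toy values of $l_{t}$, $\gamma_{t}$, and $\beta_{t}$ are made up for illustration; passing `None` for the previous parameters stands for the first step, where the penalty is zero.

```python
import numpy as np

def smooth(gamma_t, beta_t, gamma_prev=None, beta_prev=None):
    """Eq. (8): squared L2 change of the affine parameters; 0 at the first step."""
    if gamma_prev is None or beta_prev is None:
        return 0.0
    return float(np.sum((gamma_t - gamma_prev) ** 2)
                 + np.sum((beta_t - beta_prev) ** 2))

def episode_loss(task_losses, gammas, betas, lam):
    """Eq. (7): sum over steps of l_t + lam * r_t."""
    total, g_prev, b_prev = 0.0, None, None
    for l_t, g_t, b_t in zip(task_losses, gammas, betas):
        total += l_t + lam * smooth(g_t, b_t, g_prev, b_prev)
        g_prev, b_prev = g_t, b_t
    return total

# Toy three-step episode: the parameters jump once (step 1 -> 2), then stay put.
gammas = [np.array([1.0, 1.0]), np.array([1.0, 2.0]), np.array([1.0, 2.0])]
betas = [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([0.5, 0.0])]
loss = episode_loss([0.3, 0.2, 0.1], gammas, betas, lam=0.1)
# r_1 = 0, r_2 = 1.0 + 0.25 = 1.25, r_3 = 0, so loss = 0.6 + 0.1 * 1.25 = 0.725
```

Only the jump between the first and second step is penalized, which is the behavior the regularizer is meant to discourage.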

(b): Meta-Conditioning

A task embedding is extracted from the support set, $z_{S} = g_{\psi}(S)$, for example via class prototypes or attention-based pooling, where $g_{\psi}(\cdot)$ is a learnable meta-network. The affine parameters are then generated as shown in Equation (9), where $h_{t}$ captures temporal patterns, while the additional term $U z_{S}$ injects task-specific information. While Meta-Conditioning injects task context as a static conditioning signal, it does not involve parameter updates and therefore complements, rather than overlaps with, gradient-based refinement. Thus, affine parameters depend both on temporal memory and the task embedding, enabling per-task adaptation without modifying the backbone. Importantly, $U$ is learned during meta-training but kept fixed during meta-test inference, while adaptation arises solely from the task-dependent $z_{S}$.

$$[\gamma_{t}, \beta_{t}] = W_{p} h_{t} + U z_{S} + b_{p} \tag{9}$$
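The following sketch illustrates Equation (9) with a deliberately simple task embedding. It assumes a parameter-free stand-in for $g_{\psi}$ (the mean of the class prototypes of the support descriptors), whereas the text describes $g_{\psi}$ as a learnable meta-network; the names `task_embedding` and `conditioned_affine` and all numerical values are hypothetical.

```python
import numpy as np

def task_embedding(support_feats, support_labels):
    """Hypothetical stand-in for g_psi: mean of the class prototypes."""
    classes = np.unique(support_labels)
    protos = np.stack([support_feats[support_labels == k].mean(axis=0)
                       for k in classes])
    return protos.mean(axis=0)

def conditioned_affine(h_t, z_S, W_p, U, b_p):
    """Eq. (9): [gamma_t, beta_t] = W_p h_t + U z_S + b_p."""
    C = b_p.shape[0] // 2
    gb = W_p @ h_t + U @ z_S + b_p
    return gb[:C], gb[C:]

rng = np.random.default_rng(0)
C, d = 3, 4
feats = rng.normal(size=(10, C))               # GAP descriptors of 10 support samples
labels = np.array([0] * 5 + [1] * 5)
z_S = task_embedding(feats, labels)            # shape (C,)

W_p = rng.normal(size=(2 * C, d))
U = rng.normal(size=(2 * C, C))
b_p = np.concatenate([np.ones(C), np.zeros(C)])
h_t = rng.normal(size=d)

# With z_S = 0 the conditioning term vanishes and Eq. (9) reduces to Eq. (4).
gamma_plain, beta_plain = conditioned_affine(h_t, np.zeros(C), W_p, U, b_p)
gamma_task, beta_task = conditioned_affine(h_t, z_S, W_p, U, b_p)
```

The task embedding shifts the generated parameters by the fixed offset $U z_{S}$ at every step, so the adaptation comes from the support set alone and requires no gradient computation.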

(c): Meta-Update

A lightweight inner-loop adaptation is applied only to the projection head $(W_{p}, b_{p})$, while keeping the backbone and LSTM fixed. Given the support set loss $L_{S}$, one or a few gradient steps yield updated parameters $(W'_{p}, b'_{p})$, as shown in Equation (10), where $\alpha$ is optimized during meta-training and remains fixed at inference, rather than being treated as a hyperparameter or generated by a meta-network. This design allows rapid refinement of the affine projection head using only the support set, while avoiding adaptation of the entire backbone or LSTM generator. This localized adaptation strategy follows recent observations in meta-learning that fast task-level adaptation is most effective when confined to task-specific affine or projection layers, rather than deep feature extractors.

$$W'_{p} = W_{p} - \alpha \nabla_{W_{p}} L_{S}, \qquad b'_{p} = b_{p} - \alpha \nabla_{b_{p}} L_{S} \tag{10}$$
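The inner-loop step of Equation (10) can be sketched with analytic gradients in NumPy. To keep the example self-contained, the support loss is replaced by a stand-in: a mean squared error between the projected parameters $W_{p} h_{t} + b_{p}$ and hypothetical targets. In the actual framework $L_{S}$ is the classification loss plus the smoothness penalty and the gradients would come from automatic differentiation; the names `support_loss_and_grads` and `meta_update` are illustrative.

```python
import numpy as np

def support_loss_and_grads(W_p, b_p, H, Y):
    """Stand-in support loss (assumption): 0.5 * mean_t ||W_p h_t + b_p - y_t||^2,
    with its exact gradients with respect to W_p and b_p."""
    R = H @ W_p.T + b_p - Y                    # residuals, shape (T, 2C)
    loss = 0.5 * np.mean(np.sum(R ** 2, axis=1))
    grad_W = R.T @ H / H.shape[0]              # shape (2C, d)
    grad_b = R.mean(axis=0)                    # shape (2C,)
    return loss, grad_W, grad_b

def meta_update(W_p, b_p, H, Y, alpha, steps=1):
    """Eq. (10): gradient steps on the projection head only."""
    W, b = W_p.copy(), b_p.copy()
    for _ in range(steps):
        _, g_W, g_b = support_loss_and_grads(W, b, H, Y)
        W, b = W - alpha * g_W, b - alpha * g_b
    return W, b

rng = np.random.default_rng(0)
T, C, d = 5, 3, 4
H = rng.normal(size=(T, d))                    # hidden states h_t on the support set
Y = rng.normal(size=(T, 2 * C))                # hypothetical targets
W_p, b_p = rng.normal(size=(2 * C, d)), np.zeros(2 * C)

loss_before, _, _ = support_loss_and_grads(W_p, b_p, H, Y)
W_new, b_new = meta_update(W_p, b_p, H, Y, alpha=0.05)
loss_after, _, _ = support_loss_and_grads(W_new, b_new, H, Y)
```

Only $2C(d+1)$ numbers are updated per task, which is what keeps this refinement cheap compared with adapting the backbone or the recurrent weights.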

It is important to clarify that the meta-networks introduced in our design are not responsible for training or generating the parameters of the LSTM itself. Instead, they act as auxiliary modules that enhance task adaptability at different stages of the adaptation process. Meta-Initialization provides task-specific initial hidden and cell states, Meta-Conditioning injects task-level information into the affine generation process, and Meta-Update refines the affine projection parameters through lightweight gradient-based adaptation. Together, these mechanisms align initialization, conditioning, and refinement, enabling efficient task-level adaptation under diverse distribution shifts.

The overall episodic meta-training procedure is summarized in Algorithm 1, while the conceptual roles of the three meta-learning mechanisms are illustrated in Figure 2. Meta-Initialization performs a one-shot mapping $(h_{0}, c_{0}) = f_{\varphi}(S)$ at the beginning of each episode, generating task-specific initial states from the support set and aligning the temporal dynamics with the task distribution. Meta-Conditioning injects the task embedding $z_{S}$ into the affine projection process, enabling task-aware modulation of the generated affine parameters. Meta-Update applies a lightweight inner-loop refinement to the projection head parameters based on the support set loss, as implemented in Algorithm 1, allowing rapid task-level adaptation without modifying the backbone or the recurrent generator.

In addition, temporal smoothness regularization is applied to stabilize the evolution of affine parameters across steps, as described in Algorithm 2, while the inference procedure is summarized in Algorithm 3.

All three mechanisms are modular and can be independently activated depending on the target scenario (e.g., FSL or SFDA), without requiring any modification to the backbone network or the underlying LSTM architecture.

Algorithm 1. Episodic Meta-Training for Meta-LSTM-Affine

Input: Task distribution $p(\mathcal{T})$; backbone parameters $\theta$; LSTM parameters $\Theta$; projection head $W_{p}, b_{p}$; optional Meta-Init network $f_{\varphi}$, Meta-Conditioning network $g_{\psi}$, and conditioning matrix $U$; inner step size $\alpha$; smoothness weight $\lambda$

Output: Trained parameters $\theta$, $\Theta$, $W_{p}$, $b_{p}$, $\varphi$, $\psi$, $U$, $\alpha$

1. repeat # for each episode
2.   Sample a task $\mathcal{T} \sim p(\mathcal{T})$ with support set $S$ and query set $Q$;
3.   if the Meta-Initialization module is used then
4.     $(h_{0}, c_{0}) \leftarrow f_{\varphi}(S)$
5.   else
6.     $(h_{0}, c_{0}) \leftarrow (0, 0)$;
7.   if the Meta-Conditioning module is used then
8.     $z_{S} \leftarrow g_{\psi}(S)$
9.   else
10.     $z_{S} \leftarrow 0$;
11.   # Initialize inner projection parameters:
12.   $W'_{p} \leftarrow W_{p}$; $b'_{p} \leftarrow b_{p}$;
13.   # Support set forward (inner loop):
14.   Initialize $L_{S} \leftarrow 0$; $(h, c) \leftarrow (h_{0}, c_{0})$ and clear $(\gamma_{prev}, \beta_{prev})$;
15.   for each $(x_{t}, y_{t}) \in S$ do
16.     # Compute descriptor
17.     $\bar{x}_{t} \leftarrow \mathrm{GAP}(x_{t})$; $(h, c) \leftarrow \mathrm{LSTM}(\bar{x}_{t}, h, c; \Theta)$;
18.     # Generate affine parameters
19.     $[\gamma_{t}, \beta_{t}] \leftarrow W'_{p} h + U z_{S} + b'_{p}$;
20.     $f_{t} \leftarrow \gamma_{t} \odot x_{t} + \beta_{t}$;
21.     Compute task loss $l_{t}$ and smoothness penalty $r_{t}$; # using Algorithm 2
22.     $L_{S} \leftarrow L_{S} + l_{t} + \lambda r_{t}$; # update loss
23.     $(\gamma_{prev}, \beta_{prev}) \leftarrow (\gamma_{t}, \beta_{t})$;
24.   end for
25.   # Meta-Update (if enabled):
26.   $W'_{p} \leftarrow W'_{p} - \alpha \nabla_{W'_{p}} L_{S}$;
27.   $b'_{p} \leftarrow b'_{p} - \alpha \nabla_{b'_{p}} L_{S}$;
28.   # Query set forward (outer loop):
29.   $L_{Q} \leftarrow 0$; $(h, c) \leftarrow (h_{0}, c_{0})$ and clear $(\gamma_{prev}, \beta_{prev})$;
30.   for each $(x_{t}, y_{t}) \in Q$ do
31.     $\bar{x}_{t} \leftarrow \mathrm{GAP}(x_{t})$; $(h, c) \leftarrow \mathrm{LSTM}(\bar{x}_{t}, h, c; \Theta)$;
32.     # Generate affine parameters
33.     $[\gamma_{t}, \beta_{t}] \leftarrow W'_{p} h + U z_{S} + b'_{p}$;
34.     $f_{t} \leftarrow \gamma_{t} \odot x_{t} + \beta_{t}$;
35.     Compute task loss $l_{t}$ and smoothness penalty $r_{t}$; # using Algorithm 2
36.     $L_{Q} \leftarrow L_{Q} + l_{t} + \lambda r_{t}$; # update loss
37.     $(\gamma_{prev}, \beta_{prev}) \leftarrow (\gamma_{t}, \beta_{t})$;
38.   end for
39.   $L_{meta} \leftarrow L_{Q}$; # outer objective; $L_{Q}$ already includes the $\lambda$-weighted smoothness terms from step 36
40.   # Outer update:
41.   Apply gradient descent on $L_{meta}$ to $\theta$, $\Theta$, $W_{p}$, $b_{p}$, $\varphi$, $\psi$, $U$ (and $\alpha$ if learnable)
42. until convergence or maximum episodes reached

Algorithm 2. Temporal Smoothness Regularization

1. for $t > 1$:
2.   $r_{t} \leftarrow \mathrm{Smooth}(\gamma_{t}, \beta_{t}; \gamma_{t-1}, \beta_{t-1})$;
3. for $t = 1$:
4.   set the penalty to 0.

Algorithm 3. Meta-Test (Inference)

Input: Trained $\theta$, $\Theta$, $W_{p}$, $b_{p}$, $\varphi$, $\psi$, $U$, $\alpha$; new task with support $S$ and query $Q$.

1. Meta-Init: If enabled, compute $(h_{0}, c_{0}) \leftarrow f_{\varphi}(S)$; else $(h_{0}, c_{0}) \leftarrow (0, 0)$;
2. Meta-Conditioning: If enabled, compute task embedding $z_{S} \leftarrow g_{\psi}(S)$; otherwise set $z_{S} \leftarrow 0$;
3. Meta-Update (optional): Apply one light inner-step update to $W_{p}, b_{p}$ with $S$;
4. Query inference: Reset states to $(h_{0}, c_{0})$, then generate $(\gamma_{t}, \beta_{t})$ and final predictions.

3.4. Bi-Level Training Objective

Meta-training follows an episodic paradigm [7,18]. For each episode, the support set $S$ is used to perform task adaptation (Meta-Initialization, Meta-Conditioning, or Meta-Update), while the query set $Q$ evaluates performance. The outer-loop meta-objective minimizes the loss $L_{meta}$, as given in Equation (11). Here, $L_{Q}(\gamma_{t}, \beta_{t})$ denotes the standard task classification loss computed on the query set $Q$ using the features modulated by the affine parameters $(\gamma_{t}, \beta_{t})$; that is, $L_{Q}$ in Equation (11) is the classification part $\sum_{t} l_{t}$ of the query loss in Equation (7), and the smoothness part is written separately. The term $R_{smooth}$, as defined in Equation (12), is the temporal smoothness regularization that encourages the affine parameters to vary smoothly across timesteps, and $\lambda$ is a regularization weight. A single-level objective was insufficient to simultaneously enforce temporal smoothness and task-level adaptation, motivating the use of a bi-level formulation.

During meta-training, tasks are constructed episodically with disjoint support and query sets. Class splits for training, validation, and testing strictly follow standard Few-Shot Learning protocols to ensure fair evaluation and prevent data leakage across tasks.

$$L_{meta} = \mathbb{E}_{\mathcal{T}} \left[ L_{Q} + \lambda R_{smooth}(\gamma_{1:T}, \beta_{1:T}) \right] \tag{11}$$

$$R_{smooth}(\gamma_{1:T}, \beta_{1:T}) = \sum_{t=1}^{T} r_{t} \tag{12}$$
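The episodic construction described in this subsection, with disjoint support and query sets, can be sketched in plain Python. The function name `sample_episode`, the toy dataset, and the query size of 15 images per class are assumptions for illustration; the text does not state the query size.

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=5, n_query=15, rng=random):
    """Draw an N-way K-shot episode with disjoint support and query sets.
    data_by_class maps a class name to a list of samples from one class split."""
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Sampling without replacement keeps support and query disjoint.
        items = rng.sample(data_by_class[cls], k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query

# Toy stand-in for a meta-training split: 20 classes with 30 sample ids each.
toy_data = {f"class_{i}": [f"img_{i}_{j}" for j in range(30)] for i in range(20)}
support, query = sample_episode(toy_data, rng=random.Random(0))
```

Leakage across tasks is avoided at a different level: the dictionary passed to the sampler should contain only the classes of the current split (training, validation, or testing).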

3.5. Properties and Advantages

The overall architecture of Meta-LSTM-Affine is illustrated in Figure 3, where a shared encoder processes both the adaptation data and the inference data. The data used for task-level adaptation (referred to as support data in the meta-learning sense) is passed through the same encoder and GAP operation to form adaptation features, which are utilized by the Meta-Initialization, Meta-Conditioning, and Meta-Update modules. These meta-modules provide task-level conditioning signals that configure the LSTM-based affine generator. In contrast, the inference (query) samples follow the main inference pathway, where their features are modulated by the LSTM-APG to produce temporally evolving affine parameters $(\gamma_{t}, \beta_{t})$.

The proposed Meta-LSTM-Affine framework offers the following advantages: (1) Batch-statistics-free efficiency: It avoids the need for batch statistics, maintaining stability under small-batch or distribution-shifted conditions [3,23,24]. (2) Temporal adaptation: The LSTM-based affine generator captures sequential dependencies, enabling smooth and context-aware evolution of the affine parameters [4,25]. (3) Task-aware generalization: Through Meta-Initialization, Meta-Conditioning, and Meta-Update, the model supports fast task-level adaptation to new domains or tasks, without requiring retraining of the backbone network [2,7,18,19,21,22]. (4) Lightweight design: Only compact projection heads and meta-modules are introduced, preserving backbone compatibility and ensuring efficient inference.

4. Experimental Results

We evaluate Meta-LSTM-Affine on both Few-Shot Learning (FSL) and Source-Free Domain Adaptation (SFDA) benchmarks to assess its effectiveness and compare it against existing normalization methods. We focus on representative normalization-based methods to enable controlled and fair comparison under consistent architectures and training protocols, isolating the effect of the proposed normalization design.

4.1. Datasets

For FSL, we adopt three widely used benchmarks: Omniglot [6], MiniImageNet [8], and TieredImageNet [28]. Omniglot contains 1623 handwritten character classes from 50 alphabets, each with 20 samples; we follow the standard split of 1200 classes for training, 100 for validation, and 323 for testing. MiniImageNet consists of 100 classes sampled from ImageNet, each with 600 images, divided into 64 training, 16 validation, and 20 testing classes. TieredImageNet is a larger subset of ImageNet with 608 classes grouped into 34 higher-level categories; the standard split assigns 351 classes for training, 97 for validation, and 160 for testing. These benchmarks provide a comprehensive evaluation of few-shot classification under varying levels of complexity.

For SFDA, we consider both digit recognition and object recognition tasks. For digits, we use MNIST [12], USPS [13], and SVHN [14], evaluating domain adaptation across U→M, S→M, and M→U. MNIST includes 70,000 grayscale digit images (28 × 28, 10 classes), USPS contains 9298 grayscale digit images (16 × 16, 10 classes), and SVHN provides 99,289 color digit images (32 × 32, 10 classes) collected from natural scenes. For objects, we use the Office-31 dataset [7], which consists of 4652 images in 31 categories from three domains: Amazon (A), Webcam (W), and DSLR (D). We evaluate all six transfer tasks (A→D, A→W, W→D, W→A, D→A, D→W). These datasets collectively test adaptability under both low-level digit recognition and high-level object classification with significant domain shifts.

4.2. Experimental Setup

For FSL experiments, we follow the standard episodic training paradigm [5,7,8,9], where each episode consists of a support set for task adaptation and a query set for evaluation. Experiments are conducted under the 5-way 5-shot classification setting, following our previous work [4] to ensure consistency in comparison. For the backbone, we use a 4-layer CNN with 64 filters per layer on Omniglot, and ResNet-12 on MiniImageNet and TieredImageNet. Models are optimized using Adam with an initial learning rate of 0.001, which is halved every 20,000 episodes. Meta-training is performed over 100,000 episodes, with meta-validation every 1000 episodes.
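The learning-rate schedule stated above (0.001, halved every 20,000 episodes, over 100,000 episodes) amounts to the following rule. The function name `step_lr` is hypothetical, and applying the decay exactly at episode 20,000, 40,000, and so on is an assumption about where the boundary falls.

```python
def step_lr(episode, base_lr=1e-3, decay_every=20_000, factor=0.5):
    """Learning rate at a given episode under step decay."""
    return base_lr * factor ** (episode // decay_every)

# Learning rate at the start of each 20,000-episode stage of meta-training.
schedule = [step_lr(e) for e in range(0, 100_000, 20_000)]
```

Under this rule the final stage of meta-training runs at one sixteenth of the initial learning rate.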

For evaluation, we report the average accuracy over 600 randomly sampled episodes. In addition, all results are reported as the mean over 5 independent runs with different random seeds, along with standard deviations and 95% confidence intervals computed across those runs.
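One common way to obtain such run-level statistics is a Student-t interval over the per-run accuracies. The sketch below assumes that formula; the text does not specify how the intervals were computed, and the helper name is hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, indexed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def summarize_runs(accuracies):
    """Return (mean, sample std, 95% CI half-width) over independent runs."""
    n = len(accuracies)
    if n - 1 not in T_CRIT_95:
        raise ValueError("this sketch only tabulates 2 to 10 runs")
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample standard deviation (n - 1)
    half_width = T_CRIT_95[n - 1] * std / math.sqrt(n)
    return mean, std, half_width


# Hypothetical per-seed accuracies, not values from the paper.
mean, std, half_width = summarize_runs([75.0, 75.4, 74.9, 75.3, 75.4])
```

With only five runs the t critical value (2.776) is noticeably larger than the normal-approximation value of 1.96, so the choice of formula affects the reported interval width.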

For SFDA, we follow the SHOT framework [10], where the classifier is frozen, and only the feature extractor (with BN replaced by Meta-LSTM-Affine) is adapted using unlabeled target data. Training objectives combine information maximization loss with entropy minimization and pseudo-label cross-entropy loss. We train for 30 epochs with a mini-batch size of 64, an initial learning rate of 0.01, and cosine decay down to 0.001. For Office-31, we use ResNet-50 [5] as the backbone under the same protocol, evaluating all six domain adaptation tasks.
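The objective and schedule can be sketched as follows. This is a minimal pure-Python illustration, assuming the usual SHOT-style form of the information maximization term (mean per-sample entropy minus the entropy of the batch-averaged prediction) and assuming the cosine decay is applied per epoch; the function names are hypothetical, and how the pseudo-label term is weighted against the other terms is not stated in the text.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p, eps=1e-8):
    return -sum(pi * math.log(pi + eps) for pi in p)


def information_maximization(batch_logits):
    """Mean prediction entropy minus entropy of the mean prediction.

    Minimizing it favours confident per-sample outputs (entropy
    minimization) while keeping predictions spread across classes.
    """
    probs = [softmax(z) for z in batch_logits]
    n, k = len(probs), len(probs[0])
    mean_entropy = sum(entropy(p) for p in probs) / n
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    return mean_entropy - entropy(marginal)


def pseudo_label_ce(batch_logits, pseudo_labels, eps=1e-8):
    """Cross-entropy against pseudo-labels assigned to unlabeled target data."""
    probs = [softmax(z) for z in batch_logits]
    losses = [-math.log(p[y] + eps) for p, y in zip(probs, pseudo_labels)]
    return sum(losses) / len(losses)


def cosine_lr(epoch, total_epochs=30, lr_max=0.01, lr_min=0.001):
    """Cosine decay from lr_max at epoch 0 down to lr_min at total_epochs."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)
```

A batch whose confident predictions all collapse onto one class scores close to zero under `information_maximization`, whereas confident predictions spread evenly over the classes score close to the negative log of the number of classes, which is why the diversity term discourages degenerate solutions.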

Under the SHOT protocol, the overall SFDA framework may still involve test-time optimization of the feature extractor. The proposed Meta-LSTM-Affine module itself, however, does not require explicit backpropagation-based updates during inference; it performs feed-forward adaptation via dynamic affine parameter generation.
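To make the idea of feed-forward adaptation concrete, the NumPy sketch below runs a single LSTM cell over globally pooled descriptors and projects its hidden state to channel-wise scale and shift parameters. It is an illustration under stated assumptions, not the authors' implementation: the class name, the hidden size, the random initialization, the centring of the scale at 1, and the plain modulation y = γ·x + β are all choices made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class AffineParameterGenerator:
    """LSTM cell over pooled descriptors, projected to (gamma, beta)."""

    def __init__(self, channels, hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden, channels + hidden))
        self.b = np.zeros(4 * hidden)
        self.W_gamma = rng.normal(0.0, 0.1, size=(channels, hidden))
        self.W_beta = rng.normal(0.0, 0.1, size=(channels, hidden))
        self.hidden = hidden

    def init_state(self):
        return np.zeros(self.hidden), np.zeros(self.hidden)

    def step(self, x, state):
        """x has shape (C, H, W). Returns the modulated map and the new state."""
        h_prev, c_prev = state
        x_bar = x.mean(axis=(1, 2))  # global average pooling -> shape (C,)
        gates = self.W @ np.concatenate([x_bar, h_prev]) + self.b
        i, f, g, o = np.split(gates, 4)
        c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        gamma = 1.0 + self.W_gamma @ h  # scale centred at 1 (assumption)
        beta = self.W_beta @ h
        y = gamma[:, None, None] * x + beta[:, None, None]
        return y, (h, c)


# Forward passes only: the state carries history, no gradients are computed.
rng = np.random.default_rng(1)
apg = AffineParameterGenerator(channels=8, hidden=16)
state = apg.init_state()
stream = [rng.normal(size=(8, 4, 4)) for _ in range(3)]
for x in stream:
    y, state = apg.step(x, state)
```

Because the hidden and cell states persist across steps, the same input can receive different scale and shift parameters depending on what preceded it, and no batch statistics enter the computation.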

4.3. Results on Few-Shot Classification

Table 1 reports the classification accuracies on Omniglot, MiniImageNet, and TieredImageNet under the 5-way 5-shot setting. Meta-LSTM-Affine achieves the best accuracy on all three datasets, outperforming BN, MetaBN, MetaAFN, and LSTM-Affine. Relative to BN, the gains are largest on MiniImageNet and TieredImageNet (+7.7 points each, versus +4.9 on Omniglot); MiniImageNet in particular is considered a challenging benchmark due to its fine-grained categories and large intra-class variability. These results indicate that combining temporal memory with meta-learning in a batch-statistics-free design enables both rapid task-level adaptation and stable performance, avoiding the instability often observed in BN under small or imbalanced batches. Over LSTM-Affine, the meta-learning mechanisms add a further 0.7 points on each dataset. This observation suggests that temporal adaptation alone leaves room for improvement when inter-task distribution shifts are present, and that explicit task-level conditioning and initialization help handle higher visual complexity and semantic diversity.

Table 2 summarizes the average accuracies on the digit classification tasks (U→M, S→M, M→U). Meta-LSTM-Affine achieves the best overall performance, surpassing all competing normalization methods. Unlike optimization-based SFDA methods that rely on backpropagation at test time, our framework leverages the proposed module to provide efficient adaptation without requiring extensive test-time optimization. This design yields both improved accuracy and inference efficiency, indicating that recurrent affine generation can serve as an effective alternative to optimization-based test-time adaptation under source-free constraints.

Table 3 presents the results on the Office-31 benchmark under six domain transfer settings. Meta-LSTM-Affine achieves the highest average accuracy (90.12%) and the best result on every transfer task. The improvement over BN is most pronounced on W→A (+5.52 points), which involves a substantial visual domain gap, whereas on A→W, where BN is already strong, it is small (+0.42 points). The results support the view that integrating recurrent temporal modeling with meta-level adaptation provides a robust mechanism for aligning features in the absence of source data, including under asymmetric domain shifts where source and target domains differ substantially in both visual statistics and data acquisition conditions. Furthermore, because the method is lightweight, these improvements come with only a modest increase in computational cost relative to BN-based baselines (Table 6).

In summary, Meta-LSTM-Affine consistently outperforms prior approaches in both FSL and SFDA benchmarks. The improvements validate the effectiveness of unifying batch-statistics-free design, recurrent memory, and meta-learning, enabling stable and adaptable normalization across diverse distributional settings.

4.4. Ablation Studies

To analyze the contribution of each component, we evaluate three variants:

w/o Meta-Initialization: Removing task-specific initialization of LSTM states.

w/o Meta-Conditioning: Disabling task embedding conditioning.

w/o Meta-Update: Removing inner-loop adaptation of projection head.

For FSL, the results in Table 4 show that each mechanism contributes positively in the 5-shot setting. Meta-Update notably improves stability across episodes, while Meta-Conditioning enhances the model’s adaptation capability. Removing Meta-Initialization results in slower convergence and reduced consistency during task adaptation. The “Gap to Full” values in Table 4 further quantify the performance degradation caused by disabling each component.

For SFDA, the results in Table 5 demonstrate that temporal adaptation alone (LSTM-Affine) is insufficient for robust cross-domain transfer. Meta-Conditioning significantly improves adaptation to unseen domains by leveraging task embeddings, while Meta-Update enhances stability when adaptation requires fine-grained adjustment of affine parameters. Together, the three mechanisms achieve the best accuracy across both digit recognition and Office-31 transfer tasks. The gaps listed in Table 5 highlight the individual contributions of each meta-learning mechanism to the overall transfer performance. Overall, the ablation results confirm that the three meta-learning mechanisms play complementary roles: Meta-Initialization aligns the adaptation starting point, Meta-Conditioning injects task context, and Meta-Update refines task-specific affine projections. Removing any single component leads to measurable performance degradation, validating the necessity of the proposed unified design.

4.5. Computational Complexity Analysis

To further evaluate the efficiency of the proposed method, we analyze its computational complexity in terms of parameter count and floating-point operations (FLOPs), and compare it with representative normalization approaches, including BN, MetaAFN, and LSTM-Affine. The reported FLOPs are estimated under the same backbone setting for fair comparison.

As shown in Table 6, Meta-LSTM-Affine introduces only a modest increase in both parameters and computational cost compared to the baseline methods. Since the backbone network remains unchanged, the additional overhead mainly comes from the affine parameter generator and the lightweight meta-adaptation modules. Notably, the LSTM-based generator operates on compact feature descriptors obtained via global average pooling, rather than full feature maps, which significantly limits the computational burden.
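The effect of working on pooled descriptors can be illustrated with a back-of-the-envelope count of multiply-accumulate operations (MACs). The sketch below compares a hypothetical LSTM generator, which sees one C-dimensional vector per sample, against a single 3 × 3 convolution applied at every spatial position. The layer shape and hidden size are assumptions chosen for the example, and the numbers do not attempt to reproduce the ratios in Table 6.

```python
def lstm_generator_macs(channels, hidden):
    """MACs for one LSTM step on a pooled descriptor plus the two projections."""
    lstm = 4 * hidden * (channels + hidden)  # four gates over [x_bar, h_prev]
    projection = 2 * channels * hidden       # hidden state -> gamma and beta
    return lstm + projection


def conv3x3_macs(channels, height, width):
    """MACs for a 3x3 convolution with equal input and output channels."""
    return 9 * channels * channels * height * width


# Hypothetical layer: 64 channels on a 21 x 21 feature map, hidden size 32.
c, h, w, hidden = 64, 21, 21, 32
gen = lstm_generator_macs(c, hidden)
conv = conv3x3_macs(c, h, w)
ratio = gen / conv
print(f"generator: {gen} MACs, conv: {conv} MACs, ratio: {ratio:.4%}")
```

The generator cost is independent of the spatial resolution, while the convolution cost grows with height × width, so the relative overhead shrinks further on larger feature maps.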

Compared with LSTM-Affine, the additional cost of Meta-LSTM-Affine mainly arises from the meta-learning components (Meta-Initialization, Meta-Conditioning, and Meta-Update), which are lightweight and only applied during adaptation. As a result, the overall increase in FLOPs remains modest, as quantified in Table 6.

These results demonstrate that the proposed framework achieves improved adaptability under non-stationary conditions while maintaining relatively low computational overhead compared to baseline methods.

5. Conclusions

In this paper, we introduced Meta-LSTM-Affine, a batch-statistics-free affine modeling (normalization) framework that unifies recurrent temporal memory with meta-learning to achieve both temporal and task-level adaptability. Unlike Batch Normalization, which relies on potentially unstable batch-level statistics, Meta-LSTM-Affine reformulates normalization as dynamic affine parameter generation, enabling robust feature modulation without dependence on batch statistics.

Experimental results from Few-Shot Learning (FSL) benchmarks (Omniglot, MiniImageNet, and TieredImageNet) and Source-Free Domain Adaptation (SFDA) tasks (MNIST, USPS, SVHN, Office-31) demonstrate the effectiveness of the proposed design under two representative yet challenging settings. In FSL, Meta-LSTM-Affine improves stability and generalization under limited data, while in SFDA, it mitigates performance degradation caused by domain shift without access to source data. These results suggest that coupling recurrent memory with task-aware modulation provides a principled mechanism for handling both intra-task sequential dynamics and inter-task distributional variation.

The proposed framework is particularly suitable for learning under generalized non-stationary conditions, where data distributions evolve across tasks, domains, or time. In this work, we validate this capability using controlled benchmark settings (FSL and SFDA), which capture representative forms of distributional variation.

Beyond these case studies, the proposed framework highlights a broader perspective on affine modeling: rather than relying on fixed statistics or static affine parameters, normalization can be viewed as a dynamic, context-dependent transformation driven by both temporal history and task information.

Several directions merit further investigation. Future work includes extending the framework to real-world streaming and time-evolving scenarios, exploring more advanced meta-optimization strategies, and incorporating attention-based or transformer-style memory to better capture long-range dependencies.

Author Contributions

Conceptualization, Y.-T.K., C.-T.T. and H.J.L.; methodology, Y.-T.K., C.-T.T. and H.J.L.; software, Y.-T.K.; validation, Y.-T.K., C.-T.T. and H.J.L.; formal analysis, H.J.L.; investigation, C.-T.T.; resources, Y.T.; data curation, Y.T.; writing—original draft preparation, H.J.L.; writing—review and editing, Y.T.; visualization, Y.-T.K.; supervision, H.J.L.; project administration, H.J.L.; funding acquisition, H.J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML); PMLR: Sheffield, UK, 2015; pp. 448–456.
2. Gao, W.; Shao, M.; Shu, J.; Zhuang, X. Meta-BN net for few-shot learning. Front. Comput. Sci. 2023, 17, 171302.
3. Yeh, J.P.; Tsai, Y.; Lin, H.J.; Tokuyama, Y.; Hsu, W.-L. Meta Affine Transformation: A Batch-Statistics-Free Adaptive Normalization Method for Robust Few-Shot Learning and Domain Adaptation. Int. J. Pattern Recognit. Artif. Intell. 2025, 39, 2551033.
4. Yeh, J.P.; Feng, J.-M.; Lin, H.J.; Tokuyama, Y. Replacing Batch Normalization with Memory-Based Affine Transformation for Test-Time Adaptation. Special issue on Advances in Data Security: Challenges, Technologies, and Applications. Electronics 2025, 14, 4251.
5. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 4080–4090.
6. Lake, B.M.; Salakhutdinov, R.; Tenenbaum, J.B. Human-level concept learning through probabilistic program induction. Science 2015, 350, 1332–1338.
7. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning (ICML’17); PMLR: Sheffield, UK, 2017; pp. 1126–1135.
8. He, H.; Song, Y.; Wang, J. Few-shot and meta-learning methods for image understanding: A survey. Front. Comput. Sci. 2023, 17, 173327.
9. Tian, S.; Li, L.; Li, W.; Ran, H.; Ning, X.; Tiwari, P. A survey on few-shot class-incremental learning. Neural Netw. 2024, 169, 307–324.
10. Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. In Proceedings of the 30th Conference on Neural Information Processing Systems (NeurIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 3637–3645. Available online: https://dl.acm.org/doi/10.5555/3157382.3157504 (accessed on 27 April 2026).
11. Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.S.; Hospedales, T.M. Learning to Compare: Relation Network for Few-Shot Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
12. Liang, J.; Hu, D.; Feng, J. Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation. In Proceedings of the 37th International Conference on Machine Learning (ICML’20); PMLR: Sheffield, UK, 2020; pp. 6028–6039. Available online: https://dl.acm.org/doi/10.5555/3524938.3525498 (accessed on 27 April 2026).
13. Liang, J.; Hu, D.; Wang, Y.; He, R.; Feng, J. Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer. In IEEE Transactions on Pattern Analysis and Machine Intelligence; IEEE: Piscataway, NJ, USA, 2022; Volume 44, pp. 8602–8617.
14. Kundu, J.N.; Venkat, N.; Rahul, M.V.; Babu, R.V. Universal Source-Free Domain Adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4543–4552.
15. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Instance normalization: The missing ingredient for fast stylization. arXiv 2017, arXiv:1607.08022.
16. Xu, J.; Sun, X.; Zhang, Z.; Zhao, G.; Lin, J. Understanding and Improving Layer Normalization. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019.
17. Wu, Y.; He, K. Group normalization. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2018; pp. 3–19.
18. Gharoun, H.; Momenifar, F.; Chen, F.; Gandomi, A.H. Meta-learning Approaches for Few-Shot Learning: A Survey of Recent Advances. ACM Comput. Surv. 2024, 56, 294.
19. Finn, C.; Rajeswaran, A.; Kakade, S.; Levine, S. Online meta-learning. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; PMLR: Sheffield, UK, 2019; pp. 1920–1930. Available online: https://proceedings.mlr.press/v97/finn19a.html (accessed on 27 April 2026).
20. Wu, Z.; Li, Y.; Guo, L.; Jia, K. Parn: Position-aware relation networks for few-shot learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2019; pp. 6659–6667.
21. Li, Z.; Zhou, F.; Chen, F.; Li, H. Meta-SGD: Learning to Learn Quickly for Few-Shot Learning. arXiv 2017, arXiv:1707.09835.
22. Sun, W.; Du, Y.; Zhen, X.; Wang, F.; Wang, L.; Snoek, C.G.M. MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks. In Proceedings of the 40th International Conference on Machine Learning (ICML), Honolulu, HI, USA, 23–29 July 2023; PMLR: Sheffield, UK, 2023; pp. 32847–32858. Available online: https://proceedings.mlr.press/v202/sun23b.html (accessed on 27 April 2026).
23. Du, Y.; Zhen, X.; Shao, L.; Snoek, C.G.M. MetaNorm: Learning to Normalize Few-Shot Batches Across Domains. In Proceedings of the ICLR, Virtual, 7–11 May 2021; Available online: https://openreview.net/forum?id=9z_dNsC4B5t (accessed on 27 April 2026).
24. Perez, E.; Strub, F.; de Vries, H. FiLM: Visual Reasoning with a General Conditioning Layer. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Palo Alto, CA, USA, 2018.
25. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
26. Kim, T.; Park, J.; Cho, H. MetaDiff: Diffusion-based meta-learning without second-order gradients. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Palo Alto, CA, USA, 2024; Volume 38, pp. 16687–16695.
27. Wang, J.; Lan, C.; Liu, C.; Ouyang, Y.; Qin, T.; Lu, W.; Chen, Y.; Zeng, W.; Yu, P.S. Generalizing to Unseen Domains: A Survey on Domain Generalization. In IEEE Transactions on Knowledge and Data Engineering; IEEE: Piscataway, NJ, USA, 2022; Volume 35, pp. 8052–8072.
28. TieredImageNet. Available online: https://www.kaggle.com/datasets/arjun2000ashok/tieredimagenet (accessed on 15 December 2025).

Figure 1. Architecture of the LSTM-based affine parameter generator. At each step $t$, the input feature map $x_{t}$ is first reduced by global average pooling into a compact descriptor $\bar{x}_{t}$. The LSTM processes $\bar{x}_{t}$ together with its hidden and cell states to produce a dynamic representation $h_{t}$, which is projected into affine parameters $\gamma_{t}$, $\beta_{t}$ for temporally adaptive and batch-independent transformation.

Figure 2. Conceptual comparison of three meta-learning mechanisms integrated into the LSTM-based affine parameter generator. (a) Meta-Initialization derives task-specific hidden and cell states from the support set. (b) Meta-Conditioning injects a task embedding into the affine projection to enable task-aware modulation. (c) Meta-Update performs lightweight gradient-based refinement of the projection parameters for rapid task-level adaptation.

Figure 3. Overview of the proposed Meta-LSTM-Affine framework. Support data is used to generate meta-level modulation signals, which condition the LSTM-based affine parameter generator. The query features are then normalized using the generated parameters for classification.

Table 1. Few-shot classification accuracy (%) with 95% confidence intervals on Omniglot, MiniImageNet, and TieredImageNet under the 5-way 5-shot settings.

Method	Omniglot	MiniImageNet	TieredImageNet	Avg.
BN	94.8	67.5	72.4	79.4
MetaBN	95.2	70.2	75.5	80.3
MetaAFN	96.7	72.3	77.0	82.0
LSTM-Affine	99.0	74.5	79.4	84.3
Meta-LSTM-Affine	99.7	75.2	80.1	85.0

Table 2. Source-Free Domain Adaptation (SFDA) accuracy (%) on digit classification datasets (MNIST, USPS, SVHN). Reported results are averaged over multiple runs across U→M, S→M, and M→U transfer tasks.

Method	U→M	S→M	M→U	Avg.
BN	97.71	99.05	98.17	98.31
MetaBN	97.66	98.82	98.17	98.22
MetaAFN	97.61	99.09	98.17	98.29
LSTM-Affine	98.04	99.10	98.66	98.60
Meta-LSTM-Affine	98.32	99.37	98.92	98.87

Table 3. Source-Free Domain Adaptation (SFDA) accuracy (%) on Office-31 dataset under all transfer directions (Amazon → Webcam, Amazon → DSLR, Webcam → DSLR, Webcam → Amazon, DSLR → Amazon, DSLR → Webcam).

Method	A→W	A→D	W→D	W→A	D→A	D→W	Avg.
BN	92.24	90.29	99.90	72.82	73.39	97.78	87.74
MetaBN	92.20	92.21	99.80	71.14	73.77	98.03	87.86
MetaAFN	90.25	93.52	99.60	76.75	75.21	98.10	88.91
LSTM-Affine	92.26	94.05	99.90	77.94	76.14	98.37	89.78
Meta-LSTM-Affine	92.66	94.45	99.98	78.34	76.54	98.77	90.12

Table 4. Average classification accuracy (%) for the ablation study of Meta-LSTM-Affine in the 5-shot setting. The “Gap to Full” column quantifies the performance drop relative to the full model.

Method	Avg (Few-Shot)	Gap to Full
Meta-LSTM-Affine	85.0	—
w/o Meta-Init	83.1	−1.9
w/o Meta-Cond.	83.5	−1.5
w/o Meta-Update	82.0	−3.0

Table 5. Average accuracy (%) of different model variants on SFDA tasks over the digit and Office-31 datasets. The “Gap to Full” columns indicate accuracy degradation compared to the full Meta-LSTM-Affine model.

Method	Avg (Digits)	Avg (Office-31)	Gap to Full (Digits)	Gap to Full (Office-31)
Meta-LSTM-Affine	98.87	90.12	—	—
w/o Meta-Init	98.22	89.30	−0.65	−0.82
w/o Meta-Cond.	98.34	89.42	−0.53	−0.70
w/o Meta-Update	98.34	88.93	−0.53	−1.19

Table 6. Complexity comparison in parameters and FLOPs, relative to BN under the same backbone.

Method	Params (Relative)	FLOPs (Relative)	Parameter Overhead
BN	1.00×	1.00×	–
MetaAFN	1.02×	1.01×	+2%
LSTM-Affine	1.05×	1.03×	+5%
Meta-LSTM-Affine	1.07×	1.04×	+7%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kao, Y.-T.; Tu, C.-T.; Lin, H.J.; Tokuyama, Y. Meta-LSTM-Affine: A Memory-Based Meta-Adaptive Affine Modeling Framework for Non-Stationary Systems. Electronics 2026, 15, 1990. https://doi.org/10.3390/electronics15101990
