1. Introduction
Indoor localization is a foundational technology for smart environments, supporting applications such as IoT device management and robotic navigation [1,2]. While GPS is widely used outdoors, its signals are attenuated inside buildings, resulting in degraded positioning accuracy. Consequently, localization methods that leverage Wi-Fi infrastructure installed in buildings have been investigated [3,4].
Among Wi-Fi-based localization approaches, the use of Channel State Information (CSI) has attracted increasing attention [5,6]. CSI represents the channel characteristics of radio wave propagation between a transmitter and receiver, capturing amplitude and phase variations across multiple frequency bands (subcarriers). The conventional Received Signal Strength Indicator (RSSI) provides only a single aggregate measure of received power, making it difficult to capture fine-grained environmental changes. In contrast, CSI provides channel responses for tens to hundreds of individual subcarriers, enabling the detection of subtle variations in radio propagation caused by changes in receiver position and achieving positioning accuracy on the order of tens of centimeters.
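The contrast between RSSI and CSI can be illustrated with a minimal sketch (the 64-subcarrier array shape is illustrative and not tied to any particular Wi-Fi chipset):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated complex channel response for 64 subcarriers at one antenna pair.
csi = rng.normal(size=64) + 1j * rng.normal(size=64)

# RSSI collapses the channel into a single aggregate power value (in dB).
rssi = 10 * np.log10(np.sum(np.abs(csi) ** 2))

# CSI retains per-subcarrier amplitude and phase, yielding a
# 64-dimensional fingerprint instead of one scalar.
amplitude = np.abs(csi)           # shape (64,)
phase = np.unwrap(np.angle(csi))  # shape (64,)

print(rssi, amplitude.shape, phase.shape)
```

The single scalar `rssi` versus the 64-dimensional `amplitude` and `phase` vectors is precisely the granularity gap described above.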
Deep learning approaches have demonstrated competitive accuracy in CSI-based indoor localization [7,8]. DeepFi [7] collects CSI data at multiple reference points (RPs)—measurement locations that serve as position references—within an environment and trains a neural network using the location labels associated with each RP. The trained model estimates the RP to which the receiver is closest based on newly observed CSI patterns. Such methods have achieved high localization accuracy even in multipath environments, where radio waves arrive via multiple paths after reflecting off walls and furniture.
To build a practical localization system, it is desirable to train models using data collected from diverse environments. A model trained solely on data from a single building or room tends to overfit to that specific environment, resulting in degraded performance in other settings. Therefore, integrating data from multiple environments with different characteristics—such as office buildings, commercial facilities, and residential spaces—is necessary for training [9,10].
One approach to multi-environment data integration is centralized learning, in which CSI data and location labels collected from all environments are aggregated on a single server for model training. However, centralized learning raises privacy concerns [11]. The combination of CSI data and location labels reveals which CSI patterns are observed at specific positions within an environment. If such information were leaked, it could potentially allow inference of building indoor layouts and obstacle placements, or reconstruction of movement histories of users carrying devices. Additionally, the continuous transmission of high-dimensional CSI data to a server imposes substantial network bandwidth requirements, creating a barrier to large-scale deployment.
Federated Learning (FL) has been explored as an approach to address both privacy preservation and communication efficiency [12]. In FL, each client trains a model locally on its own data without transmitting raw data to an external server. After training, each client sends only its model weights (parameters) to a server. The server aggregates the weights received from multiple clients (e.g., by averaging) to produce a global model, which is then redistributed to each client. By iterating this process, a model that reflects the data of multiple clients is obtained without directly sharing the underlying data, achieving both privacy preservation and communication efficiency.
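The aggregation step of this loop can be sketched as a minimal FedAvg round; the flattened parameter vectors below are hypothetical placeholders for full model state (real systems exchange per-layer weight tensors):

```python
import numpy as np

def fedavg_round(client_weights, client_sizes):
    """Aggregate client model weights by a data-size-weighted average."""
    total = sum(client_sizes)
    stacked = np.stack(client_weights)       # (num_clients, num_params)
    coeffs = np.array(client_sizes) / total  # each client's share of the data
    return coeffs @ stacked                  # weighted average -> (num_params,)

# Three clients, each holding a flattened parameter vector of length 4.
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.0, 1.0, 0.0, 0.0])
w3 = np.array([0.0, 0.0, 1.0, 0.0])
global_w = fedavg_round([w1, w2, w3], client_sizes=[100, 100, 200])
print(global_w)  # -> [0.25 0.25 0.5  0.  ]
```

The client with more data (200 samples) contributes proportionally more to the global model, which is the standard FedAvg weighting.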
A key challenge in applying FL to CSI-based localization is distributional heterogeneity across environments. CSI measurements are sensitive to environmental factors, exhibiting different characteristics depending on room geometry, furniture placement, wall materials, and the Wi-Fi hardware used [13,14]. For example, an office environment with abundant metallic equipment and an apartment with predominantly wooden furniture may produce different CSI patterns. This situation, where the statistical properties of data held by each client differ, is referred to as non-Independent and Identically Distributed (non-IID).
FedAvg [12], a representative FL algorithm, constructs the global model by simply averaging local model updates from each client. However, when averaging is performed across non-IID clients, environment-specific features can cancel each other out, resulting in a model that performs inadequately across all environments. Consequently, slow convergence and model instability in the known training environments (source domains) have been reported, along with reduced generalization to new environments (target domains) that did not participate in training [15,16].
Among existing approaches to non-IID data, regularization-based methods such as FedProx [17] and SCAFFOLD [18] constrain each client's model updates to remain close to the global model. This prevents non-IID clients from updating the model in highly divergent directions and promotes stable convergence in source domains. However, these methods apply constraints of uniform strength across all layers of the model. In deep neural networks, shallow layers are known to learn generic low-level features (such as edges and textures), while deeper layers learn task-specific high-level features [19]. Applying constraints of identical strength to all layers may leave shallow layers insufficiently constrained—undermining the consistency of generic features that should be shared across environments—while overly constraining deeper layers, hindering appropriate adaptation to each environment [20,21].
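The uniform-strength constraint described above can be seen directly in the form of the FedProx proximal penalty, which adds a single term to each client's local loss. A sketch with a hypothetical flattened parameter vector:

```python
import numpy as np

def fedprox_penalty(local_w, global_w, mu):
    """FedProx proximal term: (mu / 2) * ||w - w_global||^2.

    Note that one scalar mu multiplies every parameter deviation,
    regardless of which layer the parameter belongs to.
    """
    diff = local_w - global_w
    return 0.5 * mu * np.dot(diff, diff)

local_w = np.array([1.0, 2.0, 3.0])
global_w = np.array([1.0, 1.0, 1.0])
p = fedprox_penalty(local_w, global_w, mu=0.1)
print(p)  # -> 0.25
```

Because `mu` is a single scalar, shallow and deep layers are pulled toward the global model with identical strength, which is exactly the limitation motivating a layerwise design.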
Domain adaptation approaches, such as FedPos [22], separate the model into a feature extraction component shared across environments and an environment-specific classification component. The shared component learns environment-independent generic representations, while the specific component is adapted to the characteristics of each environment. This design enables efficient adaptation to a target domain by fine-tuning only the specific component with a small amount of data. However, as the shared component prioritizes generality, it may not fully capture the features necessary for discriminating fine-grained positional differences in source domains. Consequently, while adaptability to target domains may improve, reduced localization precision in source domains has been observed [23,24].
Existing methods thus tend to prioritize either stable operation in source domains (stability) or deployment to target domains (plasticity), making it difficult to achieve both at a high level simultaneously [25].
A practical localization system requires both stability and plasticity. For example, when deploying robots with localization capability across multiple office buildings, it is desirable to maintain high-precision positioning in currently operational buildings while enabling adaptation to new buildings using only a small number of RP measurements. Since collecting large amounts of training data for each target domain is often impractical, zero-shot and few-shot performance are important considerations.
To address the stability–plasticity trade-off in federated CSI-based localization, this paper proposes AdaFed-LDR, which combines server-side Confidence-Weighted Adaptive Aggregation (hereafter referred to as adaptive aggregation) with client-side Layerwise Dynamics Regularization (LDR).
The adaptive aggregation mechanism weights each client update according to its estimated reliability. Specifically, it monitors the feature covariance matrices extracted by each client and assigns higher reliability to clients that learn temporally stable features, while suppressing the influence of unstable updates affected by noise. This aims to mitigate the adverse effects of excessive variability in client updates on the global model.
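One possible realization of this confidence weighting is sketched below. The stability measure used here (the Frobenius-norm change between a client's feature covariance matrices in successive rounds, mapped through a softmax-style score) is an illustrative assumption, not necessarily the exact estimator used by AdaFed-LDR:

```python
import numpy as np

def confidence_weights(prev_covs, curr_covs, temperature=1.0):
    """Assign higher aggregation weight to clients whose feature
    covariance changed less between rounds (more stable features)."""
    drifts = np.array([
        np.linalg.norm(c - p, ord="fro")
        for p, c in zip(prev_covs, curr_covs)
    ])
    scores = np.exp(-drifts / temperature)  # small drift -> large score
    return scores / scores.sum()            # normalize to sum to 1

# Two clients with 2x2 feature covariance matrices: client 0 is stable,
# client 1 drifts strongly between rounds (e.g., due to noisy updates).
prev = [np.eye(2), np.eye(2)]
curr = [np.eye(2), 3.0 * np.eye(2)]
w = confidence_weights(prev, curr)
print(w)  # client 0 receives the larger weight
```

The resulting weights can then replace the data-size coefficients in a standard weighted aggregation step, downscaling the influence of unstable clients on the global model.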
Client-side LDR applies regularization of different strengths to each layer of the network. Shallow layers, which extract generic low-level features such as edges and textures that are less dependent on the specific environment, receive stronger regularization to maintain consistency with the global model. Deeper layers, which learn task- and environment-specific high-level features, receive weaker regularization to permit adaptation to each environment’s data. Through this differentiation, LDR aims to preserve the consistency of generic features that should be shared across environments while also allowing the learning of features necessary for position discrimination within each environment.
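This layerwise differentiation can be sketched as a per-layer proximal penalty whose coefficient decays with depth. The geometric decay schedule below is a hypothetical choice for illustration; the actual per-layer coefficients of LDR may be set differently:

```python
import numpy as np

def ldr_penalty(local_layers, global_layers, lam0=1.0, decay=0.5):
    """Layerwise proximal penalty: shallow layers get a strong
    coefficient lam0, and each deeper layer's coefficient is scaled
    down by `decay`, permitting environment-specific adaptation."""
    penalty = 0.0
    for depth, (w, g) in enumerate(zip(local_layers, global_layers)):
        lam = lam0 * (decay ** depth)  # strong for shallow, weak for deep
        diff = w - g
        penalty += 0.5 * lam * np.sum(diff * diff)
    return penalty

# Identical deviation in each of three layers, but shallow deviations
# are penalized more heavily than deep ones (coefficients 1.0, 0.5, 0.25).
local_layers = [np.ones(4), np.ones(4), np.ones(4)]
global_layers = [np.zeros(4), np.zeros(4), np.zeros(4)]
p = ldr_penalty(local_layers, global_layers)
print(p)  # -> 3.5
```

Contrasted with a single-scalar proximal term, the same deviation costs four times as much in the first layer as in the third, encoding the shallow-generic/deep-specific prior directly in the loss.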
By combining both mechanisms, AdaFed-LDR aims to achieve both high precision in source domains and effective adaptation to target domains.
To validate the proposed method, we conducted experiments using CSI datasets collected from 8 diverse environments. We adopted Leave-One-Out Cross-Validation (LOOCV), designating 7 of the 8 environments as source domains participating in training and the remaining 1 as a target domain excluded from training, evaluating all 8 possible configurations. Statistical tests using 5 different random seeds were performed to assess the significance of differences between methods.
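The LOOCV protocol over environments amounts to enumerating all source/target splits; a minimal sketch (the environment names are placeholders):

```python
def loocv_splits(environments):
    """Yield (source_domains, target_domain) pairs for leave-one-out
    cross-validation over environments."""
    for i, target in enumerate(environments):
        sources = environments[:i] + environments[i + 1:]
        yield sources, target

envs = [f"env{i}" for i in range(1, 9)]
splits = list(loocv_splits(envs))
print(len(splits))  # -> 8 configurations, each with 7 sources and 1 target
```

Running each of the 8 configurations under 5 random seeds then yields the 40 runs over which the statistical tests are computed.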
In the experiments, stability was evaluated based on localization accuracy in source domains. Plasticity was assessed based on zero-shot performance in target domains (measuring generalization) and few-shot performance (measuring adaptation capability). Representative FL methods, including FedAvg and FedProx, were used as comparison baselines.
The main contributions of this paper are as follows:
In federated CSI-based localization, this study proposes AdaFed-LDR to address the severe stability–plasticity trade-off caused by environmental multipath effects. Its fundamental novelty lies in a complementary architecture specifically designed for the physical characteristics of RF signals, tightly coupling local adaptation with representation-space reliability estimation during global aggregation.
Through evaluation on source domains, this study shows that AdaFed-LDR achieves higher localization accuracy compared to existing FL methods, with smaller performance variance across different random seeds. These results suggest that the proposed method contributes to stable operation in source domains.
Through evaluation on target domains, this study confirms that AdaFed-LDR mitigates the severe performance degradation observed in existing methods. In the few-shot setting, while adaptation accuracy may be lower than that of domain adaptation methods in some cases, the performance variance remains small, indicating that the design goal of balancing stability and plasticity is met.
Through ablation experiments, this study confirms that both adaptive aggregation and LDR contribute to performance improvement, and that combining the two components achieves the highest overall accuracy and lowest variance compared to their individual applications, validating their complementary integration.
7. Conclusions
This paper proposed AdaFed-LDR to address the stability–plasticity trade-off in federated CSI-based indoor localization. AdaFed-LDR combines server-side adaptive aggregation based on feature covariance changes with client-side Layerwise Dynamics Regularization (LDR), which imposes stronger constraints on shallow layers to preserve generic features shared across environments and weaker constraints on deeper layers to allow environment-specific adaptation.
Evaluation across 8 indoor environments with 5 random seeds confirmed that AdaFed-LDR achieves source-domain precision comparable to centralized learning while outperforming existing federated methods in domain generalization to unseen environments. In few-shot adaptation, AdaFed-LDR achieves lower error than centralized learning with only one sample per reference point, suggesting that a minimal calibration budget is sufficient for practical deployment. Although FedPos achieves lower adaptation error in few-shot settings, AdaFed-LDR exhibits substantially smaller performance variance across seeds, indicating higher reproducibility. Ablation experiments confirmed that combining the two components yields the highest overall improvement, supporting the design rationale of addressing layer-specific roles in both aggregation and regularization.
This study has several limitations. First, the dataset was collected in static environments over a fixed period, and robustness to long-term temporal drift caused by changes in temperature, humidity, or furniture placement has not been evaluated. Second, all experiments used uniform hardware, and the behavior of the adaptive aggregation module under hardware-induced amplitude and phase biases remains to be examined. Third, the evaluation covered 8 environments within a single institution, and validation across a larger and more architecturally diverse set of buildings is necessary to establish broader generalizability.
Future directions include integrating established robust statistical estimation frameworks with online continual learning mechanisms to dynamically handle temporal domain shifts and non-Gaussian measurement noise. Additionally, we plan to explore hardware-agnostic calibration strategies, communication-efficient matrix approximations, and large-scale cross-building evaluation. Integration with differential privacy techniques to protect the transmitted covariance matrices is also an important direction. We hope this study contributes to the broader understanding of how layer-specific regularization can mitigate the stability–plasticity trade-off in federated learning for CSI-based localization.