Federated learning (FL) struggles under non-IID client data when local models drift toward conflicting optima, impairing global convergence and performance. We introduce entropy-regularized federated optimization (ERFO), a lightweight client-side modification that augments each local objective with a Shannon entropy penalty on the per-parameter update distribution. ERFO requires no additional communication, adds a single scalar hyperparameter λ, and integrates seamlessly into any FedAvg-style training loop. We derive a closed-form gradient for the entropy regularizer and provide convergence guarantees: under μ-strong convexity and L-smoothness, ERFO achieves the same O(1/T) (or linear) rates as FedAvg, with only O(λ) bias for fixed λ and exact convergence as λ → 0; in the non-convex case, we prove stationary-point convergence at O(1/√T). Empirically, on five-client non-IID splits of the UNSW-NB15 intrusion-detection dataset, ERFO yields a +1.6 pp gain in accuracy and +0.008 in macro-F1 over FedAvg, with markedly smoother training dynamics. On a three-of-five split of PneumoniaMNIST, a fixed λ matches or exceeds FedAvg, FedProx, and SCAFFOLD (90.3% accuracy, 0.878 macro-F1) while preserving rapid, stable learning. ERFO’s gradient-only design is model-agnostic, making it broadly applicable across tasks.
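To make the client-side idea concrete, the sketch below (PyTorch) shows how an entropy term over the normalized per-parameter update magnitudes could be folded into an otherwise standard FedAvg local step. The function names, the sign convention (rewarding high-entropy, i.e. spread-out, updates), and the use of autograd in place of the paper's closed-form gradient are illustrative assumptions, not the authors' exact formulation.

```python
import torch

def update_entropy(params, global_params, eps=1e-12):
    # Shannon entropy H(p) of p_i = |theta_i - theta_i^global| / sum_j |theta_j - theta_j^global|
    deltas = torch.cat([(p - g).abs().flatten()
                        for p, g in zip(params, global_params)])
    probs = deltas / (deltas.sum() + eps)          # per-parameter update distribution
    return -(probs * (probs + eps).log()).sum()    # H(p) = -sum_i p_i log p_i

def erfo_client_update(model, loader, loss_fn, lam=0.01, lr=0.01, local_epochs=1):
    # `model` is assumed to arrive already loaded with the current global weights,
    # exactly as in a FedAvg round; snapshot them before local training starts.
    global_params = [p.detach().clone() for p in model.parameters()]
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(local_epochs):
        for x, y in loader:
            opt.zero_grad()
            task_loss = loss_fn(model(x), y)
            ent = update_entropy(list(model.parameters()), global_params)
            # Entropy-regularized local objective: only the client loss changes, so
            # communication and the FedAvg-style server averaging stay untouched.
            loss = task_loss - lam * ent   # assumed sign: encourage spread-out updates
            loss.backward()
            opt.step()
    # The server averages the returned weights as usual.
    return model.state_dict()
```

In this reading, λ (`lam`) is the single scalar knob the abstract mentions; setting it to zero recovers plain FedAvg, which is consistent with the stated exact-convergence behavior as λ → 0.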