  • Article
  • Open Access

15 January 2026

SeqFAL: A Federated Active Learning Framework for Private and Efficient Labeling of Security Requirements

Department of Computer Science, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 13318, Saudi Arabia
This article belongs to the Section Computing and Artificial Intelligence

Abstract

Security requirements play a critical role in ensuring the trustworthiness and resilience of software systems; however, their automatic classification remains challenging due to limited labeled data, confidentiality constraints, and the heterogeneous nature of requirements across organizations. Existing approaches typically assume centralized access to training data and rely on costly manual annotation, making them unsuitable for distributed industrial settings. To address these challenges, we propose SeqFAL, a communication-efficient and privacy-preserving Federated Active Learning framework for natural language–based security requirements classification. SeqFAL integrates frozen pre-trained sentence embeddings, margin-based active learning, and lightweight federated aggregation of linear classifiers, enabling collaborative model training without sharing raw requirement text. We evaluate SeqFAL on a combination of the SeqReq and PROMISE-NFR datasets under varying federation sizes, query budgets, and communication rounds, and compare it against three baselines: centralized learning, active learning without federated aggregation, and federated learning without active querying. In addition to the proposed margin-based sampling strategy, we investigate alternative query strategies, including least-confidence and random sampling, as well as multiple linear classifiers such as LinearSVC and SGD-based classifiers with logistic and hinge losses. Results show that SeqFAL consistently outperforms the FL-only baseline and achieves performance comparable to AL-only centralized baselines, while approaching the optimal upper bound using significantly fewer labeled samples. These findings demonstrate that the joint integration of federated learning and active learning provides an effective and privacy-preserving strategy for security requirements classification in distributed software engineering environments.
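
The two mechanisms named in the abstract, margin-based querying and aggregation of linear classifiers, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`margin_scores`, `select_queries`, `federated_average`) are hypothetical, the margin is computed from class probabilities (for a LinearSVC it would more likely be computed from decision-function values), and the aggregation is assumed to be a sample-weighted average in the style of FedAvg, which the abstract does not confirm.

```python
import numpy as np

def margin_scores(probs):
    """Margin per sample: top-1 probability minus top-2 probability.

    A small margin means the classifier is torn between two classes.
    `probs` has shape (n_samples, n_classes).
    """
    ordered = np.sort(probs, axis=1)          # ascending within each row
    return ordered[:, -1] - ordered[:, -2]

def select_queries(probs, budget):
    """Return indices of the `budget` samples with the smallest margins."""
    margins = margin_scores(probs)
    return np.argsort(margins, kind="stable")[:budget]

def federated_average(client_weights, client_sizes):
    """Sample-weighted average of client parameter vectors (assumed FedAvg-style).

    Only linear-classifier parameters are combined; no raw text is exchanged.
    """
    w = np.asarray(client_weights, dtype=float)   # shape (n_clients, n_params)
    n = np.asarray(client_sizes, dtype=float)     # shape (n_clients,)
    return (w * n[:, None]).sum(axis=0) / n.sum()

# Illustrative values only, not data from the paper.
probs = np.array([[0.90, 0.10],
                  [0.55, 0.45],
                  [0.60, 0.40],
                  [0.50, 0.50]])
print(select_queries(probs, budget=2))                          # [3 1]
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))      # [2.5 3.5]
```

In such a setup each client would embed its own requirements with the frozen encoder, query labels for the selected indices locally, retrain its linear classifier, and send only the resulting weight vector for averaging.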
