Featured Application
SeqFAL enables organizations to collaboratively classify security requirements without sharing sensitive specification documents or proprietary system details. This makes the framework particularly suitable for multi-stakeholder software development environments—such as government suppliers, financial institutions, and regulated industries—where privacy constraints prevent data centralization. SeqFAL can be integrated into existing requirements management tools to provide automated, privacy-preserving security analysis while significantly reducing manual labeling effort.
Abstract
Security requirements play a critical role in ensuring the trustworthiness and resilience of software systems; however, their automatic classification remains challenging due to limited labeled data, confidentiality constraints, and the heterogeneous nature of requirements across organizations. Existing approaches typically assume centralized access to training data and rely on costly manual annotation, making them unsuitable for distributed industrial settings. To address these challenges, we propose SeqFAL, a communication-efficient and privacy-preserving Federated Active Learning framework for natural language–based security requirements classification. SeqFAL integrates frozen pre-trained sentence embeddings, margin-based active learning, and lightweight federated aggregation of linear classifiers, enabling collaborative model training without sharing raw requirement text. We evaluate SeqFAL on a dataset combining the SeqReq and PROMISE-NFR datasets under varying federation sizes, query budgets, and numbers of communication rounds, and compare it against three baselines: centralized learning, active learning without federated aggregation (AL-only), and federated learning without active querying (FL-only). In addition to the proposed margin-based sampling strategy, we investigate alternative query strategies, including least-confidence and random sampling, as well as multiple linear classifiers such as LinearSVC and SGD-based classifiers with logistic and hinge losses. Results show that SeqFAL consistently outperforms the FL-only baseline, achieves performance comparable to the centralized AL-only baseline, and approaches the centralized upper bound while using significantly fewer labeled samples. These findings demonstrate that jointly integrating federated learning and active learning provides an effective, privacy-preserving strategy for security requirements classification in distributed software engineering environments.
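The two mechanisms named above, uncertainty-based querying and federated aggregation of linear classifiers, can be illustrated with a minimal sketch. It assumes that each client has already encoded its requirements with a frozen sentence encoder (so the classifier only sees fixed-length vectors), that the classifier exposes class probabilities, and that the server averages client weights in proportion to local sample counts (FedAvg-style weighting). The function names `margin_query`, `least_confidence_query`, `random_query`, and `fedavg_linear` are hypothetical and are not taken from SeqFAL itself.

```python
import numpy as np


def margin_query(probs: np.ndarray, budget: int) -> np.ndarray:
    """Margin sampling: pick the samples whose top-2 class probabilities
    are closest together (smallest margin = most uncertain)."""
    sorted_probs = np.sort(probs, axis=1)          # ascending per row
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]
    return np.argsort(margins, kind="stable")[:budget]


def least_confidence_query(probs: np.ndarray, budget: int) -> np.ndarray:
    """Least-confidence sampling: pick the samples whose most likely
    class has the lowest predicted probability."""
    confidence = probs.max(axis=1)
    return np.argsort(confidence, kind="stable")[:budget]


def random_query(n_unlabeled: int, budget: int, seed: int = 0) -> np.ndarray:
    """Random sampling baseline: pick `budget` distinct indices uniformly."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_unlabeled, size=budget, replace=False)


def fedavg_linear(coefs, intercepts, n_samples):
    """Server-side aggregation (assumed FedAvg weighting): average each
    client's linear-classifier weights, weighted by its labeled-sample count.
    Only these parameters leave a client, never the requirement text."""
    weights = np.asarray(n_samples, dtype=float)
    weights /= weights.sum()
    coef = sum(w * c for w, c in zip(weights, coefs))
    intercept = sum(w * b for w, b in zip(weights, intercepts))
    return coef, intercept


if __name__ == "__main__":
    # Predicted probabilities for four unlabeled requirements (binary task).
    probs = np.array([[0.90, 0.10], [0.55, 0.45], [0.60, 0.40], [0.51, 0.49]])
    print(margin_query(probs, budget=2))            # most uncertain first
    print(least_confidence_query(probs, budget=2))

    # Two clients with 10 and 30 labeled samples, respectively.
    coef, intercept = fedavg_linear(
        coefs=[np.array([[1.0, 2.0]]), np.array([[3.0, 4.0]])],
        intercepts=[np.array([0.0]), np.array([1.0])],
        n_samples=[10, 30],
    )
    print(coef, intercept)
```

Because the sentence encoder stays frozen, each round only needs to exchange the classifier's `coef` and `intercept` arrays, which is what keeps communication lightweight. For hinge-loss classifiers such as LinearSVC, which do not produce probabilities, the absolute decision score is a common substitute for the probability margin.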