1. Introduction
Spectrum Situational Awareness (SSA) methods face growing demands to monitor ever-wider bandwidths and reliably detect signals of interest (SOIs) in real time with cost-effective receivers [1]. Achieving these objectives is made increasingly challenging by the evolving nature of spectrum usage, particularly as dynamic cognitive radio (CR) techniques are adopted [2,3] and as nonstandard or low-energy communication protocols emerge [1]. In the past, specific communications were confined to fixed frequency ranges; however, recent CR research has enabled dynamic spectrum access (DSA), which allows opportunistic use of underutilized spectral segments [4]. Although DSA increases overall spectral efficiency, its opportunistic nature undermines legacy SSA systems that expect communications to occur within predetermined static ranges [5]. Furthermore, the congested and unpredictable RF environment poses a persistent challenge to maintaining SSA performance [6].
SSA plays a vital role in wireless network engineering, spectrum regulation, and military, law enforcement, and cybersecurity applications. It aids in understanding spectrum utilization for optimizing future CR networks [7] and in managing public spectrum use to detect unauthorized or interfering signals [8]. In military contexts, SSA facilitates battlefield management by differentiating friendly from adversary communications, radar systems, and even unintentional electromagnetic emissions [9,10]. In civil and private networks, SSA enhances cybersecurity by flagging unusual radio frequency (RF) activity that may indicate cyberattacks, especially those targeting Internet of Things (IoT) devices, or by gathering evidence of network compromise [11].
Within the cybersecurity realm, adversaries may exploit nonstandard RF communication protocols to covertly exfiltrate data from air-gapped systems, thereby evading traditional spectrum and network monitoring [12]. To counter such threats, defenders may employ high-resolution sensors that simultaneously monitor the entire spectral band to guarantee near-100% detection probability [13]. Given the vast range of frequencies and the inherent limitations of current hardware, such brute-force monitoring is impractical. Instead, finite spectrum monitoring resources must be allocated efficiently, a challenge underscored by programs such as IARPA’s Securing Compartmented Information with Smart Radio Systems (SCISRS) [14].
Traditional spectrum sensing methods often rely on linear sweeping strategies [15]. Although these approaches maximize coverage and standardize revisit rates, they can miss transient or burst-like signals that occur between sweeps. To overcome these limitations, recent approaches harness RF machine learning (RFML) techniques that jointly allocate spectrum monitoring and processing resources for comprehensive environmental characterization [7,16,17,18]. Among the various machine learning (ML) techniques, deep reinforcement learning (RL) has shown promise in inferring actionable policies from experience, even in environments with fuzzy objectives [19,20]. Multi-agent reinforcement learning (MARL) has further advanced resource allocation by enabling cooperative decision-making in distributed scenarios [20]. For example, outside the RF domain, MARL has been applied to balance processing workloads [19], to dynamically allocate unmanned aerial vehicles (UAVs) as communication nodes [21], and to deploy diverse sensors for maritime situational awareness [22]. Within the RF domain, MARL methods have been proposed to optimize spectrum access efficiency [23] and to allocate sensing resources for cognitive radios [24,25].
As AI/ML techniques increasingly control critical wireless network functions, understanding why a model makes a particular decision is paramount. Explainable Artificial Intelligence (XAI) seeks to transform these black-box models into transparent systems by revealing their decision logic [26,27]. In RF sensing, such transparency is important not only for trust and debugging but also when decisions affect mission-critical operations, such as military communications or urban emergency services [28,29]. Pioneering work has applied post hoc XAI techniques (e.g., SHAP [30] and LIME [31]) to identify which signal features drive model predictions in interference detection [32]. Additionally, multi-agent settings have motivated the development of XAI methods tailored for cooperative RL scenarios, enabling systems to articulate the rationale behind dynamic spectrum allocations [33,34]. Complementary studies have reviewed interpretable deep learning methods for RF applications and proposed frameworks that integrate explainability into network management protocols [35,36,37].
Emerging research is also exploring trust-enhancing frameworks that combine human-understandable explanations with high-performance ML models in cognitive radio systems [38,39]. Forward-looking surveys outline directions for incorporating interpretability into deep reinforcement learning strategies for spectrum sensing, highlighting the need for real-time explainable decision support systems [27,40]. Recent contributions have reinforced this trend by reviewing explainable RL in wireless communications and proposing novel metrics and hybrid approaches that combine supervised learning and RL for robust RF signal detection [41].
Against this backdrop, our work presents SmartScan, an explainable multi-agent reinforcement learning approach that leverages coarse RF spectrum observations to dynamically allocate high-resolution receivers for detailed signal inspection. By incorporating a mixture-of-experts (MoE) paradigm into the command and control (C2) of an SSA system, SmartScan enhances detection performance on low-probability-of-intercept (LPI) and low-probability-of-detect (LPD) signals in complex RF environments while providing human-interpretable explanations of its decision-making process. This paper details the SmartScan framework, discusses simulated and real-world performance metrics, and outlines future research directions aimed at further bridging the gap between performance and interpretability in AI-driven spectrum sensing.
2. Materials and Methods
2.1. Overview of the SmartScan Framework
SmartScan is a novel scanning framework designed to efficiently balance the discovery of new information with the updating of existing knowledge. It achieves this by orchestrating two specialized agents, an Eager Agent and a Revisit Agent, working in tandem. The Eager Agent observes a live spectrogram composed of fast Fourier transform (FFT) frames generated by the spectrum monitor and rapidly explores and processes newly active channels as soon as energy appears in the spectrum, minimizing the delay before a signal is first analyzed. In parallel, the Revisit Agent periodically re-examines previously explored channels based on the time since each was last visited, thereby keeping the accumulated knowledge up to date. Together, these components enable SmartScan to maintain a dynamic equilibrium between exploration and exploitation of data sources. This division of labor addresses the common trade-off in continuous scanning systems: the Eager Agent keeps new signals from being missed, while the Revisit Agent ensures a nearly constant sweep interval across all channels.
Figure 1 illustrates the high-level system architecture, showing how the two agents ingest their observations and output to external resources.
The SmartScan framework builds on principles from multi-agent systems and adaptive scheduling. By separating new signal acquisition from channel re-examination, it avoids the pitfalls of a one-size-fits-all scanner that might spend either too much time revisiting known items or too much time seeking new ones. Instead, SmartScan’s design allows each agent to specialize in and optimize its respective task. This methodology is informed by prior research on age-of-information-aware radio resource management in vehicular networks [42], which emphasizes the importance of balancing freshness and coverage. In SmartScan, we incorporate these principles directly into the architecture: the Eager Agent maximizes the freshness of data from active channels, and the Revisit Agent maximizes even coverage of all channels.
2.2. SmartScan Algorithm and Workflow
At the core of our framework lies the SmartScan algorithm, which coordinates the actions of the Eager and Revisit agents through a shared scheduling system and employs a multi-threaded design to process multiple data streams asynchronously. Pseudocode for the SmartScan workflow is provided in Algorithm 1, and a summary is given here for clarity. The system initializes three dedicated queues: Q_SpectrumMonitor for FFT frames, Q_ReceiverHistory for receiver history, and Q_AgentInference for channel priority inferences. Three independent threads, SpectrumMonitorThread, ReceiverOutputThread, and SmartScanInferenceThread, are launched to continuously capture and publish data into their respective queues. The system then iterates through a cycle in which it checks for and ingests incoming information from the external resources, updates the internal knowledge of the Eager and Revisit agents, and makes the most up-to-date channel priorities available to the Channel Inspection Receivers.
Algorithm 1 Multi-Threaded SmartScan Top-Level Pseudocode
Initialize:
    Queue Q_SpectrumMonitor    ▹ For FFT frames
    Queue Q_ReceiverHistory    ▹ For receiver history
    Queue Q_AgentInference    ▹ For channel priorities
Initialize: shared variable ChannelPriorities
Start Thread: SpectrumMonitorThread    ▹ Continuously: publish(getFrameFromSpectrumMonitor()) to Q_SpectrumMonitor
Start Thread: ReceiverOutputThread    ▹ Continuously: publish(getActionsTaken()) to Q_ReceiverHistory
Start Thread: SmartScanInferenceThread    ▹ Continuously: publish(SmartScan.infer()) to Q_AgentInference
while system is running do
    data_spectrum ← dequeueAll(Q_SpectrumMonitor)    ▹ Pull all data from each queue
    data_receiver ← dequeueAll(Q_ReceiverHistory)
    data_inference ← dequeueAll(Q_AgentInference)
    new_frame ← extractFrame(data_spectrum)
    new_receiver_history ← extractReceiverHistory(data_receiver)
    inference_data ← extractInferenceData(data_inference)
    if new_frame or new_receiver_history is not empty then
        SmartScan.update(new_frame)
        SmartScan.update(new_receiver_history)
        new_channel_priorities ← SmartScan.getChannelPriorities(new_frame, new_receiver_history, inference_data)    ▹ SmartScan agents' inference
        updateShared(ChannelPriorities, new_channel_priorities)
    end if
    // Concurrently, Channel Inspection Receivers read from ChannelPriorities
end while
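To make the threading pattern of Algorithm 1 concrete, the sketch below wires three producer threads into three standard-library queues and drains them in a non-blocking main loop. It is a minimal sketch under stated assumptions: SmartScanStub, its placeholder scoring rule, the random frame and action generators, and the channel count are all hypothetical stand-ins, not the authors' agents or interfaces.

```python
import queue
import random
import threading
import time

N_CHANNELS = 8  # hypothetical channel count, chosen only for illustration


def dequeue_all(q):
    """Drain every item currently in q without blocking (dequeueAll in Algorithm 1)."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def publisher(q, produce, stop, period_s=0.001):
    """Continuously publish produce() into q until the stop event is set."""
    while not stop.is_set():
        q.put(produce())
        time.sleep(period_s)


class SmartScanStub:
    """Hypothetical stand-in for the SmartScan agents; not the authors' models."""

    def __init__(self, n_channels):
        self.last_frame = [0.0] * n_channels  # newest per-channel power frame
        self.visits = [0] * n_channels        # receiver-history visit counts

    def update(self, frames, actions):
        if frames:
            self.last_frame = frames[-1]
        for ch in actions:
            self.visits[ch] += 1

    def infer(self):
        return list(self.last_frame)

    def get_channel_priorities(self, inference):
        # Placeholder scoring: latest inference minus a small penalty per visit.
        scores = inference[-1] if inference else self.last_frame
        return [s - 0.01 * v for s, v in zip(scores, self.visits)]


def run(iterations=50, loop_period_s=0.002):
    q_spectrum, q_receiver, q_inference = queue.Queue(), queue.Queue(), queue.Queue()
    stop = threading.Event()
    agent = SmartScanStub(N_CHANNELS)
    channel_priorities = [0.0] * N_CHANNELS
    lock = threading.Lock()  # receivers would read channel_priorities under this lock
    producers = [
        (q_spectrum, lambda: [random.random() for _ in range(N_CHANNELS)]),  # fake FFT frame
        (q_receiver, lambda: random.randrange(N_CHANNELS)),                  # fake tuned channel
        (q_inference, agent.infer),                                          # agent inference
    ]
    threads = [threading.Thread(target=publisher, args=(q, f, stop), daemon=True)
               for q, f in producers]
    for t in threads:
        t.start()
    updates = 0
    for _ in range(iterations):
        frames = dequeue_all(q_spectrum)
        actions = dequeue_all(q_receiver)
        inference = dequeue_all(q_inference)
        if frames or actions:
            agent.update(frames, actions)
            new_priorities = agent.get_channel_priorities(inference)
            with lock:
                channel_priorities[:] = new_priorities
            updates += 1
        time.sleep(loop_period_s)
    stop.set()
    for t in threads:
        t.join()
    return channel_priorities, updates
```

Calling `priorities, n_updates = run()` returns the last published priority list and the number of loop iterations that saw new data. Draining with `get_nowait` keeps the main loop from blocking on any single stream, which mirrors the non-blocking design described in the text.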
The dual-agent architecture of SmartScan, comprising the Eager and Revisit agents, makes each agent's contribution to the overall channel prioritization transparent. The Eager Agent is responsible for ingesting new data and assigning high initial priorities based on immediate observations, while the Revisit Agent schedules and processes re-scans using the receiver history. By maintaining distinct queues and processing pathways, the framework can attribute portions of the overall predicted reward, which determines channel prioritization, to the individual actions of each agent. This separation not only enhances operational efficiency but also provides clear insight into which agent drives each final decision.
Such granular attribution is pivotal for explainability. By quantifying how much each agent influences the final prioritization, system operators can better understand the balance between responding to fresh high-priority inputs and revisiting channels with significant historical change. This insight enables a more informed evaluation of system performance, facilitates targeted improvements, and opens greater opportunities for human–machine interaction. Ultimately, the modular design not only boosts real-time responsiveness but also demystifies the decision logic behind channel selection, thereby enhancing user trust and system reliability.
2.3. Eager Agent: New Energy Detection
The Eager Agent is dedicated to rapidly identifying and prioritizing new energy bursts in the spectrum. Its objective is to “quickly attend to new energy” so that transient signals, especially signals of opportunity with short on-times, are captured effectively. In practice, the agent processes FFT frames containing wide-band power spectral density (PSD) inputs on a per-channel basis and computes an expected reward for each channel (see Figure 2). The Eager Agent’s policy is implemented using a Deep Q-Network (DQN) [43].
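As an illustration of what a per-channel PSD observation could look like, the sketch below averages a wide-band FFT frame (in dB per bin) into equal-width channels, averaging in linear power rather than in dB. The equal-width channelization, the 6 dB rise threshold, and the function names are assumptions for illustration; the threshold flag is only a hypothetical proxy for "new energy" and is not the DQN policy itself.

```python
import numpy as np


def channelize_psd(fft_frame_db, n_channels):
    """Average a wide-band PSD frame (dB per FFT bin) into per-channel power in dB."""
    bins = np.asarray(fft_frame_db, dtype=float)
    usable = (bins.size // n_channels) * n_channels   # drop trailing bins that do not fill a channel
    linear = 10.0 ** (bins[:usable] / 10.0)            # average in linear power, not in dB
    per_channel = linear.reshape(n_channels, -1).mean(axis=1)
    return 10.0 * np.log10(per_channel)


def new_energy_flags(prev_db, curr_db, rise_db=6.0):
    """Hypothetical proxy: flag channels whose power rose by more than rise_db."""
    return (np.asarray(curr_db) - np.asarray(prev_db)) > rise_db


# Example: 16 FFT bins split into 4 channels; channel 1 rises by 20 dB.
quiet = channelize_psd([-100.0] * 16, n_channels=4)
frame = [-100.0] * 16
frame[4:8] = [-80.0] * 4
active = channelize_psd(frame, n_channels=4)
print(new_energy_flags(quiet, active).tolist())  # [False, True, False, False]
```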
DQN is a value-based reinforcement learning method that approximates the optimal action–value function, denoted $Q^*(s,a)$. The optimal function $Q^*(s,a)$ is defined as the maximum expected return achievable by taking action $a$ in state $s$ and subsequently following the optimal policy:

$$Q^*(s,a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^*(s',a') \;\middle|\; s, a \,\right] \tag{1}$$

In Equation (1), $s$ denotes the current state, including the wide-band PSD observations for each channel; $a$ represents the current action (channel selection decision); $r$ is the immediate reward reflecting detection of new energy; $\gamma$ is the discount factor ($0 \le \gamma < 1$) determining the weighting of future rewards; $s'$ is the next state; and $a'$ represents possible future actions.

Since the optimal action–value function $Q^*(s,a)$ is not directly known, the DQN employs a neural network parameterized by weights $\theta$ to approximate this function as $Q(s,a;\theta) \approx Q^*(s,a)$. The DQN training procedure minimizes the loss function $L(\theta)$ derived from the Bellman equation:

$$L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}\left[\left( r + \gamma \max_{a'} Q(s',a';\theta^-) - Q(s,a;\theta) \right)^2\right] \tag{2}$$

Here, $\mathcal{D}$ is a replay buffer storing past experiences, and $\theta^-$ are the parameters of a separate target network that is periodically updated from $\theta$ to enhance training stability. Through this training process, the Eager Agent learns to prioritize channels exhibiting new signal energy, significantly improving the detection efficiency of transient signals compared to traditional scanning methods. This formulation enables the agent to dynamically allocate radio resources based on instantaneous spectral activity.
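The arithmetic inside the DQN loss can be checked on a toy minibatch. The sketch below computes the bootstrapped target $r + \gamma \max_{a'} Q(s',a';\theta^-)$ from target-network values and the mean squared error against $Q(s,a;\theta)$ for the actions taken. The Q-values and rewards are made up, and the sketch omits the networks, the replay buffer, and the gradient step; it only evaluates the quantity being minimized.

```python
import numpy as np


def bellman_targets(rewards, q_next_target, gamma):
    """y = r + gamma * max_a' Q(s', a'; theta^-), using target-network values."""
    return np.asarray(rewards, dtype=float) + gamma * np.max(q_next_target, axis=1)


def dqn_loss(q_online, actions, targets):
    """Mean squared TD error between Q(s, a; theta) for the taken actions and targets."""
    q_online = np.asarray(q_online, dtype=float)
    chosen = q_online[np.arange(len(actions)), actions]
    return float(np.mean((targets - chosen) ** 2))


# A two-transition minibatch as if sampled from the replay buffer D (made-up numbers).
rewards = [1.0, 0.0]
q_next_target = np.array([[0.5, 2.0], [1.0, 0.0]])   # Q(s', . ; theta^-)
q_online = np.array([[2.0, 0.0], [0.0, 1.0]])        # Q(s, . ; theta)
actions = np.array([0, 1])
y = bellman_targets(rewards, q_next_target, gamma=0.9)  # ≈ [2.8, 0.9]
loss = dqn_loss(q_online, actions, y)                   # (0.8**2 + 0.1**2) / 2 ≈ 0.325
```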
2.4. Revisit Agent: Ensuring Uniform Channel Coverage
In contrast, the Revisit Agent is designed to maintain a consistent observation interval for every channel, mitigating the potential bias introduced by continuous prioritization of new signals. The agent uses a neural network to learn a reward function that increases exponentially with the time since a channel was last observed. The state vector $S$ (with $N$ channels indexed such that the most recently visited channel is 0) is given by:
This mechanism ensures that, even while the Eager Agent disrupts the naïve linear sweep to capture transient events, the overall system still achieves near-uniform coverage. Moreover, during initial cold-start conditions, when the radios are being calibrated, the Revisit Agent runs exclusively for the first two full sweeps to establish a reliable baseline, as depicted in Figure 3.
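The two behaviors described above can be sketched as follows: a target reward that grows exponentially with the time since a channel was last visited, and a cold-start gate that keeps the Eager Agent inactive for the first two full sweeps. The functional form $e^{t/\tau} - 1$ and the time constant `tau_s` are assumptions chosen for illustration; the paper states only that the learned reward increases exponentially, and its exact state vector is not reproduced here.

```python
import numpy as np


def revisit_target(seconds_since_visit, tau_s=1.0):
    """Hypothetical target reward that grows exponentially with time since last visit.

    np.expm1 gives exp(t / tau) - 1, so a channel visited just now scores 0.
    """
    t = np.asarray(seconds_since_visit, dtype=float)
    return np.expm1(t / tau_s)


def eager_enabled(completed_sweeps, coldstart_sweeps=2):
    """Cold start: only the Revisit Agent acts until two full sweeps have completed."""
    return completed_sweeps >= coldstart_sweeps


# Channels ordered by recency, index 0 = most recently visited (as in the text).
ages_s = np.array([0.0, 0.5, 1.0, 2.0])
rewards = revisit_target(ages_s)
print(rewards.round(3).tolist())  # [0.0, 0.649, 1.718, 6.389]
```

In a learned setting, a network would be trained to regress such targets; the closed form here only illustrates the intended shape.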
2.5. Integration and Multi-Agent Decision-Making
Both agents predict channel-specific rewards for the available actions (i.e., assigning K radios to N channels). Their outputs are combined in a multi-agent framework where the top-K channels with the highest aggregated reward are selected for tuning, thus supporting any number of acting Channel Inspection Receivers. This fusion strategy not only balances the rapid response to new spectral events with the need for consistent monitoring but also enhances the explainability of the system’s behavior by attributing parts of the final decision to each agent’s contribution.
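The fusion step can be sketched as a per-channel sum of the two agents' predicted rewards followed by a top-K selection, with each selected channel annotated by the agent that contributed more to its score. Plain summation, the "dominant contributor" rule, and the example rewards are assumptions for illustration; the paper does not specify the exact aggregation or attribution rule.

```python
import numpy as np


def select_top_k(eager_rewards, revisit_rewards, k):
    """Sum per-channel rewards, take the top-k channels, and record each agent's share."""
    eager = np.asarray(eager_rewards, dtype=float)
    revisit = np.asarray(revisit_rewards, dtype=float)
    total = eager + revisit
    # A stable sort on the negated total gives descending order with deterministic ties.
    top = np.argsort(-total, kind="stable")[:k]
    picks = []
    for ch in top:
        dominant = "Eager" if eager[ch] >= revisit[ch] else "Revisit"
        picks.append({"channel": int(ch), "eager": float(eager[ch]),
                      "revisit": float(revisit[ch]), "dominant": dominant})
    return picks


# K = 2 Channel Inspection Receivers, N = 4 channels (made-up rewards).
picks = select_top_k([0.1, 3.0, 0.0, 0.5], [2.0, 0.2, 0.1, 0.4], k=2)
for pick in picks:
    print(pick)
```

Here channel 1 is chosen mainly on the Eager Agent's reward and channel 0 mainly on the Revisit Agent's, which is the kind of per-action audit trail the text attributes to the MoE design.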
2.6. System Architecture and Implementation Details
The SmartScan framework is implemented as a modular system composed of individually scalable components, as shown schematically in Figure 1. At a high level, the architecture consists of (1) the Eager Agent inference module, (2) the Revisit Agent inference module, (3) a central scheduler that operates continuously and facilitates data transfer between external resources and agents, and (4) a Data Repository that stores current knowledge of the spectrogram, receiver history, and channel priorities. These components communicate through well-defined interfaces and leverage ZeroMQ for asynchronous message passing on the same machine or over a network.
In a real-world deployment scenario, the architecture can be scaled horizontally: multiple instances of Eager Agents can run in parallel to handle high incoming data rates, and, similarly, multiple Revisit Agents can process a larger number of channels if the bandwidth of the Channel Inspection Receivers is decreased (or the frequency range of interest is increased). Furthermore, because the Eager Agent’s observation space is defined on a per-channel basis, its computational load increases proportionally with the number of channels. This scaling characteristic necessitates either additional computational resources or the parallelization of Eager Agents to maintain real-time performance in large-scale environments. Our system currently runs on a server with 16 CPU cores and 32 GB of RAM, which is sufficient for moderate-scale experiments; however, the design can be distributed across a cluster using a networked message queue for inter-agent communication and a distributed database for the repository. This configuration ensures that SmartScan can handle the data volumes and rates expected in large-scale real-world monitoring applications [44].
Data handling in SmartScan is designed to ensure consistency and efficiency. The Data Repository is implemented as a receiver history queue and a spectrogram, where the number of FFT frames stacked to form the spectrogram equals the length of the receiver history. New data ingested by the Eager Agent are first written to an in-memory cache for rapid access by other components and then batch-inserted into the persistent database to prevent bottlenecks during bursts of data arrival.
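A minimal in-memory version of this repository can be built from two bounded deques of equal length, so that the spectrogram depth always matches the receiver-history length, plus a pending list that is flushed to persistent storage in batches. The class name, the `batch_insert` callback, and the row format are hypothetical; the persistent database and its API are left abstract.

```python
from collections import deque


class DataRepository:
    """Hypothetical in-memory repository: spectrogram depth tracks receiver-history length."""

    def __init__(self, history_len):
        self.receiver_history = deque(maxlen=history_len)  # most recent tuned channels
        self.spectrogram = deque(maxlen=history_len)       # most recent FFT frames
        self._pending = []                                  # rows awaiting a batch insert

    def ingest_frame(self, frame):
        self.spectrogram.append(frame)       # oldest frame drops out automatically
        self._pending.append(("frame", frame))

    def ingest_action(self, channel):
        self.receiver_history.append(channel)
        self._pending.append(("action", channel))

    def flush(self, batch_insert):
        """Hand all pending rows to a persistence callback in a single batch."""
        if self._pending:
            batch_insert(list(self._pending))
            self._pending.clear()


repo = DataRepository(history_len=3)
for i in range(5):
    repo.ingest_frame([float(i)])
batches = []
repo.flush(batches.append)   # one batch containing all five frame rows
repo.flush(batches.append)   # nothing pending, so no new batch
```

Bounding both deques with the same `maxlen` keeps the cache size fixed regardless of how long the system runs, while batching writes absorbs bursts of arrivals.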
In terms of real-world implementation, SmartScan was integrated into a spectrum monitoring system as the frequency-picking decision agent. In this configuration, the Eager Agent continuously captures real-time spectral data, while the Revisit Agent maintains a history of channel usage and interference patterns. By dynamically updating channel priorities based on immediate spectral observations and historical trends, SmartScan efficiently identifies and selects optimal frequency bands for communication. The system is tuned so that channels exhibiting high interference or rapid fluctuations are de-prioritized, ensuring that the most viable frequencies are allocated for ongoing operations. During development and subsequent stress testing, SmartScan remained reliable under multiple failure conditions, including corrupted spectrum monitor FFT frames, undelivered spectrum monitor FFT frames, slowed Channel Inspection Receiver timing, and Channel Inspection Receivers stopping and restarting. This deployment validates the resilience and efficiency of our modular non-blocking architecture, demonstrating that the separation of data ingestion and processing yields substantial improvements in throughput and decision latency compared to conventional spectrum monitoring methods.
2.7. Limitations and Future Improvements
While the SmartScan framework has demonstrated promising performance in our experiments, several limitations remain that warrant further investigation. First, the rate at which the Eager Agent can direct receivers to capture new energy is currently constrained by the fixed duration of receiver tunes. In practice, if a receiver is engaged in a 50 ms capture, brief energy bursts of less than 50 ms may be missed because the system cannot interrupt a low-value tune to rapidly reconfigure the radio. Future work should explore mechanisms to pre-empt or shorten ongoing tunes that are capturing low-value energy, thereby allowing a more agile response to transient spectral events. Additionally, the current implementation of the Revisit Agent does not factor in the value of previous actions taken by the system. Feedback from the high-resolution inspection digital signal processing (DSP) pipeline, which could provide critical insights for prioritizing channels during revisits, is not yet utilized. By integrating such feedback, future iterations of SmartScan could refine channel prioritization and improve overall system performance.
Another area for improvement is the dynamic allocation of resources between the Eager and Revisit agents. Our fixed scheduling policy (essentially a weighted round-robin) may not be optimal under varying load conditions. An adaptive or learned resource allocation strategy (for instance, one using reinforcement learning, as suggested in [45]) could better balance new item ingestion with the re-examination of existing items, particularly during periods of fluctuating data rates.
Finally, while we have validated SmartScan within a spectrum monitoring application, broader deployment in other domains (e.g., IoT sensor networks or file system monitoring) may introduce additional challenges, such as intermittent connectivity or energy constraints. Addressing these domain-specific issues and integrating more flexible tuning and feedback mechanisms will be key directions for future work.
Despite these challenges, the modular design of SmartScan—with its clear separation between the Eager and Revisit agents—provides a robust foundation for future enhancements as well as additional agents or policies. Improvements to one component, such as faster radio reconfiguration or more accurate predictive models, can be integrated without disrupting the overall system, ensuring that SmartScan can evolve to meet new demands.
3. Results
SmartScan was developed and validated across diverse spectrum conditions, various SOIs, and a range of configurable hyperparameters. The results presented in this section were selected as representative of overall system performance. Designed explicitly for real-time C2 applications, SmartScan consistently achieved low-latency inference (2 ms) on commodity CPU hardware. The system efficiently orchestrated high-resolution inspection receivers scanning 50 MHz channels concurrently while running in a containerized environment and interacting with multiple external APIs for data input and output. The efficacy and adaptability of SmartScan were validated through deployments in diverse settings, including a commercial office building (Aurora, CO), the RF laboratory at the Virginia Tech National Security Institute, and a dedicated testing facility at the Pacific Northwest National Laboratory. These varied test environments underscore the robustness, portability, and dynamic adaptability of SmartScan’s containerized framework across differing real-world spectrum scenarios.
Figure 4, Figure 5, and Figure 6 visualize the differences in high-resolution inspection receiver allocations between SmartScan and the naïve approach.
Figure 4 illustrates the naïve behavior of linearly sweeping with all the available radios. The static allocation of resources is unable to reliably detect low-probability-of-intercept signals that have shorter durations.
Figure 5 shows a qualitative improvement over the behavior illustrated in Figure 4: SmartScan dynamically allocates resources to inspect a signal transmitted between sweeps. Additionally, this figure illustrates the explainability offered by the SmartScan MoE approach, in which individual actions taken by the system can be audited and attributed to an individual policy.
Figure 6 illustrates the efficacy of the SmartScan framework by contrasting the allocation behaviors of the individual agents with that of the combined MoE approach. Panel (a) demonstrates that the MoE integration strategically concentrates Channel Inspection Receivers on frequency bands exhibiting higher standard deviations in channel power, thereby targeting regions of elevated spectral activity. In contrast, panel (b) illustrates that the Eager Agent, when deployed independently, overcompensates by concentrating resources solely on active bands, leaving several GHz-wide segments of the spectrum where comparatively low-peak-power signals would go undetected. Panel (c) shows that the Revisit Agent, operating in isolation, distributes actions uniformly across the spectrum without adequately prioritizing more active channels, effectively reproducing the naïve approach. These observations confirm that the SmartScan framework, through its integrated MoE strategy, balances the need for dynamic responsiveness with consistent coverage, thereby enhancing the reliable detection of both LPI and LPD signals under hardware constraints.
SmartScan delivers quantitative and qualitative improvements over the naïve approach while maintaining near-linear sweep revisit rates across the spectrum.
Table 1 compares the results from an over-the-air system test that measures the time intervals between consecutive observations of the same channel.
Table 1 quantifies the performance benefits of the SmartScan framework by comparing the observation intervals achieved by the individual agent policies and by their combined operation. The Eager Agent yields a low mean observation interval (408.0 ms), demonstrating rapid responsiveness to new energy bursts; however, its high variability (a standard deviation of 1526.0 ms) indicates that its coverage is inconsistent, potentially leaving some frequency bands unmonitored for extended periods. In contrast, the Revisit Agent enforces a uniform revisit interval (2075.0 ms at every reported percentile), ensuring consistent monitoring but lacking the agility to quickly detect transient signals. As shown by the combined metrics (a mean of 2065.5 ms and a standard deviation of 1175.8 ms), the combined system maintains a near-linear sweep revisit rate akin to that of the Revisit Agent while incorporating the adaptive responsiveness of the Eager Agent.
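Statistics of the kind reported in Table 1 can be derived from a time-ordered log of (timestamp, channel) observations by measuring the gap between consecutive visits to the same channel. The sketch below assumes such a log format, uses the population standard deviation, and picks illustrative percentiles; the paper does not state which percentiles or which standard-deviation convention were used, and no real measurement data are reproduced here.

```python
import numpy as np


def revisit_intervals(observations):
    """observations: (timestamp_ms, channel) pairs in time order; returns per-channel gaps."""
    last_seen, gaps = {}, []
    for t_ms, ch in observations:
        if ch in last_seen:
            gaps.append(t_ms - last_seen[ch])  # time since this channel was last observed
        last_seen[ch] = t_ms
    return np.asarray(gaps, dtype=float)


def summarize(gaps, percentiles=(5, 50, 95)):
    """Mean, population std, and selected percentiles of the observation intervals."""
    stats = {"mean": float(gaps.mean()), "std": float(gaps.std())}
    for p in percentiles:
        stats[f"p{p}"] = float(np.percentile(gaps, p))
    return stats


# A linear sweep of 3 channels, one 10 ms tune each: every channel recurs every 30 ms.
sweep = [(10 * i, i % 3) for i in range(9)]
stats = summarize(revisit_intervals(sweep))
print(stats)  # every interval statistic is 30.0 and the std is 0.0
```

A pure linear sweep gives zero spread, which is why the Revisit Agent's percentiles all coincide; adaptive re-tuning by the Eager Agent widens the distribution while leaving the mean close to the sweep period.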
The results obtained from SmartScan demonstrate significant improvements over the naïve scanning approach that are critical when building a system for the reliable detection of LPI/LPD signals under limited hardware constraints. Quantitatively, as shown in Table 1, SmartScan not only achieves a near-linear sweep revisit rate on average, thus maximizing the likelihood of detecting LPD signals at or below the noise floor, but also dynamically allocates Channel Inspection Receivers to frequencies exhibiting higher spectral activity. This targeted allocation reduces the observation interval for channels with transient or high-power signals, which is crucial because LPI signals typically have low duty cycles and may otherwise be missed if not promptly captured. In hardware-constrained environments, where the available radios and processing power are limited, consistent and timely channel monitoring is essential to maximize detection probability. Moreover, the explainability inherent in the MoE approach provides valuable insight into the decision-making process, allowing system operators to fine-tune resource allocation in real time and helping ensure that even LPI signals are reliably detected despite hardware limitations.
To place SmartScan’s performance into perspective, we qualitatively compare it against other notable state-of-the-art spectrum scanning methods. Conventional heuristic methods such as periodic round-robin scanning and priority-based scanning [46] generally fail to adapt dynamically, often missing short-duration signals due to their fixed or semi-fixed schedules. SpecInsight [46], a prominent heuristic multi-armed bandit approach, effectively detects periodic signals but may struggle in unpredictable or highly dynamic spectrum conditions, where signal patterns change frequently.
In contrast, compressed-sensing-based methods [47] offer simultaneous spectrum coverage with fewer measurements but depend heavily on the assumption of sparse occupancy. Their computational complexity and vulnerability to densely populated environments often limit their practical deployment. DQN-based spectrum sensing [17] and Q-learning-based anti-jamming techniques [48] improve adaptability but often focus on single-goal optimization and lack comprehensive, continuous spectrum coverage.
Table 2 summarizes the comparison of notable SSA methods, highlighting SmartScan’s unique combination of comprehensive coverage, adaptability, explainability, and real-time performance.
As demonstrated, SmartScan uniquely combines adaptive decision-making, continuous real-time operation, robustness against varying spectral densities, and explainable scanning policies, making it highly suitable for practical SSA applications.
Collectively, these metrics and qualitative assessments validate SmartScan’s ability to operate effectively under real-world conditions, serving as performance benchmarks for systems tasked with intercepting LPI/LPD signals. The combination of agile response to transient events and consistent coverage ensures that SmartScan can be deployed in resource-constrained scenarios without compromising the detection of elusive signals.
It is important to note that all the RF data used in these experiments were collected over the air in real-world conditions. Due to regulatory and privacy constraints associated with live RF transmissions, the raw data will not be made publicly available. Nonetheless, the detailed descriptions of our experimental setup and evaluation methodologies provide sufficient information for independent replication and further research in this area.
Furthermore, similar over-the-air results can be achieved using a SignalHound BB60 as the spectrum monitor radio.
Table 3 summarizes the configuration parameters used for the SignalHound BB60 in our experiments.
4. Discussion
The SmartScan system demonstrates a significant advancement in RF signal detection and interception through its innovative reinforcement-learning-based MoE approach. By integrating multiple policy-informed agents—each optimized for distinct operational objectives—SmartScan achieves a near-linear sweep revisit rate on average while dynamically reallocating resources to target frequency bands with heightened spectral activity.
Our over-the-air experimental results validate the efficacy of SmartScan. The quantitative metrics summarized in Table 1 demonstrate that SmartScan maintains a balanced and rapid revisit schedule, ensuring that transient low-duty-cycle events are not overlooked. Qualitatively, the distribution of actions illustrated in Figure 4, Figure 5, and Figure 6 reveals that the MoE framework successfully mediates the trade-off between exploration and exploitation. Specifically, while the Eager Agent tends to over-concentrate on transient high-energy bursts and the Revisit Agent uniformly covers the spectrum without discriminating active channels, their integration results in a robust scanning strategy that adaptively focuses on channels with greater potential for signal interception.
These results are particularly important when considering the broader context of SSA, as discussed in the Introduction. The evolving nature of dynamic spectrum access and cognitive radio techniques demands that SSA systems not only cover wide bandwidths but also swiftly detect fleeting low-energy signals. In military, law enforcement, and cybersecurity applications—where the timely detection of covert or nonstandard transmissions is critical—the ability to deploy an explainable adaptive scanning framework like SmartScan is invaluable. The transparency provided by our MoE approach, which allows individual actions to be audited and attributed to specific policies, further enhances trust and accountability in mission-critical operations.
The technical contributions of this work are threefold. First, we propose a novel MoE-based framework for SSA that combines multiple specialized agents, yielding performance improvements over traditional linear sweeping methods. Second, we demonstrate the practical viability of our approach with over-the-air experiments on representative hardware. Third, we offer an in-depth analysis of the trade-offs between exploration and exploitation, providing insights that are essential for optimizing resource allocation under real-world hardware constraints.
Looking forward, future research should explore adaptive strategies to further refine the balance between agent policies, incorporate additional data sources, and leverage advanced signal processing techniques to enhance the detection of elusive LPI/LPD signals. These efforts will ensure that SSA systems remain effective and resilient in increasingly congested and dynamic RF environments.