6.3. Scenario-Based Demonstration: Port-Scan-like Behavior
To concretely demonstrate how evidence abstraction creates ambiguity, we trace a canonical network behavior—port-scan-like patterns characterized by high destination fan-out, SYN-dominant connections, and short-lived flows—through each evidence layer. For the scenario trace, we selected a short sub-interval within the 15 min window exhibiting an extreme tail in (i) per-source destination fan-out and (ii) SYN-dominant, short-lived connection patterns, as measured by our artifact definitions; this selection is used solely to illustrate representation-driven evidence loss and does not constitute attack attribution.
Figure 6 visualizes the 15 min window with the selected sub-interval highlighted, showing the same time period across L0, L1, and L2 representations.
At L0 (packet-level), a responder can observe fine-grained timing patterns (inter-packet intervals, burst dynamics) and flag-level detail (SYN ratios), enabling support for Reconnaissance, Discovery, and Initial Access tactics with high evidential strength. At L1 (flow-level), destination fan-out becomes directly computable per source, preserving the ability to support Lateral Movement inference while losing packet-level timing texture. At L2 (time-aggregated), destination fan-out is no longer computable because temporal aggregation removes per-source keyed structure; only bin-level aggregate counts (total flows, rates) remain. This collapse eliminates support for Lateral Movement while preserving support for Reconnaissance and Discovery via SYN patterns and protocol statistics.
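To make the collapse of keyed structure concrete, the following minimal Python sketch (record fields and values are hypothetical, chosen only for illustration) shows why per-source destination fan-out is computable from L1 flow records but not from L2 bin aggregates:

```python
from collections import defaultdict

# Hypothetical minimal records for each evidence layer.
# L1 flow records retain per-flow source/destination keys;
# L2 bins retain only aggregate counts per time bin.
l1_flows = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "syn_only": True},
    {"src": "10.0.0.1", "dst": "10.0.0.3", "syn_only": True},
    {"src": "10.0.0.1", "dst": "10.0.0.4", "syn_only": True},
    {"src": "10.0.0.9", "dst": "10.0.0.2", "syn_only": False},
]
l2_bins = [{"bin": 0, "total_flows": 4, "syn_flows": 3}]  # keyed structure gone

def fanout_l1(flows):
    """Per-source destination fan-out: computable at L1 (keys preserved)."""
    dests = defaultdict(set)
    for f in flows:
        dests[f["src"]].add(f["dst"])
    return {s: len(d) for s, d in dests.items()}

def fanout_l2(bins):
    """At L2 the per-source key has been aggregated away: not computable."""
    return None  # only bin-level aggregates (total_flows, syn_flows) remain
```

Here `fanout_l1(l1_flows)` recovers the fan-out of each source (3 for the scanning-like source, 1 for the other), whereas no function of `l2_bins` can, because the per-source key no longer exists in the representation.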
This scenario directly illustrates the logical progression from Table 6 to Table 7: the loss of destination fan-out at L2 (Table 6) eliminates support for Lateral Movement (Table 7), demonstrating that the “9 → 7 tactics” reduction is not uniform. Some tactics (e.g., Lateral Movement) become completely non-supportable at L2, while others (e.g., Reconnaissance) remain supportable but with reduced evidential strength due to loss of structural detail.
6.4. Comparative Analysis: Framework Positioning and Trade-Offs
To address concerns about comparative evaluation, we provide a systematic conceptual comparison of our audit framework with related approaches, highlighting advantages, disadvantages, and appropriate use cases. We note that traditional experimental comparisons (e.g., ROC-AUC, precision, recall) are not methodologically appropriate here because our framework evaluates representational limits of inference (a structural property), not detection performance (a measurable accuracy metric). Detection frameworks and audit frameworks serve fundamentally different purposes and cannot be meaningfully compared using shared experimental metrics. Instead, we provide a qualitative comparison based on framework characteristics, objectives, and use cases.
Table 9 compares our framework against representative works across key dimensions: evaluation objective, methodology, output, and operational applicability.
6.4.1. Comparison with Detection-Focused Frameworks
Detection frameworks (e.g., [6,7]) evaluate how accurately attacks can be identified across different data representations. These frameworks measure performance metrics (ROC-AUC, precision, recall) and optimize for detection effectiveness. In contrast, our audit framework evaluates which threat hypotheses can be logically supported given available evidence, independent of any specific detection algorithm.
Advantages of our approach:
Algorithm-agnostic: Results hold regardless of detection method (ML, rule-based, statistical).
Reveals fundamental limits: Identifies cases where inference is impossible due to evidence loss, not algorithm limitations.
No ground truth required: Structural analysis does not depend on labeled attack datasets.
Design-time guidance: Helps architects choose telemetry configurations before deployment.
Limitations relative to detection frameworks:
No detection capability: Does not identify attacks or provide alerts.
No performance metrics: Does not measure accuracy, false positive rates, or detection latency.
Requires interpretation: Coverage reports must be manually analyzed to inform decisions.
When to use each: Detection frameworks are appropriate for operational security systems requiring real-time threat identification. Our audit framework is appropriate for design-time evaluation, forensic readiness planning, and understanding fundamental inference boundaries before deploying detection systems.
6.4.2. Comparison with Telemetry System Frameworks
Telemetry system frameworks (e.g., [14,15,32]) optimize monitoring architectures for efficiency, throughput, and resource utilization. These frameworks focus on how to export telemetry data efficiently, while our framework evaluates what can be inferred from exported data.
Advantages of our approach:
Forensic coverage evaluation: Quantifies which threat hypotheses remain supportable, not just export efficiency.
Evidence-centric analysis: Reveals how abstraction affects inference capabilities, not just performance.
Design guidance: Provides criteria for choosing between telemetry configurations based on forensic requirements.
Limitations relative to telemetry frameworks:
No performance optimization: Does not address throughput, memory, or bandwidth constraints.
No operational deployment: Provides design-time analysis, not runtime telemetry export.
Static evaluation: Assumes fixed telemetry configurations, does not handle dynamic adaptation.
When to use each: Telemetry frameworks are essential for designing scalable monitoring systems under resource constraints. Our audit framework complements these by evaluating whether efficient telemetry configurations preserve required forensic coverage, enabling informed trade-off decisions.
6.4.3. Comparison with ATT&CK/D3FEND Modeling Frameworks
ATT&CK/D3FEND modeling frameworks (e.g., [38,39]) use these vocabularies for strategic threat/defense modeling, often with rich visibility (endpoint logs, payload access, domain context). Our framework uses ATT&CK/D3FEND under backbone-only constraints to evaluate supportability.
Advantages of our approach:
Backbone-constrained: Explicitly designed for partial observability (no payload, no endpoint context).
Supportability focus: Evaluates which hypotheses remain defensible, not strategic modeling.
Representation-driven: Links evidence abstraction directly to inference limits.
Limitations relative to modeling frameworks:
Narrower scope: Restricted to network-observable tactics, excludes endpoint-dependent behaviors.
No strategic recommendations: Does not provide game-theoretic or optimization-based defense strategies.
Tactic-level only: Avoids technique-level attribution that requires richer context.
When to use each: Modeling frameworks are appropriate for enterprise environments with rich visibility and strategic defense planning. Our framework is appropriate for ISP/backbone monitoring where visibility is constrained, and the focus is on understanding inference boundaries.
6.4.4. Synthesis: Complementary Roles
This comparative analysis demonstrates that frameworks serve complementary rather than competing roles. Detection frameworks optimize for accuracy, telemetry systems optimize for efficiency, and modeling frameworks optimize for strategic defense. Our audit framework fills a distinct gap by evaluating representational limits of inference under evidence abstraction.
The audit framework is most valuable when used before deploying detection systems or telemetry architectures, helping architects understand which threat hypotheses will remain supportable under chosen configurations. It does not replace detection systems or telemetry optimizers; rather, it provides a prerequisite analysis that informs their design and configuration.
6.7. Feasibility Analysis: Integration into SDN Controllers and P4 Pipelines
To address practical implementation concerns, we provide a detailed feasibility analysis for integrating the audit framework into real-world SDN controllers and P4-programmable switch pipelines. This analysis outlines architectural considerations, implementation challenges, and concrete integration strategies.
6.7.1. SDN Controller Integration Architecture
The audit framework can be integrated into SDN controllers (e.g., OpenDaylight, ONOS, Ryu) as a telemetry policy engine that evaluates and enforces forensic coverage requirements. The proposed architecture positions the audit framework as a middleware component between the controller’s northbound API and southbound telemetry collection.
Architecture Components:
Policy Definition Module: Allows operators to specify required ATT&CK tactic coverage (e.g., “preserve support for Lateral Movement and Execution tactics”) and acceptable abstraction levels.
Real-time Audit Engine: Evaluates current telemetry exports against policy requirements, computing artifact computability and inference coverage metrics.
Adaptive Telemetry Manager: Dynamically adjusts telemetry granularity (packet sampling rates, flow timeout settings, aggregation bin sizes) based on audit results and resource constraints.
Forensic Coverage Monitor: Continuously tracks which threat hypotheses remain supportable given the current telemetry configuration.
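The Policy Definition Module and Real-time Audit Engine can be sketched together as follows. The tactic-to-artifact mapping and per-layer artifact sets below are illustrative placeholders; in a real integration they would be populated from the framework's Tables 6 and 7:

```python
# Hypothetical tactic -> required-artifact mapping and per-layer computability.
# Real mappings would be derived from the framework's audit tables.
TACTIC_REQUIRES = {
    "Lateral Movement": {"dest_fanout"},
    "Execution": {"inter_packet_timing"},
    "Reconnaissance": {"syn_ratio"},
}
LAYER_ARTIFACTS = {
    "L0": {"dest_fanout", "inter_packet_timing", "syn_ratio"},
    "L1": {"dest_fanout", "syn_ratio"},
    "L2": {"syn_ratio"},
}

def audit(layer, required_tactics):
    """Return (supported, unsupported) tactic sets for a telemetry layer."""
    avail = LAYER_ARTIFACTS[layer]
    supported = {t for t in required_tactics if TACTIC_REQUIRES[t] <= avail}
    return supported, set(required_tactics) - supported

sup, unsup = audit("L2", ["Lateral Movement", "Reconnaissance"])
# At L2, Reconnaissance (SYN ratios) survives while Lateral Movement
# (requiring destination fan-out) is reported as unsupported.
```

An operator policy such as "preserve support for Lateral Movement" then reduces to checking that the tactic appears in the supported set for the deployed configuration.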
Implementation Challenges:
Challenge 1: Real-time Audit Computation Overhead. The audit framework must evaluate artifact computability across multiple evidence layers in real-time. For a network with thousands of switches, this could introduce significant computational overhead. Solution: Implement incremental audit updates that recompute metrics only when the telemetry configuration changes, rather than continuously re-evaluating all artifacts. Cache artifact computability results per switch/port configuration, invalidating only when relevant parameters (timeout settings, bin sizes) are modified.
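The caching solution for Challenge 1 can be sketched with standard memoization: because artifact computability depends only on the telemetry configuration, the expensive evaluation runs once per distinct parameter set. The configuration parameters and artifact names below are hypothetical:

```python
from functools import lru_cache

# Sketch of Challenge 1's solution: cache audit results per telemetry
# configuration; identical configurations never trigger recomputation.
@lru_cache(maxsize=None)
def artifact_computability(flow_timeout_s: int, bin_size_s: int,
                           packet_capture: bool) -> frozenset:
    """Structural evaluation, executed once per distinct configuration."""
    artifacts = {"syn_ratio"}               # assumed available at every layer
    if packet_capture:
        artifacts |= {"inter_packet_timing"}
    if flow_timeout_s > 0:
        artifacts |= {"dest_fanout"}
    return frozenset(artifacts)

a = artifact_computability(60, 10, False)   # computed (cache miss)
b = artifact_computability(60, 10, False)   # cache hit, no recomputation
c = artifact_computability(60, 10, True)    # changed parameter: recomputed
```

Invalidation corresponds to changing a parameter (a different key), which naturally forces recomputation for the new configuration only.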
Challenge 2: Controller-Switch Communication Latency. SDN controllers communicate with switches via OpenFlow or P4Runtime, introducing latency between policy decisions and telemetry reconfiguration. This delay could create windows where forensic coverage requirements are temporarily unmet. Solution: Implement predictive policy enforcement that pre-configures telemetry settings based on anticipated traffic patterns, and use asynchronous audit validation that tolerates brief coverage gaps during reconfiguration.
Challenge 3: Heterogeneous Switch Capabilities. Different switch models support varying telemetry features (e.g., some support packet mirroring, others only flow export). The audit framework must adapt to these constraints. Solution: Maintain a capability matrix per switch type, mapping available telemetry features to artifact computability. The audit engine queries this matrix to determine feasible coverage levels per device.
Challenge 4: Policy Conflict Resolution. Operators may specify conflicting requirements (e.g., “preserve Execution inference” while “minimize export bandwidth”). The framework must resolve these trade-offs. Solution: Implement a priority-based policy resolution system that ranks tactics by operational importance (e.g., Critical/High/Medium), and uses the audit framework to find minimal telemetry configurations that satisfy critical requirements first.
6.7.2. P4 Pipeline Integration Architecture
For P4-programmable switches, the audit framework informs both the compile-time telemetry schema design and the runtime selective export policies. Unlike SDN controllers, which operate in the control plane, P4 integration requires embedding audit logic directly into the data-plane pipeline.
Architecture Components:
Schema Generator: Uses audit framework output to generate P4 telemetry table definitions that preserve required artifacts (e.g., registers for per-source destination fan-out, timestamps for inter-packet intervals).
Selective Export Logic: Implements conditional telemetry export based on artifact requirements—only exporting detailed metrics when they enable critical tactic support.
Resource-Aware Binning: Dynamically adjusts temporal aggregation bin sizes based on available switch memory and required artifact preservation.
Implementation Challenges:
Challenge 1: P4 Memory Constraints. P4 switches have limited memory for stateful data structures (registers, meters). Storing per-source destination fan-out matrices or fine-grained timing data for all flows may exceed available resources. Solution: Implement approximate data structures (e.g., Count-Min sketches for fan-out estimation, sampled timing windows) that preserve artifact computability within memory bounds. The audit framework can be extended to evaluate artifact computability under approximation, trading exactness for feasibility.
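As an illustration of approximate fan-out estimation under a fixed memory budget, the sketch below uses a Bloom filter (one of the compact structures mentioned above) with the standard fill-ratio cardinality estimator. This is a software model only; a P4 deployment would realize the bit array as register state, and parameter choices (1024 bits, 3 hashes) are illustrative:

```python
import hashlib
import math

class BloomFanout:
    """Approximate distinct-destination count within a fixed memory budget.
    Software sketch; a P4 version would use register arrays in the pipeline."""
    def __init__(self, m_bits=1024, k=3):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits)   # one byte per bit, for simplicity

    def _positions(self, dst):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{dst}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, dst):
        for p in self._positions(dst):
            self.bits[p] = 1

    def estimate(self):
        """Fill-ratio estimator: n ~= -(m/k) * ln(1 - X/m), X = bits set."""
        x = sum(self.bits)
        if x == self.m:
            return float("inf")
        return -(self.m / self.k) * math.log(1 - x / self.m)

bf = BloomFanout()
for i in range(100):                      # 100 distinct destinations
    bf.add(f"10.0.{i // 256}.{i % 256}")
# bf.estimate() approximates 100 within the sketch's error bounds
```

This illustrates the trade the text describes: exact per-source destination sets become bounded-memory estimates, and the audit must then judge whether the approximate artifact still supports Lateral Movement inference.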
Challenge 2: Pipeline Stage Limitations. P4 pipelines have fixed processing stages, limiting where telemetry logic can be inserted. Complex artifact computations (e.g., temporal periodicity detection) may require multiple pipeline passes. Solution: Decompose artifact computation into pipeline-compatible primitives. For example, periodicity detection can be implemented using P4 registers that track flow counts per time bin, with periodicity analysis performed by a control plane agent that processes exported register snapshots.
Challenge 3: Export Bandwidth Constraints. High-speed switches generate massive telemetry volumes. Exporting detailed artifacts (e.g., per-packet timestamps) could saturate control plane links. Solution: Use the audit framework to identify minimal sufficient artifact sets. For example, if both “inter-arrival burstiness” and “inter-packet timing” support Execution, but only one is required, export only the less bandwidth-intensive artifact. Implement intelligent sampling that exports detailed metrics only for flows matching suspicious patterns (e.g., high fan-out, SYN-dominant).
Challenge 4: Compile-Time vs. Runtime Flexibility. P4 programs are compiled and loaded onto switches, limiting runtime reconfiguration. However, telemetry requirements may change based on threat intelligence. Solution: Design P4 pipelines with parameterized telemetry tables (e.g., configurable bin sizes, selectable export fields) that can be modified via P4Runtime without recompilation. The audit framework validates that parameter changes maintain required coverage.
6.7.3. Concrete Integration Example: P4 Switch with Hybrid Monitoring
To illustrate practical integration, we outline a concrete P4 implementation for a switch that must preserve support for Lateral Movement and Execution tactics while operating under memory constraints.
Requirements:
Preserve Lateral Movement inference (requires destination fan-out artifact).
Preserve Execution inference (requires inter-packet timing artifact).
Memory budget: 64 KB for telemetry state.
Export bandwidth: <1% of data plane throughput.
P4 Implementation Strategy:
1. Destination Fan-out Preservation: Use a P4 register array indexed by source IP hash, storing a compact representation of destination sets (e.g., Bloom filter or Count-Min sketch). Export register snapshots every 10 s, enabling per-source fan-out computation at the control plane.
2. Inter-packet Timing Preservation: For flows matching suspicious patterns (SYN-dominant, short-lived), enable packet-level timestamp export. Use P4 match-action tables to identify candidate flows, then mirror selected packets to a telemetry collector with full timing information.
3. Resource Management: Implement LRU eviction for fan-out registers when memory is exhausted, prioritizing high-activity sources. Use the audit framework to validate that eviction policies do not eliminate support for critical tactics.
Validation: The audit framework evaluates this configuration and confirms that (i) destination fan-out remains computable from exported register snapshots (L1-level support), and (ii) inter-packet timing is available for selected flows (L0-level support for Execution). The framework reports that Lateral Movement and Execution remain supportable under this hybrid configuration, while Persistence (requiring entity-linked periodicity) becomes non-supportable due to register eviction, which removes long-term state.
6.7.4. Scalability and Computational Overhead Analysis
To address concerns about scalability and computational overhead for large-scale, high-speed backbone traffic over extended periods, we provide a complexity- and pipeline-oriented analysis. We do not claim a universal benchmark (e.g., a fixed number of minutes per 15 min window), because runtime depends strongly on traffic volume, capture settings, hardware, and implementation details.
Audit Framework Overhead: The audit framework's core operation (checking artifact computability against a telemetry schema) is a lightweight, rule-based process that evaluates structural properties of evidence representations. The computationally intensive work (PCAP to L1/L2 conversion, flow record generation, time aggregation) is performed by standard tools (e.g., YAF [41]) as part of the normal telemetry pipeline deployment. The audit itself is a meta-analysis of the output schema, not a full traffic re-analysis. Once artifact computability is determined for a given telemetry configuration (L0/L1/L2), the audit results remain valid regardless of traffic volume or the length of the time period, making the framework highly scalable for design-time evaluation.
Computational Complexity (High Level):
The audit framework’s computational overhead is dominated by artifact computation and depends on the evidence product:
L0 (Packet-level): Processing scales linearly with the number of packets whose headers/timestamps are parsed and used for artifact computation (O(P) in the packet count P). This layer is the most computationally demanding and is typically infeasible to run continuously at line rate on high-speed backbones without dedicated capture infrastructure, sampling, or selective capture.
L1 (Flow-level): Processing scales with the number of exported flow records (O(F) in the flow count F). In operational settings, flow generation is performed by exporters/meters (e.g., IPFIX exporters), so the audit can be applied to already-exported flow logs, shifting the primary cost from packet processing to flow-log analytics.
L2 (Time-aggregated): Processing scales with the number of time bins and exported aggregate counters (O(B) in the number of bins/records B). Because aggregation substantially reduces volume, L2 audits are typically the least computationally intensive.
Scaling to Extended Time Periods:
For extended periods (hours, days, weeks), the framework can be applied using several strategies:
Strategy 1: Incremental Processing. The audit framework evaluates artifact computability, which is a structural property of the representation, not traffic volume. Once artifact computability is established for a telemetry schema (L0/L1/L2), it remains invariant regardless of time period length. Therefore, the audit can be performed once per telemetry configuration, not continuously over time. For extended periods, operators can do the following:
Run the audit on representative time windows (e.g., peak hours, off-peak hours).
Cache artifact computability results per telemetry configuration.
Re-run the audit only when telemetry schema changes (e.g., IPFIX template modifications, aggregation bin size adjustments).
Strategy 2: Sampling-Based Evaluation. For very large datasets, the audit can be applied to sampled subsets. Since artifact computability is representation-driven (not traffic-dependent), sampling does not affect the structural results. Operators can do the following:
Sample representative time windows (e.g., 1-h samples per day).
Apply the audit to sampled data to validate artifact computability.
Extrapolate results to the full time period, since computability is invariant to traffic volume.
Strategy 3: Distributed Processing. For multi-terabyte archives, artifact computation can be parallelized:
Partition data by time windows or network segments.
Compute artifacts independently per partition.
Aggregate results (artifact computability is identical across partitions for the same representation).
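A minimal sketch of Strategy 3 (partitioning scheme and artifact names assumed for illustration): partitions are audited in parallel, and because computability is a property of the representation rather than the traffic, the per-partition results must agree before merging:

```python
from concurrent.futures import ThreadPoolExecutor

def audit_partition(partition_id: int) -> frozenset:
    """Computability depends on the representation, not the partition's
    traffic, so every partition of the same L1 export yields the same set."""
    return frozenset({"dest_fanout", "syn_ratio"})   # assumed L1-style schema

def merged_audit(n_partitions: int) -> frozenset:
    """Audit partitions in parallel and merge the (identical) results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(audit_partition, range(n_partitions)))
    first = results[0]
    assert all(r == first for r in results)  # identical across partitions
    return first
```

In a real deployment, `audit_partition` would read that partition's exported schema; the agreement check then doubles as a sanity test that all partitions were exported under the same telemetry configuration.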
Memory and Storage Requirements (Qualitative):
Memory requirements depend on whether artifacts are computed in a streaming fashion or require maintaining per-entity state (e.g., per-source adjacency surrogates for fan-out, per-entity periodicity features). In practice, the audit can be implemented as a streaming pipeline over PCAP/flow logs, with bounded memory determined by the chosen artifact set and any required state (e.g., hash tables, sketches, sliding windows). L2 representations generally have the smallest storage and memory footprint because they are already aggregated.
High-Speed Traffic Considerations:
For high-speed backbone links, the primary bottleneck is typically data ingestion and storage at L0 (packet capture), not the audit logic itself:
Packet-level (L0): Continuous full-rate packet capture is often infeasible at backbone scale due to storage and processing constraints; in such settings, operators rely on sampling, selective capture, or derived products. Our audit can be applied to sampled or selectively captured packet traces to validate which packet-level artifacts are computable under the deployed capture policy.
Flow-level (L1): Flow exporters (e.g., IPFIX) handle high-speed traffic natively. The audit framework processes exported flow records that are already aggregated by the exporter, thereby avoiding packet-level bottlenecks.
Time-aggregated (L2): Pre-aggregated statistics from high-speed links are typically manageable volumes, enabling efficient audit processing.
Practical Recommendations:
For large-scale, extended-period deployments:
1. Design-time evaluation: Run the audit once per telemetry configuration during design/planning phases, not continuously during operation.
2. Configuration change triggers: Re-run the audit only when telemetry schemas change (IPFIX template updates, aggregation parameter modifications).
3. Representative sampling: For validation, apply the audit to representative time windows rather than processing entire archives.
4. Incremental processing: Use streaming algorithms for artifact computation to enable constant-memory processing of extended periods.
5. Caching: Cache artifact computability results per telemetry configuration, invalidating only when relevant parameters change.
Limitations:
The current implementation processes 15 min windows sequentially. For very large archives (months/years), full processing would require the following:
Significant computational resources (days/weeks of CPU time for packet-level processing).
Large storage capacity for intermediate results.
However, this is typically unnecessary since artifact computability is invariant to time period length once the telemetry schema is fixed.
Future work should develop optimized implementations using distributed processing, approximate algorithms for very large datasets, and hardware acceleration for packet-level processing on high-speed links.
6.7.5. Implementation Limitations and Future Work
Several limitations must be acknowledged for realistic deployment:
Platform-Specific Constraints: The feasibility analysis assumes standard P4 and SDN capabilities, but real deployments may face vendor-specific limitations (e.g., restricted register sizes and fixed export formats). Future work should develop platform-specific audit adapters that map framework outputs to vendor capabilities.
Dynamic Traffic Adaptation: The current framework evaluates static telemetry configurations. In practice, traffic patterns change, requiring dynamic reconfiguration. Future work should extend the framework to support online adaptation, continuously monitoring coverage metrics, and triggering reconfiguration when thresholds are breached.
Multi-Switch Coordination: Large networks require coordinated telemetry policies across multiple switches. The framework currently evaluates single-device configurations. Future work should develop distributed audit protocols that ensure network-wide coverage requirements are met while respecting per-device resource constraints.
Scalability for Extended Periods: While the framework’s computational complexity is manageable for design-time evaluation, processing multi-terabyte archives would require significant resources. However, since artifact computability is invariant to the length of the time period (once the telemetry schema is fixed), full archive processing is typically unnecessary. Future work should develop optimized implementations for very large-scale deployments.
This feasibility analysis demonstrates that while integration is non-trivial, the audit framework provides actionable guidance for real-world deployment, with concrete solutions to identified implementation challenges.
6.8. Handling Partial and Borderline Support Cases
While our framework evaluates artifact computability as a binary property (supportable vs. non-supportable), practitioners must often reason about cases where artifacts gradually deteriorate rather than completely vanish. This subsection provides structured guidance for interpreting and acting upon partial or borderline support cases.
6.8.1. Degradation Categories
We classify artifact degradation into three categories:
Complete Loss: The artifact cannot be computed from the representation due to fundamental information loss. For example, inter-packet timing patterns are completely lost at L1 because flow records do not contain per-packet timestamps. In such cases, the artifact provides no support for tactics that require it (e.g., Execution inference).
Precision Degradation: The artifact remains technically computable but with reduced precision or reliability. For example, entity-linked periodicity at L1 may be affected by flow timeout effects that split periodic patterns across multiple flow records. Practitioners must evaluate whether the degraded precision is sufficient for their forensic objectives.
Scope Reduction: The artifact remains computable but only for a reduced set of entities or time windows. For example, destination fan-out at L1 may be reliable only for sources whose connections are not split by flow timeouts, reducing the set of sources for which Lateral Movement inference is dependable. Practitioners must assess whether the reduced scope enables actionable forensic conclusions.
6.8.2. Decision Framework for Practitioners
We provide a structured decision framework that helps practitioners evaluate whether partial support is sufficient:
High-Stakes Scenarios: When forensic defensibility is critical (e.g., legal proceedings, compliance audits), practitioners should treat precision-degraded artifacts as non-supportable to maintain conservative reasoning. For example, if entity-linked periodicity at L1 is affected by flow timeout splitting, practitioners should not rely on it for Persistence inference in legal contexts.
Operational Monitoring: When the objective is operational threat detection rather than forensic reconstruction, precision-degraded artifacts may be acceptable if they enable actionable alerts. For example, destination fan-out at L1 may be sufficient for operational Lateral Movement detection even if flow timeouts reduce precision, as long as alerts trigger further investigation.
Hybrid Configurations: When partial support is insufficient, practitioners can design hybrid monitoring configurations that preserve critical artifacts at higher-fidelity layers. For example, selective packet capture (L0) for high-value targets while using flow records (L1) for general monitoring enables both scalable operation and forensic coverage for critical assets.
6.8.3. Degradation Indicators
We introduce qualitative degradation indicators that practitioners can use to assess artifact reliability:
Flow Timeout Effects: For artifacts that depend on connection continuity (e.g., entity-linked periodicity), practitioners should evaluate whether flow timeout settings cause significant splitting that degrades artifact reliability. Extended flow timeouts (e.g., 300 s) may preserve periodic patterns better than short timeouts (e.g., 15 s), but at the cost of increased state overhead.
Aggregation Window Effects: For time-aggregated artifacts, practitioners should assess whether aggregation windows are sufficiently fine-grained to preserve temporal patterns of interest. For example, 1-s bins may preserve Execution-related timing patterns better than 60-s bins, but at the cost of increased storage.
Sampling Effects: When sampling is used (e.g., packet sampling at L0), practitioners should evaluate whether the sampling rate provides sufficient coverage for artifact computation. For example, 1:100 packet sampling may preserve rate-based artifacts but may miss low-volume timing patterns required for Execution inference.
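The flow timeout effect above can be demonstrated with a small simulation. The sketch below (exporter behavior simplified to an idle-timeout rule) shows how a periodic beacon-like connection is shattered by a short timeout but preserved by an extended one:

```python
def split_into_flows(packet_times, idle_timeout_s):
    """Group packet timestamps into flow records using an idle timeout,
    mimicking how a flow exporter splits long-lived connections."""
    flows, current = [], None
    for t in sorted(packet_times):
        if current is None or t - current[-1] > idle_timeout_s:
            current = [t]          # idle gap exceeded: start a new flow record
            flows.append(current)
        else:
            current.append(t)      # same flow record continues
    return flows

beacon = [i * 30 for i in range(10)]    # one packet every 30 s (periodic pattern)
short = split_into_flows(beacon, 15)    # 15 s idle timeout
long_ = split_into_flows(beacon, 300)   # 300 s idle timeout
# The short timeout shatters the periodic pattern into 10 single-packet flows;
# the extended timeout preserves it as one flow record.
```

With the 15 s timeout every 30 s gap starts a new record, so the entity-linked periodicity artifact degrades exactly as described; the 300 s timeout keeps the pattern within a single flow at the cost of longer-held exporter state.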
6.8.4. Practical Examples
Example 1: Lateral Movement Inference with Flow Timeout Effects
Consider a scenario where practitioners must evaluate whether L1 flow records provide sufficient support for Lateral Movement inference when destination fan-out may be affected by flow timeout splitting. The decision framework suggests the following:
High-Stakes: If forensic defensibility is critical, treat destination fan-out at L1 as non-supportable due to potential timeout effects, and require L0 packet capture or extended flow timeouts.
Operational: If the objective is operational detection, destination fan-out at L1 may be sufficient if flow timeouts are configured appropriately (e.g., 60-s active timeout) and alerts trigger further investigation.
Hybrid: Use L1 for general monitoring with selective L0 packet capture for suspicious flows, enabling both scalable operation and forensic coverage.
Example 2: Persistence Detection with Entity-Linked Periodicity
Consider a scenario where practitioners must determine whether L2 time-aggregated statistics enable Persistence detection when entity-linked periodicity is lost but aggregate periodicity remains. The decision framework suggests the following:
High-Stakes: Entity-linked periodicity is required for Persistence inference; aggregate periodicity alone is insufficient. Practitioners should not rely on L2 for Persistence inference in legal contexts.
Operational: Aggregate periodicity at L2 may enable operational detection of periodic traffic patterns, but cannot attribute periodicity to specific entities. Practitioners should use L2 for initial alerts and require L1 or L0 for entity attribution.
Hybrid: Use L2 for general monitoring with selective L1 flow records for entities exhibiting periodic patterns, enabling both scalable operation and entity-level attribution.
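The distinction in Example 2 between aggregate and entity-linked periodicity can be sketched numerically: a simple autocorrelation over L2 bin counts (illustrative data; a real detector would be more robust) reveals the period, but nothing in the aggregated series identifies which entity produced it:

```python
def autocorr(x, lag):
    """Normalized autocorrelation of a bin-count series at a given lag."""
    n = len(x)
    mu = sum(x) / n
    den = sum((v - mu) ** 2 for v in x)
    num = sum((x[i] - mu) * (x[i + lag] - mu) for i in range(n - lag))
    return num / den

# L2 view: per-bin total flow counts with no per-source keys. A beacon firing
# every 4th bin produces clearly visible aggregate periodicity...
bin_counts = [5, 0, 0, 0] * 16
peak = autocorr(bin_counts, 4)   # strong correlation at the beacon period
off = autocorr(bin_counts, 1)    # weak/negative correlation off-period
# ...but nothing in bin_counts says WHICH source is periodic: entity
# attribution requires L1 flow keys or L0 packets.
```

This is the operational pattern recommended above: L2 autocorrelation triggers the initial alert, and L1/L0 data is then pulled for the implicated window to attribute the periodicity to an entity.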
This structured guidance enables practitioners to make informed decisions about partial support cases while maintaining the framework’s focus on structural computability as the primary evaluation criterion.
6.9. Practical Ramifications and Telemetry Design Trade-Offs
While our results are well-interpreted at a high level, practitioners require detailed guidance on concrete trade-offs when designing telemetry systems. This subsection provides actionable analysis of practical ramifications and design decision-making.
6.9.1. Concrete Design Trade-Off Analysis
Trade-off 1: Storage vs. Forensic Coverage
Scenario: An ISP must choose between L0 packet capture (high storage cost, full forensic coverage) vs. L1 flow records (low storage cost, reduced forensic coverage).
Trade-off Analysis:
L0: Provides support for 9 ATT&CK tactics but requires substantially higher storage compared to L1 (packet headers vs. flow summaries). The storage ratio depends on traffic characteristics, but packet-level capture typically requires orders of magnitude more storage than flow records.
L1: Provides support for 8 ATT&CK tactics but loses Execution inference capability (requires packet-level timing).
L2: Provides support for 7 ATT&CK tactics but enables flow-based defensive techniques not available at L0 (e.g., Flow-based Monitoring).
Practical Recommendation: For resource-constrained switches, prioritize L1 with extended flow timeouts (e.g., 300 s) to preserve entity-linked artifacts while avoiding per-packet overhead. Use approximate data structures (e.g., Count-Min sketches) for high-cardinality artifacts (e.g., destination fan-out) to manage memory constraints.
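The approximate-counting part of this recommendation can be sketched as a minimal Count-Min sketch for per-source frequency estimation. The parameters and hash construction below are illustrative; in practice, estimating *distinct* destination fan-out typically pairs such a frequency sketch with a per-source cardinality sketch (e.g., HyperLogLog), and the code shows only the frequency component:

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min sketch: estimates counts with bounded over-estimation."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, key):
        # One salted hash per row; collisions in one row are corrected by min().
        for row in range(self.depth):
            h = hashlib.blake2b(key.encode(), digest_size=8,
                                salt=bytes([row])).digest()
            yield row, int.from_bytes(h, "big") % self.width

    def add(self, key, count=1):
        for row, col in self._indexes(key):
            self.table[row][col] += count

    def estimate(self, key):
        # The minimum across rows bounds the over-estimation error.
        return min(self.table[row][col] for row, col in self._indexes(key))
```

Memory is fixed at width × depth counters regardless of the number of sources, which is what makes the structure attractive on resource-constrained switches.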
Trade-off 2: Scalability vs. Defensive Coverage
Scenario: An enterprise network must choose a monitoring architecture for 10 Gbps links with varying security requirements.
Trade-off Analysis:
L0: Enables full defensive coverage (9 tactics) but may not scale to high-speed links without sampling. Continuous full-rate packet capture at 10 Gbps requires specialized hardware and significant storage.
L1: Scales to high-speed links but loses Execution inference and some behavioral monitoring capabilities. Flow exporters handle 10 Gbps natively with standard hardware.
L2: Scales best but loses entity-linked periodicity and destination fan-out (affecting Persistence and Lateral Movement). Pre-aggregated statistics are manageable even at 100+ Gbps.
Practical Recommendation: Use L1 as a baseline with selective L0 sampling for suspicious flows, enabling scalable operation while preserving critical forensic capabilities. For example, use L1 flow records for all traffic and trigger L0 packet capture (1:100 sampling or selective mirroring) for flows matching threat intelligence indicators or anomaly detection alerts.
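A minimal sketch of such an escalation policy follows. All indicator values, thresholds, and parameter names are illustrative assumptions, not taken from a real deployment:

```python
# Hypothetical policy inputs (illustrative values only).
THREAT_INDICATORS = {"203.0.113.7", "198.51.100.23"}  # example IOC destinations
ANOMALY_THRESHOLD = 0.9                               # detector escalation cut-off
SAMPLE_EVERY = 100                                    # baseline 1:100 sampling

def should_capture_l0(flow_id: int, dst_ip: str, anomaly_score: float) -> bool:
    """Decide whether a flow seen at L1 should trigger L0 packet capture."""
    if dst_ip in THREAT_INDICATORS:         # threat-intelligence match
        return True
    if anomaly_score >= ANOMALY_THRESHOLD:  # anomaly-detection alert
        return True
    return flow_id % SAMPLE_EVERY == 0      # fallback 1:100 sampling
```

The point of the sketch is the ordering: deterministic triggers (indicators, alerts) take precedence, with sampling preserving a statistical baseline of L0 evidence for flows that match neither.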
6.9.2. Non-Monotonic Behavior Implications
The non-monotonic behavior of defensive coverage across layers (D3FEND applicability increases at L1 before decreasing at L2) has important practical implications:
Why D3FEND Applicability Increases at L1: Flow-based monitoring techniques become enabled at L1 (requiring flow records), partially compensating for the loss of packet-level behavioral monitoring. For example, Flow-based Monitoring enables defensive techniques (e.g., flow-based rate limiting, connection throttling) that are not applicable at L0, where only packet-level behavioral monitoring is available.
Practical Implication: Practitioners should not assume that more abstract telemetry always reduces defensive capabilities. Instead, they should evaluate which defensive techniques are enabled/disabled at each layer. For example, L1 may be preferable to L0 for certain defensive objectives (e.g., flow-based DDoS mitigation) even though it loses Execution inference capability.
Design Guidance: When designing hybrid monitoring systems, practitioners can strategically combine L0, L1, and L2 to maximize defensive coverage by enabling complementary defensive techniques. For example, use L0 for behavioral monitoring (Execution, fine-grained anomaly detection) and L1 for flow-based defensive techniques (rate limiting, connection throttling), enabling both capabilities simultaneously.
6.9.3. Concrete Telemetry Configuration Examples
Example 1: ISP Backbone Monitoring
Constraints: 100 Gbps links, 30-day retention, privacy regulations limit payload access.
Audit Application: Evaluate L1 vs. L2 for different network segments.
Decision: Use L1 for general monitoring (preserves 8 tactics and enables flow-based defense), and L2 for long-term archival (reduces storage and preserves 7 tactics). For example, retain L1 flow records for 7 days (for operational monitoring) and L2 time-aggregated statistics for 30 days (for long-term analysis).
Risk Assessment: Loss of Execution inference at both L1 and L2 is acceptable for the ISP context, as endpoint monitoring (outside ISP scope) handles Execution detection. The ISP’s primary forensic objectives (Lateral Movement, Command and Control, Exfiltration) remain supportable at L1.
Example 2: Enterprise Network Security Operations Center (SOC)
Constraints: 10 Gbps links, real-time threat detection required, forensic reconstruction needed for incident response.
Audit Application: Evaluate hybrid L0 + L1 configuration.
Decision: Use L1 for real-time monitoring (scalable, enables 8 tactics) with selective L0 packet capture for suspicious flows (enables Execution inference, preserves full forensic coverage). For example, use L1 flow records for all traffic and trigger L0 packet capture (selective mirroring) for flows matching threat intelligence indicators or anomaly detection alerts.
Risk Assessment: Hybrid configuration balances scalability with forensic coverage, enabling both operational detection and post-incident reconstruction. The SOC can investigate Execution-related incidents using L0 packet captures while maintaining scalable L1 monitoring for general traffic.
Example 3: Cloud Provider Network Monitoring
Constraints: Multi-tenant environment, varying security requirements, programmable data plane (P4) available.
Audit Application: Evaluate P4 telemetry export schema design.
Decision: Design a parameterized P4 pipeline that can export L0, L1, or L2 based on tenant security requirements, enabling dynamic telemetry configuration. For example, high-security tenants can request L0 packet capture (full forensic coverage), standard tenants receive L1 flow records (balanced coverage and cost), and low-security tenants receive L2 time-aggregated statistics (cost-optimized).
Risk Assessment: Parameterized design enables tenants to choose the telemetry abstraction level based on their security needs and cost constraints. The cloud provider can offer tiered monitoring services (Premium: L0, Standard: L1, Basic: L2) with corresponding pricing and forensic coverage.
6.9.4. Cost-Benefit Framework
We provide a structured framework for practitioners to evaluate telemetry design decisions:
Cost Dimensions:
Storage: Data retention costs (L0: high, L1: moderate, L2: low).
Processing Overhead: Computational requirements (L0: high, L1: moderate, L2: low).
Export Bandwidth: Network overhead for telemetry export (L0: high, L1: moderate, L2: low).
Implementation Complexity: Development and maintenance effort (L0: high, L1: moderate, L2: low).
Benefit Dimensions:
Forensic Coverage: Number of supportable ATT&CK tactics (L0: 9, L1: 8, L2: 7).
Defensive Applicability: Number of enabled D3FEND techniques (L0: 7, L1: 8, L2: 7).
Operational Capabilities: Real-time detection, incident response, and forensic reconstruction.
Decision Matrix: Practitioners can use this framework to evaluate trade-offs between cost and benefit dimensions for different telemetry configurations, enabling data-driven design decisions informed by organizational priorities and constraints.
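A minimal instantiation of this decision matrix is sketched below. The 1–3 cost scores and the equal default weights are assumptions for illustration; the tactic and technique counts are those reported in our results:

```python
# Illustrative cost scores (1 = low, 3 = high); benefit counts from our results.
LAYERS = {
    "L0": {"cost": {"storage": 3, "processing": 3, "export": 3, "complexity": 3},
           "benefit": {"attack_tactics": 9, "d3fend_techniques": 7}},
    "L1": {"cost": {"storage": 2, "processing": 2, "export": 2, "complexity": 2},
           "benefit": {"attack_tactics": 8, "d3fend_techniques": 8}},
    "L2": {"cost": {"storage": 1, "processing": 1, "export": 1, "complexity": 1},
           "benefit": {"attack_tactics": 7, "d3fend_techniques": 7}},
}

def score(layer: str, cost_weight: float = 1.0, benefit_weight: float = 1.0) -> float:
    """Weighted benefit-minus-cost score for one telemetry layer."""
    c = sum(LAYERS[layer]["cost"].values())
    b = sum(LAYERS[layer]["benefit"].values())
    return benefit_weight * b - cost_weight * c

best = max(LAYERS, key=score)
print({layer: score(layer) for layer in LAYERS}, "->", best)
```

Organizational priorities enter through the weights: a forensics-driven SOC would raise `benefit_weight`, while a storage-constrained ISP would raise `cost_weight`, and the ranking of L0/L1/L2 shifts accordingly.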
6.9.5. Integration with Existing Systems
IPFIX Exporter Configuration: Practitioners can configure flow timeout settings to preserve entity-linked artifacts while managing state overhead. For example, extended active timeouts (300 s) preserve periodic patterns better than short timeouts (15 s), but at the cost of increased flow state memory. The audit framework can evaluate which timeout settings preserve required artifacts for specific forensic objectives.
Time-Aggregation Policy: Practitioners can choose aggregation windows that balance storage reduction with temporal pattern preservation. For example, 1-s bins preserve Execution-related timing patterns better than 60-s bins, but at the cost of increased storage. The audit framework can evaluate which aggregation windows preserve required artifacts for specific forensic objectives.
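The effect of bin width on temporal pattern preservation can be illustrated with a synthetic 2 s-period beacon (timestamps are illustrative, not from our trace):

```python
# A beacon emitting one event every 2 s, observed for 10 minutes.
events = list(range(0, 600, 2))

def bin_counts(times, width, horizon=600):
    """Aggregate event times into fixed-width bins (L2-style aggregation)."""
    bins = [0] * (horizon // width)
    for t in times:
        bins[t // width] += 1
    return bins

fine = bin_counts(events, 1)    # alternating 1,0,1,0,... -> the period is visible
coarse = bin_counts(events, 60) # flat 30,30,30,... -> the period is destroyed
print(fine[:6], coarse[:3])
```

With 1 s bins, the alternating counts retain the 2 s period as a detectable signal; with 60 s bins, every bin holds the same total and the periodicity is irrecoverable, at a 60-fold storage reduction.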
Selective Capture Strategies: Practitioners can design sampling or selective capture policies that preserve critical artifacts while managing resource constraints. For example, 1:100 packet sampling may preserve rate-based artifacts but may miss low-volume timing patterns. The audit framework can evaluate which sampling strategies preserve required artifacts for specific forensic objectives.
This expanded discussion provides concrete, actionable guidance that practitioners can directly apply to their telemetry design decisions, moving beyond high-level interpretation to practical implementation strategies.
6.10. Limitations and Future Work
This study intentionally focused on backbone-level passive observation without payload visibility, endpoint context, or ground truth. While this reflects common operational constraints, several limitations warrant future investigation.
Artifact Catalog Scope: The 13 artifacts examined in this study are representative rather than exhaustive, designed to demonstrate the framework’s methodology and enable evaluation of all network-observable ATT&CK tactics. The framework is extensible: the mapping method is general and can accommodate additional artifacts (e.g., DNS query patterns, TLS handshake features, packet size distributions) as they are identified in future work. The current instantiation using 13 artifacts provides sufficient coverage to demonstrate selective and structural inference loss patterns, but practitioners may extend the catalog based on domain-specific requirements.
Other Limitations: First, the artifact catalog is currently static; future work could develop adaptive artifact definitions that adjust based on traffic characteristics or threat intelligence. Second, the analysis assumes deterministic artifact computation; in practice, measurement noise, sampling, and flow timeout variations introduce uncertainty that should be quantified. Third, the framework evaluates artifact computability but not evidential strength; future work could develop probabilistic models that quantify confidence in tactic support given noisy or partial observations.
The detailed feasibility analysis provided in Section 6 (Feasibility Analysis: Integration into SDN Controllers and P4 Pipelines) addresses implementation challenges and concrete integration strategies for real-world deployment.