Article

Impact of Optimization Goal Visibility on Inter-Cloud DTM Performance

1
Institute of Telecommunications and Cybersecurity, AGH University of Krakow, 30-059 Krakow, Poland
2
Institute of Applied Computer Science and Mark Kac Center for Complex Systems Research, Jagiellonian University, 30-348 Krakow, Poland
3
Akamai Technologies, Inc., 31-323 Krakow, Poland
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(8), 1576; https://doi.org/10.3390/electronics15081576
Submission received: 8 March 2026 / Revised: 7 April 2026 / Accepted: 8 April 2026 / Published: 9 April 2026

Abstract

This work presents an enhancement to the Dynamic Traffic Management (DTM) framework aimed at reducing signaling overhead between SDN controllers in multi-domain cloud environments. This extension is based on the ability to transmit information regarding the amount of balanced traffic and the optimal transfer pattern. In the baseline periodic mode, the system regularly exchanges the compensation vector (C) and the reference pattern (R). To minimize communication, we define non-periodic modes that restrict C updates and eliminate R transmission entirely. Within these restricted signaling modes, we further distinguish between reactive and proactive operational schemes. Our experimental results demonstrate that reducing the visibility of optimization goals (withholding R and transmitting only the sign of C) and cutting signaling frequency in this manner maintains a comparable level of cost-efficiency. Specifically, the initial evaluation shows that DTM typically decreases transit costs by 8% to 15%, with maximum savings reaching up to 29% when compared to the worst-case default BGP path scenario. These findings suggest that the DTM mechanism can maintain its economic efficiency even with significantly reduced inter-domain coordination.

1. Introduction

The management of inter-domain traffic transit costs has become a cornerstone of Operational Expenditures (OpEx) for modern network operators. As cloud services become increasingly distributed, the traffic exchanged between remote data centers (DCs) via Internet Service Providers (ISPs) continues to grow in both volume and complexity. Unlike traditional intra-domain routing, inter-domain transit is governed by diverse and often non-linear billing models. The most prevalent among these are volume-based tariffs, where costs are proportional to the total data transferred, and 95th percentile-based billing, which penalizes short-lived but high-magnitude traffic spikes.
To address these financial challenges, the Dynamic Traffic Management (DTM) framework was proposed [1]. DTM is an SDN-based architecture that performs real-time load balancing across multiple inter-domain links. By distinguishing between “manageable” cloud originated traffic and “non-manageable” background traffic, the mechanism can dynamically shift flows to the most cost-effective path. While the foundational principles of DTM have been shown to reduce transit costs significantly, the mechanism traditionally relies on a high degree of coordination between administrative domains.
The effectiveness of DTM is inherently related to the signaling process between the domain performing the optimization (the receiving ISP) and the domain where the traffic is originated (the sending ISP). In its baseline configuration, the remote controller requires full visibility of the optimization target—represented by a reference vector—to execute precise traffic steering. However, in real-world deployments, maintaining such a continuous and detailed signaling exchange introduces overhead and raises concerns regarding the scalability of the control plane.
This article investigates a critical, yet previously unexplored dimension of DTM: the impact of optimization goal visibility on system performance. We propose an evolution of the DTM signaling protocol that allows for a reduction in message frequency and a simplification of the remote control logic. Specifically, we evaluate new operational modes—Reactive and Proactive—where the remote domain operates with limited or no knowledge of the global optimization pattern. We examine whether DTM can maintain its cost-saving capabilities for both volume and 95th percentile billing schemes when signaling is reduced to its most basic form.
This article extends the foundational DTM framework described in [1] by introducing several novel features designed to optimize the control plane and investigate the impact of informational exchange on system efficiency. The key contributions of this work include:
  • Analysis of Optimization Goal Visibility: We define and evaluate operational modes that vary the degree of information shared between domains, specifically investigating whether the remote domain requires knowledge of the global optimization target (the reference vector) to achieve cost-optimal results.
  • Adaptive Signaling and Overhead Reduction: We introduce a sign-based reporting logic that minimizes inter-domain control traffic. By triggering updates only upon a change in the compensation direction rather than relying on strict periodic reporting, we significantly reduce the signaling footprint of the mechanism.
  • Control Plane Scalability via Proactive Steering: We propose a proactive operational mode that allows the management of large volumes of manageable traffic using a single flow-table entry. This approach enhances the scalability of the solution in hardware-constrained SDN environments by eliminating the need for per-flow state maintenance.
  • Dual-Tariff Performance Verification: We provide a focused comparative analysis of these new operational modes under both volume-based and 95th percentile billing schemes, evaluating the feasibility of reduced-visibility steering against default BGP path baselines in various economic scenarios.
While the previous study in [1] established the DTM system architecture and the mathematical mechanisms for inter-domain cost optimization, this manuscript focuses on overcoming the signaling overhead challenges inherent in practical inter-domain deployments. Specifically, we investigate the impact of information visibility on control-plane efficiency. Our work moves beyond simple engineering adjustments by proposing a decoupled signaling architecture that ensures economic efficiency even under limited goal visibility and hardware constraints (proactive mode—limiting Ternary Content-Addressable Memory (TCAM) occupancy).
The rest of the paper is organized as follows. We discuss other similar solutions in Section 2. In Section 3, we explain the basic DTM operation and describe the new enhancements in detail. Section 4 covers the testbed setup, the metrics used for evaluation, and a detailed analysis of our experimental results. The paper is concluded in Section 5.

2. Related Work

The rising volume of inter-cloud traffic, driven by the expansion of distributed DC architectures, has made wide area network (WAN) interconnection a critical factor for online service providers like Google, Amazon, and Microsoft [2,3,4]. For these entities, as well as for ISPs, operational expenditures are heavily influenced by link utilization and the specific billing models applied to inter-domain transit [5,6].
Current industry-leading Software-Defined Wide Area Network (SD-WAN) solutions, such as Google B4 [7,8] and Microsoft SWAN [9], utilize SDN to centralize control over the network edge. These frameworks often rely on dedicated private backbones and multipath forwarding to achieve near-maximal bandwidth utilization. Microsoft’s recent OneWAN [10] further advances this by unifying separate WAN infrastructures through MPLS-based control to reduce routing table overhead. While these systems are highly efficient for private infrastructures, DTM distinguishes itself by focusing on cost optimization across distinct administrative domains, utilizing GRE or MPLS tunneling to manage traffic without requiring full control over the transit provider’s internal routing [11].
Cost optimization strategies in the literature often target specific pricing or traffic types. For instance, the COIN mechanism [12] employs a mathematical model to solve NP-hard optimization problems for hybrid pricing schemes. In contrast, DTM is specifically engineered for standard ISP billing practices, namely 95th percentile and volume-based tariffs. Similarly, while CASCARA [13] balances operator costs against latency for outbound cloud traffic, DTM offers a more flexible approach by managing both inbound and outbound flows. The latter is particularly complex as it necessitates active signaling and coordination between communicating domains—a core focus of our current study regarding information visibility.
Other research focuses on resource placement and bandwidth guarantees. Gu et al. [14] proposed a mixed-integer linear programming model for optimal VM placement and load balancing to meet Service Level Agreements (SLAs). In a related vein, the work in [15] introduces a distributed algorithm to decompose bandwidth guarantee problems into scalable sub-problems. While these solutions are effective for resource allocation, DTM operates as a reactive mechanism that manages existing traffic based on real-time measurements rather than pre-allocation or VM migration.
A significant distinction exists between spatial and temporal optimization. Solutions such as Grandet [16], Sparse SnF [17], and NetStitcher [18] exploit the delay tolerance of certain data transfers by scheduling them during non-peak hours (time-domain optimization). DTM, conversely, operates in the spatial domain, shifting “manageable” traffic across available inter-domain links in real time to avoid breaching tariff thresholds.
Finally, while mechanisms like CONA [19] and Gemini [20] prioritize congestion avoidance and throughput maximization through TCP window modulation or buffer management, DTM’s primary objective remains the monetary optimization of transit costs. By reacting to traffic spikes across cooperating administrative domains, DTM provides a unique layer of management that complements traditional congestion control and scheduling-based transmission schemes [21].

3. Mechanism Description

The DTM mechanism is designed as a reactive control loop that redistributes traffic across multi-homed inter-domain links to minimize traffic transfer costs. The operation and theoretical foundations of DTM, particularly its ability to adapt to varying network conditions, have been established in [1]. The mechanism is cloud-agnostic and functions by steering tunneled traffic based on real-time telemetry.

3.1. Overview of the DTM Mechanism

The DTM framework is designed to manage OpEx for multi-homed ISP architectures by strategically distributing traffic across available inter-domain links. As analyzed in [1], the mechanism is highly versatile, supporting cost optimization for both volume-based and 95th percentile-based charging schemes. Given the inherent complexity of managing inbound traffic, DTM relies on a cooperative signaling model as illustrated in Figure 1. In this framework, the receiving domain (ISP-C) is responsible for telemetry collection and the execution of the optimization logic, while the sending domain (ISP-B) performs the actual traffic steering. The control plane interactions are delineated by the grey dashed lines in the figure: at the receiver site, these represent the monitoring of Border Gateway (BG) routers, whereas at the sender site, they indicate the interface for flow-rule installation and telemetry at the Data center Attachment point (DA-B).
Load Balancing Infrastructure. To enable traffic redistribution, the ISP must maintain at least two inter-domain links. DTM focuses on balancing inbound traffic aggregates to ensure that the cumulative transfer cost remains minimal. The decision-making process occurs in the domain receiving the data, but the actual shifting of flows is performed at the traffic source.
Traffic Categorization. Data arriving at the inter-domain interfaces is split into two classes: manageable and non-manageable (background) traffic. Manageable traffic typically comprises cloud-to-cloud flows that can be encapsulated and redirected.
Tunnel-based Traffic Steering. Intersite communication is handled via tunnels (e.g., GRE, MPLS) established between the communicating DCs. Each tunnel is logically mapped to a specific inter-domain link. By attaching these tunnels to load-balancing points (such as the DA-B router in Figure 1), the system can shift traffic before it reaches the transit links.
State Representation via Vectors. During each billing cycle, the system periodically monitors the total traffic volume X and the manageable component Z on the Border Gateway (BG) routers. Each vector component represents the traffic volume measured on a specific link, providing a real-time snapshot of the network’s utilization.
Optimization Target (R). The mechanism aims to achieve a target traffic distribution, represented by the reference vector R, which corresponds to the predicted minimal total cost for the current billing period. This prediction is performed by analyzing the traffic statistics and distribution patterns observed during the preceding billing period. The calculation of R is adaptable to the link’s tariff function; it can optimize for simple cumulative volume or for the 95th percentile samples, depending on the operator’s contracts.
The calculation of the reference vector R constitutes the core of the cost-minimization logic. A comprehensive mathematical formulation of this procedure—including both a custom heuristic algorithm and a formal Linear Programming (LP) model for both volume-based and 95th percentile tariffs—is beyond the scope of this paper. Interested readers are referred to Section II.E.1 of our foundational work in [1] for the complete, step-by-step derivation of the optimization target. In this manuscript, we assume R as a given input provided by the optimization engine, focusing instead on the signaling mechanism.
Dynamic Compensation (C). To maintain the desired trajectory towards the optimal state R, the system calculates a compensation vector C at short intervals (e.g., every 30 s). This vector identifies the deviation of current traffic X from the target. Based on C, new manageable flows are assigned to less utilized links, effectively counteracting unpredictable spikes in background traffic and ensuring the final cost remains within the assumed budget.
The compensation vector C(t) serves as the real-time control signal, dynamically reflecting the deviation of the current traffic state X(t) from the optimal trajectory defined by R. The precise mathematical formulas, along with the specific counter-readout and polling procedures used to compute C(t) at short intervals for different billing tariffs, are detailed in Section II.E.3 of [1]. The research presented in this article focuses entirely on optimizing the inter-domain signaling of this vector. Our results indicate that transmitting only the sign of the C(t) components, rather than their full magnitude, is sufficient to manage traffic efficiently.
Principles of the DTM load-balancing process. The operational logic of the steering process is illustrated in Figure 2. The SDN controller continuously monitors the current load $X_i$ on each inter-domain link. When the manageable traffic exceeds the reference threshold (e.g., $R_2$ on Link $L_2$), the mechanism utilizes the compensation vector C to identify and redirect the excess volume to an underutilized link (e.g., Link $L_1$). This real-time adjustment ensures that the global traffic distribution remains aligned with the overall optimization target R.

3.2. Operational Modes and Signaling Paradigms

The core innovation of this work lies in how the information contained in vectors R and C is shared between domains. We define three distinct modes that vary in the “visibility” of the optimization goal. To provide a clear overview of the distributed execution logic, the detailed logical flow of the inter-domain signaling and steering process is depicted in the flowchart in Figure 3. This diagram illustrates the event-driven control loop, starting from telemetry collection in the receiving domain to the final flow table updates in the sending domain, highlighting the conditions under which signaling updates are triggered.

3.2.1. Reactive with Full Reference Visibility

In this mode, the SDN controller in the optimizing domain periodically transmits the full state—both the compensation vector C and the reference vector R —to the remote domain (e.g., every 30 s). The remote SDN controller uses the magnitude of C to determine exactly how much manageable traffic must be steered toward a specific link. Once the measured volume of traffic redirected via the selected tunnel reaches the value specified in the compensation vector, the remote controller switches to a balancing scheme derived directly from the components of the reference vector R . While this provides the highest precision for 95th percentile optimization, it requires the remote controller to perform continuous per-tunnel traffic measurements and increases signaling overhead.
To clarify the control-plane operations within the source domain, Algorithm 1 details the event-driven logic of the SDN controller in the reactive with reference mode. The process is divided into four asynchronous handlers to ensure high-throughput processing. When new optimization targets are signaled from the receiving domain (Event update_C_R), the controller initializes the parameters for the current billing period. Upon receiving a new C (Event update_C_R or update_C), the controller activates the compensation flag. A background timer task (Event update_counters) continuously evaluates the accumulated traffic against the compensation constraint (line 23 in the algorithm). Once the required volume is shifted, the system exits the compensation phase and falls back to a proportional load-balancing scheme based on the reference vector R. New flow requests (Event Packet-In) do not block the controller; they are simply assigned to the active tunnel determined by the ongoing evaluation loop.
Algorithm 1 DTM Flow Steering and Compensation Logic (Reactive with Reference)
Require: Reference vector R = [R_1, R_2], Compensation vector C = [C_1, C_2]
 1: Event update_C_R: Upon receiving signalling updates (R, C) from ISP-C at the billing period start
 2:     r[1] ← R_1 / (R_1 + R_2)                ▹ Target proportion for Link 1
 3:     cnt_start[1] ← ReadCounter(1)
 4:     cnt_start[2] ← ReadCounter(2)
 5:     compensate ← false
 6:     tunnel ← 1

 7: Event update_C_R or update_C: Upon receiving signalling update C from ISP-C
 8: procedure UpdateOptimizationTarget
 9:     if C_1 > 0 then
10:         tunnel ← 1
11:         compensate ← true
12:     else if C_2 > 0 then
13:         tunnel ← 2
14:         compensate ← true
15:     end if
16: end procedure

17: Event update_counters: Periodic State Evaluation (Executes every 1 s)
18: procedure EvaluateTunnelState
19:     if compensate is true then
20:         volume[1] ← ReadCounter(1) − cnt_start[1]
21:         volume[2] ← ReadCounter(2) − cnt_start[2]
22:         tunnel_p ← (tunnel mod 2) + 1
23:         if volume[tunnel] − C_tunnel > R_tunnel / R_tunnel_p · volume[tunnel_p] then
24:             cnt_start[1] ← ReadCounter(1)
25:             cnt_start[2] ← ReadCounter(2)
26:             tunnel ← tunnel_p                ▹ Switch to the other link
27:             compensate ← false               ▹ Compensation threshold reached
28:         end if
29:     else
30:         traffic_1 ← ReadCounter(1) − cnt_start[1]
31:         traffic_total ← traffic_1 + ReadCounter(2) − cnt_start[2]
32:         if traffic_total > 0 then
33:             if (traffic_1 / traffic_total) ≤ r[1] then
34:                 tunnel ← 1                   ▹ Balance proportionally towards R_1
35:             else
36:                 tunnel ← 2                   ▹ Balance proportionally towards R_2
37:             end if
38:         end if
39:     end if
40: end procedure

41: Event Packet-In: New Flow Processing
42: procedure OnNewFlowArrival(flow)
43:     Send_SDN_Rule(flow, tunnel)              ▹ Install rule using currently active tunnel
44: end procedure
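As an illustration only, the handlers of Algorithm 1 can be sketched in Python for the two-link case. This is not the Floodlight implementation used in the testbed: the class name, the `read_counter` callback (standing in for the per-tunnel counter readout), and the dictionary-based vectors are hypothetical names introduced here. The sketch assumes that the extraction-damaged operators are a subtraction in the line 23 condition and a less-than-or-equal comparison in the balancing branch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Tuple


@dataclass
class ReactiveWithReference:
    """Source-domain steering state for two tunnels (links 1 and 2)."""

    read_counter: Callable[[int], float]  # hypothetical per-tunnel volume counter
    R: Dict[int, float] = field(default_factory=dict)
    C: Dict[int, float] = field(default_factory=dict)
    cnt_start: Dict[int, float] = field(default_factory=dict)
    r1: float = 0.5          # target proportion for link 1
    compensate: bool = False
    tunnel: int = 1

    def on_update_c_r(self, R: Dict[int, float], C: Dict[int, float]) -> None:
        """Billing-period start: store R, reset counters, then process C."""
        self.R = dict(R)
        self.r1 = R[1] / (R[1] + R[2])
        self._reset_counters()
        self.compensate = False
        self.tunnel = 1
        self.on_update_c(C)

    def on_update_c(self, C: Dict[int, float]) -> None:
        """Select the link with positive compensation and start compensating."""
        self.C = dict(C)
        if C[1] > 0:
            self.tunnel, self.compensate = 1, True
        elif C[2] > 0:
            self.tunnel, self.compensate = 2, True

    def on_update_counters(self) -> None:
        """Periodic evaluation (every 1 s in Algorithm 1)."""
        vol = {l: self.read_counter(l) - self.cnt_start[l] for l in (1, 2)}
        if self.compensate:
            other = (self.tunnel % 2) + 1
            shifted = vol[self.tunnel] - self.C[self.tunnel]
            limit = self.R[self.tunnel] / self.R[other] * vol[other]
            if shifted > limit:  # assumed form of the line 23 condition
                self._reset_counters()
                self.tunnel = other
                self.compensate = False
        else:
            total = vol[1] + vol[2]
            if total > 0:
                # Steer new flows to link 1 while its share is at or below target.
                self.tunnel = 1 if vol[1] / total <= self.r1 else 2

    def on_packet_in(self, flow: Hashable) -> Tuple[Hashable, int]:
        """Return the (flow, tunnel) pair a per-flow rule would be built from."""
        return flow, self.tunnel

    def _reset_counters(self) -> None:
        self.cnt_start = {l: self.read_counter(l) for l in (1, 2)}
```

The Packet-In handler only reads the `tunnel` field, so all counter arithmetic stays in the timer-driven handler, which mirrors the non-blocking behaviour described above.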

3.2.2. Reactive with Limited Visibility

To simplify the inter-domain interface, we introduce a mode where the remote domain remains “blind” to the reference vector R. In “reactive without reference” mode, the remote controller ignores the magnitude of the compensation signal and focuses solely on its sign. The sign of C indicates the target link for new flows. A signaling message is only generated by the optimizing SDN controller when a sign change is detected (i.e., a different link becomes more cost-effective). To ensure system stability, a safety update is sent regardless of sign changes when a longer “compensation period” (e.g., 5 min) elapses. This mode significantly reduces the number of exchanged messages and eliminates the need for the remote domain to measure tunneled traffic, as it simply follows the “direction” provided by the optimizer. Algorithm 2, used by the SDN controller on the sending side, is much simpler than the one used in the full-visibility case.
Algorithm 2 DTM Flow Steering and Compensation Logic (Reactive without Reference)
Initialize Global Variables:
    tunnel ← 1

Event update_C: Upon receiving signalling update C from ISP-C
procedure UpdateOptimizationTarget
    if C_1 > 0 then
        tunnel ← 1
    else if C_2 > 0 then
        tunnel ← 2
    end if
end procedure

Event Packet-In: New Flow Processing
procedure OnNewFlowArrival(flow)
    Send_SDN_Rule(flow, tunnel)        ▹ Install rule using currently active tunnel
end procedure
The foundational DTM framework presented in [1] relied on exchanging four 64-bit values to synchronize the state of vectors C and R (with R updated once per billing cycle). Conversely, the “reactive without reference” mode reduces the communication overhead by transmitting only the signs of the C vector components. Consequently, the inter-domain signaling payload is limited to a mere two bits, where each bit dictates the required shift direction for its corresponding link.
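The optimizer-side reporting rule described in this subsection (send on a sign change, plus a safety refresh after the compensation period) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and function names are hypothetical, and the bit layout (bit l−1 set when C_l > 0) is an assumption, since the text only states that each bit encodes the shift direction for its link.

```python
from typing import Optional, Sequence


def sign_bits(C: Sequence[float]) -> int:
    """Pack the signs of C into an integer: bit i is set when C[i] > 0.

    Assumed encoding; C[0] corresponds to C_1 and C[1] to C_2.
    """
    bits = 0
    for i, c in enumerate(C):
        if c > 0:
            bits |= 1 << i
    return bits


class SignChangeReporter:
    """Decides when the optimizing domain should signal the remote domain."""

    def __init__(self, safety_timeout_s: float = 300.0) -> None:
        self.safety_timeout_s = safety_timeout_s  # e.g., 5 min
        self.last_bits: Optional[int] = None
        self.last_sent_at: Optional[float] = None

    def maybe_report(self, C: Sequence[float], now_s: float) -> Optional[int]:
        """Return the two-bit payload to send, or None when no message is due."""
        bits = sign_bits(C)
        changed = bits != self.last_bits
        expired = (
            self.last_sent_at is not None
            and now_s - self.last_sent_at >= self.safety_timeout_s
        )
        if changed or expired:
            self.last_bits, self.last_sent_at = bits, now_s
            return bits
        return None
```

Calling `maybe_report` once per 30 s update interval produces a message only when the steering direction flips or the safety timeout elapses, instead of one message per interval as in the periodic baseline.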

3.2.3. Proactive Steering and Control Plane Scalability

The final investigated mode is “proactive without reference”. While reactive modes apply steering decisions only to newly initiated flows (to avoid packet reordering), the proactive mode immediately reassigns the entire manageable traffic—including active flows—to the newly selected link upon receiving a signaling update. Any sign change in vector C directly triggers a flow table update (Algorithm 3).
The proactive approach offers several key advantages for large-scale cloud environments, primarily focusing on flow-table efficiency and reduced reaction times. By managing the entire traffic with a single aggregated rule instead of maintaining state for thousands of individual flow entries, the mechanism ensures scalability in hardware-constrained SDN environments. Furthermore, because this method forces an immediate shift of all active traffic, it reacts much faster to unpredictable background spikes than reactive alternatives. Such rapid compensation is particularly valuable under 95th percentile tariffs, where waiting for new flows to be generated before shifting the load can lead to higher billing samples and increased transit costs.
Algorithm 3 DTM Flow Steering and Compensation Logic (Proactive)
Initialize Global Variables:
    tunnel ← 1

Event update_C: Upon receiving signalling update C from ISP-C
procedure UpdateOptimizationTarget
    if C_1 > 0 then
        tunnel ← 1
    else if C_2 > 0 then
        tunnel ← 2
    end if
    Modify_SDN_Rule(default, tunnel)   ▹ Modify default rule using currently active tunnel
end procedure
The potential drawback of this mode is the risk of out-of-order packet delivery due to path latency differences. However, the primary objective of our evaluation is to determine if this proactive, low-visibility approach can achieve comparable cost reduction to the full-visibility reactive baseline.
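To make the flow-table difference between the two low-visibility modes concrete, the toy model below contrasts per-flow rule installation (reactive) with a single aggregated default rule (proactive). It is a sketch under simplifying assumptions: the flow table is modelled as a Python dictionary rather than OVS or TCAM state, and all class and function names are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def select_tunnel(C: Sequence[float], current: int) -> int:
    """Sign rule shared by Algorithms 2 and 3 (C[0] is C_1, C[1] is C_2)."""
    if C[0] > 0:
        return 1
    if C[1] > 0:
        return 2
    return current


class ReactiveSteering:
    """Per-flow rules: active flows keep their tunnel, new flows follow C."""

    def __init__(self) -> None:
        self.tunnel = 1
        self.rules: Dict[Hashable, int] = {}

    def on_update_c(self, C: Sequence[float]) -> None:
        self.tunnel = select_tunnel(C, self.tunnel)

    def on_packet_in(self, flow: Hashable) -> int:
        # Install a rule only for flows that do not have one yet.
        self.rules.setdefault(flow, self.tunnel)
        return self.rules[flow]


class ProactiveSteering:
    """One aggregated rule: every update re-points all manageable traffic."""

    def __init__(self) -> None:
        self.tunnel = 1
        self.rules: Dict[str, int] = {"default": self.tunnel}

    def on_update_c(self, C: Sequence[float]) -> None:
        self.tunnel = select_tunnel(C, self.tunnel)
        self.rules["default"] = self.tunnel

    def forward(self, flow: Hashable) -> int:
        return self.rules["default"]
```

In this model the reactive table grows with the number of flows seen, while the proactive table stays at one entry; the cost of the latter is that flows already in progress change path, which is where the reordering risk noted above comes from.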

4. Evaluation

To assess the performance of the DTM mechanism and its signaling variants, we conducted a series of experiments focusing on cost-efficiency and system stability. Specifically, our evaluation focuses exclusively on inbound traffic management. This configuration was selected because it represents a more complex technical challenge compared to outbound control. Inbound management necessitates a signaling exchange where the receiving domain must actively instruct the remote sending domain on how to distribute traffic to achieve the desired optimization goal. In contrast, outbound traffic steering is relatively straightforward, as the originating domain retains local control over its forwarding decisions without requiring inter-domain coordination. Therefore, to better investigate the impact of information visibility and signaling frequency, outbound scenarios were omitted from this study.
This section provides a detailed description of the experimental methodology, starting from the testbed configuration and traffic characterization, followed by the definition of Key Performance Indicators (KPIs) and a thorough analysis of the collected data.

4.1. Testbed Configuration and Traffic Profiles

The evaluation was performed in a controlled, virtualized network environment representing a multi-homed ISP architecture. The logical structure of the testbed, focused on the receiving domain (ISP-C), is illustrated in Figure 4. The SDN controller monitors inbound traffic via traffic measurement points located on the inter-domain interfaces of the border routers. Bidirectional dashed lines indicate telemetry paths.

4.1.1. Network Setup

The testbed was implemented using Ubuntu 24.04 virtual machines. Core routing functions were handled by VyOS instances, while SDN switching was realized through Open vSwitch (OVS). The DTM control logic was integrated into the Floodlight SDN controller. As shown in the architecture, the SDN controller monitors real-time traffic statistics at specific traffic measurement points on the BG-1 and BG-2 routers. These points provide the telemetry necessary to compute the X and Z vectors, which are foundational for both volume-based and 95th percentile cost optimization.

4.1.2. Traffic Generation and Billing Periods

To ensure the validity of our findings, traffic was generated using a realistic daily envelope derived from the DE-CIX internet exchange statistics [22]. We utilized data from June 2025 to model the daily fluctuations of manageable and non-manageable traffic. The manageable DC-to-DC flows were generated using a proprietary internet-like traffic generator (based on the Well19937c pseudo-random number generator) to produce a mixture of short-lived and long-lasting UDP flows.
The experiments were conducted using two distinct billing period configurations:
  • 1-Day Billing Period: Employed for rapid parameter tuning and as a preparatory phase for 95th percentile experiments. This allowed us to validate the system’s operating point under dynamic conditions where traffic behavior is less predictable.
  • 7-Day Billing Period: Used for the main performance assessment. This duration was selected to provide an adequate averaging effect, especially for the 95th percentile tariff, which is highly sensitive to the sample distribution. A 7-day run yields 2016 five-minute samples, which we found sufficient for the calculated percentile to converge to a realistic value.
The detailed traffic parameters and the piecewise linear cost functions used for the inter-domain links are summarized in Table 1 and Table 2, respectively. To accurately reflect real-world ISP billing practices, the transit cost associated with each inter-domain link l is modeled as a piecewise linear function f l ( x ) . This model accounts for tiered pricing structures, where the unit cost changes once certain traffic thresholds are exceeded. For a given link, the cost function is defined as:
$$f_l(x) = a_{l,i} \cdot x + b_{l,i} \quad \text{for } \alpha_{l,i-1} \le x < \alpha_{l,i}$$
where $x$ represents the billing traffic volume (either total cumulative volume or the 95th percentile value), $i$ denotes the current pricing tier index, and $\alpha_{l,i}$ are the threshold values defining the intervals for link $l$. The parameter $a_{l,i}$ represents the marginal cost per unit of traffic (the slope) in the $i$-th tier, while $b_{l,i}$ is the intercept constant that ensures the continuity of the function or represents fixed port fees associated with that specific tier.
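A tiered cost function of this form can be evaluated with a simple threshold lookup. The sketch below is illustrative only: the tier values are hypothetical (they are not the values from Table 2), the function and type names are introduced here, and the last tier is assumed to have no upper bound.

```python
from bisect import bisect_right
from typing import List, Tuple

# (lower threshold alpha_{l,i-1}, slope a_{l,i}, intercept b_{l,i})
Tier = Tuple[float, float, float]


def link_cost(x: float, tiers: List[Tier]) -> float:
    """Evaluate f_l(x) = a_{l,i} * x + b_{l,i} for the tier containing x.

    Tiers must be sorted by lower threshold; tier i covers
    alpha_{l,i-1} <= x < alpha_{l,i}, and the last tier is open-ended.
    """
    if x < tiers[0][0]:
        raise ValueError("traffic volume below the first tier threshold")
    lowers = [lower for lower, _, _ in tiers]
    i = bisect_right(lowers, x) - 1  # index of the last threshold <= x
    _, slope, intercept = tiers[i]
    return slope * x + intercept


# Hypothetical tiers, chosen so that the function is continuous at 100 and 500.
EXAMPLE_TIERS: List[Tier] = [
    (0.0, 2.0, 0.0),
    (100.0, 1.5, 50.0),
    (500.0, 1.0, 300.0),
]
```

With these example tiers, a volume of 300 falls in the second tier, so `link_cost(300.0, EXAMPLE_TIERS)` evaluates 1.5 · 300 + 50.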

4.2. Parameter Sensitivity and Operational Trade-Offs

The performance of the DTM mechanism heavily relies on the timing parameters governing the control loop. While a full empirical parameter sweep is beyond the scope of this study, understanding the sensitivity of the system to the selected intervals is crucial for real-world deployment. The chosen parameters—the 30 s update interval, the 5 min safety timeout, and the 1-day/7-day billing periods—were established based on the following analytical trade-offs:
  • Update Interval ($T_{report}$ = 30 s): This parameter dictates the frequency at which the SDN controller polls traffic statistics (X) and calculates the compensation vector (C). Sensitivity: Decreasing this interval (e.g., to 5 or 10 s) would allow the mechanism to react to micro-bursts faster, theoretically improving the 95th percentile cost convergence. However, this would increase the control-plane overhead and, with aggressive statistics polling, could exhaust switch CPU resources. Conversely, increasing the interval (e.g., to 60 or 120 s) degrades the system’s responsiveness. In the context of a 5 min billing sample, a 120 s interval allows only two compensation corrections per sample, which is insufficient to flatten sudden traffic spikes, thereby risking a threshold violation. Thus, a 30 s interval provides a reasonable balance, yielding 9 effective correction opportunities per billing sample (as the 10th update coincides with the sample recording and does not affect its final volume) without overwhelming the SDN infrastructure.
  • Safety Timeout (5 min): Theoretically, an explicit timeout is unnecessary for the core operation of the “without reference” mode. Since the mechanism is inherently event-driven, the optimal steering direction remains valid as long as the sign of the compensation vector C does not change. However, this parameter is retained as a critical failsafe mechanism. First, inter-domain signaling operates across wide area networks where control messages might occasionally be lost or delayed. If a crucial sign-change update were dropped in transit, the remote controller would continue forwarding traffic along a suboptimal path indefinitely. The safety timeout prevents this by forcing a state refresh. Second, the 5 min interval aligns with the industry-standard 95th-percentile sampling window. This guarantees that, regardless of traffic dynamics or potential packet loss, the inter-domain state is strictly synchronized at least once per billing sample, placing a hard upper bound on any potential desynchronization errors.
  • Billing Period Length (1-day vs. 7-day): The 95th percentile calculation is mathematically sensitive to the sample population size. Sensitivity: A 1-day billing period yields only 288 five-minute samples. In this scenario, the 95th percentile is determined by the 15th highest sample. At this scale, the metric is highly volatile and extremely sensitive to isolated, brief anomalies. Therefore, 1-day periods were utilized exclusively for initial tuning and observing immediate reactive behaviors. In contrast, a 7-day period generates 2016 samples, where the 95th percentile corresponds to the 101st highest value. This larger population absorbs statistical noise and captures the diurnal and weekly traffic cyclicality (e.g., weekday peaks vs. weekend lows).
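The sample arithmetic quoted in the bullets above can be checked with a short sketch. The function names are illustrative, and two rules are assumptions inferred from the description rather than taken from the DTM implementation: an update counts as "effective" only if it falls strictly inside a 300 s sample, and the number of discarded samples is 95% of the population rounded down.

```python
SAMPLE_SECONDS = 300  # one 5 min billing sample


def effective_corrections(report_interval_s: int) -> int:
    """Count updates that fall strictly inside one billing sample.

    An update that coincides with the end of the sample is excluded,
    because it can no longer change the recorded volume (assumption).
    """
    return (SAMPLE_SECONDS - 1) // report_interval_s


def billing_rank(days: int) -> tuple:
    """Return (number of samples, rank of the billing sample counted from the top)."""
    n_samples = days * 24 * 3600 // SAMPLE_SECONDS
    discarded = (95 * n_samples) // 100  # lowest 95% of samples, rounded down
    return n_samples, n_samples - discarded


for interval in (30, 60, 120):
    print(interval, "s ->", effective_corrections(interval), "effective corrections")

for days in (1, 7):
    n, rank = billing_rank(days)
    print(days, "day(s):", n, "samples, billing sample is highest no.", rank)
```

Under these assumptions the sketch reproduces the figures in the text: 9 corrections for a 30 s interval, 2 for 120 s, the 15th highest of 288 samples, and the 101st highest of 2016 samples.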

4.3. Performance Metrics and Key Performance Indicators

To evaluate the efficiency of the DTM mechanism across its various operational variants—including the reactive modes (with and without reference visibility) and the proactive steering scheme—we employ the same set of Key Performance Indicators (KPIs) as established in our previous work [1]. This consistency allows for a direct comparison of how signaling limitations and steering strategies impact the economic outcomes of the optimization process.
The evaluation relies on the following general notation: $X_l$ represents the total traffic volume on a given inter-domain link $l$ (the sum of background and manageable components), and $Z_l$ signifies the manageable traffic routed through the SDN-controlled tunnel $l$. The optimization target is defined by the reference vector $R$, where $R_l$ is the target volume for link $l$. The expected cost at the optimal point is denoted as $D_R = \sum_l f_l(R_l)$, where $f_l$ is the cost function for the respective link $l$. The primary KPIs used in this study are:
  • Relative Monetary Gain ($\xi(l)$): The ratio of the cost achieved with DTM to the cost incurred when all manageable traffic follows a single default BGP path via link $l$. Values below 1 indicate savings, and $1 - \xi(l)$ gives the relative saving.
  • Absolute Cost Benefit ($\Delta D$): The total monetary savings in euros [€] compared to a non-optimized scenario.
  • Convergence Ratio ($\rho$): A measure of optimization precision, comparing the actual achieved cost to the theoretical optimal target ($D_R$) calculated at the start of the billing period.
  • Signaling Update Count: A metric representing the control-plane overhead, defined as the total number of C vector updates exchanged between SDN controllers.

4.3.1. Volume-Based Tariff Metrics

For volume-based billing, the primary objective is to minimize the total expenditure $D = \sum_l f_l(X_l^V)$, where $X_l^V$ is the cumulative traffic on link $l$. To quantify the effectiveness of the DTM mechanism compared to a standard BGP-based approach (where all manageable traffic follows a single default path), we utilize the relative monetary gain factor $\xi$:
$$\xi(l) = \frac{D_{\text{DTM}}}{D_{\text{no-DTM},\,l}}$$
where $l \in \{1, 2\}$ indicates which link would serve as the default BGP path in the absence of DTM. A value of $\xi(l) < 1$ indicates that DTM successfully reduced costs relative to the static routing scenario. Complementary to this, the absolute cost benefit is calculated as $\Delta D(l) = D_{\text{no-DTM},\,l} - D_{\text{DTM}}$. Finally, we monitor the precision of the optimization using the convergence ratio $\rho = D / D_R$, which measures the deviation of the achieved cost from the theoretical optimum.
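The three volume-based KPIs reduce to simple ratios and differences of costs. The sketch below is illustrative only: the function names and all cost values are hypothetical and are not taken from the testbed results.

```python
def relative_gain(cost_dtm: float, cost_no_dtm: float) -> float:
    """xi(l): cost with DTM divided by the cost of static routing via link l."""
    return cost_dtm / cost_no_dtm


def absolute_benefit(cost_dtm: float, cost_no_dtm: float) -> float:
    """Delta D(l): savings relative to static routing via link l."""
    return cost_no_dtm - cost_dtm


def convergence_ratio(cost: float, cost_ref: float) -> float:
    """rho: achieved cost divided by the expected cost D_R at the reference point."""
    return cost / cost_ref


# Hypothetical costs in EUR, chosen only for illustration.
cost_dtm = 710.0
cost_ref = 700.0
cost_no_dtm = {1: 800.0, 2: 1000.0}  # static routing via link 1 or link 2

for link, baseline in cost_no_dtm.items():
    xi = relative_gain(cost_dtm, baseline)
    saving = absolute_benefit(cost_dtm, baseline)
    print(f"link {link}: xi = {xi:.3f}, saving = {saving:.1f} EUR ({(1 - xi):.1%})")

print(f"rho = {convergence_ratio(cost_dtm, cost_ref):.3f}")
```

With these made-up numbers, a value of $\xi(2) = 0.71$ corresponds to a 29% saving against the more expensive default path, which is how the percentages in the text relate to the KPI.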

4.3.2. 95th Percentile Tariff Metrics

Evaluating the 95th percentile tariff is more complex due to its dependency on traffic dynamics over short intervals. Consistent with [1], the billing period is discretized into $N$ five-minute samples. Separately for each link, the collected samples are sorted in ascending order and the lowest 95 percent are discarded. The lowest of the remaining samples is the billing value, denoted by $X_l^{95}$. The final cost is then $D = \sum_l f_l(X_l^{95})$.
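The billing-value procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: rounding the number of discarded samples down is an assumption, and the two cost functions in the example are hypothetical linear tariffs.

```python
def billing_value(samples):
    """Return the 95th percentile billing value of one link.

    Samples are sorted in ascending order, the lowest 95% are discarded
    (rounded down, an assumption), and the lowest remaining sample is returned.
    """
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("at least one sample is required")
    discarded = (95 * len(ordered)) // 100
    return ordered[discarded]


def total_cost(samples_per_link, cost_functions):
    """D = sum over links of f_l(X_l^95)."""
    return sum(
        cost_functions[link](billing_value(samples))
        for link, samples in samples_per_link.items()
    )


# Hypothetical five-minute samples and tariffs, for illustration only.
samples_per_link = {1: list(range(1, 101)), 2: [2] * 100}
cost_functions = {1: lambda x: 2 * x, 2: lambda x: x + 1}

print(billing_value(samples_per_link[1]))
print(total_cost(samples_per_link, cost_functions))
```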
To assess the impact of the new signaling modes (reactive/proactive) on this tariff, we reconstruct the “what-if” scenarios for static routing. By isolating the manageable traffic samples $Z_{l,i}$ (where $l$ denotes the link and $i$ refers to the sample number) from each interval, we calculate what the 95th percentile would have been if all manageable traffic had remained on a single default link. The relative gain $\xi(l)$ and absolute benefit $\Delta D(l)$ are then derived using these reconstructed costs, following the same logic as in the volume-based case. This approach allows us to precisely identify if reducing the visibility of the optimization goal $R$ or switching to proactive steering affects the mechanism’s ability to “shave” the traffic peaks that define the 95th percentile cost.
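One way to read the “what-if” reconstruction is sketched below. It assumes that the background component of each sample is the measured total minus the manageable part, and that under static routing the manageable traffic of every interval, summed over all links, lands on the default link. The data layout (a dictionary mapping each link to its list of samples) and the function name are illustrative choices.

```python
def static_routing_samples(total, manageable, default_link):
    """Rebuild per-link samples as if all manageable traffic used default_link.

    total[l][i]      -- measured traffic X on link l in sample i
    manageable[l][i] -- manageable traffic Z on link l in sample i
    """
    links = list(total)
    n_samples = len(total[links[0]])
    # Manageable traffic of each interval, summed over all links.
    all_manageable = [
        sum(manageable[link][i] for link in links) for i in range(n_samples)
    ]
    rebuilt = {}
    for link in links:
        background = [total[link][i] - manageable[link][i] for i in range(n_samples)]
        if link == default_link:
            rebuilt[link] = [b + m for b, m in zip(background, all_manageable)]
        else:
            rebuilt[link] = background
    return rebuilt


# Hypothetical two-sample example.
total = {1: [10, 12], 2: [8, 9]}
manageable = {1: [4, 2], 2: [1, 5]}
print(static_routing_samples(total, manageable, default_link=1))
print(static_routing_samples(total, manageable, default_link=2))
```

The reconstructed sample series can then be passed through the same percentile and cost calculation as the measured ones to obtain the static-routing cost for each default link.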

4.4. Experimental Results

This subsection presents the findings obtained from the testbed infrastructure. The primary focus of the evaluation was to determine how varying signaling modes and optimization goal visibility affect the efficiency of the DTM mechanism. Our study first analyzes system performance under volume-based tariffs, highlighting the tradeoff between signaling overhead and cost optimization. Subsequently, we introduce a dedicated experiment involving “unfriendly” traffic patterns—defined by long-lasting flows—to evaluate the constraints of the reactive approach and the potential advantages of proactive steering. The results confirm that the proposed enhancements allow for significant signaling reduction while maintaining economic efficiency, although specific traffic profiles necessitate proactive intervention to achieve the desired optimization targets.
The quantitative data presented in the subsequent tables (e.g., Tables 3, 6, and 10) correspond to representative experimental runs for each operational mode. This approach was necessitated by the rigorous three-stage procedure required for each valid evaluation cycle: an initial 7-day warm-up period for testbed stabilization, a subsequent 7-day period for collecting traffic statistics to calculate a realistic reference vector $R$, and a final 7-day phase for performance evaluation. Each cycle therefore lasted a minimum of 21 continuous days, and in practice data were often collected for two or three consecutive weeks to ensure steady-state accuracy.

4.4.1. Volume-Based Tariff Performance

The performance of the DTM mechanism under volume-based tariffs was verified using three distinct SDN controller operational modes. For each configuration, system behavior was observed over several billing cycles, and the obtained results were highly consistent. In all evaluated cases, the measured traffic vector followed the reference vector R with high precision. Consequently, the resulting transit costs were nearly optimal, with the convergence KPI ρ remaining close to 1, as summarized in Table 3. The marginal discrepancy between the expected and achieved costs (less than 1%) originates from the inherent statistical variability of traffic patterns, as the total volume transferred often fluctuates between billing periods. In the presented datasets, the traffic volume in the observed billing period was slightly lower than in the preceding one used for the R vector calculation.
The absolute transit traffic costs and the corresponding KPIs are presented in Table 3 and Table 4. These data indicate that for every SDN controller mode, the achieved cost was not only aligned with the expected values but was also significantly lower than the costs associated with static routing alternatives. This confirms that the DTM mechanism provides substantial savings; without its activation, routing all manageable traffic through a fixed path (either Link 1 or Link 2) would result in higher operational expenditures.
Since the traffic plots for all operational modes exhibited similar characteristics, we present the figures illustrating traffic patterns and traffic growth on the cost map for the “reactive without reference” mode only (Figure 5 and Figure 6). It should be noted that in Figure 6, the curves representing the “Reference vector” and the results obtained “with DTM” appear to overlap almost entirely. This effect is a direct consequence of the high steering precision of the mechanism; since the actual traffic distribution is maintained very close to the optimization goal, the two lines occupy identical coordinates on the plot.
Manageable traffic is dynamically distributed between the two available links. During intervals of very low total traffic, all manageable flows are directed through Link 1. Because the manageable volume during these periods is insufficient to meet the target shift, the measured traffic vector temporarily diverges from the reference vector. However, this is corrected later in the cycle, and the traffic vector converges back to R . These temporal discrepancies, indicated by green circles in Figure 5 and Figure 6, are minor. Overall, the distribution logic ensures that the current traffic vector follows the direction of R accurately, reaching the optimal point by the end of the billing period. The high values of the compensation vector C during these intervals reflect the volume of manageable traffic required to align the system state with the reference direction (Figure 8).
The operational differences between the “with reference” and “without reference” modes are primarily visible in the compensation vector values (Figure 7 and Figure 8). In the “with reference” mode, C updates are transmitted regularly (every 30 s), ensuring high accuracy through continuous correction. In the “without reference” mode, updates to the C vector are triggered only upon a sign change or after a 5 min safety timeout. Consequently, this mode exhibits higher oscillations in the compensation values, as the reduced signaling frequency may lead to slight overcompensation before the next update arrives. Crucially, however, this behavior does not adversely affect the final cost efficiency. Furthermore, as shown in Table 5, the number of signaling messages is significantly lower in the “without reference” modes. Given these findings, the “without reference” configuration is recommended to minimize signaling overhead and improve system scalability.
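The update rule of the “without reference” mode (send on a sign change, or when the safety timeout expires) can be sketched as a small gate in front of the signaling channel. The class name, the use of a single scalar component of the compensation vector, the treatment of zero as its own sign, and the restart of the timeout after every transmitted update are all assumptions made for illustration.

```python
class SignTriggeredSender:
    """Decide whether a compensation update should be signaled."""

    def __init__(self, safety_timeout_s: float = 300.0):
        self.safety_timeout_s = safety_timeout_s
        self.last_sign = None      # sign of the last transmitted value
        self.last_sent_at = None   # time of the last transmission, in seconds

    def should_send(self, c_component: float, now_s: float) -> bool:
        sign = (c_component > 0) - (c_component < 0)  # -1, 0 or +1
        first = self.last_sign is None
        sign_changed = not first and sign != self.last_sign
        timed_out = not first and now_s - self.last_sent_at >= self.safety_timeout_s
        if first or sign_changed or timed_out:
            self.last_sign = sign
            self.last_sent_at = now_s
            return True
        return False


# Hypothetical compensation values evaluated every 30 s.
sender = SignTriggeredSender()
values = [5, 3, 1] + [-2] * 11
sent_at = [30 * i for i, c in enumerate(values) if sender.should_send(c, 30 * i)]
print(sent_at)  # [0, 90, 390]
```

In this toy trace, 14 evaluations produce only 3 messages: the initial update, the sign change at 90 s, and the timeout refresh 300 s later. A periodic scheme would have sent all 14.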
Finally, when comparing proactive and reactive modes for volume-based tariffs, the results are comparable. However, since the proactive mode reassigns all active flows immediately—which may lead to out-of-order packet delivery—we recommend using the “reactive mode” for volume-based charging.

4.4.2. 95th Percentile Tariff Performance

While the volume-based tariff evaluates cumulative traffic, the 95th percentile billing scheme is inherently more sensitive to short-term traffic dynamics. To provide a comprehensive comparative study, we quantitatively evaluated the DTM performance under the 95th percentile tariff across the three defined operational modes. The key performance indicators, absolute costs, and signaling overhead for the 95th percentile tariff are summarized in Table 6, Table 7, and Table 8, respectively.
As presented in Table 7, the DTM mechanism successfully reduces transit costs when the 95th percentile billing scheme is used, regardless of the operational mode. The relative monetary gain KPIs ($\xi(1)$ and $\xi(2)$) remain strictly below 1, confirming that dynamic steering provides substantial savings compared to static BGP routing (Table 6). The actual cost savings heavily depend on the default BGP path used as a baseline. When DTM optimization is compared against the worst-case scenario, in which all manageable traffic is routed through the more expensive link, the cost reduction reaches its maximum of approximately 29% ($\xi(2) \approx 0.71$). The convergence ratio ($\rho$) demonstrates that the achieved costs are remarkably close to the theoretical optimum.
Consistent with the volume-based experiments, the signaling overhead is a critical differentiator between the modes. The “with reference” mode, which relies on strict 30 s periodic updates, requires over 20,000 signaling messages during the billing cycle (Table 8). By contrast, transitioning to the “without reference” modes, where updates are triggered primarily by a sign change in the compensation vector, reduces the inter-domain signaling footprint by more than 50%. Importantly, this drastic reduction in control-plane communication incurs only a negligible penalty in cost convergence (a $\rho$ increase of roughly 0.004). These quantitative results confirm that hiding the explicit optimization goal (reference vector) from the remote domain significantly enhances system scalability without compromising the economic benefits of the 95th percentile optimization.
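The order of magnitude of the periodic signaling load follows directly from the update interval. The sketch below assumes a 7-day billing cycle, as used for the evaluation phase. The second figure is only a rough lower bound for a sign-triggered mode, under the further assumption that nothing but the 5 min safety timeout ever fires; the measured counts are those reported in Table 8.

```python
WEEK_SECONDS = 7 * 24 * 3600  # assumed 7-day billing cycle


def updates_per_period(period_s: int, interval_s: int) -> int:
    """Number of updates sent when one update is emitted every interval_s seconds."""
    return period_s // interval_s


periodic = updates_per_period(WEEK_SECONDS, 30)        # strict 30 s updates
timeout_only = updates_per_period(WEEK_SECONDS, 300)   # only the safety timeout fires

print(periodic)      # 20160
print(timeout_only)  # 2016
```

The first value is consistent with the “over 20,000” messages quoted for the periodic mode.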

4.4.3. Evaluation Under Unfriendly Traffic Patterns

As discussed previously, managing traffic under the 95th percentile tariff is more challenging because DTM must keep each 5 min sample below a specific threshold, leaving a very narrow window for compensation. If the manageable traffic consists of numerous frequently generated short flows, the reactive mode can effectively perform compensation by steering new flows to the chosen tunnel.
However, we decided to investigate a more difficult scenario: a traffic pattern consisting of relatively long flows (compared to the 5 min sample length) with a large inter-arrival time. Such “unfriendly” conditions are problematic for the reactive mode, as the system cannot shift existing flows, and the lack of new incoming flows limits its ability to adjust the distribution within the required 5 min interval. In this context, the proactive mode should offer an advantage by allowing the immediate switching of active flows. To test this hypothesis, we configured a traffic generator as specified in Table 9, resulting in a mean flow length of 162.5 s.
The results for the reactive and proactive modes are shown in Figure 9 and Figure 10, respectively. When the proactive mode is employed, the 5 min sample pairs are well-condensed around the reference vector. In contrast, under the reactive mode, the samples are significantly more spread, and the actual 95th percentile threshold fails to meet the optimization target. This occurs because the active flows are effectively non-shiftable in the reactive mode, meaning the volume of traffic that DTM can actually manage is much lower than the nominal amount. These findings demonstrate that the proactive mode is significantly more effective at managing traffic patterns dominated by long-lasting flows.
The results for the “unfriendly” traffic scenario are summarized in Table 10. The proactive mode provides high precision, demonstrating convergence with a ρ value of 1.007. This indicates that even under challenging traffic conditions, the actual transit expenditure was kept within a 0.7% margin of the calculated optimum. In contrast, the reactive mode exhibited a vulnerability to the “unfriendly” traffic profile, overshooting the optimization goal by 3.5% ( ρ = 1.035 ). While this percentage might seem moderate in isolation, the absolute monetary deviation of 117.4 EUR is more than five times higher than the 21.6 EUR deviation recorded for the proactive mode. This inaccuracy is a direct consequence of the reactive mode’s inability to shift existing flows. When long-lasting sessions are established on a link, the controller is unable to reduce the load until those flows terminate naturally, which often does not occur within the 5 min billing sample window.
It must be emphasized that these results represent a preliminary investigation conducted over a reduced billing period of one day. For the purposes of this specific test, the manageable traffic was composed exclusively of long-lasting flows to isolate the impact of session duration on steering accuracy. Consequently, the traffic profile did not include short-lived flows, which typically provide the reactive mode with more frequent opportunities for compensation. We acknowledge that a different traffic composition—specifically a mixture of short-lived and long-lasting sessions—may yield different results, as the presence of short flows could mitigate some of the reactive mode’s limitations. Further research is required to evaluate these complex, hybrid traffic distributions over standard billing cycles.

4.4.4. Analytical Assessment of Scalability and Switch Resources

While the reduction of inter-domain signaling messages addresses bandwidth overhead, the operational scalability of the DTM mechanism can also be analytically evaluated in terms of switch resource management and controller load. In the reactive modes, traffic steering is executed on a per-flow basis. Consequently, the SDN switch must maintain an individual flow table entry for every active connection. This approach yields an $O(N)$ memory complexity, where $N$ is the number of concurrent flows. In high-traffic data center environments, this is expected to cause significant rule churn and can rapidly exhaust the limited TCAM available in hardware switches. Furthermore, from a theoretical perspective, the controller experiences a higher processing load due to the continuous stream of Packet-In events required to assign each new flow to the appropriate tunnel.
Conversely, the proactive steering mode fundamentally shifts this paradigm from per-flow management to aggregate-level routing. The proactive approach requires only a single, aggregated forwarding rule to direct the entire manageable traffic class into the selected tunnel. When the compensation vector $C$ triggers a path switch, the controller simply updates this $O(1)$ entry, instantly redirecting the aggregate traffic. Our analytical assessment indicates that this operational model virtually eliminates rule churn associated with flow arrivals and departures, drastically reduces TCAM occupancy, and shields the controller from Packet-In floods.
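The difference in flow table occupancy can be illustrated with a toy model in which a dictionary stands in for the switch flow table. The class names, the rule key, and the tunnel labels are hypothetical, and the model ignores everything except the number of installed entries.

```python
class ReactiveSwitch:
    """Per-flow steering: one table entry per active flow (O(N))."""

    def __init__(self):
        self.flow_table = {}

    def flow_arrives(self, flow_id, tunnel):
        # In the reactive mode each new flow is assigned to a tunnel on arrival.
        self.flow_table[flow_id] = tunnel

    def flow_ends(self, flow_id):
        self.flow_table.pop(flow_id, None)


class ProactiveSwitch:
    """Aggregate steering: one rule for the whole manageable class (O(1))."""

    def __init__(self, tunnel):
        self.flow_table = {"manageable-class": tunnel}

    def switch_tunnel(self, tunnel):
        # A path switch rewrites the single aggregate entry.
        self.flow_table["manageable-class"] = tunnel


reactive = ReactiveSwitch()
proactive = ProactiveSwitch("tunnel-1")
for flow_id in range(1000):
    reactive.flow_arrives(flow_id, "tunnel-1" if flow_id % 2 else "tunnel-2")
proactive.switch_tunnel("tunnel-2")

print(len(reactive.flow_table), len(proactive.flow_table))  # 1000 1
```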

4.5. Discussion on System Robustness and Limitations

The effectiveness of the DTM mechanism is inherently tied to the accuracy of the reference vector R . We acknowledge that a significant forecasting error in the calculation of R can lower the system’s efficiency. If the prediction, based on historical traffic patterns, fails to account for current network conditions, the mechanism may steer traffic toward suboptimal distribution points, potentially leading to increased transit costs. To mitigate this risk, we are currently investigating methods for online prediction of the reference vector. Such a dynamic approach would allow the optimization goals to be updated in real-time, ensuring a more effective response to non-stationary demand and sudden changes in traffic profiles.
Several other operational constraints remain. High signaling latency between administrative domains could lead to oscillatory behavior in the compensation loop, as control decisions might be based on outdated telemetry. Furthermore, while the proactive mode successfully optimizes TCAM occupancy in SDN switches, its impact on packet reordering for time-critical applications is a trade-off that requires careful consideration. Finally, extending DTM to larger, more complex topologies may necessitate a different signaling approach to maintain stability, which remains a key objective for our future research.

5. Conclusions

This work investigated the relationship between signaling overhead and the economic efficiency of the Dynamic Traffic Management (DTM) framework, specifically focusing on the impact of optimization goal visibility in inter-domain inter-cloud communication. We evaluated whether the remote sending domain requires full knowledge of the global optimization target (the reference vector R ) or if cost minimization can be effectively managed using reduced-visibility modes based on the compensation vector ( C ).
Our experimental results demonstrate that a comparable level of cost-efficiency can be maintained even with a significantly reduced signaling footprint. In the main experiments conducted, we observed inter-domain traffic cost reductions ranging from 5% to as much as 29%, depending on the specific link utilized as the default BGP path for manageable traffic. While these figures represent the upper and lower bounds of our primary testbed runs against default BGP routing baselines, typical savings (that is, $1 - \xi$) observed in various experiments ranged between 8% and 15%. These findings indicate that DTM can serve as an effective tool for Operational Expenditure (OpEx) optimization, even when coordination between SDN controllers is limited to basic sign-based compensation updates.
However, the evaluation also highlights that the effectiveness of the mechanism is not universal and depends on several environmental variables. The actual cost savings are closely tied to the volume of traffic on inter-domain links, the proportion of traffic classified as manageable, and the steepness of the piecewise cost functions. A critical takeaway for ISPs deploying DTM for 95th percentile charging is the necessity of aligning the SDN controller mode with the manageable traffic profile. Specifically, the distribution of flow lengths and flow inter-arrival times should dictate the choice between reactive and proactive steering. Our results indicate that while reactive modes perform well for short-lived flows, based on an additional preliminary one-day scenario, the proactive approach shows promising potential when dealing with long-lasting flows that would otherwise remain unshiftable within the critical 5 min billing sample intervals.
Additionally, the proactive mode offers an advantage regarding switch resource management. Analytically, the reactive mode requires a separate flow table entry for every flow; in contrast, the proactive approach utilizes a single aggregated rule for the entire manageable traffic class, which leads to a substantial reduction in flow table occupancy and enhances the scalability of the system.
It is crucial to note that the operational superiority of the proactive mode regarding cost convergence and scalability comes at a strict cost to Quality of Service (QoS). Because the proactive mode abruptly redirects active, in-flight sessions to a new inter-domain link, it introduces a severe risk of out-of-order packet delivery and delay jitter due to asymmetric path characteristics. The proactive steering mode should only be deployed for delay-tolerant, bulk-transfer traffic aggregates where packet reordering is handled gracefully by the transport layer. For time-sensitive or interactive applications, the reactive mode remains the only viable option, despite its lower efficiency in managing long-lasting flows.
Our future research will focus on extending these signaling modes to topologies with more than two receiving links, evaluating the proactive mode over extended multi-day billing cycles, and investigating the impact of signaling latency on the stability of the compensation loop. Furthermore, we plan to explore the implications of proactive mode usage on time-sensitive applications, specifically assessing how out-of-order packet delivery and session transients during immediate path switching may lead to the degradation of application performance and perceived quality of service.

Author Contributions

Conceptualization, G.R. and Z.D.; methodology, G.R. and Z.D.; software, G.R., Z.D. and P.W.; validation, G.R., Z.D. and R.S.; formal analysis, G.R., Z.D. and R.S.; investigation, G.R. and Z.D.; resources, Z.D.; data curation, G.R. and Z.D.; writing—original draft preparation, G.R. and Z.D.; writing—review and editing, G.R. and Z.D.; visualization, R.S.; supervision, Z.D. All authors have read and agreed to the published version of the manuscript.

Funding

The presented work was carried out partly within the EU ICT Project SmartenIT (FP7-2012-ICT-317846). The authors would like to thank all SmartenIT partners for cooperation and valuable discussions. The research project was partly supported by the program “Excellence initiative—research university” for the AGH University of Krakow. This work was supported by the Polish Ministry of Science and Higher Education with the subvention funds of the Faculty of Computer Science, Electronics and Telecommunications of AGH University and by the Priority Research Area Digiworld under the program Excellence Initiative-Research University at the Jagiellonian University in Krakow. Moreover, this research was supported by the National Research Institute, grant number KPOD.01.18-IW.03-0009/24 on “National Laboratory for Advanced 5G Research” as part of the National Recovery and Resilience Plan, Investment A2.4.1: Investments in the Development of Research Potential.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

Author Rafał Stankiewicz and Author Piotr Wydrych are currently employed by Akamai Technologies. The work presented in this manuscript was conducted prior to their employment at Akamai Technologies, and the company was not involved in the research, funding, or preparation of this manuscript. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BG: Border Gateway
BGP: Border Gateway Protocol
DA: Data center Attachment point
DC: Data Center
DE-CIX: Deutscher Commercial Internet Exchange
DTM: Dynamic Traffic Management
GRE: Generic Routing Encapsulation
ISP: Internet Service Provider
KPI: Key Performance Indicator
MPLS: Multi-Protocol Label Switching
OpEx: Operational Expenditure
QoS: Quality of Service
SDN: Software-Defined Networking
SD-WAN: Software-Defined Wide Area Network
SLA: Service Level Agreement
TCAM: Ternary Content-Addressable Memory
WAN: Wide Area Network

References

  1. Duliński, Z.; Stankiewicz, R.; Rzym, G.; Wydrych, P. Dynamic Traffic Management for SD-WAN Inter-Cloud Communication. IEEE J. Sel. Areas Commun. 2020, 38, 1335–1351.
  2. Chen, Y.; Jain, S.; Adhikari, V.K.; Zhang, Z.L.; Xu, K. A first look at inter-data center traffic characteristics via Yahoo! datasets. In Proceedings of the 2011 Proceedings IEEE INFOCOM, April 2011; IEEE: New York, NY, USA, 2011; pp. 1620–1628.
  3. Labovitz, C.; Iekel-Johnson, S.; McPherson, D.; Oberheide, J.; Jahanian, F. Internet inter-domain traffic. SIGCOMM Comput. Commun. Rev. 2010, 41. Available online: https://api.semanticscholar.org/CorpusID:6738290 (accessed on 7 April 2026).
  4. Cisco Global Cloud Index: Forecast and Methodology, 2015–2020, White Paper Series, 2016. Available online: https://www.cisco.com/c/dam/m/en_us/service-provider/ciscoknowledgenetwork/files/622_11_15-16-Cisco_GCI_CKN_2015-2020_AMER_EMEAR_NOV2016.pdf (accessed on 7 April 2026).
  5. SD-WAN or MPLS: A Pricing Analysis. Available online: https://sd-wan-experts.com/wp-content/uploads/2017/03/Pricing-Paper.pdf (accessed on 31 January 2024).
  6. Goldenberg, D.K.; Qiu, L.; Xie, H.; Yang, Y.R.; Zhang, Y. Optimizing Cost and Performance for Multihoming. SIGCOMM Comput. Commun. Rev. 2004, 34, 79–92.
  7. Jain, S.; Kumar, A.; Mandal, S.; Ong, J.; Poutievski, L.; Singh, A.; Venkata, S.; Wanderer, J.; Zhou, J.; Zhu, M.; et al. B4: Experience with a Globally-deployed Software Defined Wan. SIGCOMM Comput. Commun. Rev. 2013, 43, 3–14.
  8. Hong, C.Y.; Mandal, S.; Al-Fares, M.; Zhu, M.; Alimi, R.; B., K.N.; Bhagat, C.; Jain, S.; Kaimal, J.; Liang, S.; et al. B4 and After: Managing Hierarchy, Partitioning, and Asymmetry for Availability and Scale in Google’s Software-defined WAN. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication; SIGCOMM ’18; ACM: New York, NY, USA, 2018; pp. 74–87.
  9. Hong, C.Y.; Kandula, S.; Mahajan, R.; Zhang, M.; Gill, V.; Nanduri, M.; Wattenhofer, R. Achieving High Utilization with Software-driven WAN. In Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM; SIGCOMM ’13; ACM: New York, NY, USA, 2013; pp. 15–26.
  10. Krishnaswamy, U.; Singh, R.; Mattes, P.; Bissonnette, P.A.C.; Bjørner, N.; Nasrin, Z.; Kothari, S.; Reddy, P.; Abeln, J.; Kandula, S.; et al. OneWAN is better than two: Unifying a split WAN architecture. In Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), Boston, MA, USA, 17–19 April 2023; pp. 515–529.
  11. Duliński, Z.; Rzym, G.; Chołda, P. MPLS-based reduction of flow table entries in SDN switches supporting multipath transmission. Comput. Commun. 2020, 151, 365–385.
  12. Zhao, G.; Wang, J.; Xu, H.; Yu, Z.; Qiao, C. COIN: Cost-Efficient Traffic Engineering with Various Pricing Schemes in Clouds. In Proceedings of the IEEE INFOCOM 2023—IEEE Conference on Computer Communications; IEEE: New York, NY, USA, 2023; pp. 1–10.
  13. Singh, R.; Agarwal, S.; Calder, M.; Bahl, V. Cost-Effective Cloud Edge Traffic Engineering with CASCARA. In Proceedings of the USENIX NSDI, Virtual, 12–14 April 2021.
  14. Gu, L.; Zeng, D.; Guo, S.; Xiang, Y.; Hu, J. A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo-Distributed Data Centers. IEEE Trans. Comput. 2016, 65, 19–29.
  15. Li, W.; Li, K.; Guo, D.; Min, G.; Qi, H.; Zhang, J. Cost-Minimizing Bandwidth Guarantee for Inter-Datacenter Traffic. IEEE Trans. Cloud Comput. 2019, 7, 483–494.
  16. Zhang, Y.; Zhang, H.; Cong, P.; Wang, W.; Xu, K. Grandet: Cost-aware Traffic Scheduling without Prior Knowledge in SD-WAN. In Proceedings of the 2023 IEEE/ACM 31st International Symposium on Quality of Service (IWQoS); IEEE: New York, NY, USA, 2023; pp. 1–10.
  17. Yue, S.; Lin, X.; Sun, W.; Hu, W. Modeling Sparse Store-and-Forward Bulk Data Transfers in Inter-Datacenter Networks with Multiple Congested Links. IEEE Trans. Cloud Comput. 2023, 11, 2725–2738.
  18. Laoutaris, N.; Sirivianos, M.; Yang, X.; Rodriguez, P. Inter-datacenter Bulk Transfers with Netstitcher. SIGCOMM Comput. Commun. Rev. 2011, 41, 74–85.
  19. Tao, X.; Ota, K.; Dong, M.; Borjigin, W.; Qi, H.; Li, K. Congestion-Aware Traffic Allocation for Geo-Distributed Data Centers. IEEE Trans. Cloud Comput. 2022, 10, 1675–1687.
  20. Zeng, G.; Bai, W.; Chen, G.; Chen, K.; Han, D.; Zhu, Y.; Cui, L. Congestion Control for Cross-Datacenter Networks. IEEE/ACM Trans. Netw. 2022, 30, 2074–2089.
  21. Dong, X.; Zhao, L.; Zhou, X.; Li, K.; Guo, D.; Qiu, T. An Online Cost-Efficient Transmission Scheme for Information-Agnostic Traffic in Inter-Datacenter Networks. IEEE Trans. Cloud Comput. 2022, 10, 202–215.
  22. Deutscher Commercial Internet Exchange (DE-CIX), 2026. Available online: https://www.de-cix.net (accessed on 7 April 2026).
Figure 1. System architecture for inter-cloud DTM.
Figure 2. Operational principle of the DTM steering mechanism.
Figure 3. Operational flowchart of the DTM mechanism.
Figure 4. Logical layout of the testbed environment.
Figure 5. Daily traffic patterns for links $L_1$ and $L_2$ under volume-based tariff in the reactive without reference mode.
Figure 6. Traffic growth during the billing period and a cost map (reactive without reference mode).
Figure 7. Compensation vector values received by remote SDN controller. SDN controller mode: “reactive with reference”.
Figure 8. Compensation vector values received by remote SDN controller. SDN controller mode: “reactive without reference”.
Figure 9. The distribution of 5 min sample pairs—long flows, reactive mode.
Figure 10. The distribution of 5 min sample pairs—long flows, proactive mode.
Table 1. Configuration of Traffic Profiles.
| Parameter | Background L1 | Background L2 | Manageable DC-DC |
| --- | --- | --- | --- |
| Flow inter-arrival | Exp (1/λ = 109.5 ms) | Exp (1/λ = 218.5 ms) | Exp (1/λ = 118 ms) |
| Flow length | Pareto (k = 3400, α = 1.5) | Pareto (k = 3400, α = 1.5) | Exp (1/λ = 2400 ms) |
| Packet inter-arrival | Exp (1/λ = 35 ms) | Exp (1/λ = 35 ms) | Exp (1/λ = 35 ms) |
| Payload size, normal mixture (bimodal), component #1 | P = 0.6: μ = 1358, σ = 25 | P = 0.6: μ = 1358, σ = 25 | P = 0.95: μ = 1358, σ = 25 |
| Payload size, normal mixture (bimodal), component #2 | P = 0.4: μ = 158, σ = 20 | P = 0.4: μ = 158, σ = 20 | P = 0.05: μ = 158, σ = 20 |
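The traffic profile in Table 1 can be illustrated with a minimal sampling sketch. It uses only Python's standard `random` module and assumes that the P = 0.6 / 0.4 payload mixture applies to background traffic; the function names and the seed are hypothetical and are not taken from the authors' traffic generator.

```python
import random


def sample_background_flow(rng, mean_interarrival_ms=109.5):
    """Draw one background flow for link L1 using the Table 1 parameters."""
    # expovariate expects the rate lambda, i.e. 1 / mean inter-arrival time.
    gap_ms = rng.expovariate(1.0 / mean_interarrival_ms)
    # paretovariate(alpha) has scale 1, so multiply by the scale k = 3400.
    length = 3400 * rng.paretovariate(1.5)
    return gap_ms, length


def sample_payload(rng, p_large=0.6):
    """Draw one payload size from the two-component normal mixture."""
    if rng.random() < p_large:
        return rng.gauss(1358, 25)  # component #1
    return rng.gauss(158, 20)       # component #2


rng = random.Random(42)  # fixed seed, chosen only for repeatability
flows = [sample_background_flow(rng) for _ in range(5)]
payloads = [sample_payload(rng) for _ in range(10000)]
mean_payload = sum(payloads) / len(payloads)
print(f"mean payload size: {mean_payload:.1f}")
```

With these mixture weights the theoretical mean payload is 0.6 × 1358 + 0.4 × 158 = 878, so the sample mean should land close to that value.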
Table 2. Transit Cost Model Parameters for Inter-domain Links.
| Tariff Type | Link | Interval [GB] | Slope a [€/GB] | Intercept b [€] |
| --- | --- | --- | --- | --- |
| Volume-based (f(x) = ax + b) | L1.1 | 0 ≤ x ≤ 168 | 0 | 200 |
| | | 168 < x ≤ 880.32 | 2.976 | −300 |
| | | x > 880.32 | 8.929 | −5540 |
| | L1.2 | 0 ≤ x ≤ 369.6 | 1.19 | 0 |
| | | 369.6 < x ≤ 559.78 | 5.952 | −1760 |
| | | x > 559.78 | 11.9 | −5092 |
| 95th Percentile (f(x) = ax + b) | L1.1 | 0 ≤ x ≤ 0.16 | 0 | 200 |
| | | 0.16 < x ≤ 0.7 | 3 × 10³ | −280 |
| | | x > 0.7 | 9 × 10³ | −4480 |
| | L1.2 | 0 ≤ x ≤ 0.18 | 2.8 × 10³ | 0 |
| | | 0.18 < x ≤ 0.34 | 7.8 × 10³ | −900 |
| | | x > 0.34 | 20.3 × 10³ | −5150 |

Note: a represents the marginal cost per GB, and b is the fixed cost component for each tier.
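The piecewise-linear tariff f(x) = ax + b in Table 2 can be evaluated with a short sketch. The tier values below are the volume-based parameters of the first link in the table; the names `VOLUME_TIERS_L11` and `transit_cost` are hypothetical and chosen only for illustration.

```python
# Volume-based tiers for the first link in Table 2:
# (upper bound of the interval in GB, slope a in EUR/GB, intercept b in EUR)
VOLUME_TIERS_L11 = [
    (168.0, 0.0, 200.0),
    (880.32, 2.976, -300.0),
    (float("inf"), 8.929, -5540.0),
]


def transit_cost(x, tiers):
    """Return f(x) = a * x + b for the tier whose interval contains x."""
    if x < 0:
        raise ValueError("traffic volume must be non-negative")
    for upper, a, b in tiers:
        if x <= upper:
            return a * x + b
    raise AssertionError("unreachable: the last tier is unbounded")


print(transit_cost(100.0, VOLUME_TIERS_L11))  # flat first tier
print(transit_cost(500.0, VOLUME_TIERS_L11))  # second tier: 2.976 * 500 - 300
```

The intercepts make adjacent tiers meet at the interval boundaries to within rounding: at x = 168 the second tier gives 2.976 × 168 − 300 ≈ 199.97, which matches the fixed 200 € of the first tier.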
Table 3. KPI values for DTM run with various SDN controller modes (volume-based tariff).
| KPI | Reactive with Reference | Reactive Without Reference | Proactive Without Reference |
| --- | --- | --- | --- |
| ξ(1) | 0.9307 | 0.9277 | 0.9333 |
| ξ(2) | 0.8228 | 0.8347 | 0.8264 |
| ΔD(1) [€] | 239.31 | 258.13 | 233.82 |
| ΔD(2) [€] | 691.86 | 656.03 | 686.85 |
| ρ | 0.99487 | 0.98926 | 0.99111 |
Table 4. Total traffic costs achieved with DTM and expected without DTM obtained for various SDN controller modes (volume-based tariff).
| Cost Item | Reactive with Reference [€] | Reactive Without Reference [€] | Proactive Without Reference [€] |
| --- | --- | --- | --- |
| Expected (estimated by R) | 3228.5 | 3347.9 | 3298.6 |
| Achieved with DTM | 3211.9 | 3312.0 | 3269.7 |
| Optimal (current period) | 3203.8 | 3294.6 | 3254.9 |
| w/o * DTM (default Link 1) | 3451.3 | 3570.1 | 3503.1 |
| w/o * DTM (default Link 2) | 3903.8 | 3968.0 | 3956.1 |

* w/o stands for without. These rows represent costs incurred under default BGP paths.
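The KPIs in Table 3 appear to follow directly from the costs in Table 4. The sketch below assumes that ξ(i) is the achieved cost divided by the cost without DTM on default link i, that ΔD(i) is the corresponding absolute saving, and that ρ is the achieved cost divided by the expected cost. These readings are inferred from the tabulated numbers rather than from a definition in this section, and the function name is hypothetical.

```python
def dtm_kpis(achieved, expected, default_costs):
    """Compute cost KPIs for one billing period (all costs in EUR).

    default_costs maps a default-link index to the cost incurred
    without DTM when all traffic follows that default BGP path.
    """
    kpis = {"rho": achieved / expected}
    for link, cost in default_costs.items():
        kpis[f"xi({link})"] = achieved / cost       # relative cost with DTM
        kpis[f"delta_D({link})"] = cost - achieved  # absolute saving
    return kpis


# "Reactive with reference" column of Table 4.
kpis = dtm_kpis(
    achieved=3211.9,
    expected=3228.5,
    default_costs={1: 3451.3, 2: 3903.8},
)
for name, value in kpis.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the results agree with the first column of Table 3 to about three decimal places; the small residual differences are consistent with the published costs being rounded to one decimal.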
Table 5. The number of C vector updates exchanged by SDN controllers during the billing period (volume-based tariff).
| Reactive with Reference | Reactive Without Reference | Proactive Without Reference |
| --- | --- | --- |
| 20,160 | 9154 | 9413 |
Table 6. KPI values for DTM run with various SDN controller modes (95th percentile tariff).
| KPI | Reactive with Reference | Reactive Without Reference | Proactive Without Reference |
| --- | --- | --- | --- |
| ξ(1) | 0.9520 | 0.9561 | 0.9545 |
| ξ(2) | 0.7117 | 0.7148 | 0.7135 |
| ΔD(1) [€] | 169.80 | 155.30 | 161.10 |
| ΔD(2) [€] | 1363.80 | 1349.30 | 1355.10 |
| ρ | 1.001 | 1.005 | 1.003 |
Table 7. Total traffic costs achieved with DTM and expected without DTM obtained for 95th percentile tariff across various SDN controller modes.
| Cost Item | Reactive with Reference [€] | Reactive Without Reference [€] | Proactive Without Reference [€] |
| --- | --- | --- | --- |
| Expected (estimated by R) | 3311.0 | 3433.5 | 3382.8 |
| Achieved with DTM | 3366.7 | 3381.2 | 3375.4 |
| Optimal (current period) | 3362.0 | 3362.0 | 3362.0 |
| w/o * DTM (default Link 1) | 3536.5 | 3536.5 | 3536.5 |
| w/o * DTM (default Link 2) | 4730.5 | 4730.5 | 4730.5 |

* w/o stands for without. These rows represent costs incurred under default BGP paths.
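The 95th percentile tariff bills on a single representative rate per period instead of the total volume. A minimal sketch of how such a billable value is commonly derived from 5 min samples is given here, assuming the nearest-rank percentile convention; the exact convention used in the experiments is not stated in this section, and the sample values are illustrative, not measured data.

```python
def percentile_95(samples):
    """Return the nearest-rank 95th percentile of the samples.

    The top 5% of samples are effectively discarded and the highest
    remaining sample becomes the billable value.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * n + 99) // 100
    return ordered[rank - 1]


# 100 illustrative 5 min samples: a steady load plus five short bursts.
samples = [0.10] * 95 + [0.60] * 5
billable = percentile_95(samples)
print(billable)  # the five bursts fall into the discarded top 5%
```

This is why short bursts on a link can be free under this tariff while sustained load is not, which makes the steering problem different from the volume-based case.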
Table 8. The number of C vector updates exchanged by SDN controllers during the billing period (95th percentile tariff).
| Reactive with Reference | Reactive Without Reference | Proactive Without Reference |
| --- | --- | --- |
| 20,160 | 9342 | 9518 |
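The signaling reduction implied by Tables 5 and 8 can be worked out with a few lines of arithmetic. The sketch below takes the "reactive with reference" column as the comparison baseline, which is an assumption made for illustration; the dictionary layout and function name are hypothetical.

```python
# Number of C vector updates per billing period, from Tables 5 and 8.
updates = {
    "volume-based": {
        "reactive with reference": 20160,
        "reactive without reference": 9154,
        "proactive without reference": 9413,
    },
    "95th percentile": {
        "reactive with reference": 20160,
        "reactive without reference": 9342,
        "proactive without reference": 9518,
    },
}


def reduction_vs_baseline(counts, baseline="reactive with reference"):
    """Return the fraction of updates saved by each mode relative to the baseline."""
    base = counts[baseline]
    return {
        mode: 1.0 - n / base
        for mode, n in counts.items()
        if mode != baseline
    }


reductions = {tariff: reduction_vs_baseline(c) for tariff, c in updates.items()}
for tariff, by_mode in reductions.items():
    for mode, frac in by_mode.items():
        print(f"{tariff}, {mode}: {frac:.1%} fewer updates")
```

In all four cases the modes without reference exchange a little under half as many C vector updates as the baseline column.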
Table 9. Configuration Parameters for Long-lasting Manageable Flows.
| Parameter | Distribution | Parameters |
| --- | --- | --- |
| Flow inter-arrival | Exponential | 1/λ = 800 ms |
| Flow length | Lognormal | μ = 11.5474, σ = 0.9497 |
| Packet inter-arrival | Exponential | 1/λ = 35 ms |
| Payload size | Bimodal mix (normal mixture), component #1 | P = 0.95: μ = 1358, σ = 25 |
| | Bimodal mix (normal mixture), component #2 | P = 0.05: μ = 158, σ = 20 |
Table 10. Aggregated monetary performance and convergence KPIs under “unfriendly” traffic patterns (total for all inter-domain links).
| KPI | Reactive Mode | Proactive Mode |
| --- | --- | --- |
| Expected Cost [€] (D_R) | 3353.6 | 3088.4 |
| Achieved Cost [€] (D) | 3471.0 | 3110.0 |
| Absolute Cost Deviation [€] | +117.4 | +21.6 |
| Convergence KPI ρ (D / D_R) | 1.035 | 1.007 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Rzym, G.; Duliński, Z.; Stankiewicz, R.; Wydrych, P. Impact of Optimization Goal Visibility on Inter-Cloud DTM Performance. Electronics 2026, 15, 1576. https://doi.org/10.3390/electronics15081576

