An Adaptive Application-Aware Dynamic Load Balancing Framework for Open-Source SD-WAN
Abstract
1. Introduction
- An adaptive load-balancing framework that accommodates traffic routing based on real-time network and application metrics.
- Implementation in an open-source SD-WAN environment, ensuring cost-effective and scalable deployment.
- Comprehensive performance evaluation demonstrates a 7.09% latency reduction and a 9.80% improvement in RAM efficiency compared to traditional approaches.
2. Related Work
2.1. Proprietary vs. Open-Source SD-WAN Platforms
2.2. Load Balancing in SD-WAN
3. System Design and Methodology
3.1. Adaptive Load-Balancing Scheme
Algorithm 1: Weight Generating Algorithm
Input: instances (application instances), metrics (CPU, memory, etc.), weights_config (user-defined)
Output: weights (Fabio weights for each instance)
weights_config ← user-defined weight percentages
for each instance in instances do
    Extract metrics:
end for
- Data Acquisition and Metrics Derivation: The algorithm relies on two primary categories of data: Host-Level Metrics, which include CPU and RAM utilization and the trend coefficient, and Connection-Level Metrics, which include the average connection duration, average bytes sent and received, and connection error rate.
- Normalization: Each metric is normalized to a standard scale of 0–100. For instance, CPU utilization is inversely scaled, so lower utilization yields a higher normalized score. Connection metrics are normalized against predefined ideal ranges using scenario-specific processing functions.
- Weighting and Scoring: Each metric is assigned a weight percentage, reflecting its importance in the final calculation. The normalized scores are multiplied by their respective weights to compute the weighted scores.
- Weight Mapping: The final weighted score is scaled to an integer and added as a tag to a specific app instance in our service registry. The higher the score, the more resources the instance has, and the more traffic it can handle. This enables the load balancer to allocate more traffic to application instances with more resources, i.e., less burdened ones, thereby optimizing our system’s performance.
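The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `generate_weights`, the dictionary layout, and the two-metric sample configuration are hypothetical, and the per-metric scores are assumed to be already normalized to 0–100.

```python
from decimal import Decimal, ROUND_HALF_UP


def generate_weights(normalized, weights_config):
    """Map per-instance normalized scores (0-100) to integer weights.

    normalized:     {instance: {metric: score}}   (hypothetical layout)
    weights_config: {metric: percentage}, expected to sum to 100
    """
    total_pct = sum(weights_config.values())
    if total_pct != 100:
        raise ValueError(f"weight percentages sum to {total_pct}, expected 100")

    weights = {}
    for instance, scores in normalized.items():
        # Weighted score: each normalized score times its percentage share.
        weighted = sum(
            Decimal(str(scores[metric])) * Decimal(str(pct)) / Decimal(100)
            for metric, pct in weights_config.items()
        )
        # Scale to an integer weight; exact halves round up.
        weights[instance] = int(weighted.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    return weights


# Hypothetical sample data: two metrics weighted 50/50.
sample_config = {"cpu": 50, "ram": 50}
sample_scores = {
    "app-1": {"cpu": 80, "ram": 60},
    "app-2": {"cpu": 30, "ram": 45},
}
sample_weights = generate_weights(sample_scores, sample_config)
```

In this sample, the less burdened instance (`app-1`) ends up with the larger integer weight, which is the value that would be attached to it as a tag in the service registry.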
3.2. Implementation
3.3. Performance Metrics
- Host-Level Metrics: CPU and memory utilization data are collected using a dedicated monitoring service (e.g., Zabbix). A key derived metric is the trend coefficient, calculated from current 1 min interval metrics grouped by service IP address or domain name, which indicates increasing or decreasing resource load to support proactive traffic redirection.
- Connection-Level Metrics: Real-time network traffic, including communication between the load balancer and application instances, is monitored via Zeek logs. Essential metrics are extracted from logs (e.g., conn.log, http.log), stored in Elasticsearch, and periodically queried to supply data for processing.
- Score Normalization: This stage applies predefined functions to each metric, converting raw values into a standardized score. For host-level metrics like CPU and RAM utilization, scores are inversely proportional to their percentage (e.g., <10% utilization yields 100 points; 70% yields 30 points). Connection-level metrics such as average connection duration are mapped to ideal ranges (e.g., 0.1–0.2 s scoring 100 points), with scores proportionally reduced for values exceeding optimal thresholds based on custom formulae and user-defined cutoff values (e.g., a 0.3 s duration with a 0.2 s cutoff and 20-point penalty per 0.1 s interval results in 80 points). The connection error ratio is normalized to yield higher scores for lower error rates, incorporating specific user-defined cutoffs for ideal performance.
- Weight Assignment and Scoring: Metrics are weighted according to their importance to service needs, determined heuristically by combining domain knowledge with expert assessment under real-environment conditions. Our own experience and detailed understanding of specific network traffic requirements further informed these choices. For example:
  - CPU and RAM utilization are each assigned a 15% weight due to their direct impact on an instance’s traffic handling capacity.
  - The trend coefficient receives 20% for its role in anticipating resource demands.
  - Average connection duration, indicating service responsiveness, also receives 20%, sharing the highest weight with the trend coefficient.
  - Bytes sent and received are each weighted at 10%, providing network bandwidth insight.
  - The connection error ratio receives 10%, as it is significant for identifying potential service failures.
- Weight Mapping: To ensure compatibility with Fabio’s weight system, the total weighted score for each application instance is rounded to the nearest integer (e.g., 56.5 points rounded to 57 points). This rounded value is then converted into a tag descriptor and attached to the corresponding application instance within its service group in Consul, enabling Fabio to dynamically distribute traffic.
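The worked numbers in the list above (70% utilization giving 30 points, a 0.3 s duration giving 80 points, and 56.5 rounding to 57) can be reproduced with the sketch below. It is an illustration under assumptions rather than the published formulae: the function names are hypothetical, the linear form `100 - percent` between the two quoted utilization examples is assumed, durations below the ideal range are assumed to score 100, and the sample scores for the remaining metrics are made up so that the total lands on 56.5.

```python
import math

# Weight percentages as listed above; they sum to 100.
WEIGHTS = {
    "cpu": 15, "ram": 15, "trend": 20, "conn_duration": 20,
    "bytes_sent": 10, "bytes_received": 10, "error_ratio": 10,
}


def utilization_score(percent):
    """Inverse score for CPU/RAM utilization (70% -> 30 points).

    Below 10% the score is 100, as in the text; the linear
    100 - percent rule elsewhere is an assumption.
    """
    if percent < 10:
        return 100.0
    return max(0.0, 100.0 - percent)


def duration_score(seconds, cutoff=0.2, penalty=20.0, step=0.1):
    """Score for average connection duration.

    Durations up to `cutoff` score 100; beyond it the score drops by
    `penalty` points per `step` seconds, proportionally.
    """
    if seconds <= cutoff:
        return 100.0
    excess_steps = (seconds - cutoff) / step
    # round() removes float noise such as 80.00000000000001.
    return max(0.0, round(100.0 - penalty * excess_steps, 6))


def total_weight(scores):
    """Weighted sum of normalized scores, rounded half up to an integer."""
    total = sum(scores[m] * pct / 100 for m, pct in WEIGHTS.items())
    # Python's built-in round() rounds halves to even (round(56.5) == 56),
    # so floor(x + 0.5) is used to obtain 56.5 -> 57 as in the text.
    return math.floor(total + 0.5)


example_scores = {
    "cpu": utilization_score(70),          # 30.0
    "conn_duration": duration_score(0.3),  # 80.0
    # The remaining scores are hypothetical sample values.
    "ram": 50, "trend": 60, "bytes_sent": 50, "bytes_received": 50,
    "error_ratio": 65,
}
example_weight = total_weight(example_scores)  # weighted sum is 56.5
```

The rounding step deserves attention in any Python implementation: relying on the built-in `round` would turn 56.5 into 56, not 57.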
4. Experimental Setup and Testing
4.1. Network Topology and Node Distribution
4.2. Virtual Infrastructure and Deployment Environment
4.3. Traffic Generation Model
- Arrival Process: Each thread initiates a request batch after a randomized idle interval sampled from a uniform distribution over the interval [1, 20] s, effectively simulating asynchronous user activity and preventing traffic burst synchronization.
- Request Volume per Batch: The number of requests per batch is drawn from a discrete uniform distribution over the range [10, 100], so that burst sizes vary as they do in real-world request patterns.
- Request Types: The generated traffic consists of HTTP REST API calls: GET requests (≈70%) and write operations, namely POST, INSERT, and UPDATE (≈30%). This composition reflects typical enterprise workloads, which are read-heavy but whose write operations still place significant demands on processing resources and bandwidth.
- Payload Characteristics:
  - GET requests are usually simple and require little server-side processing.
  - POST/INSERT/UPDATE requests carry synthetically generated payloads ranging from 256 bytes to 16 KB, based on empirical distributions from prior real-world measurements of enterprise application traffic [21].
- Randomized temporal distribution, avoiding periodicity;
- Statistical variation in intensity and payload, simulating bursty and heterogeneous demand;
- Protocol consistency, using standardized HTTP/1.1 over TCP with client-initiated transactions.
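The sampling rules of the traffic model can be sketched as follows. This is a hedged illustration, not the generator used in the experiments: the function name `sample_batch` is hypothetical, the equal split among the three write operations is an assumption, and payload sizes are drawn uniformly here because the empirical distribution referenced above is not reproduced in the text.

```python
import random


def sample_batch(rng):
    """Draw one request batch following the distributions described above."""
    idle_s = rng.uniform(1.0, 20.0)    # idle interval before the batch, in seconds
    n_requests = rng.randint(10, 100)  # discrete uniform, both ends inclusive
    requests = []
    for _ in range(n_requests):
        if rng.random() < 0.70:        # roughly 70% read operations
            requests.append(("GET", 0))
        else:                          # roughly 30% write operations
            method = rng.choice(["POST", "INSERT", "UPDATE"])
            # Uniform payload size between 256 bytes and 16 KB (assumption).
            requests.append((method, rng.randint(256, 16 * 1024)))
    return idle_s, requests


# A seeded generator keeps a test run reproducible.
rng = random.Random(42)
idle_s, requests = sample_batch(rng)
```

Each generator thread would sleep for `idle_s` seconds and then issue the sampled requests; giving every thread its own `random.Random` instance keeps the idle intervals independent, which is what prevents burst synchronization.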
4.4. Testing Scenarios
- Test Scenario 1—Backend Instance Failure: To replicate a real-world failure condition, one backend instance (Instance 3) was manually taken offline between minutes 30 and 40. During this period, we measured CPU and RAM utilization, connection durations, and error rates. The goal was to evaluate system response and check each algorithm’s ability to reroute traffic and restore service after reintegrating the failed instance.
- Test Scenario 2—Load Distribution and Latency Due to Service Placement: To analyze sensitivity to inter-WAN delays, the central MySQL database was hosted in WAN-1. Traffic was then analyzed to compare instances located within the same WAN (low latency) with those in different WANs (higher latency). App instances in a different WAN are expected to show a longer average connection time and a higher error rate than those on the same network as the database, which in turn affects how the weighting algorithm allocates traffic. This lets us assess the algorithm’s sensitivity to network topology and inter-WAN latencies.
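To make the expected effect of both scenarios concrete, the sketch below computes how traffic shares shift when an instance goes offline or when some instances carry lower weights. It assumes traffic is split in direct proportion to instance weights, which is a simplification and not necessarily how Fabio resolves weights internally; the function name and the sample weights are hypothetical.

```python
def traffic_shares(weights, offline=()):
    """Share of traffic per instance, proportional to its weight.

    Instances listed in `offline` (or with a non-positive weight)
    receive no traffic; the remaining shares sum to 1.
    """
    active = {
        name: w for name, w in weights.items()
        if name not in offline and w > 0
    }
    total = sum(active.values())
    if total == 0:
        return {}
    return {name: w / total for name, w in active.items()}


# Hypothetical weights: WAN-2 instances score lower because of
# longer connection durations to the database in WAN-1.
sample_weights = {"app-1": 60, "app-2": 50, "app-3": 50, "app-4": 40}

normal = traffic_shares(sample_weights)
during_failure = traffic_shares(sample_weights, offline={"app-3"})
```

With these sample values, taking `app-3` offline raises the share of `app-1` from 30% to 40%, while the relative ordering of the remaining instances is unchanged.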
5. Results and Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wang, G.; Zhao, Y.; Huang, J.; Wu, Y. An effective approach to controller placement in software defined wide area networks. IEEE Trans. Netw. Serv. Manag. 2017, 15, 344–355.
- Gawande, S. The Role of SD-WAN in Facilitating Multi-Cloud Connectivity. Int. J. Innov. Sci. Res. Technol. 2025, 10, 1274–1280.
- NetworkCentre. Network Management with SD-WAN. Available online: http://www.networkcentre.net/network-management-with-sd-wan/ (accessed on 1 October 2024).
- Mine, G.; Hai, J.; Jin, L.; Huiying, Z. A design of SD-WAN-oriented wide area network access. In Proceedings of the International Conference on Computer Communication and Network Security (CCNS), Xi’an, China, 21–23 August 2020.
- Troia, S.; Zorello, L.M.M.; Maralit, A.J.; Maier, G. SD-WAN: An open-source implementation for enterprise networking services. In Proceedings of the 22nd International Conference on Transparent Optical Networks (ICTON), Bari, Italy, 19–23 July 2020.
- Yang, Z.; Cui, Y.; Li, B.; Liu, Y.; Xu, Y. Software-defined wide area network (SD-WAN): Architecture, advances and opportunities. In Proceedings of the 28th International Conference on Computer Communication and Networks (ICCCN), Valencia, Spain, 29 July–1 August 2019.
- Segeč, P.; Moravčik, M.; Uratmová, J.; Papán, J.; Yeremenko, O. SD-WAN-architecture, functions and benefits. In Proceedings of the 18th International Conference on Emerging eLearning Technologies and Applications (ICETA), Kosice, Slovakia, 12–13 November 2020.
- Zouini, M.; El Mantar, Z.; Rouboa, N.; Bensaoud, O.; Outzourhit, A.; Bahnasse, A. Towards a Modern ISGA Institute Infrastructure Based on Fortinet SD-WAN Technology: Recommendations and Best Practices. Procedia Comput. Sci. 2022, 210, 311–316.
- Gooley, J.; Yanch, D.; Schuemann, D.; Curran, J. SD-WAN Architecture and Deployment. In Cisco Software-Defined Wide Area Networks: Designing, Deploying and Securing Your Next Generation WAN with Cisco SD-WAN, 1st ed.; Graziani, R., Lynn, A.M., Eds.; Cisco Press: Indianapolis, IN, USA, 2020; pp. 77–112.
- Troia, S.; Zorello, L.M.M.; Maier, G. SD-WAN: How the control of the network can be shifted from core to edge. In Proceedings of the 25th International Conference on Optical Network Design and Modeling (ONDM), Gothenburg, Sweden, 28 June–1 July 2021.
- Ouamri, M.A.; Alharbi, T.; Singh, D.; Sylia, Z. A comprehensive survey on software-defined wide area network (SD-WAN): Principles, opportunities, and future challenges. J. Supercomput. 2025, 81, 291.
- Cheimaras, V.; Papagiakoumos, S.; Peladarinos, N.; Trigkas, A.; Papageorgas, P.; Piromalis, D.D.; Munteanu, R.A. Low-Cost, Open-Source, Experimental Setup Communication Platform for Emergencies, Based on SD-WAN Technology. Telecom 2024, 5, 347–368.
- Rajagopalan, S. An overview of SD-WAN load balancing for WAN connections. In Proceedings of the 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 5–7 November 2020.
- Vdovin, L.; Likin, P.; Vilchinskii, A. Network utilization optimizer for SD-WAN. In Proceedings of the International Science and Technology Conference (Modern Networking Technologies—MoNeTeC), Moscow, Russia, 28–29 October 2014.
- Gillgallon, R.; Almutairi, R.; Bergami, G.; Morgan, G. SimulatorOrchestrator: A 6G-Ready Simulator for the Cell-Free/Osmotic Infrastructure. Sensors 2025, 25, 1591.
- VyOS—Open Source Router and Firewall Platform. Available online: https://vyos.io/ (accessed on 11 December 2024).
- Emmanuel, I.D. Proposed New SD-WAN Architecture to Facilitate Dynamic Load Balancing. Master’s Thesis, University of Salford, Salford, UK, 2024.
- Hama Amin, R.R.; Ahmed, D.H. Comparative Analysis of Flexiwan, OPNSense, and pfSense Cybersecurity Mechanisms in MPLS/SD-WAN Architectures. Passer J. Basic Appl. Sci. 2023, 6, 27–32.
- Ouamri, M.A.; Barb, G.; Singh, D.; Alexa, F. Load balancing optimization in software-defined wide area networking (SD-WAN) using deep reinforcement learning. In Proceedings of the International Symposium on Electronics and Telecommunications (ISETC 2022), Timisoara, Romania, 10–11 November 2022.
- Duliński, Z.; Stankiewicz, R.; Rzym, G.; Wydrych, P. Dynamic traffic management for SD-WAN inter-cloud communication. IEEE J. Sel. Areas Commun. 2020, 38, 1335–1351.
- Patil, A.G.; Surve, A.R.; Gupta, A.K.; Sharma, A.; Anmulwar, S. Survey of synthetic traffic generators. In Proceedings of the 2016 International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 26–27 August 2016; pp. 1–3.
SD-WAN Platform | Advantages | Constraints |
---|---|---|
Cisco SD-WAN | Integration with Cisco devices; advanced functionalities (traffic optimization, security); centralized management (vManage) | High cost of licenses and support; complicated configuration; dependence on the Cisco ecosystem |
VeloCloud (VMware SD-WAN) | Simplified management through the VMware platform; intelligent traffic routing | Scalability in large networks; integration with VMware solutions |
Fortinet SD-WAN | Integration with FortiGate firewalls; advanced security functionalities (IPS, VPN, antivirus); traffic optimization (QoS, load balancing); centralized management (FortiManager) | Scalability in large networks; integration with VMware solutions |
OpenWrt | Open solution; flexibility and adaptability; large community and support; easy customization of functionality | Complicated setup; limited advanced functionality; lack of official support; need for technical expertise |
VyOS | Free solution; flexibility and adaptability; compatibility with different devices and networks | Limited functionality compared to proprietary solutions; lack of official support; difficult to implement and configure |
pfSense | Free and open-source; strong security and VPN options; compatibility with various network devices; large community | Technical knowledge required for implementation; limited scalability compared to proprietary solutions |
Metric | Description and Functional Role |
---|---|
CPU Utilization | Measures the processing load on each backend instance. High CPU usage decreases the instance’s likelihood of receiving new traffic. |
RAM Utilization | Indicates memory pressure. Instances with elevated memory usage may respond slower and are deprioritized in load distribution. |
Trend Coefficient | Reflects the temporal change in CPU and RAM utilization over a defined interval (e.g., 1 min). A positive trend suggests increasing load, while a negative trend implies recovery. This metric enables proactive traffic redirection. |
Average Connection Duration | Represents how long requests stay active. Longer durations often indicate performance degradation or saturation and reduce the instance’s routing weight. |
Bytes Sent and Received | Quantifies bandwidth usage. Sustained high throughput may indicate either high efficiency or network saturation and is interpreted in conjunction with other metrics. |
Connection Error Ratio | Tracks the percentage of failed connection attempts. A rising error ratio signals instability or overload and results in penalizing the instance within the routing decision process. |
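The trend coefficient in the table above is described only qualitatively. One plausible way to compute such a value, shown here as an assumption and not as the formula used in the paper, is the least-squares slope of the utilization samples collected within the interval. The function name `trend_coefficient` and the sample series are hypothetical.

```python
def trend_coefficient(samples):
    """Least-squares slope of evenly spaced utilization samples.

    The result is in percentage points per sample: positive when load
    is rising, negative when the instance is recovering.
    """
    n = len(samples)
    if n < 2:
        return 0.0  # a single sample carries no trend information
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical CPU utilization samples (percent) within one interval.
rising = trend_coefficient([40, 45, 50, 55])
recovering = trend_coefficient([60, 50, 40])
```

A slope fitted over all samples is less sensitive to a single noisy reading than the difference between the first and last sample, which matters when the value is used to redirect traffic proactively.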
Component | Count | Resources (vCPU/RAM) | Purpose |
---|---|---|---|
Application Instances | 4 | 2 vCPU/2 GB | Backend APIs |
Database Server | 1 | 2 vCPU/2 GB | Central data storage (WAN-1) |
Load Balancer | 1 | 4 vCPU/8 GB | Fabio |
Controller | 1 | 6 vCPU/16 GB | SD-WAN management |
Monitoring and Logging | 2 | 4 vCPU/8 GB | Zeek, Logstash, Filebeat, Elasticsearch |
Traffic Generators | 3 | 2 vCPU/4 GB | Simulated user request generation |
Node | Network Domain | Purpose | Traffic Type |
---|---|---|---|
LB | Central Hub | Load Balancer | Receives and forwards traffic |
Controller | Central Hub | Controller, Data Processor | Central data storage |
Log Server | Central Hub | Log Server | Monitoring, Logging |
APP-1 | WAN-1 | App Instance 1 | REST API, Database Access |
APP-2 | WAN-1 | App Instance 2 | REST API, Database Access |
DB | WAN-1 | MySQL Database | Bulk SQL Transactions |
APP-3 | WAN-2 | App Instance 3 | REST API, Database Access |
APP-4 | WAN-2 | App Instance 4 | REST API, Database Access |
TG1 | External | Traffic Generator 1 | REST Requests |
TG2 | External | Traffic Generator 2 | WebRTC Requests (UDP) |
TG3 | External | Traffic Generator 3 | SQL (Database Queries) |
Algorithm | Avg Response Time (ms) | Standard Deviation | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|
DRR | 160.21 | 12.05 | 138.9 | 178.5 |
AADLB | 163.69 | 10.30 | 148.8 | 176.6 |
WRR | 169.23 | 12.31 | 134.0 | 188.0 |
PQ | 160.77 | 12.87 | 148.1 | 200.6 |
WFQ | 172.90 | 12.26 | 140.6 | 199.2 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Petrović, T.; Vidaković, A.; Doknić, I.; Veinović, M.; Bojović, Ž. An Adaptive Application-Aware Dynamic Load Balancing Framework for Open-Source SD-WAN. Sensors 2025, 25, 5516. https://doi.org/10.3390/s25175516