On Optimized Scheduling Scheme for Rapid Pod Autoscaling in Kubernetes
Abstract
1. Introduction
1.1. Challenges
- Performance Efficiency: One of the primary challenges is improving the scheduler's performance, particularly in environments that require rapid scaling. The default Kubernetes scheduler can struggle with scalability and speed because of its complex per-pod calculations and the absence of a state-persistent mechanism.
- Cache Utilization: A significant technical hurdle is implementing a caching mechanism that speeds up scheduling by avoiding redundant computations while also keeping the cache up to date with the latest state of the cluster.
- Integration with Existing Systems: The custom scheduler must integrate seamlessly with the existing Kubernetes infrastructure without disrupting ongoing operations.
- Maintaining Consistency and Reliability: The scheduler must consistently make optimal scheduling decisions under varying conditions while maintaining high reliability and fault tolerance.
1.2. Positioning and Contributions
- (1) A ScoreKey that captures scheduling-relevant pod features (resource requests and constraint hashes) to define similarity for score reuse;
- (2) A rotating cache that returns only the current top-ranked node for a ScoreKey and pops it after each successful scheduling to avoid repeatedly selecting the same node;
- (3) Correctness and freshness safeguards via feasible-set change detection and periodic cache refresh after a configurable consumption threshold; and
- (4) An evaluation on a real GKE cluster comparing against the default scheduler, Koordinator, and YuniKorn under burst deployment.
1.3. Difference to Prior Reuse/Batching Proposals
1.4. Paper Organization
2. Background
2.1. Container Orchestration
2.2. Kubernetes Architecture
- etcd: This consistent key-value store holds the cluster's configuration and intended state.
- Scheduler: It allocates pods across available worker nodes.
- API Server: This serves as the communication hub for issuing commands and managing Kubernetes objects, which are durable entities denoting the cluster's state. The API server exposes a RESTful HTTP API that describes objects in JSON or YAML formats. Commands can also be sent to the API server using Kubernetes' command-line interface (CLI), kubectl.
- Controller Manager: It watches the cluster state (persisted in etcd and exposed through the API server) for changes and drives the system towards the desired state. Well-known Kubernetes controllers such as ReplicaSet, Deployment, Job, and DaemonSet offer various functionalities, including maintaining availability, enabling rollbacks, managing task execution, and ensuring a pod is active on every node.
3. Scheduling in Kubernetes
- Identify all nodes that can host the pod, i.e., nodes whose available CPU and memory satisfy the pod’s requirements.
- For every eligible node, retrieve the relevant metric (in this example, available memory).
- Select the node(s) with the largest metric value.
The pod life cycle is illustrated in Figure 3.
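The three steps above can be sketched as a small filter-then-select routine. This is only an illustration: the `Node` record, its field names, and the `pick_nodes` helper are hypothetical and are not part of Kubernetes or of the scheduler described in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical, simplified view of a worker node's free capacity.
    name: str
    free_cpu_millis: int
    free_mem_bytes: int


def pick_nodes(nodes, cpu_request_millis, mem_request_bytes):
    """Return the names of the feasible node(s) with the most free memory."""
    # Step 1: keep only nodes whose free CPU and memory cover the request.
    feasible = [
        n for n in nodes
        if n.free_cpu_millis >= cpu_request_millis
        and n.free_mem_bytes >= mem_request_bytes
    ]
    if not feasible:
        return []
    # Step 2: read the metric (free memory) for every feasible node.
    best = max(n.free_mem_bytes for n in feasible)
    # Step 3: keep only the node(s) with the largest metric value.
    return [n.name for n in feasible if n.free_mem_bytes == best]


GIB = 1024 ** 3
cluster = [
    Node("node-a", 1000, 4 * GIB),
    Node("node-b", 500, 8 * GIB),
    Node("node-c", 2000, 8 * GIB),
]
print(pick_nodes(cluster, cpu_request_millis=600, mem_request_bytes=1 * GIB))
```

With a 600 millicore request, node-b is filtered out for lack of CPU even though it ties for the most free memory, so only node-c is returned.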
3.1. User Specifications
3.2. Internal Workflow
Default Scheduler
- The scheduler maintains a queue, known as the podQueue, and continually watches the API Server for updates.
- When a pod is created, its metadata is first recorded in etcd via the API Server.
- Operating much like a controller, the default scheduler observes these updates and reacts to them: it looks for pods that have not yet been assigned to any node and adds each one to the podQueue.
- The primary function of the scheduler is to remove pods from the podQueue one at a time and allocate each to the most suitable node available.
- Once a pod is assigned to a node, this binding is recorded in etcd and communicated to the kubelet on the respective worker node.
- The kubelet, which runs on the worker node and keeps track of pod assignments, then starts the newly assigned pod on that node.
The scheduler's core logic rotates through the nodes using a round-robin method, and for each pod awaiting assignment, it executes filtering and scoring steps to determine the optimal node.
4. Methodology
4.1. Deficiency Analysis of Default Scheduling Algorithm
4.1.1. Default Scoring Algorithm
4.1.2. Horizontal Pod Autoscaling (HPA) Burst Scaling Scenarios
Definition and Background
What We Mean by “HPA Scenarios”
- Burstiness: Many Pods are created nearly back-to-back (often within seconds), producing a spike of pending Pods in the scheduler queue.
- Homogeneity: The new Pods are typically replicas of the same template and hence share nearly identical scheduling-relevant properties (resource requests/limits, labels/selectors, and often the same affinity/toleration/topology constraints).
- Strict latency requirement: Service quality depends on how quickly new replicas become Ready; therefore, scheduling latency becomes part of end-to-end scale-out latency.
Typical Application Scenarios
- Traffic spikes in online services: Sudden increases in requests (e.g., flash sales, ticketing events, time-limited promotions) cause CPU/QPS/latency metrics to cross the HPA threshold, triggering rapid replica expansion.
- Event-driven backlogs: Message queues or stream processors scale out when queue length grows (custom metrics), producing many workers with identical Pod templates.
- Multi-tenant microservices: A shared platform experiences correlated load increases across multiple services, creating concurrent bursts of similar replicas.
- Failover and recovery: Node restarts or transient failures may temporarily reduce capacity; when capacity recovers, controllers may recreate many missing replicas quickly.
Why Scheduling Becomes a Bottleneck in HPA Bursts
Default Scheduler Pipeline
- Snapshot and feasibility: Obtain a scheduling snapshot and run PreFilter/Filter (and extenders if enabled) to produce a feasible node set F.
- Scoring: Run enabled Score plugins over (typically all) feasible nodes to produce node scores, normalize them, and compute a total score for each node.
- Selection and binding: Select a host (including tie-breaking) and then execute the binding cycle (Reserve/Permit/Bind and related hooks).
Limitations of the Default Scheduler Under HPA Burst Workloads
- Redundant scoring for near-identical Pods. When a workload creates many replicas, the scheduling-relevant inputs of consecutive Pods (requests, constraints, and often feasible node sets) are highly similar. Nevertheless, the default scheduler re-executes the full scoring pipeline for each Pod, repeatedly evaluating the same Score plugins over largely the same node set. This causes redundant computation that does not directly improve the decision quality for every replica.
- Score-stage cost grows with cluster size and plugin complexity. Let $|F|$ denote the number of feasible nodes and let $\mathcal{P}$ be the set of enabled Score plugins. The per-Pod scoring cost is roughly proportional to $|F| \cdot |\mathcal{P}|$, and some plugins (e.g., those involving affinity/topology reasoning or non-trivial resource calculations) are significantly more expensive than simple resource checks. In a burst, this cost is multiplied by the number of new replicas, amplifying scheduler CPU consumption and latency.
- Queueing and throughput degradation under burst arrivals. HPA can create Pods faster than the scheduler can complete scoring and binding, which increases the pending queue length. As queue length grows, Pods spend more time waiting before entering the scheduling cycle, increasing the overall scale-out time (create → scheduled → ready). In practice, this queueing delay can dominate for large bursts.
- Limited reuse across consecutive scheduling events. While the scheduler maintains internal caches for certain computations, the default pipeline does not provide a general mechanism to reuse node-score rankings across consecutive Pods that are scheduling-equivalent. As a result, even when consecutive Pods would yield nearly identical node rankings, the scheduler recomputes scores from scratch rather than amortizing the cost across a burst.
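To make the amplification argument above concrete, the following back-of-the-envelope sketch counts Score-plugin invocations for a burst. It is not a measurement: it assumes every plugin call costs the same, that the feasible set never changes during the burst, and that a cached ranking is discarded once more than a given fraction of it has been consumed. The function names are illustrative only.

```python
def default_score_calls(replicas, feasible_nodes, plugins):
    """Every Pod triggers one full scoring pass: |F| * |P| plugin calls."""
    return replicas * feasible_nodes * plugins


def reuse_score_calls(replicas, feasible_nodes, plugins, refresh_fraction=0.5):
    """Plugin calls when a ranked list is reused until it is refreshed.

    Assumption: one node is consumed per Pod, and the ranking is recomputed
    once popped / initial_size exceeds refresh_fraction (or the list is empty).
    """
    if feasible_nodes <= 0:
        return 0
    passes = 0
    popped = 0
    valid = False
    for _ in range(replicas):
        if not valid:
            passes += 1      # full scoring pass over all feasible nodes
            popped = 0
            valid = True
        popped += 1          # this Pod consumes the current top candidate
        if popped / feasible_nodes > refresh_fraction or popped == feasible_nodes:
            valid = False    # next Pod must recompute the ranking
    return passes * feasible_nodes * plugins


print(default_score_calls(200, 100, 10))  # one pass per Pod
print(reuse_score_calls(200, 100, 10))    # one pass per refresh interval
```

Under these assumptions the saving grows with the number of feasible nodes, because a larger ranking serves more Pods before the refresh threshold is reached.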
4.2. Custom Scheduling Algorithm Design
4.2.1. Similarity Definition and Cache Applicability
ScoreKey (Pod Equivalence Signature)
Feasibility-Context Fingerprint (feasibleHash)
4.2.2. Caching Mechanism
Cache Entry Structure and Update Rules
Insertion (Seeding)
Selection (Pop-and-Advance Top-1)
Invalidation and Bounded Refresh
Capacity Control and Concurrency
4.2.3. Cache Content
Cache Key
Cache Entry Structure
- feasibleHash is a fingerprint of the current feasible node set produced by the Filter stage (computed by hashing the sorted feasible node names). It binds the cached ranking to the feasibility context under which it was computed, and prevents reusing rankings when feasibility changes.
- L is the core cached content: an ordered list of scored node records sorted by descending total score. Each element corresponds to one feasible node and stores a record $(n_i, s_i, \mathbf{p}_i)$, where $n_i$ is the node identity (e.g., node name), $s_i$ is the final aggregated score used for ranking, and (optionally) $\mathbf{p}_i$ is a vector of per-plugin scores (useful for debugging and analysis but not required for selection). The list is sorted primarily by $s_i$ (descending), i.e., $s_1 \ge s_2 \ge \dots \ge s_{|L|}$. If multiple nodes have the same total score, we apply a deterministic tie-breaker on node name to ensure stable, repeatable behavior.
- initialN records the size of L at insertion time, i.e., when the entry is seeded. This value is used to compute a consumption ratio for bounded refresh.
- popped counts how many candidates have already been consumed from the front of the list since the last recomputation. It reflects how far the cache entry has progressed through its ranking.
- ts (optional) is a timestamp for the last recomputation, used for diagnostics and performance measurement (not strictly required by the algorithm).
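The entry fields listed above map naturally onto a small record type. The sketch below is one possible shape, not the authors' implementation: the class and function names are illustrative, SHA-256 is an assumed choice of hash, and ascending node name is an assumed direction for the deterministic tie-breaker.

```python
import hashlib
import time
from dataclasses import dataclass, field


def feasible_hash(node_names):
    """Fingerprint a feasible set by hashing the sorted node names."""
    joined = "\n".join(sorted(node_names))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


@dataclass
class ScoredNode:
    name: str                                        # node identity
    total: int                                       # aggregated score used for ranking
    per_plugin: dict = field(default_factory=dict)   # optional, for debugging only


@dataclass
class CacheEntry:
    feasible_hash: str      # feasibility context the ranking was computed under
    ranking: list           # ScoredNode records, best first
    initial_n: int          # size of the ranking when the entry was seeded
    popped: int = 0         # candidates consumed since the last recomputation
    ts: float = field(default_factory=time.time)  # optional diagnostic timestamp


def seed_entry(total_scores):
    """Build an entry from {node name: total score}; ties break by node name."""
    ranking = sorted(
        (ScoredNode(name, score) for name, score in total_scores.items()),
        key=lambda rec: (-rec.total, rec.name),
    )
    return CacheEntry(feasible_hash(total_scores.keys()), ranking, len(ranking))


entry = seed_entry({"node-b": 80, "node-a": 80, "node-c": 90})
print([rec.name for rec in entry.ranking])
```

Sorting the names before hashing makes the fingerprint independent of the order in which the Filter stage happens to return nodes.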
Interpretation: A Reusable “Ranking Table” with a Moving Pointer
Seeding on Miss (What Gets Inserted)
Safe Reuse and Bounded Staleness (Why Metadata Is Needed)
4.2.4. Scheduling Logic
- (1) Feasible node computation (Filter stage remains unchanged).
- (2) Similarity modeling with ScoreKey.
  - The scheduler/profile identity (to avoid cross-profile interference),
  - Priority and runtime class information,
  - Aggregated resource requests (CPU, memory, GPU) computed from containers and overhead, and
  - Hashed representations of constraint-bearing fields (e.g., nodeSelector, affinity, tolerations, and topologySpreadConstraints).
- (3) Feasibility-context binding via feasibleHash.
- (4) Cache-aware scoring with rotating Top-1 selection.
  - Compute the ScoreKey k of the Pod and the feasibility fingerprint h (the feasibleHash of the feasible node set F).
  - Query the score cache for an entry associated with k that is valid under feasibility context h.
  - Cache hit: If a valid entry exists, it contains a ranked list L of nodes sorted by their total framework score (descending). We return only the first element of L (the current Top-1 candidate) and remove it from the list ("pop"). This yields a pop-and-advance behavior: subsequent Pods with the same ScoreKey rotate to the next-best candidates without re-running the full scoring pipeline.
  - Cache miss or invalid entry: If no valid cache entry exists (first time for this key, or entry expired/invalidated), we execute the standard Kubernetes scoring pipeline once over all nodes in F (i.e., the original scoring function including enabled Score plugins and normalization). The resulting scores are sorted into a ranked list L, stored in the cache together with feasibility context h, and then we pop and return the Top-1 element as the scheduling decision.
- (5) Invalidation and refresh (bounded staleness).
  - Feasible-set mismatch: If the stored feasibleHash differs from the current h, the cached ranking is considered stale and is immediately invalidated. This ensures that we do not select a node that is no longer feasible or ignore newly feasible nodes.
  - Consumption-based refresh: Each cache entry tracks how many nodes have been popped relative to the initial list size. When the consumed fraction exceeds a configured threshold (default: refresh after more than 50% of the cached nodes have been popped), we invalidate the entry so that the next scheduling attempt for the same ScoreKey triggers a full recomputation. This policy limits long-lived reuse and keeps rankings responsive to cluster dynamics.
  - Empty ranking: If the cached list is exhausted, the entry is removed and recomputed upon the next request.
- (6) Concurrency control.
- (7) Output and integration with the default pipeline.
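The ScoreKey construction in step (2) above can be sketched as follows. This is a simplified illustration rather than the scheduler's actual code: the Pod spec is modeled as a plain dictionary, resource quantities are assumed to be pre-parsed integers (millicores, bytes, device counts), init containers are ignored, and the helper names are hypothetical.

```python
import hashlib
import json
from typing import NamedTuple

CONSTRAINT_FIELDS = ("nodeSelector", "affinity", "tolerations", "topologySpreadConstraints")


class ScoreKey(NamedTuple):
    scheduler_name: str
    priority: int
    runtime_class: str
    cpu_millis: int
    mem_bytes: int
    gpus: int
    constraint_hash: str


def _digest(obj):
    # sort_keys canonicalizes mapping key order only; list order is preserved,
    # so reordered list entries produce a different hash.
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def score_key(spec):
    """Derive a hashable equivalence signature from a simplified Pod spec dict."""
    totals = {"cpu": 0, "memory": 0, "gpu": 0}
    for container in spec.get("containers", []):
        for resource, amount in container.get("requests", {}).items():
            totals[resource] = totals.get(resource, 0) + amount
    for resource, amount in spec.get("overhead", {}).items():
        totals[resource] = totals.get(resource, 0) + amount
    constraints = {name: spec.get(name) for name in CONSTRAINT_FIELDS}
    return ScoreKey(
        spec.get("schedulerName", "default-scheduler"),
        spec.get("priority", 0),
        spec.get("runtimeClassName", ""),
        totals["cpu"],
        totals["memory"],
        totals["gpu"],
        _digest(constraints),
    )


replica_1 = {
    "schedulerName": "cache-scheduler",
    "containers": [{"name": "web-1", "requests": {"cpu": 250, "memory": 128 * 1024 ** 2}}],
    "tolerations": [{"key": "a"}, {"key": "b"}],
}
replica_2 = dict(replica_1, containers=[{"name": "web-2", "requests": {"cpu": 250, "memory": 128 * 1024 ** 2}}])
reordered = dict(replica_1, tolerations=[{"key": "b"}, {"key": "a"}])
print(score_key(replica_1) == score_key(replica_2))
print(score_key(replica_1) == score_key(reordered))
```

Because the key is a tuple of plain values, it can be used directly as a dictionary key. The second comparison shows the strictness of equality over hashed constraint fields: the same tolerations in a different order yield a different key.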
| Algorithm 1: Rotating Score-Cache Reuse for Similar Pods |
4.2.5. Correctness and Freshness Considerations and Safety Mechanisms
- (i) Feasible-set binding: Cache reuse is permitted only when the hash of the current feasible node set matches the hash stored with the cache entry; otherwise, the entry is dropped and recomputed. This ensures that nodes that become infeasible due to resource consumption or state changes are not selected from stale rankings.
- (ii) Consumption-based refresh: Even if feasibility remains unchanged, score ordering may drift as resources change. To limit staleness, we recompute the cached ranking after a configurable fraction of the list has been consumed (default: refresh after more than 50% of the cached nodes have been popped). This provides a practical trade-off between reuse benefits and responsiveness to cluster dynamics.
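A minimal sketch of how the two safeguards above combine with pop-and-advance selection is given below. It is an illustration under stated assumptions, not the paper's implementation: `RotatingScoreCache`, `select`, and the `score_fn` callback (standing in for the full framework scoring pass) are hypothetical names, the refresh check is assumed to run right after each pop, and a single lock is assumed for concurrency control.

```python
import hashlib
import threading


class RotatingScoreCache:
    def __init__(self, refresh_fraction=0.5):
        self.refresh_fraction = refresh_fraction
        self.full_passes = 0            # how many times score_fn was invoked
        self._entries = {}
        self._lock = threading.Lock()

    def select(self, key, feasible_nodes, score_fn):
        """Return the next node for `key`, or None if nothing is feasible."""
        if not feasible_nodes:
            return None
        h = hashlib.sha256("\n".join(sorted(feasible_nodes)).encode("utf-8")).hexdigest()
        with self._lock:
            entry = self._entries.get(key)
            # Safeguard (i): reuse only under an identical feasible set.
            if entry is None or entry["hash"] != h or not entry["ranking"]:
                scores = score_fn(feasible_nodes)   # full scoring pass (miss)
                ranking = sorted(scores, key=lambda n: (-scores[n], n))
                entry = {"hash": h, "ranking": ranking,
                         "initial_n": len(ranking), "popped": 0}
                self._entries[key] = entry
                self.full_passes += 1
            node = entry["ranking"].pop(0)          # pop-and-advance Top-1
            entry["popped"] += 1
            # Safeguard (ii): drop the entry once enough of it is consumed.
            consumed = entry["popped"] / entry["initial_n"]
            if not entry["ranking"] or consumed > self.refresh_fraction:
                del self._entries[key]
            return node


STATIC_SCORES = {"a": 90, "b": 80, "c": 70, "d": 60}


def score_fn(nodes):
    # Stand-in for the real scoring pipeline; returns fixed totals.
    return {n: STATIC_SCORES[n] for n in nodes}


cache = RotatingScoreCache()
picks = [cache.select("k", ["a", "b", "c", "d"], score_fn) for _ in range(4)]
print(picks, cache.full_passes)
```

With four feasible nodes and the default threshold, the third pop pushes the consumed fraction to 75%, so the fourth Pod triggers a fresh scoring pass. The stand-in returns fixed scores, so the top node repeats after the refresh; in a real cluster the recomputed scores would reflect the resources already consumed.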
4.3. Related Work: Result Reuse in Kubernetes Scheduling
5. Experiment
5.1. Experimental Environment
5.1.1. Cloud Platform and Cluster Provisioning
- Cloud provider: Google Cloud Platform (GCP);
- Managed Kubernetes: Google Kubernetes Engine (GKE);
- Cluster name: sched-exp;
- Location (zone): asia-northeast1-a;
- Kubernetes version (nodes): v1.33.5-gke.2019000.
5.1.2. Node Pool Configuration
Autoscaling Configuration
5.1.3. Schedulers Under Comparison
Apache YuniKorn (Baseline Scheduler)
- (i) Hierarchical queues and fairness (e.g., enforcing capacity/fair-share between tenants or workloads);
- (ii) Placement rules to map applications/users/namespaces into specific queues in a policy-driven way;
- (iii)
Koordinator (Baseline Scheduling & QoS System)
5.1.4. Proposed Cache Scheduler Deployment
Required Manifests and Concrete Deployment Steps
- (i) A dedicated Namespace for isolating scheduler resources;
- (ii) A ConfigMap containing a KubeSchedulerConfiguration that registers a scheduler name;
- (iii) A ServiceAccount plus RBAC bindings (cluster-scoped reads and leader-election permissions);
- (iv) A Deployment that launches the customized scheduler image with the mounted configuration.
- (1) Namespace: Create an isolated namespace to host all resources of the custom scheduler (e.g., scheduler-system). This keeps the deployment independent from kube-system and avoids name collisions with default components.
- (2) Scheduler configuration (ConfigMap): Store a v1 scheduler component config (kubescheduler.config.k8s.io/v1). The key fields are:
  - profiles[0].schedulerName=cache-scheduler: Binds this scheduler instance to Pods with the same spec.schedulerName.
  - leaderElection.leaderElect=true: Enables leader election (safe even for a single replica; prevents dual-active scheduling if scaled).
  - leaderElection.resourceNamespace=scheduler-system and leaderElection.resourceName=cache-scheduler: The lock (Lease) is created and maintained in the custom namespace.
- (3) RBAC (ServiceAccount + bindings): Run the scheduler under a dedicated ServiceAccount (scheduler-system/cache-scheduler). In Kubernetes v1.33.5, a scheduler requires:
  - Core scheduler permissions (watch/list pods, nodes, PV/PVC, create bindings, update Pod status, etc.). In practice, this is commonly satisfied by binding the service account to the built-in ClusterRole system:kube-scheduler (when present in the cluster).
  - Informer read permissions that we observed to be necessary in our GKE setup:
    - (a) Listing/watching StorageClass objects at cluster scope;
    - (b) Listing/watching ConfigMap objects, in particular access to the control-plane kube-system/extension-apiserver-authentication ConfigMap, which is used by client-go authentication plumbing.
  - Leader election permissions in the scheduler namespace: get/create/update/patch on leases.coordination.k8s.io for the lock named cache-scheduler.
- (4) Scheduler Deployment: Deploy the customized scheduler as a single-replica Deployment in scheduler-system. The key fields are:
  - serviceAccountName: cache-scheduler to attach the RBAC identity.
  - image: mrboen123/cache-scheduler:<tag> pointing to our v1.33.5-based build. The <tag> used in the evaluation is v1.33.5-r2.
  - args: Start with --config=/etc/kubernetes/scheduler-config.yaml and a verbosity level (e.g., --v=3).
  - volumeMounts: Mount the config file from the ConfigMap at the path referenced by --config.
- (5) Workload opt-in: For any workload that should be scheduled by cache-scheduler, set spec.schedulerName: cache-scheduler in the Pod spec. This single field ensures the default scheduler ignores the Pod, and our custom scheduler becomes responsible for scheduling it.
- (6) Apply order and verification commands: The concrete deployment procedure is:
  - 1. Apply the namespace, configuration, RBAC, and deployment manifests in order:
    kubectl apply -f 01-namespace.yaml
    kubectl apply -f 02-configmap.yaml
    kubectl apply -f 03-rbac.yaml
    kubectl apply -f 04-deployment.yaml
  - 2. Wait for the scheduler Pod to become Running:
    kubectl -n scheduler-system get pods -l app=cache-scheduler -o wide
  - 3. Check scheduler logs for successful startup and leader acquisition (no RBAC forbidden errors):
    kubectl -n scheduler-system logs deploy/cache-scheduler --tail=200
  - 4. Launch an opt-in smoke-test Pod and confirm it is scheduled (a Scheduled event is generated and a node is assigned):
    kubectl -n sched-test describe pod <pod-name>
- Files used in our experiments. We maintain the following manifest files as part of the experimental artifact:
- 01-namespace.yaml (namespace scheduler-system);
- 02-configmap.yaml (KubeSchedulerConfiguration with schedulerName=cache-scheduler);
- 03-rbac.yaml (service account and RBAC including leases permissions);
- 04-deployment.yaml (deployment of mrboen123/cache-scheduler:<tag> mounting the configuration).
5.2. Experiment Design
5.2.1. Scheduling Performance Benchmark Design
Objective
Controlled Workload and Burst Generation
Scheduler Selection and Run Isolation
Trial Procedure (Cleanup → Burst → Wait Conditions → Export)
- Namespace and Deployment preparation: Create the namespace if needed; apply/update the Deployment manifest with replicas initially set to 0.
- Forced refresh and cleanup: Patch the Pod template labels with a new RUNID to force the controller to create a new ReplicaSet; scale the Deployment to 0 and wait for old Pods to be deleted.
- Burst scale-out: Scale replicas to N; poll until at least N Pods with bench-run=RUNID are observed to handle controller creation delays.
- Synchronization points: Wait until all Pods satisfy the PodScheduled condition. Then wait for the Ready condition as a best-effort signal, bounded by a timeout.
- Data export: Export all Pods of the run as JSON (pods_RUNID.json). If the Metrics Server is available, also snapshot node-level resource usage via kubectl top nodes (topnodes_RUNID.txt).
Latency Metrics
- Scheduling latency: PodScheduled.lastTransitionTime − metadata.creationTimestamp.
- Ready latency: Ready.lastTransitionTime − metadata.creationTimestamp.
- Post-scheduling latency: Ready.lastTransitionTime − PodScheduled.lastTransitionTime.
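The three latencies above can be derived from an exported Pod object as sketched below. The helper names are illustrative; the field paths follow the standard Pod JSON layout (metadata.creationTimestamp and status.conditions), and timestamps are assumed to be in the second-resolution UTC form that Kubernetes emits (e.g., 2025-01-01T00:00:00Z).

```python
from datetime import datetime


def _parse(ts):
    # fromisoformat does not accept a trailing "Z" on older Python versions.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pod_latencies(pod):
    """Return scheduling, ready, and post-scheduling latency in seconds.

    A latency is None when the corresponding condition is not (yet) True.
    """
    created = _parse(pod["metadata"]["creationTimestamp"])
    reached = {
        c["type"]: _parse(c["lastTransitionTime"])
        for c in pod.get("status", {}).get("conditions", [])
        if c.get("status") == "True"
    }
    scheduled = reached.get("PodScheduled")
    ready = reached.get("Ready")
    return {
        "scheduling": (scheduled - created).total_seconds() if scheduled else None,
        "ready": (ready - created).total_seconds() if ready else None,
        "post_scheduling": (
            (ready - scheduled).total_seconds() if scheduled and ready else None
        ),
    }


example_pod = {
    "metadata": {"creationTimestamp": "2025-01-01T00:00:00Z"},
    "status": {
        "conditions": [
            {"type": "PodScheduled", "status": "True", "lastTransitionTime": "2025-01-01T00:00:02Z"},
            {"type": "Ready", "status": "True", "lastTransitionTime": "2025-01-01T00:00:07Z"},
        ]
    },
}
print(pod_latencies(example_pod))
```

Because these timestamps have one-second resolution, sub-second differences between schedulers only become visible when aggregated over many Pods.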
Placement Distribution and QoS-Oriented Indicators
- Pods per node: Min/max/mean, standard deviation, and coefficient of variation (CV).
- Fairness: Jain’s fairness index over (i) Pod counts per node, (ii) aggregated requested CPU per node, and (iii) aggregated requested memory per node. Higher Jain values indicate more even distribution.
- Skew inspection: The top-5 nodes with the highest Pod counts are printed for quick diagnosis.
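For reference, the distribution indicators above can be computed as in the sketch below. Jain's index uses its standard definition, $J(x) = (\sum_i x_i)^2 / (n \sum_i x_i^2)$; the use of the population standard deviation for the CV and the value returned for an all-zero input are assumptions of this sketch, and the function names are illustrative.

```python
from statistics import mean, pstdev


def jain_index(values):
    """Jain's fairness index: 1.0 is perfectly even, 1/n is maximally skewed."""
    values = list(values)
    square_sum = sum(v * v for v in values)
    if not values or square_sum == 0:
        return 1.0  # convention chosen here: nothing allocated counts as even
    return sum(values) ** 2 / (len(values) * square_sum)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if the mean is 0)."""
    values = list(values)
    avg = mean(values)
    return pstdev(values) / avg if avg else 0.0


def top_nodes(pods_per_node, k=5):
    """The k nodes with the highest Pod counts, for quick skew inspection."""
    return sorted(pods_per_node.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


even = {"n1": 40, "n2": 40, "n3": 40, "n4": 40, "n5": 40}
skewed = {"n1": 200, "n2": 0, "n3": 0, "n4": 0, "n5": 0}
print(jain_index(even.values()), jain_index(skewed.values()))
print(coefficient_of_variation(even.values()))
print(top_nodes(skewed, k=2))
```

For 200 Pods on five nodes, a perfectly even placement gives an index of 1.0, while placing everything on one node gives 1/5.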
Repeatability and Comparison Across Schedulers
5.2.2. HPA Benchmark Design
Purpose
Workload and Autoscaling Configuration
| Algorithm 2: Burst Scheduling Benchmark |
Run Isolation and Artifacts
Load Generation and SLA Definition
Two-Stage Protocol: FIND Then HOLD
- (1) FIND (threshold discovery). Concurrency is ramped from FIND_START to FIND_MAX in steps of FIND_STEP, and each level is exercised for FIND_DURATION. The first concurrency that violates the SLA is selected as the hold concurrency. If no violation is observed up to FIND_MAX, the script conservatively sets the hold concurrency to FIND_MAX.
- (2) HOLD (time-to-recover measurement). The script applies constant stress at the hold concurrency for HOLD_TOTAL using fixed windows of length HOLD_WINDOW. In the updated run configuration, HOLD_TOTAL is extended to 15 minutes to increase the probability of observing both degradation and recovery under autoscaling dynamics. Recovery is declared after RECOVERY_STREAK consecutive SLA-OK windows following a first observed breach; in our updated configuration we set RECOVERY_STREAK=1 to measure the earliest return to the SLA target once autoscaling takes effect.
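The FIND stage described above amounts to a linear ramp with an early exit. The sketch below illustrates that control flow only: `sla_ok` is a hypothetical callback standing in for "run the load generator at this concurrency for FIND_DURATION and report whether the SLA held", and falling back to FIND_MAX when no violation occurs is an assumption of this sketch.

```python
def find_hold_concurrency(find_start, find_max, find_step, sla_ok):
    """Return the first concurrency level that violates the SLA.

    If no level up to find_max violates it, fall back to find_max.
    """
    if find_step <= 0:
        raise ValueError("find_step must be positive")
    concurrency = find_start
    while concurrency <= find_max:
        if not sla_ok(concurrency):
            return concurrency       # first SLA violation: use it for HOLD
        concurrency += find_step
    return find_max                  # no violation observed during the ramp


# Toy service that meets its SLA below 30 concurrent clients.
print(find_hold_concurrency(10, 50, 10, lambda c: c < 30))
# Toy service that never violates the SLA in the tested range.
print(find_hold_concurrency(10, 50, 10, lambda c: True))
```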
TTR Definition and Interpretation Modes
- ttr_mode = recovered: an SLA breach occurred and the service recovered; ttr_sec is the measured recovery time.
- ttr_mode = no_degradation: no SLA breach occurred during HOLD; the script reports ttr_sec = 0 (TTR not applicable because the service never degraded).
- ttr_mode = no_recovery_within_hold: an SLA breach occurred but no recovery was observed within the HOLD horizon; ttr_sec = −1 denotes right-censoring.
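The three modes above can be derived from the sequence of per-window SLA verdicts collected during HOLD. The sketch below is illustrative and is not taken from HPA.sh: the function name is hypothetical, and measuring ttr_sec from the start of the first breached window to the end of the window that completes the recovery streak is an assumption, since the exact reference points are not restated here.

```python
def classify_ttr(window_ok, window_sec, recovery_streak=1):
    """Map per-window SLA verdicts (True = SLA met) to (ttr_mode, ttr_sec)."""
    first_breach = None
    streak = 0
    for i, ok in enumerate(window_ok):
        if first_breach is None:
            if not ok:
                first_breach = i     # first window that violates the SLA
            continue
        if ok:
            streak += 1
            if streak >= recovery_streak:
                # Assumed convention: breach start to end of recovering window.
                return "recovered", (i + 1 - first_breach) * window_sec
        else:
            streak = 0               # streak must be consecutive
    if first_breach is None:
        return "no_degradation", 0
    return "no_recovery_within_hold", -1


print(classify_ttr([True, False, False, True, True], window_sec=30))
print(classify_ttr([True, True, True], window_sec=30))
print(classify_ttr([True, False, False], window_sec=30))
```

Returning −1 for the censored case keeps the value distinguishable from a genuine zero, so runs without recovery should be excluded or treated as censored before averaging.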
Timeline Collection (HPA and Pod Readiness)
Scheduling/Ready Latency for HPA-Created Pods
| Algorithm 3: HPA Benchmark with FIND+HOLD and TTR Modes (as implemented in HPA.sh) |
5.3. Experiment Result
5.3.1. Scheduling Performance Benchmark
Metrics
Overall Comparison
Placement Quality (Distribution QoS)
5.3.2. HPA Scenario Benchmark
Time-to-Recover (TTR)
Autoscaling Dynamics vs. QoS Recovery
QoS Degradation During HOLD
6. Conclusions
6.1. Achievements
- A keyed score-reuse mechanism integrated into kube-scheduler. We implemented a scheduler-side score reuse layer that identifies scheduling-equivalent Pods via a ScoreKey derived from scheduling-relevant PodSpec fields (resource requests and hashed constraints). The mechanism reuses standard framework scoring outputs without modifying individual scheduling plugins, preserving compatibility with the Kubernetes scheduling framework.
- Rotating Top-1 selection to avoid repeated placement on a single node. Instead of repeatedly selecting the same cached best node for all replicas, our cache stores a ranked node list and applies a pop-and-advance policy: each successful scheduling decision returns the current Top-1 candidate and removes it from the cached list. This design naturally rotates placements among high-scoring nodes in a burst, improving diversity without introducing additional optimization passes.
- Bounded staleness via feasibility-context validation and refresh. We introduced two explicit safety controls to maintain correctness under dynamic cluster conditions. First, cache applicability is guarded by a feasibility-context fingerprint (feasibleHash) computed from the current feasible node set; mismatches trigger immediate invalidation and recomputation. Second, we bound reuse over time by a consumption-based refresh rule (default threshold: 50% of the cached nodes), which forces periodic full recomputation after a significant portion of the cached candidates has been consumed.
- Evaluation on a real GKE cluster, including HPA burst scenarios. We evaluated the proposed scheduler on a production-grade Google Kubernetes Engine (GKE) cluster with one control plane and five worker nodes. In addition to rapid deployment bursts (e.g., creating 200 Pods), we also conducted experiments under HPA-driven scale-out scenarios to validate effectiveness in realistic autoscaling conditions. We compared against the default scheduler as well as representative alternative schedulers (Koordinator and YuniKorn). The results demonstrate that reusing ranked scoring results can reduce average scheduling latency after warm-up while maintaining placement fairness comparable to the default scheduler.
6.2. Limitations
- Conservative similarity definition may reduce cache hit rate. ScoreKey uses strict equality over hashed constraint fields and aggregated resource requests. Semantically equivalent but syntactically different specifications (e.g., reordered constraints) may not match, lowering reuse opportunities. This is an intentional correctness-first choice, but it may underutilize reuse in practice.
- Feasible-set fingerprint is a coarse proxy for cluster dynamics. Our feasibleHash guards reuse based on changes in the feasible node set. However, scores can drift even when feasibility remains unchanged (e.g., resource headroom decreases but still passes filters). We mitigate this using consumption-based refresh, yet the fingerprint does not directly capture finer-grained score drift.
- Top-1 determinism and interaction with tie-breaking. The cache-aware routine returns a single Top-1 candidate rather than a full priority list. This can reduce the randomness introduced by reservoir sampling among equally-scored nodes in the default scheduler. Although pop-and-advance provides diversity across consecutive replicas, the selection may still be more deterministic than the default behavior.
- Workload dependence and warm-up cost. The benefits depend on the presence of repeated, scheduling-equivalent Pods (typical in replica bursts). For heterogeneous workloads with low repetition, cache hits will be rare and performance will approach the default scheduler. Additionally, the first Pod in each ScoreKey class still incurs a full scoring pass (warm-up).
- Limited experimental scope. Our evaluation is conducted on a specific cluster scale and configuration (GKE, 5 workers) and a limited set of workloads and scheduler configurations. Results may vary with larger clusters, different plugin sets, different node heterogeneity, or different autoscaling policies.
6.3. Future Work
- Richer similarity modeling and canonicalization. We plan to improve ScoreKey robustness by canonicalizing constraint specifications (e.g., normalizing ordering) and by selectively including additional scheduling-relevant fields (e.g., certain annotations or resource classes). This could increase cache hit rates while maintaining safety.
- Stronger freshness controls beyond feasible-set hashing. Future versions could incorporate lightweight resource-state summaries into the applicability check (e.g., hashing coarse resource headroom buckets) or adopt plugin-aware staleness signals. This would allow refresh decisions to reflect score drift more directly rather than relying mainly on set changes and consumption thresholds.
- Adaptive refresh and diversity policies. Instead of a fixed refresh threshold (currently 50%), the scheduler could adapt the threshold based on observed cluster volatility, scheduling latency targets, or workload characteristics. Similarly, diversity policies could be extended (e.g., mixing Top-k sampling from the cached ranking) to better emulate default tie-breaking while retaining reuse benefits.
- Scaling evaluation and broader baselines. We will extend experiments to larger clusters and more diverse workloads, and evaluate interactions with additional scheduling features such as preemption, extenders, and heterogeneous node pools. We also plan to evaluate end-to-end autoscaling latency (HPA decision → Pod ready) under controlled load traces.
- Engineering hardening and observability. Additional engineering work includes improved cache eviction policies (e.g., LRU), richer metrics for cache hit rate and recomputation causes, and tracing hooks to help operators understand when score reuse is effective or when it is being invalidated.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| VM | Virtual Machine |
| CaaS | Container as a Service |
| ECS | Elastic Container Service |
| OS | Operating System |
| LXC | Linux Containers |
| OCI | Open Container Initiative |
| API | Application Programming Interface |
| CRI | Container Runtime Interface |
| GKE | Google Kubernetes Engine |
| AKS | Azure Kubernetes Service |
| EKS | Elastic Kubernetes Service |
| CLI | Command-Line Interface |
| QoS | Quality of Service |
| HA | High Availability |
| K8s | Kubernetes |
| HPA | Horizontal Pod Autoscaler |
- Menouer, T. KCSS: Kubernetes container scheduling strategy. J. Supercomput. 2021, 77, 4267–4293. [Google Scholar] [CrossRef]
- Pérez de Prado, R.; García-Galán, S.; Muñoz-Expósito, J.E.; Marchewka, A.; Ruiz-Reyes, N. Smart containers schedulers for microservices provision in cloud-fog-IoT networks. Challenges and opportunities. Sensors 2020, 20, 1714. [Google Scholar] [CrossRef] [PubMed]
- Rejiba, Z.; Chamanara, J. Custom scheduling in kubernetes: A survey on common problems and solution approaches. ACM Comput. Surv. 2022, 55, 151. [Google Scholar] [CrossRef]
- Senjab, K.; Abbas, S.; Ahmed, N.; Khan, A.u.R. A survey of Kubernetes scheduling algorithms. J. Cloud Comput. 2023, 12, 87. [Google Scholar] [CrossRef]
- Kubernetes SIG Scheduling. Remove Equivalence Cache (eCache) from the Scheduler Code Base. GitHub Issue #71013, Kubernetes/Kubernetes. 2018. Available online: https://github.com/kubernetes/kubernetes/issues/71013 (accessed on 26 January 2026).
- Kubernetes Enhancements. KEP-5598: Opportunistic Batching. Kubernetes Enhancement Proposal (KEP), SIG Scheduling. 2025. Available online: https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/5598-opportunistic-batching/README.md (accessed on 26 January 2026).
- The Kubernetes Authors. Scheduler Performance Tuning: Enabling Opportunistic Batching. Kubernetes Documentation. 2026. Available online: https://kubernetes.io/docs/concepts/scheduling-eviction/scheduler-perf-tuning/ (accessed on 26 January 2026).
- The Kubernetes Authors. Kubernetes v1.35: Timbernetes (The World Tree Release). Kubernetes Blog. 2025. Available online: https://kubernetes.io/blog/2025/12/17/kubernetes-v1-35-release/ (accessed on 26 January 2026).
- Apache YuniKorn Project. YuniKorn Kubernetes Shim: Design/Kubernetes Shim Design. 2026. Available online: https://yunikorn.apache.org/docs/next/archived_design/k8shim (accessed on 29 January 2026).
- Apache YuniKorn Project. YuniKorn on Kubernetes: Scheduler Shim Overview (Documentation Page). 2026. Available online: https://yunikorn.apache.org/docs/ (accessed on 29 January 2026).
- Alibaba Cloud. ACK Koordinator (FKA ack-slo-Manager): Product Overview and Architecture. 2025. Available online: https://www.alibabacloud.com/help/en/ack/product-overview/ack-koordinator-fka-ack-slo-manager (accessed on 29 January 2026).
- Koordinator Project. Load-Aware Scheduling (Koordinator Documentation). 2026. Available online: https://koordinator.sh/docs/user-manuals/load-aware-scheduling (accessed on 29 January 2026).
- AdriftVin. k8s-Cache-Scheduler: A Custom Scheduler with Score Cache for Kubernetes (Based on kube-Scheduler v1.33.5). GitHub Repository. Release v1.33.5-r2. 2026. Available online: https://github.com/AdriftVin/k8s-cache-scheduler (accessed on 29 January 2026).






| Item | Configuration |
|---|---|
| Node pool name | exp-pool |
| Machine type | e2-standard-2 |
| Boot disk size | 30 GB |
| Node OS/runtime | GKE default node image; container runtime via containerd |
| Nodes during main runs | 5 worker nodes |
| Planned upper bound | up to 8 nodes (subject to quota) |

| Scheduler | Namespace | Scheduler Name |
|---|---|---|
| Default Kubernetes scheduler | kube-system | default-scheduler |
| YuniKorn scheduler | yunikorn | yunikorn |
| Koordinator scheduler | koordinator-system | koord-scheduler |
| Proposed cache scheduler | scheduler-system | cache-scheduler |

| Scheduler | | | | | Jain ↑ | CV ↓ |
|---|---|---|---|---|---|---|
| Default | 1175.0 | 2673.3 | 4958.3 | 8673.3 | 0.9919 | 0.090 |
| Koordinator | 816.7 | 2000.0 | 4856.7 | 11,670.0 | 0.9806 | 0.141 |
| YuniKorn | 1955.0 | 4336.7 | 11,540.0 | 26,003.3 | 0.8793 | 0.370 |
| Cache-scheduler | 801.7 | 2000.0 | 4818.3 | 8670.0 | 0.9917 | 0.091 |

| Scheduler | TTR (s) ↓ | Speedup vs. Default | Max p99 (s) ↓ | Mean p99 (s) ↓ | OK-Rate ↑ | Ready-20 Time (s) ↓ |
|---|---|---|---|---|---|---|
| Default | 346 | 1.00× | 1.962 | 1.273 | 0.40 | 99 |
| Koordinator | 191 | 1.81× | 2.613 | 1.291 | 0.43 | 105 |
| YuniKorn | 152 | 2.28× | 1.823 | 1.166 | 0.33 | 111 |
| Cache-scheduler | 115 | 3.01× | 1.872 | 1.178 | 0.43 | 105 |
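
The derived columns in the two results tables above can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch, not the authors' evaluation code. It assumes that the speedup is the default scheduler's TTR divided by each scheduler's TTR, and that the Jain index and CV are computed over the same per-node values with the population standard deviation, in which case J = 1 / (1 + CV²). The function names and the sample list in the usage note are hypothetical.

```python
import math

# Values copied from the results tables above.
TTR_SECONDS = {"Default": 346, "Koordinator": 191, "YuniKorn": 152, "Cache-scheduler": 115}
REPORTED_SPEEDUP = {"Default": 1.00, "Koordinator": 1.81, "YuniKorn": 2.28, "Cache-scheduler": 3.01}
REPORTED_JAIN = {"Default": 0.9919, "Koordinator": 0.9806, "YuniKorn": 0.8793, "Cache-scheduler": 0.9917}
REPORTED_CV = {"Default": 0.090, "Koordinator": 0.141, "YuniKorn": 0.370, "Cache-scheduler": 0.091}


def speedup_vs_default(ttr, baseline="Default"):
    """Speedup of each scheduler, assumed to be baseline TTR / scheduler TTR."""
    return {name: ttr[baseline] / value for name, value in ttr.items()}


def jain_index(xs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); 1.0 means perfectly even."""
    total = sum(xs)
    return total * total / (len(xs) * sum(x * x for x in xs))


def coefficient_of_variation(xs):
    """Population standard deviation divided by the mean."""
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(variance) / mean


def jain_from_cv(cv):
    """Identity J = 1 / (1 + CV^2), valid when CV uses the population std."""
    return 1.0 / (1.0 + cv * cv)


if __name__ == "__main__":
    speedups = speedup_vs_default(TTR_SECONDS)
    for name in TTR_SECONDS:
        print(
            f"{name}: speedup {speedups[name]:.2f}x (reported {REPORTED_SPEEDUP[name]:.2f}x), "
            f"Jain from CV {jain_from_cv(REPORTED_CV[name]):.4f} (reported {REPORTED_JAIN[name]:.4f})"
        )
```

Under these assumptions the recomputed speedups round to the reported values, and the Jain indices implied by the reported CVs agree with the reported Jain column to within the rounding of the three-decimal CV. For any list of per-node loads, for example the hypothetical `[4, 5, 6, 5]`, `jain_index` and `jain_from_cv(coefficient_of_variation(...))` return the same number, which is why the two fairness columns rank the schedulers identically.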
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhou, B.; Mondal, S.K.; Cheng, Y.; Kabir, H.M.D. On Optimized Scheduling Scheme for Rapid Pod Autoscaling in Kubernetes. Appl. Sci. 2026, 16, 2481. https://doi.org/10.3390/app16052481




