1. Introduction
In modern integrated circuit design, routing is a critical task that involves electrically connecting the components of a circuit (such as transistors and wiring ports) while minimizing total wiring length and ensuring electrical performance. As the scale and complexity of integrated circuits continue to increase, traditional routing algorithms face unprecedented challenges in computational time, accuracy, and the ability to process large-scale designs. Accordingly, the development of efficient and scalable routing solutions has emerged as a key research focus in Electronic Design Automation (EDA) [1].
The significance of routing optimization extends beyond theoretical considerations, having a direct impact on practical integrated circuit designs. For instance, low-power digital signal processing circuits, such as the DLMS adaptive filter [2], are highly sensitive to interconnect delays and power consumption; efficient routing can substantially enhance their energy efficiency. Similarly, many-ported memories implemented with standard-cell memory architectures [3] involve extremely dense interconnects, where routing optimization is critical to achieving timing closure and reliable operation. These examples underscore the crucial role of effective routing strategies in enabling high-performance, low-power integrated circuits.
Routing in integrated circuit physical design is typically divided into three sequential phases: tree generation, global routing, and detailed routing. Among these, tree generation is the first and most critical stage, as it determines the initial net topology upon which all subsequent routing steps are based. The quality of the tree directly influences downstream stages, including congestion control, timing closure, and design rule compliance. The Rectilinear Steiner Minimum Tree (RSMT) model is widely adopted during this stage to construct minimal-length topologies by introducing Steiner points. In addition to wirelength optimization, RSMTs serve as the foundation for interconnect optimization techniques such as buffer insertion, which are crucial for improving timing performance and reducing dynamic power consumption [4]. Consequently, the ability to efficiently generate high-quality RSMTs is fundamental to achieving timing-aware and power-efficient routing solutions.
Despite its utility, RSMT construction remains computationally intractable for large-scale nets due to its NP-hard nature [5]. In practice, the RSMT construction process can be invoked millions of times throughout a modern chip's routing flow, as it underpins the initial topology of every net. This high frequency of invocation highlights the pressing need for algorithms that are not only accurate but also computationally efficient and reusable. However, most traditional RSMT construction approaches are tailored to specific topologies or design contexts, which limits their generalization capabilities. As a result, even minor variations in net structures often necessitate re-executing the algorithm from scratch, leading to significant computational overhead and suboptimal design cycles.
Conventional approaches to RSMT construction typically fall into three categories: exact algorithms, heuristic algorithms, and meta-heuristic algorithms [6]. Exact algorithms guarantee optimality but become computationally infeasible as problem size increases [7,8]. Heuristic methods are efficient but rely heavily on handcrafted rules and often generalize poorly. Meta-heuristic methods integrate global search strategies, such as simulated annealing and evolutionary computation, but incur higher computational cost and convergence instability. Furthermore, these algorithms often fail to exploit the shared structural patterns among diverse RSMT instances, limiting their effectiveness in Very-Large-Scale Integration (VLSI) systems. These limitations have motivated research into more data-driven, adaptable, and scalable solutions that can generalize across instances and topologies with minimal manual intervention.
In recent years, deep learning and reinforcement learning have shown great potential in solving complex optimization problems, especially in image processing, natural language processing, and combinatorial optimization. Among these, reinforcement learning excels at addressing combinatorial optimization challenges like the RSMT problem through iterative exploration and policy optimization. As a result, its application in integrated circuit design has garnered increasing attention [9,10].
In this paper, a novel methodology combining reinforcement learning and advanced neural network architectures is proposed to improve the overall performance of the RSMT construction task. The main contributions of this paper are as follows:
A reinforcement learning framework for RSMT construction is presented, which integrates a Selective Kernel Transformer Network (SKTNet) to jointly model local geometric patterns and global topological dependencies.
The proposed SKTNet innovatively combines Selective Kernel Convolutions (SKConv) with stacked improved Macaron Transformer blocks to achieve adaptive multi-scale feature extraction and enhance the model's ability to handle varying point-set complexities.
A Self-Critical Sequence Training (SCST) strategy is adopted to optimize the end-to-end training process without requiring explicit ground-truth labels, while extensive experiments (including input transformations, visualizations, and ablation studies) demonstrate that the proposed method consistently reduces wirelength errors compared to traditional heuristics and prior learning-based baselines.
For clarity, all acronyms used in this paper are summarized in Appendix A (Table A1) in alphabetical order.
2. Related Work
Traditional RSMT solution methods mainly comprise heuristic algorithms and exact algorithms, of which the most common representatives are FLUTE [11] and GeoSteiner [7]. The FLUTE algorithm solves small-scale problems via lookup tables and decomposes large-scale problems into several sub-problems. This method is computationally efficient and well suited to quickly generating approximate solutions; however, its accuracy degrades rapidly for large point sets, and the loss is most pronounced when the point distribution is complex. GeoSteiner guarantees the optimal solution by enumerating all possible combinations of Steiner points, but its time complexity usually grows exponentially, making it difficult to meet the real-time computing requirements of large-scale designs.
In recent years, machine learning has been widely applied across EDA, especially in key tasks such as routing [12], placement [13], timing [14], and congestion [15], showing great application potential. Graph Neural Networks (GNNs) have become a powerful tool for modeling IC design problems due to their ability to learn from graph-structured data. GNN-based models such as CktGNN [9] have been used to extract route-aware logic features from netlists, while RouteGNN, proposed in RoutePlacer [16], learns geometric and topological features to predict net-level routability during layout optimization. Recently, several GNN-based approaches have been proposed to solve the RSMT problem by modeling the net as a graph and directly predicting Steiner points. GAT-Steiner [17] employs a graph attention network to learn representations on Hanan grid graphs and achieves high-accuracy Steiner point prediction through parallel inference. Similarly, NeuroSteiner [18] introduces a graph transformer architecture that uses supervised learning on GeoSteiner-labeled datasets to predict Steiner points in a one-shot fashion. These approaches demonstrate the ability of GNNs to capture spatial and topological dependencies in circuit netlists, although they focus primarily on point prediction and do not directly model edge-level connectivity or route sequence generation. Additionally, GNNs usually require message passing across multiple layers, leading to significant computational overhead that scales poorly with design size. Moreover, their ability to extract local geometric patterns is relatively weak, limiting their effectiveness in tasks such as Steiner Tree construction that require accurate local connectivity modeling.
Another line of work optimizes the routing scheme through reinforcement learning. Reinforcement learning-based models have been proposed for global routing [19], detailed routing [20], and printed circuit board routing [21]. In these approaches, the agent learns to construct paths or strategies based on rewards. For example, Li et al. [22] addressed the global routing problem based on pattern routing and introduced reinforcement learning techniques to better search for globally optimal routing decisions. Xiang et al. [23] used a dueling double deep Q network for multi-agent routing. At a more fundamental level, REST [24] applies reinforcement learning to the RSMT construction task by combining an Actor–Critic framework with a Transformer encoder to sequentially select edges in the tree. While REST introduces a learning-based decision-making strategy, its encoder lacks architectural specialization, which limits its ability to fully capture the multi-scale structure of the input point set, especially in complex or high-degree nets (where "degree" refers to the number of terminal nodes, or pins, that must be interconnected in the RSMT). As a result, its performance degrades significantly when applied to large-scale circuit routing scenarios.
To address the limitations of existing learning-based RSMT methods, this paper proposes a novel reinforcement learning framework built around a more expressive encoder, termed SKTNet. The encoder combines an SKConv module [25] for adaptive multi-scale feature extraction with an improved Macaron Transformer [26] to capture long-range dependencies among terminals. Batch normalization is employed in place of layer normalization to enhance training stability. To optimize the sequence generation process, SCST [27] is adopted, which enables efficient learning through baseline-guided policy optimization. Compared to previous methods such as REST, the proposed model better leverages the local and global features present in complex layouts, enabling it to generate more accurate, robust, and generalizable RSMT solutions across a wide range of net sizes. Experimental results on standard benchmarks show that our method consistently outperforms REST in terms of wirelength error while maintaining competitive inference efficiency.
3. Problem Formulation
The Steiner Tree problem is a fundamental combinatorial optimization problem that arises in a variety of domains, including VLSI physical design. Given a set of required nodes, known as terminals, the goal is to construct a connected acyclic graph (i.e., a tree) that spans all terminals, possibly including additional auxiliary nodes called Steiner points, such that the total edge length is minimized [28].
Figure 1a illustrates an example of terminals placed on a grid, which serve as the input to the problem. Unlike the Minimum Spanning Tree (MST), which must connect only the given terminals (as shown in Figure 1b), Steiner Trees are allowed to insert non-terminal nodes to reduce the overall cost of the network. As illustrated in Figure 1c,d, these additional nodes, called Steiner points (white circles), act as junctions that enable interconnect segments to merge, thereby shortening the total wirelength compared with the MST.
When the objective is to minimize the total length of the tree, the resulting structure is called a Steiner Minimum Tree (SMT). Formally, let $T = \{t_1, t_2, \ldots, t_n\}$ be a set of points in a metric space. An SMT is a tree $S$ whose vertex set includes all terminals in $T$ and possibly some additional Steiner points $Q$, such that the total length of edges in $S$ is minimized and $S$ connects all terminals in $T$ [29]. In Euclidean space, this leads to oblique edges with optimally placed Steiner points.
In VLSI routing, however, wire segments are restricted to horizontal and vertical directions due to manufacturing constraints. This restriction gives rise to the RSMT problem, a variant of the classical SMT problem, in which all edges must follow Manhattan (L1) geometry. That is, the edge cost between two points $p_i = (x_i, y_i)$ and $p_j = (x_j, y_j)$ is defined as

$$d(p_i, p_j) = |x_i - x_j| + |y_i - y_j|.$$

The RSMT problem is known to be NP-hard [30], and efficient approximation or learning-based solutions are therefore of great interest in routing.
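To make the objective concrete, the following minimal Python sketch (the function names are illustrative, not from the paper) computes this L1 edge cost and the total wirelength of a candidate tree represented as an index-pair edge list:

```python
def manhattan(p, q):
    """Rectilinear (L1) distance between two points p = (x, y) and q."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_wirelength(points, edges):
    """Sum of L1 edge lengths for a tree given as (i, j) index pairs."""
    return sum(manhattan(points[i], points[j]) for i, j in edges)

# Example: three terminals joined through one Steiner point at (2, 0).
terminals = [(0, 0), (4, 0), (2, 5)]
steiner = [(2, 0)]
nodes = terminals + steiner
edges = [(0, 3), (1, 3), (2, 3)]  # every terminal connects to the Steiner point
print(total_wirelength(nodes, edges))  # 2 + 2 + 5 = 9
```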
Figure 1 summarizes this progression: (a) terminals on a grid; (b) a rectilinear MST without Steiner points; (c) a Rectilinear Steiner Tree (RST) with an extra Steiner point but not necessarily optimal; and (d) the RSMT, which achieves the minimum total wirelength under the Manhattan distance metric.
4. The Proposed Method
The overall architecture of the proposed model is illustrated in Figure 2, comprising two major components: an encoding module and a decoding module. The encoding module is responsible for extracting rich and informative features from the input point set, learning representations that capture both local geometric structures and global topological dependencies. Based on the extracted features, the decoding module sequentially generates the edges of an RSMT through a learned decision-making process.
To enhance the modeling capacity, the encoder adopts the SKTNet, which effectively captures multi-scale spatial information and long-range dependencies within the point set. Furthermore, the model is trained using the SCST strategy, enabling efficient policy optimization in the absence of ground-truth supervision. Through the integration of SKTNet and SCST, the proposed framework achieves significant improvements in RSMT construction accuracy compared to prior learning-based methods.
4.1. Motivation
In RSMT construction, effective modeling of both local spatial patterns and global topological dependencies is critical. Traditional Transformer architectures, although powerful in capturing global interactions via self-attention, often lack inductive bias for local geometric structures and are sensitive to scale variations in point sets [31]. This limitation becomes evident in RSMT tasks, where routing quality depends heavily on capturing fine-grained layout features and global coordination among terminals.
To address this, the proposed encoder adopts a hybrid structure that combines an SKConv and an improved Macaron Transformer. SKConv enhances local feature representation by dynamically adjusting its receptive field across multiple kernel sizes, allowing the model to extract multi-scale geometric patterns from irregular point sets. This is particularly important for RSMT, where topological relations can vary significantly with terminal configuration.
In addition, the Macaron Transformer structure, which inserts dual feedforward layers around the attention block, improves the modeling of long-range dependencies and stabilizes gradient flow in deep architectures. This design facilitates better coordination among distant nodes during route generation and enables more expressive encoding of the global layout context.
To further optimize solution quality, SCST is employed as the learning paradigm. Unlike standard reinforcement learning methods that rely on noisy value estimation or Critic networks, SCST directly optimizes the generation policy using a self-generated baseline. Similar reinforcement learning strategies have also been successfully applied to routing-related combinatorial optimization tasks [32]. This results in faster convergence and lower-variance gradients, which are crucial in combinatorial problems such as RSMT, where supervision is sparse and the solution space is exponentially large.
By integrating these components, the proposed framework is better equipped to learn from spatially structured data and produce high-quality routing solutions with improved accuracy and stability.
4.2. Model Architecture
4.2.1. Encoder
As illustrated in Figure 2, the encoder corresponds to the encoding module of the overall architecture and provides the feature matrix $F \in \mathbb{R}^{n \times d}$ that conditions the decoder during the sequential construction of the RSMT. Below, we describe the internal SKTNet structure of this encoder and explain how its components produce the high-resolution local and global features consumed by the decoder.
The encoder adopts the SKTNet structure, as shown in Figure 3. It first embeds the raw 2D coordinates into a latent feature space through a 1D convolutional layer and then processes them with an SKConv module. The SKConv block applies multiple convolutional branches with different kernel sizes and fuses their outputs using attention-based weighting, enabling the model to adaptively emphasize local patterns at multiple scales. The aggregated features are subsequently passed into a stack of Macaron Transformer (MT) blocks, where each block consists of two feedforward layers surrounding a multi-head self-attention module to capture long-range dependencies and refine global context. By combining adaptive local feature extraction in SKConv with global dependency modeling in the MT blocks, the encoder learns both fine-grained geometric details and holistic structural relationships, which are essential for accurate RSMT generation.
Given an input set of 2D coordinates $P = \{p_1, p_2, \ldots, p_n\} \subset \mathbb{R}^2$, a 1D convolutional operation is first applied along the set dimension to map each coordinate pair into a higher-dimensional latent space:

$$F^{(0)} = \mathrm{Conv1D}(P) \in \mathbb{R}^{n \times d},$$

where $d$ denotes the embedding dimension. This preliminary feature set $F^{(0)}$ serves as the foundation for subsequent hierarchical feature extraction.
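As a minimal sketch of this embedding step, assuming an embedding dimension of $d = 64$ (the actual value is a model hyperparameter not stated here), the 1D convolution can be realized in PyTorch as follows:

```python
import torch
import torch.nn as nn

embed = nn.Conv1d(in_channels=2, out_channels=64, kernel_size=1)  # d = 64 assumed
points = torch.rand(8, 32, 2)           # a batch of 8 sets with 32 terminals each
F0 = embed(points.transpose(1, 2))      # (8, 64, 32): per-point latent features
```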
Traditional convolutional layers typically employ fixed kernel sizes, which limits their adaptability to the varying local structures present in different RSMT instances. To overcome this constraint, the encoder integrates an SKConv module. SKConv simultaneously applies multiple convolutional branches with different kernel sizes $k_1, k_2, k_3$ and dynamically fuses the resulting features based on learned attention weights. The output from each branch is computed as

$$U_m = \mathrm{Conv}_{k_m}\big(F^{(0)}\big), \quad m = 1, 2, 3,$$

where $k_1$, $k_2$, and $k_3$ denote the three branch kernel sizes. The outputs of the multi-branch convolutions are concatenated and globally pooled to form a joint descriptor:

$$s = \mathrm{GAP}\big([U_1; U_2; U_3]\big).$$

The global descriptor $s$ is passed through a shared Multilayer Perceptron (MLP) to generate attention scores, which are normalized using the softmax function to produce selection weights:

$$w_m = \mathrm{softmax}_m\big(\mathrm{MLP}(s)\big).$$

The final multi-scale feature representation is then computed as the weighted sum of branch outputs:

$$F_{\mathrm{SK}} = \sum_{m=1}^{3} w_m \odot U_m.$$

This adaptive fusion enables the encoder to flexibly focus on different spatial resolutions depending on the geometric distribution of the point set.
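A compact PyTorch sketch of this selective-kernel fusion is shown below; the kernel sizes, reduction ratio, and class name are illustrative assumptions rather than the paper's exact settings:

```python
import torch
import torch.nn as nn

class SKConv1D(nn.Module):
    """Selective Kernel convolution over point features of shape (B, d, n).

    Parallel branches with different receptive fields are fused by softmax
    attention weights computed from a globally pooled joint descriptor.
    """
    def __init__(self, d, kernel_sizes=(3, 5, 7), reduction=4):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(d, d, k, padding=k // 2) for k in kernel_sizes
        )
        m = len(kernel_sizes)
        self.mlp = nn.Sequential(                 # shared MLP producing scores
            nn.Linear(m * d, d // reduction), nn.ReLU(),
            nn.Linear(d // reduction, m * d),
        )
        self.m, self.d = m, d

    def forward(self, x):                          # x: (B, d, n)
        U = torch.stack([b(x) for b in self.branches], dim=1)  # (B, m, d, n)
        s = U.flatten(1, 2).mean(dim=-1)           # concat + global pool -> (B, m*d)
        w = self.mlp(s).view(-1, self.m, self.d)   # attention logits per branch
        w = torch.softmax(w, dim=1).unsqueeze(-1)  # normalize across branches
        return (w * U).sum(dim=1)                  # weighted sum -> (B, d, n)

x = torch.rand(8, 64, 32)                          # embedded features from the 1D conv
y = SKConv1D(64)(x)                                # (8, 64, 32) multi-scale features
```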
Following SKConv, the encoded features are passed through an improved Macaron Transformer encoder to model the long-range dependencies among points. Each Macaron block consists of two position-wise feedforward networks sandwiching a multi-head self-attention layer. The feature update in each block is as follows:

$$X' = \mathrm{BN}\big(X + \mathrm{FFN}_1(X)\big), \qquad X'' = \mathrm{BN}\big(X' + \mathrm{MHSA}(X')\big), \qquad X_{\mathrm{out}} = \mathrm{BN}\big(X'' + \mathrm{FFN}_2(X'')\big),$$

where $\mathrm{FFN}(\cdot)$ denotes the feedforward sublayer, and the multi-head self-attention mechanism $\mathrm{MHSA}(\cdot)$ learns to capture complex pairwise relationships across the entire set of terminals. In addition, to enhance training stability in reinforcement learning settings, batch normalization [33] is employed within both the feedforward and attention modules. Here, $X$ denotes the input to the current block, and $X_{\mathrm{out}}$ serves as the input to the next block. This process is repeated for three stacked blocks, progressively refining feature representations across multiple layers.
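The block structure can be sketched in PyTorch as follows, with illustrative hidden sizes and head counts; BatchNorm1d normalizes over the feature dimension, hence the transposes:

```python
import torch
import torch.nn as nn

class MacaronBlock(nn.Module):
    """FFN -> multi-head self-attention -> FFN, each with a residual
    connection followed by batch normalization (replacing LayerNorm)."""
    def __init__(self, d=64, heads=8, hidden=256):
        super().__init__()
        self.ffn1 = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, d))
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ffn2 = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, d))
        self.bns = nn.ModuleList(nn.BatchNorm1d(d) for _ in range(3))

    def _bn(self, i, x):                    # BatchNorm1d expects (B, d, n)
        return self.bns[i](x.transpose(1, 2)).transpose(1, 2)

    def forward(self, x):                   # x: (B, n, d)
        x = self._bn(0, x + self.ffn1(x))
        a, _ = self.attn(x, x, x)
        x = self._bn(1, x + a)
        return self._bn(2, x + self.ffn2(x))

encoder = nn.Sequential(*[MacaronBlock() for _ in range(3)])  # three stacked blocks
```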
After passing through the SKConv module and the stacked improved Macaron Transformer blocks, the final output of the encoder is a feature matrix $F \in \mathbb{R}^{n \times d}$, encoding comprehensive local and global information. This feature matrix is subsequently used by the decoder to guide the sequential construction of the RSMT.
4.2.2. Decoder
As illustrated in Figure 2, the decoder corresponds to the decoding module of the proposed architecture. It takes the encoder feature matrix $F$ as input and sequentially generates the RSMT topology by predicting edge connections step by step.
The decoding module is responsible for constructing an RSMT by sequentially generating a set of $n-1$ directed edge pairs between terminal points. The decoding process follows the pointer-based framework proposed in REST [34] and operates in an autoregressive manner conditioned on the encoder output.
Let the encoded feature matrix be denoted as $F = [f_1, f_2, \ldots, f_n] \in \mathbb{R}^{n \times d}$, where $f_i$ represents the learned embedding of the $i$-th point. A binary mask $m \in \{0, 1\}^n$ is used to track visited nodes. At the initial step, a starting point $v_0$ is selected based on attention scores:

$$v_0 = \arg\max_i \; \alpha_i^{(0)},$$

where $\alpha^{(0)} \in \mathbb{R}^n$ denotes the attention distribution computed over the encoder features.
The decoder then iteratively selects rectilinear edges to incrementally build the tree. At each decoding step $t$, the model maintains a query embedding $q_t$ derived from historical context, including previous selections. A two-stage pointer mechanism is employed:

First Point Selection: Given the current query vector, the decoder selects a point $v_1^{(t)}$ from the set of unvisited nodes:

$$v_1^{(t)} = \arg\max_{i \,:\, m_i = 0} \; \alpha_i^{(t)}.$$

Second Point Selection: Conditioned on the first selection and internal state, a second point $v_2^{(t)}$ is chosen from either the visited or unvisited set. To capture rectilinear constraints, the selection is formulated as a directional index $a_t = 2j + o$ with $o \in \{0, 1\}$, which encodes both a point index $j$ and a direction, where $o = 0$ means the first point is the visited node and $o = 1$ means the second point is the unvisited node.

The decoder recovers the two endpoints of each edge by

$$j = \lfloor a_t / 2 \rfloor, \qquad o = a_t \bmod 2, \qquad e_t = \big(v_1^{(t)}, v_2^{(t)}\big).$$

This edge is appended to the decoded edge list, and the visitation mask is updated to mark $v_1^{(t)}$ as visited: $m_{v_1^{(t)}} \leftarrow 1$. To maintain consistency in generation and allow the decoder to perform conditional processing based on previous selections, context vectors are updated at each step by aggregating the embeddings of previously chosen points via learnable projections.
The process repeats for $n-1$ steps, generating a total of $n-1$ valid rectilinear edges. The final output of the decoder is a sequence

$$\mathcal{E} = \big(e_1, e_2, \ldots, e_{n-1}\big), \qquad e_t = \big(v_1^{(t)}, v_2^{(t)}\big).$$

This sequence defines the edge set of the constructed RSMT. During training, the decoder samples actions from the learned policy, while during evaluation, greedy deterministic decoding is used. The entire decoding process adheres to the unique-visit constraint (each point is selected exactly once as unvisited) and constructs a valid tree structure by avoiding cycles through implicit graph construction logic.
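A simplified sketch of a single two-stage pointer step is given below, assuming the starting point has already been marked visited; the direction bit and the context update are stubbed, so this illustrates only the masking logic:

```python
import torch

def decode_step(query, F, visited):
    """One simplified two-stage pointer step.

    query:   (B, d) context vector;  F: (B, n, d) encoder features;
    visited: (B, n) boolean mask (True = already in the tree).
    """
    logits = torch.einsum('bd,bnd->bn', query, F)
    # Stage 1: pick the next unvisited terminal to attach.
    v_new = logits.masked_fill(visited, float('-inf')).argmax(dim=-1)
    # Stage 2: pick the visited terminal it attaches to.
    v_old = logits.masked_fill(~visited, float('-inf')).argmax(dim=-1)
    direction = torch.zeros_like(v_new)    # L-shape orientation bit (stub)
    return v_new, v_old, direction

B, n, d = 4, 10, 64
F = torch.rand(B, n, d)
visited = torch.zeros(B, n, dtype=torch.bool)
visited[:, 0] = True                       # assume point 0 was the starting point
v_new, v_old, o = decode_step(torch.rand(B, d), F, visited)
```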
4.2.3. Training Strategies
In this work, transfer learning is employed to improve training efficiency and generalization across different circuit sizes. Specifically, when training a model for a degree-$(n+1)$ instance, the parameters of the model trained on degree $n$ are used as initialization. This strategy leverages the structural similarity between successive degrees, allowing the model to inherit previously learned representations and accelerate convergence on larger problem instances. This approach is inspired by prior works applying transfer learning in combinatorial optimization, for example, the use of transfer learning from TSP to CVRP to speed up training and improve generalization [35] and the ATSP-to-SOP RL transfer framework showing gains in both efficiency and solution quality [36].
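A minimal sketch of this warm-start scheme, with hypothetical checkpoint paths, is given below:

```python
import torch

def warm_start(model, n, ckpt_dir="checkpoints"):
    """Initialize the degree-(n+1) model from the trained degree-n weights.

    The file naming here is hypothetical; because the architecture is shared
    across degrees, the state dict loads without any shape changes.
    """
    state = torch.load(f"{ckpt_dir}/rsmt_degree_{n}.pt", map_location="cpu")
    model.load_state_dict(state)
    return model
```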
To directly optimize the sequence-level objective of minimizing the total wirelength in RSMT generation, the model is trained using the SCST algorithm, a reinforcement learning method that enables policy optimization without the need for ground-truth labels. SCST builds on the REINFORCE algorithm by introducing a self-generated baseline that reduces gradient variance while keeping the estimator unbiased.
Let the policy network define a probability distribution $\pi_\theta(\tau \mid P)$ over decoding sequences $\tau$, given an input point set $P$. In reinforcement learning terminology, this probability distribution is referred to as the policy. At each training step, two trajectories are generated for each instance in the batch:

A greedy trajectory $\hat{\tau}$, generated by deterministically selecting the most probable action (i.e., via $\arg\max$) at each decoding step.

A sampled trajectory $\tau^{s}$, generated by sampling from the policy $\pi_\theta$, thereby introducing exploration into the learning process.

The quality of a trajectory is evaluated using a task-specific reward function. Since the goal is to minimize wirelength, the reward is defined as the negative total wirelength of the decoded tree:

$$R(\tau) = -\sum_{e \in \tau} d(e),$$

where $d(e)$ denotes the rectilinear (Manhattan) length of edge $e$. The advantage $A$ of the sampled trajectory over the baseline is calculated as

$$A = R(\tau^{s}) - R(\hat{\tau}).$$

This advantage quantifies the relative improvement or degradation of the sampled solution compared to the greedy one, which serves as a dynamic, model-driven baseline. The SCST loss is then formulated as

$$L(\theta) = -\,\mathbb{E}_{\tau^{s} \sim \pi_\theta}\big[A \cdot \log \pi_\theta(\tau^{s} \mid P)\big].$$

In practice, the expectation is approximated over a batch of sampled trajectories, and the gradient is estimated as

$$\nabla_\theta L(\theta) \approx -\frac{1}{B} \sum_{b=1}^{B} A_b \, \nabla_\theta \log \pi_\theta\big(\tau^{s}_b \mid P_b\big),$$

where $B$ is the batch size. The log probabilities are obtained from the decoder during sampling, and gradients are backpropagated only through the sampled trajectories.
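The loss and its gradient reduce to a few lines once per-trajectory log probabilities and wirelengths have been collected; the variable names below are illustrative:

```python
import torch

def scst_loss(log_probs, sampled_wl, greedy_wl):
    """Self-critical loss for one batch (minimal sketch).

    log_probs:  (B,) summed log pi(a_t) along each sampled trajectory
    sampled_wl: (B,) wirelength of the sampled trees
    greedy_wl:  (B,) wirelength of the greedy baseline trees
    With R = -wirelength, the advantage is A = greedy_wl - sampled_wl.
    """
    advantage = (greedy_wl - sampled_wl).detach()   # baseline carries no gradient
    return -(advantage * log_probs).mean()          # (1/B) sum of -A * log pi
```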
The training process does not rely on an external Critic network. Instead, the model evaluates its own performance under greedy decoding, enabling more efficient and stable updates with lower implementation complexity. This Critic-free SCST formulation offers the advantage of being more computationally efficient and robust, especially in combinatorial optimization tasks like RSMT, where constructing an accurate Critic is nontrivial.
Integrating SCST with the pointer-based decoder and the SKTNet encoder aligns perception with action, enabling the model to learn to construct Steiner Trees that reduce wirelength in a fully end-to-end, reward-optimized training paradigm.
5. Experiments and Results
In this paper, the method is implemented in Python 3.9.19. The software and hardware environment used for training the network includes the Ubuntu 20.04 LTS operating system, PyTorch 1.12, CUDA 11.3, an Intel(R) Xeon(R) Gold 6226 CPU, 125 GB RAM, and an NVIDIA RTX 3090 GPU with 24 GB of memory. The optimizer is Adam [37] with an initial learning rate of $2.5 \times 10^{-4}$, and the learning rate decays by a factor of 0.96 after each training epoch.
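Assuming the stated settings, this schedule corresponds to the following PyTorch setup (a stand-in module replaces the full model for brevity):

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 64)   # stand-in for the SKTNet encoder-decoder
optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4)
# Multiply the learning rate by 0.96 after every training epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.96)

for epoch in range(3):     # epoch count is illustrative
    # ... one epoch of SCST training would run here ...
    scheduler.step()
```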
To train the proposed model, a series of random two-dimensional point sets was generated, with point-set sizes (degrees) ranging from 3 to 50. For each degree value, a dedicated model checkpoint was maintained, allowing the encoder–decoder network to capture and specialize in the corresponding structural complexity. This degree-specific checkpointing strategy ensures that the model generalizes well across varying design sizes and topological patterns. During training, the model was exposed to continuously changing layouts and terminal distributions, enabling it to learn a robust routing policy applicable to a broad range of RSMT instances. By simulating diverse connection topologies, the training process effectively encompassed both simple and complex routing scenarios, thereby improving the model's adaptability and learning capacity.
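Training instances can be generated on the fly; the sketch below assumes uniform sampling in the unit square, which the paper does not specify beyond "random":

```python
import torch

def sample_batch(degree, batch_size=256):
    """Draw a batch of random terminal sets in the unit square."""
    return torch.rand(batch_size, degree, 2)

# One specialized checkpoint is maintained per degree from 3 to 50.
for degree in range(3, 51):
    batch = sample_batch(degree)     # (B, degree, 2) training instances
```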
To evaluate the effectiveness of the proposed method, comprehensive evaluations were conducted using multiple standard test sets. The same random point dataset adopted by REST was reused to ensure a fair and consistent comparison. Additionally, the proposed model was benchmarked against several representative RSMT construction approaches, including the exact algorithm GeoSteiner, its high-performance variant BGA [38], and the widely used heuristic method FLUTE, which was tested under its highest precision configuration (A = 18).
To further enhance prediction accuracy, an eight-way transformation strategy is applied to each test case. Specifically, each input point set is subjected to eight geometric transformations, including horizontal/vertical reflections and coordinate swaps, to simulate the common symmetries in physical layout designs, as illustrated in Figure 4. The model is evaluated independently on each transformed version, and the result with the shortest total wirelength is selected as the final prediction for that sample. This selection-based approach effectively serves as a form of test-time ensemble, improving both robustness and overall solution quality.
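A sketch of this test-time ensemble is shown below, assuming coordinates normalized to the unit square and a `model` callable that returns a decoded tree; both are assumptions for illustration:

```python
import torch

def eight_way_transforms(points):
    """All 8 combinations of x/y reflection and coordinate swap,
    assuming coordinates normalized to the unit square."""
    variants = []
    for flip_x in (False, True):
        for flip_y in (False, True):
            for swap in (False, True):
                p = points.clone()
                if flip_x:
                    p[..., 0] = 1.0 - p[..., 0]
                if flip_y:
                    p[..., 1] = 1.0 - p[..., 1]
                if swap:
                    p = p.flip(-1)     # exchange x and y coordinates
                variants.append(p)
    return variants

def predict_best(model, points, wirelength):
    """Evaluate every variant and keep the tree with the shortest wirelength."""
    best_wl, best_tree = float('inf'), None
    for p in eight_way_transforms(points):
        tree = model(p)                # assumed greedy decoding interface
        wl = wirelength(p, tree)
        if wl < best_wl:
            best_wl, best_tree = wl, tree
    return best_tree
```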
Figure 5a further demonstrates that selecting the best solution among transformed variants consistently yields lower wirelength errors than evaluating raw inputs alone, highlighting the benefit of incorporating geometric transformations during inference. However, as illustrated in Figure 5b, this accuracy improvement comes at the cost of increased inference runtime. Since each transformation requires a full forward evaluation of the model, runtime grows approximately linearly with $T$, with the overhead becoming more significant for larger point sets. These results highlight a practical trade-off: incorporating geometric transformations yields notable gains in solution quality, but it must be balanced against computational efficiency depending on application requirements.
As shown in Table 1, the proposed model, SKTNet (where T denotes the number of geometric transformations applied during inference), achieves runtime performance comparable to REST across all degrees. Specifically, SKTNet (T = 1) corresponds to evaluating the model on the raw input, while SKTNet (T = 8) applies the eight-way transformation strategy involving horizontal/vertical reflections and coordinate swaps. Although SKTNet (T = 8) incurs additional runtime due to multiple forward passes, the growth is still linear and remains comparable to REST under the same settings. Compared with traditional methods such as GeoSteiner and FLUTE, which achieve high accuracy on small-scale instances but scale poorly with problem size, both REST and SKTNet maintain stable runtime across all degrees. Notably, SKTNet (T = 1) performs on par with or slightly faster than REST (T = 1), and SKTNet (T = 8) achieves efficiency comparable to REST (T = 8). These results indicate that the architectural enhancements in SKTNet improve accuracy without incurring additional runtime overhead, thereby preserving scalability for practical deployment.
In addition to the numerical results, Figure 5c provides a visual comparison of runtime as a function of degree. The plotted curves clearly highlight the scalability trend: while classical algorithms exhibit rapidly increasing runtime at higher degrees, both REST and SKTNet maintain nearly linear growth, with SKTNet showing no extra runtime overhead despite its enhanced encoder design. Together, Table 1 and Figure 5c confirm that the proposed framework achieves improved accuracy without sacrificing scalability, making it practical for large-scale deployment.
Table 2 presents the wirelength error of the proposed SKTNet compared with REST and heuristic baselines across various degrees, under both single-input (T = 1) and eight-way transformation (T = 8) settings. SKTNet (T = 1) directly evaluates raw inputs and already achieves lower errors than REST (T = 1), demonstrating the robustness of its feature extraction and dependency modeling. With SKTNet (T = 8), the model further reduces wirelength error by selecting the best result among the eight transformed variants, effectively functioning as a test-time ensemble. This strategy yields consistently lower error, especially on larger instances, though at the cost of increased runtime, as reported in Table 1. Compared with heuristic methods such as BGA and FLUTE, SKTNet maintains both competitive runtime and superior accuracy, confirming its effectiveness as a practical RSMT construction framework.
Figure 6a presents the RSMT generated by the REST model. It can be observed that the tree includes several redundant segments and unnecessary bends, particularly in certain local areas, which results in longer and less efficient routing paths. In contrast, Figure 6b shows the result generated by the proposed model, which achieves a more efficient structure by reducing redundant wire segments and avoiding excessive detours. This leads to shorter overall wirelength and a topology with fewer crossings, making the layout more regular and closer to the optimal RSMT. The improvement is primarily attributed to the SKTNet encoder's ability to capture multi-scale geometric features and global structural relationships among the input terminals. Additionally, the integration of the SCST strategy stabilizes the learning process and promotes better solution quality through policy refinement during training.
Figure 7 presents the step-by-step decoding process of constructing the RSMT, illustrating how the model incrementally selects terminal pairs based on the encoded representations. This sequential visualization reflects the model’s ability to dynamically reason about spatial positions and global topology during tree construction, resulting in more compact and efficient Steiner Trees with fewer redundant edges.
To evaluate the contribution of each architectural component to the overall performance of the proposed model, a set of ablation experiments was conducted under three progressively enhanced configurations: (1) a baseline configuration solely employing the SCST training strategy along with a standard Transformer encoder; (2) a mid-level variant that integrates SCST with an improved Macaron Transformer encoder; and (3) the full version of the proposed model incorporating the complete SKTNet encoder structure.
In the baseline configuration, the model is capable of generating valid RSMT, benefiting from the SCST reinforcement learning strategy. However, its performance is constrained by the limited capacity of the standard Transformer to adequately capture complex geometric relationships and global dependencies across the input point set. This often results in suboptimal feature encoding and reduced solution accuracy.
The introduction of the Macaron Transformer encoder significantly enhances the model’s global modeling ability by leveraging stacked dual feedforward layers and deeper attention mechanisms. This architectural refinement enables more expressive representation learning, leading to measurable improvements in wirelength accuracy across various test scenarios.
Building upon this, the integration of the SKConv module into the encoder results in the proposed SKTNet structure. SKConv facilitates dynamic multi-scale feature extraction through adaptive kernel selection, allowing the model to more effectively capture local topological variations and spatial patterns. The synergy between SKConv and the Macaron Transformer enables the encoder to extract richer hierarchical representations, which in turn guides the decoder toward higher-quality edge predictions.
As reported in Table 3, each progressive enhancement yields noticeable gains in accuracy. The results clearly indicate that incorporating the Macaron Transformer substantially improves global dependency modeling, while further integrating the SKConv module provides additional benefits through adaptive multi-scale feature extraction. Together, these enhancements enable the complete SKTNet encoder to achieve superior accuracy compared to simpler variants. This progression underscores the importance of both global dependency modeling and adaptive local feature extraction in achieving state-of-the-art performance in RSMT construction.
6. Discussion
The proposed SKTNet-based RSMT generation framework demonstrates notable advantages over traditional and prior learning-based approaches. By integrating a Selective Kernel Convolution and an improved Macaron Transformer encoder, the model is capable of capturing both fine-grained local patterns and long-range global dependencies within the input point sets. This structural enhancement, coupled with the SCST training strategy, leads to consistently lower wirelength errors across various test configurations. Despite these improvements, several challenges and limitations remain.
First, the overall model complexity is moderately increased due to the multi-branch SKConv module and deeper encoder structure, which may affect inference efficiency in resource-constrained environments. While runtime remains comparable to REST, further optimization (such as kernel fusion, pruning, or quantization) could be explored to make the model more hardware-friendly.
Second, the current framework assumes that wirelength minimization alone is sufficient for routing optimization. In practical VLSI designs, however, additional constraints such as congestion management, layer assignment, and strict Design Rule Check (DRC) compliance play a critical role in ensuring manufacturable and high-performance layouts [39]. Although the proposed method does not explicitly address these constraints, it provides a flexible foundation: congestion and layer utilization information can be incorporated into the reinforcement learning state representation, while reward functions can be extended to penalize congestion or imbalance across layers. Similarly, DRC rules can be embedded as feasibility checks during decoding or modeled through constraint-aware loss functions. These extensions would allow the framework to evolve from an "idealized" RSMT solver toward a practical, constraint-aware routing optimizer.
In addition, timing and power consumption are equally important optimization objectives in realistic design flows [40]. Our current formulation focuses on wirelength as the primary metric, but the reinforcement learning framework can be naturally extended to capture timing- and power-aware objectives. For example, timing can be modeled by incorporating path delay or slack penalties into the reward function, while power consumption can be estimated through wire capacitance and switching activity and included as an auxiliary optimization term. Such multi-objective extensions would enable the framework to evolve from a pure RSMT solver into a more comprehensive routing optimizer capable of balancing wirelength, timing, and power simultaneously.
Moreover, routing solutions in practice must also account for vias introduced at wire bends. Each via incurs additional delay, resistance, and reliability cost, which can significantly impact both timing closure and overall power efficiency [41]. Future extensions of the proposed framework will incorporate via-aware penalties into the optimization objective, ensuring that the model not only minimizes wirelength but also reduces unnecessary vias to improve signal integrity and manufacturability.
Lastly, while the model shows robust performance on synthetically generated point sets, its generalization to industrial benchmarks remains to be demonstrated. Bridging this gap may involve transfer learning, domain adaptation, or joint training with real circuit data and physical design constraints. Such efforts would further validate the practical applicability of SKTNet within real EDA workflows.
Overall, this work represents a promising step toward neural network-based RSMT generation. By addressing efficiency, via-aware parameters, and broader design constraints, the framework can be refined into a practical and extensible solution for next-generation VLSI physical design automation.
7. Conclusions
This work proposed a reinforcement learning-based framework for RSMT construction, powered by the SKTNet encoder and the SCST strategy. By integrating Selective Kernel Convolutions with an improved Macaron Transformer, SKTNet effectively captures both multi-scale geometric patterns and long-range topological dependencies, while SCST enables stable and efficient policy optimization without ground-truth labels.
Comprehensive experiments confirmed the superiority of the proposed framework over classical algorithms (GeoSteiner, BGA, and FLUTE) and the recent learning-based baseline REST. Results demonstrated that SKTNet not only achieves lower wirelength errors but also maintains comparable runtime scalability, even up to high-degree instances. Ablation studies further verified the contribution of each architectural component, highlighting the necessity of combining SKConv and the Macaron Transformer for accurate and efficient RSMT construction. These findings indicate that the proposed model is both methodologically innovative and practically scalable.
Looking forward, several promising research directions can be envisioned. Although the runtime of the proposed framework is already comparable to REST, further optimization through pruning, quantization, and operator fusion could significantly improve efficiency and hardware deployment. Furthermore, by integrating practical VLSI constraints, including timing closure, congestion management, power optimization, and via-aware parameters, the framework could be adapted to provide routing solutions that are not only wirelength-efficient but also manufacturable and performance-oriented. Another critical direction is industrial validation: applying the model to ISPD and DAC benchmarks as well as industrial-scale datasets, potentially enhanced by transfer learning or domain adaptation, will help bridge the gap between synthetic experiments and real-world design scenarios. Although not yet experimentally verified, the framework’s ability to jointly model local geometric patterns and global topological dependencies suggests potential applicability beyond routing optimization, such as placement refinement, parasitic-aware layout generation, and signal integrity optimization. These directions remain prospective, but they highlight the possibility of SKTNet serving as a unified learning-based backbone for multiple physical design tasks in the future.
In summary, this work contributes a scalable and extensible learning-based solution for RSMT construction. By combining architectural innovation, reinforcement learning strategies, and extensibility toward practical design constraints, the proposed framework provides new opportunities for integrating advanced neural architectures into next-generation VLSI physical design automation.