Feasibility Prediction for Rapid IC Design Space Exploration

The DARPA POSH program, echoing the research community, identifies that engineering productivity has fallen behind Moore's law, resulting in a prohibitive increase in IC design cost at leading technology nodes. The primary reason is that completing a design implementation requires substantial computing resources, expensive tools, and often many days. Even at the end of this process, some designs fail to meet the design constraints and become unroutable, creating a vicious circuit design cycle in which designers have to re-run the whole process after each design modification. This research applies a machine learning approach to automatically predict design constraints and design rule checking (DRC) violations, and uses an iterative greedy search to help the designer identify design constraints that minimize DRC violations before the long detailed routing process. The proposed algorithm achieved up to 99.99% design constraint prediction accuracy and reduced DRC violations by 98.4% with only a 6.9% area penalty.


Introduction
In recent years, information technology has experienced tremendous growth due to remarkably low sensor prices and a sea of devices connected to the internet, including mobile phones, wristwatches, TVs, cars, and washing machines [1][2][3]. At the same time, computational power has increased significantly, and researchers are struggling to meet the market demand at low cost, especially at advanced technology nodes (i.e., below 20 nm) [4,5]. The primary reasons are the increasing number of design rules, growing design complexity, high performance targets, and shrinking area budgets. With all those issues combined, implementing a DRC-clean design, i.e., one free of design rule checking (DRC) violations, is exceptionally challenging. A key difficulty is that the existing global routing-based congestion maps do not adequately correlate with the DRC violation maps obtained after the detailed routing phase [6].

Motivation
The conventional IC design flow starts from a hardware description language and then uses logic synthesis and place and route tools for implementation. Depending on the size of a design, this implementation process may take several days to weeks [7]. Moreover, the DRC violations become visible to designers only after the final step of the design, and resolving them, either manually or with specific tool instructions, requires critical technical expertise. In addition, meeting several design constraints can take many implementation iterations and consequently increases the chip price.
At advanced technology nodes, researchers have identified several issues such as compact design rules, constraints on via placement, and metal routing orientations [8]. The severity of those issues could lead to a point where the benefits of technology scaling diminish. Furthermore, there are several reasons why high-end process technology developers cannot accurately measure sophisticated IC design rules. One crucial reason is that electronic design automation (EDA) companies allow only close co-development with customers, which results in a long, time-consuming process that creates a vicious cycle. In addition, the library exchange format (LEF) standards are not consistent among different vendors. All those factors further increase design and development time. As a result, researchers started applying machine learning approaches in the IC design process [9].

Related Work
Machine learning (ML) algorithms are widely used in every abstraction layer of computation, including edge computing [10], cloud computation [11], the internet of things (IoT), and conventional feature space exploitation for performance estimation [12]. ML applications in EDA, however, have a much more significant impact because electronics touch every aspect of our lives. Researchers primarily consider standard design layout features that have a significant impact on DRC violations. A congestion predictor uses pin shape layouts and their density to identify local routing hotspots [13]. Researchers have used a supervised machine learning approach to improve detailed routing considering global routing features [14]. Chan et al. [6] proposed a supervised machine learning-based white space allocation approach at the detailed routing stage to improve routability and reduce the number of DRC violations. However, this approach requires additional legalization after new cell placements. Islam et al. [15] used unsupervised machine learning algorithms for clock tree synthesis (CTS). Kahng et al. proposed a multi-armed bandit (MAB) algorithm that considers thresholds on the area, slack time, and number of tool runs to reduce tool noise and increase profitability [16]. However, this method requires exhaustive tool runs to identify the optimal frequency of a design.
A deep learning approach uses a convolutional neural network (CNN) for DRC hotspot prediction [17]. This approach uses trial route and pin density features and constructs a single tensor image of size 224 × 224 × 2. However, the CNN approach suffers from low prediction accuracy. Another J-Net-based CNN architecture predicts DRC hotspots using high-resolution pin configuration images and low-resolution tile-based feature maps [18]. However, this approach also suffers from low prediction accuracy. Another attractive deep learning-based model uses pin-related features to identify metal-related issues in the layouts. However, this approach ignores issues related to other physical properties of a layout [19].

Contributions
This research used ensemble ML methods and heuristic greedy algorithms to improve synthesis tool performance by predicting optimal design constraints and DRC violations. A large dataset was collected using extremely time-consuming industrial synthesis and implementation tools. The proposed EDA synthesis flow can target both application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) implementations. In particular, the key contributions in this work are:

• To the best of the author's knowledge, this is the first-ever approach to improving synthesis tool performance by combining ensemble learning algorithms and a greedy approach.

• To the best of the author's knowledge, this is the first-ever approach to guide inexperienced designers with efficient design choices to reduce the precious design time.

• This research collected 74 design features from layouts and identified the design constraints that influence DRC violations.

Figure 1 shows the vicious cycle of the EDA tool flow: each stage of the flow puts tremendous stress on the circuit designer, who must optimize Pareto-optimal parameters such as performance, power, area, cost, and design time before launching the implemented design [20][21][22]. At the beginning of the traditional system on chip (SoC) hardware design flow, the behavioral description of the design needs to be written in a hardware description language [7], which is the register transfer level (RTL) of abstraction. Recently, due to the increasing popularity of hardware accelerators, the abstraction level is rising from RTL to the algorithmic level, where designs are described in high-level languages such as C, C++, SystemC, or Simulink, and an automated high-level synthesis (HLS) flow then synthesizes these descriptions to generate RTL [23][24][25][26]. These multiple ways of describing a system allow designers to evaluate multiple alternatives, which is known as design space exploration (DSE) [27,28]. Then, to meet the functional requirements of the design, we need to run functional simulation until the requirements are met.

Figure 1. A vicious cycle in circuit design often begins with a hardware description of a new design and ends with sending the tape-out to the fabrication lab, after several stressful iterative processes to meet the complex design rules, high performance, and low area budget.

After successful RTL translation from the behavioral description, we resort to EDA tools to translate the RTL into a gate-level netlist through a logic synthesizer for formal verification [29]. The translated gate-level netlist contains timing information in terms of basic gates. Such information helps EDA tools perform static timing analysis to check several design constraints such as setup and hold times, process corner variations, temperature variations, and interconnect variations [30]. Failure to meet these criteria requires tedious effort to fix the HDL description of the given design.
After successful gate-level synthesis, EDA tools initiate the physical design steps, of which partitioning is the first. In the partitioning step, EDA tools separate the whole gate-level netlist into different partitions in order to facilitate parallel implementation of each sub-system. However, this partitioning introduces several constraints in the design, such as communication cost among the sub-systems, timing budgets, and inter-dependency among the sub-systems [31]. Hence, several optimization techniques, as well as machine learning models, have been introduced to deal with these constraints [31][32][33].
After partitioning, EDA tools perform floorplanning to estimate the rough area of the logic cells and modules, power network, and I/O pads by optimizing several constraints such as minimization of area, limitation of aspect ratio, reduction of routing, minimum voltage islands, and distribution of the heat map throughout the chip. After estimating the rough area, EDA tools perform clock tree synthesis (CTS), routing, DRC violation cleaning, and Layout Versus Schematic (LVS) in the placement step before sending the design for fabrication.

Proposed Methodology
In order to correlate design choices with the layout features, this research considers empirical design constraints, for example, clock period (CP), aspect ratio (AR), total wire length (TWL), total area (TA), and core utilization (CU). All those features have a significant impact on the routability of a design and help us to build a large dataset. The proposed methodology automatically identifies the set of optimal design constraints for the designer and reduces design time by ensuring the routability of a design before the most expensive detailed routing phase, as shown in Figure 2.
First of all, the proposed algorithm uses a decision tree regression (DTR) algorithm to predict CU, AR, TA, CP, the number of DRC violations, and TWL. The decision tree is a standard supervised learning algorithm used for both classification and regression [34][35][36][37]. A decision tree encodes a set of independent rules; in its top-down structure, each path from the root to a leaf node is a sequence of attribute tests. The method chooses these tests with a criterion function based on either the mean squared error (MSE) or the mean absolute error (MAE).
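The role of the criterion function can be illustrated with a minimal sketch of how a regression tree might pick one split on a single feature. This is not the paper's implementation: the function names (`mse`, `mae`, `best_split`) and the toy values (core utilization in percent against DRC violation counts) are hypothetical and chosen only for illustration.

```python
from statistics import fmean, median

def mse(values):
    """Mean squared deviation of the values from their mean (0.0 if empty)."""
    if not values:
        return 0.0
    mean = fmean(values)
    return fmean([(v - mean) ** 2 for v in values])

def mae(values):
    """Mean absolute deviation of the values from their median (0.0 if empty)."""
    if not values:
        return 0.0
    mid = median(values)
    return fmean([abs(v - mid) for v in values])

def best_split(xs, ys, criterion=mse):
    """Return (threshold, weighted_error) of the best binary split on one feature.

    Every midpoint between two distinct consecutive feature values is tried;
    the split with the lowest size-weighted criterion value wins.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, criterion(list(ys)))  # baseline: no split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = (len(left) * criterion(left) + len(right) * criterion(right)) / n
        if error < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, error)
    return best

# Hypothetical toy data: core utilization (%) versus number of DRC violations.
cu_percent = [50, 60, 70, 80]
drc_count = [10, 12, 40, 44]
print(best_split(cu_percent, drc_count, mse))
print(best_split(cu_percent, drc_count, mae))
```

A full regression tree would apply this split search recursively to each resulting subset and across all features; the sketch shows only the single step in which MSE or MAE decides where to cut.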

Unlike existing block-based or grid-based DRC violation prediction, the proposed method predicts the design constraints and violations of a whole design. From many design implementations (D_1, D_2, ..., D_x) and design features (F_1, F_2, ..., F_y), we can define objective or cost functions to determine the optimal model. When we want to maximize two features F_1 and F_2, the objective function tries to maximize both of them together. Similarly, when we want to minimize two features, the objective is again expressed as a maximization. On the contrary, when we try to increase one feature and decrease the other, the objective function tries to increase the difference between them.
Algorithm 1 shows the optimal design cost computation model. This algorithm uses a set of designs, a set of associated features, and an optimization type (OptType) as inputs and returns the optimal cost. The algorithm iterates through each design from D_x and the associated features from F_y and computes the maximization, minimization, and maximum difference using the maxObjective(), minObjective(), and diffObjective() functions, respectively, from Line 5 to Line 24. It is worth mentioning that, depending on the OptType, only one of the conditional blocks (i.e., Line 7 or Line 12 or Line 17) will be true. Then Algorithm 1 returns the optimal cost in Line 25.

Algorithm 1 (excerpt, Lines 6 to 17)
 6: for all y ∈ F_y do
 7:   if (OptType == maximization) then
 8:     OptCost_max = maxObjective(D_x, F_y)
 9:     if (OptCost_max > OptCost) then
10:       OptCost = OptCost_max
11:     end if
12:   else if (OptType == minimization) then
13:     OptCost_min = minObjective(D_x, F_y)
14:     if (OptCost_min > OptCost) then
15:       OptCost = OptCost_min
16:     end if
17:   else (OptType == differentiation)

Once the proposed methodology computes the optimal cost function associated with an optimal design, it can guide the researchers to set the right set of constraints for their design using the proposed greedy approach.
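The control flow of Algorithm 1 can be sketched in Python as follows. The three objective functions are assumptions made for illustration, because their exact definitions are not given here: a sum for maximization, a negated sum for minimization (so that it can still be compared with `>`), and a difference for differentiation. The design names and feature values are hypothetical.

```python
def max_objective(design, pair):
    # Assumption: reward large values of both features by summing them.
    return design[pair[0]] + design[pair[1]]

def min_objective(design, pair):
    # Assumption: negate the sum so smaller feature values give a larger cost.
    return -(design[pair[0]] + design[pair[1]])

def diff_objective(design, pair):
    # Assumption: the first feature should increase, the second decrease.
    return design[pair[0]] - design[pair[1]]

OBJECTIVES = {
    "maximization": max_objective,
    "minimization": min_objective,
    "differentiation": diff_objective,
}

def optimal_cost(designs, feature_pairs, opt_type):
    """Return (best_cost, best_design_name) over all designs and feature pairs.

    Only the objective selected by opt_type is evaluated, mirroring the
    single active conditional block described for Algorithm 1.
    """
    objective = OBJECTIVES[opt_type]
    best_cost, best_design = float("-inf"), None
    for name, design in designs.items():      # iterate over the designs
        for pair in feature_pairs:            # iterate over the feature pairs
            cost = objective(design, pair)
            if cost > best_cost:              # keep the largest cost seen so far
                best_cost, best_design = cost, name
    return best_cost, best_design

# Hypothetical designs with features on comparable scales.
designs = {
    "D1": {"CU": 60, "DRC": 30},
    "D2": {"CU": 70, "DRC": 10},
    "D3": {"CU": 55, "DRC": 5},
}
pairs = [("CU", "DRC")]
for opt_type in OBJECTIVES:
    print(opt_type, optimal_cost(designs, pairs, opt_type))
```

In practice the features would need to be normalized before being summed or subtracted, since quantities such as core utilization and DRC counts live on very different scales.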
The proposed iterative greedy search (IGS) algorithm correlates the features with the resulting design constraints and the number of DRC violations. To be precise, it uses an iterative method to identify the optimal set of design constraints.
The proposed IGS takes a set of features and a feature priority list as inputs and returns the optimal design constraints, as shown in Algorithm 2. The objectiveEvaluation() function iteratively computes the optimized results over the Cartesian product of x and y pairs, given the priority list, in Line 6, utilizing Algorithm 1. The sort() function computes the optimal design constraints in Line 8. Finally, Algorithm 2 returns the efficient design constraints in Line 9. For n feature constraints, the for loop in Line 5 uses n(n−1)/2 iterations. However, the proposed methodology has a limited number of feature constraints, so the algorithm is very time-efficient.
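A rough sketch of the pairwise search may make the n(n−1)/2 iteration count concrete. It is an illustration under stated assumptions, not Algorithm 2 itself: the per-feature goals (`"max"` or `"min"`), the signed-sum cost, and all design values are hypothetical, and the function names do not come from the paper.

```python
from itertools import combinations

SIGN = {"max": 1, "min": -1}

def pair_cost(design, fa, fb, goals):
    """Signed sum: features to maximize count positively, features to minimize negatively."""
    return SIGN[goals[fa]] * design[fa] + SIGN[goals[fb]] * design[fb]

def iterative_greedy_search(designs, goals, priority):
    """Return (chosen_design_name, number_of_pair_evaluations).

    For every unordered feature pair, the design with the best pair cost is
    kept as a candidate; the candidates are then sorted on the priority feature.
    """
    winners = []
    for fa, fb in combinations(goals, 2):     # n(n-1)/2 feature pairs
        best = max(designs, key=lambda name: pair_cost(designs[name], fa, fb, goals))
        winners.append(best)
    # Sort so that the best value of the priority feature comes first.
    order = -SIGN[goals[priority]]
    ranked = sorted(set(winners), key=lambda name: order * designs[name][priority])
    return ranked[0], len(winners)

# Hypothetical goals and designs, with features assumed to be on comparable scales.
goals = {"CU": "max", "DRC": "min", "TWL": "min"}
designs = {
    "D1": {"CU": 60, "DRC": 30, "TWL": 20},
    "D2": {"CU": 70, "DRC": 10, "TWL": 25},
    "D3": {"CU": 55, "DRC": 5, "TWL": 15},
}
print(iterative_greedy_search(designs, goals, "DRC"))
print(iterative_greedy_search(designs, goals, "CU"))
```

With three features the loop runs 3(3−1)/2 = 3 times, and changing the priority feature changes which candidate is returned, which is the behavior a priority list is meant to provide.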

Experimental Setup
The proposed methodology considered open-source implementations of arithmetic cores, cryptographic cores, processors, co-processors, memory cores, ECC cores, communication and system controllers, DSP cores, and video controllers from OpenCore [38]. Their high-level features were varied to implement 589 designs, and each design contained 74 features, resulting in 60.8 K data.
This research used Synopsys Design Compiler for logic synthesis and IC Compiler for place and route with a 28 nm CMOS technology library. Synthesis and place and route were driven by our custom tool command language (TCL) scripts. This research used 80% of the data for training and 20% for testing. In addition, it used the Python programming language to implement the proposed algorithms on an Intel Xeon(R) Silver 3.2 GHz 20-core processor with 48 GB of random access memory (RAM).
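An 80/20 split of this kind can be reproduced with the standard library alone. The sketch below is only an assumption about how such a split might be done: the helper name `split_train_test` is hypothetical, the seed 42 is reused purely for reproducibility, and the split is applied to design indices rather than to the actual dataset.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows reproducibly and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)     # seeded shuffle, same result every run
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# 589 is the number of implemented designs; the rows here are just indices.
train_rows, test_rows = split_train_test(range(589))
print(len(train_rows), len(test_rows))
```

Fixing the seed means that repeated runs train and test on exactly the same designs, which keeps accuracy figures comparable between experiments.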

Results
To determine design feature dependencies, we consider several design constraints, including CP, CU, DRC violations, and TWL, as shown in Figure 3a, Figure 3b, Figure 3c, and Figure 3d, respectively. Empirically, clock period variation has a non-linear relationship with CU and DRC violations and mostly worsens DRC issues. However, the proposed algorithm identifies the most suitable CU, DRC violations, and TWL for a specific CP. The best solutions show that a reduction in CP reduces the CU and DRC violations but increases the TWL. A 21-30% reduction in CP resulted in a significant 53% reduction in DRC violations. An increase in core utilization had an adverse impact on DRC violations and CP, as shown in Figure 3b. In general, a decrease in DRC violations requires a reduction in CU and an increase in TWL, as shown in Figure 3c. An increase in TWL reduces the DRC violations and also improves the clock frequency, which confirms the non-linear relationship among those constraints, as shown in Figure 3d, and leaves the designer with a non-trivial task.
The proposed methodology performed the analysis using our maximization, minimization, and differentiation functions, as shown in Figure 4. The optimal solutions are represented by the largest blue, red, and black circles for the maximization, minimization, and differentiation (subtraction) functions, respectively. This research formulates the constraint prediction problem as a regression problem solved with the DTR algorithm. In this analysis, it used a random_state of 42 for the splitting and the MSE criterion function.
The proposed algorithm predicted the CU, AR, TA, CP, number of DRC violations, and TWL using the DTR algorithm, as shown in Table 1. Table 1 shows the results of the greedy-based design-space exploration algorithm and compares them with existing industry-standard synthesis tools (baseline). Each pair of rows uses a single priority constraint (bold font). For example, when one sets CU as the priority, the proposed method shows an 11.2% CU improvement and a 75.8% DRC reduction. Similarly, the proposed algorithm reduces DRC violations by 98.4% with only a 6.9% area penalty.

Discussion
Following an ensemble DTR algorithm, we proposed an optimal design cost algorithm (i.e., IGS) that uses physical design-related features and performance-related features to identify the optimal set of design parameters. A priority-based design space exploration allows designers to capture the best solutions for a given set of constraints. The proposed DTR algorithm exhibits 88.9% DRC hotspot prediction accuracy, compared to the 73% prediction accuracy of the existing CNN-based model [17].
The proposed DTR algorithm has a training time complexity of O(d_F n log(n)), where d_F is the number of dimensions or features and n is the number of data points. The testing time complexity of the proposed DTR algorithm is O(D_t), where D_t is the depth of the tree. The running time of Algorithm 1 is almost constant because it uses a limited number of designs and design features. The proposed DTR algorithm requires 31.8 ms of training time and 0.72 ms of inference time, while the existing J-Net-based CNN model [18] has 2 h of training time and 1 min of inference time.
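The O(D_t) inference cost follows from the fact that a prediction visits one node per tree level. The sketch below illustrates this with a hand-built tree; the `Node` class, the thresholds, and the leaf values are hypothetical and are not taken from the trained model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: float = 0.0               # prediction stored at a leaf
    feature: Optional[int] = None    # index of the tested feature (None marks a leaf)
    threshold: float = 0.0           # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def predict(node, x):
    """Walk from the root to a leaf; return (prediction, number_of_tests)."""
    tests = 0
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
        tests += 1                   # one comparison per level, at most the tree depth
    return node.value, tests

# Hypothetical depth-2 tree over two features.
tree = Node(
    feature=0, threshold=65.0,
    left=Node(value=11.0),
    right=Node(feature=1, threshold=3.0,
               left=Node(value=40.0),
               right=Node(value=44.0)),
)
print(predict(tree, [60.0, 0.0]))
print(predict(tree, [70.0, 5.0]))
```

The number of comparisons never exceeds the depth of the tree regardless of how many training points were used, which is consistent with the sub-millisecond inference time reported above.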
The inclusion of the proposed methodology with the existing EDA tools has the potential to reduce days of engineering costs. In addition, it will pave the designer's way to building a better constraint model for area and performance optimizations.

Conclusions
This paper proposed a search algorithm that combines ensemble and heuristic greedy search methods to explore the design space quickly and efficiently in order to improve the performance of the synthesis tool flow. To explore a large search space and to illustrate the efficacy of the proposed algorithm, this research collected a large number of features (i.e., 74) from various designs and identified the features that influence DRC violations. The proposed algorithm achieved up to 99.99% design constraint prediction accuracy before the tedious detailed routing and reduced DRC violations by 98.4% with only a 6.9% area penalty.