Tree-Based Machine Learning Models with Optuna for Predicting Impedance Values in Circuit Analysis

The transmission characteristics of a printed circuit board (PCB) ensure signal integrity and support the entire circuit system, and impedance matching is critical in the design of high-speed PCB circuits. Because the factors affecting impedance are closely tied to the PCB production process, circuit designers and manufacturers must work together to adjust the target impedance and maintain signal integrity. In this study, five tree-based machine learning models, namely the decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and light gradient boosting machine (LightGBM), were used to forecast target impedance values, and the Optuna algorithm was used to determine the hyperparameters of each forecasting model. The results revealed that all five tree-based machine learning models with Optuna achieved satisfactory forecasting accuracy in terms of three measurements: mean absolute percentage error (MAPE), root mean square error (RMSE), and coefficient of determination (R2). Among them, the LightGBM model with Optuna outperformed the other models. In addition, tuning the hyperparameters of the machine learning models with Optuna further increased the accuracy of impedance prediction. Thus, the results of this study suggest that tree-based machine learning techniques with Optuna are a viable and promising alternative for predicting impedance values in circuit analysis.


Introduction
An integrated circuit (IC) comprises electronic circuits and components connected to each other via planar conductors electrically arranged on a planar silicon semiconductor substrate. Interconnections provide the signal communication between the dies on a printed circuit board (PCB). Because signal integrity in high-speed circuit design is critical to electronic products, signal integrity issues are essential for both high-speed circuit designers and PCB manufacturers. As a result, signal integrity has been extensively investigated in various high-speed and high-frequency applications [1][2][3][4]. Due to the close relationship between impedance and wiring patterns, impedance matching is one of the critical factors in high-speed PCB circuit design.
Impedance is the combined effect of resistance, capacitance, and inductance in a high-frequency circuit. The controlled impedance in a printed circuit board ensures high signal integrity. The impedance Z is represented in Equation (1):

Z = R + j(XL − XC) (1)

where R is the resistance, j is the imaginary unit, XL is the inductive reactance, and XC is the capacitive reactance. Various signals are transmitted in the wires on the circuit board.

Table 1. Recent applications of machine learning approaches in circuit design and analysis.
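As a quick numerical illustration of Equation (1), the sketch below evaluates Z as a complex number. The values of R, XL, and XC are illustrative only and are not taken from the paper's dataset.

```python
# Illustrative values (in ohms); R, XL, and XC are the quantities defined
# for Equation (1), not data from this study.
R, XL, XC = 50.0, 30.0, 10.0

Z = complex(R, XL - XC)   # Z = R + j(XL - XC)
magnitude = abs(Z)        # |Z| = sqrt(R^2 + (XL - XC)^2)

print(Z, round(magnitude, 2))
```

A mismatch between this characteristic impedance and the line's target impedance is what produces the reflections discussed below.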

Literature               Years  Applications                            Methods
Zhang et al. [7]         2022   Impedance prediction                    DNN
Juang et al. [8]         2022   Decoupling capacitor                    Genetic algorithms
Xu et al. [9]            2021   Decoupling placement optimization       Genetic algorithms
Park et al. [10]         2020   Decoupling capacitor                    Q-learning
Swaminathan et al. [11]  2020   Signal and power integrity              FFNN, RNN, CNN
Cecchetti et al. [12]    2020   Decoupling capacitor                    GA-ANN
Schierholz et al. [13]   2020   Predicting target impedance violations  ANN
Zhang et al. [14]        2019   Decoupling capacitor                    DRL, DNN
Park et al. [15]         2019   Signal and power integrity              DNN
Givaki et al. [16]       2019   Impedance estimation                    Random forest
Paulis et al. [17]       2019   Decoupling placement optimization       Genetic algorithms

Using decoupling capacitors, Xu et al. [9] presented a genetic-algorithm-based method to optimize power delivery networks (PDNs). The proposed method can also optimize jitter and PDN impedance. According to the simulation and analysis results, the designed optimization method could reduce jitter and provide an optimal solution for the number of decoupling capacitors. Meanwhile, Park et al. [10] proposed an optimized decoupling capacitance design method based on a Q-learning algorithm for silicon-interposer-based 2.5-D/3-D ICs. When testing power distribution networks, the presented approach was used to confirm target impedance values. The validation procedure was confirmed by comparing full-search simulations with the best result. The computation time of the proposed model was significantly less than that of the full-search simulation.
Swaminathan et al. [11] used machine learning techniques to solve signal and power integrity issues in package design. According to the findings of their study, the logical use of machine learning techniques can eliminate errors in the design process and thus reduce design cycle time. Meanwhile, Cecchetti et al. [12] proposed an iterative optimization for the placement of decoupling capacitors in PDNs based on genetic algorithms (GA) and artificial neural networks (ANN). The study revealed that the designed GA-ANN model effectively produced results consistent with those obtained from the simulator, which required a longer computation time.
Schierholz et al. [13] also used an ANN to predict target impedance violations in a large design space. The results of their study revealed that the prediction accuracy in the design space for PDN impedance was very satisfactory. On the other hand, Zhang et al. [14] applied a deep reinforcement learning (DRL) approach and a deep neural network (DNN) to optimize the allocation of decoupling capacitors at priority positions. According to the results of their study, the proposed hybrid method could provide the minimum number of decoupling capacitors needed to satisfy the target impedance in a printed circuit board test.
Park et al. [15] created a DNN with regression and classification functions to forecast and classify peak time-domain reflectometry impedance in the presence of through-silicon via void defects. Their study revealed that, by partially tuning weights, the proposed models could provide accurate results. To estimate the impedance of power networks, Givaki et al. [16] proposed a random-forest model. The proposed model used the evolutionary multi-objective NSGA-II algorithm to tune the random forest so that it could estimate the resistance and inductive reactance accurately. In addition, Paulis et al. [17] employed genetic algorithms to optimize decoupling capacitors for PDN design at the PCB level to obtain a frequency spectrum at various locations. The close agreement between the measured results and the simulated input impedance revealed the effectiveness of the proposed method when validated on the board.
The characteristics of PCBs vary with different suppliers in the PCB circuit design and manufacturing process. The PCB foundry can precisely control the impedance characteristics when producing a PCB, while the signal transmission speed can be tested after the PCB board is manufactured. In this study, five tree-based machine learning models with the Optuna optimization algorithm were used to forecast target impedance values, with Optuna determining the hyperparameters of the machine learning models. The rest of this study is organized as follows. Section 2 depicts the PCB-based substrate and circuit transfer characteristics. Section 3 introduces the machine learning models and the Optuna optimization algorithm, while Sections 4 and 5 describe the numerical results and conclusion, respectively.

IC-PCB Circuit Signal Transmission and Substrate Structure
Impedance matching is a common working state in PCB circuits, reflecting the power transfer relationship between the input and the output circuits. The PCB or substrate design determines the characteristic impedance discontinuities of interconnections and thus the signal integrity. Maximum power transfer, on the other hand, is achieved when the circuit impedance is matched. Signal integrity and power loss are influenced by the impedance gaps between the IC package and the PCB system, and reflections can cause unexpected noise in systems [18]. Figure 1 shows the circuit signal transmission on the substrate. As depicted in Figure 1, the impedance gaps have a significant impact on signal integrity and power loss. Ideally, the internal impedance of the signal transmitter should equal the target impedance of the transmission line at the source to reduce reflections when sending signals. Meanwhile, to communicate the signal between the chip and the circuit board, the circuit inside the IC carrier board connects the chip and the external circuit board together. The substrate is made up of lines and patterns, dielectric layers, holes, and solder-resist ink. Figure 2 depicts a multilayer PCB stack-up. When using a time-domain reflectometer to measure impedance signals, probes are placed on the outer signal line and the GND pin. The measurement includes the metal and dielectric layers in the inner layer of the PCB stack-up. Since the circuit performance of the provided PCB must ensure that the signal is not reflected during transmission, which keeps the signal intact and reduces transmission loss, the substrate material plays a crucial role in impedance [19,20]. Figure 3 illustrates the proposed architecture for impedance value prediction. As shown in Figure 3, the architecture is divided into three stages: data preprocessing, a training stage, and a testing stage.
SPIL (Siliconware Precision Industries Co., Ltd., Taichung, Taiwan) provided the raw impedance data used in this study. Each dataset was preprocessed and divided into 80% training data and 20% testing data. The training data were employed to build models with tree-based machine-learning (ML) methods during the training stage. Five models, including the decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and light gradient boosting machine (LightGBM), were used in this study. In addition, the Optuna framework was used to determine the model hyperparameters. Finally, the finalized models were used to predict on the testing data, and the forecasting performances were evaluated.
Table 2 presents the raw data of the PCB products, including product types and impedance variables. The raw data were categorized into different datasets based on two attributes: signal layers and patterns. Seven attributes were used as independent variables in this study: trace width, gap, space, solder mask, L1 thickness, base, and dielectric thickness. According to the manufacturing process, the product data were classified into three categories, A, B, and C, represented by the patterns GSSG, SS, and S, respectively. The number of signal layers varies with the PCB layer structure. Table 3 displays the datasets for product subcategories based on the signal layers and patterns.
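The 80/20 split described above can be sketched as follows. This is a minimal stdlib-only illustration; the feature names, function name, and seed are assumptions for the example, not details from the study.

```python
import random

# Hypothetical record layout using the seven attributes listed above.
FEATURES = ["trace_width", "gap", "space", "solder_mask",
            "l1_thickness", "base", "dielectric_thickness"]

def split_80_20(records, seed=42):
    """Shuffle the dataset, then take the first 80% for training
    and the remaining 20% for testing."""
    rng = random.Random(seed)   # fixed seed for a reproducible split
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

train, test = split_80_20(list(range(100)))
print(len(train), len(test))  # 80 20
```

In practice each record would hold the seven attribute values plus the measured impedance; the split logic is unchanged.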

Tree-Based Machine Learning
Tree-based machine learning models are supervised machine learning algorithms employed for solving classification and regression problems. In the tree-dividing procedure, the training data are divided into subsets, where every split increases the complexity of the model so that it can perform the task well [21][22][23][24][25].
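For a single numeric feature, the tree-dividing step described above amounts to choosing the threshold that minimizes the squared error of the two resulting subsets. A minimal stdlib-only sketch (the function name and toy data are illustrative):

```python
from statistics import mean

def best_split(xs, ys):
    """Return the threshold on x that minimizes the total squared error
    of the two subsets it creates (the basic regression-tree split)."""
    best_sse, best_t = float("inf"), None
    for t in sorted(set(xs))[:-1]:                     # candidate thresholds
        left  = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x >  t]
        sse = (sum((y - mean(left)) ** 2 for y in left)
               + sum((y - mean(right)) ** 2 for y in right))
        if sse < best_sse:
            best_sse, best_t = sse, t
    return best_t, best_sse

# Toy data: the best split cleanly separates the two output levels.
print(best_split([1, 2, 3, 4], [10.0, 10.0, 20.0, 20.0]))  # (2, 0.0)
```

A full decision tree applies this step recursively to each subset, across all features.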
In addition to the basic decision tree (DT) and random forest (RF), extreme gradient boosting (XGBoost), the light gradient-boosting machine (LightGBM), and categorical boosting (CatBoost) are popular and powerful methods with outstanding performance in many fields. DT and RF are basic tree-based machine learning models, while XGBoost, CatBoost, and LightGBM are advanced gradient-boosting decision tree models. Tree-based machine-learning models have been used in many fields, such as economics and finance [26,27], politics [28], business and insurance [29,30], biology and the environment [31,32], and medicine and healthcare [33,34]. However, the application of tree-based machine-learning models to forecasting impedance values for circuit analysis has not been widely investigated. Thus, this study used Optuna to determine hyperparameters for tree-based machine learning models applied to impedance prediction in the PCB industry.
The first tree-based model used in this study is the decision tree. As one of the basic methods for dealing with regression and classification problems [35], the decision tree handles variables with continuous and discrete values for regression and classification tasks, respectively [36]. This study used decision trees for regression. Table 4 indicates the hyperparameters and search ranges of the decision tree model used in this study [37][38][39].

Table 4. Hyperparameters of the decision tree model tuned in this study [37][38][39].

The second technique employed in this study is the random forest. Developed by Breiman [40], the random forest is composed of multiple decision trees; it performs random feature selection for each tree and then averages the output values of all individual trees to obtain the model's output [41]. Table 5 depicts the hyperparameters and search ranges used in the random forest method. As shown in Table 5, the hyperparameters include the number of trees in the forest (n_estimators), the maximum number of levels in each decision tree (max_depth), and the minimum number of data points required in a node before it is split (min_samples_split) [37][38][39][42].

Table 5. Hyperparameters of the random forest model tuned in this study [37][38][39][42].
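The averaging behavior of the random forest can be illustrated with a deliberately simplified sketch in which each "tree" is replaced by a predictor that just returns the mean of its bootstrap resample. All names and values below are illustrative, not part of the study.

```python
import random
from statistics import mean

def toy_forest_predict(train_ys, n_trees=100, seed=0):
    """Each 'tree' sees a bootstrap resample (sampling with replacement)
    and predicts its mean; the forest averages all per-tree predictions."""
    rng = random.Random(seed)
    tree_preds = []
    for _ in range(n_trees):
        sample = rng.choices(train_ys, k=len(train_ys))  # bootstrap resample
        tree_preds.append(mean(sample))
    return mean(tree_preds)

# Impedance-like toy targets (ohms); the aggregate stays near their mean.
print(round(toy_forest_predict([48.0, 50.0, 52.0, 49.0, 51.0]), 1))
```

A real random forest replaces the per-resample mean with a full decision tree grown on the resample, and additionally randomizes the features considered at each split.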

The XGBoost [43] approach is the third tree-based machine learning model employed in this study. XGBoost combines two characteristics, bagging and boosting, for ensemble learning. Bagging trains models in parallel and generates trees by independent sampling; its advantage is increased model stability and accuracy. Boosting generates trees sequentially, with each tree related to the previous ones: each tree generated in the boosting procedure corrects the poor learning of the previous tree [44,45]. Table 6 presents the hyperparameters and search ranges of the XGBoost model in this study [46,47].

Subsequently, CatBoost [48], one of the gradient-boosting algorithms based on decision trees, was introduced in this study. Using an ensemble learning strategy, the CatBoost approach combines weaker regression models to form a robust regression model. Table 7 illustrates the hyperparameters and search ranges of the CatBoost model in this study [47][49][50][51].

Lastly, this study employed LightGBM to forecast the impedance values for circuit analysis. LightGBM is a lightweight algorithm based on the gradient-boosting algorithm proposed by Ke et al. [52]. The LightGBM approach uses a novel gradient-based one-sided sampling technique to filter data instances and generate segmentation values. In addition, exclusive feature bundling is conducted to reduce the number of features. Thus, LightGBM results in an efficient training procedure. Table 8 shows the hyperparameters and search ranges of the LightGBM model in this study [26][53][54][55][56].
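The sequential, residual-correcting behavior of boosting described above can be sketched with one-split stumps as weak learners. This is a generic gradient-boosting illustration for squared error, not the actual XGBoost or LightGBM implementation; all names, the learning rate, and the toy data are illustrative.

```python
from statistics import mean

def fit_stump(xs, resid):
    """Fit a one-split stump to the residuals: (threshold, left_value, right_value)."""
    best = (float("inf"), None)
    for t in sorted(set(xs))[:-1]:
        left  = [r for x, r in zip(xs, resid) if x <= t]
        right = [r for x, r in zip(xs, resid) if x >  t]
        sse = (sum((r - mean(left)) ** 2 for r in left)
               + sum((r - mean(right)) ** 2 for r in right))
        if sse < best[0]:
            best = (sse, (t, mean(left), mean(right)))
    return best[1]

def boost(xs, ys, rounds=20, lr=0.5):
    """Start from the global mean; each stump corrects the current residuals."""
    pred = [mean(ys)] * len(ys)
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        t, lv, rv = fit_stump(xs, resid)
        pred = [p + lr * (lv if x <= t else rv) for p, x in zip(pred, xs)]
    return pred

print([round(p, 2) for p in boost([1, 2, 3, 4], [10.0, 10.0, 20.0, 20.0])])
```

Because each round removes half of the remaining residual (lr = 0.5), the predictions converge geometrically toward the targets.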
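The gradient-based one-sided sampling (GOSS) idea can be sketched as: keep the top a fraction of instances by absolute gradient, randomly sample a b fraction of the rest, and up-weight the sampled ones by (1 − a)/b so that gradient statistics stay approximately unbiased. The following is a minimal illustration of that idea, not LightGBM's actual code; all names and parameters are assumptions.

```python
import random

def goss_sample(grads, a=0.2, b=0.1, seed=0):
    """Return {instance_index: weight} after gradient-based one-sided sampling."""
    rng = random.Random(seed)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    n_top, n_rand = int(a * len(grads)), int(b * len(grads))
    top, rest = order[:n_top], order[n_top:]
    sampled = rng.sample(rest, n_rand)
    weights = {i: 1.0 for i in top}                    # large-gradient instances kept as-is
    weights.update({i: (1 - a) / b for i in sampled})  # small-gradient ones amplified
    return weights

grads = [(-1) ** i * (i + 1) / 100 for i in range(100)]  # toy gradients
w = goss_sample(grads)
print(len(w))  # 30 instances retained out of 100
```

Training then proceeds on only the retained, reweighted instances, which is what makes the procedure efficient.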

Optuna for Selecting Hyperparameters of Tree-Based Machine Learning Models
Determining hyperparameters for tree-based machine learning models significantly influences the forecasting performance [57,58].
Optuna [59] is an emerging tool with three advantages for model selection and hyperparameter determination. The first advantage is its define-by-run-style API. The second is an efficient sampling and pruning mechanism. The third is that it is easy to set up. The concept of a define-by-run-style API comes from deep-learning frameworks; it enables users to construct the hyperparameter search space dynamically. Meanwhile, the two policies of the efficient sampling and pruning mechanism are efficient searching and efficient performance estimation, both of which require cost-effective optimization methods. The most commonly used sampling methods are relational sampling and independent sampling, represented by the covariance matrix adaptation evolution strategy (CMA-ES) and the tree-structured Parzen estimator (TPE), respectively. Optuna also allows customized sampling procedures. The pruning mechanism is performed in two phases: first, the intermediate objective values are periodically monitored; second, a trial is terminated when a predefined condition is not met. Optuna's last design feature is its ease of setup, which allows it to be configured for anything from lightweight experiments to heavyweight distributed computations under a versatile architecture [60,61]. Figure 4 depicts the essential steps in determining hyperparameters for machine learning models with Optuna. The first step is to enter the hyperparameters of the machine learning models. In this study, five tree-based machine learning models, each having a different set of hyperparameters, were used. The second step is to determine the search ranges and types of the hyperparameters, including integers, real numbers, and categorical values. The third step is to set the objective function for Optuna, as provided by the machine learning models. Then, the optimization direction is determined.
Minimizing the forecasting error serves as the direction and the objective function of this study. Finally, the number of Optuna trials is set. In this research, the sampler, direction, and n_trials were set to the TPE sampler, minimize, and 100, respectively.

Numerical Results
This study demonstrated five tree-based ML methods: DT, RF, XGBoost, CatBoost, and LightGBM. To predict the impedance value, this study optimized each model's hyperparameters using Optuna. Three evaluation metrics were used to evaluate the experimental results: the mean absolute percentage error (MAPE), root mean square error (RMSE), and coefficient of determination (R2), as shown in Equations (2)-(4).

MAPE = (100%/n) × Σ_{i=1}^{n} |Y_i − Ŷ_i| / Y_i (2)

RMSE = sqrt( (1/n) × Σ_{i=1}^{n} (Y_i − Ŷ_i)^2 ) (3)

R2 = 1 − Σ_{i=1}^{n} (Y_i − Ŷ_i)^2 / Σ_{i=1}^{n} (Y_i − Ȳ)^2 (4)

where n is the number of forecasting instances, Y_i is the i-th actual impedance value, Ŷ_i is the i-th forecast impedance value, and Ȳ is the mean of the actual impedance values. Tables 9 and 10 depict the hyperparameters determined by Optuna. Figures 5 and 6 present the importance of the LightGBM models' hyperparameters for different products, indicating that the hyperparameter "min_data_in_leaf" is the most important for most products. Table 11 illustrates the prediction results of the tree-based machine learning models. Overall, the average MAPE and RMSE of the impedance predictions are low. LightGBM has the best performance on all datasets, followed by XGBoost and CatBoost, while the DT and RF performances are slightly inferior. In addition, all five tree-based machine learning models with Optuna obtained MAPE values of less than 10% and can therefore be treated as accurate forecasting models [62]. R2 measures the ability of the independent variables to explain the dependent variable; when the R2 value is close to 1, the explanatory ability of the independent variables is high [63][64][65]. Among all models, LightGBM is the most explanatory, followed by XGBoost and CatBoost, and finally DT and RF. Figure 7 provides the actual and predicted impedance values for the five tree-based machine learning models used in this study. Thus, the proposed tree-based machine learning models are useful and can be replicated to forecast impedances accurately during the PCB design process, effectively reducing PCB design time.

Table 9. The hyperparameters for category A models provided by Optuna.
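Equations (2)-(4) can be implemented directly. A stdlib-only sketch with illustrative toy values (not data from this study):

```python
from math import sqrt
from statistics import mean

def mape(y, yhat):
    """Mean absolute percentage error, Equation (2)."""
    return 100.0 / len(y) * sum(abs(a - f) / a for a, f in zip(y, yhat))

def rmse(y, yhat):
    """Root mean square error, Equation (3)."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(y, yhat)) / len(y))

def r2(y, yhat):
    """Coefficient of determination, Equation (4)."""
    ybar = mean(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

y, yhat = [10.0, 20.0, 30.0], [11.0, 19.0, 33.0]  # toy actual/forecast values
print(round(mape(y, yhat), 2), round(rmse(y, yhat), 2), round(r2(y, yhat), 3))
```

Lower MAPE and RMSE and an R2 closer to 1 all indicate better forecasts, which is how the models above are compared.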


Conclusions
This study used five tree-based machine-learning techniques with Optuna to predict impedance, motivated by the differences between circuit simulations and actual measurements in the production process of PCB wiring impedance. The forecasting outcomes revealed that tree-based machine learning models with Optuna are feasible and accurate methods for predicting target impedance values. The light gradient-boosting machine with Optuna performed the best on all three forecasting measurements. Thus, the proposed tree-based machine learning with Optuna is useful when defining target impedances during design, simulation, and manufacturing, as it improves impedance prediction for PCB designers and manufacturers. In the current manufacturing process, manufacturers can use existing impedance data with the method proposed in this research to perform accurate impedance prediction and thereby shorten PCB design and process time. Future studies may employ more impedance forecasting cases to examine the robustness of the designed machine learning techniques in predicting target impedance. Another potential direction for future work is applying other forecasting techniques to obtain more accurate results.