A Multi-Layer Data-Driven Security Constrained Unit Commitment Approach with Feasibility Compliance

Abstract: Security-constrained unit commitment is an essential part of day-ahead energy markets. The presence of discrete and continuous variables makes it a complex, time-consuming mixed-integer optimization problem. Grid operators solve unit commitment problems multiple times daily with only minor changes in the operating conditions. Solving a large-scale unit commitment problem requires considerable computational effort and time. However, the solution time can be improved by exploiting the fact that the operating conditions do not change significantly in day-ahead market clearing. Therefore, in this paper, a novel multi-layer data-driven approach is proposed, which significantly improves the solution time (90% time reduction on average for the three studied systems). The proposed approach not only provides a near-optimal solution (<1% optimality gap) but also ensures that the solution is feasible for the stable operation of the system (0% infeasible predicted solutions). The efficacy of the developed algorithm is demonstrated through numerical simulations on three test systems, namely a 4-bus system and the IEEE 39-bus and 118-bus systems.


Introduction
In a power system, the daily load profile fluctuates significantly between off-peak and peak demand hours. If generators are scheduled according to the peak demand of a day, several generators will operate at their lowest power levels during off-peak hours. This increases the cost of power production, which can be avoided through efficient scheduling. Hence, it is the job of a grid operator to determine cost-efficient generation schedules, a task known as the Unit Commitment (UC) problem. It is a mathematical optimization problem whose goal is to coordinate the production of a set of generators to meet the anticipated energy requirements, as illustrated in Figure 1.
Following deregulation in the USA in the 1990s, Independent System Operators (ISOs) and Regional Transmission Organizations (RTOs) replaced utilities as grid operators. Figure 2 illustrates the areas operated by RTOs in the USA. They typically run two bid-based markets to determine economic dispatch: real-time and day-ahead markets.
ISOs and RTOs solve security-constrained unit commitment (SCUC) problems multiple times daily for various probable operating scenarios to clear an annual energy market of USD 400 billion [2]. The goal is not only to set least-cost schedules that balance generation and demand but also to make sure that the system does not violate any security constraints. These constraints include not only system security constraints, e.g., the thermal limits that cap the power a transmission line can carry, but also physical and operational constraints. The involvement of such security and operational constraints makes it a very complicated large-scale mixed-integer energy optimization problem. A classical mixed-integer-programming (MIP) solver takes about twenty minutes to reach a viable optimal generation strategy for a large power system with a duality gap of 0.1%. Reducing this solution time can lead to a more efficient energy market and bring several additional benefits, such as the implementation of cost curves with higher granularity, within-interval dispatch, etc. [3].
This paper proposes a novel three-layer data-based algorithm to solve an SCUC problem, as described in Figure 3. The first layer involves training machine learning (ML) models to predict the binary commitment status of the generators based on the cost coefficients and the nodal demands. In the second layer, another set of ML models predicts the dispatch level of each generator using the predictions from the first layer together with the cost and nodal demand data. Finally, the third layer implements a post-processing strategy to satisfy any constraints violated by the solutions predicted in the first two layers. The contribution of this research work is twofold:
• A multi-layer modeling framework is proposed using predictive modeling techniques to model and solve an SCUC problem.
• A third feasibility compliance layer is proposed to ensure that the predicted optimal solution is also a feasible solution for the secure operation of the system.
The rest of the article is organized as follows: Section 2 reviews commonly used approaches for solving a UC problem. Section 3 discusses problem formulation for a single time interval SCUC problem, followed by the proposed approach in Section 4. Case studies and simulation results are examined in Section 5. Section 6 presents a performance comparison of the CPLEX mixed-integer quadratic programming (MIQP) solver with the proposed multi-layer approach. Finally, Section 7 summarizes the conclusions and future work.

Literature Survey
Many approaches for solving the UC problem have been reported in the literature, and they fall into a few broad categories. The most widely discussed optimization techniques are reviewed here.
In exhaustive enumeration (EE) techniques, UC problems are solved by listing all combinations of the binary variables representing the commitment status of generators. The combinations that result in the least operational cost are selected as the optimal commitment decisions. Early solutions to UC problems were obtained using EE techniques [4,5]. Although EE-based approaches provide exact optimal solutions, they do not scale to large systems because of the curse of dimensionality.
In priority list (PL) methods, generators are sorted in order of increasing production cost. This predetermined ordering is then used to solve the UC problem so that the power demand is met. Refs. [6–9] solved the UC problem using PL-based approaches. Ref. [6] proposed a three-stage PL-based approach to solve a UC problem involving ramp rate constraints. Refs. [7,8] solved single-area and multi-area UC problems using PL approaches based on a classical index. Ref. [9] presented a PL-based approach to solving a UC problem that also includes import/export constraints. The advantage of PL-based methods is their fast computation; however, they tend to produce sub-optimal generation schedules.
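The priority-list idea can be illustrated with a minimal sketch. The ranking index used here (average cost at full load), the function name `priority_list_commit` and the three-unit data are illustrative assumptions and are not taken from Refs. [6–9]; real PL methods also handle reserves, ramp rates and minimum up/down times.

```python
def priority_list_commit(units, demand):
    """Commit units in order of increasing average full-load cost
    until the committed capacity covers the demand (illustrative rule)."""
    def avg_full_load_cost(u):
        p = u["p_max"]
        return (u["a"] + u["b"] * p + u["c"] * p * p) / p

    committed, capacity = [], 0.0
    for u in sorted(units, key=avg_full_load_cost):
        if capacity >= demand:
            break
        committed.append(u["name"])
        capacity += u["p_max"]
    if capacity < demand:
        raise ValueError("total capacity cannot cover the demand")
    return committed


# Hypothetical three-unit system (cost coefficients and limits are made up).
units = [
    {"name": "G1", "a": 100.0, "b": 20.0, "c": 0.05, "p_max": 200.0},
    {"name": "G2", "a": 50.0, "b": 30.0, "c": 0.10, "p_max": 100.0},
    {"name": "G3", "a": 20.0, "b": 45.0, "c": 0.20, "p_max": 50.0},
]
schedule = priority_list_commit(units, demand=250.0)
print(schedule)
```

The ordering is fixed before the demand is known, which is why such methods are fast but may miss cheaper combinations.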
Dynamic programming (DP)-based techniques are among the earliest and most widely used optimization methods for solving a UC problem [10–14]. The advantage of DP-based approaches is that they are easily modified to include specific features of utilities, and it is relatively easy to include restrictions that affect hourly operations. However, the drawback is that time-dependent constraints are harder to add and are handled less rigorously. Moreover, solving a large-scale UC problem using DP can be time-consuming. Ref. [10] presented an adaptive DP-based algorithm to solve a UC problem involving coupled power and gas networks. In [11], Lowery et al. studied the feasibility of using DP for solving UC problems. They implemented dynamic programming for solving the UC problem for a 14-machine test system. Ref. [12] solved a multiple-area UC problem by employing a truncated DP and implementing it in a two-area Indian power system. In [13], Forootani et al. proposed an approximate DP-based methodology using Markov decision processes to solve a stochastic UC problem. Ref. [14] discusses a quadratic approximate DP-based approach for resource scheduling using the Uruguayan power system.
The Lagrangian relaxation (LR) method is another optimization technique widely used by utilities for UC problems. Its adoption by utilities is more recent than that of the DP methods. Like DP, it is easily modified to include specific characteristics of utilities. However, it has the disadvantage of inherent suboptimality. In LR-based methods, the UC problem is solved by relaxing the coupling constraints, i.e., temporarily treating them as if they did not exist. In [15], Merlin et al. developed a novel technique for the UC problem using the LR method and tested it on the Électricité de France system. Ref. [16] proposed a new three-stage LR algorithm for UC. The first stage involves maximization of the Lagrangian dual of the UC problem. The second stage consists of finding a reserve-feasible dual solution, and economic dispatch is performed in the third stage. In [17], Takriti et al. developed a refined LR algorithm using integer programming. LR methods and their refinements through hybrid approaches remain an active area of research for solving UC problems.
Evolutionary programming (EP)-based approaches are founded on the principles of evolution [18–21]. In such approaches, a population of potential solutions is maintained based on the fitness score of individuals. Ref. [18] applied the genetic algorithm (GA) to UC problems with time spans ranging from a day to a week. The feasibility of GAs for solving UC problems has been explored for both small and large systems [19]. Ref. [20] developed a new UC approach using a GA with domain-specific mutation operators. The robustness of the presented approach is illustrated by comparison with an LR-based UC approach. In [21], Chen et al. presented a hybrid PL-GA algorithm to solve a UC problem involving wind farms and studied its performance on two test systems having 10 and 20 units.
In the 1990s and early 2000s, there was some interest among researchers in using neural networks (NNET) for solving the UC problem, but because of the processing constraints at the time, it proved very challenging to reach high-quality solutions [22–24]. Lately, several research studies have focused on the use of machine learning (ML) for optimization problems; however, their focus has been on reducing the number of constraints to speed up the MIP engines rather than on replacing those engines with learning-based approaches [25].
There has also been considerable research on solving the unit commitment problem using decentralized approaches. Ref. [26] presented a decentralized SCUC approach to accelerate the scheduling of large systems. In this method, a power system is decoupled into multiple areas interconnected by tie-lines, and each area is responsible for solving its reduced SCUC problem in parallel. In [27–29], a two-stage decentralized approach is proposed for solving a reliability SCUC. A power system is broken down into multiple areas, and every area solves its own SCUC problem.

Unit Commitment Problem Formulation
Even though the UC problem involves multiple intervals, for simplicity and to emphasize the efficacy of the developed approach, it is defined here as a single time interval MIQP problem, as in several other research studies [30,31]. The fundamental distinction of the multi-period problem is the presence of time-dependent constraints; however, the developed method is equally applicable to an SCUC problem involving multiple time periods.
Let $N_B$, $N_G$ and $N_L$ represent, respectively, the number of buses, generators and transmission lines of the power network. The sets $B = \{1, 2, 3, \ldots, N_B\}$, $G = \{1, 2, 3, \ldots, N_G\}$ and $L = \{1, 2, 3, \ldots, N_L\}$ are the sets of buses, generators and lines, respectively. For every generator $g \in G$, let $I_g \in \{0, 1\}$ represent the commitment decision variable and $P_g$, a continuous variable, represent the output power generated by $g$. A single-period SCUC problem can be formulated as an MIQP problem as presented below.

Objective Function
The goal of a UC problem is to minimize the total operating expense of all the generators. Let $F_g$ indicate a generator's cost function. Assuming a quadratic cost of power production (i.e., $F_g = a_g + b_g P_g + c_g P_g^2$) and neglecting the start-up costs, the objective function can be formulated as:
$$\min \sum_{g \in G} \left( a_g I_g + b_g P_g + c_g P_g^2 \right) \quad (1)$$
Here $I_g$, $P_g$, $a_g$, $b_g$ and $c_g$ represent, respectively, the commitment status, the production level, and the cost coefficients of generator $g$. If a generator $g$ is committed, its no-load cost ($a_g I_g$) and power production cost ($b_g P_g + c_g P_g^2$) will be included in the objective function to be minimized.
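As a quick numerical illustration of this objective, the following sketch evaluates the total cost for a given commitment and dispatch. The function name `total_operating_cost` and the three-generator data are hypothetical; the sketch assumes that an uncommitted generator has zero output, so that only its no-load term needs the commitment flag.

```python
def total_operating_cost(commit, power, a, b, c):
    """Sum of no-load cost (a_g * I_g) and production cost
    (b_g * P_g + c_g * P_g^2) over all generators."""
    return sum(
        a_g * i_g + b_g * p_g + c_g * p_g ** 2
        for i_g, p_g, a_g, b_g, c_g in zip(commit, power, a, b, c)
    )


# Hypothetical data: generators 1 and 2 are on, generator 3 is off.
commit = [1, 1, 0]
power = [150.0, 100.0, 0.0]
a = [100.0, 50.0, 20.0]
b = [20.0, 30.0, 45.0]
c = [0.05, 0.10, 0.20]
cost = total_operating_cost(commit, power, a, b, c)
print(cost)
```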

The Constraints
There are several operational as well as security requirements that need to be met while minimizing operational costs. Some of the commonly used constraints in formulating a single-period SCUC are discussed here.

Nodal Power Balance
At each bus $b \in B$, the nodal power balance (i.e., ∑ generation − ∑ load − ∑ line flow = 0) must be satisfied. While formulating a UC problem as a mixed-integer programming (MIP) problem, the line flows are usually approximated by DC power flow assumptions in the literature. Hence, the nodal power balance constraints are expressed as follows.
$$\sum_{g \in G_b} P_g - d_b - \sum_{l \in L_b} P_l = 0, \quad \forall b \in B \quad (2)$$
Here $d_b$ represents the estimated demand at node $b$, $P_l$ represents the active power injected into line $l$ from node $b$, and $L_b$ and $G_b$ respectively represent the sets of transmission lines and generators connected to node $b$. There are $N_B$ power balance constraints, each representing the DC power balance at one node $b$.

Generator Power Limits
These constraints ensure that, if committed, the power generated by a generator $g$ stays within its maximum ($\overline{P}_g$) and minimum ($\underline{P}_g$) power generation limits.
Here $R_g$ is the spinning reserve power provided by a generator $g$.

Thermal Limits of Transmission Lines
Because of the thermal limits of lines, there is a limit on the power that each line can carry. So, to ensure that the line flow ($P_l$) of line $l$ does not surpass its maximum power carrying capacity ($\overline{P}_l$), the constraints in (5) must be satisfied. The line flows are approximated using DC power flow assumptions. $x_l$ is the reactance of line $l$, and $\delta_l$ is the voltage angle difference of the two buses between which line $l$ is connected.

Spinning Reserve Requirements
The spinning reserves are described as the online generation capacity minus the total generated power. The main purpose of the spinning reserve is to cope with generation losses or unexpected load changes. The spinning reserves must be higher than or equal to the minimum reserve requirements (R), also called system-wide reserves. R is usually described as a percentage of the total demand.

Binary and Non-Negative Variables
The UC problem is a mixed-integer non-convex optimization problem. It has both binary ($I_g$) and continuous ($P_g$) decision variables, which are usually non-negative.
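As a compact summary, the constraints described in this section can be written in the following form, with $\underline{P}_g$ and $\overline{P}_g$ denoting the minimum and maximum generation limits, $\overline{P}_l$ the line capacity, and $R$ the system-wide reserve requirement. This is a sketch assembled from the verbal definitions above; the exact arrangement of terms (for instance, how the reserve $R_g$ enters the upper generation limit) and the equation numbering are assumptions rather than the formulation as published.

```latex
\[
\begin{aligned}
& \underline{P}_g I_g \le P_g, \qquad P_g + R_g \le \overline{P}_g I_g,
  && \forall g \in G \\
& P_l = \frac{\delta_l}{x_l}, \qquad -\overline{P}_l \le P_l \le \overline{P}_l,
  && \forall l \in L \\
& \sum_{g \in G} \left( \overline{P}_g I_g - P_g \right) \ge R \\
& I_g \in \{0, 1\}, \qquad P_g \ge 0,
  && \forall g \in G
\end{aligned}
\]
```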

The Proposed Approach
The proposed data-driven approach for solving a SCUC problem contains three layers which are described below.

Layer-1: Prediction of Commitment Decision Variables
In the first layer of the proposed multi-layer approach, several machine learning classification models are trained using supervised learning to predict the commitment status of the units. During the training phase, either historic or synthetic data can be used. The data points in the training dataset represent solved cases of the UC problem. Many machine learning approaches can be employed for model training, and the selection of an approach depends on its prediction performance. An approach that works well for one application may not perform well for another. Hence, the common practice is to train multiple models and choose the best-performing one, as shown in Figure 4.
For the first layer prediction, multiple models are trained, such as "K-Nearest Neighbor" (KNN), "Generalized Linear Model" (GLM), "Linear Discriminant Analysis" (LDA), "Support Vector Machine" (SVM), "Decision Tree" (DT), "Random Forest" (RF), "Extreme Gradient Boosting" (XGB) and "Neural Network" (NNET). For model evaluation, the original dataset is divided into disjoint train and test datasets. For initial selection and hyperparameter tuning during the training process, 10-fold repeated cross-validation is utilized. The final selection is performed on the basis of the test set classification accuracy, which is defined in (9).
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (9)$$
Here $TP$, $TN$, $FP$ and $FN$ respectively indicate "true-positives", "true-negatives", "false-positives" and "false-negatives" as defined in Figure 5.
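The accuracy metric used for layer-1 model selection can be computed directly from predicted and actual commitment statuses. The sketch below is illustrative only; the helper names and the eight-sample label vectors are made up, with 1 meaning "committed" (the positive class).

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    tp = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(actual, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(actual, predicted) if y == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Classification accuracy: correct predictions over all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples to evaluate")
    return (tp + tn) / total


# Hypothetical commitment statuses of one generator over eight test cases.
actual = [1, 1, 0, 0, 1, 0, 1, 1]
predicted = [1, 0, 0, 0, 1, 1, 1, 1]
counts = confusion_counts(actual, predicted)
acc = accuracy(*counts)
print(counts, acc)
```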

Layer-2: Prediction of Production Decision Variables
The predicted commitment decision variables obtained from the layer-1 prediction are then used to predict the production levels $P_g$ of the generators using trained ML regression models. Similar to the layer-1 approach, several models are trained, such as KNN, GLM, DT, RF, XGB and NNET, and the best-performing model is used for the final prediction. The performance evaluation of layer-2 models is similar to that of layer-1, but the metrics used to quantify performance are different. The commonly used metrics are "mean-absolute-error" (MAE), "root-mean-square-error" (RMSE), "mean-absolute-percentage-error" (MAPE) and "coefficient-of-determination" (R-squared). We used MAE for the best model selection based on the test set performance, which is defined in (10).
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| P_{g_i} - \hat{P}_{g_i} \right| \quad (10)$$
Here $P_{g_i}$ and $\hat{P}_{g_i}$ are respectively the actual and the predicted production of a generator $g$ for test set instance $i$, and $n$ is the size of the test dataset.
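For concreteness, the MAE used for layer-2 model selection can be computed as in the following sketch. The function name and the four sample values (in MW) are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted dispatch."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical dispatch levels of one generator over four test cases.
actual_mw = [100.0, 150.0, 80.0, 0.0]
predicted_mw = [110.0, 140.0, 85.0, 0.0]
mae = mean_absolute_error(actual_mw, predicted_mw)
print(mae)
```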

Layer-3: Feasibility Compliance
Since the layer-1 and layer-2 models are trained on data consisting of solved feasible solutions, they implicitly predict solutions that largely satisfy the constraints. However, because of the inherent uncertainty in predictive modeling techniques, the proposed approach must ensure that every constraint is satisfied for the safe operation of the power system. Therefore, a mandatory third layer for feasibility compliance is added to attain this objective.

Spinning Reserve Requirements
To satisfy the spinning reserve constraints, it is checked whether the total committed generation capacity ($P_G^c$) is at least the total demand ($D$) plus the spinning reserve requirement ($R$), as given by (11)-(13). If this is not satisfied, the nodal demands are assumed to have increased by a small percentage, and the commitment decisions are predicted again to increase the predicted committed capacity. Here $G^c$ represents the set of committed generators.
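The reserve check can be sketched as a small loop. Everything named here is hypothetical: `predict_commitment` stands in for the trained layer-1 model, the 2% inflation step and the iteration cap are assumed values for the "small percentage", and `toy_predictor` is a stand-in rule used only so the example runs.

```python
def ensure_reserve(predict_commitment, demands, p_max, reserve,
                   step=0.02, max_iter=50):
    """Re-predict commitment with slightly inflated demands until the
    committed capacity covers the true total demand plus the reserve."""
    total_demand = sum(demands)          # D, based on the original demands
    scaled = list(demands)
    for _ in range(max_iter + 1):
        commitment = predict_commitment(scaled)
        capacity = sum(p for p, i in zip(p_max, commitment) if i == 1)
        if capacity >= total_demand + reserve:
            return commitment
        scaled = [d * (1.0 + step) for d in scaled]   # assumed inflation rule
    raise RuntimeError("reserve requirement not met within max_iter")


def toy_predictor(nodal_demands):
    """Stand-in for a layer-1 model: commits units in a fixed order until
    capacity covers the (possibly inflated) total demand."""
    limits = [200.0, 100.0, 50.0]
    need, capacity, commitment = sum(nodal_demands), 0.0, []
    for limit in limits:
        on = capacity < need
        commitment.append(1 if on else 0)
        if on:
            capacity += limit
    return commitment


commitment = ensure_reserve(toy_predictor, [110.0, 85.0],
                            [200.0, 100.0, 50.0], reserve=19.5)
print(commitment)
```

In this toy case the first prediction commits only the 200 MW unit, which covers the 195 MW demand but not the 19.5 MW reserve; after two inflation steps the second unit is committed as well.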

Generator Power Limits
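The predicted production decisions ($P_g$), obtained from layer-2 prediction, are truncated by $\overline{P}_g$ and $\underline{P}_g$ to satisfy the upper and lower power limits of generators as follows.
$$P_g \leftarrow \min\left( \max\left( P_g, \underline{P}_g \right), \overline{P}_g \right) \quad (14)$$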

Total Power Balance
To balance the total generation ($P_G$) with the total demand ($D$), the generation load imbalance (GLI) is calculated, and the production decisions are updated using (16) while satisfying the upper and lower power limits of generators.
The decision variables ($P_g$) that violate the $\overline{P}_g$ or $\underline{P}_g$ limits are truncated at their limits, and the remaining $P_g$ variables are updated again using (15)-(16).
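The truncate-and-rebalance procedure can be sketched as follows. The sharing rule is an assumption: the imbalance is split equally among the committed generators that still have room to move, which may differ from the update rule the authors use in (16). Only committed generators are passed in, and all names and numbers are illustrative.

```python
def balance_dispatch(power, p_min, p_max, demand, tol=1e-6, max_iter=100):
    """Clip predicted dispatch to generator limits, then remove the
    generation-load imbalance (GLI) by equal sharing among free units."""
    p = [min(max(x, lo), hi) for x, lo, hi in zip(power, p_min, p_max)]
    for _ in range(max_iter):
        gli = demand - sum(p)                      # positive: more power needed
        if abs(gli) <= tol:
            return p
        free = [g for g in range(len(p))
                if (gli > 0 and p[g] < p_max[g]) or (gli < 0 and p[g] > p_min[g])]
        if not free:
            raise ValueError("demand lies outside the committed capacity range")
        share = gli / len(free)                    # assumed equal-sharing rule
        for g in free:
            p[g] = min(max(p[g] + share, p_min[g]), p_max[g])
    raise RuntimeError("imbalance not removed within max_iter")


# Hypothetical predictions for three committed units (MW).
dispatch = balance_dispatch(
    power=[210.0, 60.0, 20.0],
    p_min=[50.0, 20.0, 10.0],
    p_max=[200.0, 100.0, 50.0],
    demand=300.0,
)
print(dispatch)
```

Here the first unit is clipped to its 200 MW limit, leaving a 20 MW shortfall that is shared by the other two units.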

Thermal Limits of Transmission Lines
To ensure that none of the line capacity limits are violated, the generator powers ($P_g$) are re-dispatched so that all the line flows are within their capacity limits. For rescheduling the generator powers, small variations ($\Delta P_g$) are introduced in the $P_g$ variables using Generation Shift Factors (GSF) and Eigenvalue Decomposition (EVD), discussed below.

Generation Shift Factors (GSF)
The effect of nodal/bus power injections on the line flows can be studied using linear sensitivity factors called Generation Shift Factors. They indicate how much of the power injected at bus $b$ will appear on line $l$. Using the DC load flow assumptions, the effect of varying the nodal injections ($\Delta P$) on the node voltage angles ($\Delta\delta$) is given by (17).
$$\Delta\delta = B^{-1} \Delta P \quad (17)$$
$B$ is the DC load flow admittance matrix. In a similar manner, the effect of varying the node voltage angles ($\Delta\delta$) on the line flows ($\Delta P_{line}$) is given by (18).
$$\Delta P_{line} = D_X A_N \Delta\delta \quad (18)$$
$D_X$ is a square diagonal matrix with a diagonal element $D_X(i, i)$ equal to $-1/X_l$. Here $X_l$ is the inductive reactance of the $l$-th transmission line, and matrix $A_N$ is the node-arc incidence matrix, which indicates how lines are connected in the network. Using (17) and (18), we can find the effect of varying nodal injections on the power flow in lines.
$$\Delta P_{line} = T \Delta P \quad (19)$$
Here $T$ is the matrix of GSFs given by (20), and $\Delta P_{line}$, $\Delta\delta$ and $\Delta P$ are vectors of change in line flows, node voltage angles and nodal injections, respectively.
$$T = D_X A_N B^{-1} \quad (20)$$
Using (19), the effect of varying nodal powers ($\Delta P$), i.e., variation in generators' output powers ($\Delta P_g$), on the line flows can be calculated.
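A small numerical sketch may help make the shift factors concrete. It builds the reduced DC admittance matrix for a hypothetical 3-bus triangle network, inverts it, and computes the flow on each line per unit of power injected at each bus and withdrawn at the reference bus. The network data, the choice of bus 0 as reference, and the sign convention (positive flow runs from the line's "from" bus to its "to" bus) are assumptions of this example and may differ from the convention used for $D_X$ above.

```python
def invert(m):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def generation_shift_factors(n_bus, lines, slack=0):
    """Return T with T[l][b] = change in flow on line l per unit injection
    at bus b (withdrawn at the slack bus), under DC load flow assumptions."""
    keep = [b for b in range(n_bus) if b != slack]
    idx = {b: k for k, b in enumerate(keep)}
    B = [[0.0] * len(keep) for _ in keep]
    for f, t, x in lines:
        y = 1.0 / x
        for u, v in ((f, t), (t, f)):
            if u != slack:
                B[idx[u]][idx[u]] += y
                if v != slack:
                    B[idx[u]][idx[v]] -= y
    B_inv = invert(B)
    T = []
    for f, t, x in lines:
        row = []
        for b in range(n_bus):
            if b == slack:
                row.append(0.0)          # injection at the reference bus
                continue
            th_f = B_inv[idx[f]][idx[b]] if f != slack else 0.0
            th_t = B_inv[idx[t]][idx[b]] if t != slack else 0.0
            row.append((th_f - th_t) / x)
        T.append(row)
    return T


# Hypothetical network: (from_bus, to_bus, reactance in p.u.).
lines = [(0, 1, 0.10), (0, 2, 0.20), (1, 2, 0.25)]
T = generation_shift_factors(3, lines)
for row in T:
    print([round(v, 4) for v in row])
```

A useful sanity check is that, for an injection at bus 2, the flows on the two lines touching the reference bus sum to exactly one unit arriving there.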

Eigenvalue Decomposition (EVD)
A vector $x$ changes both its magnitude and its direction when multiplied by a system matrix $A$ as $y = Ax$. However, when the eigenvectors ($v_{\lambda_i}$) of the matrix $A$ are multiplied by it, they change only their magnitudes while keeping the same direction, i.e., $A v_{\lambda_i} = \lambda_i v_{\lambda_i}$, as given by (24). Figure 6 shows the effect of multiplying all vectors $\{x_i\}$ of equal length by $A$. It can be seen that when $x$ lies in the span of the eigenvector $v_m$ associated with the highest eigenvalue $\lambda_m$, $y$ is maximized, i.e., if $x = \alpha v_m$, then $y = \alpha A v_m = \alpha \lambda_m v_m$. Therefore, to have the maximum effect of $\Delta P$ on a specific line $l$, the weighted L2 norm of $\Delta P_{line}$ can be utilized. Using (19), it can be deduced that:
$$\|\Delta P_{line}\|_W^2 = \Delta P_{line}^T W \Delta P_{line} = \Delta P^T T^T W T \Delta P = \Delta P^T N \Delta P$$
Here, $N = T^T W T$ is a real symmetric matrix, and $W$ is a selection matrix used to select the lines whose flows are to be controlled. As mentioned, $\Delta P_l$ changes the most if $\Delta P$ lies in the span of the eigenvector $v_m$ of $N$, i.e., if $\Delta P = \alpha v_m$, then $\Delta P_l^{\max} = \alpha^2 \lambda_m$. The scalar $\alpha$ is a design parameter. It gives the desired variation in a line flow and can be calculated as $\alpha = \pm\sqrt{\Delta P_{l,req} / \lambda_m}$ [32,33]. Since the system's load and total generation must be balanced in the obtained solution, the condition $\sum_{\forall b} \Delta P_b = 0$ must also be satisfied. Therefore, only the signs of the elements of $v_m$ are used to get a sense of the required change in generator power (i.e., whether to increase or decrease $P_g$) to maximize the effect of $\Delta P$ on $\Delta P_l$.
The flowchart of the proposed three-layer algorithm is presented in Figure 7.
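The eigenvector step can be illustrated with a short sketch that forms $N = T^T W T$ for a small shift-factor matrix and finds its dominant eigenpair by power iteration. Power iteration is a stand-in chosen here for simplicity (the text only states that an eigenvalue decomposition is used), the matrix `T` is hypothetical (reference-bus column omitted), and `delta_req` is a made-up required change. The scaling factor follows the expression for $\alpha$ quoted in the text.

```python
import math


def weighted_gram(T, w):
    """N = T^T W T for a diagonal selection matrix W given by its diagonal w."""
    n = len(T[0])
    return [[sum(w[l] * T[l][i] * T[l][j] for l in range(len(T)))
             for j in range(n)] for i in range(n)]


def mat_vec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def dominant_eigenpair(N, iters=200, tol=1e-12):
    """Power iteration for a symmetric positive semidefinite matrix.
    Assumes the start vector is not orthogonal to the dominant eigenvector."""
    v = [1.0 / math.sqrt(len(N))] * len(N)
    lam = 0.0
    for _ in range(iters):
        y = mat_vec(N, v)
        norm = math.sqrt(sum(c * c for c in y))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space of N")
        v_new = [c / norm for c in y]
        lam_new = sum(a * b for a, b in zip(v_new, mat_vec(N, v_new)))
        if abs(lam_new - lam) < tol:
            return lam_new, v_new
        v, lam = v_new, lam_new
    return lam, v


# Hypothetical shift factors for 3 lines and 2 non-reference buses.
T = [[-9 / 11, -4 / 11], [-2 / 11, -7 / 11], [2 / 11, -4 / 11]]
w = [1.0, 0.0, 0.0]                      # control only the first line
N = weighted_gram(T, w)
lam_m, v_m = dominant_eigenpair(N)

signs = [1 if c > 0 else -1 if c < 0 else 0 for c in v_m]
delta_req = 0.05                          # assumed required change (p.u.)
alpha = math.sqrt(delta_req / lam_m)      # expression quoted in the text
print(lam_m, v_m, signs, alpha)
```

Only `signs` would be carried forward to the re-dispatch, since the actual adjustments must still sum to zero to keep generation and load balanced.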

Case Studies and Simulation Results
To evaluate the efficacy of the developed algorithm, three power systems are studied: a 4-bus 3-machine system and the IEEE 39-bus 10-machine and 118-bus 54-machine systems. Application of the developed approach to a larger system is under study. All the simulations are performed using MATLAB and RStudio on a 16-core AMD machine with 64 GB of DDR4 RAM.

Case Study: 4-Bus System
This small system is utilized as an explanatory example to investigate the working of the presented method. The system has four buses, three generators, two loads and five transmission lines. The system diagram is shown in Figure 8. Further information on the system is given in Table 1.   For model training, the dataset is generated by solving the MIQP problem presented in Section 4 using the IBM CPLEX toolbox for MATLAB. A dataset of 1000 samples, representing different scenarios of power demands and price signals, is generated assuming normal distributions for the nodal demand (d 2 , d 4 ) and the cost coefficients (a g , b g , c g ). The spread of the cost curve resulting from these cost coefficients is given in Figure 9. Figure 10 displays the histograms of the nodal demands and the cost coefficients in the generated synthetic dataset.
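The scenario sampling described above can be sketched as follows. This is an illustrative sketch only: the means, standard deviations, the redraw-until-positive rule and the function name `sample_scenarios` are assumptions, since the distribution parameters are not stated here. Each sampled scenario would then be passed to the MIQP solver to obtain its label, a step that is not shown.

```python
import random

def sample_scenarios(n, demand_stats, cost_stats, seed=0):
    """Draw n scenarios; every feature is sampled from its own normal distribution."""
    rng = random.Random(seed)

    def draw(mean, std):
        # Redraw until positive (assumption: demands and costs must be > 0).
        while True:
            x = rng.gauss(mean, std)
            if x > 0:
                return x

    scenarios = []
    for _ in range(n):
        row = {name: draw(m, s) for name, (m, s) in demand_stats.items()}
        row.update({name: draw(m, s) for name, (m, s) in cost_stats.items()})
        scenarios.append(row)
    return scenarios

# Placeholder (mean, std) pairs; not taken from the paper.
demand_stats = {"d_2": (150.0, 15.0), "d_4": (100.0, 10.0)}
cost_stats = {}
for g in (1, 2, 3):
    for name, mean in (("a", 0.01), ("b", 10.0), ("c", 100.0)):
        cost_stats[f"{name}_{g}"] = (mean, 0.1 * mean)

scenarios = sample_scenarios(1000, demand_stats, cost_stats, seed=42)
```

Fixing the seed makes the synthetic dataset reproducible, which helps when comparing models trained on the same scenarios.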
The generated dataset is divided into an 80-20% train-test split using stratified random sampling to preserve the distributions of the classes in the train and the test sets. The initial selection of the top-performing models is conducted based on their performance using 10-fold cross-validation (10-CV) repeated five times. This 10-CV is also used for hyperparameter tuning. The selected models are further evaluated using the test dataset, and the best-performing model is selected.
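The data-splitting procedure can be sketched with the standard library alone. This is a hedged illustration, not the authors' R code: the function names, the example class labels and the use of plain (non-stratified) folds inside the repeated cross-validation are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def repeated_kfold(indices, k=10, repeats=5, seed=0):
    """Yield (train, validation) index lists for k-fold CV repeated several times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        for f in range(k):
            val = shuffled[f::k]
            val_set = set(val)
            trn = [i for i in shuffled if i not in val_set]
            yield trn, val

# Hypothetical labels: commitment patterns of three generators.
labels = ["111"] * 700 + ["110"] * 300
train_idx, test_idx = stratified_split(labels, test_frac=0.2, seed=1)
folds = list(repeated_kfold(train_idx, k=10, repeats=5, seed=1))
```

With 1000 samples this yields 800 training and 200 test samples, and 50 train/validation pairs in which each validation fold holds 80 samples.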
Table 2 shows the models' performance during training. TPR and TNR represent the "true positive" and "true negative" rates, respectively. Figure 11 shows the boxplot of the training accuracy scores to show the differences among the trained models. The test set performance of the initially selected models is given in Table 3. Based on the test set performance, NNET was selected for layer-1 prediction.
Similar to layer 1, several models are trained for predicting the production levels of the committed generators. These models include GLM, KNN, RF, SVM, NNET and XGB. Table 4 shows the performance of the models during the training and testing phases.
In the third layer, the solutions obtained from the first two layers are further processed to ensure that all constraints are satisfied.
Table 5 shows the percentage of the infeasible instances (out of 200 total instances in the test set) with and without layer-3.
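The infeasibility percentage reported in such a table can be computed along the following lines. This is a simplified sketch, not the authors' feasibility compliance algorithm: it checks only the power balance and the generator limits, omits line-flow and reserve constraints, and all names and numbers are hypothetical.

```python
def is_feasible(u, p, p_min, p_max, demand, tol=1e-6):
    """Check generator limits and power balance for one predicted schedule."""
    for u_g, p_g, lo, hi in zip(u, p, p_min, p_max):
        if u_g == 0 and abs(p_g) > tol:
            return False  # an uncommitted unit must not produce power
        if u_g == 1 and not (lo - tol <= p_g <= hi + tol):
            return False  # a committed unit must stay within its limits
    return abs(sum(p) - demand) <= tol

def infeasible_percentage(solutions, p_min, p_max):
    """Share of schedules (in percent) that violate at least one checked constraint."""
    bad = sum(
        not is_feasible(s["u"], s["p"], p_min, p_max, s["demand"])
        for s in solutions
    )
    return 100.0 * bad / len(solutions)

# Hypothetical limits (MW) and predicted schedules for a 3-generator system.
p_min = [10.0, 10.0, 10.0]
p_max = [100.0, 80.0, 50.0]
solutions = [
    {"u": [1, 1, 0], "p": [90.0, 60.0, 0.0], "demand": 150.0},
    {"u": [1, 1, 0], "p": [100.0, 60.0, 0.0], "demand": 150.0},  # balance violated
    {"u": [1, 1, 1], "p": [70.0, 70.0, 10.0], "demand": 150.0},
    {"u": [1, 0, 1], "p": [100.0, 0.0, 50.0], "demand": 150.0},
]
share = infeasible_percentage(solutions, p_min, p_max)
```

Running the same check on the schedules before and after post-processing gives the two figures being compared.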

Case Study: IEEE 39-Bus System
The system has thirty-nine buses, forty-six transmission lines, eighteen loads and ten generators. The spinning reserve requirements were taken to be 10% of the total demand. Further details of the system can be obtained from MATPOWER. Figure 12 shows the system diagram.
A dataset of 1000 samples is created using normal distribution functions for the cost coefficients and the nodal loads. The distributions of the nodal demands in the generated dataset are shown in Figure 13. For training and evaluation, the data is divided with an 80-20% train-test split using stratified random sampling. The training and test set performances of the trained models for layer-1 and layer-2 predictions are shown in Tables 6 and 7, respectively. Based on the performance, GLM was selected for predicting the commitment decision variables and NNET was selected for layer 2 prediction. The effect of layer-3 on the percentage of infeasible test samples is evident in Table 8.
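The 10% spinning reserve requirement can be checked for a given commitment as sketched below. The formulation used here (committed capacity must cover demand plus reserve) is a common one and is an assumption; the exact reserve constraint follows the problem formulation of the paper, which is not reproduced in this section. The capacities and demands are hypothetical.

```python
def reserve_ok(u, p_max, demand, reserve_frac=0.10):
    """True if committed capacity covers demand plus the reserve requirement."""
    committed_capacity = sum(cap for on, cap in zip(u, p_max) if on)
    return committed_capacity >= (1.0 + reserve_frac) * demand

# Hypothetical capacities (MW): two of three units are committed.
u = [1, 1, 0]
p_max = [100.0, 80.0, 50.0]

meets_150 = reserve_ok(u, p_max, demand=150.0)  # 180 MW against 165 MW required
meets_170 = reserve_ok(u, p_max, demand=170.0)  # 180 MW against 187 MW required
```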

Case Study: IEEE 118-Bus System
The system has 118 buses, 99 loads, 54 generators and 186 transmission lines. The reserve requirements were taken to be 10% of the total demand. Further details of the system can be obtained from MATPOWER. The system diagram is given in Figure 14.
A dataset of 3500 instances is generated using normal probability distribution functions for the cost coefficients and the nodal loads. Figure 15 shows the nodal demand distributions of some of the buses. For training and evaluation, the data is divided with an 80-20% train-test split using stratified random sampling. The training set and test set performances of the trained models for layer-1 and layer-2 predictions are shown in Tables 9 and 10, respectively. Based on the performance, GLM was selected for layer-1 as well as layer-2 predictions. The effect of layer-3 on the percentage of infeasible test samples is evident in Table 11.

Cost Comparison
First, the cost comparison is performed between the solutions obtained from the proposed approach and those of the conventional MIQP solver in the IBM CPLEX MATLAB Toolbox. Table 12 presents the cost comparison on the test set for the three case studies. The table shows that the developed algorithm obtained solutions with an average optimality gap of less than 1%, which can be considered satisfactory performance in terms of cost. However, the primary advantage of the proposed approach is its computational speed, which is evaluated next.
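The optimality gap used in the cost comparison can be computed as in the sketch below. It assumes the gap is the relative cost increase of the predicted schedule over the solver's optimal cost, averaged over the test instances; the cost figures are hypothetical.

```python
def optimality_gap_percent(cost_pred, cost_opt):
    """Relative cost increase of the predicted schedule over the optimum, in percent."""
    return 100.0 * (cost_pred - cost_opt) / cost_opt

def mean_gap_percent(pairs):
    """Average gap over (predicted cost, optimal cost) pairs."""
    gaps = [optimality_gap_percent(c, o) for c, o in pairs]
    return sum(gaps) / len(gaps)

# Hypothetical (predicted, optimal) costs for two test instances.
pairs = [(1005.0, 1000.0), (2010.0, 2000.0)]
avg_gap = mean_gap_percent(pairs)
```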

Time Comparison
Next, the computational time comparison is performed. From the test dataset, 100 instances are solved, and the solution time taken by each approach is recorded. This is repeated several times, and the average time to solve 1000 instances is calculated. Figure 16 shows the average time taken by each approach to solve 1000 instances. The figure shows that the developed algorithm provides significant time savings (96%, 94% and 81% reductions in time for the 4-bus, IEEE 39-bus and IEEE 118-bus systems, respectively) compared to the CPLEX MIQP solver.
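A timing harness for such a comparison might look like the sketch below. It assumes one reading of the procedure: time a batch of instances, scale the result to 1000 instances, and average over several repetitions. The function names and the `solve` callable are hypothetical. The last lines check that the three reported reductions average to about 90%.

```python
import time

def average_time_per_1000(solve, instances, repeats=5):
    """Average wall-clock time, scaled to 1000 instances, over repeated batches."""
    per_run = []
    for _ in range(repeats):
        start = time.perf_counter()
        for inst in instances:
            solve(inst)
        elapsed = time.perf_counter() - start
        per_run.append(elapsed * 1000.0 / len(instances))
    return sum(per_run) / len(per_run)

def time_reduction_percent(t_base, t_new):
    """Percentage of the baseline time saved by the new approach."""
    return 100.0 * (t_base - t_new) / t_base

# Reductions reported for the 4-bus, 39-bus and 118-bus systems.
reported = [96.0, 94.0, 81.0]
mean_reduction = sum(reported) / len(reported)  # about 90.3
```

Using `time.perf_counter` rather than `time.time` gives a monotonic, high-resolution clock, which matters when individual solves take only milliseconds.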

Conclusions and Future Work
This paper proposes a novel three-layer hierarchical learning-based algorithm for solving a security constrained UC problem. In the first layer, a set of data-driven ML models is utilized to predict the binary commitment variables. In the second layer, another set of predictive models predicts the production levels of the committed generators. Finally, in the third layer, a feasibility compliance algorithm post-processes the predicted schedule to ensure the feasibility of the obtained solution. The efficacy of the developed method was demonstrated through numerical simulations using a 4-bus, 3-machine system and two standard IEEE power systems, i.e., the 39-bus, 10-machine and 118-bus, 54-machine systems. Simulation results demonstrate that the proposed approach provides significant time savings (90% time reduction on average for the three studied systems) in solving a single-period SCUC problem with minimal degradation (<1%) of the optimality index. Moreover, it ensures that all the predicted solutions are feasible (0% infeasible solutions) for the secure operation of the power system.
The significance of the proposed approach will become more evident when its capability of learning patterns from big data is exploited and its performance is studied on a realistic large-scale SCUC problem. Future work will address a UC problem with multiple time intervals and/or take into consideration renewable energy sources and energy storage, e.g., wind and solar energy and battery storage.