Data-Driven Compartmental Modeling Method for Harmonic Analysis—A Study of the Electric Arc Furnace

Abstract: The electric arc furnace (EAF) contributes to almost one-third of the global iron and steel industry, and its harmonic pollution has drawn attention. An accurate EAF harmonic model is essential to evaluate the harmonic pollution of the EAF. In this paper, a data-driven compartmental modeling method (DCMM) is proposed for the multi-mode EAF harmonic model. The proposed DCMM considers the coupling relationship among different frequencies of harmonics to enhance the modeling accuracy; meanwhile, the dimensions of the harmonic dataset are reduced to improve computational efficiency. Furthermore, the proposed DCMM is applicable to establishing a multi-mode EAF harmonic model by dividing the multi-mode EAF harmonic dataset into several clusters corresponding to the different modes of the EAF smelting process. The performance evaluation results show that the proposed DCMM is adaptive in terms of establishing the multi-mode model, even if the data volumes, number of clusters, and sample distribution change significantly. Finally, a case study of EAF harmonic data is conducted to establish a multi-mode EAF harmonic model, showing that the proposed DCMM is effective and accurate in EAF modeling.


Introduction
The electric arc furnace (EAF) is a widely used device in the iron and steel industry. It injects harmonic current into the power system and produces harmonic voltage at the point of common coupling (PCC) [1]. A small reduction in harmonic pollution represents a significant electrical energy saving, since an EAF works in the range between several megavolt amperes (MVA) and 250 MVA [2][3][4][5]. In this context, there is a need for an EAF harmonic model to analyze the undesirable disturbing effects of harmonics in the harmonic power-flow calculation.
Energies 2019, 12, 4378
In recent years, various mechanism modeling methods have been proposed for EAF harmonic models. For instance, the time-domain simulation method was developed to model the EAF for the flicker study [6][7][8][9]. However, the established time-domain model is relatively complex to use for analyzing the undesirable disturbing effects of harmonics, because the time-domain differential equations must be solved. In this case, it is much simpler for frequency-domain models (e.g., the constant-harmonic-ratio-type model and the Norton equivalent model) to evaluate the harmonic pollution of the EAF [10]. Munoz et al. approximately modeled the EAF as a constant-harmonic-ratio-type model and applied it to the harmonic power-flow calculation [11]. The constant-harmonic-ratio-type model considers the interaction between the harmonic currents and the fundamental current. It should be noted that the EAF smelting process normally consists of three modes with different volt-ampere characteristics, i.e., boring, melting, and refining. More importantly, the EAF harmonic model depends critically on the volt-ampere characteristics. Such a constant-harmonic-ratio-type model ignores the impact of the harmonic voltage on the harmonic current, which leads to invalid harmonic power-flow calculation results. Considering the impact of the harmonic voltage on the harmonic current at the same frequency, Sezgin et al. modeled the EAF as a Norton equivalent model [12]. With this model, the harmonic power-flow calculation results still deviate from the desirable range, because the modeling method ignores the coupling relationship among different frequencies of harmonics. Based on the discussion above, it is challenging to analyze the undesirable disturbing effects of harmonics through an EAF harmonic model established by a mechanism modeling method, due to the difficulty of precisely comprehending the EAF harmonic mechanism.
Given the limitations of the above-mentioned mechanism modeling methods, some data-driven modeling methods for EAF harmonic models were proposed. For example, an EAF harmonic model based on field measurements of voltages and currents was proposed [13,14]. However, these methods ignore the interactions between harmonic currents and harmonic voltages and would result in a calculation error. To deal with these issues, the genetic algorithm and hidden Markov theory were utilized to estimate the parameters of the time-varying EAF harmonic model [15][16][17]. The existing data-driven methods are effective in establishing an accurate EAF harmonic model when a one-mode EAF harmonic dataset is adopted, but they cannot partition the different modes of a multi-mode EAF harmonic dataset, i.e., they are not applicable to establishing the multi-mode EAF harmonic model. In light of the potential benefits of artificial intelligence, e.g., a strong ability for nonlinear mapping, the neural-network-based method was employed to establish the multi-mode EAF harmonic model [18][19][20]. Nevertheless, the non-linear multi-mode EAF harmonic model established by the neural-network-based method is not suitable for the harmonic power-flow calculation because it results in a heavy computational cost [21,22].
As such, there is little academic literature on an EAF modeling method that fully considers the coupling relationship among different frequencies of harmonics and accurately identifies the parameters of different modes of the multi-mode EAF harmonic model. However, the need for an EAF modeling method that addresses these issues is recognized in industry and academic research [23], and how to employ massive power quality data to promote and enhance the power quality of the power grid is a pressing issue.
In this paper, we propose a data-driven compartmental modeling method (DCMM), which fully considers the coupling relationship among different frequencies of harmonics and accurately identifies the parameters of different modes of the multi-mode EAF harmonic model. Compared with the existing EAF modeling methods, the proposed DCMM has three main technical contributions as follows.
Firstly, the proposed DCMM fully considers the coupling relationship among different frequencies of harmonics to guarantee the accuracy of the multi-mode EAF harmonic model. Moreover, we adopt principal component analysis (PCA) to reduce the heavy modeling computational cost caused by such a consideration. Hence, the proposed DCMM enhances the modeling accuracy while reducing the computational burden.
Secondly, in the proposed DCMM, the PSO-based K-means algorithm is employed to divide the multi-mode EAF harmonic dataset into several clusters corresponding to the different modes of the EAF smelting process. We calculate the optimal number of clusters and initial clustering centers to guarantee that the objects belonging to the same mode are compact and the objects belonging to different modes are well separated. Moreover, the clustering center and distance measure of the PSO-based K-means algorithm are redefined to ensure that the volt-ampere characteristic of each mode is linear.
Finally, the proposed DCMM can accurately identify the parameters of different modes of the EAF smelting process simply based on the multi-mode EAF harmonic dataset, without adjusting the input parameters of the DCMM. Therefore, the proposed DCMM is applicable to establishing the multi-mode EAF harmonic model.
The rest of the paper is organized as follows. In Section 2, the proposed DCMM is introduced. In Section 3, two cases are carried out to investigate the effectiveness and adaptivity of the proposed DCMM. In Section 4, the proposed DCMM is utilized to identify the parameters of the multi-mode EAF harmonic model based on the harmonic data, and the application of the multi-mode EAF harmonic model is discussed. In Section 5, the EAF modeling in the context of the different models is discussed. Finally, Section 6 concludes the paper.

Data-Driven Compartmental Modeling Method for EAF
This section elaborates on the multi-mode EAF harmonic model first. Next, the DCMM is proposed.

Multi-Mode EAF Harmonic Model
The non-linear load model shown in Equation (1) depends on the fundamental and harmonic voltages [24][25][26][27],
$$I_h = F_h(U_1, U_2, U_3, \ldots, U_m, C) \tag{1}$$
where $I_h$ is the h-th harmonic current, $U_1$ denotes the fundamental voltage, $U_m$ is the m-th harmonic voltage, and $C$ is the constant coefficient.
The above-mentioned non-linear load model is not suitable for the EAF, as the correlation between the harmonic currents and the fundamental current is more prominent than that between the harmonic currents and the fundamental voltage. Therefore, the non-linear coupling model based on the interactions among the fundamental current, the harmonic voltages, and the harmonic currents is presented as follows,
$$I_h = F_h(I_1, U_2, U_3, \ldots, U_m, C) \tag{2}$$
where $I_1$ denotes the fundamental current.
The non-linear function $F_h$ leads to a heavy computational cost in the harmonic power-flow calculation, due to the iterative solution of the system equations and the non-linear coupling model [20,21]. Therefore, the non-linear coupling model is approximated by a linear coupling model, as shown in Equation (3). In this way, the computational burden is reduced while the convergence precision is maintained.
$$I_h = A\,[I_1, U_2, U_3, \ldots, U_m]^T + C \tag{3}$$
where $A$ is the coupling matrix. Equation (3) can be rewritten as follows by denoting $[I_1, U_2, U_3, \ldots, U_m]^T$ as $W_m$,
$$I_h = A W_m + C \tag{4}$$
The coupling matrix $A$ and the constant coefficient $C$ represent the coupling relationship among the fundamental component and the harmonic components. Note that the harmonic components vary with the modes of the EAF smelting process. Therefore, it is necessary to classify the measured data according to the operating modes and estimate $A$ and $C$ from the respective data clusters. It is assumed that an EAF smelting process has n operating modes, i.e., the multi-mode EAF harmonic model consists of n linear coupling models of the form of Equation (4). The next goal of this paper is to identify the model parameters $A^{(1)}, C^{(1)}, A^{(2)}, C^{(2)}, \ldots, A^{(n)}$, and $C^{(n)}$.
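As a minimal illustration of Equation (4), the sketch below evaluates the linear coupling model for two modes. All numeric values are hypothetical placeholders chosen for the example (they are not taken from the paper), and m = 3 is assumed so that the input vector is [I_1, U_2, U_3]^T.

```python
import numpy as np

def harmonic_current(A, C, W):
    """Evaluate the linear coupling model I_h = A W_m + C for one mode."""
    return float(np.dot(A, W) + C)

# Hypothetical two-mode model with m = 3, i.e., W_m = [I_1, U_2, U_3]^T.
modes = [
    {"A": np.array([0.10, -0.50, 0.20]), "C": 0.01},
    {"A": np.array([0.30, 0.40, -0.10]), "C": -0.02},
]

# Hypothetical normalized operating point.
W = np.array([1.0, 0.02, 0.05])

# One harmonic current per mode; in the multi-mode model the active mode
# decides which (A, C) pair applies.
currents = [harmonic_current(mode["A"], mode["C"], W) for mode in modes]
print(currents)
```

The same operating point yields a different harmonic current in each mode, which is why the measured data must be separated by mode before the parameters are estimated.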
However, the dimension of the harmonics of concern in Equation (4) is normally 25, and it is difficult to classify such high-dimensional data and identify the model parameters accurately. Therefore, PCA is adopted to reduce the dimensions of the dataset, so that accurate model parameters can be obtained. PCA rotates the coordinate axes of the dataset so that the variances along the new axes appear in decreasing order. Only the first several coordinate axes are employed to classify the measured data of the EAF, because coordinate axes with sufficiently small variances are unable to distinguish the data. That is, PCA can extract a smaller representative dataset from a larger one [28]. According to the description above, the process of model simplification is elaborated as follows.
Suppose that the dimension m of $W_m$ is reduced to j by PCA; then $W_m$ is transformed to $R_j$ as follows,
$$R_j = Q W_m \tag{5}$$
where $Q$ denotes a matrix consisting of the selected eigenvectors of the covariance matrix of $W_m$. $R_j$ can be represented as $[r_1, r_2, r_3, \ldots, r_j]^T$, where $r_1, r_2, r_3, \ldots$, and $r_j$ are the first j principal components of $W_m$.
Consequently, the simplified model based on PCA is formulated as follows,
$$I_h = A Q^{-1} R_j + C \tag{6}$$
Equation (6) is rewritten as follows by denoting $AQ^{-1}$ as $A_p$,
$$I_h = A_p R_j + C \tag{7}$$
where $A_p$ is the simplified coupling matrix.
The linear coupling relationship among different frequencies of harmonics is consistent before and after model simplification [29]. Additionally, the modeling precision and computational efficiency are improved, since model simplification filters out redundant and correlated features of the measured data of the EAF [30][31][32].
Thus, the next goal of this paper becomes identifying the simplified model parameters $A_p^{(1)}, C^{(1)}, A_p^{(2)}, C^{(2)}, \ldots, A_p^{(n)}$, and $C^{(n)}$; the model parameters $A^{(1)}, C^{(1)}, A^{(2)}, C^{(2)}, \ldots, A^{(n)}$, and $C^{(n)}$ can then be calculated according to Equation (8),
$$A^{(i)} = A_p^{(i)} Q, \quad i = 1, 2, \ldots, n \tag{8}$$
where $Q$ is the known matrix obtained from the PCA.
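The simplification in Equations (5) to (8) can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the sizes (m = 6, j = 2) are hypothetical, the rows of Q are taken as orthonormal eigenvectors so that the transpose of Q plays the role of the inverse in Equation (6), and an ordinary least-squares fit stands in for the parameter identification step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for measured data: N samples of W_m (m = 6) that vary
# along only j = 2 directions. All sizes and values are hypothetical.
N, m, j = 500, 6, 2
W = rng.normal(size=(N, j)) @ rng.normal(size=(j, m))   # rows are samples of W_m
A_true = rng.normal(size=m)
C_true = 0.05
I_h = W @ A_true + C_true                               # Equation (4)

# Q: rows are the j leading eigenvectors of the covariance matrix of W_m.
cov = np.cov(W, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                  # eigenvalues in ascending order
Q = eigvecs[:, ::-1][:, :j].T                           # shape (j, m)

# Equation (5), applied sample by sample: R_j = Q W_m.
R = W @ Q.T

# Least-squares fit of the simplified model I_h = A_p R_j + C (Equation (7)).
X = np.hstack([R, np.ones((N, 1))])
coef, *_ = np.linalg.lstsq(X, I_h, rcond=None)
A_p, C_hat = coef[:j], coef[j]

# Equation (8): map the simplified coupling matrix back to the full one.
A_rec = A_p @ Q

# The recovered model reproduces the harmonic current on the data.
max_err = np.max(np.abs(W @ A_rec + C_hat - I_h))
print(max_err)
```

Because the synthetic data occupy only a j-dimensional subspace, the recovered coupling vector need not equal `A_true` entry by entry; what is preserved is the predicted harmonic current on every sample.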
Based on the discussion above, the parameter identification method of the multi-mode EAF harmonic model will be elaborated in Section 2.2.

Data-Driven Compartmental Modeling Method (DCMM)
In this subsection, the DCMM based on the multi-mode EAF harmonic model is proposed. The proposed DCMM is illustrated in Figure 1. First, the h-th harmonic current data, the fundamental current data, and the 2nd to m-th harmonic voltage data were normalized according to the short-circuit capacity and base voltage, and the dimensions of the normalized dataset were reduced by PCA. Furthermore, the optimal number of clusters and the initial clustering centers were calculated based on the sum of the squared error (SSE) and PSO, respectively. Then the preprocessed data were separated into several clusters based on the K-means algorithm. Thereafter, the simplified model parameters $A_p^{(1)}, C^{(1)}, A_p^{(2)}, C^{(2)}, \ldots, A_p^{(n)}$, and $C^{(n)}$ were identified. Finally, the model parameters $A^{(1)}, C^{(1)}, A^{(2)}, C^{(2)}, \ldots, A^{(n)}$, and $C^{(n)}$ were calculated.

[Figure 1: flowchart of the proposed DCMM. Recoverable block labels: clustering analysis; simplified model parameters $A_p^{(1)}, C^{(1)}$, $A_p^{(2)}, C^{(2)}$, $A_p^{(3)}, C^{(3)}$; model parameters $A^{(1)}, C^{(1)}$, and so on.]
In our method, the PSO-based K-means algorithm was adopted to divide the multi-mode EAF harmonic dataset into several clusters corresponding to the different modes of the EAF smelting process. Unlike the conventional K-means algorithm, we adopted the SSE and PSO to calculate the optimal number of clusters and the initial clustering centers, which improves the clustering accuracy. Moreover, the clustering center and distance measure of the PSO-based K-means algorithm were redefined to ensure that the volt-ampere characteristic of each mode of the EAF smelting process is linear.
Considering that the number of clusters of the K-means algorithm needs to be set beforehand, we introduced the SSE, formulated as follows, to determine the optimal number of clusters,
$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{q \in Y_i} \lVert q - v_i \rVert^2$$
where k is the number of clusters, $Y_i$ is the i-th cluster, q denotes a data point belonging to $Y_i$, and $v_i$ is the center of $Y_i$.
The SSE decreases as the number of clusters increases, and the decrease is most marked as k approaches the optimal number of clusters. Therefore, we calculated the SSE for k = 1, 2, ..., 10 and took the k at which the marked decline of the SSE occurs as the optimal number of clusters.
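The SSE-based choice of k can be sketched as below. This is only an illustration: the data are synthetic blobs, the K-means routine is a basic point-center version with a deterministic farthest-point start, and the rule used to locate the marked decline (the largest second difference of the SSE curve) is an assumption of this sketch rather than the criterion stated in the paper.

```python
import numpy as np

def kmeans_sse(X, k, n_iter=50):
    """Run a basic K-means with point centers and return the SSE."""
    # Farthest-point initialization: deterministic and well spread.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for i in range(k):
            if np.any(labels == i):
                centers[i] = X[labels == i].mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).sum())

# Hypothetical test data: three well-separated groups in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
               for c in ([0, 0], [5, 0], [0, 5])])

sse_values = [kmeans_sse(X, k) for k in range(1, 7)]

# Assumed elbow rule: the second difference of SSE(k) is largest where a
# large drop is followed by a small one. Entry i corresponds to k = i + 2.
curvature = np.diff(sse_values, n=2)
best_k = int(np.argmax(curvature)) + 2
print(sse_values, best_k)
```

In practice the SSE curve is often inspected visually as well, since an automatic elbow rule can be sensitive to noise when the modes overlap.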
PSO was employed to obtain the initial clustering centers of the K-means algorithm rather than initializing the K-means algorithm randomly [33]. Compared with the randomly initialized K-means algorithm, the PSO-based K-means algorithm searches more efficiently for a near-global or global optimal solution, which enhances both the clustering accuracy and the computational efficiency [34,35].
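A compact sketch of PSO used to seed clustering centers is given below. It assumes ordinary point centers and the SSE as the fitness function; the swarm size, iteration count, and the inertia and acceleration weights are typical textbook values chosen for the example, not settings reported in the paper.

```python
import numpy as np

def sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).sum())

def pso_initial_centers(X, k, n_particles=20, n_iter=60, seed=0):
    """Search for k initial centers with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    # Each particle encodes k candidate centers, started on random data points.
    pos = X[rng.integers(0, len(X), size=(n_particles, k))].astype(float)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([sse(X, p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    w, c1, c2 = 0.7, 1.5, 1.5   # inertia and acceleration weights (assumed)
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([sse(X, p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    return gbest, gbest_val

# Hypothetical test data: three groups in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
               for c in ([0, 0], [5, 0], [0, 5])])

centers, best_val = pso_initial_centers(X, k=3)
single_center_sse = sse(X, X.mean(axis=0)[None, :])
print(best_val, single_center_sse)
```

The returned centers would then be passed to the K-means iteration as its starting point instead of a random draw.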
The clustering center and distance measure of the PSO-based K-means algorithm were redefined as follows,
$$I_h = a_1 r_1 + a_2 r_2 + \cdots + a_j r_j + c$$
where $a_1, a_2, \ldots, a_j$, and $c$ are constant coefficients, j denotes the number of dimensions of $R_j$, $dis^{(i)}$ is the distance between the i-th data point and the clustering center, and $r_1^{(i)}, r_2^{(i)}, \ldots, r_j^{(i)}$, and $I_h^{(i)}$ are the coordinates of the i-th data point. The redefined clustering center is a linear function rather than a point, which is different from the clustering center of the conventional K-means algorithm. Such a redefinition guarantees that the volt-ampere characteristic of each mode of the EAF smelting process is linear.
The clustering process is illustrated in Figure 2. First, the distances between each data point and the different clustering centers are calculated. Next, each data point is assigned to the nearest cluster. Thereafter, the clustering centers are recalculated. Finally, if the clustering centers have changed, the process returns to Step 1; otherwise, the clustering centers are output.
Based on the discussion above, our method is able to partition the multi-mode EAF harmonic dataset into several clusters, such that the objects with shared linear volt-ampere characteristics are compact, and the objects with different linear volt-ampere characteristics are well separated.
Finally, the simplified model parameters $A_p^{(1)}, C^{(1)}, A_p^{(2)}, C^{(2)}, \ldots, A_p^{(n)}$, and $C^{(n)}$ were identified by least-squares fitting, and the model parameters $A^{(1)}, C^{(1)}, A^{(2)}, C^{(2)}, \ldots, A^{(n)}$, and $C^{(n)}$ were calculated according to Equation (8). In this way, the multi-mode EAF harmonic model was established by the proposed DCMM.
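The clustering loop with linear centers and the subsequent least-squares identification can be sketched as follows. Several points are assumptions of this sketch: the distance is taken as the perpendicular point-to-hyperplane distance, each center is refitted by ordinary least squares, the starting centers are rough hand-picked guesses standing in for the PSO result, and the two synthetic modes use hypothetical parameters.

```python
import numpy as np

def fit_plane(R, I):
    """Least-squares fit of I = a . r + c; returns (a, c)."""
    X = np.hstack([R, np.ones((len(R), 1))])
    coef, *_ = np.linalg.lstsq(X, I, rcond=None)
    return coef[:-1], coef[-1]

def plane_distance(R, I, a, c):
    """Perpendicular distance from points (r, I) to the hyperplane I = a . r + c."""
    return np.abs(R @ a + c - I) / np.sqrt(a @ a + 1.0)

def linear_kmeans(R, I, planes, n_iter=100):
    """K-means variant whose cluster centers are hyperplanes instead of points."""
    labels = None
    for _ in range(n_iter):
        # Steps 1 and 2: distances to every center, then nearest-center assignment.
        d = np.stack([plane_distance(R, I, a, c) for a, c in planes], axis=1)
        new_labels = d.argmin(axis=1)
        # Step 4: stop once the assignment (and hence the centers) is stable.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: refit each center; keep the old one if a cluster is too small.
        planes = [fit_plane(R[labels == i], I[labels == i])
                  if np.sum(labels == i) > R.shape[1] else planes[i]
                  for i in range(len(planes))]
    return labels, planes

# Hypothetical data: two intersecting linear modes with j = 1.
rng = np.random.default_rng(2)
r = rng.uniform(-1.0, 1.0, size=(400, 1))
mode = rng.integers(0, 2, size=400)
I = np.where(mode == 0, 0.5 * r[:, 0] + 0.1, -0.8 * r[:, 0] + 0.3)
I = I + rng.normal(scale=0.01, size=400)

initial = [(np.array([0.4]), 0.0), (np.array([-0.6]), 0.4)]
labels, planes = linear_kmeans(r, I, initial)
(a0, c0), (a1, c1) = planes
print(a0, c0, a1, c1)
```

With point centers, two intersecting lines cannot be separated cleanly; using a linear center lets each cluster follow one volt-ampere characteristic across the crossing region.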

[Figure 2: flowchart of the clustering process. Start; calculate the distances between each data point and the different clustering centers; divide the data points into the nearest clusters; recalculate the clustering centers; if the clustering centers have changed, return to the first step; otherwise, output the clustering centers.]

Performance Evaluation
To test the effectiveness and adaptivity of the proposed DCMM, two cases are discussed in this section. The corresponding simplified model parameters are shown in Table 1. According to Equation (7) and the parameters in Table 1, we utilized MATLAB to generate 2880 and 3360 data points, detailed in Figures 3 and 4, respectively.
The proposed DCMM is utilized to estimate the simplified model parameters of the datasets shown in Figures 3 and 4, respectively. The average estimation deviation of a mode, $D_m$, is defined as follows, where $C$ and $C'$ are the true value and the estimated value of the constant coefficient, respectively, and $[p_1, p_2]$ and $[p_1', p_2']$ are equal to $A_p$ and $A_p'$, representing the true value and the estimated value of the simplified coupling matrix, respectively.
The average estimation deviation of a case, $D_c$, is defined as follows,
$$D_c = \frac{1}{n} \sum_{i=1}^{n} D_{m(i)}$$
where n is the number of modes of a case and $D_{m(i)}$ denotes the average estimation deviation of the i-th mode.
A $D_c$ of 0 means that the true values and the estimated values of the simplified model parameters are identical. Nonetheless, it is quite difficult to achieve identical results when the data volumes, the number of clusters, and the complexity of the sample distribution are high. Thus, the parameter identification results are considered accurate if $D_c$ is less than 5%.
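The deviation measures can be illustrated with the short sketch below. The exact expression for the per-mode deviation is not reproduced in the text above, so this sketch assumes it is the mean relative deviation over p_1, p_2, and C; the per-case deviation is taken as the mean over the modes. All parameter values are hypothetical.

```python
import numpy as np

def mode_deviation(A_true, C_true, A_est, C_est):
    """Assumed D_m: mean relative deviation over the entries of A_p and C."""
    true = np.append(A_true, C_true)
    est = np.append(A_est, C_est)
    return float(np.mean(np.abs(est - true) / np.abs(true)))

def case_deviation(mode_deviations):
    """D_c: mean of the per-mode deviations of a case."""
    return float(np.mean(mode_deviations))

# Hypothetical true and estimated parameters for a two-mode case.
d_m = [
    mode_deviation([0.5, -0.8], 0.1, [0.51, -0.79], 0.102),
    mode_deviation([0.3, 0.4], -0.2, [0.30, 0.41], -0.2),
]
d_c = case_deviation(d_m)
accurate = d_c < 0.05          # 5% acceptance threshold from the text
print(d_m, d_c, accurate)
```

A relative measure of this kind is undefined when a true parameter is exactly zero, so a different normalization would be needed in that situation.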

Case 1: 2880 Data Points, 2 Modes
In this subsection, we conduct a case study involving the parameter identification of the two modes shown in Figure 3. The clustering results are shown in Figure 5. The data points with different colors belong to different modes, and the two linear modes are separated clearly. The parameters of the two linear modes are detailed in Table 2.
This case shows that the proposed DCMM is able to handle the linear clustering problem and obtain accurate parameters, as $D_c$ is calculated to be 3.27%.


Case 2: 3360 Data Points, 3 Modes
The second case study involves the parameter identification of the three modes shown in Figure 4. The clustering results are shown in Figure 6. It can be seen that the three linear modes are divided properly by the proposed DCMM. The parameters of the three linear modes are detailed in Table 3. The DCMM maintains its high precision when the test dataset changes markedly, because $D_c$ is calculated to be 2.18% in this case.

Summary of Case 1 and 2
It can be observed that the DCMM is able to separate the intersecting linear modes exactly. The parameter identification results are accurate, as the average estimation deviations of the two cases are less than 5%. In other words, the proposed DCMM is effective in different scenarios, even if the data volumes, number of clusters, and sample distributions change significantly.

Case Study
In this section, the DCMM is utilized to identify the parameters of the multi-mode EAF harmonic model based on the harmonic dataset. A comparison among the proposed model, the constant-harmonic-ratio-type model, and the Norton equivalent model is then implemented. Finally, the application of the multi-mode EAF harmonic model is introduced.

Parameter Identification of Multi-Mode EAF Harmonic Model
To obtain the harmonic data of the EAF, the EAF model simulating flicker disturbance was developed in Simulink. As shown in Figure 7, a three-phase EAF model is connected to a 0.4 kV, 1 MVA, 50 Hz three-phase source, where $Z_1$ is the system impedance resulting from the minimum short-circuit capacity of the assumed source. The three-phase EAF model shown in Figure 7 is detailed in Figure 8; each EAF function block represents an electrode at each phase, and the controlled voltage source with a resistive and inductive network was adopted to model the flicker frequency and magnitude variation. The simulation time was set to 101 s, and the current and voltage signals extracted through a PQ recorder were analyzed every 0.02 s with a fixed sampling rate of 2500 Hz by the fast Fourier transform (FFT) algorithm. Finally, we obtained 5000 sets of the fundamental current, the 2nd to 25th harmonic current, and the 2nd to 25th harmonic voltage peaks.
Taking the 5th harmonic current as an example (the multi-mode EAF harmonic model of other frequencies can also be established by our method), the 5000 sets of the fundamental current, the 5th harmonic current, and the 2nd to 25th harmonic voltage peaks are normalized according to the short-circuit capacity and base voltage, and the dimensions of the normalized dataset are reduced from 25 to 3 by PCA. In this case, 4000 sets of data, shown in Figure 9, are employed to identify the parameters of the multi-mode EAF harmonic model, and the remaining 1000 sets of data, named the validation dataset, are utilized to evaluate the modeling accuracy.
The number of clusters of the modeling dataset was calculated to be 3. The clustering results are shown in Figure 10, where the data points with different colors belong to different modes. It can be found that three linear modes corresponding to the three modes of the EAF smelting process are separated as desired. The simplified model parameters are listed in Table 4. The model parameters can then be obtained from Equation (8).
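The windowed FFT step described above can be sketched as follows, using the stated 2500 Hz sampling rate and 0.02 s window (50 samples, one cycle of the 50 Hz fundamental). The test signal and the helper name `harmonic_peaks` are hypothetical; the amplitude scaling is the standard one for a real-input FFT and is not taken from the paper.

```python
import numpy as np

FS = 2500.0                 # sampling rate in Hz (from the text)
F1 = 50.0                   # fundamental frequency in Hz (from the text)
N = int(round(FS / F1))     # 50 samples per 0.02 s analysis window

def harmonic_peaks(window):
    """Return the peak amplitudes of harmonics 1..25 from one 0.02 s window."""
    spectrum = np.fft.rfft(window)            # bins 0..25, bin k is at k * 50 Hz
    amps = 2.0 * np.abs(spectrum) / len(window)
    amps[-1] /= 2.0                           # the Nyquist bin is not doubled
    return amps[1:]                           # drop the DC bin

# Hypothetical test window: fundamental plus a 20% 5th harmonic.
t = np.arange(N) / FS
x = 1.0 * np.sin(2 * np.pi * F1 * t) + 0.2 * np.sin(2 * np.pi * 5 * F1 * t + 0.3)

peaks = harmonic_peaks(x)
print(peaks[0], peaks[4])
```

With this window length the 25th harmonic falls exactly on the Nyquist frequency (1250 Hz), so its estimate depends on the phase of the signal and should be treated with more caution than the lower orders.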

Comparison of Different Models
The proposed DCMM was also adopted to establish the constant-harmonic-ratio-type model and the Norton equivalent model based on the modeling dataset shown in Figure 9.
To demonstrate the advantages of the proposed model, a comparison among the proposed model, the constant-harmonic-ratio-type model, and the Norton equivalent model was implemented based on the validation dataset. The evaluation indexes are shown as follows, where ME represents the mean error, MSE is the mean square error, $R^2$ denotes the coefficient of determination, FA is the fitting accuracy, N is the number of data points of the validation dataset, $I_{t(i)}$ and $I_{c(i)}$ denote the true value and the calculated value, respectively, of the 5th harmonic current of the i-th data point of the validation dataset, and $\bar{I}$ is the mean value of the 5th harmonic current of the validation dataset.
The comparison results are given in Table 5. Both the ME and the MSE of the proposed model are lower than those of the other two models, so the proposed model is the most accurate of the three. Moreover, both the R² and the FA of the proposed model are greater than those of the other two models, so the proposed model also gives the best fit among the three models.
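As a rough sketch of how such indexes can be computed, the snippet below uses common textbook definitions, which are assumptions rather than the paper's own formulas: ME is taken as the mean absolute error, MSE as the mean squared error, and R² as one minus the ratio of the residual sum of squares to the total sum of squares. FA is omitted because its definition cannot be recovered from the text. The function name `evaluate` and the sample values are hypothetical.

```python
import numpy as np

def evaluate(i_true, i_calc):
    """Return ME, MSE and R2 under the assumed textbook definitions."""
    i_true = np.asarray(i_true, dtype=float)
    i_calc = np.asarray(i_calc, dtype=float)
    err = i_true - i_calc
    me = float(np.mean(np.abs(err)))          # assumed: mean absolute error
    mse = float(np.mean(err ** 2))            # mean squared error
    ss_res = np.sum(err ** 2)                 # residual sum of squares
    ss_tot = np.sum((i_true - i_true.mean()) ** 2)  # total sum of squares
    r2 = float(1.0 - ss_res / ss_tot)
    return {"ME": me, "MSE": mse, "R2": r2}

# Hypothetical true and calculated 5-th harmonic currents.
scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```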

Application of Multi-Mode EAF Harmonic Model
In this subsection, the application of the multi-mode EAF harmonic model is elaborated. The process of the harmonic power-flow calculation based on the multi-mode EAF harmonic model is shown in Figure 11. The fundamental current is known, since the fundamental power-flow calculation is carried out before the harmonic power-flow calculation. First, the initial values of the harmonic voltages are set. Then, the values of the harmonic currents are calculated based on Equation (4). Next, the harmonic currents are substituted into the harmonic power-flow equations to calculate new values of the harmonic voltages. Finally, if the convergence condition is satisfied, the new values of the harmonic voltages are output; otherwise, the procedure returns to the second step, in which the harmonic currents are recalculated.
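The iteration described above can be sketched as a fixed-point loop. This is a simplified illustration under stated assumptions, not the paper's procedure: the harmonic current model is taken as a generic linear form I = a·I1 + B·U + c with hypothetical coefficients `a`, `B`, and `c`, only magnitudes are used, and the harmonic power-flow equations are replaced by the single-bus relation U_h = |Z_h|·I_h. The function name `harmonic_power_flow` is also hypothetical.

```python
import numpy as np

def harmonic_power_flow(i1, z, a, B, c, tol=1e-8, max_iter=100):
    """Fixed-point iteration for harmonic voltage magnitudes.

    i1      : fundamental current (known from the fundamental power flow)
    z       : magnitudes of the harmonic system impedances, shape (H,)
    a, B, c : coefficients of the assumed linear current model
              I = a * i1 + B @ U + c, shapes (H,), (H, H), (H,)
    """
    u = np.zeros_like(z)                      # step 1: initial voltages U(0)
    for k in range(max_iter):
        i_h = a * i1 + B @ u + c              # step 2: harmonic currents
        u_new = z * i_h                       # step 3: new harmonic voltages
        if np.max(np.abs(u_new - u)) < tol:   # step 4: convergence check
            return u_new, k + 1
        u = u_new
    raise RuntimeError("harmonic power flow did not converge")

# Impedance magnitudes for orders 3 and 5; the coefficients are illustrative only.
z = np.abs(np.array([0.039 + 0.882j, 0.039 + 1.372j]))
a = np.array([0.05, 0.03])
B = np.array([[0.01, 0.002], [0.003, 0.01]])
c = np.zeros(2)
u, n_iter = harmonic_power_flow(100.0, z, a, B, c)
```

With these small coupling coefficients the map is a contraction, so the loop converges in a few iterations; a real network would require solving the full harmonic power-flow equations in step 3.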

Taking one mode of the 3-rd and 5-th multi-mode EAF harmonic models as an example, we calculated the 3-rd and 5-th harmonic voltages according to the process shown in Figure 11. The calculated results are given in Table 6, where Z_h denotes the h-th harmonic system impedance calculated from the circuit diagram shown in Figure 7 and U_h is the h-th harmonic voltage.

Order   Z_h (ohms)        U_h (V)
3       0.039 + 0.882i    30.4131
5       0.039 + 1.372i    18.2623

The probability distributions of the measured 3-rd and 5-th harmonic voltage data are shown in Figure 12. The blue lines denote the mean values of the measured 3-rd and 5-th harmonic voltages, and the red lines denote the harmonic power-flow calculation results. It can be observed that the established multi-mode EAF harmonic models are effective in the harmonic power-flow calculation, as the deviations between the calculated values and the mean values of the measured data are 3.10% and 2.40%, respectively.

Discussion
This paper proposes the DCMM for the multi-mode EAF harmonic model. When the same algorithm (i.e., the PSO-based K-means algorithm) is adopted, the results show that the multi-mode EAF harmonic model is more accurate than the constant-harmonic-ratio-type model and the Norton equivalent model (see Table 5). The constant-harmonic-ratio-type model of the EAF considers the interaction between the harmonic currents and the fundamental current but ignores the coupling relationship between the harmonic currents and the harmonic voltages. The Norton equivalent model of the EAF considers the impact of the harmonic voltage on the harmonic current of the same frequency but ignores the interaction among the harmonic currents, the fundamental current, and the harmonic voltages of different frequencies. Because the correlation between the harmonic currents and the fundamental current is more prominent than that between the harmonic currents and the harmonic voltage of the same frequency, the constant-harmonic-ratio-type model is more accurate than the Norton equivalent model. In our method, the proposed multi-mode EAF harmonic model considers the linear coupling relationship among the harmonic currents, the fundamental current, and the harmonic voltages, as shown in Equation (3). Therefore, the multi-mode EAF harmonic model is more accurate than the other two models. However, the increased number of model variables would inevitably reduce computational efficiency. To deal with this issue, our method extracts a smaller representative dataset from the larger one to improve computational efficiency.
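The structural difference between the three models can be illustrated by the regressors each one is allowed to use. The sketch below is an assumption-laden simplification on synthetic, real-valued data: the constant-harmonic-ratio-type model is fitted on the fundamental current only, the Norton-type model on the same-order harmonic voltage plus a constant term, and the full linear model on the fundamental current and all harmonic voltages. The helper `fit_mse` and all numerical values are hypothetical, and the outcome follows from how the data are generated, so it does not reproduce Table 5.

```python
import numpy as np

def fit_mse(A, y):
    """Least-squares fit of y on the columns of A; return the in-sample MSE."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((y - A @ coef) ** 2))

rng = np.random.default_rng(1)
n, h = 1000, 24                      # samples; harmonic voltage orders 2..25
i1 = rng.uniform(0.5, 1.0, size=n)   # synthetic fundamental current
U = rng.uniform(0.0, 0.1, size=(n, h))   # synthetic harmonic voltages
# Synthetic 5-th harmonic current coupled to i1 and to every harmonic voltage.
y = 0.2 * i1 + U @ rng.normal(0, 0.3, size=h) + 0.01 * rng.normal(size=n)

ones = np.ones((n, 1))
k5 = 3  # column index of the 5-th harmonic voltage when orders start at 2

mse_ratio = fit_mse(i1[:, None], y)                      # I_h = k * I_1
mse_norton = fit_mse(np.hstack([ones, U[:, [k5]]]), y)   # I_h = I_s + y_h * U_h
mse_full = fit_mse(np.hstack([ones, i1[:, None], U]), y) # all couplings
```

Because the two simpler regressor sets are subsets of the full one, the full model cannot have a larger in-sample error; whether the gain carries over to validation data depends on the measured dataset.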
On the other hand, the DCMM based on the PSO-based K-means algorithm is elaborated in Section 2.2. Compared with the conventional K-means algorithm, the PSO-based K-means algorithm searches for the near-global or global optimal solution more efficiently, because the high-dimensional data are preprocessed and the optimal number of clusters and the initial clustering centers are calculated beforehand. The performance evaluation results show that the proposed DCMM is accurate in establishing the multi-mode model, since the average estimation deviations are less than 5% under different experimental conditions. Moreover, the accuracy of the linear approximation of the proposed model introduced in Section 2.1 is reflected by the FA. As shown in Table 5, the FA of the proposed model is at the desired level. However, the accuracy of the linear approximation may decrease when a small-scale modeling dataset is employed in EAF modeling, as outliers in a small-scale dataset have a greater negative impact on the linear fitting than those in a large-scale dataset.

Conclusions
This paper has developed the DCMM to establish the multi-mode EAF harmonic model for evaluating the harmonic pollution of the EAF. The proposed DCMM fully considers the coupling relationship among different frequencies of harmonics while employing PCA to improve computational efficiency. Furthermore, the proposed DCMM partitions the multi-mode EAF harmonic data into several clusters corresponding to the different modes of the EAF smelting process and identifies the multi-mode EAF harmonic model parameters. Finally, the multi-mode EAF harmonic model was applied to the harmonic power-flow calculation to evaluate the harmonic pollution of the EAF. The accuracy of the multi-mode EAF harmonic model was verified by the case study, and the comparison results show that it is more accurate than the other two models. In summary, the proposed DCMM is effective for evaluating the harmonic pollution of the EAF, which in turn can help to improve the energy efficiency and productivity of the EAF.
Note that the multi-mode harmonic model is suited to the EAF and, in practice, may not be applicable to other harmonic sources. Hence, our future research will focus on multi-mode models for different harmonic sources.

Figure 1. The process of the proposed data-driven compartmental modeling method (DCMM).

Table 2. The estimated values of the simplified model parameters of case 1.
.8913, −0.3801]   −0.2142
This case shows that the proposed DCMM is able to handle the linear clustering problem and obtain accurate parameters, as Dc is calculated to be 3.27%.

Figure 6. Clustering results of case 2.
Table 3. The estimated values of the simplified model parameters of case 2.
Case   Mode   A_p   C
The simulation time was set to 101 s, and the current and voltage signals extracted through a PQ recorder were analyzed by the fast Fourier transform (FFT) algorithm every 0.02 s at a fixed sampling rate of 2500 Hz. Finally, we obtained 5000 sets of the fundamental current, the 2-nd to 25-th harmonic currents, and the 2-nd to 25-th harmonic voltage peaks.
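The windowed FFT analysis described above can be sketched as follows. The sampling rate and window length come from the text; everything else is an assumption for illustration: the window is taken to contain exactly one fundamental cycle (a 50 Hz system), the test signal is synthetic, and the helper name `harmonic_peaks` is hypothetical.

```python
import numpy as np

FS = 2500.0                   # sampling rate in Hz (from the text)
WINDOW = 0.02                 # analysis window in s (from the text)
N = int(round(FS * WINDOW))   # 50 samples per window

def harmonic_peaks(window):
    """Return the peak amplitudes of harmonic orders 1..N/2 for one window."""
    spectrum = np.fft.rfft(window)
    amp = 2.0 * np.abs(spectrum) / N
    amp[-1] /= 2.0            # the Nyquist bin (order 25) is not doubled
    return amp[1:]            # drop the DC bin; index 0 is the fundamental

t = np.arange(N) / FS
f0 = 1.0 / WINDOW             # 50 Hz, assuming one fundamental cycle per window
# Synthetic signal: fundamental of peak 10 plus a 5-th harmonic of peak 2.
signal = (10.0 * np.sin(2 * np.pi * f0 * t)
          + 2.0 * np.sin(2 * np.pi * 5 * f0 * t + 0.3))
peaks = harmonic_peaks(signal)
```

With 50 samples per window the spectrum resolves orders 1 to 25, which matches the range of harmonics listed in the text; the 25-th order sits exactly at the Nyquist frequency, where the recovered amplitude depends on the phase of the component.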

Figure 7. Simplified circuit diagram for data measurement of the electric arc furnace (EAF).

Table 4. The simplified model parameters of the 5-th harmonic of the EAF.

Figure 10. Clustering results of the 5-th harmonic dataset of the EAF.

Figure 11. The process of harmonic power-flow calculation.

Figure 12. The probability distributions of the measured data.

Table 1. The true values of the simplified model parameters of cases 1 and 2.
According to Equation (7) and the parameters in Table 1, we utilized MATLAB to generate 2880 and 3360 data points detailed in Figure


Table 5. Comparison results of the different models.

Table 6. The harmonic power-flow calculation results.