Segmentation of High Dimensional Time-Series Data Using Mixture of Sparse Principal Component Regression Model with Information Complexity

This paper presents a novel hybrid modeling method for the segmentation of high dimensional time-series data using the mixture of the sparse principal components regression (MIX-SPCR) model with the information complexity (ICOMP) criterion as the fitness function. Our approach performs dimension reduction in high dimensional time-series data and, at the same time, determines the number of component clusters (i.e., the number of segments across the time-series data) and selects the best subset of predictors. A large-scale Monte Carlo simulation shows that the MIX-SPCR model can successfully identify the correct structure of the time-series data. The MIX-SPCR model is also applied to high dimensional Standard & Poor's 500 (S&P 500) index data to uncover the hidden structure of the time series and identify the structural change points. The approach presented in this paper determines both the relationships among the predictor variables and how the various predictor variables contribute to explaining the response variable through the cluster-wise sparsity settings.


Introduction
This paper presents a novel method for the segmentation and dimension reduction of high dimensional time-series data. We develop a hybrid model that combines mixture-model cluster analysis with sparse principal components regression (the MIX-SPCR model) as an expert unsupervised classification methodology, with the information complexity (ICOMP) criterion as the fitness function. This new approach performs dimension reduction in high dimensional time-series data and, at the same time, determines the number of component clusters.
Time-series segmentation and change point detection have long been active research topics. Different research groups have provided solutions with various approaches in this area, including, but not limited to, Bayesian methods (Barber et al. [1]), fuzzy systems (Abonyi and Feil [2]), and complex system modeling (Spagnolo and Valenti [3], Valenti et al. [4], S Lima [5], Ding et al. [6]). We group these approaches into two branches, one based on complex systems modeling and the other on statistical models with parameter estimation and inference. Among the complex systems-based modeling approaches, it is worth noting a series of papers that use the stochastic volatility model by Spagnolo and Valenti [3]. For example, these authors used a nonlinear Heston model to analyze 1071 stocks on the New York Stock Exchange (1987-1998). After accounting for the stochastic nature of volatility, the model is well

• Identify and select variables that are sparse in the MIX-SPCR model.
• Treat each time segment continuously in the process with some specified probability density function (pdf).
• Determine the number of time-series segments and the number of sparse variables and estimate the structural change points simultaneously.
• Develop a robust and efficient algorithm for estimating model parameters.
We aim to achieve these objectives by developing the information complexity (ICOMP) criteria as our fitness function throughout the paper for the segmentation of high-dimensional time-series data.
Our approach involves a two-stage procedure. We first perform variable selection using SPCA, taking advantage of its sparsity. We then fit the sparse principal component regression (SPCR) model by transforming the original high dimensional data into several main principal components and estimating the relationships between the sparse component loadings and the response variable. In this way, the mixture model not only handles the curse of dimensionality but also retains the model's explanatory power. We thus choose the best subset of predictors and determine the number of time-series segments in the MIX-SPCR model simultaneously using ICOMP.
The rest of the paper is organized as follows. In Section 2, we present the model and methods. In particular, we first briefly explain sparse principal component analysis (SPCA) due to Zou et al. [14] in Section 2.1. In Section 2.2, we modify SPCA and develop the mixture of sparse principal component regression (MIX-SPCR) model for the segmentation of time-series data. In Section 3, we present a regularized entropy-based Expectation-Maximization (EM) clustering algorithm. As is well known, the EM algorithm works by maximizing the likelihood of the mixture model. However, to make the conventional EM algorithm robust (not sensitive to initial values) and help it converge to the global optimum, we use the robust version of the EM algorithm for the MIX-SPCR model based on the work of Yang et al. [15]. These authors addressed the robustness issue by adding an entropy term of the mixture proportions to the conventional EM algorithm's objective function. While our EM algorithm is in the same spirit as the Yang et al. [15] approach, there are significant differences between our approach and theirs. Yang's robust EM algorithm deals only with the usual clustering problem, without any response (or dependent) variable or time factor in the data. We extend it to the case of the MIX-SPCR model in the context of time-series data. In Section 4, we discuss various information criteria, specifically the information complexity based criteria (ICOMP). We derive the ICOMP for the MIX-SPCR model based on Bozdogan's previous research ([16-20]). In Section 5, we present our Monte Carlo simulation study. Section 5.2 involves an experiment on the detection of structural change points, and Section 5.3 presents a large-scale Monte Carlo simulation verifying the advantage of the MIX-SPCR model with statistical information criteria.
We provide a real data analysis in Section 6 using the daily adjusted closing S&P 500 index and stock prices from the Yahoo Finance database that spans the period from January 1999 to December 2019. Finally, our conclusion and discussion are presented in Section 7.

Model and Methods
In this section, we briefly present sparse principal component analysis (SPCA) and sparse principal component regression (SPCR) as background. Then, by hybridizing these two methods within the mixture model, we propose the mixture-model cluster analysis of sparse principal component regression (abbreviated as the MIX-SPCR model hereafter) for the segmentation of high dimensional time-series datasets. Compared with a simple linear combination of all explanatory variables (i.e., the dense PCA model), the new approach is easier to interpret because it maintains a sparsity specification.
Referring to Figure 1, we first show the overall structure of the model in this paper. The overall processing flow is that we clean and standardize the data after obtaining the time-series data. Subsequently, we specify the number of time-series segments and how many Sparse Principal Components (SPCs) each segment contains. Using the Robust EM algorithm (Section 3), we estimate the model parameters, especially the boundaries (also known as change points) of each time segment. The information criterion values are calculated using the method of Section 4. By testing different numbers of time segments/SPCs, we obtain multiple criterion values. According to the calculated information criterion values, we choose the most appropriate model with the estimated parameters.

Sparse Principal Component Analysis (SPCA)
Given the input data matrix X with n observations and p variables, we decompose X using the singular value decomposition (SVD). We write the decomposition as $X = UDV^{T}$, where D is a diagonal matrix of singular values and the orthonormal columns of U and V are the left and right singular vectors. When we perform the SVD of a data matrix X that has been centered by subtracting each column's mean, the process is the well-known principal component analysis (PCA). As discussed by Zou et al. [14], PCA has several advantages compared with other dimensionality reduction techniques. For example, PCA can sequentially identify the sources of variability by considering linear combinations of all the variables. Because of the orthonormal constraint during the computation, all the calculated principal components (PCs) have a clear geometrical interpretation in the original data space as a dimension reduction technique. Because PCA can deal with "the curse of dimensionality" of high-dimensional data sets, it has been widely used in real-world scenarios, including biomedical and financial applications.
Even though PCA has properties that are desirable in real-world applications and statistical analysis, the interpretation of PCs is often difficult because each PC is a linear combination of all the original variables. In practice, the principal components typically have a large number of non-zero coefficients. To resolve this drawback, researchers have proposed various improvements that focus on the sparsity of PCA while keeping the loss of information minimal. Shen and Huang [21] designed an algorithm to iteratively extract the top PCs using the so-called penalized least sum of squares (PLSS) criterion. Zou et al. [14] utilized the lasso penalty (via the Elastic Net) to obtain sparse loadings of the principal components, a method named sparse principal component analysis (SPCA).
In this paper, we use the sparse principal component analysis (SPCA) proposed by Zou et al. [14]. Given the data matrix X, we minimize the objective function to obtain the SPCA results:

$$(\hat{A}, \hat{B}) = \arg\min_{A,B} \sum_{i=1}^{n} \lVert x_i - A B^{T} x_i \rVert^{2} + \lambda_{2} \sum_{j=1}^{k} \lVert B^{(j)} \rVert^{2} + \sum_{j=1}^{k} \lambda_{1,j} \lVert B^{(j)} \rVert_{1} \qquad (1)$$

subject to

$$A^{T} A = I_{k},$$

where $I_k$ is the identity matrix. We require the hyperparameters $\lambda_{1,j}$ and $\lambda_{2}$ to be non-negative. The A and B matrices of size (p × k) are given by

$$A = [A^{(1)}, A^{(2)}, \ldots, A^{(k)}]$$

and

$$B = [B^{(1)}, B^{(2)}, \ldots, B^{(k)}].$$

If we choose the first k principal components from the data matrix X, then the estimates $\hat{B}^{(j)}$, j = 1, 2, . . . , k, are the sparse loading vectors, which are no longer orthogonal.
A bigger $\lambda_{1,j}$ means a greater penalty for having non-zero entries in $\hat{B}^{(j)}$. By using different $\lambda_{1,j}$, we control the number of zeros in the jth loading vector. If $\lambda_{1,j} = 0$ for j = 1, 2, . . . , k, this problem reduces to the usual PCA.
Zou et al. [14] proposed a generalized SPCA algorithm to solve the optimization problem in Equation (1). The algorithm applies the Elastic Net (EN) to estimate $B^{(j)}$ iteratively and then updates the matrix A. However, this algorithm is not the only available approach for extracting principal components with sparse loadings. SPCA can also be computed through dictionary learning (Mairal et al. [22]). Under the probabilistic model of principal component analysis, SPCA is equivalent to sparse probabilistic principal component analysis (SPPCA) if a Laplacian prior is placed on each element of the weight matrix (Guan and Dy [23], Williams [24]). We refer readers to these publications for further discussion of SPPCA.
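The effect of the L1 penalty on a loading vector can be illustrated with the soft-thresholding operator. The sketch below is not the SPCA algorithm of Zou et al. [14]; it only solves the scalar problem min_b (b - a)^2 + lam1 * |b| for each entry, whose solution shrinks a toward zero by lam1 / 2. The function names and the example loadings are hypothetical.

```python
def soft_threshold(value: float, penalty: float) -> float:
    """Shrink value toward zero by penalty; return 0.0 when |value| <= penalty."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def sparsify(loadings, lam1):
    """Entry-wise minimizer of (b - a)^2 + lam1 * |b|, i.e. threshold at lam1 / 2."""
    return [soft_threshold(a, lam1 / 2.0) for a in loadings]


dense = [0.62, -0.48, 0.05, -0.03, 0.40]   # hypothetical dense loadings
no_penalty = sparsify(dense, 0.0)           # lam1 = 0 leaves the loadings unchanged
sparse = sparsify(dense, 0.2)               # small entries are set exactly to zero
num_zeros = sum(1 for b in sparse if b == 0.0)
```

With lam1 = 0 the loadings are returned unchanged, which mirrors the reduction to ordinary PCA; increasing lam1 sets more entries exactly to zero.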
Next, we introduce the MIX-SPCR model for the segmentation of time-series data.

Mixtures of SPCR Model for Time-Series Data
Suppose the continuous response variable is denoted as y = {y_i | 1 ≤ i ≤ n}, where n represents the number of observations (time points). Similarly, we have the predictors denoted as

$$X = \{x_i \mid 1 \le i \le n\}.$$

Each observation $x_i$ has p dimensions and is represented as

$$x_i = [x_{1,i}, x_{2,i}, \ldots, x_{p,i}]^{T}.$$

Both the response variable and the independent variables are collected sequentially and labeled by the time points $T = [t_1, t_2, \cdots, t_n]$.
The finite mixture model allows us to cluster conditionally dependent data into several classes. In the time-series data scenario, researchers cluster the data $((t_1, x_1, y_1), (t_2, x_2, y_2), \cdots, (t_n, x_n, y_n))$ into several homogeneous groups, where the number of groups G is unknown in general. Within each group, we apply SPCA to extract the top k principal components, each of which has a sparse loading vector of p variable coefficients. The extracted top k PCs are denoted by the matrix $P_{p \times k}$. We also use $P_g$ to represent the principal component matrix obtained from the group indexed by g = 1, 2, . . . , G.
The SPCR model assumes that each pair $(x_i, y_i)$ is independently drawn from a cluster using both the SPCA and the regression model as follows:

$$y_i = x_i^{T} P_g \beta_g + \varepsilon_{i,g}, \qquad g = 1, 2, \ldots, G.$$

For each group g, the random error is assumed to be Gaussian distributed. That is, $\varepsilon_{i,g} \sim N(0, \sigma_g^{2})$. If the response variable is multivariate, then the random error is usually also assumed to follow a multivariate Gaussian distribution. Thus the probability density function (pdf) of the SPCR model is

$$f(y_i \mid x_i, P_g, \beta_g, \sigma_g^{2}) = \frac{1}{\sqrt{2\pi\sigma_g^{2}}} \exp\left\{ -\frac{(y_i - x_i^{T} P_g \beta_g)^{2}}{2\sigma_g^{2}} \right\}.$$

We emphasize here that the noise (i.e., the error term) included in the statistical model is drawn from a normal distribution independently for each time-series segment, with a different value of $\sigma_g^{2}$ for each period. Since we use the EM algorithm to estimate the parameters of the model, the noise parameter $\sigma_g^{2}$ can be estimated accurately as well. Future studies will consider introducing different noise distributions, such as α-stable Lévy noise [25] and other non-Gaussian noise distributions, to further extend the current model.
We also consider the time factor $t_i$ in the SPCR model of time-series data to be continuous. The pdf of the time factor is

$$f(t_i \mid v_g, \sigma_g^{2,\mathrm{time}}) = \frac{1}{\sqrt{2\pi\sigma_g^{2,\mathrm{time}}}} \exp\left\{ -\frac{(t_i - v_g)^{2}}{2\sigma_g^{2,\mathrm{time}}} \right\},$$

where $v_g$ is the mean and $\sigma_g^{2,\mathrm{time}}$ is the variance of the time segment g. Apart from the normal distribution, our approach can also be generalized to other distributions for the time factor, such as skewed distributions, Student's t-distribution, ARCH, GARCH time-series models, and so on.
As a result, if we use the MIX-SPCR model to perform segmentation of time-series data, the likelihood function of the whole data $((t_1, x_1, y_1), (t_2, x_2, y_2), \cdots, (t_n, x_n, y_n))$ with G clusters (or segments) is given by

$$L = \prod_{i=1}^{n} \sum_{g=1}^{G} \pi_g \, f(y_i \mid x_i, P_g, \beta_g, \sigma_g^{2}) \, f(t_i \mid v_g, \sigma_g^{2,\mathrm{time}}), \qquad (8)$$

where $\pi_g$ is the mixing proportion with the constraints that $\pi_g \ge 0$ and

$$\sum_{g=1}^{G} \pi_g = 1.$$

We follow the definition of missing values by Yang et al. [15] and let $Z = \{Z_1, Z_2, \cdots, Z_n\}$. If $Z_i = g$, then $z_{g,i} = 1$; otherwise, $z_{g,i} = 0$. Then the log-likelihood function of the MIX-SPCR model is

$$L_{mix} = \sum_{i=1}^{n} \sum_{g=1}^{G} z_{g,i} \log\left[ \pi_g \, f(y_i \mid x_i, P_g, \beta_g, \sigma_g^{2}) \, f(t_i \mid v_g, \sigma_g^{2,\mathrm{time}}) \right]. \qquad (9)$$

We denote $z = \{z_{g,i}\}$, where g = 1, 2, · · · , G and i = 1, 2, · · · , n.
Given the number of segments, researchers usually apply the EM algorithm to determine the optimal segmentation by setting the objective function as J EM = L mix (Gaffney and Smyth [26], Esling and Agon [27], Gaffney [28]).

Regularized Entropy-Based EM Clustering Algorithm
The EM algorithm is a method for iteratively optimizing the objective function. As discussed in Section 2.2, by setting the objective function as the log-likelihood function, we can use the EM algorithm to identify optimal segmentation of time series.
However, in practice, the EM algorithm is sensitive to model initialization conditions and cannot estimate the number of clusters appropriately. To deal with the initialization problem, in 2012, Yang et al. [15] proposed using an entropy penalty to stabilize the computation of each step. The improved method is called the robust EM algorithm. In this paper, we extend the robust EM algorithm to deal with time-series data for the MIX-SPCR model.
In Section 3.1, we discuss the entropy term of the robust EM algorithm. Then, we show the extension of the robust EM algorithm for the MIX-SPCR model in Sections 3.2 and 3.3.

The Entropy of EM Mixture Probability
As introduced in Equation (8), $\pi_g$ represents the mixture probability of each cluster or segment. In other words, the value of $\pi_g$ is the probability that a data point belongs to group g. The clustering complexity is determined by the number of clusters and the corresponding probability values, and it can be measured using entropy. Given $\{\pi_g \mid 1 \le g \le G\}$, the entropy of $Z_i$ is

$$H(Z_i) = -\sum_{g=1}^{G} \pi_g \log \pi_g.$$

Then the entropy of Z is written as

$$H(Z) = \sum_{i=1}^{n} H(Z_i) = -n \sum_{g=1}^{G} \pi_g \log \pi_g.$$

The objective function of the robust EM algorithm is

$$J_{Robust\text{-}EM} = L_{mix} - \lambda_{Robust\text{-}EM} \, H(Z), \qquad (13)$$

where $\lambda_{Robust\text{-}EM} \ge 0$. The log-likelihood term $L_{mix}$ is from Equation (9) and measures the goodness-of-fit. Next, we present the steps of the EM algorithm for maximizing the objective function in Equation (13).
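As a small numerical illustration of the entropy term, the sketch below computes the entropy of the mixing proportions and an entropy-penalized objective. It assumes the form J = L_mix - lam * H(Z) with H(Z) = -n * sum_g pi_g log pi_g, in the spirit of Yang et al. [15]; the function names and the log-likelihood value are hypothetical.

```python
import math


def mixing_entropy(pi):
    """Entropy of one membership variable: -sum_g pi_g * log(pi_g)."""
    return -sum(p * math.log(p) for p in pi if p > 0.0)


def robust_em_objective(log_lik, pi, n, lam):
    """Assumed objective: log-likelihood minus lam times the entropy of all n memberships."""
    return log_lik - lam * n * mixing_entropy(pi)


h_balanced = mixing_entropy([0.5, 0.5])       # largest entropy for two groups
h_skewed = mixing_entropy([0.7, 0.3])         # smaller entropy
h_single = mixing_entropy([1.0])              # a single group has zero entropy
j_value = robust_em_objective(-100.0, [0.7, 0.3], n=10, lam=0.5)
```

Under this form, proportions concentrated on fewer groups have lower entropy and therefore a smaller penalty, which is how the entropy term discourages unnecessary clusters.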

E-Step (Expectation)
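From a Bayesian perspective, we let $\hat{z}_{g,i}$ denote the posterior probability that a data triplet $(t_i, x_i, y_i)$ is drawn from group g. Using Bayes' theorem, we have

$$\hat{z}_{g,i} = \frac{\pi_g \, f(y_i \mid x_i, P_g, \beta_g, \sigma_g^{2}) \, f(t_i \mid v_g, \sigma_g^{2,\mathrm{time}})}{\sum_{h=1}^{G} \pi_h \, f(y_i \mid x_i, P_h, \beta_h, \sigma_h^{2}) \, f(t_i \mid v_h, \sigma_h^{2,\mathrm{time}})}.$$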
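The posterior membership probabilities can be sketched in a few lines. The code below assumes Gaussian densities for both the regression error and the time factor, and it takes the per-segment fitted values (the regression predictions for each observation) as given; all function and argument names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step(t, y, fitted, pi, sigma2, v, sigma2_time):
    """Return z[g][i], the posterior probability that observation i belongs to segment g.

    fitted[g][i] is the segment-g regression prediction for observation i.
    """
    G, n = len(pi), len(y)
    z = [[0.0] * n for _ in range(G)]
    for i in range(n):
        weights = [
            pi[g]
            * normal_pdf(y[i], fitted[g][i], sigma2[g])
            * normal_pdf(t[i], v[g], sigma2_time[g])
            for g in range(G)
        ]
        total = sum(weights)  # may underflow for extreme data; log-space is safer in practice
        for g in range(G):
            z[g][i] = weights[g] / total
    return z


# Toy example: two well-separated segments in both time and response.
t = [1.0, 2.0, 9.0, 10.0]
y = [0.0, 0.0, 5.0, 5.0]
fitted = [[0.0] * 4, [5.0] * 4]
z = e_step(t, y, fitted, pi=[0.5, 0.5], sigma2=[1.0, 1.0], v=[1.5, 9.5], sigma2_time=[1.0, 1.0])
```

Each column of z sums to one, and including the time density is what pushes the memberships toward contiguous segments.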

M-Step (Maximization)
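Using the robustified derivation of $\pi_g$, the estimated mixture proportion, we have

$$\pi_g^{new} = \pi_g^{EM} + \lambda_{Robust\text{-}EM} \, \pi_g^{old} \left( \log \pi_g^{old} - \sum_{h=1}^{G} \pi_h^{old} \log \pi_h^{old} \right),$$

where

$$\pi_g^{EM} = \frac{1}{n} \sum_{i=1}^{n} \hat{z}_{g,i}.$$

We follow the recommendation of Yang et al. [15] for the value of $\lambda^{new}_{Robust\text{-}EM}$ as

$$\lambda^{new}_{Robust\text{-}EM} = \min\left\{ \frac{\sum_{g=1}^{G} \exp\left( -\eta \, n \, \lvert \pi_g^{new} - \pi_g^{old} \rvert \right)}{G}, \; \frac{1 - \max_{g} \pi_g^{EM}}{-\max_{g} \pi_g^{old} \sum_{h=1}^{G} \pi_h^{old} \log \pi_h^{old}} \right\},$$

where

$$\eta = \min\left\{ 1, \; 0.5^{\lfloor p/2 - 1 \rfloor} \right\}$$

and p is the number of variables in the model. We iterate the E-step and M-step until convergence to obtain the parameter estimates. In particular, the $\beta_g$ values are updated by maximizing $J_{Robust\text{-}EM}$ from Equation (13). Since we fix the number of segments and principal components during each E-step and M-step, the updated values of $\beta_g$ and $\sigma_g$ can be calculated using $L_{mix}$ directly. The estimated values of $\beta_g$ and $\sigma_g$ are given as follows:

$$\hat{\beta}_g = \left[ \sum_{i=1}^{n} \hat{z}_{g,i} \, P_g^{T} x_i x_i^{T} P_g \right]^{-1} \sum_{i=1}^{n} \hat{z}_{g,i} \, P_g^{T} x_i \, y_i,$$

$$\hat{\sigma}_g^{2} = \frac{\sum_{i=1}^{n} \hat{z}_{g,i} \left( y_i - x_i^{T} P_g \hat{\beta}_g \right)^{2}}{\sum_{i=1}^{n} \hat{z}_{g,i}}.$$

For the time factor, the estimated mean $v_g$ and variance $\sigma_g^{2,\mathrm{time}}$ are

$$\hat{v}_g = \frac{\sum_{i=1}^{n} \hat{z}_{g,i} \, t_i}{\sum_{i=1}^{n} \hat{z}_{g,i}}, \qquad \hat{\sigma}_g^{2,\mathrm{time}} = \frac{\sum_{i=1}^{n} \hat{z}_{g,i} \, (t_i - \hat{v}_g)^{2}}{\sum_{i=1}^{n} \hat{z}_{g,i}}.$$

As discussed above, our approach is flexible in considering other distributional models for the time-series factor, which we will pursue in separate research work.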
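A minimal NumPy sketch of one M-step is given below, under two assumptions that should be checked against the original derivation: (i) maximizing L mix for fixed memberships reduces to weighted least squares on the component scores X P_g, and (ii) the entropy-adjusted proportion update takes the form used by Yang et al. [15]. The function names are hypothetical, and the penalty weight lam is treated as given rather than updated.

```python
import numpy as np


def m_step_segment(scores, y, z_g):
    """Weighted least squares for one segment.

    scores: (n, k) array of component scores X @ P_g
    y:      (n,) response
    z_g:    (n,) posterior membership probabilities for segment g
    """
    w = np.asarray(z_g, dtype=float)
    weighted = scores * w[:, None]                      # rows scaled by memberships
    beta = np.linalg.solve(scores.T @ weighted, weighted.T @ y)
    resid = y - scores @ beta
    sigma2 = float(np.sum(w * resid ** 2) / np.sum(w))  # weighted residual variance
    return beta, sigma2


def update_proportions(z, pi_old, lam):
    """Assumed entropy-adjusted update of the mixing proportions; z has shape (G, n)."""
    pi_em = z.mean(axis=1)                              # ordinary EM estimate
    entropy_term = np.sum(pi_old * np.log(pi_old))
    return pi_em + lam * pi_old * (np.log(pi_old) - entropy_term)


rng = np.random.default_rng(0)
scores = rng.normal(size=(50, 2))
y = scores @ np.array([2.0, -1.0])                      # noise-free toy response
memberships = rng.uniform(0.1, 1.0, size=50)
beta_hat, sigma2_hat = m_step_segment(scores, y, memberships)

z = np.array([[0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.8]])
pi_new = update_proportions(z, np.array([0.6, 0.4]), lam=0.5)
```

The adjustment term sums to zero over the groups, so the updated proportions still sum to one; it moves mass from low-probability groups toward high-probability ones.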
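For a Gaussian time factor, the natural M-step estimates are the membership-weighted mean and variance of the time points. The sketch below assumes exactly that form; the function name is hypothetical.

```python
def update_time_factor(t, z_g):
    """Membership-weighted mean and variance of the time points for one segment."""
    total = sum(z_g)
    mean = sum(z * ti for z, ti in zip(z_g, t)) / total
    var = sum(z * (ti - mean) ** 2 for z, ti in zip(z_g, t)) / total
    return mean, var


# Toy example: only the first two time points belong to the segment.
v_hat, var_hat = update_time_factor([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 0.0, 0.0])
```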

Information Complexity Criteria
Recently, the statistical literature recognized the necessity of introducing model selection as one of the technical areas. In this area, the entropy and the Kullback-Leibler [29] information (or KL distance) play a crucial role and serve as an analytical basis to obtain the forms of model selection criteria. In this paper, we use information criteria to evaluate a portfolio of competing models and select the best-fitting model with minimum criterion values.
One of the first information criteria for model selection in the literature is due to the seminal work of Akaike [30]. Following the entropy maximization principle (EMP), Akaike developed Akaike's Information Criterion (AIC) to estimate the expected KL distance or divergence. The form of AIC is

$$AIC = -2 \log L(\hat{\theta}) + 2k,$$

where $L(\hat{\theta})$ is the maximized likelihood function, and k is the number of estimated free parameters in the model. The model with the minimum AIC value is chosen as the best model to fit the data. Motivated by Akaike's work, Bozdogan [16-20,31] developed new information complexity (ICOMP) criteria based on Van Emden's [32] entropic complexity index in parametric estimation. Instead of penalizing the number of free parameters directly, ICOMP penalizes the covariance complexity of the model. There are several forms of ICOMP. In this section, we present the two general forms of ICOMP criteria based on the estimated inverse Fisher information matrix (IFIM). The first form is

$$ICOMP(IFIM) = -2 \log L(\hat{\theta}) + 2 C_1(\hat{F}^{-1}),$$

where $L(\hat{\theta})$ is the maximized likelihood function, and $C_1(\hat{F}^{-1})$ represents the entropic complexity of the IFIM. We define $C_1(\hat{F}^{-1})$ as

$$C_1(\hat{F}^{-1}) = \frac{s}{2} \log\left[ \frac{\mathrm{tr}(\hat{F}^{-1})}{s} \right] - \frac{1}{2} \log \left| \hat{F}^{-1} \right|,$$

where $s = \mathrm{rank}(\hat{F}^{-1})$. We can also give the form of $C_1(\hat{F}^{-1})$ in terms of eigenvalues,

$$C_1(\hat{F}^{-1}) = \frac{s}{2} \log\left( \frac{\bar{\lambda}_a}{\bar{\lambda}_g} \right),$$

where $\bar{\lambda}_a$ is the arithmetic mean of the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_s$, and $\bar{\lambda}_g$ is the geometric mean of the eigenvalues. We note that ICOMP penalizes the lack of parsimony and the profusion of the model's complexity through the IFIM. It offers a perspective beyond counting and penalizing the number of estimated parameters in the model: ICOMP takes into account the interaction (i.e., correlation) among the estimated parameters through the model fitting process.
We define the second form of ICOMP as

$$ICOMP(IFIM)_{C_{1F}} = -2 \log L(\hat{\theta}) + 2 C_{1F}(\hat{F}^{-1}),$$

where $C_{1F}(\hat{F}^{-1})$ is given by

$$C_{1F}(\hat{F}^{-1}) = \frac{s}{4} \cdot \frac{\frac{1}{s} \mathrm{tr}\left( \hat{F}^{-1T} \hat{F}^{-1} \right) - \left( \frac{\mathrm{tr}(\hat{F}^{-1})}{s} \right)^{2}}{\left( \frac{\mathrm{tr}(\hat{F}^{-1})}{s} \right)^{2}}.$$

In terms of the eigenvalues of the IFIM, we write $C_{1F}(\hat{F}^{-1})$ as

$$C_{1F}(\hat{F}^{-1}) = \frac{1}{4 \bar{\lambda}_a^{2}} \sum_{j=1}^{s} \left( \lambda_j - \bar{\lambda}_a \right)^{2}.$$

We want to highlight one feature of $C_{1F}$: it measures the relative variation in the eigenvalues.
These two forms of ICOMP provide us an easy to use computational means in high dimensional modeling. Next, we derive the analytical forms of ICOMP in the MIX-SPCR model.
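Both complexity measures can be computed from the eigenvalues of the estimated IFIM alone. The sketch below assumes the eigenvalue forms C_1 = (s/2) log(arithmetic mean / geometric mean) and C_1F = sum_j (lambda_j - arithmetic mean)^2 / (4 * arithmetic mean^2), the latter being the second-order approximation of the former; the function names are hypothetical.

```python
import math


def c1_complexity(eigvals):
    """First-order entropic complexity from positive eigenvalues."""
    s = len(eigvals)
    arith = sum(eigvals) / s
    log_geo = sum(math.log(v) for v in eigvals) / s   # log of the geometric mean
    return 0.5 * s * (math.log(arith) - log_geo)


def c1f_complexity(eigvals):
    """Second-order (Frobenius-norm) complexity: relative variation of the eigenvalues."""
    s = len(eigvals)
    arith = sum(eigvals) / s
    return sum((v - arith) ** 2 for v in eigvals) / (4.0 * arith ** 2)


def icomp(log_lik, eigvals, complexity=c1_complexity):
    """Lack-of-fit term plus twice the chosen complexity of the IFIM."""
    return -2.0 * log_lik + 2.0 * complexity(eigvals)


equal = [2.0, 2.0, 2.0]      # equal eigenvalues: no complexity penalty
unequal = [1.0, 4.0]
c1_value = c1_complexity(unequal)
c1f_value = c1f_complexity(unequal)
score = icomp(-100.0, unequal, c1f_complexity)
```

Both measures vanish when all eigenvalues are equal and grow as the eigenvalues spread out, which is the sense in which ICOMP penalizes correlated or unevenly determined parameter estimates.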

Derivation of Information Complexity in MIX-SPCR Model for Time-Series Data
We first consider the log-likelihood function of the MIX-SPCR model given in Equation (9), After some work, the estimated inverse Fisher information matrix (IFIM) of the mixture probabilities is Similarly, for each segment g, the estimated IFIM, F −1 g,SPCR , is Note that the IFIM should include both the SPCR models F −1 g,SPCR and the time factor F −1 g,time for each segment.
For each segment g, the time factor is under the univariate Gaussian distribution. As a result, the IFIM of the time factor is By combining the two IFIMs for the SPCR model and the time factor, we have the inverse Fisher information Overall, the inverse of the estimated Fisher information matrix (IFIM) for the MIX-SPCR model becomes Using the above definition of ICOMP(IFIM) and the properties of block-diagonal matrices with their trace and determinant, we have where and where s = rank( Similarly, we derive the second equivalent form of ICOMP(IFIM) C 1F as Using the properties of the block-diagonal matrices, we have Thus, an open computational form of ICOMP(IFIM) C 1F becomes We note that in computing both forms of ICOMP above, we do not need to build the full inverse of the estimated Fisher information matrix (IFIM) for the MIX-SPCR model given in Equation (36). All one requires is the computation of IFIM for each segment, which is appealing.
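The block-diagonal argument can be sketched as follows. This is an illustration under two assumptions, not the paper's exact derivation: the overall IFIM is taken to be block diagonal with one block $\hat{F}_g^{-1}$ per segment, and $C_1$ is taken in its trace and determinant form.

```latex
\begin{align*}
\hat{F}^{-1} &= \operatorname{diag}\bigl(\hat{F}_1^{-1}, \hat{F}_2^{-1}, \ldots, \hat{F}_G^{-1}\bigr), \\
\operatorname{tr}\bigl(\hat{F}^{-1}\bigr) &= \sum_{g=1}^{G} \operatorname{tr}\bigl(\hat{F}_g^{-1}\bigr),
\qquad
\log\bigl|\hat{F}^{-1}\bigr| = \sum_{g=1}^{G} \log\bigl|\hat{F}_g^{-1}\bigr|, \\
C_1\bigl(\hat{F}^{-1}\bigr) &= \frac{s}{2}\log\!\left[\frac{1}{s}\sum_{g=1}^{G}\operatorname{tr}\bigl(\hat{F}_g^{-1}\bigr)\right]
 - \frac{1}{2}\sum_{g=1}^{G}\log\bigl|\hat{F}_g^{-1}\bigr| .
\end{align*}
```

Under these assumptions only the per-segment traces and determinants are needed, which is why the full matrix never has to be assembled.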
We also use AIC and CAIC (Bozdogan [33]) for comparison purposes, given by

$$AIC = -2 \log L(\hat{\theta}) + 2 s^{*},$$

$$CAIC = -2 \log L(\hat{\theta}) + s^{*} (\log n + 1),$$

where $s^{*} = G(k + 3)$ is the number of estimated parameters in the MIX-SPCR model and log n denotes the natural logarithm of the sample size n. Next, we show our numerical examples, starting with a detailed Monte Carlo simulation study.
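For reference, the two count-based criteria are simple to evaluate once the maximized log-likelihood is available. The sketch below uses the standard forms AIC = -2 log L + 2 s* and CAIC = -2 log L + s* (log n + 1) with s* = G(k + 3); the function names and the log-likelihood value are hypothetical.

```python
import math


def num_params(G, k):
    """Number of estimated parameters, s* = G(k + 3)."""
    return G * (k + 3)


def aic(log_lik, G, k):
    return -2.0 * log_lik + 2.0 * num_params(G, k)


def caic(log_lik, G, k, n):
    return -2.0 * log_lik + num_params(G, k) * (math.log(n) + 1.0)


s_star = num_params(G=2, k=4)
aic_value = aic(-100.0, G=2, k=4)
caic_value = caic(-100.0, G=2, k=4, n=4000)
```

For fixed G, k, and n the penalty terms are constants, so comparisons between candidate change points under these two criteria depend only on the log-likelihood.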

Monte Carlo Simulation Study
We perform numerical experiments in a unified computing environment: Ubuntu 18.04 operating system, Intel i7-8700 processor, and 32 GB of RAM. We use the programming language Python and the scientific computing package NumPy [34] to build a computational platform. The size of the input data directly affects the running time of the program. At n = 4000 time-series observations, the execution time for each EM iteration is about 0.9 s. Parameter estimation converges within 40 iterations, with a total machine run time of 37 s.

Simulation Protocol
In this section, we present the performance of the proposed MIX-SPCR model using synthetic data generated from a segmented regression model. Our simulation protocol has p = 12 variables and four actual latent variables. Two segmented regression models determine the dependent variable y, and each segment is continuous and has its own specified coefficients ($\beta_1$ and $\beta_2$). Our simulation setup is as follows:

$$y_{t,g=1} = x_{1,t} \beta_1 + \varepsilon_{1,t}, \qquad t = 1, 2, \cdots, 2800,$$

$$y_{t,g=2} = x_{2,t} \beta_2 + \varepsilon_{2,t}, \qquad t = 2801, 2802, \cdots, 4000.$$

We set the total number of time-series observations to n = 4000. The first segment has $n_1 = 2800$ and the second segment has $n_2 = 1200$ time-series observations. We randomly draw the error term from a Gaussian distribution with zero mean and $\sigma^2 = 9$. Among all the variables, the first six observable variables explain the first segment, and the remaining six explanatory variables primarily determine the second segment. We set the mixing proportions to $\pi_1 = 0.7$ and $\pi_2 = 0.3$ for the two time-series segments, respectively.
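A data set with this general shape can be generated as sketched below. The segment lengths, the number of variables, and the noise variance follow the protocol above; the loading pattern of the four latent variables, the measurement noise, and the coefficient values are hypothetical choices made only so that the example runs.

```python
import numpy as np


def simulate_two_segments(n1=2800, n2=1200, sigma2=9.0, seed=0):
    """Two-segment regression data with 12 predictors driven by 4 latent variables."""
    rng = np.random.default_rng(seed)
    n = n1 + n2
    latent = rng.normal(size=(n, 4))
    loadings = np.zeros((4, 12))                  # hypothetical: each latent drives 3 predictors
    for j in range(4):
        loadings[j, 3 * j:3 * (j + 1)] = 1.0
    X = latent @ loadings + 0.1 * rng.normal(size=(n, 12))
    beta1 = np.r_[np.ones(6), np.zeros(6)]        # first six predictors explain segment 1
    beta2 = np.r_[np.zeros(6), np.ones(6)]        # last six predictors explain segment 2
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)
    y = np.empty(n)
    y[:n1] = X[:n1] @ beta1 + eps[:n1]
    y[n1:] = X[n1:] @ beta2 + eps[n1:]
    t = np.arange(1, n + 1)                       # time index 1, ..., n
    return t, X, y


t, X, y = simulate_two_segments()
share_first = 2800 / len(t)                       # matches the mixing proportion of segment 1
```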

Detection of Structural Change Point
In the first simulation study, we fix the actual number of segments at two, which means that the first segment extends from the starting point to a structural change point, and the second segment extends from the change point to the end. By design, each segment is continuous on the time scale, and different sets of independent variables explain the trend and volatility. We run the MIX-SPCR model to see whether it can successfully determine the position of the change point using the information criteria. If a change point is correctly selected, we expect the information criteria to be minimized at this change point. Figures 2 and 3 show our results from the MIX-SPCR model. Specifically, they show the sample path of the information criteria at each time point. We note that all the information criteria values are minimized between t = 2800 and t = 3000, a range that covers the actual change point of the time series. As the MIX-SPCR model selects different change points, the penalty terms of AIC and CAIC remain the same because neither the number of model parameters nor the number of observations changes. In this simulation scenario, the fixed penalty term means that AIC and CAIC reflect changes only in the "lack of fit" term of the various models, without considering model complexity. This indicates that using AIC-type criteria, which only count and penalize the number of parameters, may be necessary but not sufficient in model selection.
In contrast, the penalty terms of the information complexity-based criteria, C 1 and C 1F , adjust as different change points are selected; they vary rather than remain fixed.

A Large-Scale Monte Carlo Simulation
Next, we perform a large-scale Monte Carlo simulation to illustrate the MIX-SPCR model's performance in choosing the correct number of segments and the number of latent variables. A priori, in this simulation, we pretend that we do not know the actual structure of the data and use the information criteria to recover the actual construction of the MIX-SPCR model. To achieve this, we follow the above simulation protocol using a different number of time points by varying n = 1000, 2000, 4000. As before, there are twelve explanatory variables drawn from four latent variable models generated from a multivariate Gaussian distribution given in Equation (47). The simulated data again consist of two time-series segments with mixing proportions π 1 = 0.7 and π 2 = 0.3, respectively. For each data generating process, we replicate the simulation one hundred times and record both information complexity-based criteria (ICOMP(IFIM) & ICOMP(IFIM) C 1F ) and classic AIC-type criteria (AIC & CAIC).
In Table 1, we present how many times the MIX-SPCR model selects different models in the one hundred simulations. In this way, we can assess different information criteria by measuring the hit rates.
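The hit rate of a criterion is simply the fraction of replications in which it selects the true (G, k) pair. A minimal sketch follows; the list of selections is made up to match the reported count of 69 correct choices by AIC at n = 1000, and the incorrectly chosen model (2, 3) is hypothetical.

```python
from collections import Counter


def hit_rate(selected_models, true_model):
    """Fraction of replications in which the selected (G, k) equals the true one."""
    counts = Counter(selected_models)
    return counts[true_model] / len(selected_models)


# 100 replications: 69 correct selections and 31 selections of a hypothetical wrong model.
selections = [(2, 4)] * 69 + [(2, 3)] * 31
rate = hit_rate(selections, true_model=(2, 4))
```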
Looking at Table 1, we see that when the sample size n = 1000 (small), AIC selects the correct model (G = 2, k = 4) 69 times, CAIC selects 80 times, ICOMP(IFIM) selects 48 times, and ICOMP(IFIM) C 1F selects 76 times, respectively, in 100 replications of the Monte Carlo simulation. When the sample size is small, ICOMP(IFIM) tends to choose a sparser regression model sensitive to the sample size. However, as the sample size increases, when n = 2000 and n = 4000, ICOMP(IFIM) consistently outperforms other information criteria in terms of hit rates. The percentage of the correctly identified model is above 90%, as reported above. In summary, the large-scale Monte Carlo simulation analysis highlights the performance of the MIX-SPCR model. As the sample size increases, the MIX-SPCR model improves its performance. As shown in Figure 3, the MIX-SPCR model can efficiently determine the structural change point and estimate the mixture proportions when the number of segments is unknown beforehand. Another key finding is that, by using the appropriate information criteria, the MIX-SPCR model can correctly identify the number of segments and the number of latent variables from the data. In other words, our approach can extract the main factors not only from the intercorrelated variables but also classify the data into several clearly defined segments on the time scale.

Description of Data
The financial market often generates a large amount of time-series data, and in most cases, the generated data are high-dimensional. In this paper, we use the S&P 500 index and the stocks of its hundreds of constituent companies, categorized into eleven sectors, as high dimensional time-series data. The index value is the response variable, which aggregates the variations of many companies at each time point. These long time series often consist of different regimes and states. For example, the stock market experienced a boom period from 2017 to 2019, which is a dramatic change compared with the stock market during the 2008 financial crisis. If we analyze each sector or company, some industries perform more actively than others during a particular period.
In this section, we implement the MIX-SPCR model on the adjusted closing price of the S&P 500 (^GSPC) as a case study. We extract the daily adjusted closing prices from the Yahoo Finance database (https://finance.yahoo.com/) that spans the period from 1 January 1999 to 31 December 2019. By removing weekends and holidays, there are n = 5292 tradable days in total. The main focus of this section is to split the time-series into several self-contained segments. Besides, we expect the extracted sparse principal components to explain the variance and volatility in each segment.

Computational Results
To give an overview of how the S&P 500 index values reflect the changes of 506 company stock prices, Figure 5 shows the plot of the normalized values of the adjusted closing prices. We use the MIX-SPCR model with the information criteria to determine the number of segments and the number of sparse principal components. To achieve interpretable results, we limit our search space to a maximum of seven time-series segments and six sparse principal components. Table 2 shows the optimal combination of three self-contained segments and three sparse principal components for each of the segments, selected using the information complexity ICOMP(IFIM). The other three information criteria also choose this combination as the best-fitting model. Figure 6 illustrates the probability and time range of each segment. We can see that the first segment is from 1 January 1999, to 26
We emphasize that many factors may explain the stock market variation, and this paper does not study how socioeconomic events influence the S&P 500 index. However, the segmentation results do raise our interest in the location of the two structural change points. The first change point is October 2007, which is the early stage of the 2008 financial crisis. The second structural change point is December 2016, the transitional period of the U.S. presidential election. Identification of these two change points shows that our proposed method can detect the underlying physical and structural change from the available time-series data. Table 3 lists the estimated coefficients (β g ) from sparse principal component regression. Because all the collected stock prices and S&P 500 index values are standardized before implementing the MIX-SPCR model, we perform dimension reduction, remove the constant term, and carry out regression analysis using the SPCR model. The R 2 values are above 0.8 across all three time segments.
Table 3. SPCR coefficients (β g ) of three different segments.

Interpretation of Computational Results
One may ask a question, "Can the MIX-SPCR model identify the key variables from the hundreds of companies?" If the constructed model is dense, the selected companies would include all the sectors whereby the dense model is limiting the interpretation of the data. Our analysis identifies all the companies with non-zero coefficient values and maps them back to each of the sectors in Tables A1-A3. Each calculated sparse principal component vector consists of around fifty companies, much less than the original data dimension (p = 506). We observe that these selected companies are grouped into a few sectors within different time segments. For example, energy companies load in the first sparse principal component vector from 1999 to 2007 (segment 1) and diminish after that.
To have a detailed analysis of how different sectors perform across three segments, we do the stem plot to show the sparse principal component coefficients P g of four sectors, namely financials, real estate, energy, and information technology (IT). Figures 7-8 indicate a similar behavior that happened in financial and real estate companies. Both sectors play an essential role in the first two time-series segments but have no contribution in the third segment, which is the period after December 2016. Notice that in Figure 9, energy companies act as an essential player before 2016. However, during the recession in 2008, energy company loadings are negated from the first SPC to the second SPC. Compared with other industries, the variation in energy company stock prices does not contribute to the S&P 500 index after 2016.
Another question is "What sector/industry is the main contributing factor after the 2016 United States presidential election?" A possible answer, as shown in Figure 10, lies in the SPC coefficients of information technology companies. From 1999 to the recession in 2008, IT companies load mainly on the second SPC and the third SPC, which do not contribute much to the main variation. After the recession, the variations of IT companies contribute little compared with other sectors. However, after December 2016, companies from the IT industry play an essential role in the primary stock price volatility.

As discussed above, Figures 7-10 provide a clear picture of how different sectors perform (via the coefficients P_g) without considering the effects on the S&P 500 index. A natural further question is how the SPCR coefficients P_g β_g change before and after certain socioeconomic events. We follow the study by Aït-Sahalia and Xiu [35] of how the Federal Reserve's addressing heightened liquidity from 10 March to 14 March 2008 affected the stock market. Aït-Sahalia and Xiu [35] analyzed the S&P 100 index values using traditional PCA and grouped stocks into financial and non-financial categories. Instead of PCA, we apply the SPCR model to the S&P 500 index and analyze how eleven sectors react before and after the Federal Reserve operations. Figure 11 shows that financials, consumer discretionary, real estate, and industrials experienced more significant perturbations than other sectors in terms of the SPCR coefficients P_g β_g.
This conclusion is consistent with the results of Aït-Sahalia and Xiu [35] that the average loadings of the first and second principal components of financial companies are distinct from those of non-financial companies. Moreover, although the raw data contain 506 companies and we use sparse loadings of companies for the comparison, the SPCR model maintains high explanatory power in this high-dimensional case while being more interpretable.
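For reference, the quantity P_g β_g used above can be read as a vector of company-level coefficients. The short derivation below is a sketch under assumed notation that is not taken from the paper's tables: X_g denotes the standardized predictors in segment g, Z_g the component scores, and γ_g is a hypothetical symbol introduced here only for illustration.

```latex
% Assumed notation: X_g (n_g x p) standardized predictors in segment g,
% P_g (p x k) sparse loading matrix, \beta_g (k x 1) regression coefficients.
\begin{aligned}
Z_g &= X_g P_g, \\
\hat{y}_g &= Z_g \beta_g = X_g \,(P_g \beta_g) = X_g \gamma_g,
\qquad \gamma_g := P_g \beta_g \in \mathbb{R}^{p}, \\
\gamma_{g,j} &= \sum_{m=1}^{k} (P_g)_{jm}\, \beta_{g,m}, \qquad j = 1, \dots, p.
\end{aligned}
```

Under this reading, a company whose loadings are zero on all k sparse components has γ_{g,j} = 0, so it drops out of the fitted index in that segment.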

Conclusions and Discussions
In this paper, we presented a novel method to segment high-dimensional time-series data into different clusters or segments using the mixture of sparse principal component regression (MIX-SPCR) model. The MIX-SPCR model considers both the relationships among the predictor variables and how the various predictor variables contribute to the explanatory power for the response variable through the sparsity settings. Information criteria have been introduced and derived for the MIX-SPCR model. We studied the performance of these criteria under different sample sizes and used them to select the best-fitting model.
Our large-scale Monte Carlo simulation exercise showed that the MIX-SPCR model could successfully identify the true structure of the time-series data using the information criteria as the fitness function. In particular, based on our results, the information complexity-based criteria, i.e., ICOMP(IFIM) and ICOMP(IFIM)_{C1F}, outperformed the conventional information criteria, such as the AIC-type criteria, as the data dimension and the sample size increase.
We then applied the MIX-SPCR model to the S&P 500 index data (from 1999 to 2019) to uncover its hidden structure and identified two change points in this data set.
We observe that the first change point coincides with the early stages of the 2008 financial crisis. The second change point is immediately after the 2016 United States presidential election and coincides with the election of President Trump and his transition.
Our findings showed how the S&P 500 index and company stock prices react within each time-series segment. The MIX-SPCR model shows high explanatory power by identifying how different sectors fluctuated before and after the Federal Reserve's addressing heightened liquidity from 10 March to 14 March 2008. Although this is not a traditional event study paper, it is the first paper to use the sparse principal component regression model with mixture models in time-series analysis. The proposed MIX-SPCR model opens the way to more interpretable results on how macroeconomic factors and events influence stock prices over time. In a separate paper, we plan to incorporate the event study into the MIX-SPCR model.

This paper's time segmentation model builds on time-series data, constructs likelihood functions, and performs parameter estimation by introducing error information unique to each period. Researchers have recently realized that environmental background noise can positively affect model building and analysis under certain circumstances [36-42]. For example, Azpeitia and Wagner [40] highlighted that the introduction of noise is necessary to obtain information about the system. In our next study, we would like to explore this positive effect of environmental noise further and use it to build better statistical models for analyzing high-dimensional time-series data.

Acknowledgments: The first author expresses his gratitude to Bozdogan for bringing this challenging problem to his attention as part of his doctoral thesis chapter and for spending valuable time with him, which resulted in this joint work. We also express our thanks to Ejaz Ahmed for inviting us to contribute to the Special Issue of Entropy. We extend our thanks and gratitude to the anonymous reviewers, whose constructive comments further improved the paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: