Tri-Partition Alphabet-Based State Prediction for Multivariate Time-Series

Abstract: Predicting multivariate time-series (MTS) has recently attracted much attention as a way to obtain richer semantics with similar or better performance. In this paper, we propose a tri-partition alphabet-based state (tri-state) prediction method for symbolic MTSs. First, for each variable, the set of all symbols, i.e., the alphabet, is divided into strong, medium, and weak regions using two user-specified thresholds. With the tri-partitioned alphabet, the tri-state takes the form of a matrix. One dimension contains all the variables. The other is a feature vector that includes the most likely occurring strong, medium, and weak symbols. Second, a tri-partition strategy based on the deviation degree is proposed. We introduce the piecewise and symbolic aggregate approximation techniques to polymerize and discretize the original MTS. This way, the stronger a symbol, the larger its deviation. Moreover, most popular numerical or symbolic similarity or distance metrics can be combined. Third, we propose an along-across similarity model to obtain the k-nearest matrix neighbors. This model considers the associations among the time stamps and variables simultaneously. Fourth, we design two post-filling strategies to obtain a completed tri-state. The experimental results on datasets from four domains show that (1) the tri-state has greater recall but lower precision; (2) the two post-filling strategies can slightly improve the recall; and (3) the along-across similarity model composed of the Triangle and Jaccard metrics is first recommended for new datasets.

The trisecting-acting-outcome (TAO) model [28] of thinking in threes [29] to understand and process a whole via three distinct and related parts [30] has inspired many novel and significant theories and applications. Recently, theories such as three-way formal concept analysis [31] and three-way cognition computing [32,33] have focused on concept learning via multi-granularity from the viewpoint of cognition. The three-way fuzzy sets method [34], three-way decisions space [35], sequential three-way decisions [36], and generalized three-way decision models [37][38][39] have been proposed. Moreover, applications include the three-way recommender system [40], three-way active learning [41], three-way clustering [42], tri-partition neighborhood covering reduction [43], three-way spam filtering [44], three-way face recognition [45], and the tri-alphabet-based sequence pattern [46]. However, the extension of TAO to MTS prediction needs to be studied in depth.
In this paper, a tri-partition alphabet-based state (tri-state) prediction method for symbolic multivariate time-series (MTS) is proposed. First, with the symbolic aggregate approximation (SAX) [47] technique, g symbols are generated from the piecewise aggregate approximation (PAA) [13] version of the MTS and the hypothesis of a probability distribution function. Moreover, the most common standard normal distribution, i.e., N(0, 1), is used here. Hence, the g − 1 breakpoints can be obtained by partitioning the area under N(0, 1) into g equal parts. As these breakpoints also provide the deviation degree far from the expectation, the two thresholds α and β (α ≥ β > 0) can be specified from them. Hence, if the absolute value of a breakpoint is not less than α, the symbol is called a strong element. If the absolute value of a breakpoint is less than β, the symbol is called a weak element. Otherwise, the symbol is called a medium element. This way, for each variable of the given MTS, its alphabet, i.e., the set of symbols, is partitioned into the strong, medium, and weak regions.
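As a concrete illustration, the breakpoint computation and the deviation-degree tri-partition described above can be sketched as follows. The function names and the thresholds α = 1.5 and β = 0.6 are illustrative choices, not values fixed by the paper, and taking the outer (larger-magnitude) breakpoint of each symbol's interval as its deviation is one consistent reading of the rule that γ_g always falls in the strong region:

```python
from statistics import NormalDist

def sax_breakpoints(g):
    """g - 1 breakpoints splitting the area under N(0, 1) into g equal parts."""
    nd = NormalDist(0.0, 1.0)
    return [nd.inv_cdf(j / g) for j in range(1, g)]

def tri_partition(alphabet, breakpoints, alpha, beta):
    """Split symbols into strong/medium/weak regions by breakpoint deviation."""
    # Pad with -inf/+inf so each symbol gamma_j owns interval [delta_{j-1}, delta_j).
    bounds = [float("-inf")] + list(breakpoints) + [float("inf")]
    strong, medium, weak = set(), set(), set()
    for j, sym in enumerate(alphabet):
        # Deviation degree: the larger absolute breakpoint of the symbol's interval.
        dev = max(abs(bounds[j]), abs(bounds[j + 1]))
        if dev >= alpha:
            strong.add(sym)
        elif dev < beta:
            weak.add(sym)
        else:
            medium.add(sym)
    return strong, medium, weak
```

With g = 7 and these thresholds, the partition reproduces Γ = {a, g}, Λ = {b, f}, and Ω = {c, d, e} from the NO2 example later in the paper.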
Second, on the basis of the tri-partitioned alphabet, the predicted tri-state hence takes the form of a matrix with the size 3 × n (n is the number of variables). For each variable, we simultaneously predict the most likely occurring symbol from each of the strong, medium, and weak regions. The state defined by the existing work only contains one case, while the tri-state includes up to 3^n cases. Note that our method does not take the top three most likely occurring symbols as the prediction result because the deviation degree can provide some new orthogonal information. This way, the outliers are more noticeable for users.
Third, an along-across similarity model to generate the k-nearest matrix neighbors (kNMN) is presented. The along similarity considers the associations of the time stamps. The across similarity focuses on the relation between the variables. Additionally, with the PAA- and SAX-MTSs, the most popular numerical or symbolic metrics can be combined regardless of whether they are similarities or distances. Given a sliding window w, the PAA- and SAX-MTSs can be transformed into m − w + 1 temporal subsequences, called instances, where m is the number of time stamps; all instances are matrices with the shape w × n. Moreover, the latest state following each instance is denoted as the decision information, called the label. With the optimal k labels from the m − w labeled instances, the tri-state can be finally predicted using the traditional voting strategy.
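A minimal sketch of how the sliding-window instances and their labels could be extracted, assuming the MTS is given as a list of per-time-stamp states; the names are illustrative:

```python
def make_instances(mts, w):
    """mts: a list of m states (one per time stamp), each a length-n tuple
    of symbols. Returns (instance, label) pairs, where an instance is a
    w x n window and its label is the state immediately following it."""
    pairs = []
    for i in range(len(mts) - w):
        window = mts[i:i + w]   # matrix instance of shape w x n
        label = mts[i + w]      # the decision information (label)
        pairs.append((window, label))
    return pairs
```

The last window, ending at t_m, has no following label; it is the query whose label is to be predicted, so only m − w labeled pairs result.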
Fourth, two post-filling strategies, called the individual and related ones, are designed to fill the possibly missing symbols of each variable. The reason why the tri-state may be incomplete is that no strong, medium, or weak symbols occur after all matrix instances. For brevity, given a tri-state, we assume that the strong symbol of its i-th variable (a_i) is missing. The individual filling strategy (IFS) directly scans the history data of a_i to obtain the most frequently occurring strong symbol. The related filling strategy (RFS) considers the associations between a_i and the other n − 1 variables. The variable that is most linearly related to a_i serves as its condition.
The main contributions of this paper are presented as follows:
• Tri-state. It provides three kinds of symbols for each variable simultaneously. The proposed deviation degree-based alphabet tri-partition strategy makes the outliers more noticeable for experts. Moreover, the IFS and RFS are designed to obtain a completed tri-state.
• Along-across similarity model. The similarities between time stamps and variables are considered simultaneously. This model provides a framework for the integration of the popular similarity or distance metrics.
• Combination of the popular numerical or symbolic metrics. The PAA- and SAX-MTSs are simultaneously used in the above similarity model. The PAA-MTS is available for the numerical metrics, while the SAX-MTS fits the symbolic ones.
The experimental results undertaken on four real-world datasets show that (1) in terms of precision, the states are 30% to 50% higher than the three kinds of tri-states, while for the recall, the three kinds of tri-states are 10% higher than the state; (2) the IFS and RFS can slightly improve the recall by approximately 1%; and (3) the along-across similarity model composed of the Triangle and Jaccard metrics is first recommended for new datasets. Note that the IFS and RFS are necessary only if the tri-state is incomplete. In other words, when the obtained tri-state is already complete, no difference is found among the three kinds of tri-states.
The rest of this paper is organized as follows. Section 2 reviews the existing work on time-series prediction. Section 3 presents the fundamental definitions of the tri-state. Section 4 proposes the algorithm for tri-state prediction. Section 5 discusses the performance of the prediction algorithm on four real-world datasets. Section 6 lists the conclusions and future work of this paper.
Related Work

Among the deep learning-based approaches, a forecasting model based on a convolutional neural network and LightGBM was constructed by Ju [14] to address the volatility problem of wind power. Ma et al. proposed a deep learning-based method, namely a transferred bidirectional long short-term memory model, for air-quality prediction [19]. Weytjens et al. predicted accounts receivable cash flows by employing methods applicable to companies with many customers and many transactions [22].
In terms of the matrix or tensor decomposition-based approaches, Shi et al. proposed a strategy that combines low-rank Tucker decomposition into a unified framework [48]. Ma et al. proposed a deep spatial-temporal tensor factorization framework, which provides a general design for high-dimensional time-series forecasting [49]. To model the inherent rhythms and seasonality of time-series as global patterns, Chen et al. [50] proposed a low-rank autoregressive tensor completion framework to model multivariate time-series data. To generalize the effect of distance and reachability, Wu et al. [51] developed an inductive graph neural network kriging model to recover data for unsampled sensors on a network graph structure.
For the kNN-based approaches, Zhang et al. [15] proposed a new two-stage methodology that combines the ensemble empirical mode decomposition with a multidimensional kNN model in order to simultaneously forecast the closing price and high price of stocks. Xu et al. [17] proposed an algorithm based on the kernel kNN to predict road traffic states in time-series. Yin et al. [18] proposed a multivariate predicting method and discussed the prediction performance on MTS by comparing it with the univariate time-series and kNN nonparametric regression models. Martinez et al. [21] devised an automatic tool, i.e., one that works without human intervention, designed to be both effective and efficient; the tool can be applied to accurately forecast many time series.
Other techniques were also used for MTS prediction. To handle multivariate long nonstationary time-series, Shen et al. [16] proposed a fast prediction model based on a combination of an elastic net and a higher-order fuzzy cognitive map. Chen et al. [25] proposed a weighted least squares support vector machine-based approach for univariate and multivariate time-series forecasting. To predict future outbreaks of methicillin-resistant Staphylococcus aureus, Jimenez et al. [26] proposed the use of artificial intelligence, specifically time-series forecasting techniques. The orthogonal decision tree may fail to capture the geometrical structure of data samples, so Qiu et al. [27] attempted to study oblique random forests in the context of time-series forecasting.

Models and Problem Statement
In this section, we first introduce the definitions of the original multivariate time-series (MTS) and its piecewise aggregate approximation (PAA) and symbolic aggregate approximation (SAX) versions. Second, we propose an along-across similarity model and the problem of state prediction. Third, we define the strategy of alphabet tri-partition and the problem of tri-partition alphabet-based state prediction. The notations are introduced in Table 1.

Notations Descriptions

m: The number of all time stamps, |T|.
n: The number of all variables, |A|.
g: The number of partitions; ∀a ∈ A, |V_a| = g.
D: The set of breakpoints for S.
Γ: The strong region.
Λ: The medium region.
Ω: The weak region.
Σ = (Γ, Λ, Ω): Tri-partition alphabet.
β ≥ 0: The threshold for the weak region.
α ≥ β: The threshold for the strong region.
f_{i,*}: A symbolic state occurring at time t_i.
f̄_{i,*}: A numerical state occurring at time t_i.
p_{m+1,*}: A prediction of the state occurring at time t_{m+1}.
w: The length of the sliding window.
Δ: The similarity of two matrix instances.
k: The number of nearest matrix neighbors.
N: The set of k-nearest matrix neighbors.
P_{m+1,*}: The form of the tri-state with size 3 × n.

Data Model
The PAA and SAX versions of an MTS are defined on the basis of the original numerical MTS.

Definition 1. An original numerical MTS is the quadruple S′ = (T, A, V′ = ∪_{a∈A} V′_a, f′), where T = {t_1, t_2, . . . , t_M} is the finite set of time points, A = {a_1, a_2, . . . , a_n} is the finite set of variables, V′_a ⊂ R is the value range of variable a, and f′ : T × A → V′ is the mapping function. For brevity, f′(t_i, a_j) can be denoted by f′_{i,j}. We further assume that each variable has been z-normalized, as required by SAX.

Definition 2.
The PAA-MTS S̄ = (T, A, V̄ = ∪_{a∈A} V̄_a, f̄) has a similar form to Definition 1. However, two differences are present, namely (i) the M original time points are polymerized into m equal-width segments (m < M), and (ii) each value f̄(t_i, a) is the mean of the original values within the i-th segment.

Example 1. Figure 1 shows an example of the transition of NO2 from the original numerical MTS (S′) to the PAA version of the MTS (S̄). Here, m = 10 and M = 100. This way, the dimension is reduced from 100 to 10.

Definition 3. The SAX-MTS S = (T, A, V = ∪_{a∈A} V_a, f) has a similar form to the PAA-MTS. The only difference is that the numerical value is transformed into a symbolic one. To produce symbols with equiprobability, a set of breakpoints D = {δ_1, δ_2, . . . , δ_{g−1}} dividing the area under the probability distribution function (PDF) of a ∈ A is required. Therefore, let V_a = {γ_1, γ_2, . . . , γ_g} contain g symbols; then, we have:

f(t_i, a) = γ_j, if δ_{j−1} ≤ f̄(t_i, a) < δ_j,

where j ∈ [1, g], and δ_0 and δ_g are defined as −∞ and +∞, respectively.
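The PAA averaging and the breakpoint-based symbolization can be sketched as follows; this is a simplified single-variable version, assuming the series is already z-normalized and that m divides M evenly:

```python
import bisect
from statistics import NormalDist, mean

def paa(series, m):
    """Reduce a length-M numerical series to m segment means."""
    seg = len(series) // m
    return [mean(series[i * seg:(i + 1) * seg]) for i in range(m)]

def sax(paa_values, g, alphabet="abcdefg"):
    """Map each PAA value to the symbol of its equiprobable N(0, 1) bin."""
    nd = NormalDist()
    breakpoints = [nd.inv_cdf(j / g) for j in range(1, g)]  # delta_1..delta_{g-1}
    return [alphabet[bisect.bisect_left(breakpoints, v)] for v in paa_values]
```

For example, paa(list(range(100)), 10) reduces 100 points to 10 segment means, matching the M = 100, m = 10 reduction of Example 1.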
Example 2. Table 2 shows a lookup table of breakpoints for the N(0, 1) distribution. In practice, g can be set as an integer that is not less than 2. Notably, g = 2 means that D = {0}.

Example 4. Table 3 shows an example of an SAX-MTS with three variables (i.e., A = {SO2 (a_1), NO2 (a_2), PM2.5 (a_3)}) and 10 time stamps (i.e., T = {t_1, t_2, . . . , t_10}). For variable SO2, symbols a and f are missing. For variable NO2, symbols b and e are missing. For variable PM2.5, symbols e, d, and e are missing. This phenomenon is temporary and disappears once the data are large enough.

State
First, a formal description of the state is introduced as follows. Additionally, we describe the SAX-MTS state, which is the type of prediction result used in this work.
Definition 4. Given an SAX-MTS S = (T, A, V, f), the vector f_{i,*} = (f_{i,1}, f_{i,2}, . . . , f_{i,n}) is called a state of the SAX-MTS at time t_i. Moreover, the state of the PAA-MTS is formally similar to this one.

Example 5.
With Table 3, f_{10,*} = (e, g, f) is called a state of the SAX-MTS at time t_10.

Second, the state f_{i,*} is denoted as a known label. This way, the corresponding instance of f_{i,*} is defined as follows.

Definition 5.
Given an SAX-MTS S = (T, A, V, f) and a sliding window w < m, an instance with the matrix form is the w × n matrix O_i = (f_{i−w+1,*}; f_{i−w+2,*}; . . . ; f_{i,*}), i.e., the w consecutive states ending at time t_i.

Example 6. With Table 3, let w = 2 and i = 9; then, O_9 is composed of the two states f_{8,*} and f_{9,*}.

This way, the set of all instances can be denoted by SP = {O_w, O_{w+1}, . . . , O_m}, where |SP| = m − w + 1.

Example 7. With Table 3, let w = 2; then, |SP| = 10 − 2 + 1 = 9.

Third, the k-nearest matrix neighbors (kNMN) of the latest instance O_m are the k instances of SP \ {O_m} with the largest similarity Δ to O_m, where Δ is the along-across similarity of the given matrix pair.

Note that the neighborhood N for O_m may not be unique, as some other matrices may have the same similarity with O_m.
Fourth, the along-across similarity model ∆ is proposed to obtain the neighborhood N by merging the popular similarity and distance metrics.

Definition 7.
Given the PAA-MTS S̄ = (T, A, V̄, f̄), the SAX-MTS S = (T, A, V, f), and the sliding window size w, the similarity between the two matrix-based instances O_i and O_j is obtained by aggregating the row vector similarity (Equation (9)) and the column vector similarity (Equation (10)), both of which are built from a vector metric (Equation (11)). Note that a row or column vector h in Equation (11) is indeed one of f̄ and f, corresponding to the PAA- and SAX-MTSs, respectively. Moreover, the data types of the vectors in Equations (9) and (10) are coincident. In other words, the pairs of vectors in Equations (9) and (10) are either both PAA-MTS or both SAX-MTS; the case where h_{i*,l} is PAA-MTS while h_{j*,l} is SAX-MTS is not permitted.

Table 4 presents the availability of similarities and distances for Equation (11). Two things need to be further explained. One is the availability of the metrics. Given any two indices r and c (r, c ∈ IDs), PAA(r) = True or PAA(c) = True indicates that the r-th or the c-th metric fits the numerical data. Similarly, SAX(r) = True or SAX(c) = True means that the r-th or the c-th metric fits the symbolic data. For example, PAA(0) = True indicates that the Euclidean distance fits the PAA-MTS but not the SAX-MTS.

The other is the transformation from distance to similarity. As similarity and distance metrics are simultaneously used here, each distance d between two vectors h_i and h_j needs to be transformed into a similarity. This way, 100 combinations of distances and similarities exist. Their performances are discussed in Section 5. More specifically, the Jaccard similarity between the row vectors (g, f, f) and (g, f, g) is 2/3.

Fifth, given a future time stamp (e.g., t_11), the state (e.g., f_{11,*}) at this time is unknown. Formally, the state occurring at time t_{m+1} is denoted as p_{m+1,*} = (p_{m+1,1}, p_{m+1,2}, . . . , p_{m+1,n}).
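One way to realize the along-across combination is sketched below, assuming, as an illustration rather than the paper's exact Equations (8)-(11), that the instance similarity is the average of the mean row similarity and the mean column similarity, with the common 1/(1 + d) transform turning a distance into a similarity. Here Euclidean rows are computed on the PAA data and Jaccard columns on the SAX data:

```python
import math
from statistics import mean

def to_similarity(d):
    """A common distance-to-similarity transform: maps [0, inf) to (0, 1]."""
    return 1.0 / (1.0 + d)

def euclidean_row_sim(u, v):
    return to_similarity(math.dist(u, v))

def jaccard_col_sim(u, v):
    a, b = set(u), set(v)
    return len(a & b) / len(a | b)

def along_across_sim(paa_i, paa_j, sax_i, sax_j):
    """Average the along (row/time) and across (column/variable) similarities
    of two w x n instances; paa_* are numerical, sax_* are symbolic."""
    w, n = len(paa_i), len(paa_i[0])
    along = mean(euclidean_row_sim(paa_i[l], paa_j[l]) for l in range(w))
    across = mean(jaccard_col_sim([sax_i[l][c] for l in range(w)],
                                  [sax_j[l][c] for l in range(w)])
                  for c in range(n))
    return (along + across) / 2
```

Swapping euclidean_row_sim or jaccard_col_sim for any metric pair (r, c) from Table 4 yields the 100 combinations discussed in Section 5.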
To obtain the components of p_{m+1,*} with the kNN-like method, the instances, neighbors, and labels were defined above. Therefore, the label of O_m, i.e., p_{m+1,*}, can be predicted with the following voting strategy:

p_{m+1,i} = arg max_{γ ∈ V_{a_i}} Σ_{O ∈ N} I(f_{j+1,i} = γ),

where t_j is the last time stamp of neighbor O, and I(·) = 1 if the condition (·) is True; otherwise, I(·) = 0.

Second, with the nearest neighbors, the states/labels after them can be obtained. Namely, f_{6,*} = (g, f, g), f_{5,*} = (g, f, f), f_{8,*} = (c, d, c), and f_{7,*} = (e, f, f).
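The voting step itself is small; a sketch assuming each neighbor's label is a length-n tuple of symbols (ties fall to the symbol counted first):

```python
from collections import Counter

def vote(labels):
    """Predict p_{m+1,*}: for each variable, the most frequent symbol
    among the k neighbor labels."""
    n = len(labels[0])
    return tuple(Counter(lab[i] for lab in labels).most_common(1)[0][0]
                 for i in range(n))
```

With the labels (g, f, g), (g, f, f), and (c, d, c) above, the vote yields g and f for the first two variables.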
Sixth, the prediction performance is better, in general, when there is less difference between p_{m+1,*} and f_{m+1,*}. The measures of prediction performance, such as the precision and recall, are introduced here. ∀i ∈ [1, n], a component p_{m+1,i} is a hit if p_{m+1,i} = f_{m+1,i}; the precision and recall of the state f_{m+1,*} have the same form, namely the fraction of hit components over the n variables.

Finally, with the above definitions, the problem of state prediction is proposed (Problem 1). Although two types of datasets, i.e., the PAA- and SAX-MTSs, are both used here, the space complexity remains the same. The time complexity is closely related to the size of the matrix instance and the similarity metrics for vectors. With Table 4, let r = 8 and c = 2; given the PAA-MTS S̄, the SAX-MTS S, and the sliding window w, the time complexities of the row and column vector similarities between two matrix instances are both Θ(wn). Moreover, the size of SP is m − w + 1; hence, the time complexity of our method is Θ(wn(m − w + 1)) = Θ(mn).
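One plausible reading of why a tri-state trades precision for recall can be made concrete: recall counts a variable as covered when its true symbol appears among the candidates, while precision divides the same hits by the total number of predicted symbols (up to 3n for a tri-state versus n for a state). The helpers below are illustrative, not the paper's exact formulas:

```python
def tri_recall(tri_pred, truth):
    """tri_pred: per-variable candidate lists (None marks a missing symbol);
    truth: the actual state. Fraction of true symbols that are covered."""
    return sum(t in col for col, t in zip(tri_pred, truth)) / len(truth)

def tri_precision(tri_pred, truth):
    """Hits divided by the number of non-missing predicted symbols."""
    hits = sum(t in col for col, t in zip(tri_pred, truth))
    total = sum(sum(s is not None for s in col) for col in tri_pred)
    return hits / total if total else 0.0
```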

Tri-State
To enrich the semantics of predictions, we extend each component of p_{m+1,*} to a column vector of length 3. Different components of each vector have various semantics. This way, the form of the prediction is changed from a 1 × n vector into a 3 × n matrix.
First, we introduce the definition of the tri-partition alphabet as follows.

Definition 9.
Given an SAX-MTS S = (T, A, V, f), ∀a ∈ A, Σ_a = (Γ_a, Λ_a, Ω_a) is called a tri-partition alphabet of a if Γ_a ∪ Λ_a ∪ Ω_a = V_a and the three regions are pairwise disjoint. Additionally, we call Γ_a, Λ_a, and Ω_a the strong, medium, and weak regions of attribute a ∈ A, respectively.

Example 10. With Table 3, the range of values for variable NO2 (a_2) is {a, b, c, d, e, f, g}. Let Γ_{a_2} = {a, g}, Λ_{a_2} = {b, f}, and Ω_{a_2} = {c, d, e}; then, Σ_{a_2} is called a tri-partition alphabet of NO2.

With the tri-partitioned alphabet, the prediction for each variable a_i, i ∈ [1, n], becomes a column vector (p_i^Γ, p_i^Λ, p_i^Ω). Moreover, this predicted vector can be interpreted as the most probable symbols from the strong, medium, and weak regions, respectively. Note that the three-way state is useless for historical data, whose states are already known.

Therefore, we present the voting strategy for the three-way state prediction as follows. Given ∀i ∈ [1, n], the symbol with the maximal frequency among the k neighbor labels is selected from each of Γ_{a_i}, Λ_{a_i}, and Ω_{a_i}.

Practically, the regions Γ_a, Λ_a, and Ω_a can be obtained using various partition strategies and have meaningful explanations. Here, we partition the range of symbolic values for each attribute using the following strategy. Based on Equations (2) and (3), a symbol is assigned to the strong region if the absolute value of its breakpoint is not less than α, to the weak region if this value is less than β, and to the medium region otherwise. The combination of the PAA-MTS S̄ = (T, A, V̄, f̄) and the SAX-MTS S = (T, A, V, f) is first used here. The breakpoint δ_g is +∞, and δ_g > α always holds. Hence, γ_g always belongs to Γ_{a_i}.
However, the predicted tri-state is incomplete if no strong, medium, or weak symbols are found following the whole matrix neighbors. Namely, what the current method can guarantee is that each variable has at least one predicted symbol. Formally, given a tri-state P_{m+1,*} at t_{m+1}, ∀i ∈ [1, n], at least one of p_i^Γ, p_i^Λ, and p_i^Ω is not φ.

Example 12. With Table 3, note that φ means the symbol of the current position is temporarily unknown. More specifically, the strong symbol of a_2 and the medium symbol of a_1 are unknown. Here, "c / e" indicates that the final predicted symbol was randomly selected from them. For brevity, the symbol c was selected.
Moreover, the precision for an incomplete tri-state is calculated over the predicted (non-missing) symbols only. In order to remedy this defect, i.e., to obtain a completed tri-state, we propose two simple and effective filling strategies, called the individual and related ones, respectively.
For each attribute, if one or two symbols are missing, the individual filling strategy (IFS) predicts them with the most frequent ones in its own history data. Then, ∀i ∈ [1, n], the IFS selects the historical symbol of a_i with the maximal frequency, IFS-Count, within the missing region.

Example 13. According to Example 12 and Table 3, for variable a_1, p_1^Λ = b. This is because IFS-Count(b) = 3/10 > IFS-Count(f) = 0. For variable a_2, p_2^Γ = a. This is because IFS-Count(a) = 2/10 > IFS-Count(g) = 1/10. Hence, the tri-state can be filled by the IFS.

The related filling strategy (RFS) predicts the missing symbols by considering the association relationships between any pair of variables. Given two variables a_i and a_j (i, j ∈ [1, n], i ≠ j), a_j is the most linearly related variable of a_i, namely a_j = arg max_{a_j ∈ A\{a_i}} Pearson(a_i, a_j). The RFS then counts, over the history data, the symbols of a_i that co-occur with the predicted symbols of a_j, and selects the most frequent one from the missing region.

Example 14. Based on Example 12 and Table 3, the Pearson correlations among the three variables are as follows: Pearson(a_1, a_2) = 0.892, Pearson(a_1, a_3) = 0.919, and Pearson(a_2, a_3) = 0.839. Hence, for variable a_1, the most related one is a_3. Then, when (a_3, g) happens, the co-occurring symbol set of a_1 is {g}; when (a_3, f) happens, it is {g, e, e}; and when (a_3, c) happens, it is {c}. No medium symbol for p_1^Λ is available by the RFS; therefore, the result is b, which is predicted using the IFS.

Then, for variable a_2, the most related one is a_1. When (a_1, g) happens, the co-occurring symbol set of a_2 is {f, f}; when (a_1, b) happens, it is {a, a, c}; and when (a_1, c) happens, it is {d, c}. Therefore, the result of p_2^Γ is a.
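The two filling strategies reduce to simple frequency counts over the history data. The sketch below is illustrative: the Pearson-based choice of the related variable is assumed to have been made already, and the helper names are not the paper's notation:

```python
from collections import Counter

def ifs_fill(history, region):
    """IFS: the most frequent historical symbol of this variable that lies
    inside the missing region (strong, medium, or weak)."""
    counts = Counter(s for s in history if s in region)
    return counts.most_common(1)[0][0] if counts else None

def rfs_fill(history_i, history_j, cond_symbol, region):
    """RFS: among the time stamps where the most related variable a_j took
    its predicted symbol, count the co-occurring symbols of a_i; fall back
    to the IFS when the region yields no co-occurring symbol."""
    co = Counter(si for si, sj in zip(history_i, history_j)
                 if sj == cond_symbol and si in region)
    return co.most_common(1)[0][0] if co else ifs_fill(history_i, region)
```

The fallback in rfs_fill mirrors Example 14, where the missing medium symbol of a_1 cannot be filled by the RFS and is taken from the IFS instead.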
Finally, with all of the above definitions, we can define the problem of three-way state prediction as follows. Compared with Problem 1, Problem 2 has two more parameters, α and β. The first process, which generates Σ, is required, but it has a polynomial time complexity Θ(mn). The output is a matrix P with size 3 × n. Hence, we can obtain the three most likely occurring symbols from the strong, medium, and weak regions, respectively. Note that Problem 1 obtains one predicted state at once, while Problem 2 can obtain up to 3^n possible states. Notably, the time and space complexities of the two problems remain the same.

Algorithms
In this section, the framework of the three-way state prediction algorithm with k-nearest matrix neighbors (kNMN-3WSP) is shown in Figure 3. Three stages, namely kNMN construction, alphabet tri-partition, and three-way state prediction, are proposed. Note that the datasets, i.e., the PAA-MTS S̄ and the SAX-MTS S, are the inputs of all stages. In Stages II and III, S̄ and S are omitted for brevity.

First, the row and column metrics are specified by their indices r and c in Table 4. In other words, we have r, c ∈ {0, 1, . . . , 9}. Moreover, if r = 0, the similarity between two row vectors is measured using the Euclidean distance. If c = 7, the similarity between two column vectors is measured using the Triangle similarity. Second, the cardinalities of O_m and all elements in SP are w × n, |N| = k, and m = |T|. Third, the availability of PAA and SAX is the key to integrating S̄ and S. They are mutually exclusive.

Stage II
Algorithm 3 describes the details of Stage II. First, the variable g is specified to generate the SAX version of the MTS. In other words, g ≥ 2 is the number of symbols for each attribute. Second, if α = β, Λ is ∅. When g = 2, no choice other than α = β is available. Finally, the time complexity of this stage is only Θ(ng).

Stage III
Algorithm 4 discusses the details of Stage III. First, f_{j+1,i} is the label of attribute a_i. The predicted symbol is the one with the maximal frequency. Second, the purpose of using the index to count is to improve the efficiency of this algorithm. In Line 6, Count(·) is a mapping function for the count matrix, whose size is g × 2. The l-th position stores the frequency of γ_l (l ∈ [1, g]). Generally, the matrix is denoted by M = ((Count(1), 1), (Count(2), 2), . . . , (Count(g), g)). For example, with Table 3, the count matrix ((3, 0), (2, 1), (4, 2)) is transformed into ((4, 2), (3, 0), (2, 1)). In Lines 10-21, the algorithm searches for the symbol with the biggest count from each of the strong, medium, and weak regions. There is no need to continue searching once all three symbols of the current variable are known. The time complexity of this stage is Θ(mn).
Finally, the RFS considers more information than the IFS, but their time and space complexities are the same, namely Θ(nm). This way, we can obtain four kinds of states, called the state, tri-state, IFS-based tri-state (IFS-tri-state), and RFS-based tri-state (RFS-tri-state).

4:  for (each neighbor O ∈ N) do
5:      Get its last time stamp, denoted by t_j;
6:      Obtain the index of f_{j+1,i} in V_{a_i}, denoted by l;
...
9:      Obtain M by listing M in the descending order of Count(·);
10:     for (j ∈ [1, g]) do
11:         Let l = M_{j,2};
12:         if (γ_l ∈ Γ and p_i^Γ = φ) then
13:             p_i^Γ = γ_l;
14:         else if (γ_l ∈ Λ and p_i^Λ = φ) then
15:             p_i^Λ = γ_l;
16:         else if (γ_l ∈ Ω and p_i^Ω = φ) then
17:             p_i^Ω = γ_l;
18:         else
19:             break;
20:         end if
21:     end for
22: end for
23: return P_{3×n};
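Lines 10-21 above can be sketched as a region-aware selection over the descending counts, with φ represented by None; the early exit mirrors the remark that there is no need to continue searching once all three symbols are known:

```python
from collections import Counter

def tri_vote(labels_i, strong, medium, weak):
    """labels_i: the symbols of one variable a_i over the k neighbor labels.
    Returns the (strong, medium, weak) entries of the tri-state column."""
    p = {"strong": None, "medium": None, "weak": None}
    # Counts listed in descending order, as in Line 9 of the listing.
    for sym, _ in Counter(labels_i).most_common():
        if sym in strong and p["strong"] is None:
            p["strong"] = sym
        elif sym in medium and p["medium"] is None:
            p["medium"] = sym
        elif sym in weak and p["weak"] is None:
            p["weak"] = sym
        if None not in p.values():
            break  # all three symbols of this variable are known
    return p
```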

Experiments
We discuss the following issues using experiments:
• The prediction performance of our along-across similarity model;
• The stability of the similarity metrics combination.

Dataset and Experiment Settings
Experiments are undertaken on four datasets from four different domains, i.e., the environmental, financial, health, and industrial domains. The most important information on these datasets is listed in Table 6, including the Stocks dataset (II: 4300 time stamps, 12 variables, finance), the IPES dataset (III: 33,001 time stamps, 11 variables, health), and the CACS dataset (IV: 88,840 time stamps, 37 variables, industry).

With Table 4, 10 × 10 = 100 combinations need to be discussed. The test set consists of the last 20% of the above MTSs. However, the training set is dynamic at different time points within the testing set. Generally, for each time point i ∈ [⌈80%m⌉, m], the training set contains the whole records within the time range [1, i − 1]. In other words, 80% is the smallest training set ratio, obtained when the time point i is ⌈80%m⌉.

Figures 4 and 5 show the precision, recall, and F1-measure of the four kinds of states on the four datasets' test sets. Commonly, the form (r, c), r, c ∈ [0, 9], indicates the indices of the row and column metrics, respectively. For example, (3, 8) means that the row metric is Levenshtein and the column metric is Jaccard. Second, with increasing k, the precisions of the state, tri-state, IFS-based tri-state, and RFS-based tri-state decrease. Third, the precision of the state is better than that of the others. Moreover, the precision of the tri-state is slightly better than that of the IFS- and RFS-based ones. The precisions of the IFS- and RFS-based tri-states are almost consistent. This is because the three kinds of tri-states provide two additional symbols for each variable. However, the tri-state may be incomplete, while the IFS- and RFS-based ones are complete. Therefore, the precision of the tri-state is between that of the state and the IFS- and RFS-based tri-states. This can be observed in Figure 4b,c.

In Figure 5, the recalls of the three kinds of tri-states are better than that of the state. Moreover, the recalls of the IFS- and RFS-based tri-states are the highest.
Similarly, the recall of the tri-state is also between that of the state and the IFS- and RFS-based tri-states. Interestingly, the recalls of the IFS- and RFS-based tri-states on Stocks (Dataset II) can reach 95% and 93%, respectively. Compared with the state, the three kinds of tri-states have better recall but worse precision. Although the improvement of the IFS- and RFS-based tri-states is not significant compared with the tri-state, more information can be provided. In most cases, k = 1 is the first choice for precision and recall.

Stability
Tables 7 and 8 list the top 10 metric combinations for precision and recall with the four kinds of states on the four datasets' test sets. We can observe that some metric combinations are repeated. Hence, these combinations are considered more stable, occurring with higher frequency/probability in different datasets. For stronger discrimination, we additionally introduce a weighted ranking strategy for each metric combination.
With the above observations, the eighth metric, i.e., the Jaccard similarity, is the most frequently used, followed by the second one, i.e., the Manhattan distance.

Conclusions
In this paper, a new tri-state and its prediction problem were defined on multivariate time-series (MTS). The most likely occurring strong, medium, and weak symbols can be obtained with the tri-state. Second, a deviation degree-based tri-partition strategy and its algorithm were designed. For each variable, a symbol is stronger when it deviates further from the average value. Third, the along-across similarity model was proposed to capture the temporal and inter-variable association relationships. Fourth, the integration of the PAA and SAX versions of the MTS can combine numerical or symbolic similarities or distances. Finally, when a new dataset is introduced, the first choices in the parameter settings are k = 1 (the neighborhood size), r = 1 (the Jaccard), and c = 8 (the Manhattan).
The following research topics deserve further investigation: • More alphabet tri-partition strategies; • More tri-state completion strategies; • Adaptive learning of the parameters by cost-sensitive learning; and • More intelligent metrics combination strategies, e.g., integrated learning.