An Application of the Associate Hopfield Network for Pattern Matching in Chart Analysis

Chart patterns are significant for financial market behavior analysis. Many approaches have been proposed to detect specific patterns in financial time series data; most of them can be categorized as distance-based or training-based. In this paper, we apply a trainable continuous Hopfield Neural Network to financial time series pattern matching. The Perceptually Important Points (PIP) segmentation method is used as the data preprocessing procedure to reduce fluctuation. We conducted a synthetic data experiment on both high-level noisy data and low-level noisy data. The results show that our proposed method outperforms the Template Based (TB) and Euclidean Distance (ED) methods and has an advantage over Dynamic Time Warping (DTW) in terms of processing time. This indicates that the Hopfield network has a potential advantage over other distance-based matching methods.


Introduction
Chart analysis is a kind of technical analysis in financial trading, which differs from quantitative analysis. Quantitative analysis intends to predict an exact future price using machine learning or deep learning models, whereas chart analysis aims to predict the trend of the future price according to the price patterns of the historical data. Traders and financial analysts believe that some specific patterns that have appeared before will appear again; accordingly, these patterns can serve as signals for trading decisions. For example, the Head-and-Shoulders (H&S) pattern is believed to be one of the most reliable trend-reversal patterns. This pattern consists of three peaks: the first and third peaks are the shoulders, representing small rallies of the stock price, while the second peak forms the head and is the sign that the price will subsequently decline. A neckline can be drawn connecting the bottoms of the two shoulders, and the pattern is normally confirmed when the closing price is clearly below this line. Many works have analyzed the relationship between the movement of the financial market and the shape of chart patterns. Bulkowski [1] studies the characteristics of chart patterns and summarizes 53 applicable trading patterns. Wan et al. [2] divided the patterns into five categories in terms of their shape on the basis of Bulkowski's work. In this paper, we adopt several classic patterns to generate synthetic data.
How to find the subsequences that best match the query patterns [3] has become an important problem in technical analysis. The question can be stated as follows: given a fixed length of financial time series data, find all the subsequences similar to a stored or expected pattern such as H&S. Normally, pattern matching approaches in financial chart analysis can be categorized as template-based, rule-based, training-based and distance-based. Most of them require segmentation as a data preprocessing step, after which the similarity between the processed data and the predefined template is evaluated. Once the similarity reaches or exceeds a threshold given by the analyst, the subsequence is accepted as a specific tradable pattern.
The existing time series data mining research has been studied thoroughly in [4][5][6]. Fu et al. proposed a preprocessing method called perceptually important points (PIP) [3,7] to reduce the fluctuating data points and extract a given number of points to represent a subsequence of the time series data. Keogh et al. proposed the piecewise aggregate approximation (PAA) method [8] and the piecewise linear approximation (PLA) method [9]. The PAA approach divides the time series data into N equal parts and uses the mean value of each part to represent the subsequence. The PLA method adopts a sliding window to scan the subsequence in a top-down or bottom-up way, extracting several straight lines to segment the data. In addition, Si et al. [10] proposed a segmentation method based on turning points (TPs). Wan et al. conducted several experiments [5] with different segmentation methods on synthetic data and concluded that PIP segmentation is robust under different similarity measurements and can preserve the overall shape of the subsequence.
Other than the segmentation methods mentioned above, Leigh and Martins et al. utilized a grid template method [11,12] to represent the 'Bull Flag' pattern. Goumatianos et al. proposed a grid template representation for pattern recognition in the forex market [13]. They introduced the template grid to capture the chart formation and defined a novel similarity measurement based on the template grid.
The similarity measurement is also critical to the matching result. In the work of Fu et al., a temporal distance (TD) [3] measurement is introduced to define the similarity between the segmented sequence and the predefined template. A rule-based (RB) method is also proposed in the same paper, which uses predefined rules to identify patterns. In [14], Zhang et al. designed a real-time pattern matching scheme based on Spearman's rank correlation coefficient and rule sets. These two methods rely on rules defined for each pattern and are thus at a disadvantage when the query patterns need to be updated or extended, since it takes time to redefine the rules for the new patterns. The Euclidean distance (ED) method can also be used to calculate the similarity of two patterns and requires no segmentation, but previous experiments show that the ED approach performs poorly on distorted sequence data and does not account for horizontal and vertical shifts, so the dynamic time warping (DTW) algorithm [15] is more useful in time-series data processing. However, time sequences are usually long, so distance-based methods without segmentation are time-consuming.
Training-based methods treat the pattern matching process as a classic pattern recognition problem. Traditional classification models like the support vector machine (SVM) and the back-propagation neural network (BPNN) can be applied to time series data classification, and the segmentation process is not necessary in these algorithms; therefore, SVM and BPNN can preserve more information from the raw data [2]. However, this kind of method has another drawback: these models usually need a large amount of training data to achieve high testing accuracy, and they have to learn multiple classifiers for different patterns. Consequently, they are inefficient for real-time financial pattern matching. In this context, we consider using the continuous Hopfield network as our matching approach.
The Hopfield neural network (HNN) was proposed by John J. Hopfield [16] in 1982. The energy function was introduced to study the stability of the network, and it turns out that the HNN has a good associative memory ability. The original HNN can only deal with discrete binary pattern recognition using Hebb's rule [17], and its memory capacity is limited by the network size [18]. However, in recent years many works [19][20][21][22][23][24][25][26] have studied the memory capacity and invented different kinds of continuous HNNs to deal with continuous-valued patterns. We leverage the HNN's advantage in warped pattern recognition together with a segmentation method, proposing a training-based pattern matching approach that only needs to be trained on the predefined template patterns.
With the work mentioned above, we treat the financial time series pattern matching problem as a classic pattern recognition problem. In the next section, we review the related work in financial pattern matching. In Sections 3.1.1 and 3.1.2, we introduce the details of how to leverage the segmentation method and the template grid in our matching approach. Section 3.2 presents the algorithm of the learning associate Hopfield network, including the training process and the matching procedure of our method. Section 4 describes the experimental data and the algorithm that generates the synthetic data, and Section 4.2 summarizes the results of the experiments.

Related Work
Several current similarity-based pattern matching approaches are reviewed in this section. The TD approach in template-based pattern matching measures the point-to-point similarity between the predefined template and the segmented subsequences. The similarity can be described as a weighted combination of the amplitude distance (AD) and the temporal distance (TD). The amplitude distance captures vertical distortion and the temporal distance reflects horizontal disparity. AD is defined as follows:

AD(SP, Q) = sqrt( (1/n) Σ_{k=1}^{n} (sp_k − q_k)² )

Here, SP and sp_k denote the extracted points with the PIP segmentation method, and q_k is the corresponding point of the predefined template Q. TD is defined as follows:

TD(SP, Q) = sqrt( (1/(n − 2)) Σ_{k=2}^{n−1} (sp_k^t − q_k^t)² )

where sp_k^t and q_k^t denote the coordinates of the points in the time dimension (the two endpoints are aligned, so only the n − 2 interior points contribute). The similarity measure takes the following form:

D(SP, Q) = w_1 · AD(SP, Q) + (1 − w_1) · TD(SP, Q)

Usually, w_1 is set to 0.5 in the experiments of [2,3,5]; we follow this setting in our experiment. Furthermore, we set a threshold for the similarity measure: once D(SP, Q) is lower than the preset threshold, the stored pattern with the minimum D is accepted as the matching pattern. The ED approach calculates the point-to-point distance between the query template and the sequences without segmentation. Let the predefined pattern be denoted as Y(y_1, · · · , y_n) and the time series sequence as X(x_1, · · · , x_n). The similarity is then

ED(X, Y) = sqrt( Σ_{i=1}^{n} (x_i − y_i)² )

As above, we set a threshold: once a sequence obtains a minimum ED(X, Y) lower than the threshold, it is matched to the corresponding pattern. DTW was applied to time series pattern detection by Berndt et al. [15] in 1994 and has been widely used in speech recognition, gesture recognition and time series clustering since it was invented. In time-series data processing, the lengths of the two sequences to be compared may not be equal; in such a case, the ED approach cannot measure the similarity of two different-sized sequences efficiently.
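The three measures above can be sketched in a few lines of Python (a minimal sketch; the exact normalization constants used in [3] may differ from the ones assumed here):

```python
import numpy as np

def amplitude_distance(sp, q):
    """AD: point-to-point distance in the price (amplitude) dimension."""
    sp, q = np.asarray(sp, float), np.asarray(q, float)
    return np.sqrt(np.mean((sp - q) ** 2))

def temporal_distance(sp_t, q_t):
    """TD: distance in the time dimension; the endpoints are aligned,
    so only the n - 2 interior points contribute."""
    sp_t, q_t = np.asarray(sp_t, float), np.asarray(q_t, float)
    return np.sqrt(np.sum((sp_t[1:-1] - q_t[1:-1]) ** 2) / (len(sp_t) - 2))

def td_similarity(sp, sp_t, q, q_t, w1=0.5):
    """D(SP, Q) = w1 * AD + (1 - w1) * TD, with w1 = 0.5 as in [2,3,5]."""
    return w1 * amplitude_distance(sp, q) + (1 - w1) * temporal_distance(sp_t, q_t)

def euclidean_distance(x, y):
    """ED between two equal-length sequences (no segmentation needed)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sqrt(np.sum((x - y) ** 2))
```

In practice, a candidate subsequence is accepted when `td_similarity` (or `euclidean_distance`) falls below the analyst's threshold.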
Figure 1A is the point-to-point computation of the ED method, and Figure 1B is the demonstration of DTW: for each data point in the time series T, DTW considers the distance between that point and all the other points in the sequence S. DTW is based on the dynamic programming approach. Given two sequences S(s_1, · · · , s_n) and T(t_1, · · · , t_m), we can form an n-by-m matrix γ whose elements d(s_i, t_j) represent the Euclidean distance between the points s_i and t_j. The warping path W(w_1, · · · , w_k) shown in Figure 2 maps the elements in S and T. The dynamic time warping problem can be solved by minimizing the warping path.
The dynamic programming formulation is based on the following recursive equation:

γ(i, j) = d(s_i, t_j) + min{ γ(i − 1, j), γ(i − 1, j − 1), γ(i, j − 1) }

where γ(i, j) denotes the cumulative distance of (i, j). In the work of Kim et al. [27], the DTW algorithm is utilized for intraday price pattern matching. They constructed two sets of fixed patterns and used them as the matching templates. However, the computational complexity of DTW is O(m × n), which is not efficient for large-scale financial time series pattern recognition. In [28], Keogh et al. proposed a scaled-up DTW for massive data processing. Their main idea is to reduce the data points by the piecewise linear representation (PLR) segmentation approach.
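The recursion can be implemented directly with a cumulative-distance matrix. Below is a straightforward O(m × n) sketch, using the absolute difference as the local distance d(s_i, t_j):

```python
import numpy as np

def dtw_distance(s, t):
    """Classic dynamic-time-warping distance between sequences s and t.
    gamma[i, j] holds the cumulative distance of the best warping path
    ending at the pair (s_i, t_j)."""
    n, m = len(s), len(t)
    gamma = np.full((n + 1, m + 1), np.inf)
    gamma[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])            # local point distance
            gamma[i, j] = d + min(gamma[i - 1, j],       # insertion
                                  gamma[i - 1, j - 1],   # match
                                  gamma[i, j - 1])       # deletion
    return gamma[n, m]
```

Note how the double loop makes the O(m × n) cost explicit, which is exactly why segmentation is attractive before applying DTW to long sequences.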
Training-based methods like SVM are applied in stock trading signal prediction in [29,30].
To train an SVM model, a set of time series labeled as positive and negative samples must be generated. Wan et al. also proposed a hidden semi-Markov model (HSMM) [2] for chart pattern matching. In this paper, we present a non-distance-based matching method using the learning associate Hopfield network. This model can be trained with fewer samples and costs less training time than the SVM and BPNN.

Pattern Representation
A suitable segmentation or representation of the time sequence is important for the matching results. In the next few subsections, we introduce how to make use of the PIP and TG methods in our pattern representation.

Perceptually Important Point
The well-known segmentation method PIP is used in our matching approach. There are three variants of the distance measure between adjacent points in PIP: the Euclidean distance (PIP-ED), the perpendicular distance (PIP-PD) and the vertical distance (PIP-VD). The results of Fu et al. [3] illustrate that PIP-VD is the best choice in terms of efficiency and effectiveness. The PIP-VD can be calculated as follows:

VD(p_3, p_c) = |y_c − y_3|,  with  y_c = y_1 + (y_2 − y_1) · (x_3 − x_1)/(x_2 − x_1)

where p_3(x_3, y_3) denotes the next chosen PIP, p_1(x_1, y_1) and p_2(x_2, y_2) are the existing PIPs, and p_c(x_c, y_c) is the point on the line between p_1 and p_2 with x_c = x_3. The schematic diagram is illustrated in Figure 3.

Algorithm 1 Extraction of PIPs.
Input: sequence S(s_1, · · · , s_m), predefined template P(p_1, · · · , p_n);
Output: SP: PIPs with the length of n;
1: Set SP_1 = s_1, SP_n = s_m;
2: repeat
3:   Select the point s_j with the maximum VD to the adjacent points in SP;
4:   Add s_j to SP;
5: until SP is filled.
6: return SP;
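The PIP selection loop might be implemented as follows (an illustrative sketch; the index handling and tie-breaking here are our own choices, not prescribed by [3]):

```python
def pip_vd(x1, y1, x2, y2, x3, y3):
    """Vertical distance from p3 to the line through p1 and p2."""
    yc = y1 + (y2 - y1) * (x3 - x1) / (x2 - x1)
    return abs(y3 - yc)

def extract_pips(series, n):
    """PIP-VD extraction: start with the two endpoints, then repeatedly
    add the point with the maximum vertical distance to its adjacent
    already-chosen PIPs, until n points are selected. Returns sorted
    indices of the chosen PIPs."""
    m = len(series)
    pips = [0, m - 1]                      # endpoints are always PIPs
    while len(pips) < n:
        pips.sort()
        best_idx, best_d = None, -1.0
        # scan every gap between consecutive PIPs
        for left, right in zip(pips, pips[1:]):
            for j in range(left + 1, right):
                d = pip_vd(left, series[left],
                           right, series[right],
                           j, series[j])
                if d > best_d:
                    best_idx, best_d = j, d
        pips.append(best_idx)
    pips.sort()
    return pips
```

For example, on the toy series [0, 1, 0, 2, 0] the first interior PIP chosen is the highest peak, since it has the largest vertical distance to the line connecting the endpoints.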

Template Grid
In Goumatianos' paper [13], a novel template grid representation methodology is introduced: the time series data is encoded by the pattern identification code (PIC), a one-dimensional array that represents the positions of the data points in a given template grid. An example illustrating the PIC is given in Figure 4. After generating the PIC, each cell of the template grid is assigned a weight, computed column by column according to the weighting equation defined in [13]. We construct the predefined patterns such as H&S, Double Top, Triple Top, Spike Top, etc. using the TG method. In order to generate stored patterns that can be properly learned by the HNN, we present three different representations: (a) PIP-TG: to reduce the processing time, we first utilize PIP to process the data and then extract the data points to generate the PIC, so that a simplified TG can be formed. (b) N-equal-part TG: preset the dimension N of the template grid; the time sequence is split evenly into N parts, and each part provides a data point on the TG. For the predefined pattern, we increase the number of data points of the template to N and then apply the TG representation. (c) Scaling PIP: after reducing the fluctuating points by PIP, simply scale up the number of data points between the PIPs.
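As a rough illustration of the PIC idea (the exact encoding and the column-weighting equation are defined in [13]; the linear row mapping below is our own assumption), a series whose length equals the grid dimension can be binned into grid rows, giving one active cell per column:

```python
import numpy as np

def pattern_identification_code(series, grid_dim):
    """Map each data point to a grid row. The PIC is a one-dimensional
    array whose k-th entry is the row index of the k-th point in a
    grid_dim x grid_dim template grid (assumes len(series) == grid_dim)."""
    s = np.asarray(series, float)
    lo, hi = s.min(), s.max()
    # normalize prices into row indices 0 .. grid_dim - 1 (row 0 = lowest)
    rows = np.round((s - lo) / (hi - lo + 1e-12) * (grid_dim - 1)).astype(int)
    return rows

def template_grid(series, grid_dim):
    """Binary template grid with exactly one cell set per column."""
    pic = pattern_identification_code(series, grid_dim)
    grid = np.zeros((grid_dim, grid_dim), dtype=int)
    for col, row in enumerate(pic):
        grid[row, col] = 1
    return grid
```

Flattening such a grid into a vector gives a natural input for a Hopfield-style network, which is how the PIP-TG and N-equal-part TG representations are used below.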

Learning Associate Hopfield Network
The Hopfield network is an important tool for memory retrieval. The traditional discrete HNN [16] can be defined using Hebb's rule. In 2011, Zheng et al. [20] first proposed the Learning Associate Hopfield Network, which can be applied to continuous-time real-world problems. They adopted the energy function method to analyze the retrieval property of the CHNN and proposed sufficient conditions for local and global asymptotic stability in [19]. Based on that theorem, the given patterns can be assigned as locally asymptotically stable equilibria of the learning associate Hopfield network (LAHN) by the error back-propagation algorithm. The neural dynamics of the LAHN can be described by the following equations:

ẋ(t) = −A x(t) + W F(x(t)) + Θ
x(t + 1) = x(t) + α ẋ(t)

where x denotes the neuron states, Θ ∈ R^n is a real constant vector, A ∈ R^{n×n} is a positive diagonal matrix, and W ∈ R^{n×n} is the asymmetric weight matrix learned by the error back-propagation algorithm. F(x) denotes a continuous, differentiable activation function [20], and α is the growing rate.
In [19], a sufficient condition is given for the local stability of the equilibria: each stored pattern X^(i) is locally asymptotically stable if the maximum eigenvalue λ_max of the matrix [H − AΦ(X^(i))] defined in [19] is negative, where λ_max(·) denotes the maximum eigenvalue of a matrix. In order to train the LAHN, we choose a proper differentiable activation function. The training datasets are the stored patterns, and the label of the stored pattern X^(i) is AX^(i). Based on that, the training data can be described in the form D = {(X^(1), AX^(1)), · · · , (X^(m), AX^(m))}. The training process of the LAHN is described in Algorithm 2.
In this paper, we propose a pattern matching approach based on the LAHN that is tailored to our pattern representation methods; for different representations we choose different activation functions and different initializations of A and W. The growing rate α also varies with the neuron size. The detailed parameter settings are described in Section 4.2.

Algorithm 2 Training process of LAHN.
Input: the predefined patterns P = (p_1, · · · , p_m), activation function f(x), diagonal matrix A, the threshold of the squared error T, learning rate lr;
Output: the asymmetric weight W and bias Θ;
1: Initialize W and Θ; squared error SE ← +∞;
2: while SE > T do
3:   for each stored pattern p_i in P do
4:     out ← W f(p_i) + Θ;
5:     y_i ← A p_i;
6:     Get the training loss of the sample: loss ← (y_i − out)²;
7:     Compute the derivatives of W and Θ;
8:     Update W and Θ with learning rate lr;
9:   end for
10:  Calculate the overall squared error SE;
11: end while
12: return W, Θ;
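Algorithm 2 amounts to gradient descent on the squared error between W f(p_i) + Θ and the label A p_i. A minimal sketch follows, assuming A = aI, f(x) = tanh(kx), and illustrative values of a, k, the learning rate and the stopping threshold (none of these exact values come from the paper):

```python
import numpy as np

def train_lahn(patterns, a=1.0, k=1.0, lr=0.01, tol=1e-8, max_epochs=20000):
    """Learn W and Theta so that W * f(p) + Theta ~= A * p for every
    stored pattern p, with f(x) = tanh(k x) and A = a * I."""
    P = np.asarray(patterns, float)            # shape (m, n)
    m, n = P.shape
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(n, n))     # asymmetric weight matrix
    theta = np.zeros(n)                        # bias vector
    for _ in range(max_epochs):
        se = 0.0
        for p in P:
            fp = np.tanh(k * p)
            out = W @ fp + theta
            err = out - a * p                  # residual against label A p
            W -= lr * np.outer(err, fp)        # gradient of squared error
            theta -= lr * err
            se += np.sum(err ** 2)
        if se < tol:                           # overall squared error check
            break
    return W, theta
```

Because only the template patterns are used as training data, training completes in seconds, which matches the paper's claim that the LAHN needs far fewer samples than SVM or BPNN.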

Pattern Matching Based on Segmentation and LAHN
In this section, we introduce our matching procedure in detail. Eleven basic patterns are chosen from 5 categories (Figure 5) that were summarized by Wan et al. [2]. We take these predefined patterns as the stored patterns of the LAHN. As the number of neurons is fixed and predefined, the basic chart patterns are first processed by the PIP algorithm to extract a fixed number of data points and form the stored patterns (i.e., the template patterns). The basic template patterns are represented by the three representation methods described above, and we design three different LAHNs accordingly. To illustrate our pattern matching process, we take Figure 6 as an example, in which the classic H&S chart pattern is represented in three different ways. As shown in Figure 6, the PIP-TG method extracts a fixed number of data points and then uses the TG to represent the pattern. According to Zheng's paper, the LAHN has the same characteristic as the traditional Hopfield network: the more neurons in the network, the higher the correct recall percentage, but at a higher computational cost. Therefore, we need to choose the neuron number of each representation properly. The figure shows that there are 49 cells in the TG, so the neuron number of the LAHN would be 49; the same holds for the N-equal-part TG. As for the scaling PIP, the neuron number is the predefined total number of data points after scaling, and there would be 25 neurons in the case of Figure 6. In order to assign the stored patterns as locally asymptotically stable equilibria, we fine-tune the parameters until the maximum eigenvalue of each [H − AΦ(X^(i))] is lower than 0. We choose the tanh function as our activation function:

f(x) = (e^{kx} − e^{−kx}) / (e^{kx} + e^{−kx})

The exact values of the parameters are described in Section 4.2.
Each stored pattern can be resized into a one-dimensional vector, so the training label can easily be obtained by multiplying by the matrix A. The training process is simple and efficient: after reaching a satisfactory squared error, a well-trained LAHN is obtained. For the matching process, the incoming time series data is segmented by one of the three representation methods and used as the initial state of the LAHN. The network dynamics evolve following Equations (8)-(10). To avoid oscillation, we constrain the growth rate to the range (0, 0.5]. Unlike other distance-based methods, after completing the iteration, the initial warped pattern asymptotically converges to the most similar stored template pattern, which saves the time of calculating the similarity between the input pattern and each template pattern. Figure 7 shows the network structure of the LAHN and Figure 8 is the flow chart for training a LAHN. In the TG representations, only one cell value in each column equals 1; the dimension of the grid in the example is 12. The scaling PIP is generated by the scaling procedure, which is described in detail in Section 4. The input x(t) denotes the neuron state at time step t, and the output of the network ẋ(t) is the momentum of the current state; the next state (i.e., the input of the network at time step t + 1) is obtained by Equation (10), and the stable state is reached by evolving the network recursively.
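The retrieval step can be sketched as a simple Euler iteration of the network dynamics. The form of the update below, ẋ = −Ax + W f(x) + Θ followed by x(t + 1) = x(t) + α ẋ(t), is assumed from the surrounding description of Equations (8)-(10), with A = aI and f = tanh(kx):

```python
import numpy as np

def lahn_retrieve(x0, W, theta, a=1.0, k=1.0, alpha=0.2, steps=500):
    """Evolve the LAHN from the initial state x0 (the segmented input
    pattern). Each step computes the momentum xdot = -A x + W f(x) + Theta
    and applies x(t+1) = x(t) + alpha * xdot(t). The growth rate alpha is
    kept inside (0, 0.5] to avoid oscillation, as in the paper."""
    x = np.asarray(x0, float).copy()
    for _ in range(steps):
        xdot = -a * x + W @ np.tanh(k * x) + theta
        x = x + alpha * xdot
    return x
```

After convergence, the final state is compared against the stored templates; no pairwise distance computation between the input and every template is needed during iteration, which is the claimed speed advantage over the distance-based methods.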

Synthetic Data and Predefined Templates
Our experiment is conducted on a synthetic dataset, which we generate following the algorithm in Wan's paper [5]. The generation consists of three steps: time scaling, time warping, and noise adding. Time scaling enlarges the data of the template pattern; we produce different lengths of the scaled time series during this process. Time warping changes the positions of the data points extracted by PIP, so the shape of the synthetic data differs from the template while the overall picture remains similar. Our noise adding process differs from the original method: a random value rnd drawn from the Gaussian distribution N(0, σ²) is added to each data point if a random value r is below the threshold (0.7 is used in this paper), so the noise level can easily be controlled by adjusting σ. Algorithm 3 describes the process of generating the synthetic data. Later, in Section 4.2, the results show the performance of each model at different noise levels. The predefined template patterns are shown in Figures A1-A3 in Appendix A.
The accuracy, defined as Equation (11), is used to compare the models. As described in the last paragraph, each template pattern is used to generate a number of corrupted patterns; the correctly recalled patterns are the final matching patterns.

Accuracy = correctly recalled patterns / total patterns    (11)

Algorithm 3 Generating synthetic data.
Input: a template pattern p, scaling number m, length of the template q_num;
Output: synthetic data x of the template pattern;
Time Scaling:
1: Compute X ← (m − q_num)/(n − 1);
2: for each point x_i in the set (x_2, · · · , x_end) do
3:   x_i ← x_{i−1} + (X + 1);
4: end for
Time Warping:
5: for each data point x_i do
6:   Randomly change the position of x_i between x_{i+1} and x_{i−1};
7: end for
8: Enlarge the points between each critical point;
Noise Adding:
9: for each point x_i do
10:   Generate a random value r that follows U(0, 1);
11:   if r < threshold then
12:     Generate a random value rnd that follows N(0, σ²);
13:     x_i ← x_i + rnd;
14:   end if
15: end for
16: return x;

Table 1 summarizes the differences between the matching methods. The distance-based methods, ED, DTW and PIP-VD, require a predefined threshold to decide whether to accept a matching pattern. Segmentation indicates that a method requires a data preprocessing procedure to extract the important data points, and Training indicates that a method needs to be trained.
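Algorithm 3 might be sketched in code as follows. This is a simplified sketch of the three steps: the warping here jitters each interior value between its neighbours rather than moving PIP positions, and the "enlarge" step is omitted; sigma, threshold and the template are illustrative:

```python
import numpy as np

def synthesize(template, m, sigma=0.15, threshold=0.7, seed=None):
    """Generate one synthetic sample from a template pattern:
    (1) time scaling: stretch the template to length m by interpolation;
    (2) time warping: move each interior point between its neighbours;
    (3) noise adding: perturb a point with N(0, sigma^2) noise whenever
        a uniform draw r falls below `threshold` (0.7 in the paper)."""
    rng = np.random.default_rng(seed)
    t = np.asarray(template, float)
    # time scaling: linear interpolation from len(t) points up to m points
    x = np.interp(np.linspace(0, len(t) - 1, m), np.arange(len(t)), t)
    # time warping: jitter each interior point between its two neighbours
    for i in range(1, m - 1):
        lo, hi = min(x[i - 1], x[i + 1]), max(x[i - 1], x[i + 1])
        x[i] = rng.uniform(lo, hi)
    # noise adding: each point is perturbed with probability `threshold`
    mask = rng.uniform(size=m) < threshold
    x[mask] += rng.normal(0.0, sigma, size=mask.sum())
    return x
```

Raising `sigma` produces the "noisy" datasets used in Section 4.2, while the warping step controls how far the sample's shape drifts from the template.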

Six Stored Patterns
The proposed matching approach is compared with three different distance-based methods: ED, DTW and PIP-VD. We choose the first six templates as the stored patterns of the LAHN and generate synthetic data for the 11 patterns shown in Figure 9; each template pattern generates 200 samples. The length of each generated time series equals the width of the sliding window, which we tentatively set to 49. Table 2 shows the accuracy of each model on the first six templates (H&S, Tria-A, CWH, Reverse CWH, Trip-B, Doub-T). We conduct the experiment on normal data and noisy data: the standard deviation σ in the noise adding process is 0.15 for the normal data and 0.2 for the noisy data. As for the time-warping process, we change the position of data point x_i within the range (2/3)·[x_{i−1}, x_{i+1}] in the normal dataset, and within [x_{i−1}, x_{i+1}] in the noisy dataset.

In the above experiment, the threshold of the ED approach is set to 3 and that of DTW to 17. In these two methods, we scale the template pattern to the same size as the synthetic data. Once the difference measured by the model is lower than the predefined threshold, the distorted time series is matched to the most similar template pattern. The parameters of the LAHNs are summarized in Table A1. As shown in Table 2, the ED approach is quite sensitive to the noise level of the data, while the accuracy of the other five methods changes within an acceptable range. Among our proposed representations, the scaling PIP has the best overall matching accuracy, and PIP-TG is slightly better than the traditional PIP. As for the training process, it takes only a few seconds to train a LAHN. DTW achieves the best performance among all the methods, but it requires more processing time as the data size increases.
The results in Figure 10 show that the processing time of DTW grows with the scale of the time series. However, with segmentation, the processing time of the other four methods (except ED) changes only slightly. The N-equal-part method costs a bit more time because its LAHN has more neurons than those used in the other representations. Table 3 shows the matching results for each template pattern. The synthetic data here is generated with three noise levels, and the positions of the data points are changed within the range [x_{i−1}, x_{i+1}] in the time-warping process. Although the PIP method performs well on some patterns such as H&S, Tria-A, and Trip-B, its overall accuracy is not the best. With more information from the time series, the DTW method achieves the best overall performance. The scaling PIP performs slightly worse than DTW but, as described above, has an advantage in terms of processing time. From Figure 11, we can see that the performance of DTW and scaling PIP barely changes with the noise level, and the accuracy of each pattern is well balanced, indicating that these two models have no preference for specific patterns.

Samples Analysis
To analyze the matching results more intuitively, we take several samples to illustrate the advantage of our approaches. Figure 12 shows the matching result of a distorted Trip-B pattern. Although the representation of the time series is blurred by the Gaussian noise, the LAHN retrieves the stored Trip-B template that most closely resembles the testing sample, while the ED method cannot recognize this pattern because the distance-based similarity is higher than the predefined threshold. Figure 13 shows Doub-T testing data that is misidentified by the ED, DTW and PIP methods. The black line represents the synthetic data generated according to the Doub-T template. By enlarging the data points between the critical points of the PIP, the representation method can provide more information for the LAHN.

Eleven Stored Patterns
It has been verified that the memory capacity of the Hopfield network is confined to the neuron size of the network. In this section we enlarge the set of stored patterns of the LAHN and explore how this influences the matching performance. The parameter and threshold settings are the same as in the six-stored-pattern experiment, but the testing data differ slightly: the standard deviation of the Gaussian noise in the noisy data is 0.25, and the range of the time-warping process remains the same. As can be seen from Figure 14 and Table 4, the overall performance of the Hopfield network based methods declines somewhat, but the result of the scaling PIP changes within an acceptable range. We can also observe that the N-equal-part method has a particularly low accuracy on the H&S pattern, mainly because its representation of H&S is quite similar to that of the Doub-T pattern. An improvement of this method is discussed in Section 5. Figure 14 plots the overall accuracy on the different synthetic datasets: ED and DTW perform well when the noise level is low, but their accuracy declines considerably as the noise increases. The results of the scaling PIP are similar to those in Figure 11; they barely change with the noise level, which indicates that the scaling PIP is much more robust than the distance-based methods.

Discussion
We proposed a lightweight pattern matching method utilizing the learning associate Hopfield network, a non-distance-based approach combined with a segmented representation method. Using synthetic data generated from 11 traditional trading chart templates, we conducted experiments with different levels of noise and distortion. We find that the scaling PIP performs better than the ED and the traditional PIP methods; its matching results are slightly worse than DTW's, but it costs less time during the matching process. Furthermore, when the number of stored patterns does not exceed the memory capacity of the LAHN, the N-equal-part method performs steadily. It can be concluded from the experimental results that the scaling PIP is a robust and efficient matching method and that the Hopfield network has a potential advantage over other distance-based matching methods.
The proposed Hopfield network based algorithm can be applied in pattern matching trading systems to detect patterns in the daily or hourly financial market. With fewer training data and less processing time, a trader can efficiently capture specific signals and make transactions. Future work could consider how to leverage the characteristics of DTW together with the methods we proposed to construct a more accurate and scalable matching method. Moreover, the memory capacity limits the performance of Hopfield network based pattern matching; future studies can also explore the retrieval reliability of the LAHN and different segmentation algorithms, such as PLA, PAA or TP (turning points), which could better distinguish different stored templates.

Table A1. Parameter settings of the LAHN. A denotes the diagonal elements of the matrix A; for simplicity, the elements of A are all equal. k denotes the parameter of the activation function. The first column is the number of neurons and the last column is the learning rate.