An Advanced Optimization Approach for Long-Short Pairs Trading Strategy Based on Correlation Coefficients and Bollinger Bands

In the financial market, commodity prices change over time, yielding profit opportunities. Various trading strategies have been proposed to yield good earnings. Pairs trading is one such critical, widely used strategy with good effect. Given two highly correlated target stocks, the strategy buys one when its price falls behind, sells it when the prices converge, and takes the opposite position in the other stock. In the existing approach, the genetic Bollinger Bands and correlation-coefficient-based pairs trading strategy (GBCPT) uses optimization techniques to determine the parameters for correlation-based candidate pairs and to discover Bollinger Bands-based trading signals. The correlation coefficients measure the relationship between two stocks from their historical prices, and the Bollinger Bands are indicators composed of the moving averages and standard deviations of the stocks. In this paper, to achieve more robust and reliable trading performance, AGBCPT, an advanced GBCPT algorithm, is proposed that takes into account volatility and additional critical parameters that influence profitability. It encodes six critical parameters into a chromosome. To evaluate the fitness of a chromosome, the encoded parameters are used to observe the trading pairs and their trading signals generated from Bollinger Bands. The fitness value is then calculated from the average return and volatility of the long and short trading pairs. The genetic process is repeated to find suitable parameters until the termination condition is met. Experiments on 44 stocks selected from the Taiwan 50 Index are conducted, showing the merits and effectiveness of the proposed approach.

Author contributions: writing (original), W.-H.L. and S.-T.H.; writing (review), C.-H.C. and T.-P.H.; supervision, T.-P.H.; funding acquisition, T.-P.H.


Introduction
In financial markets, investment assets include bonds, funds, stocks, and derivative financial products such as futures and options. Investors are familiar with the basic principle of profitability: buy an asset at a low price and sell it at a higher price. The difficulty is that appropriate trading signals are hard to find, given the variety of assets and trends in real financial markets, which makes consistent profits hard to achieve. Many approaches have therefore been proposed for finding trading strategies that make profits more stable [1][2][3][4][5][6][7].
Of these, pairs trading is a critical, widely used trading strategy [20][21][22][23][24] based on a central concept: for two highly correlated assets, buy the one whose price falls behind and sell when the prices converge; this constitutes an arbitrage opportunity [25]. In other words, a profitable pairs trading strategy must address two questions: how to find a pair of highly correlated stocks, and how to generate useful signals for buying and selling. Pairs trading can also be applied more widely, e.g., to cryptocurrency and prosumer energy markets [26,27].
The genetic Bollinger Bands and correlation-coefficient-based pairs trading algorithm (GBCPT) was proposed by Huang [28]. It uses an optimization approach to determine the parameters for correlation-based candidate pair generation and for the Bollinger Bands-based trading signal discovery process. Stock pairs whose correlation coefficients meet the predefined threshold are expected to show divergent trends in the future. In addition, Bollinger Bands are used to determine how far each stock of the pair has risen or fallen. When both conditions are met, a long position is opened in the stock expected to rise and a short position in the stock expected to decline. The pair position is closed when the exit conditions of the Bollinger Bands are met. However, other parameters in pairs trading also affect the profitability of the strategy, and these should be taken into consideration when designing the fitness function.
To solve the above-mentioned problems, we propose the advanced genetic Bollinger Bands and correlation-coefficient based pairs trading algorithm (AGBCPT) to achieve more robust and reliable trading performance. The algorithm encodes six critical parameters into a chromosome: the correlation coefficient threshold, the entry width of the Bollinger Bands, the out width of the Bollinger Bands, the correlation coefficient calculation days, the moving average calculation days, and the forward observation days. When evaluating fitness using such a chromosome, the encoded parameters are utilized to observe the trading pairs and their trading signals generated from the Bollinger Bands, after which the fitness value is calculated by the average return and volatility of long and short trading pairs. The genetic process is repeated to find suitable parameters until the termination conditions are satisfied. Experiments conducted on 44 stocks selected from the Taiwan 50 Index show the merits and effectiveness of the proposed approach.
This paper is organized as follows. Related work is described in Section 2 and the details of the proposed AGBCPT method are stated in Section 3. The experimental results are discussed in Section 4, and Section 5 concludes and outlines future work.

Review of Pairs Trading Strategies
Pairs trading is a market-neutral trading strategy that investors use to profit from changing market situations [1,20,21,23,29]. Based on the historical performance of highly correlated commodities, a pairs trading strategy focuses on how to select the trading pair as a target and profit from it [21]. When the correlation weakens, one stock may, for instance, rise while the other falls. Such a temporary divergence can be caused by changes in supply and demand, a sudden large number of transactions by a securities firm, or major news, all of which cause stock prices to fluctuate. A pairs trading strategy then shorts the rising stock and longs the falling one at the same time, because investors expect the price difference between the two to converge in the future [23,29]. Krauss classifies pairs trading strategies into distance methods, cointegration methods, time series methods, stochastic control methods, and other methods [23]. In recent years, abundant related research has been produced [25,30–36]. Below, we introduce approaches related to pairs trading.
In 2006, Gatev et al. published a well-known pairs trading paper. Their GGR (Gatev, Goetzmann, and Rouwenhorst) pairs trading method [25] used six-month trading periods from 1962 to 1997 on a large sample of U.S. equities. After testing the profitability of several trading rules, they observed that their strategy yielded annualized excess returns of up to eleven percent at low exposure to systematic sources of risk. Do and Faff [32] later extended the GGR method, comparing test data over different years and industries and confirming that the declining profitability of pairs trading is mainly due to an increasing share of non-converging pairs. Their results also show that industry-matched portfolios yield more substantial profits than portfolios selected from the whole market; restricting pairs to the same industry thus reduces convergence failures among the selected pairs.
For various situations and purposes, pairs trading is also combined with other methods that improve the performance of the pairs trading strategy [37]. For example, Rende et al. experiment with the persistence-based decomposition (PBD) model in a large-scale high-frequency pairs trading application [38]. Their study provides empirical evidence that the model is well suited to noisy high-frequency data in terms of model fitting and prediction. Stäbinger et al. develop a pairs trading framework based on a mean-reverting jump-diffusion model [39], and their results show that the method performs well in terms of risk-reward characteristics. To find an optimized pairs trading strategy, Fallahpour et al. propose pairs trading strategy optimization based on reinforcement learning [40]. Results on S&P 500 constituent stocks confirm the efficiency of the proposed method and show that it outperforms existing approaches.
In addition to the stock market, pairs trading strategies are also used in other financial fields. For example, Fil et al. apply pairs trading to the cryptocurrency market to find profit opportunities [26], transferring standard pairs trading from traditional finance to cryptocurrencies. Their results show that the traded portfolios in the cryptocurrency market do not converge, and that profitability improves with higher-frequency trading. In addition, Lintilhac et al. state that pairs trading in bitcoin markets has historically been possible [41]. Given the increasing need for distributed energy trading, Oh et al. propose two pair-matching strategies for distributed prosumer energy trading that consider the properties of the trading rules and the statistical characteristics of the participants [27]. The literature shows that pairs trading is an effective strategy that investors use to profit from different market situations in various financial fields.

Review of Optimization Approaches in Financial Applications
Genetic algorithms (GAs) are optimization algorithms widely used for solving complex problems in a variety of fields [42,43]. In the financial field, many applications use GAs to search for near-optimal solutions in limited time [44]. For example, Chen et al. propose an optimization algorithm that uses the grouping genetic algorithm (GGA) to solve the diverse group stock portfolio optimization problem [18]. To identify good group trading strategy portfolios, Chen et al. propose a GGA-based algorithm that not only obtains a reliable group trading strategy portfolio but also finds appropriate stop-loss and take-profit points [17]. Huang proposes a methodology for effective stock selection using support vector regression (SVR) and a GA [45]. He first uses SVR to generate surrogates for actual stock returns, which in turn provide reliable stock rankings; the GA is then used to optimize the parameters of the model. Chen et al. propose an approach that uses a GA for feature selection and then uses the selected features to construct a long short-term memory (LSTM) neural network model for stock prediction [13]. The results show that the GA-LSTM model outperforms all baseline models for time series prediction. Cheong et al. propose a spatiotemporal convolutional neural network-based relational network (STCNN-RN) model for stock anomaly detection [46]. To improve the accuracy of the STCNN-RN model, a GA is employed to identify outlier time points, which the model then uses to detect abnormal behavior. They report that the model is effective at finding anomalous situations on a dataset of multiple financial time series. For pairs trading optimization, Sermpinis et al. propose a pairs trading structure based on deep reinforcement learning (DRL) and a GA [47].
They first apply the distance method (DM) and the cointegration approach (CA) to generate trading pairs from the given pair pool, after which trading actions are determined using the simple thresholds (ST) strategy, the GA, and DRL. They propose five pairs trading strategies: the DM-ST, CA-DM-ST, and CA-ST benchmark strategies and the improved CA-GA-ST and CA-DRL strategies. In CA-GA-ST, the GA is used to find appropriate parameter settings, and in CA-DRL, DRL is employed to construct an agent using pairs trading rules and the differences between the two assets. They indicate that CA-DRL is superior to the other strategies. Goldkamp et al. propose an intelligent system using mixed integer programming (MIP) and the multi-objective genetic algorithm NSGA-II for multivariate pairs trading [48]. MIP is used to generate trading pairs, and risk and return serve as two conflicting objective functions when NSGA-II searches for Pareto solutions. The results indicate that multi-objective multivariate pairs trading outperforms traditional approaches.
Huang et al. propose an intelligent model for pairs trading based on a GA [49]. In their approach, the GA is used to find the parameters of the moving averages, the Bollinger Bands, and the stock weight coefficients of the model. Experimental results indicate that GA-based pairs trading effectively improves the performance of pairs trading and outperforms the benchmark in terms of return.
In addition, Huang proposes the genetic Bollinger Bands and correlation-coefficient-based pairs trading algorithm (GBCPT), which uses a GA for pairs trading [28]. GBCPT encodes parameters into a chromosome, including the correlation coefficient threshold, the entry width of the Bollinger channel, and the exit width of the Bollinger channel; the last two determine the width of the Bollinger Bands. To evaluate a chromosome, the correlation coefficients between companies are first used to determine suitable candidate pairs for trading, after which the Bollinger channels serve as a reference indicator for finding the buying and selling signals of each target pair. The average return is then calculated and used as the fitness of the chromosome. Genetic operators generate new solutions, the selection operator forms the next population, and the genetic process is repeated until the termination condition is met.

Review of Bollinger Bands
Bollinger Bands are a type of statistical chart that indicates the price volatility of financial commodities over time. Typical Bollinger Bands are controlled by the moving average (MA) and a constant W that sets the bandwidth. The MA on trading day i is the average price from trading day i − mDay to i − 1, where mDay is the number of days used to calculate the MA. The Bollinger Bands consist of an upper band and a lower band, calculated as MA + Wσ and MA − Wσ, respectively, where σ is the standard deviation of the prices over the given period. Together, these parameters determine the form of the Bollinger Bands.
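The band definition above can be sketched in Python as a minimal illustration, not the authors' implementation. It assumes closing prices are stored in a list indexed by trading day and, because the text does not name the estimator, that σ is the population standard deviation of the same mDay-day window used for the MA. The function name `bollinger_bands` is hypothetical.

```python
from statistics import fmean, pstdev


def bollinger_bands(prices: list[float], i: int, m_day: int, w: float) -> tuple[float, float, float]:
    """Return (lower band, MA, upper band) for trading day i.

    The window covers days i - m_day .. i - 1, so day i's own price is excluded.
    Assumption: sigma is the population standard deviation of that window.
    """
    if i - m_day < 0:
        raise ValueError("not enough price history before day i")
    window = prices[i - m_day:i]      # closing prices of days i - m_day .. i - 1
    ma = fmean(window)                # moving average MA
    sigma = pstdev(window)            # standard deviation over the same window
    return ma - w * sigma, ma, ma + w * sigma


# Hypothetical prices; bands for day 5 computed from days 1..4 with W = 2.
prices = [10.0, 10.5, 11.0, 10.5, 10.0, 9.5]
lower, ma, upper = bollinger_bands(prices, i=5, m_day=4, w=2.0)
print(f"lower={lower:.3f} MA={ma:.3f} upper={upper:.3f}")  # lower=9.793 MA=10.500 upper=11.207
```

Excluding day i from its own window means the band for a day is known before that day's close, which is what allows a same-day crossing test.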
In the literature, many approaches take Bollinger Bands into consideration when designing trading strategies. For instance, Windasari et al. propose a technical analysis method that uses historical data and indicators to identify price fluctuations in a specific period [50]. Bollinger Bands and the Williams percent range are the indicators used in that research to provide information about stock trends by following a particular buying/selling pattern. Their dataset consists of the stocks of six companies from the Indonesia Stock Exchange. Their experimental results show good average returns for these companies, which suggests that Bollinger Bands are a feasible indicator for finding trading signals. Prasetijo et al. propose trading strategies employing Bollinger Bands and parabolic SAR indicators [3] and develop a web-based application to evaluate the performance of the proposed strategies.

Proposed Approach
In this section, we describe the proposed approach in detail. The flowchart of AGBCPT is presented in Section 3.1, and the AGBCPT components are introduced in Section 3.2, including the encoding scheme, the initial population, the fitness function, and the genetic operations. In Section 3.3, the AGBCPT algorithm is presented, followed by an example in Section 3.4.

AGBCPT Flowchart
The AGBCPT flowchart is shown in Figure 1.
Appl. Sci. 2022, 12, x FOR PEER REVIEW 5 of 25
Figure 1 shows that the proposed approach collects the stock price series of the companies and preprocesses the data, after which the population is randomly initialized according to the encoding scheme and the population size. The fitness calculation process first computes the correlation coefficient matrix of all companies on each trading day T (Step 1); the number of days used in this calculation is given by the cDay gene. Next, the cLimit gene serves as a threshold for finding qualified stock pairs (Step 2): a pair is kept in TPset when its correlation coefficient is smaller than cLimit. Then, the Bollinger Band channels for the stock pairs are generated using the mDay and BBentryWidth genes (Step 3). On each date T, mDay is used to calculate the moving average, and BBentryWidth is used to calculate the upper and lower channels. The upper and lower entry channels of stock s_i are defined as

$$UB_i(T) = MA_i(T) + \mathit{BBentryWidth} \times \sigma_i(T) \tag{1}$$

$$LB_i(T) = MA_i(T) - \mathit{BBentryWidth} \times \sigma_i(T) \tag{2}$$

where MA_i(T) is the moving average of s_i on day T, calculated as

$$MA_i(T) = \frac{1}{\mathit{mDay}} \sum_{t=T-\mathit{mDay}}^{T-1} cp_i^{t}$$

and σ_i(T) is the standard deviation of the same closing prices. When the entry conditions are met, the proposed approach sells s_i and buys s_j, and the pair (s_i, s_j) is recorded.
It then checks the entry conditions of the next candidate pair until all pairs have been processed.
The next step is to generate the Bollinger Band channels again for the pairs that have previously been entered (Step 4). Using mDay and BBoutWidth, the exit channels are calculated as

$$US_i(T) = MA_i(T) + \mathit{BBoutWidth} \times \sigma_i(T) \tag{3}$$

$$LS_i(T) = MA_i(T) - \mathit{BBoutWidth} \times \sigma_i(T) \tag{4}$$

The exit conditions are (1) $cp_i^{T-\mathit{oDay}} > LS_i(T) > cp_i^{T}$ for stock s_i and (2) $cp_j^{T-\mathit{oDay}} < US_j(T) < cp_j^{T}$ for stock s_j, under which the proposed approach buys back s_i and sells s_j. When a stock pair trade is complete, the approach records profit(s_i, s_j) = income(s_i, s_j)/cost(s_i, s_j) as well as the minimum return, removes the trading pair (s_i, s_j) from TPset, and then checks the exit conditions of the next pair until all pairs have been processed.
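The entry and exit checks compare each stock's closing price oDay days earlier with its price on day T relative to a band. The Python sketch below is illustrative only: the exit test follows the two exit conditions as described, with s_j compared against its own prices; the s_j entry test follows Condition 2 of Step 4 (crossing LB upward); and the s_i entry test, which is not stated explicitly in this section, is *assumed* to be the mirror image (crossing the upper entry band downward). The population standard deviation and all function names are likewise assumptions.

```python
from statistics import fmean, pstdev


def bands(prices, t, m_day, width):
    """(lower, upper) band on day t from days t - m_day .. t - 1 (population std assumed)."""
    window = prices[t - m_day:t]
    ma, sd = fmean(window), pstdev(window)
    return ma - width * sd, ma + width * sd


def crossed_down(prices, t, o_day, level):
    """Price oDay days earlier was above `level`; price on day t is below it."""
    return prices[t - o_day] > level > prices[t]


def crossed_up(prices, t, o_day, level):
    """Price oDay days earlier was below `level`; price on day t is above it."""
    return prices[t - o_day] < level < prices[t]


def entry_signal(p_i, p_j, t, m_day, entry_w, o_day):
    """Short s_i / long s_j on day t.

    s_j must cross its lower entry band LB upward (Condition 2). The s_i test,
    crossing the upper entry band UB downward, is an ASSUMPTION (mirror image).
    """
    _, ub_i = bands(p_i, t, m_day, entry_w)
    lb_j, _ = bands(p_j, t, m_day, entry_w)
    return crossed_down(p_i, t, o_day, ub_i) and crossed_up(p_j, t, o_day, lb_j)


def exit_signal(p_i, p_j, t, m_day, out_w, o_day):
    """Close the pair: s_i crosses LS downward and s_j crosses US upward."""
    ls_i, _ = bands(p_i, t, m_day, out_w)
    _, us_j = bands(p_j, t, m_day, out_w)
    return crossed_down(p_i, t, o_day, ls_i) and crossed_up(p_j, t, o_day, us_j)


# Hypothetical series: s_i spikes then drops, s_j dips then recovers (day t = 5).
p_i = [10, 10, 10, 10, 14, 9]
p_j = [10, 10, 10, 10, 6, 11]
print(entry_signal(p_i, p_j, t=5, m_day=4, entry_w=1.0, o_day=1))  # True
```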
Finally, the fitness value of a chromosome, that is, the profit of all trading pairs divided by the minimum value of the return, is evaluated and the genetic operators are executed to generate new offspring. The process is repeated until the termination conditions are met.

AGBCPT Components
In this section, we describe four AGBCPT components: the encoding scheme, the initial population, the fitness function, and the genetic operations.

Encoding Scheme
The parameters used in the pairs trading strategy influence the pairs trading return. Because the strategy described here uses the correlation coefficient and Bollinger Bands, it takes six parameters into account, namely the correlation coefficient threshold (cLimit), the entry width of the Bollinger Bands (BBentryWidth), the out width of the Bollinger Bands (BBoutWidth), the correlation coefficient calculation days (cDay), the moving average calculation days (mDay), and the forward observation days (oDay), and encodes them into a chromosome as real numbers. The correlation coefficient is used to find potential stock pairs, and the Bollinger Bands are used to find pairs trading signals. The encoding scheme of a chromosome is shown in Table 1.

Table 1. Encoding scheme of chromosome C_q.
Chromosome C_q: cLimit_q | BBentryWidth_q | BBoutWidth_q | mDay_q | cDay_q | oDay_q

In Table 1, the cLimit and cDay genes belong to the correlation coefficient calculation, and the mDay, BBentryWidth, BBoutWidth, and oDay genes belong to the Bollinger Bands. cLimit is the correlation coefficient threshold used to find potential stock pairs, and cDay is the number of days used to calculate the correlation coefficient of two stocks. mDay is the number of days used to calculate the moving averages. BBentryWidth and BBoutWidth are the widths of the upper and lower Bollinger channels for the entry and exit signals, respectively. oDay is the number of days between the two closing prices that are compared when detecting a trading signal.
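A chromosome laid out as in Table 1 can be held in a small record type. This is a sketch, not the authors' code: the class `Chromosome` and its field names are hypothetical, and rounding the three day-count genes to integers of at least one day when decoding a real-valued vector is an assumption, because the text only states that the genes are encoded as real numbers.

```python
from dataclasses import astuple, dataclass


@dataclass
class Chromosome:
    """Hypothetical container for the six AGBCPT genes, in the order of Table 1."""
    c_limit: float         # correlation coefficient threshold (cLimit)
    bb_entry_width: float  # entry width of the Bollinger Bands (BBentryWidth)
    bb_out_width: float    # out (exit) width of the Bollinger Bands (BBoutWidth)
    m_day: int             # moving average calculation days (mDay)
    c_day: int             # correlation coefficient calculation days (cDay)
    o_day: int             # forward observation days (oDay)

    def to_vector(self) -> list[float]:
        """Flatten into the real-valued gene vector used by the genetic operators."""
        return [float(g) for g in astuple(self)]

    @classmethod
    def from_vector(cls, genes: list[float]) -> "Chromosome":
        """Decode a gene vector; ASSUMPTION: day genes are rounded to integers >= 1."""
        c_limit, entry_w, out_w, m_day, c_day, o_day = genes
        days = [max(1, round(x)) for x in (m_day, c_day, o_day)]
        return cls(c_limit, entry_w, out_w, *days)


# C_1 from the worked example in Section 3.4.
c1 = Chromosome.from_vector([-0.98, 1.0, 0.5, 10, 10, 1])
print(c1)
print(c1.to_vector())
```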

Initial Population
According to the predefined ranges of the six parameters, the initial population is generated randomly at the given population size. The parameter ranges are shown in Table 2.
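Random initialization can be sketched as drawing each gene uniformly from its range. Table 2 is not reproduced in this section, so the ranges in `GENE_RANGES` below are **hypothetical placeholders** chosen only to make the example run; uniform sampling and the function names are also assumptions.

```python
import random
from typing import Optional

# HYPOTHETICAL gene ranges: Table 2 is not reproduced here, so these are placeholders.
# Order follows Table 1: cLimit, BBentryWidth, BBoutWidth, mDay, cDay, oDay.
GENE_RANGES = [
    (-1.0, 0.0),  # cLimit
    (0.5, 3.0),   # BBentryWidth
    (0.0, 2.0),   # BBoutWidth
    (5, 60),      # mDay (integer days)
    (5, 60),      # cDay (integer days)
    (1, 5),       # oDay (integer days)
]
INTEGER_GENES = {3, 4, 5}  # positions of the day-count genes


def random_gene(k: int, rng: random.Random) -> float:
    """Draw gene k uniformly from its range (integer draw for day-count genes)."""
    lo, hi = GENE_RANGES[k]
    return rng.randint(lo, hi) if k in INTEGER_GENES else rng.uniform(lo, hi)


def init_population(p_size: int, seed: Optional[int] = None) -> list[list[float]]:
    """Generate p_size random chromosomes (gene vectors)."""
    rng = random.Random(seed)
    return [[random_gene(k, rng) for k in range(len(GENE_RANGES))] for _ in range(p_size)]


population = init_population(p_size=5, seed=1)
for chromo in population:
    print([round(g, 3) for g in chromo])
```

Seeding a private `random.Random` instance keeps runs reproducible without touching the global generator.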

Fitness Function
Since the goal of the fitness function is to evaluate the quality of a chromosome, it is important to define an appropriate fitness function. In the proposed method, the GA is used to find appropriate parameters for the pairs trading strategy; therefore, the fitness value of a chromosome is evaluated by the profit and risk of the pairs trading strategy. Before defining the fitness function, the profit of the h-th stock pair (s_i, s_j) after n transactions using the trading strategy is defined as

$$\mathit{profit}_h(s_i, s_j) = \sum_{t=1}^{n} \frac{\mathit{tpP}_t(s_i, s_j)}{\mathit{tpC}_t(s_i, s_j)},$$

where tpP_t(s_i, s_j) and tpC_t(s_i, s_j) are the income and the cost of the h-th stock pair (s_i, s_j) in the t-th transaction, respectively. The total profit of a chromosome is then defined as

$$\mathit{TotalProfit}(C_q) = \sum_{h=1}^{|\mathit{TPset}|} \mathit{profit}_h(s_i, s_j),$$

where TPset contains the qualified stock pairs and |TPset| is the number of stock pairs. The risk of a chromosome is defined as

$$\mathit{Risk}(C_q) = \min\Bigl(1, \min_{h} \mathit{profit}_h(s_i, s_j)\Bigr),$$

where the inner min() finds the smallest return in the set of profit_h(s_i, s_j); if all the returns are higher than one, the risk value is one. From the total profit and the risk, the fitness function of a chromosome is defined as

$$\mathit{Fitness}(C_q) = \frac{\mathit{TotalProfit}(C_q)}{\mathit{Risk}(C_q)}.$$

In other words, the fitness value of a chromosome is the total return of all trading pairs divided by their minimum return.
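The fitness computation can be sketched as follows. This illustration assumes that a pair's profit is the sum of its per-transaction return ratios (income over cost, with income meaning the net gain of the transaction), that the risk is the smallest pair profit capped at one, and that the fitness is the total profit divided by the risk, which reproduces the worked example in Section 3.4 (24.72%/1% = 24.72). Returning zero for a chromosome that never trades and rejecting a non-positive risk are assumptions, since the text does not cover these cases; the function names are hypothetical.

```python
def pair_profit(incomes: list[float], costs: list[float]) -> float:
    """profit_h: sum of per-transaction return ratios tpP_t / tpC_t (assumed form)."""
    if len(incomes) != len(costs):
        raise ValueError("incomes and costs must have one entry per transaction")
    return sum(p / c for p, c in zip(incomes, costs))


def fitness(pair_profits: list[float]) -> float:
    """Total profit divided by the risk, where risk = min(smallest profit, 1)."""
    if not pair_profits:
        return 0.0  # ASSUMPTION: a chromosome that never trades gets zero fitness
    total = sum(pair_profits)
    risk = min(min(pair_profits), 1.0)
    if risk <= 0:
        # The text does not say how a non-positive minimum return is handled.
        raise ValueError("non-positive minimum return: fitness undefined in this sketch")
    return total / risk


# Worked example of Section 3.4: profits of 13.57%, 10.15%, and 1% (as fractions).
print(round(fitness([0.1357, 0.1015, 0.01]), 2))  # 24.72
```

Because the fitness is a ratio, expressing returns as fractions or as percentages gives the same value, provided the cap of one is expressed in the same unit.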

Genetic Operations
The crossover and mutation operations are described in this section. First, the max-min-arithmetical (MMA) crossover operator is applied to the population. It is executed as follows: (1) two chromosomes, C_q: [cLimit_q, BBentryWidth_q, BBoutWidth_q, mDay_q, cDay_q, oDay_q] and C_p: [cLimit_p, BBentryWidth_p, BBoutWidth_p, mDay_p, cDay_p, oDay_p], are randomly selected from the population; (2) four new chromosomes are then generated by four operators based on a predefined parameter d as C_new1: [min(cLimit_q, cLimit_p), ..., min(oDay_q, oDay_p)]; C_new2: [max(cLimit_q, cLimit_p), ..., max(oDay_q, oDay_p)]; C_new3: [d × cLimit_q + (1 − d) × cLimit_p, ..., d × oDay_q + (1 − d) × oDay_p]; and C_new4: [(1 − d) × cLimit_q + d × cLimit_p, ..., (1 − d) × oDay_q + d × oDay_p].

A one-point mutation operator is then applied to the population to generate new offspring. Each gene is considered for mutation independently according to the mutation rate; once a gene is selected for mutation, a new value is randomly generated for it within its given range (see Table 2).
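The two operators can be sketched in Python. The min and max candidates follow the text directly; the two arithmetic candidates, d·C_q + (1 − d)·C_p and (1 − d)·C_q + d·C_p, are an *assumption* based on the standard max-min-arithmetical crossover. Integer day genes become non-integer after blending and would need rounding when decoded (also an assumption). The function names and the second parent's values are hypothetical.

```python
import random


def mma_crossover(cq: list[float], cp: list[float], d: float) -> list[list[float]]:
    """Four max-min-arithmetical candidates from parents cq and cp."""
    c_min = [min(a, b) for a, b in zip(cq, cp)]              # C_new1 in the text
    c_max = [max(a, b) for a, b in zip(cq, cp)]              # C_new2 in the text
    c_mix1 = [d * a + (1 - d) * b for a, b in zip(cq, cp)]   # assumed arithmetic candidate
    c_mix2 = [(1 - d) * a + d * b for a, b in zip(cq, cp)]   # assumed arithmetic candidate
    return [c_min, c_max, c_mix1, c_mix2]


def mutate(genes: list[float], ranges: list[tuple[float, float]], m_rate: float,
           rng: random.Random) -> list[float]:
    """Replace each gene, independently with probability m_rate, by a uniform draw from its range."""
    out = list(genes)
    for k, (lo, hi) in enumerate(ranges):
        if rng.random() < m_rate:
            out[k] = rng.uniform(lo, hi)
    return out


cq = [-0.98, 1.0, 0.5, 10, 10, 1]   # C_1 of the worked example
cp = [-0.90, 2.0, 1.0, 20, 30, 3]   # hypothetical second parent
for child in mma_crossover(cq, cp, d=0.7):
    print([round(g, 3) for g in child])
```

Every candidate lies within the box spanned by the two parents, so crossover never produces a gene outside the ranges the parents already respect.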

Proposed AGBCPT
Before describing the proposed AGBCPT, the notation is introduced in Table 3.
The proposed AGBCPT is described below.
Input: Selected companies S = {s_1, s_2, ..., s_i, ..., s_n}, 1 ≤ i ≤ n, where n is the number of companies, and the closing prices of all the companies, with the closing prices of the i-th company represented as

$$CP_{s_i} = \{cp_i^{1}, cp_i^{2}, \ldots, cp_i^{\mathit{dTotal}}\}, \quad 1 \le i \le \mathit{NumCompanies},$$

where dTotal is the last trading day and NumCompanies is the number of companies.
Parameters: Population size pSize, max generation maxGeneration, mutation rate mRate, crossover rate cRate, and parameter for the max-min arithmetical crossover operator d.
Output: Chromosome with highest fitness value bestChro.

STEP 1:
Randomly initialize the population with population size pSize. Each chromosome has six genes: the correlation coefficient threshold (cLimit), the entry width of the Bollinger Bands (BBentryWidth), the out width of the Bollinger Bands (BBoutWidth), the correlation coefficient calculation days (cDay), the moving average calculation days (mDay), and the forward observation days (oDay).

STEP 2:
Use the following steps to calculate the correlation coefficient matrix M_T(n) of the n companies.

STEP 2.1:
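Obtain the historical closing prices CP_{s_i} and CP_{s_j} of two companies s_i and s_j from trading day (T − cDay_q) to (T − 1), according to cDay_q in chromosome C_q, as

$$CP_{s_i} = \{cp_i^{T-\mathit{cDay}_q}, \ldots, cp_i^{T-1}\}, \quad CP_{s_j} = \{cp_j^{T-\mathit{cDay}_q}, \ldots, cp_j^{T-1}\}.$$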

STEP 2.2:
Calculate the correlation coefficient CC_{s_i s_j} of s_i and s_j using (10).

STEP 2.3:
Repeat Steps 2.1 and 2.2 to complete the correlation coefficient matrix M_T(n).
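Steps 2.1 to 2.3 can be sketched as below. The text cites Formula (10) for the correlation coefficient without reproducing it here, so the sketch *assumes* the Pearson correlation coefficient computed over the closing prices of days T − cDay to T − 1. Treating a flat price window as uncorrelated is a further assumption that avoids division by zero, and only the upper triangle is stored because the matrix is symmetric. The function names are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient (assumed form of Formula (10))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # ASSUMPTION: a flat price window is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def correlation_matrix(prices: dict[str, list[float]], t: int,
                       c_day: int) -> dict[tuple[str, str], float]:
    """CC for every company pair on day t, using closing prices of days t - c_day .. t - 1."""
    names = sorted(prices)
    windows = {s: prices[s][t - c_day:t] for s in names}
    matrix = {}
    for k, s_i in enumerate(names):
        for s_j in names[k + 1:]:          # upper triangle; the matrix is symmetric
            matrix[(s_i, s_j)] = pearson(windows[s_i], windows[s_j])
    return matrix


prices = {"A": [1, 2, 3, 4, 5, 9], "B": [2, 4, 6, 8, 10, 1], "C": [5, 4, 3, 2, 1, 7]}
print(correlation_matrix(prices, t=5, c_day=5))
# {('A', 'B'): 1.0, ('A', 'C'): -1.0, ('B', 'C'): -1.0}
```

The price on day t itself (the last value of each list) is excluded from the window, matching the T − cDay to T − 1 range of Step 2.1.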

STEP 3:
Use the following steps to select the stock pairs whose CC_{s_i s_j} does not exceed cLimit_q and then calculate each stock pair's entry and exit bands according to BBentryWidth_q, BBoutWidth_q, and mDay_q of chromosome C_q.

STEP 3.1:
Generate the trading pair candidate set TPset = {tp(s_i, s_j) | CC_{s_i s_j} ≤ cLimit_q}, where cLimit_q is the correlation coefficient threshold from chromosome C_q.

STEP 3.2:
Obtain the closing prices of both s_i and s_j of each tp(s_i, s_j) in TPset from trading day (T − mDay_q) to (T − 1) as

$$CP_{s_i} = \{cp_i^{T-\mathit{mDay}_q}, \ldots, cp_i^{T-1}\}, \quad CP_{s_j} = \{cp_j^{T-\mathit{mDay}_q}, \ldots, cp_j^{T-1}\}.$$

STEP 3.3–3.4:
Calculate the moving averages of s_i and s_j from these prices, and use them with BBentryWidth_q to calculate the entry upper and lower bands of s_i and s_j on day T based on Formulas (1) and (2).

STEP 3.5:
Use the moving average value and BBoutWidth_q to calculate the exit upper and lower bands of s_i and s_j on day T based on Formulas (3) and (4).
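Step 3 can be sketched as a threshold filter followed by a per-stock band calculation. This is illustrative only: `select_pairs` and `entry_exit_bands` are hypothetical names, the population standard deviation is assumed, and the key `UB` for the upper entry band is a hypothetical label mirroring the text's `LB`, `US`, and `LS`.

```python
from statistics import fmean, pstdev


def select_pairs(corr: dict[tuple[str, str], float], c_limit: float) -> list[tuple[str, str]]:
    """STEP 3.1: TPset = {tp(s_i, s_j) | CC_{s_i s_j} <= cLimit}."""
    return [pair for pair, cc in corr.items() if cc <= c_limit]


def entry_exit_bands(prices: list[float], t: int, m_day: int,
                     entry_w: float, out_w: float) -> dict[str, float]:
    """STEPS 3.2-3.5 for one stock: MA over days t - mDay .. t - 1, then both band pairs.

    ASSUMPTION: population standard deviation over the same window.
    """
    window = prices[t - m_day:t]
    ma, sd = fmean(window), pstdev(window)
    return {"UB": ma + entry_w * sd, "LB": ma - entry_w * sd,   # entry bands
            "US": ma + out_w * sd, "LS": ma - out_w * sd}       # exit bands


corr = {("A", "B"): 0.90, ("A", "C"): -0.99, ("B", "C"): -0.50}
tp_set = select_pairs(corr, c_limit=-0.98)
print(tp_set)  # [('A', 'C')]
prices = {"A": [10, 11, 12, 11, 10, 9], "C": [20, 19, 18, 19, 20, 21]}
for s_i, s_j in tp_set:
    print(s_i, entry_exit_bands(prices[s_i], t=5, m_day=4, entry_w=1.0, out_w=0.5))
    print(s_j, entry_exit_bands(prices[s_j], t=5, m_day=4, entry_w=1.0, out_w=0.5))
```

With a negative cLimit such as −0.98, only strongly negatively correlated pairs survive the filter.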

STEP 4:
Use the following steps to determine whether to start pairs trading.
Condition 2: The closing price of stock s_j crosses the lower band of buy (LB) upward between day T − oDay_q and day T, i.e., $cp_j^{T-\mathit{oDay}_q} < LB_j(T) < cp_j^{T}$. When the two entry conditions are met, as shown in Figure 2, s_i is expected to continue to fall and s_j to continue to rise. Hence, short s_i and long s_j.
Step 4.2: If $cp_i^{T} > cp_j^{T}$, short one unit of stock s_i and buy an integer number of units of stock s_j at approximately the same cost as the s_i position; otherwise, buy one unit of stock s_j and short an integer number of units of stock s_i at approximately the same cost as the s_j position.
Step 4.3: Record the cost tpC(s_i, s_j) of this trading pair tp(s_i, s_j).
Step 4.4: Remove tp(s_i, s_j) from TPset if the entry conditions are not met.
Step 4.5: Go to Step 4.1 to determine the entry conditions of the next pair in TPset.

STEP 5:
Use the following steps to determine whether to close each trading pair in TPset.

STEP 6:
If T + 1 < dTotal (that is, the stop condition is not met), set T = T + 1 and go to Step 2 to continue the entry and exit judgment. Otherwise, go to Step 7.
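The equal-cost sizing rule of Step 4.2 can be expressed as a small function. The worked example in Section 3.4 indicates that on trading Day 11 S2412 (the stock bought) closes at 59 and S1102 (the stock shorted) at 27, so S2412 receives one unit and S1102 receives round(59/27) = 2 units. The function name is hypothetical, and Python's `round()` sends exact halves to the even integer, which is an assumption because the text only says the ratio is rounded.

```python
def position_sizes(price_i: float, price_j: float) -> tuple[int, int]:
    """Return (units of s_i to short, units of s_j to buy) at roughly equal cost.

    The dearer stock gets one unit; the cheaper one gets the price ratio
    rounded to an integer (round-half-to-even, an assumption).
    """
    if price_i <= 0 or price_j <= 0:
        raise ValueError("prices must be positive")
    if price_i > price_j:
        return 1, round(price_i / price_j)
    return round(price_j / price_i), 1


# Worked example, trading Day 11: S1102 (s_i, shorted) at 27 and S2412 (s_j, bought) at 59.
short_units, long_units = position_sizes(27, 59)
print(short_units, long_units)  # 2 1
```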

STEP 7:
Evaluate the fitness value of the chromosome from the total profit and the risk of all trading pairs, as defined in the Fitness Function section.

STEP 8:
Repeat Steps 2 to 7 until the fitness value of every chromosome in the population is calculated.

STEP 9:
If the stop condition Generation = maxGeneration is met, terminate the evolution process and go to Step 14. Otherwise, set Generation = Generation + 1 and go to Step 10.

STEP 10:
Execute tournament selection to generate the next population.

STEP 10.1:
Select two chromosomes randomly from the population and compare their fitness values; the chromosome with the higher fitness value is kept for the next population.

STEP 10.2:
Repeat Step 10.1 until pSize chromosomes have been selected.

STEP 11:
Execute the MMA crossover operator with parameter d and crossover rate cRate.

STEP 12:
Execute the mutation operator with mutation rate mRate to generate new offspring.

STEP 13:
Go to Step 2 to evaluate the fitness of the new chromosomes.
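The evolutionary loop of Steps 1 and 9 to 14 can be sketched around a pluggable fitness function, with a toy objective standing in for the trading simulation of Steps 2 to 8. Several details are *assumptions* not fixed by the text: crossover is applied to consecutive pairs of the selected population, the two fittest of the four MMA candidates replace the parents, the two arithmetic MMA candidates use the standard d-weighted blends, and the best chromosome seen in any generation is returned. Genes are treated as plain floats (no rounding of day genes), and all names are hypothetical.

```python
import random


def mma(cq, cp, d):
    """Four max-min-arithmetical candidates (the two arithmetic ones are assumed)."""
    return [
        [min(a, b) for a, b in zip(cq, cp)],
        [max(a, b) for a, b in zip(cq, cp)],
        [d * a + (1 - d) * b for a, b in zip(cq, cp)],
        [(1 - d) * a + d * b for a, b in zip(cq, cp)],
    ]


def run_ga(fitness, ranges, p_size, max_generation, c_rate, m_rate, d, seed=0):
    """Steps 1 and 9-14 around a pluggable fitness function (Steps 2-8)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in ranges] for _ in range(p_size)]  # Step 1
    best, best_fit = None, float("-inf")
    generation = 1
    while True:
        fits = [fitness(c) for c in pop]                  # Steps 2-8
        for c, f in zip(pop, fits):                       # ASSUMPTION: keep best-so-far
            if f > best_fit:
                best, best_fit = list(c), f
        if generation == max_generation:                  # Step 9
            return best, best_fit                         # Step 14
        generation += 1
        # Step 10: binary tournament selection until pSize chromosomes are chosen.
        nxt = []
        for _ in range(p_size):
            a, b = rng.randrange(p_size), rng.randrange(p_size)
            nxt.append(list(pop[a] if fits[a] >= fits[b] else pop[b]))
        # Step 11: MMA crossover on consecutive pairs with probability cRate.
        # ASSUMPTION: the two fittest of the four candidates replace the parents.
        for k in range(0, p_size - 1, 2):
            if rng.random() < c_rate:
                cands = sorted(mma(nxt[k], nxt[k + 1], d), key=fitness, reverse=True)
                nxt[k], nxt[k + 1] = cands[0], cands[1]
        # Step 12: each gene mutates independently with probability mRate.
        for c in nxt:
            for g, (lo, hi) in enumerate(ranges):
                if rng.random() < m_rate:
                    c[g] = rng.uniform(lo, hi)
        pop = nxt                                         # Step 13: evaluate again


def toy(c):
    """Toy fitness standing in for the trading simulation: peak at (0.3, 2.0)."""
    return -((c[0] - 0.3) ** 2 + (c[1] - 2.0) ** 2)


best, best_fit = run_ga(toy, [(0.0, 1.0), (0.0, 4.0)], p_size=20, max_generation=50,
                        c_rate=0.8, m_rate=0.1, d=0.7)
print([round(g, 3) for g in best], round(best_fit, 5))
```

Keeping the best chromosome across generations guards against losing a good solution to an unlucky tournament or mutation, at no extra fitness evaluations.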

STEP 14:
Output the chromosome with the highest fitness value as the best chromosome bestChro.

AGBCPT Example
In this section, the stock price series of the six companies in Table 4 are used as the input dataset to demonstrate AGBCPT. Each stock price series contains thirteen stock prices.

10, 9.75, 9.5, 9, 8.5, 9, 8.5, 8, 7.75, 7.5, 7.5

The parameters used in this example are as follows: the population size was set to 5, the parameter d of the MMA crossover operator to 0.7, and the crossover and mutation rates to 0.8 and 0.1, respectively. The example is explained step by step below.

STEP 1:
The population is initialized. Since pSize is 5, the initial population is randomly generated according to the encoding scheme and the predefined parameter ranges. Take C_1 as an example: its six parameters are generated as [−0.98, 1.0, 0.5, 10, 10, 1]. The rest of the initial population is formed in the same way and is shown in Table 5.

STEP 3:
The cLimit_q value is used to find the qualified stock pairs, and BBentryWidth_q, BBoutWidth_q, and mDay_q are used to generate the entry and exit bands.

STEP 4:
The oDay_q value and the entry upper and lower bands are used to determine whether each trading pair tp(s_i, s_j) in TPset meets the conditions to enter the market. Since the entry conditions are met, S1102 is expected to continue to fall and S2412 to continue to rise; the pairs trading strategy therefore shorts S1102 and longs S2412. The conditions are shown in Figure 4. Step 4.2: Because the closing price of S2412 on trading Day 11 (59) is greater than that of S1102 (27), their ratio (59/27 ≈ 2.19) is rounded to 2. Hence, the trading strategy longs one unit of S2412 and shorts two units of S1102, so that the capital invested in the two positions is nearly equal.

The upper and lower exit bands are then used to determine whether tp(S1102, S2412) has met the conditions to exit the market. Step 5.5: The number of transactions is updated as totalSell = totalSell + 1. The trading stop condition is then checked: if T + 1 < dTotal, T is set to T + 1, Step 2 is executed, and the entry and exit judgment continues for the next trading day. Otherwise, Step 7 is executed.

STEP 7:
Because the three profits of the trading pair are 13.57%, 10.15%, and 1%, the risk of the trading pair is 1%, the minimum of the three profits. The fitness value of C_1 is then the total profit divided by the risk: (13.57% + 10.15% + 1%)/1% = 24.72%/1% = 24.72.

STEP 8:
Steps 2 to 7 are repeated to calculate the fitness values of all chromosomes, yielding the results shown in Table 7.

STEP 9:
If the stop condition is met, go to Step 14; otherwise, Step 10 is executed to generate the next population.

STEP 10:
Tournament selection is used to generate the next population.

STEP 10.1:
Take the two chromosomes shown in Table 8 as an example. Because the fitness value of C_1 is greater than that of C_4, C_1 is retained for the next population.

STEP 10.2:
Step 10.1 is repeated until the number of chromosomes equals 5.

STEP 11:
The MMA crossover operator is applied to generate offspring, with the MMA parameter d and the crossover rate cRate set to 0.7 and 0.8, respectively. For every two chromosomes, four new chromosomes are generated as candidate offspring. Taking chromosomes C_1 and C_2 as an example, the final offspring after crossover are shown in Table 9.

STEP 12:
The one-point mutation operator is applied to generate new offspring according to the mutation rate.

STEP 13:
After the crossover and mutation operators are executed, Steps 2 to 8 are used to calculate the fitness values of the new chromosomes.

STEP 14:
The chromosome with the highest fitness value is output. In this example, according to Table 7, C1 = [−0.98, 1.0, 0.5, 10, 10, 1] is selected and output as the parameters for trading.
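The one-point mutation of Step 12 can be sketched as redrawing a single randomly chosen gene. This assumes the gene order [cLimit, BBentryWidth, BBoutWidth, cDay, mDay, oDay], inferred from C1 and the default settings, and uses hypothetical gene ranges, since the real bounds are not given in this excerpt.

```python
import random

# Hypothetical gene ranges in the assumed order
# [cLimit, BBentryWidth, BBoutWidth, cDay, mDay, oDay].
GENE_RANGES = [(-1.0, 0.0), (0.5, 3.0), (0.0, 2.0), (5, 20), (5, 20), (1, 3)]
INTEGER_GENES = {3, 4, 5}  # day counts stay integers


def one_point_mutation(chromosome, m_rate, rng):
    """With probability m_rate, redraw one randomly chosen gene within its range."""
    child = list(chromosome)
    if rng.random() < m_rate:
        k = rng.randrange(len(child))
        low, high = GENE_RANGES[k]
        if k in INTEGER_GENES:
            child[k] = rng.randint(low, high)
        else:
            child[k] = rng.uniform(low, high)
    return child


c1 = [-0.98, 1.0, 0.5, 10, 10, 1]
print(one_point_mutation(c1, 1.0, random.Random(3)))
```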

Experimental Results and Discussion
In this section, we describe experiments conducted to show the effectiveness of the proposed approach, and discuss the results. The experimental dataset consisted of the stock price series of companies selected from the top 50 companies in the Taiwan stock market, listed on the Taiwan Stock Exchange (TSE), covering 1 January 2009 to 31 December 2020. The stock price series are shown in Figure 5. In Figure 5, most stock prices fall between 0 and 100, some between 100 and 400, and only three exceed 400. In addition to the stock price series, the distribution of correlation coefficients between companies, with cDay set to 20, is shown in Figure 6. In Figure 6, the ratio of the number of stock pairs with positive correlation coefficients to the number with negative ones is 3.32 (= 2,121,549/638,879); that is, negatively correlated stock pairs are far fewer than positively correlated ones. Note that the correlation coefficient distribution may be affected by cDay and by the period of the dataset.
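The sign ratio reported for Figure 6 can be reproduced in outline. A minimal sketch, assuming each coefficient is a Pearson correlation of two stocks' closing prices over a sliding window of cDay trading days (the exact formula is not restated in this excerpt); the function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length price windows."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a flat window is treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def rolling_correlations(x, y, c_day):
    """Correlation over every window of c_day consecutive trading days."""
    return [pearson(x[t - c_day:t], y[t - c_day:t]) for t in range(c_day, len(x) + 1)]


def sign_ratio(coefficients):
    """Ratio of positive to negative coefficients, as reported for Figure 6."""
    pos = sum(1 for r in coefficients if r > 0)
    neg = sum(1 for r in coefficients if r < 0)
    return pos / neg if neg else math.inf


print(rolling_correlations([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], 3))  # [-1.0, -1.0, -1.0]
```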
To show the effectiveness of the proposed approach, we conducted three experiments on the following: (1) the impact of the three new parameters on the pairs trading strategy; (2) the impact of the proposed approach under different stock trends; and (3) a comparison of AGBCPT and GBCPT.

Impact of Three New Parameters on Pairs Trading Strategy
To observe their impact on the pairs trading strategy, we varied the three new parameters: the correlation coefficient calculation days (cDay), the moving average calculation days (mDay), and the forward observation days (oDay). In the experiments, we adjusted one parameter at a time while keeping the others at their default values. The default values for cDay, mDay, and oDay were 10, 10, and 1, respectively, and cLimit, BBentryWidth, and BBoutWidth were set to −0.73, 2.3, and 1.
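The one-parameter-at-a-time design above can be expressed as a configuration generator. This sketch takes the defaults from the text; the value grids are assumed from the settings discussed for Figures 7 to 10 (oDay = 2 is an assumption, since only 1 and 3 are named), and the backtest that would consume each configuration is not shown.

```python
DEFAULTS = {"cDay": 10, "mDay": 10, "oDay": 1,
            "cLimit": -0.73, "BBentryWidth": 2.3, "BBoutWidth": 1}

# Assumed value grids; oDay = 2 is not named in the text.
SWEEP = {"cDay": [5, 10, 15, 20], "mDay": [5, 10, 15, 20], "oDay": [1, 2, 3]}


def one_at_a_time(defaults, sweep):
    """Yield (parameter, value, config) with one parameter varied from its default."""
    for name, values in sweep.items():
        for value in values:
            config = dict(defaults)
            config[name] = value
            yield name, value, config


for name, value, config in one_at_a_time(DEFAULTS, SWEEP):
    print(name, value, config)
```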

Figure 7 shows that when cDay is set to 5 and 10, the pairs trading strategy yields positive profits, with the best profit of 4.25%. When cDay is set to 15 and 20, it yields negative profits, with the worst profit of −18.42% when cDay is set to 20.
For parameter mDay, the pairs trading strategy yields positive profits when mDay is set to 10 and 15, and negative profits when mDay is set to 5 and 20, with the worst profit of −7.69% when mDay is set to 5. For parameter oDay, the best profit is 4.25% when oDay is set to 1; the profit decreases as oDay increases and becomes negative when oDay is set to 3. Hence, for long training periods, the suggested settings for cDay, mDay, and oDay are 10, 10, and 1.
In Figure 8, when cDay is set to 5, 10, and 15, the pairs trading strategy yields zero or negative profits on the three-year dataset; the best profit, 3.1%, is obtained when cDay is set to 20. For parameter mDay, the best profit is 27.38% when mDay is set to 15. For parameter oDay, the profit increases as oDay increases, with the best profit of 2.97% when oDay is set to 3. As a result, for the three-year training period, the suggested settings for cDay, mDay, and oDay are 20, 15, and 3.
From Figure 9, we see that on the two-year dataset, the worst profit is −5.62% when cDay is set to 5. For parameter mDay, the best profit is 13.32% when mDay is set to 5, and the profits are 0%, 1.25%, and 3.61% when mDay is set to 10, 15, and 20, respectively. For oDay, the profit increases as oDay increases; the best profit is 4.83% when oDay is set to 3. Hence, for the two-year training period, the suggested settings for mDay and oDay are 5 and 3. For cDay, however, additional experiments are needed to determine a suitable setting.
The results in Figure 10 are as follows. For parameter cDay, the best profit is 6.97% when cDay is set to 10; when it is set to 5, 15, or 20, the profits are around zero. For parameter mDay, the profits are 1.82%, 6.97%, and 3.41% when mDay is set to 5, 10, and 15, respectively. For oDay, the best profit is 6.97% when oDay is set to 1, and the worst is −0.93% when oDay is set to 3. Thus, for short training periods, the suggested settings for cDay, mDay, and oDay are 10, 10, and 1.
Notably, the results show that the parameter settings influence the profit of the pairs trading strategy and that the best settings differ across training periods. Determining suitable parameters for the pairs trading strategy is therefore a difficult optimization problem. We thus use AGBCPT to determine parameters that yield better performance for the pairs trading strategy.

Impact of AGBCPT under Different Stock Trends
To show the effectiveness of the proposed approach under different trends, we conducted experiments using upward-trend, correction-trend, and downward-trend testing datasets. AGBCPT was compared with the buy-and-hold method (BAH), which buys all of the stocks on the first trading day, sells them all on the last trading day, and then calculates the profit of the transactions. As shown in Figure 11, the datasets used in this experiment were selected to correspond to Taiwan stock market trends.
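The BAH baseline described above can be sketched in a few lines. This assumes one unit of each stock is bought, so the return is total sale proceeds over total cost, and ignores transaction costs; the excerpt does not state the weighting, and an equal-capital weighting would give different numbers. The prices in the example are hypothetical.

```python
def buy_and_hold_return(first_closes, last_closes):
    """Percentage return of buying every stock on the first trading day and
    selling them all on the last trading day.

    Assumption: one unit of each stock, no transaction costs.
    """
    cost = sum(first_closes)
    proceeds = sum(last_closes)
    return 100.0 * (proceeds - cost) / cost


# Hypothetical two-stock example.
print(buy_and_hold_return([10, 10], [12, 13]))  # 25.0
```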

The trends were chosen as the following testing intervals: (1) 2020 was selected as the upward-trend dataset, (2) 2012 was selected as the correction-trend dataset, and (3) 2015 was selected as the downward-trend dataset. The three corresponding training and testing periods are shown in Table 10.

Figure 11. Taiwan stock market trends.
The training results of the AGBCPT and BAH methods are shown in Figure 12. From Figure 12, we observe that both methods yield positive profits in all three training periods. The profits of AGBCPT for the three trends are 50.05%, 58.32%, and 26.39%, all higher than those of BAH. The profits of the trained models on the testing datasets are shown in Figure 13. From Figure 13, we observe that in the testing phase on the upward-trend dataset, the 9.86% profit of AGBCPT is higher than that of BAH (5.98%).
On the correction-trend dataset, the two methods yield similar profits: 6.25% for AGBCPT and 6.37% for BAH. On the downward-trend dataset, AGBCPT yields no profit in the testing period (0%), but it still outperforms BAH, whose profit is −17.01%. This shows that AGBCPT reduces risk on a downward-trend dataset.

Comparison of AGBCPT and GBCPT
In this section, we compare the proposed AGBCPT method with the previous GBCPT method [28] (see Table 11). The profits of AGBCPT and GBCPT in the training phase are shown in Figure 14. In Figure 14, both AGBCPT and GBCPT yield positive profits in the training phase, with AGBCPT achieving higher profits than GBCPT. The profits of the trained models on the testing datasets are shown in Figure 15. Figure 15 shows that the profits of the three trained models are 3.15%, 0.91%, and 0% for AGBCPT, and −2.33%, 2.46%, and −1.47% for GBCPT. The GBCPT profits are negative on the one-year and three-year datasets, whereas the AGBCPT profits are non-negative on all three. Thus, we conclude that the AGBCPT fitness function, which accounts for risk, reduces the possibility of negative profits in the testing period. Next, to show the profitability of AGBCPT, experiments were conducted on the one-year training dataset (2015) and the one-year (2016), two-year (2016-2017), and three-year (2016-2018) testing datasets; the results for the training period are shown in Figure 16.
In Figure 16, the 21.9% profit of AGBCPT is higher than the 13.39% of GBCPT. Figure 17 compares the two approaches on the testing periods in terms of profit. Figure 17 shows that in the one-year testing period, neither method yields a profit. In the two-year testing period, the 16.34% profit of AGBCPT is higher than the 1.35% of GBCPT, and in the three-year testing period, the 20.26% profit of AGBCPT is higher than the 8.26% of GBCPT. From these results, we conclude that AGBCPT is effective and profitable for medium- to long-term trading.

Discussion
In this section, we discuss how to improve the efficiency of the proposed AGBCPT method, how to make the derived trading strategy more profitable and stable, and the applications of AGBCPT.

For the first issue, the main differences between the proposed AGBCPT and the compared GBCPT are the encoding scheme and the fitness function. In AGBCPT, six parameters are encoded into a chromosome, and the return and risk are jointly considered to evaluate the fitness value of a chromosome. In GBCPT, three parameters are used, and every chromosome is evaluated only by return. Hence, AGBCPT takes slightly longer to execute than GBCPT. The execution time of AGBCPT is 43,741 s (about 12 h) with a population size of 60 and 40 stocks, which is time-consuming. The efficiency of AGBCPT could be improved via soft computing techniques or hardware. For example, chromosomes could be clustered into groups; for every group, the fitness value of a selected representative chromosome could be calculated and assigned to the other chromosomes in the same group. With k-means clustering, only k chromosomes would need their fitness values calculated, reducing the time cost. Alternatively, a graphics processing unit (GPU) could be used to speed up the calculations.
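The clustering idea above is proposed but not evaluated in the text. A minimal sketch, assuming chromosomes are numeric gene lists and `evaluate` is a hypothetical stand-in for the costly backtest of Steps 2 to 7; it uses plain Lloyd's k-means and takes, as each group's representative, the member closest to the centroid (one reasonable choice among several). In practice the genes would need scaling first, since day counts would otherwise dominate the distances.

```python
import math
import random


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def approximate_fitness(population, k, evaluate, rng):
    """Evaluate one representative per cluster and share its fitness."""
    labels, centroids = kmeans(population, k, rng)
    fitness = [None] * len(population)
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        rep = min(members, key=lambda i: math.dist(population[i], centroids[c]))
        value = evaluate(population[rep])  # the only costly call per cluster
        for i in members:
            fitness[i] = value
    return fitness
```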
For the second issue, the proposed approach uses the correlation coefficients of stocks and Bollinger Bands to identify trading pairs and signals. However, other factors could be considered to increase profitability and reduce risk. For instance, company fundamentals, e.g., earnings per share or the P/E ratio, could be used as a filter to avoid high-risk stocks. In addition, industry information could be incorporated by using correlation coefficients between industries to identify industry relationships, yielding more profitable and stable trading pairs.
As to applications, with the growing popularity of program trading in recent years, AGBCPT could be packaged as a module that programmers use to design trading procedures that generate trading signals automatically. In addition, from a customer relationship management point of view, securities companies could embed AGBCPT in their trading systems as a function that provides users with more information, which may increase customer loyalty.

Conclusions and Future Work
Trading strategies are commonly used to find buying or selling signals. One type of trading strategy is the pairs trading strategy. In the past, the parameters of pairs trading strategies were usually set by experience, which is typically time-consuming. In this paper, negatively correlated trading pairs, genetic algorithms, and Bollinger Bands are combined in AGBCPT, the proposed advanced genetic Bollinger Bands and correlation-coefficient-based pairs trading algorithm, to determine appropriate parameters for the long-short pairs trading strategy. To verify the effectiveness of AGBCPT, experiments were conducted on real datasets. They show that the parameters considered in pairs trading do affect the profitability of the pairs trading strategy; that AGBCPT yields profits comparable or superior to those of BAH across three stock market trends and generally higher than those of GBCPT across various training and testing periods; and that the fitness function used in AGBCPT reduces the trading risk of the trained model compared with that of the previous approach. In addition, securities companies could use AGBCPT as a module or function to provide users with more information and increase customer loyalty. In the future, we will enhance the proposed approach in the following directions: (1) enhancing the pairs trading optimization by adding more stocks to the dataset to identify more potentially profitable pairs; (2) using other algorithms in pairs trading strategies to determine better parameter settings for more complex financial problems; (3) using statistical tests to verify whether AGBCPT is significantly better than existing approaches, or comparing it with other pairs trading algorithms to identify its merits; and (4) considering industry relations among stocks to group stocks, generating more profitable trading pairs.