The Complexity Behavior of Big and Small Trading Orders in the Chinese Stock Market

The Chinese stock market exhibits many characteristics that deviate from the efficient market hypothesis and the trading volume contains a great deal of complexity information that the price cannot reflect. Do small or big orders drive trading volume? We studied the complex behavior of different orders from a microstructure perspective. We used ETF data of the CSI300, SSE50, and CSI500 indices and divided transactions into big and small orders. A multifractal detrended fluctuation analysis (MFDFA) method was used to study persistence. It was found that the persistence of small orders was stronger than that of big orders, which was caused by correlation with time. A multiscale composite complexity synchronization (MCCS) method was used to study the synchronization of orders and total volume. It was found that small orders drove selling-out transactions in the CSI300 market and that big orders drove selling-out transactions in the CSI500 market. Our findings are useful for understanding the microstructure of the trading volume in the Chinese market.


Introduction
Many transactions take place in the stock market every day, and these transactions cause market fluctuations. The Chinese stock market has many characteristics that deviate from an efficient market [1,2], and financial anomalies, such as momentum and reversal effects, show that price does not fully reflect market information. As a direct description of stock trading, the trading volume contains a great deal of complexity information that price cannot reflect, including fundamental stock trading information [3] and long-term stock market performance data [4]. As a technical indicator, the trading volume can add information on price and earnings [5,6], reflect volatility information about the stock [7], and provide international earnings spillover information [8]. Many volume studies have linked volume to expected returns. The volume-price relationship is a key issue in the stock market [9,10], and the correlation between volume and price changes with time [11]. The relationship between volume and price volatility has also been studied as it affects the stock market [12], the money market [13], and the foreign exchange market [14]. Some researchers have suggested that risk exposure [15] and household belief dispersion [16] are related to the trading volume of stocks. The driving force behind the trading volume is a problem that has concerned many investigators [17,18]. Some studies take the investor's perspective and consider which types of investors are the main drivers of trading volume. Buying and selling-out volume have also been investigated separately [19]. In the present study, we examine the information contained in the volume microstructure. The financial market is similar to a physical system made up of numerous interacting agents, and many economists have used physical methods to study its complexity [20].
To investigate the complex information contained in trading volume data, we applied multifractal and entropy methods.
Peters' fractal market hypothesis states that the stock price follows fractional Brownian motion and challenges the strict assumption of an efficient market [21]. Hurst found that the current value of a time series affects its future value in a way that transcends random perturbations [22]. That study defined this phenomenon as the long memory of the time series and proposed rescaled range (R/S) analysis to measure it. Mandelbrot proposed fractional Brownian motion (FBM) in 1968, developing a model that, combined with the Hurst index, forms a well-established research system [23]. Peng proposed the detrended fluctuation analysis (DFA) method and applied it to the investigation of DNA sequences [24]. This method is more accurate and easier to implement than the R/S method and has gradually become the mainstream method for measuring long-term memory. Kantelhardt proposed the multifractal detrended fluctuation analysis (MFDFA) method as an improvement of DFA, extending it to multiple fluctuation orders so that it corresponds more closely to the reality of financial data [25]. Economists have applied long memory to non-stationary time series in the financial market and used multifractal methods to study the stock market [26,27], the futures market [28], the foreign exchange market [29], and the Bitcoin market [30,31]. Thompson verified that the MFDFA method gives an improved fit when measuring the generalized Hurst indices and multifractal spectra of financial time series [32].
In 1948, Shannon proposed the concept of information entropy and suggested that the greater the entropy, the greater the uncertainty of the variable, and the greater the amount of information required. Many scholars have extended the entropy method and proposed new entropy measurement methods, such as Deng entropy [33], dispersion entropy [34], and multiscale entropy [35]. The application of entropy in finance is based on information entropy. In 1972, Philippatos and Wilson first applied entropy to finance, building portfolios with minimal entropy, and received considerable returns [36]. The concept of entropy is applied in options pricing [37], risk measurement [38], and utility calculations [39]. In the stock market, entropy is used in the study of volume heterogeneity [40], volatility forecasting [41,42], and the investigation of stock market regularity during turbulence periods [43]. In recent years, entropy has been used to measure the similarity between financial markets [44], to study the synchronization of stock returns [45], and to determine the similarity between stock and commodity markets [46]. Pincus proposed an approximate entropy method [47] to measure time series complexity. Richman proposed a more accurate sample entropy method based on the approximate entropy [48]. Costa introduced a coarse-graining procedure to assess multiscale entropy at multiple timescales [49]. Xu researched the complexity problem of time series. Their study combined the entropy measurement method with the complexity invariant distance (CID) and proposed a multiscale composite complexity synchronization (MCCS) approach, which can replace cross-sample entropy when calculating the similarity of different time series [50].
There have been many investigations regarding the microstructure of the stock market. The trading volume is often included in microstructure research because it has a driving effect on volatility, opening price, and earnings spillover [5,7,8,51]. Suominen used a game model to study the information content of the trading volume [52], while Wang studied the circuit breakers and volume structure of the Chinese stock market [53]. Ormos studied the impact of the financial crisis on the microstructure of the trading volume [54]. Xu first investigated the microstructure of the Chinese stock market in terms of volume and volatility [55], while Covrig and Ng studied the driving forces behind trading volume from an investor perspective [18]. Alvarez-Ramírez and Rodríguez used the DFA method to study the temporal correlation of trading volume [56]. Lee observed a stronger cross-correlation in stock buy volume from a buy and sell trade perspective [19]. Previous studies investigated only the microstructure of trading volume, whereas we apply econophysics methods to investigate the complexity behavior of trading volume. For the benefit of the reader, Table 1 summarizes previous studies and highlights the contribution of this paper. Our hypothesis, which we discuss in the paper, is that the driving influences of big and small orders on volume differ, that they are affected by time correlation, and that selling transactions are more likely to have a clear driving force. We aim to study the complexity behavior of different orders to find which orders drive the trading volume of the Chinese stock market. We investigate the fractal characteristics of trading volume and the complexity synchronization between the total volume and trading orders. We use the ETF data of the CSI 300, the SSE 50, and the CSI 500 indices from the Chinese stock market.
We divide transactions into buying transactions and selling-out transactions and discuss their complexity characteristics. The MFDFA method is used to obtain a multifractality index and the fractal spectrum of the volume, and the fractal characteristics of the different markets are compared. We also plot the multifractality curves and use the shuffled sequences to analyze the origin of the fractal characteristics of the three markets. We find that small orders are persistent. On this basis, we analyze the complexity synchronization of the orders and total volume. We find that transactions are influenced by different orders in different markets. For selling-out transactions, small orders are the leading force in the CSI300 market, while big orders are the main force in the CSI500 market. This feature is caused by the correlation of the time series.
The paper is organized as follows: Section 2 describes the ETF data and order transaction indicators and provides detailed MFDFA and MCCS calculations. Section 3 provides a simple explanatory analysis of the data, presents descriptive statistics, and discusses their characteristics. Section 4 discusses the fractal characteristics and complexity synchronization characteristics of orders and relates them to the findings of earlier studies. Section 5 discusses the practical value of the results for investment, considers the transferability of the methodology, and suggests future research directions.

Data
An exchange-traded fund (ETF) tracks a sector index or a market index and can be purchased and redeemed during trading hours or traded in the market like a stock. Investors can use ETFs for arbitrage; in the Chinese market, ETFs cannot be short sold. This paper uses data from the CSI300 ETF, SSE50 ETF, and CSI500 ETF from 1 January 2018 to 30 May 2022. The CSI300 is the most representative index of the Chinese stock market and reflects the overall situation of China's A-share market. The SSE50 is composed of the 50 stocks with the highest dividend yield and the largest cash dividend and reflects the performance of leading enterprises in the Chinese market. The CSI500 reflects the stock performance of growth companies in the Chinese market. For each of the three indices, we selected the largest ETF: Huatai-PB CSI300 ETF (510300), ChinaAMC SSE50 ETF (510050), and ChinaSouthern CSI500 ETF (510500).
The data include the daily total volume and the daily trading order volume; we select the initiative buying and initiative selling-out volume. An initiative buy is a trade struck at the lowest price among the outstanding sell orders; an initiative sell-out is a trade struck at the highest price among the outstanding buy orders. For the total volume, we use the total number of shares traded in buying and selling-out. We divide trading orders into small orders and big orders. Transactions with a turnover of less than 40,000 yuan represent small orders, and transactions with a turnover of more than 1 million yuan represent big orders. The data were obtained from the Wind financial terminal.
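The order-size rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the tuple layout of a transaction, and the "other" label for turnovers between the two thresholds are hypothetical and do not come from the Wind data format; only the two thresholds are taken from the text.

```python
from collections import defaultdict

SMALL_MAX_YUAN = 40_000      # turnover below this is a small order (from the text)
BIG_MIN_YUAN = 1_000_000     # turnover above this is a big order (from the text)


def classify_order(turnover_yuan: float) -> str:
    """Label one transaction by its turnover in yuan."""
    if turnover_yuan < SMALL_MAX_YUAN:
        return "small"
    if turnover_yuan > BIG_MIN_YUAN:
        return "big"
    return "other"  # hypothetical label; the text does not name this range


def daily_order_volume(transactions):
    """Sum traded shares per (side, size class).

    `transactions` is an iterable of (side, turnover_yuan, shares) tuples,
    where side is "buy" or "sell". This layout is an assumption.
    """
    totals = defaultdict(float)
    for side, turnover_yuan, shares in transactions:
        totals[(side, classify_order(turnover_yuan))] += shares
    return dict(totals)
```

For example, `daily_order_volume([("buy", 25_000.0, 6_000), ("sell", 2_000_000.0, 450_000)])` yields one small buying entry and one big selling-out entry, which correspond to two of the four order types studied in the paper.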

MFDFA Method
The MFDFA method divides time series into sub-intervals and eliminates local trends, fits a residual sequence function, calculates the generalized Hurst exponent, and expresses the multifractal characteristics of time series through the power law correlation of functions and multifractal spectra. The MFDFA method is briefly described below.
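Given a sample time series $x(t)$, $t = 1, 2, \ldots, N$, of length $N$, the MFDFA method consists of the following steps [25].

Step 1. Construct the profile
$$X(i) = \sum_{t=1}^{i} \left[ x(t) - \bar{x} \right], \quad i = 1, 2, \ldots, N,$$
where $\bar{x} = \frac{1}{N} \sum_{t=1}^{N} x(t)$.

Step 2. Split into sub-intervals. We divide $X(i)$ into $N_s = \mathrm{int}(N/s)$ non-overlapping segments of equal length $s$. Because the length $N$ is not always an integer multiple of $s$, the division is repeated starting from the other end of the series to ensure that all of the information in the sequence is included in the segments; the sequence thereby yields $2N_s$ segments.

Step 3. Eliminate the local trend. For each segment $\upsilon$, the trend is eliminated with a local polynomial $x_\upsilon(i)$ of order $m$, where $m = 3$ is selected:
$$F^2(s, \upsilon) = \frac{1}{s} \sum_{i=1}^{s} \left\{ X[(\upsilon - 1)s + i] - x_\upsilon(i) \right\}^2, \quad \upsilon = 1, \ldots, N_s,$$
$$F^2(s, \upsilon) = \frac{1}{s} \sum_{i=1}^{s} \left\{ X[N - (\upsilon - N_s)s + i] - x_\upsilon(i) \right\}^2, \quad \upsilon = N_s + 1, \ldots, 2N_s.$$

Step 4. Calculate the fluctuation function, where $q$ is not equal to 0:
$$F_q(s) = \left\{ \frac{1}{2N_s} \sum_{\upsilon=1}^{2N_s} \left[ F^2(s, \upsilon) \right]^{q/2} \right\}^{1/q}.$$

Step 5. Determine the scaling exponent. The function $F_q(s)$ satisfies the power-law relationship $F_q(s) \propto s^{h(q)}$, where $h(q)$ is the slope of $\log F_q(s) = h(q) \log(s) + \log C$ and is called the generalized Hurst index; varying $q$ gives different values of $h(q)$. When $q > 0$, $h(q)$ reflects the characteristics of segments with large fluctuations, and when $q < 0$, it reflects the characteristics of segments with small fluctuations. When $h(q) > 0.5$, the sequence is persistent and a previously ascending sequence continues to rise; when $0 < h(q) < 0.5$, the sequence is anti-persistent and a previously ascending sequence reverts to the mean. We then plot multifractal spectra to present the fractal behaviors of the sequences [57].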
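The multifractal spectrum is obtained from $h(q)$ through the Rényi index $\tau(q)$ and a Legendre transform [25]:
$$\tau(q) = q\,h(q) - 1, \qquad \alpha = \frac{d\tau(q)}{dq} = h(q) + q\,\frac{dh(q)}{dq}, \qquad f(\alpha) = q\,\alpha - \tau(q).$$
We use the singularity exponent $\alpha$ to describe the variation of the time series and to depict its local multifractal characteristics. The smaller $\alpha$, the greater the singularity. $\tau(q)$ is the Rényi index and $f(\alpha)$ is the multifractal spectrum. The greater the fractal spectrum width $\alpha_{\max} - \alpha_{\min}$, the more significant the multifractality and the greater the slope of the generalized Hurst curve. With $\Delta f(\alpha) = f(\alpha)_{\max} - f(\alpha)_{\min}$, the larger $\Delta f(\alpha)$, the greater the difference within the series, the greater the fluctuation, and the more uneven the distribution.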
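The five MFDFA steps and the spectrum transform can be sketched with NumPy as follows. This is a minimal sketch rather than the implementation used in the paper: the function names, the choice of scales, and the use of the logarithmic average for q = 0 (the standard limit of the fluctuation function) are assumptions made for the example.

```python
import numpy as np


def mfdfa(x, scales, qs, m=3):
    """Return the generalized Hurst index h(q) for each q in `qs`."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    profile = np.cumsum(x - x.mean())                 # Step 1: profile X(i)
    log_fq = np.empty((len(qs), len(scales)))
    for j, s in enumerate(scales):
        ns = n // s                                   # Step 2: N_s segments ...
        forward = profile[:ns * s].reshape(ns, s)
        backward = profile[n - ns * s:].reshape(ns, s)
        segments = np.vstack([forward, backward]).T   # ... taken from both ends
        t = np.arange(s) / s
        coeffs = np.polyfit(t, segments, m)           # Step 3: order-m local fit
        trend = np.vander(t, m + 1) @ coeffs
        f2 = np.mean((segments - trend) ** 2, axis=0)  # F^2(s, v) per segment
        for i, q in enumerate(qs):                    # Step 4: F_q(s)
            if q == 0:
                log_fq[i, j] = 0.5 * np.mean(np.log(f2))   # assumed q = 0 limit
            else:
                log_fq[i, j] = np.log(np.mean(f2 ** (q / 2.0))) / q
    log_s = np.log(scales)                            # Step 5: slope of log-log fit
    return np.array([np.polyfit(log_s, row, 1)[0] for row in log_fq])


def multifractal_spectrum(qs, h):
    """Legendre transform: tau(q) = q h(q) - 1, alpha = dtau/dq, f = q alpha - tau."""
    qs = np.asarray(qs, dtype=float)
    h = np.asarray(h, dtype=float)
    tau = qs * h - 1.0
    alpha = np.gradient(tau, qs)      # numerical derivative on the q grid
    return alpha, qs * alpha - tau
```

As a sanity check, uncorrelated Gaussian noise should give h(2) close to 0.5, and its cumulative sum should give h(2) close to 1.5. The scales should be large relative to m so that the cubic fit does not absorb the fluctuations, which is why the example tests use scales of 16 and above.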

MCCS Method
For financial time series, this method first applies a coarse-graining procedure to the series and then calculates a complexity synchronization measure, which is composed of the sample entropy and the complexity invariant distance.
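The two best-known building blocks of this procedure, coarse-graining and sample entropy, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the defaults m = 2 and r = 0.15 times the standard deviation follow the values stated in this paper, and the pairwise distance matrix limits the sketch to series of a few thousand points. The complexity invariant distance and the composite CCS measure are not reproduced here.

```python
import numpy as np


def coarse_grain(x, tau):
    """Average consecutive non-overlapping blocks of length tau."""
    x = np.asarray(x, dtype=float)
    n_tau = len(x) // tau                       # N_tau = int(n / tau)
    return x[:n_tau * tau].reshape(n_tau, tau).mean(axis=1)


def sample_entropy(x, m=2, r_factor=0.15):
    """Sample entropy with tolerance r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * x.std()

    def count_close_pairs(length):
        # Use the same n - m starting points for both template lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        # Chebyshev (maximum) distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Count pairs closer than r, excluding each template's self-match.
        return np.sum(dist < r) - (n - m)

    b = count_close_pairs(m)        # matches of length m
    a = count_close_pairs(m + 1)    # matches of length m + 1
    return -np.log(a / b)
```

A regular signal such as a sine wave should give a much lower value than white noise. Calling `sample_entropy(coarse_grain(x, tau))` for several values of tau gives a multiscale profile of the kind that the MCCS method builds on.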
Given two time series of equal length, $\{x_i\}$ and $\{y_i\}$, $i = 1, \ldots, n$, and a scale factor $\tau$, the coarse-grained series are [49]
$$x_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le N_\tau,$$
where $N_\tau = \mathrm{int}(n/\tau)$, and $y^{(\tau)}$ is defined in the same way.

The sample entropy is an improvement on the approximate entropy and is calculated as follows. For the time series $x_n$, $m$ is the given dimension, which yields the $m$-dimensional vectors $x_m(i) = \{x_i, x_{i+1}, \ldots, x_{i+m-1}\}$, where $i \in \{1, 2, \ldots, n - m\}$; in this paper, $m = 2$. The distance between two vectors $x_m(i)$ and $x_m(j)$ is
$$d(x_m(i), x_m(j)) = \max\left( |x_{i+k} - x_{j+k}| : 0 \le k \le m - 1 \right).$$
When the distance $d$ is less than the tolerance level $r$, the vector $x_m(i)$ is said to be close to $x_m(j)$; the tolerance $r$ is 0.15 times the standard deviation in this paper. With $N_i^m(r)$ the number of vectors $x_m(j)$, $j \ne i$, close to the vector $x_m(i)$, the probability of being close to $x_m(i)$ is
$$B_i^m(r) = \frac{N_i^m(r)}{n - m - 1}.$$
The mean of this probability is
$$B^m(r) = \frac{1}{n - m} \sum_{i=1}^{n-m} B_i^m(r).$$
The sample entropy is defined as [48]
$$\mathrm{SampEn}(x, m, r) = -\log \frac{B^{m+1}(r)}{B^m(r)},$$
where $B^{m+1}(r)$ is computed in the same way for dimension $m + 1$.

The generalized complexity invariant distance measures the complexity relationship between two time series, $x$ and $y$, according to the Minkowski distance,
$$D_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p};$$
its full definition is given in [50]. We define $\mathrm{MCCS}(x, y, p, \tau) = \mathrm{CCS}(x^{(\tau)}, y^{(\tau)}, p)$, where CCS is calculated from the sample entropy and the generalized complexity invariant distance [50]. The correlation coefficient $\rho_{\mathrm{MCCS}}$ of multiscale complexity synchronization between $x$ and $y$ is likewise defined in [50].

Table 2 shows the order transaction statistics of the CSI300, SSE50 and CSI500 markets. The SSE50 market has a higher volume and the CSI500 market has a lower volume. For different orders, the mean and standard deviation of buying and selling transactions for big orders are higher than those for small orders, which indicates that big orders have a higher absolute volume than small orders and more dramatic volatility. The volume of the three markets shows peaked characteristics, and the Jarque-Bera (JB) test results strongly reject the normal distribution hypothesis.
This suggests that the volume of the three markets does not closely follow a normal distribution and needs to be studied from the perspective of a fractal market. To account for the inconsistencies between the different markets, the data are normalized.

Explanatory Analysis
To study the characteristics of big and small orders in different markets, Figures 1-3 present the relevant histograms. The big and small orders in the three markets show obvious peaks and are right-skewed.

Fractal Characteristics of Big and Small Orders
We divide big and small orders into buying and selling-out trades and obtain four types of orders. We plot the generalized Hurst index curves; the fluctuation order q is selected in the range (−15, 15). The results are presented in Figure 4. The Hurst curves of the three markets are all above 0.5, so the transactions show persistence, and the persistence under small fluctuations is especially significant. In the CSI300 market, the selling-out transactions for big orders have stronger persistence. The buying transactions for big orders have lower persistence under small fluctuations, while the buying transactions for small orders have lower persistence under large fluctuations. In the CSI500 market, the different orders have similar persistence, although the persistence differs between small and large fluctuations. For the SSE50 market, the buying transactions of small orders show a high degree of persistence, and the persistence of the big orders is lower than that of the small orders. When q > 10, the Hurst index for big buying transactions is close to 0.5, indicating that, when large fluctuations occur, the market volume is random and there is no obvious persistence or anti-persistence.
The Hurst curves after shuffling the original sequences are presented in Figure 5. The Hurst index of the three markets is significantly reduced, showing that the persistence of the transactions is caused by the time series correlation. For the CSI300 market, the small orders after the shuffle have apparent persistence under small fluctuations. The Hurst index of big orders is close to 0.5 when q is near 0, indicating that, when small fluctuations occur, the buying transactions for big orders have no prominent memory characteristics. The SSE50 market shows persistence under small fluctuations and anti-persistence under big fluctuations; the buying transactions for small orders are highly persistent. In the CSI500 market, the selling-out transactions of small orders always show persistence, indicating that small orders are the force behind selling-out transactions.

We report multifractal results for weekly volume in Figures 6 and 7. The results in Figure 6 are similar to those for the daily volume in Figure 4, with strong persistence in all three markets. In contrast to Figure 5, the generalized Hurst curves in Figure 7 are not smooth or close together, but show persistence under small fluctuations and anti-persistence under big fluctuations; the persistence of small orders is higher than that of big orders. The results show that time correlation is the reason for the strong persistence of the weekly trading volume, but that time correlation also affects the scaling effect of q. We suspect that, in the Chinese stock market, this may be because weekly trading volume is affected by holidays.
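The logic of the shuffle comparison can be illustrated with a short sketch. It is not the paper's code: for brevity it estimates only h(2) with a first-order DFA instead of the full MFDFA with m = 3, and the function names and the synthetic AR(1) test series are assumptions introduced for the example. A random permutation keeps the distribution of the values but destroys their temporal order, so any persistence that disappears after shuffling must come from time correlation.

```python
import numpy as np


def dfa_hurst(x, scales, m=1):
    """Estimate h(2) with order-m detrended fluctuation analysis."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())
    log_f = []
    for s in scales:
        ns = len(x) // s
        segments = profile[:ns * s].reshape(ns, s).T     # shape (s, ns)
        t = np.arange(s) / s
        coeffs = np.polyfit(t, segments, m)              # local trend per segment
        resid = segments - np.vander(t, m + 1) @ coeffs
        log_f.append(0.5 * np.log(np.mean(resid ** 2)))  # log F_2(s)
    return np.polyfit(np.log(scales), log_f, 1)[0]


def shuffled_hurst(x, scales, n_shuffles=20, seed=0):
    """Average h(2) over random permutations of x."""
    rng = np.random.default_rng(seed)
    return float(np.mean([dfa_hurst(rng.permutation(x), scales)
                          for _ in range(n_shuffles)]))


def ar1_series(n, phi, seed=0):
    """Synthetic short-memory series x[t] = phi * x[t-1] + noise (illustration only)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x
```

For a correlated series such as `ar1_series(4096, 0.9)`, `dfa_hurst` over scales 8 to 128 gives an exponent well above 0.5, while `shuffled_hurst` returns a value near 0.5. This is the same pattern as the drop from Figure 4 to Figure 5.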

The Complexity Synchronization Characteristic of Orders
To determine whether big or small orders drive the transactions, we calculated the synchronization between the different orders and the total volume. Figure 8 shows the situation for the three markets when p = 2. Figure 8a shows the obvious hierarchical characteristics of transactions in the CSI300 market. The synchronization of big order buying transactions is higher and the synchronization of big order selling-out transactions is lower. The synchronization difference in buying transactions is greater than that in selling-out transactions in the CSI300 market. Figure 8c shows that there is no significant difference in synchronization in the SSE50 market. When τ is less than four, the synchronization of big order buying transactions is higher, while, when τ is greater than four, the synchronization of small order selling-out transactions is more prominent. Figure 8e shows that the CSI500 market is similar to the CSI300 market. The buying transactions for big orders have the highest degree of synchronization, but for selling-out transactions, the synchronization of big orders is higher. Figure 8b,d,f shows the complexity synchronization after the shuffle, following the shuffle rules of [25]. The synchronization of the three markets is reduced to different degrees, with the decline in the CSI300 market being the most obvious. This suggests that the synchronization results partly from the correlation of the time series. The stratification of the three markets also changes after the shuffle. The stratification phenomenon is not obvious after the shuffle in the CSI300 market. The synchronization of small order selling-out behavior is higher than that for big orders, and small orders are the dominant force in selling transactions. The CSI500 market shows a more obvious stratification phenomenon after the shuffle. This suggests that the synchronization is weakened by the correlation of the time series in the CSI500 market.
The synchronization of big orders is higher than that of small orders, and big orders are the leading force in market transactions. For the relatively mature SSE50 market, there is no obvious dominant force in market transactions. We present the complexity synchronization characteristic for weekly volume in Figure 9a. In Figure 9, small selling orders are shown to have stronger synchronization than big orders, which is consistent with the daily volume result, though this result is weakened by time correlation. In Figure 9f, buying transactions are driven by big orders in the CSI500 market. In Figure 9d, there is no obvious driving force for trading in the SSE50 market, which is also consistent with the daily volume result.

To further understand the complexity synchronization of the total volume and different orders in the three markets, three-dimensional diagrams are presented in Figures 10 and 11 as functions of the multiscale exponent p and the coarse-graining factor τ. Figure 10 shows the buying transactions and Figure 11 shows the selling transactions. For buying and selling orders in different markets, as p changes, the MCCS value first increases rapidly and then remains basically unchanged. As τ changes, the figures show a positive correlation trend of different degrees. However, there are differences between the changes for the different transactions. In Figure 8, the Euclidean distance is used to measure the complexity invariant distance, that is, p = 2. In Figures 10 and 11, we use the Minkowski distance to measure the complexity invariant distance. Therefore, Figure 8 can be seen as a slice of Figures 10 and 11 at p = 2. In terms of the variation in MCCS values, p = 2 lies in the increasing part of the surface, which flattens into a top plane for p > 5. As τ increases, the synchronization gradually increases, especially when τ is less than five. This positive relationship is more obvious in the small order buying transactions of the CSI500 market.
Compared with Figure 10, the top plane of each plot in Figure 11 fluctuates to a greater extent. The fluctuation in Figure 10b is the most pronounced, exhibiting a rugged top plane, while the top surfaces of the other plots are smoother. In the increasing part of the three-dimensional figures, the MCCS value in Figure 10b,f increases more rapidly as p increases, which appears as a rapid color change in the figure. Figure 10a shows a noticeable drop when p is large, but the change is still smooth and appears as a color change in the top plane. This implies that there is a significant peak in the MCCS value when p < 5.

Discussion
The results of the fractal analysis indicate that the trading volume of the Chinese stock market shows strong persistence, which is caused by time correlation. When we remove the time correlation by shuffling, we find that the persistence of small orders is stronger than that of big orders and that the Chinese stock market shows persistence under small fluctuations and anti-persistence under big fluctuations. We observed that selling trades are driven by small order transactions, which are unaffected by the time correlation. However, there is no apparent driving force for buying trades. In the CSI500 market, selling transactions are driven by big orders. We think that this may be because the majority of retail investors are in the CSI500 market. Transactions are mainly carried out in small orders, but big orders, which carry considerably more weight, are the dominant force in the CSI500 market.
Previous studies of the stock market microstructure have mostly focused on the relationship between technical indicators, including volume and price, and volume and volatility. These studies found interactions between variables. Studies of the microstructure of volume also focus on the temporal correlation of volume and the driving force of this correlation. We directly studied the complex behavior of volume itself to determine the driving force of volume. Early studies showed that volume drives volatility [7] and stock price [5], and some studies have shown that volatility drives volume [51]. For trading volume, institutional investors are a stronger driving force for volume correlation than individual investors [18], and trading volume itself has a strong time correlation [56]. Buying transactions are more inter-related than selling transactions [19]. Financial rules and shocks can also reduce trading volume [53,54]. We considered not only the relationship between variables but also the variable itself. We used the MFDFA and MCCS methods, which utilize more nuanced market information than the econometric regression models used in previous studies. Earlier studies were based on microstructure analysis from the perspective of the relationship between variables, whereas we explain the microstructure of the volume indicator itself.

Conclusions
We used MFDFA and MCCS methods to study the complexity behavior of small and big orders in the Chinese stock market and found that the driving forces of different markets were different. Our results are informative for market participants. Market investors can better understand the buying and selling transactions in the stock market, discern the intentions of different trading forces in the market, and gain more information from the trading volume when investing. For emerging markets, such as the Chinese market, the results of CSI300 are more informative, and for developed markets, the results of SSE50 can be used as a reference. For policy makers, different policies need to be formulated for different markets because the markets have different drivers. For the SSE50 market, policies can be relatively lenient, while for the CSI500 market, which is driven by big orders, targeted policies can help improve market efficiency. Our study has certain limitations. We have only considered the absolute value of the volume and have not undertaken an in-depth investigation of the volatility of different orders; future studies may pursue this. In addition, our results may have been influenced by the global health crisis, and it is sensible to recognize that this period differs from those without such a crisis. Finally, the complexity analysis method can be applied to the study of other financial markets. This may include complexity synchronization between different futures markets and the complexity behavior of futures and stock markets.

Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be found here: https://lareine66.github.io/complexity-behavior-data/.