Precision Measurement of the Return Distribution Property of the Chinese Stock Market Index

In econophysics, the analysis of the return distribution of a financial asset using statistical physics methods is a long-standing and important issue. This paper systematically conducts an analysis of composite index 1 min datasets over a 17-year period (2005–2021) for both the Shanghai and Shenzhen stock exchanges. To reveal the differences between Chinese and mature stock markets, we precisely measure the property of the return distribution of the composite index over the time scale Δt, which ranges from 1 min to almost 4000 min. The main findings are as follows: (1) The return distribution presents a leptokurtic, fat-tailed, and almost symmetrical shape that is similar to that of mature markets. (2) The central part of the return distribution is described by the symmetrical Lévy α-stable process, with a stability parameter comparable with a value of about 1.4, which was extracted for the U.S. stock market. (3) The return distribution can be described well by Student’s t-distribution within a wider return range than the Lévy α-stable distribution. (4) Distinctively, the stability parameter shows a potential change when Δt increases, and thus a crossover region at 15 <Δt< 60 min is observed. This is different from the finding in the U.S. stock market that a single value of about 1.4 holds over 1 ≤Δt≤ 1000 min. (5) The tail distribution of returns at small Δt decays as an asymptotic power law with an exponent of about 3, which is a widely observed value in mature markets. However, it decays exponentially when Δt≥ 240 min, which is not observed in mature markets. (6) Return distributions gradually converge to a normal distribution as Δt increases. This observation is different from the finding of a critical Δt= 4 days in the U.S. stock market.

The stock market is a complex financial system in which the traders, assets, and many unforeseen external factors interact with each other non-linearly; thus, it is extremely hard to write down a dynamical equation among these elements.Fortunately, the price fluctuations of individual stocks and market indices provide us with a powerful tool to understand its dynamics [2,6].Fluctuations are often quantified by a logarithmic return over a time scale of ∆t that can be mathematically defined as follows: where S (t) denotes the time series of a company stock price or market index.The return is of a great key role in asset pricing and is at the core of financial risks through multifractal analysis [6][7][8]16].
In the late 20th century, huge amounts of data from the stock market are available for scholars due to the rapid development of computer technology [2,6].This development allows physicists to analyze precisely the properties of the return of a financial asset using the methodology developed for statistical physics.In 1995, a paper published in Nature analyzed the return distribution of the Standard & Poor's 500 (S&P 500) index over the 6-year period (1984)(1985)(1986)(1987)(1988)(1989) [26].This work found that the central region of the return distribution can be well described by a truncated Lévy stable symmetrical distribution [27] with an index of α ≈ 1.4 (comparable with α ≈ 1.5 for the income distribution [3] and α ≈ 1.7 for the distribution of the fluctuation of cotton price [13]).More importantly, this study observed the scaling behavior of the probability density over three orders of magnitude of ∆t.Four years later, the same team conducted more detailed studies on the stock indices [28] and individual company stocks [29,30] in the U.S. market.These new works found a universal asymptotic inverse cubic power-law in the return distribution tails for both the S&P 500 index and individual company stocks (the power-law has been observed in many natural and social complex systems [31]).They also observed a critical point of ∆t below which the tail distributions retain a similar power-law, and gradually converge to Gaussian distribution otherwise ( ∆t ≈ 4 days and ∆t ≈ 16 days for market index and individual company stocks, respectively) [28,30,32].Further studies illustrated similar scaling behavior in other mature stock markets, such as the market in England, France, Germany, Mexico, and Japan [28,30,[33][34][35][36][37].
Although mature stock markets seem to show a universality of power-law scaling behavior, many exceptions exist in other stock markets.Studies on the stocks traded in the Australian Stock Exchange and the daily WIG index of the Warsaw Stock Exchange showed that tail distributions follow power-law, with the exponent being significantly different from 3 [38][39][40].As for the Indian stock market, a study on the daily returns of the 49 largest stocks indicated [41] that the tail distributions decay exponentially as e −β r .However, new studies in 2007 and 2008 [42,43] found that the distributions of fluctuations of the individual stock prices, the Nifty index, and the Sensex index follow the asymptotic power-law with exponent α ≈ 3.For the Hong Kong stock market, research has also been conducted and delivered different results from research based on the U.S. stock market [44,45].These non-unified results make it difficult to understand the dynamical property of the stock market.It seems to be dependent on the degree of development of a specific financial market [46].
The Chinese Mainland stock market, the largest emerging financial market in the world, has different trading rules and government regulations compared with other developed financial markets, such as T + 1 trading, intraday price limits, and IPO policy [47].These differences may result in different interactions among traders, assets, and external factors in the Chinese stock market.Thus, it is of great importance to understand the dynamical properties of the Chinese stock market via return distributions.However, the previous analyses on the return distribution for the Chinese stock market were conducted about 10 years ago and did not obtain conclusive results because of the limitation of data statistics.In 2005, scholars analyzed the data of 104 individual stocks listed on the Shanghai Stock Exchange (SSE) and Shenzhen Stock Exchange (SZSE) and found the tail distributions of daily stock price returns follow the power-law, with the exponent being significantly different from that in the U.S. stock market [48].They also stated that the distributions of returns are asymmetrical, but almost symmetrical distributions were observed in the U.S. stock market [48].A similar analysis of the SSE Composite Index (SSECI) and the SZSE Component Index in 2007 observed a symmetrical return distribution with a power-law exponent of less than 3 over 1 ≤ ∆t ≤ 60 min [49].In contrast, a more detailed analysis of the 1 min data and 1-day data of the SSECI in 2008 showed [50] that the power-law exponent is systematically larger than 3. Subsequently, a study on the tick-by-tick data from 23 individual stocks listed in SZSE argued that return distribution can be well fitted with the Student's t-distribution [51,52], which is different from the truncated Lévy stable process model [26,27].In 2010, a study of the SSE 50 index and SZSE 100 index revealed [53] that tail distributions obey the power-law when ∆t < 1 week and follow exponential decay otherwise.
In the past 20 years, more high-frequency data on the Chinese stock market have been accumulated for analysis.These data provide us with a good opportunity to precisely measure the properties of return distributions.To shed light on the understanding of the dynamical property of the Chinese stock market and help clarify the confusion stated above, this paper analyzes 1 min datasets recorded for the SSECI and the SZSE Composite Index (SZSECI) over the 17-year period (January 4, 2005 to December 31, 2021), using the methods and concepts adopted in statistical physics.This paper analyzes 1 min datasets over the 17-year period (January 4, 2005 to December 31, 2021) for both the SSECI and the SZSECI.The SSE and SZSE, the only two stock exchanges in Mainland China, were established in late 1990.The SSECI comprises all stocks of A-shares and B-shares listed and traded on SSE.Similarly, the SZSECI consists of all stocks listed and traded on SZSE.Both indices aim to reflect the overall Chinese stock market performance and are calculated by the capitalization-weighted method.Both exchanges trade during 9:30-11:30 and 13:00-15:00 of a trading day and are closed on the weekend and national holidays.When we construct a time series S (t), we first skip the non-trading days and the period of 11:30-13:00 on trading days, and then connect 9:00 to 15:00 of the previous trading day.The time series of indices with other time scales are constructed using the 1 min datasets (991,680 records for each index).

Results and discussion
To provide an overview of the statistical property of returns, Fig. 2 shows the probability density functions (PDFs) of returns of over 1 ≤ ∆t ≤ 3,840 min for both the SSECI and the SZSECI.It is evident from Fig. 2 that the PDFs of both indices have a similar shape.To study the shape quantitatively, we examine the skewness and kurtosis and the corresponding statistical significance tests [54,55].These examinations show that our data present slightly negative skewness with statistical significance, but its most central parts are symmetrical.These examinations also show that Fisher's kurtosis is larger than 0, with very high statistical significance.Additionally, as can be seen in this figure, these distributions are leptokurtic, fat-tailed, and almost symmetrical.Previous works have illustrated that the central parts of the return PDFs shown in Fig. 2 can be well described by the Lévy α-stable process [6,26,27,49,50].The symmetrical Lévy α-stable PDF is mathematically written as where P α refers to PDF, R ∆t is the return defined by Eq. ( 1), ∆t denotes time scale (in Eqs. ( 2) -( 5), to make ∆t dimensionless, we let ∆t equal the time scale divided by 1 minute), α (0 < α ≤ 2, stability parameter, also known as index) is a key parameter for Lévy α-stable distribution, and γ is the scale parameter.In Eq. ( 2), e −γ∆t|q| α is the characteristic function.According to Eq.
(2), the PDF of R ∆t = 0 is where Γ denotes the Gamma function.
Next, we use the approach proposed by Ref. [26] to extract the stability parameter α from the data shown in Fig. 2. According to Eq. ( 3), the parameter α equals the negative of the reciprocal of the slope shown in Fig. 3.The α values are as follows: 1.34 ± 0.03 (SSECI) and 1.13 ± 0.04 (SZSECI) over 1 ≤ ∆t ≤ 15 min, and 1.49 ± 0.03 (SSECI) and 1.57 ± 0.02 (SZSECI) over 60 ≤ ∆t ≤ 3,840 min.Such values of α extracted from our data are consistent with 0 < α ≤ 2 and comparable with 1.40 ± 0.05 which is extracted from the U.S. stock market [26].A potential crossover region at 15 < ∆t < 60 min for these two indices is observed.This potential crossover region is not observed in the U.S. stock market in which a single fitting holds over 1 ≤ ∆t ≤ 1,000 min [26], and also in the previous similar studies regarding the Chinese Stock market [49,50].We skip the first few minutes to an hour for each trading day to see whether this crossover region disappears, but it exists.The overnight return also cannot contribute to this phenomenon since the data points with ∆t ≤ 60 min do not include the overnight effect.By removing the data affected by extreme events, such as the global financial crisis of 2007-2008, the 2015-2016 Chinese stock market turbulence, and the COVID-19 global pandemic [56,57], we observe that those extreme events have no contribution to this crossover region.Therefore, this potential crossover region may indicate an underlying dynamical behavior of the Chinese stock market that differs from the U.S. stock market.The symmetrical Lévy α-stable distribution shown in Eq. (2) will collapse on the ∆t =1 distribution under the transformations below.
where R s denotes rescaled return, and R ∆t and P α (R ∆t , ∆t) are defined by Eqs. ( 1) and (2), respectively.Fig. 4 shows the rescaled PDFs of returns R s with the stability parameter α extracted from Fig. 3.An obvious collapse of distributions with a large ∆t is observed here.However, only central parts of distributions with a larger ∆t overlap with the ∆t =1 min data.The red curves are the symmetrical Lévy α-stable distributions with the parameters α obtained in Fig. 3.Note that these red curves are not simple fits to data.Their scale factors γ are obtained using Eq. ( 3) and the experimental P (0) for ∆t ≤ 15 min or the extrapolation of P (0) using the straight-line fits for ∆t ≥ 60 min in Fig. 3.The symmetrical Lévy α-stable distributions with the parameters extracted from Fig. 3 show good agreement with data in the central parts.From the two aspects discussed above, we could conclude that the symmetrical Lévy α-stable process describes a part of the dynamical properties of the Chinese stock market.In Fig. 4, the tail distributions of data are larger than the Gaussian distribution and smaller than the Lévy α-stable distribution.Thus, we tried using the Student's t-distribution to fit data, as demonstrated by the solid black curves shown in panels a and b which have sufficient data.The fit results show that the Student's t-distribution can describe the data at a wider range than the Lévy α-stable distribution well, which could be explained by the so-called non-extensive statistical framework [58,59].Given the discussion above, it is essential to obtain a detailed study of the tail distribution.To compare tails with different ∆t, we introduce the normalized return r ∆t where ⟨R ∆t ⟩ T is the average of returns R ∆t over the entire time T , and V is the volatility of the R ∆t time series.Fig. 5 shows the PDF and the complementary cumulative distribution function (CCDF) of the ∆t = 1 min tail of normalized returns r ∆t in a log-log style.The PDF of the tail follows a power-law decay in the form of r −(1+α) ∆t , as shown in panels a and b.Naturally, the CCDF follows the form of r −α ∆t , as shown in panels c and d.Similar behavior for both positive and negative tails is observed obviously.We use a straight-line fit to extract the exponent α, as shown by the dashed black lines.The exponents extracted from PDF are consistent with those from CCDF within acceptable errors.For the SSECI positive tail, the average (weighted by the reciprocal of squared errors) of exponents α extracted from fits to PDF and CCDF is 3.07 ± 0.05.For the SZSECI positive tail, that value is 3.14 ± 0.04.The values of the SSECI and the SZSECI are in agreement with each other within errors, and well as outside the Lévy α-stable process.
Fig. 6 compares the CCDF of the normalized returns with 1 ≤ ∆t ≤ 3,840 min for positive and negative tails.Here, the theoretical values of standard normal distribution and Student's t-distribution are also drawn for a comparison.It can be seen from Fig. 6 that these tails with small time scales follow an asymptotic power-law decay, but the tails gradually deviate from the power-law when ∆t becomes longer.For the case of short ∆t, the tail distributions are fitted by the power-law and the Student's t-distribution; thus, the exponents α are extracted.The exponents α extracted from these two functions are close to the value of 3 which is frequently observed in mature stock markets.Such values ensure a finite variance of returns.This is important for option pricing and risk management.We also verified the convergence behavior of return distribution by comparing the moments µ k between the normalized return data and the standard normal distribution, as shown in Fig. 8.The result indicates that the data gradually converge to the standard normal distribution starting from ∆t = 1 min.To quantify this convergence behavior, we introduce a measure of moment difference between the normalized return data and the standard normal distribution, as shown in Eq. (7).
where M D and M G denote the moments of the normalized return data with a ∆t and the standard normal distribution, respectively; n is the number of data points shown in Fig. 8.The measure of D can also serve as the distance between two curves.Therefore, we define speed using Eq. 8 to measure the speed of this convergence between a moment curve i and another moment curve i + 1, as shown in Fig. 8.
Fig. 9 shows the measured distance between the normalized return data and the standard normal distribution and the speed of the convergence of data.This figure demonstrates quantitatively that the convergence starts at ∆t = 1 min, and the speed at a small ∆t is much faster than others.This convergence behavior is different from the early studies in the U.S. [28] and the Chinese stock markets [50], in which convergence to the standard normal distribution occurs only when ∆t ≥ 4 days [28,50].The colored curves denote the data, and the solid black curves refer to the moments of standard normal distribution.These two panels share a common legend.It is evident that the data gradually converge to the standard normal distribution as ∆t increases.

Conclusions
Because the previous studies on the return distribution of the Chinese stock market are dramatically limited by statistics, this paper systematically and precisely investigates the property of the return distributions of both the SSECI and the SZSECI in the Chinese stock market.We used 1 min high statistics datasets over a 17-year period (January 4, 2005 to December 31, 2021) to construct return distributions with time intervals ∆t ranging from 1 min to almost 4,000 min.The results illustrate that the properties of the return distributions for both the SSECI and the SZSECI are similar.The main findings are as follows: (1) The return distributions present a leptokurtic, fat-tailed, and almost symmetrical shape that is similar to that of mature stock markets.
(3) Return distributions can be described well by Student's t-distribution within a wider return range than the Lévy α-stable 8/10 distribution.(4) A potential crossover region at 15 < ∆t < 60 min was discovered.Such a crossover region is not observed in the U.S. stock market, where a single value of α ≈ 1.4 holds over 1 ≤ ∆t ≤ 1,000 min [26].(5) To obtain a better understanding of tail distribution, this paper checks the PDF and CCDF of tails in detail.For small ∆t, the tail shows scaling behavior and follows an asymptotic power-law decay with an exponent of about 3, which is a value widely observed in mature stock markets.However, the tail decays exponentially when ∆t ≥ 240 min, which is not observed in mature stock markets.(6) Finally, it is observed that return distributions gradually converge to a normal distribution as ∆t increases.Such convergence behavior is different from previous studies in the U.S [28] and Chinese stock markets [50], which state that convergence only occurs when ∆t ≥ 4 days.
Stock markets are inhomogeneous and time-varying.A multifractal analysis via the return distribution and an analysis of volatility surfaces in stock markets across the world should be conducted using the latest high-frequency datasets that have been collected over the same time period.By comparing the empirical results from different stock markets and constructing theoretical models, one can learn the underlying dynamics of stock markets [60], such as the impacts of investor risk attitude, trading rules, and government regulations in different countries.

Figure 1 .
Figure 1.The time series of the SSECI and the SZSECI.The 1 min datasets over the 17-year period (January 4, 2005 to December 31, 2021) analyzed in this paper are shown.The higher one and lower one represent the SSECI and the SZSECI, respectively.

Figure 2 .
Figure 2. Probability density of returns over time intervals ranging from 1 min to 3,840 min.Return R ∆t is defined as ln [S (t) /S (t − ∆t)], where S(t) refers to the time series of the SSECI (panel a) or the SZSECI (panel b).Different markers represent data of returns with different time intervals ∆t.These two panels share a common legend.It is evident that these distributions expand as ∆t increases.

Figure 3 .
Figure 3. Probability density of the return R ∆t = 0 as a function of the time interval ∆t.The black stars are the data points.The red and blue lines are straight-line fits to the data points over 1 ≤ ∆t ≤ 15 and 60 ≤ ∆t ≤ 3,840, respectively.The fit results of slopes for red and blue lines are also shown.Similar scaling behavior is observed for both the SSECI (panel a) and the SZSECI (panel b).From this figure, the key parameter α characterizing the Lévy α-stable process is extracted (see main text for details).A potential crossover region at 15 < ∆t < 60 is observed here.

Figure 4 .
Figure 4.The comparison of rescaled PDFs of returns R s with theoretical models.The colored markers are data points with different time scales.The dashed black curves are Gaussian distributions with a mean of 0 and a standard deviation of 1 min data.The solid black curves are Student's t-distribution fits to 1 min data.The solid red curves are symmetrical Lévy α-stable distributions with the observed parameters, as shown by the red text (see main text for details).The top two panels and the bottom two panels share a common legend, respectively.

Figure 7 .
Figure 7. CCDF of return tails with different time scales ∆t in log-linear plot.Panels a and b reflect the SSECI and the SZSECI, respectively.Colored markers are for positive tails.Solid straight lines are exponential fits to data, and the values of β with its fitting errors are also presented here.These two panels share a common legend.To keep the figure from looking cluttered, the negative tails are not shown here; they feature similar results to positive tails.

Figure 8 .
Figure 8.Comparison of moments µ k between the normalized return data and the standard normal distribution for the SSECI (panel a) and the SZSECI (panel b).The colored curves denote the data, and the solid black curves refer to the moments of standard normal distribution.These two panels share a common legend.It is evident that the data gradually converge to the standard normal distribution as ∆t increases.

Figure 9 .
Figure 9.The moment distance between the normalized return data and the standard normal distribution (panel a) and the speed of convergence of data (panel b).∆t is the time scale divided by 1 min.The circle and star points represent the data of SSECI and SZSECI, respectively.