Empirical Research on the Fama-French Three-Factor Model and a Sentiment-Related Four-Factor Model in the Chinese Blockchain Industry

Abstract: As one of the most significant components of financial technology (FinTech), blockchain technology has aroused the interest of numerous investors in China, and the number of companies engaged in this field is rising rapidly. The effect of investor emotion on stock returns is a hot topic in behavioral finance. With the fast development of blockchain technology, investors’ sentiment varies as well. Online information that directly reflects investors’ mood can be mined and quantified to construct a sentiment index. To better understand how well certain factors explain the returns of blockchain-related stocks in the Chinese stock market, this paper introduces the Fama-French three-factor model (FFTFM). Furthermore, sentiment is added as a new independent variable to enhance the explanatory power of the FFTFM. A comparison between the two models reveals that the sentiment factor raises the explanatory power. The results also indicate that the Chinese blockchain industry exhibits neither a size effect nor a book-to-market effect.


Introduction
Financial technology (FinTech) consists of several technologies, such as blockchain, cloud computing, big data, and machine learning. Blockchain is an advanced technology derived from Bitcoin, which was first proposed by Nakamoto [1]. As one of the most innovative and important components of FinTech, it can now be applied to areas such as digital currency, asset securitization, cross-border payment and settlement, and insurance management. Blockchain has produced a series of extremely promising applications because of characteristics such as decentralization, immutability, and anonymity. It not only plays a role in FinTech but can also be applied to diverse industries, such as supply chain, intellectual property, real estate, and the Internet of Things (IoT). Blockchain technology is highly valued in China. The People's Bank of China (PBOC) has even begun planning to issue a CBDC (Central Bank Digital Currency) based on blockchain technology, and its design has been basically completed [2]. Mu et al. claim that the People's Bank of China owns the most blockchain patents among central banks in the world [3]. Due to the innovative nature of this technology and the high level of interest, the number of companies in this field is also increasing. It is necessary to assess the value of these firms to understand this industry better.
This paper uses the FFTFM (Fama-French Three-Factor Model) to analyze the stocks of Chinese blockchain firms and to test for the existence of a size effect and a book-to-market ratio effect (BM effect) in this field. Capital asset pricing has long been a popular topic attracting numerous researchers. Markowitz first proposed portfolio theory to balance risk and return [4]. Sharpe, Lintner, and Mossin built the capital asset pricing model (CAPM) in the 1960s, which treats the market return as the only variable explaining returns. Fama and French proposed the FFTFM, in which adding a size factor and a book-to-market ratio factor to the CAPM enhances the explanatory power [5]. After the model was released, Chinese researchers began to use the FFTFM to analyze Chinese stock market performance and found that it achieves better results than the CAPM [5][6][7][8]. Some studies focus on a particular stock market, for instance, stocks belonging to the A-share market and the Growth Enterprise Market [9,10]. Other Chinese researchers apply the FFTFM to a particular industry, such as the real estate, electric power, steel, and banking industries [11][12][13][14]. Many studies emphasize traditional industries [12,14], whereas the blockchain industry is an innovative one, and research applying the FFTFM to this field is lacking.
Blockchain holds a prominent position, especially when it comes to concepts like FinTech. There is a saying that "one day in the blockchain industry, one year in real life", which reveals the extremely rapid changes in this field. Blockchain technology was first applied in the financial field. Because it moves away from centralization and simplifies a series of transaction processes, it is recognized as a particularly useful FinTech tool and arouses a lot of interest. The rapid development also has an impact on investors' expectations of and sentiment toward blockchain companies, including but not limited to the firms that take blockchain as FinTech, and the relationship between emotion and stock returns is an indispensable topic in behavioral finance.
Behavioral finance researchers study the impact of capital market participants' psychological and behavioral characteristics on capital markets based on the assumptions of limited arbitrage opportunities and bounded rationality. "Emotion" in psychology is the expression of external attitudes generated by individual cognitive processes. Investor sentiment in behavioral finance is caused by investors' limited rationality and can be interpreted as investors' expectation bias, subjective preferences, investment beliefs, and speculative needs. When investor sentiment affects enough investment demand, it causes the stock price to deviate from its value. According to empirical studies, investor sentiment has an essential impact on financial behaviors such as stock price and income fluctuations, stock market anomalies, and corporate investment decisions and earnings management [15,16]. Liu and Zhang summarize that the Chinese stock market is mainly composed of individual investors with relatively weak investment skills, strong subjective awareness, and low risk-perception ability [17]. Investors are more inclined to pursue short-term capital gains and are keen on short-term investment projects to gain speculative profits. As a result, investor sentiment has a more powerful influence in China than in mature capital markets.
Under this background, this paper tries to investigate the influence of Internet information related to the stock performance by mining and quantifying Internet public opinion information. Then the sentiment factor would be added into the traditional FFTFM for research.
The data are collected from the China Stock Market Accounting Research platform and the China Research Data Services platform. The sentiment factor is derived from the public Guba comments posted on each stock by online users. Stocks related to blockchain technology, including but not limited to the listed firms that treat blockchain as FinTech, are grouped to construct portfolios with different characteristics for the research. Comparing the results of the FFTFM and the improved four-factor model shows that the sentiment factor provides a better explanation of the returns of Chinese blockchain stocks. The results also show that the size effect and BM effect cannot be found in this industry, and that the portfolios constructed from big-size companies with low book-to-market ratios gain the best return.
Due to the creativity and the bright prospect of blockchain technology, this paper focused on Chinese listed firms related to blockchain technology and demonstrated the valuation via the FFTFM, which is supposed to describe the risk yield characteristics of the blockchain industry in China. Many scholars have studied China's asset pricing based on FFTFM. This paper also drew on the FFTFM to empirically research the Chinese A-share market, to try to explain the stock performance of the Chinese A-share blockchain firms and verify whether the industry has scale effects, value premium effects, and profitability effects. We concluded that the BM effect does not exist in the Chinese blockchain industry. We also built a sentiment factor using data mining to improve the traditional FFTFM in order to present a better explanatory power for this industry.

Fama-French Model Research
Fama and French found that the beta value of CAPM could not explain the difference of excess return, so they proposed a three-factor model that divides the main factors into three factors, namely market factor, scale factor, and value factor, for a better explanatory power of excess return [4]. In order to explore whether the model can be applied to the stock markets of other countries, Fama and French studied the stock returns and pricing factors in different countries and claimed that FFTFM is better than CAPM [18].
Carhart believed that the FFTFM could not explain the difference in excess returns well and added a new variable, the momentum factor, to construct the Fama-French four-factor model [19]. Xu and Xiong used A-share listed companies from 2004 to 2005 as samples and found that the four-factor model's explanatory ability improved but could not fully explain the fluctuations in stock yields [20]. Xue and Guan conducted empirical research with a four-factor model and found that only a few funds perform slightly better than the whole market index [21]. This paper aims to construct a model based on the FFTFM with a sentiment factor in order to achieve greater explanatory power, a purpose similar to that of the Carhart model.
Fama and French analyzed profitability and investment information and added them to the FFTFM to construct a new Fama-French five-factor model (FFFFM) [22]. However, the FFFFM is still an imperfect model. Fama and French found that the FFFFM has two main defects: first, the model lacks the ability to describe the average return of small stocks, and second, the HML (High Minus Low) factor is redundant. Racicot et al. studied the FFFFM with traditional illiquidity measures and found weaknesses in this model, especially for the endogenous illiquidity measures [23]. A robust instrumental variables (RIV) algorithm implemented with the GMM (Generalized Method of Moments) was used for correction. Racicot et al. recast the FFFFM in a dynamic specification and used Kalman filtering and a recursive robust instrumental variables (IV) algorithm to estimate alpha and beta [24]. They noticed that illiquidity is a significant factor in the Kalman filter approach and that the market risk premium is the only effective factor in a dynamic context based on the GMM approach.
Sembiring applied the Fama-French model to the Indonesian securities market under market overreaction conditions and found that the market, size, and value factors accurately explain portfolios' returns [25]. Cox and Britten applied the FFFFM to the Johannesburg securities market and concluded that the size and value factors are significant, but the market factor presents a negative relationship [26]. Bangash, Khan, and Jabeen, based on empirical research on the Pakistan equity market, disagreed that the size pattern performs well [27]. In China, scholars not only use the FFTFM to investigate the Chinese stock market but also replace indicators to suit the actual situation of the Chinese stock market. Tian, Wang, and Zhang compared the FFTFM in the securities markets of China and the United States [28]. They concluded that the FFTFM could explain the excess returns of the two countries' investment portfolios, whereas its applicability in the Chinese and American stock markets differs. In the Chinese market, market risk is more significant than the other factors, and SMB (Small Minus Big) has explanatory power for small-cap stocks. Yang and Fan indicated that the FFTFM can explain the stock markets of both developed and developing countries [29]. Several researchers apply the FFTFM to the Chinese securities market to test the effectiveness of the whole market [9,30,31]. Liu, Zhu, and Li argued that the FFTFM is suitable for China's Growth Enterprise Market [10]. Yin added the sentiment factor to the Fama-French model and found that the prices of small-cap stocks, stocks with high P/E (Price/Earnings) ratios, and high-priced stocks are more sensitive to investor sentiment [32]. Hu, Tu, and Zhu believe that the Fama-French five-factor model of stock pricing is suitable for use in China's stock market [33]. Yuan and Cong found that the FFTFM is suitable for the listed companies in the Chinese HA-DA-QI (Harbin-Daqing-Qiqihar) region [34].
For stock returns in particular Chinese industries, some researchers use the FFTFM to explain the returns. You found the market factor effect and BM effect in the real estate industry, and Cheng and Fang conducted empirical research on stock returns in the auto industry [11,35]. For listed energy companies, Li and Zhao pointed out that the FFTFM applies to the prediction of market returns of China's listed power companies and that the FFTFM could also be used in the Chinese iron industry [12,13]. Gou, Wang, and Zhu drew the same conclusion that small-cap stocks exhibit scale effects and that stocks with high book-to-market ratios have a BM effect [14,36].

Conventional Investor Sentiment Research
Behavioral finance has aroused the interest of numerous academics, who hope to find principles for better decision making by using investor sentiment. Baur, Quintero, and Stevens used stock market data from 1986 to 1988 to explore the relationship between investor sentiment and the 1987 securities market crash [37]. Mehra and Sah summarized three conditions under which investor sentiment affects the stock price in the arbitrage market: firstly, there is a systematic fluctuation of investor sentiment; secondly, investors make decisions based on emotions; thirdly, investors ignore the subjective influence brought by emotions [38]. Brown and Cliff collected data and compiled an investor sentiment index [39]. The study found that lagged market yield has a more significant impact on investor sentiment, but, in turn, investor sentiment is not efficient in predicting market returns. Cheng and Liu used a blue-chip index to reflect the bullish situation and found that the stock market's mid-term sentiment was more affected than the short-term sentiment [40]. Wang, Zhao, and Fang claimed that investor sentiment leads share prices in the early stage of IPOs (Initial Public Offerings), which causes listed companies to use investor sentiment to maximize profitability [41]. Wen et al. used Shanghai securities market data to construct an investor sentiment index to study the characteristics of investor behavior under different emotions [42].

Investor Sentiment Research Based on Internet Information
Currently, how to use social network information to predict economic behavior has gradually become a research hotspot in various fields. Tetlock used one column of the Wall Street Journal as the basis for investor sentiment analysis and analyzed the relationship between investor sentiment and stock market returns [43]. He argued that large fluctuations in investor sentiment cause an increase in trading volume, and that pessimistic forecasts lead to a fall in stock prices. Chen et al. indicated that online information helps investors make better financial decisions [44]. Meng, Meng, and Hu constructed an investor sentiment index using factor analysis, based on data from the CSSCI (Chinese Social Sciences Citation Index), Sina Weibo text, and Baidu's keyword recommendation system [45]. Luo, Wang, and Fang included an investor sentiment index when establishing the CAPM, with the index constructed from sentiment analysis of stock forum postings [46]. Using an ordinary linear regression model, they found a relationship between investor sentiment and the stock index. Xu used text analysis and machine learning to construct a new investor sentiment indicator system based on Sina stock evaluation information and a long-term survey [47].

Stock and Financial Data
This paper selected listed blockchain companies in the Shanghai and Shenzhen stock markets. A Special Treatment (ST) company cannot be considered a regular listed company because of its business difficulties, so only data from non-ST companies were included. Because the data of the listed blockchain companies included in the Wind blockchain index have been complete since 2016, and to ensure the authenticity and comprehensiveness of the samples, monthly stock data from June 2016 to June 2019 were collected from the China stock market and accounting research database and the Chinese research data services platform (CNRDS). After screening, there were 50 sample companies. A long sample period with sufficient and recent data can provide practical and meaningful results.
This paper selected the monthly information of stocks related to the Chinese blockchain industry from June 2016 to June 2019, which was collected from the Chinese stock market and CNRDS. These stocks included, but were not limited to, some listed companies that use blockchain as a FinTech.
The way to determine whether a company is involved in the blockchain industry is based on the components of Wind's blockchain industry index. If a firm was collected as a component of the index, this firm was considered for the research. There was an overlap between the stocks belonging to the blockchain industry index and the stocks belonging to the FinTech index.
The traditional Fama-French model will reclassify the portfolios at the end of June each year, but to better reflect the performance, this paper regrouped the portfolios monthly. The reason is that the blockchain industry in China is an emerging industry, and several companies are embracing this innovative technology, including, but not limited to, FinTech firms. Other companies involved in FinTech also show the same trend. In 2016, there were merely eight listed firms related to blockchain technology according to Wind database, whereas there were more than 150 firms in 2019 [48]. More data details will be shown in Sections 3.1., 3.2., and 4.

Stock Returns
Stock returns are measured by the monthly return after cash dividend reinvestment. Transaction costs were not considered. Stock returns are the basic element for constructing the SMB and HML factors [49].

Market Returns
Monthly market returns are represented by a market index that is comprehensively calculated from the returns of China's A-share market, B-share market, and growth enterprise market. This choice reflects the complexity of the blockchain industry in China: the listed companies have different characteristics and are distributed across different stock markets, and some of them are FinTech firms. A comprehensive index can therefore reflect the overall price trend of the market more fully and objectively and provide investors with more valuable indicators. The market return reflects the situation of the whole market and is also essential for calculating beta [49].

Risk-Free Rate
In practice, there is no absolute risk-free interest rate. Researchers choose financial products with better liquidity and less default risk to represent the risk-free interest rate, such as the national debt rate and the bank savings deposit rate. The Chinese banking system is dominated by state-owned banks with little default risk and no market segmentation issues, and any individual or corporate institution can deposit in these banks [50]. This paper selected the one-year-term savings deposit interest rate and converted it into a monthly rate by using the continuous compounding method.
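The paper does not spell out the conversion formula, so the following is only a minimal sketch under one common reading of "continuous compounding": the annual deposit rate is turned into a log rate and spread evenly over twelve months. The function name and the 1.5% input are hypothetical and used for illustration only.

```python
import math

def annual_to_monthly_continuous(annual_rate: float) -> float:
    """Monthly continuously compounded rate implied by an annual simple rate.

    Assumption (not stated in the paper): r_month = ln(1 + r_annual) / 12.
    """
    return math.log(1.0 + annual_rate) / 12.0

# Hypothetical one-year deposit rate of 1.5%, for illustration only.
monthly_rf = annual_to_monthly_continuous(0.015)

# Compounding the monthly rate continuously over 12 months recovers 1.5%.
recovered_annual = math.exp(12 * monthly_rf) - 1.0
print(monthly_rf, recovered_annual)
```

Under this reading, the monthly rate is slightly below the naive value of one twelfth of the annual rate, because the log transform accounts for compounding within the year.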

Size
The size of a listed company is determined by its market capitalization, obtained by multiplying the stock price by the number of shares outstanding. Under China's unique conditions, stocks are divided into tradable shares and nontradable shares. Due to the special historical background of the restructuring of state-owned enterprises, the existence of nontradable shares is a major feature of China's securities market, and nontradable shares cannot be circulated on the secondary market [51]. Therefore, only tradable shares are used, and market capitalization is calculated from tradable shares as well.

Value
The value of the company is measured by the ratio of book equity to the market value of equity. The financial statements of a listed company do not directly present this number, but it can be obtained from the price-to-book value. This paper took the reciprocal of the PB (Price-to-Book) value on the last trading day of each month.

Sentiment Data
There are various resources for constructing investor sentiment factors. Zhang and Liu summarized that a simple sentiment index mainly adopts a direct survey method or a data mining method, whereas compound indicators are constructed by selecting multiple single objective indicators, or a combination of single objective indicators and subjective indicators [52]. This paper used the data mining method, and Guba comments were used to represent investors' sentiment. Unlike news reports from newspapers or traditional news websites, Guba is a free medium, and the content of its posts is mainly the expression of investors' subjective wishes, which is relatively random and irregular. For example, Guba comments may contain only a few simple words, or irrelevant and meaningless text. Such noise affects the accuracy of sentiment judgments on posts. Comments on blockchain companies, including the firms that treat this technology as FinTech, were collected for further analysis.
There are several platforms for analyzing text and extracting emotional tendencies from content, for example, Cloud Natural Language from Google, the Baidu AI (Artificial Intelligence) platform, and the Yuyi data platform. To keep the data consistent, the Guba media data analysis database from the Chinese research data services platform was used in this paper. According to the Guba database description, the platform uses a supervised learning model to judge each post's sentiment. The application of supervised learning to post classification in the database includes the following steps:
① Define the categories (positive, negative, and neutral) in advance and manually label the content of the posts to obtain their positive, negative, or neutral tendency. A score of 1 is positive, −1 is negative, and 0 is neutral.
② Automatically obtain data from a dataset with category information. This part of the data is called "training data".
③ Use a supervised learning algorithm, the support vector machine, to learn the classification model on the training dataset, and then use the classification model to predict the categories of a test dataset automatically.
Note that the Guba comments data from the Chinese research data services platform only covered 2008 to 2018, so the 2019 comments data were collected from the China stock market and accounting research database.

Build Portfolios According to Size and Value
The stocks will be grouped monthly from two dimensions, size and value.

Size
There are various ways to assign a company to a size group. Fama and French divided the stocks from three American stock exchanges into a small or big group based on the median of the size factor [5]. This paper used the FFTFM division method. After sorting the circulating market capitalization in ascending order, the stocks were evenly divided into two groups: small (S) and big (B).

Value
Fama and French sorted the book-to-market ratio in ascending order and divided the sample data into a low (L) group, a medium (M) group, and a high (H) group, according to the proportions of 30%, 40%, and 30%; these are called the growth, medium, and value groups, respectively [5]. Because the number of blockchain companies, including but not limited to the firms that take blockchain as FinTech, changes rapidly, the number of stocks in a certain group might be zero under the traditional Fama-French division method. To deal with this issue, the stocks were evenly classified into two groups, a high group (H) and a low group (L). In the future, as more companies use blockchain technology and embrace FinTech, this weakness can be remedied.

Portfolios
After the above two classifications, each stock had two indicators: Size and value. Those stocks will be cross-combined to build four portfolios based on those two dimensions. They are S/L, S/H, B/L, and B/H. The research on the Fama-French pricing model was based on the data of these four portfolios.
The details of these four portfolios are:
1. Portfolio S/L: stocks that belong to both the small-size group and the low book-to-market ratio group.
2. Portfolio S/H: stocks that belong to both the small-size group and the high book-to-market ratio group.
3. Portfolio B/L: stocks that belong to both the big-size group and the low book-to-market ratio group.
4. Portfolio B/H: stocks that belong to both the big-size group and the high book-to-market ratio group.
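The two-way sort described above can be sketched as follows for a single month. This is a simplified illustration, not the authors' code: the tickers and numbers are hypothetical, and using the median as the cut-off (with ties and the middle stock of an odd-sized sample falling into the S or L group) is an assumption about how the "even" split is applied.

```python
from statistics import median

def assign_portfolios(stocks):
    """Label each stock S/L, S/H, B/L or B/H using median splits.

    `stocks` maps ticker -> (tradable market cap, book-to-market ratio).
    Assumption: values at or below the median go to the S or L group.
    """
    size_cut = median(cap for cap, _ in stocks.values())
    bm_cut = median(bm for _, bm in stocks.values())
    labels = {}
    for ticker, (cap, bm) in stocks.items():
        size = "S" if cap <= size_cut else "B"
        value = "L" if bm <= bm_cut else "H"
        labels[ticker] = f"{size}/{value}"
    return labels

# Hypothetical month of data: ticker -> (market cap, book-to-market ratio).
month = {
    "AAA": (10.0, 0.2),
    "BBB": (20.0, 0.8),
    "CCC": (50.0, 0.3),
    "DDD": (80.0, 0.9),
}
labels = assign_portfolios(month)
print(labels)
```

Because the paper regroups monthly, such a function would be called once per month on that month's market capitalizations and book-to-market ratios.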

Ri
Ri is the portfolio return, calculated by weighting each stock's return by the ratio of its circulating market value to the total circulating market value of the portfolio.
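As a small worked example of this weighting, the sketch below computes a value-weighted portfolio return for one month. The function name and the input numbers are hypothetical.

```python
def value_weighted_return(members):
    """Value-weighted return of one portfolio in one month.

    `members` is a list of (circulating market cap, monthly return) pairs.
    """
    total_cap = sum(cap for cap, _ in members)
    if total_cap <= 0:
        raise ValueError("total market capitalization must be positive")
    return sum(cap / total_cap * ret for cap, ret in members)

# Hypothetical portfolio: weights are 0.25 and 0.75.
r_i = value_weighted_return([(100.0, 0.02), (300.0, -0.01)])
print(r_i)
```

Here the larger stock dominates: 0.25 × 0.02 + 0.75 × (−0.01) = −0.0025.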

Market Risk Premium Factor (Rm-Rf)
The market risk premium is obtained by subtracting the risk-free interest rate from the market rate of return. As mentioned above, the market return (Rm) is the monthly return of the A-share, B-share, and China growth enterprise markets. It is a comprehensive monthly return that is calculated after cash dividend reinvestment and obtained by using a market-capitalization-weighted average method. The risk-free rate is the one-year bank savings deposit rate, converted from an annual rate into a monthly one.

SMB
The SMB factor is obtained by comparing the average portfolio return of small-sized companies and the average portfolio return of big-sized companies. This factor measures the difference in returns due to the size of the listed companies. The construction method is to sort all companies by market capitalization from low to high, select the first 50% of the stocks to form a small market value group, select the last 50% of the stocks to form a large market value group, and calculate the return of the small market value group and the return of the large market value group, respectively. The difference between those two returns is then calculated. Repeating this process every month gives the SMB factor sequence.
The specific calculation formula is as follows:

$$SMB_t = R_{S,t} - R_{B,t}$$

where $R_{S,t}$ and $R_{B,t}$ are the returns of the small and large market value groups in month $t$.

HML
The HML factor is obtained by comparing the monthly return of the high book-to-market ratio portfolio and the monthly return of the low book-to-market ratio portfolio. This factor measures the difference in returns due to different book-to-market ratios of listed companies. The construction method sorts all companies by book-to-market ratio from high to low, selects the top 50% of the stocks to form a group with a high book-to-market ratio, and selects the bottom 50% of the stocks to form a group with a low book-to-market ratio. The market-weighted returns of the two groups are then calculated, respectively. Repeating this process every month gives the HML factor sequence.
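The SMB and HML constructions described above can be sketched for a single month as follows. This is an illustration under stated assumptions rather than the authors' implementation: both group returns are market-capitalization weighted (the text states this for HML; applying it to SMB is an assumption), the extra stock of an odd-sized sample goes to the upper half, and all field names and numbers are hypothetical.

```python
def value_weighted(members):
    """Cap-weighted return of a list of stock dicts with 'cap' and 'ret'."""
    total = sum(s["cap"] for s in members)
    return sum(s["cap"] / total * s["ret"] for s in members)

def half_split_spread(stocks, sort_key):
    """Return (low-half return, high-half return) after sorting by `sort_key`."""
    ranked = sorted(stocks, key=lambda s: s[sort_key])
    half = len(ranked) // 2
    return value_weighted(ranked[:half]), value_weighted(ranked[half:])

def smb(stocks):
    small, big = half_split_spread(stocks, "cap")
    return small - big  # small minus big

def hml(stocks):
    low, high = half_split_spread(stocks, "bm")
    return high - low  # high minus low

# Hypothetical month: tradable market cap, book-to-market ratio, monthly return.
month = [
    {"cap": 10.0, "bm": 0.2, "ret": 0.04},
    {"cap": 20.0, "bm": 0.8, "ret": 0.01},
    {"cap": 50.0, "bm": 0.3, "ret": -0.02},
    {"cap": 80.0, "bm": 0.9, "ret": 0.03},
]
smb_t, hml_t = smb(month), hml(month)
print(smb_t, hml_t)
```

Calling these functions on each month's cross-section in turn would produce the monthly SMB and HML sequences.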
The specific calculation formula for HML is as follows:

$$HML_t = R_{H,t} - R_{L,t}$$

where $R_{H,t}$ and $R_{L,t}$ are the returns of the high and low book-to-market ratio groups in month $t$.

Antweiler and Frank introduced a method to measure the effect of investors' sentiment [53]. Bu et al. proposed a measure of investor sentiment that integrates the bullish and bearish expectations of investors based on Guba comments and the naive Bayesian method [54]. This paper referred to this method to construct the model for analyzing negative or positive sentiment based on the Guba comment website. The formulation is:

$$M_t^{c} = \sum_{i \in D(t)} x_i^{c}$$

$M_t^{c}$ is the sum of one emotion during the period $D(t)$, where "c" is one of positive, negative, or neutral. If a comment "i" belongs to "c", then $x_i^{c}$ equals 1; otherwise, $x_i^{c}$ equals 0. "pos" represents positive emotions, "neg" represents negative emotions, and "neu" represents neutral emotions. The sentiment index $Sent_t$ lies between −1 and 1 and indicates investor expectations. Every stock has a "Sent" value every month. The "Sent" value of each portfolio is then obtained according to the portfolio (S/L, S/H, B/L, or B/H) to which the stocks belong.
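The exact formula for the sentiment index did not survive in the text, so the sketch below is only an assumption: it uses the ratio form of the Antweiler and Frank bullishness measure, (M_pos − M_neg) / (M_pos + M_neg), which is bounded between −1 and 1 as the text requires. The function name, the treatment of months without any positive or negative post, and the sample labels are hypothetical.

```python
from collections import Counter

def sentiment_index(labels):
    """Monthly sentiment of one stock from post labels 1, -1 and 0.

    Assumed form (not confirmed by the paper):
        Sent = (M_pos - M_neg) / (M_pos + M_neg)
    Neutral posts are counted but do not enter the ratio.
    """
    counts = Counter(labels)
    m_pos, m_neg = counts[1], counts[-1]
    if m_pos + m_neg == 0:
        return 0.0  # assumption: no polar posts means neutral sentiment
    return (m_pos - m_neg) / (m_pos + m_neg)

# Hypothetical labels for one stock in one month:
# three positive, one negative, two neutral posts.
sent = sentiment_index([1, 1, 1, -1, 0, 0])
print(sent)
```

With three positive and one negative post, the index is (3 − 1) / (3 + 1) = 0.5. How the stock-level values are aggregated to the portfolio level is not specified in the text and is left out here.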

Fama-French Model
After collecting and processing the data above, the traditional Fama-French model can be presented. The regression equation of the model is expressed as follows:

$$R_{it} - R_{ft} = \alpha_i + \beta_i (R_{mt} - R_{ft}) + s_i SMB_t + h_i HML_t + \varepsilon_{it}$$

Following the idea of the FFTFM, the sentiment index reflecting investor sentiment was constructed according to the above sentiment analysis method. The sentiment factor is then added to the model to obtain a four-factor model:

$$R_{it} - R_{ft} = \alpha_i + \beta_i (R_{mt} - R_{ft}) + s_i SMB_t + h_i HML_t + k_i Sentiment_{it} + \varepsilon_{it}$$

$R_{it} - R_{ft}$ is the excess return on portfolio "i". It is the difference between the weighted return of portfolio "i" and the risk-free rate during the same period "t".
$R_{mt} - R_{ft}$ is the difference between the market return and the risk-free rate during the same period "t".
$SMB_t$ is the difference between the portfolio returns of the small companies and the portfolio returns of the big companies during the period "t".
$HML_t$ is the difference between the return of portfolios with high book-to-market ratio companies and the return of portfolios with low book-to-market ratio companies during the period "t".
$Sentiment_{it}$ is the sentiment score of the portfolio "i" during the period "t".
$\alpha_i$ is the intercept, $\beta_i$, $s_i$, $h_i$, and $k_i$ are the factor coefficients, and $\varepsilon_{it}$ is the error term.
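To illustrate how the coefficients of such a regression are estimated, the sketch below fits the four-factor equation by ordinary least squares on synthetic data. It is dependency-free on purpose and solves the normal equations directly; in practice a library routine (for example, OLS in "statsmodels") would normally be used. All numbers are hypothetical, and the returns are generated without noise so that the known coefficients are recovered.

```python
def ols(y, X):
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution
    coef = [0.0] * k
    for i in reversed(range(k)):
        s = A[i][k] - sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = s / A[i][i]
    return coef

# Hypothetical monthly factors: (Rm - Rf, SMB, HML, Sentiment)
factors = [
    (0.02, 0.01, -0.01, 0.3), (-0.01, 0.02, 0.01, -0.2),
    (0.03, -0.01, 0.02, 0.5), (-0.04, 0.00, -0.02, -0.6),
    (0.01, -0.02, 0.00, 0.1), (0.05, 0.01, 0.01, 0.4),
    (-0.02, 0.03, -0.01, -0.1), (0.00, -0.01, 0.02, 0.2),
]
true_coef = [0.001, 1.1, 0.4, -0.3, 0.02]  # alpha, beta, s, h, k
excess = [true_coef[0] + sum(c * x for c, x in zip(true_coef[1:], row))
          for row in factors]

estimated = ols(excess, factors)
print(estimated)
```

Dropping the last column of `factors` gives the three-factor regression, so the two models can be compared on the same portfolio returns.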

Descriptive Analysis
Based on the collected data and the indicators constructed above, descriptive statistical analysis can be performed. Data processing was conducted in Python. Table 1 shows the basic monthly returns of the four portfolios from 2016 to 2019; portfolios with low book-to-market ratios had a positive average return. The portfolio with the lowest average rate of return was B/H, at −0.01115. Portfolio B/L had the highest average rate of return, at 0.01505. Portfolio B/L also had the lowest standard deviation, indicating the smallest fluctuation in performance. The S/L portfolio had the highest standard deviation, which means that small-size companies with low book-to-market ratios showed the largest variation. Table 2 shows the correlations between the parameters and the portfolios' returns. The correlation coefficient between SMB and the market premium was 0.19, indicating that the two were positively correlated, whereas that between HML and the market premium was −0.1, indicating a negative correlation.

Market Risk Factor
Generally, the trend of the market risk factor is the same as the trend of the average return of the four portfolios. In particular, the market risk factor better reflects the direction of change of the S/H and B/H portfolios. Changes in the other portfolios are also captured, but with some discrepancies in specific periods. The returns of portfolios S/L and B/L exceeded the market factor considerably from February 2018 to April 2018. This shows that the market risk factor is one of the essential variables explaining the difference in stock returns, but that relying on the market risk factor alone is not enough to explain the changes. As shown in Figure 1, this also indirectly reflects the shortcomings of the CAPM.
Figures 2 and 3 compare portfolios of different company sizes, given the same book-to-market ratio. The orange line represents the portfolios constructed from big companies, while the blue line represents those built from small companies. From June 2016 to February 2019, the trend of the monthly average return of the portfolio of large-size listed companies was consistent with that of the portfolio of small-size firms. After February 2019, the average monthly yield on stocks of small-size listed companies was higher than that of the portfolios of large-size listed companies. This may be because there were fewer companies engaged in the blockchain industry before 2019, so company size could not be an essential factor affecting portfolio yield. After 2019, there were more than 50 companies engaged in the blockchain industry, which more clearly reflected the difference in yields caused by size. This trend may arise because blockchain is still a new technology, and the number of companies, including the firms that treat blockchain as FinTech, was relatively small in the early stage.

ADF (Augmented Dickey-Fuller) Test
Generally, the first step when studying time series data is to test for stationarity. The Fama-French model is based on stock returns, which form a time series. Besides visual inspection, the most commonly used statistical method is the augmented Dickey-Fuller (ADF) test, an extended form of the Dickey-Fuller test. The ADF test is a unit root test whose null hypothesis is that the series has a unit root. If the test statistic is less than the critical value at the 10%, 5%, or 1% significance level, the null hypothesis can be rejected with 90%, 95%, or 99% confidence, respectively. Since the only difference between the FFTFM and the improved four-factor model is the new independent variable "sentiment", the selected stocks and the classification method remain unchanged. Therefore, it is only necessary to test the stationarity of the returns of each portfolio. The ADF test was conducted in Python using the "statsmodels" package, and the results are reported in Table 3. As shown in Table 3, the returns of all four portfolios passed the ADF test. The t-values were −5.4165, −5.0223, −4.3714, and −5.2175, respectively, and all p-values were approximately zero. As the null hypothesis was rejected, there was no unit root in any of the time series, and the stationary data could be used in further research.

Autocorrelation
Autocorrelation refers to correlation among the random error terms, and it can harm the effectiveness of a multiple linear regression model. So, whether using the traditional FFTFM or the four-factor model that includes the new sentiment factor, it is necessary to test whether this situation exists. The first detection method was the standard Durbin-Watson (DW) test, and the results are presented in Table 4. The DW statistic ranges from 0 to 4; a value close to 0 indicates positive autocorrelation of the error terms, while a value close to 4 indicates negative autocorrelation. If the DW value lies between dL (the lower critical value of d) and dU (the upper critical value of d), the test is inconclusive. If the DW value lies between dU and 4-dU, it can be concluded with greater confidence that there is no autocorrelation. The values of dU and dL for a given sample size and number of regressors must be looked up in a table of DW critical values to check for autocorrelation accurately. According to Table 4, there was no autocorrelation in most portfolios. However, the DW values of portfolios S/L and B/H in the traditional FFTFM fell in the inconclusive range. The DW test was likewise inconclusive for the S/L portfolio in the FFTFM. Therefore, the Breusch-Godfrey LM (Lagrange multiplier) test was also applied, and the chi-square (chi2) probabilities were 0.0818 and 0.0847, respectively. The null hypothesis of the Breusch-Godfrey LM test is that there is no autocorrelation. The results of the Breusch-Godfrey LM test are also shown in Table 4, and the probabilities for all portfolios imply that the null hypothesis could not be rejected. There was no evidence of autocorrelation in any of the multiple linear regression models.

Multicollinearity
Multicollinearity means that there is a linear correlation among the independent variables. It manifests itself when one independent variable can be written as a linear combination of one or several other independent variables, and it harms the regression model. Perfect multicollinearity means that the parameter estimates do not exist. Near-perfect multicollinearity makes the ordinary least squares estimator ineffective; the parameter estimates and their significance tests lose meaning, and reliable predictions cannot be obtained. The variance inflation factor (VIF) can be used to test for multicollinearity. A greater VIF value indicates more severe multicollinearity among the independent variables. The results of the multicollinearity test are shown in Table 5. All independent variables in every regression model passed the test, and there was no multicollinearity.

Heteroscedasticity
An essential assumption of ordinary least squares regression is that all error terms have the same variance, which guarantees reliable parameter estimates. If the error terms have different variances, heteroscedasticity exists in the linear regression model. There are several tests for heteroscedasticity, such as the White test, the Park test, the Glejser test, and the Goldfeld-Quandt test, as well as direct subjective judgment based on a graph. In this paper, the White test was used, and the results are presented in Table 6. The null hypothesis of the White test is homoscedasticity, and the alternative hypothesis is unrestricted heteroscedasticity. According to Table 6, the probabilities for all portfolios, in both the three-factor and the four-factor model, were higher than 0.05. This implies that the null hypothesis could not be rejected and there was no evidence of heteroscedasticity in any portfolio. After conducting the above tests, we could conclude that there was no autocorrelation, multicollinearity, or heteroscedasticity, so further regression analysis could be performed.

Goodness of Fit of the FFTFM
The sample data used in empirical research are obtained through actual observation and reflect objective facts. Therefore, a model can only be considered meaningful if, once the sample data are introduced, it describes these facts well. The degree to which the model approximates the sample is called the "goodness of fit." In multiple regression analysis, the coefficient of determination R^2 is usually used to measure the goodness of fit of the equation. R^2 indicates what proportion of the variation in the dependent variable is explained by the independent variables. The value of R^2 is between 0 and 1. The closer R^2 is to 1, the better the model fits the sample data; if R^2 is close to 0, the model fits the data badly. The regression was conducted in Python, and the results for goodness of fit are shown below. Durbin-Watson statistics are also included in the table, indicating no autocorrelation.
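To make the definition of R^2 concrete, the sketch below fits a three-factor regression by least squares with NumPy and computes R^2 as one minus the ratio of the residual sum of squares to the total sum of squares. The data and coefficient values are synthetic assumptions.

```python
import numpy as np

# Synthetic three-factor data for illustration only.
rng = np.random.default_rng(5)
n = 48
X = rng.normal(0.0, 0.05, size=(n, 3))  # columns stand in for Rm-Rf, SMB, HML
y = 0.01 + X @ np.array([1.2, 0.5, -0.4]) + rng.normal(0.0, 0.03, size=n)

# Ordinary least squares with an intercept.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef

# R^2 = 1 - SS_res / SS_tot: the share of the variation in y explained by the model.
ss_res = float(resid @ resid)
ss_tot = float(((y - y.mean()) ** 2).sum())
r_squared = 1.0 - ss_res / ss_tot

print(f"R^2 = {r_squared:.3f}")
```

Because the regression includes an intercept, the resulting value always lies between 0 and 1.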
The FFTFM performed relatively well for stocks in China's blockchain industry, which could provide a reference for other FinTech companies. The S/L group performed best in terms of explaining portfolio returns, with the model explaining 77.7% of the changes in the stock return. Portfolio B/H had the worst result, at only 45.8%. These results, shown in Table 7, suggest that additional factors that can explain portfolio returns need to be included in the model. The goodness of fit only reflects the results of the FFTFM based on the selected data and cannot describe the overall relationship among the factors. Therefore, it was necessary to perform a significance test on the model as a whole. The standard test for the significance of the whole model is the F-test, as shown in Table 8.
Test hypotheses:
Hypothesis 0 (H0). All coefficients of the regression model are zero, which indicates that the linear relationship of the FFTFM is not significant and the model is meaningless.
Hypothesis 1 (H1). At least one of the coefficients is not zero, which shows that the FFTFM has a significant linear relationship and that the model has explanatory power for portfolio returns.
According to Table 8, at a given significance level of 1%, the F statistics of the four portfolios were all greater than the critical value (F(3,33) = 2.89). The null hypothesis H0 was therefore rejected, which shows that at least one of the regression coefficients was significantly different from 0. It can be concluded that the linear relationship of the FFTFM was significant. Besides, the probability value corresponding to the F statistic of each portfolio was equal to 0, which also shows that the overall linear relationship of the FFTFM was highly significant. In short, the FFTFM reflects the overall characteristics of the returns of portfolios constructed from companies related to blockchain technology well. It may also provide a baseline for other companies that use blockchain as FinTech.
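The link between R^2 and the overall F statistic can be shown with a short worked example. The R^2 of 0.458 for B/H and the critical value of 2.89 with (3, 33) degrees of freedom are taken from the text; the sample size of 37 is an assumption inferred from those degrees of freedom, and the function name is hypothetical.

```python
def f_statistic(r_squared: float, k: int, n: int) -> float:
    """Overall F statistic of an OLS regression with an intercept and k regressors."""
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))


k = 3   # Rm-Rf, SMB, HML
n = 37  # assumption: n - k - 1 = 33, matching the quoted degrees of freedom

f_bh = f_statistic(0.458, k, n)  # lowest R^2 reported for the FFTFM (portfolio B/H)
critical_value = 2.89            # F(3,33) critical value quoted in the text

print(f"F = {f_bh:.2f}, exceeds critical value: {f_bh > critical_value}")
```

Under these assumptions, even the portfolio with the weakest fit yields an F statistic well above the quoted critical value, which is consistent with the rejection of H0 reported for all four portfolios.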

Significance Test of Coefficients
In the previous section, the F-test of the FFTFM was performed. The results showed that all four portfolios passed the F-test, which indicates that the factors in the FFTFM (Rm-Rf, SMB, and HML) were jointly significant for stock returns. However, this does not mean that each factor in the model (Rm-Rf, SMB, or HML) had a significant effect on the return on its own. Therefore, it was necessary to test the significance of each coefficient in the model. This paper used the t-test to analyze the impact of each single factor on the stock return.
The excess market rate of return (Rm-Rf) is an explanatory variable shared by the capital asset pricing model and the FFTFM, so testing its coefficient b shows whether the market risk factor has a significant effect on stock returns.
According to Table 9, the coefficients of all portfolios were positive, which indicates that the market risk factor was positively related to the stock return. Besides, the coefficients of all portfolios passed the t-test at a significance level of 1%, and their t-values exceeded those of the coefficients on the size factor and the book-to-market ratio factor. This shows that the market risk factor had a significant impact on stock returns. However, this differs from the study of Fama and French [5], who concluded that market risk factors have only weak explanatory power.

Book-to-Market Ratio Factor
The coefficient of the book-to-market ratio factor (HML) was tested to determine whether there was a significant relationship between HML and portfolio returns. As shown in Table 9, only S/L and B/L passed the 1% significance test, while the portfolios constructed from companies with a high book-to-market ratio performed worse and did not pass the t-test. Fama and French concluded that when a stock has a low book-to-market ratio, in which case it is called a growth stock, the HML factor in the model generally has a negative slope or a decreasing positive slope; when a stock has a high book-to-market ratio, in which case it is called a value stock, the HML factor generally has an increasing slope [5]. However, the empirical results for Chinese companies in the blockchain industry, including those that take blockchain as FinTech, did not follow this rule, and the HML factor of firms with a high book-to-market ratio could not pass the t-test.

Size Factor
This part examines the linear relationship between the independent variable SMB and portfolio returns. The results in Table 9 show that the regression coefficients of the size factor SMB on S/L, S/H, B/L, and B/H were 1.0496, 0.3943, −0.6057, and 0.0496, respectively. The corresponding t-values were 4.916, 1.730, −2.657, and 0.232. Accordingly, the SMB factor of three portfolios, S/L, S/H, and B/H, had a positive relationship with the excess return of the portfolio, while the SMB factor of the B/L portfolio had a negative relationship with its return. According to the t-test results, the p-values of the S/L and B/L portfolios were less than 1%, and the p-value of S/H was less than 5%. These coefficients were therefore statistically significant, indicating that the SMB factor had a significantly positive relationship with the returns of small-scale blockchain stock portfolios. The t-value of the coefficient for portfolio B/H was less than the critical value, and its p-value was greater than 10%, which means that the relationship between the size factor and the return of the B/H portfolio was not significant. Since the samples were companies using blockchain, which is one of the components of FinTech, the results might serve as a reference for other FinTech firms.

An Improved Four-Factor Model Based on Fama-French Model
After collecting and processing the Guba comments on each blockchain company, a score of investors' emotions could be obtained. People generally regard blockchain as an innovative technology, especially when it comes to FinTech. The sentiment score is a daily series, so it was converted into a monthly series. The monthly scores of the firms in the same portfolio were then averaged to obtain the portfolio's monthly score. Consequently, each of the four portfolios, S/L, S/H, B/L, and B/H, has its own sentiment index for each month, which could be added to the traditional Fama-French model to construct a new four-factor model. In this model, each portfolio's return is regressed on the market factor (Rm-Rf), the size factor (SMB), the value factor (HML), and the sentiment factor (Sentiment).

Goodness of Fit and F Test
Table 10 illustrates the results of the F test and the goodness of fit of the four-factor Fama-French pricing model. The F test results for the significance of the regression equation show that the p-value of the F test for the S/L, S/H, B/L, and B/H portfolios was equal to 0 when the number of samples was 150 and the number of explanatory variables was 3. All four portfolios passed the F test at a 99% confidence level, indicating that the four regression equations were highly significant. From this, it was concluded that the market factor (Rm-Rf), the size factor (SMB), the value factor (HML), and the sentiment factor (Sentiment) jointly had a significant effect on the dependent variable, the portfolio return.
R^2 can be used to test how well the regression equation fits the sample observations. The closer R^2 is to 1, the better the regression fit. The results in Table 10 show that the coefficient of determination was 0.771 for S/L, 0.529 for S/H, 0.689 for B/L, and 0.514 for B/H. From the above analysis, the portfolio with the best fit was the small-scale, low book-to-market one, and the portfolio with the worst fit was the one constructed from big-scale, high book-to-market companies. The fit of the four groups was not very good, so there should be other factors in the market with additional explanatory power beyond those included in the model.
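The construction of the monthly portfolio sentiment index described at the start of this section can be sketched as follows. The firm identifiers, portfolio membership, and score range are hypothetical, and taking the monthly mean of daily scores and an equal-weighted average across firms are assumptions, since the exact aggregation rules are not spelled out in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sentiment scores for three firms; not the paper's data.
rng = np.random.default_rng(7)
days = pd.date_range("2019-01-01", "2019-03-31", freq="D")
daily_scores = pd.DataFrame(
    rng.uniform(-1.0, 1.0, size=(len(days), 3)),
    index=days,
    columns=["firm_a", "firm_b", "firm_c"],
)
# Hypothetical portfolio membership.
portfolio_members = {"S/L": ["firm_a", "firm_b"], "B/H": ["firm_c"]}

# Step 1: daily series -> monthly series (mean of the daily scores within each month).
monthly_scores = daily_scores.groupby(daily_scores.index.to_period("M")).mean()

# Step 2: average the monthly scores across the firms in each portfolio.
portfolio_sentiment = pd.DataFrame(
    {name: monthly_scores[firms].mean(axis=1) for name, firms in portfolio_members.items()}
)

print(portfolio_sentiment.round(4))
```

Each column of the result is a monthly series that could enter the regression as the Sentiment factor for the corresponding portfolio.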

Parameters' Analysis
The first step is the analysis of the regression coefficient and the significance of the market return for the return of each portfolio. Market portfolio returns are proxy variables for systematic risk, and the beta reflects the sensitivity of a single asset or portfolio to market changes. Table 11 shows that the market risk premium coefficients (β) of the four portfolios were 1.2245, 1.3174, 1.289, and 1.1852, respectively. The β coefficients were all greater than 0, indicating that the returns on the portfolios moved in the same direction as the return of the whole market. The p-values of the t-test for the four portfolios were all less than 1%, so the null hypothesis could be rejected at a 99% confidence level. Therefore, the linear relationship between the market factor (Rm-Rf) and the portfolio return was significant. Secondly, the linear relationship between the independent variable SMB and the portfolio return was tested. The results in Table 11 show that the regression coefficients of the company size factor (SMB) on S/L, S/H, B/L, and B/H were 1.06, 0.4275, −0.4984, and 0.1167, respectively. The t-values were 4.878, 1.871, −2.203, and 0.56, and the p-values for the t-test were 0, 0.070, 0.035, and 0.579. This shows that the SMB factors of portfolios S/L, S/H, and B/H had a positive relationship with the excess return, and the SMB factor of B/L had a negative relationship with its return. According to the t-test results, the p-value of the S/L portfolio was less than 1%, and the p-value of S/H was less than 10%. These coefficients were therefore statistically significant, indicating that the size factor (SMB) had a significantly positive relationship with the returns of small-scale blockchain stock portfolios.
The SMB factor in the B/L portfolio also passed the t-test, at the 5% significance level (p-value 0.035). Still, the t-value for B/H was less than the critical value, indicating that the relationship between the size factor of the B/H portfolio and the portfolio's return was not significant.
Thirdly, the regression coefficients of the HML factor, which capture the BM effect, were −1.1042, −0.0487, −0.8535, and −0.155 for the four portfolios, in turn. The t-values were −5.793, −0.249, −3.922, and −0.872, respectively, with corresponding p-values of 0, 0.805, 0, and 0.39. The regression results show that the HML factors of all portfolios had a negative relationship with their returns, which differs from the study of Fama and French [5]. According to the t-test results, the absolute t-values of the value factor (HML) in the S/L and B/L portfolios were greater than the critical value, and the HML factors in these two portfolios passed the t-test at the 1% significance level. The t-values of HML in portfolios S/H and B/H show that these HML factors did not pass the test, which means that the HML factor in portfolios composed of listed firms with a high book-to-market ratio had a very weak relationship with the returns of the portfolios.
Finally, the fourth factor, "sentiment", was tested. The results in Table 11 show that the coefficients of "Sentiment" in the four portfolios were 0.0351, 0.1054, 0.2469, and 0.1493, respectively, which implies that the sentiment factors in all portfolios had a positive relationship with the return.
In brief, the more optimistic investors are, the higher the returns of the stocks. However, a more detailed look at the results shows that only the sentiment factors in portfolios composed of large firms passed the t-test at the 10% significance level. The t-values of the sentiment factor of the S/L and B/L portfolios were 0.0351 and −0.0487, respectively. Thus, there was no significant relationship between investors' sentiment and the returns of portfolios constructed from small firms.

Conclusions
This paper addresses regression analysis for FinTech and demonstrates how valuation theory can be applied to companies embracing FinTech. It focused on blockchain companies in China, including those that treat blockchain as FinTech, and used the FFTFM to evaluate these firms. It also included a new sentiment factor, collected from public comments, to improve explanatory power. The descriptive analysis and regression analysis above lead to the following conclusions.

Feasibility of FFTFM and an Improved Four-Factor Model
The FFTFM and the improved Fama-French model with the new sentiment factor both passed the F test. The market factor, size factor, and book-to-market ratio factor in the FFTFM had explanatory power for portfolios' returns. In the improved model, the sentiment factor also helped to explain the returns of the portfolios.
Notably, the explanatory power for the four portfolios increased when the sentiment factor was added to the FFTFM, with R^2 rising from 0.77, 0.509, 0.653, and 0.458 to 0.771, 0.529, 0.689, and 0.514, respectively. This shows that the sentiment factor had a positive effect on the model. All coefficients of the sentiment factor in the different portfolios were positive, indicating that more optimism is associated with a higher return.
However, the R-squared values of the eight regressions were still lower than expected: the best goodness-of-fit value was 77.1%, so additional explanatory variables may be needed to explain the returns of the portfolios.
Additionally, to guarantee the reliability of the regression results, we tested for autocorrelation, multicollinearity, and heteroscedasticity, and the tests indicated that all regressions were acceptable.
Finally, although all portfolios passed the F test, the independent variables in each portfolio should also be checked for significance. There were two portfolios (S/L and B/L) under the FFTFM and one portfolio (B/L) under the four-factor model whose independent variables all passed the significance test.

Influence of Market Risk Premium Factor
The blockchain industry in China, including, but not limited to, the firms that take blockchain as FinTech, has a positive relationship with the overall market environment. The coefficients of the market risk premium factor in all eight regressions were greater than 1, which implies that the investment portfolios could diversify away nonsystematic risk and that the market factor contributed to the portfolios' returns. The blockchain industry is an emerging market in China, and a number of companies have recently begun to embrace this technology, including the companies that use blockchain as FinTech. Along with the development of the blockchain industry, numerous investors have been attracted to it and plan to invest in it as well. This results in higher return volatility than that of the whole market.

The Non-Existence of Size Effect and Book-to-Market Ratio Effect in the Chinese Blockchain Industry
The size effect means that small listed firms have significantly higher average returns than large ones. Banz first found this effect, and Fama and French verified its existence [5,55]. However, several researchers hold opposing views about the size effect. Goyal and Welch concluded that this effect is caused by sample selection bias rather than the size of the companies [56]. Dimson and Marsh believe that big companies could achieve a higher return than small firms [57]. Schwert claimed that the size effect is gradually disappearing [58]. From this empirical research, the conclusion could be drawn that there is no size effect in the Chinese blockchain industry, which includes the firms using this technology as FinTech; portfolios with big companies bring a higher return.
The BM effect means that the return of a stock has a positive relationship with the company's book-to-market ratio: a higher book-to-market ratio brings a higher stock return. Fama and French also support the existence of the BM effect [5]. Chinese researchers have drawn differing conclusions about the BM effect in the Chinese securities market. Xu argued that there is a significant BM effect in the Chinese stock market [59]. Gu and Ding conducted an empirical study of the growth effect in China's securities market and found that the BM effect does not exist [60]. According to the analysis of the Fama-French model above, the BM effect does not exist in the Chinese blockchain industry. Portfolios built from companies with a low book-to-market ratio earn higher returns than the others.
Various factors affect investment in the stock market, and investors have long been trying to obtain higher investment returns. Many scholars have studied the relevant factors in this field and have shown that the FFTFM can be applied to the developed securities markets of the West. Compared with these mature stock markets, the Chinese stock market developed late. Therefore, whether the Chinese market meets the conditions for using the three-factor model has been under discussion. In recent years, with the continuous development of technologies such as big data, methods for measuring investor sentiment have also advanced. The online forum is an important window through which investors express their sentiments. This article took this factor into account to improve the model's explanatory power. The empirical analysis showed that there is neither a size effect nor a BM effect in the Chinese blockchain industry. Companies that attract more positive and optimistic sentiment can bring higher returns to investors. These findings can help investors choose high-return companies in this field. We also need to admit that the method of sentiment analysis is still relatively simple, and the accuracy of the text sentiment measurement needs to be improved. The extent to which the information in online forums affects investors' decisions needs further research.