# Estimation of Economic Indicator Announced by Government From Social Big Data

## Abstract

## 1. Introduction

- Two variables follow a linear relationship.
- Averages and standard deviations are well-defined.

- Overfitting
- Spurious regression
- Multicollinearity

## 2. Development of a New Economic Index Based on a Comprehensive Search

#### 2.1. Data Pre-Processing

#### 2.2. Variable Selection

#### 2.2.1. Extraction of Words by One-Body Correlation

#### 2.2.2. Grouping

#### 2.2.3. Round Robin (Detection of Spurious Correlation)

#### 2.2.4. Visualization of Relationships Between Variables

#### 2.3. Regression Analysis

#### 2.4. Daily Index and Smoothing

## 3. Discussion

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

**Figure 1.** Flowchart for estimating an economic index from many word frequencies based on a comprehensive search.

**Figure 2.** The upper chart shows the coincident Composite Index (CI), and the lower chart shows the monthly frequency of “hukeiki” (recession) from January 2007 to December 2014. Broken lines represent the median.

**Figure 3.** Two typical examples of grouped words: (**a**) recession and economic recovery policy; and (**b**) foreign banks and capital market.

**Figure 4.** The relationships between variables: (i) economic index and words; and (ii) representative words and subordinate words, as a result of the analyses: extraction of words by one-body correlation, grouping, and round robin (detection of spurious correlation).

**Figure 5.** Regression results using the following word frequencies: (**a**) recession (one word); (**b**) recession, foreign banks and primary balance (three words); and (**c**) recession, foreign banks, primary balance, allowance, medical expenses, long-term prime rate and home sales (seven words). The training period was from January 2007 to December 2014 (green line) and the test period was from January 2015 to October 2015 (blue line).

**Figure 6.** Regression results of CI by seven random walks. In case (**a**), $R^{2}=0.35$, and in case (**b**), $R^{2}=0.83$.

**Figure 7.** Probability density function of $R^{2}$ for 10,000 samples obtained by the regression of seven random walks. The red line shows the result in Figure 5c.
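
The null distribution described in this caption can be approximated with a short simulation. The following is a minimal sketch, not the authors' code: it repeatedly regresses a target series on seven independent random walks and collects the resulting $R^{2}$ values. The function names, the Gaussian step distribution, and the stand-in target series are all assumptions made for illustration. The actual CI series is not reproduced here; the length of 96 points simply matches a monthly series from January 2007 to December 2014.

```python
import numpy as np


def r_squared(y, X):
    """Coefficient of determination for an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)


def null_r2_samples(y, n_vars=7, n_samples=10_000, seed=0):
    """R^2 values obtained by regressing y on n_vars independent random walks."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        # Each column is one random walk: cumulative sum of Gaussian steps (assumed).
        walks = rng.standard_normal((len(y), n_vars)).cumsum(axis=0)
        samples[i] = r_squared(y, walks)
    return samples


# Stand-in target (hypothetical, not the real CI): a random walk with 96 monthly points.
target = np.random.default_rng(42).standard_normal(96).cumsum()
samples = null_r2_samples(target, n_samples=1_000)

# Fraction of random-walk regressions whose R^2 reaches 0.83, the value quoted for Figure 6b.
tail_fraction = float((samples >= 0.83).mean())
print(f"median null R^2 = {np.median(samples):.2f}, P(R^2 >= 0.83) = {tail_fraction:.3f}")
```

Comparing an observed $R^{2}$ against `samples` in this way gives an empirical tail probability, which is one way to read the red line in the figure against the density.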

**Figure 8.** The relationship between the coefficient of determination ($R^{2}$) and the number of explanatory variables for the proposed model (black) and for the random walk model at the top 5 and 10 percentiles (red and blue).
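
The random-walk reference curves in this caption can be sketched as follows. This is an illustrative reading rather than the authors' procedure: for each number of explanatory variables $k$, a target is regressed on $k$ random walks many times, and the $R^{2}$ values exceeded by only the top 5% and top 10% of trials are recorded. Interpreting "top 5 and 10 percentiles" as the 95th and 90th percentiles of $R^{2}$ is an assumption, as are the function names, the Gaussian steps, and the stand-in target series.

```python
import numpy as np


def fit_r2(y, X):
    """R^2 of an ordinary least-squares fit of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)


def random_walk_r2_thresholds(y, max_vars=7, n_samples=2_000, tops=(5, 10), seed=0):
    """For k = 1..max_vars, return the R^2 reached by only the top p% of random-walk fits."""
    rng = np.random.default_rng(seed)
    thresholds = {}
    for k in range(1, max_vars + 1):
        r2 = np.array([
            fit_r2(y, rng.standard_normal((len(y), k)).cumsum(axis=0))
            for _ in range(n_samples)
        ])
        # "Top p%" is read here as the (100 - p)th percentile of the null R^2 values.
        thresholds[k] = {p: float(np.percentile(r2, 100 - p)) for p in tops}
    return thresholds


# Stand-in target (hypothetical, not the real CI): a 96-point random walk.
target = np.random.default_rng(7).standard_normal(96).cumsum()
for k, row in random_walk_r2_thresholds(target, n_samples=500).items():
    print(k, {p: round(v, 2) for p, v in row.items()})
```

A model whose $R^{2}$ lies above these thresholds for the same number of variables fits better than most random-walk regressions of that size, which is the comparison the figure draws.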

| Words | p-Value |
|---|---|
| Recession | $9.23\times 10^{-16}$ |
| Stock Investment | $9.23\times 10^{-16}$ |
| Commercial Bank | $4.77\times 10^{-14}$ |
| IPO | $4.77\times 10^{-14}$ |

| i \ j | Recession | Foreign banks | Medical expenses | Economic measures |
|---|---|---|---|---|
| Recession | - | 39.2 | 31.4 | 31.7 |
| Foreign banks | 21.8 | - | 18.8 | 21.3 |
| Medical expenses | 9.4 | 11.6 | - | 15.0 |
| Economic measures | 7.6 | 12.8 | 13.5 | - |

| Word | Coefficient ($c_{k}$) |
|---|---|
| Recession | $-1.09\times 10^{4}$ |
| Foreign banks | $2.07\times 10^{5}$ |
| Primary balance | $2.70\times 10^{5}$ |
| Allowance | $4.74\times 10^{3}$ |
| Medical expenses | $1.50\times 10^{4}$ |
| Long-term prime rate | $1.77\times 10^{5}$ |
| Home sales | $4.21\times 10^{5}$ |
| (intercept) | $93.1$ |
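
To make the role of these coefficients concrete, the sketch below evaluates a linear model of the form intercept plus the sum of $c_{k}$ times the frequency of word $k$. The linear form and the idea that frequencies are normalized (so that coefficients of order $10^{4}$ to $10^{5}$ yield index-scale values) are assumptions; this section does not state the normalization. The function name `estimate_index` and the example input are hypothetical.

```python
# Coefficients c_k and intercept copied from the table above.
COEFFICIENTS = {
    "recession": -1.09e4,
    "foreign banks": 2.07e5,
    "primary balance": 2.70e5,
    "allowance": 4.74e3,
    "medical expenses": 1.50e4,
    "long-term prime rate": 1.77e5,
    "home sales": 4.21e5,
}
INTERCEPT = 93.1


def estimate_index(frequencies):
    """Linear estimate: intercept + sum_k c_k * frequency_k (missing words count as 0)."""
    return INTERCEPT + sum(
        coef * frequencies.get(word, 0.0) for word, coef in COEFFICIENTS.items()
    )


# Hypothetical input: only "recession" appears, at a normalized frequency of 1e-4.
example = estimate_index({"recession": 1e-4})
print(round(example, 2))
```

Under this reading, a higher frequency of "recession" lowers the estimate, since it is the only word with a negative coefficient, while the other six words raise it.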

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Yamada, K.; Takayasu, H.; Takayasu, M.
Estimation of Economic Indicator Announced by Government From Social Big Data. *Entropy* **2018**, *20*, 852.
https://doi.org/10.3390/e20110852

**Chicago/Turabian Style**

Yamada, Kenta, Hideki Takayasu, and Misako Takayasu.
2018. "Estimation of Economic Indicator Announced by Government From Social Big Data" *Entropy* 20, no. 11: 852.
https://doi.org/10.3390/e20110852