Estimation of Cross-Lingual News Similarities Using Text-Mining Methods
Abstract
This research proposes two machine-learning-based estimation algorithms for extracting cross-lingual news pairs from financial news articles. Every second, vast amounts of text data, including all kinds of news, reports, messages, reviews, comments, and tweets, are generated on the Internet, written not only in English but also in other languages such as Chinese, Japanese, and French. Taking advantage of the multilingual text resources provided by Thomson Reuters News, we developed both algorithms to extract cross-lingual news pairs from those resources. In our first method, we propose a novel structure that makes effective use of word information and machine learning for this task. We also developed a bidirectional Long Short-Term Memory (LSTM) based method to calculate cross-lingual semantic text similarity for long text and short text, respectively. Thus, when an important news article is published, users can use our method to read similar news articles written in their native language.
Wang, Z.; Liu, E.; Sakaji, H.; Ito, T.; Izumi, K.; Tsubouchi, K.; Yamashita, T. Estimation of Cross-Lingual News Similarities Using Text-Mining Methods. J. Risk Financial Manag. 2018, 11, 8.
Wang Z, Liu E, Sakaji H, Ito T, Izumi K, Tsubouchi K, Yamashita T. Estimation of Cross-Lingual News Similarities Using Text-Mining Methods. Journal of Risk and Financial Management. 2018; 11(1):8.
Chicago/Turabian Style
Wang, Zhouhao; Liu, Enda; Sakaji, Hiroki; Ito, Tomoki; Izumi, Kiyoshi; Tsubouchi, Kota; Yamashita, Tatsuo. 2018. "Estimation of Cross-Lingual News Similarities Using Text-Mining Methods." J. Risk Financial Manag. 11, no. 1: 8.
Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.