Text Mining arXiv: A Look Through Quantitative Finance Papers
Abstract
1. Introduction
2. Data Description
- Computational finance (q-fin.CP) includes Monte Carlo, PDE, lattice and other numerical methods with applications to financial modeling;
- Economics (q-fin.EC) is an alias for econ.GN and it analyses micro and macro economics, international economics, theory of the firm, labor economics, and other economic topics outside finance;
- General finance (q-fin.GN) is focused on the development of general quantitative methodologies with applications in finance;
- Mathematical finance (q-fin.MF) examines mathematical and analytical methods of finance, including stochastic, probabilistic and functional analysis, algebraic, geometric, and other methods;
- Portfolio management (q-fin.PM) deals with security selection and optimization, capital allocation, investment strategies, and performance measurement;
- Pricing of securities (q-fin.PR) discusses valuation and hedging of financial securities, their derivatives, and structured products;
- Risk management (q-fin.RM) is about risk measurement and management of financial risks in trading, banking, insurance, corporate, and other applications;
- Statistical finance (q-fin.ST) includes statistical, econometric and econophysics analyses with applications to financial markets and economic data;
- Trading and market microstructure (q-fin.TR) studies market microstructure, liquidity, exchange and auction design, automated trading, agent-based modeling, and market-making.
3. Text Preprocessing
4. Topics Trend
4.1. Topic Modeling Algorithms
- k-means. I perform a clustering analysis using the k-means algorithm implemented in scikit-learn. This algorithm groups data points into k clusters by minimizing the within-cluster squared distances between data points and their cluster centers. The document-word matrix is created through the CountVectorizer function, which converts the corpus to a matrix of token counts. I ignore terms that have a document frequency strictly higher than 75% (this corresponds to the max_df parameter of CountVectorizer).
- LDA. Using the same document-word matrix analyzed with the k-means algorithm, I perform topic modeling with latent Dirichlet allocation (LDA), a well-known unsupervised learning algorithm. As observed in the seminal work of [30], the basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. I study two different implementations of LDA (i.e., scikit-learn and gensim).
- Word2Vec. I train a word embedding model (i.e., Word2Vec) and then perform a clustering analysis, again with the k-means approach. An embedding is a low-dimensional space into which high-dimensional vectors are projected. Machine learning on large inputs, such as sparse vectors representing words, is easier when embeddings are used. Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space. The Word2Vec neural network introduces distributed word representations that capture syntactic and semantic word relationships (see [31]). In more detail, I generate document vectors using the trained Word2Vec model: I obtain a numerical vector for each word in a document, and the document vector is the weighted average of these word vectors. The k-means algorithm is then applied to the resulting matrix representing the corpus. I consider the Word2Vec model implemented in gensim.
- Doc2Vec. I create a vectorized representation of each document through the Doc2Vec model and then perform a clustering analysis with the k-means approach. Doc2Vec extends Word2Vec and can learn distributed representations of texts of varying length, from sentences to documents (see [32]). I consider the Doc2Vec model implemented in gensim.
- Top2Vec. I study the Top2Vec model, an unsupervised learning algorithm that finds topic vectors in a space of jointly embedded document and word vectors (see [33]). This algorithm detects topics directly through the following steps. First, embedding vectors for documents and words are generated. Second, a dimensionality reduction is performed on the vectors. Third, the vectors are clustered and topics are assigned. The algorithm is implemented in an ad hoc library named Top2Vec, which automatically provides information on the number of topics, the topic sizes, and the words representing the topics.
- BERTopic. I study a BERTopic model, which is similar to Top2Vec in terms of algorithmic structure and uses BERT as an embedder. As described in the seminal work of [34], topic representations are extracted from the clusters of documents using a custom class-based variation of term frequency-inverse document frequency (TF-IDF). This is the main difference with respect to Top2Vec. The algorithm is implemented in an ad hoc library named BERTopic. The main downside of working with large documents, as in my case, is that information is ignored if the documents are too long: the model accepts a limited number of tokens and discards any additional input. Since I am dealing with large documents, to work around this issue, I first split each document into chunks of 300 tokens and then fit the model on these chunks. BERTopic does not allow one to directly select the number of topics; for this reason, in the first step, I obtain a number of topics much larger than the desired one. Since I obtain the corresponding topic for each chunk, I have for each document a list of possibly different topics, and the length of these lists varies across documents (i.e., the length of a single list depends on the length of the corresponding document). To perform clustering on this list of lists of topics, I consider each integer representing a topic as a word. Thus, I use the Word2Vec algorithm described above to find similarities between these lists of topics. Each topic label, that is, the number representing the topic, is treated as a string, and Word2Vec transforms it into a numerical vector. I then apply k-means clustering to group these lists based on their similarity in the vector space. The resulting clusters reveal relationships and patterns among these lists and allow me to select the number of clusters I need for my purposes.
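The first two approaches in the list above share one document-word matrix. The following is a minimal sketch of that setup in scikit-learn, not the paper's actual code: the toy corpus, the number of topics, `random_state`, and `n_init` are all illustrative assumptions. Only the use of CountVectorizer with a 75% document-frequency cutoff, k-means, and LDA comes from the text.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for the preprocessed papers (hypothetical data).
# The word "finance" appears in every document on purpose.
corpus = [
    "finance option pricing stochastic volatility model",
    "finance stochastic volatility option pricing jumps",
    "finance portfolio optimization risk allocation",
    "finance portfolio allocation risk measures optimization",
]

# max_df=0.75 ignores terms whose document frequency is strictly above 75%,
# so "finance" (present in 100% of the documents) is dropped.
vectorizer = CountVectorizer(max_df=0.75)
doc_word = vectorizer.fit_transform(corpus)

n_topics = 2  # illustrative value, not the setting used in the paper

# k-means on the token-count matrix: one hard cluster label per document.
kmeans = KMeans(n_clusters=n_topics, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(doc_word)

# LDA on the same matrix: each row of doc_topic is a mixture over topics.
lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
doc_topic = lda.fit_transform(doc_word)

# One possible way to get a single label per document: the dominant topic.
lda_labels = doc_topic.argmax(axis=1)
```

Taking the dominant LDA topic as the document label is an assumption made here so that both methods produce comparable hard assignments.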
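The chunking workaround described for BERTopic can be sketched with plain Python. This is only an illustration under stated assumptions: the helper names `split_into_chunks` and `topic_sentences` are hypothetical, the chunk-level topic ids are made up, and whitespace-style tokens stand in for whatever tokenizer is actually used. Only the chunk size of 300 tokens and the idea of treating topic ids as words come from the text.

```python
def split_into_chunks(tokens, chunk_size=300):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def topic_sentences(chunk_topics_per_doc):
    """Turn each document's list of chunk topic ids into a list of string 'words'."""
    return [[str(topic) for topic in topics] for topics in chunk_topics_per_doc]


# A hypothetical document of 650 tokens yields chunks of 300, 300, and 50 tokens.
tokens = [f"w{i}" for i in range(650)]
chunks = split_into_chunks(tokens)

# Hypothetical chunk-level topic ids, one list per document, as a topic model
# fitted on the chunks might return them. List lengths differ across documents.
chunk_topics_per_doc = [[12, 12, 7], [3, 45]]

# Each topic id becomes a string token, so every document is a short "sentence"
# that a word embedding model can be trained on.
sentences = topic_sentences(chunk_topics_per_doc)
```

The resulting `sentences` have the list-of-token-lists shape that a word embedding model such as gensim's Word2Vec expects as training input; averaging the learned vectors per document and clustering them with k-means would complete the procedure.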
4.2. Algorithm Performance on Full Texts
4.3. Empirical Study
5. Extracting Authors and Journals
6. Conclusions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Bianchi, M.L.; Tassinari, G.L.; Fabozzi, F.J. Fat and heavy tails in asset management. J. Portf. Manag. 2023, 49, 236–263.
- Vogl, M. Quantitative modelling frontiers: A literature review on the evolution in financial and risk modelling after the financial crisis (2008–2019). SN Bus. Econ. 2022, 2, 183.
- Derman, E. Models Behaving Badly: Why Confusing Illusion with Reality Can Lead to Disaster, on Wall Street and in Life; Wiley: Hoboken, NJ, USA, 2011.
- Ippoliti, E. Mathematics and finance: Some philosophical remarks. Topoi 2021, 40, 771–781.
- Carmona, R. The influence of economic research on financial mathematics: Evidence from the last 25 years. Financ. Stoch. 2022, 26, 85–101.
- Cesa, M. A brief history of quantitative finance. Probab. Uncertain. Quant. Risk 2017, 2, 1–16.
- Derman, E.; Miller, M.B. The Volatility Smile; Wiley: Hoboken, NJ, USA, 2016.
- Bianchetti, M.; Carlicchi, M. Interest Rates After the Credit Crunch: Multiple-Curve Vanilla Derivatives and SABR. Available online: https://arxiv.org/abs/1103.2567 (accessed on 21 February 2025).
- Huisman, J.; Smits, J. Duration and quality of the peer review process: The author’s perspective. Scientometrics 2017, 113, 633–650.
- Clement, C.B.; Bierbaum, M.; O’Keeffe, K.P.; Alemi, A.A. On the use of arXiv as a dataset. Available online: https://arxiv.org/abs/1905.00075 (accessed on 21 February 2025).
- Eger, S.; Li, C.; Netzer, F.; Gurevych, I. Predicting Research Trends from arXiv. Available online: https://arxiv.org/abs/1903.02831 (accessed on 21 February 2025).
- Viet, N.T.; Kravets, A.G. Analyzing recent research trends of computer science from academic open-access digital library. In Proceedings of the 8th International Conference on System Modeling and Advancement in Research Trends (SMART), Moradabad, India, 22–23 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 31–36.
- Lin, J.; Yu, Y.; Zhou, Y.; Zhou, Z.; Shi, X. How many preprints have actually been printed and why: A case study of computer science preprints on arXiv. Scientometrics 2020, 124, 555–574.
- Tan, K.; Munster, A.; Mackenzie, A. Images of the arXiv: Reconfiguring large scientific image datasets. J. Cult. Anal. 2021, 3, 1–41.
- Okamura, K. Scientometric engineering: Exploring citation dynamics via arXiv eprints. Quant. Sci. Stud. 2022, 3, 122–146.
- Bohara, K.; Shakya, A.; Debb Pande, B. Fine-tuning of RoBERTa for document classification of arXiv dataset. In Mobile Computing and Sustainable Informatics; Shakya, G., Papakostas, S., Kamel, K.A., Eds.; Springer Nature: Singapore, 2023; pp. 243–255.
- Fatima, R.; Yasin, A.; Liu, L.; Wang, J.; Afzal, W. Retrieving arXiv, SocArXiv, and SSRN metadata for initial review screening. Inf. Softw. Technol. 2023, 161, 107251.
- Burton, B.; Kumar, S.; Pandey, N. Twenty-five years of The European Journal of Finance (EJF): A retrospective analysis. Eur. J. Financ. 2020, 26, 1817–1841.
- Ali, A.; Bashir, H.A. Bibliometric study on asset pricing. Qual. Res. Financ. Mark. 2022, 14, 433–460.
- Aria, M.; Cuccurullo, C. bibliometrix: An R-tool for comprehensive science mapping analysis. J. Informetr. 2017, 11, 959–975.
- Sharma, P.; Sharma, D.K.; Gupta, P. Review of research on option pricing: A bibliometric analysis. Qual. Res. Financ. Mark. 2024, 16, 159–182.
- Donthu, N.; Kumar, S.; Mukherjee, D.; Pandey, N.; Lim, W.M. How to conduct a bibliometric analysis: An overview and guidelines. J. Bus. Res. 2021, 133, 285–296.
- Van Eck, N.; Waltman, L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 2010, 84, 523–538.
- Bastian, M.; Heymann, S.; Jacomy, M. Gephi: An open source software for exploring and manipulating networks. In Proceedings of the International AAAI Conference on Web and Social Media, Palo Alto, CA, USA, 17–20 May 2009; pp. 361–362.
- Liu, J.; Liu, Y.; Ren, L.; Li, X.; Wang, S. Trends and Trajectories: A Bibliometric Analysis of Financial Risk in Corporate Finance and Finance (2020–2024). Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4960436 (accessed on 21 February 2025).
- Joaqui-Barandica, O.; Manotas-Duque, D.F. Assets liability management: A bibliometric analysis and topic modeling. Entramado 2022, 18, 1–23.
- Westergaard, D.; Stærfeldt, H.H.; Tønsberg, C.; Jensen, L.J.; Brunak, S. A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts. PLoS Comput. Biol. 2018, 14, e1005962.
- DuBay, W.H. The principles of readability. ERIC 2004. Available online: https://eric.ed.gov/?id=ed490073 (accessed on 21 February 2025).
- Sethia, K.; Saxena, M.; Goyal, M.; Yadav, R.K. Framework for topic modeling using BERT, LDA and K-means. In Proceedings of the 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 28–29 April 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 2204–2208.
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781.
- Le, Q.; Mikolov, T. Distributed representations of sentences and documents. In International Conference on Machine Learning; PMLR: Brookline, MA, USA, 2014; pp. 1188–1196.
- Angelov, D. Top2vec: Distributed Representations of Topics. arXiv 2020, arXiv:2008.09470.
- Grootendorst, M. BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure. arXiv 2022, arXiv:2203.05794.
- Egger, R.; Yu, J. A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify Twitter posts. Front. Sociol. 2022, 7, 886498.
- Rüdiger, M.; Antons, D.; Joshi, A.M.; Salge, T.O. Topic modeling revisited: New evidence on algorithm performance and quality metrics. PLoS ONE 2022, 17, e0266325.
- Terragni, S.; Fersini, E.; Galuzzi, B.G.; Tropeano, P.; Candelieri, A. OCTIS: Comparing and optimizing topic models is simple. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, Kyiv, Ukraine, 19–23 April 2021; pp. 263–270.
- Ebinezer, S. Transform Your Topic Modeling with ChatGPT: Cutting-Edge NLP. Available online: https://medium.com/ (accessed on 21 February 2025).
- Metelko, Z.; Maver, J. Exploring arXiv usage habits among Slovenian scientists. J. Doc. 2023, 79, 72–94.
- Mishkin, D.; Tabb, A.; Matas, J. ArXiving Before Submission Helps Everyone. arXiv 2020, arXiv:2010.05365.
- Wang, Z.; Chen, Y.; Glänzel, W. Preprints as accelerators of scholarly communication: An empirical analysis in mathematics. J. Informetr. 2020, 14, 101097.
Algorithm | RS | ARS | MI | NMI | CA | PS |
---|---|---|---|---|---|---|
k-means | 0.570 | 0.029 | 0.232 | 0.136 | 0.271 | 0.297 |
LDA scikit-learn | 0.823 | 0.194 | 0.608 | 0.284 | 0.376 | 0.460 |
LDA gensim | 0.788 | 0.085 | 0.276 | 0.131 | 0.275 | 0.314 |
Word2Vec | 0.832 | 0.200 | 0.613 | 0.283 | 0.371 | 0.427 |
Doc2Vec | 0.831 | 0.220 | 0.699 | 0.325 | 0.388 | 0.490 |
Top2Vec raw | 0.810 | 0.195 | 0.501 | 0.239 | 0.365 | 0.404 |
Top2Vec cleaned | 0.811 | 0.206 | 0.530 | 0.387 | 0.253 | 0.416 |
BERTopic raw | 0.826 | 0.238 | 0.608 | 0.289 | 0.436 | 0.458 |
BERTopic cleaned | 0.821 | 0.239 | 0.574 | 0.276 | 0.398 | 0.429 |
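The column abbreviations in the table above are not expanded in this excerpt. Assuming they stand for Rand score (RS), adjusted Rand score (ARS), mutual information (MI), normalized mutual information (NMI), cluster accuracy (CA), and purity score (PS), such scores could be computed roughly as follows. The function name `clustering_scores` is hypothetical, and the definitions of CA (best one-to-one matching between clusters and classes) and PS (majority class per cluster) are common conventions assumed here, not necessarily the ones used for the table.

```python
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import (
    adjusted_rand_score,
    mutual_info_score,
    normalized_mutual_info_score,
    rand_score,
)
from sklearn.metrics.cluster import contingency_matrix


def clustering_scores(true_labels, pred_labels):
    """Compare predicted cluster labels with reference labels (assumed metric set)."""
    # Rows are reference classes, columns are predicted clusters.
    table = contingency_matrix(true_labels, pred_labels)
    n = table.sum()

    # Purity (assumed definition): each cluster votes for its majority class.
    purity = table.max(axis=0).sum() / n

    # Cluster accuracy (assumed definition): best one-to-one matching between
    # clusters and classes, found with the Hungarian algorithm.
    rows, cols = linear_sum_assignment(-table)
    accuracy = table[rows, cols].sum() / n

    return {
        "RS": float(rand_score(true_labels, pred_labels)),
        "ARS": float(adjusted_rand_score(true_labels, pred_labels)),
        "MI": float(mutual_info_score(true_labels, pred_labels)),
        "NMI": float(normalized_mutual_info_score(true_labels, pred_labels)),
        "CA": float(accuracy),
        "PS": float(purity),
    }


# Hypothetical labels: the clustering is correct up to a renaming of clusters.
true_labels = [0, 0, 1, 1]
pred_labels = [1, 1, 0, 0]
scores = clustering_scores(true_labels, pred_labels)
```

All six scores are invariant to how the clusters are numbered, which is why a clustering that matches the reference labels up to renaming still scores as a perfect match.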
Number | Label | Title of the Most Representative Paper |
---|---|---|
0 | diverse perspectives in education, innovation, and economic development | Perspectives in public and university sector co-operation in the change in the higher education model in Hungary, in light of China’s experience |
1 | modeling financial market dynamics | Comment on: Thermal model for adaptive competition in a market |
2 | decentralized finance and blockchain technology | Understanding the maker protocol |
3 | correlation analysis in financial markets and networks | Random matrix theory and cross-correlations in global financial indices and local stock market indices |
4 | deep reinforcement learning in stock trading and portfolio management | Practical deep reinforcement learning approach for stock trading optimal market making by reinforcement learning |
5 | optimal trading and portfolio liquidation strategies in financial markets | An FBSDE approach to market impact games with stochastic parameters |
6 | portfolio optimization techniques and strategies | Seven sins in portfolio optimization |
7 | stochastic volatility modeling and option pricing | On the uniqueness of classical solutions of Cauchy problems |
8 | asset pricing, investment, and arbitrage in financial markets | Characterization of arbitrage-free markets |
9 | network analysis of financial contagion and systemic risk | Clearing algorithms and network centrality |
10 | counterparty risk and valuation adjustments in financial derivatives | Collateral margining in arbitrage-free counterparty valuation adjustment including re-hypotecation and netting |
11 | quantum models in finance and option pricing | Sornette–Ide model for markets: Trader expectations as imaginary part |
12 | valuation and risk management in annuity and insurance products | A policyholder’s utility indifference valuation model for the guaranteed annuity option |
13 | optimal dividend strategies in stochastic control and risk management | Optimal dividends problem with a terminal value for spectrally positive Levy processes |
14 | risk measures and utility maximization under model uncertainty | On the C-property and -representations of risk measures |
15 | economic complexity, networks, and trade patterns | Economic complexity and growth: Can value-added exports better explain the link? |
16 | health, policy, and social impact studies | Ramadan and infant health outcomes |
17 | statistical analysis of financial markets and volatility | Volatility distribution in the S&P500 Stock Index |
18 | renewable energy economics and electricity market dynamics | On wholesale electricity prices and market values in a carbon-neutral energy system |
19 | game theory and strategic decision-making | Simultaneous auctions for complementary goods |
20 | stock price prediction with deep learning and news sentiment analysis | Stock Prediction: a method based on extraction of news features and recurrent neural networks |
21 | kinetic wealth exchange models in economics | Gibbs versus non-Gibbs distributions in money dynamics |
22 | market order flow and price impact | Order flow and price formation |
23 | high-order numerical methods for option pricing in finance | High-order compact finite difference scheme for option pricing in stochastic volatility with contemporaneous jump models |
24 | optimal investment and consumption in financial models with constraints | Recursive utility optimization with concave coefficients |
25 | environmental and economic impacts of mobility technologies | A review on energy, environmental, and sustainability implications of connected and automated vehicles |
26 | advanced risk measures in financial modeling | Generating unfavorable VaR scenarios with patchwork copulas |
27 | Bayesian models for financial tail risk forecasting | A semi-parametric realized joint value-at-risk and expected shortfall regression framework |
28 | pricing and modeling options in stochastic volatility models with jumps | Semi-analytical pricing of barrier options in the time-dependent Heston model |
29 | economic growth and market dynamics | Uncovering volatility dynamics in daily REIT returns |
Author_ID | Occurrences | Name | Topics | Citations | h-Index | i10-Index |
---|---|---|---|---|---|---|
mG07_6k | 4505 | Ioannis Karatzas | [‘stochastic analysis’, ‘stochastic control’, ‘mathematical finance’] | 35,724 | 60 | 127 |
58amEmw | 4441 | Jean-Philippe Bouchaud | [‘statistical mechanics’, ‘disordered systems’, ‘random matrices’, ‘quantitative finance’, ‘agent based models’] | 49,794 | 105 | 351 |
ahLm1v0 | 3256 | Walter Schachermayer | [] | 15,891 | 56 | 124 |
HGsSmMA | 3135 | Didier Sornette | [‘cooperation’, ‘organization’, ‘patterns’, ‘prediction’] | 53,371 | 112 | 587 |
CqFCQVE | 2794 | Alexander Schied | [‘probability theory’, ‘stochastic processes’, ‘mathematical finance’] | 16,780 | 35 | 71 |
2QOp9_M | 2457 | Peter Carr | [‘financial engineering’, ‘quantitative finance’, ‘mathematical finance’, ‘derivatives’, ‘volatility’] | 24,060 | 62 | 117 |
mVF1X_U | 2371 | Freddy Delbaen | [‘mathematik’, ‘ökonomie’] | 25,557 | 50 | 90 |
ElAtiUs | 2258 | Darrell Duffie | [‘finance’, ‘central banking’] | 58,442 | 88 | 142 |
fq7BQos | 2042 | Dilip B. Madan | [‘mathematical finance’, ‘general equilibrium theory’] | 25,131 | 55 | 156 |
8abFiFM | 1774 | Robert Engle | [‘finance and econometrics’] | 189,678 | 118 | 229 |
GU9HgNA | 1743 | Quanquan Gu | [‘statistical machine learning’, ‘nonconvex optimization’, ‘deep learning theory’, ‘reinforcement learning’, ‘ai for science’] | 14,915 | 56 | 161 |
Q7N-rCk | 1672 | Rosario Nunzio Mantegna | [‘econophysics’, ‘statistical physics’, ‘complex systems’, ‘financial markets’, ‘information filtering’] | 28,269 | 67 | 139 |
QsYYhSE | 1629 | Søren Johansen | [‘matamatical statistics’, ‘econometrics’] | 97,728 | 65 | 132 |
vZA2pjw | 1627 | Benoît B. Mandelbrot | [‘mathematics’, ‘fractals’, ‘economics’, ‘information theory’, ‘fluid dynamics’] | 142,895 | 96 | 319 |
Lf1kf1Q | 1505 | Mario Coccia | [‘evolution of technology’, ‘scientific change’, ‘social dynamics’, ‘complex adaptive systems’, ‘environment & COVID-19’] | 19,376 | 106 | 228 |
9HXRjPk | 1444 | Damir Filipovic | [‘quantitative finance’, ‘quantitative risk management’] | 6910 | 39 | 73 |
6_INHZI | 1389 | Fabrizio Lillo | [‘quantitative finance’, ‘statistical mechanics’, ‘data science’] | 10,900 | 51 | 121 |
mGpnlA8 | 1218 | Touzi Nizar | [‘stochastic control’, ‘mathematical finance’, ‘monte carlo methods’] | 11,831 | 56 | 120 |
rp-3Yoo | 1187 | Barry Williams | [‘banks and banking’, ‘bank risk’, ‘multinational’, ‘banking’] | 2342 | 17 | 23 |
MZNxzRY | 1161 | Huyên Pham | [‘mathematical finance’, ‘stochastic control’, ‘numerical probabilities’] | 9967 | 54 | 135 |
3HhvEUc | 1147 | Yuri Kabanov | [‘mathematical finance’, ‘mathematics’] | 6297 | 38 | 77 |
-YEPo1E | 1143 | Wing-Keung Wong | [‘financial economics’, ‘econometrics’, ‘investment theory’, ‘risk management’, ‘operational research’] | 14,440 | 65 | 274 |
GyPrRgc | 1138 | Swarn Chatterjee | [‘financial planning’, ‘wealth management’, ‘financial literacy’, ‘household finance’, ‘behavioral finance’] | 2719 | 28 | 54 |
RZid9X8 | 1075 | Guido Caldarelli | [‘network theory’, ‘network science’, ‘statistical physics’, ‘complex systems’] | 24,165 | 71 | 191 |
ImhakoA | 1075 | Daniel Kahneman | [] | 519,507 | 158 | 369 |
zO_tShM | 1050 | Marek Rutkowski | [‘mathematical finance’, ‘stochastic processes’] | 7559 | 30 | 67 |
7NJ7Ax8 | 1039 | Patrick Cheridito | [] | 5400 | 34 | 59 |
nyfza90 | 1019 | Volker Schmidt | [‘virtual materials testing’, ‘statistical learning’, ‘image analysis’, ‘spatial stochastic modeling’, ‘monte carlo simulation’] | 11,896 | 52 | 230 |
x4vtSxI | 1017 | Rene Carmona | [‘stochastic analysis’, ‘financial mathematics’, ‘financial engineering’] | 17,474 | 59 | 139 |
kukA0Lc | 999 | Yoshua Bengio | [‘machine learning’, ‘deep learning’, ‘artificial intelligence’] | 656,874 | 222 | 763 |
vQ0_nz8 | 989 | Emmanuel Bacry | [‘self-similarity’, ‘multifractal’, ‘stochastic modeling’, ‘statistical finance’, ‘financial time-series modelization’] | 11,937 | 47 | 69 |
1XwLUrc | 980 | Jim Gatheral | [‘volatility modeling’, ‘market microstructure’, ‘algorithmic trading’] | 6126 | 30 | 42 |
3HwRbiQ | 955 | Jerome Friedman | [] | 283,058 | 95 | 197 |
e2Xowj0 | 900 | Neil Shephard | [‘econometrics’, ‘economics’, ‘statistics’, ‘financial econometrics’, ‘finance’] | 42,035 | 69 | 140 |
pEnxwCM | 887 | Victor M. Yakovenko | [‘condensed matter theory’, ‘econophysics’] | 8891 | 44 | 101 |
a11vssU | 845 | Constantinos Kardaras | [‘stochastic analysis’, ‘probability’, ‘mathematical finance’] | 1557 | 20 | 31 |
79htA7g | 838 | Bent Flyvbjerg | [‘project management’, ‘management’, ‘infrastructure’, ‘planning’, ‘cities’] | 73,264 | 70 | 152 |
zH1qBSo | 834 | Albert Shiryaev | [‘probability theory’] | 35,521 | 59 | 163 |
QVb4LGI | 815 | Andrey Itkin | [‘mathematical finance’, ‘computational finance’, ‘derivatives’, ‘quantitative finance’, ‘machine learning’] | 709 | 14 | 19 |
Zuhod6s | 813 | Yong Deng | [‘uncertainty’, ‘deng entropy’, ‘information volume’, ‘random permutation set’, ‘chaos and fractal’] | 23,189 | 81 | 335 |
bWlZ3-Y | 810 | Eric Jacquier | [] | 4458 | 19 | 25 |
GKthQJQ | 804 | Peter K. Friz | [‘rough path theory’, ‘stochastic analysis’, ‘pdes’, ‘finance’] | 5678 | 39 | 82 |
2qTa_4U | 794 | Francis Diebold | [‘economics’, ‘econometrics’, ‘time series’, ‘statistics’] | 76,159 | 97 | 175 |
utY1nTo | 794 | Matteo Marsili | [‘statistical mechanics’, ‘stochastic processes’, ‘collective phenomena in socio-economic systems’, ‘networks’, ‘complex systems’] | 10,093 | 50 | 139 |
ZpG_cJw | 783 | Robert Tibshirani | [‘statistics’, ‘data science’, ‘machine learning’] | 460,493 | 172 | 525 |
65wdZxA | 780 | Damiano Brigo | [‘probability’, ‘mathematical finance’, ‘stochastic analysis’, ‘signal processing’, ‘differential geometry and statistics’] | 9663 | 42 | 114 |
bxJe87s | 780 | Marco Frittelli | [‘financial mathematics’, ‘mathematical finance’, ‘probability’] | 3795 | 24 | 33 |
aVju7cI | 771 | Monique Jeanblanc | [‘mathématiques financières’] | 10,256 | 51 | 110 |
-iOn6uI | 769 | Aurélien Alfonsi | [] | 2731 | 22 | 30 |
P_LECrk | 750 | Tomasz R. Bielecki | [‘mathematical finance’, ‘stochastic processes’, ‘stochastic control’, ‘stochastic analysis’, ‘probability’] | 6901 | 39 | 89 |
5sQ0Fag | 729 | Ajit Singh | [] | 21,592 | 60 | 360 |
6quAJUE | 706 | Josef Teichmann | [‘mathematical finance’, ‘machine learning in finance’, ‘rough analysis’] | 3019 | 30 | 67 |
58amEmw | 705 | Jean-Philippe Bouchaud | [‘statistical mechanics’, ‘disordered systems’, ‘random matrices’, ‘quantitative finance’, ‘agent based models’] | 49,794 | 105 | 351 |
JicYPdA | 691 | Geoffrey Hinton | [‘machine learning’, ‘psychology’, ‘artificial intelligence’, ‘cognitive science’, ‘computer science’] | 687,453 | 180 | 436 |
i2MC67A | 679 | J.F. Muzy | [‘multifractal analysis’, ‘econophysics’, ‘turbulence’] | 13,417 | 55 | 83 |
K9yGky8 | 678 | Andreas Kyprianou | [‘probability theory’, ‘applied mathematics’] | 7946 | 44 | 99 |
aCSds20 | 670 | Xavier Gabaix | [‘economics’, ‘finance’] | 30,926 | 56 | 75 |
G-WPCrM | 667 | Diego Garlaschelli | [‘network theory’, ‘econophysics’, ‘sociophysics’, ‘statistical physics’] | 7394 | 41 | 73 |
YTCnA4E | 664 | Eduardo Schwartz | [‘finance’] | 44,652 | 81 | 141 |
A0ISJPU | 664 | Steven Shreve | [‘probability’, ‘financial mathematics’] | 35,605 | 42 | 70 |
fFFOHec | 660 | Alexander McNeil | [] | 24,473 | 41 | 60 |
dYwbc9s | 659 | Guido Imbens | [‘causal inference’, ‘econometrics’] | 90,765 | 95 | 169 |
OQK4DDY | 657 | Peter Forsyth | [‘scientific computing’, ‘computational finance’, ‘numerical solution of pdes’] | 10,617 | 58 | 143 |
c1wQ9_k | 655 | Daojian Zeng | [‘natural language processing’] | 4877 | 13 | 17 |
Vs7kOf4 | 645 | Marcel Nutz | [‘optimal transport’, ‘mathematical finance’, ‘game theory’] | 2339 | 30 | 43 |
vjc1kF0 | 640 | Francesca Biagini | [‘financial and insurance mathematics’, ‘stochastic calculus’, ‘probability’] | 2598 | 20 | 42 |
zGJKZpk | 629 | Marianne Bertrand | [] | 64,693 | 66 | 119 |
nEfnJZM | 628 | Vadim Linetsky | [‘mathematical finance’, ‘financial economics’] | 5260 | 39 | 63 |
Bekg2Qo | 621 | Joel Shapiro | [‘financial intermediation’, ‘regulation of financial institutions’, ‘corporate governance’, ‘industrial organization’] | 3449 | 15 | 16 |
KDhGvNQ | 611 | Johanna Ziegel | [‘statistical forecasting’, ‘risk measures’, ‘postitive definite functions’, ‘stereology’, ‘copulas’] | 2088 | 19 | 31 |
r5PHkCs | 610 | Thomas Guhr | [‘theoretical physics’] | 8395 | 39 | 104 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bianchi, M.L. Text Mining arXiv: A Look Through Quantitative Finance Papers. Mathematics 2025, 13, 1375. https://doi.org/10.3390/math13091375