Machine Learning in Finance: A Metadata-Based Systematic Review of the Literature

Machine learning in finance has been on the rise in the past decade, and its applications have become a promising methodological advancement. The paper's central goal is to use a metadata-based systematic literature review to map the current state of neural networks and machine learning in the finance field. After collecting a large dataset of 5053 documents, we conducted a computational systematic review of the academic finance literature intersected with neural network methodologies, with a limited focus on the documents' metadata. The output is a meta-analysis of the two-decade evolution and the current state of academic inquiries into financial concepts. Researchers will benefit from a mapping produced by computational methods such as graph theory and natural language processing.


Introduction
The theory and practice of finance have undergone a remarkable evolution in the past five decades. The emergence and acceptance of the Efficient Market Hypothesis (EMH), its subsequent mixed empirical record, the rise of pragmatically driven 'Chartism', and the present co-evolution of quantitative and behavioral finance are among the most significant developments in the financial domain.
The vibrancy of finance can also be seen in two methodological revolutions that brought sophisticated technical analysis to financial phenomena. The application of Machine Learning Algorithms (MLAs) to explaining and forecasting financial market trends has been a significant methodological advancement in the past three decades. Another critical research direction has been the rise of sentiment analysis of unstructured data from news relevant to financial markets.
In this article, we take a comprehensive look at machine learning in finance. To do so, we use neural network as a keyword in our data collection. Using neural network as a keyword does not limit us to neural network approaches, because the source data also contain other terms such as machine learning, deep learning, etc. The rationale for using neural network as the core keyword is that the most influential papers introducing machine learning in finance used neural networks as their methodology of choice.
A conventional systematic literature review (SLR) is a process for collecting the relevant evidence on a given topic that meets predefined eligibility criteria and for answering the research questions formulated. A meta-analysis requires descriptive and/or inferential statistical methods to synthesize data from multiple studies on a particular subject. These techniques allow knowledge to be generated from a variety of studies, both qualitative and quantitative. The conventional method consists of four fundamental steps, known as the Search, Appraisal, Synthesis, and Analysis (SALSA) framework: search (define the search string and database types), appraisal (predefined literature inclusion and exclusion criteria, and quality assessment criteria), synthesis (extract and categorize the data), and analysis (narrate the results and reach a conclusion) (Mengist et al. 2020). An SLR is defined as a "systematic, explicit, and reproducible method for identifying, evaluating, and synthesizing the existing body of completed and recorded work" (del Amo et al. 2018). According to Grant and Booth (2009), the SALSA framework determines the search protocols that an SLR should follow, which ensures methodological precision, standardization, comprehensiveness, and reproducibility. Most scientific work employs this methodological approach to mitigate the risk of publication bias and increase the work's acceptability (del Amo et al. 2018; Grant and Booth 2009; Malinauskaite et al. 2019; Perevochtchikova et al. 2019). Thus, most review articles follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocol and the SALSA framework (Grant and Booth 2009).
Building on SALSA, this article adds a pre-processing step to reduce potential human biases and presents new results based on text-based analyses of the collected data.
Indeed, our main contribution is a computational systematic literature review of machine learning (and neural networks in particular) in finance between 1990 and 2021. We believe it is crucial to map the evolution of these new technologies and methodologies in our field. Since the Artificial Intelligence (AI) sub-domain and machine learning techniques, including deep learning and reinforcement learning, are developed mainly by computer scientists, it is worth examining the bridges between these developments and those in finance.
A second contribution is methodological: we perform a metadata-based systematic review of the relevant literature, and the methodology section defines this approach precisely. We believe it is an essential methodological complement to conventional qualitative reviews and econometric-based meta-analyses. A metadata analysis means that we collect more articles than a traditional systematic literature review would and use algorithms to filter and sort the initial dataset. The methodological approach is twofold: (1) we use Natural Language Processing (NLP) techniques to extract text-as-data information, and (2) we use graph theory to visualize potential collaboration networks. Combined, these two approaches provide a different analysis from a formal systematic review. It should be seen not as a substitute for, but as a complement to, the more conventional approach.
As an aside, and although we will not dwell on this aspect, a third contribution could be epistemological: it leverages our mapping of machine learning in finance to reflect on its implications for the old debate between theorists and chartists in finance. Following Markowitz (1952) and Sharpe (1963, 1964), the EMH emerged as a dominant paradigm providing a formal explanation of financial markets' behavior. Empirical approaches emerged under the umbrella of "Chartism" (e.g., Berardi 2011). Chartists, or empirically minded technical analysts, have used extrapolative rules to discover statistical regularities in price time series (e.g., Hsieh 1989; Frankel and Froot 1990; Gerritsen et al. 2020). Additionally, a burgeoning literature on agent-based financial market models emerged, allowing various interactions between chartists and fundamentalists (e.g., Day and Huang 1990). With ML techniques, induction generates causal relationships based on the information available at the moment of estimation (Popper 1962; Warin 2005). These causal relationships are at the root of the predictive power of ML models. In the ML context, causality and prediction seem to bring theorists and technical analysts closer together.
The structure of the paper is as follows. In the next section, we describe the metadata-based systematic review of the academic literature on finance published between January 1990 and May 2021. The third section elaborates on the conceptual structures behind the relevant literature by exploring the keywords, keyword co-occurrences, and the evolution of topics based on a topic modeling technique. The fourth section examines the intellectual structures behind the evolution of analytical thinking on finance by focusing on which publication vehicles and organizations are the main engines of this topic's dynamics. The fifth section critically examines the social structures of our sample, using different measures to capture the social connections of authors, co-citations, and collaborations across institutions. The concluding remarks summarize the potential of machine learning, neural networks, and, more generally, augmented technical analysis for analyzing financial markets.

Materials and Methods
A standard introduction to financial theory would often distinguish several valuation models that might be useful for analyzing securities and managing portfolios (see Lee and Lee 2010). Since the 1970s, the evolution of financial theory has been greatly influenced and informed by the emergence and acceptance of the EMH and Modern Portfolio Theory (MPT) (Prasch and Warin 2016). Given the vast literature on financial analytics models, we confine our critical review to the main strands of the relevant academic literature.
To illustrate the development of neural networks in finance, we conduct a scientometric study of the academic literature on finance, published between January 1990 and May 2021.

Methodology
The methodology used here is a systematic literature review that differs from more conventional reviews. In traditional literature reviews, the author selects the relevant literature based on her domain or methodological expertise, and the analysis is then based on the content of the sample created in this initial stage. The primary characteristics of an SLR and its associated procedure, meta-analysis, are the following: (1) a clearly stated research question that the study will address; (2) explicit and reproducible objectives; (3) search strings that include all related studies that meet the eligibility criteria; and (4) an assessment of the quality/validity of the selected studies.
For a comprehensive look, a conventional systematic literature review might not be the best choice. Considering the pace of new developments in the artificial intelligence field, we propose here to map the extent to which these new technologies and methodologies are used in finance. A systematic literature review is a mapping exercise of a knowledge area, but it is narrowly focused, typically analyzing between 50 and 200 papers. Here, we also want to map the machine learning knowledge area, but while collecting a significant number of documents. The large dataset size allows us to build an analysis based on the documents' metadata, such as authors' affiliations, universities, etc. This research protocol, built around a metadata-based systematic literature review, could be considered the first phase of a systematic literature review.
In contrast to more conventional methods, we have two phases. First, as in a traditional systematic review, the relevant articles are selected via a search engine, except that the expert does not select the relevant articles from the results presented to her. Instead, the expert chooses the keywords and creates a comprehensive dataset of all the documents matching the keywords in the title, abstract, keywords, and Keywords Plus sections. Because this first phase is automated, quantitative criteria can be used to filter down the dataset. Then, in the second phase, an expert reduces the dataset to 50-200 documents.
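The text does not specify which quantitative criteria are applied in the automated phase. As a minimal sketch, assuming two hypothetical criteria (a document-type filter and a ranking by citations per year since publication), the filtering step that precedes the expert's review could look like the following. The `Record` fields, the `shortlist` helper, and its default thresholds are illustrative assumptions, not the criteria used in this study.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Record:
    # Hypothetical fields; a Web of Science export carries many more.
    title: str
    doc_type: str
    year: int
    times_cited: int


def citations_per_year(rec: Record, as_of_year: int) -> float:
    # Normalize by age so that recent papers are not mechanically penalized.
    return rec.times_cited / max(1, as_of_year - rec.year + 1)


def shortlist(
    records: Sequence[Record],
    max_docs: int = 200,
    as_of_year: int = 2021,
    allowed_types: Sequence[str] = ("Article", "Proceedings Paper"),
) -> List[Record]:
    """Illustrative phase-one filter: keep allowed document types, then rank."""
    candidates = [r for r in records if r.doc_type in allowed_types]
    candidates.sort(key=lambda r: citations_per_year(r, as_of_year), reverse=True)
    return candidates[:max_docs]


records = [
    Record("Paper A", "Article", 2019, 30),           # 30 / 3 = 10.0 per year
    Record("Paper B", "Review", 2020, 50),            # excluded by document type
    Record("Paper C", "Proceedings Paper", 2021, 8),  # 8 / 1 = 8.0 per year
    Record("Paper D", "Article", 2010, 60),           # 60 / 12 = 5.0 per year
]
top = shortlist(records, max_docs=2)
print([r.title for r in top])  # ['Paper A', 'Paper C']
```

Under this sketch, the expert's second-phase review would start from `top` rather than from the full result set.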
To summarize, one of the critical contributions of a metadata-based systematic literature review is to reduce, though not eliminate, potential human biases. Another significant contribution of this two-phase methodology is that it allows us to consider the documents' metadata in text format. By adding a computational treatment based on Natural Language Processing (NLP) techniques to transform the text into data, we can provide analyses that would not be possible otherwise, leveraging analytical approaches such as graph theory. This is particularly relevant for discovering research patterns, research history, and the actual research vehicles, or for associating discoveries with institutions, to name a few examples. These techniques allow us to map the literature computationally.
Another critical point is the large size of the dataset, which has favorable statistical properties. Algorithms also help us analyze a quantity of papers that would be impossible to process otherwise, given the sheer volume of information a human would have to read.
Finally, another important dimension is the use of each document's reference section to compute metrics that help researchers understand knowledge transmission patterns.
Beyond the computational treatment, and to leverage the results obtained from these computations, we use a theoretical framework that looks at three different structures: the conceptual, intellectual, and social structures. The conceptual structure leverages the metadata to help us understand which concepts and topics are used in the academic conversation and how they have evolved over time. The intellectual structure helps us understand who produced these concepts, which journals played a pivotal role in this nascent literature, and which articles were among the most referenced in fueling this literature. Lastly, the social structure allows us to look at authors' collaborations and at the knowledge support that universities and countries provide through their collaborations.
The data collection is conducted using a "human-in-the-loop" (HIL) approach, which combines purely automated data collection with ex-post validation based on field expertise.
First, we use the automated two-phase process described earlier. The search was performed on the publisher-independent citation database Web of Science (WoS), Clarivate Analytics, using a combination of keywords (duplicates were removed simultaneously): "neural network*" AND "finance*".
These keywords allow us to build our sample. This sample does not aim to be representative of the domain; instead, it is intended to analyze the dynamics of the conversation about neural networks in finance. Because the sample is built around a modeling technique, anyone seeking to generalize from it would risk overstating the true representativeness of neural networks in finance; this is not our intent.
We then use human field expertise to review the references and to add potentially missing ones (see Appendix A for a list of the added references). HIL allows us to combine a qualitative assessment with purely automatic data collection. This second step adds few articles, but it is crucial for quality control.
Our approach differs at two levels. First, in creating the sample, we try to be as comprehensive as possible on a particular topic, here "neural network*" AND "financ*". The asterisks mean that we collect any word sharing the root that precedes them. We use neural networks as a proxy for machine learning techniques because authors who use neural networks also reference machine learning in their keywords (among 10,160 keywords used and 3606 Keywords Plus, see Table 1). The sample therefore includes papers on machine learning as well. As in any systematic literature review, the sample is likely not comprehensive, but it is larger than with conventional methods. The sample is collected by finding matches in the title, the abstract, the keywords, and the Keywords Plus in Web of Science. This yields a rich sample of 5053 documents, larger than in typical systematic reviews. We can handle a larger sample thanks to the second differentiating point of our methodology: leveraging the sample metadata through computational techniques. The dataset, including a search engine, is available at https://warin.ca/posts/article-machine-learning-finance/ (accessed on 29 June 2021).
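The truncation logic can be sketched in a few lines. This is a minimal approximation, assuming that a trailing asterisk matches any continuation of the word root and that a record matches when every term appears somewhere in its title, abstract, keywords, or Keywords Plus. It is not Web of Science's actual query engine, and the record field names (`id`, `title`, `keywords_plus`, and so on) are hypothetical. Under this reading, "financ*" matches "financial", whereas "finance*" would not.

```python
import re
from typing import Dict, List

# Fields searched, mirroring title / abstract / keywords / Keywords Plus.
# The dictionary keys are hypothetical names chosen for this sketch.
FIELDS = ("title", "abstract", "keywords", "keywords_plus")


def wildcard_to_regex(term: str) -> "re.Pattern":
    """Turn a term such as 'financ*' into a regex over lowercase text."""
    pattern = re.escape(term.lower()).replace(r"\*", r"\w*")
    # \b on both sides: 'financ*' matches 'financial' but not 'refinancing'.
    return re.compile(r"\b" + pattern + r"\b")


def matches_all(record: Dict[str, str], terms: List[str]) -> bool:
    text = " ".join(record.get(field, "") for field in FIELDS).lower()
    return all(wildcard_to_regex(term).search(text) for term in terms)


def search(records: List[Dict[str, str]], terms: List[str]) -> List[Dict[str, str]]:
    """AND query over all searched fields, dropping duplicate record ids."""
    seen, hits = set(), []
    for rec in records:
        if rec["id"] in seen:
            continue
        seen.add(rec["id"])
        if matches_all(rec, terms):
            hits.append(rec)
    return hits


QUERY = ["neural network*", "financ*"]
records = [
    {"id": "A", "title": "Neural networks for financial distress prediction"},
    {"id": "A", "title": "Neural networks for financial distress prediction"},  # duplicate
    {"id": "B", "title": "Refinancing decisions", "abstract": "A neural network model."},
    {"id": "C", "title": "Deep learning in finance", "keywords": "neural network; asset pricing"},
]
print([rec["id"] for rec in search(records, QUERY)])  # ['A', 'C']
```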
In this second level of differentiation, we create and use metadata from the title, the abstract, the keywords, and the Keywords Plus. The metadata are created using Natural Language Processing (NLP) techniques. We prepare the dataset by selecting tokens, n-grams, etc. (Aria and Cuccurullo 2017).
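As a minimal illustration of this preparation step, the sketch below extracts tokens and n-grams. It assumes a lowercase regex tokenizer and a tiny hand-picked stopword list. Both are hypothetical, since the text does not specify the tokenizer or stopword list actually used.

```python
import re
from collections import Counter
from typing import List

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "using", "with"}


def tokenize(text: str) -> List[str]:
    """Lowercase, keep alphabetic (optionally hyphenated) words, drop stopwords."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Contiguous n-grams over the filtered token sequence."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


abstract = "Neural networks for stock market prediction using deep learning and neural networks."
tokens = tokenize(abstract)
bigram_counts = Counter(ngrams(tokens, 2))
print(tokens)
print(bigram_counts.most_common(1))  # [('neural networks', 2)]
```

Because stopwords are removed before the n-grams are built, some n-grams bridge a removed word (here "networks stock"). Building n-grams first and filtering afterwards is an equally reasonable design choice.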
These metadata allow us to analyze the sample quantitatively. Using these machine learning tools yields a research synthesis that can be combined with other techniques such as social network analysis. We can also look at the dynamics of research contributions, collaborations, and idea generation and propagation. Let us first look at the descriptive statistics from the systematic literature review before studying the dynamics of the research in this sample.

Descriptive Statistics
The relevant 'universe' of the literature consists of references identified in the HIL-Web of Science citation database (see Table 1), totaling 5053 documents, most of which are published in refereed journals (see Table 2). The literature review covers the period between 1 January 1990 and 10 May 2021 (see Figure 1). The overall number of documents in our sample is 5053 (see Table 1); this total cumulates the yearly counts, and we observe a significant rise in the number of documents per year. The average number of citations per document is 14.66, but this figure has evolved over time to values between 1 and 2. As a reference point, total citations per paper for highly cited papers in economics and business were 3.04 for the 2011-2015 period and 3.91 for the 2017-2021 period. In the Social Sciences in general, total citations per paper for highly cited papers were 2.89 for the 2011-2015 period and 3.30 for the 2017-2021 period. These results show the normalization of machine learning in finance-related documents.
Articles dominate the sample for the overall period (see Table 2), with 2719 occurrences, followed by 1974 proceedings papers. Short contributions (articles and proceedings papers) thus make up the bulk of the output in this sample: authors tend to build the body of knowledge about machine learning in finance through short contributions.

Our database of references covers 308 keywords and 946 author appearances (see Table 3). Most of the publications are multi-authored, indicating the increasingly collaborative nature of research in the finance domain. The descriptive analysis also reveals that, on average, there are 2.32 authors per publication and 2.72 co-authors per publication (see Table 4). Most documents are collectively written: only 661 documents have a single author. To conclude this descriptive statistics section, academic production on machine learning in finance relies mainly on short documents and co-authorship. Let us now analyze the three different structures: conceptual, intellectual, and social.
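The ratios reported in this section can be reproduced from per-document author lists and citation counts. The minimal sketch below assumes the common bibliometric convention in which "authors per document" divides the number of distinct authors by the number of documents, while "co-authors per document" divides total author appearances by the number of documents. The `Doc` type and the toy records are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    authors: List[str]  # author names as listed on the document
    times_cited: int    # citation count reported by the database


def descriptive_stats(docs: List[Doc]) -> Dict[str, float]:
    n_docs = len(docs)
    appearances = sum(len(d.authors) for d in docs)
    distinct_authors = {a for d in docs for a in d.authors}
    return {
        "documents": n_docs,
        "avg_citations_per_doc": sum(d.times_cited for d in docs) / n_docs,
        "author_appearances": appearances,
        "authors_per_doc": len(distinct_authors) / n_docs,
        "coauthors_per_doc": appearances / n_docs,
        "single_authored_docs": sum(1 for d in docs if len(d.authors) == 1),
    }


docs = [
    Doc(["Zhang", "Wang"], 10),
    Doc(["Zhang"], 4),
    Doc(["Smith", "Wang", "Lee"], 1),
]
stats = descriptive_stats(docs)
print(stats["coauthors_per_doc"], stats["single_authored_docs"])  # 2.0 1
```

Under this convention, an author who appears on many papers is counted once in "authors per document" but several times in "co-authors per document". This explains why the second ratio is the larger of the two.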

Conceptual Structures of Our Sample
The application of AI in the domain of finance is not a recent phenomenon in the academic literature (e.g., Gavrishchaka and Banerjee 2006; De Spiegeleer et al. 2018; Huang et al. 2020). However, the last decade has witnessed empirical studies using Machine Learning Algorithms (MLAs) to examine credit risk and forecast stock returns. As Dixon et al. (2020, p. vii) highlight, "ML in finance sits at the intersection of several emergent disciplines, including pattern recognition, financial econometrics, statistical computing, probabilistic programming, and dynamic programming". One of the main competitive advantages of ML is computers' outstanding ability to process large amounts of financial information.
From a methodological perspective, the empirical studies rely not only on conventional MLAs such as support vector machines (SVM) and k-nearest neighbors (kNN) but also on Deep Learning (DL) (e.g., Huang et al. 2020), an advanced technique based on artificial neural network algorithms (e.g., Chung-Ming and White 1994). DL models have also been used to predict stock prices (e.g., Kraus and Feuerriegel 2017; Minh et al. 2017; Jiang et al. 2018; Matsubara et al. 2018). For instance, Schumaker and Chen (2010) forecast the stock market from financial news articles using a text classification approach, and Glasserman et al. (2020) use the supervised Latent Dirichlet Allocation (sLDA) framework to select news article topics that explain stock returns.
Network analysis has been used mostly in the context of financial stability and financial linkages. Another strand of the literature examines the impact of investors' views and opinions, also known as investor sentiment, on stock price movements. Sentiment analysis aims to capture news from traditional and/or social media and to assess investors' views and the market mood (e.g., Mitra and Mitra 2011; Mitra and Yu 2016). The assessment of market sentiment, often captured by market indices, can be strengthened by sentiment analysis of the market mood or investors' emotions. A popular approach is to extract relevant news articles, preprocess the text, and assign a sentiment score to each article. The sentiment score is commonly calculated as the difference between the number of positive and negative words in the article divided by the total number of words. Studies use a reputable lexicon of financial terms, such as the Loughran and McDonald (2011) lexicon, to determine which words are positive and which are negative.
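The word-count score described above can be sketched as follows. The positive and negative word sets are a handful of hypothetical placeholder entries standing in for a full financial lexicon such as Loughran and McDonald (2011). The tokenizer is also a simplifying assumption: it does no stemming, so "gains" would not match "gain".

```python
import re

# Placeholder word lists; a real study would load a full financial lexicon.
POSITIVE = {"gain", "growth", "profit", "strong"}
NEGATIVE = {"loss", "decline", "weak", "default"}


def sentiment_score(text: str) -> float:
    """(positive words - negative words) / total words, 0.0 for empty text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


article = "Strong profit growth offset a small loss in the bond unit."
score = sentiment_score(article)  # (3 - 1) / 11 words
print(round(score, 4))  # 0.1818
```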
In the following sub-sections, we will consider the conceptual structures of our sample by looking at the keywords, the keywords co-occurrences, and the evolution of the topics based on a topic modeling technique.

Keywords Analyses
We consider here all the words found in the keyword section of every document. Remember that the sample was created using "neural network*" AND "finance*" (see Figure 2). It is thus expected that authors would also list neural networks in the keyword section. They also associate other keywords, such as prediction, forecasting, or machine learning, including deep learning. This is evidence that our sample goes beyond neural networks and covers related topics.
Deep learning is a very recent addition to the field, as approximated by our sample. Keywords stating why these new techniques are used in finance, for instance their role in prediction, have also appeared only recently. Machine learning techniques are indeed a paradigm shift when it comes to predictive power.
Table 5 reports the top keywords in the overall sample and per year. The keyword rankings over time show how the literature has matured in its adoption of machine learning, with deep learning moving up the ladder.
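Rankings like those in Table 5 amount to frequency counts of author keywords, both overall and by publication year. The minimal sketch below assumes that keywords are normalized to lowercase and counted at most once per document. The toy records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, List[str]]  # (publication year, author keywords)


def top_keywords(records: List[Record], k: int = 3):
    overall: Counter = Counter()
    by_year: Dict[int, Counter] = defaultdict(Counter)
    for year, keywords in records:
        # Normalize case and keep each keyword once per document, in order.
        unique_kws = list(dict.fromkeys(kw.strip().lower() for kw in keywords))
        overall.update(unique_kws)
        by_year[year].update(unique_kws)
    per_year = {year: counts.most_common(k) for year, counts in sorted(by_year.items())}
    return overall.most_common(k), per_year


records = [
    (2017, ["Neural networks", "Prediction"]),
    (2017, ["neural networks", "Data mining"]),
    (2021, ["Deep learning", "Neural networks"]),
    (2021, ["deep learning", "Forecasting"]),
]
overall_top, yearly_top = top_keywords(records, k=2)
print(overall_top)       # [('neural networks', 3), ('deep learning', 2)]
print(yearly_top[2021])  # [('deep learning', 2), ('neural networks', 1)]
```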
To go beyond a single-dimensional perspective on the keywords, let us now look at the co-occurrence matrix.

Keywords Co-Occurrences Network Analyses
Now, we look at keyword co-occurrences. Counting how often each keyword appears together with every other keyword in the same document yields a co-occurrence count matrix, from which we compute relevant network indicators (centrality, density, etc.). Several figures plot the relevance degree (centrality, or notions of 'importance') against the development degree (density). Degree centrality counts the number of links held by each node and points to themes that can easily connect with the broader network. The density of a network is the number of realized edges relative to the number of potential edges.
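The sketch below builds a co-occurrence count matrix, degree centrality, and density from per-document keyword lists. It is a minimal, standard-library version that assumes an undirected simple graph with one edge per co-occurring keyword pair. Degree centrality is left unnormalized, as in the definition above. The keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def cooccurrence(docs_keywords: List[List[str]]) -> Counter:
    """Count, for each unordered keyword pair, the documents listing both."""
    counts: Counter = Counter()
    for keywords in docs_keywords:
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts


def degree_centrality(edges) -> Dict[str, int]:
    """Number of distinct links held by each keyword (unnormalized)."""
    degree: Counter = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def density(n_nodes: int, n_edges: int) -> float:
    """Realized edges over potential edges in an undirected simple graph."""
    potential = n_nodes * (n_nodes - 1) / 2
    return n_edges / potential if potential else 0.0


docs_keywords = [
    ["neural networks", "prediction", "stock market"],
    ["neural networks", "deep learning"],
    ["deep learning", "prediction"],
    ["bankruptcy", "neural networks"],
]
matrix = cooccurrence(docs_keywords)  # sparse co-occurrence matrix
edges = list(matrix)                  # one edge per co-occurring pair
nodes = {kw for edge in edges for kw in edge}
degree = degree_centrality(edges)
print(degree["neural networks"], density(len(nodes), len(edges)))  # 4 0.6
```

The co-occurrence counts can also serve as edge weights. The density measure used here, however, depends only on whether an edge exists.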
In Figure 3, we represent the graphs based on these network indicators. The first panel shows the keyword network for the entire sample, while the other panels show the networks for 2021, 2020, 2019, 2018, and 2017, respectively.
When we consider the co-occurrence networks, particularly for 2021 and 2017, we observe that most of the conversation is organized around two groups, representing computer techniques and mathematical approaches. Only recently have applications in finance, such as bankruptcy prediction, started to appear.
In Table 6, we compute the mathematical features of the networks. The size of the networks has risen in the past years, showing a wider spread of concepts. This growth is accompanied by a decrease in density over time and a slight increase in average path length, which suggests that the literature is opening up to applications.

Topic Modeling-Based Analyses
In the following analysis, we add a new dimension based on structural topic modeling to complement the information obtained from keyword co-occurrences. Structural topic modeling lets us leverage words from the keywords section and beyond: the title, the abstract, and the Keywords Plus section.
We tokenize all the words and estimate the latent variables to identify potential topics. The following figures represent this analysis: the top-left panel covers the whole period, while the other panels represent 2021, 2020, 2019, 2018, and 2017, respectively. The topics are mapped into four quadrants: basic themes, emerging or declining themes, niche themes, and motor themes.
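The four categories correspond to the quadrants of a plot of centrality (relevance degree) against density (development degree). The minimal sketch below assigns themes to quadrants. It assumes that each theme already has a centrality score and a density score, and that the quadrants are split at the median of each axis. The split rule and the scores are hypothetical, and implementations differ on where the axes are drawn.

```python
from statistics import median
from typing import Dict, Tuple


def classify_themes(themes: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Map theme -> quadrant, given (centrality, density) for each theme."""
    c_mid = median(c for c, _ in themes.values())
    d_mid = median(d for _, d in themes.values())
    labels = {}
    for name, (centrality, dens) in themes.items():
        central = centrality >= c_mid  # relevance degree
        developed = dens >= d_mid      # development degree
        if central and developed:
            labels[name] = "motor theme"
        elif developed:
            labels[name] = "niche theme"
        elif central:
            labels[name] = "basic theme"
        else:
            labels[name] = "emerging or declining theme"
    return labels


# Hypothetical (centrality, density) scores, for illustration only.
themes = {
    "prediction": (0.9, 0.8),
    "neural networks": (0.8, 0.2),
    "genetic algorithm": (0.1, 0.1),
    "option pricing": (0.2, 0.9),
}
for theme, label in classify_themes(themes).items():
    print(f"{theme}: {label}")
```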
Interestingly, data mining and neural networks were among the basic themes in 2017 (see Figure 4). Since our sample mostly contains finished documents, the underlying research likely started one or two years earlier. In 2017, genetic algorithms were an emerging theme, as was network theory. We see here a burgeoning reflection on what would become the contribution of data science to finance. Comparing 2017 with 2020 and 2021, the motor themes concern the predictive capacity of machine learning-based models. We can also observe the emerging sub-field of deep learning in finance, which suggests that deep learning may have a prominent future in the field.
We want to stress the inductive nature of machine learning. Although it is inductive, it does not carry the traditional baggage of empirical induction, namely being potentially biased and lacking theoretical grounding (falsifiability, etc.). In the ML context, induction means finding causal patterns in empirical data.

Intellectual Structures of Our Sample
An interesting analysis stems from the investigation of which authors and organizations are driving the dynamics of this topic.

Authors
In the intellectual structure, authors are a natural starting point. The top authors have published more than 30 papers on this topic in our sample (Figure 5). Going a little deeper, we can look at the average productivity of all authors (see Figure 6). It has not evolved much over time: on average, each author produces two articles a year on this topic.
We can also look at the authors' dominance ranking over time (see Figure 7). Author dominance is computed by counting how many times an author is the first author of a multi-authored paper. This can be a weak indicator, because author lists often follow alphabetical order regardless of marginal contributions, whereas the indicator assumes that first authorship reflects the largest contribution.
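The first-authorship count behind this ranking can be sketched as follows. As an assumption added for comparison, the sketch also reports each author's first-authored multi-authored papers as a share of all of that author's multi-authored papers. This ratio is often called the dominance factor. The author lists are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dominance(papers: List[List[str]]) -> Dict[str, Tuple[int, int, float]]:
    """Return author -> (first-authored multi-authored papers,
    multi-authored papers, share of those where the author comes first)."""
    first: Counter = Counter()
    multi: Counter = Counter()
    for authors in papers:
        if len(authors) < 2:
            continue  # single-authored papers do not enter the indicator
        first[authors[0]] += 1
        for author in set(authors):
            multi[author] += 1
    return {a: (first[a], multi[a], first[a] / multi[a]) for a in multi}


papers = [
    ["Zhang", "Li"],
    ["Zhang", "Wang", "Chen"],
    ["Wang", "Zhang"],
    ["Smith"],
]
dom = dominance(papers)
# Rank by first-authorship count, breaking ties alphabetically.
ranking = sorted(dom, key=lambda a: (-dom[a][0], a))
print(ranking)  # ['Zhang', 'Wang', 'Chen', 'Li']
```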
Interestingly, authors disadvantaged by alphabetical order, such as Zhang or Wang, still make the top 10 of this ranking.
Table 7 reports the citations of the articles in our sample. We can go further and look at the articles that authors in our sample include in their references. These references are the foundations of this nascent literature on machine learning in finance. Let us look at the top authors in the references of each paper (see Figure 8).

Articles
Beyond their authors, we can also look at the most cited references, including in terms of journals. The most cited authors and the most cited references largely match, but the nuances are informative (see Figure 9).
The most cited papers have changed little between 2017 and 2021.

Social Structures of Our Sample
In this section, we use different measures to capture social connections: the co-citations of authors, the co-citations of articles, the co-citations of journals, and collaborations across institutions. Figure 10 highlights the evolution of collaborations among the top authors. The collaboration network is still narrow, which reflects the nascent nature of the field.

Co-Citations of Authors
As the previous figure shows, the top authors still work mostly within their own groups of collaborators. The next question is whether this also holds for co-citations of articles.

Co-Citations of Articles
When a reference was cited by two articles published in the same journal, it was included in the co-citation network of references (see Figure 11). The co-citation network therefore captures the references that articles published in the same journal commonly draw on.
J. Risk Financial Manag. 2021, 14, x FOR PEER REVIEW 18 of 33
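For readers who want to reproduce a co-citation network, a minimal sketch of the usual article-level count follows: two references are linked once for every citing article whose reference list contains both. This common definition is assumed here for illustration; the paper's own rule, stated in terms of articles from the same journal, may aggregate differently. The function name and data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together.

    A pair (a, b) gains one co-citation for every citing article whose
    reference list contains both a and b.
    """
    pairs = Counter()
    for refs in reference_lists:
        # set() drops duplicates; sorted() gives each pair a canonical order.
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs

# Toy reference lists of three citing articles.
citing_articles = [
    ["R1", "R2", "R3"],
    ["R1", "R2"],
    ["R2", "R3", "R3"],  # the duplicated R3 is counted once
]
print(cocitation_counts(citing_articles))
```

The resulting pair counts are the edge weights of the co-citation graph.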

In our sample, most authors are based in the People's Republic of China, the United States, the United Kingdom, and India (see Table 8). While the dominant presence of authors from advanced economies is undisputed, the sheer size of the sample also ensures the participation of authors from several Emerging Market Economies (EMEs). Table 9 provides supplementary information on total citations per country: Asia, and China in particular, dominate the ranking. Figure 12 shows a clear increase in contributions from Asia, with China and India at the forefront of academic production.
Starting from a bibliographic matrix, two groups of descriptive measures are computed: (1) the summary statistics of the network and (2) the leading indices of centrality and prestige of vertices.
This group of statistics, presented in Table 8, describes the structural properties of a network: (1) 'size' is the number of vertices in the network; (2) 'density' is the proportion of present edges among all possible edges; (3) 'transitivity' is the ratio of triangles to connected triples; (4) 'diameter' is the longest geodesic distance (the length of the shortest path between two nodes) in the network; (5) 'degree distribution' is the cumulative distribution of vertex degrees; and (6) 'degree centralization' is the normalized degree of the overall network.
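To make these definitions concrete, here is a standard-library sketch that computes them for an undirected, unweighted graph. It is an illustration, not the software used for Table 8: the diameter skips pairs in different components, the cumulative degree distribution is taken as the share of vertices with degree at least d, and degree centralization uses Freeman's normalization by the star-graph maximum (n-1)(n-2). Other tools may normalize differently.

```python
from collections import deque
from itertools import combinations

def network_summary(nodes, edges):
    """Descriptive statistics for an undirected, unweighted graph (n >= 3)."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    n = len(adj)
    degrees = {v: len(nbrs) for v, nbrs in adj.items()}
    m = sum(degrees.values()) // 2

    # Transitivity: closed triples / connected triples (each triangle closes 3).
    closed = triples = 0
    for v, nbrs in adj.items():
        triples += degrees[v] * (degrees[v] - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])

    # Diameter: longest shortest path, via breadth-first search from every vertex.
    diameter = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        diameter = max(diameter, max(dist.values()))

    # Cumulative degree distribution: share of vertices with degree >= d.
    cum_degree = {d: sum(k >= d for k in degrees.values()) / n
                  for d in sorted(set(degrees.values()))}

    # Freeman degree centralization, normalized by the star-graph maximum.
    max_deg = max(degrees.values())
    centralization = sum(max_deg - k for k in degrees.values()) / ((n - 1) * (n - 2))

    return {
        "size": n,
        "density": 2 * m / (n * (n - 1)),
        "transitivity": closed / triples if triples else 0.0,
        "diameter": diameter,
        "degree_distribution": cum_degree,
        "degree_centralization": centralization,
    }

# Star graph: one hub linked to three leaves.
stats = network_summary([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)])
print(stats)
```

A star graph is a useful sanity check: it has no triangles (transitivity 0) and maximal degree centralization (1.0).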
When it comes to collaborations between countries, China and the USA are at the center of the graph (see Figure 13). Most international collaborations are between China and the USA. There appears to be a slight regionalization of collaborations, with China working mostly with other Asian countries; this is much less apparent for the USA, whose collaborations are more eclectic.
These results confirm that Asia, and China in particular, are at the forefront of academic production on neural networks and, more broadly, on machine learning in finance. The connections with other countries, notably in Europe, are also worth examining. Below, we also investigate connections at the institutional level.
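The country collaboration network can be derived from the affiliation countries of each paper's authors. The following sketch assumes a hypothetical input format (one list of author countries per paper) and uses toy records rather than the paper's data. It separates single-country from multiple-country papers and counts country pairs, whose weights would form the edges of a graph like Figure 13.

```python
from collections import Counter
from itertools import combinations

def country_collaboration(paper_countries):
    """Count single- and multiple-country papers and weighted country pairs.

    `paper_countries` holds, for each paper, the countries of its authors'
    affiliations (hypothetical input format).
    """
    single = multiple = 0
    pairs = Counter()
    for countries in paper_countries:
        unique = sorted(set(countries))
        if len(unique) > 1:
            multiple += 1
            pairs.update(combinations(unique, 2))  # one link per country pair
        else:
            single += 1
    return single, multiple, pairs

# Toy records for four papers.
papers = [
    ["China", "China"],
    ["China", "USA"],
    ["USA", "China", "UK"],
    ["India"],
]
single, multiple, pairs = country_collaboration(papers)
print(single, multiple, pairs.most_common(1))  # 2 2 [(('China', 'USA'), 2)]
```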

Co-Citations of Journals
In what follows, we look at which journals have contributed to the field's methodological transformation. Over time (see Table 10), this transformation mostly started in engineering journals before penetrating the finance field. Still, the ranking today remains dominated by engineering-oriented journals. Figure 14 illustrates the evolution of the knowledge map through journal co-citations. It shows where the transformation originated, the pace at which machine learning penetrated finance journals, and the channels through which it did so. The pivotal role played by the journal Expert Systems with Applications is worth noting.
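A ranking of journals through time, in the spirit of Table 10, only needs the publication year and source of each document. The sketch below assumes records reduced to hypothetical (year, journal) pairs and groups them into fixed windows; the five-year window and the helper name are arbitrary illustrative choices, not the paper's.

```python
from collections import Counter, defaultdict

def journals_by_period(records, period_length=5, k=3):
    """Rank journals by article count within consecutive year windows.

    `records` is an iterable of (year, journal) pairs (hypothetical format).
    """
    buckets = defaultdict(Counter)
    for year, journal in records:
        start = year - (year % period_length)  # e.g. 2003 -> window 2000-2004
        buckets[(start, start + period_length - 1)][journal] += 1
    return {period: counts.most_common(k) for period, counts in sorted(buckets.items())}

# Toy records.
records = [
    (2003, "Journal E"), (2004, "Journal E"), (2004, "Journal F"),
    (2018, "Journal G"), (2019, "Journal G"), (2019, "Journal E"),
]
print(journals_by_period(records, k=1))
# {(2000, 2004): [('Journal E', 2)], (2015, 2019): [('Journal G', 2)]}
```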

Co-Citations of Institutions
Related to Figure 13, it is interesting to study the collaborations through a different indicator: the co-citations of institutions.

The network of university collaborations is also well developed (see Figure 15), with a strong presence of Chinese, U.S., and Indian universities. There is a slight geographical clustering, with China and Europe on one side and the U.S. and Canada on the other, which suggests that geography is a factor in collaborations.
To conclude, we visualize the main items of three fields (i.e., authors, keywords, and journals) and how they relate through a Sankey diagram. The three-fields plot in Figure 16 also reveals the rising importance of deep learning and neural networks in finance and the strongest channels for academic contributions: Expert Systems with Applications over the overall period, and IEEE Access for most of the latest five years.
In the past five years, IEEE Access has been a prominent vehicle for the academic conversation on neural networks in finance and, most importantly, on deep learning in finance.
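A three-fields plot is drawn from two sets of link weights: author to keyword and keyword to journal. The sketch below computes them from records with hypothetical `authors`, `keywords`, and `journal` keys, keeping only the top_n items of each field; the resulting pair counts could then be handed to any Sankey plotting tool. It is an assumption about how such a plot is fed, not the paper's implementation.

```python
from collections import Counter

def three_field_flows(records, top_n=10):
    """Link weights for an author -> keyword -> journal three-fields plot.

    Only the top_n most frequent authors, keywords, and journals are kept
    (ties at the cut-off are broken by first appearance).
    """
    def top(counter):
        return {item for item, _ in counter.most_common(top_n)}

    top_authors = top(Counter(a for r in records for a in dict.fromkeys(r["authors"])))
    top_keywords = top(Counter(k for r in records for k in dict.fromkeys(r["keywords"])))
    top_journals = top(Counter(r["journal"] for r in records))

    author_keyword, keyword_journal = Counter(), Counter()
    for r in records:
        kws = set(r["keywords"]) & top_keywords
        for a in set(r["authors"]) & top_authors:
            for k in kws:
                author_keyword[(a, k)] += 1
        if r["journal"] in top_journals:
            for k in kws:
                keyword_journal[(k, r["journal"])] += 1
    return author_keyword, keyword_journal

# Toy records.
records = [
    {"authors": ["A1", "A2"], "keywords": ["deep learning", "stock prediction"], "journal": "J1"},
    {"authors": ["A1"], "keywords": ["deep learning"], "journal": "J2"},
    {"authors": ["A3"], "keywords": ["neural network"], "journal": "J1"},
]
left, right = three_field_flows(records, top_n=1)
print(left, right)
```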

Conclusions
Neural networks are becoming increasingly popular tools in finance for analyzing market trends by preprocessing and transforming large amounts of information into machine-readable data. It would be a mistake to attribute this development solely to the remarkable growth in computing power and storage capacity.
ML can make essential contributions to the technical analysis of financial market trends. It encompasses a wide variety of approaches: supervised, unsupervised, and semi-supervised learning; reinforcement learning; inverse reinforcement learning; imitation learning; self-learning; feature learning; sparse dictionary learning; anomaly detection; and others. Natural Language Processing (NLP), a subfield at the intersection of linguistics, computer science, and artificial intelligence, has found numerous applications in finance.
This article demonstrated the basic steps required to conduct a metadata-based SLR in the finance field.
The method can help synthesize existing topic-specific knowledge, identify trends and gaps, and derive conclusions useful to policymakers and the scientific community.
Indeed, in this article, we conducted a metadata-based systematic review of academic contributions to finance between 1990 and 2021. A metadata-based systematic literature review complements more conventional approaches: it makes it possible to collect much larger numbers of documents and then analyze the current dynamics within them. This article leverages the textual information in the dataset: titles, abstracts, keywords, authors' names, institutions, and references are transformed into quantitative indicators. From there, using text-as-data techniques such as NLP, together with graph theory, we provide a mapping that captures multiple dimensions. In particular, we used a theoretical framework that organizes the mapping of the literature along three dimensions: conceptual, intellectual, and social.
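Transforming titles or abstracts into quantitative indicators typically starts with a document-term matrix. The sketch below is a deliberately simple version under stated assumptions: lower-casing, alphabetic tokens only, a small hand-picked stopword list, and no stemming. The titles are invented examples, and the helper names are hypothetical rather than the paper's pipeline.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "using"}

def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def document_term_matrix(texts):
    """Return (vocabulary, rows), where rows[i][j] counts term j in text i."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows

titles = [
    "Neural networks for stock market prediction",
    "Deep learning in finance: a survey of neural networks",
]
vocab, rows = document_term_matrix(titles)
print(vocab)
print(rows)
```

Without stemming, "network" and "networks" remain separate terms; a real pipeline would usually normalize such variants before building conceptual maps.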

The result is a mapping of the literature along these three dimensions. Researchers can use this mapping to select a sub-sample on which to perform the systematic literature review of their choice.
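Selecting such a sub-sample can be as simple as filtering the collected metadata by keyword and year. The sketch below assumes hypothetical records with `id`, `year`, and `keywords` fields and matches keywords case-insensitively; the field names, the helper, and the thresholds are illustrative only.

```python
def select_subsample(records, keywords, start_year, end_year):
    """Keep records published in [start_year, end_year] whose keywords
    include at least one target keyword (case-insensitive match)."""
    targets = {k.lower() for k in keywords}
    return [
        r for r in records
        if start_year <= r["year"] <= end_year
        and not targets.isdisjoint(k.lower() for k in r["keywords"])
    ]

# Toy metadata records.
records = [
    {"id": 1, "year": 2015, "keywords": ["Neural Networks", "Forecasting"]},
    {"id": 2, "year": 2019, "keywords": ["Deep Learning"]},
    {"id": 3, "year": 2020, "keywords": ["deep learning", "Sentiment Analysis"]},
]
subsample = select_subsample(records, ["deep learning"], 2018, 2021)
print([r["id"] for r in subsample])  # [2, 3]
```

The filtered records can then enter a conventional appraisal and synthesis stage.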
This mapping is helpful for researchers, for university administrators seeking to understand the evolution of the finance field, and for policymakers. Concerning the latter, the conversation in academic circles about machine learning in finance finds its parallel in the financial industry with the development of the so-called fintech sector. For policymakers, it is relevant to map collaboration networks at both the author and the institutional level, and to be able to visualize the knowledge maps.
For further research, the emergence of artificial intelligence and machine learning, particularly in finance, is attractive in the context of the long-standing debate between theorists and chartists. While this debate is still relevant, we conjecture that ML techniques could shed new light on theoretical advancement. MLAs are not an atheoretical approach, as they are premised on inductive reasoning, which generates causal relationships based on the state of information at the moment of estimation. The main advantage of ML is its ability to process vast amounts of information while ignoring ideological standpoints or inclinations toward a particular school of thought.