Big data has become a very popular expression in recent years, driven by technological advances that allow, on the one hand, the collection of large amounts of data and, on the other hand, the analysis of that data, benefiting from the increasing computational capacity of devices. Big data has been used in several research areas, such as business intelligence (Chen et al. 2012; Sun et al. 2018), marketing (Verhoef et al. 2015; Wright et al. 2019), economics (Glaeser et al. 2018; Sobolevsky et al. 2017), health (Pramanik et al. 2017; Rose et al. 2019), and psychology (Matz and Netzer 2017; Adjerid and Kelley 2018), among many others.
Another area where big data is being applied is finance. In this particular case, the existence of large amounts of data allows a very broad type of analysis, from general indices to single specific assets. In particular, the use of big data allows the analysis of complex problems and has attracted the attention of physicists in recent decades. In fact, big data and complexity are intimately related to the emergence of a new research area called Econophysics.
In this paper, which is a brief introduction to the special issue “The Use of Big Data in Finance”, we start by presenting a general view of big data and its advantages in finance (Section 2), build the bridge to Econophysics (Section 3), present some of its possible approaches, in particular power laws and networks (Section 4, Section 5 and Section 6), and conclude with some suggestions for future research (Section 7).
2. The Use of Big Data in Economics and Finance
The use of big data allows the analysis of (very) large datasets, reaching conclusions for some processes which could involve complex analysis. With the growth of web access in recent decades, the amount of data available has increased exponentially. This large amount of available data and its use has influenced society, communication, habits, and even cultural aspects, so understanding, interpreting, and knowing how to use this amount of data has been a challenge for data scientists. In this context, big data was initially described by four different Vs: Velocity, Volume, Veracity, and Variety, as follows.
Velocity, referring to the speed with which the data is produced, with this parameter being related to processing capacity.
Volume, related to the amount of data to be processed.
Veracity, referring to the reliability of the data.
Variety, characterized by the heterogeneity, diversity, structures, and scales of the data.
This original V model of the big data paradigm has since been extended with other data characteristics, for example, volatility, referring to the validity of the data, or value, referring to the potential use of the data (see, for example, Tennant et al. 2017). Other features are found in the literature, both using the initial letter V and others, like the Ps proposed by Lupton, who asks about the importance of big data in society in general and in the academic world in particular (see, for example, Kitchin and McArdle 2016 for more features). The objective of describing big data in terms of Vs was to outline the existence of different interconnected and measurable properties (Carbone et al. 2016).
The research area of finance is particularly suited to this kind of approach, as there is a huge amount of data available, some publicly disclosed and more available in databases via subscription. Platforms such as Google, Twitter, and others make it necessary to verify whether the large amount of information available on the web can help to analyze and predict financial variables (Preis et al. 2013). Verifying whether or not this large amount of information influences financial markets requires tools from complex systems (Arthur et al. 1998; Rosser 1999; Carbone et al. 2016). In this regard, models, methods, and techniques from the physics of complex systems, such as multifractal analysis, multiscalar analysis, temporal networks, or multilevel networks, have been useful in recent research and can also be useful for future research.
Another application involving big data has been developed by The Observatory of Economic Complexity (Hausmann et al. 2014), derived from the idea of the Economic Complexity Index (ECI; Hidalgo and Hausmann 2009). This index provides extensive information about more than 130 countries, making it possible to analyze the role of knowledge in the production of goods for countries’ economic growth and development. Since Solow, the theory of endogenous growth has contributed to the understanding of how human capital influences economic growth. However, it is also necessary to consider whether human capital is incorporated into a country’s productive system, enabling the production of goods with relative complexity and diversity. Thus, Hidalgo and Hausmann (2009) and Hausmann et al. (2014) created the ECI, which considers both the level of knowledge used to export goods and the diversity of production of those goods. A country has high economic complexity if the goods produced incorporate a high level of knowledge and productive diversity. One such example is Japan, which, according to the ECI, had the most complex economy in the world between 2011 and 2016 due to the complexity and variety of products exported. In contrast, in Botswana, which ranks 129th according to the ECI, 52% of exports comprise only refined copper, that is, a product that requires little knowledge for manufacture.
3. From Big Data to Econophysics
As previously mentioned, finance is one of the areas with a growing body of work that can be considered as using big data. It has attracted researchers from other fields, such as physics, and has even given rise to multidisciplinary research fields such as Econophysics.
Econophysics is a neologism used in the branch of Complex Systems from Physics that seeks to make a complete survey of the statistical properties of financial markets, using the immense volume of available data and the methodologies of statistical physics (Mantegna and Stanley 1999). The term Econophysics was coined by Stanley et al. when they analyzed the Dow Jones Index and found that stock returns followed a power-law distribution, contributing to the emergence of this new research field. Although use of the neologism is relatively recent, the approximation between physics and finance is not new, beginning probably in the 1960s, when Mandelbrot, analyzing the returns of cotton prices, rejected the assumption that asset price changes are normally distributed (Jovanovic and Schinckus 2013). For Mirowski, neoclassical economics was strongly influenced by theoretical physics, which contributed to economic theory throughout the 20th century. Mandelbrot’s ideas about the non-normality of financial returns remained forgotten until Mantegna, analyzing the Italian stock market, discovered that returns were compatible with Lévy stable non-Gaussian distributions.
One of the hot topics of Econophysics, where the connection to financial theory is closer, is analysis of the efficient market hypothesis (EMH). The EMH implies that the prices of financial assets reflect all the available information. Although it was only formalized by Fama, it had already been analyzed by Bachelier and Samuelson, among others, before the definition of Econophysics. Since Fama, this topic has been studied extensively in the literature (see Lee 2008; Titan 2015 for reviews). On the basis of the EMH, asset prices may be described by a random walk, and this possibility can be analyzed through the Hurst exponent, which is a very well-known approach in Econophysics. Methodologies like the rescaled range (R/S) analysis or Detrended Fluctuation Analysis (DFA) are widely used to make this estimation and are found in several studies, such as Costa and Vasconcelos, Di Matteo et al., Wang et al., López and Contreras, Kristoufek and Vosvrda, Cao and Zhang, Anagnostidis et al., Nadarajah and Chu, or Ferreira et al., among many others.
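As an illustration of how such an estimate can be obtained, the following is a minimal sketch of first-order DFA in plain Python. It is not taken from any of the studies cited above: the function names `linear_fit` and `dfa_exponent`, the choice of window sizes, and the use of non-overlapping windows with a linear trend are all assumptions made for the example. An exponent close to 0.5 is what one would expect for serially uncorrelated returns, which is the behavior associated with a random walk in prices.

```python
import math
import random


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_exponent(returns, scales):
    """Estimate the DFA scaling exponent of a return series.

    `scales` are the window lengths (hypothetical choice of the user).
    """
    # Step 1: build the profile, i.e. the cumulative sum of demeaned returns.
    mean = sum(returns) / len(returns)
    profile, running = [], 0.0
    for r in returns:
        running += r - mean
        profile.append(running)

    log_scales, log_fluct = [], []
    for s in scales:
        t = list(range(s))
        squared_sum, count = 0.0, 0
        # Step 2: remove a linear trend in each non-overlapping window.
        for start in range(0, len(profile) - s + 1, s):
            window = profile[start:start + s]
            slope, intercept = linear_fit(t, window)
            squared_sum += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(t, window)
            )
            count += s
        # Step 3: root-mean-square fluctuation at this scale.
        log_scales.append(math.log(s))
        log_fluct.append(math.log(math.sqrt(squared_sum / count)))

    # Step 4: the exponent is the slope of log F(s) against log s.
    exponent, _ = linear_fit(log_scales, log_fluct)
    return exponent


if __name__ == "__main__":
    rng = random.Random(1)
    simulated_returns = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    print(dfa_exponent(simulated_returns, [16, 32, 64, 128, 256]))
```

With real data, the result depends on the range of scales and on the sample size, so an estimate should be read together with its uncertainty rather than as a sharp test of efficiency.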
Regardless of the features identified in the previous section, it is important to understand that big data can be used for commercial and financial purposes. In different areas, firms can use their data with the objective of raising their returns (see, for example, Subrahmanyam 2019). Obviously, thanks to increased computational capacity, the financial sector can also use Econophysics models to identify, test, and evaluate models which could be used, for example, to identify patterns in financial (big) data (see, for example, Preis et al. 2010, among many others which use big data trying to anticipate possible warning signs in financial markets).
Econophysics was initially regarded with some distrust in mainstream economics (see, for example, Ball 2006; Schinckus 2018). This may now be changing, with some economists adopting methods originating in the field of Econophysics and increasing recognition of its wide applicability in several areas of the economy. Thus, Econophysics is slowly securing its place. Initially, applications were restricted to financial markets and, in rare cases, macroeconomics. Currently, Econophysics is used in several areas of the economy, such as energy (Filip et al. 2016), regional economics (Gao and Zhou 2018), or environmental economics (Stolbova et al. 2018), among many others, demonstrating that this discipline is attaining greater importance in solving various economic problems. In complexity, an area that has gained prominence is complex networks.
4. Power Laws in Finance
One type of research which has gained importance in Econophysics is the investigation of power laws. While some economic models and hypotheses, such as the efficient market hypothesis of Fama and the Black and Scholes model, have assumed that returns follow a normal distribution, Econophysics has contradicted this since its emergence: if stock returns follow a power-law distribution, large fluctuations in stock exchanges occur more often than a normal distribution would predict. Accepting that financial markets are subject to wide variations can contribute to mitigating these financial instabilities or even preventing them (Pereira et al. 2017).
The idea that stock returns follow a power law is recurrent in Econophysics (see, for example, Stanley et al. 1996; Lux 1996; Mantegna and Stanley 1999; or Gabaix 2009, among many others). A power law can be defined as $p(x) \sim k x^{-\alpha}$, where $k$ and $\alpha$ are constants, for asymptotic values of a variable $x$. Power laws have played an important role in economics, to the extent of warranting an extensive article in the Journal of Economic Perspectives (JEP), in which Gabaix demonstrates their applications in relation to finance, city size, executive salaries, and macroeconomics, which are very different subjects.
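In practice, the exponent of a power-law tail has to be estimated from data. One common choice, shown here only as a hedged sketch, is the Hill estimator, which uses the largest observations in a sample. The function name `hill_tail_exponent` and the number of tail observations are assumptions of this example, and the estimator is not claimed to be the method used in the works cited above. It estimates the exponent of the tail probability $P(X > x) \sim x^{-\zeta}$; when the power law is written for the density instead, the corresponding exponent is $\zeta + 1$.

```python
import math
import random


def hill_tail_exponent(sample, k):
    """Hill estimate of the tail exponent from the k largest observations.

    `sample` should hold positive magnitudes (for example absolute returns);
    `k` is the number of tail observations, a choice left to the user.
    """
    ordered = sorted(sample, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    threshold = ordered[k]  # the (k + 1)-th largest value
    if threshold <= 0:
        raise ValueError("the tail threshold must be positive")
    log_excess = sum(math.log(x / threshold) for x in ordered[:k])
    if log_excess == 0:
        raise ValueError("the k largest observations are all equal")
    return k / log_excess


if __name__ == "__main__":
    # Synthetic Pareto data with a known tail exponent of 3 (illustration only).
    rng = random.Random(42)
    data = [rng.paretovariate(3.0) for _ in range(20000)]
    print(hill_tail_exponent(data, 2000))
```

The estimate is sensitive to the choice of `k`: too few observations give a noisy value, while too many include observations that are not in the power-law regime.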
Power laws have therefore become a stylized fact of financial time series, contributing to the interpretation of financial markets as unstable, which is noticeable during crisis episodes like the subprime one. The recognition that Econophysics approaches could be useful in the context of financial crises meant that in December 2017, the American Economic Association (AEA) hosted a conference on Econophysics attended by a group of physicists and economists, with the intention of presenting this new research field in more detail. The positive attitude of the AEA is an important step towards achieving understanding between the two areas, economics and physics, which have always been very close. During the presentations at the AEA conference, some of the most important researchers in the new discipline were able to show some of the advances in economics and its perspectives.
5. Complex Networks
The structure of a complex network is represented in the same way as a graph $R$, which, in the case of networks that have no weights in their connections, is defined by $R = (V, E)$, where $V$ is the set of nodes (or vertices) and $E$ the set of edges or connections that link pairs of nodes. The numbers $n = |V|$ and $m = |E|$ are the quantities of elements in $V$ and $E$, respectively (Newman 2018).
Complex systems, in general, involve innumerable elements organized in structures that can exist or coexist at different scales. Most of their main characteristics emerge from interactions between their constituent parts and cannot be predicted from an isolated understanding of each of these parts. In this context, complex networks can be located at the intersection of graph theory and statistical mechanics, involving several knowledge areas, and therefore their study can be considered as a multidisciplinary approach (Costa et al. 2007).
It can also be highlighted that complex networks have contributed to economics by proposing new methods, techniques, and properties (Schweitzer et al. 2009). One research area which has benefited from these new approaches is finance, where network theory has enabled measurement of the probability of systemic risk arising from the interconnections and interdependence between the agents of a given system or market, in which the insolvency or bankruptcy of a single entity (or group of entities) can cause chain failures (Jackson 2010). In this context, Boss et al. showed that before a financial crisis, the world banks’ payment systems were interconnected and had a probability distribution in the form of a power law, meaning that a large part of the transactions involving those payment systems was concentrated in a very small number of banks, while many others traded a smaller amount.
Economists consider the importance of a payment institution according to the volume of resources it administers. However, the concept of centrality extracted from complex networks helps to classify the importance of these institutions based on how central they are in a given network (in that case, the financial system).
An important property of networks is centrality, which quantifies the importance of the vertices (or edges) in a networked system. In complex networks, there is a wide variety of mathematical methods and measures of vertex centrality that focus on different concepts and definitions of what it is to be central in a network. A simple measure of centrality in a network is the degree of a vertex, which represents the number of edges connected to it and is considered one of the most important network metrics (Newman 2018).
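The degree of a vertex can be computed directly from a list of edges. The sketch below is a minimal illustration for an undirected, unweighted network. The function names and the example institutions are hypothetical, and dividing the degree by the number of other vertices is one common normalization, assumed here for convenience.

```python
from collections import defaultdict


def vertex_degrees(edges):
    """Return the degree of every vertex in an undirected, unweighted network."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # self-loops are ignored in this sketch
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


def degree_centrality(edges):
    """Degree divided by the number of other vertices, so values lie in [0, 1]."""
    degrees = vertex_degrees(edges)
    others = len(degrees) - 1
    if others < 1:
        return {}
    return {node: degree / others for node, degree in degrees.items()}


if __name__ == "__main__":
    # Hypothetical interbank links: one hub connected to three smaller banks.
    links = [("hub", "bank_a"), ("hub", "bank_b"), ("hub", "bank_c")]
    print(vertex_degrees(links))
    print(degree_centrality(links))
```

Storing neighbours in sets means that a link listed twice is counted once, which matches the unweighted setting described above.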
6. Dynamics in Complex Networks
Considering the concept previously described, Battiston et al. propose that instead of considering financial institutions “too big to fail” in terms of default risk, they can be considered “too central to fail”, i.e., monitoring the centrality of a financial institution rather than its size. This may better explain how a crisis can spread in a banking system, as negative shocks to central financial institutions can cause a system-wide contagion effect. The finding that some markets are too central to fail implies that some relevant financial markets, such as those of the European Union or the United States, have a high centrality, and any disturbance in them can affect practically the entire network connected to them (Pereira et al. 2019).
In fact, one of the most interesting and relevant issues in networks is related to their dynamics, found in several complex systems, and not only in finance, with dynamic issues like congestion, cascading failures, spreading, and synchronization often being identified (Motter et al. 2006; Lorenz et al. 2009; Elliott et al. 2014). These concepts are intimately related to the diffusion of information and networks’ capacity to pass problems from vital nodes to the other network members (Jalili and Perc 2017). Related to the connectivity of the networks, these dynamics could imply a greater or lesser possibility of contagion (Gai and Kapadia 2010), which is often described in the literature.
Complex networks have influenced analysis in finance, as in, for example, Haldane and May, who analyzed the banking system as an ecological network susceptible to financial risks due to its topology. Diebold and Yilmaz, using complex networks, demonstrate that during the period preceding the 2008 crisis, markets were more connected and more subject to instability. Highlighted among studies relating complex networks and the contagion effect are Gai and Kapadia, Glasserman and Young, or Acemoglu et al., who analyze the risk of financial networks and propose that financial contagion shows a form of phase transition and that, to a certain extent, strong interconnections serve as a shock propagation mechanism, leading to the fragility of a given network. According to Bartesaghi et al., it is crucial to analyze the changing relevance of the nodes, which, in the context of crises and their potential financial implications, could also be an interesting issue for analysis using big data.
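The contagion mechanism discussed in this section can be illustrated with a deliberately simple threshold cascade. The sketch below is a toy model and not the model of any of the studies cited above. The function name `simulate_cascade`, the data layout, and all the numbers are hypothetical. A bank is assumed to fail when its losses on exposures to failed counterparties reach its capital buffer.

```python
def simulate_cascade(exposures, capital, initially_failed):
    """Propagate defaults through a network of interbank exposures.

    exposures[a][b] is the amount bank `a` loses if bank `b` fails;
    capital[a] is the loss that bank `a` can absorb before failing.
    Returns the set of banks that have failed once the cascade stops.
    """
    failed = set(initially_failed)
    changed = True
    while changed:
        changed = False
        for bank, buffer in capital.items():
            if bank in failed:
                continue
            loss = sum(
                amount
                for counterparty, amount in exposures.get(bank, {}).items()
                if counterparty in failed
            )
            if loss >= buffer:
                failed.add(bank)
                changed = True
    return failed


if __name__ == "__main__":
    # Hypothetical star-shaped system: three small banks lend heavily to a hub.
    exposures = {
        "hub": {"bank_a": 2, "bank_b": 2, "bank_c": 2},
        "bank_a": {"hub": 10},
        "bank_b": {"hub": 10},
        "bank_c": {"hub": 10},
    }
    capital = {"hub": 5, "bank_a": 5, "bank_b": 5, "bank_c": 5}
    print(sorted(simulate_cascade(exposures, capital, {"hub"})))
    print(sorted(simulate_cascade(exposures, capital, {"bank_a"})))
```

In this example all four banks have the same capital, yet a shock to the hub brings down the whole system while a shock to a peripheral bank stays contained, which is the intuition behind “too central to fail”.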
Big data and complexity go hand in hand, and Econophysics is a way to use this kind of data, with the particularity of being used mainly in finance. The availability of large datasets, together with increased computational processing capacity, has made big data and finance very attractive research areas in general and of particular interest in the analysis of crisis events.
Since the global financial crisis, governments around the world have been acting to improve financial stability and reduce the risks of a highly interconnected financial system, using complexity to do so (Yellen 2013). That said, complexity has been providing methods and models that try to explain the instabilities occurring in different markets, leaving five important lessons for financial markets:
Extreme events can occur in stock markets.
The financial markets are interconnected.
Some sectors or companies are too central to fail.
Different systems, for example, public health, transport, industry, and finance, are interrelated, increasing the global risk.
Financial crises can be complex phenomena: when markets are in a phase transition, any “shock” can cause a crisis and the consequent contagion effect.
In the current context, where we are facing the COVID-19 crisis, which could be considered as a complex phenomenon (Wagner 2020), risk analysis is raised to another level. It requires the monitoring of several issues lying beyond economic or financial topics and must incorporate issues related to the environment (for example, increased deforestation in the Amazon or global climate change), public health (spread of this or other epidemics), politics (corruption), credit (more than the usual default analysis, for example, the high university student debt in the USA), and energy (the dependence on oil from regions in geopolitical conflicts), among others. Therefore, complexity is needed to understand instability, in particular the high risks arising from a globalized, interconnected world with a large volume of information. For all of these kinds of complex problems, the use of big data could help authorities prepare for possible future challenges.
Considering the potential of studying the effects of the current crisis, future research could focus, for example, on the linkage between Google Trends and this particular disease. It has already attracted the attention of some authors, for example, Effenberger et al., who relate public interest in the disease to the number of new reported cases. In fact, the use of Google Trends opens a wide set of possibilities for academic literature in general and for the study of economics and finance in particular. Recent studies such as Salisu et al. or Simionescu et al. show the possibilities of using this measure. Certainly, it could also be used in other contexts, such as the relationship between Google Trends and investor sentiment. This linkage has already been studied (see, for example, Bank et al. 2011; Kim et al. 2019; or El Alaoui et al. 2020, among many others) and could be used in the future to relate investor sentiment and financial markets, not only during the COVID-19 crisis but also afterwards, during the recovery from the pandemic.
Recently, many studies have used high-frequency data in the analysis of financial markets, which could also be explored to analyze the effects of COVID-19 (Corbet et al. 2020). Despite the interest in using high-frequency data, even in the context of Econophysics methods to study the multiscale and multifractal behavior of financial markets, it is also important to take into account the quality of big data and, in particular, that of high-frequency data. However, it is clear that the use of big data could have a direct impact on management practices, as it fosters the development of organizations’ knowledge (Choi et al. 2017; Dugast and Foucault 2018), including financial firms’ risk management (Campbell-Verduyn et al. 2017; Fang and Zhang 2016). Moreover, the use of big data could reduce costs for firms and also for analysts, making activities more efficient in this domain (Fanning and Grant 2013; Li et al. 2015; Lee 2017).