The Nexus between Big Data and Sustainability: An Analysis of Current Trends and Developments

With the development of technological innovations, Big Data is transforming the socioeconomic world, impacting almost every organization and person. The transformations associated with the development of Big Data have important consequences for the sustainability of organizations, regions, and society as a whole, and as such, they have been specifically addressed by the academic literature focusing on sustainability. Despite its importance, and perhaps because of its rapid emergence, there is a lack of studies dealing with the analysis of this body of literature and its trends. The current research attempts to fill this gap. The study develops a bibliometric and visualization analysis of the literature on the nexus between Big Data and Sustainability. The research analyzes 726 documents on this topic, published until the end of 2020 in the Web of Science Core Collection database, using the VOSviewer software. The results indicate the main trends and developments on the topic related to the most cited papers, authors, publications, institutions, and countries. The visualized frameworks, structures, and trends are useful for both researchers and practitioners, as they can help them understand the current situation, issues to consider, and main developments on the topic.


Introduction
Big Data is very important for organizations and people, as its developments and associated transformations are not only enhancing the long-term innovation, competitiveness, and sustainability of organizations and regions but also changing the functioning and understanding of today's society. In the new Ubiquitous Environment, derived from the "Internet of Things", "Pervasive and Ubiquitous Computing", and the evolution of the semantic or "Web 3.0 Era" [1], the development of "artificial intelligence" and new robots, and technological improvements, people, scholars, and organizations need to understand these transformations to be able to adapt to them.
However, the search for sustainable big data developments, conceived as Big Data developments that follow a sustainable perspective, requires a deep understanding of the meaning of both terms (concepts and evolutions), the existing perspectives of their conceptualizations, and stakeholders' perceptions. Specifically, the diverse sustainability conceptualizations have to be considered in order to observe the sustainability of these developments, or the conception of sustainable development goals. In addition, the evolution of the technology, and of the processes related to big data, must also be observed. This is essential because the economic and financial sustainability of organizations and areas, as well as the sustainability of natural, human, and social capital [2], also depend on this knowledge, and because innovative Big Data applications can be an important mechanism to enhance sustainability.
On the basis of the above, the aim of this research is to develop a deep analysis of the conceptualizations, state of the art and the emerging trends in the literature on the nexus between big data and sustainability. The paper also attempts to structure, and graphically illustrate, the research that relates big data and sustainability, through the development of a bibliographic and visualization analysis. In this way, both scholars and practitioners can understand the complexities and interactions when dealing with these issues.
To achieve the aforementioned objectives, this section briefly explains the importance of developing a bibliometric study to analyze BD&S research. After emphasizing the lack of previous review and/or bibliometric analyses of both terms, the section highlights the importance and contributions of our work. The next section introduces the main concepts and perspectives associated with big data and sustainability, pointing out the relevance of big data in sustainability research. Once the methodology has been described, the following sections develop the proposed bibliometric and visualization analysis.
Although there is an exponential growth in both the big data and the sustainability literature, little is known about the current state of knowledge on the relationship between the two fields of research. Bibliometric analyses are widely recognized and extended methods in the literature for such purposes. Specifically, these analyses are conceived as a cross-disciplinary science focused on quantitative analysis of bibliographic data through the use of statistical and mathematical tools [3]. The academy highlights the ability of this technique to analyze specific research areas from objective information in an easily treatable way [2]. The technique has been broadly used to identify and describe the state of the art and trends of a wide range of topics and disciplines, as well as networks and connections among topics, authors, institutions, countries, and research areas. Hence, we consider bibliometric analysis as an appropriate method to analyze and evaluate the quality and the research interest in the intersection between the two topics.
We have also found bibliometric studies connecting Big Data with specific research areas such as Bioinformatics [31], Blockchain research [32], Internet of things research [33], open data [34], Industry 4.0 Research [35][36][37], industry 4.0 technologies [38], Artificial intelligence [39], and Data science [40]. There have also been bibliometric analyses that look at the relationship between big data and certain sustainability-related aspects such as: the impact of big data and internet of things on the circular economy [41], Industry 4.0, process safety and environmental protection [42], sustainable industry 4.0 [43], open innovation and sustainable tourism [44], sustainable manufacturing and industry 4.0 [45], smart cities [14,46], urban sustainability [47], and sustainable supply chain [48,49]. However, we could not find any bibliometric or visualization study focusing on big data and sustainability.
Despite the current lack of BD&S research, understanding the relationship between Big Data and Sustainability is crucial. It is essential for authors and researchers to understand the state of the art of this literature and the main and new trends that connect Big Data and Sustainability, as well as to observe the main perspectives and streams of research that have analyzed the connection of both terms, the pioneering studies, the topics that are underdeveloped, and the impacts and relationships among existing studies. In this regard, the present work can uncover the structure of these analyses and their developments, which is key in shaping and planning future research. In addition, the study is also relevant for practitioners and policy makers since the development of new business strategies, and public policies, related to Big Data must incorporate sustainability aspects. This, in turn, is vital for enhancing not only the company image and competitiveness, but also society's welfare, economic and social progress, and quality of life. Moreover, big data can be used to improve sustainability in many aspects. Specifically, analyzing both terms together is relevant because it can help public administrators and practitioners to develop Big Data applications with sustainability concerns, and to better understand the conditions under which the implementation of Big Data can create benefits that improve environmental, societal, and economic sustainability. For example, Big Data can benefit public organizations and address societal issues, and can also help firms outperform their competitors [50].
In addition, the introduction of sustainable practices and strategies can also reduce costs, improve the organizations' environmental, social and governance ratings or even lead them to build better performance and competitive advantage (as it can lead to more operational efficiency, or to offer more value to customers, resulting in higher number of customers, better customer engagement, or even more turnover). Moreover, the sustainable practices can help managers and practitioners develop and make better decisions; their analysis and study can help researchers to better understand the causal mechanisms that lead companies to embrace sustainability in a successful way [51].
Consequently, given the lack of bibliometric analyses on BD&S, and given the need for this study, our research attempts to fill this gap by carrying out a bibliometric analysis that addresses this issue.
The paper is structured as follows: the next section discusses the concepts of Big Data and Sustainability as well as some of their interrelationships; Section 3 describes the data sources and bibliometric methodologies used to delve into the interrelationships; Section 4 presents the results of the analysis, visualizes the data, and presents the main findings; Section 5 provides a discussion of the results; and, finally, Section 6 offers the main research conclusions.

Literature Review
The evolution from Web 2.0 through the Semantic web to the Ubiquitous Web of today, which involves the interaction between technology and Big Data, must consider the key role of sustainability for the development of any field. First, it is necessary to understand the new transformations and why Big Data is so important. The new organizational context has been increasingly defined by the advancement of Big Data, which has been triggered by the development of information technologies, social media, the Internet, Internet of Things, and cloud computing. As a result of these digital and technological innovations, an explosive growth of data in almost all industries and business areas is occurring [52]. In addition, Big Data has been increasingly recognized as one of the critical resources for the development and success of organizations. These developments are reinforced by big data-related tools and activities, such as machine learning and artificial intelligence, robots, smart communication devices, business intelligence and analytics. Overall, those advancements represent the so-called Ubiquitous Era, which is conceptualized in this paper as a new epoch in which companies have to provide customized solutions to address the needs of customers in their specific context. More specifically, the concept is related to a global computing environment, which can provide users with "seamless and invisible access to computing resources" [53,54] (p. 421).
Big Data has proved essential not only for an improved customer service, but also for an enhanced internal management of all types of organizations and geographic areas. As such, big data has also incited academic interest, judging by the growth pattern of publications on Big Data since the beginning of the last decade [26] (p. 1179). The topic has rapidly evolved into a hot research area, attracting widespread attention from academia, industry, and governments around the world [52] (p. 59) across a wide range of disciplines [15]. The academic research on big data is particularly important in the business field [55], as Big Data is used in all areas of industry and business functions nowadays. The term refers to a large mass of digital data collected from various sources [15]. This rich data shows more details about the behaviors, activities, and events happening, whereby big data analytics gives access to the "variety and different types of data, from huge resources, with less response time" [56] (p. 111). It integrates the physical world (obtaining data through sensors, scientific experiments, and observations), human society (gathering data through social networks, the internet, and other mechanisms), and Cyberspace [52].
But what is the relationship between these processes and sustainability? Sustainability has developed as an important, growing research discipline and is a policy-relevant issue for businesses and public organizations. Based on the need to preserve limited resources, and, in line with the Brundtland report and Garrigos et al. [2], sustainability is defined as those actions and developments that meet the needs of today's societies, organizations, and people, without having a negative impact on the environment, ecology, society, landscape, culture, or heritage, and without compromising the prosperity and well-being of future generations. The field incorporates diverse perspectives, including natural, cultural, educational, human, political, institutional, and socio-economic ones and, essentially, represents a balance between economic, ecological, and social dimensions. In this regard, recent literature reveals economic (performance), environmental (energy, products, and services), and social (employment, local community) sustainability categories associated with corporate sustainability [57]. In addition, bringing economic growth and wealth should go hand in hand with environmental and societal concerns that affect key stakeholders, both internal and external. These stakeholders include people, governments, organizations, and industries, and their various purposes related to increasing wealth, quality of life, economy, and competitiveness. In fact, as Sanchez Planelles et al. [51] state, sustainability practices, from a managerial point of view, have to respond to new challenges and stakeholder pressures. In addition, these authors observe different concepts of sustainability from a corporate sustainability perspective, including holistic sustainability, sustainable business models, sustainable methodologies, sustainable operations, and sustainability-oriented innovation.
In this context, Big Data management, combined with technological development, is essential to reach the sustainability goals in the organizational world as digitalization widely impacts business and society [58]. More specifically, it should be noted that the transformations inherent to Big Data, and the development of the Ubiquitous Web, are changing the roles of both customers and organizations through sophisticated, open innovation techniques, co-design, co-creation, co-working, co-marketing, and crowdsourcing ("the act of taking a specific task and outsourcing it to a large group of people over the internet, through an open call" [59] (p. 95)). As a result, there is increased customer involvement, as consumers become protagonists, in the design and creation of customized and personalized products and services [60][61][62][63].
However, its application is more extensive. Understanding big data and its consequences is vital to: create and develop new business models; facilitate management (e.g., business intelligence) and make better business decisions; design, develop, deliver, and sell better, and more personalized, services and products (as it offers information about customers' preferences, feedback about the performance of organizations' products and services, and insight about emerging trends [50]); create, optimize and improve business processes in terms of management and distribution; improve overall and employee productivity and efficiency, and reduce costs; increase connectivity and self-learning; improve and streamline internal and external operations and communications (e.g., by incorporating knowledge of user interaction in social networks; improvements in interactions with users (B2C) and businesses (B2B)); improve public and private cooperation, transparency and participation; assist customers in real life (e.g., to interact with customers at any point in their life or experience, to provide a pocket assistant, or to understand specific customer needs through the use of artificial intelligence and smart devices); plan, predict and monitor various developments and effects surrounding organizations and geographical areas; develop green innovation and the circular economy; or improve marketing (helping to understand, satisfy and improve customer loyalty and engagement). All this leads to enhancing the competitiveness and sustainability of organizations. For instance, green management practices related to the integration of big data technology can improve banks' environmental and financial performance [64]. Big data is also able to generate valuable information and insights, helping to capture value about customers, and impacting long-term firm performance [50].
Moreover, Big Data can improve societal and environmental sustainability in supply chains [65] by reducing, for instance, slavery or pressures on the consumption of natural resources through the improvement of visibility and coordination among supply chain partners.
Furthermore, due to COVID-19 the use of Big Data and new technologies is essential not only to control and reduce the impact of the pandemic but also to improve people's quality of life, as well as the business ecosystem and the environment. In this regard, the use of Big Data is essential to predict, plan, manage, and control environmental processes, which derives from a discipline called Environmental Informatics [66]. Combining Artificial Intelligence (AI), Geographic Information Systems (GIS), modelling, simulation, and user interfaces is a key tool for environmental sustainability. These can also be used to improve green infrastructure and practices to optimize energy, water, waste, pollution, and other externalities, and to improve resource usage and thus reduce negative environmental impacts. For instance, big data provide new and powerful ways of studying and improving coupled environmental, social, and economic systems to enhance urban sustainability [47]. Big data can be used to strengthen the sustainability of healthcare project financing by improving the quality and timeliness of information [67]. The capability to analyze and visualize large datasets in a rapid timeframe, which is possible thanks to Big Data, can help to improve the conservation and sustainability of the environment, as well as mitigate environmental declines, by allowing scientists to discover, analyze, and better understand environmental changes at micro to global scale [68].
However, the development of Big Data also presents some negative consequences for environmental sustainability, as it can lead to overexploitation of the resources needed to develop and implement new technological innovations. An increased demand for scarce resources, in turn, requires more energy production, which results in higher pollution rates and other environmental impacts. The sustainability of organizations or the functioning of the economy can also be negatively affected, as Big Data can lead to monopolistic practices. More specifically, the leading technological companies can dominate a wide range of sectors as they control big data, while small and medium enterprises do not have enough knowledge, access to large volumes of data, technological resources, or financial capability to implement innovations. It can also transform social and cultural structures, affect employment, people's privacy, safety and security, and income distribution, and even affect the concept of society and its governance as we know it today.

Materials and Methods
This research uses the WoS Core Collection database as a data source. The choice of this database is justified by the fact that it is one of the two databases with the highest academic recognition, together with Scopus. In this study, Scopus was discarded as WoS is more restrictive in terms of the quality of the journals it includes [69]. The option of Google Scholar was not deemed suitable due to existing unreliability concerns [70]. Thus, our approach follows previous studies that have used WoS [2,71].
In order to evaluate only papers that jointly analyze the topics of "Big data" and "sustainability," we searched for all the papers that use both keywords simultaneously in the entire database (including all its sub-databases, knowledge areas and categories) [72]. However, the population only comprises documents catalogued as articles, reviews, letters, and notes [2]. Documents from 2011 (the first year in which a document on both topics appears) until 31 December 2020 are included. Data collection took place in February 2021. The final sample consists of 726 documents.
The study uses the VOSviewer software [73] to map and graphically illustrate the data. This tool is widely used in the literature to visualize the structure and networks of authors, journals, universities and countries [72]. In addition, the study makes use of the most broadly used analyses in the bibliometric literature [2]: keywords co-occurrence, co-citation (two papers cited by the same document) [74]; bibliographic coupling (one document is cited by two documents) [75] and co-authorship [71]. The research performed hierarchical cluster analyses.
Moreover, the most popular bibliometric indicators were used, including the total number of published articles (to assess productivity); the total number of citations (to observe the relevance of an author, an institution or a country) [69,71]; the h-index (indicating the authorship of h documents cited at least h times [15]) to analyze the quality of a group of documents [76]; the number of documents above a number of citations threshold (to observe influence) [2]; the impact factor reflected in the WoS (to show the dissemination power of a source) [71]; and the citation/document ratio (to measure the impact of each article).
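As an illustration of two of the indicators listed above, the following minimal sketch computes the h-index and the citation/document ratio from a list of citation counts. The function names and the sample values are hypothetical and are not taken from the study's data; the sketch only restates the definitions given in the text.

```python
def h_index(citations):
    """Largest h such that at least h documents have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def citations_per_document(citations):
    """Citation/document ratio: total citations divided by the number of documents."""
    return sum(citations) / len(citations) if citations else 0.0


# Hypothetical citation counts for six documents (not the study's sample).
sample = [25, 8, 5, 3, 3, 0]
print(h_index(sample))
print(citations_per_document(sample))
```

Applied to the citation counts of a full sample, the same two functions would yield field-level values of the kind reported in the Results section.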

Results
In this section, the results derived from seven analyses are presented. Firstly, the state of the art, the research progress on BD&S, and the citation structure of the articles are described. Next, the most cited papers on BD&S are analyzed. The third subsection contains the results related to the top journals. The fourth part focuses on the co-occurrence study of keywords associated with BD&S. This is followed by the analysis of co-citation of references, journals, and authors in the BD&S literature. The sixth subsection focuses on the bibliographic coupling of authors. Finally, the study reveals the co-authorship clusters of countries and institutions.

Current Situation and Evolution of the Literature on Big Data and Sustainability
The first paper on Big Data and Sustainability, published in the Web of Science (WoS) core collection, analyses spatiotemporal patterns in criminal offense records [77]. Although there were only 3 published articles in 2013 and 5 in 2014, the growth of the academic literature on the topic has been exponential since then. Thus, in 2018, 121 articles were published, reaching 309 in 2020, with an increase of almost 90% in the last year. Figure 1 indicates the annual trend in terms of publications. The importance of the articles on Big Data and sustainability is highlighted by the number of citations they have received. The most cited paper in this field is Tao et al. [78], which received 346 citations in WoS in just over two years. The articles by Bibri and Krogstie [79] and Al Nuaimi et al. [80] have also received more than two hundred citations from papers indexed in WoS. Table 1 shows the general citation structure of BD&S publications. The data indicates that only the first two articles obtain more than 250 citations (0.28%), although 29.48% of them exceed 10 citations, despite the novelty of the topic. In addition, the h-index [76], which is indicative of the holistic observation of the field [72], for all articles related to BD&S, is 49 (49 articles have 49 or more citations). In addition, the whole sample of articles is cited more than 9500 times in WoS, with an average of 13.25 citations per item.
Table 2 contains the most important documents in the BD&S research area. Specifically, the data shows the characteristics of the top 20 articles that have received the greatest number of citations. This analysis is very relevant, since this data reveals not only the quality of the documents, but also their popularity and influence in the field [72]. The article by Tao et al. [78] leads not only in terms of citations in WoS (346), but also in regard to the number of citations per year (115.33).
This article studies "how to generate and use converged cyber-physical data to better serve product lifecycle, so as to drive product design, manufacturing, and service to be more efficient, smart, and sustainable." The second most cited paper is Bibri and Krogstie [79] with 266 citations, which is also the second most cited paper per year (66.5) in the ranking. This paper provides an overview of the field and existing work on smart (and) sustainable cities and reveals numerous research opportunities in the area of smart sustainable cities. The article by Al Nuaimi et al. [80] is third in the ranking of citations (215), although it occupies a lower position when considering the number of citations per year (35.83). This paper, also focused on the study of smart cities, analyses the use and implementation of big data applications to support smart cities in order to improve sustainability and living standards (e.g., to improve the performance of health, transportation, energy, education, and water services). Finally, the paper by Wang et al. [81] should be pointed out, which, although occupying a modest position in the ranking by total citations (111 citations), is ranked third in terms of citations per year, with 55.5 annual citations in WoS. This paper reviews smart meter data analytics, highlighting the relevance of the massive use of this data to promote and enhance the efficiency and sustainability of the power grid.
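The citations-per-year figures quoted above are simple ratios of total citations to the number of years counted. The short sketch below reproduces them; the year counts (3, 4, 6, and 2) are an assumption inferred from the reported ratios, not values stated in the text, and the helper name is hypothetical.

```python
# (total WoS citations, years counted) per paper. The year counts are
# inferred from the reported ratios and are NOT stated in the source text.
papers = {
    "Tao et al. [78]": (346, 3),
    "Bibri and Krogstie [79]": (266, 4),
    "Al Nuaimi et al. [80]": (215, 6),
    "Wang et al. [81]": (111, 2),
}


def citations_per_year(total_citations, years):
    """Average yearly citations, rounded to two decimals as in the text."""
    return round(total_citations / years, 2)


# Rank the four papers by citations per year, highest first.
ranking = sorted(
    papers,
    key=lambda name: citations_per_year(*papers[name]),
    reverse=True,
)
for name in ranking:
    print(name, citations_per_year(*papers[name]))
```

Under these assumed year counts, the ordering by citations per year places Wang et al. [81] third, ahead of Al Nuaimi et al. [80], which is consistent with the description above.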

Leading Journals in BD&S
The three main publication categories within the WoS, are Green Sustainable Science Technology (211 papers, 29.06%), Environmental Sciences (193 papers, 26.58%), and Environmental Studies (145, 19.45%). Analyzing the sources of the 726 papers in our sample, these were published in 370 journals, only 47 of which have published more than 3 papers, 19 published more than 5, and only 6 more than 10 documents on BD&S.
Focusing on these sources, 234 documents (32.23% of the BD&S sample) appeared in the top 10 journals (Table 3). Notably, the first two journals alone account for more than 20% of the articles published on the topic: Sustainability accounts for 15.29% of the total publications and Journal of Cleaner Production for 5.51%, while Sustainable Cities and Society ranks third with 2.48%. The H-index is also led by these three sources, although in this case Journal of Cleaner Production (17) ranks first, followed by Sustainability (13) and Sustainable Cities and Society (7). Analyzing further the journals in Table 3 and focusing on those that have dedicated the largest share of their published articles to the BD&S topic since 2011 (the first year in which a publication on the topic appeared), Advances in Science Technology Innovation comes first with 12.64% of its publications dedicated to the topic. The Journal of Enterprise Information Management and International Journal of Logistics Management are next in the ranking, although these sources have only devoted 2.22% and 1.25%, respectively, of their publications to BD&S.

Keywords Analysis
In this section, keywords are analyzed in order to get a glimpse of their importance and distribution, which, in turn, represent the state of the art and the main trends in the BD&S area. More specifically, co-occurrence analysis (which studies the number of articles in which two keywords appear together) of all keywords was performed. The analysis of the 726 documents in the sample, through the VOSviewer program, indicates the existence of 3802 keywords. Figure 2 shows the most relevant keywords and the size of the nodes (the larger the keyword and the node, the greater the number of articles that contain the keyword), as well as the relationships between them. The thickness of the relationship line indicates the frequency of the joint co-occurrence of two keywords, while the relative strength of the relationship between two keywords, compared to others, is displayed by the smaller distance between them. The nodes are colored to indicate the different clusters and groups of keywords. Using a threshold of fifteen co-occurrences, Figure 2 shows the 73 most frequent keywords grouped in 5 clusters. The three main keywords are "big data", "sustainability" and "management." However, the most relevant cluster is the one colored in red and labelled "sustainability" with 21 items. The network includes, among others, the keywords "big data analytics", "performance", and "innovation", listed as the sixth, seventh, and twelfth most frequent keywords. The second cluster, in green, contains 18 items, dominated by the keyword "big data", and also including "model" and "design" as relevant keywords. The third cluster, displayed in blue, is labelled as "management" and contains 14 items among which "challenges", "technology", and "analytics" should be highlighted. The fourth cluster, in yellow, also comprises 14 items, with "framework", "future", and "internet" being the fourth, the eighth, and the tenth most frequently used keywords.
Finally, the fifth cluster, colored in purple, contains six items, only two of which are among the 30 most frequent keywords: "industry 4.0" and "supply chain." Table 4 shows the top 30 keywords, their frequency, and total link strengths. Note: R: Rank; Oc: All keyword occurrences; Co: All keyword co-occurrences link.
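The co-occurrence counts described in this subsection can be sketched in a few lines. The example below is a minimal illustration and not the VOSviewer implementation: it counts in how many documents each keyword appears, keeps the keywords that reach a minimum number of occurrences (an assumed reading of the threshold mentioned above), counts the documents shared by each remaining pair, and sums those pair counts into a total link strength per keyword. The function name and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(documents, min_occurrences=1):
    """Return (occurrences, pair links, total link strength) for keyword lists."""
    # Number of documents in which each keyword appears.
    occurrences = Counter()
    for keywords in documents:
        occurrences.update(set(keywords))

    # Keep only keywords that reach the occurrence threshold.
    kept = {kw for kw, n in occurrences.items() if n >= min_occurrences}

    # Number of documents in which each pair of kept keywords appears together.
    links = Counter()
    for keywords in documents:
        present = sorted(set(keywords) & kept)
        links.update(combinations(present, 2))

    # Total link strength: sum of the link weights attached to each keyword.
    strength = Counter()
    for (a, b), weight in links.items():
        strength[a] += weight
        strength[b] += weight
    return occurrences, links, strength


# Hypothetical keyword lists for three documents (not the study's data).
docs = [
    ["big data", "sustainability", "management"],
    ["big data", "sustainability"],
    ["big data", "industry 4.0"],
]
occ, links, strength = keyword_cooccurrence(docs, min_occurrences=2)
print(occ.most_common(2))
print(dict(links))
```

In this toy case only "big data" and "sustainability" pass the threshold, and their single link has a weight of two because they appear together in two documents.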

Reference, Journal and Author Co-citation Analysis
The study of co-citations indicates when two elements (author, journal, or document) are cited simultaneously by a third document, thus appearing together in a new bibliographic list [74]. This subsection analyzes three elements: references, journals, and authors. In doing so, the characteristics, development, structure, and relationships of BD&S research are revealed, through the analysis of relationships within the current sample. Figure 3 illustrates the main co-cited papers. A more detailed analysis shows that, in terms of the number of citations, the studies of Wang et al. [82] and McAfee et al. [83] share the first position in the ranking, with 47 citations each by the 726 papers in the BD&S sample. In this ranking, the paper by Kitchin [84] ranks third with 45 citations. As for the ranking based on the link strength, the first and third positions are still occupied by Wang et al. [82] and Kitchin [84], although the second one belongs to Gunasekaran et al. [85]; the link strengths of the first, second, and third positions are 263, 237, and 232, respectively. It should be noted that these papers are not necessarily included in the sample of 726 papers on BD&S, but that they are co-cited by them. The documents by Wang et al. [82], McAfee et al. [83], and Gunasekaran et al. [85] represent the largest cluster, which is colored in red in Figure 3 and encompasses 24 items. This network has a predominantly management focus related to Big Data. More specifically, these three papers mainly focus on Big Data management for value chain logistics management, as well as supply chain networks for sustainability. The rest of the papers in this cluster have similar characteristics. The second cluster, displayed in green, contains 14 items and is led by the study of Kitchin [84]. This cluster has a more targeted approach to sustainability issues, considering the geographic scopes and smart areas (including cities). For instance, Kitchin [84] focuses on the use of Big Data for smart urbanism.
A large part of the research gathered in this cluster addresses the sustainable management of cities, and the so-called smart sustainable city development. The emphasis is on the management of areas and destinations, rather than the management of specific organizations and their value chains. In addition, while the previous network is mainly centered on the technological side of big data, this cluster emphasizes the sustainability and the environmental aspects. Finally, the blue cluster, encompassing only nine items, is closely related to the first one. This last cluster is dominated by Stock and Seliger [86], although the article performed well neither in the citation ranking (30 citations) nor in the link strength ranking (27th position). Stock and Seliger [86] focus on sustainable manufacturing in Industry 4.0. In contrast to the first cluster, which also addresses technology management, the focus here is on the technological rather than the management aspects (e.g., Internet of things, industry 4.0, etc.).
Once the reference co-citation results are presented, the journal co-citation network on BD&S is analyzed (Figure 4). As in the previous analysis, the nodes in the illustration show the importance and number of cited items, while the distances between them indicate the frequency of co-citations. The analysis uncovers the existence of four networks of co-cited journals. The first network, displayed in red, contains 34 journals, with Sustainability  , 9170 link strength). However, the prevailing management area is information systems management, associated with decision making. This cluster shares information management content with the previous cluster, and also addresses operations management issues as the next one.  10th, 11th, 13th, 14th, 16th, and 17th positions.
The studies conducted by these researchers focus on big data in general, as well as its application in logistics and supply chain management, as already observed in the co-citations of references analysis. Finally, the last cluster, in yellow, includes three authors, among which Ivanov (87 citations) stands out as the 4th most cited author. It should be noted that this cluster was not revealed in the analysis of co-citations of references. Ivanov's research addresses the combination of digital technology, smart factory industry 4.0, and value chain management from a mathematical and engineering perspective. The position of the cluster in the figure, far from the second one and in the middle, between the first and the third one, is due to its technological approach.
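The citation counts and link strengths reported in the co-citation analysis can be illustrated with a minimal sketch. It assumes that each citing paper is represented by its list of cited references and that the link strength of an item is the sum of its co-citation counts with all other items. The function name `cocitation` and the toy reference labels are hypothetical, and the sketch uses raw counts only; it does not reproduce the normalization, clustering, or layout applied by VOSviewer.

```python
from collections import Counter
from itertools import combinations


def cocitation(reference_lists):
    """Return (citations, pair_counts, link_strength) for a set of citing papers.

    reference_lists: one list of cited references per citing paper.
    """
    citations = Counter()   # how many citing papers cite each reference
    pairs = Counter()       # how many citing papers cite both references of a pair
    for refs in reference_lists:
        unique = sorted(set(refs))              # count each reference once per paper
        citations.update(unique)
        pairs.update(combinations(unique, 2))   # every pair cited together

    # Total link strength: sum of co-citation counts over all links of an item.
    strength = Counter()
    for (a, b), n in pairs.items():
        strength[a] += n
        strength[b] += n
    return citations, pairs, strength


# Hypothetical toy sample: three citing papers and their reference lists.
sample = [
    ["ref_A", "ref_B", "ref_C"],
    ["ref_A", "ref_C"],
    ["ref_B", "ref_A"],
]
citations, pairs, strength = cocitation(sample)
print(citations.most_common())
print(strength.most_common())
```

As in the rankings above, an item can lead on citations without leading on link strength, because link strength also depends on how often the item appears alongside other frequently cited items.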

Bibliographic Coupling of Authors
Bibliographic coupling complements the previous analysis of the relationships among authors by counting the number of references that documents have in common (documents A and B are coupled if they both cite another article C) [75]. Figure 6 shows the results of this analysis, which indicates the existence of five main groups of authors. In the link strength ranking, Bibri (31 papers, link strength 3941), Krogstie (7 papers, link strength 2991), and Gunasekaran (7 papers, link strength 2043) occupy the first three positions. These authors also lead the ranking in terms of total citations, although in a different order: Gunasekaran (505 citations), Bibri (500 citations), and Krogstie (318 citations). Drilling down into the structure of the clusters, the first of them, in red and with 18 members, is anchored by Bag and Gupta, ranked 4th and 8th according to the link strength criterion, with 7 documents each. Jabbour is another prominent scholar, who appears with two entries, as is Hazen, who is ranked 8th in the citation ranking with six papers. The research of these authors is focused on Industry 4.0. The second cluster, depicted in green, contains 18 authors, led by Bibri and Krogstie. A close relationship between both scholars is observed, which is explained by their frequent co-authorship (e.g., [79]). Another distinguished researcher in this cluster is Wu, who ranks 6th by number of citations. As mentioned earlier, the research area of these authors is smart sustainable cities. The third network, in blue, contains six researchers, Gunasekaran being the most prominent among them. Childe, Dubey, and Papadopoulos, with four articles each, occupy the 5th, 6th, and 7th positions, respectively, in the link strength ranking. This group of researchers, which also includes Kamble, investigates Big Data analytics in logistics and supply chain management.
Next is the yellow cluster, comprising five authors and anchored by Y. Liu, who, although ranked low by link strength, receives 58 citations across his seven papers. His research adopts a technological perspective on the use of big data and is mainly focused on mobile computing. Finally, the last cluster (in purple) is composed of the following three scholars: Akhtar, Rao-Nicholson, and Khan, with three papers and 64 citations each. These researchers share the authorship of multiple papers in the management field, some of which are related to Big Data analytics and sustainability in emerging markets. It should be noted that the last two clusters offer new perspectives on BD&S, not addressed before.
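The coupling criterion described in this subsection (two items are coupled if they cite the same article) can be sketched as follows. The sketch assumes, as a simplification, that the references of each author are pooled across that author's papers into one set, and that coupling strength is the size of the overlap between two sets. The function name `coupling_strength`, the author labels, and the reference labels are hypothetical; the counting options actually used by VOSviewer may differ.

```python
from itertools import combinations


def coupling_strength(cited_by_author):
    """Map each author pair to the number of references both authors cite.

    cited_by_author: dict of author -> set of references cited in that
    author's papers (pooled per author; an assumption of this sketch).
    """
    strength = {}
    for a, b in combinations(sorted(cited_by_author), 2):
        shared = cited_by_author[a] & cited_by_author[b]   # references in common
        if shared:                                         # keep coupled pairs only
            strength[(a, b)] = len(shared)
    return strength


# Hypothetical toy data: three authors and the references they cite.
authors = {
    "author_1": {"r1", "r2", "r3"},
    "author_2": {"r2", "r3", "r4"},
    "author_3": {"r5"},
}
links = coupling_strength(authors)
print(links)
```

Unlike co-citation, which depends on how later papers cite two items together, the coupling between two authors is fixed by their own reference lists, so it can be computed as soon as the papers are published.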

Country and University Co-Author Analysis
The main purpose of co-authorship analysis is to reveal the structure of collaboration between institutions and countries at the researcher level and thus visualize the behavior of research teams [87]. The nodes reveal the influence of countries/institutions, while the thickness of the links and the distance between nodes show the degree of collaboration.
First, the collaboration among countries is analyzed. Figure 7 indicates that USA  The conducted analysis reveals the existence of six clusters. The largest one (in red) includes 16 countries, which are mainly European countries, most of them European Union members (e.g., Italy, France, Spain, Norway). The second most important cluster is displayed in green and encompasses 11 countries, including the USA, Australia, China, India, South Korea, and the United Arab Emirates. The third cluster, in dark blue, contains 9 elements, among which is Malaysia (20 documents, 301 citations) together with other Asian countries, two European ones, and Mexico. The fourth cluster, in yellow, includes only Brazil, Chile, and Japan. The fifth one also contains 3 elements: Canada and, at a longer distance, Belgium and Romania. Finally, the light blue cluster includes only England and Taiwan.
Lastly, the most prominent institutions in BD&S research are analyzed (Figure 8

Discussion
The present research has investigated the existing body of literature on the intersection between Big Data and Sustainability. Given the lack of a systematic understanding of this field of knowledge, the study has contributed to the literature by carrying out a bibliometric analysis and visualizing the most relevant pieces of research on BD&S. The literature review on the concepts of "Big Data" and "sustainability" at the beginning of the paper also helps to improve the current understanding of the topic.
The results reveal the interdisciplinary nature of the area, integrating sustainability, technological and management perspectives. However, it should be noted that the literature on this topic is still in its infancy, and while it is currently expanding, most of the studies are conceptual and theoretical in nature. Furthermore, the extant advances are partial and limited, thus failing to integrate the developments in the main scientific fields.
While the research addressing Big Data and Sustainability only started a decade ago, its growth has been significant in recent years, considering that the number of published articles has increased from 5 in 2014 to more than 300 in 2020. It should also be highlighted that 2020 was particularly fruitful in terms of publications on BD&S, since more than 40% of all the papers were published that year. This exponential growth in the number of publications is reflected in the impact of some of the studies, with the Tao et al. [78] article receiving more than 300 citations and Bibri and Krogstie [79] being cited more than 250 times in WoS, despite both having been published less than 5 years ago.
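The growth figures above can be checked with a short calculation. The sketch below takes the reported values (5 papers in 2014, a sample of 726 papers) and treats "more than 300" as a lower bound of exactly 300 for 2020, which is an assumption. The average annual growth rate is a derived illustration under that assumption, not a figure reported by the study.

```python
# Values reported in the text; 300 is used as a lower bound for 2020
# because the study only states "more than 300" (assumption of this sketch).
total_papers = 726
papers_2014 = 5
papers_2020 = 300

# Share of the whole sample published in 2020 (lower bound).
share_2020 = papers_2020 / total_papers

# Average annual growth factor implied by going from 5 to 300 papers in 6 years.
years = 2020 - 2014
annual_growth = (papers_2020 / papers_2014) ** (1 / years) - 1

print(f"2020 share of sample: at least {share_2020:.1%}")
print(f"implied average annual growth: about {annual_growth:.0%}")
```

Under these assumptions, the 2020 share is consistent with the "more than 40%" stated above, and the yearly output would have roughly doubled each year on average between 2014 and 2020.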
Another relevant finding of the bibliometric analysis is that the main academic fields investigating BD&S are Green Sustainable Science Technology, Environmental Sciences, and Environmental Studies. The leading journal in terms of number of publications is Sustainability, followed, at a great distance, by the Journal of Cleaner Production and Sustainable Cities and Society. However, the existing articles are spread over 370 journals, indicating their expansion across a wide range of disciplines. Table 5 summarizes the streams of research and main findings of the different bibliometric analyses. As for keyword co-occurrence, the most frequent keywords are "big data", "sustainability", "management", "framework", and "challenges", which is indicative of the environmental, technological, and managerial perspectives adopted to study this topic. Industry 4.0 and supply chain management appear as another cluster, independent from the management one. The cluster labelled "framework", as well as the frequently used keywords "challenges", "future", and "model", suggest that the literature on BD&S is in a conceptualization phase.
The co-citation analysis reveals that the papers by Wang et al. [82], McAffee et al. [83], and Kitchin [84] are the most cited by the research sample. The first two focus on Big Data management and represent a cluster that addresses the company management of these technological developments for sustainability purposes. The third anchors the second cluster, which adopts an environmental perspective and is associated with geographical area management, mainly looking at the development of smart cities. The third group is closely related to the first one, but deals with Industry 4.0 and the Internet of Things, applying a more technological approach.
The study of journal co-citations also uncovers four groups. The first one, anchored by Sustainability, the third most cited source, is centered on the sustainability of smart cities. The second one, in contrast, has a strong management focus and mainly investigates information systems for decision-making. This cluster is anchored by the International Journal of Production Economics, the second most cited source. The Journal of Cleaner Production, the most cited source, is the most relevant one in the third cluster, which also has a predominantly management focus, although it mainly addresses production and operations management. While the fourth group is more residual, it reveals a nascent research area on technological innovations related to renewable development in the energy field.
The analysis of co-citations of authors also identifies four clusters. The first one, anchored by Dubey, the second most cited author, investigates aspects associated with Big Data, in general, and Big Data Analytics in the logistics and supply chain management field. The second largest cluster, which includes Bibri and Kitchin (the first and third most cited authors), also focuses on smart sustainable cities and smart urbanism. The third group addresses sustainable manufacturing in industry 4.0. Finally, the fourth network is centered on digital technology and smart factory industry 4.0, thus representing a new research line that adopts mathematical and engineering perspectives.
The bibliographic coupling of authors indicates the existence of five groups of authors, the largest of which is anchored by Bag, ranked fourth by link strength, and focuses on Industry 4.0. The second most important cluster is dominated by Bibri and Krogstie and addresses smart sustainable cities, as mentioned above. The third group, anchored by Gunasekaran, deals with Big Data analytics in logistics and supply chain management. The fourth cluster looks at Big Data through a technological lens, with a particular emphasis on mobile computing. The fifth group opens a new perspective on Big Data analytics and sustainability in emerging markets.
The co-authorship analysis of BD&S shows the leadership of the USA, China, and England. However, the most important cluster represents mainly European Union countries. The second group includes the USA and China, together with India and Australia. The third group represents mainly Asian countries, while the fourth one comprises South American ones. As for the rest of the clusters, it should be noted that they include geographically dispersed locations. Regarding the main institutions contributing to the knowledge on BD&S, the Norwegian University of Science and Technology, the University of Hong Kong and the Chinese Academy of Sciences are revealed as the most prolific centers. The first cluster is anchored by multiple Chinese institutions, while the most relevant in the second one is the Norwegian University of Science and Technology. The third cluster subsumes mainly US institutions, while the fourth one comprises Australian universities.

Conclusions
This study has analyzed the state of the art and trends in the BD&S literature. The results have shown the existence of various theoretical research areas, which have approached the topic from heterogeneous perspectives. Overall, the analysis has revealed the existence of four main approaches to the study of sustainability and big data: an information systems management perspective; an environmental approach; an operations management perspective; and a technological stance related to the so-called Industry 4.0 and the Internet of Things. In addition, a few minor approaches, which may evolve into new research trends, were unveiled: mobile computing, mathematics, engineering, as well as specific developments in the field of energy. The study emphasizes the relevance of these emerging areas and trends, which are still in a predominantly theoretical and conceptual stage of development. The findings are important for both academic scholars and practitioners, since they anticipate not only new lines of research, but also key aspects that have to be considered in the practical implementation of BD&S.
In terms of practical implications, the study uncovers opportunities for new business models, improved decision-making, as well as innovations in operations management and the internal and external logistics chain. In addition, the identified developments in the literature on smart cities also offer implications for the decision-making of governmental institutions. However, the study also visualizes the lack of Big Data application in business functions, such as marketing and the company's internal value chain, among others. The application of big data is also scarce across most production sectors, except for the few studies conducted in the energy field. In this regard, the lack of BD&S studies in healthcare and other service sectors, which have a high impact on today's economies, is particularly notable. In addition, the developments observed in the business field should be transferred to the decision-making of public institutions beyond issues such as sustainable urban planning and renewable energy management.
As for the theoretical relevance of this article, the bibliometric study opens new research avenues on BD&S. The paper visualizes the embryonic state of the literature, which is still in a conceptual stage, and thus opens diverse methodological applications, as observed by Garrigos et al. [2,71] in other bibliometric analyses. Furthermore, as previously indicated, the application of BD&S is still lacking in multiple sectors. In addition, academic research going beyond the technological and management approaches is still scarce. In this regard, although some developments in the fields of mathematics, geography, and engineering have been revealed, other research areas have not addressed the topic yet. These would include health care, biology, physics, and environmental sciences. As for the social sciences, the sustainable development of big data should be approached by sociology, ethics, political science, education sciences, and economics.
In this regard, the present study can uncover new research trends, and help researchers to discover important gaps. Future investigations should consider the aspects pointed out by Garrigos et al. [2,71]: issues related to trends in BD&S in classical research fields, the application to new areas and sectors where this literature has not yet developed, and the consideration of BD&S when analyzing timely topics. In this sense, the findings regarding the most influential and most cited authors, sources, and papers, as well as the listing of the most frequent keywords, can help scholars to identify the most relevant research questions and the hot topics that attract the current research interest.
Furthermore, both academic studies and practical applications should consider the link between business and technological developments, since the management and the technological side of BD&S are the two main areas that have been uncovered by the conducted bibliometric analysis (apart from sustainable management of smart cities). Hence, future research efforts and practical applications should be cognizant of the evolution of the Web 3.0 or Semantic Web and the conceptual models pointed out by Garrigos et al. [1] where Artificial Intelligence, as well as the Ubiquitous and Pervasive Web associated with the Internet of Things play a crucial role. In addition, the evolution of Big Data itself should also be closely followed, including not only the numerical and alphabetical data from organizations' internal warehouses, but also the management and mining of data from social networks, external websites, smartphones, data in the cloud, which is increasingly decentralized, and associated with images, tastes, and other senses. In addition, the new developments should consider the new types of companies, which are essentially virtual organizations and outsourced networks, where crowdsourcing plays a key role [62].
Finally, future research may attempt to overcome the limitations of this study. Further investigations can apply new methodologies, incorporate qualitative data, or adopt a more focused approach to BD&S and thus address the limitations of the conducted bibliometric and visualization analyses. Some fruitful areas of research would be to complement the study with a more extensive sample from additional databases; to consider other types of documents, such as conference proceedings and professional papers; to analyze documents published in other languages; or to focus on a specific geographical area. Future studies could also build on the main trends identified by this research, further develop and complement the conducted bibliometric analyses, and implement new techniques presenting relevant methodological innovations.