1. Introduction
Sustainable development is defined as development that meets the needs of the present without compromising the ability of future generations to meet their own needs [1], and it has become a key strategic objective worldwide. Sustainable development requires the consideration and integration of economic, social, cultural, political, and ecological factors in decision-making, in an attempt to balance economic development, social development, and environmental protection [2]. As a result, the majority of large companies and a significant number of small and medium-sized enterprises have been incorporating policies and actions aimed at improving their own sustainability and that of their supply chains (SCs) [3]. This consideration of sustainability as an objective in supply chain management, also known as Sustainable Supply Chain Management (SSCM), is due mainly to three factors: (1) pressure from stakeholders (such as investors, shareholders, customers, and non-profits) to alleviate the enormous environmental impacts being generated (deterioration of the environment, scarcity of resources, increased waste generation, increased pollution); (2) the creation of brand value that serves as a differentiating element against competitors; and (3) increasingly restrictive regulations [4].
However, the incorporation of sustainability into supply chain management faces a number of obstacles and difficulties. Pagell [5] states that present knowledge in SSCM is not sufficient to create truly sustainable supply chains and identifies problems with assumptions, norms, institutions, measures, and methods that future research needs to address. In addition to these shortcomings, Luthra and Mangla [6] add another: the technical issues related to the generation, processing, and analysis of data that will allow greater effectiveness and efficiency in business processes, as well as performance control and support for supply chain decision-making.
This is where the concept of Industry 4.0 and the use of Big Data technology come into play. Industry 4.0 has become a buzzword describing the trend towards the digitalization and automation of manufacturing. It allows products, machines, components, individuals, and systems to form a smart network that integrates cyber-physical systems, linking information and physical elements so that they can act quickly in faster and more effective service environments [7]. The contribution of Industry 4.0 to more sustainable industrial value creation is expected to be remarkable in the future [8]. For example, it will allow the efficient allocation of resources, including water, energy, raw materials, and other products, based on data collected in real time, resulting in new sustainable green practices. Big Data analytics, in turn, comprises the technologies and techniques used to analyse large-scale, complex data from various applications in order to acquire intelligence and to extract unknown, hidden, valid, and useful relationships, patterns, and information [9]. Big Data can be very useful in enhancing supply chain sustainability, as it can be used to identify the past sustainability impact of the SC as well as to predict its future sustainability impact [10]. Therefore, the combined use of the digitalization and automation of Industry 4.0 together with Big Data technology has great potential to overcome some of the problems affecting sustainable supply chain management [11].
However, research on Big Data and Industry 4.0 as means to enhance supply chain sustainability is still limited, and its application in real business cases is usually at an early stage [12]; hence, practitioners have problems using Big Data and Industry 4.0 for supply chain sustainability [13]. This could limit its impact, since the existing SSCM literature is primarily focused on other problems, such as building definitions of SSCM, discussing the implementation of SSCM, proposing strategic decisions to be considered in SSCM, analysing SSCM governance mechanisms, and developing frameworks for SSCM [14]. Therefore, more research into the technological issues concerning the Industry 4.0 and Big Data solutions that support sustainable practices in the supply chain is required [6,15,16].
In order to cover the research gap between the growing interest in the ways Industry 4.0 and Big Data can help to support sustainable supply chains and the scarcity of systematic, extensive reviews of recent research on this subject, this paper seeks to explore the status of research in the domains of Industry 4.0, Big Data, and sustainable supply chains, as well as to establish a categorical framework that groups the research conducted on the basis of relevant common points. The hypothesis is that a bibliographic analysis of current research can facilitate the advancement of future research in this area. In particular, the following research questions (RQs) are elaborated:
- RQ1: Which are the top contributing authors, countries, and institutions in the field of Big Data and Industry 4.0 as applied to SSCM?
- RQ2: Is it possible to define research categories on the basis of relevant common points?
- RQ3: Which are the future research needs in the field of Big Data and Industry 4.0 as applied to SSCM?
To answer the above research questions and to address Industry 4.0 and Big Data as means to support sustainable supply chains, this paper (1) reviews the literature on Big Data, Industry 4.0, and supply chain management published since 2009; (2) provides a thorough insight into the field by using bibliometric and network analysis techniques and by evaluating 87 articles published over the past 10 years, identifying the top contributing authors, countries, and key research topics related to the field; (3) obtains and compares the most influential works based on citations and PageRank; (4) identifies and proposes six established and emerging research categories that should encourage scholars to expand research on Industry 4.0, Big Data, and SSCM; and (5) identifies the future research needs in every category. The methodology we followed combines Rowley and Slack’s proposal [17] and the snowball method [18] for item selection. To perform the bibliographic and network analyses, the paper by Mishra et al. [19] was used as a model. Finally, for the identification of the categories, we followed the comparative method proposed by Collier [20].
This paper is organized as follows: Section 2 presents a review of the literature related to supply chain sustainability, Industry 4.0, and Big Data. Section 3 describes the software tools and the research methodology used to perform the bibliographic and network analyses. Section 4 shows the findings of the bibliographic and network analyses. Finally, Section 5 discusses the findings and presents the conclusions, research limitations, and future work.
3. Software Tools and Research Methodology
3.1. Software Tools
The bibliographic search was conducted using the Web of Science. Web of Science and Scopus are the main sources of bibliographic citations used for bibliometric analyses, mainly because they are the only ones that combine a rigorous selection process with wide interdisciplinary coverage, which represents a significant strength over the other databases. On the one hand, there are other popular interdisciplinary databases, such as Google Scholar, but the low data quality found in Google Scholar raises questions about its suitability for research evaluation [52]. On the other hand, there are prestigious bibliographic databases that are focused on specialised fields, or that are regional or even country-based abstracting services.
In this case, for the systematic and extensive review of Industry 4.0 and Big Data to support sustainable supply chains, relevant and high-quality papers from different scientific fields are needed, because sustainability is an interdisciplinary field addressed from different research domains, such as business management, environmental and social sciences, engineering, etc. Therefore, Web of Science together with Scopus are the best databases for this study.
Even so, both have limitations and biases that lead to the underrepresentation of articles by field, language, or country. The most comprehensive approach would be to combine different search engines so that each can cover the biases of the others [52]. In the study presented in this paper, only the Web of Science was used, because it offers several necessary analysis tools that are not available in other systems. The reason for not using Scopus is that its search tool works by study area and does not offer the versatility to carry out the search by content, title, or author keywords, which is a crucial element for this study.
The desktop version of EndNote was used to manage the bibliographic libraries extracted from the Web of Science search. This software was selected because it is easy to use, powerful, compatible with many bibliographic search engines and with the MS Office tools, and available as a trial version.
A tool integrated within the Web of Science was used for the bibliographic analysis. This tool analyses the list of articles, grouping them into different categories, such as the country and institution in which they were developed, the journal that published them, or their authors. It also allows a report to be created that analyses the evolution of the number of citations over time and obtains reference indicators such as the h-index.
The list of articles was extracted from the Web of Science in plain text format and was then processed with the BibExcel tool to generate a file with the format and structure required for the network analysis performed in Gephi. Gephi is an open-source tool widely used for network and graph analysis. It offers the possibility of performing an advanced exploratory analysis of entity–relationship data. It also allows a significant number of distribution, organization, and clustering algorithms to be applied in order to determine the interactions that occur in the data. These techniques allow, among other things: the identification of the most influential nodes or entities; the identification of groups related by affinity; the calculation of relevance indicators such as PageRank; and the visualization of information in order to facilitate the identification of elements of interest.
3.2. Research Methodology
The model for the methodology followed to perform the bibliographic analysis was taken from the article by Mishra et al. [19], which in turn was based on the five-step method proposed by Rowley and Slack [17].
First, the keywords that defined the search for the articles were selected. Three keyword combinations were established: (1) Big Data, Industry 4.0, sustainability, supply chain; (2) Big Data, sustainability, supply chain; and (3) Industry 4.0, sustainability, supply chain. The search was conducted in English, as there is a greater number of bibliographic sources in that language [35]. Web of Science offers the search criterion ‘Topic’, which searches for the keyword combinations in the title, abstract, and keywords of the papers. This is the search criterion that was used in the study.
The search method used in this study is based on the snowball system described in detail in Wohlin’s article [18]. As a result, an initial list of articles was obtained from the combination of the aforementioned keywords. The articles in the initial list were then filtered by analysing each of them individually. First, we checked whether at least two of the search keywords were contained among the keywords of the article in question. If that was the case, references to the third keyword were sought in the abstract. If they were not included, the abstract was analysed in detail to determine whether the article dealt with the topic in question, even if the keyword was not explicitly mentioned. If the keyword was not found, the article was discarded. If it was found, the article was selected, and its cited papers were then analysed to determine whether they could also be selected. If so, the process continued in a similar manner until no further adequate articles were found in the references of a selected article. When this happened, the search returned to a higher level and continued.
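The mechanical part of this filtering rule can be sketched in code. This is an illustrative reconstruction, not the procedure actually scripted in the study; the field names are assumptions, and the manual step of reading an abstract in detail to judge topical fit is not automated here.

```python
# Hypothetical sketch of the keyword-filtering rule: keep an article if at
# least two search terms appear among its keywords and the remaining term(s)
# are at least mentioned in its abstract. (Field names are illustrative.)

def passes_filter(article, search_terms):
    kw = {k.lower() for k in article["keywords"]}
    abstract = article["abstract"].lower()
    in_keywords = [t for t in search_terms if t.lower() in kw]
    if len(in_keywords) < 2:
        return False
    missing = [t for t in search_terms if t not in in_keywords]
    # The study's further manual analysis of the abstract is not modelled;
    # here a missing term must simply be mentioned in the abstract.
    return all(t.lower() in abstract for t in missing)

article = {
    "keywords": ["big data", "supply chain", "analytics"],
    "abstract": "We study sustainability impacts of big data in supply chains.",
}
print(passes_filter(article, ["big data", "supply chain", "sustainability"]))  # True
```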
Applying this process resulted in a list of 87 articles. Once the list was defined, the analysis tools available on the Web of Science were used. The following tasks were performed with these tools:
- Analysis of the evolution over time of the number of articles published included in the list;
- Analysis of the evolution of the number of citations generated by the articles;
- Analysis of the number of articles published by author;
- Analysis of the number of articles published by country;
- Analysis of the number of articles published by institution;
- Analysis of the content of the 10 most cited articles on the list;
- Analysis of the h-index indicator;
- Analysis of the number of articles published per journal;
- Analysis of the indicators of relevance, impact, and prestige of the 10 journals with the most published articles on the list. The indicators analysed were the following: CiteScore, Impact Factor, Source Normalized Impact per Paper, and SCImago Journal Rank.
The plain text file containing the list of papers was processed with BibExcel to generate a file compatible with the Gephi software. This file contained only the relationships among all the articles and a label that identified them. The relationships reflected the citations that papers on the list made to other papers on the list. The analysis of networks and relationships carried out with Gephi was aimed at: (1) analysing the relationships among articles based on the citations made among them; (2) identifying the relevant articles and those that were marginal; (3) detecting clusters or sub-communities in the list based on the relationships generated through citations; and (4) calculating the PageRank indicator.
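For illustration, the kind of citation network handed to Gephi can be approximated as a simple edge list, since Gephi imports CSV files with Source and Target columns. The paper labels below are hypothetical, and this is not the actual BibExcel output format used in the study.

```python
# Sketch: write a toy citation network as a Source;Target edge list that
# Gephi can import. Labels P1..P4 are made up for demonstration.
import csv
import io

citations = {"P1": ["P2", "P3"], "P2": ["P3"], "P4": ["P1"]}

buf = io.StringIO()
writer = csv.writer(buf, delimiter=";")
writer.writerow(["Source", "Target"])  # edge-list header Gephi recognizes
for src, targets in citations.items():
    for dst in targets:
        writer.writerow([src, dst])

print(buf.getvalue())
```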
PageRank was the first algorithm used by Google to establish a ranking among web pages, proposed by Brin and Page [53]. The algorithm assigns a value to each web page based on a network analysis that measures the interactions among the pages. It can be applied to any set of elements that are related to each other through citations or references, which is why, shortly after its appearance, it was applied in bibliographic studies and in determining the relevance and prestige of publications [54]. The interesting aspect of this algorithm is that it takes into account not only the number of times an article is cited but also the importance of the articles that cite it.
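The idea can be illustrated with a minimal power-iteration implementation over a toy citation network. The study itself computed PageRank in Gephi; the network, damping factor handling, and iteration count below are purely illustrative.

```python
# Minimal PageRank by power iteration over a toy citation network.
# d is the damping factor; 0.85 is the value proposed by Brin and Page.

def pagerank(links, d=0.85, iters=100):
    """links maps each paper to the list of papers it cites."""
    nodes = set(links) | {p for cited in links.values() for p in cited}
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in nodes}
        for src in nodes:
            cited = links.get(src, [])
            if cited:  # distribute src's rank among the papers it cites
                for dst in cited:
                    new[dst] += d * rank[src] / len(cited)
            else:      # dangling node: spread its rank uniformly
                for dst in nodes:
                    new[dst] += d * rank[src] / n
        rank = new
    return rank

# "C" is cited by every other paper, including the well-ranked "A" and "B",
# so it ends up with the highest score: being cited by important papers
# counts for more than raw citation counts alone.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]})
print(max(ranks, key=ranks.get))  # C
```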
Finally, a detailed analysis of the content of all the articles was carried out in order to classify them into different categories. A system inspired by the comparative method proposed by Collier [20] was used to establish the categories. Following this system, a preliminary classification based on a prior content analysis was proposed. To establish these categories, the common points shared by the articles were identified; the fundamental objective of each article, as well as its contributions and the advances it offered to the state of the art, were the main factors considered. This classification was taken as a starting hypothesis. After that, the adequacy of the categories for classifying all the articles was checked paper by paper. When an article was found that did not fit in any category, the classification was rethought with a view to integrating the dissonant element. Several revisions were performed until all the items on the list were properly distributed in the proposed classification.
4. Bibliometric Analysis
4.1. Initial Results
First, the number of articles on the final list was relatively low compared with searches for only one of the keywords. For example, on the Web of Science, 54,371 articles appeared for the keyword ‘supply chain’ and 56,546 for ‘Big Data’. However, combining the two keywords reduced the number to 679 papers. When the search was performed with the three keywords of one of the combinations selected for the study (Big Data, supply chain, and sustainability), 93 results appeared, representing 13.7% of the 679 papers retrieved in the previous search, which did not include the word ‘sustainability’. From this, it can be deduced that a substantial share of the studies relating Big Data to the supply chain take the sustainability dimension into account. These searches were carried out as of June 2019. Finally, it should be clarified that, although the search performed with the words ‘Big Data’, ‘supply chain’, and ‘sustainability’ offered 93 results, several were left out of the final selection because of the filtering process described in the methodology section.
The analysis tools of the Web of Science were used to carry out a global analysis of the list of articles obtained. Attention should be paid to the clear upward trend that can be observed in the number of articles generated on this subject in the years covered by the study (Figure 1). Since 2019 was not complete, a projection was carried out using linear regression. The result was 36 articles, with a 95% confidence interval.
This trend can also be observed in relation to the number of accumulated citations (Figure 2). For 2019, a projection was carried out using linear regression. The result was 721 citations, with a 95% confidence interval.
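This kind of projection for a partial year can be sketched as follows. The yearly counts below are fabricated for illustration only; the study's actual data and exact regression procedure may differ.

```python
# Project a partial year from complete-year counts by ordinary least squares,
# with an approximate 95% prediction interval. Data here are made up.
import math

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b  # intercept, slope

def project(xs, ys, x_new, t_crit=2.306):
    # t_crit = two-sided 95% t value for df = n - 2 (here n = 10, df = 8)
    a, b = fit_line(xs, ys)
    n = len(xs)
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))  # residual std error
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    half = t_crit * s * math.sqrt(1 + 1 / n + (x_new - mx) ** 2 / sxx)
    y_hat = a + b * x_new
    return y_hat, y_hat - half, y_hat + half

years = list(range(2009, 2019))               # 10 complete years
papers = [1, 1, 2, 3, 5, 8, 12, 17, 24, 31]   # fabricated yearly counts
est, lo, hi = project(years, papers, 2019)
print(round(est), round(lo), round(hi))
```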
4.2. Author Influence
Several authors stood out from the rest in relation to the number of publications written on the subject. In particular, A. Gunasekaran and T. Papadopoulos each published five articles during the analysis period.
Table 1 shows the 10 most relevant authors.
4.3. Affiliation Statistics
Another issue that stood out from the analysis was the distribution by country. The United States and the United Kingdom together accounted for almost 50% of the articles generated, followed by China and India among the Asian countries. Together, these four countries covered the major part of the studies. From a continental point of view, European institutions produced the most research in this field, followed by Asian ones (Table 2).
4.4. Analysis by Institution
The three institutions with the largest number of articles published were the English universities of Kent, Plymouth, and Hull.
Table 3 shows the 10 institutions with the largest number of published articles.
4.5. Citation Analysis
With regard to the 10 most cited articles (Table 4), only two were bibliographic analyses. Most of them were models or theoretical developments with a practical application, and in one way or another they served to facilitate the incorporation of new data analysis technologies into specific aspects of supply chain management.
In first position, with a total of 158 citations, was the article by Wang, Gunasekaran, Ngai and Papadopoulos [9]. This article reviews the literature dealing with the application of data analytics technologies, including Big Data, to the management of supply chains considering their sustainability. It also discusses the use of data management techniques in that context.
Figure 3 shows the evolution of the citations since the publication of the article. It can be observed that the trend rises quite sharply. Given the relatively recent publication of the paper, the number of citations is very high compared to most of the papers on the list. For 2019, a projection was carried out using linear regression. The result was 105 citations, with a 95% confidence interval. The total number of citations mentioned above (158) does not take into account the projection, only the actual citations at the time of the study (June 2019).
In second position was the article by Shrouf, Ordieres and Miragliotta [55]. This article had 123 citations and establishes a reference architecture for the development of Industry 4.0 projects in the context of so-called smart factories, with a focus on sustainability. It also deals in depth with energy management. Figure 4 shows the evolution of the citations of this article over time. For 2019, a projection was carried out using linear regression. The result was 69 citations, with a 95% confidence interval. The total number of citations mentioned above (123) does not take into account the projection, only the actual citations at the time of the study (June 2019). Here it can be observed that the lower and upper confidence limits are very close to the prediction; this is because the citation trend followed a path close to a straight line and was therefore easier to predict. In this case, unlike the previous one, the initial number of citations was much higher (16 versus 3 for the previous article), but from there on the rise was much gentler.
In third position was the article by Chae [56], which had 92 citations. This article applies Big Data techniques such as sentiment analysis to generate knowledge related to issues like corporate social responsibility and human rights in the field of supply chain and logistics. From there, the potential of social networks for practices related to supply chain management is evaluated. The evolution of the citations for this case is shown in Figure 5. For 2019, a projection was carried out using linear regression. The result was 48 citations, with a 95% confidence interval. The total number of citations mentioned above (92) does not take into account the projection, only the actual citations at the time of the study (June 2019).
In fourth position was the article by Dubey, Gunasekaran, Childe, Wamba and Papadopoulos [57], which analyses the role of Big Data analytics in supporting global-scale manufacturing systems. This analysis includes structured interviews with senior managers and concludes with a proposal for a conceptual model that is tested with the data obtained throughout the study.
In fifth position was the article by Zhang, Ren, Liu and Si [58], which proposes an architecture for Big Data analysis applied to the product life cycle. The proposed architecture is tested in a case study.
In sixth place was the article by Papadopoulos, Gunasekaran, Dubey, Altay, Childe and Fosso-Wamba [59], which proposes and tests a framework to explain resilience in supply chain networks using Big Data analytics on data extracted from social networks, among other sources. The article draws its conclusions by analysing the context of the earthquake that occurred in Nepal in 2015.
In seventh place was the article by Zhao, Liu, Zhang and Huang [60], in which a multi-objective optimization model applying Big Data is developed for the management of sustainable supply chains.
In eighth place was the article by Wu, Liao, Tseng, Lim, Hu and Tan [61], which proposes a method for using Big Data analytics to determine causal relationships among risks and uncertainties in supply chains, integrating sustainable indicators, among other things.
In ninth place was the article by Fawcett and Waller [62], which analyses five elements with the potential to revolutionize the design of supply chains, one of which is Big Data. Four barriers that prevent high levels of value co-creation among members of the supply chain are also identified, one of which is the poor understanding of corporate social responsibility initiatives.
In tenth place was the publication by Ur, Chang, Batool and Wah [63], which presents a framework for the better management of Big Data in the context of sustainable companies, with the ultimate goal of generating value for the company.
4.6. H-index
Web of Science provides an impact metric, the h-index. The h-index is an indicator proposed in [64] which, according to its author, is better than other measures commonly used to gauge the impact of a researcher, such as the number of articles published, the total number of citations, citations per article, the number of significant articles (defining as significant those that reach a minimum number of citations), or the number of citations of the n most cited articles. In fact, all these measures have disadvantages. For example, the number of citations per publication rewards low productivity (few articles published), while the number of articles published, in contrast, rewards productivity without taking quality into account. This issue is problematic because, depending on how scientific publications are evaluated, research is conducted in one way or another. The use of simple quantitative metrics (which do not capture the quality of the articles) is negatively affecting the generation of articles with long-term impact. This topic is dealt with in depth in articles such as Smith et al. [65].
In response to this problem, indicators such as the h-index emerged. This index corresponds to the largest number n of articles that have each been cited at least n times. The h-index of the list of 87 articles that resulted from the literature search was 19, meaning that 19 articles were cited at least 19 times each. The indicator was calculated by ordering the items on the list by the number of times they were cited (from highest to lowest) and traversing the list until the citation count fell below the position of the item on the list; the last position at which the citation count was at least equal to the position gives the h-index.
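The procedure just described amounts to a few lines of code. A minimal sketch (the citation counts in the examples are arbitrary):

```python
# h-index: sort citation counts in descending order and find the largest
# rank r at which the r-th paper still has at least r citations.

def h_index(citations):
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have >= 4 citations each
# A list of 87 papers in which 19 are cited 19+ times yields h = 19,
# mirroring the result reported for the study list:
print(h_index([19] * 19 + [1] * 68))  # 19
```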
4.7. Sources Analysis
Table 5 shows the 10 journals with the highest number of articles on the list. The most commonly used impact and relevance indicators are also included. It can be noted that, in general terms, these are sources that address the application of sustainable policies in the industrial or logistics sector. An example is the first, the Journal of Cleaner Production, published by Elsevier, which addresses in depth many aspects related to sustainability, such as governance, the environment, and corporate social responsibility, and how they can be introduced into production systems.
The indicators reflected in Table 5 express the degree of impact, relevance, and importance of each journal. There is a great deal of debate about which indicators are best suited to measuring the impact of a journal. The relevance indicators shown in Table 5 are the most used and accessible. They are briefly explained below:
- i. CiteScore: Measures the average number of citations received per document published in the journal. Values are calculated by counting the citations received in a given year by documents published in the three previous years and dividing by the number of documents published in those three years. As a comparative reference for the results shown in Table 5, the best score for the year 2018 was 160.19 and the average value was 1.6337.
- ii. Impact Factor: This is another widely used impact metric. The difference with respect to the previous one is that, instead of considering the publications of the three previous years, it uses a time range of two years. The best score for 2018 was 244.585.
- iii. Source Normalized Impact per Paper (SNIP): This index measures the impact of citations in a given context. It is based on the total citations per field of study. A citation has a greater value in fields where citations are less likely to occur. The best score for 2018 was 100.014 and the average value was 0.8566.
- iv. SCImago Journal Rank (SJR): This measure takes into consideration the prestige of the journal in which the article is published. It uses an algorithm similar to the one Google uses to establish rankings among websites, and also takes into account the citations of the article. The best score for 2018 was 72.576 and the average value was 0.7244.
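As a worked example of the CiteScore definition in (i), with hypothetical figures that do not correspond to any journal in Table 5:

```python
# CiteScore as defined above: citations received in one year by documents
# published in the previous three years, divided by the number of those
# documents. The Impact Factor uses the same ratio over a two-year window.

def cite_score(citations_in_year, docs_prev_three_years):
    return citations_in_year / docs_prev_three_years

# e.g. 4890 citations in 2018 to items published in 2015-2017, 1630 items:
print(round(cite_score(4890, 1630), 2))  # 3.0
```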
4.8. Data Clustering Using Content Analysis
A content analysis of all the articles on the study list was carried out to define categories that classify the articles based on common elements, in order to bring some order to the research effort being made and to identify future research suggestions in Big Data, SSCM, and Industry 4.0. The categories obtained are the following:
Applied research: This category includes all the articles whose objective is to develop a framework, model, or system that can be used in some practical context to solve a problem that has been detected. The proposal is validated through its application to a case study.
Diagnosis: This category encompasses articles that perform a purely theoretical analysis of the status or evolution of a given theme or area of study. The most influential elements are identified, and possible future evolutions, patterns, principles, etc., are established.
Bibliographic study: This category includes articles that review the published literature addressing a subject or area in question (usually bounded by keywords). Among other aspects, the number of published articles and their impact and trends over time are analysed. The elements attracting the most interest, as well as knowledge gaps, are also identified. The ultimate goal is to give a complete diagnosis of the state of the art in order to influence the trends detected.
Impact analysis: This category includes articles that analyse the impact that an element has on a real phenomenon. It is a practical application focused on a specific case. The impact in question is evaluated and contrasted with data, and conclusions are drawn based on the results.
Theoretical postulate: This category includes articles that revolve around the argumentation and foundation of a theoretical proposal that does not constitute a framework for practical application, but instead moves in the field of principles, foundations, and relevant elements linked to an area of study or a phenomenon. No concrete proposal with a practical application is made.
Specific solutions: This category comprises articles that present a practical solution to a very specific problem. Apart from explaining the main points of the proposed solution, its functionality is contrasted with real case studies. Within this category, different programming models, algorithms, or indicators can be found.
Table 6 presents the distribution of the 87 papers on the list in each category, and a compilation of the future research suggestions in Big Data, SSCM, and Industry 4.0 made by a content analysis of the papers in every category.
4.9. Network Analysis: Gephi
Finally, a network analysis was carried out using Gephi. It showed the relationships among the papers on the list, such as the paper most cited by other papers on the list, and allowed the PageRank indicator to be calculated. The results are shown in Appendix A.1.
6. Conclusions
In this paper, a bibliographic analysis of the literature on Big Data, Industry 4.0, and supply chain management published since 2009 was carried out. A total of 87 papers were analysed in order to identify the evolution over time of the number of articles published that are included in the list; the evolution of the number of citations generated by these articles; the number of articles published by author; the number of articles published by country; the number of articles published by institution; the content of the 10 most cited articles on the list; their h-index indicators; the number of articles published per journal; the indicators of relevance, impact, and prestige of the 10 journals with the most published articles on the list; the established and emerging research categories on the topic; and the relationships among the 87 papers.
The bibliographic analysis supported the initial hypothesis that an analysis of current research can facilitate the advancement of future research in this area. The main conclusion is that the area of study requires more research and a higher number of annual publications. It is also necessary to improve the relevance of the research carried out, something that can be achieved by publishing in journals of greater impact. Finally, the research conducted in four of the six identified categories should also be strengthened, since the majority of papers were published in the Applied Research and Diagnosis categories.
Finally, it is important to highlight the limitations of the study. This study was limited mainly by (1) the biases introduced by studying a single bibliographic database, the Web of Science, which has shortcomings in terms of publications in the humanities and certain social sciences. There was also a language bias, since this database mostly includes articles written in English, and the search was conducted only in English. Other important databases, such as Scopus, could be used to improve and compare the results; (2) the choice of a series of specific keywords, which introduced another bias by default. For example, other keywords, such as sustainable supply chain management, could have been used and might have yielded different results; (3) the bibliometric and network analysis used for reviewing the literature, which was based on [19]. Other methods could be used for such an analysis; and finally, (4) the classification of the literature into six research clusters. Other methods may result in other classifications.