Research Trends, Enabling Technologies and Application Areas for Big Data

Lars Lundberg; Håkan Grahn

doi:10.3390/a15080280

Department of Computer Science, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

* Author to whom correspondence should be addressed.

Algorithms 2022, 15(8), 280; https://doi.org/10.3390/a15080280

This article belongs to the Special Issue Optimization Techniques, Algorithms, Applications for Cloud and Edge/Fog Computing Environments

Abstract

The availability of large amounts of data in combination with Big Data analytics has transformed many application domains. In this paper, we provide insights into how the area has developed in the last decade. First, we identify seven major application areas and six groups of important enabling technologies for Big Data applications and systems. Then, using bibliometrics and an extensive literature review of more than 80 papers, we identify the most important research trends in these areas. In addition, our bibliometric analysis also includes trends in different geographical regions. Our results indicate that manufacturing and agriculture or forestry are the two application areas with the fastest growth. Furthermore, our bibliometric study shows that deep learning and edge or fog computing are the enabling technologies increasing the most. We believe that the data presented in this paper provide a good overview of the current research trends in Big Data and that this kind of information is very useful when setting strategic agendas for Big Data research.

Keywords: survey; Big Data; telecommunication; image processing; smart cities; manufacturing; parallel processing; storage systems; cloud computing; deep learning

1. Introduction

The amount of data collected and managed is increasing at a staggering pace in most applications, including almost all industrial and commercial areas. In May 2018, Forbes noted that 2.5 quintillion (10^18) bytes of data are produced every day [1], and the data production rate is increasing all the time. There are two main challenges associated with these enormous amounts of data: (1) we need to provide storage systems, database systems, processing platforms, etc. to technically handle the data in a fast, cost-effective and secure way, and (2) we need to develop analysis methods (e.g., AI and machine learning) that can automatically find useful trends and patterns in the data so that we can produce business as well as user value. The need for addressing both these aspects was discussed in a recent survey by Roh et al. [2].

The exploding amount of data in many domains and applications makes it crucial that the advancements in Big Data research find their way from the research labs to practical applications and that these research results can be successfully integrated into industrial and commercial processes and systems. In order to formulate the most important topics for further research in Big Data, we need to identify which are the most important application areas and which of these application areas grow the fastest. For the same reason, we also need to identify the most important enabling technologies for Big Data and which of these enabling technologies develop the fastest. To further promote relevant and well-informed research in Big Data, it is useful to understand where in the world most research is currently being conducted and what the trends are.

In this paper, we analyze more than 80 recent research papers, surveys and books in the area of Big Data. Based on this analysis and almost a decade of close contact with leading practitioners in the field, we have identified seven important application areas and six important groups of enabling technologies for Big Data. For these seven application areas and the six enabling technologies, we analyze the bibliometric trends during the last decade. Our bibliometric analysis also includes Big Data research trends in different geographical regions.

2. Methodology

The methodology in this study consists of two steps:

1. Identifying the major application areas and the major enabling technologies for Big Data;
2. Using bibliometrics to quantify how the research interest for each of the identified application areas and enabling technologies for Big Data has developed during the last decade. In this step, we also quantify how the total number of research publications in Big Data from different geographical regions has developed during the last decade.

We describe these two steps in Section 2.1 and Section 2.2.

2.1. Identifying Important Application Areas and Enabling Technologies in Big Data

We used the following two main approaches for identifying important application areas and enabling technologies in Big Data:

1. Extensive contacts with experts from the industry and research community in the Big Data domain for more than 8 years;
2. Systematically reviewing the recent literature in Big Data.

Using these two approaches, we identified seven important application areas and six important groups of enabling technologies for Big Data. We describe the two approaches below.

2.1.1. Extensive Contacts with Experts from the Industry and Research Community

Our research projects are mostly executed in collaboration with industry and society. We have much experience with this type of collaboration, and through this collaboration, we gain both input on important industrial trends as well as knowledge about the state of practice in the industry. For example, during 2014–2020, we conducted the project BigData@BTH (https://www.bth.se/bigdata, accessed on 5 August 2022)—“Scalable resource-efficient systems for big data analytics”, where more than 15 researchers collaborated with 11 company partners. Some trends that we identified during that project were the importance of scalable solutions for analyzing large amounts of data (e.g., through scalable storage solutions as well as cloud, multicore and GPU computing) and the rise in machine learning and AI in many application areas (e.g., image and document analysis, telecommunication, social media, anomaly detection and decision support systems). Associated with the BigData@BTH project, we had a reference group with representatives from both the industry and international academia.

We also obtained input on Big Data trends from our international network. We have collaborations, projects and joint studies with many European universities, such as the Hasso-Plattner Institute in Potsdam, Germany, Tor Vergata University of Rome and Sapienza University of Rome (both in Italy), University of Sofia, Bulgaria and KTO Karatay University, Turkey. These collaborations provide important input from an international perspective.

2.1.2. Systematically Reviewing the Recent Literature on Big Data

Systematically reviewing the recent literature is another important approach to identifying the trends, enabling technologies and application areas for Big Data. As part of this, we edited a special issue in the journal Big Data Research in 2021 [3]. The focus of the special issue was “Big Data in Industrial and Commercial Applications”. The submitted papers gave clear indications of which areas researchers found interesting and relevant for Big Data. Based on the submitted papers, we identified and organized the papers into categories such as telecommunication, smart cities, document and image processing and social media. These categories and application areas are also represented in this article and provide a base for which areas to focus on initially.

2.2. Bibliometric Study

The bibliometric information was obtained from the Scopus (https://www.scopus.com/, accessed on 25 May 2022) database. We started by conducting a search based on the keyword “Big Data” (written as {big data} in Scopus) in the title, keywords or abstract (written as TITLE-ABS-KEY in Scopus) for all documents and for the years 2012–2021. Let the name BigData2012–21 denote the set of documents found by this search. The number of publications for each year in BigData2012–21 was then stored in a table and plotted.
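The search described above can be written as a single advanced-search string. The following Python sketch is illustrative only: the article specifies the `{big data}` phrase and the `TITLE-ABS-KEY` field, whereas the `PUBYEAR` bounds are one assumed way of expressing the 2012–2021 window, and the helper name `base_query` is hypothetical.

```python
def base_query(first_year: int = 2012, last_year: int = 2021) -> str:
    """Build an advanced-search string for the BigData2012-21 set."""
    # {big data} requests the phrase as written; TITLE-ABS-KEY searches
    # the title, abstract and keywords, as described in the text.
    phrase = "TITLE-ABS-KEY({big data})"
    # Assumption: the year window is expressed with exclusive PUBYEAR bounds.
    years = f"PUBYEAR > {first_year - 1} AND PUBYEAR < {last_year + 1}"
    return f"{phrase} AND {years}"


print(base_query())
```

Keeping the year window as parameters makes it easy to rerun the same search for a later period and compare the yearly counts.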

To quantify and visualize the trends for different application areas for Big Data, we did the following. Within BigData2012–21, we searched for each of the seven application areas identified in the previous step (see Section 2.1). The number of publications for each year for these seven application areas was then stored in a table and plotted.

To quantify and visualize the trends for different enabling technologies for Big Data, we applied a similar approach. Within BigData2012–21, we searched for each of the six enabling technologies identified in the previous step (see Section 2.1). The number of publications for each year for these six enabling technologies was then stored in a table and plotted.

To quantify and visualize the trends for Big Data research in different geographical regions, we looked at BigData2012–21 and found that two countries stood out compared with the rest of the world: the USA and China. Therefore, the numbers of publications per year for the USA, China and Others (the rest of the world) for the years 2012–2021 were stored in a table and plotted.
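The "search within BigData2012–21" steps can be sketched by AND-ing the base query with one additional term. This is a sketch under stated assumptions, not the authors' exact procedure: the article does not list the search strings used for each area or technology, so the terms passed in below are placeholders; `AFFILCOUNTRY` is assumed to be the field used for the regional split; and the "Others" set is assumed to exclude any document with a USA or China affiliation.

```python
# Assumed form of the base query for the BigData2012-21 set.
BASE = "TITLE-ABS-KEY({big data}) AND PUBYEAR > 2011 AND PUBYEAR < 2022"


def within_base(term: str) -> str:
    """Restrict the base set to documents that also match `term`."""
    return f"{BASE} AND TITLE-ABS-KEY({term})"


def by_country(country: str) -> str:
    """Restrict the base set to documents with an affiliation in `country`."""
    return f"{BASE} AND AFFILCOUNTRY({country})"


def rest_of_world(countries: list[str]) -> str:
    """Restrict the base set to documents with none of the given countries."""
    excluded = " OR ".join(f"AFFILCOUNTRY({c})" for c in countries)
    return f"{BASE} AND NOT ({excluded})"


# Placeholder terms; the exact strings used in the study are not given.
print(within_base("manufacturing"))
print(by_country("china"))
print(rest_of_world(["united states", "china"]))
```

Each query would be run once, and the per-year document counts would then be tabulated and plotted as described above.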

3. Application Areas for Big Data

Big Data is becoming increasingly important in many industrial and commercial application areas [4,5]. One example of this is the telecommunication domain [6,7]. Today, phones in a mobile network continuously generate Call Detail Records (CDRs) and Internet Protocol Detailed Records (IPDRs). These huge amounts of data can be used to create value for the telecom operators in different ways. Xia et al. showed that Big Data analytics can be used for churn prediction in telecommunication networks [8]. In [9,10], Sidorova et al. showed that this kind of mobility data makes it possible to predict and balance the load in telecommunication networks. Since this is a Big Data application, one not only needs to find methods that can extract the business value from the data, but it is equally important to have proper database systems for handling large amounts of mobility data in an efficient way. This aspect was explored by Niyizamwiyitira and Lundberg [11]. Two conclusions from that and similar studies were that the Cassandra NoSQL database is very efficient in many cases, and many providers of industrial and commercial Big Data applications use virtualized cloud systems for storing and processing huge amounts of data in an efficient and scalable way. The performance characteristics of cloud-based storage and Cassandra in virtualized environments were investigated by Shirinbab et al. [12,13]. In [14], Souza et al. discussed the fact that context-aware mobile applications are emerging as a relevant technology to improve user satisfaction in telecommunication networks. That paper emphasizes the importance of developing methods that help us find value in huge amounts of data and at the same time provide ways of handling the data in an efficient way.

Another industrial and commercial application area where Big Data plays an important role is manufacturing [15,16]. In [17], O’Donovan et al. presented a systematic literature mapping of different areas in manufacturing where Big Data analytics have been applied. The authors identified the following eight areas: process and planning; enterprise; maintenance and diagnosis; supply chain; transport and logistics; environment, health and safety; product design; and quality. The study by O’Donovan et al. lists these eight areas in decreasing order of the amount of research being performed (i.e., the largest amount of research was conducted in the process and planning area, and the smallest amount of research was in the area of quality). The pharmaceutical industry is heavily regulated, and it is therefore important that production lines are auditable. In [18], Leal et al. suggested an architecture for smart pharmaceutical manufacturing using blockchain properties and smart contracts to ensure data authenticity, transparency and immutability. In [19], Gupta and Goyal performed a review of the existing research on Big Data in manufacturing. In their paper, they identified 16 barriers that need to be addressed before the manufacturing industry can fully benefit from Big Data analytics. The most critical barrier is the lack of commitment from top management.

Big Data has been used in the application area of smart cities [20]. Smart cities typically include huge numbers of different sensors and Internet of Things devices, which continuously produce large amounts of data regarding human behavior and mobility in a smart city [21]. By using Big Data analytics on such data, one can improve sustainability and the quality of life [22]. One way of handling Big Data efficiently in smart cities is to process the data close to the sources using edge or fog computing [23]. In [24], Fugini et al. presented the approach to Big Data analytics developed in an industry academia project in Italy (the SIBDA project). The paper discusses the elements of Big Data tackled in the three different subareas in the project, namely document processing, mass e-mail applications and Internet of Things sensor networks. The paper discusses the dual challenges in industrial and commercial Big Data systems: one must develop analytics that can help us find value in huge amounts of data and, at the same time, provide ways of handling large amounts of data in an efficient way. In [25], Koulali et al. discussed a smart city scenario where citizens take active parts in improving the overall quality of life by taking pictures and videos of different infrastructure problems when they encounter them in their daily lives. These images and videos are uploaded using smartphones, thus allowing city authorities to make appropriate incident responses. This paper proposes a benchmark of machine learning algorithms for image classification evaluated on a dataset of captured images by citizens that cover problems related to water and electricity distribution. The paper identifies the need for Big Data analytics that can help us find value in huge amounts of data and, at the same time, provide ways of handling large amounts of data in an efficient way.

Image processing is an important industrial and commercial application area from a Big Data analytics perspective. One reason for this is that many image processing, analysis, object detection and classification tasks rely on neural networks, mainly deep learning and convolutional neural networks (CNNs) [26,27,28,29]. These network models can be trained to reach high levels of accuracy [30,31] and have had a significant impact on our daily lives (e.g., in self-driving cars). However, these models are often very large (sometimes hundreds of millions of parameters) and require enormous amounts of data to train. Furthermore, large models require powerful hardware to train the models, and the training times are often very long. Providing labeled data to train deep learning models is a daunting task. For example, the ImageNet database contains more than a million images in 1000 different classes [32]. Although there are several open datasets (e.g., the UC Irvine Machine Learning Repository), many of these datasets are relatively small and not feasible for training deep learning models. Therefore, large, open labeled datasets are important input to the research community. The recent explosion of online data, video and images available through, for example, social media and streaming services calls for efficient processing pipelines for these scenarios as well as efficient labeling of such data for model training purposes.

Document analysis is an important subdomain of image processing [33]. With an increasing number of digital documents, new scalable analysis methods are necessary [34]. Furthermore, new benchmarks and datasets [35,36] are also important to develop in order to properly evaluate the proposed methods. In particular, historical handwritten documents pose a number of challenges, such as image enhancement and binarization, layout analysis, segmentation and character recognition [34,37,38,39,40].

In [41], Gani et al. presented a survey of the role of Big Data in social media. In [42], Jiang and Fu discussed the relation between Big Data, ethics and the need for personal integrity in Chinese social media. They concluded that ethical aspects are not taken into consideration in the way that a human-centric approach would demand. In [43], Yang et al. analyzed how Big Data obtained from social media can make it possible to detect problems related to adverse drug reactions. Other researchers have studied how social media Big Data can be used to prevent drug abuse and addiction problems [44]. In [45], Arrigo et al. studied users’ preferences, stated on a social media platform, in order to aid businesses in making their marketing communication decisions.

Agriculture and forestry are two related fields where Big Data plays an increasingly important role. In [46], Rossit et al. discussed opportunities and challenges related to the fact that modern forest harvesters can collect large amounts of data. Zou et al. wrote a survey of Big Data for smart forestry [47]. In [48], Osinga et al. summarized the experiences of Big Data in precision agriculture in 12 use cases in a Horizon 2020 project. The use of Big Data in animal agriculture is discussed by Morota et al. in [49]. Kamilaris et al. wrote a review on the practice of Big Data in agriculture [50].

In [51], Hasan et al. presented a survey of Big Data in finance. The editor of a recent special issue on Big Data in finance performed a bibliometric review that showed that the interest for Big Data is increasing rapidly in the financial sector [52]. In [53], Goldstein et al. presented an overview of the papers submitted to another special issue on Big Data in finance. Two conclusions drawn from that special issue were that (1) more research is needed on how the use of Big Data should be regulated in the financial sector, and (2) future research on Big Data in finance may involve scholars from fields other than finance (e.g., scholars from computer science and mathematics). In [54], Cockcroft and Russell identified subareas related to Big Data in finance that need further research. The under-researched subareas identified were privacy and security, data visualization and predictive analytics, data management and data quality.

Big Data analytics are also used in fields other than the ones discussed in this section. For example, in [55], Alani et al. collected 10 papers that describe the experiences of Big Data analytics in different application areas (e.g., in education, healthcare and environmental contexts).

We analyzed the keywords used in the papers referenced in this section and saw some interesting patterns. Aside from the expected keywords related to Big Data and the specific application areas discussed, the most common keyword was machine learning. Machine learning (or some variation of that, including deep learning) was used as a keyword in almost one third of the papers referenced in this section. Another interesting observation was that the word smart was used in a number of contexts, including not only smart cities but also smart contracts, smart manufacturing, smart environments, smart grids, smart transport, smart healthcare, smart communities and smart farming. It seems that the word smart is used to indicate that one has been able to create value through Big Data analysis and that different forms of machine learning are the most important ways of extracting such value from large datasets.
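A keyword analysis of this kind can be sketched in a few lines of Python. The example below is a minimal illustration, not the procedure used for this article: the function name `keyword_shares` is hypothetical, and the keyword lists are made-up placeholders rather than the keywords of the cited papers.

```python
from collections import Counter


def keyword_shares(papers: list[list[str]]) -> dict[str, float]:
    """Return, for each keyword, the fraction of papers that list it."""
    counts: Counter[str] = Counter()
    for keywords in papers:
        # Normalise case and count each keyword at most once per paper.
        counts.update({k.strip().lower() for k in keywords})
    return {k: n / len(papers) for k, n in counts.items()}


# Placeholder keyword lists, NOT taken from the cited papers.
papers = [
    ["Big Data", "Machine learning", "Smart cities"],
    ["big data", "Churn prediction"],
    ["Big Data", "machine learning", "Smart manufacturing"],
]

shares = keyword_shares(papers)
# Keywords that contain the word "smart", in alphabetical order.
smart_terms = sorted(k for k in shares if "smart" in k.split())
print(shares["machine learning"], smart_terms)
```

Counting each keyword once per paper gives the share of papers using it, which is the quantity behind a statement such as "almost one third of the papers".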

4. Enabling Technologies for Big Data

As discussed previously, handling the large amounts of data that all application areas of Big Data require is a challenge. As a result of this, considerable research has been directed toward high-performance computing and supercomputing techniques for Big Data [56]. In [57], Mirtaheri and Grandinetti discussed the importance of optimized load balancing in high-performance computing for Big Data analytics. The importance of load balancing and optimization for Big Data application was also identified by Kumar and Kumar Jha [58].

One way to improve the performance of applications using Big Data is to use GPUs or other forms of parallel and distributed processing. In [59], Chen et al. presented an architecture that enables Apache Flink (https://flink.apache.org/, accessed on 5 August 2022) to benefit from the massive parallel processing capacity of modern GPUs. In [60], Jurczuk et al. presented another system that uses GPUs for accelerating the performance of Big Data applications. Ahmad et al. suggested a system that improves the parallel processing performance of Big Data applications using Apache Hadoop MapReduce [61]. In [62], Dolev et al. presented a survey of geographically distributed Big Data processing using MapReduce and Apache Spark (https://spark.apache.org/, accessed on 5 August 2022). In [63], Wang et al. proposed an optimization algorithm that improves the computational performance when analyzing Earth system models on multi-core clusters. Such models use massive amounts of data and are used for weather forecasting. In [64], Chen et al. discussed how massive parallel processing can be used for providing high performance in applications that process Big Data for describing brain activities and functions (e.g., EEG data). Finally, Zhang et al. conducted a survey of parallel processing systems for Big Data [65].

When we discussed the telecommunication domain, we showed that efficient storage systems and database systems for Big Data are very important. In [66], Xu et al. defined a scheme for optimizing performance by placing data with high I/O cost in fast SSD storage. In [67], Lee et al. improved the locality of network and storage I/O operations on many-core systems running Big Data applications using Apache Hadoop MapReduce. In [68], Lu et al. discussed the importance of proper parameter settings in high-performance database systems for Big Data. In [69], Zhang et al. defined a new benchmark and a new set of tools for benchmarking database systems for Big Data applications. In [70], Bauer et al. described the implementation of a data lake, where a company can store their raw data in such a way that it could be governed by one set of policies but processed by multiple teams using different tools.

There are many publications that discuss how cloud computing can be used as an enabling technology for Big Data applications, including two conference series on the subject [71,72]. In [73], Aceto et al. discussed how cloud computing and Big Data can be used as enablers for the Fourth Industrial Revolution in healthcare. In [74], Hashem et al. discussed the relation between Big Data and cloud computing, including research challenges. Zbakh et al. were the editors for a special issue on cloud computing and Big Data [75]. The manuscripts of that special issue focus on job scheduling, resource optimization, privacy and security and performance evaluation.

A recent trend is that the processing of Internet of Things (IoT) Big Data is performed closer to where the data originate. This is called edge computing or fog computing [76] (as opposed to cloud computing, where the processing is performed in the cloud). In [77], Sanchez-Gallegos et al. discussed performance aspects related to setting up a software pipeline for continuous delivery of Big Data from the edge to the cloud. In [78], Barbik et al. proposed a three-tier secure framework for efficient management of health data using fog devices. In [79], Du et al. discussed security and the challenges related to handling Big Data and training machine learning models in edge computing environments. In [80], Lai et al. used long short-term memory (LSTM) and edge computing to recognize different types of electrical equipment in the Industrial Internet of Things (IIoT).

Machine learning in general, and deep learning in particular, has become an important enabling technology in many Big Data application areas in recent years [81]. In [82], Hossain and Muhammad proposed an emotion recognition system using a deep learning approach from emotional Big Data. In [83], Sohangir et al. investigated how Big Data and deep learning can be used for analyzing the stock market. Dekhtiar et al. provided an overview of the use of deep learning on Big Data in manufacturing applications [84]. In [85], Dargazany et al. provided a review of current research in the intersection between deep learning, wearable IoT and Big Data. An overview of the use of deep learning in different Big Data application areas is provided by Khan et al. [86].

In [87], Sakr provided a survey of enabling technologies for Big Data processing. That survey discussed most of the major systems used for Big Data processing, including Hadoop and Spark. In [88], Misale et al. presented a C++ interface called Pipeline Composition (PiCo). PiCo can obtain higher performance than Spark and Flink.

We also analyzed the keywords used in the papers referenced in this section. Most of the keywords were, as could be expected, related to either Big Data or to the enabling technologies discussed. However, for the references in this section as well, it was clear that the word smart was used to indicate the creation of value in an area (e.g., smart health, smart edges and smart grid). An additional observation was that the names of specific technologies such as MapReduce, Hadoop and Spark were relatively often used as keywords for the papers referenced in this section.

5. Trends in Big Data Research

Table 1 and Figure 1 show the number of Big Data research publications per year for the years 2012–2021 (see Section 2.2 for details on how the search was performed). It is clear that the research interest increased significantly during the last decade. However, the number of publications seemed to be rather stable during the last three years. The search was performed on 25 May 2022. Due to delayed updates of the Scopus database, the final number of publications for 2021 will probably be a bit higher than what is shown in Table 1.
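To put the overall increase into perspective, the average yearly growth can be estimated from the two endpoint counts reported in the conclusions (663 publications in 2012 and 19,758 in 2021). The sketch below computes a compound annual growth rate; the function name `cagr` is hypothetical, and the calculation assumes a constant growth factor between the endpoints, which is a simplification.

```python
def cagr(first: float, last: float, periods: int) -> float:
    """Compound annual growth rate between two values `periods` years apart."""
    return (last / first) ** (1 / periods) - 1


# Endpoint counts as reported in the conclusions of this article.
growth = cagr(663, 19_758, 2021 - 2012)
print(f"{growth:.1%}")
```

The result is on the order of 46% per year. This average hides the shape of the curve: most of the growth occurred early in the decade, and the counts were rather stable during the last three years.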

Table 1. The number of research publications per year in the area of Big Data.

Figure 1. The number of research publications per year in the area of Big Data.

Table 2, along with Figure 2 and Figure 3, shows the number of research publications per year for the seven application areas for Big Data that we identified. Figure 2 and Figure 3 show that the research publication totals for the Big Data application areas manufacturing, finance, agriculture or forestry and smart cities are increasing, whereas the research publication totals for the Big Data application areas of social media, image processing and telecommunication are decreasing.

Table 2. The number of research publications per year for the seven application areas for Big Data that we have identified.

Figure 2. The number of research publications per year for the Big Data application areas manufacturing, social media and finance.

Figure 3. The number of research publications per year for the Big Data application areas of telecommunication, smart cities, image processing and agriculture or forestry.

Table 3, along with Figure 4 and Figure 5, shows the number of research publications per year for the six enabling technology areas for Big Data that we identified. Figure 4 and Figure 5 show that the research publication totals for the Big Data-enabling technologies of deep learning and edge or fog computing are increasing, whereas the research publication totals for the other Big Data-enabling technologies are decreasing or at least seem to have passed their peak values.
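Statements such as "increasing", "decreasing" or "passed their peak values" can be made reproducible by computing a peak year and a recent slope from the yearly counts. The Python sketch below shows one assumed way of doing this; it is not the procedure used in this article, the function names are hypothetical, and the counts are made-up placeholders rather than values from Table 2 or Table 3.

```python
def peak_year(counts: dict[int, int]) -> int:
    """Return the year with the highest publication count."""
    return max(counts, key=counts.get)


def recent_slope(counts: dict[int, int], window: int = 3) -> float:
    """Least-squares slope (publications per year) over the last `window` years."""
    years = sorted(counts)[-window:]
    mean_x = sum(years) / len(years)
    mean_y = sum(counts[y] for y in years) / len(years)
    num = sum((y - mean_x) * (counts[y] - mean_y) for y in years)
    den = sum((y - mean_x) ** 2 for y in years)
    return num / den


# Placeholder counts, NOT the values from Table 2 or Table 3.
example = {2017: 100, 2018: 140, 2019: 160, 2020: 150, 2021: 130}
print(peak_year(example), recent_slope(example))
```

A negative recent slope together with a peak year before the end of the series corresponds to a technology that has passed its peak, whereas a positive slope with the peak in the final year corresponds to one that is still increasing.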

Table 3. The number of research publications per year for the six enabling technologies for Big Data that we identified.

Figure 4. The number of research publications per year for the Big Data-enabling technologies deep learning and cloud computing.

Figure 5. The number of research publications per year for the Big Data-enabling technologies of high-performance computing or supercomputing; parallel processing, distributed processing or GPU; storage systems, database systems or data lakes and edge computing or fog computing.

Table 4 and Figure 6 show the number of research publications in Big Data per year for different geographic regions. Figure 6 shows that two countries dominate: the USA and China. The figure also shows that the number of research publications in Big Data from China is increasing, whereas the number of research publications in Big Data from the USA and the rest of the world is decreasing. Today, China is clearly the part of the world where most research in the area of Big Data is conducted.

Table 4. The number of research publications in Big Data per year for different geographic regions.

Figure 6. The number of research publications in Big Data per year for different geographic regions.

6. Discussion

When we looked at the literature, we found two concepts that are clearly related to Big Data: AI and the IoT. In Section 2.2, we defined a set called BigData2012–21. Within this set, we searched for AI and the IoT. The number of publications for each year for AI and the IoT are shown in Table 5, and the corresponding values are plotted in Figure 7.

Table 5. The number of research publications related to Big Data per year for AI and IoT.

Figure 7. The number of research publications related to Big Data per year for AI and IoT.

Our experience reading a significant number of papers in the area of Big Data indicates that the connection between Big Data and AI is, in many cases, taken for granted. By comparing the curve for AI in Figure 7 with Figure 1, we see that the bibliometric data seem to support this, since the two curves are very similar and almost every second paper in the Big Data area also mentioned AI.

The IoT is a concept that is related to several application areas of Big Data, particularly the application areas of manufacturing, smart cities and agriculture or forestry. By comparing the curve for the IoT in Figure 7 with the curve for manufacturing in Figure 2 and the curves for smart cities and agriculture or forestry in Figure 3, we see that the growing interest for Big Data in manufacturing, smart cities and agriculture or forestry seems to be related to the trend toward the IoT in Big Data. We also believe that the trend toward the IoT is the main reason why we saw an increasing interest in fog or edge computing in Big Data (see Figure 5).

7. Conclusions

The number of research papers in Big Data has increased significantly during the last decade (from 663 papers in 2012 to 19,758 papers in 2021). However, over the last three years, the number of research papers has been relatively constant. Through extensive contacts with experts from the industry and the research community for Big Data for more than 8 years, and by systematically reviewing the recent literature on Big Data, we identified seven main application areas in Big Data: social media, manufacturing, finance, image processing, smart cities, agriculture or forestry and telecommunications. Manufacturing and agriculture or forestry are the two application areas with the fastest growth in terms of research papers. We also identified six enabling technologies for Big Data: deep learning; cloud computing; high-performance computing or supercomputing; parallel processing, distributed processing or GPU; storage systems, database systems or data lakes; and edge or fog computing. Our bibliometric study shows that deep learning and edge or fog computing are the enabling technologies that are increasing the most.

When analyzing the geographical distribution of the research on Big Data, it is clear that two countries are dominating: USA and China. It is also clear that the research interest for Big Data in China is increasing much faster than it is in the USA and in the rest of the world. In fact, the number of publications in Big Data is decreasing in the USA and in the rest of the world.

Our literature survey shows that the connection between Big Data and AI is, in many cases, taken for granted. The bibliometric data support this observation (e.g., almost every second paper in the Big Data area also mentions AI). The Internet of Things (IoT) is an area that is growing rapidly within Big Data research. Our literature survey shows that the IoT is a concept that is related to several application areas of Big Data, particularly manufacturing, smart cities and agriculture or forestry. The growing interest in Big Data in manufacturing, smart cities and agriculture or forestry seems to be related to the trend toward the IoT in Big Data. We also believe that the trend toward the IoT is the main reason why we see an increasing interest in fog or edge computing in Big Data.
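The statement that almost every second Big Data paper also mentions AI can be reproduced from Tables 1 and 5. The sketch below divides the yearly AI and IoT counts by the yearly total; it is only an illustration, and the variable names are not taken from the study. Since one paper can mention both AI and the IoT, the two shares are not mutually exclusive.

```python
# Publication counts per year, 2012-2021, copied from Tables 1 and 5.
YEARS = list(range(2012, 2022))

total = [663, 2766, 4817, 8751, 12389, 13679, 16020, 19366, 19693, 19758]
ai = [192, 1068, 1804, 3580, 4930, 5497, 6830, 8605, 9436, 9353]
iot = [13, 91, 270, 727, 1307, 1924, 2999, 4026, 4442, 5341]

# Share of all Big Data publications in a year that also relate to AI / IoT.
ai_share = {year: a / t for year, a, t in zip(YEARS, ai, total)}
iot_share = {year: i / t for year, i, t in zip(YEARS, iot, total)}

for year in YEARS:
    print(f"{year}: AI {ai_share[year]:.1%}, IoT {iot_share[year]:.1%}")
```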

An additional conclusion from our literature review is that there is a dual challenge in industrial and commercial Big Data systems: one must develop analytics that can help us find value in huge amounts of data and, at the same time, provide ways of handling large amounts of data in an efficient way. Most Big Data applications will only be useful when both of these challenges are successfully addressed.

The data presented in this paper provide a good overview of the current research trends in Big Data, and this kind of information is very useful when setting strategic agendas for Big Data research.

Author Contributions

Conceptualization, L.L.; methodology, L.L. and H.G.; validation, L.L.; investigation, L.L. and H.G.; data curation, L.L.; writing—original draft preparation, L.L. and H.G.; writing—review and editing, L.L. and H.G.; visualization, L.L.; funding acquisition, H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partly funded by the Excellence Center at Linköping—Lund in Information Technology (ELLIIT) project “GPAI—General Purpose AI Computing”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The bibliometric data presented in this study were obtained from the Scopus reference database.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figure 1. The number of research publications per year in the area of Big Data.

Figure 2. The number of research publications per year for the Big Data application areas manufacturing, social media and finance.

Figure 3. The number of research publications per year for the Big Data application areas of telecommunication, smart cities, image processing and agriculture or forestry.

Figure 4. The number of research publications per year for the Big Data-enabling technologies deep learning and cloud computing.

Figure 5. The number of research publications per year for the Big Data-enabling technologies of high-performance computing or supercomputing; parallel processing, distributed processing or GPU; storage systems, database systems or data lakes and edge computing or fog computing.

Figure 6. The number of research publications in Big Data per year for different geographic regions.

Figure 7. The number of research publications related to Big Data per year for AI and IoT.

Table 1. The number of research publications per year in the area of Big Data.

Year	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021
No. of Publications	663	2766	4817	8751	12,389	13,679	16,020	19,366	19,693	19,758

Table 2. The number of research publications per year for the seven application areas for Big Data that we have identified.

Application Area \ Year	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021
Telecommunication	29	85	169	370	658	550	737	720	675	628
Manufacturing	11	63	180	384	546	886	1180	1540	1912	2323
Smart Cities	8	28	69	166	327	476	747	1014	1096	1281
Image Processing	6	92	154	452	606	714	903	1147	1218	1083
Social Media	46	232	402	844	1223	1367	1548	2063	2040	2008
Agriculture or Forestry	7	47	97	188	341	396	615	882	1076	1359
Finance	10	54	111	265	390	531	672	987	1313	1588

Table 3. The number of research publications per year for the six enabling technologies for Big Data that we identified.

Enabling Technology \ Year	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021
High-Performance Computing or Supercomputing	69	203	382	753	1009	1012	1105	1378	1051	1008
Deep Learning	0	7	29	143	388	835	1674	3195	3958	4434
Cloud Computing	147	635	1078	2020	2880	3149	3594	4223	3644	3704
Parallel Processing, Distributed Processing or GPU	81	289	480	858	1373	1187	1239	1685	1221	867
Storage Systems or Database Systems or Data Lakes	115	411	639	1017	1214	1312	1313	1452	1187	1035
Edge Computing or Fog Computing	0	3	5	25	81	167	501	744	957	1123

Table 4. The number of research publications in Big Data per year for different geographic regions.

Region \ Year	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021
China	69	588	962	1970	3340	3862	4960	5831	8330	9263
USA	268	921	1523	2542	3063	3052	3139	3656	2917	2227
Other	326	1257	2332	4239	5986	6765	7921	9879	8446	8268

Table 5. The number of research publications related to Big Data per year for AI and IoT.

Concept \ Year	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021
Artificial Intelligence or AI	192	1068	1804	3580	4930	5497	6830	8605	9436	9353
Internet of Things or IoT	13	91	270	727	1307	1924	2999	4026	4442	5341

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
