Research Trends, Enabling Technologies and Application Areas for Big Data

Abstract: The availability of large amounts of data in combination with Big Data analytics has transformed many application domains. In this paper, we provide insights into how the area has developed in the last decade. First, we identify seven major application areas and six groups of important enabling technologies for Big Data applications and systems. Then, using bibliometrics and an extensive literature review of more than 80 papers, we identify the most important research trends in these areas. In addition, our bibliometric analysis also includes trends in different geographical regions. Our results indicate that manufacturing and agriculture or forestry are the two application areas with the fastest growth. Furthermore, our bibliometric study shows that deep learning and edge or fog computing are the enabling technologies increasing the most. We believe that the data presented in this paper provide a good overview of the current research trends in Big Data and that this kind of information is very useful when setting strategic agendas for Big Data research.


Introduction
The amount of data collected and managed is increasing at a staggering pace in most applications, including almost all industrial and commercial areas. In May 2018, Forbes noted that 2.5 quintillion (10^18) bytes of data are produced every day [1], and the data production rate is increasing all the time. There are two main challenges associated with these enormous amounts of data: (1) we need to provide storage systems, database systems, processing platforms, etc. to technically handle the data in a fast, cost-effective and secure way, and (2) we need to develop analysis methods (e.g., AI and machine learning) that can automatically find useful trends and patterns in the data so that we can produce business as well as user value. The need for addressing both these aspects was discussed in a recent survey by Roh et al. [2].
The exploding amount of data in many domains and applications makes it crucial that the advancements in Big Data research find their way from the research labs to practical applications and that these research results can be successfully integrated into industrial and commercial processes and systems. In order to formulate the most important topics for further research in Big Data, we need to identify which are the most important application areas and which of these application areas grow the fastest. For the same reason, we also need to identify the most important enabling technologies for Big Data and which of these enabling technologies develop the fastest. To further promote relevant and well-informed research in Big Data, it is useful to understand where in the world most research is currently being conducted and what the trends are.
In this paper, we analyze more than 80 recent research papers, surveys and books in the area of Big Data. Based on this analysis and almost a decade of close contact with leading practitioners in the field, we have identified seven important application areas and six important groups of enabling technologies for Big Data. For these seven application areas and the six enabling technologies, we analyze the bibliometric trends during the last decade. Our bibliometric analysis also includes Big Data research trends in different geographical regions.

Methodology
The methodology in this study consists of two steps:
1. Identifying the major application areas and the major enabling technologies for Big Data;
2. Using bibliometrics to quantify how the research interest in each of the identified application areas and enabling technologies for Big Data has developed during the last decade. In this step, we also quantify how the total number of research publications in Big Data from different geographical regions has developed during the last decade.
We describe these two steps in Sections 2.1 and 2.2.

Identifying Important Application Areas and Enabling Technologies in Big Data
We used the following two main approaches for identifying important application areas and enabling technologies in Big Data:

1. Extensive contacts with experts from the industry and research community in the Big Data domain for more than 8 years;
2. Systematically reviewing the recent literature in Big Data.
Using these two approaches, we identified seven important application areas and six important groups of enabling technologies for Big Data. We describe the two approaches below.

Extensive Contacts with Experts from the Industry and Research Community
Our research projects are mostly executed in collaboration with industry and society. We have much experience with this type of collaboration, and through it we gain both input on important industrial trends and knowledge about the state of practice in the industry. For example, during 2014-2020, we conducted the project BigData@BTH (https://www.bth.se/bigdata, accessed on 5 August 2022), "Scalable resource-efficient systems for big data analytics", where more than 15 researchers collaborated with 11 company partners. Some trends that we identified during that project were the importance of scalable solutions for analyzing large amounts of data (e.g., through scalable storage solutions as well as cloud, multicore and GPU computing) and the rise of machine learning and AI in many application areas (e.g., image and document analysis, telecommunication, social media, anomaly detection and decision support systems). Associated with the BigData@BTH project, we had a reference group with representatives from both the industry and international academia.
We also attained input on Big Data trends from our international network. We have collaborations, projects and joint studies with many European universities, such as the Hasso-Plattner Institute in Potsdam, Germany, Tor Vergata University of Rome and Sapienza University of Rome (both in Italy), University of Sofia, Bulgaria and KTO Karatay University, Turkey. These international collaborations provide important input from an international perspective.

Systematically Reviewing the Recent Literature on Big Data
Systematically reviewing the recent literature is another important approach to identifying the trends, enabling technologies and application areas for Big Data. As part of this, we edited a special issue in the journal Big Data Research in 2021 [3]. The focus of the special issue was "Big Data in Industrial and Commercial Applications". The submitted papers gave clear indications of which areas researchers found interesting and relevant for Big Data. Based on the submitted papers, we identified and organized the papers into categories such as telecommunication, smart cities, document and image processing and social media. These categories and application areas are also represented in this article and provide a base for which areas to focus on initially.

Bibliometric Study
The bibliometric information was obtained from the Scopus (https://www.scopus.com/, accessed on 25 May 2022) database. We started by conducting a search based on the keyword "Big Data" (written as {big data} in Scopus) in the title, keywords or abstract (written as TITLE-ABS-KEY in Scopus) for all documents and for the years 2012-2021. Let the name BigData2012-21 denote the found set of documents from this search. The number of publications for each year in BigData2012-21 was then stored in a table and plotted.
To quantify and visualize the trends for different application areas for Big Data, we did the following. Within BigData2012-21, we searched for each of the seven application areas identified in the previous step (see Section 2.1). The number of publications for each year for these seven application areas was then stored in a table and plotted.
To quantify and visualize the trends for different enabling technologies for Big Data, we applied a similar approach. Within BigData2012-21, we searched for each of the six enabling technologies identified in the previous step (see Section 2.1). The number of publications for each year for these six enabling technologies was then stored in a table and plotted.
To quantify and visualize the trends for Big Data research in different geographical regions, we looked at BigData2012-21 and found that two countries stood out compared with the rest of the world: the USA and China. Therefore, the numbers of publications per year for the USA, China and Others (the rest of the world) for the years 2012-2021 were stored in a table and plotted.
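The search procedure described above can be sketched as a small query builder. This is a minimal illustration, not the exact queries used in the study: only the `{big data}` phrase and the `TITLE-ABS-KEY` field come from the text. The `PUBYEAR` and `AFFILCOUNTRY` field codes are assumptions about Scopus advanced-search syntax, and the topic strings and the `build_query` helper are hypothetical.

```python
# Base search from the text: "Big Data" in title, abstract or keywords.
BASE = "TITLE-ABS-KEY({big data})"


def build_query(year, topic=None, country=None):
    """Build one per-year query string within the BigData2012-21 set.

    topic   -- hypothetical sub-search term (application area or technology)
    country -- hypothetical country filter (AFFILCOUNTRY is an assumed field code)
    """
    parts = [BASE]
    if topic is not None:
        parts.append("TITLE-ABS-KEY({" + topic + "})")
    if country is not None:
        parts.append("AFFILCOUNTRY(" + country + ")")
    # PUBYEAR is assumed to restrict the result to a single publication year.
    parts.append(f"PUBYEAR = {year}")
    return " AND ".join(parts)


YEARS = range(2012, 2022)  # 2012-2021 inclusive
TOPICS = ["manufacturing", "smart cities"]  # illustrative terms only

# One query per (topic, year); the hit counts would fill one table row per topic.
queries = {(t, y): build_query(y, topic=t) for t in TOPICS for y in YEARS}

print(build_query(2012))
print(queries[("smart cities", 2021)])
```

Each query string would be submitted to Scopus, and the returned hit count recorded in the corresponding table cell before plotting.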

Application Areas for Big Data
Big Data is becoming increasingly important in many industrial and commercial application areas [4,5]. One example of this is the telecommunication domain [6,7]. Today, phones in a mobile network continuously generate Call Detail Records (CDRs) and Internet Protocol Detail Records (IPDRs). These huge amounts of data can be used to create value for the telecom operators in different ways. Xia et al. showed that Big Data analytics can be used for churn prediction in telecommunication networks [8]. In [9,10], Sidorova et al. showed that this kind of mobility data makes it possible to predict and balance the load in telecommunication networks. Since this is a Big Data application, one not only needs to find methods that can extract the business value from the data, but it is equally important to have proper database systems for handling large amounts of mobility data in an efficient way. This aspect was explored by Niyizamwiyitira and Lundberg [11]. Two conclusions from that and similar studies were that the Cassandra NoSQL database is very efficient in many cases, and that many providers of industrial and commercial Big Data applications use virtualized cloud systems for storing and processing huge amounts of data in an efficient and scalable way. The performance characteristics of cloud-based storage and Cassandra in virtualized environments were investigated by Shirinbab et al. [12,13]. In [14], Souza et al. discussed the fact that context-aware mobile applications are emerging as a relevant technology to improve user satisfaction in telecommunication networks. That paper emphasizes the importance of developing methods that can help us find value in huge amounts of data and at the same time provide ways of handling the data in an efficient way.
Another industrial and commercial application area where Big Data plays an important role is manufacturing [15,16]. In [17], O'Donovan et al. presented a systematic literature mapping of different areas in manufacturing where Big Data analytics have been applied. The authors identified the following eight areas: process and planning; enterprise; maintenance and diagnosis; supply chain; transport and logistics; environment, health and safety; product design; and quality. The study by O'Donovan et al. lists these eight areas in decreasing order of the amount of research being performed (i.e., the largest amount of research was conducted in the process and planning area, and the smallest amount of research was in the area of quality). The pharmaceutical industry is heavily regulated, and it is therefore important that production lines are auditable. In [18], Leal et al. suggested an architecture for smart pharmaceutical manufacturing using blockchain properties and smart contracts to ensure data authenticity, transparency and immutability. In [19], Gupta and Goyal performed a review of the existing research on Big Data in manufacturing. In their paper, they identified 16 barriers that need to be addressed before the manufacturing industry can fully benefit from Big Data analytics. The most critical barrier is the lack of commitment from top management.
Big Data has also been used in the application area of smart cities [20]. Smart cities typically include huge numbers of different sensors and Internet of Things devices, which continuously produce large amounts of data regarding human behavior and mobility in a smart city [21]. By using Big Data analytics on such data, one can improve sustainability and the quality of life [22]. One way of handling Big Data efficiently in smart cities is to process the data close to the sources using edge or fog computing [23]. In [24], Fugini et al. presented the approach to Big Data analytics developed in an industry-academia project in Italy (the SIBDA project). The paper discusses the elements of Big Data tackled in the three different subareas in the project, namely document processing, mass e-mail applications and Internet of Things sensor networks. The paper discusses the dual challenges in industrial and commercial Big Data systems: one must develop analytics that can help us find value in huge amounts of data and, at the same time, provide ways of handling large amounts of data in an efficient way. In [25], Koulali et al. discussed a smart city scenario where citizens take active parts in improving the overall quality of life by taking pictures and videos of different infrastructure problems when they encounter them in their daily lives. These images and videos are uploaded using smartphones, thus allowing city authorities to make appropriate incident responses. That paper proposes a benchmark of machine learning algorithms for image classification, evaluated on a dataset of images captured by citizens that cover problems related to water and electricity distribution. It identifies the same dual need: analytics that can find value in huge amounts of data, together with ways of handling large amounts of data efficiently.
Image processing is an important industrial and commercial application area from a Big Data analytics perspective. One reason for this is that many image processing, analysis, object detection and classification tasks rely on neural networks, mainly deep learning and convolutional neural networks (CNNs) [26-29]. These network models can be trained to reach high levels of accuracy [30,31] and have had a significant impact on our daily lives (e.g., in self-driving cars). However, these models are often very large (sometimes hundreds of millions of parameters) and require enormous amounts of data to train. Furthermore, large models require powerful hardware for training, and the training times are often very long. Providing labeled data to train deep learning models is a daunting task. For example, the ImageNet database contains more than a million images in 1000 different classes [32]. Although there are several open datasets (e.g., the UC Irvine Machine Learning Repository), many of these datasets are relatively small and not suitable for training deep learning models. Therefore, large, open labeled datasets are important input to the research community. The recent explosion of online data, video and images available through, for example, social media and streaming services calls for efficient processing pipelines for these scenarios as well as efficient labeling of such data for model training purposes. Document analysis is an important subdomain of image processing [33]. With an increasing number of digital documents, new scalable analysis methods are necessary [34]. Furthermore, new benchmarks and datasets [35,36] are also important to develop in order to properly evaluate the proposed methods. In particular, historical handwritten documents pose a number of challenges, such as image enhancement and binarization, layout analysis, segmentation and character recognition [34,37-40].
In [41], Gani et al. presented a survey of the role of Big Data in social media. In [42], Jiang and Fu discussed the relation between Big Data, ethics and the need for personal integrity in Chinese social media. They concluded that ethical aspects are not taken into consideration in the way that a human-centric approach would demand. In [43], Yang et al. analyzed how Big Data obtained from social media can make it possible to detect problems related to adverse drug reactions. Other researchers have studied how social media Big Data can be used to prevent drug abuse and addiction problems [44]. In [45], Arrigo et al. studied users' preferences, stated on a social media platform, in order to aid businesses in making their marketing communication decisions.
Agriculture and forestry are two related fields where Big Data plays an increasingly important role. In [46], Rossit et al. discussed opportunities and challenges related to the fact that modern forest harvesters can collect large amounts of data. Zou et al. wrote a survey of Big Data for smart forestry [47]. In [48], Osinga et al. summarized the experiences of Big Data in precision agriculture in 12 use cases in a Horizon 2020 project. The use of Big Data in animal agriculture is discussed by Morota et al. in [49]. Kamilaris et al. wrote a review on the practice of Big Data in agriculture [50].
In [51], Hasan et al. presented a survey of Big Data in finance. The editor of a recent special issue on Big Data in finance performed a bibliometric review that showed that the interest for Big Data is increasing rapidly in the financial sector [52]. In [53], Goldstein et al. presented an overview of the papers submitted to another special issue on Big Data in finance. Two conclusions drawn from that special issue were that (1) more research is needed on how the use of Big Data should be regulated in the financial sector, and (2) future research on Big Data in finance may involve scholars from fields other than finance (e.g., scholars from computer science and mathematics). In [54], Cockcroft and Russell identified subareas related to Big Data in finance that need further research. The under-researched subareas identified were privacy and security, data visualization and predictive analytics, data management and data quality.
Big Data analytics are also used in fields other than the ones discussed in this section. For example, in [55], Alani et al. collected 10 papers that describe the experiences of Big Data analytics in different application areas (e.g., in education, healthcare and environmental contexts).
We analyzed the keywords used in the papers referenced in this section and saw some interesting patterns. Aside from the expected keywords related to Big Data and the specific application areas discussed, the most common keyword was machine learning. Machine learning (or some variation of that, including deep learning) was used as a keyword in almost one third of the papers referenced in this section. Another interesting observation was that the word smart was used in a number of contexts, including not only smart cities but also smart contracts, smart manufacturing, smart environments, smart grids, smart transport, smart healthcare, smart communities and smart farming. It seems that the word smart is used to indicate that one has been able to create value through Big Data analysis and that different forms of machine learning are the most important ways of extracting such value from large datasets.

Enabling Technologies for Big Data
As discussed previously, handling the large amounts of data that all application areas of Big Data require is a challenge. As a result of this, considerable research has been directed toward high-performance computing and supercomputing techniques for Big Data [56]. In [57], Mirtaheri and Grandinetti discussed the importance of optimized load balancing in high-performance computing for Big Data analytics. The importance of load balancing and optimization for Big Data applications was also identified by Kumar and Kumar Jha [58].
One way to improve the performance of applications using Big Data is to use GPUs or other forms of parallel and distributed processing. In [59], Chen et al. presented an architecture that enables Apache Flink (https://flink.apache.org/, accessed on 5 August 2022) to benefit from the massive parallel processing capacity of modern GPUs. In [60], Jurczuk et al. presented another system that uses GPUs for accelerating the performance of Big Data applications. Ahmad et al. suggested a system that improves the parallel processing performance of Big Data applications using Apache Hadoop MapReduce [61]. In [62], Dolev et al. presented a survey of geographically distributed Big Data processing using MapReduce and Apache Spark (https://spark.apache.org/, accessed on 5 August 2022). In [63], Wang et al. proposed an optimization algorithm that improves the computational performance when analyzing Earth system models on multi-core clusters. Such models use massive amounts of data and are used for weather forecasting. In [64], Chen et al. discussed how massive parallel processing can be used for providing high performance in applications that process Big Data for describing brain activities and functions (e.g., EEG data). Finally, Zhang et al. conducted a survey of parallel processing systems for Big Data [65].
When we discussed the telecommunication domain, we showed that efficient storage systems and database systems for Big Data are very important. In [70], Bauer et al. described the implementation of a data lake, where a company can store its raw data in such a way that it can be governed by one set of policies but processed by multiple teams using different tools.
There are many publications that discuss how cloud computing can be used as an enabling technology for Big Data applications, including two conference series on the subject [71,72]. In [73], Aceto et al. discussed how cloud computing and Big Data can be used as enablers for the Fourth Industrial Revolution in healthcare. In [74], Hashem et al. discussed the relation between Big Data and cloud computing, including research challenges. Zbakh et al. were the editors for a special issue on cloud computing and Big Data [75]. The manuscripts of that special issue focus on job scheduling, resource optimization, privacy and security and performance evaluation.
A recent trend is that the processing of Internet of Things (IoT) Big Data is performed closer to where the data originate. This is called edge computing or fog computing [76] (as opposed to cloud computing, where the processing is performed in the cloud). In [77], Sanchez-Gallegos et al. discussed performance aspects related to setting up a software pipeline for continuous delivery of Big Data from the edge to the cloud. In [78], Barbik et al. proposed a three-tier secure framework for efficient management of health data using fog devices. In [79], Du et al. discussed security and the challenges related to handling Big Data and training machine learning models in edge computing environments. In [80], Lai et al. used long short-term memory (LSTM) and edge computing to recognize different types of electrical equipment in the Industrial Internet of Things (IIoT).
Machine learning in general, and deep learning in particular, has become an important enabling technology in many Big Data application areas in recent years [81]. In [82], Hossain and Muhammad proposed an emotion recognition system using a deep learning approach from emotional Big Data. In [83], Sohangir et al. investigated how Big Data and deep learning can be used for analyzing the stock market. Dekhtiar et al. provided an overview of the use of deep learning on Big Data in manufacturing applications [84]. In [85], Dargazany et al. provided a review of current research in the intersection between deep learning, wearable IoT and Big Data. An overview of the use of deep learning in different Big Data application areas is provided by Khan et al. [86].
In [87], Sakr provided a survey of enabling technologies for Big Data processing. That survey discussed most of the major systems used for Big Data processing, including Hadoop and Spark. In [88], Misale et al. presented a C++ interface called Pipeline Composition (PiCo). PiCo can obtain higher performance than Spark and Flink.
We also analyzed the keywords used in the papers referenced in this section. Most of the keywords were, as could be expected, related to either Big Data or to the enabling technologies discussed. However, for the references in this section as well, it was clear that the word smart was used to indicate the creation of value in an area (e.g., smart health, smart edges and smart grid). An additional observation was that the names of specific technologies such as MapReduce, Hadoop and Spark were relatively often used as keywords for the papers referenced in this section.

Table 1 and Figure 1 show the number of Big Data research publications per year for the years 2012-2021 (see Section 2.2 for details on how the search was performed). It is clear that the research interest increased significantly during the last decade. However, the number of publications seemed to be rather stable during the last three years. The search was performed on 25 May 2022. Due to delayed updates of the Scopus database, the final number of publications for 2021 will probably be a bit higher than what is shown in Table 1.

Figures 2 and 3 show the number of research publications per year for the seven application areas for Big Data that we identified. They show that the research publication totals for the Big Data application areas of manufacturing, finance, agriculture or forestry and smart cities are increasing, whereas the research publication totals for the Big Data application areas of social media, image processing and telecommunication are decreasing.
Application Area\Year     2012  2013  2014  2015  2016  2017  2018  2019  2020  2021
Telecommunication           29    85   169   370   658   550   737   720   675   628
Manufacturing               11    63   180   384   546   886  1180  1540  1912  2323
Smart Cities                 8    28    69   166   327   476   747  1014  1096  1281
Image Processing             6    92   154   452   606   714   903  1147  1218  1083
Social Media                46   232   402   844  1223  1367  1548  2063  2040  2008
Agriculture or Forestry      7    47    97   188   341   396   615   882  1076  1359
Finance                     10    54   111   265   390   531   672   987  1313

Figure 6 shows the number of research publications in Big Data per year for different geographic regions. Two geographic regions dominate: the USA and China. The figure also shows that the number of research publications in Big Data from China is increasing, whereas the number of research publications in Big Data from the USA and the rest of the world is decreasing. Today, China is clearly the part of the world where most research in the area of Big Data is conducted.
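As a rough check on the application-area trends stated above, the yearly counts can be compared directly. The sketch below is illustrative only: the counts are copied from the application-area rows above, Finance is left out because its 2021 value is missing, and the choice of comparing 2021 against 2019 (the `recent_ratio` helper) is our own assumption rather than a method described in the paper.

```python
# Publications per year, 2012-2021, copied from the application-area rows above.
# Finance is omitted because its 2021 value is missing in the source.
YEARS = list(range(2012, 2022))
COUNTS = {
    "Telecommunication": [29, 85, 169, 370, 658, 550, 737, 720, 675, 628],
    "Manufacturing": [11, 63, 180, 384, 546, 886, 1180, 1540, 1912, 2323],
    "Smart Cities": [8, 28, 69, 166, 327, 476, 747, 1014, 1096, 1281],
    "Image Processing": [6, 92, 154, 452, 606, 714, 903, 1147, 1218, 1083],
    "Social Media": [46, 232, 402, 844, 1223, 1367, 1548, 2063, 2040, 2008],
    "Agriculture or Forestry": [7, 47, 97, 188, 341, 396, 615, 882, 1076, 1359],
}


def recent_ratio(area, start=2019, end=2021):
    """Ratio of the count in `end` to the count in `start` (assumed window)."""
    row = COUNTS[area]
    return row[YEARS.index(end)] / row[YEARS.index(start)]


ratios = {area: recent_ratio(area) for area in COUNTS}
ranked = sorted(ratios, key=ratios.get, reverse=True)

for area in ranked:
    trend = "increasing" if ratios[area] > 1 else "decreasing"
    print(f"{area:25s} {ratios[area]:.2f}  {trend}")
```

A ratio above 1 means more publications in 2021 than in 2019. Other windows or a fitted slope would be equally reasonable ways to summarize the same rows.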

Discussion
When we looked at the literature, we found two concepts that are clearly related to Big Data: AI and the IoT. In Section 2.2, we defined a set called BigData2012-21. Within this set, we searched for AI and the IoT. The number of publications for each year for AI and the IoT are shown in Table 5, and the corresponding values are plotted in Figure 7.

Concept\Year                    2012  2013  2014  2015  2016  2017  2018  2019  2020  2021
Artificial Intelligence or AI    192  1068  1804  3580  4930  5497  6830  8605  9436  9353
Internet of Things or IoT         13    91   270   727  1307  1924  2999  4026  4442  5341

Our experience reading a significant number of papers in the area of Big Data indicates that the connection between Big Data and AI is, in many cases, taken for granted. By comparing the curve for AI in Figure 7 with Figure 1, we see that the bibliometric data seem to support this, since the two curves are very similar and almost every second paper in the Big Data area also mentioned AI.
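The "almost every second paper" observation can be checked with simple arithmetic. The sketch below uses the AI and IoT counts from Table 5 together with the Big Data totals for 2012 and 2021 that are stated in the Conclusions (663 and 19,758). The variable names are our own, and totals for the intermediate years are not available here, so only the two end years are computed.

```python
# AI and IoT counts per year (2012-2021), copied from Table 5.
YEARS = list(range(2012, 2022))
AI = [192, 1068, 1804, 3580, 4930, 5497, 6830, 8605, 9436, 9353]
IOT = [13, 91, 270, 727, 1307, 1924, 2999, 4026, 4442, 5341]

# Total Big Data publications; only these two totals are stated in the text.
TOTAL = {2012: 663, 2021: 19758}


def share(counts, year):
    """Fraction of all Big Data publications in `year` that match the concept."""
    return counts[YEARS.index(year)] / TOTAL[year]


for year in sorted(TOTAL):
    print(f"{year}: AI share {share(AI, year):.1%}, IoT share {share(IOT, year):.1%}")
```

A share close to one half for AI in 2021 is what the phrase "almost every second paper" corresponds to.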
The IoT is a concept that is related to several application areas of Big Data, particularly the application areas of manufacturing, smart cities and agriculture or forestry. By comparing the curve for the IoT in Figure 7 with the curve for manufacturing in Figure 2 and the curves for smart cities and agriculture or forestry in Figure 3, we see that the growing interest for Big Data in manufacturing, smart cities and agriculture or forestry seems to be related to the trend toward the IoT in Big Data. We also believe that the trend toward the IoT is the main reason why we saw an increasing interest in fog or edge computing in Big Data (see Figure 5).

Conclusions
The number of research papers in Big Data has increased significantly during the last decade (663 papers in 2012 to 19,758 papers in 2021). However, over the last three years, the number of research papers has been relatively constant. Through extensive contacts with experts from the industry and the research community for Big Data for more than 8 years, and by systematically reviewing the recent literature on Big Data, we identified seven main application areas in Big Data: social media, manufacturing, finance, image processing, smart cities, agriculture or forestry and telecommunications. Manufacturing and agriculture or forestry are the two application areas with the fastest growth in terms of research papers. We also identified six enabling technologies for Big Data: deep learning; cloud computing; high-performance computing or supercomputing; parallel processing, distributed processing or GPU; storage systems, database systems or data lakes; and edge or fog computing. Our bibliometric study shows that deep learning and edge or fog computing are the enabling technologies that are increasing the most.
When analyzing the geographical distribution of the research on Big Data, it is clear that two countries are dominating: USA and China. It is also clear that the research interest for Big Data in China is increasing much faster than it is in the USA and in the rest of the world. In fact, the number of publications in Big Data is decreasing in the USA and in the rest of the world.
Our literature survey shows that the connection between Big Data and AI is, in many cases, taken for granted. The bibliometric data support this observation (e.g., almost every second paper in the Big Data area also mentioned AI). The Internet of Things (IoT) is an area that is growing rapidly for Big Data research. Our literature survey shows that the IoT is a concept that is related to several application areas of Big Data, particularly the application areas of manufacturing, smart cities and agriculture or forestry. The growing interest in Big Data in manufacturing, smart cities and agriculture or forestry seems to be related to the trend toward the IoT in Big Data. We also believe that the trend toward the IoT is the main reason why we saw an increasing interest in fog or edge computing in Big Data.
An additional conclusion from our literature review is that there is a dual challenge in industrial and commercial Big Data systems: one must develop analytics that can help us find value in huge amounts of data and, at the same time, provide ways of handling large amounts of data in an efficient way. Most Big Data applications will only be useful when both of these challenges are successfully addressed.
The data presented in this paper provide a good overview of the current research trends in Big Data, and this kind of information is very useful when setting strategic agendas for Big Data research.

Data Availability Statement: The bibliometric data presented in this study were obtained from the Scopus reference database.