Research Progress and Development Trend of Social Media Big Data (SMBD): Knowledge Mapping Analysis Based on CiteSpace

Abstract: Social Media Big Data (SMBD) is widely used to serve the economic and social development of human beings. However, as a young field of research and practice, SMBD is not yet sufficiently understood in academia. This paper took the Web of Science (WoS) Core Collection as the data source and used traditional statistical methods and CiteSpace software to carry out a scientometric analysis of SMBD, showing the research status, hotspots and trends in this field. The results showed that: (1) SMBD research has attracted increasing attention in academia, and the number of publications has increased in recent years, mainly in subjects such as Computer Science, Engineering and Telecommunications. The results were published primarily in IEEE Access, Sustainability, Future Generation Computer Systems: The International Journal of eScience, and so on; (2) in terms of contributions, China, the United States, the United Kingdom and other countries (regions) have published the most papers on SMBD, and high-yield institutions are also mainly from these countries (regions). There were already some excellent teams in the field, such as the Wanggen Wan team at Shanghai University and the Haoran Xie team at City University of Hong Kong; (3) we studied the hotspots of SMBD in recent years and summarized the frontier of SMBD based on keywords and co-cited literature, including the in-depth mining and construction of social media technology, reflection on and concerns about the rapid development of social media, and the role of SMBD in solving human social development problems. These results could provide a reference for SMBD researchers seeking to understand the research status, hotspots and trends in this field.


Introduction
Social Media (SM) is a class of Internet applications with online interactive characteristics and one of the primary mediums through which people use the Internet. It is characterized by an interactive social network in which users participate in, reshape, and share information, ultimately connecting users with one another [1]. People can send relevant events, comments, opinions, and insights to the world through social media anytime and anywhere [2]. Social media has already become an important network visualization", set the language to "English", selected the document type "Article", and set the time span to "2010-2020". A total of 2493 valid publications were retrieved, and the retrieval results were saved and output in text format; each record contained authors, institutions, keywords, abstract, date and other information.

Analysis Tools
To achieve an objective and comprehensive survey of the publications in this field, we combined traditional statistical methods with the scientific knowledge mapping tool CiteSpace to describe the research status, hotspots and trends of SMBD in detail (Figure 1). CiteSpace is data visualization software developed by Chaomei Chen's team and is widely used in many fields such as science, information and bibliometrics. It visualizes the location and size of nodes in a knowledge network. In this paper, the software was used to analyze the knowledge base, research hotspots and development context of SMBD through the country, institution, author, keyword and reference modules, and to draw the corresponding knowledge maps. The parameters were as follows: Node Type: selected according to the analysis; Time Period: 2010-2020; Time Slice Length = 1; Threshold Selection Criteria: Top 25 per slice; all other settings were left at their defaults. Detailed parameters are listed in the upper left corner of each knowledge map. N, E and Density represent the number of nodes, the number of connections, and the network density, respectively. In the cluster graph, the silhouette value was used to measure the homogeneity of a cluster: the closer the value is to 1, the higher the homogeneity, and a value above 0.5 indicates that the clustering result is reasonable. Meanwhile, the color and size of each node represent the year and the number of citations, respectively, and together show the citation history of the literature since its publication.
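The two map statistics mentioned above can be illustrated with a minimal sketch. It assumes the standard definitions: density of an undirected simple graph, 2E / (N(N - 1)), and the classical silhouette value (b - a) / max(a, b) computed with Euclidean distances. CiteSpace computes its silhouette on network clusters, so the function names, the distance measure and the toy points below are illustrative assumptions rather than the software's implementation.

```python
from math import dist  # requires Python 3.8+


def network_density(n_nodes, n_edges):
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def mean_silhouette(points, labels):
    """Mean silhouette value s = (b - a) / max(a, b) over all points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself does not change the sum)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Node and link counts reported for the institution network in this paper.
print(round(network_density(385, 602), 4))

# Hypothetical 2-D points forming two well-separated clusters.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
toy_labels = [0, 0, 1, 1]
print(mean_silhouette(toy_points, toy_labels) > 0.5)
```

Under these assumptions, a sparse collaboration map yields a density close to zero, while well-separated clusters yield a silhouette value close to 1.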
ISPRS Int. J. Geo-Inf. 2020, 9, 632

Annual Publishing Trend
To analyze SMBD trends in depth, we counted the publications in the WoS Core Collection from 2010 to 2020 (Figure 2). The number of SMBD publications increased slowly from 2010 to 2013 and did not show significant growth until 2014. From 2010 to 2013, the average annual output was 51 articles, whereas from 2014 to 2016 it was 201. In addition, 78.42% of the articles (1955 out of 2493) were published in the last five years (2016-2020). This showed that SMBD is a young research field whose popularity has risen over the past five years. It should be noted that the data for 2020 are incomplete because data collection ended in September 2020.
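The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported in this section; the variable and function names are illustrative.

```python
TOTAL_PUBLICATIONS = 2493   # records retrieved for 2010-2020
LAST_FIVE_YEARS = 1955      # records published in 2016-2020
MEAN_2010_2013 = 51         # reported average annual output, 2010-2013
MEAN_2014_2016 = 201        # reported average annual output, 2014-2016


def percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100 * part / whole


share_recent = percent(LAST_FIVE_YEARS, TOTAL_PUBLICATIONS)
growth_factor = MEAN_2014_2016 / MEAN_2010_2013

print(f"Share of 2016-2020 output: {share_recent:.2f}%")
print(f"Mean annual output grew by a factor of {growth_factor:.2f}")
```

The growth factor compares the two reported period means only; it says nothing about individual years, for which the counts are shown in Figure 2.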

Web of Science Categories
Analyzing the subject categories of the SMBD literature reveals the focus of scientific research in this field. According to the WoS discipline system, the 2493 SMBD articles fall into 110 research areas. Because one article may belong to more than one research area, the per-area counts sum to more than the total number of articles, which also reflects the interdisciplinary character of the SMBD field. Table 1 shows the top 10 research areas, each with more than 85 articles. Overall, SMBD was mainly studied in computing and engineering: Computer Science was the top research area with 1338 articles, accounting for 53.67% of the total, followed by Engineering (568), Telecommunications (298), Science Technology Other Topics (243) and Environmental Sciences Ecology (215). Finally, Information Science Library Science (5.93%), Operations Research Management Science (3.57%) and Physical Geography (3.45%) were potential areas of SMBD research. In other words, SMBD has attracted attention from many disciplines and is characterized by interdisciplinarity, multi-domain co-construction and multi-direction integration.
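Because one article can carry several research areas, per-area counts are not mutually exclusive. The following sketch shows the counting on a handful of hypothetical records (the record ids and their area assignments are invented for illustration; only the area names and the 1338 / 2493 figure come from this section).

```python
from collections import Counter

# Hypothetical records; each article may list several WoS research areas.
toy_records = [
    {"id": "p1", "areas": ["Computer Science", "Engineering"]},
    {"id": "p2", "areas": ["Computer Science"]},
    {"id": "p3", "areas": ["Environmental Sciences Ecology",
                           "Computer Science", "Telecommunications"]},
    {"id": "p4", "areas": ["Engineering", "Telecommunications"]},
]


def area_shares(records):
    """Map each research area to (article count, percent of all articles)."""
    counts = Counter(
        area for record in records for area in set(record["areas"])
    )
    total = len(records)
    return {area: (n, 100 * n / total) for area, n in counts.most_common()}


shares = area_shares(toy_records)
for area, (n, pct) in shares.items():
    print(f"{area}: {n} articles ({pct:.1f}%)")

# The per-area counts add up to more than the number of articles.
print(sum(n for n, _ in shares.values()), "area assignments for",
      len(toy_records), "articles")

# Share reported in this section for Computer Science.
print(f"{100 * 1338 / 2493:.2f}%")
```

This is why the percentages in Table 1 should be read per area and not summed across areas.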

Journal Analysis
Analyzing the distribution of journals in this field helps to identify the core outlets of academic research, and papers published in such journals could be supported by academic research [22]. In general, the larger a journal's Total Publications (TP), the greater its contribution to the field; the higher its Impact Factor (IF) and H-index, the greater its academic impact [23]. Table 2 lists the top 10 journals publishing SMBD research articles. Together, these journals accounted for about 22.42% of all publications in this field. IEEE Access published the most articles (137, or 5.495% of the total), and its Impact Factor was 3.745 in 2019. It was followed by Sustainability with 93 articles (3.73%) and Future Generation Computer Systems: The International Journal of eScience with 62 articles (2.487%). In terms of Impact Factor and H-index, Future Generation Computer Systems: The International Journal of eScience had the highest IF, while PLoS One had the highest H-index but an IF of only 2.74. IEEE Transactions on Visualization and Computer Graphics, Journal of Medical Internet Research and Information Sciences had relatively high IF and H-index values but published fewer articles.

Country and Institutional Analysis
The number of publications from a country/region reflects its contribution to research in this field. Based on the SMBD publication data by country in the WoS Core Collection, the top 10 countries were ranked by number of publications. As shown in Figure 3, China ranked first (739 articles), accounting for 29.643% of the records collected. It was followed by the United States (722 articles), the United Kingdom (240 articles) and South Korea (159 articles), which together accounted for 44.966% of all publications in the dataset. Centrality is an indicator that measures the importance of a node in the network, and CiteSpace uses it to identify pivotal nodes. The centrality of a country reflects its international recognition in SMBD research. According to Table 3, the United Kingdom had the highest centrality (0.2), followed by the United States (0.19). Although China ranked first in number of publications, its centrality was lower than that of other countries, which suggests that the international influence of China's SMBD research was weak despite its large output.
CiteSpace was used to build an institutional cooperation network reflecting the contribution and degree of cooperation of each institution in the SMBD research field. Figure 4 shows the institution collaboration network, which consists of 385 institutional nodes and 602 connections. A thicker connecting line indicates closer cooperation between institutions, and each link between two institutions is colored according to the years in which the collaboration occurred. The institutions with higher centrality values were the Chinese Academy of Sciences, Wuhan University, Tsinghua University, City University of Hong Kong, Huazhong University of Science and Technology, Zhejiang University and Peking University, which played intermediary and leading roles. The close cooperation among these institutions can be seen clearly from the connections. In addition, the global distribution of SMBD research institutions was uneven: the top 15 institutions by number of papers are mainly located in China and the USA.
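The centrality reported by CiteSpace is usually understood to be betweenness centrality, which rewards nodes that lie on many shortest paths between other nodes. As a rough illustration of why a prolific node can still have low centrality, here is a minimal pure-Python sketch of Brandes' algorithm for an undirected, unweighted graph. The four-node graph and its labels are hypothetical, and the normalization by (n - 1)(n - 2) is an assumption that may differ from the scaling CiteSpace applies.

```python
from collections import deque


def betweenness_centrality(adj):
    """Normalized betweenness for an undirected, unweighted graph.

    `adj` maps every node to the set of its neighbours.
    """
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search counting shortest paths from `source`.
        order = []
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        depth = {v: -1 for v in adj}
        n_paths[source] = 1
        depth[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        dependency = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    n = len(adj)
    # Every unordered pair was visited twice, so divide by (n-1)(n-2).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: c * scale for v, c in centrality.items()}


# Hypothetical collaboration graph: B bridges A to the pair C-D.
toy_graph = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}
scores = betweenness_centrality(toy_graph)
for node in sorted(scores):
    print(node, round(scores[node], 3))
```

In this toy graph only B lies on shortest paths between other nodes, so it is the only node with non-zero centrality, regardless of how many papers each node might represent.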

Author Analysis
Author cooperation network analysis can reveal the core authors, the intensity of cooperation among authors and their mutual citations in a field, and can show the influence of team cooperation on academic research in that field [24]. Table 4 lists the 10 most productive authors. Liu Y had the most publications; other productive authors included Zhang Y (18 articles), Chen Y (14 articles), Wang Y (14 articles), Liu YH (13 articles) and Wang H (13 articles). Most of the top 10 authors were from China, and they belonged to nine research institutions. In Figure 5, each node in the author collaboration network represents an author, the node size represents the number of papers published by that author, and the links between nodes represent cooperative relationships. The author collaboration network in the SMBD field consisted of 995 authors and 479 collaborative links. Based on these collaborative relationships, scholars formed different research teams:
(1) The Wanggen Wan team from Shanghai University analyzed the application of SMBD to social development, including a study of the spatial and temporal distribution of urban green spaces [25], as well as video, spatial, temporal, and social media analysis of urban populations [26].
(2) Nicola Luigi Bragazzi's team from the University of Genoa in Italy focused on the application of SMBD in medicine; the team's two most cited papers addressed the value of SMBD for immunology and rheumatology [27] and the use of behavior informatics to analyze digital behavior throughout the spread of an epidemic [28].
(3) Haoran Xie's team from City University of Hong Kong mainly studied users' personalized profiles and information needs in order to realize personalized search [29], and proposed a method for identifying potential user groups based on folklore [30].
(4) In addition, there were other outstanding teams, such as Antonio Ferrandezu's team from the University of Alicante in Spain and Henrikki Tenkanen's team from the University of Helsinki in Finland.

Co-Citation Literature Analysis
Co-citation reflects the theoretical knowledge foundation of a research field; frequently co-cited literature represents the fundamental research achievements of different periods and plays an important role in the field's academic development. The co-citation network in the SMBD field was composed of 778 nodes and 975 connections (Figure 6). Each node represents a cited document, its size expresses the importance of that document, and its label gives the first author and publication year. Fifteen key nodes with important academic influence were selected in this paper, as shown in Table 5.
Bollen et al. [31] extracted seven aspects of public sentiment from Twitter text and correlated them with economic indicators; this is a representative work on the use of SMBD in economic research. Chen et al. [32] systematically discussed the opportunities and challenges, technical principles and future research trends of data-intensive applications. Lazer et al. [33] used Google Flu Trends (GFT) as a case to question big data as a substitute for traditional statistical methods and theories, and identified two problems with GFT: big data hubris and algorithm dynamics. As big data became more widely available, critical discussion was on the rise. Boyd et al. [34] argued that critical questioning of the assumptions behind big data, as an emerging analytical technique, was necessary. They raised six critical points: (1) the changes to social theory as a whole caused by big data; (2) the inevitable problems of big data in terms of objectivity and accuracy; (3) the dependence of research quality on how well the big data fit the problem and on the representativeness of the data; (4) the fact that a graphical representation of the relationships between people does not convey equivalent information; (5) the ethical issues of big data; and (6) the digital divide caused by differences in the availability and accessibility of big data. Ginsberg et al. [35] designed an epidemic surveillance model based on the Google search engine and used it to accurately estimate the weekly influenza activity level in each region of the United States. In general, SMBD was interdisciplinary, covering areas such as data mining, deep learning, data visualization, and natural language processing [36,37]. The growth of the field will lead to the evolution of business, web, and scientific applications [38].

Co-Occurrence of Keywords
Keywords condense and reflect the main content of an article, and can therefore reveal the hot topics and development trends of a research field. We ran the "Keyword" module of CiteSpace V, merged semantically duplicated terms, and generated a keyword co-occurrence network for SMBD research with 547 nodes and 3631 connections (Figure 7). The top 10 keywords by co-occurrence frequency and by centrality are shown in Table 6. The keywords "big data" and "social media" had the highest frequencies, 696 and 408 respectively, followed by "social network" (226), "visualization" (189), "network" (186) and "twitter" (181). In terms of centrality, "social network" had the highest value, followed by "visualization", "centrality", "pattern", "design", "social network analysis" and so on, indicating that these keywords have been the focus of researchers and have exerted a certain influence.
CiteSpace provides automatic labeling of cluster networks, extracting noun phrases from titles, keywords or abstracts through three algorithms (LSI, LLR and MI). Log-likelihood ratio (LLR) tests tend to reflect a unique aspect of a cluster and are therefore more suitable for generating high-quality clusters with high intra-class similarity and low inter-class similarity. We clustered the keyword map with the LLR algorithm in CiteSpace to obtain the timeline view of the nine clusters shown in Figure 7, with the cluster labels on the right and time at the top. Keywords of the same cluster lie on the same horizontal line; each node represents a keyword, fixed at the year in which it first appeared, and co-occurring keywords are connected by lines. Through the timeline, we can observe the time span of the co-occurring keywords and the rise and fall of specific research topics within clusters. Table 7 shows more details of these clusters.
As shown in Table 7, silhouette values ranging from 0.57 to 1 indicated that the top terms in each cluster matched well and that the clusters were reliable [39]. Clusters #0 and #6 were "social media" and "big data research", which focused on the importance of Social Media Big Data [40][41][42] and explored its risks and future [43]. These clusters covered key topics such as big data, association rules, cultural communication, double-layer coupling, social network, social media, social media usage characteristics, users, effect, and airline. Cluster #1 was "negative emotion", containing keywords such as case study, data, role, coastal resilience assessments and floods. This cluster mainly used disaster cases to show that research of SMBD had a negative impact on people's emotions [44,45]. Cluster #2 was "humanitarian supply chain", containing keywords such as artificial intelligence, industry, social gains, principles and dam operations; it focused on the value of Social Media Big Data combined with blockchain technology [46,47]. Cluster #3 was "microblogs data management", a study of technologies, models and frameworks for Social Media Big Data [48,49]; its main keywords were review, classification techniques, cetacean vocalization, automatic detection, and exploiting academic factor. Clusters #4, #5, #7 and #8, including "scientific domain", "utilizing big data" and "scalable urban infrastructure condition assessment", were all specific applications of SMBD in human production and life. For example, Vargas-Quesada et al. [50] used the performance of category synergy and its social network to visually predict or label developments in the field of science. Liu et al. [51] used SMBD to categorize green space and smart city buildings. Alipour et al. [52] provided a framework for the development and extensibility of visual surveillance of urban infrastructure and built environments based on SMBD. O'Doherty et al. [53] explored the use of big data for health.
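The keyword co-occurrence network described in this section can be approximated outside CiteSpace with a short sketch: normalize each keyword, merge semantic duplicates through a synonym map, count keyword frequencies (node weights) and count keyword pairs that appear in the same record (link weights). The synonym map and the three records below are hypothetical, and the sketch does not reproduce CiteSpace's thresholding or clustering.

```python
from collections import Counter
from itertools import combinations

# Hypothetical synonym map used to merge semantically repeated keywords.
SYNONYMS = {
    "social networks": "social network",
    "big-data": "big data",
}


def normalize(keyword):
    """Lower-case a keyword and map known variants to one canonical form."""
    cleaned = keyword.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


def build_cooccurrence(records):
    """Return (keyword frequencies, co-occurrence link weights)."""
    frequency = Counter()
    links = Counter()
    for keywords in records:
        unique = sorted({normalize(k) for k in keywords})
        frequency.update(unique)                  # one count per record
        links.update(combinations(unique, 2))     # every pair in the record
    return frequency, links


# Hypothetical keyword lists of three records.
toy_keyword_records = [
    ["Big Data", "Social Media", "Twitter"],
    ["big-data", "Social Networks", "Visualization"],
    ["Social Media", "social network", "Big Data"],
]

frequency, links = build_cooccurrence(toy_keyword_records)
print(frequency.most_common(3))
print(len(frequency), "nodes,", len(links), "links")
```

Sorting the keywords inside each record keeps every pair in a single canonical order, so ("big data", "social media") and its reverse are counted as the same link.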

Research Trends Analysis
Research trends in a field reflect its future direction of development. The burst detection function in CiteSpace V reveals sudden increases or decreases in the citation counts of specific keywords or papers [54]. A keyword or document whose citation count rises or falls sharply within a short period signals an abrupt change in research attention, so the trends and future directions of a field can be better understood by tracking such bursts in keywords and literature.
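Burst detection of this kind is commonly described as building on Kleinberg's burst detection algorithm. The sketch below is a simplified two-state version for yearly batches, written for illustration only and not taken from CiteSpace: each year is assigned either a baseline state or a burst state so that the total cost (binomial fit cost plus a penalty for entering the burst state) is minimal. The yearly counts and the parameters `s` and `gamma` are hypothetical.

```python
from math import lgamma, log


def log_binomial(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def burst_states(relevant, totals, s=2.0, gamma=1.0):
    """Label each batch 0 (baseline) or 1 (burst) with a two-state model."""
    n = len(relevant)
    if n == 0:
        return []
    base = sum(relevant) / sum(totals)
    if not 0 < base < 1:
        return [0] * n
    rates = (base, min(s * base, 0.9999))  # expected share in each state
    up_cost = gamma * log(n)               # penalty for entering a burst

    def fit_cost(rate, r, d):
        return -(log_binomial(d, r) + r * log(rate) + (d - r) * log(1 - rate))

    cost = [0.0, float("inf")]             # the sequence starts in state 0
    back = []
    for r, d in zip(relevant, totals):
        new_cost, choice = [], []
        for state in (0, 1):
            candidates = [
                cost[prev] + (up_cost if state > prev else 0.0)
                for prev in (0, 1)
            ]
            best_prev = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[best_prev] + fit_cost(rates[state], r, d))
            choice.append(best_prev)
        cost = new_cost
        back.append(choice)

    # Trace the cheapest state sequence backwards.
    states = [0] * n
    states[-1] = 0 if cost[0] <= cost[1] else 1
    for t in range(n - 1, 0, -1):
        states[t - 1] = back[t][states[t]]
    return states


def burst_intervals(states):
    """Convert a 0/1 state sequence into (start, end) index pairs."""
    intervals, start = [], None
    for t, state in enumerate(states):
        if state == 1 and start is None:
            start = t
        elif state == 0 and start is not None:
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, len(states) - 1))
    return intervals


# Hypothetical yearly counts: records using one keyword vs. all records.
keyword_counts = [1, 1, 1, 10, 12, 11, 1, 1]
all_counts = [100] * 8
states = burst_states(keyword_counts, all_counts)
print(states, burst_intervals(states))
```

Raising `gamma` makes the model more reluctant to enter the burst state, while raising `s` requires a larger jump above the baseline share before a period counts as a burst.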

Keyword Burst Analysis
Keyword bursts reflect changes in the research topics and hotspots of a field. As shown in Figure 8, we selected 20 burst keywords in SMBD research according to two indicators, starting year and strength. The results showed that the research frontier of SMBD has changed over the past decade, and that the strongest burst keyword is Visualization. The keywords with longer burst periods were Social Network (2010-2015) and Graph Visualization (2012-2017); research related to these terms had a more sustained impact on the SMBD field. The latest burst keywords, Validation, Real Time, Emotion and Context, have represented some of the hottest topics since 2018 and will continue to attract attention.

Co-Citation Literature Burst
The above analysis of keywords showed that SMBD was the research frontier in different periods from 2010 to 2020. In addition, a burst test was also an indicator of the research frontier for co-citation literature. The higher burst of articles, the higher the degree of attention in a certain period of time, the research content of the article represented the hot spot and frontier of the field in a certain period of time. The red parts in Figure 9 represent the time range in which the literature burst appeared. We listed the literature with burst characteristics and no "cooling", so as to analyze the research frontier of SMBD in recent years. As for the strongest burst co-citation shown in Figure 7, there were 11 highly cited articles in 2018-2020. It could mainly be divided into three aspects: media activity that existed nowadays, and used a synchronous trap as an incremental processing system to efficiently handle big data in large online social networks by deploying applications on Facebook and Instagram to expose malicious accounts and attacks in a short period of time.
(3) The role of SMBD in solving human social development problems. Eichstaedt et al. [61] used the language of Twitter to predict heart disease mortality well at the county level and demonstrated the importance of SMBD in the field of disease. Kryvasheyeu et al. [62] used big data on Twitter to demonstrate that large scale online social networks could quickly assess the damage caused by large scale disasters. In addition, by building a disaster social media framework, Houston et al. [63] facilitated the creation of disaster social media tools, the development of implementation processes, and the scientific study of their effects. Albuquerque et al. [64] used social media as a potential resource to improve the management of crisis situations. He proposed a geographic approach and used it to examine tweets generated by the Twitter platform (Twitter) during the June 2013 floods in Elbe, Germany, and considered social media messages to be reliable quantitative indicators. Disaster management in crisis response and preventive monitoring was of great value.

Conclusions and Deficiencies
Based on the analysis of SMBD research in the preceding sections, the following conclusions could be drawn: (1) In terms of publication output, SMBD research has shown a clear upward trend over the last ten years; papers published in the past five years accounted for 78.42% of the total, which indicated that SMBD is a young and growing research field. The current research involved Computer Science, Engineering, Telecommunications and other disciplines (fields), reflecting the interdisciplinary, multi-field and mutually integrated character of SMBD. IEEE Access, Sustainability, IEEE Transactions on Visualization and Computer Graphics, PLoS One and other journals have published a large body of research in this field. In terms of the main research forces, the most productive authors in the SMBD field were mainly from China, and the academic teams led by Wanggen Wan, Haoran Xie and others have made significant contributions. China, the USA, the UK and other countries had the largest numbers of publications, and the main research institutions were the Chinese Academy of Sciences, Wuhan University, Tsinghua University, City University of Hong Kong, etc. However, although China ranked first in the number of papers published, its centrality remained low, which indicated that the international influence of China's research results in SMBD was weak. It is necessary to improve the innovation and comprehensiveness of research results in the future.
(2) We used co-citation analysis to identify the knowledge base of SMBD, and the results showed that SMBD research was interdisciplinary, covering fields such as data mining, deep learning, data visualization, and natural language processing. It can be seen that the knowledge structure of SMBD has begun to take shape. We further divided the keywords into nine clusters and found that the research hotspots focused on the significance of SMBD research, its combination with cutting-edge technology, and its specific applications in production and daily life. Big data, social media, social network and visualization appeared most frequently, while social network, visualization, centrality, pattern and design had higher centrality; together these keywords represented the research hotspots of the past decade.
(3) We used two modules, keyword burst detection and co-citation literature burst detection, to analyze the research frontiers of SMBD. We found that the keyword with the strongest burst was Visualization, the keywords with the longest bursts were Social Network and Graph Visualization, and the most recent burst keywords were Validation, Real time, Emotion and Context. Visualization, Social Network and related topics therefore represented the academic frontier in its formative stage. With the maturing of theory and the development of technology, SMBD is turning from theoretical research to practical application. The burst detection of co-cited literature showed that the frontier of SMBD included the in-depth exploration and construction of social media technologies, reflections and concerns about the rapid development of social media, and the role of SMBD in solving human social development problems. These findings provided valuable information for SMBD researchers to understand the research status and trends in this field.
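To make the two measures used in these conclusions more concrete, the following minimal Python sketch counts co-citations from a set of reference lists and then computes betweenness centrality on the resulting co-citation network. It is an illustration only and not the procedure implemented in CiteSpace: the function names `cocitation_counts` and `betweenness`, the document labels "A" to "E" and the toy reference lists are all hypothetical, and the centrality values are left unnormalized.

```python
from collections import Counter, deque
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of references appears in the same reference list."""
    counts = Counter()
    for refs in reference_lists:
        # Sort so that (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


def betweenness(edges):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    undirected, unweighted graph given as an iterable of node pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical toy data: each inner list is the reference list of one citing paper.
reference_lists = [["A", "B", "C"], ["A", "B"], ["B", "C", "D"], ["D", "E"]]
counts = cocitation_counts(reference_lists)
centrality = betweenness(counts)  # the co-cited pairs serve as network edges
print(counts.most_common(3))
print(centrality)
```

In this toy network, document "D" is the only link between "E" and the other documents, so it receives the highest betweenness even though it is not the most frequently co-cited item. This is the same distinction drawn above between a large number of publications and a low centrality.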
Although we conducted an effective bibliometric analysis of the SMBD field, the current research still had some limitations. First, the analysis in this article was limited to the WoS database, and the data were incomplete at the time of retrieval, so data drawn from other databases or collected at different times may lead to different results and conclusions. Second, although bibliometrics provided an effective tool for studying the development of a research field, bibliometric software and methods are still being improved, and future research may reach more detailed and valuable conclusions; the conclusions of this paper therefore remain to be tested and refined by further study.