Article

A Text Data Mining-Based Digital Transformation Opinion Thematic System for Online Social Media Platforms

1 School of Management, Hangzhou Dianzi University, Hangzhou 310018, China
2 School of Science, Nanjing University of Science and Technology, Nanjing 210094, China
* Author to whom correspondence should be addressed.
Systems 2025, 13(3), 159; https://doi.org/10.3390/systems13030159
Submission received: 10 November 2024 / Revised: 31 January 2025 / Accepted: 6 February 2025 / Published: 26 February 2025

Abstract

Digital transformation (DT) has become an important engine for the development of the digital economy and an important means of reshaping corporate culture, business processes, management models, and so on. Different social communities at different levels have different needs and understandings of digital transformation. Therefore, this paper proposes to explore the communication themes of digital transformation on social media. This study’s main objective is to uncover underlying thematic structures and core ideas from large amounts of textual data in different social media communities to better understand the significance of the communication themes. This paper also aims to reveal the characteristics of diffusion patterns of DT themes by opinion-themed mining. This study uses text mining and social network analysis methods to mine DT themes, theme structure, and the statistical characteristics of hot words across various online communities. The main findings of this study are as follows. The Huawei forum discusses the technological drivers of the digital economy from a micro level. Sohu News explores business operation strategies at a macro level. The Zhihu forum discusses the elements of digital development at the micro level. Moreover, the hot words’ degree centrality and betweenness centrality across various online communities exhibited a power law distribution. In conclusion, this research paper studies and analyzes DT themes of different social media platforms to discover the opinions and attitudes of various social groups in the digital transformation era and deeply interprets social trends and public opinions in order to provide valuable decision-making theoretical support for managers, enterprises, and governments.

1. Introduction

Numerous advancements in science and technology have ushered in an era of digital transformation (DT), which has also become a major engine of the digital economy. Hess et al. defined DT as reaping the benefits of digitalization, such as productivity improvements, cost reductions, and innovation [1]. Nadkarni et al. defined DT as encompassing two aspects: disruptive digital technologies and organizational changes in capabilities, structures, processes, and business model components [2]. Ebert et al. defined DT as the adoption of disruptive technologies to increase productivity, value creation, and social welfare [3]. From an enterprise-level perspective, enterprises are increasingly using digital technology to optimize resource information, which has become the driving force for many enterprises to achieve digital transformation, leading to industrial reform and hence reshaping enterprise practices [4,5]. From a technological perspective, DT refers to the process of using digital technologies to transform the operational processes and customer service models of a business entity. This transformation involves the application of digital technologies to all operational areas of a business entity, thereby fundamentally and comprehensively changing its operational processes and growth model. From a market-level perspective, digitalization is a primary trend that is expected to change society and business in the near future. Hence, DT, as an economic driver and a focus of national policy advocacy, brings changes to society, affects economic relations and social life, and generates a wide range of social opinion.
Given the above, there is social value and significance in tapping into DT opinions and public opinion on social media platforms. This article explores and analyzes the themes of digital transformation discussions in different communities on social media platforms in China. Social media platforms such as web portals, topical communities, and specialized forums are hubs of public opinion from all walks of life. Understanding the views, attitudes, and problems raised across these digital media channels, and studying the information dissemination characteristics of critical themes, is therefore both meaningful and necessary. Social media opinion theme mining can help to identify emerging trends, issues, and crises before they become mainstream. Government policymakers, business managers, and individuals can use it to monitor public sentiment about political issues, gain insights into public opinion, and make informed decisions. On this basis, this article attempts to answer the following research questions:
RQ1: What are the statistical diffusion characteristics of different social media platforms?
RQ2: What are the hidden communication themes in the public opinion discourse on different social media platforms, and what are their differences or commonalities? What are the structural features of these themes?
RQ3: What are the key terms of the theme network on different social media platforms, and what core values are reflected?
This study intends to conduct text mining and social network analysis (SNA) to research themes, structure, and content hot words. The word network uses SNA and text mining to reveal the information dissemination rules and core values. This includes visualization of keywords and perspective statements. This study is organized as follows: Section 1 introduces the theoretical background and the research questions. Section 2 and Section 3 address the related work and the research design, respectively, and Section 4 addresses the data experiments and results. Section 5 concludes this paper with a discussion of this study’s implications and scope for future research.

2. Related Work

This section discusses three categories of related work. The first involves work on social media policy opinion and covers the following factors: influential factors of public opinion, the evolutionary logic of public opinion, characteristics of dissemination, attitudes towards policy opinion, public opinion management and guidance, and so on. The second discusses social media content analysis, which mainly involves content mining methods, text techniques, natural language processing, and theme model construction. The third discusses the widespread use of social media platforms globally, their impact on various aspects of economic life, and some common patterns of social media use in Eastern and Western countries.

2.1. Social Media Policy Opinion

Public policy opinion expresses the public’s concern about socio-economic development and their thoughts and attitudes towards social change. It plays an important role in social management and public opinion guidance. Laor found online entertainment is influential in forming popular opinion and its connection with the more extensive media scene, representing a round cycle where web-based entertainment informs different opinions by working with the outflow of different recently published sentiments [6]. Hou et al. examined the coexistence of information overload and information fragmentation. The influential factors were identified, and the evolutionary logic of public opinion for effective governance was uncovered [7]. Wang et al.’s study suggests that the properties of the information and content itself are the main factors influencing the dissemination of public opinion. Therefore, public opinion monitoring should focus on timing and influential netizens in order to prevent its wide dissemination [8]. Xing et al. investigated how privacy worries and cultural disparities impacted societal views on the pandemic and performed a comparative study of public sentiment in two distinct nations [9]. Wang et al. conducted an examination of the online sentiments and reactions of Chinese internet users to two separate sets of population policies, thereby acquiring profound insights into the mechanisms of online opinion development and the evolution of population policies in China [10]. Yu et al. in their seminal study identified the critical elements that influence public sentiment regarding target recovery policy. Furthermore, they assessed the advantages and disadvantages of the policy by examining the emotions and associated themes expressed in public discourse on social media platforms [11]. 
Calnan analyzed the current health policy measures implemented in England to curb or handle the outbreak and pinpointed the principal sociological and political factors that have molded these strategies [12]. Zhang et al. developed a multilayer fuzzy cognitive mapping (MFCM) model, an innovative method that builds upon multiple individual fuzzy cognitive maps (FCMs), to investigate the dynamics of policy diffusion. FCMs possess superior capabilities for knowledge representation and reasoning, which are crucial for addressing the interplay of complex factors, modeling intricate systems, articulating human thought processes, and amalgamating expert insights [13]. The aforementioned study expands on the groundwork laid by fuzzy cognitive maps (FCMs) to create a multi-layered fuzzy cognitive map (MFCM) model that delineates the factors influencing the spread of policies. This model pinpoints the key components and their dynamic interactions within the policy diffusion process. Additionally, the MFCM facilitates the portrayal of intricate connections across various FCMs, with each FCM addressing a specific facet of the overall issue [14]. Lawlor et al., using a discussion of public sentiments around the crisis, created a policy window for enhanced social spending and for entrenching targeted financial support for vulnerable individuals [15]. Núnez-Barriopedro et al. measured the correlation which exists among perceptions of public service performance and taxes based on public opinion and fiscal policy [16]. Garbarino et al. examined how articulating the causes of obesity from an attributional perspective could increase public support for various policy measures aimed at curbing obesity [17]. Public themes are closely related to people’s lives, and people’s opinions and attitudes toward them directly reflect their perceptions of social reality. In the current digital economy era, DT serves as a catalyst for economic growth and a propelling force. 
Consequently, examining the subject of DT to uncover its patterns of information spread and the public’s stance is both highly significant and pressing.

2.2. Social Media Text Mining

Social media text mining is the process of collecting, processing, and analyzing large amounts of text data from social media platforms to extract useful information and insights. This includes theme extraction, content analysis, etc. Viet et al. proposed a thematic model designed to identify prevalent themes from aggregated social media data, in conjunction with a sentiment analysis framework developed to assess public opinion regarding diverse aspects of urban infrastructure [18]. Sun et al. employed both SnowNLP and LDA for the purposes of sentiment mining and topic extraction [19]. This analysis enabled the examination of the evolution of public opinion in each coupling stage, both in terms of sentiment and topic, as well as the spatiotemporal coupling relationship between online public opinion and offline epidemics. LDA is an unsupervised machine learning technique that can identify hidden theme information in a corpus and is widely used in the field of opinion theme research [20]. SnowNLP is a language processing library for Chinese corpora; its functions include Chinese word segmentation, part-of-speech tagging, sentiment analysis, text classification, pinyin conversion, traditional-to-simplified conversion, keyword extraction, abstract extraction, sentence segmentation, and text similarity [21]. Zhou et al. analyzed breaking news and public opinion through fine-grained mining in order to determine the impact of user characteristics on public opinion [22]. Li et al. extracted data such as spatiotemporal labels and textual content, which were used to discover theme and sentiment labels hidden in opinion texts and thereby reveal the contextual evolution characteristics and patterns of online public opinion from a temporal perspective [23]. Carvalho et al. collected public safety issues from different media and applied natural language techniques for text cleaning, named entity recognition, and theme modeling [24]. 
Yadav et al. studied public sentiment toward the COVID-19 vaccine and COVID-19 responses through theme modeling-supported text mining and network analysis [25]. Li et al. examined different text techniques and used natural language processing methods for extracting themes, classifying text, and analyzing sentiments and clustered text for tourism text mining [26]. Kumar et al. used a visualization tool to show the associations between the themes extracted from the text in order to clarify the main themes and relationships [27]. Aloini et al. utilized methodologies derived from social network analysis and text mining to delineate the social networks of handovers among stakeholders in port logistics. The efficiency of the export process was evaluated, and significant deviations within it were identified [28]. Wu et al. utilized both a single-document and a multi-document summarization approach, grounded in SNA, to identify and extract significant sentences from various articles [29]. D’agostini used text mining to analyze Maersk and Mediterranean Shipping’s social media posts to reveal the marketing focus of their advertising content, determining whether it was skewed towards brand awareness, the launch of a new shipping service, or providing value orientation to stakeholders [30]. Bhat et al. used content mining-related techniques to analyze online social networks in depth [31]. Himelboim et al. created a thematic network on social media by means of keywords or hashtags for the search and selection of different collections of information [32].

2.3. Use of Social Media

Social media platforms are widely used globally in various countries and regions. Both the Eastern and Western worlds utilize these platforms for communication, marketing, and data analysis. Social media influences various aspects of economic life. Linda et al. underscored the influence of cultural values on SNS use, with collectivistic cultures like China placing a higher value on real-world social interactions and individualistic cultures like the U.S. showing a greater inclination towards virtual social interactions. Personal characteristics played a more significant role in predicting SNS use in the U.S. than in China, where cultural values and norms may take precedence [33]. Keller et al. found that while users are free to express their opinions and emotions on Twitter, there is also a large amount of disinformation [34]. Robson et al., studied brand post popularity on various social media platforms, including Twitter, Instagram, and Facebook, to understand the relationship between the post characteristics and popularity of start-ups [35]. Amin et al. investigated corporate financial disclosure via Twitter and found that board characteristics such as independence, gender diversity, and tenure were associated with the extent of Twitter usage for financial disclosure [36]. Jordan et al. analyzed gun advertisements on social media platforms like Twitter, noting the prevalence of protection themes in advertisements featuring women [37]. Deng et al. stated the perceived image of Western tourists is more related to architecture and representative attractions, while the perceived image of Eastern tourists is more focused on the daily life and culture of the residents. Western tourists tend to use stronger emotional vocabulary when expressing their emotions, whereas Eastern tourists are more introverted and use vocabulary with less emotional intensity. 
Western tourists may be more concerned with the characteristics of cities where modernization and tradition coexist, while Eastern tourists may be more concerned with people and nature as well as the environment of daily life [38]. Xing et al.’s study identified some commonalities in media use between China and the West. For example, social media platforms such as Facebook, Twitter, Weibo, WeChat, etc. provide convenient channels for Internet users to publish, discuss, and disseminate their perceptions, emotions, and opinions on topical social events. The emotions, opinions, and perceptions of online groups are expressed, discussed, diffused, and converged through social media platforms to form and evolve online public opinion, which interacts, negotiates, plays, and even clashes with public issues on the Internet [39].

3. Methodology and Research Design of Themes

3.1. Text Mining and the SNA Method

1. Text mining technology, alternatively known as text analytics or natural language processing, is a branch of artificial intelligence whose main function is to extract insights and information from large amounts of unstructured text data. The technique uses algorithms and statistical models to analyze text data, identify patterns and trends, and extract meaning from language. Text mining can also be used to identify trends and patterns in large amounts of data that are difficult or impossible to recognize manually. Steps in text mining include information retrieval and preprocessing, classification, and clustering and more advanced processes such as extracting relationships or complex patterns [40,41]. In the current era, the majority of data within businesses, industries, governmental bodies, and other organizations are archived in textual format within databases, which include semi-structured information [42,43].
2. Social network analysis, or SNA, is a theoretical approach applied in a multidisciplinary field dedicated to analyzing and modeling the relationships between various objects and other entities in nature and society in order to understand the social phenomena of individual behavior and their interactions. It can also be modeled on information dissemination to study the connections and interactions between information or knowledge [30,31]. In addition to studying relational data, SNA can also be used as a semantic mining technique for textual research, such as co-word network research, which is a type of social network analysis that focuses on the relationships between words and not only between individuals or organizations. It is a valuable tool for gaining insights into the structure and content of text datasets, and it can identify key themes and concepts.
Measures of network centrality, such as degree centrality and betweenness centrality, are chosen for the analysis of trending theme networks based on three key network theories: Bavelas’s theory of centralization [44], Freeman’s theory of centrality [45], and Granovetter’s theory of the strength of weak ties [46]. In the field of SNA, degree centrality, also referred to as degree, is a pivotal metric. This metric, which is determined by the number of direct connections a particular node has in the network, is of particular significance. Nodes with a large number of direct connections are considered to have a high degree of centrality and are at the center of the network. This metric highlights nodes with extensive connectivity, thereby indicating that these nodes have more direct interactions and closer ties with other nodes in the network. In a network with $n$ participants, the degree centrality equation for a word $Wo_i$ can be defined as follows:
\[ \text{Degree Centrality}(Wo_i) = \frac{c(n_i)}{n-1} \]
where $c(n_i)$ represents the number of words to which word $i$ is connected in the co-word network.
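The degree centrality definition above can be illustrated with a minimal Python sketch. It assumes that two words are linked whenever they co-occur in the same document, which is one common way to build a co-word network and is not necessarily the exact construction used in this study. The function names and the toy documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_coword_network(documents):
    """Link two words whenever they appear in the same document (assumed rule)."""
    neighbours = defaultdict(set)
    for tokens in documents:
        # Each unordered pair of distinct words in a document becomes an edge.
        for a, b in combinations(sorted(set(tokens)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return dict(neighbours)


def degree_centrality(neighbours):
    """Return c(n_i) / (n - 1) for every word in the network."""
    n = len(neighbours)
    if n < 2:
        return {word: 0.0 for word in neighbours}
    return {word: len(adj) / (n - 1) for word, adj in neighbours.items()}


# Hypothetical, already-segmented documents (illustrative only).
toy_documents = [
    ["enterprise", "data", "cloud"],
    ["enterprise", "strategy"],
    ["data", "platform"],
]
network = build_coword_network(toy_documents)
centrality = degree_centrality(network)
```

In this toy network of five words, "enterprise" and "data" are each linked to three of the four other words, so they obtain the highest degree centrality.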
Another indicator that characterizes centrality is betweenness centrality, which reflects the extent to which an actor controls the flow of resources between others. It measures how frequently a specific node lies on the shortest paths between pairs of other nodes in the network. A node on such a shortest path is considered to be in a favorable position, so nodes that appear on many shortest paths have higher betweenness centrality than nodes that do not. In a co-word network of size $n$, the betweenness centrality for a word $Wo_i$ can be represented by the following equation:
\[ \text{Betweenness Centrality}(Wo_i) = \frac{\sum_{j<k} d_{jk}(n_i)/d_{jk}}{[(n-1)(n-2)]/2} \]
where $i \neq j \neq k$, $d_{jk}(n_i)$ represents the number of shortest paths linking words $j$ and $k$ that contain word $i$, and $d_{jk}$ is the number of shortest paths linking word $j$ and word $k$.
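As a hedged illustration of the betweenness formula above, the following Python sketch counts shortest paths with breadth-first search on an unweighted, undirected word network. The function names and the star-shaped toy network are hypothetical, and the direct pairwise summation is chosen for readability, not efficiency.

```python
from collections import deque


def shortest_path_counts(neighbours, source):
    """BFS from source: distance and number of shortest paths to each reachable node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                count[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                # Every shortest path to node extends to a shortest path to nxt.
                count[nxt] += count[node]
    return dist, count


def betweenness_centrality(neighbours):
    """Sum d_jk(n_i) / d_jk over pairs j < k, divided by (n - 1)(n - 2) / 2."""
    nodes = sorted(neighbours)
    n = len(nodes)
    info = {v: shortest_path_counts(neighbours, v) for v in nodes}
    pairs = (n - 1) * (n - 2) / 2
    result = {}
    for i in nodes:
        dist_i, count_i = info[i]
        total = 0.0
        for pos, j in enumerate(nodes):
            dist_j, count_j = info[j]
            for k in nodes[pos + 1:]:
                if i in (j, k) or k not in dist_j or i not in dist_j:
                    continue
                # Word i lies on a shortest j-k path only if the distances add up.
                if dist_j[i] + dist_i[k] == dist_j[k]:
                    total += count_j[i] * count_i[k] / count_j[k]
        result[i] = total / pairs if pairs > 0 else 0.0
    return result


# Hypothetical star-shaped network: one hub word linked to three other words.
toy_network = {
    "enterprise": {"data", "strategy", "platform"},
    "data": {"enterprise"},
    "strategy": {"enterprise"},
    "platform": {"enterprise"},
}
betweenness = betweenness_centrality(toy_network)
```

Here every shortest path between two peripheral words passes through the hub, so the hub reaches the maximum normalized value while the peripheral words score zero.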

3.2. Research Process Design

The present study analyzes corpora in order to identify themes and analyze text. It covers the frequency distribution of hot words, the topic groupings of theme words, and the network of theme hot words. On this basis, the study analyzes and presents the characteristics and contents of the dissemination of public opinion on various platforms. The research process is designed to address the three research questions raised by this study and is shown in Figure 1.

4. Experiments and Results of Themes

4.1. Data Collection and Processing

1. DT is a critical academic and managerial topic. Many digital media platforms, social platforms, and learning communities have discussed DT extensively. This study selects three different types of representative social media platforms for the analysis of public discussion themes on DT: the Huawei forum, Sohu News, and the Zhihu forum.
The Huawei forum is a channel created to connect Huawei experts and external developers, discuss development practices, share industry trends, and answer developers’ questions; it reflects the discussion and cognition of professionals on a given topic. Sohu News, a large portal website with social media capabilities, allows numerous users to access news information promptly; it displays official news. Zhihu is an original content platform that gathers high-quality Q&A communities and creators on the internet, allowing people to share knowledge, experience, and insights and find answers; the Zhihu forum displays the comments of the active public. As such, these platforms represent the views of professionals, the views of the news media, and public opinion, respectively. These three platforms were also considered valuable resources for identifying the theme of “DT” for the following reasons. (1) Their posts are long, descriptive articles that are publicly available, unlike short comments. (2) Specialized platforms allow users to interact with people in their field of expertise, not just make friends. (3) Since these platforms publish information continuously, it is possible to analyze the theme “DT”. (4) These platforms are also popular with celebrities and various social sectors. By contrast, Twitter, Facebook, and WhatsApp are typical social media that are strongest in terms of private domain transformation and are used more for interpersonal communication and the creation of public relations. The three platforms selected for this study are a professional forum, a portal, and a knowledge-based community; they are mass media that focus on professional discussion, news dissemination, and knowledge sharing, serve as forums for public opinion with the authority to express opinions and release news, and have a certain degree of social influence.
2. This study explores the rules of information dissemination of popular DT themes and analyzes the themes’ content via public opinion on DT. Specifically, the data are collected by web crawlers. The text mining process consists of several stages: data acquisition, preprocessing, analysis, and visualization. In the data collection phase, the text data are collected from various sources and stored in a structured format. Then, word segmentation and other text mining preprocessing steps are performed to develop a standardized corpus. Based on the corpus, a matrix of relationships between words is built, a word network is created from this matrix, and its characteristics are computed. The text is then analyzed using theme clustering and the SNA method.
3. Basic description of the data. (1) In the study, “digital transformation” was used as the keyword for collecting relevant themes, articles, and long blog posts (1350 in total) from three platforms, namely the Huawei forum, Sohu News, and the Zhihu forum. According to preliminary measurements, the collected long blog posts were concentrated in 2021 to 2022. Figure 2 shows the frequency of posts for each platform. (2) This study extracted positive and negative words from the corpus of the three platforms. The representative terms are shown in Table 1. The study assigned a score of 1 to positive words and −1 to negative words. This resulted in a distribution graph of positive and negative words, as shown in Figure 3, which gives an overview of the sentiment expressed in the DT discussions on each platform. As can be seen from the figure, the distribution of the total scores for positive and negative sentiments is centered around the plotted curve. Based on the coefficients (0.0169, 0.0125), sentiments on the Huawei and Zhihu platforms were more positive, while on the Sohu News platform they were slightly negative (−0.0557), but not to a great extent.
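The lexicon-based scoring described in item 3 can be sketched in a few lines of Python. This is a minimal illustration that assumes posts are already segmented into words. The word lists below are hypothetical placeholders and are not the terms in Table 1, and the sketch stops at per-post totals because the source does not specify how the reported coefficients were derived.

```python
# Hypothetical lexicons (illustrative only; not the terms listed in Table 1).
POSITIVE_WORDS = {"innovation", "growth", "efficient"}
NEGATIVE_WORDS = {"risk", "failure", "obstacle"}


def sentiment_score(tokens):
    """Add 1 for each positive word and subtract 1 for each negative word."""
    score = 0
    for token in tokens:
        if token in POSITIVE_WORDS:
            score += 1
        elif token in NEGATIVE_WORDS:
            score -= 1
    return score


# Hypothetical, already-segmented posts.
toy_posts = [
    ["innovation", "drives", "growth"],
    ["transformation", "risk", "and", "failure"],
    ["efficient", "platform", "risk"],
]
post_scores = [sentiment_score(post) for post in toy_posts]
```

Plotting such per-post totals for each platform would give a distribution comparable in spirit to the one the text describes for Figure 3.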

4.2. Degree Distribution of Theme Hot Words

In this study, the degree distributions of the text data were calculated, yielding the distributions of the degree centrality and betweenness centrality of hot words. Both the degree centrality distribution and the distribution of interactions between words followed a power law, indicating that most words interacted with only a few other words. The degree distribution characteristics also revealed that the themes of the various forums are led by these few central words and revealed the spreading effect and statistical characteristics (Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9). Degree centrality captures the words that are directly connected to the hot words, while betweenness centrality captures the words that connect to the hot words through an intermediary word. This indicates that the degree centrality and betweenness centrality of hot words play a crucial role in the composition of blog posts. In addition, the greater the slope of the degree distribution, the higher the popularity of its high-degree and high-betweenness words, meaning that these words attract more connections, thus forming clusters of topics and driving the formation of topics. For example, in Figure 4 and Figure 5, the slope of −0.758 is greater (less negative) than −0.96, which indicates a stronger word connection. This shows that the topic of DT is more popular among Sohu News users than Huawei forum users. The topic is discussed more deeply and heatedly on Sohu News and spreads more widely. This is a statistical characteristic of topic dissemination.
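A slope such as those quoted for the degree distributions can be estimated by fitting a straight line to the distribution on log-log axes. The sketch below assumes an ordinary least-squares fit of log10(frequency) against log10(degree); the source does not state which fitting procedure was used, so this is only one plausible approach. The function name and the toy degree list are hypothetical.

```python
import math
from collections import Counter


def loglog_slope(degrees):
    """Least-squares slope and intercept of log10(frequency) versus log10(degree)."""
    freq = Counter(d for d in degrees if d > 0)  # zero degrees have no logarithm
    xs = [math.log10(k) for k in freq]
    ys = [math.log10(c) for c in freq.values()]
    if len(xs) < 2:
        raise ValueError("need at least two distinct positive degree values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical degrees: 100 words of degree 1, 10 of degree 10, 1 of degree 100.
toy_degrees = [1] * 100 + [10] * 10 + [100]
slope, intercept = loglog_slope(toy_degrees)
```

For this toy list the frequency falls by a factor of ten each time the degree grows by a factor of ten, so the fitted slope is close to −1.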

4.3. Theme Hot Words Mining and Explanation

This section focuses on theme mining and consists of two parts: (1) theme categorization to show the structure of the communication and (2) construction of a theme hot word network used to extract and show keywords. Because an excessive number of words would impair the display of the word network diagrams, only the words with the highest centrality rank are selected for visual display in this research phase. Three types of output are produced: a clustering graph, a hot word network diagram, and a hot word centrality degree table. The centrality degree table is extracted from the hot word network diagram. Furthermore, this part explains and addresses the three research questions, which are critical to this study.

4.3.1. Hot Words in Huawei Forum

According to Figure 10, the themes Artificial Intelligence (blue), Internet of Things (green), and Data Center (red) are the three theme clusters with the central term. This exhibits the thematic structure of the Huawei forum. Many of the concepts in the three theme clusters are interlinked, which indicates that they are interrelated and influential in practical applications. For example, “industrial internet” can be implemented and deployed with the help of “cloud platforms”; “digital transformation” requires the use of “big data” and “automation”; and “deep learning” is an important branch of artificial intelligence, which enables computer systems to simulate human learning processes to achieve higher-level tasks. Artificial Intelligence in Figure 10 also includes “automated driving”, “facial recognition”, “speech recognition”, “natural language processing”, and other technologies. The continuous development of these technologies will bring about profound changes in a wide range of areas of social life and work.
Figure 11 demonstrates the word “Operating Systems” at the center of the hot word network. In Figure 11, initialization, constructors, and static methods are the most relevant words. As can be seen from Table 2, the degree of “Operating Systems” is the highest at 1581, accounting for 0.153; Initialization is 1196, accounting for 0.115, etc. This shows that the operating system is the most basic concept, which manages computer hardware resources and provides services for applications; initialization is the preparation work before running a program to ensure that the program can be started correctly; software development involves designing, writing, testing, and maintaining application software; constructors are special methods in a class used to create object instances; and so on. In summary, Figure 11 illustrates that the Huawei forum has an in-depth technical discussion, a high degree of specialization, and is mainly concerned with computer science, focusing on landing and implementation. These core words indicate a deeper discussion of the technological drivers of the digital economy.

4.3.2. Hot Words in Sohu News

Figure 12 is a complex network diagram showing three theme clusters: one centered on digitization (red), with China (green) and technology (blue) as associated clusters. This exhibits the thematic structure of Sohu News. Some key concepts can be observed in Figure 12, such as “digitization”, “data”, and “strategy”, which are strongly linked to many others. Concepts such as “China” and “platform” can also be seen in the diagram, which may indicate that the network focuses on themes related to a specific country or region or explores the application and development of digital technologies in different cultural and economic contexts. The diagram also represents the digital transformation of enterprises or organizations, business models, and interactions with users, consumers, and services.
Figure 13 places “Enterprise” at the center of the hot word network. The figure also shows that “Digital Transformation” is one of the core concepts of this network, connecting several other nodes such as “Digitization”, “Technology”, and “Data”. This suggests that digital transformation plays a key role in modern business practices, involving all aspects of the organization and driving other concepts. In addition, Table 3 shows that “Enterprise” has the highest degree, 100,354 (a share of 0.201), making it the most important node in the network and indicating the increasing importance of enterprises in the digital era. Figure 13 also shows the interrelationships between the elements of the digital enterprise. These elements are connected by complex lines, indicating their interactions and dependencies. For example, “Services” is linked to “Technology”, “Markets”, and “Big Data”, showing the synergy and mutual impact of these elements. Judging by connection strength, the most relevant words in Figure 13 are Enterprise, Digitalization, Digital Transformation, Software, Technology, and Business. Overall, the central word in Figure 13 expresses Sohu News’ focus on the strategic level of enterprise operation.
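The bridging role described above, in which a word such as “Digital Transformation” connects several other nodes, is what betweenness centrality quantifies. The following is a minimal sketch of Brandes' algorithm for an unweighted, undirected graph. The toy edges are hypothetical and are not taken from Figure 13, and the paper does not say which tool computed its centrality values.

```python
from collections import deque

def betweenness_centrality(adj):
    """Unnormalized betweenness for an unweighted, undirected graph.

    adj: {node: set of neighbours}. Brandes' algorithm: a BFS from every
    source, followed by back-propagation of pair dependencies.
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                          # nodes in BFS visiting order
        preds = {v: [] for v in adj}        # shortest-path predecessors
        n_paths = {v: 0 for v in adj}       # shortest paths from source
        dist = {v: -1 for v in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:             # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical hot word links, for illustration only.
edges = [
    ("digital transformation", "enterprise"),
    ("digital transformation", "digitization"),
    ("digital transformation", "technology"),
    ("digital transformation", "data"),
    ("technology", "data"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

bc = betweenness_centrality(adj)
print(max(bc, key=bc.get))
```

In this toy graph the hub word lies on every shortest path between the other words except the directly linked pair, so it receives all of the betweenness.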

4.3.3. Hot Words on Zhihu Forum

Figure 14 is organized into three main theme clusters, with company (green), business (red), and ability (yellow) as the center words. These clusters mainly include “strategy”, “client”, “platform”, “product”, “Tradition”, “Marketing”, “Research and Development”, “manage”, and “Business”. These terms focus on the operational, managerial, and strategic aspects of a company, emphasizing the importance of marketing, customer relations, product development, and management. In addition, government (yellow) and ability (yellow) point to the role and influence of government in the business environment and to how firms can use their capabilities to respond to policy changes and market competition.
Figure 15 places “Digitalization” at the center of the hot word network. Table 4 shows that “Digitalization” has the highest degree, 12,060 (a share of 0.347), which means that it plays a central role among the other technologies and concepts. Table 4 also lists “Big Data”, “Cloud Computing”, and “Automation”, which suggests that they have been developed and applied on the basis of digitalization. Figure 15 further shows “AI”, “Machine Learning”, and “Internet of Things”, which may have been inspired by or directly benefited from digitalization and other higher-level technologies. These words are interconnected, and the connections reflect the current trend and direction of technology development, which is moving from fundamentals toward applications. The thickness of the connecting lines in Figure 15 indicates that “Digitalization”, “Informatization”, and “Big Data” are the most strongly correlated, which reflects their importance and interplay in the modern technology landscape. Overall, the central words in Figure 15 point to the central idea, which is the exploration of the connecting elements of digital development.
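Line thickness in a hot word network is typically driven by an edge weight. The following is a minimal sketch of one common choice, assuming the weight is the number of posts in which two hot words appear together. The paper does not specify its co-occurrence window, so this definition, the function name, and the sample posts are all assumptions.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_weights(documents, vocabulary):
    """Count, for each pair of hot words, the posts containing both.

    documents: list of token lists (one per post).
    vocabulary: set of hot words to keep as network nodes.
    """
    weights = Counter()
    for tokens in documents:
        present = sorted(set(tokens) & vocabulary)   # each word once per post
        for a, b in combinations(present, 2):
            weights[(a, b)] += 1
    return weights

# Hypothetical tokenized posts, not the study's corpus.
docs = [
    ["digitalization", "big data", "informatization"],
    ["digitalization", "informatization"],
    ["digitalization", "big data", "cloud computing"],
]
vocab = {"digitalization", "informatization", "big data", "cloud computing"}

weights = cooccurrence_weights(docs, vocab)
for pair, count in weights.most_common():
    print(pair, count)
```

The pairs with the largest counts would be drawn with the thickest lines; summing a word's edge weights gives its weighted degree.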

4.3.4. Commonality of Themes on the Three Social Media Platforms

Figure 16 and Figure 17 present the hot words that are common to all three platforms. These words reflect the commonality of the topics discussed on the three platforms. Figure 16 compares how often the common words appear on each platform. As can be seen from the figure, “Artificial Intelligence” appears most frequently in the Huawei forum. For Sohu News and the Zhihu forum, the most frequent word is “Digital transformation”. Figure 17 presents the cumulative frequency of the words common to the three platforms; “Artificial Intelligence” has the highest percentage, followed by “Digitalization” and “Digital transformation”. The first three words occupy 0.6818 of the common hot words, and the remaining hot words occupy 0.382. The percentage decreases from left to right along the word axis, showing a long-tailed distribution. This indicates that the current research hotspots are concentrated in AI- and digitization-related fields, which have received extensive attention and development in recent years.
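A cumulative frequency curve of the kind described for Figure 17 can be built by sorting the common words by frequency and accumulating their shares. The following is a minimal sketch; the counts are hypothetical and are not the values behind Figure 17.

```python
from itertools import accumulate

def cumulative_shares(counts):
    """Return [(word, cumulative_share)] sorted by descending frequency."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(counts.values())
    running = accumulate(freq for _, freq in ordered)
    return [(word, cum / total) for (word, _), cum in zip(ordered, running)]

# Hypothetical frequencies of common hot words.
counts = {
    "artificial intelligence": 50,
    "digitalization": 30,
    "digital transformation": 10,
    "big data": 6,
    "cloud computing": 4,
}
curve = cumulative_shares(counts)
head_share = curve[2][1]        # share held by the three most frequent words
tail_share = 1 - head_share     # share held by all remaining words
print(curve)
print(head_share, tail_share)
```

Head and tail shares computed this way are complementary, so they sum to 1; a long tail shows up as a curve that rises steeply over the first few words and then flattens.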
According to the common hot words in Figure 17, the common themes of the three platforms can be summarized as follows: (1) Digital strategy: digitization and digital transformation; (2) Data technology: big data, data center, digital technology, computing power, cloud service, cloud computing, and blockchain; (3) Intelligent manufacturing: intelligentization, automate, and informatization; (4) AI: Artificial Intelligence; and (5) Internet business model: mobile internet, electronic business, and Internet of Things. In total, five themes were discussed across the three social media platforms.
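The two steps summarized above, finding the hot words shared by all platforms and grouping them into themes, can be sketched as a set intersection followed by a lookup. The theme groupings below are copied from the text; the function names and the per-platform word lists are hypothetical and serve only as an illustration.

```python
# Theme groupings as listed in the text above.
THEMES = {
    "Digital strategy": {"digitization", "digital transformation"},
    "Data technology": {"big data", "data center", "digital technology",
                        "computing power", "cloud service", "cloud computing",
                        "blockchain"},
    "Intelligent manufacturing": {"intelligentization", "automate",
                                  "informatization"},
    "AI": {"artificial intelligence"},
    "Internet business model": {"mobile internet", "electronic business",
                                "internet of things"},
}

def common_hot_words(*platform_words):
    """Words that appear in every platform's hot word list."""
    return set.intersection(*(set(words) for words in platform_words))

def themes_covered(common_words, themes=THEMES):
    """Map each theme to the common words that fall under it."""
    return {name: sorted(words & common_words)
            for name, words in themes.items() if words & common_words}

# Hypothetical hot word lists, not the paper's data.
huawei = ["artificial intelligence", "big data", "operating systems",
          "cloud computing"]
sohu = ["enterprise", "artificial intelligence", "big data",
        "digital transformation", "cloud computing"]
zhihu = ["digital transformation", "artificial intelligence", "big data",
         "cloud computing"]

common = common_hot_words(huawei, sohu, zhihu)
covered = themes_covered(common)
print(covered)
```

In this toy input, “digital transformation” is missing from one list, so it drops out of the intersection and the Digital strategy theme is not reported.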

5. Discussions and Conclusions

This study analyzed the dissemination of the theme of DT on social platforms oriented toward nongovernmental organizations, professionals, and large media. It identified communication themes, summarized opinions on the hot themes, and revealed the thematic hot word characteristics of each platform. The discussion and conclusions are summarized as follows.

5.1. Discussions

This study draws the following conclusions from the measurement of hot word spreading in Section 4.2 and Section 4.3. These two sections demonstrate the statistical characteristics and thematic structural features of the three platforms and respond to RQ1 and RQ2 in detail. First, this study identified the primary themes and concepts within a DT text corpus and established their relationships. It found that words with high degree centrality and betweenness centrality in a hot word network tend to have strong connections, indicating that they are more likely to create new communication themes. They thus contribute more to the concept of digitalization or digitalization innovation than other words. Second, the degree distribution graphs illustrate that the proliferation of hot words follows a power law distribution pattern, i.e., a few words or pieces of content account for most of the popularity and themes, while a large number of other words or pieces of content create only a small number of themes. Also, at the same degree value, whether for degree centrality or betweenness centrality, the greater the slope, the greater the popularity of the word. Hence, when a theme becomes popular, keywords and phrases related to that theme may be used more frequently in discussions and content creation. By studying the distribution of the keywords of a hot topic, it is possible to gain a deeper understanding of information diffusion characteristics. Lastly, this study demonstrates the relative importance and connection strength of different concepts through the thickness and color of the lines in the hot word network diagrams. The thicker the line between two nodes, the stronger the relationship or the higher the connection strength. Moreover, these network structures demonstrate how these technologies and concepts are interconnected and influence the future of intelligent systems and digitization. The connections between them show the interaction and integration between the represented domains, which together drive digital transformation and development. Overall, Section 4.2 statistically illustrates that the degree distribution follows a keyword-centered, topic-driven pattern, and Section 4.3 exhibits the structure of the themes.
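A power law pattern of the kind discussed above appears as a roughly straight line when the degree distribution is plotted on log-log axes, and the slope of that line can be estimated by least squares. The following is a minimal sketch under that assumption. It is a rough diagnostic rather than a rigorous test, the paper does not describe its own fitting procedure, and the degree sequence is synthetic.

```python
import math
from collections import Counter

def loglog_slope(degrees):
    """Least-squares slope of log10(node count) against log10(degree)."""
    counts = Counter(d for d in degrees if d > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degree values")
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic degree sequence in which the number of nodes with degree k
# is exactly 1000 / k**2, i.e. a power law with exponent 2.
degrees = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
slope = loglog_slope(degrees)
print(round(slope, 3))
```

A slope near -2 here recovers the exponent built into the synthetic data; on real hot word networks the fit is noisier, especially in the sparse high-degree tail.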

5.2. Conclusions

Using public opinion content analysis, this study comes to the following conclusions. The theme cluster map in Section 4.3 shows the themes clustered on each platform, and the thematic network map shows the keywords of each platform’s theme network and a summary of their commonalities. In addition, based on the analysis of these clusters and theme networks, relevant conclusions and answers to RQ3 were drawn. The conclusions are elaborated upon as follows.
The three social media platforms have different attributes, and the public opinion on each represents the opinions and attitudes of a different social group. The Huawei forum is an ecological service platform for Huawei’s terminal business that brings together a wide range of professionals. The platform discusses the meaning, role, function, and necessity of digitalization and digital transformation from the perspective of experts and developers. Therefore, the Huawei forum mainly discusses the technological drivers of the digital economy from a micro level. Sohu News is both a portal and a new media site, and it is a well-known and influential media outlet. The platform presents news released by celebrities, opinion leaders, media professionals, government officials, and others. Therefore, the platform’s opinions have social influence. In this study, we found that Sohu News explores business operation strategies at a macro level. Zhihu is a platform on which the public engages in discussions around themes of interest. It is a media community with a wide range of participants, and its themes consist of discussions and opinions expressed by enthusiasts and interested members of the public. This study found that the Zhihu forum addresses the elements of digital development at the micro level. Finally, Section 4.3.4 concludes that the three platforms share a number of hidden themes that focus on elements of digital transformation such as digital technology, AI, and strategy.
Each platform demonstrates a variety of values. (1) For the Huawei forum, the core words illustrate in-depth discussions and exchanges on the technological drivers of the digital economy. The core values emphasize the convergence and innovation of management and operating models for digital transformation. (2) Sohu News focuses on the strategic level of enterprise operation. The core values emphasize the adaptability and importance of organizational change and management restructuring. (3) On the Zhihu forum, the core values emphasize the technological foundation and strategic direction of digital transformation as well as the importance of synergies among online and offline channels and the Internet of Things. To conclude, the values of these core terms illustrate that digital transformation is not just about technology but involves a comprehensive transformation of corporate strategy, business models, and organizational structures. They emphasize the data-centric drive and the propulsive nature of technology. They also illustrate that digital transformation is a complex process that requires enterprises to think and act at multiple levels to ensure that they remain competitive and sustainable and adapt to the requirements of the digital economy era.

5.3. Future Work and Research Limitations

This study analyzes the themes of public opinion on DT across different social media platforms. The views and opinions on each platform were mined, and the dissemination characteristics of popular theme words were computed. However, because of technical limitations, the textual information used in this paper is limited. Although text mining based on SNA is a powerful tool for analyzing unstructured text data, it involves several challenges, including data quality, bias, and fairness, which can make accurate and effective analyses difficult. A high-quality corpus can be difficult to collect, and the quality of the text data significantly affects the accuracy of text mining results. Text data may contain errors, inconsistencies, or missing information, making it difficult to extract meaningful insights. Moreover, bias and fairness in the texts are a concern, since biased data can lead to inaccurate results. Preprocessing tasks such as data cleaning, normalization, and standardization can help to mitigate these challenges; validating findings with human judgment and continuously monitoring and refining the text mining process can further improve the accuracy of research results. This study also did not apply further algorithms for deep text mining, such as LDA and K-Means. Future research should therefore apply more algorithms and expand the scope of information collection to support a deeper and more comparative analysis along further dimensions. Finally, this paper still has a limited understanding of international media practices, which makes it difficult to perform an in-depth analysis of media themes in both Chinese and Western media. This limits the global extent of our textual study. These are dimensions for future research to look into.
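The preprocessing tasks named above (data cleaning, normalization, and standardization) can be sketched in a few lines. The following is a minimal illustration using only the Python standard library. The function name, the cleaning rules, and the sample posts are assumptions rather than the authors' pipeline, and whitespace tokenization is only a stand-in, since Chinese-language posts would need a word segmenter.

```python
import re
import unicodedata

def clean_posts(posts, stopwords):
    """Normalize, de-duplicate, and tokenize raw posts.

    Returns a list of token lists, one per unique non-empty post.
    """
    seen = set()
    cleaned = []
    for post in posts:
        text = unicodedata.normalize("NFKC", post)   # unify full-width forms
        text = re.sub(r"<[^>]+>", " ", text)         # strip HTML tags
        text = re.sub(r"https?://\S+", " ", text)    # strip URLs
        text = re.sub(r"\s+", " ", text).strip().lower()
        if not text or text in seen:                 # drop empties and repeats
            continue
        seen.add(text)
        cleaned.append([t for t in text.split() if t not in stopwords])
    return cleaned

# Hypothetical raw posts, not the study's corpus.
posts = [
    "Digital  transformation is <b>here</b> https://example.com",
    "digital transformation is here",
    "",
    "ＡＩ drives the change",
]
tokens = clean_posts(posts, stopwords={"is", "the"})
print(tokens)
```

Cleaning before counting matters for the network measures: duplicated posts and leftover markup would otherwise inflate co-occurrence weights and, in turn, degree values.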

Author Contributions

Conceptualization, H.L.; methodology, C.W.; software, C.W.; visualization, Y.G.; formal analysis, H.L.; resources, R.L.; data curation, Y.G.; writing—original draft preparation, H.L.; supervision, R.L.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Zhejiang Provincial Philosophy and Social Sciences Planning Project (25NDJC068YBMS).

Data Availability Statement

Data came from publicly available websites. These data can be found here: https://developer.huawei.com/consumer/cn/forum/ (accessed on 5 September 2022), https://news.sohu.com/ (accessed on 5 September 2022), https://www.zhihu.com/ (accessed on 5 September 2022).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hess, T.; Matt, C.; Benlian, A.; Wiesböck, F. Options for Formulating a Digital Transformation Strategy. MIS Q. Exec. 2016, 15, 123–139. [Google Scholar]
  2. Nadkarni, S.; Prügl, R. Digital transformation: A review, synthesis and opportunities for future research. Manag. Rev. Q. 2020, 10, 539. [Google Scholar] [CrossRef]
  3. Ebert, C.; Duarte, C.H.C. Digital transformation. IEEE Softw. 2018, 35, 16–21. [Google Scholar] [CrossRef]
  4. Zhao, T.; Zhang, Z.; Liang, S.K. Digital economy, entrepreneurship, and high-quality economic development: Empirical evidence from Urban China. J. Manag. World 2020, 36, 65–76. [Google Scholar]
  5. Lobo, S.; Whyte, J. Aligning and Reconciling: Building project capabilities for digital delivery. Res. Policy 2017, 46, 93–107. [Google Scholar] [CrossRef]
  6. Laor, T. Breaking the silence: The role of social media in fostering community and challenging the spiral of silence. Online Inf. Rev. 2024, 48, 710–724. [Google Scholar] [CrossRef]
  7. Hou, Y.; Meng, F.; Wang, J.; Li, Y. Research on two-stage public opinion evolution configuration path based on fuzzy-set qualitative comparative analysis. Aslib J. Inf. Manag. 2024, 76, 677–693. [Google Scholar] [CrossRef]
  8. Wang, J.; Li, Y. Research on the propagation and governance of public opinion information under the joint action of internal and external factors. Aslib J. Inf. Manag. 2023, 75, 193–214. [Google Scholar] [CrossRef]
  9. Xing, Y.F.; Li, Y.; Wang, F.K. How privacy concerns and cultural differences affect public opinion during the COVID-19 pandemic: A case study. Aslib J. Inf. Manag. 2021, 73, 517–542. [Google Scholar] [CrossRef]
  10. Wang, S.; Song, Y. Chinese online public opinions on the two-child policy. Online Inf. Rev. 2018, 43, 387–403. [Google Scholar] [CrossRef]
  11. Yu, W.; Chen, N.; Chen, J. Characterizing Chinese online public opinions towards the COVID-19 recovery policy. Electron. Libr. 2022, 40, 140–159. [Google Scholar] [CrossRef]
  12. Calnan, M. Health policy and controlling COVID-19 in England: Sociological insights. Emerald Open Res. 2023, 1, 1–16. [Google Scholar] [CrossRef]
  13. Kosko, B. Fuzzy cognitive maps. Int. J. Man-Mach. Stud. 1986, 24, 65–75. [Google Scholar] [CrossRef]
  14. Zhang, Y.; Xu, L.; Lu, Z. Research on policy diffusion mechanism of Government Procurement of Public Services based on an MFCM. Kybernetes 2023, 52, 3986–4013. [Google Scholar] [CrossRef]
  15. Lawlor, A.; Girard, T.; Wodnicki, P.; Goode, M. Crisis management: Personal financial well-being and public attitudes toward government intervention. Int. J. Sociol. Soc. Social Policy 2023, 43, 777–794. [Google Scholar] [CrossRef]
  16. Núnez-Barriopedro, E.; Penelas-Leguía, A.; López-Sanz, J.M.; Loranca-Valle, M.C. A public service management model as an antecedent for citizen satisfaction and fiscal policy. Manag. Decis. 2024, 62, 725–739. [Google Scholar] [CrossRef]
  17. Garbarino, E.; Henry, P.; Kerfoot, S. Using attribution to foster public support for alternative policies to combat obesity. Eur. J. Mark. 2018, 52, 418–438. [Google Scholar] [CrossRef]
  18. Viet, N.T.; Banlasan, D.; Sy, D.T. Public Opinion Analysis for Management of Urban Infrastructure Systems: Social Media Data Mining Approach. In Sustainability Management Strategies and Impact in Developing Countries; Emerald: Leeds, UK, 2022; pp. 233–242. [Google Scholar]
  19. Sun, J.; Zeng, Z.; Li, T.; Sun, S. Analyzing the spatiotemporal coupling relationship between public opinion and the epidemic during COVID-19. Libr. Hi Tech 2024, 42, 1880–1904. [Google Scholar] [CrossRef]
  20. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. [Google Scholar]
  21. Zhang, B.; Xiao, P.; Yu, X. The Influence of Prosocial and Antisocial Emotions on the Spread of Weibo Posts: A Study of the COVID-19 Pandemic. Discret. Dyn. Nat. Soc. 2021, 2021, 8462264. [Google Scholar] [CrossRef]
  22. Zhou, Q.; Jing, M. Multidimensional mining of public opinion in emergency events. Electron. Libr. 2020, 38, 545–560. [Google Scholar] [CrossRef]
  23. Li, Q.; Zeng, Z.; Sun, S.; Cheng, C.; Zeng, Y. Constructing a spatiotemporal situational awareness framework to sense the dynamic evolution of online public opinion on social media. Electron. Libr. 2023, 41, 722–749. [Google Scholar] [CrossRef]
  24. De Carvalho, V.D.H.; Costa, A.P.C.S. Towards corpora creation from social web in Brazilian Portuguese to support public security analyses and decisions. Libr. Hi Tech 2024, 42, 1080–1115. [Google Scholar] [CrossRef]
  25. Yadav, H.; Sagar, M. Exploring COVID-19 vaccine hesitancy and behavioral topics using social media big-data: A text mining approach. Kybernetes 2023, 52, 2616–2648. [Google Scholar] [CrossRef]
  26. Li, Q.; Li, S.; Zhang, S.; Hu, J.; Hu, J. A Review of Text Corpus-Based Tourism Big Data Mining. Appl. Sci. 2019, 9, 3300. [Google Scholar] [CrossRef]
  27. Kumar, S.; Kar, A.K.; Ilavarasan, P.V. Applications of text mining in services management: A systematic literature review. Int. J. Inf. Manag. Data Insights 2021, 1, 100008. [Google Scholar] [CrossRef]
  28. Aloini, D.; Benevento, E.; Stefanini, A.; Zerbino, P. Process fragmentation and port performance: Merging SNA and text mining. Int. J. Inf. Manag. 2020, 51, 101925. [Google Scholar] [CrossRef]
  29. Wu, I.C.; Lin, Y.S. WNavis: Navigating Wikipedia semantically with an SNA-based summarization technique. Decis. Support Syst. 2012, 54, 46–62. [Google Scholar] [CrossRef]
  30. D’agostini, E. The dynamics of value propositions through social media engagement in maritime transport networks: Maersk vs Mediterranean Shipping Company. Marit. Bus. Rev. 2022, 8, 209–224. [Google Scholar] [CrossRef]
  31. Bhat, S.Y.; Abulaish, M. Analysis and mining of online social networks: Emerging trends and challenges. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2013, 3, 408–444. [Google Scholar] [CrossRef]
  32. Himelboim, I.; Smith, M.A.; Rainie, L.; Shneiderman, B.; Espina, C. Classifying Twitter topic-networks using social network analysis. Soc. Media+ Soc. 2017, 3, 2056305117691545. [Google Scholar] [CrossRef]
  33. Jackson, L.A.; Wang, J.L. Cultural differences in social networking site use: A comparative study of China and the United States. Comput. Hum. Behav. 2013, 29, 910–921. [Google Scholar] [CrossRef]
  34. Keller, F.B.; Schoch, D.; Stier, S.; Yang, J. Political astroturfing on twitter: How to coordinate a disinformation campaign. Political Commun. 2020, 37, 256–280. [Google Scholar] [CrossRef]
  35. Robson, S.; Banerjee, S. Brand post popularity on Facebook, Twitter, Instagram and LinkedIn: The case of start-ups. Online Inf. Rev. 2022, 47, 486–504. [Google Scholar] [CrossRef]
  36. Amin, M.H.; Mohamed EK, A.; Elragal, A. Corporate disclosure via social media: A data science approach. Online Inf. Rev. 2020, 44, 278–298. [Google Scholar] [CrossRef]
  37. Jordan, L.; Kalin, J.; Dabrowski, C. Characteristics of gun advertisements on social media: Systematic search and content analysis of Twitter and YouTube posts. J. Med. Internet Res. 2020, 22, e15736. [Google Scholar] [CrossRef]
  38. Deng, N.; Liu, J.; Dai, Y.; Li, H. Different cultures, different photos: A comparison of Shanghai’s pictorial destination image between East and West. Tour. Manag. Perspect. 2019, 30, 182–192. [Google Scholar] [CrossRef]
  39. Xing, Y.; Wang, X. Research Review of Foreign Studies on Group Polarization in Social Media. Inf. Sci. 2022, 40, 176–184. [Google Scholar]
  40. Anne, K.; Poteet, S.R. (Eds.) Natural Language Processing and Text Mining; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
  41. Gulo, C.A.S.J.; Rúbio, T.R.P.M. Text Mining Scientific Articles using the R. In Doctoral Symposium in Informatics Engineering; University of Porto: Porto, Portugal, 2015; pp. 60–69. [Google Scholar]
  42. Al Hattab, M. The dynamic evolution of synergies between BIM and sustainability: A text mining and network theory approach. J. Build. Eng. 2021, 37, 102159. [Google Scholar] [CrossRef]
  43. Gaikwad, S.V.; Chaugule, A.; Patil, P. Text mining methods and techniques. Int. J. Comput. Appl. 2014, 85, 42–45. [Google Scholar]
  44. Bavelas, A. Communication Patterns in Task-Oriented Groups. Acoust. Soc. Am. J. 1950, 22, 725–730. [Google Scholar] [CrossRef]
  45. Freeman, L. Centrality in social networks: Conceptual clarification. Soc. Netw. 1978, 1, 215–239. [Google Scholar] [CrossRef]
  46. Granovetter, M. The strength of weak ties. Am. J. Sociol. 1973, 78, 1360–1380. [Google Scholar] [CrossRef]
Figure 1. Research process design graph.
Figure 2. Frequency of posts on three social media platforms. (a) Huawei forum post frequency. (b) Sohu News post frequency. (c) Zhihu forum post frequency.
Figure 3. Positive and negative total word score distribution on three social media platforms. (a) Huawei forum’s positive and negative word scores. (b) Sohu News’ positive and negative word scores. (c) Zhihu forum’s positive and negative word scores.
Figure 4. Degree centrality distribution of Huawei forum hot words.
Figure 5. Degree centrality distribution of Sohu News hot words.
Figure 6. Degree centrality distribution of Zhihu forum hot words.
Figure 7. Betweenness centrality distribution of Huawei forum hot words.
Figure 8. Betweenness centrality distribution of Sohu News hot words.
Figure 9. Betweenness centrality distribution of Zhihu forum hot words.
Figure 10. Huawei forum hot words’ theme clustering.
Figure 11. Huawei forum hot word network diagram.
Figure 12. Sohu News hot words’ theme clustering.
Figure 13. Sohu News hot words’ network diagram.
Figure 14. Zhihu forum hot words’ theme clustering.
Figure 15. Zhihu forum hot words’ network diagram.
Figure 16. Common hot word structures on three social media platforms.
Figure 17. Common hot words’ distribution and cumulative frequency graph.
Table 1. The representative positive and negative words.

| Categories | Words |
| --- | --- |
| Huawei positive words | support, consistent, must, normative, important, simple, gain, normal, proper, right, etc. |
| Sohu positive words | distinguished, trusted, leading, prestigious, elevated, awarded, well-known, reputation, etc. |
| Zhihu positive words | featured, endorsement, strengths, innovative, efficient, stand out, helpful, etc. |
| Huawei negative words | disappear, lose, worry, misunderstand, rupture, worse, etc. |
| Sohu negative words | impact, block, risk, outburst, difficulty, narrow-viewed, etc. |
| Zhihu negative words | inadequate, ephemeral, pandering, subversive, backstage, complex, impractical, etc. |
Table 2. Huawei forum hot words’ top centrality degree.

| Number | Word | Centrality Degree | Percentage |
| --- | --- | --- | --- |
| 1 | Operating Systems | 1581 | 0.153 |
| 2 | Initialization | 1196 | 0.115 |
| 3 | Software Development | 713 | 0.069 |
| 4 | Constructors | 688 | 0.066 |
| 5 | Static Methods | 661 | 0.064 |
| 6 | Hardware and Software | 506 | 0.049 |
| 7 | Data Analysis | 467 | 0.045 |
| 8 | Operations | 346 | 0.033 |
| 9 | Development Tools | 327 | 0.032 |
| 10 | Technology Innovation | 323 | 0.031 |
Table 3. Sohu News hot words’ top centrality degree.

| Number | Word | Centrality Degree | Percentage |
| --- | --- | --- | --- |
| 1 | Enterprise | 100,354 | 0.201 |
| 2 | Digitalization | 49,003 | 0.098 |
| 3 | Digital Transformation | 46,580 | 0.093 |
| 4 | Software | 37,465 | 0.075 |
| 5 | Data | 37,290 | 0.075 |
| 6 | Technology | 35,363 | 0.071 |
| 7 | Organizations | 28,617 | 0.057 |
| 8 | Product | 27,082 | 0.054 |
| 9 | Systems | 25,497 | 0.051 |
| 10 | Business | 23,978 | 0.048 |
Table 4. Zhihu forum hot words’ top centrality degree.

| Number | Word | Centrality Degree | Percentage |
| --- | --- | --- | --- |
| 1 | Digitalization | 12,060 | 0.347 |
| 2 | Informatization | 3355 | 0.096 |
| 3 | Artificial Intelligence | 2360 | 0.068 |
| 4 | Big Data | 2151 | 0.062 |
| 5 | Intelligent | 1948 | 0.056 |
| 6 | Automation | 1859 | 0.053 |
| 7 | Digital Technology | 1310 | 0.038 |
| 8 | Cloud Computing | 920 | 0.026 |
| 9 | Online and Offline | 872 | 0.025 |
| 10 | Internet of Things | 838 | 0.024 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liao, H.; Wang, C.; Gu, Y.; Liu, R. A Text Data Mining-Based Digital Transformation Opinion Thematic System for Online Social Media Platforms. Systems 2025, 13, 159. https://doi.org/10.3390/systems13030159
