What Causes the Virtual Agglomeration of Creative Industries?

Abstract: The agglomeration paradigm for creative industries has fundamentally changed under the digital economy, giving rise to a new form of virtual agglomeration within these industries. This study explores the causes of this virtual agglomeration. We collected online Chinese news texts related to the virtual agglomeration of the creative industry, used text mining to identify nine factors affecting its formation, and refined the internal and external factors for an analytical framework based on the PEST (political, economic, social, technological) and value-chain models. We then combined the relevant literature and the creative industry’s development practices, analyzed the mechanism of each driving factor, and constructed a driving-force model for the creative industry’s virtual agglomeration. The external driving factors were government policy planning, the digital economic environment, emerging consumer demand, and the application of innovative technology; the internal factors were the digitalization of cultural resources, flexible manufacturing, digital marketing and promotion, online interactive services, and virtual platform facilities. Each factor was found to contribute to virtual agglomeration through different internal mechanisms. This study’s findings have theoretical and practical value for cultivating the modes of virtual agglomeration within creative industries.


Introduction
With the development of digital technologies, virtual agglomeration has provided a new avenue for the geographic agglomeration of creative industries. Various creative industry enterprises, institutions, and users rely on digital technologies to conveniently share creative ideas, develop creative resources, and design products in virtual spaces. For example, Yipin Witkey, a creative service platform in China, gathers creative service providers in fields such as design, development, planning, and marketing, and it offers more than 300 kinds of creative services. The platform has more than 22 million registered users and has supported the creative needs of numerous enterprises [1]. Meanwhile, the US-based creative gaming platform Steam brings together thousands of game creators and publishers; it offers about 30,000 games played by over 90 million users around the world [2]. These types of virtual agglomeration practices extend beyond traditional concepts of geographical agglomeration; we call this phenomenon "the virtual agglomeration of creative industries" [3,4].
Research on virtual agglomeration began with the EU's program for the Competitiveness of Enterprises and SMEs (COSME) [5,6]. Virtual agglomeration is also sometimes referred to as "e-clusters" or "virtual clusters" [7][8][9]. Brown and Lockett [10] noted that the emergence of the Internet provided firms with platforms on which they could form interconnected digital enterprise communities. Moreover, Mason et al. [11] noted that the development of information and communications technology (ICT) has connected organizations that were previously scattered across regions, thereby forming e-clusters in the information space. Based on the literature [12][13][14], we define virtual agglomeration [...]
[...] agglomeration based on a specific location reduces spatial distance in enterprise collaboration, thereby reducing production and transaction costs and increasing production efficiency [34]. Such agglomeration requires enterprises to move into the cluster, and the benefits they obtain can outweigh the costs [14]. Creative industries, however, have difficulty overcoming the boundary constraints of geographic space and providing benefits for those outside the cluster location [35]. Moreover, the geographic agglomeration of the creative industry exists in a relatively closed-loop geographical space [36], and the number and scale of enterprises it can accommodate are limited [14]. This limitation weakens the ability of industry clusters to form more efficient organizational systems.
Virtual agglomeration transcends geographic boundaries [37] and provides a new direction for the development of creative industry agglomeration. It is characterized by data capitalization, online information, demand fragmentation, transaction ubiquity, production flexibility, platform "mega-ization," and full chain integration [38]. Such characteristics are consistent with basic creative industry attributes, such as production digitalization, communication networking, consumption informatization, and innovation globalization [35,39]. The COSME project initially characterized virtual agglomeration as a collection of companies that can mobilize their respective capabilities and operate a virtual enterprise, such that market opportunities are shared among all member companies [5]. Later, Passiante and Secundo [13] defined a "virtual cluster" as a community in which suppliers, distributors, service providers, and customers can cooperate or compete based on technological business networks. Wu and Li [40] suggested that virtual agglomeration can not only incorporate geographical agglomeration but also integrate resources across regions and industries, thus accelerating enterprise innovation. Virtual agglomeration has thus become an important mode of industrial development in the digital economy. Wei and Wu [41] proposed that the virtual clustering of creative industries is embedded in various relationships (such as those between production and consumption, global and local, and online and offline) and that connectivity and flexibility are essential indicators for measuring its competitiveness. As the main carriers of virtual agglomeration, network platforms are characterized by openness and inclusivity [42] and facilitate low-cost, high-efficiency connections between numerous subjects, such as creative product/service providers, creative consumers, and related institutions.
Drawing on the intelligent algorithms, well-defined rules, and user-friendly service functions of creative network platforms, participants can also achieve precise information matching, network collaboration, and value trading [3]. The emergence of virtual agglomeration in creative industries thus creates a need for systematic research on the causes of its formation.
Previous research has focused on specific creative industries, using qualitative methods to summarize the factors affecting the formation of virtual agglomeration. Studying China's animation industry, Du et al. [20] classified the driving factors of virtual agglomeration into three levels: social environment (market demand, network infrastructure, industrial policy); industry (internal organization, institutional construction, cultural construction, network platform, brand effect); and enterprise (information system construction, scale, ability to attract capital, creative ability, learning-conversion ability, technological ability). Wang [21], meanwhile, found that the internal factors affecting virtual agglomeration in the information service industry were survival pressure, interest attraction, user demand, and economic scale; the external driving factors were technical, government, and credit guarantees. Studying the fashion industry, Crewe [23] found that digitally mediated communication technologies significantly influenced virtual agglomeration. Finally, Xiao [22] studied virtual agglomeration in digital publishing and found that it relies on the dual elements of technology and capital, which include capabilities such as digital technology R&D, transformation of technological achievements, collaborative innovation, and capital operation.
Given the differences in the attributes of different creative industries, no consistent findings have emerged regarding the causes of virtual agglomeration. In addition, few studies have systematically developed theoretical frameworks to study the mechanisms of these factors. The study in [3] only proposed the mechanisms of network coordination, freedom of participation, and guarantee of trust, based on interviews and questionnaires in creative industries, and did not analyze these mechanisms in depth. Because the creative industry's virtual agglomeration is a new type of relationship formed in a virtual space, existing research conclusions regarding geographic agglomeration cannot be used to explain its formation. As such, there is a need to establish a new theoretical framework to analyze its driving factors and mechanisms.
Unlike traditional agglomeration, virtual agglomeration is built on network platforms using big data [43], which has benefits but also introduces high complexity [44]. Qualitative methods alone are therefore not entirely suitable for this research. With ongoing advances in big-data-related research, quantitative methods based on web crawling and text mining offer a workable alternative [45,46], and researchers are increasingly using big data text mining to obtain results that better reflect reality [47][48][49]. Taking the Shenzhen Dafen Oil Painting Village as an example, Yuan et al. [50] studied the convergence between creative industry parks and tourism using ROST Content Mining 6 to analyze network text. Applying text analysis to 100,000 tweets, Casadei et al. [51] investigated the interlinked relationships between the fashion industry and locations in four major cities (London, New York, Milan, and Paris). In light of such research, the present study used big-data-related methods to identify the driving factors of the virtual agglomeration of creative industries.
The existing research on the creative industry's virtual agglomeration is fragmented. The main driving factors and mechanisms have not been sufficiently explored and related theoretical frameworks have not been built. Therefore, this study used big data analysis to identify these external and internal driving factors and used PEST and value-chain models to build an analytical framework. The PEST model is a classic tool used for analyzing the external macroenvironment [52]. Lee and Rim [53], for example, used a PEST-SWOT model to analyze the external and internal factors affecting human resource development systems in Korea's software industry. Xie et al. [54], meanwhile, used PEST to refine the external factors affecting the entrepreneurial ecosystem of the online cultural industry. Porter introduced the value-chain model in 1985 to analyze the internal industrial environment; this model assumes that an enterprise's production and operation activities create value, and these different but interrelated activities form a dynamic value-creation process called a value chain [55]. The value chain for the creative industry refers to the organic whole constituted by each value-added link among creative industries [56]. Horng et al. [57] used a value-chain model to analyze the value-creation processes of Taiwan's cultural and creative industries. Madudová [58], meanwhile, used a value-chain model to evaluate industry participants, value-creation processes, supporting environments, and stakeholder relationships in the creative industries of advertising, architecture, and design.
In light of the above, this study analyzed the causes of the creative industry's virtual agglomeration using big data web-crawler technology and text mining to identify the key factors. These were refined into internal and external factors based on PEST and value-chain models, and the mechanism of each factor was analyzed. Finally, a unified driving-force model of the creative industry's virtual agglomeration was built to provide a theoretical basis for guiding creative industry development.

Research Design
News is a record of noteworthy events [59]. It includes not only events but also details about participants, time, place, causes, and processes. It is characterized by newness, importance, proximity, significance, and level of interest [60]. As an extension of traditional news, online news has quickly adopted information technology, featuring fast, multifaceted, multichannel, multimedia, and interactive formats [61]. This study therefore used Chinese online news materials to investigate the virtual agglomeration of creative industries. The specific process is described below. First, Octoparse's web-crawler technology was used to collect Chinese online news reports related to the creative industry's virtual agglomeration to form a news corpus.
Second, Python-based text mining was performed on the collected data, including text preprocessing, feature-word extraction, text vectorization, and text clustering. This helped identify the key factors driving the creative industry's virtual agglomeration.
Third, the identified factors were refined into internal and external factors based on PEST and value-chain models. Each mechanism was analyzed based on the related literature and industry practices. In this way, a driving-force model of the creative industry's virtual agglomeration was built. Figure 1 shows the research framework.
Sustainability 2021, 13

Research Methods
Feldman and Dagan proposed the concept of text mining in 1995 [62]. It involves extracting patterns that are interesting or useful to the user from unstructured text information [63]. We used text mining to process and analyze the online news corpus; the specific steps are as follows.
The first step is text preprocessing. The complexity of the collected information made text processing and mining difficult. It was therefore necessary to preprocess the text to normalize the data. Preprocessing included text regularization, Chinese word segmentation, part-of-speech tagging, and stop-word removal [64].
The second step is feature-word extraction, which involves extracting representative and informative words from the preprocessed text. Using a dedicated algorithm makes this process more efficient and accurate and mitigates the curse of dimensionality in text clustering [65]. Common feature-word extraction methods include the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, the TextRank algorithm, and topic models. Among them, TF-IDF is one of the most widely used and effective methods. Its principle is that the most meaningful feature items for distinguishing texts are words that appear frequently in a given text but infrequently in other texts. This approach extracts feature words based on word-frequency statistics combined with weighted calculation [66]. We therefore used TF-IDF to extract feature words from the text information.
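As an illustration of the TF-IDF principle described above, the following minimal sketch weights each term by its relative frequency in a document multiplied by the logarithm of the inverse document frequency. This is one common TF-IDF variant and an assumption on our part; the source does not state which weighting formula or library was used, and the function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    docs: list of token lists (one list per document).
    Returns one {term: weight} dict per document, using
    weight = (count / doc_length) * log(n_docs / doc_frequency).
    This is one common variant, assumed here for illustration.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Toy example with English glosses standing in for segmented Chinese words.
docs = [
    ["platform", "platform", "user"],
    ["user", "policy"],
    ["user", "technology"],
]
weights = tf_idf(docs)
```

In this toy corpus, "user" occurs in every document and therefore receives a weight of zero, while "platform" is frequent in the first document only and receives the highest weight there.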
The third step is text vectorization. Machine-learning models cannot directly process the text of the feature words, so the text must be converted into a numerical representation. Text vectorization represents a text as a series of vectors that can express textual semantics. In 2013, Google released the word-embedding tool Word2vec [67]. Word2vec can be trained on vocabularies of millions of words and corpora of hundreds of millions of words, and the resulting word vectors provide a good measure of word-to-word similarity. Word2vec is a shallow neural network that relies on two model architectures, Continuous Bag of Words and Skip-gram, and two efficient training methods, Negative Sampling and Hierarchical Softmax [68]. We used the Word2vec tool to calculate word vectors.
The last step is text clustering. After text vectorization, the word vectors can be grouped with a clustering algorithm. A widely used clustering algorithm is the k-means algorithm [69], which divides the original samples into different clusters by calculating the similarity between samples. Samples in the same cluster are similar to each other, and those in different clusters are different [70]. The main goal of the algorithm is to find compact, well-separated clusters, which is achieved by minimizing the squared-error loss function:

$$\mathrm{RSS}_k = \sum_{x \in w_k} \lVert x - t(w_k) \rVert^2, \qquad \mathrm{RSS} = \sum_{k=1}^{K} \mathrm{RSS}_k$$

where x is the position of a point in the cluster, w_k is the kth cluster, t(w_k) is the center point of the kth cluster, RSS_k is the loss function of the kth cluster, and K is the number of clusters. The goal of algorithm optimization is to choose a reasonable clustering scheme to minimize RSS.
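The k-means procedure and its RSS loss can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in the study: it assumes squared Euclidean distance, initial centers drawn at random from the samples, and a fixed random seed, none of which are specified in the source. The names `kmeans`, `sq_dist`, and `mean` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=100, seed=0):
    """Cluster points into k groups; returns (labels, centers, rss)."""
    rng = random.Random(seed)
    # Assumption: initialise the centers with k distinct samples.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the centers are stable too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    # RSS: sum over clusters of squared distances to the cluster center.
    rss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, rss


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centers, rss = kmeans(points, k=2)
```

On this toy data the two nearby pairs end up in separate clusters, and the loss is the sum of the squared distances of each point to its pair's midpoint.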
In k-means clustering, the number of clusters k can be determined using the silhouette coefficient: the k value with the largest silhouette coefficient is selected. The silhouette coefficient of the ith vector is

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

where a(i) is the average distance from the ith vector to the other points in the same cluster, and b(i) is the average distance from the ith vector to all points in the nearest cluster. The value of the silhouette coefficient lies in the range [−1, 1]. A value closer to 1 means that cohesion and separation are relatively good, and a value closer to −1 means that cohesion and separation are relatively bad [71]. The total silhouette coefficient of the clustering result is obtained by averaging the silhouette coefficients of all of the data points. We used the k-means algorithm for clustering analysis and the silhouette coefficient method to determine k in the clustering process.
People.com.cn and Chinanews.com are two major national online news platforms in China. Their authority, timeliness, and depth and breadth of content have been widely recognized in the field of news communication. We therefore used web crawler technology to collect online news about the creative industry's virtual agglomeration from these two platforms through keyword retrieval and formed an online news corpus.

Search Term Selection
To ensure accurate, comprehensive data collection, core search terms were selected based on the combination of three keyword categories: virtual network, creative industry, and agglomeration space [26,72]. This generated 891 search terms, such as "online game community", "virtual design services platform", "online publishing cluster", "digital advertising platform", and "online fashion space". Relevant news texts were collected from People.com.cn and Chinanews.com based on these search terms. Table 1 shows the composition of the search terms.
Octoparse's data collection system was used to collect news reports related to the creative industry's virtual agglomeration from People.com.cn and Chinanews.com using the search terms identified above. Octoparse is based on a distributed cloud computing platform that can automatically search, filter by time and category, and collect news information. The online news texts were collected during the period 1-5 January 2021. Each record comprised four parts: title, release time, media source, and body content. Initially, 995 news texts were collected, including 604 from People.com.cn and 391 from Chinanews.com. After duplicate and irrelevant items were eliminated, 812 news texts remained, published between 1 January 2012 and 31 December 2020.
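The search-term construction described at the start of this section, in which one keyword from each of the three categories is combined, can be sketched as a Cartesian product. The keyword lists below are hypothetical placeholders chosen to reproduce the quoted examples; the study's actual categories are those in its Table 1, and the function name `build_search_terms` is an assumption.

```python
from itertools import product

# Hypothetical keyword lists for illustration only.
VIRTUAL_NETWORK = ["online", "virtual", "digital"]
CREATIVE_INDUSTRY = ["game", "design services", "publishing", "advertising", "fashion"]
AGGLOMERATION_SPACE = ["community", "platform", "cluster", "space"]


def build_search_terms(virtual, industry, space):
    """Combine one keyword from each category into a search term.

    Duplicates are dropped while the generation order is preserved.
    """
    combos = (" ".join(parts) for parts in product(virtual, industry, space))
    return list(dict.fromkeys(combos))


terms = build_search_terms(VIRTUAL_NETWORK, CREATIVE_INDUSTRY, AGGLOMERATION_SPACE)
```

With these placeholder lists the product yields 3 × 5 × 4 = 60 terms; the same construction applied to the study's full keyword lists would yield its 891 terms.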

Text Preprocessing
First, we used regular expressions to remove English letters, numbers, and Chinese and English punctuation from the text; the pattern was set to [a-zb-fhj-qs-uw-z0- . Specialized terms such as 4G, 5G, AI, and AR were thus exempted from the text regularization process and retained. Second, the Jieba toolkit in Python was used to segment the regularized text. Third, we expanded the related stop words based on the Harbin Institute of Technology's stop word database, forming a database containing 2245 stop words. Based on this, stop words were removed during text segmentation.
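The regularization and stop-word steps above can be sketched as follows. Because the study's exact pattern is not fully reproduced in the text, this sketch uses its own approach, which is an assumption: it keeps runs of Chinese characters plus a small whitelist of specialized terms and drops everything else. The whitelist `KEEP_TERMS` and the function names are hypothetical, and word segmentation (done with Jieba in the study) is not shown.

```python
import re

# Hypothetical whitelist of specialized terms to retain.
KEEP_TERMS = ("4G", "5G", "AI", "AR")

_KEEP = "|".join(re.escape(t) for t in KEEP_TERMS)
# Match a whitelisted term that is not part of a longer alphanumeric token,
# or a run of CJK unified ideographs.
_TOKEN = re.compile(
    rf"(?<![A-Za-z0-9])(?:{_KEEP})(?![A-Za-z0-9])|[\u4e00-\u9fff]+"
)


def regularize(text):
    """Drop Latin letters, digits, and punctuation, except whitelisted terms."""
    return "".join(_TOKEN.findall(text))


def remove_stop_words(tokens, stop_words):
    """Filter a list of already segmented tokens against a stop-word set."""
    return [tok for tok in tokens if tok not in stop_words]
```

For instance, `regularize` keeps "5G" and "AI" inside a Chinese sentence while removing a year such as "2020", ordinary English words, and punctuation.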

Feature-Word Extraction
After preprocessing, the news corpus was transformed from text data into a collection of words. The resulting vocabulary, however, was still too large and noisy to reveal the key factors affecting the creative industry's virtual agglomeration, so the words needed to be filtered by feature-word extraction. The TF-IDF algorithm was used to extract feature words, and the extracted feature words and their TF-IDF values were saved as a "featurewords.csv" file for subsequent analysis.

Text Vectorization
Selecting a suitable Chinese corpus and training word vectors on it for natural language processing requires substantial resources and computing power. The Chinese Information Processing Institute of Beijing Normal University and the DBIIR Laboratory of Renmin University of China provide open-source word vectors trained on dozens of Chinese corpora from different domains (e.g., Baidu Baike, Chinese Wikipedia, Zhihu Q&A, and Weibo) under a variety of training settings [73]. Among them, the Zhihu Q&A word vectors were trained on the Zhihu Q&A corpus, which has a total size of 2.1 GB and a vocabulary of 1117K words. The Skip-gram model in the Word2vec toolkit was used to train these vectors (basic settings: window size = 5, dynamic window = yes, subsampling = 1 × 10^−5, low-frequency word threshold = 10, iterations = 5, negative sampling = 5). We loaded the pretrained Zhihu Q&A word vectors, looked up the vectors for the feature words in the "featurewords.csv" file, and constructed the word vector matrix.
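The lookup step described above can be sketched as follows. The sketch assumes that the pretrained vectors are stored in the plain word2vec text format (a header line with vocabulary size and dimension, then one word and its values per line); this format, the function names, and the toy data are assumptions for illustration, not details confirmed by the source.

```python
def load_vectors(lines, wanted):
    """Parse word2vec text-format lines, keeping only the wanted words.

    lines: iterable of strings; the first line is "<vocab_size> <dim>".
    wanted: set of words whose vectors should be kept.
    """
    it = iter(lines)
    dim = int(next(it).split()[1])
    vectors = {}
    for line in it:
        word, *values = line.split()
        # Skip malformed rows whose length does not match the header.
        if word in wanted and len(values) == dim:
            vectors[word] = [float(v) for v in values]
    return vectors


def build_matrix(feature_words, vectors):
    """Return (kept_words, matrix) for feature words that have a vector."""
    kept = [w for w in feature_words if w in vectors]
    return kept, [vectors[w] for w in kept]


# Toy stand-in for a pretrained vector file (English glosses, 2 dimensions).
toy_file = ["3 2", "user 0.1 0.2", "platform 0.3 0.4", "policy 0.5 0.6"]
feature_words = ["user", "missing", "policy"]
vectors = load_vectors(toy_file, set(feature_words))
kept, matrix = build_matrix(feature_words, vectors)
```

Feature words without a pretrained vector, such as "missing" in the toy data, are simply left out of the matrix; how the study handled such out-of-vocabulary words is not stated.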

Text Clustering
We used the k-means algorithm to cluster the word vector matrix of the news text. It was necessary to set the number of clusters (k value) in the results manually because the k-means clustering algorithm is an unsupervised learning method with no category attribute. Our study used the silhouette coefficient method to determine and improve the accuracy of the selection of the k value. The variation of the silhouette coefficient for different k values is shown in Figure 2.

As shown in Figure 2, the silhouette coefficient fluctuates between 0.09 and 0.45 as the value of k increases. When the value of k is 9, the silhouette coefficient peaks at 0.45, which is the closest the value is to 1. Thus, the value of k is 9. Table 2 shows the clustering results when the k value is 9.
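The selection of k by the silhouette coefficient can be sketched as follows: the mean silhouette coefficient is computed for each candidate clustering, and the k with the largest value is kept. The sketch assumes Euclidean distance and assigns a coefficient of 0 to points in single-member clusters (a common convention); the source does not state either choice. The function names and toy data are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton contributes s(i) = 0
        # a: mean distance to the other members (distance to itself is 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def choose_k(points, labelings):
    """labelings maps each candidate k to its cluster labels; return best k."""
    scores = {k: silhouette_score(points, labs) for k, labs in labelings.items()}
    return max(scores, key=scores.get), scores


points = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [20, 1]]
labelings = {2: [0, 0, 1, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
best_k, scores = choose_k(points, labelings)
```

In the toy data there are three well-separated pairs of points, so the three-cluster labeling scores higher than the two-cluster one.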

[Table 2. Clustering results for k = 9; columns: representative words, title.]
In Table 2, the feature words of news texts related to the creative industry's virtual agglomeration are clustered into nine categories.
Cluster 1 (digital marketing and promotion) includes words such as "media", "marketing", "promotion", "short video", "Weibo", and other words that reflect the characteristics of digital marketing and promotion.
Cluster 6 (government policy planning) includes "central", "government department", "national", "strategy", "measures", "intellectual property", and other words that reflect the characteristics of government policies and strategic planning.
Cluster 7 (online interactive service) includes "user", "topic", "communication", "interaction", "service", and other words that reflect the characteristics of communication and interaction services with users.
Cluster 9 (digitalization of cultural resources) includes "digital culture", "digital creativity", "digital museum", "digital art museum", and other words that reflect the characteristics of the digital transformation of cultural resources.

Internal and External Driving Factors
The results of the text clustering indicate that the creative industry's virtual agglomeration is influenced by nine driving factors: digital marketing and promotion, virtual platform facilities, emerging consumer demand, application of innovative technology, digital economic environment, government policy planning, online interactive service, flexible manufacturing, and the digitalization of cultural resources. These factors are too dispersed, however, to be explained at a systemic level. In creative industries, virtual agglomeration cannot appear without the combined effects of internal and external factors. We therefore used the PEST and value-chain models to extract the driving factors from the external and internal aspects of this virtual agglomeration.
First, according to the PEST framework, the creative industry's virtual agglomeration is driven by four external factors: government policy planning (political factor), digital economic environment (economic factor), emerging consumer demand (social factor), and the application of innovative technology (technological factor).
Second, with the rapid development of the digital economy, creative enterprises and institutions in the value chain increasingly rely on digital technology to achieve value creation leaps in virtual space and spawn a new form of virtual agglomeration. We therefore analyzed the internal driving factors of this virtual agglomeration based on an analysis of primary (design, production, marketing, services) and auxiliary (basic guarantees) activities in the value chain. The creative industry's value chain in virtual space is created based on the activities "digitalization of cultural resources → flexible manufacturing → digital marketing and promotion → online interactive services" and "virtual platform facilities." These five aspects constitute the internal driving factors of the creative industry's virtual agglomeration.
This virtual agglomeration is jointly driven by strong external and internal factors. The external driving factors are government policy planning, digital economic environment, emerging consumer demand, and application of innovative technology; the internal factors are the digitalization of cultural resources, flexible manufacturing, digital marketing and promotion, online interactive service, and virtual platform facilities (Figure 3).
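The classification above can be written down as a small lookup structure. The pairing of each internal factor with a single value-chain activity follows the order in which the activities and factors are listed in this section and is our reading rather than an explicit statement in the text; the names `EXTERNAL_FACTORS`, `INTERNAL_FACTORS`, and `classify` are hypothetical.

```python
# PEST dimension -> external driving factor
EXTERNAL_FACTORS = {
    "political": "government policy planning",
    "economic": "digital economic environment",
    "social": "emerging consumer demand",
    "technological": "application of innovative technology",
}

# Value-chain activity -> internal driving factor
# (assumed one-to-one pairing, based on the listing order in the text)
INTERNAL_FACTORS = {
    "design": "digitalization of cultural resources",
    "production": "flexible manufacturing",
    "marketing": "digital marketing and promotion",
    "services": "online interactive services",
    "basic guarantees": "virtual platform facilities",
}


def classify(factor):
    """Return ("external", PEST dimension) or ("internal", activity)."""
    for dimension, name in EXTERNAL_FACTORS.items():
        if name == factor:
            return "external", dimension
    for activity, name in INTERNAL_FACTORS.items():
        if name == factor:
            return "internal", activity
    raise KeyError(factor)
```

For example, `classify("flexible manufacturing")` returns the pair `("internal", "production")`.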

Mechanism Analysis
We combined the relevant literature and industry practices with the analysis of the mechanisms of each factor, as described below.
First, the driving mechanism of government policy planning is primarily reflected in digital strategy [74] and network supervision [75]. Regarding digital strategy, governments have formulated various strategic plans for the digital development of creative industries [76][77][78], which include accelerating the construction of new infrastructure, encouraging digital innovation in creative enterprises, and stimulating online creative consumption, all of which provide opportunities for the creative industry's virtual agglomeration. Regarding network supervision, the government has improved laws and regulations related to online transactions for creative products, safeguarded subjects' legitimate interests [79], and created a suitable environment for creative production and trading activity on virtual platforms [80].
Second, the driving mechanism of the digital economic environment is reflected in industrial digital transformation [81] and innovation [82]. Regarding industrial digital transformation, the digital economy promotes the transformation of traditional creative industries. It provides more formats, such as digital publishing [83], film [84], and music [85], thus promoting the creative industry's virtual agglomeration. Regarding industrial digital innovation, innovation in the design, production, marketing, and overall service provision of creative industries has accelerated under the digital economy, producing intelligent, personalized digital creative products and services. This will increasingly attract creative enterprises and users who gather in virtual spaces.
Third, the driving mechanism of emerging consumer demand is predominantly reflected in the pursuit of spiritual culture [86] and the rise of online consumption [87]. With improved living standards, more people are pursuing diversified and personalized spiritual and cultural consumption, and a rich diversity of digital cultural and creative products and services in the virtual space can meet these changing consumption demands [88] in a way that promotes the creative industry's virtual agglomeration. Various new online consumption models have also emerged, such as live e-commerce, creative content e-commerce, and creative community e-commerce. Increasingly, consumers are pursuing green, healthy consumption through this cloud-life model, which promotes prosperity in the online creative consumption market and accelerates the virtual agglomeration of creative industries.
Fourth, the driving mechanism of the application of innovative technology is reflected in innovation support [89] and service intelligence [90]. Regarding innovation support, technologies such as AI, blockchain, cloud computing, big data, and 5G accelerate the formation of cooperative networks among creative enterprises [91], promote the efficient flow of resources [92], accelerate profit acquisition, and attract more creative enterprises to participate in virtual agglomeration. Regarding service intelligence, creative enterprises can use big data technology to record users' browsing and purchasing behaviors in real time, capture creative consumption preferences, and portray crowd characteristics, thus bringing more personalized intelligent services to users.
Fifth, the driving mechanism of the digitalization of cultural resources appears in digital cultural inheritance [93,94] and activation [95]. Regarding the former, by combining traditional cultural resources with digital technology, creative entities can create virtual forms of creative products that not only protect these heritage and cultural resources but also allow more users to experience traditional culture free from the constraints of physical space. Regarding the latter, creative entities can browse, learn about, and utilize digital cultural resources; be stimulated to create content [96]; and develop digital creative derivatives to enrich the content ecology of creative industries [97], thus promoting the creative industry's virtual agglomeration.
Sixth, the driving mechanism of flexible manufacturing is reflected in personalized customization [98,99] and flexible supply [100]. A flexible production system can achieve personalized customization through user-demand analysis, creative design, product optimization, logistics distribution, and service feedback. It can also greatly enhance the market responsiveness and conversion ability of creative enterprises [101]. With advanced flexible production systems and efficient resource matching on virtual platforms, creative enterprises can flexibly produce and supply according to market demand. This can help to avert inventory backlogs, reduce operating costs [102], and encourage more creative subjects to cluster on virtual platforms.
Seventh, the driving mechanism of digital marketing and promotion is reflected in precise cross-media communication [103] and immersive virtual experience [104]. Creative companies use diversified digital media channels, such as Weibo, short videos, and WeChat, to form links with users. They can therefore more accurately explore online markets, tap into user needs, deliver product ideas, and shape brand culture in virtual space. New communication channels can greatly improve product sales and brand stickiness for creative enterprises and attract more creative institutions and users to gather in virtual spaces [19]. Via digital technologies such as virtual reality, augmented reality, mixed reality, 5G, 4K, and 8K, users can fully immerse themselves in creative experiences, better sense the fit between products and their needs, and enhance their satisfaction [105].
Eighth, the driving mechanism of online interactive services is reflected in two-way service feedback [98,106] and shorter service distance [107]. The linear one-way service model between creative enterprises and users has been gradually replaced by an online two-way interactive service model in the social media environment [108]. Creative enterprises are increasingly inviting users into live rooms, where they join fan groups to share new product expectations, product usage tips, and service satisfaction. This feedback can help improve products and services. The mobile Internet has also shortened the distance between creative enterprises and users. Users have more opportunities to participate in digital product creation, service development, and event hosting [102], thus promoting the virtual agglomeration of creative industries.
Ninth, the driving mechanism of virtual platform facilities is reflected in network collaboration [109], freedom of participation [3], and the guarantee of trust [110]. Diversified service functions, efficient information sharing [111], and convenient transaction processes of virtual platforms [112] help establish close network collaborations between creative enterprises, thereby reducing production costs, improving service efficiency, and enhancing value creation [113]. Open and inclusive platform features also lower the participation threshold for creative entities [114], causing a more diverse range of subjects to gather on platforms. Meanwhile, reviews on virtual platforms, reputation ratings, violation penalties, and other rules can constrain unethical or illegal online behaviors and protect the rights and interests of all participants [115].
Based on this analysis, Figure 4 shows our proposed driving-force model of the creative industry's virtual agglomeration.
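The nine factors and their mechanisms analyzed in this section can also be written down as a small data structure, which may be convenient when coding texts or tabulating results against the model. The Python sketch below is illustrative only: the class name `DrivingFactor`, its field names, and the helper `factors_by_category` are hypothetical names introduced here, while the factor and mechanism labels are taken from the analysis above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivingFactor:
    """One driving factor of the model (hypothetical representation)."""
    name: str
    category: str      # "external" or "internal"
    mechanisms: tuple  # mechanism labels as named in the mechanism analysis


DRIVING_FACTORS = (
    DrivingFactor("government policy planning", "external",
                  ("digital strategy", "network supervision")),
    DrivingFactor("digital economic environment", "external",
                  ("industrial digital transformation", "industrial digital innovation")),
    DrivingFactor("emerging consumer demand", "external",
                  ("pursuit of spiritual culture", "rise of online consumption")),
    DrivingFactor("application of innovative technology", "external",
                  ("innovation support", "service intelligence")),
    DrivingFactor("digitalization of cultural resources", "internal",
                  ("digital cultural inheritance", "digital cultural activation")),
    DrivingFactor("flexible manufacturing", "internal",
                  ("personalized customization", "flexible supply")),
    DrivingFactor("digital marketing and promotion", "internal",
                  ("precise cross-media communication", "immersive virtual experience")),
    DrivingFactor("online interactive services", "internal",
                  ("two-way service feedback", "shorter service distance")),
    DrivingFactor("virtual platform facilities", "internal",
                  ("network collaboration", "freedom of participation", "guarantee of trust")),
)


def factors_by_category(category):
    """Return the names of all factors in the given category."""
    return [f.name for f in DRIVING_FACTORS if f.category == category]


if __name__ == "__main__":
    for category in ("external", "internal"):
        print(category, factors_by_category(category))
```

Keeping the external/internal split as a field rather than as two separate lists makes it straightforward to regroup the factors along other dimensions later.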

Conclusions and Implications
This study analyzed the causes of the virtual agglomeration of creative industries. We used web crawler technology to collect relevant online news reports and used text mining to extract nine key factors. We also constructed an analytical framework for the internal and external factors based on the PEST and value-chain models. On that basis, we combined the relevant literature and industry practices and analyzed the mechanism of each factor. Finally, we built a driving-force model of the creative industry's virtual agglomeration.
The main findings are as follows. First, the creative industry's virtual agglomeration was found to result from a combination of internal and external factors. The external driving factors are government policy planning, the digital economic environment, emerging consumer demand, and the application of innovative technology; the internal factors are the digitalization of cultural resources, flexible manufacturing, digital marketing and promotion, online interactive services, and virtual platform facilities. Second, each driving factor was found to promote the creative industry's virtual agglomeration through different mechanisms. Specifically, (1) government policy planning operates through digital strategy and network supervision [3]; (2) the digital economic environment operates through industrial digital transformation and innovation; (3) emerging consumer demand operates through the pursuit of spiritual culture and the rise of online consumption; (4) the application of innovative technology operates through innovation support and service intelligence [84]; (5) the digitalization of cultural resources operates through digital cultural inheritance and activation [97]; (6) flexible manufacturing operates through personalized customization and flexible production and supply [92]; (7) digital marketing and promotion operate through cross-media communication and immersive virtual experiences; (8) online interactive services operate through two-way service feedback and shortened service distance [116]; and (9) virtual platform facilities operate through network collaboration, freedom of participation, and the guarantee of trust [117]. The creative industry's virtual agglomeration arises from the influence of these factors and mechanisms.
Among the driving factors found in this study, government policy planning, the digital economic environment, and emerging consumer demand are key factors that also affect the geographic agglomeration of creative industries [118][119][120][121]. However, factors such as office rent, transportation facilities, and urban cultural facilities that affect this geographic agglomeration [122][123][124] are no longer significant in virtual agglomeration because virtual agglomeration forms in virtual space, which eliminates the restriction of geographical boundaries [37]. Region-related factors are therefore unlikely to have a critical impact. This study also identified some new factors, such as the application of innovative technology, the digitalization of cultural resources, flexible manufacturing, digital marketing and promotion, online interactive services, and virtual platform facilities. These emerging factors are worth exploring in future research.
This study enriches the research on the creative industry's virtual agglomeration. First, we addressed its causes, discovered the internal and external driving factors and their mechanisms through big data text mining and theoretical analysis, and constructed a driving-force model. The proposed model can lay a theoretical foundation for future research on the creative industry's virtual agglomeration. Second, we used web-crawler technology and text-mining methods (text preprocessing, feature-word extraction, text vectorization, and text clustering) to identify the key factors affecting virtual agglomeration, which supports the reliability of the results. The use of big-data technology in this study broadens the choice of methods for related future research.
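The four text-mining steps named above (text preprocessing, feature-word extraction, text vectorization, and text clustering) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the toy English sentences, the stopword list, the smoothed TF-IDF weighting, and the k-means clustering with cosine similarity are all assumptions chosen for illustration, and Chinese news texts would additionally require a word-segmentation step during preprocessing.

```python
import math
import re
from collections import Counter

# Toy English stand-ins for news texts (hypothetical; not data from the study).
DOCS = [
    "government policy supports digital infrastructure for creative industries",
    "consumers join live e-commerce and online creative consumption",
    "government policy strengthens supervision of creative platforms",
    "online consumption grows as consumers buy creative products",
]
STOPWORDS = {"for", "of", "and", "as"}  # assumed, minimal stopword list


def preprocess(text):
    """Text preprocessing: lowercase, split into word tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z\-]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(token_lists):
    """Text vectorization: unit-length TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def feature_words(vocab, vector, top_n=3):
    """Feature-word extraction: the top_n highest-weighted terms of one text."""
    ranked = sorted(zip(vocab, vector), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, weight in ranked[:top_n] if weight > 0]


def cosine(a, b):
    """Cosine similarity; a plain dot product because inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Text clustering: k-means with cosine similarity, seeded by the first k texts."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each text to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Recompute each centroid as the normalized mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                norm = math.sqrt(sum(x * x for x in mean)) or 1.0
                centroids[c] = [x / norm for x in mean]
    return labels


tokens = [preprocess(d) for d in DOCS]
vocab, vectors = tfidf_vectors(tokens)
labels = kmeans(vectors, k=2)
for doc, vec, label in zip(DOCS, vectors, labels):
    print(label, feature_words(vocab, vec), "|", doc)
```

In this toy setting the policy-related and consumption-related sentences fall into separate clusters, and the feature words of each cluster suggest a label for it; in practice, the number of clusters and the interpretation of each cluster require researcher judgment.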
Our findings also have practical implications for guiding the creative industry's virtual agglomeration. First, governments should focus on constructing digital infrastructure for creative industries and encourage creative enterprises to explore the commercial applications of digital facilities, such as 5G, cloud computing, big data, and the Internet of Things. In addition, to create a fair and mutually trusting digital economic environment, governments should innovate in terms of digital governance methods by guiding the digital innovation of creative industries, maintaining order in the digital economy, and cracking down on illegal digital activities [125]. Second, creative entities should strengthen R&D on generic technologies, algorithms, and software and establish a network cooperation system for interconnectivity between the different parts of the creative-industry chain [126]. Creative entities should explore new technologies to creatively transform traditional cultural resources, develop creative digital products, and optimize the virtual ecology of creative industries [127]. Third, virtual platforms should continue to improve their intelligent services through more advanced algorithms, improved rules [128], and humanized functions. This will help to achieve more accurate information matching, network collaboration, and value trading among creative subjects [129]. Virtual platforms should also innovate in terms of their business models, promote the efficient use of creative resources, and integrate overall creative industry processes. Fourth, creative enterprises can change the creation mode of digital products and meet consumers' social and creative needs by inviting them to participate in content creation, as well as developing digital products, such as interactive videos, movies/TV, and books. Creative enterprises should aim to customize digital creative products with precision based on user preferences, thus promoting positive interactions between supply and demand on virtual platforms.
This study has some limitations. First, we did not empirically test the effects of the factors and their mechanisms. Second, this study was conducted based on Chinese online news texts related to the creative industry's virtual agglomeration. Whether our conclusions are equally applicable to Western creative industries needs to be verified. These limitations suggest directions for future research.