Technology Hotspot Tracking: Topic Discovery and Evolution of China’s Blockchain Patents Based on a Dynamic LDA Model

Abstract: Tracking scientific and technological (S&T) research hotspots can help scholars grasp the current state of research and identify how a field develops over time. It helps generate new ideas and supports the writing of research proposals and scientific papers. Patents are important S&T resources that reflect the development status of a field. In this paper, we use topic modeling, topic intensity, and evolution computing models to discover research hotspots and development trends in blockchain patents. First, we propose a time-based dynamic latent Dirichlet allocation (TDLDA) modeling method based on a probabilistic graphical model and knowledge representation learning for patent text mining. Second, we present a computational model, topic intensity (TI), that expresses topic strength and evolution. Finally, the point-wise mutual information (PMI) value is used to evaluate topic quality. We obtain 20 hot topics through TDLDA experiments and rank them with the strength calculation model. The topic evolution model is then used to classify topic trends as rising, falling, or stable. The experiments show that 8 topics trended upward, 6 trended downward, and 6 remained stable or fluctuated. Compared with the baseline methods, TDLDA performs best when the number of topics K is 40 or less. TDLDA is thus an effective topic model for extracting hot topics and evolution trends from blockchain patent texts; it can help researchers grasp research directions more accurately and improve the quality of project applications and papers in the blockchain domain.


Introduction
Patents, papers, and S&T projects are important scientific and technological resources and play an important role in social progress. Patents often reveal research hotspots earlier than other sources. Therefore, mining patent topics can help researchers generate new ideas, which is of scientific value for paper writing and project applications.
Patents are among the most important achievements of scientific and technological development. The quantity and quality of patents are important to enterprise development and technological progress [1]. Analyzing patent text can effectively reveal the development status and research hotspots of a particular field, and patent analysis can promote the rapid development of research and academic fields [2]. In recent years, blockchain technology has developed rapidly, has attracted the attention of both the business community and academia, and has produced a number of papers and patent applications. Blockchain technology integrates distributed computing [3], encryption and
(1) We propose a time-based dynamic latent Dirichlet allocation (TDLDA) modeling method based on a probabilistic graphical model and knowledge representation learning for patent text mining.
(2) We present a computational model, topic intensity (TI), that expresses topic strength and evolution.
(3) To evaluate topic quality, we use the point-wise mutual information (PMI) [9] value, which measures word association, to test the effectiveness of the proposed TDLDA model.
(4) We show that TDLDA is an effective model for extracting hot topics and evolution trends from blockchain patent texts, which can help researchers grasp the research direction of blockchain technology more accurately.
The remainder of the paper is organized as follows. Section 2 briefly summarizes related technologies and the state of theoretical research on topic discovery and evolution for technology hotspot tracking. Section 3 describes the foundational theory and the architecture for mining blockchain patent text with the time-based dynamic LDA model. Section 4 presents the data source and experimental process in detail. Section 5 describes extensive comparative experiments that evaluate the effectiveness of the proposed TDLDA model on patent datasets. Section 6 presents the conclusions and discusses future research directions.

Related Works
Topic extraction and evolution analysis of patent documents can reveal the state of research in a field and bring new ideas to scientific and technological (S&T) researchers. Many researchers have mined valuable knowledge from patent data, providing technical support for knowledge dissemination and expert decision-making. Table 1 summarizes recent research from three aspects: patent text mining, the technical level, and topic evolution trends.

Table 1. A brief summarization of patent mining and topic evolution.

Fields | Methods | Contexts
Patent text mining | [10] Convolutional neural networks | Patent information management and knowledge mining
Patent text mining | [11] Consumer-driven product function | Identify customer needs in real time
Patent text mining | [12] Preposition-based semantic analysis | Overcome the limitations of keyword-based network analysis
Patent text mining | [13] Feature vector space model (FVSM) | Patents related to Internet of Things (IoT) technology
Patent text mining | [14] Social network analysis | Identify underlying topics in humanoid robot technology
Patent text mining | [15] Bayesian network modeling | Analyze patent documents related to artificial intelligence
Technical level | [16] WELQLC-QR | Improve similar-semantics question retrieval
Technical level | [17] Valid and reliable LDA | Make LDA more accessible to communication researchers
Technical level | [18] AC-LDA | Capture the relationships hidden in local sentences
Technical level | [19] NN-LDA | Create an intrusion detection system
Technical level | [20] Intelligent LDA | Explore associated trends over time
Technical level | [21] NMF-LDA | Provide users with suggestions and selections
Technical level | [22] swLDA | Mimic human perception behavior
Topic evolution trend | [23] Manifold learning-based model | Explore topic-sentiment associations in online news
Topic evolution trend | [24] Deep learning language model | How keyword semantics can invoke topic evolution
Topic evolution trend | [25] Semantic-aware dynamic model | Monitor the evolution of author interest
Topic evolution trend | [26] Graph-based theory | Research front detection and topic evolution
Topic evolution trend | [27] Computational content analysis | CSR-related conversations in the Twitter-sphere
Topic evolution trend | [28] Functional count data model | The patent data of Apple company

In terms of patent text mining, many researchers have worked to discover the knowledge contained in granted patents. Li combined convolutional neural networks with word embeddings to classify patent text, an essential task in patent information management and knowledge mining [10]. Trappey used social media and patent mining to deploy a consumer-driven product technology function that can identify customer needs in real time [11].
An proposed a preposition-based semantic analysis method to derive technological intelligence from patents; it overcomes the limitations of existing keyword-based network analysis, and its potential was demonstrated through an application [12]. Lei proposed a feature vector space model (FVSM) and demonstrated its performance and effectiveness on patents related to Internet of Things (IoT) technology [13]. Kumari proposed a topic modelling and social network analysis method for humanoid robot technology publications and patents: LDA-based topic modelling identifies the underlying topics in the sub-areas of the field, and social network analysis detects the important and influential sub-areas [14]. Sangsung proposed a Bayesian network modeling and factor analysis method to analyze patent documents related to disaster artificial intelligence technology [15].
At the technical level, LDA is an effective text mining tool that uses a probabilistic graphical model to mine latent topics in text. Many scholars have studied LDA, which has been applied effectively in various fields with good results. Liu proposed an integrated retrieval framework for similar questions, named word-semantic embedded label clustering-LDA with question life cycle (WELQLC-QR), to improve the retrieval of semantically similar and highly popular questions [16]. Maier proposed a valid and reliable methodology to make LDA topic modeling more accessible to communication researchers and to ensure compliance with disciplinary standards, and developed a brief hands-on user guide for applying LDA topic modeling [17]. Wan proposed an association constrained LDA (AC-LDA) that effectively captures co-occurrence relationships, including those hidden in local sentences, and further increases the extraction rate of fine-grained aspects and opinion words [18]. Elkhadir proposed a new robust median NN-LDA based on the generalized mean for building an intrusion detection system (IDS) that outperformed many LDA variants [19]. Bastani proposed an intelligent approach based on latent Dirichlet allocation (LDA) to analyze CFPB consumer complaints, extract latent topics from the narratives, and explore their trends over time [20]. Cao proposed a probabilistic matrix factorization recommendation approach that fuses neighborhood selection based on latent Dirichlet allocation; it provides users with suggestions and selections, improves recommendation performance, and alleviates the data sparsity problem [21]. Jeon proposed a saliency-weighted LDA (swLDA) model that mimics human perception behavior and remarkably outperforms previous LDA models in image categorization [22].
The evolution trend of topics can effectively track the development of research hotspots. It lets researchers understand the current state of a field along the time dimension, which can inspire them and inform paper writing and project applications. Scholars have studied the techniques and applications of trend prediction and evolution with good results. For example, Xu proposed a manifold learning-based model to explore topic-sentiment associations and their evolution over time in the online news domain, which can visualize the hidden sentiment dynamics of topics in a low-dimensional space [23]. Hu applied a deep learning language model, Google Word2Vec, to examine whether and how keyword semantics invoke or affect topic evolution [24]. Yang proposed an ordering-sensitive and semantic-aware dynamic author topic model that monitors the evolution of author interest in timestamped documents [25]. Xu proposed a research front detection and topic evolution approach based on graph theory that uses topological structure and the PageRank algorithm [26]. Chae used computational content analysis to understand topics in CSR-related conversations in the Twitter-sphere and to find directions for future research [27]. Kim proposed a functional count data model to analyze patent data; using Apple's patent data, the authors investigated the company's technological structure and evolution through high-dimensional visualization with harmonic components generated by functional data analysis [28].
The related work shows that research on topic discovery and evolution has made progress, but few studies use topic models to analyze the development of patent technology. This paper applies topic modeling to analyze the evolution of blockchain topics, which offers guidance for research on blockchain patents.

Foundation Theory for LDA
Latent Dirichlet allocation (LDA) is an unsupervised machine learning technique based on a probabilistic graphical model [29]. It can discover latent topics in large document collections and works well in text mining, especially short-text processing [30]. Let D be the set of documents, T the set of topics, and w a word in a document. The relationship among topic, document, and word in LDA can then be expressed as follows [31]:
$$p(w \mid d) = \sum_{t \in T} p(w \mid t)\, p(t \mid d) \qquad (1)$$
where $p(w \mid d)$ is the probability of word w in document d, $p(t \mid d)$ is the probability of topic t in document d, and $p(w \mid t)$ is the probability of word w in topic t. The formula shows that topic t is an intermediate layer between the document and vocabulary layers, so the topic-word relationship can be obtained by estimating the probability of each word in the document. The document-topic and topic-word distributions are multinomial, and each is given a Dirichlet prior [32], whose probability density function is
$$\mathrm{Dir}(\vec{p} \mid \vec{\alpha}) = \frac{\Gamma\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} p_k^{\alpha_k - 1} \qquad (2)$$
where $\vec{\alpha} = (\alpha_1, \ldots, \alpha_K)$ are the Dirichlet parameters and $p_k$ is the probability of the k-th topic. Suppose that there are K topics in m documents. Each document has its own topic distribution, drawn from a Dirichlet distribution with parameter α, and each topic has its own topic-word distribution, drawn from a Dirichlet distribution with parameter β. Each word in a document is assigned a topic. The probabilistic graphical model is shown in Figure 1 [33].
In Figure 1, θ denotes the document-topic distribution; for the i-th document $d_i$ it can be expressed as $\theta_i = (\theta_{i1}, \theta_{i2}, \ldots, \theta_{iK})$. $\phi_k$ denotes the word distribution of the k-th topic, where $k \in [1, K]$, and can be expressed as $\phi_k = (\phi_{k1}, \phi_{k2}, \ldots, \phi_{kV})$, where V is the vocabulary size. $z_{m,n}$ is the topic assigned to the n-th word of the m-th document. The observed word $w_{m,n}$ is finally drawn from the topic-word distribution $\phi_{z_{m,n}}$ selected by $z_{m,n}$.
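To make the generative process of Figure 1 and Equation (1) concrete, the following minimal Python 3 sketch (assuming NumPy is available; the function names `generate_lda_corpus` and `word_given_doc` are illustrative, not from the paper) samples a toy corpus from the standard LDA generative process and computes $p(w \mid d)$ as the matrix product of θ and ϕ.

```python
import numpy as np


def generate_lda_corpus(n_docs, doc_len, n_topics, vocab_size,
                        alpha=0.2, beta=0.1, seed=0):
    """Sample a toy corpus from the standard LDA generative process."""
    rng = np.random.default_rng(seed)
    # phi[k] ~ Dirichlet(beta): word distribution of topic k
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    # theta[m] ~ Dirichlet(alpha): topic distribution of document m
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)
    docs, assignments = [], []
    for m in range(n_docs):
        # z_{m,n}: topic of the n-th word in document m
        z = rng.choice(n_topics, size=doc_len, p=theta[m])
        # w_{m,n} ~ phi_{z_{m,n}}: each word is drawn from its topic's distribution
        words = [int(rng.choice(vocab_size, p=phi[k])) for k in z]
        docs.append(words)
        assignments.append(z.tolist())
    return theta, phi, docs, assignments


def word_given_doc(theta, phi):
    """Equation (1): p(w|d) = sum_t p(w|t) p(t|d), i.e. theta @ phi."""
    return theta @ phi
```

Each row of `word_given_doc(theta, phi)` is a probability distribution over the vocabulary, which reflects the role of topics as the intermediate layer between documents and words.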

Time-Based Dynamic LDA Blockchain Patent Text Topic Construction Process
This section introduces the construction process of TDLDA. The patent document set is denoted $D = \{d_1, d_2, \ldots, d_n\}$, the vocabulary $w = \{w_1, w_2, \ldots, w_V\}$, the dynamic time periods $Y = \{y_1, y_2, \ldots, y_m\}$, and the K generated topics $T = \{t_1, t_2, \ldots, t_K\}$; the k-th topic obtained in time period $y_i$ is denoted $T^{y_i}_k$. The process by which the TDLDA topic model generates topics and their topic words is given in Algorithm 1.
Output: Topic set $T = \{t_1, t_2, \ldots, t_K\}$ within each time period and the top n topic words of each topic ranked by probability.
The above process can be summarized as follows.
(1) Traverse the vocabulary $w^{y_i}$ of the patent documents in each time period and randomly assign a topic number to each word.
(2) Traverse the patent document set D again and perform Gibbs sampling on each document: resample the topic of each word $w^{y_i}$, update its topic number, and update the corresponding word-topic counts of the patent document set. Gibbs sampling draws samples from a joint distribution by repeatedly sampling each variable conditioned on the current values of all the others; here it re-estimates, word by word, the probability that a word belongs to each topic.
(4) Finally, get K patent topics in each time period and the corresponding keywords of each topic.
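Because the body of Algorithm 1 is not reproduced here, the following is only a sketch under an explicit assumption: that the per-period step of TDLDA can be approximated by running a standard collapsed Gibbs sampler for LDA independently on each time slice, with the α = 0.2 and β = 0.1 used in this paper. The names `gibbs_lda` and `lda_by_period` are hypothetical, and the code is not the authors' implementation.

```python
import numpy as np


def gibbs_lda(docs, vocab_size, n_topics, alpha=0.2, beta=0.1,
              n_iter=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA on one time slice.

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (theta, phi) estimated from the final topic assignments.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # document-topic counts
    nkw = np.zeros((n_topics, vocab_size))  # topic-word counts
    nk = np.zeros(n_topics)                 # total words per topic

    # Step (1): assign a random topic number to every word
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for m, doc in enumerate(docs):
        for w, k in zip(doc, z[m]):
            ndk[m, k] += 1
            nkw[k, w] += 1
            nk[k] += 1

    # Step (2): resample each word's topic from its full conditional
    for _ in range(n_iter):
        for m, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[m][n]
                ndk[m, k] -= 1  # remove the current assignment from the counts
                nkw[k, w] -= 1
                nk[k] -= 1
                p = (ndk[m] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[m][n] = k
                ndk[m, k] += 1  # add the new assignment back
                nkw[k, w] += 1
                nk[k] += 1

    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = (nkw + beta) / (nk[:, None] + vocab_size * beta)
    return theta, phi


def lda_by_period(docs_by_period, vocab, topics_by_period, top_n=10, **kwargs):
    """Final step: top_n words of each topic, one independent run per period."""
    index = {word: i for i, word in enumerate(vocab)}
    result = {}
    for period, docs in docs_by_period.items():
        ids = [[index[word] for word in doc] for doc in docs]
        _, phi = gibbs_lda(ids, len(vocab), topics_by_period[period], **kwargs)
        result[period] = [[vocab[i] for i in np.argsort(-row)[:top_n]] for row in phi]
    return result
```

With this paper's settings, `topics_by_period` would map the 2016 slice to 10 topics and the other slices to 20, and `n_iter` would be 999.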

Data Source
The dataset in this paper comes from the CNKI patent database. On 27 February 2020, we searched for the keyword "blockchain" in the "patent name" field and obtained 8245 blockchain-related patents: 67 from 2016, 573 from 2017, 2170 from 2018, 5355 from 2019, and 80 from 2020. We exported the bibliographic records of all patents, which contain fields such as inventor, applicant, patent name, address, publication date, publication number, and abstract. We used the patent titles in the dataset as our research object.

TDLDA-Based Patent Text Topic Extraction Process
Based on the acquired data, we first preprocessed the document set D and then ran the TDLDA algorithm on it. The process is shown in Figure 2.
First, we retrieved patent documents on the subject of "blockchain" from CNKI and stored them as document sets before processing. A patent contains many fields, such as title, applicant, inventor, and grant date. Because the title reflects the content and significance of the invention, we used patent titles as the research target, treating each patent title as one document.

Second, we used the third-party Python library jieba to segment the downloaded blockchain patent titles, removed irrelevant words such as adjectives and adverbs, extracted domain-specific terms, and obtained a document of i words.
Finally, we ran the TDLDA algorithm for multiple iterations on the word documents to obtain K topics, selected m semantically related topic words to represent each topic, and then named each topic manually according to its words.
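The preprocessing step (segmenting titles with jieba and discarding adjectives, adverbs, and similar words) might look like the sketch below. It assumes each title has already been POS-tagged into `(word, flag)` pairs, for example with `[(p.word, p.flag) for p in jieba.posseg.cut(title)]`; the dropped tag prefixes, the minimum word length, and the function name `filter_tagged_tokens` are our own illustrative choices, not settings reported in the paper.

```python
# POS-tag prefixes (jieba / ICTCLAS-style tag set) treated as irrelevant here:
# a* adjectives, d* adverbs, u* particles, p prepositions, c conjunctions, x non-words.
# This list is an illustrative assumption, not the paper's exact filter.
DROP_PREFIXES = ("a", "d", "u", "p", "c", "x")


def filter_tagged_tokens(tagged, stopwords=frozenset(), min_len=2):
    """Keep domain terms from the (word, pos_flag) pairs of one patent title."""
    kept = []
    for word, flag in tagged:
        word = word.strip()
        if len(word) < min_len or word in stopwords:
            continue  # drop single characters, whitespace, and stopwords
        if flag.startswith(DROP_PREFIXES):
            continue  # drop adjectives, adverbs, function words, punctuation
        kept.append(word)
    return kept
```

Filtering on tag prefixes keeps the rule short, since jieba's fine-grained tags (for example `uj` or `ad`) share the first letter of their coarse class.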
In practice, the amount of patent data for 2020 was small (80 patents). Therefore, although these data were collected, they were not processed with the TDLDA model, and this paper does not analyze patent data for 2020.

Topic Strength Calculation and Evolution Analysis Process
We first evaluated the quality of the TDLDA topics and then selected 20 typical hot topics. To gauge topic popularity, we calculated the strength of each topic and ranked the hot topics. Finally, we carried out evolution analysis based on yearly changes in topic intensity. Figure 3 shows the flowchart of the entire experiment.
The experimental process can be divided into six steps:
(1) Obtain blockchain patent data from recent years.
(2) Preprocess the blockchain dataset to form a document set D on which to run the TDLDA algorithm.
(3) Choose appropriate parameters, run TDLDA, and obtain the topics and topic words.
(4) Select principal component analysis (PCA) [34], singular value decomposition (SVD) [35], and other topic model algorithms as baselines to evaluate the topic quality of TDLDA. PCA and SVD are effective dimensionality reduction methods that are widely used in text mining and image denoising.
(5) Establish the topic strength model, calculate the strength of each hot topic, and analyze the hot topics.
(6) Build a hot topic evolution model and analyze the evolution trend of each hot topic.

After the above process is completed, we obtain K hot topics. At the same time, the topic intensity model gives the intensity of each topic in each year, from which the evolution trend of the topic in recent years is obtained. We followed these steps in experiments on the actual datasets; the results are presented in Section 5.

Experimental Environment and Related Parameter Settings
The experiments were conducted on 64-bit Windows 10, using JetBrains PyCharm 2018 as the development environment and Python 2.7.15. The CPU is an Intel Core i7-6700 at 3.40 GHz, and the RAM capacity is 8 GB. The patent data were divided into five datasets: BC_patents_2016, BC_patents_2017, BC_patents_2018, BC_patents_2019, and the combined dataset BC_patents_16-19. The experimental parameter settings are shown in Table 2.
We set the Dirichlet parameter α of the document-topic distribution to 0.2, the Dirichlet parameter β of the topic-word distribution to 0.1, and the number of Gibbs sampling iterations to 999.
The parameters α and β are the key parameters of the Dirichlet priors: the larger the value, the smoother the distribution. Appropriate values depend on the number of topics and the vocabulary size; in this application, α = 0.2 and β = 0.1 gave good results. Because the 2016 dataset is small, the number of topics for BC_patents_2016 was set to 10; the other datasets used 20 topics. The datasets contained 67, 573, 2170, 5355, and 8245 documents, respectively, with dictionary lengths of 11, 64, 976, 1570, and 1985.
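The statement that larger α and β give smoother distributions can be checked numerically. The sketch below (Python 3 with NumPy; `mean_entropy` is a hypothetical helper) draws samples from a symmetric Dirichlet distribution and reports their average entropy; higher entropy means probability mass is spread more evenly across components.

```python
import numpy as np


def mean_entropy(alpha, dim, n_samples=2000, seed=0):
    """Average Shannon entropy (in nats) of draws from a symmetric Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(np.full(dim, alpha), size=n_samples)
    # Treat 0 * log(0) as 0 by taking log(1) wherever p == 0
    log_p = np.log(np.where(p > 0, p, 1.0))
    return float(-(p * log_p).sum(axis=1).mean())


for a in (0.1, 0.2, 1.0, 10.0):
    print(f"alpha={a:<5} mean entropy={mean_entropy(a, dim=20):.3f} (max {np.log(20):.3f})")
```

With 20 components the average entropy grows with α toward the maximum log 20, so small values such as α = 0.2 and β = 0.1 favor documents dominated by a few topics and topics dominated by a few words.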

Experimental Results and Analysis
This section first analyzes hot topics in different time periods and introduces the concept of topic strength. Then, to evaluate the TDLDA model on short patent texts, it uses PMI, a topic-quality metric, to compare TDLDA with PCA, SVD, and other topic models. Finally, it presents the theoretical basis of topic evolution across time periods and analyzes the evolution results.

Analysis of Hot Topics and Keywords
In our experiments, we set the number of topics K to 20, 40, 60, 80, and 100 and obtained K topics for each value. Comparing the resulting topics, we found that K = 20 gave the best topics. We therefore selected 20 topics for analysis and named each according to its words. The results are shown in Table 3.
After obtaining the topic words, the next question is how to evaluate the strength of each topic and analyze its evolution trend. To measure the share of one topic among all topics, we define its topic strength. Let $S\theta_k$ be the topic strength of the k-th topic:
$$S\theta_k = \frac{\sum_{j=1}^{N} \mathrm{Fre}(\theta_k, w_j)}{\sum_{i=1}^{V} \mathrm{Fre}(w_i)} \qquad (3)$$
where V is the length of the dictionary, i.e., the total number of distinct words; $\mathrm{Fre}(w_i)$ is the frequency of the i-th word; N is the number of words in each topic; and $\mathrm{Fre}(\theta_k, w_j)$ is the frequency of the j-th word of the k-th topic.
According to the experimental results and Formula (3), we obtained the intensity of 20 topics, as shown in Figure 4.
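Formula (3) can be computed directly from word counts. The sketch below assumes that $\mathrm{Fre}(\theta_k, w_j)$ is the corpus frequency of the j-th top word of topic k and that the denominator counts every token in the corpus; both readings, and the names `topic_strength` and `rank_topics`, are our assumptions.

```python
from collections import Counter


def topic_strength(topic_words, docs):
    """Formula (3): share of all word occurrences covered by each topic's N words.

    topic_words: list of topics, each a list of its top-N words.
    docs: tokenised documents (lists of words).
    """
    freq = Counter(word for doc in docs for word in doc)  # Fre(w_i)
    total = sum(freq.values())                            # sum over the V dictionary words
    return [sum(freq[w] for w in words) / total for words in topic_words]


def rank_topics(strengths):
    """Topic indices ordered from strongest to weakest (ties keep their order)."""
    return sorted(range(len(strengths)), key=lambda k: strengths[k], reverse=True)
```

Because different topics may share words, the strengths computed this way need not sum to 1; only their ranking is used to pick the hot topics.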
We ranked the 20 topics with the topic strength model and analyzed the top six: Topic4, network storage; Topic14, copyright protection; Topic2, data security; Topic18, risk supervision; Topic13, digital currency; and Topic9, consensus algorithm.
(1) Network storage. Network storage is one of the earliest blockchain application areas. As a distributed database, a blockchain inherently provides security, reliability, multiple backups, and data traceability, so network storage has become a typical application scenario. As society's demand for big data storage grows, blockchain-based network storage is expected to play an important role. The main keywords of network storage include data processing, storage medium, server, network system, etc.
(2) Copyright protection. As a major early application scenario of blockchain, copyright protection plays an important role in protecting intellectual property. A blockchain system can provide traceability for various digital products, effectively protecting intellectual property and establishing product ownership. The main keywords of this topic include assets, transactions, copyright, etc.
(3) Data security. Because a blockchain system is distributed and keeps multiple backups, it can provide more secure technology than traditional databases. Blockchain technology provides important technical support for information security and sharing and promotes the integration of emerging technologies such as big data, cloud computing, and artificial intelligence. The main keywords of data security include encryption and decryption, secret key, wallet, model, homomorphism, etc.
(4) Risk supervision. Because blockchain systems are tamper-proof and make data traceable, risk supervision is an important application area in which the sources of transaction data can be monitored. Risk monitoring can be used in many fields, such as securities and border trade. The main keywords of this topic include transactions, articles, digital certificates, risks, and certificates.
(5) Digital currency. Digital currency is the main application area of blockchain technology. Blockchain technology originated with Bitcoin, many digital currencies have since been built on it, and it has become a digital currency technology strongly supported by the national central bank. The main keywords of this topic include node, mining, sharding, ledger, etc.
(6) Consensus algorithm. The consensus algorithm is the core technology of blockchain, and its execution efficiency is the main factor affecting the throughput, security, and scalability of a blockchain system. Terms such as consensus protocol, consensus model, consensus mechanism, and Byzantine are the main vocabulary of this topic and occupy a very important position in granted patents.
Judging from the hot patent topics, granted patents are concentrated mainly in application fields, while patents on core blockchain technologies (such as consensus algorithms, smart contracts, and cross-chain technologies) account for only a small share. This shows that patent research and intellectual property protection for core blockchain technology need to be strengthened. Blockchain technology patents lay a foundation for China's sustainable development in the fields of economy, finance, engineering technology, and social science.

Topic Performance Metrics Evaluation
After obtaining the topics and their words through TDLDA, we need to evaluate topic quality. In this paper, we use the PMI value to measure topic quality [36]. PMI-based evaluation of topic quality was proposed by Newman et al. at the ADCS conference in 2009 and has since become a main indicator of topic quality [37]. The PMI of two words is
$$\mathrm{PMI}(w_i, w_j) = \log \frac{p(w_i, w_j)}{p(w_i)\, p(w_j)} \qquad (4)$$
where $w_i$ and $w_j$ are the i-th and j-th words of a topic, $p(w_i, w_j)$ is the probability that $w_i$ and $w_j$ co-occur, and $p(w_i)$ and $p(w_j)$ are the probabilities that $w_i$ and $w_j$ occur, respectively. The larger the PMI value, the better the topic. The baseline methods are SVD and PCA, two methods commonly used for topic modeling. PCA, SVD, and TDLDA experiments were carried out on the blockchain patent datasets BC_patents_2016, BC_patents_2017, BC_patents_2018, BC_patents_2019, and BC_patents_16-19, yielding topic words for different values of K with each method. We calculated the PMI values of the resulting topic sets for each TopN value under each K. The experimental results are shown in Table 4. In this experiment, K was set to 20, 40, 60, and 80, and Top10 and Top20 refer to the first 10 and 20 words of each topic, respectively. Table 4 shows that for TDLDA with K = 20 and 40 and TopN = 10 and 20, the PMI values are 0.4634, 1.36685, 0.58643, and 1.35635, respectively, which are significantly greater than those of the corresponding PCA and SVD models. Thus, when the number of topics K is 40 or less, the TDLDA model outperforms the PCA and SVD models. When K = 60 and TopN = 10, the PMI value of SVD is greater than those of the other two models.
When K = 80 and TopN = 10 or 20, the PMI value of the PCA model is larger than those of the other two models, which indicates that PCA performs best among the three methods at K = 80. Overall, the results show that for topic mining on the short-text blockchain patent dataset, TDLDA obtains very good results when K is 40 or less.
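Formula (4) leaves open how the probabilities are estimated and how word pairs are aggregated into one topic score. The sketch below assumes document-level co-occurrence within the patent titles, averages PMI over all pairs of a topic's top-N words, and adds a small ε to the joint probability so that pairs that never co-occur get a large negative score instead of log 0; this smoothing is a common choice in coherence implementations, not one specified in the paper, and `topic_pmi` is a hypothetical name.

```python
import math
from itertools import combinations


def topic_pmi(top_words, docs, eps=1e-12):
    """Mean pairwise PMI (Formula (4)) of a topic's top-N words.

    Probabilities are document frequencies: p(w) is the share of documents
    containing w, and p(wi, wj) the share containing both words.
    """
    doc_sets = [set(doc) for doc in docs]
    n_docs = len(doc_sets)

    def prob(*words):
        return sum(all(w in s for w in words) for s in doc_sets) / n_docs

    scores = []
    for wi, wj in combinations(top_words, 2):
        pi, pj = prob(wi), prob(wj)
        if pi == 0 or pj == 0:
            continue  # word absent from the reference corpus: skip the pair
        scores.append(math.log((prob(wi, wj) + eps) / (pi * pj)))
    # NaN signals that no valid word pair was available
    return sum(scores) / len(scores) if scores else float("nan")
```

Averaging a topic's scores over its Top10 or Top20 words, and then over all topics, gives one number per (model, K, TopN) setting of the kind compared in Table 4.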

Theoretical Basis of Topic Evolution
Topic evolution refers to the patterns of change in topic words over a series of time periods [38], and it can be expressed by the topic intensity within each period. Topic intensity has been applied in a range of studies. For example, Cui [39] defined topic intensity by using $\theta_d^k$ to denote the proportion of the kth topic in document d; the intensity of the kth topic in period t, $\theta_t^k$, is then the average of these proportions over the documents of that period:

$$\theta_t^k = \frac{1}{|D_t|} \sum_{d \in D_t} \theta_d^k$$

where $D_t$ is the set of documents in period t.

To meet the needs of topic evolution analysis for patent text mining, we propose a topic intensity model, namely, the proportion of the frequency of a topic's words among the frequencies of all topic words. The modeling process is as follows. Running the model generates K topics, each containing J keywords. The number of occurrences of the kth topic's words in period i can be expressed as:

$$T_i^k = \sum_{j=1}^{J} \mathrm{count}_k(w_j)$$

where $w_j$ is the jth keyword of the kth topic, $\mathrm{count}_k(w_j)$ is the number of occurrences of $w_j$ in period i, and $T_i^k$ is the total number of occurrences of the kth topic's words in period i. The total frequency of occurrence of all topic words in period i, $M_i$, can then be expressed as:

$$M_i = \sum_{k=1}^{K} T_i^k$$

where K is the total number of topics. The topic intensity of the kth topic in period i can then be expressed as:

$$\theta_i^k = \frac{T_i^k}{M_i} \tag{8}$$

where $\theta_i^k$ is the topic intensity of the kth topic in period i. Using Formula (8), we obtain the intensity of each blockchain topic in each year and analyze the evolution trend of each hot topic according to its intensity, classifying it as rising, falling, or stable.
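Formula (8) can be sketched in a few lines of Python. This is a minimal illustration, assuming that each period's patent text has already been segmented into a flat token list (for example with jieba) and that each topic is represented by its J keywords; the function name `topic_intensity` and the toy data are hypothetical, not the authors' code.

```python
from collections import Counter


def topic_intensity(topic_keywords, period_tokens):
    """Return theta_i^k = T_i^k / M_i for every topic k in one period i.

    topic_keywords: list of K keyword lists (the J top words of each topic).
    period_tokens:  all tokens from the patent texts of period i.
    """
    freq = Counter(period_tokens)
    # T_i^k: summed occurrences of topic k's keywords in period i
    totals = [sum(freq[w] for w in keywords) for keywords in topic_keywords]
    # M_i: summed occurrences of all topic words across the K topics
    m_i = sum(totals)
    if m_i == 0:
        return [0.0] * len(totals)
    return [t / m_i for t in totals]


# Hypothetical toy example: two topics, one period.
topics = [["chain", "ledger"], ["health", "medical"]]
tokens = ["chain", "chain", "ledger", "health", "other"]
print(topic_intensity(topics, tokens))  # [0.75, 0.25]
```

Calling the function once per year (2016 to 2019) with the same topic keyword lists yields one intensity series per topic. Because each topic's intensity is a share of $M_i$, the intensities within a period sum to 1, so a topic's share can fall even when its absolute word frequency grows.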

Topic Evolution Trend Analysis
This section analyzes and discusses the evolution of blockchain patent topics from three perspectives: rising, falling, and stable trends.
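The grouping below is based on the yearly intensity curves in Figures 5 to 7. As a hedged illustration only, such a grouping could also be automated with a least-squares slope over each topic's yearly intensities. The slope rule, the `tol` threshold, the function name `classify_trend`, and the sample series are all hypothetical assumptions, not the procedure used in this paper.

```python
def classify_trend(intensities, tol=0.002):
    """Label a yearly topic-intensity series as rising, falling, or stable.

    Uses the ordinary least-squares slope against the period index
    (0, 1, 2, ...); |slope| <= tol counts as stable. Both the rule and
    the default tol are illustrative assumptions.
    """
    n = len(intensities)
    if n < 2:
        return "stable"
    x_mean = (n - 1) / 2
    y_mean = sum(intensities) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(intensities))
    den = sum((x - x_mean) ** 2 for x in range(n))
    slope = num / den
    if slope > tol:
        return "rising"
    if slope < -tol:
        return "falling"
    return "stable"


# Hypothetical 2016-2019 intensity series (not values from this paper).
print(classify_trend([0.02, 0.03, 0.04, 0.05]))  # rising
print(classify_trend([0.16, 0.12, 0.10, 0.07]))  # falling
print(classify_trend([0.06, 0.08, 0.06, 0.07]))  # stable
```

A slope rule captures the overall direction even when a series dips in one year, as several topics below do; a series that rises and then falls sharply may still need the kind of case-by-case reading given in the analysis.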
(1) Analysis of the rising trend of blockchain patent topics
According to the experimental results of Formula (8), we obtained the blockchain patent topics that showed an upward trend from 2016 to 2019, as shown in Figure 5. As can be seen from Figure 5, eight topics (Topic4, Topic5, Topic6, Topic8, Topic12, Topic13, Topic16, and Topic18) were on the rise; they represent network storage, healthcare, supply chain, bill management, library materials, digital currency, smart e-commerce, and risk supervision. Among them, Topic18 (risk supervision) declined in 2018 and rebounded in 2019, showing an overall upward trend, and its share of all topics is relatively large, which indicates that blockchain was increasingly used in e-government. Topic4 (network storage) and Topic8 (bill management) rose relatively sharply and accounted for a larger share of the topics, which indicates that blockchain was increasingly used in network storage and bill management. Topic6 (supply chain) and Topic13 (digital currency) rose from 2016 to 2018 and decreased slightly in 2019, but their overall trend was upward, which indicates that blockchain is closely tied to the supply chain and digital currency. Blockchain is widely used in all stages of the supply chain, where it improves operational efficiency and saves resources.
The combination of blockchain and digital currency provides technical support for the financial sector. Topic12 (library materials) and Topic16 (smart e-commerce) were on an upward trend but accounted for a relatively small share, indicating that blockchain has been applied to library management and e-commerce, improving management and operational efficiency. Topic5 (healthcare) declined in 2017 and then rose in 2018-2019, which indicates that blockchain was increasingly used in healthcare, providing secure, reliable, and fast querying of healthcare information, among other functions.
(2) Analysis of the downward trend of blockchain patent topics
According to the experimental results of Formula (8), we obtained the blockchain patent topics that showed a downward trend from 2016 to 2019, as shown in Figure 6. As can be seen from Figure 6, six topics (Topic2, Topic7, Topic9, Topic14, Topic17, and Topic19) showed a downward trend; they represent data security, cross-chain technology, consensus algorithm, copyright protection, energy Internet, and distributed storage. Among them, Topic14 (copyright protection), Topic2 (data security), and Topic9 (consensus algorithm) declined but still accounted for a large proportion of all topics, with ratios of 0.07-0.16, which indicates that copyright protection, data security, and consensus algorithms have remained core technologies or application fields of blockchain. The decline in their share is mainly because blockchain was applied to more fields. The yearly frequency of the corresponding topic words nonetheless increased in absolute terms, which indicates that although the share of copyright protection, data security, and consensus algorithms declined, these topics continued to develop and remained research hotspots. Topic19 (distributed storage) also showed a downward trend: the absolute frequency of its topic words increased each year, but more slowly than that of other topics. Topic7 (cross-chain technology) and Topic17 (energy Internet) both first rose and then fell, peaking in 2017, which indicates that these two topics were research hotspots that year. This is consistent with the fact that multiple patents were granted and many papers were published on cross-chain technology and the energy Internet in 2017.
(3) Analysis of the stable or fluctuating blockchain patent topics
According to the experimental results of Formula (8), we obtained the blockchain patent topics that were stable or fluctuated from 2016 to 2019, as shown in Figure 7. As can be seen from Figure 7, six topics (Topic1, Topic3, Topic10, Topic11, Topic15, and Topic20) showed steady development or fluctuation; they represent public service, digital copyright, smart contract, smart transportation, intelligent transactions, and electronic files. Among them, Topic10 (smart contract) and Topic11 (smart transportation) decreased in 2017, increased in 2018, and decreased again in 2019, with relatively large fluctuations. Smart contracts accounted for a large share, with a ratio of 0.06-0.08, whereas smart transportation accounted for only 0.02-0.03, which indicates that smart contracts were a research hotspot in the early stage of blockchain development, while smart transportation was less popular during that stage. The fluctuations were due to the emergence of new hot topics in certain years. Topic1 (public service), Topic3 (digital copyright), and Topic15 (intelligent transactions) rose and fell only slightly, and their overall trend was relatively stable. Topic20 (electronic files) first fell and then rose, reaching its lowest point in 2017 and rising again from 2018, with a ratio of 0.037-0.045. These stable trends indicate that blockchain has remained a steady concern of researchers in these fields and is developing steadily, showing characteristics of refinement, extensiveness, and intelligence.

Conclusions
Tracking scientific and technological (S&T) research hotspots can help scholars grasp the current state of research and the development patterns of a field over time. This paper proposes a short-text topic acquisition method, the time-based dynamic LDA (TDLDA) model. We took Chinese blockchain patent datasets from 2016 to 2019, obtained from CNKI, as the research subject. First, we used the third-party library jieba to extract the domain terminology of the blockchain patent literature. Second, we used TDLDA, an unsupervised representation method, to capture latent topics in blockchain patents and obtain hot topics and their corresponding topic words. We proposed a topic intensity model to calculate the intensity of the hot topics, ranked the K hot topics, calculated their intensity in different time periods, and discussed the feature distribution and development trend of each hot topic. Finally, we used the PMI evaluation index to compare three modeling methods, PCA, SVD, and TDLDA, and analyzed the topic quality of the three models. The experimental results show the following: (1) TDLDA extracted K topics from the 2016-2019 blockchain patents. When K = 20, topic quality was the best; eight topics showed an upward trend, six topics showed a downward trend, and six topics tended to be stable or fluctuate slightly. (2) We ranked the 20 topics by intensity and analyzed the top six: network storage, copyright protection, data security, risk supervision, digital currency, and consensus algorithm. We found that these topics were the main areas of blockchain patent applications. (3) When the number of topics K was less than or equal to 40, TDLDA achieved the highest PMI value among the three topic models, which indicates that TDLDA is a suitable method for short-text topic mining of patents.
The research in this paper can help researchers more accurately grasp the research direction and improve the quality of project applications and paper writing.