Analysis of Peatland Research Trends Based on BERTopic

Peatlands comprise approximately 3% of the land area worldwide and occur in most countries across tropical, subtropical, and boreal regions. Peatland has garnered increased research attention as a potential countermeasure against climate change. It is therefore necessary to identify and classify the topics constituting global peatland research. In this study, we applied BERTopic, a topic modeling technique, to classify global peatland research trends, evaluate changes over time, and analyze the relationships between topics. To this end, we searched the keyword “peatland” on ScienceDirect, a global academic publication data platform, and collected the titles and abstracts of 10,158 publications from 1953 to 2022 for dynamic topic modeling and network analysis. Eighty-two peatland research topics were identified, which were combined into 15 main topics. Over time, an increasing trend was observed in topics related to production, management, and fire. In addition, analysis of the relationships between topics identified three groups centered on fire, peatland value, and carbon. We anticipate that the findings of this study can be expanded to analyze trends in research related to fires in peatlands, regional characteristics of peat soil, prediction of greenhouse gas emissions and mitigation due to peatland fires, and prediction of future peatland research topics.


Introduction
Peatlands, i.e., wetlands with accumulated peat, are estimated to comprise approximately 3% of land area worldwide and are distributed across most geographical regions, including tropical, subtropical, and boreal regions [1][2][3][4]. Peatlands are typically defined as regions with ≥30 cm of peat soil depth, or where dead organic material comprises a minimum of 30% of the dry peat weight [4,5]. Peatlands provide various ecological services, including carbon and water storage and biodiversity conservation, and serve as the largest natural carbon store among land ecosystems, making them an important resource for mitigating climate change [2,6,7]. In particular, peatlands have an excellent carbon storage capacity per land area, storing about 40% of the earth's soil carbon [8]. For example, the carbon storage of Indonesian peatlands (approximately 20 million ha) has been reported to be approximately 46 Gt, accounting for 8-14% of the global peatland carbon storage [9]. Consequently, the restoration of degraded peatland has recently received attention as a means of responding to climate change.
Peatlands have garnered increasing attention throughout the international community, with studies reporting on the relationship between peatland and the climate crisis, the reduction of greenhouse gases using peatland [10][11][12], and changes in peatland microbial ecosystems with climate change [13]. Given the surge in peatland research, a need exists to classify the associated research topics, such as climate change, greenhouse gases, and microbial ecosystems. Previous studies that sought to achieve this goal applied topic modeling techniques to classify peatland studies by keywords and identify trends based on topic categories. However, Van Bellen and Larivière [14] were unable to efficiently investigate the significance of relationships between topics due to duplication issues when keywords were designated as topics. These results were improved upon by Yang et al. [15], who applied Dynamic Topic Modeling (DTM) and network analysis to select topics by keyword groups, enabling them to investigate structural properties and changes over time in topics. However, that study used algorithms based on DTM or Latent Dirichlet Allocation (LDA), which limited the authors' ability to consider the context of the studies. Furthermore, concerns were raised regarding duplication caused by the assumption that each text contains multiple topics. To overcome these limitations, more recent techniques must be adopted that assign a single topic per text and allow combinations of topics to be analyzed based on the significance of the relationships between topics.
Topic modeling is an unsupervised machine learning technique applied to determine abstract topics within large-scale text data. To date, various topic modeling methodologies, including LDA and Probabilistic Latent Semantic Analysis (PLSA), have been suggested. However, these methods adopt the Bag of Words (BoW) technique, which ignores word order and produces sparse representations because of the large number of words and dimensions in the data [16]. Moreover, semantic relationships between words cannot be investigated using these methodologies as the text context is not considered. In contrast, BERTopic, which builds on Bidirectional Encoder Representations from Transformers (BERT; published in 2018), has been increasingly applied as it performs natural language processing using embeddings at the text level. BERTopic performs well by combining embeddings with c-TF-IDF to extract topics based on context. It reportedly outperforms LDA and Non-negative Matrix Factorization (NMF) in terms of Normalized Pointwise Mutual Information (NPMI) [17]. Accordingly, BERTopic has received considerable attention from researchers as a deep-learning-based approach in the field of natural language processing, alongside techniques such as NMF, Corex, and Top2Vec [18,19].
This study was conducted to address the limitations of previous studies by elucidating the significance of, and relationships between, topics in peatland research. To this end, we applied BERTopic to derive topics from all classes via c-TF-IDF-based clustering and to identify the hierarchical structure between topics as well as changes in research trends over time. The specific objectives of this study were to investigate (i) how peatland research topics are classified, (ii) changes in the proportion of peatland research topics over time, and (iii) the core peatland research topics, based on an analysis of the relationships between topics and their significance.
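The word-order limitation of the Bag of Words representation can be illustrated with a minimal sketch. The two sentences below are invented for illustration and are not drawn from the analyzed corpus, and the helper name `bag_of_words` is hypothetical.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())


# Two hypothetical sentences with opposite meanings but identical vocabulary.
a = "drainage increases emissions while rewetting reduces emissions"
b = "rewetting increases emissions while drainage reduces emissions"

# Both sentences collapse to the same word counts, so a BoW-based model
# cannot tell them apart even though they make opposite claims.
same_representation = bag_of_words(a) == bag_of_words(b)
print(same_representation)  # True
```

Embedding-based approaches encode each text as a whole, which is why they can in principle separate such cases.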

Research Framework
We used the deep-learning language model BERTopic to classify peatland-related studies and analyze their trends. After searching "peatland" on the academic publication data platform ScienceDirect, we collected the titles and abstracts of peatland-related papers published between 1953 and 2022. After pre-processing, BERTopic was applied to extract peatland-related topics based on the collected data. We then performed hierarchical clustering of the 82 topics classified by BERTopic to extract 15 final main topics. The significance of changes in the 15 main topics over time and the relationships between topics were analyzed (Figure 1).

Data Collection and Pre-Processing
After searching for "peatland" on ScienceDirect, we collected the titles and abstracts of 11,519 papers published between 1953 and 2022. We excluded 1,361 papers that were missing an abstract or for which the abstract was "unknown". The remaining 10,158 papers were included in our analysis. The "all-mpnet-base-v2" model from Hugging Face was applied to embed the data. Additional pre-processing was performed on the collected data to derive meaningful results. Using Python 3.11.4, we removed stop words using the stop word list provided by the Python NLTK package, as well as verbs, adjectives, adverbs, and specific symbols commonly used in academic writing.
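The exclusion and cleaning steps described above can be sketched as follows. This is a simplified illustration rather than the pipeline used in the study: the stop word set is a tiny stand-in for the NLTK list, part-of-speech filtering (verbs, adjectives, adverbs) is omitted, and the records, field names, and function names are hypothetical.

```python
import re

# Tiny stand-in stop list (assumption); the study used the NLTK stop word list.
STOP_WORDS = {"the", "of", "in", "and", "a", "is", "to", "on"}


def keep_record(record: dict) -> bool:
    """Keep a record only if its abstract is present and not 'unknown'."""
    abstract = (record.get("abstract") or "").strip()
    return bool(abstract) and abstract.lower() != "unknown"


def clean_text(text: str) -> str:
    """Lowercase, drop non-letter symbols, and remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


# Hypothetical records standing in for the collected titles and abstracts.
records = [
    {"title": "Carbon storage in tropical peatland",
     "abstract": "The carbon stock of peat is large."},
    {"title": "Untitled note", "abstract": "unknown"},
    {"title": "Old survey", "abstract": None},
]

# Title and abstract are joined into one document per retained paper.
documents = [
    clean_text(f"{r['title']} {r['abstract']}")
    for r in records
    if keep_record(r)
]
print(documents)
```

Only the first record survives the filter, mirroring the exclusion of papers whose abstract is missing or recorded as "unknown".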

BERTopic Modeling
BERTopic improves research accuracy by performing NLP via document embedding based on the BERT model published by Google in 2018 [19][20][21]. BERTopic uses a sentence transformer to construct document embeddings, reduces the dimensionality of the embedded vectors using UMAP (Uniform Manifold Approximation and Projection), and then performs density-based clustering using HDBSCAN (Hierarchical Density-based Spatial Clustering of Applications with Noise) (Figure 2). c-TF-IDF is then used to extract the important topics and words for each cluster. BERTopic shows high topic coherence and topic diversity [22]. In particular, unlike previous topic modeling techniques, BERTopic can extract accurate topics while accounting for context [22] and can perform hierarchical clustering based on topic similarity.
Accordingly, we applied the functionality of BERTopic to identify the topics constituting global peatland research, investigate how these topics change over time, and analyze the relationships between topics. We trained a BERTopic model and derived 82 topics. Subsequently, the c-TF-IDF representations were used to measure the similarity between topics. Via hierarchical clustering, we condensed these 82 topics into 15 main topics, the titles for which were determined based on the top 20 keywords and representative papers for each topic.
Additionally, the Python "statsmodels" library was used to perform linear regression analysis and test the significance of annual changes in the 15 main topics. The independent variable was the year of publication, whereas the dependent variables were the number and proportion of papers for each topic. We performed significance testing for the papers published from 2001 to 2022, the period in which all 15 main topics appeared. Using a significance level of 0.05 (95% confidence), topics with significantly positive (+) regression coefficients were categorized as hot topics that increased over time, while topics with significantly negative (−) regression coefficients were categorized as cold topics that decreased over time.
Additionally, we inspected the intertopic distance map using the BERTopic visualization function to assess the relationships between topics. The intertopic distance map is a visualization of the relationships between topic clusters generated during the modeling process of BERTopic. The distance between the visualized topics on this map indicates the semantic similarity between the topics.
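To make the c-TF-IDF step more concrete, the sketch below scores each term t in a cluster c as tf(t, c) · log(1 + A / f(t)), where tf(t, c) is the frequency of t in the cluster, f(t) is its frequency across all clusters, and A is the average number of words per cluster. This follows the class-based weighting as it is commonly described for BERTopic, but it is a simplified pure-Python illustration and not the library's implementation; the function name and the toy clusters are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(cluster_docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score each term per cluster as tf(t, c) * log(1 + A / f(t))."""
    # Treat every cluster as one concatenated document.
    tf = {c: Counter(" ".join(docs).split()) for c, docs in cluster_docs.items()}

    # f(t): frequency of term t across all clusters.
    total = Counter()
    for counts in tf.values():
        total.update(counts)

    # A: average number of words per cluster.
    avg_words = sum(total.values()) / len(tf)

    return {
        c: {t: n * math.log(1 + avg_words / total[t]) for t, n in counts.items()}
        for c, counts in tf.items()
    }


# Hypothetical pre-processed documents, already grouped by cluster.
clusters = {
    "fire": ["peat fire smoke haze", "fire smoldering peat"],
    "carbon": ["peat carbon flux", "carbon methane flux peat"],
}

scores = c_tf_idf(clusters)
top_fire = max(scores["fire"], key=scores["fire"].get)
print(top_fire)
```

In this toy example "peat" is frequent in both clusters and is therefore down-weighted, whereas "fire" is concentrated in one cluster and receives the highest score there.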

BERTopic Modeling
The 10,158 peatland studies analyzed by BERTopic were classified into 82 topics (Appendix A). After hierarchical clustering based on the similarity between topics (Figure 3), 15 main topics were generated. In the similarity matrix presented in Figure 3, a higher similarity between two topics indicated a closer relationship, causing them to be clustered together. For example, Topic 21 (fire_burned_burning) exhibited the highest similarity with Topic 60 (fire_charcoal_frequency; similarity 0.927) and Topic 61 (fire_wild-fire_health; similarity 0.899). This demonstrates the close relationships between these topics; for example, studies investigating fires (fire, burn) on peatland may evaluate the frequency and breadth of wildfire spread, the extent of tree burning (charcoal), and the effects of smoke generation on health.
The 15 main topics were assigned names based on the top 20 keywords for each topic indicated by BERTopic and three representative papers (Table 1, Figure 5, Appendix B). BERTopic numbers topics in descending order of the number and proportion of papers assigned to them. Topic 1 (carbon dynamic) accounted for approximately half of all papers (47.7%), whereas Topic 15 (deposit) accounted for only 0.8%. Additionally, the number of papers in all topics has been increasing year by year (Figure 5).
Topic 1 was "carbon dynamic," comprising various carbon dynamics studies that analyzed the flow of carbon. This included studies on regulating the release of CO2 and CH4 greenhouse gases through peatland management and rewetting [23], the stability of carbon in tropical peatlands assessed based on the flow and cycling of greenhouse gases [24], and changes in soil organic matter and vegetation with increasing CO2 concentration due to climate change [25].
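The idea of merging topics by similarity can be sketched with cosine similarity and a simple average-linkage agglomerative procedure. This is an illustrative stand-in and not the clustering routine used by BERTopic: the term-weight vectors are invented, the two carbon topic names are hypothetical, and the choice of average linkage is an assumption.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def agglomerate(vectors: dict[str, list[float]], n_clusters: int) -> list[set[str]]:
    """Average-linkage agglomerative clustering on cosine similarity."""
    clusters = [{name} for name in vectors]

    def linkage(a: set[str], b: set[str]) -> float:
        pairs = [(x, y) for x in a for y in b]
        return sum(cosine(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the highest average similarity.
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Invented weights over a toy vocabulary: [fire, charcoal, carbon, flux].
topic_vectors = {
    "fire_burned_burning": [0.9, 0.2, 0.1, 0.0],
    "fire_charcoal_frequency": [0.8, 0.5, 0.0, 0.0],
    "carbon_flux_emission": [0.1, 0.0, 0.9, 0.6],   # hypothetical topic name
    "methane_flux_carbon": [0.0, 0.0, 0.7, 0.8],    # hypothetical topic name
}

groups = agglomerate(topic_vectors, n_clusters=2)
print(groups)
```

With these toy vectors the two fire topics end up in one group and the two carbon topics in the other, which is the same kind of merging that condenses fine-grained topics into main topics.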
Topic 2 was "past environmental change" and included studies on estimating changes in vegetation, climate, and landscape throughout the Holocene based on sedimentary matter records [26][27][28].
Topic 3 was "production" and included studies on the sustainability of peatlands for production, such as the environmental sustainability of biofuels (biodiesel, biochar, etc.) produced on peatlands and used in palm oil plantations [29], greenhouse gases released during palm oil production [30], and bioenergy production systems [31].
Topic 4 was "metal" and included studies on sedimentary heavy metals in peatlands, focusing on Hg, Pb, 137Cs, and Cd. One such study evaluated changes in the concentration of sedimentary heavy metals from the atmosphere and nearby rivers and lakes over time [32], whereas another reported changes in the primary productivity of heavy metal sediments [33]. Additionally, the characteristics of heavy metal isotopes in peatland were investigated and compared [34].
Topic 5 was "management" and comprised studies focused on peatland sustainability through suitable management strategies. This included a study on improving air quality and biodiversity via ecological services provided in peatlands [35]. Other studies proposed solutions for cooperating and resolving conflicts between the benefits of peatlands and the need to increase the profitability of agriculture as the major peatland use [36,37].
Topic 6 was "microbe," comprising studies on the biodiversity, species diversity, and species abundance of microbial communities in peatlands. Others evaluated the potential of testate amoebae as an ecological indicator [38][39][40].
Topic 7 was "soil organic matter" and included studies on changes in carbon and nitrogen stores at different soil depths due to root addition with climate change [41]. Other studies evaluated the activity of soil organic matter with pH change [42] and the rate of soil organic matter decomposition by different microbial communities [43].
Peatland fire can be broadly divided into two types [44]: surface fire (flaming), in which flames are visible above the surface and are easier to extinguish, and peat fire (smoldering), in which organic matter in peat soil burns below the surface, is hard to detect, and produces thicker smoke and haze. Topic 8 was "surface fire (flaming)" and included studies on substances produced during peatland surface fires (atmospheric pollutants, greenhouse gases, particulate matter, etc.) [45], changes in peat forest vegetation due to fires during the Holocene [46], and the effects on tree species and parts according to fire frequency [47].
Topic 9 was "hydrology" and comprised studies on changes in the underground water level with peatland drying and wetting due to the effects of coastal water, rivers, and lakes [48], hydrological trends and the concentrations of dissolved materials [49], and changes in cation concentrations in underground water based on soil conditions [50].
Topic 10 was "global warming," focusing primarily on changes in ozone concentration, UV-A, and UV-B due to the effects of global warming on peat soil, and how these effects influence greenhouse gas emissions [51][52][53].
Topic 11 was "forest" and included studies on the tree growth in peatland forests, changes in microbial communities after wood ash fertilization in drained peatland forests [54], changes in tree growth following wood ash fertilization [55], and the effects of these processes [56].
Topic 12 was "peat fire (smoldering)," encompassing peatland fires that burn below the surface and deep below the surface. This topic primarily included studies on greenhouse gas emissions. For example, one study evaluated the smoke generated in peat fires and the effects of soil moisture content on fire spread [57]. Others reported the effects and physicochemical changes in fires at different soil depths [58] and the effects of smoldering fires on atmospheric oxygen [59].
Topic 13 was "biomarker" and primarily included studies that monitored past environmental changes using lipids extracted from the cell membranes of microbes living in peatland. For example, some studies evaluated the changes in the distribution and abundance of Glycerol Dialkyl Glycerol Tetraethers (GDGTs) at different soil depths due to microclimate warming [60,61]. Meanwhile, another assessed seasonal changes in GDGTs [62].
Topic 14 was "permafrost" and comprised studies on the role of peatland in preventing freezing and melting. For example, one study reported on the stabilization and changes in organic carbon in peatland soil located in permafrost [63]. Others assessed the effects of permafrost destruction due to global warming on the structure and function of peatland ecosystems [64], and the characteristics of the permafrost [65].
Topic 15 was "deposit," including studies on the effects of volcanic eruption on the tephra ecosystem [66], the types and composition of sedimentary layers due to volcanic ash deposition in coastal peatlands [67], and the classification of volcanic ash deposited during the Holocene [68].

Changes in the Number and Proportion of Papers for Each Topic
To analyze the trends in peatland studies over time, we assessed the significance of changes in the number and proportion of papers for each topic. Because topics first appear at different times, the same starting publication year was applied to all topics to ensure that the order of appearance did not significantly affect the proportions. Accordingly, for all 15 main topics, the significance of changes was evaluated based on papers published between 2001 and 2022 (Table 2). We classified topics for which the number of papers increased significantly (p < 0.05) over time as hot topics. Of these, Topic 1 (carbon dynamic) showed the largest rate of increase in number, whereas Topic 15 (deposit) exhibited the lowest rate of increase.
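The trend test can be sketched with an ordinary least-squares fit of a topic's annual value against publication year. The study reports using a Python statistics library for this; the version below is a standard-library stand-in that compares the slope's t statistic with the two-sided 5% critical value of Student's t for 20 degrees of freedom (22 annual observations from 2001 to 2022 minus two fitted parameters), which is equivalent to testing p < 0.05 for that sample size. The function names and the three series are hypothetical and do not reproduce Table 2.

```python
import math

T_CRIT_DF20 = 2.086  # two-sided 5% critical value of Student's t, 20 d.f.


def trend(years: list[int], values: list[float]) -> tuple[float, float]:
    """Return (slope, t_statistic) of an ordinary least-squares line."""
    n = len(years)
    mx, my = sum(years) / n, sum(values) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    if se == 0:
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / se
    return slope, t_stat


def label(years: list[int], values: list[float], t_crit: float = T_CRIT_DF20) -> str:
    """Classify a series as 'Hot', 'Cold', or '-' (not significant)."""
    slope, t_stat = trend(years, values)
    if abs(t_stat) <= t_crit:
        return "-"
    return "Hot" if slope > 0 else "Cold"


years = list(range(2001, 2023))  # 22 annual observations
noise = [0.5 if i % 2 else -0.5 for i in range(len(years))]

# Hypothetical annual proportions (%) for three illustrative topics.
rising = [2.0 + 0.3 * i + e for i, e in enumerate(noise)]
falling = [50.0 - 0.4 * i + e for i, e in enumerate(noise)]
flat = [5.0 + e for e in noise]

print(label(years, rising), label(years, falling), label(years, flat))
```

The same function can be applied to either the annual number of papers or the annual proportion, which are the two dependent variables considered in the analysis.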

Analysis of Relationships between Topics
A distance map was derived, demonstrating the relationships between the 15 main topics, which were further divided into three groups (Figure 7).
Group 1 comprised Topics 8 and 12, focused on fires. This group inevitably shows high relatedness, given that the two topics focus on peatland and peat fires, as well as the resulting effects on atmospheric pollution, greenhouse gas emissions, and human health.
Group 2 comprised Topics 3 and 5, focused on the value obtained from peatlands. This group demonstrates the high relatedness between the value and scale of production (e.g., biofuel, bio-oil, bioenergy, etc.) from peatlands, which differ depending on how peatlands are managed.
Group 3 was the largest cluster, including Topics 2, 4, 6, 7, 9, 10, 11, 13, 14, and 15, and centered on Topic 1 (carbon dynamic). This group contained the largest number of papers among the 15 main topics. Semantically, the similarity between topics was higher than in Groups 1 and 2, demonstrating that the other topics were clustered around Topic 1. This suggests that past environmental changes can be inferred based on the cycling of accumulated materials in peatland, and that basic scientific research fields related to peatlands are interconnected with a focus on carbon.

Discussion
In this study, we investigated global peatland research trends using BERTopic to analyze 10,158 papers retrieved by searching "peatland" on ScienceDirect. Initially, 82 topics were derived, which were subsequently reclassified into 15 main topics based on hierarchical clustering. The "hot topics" with significantly increased paper proportions over time were Topic 3 (production), Topic 5 (management), Topic 8 (surface fire (flaming)), and Topic 12 (peat fire (smoldering)), which relate to peatland management, usage, protection, and conservation. Conversely, the "cold topics" with a decreasing research proportion over time were Topic 1 (carbon dynamic), Topic 4 (metal), Topic 6 (microbe), and Topic 11 (forest), which relate to the basic ecology, chemistry, and matter cycles of peatlands. This shows that peatland research trends are shifting from basic studies to fields relating to utilization and applications. In previous studies analyzing peatland research trends, Van Bellen and Larivière [14] and Yang et al. [15] also reported significant increases over time in keywords such as management, climate change, restoration, production, and biodiversity. This likely reflects the increasing importance and utility of peatland due to changes associated with human settlement, including its role as a habitat [69][70][71].
Upon analysis of the relationships between topics, three groups were identified. Combining the analysis of relatedness with the changes in the proportion of papers over time showed that Topics 3 and 5 (hot topics) formed Group 2, whereas Topics 8 and 12 (hot topics) formed Group 1. Meanwhile, the remaining 11 topics, which were either cold topics or showed no change in paper proportions, formed Group 3. Topics 8 and 12 in Group 1 focused on fires, with the studies evaluating the effects of peat and surface fires, including atmospheric pollution, friction with neighboring countries due to smoke, the release of organic carbon stored in peatland soil in the form of greenhouse gases, and negative effects on human health [45][46][47]57,58].
Group 2 is particularly important to evaluate as it exhibited the fastest increase in the proportion of papers. This is similar to a report by Yang et al. [15], which not only considers peatland as a target for conservation or development but also emphasizes the importance of sustainable peatland management as a countermeasure to climate change that can mitigate greenhouse gases. Thus, we can conclude that the value and scale of ecosystem services and products (e.g., biofuel, bioenergy, etc.) from peatland may differ based on management, and that effective peatland management is important to improve peatland productivity and sustainable use [29][30][31][35][36][37].
Topic 1 (carbon dynamic), which is the center of Group 3, had the highest number and proportion of papers from 1953 to 2022. Although this topic exhibited an annual increase in the raw number of papers, its proportion of total papers showed a decreasing trend. This result indicates that topics other than Topic 1, such as Topics 3, 5, 8, and 12, are growing faster than Topic 1. Based on the keywords and research content for Topic 1, this topic is related to the basic flow of carbon within peatlands. Moreover, Topics 2, 4, 6, 7, 9, 10, 11, 13, 14, and 15 are centered on Topic 1, indicating that the circulation of accumulated materials in peatland and the associated basic science studies focus on carbon. This supports previous reports that peatland is closely related to carbon [1][72][73][74]. In fact, the United Nations Framework Convention on Climate Change (UNFCCC) and the Convention on Wetlands (Ramsar Convention) advised that peatlands should be included in Nationally Determined Contributions (NDCs); accordingly, countries with peatlands are adopting these guidelines as a means of achieving their target NDC [3,10,12][75][76][77]. However, before expecting all countries with peatlands to implement these practices, given the different characteristics of each country and region, it will first be important to precisely ascertain and map the global peatland area [3].
Whereas previous studies on peatland research trends simply analyzed the keywords that received the most attention [14] or the structural relationships between topics using network analysis [15], the current study performed hierarchical clustering based on similarities between global peatland studies, deriving 15 main topics that were arranged into three groups by analyzing the relationships between topics. Additionally, the significance of these findings has been discussed. In particular, several studies have previously highlighted that the topics of fire and carbon are inseparable from peatlands in global research [78][79][80][81], and the results of this study support this view.
Most topic modeling techniques are limited by the nature of unsupervised learning in that the modeling results differ depending on the algorithms used and hyperparameter tuning. To mitigate these limitations, we performed several rounds of BERTopic hyperparameter tuning and interpreted the results, including the contents and naming of each topic, based on the researchers' knowledge of the field. Nevertheless, one limitation of this study is that BERTopic assumes that each text corresponds to only a single topic, thus restricting the consideration of multiple topics. To overcome this, the development and supplementation of new topic modeling techniques, the combined use of topic modeling techniques, and strengthened expert review by peatland researchers will be necessary. Moreover, based on our findings, further research is warranted to fully explore the composition of the fire-related topics (Topics 8 and 12), which are important in peatland research, as well as to differentiate the characteristics of peatland soil types in different regions.

Conclusions
In this study, we used BERTopic, a recent topic modeling technique, to analyze relatedness between topics, with the aim of classifying global peatland research trends into topics, examining the change in topics over time, and identifying core themes in peatland research trends and their significance. We classified global peatland research trends into 15 main topics and observed an overall increase in research related to production, management, and fire. When we analyzed the relationships between topics, three groups were identified, focusing on fire, peatland value, and carbon. We anticipate that, in the future, our findings can be expanded to analyze trends in research related to fires in peatlands, regional peat soil properties, the prediction of greenhouse gas emissions and mitigation due to peatland fires, and the prediction of future peatland research topics, among others.

Figure 1. An overall research framework for identifying changes in peatland research trends.

Figure 2. BERTopic algorithm process in which topics are extracted.

Figure 3. Similarity matrix between 82 topics. The x and y axes represent the 82 topics; the darker the color, the higher the similarity score between topics.

Figure 4. Hierarchical clustering of 82 topics. The linkage distance is the distance required to merge two clusters and is used to measure inter-cluster similarity. Lower values indicate higher similarity, while higher values indicate greater difference. The colored connecting lines show the clustering of highly similar topics.

Figure 5. Change in number of papers per topic from 1953 to 2022 (n = 10,158).

Figure 6. Proportions of topics which significantly (a) increased (hot topics) or (b) decreased (cold topics) from 2001 to 2022.

Figure 7. Intertopic distance map of 15 topics. The size of the circles represents the number of papers for each topic, and semantically similar topics are positioned in closer proximity to one another.
Appendix B. Number and Proportion of Papers Classified by Topic from 1953 to 2022 (n = 10,158)

Table 1. Top 20 keywords with the highest probability of appearing in each topic, calculated using BERTopic modeling of peatland research from 1953 to 2022.

Table 2. Regression coefficients and p-values of topic number and proportion over time from 2001 to 2022 (n = 9002). 'Hot' indicates topics where the number and proportion increased significantly (p < 0.05), while 'Cold' indicates topics where the number and proportion decreased significantly (p < 0.05). '-' indicates that the change was not statistically significant.
The 82 topic names were generated automatically in Python by combining the top 3 keywords of each topic.