1. Introduction
In recent decades, the literature on sustainable supply chain (SSC) and sustainable supply chain management (SSCM) has evolved from theoretical definitions to perspectives that are increasingly specialized and backed by evidence and data [1,2,3] as sustainability challenges have increased. Early studies primarily focused on integrating the triple bottom line (TBL) proposed by Elkington [4,5] into logistics systems, thus emphasizing the alignment between environmental efficiency and economic performance [6,7]. Recently, scholars have moved toward a more systemic and technology-oriented interpretation of the supply chain concept in some fields [
8]. Building on these commonly accepted perspectives in the SSC and SSCM literature, several research directions have emerged. The conceptual integration of circular economy principles, including closed-loop and reverse logistics, has become a central research stream [9,10], reflecting the broader shift from linear to regenerative supply models [11]. Technological innovation, including digitalization, Industry 4.0/5.0, and smart manufacturing, has emerged as a major driver of sustainable operations [
12,13,14]. Finally, there is a growing interest in decision-making frameworks that integrate multiple criteria (economic, environmental, and social) through data analytics and AI-based models [15,16]. However, most existing reviews are qualitative and descriptive, offering rich conceptual insight but lacking the quantitative depth necessary to map the temporal dynamics of research evolution or to forecast emerging themes [17,18,19].
Moreover, bibliometric studies identify co-citation patterns and research clusters but tend to focus on the past and present structures of knowledge rather than on their future trajectories; for instance, Ren et al. [20] and Koberg and Longoni [3] mapped thematic trends in green and sustainable logistics. Similarly, recent text mining analyses in related domains [21,22] demonstrate the potential of machine learning-based bibliometric tools but remain limited to exploratory mapping without predictive modeling. To address these gaps, this study combines a systematic literature review with text mining to examine nearly 9000 indexed publications (2000–2025): through clustering, co-occurrence analysis, and time-series forecasting, it seeks to detect emerging clusters and predict their short-term development.
2. Literature Review
Sustainability started to be integrated into logistics and supply chains shortly after the concept of sustainable development appeared in the Brundtland Report, Our Common Future [23]. Even though early signs of sustainability in supply chains appeared in the area of transportation and reverse logistics [24] in the late 1990s, complete definitions emerged only after 2000. Some of them are well recognized but focus more on the environmental dimension and address Green Supply Chain Management (GSCM) [7,25,26], while others encompass the three dimensions of the triple bottom line. One of the most cited and complete definitions of SSCM that also includes the social element is that of Carter and Rogers [6]. Although the definitions of green and sustainable supply chains overlap to a large extent and are sometimes used interchangeably, SSCM extends and complements GSCM. Consequently, we regard SSC as the broader notion, while both “green” and “sustainable” are considered in the research results.
Combining the terms SSC and SSCM in large-scale text mining research helps identify the main trends and frameworks. This approach provides new insights into the field, explores key trends and new topics, and uncovers how they change over time.
There already exist numerous studies on sustainability both in supply chains and supply chain management. Some of these studies comprise extended literature reviews [
1,
2,
3] focusing on the conceptualization and evolution of SSCM. These studies outline important notions in relation to the integration of sustainability in traditional SCM. Other authors emphasize the triple bottom line of sustainable development, which includes economic, environmental, and social dimensions [
27,
28,
29]. According to Lazar et al., almost half of the studies are three-dimensional, fewer are two-dimensional, and a small part are one-dimensional [30], which indicates the importance and interconnection of the dimensions. However, according to the same authors, the economic dimension was the most represented, followed by the environmental and lastly by the social dimension [30]. Becerra et al. also state that social impacts are the least studied purpose of models [31], which points to the lagging attention to the social dimension.
There are scientific articles that examine specific practices used in various logistics activities within SSC. This includes reverse logistics within closed-loop supply systems in the context of the circular economy [
9] because it is considered critical for embracing and implementing the circular economy concept across supply chains [
10]. Additionally, the practices in eco-design are also well connected with circularity in supply chains, and in the latest research, Yang and Sun even interconnect the eco-design with closed-loop supply chains by integrating remanufactured products alongside the manufacturing of new units [
32]. Then, other separate activities, such as sustainable sourcing and procurement, are also subject to distinct research: these studies emphasize responsible business practices and supplier selection in procurement decisions [
15] and the application of multi-criteria decision-making to evaluate supplier performance. According to scholars [33], the literature has grown rapidly; however, green procurement is still a nascent field, and they expect it to become a significant area for further research and publications. More traditional activities, such as inventory management, are also gaining attention in terms of sustainability, and Wang et al. [
34] state that there is a gap in the literature studying inventory and pricing from a green growth perspective. Another gap in the literature is observed in terms of sustainable production, where there is an absence of bibliometric synthesis in sustainable and social public procurement [
35] and a lack of comprehensive, field-wide review of green procurement [
33]. These conclusions lead to opportunities for further research and could be an object of interest for forecasting.
Other researchers [36] also study green warehousing practices, since warehouses contribute significantly to greenhouse gas emissions in supply chains. They indicate that sustainable warehouse location is an emerging topic that can contribute to sustainability in supply chains. Conversely, green packaging is gaining less attention, and research on it is very limited, according to Morashti et al. [37], who explore papers in the field published between 1993 and 2020. Last but not least, in terms of supply chain activities, sustainable distribution and transportation are among the most traditional and advanced topics, particularly in empirical, multimodal, and decarbonization-oriented research [38]. However, gaps are identified in sustainable last-mile logistics [39].
Research on SSC is also conducted in different industries. Some of the most researched are the automotive [40], food [41,42], fashion [43], and pharmaceutical [44,45] industries. These studies address not only a particular industry but also broader topics such as technology, performance, and sustainability dimensions. This complexity makes forecasting their development more challenging and often connects them to other related topics.
New challenges in supply chains are also connected with the regulatory framework, governance mechanisms, and international standards. Accordingly, a growing number of studies examine how policies, international standards, and commitments reshape the supply chain. Special attention is given to the alignment of supply chain practices and performance with the SDGs [46]. In terms of initiatives, sustainability reporting standards (GRI) [47] and environmental, social, and governance (ESG) performance can be considered emerging issues in the sustainability of supply chains [48]. From a company perspective, environmental management systems (EMSs), especially ISO 14001, are associated with improved SSCM processes and outcomes and are also of interest to researchers [49]. As relatively new concepts, these topics are expected to grow, and their applications to attract further research.
A parallel topic in supply chain sustainability examines the internal and external factors for the development of sustainable practices. In recent years, the external pressures and risks in the supply chain have become powerful drivers for the adoption of such practices. Traditionally, regulators, customers, brand image, and partners are common drivers [
50], but recently, the pandemic and geopolitical disruptions and wars changed the priorities in the supply chains and the factors for their development. The number of articles considering these risks increased and also included the role of new technologies in them [
51,
52], which reshaped many studies.
Further research directions include the technologies and innovations that are entering all spheres of life, including SSC—spanning AI/ML, IoT, data analytics and text mining, Industry 4.0/5.0, and green technologies. Some reviews show AI-integrated tools and data-driven methods that improve planning, traceability, and low-carbon decisions while also noting gaps in social dimensions, data governance, and scalability [
14,
22]. The digitalization of supply chains has become a must, and this mixture of methods, technologies, and innovations opens the floor for further themes in SSC.
As the literature shows, SSC research spans many fields of application. It is difficult to forecast all of these fields in a single article, and for that reason, not all of them are presented in the results of this research. They are nevertheless discussed here to highlight the gaps identified by other researchers and to note that some of these subjects are covered by only a small number of articles, which might not be enough for them to form separate clusters under the methods used in this article. This may be viewed as a limitation of the current study.
2.1. Prior Bibliometric Studies on Sustainable Supply Chains
In recent years, there has been a growing body of bibliometric studies examining how research on sustainable supply chains has evolved. Balcıoğlu et al. [
53] identify transparency, traceability and blockchain-based trust as dominant themes in digitalized supply chains, conducting a bibliometric analysis of 1069 articles from the Scopus database. Another study examines AI-driven automation, predictive analytics, and real-time information flows, demonstrating the growing integration of digital technologies into SSCM through a review of 383 articles over the period of 2017–2024 [
54]. In specific industries (dairy products), sustainability issues in supply chains are addressed, and, using traditional and emerging forecasting methods, the authors conclude that “ARIMA and SVR, achieve superior accuracy across all stated metrics”, which reinforces the reliability of these methods [
55]. Another bibliometric study using Scopus and clustering as the main methods in the field of supply chains is that of Wołek and Próchniak [
56]. They cover a period of 10 years and a database of 1796 records.
The bibliometric reviews of the literature contribute to understanding the structure of specific fields and help uncover trends and forecasts. In supply chains, Theeraworavit et al. [
57] show how the circular economy rethinks classical supply chain paradigms using a meta-analysis of 709 studies indexed in Scopus and published over a 15-year period. Another study [58] also used meta-analysis, mapping 2574 peer-reviewed articles on SSCM over a 10-year period, and found strong geographical and sectoral imbalances. In addition, Zhang et al. [59] identified 1130 journal articles published after 2000 and, using clustering, derived five clusters related to reverse channel optimization, closed-loop systems, etc. Although we consider article abstracts in the current study, the analysis includes a significantly larger number of articles than the aforementioned bibliometric reviews (n = 8955) and covers a longer time horizon (2000–2025), which is favorable for forecasting trends. This allows a more comprehensive view of long-term changes and structural gaps in SSCM research.
2.2. Conceptual Framework
Based on the literature review, this section synthesizes four research streams that motivate the research questions of this study. This study assumes that knowledge about sustainable supply chains can be structured into a limited set of thematic clusters. Their importance changes over time, and they are connected through innovative technologies and governance mechanisms, while some areas remain relatively underdeveloped [60,61,62]. The first direction is derived from the global context of sustainable logistics systems, which shows that, although there are many topics, a small set of coherent themes (e.g., environmental/circular flows, governance, technology-driven practices) is observed in the field. This explains the treatment of the knowledge base as a limited set of grouped themes. On this basis, the first research question (RQ1) is developed, which aims to identify the main thematic clusters. In most research (especially after pandemics and wars), sustainability has emerged as a central perspective, combined with resilience and flexibility. These developments suggest that the relative weight of the themes (circular economy, digitalization, sustainability) changes over time and provide the theoretical basis for RQ2, which examines how the identified clusters have developed and how they are likely to develop in the near future [60,63]. Also, bibliometric and literature reviews show that Industry 4.0/5.0 technologies (e.g., Internet of Things, analytics, artificial intelligence, blockchain) are strongly associated with developments in sustainability through improved decision-making, traceability and transparency in supply chains. There is increasing attention on traceability, which is seen as a key mechanism linking environmental and circular practices with governance, compliance and sustainability [
60,
64,
65]. The shift to circular practices in spare-part networks is also gaining strategic importance, yet circular practices remain a developing and underexplored theme [66]. These trends provide the basis for RQ3, which suggests that circularity, decision-making, digitalization and traceability act as connecting concepts linking otherwise disparate thematic clusters. Sustainable development has three dimensions, but many studies treat them unevenly, leaving lagging areas such as the social pillar, which lacks depth and scope [
19,
67]. This imbalance provides the theoretical basis for RQ4, which aims to identify underrepresented areas and outline directions for future development in sustainable supply chains. In summary, the conceptual framework is presented in Table 1, which lists the four research questions and their associated hypotheses. RQ1 suggests that the literature is organized into a limited number of macro-clusters, with the relative importance of these clusters changing over time as circularity, digitalization and sustainability increase (RQ2). Decision-making, digitalization and traceability act as bridges between the studied clusters (RQ3), and the social dimension is identified as lagging behind other areas in SSCM (RQ4). The methods used in this study are designed to test these theoretically informed expectations based on publications on sustainable supply chains covering the period of 2000–2025.
2.3. Research Questions
Despite a large and growing body of work on SSC and SSCM, the field still lacks a clear synthesis of its evolving thematic dimensions, research transitions, and future trajectories. There is a pressing need to employ text-mining techniques to systematically extract and categorize dominant research themes. This study aims to apply a text-mining approach to analyze a substantial body of literature related to sustainability in supply chains, identifying key trends, thematic clusters, and conceptual interrelations.
- RQ1: What thematic clusters characterize the sustainable supply chain research landscape?
H1. The sustainability in supply chain literature is expected to organize around a limited number of interrelated macro-clusters combining environmental, social, and technological dimensions. To avoid any appearance of confirmation bias, this hypothesis is formulated as a general expectation rather than specifying a fixed number of clusters in advance. This expectation derives from previous literature reviews highlighting the coexistence of green, circular, and digital sustainability paradigms [1,3,68] but remains to be empirically verified through large-scale text mining and clustering analyses.
- RQ2: How has the emphasis on these clusters changed over time, and what are the projected future research trends?
H2. Following the growing influence of the circular economy and Industry 4.0 paradigms, the SSC research field is expected to shift from a primarily environmental and efficiency-oriented focus toward a more integrative model that combines circularity, digitalization, and resilience.
This assumption is consistent with recent studies pointing to a paradigm transition from green logistics to data-driven and regenerative supply chains [10,11,13,14]. By applying ARIMA-based forecasting to bibliometric data, this research anticipates detecting accelerating trends in circular and technology-related topics.
- RQ3: How are key concepts and thematic clusters interconnected, and what insights do these relationships offer for theory and practice?
H3. It is hypothesized that the concepts of decision-making, digitalization, and traceability play a bridging role between environmentally driven and circular paradigms, revealing a growing integration of technological innovation into sustainability governance frameworks.
Prior reviews of SSCM decision-making frameworks and technology-enabled sustainability [12,15,16] support this expected convergence, which will be further tested through co-occurrence network analysis.
- RQ4: Which areas of sustainable supply chain research remain underexplored and offer potential for future theoretical development?
H4. The social dimension—including collaboration, human well-being, and equity across global supply networks—is expected to remain underrepresented compared with environmental and technological research streams.
This expectation aligns with meta-analyses showing that the social pillar of sustainability receives less academic attention relative to the economic and environmental pillars [30,31]. Identifying this imbalance will help outline an agenda for future research.
3. Materials and Methods
Text mining has long been applied to research in SSC, with numerous studies published over the past decade. Techniques and tools such as LDA, BERTopic, NLTK, TF, and TF-IDF have been used to uncover trends in areas such as Metaverse applications across supply chains [
69], sustainability in process industries [
70], and supply chain risk factors [
71]. Other studies explored Industry 4.0’s link to sustainability [
21], green logistics and logistics service providers [
72], and renewable energy supply chains [
19]. Bibliometric analyses have mapped knowledge in green logistics [
20], SSCM [
17], and sustainability in textiles [
73]. Text mining is widely used as a reliable tool for identifying topics, trends, differences, etc. in the current state rather than for forecasting. This motivates the present research to use text mining as its main tool while also extending existing knowledge with forecasting. This corresponds to the evolution of research methods from using mainly quantitative data to also using qualitative data [74]. The same holds for literature reviews, giving researchers a variety of new perspectives. In this field, indexed abstracts are widely used. This is possible because abstracts have to synthesize the main content [75]. Also, the topic similarity between a full-text paper and its corresponding abstract is higher when more documents are analyzed [76], as in our case. Many other studies use similar approaches: 19,931 abstracts related to the problems of human resources [77], 9580 records for a text-mining analysis of research trends in Scopus [78], 3477 records for exploring the landscape of purchasing and supply management research [79], etc. Abstracts and keywords were used because they are more accessible and consistently available, while full texts are often restricted; however, their lower information density may limit the depth of the analysis and partly explain the differences between keyword- and abstract-based clustering. In the current research, the data source is a Scopus CSV (comma-separated values) export with the required fields. The current research encompasses
n = 8955 articles in English (without conference proceedings), related to the terms “scm”, “supply chain” and “sustainab*”. The choice of a single database is motivated by its consistent structure, high metadata quality, and the ability to download large datasets suitable for large-scale text-mining analysis. We considered other sources, such as AI-based bibliometric platforms, but found the metadata inconsistent. This reveals a potential for future research in developing systems for automatic database integration. Abstracts and keywords were used because they are the only standardized and consistently available text fields for a database of this size. Although they offer less depth than full texts, this approach is used in bibliometric research. The publication period runs from 2000 to mid-2025. The covered subject areas are “Business, Management and Accounting” and “Economics, Econometrics and Finance”. The exact search string that was used is ((TITLE-ABS-KEY (scm) OR TITLE-ABS-KEY (“supply chain”)) AND TITLE-ABS-KEY (sustainab*)) AND PUBYEAR > 1999 AND PUBYEAR < 2026 AND (LIMIT-TO (SUBJAREA, “BUSI”) OR LIMIT-TO (SUBJAREA, “ECON”)) AND (LIMIT-TO (DOCTYPE, “ar”)) AND (LIMIT-TO (LANGUAGE, “English”)), executed at the end of May 2025. The search terms were applied simultaneously to the Title, Abstract, and Author Keywords fields of the database. Stating the string in full ensures methodological transparency and clarifies how the search was operationalized. To keep a clear focus on the managerial and economic aspects of sustainable supply chains, the analysis was limited to the subject areas of Business, Management and Accounting (BUSI) and Economics, Econometrics and Finance (ECON). These areas represent the academic domains in which supply chain management, sustainability practices, corporate responsibility, procurement, and decision-making frameworks are most commonly studied. Selecting these specific subject areas influences the final results.
However, this choice was intentional to align with our research goal from a managerial and economic perspective. This focus ensures a coherent and comparable dataset, as the database provides standardized metadata for these specific subject areas. This study provides a foundational mapping of the management side of the field. The initial search was broad. Sustainability in supply chains is an emerging and conceptually diverse field, and a broader search strategy reduces the risk of excluding relevant studies that use alternative or evolving terminology. Other subject areas such as Computer Science, Engineering, Environmental Science, Energy and other related technical fields often examine sustainability from technical or other perspectives that fall outside the scope of this study. This ensures that the dataset encompasses publications that approach sustainability in supply chains from a managerial and economic perspective. In the dataset, tags such as “scm” were used as provided by the authors, and within the included subject areas, this tag is conventionally used to denote supply chain management and supply chain problems. This study is based exclusively on a single database for publications, and all results should be interpreted within this scope. Given this approach, the broad article searching was combined with focused subject filtering to balance the results.
This database allows two main directions of analysis in our case: by keywords and by abstracts. Even though the same database is used, the logic and the structure of the data processing differ. The proposed model aims to create a logical foundation for future automation. By streamlining the data collection and analysis, it provides a framework for monitoring research trends using standardized and free tools. The overall process of performing data analyses is presented in Figure 1.
The analysis is performed in Python (version 3.12) in JupyterLab [80] (version 4.0.11), run through Anaconda Navigator [81] (version 2.6.2) for installation and library control. Before further processing steps, a standard deduplication check was performed to remove potential duplicate records. Data preparation begins with removing irrelevant text, special characters, and publisher data. For keywords, replacements are performed to ensure consistency (e.g., replacing “IoT” with “Internet of Things,” “EU” with “European Union”). Regarding tokenization, we note that tokens are usually, but not always, words [82]. In general, authors and publishers choose their keywords very carefully, so it is better to avoid any chance of misinterpretation. For this reason, tokenization is used only for the abstract analysis. Keyword processing includes lowercasing and replacing spaces with underscores. A key step is filtering out dominant and rare terms; here, the low-frequency threshold is set at 10. A short set of normalization rules was also applied to unify common variants of the same keyword and to remove non-topic or inconsistent terms. Classical approaches can then be applied, including the following:
By country—the total number of words, leading keywords by country, etc.
By year—priority keywords by year.
N-grams—in most cases bigrams and trigrams.
Generating a bag of words, wordcloud, etc.
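The keyword preparation steps described above (lowercasing, variant replacement, underscore joining, and frequency filtering) can be sketched as follows. This is a minimal illustration rather than the exact pipeline used in the study: the function names, the contents of `REPLACEMENTS`, the `;` separator, and the reading of the threshold as "keep keywords that occur at least 10 times" are all assumptions.

```python
from collections import Counter

# Hypothetical replacement rules; the text names only these two examples.
REPLACEMENTS = {
    "iot": "internet of things",
    "eu": "european union",
}


def normalize_keyword(raw: str) -> str:
    """Lowercase a keyword, unify known variants, and join words with underscores."""
    keyword = " ".join(raw.lower().split())  # lowercase and collapse whitespace
    keyword = REPLACEMENTS.get(keyword, keyword)
    return keyword.replace(" ", "_")


def parse_keywords(field: str, sep: str = ";") -> list[str]:
    """Split one keyword field (assumed to be `sep`-delimited) into normalized keywords."""
    return [normalize_keyword(part) for part in field.split(sep) if part.strip()]


def filter_keywords(records, min_count=10, dominant=frozenset()):
    """Drop rare keywords (fewer than `min_count` occurrences) and dominant ones.

    `records` is a list of keyword lists, one per article. `dominant` is a
    hypothetical set of overly frequent terms (for example, the search terms).
    """
    counts = Counter(kw for record in records for kw in record)
    return [
        [kw for kw in record if counts[kw] >= min_count and kw not in dominant]
        for record in records
    ]
```

For example, `parse_keywords("Supply Chain; IoT")` yields `["supply_chain", "internet_of_things"]`. Keeping whole keywords intact, instead of tokenizing them, matches the stated intention of not reinterpreting the authors' own terms.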
These basic steps yield some of the fundamental results. More significant, however, is the point at which we move on to search for larger groupings and identify possible text clusters. For keyword analysis, different types of methods can be used. One of them is k-means clustering after appropriate vectorization, for example, with the TfidfVectorizer of the scikit-learn library [83] (version 1.6.1). In this study, the clustering is performed in the TF-IDF keyword feature space, where each pre-processed keyword is represented as an individual TF-IDF dimension. This choice was driven by the nature of our dataset. The texts are short and heterogeneous, and classical short-text methods remain the most stable and reproducible for this type of input. More advanced semantic models generally require longer text input. Integrating TF-IDF features and k-means clustering is a useful way to organize and draw conclusions from a large amount of data [84]. Here, similar keywords are grouped based on their vector representations into a predefined number of clusters. For the time-series and forecasting analysis, each article is assigned to a single cluster based on its pre-processed keywords: the cluster that appears most frequently among an article’s keywords is selected. To choose the number of clusters, a loop can be used to track silhouette scores, together with the elbow method and the corresponding visualization. When defining the clusters, dimensionality reduction is often necessary to deal with the challenges of high data dimensionality and to improve the performance of the algorithm. High-dimensional data are difficult to visualize, which can be addressed by applying PCA (Principal Component Analysis) [85,86] to reduce the number of features while retaining as much of the variance in the data as possible. As an alternative to k-means, LDA, BERT, Non-negative Matrix Factorization (NMF) and others can be used. Well-known methods such as LDA require longer texts for improved output [87]. Short-text modeling is an emerging topic that challenges classic approaches [88]. More advanced methods support richer analysis, including the semantic meaning of the text, overlaps between topics, etc., but they require more text. On the other hand, k-means is preferred for its simplicity, widespread use, and sufficiency for an initial assessment [89,90]. This motivates the preference for more classic and basic clustering approaches in the current research.
After the clusters are defined and recorded in a separate column in the database, further analysis is possible, such as descriptive statistics on the keyword content of the clusters. In parallel, data such as the number of publications can be aggregated for each cluster and by year. These data are fundamental for forecasting the future evolution of the clusters. Methods such as moving average, exponential smoothing, ARIMA (autoregressive integrated moving average), etc. can be applied. ARIMA models are feasible for predicting bibliometric indicators [91]. They can be used for forecasting different bibliometric indicators; in our case, these are the future number of articles and the appearance of single text strings. Clustering also enables other analyses, such as network analysis. Individual words or clusters can be viewed as elements of a system between which there are links. This improves the visualization of the problem and reveals gaps in the relationships between individual elements that can be explored in the future.
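The aggregation, forecasting, and network steps outlined above can be sketched with the standard library alone. Note the substitution: the forecast below uses Holt's linear exponential smoothing as a simple stand-in for the family of methods mentioned, not ARIMA itself. The function names, smoothing parameters, and input layouts are illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def yearly_counts(assignments):
    """Count publications per cluster and year from (year, cluster) pairs."""
    counts = defaultdict(Counter)
    for year, cluster in assignments:
        counts[cluster][year] += 1
    return {cluster: dict(sorted(c.items())) for cluster, c in counts.items()}


def holt_forecast(series, steps=3, alpha=0.5, beta=0.3):
    """Forecast with Holt's linear exponential smoothing (a stand-in, not ARIMA).

    `alpha` smooths the level and `beta` smooths the trend; both values are
    arbitrary defaults chosen for this sketch.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, steps + 1)]


def cooccurrence_edges(keyword_lists, min_weight=1):
    """Weight each keyword pair by the number of articles that contain both."""
    edges = Counter()
    for keywords in keyword_lists:
        for pair in combinations(sorted(set(keywords)), 2):
            edges[pair] += 1
    return {pair: weight for pair, weight in edges.items() if weight >= min_weight}
```

Years without publications are simply absent from the output of `yearly_counts`, so a real time series would need them filled with zeros before forecasting. The edge dictionary from `cooccurrence_edges` can be passed to any graph library for visualization; pairs that never appear are the "missing links" the text refers to.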
Text mining can also be extended to the abstracts, which are available in most cases. An important step here is tokenization, which is more essential than in the keyword analysis. The initial analyses are largely the same: bibliometrics, regional specifics, n-grams, clustering, etc. In addition, the following could be mentioned: part-of-speech (POS) tagging, average sentiment score, similarity scores, and others, as well as categories like “Dimension Reduction” and “Logistic Regression” [92]. Combining keyword- and abstract-related results offers ways to augment the results and to identify any convergence or divergence between them.
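For the abstract branch, tokenization and n-gram counting can be sketched as below. The regular expression, the very small stop-word list, and the function names are assumptions made for illustration; a full analysis would typically rely on an NLP toolkit and a complete stop-word list.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to"})
# Words made of letters or digits, optionally hyphenated (e.g. "closed-loop").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text):
    """Lowercase the text and return its tokens without stop words."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS]


def ngrams(tokens, n=2):
    """Return consecutive n-token tuples (bigrams by default)."""
    return list(zip(*(tokens[i:] for i in range(n))))


def top_ngrams(abstracts, n=2, k=10):
    """Count n-grams over all abstracts and return the k most frequent."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(ngrams(tokenize(abstract), n))
    return counts.most_common(k)
```

Because stop words are removed before the n-grams are built, two words separated only by a stop word are treated as adjacent. Whether that is desirable depends on the analysis and is a design choice of this sketch.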
Each method contributes to the study, but LLMs (large language models) can further aid result interpretation, especially for text analysis, because they can interpret the context in the text [93,94]. These systems can also learn quickly and are described as few-shot learners [95], which makes them an adequate tool. Topic modeling is a complex task that has evolved from earlier rule-based text-mining approaches to increasingly sophisticated unsupervised methods [78]. LLM systems can perform topic modeling and summarization [96]. However, the use of LLMs raises research ethics considerations, so it should be transparent [97]. In this study, LLMs were used to give semantic names to outputs, such as PCA dimensions and cluster labels, making them more human-friendly. The specific LLM and version are noted in the text.
4. Results—Main Data Output
4.1. Basic Keyword Statistics
The first part of the analysis covers keyword statistics, including bigrams. From the constructed database, the top 10 keywords by frequency were extracted for each year of the period from 2000 to 2025. The resulting table offers a chronological overview of keyword evolution in sustainable supply chain research and reveals several shifts in topics. The initial period (2000–2005) focused primarily on foundational aspects such as “environmental_impact,” “environmental_management,” and “costs.” For these years, “buildings,” “construction_industry,” “textile_industry,” etc. also appear, which shows an industry-centric approach to sustainability. Other keywords, such as “economic_and_social_effects,” “competition,” and “marketing,” indicate early recognition of broader sustainable practices. From 2006 to 2010, the results reveal a growing awareness of corporate responsibilities, and “corporate_social_responsibility” emerged as a prominent keyword. The keywords “closed_loop_supply_chain” and “reverse logistics” were introduced, showing increasing interest in circularity and in managing product lifecycles. “climate_change” also appeared. The period 2011–2018 was largely dominated by “green_supply_chain,” which consistently remained a top keyword, indicating a research focus on environmentally friendly practices throughout the supply chain. Alongside it, “environmental_sustainability” was a persistent topic, and “decision_making” also remained a central topic. From 2019, “circular_economy” rapidly gained prominence, indicating a fundamental shift to regenerative economic models designed to minimize waste and maximize resource utilization. A variety of other keywords also emerged, such as “COVID-19,” “blockchain,” and “Industry 4.0,” highlighting research into supply chain resilience and digital integration.
Notably, “food_supply” has appeared consistently among the top keywords since 2014, suggesting a sustained and growing focus on sustainability within the agricultural and food sectors.
Another result is based on the bigrams, which offer a deeper layer of insight into how specific concepts have changed. Pairs such as “gas emissions–greenhouse gases” and “environmental impact–environmental management” focus on environmental impact. Meanwhile, the frequent pairing of “circular_economy” with both “closed_loop_supply_chain” and “industry_4.0” signifies the link between sustainability and digital transformation. This suggests that the field has matured from isolated environmental concerns to integrated, innovation-driven frameworks. Moreover, the recurrence of bigrams that combine “decision_making” with “environmental impact,” “economic and social effects,” “developing_countries,” etc. points to the growing attention to decision-making in sustainability. The presence of “blockchain–circular economy” and “food supply–food waste” further illustrates the evolution toward resilience, traceability, and resource efficiency.
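To make the bigram statistics concrete, the following minimal sketch counts keyword pairs. It assumes that a bigram is an unordered pair of keywords appearing in the same record, which is one plausible reading of the pairs reported above and not a confirmed description of the study's procedure. The function name and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_pair_counts(keyword_lists):
    """Count how often each unordered keyword pair appears in the same record."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Sorting the de-duplicated keywords makes (a, b) and (b, a) the same key.
        unique = sorted(set(keywords))
        pairs.update(combinations(unique, 2))
    return pairs


# Toy records (hypothetical), each holding the keywords of one article.
docs = [
    ["circular_economy", "industry_4.0", "blockchain"],
    ["circular_economy", "closed_loop_supply_chain"],
    ["blockchain", "circular_economy"],
]
pair_counts = keyword_pair_counts(docs)
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

Ranking the pairs with `most_common` gives the kind of top-N bigram list that the text discusses.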
4.2. Keyword Clustering
Clustering was performed in two steps: first, different techniques were used to determine the number of clusters and the related features; second, the clusters were presented in 2D and 3D spaces. The selection of the optimal number of clusters is a fundamental step in k-means clustering that affects the interpretability of the resultant groupings. To determine k for our keyword analysis, the elbow method was applied using the automated KneeLocator algorithm [
98]. This process was repeated across a range of settings to comprehensively assess the stability and sensitivity of the optimal k. Different settings for the number of clusters and the number of features were tested, and the inertia score was recorded and analyzed for each. The most promising settings were then explored further. The optimal values varied significantly across the different maximum-feature settings and cluster numbers, highlighting that the perceived optimal grouping structure can depend on the granularity of the features considered. This deeper analysis allowed us to determine an optimal k that is not only supported by strong statistical results but also interpretable for the current research. We additionally conducted a sensitivity analysis over a broader range of cluster numbers (k = 3–8) and across multiple TF-IDF feature sizes (20–490 features). The elbow curves indicated that solutions between k = 3 and k = 5 were statistically plausible depending on feature granularity. Across this range, the same general thematic structure reappeared, showing high consistency in the dominant research themes. To evaluate the robustness of the clustering results, we examined both cluster sizes and stability. The resulting clusters were unbalanced (Cluster 2 is the largest), which is typical for high-dimensional keyword data, but all four groups were sufficiently represented for interpretation. Stability was assessed by re-running the k-means algorithm under different random seeds and by applying the model to multiple random subsamples of the dataset. Across these checks, the overall macro-structure of the clusters remained consistent, indicating that the themes are stable under reasonable variations in initialization and data composition. The same approach was used for the abstract clustering parameters. For the current analysis, clusters_nm = 4 and max_features = 100 were selected.
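The elbow idea can be illustrated with a short sketch. It is a simplified stand-in for an automated knee detector, not the KneeLocator implementation itself: it picks the k whose point on the rescaled inertia curve lies farthest below the straight line joining the first and last points. The function name `find_elbow` and the inertia values are invented for illustration and are not results of this study.

```python
def find_elbow(ks, inertias):
    """Return the k farthest from the chord joining the ends of the (k, inertia) curve.

    Assumes inertia decreases as k grows, as it does for k-means.
    """
    k_span = ks[-1] - ks[0]
    i_span = inertias[0] - inertias[-1]
    # Rescale both axes to [0, 1] so the result does not depend on inertia units.
    xs = [(k - ks[0]) / k_span for k in ks]
    ys = [(v - inertias[-1]) / i_span for v in inertias]
    # After rescaling, the chord is the line x + y = 1; for a decreasing convex
    # curve the gap 1 - x - y is proportional to the distance below that chord.
    gaps = [1.0 - x - y for x, y in zip(xs, ys)]
    return ks[gaps.index(max(gaps))]


# Invented inertia values for k = 2..8 (illustration only).
ks = [2, 3, 4, 5, 6, 7, 8]
inertias = [1000.0, 620.0, 400.0, 350.0, 320.0, 300.0, 285.0]
best_k = find_elbow(ks, inertias)
print(best_k)
```

Repeating such a check for several feature sizes, as in the sensitivity analysis above, shows whether the detected elbow is stable or shifts with feature granularity.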
To verify the structural integrity of this distribution, an additional analysis was conducted. Specific terms such as “decision-making”, “manufacturing”, “innovation”, and “environmental management” were identified as additional domain words that may act as noise. They appear frequently and may serve as semantic connection points between papers. By temporarily excluding these words and repeating the cluster exploration, we attempted to break the large cluster into smaller ones. Despite these experimental exclusions, the analyses consistently indicated an optimal number of clusters between 3 and 5, with the core thematic mass concentrated in one dominant cluster. This suggests that the large size of Cluster 2 is not caused by specific common keywords but is a fundamental characteristic of the research landscape captured in the database. We interpret this dominance as a finding that reflects the semantic integration of the research field: the majority of research in sustainable supply chains has converged into a unified systemic framework. Within this mainstream, strategic decision-making, resilience, and technological integration are discussed as a consolidated research core rather than isolated topics. For dimensionality reduction, PCA was used to support the visualization in 2D and 3D spaces, which are presented in
Figure 2a,b. Under these settings, the silhouette coefficient calculated in the visualization framework is 0.562, representing separability in the reduced PCA space rather than a global clustering validity measure. It was utilized only to validate the visual coherence of the macro-clusters.
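For readers unfamiliar with the silhouette coefficient, the following minimal sketch computes its mean value for labelled points using Euclidean distance. It is a plain-Python illustration of the standard definition, not the code used in the study; the four toy points stand in for PCA-reduced coordinates, and at least two clusters are assumed.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean distance); needs at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy 2D coordinates (hypothetical stand-ins for PCA-reduced keyword vectors).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
score = silhouette_score(points, labels)
print(round(score, 3))
```

Values close to 1 indicate compact, well-separated groups, while values near 0 indicate overlapping ones. Computed on reduced coordinates, the score describes separability in that reduced space only.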
The figure illustrates the distribution of keyword clusters in a two-dimensional space. Cluster 1 and Cluster 3 are well defined, each occupying a distinct part of the coordinate system. In contrast, Cluster 0 and Cluster 2 overlap partially but noticeably. They are concentrated near the center of the coordinate system, indicating a shared foundational vocabulary or less variance along the primary axes than the other clusters. To extend the analysis, the clusters can also be presented in a 3D space. While the 2D view suggested a high degree of overlap between Cluster 0 and Cluster 2, the addition of a third dimension effectively separates them along the vertical axis. Cluster 1 and Cluster 3 keep their high degree of separation and structural integrity, showing that their thematic uniqueness is clear across multiple dimensions. Cluster 0 and Cluster 2 remain close in the horizontal plane but are distinct in 3D. The 3D view also clarifies that the clusters are not flat spots but volumetric clouds, providing a more accurate representation.
The resulting clusters were each defined by several keywords and their weights. Because assigning them short names is challenging, an LLM was used to propose initial names, which were then refined. This step was supported by an LLM because converting long weighted keyword lists into short semantic labels is a task well suited to a language model, which can efficiently summarize the underlying meaning without influencing the analytical results. Prior research also shows that model-generated cluster names can be better than naming by experts [
99]. We used an LLM (Gemini 2.5) only to suggest human-readable labels for clusters and principal components. No primary analyses or inferences relied on LLM outputs. The prompt asked for a short name for each cluster, with the clusters presented as keywords and their weights. The cluster names suggested by the LLM, after some additional refinement, are:
Cluster 0: Corporate Social Responsibility and Stakeholder Engagement and Sustainable Development.
Cluster 1: Circular Economy and Sustainable Production and Resource Looping.
Cluster 2: Decision-making, Resilience and Emerging Technologies.
Cluster 3: Green Supply Chain Management.
For the naming of the Principal Components (PCs), the same approach was used and the results are:
For 2D space—PCA1: Environmental Performance and Sustainable Operations. PCA2: Circular Economy and Reverse Logistics.
For 3D space—PCA1: Green Economy and Sustainable Industrial Operations. PCA2: Reverse Logistics and Circularity Challenges. PCA3: Economic Performance and Environmental Oversight.
During the research process, experiments with other clustering and topic modeling methods were also performed. In particular, latent Dirichlet allocation (LDA) was applied after the adopted TfidfVectorizer vectorization, with different settings for the number of clusters and features. The results support the findings and could be used to augment the output. In our case, LDA with n = 4 also produced clusters related to corporate and social responsibility, circular economy, and green supply chain, plus a cluster covering resilience and technological aspects. The latter showed slight deviations from the keyword-based clustering, but the overall thematic alignment remains.
4.3. Emerging Research Topics
The largest thematic groups are “decision_making”, “green_supply_chain” and “circular_economy”, which indicates that these are the central topics in this field of research. “decision_making” highlights the focus on strategic and operational choices under sustainability constraints. “green_supply_chain”, in turn, emphasizes the environmental aspect of supply chains, alongside the separate high-importance topics “environmental_impact” and “environmental_management”. This may mean that the notion of green is gradually widening its boundaries to include more supply chain activities that can be classified as sustainable. Equal importance is given to “circular_economy”, which shows its key role in resource optimization, value creation, and alignment with environmental goals. Additionally, smaller, closely related topics like “recycling,” “waste_management,” and “closed-loop_supply_chain” show a commitment to end-of-life management of products and long-term environmental balance, which strengthens circularity. Various industries are affected by circularity and green initiatives, and as a result, major topics include “manufacturing” and “food_supply.” They hold this place because sustainable practices vary across different types of production. Other significant themes include “blockchain” and “industry_4_0.” Their increasing use indicates that digital technologies are being integrated into the supply chain to enhance transparency, traceability, and efficiency. Less dominant but important topics include “corporate_social_responsibility”, “economic_and_social_effects”, and “social_sustainability,” which show a recognition of the social dimension, although this domain remains less researched than others. “stakeholders” belongs in the same category, with similar significance, because of their role in social responsibility and the multiple parties involved across the entire supply chain.
Overall, the topic distribution shows that the research is multi-dimensional, with a strong focus on decision-making, circularity, and environment, complemented by emerging technologies and social considerations. This suggests two things: first, although environmental and operational dimensions are sufficiently represented, there remains potential for improved incorporation of new technologies and social sustainability; second, the smaller the topic, the larger its potential for further research.
4.4. Keyword Cluster Development over Time Forecast
This study also takes a step toward forecasting bibliometric trends, one of which is the number of articles. After the clusters are defined, each article is assigned to a cluster, which enables the creation of a new data table showing the yearly number of articles per cluster for 2000–2025 (the 2025 data are incomplete and are excluded). We used the auto_arima function from the pmdarima library (version 2.0.4) in Python [
100], which automates the identification of the most suitable ARIMA model parameters. For each time series, auto_arima explores combinations of the autoregressive order (p∈[0, 3]), the differencing order (d∈[0, 2]), and the moving average order (q∈[0, 3]). Seasonality is disabled, as no recurring periodic patterns are expected in the annual article counts. The optimal model for each cluster is identified using the Akaike Information Criterion (AIC). The stepwise = True option speeds up the search by employing a heuristic algorithm. The forecast length is n_periods = 3. Given that our annual time series are short and bibliometric forecasting is not the primary aim of this study, the forecasts are interpreted solely as indicative directional trends. The analysis does not seek precise numerical prediction but instead provides a high-level view of the short-term evolution of the identified clusters. This framing aligns with the exploratory nature of the study and avoids over-interpreting statistical output based on limited data. To make the results comparable, the forecast data are normalized and presented together on one plot. The pyplot module of Matplotlib [
101] (version 3.10.0) was used to create the graphical representations. Differencing was applied where required to achieve stationarity. The Ljung–Box test results, with generally high p-values, Prob(Q) (0.08 for Cluster 0, 0.58 for Cluster 1, 0.64 for Cluster 2, and 0.68 for Cluster 3), show no significant residual autocorrelation at the 5% level, which indicates that the models describe the evolution of each cluster’s article count reasonably well. The ARIMA models therefore demonstrate adequate short-term fit, as indicated by the low AIC values and the absence of residual autocorrelation, making them suitable for identifying directional tendencies in cluster development. However, a comprehensive assessment of the residual diagnostics also revealed consistent limitations across all clusters. The residuals are not normally distributed, and there is heteroskedasticity, i.e., strong evidence that the variance of the forecast errors is not constant over time. While the models effectively capture underlying trends, these diagnostics weaken the statistical validity of the calculated confidence intervals. Given this, together with the dynamic nature of the field and the limited length of the available time series, the forecasts offer insight into general future directions but should be interpreted with caution. They are presented as indicative rather than conclusive and serve as supportive context for research trajectories and thematic shifts within sustainable supply chains rather than as precise quantitative predictions.
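The two diagnostics mentioned above, differencing and the Ljung–Box statistic, can be sketched in plain Python. This is an illustrative implementation of the textbook formula Q = n(n + 2) Σ r_k² / (n − k), not the routine that produced the reported Prob(Q) values. The function names, the toy counts, and the toy residuals are hypothetical.

```python
def difference(series, d=1):
    """Apply first differencing d times, as used to remove trend before ARMA fitting."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag for a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation of the residuals at lag k.
        r_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# Toy annual counts (hypothetical); one difference removes the growing trend.
counts = [3, 5, 8, 12, 17, 23, 30]
diffed = difference(counts, d=1)

# Toy residuals with a strong alternating pattern, i.e. clear lag-1 autocorrelation.
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
q_stat = ljung_box_q(residuals, max_lag=1)
print(diffed, round(q_stat, 3))
```

Q is compared with a chi-square distribution. A large Q, and hence a small p-value, signals remaining autocorrelation, whereas the high p-values reported above point the other way.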
To present the overall forecast, the normalized time-series and forecasts are presented in
Figure 3.
Forecast values are shown in normalized form to support comparison across clusters. The confidence intervals should be read as indicative ranges, and the forecasts should be interpreted as general directional tendencies in publication dynamics rather than precise numerical predictions. Additionally, other forecasting approaches, such as exponential-smoothing methods, growth-rate transformations, or count-based models, could be explored in future research, particularly as longer and more granular bibliometric datasets become available.
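The normalization method is not specified in the text, so the sketch below assumes min-max scaling on the historical range of each cluster, with the same scaling applied to the forecast. Under that assumption, forecast values above 1 indicate growth beyond the historical maximum. The function name and the numbers are hypothetical.

```python
def min_max_normalize(history, forecast):
    """Scale a series to [0, 1] on its historical range and reuse that scale for the forecast."""
    lo, hi = min(history), max(history)
    span = hi - lo
    if span == 0:
        raise ValueError("a constant series cannot be min-max scaled")

    def scale(value):
        return (value - lo) / span

    return [scale(v) for v in history], [scale(v) for v in forecast]


# Toy yearly counts and a three-step forecast (invented numbers).
history = [10, 20, 30, 50]
forecast = [60, 70, 80]
norm_hist, norm_fc = min_max_normalize(history, forecast)
print(norm_hist, norm_fc)
```

Scaling each cluster separately puts large and small clusters on one axis, at the cost of hiding their absolute publication volumes.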
From the data collected, it is evident that Cluster 1 and Cluster 2 have a clearly positive forecast for their number of publications over the next 3 years. Cluster 3 does not have the same positive projection, and Cluster 0 is at a maturity level. These results reveal general trends in the data and represent the overall expectation for each cluster’s evolution. The circular economy and resilience, as well as environmental decision-making and tech-driven processes in supply chains, are becoming more attractive. Green Supply Chain Management may not keep growing, while corporate social responsibility and topics related to stakeholders or to sustainability in general are reaching stable levels. The observed differences in cluster trajectories can be explained by their thematic composition. Clusters 1 and 2 include topics strongly aligned with current priorities, such as the circular economy, environmental decision-making, supply chain resilience, and technology-enabled logistics. These areas have experienced rapid growth, possibly because of regulatory pressure, digital transformation, and increasing sustainability requirements in practice, which is also reflected in academic research. In contrast, Cluster 3 contains topics whose research growth has reached maturity, and Cluster 0 reflects an already established stream of literature in which publication volumes remain relatively stable over time. The positive forecast for Cluster 2 looks particularly significant. As previously noted, this cluster represents the integrated semantic core of the research field. The fact that this consolidated mainstream shows a strong upward trajectory indicates that the field is not just growing in volume but is actively maturing around its most integrated systemic framework.
This alignment between the structural dominance of Cluster 2 and its forecasted growth reinforces the conclusion that the future of sustainable supply chain research is increasingly focused on the convergence of management, technology and resilience.
4.5. Keyword Co-Occurrence Networks
The defined clusters contain various words that can co-occur as bigrams. This reveals deeper relationships. This study applies a bigram filter with two settings—top 100 and top 20 bigrams—to explore these connections. Visualizations in
Figure 4 and
Figure 5 show word frequency (circle size) and bigram strength (line thickness).
Both figures show that several groups are formed. In the first figure, a significant center forms around environmentally related problems, with the green supply chain as a major focus. Unexpectedly, the term “closed-loop supply chain” appears further away from the environmental terms and closer to waste management and recycling. The topic of food management also emerges with clear significance. Decision-making is very clearly defined, with multiple links to many of the other words. It is also surprising that the Industry 4.0 and blockchain terms show weak co-occurrence with the main sustainability vocabulary, suggesting that these themes are somewhat isolated. In the second figure, the number of words is reduced, and clearer links are visible. Environmental problems and the green supply chain keep their places, while circular economy- and recycling-related problems form another group. In the middle, decision-making is the common bridge between all of these problems.
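The structure behind such a network view can be sketched as follows. The example keeps the top-N bigrams as weighted edges and sums the edge weights per word to indicate which terms act as bridges. The function name `build_network` and the pair counts are invented for illustration and are not the study's data.

```python
from collections import Counter, defaultdict


def build_network(pair_counts, top_n):
    """Keep the top_n strongest pairs as edges and sum the edge weights per word."""
    edges = Counter(pair_counts).most_common(top_n)
    adjacency = defaultdict(dict)
    strength = Counter()
    for (a, b), weight in edges:
        adjacency[a][b] = weight  # the edge weight corresponds to line thickness
        adjacency[b][a] = weight
        strength[a] += weight
        strength[b] += weight
    return adjacency, strength


# Invented bigram counts (illustration only).
pair_counts = {
    ("decision_making", "environmental_impact"): 40,
    ("circular_economy", "decision_making"): 35,
    ("circular_economy", "recycling"): 30,
    ("blockchain", "industry_4.0"): 5,
}
adjacency, strength = build_network(pair_counts, top_n=3)
hub = strength.most_common(1)[0][0]
print(hub, dict(strength))
```

In this toy data, the weakly connected technology pair falls outside the top-N filter, which mirrors how a stricter filter leaves only the strongest links visible.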
4.6. General Abstract Results and Token Clustering
After data cleaning, 8824 abstracts were available and processed, which represents 98.5% of the initial dataset (8824 out of 8955 records). The abstract analysis supports earlier findings by examining bigram frequencies and part-of-speech (POS) dependencies. Frequent bigrams such as circular economy, environmental impact, carbon emission, and social responsibility highlight the field’s focus on environmental and societal dimensions. The emergence of Industry 4.0 reflects the digital transformation of supply chains, aligning with keyword trends toward resilient, tech-integrated logistics. Green management remains stable, suggesting its role as an important sustainability strategy. Decision-making terms appear less frequently in abstracts than in keywords. POS dependency analysis further reveals links—dominant pairs like (economy, circular) and (performance, environmental) indicate systemic and performance-oriented approaches. The terms (management, green) and (responsibility, social) reflect ethical and managerial integration. Overall, the abstracts show the evolution from environmental protection toward value-generating, complex logistics systems.
K-means clustering was performed after tokenization of the abstracts. After this step, several groups of tokens were removed, such as publishers’ names, years, words related to the research process (review, researcher, study, literature, paper, etc.), and dominant words that are expected to appear in the abstracts, like supply, chain, and sustainable. The overall process applied for the keyword clustering was repeated to find the appropriate number of clusters and features. The clustering results are less definitive than those of the keyword analysis and should therefore be viewed as providing a general perspective rather than a precise conclusion. For this paper, final clusters_nm = 6 and max_features = 100 were selected, and the results are presented in
Figure 6a,b.
Using the same naming approach as for the keyword clusters, the resulting clusters are:
Cluster 0: Industry and Business Practices for a Circular Economy.
Cluster 1: Environmental, Social and Risk Management.
Cluster 2: Supplier Relationships and Performance Management.
Cluster 3: Food Production and Consumer Waste Management.
Cluster 4: Carbon Emissions and Cost Reduction Strategies.
Cluster 5: Green Practices and Environmental Performance.
To strengthen the validation, we also examined how the four keyword-based clusters relate to the clusters obtained from the abstract token analysis. This comparison helps to show whether the two methods point to similar thematic structures or whether they diverge in a systematic way, as shown in
Table 2.
The results indicate that the main themes appear in both approaches, but sometimes at different levels of detail. The first is the Circular Economy and Sustainable Production topic. The second is green management, which shows significant overlap. The third is the topic related to performance and resilience. The most evident discrepancies concern specific application areas. The abstract clusters reveal several specific domains that are not mentioned in the keyword clusters, for instance, “Food Production and Consumer Waste Management” (abstract Cluster 3) and “Carbon Emissions and Cost Reduction Strategies” (abstract Cluster 4). Also, the keyword cluster “Corporate Social Responsibility and Stakeholder Engagement and Sustainable Development” does not have a direct equivalent in the abstract clusters. Finally, the keyword cluster “Decision-making, Resilience and Emerging Technologies” differs most. As the dominant cluster in this study, it functions as a primary strategic driver that integrates various operational themes. It is a highly conceptual cluster that does not align with any single abstract cluster; its components are likely distributed across several of them. The specific mention of “Emerging Technologies” is unique to this cluster. The results demonstrate that although the two clustering methods produce somewhat different groupings, they are complementary and should be used together. The keyword clustering provides a high-level, conceptual overview, while the abstract clustering offers a more specific perspective. Combining these approaches augments the research and offers a broader view of the research area. Overall, this comparison shows that the abstract-based clustering supports the main thematic structure found in the keyword analysis, with each method highlighting the themes at a different level of granularity.
This ensures that the two approaches are not producing parallel or disconnected results, but support each other.
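One way such a comparison between the two clusterings could be quantified is a cross-tabulation of the two label assignments per article. The sketch below assumes each article carries both a keyword-cluster label and an abstract-cluster label. It is an illustration only and not necessarily how Table 2 was constructed; the function name and the labels are hypothetical.

```python
from collections import Counter


def cluster_overlap(keyword_labels, abstract_labels):
    """Share of each keyword cluster's articles that fall into each abstract cluster."""
    joint = Counter(zip(keyword_labels, abstract_labels))
    row_totals = Counter(keyword_labels)
    return {
        (k, a): count / row_totals[k]
        for (k, a), count in joint.items()
    }


# Hypothetical labels for six articles (keyword cluster, abstract cluster).
keyword_labels = [0, 0, 1, 1, 1, 2]
abstract_labels = [1, 1, 0, 0, 3, 2]
shares = cluster_overlap(keyword_labels, abstract_labels)
print(shares)
```

A keyword cluster whose shares are spread over many abstract clusters would correspond to the conceptual, distributed pattern described above for the dominant cluster.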
5. Discussion and Hypotheses’ Confirmation
The purpose of this discussion is to interpret the large-scale patterns derived from the literature of nearly 9000 articles. Given the scope of this research, the aim of this study is to systematize long-term thematic developments and identify points where existing concepts of sustainability develop, converge or diverge. For this reason, the focus is on how empirical models refine or nuance traditional understandings of sustainable supply chains. The discussion related to the research questions aims to clarify how environmental, circular, technological and social themes interact in the literature and how they shape the current conceptual landscape of sustainable supply chain research, which is also one of the main contributions of this study. The discussion expands on how the empirical findings connect to established concepts in the field and highlights new interactions that earlier research has not fully addressed. This helps improve clarity and reduces redundancy in interpreting the results.
5.1. Thematic Structure of the Sustainable Supply Chain Landscape (RQ1 and H1)
The clustering and co-occurrence analyses identified four macro-clusters in the literature (Corporate Social Responsibility and Stakeholder Engagement, Circular Economy and Sustainable Production, Decision-Making, Resilience and Emerging Technologies, and Green Supply Chain Management). This structure empirically confirms H1, showing that sustainability research in supply chains is indeed organized around a limited number of interrelated domains combining environmental, social, and technological dimensions. Earlier conceptual assumptions [
1,
2] are thus corroborated by quantitative evidence. The coexistence of “traditional” green SCM topics (e.g., environmental management, waste reduction) with emerging ones such as AI, IoT, and blockchain integration reflects the field’s transition from compliance to innovation. The simultaneous presence of CSR-related and technology-related clusters also indicates an expanding theoretical spectrum, from ethics and stakeholder engagement to data-driven governance, suggesting that sustainable supply chains are now interpreted as socio-technical ecosystems rather than isolated managerial processes. This thematic structure reflects a shift toward more integrated socio-technical interpretations of supply chain sustainability, where environmental and technological domains reinforce each other. Previous reviews consistently identify thematic clusters concerning green logistics, circularity, and technology-enabled supply chains. Our findings validate these prevailing streams while adding nuance by illustrating their prolonged interaction and co-evolution. A further finding of this study is the high degree of semantic integration within the research core. The sensitivity analysis demonstrated the structural integrity of the cluster distribution. The dominance of one keyword cluster is not a methodological artifact but more likely a reflection of systemic research practice in SSCM. This is also supported by the abstract clustering. Such convergence suggests conceptual maturity, where environmental, technological, and managerial dimensions have evolved from isolated variables into an integrated research mainstream. Like [
58], we see that there are still gaps in research on social sustainability and developing economies. This suggests that these imbalances are structural and not just random. Simultaneously, the evident co-emergence of circularity and digitalization in our data appears more pronounced than in previous reviews, suggesting a growing integration of these themes.
5.2. Temporal Evolution and Forecasting of Research Trends (RQ2 and H2)
The longitudinal keyword distribution (2000–2025) reveals distinct chronological phases in SSCM’s evolution.
Phase I (2000–2008) was dominated by “green logistics,” “environmental management,” and “costs,” representing the classical efficiency-driven approach.
Phase II (2009–2018) saw the emergence of “CSR,” “reverse logistics,” and “closed-loop systems,” marking the integration of circular economy principles.
Phase III (2019–2025) shows the rise of “circular economy,” “blockchain,” and “Industry 4.0,” pointing to a digitally enabled, regenerative perspective on sustainability.
The ARIMA forecasting supports this shift. Clusters 1 (Circular Economy and Sustainable Production) and 2 (Decision-Making, Resilience and Emerging Technologies) display a strong positive growth trend over the next three years, while Cluster 3 (Green SCM) shows a plateauing trajectory—signifying a maturing research field. Cluster 0 (CSR and Stakeholder Engagement) stabilizes at moderate levels, suggesting consolidation rather than expansion.
These results confirm H2, demonstrating that scholarship is moving from linear, environmentally centered approaches toward integrated techno-circular frameworks. The accelerated growth of technology-enabled and circular research, especially around traceability, predictive analytics, and digital transparency, signals that the next research frontier will be characterized by resilient, intelligent, and adaptive supply chains aligned with Industry 5.0. These trends further underscore the growing importance of digital transparency, predictive capabilities, and adaptability in shaping the next generation of sustainable supply chain research.
5.3. Conceptual Interconnections and Bridging Mechanisms (RQ3 and H3)
The keyword co-occurrence network highlights the bridging function of “decision-making,” “traceability,” “digitalization,” and “resilience.” These terms act as conceptual nexuses linking the environmental, circular, and social domains of sustainability.
This observation validates H3, supporting the hypothesis that decision-making and digital innovation form the connective tissue of the research. Theoretical implications point to a hybrid paradigm—the techno-circular supply chain—in which sustainability performance is enhanced through continuous feedback loops between technological intelligence and environmental objectives.
Practically, this conceptual bridge reflects a managerial evolution: firms are shifting from reactive compliance (focused on environmental standards) to proactive orchestration, where data-driven decision systems coordinate sustainability across global networks. The growing prevalence of terms like “blockchain,” “transparency,” and “AI” suggests that technology is no longer merely an enabler but a core component of governance within sustainable supply networks.
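The bridging role discussed in this section can be illustrated with a minimal sketch that counts keyword co-occurrences and records how many thematic domains each term is linked to. The records, the domain labels in `domain_of`, and the function names are hypothetical and chosen only for illustration; the study's own network measures may differ, and a standard graph metric such as betweenness centrality would be a common alternative.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: one keyword set per publication (not study data).
records = [
    {"circular economy", "traceability", "reverse logistics"},
    {"green scm", "decision-making", "emissions"},
    {"social sustainability", "decision-making", "equity"},
    {"circular economy", "decision-making", "blockchain"},
    {"green scm", "emissions"},
    {"social sustainability", "traceability"},
]

# Hypothetical mapping of domain-specific keywords to sustainability domains.
domain_of = {
    "circular economy": "circular",
    "reverse logistics": "circular",
    "blockchain": "circular",
    "green scm": "environmental",
    "emissions": "environmental",
    "social sustainability": "social",
    "equity": "social",
}


def cooccurrence(keyword_sets):
    """Count how often each unordered keyword pair appears in the same record."""
    counts = defaultdict(int)
    for keywords in keyword_sets:
        for first, second in combinations(sorted(keywords), 2):
            counts[(first, second)] += 1
    return counts


def domains_reached(term, counts, domains):
    """Return the set of domains that the term co-occurs with at least once."""
    reached = set()
    for first, second in counts:
        if term == first and second in domains:
            reached.add(domains[second])
        elif term == second and first in domains:
            reached.add(domains[first])
    return reached


counts = cooccurrence(records)
for term in ("decision-making", "traceability", "emissions"):
    print(term, sorted(domains_reached(term, counts, domain_of)))
```

In this toy corpus, "decision-making" is linked to all three domains and "traceability" to two, whereas "emissions" stays within a single domain, which is the pattern a bridging term would be expected to show.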
5.4. Underrepresented Domains and Research Gaps (RQ4 and H4)
Despite these advances, the text mining results expose a clear asymmetry among the three pillars of sustainability. While environmental and technological topics dominate the research landscape, social sustainability (encompassing labor welfare, inclusivity, equity, and inter-organizational collaboration) remains structurally marginal. Terms such as “social sustainability,” “human well-being,” and “equity” appear sporadically and exhibit weak interconnections within the co-occurrence networks. It should also be noted that the apparent weakness of the social dimension may partly reflect the database choice, as socially oriented studies are more frequently published in regional or non-English outlets that are not included in the current database.
This evidence confirms H4, reinforcing prior findings indicating that the social pillar is consistently underexplored [30,31]. The imbalance underscores the need for a new generation of SSC and SSCM studies integrating human-centered design, participatory governance, and ethical digitalization. Future research should investigate how emerging technologies (AI, IoT, and blockchain) can be leveraged not only for efficiency but also for social value creation and collective well-being, aligning with the principles of Industry 5.0 and the UN SDGs.
6. Conclusions
This study provides a quantitative and forward-looking examination of how research on sustainability problems in supply chains has evolved, integrating large-scale text mining and time-series forecasting. Analyzing nearly 9000 indexed publications from 2000 to 2025, this research identified the main thematic clusters shaping the field and projected their short-term development using forecasting models. Findings indicate that the topic has progressively transitioned from a predominantly environmental and compliance-oriented perspective toward a techno-circular paradigm, where circular economy, digital transformation, and resilience converge. This shift highlights how sustainability discourse is increasingly driven by technological innovation, particularly AI, IoT, and big data analytics, while revealing that the social pillar remains relatively underrepresented. Methodologically, this study demonstrates how computational text-mining approaches can enrich sustainability research by capturing both historical dynamics and emerging trajectories. This study does not seek to offer definitive conclusions, but rather to provide an additional analytical instrument that complements existing approaches for examining sustainable supply chain research. In addition, the current dataset allows the extraction of a matrix of countries and priority keywords, which shows strong potential for extending the results to regional research differences. This step is not included here because it requires a broader, separate methodological approach and different database processing. Here, we only highlight the potential of this direction; its full development is planned for a later stage of the research project.
6.1. Theoretical Implications
This study consolidates different perspectives on the TBL and how they interact with circular and digital innovation within sustainable supply chains. An integrative perspective is proposed, revealing that technological capabilities and circular practices often develop together, while decision-making, transparency, and traceability act as bridges between environmental, economic, and technological directions. These insights help clarify current trends in the field and frame supply chains as adaptive systems that adjust to changing sustainability priorities. The persistent weakness of the social dimension underscores the need for models that integrate human-centered design and ethical governance, paving the way for an Industry 5.0 paradigm focused on inclusivity and collective value creation.
6.2. Managerial and Policy Implications
Sustainability and digitalization should be integrated rather than treated as separate priorities. The convergence of circular and technological paradigms calls for strategies where environmental and digital investments reinforce each other. Firms need data-driven architectures using IoT, analytics, and predictive tools to optimize resources, cut emissions, and improve transparency. Social sustainability indicators such as labor well-being and diversity should be embedded in performance dashboards to ensure technology does not compromise ethics. Policymakers should promote balanced regulations and align disclosure standards (ESG, GRI, CSRD) with technological and circular measures to accelerate progress toward the UN SDGs. The cluster analysis indicates where managerial and policy attention is most needed. The rising importance of circularity and technology-enabled resilience suggests that organizations should prioritize investments in reverse-flow systems, digital traceability, and risk-responsive planning. At the same time, the stabilizing CSR- and green-oriented themes show that these practices have become standard expectations. These insights help firms and policymakers focus on the areas where future development is likely to be most impactful.
6.3. Limitations and Directions for Future Research
While this study offers an innovative overview of sustainability problems in supply chains, several limitations indicate potential for further research. First, relying on a single database may exclude regional and non-English studies. Because socially oriented sustainability research is more frequently published in such outlets, it may be systematically underrepresented, which potentially affects the observed thematic clusters; future work should integrate additional databases. Second, the clustering and forecasting could be adjusted if more detailed research questions are defined. At this stage, the focus is on the automated process that supports researchers and on its potential to be applied in other fields. Third, text mining identifies patterns but does not explain them, so this basic analysis could be combined with, for example, qualitative methods. We acknowledge that more advanced semantic models give more depth, but they require full-text corpora and longer inputs. We also acknowledge that using abstracts, keywords, and a single database reduces the interdisciplinarity and completeness of the corpus. The findings should therefore be interpreted within the managerial research domain, and future studies may integrate additional subject areas and full-text corpora.
It should be noted that forecasting publication counts reflects only the dynamics of research activity and not the intrinsic scientific importance of a theme. Growth in article numbers may be influenced by external factors such as funding priorities, policy agendas, institutional incentives, and editorial trends rather than the intellectual significance of the topic itself. For this reason, the identification of circular economy and technology-related themes as emerging directions represents a descriptive observation based on publication trajectories rather than a normative evaluation of their scholarly importance. Given the continuously growing volume of scientific publications, the need for automated tools that support the research process is increasing. Our approach is a step in this direction, and we welcome suggestions that could extend our approach and ideas.