Long Short-Term Memory Networks Since Their Inception: Mapping 25 Years of Scientific Development via Bibliometric Analysis

Mohapatra, Subhashree; Singh, Jai Govind; Samantaray, Subham Pankaj; Mishra, Manohar

doi:10.3390/a19050390

Open Access Review

¹ Department of Computer Science and Engineering, Siksha O Anusandhan (Deemed to be University), Bhubaneswar 751030, Odisha, India

² Faculty of Climate Change and Sustainability, Asian Institute of Technology, Klong Luang 12120, Pathumthani, Thailand

³ Department of Electrical and Electronics Engineering, Siksha O Anusandhan (Deemed to be University), Bhubaneswar 751030, Odisha, India

* Author to whom correspondence should be addressed.

Algorithms 2026, 19(5), 390; https://doi.org/10.3390/a19050390

Submission received: 2 April 2026 / Revised: 4 May 2026 / Accepted: 8 May 2026 / Published: 14 May 2026

Abstract

In 1997, Long Short-Term Memory (LSTM) networks were proposed, which significantly changed the landscape of sequential data analysis by resolving the critical issue of the vanishing gradient problem in recurrent neural networks (RNNs). Over the last 25 years, LSTM has advanced from its inception as an innovative solution to its widespread adoption as an essential tool in various fields, including natural language processing (NLP), speech recognition, financial prediction, and healthcare analytics. The present study is a bibliometric review of the evolution of LSTMs. The evolution of LSTM is discussed in terms of its theoretical advancements, architectural developments, and its applications. The study is based on data obtained from the Scopus database, which is then analyzed to identify publication patterns, prominent authors, prominent institutions, and global contributions to the field. The present study is an insightful review of the evolution of LSTM, highlighting its developments and advancements, as well as its applications, to identify its future scope.

Keywords:

fundamentals of Long Short-Term Memory; historical development of LSTM; bibliometric analysis of LSTM research; research trends in recurrent neural networks; scientific impact of LSTM models; publication patterns in LSTM studies; evolution of LSTM in deep learning

1. Introduction

Long Short-Term Memory (LSTM) networks, introduced in 1997, have become a mainstay of Deep Learning (DL) by overcoming the limitations of conventional Recurrent Neural Networks (RNNs) in handling long-range dependencies [1]. Because LSTMs mitigate the vanishing gradient problem and handle sequential data efficiently, they have been employed in numerous domains such as Natural Language Processing (NLP), Speech Recognition (SR), Financial Forecasting (FF), Health Analytics (HA), Energy Forecasting (EF), and autonomous systems. Over the preceding two and a half decades, the academic and industrial communities have produced a rapidly growing and diverse body of literature exploring LSTM architectures, variants, and applications [2]. This growth calls for data-driven methods that can map this complex body of knowledge and clarify the trends, impact, and future directions of LSTM research.

In general, review articles fall into three main categories: systematic reviews (SR), meta-analyses (MA), and bibliometric analyses (BA), each with distinct objectives and methodologies (refer to Figure 1) [3,4]. An SR aims to rigorously synthesize the literature around a focused research question using predefined protocols, often leading to qualitative insights. An MA, in contrast, quantitatively combines data from multiple empirical studies to identify effect sizes or relationships [3,4]. The term “bibliometrics” refers to a field that uses quantitative methods and draws upon multiple disciplines, including philology, information science, mathematics, statistics, and engineering. Considered a significant subfield of information science, bibliometrics can be applied to examine the characteristics of publications within a particular research area or academic paper. It also helps uncover the internal structure and interconnections among these publications [5,6].

Recently, bibliometric reviews have emerged as valuable tools for quantifying and visualizing the intellectual landscape of scientific subjects. With the help of citation analysis, keyword co-occurrence, and network visualization techniques, bibliometric studies can identify influential works, key contributors, emerging research fronts, and evolving patterns of collaborative research [7]. Figure 2 illustrates the annual publication trend of bibliometric-related journal papers from 2015 to 2025, based on Scopus data retrieved using a targeted title-based search query. Figure 2a shows that the number of publications has risen over time, starting from around 2018, with a sharp increase in 2025 to more than 5000 articles. This indicates a rising trend in the use of bibliometric analysis techniques across subjects. Figure 2b shows the distribution of publications by subject: Medicine (18.6%) and Social Sciences (15.2%) are the most active subjects, followed by Environmental Science, Computer Science, Business, and Engineering. All other categories together account for 25% of the total publications, demonstrating that the use of bibliometric analysis techniques is quite widespread. This trend highlights the growing role of bibliometrics in research evaluation, thematic mapping, and scientific policy development.

Considering the exponential growth and interdisciplinary adoption of LSTM models since their inception, a bibliometric approach is well-suited to comprehensively analyze the evolution and intellectual structure of this research domain. This work undertakes such a task by providing a comprehensive bibliometric analysis of LSTM research across a 25-year timeframe. By extracting insights from large-scale publication data, the objective is to provide an overview of the growth trajectory of LSTM-related publications, the research conducted in different domains and geographical regions, and the key themes or clusters that have dominated the LSTM domain.

In the context of the above objectives, the present review critically evaluates the previously published literature. The aspects evaluated include the comprehensiveness of the dataset, the appropriateness of the bibliometric indicators and visualization techniques, the extent to which a study reflects the nuances of LSTM research trends and applications, and its timeliness in light of recent advancements in DL, including the introduction of the Transformer architecture, which has replaced LSTM in certain domains. Weighing the advantages and disadvantages of the previously published work against the objectives of the present study makes it possible to determine its usefulness as a resource for researchers who wish to navigate the extensive landscape of sequential modeling research.

2. Review Methodology

The review methodology of the present study begins with the formulation of the objectives and the research questions. As LSTM has been in existence for nearly three decades since its introduction in 1997, the main objective of this study is to provide an exhaustive bibliometric review of the developmental trajectory of LSTM over the past 25 years. With this objective in mind, the following research questions (RQs) are formulated to guide the study and to serve as the foundation for reviewing the research trajectory of LSTM over the past two and a half decades.

RQ-1: Are there identifiable phases in the evolution of LSTM research, and what characterizes each phase?
RQ-2: What are the publication trends in LSTM research?
RQ-3: Who are the most influential contributors and which regions lead in LSTM research?
RQ-4: Which publications have had the greatest impact?
RQ-5: How has the thematic focus of LSTM research evolved?
RQ-6: What are the patterns of collaboration in LSTM research?
RQ-7: What are the broader trends in deep learning and sequence modeling?

After formulating the required RQs, in the third step (as shown in Figure 3), a search strategy is implemented to collect publication data from a widely used scientific database (Scopus). Afterward, inclusion and exclusion criteria are applied to refine the data, making it more domain-specific and relevant. Then, in the fourth step, the data are pre-processed using various techniques, including data cleaning, duplicate removal, standardization, and format conversion. This yields clean and consistent data that can be used for the bibliometric analysis. Next, a bibliometric study and network analysis are conducted to address RQ-1 to RQ-7. The outputs of these analyses are visualized through charts and with software such as VOSviewer (version 1.6.20). Finally, the results are interpreted and the key findings are listed.

2.1. Data Collection and Pre-Processing Database Selection

In this study, the “Scopus” database is used as the main bibliographic data source because it covers a broad range of scholarly publications (such as peer-reviewed journals, conference proceedings, and interdisciplinary publications), with cited references available since 1970 and citation analysis since 1996. It should be noted that relying on a single bibliographic database could introduce coverage bias, since some papers relevant to this study may be indexed only in other databases, such as Web of Science, and are therefore not included in this analysis. Despite this limitation, Scopus has proven itself in many bibliometric studies due to its structure and consistency.

2.2. Inclusion Criteria and Data Cleaning

Following data retrieval, a systematic screening and cleaning process was applied to ensure dataset quality. Duplicate records were removed, and entries with incomplete metadata were excluded. Considering the survey theme and the RQs, the keywords for the initial search were set as {“Long-Short Term Memory Network” OR “LSTM”} with the time span 1997 to 2025. This yielded a total of 111,133 items. However, the term “LSTM” is also used in unrelated contexts, such as “LSTM Erlangen,” which is the name of an institute (the Institute of Fluid Mechanics’ LSTM, located at Erlangen). Therefore, an exclusion criterion (EC-1) was applied to filter out records matching the term “Erlangen”. Afterwards, another exclusion criterion (EC-2) was used to exclude items such as Erratum, Retracted, Letter, Book, Note, Editorial, Data Paper, Short Survey, etc. A few conference proceedings abstracts with the “Author Name” field listed as “Undefined” were also present in the results, so an additional exclusion criterion (EC-3) was applied to exclude this type of item. The output of this search comprised 109,023 items, including journal articles, conference papers, book chapters, conference reviews, and review articles. The final search string was “TITLE-ABS-KEY (LSTM) AND PUBYEAR > 1997 AND PUBYEAR < 2027 AND (EXCLUDE (DOCTYPE, “er”) OR EXCLUDE (DOCTYPE, “tb”) OR EXCLUDE (DOCTYPE, “le”) OR EXCLUDE (DOCTYPE, “bk”) OR EXCLUDE (DOCTYPE, “no”) OR EXCLUDE (DOCTYPE, “ed”) OR EXCLUDE (DOCTYPE, “dp”) OR EXCLUDE (DOCTYPE, “sh”))”. Following this, because the abbreviation “LSTM” can be ambiguous, a manual check was conducted to filter out non-LSTM-related articles and thereby increase precision. Lastly, the final filtered dataset of 105,238 records was exported to BibTeX and CSV files compatible with VOSviewer.
Afterward, the author names, affiliations, and keywords were standardized to ensure consistency; for example, “Long Short-Term Memory” and “LSTM” were unified under “LSTM.” Likewise, institution names were harmonized (e.g., “MIT” and “Massachusetts Institute of Technology” were merged). Details of the data collection are given in Table 1.
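The deduplication and harmonization steps described above can be illustrated with a minimal Python sketch. It is not the authors' workflow (the study reports doing this manually in Microsoft Excel); the record fields (`doi`, `title`, `keywords`, `affiliation`), the sample records, and the variant map `CANONICAL` are hypothetical and serve only to show the idea.

```python
import re

# Hypothetical variant-to-canonical map; the study's actual mapping is not given.
CANONICAL = {
    "long short-term memory": "LSTM",
    "lstm": "LSTM",
    "mit": "Massachusetts Institute of Technology",
    "massachusetts institute of technology": "Massachusetts Institute of Technology",
}


def normalize(term: str) -> str:
    """Collapse whitespace and case, then map known variants to one label."""
    key = re.sub(r"\s+", " ", term.strip()).lower()
    return CANONICAL.get(key, term.strip())


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record per DOI (or per lower-cased title if no DOI)."""
    seen = set()
    unique = []
    for rec in records:
        key = rec.get("doi") or rec["title"].strip().lower()
        if key in seen:
            continue
        seen.add(key)
        unique.append(rec)
    return unique


# Made-up sample records for illustration only.
records = [
    {"doi": "10.1000/a", "title": "Paper A",
     "keywords": ["Long Short-Term Memory", "forecasting"],
     "affiliation": "Massachusetts Institute of Technology"},
    {"doi": "10.1000/a", "title": "Paper A",
     "keywords": ["LSTM"], "affiliation": "MIT"},
    {"doi": None, "title": "Paper B",
     "keywords": ["lstm"], "affiliation": "MIT"},
]

cleaned = deduplicate(records)
for rec in cleaned:
    rec["keywords"] = [normalize(k) for k in rec["keywords"]]
    rec["affiliation"] = normalize(rec["affiliation"])
```

In this toy input, the second record is dropped as a duplicate DOI, and the remaining keyword and affiliation variants collapse to a single label each.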

2.3. Sensitivity Analysis

To assess the robustness of the chosen search strategy, a sensitivity analysis was carried out on three query variations: (i) “LSTM,” (ii) “Long Short-Term Memory,” and (iii) “LSTM + Long Short-Term Memory.” As shown in Table 2, although the number of documents retrieved by each query varies, the general publication trend, growth pattern, and prominent contributors remained consistent. Most importantly, a sharp rise in publications post-2020 was observed in all three cases, with common influential authors and topics dominating the field. Based on these results, it can be concluded that the study’s major findings are robust to the choice of query.

2.4. Analytical Tools

Given the study’s aim of providing a comprehensive bibliometric analysis of LSTM research since its inception, a combination of software tools is employed to ensure thorough data processing, statistical evaluation, and visualization. First, Microsoft Excel is used for data cleaning, deduplication, and initial tabulation of publications per annum and document types, as well as for manual harmonization of author names, institutions, and keywords. VOSviewer is then used to create bibliometric networks such as co-authorship, co-citation, and keyword co-occurrence networks, which allows clear identification of thematic clusters and collaboration links. This workflow supports a robust exploration of the structure, impact, and development of LSTM research.

In the subsequent section, the foundation of LSTM is presented to address RQ-1, focusing on its publication trends and historical development. This is followed by other sections that explore RQ-2 to RQ-7.

3. Inception and Foundational Concepts

One of the most important advancements in sequence modeling is the Long Short-Term Memory (LSTM) network developed by Hochreiter and Schmidhuber (1997) [1]. Conventional RNNs have difficulty modeling long-range dependencies because of the vanishing and exploding gradient problems during backpropagation through time. LSTM addresses this issue by introducing a special mechanism that comprises a memory cell and gate structures.

In general, the key characteristic of LSTM is the existence of a cell state that allows for information preservation during extended time periods. Gating structures, such as input, forget, and output gates, control the process of adding, storing, or deleting information in the memory cell. Due to the unique characteristics mentioned above, LSTM architectures are capable of learning time dependencies of sequences, which can be widely applied to natural language processing, speech recognition, and forecasting of time series.
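As a minimal sketch of the gating behaviour described above, the following Python code implements one time step of a single-unit LSTM cell in the commonly used form with input, forget, and output gates. The parameter names (`wf`, `uf`, `bf`, and so on), the helper `make_params`, and the bias values are illustrative assumptions, not values taken from any cited work.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g    # keep part of the old state, add new content
    h = o * math.tanh(c)      # hidden state exposed to the next layer/step
    return h, c


def make_params(forget_bias, input_bias):
    """Hypothetical parameters: only the candidate reads the input (wg = 1)."""
    p = dict.fromkeys(
        ["wf", "uf", "wi", "ui", "wo", "uo", "ug", "bo", "bg"], 0.0
    )
    p.update({"wg": 1.0, "bf": forget_bias, "bi": input_bias})
    return p


inputs = [0.5, -0.3, 0.8] * 10  # 30 arbitrary time steps

# Forget gate nearly open, input gate nearly closed: the cell state persists.
keep = make_params(forget_bias=10.0, input_bias=-10.0)
h, c_kept = 0.0, 1.0
for x in inputs:
    h, c_kept = lstm_step(x, h, c_kept, keep)

# Forget gate nearly closed, input gate nearly open: the state is overwritten.
overwrite = make_params(forget_bias=-10.0, input_bias=10.0)
_, c_overwritten = lstm_step(0.8, 0.0, 1.0, overwrite)
```

With the first setting, the initial cell state of 1.0 is almost unchanged after 30 steps, which is the information-preservation property the text refers to; with the second, a single step replaces the state with approximately `tanh(0.8)`.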

Figure 4 demonstrates how the interaction between the memory cell and gate units takes place within an LSTM cell architecture.

Although the technical background of LSTM is rather advanced, its significance is not limited to architectural innovations. The emergence of LSTM was one of the key moments in the history of sequence modeling research since it was followed by the sharp growth in publications and the applicability of LSTM networks in numerous fields.

Architectural Refinements over Time (RQ-1)

Over time, the LSTM architecture has undergone numerous modifications to improve its performance and effectiveness. The Gated Recurrent Unit (GRU), introduced by Cho et al. in 2014, is one such variant; it simplifies the LSTM structure by merging the input and forget gates into a single update gate and dropping the separate output gate. Other variations include peephole connections, which allow gates to access the cell state directly, and Bidirectional LSTM (BiLSTM), which processes sequences in both forward and backward directions, enhancing performance in tasks where contextual information from both ends is helpful. Furthermore, attention mechanisms and residual connections have been incorporated into LSTM architectures in recent research to enhance their learning abilities. The available LSTM variants and their year-wise development are presented in more detail in Figure 5.

4. Bibliometric Analysis

In this section, the results of the bibliometric study of LSTM research over the last 25 years are presented and structured according to the research questions (RQs) proposed in Section 2. Each subsection offers a specific analysis that answers a particular RQ, tracing the evolution of LSTM in scientific sources. The section combines quantitative information on LSTM research with qualitative interpretations of emerging research trends and findings. The methodology provides information not only on the structural and cognitive landscape of LSTM research but also on how the topic has evolved in the context of the latest developments in DL and sequence modeling. The results cover the emergence of the technique in the late 1990s, its adoption in the 2010s, and the latest trends until 2024.

4.1. Publication Trends in LSTM Research (RQ-2)

The growth pattern of LSTM research publications gives a basic picture of how interest in this architecture has developed. LSTM was introduced in 1997 by Hochreiter and Schmidhuber [1]. In the early years, it attracted little interest. Nevertheless, the architecture’s potential for handling long-range dependencies in sequential data provided a foundation for its resurgence in the DL world. Figure 6 presents the annual growth in LSTM-related research publications from 1997 to 2025. The figure shows that in the first five years (from 1997 to 2002), there were few LSTM publications, and research focused on theoretical study and early experimentation. After 2013, a major change occurred, driven by breakthroughs in DL and advances in computation, especially GPU technology. In this period, the potential of LSTM for handling sequential data became widely acknowledged. As a result, research publications increased significantly from 2015 to 2019. During these years, output was dominated by conference proceedings as LSTM entered the ML mainstream, and the publication count increased progressively.

LSTM research peaked in both diversity and volume between 2020 and 2025. This period saw significant growth in journal articles relative to conference papers, which may suggest a maturation of the domain, with more research progressing beyond early-stage experimentation into established peer-reviewed publications. Furthermore, there was an increase in book chapters and monographs, indicating that LSTM had also become a key subject in academic curricula and interdisciplinary research fields. The detailed data in Figure 6b show that journals accounted for the largest proportion of publications by 2022, while conferences continued to contribute heavily to the field’s growth.

4.2. Influential Contributors and Regions (RQ-3)

Identifying the key contributors and the geographical distribution of LSTM research offers valuable insights into how knowledge has been generated, disseminated, and established since the architecture’s inception. This section identifies the most productive authors, organizations, and countries, and evaluates the global collaboration patterns that have shaped the evolution of LSTM-based research.

4.2.1. Leading Authors and Research Impact

This bibliometric analysis finds that a relatively small group of authors has played a central role in driving LSTM-based publications. Unsurprisingly, Sepp Hochreiter and Jürgen Schmidhuber, co-authors of the original LSTM architecture (1997) [1], remain among the most cited and prolific contributors. As DL matured, new authors emerged and made notable impacts in applied fields such as NLP, time-series forecasting, and speech recognition.

Figure 7 depicts the top 15 authors ranked by the number of LSTM-based articles published. The depicted researchers not only contribute significantly to the literature but also often serve as hubs of collaboration in their own networks.

To explore the association between research output and impact, Figure 8 shows a bubble chart that plots total publication count against citation count, where the bubble size reflects citation intensity per publication. To avoid favoring older publications, the citation count was normalized to citations per year, taking each author’s publication years into consideration. This helps to distinguish between high-output researchers and those whose fewer papers have had an outsized impact. For example, C. Shen has fewer papers but a high citation count and the largest bubble, indicating a high impact, while I. Ashraf has the most publications but a lower average impact. Authors like S. Tanwar and R. C. Deo balance both productivity and influence, whereas D. S. Khafaga has low productivity and impact.
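The age adjustment mentioned above can be sketched as follows. The exact normalization used in the study is not specified, so this example assumes a simple rule: each paper's citations are divided by the number of years since publication (counting the publication year), and the per-paper rates are summed per author. The reference year, the function name, and the sample data are hypothetical.

```python
from collections import defaultdict


def citations_per_year(citations: int, pub_year: int, ref_year: int = 2025) -> float:
    """Citations divided by paper age in years, counting the publication year."""
    age = max(ref_year - pub_year + 1, 1)
    return citations / age


# Made-up papers for two placeholder authors.
papers = [
    {"author": "Author A", "year": 2015, "citations": 1100},
    {"author": "Author A", "year": 2023, "citations": 300},
    {"author": "Author B", "year": 2024, "citations": 200},
]

rate_by_author = defaultdict(float)
for paper in papers:
    rate_by_author[paper["author"]] += citations_per_year(
        paper["citations"], paper["year"]
    )
```

Under this assumption, the 2015 paper with 1100 citations and the 2024 paper with 200 citations both contribute 100 citations per year, so raw totals that differ by a factor of more than five become comparable.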

4.2.2. Institutional Contributions

To illustrate the range of institutions advancing LSTM research, Figure 9 displays the top 10 institutions by number of LSTM-based publications, highlighting both academic and industrial contributors.

This analysis of organizational contributions highlights an intense concentration of LSTM-based research within top Chinese academic and research institutions. The top contributors include the Ministry of Education of the People’s Republic of China, the Chinese Academy of Sciences, and several leading universities, including Tsinghua University, Zhejiang University, and Shanghai Jiao Tong University. This reflects the strength of these organizations in applying the latest DL technologies across domains ranging from engineering to computer science and communication systems. The trend has been driven by research priorities and the popularity of the LSTM architecture.

4.2.3. Global Distribution of Research Output

To analyze the global distribution of LSTM-based publications, a country-level heat map is presented in Figure 10. The geographical landscape of LSTM research is diverse; however, certain countries have clearly emerged as front-runners. China consistently ranks first in both publication count and citation impact, driven by its robust university system and deep integration between academia and industry. Other key contributors include India, the USA, the Republic of Korea, Germany, Australia, the United Kingdom, and Canada. Notably, India has shown rapid growth in publication volume in the last decade, though its citation impact per paper varies.

4.3. Most Impactful Publications and Citation Analysis (RQ-4)

Across a vast volume of publications, citation analysis offers a deeper understanding of which works have had the most significant influence on the development and dissemination of LSTM-based research work. In this regard, this section focuses on highly cited papers, explores co-citation patterns, and identifies citation bursts that mark pivotal moments in the field’s development.

4.3.1. Highly Cited Papers

Generally, highly cited papers represent the foundational literature. At the top of the list (as presented in Table 3) is the original paper by Hochreiter and Schmidhuber from 1997 [1], which introduced the LSTM model. Although the paper was not highly cited in the early days, the recent increase in citations reflects the revival of the field of Deep Learning in the early 2010s.

The top 15 highly cited papers on LSTM are presented in Table 3. They represent innovation in the field of AI, architectural advancements in the model, and its successful application. Examples include the paper by Sutskever et al. from 2014 [20], which introduced the use of LSTM in sequence-to-sequence learning, and the work by Graves et al. from 2013 [22], which presented the use of LSTM in handwriting and speech recognition. These works were published in top-tier journals such as Neural Computation, Neural Networks, and Physica D, and presented at top-tier conferences such as NIPS, ICCV, and ICML. Together, they reflect the dominance of the LSTM model in the recent past and its continued importance in the current literature.

4.3.2. Co-Citation Analysis

To understand the intellectual structure of LSTM research, a co-citation network was constructed. Such a network reveals how frequently two publications are cited together, reflecting conceptual linkages and the emergence of thematic clusters. Figure 11 shows the map of co-cited works among influential papers in the field of LSTM-based research. This visualization was also created using VOSviewer version 1.6.20. The co-citation analysis was performed on the cited references found in the overall Scopus dataset.

The unit of analysis and the counting method were set to cited references and full counting, respectively. A threshold of at least 20 citations per reference was applied to retain only influential works. In total, 2,147,779 references were screened, of which 2459 met the threshold. The top 1000 relevant items were chosen to build the network visualization for clarity and readability. The network was normalized using the association strength method, and clustering was performed using the modularity-based algorithm implemented in VOSviewer. The map includes eight clusters.

In the map, each node represents a cited reference, and the size of the node corresponds to the frequency with which that work has been cited together with another. The edges (links) reflect the strength of co-citation relationships, and the colors denote clusters of closely connected references, which can be understood as thematic subfields of the research domain. At the center of the map, the most prominent nodes belong to Hochreiter and Schmidhuber (1997) [1], whose seminal work introducing the LSTM architecture represents the intellectual foundation of these areas. The strength of these nodes also confirms that nearly all subsequent studies within this domain rely on this contribution, establishing it as the main reference point connecting the various clusters of research. The following are some of the major clusters highlighted in the map:

Red Cluster: The Red Cluster is densely populated with literature on NLP, DL models, and their applications. Literature such as Devlin et al. (2018) [32] on BERT and Srivastava et al. (2014) [33] on dropout is very visible. The presence of such literature confirms that LSTMs are highly referenced for developing modern text processing, sentiment analysis, and DL-based applications.
Green Cluster: The Green Cluster represents the foundational theory of LSTMs as discussed by Hochreiter & Schmidhuber [1], that is, the theoretical foundation of RNNs. The density of this cluster confirms its importance: it is highly co-cited, with various studies focusing on overcoming training difficulties with RNNs, such as vanishing and exploding gradients, and on gate usage.
Blue/Purple Cluster: This cluster mostly pertains to research on word embeddings and attention mechanisms, with a focus on representation learning and attention-based models. The prominent studies on Word2Vec by Mikolov et al. (2013) [34,35], GloVe by Pennington et al. (2014) [36], and the Transformer by Vaswani et al. (2017) [37] are cited. The proximity to the core LSTM cluster reveals the intellectual journey from LSTM-based sequence modeling to Transformer-based models and their hybrid forms.
Yellow Cluster: This cluster is related to hybrid models and computer vision-based applications. It includes citations to ResNet by He et al. (2016) [38], hybrid models by Zhang and Wang [39], and other studies on LSTMs in computer vision and multimodal learning. Its position on the periphery reveals that while these applications are peripheral to LSTMs, they are vital extensions of LSTMs into interdisciplinary research.

Based on this co-citation network, it can be inferred that there are two parallel paths in the development of LSTM research: (a) establishing the foundations of LSTMs in sequence modeling, and (b) extending LSTMs into various applications, especially in NLP, attention mechanisms, and hybrid deep learning models. The progression from the core green cluster on foundations outward to the red cluster on applications, the blue cluster on attention mechanisms, and the yellow cluster on hybrid models and computer vision applications traces the intellectual journey of LSTMs.
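The co-citation counting and thresholding described in this subsection can be sketched in a few lines of Python. This is an illustrative approximation, not VOSviewer's implementation: the association strength here is taken as the co-citation count of a pair divided by the product of each reference's total co-citation count, and any constant scaling that VOSviewer applies is omitted. The function name, the reference labels, and the toy threshold of 2 (the study used 20) are assumptions.

```python
from collections import Counter
from itertools import combinations


def cocitation_network(reference_lists, min_citations):
    """Count co-cited reference pairs among references meeting the threshold."""
    # Full counting: each citing document counts once per cited reference.
    cite_counts = Counter(ref for refs in reference_lists for ref in set(refs))
    kept = {ref for ref, n in cite_counts.items() if n >= min_citations}

    pair_counts = Counter()
    for refs in reference_lists:
        selected = sorted(set(refs) & kept)
        pair_counts.update(combinations(selected, 2))

    # Total co-citation weight per reference, used for normalization.
    totals = Counter()
    for (a, b), n in pair_counts.items():
        totals[a] += n
        totals[b] += n

    # Association strength (up to a constant): c_ij / (s_i * s_j).
    strength = {
        (a, b): n / (totals[a] * totals[b]) for (a, b), n in pair_counts.items()
    }
    return cite_counts, pair_counts, strength


# Made-up reference lists of four citing documents.
docs = [
    ["Hochreiter1997", "Sutskever2014", "Vaswani2017"],
    ["Hochreiter1997", "Sutskever2014"],
    ["Hochreiter1997", "Vaswani2017"],
    ["Hochreiter1997", "RarelyCited"],
]
cite_counts, pair_counts, strength = cocitation_network(docs, min_citations=2)
```

In this toy input, the reference cited only once is filtered out before any pairs are formed, which mirrors how the threshold reduced 2,147,779 screened references to 2459.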

4.3.3. Citation Bursts and Early Influential Works

The citation burst analysis focuses on articles that have shown a sudden spike in citations over a short time period, which indicates a sudden shift or acceleration in attention. Figure 12 presents the citation burst timeline of the most influential references. In this figure, each row represents an article, and the horizontal axis represents the years (2000 to 2025). Each bubble marks a year in which the reference was cited, and its size reflects the number of citations in that year. The red-circled bubbles highlight “citation bursts,” which are years where a reference underwent a sudden and significant increase in citations compared to its historical pattern. For instance, Hochreiter (1997) [1] shows consistent citations over time, reflecting its foundational role in RNNs (LSTMs), while Sutskever (2014) [20] and Bahdanau (2015) [14] exhibit sharp bursts after their publication, representing their substantial impact on DL and attention-mechanism research. Likewise, more recent works, such as Yu (2019) [24] and Sherstinsky (2020) [2], demonstrate emerging bursts, indicating the rapid adoption of their methods in current research. The timeline helps identify both foundational works with long-term influence (e.g., Hochreiter 1997 [1], Graves 2005 [22]) and emerging contributions that quickly gained attention (e.g., Bahdanau 2015 [14], Kingma 2015 [40], LeCun 2015 [41]). The figure therefore highlights the evolution of influential literature in the domain and shows how new ideas gain momentum in the scientific community.
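The notion of a burst as a sudden increase relative to a reference's own history can be illustrated with a simple heuristic. The study does not state which burst-detection method was used, so the rule below is an assumption made purely for illustration: a year is flagged when its citation count exceeds the mean of all preceding years by more than `k` standard deviations. It is not Kleinberg's burst-detection algorithm, and the counts are made up.

```python
from statistics import mean, pstdev


def burst_years(counts: dict[int, int], k: float = 2.0, min_history: int = 3) -> list[int]:
    """Flag years whose count exceeds mean + k * std of all earlier years."""
    years = sorted(counts)
    flagged = []
    for idx, year in enumerate(years):
        history = [counts[y] for y in years[:idx]]
        if len(history) < min_history:
            continue  # too little history to define a baseline
        threshold = mean(history) + k * pstdev(history)
        if counts[year] > threshold:
            flagged.append(year)
    return flagged


# Hypothetical annual citation counts for one reference.
series = {2010: 10, 2011: 12, 2012: 11, 2013: 12, 2014: 60, 2015: 70}
flagged = burst_years(series)
```

A steadily cited reference produces no flagged years under this rule, while a jump such as the one from 12 to 60 above does. The choices of `k` and `min_history` strongly affect the result and would need tuning on real data.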

4.4. Thematic Evolution and Application Areas (RQ-5)

The research based on LSTM has grown significantly over the past 25 years, both in terms of core themes and diverse application domains. In this regard, this section analyzes the thematic patterns using keyword co-occurrence, clustering techniques, and temporal segmentation to understand how LSTM’s research focus has shifted over time, from early theoretical work to widespread use in diverse domains.

4.4.1. Keyword Co-Occurrence and Thematic Clustering

Analyzing author keywords and index terms reveals the conceptual structure of LSTM-based research. To this end, a keyword co-occurrence analysis was conducted using VOSviewer, with author keywords as the unit of analysis. To ensure robustness, a minimum occurrence threshold of 10 was set, and 3560 of the 111,002 keywords met this criterion. The resulting network visualization is presented in Figure 13. Here, each node represents a keyword, and its size corresponds to its frequency of occurrence in the dataset extracted from Scopus. The lines connecting the nodes indicate co-occurrence relationships, and the colors indicate clusters of strongly related keywords. The figure shows “LSTM” prominently at the center, indicating its importance in the research field and its role as the hub linking different application fields. The visualization reveals several clusters that indicate the research trends in the field:

Green Cluster (Deep Learning Models and Computer Vision): The presence of keywords such as CNN, RNN, Bi-LSTM, attention, image captioning, and computer vision emphasizes the methodological nature of the research conducted on models and computer vision applications.
Blue Cluster (Natural Language Processing and Sentiment Analysis): The presence of keywords such as sentiment analysis, natural language processing, BERT, GRU, and Twitter emphasizes the popularity of LSTM in text mining and NLP-related applications.
Red Cluster (Forecasting and Energy Systems): The presence of keywords such as wind speed, solar energy, lithium-ion battery, power prediction, and remaining useful life emphasizes the role of LSTM in energy-related applications.
Yellow Cluster (IoT and Cybersecurity): The presence of keywords such as Internet of Things, intrusion detection systems, malware, DDOS, and SDN emphasizes the role of LSTM in IoT applications.
Purple Cluster (Finance and Prediction): The presence of keywords such as stock market, cryptocurrency, ARIMA, and random forest emphasizes the role of integrating LSTM with other prediction models.

This analysis shows that LSTM is not confined to a single research domain but is widely applied across computer vision, NLP, cybersecurity, forecasting, finance, and energy systems. Moreover, the strong interconnection between keywords reflects the interdisciplinary nature of LSTM applications, while the emergence of smaller clusters (e.g., emotion recognition, depression detection, intrusion detection) suggests growing research opportunities in specialized domains.
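The counting behind a co-occurrence map of this kind can be sketched in a few lines. The following is a minimal illustration, assuming each record is simply a list of author keywords; the function name and the tiny example records are hypothetical, and the clustering and layout steps that VOSviewer performs are not reproduced.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records, min_occurrences=10):
    """Count keyword occurrences and pairwise co-occurrences.

    records: list of keyword lists, one per publication.
    Only keywords appearing in at least `min_occurrences` records are linked.
    """
    occurrences = Counter()
    for keywords in records:
        occurrences.update({k.lower() for k in keywords})
    kept = {k for k, n in occurrences.items() if n >= min_occurrences}

    links = Counter()
    for keywords in records:
        present = sorted(kept & {k.lower() for k in keywords})
        links.update(combinations(present, 2))  # one count per paper per pair
    return occurrences, links

# Hypothetical records; the threshold is lowered to 2 so the toy data passes it.
records = [["LSTM", "CNN"], ["lstm", "GRU"], ["LSTM", "CNN", "GRU"], ["ARIMA"]]
occ, links = cooccurrence(records, min_occurrences=2)
print(links[("cnn", "lstm")])  # 2
```

Node size in the map corresponds to the occurrence counts, and edge weight corresponds to the pair counts.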

4.4.2. Temporal Evolution of Research Themes

Figure 14 illustrates the stage-wise temporal evolution of the most frequent author keywords in LSTM-related research, divided into the Early Stage (1997–2007), Growth Stage (2008–2016), and Maturity Stage (2017–2022). During the Early Stage, research activity was limited and mainly theoretical. The most frequent keywords were recurrent neural networks, neural networks, long short-term memory, learning systems, and algorithms. The low frequency values reflect the small publication volume during this period, with a focus on model stability, training mechanisms, and conceptual exploration of sequence learning rather than application-based approaches. The Growth Stage coincided with a resurgence of deep learning. In this period, keywords such as long short-term memory (LSTM), recurrent neural networks, speech recognition, and computational linguistics gained prominence, reflecting a rapid expansion of LSTM applications, particularly in natural language processing (NLP), speech recognition, and forecasting tasks. In Figure 14, the recurring keyword ‘brain’ indicates possible connections to cognitive modeling and neuroscience-related studies. The Maturity Stage shows a rapid increase in both the quantity of publications and the diversity of topics. The most prominent keywords in this stage include ‘deep learning,’ ‘forecasting,’ ‘CNN,’ ‘machine learning,’ and ‘LSTM.’ This stage shows the application of LSTM in various interdisciplinary areas such as health, smart grids, anomaly detection, IoT, and energy forecasting. Comparisons with other recent models, such as GRU and Transformer, also became apparent in this stage.
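The stage-wise segmentation can be expressed as a simple grouping of keyword counts by publication year. The sketch below uses the three stage boundaries stated in the text; the record format (a year paired with a keyword list), the function name, and the sample records are assumptions for illustration only.

```python
from collections import Counter

# Stage boundaries as stated in the text (inclusive).
STAGES = {
    "Early": (1997, 2007),
    "Growth": (2008, 2016),
    "Maturity": (2017, 2022),
}

def top_keywords_by_stage(records, n=5):
    """records: iterable of (year, keyword_list). Returns top-n keywords per stage."""
    counts = {name: Counter() for name in STAGES}
    for year, keywords in records:
        for name, (start, end) in STAGES.items():
            if start <= year <= end:
                counts[name].update(k.lower() for k in keywords)
    return {name: c.most_common(n) for name, c in counts.items()}

# Hypothetical records, not drawn from the Scopus dataset.
sample = [
    (1999, ["LSTM", "algorithms"]),
    (2010, ["speech recognition", "LSTM"]),
    (2012, ["speech recognition"]),
    (2020, ["deep learning"]),
    (2024, ["transformer"]),  # outside all three stages, so it is ignored
]
result = top_keywords_by_stage(sample)
print(result["Growth"][0])  # ('speech recognition', 2)
```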

4.4.3. Application Domain Analysis

The classification of application domains was performed using a hybrid approach combining keyword-based filtering and manual validation. Each publication was initially assigned to a domain based on predefined keyword sets matched against titles, abstracts, and author keywords. The domains were grouped into standardized categories, including energy forecasting, natural language processing, cybersecurity, healthcare, finance, speech processing, and others. To ensure consistency and reduce classification bias, a manual review was conducted, and ambiguous cases were resolved through consensus. This structured approach enhances the reliability of the domain-wise distribution analysis. Figure 15 shows the distribution of the top application areas of LSTM research based on keyword and abstract labeling. The pie chart shows that energy forecasting is the dominant application area with a share of 24.5% (18,559 papers). This is understandable, since LSTM is a key tool for time series prediction problems such as wind and solar energy prediction and energy demand management. The second and third largest application areas are healthcare and NLP, with shares of 15.3% and 14.7% (11,561 and 11,123 papers, respectively). Transportation follows with a share of 11.8% (8954 papers), and IoT is fifth with 10.5% (7922 papers). The remaining application areas are computer vision (9.4%), finance (7.9%), and cybersecurity (4.6%).
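The keyword-based first pass of such a classification can be sketched as follows. The keyword sets here are hypothetical stand-ins (the actual predefined sets are not listed in the text), and the "ambiguous" label marks the cases that would be passed to manual review.

```python
from collections import Counter

# Hypothetical keyword sets; the study's actual sets are not given in the text.
DOMAIN_KEYWORDS = {
    "energy forecasting": {"wind speed", "solar", "load forecasting"},
    "healthcare": {"ecg", "eeg", "diagnosis"},
    "nlp": {"sentiment analysis", "machine translation"},
}

def classify(text):
    """Assign a domain by counting keyword hits in title/abstract/keyword text."""
    text = text.lower()
    hits = {
        domain: sum(kw in text for kw in kws)
        for domain, kws in DOMAIN_KEYWORDS.items()
    }
    best = max(hits.values())
    if best == 0:
        return "others"
    winners = [d for d, h in hits.items() if h == best]
    # Ties are flagged for manual validation rather than resolved automatically.
    return winners[0] if len(winners) == 1 else "ambiguous"

def domain_shares(labels):
    """Percentage share of each label, rounded to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {d: round(100 * n / total, 1) for d, n in counts.items()}

print(classify("LSTM for wind speed and solar power prediction"))  # energy forecasting
print(classify("EEG-based sentiment analysis"))                    # ambiguous
```

Plain substring matching is the simplest possible choice and can over-match short keywords; a production pipeline would more likely match on word boundaries.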

4.5. Collaboration Patterns in LSTM Research (RQ6)

Research collaboration plays an essential role in scientific innovation, especially in interdisciplinary and fast-evolving fields like DL. This section therefore reports the collaboration structures that have shaped LSTM-based research over the past 25 years, focusing on patterns of co-authorship at the author, institutional, and national levels.

4.5.1. Researcher Collaboration Networks

Researcher-level collaboration networks show how authors cluster around thematic or institutional tracks. Through co-authorship analysis in VOSviewer, a network is constructed where nodes represent individual authors and edges indicate co-authored publications. Figure 16 presents the author collaboration network, showing distinct clusters of researchers working on topics such as NLP, time series forecasting, medical applications, and energy. To ensure clarity and focus on influential contributors, a threshold of at least 10 documents and 10 citations per author is applied. Of the 108,530 authors in the dataset, 3168 met these criteria and are included in the network. Here, the node size represents the number of documents per author, edge thickness reflects co-authorship strength, and colors denote either collaboration clusters (network visualization in Figure 16a) or the average publication year (overlay visualization in Figure 16b).

The following are a few major observations from Figure 16a:

The largest cluster (center, dominated by authors such as Wang, Chen, Zhang) indicates an extensive collaboration network mainly from Asian institutions, particularly in China, which has emerged as a major contributor in LSTM research.
Another large cluster on the right (red, including authors like Kumar, Soman, Kim, Schuller) represents groups working extensively on applications such as speech recognition, signal processing, and time series forecasting.
Smaller peripheral clusters correspond to specialized application domains, such as medical research, energy forecasting, and control systems, where collaborations are narrower but domain-focused.

Similarly, the observations from Figure 16b (the overlay visualization of the author collaboration network) are as follows:

Authors in green/yellow shades indicate more recent contributors (2022–2023), showing where new collaborations are emerging.
The figure shows that while established researchers (e.g., Wang, Chen, Zhang) remain central in large-scale collaborations, new clusters of authors, particularly in applied domains like NLP, time-series forecasting, healthcare, and energy, are increasingly entering the field.

The geographic and thematic spread of recent, yellow-colored nodes suggests that the LSTM research community continues to expand with new entrants and application-driven collaborations.

4.5.2. Institutional Co-Authorship Patterns

To explore institutional collaboration patterns, a co-authorship analysis was carried out at the organizational level using VOSviewer. Figure 17a shows the clustered institutional collaboration network, where node size represents the number of publications, edge width indicates collaboration strength, and color denotes the cluster. Organizations were used as the unit of analysis with the full counting approach. To ensure that only active institutions are taken into account, a threshold of at least 10 papers and at least 10 citations per organization was applied. Of 143,540 organizations, 581 met these requirements and were included in the analysis. Unlike usual filtering practice, unconnected items were retained (the “No” option in VOSviewer was chosen), which avoids the biased picture that results when only the most connected institutions are shown. The network was built using the association strength normalization technique and clustered using the default modularity-based clustering algorithm in VOSviewer. The resulting network consists of 581 institutions divided into 193 clusters, with 619 links and a total link strength of 1136. Major institutions, such as the University of Chinese Academy of Sciences and Beijing University of Posts and Telecommunications, appear as central nodes with higher link strength, indicating their significant role in institutional collaboration. In contrast, the many peripheral nodes suggest emerging or regionally confined research efforts, highlighting opportunities for stronger global collaboration in LSTM research.
The University of Chinese Academy of Sciences, together with affiliated Chinese universities such as Central South University, Beijing University of Posts and Telecommunications, and Sichuan University, constitutes the dominant hub in LSTM-based research, reflecting China’s highly integrated national research ecosystem. A few other institutions, including Imperial College London (UK) and Chitkara University (India), appear as peripheral clusters, showing regionally focused collaborations rather than central integration. Figure 17b presents the overlay visualization of the same network, where node colors correspond to the average publication year (purple = earlier, yellow = recent). The temporal pattern shows that the prominent Chinese organizations (blue–green) have an earlier average publication year (2020–2021), while new entrants, such as Chitkara University and other regional organizations (yellow), have joined more recently (2022–2024).
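Association strength normalization divides each raw link weight by the product of the two nodes' totals, so that large institutions do not dominate the map merely because they publish more. The sketch below assumes the commonly described form s_ij = c_ij / (w_i w_j), with w_i taken as the total link strength of node i; VOSviewer may apply an additional constant scaling that is not reproduced here, and the institution labels and counts are hypothetical.

```python
def association_strength(links):
    """Normalize co-authorship counts by the product of node totals.

    links: {(i, j): c_ij} for an undirected network, each pair listed once.
    Returns {(i, j): c_ij / (w_i * w_j)}, where w_i is the total link
    strength of node i (an assumed choice of weight).
    """
    totals = {}
    for (i, j), c in links.items():
        totals[i] = totals.get(i, 0) + c
        totals[j] = totals.get(j, 0) + c
    return {(i, j): c / (totals[i] * totals[j]) for (i, j), c in links.items()}

# Hypothetical institutions A, B, C with made-up co-authorship counts.
raw = {("A", "B"): 2, ("A", "C"): 2, ("B", "C"): 1}
norm = association_strength(raw)
print(round(norm[("A", "B")], 4))  # 0.1667
print(round(norm[("B", "C")], 4))  # 0.1111
```

In the toy example, the A–B link is twice as heavy as the B–C link in raw counts, but after normalization the gap narrows because A has more links overall.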

4.5.3. Country-Level Collaboration and International Ties

At the country level, the collaboration analysis reveals the distribution and evolution of global contributions to LSTM-based studies. Figure 18a depicts the international collaboration network, with node size representing publication volume, edge thickness representing collaboration strength, and colors representing clusters at the national scale. Country was adopted as the unit of analysis with the full counting approach. A minimum threshold of 20 documents and 20 citations per country was set. Of the 640 country entries in the dataset, 100 satisfied this threshold and were included in the network. The association strength normalization approach was employed for constructing the network, and both network and overlay visualizations were used to display it. The final network comprises 100 countries forming eight clusters, with 2251 links and a total link strength of 30,186, depicting a tightly connected collaboration system within the global community.
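The contrast between the country-level and institution-level networks can be made concrete with two simple derived measures computed from the figures reported above: network density (observed links divided by the number of possible node pairs) and mean strength per link. These measures are added here for illustration and are not reported in the original analysis; the function names are assumptions.

```python
def density(n_nodes, n_links):
    """Share of all possible undirected node pairs that are actually linked."""
    return n_links / (n_nodes * (n_nodes - 1) / 2)

def mean_link_strength(total_link_strength, n_links):
    """Average weight carried by one link."""
    return total_link_strength / n_links

# Figures reported in the text for the two networks.
country = {"nodes": 100, "links": 2251, "strength": 30186}
institution = {"nodes": 581, "links": 619, "strength": 1136}

for name, net in (("country", country), ("institution", institution)):
    d = density(net["nodes"], net["links"])
    s = mean_link_strength(net["strength"], net["links"])
    print(f"{name}: density={d:.4f}, mean link strength={s:.2f}")
# country: density=0.4547, mean link strength=13.41
# institution: density=0.0037, mean link strength=1.84
```

Under these measures, roughly 45% of all possible country pairs have collaborated at least once, whereas the institutional network is sparse, which is consistent with its large number of small clusters.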

Figure 18a shows that China, the USA, and India are the three biggest hubs, contributing a majority of the global research on LSTM. China has the highest number of publications and also the strongest collaboration links with the USA, the UK, Japan, and other Asian countries. India has strong connectivity with countries such as Saudi Arabia, Malaysia, Pakistan, and Egypt, indicating its emerging position as a leader in the Global South. The European cluster, consisting of Germany, the UK, France, and Italy, shows strong intra-cluster collaboration. Figure 18b provides the overlay visualization, where node colors correspond to the average year of publication. This temporal view indicates that China, the United States, Germany, and the United Kingdom (blue–green shades) are earlier leaders with sustained activity, while India and countries such as Saudi Arabia, Egypt, and Malaysia (yellow shades) have become more prominent in the most recent years (2021–2023).

5. Broader Trends in DL and Sequence Modeling (RQ7)

Although LSTM has played a fundamental role in the development of DL for sequential data, it is essential to situate its evolution within the broader context of sequence modeling architectures. By comparing LSTM with more recent models, such as GRUs and Transformers, this section examines its enduring role and relevance in today’s AI landscape.

5.1. LSTM vs. GRU and the Evolution of Recurrent Models

Following the success of LSTM, the GRU was proposed in 2014 with a small architectural change (refer to Figure 5, Section 3). As mentioned earlier, GRUs gained traction particularly in domains requiring efficient computation or involving smaller datasets. Bibliometric trends indicate a rising number of GRU-based publications, often in conjunction with LSTM, particularly in comparative or hybrid analyses. Figure 19 illustrates publication trends for LSTM and GRU from 2015 to 2025. LSTM maintained its lead in volume, but GRU showed steady growth and was also popular in resource-constrained settings such as edge computing and mobile devices. Several works compare the two architectures head-to-head on tasks such as sentiment analysis [42,43,44,45,46], EEG signal classification [47,48,49,50,51], stock market prediction [51,52,53,54,55,56], and renewable energy forecasting [57,58,59,60], which reflects the fact that the choice of architecture depends on the constraints of the application domain.
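The efficiency argument can be illustrated by counting trainable parameters in a single recurrent layer. The sketch assumes the standard formulations (an LSTM without peephole connections has four weight blocks: three gates plus the candidate cell; a GRU has three: two gates plus the candidate state) and one bias vector per block. Some frameworks store two bias vectors per block, which changes the totals slightly but not the ratio of weight matrices; the layer sizes below are arbitrary.

```python
def recurrent_params(input_size, hidden_size, n_blocks):
    """Parameters of one recurrent layer with `n_blocks` gate/candidate blocks.

    Each block has input weights (hidden x input), recurrent weights
    (hidden x hidden), and one bias vector (hidden). Assumes one bias per block.
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return n_blocks * per_block

input_size, hidden_size = 64, 128  # arbitrary illustrative sizes
lstm = recurrent_params(input_size, hidden_size, n_blocks=4)
gru = recurrent_params(input_size, hidden_size, n_blocks=3)
print(lstm, gru, gru / lstm)  # 98816 74112 0.75
```

For equal layer sizes, a GRU layer therefore carries three quarters of the parameters of an LSTM layer under these assumptions.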

5.2. The Rise of Transformers and Attention Mechanisms

The sequence modeling landscape shifted with the advent of Transformer models (Vaswani et al., 2017) [37]. Transformers are now more widely adopted than LSTMs across many applications, including NLP, computer vision, and multimodal models.

The analysis shows a growing number of Transformer-based publications from 2018 onwards, with models such as BERT, GPT, and T5 outperforming LSTMs in various applications. As a result, many researchers are shifting from LSTMs to Transformer models, particularly for tasks that involve modeling long-range dependencies [61].

Still, LSTM remains competitive in several settings: time series forecasting, where sequential order and temporal resolution are critical [62]; small data regimes, where LSTMs can outperform Transformer models, which typically require massive datasets [63]; and real-time systems and embedded devices, where LSTMs offer lower inference latency and memory usage than Transformer models [64].

5.3. Bibliometric Signals of Decline or Saturation

The publication trend (refer to Figure 20) shows a paradigm shift in sequence modeling research following the introduction of the Transformer architecture. LSTM and GRU-based models dominated the research scene up to 2018, growing rapidly and accounting for almost all research contributions. However, after the publication of “Attention Is All You Need” in 2017, Transformer-based models gained momentum, and their share grew extremely rapidly. From an almost negligible share of less than 1% in 2017, Transformer-based models have come to account for over 50% of research contributions by the end of 2025. Meanwhile, contributions related to LSTM and GRU-based models, although still high in volume, have plateaued and then declined in growth rate. This shows that the research direction is slowly moving away from recurrent models and towards Transformer-based models. The very minor dip in the growth rate of LSTM-based contributions from 2020 might suggest that this research area is maturing. This is not to suggest that LSTM-based models have become outdated, but rather that research is slowly moving towards a more generalized approach. Among the emerging themes identified, hybrid approaches integrating LSTM with optimization algorithms have gained increasing attention, which is further analyzed in Section 6.

6. Critical Analysis

This section further analyzes the development of LSTM research based on the preceding bibliometric and thematic findings. Even though LSTM has become a fundamental architecture for modeling sequential and time series data, the focus of LSTM research has gradually moved towards improvements, applications, and hybridization. The thematic evolution and keyword analysis reveal an early stage of architecture development followed by application- and optimization-oriented research. Despite increased interest in other architectures such as Transformers, LSTM retains its significance because of its interpretability, efficiency, and capability in handling temporal tasks. Notably, the bibliometric results show a rising trend of hybrid LSTM models designed for performance improvement. Hybrid models are created to address problems such as sensitivity to hyperparameters, slow convergence, and local minima that arise with gradient-based optimization methods. Hybrid LSTM model optimization has thus been emerging as a significant research area, and the following subsection provides a focused critical analysis of this trend.

6.1. Emerging Direction: Hybridization of LSTM with Optimization Algorithms

The bibliometric results show a gradual tendency towards combining LSTM models with optimization approaches to improve model performance. Although such hybrid techniques do not necessarily emerge as prominent clusters in the keyword analysis, their increasing number in recent publications demonstrates an upward trend. The need for hybridization arises from some of the disadvantages of classic LSTM, namely the lack of automatic hyperparameter adjustment and the difficulties of gradient-based optimization, which relies on backpropagation through time (BPTT). For this reason, researchers began to combine LSTM with optimization techniques. The following subsections present the rationale for hybrid LSTM approaches and review them along with their advantages and disadvantages.

6.1.1. Rationale for Hybridization

The bibliometric results identify the desire to overcome problems of parameter tuning and convergence as one of the main drivers for combining LSTM with optimization algorithms. A considerable number of works focus on choosing optimal hyperparameters, including the number of hidden units, learning rate, dropout rate, and sequence length.

To overcome such problems, researchers have started relying on optimization algorithms for parameter tuning. In particular, optimization algorithms based on evolutionary and swarm intelligence principles enable a global search of the parameter space and help avoid local optima. As reflected in the literature, such hybrid approaches contribute to improved model stability and predictive performance, making them attractive for complex real-world applications.

6.1.2. Common Optimization Algorithms Applied with LSTM

The analysis of selected studies reveals that the optimization algorithms most commonly combined with LSTM include the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Differential Evolution (DE), and swarm intelligence-based algorithms such as the Grey Wolf Optimizer (GWO) and the Whale Optimization Algorithm (WOA). GA-based approaches have become widespread for weight, bias, and hyperparameter optimization in LSTM architectures. They offer stronger global search capabilities and mitigate the disadvantages inherent to conventional gradient-based optimization [65,66,67]. PSO is another algorithm often used for tuning the parameters of neural networks, and applications ranging from short-term load forecasting to weather prediction show it to be effective for a variety of optimization tasks [68,69,70]. DE has been widely applied to improve convergence behavior and increase forecasting accuracy, for example in wind speed and renewable energy forecasting [71,72]. Moreover, swarm intelligence algorithms such as GWO and WOA are becoming increasingly popular due to their advantages in searching complex problem spaces [73,74,75]. Lastly, multi-objective optimization algorithms can be used to balance conflicting criteria in the task at hand [76,77,78].
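How a swarm-based optimizer searches a hyperparameter space can be sketched with a basic global-best PSO. This is a minimal illustration, not the method of any cited study: to keep it self-contained, the objective is a made-up surrogate standing in for the validation loss of an LSTM (in practice each evaluation would train and validate a model), and the inertia and acceleration coefficients, the bounds, and the surrogate's optimum at 128 hidden units and a learning rate of 1e-3 are all assumptions.

```python
import random

def pso(objective, bounds, n_particles=20, n_iters=50,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best particle swarm optimization (minimization)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull to own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull to swarm best
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def surrogate_loss(x):
    """Made-up stand-in for LSTM validation loss; NOT a real training run."""
    hidden_units, log10_lr = x
    return ((hidden_units - 128) / 128) ** 2 + (log10_lr + 3) ** 2

# Search over hidden units in [16, 512] and log10(learning rate) in [-5, -1].
best, loss = pso(surrogate_loss, bounds=[(16, 512), (-5, -1)])
print(best, loss)
```

The number of objective evaluations is n_particles × (n_iters + 1), here 1020, which makes clear why such searches become expensive when each evaluation is a full training run.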

6.1.3. Benefits and Limitations of Hybrid Approaches

Several benefits have been linked to the hybridization of LSTM with optimization algorithms. The first is predictive accuracy: hybrid models have been reported to predict more accurately than standard LSTM models. The second is generalization capacity, with hybrid models performing well across different environments. The third is the ability to automate parameter tuning. This flexibility has led to applications in various domains, including finance, energy, weather prediction, health, and industry. The publication trend of the past few years (Figure 21) shows the rising impact of optimization and hybridization. Figure 21b shows the growth trends of different categories of LSTM studies on a logarithmic scale using normalized indexing. Contributions were minimal in the early years from 2005 to 2015, and the trend changed from 2016 to 2019. The period 2017–2018 marked the introduction of CNN-LSTM and attention-based models, and from 2018 onwards the number of outputs has risen exponentially. The shaded region indicates the initial acceleration phase, identified through changes in the logarithmic growth slope. The results reveal that hybrid optimization models show the fastest growth and the highest contribution in recent years, followed by hybrid LSTM approaches. The increase in journal publications indicates the acceptance of this research direction by the scientific community.
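The two transformations mentioned for Figure 21b, normalized indexing and the logarithmic growth slope, can be written down briefly. This is a sketch under assumptions: the index is taken as 100 in a chosen base year, the slope is the change in log10 of the count per year, and the yearly counts are made up for illustration rather than taken from the dataset.

```python
import math

def normalized_index(counts, base_year):
    """Rescale yearly counts so that the base year equals 100."""
    base = counts[base_year]
    return {year: 100 * c / base for year, c in counts.items()}

def log_growth_slopes(counts):
    """Change in log10(count) per year between consecutive observations.

    Requires strictly positive counts. A constant slope corresponds to
    exponential growth; a rising slope marks an acceleration phase.
    """
    years = sorted(counts)
    return {
        b: math.log10(counts[b] / counts[a]) / (b - a)
        for a, b in zip(years, years[1:])
    }

# Made-up yearly publication counts for one hypothetical category.
toy_counts = {2016: 10, 2017: 20, 2018: 80}
index = normalized_index(toy_counts, base_year=2016)
slopes = log_growth_slopes(toy_counts)
print(index[2018])                # 800.0
print(round(slopes[2017], 3))     # 0.301  (doubling)
print(round(slopes[2018], 3))     # 0.602  (quadrupling, i.e. acceleration)
```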

Despite these advantages, hybrid techniques also face certain limitations. One of the main limitations is computational cost: meta-heuristic algorithms require a large number of objective function evaluations, which makes the techniques computationally expensive [14]. Moreover, the majority of the research in this domain has been performed on small datasets, so the broader applicability of the techniques remains questionable. Another limitation is the possibility of overfitting, since tuning against a limited validation set may cause the model to overfit. Finally, the large number of studies using different optimization techniques has created a fragmented research landscape, which has hindered the formulation of a standardized framework for evaluating hybrid LSTM techniques.

7. Conclusions and Future Scope

This bibliometric review has provided an extensive overview of 25 years of research on Long Short-Term Memory (LSTM) networks, emphasizing their significant contribution to the development of AI and DL. The analysis shows an increased rate of publication in recent years, following the DL boom in 2014, together with an expansion in application domains from basic sequence learning to areas such as NLP, finance, healthcare, and energy. Individuals and organizations have made significant contributions to the development of LSTM networks, facilitated by a highly connected global research community. Citation analysis has shown the importance of foundational works and has highlighted significant periods in the development of LSTM networks in terms of citation bursts and thematic shifts. These thematic shifts trace the development of LSTM research from basic constructs to applications, architectural advancements, and hybrid approaches. Despite the development of more recent sequence modeling approaches such as GRUs and Transformers, LSTM still holds its position in various application domains.

Regardless of its maturity, LSTM-based research offers multiple promising directions for future study:

Hybridizing LSTM architectures with attention mechanisms, convolutional networks, or Transformer networks may improve model outcomes while balancing interpretability, efficiency, and performance.
Emerging application domains such as personalized healthcare, climate modeling, and edge computing may present new challenges and opportunities for tailored LSTM variants.
Advancing techniques for understanding LSTM decision-making processes will be crucial for critical applications in medicine, finance, autonomous systems, and energy sectors.
Developing standardized benchmarks and datasets across diverse domains can help systematically evaluate LSTM and its competitors under consistent conditions.
Considering the growing interest in AI deployment on mobile and embedded devices, research into lightweight and quantized LSTM models remains essential.
Expanding global and cross-sector partnerships can accelerate innovation, particularly by bridging academia, industry, and policy stakeholders.

Finally, it is important to acknowledge a few limitations of this study, including the reliance on a specific bibliographic database that may exclude relevant works, potential bias introduced by keyword selection in the search strategy, and the inherent challenges of accurately attributing contributions in large-scale bibliometric studies. Future work will therefore include domain-specific scientific analysis and meta-analysis.

Although care was taken to ensure methodological soundness, certain shortcomings need to be taken into account. First, using only one database (Scopus) may limit how completely the available literature is captured. Second, the specific queries and selection criteria used affect the research outcomes. To check the robustness of the conclusions, a sensitivity analysis was carried out, and it showed that they hold even when alternative search queries are used. Still, future research might expand the analysis to several other databases and use algorithms to disambiguate entities automatically.

Author Contributions

S.M.: Conceptualization, Software, writing—review & editing; writing—original draft; S.P.S. and S.M.: Conceptualization, Software, writing—review & editing; writing—original draft; J.G.S.: Investigation, Review & editing, supervision. M.M.: Investigation, formal analysis, data curation, writing—original draft; All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This work has not been published and is not under consideration for publication in any journal, conference, symposium, or seminar, and it is free from plagiarism.

Data Availability Statement

All data generated or analyzed during this study are included in this published article.

Conflicts of Interest

The authors declare no financial or other conflicts of interest with this research work.

References

1. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
2. Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306.
3. Kumar, M.; George, R.J.; PS, A. Bibliometric analysis for medical research. Indian J. Psychol. Med. 2023, 45, 277–282.
4. Donthu, N.; Kumar, S.; Mukherjee, D.; Pandey, N.; Lim, W.M. How to conduct a bibliometric analysis: An overview and guidelines. J. Bus. Res. 2021, 133, 285–296.
5. He, X.; Wu, Y.; Yu, D.; Merigó, J.M. Exploring the ordered weighted averaging operator knowledge domain: A bibliometric analysis. Int. J. Intell. Syst. 2017, 32, 1151–1166.
6. White, H.D. Pennants for Garfield: Bibliometrics and document retrieval. Scientometrics 2018, 114, 757–778.
7. Li, Y.; Xu, Z.; Wang, X.; Wang, X. A bibliometric analysis on deep learning during 2007–2019. Int. J. Mach. Learn. Cybern. 2020, 11, 2807–2826.
8. Gers, F.A.; Schmidhuber, J. Recurrent nets that time and count. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, Como, Italy, 27 July 2000; Volume 3, pp. 189–194.
9. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
10. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2016, 28, 2222–2232.
11. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
12. Graves, A.; Mohamed, A.R.; Hinton, G. Speech recognition with deep recurrent neural networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 6645–6649.
13. Krueger, D.; Maharaj, T.; Kramár, J.; Pezeshki, M.; Ballas, N.; Ke, N.R.; Goyal, A.; Bengio, Y.; Courville, A.; Pal, C. Zoneout: Regularizing RNNs by randomly preserving hidden activations. arXiv 2016, arXiv:1606.01305.
14. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
15. Neil, D.; Pfeiffer, M.; Liu, S.-C. Phased LSTM: Accelerating recurrent network training for long or event-based sequences. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS’16), Barcelona, Spain, 5–10 December 2016; pp. 3889–3897. Available online: https://dl.acm.org/doi/10.5555/3157382.3157532 (accessed on 7 May 2026).
16. Kalchbrenner, N.; Danihelka, I.; Graves, A. Grid long short-term memory. arXiv 2015, arXiv:1507.01526.
17. Kim, J.; El-Khamy, M.; Lee, J. Residual LSTM: Design of a deep recurrent architecture for distant speech recognition. arXiv 2017, arXiv:1701.03360.
18. Sak, H.; Senior, A.; Beaufays, F. Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition. arXiv 2014, arXiv:1402.1128.
19. Krause, B.; Lu, L.; Murray, I.; Renals, S. Multiplicative LSTM for sequence modelling. arXiv 2016, arXiv:1609.07959.
20. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS’14), Montreal, Canada, 8–13 December 2014; pp. 3104–3112. Available online: https://dl.acm.org/doi/10.5555/2969033.2969173 (accessed on 7 May 2026).
21. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28. Available online: https://proceedings.neurips.cc/paper_files/paper/2015/file/07563a3fe3bbe7e3ba84431ad9d055af-Paper.pdf (accessed on 7 May 2026).
22. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610.
23. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. Neural Comput. 2000, 12, 2451–2471.
24. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
25. Jouppi, N.P.; Young, C.; Patil, N.; Patterson, D.; Agrawal, G.; Bajwa, R.; Bates, S.; Bhatia, S.; Boden, N.; Borchers, A.; et al. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture, Toronto, ON, Canada, 24–28 June 2017; pp. 1–12.
26. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Fei-Fei, L.; Savarese, S. Social LSTM: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 24–30 June 2016; pp. 961–971.
27. Wang, Y.; Huang, M.; Zhu, X.; Zhao, L. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–4 November 2016; pp. 606–615.
28. Gehring, J.; Auli, M.; Grangier, D.; Yarats, D.; Dauphin, Y.N. Convolutional sequence to sequence learning. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1243–1252.
29. Ordóñez, F.J.; Roggen, D. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors 2016, 16, 115.
30. Ravi, S.; Larochelle, H. Optimization as a model for few-shot learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
31. Yue-Hei Ng, J.; Hausknecht, M.; Vijayanarasimhan, S.; Vinyals, O.; Monga, R.; Toderici, G. Beyond short snippets: Deep networks for video classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 4694–4702.
32. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (long and short papers), pp. 4171–4186.
33. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
34. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
35. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 2013, 26. Available online: https://proceedings.neurips.cc/paper_files/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf (accessed on 7 May 2026).
36. Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
37. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf (accessed on 7 May 2026).
38. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
39. Zhang, D.; Wang, S. A protein succinylation sites prediction method based on the hybrid architecture of LSTM network and CNN. J. Bioinform. Comput. Biol. 2022, 20, 2250003.
40. Kingma, D.P.; Salimans, T.; Welling, M. Variational dropout and the local reparameterization trick. Adv. Neural Inf. Process. Syst. 2015, 28. Available online: https://proceedings.neurips.cc/paper_files/paper/2015/file/bc7316929fe1545bf0b98d114ee3ecb8-Paper.pdf (accessed on 7 May 2026).
41. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
42. Seabe, P.L.; Moutsinga, C.R.B.; Pindza, E. Sentiment-driven cryptocurrency forecasting: Analyzing LSTM, GRU, Bi-LSTM, and temporal attention model (TAM). Soc. Netw. Anal. Min. 2025, 15, 52.
43. Kaur, G.; Sharma, A. A deep learning-based model using hybrid feature extraction approach for consumer sentiment analysis. J. Big Data 2023, 10, 5.
44. Singgalen, Y.A. Sentiment Analysis and Trend Mapping of Hotel Reviews Using LSTM and GRU. J. Inf. Syst. Inform. 2024, 6, 2814–2836.
45. Ouni, C.; Benmohamed, E.; Ltifi, H. Sentiment analysis deep learning model based on a novel hybrid embedding method. Soc. Netw. Anal. Min. 2024, 14, 210.
46. Atlas, L.G.; Arockiam, D.; Muthusamy, A.; Balusamy, B.; Selvarajan, S.; Al-Shehari, T.; Alsadhan, N.A. A modernized approach to sentiment analysis of product reviews using BiGRU and RNN based LSTM deep learning models. Sci. Rep. 2025, 15, 16642.
47. Omar, S.M.; Kimwele, M.; Olowolayemo, A.; Kaburu, D.M. Enhancing EEG signals classification using LSTM-CNN architecture. Eng. Rep. 2024, 6, e12827.
48. Manivannan, G.S.; Mani, K.; Rajaguru, H.; Talawar, S.V. Detection of Alcoholic EEG signal using LASSO regression with metaheuristics algorithms based LSTM and enhanced artificial neural network classification algorithms. Sci. Rep. 2024, 14, 21437.
49. Karimian-Kelishadrokhi, M.; Safi-Esfahani, F. TD-LSTM: A time distributed and deep-learning-based architecture for classification of motor imagery and execution in EEG signals. Neural Comput. Appl. 2024, 36, 15843–15868.
50. Ananthi, A.; Subathra, M.S.P.; George, S.T.; Sairamya, N.J.; Prasanna, J.; Manimegalai, P. Motor imaginary tasks-based EEG signals classification using continuous wavelet transform and LSTM network. In Computational Intelligence and Deep Learning Methods for Neuro-Rehabilitation Applications; Academic Press: Cambridge, MA, USA, 2024; pp. 239–256.
51. Das, A.; Singh, S.; Kim, J.; Ahanger, T.A.; Pise, A.A. Enhanced EEG signal classification in brain computer interfaces using hybrid deep learning models. Sci. Rep. 2025, 15, 27161.
52. Alam, K.; Bhuiyan, M.H.; Haque, I.U.; Monir, M.F.; Ahmed, T. Enhancing stock market prediction: A robust LSTM-DNN model analysis on 26 real-life datasets. IEEE Access 2024, 12, 122757–122768.
53. Kothari, A.; Kulkarni, A.; Kohade, T.; Pawar, C. Stock market prediction using LSTM. In International Conference on Smart Computing and Communication; Springer Nature Singapore: Singapore, 2024; pp. 143–164.
54. Liu, F.; Guo, S.; Xing, Q.; Sha, X.; Chen, Y.; Jin, Y.; Zheng, Q.; Yu, C. Application of an ANN and LSTM-based ensemble model for stock market prediction. In Proceedings of the 2024 IEEE 7th International Conference on Information Systems and Computer Aided Education (ICISCAE), Dalian, China, 27–29 September 2024; pp. 390–395.
55. Wang, J.; Hong, S.; Dong, Y.; Li, Z.; Hu, J. Predicting stock market trends using LSTM networks: Overcoming RNN limitations for improved financial forecasting. J. Comput. Sci. Softw. Appl. 2024, 4, 1–7.
56. Agarwal, S.; Sharma, S.; Faisal, K.N.; Sharma, R.R. Time-Series Forecasting Using SVMD-LSTM: A Hybrid Approach for Stock Market Prediction. J. Probab. Stat. 2025, 2025, 9464938.
57. Khan, S.; Mazhar, T.; Khan, M.A.; Shahzad, T.; Ahmad, W.; Bibi, A.; Saeed, M.M.; Hamam, H. Comparative analysis of deep neural network architectures for renewable energy forecasting: Enhancing accuracy with meteorological and time-based features. Discov. Sustain. 2024, 5, 533.
58. Narayanan, S.; Kumar, R.; Ramadass, S.; Ramasamy, J. Hybrid forecasting model integrating RNN-LSTM for renewable energy production. Electr. Power Compon. Syst. 2024, 1–19.
59. Yang, Y.; Han, L.; Qiu, C.; Zhao, Y. A short-term wave energy forecasting model using two-layer decomposition and LSTM-attention. Ocean. Eng. 2024, 299, 117279.
60. Hossain, M.L.; Shams, S.M.; Ullah, S.M. Time-series and deep learning approaches for renewable energy forecasting in Dhaka: A comparative study of ARIMA, SARIMA, and LSTM models. Discov. Sustain. 2025, 6, 775.
61. Sakib, S.; Mahadi, M.K.; Abir, S.R.; Moon, A.M.; Shafiullah, A.; Ali, S.; Faisal, F.; Nishat, M.M. Attention-based models for multivariate time series forecasting: Multi-step solar irradiation prediction. Heliyon 2024, 10, e27795.
62. Kong, Y.; Wang, Z.; Nie, Y.; Zhou, T.; Zohren, S.; Liang, Y.; Sun, P.; Wen, Q. Unlocking the power of LSTM for long term time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 11968–11976.
63. Bornschein, J.; Visin, F.; Osindero, S. Small data, big decisions: Model selection in the small-data regime. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1035–1044.
64. Park, D.; Kim, S.; An, Y.; Jung, J.Y. LiReD: A light-weight real-time fault detection system for edge computing using LSTM recurrent neural networks. Sensors 2018, 18, 2110.
65. Bai, Z. Residential electricity prediction based on GA-LSTM modeling. Energy Rep. 2024, 11, 6223–6232.
66. Singh, U.; Saurabh, K.; Trehan, N.; Vyas, R.; Vyas, O.P. GA-LSTM: Performance Optimization of LSTM driven Time Series Forecasting. Comput. Econ. 2024, 66, 2873–2908.
67. Vaitheeswaran, S.S.; Ventrapragada, V.R. Wind Power Pattern Prediction in time series measuremnt data for wind energy prediction modelling using LSTM-GA networks. In Proceedings of the 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kanpur, India, 6–8 July 2019; pp. 1–5.
68. Badjan, A.; Rashed, G.I.; Bahageel, A.O.; Gony, H.A.I.; Shaheen, H.I.; Tuaimah, F.M. Efficient grid management: Smart forecasting of short-term power load using PSO-LSTM. Eng. Res. Express 2024, 6, 035364.
69. Ge, W.; Wang, X. PSO–LSTM–Markov Coupled Photovoltaic Power Prediction Based on Sunny, Cloudy and Rainy Weather. J. Electr. Eng. Technol. 2025, 20, 935–945.
70. Gundu, V.; Simon, S.P. PSO–LSTM for short term forecast of heterogeneous time series electricity price signals. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 2375–2385.
71. Liu, S.; Zhang, L.; Zou, B. Study on electricity market price forecasting with large-scale wind power based on LSTM. In Proceedings of the 2019 6th International Conference on Dependable Systems and Their Applications (DSA), Harbin, China, 3–6 January 2020; pp. 297–303.
72. Bilgili, M.; Arslan, N.; Şekertekin, A.; Yaşar, A. Application of long short-term memory (LSTM) neural network based on deeplearning for electricity energy consumption forecasting. Turk. J. Electr. Eng. Comput. Sci. 2022, 30, 140–157.
73. Zhu, M.; Qi, H.; Qin, P. IGWO-MALSTM: An Improved Grey Wolf-Optimized Hybrid LSTM with Multi-Head Attention for Financial Time Series Forecasting. Appl. Sci. 2025, 15, 6619.
74. Huiyong, W.; Wang, Z. Stock market forecasting research based on GA-WOA-LSTM. PLoS ONE 2025, 20, e0330324.
75. Sun, Y.; Mutalib, S.; Tian, L. Improved Whale Optimization Algorithm with LSTM for Stock Index Prediction. Int. J. Adv. Comput. Sci. Appl. 2025, 16, 283.
76. Bharathi, M.; Rajaniraiyn, R.; Ramyakumari, S.; Tejaswini, T. Integrated Predictive Maintenance and Performance Optimization of BLDC Motors Using GA, PSO, and LSTM on Temperature and Vibration Data Targeting Bearing Defects. In Proceedings of the 2025 3rd International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), Coimbatore, India, 4–5 April 2025; pp. 1–5.
77. Kishore, C.R.; Rao, D.C.; Nayak, J.; Behera, H.S. Improved particle swarm optimization based bidirectional-long short-term memory for intrusion detection system in Internet of Vehicle. Arab. J. Sci. Eng. 2025, 50, 12357–12386.
78. Hosseini, E.; Al-Ghaili, A.M.; Kadir, D.H.; Gunasekaran, S.S.; Ahmed, A.N.; Jamil, N.; Deveci, M.; Razali, R.A. Meta-heuristics and deep learning for energy applications: Review and open research challenges (2018–2023). Energy Strategy Rev. 2024, 53, 101409.

Figure 1. Comparison of major available survey methodologies.

Figure 2. Year-wise publication of bibliometric journal papers from 2015 to 2025: (a) number of publications over time; (b) number of publications by subject area.

Figure 3. Methodological workflow.

Figure 4. Basic architecture of the LSTM network.

Figure 5. Year-wise development of LSTM Models and their variants [1,8,9,10,11,12,13,14,15,16,17,18,19].

Figure 6. Publication trend in LSTM research: (a) overall, from 1997 to 2025; (b) details of the last decade (2015–mid-2025).

Figure 7. Leading authors of published LSTM-based research: (a) all articles, including journal articles, conference proceedings, books, and book chapters; (b) journal articles only.

Figure 8. Bubble chart showing the relationship between total publications and total citations of leading authors (Journals only).

Figure 9. Top 15 contributing organizations based on the number of published articles in LSTM-related research: (a) all articles, including journal articles, conference proceedings, books, and book chapters; (b) journal articles only.

Figure 10. Country-wise distribution of global LSTM-based research articles.

Figure 11. Co-citation network of LSTM-related literature. Node size reflects citation frequency, edge thickness indicates co-citation strength, and colors represent thematic clusters identified by VOSviewer.

Figure 12. Citation burst timelines.

Figure 13. Co-occurrence map of author keywords in LSTM literature.

Figure 14. Stage-wise temporal evolution of top keywords in LSTM-related research. Each subplot is independently scaled to highlight trends within the Early Stage (1997–2007), Growth Stage (2008–2016), and Maturity Stage (2017–2022).

Figure 15. Top 10 application areas of LSTM research by publication count, based on keyword and abstract tagging. Energy forecasting dominates with 34.7% of publications.

Figure 16. Author collaboration network in LSTM-related research (VOSviewer): (a) based on co-authorship analysis; (b) overlay visualization.

Figure 17. Institutional collaboration network in LSTM-related research (VOSviewer): (a) based on co-authorship analysis; (b) overlay visualization.

Figure 18. Country-level collaboration in LSTM-related research. (a) Cluster visualization of international collaborations, where node size = publication count, edge thickness = co-authorship strength, and colors represent country clusters. China, the United States, and India form the dominant hubs, with strong links to Europe and Asia. (b) Overlay visualization showing the temporal evolution of national participation (purple = earlier, yellow = recent). While China and the US are long-standing leaders, India, Saudi Arabia, and Malaysia represent newer and rapidly growing contributors (2021–2023).

Figure 19. Publication trends for LSTM and GRU from 2015 to 2025.

Figure 20. Thematic overlay map comparing LSTM, GRU, and Transformer keywords, illustrating the progressive shift in focus from traditional RNNs toward attention-driven architectures in recent years.

Figure 21. Distribution of journal publications on hybrid, optimization-based, and hybrid–optimization LSTM models and growth trends of different categories shown on a logarithmic scale using normalized indexing.

Table 1. Details of the data collection.

Search Query	Inclusion/Exclusion Step	Criterion	Records Remaining
TITLE-ABS-KEY (LSTM)	Inclusion Criteria-1	Publication year (1997–present)	111,133
	EC-1 applied	Word (Erlangen)	111,043
	EC-2 applied	Document type (Erratum, Retracted, Letter, Book, Note, Editorial, Data Paper, Short Survey)	110,425
	EC-3 applied	Author name (Undefined)	109,023
	Manual inspection	Papers not related to LSTM applications, i.e., those that focus on LSTM methodologies or applications only implicitly or do not contribute to theoretical, methodological, or applied aspects of LSTM research	105,238
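
The attrition implied by Table 1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the record counts are copied from the table, while the step labels are shorthand and the helper name `attrition` is hypothetical rather than part of the authors' workflow.

```python
# Record counts remaining after each screening step, copied from Table 1.
steps = [
    ("Inclusion Criteria-1 (1997-present)", 111_133),
    ("EC-1 (word: Erlangen)", 111_043),
    ("EC-2 (document type)", 110_425),
    ("EC-3 (undefined author name)", 109_023),
    ("Manual inspection", 105_238),
]


def attrition(steps):
    """Return (label, removed, remaining) for every step after the first."""
    rows = []
    for (_, before), (label, after) in zip(steps, steps[1:]):
        rows.append((label, before - after, after))
    return rows


rows = attrition(steps)
for label, removed, remaining in rows:
    print(f"{label}: removed {removed:,}, remaining {remaining:,}")

# Overall effect of the screening pipeline.
total_removed = steps[0][1] - steps[-1][1]
retained_pct = 100 * steps[-1][1] / steps[0][1]
print(f"Total removed: {total_removed:,} ({retained_pct:.1f}% retained)")
```

Read this way, the manual inspection accounts for the largest single reduction (3785 of the 5895 records removed), which is worth bearing in mind when interpreting the final corpus size.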

Table 2. Sensitivity analysis of search query formulations.

Metric	Query A: “LSTM”	Query B: “Long Short-Term Memory”	Query C: Combined Query
Total Publications	109,023	113,118	90,276
Time Span (Years)	1997–2025	1997–2025	1997–2025
Peak Publication Year	2020	2020	2020
Publications in Peak Year	2025	2025	2025
Growth Trend Pattern	Similar	Similar	Similar
Top 5 Authors Overlap (%)	100%	100%	100%
Top 5 Sources/Journals	Similar	Slightly Different	Similar
Dominant Research Themes	Similar	Similar	Similar
Inclusion of Irrelevant Records	Good	Good	Good
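
The "Top 5 Authors Overlap (%)" row of Table 2 can be read as a set-overlap measure between ranked author lists. The sketch below shows one plausible way to compute such a figure; the function name `top_k_overlap`, the tie-breaking rule, and the toy author labels are assumptions for illustration, and the study does not state the exact formula it used.

```python
from collections import Counter


def top_k_overlap(authors_a, authors_b, k=5):
    """Percentage of the top-k authors of list A that also rank in the top-k of list B.

    Each input is a flat list with one author name per authored record.
    Ties are broken alphabetically to keep the ranking deterministic
    (an assumption made for this sketch).
    """
    def top_k(names):
        counts = Counter(names)
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        return {name for name, _ in ranked[:k]}

    top_a, top_b = top_k(authors_a), top_k(authors_b)
    if not top_a:
        return 0.0
    return 100.0 * len(top_a & top_b) / len(top_a)


# Toy records with hypothetical author labels (not data from the study).
query_a = ["A"] * 9 + ["B"] * 7 + ["C"] * 5 + ["D"] * 4 + ["E"] * 3 + ["F"] * 1
query_b = ["A"] * 8 + ["B"] * 8 + ["C"] * 6 + ["D"] * 3 + ["G"] * 3 + ["E"] * 2

print(top_k_overlap(query_a, query_b))  # four of the five top authors are shared
```

An overlap of 100%, as reported for all three queries in Table 2, would mean the same five authors lead the ranking regardless of how the search string is phrased.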

Table 3. Most cited publications in LSTM-based research (based on Scopus data; access date: 2 May 2025).

Article Title	Source: Journal (J)/Conference (C)/Review (R)	Cited By	Authors [Ref. No.], Year
Long Short-Term Memory	Neural Computation (J)	93,119	Hochreiter, S. and Schmidhuber, J. [1], 1997
Sequence-to-sequence learning with neural networks	Advances in Neural Information Processing Systems (C)	15,641	Sutskever et al. [20], 2014
Convolutional LSTM network: A machine learning approach for precipitation nowcasting	Advances in Neural Information Processing Systems (C)	16,647	Shi et al. [21], 2015
LSTM: A search space odyssey	IEEE Transactions on Neural Networks and Learning Systems (J)	8351	Greff et al. [10], 2017
Framewise phoneme classification with bidirectional LSTM and other neural network architectures	Neural Networks (J)	5884	Graves, A. and Schmidhuber, J. [22], 2005
Learning to forget: Continual prediction with LSTM	Neural Computation (J)	5286	Gers et al. [23], 2000
A review of recurrent neural networks: LSTM cells and network architectures	Neural Computation (R)	5187	Yu et al. [24], 2019
Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network	Physica D: Nonlinear Phenomena (J)	4919	Sherstinsky, A. [2], 2020
In-datacenter performance analysis of a tensor processing unit	44th Annual International Symposium on Computer Architecture (C)	4011	Jouppi et al. [25], 2017
Social LSTM: Human trajectory prediction in crowded spaces	IEEE Conference on Computer Vision and Pattern Recognition (C)	3456	Alahi et al. [26], 2016
Attention-based LSTM for aspect-level sentiment classification	Empirical Methods in Natural Language Processing (C)	2440	Wang et al. [27], 2016
Convolutional sequence-to-sequence learning	International Conference on Machine Learning (C)	2418	Gehring et al. [28], 2017
Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition	Sensors (J)	2399	Ordóñez and Roggen [29], 2016
Optimization as a model for few-shot learning	International Conference on Learning Representations (C)	2194	Ravi and Larochelle [30], 2017
Beyond short snippets: Deep networks for video classification	IEEE Conference on Computer Vision and Pattern Recognition (C)	2086	Yue-Hei Ng et al. [31], 2015

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mohapatra, S.; Singh, J.G.; Samantaray, S.P.; Mishra, M. Long Short-Term Memory Networks Since Their Inception: Mapping 25 Years of Scientific Development via Bibliometric Analysis. Algorithms 2026, 19, 390. https://doi.org/10.3390/a19050390

