Review

Sentence-Level Insights from the Martian Literature: A Natural Language Processing Approach

1 School of Geophysics and Geomatics, China University of Geosciences, Wuhan 430074, China
2 College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
3 School of Earth and Space Science and Technology, Wuhan University, Wuhan 430072, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8663; https://doi.org/10.3390/app15158663
Submission received: 10 June 2025 / Revised: 14 July 2025 / Accepted: 21 July 2025 / Published: 5 August 2025
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)

Abstract

Mars has been a primary focus of planetary science, with significant advances over the past two decades in disciplines including geological evolution, surface environment, and atmospheric and space science. However, the rapid growth of the related literature has rendered traditional manual review methods increasingly inadequate, particularly for interdisciplinary research, which is often characterized by dispersed topics and complex semantics. To address this challenge, this study proposes an automated analysis framework based on natural language processing (NLP) to systematically review Martian research in the Earth and space sciences over the past two decades. The research database contains 151,196 Mars-related sentences extracted from 10,655 publications spanning 2001 to 2024. Using machine learning techniques, the framework clusters these sentences into semantically coherent groups and applies topic modeling to extract core research themes. It then analyzes their temporal evolution across the Martian solid, surface, atmosphere, and space environments. Finally, through sentiment analysis and semantic matching, it highlights unresolved scientific questions and potential directions for future research. This approach offers a novel perspective on the knowledge structure underlying Mars exploration and demonstrates the potential of NLP for large-scale literature analysis in planetary science. The findings may provide a structured foundation for building an interdisciplinary Mars knowledge base grounded in the peer-reviewed literature, which could inform future scientific research and mission planning.

1. Introduction

Mars, as one of the most closely studied terrestrial planets in the solar system, occupies a central position in planetary science due to evidence of past water activity, potential signs of life, and its geological evolution [1,2]. Research on Mars not only advances our understanding of Earth’s history and future but also holds strategic importance for prospective interplanetary migration and resource utilization. As noted in [3], “Mars exploration is a topic that concerns the fate of human beings and influences the lasting of human civilization,” and it has therefore attracted growing scientific attention.
Over the past two decades, the number of Mars exploration missions and the precision of their observations have steadily increased. As a result, Mars research has entered a data-intensive period [4], enabling a deeper understanding of key scientific topics such as geological processes, climate evolution, and the potential for life. The objectives of Mars missions have expanded beyond terrain and geomorphological studies to include the search for signs of life, reconstruction of climate history, assessment of habitability, and preparation for sample return and future human exploration. These aims have driven the deployment of various orbiters, landers, rovers, and sample return missions, supporting systematic investigations of Martian geology, climate, magnetic fields, and radiation environment [5,6,7].
With the rapid growth of the Martian literature and the increasing interdisciplinarity of research topics, traditional manual reviews show significant limitations in large-scale knowledge integration. These methods, which typically rely on manual screening and subjective judgment, often lack systematic structure and reproducibility, making it difficult to capture semantic relationships and temporal trends across research areas [8]. Meanwhile, the continuous accumulation of heterogeneous scientific data from Mars missions presents researchers with the dual challenges of information overload and the need for efficient data organization and interpretation [4]. In this highly interdisciplinary and data-intensive research environment, automated and structured text mining and semantic analysis methods have become essential for advancing literature analysis and scientific understanding.
While natural language processing techniques and systematic reviews are increasingly used in ecology and environmental science [9], such approaches remain rare in planetary science. Prior meta-analyses often rely on metadata or abstracts from databases such as Web of Science. For instance, a recent review of Google Earth Engine applications conducted a large-scale bibliometric and keyword-based analysis rather than applying sentence-level semantic techniques [10].
In this work, we (1) develop a framework for sentence-level understanding and structured processing of the interdisciplinary Martian research literature using natural language processing (NLP) techniques; (2) identify thematic evolution, sentiment trends, and hotspot distributions in Martian research across domains from the solid planet to outer space; and (3) explore unresolved scientific questions, technological bottlenecks, and future research directions, providing insights into planetary science research and space mission planning. This study aims to support the systematic organization and interpretation of domain knowledge in planetary science by offering structured insights into Mars research.

2. Data and Method

2.1. Corpus

This study compiled 10,655 academic papers related to Martian research, drawn from leading journals in Earth and space sciences, including Journal of Geophysical Research: Planets (JGR: Planets), Journal of Geophysical Research: Space Physics (JGR: Space Physics), Journal of Geophysical Research: Earth Surface (JGR: Earth Surface), Geophysical Research Letters (GRL), Earth and Planetary Science Letters (EPSL), Geochimica et Cosmochimica Acta (GCA), Environmental Science & Technology (EST), Nature, Nature Geoscience, Science, among others. The publication years span 2001 to 2024. All selected papers are peer-reviewed journal articles, retrieved in full text from major publishers such as American Geophysical Union, European Geosciences Union, American Meteorological Society, Elsevier, Springer, Wiley, and Nature Publishing Group. The selection focused on journals with high relevance and impact in the fields of Earth and space sciences, and each selected paper was required to contain keywords such as “Mars,” “MARS,” or “Martian” (matched case-insensitively). Because of this journal-based selection, the corpus does not cover all Mars-related publications.
After the full texts were downloaded, all PDF files were systematically converted to plain text following conventions aligned with human reading order. Specifically, multi-column layouts (dual or triple columns) were reformatted into a single column. Figures, captions, and publisher-related information were excluded, retaining only the core content, such as abstracts and main body text. The resulting text was then segmented into sentences, each constrained to between 10 and 256 words. These sentences were further screened for the keywords “Mars,” “MARS,” and “Martian” (case-insensitive), yielding a corpus of 151,196 sentences directly relevant to Mars research.
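The sentence-level filtering described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes a naive regular-expression sentence splitter (the study does not name its segmenter) and whole-word keyword matching, and the function and constant names are hypothetical.

```python
import re

# Assumption: whole-word, case-insensitive keywords ("Mars", "MARS", "Martian").
KEYWORDS = re.compile(r"\b(?:mars|martian)\b", re.IGNORECASE)

# Assumption: naive splitter that breaks after ".", "!" or "?" plus whitespace.
# A real pipeline would also need to protect abbreviations such as "et al.".
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def extract_mars_sentences(text, min_words=10, max_words=256):
    """Keep sentences of min_words..max_words words that mention Mars."""
    kept = []
    for sentence in SENTENCE_BREAK.split(text.strip()):
        n_words = len(sentence.split())
        if min_words <= n_words <= max_words and KEYWORDS.search(sentence):
            kept.append(sentence)
    return kept
```

Word boundaries keep the filter from firing on words that merely contain the keyword, such as "Marshland"; a plain substring test would admit them.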
Figure 1 illustrates the trends in the annual number of publications, the total number of sentences per year, and the distribution of sentences across different journals within the corpus.
Overall, both the number of annual publications and the number of sentences show an upward trend with time. Among the top five journals by total sentence count (Figure 1b), JGR: Planets primarily focuses on the Martian surface and environment, JGR: Space Physics emphasizes the Martian magnetic field and ionosphere, and GCA concentrates on geochemistry. These findings suggest that the Martian space environment, surface processes, and chemical composition have been important areas of research interest.
The Mars-related corpus is primarily sourced from journals published by the American Geophysical Union (AGU), although the database includes journals from multiple publishers. Notably, while the annual number of publications in journals such as GRL has remained relatively stable, the volume of Mars-related sentences has increased significantly since 2020. This trend may coincide with the timeline of several major Mars exploration missions. For example, in February 2021, China’s Tianwen-1 spacecraft entered Mars orbit, initiating a comprehensive investigation of the planet’s morphology, geology, subsurface water ice, material composition, climate, and physical fields to better understand its evolutionary history and potential habitability [11,12]. In the same month, NASA’s Perseverance rover landed in Jezero Crater, and the Ingenuity helicopter it carried achieved the first powered, controlled flight on another planet in April 2021 [13]. These landmark missions have likely contributed to the heightened research interest in Mars.

2.2. Unsupervised Clustering

This study employs the lightweight language model all-MiniLM-L6-v2, developed by the Ubiquitous Knowledge Processing (UKP) Lab within the Sentence-BERT (SBERT) framework, to generate sentence embeddings, mapping each sentence to a 384-dimensional vector. SBERT extends Bidirectional Encoder Representations from Transformers (BERT) with a Siamese network structure that encodes each sentence independently while preserving its semantic meaning. Unlike the original BERT, which requires sentence pairs to be input simultaneously, SBERT outputs sentence vectors that can be compared efficiently using cosine similarity, making it particularly suitable for large-scale corpus similarity calculations and clustering analyses [14]. Moreover, the all-MiniLM-L6-v2 model offers an effective balance between computational speed and performance and demonstrates strong results on Massive Text Embedding Benchmark (MTEB) tasks.
After obtaining the semantic embeddings, this study applies unsupervised clustering to automatically group the text data. The clustering algorithm constructs a similarity graph based on cosine similarity between sentences and applies graph-based algorithms to identify “community structures,” representing semantically similar text clusters. Specifically, sentences are treated as nodes in the graph, and an edge is established between two sentences if their cosine similarity exceeds a defined threshold (set at 0.5 for Section 3.1, and 0.7 for Section 3.3). This forms a sentence similarity network. The Louvain community detection algorithm [14,15] is then employed to identify clusters, with a minimum community size requirement of five sentences.
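A compact sketch of the graph-based clustering step is given below. It assumes unweighted edges and uses toy three-dimensional vectors in place of the 384-dimensional all-MiniLM-L6-v2 embeddings. For brevity, it implements only the first, local-moving phase of the Louvain method (greedy node moves that increase modularity) and omits the community-aggregation phase. It is therefore a simplified stand-in, not the full algorithm or the study's implementation, and all function names are hypothetical. In practice, the embeddings would come from the sentence-transformers library, for example `SentenceTransformer("all-MiniLM-L6-v2").encode(sentences)`, and a complete Louvain implementation such as NetworkX's `louvain_communities` would replace the hand-written routine.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_graph(vectors, threshold):
    """Unweighted adjacency dict: edge when cosine similarity exceeds threshold."""
    adj = {i: {} for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def louvain_local_moving(adj, max_passes=20):
    """Local-moving phase of Louvain only: repeatedly move each node to the
    neighboring community with the largest modularity gain (no aggregation)."""
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())  # 2m = sum of degrees
    comm = {v: v for v in adj}  # every node starts in its own community
    if two_m == 0:
        return comm
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    tot = dict(degree)  # community id -> sum of member degrees
    for _ in range(max_passes):
        moved = False
        for v in adj:
            old = comm[v]
            tot[old] -= degree[v]  # temporarily remove v from its community
            links = {}  # community id -> total edge weight from v into it
            for u, w in adj[v].items():
                links[comm[u]] = links.get(comm[u], 0.0) + w
            # Gain (scaled by m) of placing v in community c:
            # k_{v,in}(c) - tot(c) * k_v / (2m)
            best = old
            best_gain = links.get(old, 0.0) - tot[old] * degree[v] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[v] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += degree[v]
            comm[v] = best
            moved = moved or best != old
        if not moved:
            break
    return comm


def cluster_sentences(vectors, threshold=0.5, min_size=5):
    """Return communities (lists of sentence indices) with >= min_size members."""
    comm = louvain_local_moving(similarity_graph(vectors, threshold))
    groups = {}
    for node, c in comm.items():
        groups.setdefault(c, []).append(node)
    kept = [sorted(g) for g in groups.values() if len(g) >= min_size]
    return sorted(kept, key=lambda g: (-len(g), g))  # largest first
```

Because edges exist only above the threshold, raising it (for example, from 0.5 to 0.7) generally produces a sparser graph and tighter communities, while the minimum size discards small or isolated groups.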
To further analyze the distribution of clustering results, this study applies the t-distributed stochastic neighbor embedding (t-SNE) algorithm to reduce the dimensionality of high-dimensional sentence vectors into a two-dimensional space. t-SNE is a nonlinear dimensionality reduction technique that effectively preserves local data structures and is well-suited for visualizing clustering patterns [16]. The reduced sentence vectors are presented as scatter plots, with different colors used to distinguish the identified clusters, thereby visually highlighting the boundaries and concentrations of distinct semantic categories. To validate the reliability of the clustering results, a visual analysis was conducted, and each cluster was manually inspected to confirm high semantic coherence and good interpretability.

2.3. Topic Modeling

Building on the clustering analysis, this study employs the BERTopic topic modeling framework to identify explicit and implicit semantic topic structures within the corpus. BERTopic integrates Transformer-based text embeddings with clustering techniques and introduces class-based Term Frequency–Inverse Document Frequency (c-TF-IDF) for topic representation. Compared to traditional topic modeling approaches, such as Latent Dirichlet Allocation (LDA), BERTopic demonstrates superior performance in capturing contextual relationships, synonyms, and polysemy, particularly in shorter texts [17].
Prior to topic modeling, systematic preprocessing was applied to the texts from the top ten clusters to standardize content, reduce semantic noise, and improve the quality of subsequent embeddings. The preprocessing steps included removing punctuation and references (e.g., “et al.”), converting all text to lowercase, eliminating stop words using the Natural Language Toolkit (NLTK) English stopwords corpus, and applying the WordNet lemmatizer to restore words to their base forms [18]. In this study, the number of topics was set to 11 to balance topic granularity and interpretability, and the minimum topic size was set to 10 to ensure that each topic contained a sufficient number of documents. To enhance the interpretability of the results, the outlier topic labeled “−1”, which collects texts not assigned to any topic, was excluded from the analysis. Word clouds were then generated for each topic from the frequency distribution of its extracted keywords, using graphic masking and word-frequency scaling to visualize the semantic structure of each topic [19].
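The preprocessing and keyword-extraction steps can be illustrated with the following sketch. It makes two substitutions: a small hand-written stop-word list stands in for NLTK's English stopwords corpus, and an identity function stands in for the WordNet lemmatizer. NLTK's `WordNetLemmatizer().lemmatize` could be passed in instead of the identity function. The sketch computes class-based TF-IDF using the weighting described in the BERTopic documentation, W(t, c) = tf(t, c)/|c| · log(1 + A/f(t)); BERTopic's own implementation may differ in details. Topic assignments are assumed to be given, and all names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; the study used NLTK's English stopwords.
STOPWORDS = {"the", "of", "and", "a", "in", "on", "is", "are", "by", "to", "during"}


def preprocess(sentence, lemmatize=lambda w: w):
    """Drop 'et al.', lowercase, strip punctuation and stop words, lemmatize."""
    text = re.sub(r"\bet al\.?", " ", sentence, flags=re.IGNORECASE)
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [lemmatize(tok) for tok in text.split() if tok not in STOPWORDS]


def class_tfidf(docs_by_topic, top_n=3):
    """Top-n c-TF-IDF keywords per topic.

    tf(t, c): count of term t in topic c; |c|: total tokens in topic c;
    f(t): count of t across all topics; A: mean tokens per topic.
    """
    counts = {c: Counter(tok for doc in docs for tok in doc)
              for c, docs in docs_by_topic.items()}
    totals = {c: sum(cnt.values()) for c, cnt in counts.items()}
    freq = Counter()
    for cnt in counts.values():
        freq.update(cnt)
    avg = sum(totals.values()) / len(totals)
    keywords = {}
    for c, cnt in counts.items():
        scores = {t: (n / totals[c]) * math.log(1 + avg / freq[t])
                  for t, n in cnt.items()}
        # Highest score first; ties broken alphabetically for determinism.
        keywords[c] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return keywords
```

Terms shared across topics, such as "solar" in a toy ionosphere topic and a toy dust topic, receive a smaller log(1 + A/f(t)) factor. As a result, c-TF-IDF favors words that distinguish one topic from the others, which is what the word clouds are meant to display.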

2.4. Sentiment Analysis

This study employs the twitter-roberta-base-sentiment-latest language model to classify the sentiment of the text. Based on the RoBERTa architecture [20], the model is trained on a large corpus of tweets and fine-tuned for sentiment analysis using the TweetEval benchmark [21]. It outputs three sentiment labels: positive, neutral, and negative. Each text is first encoded with the model’s tokenizer and then input into the model, whose output scores are normalized with the softmax function to produce a probability distribution across the sentiment categories [22]. The category with the highest probability is selected as the predicted label.
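The final step, converting the model's raw scores into a label, can be sketched as shown below. The logits in any example are illustrative values, not model output. In practice, they would be produced by the Hugging Face `transformers` checkpoint `cardiffnlp/twitter-roberta-base-sentiment-latest`. The label order (negative, neutral, positive) is an assumption based on that model's configuration, and the helper names are hypothetical.

```python
import math

# Assumed index order of the model's three output classes.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_sentiment(logits):
    """Return (label, probability) of the most probable sentiment class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

Subtracting the maximum logit does not change the softmax result but avoids overflow in `math.exp` for large scores.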

2.5. Semantic Match

To perform semantic similarity analysis on the text data, this study utilizes the mxbai-embed-large-v1 language model, which is based on the Transformer architecture and incorporates the Angle Optimization method [14,23]. By optimizing the angular difference between embedding vectors in complex space, this approach effectively mitigates the gradient vanishing problem associated with cosine similarity in saturation regions, thereby significantly enhancing the accuracy and stability of semantic similarity modeling.
All texts are embedded individually, line by line, and each text segment is truncated to a maximum length of 512 tokens before input. In the semantic retrieval stage, this study defines query statements related to the research topic (e.g., “Mars research future”), which are transformed into embedding vectors by the same model. Cosine similarity is then used to match the query vector against the pre-generated library of text vectors, with higher scores indicating greater semantic similarity. Finally, the results are ranked in descending order of similarity, and texts scoring above 0.7 are selected as candidate content highly relevant to the query for subsequent analysis. This step is intended to identify challenge-related or future-oriented content to support structured scientific insight extraction.
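The retrieval step can be sketched as follows, with toy two-dimensional vectors standing in for mxbai-embed-large-v1 embeddings. In practice, these embeddings would be produced by an embedding library that also applies the 512-token truncation. The function name and example texts are hypothetical; only the cosine ranking and the 0.7 cut-off follow the text.

```python
import math


def cosine(u, v):
    """Cosine similarity; higher values indicate greater semantic similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_search(query_vec, library, threshold=0.7):
    """Rank (text, vector) pairs by cosine similarity to the query vector,
    highest first, and keep only those scoring above the threshold."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in library]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(text, score) for text, score in scored if score > threshold]
```

Sorting before filtering keeps the candidates in descending order of relevance, which matches the manual review order described above.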
A schematic overview of the entire framework is presented in Figure 2.

3. Results

3.1. Unsupervised Clustering

Unsupervised clustering was applied to the entire corpus, and the ten largest clusters were identified and summarized, as shown in Figure 3.
Sample points for each theme are shown as dots of different colors, illustrating the overall structure of the corpus and highlighting the primary research directions. Overlap among differently colored dots suggests thematic overlap between certain clusters, which makes clear separation difficult in these regions. For instance, Cluster 1, “Impact of Solar Events on the Martian Ionosphere”, and Cluster 9, “Interaction Between Mars’ Crustal Magnetic Fields and Ionospheric Plasma,” are poorly separated, possibly indicating a strong content-related correlation: although their research focuses differ, both clusters center on investigations of the Martian ionosphere.
As a supplement to Figure 3, Table 1 summarizes the additional clusters beyond the top ten that contain more than 300 sentences.
The identified themes primarily reflect the core motivations behind Mars exploration, particularly the search for evidence of past or present life and the analogical study of Earth’s geological history and future evolution.
Figure 4 presents the clustering analysis results based on five-year intervals, illustrating the evolutionary trends of research topics over time. The number and size of clusters in each period reflect shifts in research attention, with certain themes expanding, stabilizing, or declining. For example, research on Martian dust and dust storms has shown a clear upward trend, with sample sizes increasing from 154 (2001–2005) to 730 (2021–2024).
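The five-year windows used for the period-by-period clustering can be reproduced with a small helper such as the one below. It assumes that the windows start in 2001 and that the last window is truncated at 2024. Each resulting group of sentences would then be clustered separately. The function names are hypothetical.

```python
def period_of(year, start=2001, width=5, end=2024):
    """Map a publication year to its window label; the final window is
    truncated at `end` (e.g., 2021–2024)."""
    if not start <= year <= end:
        raise ValueError(f"year {year} outside {start}-{end}")
    lo = start + (year - start) // width * width
    hi = min(lo + width - 1, end)
    return f"{lo}–{hi}"


def group_by_period(records):
    """Group (year, sentence) records by window so each group can be clustered."""
    groups = {}
    for year, sentence in records:
        groups.setdefault(period_of(year), []).append(sentence)
    return groups
```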
Given that the corpus is primarily drawn from core journals such as GRL and JGR: Planets, and that their annual publication volumes have remained relatively stable over the past two decades, the consistency in journal sources and publication scale provides a reliable basis for analyzing research trends. Building on a systematic review of the Mars-related literature since 2001, this study summarizes the developmental trajectory and evolving characteristics of key research topics across four domains: solid, surface, atmosphere, and space.
Research on the Martian solid crust was primarily focused on the crustal magnetic field, basalt mineralogy, internal structure, and thermal evolution. Between 2001 and 2010, gravity and magnetic field data from orbital missions, such as Mars Global Surveyor (MGS) and Mars Odyssey, along with laboratory analyses of Martian meteorites, supported investigations into the evolution and major components of the Martian crust, the mapping of crustal magnetic fields, the history of magnetic field evolution, and the distribution of localized magnetic anomalies. From 2011 to 2024, further laboratory studies of Martian meteorites and remote analyses of basaltic surfaces advanced research on the geochemistry, mantle evolution, and compositional characteristics of Martian basalts. In addition, seismic data from the Seismic Experiment for Interior Structure (SEIS) aboard the InSight lander provided critical constraints on mantle dynamics, thermal evolution, and the internal structure of Mars. Overall, research trends indicate a progression from the characterization of surface rock properties toward deeper investigations of mantle structure and thermochemical evolution. The integration of meteorite analyses with orbital remote sensing has emerged as the dominant research approach in this field.
Research on the Martian surface was focused on geological landforms and mineral records shaped by possible liquid water activities, revealing the evolution of early climate changes and potential habitable environments. Sustained interest in this domain is reflected by the prominence of surface-related topics among the top ten research themes from 2001 to 2005 and from 2006 to 2010, with continued emphasis across subsequent periods. Specifically, between 2001 and 2005, studies based on MGS orbital remote sensing and Mars Exploration Rover (MER) field investigations identified geomorphic features and water-bearing minerals indicative of ancient liquid water activity, providing key evidence for past aqueous environments. From 2006 to 2010, research progressed toward integrating geological and climatic systems, focusing on water–rock interactions and CO2 climate feedback mechanisms, and gradually constructing a framework for early Martian climate evolution. Between 2011 and 2015, attention shifted to sedimentary evolution and water storage processes, with empirical constraints strengthened through high-resolution observations, particularly in regions such as Gale Crater. From 2016 to 2020, research further deepened to investigate process mechanisms, including the relationship between wind systems and carbon cycling, and explored the role of an enhanced CO2 + H2 greenhouse effect on paleoclimate. Between 2021 and 2024, studies increasingly focused on the evolution of aeolian landforms and surface mineral diversity, examining the interactions between dust distribution and geomorphic changes to expand understanding of surface environments and paleoclimate evolution. Overall, research on the Martian surface has evolved from initial identification of morphological and aqueous indicators to in-depth analysis of process mechanisms and stratigraphic coupling, reflecting a progressive shift from discovery to interpretation in this field.
Research on the Martian atmosphere has focused on its structural characteristics, compositional evolution, and escape mechanisms, highlighting its critical role in the planet’s climate history and assessments of past habitability. Since 2001, atmospheric studies have steadily expanded and have consistently occupied major research directions across multiple periods. From 2001 to 2005, investigations based on orbital missions such as MGS and Mars Odyssey established a preliminary understanding of temperature distribution, vertical dust structure, and atmospheric tides, laying the foundation for characterizing Martian atmospheric circulation. Between 2006 and 2010, the research focus shifted to the coupling processes between the atmosphere and dust storms, utilizing climate models and multi-mission observations to explore circulation disturbances, thermal structure evolution, and their feedback on seasonal climate cycles. From 2011 to 2015, the Mars Atmosphere and Volatile Evolution (MAVEN) mission enabled detailed investigations of middle and upper atmosphere escape mechanisms, systematically measuring ion composition and escape fluxes, and clarifying the role of solar wind and electromagnetic interactions in atmospheric loss. Between 2016 and 2020, attention turned to modeling the long-term evolution and escape history of the Martian atmosphere, exploring the impacts of CO2 and H2O loss on paleoclimatic conditions and atmospheric thinning, and proposing a transition pathway from a dense, warm climate to the current thin, cold state. From 2021 to 2024, research focused on the spatiotemporal evolution of atmospheric structural disturbances and escape responses induced by global dust storms. By integrating multi-source data from MAVEN and the Exobiology on Mars (ExoMars) program, studies achieved high-resolution characterizations of thermospheric dynamics, ionospheric disturbances, and energy input processes. Overall, research on the Martian atmosphere has evolved from static structural mapping to dynamic process analysis, advancing a comprehensive understanding of the multi-layer interactions that govern the evolution of Martian climate and atmospheric escape history.
Research on the Martian space environment focuses on the interactions between the solar wind, the interplanetary magnetic field, and the Martian ionosphere–magnetosphere system, revealing the planet’s atypical magnetospheric structure and atmospheric escape processes in the absence of a global intrinsic magnetic field. Since 2006, studies in this area have steadily deepened. From 2006 to 2010, research preliminarily identified the basic configuration of the induced magnetosphere, emphasizing the regulatory role of crustal magnetic fields on ionospheric structure and plasma boundaries. Between 2011 and 2015, advances in theoretical modeling and the accumulation of orbital observations shifted the focus to the influence of solar wind dynamic pressure variations on the topology of the Martian magnetosphere, clarifying the shielding and guiding functions of crustal magnetic fields in plasma transport. From 2016 to 2020, data from the MAVEN mission supported in-depth analysis of energy–mass coupling mechanisms between magnetic fields and escaping ions, revealing that changes in solar activity cycles modulate atmospheric escape rates by influencing external energy inputs and magnetic field configurations. Between 2021 and 2024, research focused on high-resolution analyses of plasma disturbances and energy deposition, elucidating the regulatory pathways through which solar wind–magnetosphere interactions drive neutral atmospheric escape and emphasizing the overall coupling mechanisms across the ionosphere, magnetosphere, and escape processes. Overall, research on the Martian space environment has progressively transitioned from early structural identification to dynamic modeling of multi-scale, multi-layer coupling processes, reflecting a shift from descriptive studies to mechanistic understanding.

3.2. Topic Modeling

Figure 5 presents the results of further topic modeling using BERTopic, building on the top ten clusters identified in Figure 3.
The initial clustering divided the corpus into ten relatively distinct topic groups. To further explore the internal structure of each cluster, this study treats each cluster as an independent analytical unit and applies topic modeling within each group. This approach helps recover secondary topics that may be omitted from the overall summary and improves the hierarchical structure and completeness of the topic analysis. For example, within Cluster 1, “Impact of Solar Events on the Martian Ionosphere”, more specific subthemes were identified, including “Martian Ionosphere and Solar Interaction” and “Solar Flares and Their Impact on Mars”. Additionally, important subtopics that were not highlighted in the original summary, such as “Global Dust Storms and Atmospheric Effects” and “Dust Storms and Their Atmospheric Effects”, were also uncovered. Finally, word clouds were generated from keyword frequencies to present the core vocabulary of each topic and its relative importance, further enhancing the interpretability of the modeling results.
The modeling results indicate that Cluster 1 and Cluster 4, “Mineralogical History and Past Environments of Mars”, exhibit the most complex internal structures, characterized by a high degree of subtheme diversity and semantic differentiation. Specifically, Cluster 1 encompasses multiple subfields, including the ionospheric response to solar activity, ion escape and electron density variations, plasma dynamics, and atmospheric disturbance processes. Collectively, these studies highlight the multi-scale linkages between microscopic particle interactions and macroscopic climate disturbances. Cluster 4 is organized into subthemes addressing crustal mineral composition, sedimentary environment evolution, regional analyses (e.g., Jezero Crater), and meteorite correlation studies, emphasizing the importance of mineral genesis and paleoenvironmental reconstruction in assessing early habitability.

3.3. Sentiment Analysis

Figure 6a highlights positive-sentiment statements in Mars research across multiple domains, including geological evolution, atmospheric dynamics, water resource distribution, and meteorite genesis analysis. These statements reflect the significant contributions of recent exploration missions and modeling studies to uncovering Mars’ evolutionary processes, reconstructing ancient environments, and assessing its potential habitability. Collectively, these findings not only enhance our understanding of multi-layer interactions on Mars but also establish a scientific foundation for future high-precision exploration missions.
In contrast, Figure 6b highlights the core scientific challenges that remain unresolved in Mars research, reflecting the limitations and uncertainties within the current understanding framework. These negative sentiment expressions reveal persistent difficulties faced by researchers in reconstructing early Martian climate conditions, explaining variations in atmospheric trace gases, evaluating the stability of liquid water, and predicting the effects of space radiation. Such challenges not only constrain a systematic understanding of Martian geological and climatic history but also pose significant obstacles to future crewed exploration and sample return missions.
In the study of the Martian solid crust, remote sensing data and in situ geochemical analyses have significantly advanced our understanding of the planet’s internal structure and geological evolution. High-resolution terrain and spectral datasets have supported modeling efforts related to crust–mantle differentiation, volcanic activity, and localized thermal anomalies. In particular, Martian meteorites, as ground truth samples, provide critical evidence for constraining mantle composition and volcanic processes. However, substantial gaps remain. Some early plate tectonic models are inconsistent with observed crustal thickness and surface geochemical characteristics, and the correlation between Martian surface compositions and meteorite types remains unclear. These issues reflect dual challenges: limited representativeness of available samples and an incomplete understanding of crustal differentiation mechanisms. Addressing these challenges will require the advancement of integrated modeling approaches.
For the Martian surface, exploration has revealed abundant evidence of ancient water activity, including river valley networks, sedimentary bedding, and water-altered minerals, supporting the view that early Mars once possessed a habitable environment. The combined observations from orbital platforms and rovers have enabled the reconstruction of ancient hydrological processes, significantly enriching our understanding of early surface topography and sedimentary systems. However, the modern Martian surface presents several challenges. Under current low-pressure and low-temperature conditions, the stable existence of liquid water is highly unlikely, creating a marked inconsistency with geomorphic evidence. Moreover, the presence of surface oxidants, intense ultraviolet radiation, and frequent dust storms not only inhibits the preservation of organic matter but also poses significant risks to lander systems and future human exploration. This contrast between positive findings and ongoing challenges highlights the dramatic transition of the Martian surface from a potentially habitable early environment to its current harsh and inhospitable state.
For the Martian atmosphere, recent exploration missions, notably MAVEN, have significantly advanced our understanding of the Martian atmospheric structure, ionospheric dynamics, and long-term gas escape mechanisms. These studies have identified diurnal variations and seasonal cycles in the Martian atmosphere, improved global climate models, and contributed to modeling potential mechanisms responsible for early warm and humid climates. However, notable contradictions between models and observations remain unresolved. Climate models based solely on CO2 and H2O are insufficient to reproduce the warming conditions necessary for the stable existence of liquid water. Additionally, the sources and loss mechanisms of methane remain highly controversial, with conflicting results reported by the ExoMars Trace Gas Orbiter (TGO) and the Curiosity rover, highlighting gaps in the current understanding of atmospheric chemical processes. These issues underscore the need for continued interdisciplinary collaboration to construct a more complete and coherent model of Martian atmospheric evolution.
In the Martian space, the upper atmosphere and space plasma environment of Mars have been systematically observed in recent years through orbital missions such as MAVEN. These studies have revealed the coupling mechanisms between the solar wind and Martian crustal magnetic anomalies, advancing the understanding of Martian space weather and atmospheric escape processes. However, the absence of a global magnetic field and a dense atmosphere leaves the Martian surface exposed to prolonged bombardment by high-energy particles from the solar wind and cosmic rays. This radiation environment poses significant health risks to future crewed missions, including potential impacts such as Deoxyribonucleic acid (DNA) damage, cellular death, and increased cancer risk. Space radiation has been identified as one of the primary barriers to long-term human habitation on Mars, and the development of effective protective strategies remains an urgent research priority.

3.4. Semantic Match

Using semantic matching, negatively framed prompts such as “Mars research limitation,” “Mars research problem,” and “Mars research challenge” were applied to screen and identify the core Mars literature published over the past six years (2019–2024). Representative scientific and technological limitations were then manually extracted and summarized (Table 2).
Similarly, to further highlight the positive developments in this field, potential research directions and forward-looking solutions were manually extracted using affirmative prompts such as “Mars research future,” “Mars research further,” and “Mars research better” (Table 3).
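The retrieval step can be pictured as ranking sentences by their cosine similarity to the prompts. This is a minimal sketch under stated assumptions, not the authors' implementation: the reference list includes sentence-embedding models such as Sentence-BERT [14] and AnglE [23], but here a hypothetical bag-of-words `embed` function stands in for a real model, and the `retrieve` helper, the `top_k` cutoff, and the rule of scoring each sentence by its best-matching prompt are illustrative choices.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a sentence-embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(sentences, prompts, top_k=2):
    """Score each sentence by its best match to any prompt; return the top_k (score, sentence) pairs."""
    prompt_vecs = [embed(p) for p in prompts]
    scored = [(max(cosine(embed(s), pv) for pv in prompt_vecs), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    negative_prompts = ["Mars research limitation", "Mars research problem", "Mars research challenge"]
    corpus = [  # toy sentences for illustration only
        "A key limitation of Mars research is the sparse seismic coverage.",
        "Dunes migrate across the crater floor.",
        "Dust storms pose a challenge for Mars research and landers.",
    ]
    for score, sentence in retrieve(corpus, negative_prompts):
        print(f"{score:.3f}  {sentence}")
```

In this toy run the dust-storm sentence ranks first: it contains all three words of “Mars research challenge” and is slightly shorter than the limitation sentence, so its normalized similarity is higher. A real embedding model would also match paraphrases that share no words with the prompt.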
The year 2019 was selected as the starting point because of operational milestones of major Mars missions. InSight landed in late 2018 and, in early 2019, began returning the first Martian seismic data and attempting heat-flow measurements [71]. Meanwhile, TGO, which began routine science operations in 2018, released key atmospheric and trace-gas results from 2019 onward [72,73]. As such, 2019 may mark an important milestone in Mars research, particularly in solid-body geophysics and atmospheric studies, offering high-value semantic material for analysis. The identified challenges include insufficient data availability, technological bottlenecks, limitations in instrument performance, and the inherent geological and environmental complexity of Mars; together, they provide a foundation for understanding the current state and future directions of Mars exploration.
This semantic matching serves as a targeted extension of the sentiment analysis presented in Figure 6. Whereas the sentiment analysis revealed the dual structure of Mars research, with promising advances on one side and unresolved issues on the other, semantic matching refines this picture by linking each sentiment polarity to specific scientific themes and technical domains.
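Restricting the analysis to the six-year window that starts in 2019 is a simple filter on publication year. A minimal sketch, assuming each extracted sentence is stored with its publication year; the `SentenceRecord` layout and the `in_window` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SentenceRecord:
    """Hypothetical record layout: one Mars-related sentence and its publication year."""
    text: str
    year: int


START_YEAR, END_YEAR = 2019, 2024  # inclusive on both ends: 2019 through 2024 is six years


def in_window(records, start=START_YEAR, end=END_YEAR):
    """Keep only sentences published within [start, end], both ends inclusive."""
    return [r for r in records if start <= r.year <= end]


if __name__ == "__main__":
    records = [  # toy sentences for illustration only
        SentenceRecord("MOLA topography maps the northern lowlands.", 2003),
        SentenceRecord("TGO sets upper limits on atmospheric methane.", 2019),
        SentenceRecord("InSight marsquakes constrain crustal layering.", 2021),
    ]
    recent = in_window(records)
    print(len(recent), "sentences in a window of", END_YEAR - START_YEAR + 1, "years")
```

Making both bounds inclusive is what gives a six-year span; an exclusive upper bound would silently drop 2024.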
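Linking sentiment to themes amounts to cross-tabulating each sentence's sentiment label against its theme or cluster. The reference list cites RoBERTa [20], the TweetEval benchmark [21], and Bridle's probabilistic (softmax) interpretation of classifier outputs [22]. Assuming a three-class classifier that returns raw logits in the order negative, neutral, positive (an assumption, not a documented detail of this study), the sketch below converts logits to probabilities with a numerically stable softmax, takes the most probable label, and counts labels per theme. The logits and theme names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

LABELS = ("negative", "neutral", "positive")  # assumed label order of the classifier


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_of(logits):
    """Return the label with the highest softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best]


def sentiment_by_theme(rows):
    """rows: iterable of (theme, logits); returns {theme: Counter of sentiment labels}."""
    table = defaultdict(Counter)
    for theme, logits in rows:
        table[theme][label_of(logits)] += 1
    return table


if __name__ == "__main__":
    rows = [  # toy logits, not real model output
        ("methane", [2.1, 0.3, -1.0]),
        ("methane", [1.5, 0.9, -0.2]),
        ("seismology", [-0.8, 0.1, 1.9]),
    ]
    for theme, counts in sentiment_by_theme(rows).items():
        print(theme, dict(counts))
```

Because softmax is monotonic, the argmax label is the same as the argmax logit; the probabilities matter only if a confidence threshold is applied before counting.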

4. Summary and Conclusions

This study presents a sentence-level semantic analysis framework for the Martian exploration literature, systematically tracing the development and thematic evolution of research from 2001 to 2024. The framework combines NLP techniques, including unsupervised clustering, topic modeling, sentiment analysis, and semantic matching, to establish an efficient and scalable automated review method for processing large, interdisciplinary corpora in planetary science and facilitating knowledge discovery.
Research on the Martian solid body has expanded from studies of crustal magnetic fields and meteorite mineralogy to investigations of mantle structure and thermal evolution. InSight seismic data have significantly improved models of the deep interior, and future missions are expected to further advance the exploration of geochemical differentiation and thermal evolution pathways.
Research on the Martian surface has evolved from the morphological identification of liquid water activity to the systematic modeling of sedimentary processes, carbon cycling, and aeolian (wind-driven) processes. Together, this work documents the planet’s profound shift from potential early habitability to the harsh surface environment observed today.
Research on the Martian atmosphere has progressed from preliminary descriptions of temperature and dust structure to quantitative modeling of escape mechanisms and climate evolution. In particular, MAVEN’s characterization of CO₂ and H₂O loss processes has provided critical observational constraints on hypotheses of an early warm and humid climate. However, uncertainties in the present-day sources and sinks of methane and in climate models remain key challenges for future research.
Research on the Martian space environment has revealed complex interactions between the solar wind and the planet’s ionosphere–magnetosphere system, emphasizing the coupling between its atypical magnetospheric structure (induced by the solar wind and modified by crustal fields) and atmospheric escape. Without a global magnetic field, the near-space environment of Mars is strongly influenced by solar activity, and this sensitivity has become a critical factor for space weather studies and future crewed mission planning.
In the near future, the Martian surface is likely to remain a central research focus, offering both significant scientific potential and considerable technological challenges. Studies of the space environment are expected to play a more supporting role, contributing to integrated system models of Mars that address broader questions such as habitability and planetary evolution.
Overall, this study demonstrates the practical value of automated semantic analysis for reviewing the planetary science literature and systematically summarizes the major advances and key challenges in Mars research since 2001. Furthermore, the proposed framework is versatile and could be extended to literature analysis for other planetary topics, such as the Moon and Venus, and to identifying future trends in scientific research.
In addition to its methodological contribution, the study draws on a large, domain-specific corpus of over 10,000 peer-reviewed articles focused on Mars. This data foundation provides both depth and relevance, enabling more accurate identification of research patterns, unresolved questions, and evolving scientific priorities. Integrating multiple NLP tools into a coherent workflow further enhances the interpretability and scalability of the results.
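For the clustering stage, the reference list includes the Louvain community-detection method [15], which partitions a graph by greedily increasing its modularity. Assuming sentences are treated as nodes of an undirected similarity graph (for example, each sentence linked to its nearest neighbors), the sketch below computes the modularity of a given partition, Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the summed degree of its nodes. It scores a partition only; it does not run the Louvain optimization, and the graph is a toy example rather than data from this study.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community id.
    Q = sum over communities c of [L_c / m - (d_c / (2m))^2].
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)     # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)   # d_c: summed degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    merged = {n: "A" for n in range(6)}
    print(round(modularity(edges, split), 4), round(modularity(edges, merged), 4))
```

Splitting the two triangles gives Q = 5/14 (about 0.357), whereas lumping every node into one community gives Q = 0; an optimizer such as Louvain prefers the split because it has the higher score.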

Author Contributions

Conceptualization, J.Z.; methodology, J.Z.; software, Y.Z.; validation, Q.H. and Y.S.; formal analysis, J.S.; resources, Y.Z.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z., Y.G., K.H., S.Z., and Y.S.; project administration, J.Z.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the National Natural Science Foundation of China under Grants 42205074 and 62101203.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the corresponding authors upon request.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT-4o (OpenAI, released May 2024) to assist in summarizing textual outputs produced by natural language processing methods. The authors have reviewed and edited the generated content and take full responsibility for the final version of the publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Braun, R.D.; Manning, R.M. Mars Exploration Entry, Descent and Landing Challenges. In Proceedings of the 2006 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2006; IEEE: Big Sky, MT, USA, 2006; pp. 1–18.
2. National Aeronautics and Space Administration. NASA’s Journey to Mars: Pioneering Next Steps in Space Exploration; National Aeronautics and Space Administration: Washington, DC, USA, 2015. Available online: https://www.nasa.gov/wp-content/uploads/2017/11/journey-to-mars-next-steps-20151008_508.pdf (accessed on 20 July 2025).
3. Du, P.; Yuan, P.; Liu, J.; Ye, B. Clay Minerals on Mars: An up-to-Date Review with Future Perspectives. Earth-Sci. Rev. 2023, 243, 104491.
4. Rongier, G.; Pankratius, V. Computer-Aided Exploration of the Martian Geology. Earth Space Sci. 2018, 5, 393–407.
5. McCleese, D.; Greeley, R.; MacPherson, G. Science Planning for Exploring Mars. JPL Publ. 2001, 1, 1–47.
6. Chojnacki, M.; Banks, M.; Urso, A. Wind-Driven Erosion and Exposure Potential at Mars 2020 Rover Candidate-Landing Sites. JGR Planets 2018, 123, 468–488.
7. Siljeström, S.; Czaja, A.D.; Corpolongo, A.; Berger, E.L.; Li, A.Y.; Cardarelli, E.; Abbey, W.; Asher, S.A.; Beegle, L.W.; Benison, K.C.; et al. Evidence of Sulfate-Rich Fluid Alteration in Jezero Crater Floor, Mars. JGR Planets 2024, 129, e2023JE007989.
8. Snyder, H. Literature Review as a Research Methodology: An Overview and Guidelines. J. Bus. Res. 2019, 104, 333–339.
9. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Quackenbush, L.; Adeli, S.; Brisco, B. Google Earth Engine for Geo-Big Data Applications: A Meta-Analysis and Systematic Review. ISPRS J. Photogramm. Remote Sens. 2020, 164, 152–170.
10. Gurevitch, J.; Koricheva, J.; Nakagawa, S.; Stewart, G. Meta-Analysis and the Science of Research Synthesis. Nature 2018, 555, 175–182.
11. Zou, Y.; Zhu, Y.; Bai, Y.; Wang, L.; Jia, Y.; Shen, W.; Fan, Y.; Liu, Y.; Wang, C.; Zhang, A.; et al. Scientific Objectives and Payloads of Tianwen-1, China’s First Mars Exploration Mission. Adv. Space Res. 2021, 67, 812–823.
12. Tan, X.; Liu, J.; Zhang, X.; Yan, W.; Chen, W.; Ren, X.; Zuo, W.; Li, C. Design and Validation of the Scientific Data Products for China’s Tianwen-1 Mission. Space Sci. Rev. 2021, 217, 69.
13. Tzanetos, T.; Aung, M.; Balaram, J.; Grip, H.F.; Karras, J.T.; Canham, T.K.; Kubiak, G.; Anderson, J.; Merewether, G.; Starch, M.; et al. Ingenuity Mars Helicopter: From Technology Demonstration to Extraterrestrial Scout. In Proceedings of the 2022 IEEE Aerospace Conference (AERO), Big Sky, MT, USA, 5–12 March 2022; IEEE: Big Sky, MT, USA, 2022; pp. 1–19.
14. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084.
15. Blondel, V.D.; Guillaume, J.-L.; Lambiotte, R.; Lefebvre, E. Fast Unfolding of Communities in Large Networks. J. Stat. Mech. 2008, 2008, P10008.
16. van der Maaten, L.; Hinton, G.E. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
17. Grootendorst, M. BERTopic: Neural Topic Modeling with a Class-Based TF-IDF Procedure. arXiv 2022, arXiv:2203.05794.
18. Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit; O’Reilly Media: Sebastopol, CA, USA, 2009.
19. Heimerl, F.; Lohmann, S.; Lange, S.; Ertl, T. Word Cloud Explorer: Text Analytics Based on Word Clouds. In Proceedings of the 2014 47th Hawaii International Conference on System Sciences, Waikoloa, HI, USA, 6–9 January 2014; pp. 1833–1842.
20. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
21. Barbieri, F.; Camacho-Collados, J.; Neves, L.; Espinosa-Anke, L. TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification. arXiv 2020, arXiv:2010.12421.
22. Bridle, J.S. Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition. In Neurocomputing; Soulié, F.F., Hérault, J., Eds.; Springer: Berlin, Heidelberg, 1990; pp. 227–236. ISBN 978-3-642-76155-3.
23. Li, X.; Li, J. AnglE-Optimized Text Embeddings. arXiv 2024, arXiv:2309.12871.
24. Pieterek, B.; Jones, T.J. The Evolution of Martian Fissure Eruptions and Their Plumbing Systems. Earth Planet. Sci. Lett. 2023, 621, 118382.
25. Yoshizaki, T.; McDonough, W.F. The Composition of Mars. Geochim. Cosmochim. Acta 2020, 273, 137–162.
26. Joshi, R.; Knapmeyer-Endrun, B.; Mosegaard, K.; Igel, H.; Christensen, U.R. Joint Inversion of Receiver Functions and Apparent Incidence Angles for Sparse Seismic Data. Earth Space Sci. 2021, 8, e2021EA001733.
27. Flynn, I.T.W.; Crown, D.A.; Ramsey, M.S. Determining Emplacement Conditions and Vent Locations for Channelized Lava Flows Southwest of Arsia Mons. JGR Planets 2022, 127, e2022JE007467.
28. Khan, A.; Sossi, P.A.; Liebske, C.; Rivoldini, A.; Giardini, D. Geophysical and Cosmochemical Evidence for a Volatile-Rich Mars. Earth Planet. Sci. Lett. 2022, 578, 117330.
29. Kim, D.; Stähler, S.C.; Ceylan, S.; Lekic, V.; Maguire, R.; Zenhäusern, G.; Clinton, J.; Giardini, D.; Khan, A.; Panning, M.P.; et al. Structure Along the Martian Dichotomy Constrained by Rayleigh and Love Waves and Their Overtones. Geophys. Res. Lett. 2023, 50, e2022GL101666.
30. Shi, J.; Plasman, M.; Knapmeyer-Endrun, B.; Xu, Z.; Kawamura, T.; Lognonné, P.; McLennan, S.M.; Sainton, G.; Banerdt, W.B.; Panning, M.P.; et al. High-Frequency Receiver Functions With Event S1222a Reveal a Discontinuity in the Martian Shallow Crust. Geophys. Res. Lett. 2023, 50, e2022GL101627.
31. Kronyak, R.E.; Arndt, C.; Kah, L.C.; TerMaath, S.C. Predicting the Mechanical and Fracture Properties of Mars Analog Sedimentary Lithologies. Earth Space Sci. 2020, 7, e2019EA000926.
32. Berlanga, G.; Williams, Q.; Temiquel, N. Convolutional Neural Networks as a Tool for Raman Spectral Mineral Classification Under Low Signal, Dusty Mars Conditions. Earth Space Sci. 2022, 9, e2021EA002125.
33. Robbins, S.J. Mars Reconnaissance Orbiter’s Mars Color Imager (MARCI): A New Workflow for Processing Its Image Data. Earth Space Sci. 2022, 9, e2021EA002138.
34. Wynne, J.J.; Titus, T.N.; Agha-Mohammadi, A.; Azua-Bustos, A.; Boston, P.J.; De León, P.; Demirel-Floyd, C.; De Waele, J.; Jones, H.; Malaska, M.J.; et al. Fundamental Science and Engineering Questions in Planetary Cave Exploration. JGR Planets 2022, 127, e2022JE007194.
35. Hazen, R.M.; Downs, R.T.; Morrison, S.M.; Tutolo, B.M.; Blake, D.F.; Bristow, T.F.; Chipera, S.J.; McSween, H.Y.; Ming, D.; Morris, R.V.; et al. On the Diversity and Formation Modes of Martian Minerals. JGR Planets 2023, 128, e2023JE007865.
36. Robbins, S.J.; Kirchoff, M.R.; Hoover, R.H. Empirical Brightness Control and Equalization of Mars Context Camera Images. Earth Space Sci. 2020, 7, e2019EA001053.
37. Hader, J.D.; Fairén, A.G.; MacLeod, M. Planetary Protection Requirements Should Address Pollution from Chemicals and Materials. Proc. Natl. Acad. Sci. USA 2023, 120, e2310792120.
38. Montlaur, A.; Arias, S.; Rojas, J.I. Thermally Driven Winds on Mars: A Review and a Slope Effect Numerical Study. JGR Planets 2024, 129, e2023JE007987.
39. Li, Y.; Xiao, Z.; Ma, C.; Zeng, L.; Zhang, W.; Peng, M.; Li, A. Extraction and Analysis of Three-Dimensional Morphological Features of Centimeter-Scale Rocks in Zhurong Landing Region. JGR Planets 2023, 128, e2022JE007656.
40. Kite, E.S.; Melwani Daswani, M. Geochemistry Constrains Global Hydrology on Early Mars. Earth Planet. Sci. Lett. 2019, 524, 115718.
41. Wu, B.; Dong, J.; Wang, Y.; Li, Z.; Chen, Z.; Liu, W.C.; Zhu, J.; Chen, L.; Li, Y.; Rao, W. Characterization of the Candidate Landing Region for Tianwen-1—China’s First Mission to Mars. Earth Space Sci. 2021, 8, e2021EA001670.
42. Yu, W.; Zeng, X.; Li, X.; Tang, H.; Liu, J. Quantifying Shock Effects of Mars Sample via Micro-FTIR Spectra of Plagioclase. JGR Planets 2024, 129, e2024JE008487.
43. Creecy, E.; Li, L.; Jiang, X.; Smith, M.; Kass, D.; Kleinböhl, A.; Martínez, G. Mars’ Emitted Energy and Seasonal Energy Imbalance. Proc. Natl. Acad. Sci. USA 2022, 119, e2121084119.
44. Wernicke, L.J.; Jakosky, B.M. Martian Hydrated Minerals: A Significant Water Sink. JGR Planets 2021, 126, e2019JE006351.
45. Hoffman, J.A.; Hecht, M.H.; Rapp, D.; Hartvigsen, J.J.; SooHoo, J.G.; Aboobaker, A.M.; McClean, J.B.; Liu, A.M.; Hinterman, E.D.; Nasr, M.; et al. Mars Oxygen ISRU Experiment (MOXIE)—Preparing for Human Mars Exploration. Sci. Adv. 2022, 8, eabp8636.
46. Fonseca, R.M.; Zorzano, M.; Martín-Torres, J. MARSWRF Prediction of Entry Descent Landing Profiles: Applications to Mars Exploration. Earth Space Sci. 2019, 6, 1440–1459.
47. Mooring, T.A.; Davis, G.E.; Greybush, S.J. Low-Level Jets and the Convergence of Mars Data Assimilation Algorithms. JGR Planets 2022, 127, e2021JE006968.
48. Viúdez-Moreiras, D.; Richardson, M.I.; Newman, C.E. Constraints on Emission Source Locations of Methane Detected by Mars Science Laboratory. JGR Planets 2021, 126, e2021JE006958.
49. Fedorova, A.; Trokhimovskiy, A.; Lefèvre, F.; Olsen, K.S.; Korablev, O.; Montmessin, F.; Ignatiev, N.; Lomakin, A.; Forget, F.; Belyaev, D.; et al. Climatology of the CO Vertical Distribution on Mars Based on ACS TGO Measurements. JGR Planets 2022, 127, e2022JE007195.
50. Posner, A.; Strauss, R.D. Warning Time Analysis From SEP Simulations of a Two-Tier REleASE System Applied to Mars Exploration. Space Weather 2020, 18, e2019SW002354.
51. Song, Y.; Lu, H.; Cao, J.; Li, S.; Yu, Y.; Wang, S.; Ge, Y.; Zhang, X.; Zhou, C.; Wang, J. Effects of Force in the Martian Plasma Environment With Solar Wind Dynamic Pressure Enhancement. JGR Space Phys. 2023, 128, e2022JA031083.
52. Nauth, M.; Fowler, C.M.; Andersson, L.; DiBraccio, G.A.; Xu, S.; Weber, T.; Mitchell, D. The Influence of Magnetic Field Topology and Orientation on the Distribution of Thermal Electrons in the Martian Magnetotail. JGR Space Phys. 2021, 126, e2020JA028130.
53. Agarwal, S.; Tosi, N.; Kessel, P.; Padovan, S.; Breuer, D.; Montavon, G. Toward Constraining Mars’ Thermal Evolution Using Machine Learning. Earth Space Sci. 2021, 8, e2020EA001484.
54. Emran, A.; Marzen, L.J.; King, D.T. Semiautomated Identification and Characterization of Dunes at Hargraves Crater, Mars. Earth Space Sci. 2020, 7, e2019EA000935.
55. Martin, P.E.; Ehlmann, B.L.; Thomas, N.H.; Wiens, R.C.; Hollis, J.J.R.; Beegle, L.W.; Bhartia, R.; Clegg, S.M.; Blaney, D.L. Studies of a Lacustrine-Volcanic Mars Analog Field Site With Mars-2020-Like Instruments. Earth Space Sci. 2020, 7, e2019EA000720.
56. Leask, E.K.; Ehlmann, B.L.; Greenberger, R.N.; Pinet, P.; Daydou, Y.; Ceuleneer, G.; Kelemen, P. Tracing Carbonate Formation, Serpentinization, and Biological Materials With Micro-/Meso-Scale Infrared Imaging Spectroscopy in a Mars Analog System, Samail Ophiolite, Oman. Earth Space Sci. 2021, 8, e2021EA001637.
57. Dundas, C.M.; Mellon, M.T.; Conway, S.J.; Gastineau, R. Active Boulder Movement at High Martian Latitudes. Geophys. Res. Lett. 2019, 46, 5075–5082.
58. Leone, G. The Absence of an Ocean and the Fate of Water All Over the Martian History. Earth Space Sci. 2020, 7, e2019EA001031.
59. Stähler, S.C.; Widmer-Schnidrig, R.; Scholz, J.-R.; Van Driel, M.; Mittelholz, A.; Hurst, K.; Johnson, C.L.; Lemmon, M.T.; Lognonné, P.; Lorenz, R.D.; et al. Geophysical Observations of Phobos Transits by InSight. Geophys. Res. Lett. 2020, 47, e2020GL089099.
60. Pan, L.; Deng, Z.; Bizzarro, M. Impact Induced Oxidation and Its Implications for Early Mars Climate. Geophys. Res. Lett. 2023, 50, e2023GL102724.
61. Lorenz, R.D. Martian Ripples Making a Splash. JGR Planets 2020, 125, e2020JE006658.
62. Montgomery, W. New Paths for Survivability of Organic Material in the Martian Subsurface. JGR Planets 2020, 125, e2019JE006370.
63. Campbell, J.D.; Schmitt, B.; Brissaud, O.; Muller, J. The Detectability Limit of Organic Molecules Within Mars South Polar Laboratory Analogs. JGR Planets 2021, 126, e2020JE006595.
64. Ke, T.; Zhong, Y.; Song, M.; Wang, X.; Zhang, L. Mineral Detection Based on Hyperspectral Remote Sensing Imagery on Mars: From Detection Methods to Fine Mapping. ISPRS J. Photogramm. Remote Sens. 2024, 218, 761–780.
65. Lin, B.; Liu, Z. Martian Atmospheric CO₂ and Pressure Profiling With Differential Absorption Lidar: System Consideration and Simulation Results. Earth Space Sci. 2021, 8, e2020EA001600.
66. Guzewich, S.D.; Fedorova, A.A.; Kahre, M.A.; Toigo, A.D. Studies of the 2018/Mars Year 34 Planet-Encircling Dust Storm. JGR Planets 2020, 125, e2020JE006700.
67. Sun, W.; Ma, Y.; Russell, C.T.; Luhmann, J.; Nagy, A.; Brain, D. 5-Species MHD Study of Martian Proton Loss and Source. JGR Space Phys. 2023, 128, e2023JA031301.
68. Wu, Z.; Li, T.; Heavens, N.G.; Newman, C.E.; Richardson, M.I.; Yang, C.; Li, J.; Cui, J. Earth-like Thermal and Dynamical Coupling Processes in the Martian Climate System. Earth-Sci. Rev. 2022, 229, 104023.
69. Holmberg, M.K.G.; André, N.; Garnier, P.; Modolo, R.; Andersson, L.; Halekas, J.; Mazelle, C.; Steckiewicz, M.; Génot, V.; Fedorov, A.; et al. MAVEN and MEX Multi-instrument Study of the Dayside of the Martian Induced Magnetospheric Structure Revealed by Pressure Analyses. JGR Space Phys. 2019, 124, 8564–8589.
70. Shuvalov, S.D.; Grigorenko, E.E. Observation of SLAMS-Like Structures Close to Martian Aphelion by MAVEN. JGR Space Phys. 2023, 128, e2022JA031018.
71. Banerdt, W.B.; Smrekar, S.E.; Banfield, D.; Giardini, D.; Golombek, M.; Johnson, C.L.; Lognonné, P.; Spiga, A.; Spohn, T.; Perrin, C.; et al. Initial Results from the InSight Mission on Mars. Nat. Geosci. 2020, 13, 183–189.
72. Vago, J.; Witasse, O.; Svedhem, H.; Baglioni, P.; Haldemann, A.; Gianfiglio, G.; Blancquaert, T.; McCoy, D.; De Groot, R. ESA ExoMars Program: The Next Step in Exploring Mars. Sol. Syst. Res. 2015, 49, 518–528.
73. Korablev, O.; Vandaele, A.C.; Montmessin, F.; Fedorova, A.A.; Trokhimovskiy, A.; Forget, F.; Lefèvre, F.; Daerden, F.; Thomas, I.R.; Trompet, L.; et al. No Detection of Methane on Mars from Early ExoMars Trace Gas Orbiter Observations. Nature 2019, 568, 517–520.
Figure 1. Annual trends of Mars-related publications and sentence counts from 2001 to 2024. (a) The green curve illustrates the number of Mars-related papers published from 2001 to 2024. (b) Changes in the number of Mars-related statements across different journals are shown, with the green curve representing the total number of statements and the colored curves representing individual journals, including Journal of Geophysical Research: Planets, Journal of Geophysical Research: Space Physics, Geophysical Research Letters, Earth and Planetary Science Letters, and Geochimica et Cosmochimica Acta.
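The series in Figure 1 are per-year aggregates: distinct papers per year in panel (a) and sentences per year, overall and by journal, in panel (b). A minimal sketch, assuming each extracted sentence is stored as a hypothetical `(paper_id, year, journal)` tuple; the `annual_counts` helper is illustrative, not the authors' code.

```python
from collections import Counter, defaultdict


def annual_counts(rows):
    """rows: iterable of (paper_id, year, journal) tuples, one per extracted sentence.

    Returns (papers_per_year, sentences_per_year, sentences_per_year_by_journal).
    """
    papers = defaultdict(set)          # year -> distinct paper ids
    sentences = Counter()              # year -> sentence count
    by_journal = defaultdict(Counter)  # journal -> year -> sentence count
    for paper_id, year, journal in rows:
        papers[year].add(paper_id)
        sentences[year] += 1
        by_journal[journal][year] += 1
    papers_per_year = {year: len(ids) for year, ids in sorted(papers.items())}
    return papers_per_year, dict(sorted(sentences.items())), by_journal


if __name__ == "__main__":
    rows = [  # toy records for illustration only
        ("p1", 2001, "JGR Planets"),
        ("p1", 2001, "JGR Planets"),
        ("p2", 2001, "GRL"),
        ("p3", 2002, "GRL"),
    ]
    papers, sents, journals = annual_counts(rows)
    print(papers)  # {2001: 2, 2002: 1}
    print(sents)   # {2001: 3, 2002: 1}
```

Counting distinct paper IDs rather than rows matters because one paper contributes many sentences; the two curves in Figure 1 therefore differ in scale.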
Figure 2. Workflow of the framework for analyzing Mars-related literature, showing major steps and outputs.
Figure 3. Unsupervised clustering of all sentences, highlighting the top ten clusters and their LLM-generated summaries. The scatter plot on the left shows the distribution of samples within the Mars research corpus: gray dots are sentences projected onto a two-dimensional plane, and colored dots mark the top ten identified topics. The surrounding text gives brief summaries of the top ten clusters, in colors matching the scatter plot. Summaries were generated using the GPT-4o model with the prompts: “Read the entire file and generate a 100-word summary for the file” and “Give this paragraph a short title”.
Figure 4. Temporal evolution of Mars research hotspots based on unsupervised clustering. Clustering results are divided into five time periods: (a) 2001–2005, (b) 2006–2010, (c) 2011–2015, (d) 2016–2020, and (e) 2021–2024. The visualizations illustrate changes in topic distribution over time. Colored labels on the right indicate research themes summarized by the GPT-4o model using the prompt: “Read the complete text and generate a short title.”
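Figure 4 splits the corpus into five periods before clustering each one. Assigning a publication year to its period can be sketched as follows; the boundaries come from the caption, while `PERIODS`, `period_of`, and `sentences_per_period` are illustrative names, not part of the authors' code.

```python
from collections import Counter

# Period boundaries from the Figure 4 caption, inclusive on both ends.
PERIODS = [(2001, 2005), (2006, 2010), (2011, 2015), (2016, 2020), (2021, 2024)]


def period_of(year):
    """Return the 'start-end' label of the period containing year, or None if outside 2001-2024."""
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    return None


def sentences_per_period(years):
    """Count sentences (given by their publication years) falling in each period."""
    return Counter(p for p in map(period_of, years) if p is not None)


if __name__ == "__main__":
    print(sentences_per_period([2003, 2005, 2006, 2019, 2024, 2025]))
```

The last period covers only four years (2021 to 2024), so raw counts per period are not directly comparable; dividing by the period length gives a per-year rate if a like-for-like comparison is needed.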
Figure 5. In-depth thematic analysis of the top ten clusters identified in Figure 3. Panel (a) presents ten sub-themes within Cluster 1; Panels (b–j) present the sub-themes of Clusters 2 to 10. Each sub-theme is accompanied by a word cloud and a concise summary generated using GPT-4o.
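The keywords behind the Figure 5 word clouds are term weights computed per sub-theme. The reference list cites BERTopic [17], whose class-based TF-IDF treats all sentences of a cluster as one document and weights term t in class c as W(t, c) = tf(t, c) · log(1 + A / f(t)), where f(t) is the frequency of t across all classes and A is the average number of words per class. The sketch below implements that weighting with raw counts and a toy stop-word list; it omits any normalization the BERTopic library may apply, it is not that library's code, and the example clusters are invented.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "in", "on", "is", "to"}  # toy list for illustration


def tokenize(text):
    """Lowercase alphabetic tokens with the toy stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def class_tfidf(clusters):
    """clusters: {name: [sentence, ...]} -> {name: {term: weight}}.

    Class-based TF-IDF with raw counts: W(t, c) = tf(t, c) * log(1 + A / f(t)).
    """
    tf = {name: Counter(tok for s in sents for tok in tokenize(s)) for name, sents in clusters.items()}
    total = Counter()  # f(t): frequency of each term across all classes
    for counts in tf.values():
        total.update(counts)
    avg_words = sum(total.values()) / len(tf)  # A: average word count per class
    return {
        name: {term: n * math.log(1 + avg_words / total[term]) for term, n in counts.items()}
        for name, counts in tf.items()
    }


def top_terms(weights, k=3):
    """Highest-weighted terms, ties broken alphabetically."""
    return [t for t, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


if __name__ == "__main__":
    clusters = {  # toy clusters for illustration only
        "atmosphere": ["Methane escape and ion escape", "Methane in the atmosphere"],
        "surface": ["Dune migration on the surface", "Dune sediment and water"],
    }
    for name, w in class_tfidf(clusters).items():
        print(name, top_terms(w))
```

The log term penalizes words that are common across all clusters, so a word that appears in every sub-theme (for example, “Mars” in this corpus) would be pushed down even if it were frequent within one cluster.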
Figure 6. Sentiment-based clustering of Mars-related research statements. Panels (a) and (b) display the top ten themes associated with positive and negative sentiment, respectively, along with corresponding summaries generated by the GPT-4o model. In panel (a), light yellow clusters represent irrelevant content that was excluded from further analysis and from the thematic interpretation.
Table 1. Summary of additional clusters (Clusters 11–31). The GPT-4o large language model (LLM) was applied to each cluster with a sample size greater than 300 to identify its main topic and produce a short summary.
| No. | Sample Size | Topic | Summary |
|---|---|---|---|
| 11 | 643 | Martian Meteorites as Proxies for Mars’ Geochemistry | Martian meteorites reveal Mars’ mantle composition, volatile content, and past aqueous processes, providing insights into its magmatic evolution. |
| 12 | 638 | Geochemical Diversity in Shergottites | Shergottites display geochemical diversity, indicating Mars’ mantle heterogeneity and complex magmatic history. |
| 13 | 612 | Martian Crustal Magnetic Fields and Solar Wind Interaction | Mars’ crustal magnetic fields affect solar wind interaction, influencing ion escape, atmospheric erosion, and ionospheric structure. |
| 14 | 587 | Stability of Liquid Water on Mars | Mars’ low pressure limits liquid water, but brines and transient water exist, affecting its habitability and climate evolution. |
| 15 | 548 | Martian Crust Composition | Mars’ crust is primarily basaltic with diverse igneous compositions, shaped by magmatic and sedimentary processes. |
| 16 | 511 | Mapping Mars’ Topography with Mars Orbiter Laser Altimeter (MOLA) | MOLA provides precise topographic data, improving understanding of Mars’ surface features, sedimentary structures, and crater formations. |
| 17 | 505 | Sulfate Minerals and Martian Water History | Sulfates on Mars suggest past water activity, formed by volcanic, acidic, or groundwater processes, influencing habitability. |
| 18 | 466 | Martian Dust and Climate Evolution | Dust controls Mars’ climate, affecting temperature, circulation, and solar radiation interactions, influencing global weather. |
| 19 | 463 | Solar Wind and Martian Atmosphere | Without a global magnetic field, Mars’ atmosphere interacts directly with solar wind, affecting ion loss and atmospheric evolution. |
| 20 | 453 | Impact Craters as Martian Sediment Traps | Impact craters preserve Mars’ environmental history, trapping sediments that reveal past climate and hydrological activity. |
| 21 | 440 | Water Ice Distribution on Mars | Mars hosts widespread water ice, influenced by climate, obliquity shifts, and atmospheric processes, key for future exploration. |
| 22 | 367 | Fluvial Landforms and Hydrological History of Mars | Mars’ valley networks and fluvial ridges indicate past surface water, suggesting episodic precipitation and runoff. Hydrological models support intermittent water flow. |
| 23 | 362 | Martian Crustal Structure and Subsurface Properties | Mars’ crust varies in thickness and composition. Gravity and seismic data show heterogeneous layering, affecting heat flow and evolution. |
| 24 | 361 | Climate History of Early Mars | Early Mars may have been warm and wet or cold and icy. Geological evidence suggests episodic warming from volcanism or impacts. |
| 25 | 360 | The Martian Water Cycle and Atmospheric Loss | Mars’ water cycle involves surface-atmosphere exchanges. Hydrogen and oxygen escape contribute to long-term water loss. |
| 26 | 349 | Evolution and Loss of Mars’ Atmosphere | Mars’ atmosphere was once thicker, allowing liquid water. Solar wind stripping led to its current thin state. |
| 27 | 340 | Martian Crustal Evolution and Composition | Mars’ crust formed through magma ocean processes. It is primarily basaltic with regional variations from volcanic and impact events. |
| 28 | 311 | Subsurface and Ionospheric Sounding with MARSIS | Mars Advanced Radar for Subsurface and Ionosphere Sounding (MARSIS) detects Mars’ subsurface structures and ionosphere, revealing buried ice, geological features, and dielectric properties. |
| 29 | 306 | Perchlorates in Martian Soil and Their Implications | Perchlorates, found globally on Mars, form via atmospheric oxidation. They impact brine stability, habitability, and organic preservation. |
| 30 | 305 | Aeolian Dunes on Mars: Morphology and Activity | Martian dunes are widespread, with varying activity. Orbital and rover data show migration, sediment transport, and climate influence. |
| 31 | 304 | Mars–Solar Wind Interaction and Atmospheric Dynamics | Simulations show Mars’ interaction with the solar wind affects atmospheric escape and space weather, influenced by density changes and crustal fields. |
Table 2. Challenges and limitations identified in Martian research. Prompts such as “Mars research challenge,” “Mars research limitation,” and “Mars research problem” were applied in a semantic analysis to retrieve relevant texts, which were then summarized by the GPT-4o large language model (LLM).
(a) Challenges and Limitations in Martian Solid Studies
Reference | Theme | Limitation/Challenge
[24] | Martian Magma Differentiation Studies | Limited Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) data preclude robust detection of variations related to magma differentiation.
[25] | Martian Compositional Modeling | Existing models are based on sparse and potentially unrepresentative chemical data from Martian rocks.
[26] | Seismic Studies on Mars | Mars’ lack of plate tectonics leads to low seismic activity, limiting the applicability of terrestrial seismic analysis techniques.
[27] | Martian Lava Flow Studies | Proximal lava flow regions are buried, and their vents cannot be identified.
[28] | Martian Mantle Chemistry Analysis | Shergottite–Nakhlite–Chassignite (SNC) meteorites are not direct mantle samples, hindering precise chemical characterization of the mantle.
[29] | Martian Crustal Structure Studies | Limited frequency content and the absence of Love waves restrict constraints on structure below 30 km depth.
[30] | Shallow Crustal Structure Analysis on Mars | High-frequency noise limits receiver function analysis of shallow structures (1–5 km).
(b) Challenges and Limitations in Martian Surface Exploration
Reference | Theme | Limitation/Challenge
[31] | Martian Rock Properties Analysis | Limited compositional data, restricted sample access, and scarce mechanical measurements constrain knowledge of Martian rock properties.
[32] | Earth–Mars Communication Efficiency | Communication delays and data bottlenecks between Earth and Mars impede timely scientific data acquisition and rover decision-making.
[33] | Photometric Corrections on Mars | The movement of various surface materials over time makes photometric corrections difficult.
[34] | Cave Life Detection on Mars | Cave life detection missions exceed NASA’s New Frontiers budget limits, requiring Flagship-level funding.
[35] | Martian Mineral Diversity Analysis | Limited landing site coverage and instrument detection thresholds constrain mineralogical analysis.
[3] | Pre-Noachian Strata Analysis | The scarcity and small size of pre-Noachian outcrops, combined with dataset and technique limitations, impede analysis of early alteration states.
[36] | Illumination Variability on Mars | Mars’ ~25° obliquity causes annual shadow shifts of up to 50°, resulting in inconsistent image illumination.
[37] | Environmental Contamination by Human Activities | Human activity on Mars will introduce diverse anthropogenic materials and their degradation products.
[38] | Dust Accumulation on Solar Panels | Dust accumulation on InSight’s solar panels led to power loss and early mission termination.
[39] | Limitations of Orbital Imaging Resolution | Orbital imagery lacks the resolution to detect sub-meter rocks, despite their abundance on Mars.
[40] | Limitations of Orbital Spectroscopy for Rock Composition Analysis | Visible and Near-Infrared (VNIR) spectroscopy cannot quantify bulk Fe, while gamma-ray spectroscopy lacks the resolution to resolve Hesperian sulfate-rich outcrops.
[41] | Limited High-Resolution Imaging Coverage | High-resolution ground-level images are limited to select landing sites, and the High Resolution Imaging Science Experiment (HiRISE) covers only a small fraction of the surface.
[42] | Shock Environment Analysis on Mars | Laboratory constraints on shock environments are imprecise, as most Martian meteorites are impact-altered.
[43] | Mars’ Bond Albedo Measurements | Comprehensive spatial, spectral, and angular data for accurate Bond albedo measurements are currently lacking.
[44] | Limitations of VNIR Spectroscopy on Mars | VNIR spectroscopy probes only shallow depths, making the data susceptible to dust obscuration.
[45] | Propellant Requirements for Mars Ascent Vehicle (MAV) | Delivering MAV propellant from Earth requires 12–13 tons in low Earth orbit (LEO) per ton landed on Mars.
[3] | Clay Mineral Characterization on Mars | Crystal–chemical substitution, mixed layering, variable crystallinity, and hydration states complicate clay mineral characterization.
(c) Challenges and Limitations in Martian Atmospheric Research
Reference | Theme | Limitation/Challenge
[46] | Challenges in Mars Entry, Descent, and Landing (EDL) Systems | Uncertainties in atmospheric density and wind profiles hinder development of a standardized EDL system.
[47] | Challenges in Martian Data Assimilation | Martian data assimilation depends on fewer and more distinct observations, relying heavily on infrared temperature retrievals.
[48] | Methane Detection on Mars | Trace Gas Orbiter (TGO) detection limits and methane’s atmospheric lifetime constrain explanations for Mars Science Laboratory (MSL) methane observations.
[49] | Limitations of the Occultation Method for Atmospheric Profiling | Aerosol opacity prevents occultation techniques from retrieving gas profiles near the Martian surface.
[48] | Unresolved Methane Source and Sink Problem | The mismatch between suspected methane sources and proposed rapid destruction mechanisms remains unresolved.
[48] | Uncertainty in In Situ Methane Measurements | MSL is the sole surface platform for methane detection, but Tunable Laser Spectrometer–Sample Analysis at Mars (TLS-SAM) data may be affected by instrument contamination.
(d) Challenges and Limitations in Martian Space Environment Studies
Reference | Theme | Limitation/Challenge
[50] | Lack of High-Quality Energetic Particle Observations | High-quality 1-AU equivalent observations of ~1 MeV electrons and ~40 MeV protons are lacking at Mars’ orbit.
[51] | Limitations of In Situ Plasma Observations on Mars | Satellite orbits and limited temporal coverage constrain in situ observations of the Martian plasma environment.
[52] | Challenges in Studying Martian Magnetotail Plasma | Electrostatic analyzer constraints hinder detection of thermal (<few eV) plasma in the Martian magnetotail.
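The captions of Tables 2 and 3 state that candidate texts were retrieved by semantic matching against short prompts and then filtered before summarization. The following is a minimal sketch of such a retrieve-then-filter step under stated assumptions: the embedding model, similarity threshold, and number of retained sentences used by the authors are not given here, so a bag-of-words cosine similarity stands in for a dense sentence embedding, and the helper names and the `top_k` and `min_score` values are hypothetical. The example sentences are invented toy inputs, not items from the research database.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def bow_vector(text):
    """Bag-of-words term counts; a HYPOTHETICAL stand-in for a dense sentence embedding."""
    return Counter(tokenize(text))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(prompts, sentences, top_k=3, min_score=0.2):
    """Score each sentence by its best match against any prompt, drop sentences
    below min_score (the filtering step), and return up to top_k
    (score, sentence) pairs in descending order of score.
    top_k and min_score are hypothetical values chosen for illustration."""
    prompt_vecs = [bow_vector(p) for p in prompts]
    scored = []
    for sentence in sentences:
        vec = bow_vector(sentence)
        best = max(cosine(vec, pv) for pv in prompt_vecs)
        if best >= min_score:
            scored.append((best, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Prompts quoted in the Table 2 caption; the sentences are invented toy examples.
PROMPTS = ["Mars research challenge", "Mars research limitation", "Mars research problem"]
sentences = [
    "A key challenge for Mars research is the sparse seismic coverage.",
    "Dune migration rates were measured from orbital images.",
    "One limitation of this Mars study is the limited CRISM coverage.",
]

hits = retrieve(PROMPTS, sentences)
for score, sentence in hits:
    print(f"{score:.3f}  {sentence}")
```

In this toy run, the first and third sentences are kept (scores of about 0.52 and 0.35), while the dune sentence shares no words with any prompt and is filtered out. A dense embedding model would also match paraphrases such as "obstacle" or "constraint," which exact-word overlap cannot, so a real pipeline would more likely rely on sentence embeddings than on this stand-in.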
Table 3. Scientific and exploration frontiers identified in Martian research. Texts retrieved through semantic analysis of prompts such as “Mars research further,” “Mars research future,” and “Mars research better” were filtered and summarized.
(a) Scientific and Exploration Frontiers in Martian Solid
Reference | Theme | Evidence/Approach
[31] | Mineralized Fractures and Subsurface Fluid Flow | Mineralized fractures preserve direct evidence of postdepositional fluid flow in the Martian subsurface.
[53] | Martian Paleomagnetism | Paleomagnetic samples would provide the first direct measurements of Mars’ paleo-field direction.
[53] | Martian Interior Temperature Structure | Obtaining full one-dimensional temperature profiles is unrealistic, but seismic data can provide temperature–pressure points at discontinuities.
[36] | Martian Magnetic Field Mapping | Future studies will use a model with a higher maximum spherical harmonic (SH) degree and careful regularization.
(b) Scientific and Exploration Frontiers in Martian Surface
Reference | Theme | Evidence/Approach
[54] | Dune Field Analysis on Mars | The Object-Based Image Analysis (OBIA) method enables improved estimates of sediment flux, dune migration, and erosion rates.
[55] | Remote Detection of Hydrated Minerals | SuperCam infrared (IR) passive spectroscopy is uniquely capable of identifying hydrated and hydroxylated mineral outcrops at a distance.
[55] | Mars-2020 Rover Sampling Capabilities | The Mars-2020 rover can abrade targets to create a flat surface, remove dust, and extract core samples (~1 cm wide, 5 cm long).
[56] | Mineral Identification by Mars-2020 Rover | The rover is equipped with a shortwave infrared point spectrometer and a green Raman spectrometer to detect carbonates and/or serpentine from a distance at cm- to mm-scale.
[57] | Modern Surface Processes on Mars | These results contribute to growing evidence that present-day processes shape Martian geomorphology.
[36] | Global Imaging for Mars Temporal Changes | Future work will investigate whether near-global, contemporaneous imaging (e.g., Mars Color Imager) could produce more consistent data products.
[58] | Robotic Water Resource Exploration | Robotic missions will assess and optimize the use of Mars’ initially limited water resources, potentially sourced from the poles or nearby asteroids.
[59] | Subsurface Layer Modeling on Mars | Further research is needed to develop detailed layered models of the Martian subsurface, which are currently unavailable.
[60] | Melt Oxidation in Martian Samples | As more samples are returned from Mars, the extent and intensity of melt oxidation will be studied in greater detail.
[61] | Investigation of Martian Ripple Systems | With Tianwen-1 and Perseverance en route to Mars and the Rosalind Franklin rover under development, future missions will allow close-up examination of ripple systems.
[62] | Effect of Excess H₂O₂ on Organics | A logical next step is to investigate how excess H₂O₂ affects organics over Martian geologic timescales, or how it affects the thermal decomposition analysis methods used by Mars rovers.
[63] | Spectral Analysis and Modeling of Mars | Ongoing efforts will combine laboratory-derived spectra with CRISM observations, supplemented by polycyclic aromatic hydrocarbon (PAH) spectra relevant to non-polar environments.
[34] | Selection Criteria for Mars Robotics | Future robotic mission planning will emphasize life-detection potential, water ice presence, site accessibility, and established selection criteria.
[35] | In Situ Mineralogical Analysis on Mars | Future research should prioritize in situ mineralogical investigations to provide direct evidence for interpreting Mars’ geological history.
[3] | Pre-Noachian History of Mars | Detailed analysis of fine-scale rock units, through in situ rover measurements or advanced laboratory studies of returned samples, is essential for probing pre-Noachian Mars.
[64] | Martian Mineral Spectral Library | Future research should focus on developing a more comprehensive spectral library covering a wider range of surface mineral spectra.
(c) Scientific and Exploration Frontiers in Martian Atmosphere
Reference | Theme | Evidence/Approach
[65] | Atmospheric CO₂ Measurements and Instrumentation | Future efforts will focus on advancing instrumentation and improving atmospheric CO₂ measurements on Mars.
[65] | Enhanced Atmospheric Pressure Sensing | This study proposes adding active sensors to the existing pressure-sensing system for Martian atmospheric studies.
[66] | Oxygen Density Decline on Mars | Future research is required to assess the robustness of the observed decline in oxygen density.
[67] | Proton Origins in Mars’ Atmospheric Escape | Results indicate that explicitly modeling proton origins improves understanding of atmospheric escape processes.
[68] | Martian Data Assimilation for Weather Studies | Martian data assimilation techniques continue to improve.
(d) Scientific and Exploration Frontiers in Martian Space
Reference | Theme | Evidence/Approach
[69] | Induced Magnetosphere of Mars | Investigating the piled-up magnetic field, its strength, and its dependence on varying solar wind conditions.
[70] | Solar Wind Interaction with Martian Plasma | Long-term studies, increasingly informed by orbital observations, have focused on solar wind interactions with the Martian plasma environment.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, Y.; Zhang, J.; Huang, Q.; Sun, Y.; Shao, J.; Gou, Y.; Huang, K.; Zhang, S. Sentence-Level Insights from the Martian Literature: A Natural Language Processing Approach. Appl. Sci. 2025, 15, 8663. https://doi.org/10.3390/app15158663

AMA Style

Zhang Y, Zhang J, Huang Q, Sun Y, Shao J, Gou Y, Huang K, Zhang S. Sentence-Level Insights from the Martian Literature: A Natural Language Processing Approach. Applied Sciences. 2025; 15(15):8663. https://doi.org/10.3390/app15158663

Chicago/Turabian Style

Zhang, Yizheng, Jian Zhang, Qian Huang, Yangyi Sun, Jia Shao, Yu Gou, Kaiming Huang, and Shaodong Zhang. 2025. "Sentence-Level Insights from the Martian Literature: A Natural Language Processing Approach" Applied Sciences 15, no. 15: 8663. https://doi.org/10.3390/app15158663

APA Style

Zhang, Y., Zhang, J., Huang, Q., Sun, Y., Shao, J., Gou, Y., Huang, K., & Zhang, S. (2025). Sentence-Level Insights from the Martian Literature: A Natural Language Processing Approach. Applied Sciences, 15(15), 8663. https://doi.org/10.3390/app15158663

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
