Mapping the Past: Unlocking Historical Explorer Narratives with AI and Geospatial Tools

Barreau, Jean-Baptiste

doi:10.3390/electronics14071395


Archéologie des Amériques (ArchAm), CNRS, Université Paris 1 Panthéon-Sorbonne, 75004 Paris, France

Electronics 2025, 14(7), 1395; https://doi.org/10.3390/electronics14071395

Submission received: 15 February 2025 / Revised: 22 March 2025 / Accepted: 27 March 2025 / Published: 30 March 2025

(This article belongs to the Special Issue Electronics and Computer Science for Cultural Heritage: Advancements, Preservation, and Applications, 2nd Edition)


Abstract

This study explores the use of artificial intelligence and geospatial tools to analyze historical explorers’ narratives. Explorers’ accounts provide valuable insights into the cultural, environmental, and logistical dynamics of exploration journeys. However, traditional methods of analyzing these narratives are often subjective and difficult to reproduce on a large scale. The main objective is to overcome the limitations of traditional methods by using AI techniques to systematically extract and structure information from explorers’ narratives. This study employs Python scripts to extract factual data from narratives available on Project Gutenberg, followed by structuring the data in JSON format. Geographic data are enriched through geocoding using libraries such as Geopy and OpenCage. An interactive web interface based on Leaflet allows for the visualization and validation of explorers’ routes. The results show a concentration of visits in North and West Africa, with traditional modes of transport like caravans and traveling on foot being dominant. The main challenges faced were related to transportation, climatic conditions, and natural obstacles. Principal component analysis (PCA) and correspondence analysis reveal latent structures in the data, while clustering analysis segments the journeys based on similarity criteria. This research demonstrates the value of AI and geospatial tools for a more objective and detailed analysis of explorers’ narratives, opening new perspectives for historical and geographical studies.

Keywords:

historical travel analysis; natural language processing (NLP); geospatial visualization

1. Introduction

1.1. Explorer Accounts as Historical Sources

The narratives written by explorers in past centuries provide valuable insights into the cultural, environmental, and logistical dynamics of exploration journeys. These firsthand accounts, rich in detailed descriptions and empirical observations, serve as essential primary sources for analyzing interactions between encountered societies, traversed environments, and the numerous challenges faced by explorers [1]. They shed light on the motivations, strategies, and lived experiences of expeditions while enriching our understanding of the historical and geographical contexts explored.

1.2. Challenges and Limitations of Traditional Analysis

The study of travel narratives is crucial for understanding the cultural, economic, and geopolitical dynamics that shaped exploration and representations of the world. Holtz and Masse [2] provide an overview of traditional interpretation methods, emphasizing the epistemological challenges related to source subjectivity, discourse variations, and the biases inherent in narrative construction. Lefebvre [3] explores how geographical and political discourses influenced colonial expansion and territorial mapping, highlighting how narratives legitimized expansionist enterprises. Gauthier [4] examines the role of institutions such as the Bordeaux Commercial Geography Society (1874–1911) in supporting exploration, revealing the intersection of scientific ambitions, commercial interests, and political strategies. These studies show that exploration accounts go beyond mere testimony: they are complex objects of study at the crossroads of spatial representation, knowledge circulation, and power dynamics. However, traditional analysis faces several obstacles: the large volume of documents, their heterogeneous structure, and the subjectivity of interpretation make systematic analysis difficult and hard to reproduce [5]. Information overload is a major challenge in systematic reviews: Gallou [5] highlights that an abundance of sources can hinder knowledge synthesis and thereby compromise the relevance of results. Roetzel [6] identifies the impacts of information overload on decision-making, while an exploratory review [7] proposes strategies to mitigate it, such as filtering and data structuring. Seutter [8] stresses the importance of a suitably designed architecture to reduce users’ cognitive overload.

1.3. Contributions of Quantitative and Computational Approaches

While many studies analyze specific narratives, systematic quantitative approaches remain rare. Traditional methods struggle to extract and synthesize essential information such as routes, transportation modes, obstacles, and interactions with local populations, and large-scale analysis is complicated by the heterogeneity of corpora and the diversity of historical contexts. Rorden [9] illustrates the benefits of machine learning techniques for extracting and classifying narratives within extensive text databases: supervised and unsupervised techniques facilitate the identification of relevant documents and reveal latent structures inaccessible to classical methods. Chachereau [10] demonstrates how data mining and digital mapping can reconstruct past tourist flows and analyze their socio-economic dynamics. Villamor [11] showcases the potential of topic modeling and word embeddings to structure and interpret historical corpora, overcoming the limitations of traditional approaches. Human subjectivity also introduces analytical biases. Winterbottom [12] and Betsch [13] examine the influence of narratives on decision-making and risk assessment, revealing how emotions and affective engagement amplify the impact of stories, often at the expense of objective data. In response to these challenges, automated information extraction and geospatial tools become essential. Partlan [14] proposes a multi-graph model to study the structure of interactive narratives, while Jones [15] employs natural language processing (NLP) to predict macro-narrative structures. Sudhahar [16] combines text mining and pattern extraction to analyze key actors in narratives, and Chen [17] explores automated narrative visualizations. In tourism, NLP applications described by Alvarez [18] facilitate the extraction of routes and transport preferences, aiding the assessment of socio-economic dynamics, and Gregoriades [19] uses NLP to predict tourists’ intention to revisit destinations, illustrating the growing interest in computational approaches in this field.

1.4. Objectives and Hypothesis

The objective of this study is to systematically extract, structure, and analyze the information contained in exploration narratives, as suggested by Brunsting [20] and Hu [21]. Unlike traditional approaches that rely on manual reading or limited textual analyses, we use NLP tools to automate data extraction and structuring, which makes it possible to process a large volume of documents at a granularity and speed that manual reading cannot match. A major innovation of our approach is the combined use of NLP and geocoding: toponyms are extracted automatically and passed to specialized geocoding libraries to locate the places mentioned. Because fully automated processes leave errors and ambiguities, we also developed an interactive Leaflet-based interface for the semi-automated validation of the results. Another novel aspect is the enrichment of geographical data through contextual analysis: using machine learning models and Python scripts, we identify and categorize the transportation modes used and the challenges faced by explorers. This not only improves our understanding of travel conditions but also reveals historical and environmental dynamics often overlooked by classical methods. Finally, we develop Python functions to organize and analyze these data with statistical methods, including principal component analysis (PCA), correspondence analysis (CA, known in French as AFC), and clustering. These techniques identify latent structures and correlations, offering a more refined and multidimensional reading of exploration narratives. The central hypothesis is that this computational approach overcomes the limitations of traditional methods by enabling a richer and more systematic understanding of routes, transportation modes, environmental conditions, and explorers’ personal characteristics. It thus opens new perspectives for the study of exploration narratives by combining automation, geographical precision, and historical contextualization.

2. Materials and Methods

This section describes the different steps implemented for collecting, organizing, and analyzing sources from explorers’ narratives. The main objective of our methodological approach is to structure these documents into a usable form for studying the journeys and experiences of explorers. We designed a data processing pipeline comprising several stages: source retrieval, data extraction and structuring, geocoding of mentioned locations, classification of contextual information, and statistical analysis of the narratives.

2.1. Input Data

We chose Project Gutenberg as our source for several reasons. First, it provides completely free access to the works it hosts, without requiring account creation or any form of identification, which makes documents quick to consult. Second, the books are available as web pages, sometimes including images, which is particularly relevant for illustrated works, travel narratives containing maps, and other documents in which iconography plays an important role; the web format also facilitates the automated extraction of both text and images. Finally, since the works available on Project Gutenberg belong to the public domain, their use is not restricted by copyright, which makes their dissemination and study significantly easier. The thematic selection of works, travel narratives by explorers who journeyed across the African continent, was guided by several methodological and scientific considerations. First, a substantial number of such works are available via Project Gutenberg, providing a sufficiently extensive and diverse corpus for an in-depth analysis. Second, identifying and geolocating the place names mentioned in these narratives is a significant challenge because geographical names have changed over time. This issue is particularly pronounced in Africa, especially in remote regions such as the Sahara, where historical and cartographic documentation may be sparse [22,23]. This adds a layer of complexity to the geocoding process, making it a relevant technical challenge to address. Lastly, with a few exceptions, the narratives cover a relatively consistent period, which allows the data to be contextualized coherently and facilitates comparison of the trajectories and exploratory dynamics across journeys. The list of authors and books is presented in Table 1.

All of these works, accessible on the Project Gutenberg website [56,57], are available as web pages enriched with images. Our goal, however, was to obtain these books in a purely textual format, supplemented with links pointing to the image URLs.

2.2. Data Extraction and Transformation

We worked in a Python 3.12.4 environment distributed by Anaconda, Inc. (Austin, TX, USA), using the CPython implementation (a 64-bit AMD64 build compiled with MSC v.1929) on Windows 10 64-bit. All the Python code described in this section is available online [58]. The data processing workflow, described in more detail below, is presented in Figure 1.

To obtain this format, we designed a dedicated Python function. It downloads the book’s web page using the requests library [59] and parses its HTML content with BeautifulSoup [60]; each ‘<img>’ tag is replaced with the URL of its image, and the modified content is saved as a local HTML file. This file is then converted into a PDF document via a web browser. We chose JSON [61] to organize the extracted data. First, each PDF file was loaded into NotebookLM [62,63] as a source. With all explorer narratives selected, we submitted the following prompt: “Based on these explorer narratives, what potential factual parameters could be identified regarding travel stages, beyond just place names, dates, and means of transport?” This step was crucial for understanding which categories of data could be extracted and structured from the narratives. Next, we submitted a more detailed prompt: “List in JSON format the title, year of travel, author, author’s age, amount of money available, main travel objective, success or failure in achieving the objective, language, and book URL”. The generated result was copied into a dedicated JSON file, which then required standardization and correction: the field names generated by NotebookLM were not always consistent and had to be adjusted to the standardized labels “titre” (title), “annee” (year), “auteur” (author), “age” (age), “objectif” (objective), “argent” (money), “tresors” (treasures), “succes” (success), “langage” (language), and “url” (URL). In a second phase, another prompt was submitted to NotebookLM to detail each travel stage in JSON format: “List in JSON format each travel stage, including the departure location, departure date, means of transport, difficulties encountered, and food consumed at the stage”.
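The HTML preprocessing described at the start of this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the published code [58]: the function names are hypothetical, and inserting each absolute image URL surrounded by spaces in place of its tag is an assumption about how the links were kept in the text.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def replace_images_with_urls(html: str, base_url: str) -> str:
    """Return the HTML with every <img> tag replaced by the absolute URL of its image."""
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):  # find_all returns a list, so editing the tree is safe
        src = img.get("src")
        if src:
            # Relative paths such as "images/map.jpg" are resolved against the page URL.
            img.replace_with(f" {urljoin(base_url, src)} ")
        else:
            # An <img> without a src carries no usable link, so it is dropped.
            img.decompose()
    return str(soup)


def save_book_as_html(book_url: str, out_path: str) -> None:
    """Download a book page and write the image-free HTML to out_path (hypothetical helper)."""
    import requests  # imported here so the transformation above works without network access

    response = requests.get(book_url, timeout=30)
    response.raise_for_status()
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(replace_images_with_urls(response.text, book_url))
```

Keeping the transformation separate from the download makes it testable on a saved page; the resulting HTML file can then be printed to PDF from a browser, as in the workflow above.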
If errors occurred while NotebookLM generated the stage data, we adjusted the requested fields, starting with the removal of the “description” field, which appeared to be particularly demanding for NotebookLM. Once the data were consolidated, the structural validity of the resulting JSON files was verified with the online tool JSONLint (available at jsonlint.com) [64]. In parallel, we generated a file named images.json with a dedicated query, “List the image files and their descriptions in JSON format”, which produced a structured inventory of the visual resources associated with each book.
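The renaming of inconsistent field names and the structural check can also be scripted locally. In the sketch below, the variant key names in KEY_ALIASES are hypothetical examples (the text does not list the variants NotebookLM actually produced), and Python's json module stands in for JSONLint; only the standardized labels come from the text.

```python
import json

# Standardized labels used in the study's JSON files.
STANDARD_KEYS = {"titre", "annee", "auteur", "age", "objectif",
                 "argent", "tresors", "succes", "langage", "url"}

# Hypothetical examples of inconsistent names an LLM might emit.
KEY_ALIASES = {
    "title": "titre", "year": "annee", "année": "annee", "author": "auteur",
    "objective": "objectif", "money": "argent", "treasures": "tresors",
    "success": "succes", "language": "langage", "book_url": "url",
}


def normalize_record(record: dict) -> dict:
    """Rename keys to the standardized labels; unknown keys are kept (lower-cased)."""
    normalized = {}
    for key, value in record.items():
        clean = key.strip().lower()
        normalized[KEY_ALIASES.get(clean, clean)] = value
    return normalized


def nonstandard_keys(record: dict) -> set:
    """Keys that still need manual attention after normalization."""
    return set(record) - STANDARD_KEYS


def load_and_validate(text: str) -> list:
    """Parse JSON text, reporting the error position, then normalize every record."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}") from err
    records = data if isinstance(data, list) else [data]
    return [normalize_record(r) for r in records]
```

Reporting the line and column of a syntax error mirrors what JSONLint shows, which helps locate truncated or malformed LLM output.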

2.2.1. Geocoding and Interactive Validation

We developed two functions to complete the travel step data by adding missing geographic coordinates with geocoding services. The first function handles individual steps: if the “latitude” and “longitude” fields are missing, it starts geocoding, first using Geopy [65]. If Geopy returns coordinates, they are added with the source marked as “Geopy”; otherwise, the function queries OpenCage [66] and records the coordinates with the source “OpenCage”. If neither geocoder returns a result, the coordinates are set to None and the source is marked as “manual”. The second function processes all the steps of a JSON file. After loading the data, it initializes counters for Geopy, OpenCage, and manual additions and computes the total number of steps so that progress can be reported. For each journey containing steps, it calls the first function on each step, updates the counters, and reports progress; once all steps are enriched, the data are saved. Finally, it displays the geocoding source statistics and returns them as a summary.
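A minimal sketch of these two functions follows. To keep it testable offline, the geocoders are passed in as functions that return a (latitude, longitude) pair or None; the step keys “lieu” (place name) and “etapes” (list of steps) are hypothetical, since the text does not name them.

```python
import json
from collections import Counter
from typing import Callable, Optional, Tuple

# A geocoder takes a place name and returns (latitude, longitude), or None on failure.
Geocoder = Callable[[str], Optional[Tuple[float, float]]]


def geocode_step(step: dict, primary: Geocoder, secondary: Geocoder,
                 place_key: str = "lieu") -> Optional[str]:
    """Add coordinates to one step if they are missing; return the source used, or None."""
    if "latitude" in step and "longitude" in step:
        return None  # coordinates already present: nothing to do
    place = step.get(place_key)
    for label, geocoder in (("Geopy", primary), ("OpenCage", secondary)):
        try:
            coords = geocoder(place) if place else None
        except Exception:  # network errors or quota limits count as a failed lookup
            coords = None
        if coords is not None:
            step["latitude"], step["longitude"] = coords
            step["source"] = label
            return label
    # Neither geocoder succeeded: leave the step for manual completion.
    step["latitude"] = step["longitude"] = None
    step["source"] = "manual"
    return "manual"


def geocode_all(data: list, primary: Geocoder, secondary: Geocoder,
                steps_key: str = "etapes") -> Counter:
    """Geocode every step of every journey and count how each step was resolved."""
    counts = Counter()
    total = sum(len(journey.get(steps_key, [])) for journey in data)
    done = 0
    for journey in data:
        for step in journey.get(steps_key, []):
            label = geocode_step(step, primary, secondary)
            if label is not None:
                counts[label] += 1
            done += 1
            print(f"Geocoding step {done}/{total}", end="\r")
    print()
    return counts


def geocode_json_file(path: str, primary: Geocoder, secondary: Geocoder) -> Counter:
    """Load a JSON file, enrich it, save it back, and return the source statistics."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    counts = geocode_all(data, primary, secondary)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    print(dict(counts))
    return counts
```

In practice, primary could wrap geopy's `Nominatim(user_agent=...).geocode`, which returns a `Location` with `latitude` and `longitude` attributes or None, and secondary could wrap geopy's `OpenCage(api_key=...)` geocoder. The text does not say which service was reached through Geopy, so Nominatim is only an assumption here.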

To check the coordinates and help complete the missing ones, we developed a web interface that maps the steps. Its HTML and JavaScript code creates an interactive map centered on North Africa showing the explorers’ routes, loaded from the JSON file. The map uses the Leaflet library [67] to display tiles from the ArcGIS “World Imagery” layer [68]. The code draws a polyline for each explorer, assigning each route a unique color based on the title of the associated work. Custom markers are placed at each step of the journey as colored round icons displaying the step number; an icon can also indicate how many steps with null coordinates lie between the previous step and the current one. Clicking a marker opens a pop-up window showing additional information about the step, the title of the work, the publication year, the author, and other relevant details, together with a link to the work’s URL. A legend lists the routes by publication year, author, and title, with a checkbox for each: checking a box adds the corresponding route and markers to the map, and unchecking it removes them. The entire script retrieves the JSON data dynamically and integrates them into the interactive map (cf. Figure 2).
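The interface itself is written in HTML and JavaScript, but the per-route data it draws can be prepared in Python. The sketch below, whose function and key names are hypothetical, builds for each journey the [latitude, longitude] pairs a Leaflet polyline expects, numbers the markers, counts the null-coordinate steps skipped before each marker, and derives a color from the title with a hash (an assumption; hash-based colors are stable but not guaranteed to be unique).

```python
import hashlib


def route_color(title: str) -> str:
    """Derive a stable hex color from a title (hash-based, so collisions are possible)."""
    return "#" + hashlib.md5(title.encode("utf-8")).hexdigest()[:6]


def build_routes(data: list, title_key: str = "titre", steps_key: str = "etapes") -> list:
    """Prepare one polyline and numbered markers per journey for a Leaflet map."""
    routes = []
    for journey in data:
        title = str(journey.get(title_key, ""))
        polyline, markers = [], []
        skipped = 0  # steps with null coordinates since the last drawn marker
        for number, step in enumerate(journey.get(steps_key, []), start=1):
            lat, lon = step.get("latitude"), step.get("longitude")
            if lat is None or lon is None:
                skipped += 1
                continue
            polyline.append([lat, lon])  # Leaflet expects [lat, lng] pairs
            markers.append({"number": number, "lat": lat, "lon": lon,
                            "skipped_before": skipped})
            skipped = 0
        routes.append({"title": title, "color": route_color(title),
                       "polyline": polyline, "markers": markers})
    return routes
```

The output of `json.dumps(build_routes(data))` could be fetched by the page script, which would then only need to call Leaflet's drawing functions.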

Using this interface, we checked each step and, where possible, completed its missing coordinates. First, we checked whether the step appeared on the map. If it did, we verified that the point’s position was plausible for the place described. If the position was inconsistent, we searched for the place name on Google Maps; when a geographically coherent result was found, we saved its coordinates in the JSON file and set the source to “Geopy_corrected_manually” or “OpenCage_corrected_manually”, depending on the original geocoder. If the step did not appear on the map, we also searched for its name on Google Maps: if a relevant result was found, we saved its coordinates with the source “manual”; if not, the latitude and longitude were set to null in the JSON file. The process is therefore semi-automated.
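The decision rules of this manual check can be written as a small function, which makes the meaning of each source label explicit. In this sketch the lookup argument stands for the human search on Google Maps, the key “lieu” is hypothetical, and the rules follow the procedure described above on the interpretation that the “_corrected_manually” labels apply to points that were displayed but misplaced.

```python
from typing import Callable, Optional, Tuple

# Stands for the human search on Google Maps: returns (lat, lon) or None.
Lookup = Callable[[str], Optional[Tuple[float, float]]]


def manual_check(step: dict, displayed: bool, position_ok: bool,
                 lookup: Lookup, place_key: str = "lieu") -> dict:
    """Apply the manual validation rules to one step and return the updated step."""
    if displayed and position_ok:
        return step  # automatic coordinates accepted unchanged
    coords = lookup(step.get(place_key, ""))
    if coords is None:
        # No coherent result (assumption: misplaced points are also cleared).
        step["latitude"] = step["longitude"] = None
        return step
    step["latitude"], step["longitude"] = coords
    if displayed:
        # The point was drawn but misplaced: record which geocoder was corrected.
        step["source"] = f"{step.get('source')}_corrected_manually"
    else:
        step["source"] = "manual"
    return step
```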

2.2.2. Classification and Enrichment of Data

To categorize the various modes of transport mentioned in our data, we designed another Python function. It takes the JSON file as input, iterates over the steps, and extracts each mentioned mode of transport; the values are cleaned by removing periods and collected in a list, and the function returns a sorted list of the unique modes. We then submitted this list to ChatGPT [69,70] with the prompt: “For each item in the following list, assign a transport category and create a corresponding Python list”. Modes to which no category could be assigned were placed by default in the generic “Not specified” category. To minimize these undefined cases, we then reviewed them manually, assigning the most appropriate categories based on our judgment. Finally, an additional Python function enriches the data: for each step of each book, it reads the “moyen_transport” field and compares its content with the pre-established category list. If a match is found, the corresponding category is stored in the step under the key “categorie_transport”; otherwise, the category remains “Not specified”. The same methodology was applied to categorize the difficulties encountered.
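Both functions can be sketched as below. The mapping TRANSPORT_CATEGORIES is a hypothetical excerpt in the spirit of the ChatGPT-generated list (its category names appear in Equation (1), but the raw mode strings are invented), and exact matching on the cleaned value is an assumption about how the comparison was made.

```python
# Hypothetical excerpt of a mode -> category mapping; the real list was built with ChatGPT
# and reviewed manually.
TRANSPORT_CATEGORIES = {"camel": "Camelid", "on foot": "Walking",
                        "steamer": "Boat", "railway": "Train"}


def _clean(value) -> str:
    """Normalize a raw field value: drop periods and surrounding spaces."""
    return str(value).replace(".", "").strip()


def unique_values(data: list, field: str = "moyen_transport", steps_key: str = "etapes") -> list:
    """Return the sorted unique cleaned values of a step field across all journeys."""
    values = {_clean(step[field])
              for journey in data
              for step in journey.get(steps_key, [])
              if step.get(field)}
    return sorted(values)


def assign_categories(data: list, categories: dict, field: str = "moyen_transport",
                      target: str = "categorie_transport", steps_key: str = "etapes",
                      default: str = "Not specified") -> list:
    """Store each step's category under `target` (exact match on the cleaned value)."""
    for journey in data:
        for step in journey.get(steps_key, []):
            raw = step.get(field)
            step[target] = categories.get(_clean(raw), default) if raw else default
    return data
```

Because the field and target names are parameters, the same two functions could be reused for the difficulties by passing the corresponding (unspecified) keys.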

Finally, we designed two functions to enrich the geographic data with the present-day country corresponding to each point. The first function also relies on the OpenCage API [66]: given a pair of geographic coordinates (latitude and longitude), it checks that they are valid, sends a reverse-geocoding request, and extracts and returns the country from the result. The second function processes the JSON file, calls the first function to add the corresponding country to each step, and saves the updated file.
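The country enrichment can be sketched with the reverse geocoder injected as a function, so the logic can be checked without an API key. The output key “pays” is hypothetical. With the opencage Python package, reverse could be `OpenCageGeocode(key).reverse_geocode`, whose results report the country under `components`.

```python
from typing import Callable, Optional

# reverse(lat, lon) -> list of OpenCage-style results, e.g. [{"components": {"country": ...}}]
ReverseGeocoder = Callable[[float, float], list]


def country_from_results(results: list) -> Optional[str]:
    """Return the country of the first reverse-geocoding result, if any."""
    if not results:
        return None
    return results[0].get("components", {}).get("country")


def valid_coordinates(lat, lon) -> bool:
    """True for numeric coordinates within the latitude and longitude ranges."""
    return (isinstance(lat, (int, float)) and isinstance(lon, (int, float))
            and -90 <= lat <= 90 and -180 <= lon <= 180)


def add_countries(data: list, reverse: ReverseGeocoder,
                  steps_key: str = "etapes", target: str = "pays") -> list:
    """Add the present-day country to every step with valid coordinates (None otherwise)."""
    for journey in data:
        for step in journey.get(steps_key, []):
            lat, lon = step.get("latitude"), step.get("longitude")
            step[target] = (country_from_results(reverse(lat, lon))
                            if valid_coordinates(lat, lon) else None)
    return data
```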

2.3. Statistical Analysis

We then developed several other Python functions to process and organize the data. First, a function checks the format of the provided dates and, when only a year is given, assigns 1 July of that year as the reference date. Next, a function computes a person’s age on a given date by subtracting the birth year from that date’s year while taking months and days into account, so that a birthday not yet reached that year reduces the age by one. Another function traverses the steps and sums the distances between consecutive valid points, using the geodesic function from geopy.distance [71], to obtain the length of the route. A general function then gathers the details of the steps (places, departure and arrival dates, and corresponding ages) while compiling statistics on transport categories, journey difficulties, and countries traversed. Finally, a last function builds the general DataFrame [72] on which the statistics are computed. A detailed report on the DataFrame was generated with the Python library ydata-profiling [73] and is available online [74].
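The date, age, and distance helpers can be sketched as follows. One substitution is made to keep the sketch dependency-free: distances use the spherical haversine formula with a mean Earth radius instead of geopy's ellipsoidal `geodesic(p, q).km`, so results differ slightly (well under 1%). Accepting only a bare year or an ISO date is an assumption about the date formats.

```python
import math
import re
from datetime import date
from typing import Optional

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius used by the spherical approximation


def parse_date(text) -> Optional[date]:
    """Parse 'YYYY-MM-DD'; a bare year maps to 1 July of that year; anything else -> None."""
    text = str(text).strip()
    if re.fullmatch(r"\d{4}", text):
        return date(int(text), 7, 1)
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None


def age_at(birth: date, when: date) -> int:
    """Age in whole years, one less if the birthday has not yet come round that year."""
    before_birthday = (when.month, when.day) < (birth.month, birth.day)
    return when.year - birth.year - int(before_birthday)


def haversine_km(p, q) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def total_distance_km(steps: list) -> float:
    """Sum the distances between consecutive steps that have coordinates."""
    points = [(s["latitude"], s["longitude"]) for s in steps
              if s.get("latitude") is not None and s.get("longitude") is not None]
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))
```

Skipping steps with null coordinates joins their neighbors directly, so the total slightly underestimates routes with gaps.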

2.3.1. Principal Component Analysis

This analysis aims to identify latent structures in the data by reducing dimensionality while preserving as much information as possible. It allows us to explore correlations between variables and to highlight the factors influencing explorers’ trajectories. We developed a function that performs a principal component analysis (PCA) on a dataset provided as a DataFrame. It first creates an output directory for the results, then selects the numerical columns of the DataFrame and replaces missing values with zero. It then standardizes the data with “StandardScaler” [75] so that all variables are on a comparable scale before PCA is applied. The standardized columns are [“Start Year”, “Departure Age”, “Arrival Age”, “Travel duration in days”, “Number of steps”, “Number of images”, “Steps with null coordinates in %”, “Total distance traveled (km)”, “Transport_…”, “Country_…”, “Difficulty_…”]. PCA is then run on the standardized data with the “PCA” class from “sklearn”, yielding the principal components and the proportion of variance explained by each. These results are saved as CSV files, including the explained and cumulative variance of each component and the coordinates of the individuals in the reduced space. The contributions of the variables to the components are computed as “loadings” and stored in another CSV file. Two visualizations are then generated and saved: a bar chart of the variance explained by each component and a correlation circle showing the relationship between the original variables and the first two principal components. Finally, a text file summarizes the key results, highlighting the explained variance and the most influential variables on the first two components, and the individuals’ coordinates and the variable contributions are returned as DataFrames. For illustration, the first principal component, PC1, is given by Equation (1).

\[
\begin{aligned}
\mathrm{PC1} = {} & (-0.237)\cdot\text{Start Year} + (-0.017)\cdot\text{Departure Age} + (0.005)\cdot\text{Arrival Age} \\
& + (0.020)\cdot\text{Travel duration in days} + (-0.015)\cdot\text{Number of steps} \\
& + (-0.016)\cdot\text{Number of images} + (-0.011)\cdot\text{Steps with null coordinates in \%} \\
& + (0.293)\cdot\text{Total distance traveled (km)} + (-0.077)\cdot\text{Transport\_Train} \\
& + (-0.037)\cdot\text{Transport\_Equid} + (-0.073)\cdot\text{Country\_France} \\
& + (-0.105)\cdot\text{Country\_Algeria} + (-0.094)\cdot\text{Transport\_Slow ground vehicle} \\
& + (0.250)\cdot\text{Transport\_Boat} + (-0.132)\cdot\text{Difficulty\_Climate} \\
& + (-0.110)\cdot\text{Difficulty\_Transport} + (-0.102)\cdot\text{Difficulty\_Nature} \\
& + (-0.039)\cdot\text{Country\_Tunisia} + (-0.034)\cdot\text{Country\_Morocco} \\
& + (-0.017)\cdot\text{Country\_Libya} + (-0.022)\cdot\text{Transport\_Caravan} \\
& + (-0.057)\cdot\text{Transport\_Camelid} + (0.033)\cdot\text{Country\_United Kingdom} \\
& + (0.020)\cdot\text{Country\_Nigeria} + (-0.015)\cdot\text{Country\_Niger} \\
& + (-0.124)\cdot\text{Difficulty\_Humans} + (-0.022)\cdot\text{Country\_Spain} \\
& + (-0.044)\cdot\text{Country\_Gibraltar} + (-0.084)\cdot\text{Transport\_Walking} \\
& + (-0.084)\cdot\text{Difficulty\_Fatigue/Illness} + (-0.105)\cdot\text{Difficulty\_Thirst/Hunger} \\
& + (-0.067)\cdot\text{Country\_Sierra Leone} + (-0.087)\cdot\text{Country\_Guinea} \\
& + (-0.083)\cdot\text{Country\_Mali} + (-0.008)\cdot\text{Country\_Côte d'Ivoire} \\
& + (-0.055)\cdot\text{Country\_Senegal} + (0.087)\cdot\text{Country\_Egypt} \\
& + (0.245)\cdot\text{Country\_Kenya} + (0.041)\cdot\text{Country\_Tanzania} \\
& + (0.160)\cdot\text{Country\_Mozambique} + (0.224)\cdot\text{Country\_South Africa} \\
& + (0.258)\cdot\text{Country\_Portugal} + (0.278)\cdot\text{Country\_Cape Verde} \\
& + (0.278)\cdot\text{Country\_Saint Helena, Ascension and Tristan da Cunha} \\
& + (0.278)\cdot\text{Country\_India} + (0.278)\cdot\text{Country\_Somalia} \\
& + (0.278)\cdot\text{Country\_Guinea-Bissau} + (0.004)\cdot\text{Country\_Malawi} \\
& + (0.074)\cdot\text{Country\_Liberia} + (0.074)\cdot\text{Country\_The Gambia} \\
& + (0.074)\cdot\text{Country\_Ethiopia} + (0.074)\cdot\text{Country\_Malta} \\
& + (0.081)\cdot\text{Country\_Germany} + (0.081)\cdot\text{Country\_Angola} \\
& + (0.081)\cdot\text{Country\_Namibia} + (0.055)\cdot\text{Country\_Sweden} \\
& + (0.055)\cdot\text{Country\_Ukraine} + (0.055)\cdot\text{Country\_Russia}
\end{aligned}
\]

(1)
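The PCA steps described above (zero-filling, standardization, decomposition, explained and cumulative variance, individual coordinates) can be reproduced with numpy alone, which makes each quantity explicit. This is a sketch, not the study's scikit-learn code: it computes PCA through the singular value decomposition of the standardized matrix. If the coefficients of Equation (1) are the entries of scikit-learn's `components_` attribute, they correspond to the first row of `components` here, up to an arbitrary sign.

```python
import numpy as np


def standardized_pca(X, n_components=2):
    """PCA on standardized data: returns scores, component weights and variance ratios."""
    X = np.nan_to_num(np.asarray(X, dtype=float), nan=0.0)  # missing values -> 0, as in the text
    std = X.std(axis=0)  # population standard deviation, as StandardScaler uses
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    Z = (X - X.mean(axis=0)) / std
    # The right singular vectors of the centered, scaled matrix are the principal axes.
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    variance = s**2 / (Z.shape[0] - 1)
    ratio = variance / variance.sum()  # proportion of variance explained by each component
    components = vt[:n_components]  # one row of variable weights per component
    scores = Z @ components.T  # coordinates of the individuals in the reduced space
    return scores, components, ratio[:n_components], np.cumsum(ratio)[:n_components]
```

Each score is thus a weighted sum of standardized variables, which is the form Equation (1) takes for PC1.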

2.3.2. Factorial Correspondence Analysis

The objective is to visualize the associations between authors and the characteristics of their journeys. This method highlights differences and similarities among explorers, revealing underlying trends in exploration narratives. We created a function that performs a correspondence analysis on the DataFrame in several steps. First, it creates the output directory for the results. It then prepares the data by replacing missing values with zero and setting the “Author” column as the index, keeps only the numerical columns, and excludes certain columns related to distances and null coordinates. The retained columns are [“Arrival Age”, “Number of steps”, “Number of images”, “Transport_…”, “Country_…”, “Difficulty_…”, “Start Year”, “Departure Age”, “Travel duration in days”]. An external function then aggregates the data by author if necessary. Rows and columns containing only zeros are removed, and the function checks that the resulting matrix is not empty before proceeding. Once the data are prepared, the function applies correspondence analysis (CA) using the prince library, fitting a model with two dimensions and extracting the row and column coordinates, which are saved in a text file in the output directory. To visualize the results, the function generates a plot in which rows are shown as blue points and columns as red points, each annotated with its name (column names are shortened for readability). Reference axes are added to position the points in the factor space, and a legend and grid improve readability. Finally, the plot is saved as an image in the output directory and closed to free memory. For illustration, Dimension 1 is given by Equation (2).

\[
\begin{aligned}
\text{Dim 1} = {} & (1.022)\cdot\text{Arrival Age} + (0.088)\cdot\text{Number of steps} + (0.034)\cdot\text{Number of images} \\
& + (-0.405)\cdot\text{Transport\_Train} + (-0.076)\cdot\text{Transport\_Equid} + (-0.687)\cdot\text{Country\_France} \\
& + (-0.531)\cdot\text{Country\_Algeria} + (-0.163)\cdot\text{Transport\_Slow ground vehicle} \\
& + (0.421)\cdot\text{Transport\_Boat} + (0.594)\cdot\text{Difficulty\_Climate} + (0.121)\cdot\text{Difficulty\_Transport} \\
& + (1.282)\cdot\text{Difficulty\_Nature} + (-0.149)\cdot\text{Country\_Tunisia} \\
& + (0.107)\cdot\text{Country\_Morocco} + (-0.396)\cdot\text{Country\_Libya} \\
& + (-0.234)\cdot\text{Transport\_Caravan} + (-0.258)\cdot\text{Transport\_Camelid} \\
& + (-0.266)\cdot\text{Country\_United Kingdom} + (-0.092)\cdot\text{Country\_Nigeria} \\
& + (0.118)\cdot\text{Country\_Niger} + (0.258)\cdot\text{Difficulty\_Humans} \\
& + (-0.516)\cdot\text{Country\_Spain} + (-0.242)\cdot\text{Country\_Gibraltar} \\
& + (1.062)\cdot\text{Transport\_Walking} + (1.518)\cdot\text{Difficulty\_Fatigue/Illness} \\
& + (1.455)\cdot\text{Difficulty\_Thirst/Hunger} + (0.146)\cdot\text{Country\_Sierra Leone} \\
& + (0.750)\cdot\text{Country\_Guinea} + (0.439)\cdot\text{Country\_Mali} \\
& + (0.660)\cdot\text{Country\_Côte d'Ivoire} + (0.621)\cdot\text{Country\_Senegal} \\
& + (-0.351)\cdot\text{Country\_Egypt} + (1.098)\cdot\text{Country\_Kenya} \\
& + (1.271)\cdot\text{Country\_Tanzania} + (1.427)\cdot\text{Country\_Mozambique} \\
& + (-0.080)\cdot\text{Country\_Democratic Republic of the Congo} \\
& + (2.438)\cdot\text{Country\_Zambia} + (2.438)\cdot\text{Country\_Botswana} \\
& + (0.590)\cdot\text{Country\_South Africa} + (0.867)\cdot\text{Country\_Zimbabwe} \\
& + (0.157)\cdot\text{Country\_Portugal} + (1.864)\cdot\text{Country\_Cape Verde} \\
& + (1.864)\cdot\text{Country\_Saint Helena, Ascension and Tristan da Cunha} \\
& + (1.864)\cdot\text{Country\_India} + (1.864)\cdot\text{Country\_Somalia} \\
& + (1.864)\cdot\text{Country\_Guinea-Bissau} + (-0.703)\cdot\text{Country\_Malawi} \\
& + (-0.140)\cdot\text{Country\_Mauritania} + (-0.709)\cdot\text{Country\_Ghana} \\
& + (-0.709)\cdot\text{Country\_Liberia} + (-0.709)\cdot\text{Country\_The Gambia} \\
& + (-0.709)\cdot\text{Country\_Ethiopia} + (-0.709)\cdot\text{Country\_Malta} \\
& + (-0.700)\cdot\text{Country\_Gabon} + (-0.684)\cdot\text{Country\_Germany} \\
& + (-0.684)\cdot\text{Country\_Angola} + (-0.684)\cdot\text{Country\_Namibia} \\
& + (-0.156)\cdot\text{Country\_Sahrawi Arab Democratic Republic} \\
& + (0.446)\cdot\text{Country\_Sweden} + (0.446)\cdot\text{Country\_Ukraine} \\
& + (0.446)\cdot\text{Country\_Russia} + (-0.143)\cdot\text{Start Year} \\
& + (0.203)\cdot\text{Departure Age} + (1.476)\cdot\text{Travel duration in days}.
\end{aligned}
\]

(2)
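To make the computation behind the prince call explicit, correspondence analysis can be sketched in numpy as the singular value decomposition of the standardized residuals of the table. This is a textbook formulation, not prince's implementation, whose scaling conventions may differ; it includes the removal of all-zero rows and columns and the empty-matrix check described above, and returns principal row and column coordinates.

```python
import numpy as np


def correspondence_analysis(N, n_components=2):
    """Correspondence analysis of a non-negative table via SVD of standardized residuals."""
    N = np.nan_to_num(np.asarray(N, dtype=float), nan=0.0)  # missing values -> 0
    if (N < 0).any():
        raise ValueError("correspondence analysis requires non-negative values")
    row_keep = N.sum(axis=1) > 0
    col_keep = N.sum(axis=0) > 0
    N = N[row_keep][:, col_keep]  # drop all-zero rows and columns
    if N.size == 0:
        raise ValueError("matrix is empty after removing all-zero rows and columns")
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)  # row and column masses
    expected = np.outer(r, c)  # table expected under independence
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv**2
    if inertia.sum() <= 1e-12:
        raise ValueError("no inertia: all rows are proportional")
    k = n_components
    rows = U[:, :k] * sv[:k] / np.sqrt(r)[:, None]  # principal row coordinates
    cols = Vt[:k].T * sv[:k] / np.sqrt(c)[:, None]  # principal column coordinates
    return rows, cols, inertia[:k] / inertia.sum(), row_keep, col_keep
```

The masks let the caller recover which author rows and variable columns survived the filtering, for labeling the plot.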

2.3.3. Clustering

This analysis aims to group journeys into homogeneous segments based on similarity criteria, helping to identify explorer profiles and to better understand major exploration patterns. Finally, we created a function that performs hierarchical and k-means clustering [76] on a dataset provided as a DataFrame. Its goal is to segment the observations into groups based on their numerical features and to save the results as CSV files and a graphical representation. It begins by keeping only the numerical columns of the DataFrame and replaces missing values with the mean of the respective column to avoid issues in later processing. The data are then normalized with the StandardScaler transformation [75], which centers each variable on a mean of zero and scales it to unit standard deviation. The normalized columns are [“Start Year”, “Departure Age”, “Arrival Age”, “Travel duration in days”, “Number of steps”, “Number of images”, “Steps with null coordinates in %”, “Total distance traveled (km)”, “Transport_…”, “Difficulty_…”, “Country_…”]. The second step applies hierarchical clustering to the normalized data, using Ward’s method [77] to minimize intra-cluster variance. The resulting linkage matrix is plotted as a dendrogram, a tree in which each leaf represents an observation from the DataFrame; to make the observations easy to identify, the leaf labels are shortened titles produced by a dedicated function. A red horizontal line is drawn at a distance of 12.5, chosen arbitrarily, to mark the cut-off threshold that defines the clusters, and the dendrogram image is saved to disk. Each observation is then assigned to a cluster according to this threshold.
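The hierarchical step can be sketched with SciPy, assuming (the text does not say so) that its `linkage` and `fcluster` functions were used. The sketch applies the mean imputation and standardization described above, builds a Ward linkage, and cuts the tree at a distance threshold whose default is the 12.5 used in the study; the function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def standardize_mean_imputed(X):
    """Replace missing values with column means, then scale to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    X = np.where(np.isnan(X), col_means, X)  # missing values -> column mean
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns are left unscaled
    return (X - X.mean(axis=0)) / std


def ward_clusters(X, threshold=12.5):
    """Ward hierarchical clustering cut at a linkage distance; returns labels and linkage."""
    Z = standardize_mean_imputed(X)
    link = linkage(Z, method="ward")
    # Observations joined below `threshold` share a cluster label (labels start at 1).
    labels = fcluster(link, t=threshold, criterion="distance")
    return labels, link
```

The dendrogram itself could be drawn with `scipy.cluster.hierarchy.dendrogram(link, labels=...)` and the cut-off marked with matplotlib's `axhline(12.5, color="red")`. Because the threshold is an absolute distance, its effect depends on the number of variables and observations, which is why 12.5 is specific to this dataset.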
The results of the hierarchical clustering are then added to the DataFrame in a new column titled Hierarchical_Cluster. A statistical summary of the hierarchical clusters is computed by grouping the data by cluster and extracting descriptive statistics (mean, standard deviation, minimum, and maximum) for each numerical variable; this summary is exported to a dedicated CSV file. The function then applies a second clustering method, the k-means algorithm, which iteratively partitions the normalized data into k groups so as to minimize the sum of squared distances between each observation and its cluster centroid. Each observation is assigned to a cluster, and this label is added to both the numerical DataFrame and the original DataFrame in the KMeans_Cluster column. As with the hierarchical clustering, a statistical summary of the k-means clusters is computed and saved in a CSV file, and a file listing the observations annotated with their respective clusters is also exported. The average center of the clusters obtained with k-means is given by Equation (3).

\begin{aligned}
\text{Cluster Center} = {} & -1.008 \cdot \text{Start Year} + 0.102 \cdot \text{Departure Age} \\
& + 0.353 \cdot \text{Arrival Age} + 0.361 \cdot \text{Travel duration in days} \\
& + 0.172 \cdot \text{Number of steps} + 0.239 \cdot \text{Number of images} \\
& - 0.027 \cdot \text{Steps with null coordinates in \%} \\
& + 0.578 \cdot \text{Total distance traveled (km)} \\
& + 0.146 \cdot \text{Transport\_Train} - 0.041 \cdot \text{Transport\_Equid} \\
& + 0.133 \cdot \text{Country\_Algeria} \\
& - 0.082 \cdot \text{Transport\_Slow ground vehicle} \\
& + 0.502 \cdot \text{Transport\_Boat} + 0.497 \cdot \text{Difficulty\_Climate} \\
& - 0.539 \cdot \text{Difficulty\_Transport} + 0.825 \cdot \text{Difficulty\_Nature} \\
& + 0.145 \cdot \text{Country\_Tunisia} - 0.052 \cdot \text{Country\_Morocco} \\
& - 0.061 \cdot \text{Country\_Libya} - 0.342 \cdot \text{Transport\_Camelid} \\
& - 0.034 \cdot \text{Country\_United Kingdom} \\
& - 0.259 \cdot \text{Difficulty\_Humans} - 0.126 \cdot \text{Country\_Spain} \\
& + 0.969 \cdot \text{Transport\_Walking} + 0.783 \cdot \text{Difficulty\_Fatigue/Illness} \\
& + 0.532 \cdot \text{Difficulty\_Thirst/Hunger} + 0.928 \cdot \text{Country\_Guinea} \\
& - 0.267 \cdot \text{Country\_Mali} - 0.131 \cdot \text{Country\_Ivory Coast} \\
& + 0.633 \cdot \text{Country\_Senegal} + 0.790 \cdot \text{Country\_Kenya} \\
& + 0.293 \cdot \text{Country\_Tanzania} - 0.110 \cdot \text{Country\_Mozambique} \\
& - 0.110 \cdot \text{Country\_Democratic Republic of the Congo}
\end{aligned}

(3)
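A sketch of the k-means step, under stated assumptions: scikit-learn's `KMeans` with k = 2 (the number of k-means clusters reported in Section 3.3.2), ten random restarts, and a fixed seed. The function name `kmeans_clusters` is hypothetical. The paper does not specify how the average center in Equation (3) is computed; here it is assumed to be the unweighted mean of the k centroids in standardized units, which is only one plausible reading.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def kmeans_clusters(df: pd.DataFrame, k: int = 2, seed: int = 0):
    """Hypothetical helper: k-means on the standardized numeric columns of df.

    Returns a copy of df with a KMeans_Cluster column, and the unweighted
    mean of the k centroids (standardized units) indexed by variable name.
    """
    numeric = df.select_dtypes(include="number")
    numeric = numeric.fillna(numeric.mean())
    scaled = StandardScaler().fit_transform(numeric)
    # n_init=10 runs k-means from 10 initializations and keeps the run
    # with the lowest within-cluster sum of squares.
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(scaled)
    annotated = df.copy()
    annotated["KMeans_Cluster"] = km.labels_
    # Assumed reading of the "average center": mean of the centroids.
    avg_center = pd.Series(km.cluster_centers_.mean(axis=0), index=numeric.columns)
    return annotated, avg_center
```

The annotated observations could then be exported with `annotated.to_csv(...)`. Because standardization centers every variable, the mean of the centroids is close to zero whenever the clusters have similar sizes; large coefficients therefore reflect unbalanced clusters.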

3. Results

The generated JSON file [78] serves as the empirical basis for this study (cf. Listing 1). In this section, we will conduct a detailed examination of the results derived from our analytical methodology, with a particular focus on metrics characterizing explorers’ movements. Our analysis will center on the cartographic representation of travel routes and the geographical distribution of visited destinations. Next, we will analyze parameters specific to the explorers themselves. Finally, we will deepen our interpretation of the data through advanced statistical methods. Principal component analysis and correspondence analysis will be employed to identify latent structures within the studied corpus. These analyses will be complemented by a clustering approach aimed at segmenting journeys based on similarity criteria.

The DataFrame comprises 85 variables, providing a compact yet detailed structure for analysis, and the journeys it describes contain 771 steps in total. It also contains 1732 missing cells, accounting for 59.9% of the data. The total memory footprint of the dataset is approximately 22.6 KiB, with an average record size of 680.0 bytes, making it a lightweight dataset overall. The variables are diverse: 17 numeric, 54 categorical, 13 text, and one unsupported variable, reflecting a mix of quantitative and qualitative data [74].
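The overview figures above (number of variables, missing cells, memory footprint) can be approximated with plain pandas. This is a hypothetical helper, not the profiling tool cited as [74], whose type inference (for example, the "unsupported" category) it does not replicate.

```python
import pandas as pd


def profile_summary(df: pd.DataFrame) -> dict:
    """Hypothetical helper: overview figures of the kind reported above."""
    n_rows, n_cols = df.shape
    missing = int(df.isna().sum().sum())
    return {
        "n_records": n_rows,
        "n_variables": n_cols,
        "missing_cells": missing,
        "missing_pct": round(100 * missing / (n_rows * n_cols), 1),
        # deep=True measures the actual size of string (object) values.
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }
```

As a consistency check on the reported figures, 1732 missing cells representing 59.9% of 85 variables imply 1732 / (0.599 × 85) ≈ 34 records, which matches the 2 + 24 + 6 + 1 + 1 = 34 works distributed across the hierarchical clusters in Section 3.3.2, and 34 × 680 bytes ≈ 22.6 KiB.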

Listing 1. JSON schema for documenting exploration narratives

3.1. Travel Metrics

The journeys are mapped in Figure 2, and the web page hosting the map is accessible online [79]. Of the 771 steps, 437 were geolocated. Among these, 166 were geolocated using Geopy (38%) and 28 using OpenCage (6%), so the remaining 243 steps (56%) were positioned manually. Of these, 70 had not been found by either Geopy or OpenCage (16% of the geolocated steps), while 173 were repositioned because of a lack of precision (40% of the geolocated steps).
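These proportions are consistent with a cascade in which each place name is sent to one geocoding service and, if nothing is returned, to the next, with unresolved steps positioned by hand. The sketch assumes that ordering (Geopy first, then OpenCage), which the text implies but does not state. Geocoders are passed in as plain functions so the logic runs offline; `geocode_with_fallback` and `source_shares` are hypothetical names. The sketch does not model the 173 steps that were repositioned for lack of precision, since that check was manual.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Coord = Tuple[float, float]
Geocoder = Callable[[str], Optional[Coord]]

# A real geocoder could be wrapped like this with geopy (network access needed;
# the Nominatim service and user_agent string are assumptions):
#   from geopy.geocoders import Nominatim
#   nominatim = Nominatim(user_agent="explorer-routes-sketch")
#   def via_nominatim(place):
#       loc = nominatim.geocode(place)
#       return (loc.latitude, loc.longitude) if loc else None
# geopy.geocoders.OpenCage(api_key=...) can be wrapped the same way.


def geocode_with_fallback(place: str, geocoders: Sequence[Tuple[str, Geocoder]]):
    """Try each named geocoder in order; return (source, coordinates).

    Returns ("manual", None) when no service finds the place, flagging
    the step for manual positioning.
    """
    for name, geocode in geocoders:
        coords = geocode(place)
        if coords is not None:
            return name, coords
    return "manual", None


def source_shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Share of geolocated steps per source, in whole percent."""
    total = sum(counts.values())
    return {source: round(100 * n / total) for source, n in counts.items()}
```

For example, `source_shares({"Geopy": 166, "OpenCage": 28, "manual": 243})` returns `{"Geopy": 38, "OpenCage": 6, "manual": 56}`, matching the percentages above.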

Figure 3 shows the distribution of visited countries, highlighting significant geographical diversity. Algeria and Morocco appear as frequent destinations, followed by countries such as Mali, Libya, and Niger, reflecting a strong concentration of visits in North and West Africa. Other African countries, including Guinea, Mozambique, Tanzania, and South Africa, are also well represented, emphasizing the continent’s importance in this distribution. Outside Africa, countries such as Spain, France, the United Kingdom, and Portugal show a notable presence, indicating departures from Europe.

Figure 4a illustrates the distribution of the transportation types used, highlighting the predominance of traditional travel modes. Boats and caravans appear as the most frequently used means of transport, followed by walking and camelids. Equids and slow land vehicles are also represented, though to a lesser extent, while trains appear to be the least used mode of transport. This distribution reflects a strong reliance on transport methods suited to diverse environments, often with limited infrastructure.

Figure 4b presents the distribution of the difficulties encountered, shedding light on the challenges travelers faced. Transportation-related issues are the most frequent difficulty, closely followed by climatic conditions and natural obstacles. Difficulties arising from human interactions are less common but still present. Fatigue and illness, as well as thirst and hunger, are less represented but remain significant challenges. This distribution suggests that travelers had to overcome both logistical and environmental obstacles, with transport conditions and climatic hazards weighing most heavily.
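Assuming, as the column names listed in Section 2.3.3 suggest, that transport modes, difficulties, and countries are stored as indicator columns with the prefixes `Transport_`, `Difficulty_`, and `Country_`, the frequencies behind Figures 3 and 4 could be tallied by summing those columns. Whether the figures count journeys or steps is not stated; this hypothetical helper simply sums whatever values the columns hold.

```python
import pandas as pd


def prefix_counts(df: pd.DataFrame, prefix: str) -> pd.Series:
    """Hypothetical helper: sum indicator columns sharing a prefix."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    counts = df[cols].sum()
    # Strip the prefix so the index holds category names only.
    counts.index = [c[len(prefix):] for c in cols]
    # Most frequent category first, as in a sorted bar chart.
    return counts.sort_values(ascending=False)
```

For instance, `prefix_counts(df, "Difficulty_")` would give the bar heights of a chart like Figure 4b under these assumptions.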

3.2. Author Metrics

There are 25 distinct authors. Descriptive statistics are presented in Table 2. The variables include the number of steps, the number of images, the total distance traveled, the starting year, the departure age, and the trip duration in days. The data show substantial variability, with high standard deviations for some variables. The distributions appear right-skewed, with medians often lower than means, suggesting the presence of outliers or widely spread values. The authors associated with the minimum and maximum values differ from one variable to another, reflecting the diversity of the sources and their contexts. The cardinality of the variables indicates a limited number of unique values, which could influence the interpretation of the results.
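Table 2 pairs each statistic with the authors holding the extreme values. A sketch with pandas, assuming a hypothetical `Author` column; the mean-versus-median comparison is only a quick heuristic for right skew, not a formal skewness measure.

```python
import pandas as pd


def describe_with_authors(df: pd.DataFrame, columns, author_col: str = "Author") -> pd.DataFrame:
    """Hypothetical helper: summary statistics plus the authors at the extremes."""
    rows = {}
    for col in columns:
        s = df[col]
        rows[col] = {
            "mean": s.mean(),
            "median": s.median(),
            "std": s.std(),  # sample standard deviation (ddof=1)
            "min": s.min(),
            "max": s.max(),
            # idxmin/idxmax return the row label of the extreme value (NaN skipped).
            "author_min": df.loc[s.idxmin(), author_col],
            "author_max": df.loc[s.idxmax(), author_col],
            # A mean above the median is a rough hint of right skew.
            "right_skew_hint": s.mean() > s.median(),
        }
    # One row per variable, one column per statistic.
    return pd.DataFrame(rows).T
```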

Figure 5a presents a histogram of the number of steps with a superimposed density curve: values concentrate between 10 and 20 steps, and the distribution is right-skewed, with frequencies decreasing for higher step counts. Figure 5b illustrates the distribution of total distances traveled, with a peak near zero and a right skew, indicating that long distances are rare. Figure 5c displays the distribution of the number of images, concentrated at low values with decreasing frequency for higher counts, showing that narratives with few images are more common. Figure 5d represents the distribution of starting years, ranging from 1500 to 1900; frequency varies across historical periods, with some periods showing a higher concentration of journeys. Figure 5e shows a histogram of departure ages, with a frequency peak around 28 to 30 years and a gradual decline at higher ages, i.e., an asymmetric distribution concentrated around the age of 30. Finally, Figure 5f depicts the distribution of travel durations in days: most journeys last less than 200 days, and frequency declines rapidly for longer trips.

Figure 6 presents a correlation matrix between the numerical variables, revealing diverse relationships among them. The departure age shows a moderate positive correlation with the trip duration in days, suggesting that older travelers tend to undertake longer journeys. The departure year is negatively correlated with the total distance traveled and the trip duration, indicating that more recent trips tend to be shorter and cover less distance. The number of steps is positively correlated with the total distance and trip duration, implying that trips with more steps are generally longer and cover a greater distance. The number of images shows no strong correlation with the other variables, suggesting that it is relatively independent. Overall, the clearest trends in the matrix concern the age of the travelers and the duration and distance of their trips.
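The text does not state which correlation coefficient Figure 6 uses. The sketch assumes Pearson (the pandas default) and lists the variable pairs with the largest absolute correlation, a convenient way to read a large matrix; `strongest_correlations` is a hypothetical helper.

```python
import pandas as pd


def strongest_correlations(df: pd.DataFrame, top: int = 5) -> pd.Series:
    """Hypothetical helper: the `top` variable pairs with the largest |r|."""
    # Pearson correlation matrix of the numeric columns.
    corr = df.select_dtypes(include="number").corr(method="pearson")
    cols = list(corr.columns)
    pairs = {}
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # upper triangle: each pair once, no diagonal
            pairs[(a, b)] = corr.loc[a, b]
    s = pd.Series(pairs, dtype=float)
    # Rank by absolute strength while keeping the sign in the values.
    order = s.abs().sort_values(ascending=False).index
    return s.loc[order].head(top)
```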

The pie charts in Figure 7 show the distribution of nationalities, languages, and professions. Panel (a) shows a predominance of French (23.5%), Scottish (20.6%), and British (17.6%) nationals, while other nationalities, such as Americans, Finns, Germans, and Austrians, appear in smaller proportions, ranging from 5.9% to 8.8%. Certain categories, such as the Portuguese, Franco-Americans, and unspecified individuals, do not exceed 2.9%, emphasizing a relatively concentrated distribution around a few major groups. Panel (b) illustrates the distribution of languages, with English clearly dominating at 67.6%, followed by French at 20.6%. Other languages, such as Swedish, Dutch, German, and Finnish, each represent about 2.9% of the total, reflecting a strong linguistic concentration on English and French. Finally, panel (c) highlights the strong representation of explorers and writers. Geographers, travelers, and military personnel also stand out, although in more limited proportions. A wide variety of other professions is also recorded, including academics, missionaries, photographers, and botanists, as well as more specific categories such as Africanists, aristocrats, and engineers. This distribution reflects a strong concentration in fields related to exploration and writing, alongside a notable diversity of professional trajectories.
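Each pie slice is a category's share of the records. A minimal pandas sketch; counting missing values as their own category (`dropna=False`) is an assumption about how the "unspecified" slice was produced, and `category_shares` is a hypothetical name.

```python
import pandas as pd


def category_shares(values: pd.Series) -> pd.Series:
    """Hypothetical helper: percentage share of each category, one decimal."""
    # dropna=False keeps missing entries as a separate slice.
    return (values.value_counts(normalize=True, dropna=False) * 100).round(1)
```

For example, a category covering 8 of 34 records yields 23.5%, and one covering a single record yields 2.9%.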

3.3. Principal Component Analysis

Principal component analysis yields 34 components, the first of which capture a substantial share of the information. The first component (PC1) explains approximately 13.7% of the total variance, and the first five components together account for nearly 48.9%. PC1 is primarily influenced by variables such as the total distance traveled and certain specific destinations, including Saint Helena, Somalia, Cape Verde, and India. For PC2, which explains about 10.6% of the variance, the largest contributions come from the difficulties encountered: human interactions, climatic conditions, thirst or hunger, natural obstacles, and fatigue or illness. Figure 8 (left) shows the variance explained by each principal component: the first five components contribute most of the explained variance, so the cumulative variance rises quickly. Beyond the fifth component, each additional component adds a diminishing contribution, although the cumulative variance continues to increase until it reaches 100%. The variable plot in Figure 8 (right) indicates that PC1 is negatively influenced by the departure year and only weakly by other variables. Other components reveal more complex influences, with marked positive or negative associations with certain variables. The most influential variables include the age at departure and at arrival, which contribute significantly to different dimensions of the variance. Several geographic variables, such as Tanzania, Senegal, the United Kingdom, Guinea, and Ivory Coast, also play a key role in structuring the data. Furthermore, the weight of certain modes of transport, such as camelids, and of the proportion of steps with null coordinates points to a differentiation of routes according to travel characteristics.
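A sketch of the variance decomposition with scikit-learn, assuming the same standardized numeric columns as in the clustering step (missing values imputed with column means); `run_pca` is a hypothetical name. The loadings matrix corresponds to what a variable plot such as Figure 8 (right) displays. scikit-learn's `PCA` keeps at most min(n_samples, n_features) components by default.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def run_pca(numeric: pd.DataFrame):
    """Hypothetical helper: PCA of standardized numeric variables."""
    X = StandardScaler().fit_transform(numeric.fillna(numeric.mean()))
    pca = PCA().fit(X)  # keeps min(n_samples, n_features) components
    names = [f"PC{i + 1}" for i in range(pca.n_components_)]
    # Share of total variance explained by each component, and running total.
    ratios = pd.Series(pca.explained_variance_ratio_, index=names)
    cumulative = ratios.cumsum()
    # Loadings: weight of each original variable in each component.
    loadings = pd.DataFrame(pca.components_.T, index=numeric.columns, columns=names)
    return ratios, cumulative, loadings
```

`cumulative["PC5"]` would then give the share reported for the first five components.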

3.3.1. Correspondence Factor Analysis

The correspondence analysis (CA) highlights the associations between authors and various variables related to travel routes and the challenges encountered (cf. Figure 9). The first dimension mainly contrasts authors associated with caravan travel and Mauritania, at the positive end, with those linked to France, Algeria, and modern transportation such as trains, at the negative end. This suggests a distinction between explorers who traveled by traditional means and those who used more modern transportation. The second dimension separates authors according to the difficulties and countries encountered. Authors associated with journeys marked by fatigue, illness, and walking are positioned higher, while those linked to human and climatic obstacles are positioned lower. Countries such as Tanzania, Mozambique, and South Africa are strongly associated with this dimension, suggesting a contrast between journeys marked by physical endurance and those confronted with external challenges. Specific associations also emerge. For example, Ernest Psichari and the Association for Promoting the Discovery of the Interior Parts of Africa are positioned at the positive end of the first dimension, indicating an association with caravan travel and expeditions in Mauritania or Western Sahara. In contrast, Matilda Betham-Edwards and Hugues Le Roux, positioned toward the negative end, appear linked to more European or modern travel contexts. Overall, the CA reveals a structure among explorers based on their modes of travel, the regions traversed, and the challenges encountered, highlighting contrasts between traditional and modern journeys, as well as between different types of obstacles.
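The implementation used for the CA is not described in this section. The sketch below computes a classical CA from scratch with NumPy, via the singular value decomposition of the standardized residuals S = D_r^(-1/2) (P - r c^T) D_c^(-1/2), where P is the contingency table divided by its total and r and c are the row and column masses. The table (for instance, authors by transport modes, difficulties, and countries) is assumed to have been built beforehand, with a non-zero total in every row and column. The name `correspondence_analysis` is hypothetical.

```python
import numpy as np


def correspondence_analysis(N, n_dims: int = 2):
    """Hypothetical helper: classical CA of a contingency table via SVD.

    Returns row principal coordinates, column principal coordinates,
    and the share of total inertia carried by each returned dimension.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # table expected under independence
    # Standardized residuals: departure from independence, scaled by masses.
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                    # principal inertias, largest first
    rows = (U * sv) / np.sqrt(r)[:, None]
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]
    share = inertia / inertia.sum()
    return rows[:, :n_dims], cols[:, :n_dims], share[:n_dims]
```

The total inertia equals the table's chi-squared statistic divided by its grand total, so the dimension shares play the same role as explained variance in PCA.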

3.3.2. Clustering

In order to determine the most appropriate number of classes from the dendrogram (see Figure 10), we conducted a thorough analysis of the hierarchical tree structure. This process led us to select a cutting height that would allow for the clearest separation of the different branches. To do this, we observed the points where the tree displays distinct divisions, revealing groupings within the data.

A cut at a height of around 12 to 13 on the y-axis proved to be an appropriate choice. At this height, the main branches of the dendrogram are divided into clearly distinct groups. Applying a cut in this range yields between three and five classes, depending on the exact threshold chosen.

For comparison, a cut placed at a higher level, close to 15, tends to merge more subgroups, reducing the total number of classes to just two. Conversely, a lower cut, around 10, leads to a finer segmentation of the data, resulting in the formation of six to seven classes.
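The sensitivity of the class count to the cut height can be checked by cutting the same Ward tree at several thresholds. A minimal SciPy sketch with a hypothetical helper name; it would be applied to the standardized data matrix used for the dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def clusters_per_threshold(X, thresholds):
    """Hypothetical helper: number of clusters for each Ward cut height."""
    Z = linkage(X, method="ward")
    # criterion="distance": observations whose cophenetic distance is at most t
    # end up in the same flat cluster.
    return {t: len(np.unique(fcluster(Z, t=t, criterion="distance")))
            for t in thresholds}
```

Calling `clusters_per_threshold(X, [10, 12.5, 15])` on the standardized matrix would tabulate the comparison made in the paragraphs above.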

The organization of the data exported in the clusters.csv file is based on a structure with three main columns, labeled Title, Hierarchical_Cluster, and KMeans_Cluster. Each row in this file corresponds to a specific work, which is assigned two classifications from distinct cluster analysis methods. The first classification comes from a hierarchical clustering algorithm, while the second relies on segmentation achieved through the KMeans method.

The comparative analysis of the results provided by these two approaches shows a certain degree of consistency in how the books are grouped, although notable divergences remain. Specifically, applying hierarchical clustering with a cutoff threshold of 12.5 partitions the works into five distinct clusters. The first group (Cluster 1) contains only two works, namely Le journal du premier voyage de Vasco da Gama and Le récit de l’expédition de Livingstone au Zambèze, suggesting that these journeys have similar profiles on the variables used. The second group (Cluster 2) is quantitatively dominant, encompassing the majority of the works in the sample. The third group (Cluster 3) includes works such as The Last Journals of David Livingstone, in Central Africa, from 1865 to His Death, Volume II (of 2), 1869–1873, Through Timbuctu and across the great Sahara, Exploration de l’Aïr, Les voix qui crient dans le désert, Timbouctou, voyage au Maroc au Sahara et au Soudan, Tome 2 (de 2), and Travels in the Great Desert of Sahara, in the Years of 1845 and 1846. In contrast, the fourth and fifth groups (Clusters 4 and 5) are highly isolated, each containing only one work: Through Spain to the Sahara alone forms Cluster 4, while Travels through Central Africa to Timbuctoo (Volume 1) constitutes Cluster 5 by itself.

In contrast, the KMeans algorithm proposes a simpler structure, segmenting all the books into only two clusters. The first category (Cluster 1) groups the vast majority of the works, suggesting that their journey profiles are broadly homogeneous. Cluster 2 is much smaller, containing only two works: Travels through Central Africa to Timbuctoo (Volume 1) and Through Spain to the Sahara.

The differences observed between these two classification methods reflect the specific criteria used for grouping. For instance, Through Spain to the Sahara is isolated in Cluster 4 according to hierarchical clustering, but it is associated with another work in Cluster 2 of KMeans. Similarly, Travels through Central Africa to Timbuctoo (Volume 1), which appears as an isolated item in the hierarchical classification (Cluster 5), is grouped with another text in the KMeans algorithm (Cluster 2). These divergences illustrate the sensitivity of clustering methods to the specific characteristics of the data and the proximity criteria defined.
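One way to quantify the agreement discussed above is to cross-tabulate the Hierarchical_Cluster and KMeans_Cluster columns and compute the adjusted Rand index (ARI). The ARI is not used in the paper; it is swapped in here as a standard agreement score that ignores how clusters are numbered. `compare_partitions` is a hypothetical name.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score


def compare_partitions(hier_labels, kmeans_labels):
    """Hypothetical helper: cross-tabulate two clusterings and score agreement."""
    table = pd.crosstab(pd.Series(hier_labels, name="Hierarchical_Cluster"),
                        pd.Series(kmeans_labels, name="KMeans_Cluster"))
    # ARI is 1.0 for identical partitions (up to relabeling), about 0 for chance.
    ari = adjusted_rand_score(hier_labels, kmeans_labels)
    return table, ari
```

In the cross-table, a single-work hierarchical cluster that falls into a shared k-means cluster appears as a row with one non-zero cell, which is exactly the pattern described for Clusters 4 and 5.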

In conclusion, while the results from both approaches show a certain robustness in identifying homogeneous groupings, they also reveal notable distinctions in the handling of “atypical” works. These differences highlight the complexity inherent in measuring similarity between journeys and emphasize the need for careful interpretation of the criteria underlying each classification method.

The CSV file containing the aggregated cluster data includes a set of descriptive statistics applied to the different variables characterizing the journeys. Each variable is associated with several columns providing measures of central tendency and dispersion, including the arithmetic mean, standard deviation, as well as the extreme values, namely the minimum and maximum observed. Among the variables considered are the year of departure, age at the start of the journey, age at arrival, total duration of the journey in days, number of steps taken, number of associated images, percentage of steps with missing geographic coordinates, total distance traveled in kilometers, modes of transport used, territories crossed, and types of difficulties encountered during the journey.
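The aggregated file can be produced with a pandas `groupby` followed by `agg`; flattening the two-level column index gives one column per variable and statistic. The `<variable>_<statistic>` naming scheme is an assumption, as is the helper name `cluster_summary`.

```python
import pandas as pd


def cluster_summary(df: pd.DataFrame, cluster_col: str) -> pd.DataFrame:
    """Hypothetical helper: mean, std, min and max of each numeric variable per cluster."""
    numeric_cols = [c for c in df.select_dtypes(include="number").columns
                    if c != cluster_col]
    summary = df.groupby(cluster_col)[numeric_cols].agg(["mean", "std", "min", "max"])
    # Flatten ("Start Year", "mean") into "Start Year_mean" for a flat CSV header.
    summary.columns = [f"{var}_{stat}" for var, stat in summary.columns]
    return summary
```

The result could be written with `cluster_summary(df, "Hierarchical_Cluster").to_csv(...)`. With pandas' default sample standard deviation (ddof=1), a single-member cluster gets NaN in every `_std` column.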

Regarding the first cluster, consisting of two elements, the average year of departure is 1677.5, with high inter-individual variability. The average age at departure is 33.5 years, while the duration of the journeys within this group averages 2840.5 days. The average total distance traveled is 21,075.845 km. The most frequently used modes of transport are primarily trains and equines. As for destinations, France and Algeria appear as the most visited countries. This cluster is characterized by major difficulties related to climatic conditions, modes of transport, and natural obstacles encountered along the routes.

The second cluster, comprising 24 elements, stands out with an average departure year of 1874.5, showing less variability than the first cluster. The average age at departure is 21.75 years, and the average journey duration is 261.58 days. The average total distance traveled is 5510.118 km. The transport modes show some diversity, including trains, equines, and various slow land vehicles. The primary countries visited remain similar to those of the first cluster, with France and Algeria among the most frequent destinations. The difficulties faced in this group are similar to those in the first cluster, suggesting recurring constraints related to climatic conditions and transport infrastructure.

The third cluster, consisting of six elements, is characterized by an average departure year of 1889.33, with moderate variability. The average age at the start of the journey is 35.33 years, while the average journey duration is 353 days. The total distance traveled by individuals in this cluster reaches an average of 2857.415 km. The modes of transport used and the destinations visited show a great similarity to those of the other groups, while the main difficulties identified are climatic, logistical, and environmental in nature.

The fourth cluster, which contains only one element, presents a unique departure year of 1868. The departure age and journey duration are not provided, but the distance traveled is quantified at 2766.24 km. The modes of transport used in this isolated case are comparable to those identified in the other groups.

The fifth and final cluster, also composed of a single element, is characterized by a departure year of 1824. The individual started their journey at the age of 24, and the total journey duration was 1316 days. The distance traveled is estimated at 2714.89 km. As with the other groups, the transport modes used and the territories crossed remain similar. The difficulties reported in this cluster are mainly attributed to climatic conditions and constraints related to transport infrastructure.

In conclusion, the comparative analysis of the different clusters highlights several common trends. Trains and equines are among the most frequently used modes of transport, regardless of the group studied. France and Algeria emerge as recurring destinations among travelers across all clusters. Furthermore, the difficulties encountered are largely homogeneous across the groups, with a predominance of climatic and logistical obstacles, suggesting a major influence of environmental conditions and infrastructure on the travel experiences of the individuals analyzed.

4. Discussion

The analysis of historical explorations, enhanced by artificial intelligence (AI) and geospatial technologies, opens new perspectives for studying routes, modes of transportation, and the challenges faced by explorers. This section examines the methodological and empirical contributions of our approach, highlighting its advantages, limitations, and future research directions.

4.1. Methodological Contributions

Our approach stands out by integrating AI and geospatial tools into the analysis of exploration narratives. Whereas traditional historiographical studies rely on qualitative analysis and manual classification, both subject to subjective bias, our computational method offers a systematic and quantitative analysis. It enables the identification of complex patterns related to routes, modes of transportation, and challenges encountered. Natural language processing (NLP) automatically extracts relevant data, while geospatial technologies improve the accuracy of route reconstructions. This approach provides a more objective and detailed understanding of historical exploratory dynamics.

4.2. Corpus Structuring

The application of principal component analysis and classification techniques allowed for an innovative structuring of the corpus. While previous groupings were based on subjective criteria, PCA objectively identifies the main factors influencing the narratives, such as distance traveled, frequent destinations, and obstacles encountered. This automatic segmentation highlights new trends, notably the distinction between explorers using traditional means (caravans, river navigation) and those adopting technological innovations (mechanical means). These results enable a more refined and contextualized historiographical analysis.

4.3. Correlation Analysis

Our study highlights correlations between explorers’ characteristics and the parameters of their expeditions. We found a moderate positive correlation between explorers’ age at departure and the duration of their journeys, which may reflect the influence of experience and physical capabilities on route and transportation choices. Older explorers may have planned their expeditions more carefully and benefited from a more robust logistical network. Research suggests that experience gained with age enhances resource management and logistical planning; for example, older leaders are often chosen for stable missions rather than risky explorations, illustrating their ability to manage complex operations [80]. These explorers generally had access to more resources and stronger organizational support, facilitating longer expeditions. Separately, the negative correlation between departure year and both trip duration and distance aligns with theories on technological evolution, which enabled faster and more efficient travel and reduced logistical constraints [81]. The analysis of transitions between traditional and modern transportation methods shows the impact of infrastructural and technological progress on exploratory practices and enriches our understanding of the logistical advances that allowed for more ambitious expeditions. For example, improved infrastructure in the 19th century may explain why this trend is more pronounced in some periods, and technologies such as more efficient ships and advanced navigation tools enabled longer and safer voyages.

4.4. Influence of External Factors

Our approach integrates climatic, logistical, and sociopolitical factors beyond geographical constraints alone. It reveals that expeditions were also influenced by parameters such as extreme climatic conditions, supply constraints, and diplomatic relations. This holistic approach allows for a comprehensive understanding of expeditions, contributing to a more complete picture of exploratory challenges.

4.5. Empirical Validation and Historiographical Enrichment

One of our methodology’s objectives is to empirically validate certain historiographical hypotheses. Our results quantitatively confirm the predominance of certain regions, such as Algeria and Morocco, in exploration narratives. Beyond geopolitical considerations, our analysis demonstrates the importance of these areas in cultural and commercial exchanges. This empirical validation represents a methodological advancement by going beyond traditional descriptive approaches. It allows for the integration of quantitative data to enhance the rigor and reproducibility of historiographical analyses.

4.6. Methodological Limitations and Biases

Our study has some limitations. Although the corpus is representative, its size remains limited. Expanding the corpus by integrating diverse sources (regional digital libraries, historical archives) would strengthen the robustness of our conclusions. Moreover, the models employed involve simplifications, particularly in categorizing transportation methods and obstacles, which may affect the precision of the analyses. The exclusive use of Project Gutenberg introduces bias, as the availability of digitized texts varies by region and with copyright restrictions. To mitigate this bias, it would be relevant to expand documentary sources and apply sampling techniques to balance regional and temporal representation. Finally, the AI tools used have inherent biases, such as the prevalence of certain text types and inaccuracies in geospatial data. Rigorous documentation of the models and tools used is essential to ensure transparency and reproducibility in analyses.

4.7. Future Perspectives

Several research avenues emerge from this study. Expanding the corpus and improving semantic analysis algorithms would refine the current results. Predictive models could help anticipate explorers’ routes and potential challenges, enriching the study of exploratory practices. Modeling the interactions between human and environmental factors would be another promising avenue to better understand route choices. From a development perspective, a priority would be the creation of an interactive web interface for GeoJSON data entry and correction. This tool would offer researchers an efficient way to identify and complete missing information while improving the quality and granularity of geospatial data. Moreover, active collaboration with an international research community would allow for the extension of analyses to other historical periods and regions worldwide. Finally, to enhance the dissemination of results, designing a web-based cartographic application, inspired by projects like Bootleaf [82], could facilitate access to data in an interactive and educational format. Such a portal would offer historians and the general public dynamic visualizations of exploratory routes, thereby enhancing the scientific and cultural impact of the project.

5. Conclusions

This study has demonstrated the value of combining artificial intelligence and geospatial tools to systematically analyze historical explorers’ narratives. By integrating these technologies, the adopted methodology revealed recurring patterns and trends, particularly regarding the structuring of routes taken, the preferred modes of transportation by explorers, and the main challenges encountered during their journeys.

The goal of automatically extracting and structuring information from the narratives was achieved through the use of Python scripts and NotebookLM, which effectively extracted factual data such as visited locations, travel dates, and means of transportation. These data were organized in JSON format, facilitating further analysis.

To accurately locate the places mentioned in the narratives and validate geographic coordinates, the Geopy and OpenCage libraries were used for geocoding. An interactive web interface based on Leaflet enabled the visualization and validation of explorers’ routes. Errors were manually corrected, improving the accuracy of geographic data.

The categorization of transportation modes and encountered difficulties was successfully carried out using ChatGPT (GPT-4) and Python scripts. Geographic data were also enriched with information about current countries corresponding to each point, using the OpenCage API.

Statistical analysis of the data was made possible by Python functions developed to organize and explore the data, allowing for the calculation of statistics on transportation modes, challenges faced, and countries traversed. Techniques such as principal component analysis, correspondence analysis, and clustering analysis revealed latent structures and segments based on similarity criteria.

In conclusion, this research aligns with historiographical work that recognizes exploration narratives as significant historical sources. It stands out by introducing a systematic and computational approach, enriching the understanding of these narratives while bringing a new dimension through modern techniques and automated data processing. By applying this methodology, the research contributes to broadening our view of exploration journeys and opens new avenues for future scientific investigations.

To further this approach, several avenues could be explored. Expanding the corpus of analyzed texts would strengthen the robustness of the results and refine the conclusions. Integrating more advanced semantic analysis techniques could improve the extraction of subtle and nuanced information, enriching the understanding of the historical context and explorers’ motivations. The development of predictive models based on machine learning methods could make it possible to anticipate the routes and challenges of explorers who left no written account, based on the specific characteristics of their journeys. Finally, a deeper exploration of the links between exploration narratives and the sociopolitical dynamics of the time could help situate these journeys within a global framework, considering the geopolitical, economic, and cultural issues that marked these periods of exploration.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The JSON file is available online [78].

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Yves, B. Explorations en Afrique centrale, 1790–1930: Apport des explorateurs à la connaissance du milieu. J. Afr. 2024, 93, 413–414.
2. Holtz, G.; Masse, V. Étudier les récits de voyage: Bilan, questionnements, enjeux. Arborescences 2012.
3. Lefebvre, C. Chapitre I. Dans les pas des explorateurs. In Frontières de Sable, Frontières de Papier; Éditions de la Sorbonne: Paris, France, 2015.
4. Gauthier, J. La Société de géographie commerciale de Bordeaux: Sur les traces des explorateurs entre 1874 et 1911. Dyn. Environ. J. Int. Géosci. L’Environ. 2017, 39–40, 36–53.
5. Gallou-Guyot, M.; Rousseau, C.; Perrochon, A. Les limites des revues systématiques de la littérature–quand le trop d’information devient délétère. Kinésithér. Rev. 2024, 24, 60–65.
6. Roetzel, P.G. Information overload in the information age: A review of the literature from business administration, business psychology, and related disciplines with a bibliometric approach and framework development. Bus. Res. 2019, 12, 479–522.
7. Shahrzadi, L.; Mansouri, A.; Alavi, M.; Shabani, A. Causes, consequences, and strategies to deal with information overload: A scoping review. Int. J. Inf. Manag. Data Insights 2024, 4, 100261.
8. Seutter, J.; Kutzner, K.; Stadtländer, M.; Kundisch, D.; Knackstedt, R. “Sorry, too much information”: Designing online review systems that support information search and processing. Electron. Mark. 2023, 33, 47.
9. Rörden, J.; Gruber, D.; Krickl, M.; Haslhofer, B. Identifying historical travelogues in large text corpora using machine learning. In Sustainable Digital Communities, Proceedings of the 15th International Conference, iConference 2020, Borås, Sweden, 23–26 March 2020; Springer: Cham, Switzerland, 2020; pp. 801–815.
10. Chachereau, N.; Humair, C. Méthodes informatiques et quantitatives en histoire du tourisme: Apports et limites. Introduction. Mondes Tour. 2023.
11. Villamor Martin, M.; Kirsch, D.A.; Prieto-Nañez, F. The promise of machine-learning-driven text analysis techniques for historical research: Topic modeling and word embedding. Manag. Organ. Hist. 2023, 18, 81–96.
12. Winterbottom, A.; Bekker, H.L.; Conner, M.; Mooney, A. Does narrative information bias individual’s decision making? A systematic review. Soc. Sci. Med. 2008, 67, 2079–2088.
13. Betsch, C.; Haase, N.; Renkewitz, F.; Schmid, P. The narrative bias revisited: What drives the biasing influence of narrative information on risk perceptions? Judgm. Decis. Mak. 2015, 10, 241–264.
14. Partlan, N.; Carstensdottir, E.; Snodgrass, S.; Kleinman, E.; Smith, G.; Harteveld, C.; El-Nasr, M.S. Exploratory automated analysis of structural features of interactive narrative. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, New Orleans, LA, USA, 2–7 February 2018; Volume 14, pp. 88–94.
15. Jones, S.; Fox, C.; Gillam, S.; Gillam, R.B. An exploration of automated narrative analysis via machine learning. PLoS ONE 2019, 14, e0224634.
16. Sudhahar, S.; Cristianini, N. Automated analysis of narrative content for digital humanities. Int. J. Adv. Comput. Sci. 2013, 3, 440–447.
17. Chen, Q.; Cao, S.; Wang, J.; Cao, N. How does automation shape the process of narrative visualization: A survey of tools. IEEE Trans. Vis. Comput. Graph. 2023, 30, 4429–4448.
18. Álvarez-Carmona, M.Á.; Aranda, R.; Rodríguez-Gonzalez, A.Y.; Fajardo-Delgado, D.; Sánchez, M.G.; Pérez-Espinosa, H.; Martínez-Miranda, J.; Guerrero-Rodríguez, R.; Bustio-Martínez, L.; Díaz-Pacheco, Á. Natural language processing applied to tourism research: A systematic review and future research directions. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 10125–10144.
19. Gregoriades, A.; Pampaka, M.; Herodotou, H.; Christodoulou, E. Explaining tourist revisit intention using natural language processing and classification techniques. J. Big Data 2023, 10, 60.
20. Brunsting, S.; De Sterck, H.; Dolman, R.; van Sprundel, T. GeoTextTagger: High-precision location tagging of textual documents using a natural language processing approach. arXiv 2016, arXiv:1601.05893.
21. Hu, Y.; Mao, H.; McKenzie, G. A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements. Int. J. Geogr. Inf. Sci. 2019, 33, 714–738.
22. Lefebvre, C.; Surun, I. Exploration et transferts de savoir: Deux cartes produites par des Africains au début du 19e siècle. M@ppemonde 2008, 4, 1–24.
23. Lefebvre, C. Frontières de Sable, Frontières de Papier: Histoire de Territoires et de Frontières, du Jihad de Sokoto à la Colonisation Française du Niger, XIXe-XXe Siècles; Éditions de la Sorbonne: Paris, France, 2019.
24. Buchanan, A. Exploration of Aïr: Out of the World North of Nigeria; J. Murray: London, UK, 1921.
25. Buchanan, A. Sahara; J. Murray: London, UK, 1926.
26. Buchanan, A. Three Years of War in East Africa; J. Murray: London, UK, 1920.
27. Proceedings of the Association for Promoting the Discovery of the Interior Parts of Africa; C. Macrae, Printer to the Association: London, UK, 1790.
28. Haywood, A. Through Timbuctu and across the Great Sahara. Geogr. J. 1913, 41, 278.
29. Kilian, C. Au Hoggar, Mission de 1922: Ouvrage Orné de Trois Cartes et de Seize Planches Hors-Texte; Société d’éditions géographiques maritimes et coloniales: Paris, France, 1925.
30. Livingstone, D. A Popular Account of Dr. Livingstone’s Expedition to the Zambesi and Its Tributaries: And of the Discovery of Lakes Shirwa and Nyassa, 1858–1864: Abridged from the Larger Work; J. Murray: London, UK, 1875.
31. Livingstone, D. Missionary Travels and Researches in South Africa: In Large Print; BoD–Books on Demand: Norderstedt, Germany, 2022.
32. Livingstone, D. The Last Journals of David Livingstone: In Central Africa, from 1865 to His Death; RW Bliss: Hartford, CT, USA, 1875.
33. Waller, H. The Last Journals of David Livingstone, in Central Africa; RW Bliss: Hartford, CT, USA, 1875.
34. Psichari, E. Les Voix qui Crient dans le désert: Souvenirs d’Afrique; L. Conard: Paris, France, 1928.
35. Fromentin, E. Un été dans le Sahara; Plon: Paris, France, 1888.
36. Grogan, E.S.; Sharp, A.H. From the Cape to Cairo: The First Traverse of Africa from South to North; Hurst and Blackett: London, UK, 1900.
37. Rohlfs, G. Land und Volk in Afrika: Berichte aus den Jahren 1865–1870; H. Fischer Nachf: Hamburg, Germany, 1884.
38. von Alfthan, G.E. Afrikanska Reseminnen: Äfventyr Och Intryck Från en Utflykt Till de Svartes Världsdel; PH Beijer, distr.: Helsingfors, Finland, 1892.
39. Woodberry, G.E. North Africa and the Desert: Scenes and Moods; C. Scribner’s Sons: New York, NY, USA, 1914.
40. de Nordeck, G. Het land der Bagas en de Rio-Nuñez; DigiCat: London, UK, 2023.
41. Mattsson, G.; Lehtonen, J. Suomen mies meni Zanzibariin; Project Gutenberg: Salt Lake City, UT, USA, 2020.
42. Barth, H.; Bettany, G. Travels and Discoveries in North and Central Africa: Including Accounts of Tripoli, the Sahara, the Remarkable Kingdom of Bornu, and the Countries Around Lake Chad; Good Press: Boca Raton, FL, USA, 1890.
43. Le Roux, H. Au Sahara: Illustré d’après des Photographies de l’auteur. Gravées par Petit et Cie; Libr. Marpon & Flammarion: Paris, France, 1890.
44. Richardson, J. Narrative of a Mission to Central Africa, 1850–1851; Routledge: London, UK, 2014.
45. Richardson, J. Travels in the Great Desert of Sahara, in the Years of 1845 and 1846; DigiCat: London, UK, 2022.
46. Betham-Edwards, M. Through Spain to the Sahara; Hurst and Blackett: London, UK, 1868.
47. Lenz, O. Timbouctou, Voyage au Maroc: Au Sahara et au Soudan; Hachette: Paris, France, 1886; Volume 1.
48. Du Chaillu, P.B. In African Forest and Jungle; C. Scribner’s Sons: New York, NY, USA, 1914.
49. Caillié, R. Travels Through Central Africa to Timbuctoo: And Across the Great Desert, to Morocco, Performed in the Years 1824–1828; Routledge: London, UK, 1830.
50. Caillié, R. Voyage d’un Faux Musulman à Travers l’Afrique. Tombouctou, le Niger, Jenné et le Désert; Good Press: Boca Raton, FL, USA, 2023.
51. Davis, R.H. The Congo and Coasts of Africa; T. Fisher Unwin: London, UK, 1907.
52. Nelson, T. A Biographical Memoir of the Late Dr. Walter Oudney, Captain Hugh Clapperton, Both of the Royal Navy, and Major Alex. Gordon Laing, All of Whom Died Amid Their Active and Enterprising Endeavours to Explore the Interior of Africa; Prabhat Prakashan: New Delhi, India, 2024.
53. Ravenstein, E.G. A Journal of the First Voyage of Vasco da Gama, 1497–1499; Hakluyt Society: London, UK, 2017.
54. Harris, W. Tafilet: The Narrative of a Journey of Exploration in the Atlas Mountains and the Oases of the North-West Sahara; W. Blackwood and Sons: Edinburgh, UK, 1895.
55. Cannon, W.A. Botanical Features of the Algerian Sahara; Number 178; Carnegie Institution of Washington: Washington, DC, USA, 1913.
56. Stroube, B. Literary freedom: Project Gutenberg. XRDS Crossroads ACM Mag. Stud. 2003, 10, 3.
57. Rowberry, S. The Early Development of Project Gutenberg c. 1970–2000; Cambridge University Press: Cambridge, UK, 2023.
58. Barreau, J.B. Python Script for Extracting, Geocoding, and Structuring Geographic Data from Historical Explorers’ Records. 2025. Available online: https://github.com/jean-baptiste-barreau/jean-baptiste-barreau.github.io/blob/main/explorers/explorateurs.py (accessed on 15 March 2025).
59. Chandra, R.V.; Varanasi, B.S. Python Requests Essentials; Packt Publishing: Birmingham, UK, 2015.
60. Richardson, L. Beautiful Soup Documentation. 2007. Available online: https://ucilnica.fri.uni-lj.si/pluginfile.php/217774/mod_resource/content/1/beautiful-soup-4-readthedocs-io-en-latest.pdf (accessed on 13 November 2024).
61. Pezoa, F.; Reutter, J.L.; Suarez, F.; Ugarte, M.; Vrgoč, D. Foundations of JSON schema. In Proceedings of the 25th International Conference on World Wide Web, Montréal, QC, Canada, 11–15 April 2016; pp. 263–273.
62. Huffman, P.; Hutson, J. Enhancing History Education with Google NotebookLM: Case Study of Mary Easton Sibley’s Diary for Multimedia Content and Podcast Creation. ISRG J. Arts Humanit. Soc. Sci. 2024, 2, 683.
63. Mehta, N.; Agrawal, A.; Benjamin, J.; Mehta, S.; MacNeill, H.; Masters, K. Pedagogy and generative artificial intelligence: Applying the PICRAT model to Google NotebookLM. Med. Teach. 2024, 1–3.
64. D’mello, B.J.; Sriparasa, S.S. JavaScript and JSON Essentials: Build Light Weight, Scalable, and Faster Web Applications with the Power of JSON; Packt Publishing: Birmingham, UK, 2018.
65. GeoPy Contributors. GeoPy Documentation. 2014. Available online: https://app.readthedocs.org/projects/geopy/downloads/pdf/latest/ (accessed on 26 March 2025).
66. Zeigermann, L. OPENCAGEGEO: Stata Module for Forward and Reverse Geocoding Using the OpenCage Geocoder API. 2018. Available online: https://econpapers.repec.org/software/bocbocode/s458155.htm (accessed on 14 September 2016).
67. Crickard, P., III. Leaflet.js Essentials; Packt Publishing Ltd.: Birmingham, UK, 2014.
68. Hou, D.; Miao, Z.; Xing, H.; Wu, H. Two novel benchmark datasets from ArcGIS and Bing World Imagery for remote sensing image retrieval. Int. J. Remote Sens. 2021, 42, 240–258.
69. González-Gallardo, C.E.; Boros, E.; Girdhar, N.; Hamdi, A.; Moreno, J.G.; Doucet, A. Yes but.. Can ChatGPT identify entities in historical documents? In Proceedings of the 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Santa Fe, NM, USA, 26–30 June 2023; pp. 184–189.
70. Chartier, M.A.; Dakkoune, N.; Bourgeois, G.; Jean, S. Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d’historiens. In Proceedings of the 24ème Conférence Francophone sur l’Extraction et la Gestion des Connaissances (EGC 2024), Dijon, France, 24–26 January 2024; Number 40, pp. 155–166.
71. Guyeux, C. Predicting the Number of Pedestrians per Street Section: A Detailed Step-by-step Example. In Proceedings of the International Conference on Information Technology & Systems; Springer: Berlin/Heidelberg, Germany, 2023; pp. 341–349.
72. McKinney, W. pandas: A foundational Python library for data analysis and statistics. Python High Perform. Sci. Comput. 2011, 14, 1–9.
73. Clemente, F.; Ribeiro, G.M.; Quemy, A.; Santos, M.S.; Pereira, R.C.; Barros, A. ydata-profiling: Accelerating data-centric AI with high-quality data. Neurocomputing 2023, 554, 126585.
74. Barreau, J.B. Explorers DataFrame Profiling Report. 2025. Available online: https://jean-baptiste-barreau.github.io/explorers/dataframe_report.html (accessed on 15 March 2025).
75. Brownlee, J. How to use StandardScaler and MinMaxScaler transforms in Python. Mach. Learn. Mastery 2020, 10, 10.
76. Likas, A.; Vlassis, N.; Verbeek, J.J. The global k-means clustering algorithm. Pattern Recognit. 2003, 36, 451–461.
77. Schielke, H.J.; Fishman, J.L.; Osatuke, K.; Stiles, W.B. Creative consensus on interpretations of qualitative data: The Ward method. Psychother. Res. 2009, 19, 558–565.
78. Barreau, J.B. JSON Dataset Containing Detailed Information on Historical Explorers and Their Expeditions. 2025. Available online: https://github.com/jean-baptiste-barreau/jean-baptiste-barreau.github.io/blob/main/explorers/explorateurs.json (accessed on 15 March 2025).
79. Barreau, J.B. Interactive Map Showcasing Historical Explorers’ Expeditions. 2025. Available online: https://jean-baptiste-barreau.github.io/explorers/map.html (accessed on 15 March 2025).
80. Spisak, B.R.; Grabo, A.E.; Arvey, R.D.; Van Vugt, M. The age of exploration and exploitation: Younger-looking leaders endorsed for change and older-looking leaders endorsed for stability. Leadersh. Q. 2014, 25, 805–816.
81. Wikipedia Contributors. Exploration. 2025. Available online: https://en.wikipedia.org/wiki/Exploration (accessed on 22 March 2025).
82. Nyangweso, D.O.; Gede, M. An Open-Source Framework for Publishing Geographical Names: A Case Study of Kenya. 2021. Available online: https://repository.dkut.ac.ke:8080/xmlui/handle/123456789/4736 (accessed on 19 April 2023).

Figure 1. Workflow for geospatial data analysis and visualization of exploration narratives.

Figure 2. Interactive map of explorers’ journeys across Africa.

Figure 3. Number of visits per country: distribution of visited destinations.

Figure 4. Distributions of modes of transport and encountered difficulties.

Figure 5. Frequency distribution of the number of steps (a), total distance traveled (b), the number of images (c), start years (d), departure ages (e), and travel duration (f).

Figure 6. Correlation matrix of travel-related numerical columns: departure age, start year, number of steps, number of images, total distance traveled, and travel duration.

Figure 7. Distribution of nationalities among participants (a), spoken languages among participants (b), participant activities and interests (c).

Figure 8. Explained variance by principal components (left) and graph of variables (right).

Figure 9. Correspondence analysis: Representation of rows and columns.

Figure 10. Hierarchical clustering dendrogram of distance metrics with a cutoff threshold of 12.5.
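Figure 10 cuts a hierarchical-clustering dendrogram at a height of 12.5. The sketch below shows what such a cut does, assuming Ward linkage on Euclidean distances. The caption names neither the linkage criterion nor the feature scaling, so both are assumptions. The sketch is a dependency-free illustration. In practice, SciPy's `scipy.cluster.hierarchy.linkage` followed by `fcluster(..., criterion="distance")` would typically do the same job. The threshold is expressed in the units of the clustered feature space, which is usually standardized. The value 12.5 is therefore only meaningful for the paper's own features.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _pair(a: int, b: int) -> Tuple[int, int]:
    """Order-independent key for the distance table."""
    return (a, b) if a < b else (b, a)


def ward_clusters(points: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Ward agglomerative clustering, cut where the merge height exceeds threshold.

    Heights use the Lance-Williams update for Ward on Euclidean distances
    (the convention of SciPy's linkage(method="ward")). Ward heights never
    decrease, so stopping at the first merge above the threshold is the same
    as cutting the dendrogram at that height.
    """
    members: Dict[int, List[int]] = {i: [i] for i in range(len(points))}
    dist: Dict[Tuple[int, int], float] = {
        (i, j): math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    }
    next_id = len(points)
    while len(members) > 1:
        (s, t), height = min(dist.items(), key=lambda item: item[1])
        if height > threshold:
            break  # every remaining merge lies above the cut
        ns, nt = len(members[s]), len(members[t])
        merged = members.pop(s) + members.pop(t)
        new_entries = {}
        for v, mv in members.items():
            nv = len(mv)
            total = nv + ns + nt
            squared = ((nv + ns) * dist[_pair(v, s)] ** 2
                       + (nv + nt) * dist[_pair(v, t)] ** 2
                       - nv * height ** 2) / total
            new_entries[(v, next_id)] = math.sqrt(max(squared, 0.0))
        # Drop distances involving the two merged clusters, then add the new ones.
        dist = {k: d for k, d in dist.items() if s not in k and t not in k}
        dist.update(new_entries)
        members[next_id] = merged
        next_id += 1
    # Label clusters 0, 1, ... in order of their lowest point index.
    labels = [0] * len(points)
    for label, idx in enumerate(sorted(members.values(), key=min)):
        for i in idx:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Toy 2-D points, not the paper's data: two tight pairs far apart.
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(ward_clusters(pts, threshold=5.0))  # [0, 0, 1, 1]
```

Lowering the threshold produces more, smaller clusters. Raising it above the last merge height puts every journey in a single cluster.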

Table 1. Authors and book titles involved in the study.

Author	Title
Angus Buchanan	Exploration de l’Aïr [24]
Angus Buchanan	Sahara [25]
Angus Buchanan	Three Years of War in East Africa [26]
Association for Promoting the Discovery of the Interior Parts of Africa	Proceedings of the Association for Promoting the Discovery of the Interior Parts of Africa [27]
Austin Hubert Wightwick Haywood	Through Timbuctu and across the great Sahara [28]
Conrad Kilian	Au Hoggar [29]
David Livingstone	A Popular Account of Dr. Livingstone’s Expedition to the Zambesi and Its Tributaries [30]
David Livingstone	Missionary Travels and Researches in South Africa [31]
David Livingstone	The Last Journals of David Livingstone, in Central Africa, from 1865 to His Death, Volume I (of 2), 1866–1868 [32]
David Livingstone	The Last Journals of David Livingstone, in Central Africa, from 1865 to His Death, Volume II (of 2), 1869–1873 [33]
Ernest Psichari	Les voix qui crient dans le désert [34]
Eugène Fromentin	Un été dans le Sahara [35]
Ewart Scott Grogan	From the Cape to Cairo: The First Traverse of Africa from South to North [36]
Friedrich Gerhard Rohlfs	Land und Volk in Afrika, Berichte aus den Jahren 1865–1870 [37]
Georg Edvard von Alfthan	Afrikanska reseminnen, Äfventyr och Intryck från En utflykt till de Svartes Världsdel [38]
George Edward Woodberry	North Africa and the Desert. Scenes and Moods [39]
Grégoire-Gaspard-Félix Coffinières de Nordeck	Het land der Bagas en de Rio-Nuñez [40]
Gustaf Otto Mattsson	Suomen mies meni Zanzibariin [41]
Heinrich Barth	Travels and discoveries in North and Central Africa [42]
Hugues Le Roux	Au Sahara [43]
James Richardson	Narrative of a Mission to Central Africa Performed in the Years 1850–51 [44]
James Richardson	Travels in the Great Desert of Sahara, in the Years of 1845 and 1846 [45]
Matilda Betham-Edwards	Through Spain to the Sahara [46]
Oskar Lenz	Timbouctou, voyage au Maroc, au Sahara et au Soudan, Tome 2 (de 2) [47]
Oskar Lenz	Timbouctou, voyage au Maroc, au Sahara et au Soudan, Tome 1 (de 2) [47]
Paul Belloni Du Chaillu	In African Forest and Jungle [48]
René Caillié	Travels through Central Africa to Timbuctoo and across the Great Desert to Morocco performed in the years 1824–1828, in Two Volumes, Vol. I [49]
René Caillié	Travels through Central Africa to Timbuctoo and across the Great Desert to Morocco performed in the years 1824–1828, in Two Volumes, Vol. II [49]
René Caillié	Voyage d’un faux musulman à travers l’Afrique [50]
Richard Harding Davis	The Congo And Coasts Of Africa [51]
Thomas Nelson (publisher)	A biographical memoir of the late Dr. Walter Oudney, Captain Hugh Clapperton, both of the Royal Navy, and Major Alex. Gordon Laing, all of whom died amid their active and enterprising endeavours to explore the interior of Africa [52]
Vasco da Gama	A Journal of the First Voyage of Vasco da Gama 1497–1499 [53]
Walter Burton Harris	Tafilet [54]
William Austin Cannon	Botanical features of the Algerian Sahara [55]

Table 2. Descriptive statistics of explorations by author.

	Number of Steps	Number of Images	Total Distance Traveled (km)	Start Year	Departure Age	Travel Duration in Days
Mean	30.84	23.22	8209.17	1862.16	32.33	313.58
Std	23.59	21.53	8840.76	83.73	11.89	325.17
Min	5.0	1.0	102.54	1497.0	23.0	15.0
Max	82.0	82.0	35819.58	1922.0	73.0	999.0
Median	21.0	15.0	4584.06	1885.0	28.0	162.0
IQR	34.0	25.0	8639.38	61.0	5.5	468.5
Author Min	Thomas Nelson (publisher)	Thomas Nelson (publisher)	Grégoire-Gaspard-Félix Coffinières de Nordeck	Vasco da Gama	Conrad Kilian	Walter Burton Harris
Author Max	René Caillié	William Austin Cannon	Vasco da Gama	Conrad Kilian	Grégoire-Gaspard-Félix Coffinières de Nordeck	David Livingstone
Cardinality	25	18	24	25	15	12
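The statistics in Table 2 can be reproduced for one column with a short sketch based on Python's `statistics` module. The reference list cites pandas and ydata-profiling, and this stdlib version is a stand-in for them. Three points are assumptions rather than statements from the paper. First, quartiles use linear interpolation (`method="inclusive"`), which matches the pandas default. Second, "Std" is the sample standard deviation. Third, the "Author Min" and "Author Max" rows are read as the author of the journey holding the column's minimum or maximum value.

```python
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float], authors: Sequence[str]) -> Dict[str, object]:
    """Table-2-style summary of one numeric column (one value per journey)."""
    if len(values) != len(authors) or len(values) < 2:
        raise ValueError("need at least two values and one author per value")
    # Linear-interpolation quartiles, the same convention as pandas' default quantile().
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    i_min = min(range(len(values)), key=lambda i: values[i])
    i_max = max(range(len(values)), key=lambda i: values[i])
    return {
        "Mean": statistics.fmean(values),
        "Std": statistics.stdev(values),   # sample standard deviation (n - 1)
        "Min": values[i_min],
        "Max": values[i_max],
        "Median": statistics.median(values),
        "IQR": q3 - q1,
        "Author Min": authors[i_min],      # first author reaching the minimum
        "Author Max": authors[i_max],      # first author reaching the maximum
        "Cardinality": len(set(values)),   # number of distinct values
    }


if __name__ == "__main__":
    # Toy departure ages, not the paper's data.
    ages = [23, 28, 28, 35, 73]
    names = ["A", "B", "C", "D", "E"]
    for row, value in describe(ages, names).items():
        print(f"{row:12} {value}")
```

Applying `describe` to each numeric column and placing the resulting dictionaries side by side gives the layout of Table 2.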

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Barreau, J.-B. Mapping the Past: Unlocking Historical Explorer Narratives with AI and Geospatial Tools. Electronics 2025, 14, 1395. https://doi.org/10.3390/electronics14071395
