On the Development of the Hellenic Digital Library of Arabic Historical Sources: A Framework for Digital Scholarship in the Humanities

Karageorgoudis, Emmanuil; Papakostas, Christos; Lianos Liantis, Efstathios; Miotto, Marco

doi:10.3390/heritage8080330

Open Access Article

Emmanuil Karageorgoudis ¹, Christos Papakostas ²,*, Efstathios Lianos Liantis ¹ and Marco Miotto ³

¹ Department of Social Theology and Religious Studies, National and Kapodistrian University of Athens, 15784 Athens, Greece

² Department of Informatics and Computer Engineering, University of West Attica, 12243 Athens, Greece

³ Department of History and Ethnology, Democritus University of Thrace, 69100 Komotini, Greece

* Author to whom correspondence should be addressed.

Heritage 2025, 8(8), 330; https://doi.org/10.3390/heritage8080330

Submission received: 7 July 2025 / Revised: 1 August 2025 / Accepted: 12 August 2025 / Published: 14 August 2025


Abstract

Despite Greece’s historical and geographical significance in the Mediterranean, there is currently no national digital repository offering systematic access to Arabic chronicles, diplomatic letters, and travelogues from the eighth to sixteenth centuries. This absence critically impedes rigorous Arabological and Islamological research within Greek academia and restricts the educational landscape to predominantly Eurocentric perspectives. The Hellenic Digital Library of Arabic Historical Sources (HDB-AHS) is proposed, at the pre-implementation stage, as a targeted solution: a trilingual (Greek–English–Arabic) digital platform designed to aggregate, preserve, and openly disseminate these vital sources. The article outlines a six-phase implementation plan that combines IIIF and TEI-XML; the FAIR principles for interoperability and reuse, supplemented by the CARE principles where community authority or sensitivity requires them; and open licensing backed by a robust rights-clearance framework for modern copyrights and sensitive materials. Beyond academic benefits, the project aspires to act as a meeting point of cultures, offering concrete tools for building bridges, combating intolerance, and fostering intercultural understanding. In a rapidly changing world, an inclusive and responsibly curated digital resource of this kind is vital not only for advancing research but also for supporting dialogue and mutual respect across societies. The HDB-AHS provides a blueprint for similar initiatives in underrepresented fields.

Keywords:

Arabic digital repository; FAIR/CARE principles; trilingual platform; TEI-XML

1. Introduction

Humanities scholarship is riding a digital wave [1,2]. Text-recognition engines can unlock the handwriting of tenth-century scribes, and open-data portals let a researcher in New Zealand consult parchment preserved in Cairo with a click. Yet this new accessibility has gaps, and one of the widest lies exactly where three continents meet: the Greek–Arabic corridor of the eastern Mediterranean. The present paper proposes the Hellenic Digital Library of Arabic Historical Sources (HDB-AHS)1 as a remedy. Its purpose is to gather Arabic chronicles, diplomatic letters, and travelogues covering the eighth to sixteenth centuries into one trilingual, FAIR-compliant platform [3] hosted in Greece. In doing so, the project tackles two global problems (the fragmented evidence for medieval Mediterranean history and rising cultural intolerance) and one national omission: Greece still lacks a trusted electronic resource for Arabological and Islamological materials despite holding sizeable collections.

To ground the plan, this section states the project’s current status and roadmap, beginning with why the topic matters beyond the specialist circle. Arabic historians reported food prices in Alexandria, recorded plague waves in Damascus, and described sea-routes that carried Genoese cotton to Mamluk ports. Their pages supply data that modern economic and climate historians need to test competing theories about pre-modern globalisation and environmental stress. One camp maintains that Mediterranean markets were already integrated by the fourteenth century, citing parallel price movements in Venice and Barcelona; another insists that integration was weak, pointing to discordant figures from Cairo [4,5]. Without systematic access to Arabic price lists, the debate remains unresolved. Likewise, migration scholars ask whether mixed neighbourhoods in medieval port cities fostered long-term tolerance; answers vary because Arabic eyewitness accounts are harder to consult than Latin town chronicles. By centralising Arabic sources, the HDB-AHS equips scholars outside Middle Eastern studies, such as economists, geographers, and social scientists, to weigh the evidence first-hand and move these wider debates forward.

The current landscape shows both progress and blind spots. The digitisation of cultural heritage has accelerated [6,7,8]: Europeana2 hosts more than 50 million objects, the Vatican Library streams high-resolution manuscripts via the International Image Interoperability Framework (IIIF)3, and projects such as the Yemeni Manuscript Digitization Initiative4 safeguard endangered codices. Yet the coverage is uneven. Arabic holdings in Greek universities remain offline; Ottoman and Syriac collections receive periodic attention, but systematic Arabic pipelines are rare. Technical frameworks exist (IIIF for images, TEI-XML for transcriptions), but institutional adoption in Greece stops at classical and Byzantine corpora. Consequently, Arabological research in the country depends on loans from foreign archives, delaying projects and narrowing the questions scholars can ask. Internationally, the absence of a Greek-based Arabic portal hinders comparative projects that align Byzantine, Latin, and Arabic narratives.

Digitising Islamic heritage held outside majority-Muslim countries is not free of contention. Some argue that large-scale open release risks “digital dispossession,” placing rare texts in Western-controlled clouds rather than on local servers [9,10]. Others warn that over-cautious embargoes create a digital divide, privileging researchers with travel funds [11,12]. A middle position advocates open images but insists on community consultation for sensitive content [13]. There are also scholarly disagreements over the reliability of script-aware optical character recognition (OCR): optimists report very high accuracy for nineteenth-century Arabic print, while sceptics highlight far lower rates for earlier, ligature-rich fonts. The HDB-AHS enters this contested space by combining open licences (CC BY-NC for images, CC BY-SA for text) with a rights-clearance workflow that allows embargoes when copyright holders or communities request them. The platform will also publish OCR confidence levels alongside scans so that users can judge transcription reliability, a practice that addresses, rather than sidesteps, technical dissent.

The purpose and significance of the present work is to set out a blueprint for building a sustainable, rights-aware, and machine-readable Arabic digital library in Greece. Its importance is threefold:

Scholarly acceleration. By replacing months of archival travel with seconds of browser search, the platform shortens the research cycle and widens participation in Arabic studies;
Pedagogical enrichment. Multilingual metadata will power bilingual lesson plans, enabling Greek and international classrooms to compare sources across languages;
Civic dialogue. Open, verifiable evidence of past cultural exchange counters intolerance by showing that coexistence and mutual influence have deep historical roots.

Digital scholarship has matured from one-off scanning projects to integrated data ecosystems. Standards such as FAIR (Findable, Accessible, Interoperable, Reusable) guide funding agencies [3]; IIIF has become the norm for image distribution; and the Linked Open Data movement ties together authority files such as VIAF5 and GeoNames6. Yet a survey of the peer-reviewed literature reveals a skew: most case studies showcase European or East Asian corpora. Articles on Arabic digitisation appear, but they often focus on single libraries or technical experiments rather than end-to-end infrastructures. Greek contributions to this dialogue are confined largely to classical and Byzantine material. There is thus both a thematic and a geographical gap that the HDB-AHS seeks to fill, providing a case study of how a mid-sized institution can adopt global standards while hosting non-Greek heritage.

With the need established, the aims of the paper can be stated. The main aim is to demonstrate that a FAIR-compliant, IIIF-enabled Arabic library can be built and maintained in Greece through a six-phase workflow: selection, identification, acquisition, digitisation, website development, and publishing. The article concludes that such a workflow is feasible within a 36-month horizon, provided that rights management, multilingual interface design, and clear evaluation metrics are integrated from the start. A secondary conclusion is that transparency builds trust with both scholarly and source communities, setting a replicable model for future projects on Ottoman Turkish or Syriac materials.

Because the audience of this journal spans multiple disciplines, the article explains technical acronyms in plain language: IIIF is likened to a set of “digital shipping crates” that let any viewer unpack images; TEI-XML functions as “smart brackets” around transcribed text; FAIR principles [3] are unpacked as a checklist for data longevity. Historical references are supplied with brief context; for example, Ibn Jubayr is introduced as a twelfth-century traveller whose diary rivals that of Marco Polo in scope. Where debates arise—such as over the degree of medieval market integration—competing hypotheses are stated in neutral terms before indicating how the new corpus can arbitrate between them. Readers from computer science or library science can thus grasp why the humanities content matters while historians learn why technical standards are not mere jargon but vehicles for discovery.

The rest of the paper is structured as follows. Section 2 articulates the scholarly objectives; Section 3 presents the theoretical framework and translates it into concrete requirements; Section 4 details the six-phase implementation roadmap, highlighting risk controls and staffing needs; Section 5 discusses legal and ethical considerations, including copyright clearance, GDPR compliance, and community consultation for culturally sensitive material; and Section 6 summarises conclusions and future work.

2. Scholarly Objectives and Relevance

The HDB-AHS is designed around a tightly defined corpus: narrative chronicles that chart wars and dynastic succession, diplomatic letters that record negotiations across religious frontiers, and travelogues that describe shipping lanes, markets, and monuments from Al-Andalus to the Indian Ocean. Chronologically, the collection spans the eighth–sixteenth centuries, the long millennium during which Arabic was both a lingua franca of commerce and a vehicle of scholarly exchange throughout the Mediterranean and beyond. Within these genres lie eyewitness accounts of sieges, price lists of staple grains, itineraries of pilgrim caravans, and panegyrics to Byzantine emperors penned by captive Arab poets. Together they form a documentary backbone for three major research communities—historians of Byzantium, specialists in early Ottoman studies, and scholars tracing pre-modern circuits of global trade—yet they remain dispersed in small print runs, microfilm cabinets, or idiosyncratic personal scans [14,15].

2.1. Fragmented Access and Its Consequences

The problem of access is not merely logistical; it shapes research outcomes. Compared with Latin or Chinese textual traditions, Arabic materials have entered the digital domain in piecemeal fashion. A handful of collection-level projects—the Yemeni Manuscript Digitization Initiative and the Wellcome Arabic Manuscripts portal7—have preserved priceless codices, but their metadata models are incompatible, and their coverage is thematic rather than comprehensive (ymdi.uoregon.edu, wamcp.bibalex.org). At the same time, European and East Asian libraries have released millions of IIIF manifests, enabling machine-readable comparanda that pull scholarship toward topics for which sources have already been aggregated. The scarcity of interoperable Arabic corpora restrains quantitative economic modelling, hampers linked-data person registries, and limits the reproducibility of textual analyses that have become routine in Digital Humanities. Studies of medieval Mediterranean grain markets, for example, often stop at the gates of Cairo because the Arabic series that continue beyond them are still trapped on microfilm in a single archive [16]. Without a consolidated platform, scholars produce case-study monographs rather than shareable datasets, and cross-lingual comparisons remain tentative.

2.2. Greece at the Crossroads

A Greek-led initiative can reverse that imbalance. Geographically, Greece sits where the Aegean meets the Levant; historically, its ports brokered flows of silk, grain, and ideas between Arab caliphates and Latin Christendom. Contemporary Athens and Thessaloniki hold significant Arabic and Ottoman collections brought during the Philhellenic movement and the interwar years. Greek research infrastructure—anchored in universities, the Academy of Athens, and the National Documentation Centre8—already hosts large-scale digitisation programmes in Byzantine and classical studies. Extending this expertise to Arabic materials creates an archival bridge that aligns naturally with Greece’s diplomatic and scholarly agenda of positioning itself as a hub for Mediterranean heritage9. Crucially, the trilingual interface (Greek–English–Arabic) will embed Arabic sources within Greek- and Latin-language corpora, allowing scholars to trace the same event across linguistic boundaries without leaving the platform.

2.3. Scenarios

The first scenario is the prosopographical mapping of shared elites, which yields high-resolution maps of individual careers. Individual actors (envoys, merchants, converts) often move in and out of both Greek and Arabic records under different names or titles. The Byzantine–Arab Chronicle of 741 [17] credits an Arab commander with the rank of patrikios; Arabic biographical dictionaries list the same figure under a nisba tied to his tribal lineage, while Byzantine seals render his name in Greek phonetics. By aligning these entries through authority files and geotagged life events, the HDB-AHS enables historians to trace multi-confessional careers, reconstruct kinship networks, and examine patterns of social mobility from al-Shām to Constantinople. The outcome is a machine-queryable dataset that can feed graph visualisations or be integrated into the PROSOPON International Research Network10, which already integrates prosopographical projects for the Eastern Mediterranean, 300–1600. Because all links, transcriptions, and match decisions will be stored as structured data, other scholars can audit and reuse the work, an essential condition for reproducible research.
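A minimal sketch of the alignment step follows, assuming a hypothetical authority file that maps person IDs (such as `hdb:person/0001`) to known name variants. It uses only diacritic- and case-insensitive exact matching, which is far cruder than real prosopographical disambiguation (which would also weigh dates, titles, and places); the point illustrated is that every match decision is kept as an auditable record rather than applied silently.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Attestation:
    source: str  # canonical reference into a transcription (hypothetical format)
    name: str    # name form exactly as written in the source
    lang: str    # language of the source, e.g. "ar", "el", "la"


def normalise(name: str) -> str:
    """Strip diacritics and case so that variant spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def align(attestations, authority):
    """Match each attestation to a person ID and record the decision as data.

    `authority` maps a hypothetical person ID to a set of known name variants.
    """
    index = {normalise(v): pid for pid, variants in authority.items() for v in variants}
    decisions = []
    for a in attestations:
        pid = index.get(normalise(a.name))
        decisions.append({
            "source": a.source,
            "name": a.name,
            "lang": a.lang,
            "person": pid,
            "method": "normalised-exact" if pid else "unmatched",
        })
    return decisions
```

Unmatched attestations are retained with `person` set to `None`, so a curator can review them later instead of losing them.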

The second scenario is comparative economic data extraction, which opens up comparative economic history. Chronicles and travel diaries embed recurring notations of commodity prices, shipping tolls, and tax assessments. When mined systematically, these entries complement the Latin notarial registers that underpin current models of market integration. A recent study comparing grain prices in Cairo and Western Europe during the fourteenth century found no conclusive evidence of price convergence, but the analysis relied on a narrow set of printed Arabic sources. By supplying full-text TEI transcriptions with stable canonical references, the HDB-AHS allows the automated extraction of price strings, unit conversions, and geographic tags. A macroeconomic historian can thus run time-series analyses that include Alexandria, Damascus, and Tunis alongside Venice or Barcelona, finally rendering the southern and eastern Mediterranean visible in quantitative terms. Because facsimile images, OCR confidence scores, and conversion scripts will all be archived, other researchers can replicate the pipeline or critique the cleaning rules, bringing economic history in line with open-science norms.
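The extraction step could look like the following sketch. It assumes that transcriptions tag commodities and prices with the TEI `<measure>` element (`@commodity`, `@quantity`, `@unit`, `@type`) and places with `<placeName>`; that encoding convention, the `chron-A.3.12` reference, and the litres-per-irdabb factor are illustrative assumptions, not the project's published schema or a scholarly conversion.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Illustrative placeholder, not a scholarly conversion: historical measures varied by period and place.
LITRES_PER_UNIT = {"irdabb": 90.0}

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<p n="chron-A.3.12">In <placeName ref="#alexandria">Alexandria</placeName> the price of
<measure commodity="wheat" quantity="1" unit="irdabb">an irdabb of wheat</measure> rose to
<measure type="price" quantity="3" unit="dinar">three dinars</measure>.</p>
</body></text></TEI>"""


def extract_prices(tei_xml: str):
    """Return one normalised price row per paragraph that has a place, a commodity, and a price."""
    root = ET.fromstring(tei_xml)
    rows = []
    for p in root.iter(f"{TEI}p"):
        place = p.find(f"{TEI}placeName")
        goods = [m for m in p.iter(f"{TEI}measure") if m.get("commodity")]
        prices = [m for m in p.iter(f"{TEI}measure") if m.get("type") == "price"]
        if place is None or not goods or not prices:
            continue  # incomplete observation: skip rather than guess
        good, price = goods[0], prices[0]
        litres = float(good.get("quantity")) * LITRES_PER_UNIT[good.get("unit")]
        rows.append({
            "ref": p.get("n"),  # canonical reference back to the transcription
            "place": place.get("ref"),
            "commodity": good.get("commodity"),
            "currency": price.get("unit"),
            "price_per_100_litres": round(float(price.get("quantity")) / litres * 100, 3),
        })
    return rows
```

Keeping the canonical reference in every row is what lets a reader trace a data point in a time series back to the sentence it came from.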

The third scenario concerns multilingual literary reception studies. Medieval writers translated and reworked each other’s texts across language boundaries: Andalusi geographers adapted Ptolemaic maps, while scholastic humanists cited Arabic commentaries on Aristotle. Yet reception history is still mapped largely within single philological traditions. By linking Arabic travelogues such as Ibn Battuta’s Riḥla11 to their Greek epitomes or Latin abridgements, researchers can track the diffusion and adaptation of narratives across cultures. The pay-off is not only qualitative insight (who read whom, and why) but also quantitative metrics: citation frequency, translation lag, and the geography of manuscript diffusion. The approach resonates with new work on Latin–Arabic literary entanglements, which demonstrates how intertextual pathways shape canon formation [18]. Multilingual alignment also improves Natural Language Processing (NLP) models that rely on parallel corpora, opening the door to the automated identification of shared motifs or loanwords.

The aforementioned scenarios illustrate why the HDB-AHS is more than a digitisation exercise; it is digital scholarship in the Humanities at full stretch. Digital scholarship thrives on enriched data: the addition of structured metadata, geospatial coordinates, and semantic annotations that convert a static scan into a networked research object. Each item in the library will carry persistent identifiers, IIIF manifests, and TEI-XML markup, ensuring that annotations created in one project remain compatible with tools used elsewhere. Because every action—from OCR correction to entity disambiguation—will be version-controlled and time-stamped, the platform embodies reproducibility: another team can re-run the same queries and reach comparable results or test alternative hypotheses by forking the data pipeline.
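The article does not say which version-control mechanism will be used. As one illustration of time-stamped, tamper-evident provenance, the sketch below keeps a hash-chained event log in which each entry's SHA-256 digest also covers the previous entry's digest, so any later edit to the history becomes detectable. The field names are hypothetical, and a Git repository would provide comparable guarantees.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes identically
    data = json.dumps(body, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append_event(log, actor, action, target, detail, now=None):
    """Append a time-stamped event chained to the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {
        "prev": prev,
        "time": (now or datetime.now(timezone.utc)).isoformat(),
        "actor": actor,
        "action": action,
        "target": target,
        "detail": detail,
    }
    log.append({**body, "hash": _digest(body)})
    return log


def verify(log) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```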

Cross-disciplinary collaboration is the third pillar. Prosopographical mapping requires historians, computer scientists, and data curators; price-series modelling attracts economists and mathematicians; literary-reception studies intersect with linguistics and translation theory. By exposing GraphQL APIs (GraphQL is a query language that lets clients request exactly the data they need rather than a fixed set of results) and bulk-download endpoints, the HDB-AHS invites integration with visualisation dashboards, geographic information systems, or even machine-learning notebooks in climate history. The design aligns with the FAIR and CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) [19] frameworks, signalling that the data are not only technically open but also culturally responsible: Arabic heritage retained in Greek collections is shared back with scholars and communities across North Africa and the Middle East.
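For readers unfamiliar with GraphQL, the sketch below builds (without sending) a standard GraphQL-over-HTTP POST request: a JSON body carrying a `query` and its `variables`. The endpoint URL and every field and argument name (`items`, `genre`, `ark`, `manifestUrl`) are hypothetical, since no HDB-AHS schema has been published.

```python
import json
import urllib.request

# Hypothetical schema: field and argument names are placeholders, not a published API.
QUERY = """
query ItemsByGenre($genre: String!, $first: Int!) {
  items(genre: $genre, first: $first) {
    ark
    title(lang: "en")
    manifestUrl
  }
}
"""


def build_request(endpoint: str, genre: str, first: int = 10) -> urllib.request.Request:
    """Build (but do not send) a GraphQL-over-HTTP POST request."""
    body = json.dumps({"query": QUERY, "variables": {"genre": genre, "first": first}}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the query; the client would receive only the three requested fields per item, which is the property the paragraph above describes.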

2.4. Relevance for Byzantium, Ottoman Studies, and Global Trade

For Byzantine historians, the merged corpus provides Arab viewpoints on turning points such as the Battle of Manzikert [20] or the Fourth Crusade [21], correcting a narrative that often pivots on Greek chronicles alone. Ottomanists gain earlier Arabic evidence on frontier emirates, supplying socio-economic context that pre-dates the empire’s own archival explosion in the fifteenth century. Scholars of global trade can embed Arabic price data and shipping logs into comparative models that currently rely on Genoese or Venetian customs ledgers, moving from a Eurocentric picture of market integration to a genuinely Mediterranean one. The geographic fit is close: Greek ports were the clearinghouses where Venetian, Ottoman, and Arab merchants exchanged cargo, and modern Greek repositories mirror that blend in their holdings [22]. By uniting Arabic, Greek, and Latin sources under one digital roof, the HDB-AHS reconstructs the documentary ecosystem that once underpinned those exchanges.

2.5. Catalysing New Teaching and Public-Engagement Formats

Beyond specialist research, the library will reshape teaching practice. A trilingual interface allows Greek undergraduates to compare Arabic and Greek versions of the same treaty side by side, fostering language acquisition and critical source analysis. Interactive IIIF storytelling modules can guide secondary-school students through a merchant’s journey from Alexandria to Thessaloniki, grounding abstract history lessons in primary evidence. Public exhibitions—physical or virtual—can embed manifests from the platform, broadening cultural literacy about shared Mediterranean heritage. These outreach pathways satisfy national goals for public history while returning cultural capital to the Arab and Greek communities whose ancestors produced the documents.

In sum, the scholarly objectives of the HDB-AHS are three-fold: to centralise dispersed Arabic materials that are indispensable to Byzantine, Ottoman, and economic historians; to supply structured, interoperable data that stimulate reproducible digital scholarship; and to forge a linguistic and geographic bridge that situates Greek scholarship at the heart of Mediterranean heritage curation. By enabling prosopographical network analysis, quantitative economic comparison, and multilingual reception studies—all within a FAIR-compliant, trilingual platform—the library promises an intellectual return far greater than the sum of scanned pages. It will re-balance the evidentiary base of medieval studies, enrich cross-cultural pedagogy, and model responsible data stewardship for regions where resources are unevenly distributed. In doing so, it affirms that the digital future of the Humanities is not confined to well-served European and East-Asian corpora but is open to all who share the sea-routes of the past. Table 1 contrasts the scope, the content, the standards, and what the HDB-AHS adds.

3. Theoretical Framework and Design Principles

Open science now shapes every serious conversation about cultural data. UNESCO’s 2021 Recommendation on Open Science calls for “transparency, reproducibility, and inclusiveness” in both methods and infrastructure [23,24], insisting that heritage datasets be treated as public goods wherever legal conditions permit. At the same time, memory institutions warn that openness must respect provenance, community rights, and long-term preservation costs. The HDB-AHS is positioned at that crossroads: it adopts open-science norms while acknowledging the custodial duties that accompany Arabic manuscripts held in Greek collections.

Digitisation practice in Europe offers a cautionary tale. Platforms such as Europeana show the power of aggregating millions of objects under shared licences and persistent identifiers, yet they also highlight the labour involved in normalising disparate metadata and funding storage for the long haul. Scholarly criticism of “scan-and-forget” projects underscores that access without structured data hampers computational reuse. The HDB-AHS therefore commits not only to scanning but also to embedding each image in a standards-driven ecosystem designed for machine harvesting and scholarly annotation (Europeana PRO12).

The FAIR principles (Findable, Accessible, Interoperable, Reusable) supply the project’s core checklist. “Findable” means every item must carry a globally unique, persistent identifier that remains resolvable even if the hosting server changes; “Accessible” requires standard protocols such as HTTPS and IIIF; “Interoperable” demands community-agreed vocabularies and linked-data formats; “Reusable” insists on rich provenance information and licences that spell out downstream rights. FAIR emerged in 2016 as a response to data loss and siloed archives and now functions as an evaluative yardstick for heritage grants across the European Research Area.
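To make the checklist concrete, the following sketch tests a metadata record against one coarse proxy per principle. The record field names are hypothetical, and four booleans are a simplification; formal FAIR assessments use considerably richer metrics.

```python
from urllib.parse import urlparse


def fair_check(record: dict) -> dict:
    """Coarse FAIR proxies for one metadata record (hypothetical field names)."""
    identifier = record.get("identifier", "")
    return {
        # Findable: a persistent identifier scheme rather than a local ID
        "findable": identifier.startswith(("ark:", "doi:", "https://doi.org/")),
        # Accessible: retrievable over a standard, secure protocol
        "accessible": urlparse(record.get("access_url", "")).scheme == "https",
        # Interoperable: at least one shared vocabulary or authority file is used
        "interoperable": bool(record.get("vocabularies")),
        # Reusable: both a licence and provenance information are present
        "reusable": bool(record.get("licence")) and bool(record.get("provenance")),
    }
```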

We prioritise FAIR because it is the prevailing requirement for technical interoperability, persistence, and machine reuse across European research infrastructures; it also provides the most mature metrics for audit and funding evaluation. CARE is applied where materials implicate community authority or sensitivities; it informs governance and access controls rather than identifier, metadata, and API design. This ordering ensures that the baseline corpus is maximally usable by scholars while allowing proportionate restrictions or negotiations where community interests warrant them.

Translating FAIR into engineering tasks produces three baseline requirements. First, the HDB-AHS will mint Archival Resource Keys (ARKs) and Digital Object Identifiers (DOIs) for each manifest [25,26], ensuring that citations remain valid if the physical server is replaced. Second, the platform will expose an open GraphQL endpoint and IIIF Search API [27,28] so that external tools (Voyant13, Recogito14, Symogih15) can query images and metadata without manual export. Third, every asset will carry a machine-readable Creative Commons licence, expressed both in human-readable form and as metadata in the IIIF manifest file, satisfying automated licence checkers already in use at Europeana and the Digital Public Library of America.
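Identifier minting can be illustrated with ARKs. The sketch below generates an opaque name from the NOID "betanumeric" alphabet and appends the NOID check character (a position-weighted sum modulo 29), which catches single-character transcription errors and transpositions of adjacent characters. The NAAN `12345` is the ARK placeholder reserved for examples; the project's real NAAN is not stated. DOIs, by contrast, are registered through an agency such as DataCite and are not sketched here.

```python
import secrets

# NOID "betanumeric" alphabet: digits plus consonants other than "l" and "y" (29 characters)
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"


def noid_check_char(s: str) -> str:
    """Position-weighted sum of character values modulo 29 (other characters count as 0)."""
    total = sum(pos * XDIGITS.index(ch) for pos, ch in enumerate(s, start=1) if ch in XDIGITS)
    return XDIGITS[total % 29]


def mint_ark(naan: str, length: int = 8) -> str:
    """Mint a random opaque ARK (traditional ark:/NAAN/name form) with a check character."""
    blade = "".join(secrets.choice(XDIGITS) for _ in range(length))
    base = f"{naan}/{blade}"
    return f"ark:/{base}{noid_check_char(base)}"


def verify_ark(ark: str) -> bool:
    """Recompute the check character over NAAN/name and compare with the last character."""
    base = ark.removeprefix("ark:/")
    return len(base) > 1 and noid_check_char(base[:-1]) == base[-1]
```

Opaque, randomly generated names avoid embedding titles or shelf marks in identifiers, so citations survive later recataloguing.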

Interoperability drives the technology stack. The project adopts the IIIF Presentation API 3.0 to encapsulate canvases, ranges, and annotations; this format is now the lingua franca for image-based scholarship, supported by viewers such as Mirador16 and Universal Viewer17. Metadata at the collection, manifest, and canvas levels will be serialised as JSON-LD, making each record a first-class Linked Open Data node.
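A minimal IIIF Presentation API 3.0 manifest for a single page might be assembled as below. It shows the language-map `label`, the `rights` property carrying a Creative Commons URI, and `viewingDirection` set to `right-to-left` for Arabic codices. The base URL, image URL, and labels are placeholders, and a production manifest would also reference an IIIF Image API service for deep zoom.

```python
import json


def build_manifest(base: str, labels: dict, image_url: str, width: int, height: int,
                   rights: str) -> dict:
    """One-canvas IIIF Presentation 3.0 manifest (all URLs are placeholders)."""
    canvas_id = f"{base}/canvas/p1"
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base}/manifest",
        "type": "Manifest",
        "label": {lang: [text] for lang, text in labels.items()},  # language map
        "rights": rights,
        "viewingDirection": "right-to-left",
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/page/image",
                    "type": "Annotation",
                    "motivation": "painting",  # the image is painted onto the canvas
                    "body": {"id": image_url, "type": "Image", "format": "image/jpeg",
                             "width": width, "height": height},
                    "target": canvas_id,
                }],
            }],
        }],
    }
```

Because the label is a language map, the same manifest can carry the Greek, English, and Arabic titles that the trilingual interface needs.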

Inclusivity further requires adherence to universal-design guidelines. The HDB-AHS targets full compliance with WCAG 2.2 Level AA18. That commitment dictates colour-contrast ratios, keyboard navigability, and semantic HTML augmented by ARIA roles. The trilingual interface will expose lang attributes at both page and span level so that assistive technologies load correct voice libraries and reading orders. Because Arabic page images include right-to-left scripts, the viewer will default to appropriate scrolling behaviour and offer a mirrored thumbnail strip to assist orientation. User-testing sessions will recruit participants with visual, motor, and cognitive impairments to validate design assumptions before deployment.
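The colour-contrast requirement can be checked programmatically. The sketch below implements the WCAG relative-luminance and contrast-ratio definitions, with the Level AA thresholds of 4.5:1 for normal text and 3:1 for large text. It uses the sRGB linearisation threshold 0.04045; some WCAG texts print 0.03928, but no 8-bit channel value falls between the two, so the results are identical.

```python
def _linear(c8: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    c = c8 / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """(L_lighter + 0.05) / (L_darker + 0.05), ranging from 1 to 21."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

For example, mid-grey `#767676` on white reaches about 4.54:1 and passes, whereas `#777777` reaches about 4.48:1 and fails, which is why such checks belong in automated tests rather than visual judgement.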

Accessible experience is engineered, not wished for. Concrete features include adjustable line spacing, font scaling, and a high-contrast colour theme that follows the operating-system preference exposed through the prefers-contrast and prefers-color-scheme media queries, with a manual toggle whose setting is stored in the user’s browser. IIIF annotations will supply alt-text for decorative elements and extended descriptions for complex facsimiles, ensuring that visually impaired users receive equivalent content. Keyboard shortcuts will let scholars paginate canvases, trigger search, and copy canonical URIs without touching a mouse, a key requirement for researchers using adaptive devices. Error-state messaging will follow the WCAG “Robust” principle, exposing role="alert" so that screen readers announce form-validation feedback promptly (W3C).
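The language-attribute and alert requirements can be combined in a small rendering helper, sketched below under the assumption of server-side HTML generation (the article does not name a front-end framework). Each message carries `lang` and `dir` so that assistive technologies switch voice and reading order, and the container uses `role="alert"`, which ARIA defines as an assertive live region. The message strings in the test are illustrative.

```python
from html import escape

RTL_LANGUAGES = {"ar"}  # extend if other right-to-left scripts are added


def localized_span(lang: str, text: str) -> str:
    """One message in one language, with explicit direction for bidi-safe rendering."""
    direction = "rtl" if lang in RTL_LANGUAGES else "ltr"
    return f'<span lang="{lang}" dir="{direction}">{escape(text)}</span>'


def alert_message(messages: dict) -> str:
    """Wrap per-language validation messages in an ARIA alert region."""
    spans = "".join(localized_span(lang, text) for lang, text in messages.items())
    return f'<div role="alert">{spans}</div>'
```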

Collectively, these design principles anchor the HDB-AHS in the most current thinking on open, responsible, and sustainable cultural data. By operationalising FAIR through identifiers, APIs, and licences, the project guarantees long-term findability and lawful reuse. By building multilingualism and accessibility into the first wireframes, it addresses inclusion upfront rather than retrofitting patches later. In doing so, the HDB-AHS sets a replicable benchmark for mid-sized institutions that wish to share heritage materials without compromising on technical rigor or community responsibility.

4. Methodological Implementation

Having defined the principles, we now translate them into an implementation plan. The development roadmap of the HDB-AHS is divided into six sequential phases that move a text from archival obscurity to a fully citable, machine-readable resource. Each phase has its own goals, staff mix, success metrics, and risk controls, yet all six are united by three overarching principles: adherence to FAIR practice, respect for copyright and community rights, and readiness for computational analysis. At the end of each phase, the steering committee confirms that dependencies are met before the next stage begins (Figure 1 summarises the workflow).

The first phase is selection. Its purpose is to establish which works enter the pipeline and in what order. The corpus target is 400 volumes representing three genres (chronicles, diplomatic letters, travelogues) across the eighth to sixteenth centuries.

The criteria are as follows:

Scholarly significance—the frequency of citation in recent research, representation in university syllabi, and appearance in reference bibliographies.
Rarity—the absence of modern critical editions or restricted physical access (e.g., single-copy manuscripts).
Physical condition—items at risk from ink corrosion or brittle paper receive priority, provided conservation labs approve safe handling.

A five-person acquisitions team will compile an initial longlist of 700 candidate works. A twelve-member international advisory board, comprising historians, Arabic philologists, and preservation experts, will rank each item against the criteria on a five-point scale. Scores will feed an algorithm that balances significance with feasibility, producing a short-list of 450 titles and a reserve pool for later calls.
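The ranking step might work as in the following sketch, which averages reviewers' five-point scores per criterion, applies weights, and excludes items whose handling has not been approved by conservation staff. The weights (0.5/0.3/0.2) and the record structure are hypothetical; the article does not publish its balancing algorithm.

```python
from statistics import mean

# Hypothetical weights: the article does not publish its balancing algorithm.
WEIGHTS = {"significance": 0.5, "rarity": 0.3, "condition": 0.2}


def composite(reviews):
    """Average each criterion (1-5 scale) across reviewers, then take a weighted sum."""
    return sum(w * mean(r[c] for r in reviews) for c, w in WEIGHTS.items())


def shortlist(candidates, n, reserve):
    """Rank feasible candidates; return the short-list and a reserve pool."""
    feasible = {t: c for t, c in candidates.items() if c["handling_approved"]}
    ranked = sorted(feasible, key=lambda t: composite(feasible[t]["reviews"]), reverse=True)
    return ranked[:n], ranked[n:n + reserve]
```

With the real longlist, `shortlist(candidates, 450, reserve)` would return the 450-title short-list together with a reserve pool of the chosen size.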

The output of the first phase will be a public “Selection White-paper” detailing the scoring methodology, ensuring transparency and inviting scholarly feedback before digitisation begins. Once priorities are set, we will identify specific witnesses and holding institutions.

The second phase is identification. Its purpose is to trace where each short-listed work survives (printed editions, manuscripts, microfilms) and to record its shelf marks, call numbers, or digital surrogates.

The tasks involved are as follows:

Harvesting records from Greek university libraries, the Hellenic Parliament Library19, and the Mount Athos archives20.
Querying European catalogues (Karlsruhe Virtual OPAC21, British Library Explore22).
Consulting Middle Eastern repositories through partner liaisons who can access non-public inventories.

The output of the second phase will be the union catalogue, updated monthly, which will feed the acquisition scheduler and become an open dataset that other projects can mine for gaps in Arabic textual preservation. With candidates in hand, we secure rights and logistics for acquisition.
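Assembling the union catalogue is essentially a group-by on a shared work identifier. The sketch below assumes that each harvested holding has already been mapped to hypothetical `work_id`, `institution`, `shelfmark`, and `format` fields; short-listed works with no located witness then fall out as the gaps that the open dataset is meant to expose.

```python
from collections import defaultdict


def build_union_catalogue(records):
    """Group holding records from several catalogues under a shared work identifier."""
    catalogue = defaultdict(list)
    for rec in records:
        catalogue[rec["work_id"]].append(
            {k: rec[k] for k in ("institution", "shelfmark", "format")}
        )
    return dict(catalogue)


def gaps(catalogue, shortlist):
    """Short-listed works with no located witness in any harvested catalogue."""
    return sorted(w for w in shortlist if w not in catalogue)
```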

The third phase is about acquisition. Its purpose is to obtain high-quality source images or physical access. For printed editions, the default request is for 400 dpi colour scans delivered via secure FTP; for manuscripts, the team negotiates either new photography or a time-bound loan for on-site digitisation. Memoranda of Understanding specify the copyright, resolution, colour-profile standards, and delivery timelines.

Standard permission letters, translated into Arabic, Greek, English, and French, clarify that images will be released under Creative Commons BY-NC unless the holder objects; in such cases, a custom licence with view-only restrictions is offered.

On receipt, the digital-curation team performs checksum verification, colour-bar analysis, and focus inspection. Items that fail are returned for reshoot within 30 days.

The output of the third phase will be a secure raw-asset repository, with immutable backups at two geographic locations, containing all received files and their MD5 hashes in line with preservation best practice. Once assets are received and verified, we will proceed to digitisation.
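The checksum verification on receipt can be sketched with Python's standard `hashlib`. This is a minimal illustration, assuming that holders deliver a plain-text manifest in the common `md5sum` layout (`<hex digest>  <filename>`); the function names `md5_of_file` and `verify_delivery` and the manifest filename are hypothetical, not part of existing HDB-AHS code.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_delivery(folder: Path, manifest_name: str = "manifest.md5") -> dict[str, str]:
    """Check every file listed in an md5sum-style manifest (hypothetical delivery format).

    Returns filename -> status, where status is "ok", "mismatch" or "missing".
    """
    results: dict[str, str] = {}
    for line in (folder / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with a leading '*'
        target = folder / name
        if not target.exists():
            results[name] = "missing"
        elif md5_of_file(target) != expected.lower():
            results[name] = "mismatch"
        else:
            results[name] = "ok"
    return results
```

Any file reported as "mismatch" or "missing" would then follow the reshoot route described above; the digests themselves can be stored alongside the raw assets.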

The fourth phase concerns digitisation. Its purpose is to transform raw images into preservation-quality masters, service derivatives, and searchable text. Scans will be converted to uncompressed TIFF masters at a minimum of 400 dpi; 600 dpi will be used for manuscripts with marginalia.
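The storage implications of this resolution policy can be estimated with simple arithmetic. The sketch below assumes an A4-sized original (8.27 × 11.69 inches), 24-bit RGB colour (3 bytes per pixel), and no compression or header overhead; real page sizes and higher bit depths will change the figures, and the helper name `tiff_master_bytes` is hypothetical.

```python
def tiff_master_bytes(width_in: float, height_in: float, dpi: int,
                      bytes_per_pixel: int = 3) -> int:
    """Approximate raw pixel-data size of an uncompressed raster master (no TIFF header)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


A4_INCHES = (8.27, 11.69)  # assumed page size, for illustration only

for dpi in (400, 600):  # 400 dpi default; 600 dpi for manuscripts with marginalia
    size = tiff_master_bytes(*A4_INCHES, dpi)
    print(f"{dpi} dpi: {size / 1e6:.1f} MB per page")
```

Because pixel count scales with the square of resolution, moving from 400 to 600 dpi multiplies storage by (600/400)² = 2.25, so under these assumptions a page grows from roughly 46 MB to roughly 104 MB.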

Printed Arabic texts will be run through Tesseract OCR23 using custom models trained on nineteenth-century typefaces. For handwritten manuscripts, Kraken OCR24 will be used with a model adapted from the IAM Arabic Handwriting Database [29] plus extra training on samples that we will transcribe semi-automatically. This workflow is intended to make the digital collection both reliable for preservation and fully searchable. Processed masters and derivatives will feed the platform build and APIs.
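Because OCR engines will be benchmarked before selection, a standard metric such as character error rate (CER) can compare each engine's output against the semi-automatically transcribed ground truth. The sketch below is a generic CER computed from Levenshtein edit distance; it is not taken from Tesseract or Kraken, and the function names are illustrative. Texts are NFC-normalised first so that equivalent Unicode sequences compare equal, and each Arabic diacritic counts as a separate character.

```python
import unicodedata


def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))  # distances from the empty reference prefix
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion of ref_char
                current[j - 1] + 1,                        # insertion of hyp_char
                previous[j - 1] + (ref_char != hyp_char),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the ground-truth reference."""
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Computing CER per engine on the same held-out pages would give a like-for-like basis for choosing between print and handwriting models.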

The fifth phase concerns website development. Its purpose is to deliver a responsive, trilingual platform that surfaces the content to users and machines alike.

Three moderated usability sessions, one per language community, will evaluate navigation and search clarity. The output of this phase will be a public beta of 100 items that demonstrates page rendering, search facets, and citation export in CSL-JSON, RIS, and BibTeX formats. Following internal testing, we will publish the beta and prepare the release.
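Citation export in two of the planned formats can be sketched from a single item record. The `ItemRecord` fields, the cite-key convention, the choice of the BibTeX `@misc` entry type and the RIS `GEN` reference type, and the placeholder DOI and URL are all assumptions for illustration; the project's metadata schema is not specified here.

```python
from dataclasses import dataclass


@dataclass
class ItemRecord:
    """Hypothetical minimal metadata for one library item."""
    cite_key: str
    authors: list[str]  # "Surname, Forename" order
    title: str
    year: str
    doi: str
    url: str


def to_bibtex(item: ItemRecord) -> str:
    """Render the record as a BibTeX @misc entry."""
    fields = {
        "author": " and ".join(item.authors),  # BibTeX separates names with "and"
        "title": "{" + item.title + "}",       # inner braces preserve capitalisation
        "year": item.year,
        "doi": item.doi,
        "url": item.url,
    }
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items())
    return f"@misc{{{item.cite_key},\n{body}\n}}"


def to_ris(item: ItemRecord) -> str:
    """Render the record as RIS tagged lines ("TAG  - value"), closed by ER."""
    lines = ["TY  - GEN"]
    lines += [f"AU  - {author}" for author in item.authors]
    lines += [f"TI  - {item.title}", f"PY  - {item.year}",
              f"DO  - {item.doi}", f"UR  - {item.url}", "ER  - "]
    return "\n".join(lines)


example = ItemRecord("placeholder2025", ["Surname, Forename"], "Placeholder chronicle",
                     "2025", "10.5555/placeholder", "https://example.org/item/1")
print(to_bibtex(example))
print(to_ris(example))
```

A CSL-JSON export would follow the same pattern, serialising a CSL item dictionary with `json.dumps`.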

The sixth phase concerns publishing and linking. Its purpose is to release stable, citable objects and weave them into the broader linked-data cloud. The standard licence will be CC BY-NC 4.0 for images; TEI transcriptions will carry CC BY-SA 4.0 to encourage remixing. Licence metadata will appear in IIIF manifests and in TEI headers.
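Exposing the licence in both places can be sketched as follows. In IIIF Presentation 3.0 the `rights` property holds a single licence URI (the specification's examples use the http:// form of Creative Commons URIs), and in TEI a licence is commonly recorded as `<licence>` inside `<availability>` within `<publicationStmt>`. The manifest identifier, labels, attribution wording, and helper names are hypothetical; the TEI namespace is omitted for brevity, and the manifest is deliberately incomplete, since a valid one must list at least one Canvas in `items`.

```python
import json
import xml.etree.ElementTree as ET

IMAGE_LICENCE = "http://creativecommons.org/licenses/by-nc/4.0/"  # images: CC BY-NC 4.0
TEXT_LICENCE = "http://creativecommons.org/licenses/by-sa/4.0/"   # TEI: CC BY-SA 4.0


def manifest_with_rights(manifest_id: str, label_en: str) -> dict:
    """Skeleton IIIF Presentation 3.0 manifest carrying the image licence."""
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": manifest_id,
        "type": "Manifest",
        "label": {"en": [label_en]},
        "rights": IMAGE_LICENCE,
        "requiredStatement": {  # hypothetical attribution wording
            "label": {"en": ["Attribution"]},
            "value": {"en": ["Hellenic Digital Library of Arabic Historical Sources"]},
        },
        "items": [],  # a real manifest must contain at least one Canvas here
    }


def tei_availability() -> str:
    """TEI <availability> fragment (namespace omitted) for a transcription header."""
    availability = ET.Element("availability")
    licence = ET.SubElement(availability, "licence", target=TEXT_LICENCE)
    licence.text = "Creative Commons Attribution-ShareAlike 4.0 International"
    return ET.tostring(availability, encoding="unicode")


if __name__ == "__main__":
    print(json.dumps(manifest_with_rights("https://example.org/iiif/item-001/manifest",
                                          "Example item"), indent=2))
    print(tei_availability())
```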

A one-click citation tool will generate Chicago, MLA, and ISO-690 references that embed the DOI and access date, encouraging correct attribution and easing the learning curve for students. The output of this phase will be a versioned, linked corpus ready for citation, computational analysis, and integration with external portals.

The six-phase plan charts a clear path from archive shelf to linked-data node. Each stage delivers tangible products—catalogues, images, code, metadata—that feed the next. By combining rigorous selection, rights-based acquisition, standards-driven digitisation, modular web development, and open publishing, the HDB-AHS sets up a workflow that others can copy or fork. The sequence is linear where dependencies demand it, yet flexible enough to absorb setbacks without jeopardising final delivery. The result will be a sustainable, interoperable Arabic-language digital library rooted in Greek scholarship but connected to a global community of users and builders.

Selection Scoring and Governance

We use a transparent multi-criteria scoring model (Equation (1)). Items are scored 0–5 against six factors; the weighted sum sets priority.

Factors and weights (initial): Scholarly value/rarity 0.35; Demand/teaching potential 0.15; Physical condition/risk 0.15; Rights and feasibility 0.15; Representational diversity 0.10; Cost/time to acquire 0.10.

(Weights adjustable by the advisory board by ±10% with rationale.)

Rubric (0–5): 0 = absent/unknown; 3 = meets baseline; 5 = exceptional/clear evidence. Evidence sources: catalogues, expert reports, rights records, and cost estimates.

Score:

$$\mathrm{Priority} = \sum_{i=1}^{6} w_{i} \cdot s_{i} \qquad (1)$$

Decision rule. An item is shortlisted if Priority ≥ 3.5 (out of 5) and neither the Rights nor the Condition factor scores 0. Ties are broken in favour of higher representational diversity or lower cost. All decisions and evidence are logged.
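Equation (1) and the decision rule can be sketched in a few lines. The weights are the initial values listed above; because they sum to 1.0, Priority stays on the 0–5 rubric scale. The factor keys and the `estimated_cost` field are hypothetical, and applying the diversity tie-break before the cost tie-break is an assumption about how the two criteria combine.

```python
from dataclasses import dataclass

# Initial weights from the selection model (they sum to 1.0)
WEIGHTS = {
    "scholarly_value": 0.35,
    "demand": 0.15,
    "condition_risk": 0.15,
    "rights_feasibility": 0.15,
    "diversity": 0.10,
    "cost_time": 0.10,
}


@dataclass
class Candidate:
    title: str
    scores: dict[str, int]  # rubric score 0-5 for each factor key in WEIGHTS
    estimated_cost: float   # hypothetical field, used only to break ties


def priority(c: Candidate) -> float:
    """Equation (1): weighted sum of the six factor scores."""
    return sum(WEIGHTS[factor] * c.scores[factor] for factor in WEIGHTS)


def is_shortlisted(c: Candidate, threshold: float = 3.5) -> bool:
    """Priority >= threshold, with a veto if Rights or Condition scores 0."""
    vetoed = c.scores["rights_feasibility"] == 0 or c.scores["condition_risk"] == 0
    return not vetoed and priority(c) >= threshold - 1e-9  # tolerance for float rounding


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Shortlisted items by priority; ties go to higher diversity, then lower cost."""
    shortlisted = [c for c in candidates if is_shortlisted(c)]
    return sorted(shortlisted, key=lambda c: (-round(priority(c), 6),
                                              -c.scores["diversity"],
                                              c.estimated_cost))
```

Keeping the weights in one table makes the ±10% adjustments permitted to the advisory board a one-line, auditable change.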

5. Legal and Ethical Considerations

Technical design alone is not sufficient; governance determines whether access is responsible. Digitising Arabic works for a Greek-hosted, open-access library is therefore not simply a technical challenge but a legal and ethical negotiation [30] that touches intellectual-property law, privacy regulation, and cultural stewardship. The HDB-AHS accordingly adopts a structured compliance strategy that addresses three recognised risk zones, implements a transparent rights–clearance workflow, aligns licences with project goals, meets GDPR expectations, and respects the CARE principles for materials whose originating communities lie outside the host nation.

5.1. Risk Zone A: Post-1925 Editions Still Under Copyright

Under European Union law, copyright in literary works lasts for 70 years after the death of the author or, for works of joint authorship, of the last surviving author [31]. Editions published after 1925 are thus frequently still in copyright, even when the text itself is many centuries old, because copyright covers the editor’s scholarly apparatus, layout, and critical notes. Relying on the “public domain” status of the original Arabic words alone exposes digitisation projects to infringement claims by modern publishers or heirs. For the HDB-AHS, the legal hazard is acute: several foundational Arabic chronicles were re-edited in Cairo and Damascus during the 1930s–1960s, and their critical introductions remain lucrative in academic reprint markets. To mitigate the risk, the selection process flags every post-1925 printing and routes it through a separate clearance track. Until written permission is secured, only bibliographic metadata and low-resolution cover thumbnails are displayed, avoiding any unlicensed dissemination.

5.2. Risk Zone B: Unpublished Manuscripts with Contested Ownership

A second category involves manuscripts whose physical custody does not clearly establish legal ownership [32,33]. Some codices travelled illegally during colonial rule; others belong to religious foundations that recognise neither state archives nor foreign libraries as legitimate gatekeepers. Digitising these items without community consent risks accusations of cultural appropriation and—even if no lawsuit follows—undermines trust. The HDB-AHS therefore treats provenance research as a first-order task. Before photography is commissioned, the rights officer assembles a dossier: a chain-of-custody timeline, any export licences, prior digitisation attempts, and statements from the holding institution. If gaps appear, the advisory board consults external experts and, where appropriate, reaches out to stakeholders such as mosque libraries or national heritage authorities in the manuscript’s country of origin. Only once “reasonable provenance certainty” is documented does digitisation advance; otherwise, the item remains dark-archived until resolution.

5.3. Risk Zone C: Culturally Sensitive Content

Some texts contain accounts of sectarian violence, sacred rituals, or medical recipes that communities regard as restricted knowledge [34,35,36]. Publishing such pages in full-resolution, zoom-enabled IIIF canvases can harm the very constituencies whose history the project seeks to preserve. Sensitivity is not always obvious: what appears to be a routine chronicle entry may, in certain contexts, reveal genealogical information used in identity disputes today. To anticipate harm, the HDB-AHS integrates a “sensitivity check” into its metadata workflow. Cataloguers mark passages flagged by advisers or community contacts; the technical team can then suppress OCR and deep-zoom for those folios, offering watermarked “reference images” or time-bound embargoes instead. The suppression decision is reversible and transparent: the manifest records the restriction reason and review date, maintaining scholarly accountability without exposing vulnerable content.
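The reversible suppression described above can be sketched as a small record that stores the restriction reason and review date and derives which services a folio may expose. The field names, the service labels, and the rule that a passed review date flags the folio for human re-review (rather than lifting the restriction automatically) are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FolioAccess:
    folio_id: str
    restricted: bool = False
    reason: Optional[str] = None        # recorded in the manifest for transparency
    review_date: Optional[date] = None  # when the restriction must be reconsidered


def exposed_services(access: FolioAccess) -> set[str]:
    """Services offered for a folio; restricted folios lose OCR text and deep zoom."""
    if not access.restricted:
        return {"deep_zoom", "ocr_text", "download"}
    return {"reference_image"}  # watermarked reference image only


def needs_review(access: FolioAccess, today: date) -> bool:
    """A passed review date triggers human review; restrictions are never lifted silently."""
    return (access.restricted and access.review_date is not None
            and today >= access.review_date)
```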

5.4. Rights–Clearance Workflow

The rights–clearance pipeline runs in four steps:

Initial rights survey: For every item on the short-list, the rights officer records publication date, author death date, editor death date, and current publisher or repository. A traffic-light flag (green for public domain, amber for uncertain, red for likely in-copyright) is stored in the union catalogue.
Stakeholder contact: Amber and red items trigger template letters—available in Arabic, Greek, English, and French—sent to publishers, heirs, or archive directors. The letter explains the non-profit scope, the intended Creative Commons licence, and the technical safeguards against commercial exploitation.
Negotiation and documentation: Where consent is granted, a Memorandum of Understanding specifies resolution, format, attribution wording, and revocation procedures. Where no response comes after 90 days, the dossier moves to fallback strategies: restricted images or metadata-only display.
Repository update: Clearance outcomes propagate automatically to the CMS. Only green-lit assets pass to the digitisation queue; embargoed or partial-access items receive a manifest bearing a striking “rights pending” icon and a summary note for users.
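The traffic-light flag and the 90-day fallback can be approximated from the recorded dates. The sketch below assumes the EU term of 70 years after death, counted to the end of the calendar year of death (so a work becomes free on 1 January of the 71st year), treats any unknown death date as amber, and treats any contributor still in copyright, or still living, as red. It is a triage aid under these assumptions, not a substitute for the rights officer's review of cases such as anonymous works or non-EU terms; all names are hypothetical.

```python
from datetime import date
from typing import Optional

TERM_YEARS = 70  # assumed term: life of the author plus 70 years


def public_domain_year(death_year: int) -> int:
    """First calendar year in which the contribution is free (term ends 31 Dec of death+70)."""
    return death_year + TERM_YEARS + 1


def rights_flag(death_years: list[Optional[int]], today: date, any_living: bool = False) -> str:
    """Flag over all recorded contributors (author, editor, ...); None = death date unknown."""
    if any_living:
        return "red"  # a living author or editor means the item is certainly protected
    if any(y is not None and public_domain_year(y) > today.year for y in death_years):
        return "red"
    if not death_years or any(y is None for y in death_years):
        return "amber"
    return "green"


def fallback_due(letter_sent: date, today: date, wait_days: int = 90) -> bool:
    """True once the no-response window has elapsed and fallback display applies."""
    return (today - letter_sent).days >= wait_days
```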

The workflow leverages the licensing guidance provided by Creative Commons itself, which urges licensors to “secure all rights necessary before applying our licenses so that the public can reuse the material as expected” (Creative Commons).

5.5. Licensing Strategy

The HDB-AHS applies CC BY-NC 4.0 to images and CC BY-SA 4.0 to TEI transcriptions. The NonCommercial clause on images permits reuse for non-commercial purposes such as research and teaching while excluding commercial exploitation, reflecting concerns expressed by holding institutions that high-resolution scans might feed commercial décor markets or for-profit app compilations. Scholarly projects, however, rarely object to share-alike conditions for text, and the smaller commercial value of TEI files justifies the more permissive CC BY-SA licence. Making licences explicit at the metadata layer spares downstream users from legal guesswork and aligns with FAIR’s “Reusable” requirement.

A non-commercial restriction remains necessary because many repositories demand assurances that digitisation will not erode potential revenue from print facsimile sales or paid research services. The advisory board will review the policy every two years; if the majority of partners become comfortable with open commercial use, the licence can be upgraded to plain CC BY.

5.6. GDPR Compliance

Although Arabic chronicles pre-date modern privacy rights, digitisation projects still process personal data: staff emails, contributor ORCID records, and user statistics from the public website. The EU General Data Protection Regulation (GDPR) sets out seven principles: lawfulness, fairness and transparency; purpose limitation; data minimisation; accuracy; storage limitation; integrity and confidentiality; and accountability [37,38]. The HDB-AHS addresses these requirements as follows:

Lawfulness and consent: Employment contracts, volunteer agreements, and mailing-list opt-ins include explicit clauses on data storage duration and access scope;
Purpose limitation and minimisation: Only data essential for project delivery (e.g., email for GitHub (https://github.com) access, IP addresses for security logs) are retained;
Storage limitation: Logs older than 180 days are anonymised; contributor contact details are deleted five years after final release unless renewed consent is given;
Integrity and confidentiality: All databases use field-level encryption for personally identifiable information; daily backups are encrypted at rest;
Accountability: A Data Protection Impact Assessment (DPIA) is filed with the Greek Data Protection Authority, and the project appoints a part-time Data Protection Officer.

Scholars downloading IIIF manifests encounter no tracking pixels; analytics run on Matomo with IP masking, keeping research behaviour private.
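The storage-limitation rule for logs can be sketched with Python's standard `ipaddress` module. The sketch assumes that anonymisation means truncating each address to a network prefix (dropping the last octet of IPv4, keeping only the first 48 bits of IPv6), similar in spirit to the IP masking offered by analytics tools such as Matomo; the log-entry structure and the prefix lengths are assumptions, not the project's configured values.

```python
import ipaddress
from dataclasses import dataclass, replace
from datetime import datetime, timedelta

RETENTION = timedelta(days=180)  # logs older than this are anonymised


@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime
    ip: str
    path: str


def mask_ip(ip: str) -> str:
    """Keep only the network prefix: /24 for IPv4, /48 for IPv6 (assumed lengths)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False).network_address)


def anonymise_old_entries(entries: list[LogEntry], now: datetime) -> list[LogEntry]:
    """Return entries with the IP masked wherever the entry is older than the retention window."""
    return [replace(e, ip=mask_ip(e.ip)) if now - e.timestamp > RETENTION else e
            for e in entries]
```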

5.7. Cultural Sensitivity and the CARE Principles

Open data practices can clash with community interests when the custodial institution sits outside the cultural group that produced the material. The CARE Principles (Collective Benefit, Authority to Control, Responsibility, Ethics) developed by the Global Indigenous Data Alliance shift the conversation from “open by default” to “as open as respectful”. The HDB-AHS adopts CARE in four ways:

Collective Benefit: Scans are repatriated digitally to partner institutions in Egypt, Syria, and Morocco, giving local scholars free, immediate access;
Authority to Control: Source repositories and, where appropriate, descendant communities can request takedown or licence adjustments through a documented governance channel;
Responsibility: Attribution chains record both the holding library and the originating cultural group, acknowledging layered custodianship;
Ethics: The sensitivity flagging system ensures that sacred or vulnerable narratives are not exposed without consultation.

CARE does not negate FAIR; it modifies implementation, reminding technologists that ethical access trumps frictionless openness when the two collide. By foregrounding CARE, the project positions itself as a respectful mediator rather than a neutral platform—a stance likely to foster long-term partnerships in the Middle East and North Africa.

Legal tightropes and ethical cross-currents cannot be wished away; they require a proactive framework [39]. The HDB-AHS meets the challenge with a layered approach: identifying risk zones early, codifying a multilingual rights–clearance workflow, choosing Creative Commons licences that balance access with protection, embedding GDPR safeguards into everyday operations, and aligning with CARE to honour source communities. This apparatus does more than keep lawyers satisfied; it builds the trust without which digitisation efforts fail. Researchers gain clarity on what they can reuse, contributors know their data are safeguarded, and cultural stakeholders retain agency over how their heritage circulates online. The result is a digital library that is not only technically sound but also legally and ethically robust—an essential precondition for sustainable, shared scholarship.

6. Expected Academic and Societal Impact

With governance in place, we indicate how scholars and instructors can use the resource during and after the beta. The HDB-AHS aims not only to aggregate scattered pages but to supply the intellectual “force multiplier” that Arabic studies has so far lacked. Its projected impact stretches across three concentric circles—academic research, teaching and learning, and the wider public sphere—each reinforced by concrete examples and measurable downstream ventures.

6.1. Accelerating Academic Research

At present, many scholars invest months locating half-legible microfilm or negotiating reading-room appointments in three countries before proper analysis can even begin. The HDB-AHS compresses that discovery phase into seconds: a single search bar returns facsimile images, diplomatic transcriptions, and machine-readable authority links. By removing the logistical drag, the platform liberates time for interpretation and hypothesis testing.

Comparative studies gain immediate traction. A Byzantinist can align an Arabic chronicle entry on the 1204 sack of Constantinople with corresponding Greek and Latin eyewitness accounts already available online, forming a multi-angle dossier without boarding a plane. Economic historians can ingest OCR-derived prices from Tunis or Alexandria into statistical notebooks alongside Venetian customs ledgers, expanding the data universe for integrated market models and price convergence tests.

Reproducible digital scholarship becomes the norm. Because every canvas, TEI file, and authority reference is version-controlled and citable via DOI, peers can verify a scholar’s path from image to argument. This transparency invites replication studies, meta-analysis, and collaborative layer-building, activities rarely feasible when primary sources reside on personal hard drives.

Interdisciplinary cross-fertilisation thrives. Linguists interested in lexical borrowing will harvest parallel passages; computer scientists training transformer models for right-to-left scripts will access ground-truth pairs; palaeoclimatologists will extract famine frequencies. The library thus acts as a shared laboratory where distinct disciplines find ready-to-use corpora.

6.2. Pedagogical Transformation

Digital primary sources are already reshaping classroom practice, but most Mediterranean history syllabi still lean heavily on European documents because they are easier to source online. The HDB-AHS redresses the balance with three pedagogical pay-offs.

The first is the provision of bilingual and trilingual modules. Multilingual metadata and tri-script navigation allow educators to construct assignments where students compare an Arabic travelogue with its medieval Greek epitome, building language skills while practising source criticism. A lecturer can provide stable IIIF links, letting learners annotate passages, bookmark folios, and export citation chains.

The second is inquiry-based learning. A Greek secondary-school teacher preparing a unit on the Crusader states might embed IIIF manifests showing Arabic depictions of Frankish garrisons alongside Western miniatures. Pupils could then debate differing narrative frames, learning how standpoint influences historiography. The activity would satisfy national curriculum goals for intercultural competence and digital literacy.

The third is low-bandwidth inclusivity. Downloadable zipped packages enable offline workshops in regions with unstable internet. A university in Upper Egypt can stage a corpus-training session using a local mirror, ensuring equitable participation in digital humanities regardless of bandwidth.
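A downloadable offline package could be assembled with Python's standard `zipfile` module. The file layout (`manifest.json`, `transcription.xml`, `README.txt`) and the helper name are hypothetical; image files would be added the same way but are omitted to keep the sketch small. For genuinely offline use, the manifest's image URLs would also have to point at a local mirror.

```python
import io
import json
import zipfile


def build_offline_package(manifest: dict, tei_xml: str, readme: str) -> bytes:
    """Bundle one item's IIIF manifest and TEI transcription into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # ensure_ascii=False keeps Arabic and Greek labels readable in the JSON file
        archive.writestr("manifest.json", json.dumps(manifest, ensure_ascii=False, indent=2))
        archive.writestr("transcription.xml", tei_xml)  # str content is stored as UTF-8
        archive.writestr("README.txt", readme)
    return buffer.getvalue()
```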

6.3. Societal Benefits and Public Discourse

Open access to historical evidence matters far beyond the ivory tower. When citizens can interrogate the past directly, they engage more critically with policy debates on identity, migration, and cultural exchange.

Regarding data-driven journalism, imagine a Greek data-journalist investigating medieval commodity flows to illuminate present-day grain-shipping routes. By calling the library’s API, they retrieve gold price quotations recorded in fifteenth-century Cairo customs registers. A Sankey diagram juxtaposes historical trade paths with modern shipping lanes, offering readers a visually compelling narrative about continuity and change in Mediterranean logistics.

Museum exhibits and virtual tours represent a growing trend [40,41]. Regional museums lacking extensive conservation budgets can embed high-resolution folios inside touch-screen kiosks, enhancing visitor experience without expensive loans. A travelling exhibition on Mediterranean cartography might juxtapose a Greek portolan chart with a digital facsimile of a medieval world map, underscoring cross-cultural cartographic knowledge.

Discussions about migration often pivot on selective readings of history. By surfacing chronicles that document centuries of intermarriage, religious coexistence, and mercantile mobility, the library injects nuance into contemporary dialogue, countering narratives that portray cultural boundaries as immutable.

6.4. Illustrative Scenarios

A first scenario is classroom embedding. Consider a teacher in Thessaloniki (Greece) who opens Mirador inside the school’s learning-management system and loads three canvases tagged “Acre 1191.” Students use the built-in drawing tool to mark strategic positions described in both Arabic and Frankish sources. Homework requires exporting annotations and reflections as CSV, reinforcing data-organisation skills.

A second scenario involves journalistic visualisation. A reporter writing for a major Athens (Greece) newspaper queries the API for all price listings of olive oil between 1350 and 1450. Using Python’s pandas library25, they generate a time-series graph that parallels spikes in modern energy-market volatility. The article credits the HDB-AHS, spreading awareness far beyond academia.

A third scenario involves community-driven annotation. A diaspora association in Jordan organises a hackathon to tag tribal names in a thirteenth-century biographical dictionary. New entities feed back into Wikidata, improving global knowledge graphs and demonstrating reciprocal benefit.

To illustrate the practical value of the library, consider the following examples:

A Greek high school student is assigned a project on the Crusades. Using the HDB-AHS, she accesses both Arabic and Greek accounts of the siege of Acre, compares the different perspectives, and presents her findings on cross-cultural understanding to her class.
A postgraduate researcher in economic history downloads a dataset of grain prices from fourteenth-century Arabic chronicles and integrates it with Venetian customs records. This allows him to test new hypotheses about Mediterranean market integration.
A community museum curator in Thessaloniki designs an interactive exhibit using IIIF manifests from the HDB-AHS. Visitors explore digitised Arabic travelogues and Greek port records side by side, learning how trade and migration shaped the city’s diverse identity over centuries.

6.5. Catalysing Downstream Projects

The corpus is suited to computational methods that demand scale.

Topic modelling: Once thousands of pages are tokenised, latent Dirichlet allocation (LDA) can surface thematic clusters (plague outbreaks, military taxation, pilgrimage logistics), guiding historians to research questions they might not have framed otherwise.
Geographic network analysis: Place-name extraction plus GeoNames coordinates enable the construction of trade and communication networks visualised in Gephi (https://gephi.org/). Scholars will plot shifting hubs from Umayyad Damascus to Ottoman Constantinople, exposing macro-patterns invisible in isolated studies.
Machine translation training: Parallel passages between Arabic originals and Greek paraphrases supply high-quality sentence pairs for transformer models, potentially boosting low-resource Arabic-Greek machine translation.
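The network-construction step behind the geographic analysis can be sketched without external libraries: count how often two place names co-occur in the same passage and write the result as a weighted edge list, which Gephi's spreadsheet import recognises through `Source`, `Target`, and `Weight` columns. The input format (one list of already-extracted place names per passage) and the function names are assumptions; place-name extraction and GeoNames lookup are outside the sketch.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_edges(passages: list[list[str]]) -> Counter:
    """Count undirected co-occurrences of distinct place names within each passage."""
    edges: Counter = Counter()
    for places in passages:
        # sorted(set(...)) de-duplicates and gives each pair a canonical (a, b) order
        for a, b in combinations(sorted(set(places)), 2):
            edges[(a, b)] += 1
    return edges


def edges_to_gephi_csv(edges: Counter) -> str:
    """Serialise the edges as CSV with Source/Target/Weight headers."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return out.getvalue()
```

Weighting edges by passage count, rather than by raw mention count, keeps a single repetitive passage from dominating the network.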

6.6. Boosting Greek Capacity and Regional Leadership

Greece has invested heavily in digitising classical and Byzantine heritage [42,43,44,45]; the HDB-AHS broadens that portfolio, illustrating national capability in minority-script preservation and open-linked data. Success will lay the groundwork for parallel initiatives:

Ottoman Turkish manuscripts—hosted in the same backend infrastructure, with scripts and OCR models tuned to Arabic-derived Ottoman hand.
Syriac Christian chronicles—forming a Semitic-language cluster that complements Arabic records and showcases Greece as a Mediterranean digital-heritage hub.

By demonstrating scalability, from acquisition pipelines to FAIR-compliant APIs, the project presents a turnkey blueprint for neighbouring institutions in the Balkans and Eastern Mediterranean. Joint grant proposals can ride on this momentum, injecting new resources into regional cultural infrastructures.

In academic spheres, the HDB-AHS radically shortens the “time-to-insight,” powering new comparative and quantitative scholarship. In classrooms, it enriches bilingual instruction and fosters digital literacy. In civil society, it supplies data that deepen the public understanding of cultural interdependence. Finally, it positions Greece as a credible leader in the digital stewardship of non-Greek sources, setting off a virtuous chain reaction that could revitalise heritage work for other understudied languages. We close by summarising the expected impact and the milestones that will trigger public access.

7. Conclusions and Future Work

The blueprint presented in this article demonstrates how a mid-sized institution can transform scattered Arabic chronicles, letters, and travelogues into a world-class digital library by aligning five pillars: open standards, rigorous rights management, inclusive design, evidence-based evaluation, and community partnership. Each pillar translates lofty principles into actionable tasks—minting DOIs, running GDPR-compliant analytics, flagging culturally sensitive folios, or embedding multilingual interface strings. Together they produce a pipeline that other heritage organisations can adapt, regardless of budget or locale.

Key achievements outlined across the previous sections include a clear six-phase workflow and a robust legal-ethical framework. These elements cohere into a repeatable model—select, identify, acquire, digitise, publish, evaluate—that any library or archive facing similar resource constraints can apply to their own collections, whether Ottoman court records or colonial Swahili newspapers.

To secure the resources needed for implementation, the research team will pursue funding from both national and international bodies. Applications will target Greek research grants as well as Horizon Europe calls that support digital cultural heritage. We will also engage foundations and private donors with an interest in Mediterranean studies, offering sponsorship packages that acknowledge their contributions in perpetuity.

Looking further ahead, we envision extending the same workflow to other underrepresented collections—Ottoman Turkish court records, Syriac monastic chronicles, or even North African oral histories. By documenting our processes and publishing the code as open source, we invite other institutions to adopt and adapt the model. In doing so, we hope the Hellenic Digital Library of Arabic Historical Sources becomes not just a repository but also a catalyst for a truly shared Mediterranean digital humanities landscape. Pending award and consortium finalisation, we will execute the project plan detailed above.

Author Contributions

Conceptualisation, E.K., C.P., E.L.L. and M.M.; methodology, E.K., C.P., E.L.L. and M.M.; software, C.P.; validation, E.K., C.P., E.L.L. and M.M.; formal analysis, E.K., C.P., E.L.L. and M.M.; investigation, E.K., C.P., E.L.L. and M.M.; resources, E.K., C.P., E.L.L. and M.M.; data curation, E.K., C.P., E.L.L. and M.M.; writing—original draft preparation, E.K., C.P., E.L.L. and M.M.; writing—review and editing, E.K., C.P., E.L.L. and M.M.; visualisation, E.K., C.P., E.L.L. and M.M.; supervision, C.P.; project administration, E.K., C.P., E.L.L. and M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations have been used in this manuscript:

Acronym	Term	How we use it
FAIR	Findable, Accessible, Interoperable, Reusable	Primary technical framework guiding identifiers, metadata, and APIs.
CARE	Collective Benefit, Authority to Control, Responsibility, Ethics	Governs community authority and access to sensitive materials.
IIIF	International Image Interoperability Framework	We will publish Presentation 3.0 manifests and image services.
OCR	Optical Character Recognition	We will benchmark tools (e.g., Tesseract for print; Kraken for handwriting) before selection.
TEI	Text Encoding Initiative	We will use a minimal header and selected modules for item-level description.
DOI	Digital Object Identifier	Persistent citation ID for items and releases.
ARK	Archival Resource Key	Internal persistence ID; resolves to the same landing page as the DOI.
API	Application Programming Interface	Public endpoints (IIIF; planned read APIs) to access data and metadata.

Notes

1	Authors’ note on the project stage: The HDB-AHS is currently in the pre-implementation stage. The team has completed corpus identification and designed the implementation phases and work-packages; the consortium and funding are being finalised. Technical activities (digitisation, OCR, platform deployment) will begin upon award, following a 36-month plan. This article should be read as a blueprint and impact case, not as a report of completed implementation.
2	https://www.europeana.eu/ (accessed on 5 May 2025).
3	https://developers.wellcomecollection.org/docs/iiif (accessed on 5 May 2025).
4	https://ymdi.uoregon.edu (accessed on 5 May 2025).
5	https://viaf.org/en (accessed on 5 May 2025).
6	https://www.geonames.org/ (accessed on 5 May 2025).
7	http://wamcp.bibalex.org/ (accessed on 5 May 2025).
8	https://www.ekt.gr/en/index (accessed on 5 May 2025).
9	https://www.fdd.org/events/2024/02/12/eastern-mediterranean-at-a-crossroads-the-future-of-regional-integration-and-alliances (accessed on 5 May 2025).
10	https://www.oeaw.ac.at/en/imafo/research/byzantine-research/byzantium-and-beyond/mobility-and-intercultural-contacts/prosopon (accessed on 5 May 2025).
11	https://www.loc.gov/item/2021666192/ (accessed on 5 May 2025).
12	https://pro.europeana.eu/page/linked-open-data-faq (accessed on 5 May 2025).
13	https://voyant-tools.org/ (accessed on 5 May 2025).
14	https://recogito.pelagios.org (accessed on 5 May 2025).
15	http://symogih.org/?q=node/78&lang=en (accessed on 5 May 2025).
16	https://projectmirador.org/ (accessed on 5 May 2025).
17	https://universalviewer.io/ (accessed on 5 May 2025).
18	https://www.levelaccess.com/blo (accessed on 5 May 2025).
19	https://library.parliament.gr/ (accessed on 5 May 2025).
20	https://athos.guide/en/encyclopedia-of-athos/archives-and-libraries?srsltid=AfmBOorv1GqVHZA_wLFn6CwtCarn8bBwftGJlrEmeLoLiAz4uB1IF8Qm (accessed on 5 May 2025).
21	https://bib.hwg-lu.de/en/search-and-find/catalogues (accessed on 5 May 2025).
22	https://www.bl.uk/ (accessed on 5 May 2025).
23	https://github.com/tesseract-ocr/tesseract (accessed on 5 May 2025).
24	https://kraken.re/main/index.html (accessed on 5 May 2025).
25	https://pandas.pydata.or (accessed on 5 May 2025).

References

Ekpenyong, A. Digital Humanities Scholarship: A Model for Reimagining Knowledge Work in the 21st Century. Divers. Divergence Dialogue 2021, 12645, 435–445.
Muslimova, M.; Mamedova, G.; Dzhukaeva, M. Digital Technology and Practices of Humanities Research. SHS Web Conf. 2023, 172, 05001.
Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018.
Komar, P. Imports and Market Integration in the Roman Mediterranean. J. Mediterr. Archaeol. 2024, 37, 54–76.
Kourtelis, C. A quarter-century of studying Euro-Mediterranean relations: A systematic literature review. Mediterr. Politics 2022, 29, 165–185.
Windhager, F.; Federico, P.; Schreder, G.; Glinka, K.; Dörk, M.; Miksch, S.; Mayr, E. Visualization of Cultural Heritage Collection Data: State of the Art and Future Challenges. IEEE Trans. Vis. Comput. Graph. 2019, 25, 2311–2330.
Borowiecki, K.; Navarrete, T. Digitization of heritage collections as indicator of innovation. Econ. Innov. New Technol. 2015, 26, 227–246.
Papakostas, C. Bridging Church History, Geopolitics, and Digital Education: A New Approach to Teaching Religious Heritage. Teach. Theol. Relig. 2025. Early View (First Published 16 June 2025).
Broekhuizen, T.; Gijsenberg, M.; Sloot, L.; Broekhuis, M.; Donkers, B.; Emrich, O. Digital platform openness: Drivers, dimensions and outcomes. J. Bus. Res. 2021, 122, 902–914.
Al-Shamayleh, A.; Haider, S.; Khalil, W.; Gani, A.; Akhunzada, A. Risk Factors and Practices for the Development of Open Source Software From Developers’ Perspective. IEEE Access 2023, 11, 63333–63350.
Lythreatis, S.; El-Kassar, A.; Singh, S. The digital divide: A review and future research agenda. Technol. Forecast. Soc. Change 2021, 175, 121359.
Picatoste, X.; Aceleanu, M.; Șerban, A.; Vasilescu, M.; Dimian, G. Digital divide, skills and perceptions on digitalisation in the European Union—Towards a smart labour market. PLoS ONE 2020, 15, e0232032.
Fan, J.; Zhang, W.; Kuang, Z.; Zhang, B.; Yu, J.; Lin, D. Leveraging Content Sensitiveness and User Trustworthiness to Recommend Fine-Grained Privacy Settings for Social Image Sharing. IEEE Trans. Inf. Forensics Secur. 2018, 13, 1317–1332.
Ramadan, A. The Treatment of Arab Prisoners of war in Byzantium, 9th–10th Centuries. Ann. Islam. 2009, 43, 155–194.
Durak, K. Performance and Ideology in the Exchange of Prisoners between the Byzantines and the Islamic Near Easterners in the Early Middle Ages. In Medieval and Early Modern Performance in the Eastern Mediterranean; Brepols Publisher: Turnhout, Belgium, 2014; pp. 167–180.
Söderberg, J. Grain Prices in Cairo and Europe in the Middle Ages. Res. Econ. Hist. 2006, 24, 189–216.
Hoyland, R. EXCURSUS B: The Byzantine-Arab Chronicle of 741 and Its Eastern Source. In A Survey and Evaluation of Christian, Jewish and Zoroastrian Writings on Early Islam; Gorgias Press: Piscataway, NJ, USA, 2019; pp. 477–494.
König, D.G. Latin literature and the Arabic language. In A Millennium Heritage; Stella, F., Doležalová, L., Shanzer, D., Eds.; John Benjamins Publishing Company: Amsterdam, The Netherlands, 2024; Chapter 17; pp. 284–295.
Carroll, S.R.; Garba, I.; Figueroa-Rodríguez, O.L.; Holbrook, J.; Lovett, R.; Materechera, S.; Parsons, M.; Raseroka, K.; Rodriguez-Lonebear, D.; Rowe, R.; et al. The CARE Principles for Indigenous Data Governance. Data Sci. J. 2020, 19, 43.
Congrong, X. The Impact of the Battle of Manzikert on the Late Byzantine Empire and Balkan Issues. Int. Theory Pract. Humanit. Soc. Sci. 2025, 2, 376–392.
Bartusis, M. The Byzantine empire and the Balkans. In The Cambridge History of War; Cambridge University Press: Cambridge, UK, 2020; pp. 429–448. [Google Scholar] [CrossRef]
Çalık, Z.A. Forging Cosmopolitan Networks: Muslim Ottoman Merchants in Trieste’s Mediterranean Trade Network in the Late-Eighteenth and the Early-Nineteenth Centuries. Mediterr. Stud. 2025, 33, 70–97. [Google Scholar] [CrossRef]
Ossiannilsson, E. Open educational resources (OER) and some of the United Nations sustainable development goals. Int. J. Inf. Learn. Technol. 2023, 40, 548–561. [Google Scholar] [CrossRef]
Santos-Hermosa, G. Impact and implementation of UNESCO’s Recommendation on Open Educational Resources in academic libraries: SPARC Europe Case Study. Res. Learn. Technol. 2024, 32, 3183. [Google Scholar] [CrossRef]
Kelly, M.; Greenberg, J.; Rauch, C.B.; Grabus, S.; Boone, J.P.; Kunze, J.A.; Logan, P.M. A Computational Approach to Historical Ontologies. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Virtual, 10–13 December 2020; pp. 1878–1883. [Google Scholar] [CrossRef]
Freire, N.; Manguinhas, H.; Isaac, A.; Charles, V. Persistent Identifier Usage by Cultural Heritage Institutions: A Study on the Europeana.eu Dataset. In Linking Theory and Practice of Digital Libraries, In Proceedings of the 27th International Conference on Theory and Practice of Digital Libraries, TPDL 2023, Zadar, Croatia, 26–29 September 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 341–348. [Google Scholar] [CrossRef]
Nishioka, C.; Nagasaki, K. Understanding IIIF image usage based on server log analysis. Digit. Scholarsh. Humanit. 2021, 36, 210–221. Available online: https://consensus.app/papers/understanding-iiif-image-usage-based-on-server-log-nagasaki-nishioka/bdb48337cb8b5c71b59a36421464c85e/ (accessed on 26 January 2025). [CrossRef]
Qi, L.; He, Q.; Chen, F.; Zhang, X.; Dou, W.; Ni, Q. Data-Driven Web APIs Recommendation for Building Web Applications. IEEE Trans. Big Data 2022, 8, 685–698. [Google Scholar] [CrossRef]
Al-Ma’adeed, S.; Elliman, D.; Higgins, C. A Data Base for Arabic Handwritten Text Recognition Research. Int. Arab J. Inf. Technol. 2024, 1, 38–42. [Google Scholar]
Papakostas, C. Artificial Intelligence in Religious Education: Ethical, Pedagogical, and Theological Perspectives. Religions 2025, 16, 563. [Google Scholar] [CrossRef]
Hutukka, P. Copyright Law in the European Union, the United States and China. IIC-Int. Rev. Intellect. Prop. Compet. Law 2023, 54, 1044–1080. [Google Scholar] [CrossRef]
King, J. Inscriptions and Ways of Owning Books among the Sisters of Syon Abbey. Rev. Engl. Stud. 2021, 72, 836–859. [Google Scholar] [CrossRef]
Yanson, R. Nathan W. Hill: The Historical Phonology of Tibetan, Burmese, and Chinese. Bull. Sch. Orient. Afr. Stud. 2020, 83, 166–168. [Google Scholar] [CrossRef]
Smith, R.; Snow, P.; Serry, T.; Hammond, L. The Role of Background Knowledge in Reading Comprehension: A Critical Review. Read. Psychol. 2021, 42, 214–240. [Google Scholar] [CrossRef]
Cabell, S.; Hwang, H. Building Content Knowledge to Boost Comprehension in the Primary Grades. Read. Res. Q. 2020, 55, S99–S107. [Google Scholar] [CrossRef]
McCarthy, K.; McNamara, D. The Multidimensional Knowledge in Text Comprehension framework. Educ. Psychol. 2021, 56, 196–214. [Google Scholar] [CrossRef]
Hoofnagle, C.; Van Der Sloot, B.; Borgesius, F. The European Union general data protection regulation: What it is and what it means. Inf. Commun. Technol. Law 2019, 28, 65–98. [Google Scholar] [CrossRef]
Tamburri, D. Design principles for the General Data Protection Regulation (GDPR): A formal concept analysis and its evaluation. Inf. Syst. 2020, 91, 101469. [Google Scholar] [CrossRef]
Papakostas, C. Faith in Frames: Constructing a Digital Game-Based Learning Framework for Religious Education. Teach. Theol. Relig. 2024, 27, 137–154. [Google Scholar] [CrossRef]
Papakostas, C.; Troussas, C.; Krouska, A.; Mylonas, P.; Sgouropoulou, C. Utilizing Fuzzy Weights for Enhanced User Experience in Virtual Museums. In Proceedings of the 2024 19th International Workshop on Semantic and Social Media Adaptation & Personalization (SMAP), Athens, Greece, 21–22 November 2024; pp. 25–31. [Google Scholar] [CrossRef]
Strousopoulos, P.; Papakostas, C.; Troussas, C.; Krouska, A.; Mylonas, P.; Sgouropoulou, C. SculptMate: Personalizing Cultural Heritage Experience Using Fuzzy Weights. In UMAP ’23 Adjunct: Adjunct Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization, Limassol, Cyprus, 26–29 June 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 397–407. [Google Scholar] [CrossRef]
Marsili, G.; Orlandi, L.M. Digital Humanities and Cultural Heritage Preservation. Stud. Digit. Herit. 2020, 3, 144–155. [Google Scholar] [CrossRef]
Stamou, A.; Nassis, C.; Chrysafi, E.; Sylaiou, S.; Kaya, G.; Sarlak, E.; Ribolov, S.; Karavaltchev, V.; Constantinides, A.; Belk, M.; et al. Preserving Ecclesiastical Cultural Heritage of Thrace: A Needs Analysis for Digital Recording in Monasteries and Temples. Heritage 2025, 8, 66. [Google Scholar] [CrossRef]
Kantaros, A.; Soulis, E.; Alysandratou, E. Digitization of Ancient Artefacts and Fabrication of Sustainable 3D-Printed Replicas for Intended Use by Visitors with Disabilities: The Case of Piraeus Archaeological Museum. Sustainability 2023, 15, 12689. [Google Scholar] [CrossRef]
Bellia, A. Towards a Digital Approach to the Listening to Ancient Places. Heritage 2021, 4, 2470–2480. [Google Scholar] [CrossRef]

Figure 1. HDB-AHS six-phase implementation plan.

Table 1. Comparison with related initiatives.

Project	Primary Scope	Focus/Holdings	Content Type	Standards/Access	What HDB-AHS Adds
Qatar Digital Library (QDL)	Gulf history and Arabic-language heritage	British Library and partners	Digitised manuscripts, archives, maps	IIIF viewer and rich metadata; bilingual interface	Greece-anchored aggregation across Greek repositories; trilingual (GR/AR/EN); selection scoring + DOI/ARK policy; teaching kits for Greek curricula
Digital Muṣḥaf Project	Qur’anic manuscripts	Specialist collections	High-quality facsimiles and codicological data	Public viewing emphasis	Broader corpus beyond Qur’an; programmatic access plan (IIIF v3 + APIs); integration with Greek catalogues
Open Islamicate Texts Initiative (OpenITI)	Machine-readable texts (Arabic, Persian, etc.)	Large text corpora	OCR/edited text, not page images	Text formats and NLP pipelines	Image-first library with IIIF manifests; paired text where feasible; OCR benchmarking protocol
Fihrist (union catalogue, UK)	Islamic manuscripts catalogue	Multiple UK libraries	Descriptive records (little or no images)	Catalogue aggregation	Union catalogue plus images, manifests, and persistent IDs; Greece-based holdings
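Several rows of Table 1 rest on the same technical combination: image-first delivery through IIIF manifests, programmatic access via IIIF v3, and a trilingual (GR/AR/EN) interface. Because HDB-AHS is presented as a pre-implementation proposal, the following is not the project's code; it is a minimal sketch, under stated assumptions, of how one digitised item could be exposed as a IIIF Presentation API 3.0 manifest whose label is a single language map keyed by the BCP 47 tags el, en and ar. The base URL, the item identifier and page sizes, the helper names `language_map` and `build_manifest`, the CC BY rights URI and the right-to-left default are all hypothetical placeholders.

```python
import json

# Hypothetical base URL: HDB-AHS is pre-implementation and has no public endpoint yet.
BASE = "https://example.org/hdb-ahs/iiif"


def language_map(el: str, en: str, ar: str) -> dict:
    """Return a IIIF v3 language map (BCP 47 tag -> list of strings)."""
    return {"el": [el], "en": [en], "ar": [ar]}


def build_manifest(item_id: str, labels: tuple, pages: list,
                   viewing_direction: str = "right-to-left") -> dict:
    """Build a minimal IIIF Presentation 3.0 manifest for one item.

    labels: (Greek, English, Arabic) titles.
    pages:  (width, height) in pixels of each page image, in reading order.
    """
    canvases = []
    for n, (width, height) in enumerate(pages, start=1):
        canvas_id = f"{BASE}/{item_id}/canvas/p{n}"
        # Hypothetical IIIF Image API 3.0 service for this page image.
        service_id = f"{BASE}/image/{item_id}-p{n}"
        image = {
            "id": f"{service_id}/full/max/0/default.jpg",
            "type": "Image",
            "format": "image/jpeg",
            "width": width,
            "height": height,
            "service": [{"id": service_id, "type": "ImageService3",
                         "profile": "level1"}],
        }
        # A "painting" annotation places the page image on its canvas.
        annotation = {
            "id": f"{canvas_id}/annotation/image",
            "type": "Annotation",
            "motivation": "painting",
            "body": image,
            "target": canvas_id,
        }
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{"id": f"{canvas_id}/page", "type": "AnnotationPage",
                       "items": [annotation]}],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{BASE}/{item_id}/manifest",
        "type": "Manifest",
        "label": language_map(*labels),
        # Assumed licence; the real value would come from rights clearance.
        "rights": "http://creativecommons.org/licenses/by/4.0/",
        # Default to right-to-left paging, as in most Arabic codices.
        "viewingDirection": viewing_direction,
        "items": canvases,
    }


if __name__ == "__main__":
    manifest = build_manifest(
        "sample-0001",  # placeholder identifier, not a real holding
        ("Δείγμα χειρογράφου", "Sample manuscript", "مخطوطة نموذجية"),
        [(2000, 3000), (2000, 3000)],
    )
    print(json.dumps(manifest, ensure_ascii=False, indent=2))
```

Keeping the three titles in one language map, rather than publishing a separate manifest per language, gives each item a single manifest URI (and so a single target for a DOI or ARK), while a IIIF v3 client can still display the label in the user's preferred language.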

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
