Enhancing Trait Thesauri Interoperability Using a Manual and Automated Alignment Approach

Titocci, Jessica; Pulieri, Martina; Rosati, Ilaria; Karam, Naouel

doi:10.3390/app152312484

¹ Research Institute on Terrestrial Ecosystems (IRET), National Research Council of Italy (CNR), Campus Ecotekne, 73100 Lecce, Italy

² LifeWatch Italy, 73100 Lecce, Italy

³ Department of Biological and Environmental Sciences and Technologies, University of Salento, Campus Ecotekne, 73100 Lecce, Italy

⁴ Institute for Applied Informatics (InfAI), University of Leipzig, 04109 Leipzig, Germany

* Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(23), 12484; https://doi.org/10.3390/app152312484

Submission received: 2 October 2025 / Revised: 3 November 2025 / Accepted: 15 November 2025 / Published: 25 November 2025

(This article belongs to the Special Issue Current Advances in Intelligent Semantic Technologies)

Abstract

Over the past decade, trait data collection and mobilisation have expanded significantly, yet much of this data remains only partially compliant with FAIR principles. A major challenge lies in the inconsistent use of standards for harmonising heterogeneous trait data, along with the diversity, redundancy, and poor alignment of the semantic artefacts developed to address this challenge. This study presents an approach to enhance the interoperability of the Trait Thesauri developed within the LifeWatch Italy research infrastructure for annotating and standardising trait data and metadata of aquatic organisms. The approach combines manual and automated alignment techniques, tested within the 2023 Ontology Alignment Evaluation Initiative. Domain experts manually aligned the Phytoplankton, Zooplankton, Macroalgae, Macrozoobenthos, and Fish trait thesauri, while five matching tools (LogMap, LogMapKG, LogMapLt, Matcha, and OLaLa) were tested for automated mappings. Both approaches revealed significant overlap among thesauri: manual mapping identified 160 cross-thesauri correspondences and served as a benchmark for evaluating automated matching systems. Automated tools showed variable performance, and OLaLa achieved the best automated alignment results, with an F-measure of 0.93. Challenges in alignment included varying linguistic expressions and differing levels of concept specificity. The results highlight the importance of combining automation with expert validation to ensure mapping quality, and they enabled the development of a unified Trait Thesaurus, which integrates approximately 500 harmonised concepts, reducing redundancy and enhancing FAIR compliance in ecological and trait-based research.

Keywords:

semantic interoperability; FAIR; SKOS thesauri; species traits; semantic mapping; ontology alignment

1. Introduction

Trait-based approaches are connected within and across disciplines, and they can be used consistently to ask ecological questions at the individual, community, and ecosystem levels, promoting comparability across spatial, temporal, and organisational scales [1]. Trait-based research has developed rapidly in recent years, with a growing research interest in describing and documenting key species traits, trait–environment relationships, and trait dynamics. This research seeks to gain a clearer mechanistic understanding of the relationship between biodiversity and ecosystem functioning and to predict trait changes and functional dynamics under global change scenarios [2,3,4]. This has led to the production of a vast number of trait datasets and databases [5,6,7,8,9,10,11,12], created by direct trait measurements or by extracting, collating, and integrating species traits from the existing literature and multiple online sources. Despite the firm establishment of trait-based approaches in ecological research, there remain widespread misunderstandings and disagreements over how ecologists define the terms “trait” and “functional trait” and, consequently, over how these terms are applied among and within subdisciplines and study systems [13]. Efforts to define species traits unambiguously and standardise the entire trait measurement process have been limited. Frequently, trait data are scattered across data repositories and/or coded inconsistently, mainly because of the heterogeneous ways in which trait information is collected and reported. The resulting datasets are not fully compliant with the FAIR principles (Findable, Accessible, Interoperable, and Reusable) [14], which hinders the accuracy and comparability of global and cross-disciplinary trait-based studies. The FAIR principles provide a globally recognised framework for enhancing the quality, findability, accessibility, interoperability, and reusability of scientific data and other research products [15].
In particular, data interoperability and reusability depend on the use of shared semantic artefacts that enable both humans and machines to interpret and integrate information consistently. Semantic artefacts, such as ontologies, controlled vocabularies, and thesauri, therefore, play a central role in operationalising the FAIR implementation practices. They establish formal, standardised representations of concepts and their relationships, supporting automated data integration and facilitating cross-domain interoperability [16]. Recent developments, such as the FAIR Implementation Profiles (FIPs), provided by the GO FAIR working group [17,18,19], have further advanced this perspective by offering an online resource that documents the FAIR implementation choices made by different domain communities, including the semantic artefacts they have adopted.

To address trait-data management challenges and improve FAIRness, particularly the interoperability of trait-based data, scientists have started to produce and use semantic artefacts such as ontologies, thesauri, and controlled vocabularies, along with common data and metadata schemes. These resources aim to improve indexing and facilitate data retrieval, ultimately enhancing trait discoverability, accessibility, and interoperability [20]. While common data standards like “Darwin Core” (DwC) [21] and the “Ecological Trait-data Standard” (ETS) [22] have played a crucial role in advancing semantic and syntactic interoperability in biodiversity research, they are not designed to describe specific traits and attributes and must be used alongside trait thesauri, which provide controlled vocabularies for trait descriptions. Therefore, there is a growing need for more specific and domain-oriented trait thesauri and ontologies to improve trait-data annotation and machine readability and actionability.

In this context, experienced researchers launched a collaborative initiative in 2015 within LifeWatch Italy (https://www.lifewatchitaly.eu, accessed on 2 October 2025), the Italian Distributed Centre of the European e-Science Research Infrastructure on biodiversity and ecosystem research, with the aim of creating trait-related semantic artefacts (thesauri) for various groups of aquatic organisms. This initiative led to the development of the Phytoplankton, Zooplankton, Fish, Macroalgae, and Macrozoobenthos trait thesauri [23,24]. These thesauri provide standardised names, alternative labels, and definitions for demographic, morphological, physiological, behavioural, and life history traits. Similar initiatives have been undertaken for other terrestrial and aquatic organisms, such as soil invertebrates [25], marine species (Polytraits [26], Marine Species Traits, and SeaLifeBase), and plants (Thesaurus Of Plant Characteristics). As a result, the original goal of addressing data heterogeneity in trait-based research has inadvertently led to the creation and proliferation of a wide range of vocabularies that currently lack connectivity and a high degree of interoperability. Therefore, the challenge of managing semantic heterogeneity across different trait-based information sources remains unresolved. Interoperability among semantic artefacts can be achieved through traditional methods, which rely heavily on manual mapping and require significant intellectual effort from domain experts and bioinformaticians, or through Semantic Web technologies. In particular, ontology matching systems automatically find correspondences between two or more semantic artefacts, providing automatic alignments and a confidence value for each correspondence in the mapping.
By establishing semantic relationships between concepts or classes in different semantic artefacts, semantic mappings help bridge the gap between different vocabularies and ensure that similar labels and/or meanings are appropriately matched and aligned. Thus, semantic mappings represent solutions that automate the integration of information sources and overcome the problem of semantic heterogeneity.

Functionalities for semantic mappings are already provided by existing semantic artefact catalogues such as Linked Open Vocabularies (LOVs) [27], BioPortal [28], AgroPortal [29], EarthPortal [30], and EcoPortal [31]. However, these catalogues primarily act as platforms for collecting, hosting, serving, and enabling the reuse of semantic artefacts while relying on automatic mapping tools, largely based on lexical similarity of preferred names and synonyms of the concepts (e.g., LOOM for the OntoPortal instances [32,33]) for semantic artefacts alignment. To address the deeper semantic heterogeneity typical of ecological and trait-based vocabularies, dedicated approaches that combine automated semantic artefact alignment techniques with expert-driven validation of semantic mappings are required.

Each year, the Ontology Alignment Evaluation Initiative (OAEI) promotes semantic mapping and alignment between semantic artefacts across disciplines through the systematic evaluation of ontology alignment systems. Since 2018, the OAEI Biodiversity and Ecology (biodiv) track has been conducted to find pairwise alignments between ontologies and thesauri useful for biodiversity and ecology research [34]. In this study, we address the fragmentation and lack of interoperability among existing trait-based semantic artefacts by combining manual expert mapping with automated semantic artefact alignment techniques. We focused on the LifeWatch Italy trait thesauri, a set of domain-specific vocabularies developed independently for different aquatic organism groups. We first conducted a comprehensive expert-based review of the hierarchical structure, content, and relationships among the trait terms included in the trait thesauri. This review involved the use of a manual mapping approach, which led to the discovery of significant semantic overlaps and several inconsistencies in the lexicalization of trait terminologies, particularly with regard to labels, definitions, and the proliferation of identical concepts across different thesauri with different URIs, compromising ISO/SKOS integrity conditions [35]. Following this manual assessment, the trait thesauri were submitted to the OAEI 2023 campaign [36] to evaluate the potential of automated ontology alignment. Five ontology matching systems were tested to assess their ability to detect semantic correspondences across trait thesauri and support harmonisation among trait concepts. 
The combined insights and results from the manual and automated alignment approaches applied to the LifeWatch Italy trait thesauri encouraged and supported the development of a unified semantic artefact, the “Traits Thesaurus”, which integrates trait concepts from multiple thesauri into a coherent and interoperable structure, as a unique thesaurus for data and metadata annotation for aquatic organism traits. The Traits Thesaurus serves as the basis for improving the information linkage within each of the LifeWatch Italy trait thesauri, bringing them to a higher level of maturity, greatly facilitating the extrapolation of trait-based data, and supporting FAIR applications for the annotation of trait-based data and metadata on aquatic organisms.

2. Materials and Methods

2.1. SKOS Thesauri

The LifeWatch Italy thesauri include the Phytoplankton, Zooplankton, Fish, Macroalgae, and Macrozoobenthos trait thesauri (Table 1). All are openly available on EcoPortal (https://ecoportal.lifewatch.eu, accessed on 2 October 2025), the LifeWatch ERIC repository of semantic artefacts for ecology, with the exception of the thesaurus for macrozoobenthos traits, which has not yet been published. The thesauri have evolved over time and were developed through the interdisciplinary collaboration of experts from domain-specific research groups using information and communication technologies, resulting in different levels of maturity and numbers of instances while maintaining a consistent structure. They were developed following the SKOS data model, first using TemaTres, an open-source, web-based thesaurus management application [37], and then improved and edited using VocBench 3 [38], a web-based collaborative development platform for managing SKOS thesauri, OWL ontologies, and RDF datasets. The thesauri are in English, and each thesaurus comprises a single concept scheme (skos:ConceptScheme) that organises a set of concepts (instances of skos:Concept), each identified by a unique and resolvable Uniform Resource Identifier (URI). The core structure of each concept scheme includes the top-level concept “Trait”, which branches into broad categories such as “Functional trait”, “Demographic Trait”, “Behavioural Trait”, “Morphological Trait”, “Physiological Trait”, “Life-history Trait”, and “Phenological Trait”. These categories, common to all trait thesauri, are populated with specific trait concepts relevant to the respective functional groups of interest (e.g., phytoplankton, zooplankton, fish). Each concept is labelled with a descriptor term (skos:prefLabel), alternative labels (skos:altLabel), and a definition (skos:definition).
All trait concepts have hierarchical and associative relationships with other concepts in the same thesaurus, and additional trait-related information is documented by various types of annotations (skos:note, skos:scopeNote, skos:historyNote, skos:example, etc.).
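The structure described above can be pictured with a small in-memory model. The following is only a minimal sketch: the `Concept` class, the `path_to_top` helper, the `example.org` URIs, the placeholder definition, and the placement of “Dry Mass” under “Morphological Trait” are all assumptions made for illustration and are not taken from the published thesauri.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """Minimal stand-in for a skos:Concept (illustrative only)."""
    uri: str
    pref_label: str                                       # skos:prefLabel
    alt_labels: List[str] = field(default_factory=list)   # skos:altLabel
    definition: str = ""                                  # skos:definition
    broader: Optional[str] = None                         # URI of the skos:broader concept


def path_to_top(uri: str, concepts: Dict[str, Concept]) -> List[str]:
    """Follow skos:broader links and return preferred labels up to the top concept."""
    path = []
    seen = set()  # guards against accidental cycles in the hierarchy
    while uri is not None and uri not in seen:
        seen.add(uri)
        concept = concepts[uri]
        path.append(concept.pref_label)
        uri = concept.broader
    return path


# Hypothetical URIs and a placeholder definition, for illustration only.
BASE = "https://example.org/traits/"
concepts = {
    c.uri: c
    for c in [
        Concept(BASE + "c_1", "Trait"),
        Concept(BASE + "c_2", "Morphological Trait", broader=BASE + "c_1"),
        Concept(BASE + "c_3", "Dry Mass", alt_labels=["Dry Weight"],
                definition="Placeholder definition.", broader=BASE + "c_2"),
    ]
}

print(path_to_top(BASE + "c_3", concepts))
```

Walking the `broader` links from a specific trait up to “Trait” is the kind of hierarchical context that experts could consult during manual comparison.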

2.2. Manual Mapping Approach

Manual mapping was performed by a team of four experts in ecology and the functional traits of aquatic organisms, together with two computer scientists specialized in semantic technologies. Each expert independently carried out pairwise comparisons among concepts included in the Phytoplankton, Zooplankton, Macrozoobenthos, Macroalgae, and Fish trait thesauri, considering three main SKOS properties: preferred label (skos:prefLabel), alternative label (skos:altLabel), and definition (skos:definition). The SKOS mapping properties skos:exactMatch, skos:closeMatch, and skos:relatedMatch were used to create and specify mapping relations between SKOS concepts present in different concept schemes of each thesaurus.

The relation skos:exactMatch was used when concepts had the same preferred label and definition. The looser relation skos:closeMatch was used instead of skos:exactMatch in most cases and when some alignments were ambiguous, e.g., in the absence of more detailed information such as scope notes, or when concepts were sufficiently similar with slightly different preferred labels and definitions. skos:relatedMatch was used to indicate an associative mapping relationship between two concepts, even though they had different preferred labels and/or definitions. All mappings and semantic relationships provided by experts were then compared in dedicated reconciliation sessions, where disagreements were discussed and resolved by consensus with the support of computer scientists. This two-step process (independent annotation followed by reconciliation) ensured both reproducibility and robustness by reducing the influence of individual biases.
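The decision rules above can be approximated in code, although the actual choices were made by experts and involved judgement that no simple rule captures. The sketch below is a deliberately crude first-pass suggestion: the function names and the dictionary layout are hypothetical, and string equality stands in for the experts' assessment of "same" or "similar".

```python
def normalise(text):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def suggest_relation(a, b):
    """Suggest a SKOS mapping property for two concepts.

    a, b: dicts with 'pref' (preferred label) and 'definition' keys.
    Returns a property name, or None when only an expert can decide.
    """
    same_label = normalise(a["pref"]) == normalise(b["pref"])
    same_definition = normalise(a["definition"]) == normalise(b["definition"])
    if same_label and same_definition:
        return "skos:exactMatch"
    if same_label or same_definition:
        # Same label with a different definition (or the reverse): similar, not identical.
        return "skos:closeMatch"
    # skos:relatedMatch and the remaining close matches need expert judgement.
    return None


# Placeholder definitions, not the thesaurus text.
fish = {"pref": "Body Size", "definition": "Placeholder definition A."}
zoo_same = {"pref": "body size", "definition": "Placeholder definition A."}
zoo_diff = {"pref": "Body Size", "definition": "Placeholder definition B."}

print(suggest_relation(fish, zoo_same))
print(suggest_relation(fish, zoo_diff))
```

A rule of this kind could only pre-sort candidate pairs; the reconciliation step described above would still be needed for the ambiguous and associative cases.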

The resulting manual mappings, created by the expert team, were provided to OAEI organisers during the 2023 OAEI campaign, for the Biodiversity and Ecology (biodiv) track preparation phase, as Excel data files (Supplementary Table S1). These mappings served as the standard baseline reference alignments prior to the execution of the automated matching tools and as ground truth for the evaluation phase.

2.3. Automatic Matching Approach Within the OAEI

Within the Biodiversity and Ecology (biodiv) track of the 2023 OAEI campaign, five automated matching systems were considered to test and evaluate their performance in generating pairwise alignments between LifeWatch Italy SKOS trait thesauri. All input thesauri underwent identical preprocessing across all tools to ensure comparability, and each tool evaluation was carried out according to the same methodology, following the OAEI participation rules and the experimental setup, which consists of three phases:

(1) Preparation: data conversion, thesauri loading, and matcher configuration;
(2) Execution: automatic generation of alignments on pairwise thesaurus comparisons;
(3) Evaluation: comparison of system-generated mappings against expert-based reference alignments.

In the preparation phase, the LifeWatch Italy trait thesauri were downloaded from EcoPortal in RDF/XML format and then normalised and transformed into OWL format using a source code and a jar file obtained from the AgreementMakerLight (AML) [39] ontology parsing module. This step was necessary because the automated systems tested in the Ontology Alignment Evaluation Initiative do not handle the SKOS data model natively. This ensured that all systems received an equivalent OWL representation of each thesaurus, avoiding differences due to input format rather than algorithmic behaviour. After that, the automatic mapping execution and evaluation phase was first performed by each tool on two representative pairs of thesauri, between the Macroalgae–Macrozoobenthos and Fish–Zooplankton trait thesauri, using the following ontology matching systems:

LogMap [40] is an ontology matching system that constructs an inverted lexical index for each ontology and uses external lexicons to find synonyms and lexical variation. It also exploits the information in the class hierarchy, and it employs reasoning and repair techniques to minimise logical errors.
LogMapLt is a lightweight variant of LogMap, which applies string matching techniques.
LogMapKG is the LogMap system that returns instance-level and concept-level correspondences.
Matcha [41] is an ontology matching system that incorporates the lexical and structural algorithms from AML and a matching algorithm that uses large language models (LLMs). The system relies on the entities being semantically equivalent, either by having the same URI or by being declared as owl:sameAs.
OLaLa [42,43] is a matching system based on sentence transformers and large language models (LLMs). The system generates matching candidates using a Sentence-BERT (SBERT) model, with a function that extracts labels or descriptions, as well as URI fragments and annotation properties. These candidates are passed to the LLM, which either judges each candidate independently as correct or incorrect or selects the most likely correspondence from a set of possible targets. The output of a high-precision matcher is then added, and finally, filters are applied to ensure that only candidates with high confidence values are returned.
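The OLaLa description above suggests a pipeline of candidate generation, a decision step, and a confidence filter. The sketch below mimics only that shape and is not OLaLa's implementation: token-overlap (Jaccard) similarity is swapped in for SBERT embeddings, the `pick_best` function is a trivial stand-in for the LLM decision, and the parameters `k` and `threshold` are hypothetical.

```python
def jaccard(a, b):
    """Token-overlap similarity; a stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def top_k_candidates(source_label, target_labels, k=3):
    """Return up to k (target, score) pairs with non-zero similarity, best first."""
    scored = [(jaccard(source_label, t), t) for t in target_labels]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(t, s) for s, t in scored[:k] if s > 0]


def pick_best(source_label, candidates):
    """Stand-in for the LLM decision step: keep the top-ranked candidate."""
    return candidates[0] if candidates else None


def match(source_labels, target_labels, decide, k=3, threshold=0.5):
    """Generate candidates, let `decide` choose, then filter by confidence."""
    alignment = []
    for src in source_labels:
        chosen = decide(src, top_k_candidates(src, target_labels, k))
        if chosen is not None and chosen[1] >= threshold:
            alignment.append((src, chosen[0], chosen[1]))
    return alignment


alignment = match(["Body Size", "Dry Weight"],
                  ["Body Size", "Dry Mass", "Body Length"],
                  decide=pick_best)
print(alignment)
```

With this purely lexical stand-in, “Dry Weight” and “Dry Mass” fall below the threshold and are dropped. This is the kind of synonymy that embedding-based and LLM-based systems aim to recover.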

The evaluation phase was conducted using Java 11 and the MELT framework [44], following the standard setup of the OAEI biodiv track. All matching systems were executed under identical hardware and software conditions to ensure comparability. Specifically, the experiments ran on a Windows 10 (64-bit) machine equipped with an Intel® Core™ i7-4770 CPU @ 3.40 GHz and 16 GB RAM.

The results of each matching system were evaluated based on the number of matches detected by each automated tool. To evaluate the performance of the systems, measures of precision (which is a measure of correctness), recall (which is a measure of completeness), and F-measure or balanced F-score (F1 score) were calculated against the reference manual alignment, performed by domain experts, according to the following formulas:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where TP is the number of true positives (system mappings that also appear in the reference alignment), FP is the number of false positives (system mappings that do not appear in the reference alignment), and FN is the number of false negatives (reference mappings that the system did not return). The F-measure is the harmonic mean of precision and recall:

$$\text{F-measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
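As a worked illustration of these measures, the sketch below computes them from two sets of mappings. The `evaluate` function and the concept identifiers are hypothetical. The counts (13 system mappings against 15 reference mappings) follow the Fish–Zooplankton comparison reported in Section 3.2, with the added assumption that all 13 system mappings are correct.

```python
def evaluate(system, reference):
    """Return (precision, recall, f_measure) for two collections of mappings."""
    system, reference = set(system), set(reference)
    tp = len(system & reference)   # returned and in the reference
    fp = len(system - reference)   # returned but not in the reference
    fn = len(reference - system)   # in the reference but not returned
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical identifiers: 15 reference mappings, of which the system returns 13.
reference = [(f"fish_{i}", f"zoo_{i}") for i in range(15)]
system = reference[:13]

precision, recall, f_measure = evaluate(system, reference)
print(round(precision, 2), round(recall, 2), round(f_measure, 2))
```

Under that assumption, precision is 1, recall is 13/15, and the F-measure rounds to 0.93.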

After the initial evaluation of all automated matching systems, the tool that achieved the highest precision and recall was selected. This system was then used to complete the alignment and evaluation of the thesaurus pairs that were not originally included in the first evaluation phase of the biodiv track. This approach ensured full coverage of all LifeWatch Italy trait thesauri and produced a complete set of mappings using a consistent and reproducible workflow.

3. Results

3.1. Manual Mapping

A total of 160 terms with identical or similar labels and definitions were manually identified and mapped by domain experts, and the subproperties of skos:mappingRelation were defined between concepts belonging to different concept schemes, indicating the type of relationship between terms (Figure 1 and Supplementary Table S1).

The pairwise comparison between the Phytoplankton and Zooplankton trait thesauri revealed the highest number of mappings (22 total), most of which were close matches (15), indicating a high degree of semantic similarity between concepts in these domains. The Macrozoobenthos thesaurus showed similar results when compared to the thesauri for fish and zooplankton, pointing to substantial conceptual overlap across organism groups. Specifically, the pairwise comparison between the Fish and Macrozoobenthos trait thesauri also yielded a high number of mappings (20 total), with a notable proportion of exact matches (seven), indicating high redundancy between these two resources and good potential for semantic alignment. Conversely, the comparison between the Zooplankton and Macroalgae trait thesauri yielded the lowest number of mappings (13 total).

3.2. Adequacy of Matching Tools: Results Against Manually Created Mappings

None of the automated matching tools was able to detect all manual mappings identified by domain experts (Table 2, Figure 1). Compared to the other tools, the OLaLa and LogMapLt systems detected the lowest number of mappings in the pairwise comparison between the thesauri for fish and zooplankton (13 mappings) and between the thesauri for macroalgae and macrozoobenthos (10 mappings). However, they had the highest number of true positive alignments and the lowest number of false positives and, thus, the highest precision values (close to 1), indicating that the mappings detected were mostly correct.

In particular, in the comparison between the Fish and Zooplankton trait thesauri, OLaLa provided 13 automated alignments out of the 15 identified manually, corresponding to an agreement of 86.7%, while in the comparison between the Macroalgae and Macrozoobenthos thesauri, 10 mappings were identified out of the 19 manually detected (52.6%). In general, OLaLa was also able to find alignments among concepts with different preferred labels but similar alternative labels (e.g., “Dry Weight” from the Fish trait thesaurus aligned with “Dry Mass” of the Zooplankton trait thesaurus, which had “Dry Weight” as an alternative label). On the other hand, while all mappings returned by LogMapLt were true positives, it was only able to match concepts with the same preferred label without considering alternative labels (e.g., “Trait” vs. “Trait”, “Functional Trait” vs. “Functional Trait”, “Morphological Trait” vs. “Morphological Trait”), resulting in an overall lower number of matches compared to the OLaLa system.
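The effect of ignoring alternative labels can be reproduced with a few lines of code. This is a minimal sketch of exact label matching and not the implementation of LogMapLt or OLaLa. The function names and dictionary layout are hypothetical, and only the labels mentioned in the paragraph above are used.

```python
def labels(concept, use_alt):
    """Lower-cased preferred label, optionally joined by the alternative labels."""
    out = {concept["pref"].lower()}
    if use_alt:
        out |= {alt.lower() for alt in concept.get("alt", [])}
    return out


def lexical_matches(source, target, use_alt=False):
    """Pair concepts that share at least one label (exact, case-insensitive)."""
    return [
        (s["pref"], t["pref"])
        for s in source
        for t in target
        if labels(s, use_alt) & labels(t, use_alt)
    ]


fish = [{"pref": "Dry Weight"}, {"pref": "Trait"}]
zooplankton = [{"pref": "Dry Mass", "alt": ["Dry Weight"]}, {"pref": "Trait"}]

print(lexical_matches(fish, zooplankton))                # preferred labels only
print(lexical_matches(fish, zooplankton, use_alt=True))  # preferred and alternative labels
```

Matching on preferred labels alone finds only “Trait”, whereas including alternative labels also pairs “Dry Weight” with “Dry Mass”.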

Moreover, the OLaLa matching tool found correspondences for two pairs of terms, aligning “Branched” and “Flat” from the Macroalgae trait thesaurus with “Arborescent” and “Flattened” from the Macrozoobenthos trait thesaurus, respectively, which had not been identified and listed in the manual mapping task. The systems LogMap, LogMapKG, and Matcha detected 1.5 to 3.7 times more mappings than those identified through manual alignment by domain experts. This discrepancy was primarily due to the prevalence of incorrect mappings, leading to significantly lower precision and F-measure values. Notably, all three systems showed a high incidence of false positives. Specifically, LogMap, LogMapKG, and Matcha often aligned terms based on superficial lexical similarity or URI structure rather than true semantic equivalence. For example, incorrect automatic alignments were generated between concepts that did not share the same labels, definitions, or hierarchical context but only had similar URIs, particularly those ending with identical numeric suffixes (e.g., https://kos.lifewatch.eu/thesauri/zooplanktontraits/c_8, https://kos.lifewatch.eu/thesauri/fishtraits/c_8, accessed on 2 October 2025). These incorrect matches tended to inflate the number of mappings while reducing precision. Moreover, several concepts were not mapped by any system, despite having similar preferred labels (e.g., “Body Weight” in the Macrozoobenthos trait thesaurus and “Thallus Weight” in the Macroalgae trait thesaurus) or similar meanings (e.g., “Spheroid” in the Zooplankton trait thesaurus with “Globiform” in the Fish trait thesaurus). Overall, runtimes were generally short, varying from a few seconds to a few minutes, and depended more on the system used than on the size of the input thesauri.
OLaLa was the slowest system, taking up to 10 min to generate the alignment in both pairwise comparisons, while LogMapLt detected the correct alignment in less than one second.
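False positives of the URI-suffix kind described above could be flagged for expert review with a simple post-processing check. The sketch below is one possible filter and was not part of the evaluation. The two `c_8` URIs come from the text, but the function names and the labels attached to each URI are hypothetical placeholders.

```python
from urllib.parse import urlparse


def local_name(uri):
    """Last path segment of a URI, e.g. 'c_8'."""
    return urlparse(uri).path.rstrip("/").rsplit("/", 1)[-1]


def flag_suspicious(mappings, labels):
    """Return mappings whose URIs share a local name but whose labels do not overlap.

    mappings: list of (source_uri, target_uri) pairs
    labels:   dict mapping each URI to a set of lower-cased labels
    """
    flagged = []
    for source, target in mappings:
        same_local_name = local_name(source) == local_name(target)
        share_label = bool(labels.get(source, set()) & labels.get(target, set()))
        if same_local_name and not share_label:
            flagged.append((source, target))
    return flagged


ZOO = "https://kos.lifewatch.eu/thesauri/zooplanktontraits/"
FISH = "https://kos.lifewatch.eu/thesauri/fishtraits/"

# Placeholder labels: the real labels of these concepts are not given in the text.
labels = {
    ZOO + "c_8": {"placeholder label a"},
    FISH + "c_8": {"placeholder label b"},
    ZOO + "c_1": {"placeholder shared label"},
    FISH + "c_1": {"placeholder shared label"},
}
mappings = [(ZOO + "c_8", FISH + "c_8"), (ZOO + "c_1", FISH + "c_1")]

print(flag_suspicious(mappings, labels))
```

Only the `c_8` pair is flagged, because the `c_1` pair shares a label in addition to the local name. Flagged pairs would still need expert inspection rather than automatic removal.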

3.3. OLaLa Performance

After evaluating the performance of each tool, the automated alignment approach was performed a second time across all remaining pairwise comparisons between the trait thesauri using only the best performing system, OLaLa. A highly variable number of alignments was identified in each pairwise comparison (Table 3). The highest number of alignments (19) resulted from the Fish and Macrozoobenthos and Phytoplankton and Zooplankton pairwise comparisons, corresponding to 95% (19/20) and 86% (19/22) agreement with the expert-based mappings, respectively. In contrast, the lowest number of alignments was found by OLaLa between the Zooplankton and Macroalgae trait thesauri, with only 4 correct mappings out of 11, resulting in 37% agreement with the manual mappings. In five of the eight pairwise comparisons performed, all automatically identified mappings were 100% correct, showing a precision score of 1 (Table 3).

3.4. Thesauri Merging: The Trait Thesaurus

The results from both the manual and automated alignment approaches highlighted considerable scope for straightforward linking of concepts across thesauri, where several concepts from one thesaurus have at least one correspondence in the other thesaurus and vice versa. In light of these results, we therefore merged the Phytoplankton, Zooplankton, Fish, Macroalgae, and Macrozoobenthos trait thesauri into a single, unified, and comprehensive resource: the Trait Thesaurus. This semantic artefact will be used to annotate and standardise trait data for aquatic organisms. During the merging process, the common structure among the thesauri was maintained, and concepts from each thesaurus were integrated into five distinct concept schemes (skos:ConceptScheme), one for each micro-thesaurus; nearly 500 concepts were reviewed, mapped, harmonised, and validated across the thesauri. While preserving the global structure, the relationships and hierarchies between concepts were reorganised, and redundancies were eliminated through a selection process based on concept usage frequency. Potential merging conflicts were resolved by refining, simplifying, and generalising definitions where possible, and the duplicate concepts were unified and associated with all relevant concept schemes (Figure 2). After internal validation, the revised and unified version 1.0 of the Trait Thesaurus was published in EcoPortal and can be accessed through the following link: https://doi.org/10.48373/a2at-d828 (accessed on 2 October 2025). In Figure 3, we present the final structure of the unified Trait Thesaurus, which integrates all schemes and illustrates the relationships among the main trait categories.
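The unification of duplicate concepts can be sketched as grouping mapped concepts and attaching every source scheme to the surviving concept. The code below is an assumption-laden illustration and not the procedure used to build the published thesaurus. The identifiers are hypothetical, and the representative of each group is chosen alphabetically, whereas the text describes a selection based on concept usage frequency.

```python
def merge_concepts(concept_scheme, equivalent_pairs):
    """Group equivalent concepts and collect the schemes each group belongs to.

    concept_scheme:   dict mapping concept id -> name of its original scheme
    equivalent_pairs: iterable of (id, id) pairs judged to be duplicates
    Returns a dict mapping representative id -> set of scheme names.
    """
    parent = {concept: concept for concept in concept_scheme}

    def find(concept):
        # Union-find lookup with path halving.
        while parent[concept] != concept:
            parent[concept] = parent[parent[concept]]
            concept = parent[concept]
        return concept

    for a, b in equivalent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: keep the alphabetically smaller id as representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    merged = {}
    for concept, scheme in concept_scheme.items():
        merged.setdefault(find(concept), set()).add(scheme)
    return merged


# Hypothetical identifiers for illustration.
concept_scheme = {
    "fish:trait": "Fish",
    "zoo:trait": "Zooplankton",
    "phyto:trait": "Phytoplankton",
    "fish:example-only": "Fish",
}
pairs = [("fish:trait", "zoo:trait"), ("zoo:trait", "phyto:trait")]

print(merge_concepts(concept_scheme, pairs))
```

Because the grouping is transitive, two concepts end up merged even when they were never compared directly, which is one reason why only carefully validated equivalences should feed such a step.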

4. Discussion

As the volume and complexity of trait-based data continue to grow, researchers and research infrastructures are now adopting FAIR trait semantic artefacts to better mobilise and integrate trait data on a global scale, gain new research insights, and ultimately transform trait data into human- and machine-actionable knowledge [19]. In this context, a variety of controlled vocabularies and other semantic artefacts have been developed in the ecological domain over recent years. However, they are distributed across multiple catalogues and registries, often with different levels of maturity, management, and versioning practices, and contain high levels of redundancy in content with no or minimal alignment and interlinking. Achieving interoperability among already existing semantic artefacts remains a critical challenge.

In our work, we evaluated the capabilities and limitations of both manual and automated alignment approaches to improve interoperability within trait thesauri for aquatic organisms developed by the LifeWatch Italy research infrastructure. Although more time-consuming, the manual mapping approach performed by domain experts presented minimal challenges overall. In this case, the pairwise comparisons among thesauri were made easier by the consistent and common hierarchical structure and the limited number of concepts involved. However, the identification of mapping relationships between concepts posed some difficulties to the experts, which in several cases led to ambiguity in selecting between exact matches for absolute equivalence and close or related matches. Since skos:exactMatch is a transitive property used to link terms that can be used interchangeably across a wide range of semantic artefacts, exact matches between thesauri were assigned sparingly. In general, exact matches were only found between concepts of the core structure and other top categories across thesauri, e.g., “Trait”, “Functional Trait”, “Behavioural Trait”, “Body Size”, etc. On the other hand, the property skos:closeMatch is more approximate, indicating similar concepts where the scope may be slightly different, and a higher proportion of close matches was found between thesauri. Examples include concepts with the same labels but slightly different definitions according to the functional groups of interest, e.g., “Biovolume” and “Body Length”, or cases where the preferred labels were different but the general meaning was the same, e.g., “Globiform” and “Sphere”. Associative relations were assigned to concepts whose meanings were related to one another despite being expressed with different preferred labels and definitions, e.g., “Internal and External Structures” with “Presence of Siliceous Exoskeletal Structures” or with “Presence of Flagella” or “Presence of Lorica”.

Manual mapping methods are difficult to apply to semantic artefacts with a large number of concepts and complex hierarchical structures, so automating the mapping process should be considered. In our case, the five automated tools run in the Biodiversity and Ecology (biodiv) track of the OAEI campaign worked sufficiently well, despite differences between the tools, especially in the pairwise comparisons between the Fish and Zooplankton and between the Macroalgae and Macrozoobenthos trait thesauri. However, none of the tools detected all the manual mappings identified by domain experts. Different linguistic expressions for the same concept, expressed as preferred and alternative labels, or different degrees of specificity in the concept definitions often prevented the automated tools from recognising the matches. In addition, the set of mappings found by the different systems varied, as did the number of mappings, precision, and performance of each tool. OLaLa and LogMapLt performed well enough for practical applications, while LogMap, LogMapKG, and Matcha returned a higher number of mappings, most of them false positives, indicating lower precision. The predominant issue with these latter tools was the mapping of concepts with different labels but similar URIs, which produced false matches. Consequently, careful interpretation and post-processing of the alignments generated by such tools became necessary, especially when the matching algorithms considered the code numbers reported within the concept URIs. New developments are needed to enhance the tools and their mapping algorithms so that they take into account labels, definitions, and exact URIs within a thesaurus construct. This can improve the accuracy of the match, particularly where terms vary slightly in their lexicalisation but share synonymous meanings (e.g., “Dry mass” vs. “Dry weight” or “Size” vs. “Body size”).
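One way to picture the post-processing described above is a label-based triage of candidate mappings: a candidate whose concepts share no lexical evidence in their preferred or alternative labels is routed to expert review, whatever its URIs look like. The Python sketch below is a minimal illustration under stated assumptions. The identifiers, the label lists, the Jaccard token overlap, and the 0.3 threshold are all hypothetical choices made for the example and are not part of the OAEI tools or of the workflow used in this study.

```python
import re


def tokens(label):
    """Lower-case a label and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def label_overlap(labels_a, labels_b):
    """Best Jaccard token overlap between any label of A and any label of B."""
    best = 0.0
    for la in labels_a:
        for lb in labels_b:
            ta, tb = tokens(la), tokens(lb)
            if ta and tb:
                best = max(best, len(ta & tb) / len(ta | tb))
    return best


def triage(candidates, labels, threshold=0.3):
    """Split candidate mappings into those with lexical support and those
    that should be checked by a domain expert. The threshold is arbitrary."""
    keep, review = [], []
    for src, tgt in candidates:
        score = label_overlap(labels[src], labels[tgt])
        target = keep if score >= threshold else review
        target.append((src, tgt, round(score, 2)))
    return keep, review


# Hypothetical concepts: each identifier maps to its preferred and alternative labels.
labels = {
    "a:c_12": ["Dry mass"],
    "b:c_12": ["Dry weight"],
    "a:c_7": ["Size"],
    "b:c_70": ["Body size"],
    "a:c_31": ["Body Length"],
    "b:c_31": ["Colony Shape"],
}
candidates = [("a:c_12", "b:c_12"), ("a:c_7", "b:c_70"), ("a:c_31", "b:c_31")]

keep, review = triage(candidates, labels)
print("keep:", keep)
print("review:", review)
```

In this toy input the pair with identical code numbers but unrelated labels is sent to review, whereas “Dry mass” vs. “Dry weight” only just passes the threshold. This shows why purely lexical scores remain a coarse filter and why synonymy still calls for expert judgement.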

Undoubtedly, the automated alignment tools ran far faster than the manual mapping performed by the domain experts. However, execution speed did not translate into complete or accurate alignments. Moreover, the large discrepancy in runtime between the two best-performing tools, OLaLa (about eight minutes per comparison in Table 2) and LogMapLt (under one second), makes it difficult, in our case, to find correlations between runtime, alignment adequacy, and tool performance.

Catalogues such as LOV, the OntoPortal instances, and other semantic artefacts repositories have greatly contributed to the sharing and reuse of semantic artefacts and provided mappings across scientific disciplines. However, coverage of ecological and trait-based terminologies is still limited. As a result, trait concepts are often underrepresented, fragmented [45], or embedded within broader taxonomic or environmental semantic artefacts rather than treated as primary semantic entities. In addition, the mapping algorithms adopted by the semantic artefacts catalogues tend to rely mainly on lexical and automatic techniques rather than systematic, independently evaluated workflows. In contrast, this study focuses on domain-specific trait thesauri for aquatic organisms and adopts a hybrid strategy that combines expert-driven validation with automated ontology alignment methods, tested within the OAEI Biodiversity and Ecology track. This approach provides a more robust and domain-aware pathway to improving semantic interoperability in trait-based ecology and complements existing general-purpose repositories with targeted integration workflows.

Despite the ongoing efforts to develop algorithms that address the need for automatic alignment systems [39,40,41,42,43], manual mapping remains indispensable, especially in ambiguous cases where polysemy and homonymy confound automated processes [43]. Our work also confirmed the inability of the considered systems to handle SKOS natively [36]. While SKOS-aware alignment tools exist, they are not included in OAEI evaluations. Consequently, most alignment systems overlook key SKOS features, such as hierarchical relations, alternative labels, and semantic mappings, and often flatten these structures, causing information loss that must then be recovered through expert validation. Future work should therefore focus on promoting SKOS-aware alignment approaches and their integration into mainstream evaluation campaigns and tools. Given the widespread adoption of SKOS-based resources, we also encourage tool developers to prioritise this aspect and to provide specialised solutions tailored to them.

To this end, future research should also explore hybrid approaches that integrate symbolic reasoning with LLMs capable of capturing contextual semantics. LLM-based post-processing pipelines could assist in refining automatic alignments, validating ambiguous mappings, and generating context-aware suggestions. These approaches would significantly enhance the interoperability, scalability, and automation of thesaurus integration tasks. Moreover, the development of SKOS-aware matchers would allow for the native handling of hierarchical and associative relationships, which are currently overlooked by most tools. Such advancements would bridge the gap between statistical and rule-based methods, enhancing precision, scalability, and explainability in ontology alignment.

All of these challenges highlight the plasticity of ontology alignment systems and underscore the importance of ongoing testing and evaluation within frameworks such as the Ontology Alignment Evaluation Initiative (OAEI) to drive progress and address emerging issues [46,47]. By evaluating the performance and assessing the strengths and weaknesses of alignment and matching tools, the OAEI helps refine systems and improve interoperability between semantic artefacts each year, providing valuable insights for both tool developers and semantic artefact developers.

Overall, the results of the manual mapping and automated alignment methods supported and guided the development of the “Traits Thesaurus” as a unique semantic artefact for describing aquatic organism traits. Merging the thesauri was a complex cognitive effort in which matching technology helped to generate candidate mappings. However, intervention and monitoring by domain experts remain fundamental to ensure the consistency and quality of mappings and to resolve the redundancy and inconsistency of candidate mappings derived from automated tools.

The unified trait thesaurus thus represents a key step towards semantic harmonisation in trait-based ecology. Its integration within LifeWatch Italy resources enables cross-ecosystem trait analyses by linking biodiversity and trait-based datasets under a shared semantic framework. This interoperability aligns with FAIR principles and opens opportunities for comparative studies, large-scale modelling, and synthesis of functional trait information across spatial, temporal, and organisational scales. In addition, the unified trait thesaurus provides a single access point for cross-searching among different trait thesauri, reducing the effort required to retrieve information and simplifying thesaurus maintenance and updates. This enhances the findability, accessibility, and reusability of trait-based standards by researchers working in trait-based ecology.

The mapping approach used in this work has proved to be very useful in improving interoperability between semantic artefacts, and a similar approach can be applied and extended to other ontologies and thesauri to scale up the process of aligning semantic artefacts across all scientific domains. Research infrastructures can promote the automation of this process by including automated ontology matching systems in semantic artefacts repositories and portals to significantly improve and accelerate the harmonisation and mobilisation of standardised data and metadata globally. Finally, we hope that this manuscript can encourage and guide researchers from the Semantic Web community alongside the scientific community to work closely together in integrating the use of semantic artefacts in realistic application contexts and in applying matching technologies to improve data and metadata interoperability in ecological and trait-based research.

5. Conclusions

Overall, this work addresses a concrete challenge in biodiversity informatics: the semantic heterogeneity that hampers the integration, comparison, and reuse of trait-based data. To overcome this issue, we combined expert-driven curation with automated ontology matching, demonstrating that neither approach alone is sufficient for producing reliable and semantically coherent trait alignments. Manual review was crucial for resolving ambiguous cases, correcting false positives, and ensuring semantic accuracy, while automated tools allowed us to process large vocabularies efficiently and systematically. By jointly applying these approaches, this study offers a systematic evaluation of ontology alignment methods on the LifeWatch Italy trait thesauri within the OAEI Biodiversity and Ecology track. The evaluation revealed semantic inconsistencies such as duplicated concepts, lexical heterogeneity, and violations of SKOS integrity rules, and it led to the creation of the first unified trait thesaurus for aquatic organisms. This resource enhances data standardisation, semantic interoperability, and FAIR-compliant trait annotation across databases, web portals, and research infrastructures. Although developed in an aquatic context, the methodology, which combines expert validation and automated alignment, can be replicated in other ecological or biodiversity domains, offering a scalable pathway towards interoperable data. Future work should focus on improving SKOS-compliant tool support, developing SKOS-aware matching algorithms, and exploring LLM-based post-processing to refine mappings, reduce manual effort, and further automate the integration of thesauri and of semantic artefacts more generally. These advancements would improve the automation of thesaurus integration while preserving semantic accuracy, ultimately fostering more robust, reusable, and interoperable trait-based knowledge systems.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/app152312484/s1. Supplementary Table S1: Manually curated alignment of the LifeWatch Italy trait thesauri of aquatic organisms. The mapping served as ground truth for automatic alignment performed by five algorithms, namely LogMap, LogMapKG, LogMapLt, Matcha, and OLaLa, during the Biodiv track of the 2023 OAEI.

Author Contributions

J.T.: Conceptualization, methodology, investigation, formal analysis, validation, writing—original draft, review, and editing, and project administration. M.P.: Formal analysis, data curation, resources, and writing—review and editing. I.R.: Resources, data curation, writing—review and editing, project administration, and funding acquisition. N.K.: Methodology, software, resources, data curation, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

The publication has been funded by EU—Next Generation EU Mission 4 “Education and Research”—Component 2: “From research to business”—Investment 3.1: “Fund for the realization of an integrated system of research and innovation infrastructures”—Project IR0000032—ITINERIS—Italian Integrated Environmental Research Infrastructures System (D.D. n. 130/2022—CUP B53C22002150006).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Traits Thesaurus version 1 is available at the following link: https://kos.lifewatch.eu/thesauri/traits/.

Acknowledgments

The authors would like to thank all members of the 2023 Ontology Alignment Evaluation Initiative steering committee and Sven Hertling for providing the results of the automated matching systems. The authors would also like to acknowledge the support received through the facilities provided by the LifeWatch ERIC and LifeWatch Italy research infrastructures. Jessica Titocci was supported by the project “Italian Integrated Environmental Research Infrastructure System” (ITINERIS) in the framework of Next Generation EU PNRR-Mission 4 “Education and Research”—Component 2: “From research to business”—Investment 3.1: “Fund for the realisation of an integrated system of research and innovation infrastructures”, Notice 3264/2021, IR0000032. CUP B53C22002150006. Martina Pulieri was supported by LifeWatch Italy through the project “LifeWatchPLUS”-CIR01_00028.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kissling, W.D.; Walls, R.; Bowser, A.; Jones, M.O.; Jens, K.; Donat, A.; Josep, A.; Basset, A.; van Bodegom, P.M.; Cornelissen, J.H.C.; et al. Towards global data products of essential biodiversity variables on species traits. Nat. Ecol. Evol. 2018, 2, 1531–1540.
2. Flynn, D.F.B.; Mirotchnick, N.; Jain, M.; Palmer, M.I.; Naeem, S. Functional and phylogenetic diversity as predictors of biodiversity–ecosystem-function relationships. Ecology 2011, 92, 1573–1581.
3. Cardinale, B.J.; Duffy, J.E.; Gonzalez, A.; Hooper, D.U.; Perrings, C.; Venail, P.; Narwani, A.; Mace, G.; Tilman, D.; Wardle, D.A.; et al. Biodiversity loss and its impact on humanity. Nature 2012, 486, 59–67.
4. Krause, S.; Le Roux, X.; Niklaus, P.A.; Van Bodegom, P.M.; Lennon, J.T.; Bertilsson, S.; Grossart, H.P.; Philippot, L.; Bodelier, P.L.E. Trait-based approaches for understanding microbial biodiversity and ecosystem functioning. Front. Microbiol. 2014, 5, 251.
5. Pata, P.R.; Hunt, B.P.V. Harmonizing marine zooplankton trait data toward a mechanistic understanding of ecosystem functioning. Limnol. Oceanogr. 2024, 70, S8–S27.
6. Laraib, M.; Titocci, J.; Rosati, I.; Basset, A. An integrated individual-level trait-based phytoplankton dataset from transitional waters. Sci. Data 2023, 10, 897.
7. Falster, D.; Gallagher, R.; Wenk, E.H.; Wright, I.J.; Indiarto, D.; Andrew, S.C.; Baxter, C.; Lawson, J.; Allen, S.; Fuchs, A.; et al. AusTraits, a curated plant trait database for the Australian flora. Sci. Data 2021, 8, 254.
8. Pekár, S.; Wolff, J.O.; Černecká, Ľ.; Birkhofer, K.; Mammola, S.; Lowe, E.C.; Fukushima, C.S.; Herberstein, M.E.; Kučera, A.; Buzzatto, B.A.; et al. The World Spider Trait database: A centralized global open repository for curated data on spider traits. Database 2021, 2021, baab064.
9. Gallagher, R.V.; Falster, D.S.; Maitner, B.S.; Salguero-Gómez, R.; Vandvik, V.; Pearse, W.D.; Schneider, F.D.; Kattge, J.; Poelen, J.H.; Madin, J.S.; et al. Open science principles for accelerating trait-based science across the Tree of Life. Nat. Ecol. Evol. 2020, 4, 294–303.
10. Kattge, J.; Bönisch, G.; Díaz, S.; Lavorel, S.; Prentice, I.C.; Leadley, P.; Tautenhahn, S.; Werner, G.D.A.; Aakala, T.; Abedi, M.; et al. TRY plant trait database–enhanced coverage and open access. Glob. Change Biol. 2020, 26, 119–188.
11. Kattge, J.; Díaz, S.; Lavorel, S.; Prentice, I.C.; Leadley, P.; Bönisch, G.; Garnier, E.; Westoby, A.M.; Reich, P.B.; Wright, I.J.; et al. TRY—A global database of plant traits. Glob. Change Biol. 2011, 17, 2905–2935.
12. Jones, K.E.; Bielby, J.; Cardillo, M.; Fritz, S.A.; O’Dell, J.; Orme, C.D.L.; Safi, K.; Sechrest, W.; Boakes, E.H.; Carbone, C.; et al. PanTHERIA: A species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology 2009, 90, 2648.
13. Dawson, S.K.; Carmona, C.P.; González-Suárez, M.; Jönsson, M.; Chichorro, F.; Mallen-Cooper, M.; Melero, Y.; Moor, H.; Simaika, J.P.; Duthie, A.B.; et al. The traits of “trait ecologists”: An analysis of the use of trait and functional trait terminology. Ecol. Evol. 2021, 11, 16434–16445.
14. Wilkinson, M.; Dumontier, M.; Aalbersberg, I.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; Bonino da Silva Santos, L.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018.
15. Wilkinson, S.R.; Aloqalaa, M.; Belhajjame, K.; Crusoe, M.R.; Kinoshita, B.P.; Gadelha, L.; Garijo, D.; Gustafsson, O.J.R.; Juty, N.; Kanwal, S.; et al. Applying the FAIR Principles to computational workflows. Sci. Data 2025, 12, 328.
16. Bernabé, C.H.; Queralt-Rosinach, N.; Silva Souza, V.E.; Bonino da Silva Santos, L.O.; Mons, B.; Jacobsen, A.; Roos, M. The use of Foundational Ontologies in Bioinformatics. In Proceedings of the 13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, SWAT4HCLS 2022, Leiden, The Netherlands, 10–14 January 2022.
17. Schultes, E.; Magagna, B.; Hettne, K.M.; Pergl, R.; Suchánek, M.; Kuhn, T. Reusable FAIR Implementation Profiles as Accelerators of FAIR Convergence. In Advances in Conceptual Modeling; Grossmann, G., Ram, S., Eds.; ER 2020. Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12584.
18. Wyborn, L.A.; Prent, A.; Croucher, J.; Rees, N.; Farrington, R. Using FAIR Implementation Profiles (FIPs) and FAIR Enabling Resources (FERs) to Accelerate Machine-to-machine Interoperability of Geoscience datasets Within and Across Repositories, Communities and Other Domains. In Proceedings of the AGU Fall Meeting Abstracts, San Francisco, CA, USA, 11–15 December 2023; Volume 2023.
19. Magagna, B.; Schultes, E.; Fouilloux, A.; Burger, G.; Devriendt, D.; Bramley, R.; Kuhn, T.; Rebelo Moreira, J.L.; Bonino da Silva Santos, L.O.; Ferreira Pires, L. Ontological Analysis of FAIR Supporting Resources. In Proceedings of the Joint Ontology Workshops-Episode X: The Tukker Zomer of Ontology, and Satellite Events, JOWO 2024, Enschede, The Netherlands, 15–19 July 2024.
20. Zeng, M.L. Knowledge organization systems (KOS). KO Knowl. Organ. 2008, 35, 160–182.
21. Wieczorek, J.; Bloom, D.; Guralnick, R.; Blum, S.; Döring, M.; Giovanni, R.; Robertson, T.; Vieglais, D. Darwin Core: An evolving community-developed biodiversity data standard. PLoS ONE 2012, 7, e29715.
22. Schneider, F.D.; Fichtmueller, D.; Gossner, M.M.; Güntsch, A.; Jochum, M.; König-Ries, B.; Le Provost, G.; Manning, P.; Ostrowski, A.; Penone, C.; et al. Towards an ecological trait-data standard. Methods Ecol. Evol. 2019, 10, 2006–2019.
23. Rosati, I.; Bergami, C.; Fiore, N.; Oggioni, A.; Tagliolato, P. LifeWatch Italy Thesauri Documentation; Version 1.0; CNR Edizioni: Roma, Italy, 2017; p. 18. ISBN 978-88-8080-249-5.
24. Rosati, I.; Bergami, C.; Stanca, E.; Roselli, L.; Tagliolato, P.; Oggioni, A.; Fiore, N.; Pugnetti, A.; Zingone, A.; Boggero, A.; et al. A thesaurus for phytoplankton trait-based approaches: Development and applicability. Ecol. Inform. 2017, 42, 129–138.
25. Pey, B.; Laporte, M.A.; Nahmani, J.; Auclerc, A.; Capowiez, Y.; Caro, G.; Cluzeau, D.; Cortet, J.; Decaëns, T.; Dubs, F.; et al. A thesaurus for soil invertebrate trait-based approaches. PLoS ONE 2014, 9, e108985.
26. Faulwetter, S.; Markantonatou, V.; Pavloudi, C.; Papageorgiou, N.; Keklikoglou, K.; Chatzinikolaou, E.; Pafilis, E.; Chatzigeorgiou, G.; Vasileiadou, K.; Dailianis, T.; et al. Polytraits: A database on biological traits of marine polychaetes. Biodivers. Data J. 2014, 2, e1024.
27. Vandenbussche, P.-Y.; Atemezing, G.A.; Poveda-Villalón, M.; Vatant, B. Linked Open Vocabularies (LOV): A gateway to reusable semantic vocabularies on the Web. Semant. Web 2016, 8, 437–452.
28. Whetzel, P.L.; Noy, N.F.; Shah, N.H.S.; Alexander, P.R.; Nyulas, C.; Tudorache, T.; Musen, M.A. BioPortal: Enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Nucleic Acids Res. 2011, 39 (Suppl. S2), W541–W545.
29. Jonquet, C.; Toulet, A.; Arnaud, E.; Aubin, S.; Dzalé Yeumo, E.; Emonet, V.; Graybeal, J.; Laporte, M.A.; Musen, M.A.; Pesce, V.; et al. AgroPortal: A vocabulary and ontology repository for agronomy. Comput. Electron. Agric. 2018, 144, 126–143.
30. Pierkot, C.; Alviset, G.; Vernet, M. The EarthPortal towards an ontology repository for the Earth System semantic artefacts. In Proceedings of the Onto4FAIR 2023 Workshops, Sherbrooke, QC, Canada, 20 July 2023; pp. 17–22.
31. Tarallo, A.; Pulieri, M.; Ramezani, P.; Rosati, I. Advancements in EcoPortal: Enhancing functionalities for the ecological domain semantic artefacts repository. FAIR Connect Empower. Data Steward. 2024, 2, 1–7.
32. Jonquet, C.; Graybeal, J.; Bouazzouni, S.; Dorf, M.; Fiore, N.; Kechagioglou, X.; Redmond, T.; Rosati, I.; Skrenchuk, A.; Vendetti, J.L.; et al. Ontology Repositories and Semantic Artefact Catalogues with the OntoPortal Technology. In The Semantic Web–ISWC 2023; Payne, T.R., Presutti, V., Qi, G., Poveda-Villalón, M., Stoilos, G., Hollink, L., Kaoudi, Z., Cheng, G., Li, J., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023; Volume 14266.
33. Yang, S.-Y. OntoPortal: An ontology-supported portal architecture with linguistically enhanced and focused crawler technologies. Expert Syst. Appl. 2009, 36, 10148–10157.
34. Karam, N.; Khiat, A.; Algergawy, A.; Sattler, M.; Weiland, C.; Schmidt, M. Matching biodiversity and ecology ontologies: Challenges and evaluation results. Knowl. Eng. Rev. 2020, 35, e9.
35. Martínez-González, M.M.; Alvite-Díez, M.L. The support of constructs in thesaurus tools from a Semantic Web perspective: Framework to assess standard conformance. Comput. Stand. Interfaces 2019, 65, 79–91.
36. Abd Nikooie Pour, M.; Algergawy, A.; Buche, P.; Castro, L.J.; Chen, J.; Coulet, A.; Cufi, J.; Dong, H.; Fallatah, O.; Faria, D.; et al. Results of the Ontology Alignment Evaluation Initiative 2023. In Proceedings of the 18th International Workshop on Ontology Matching (OM 2023), HAL, Athens, Greece, 6–7 November 2023; Available online: https://hal.archives-ouvertes.fr/hal-04366893 (accessed on 2 October 2025).
37. Gonzales-Aguilar, A.; Ramírez-Posada, M.; Ferreyra, D. TemaTres: Software para gestionar tesauros. Prof. Inf. 2012, 21, 319–325.
38. Stellato, A.; Fiorelli, M.; Turbati, A.; Lorenzetti, T.; Van Gemert, W.; Dechandon, D.; Laaboudi-Spoiden, C.; Gerencser, A.; Waniart, A.; Costetchi, E.; et al. VocBench 3: A collaborative Semantic Web editor for ontologies, thesauri, and lexicons. Semant. Web 2020, 11, 855–881.
39. Faria, D.; Pesquita, C.; Santos, E.; Palmonari, M.; Cruz, I.F.; Couto, F.M. The AgreementMakerLight ontology matching system. In On the Move to Meaningful Internet Systems: OTM 2013 Conferences; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8185, pp. 527–541.
40. Jiménez-Ruiz, E.; Grau, B.C. LogMap: Logic-based and scalable ontology matching. In Proceedings of the 10th International Semantic Web Conference (ISWC ‘11), Bonn, Germany, 23–27 October 2011; pp. 273–288.
41. Faria, D.; Silva, M.C.; Cotovio, P.; Eugénio, P.; Pesquita, C. Matcha and Matcha-DL results for OAEI 2022. In Proceedings of the 17th International Workshop on Ontology Matching (OM 2022) Co-Located with the 21st International Semantic Web Conference (ISWC 2022), Hangzhou, China, 23 October 2022. CEUR Workshop Proceedings, 3324. CEUR-WS.org.
42. Hertling, S.; Paulheim, H. OLaLa: Ontology matching with large language models. In Proceedings of the 12th Knowledge Capture Conference (K-CAP ‘23), Pensacola, FL, USA, 2–7 December 2023; pp. 131–139.
43. Dhamankar, R.; Lee, Y.; Doan, A.; Halevy, A.; Domingos, P. iMAP: Discovering Complex Semantic Matches between Database Schemas. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, 13–18 June 2004.
44. Hertling, S.; Portisch, J.; Paulheim, H. Melt-matching evaluation toolkit. In Proceedings of the International Conference on Semantic Systems, Karlsruhe, Germany, 9–12 September 2019; Springer International Publishing: Cham, Switzerland, 2019.
45. Di Muri, C.; Pulieri, M.; Raho, D.; Muresan, A.N.; Tarallo, A.; Titocci, J.; Nestola, E.; Basset, A.; Mazzoni, S.; Rosati, I. Assessing semantic interoperability in environmental sciences: Variety of approaches and semantic artefacts. Sci. Data 2024, 11, 1055.
46. Kotis, K.; Lanzenberger, M. Ontology matching: Current status, dilemmas and future challenges. In Proceedings of the 2008 International Conference on Complex, Intelligent and Software Intensive Systems, Barcelona, Spain, 4–7 March 2008; IEEE: New York, NY, USA, 2008.
47. Shvaiko, P.; Euzenat, J. Ontology matching: State of the art and future challenges. IEEE Trans. Knowl. Data Eng. 2011, 25, 158–176.

Figure 1. Number of mappings and type of skos:mappingRelations manually detected for each of the pairwise comparisons performed among LifeWatch Italy trait thesauri.

Figure 2. Conceptual diagram showing the merging and integration process that led to the development of the trait thesaurus, starting from a combined approach of manual and automatic mapping of trait terminologies previously available in the Phytoplankton, Zooplankton, Fish, Macroalgae, and Macrozoobenthos trait thesauri of LifeWatch Italy.

Figure 3. Overview of the unified trait thesaurus hosted in EcoPortal. The figure illustrates the hierarchical organisation of trait concepts and how the different trait schemes have been integrated into a single semantic structure.

Table 1. LifeWatch Italy trait thesauri metrics.

Thesaurus	Short Name	Version	Concepts	Link
Phytoplankton Traits Thesaurus	PHYTOTRAITS	1.5	86	https://ecoportal.lifewatch.eu/ontologies/PHYTOTRAITS, accessed on 2 October 2025
Zooplankton Traits Thesaurus	ZOOPLANKTRAITS	1.5	52	https://ecoportal.lifewatch.eu/ontologies/ZOOPLANKTRAITS, accessed on 2 October 2025
Fish Traits Thesaurus	FISHTRAITS	1.5	126	https://ecoportal.lifewatch.eu/ontologies/FISHTRAITS, accessed on 2 October 2025
Macroalgae Traits Thesaurus	MACROALGAETRAITS	1.5	110	https://ecoportal.lifewatch.eu/ontologies/MACROALGAETRAITS, accessed on 2 October 2025
Macrozoobenthos Traits Thesaurus	MACROZOOBENTHOSTRAITS	1.5	125	not published

Table 2. Automatic alignment results from the five tools tested during the OAEI 2023 in the pairwise comparisons MACROALGAE–MACROZOOBENTHOS and FISH–ZOOPLANKTON. The table includes the execution time, the total number of mappings detected, the relative positive and negative contribution, precision, recall, and F-measure of each matching tool.

System	Time (HH:MM:SS)	N. Mappings Detected	True Positive	False Positive	Precision	Recall	F-Measure
MACROALGAE–MACROZOOBENTHOS
OLaLa	0:08:30	10	9	1	0.7	0.39	0.5
LogMapLt	0:00:00	7	7	0	0.86	0.33	0.48
LogMap	0:00:03	29	8	21	0.27	0.44	0.34
LogMapKG	0:00:04	29	9	20	0.27	0.44	0.34
Matcha	0:00:07	45	9	36	0.2	0.5	0.28
FISH–ZOOPLANKTON
OLaLa	0:07:59	13	13	0	1	0.87	0.93
LogMapLt	0:00:00	8	8	0	1	0.53	0.69
LogMap	0:00:03	32	3	29	0.09	0.2	0.13
LogMapKG	0:00:04	55	11	44	0.22	0.8	0.34
Matcha	0:00:11	47	13	34	0.28	0.87	0.42

Table 3. OLaLa results from the remaining pairwise comparisons. The table includes the total number of mappings detected by both the manual and automated systems and the relative positive and negative contributions, precision, recall, and F-measure of each comparison.

Pairwise Comparison	N. Mappings Manually Detected	N. Mappings Automatically Detected	True Positive	False Positive	Precision	Recall	F-Measure
FISH–MACROALGAE	13	9	7	2	0.7	0.54	0.63
FISH–MACROZOOBENTHOS	20	19	19	0	1	0.95	0.97
ZOOPLANKTON–MACROZOOBENTHOS	18	15	15	0	1	0.83	0.9
ZOOPLANKTON–MACROALGAE	11	4	4	0	1	0.36	0.53
PHYTOPLANKTON–FISH	13	11	9	2	0.81	0.69	0.75
PHYTOPLANKTON–MACROALGAE	15	10	10	0	1	0.66	0.8
PHYTOPLANKTON–ZOOPLANKTON	22	19	19	0	1	0.86	0.92
PHYTOPLANKTON–MACROZOOBENTHOS	15	10	9	1	0.9	0.64	0.75
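
The precision, recall, and F-measure columns can be related to the mapping counts with a short calculation. The Python sketch below assumes the standard definitions (precision = true positives / mappings detected, recall = true positives / manually detected mappings, F-measure = harmonic mean of the two) and applies them to the FISH–MACROZOOBENTHOS row of Table 3. The function name and its parameters are illustrative and are not taken from the OAEI evaluation code.

```python
def alignment_metrics(true_positive, false_positive, n_reference):
    """Return (precision, recall, F-measure) for one pairwise comparison.

    n_reference is the number of mappings in the manual (reference) alignment.
    """
    detected = true_positive + false_positive
    precision = true_positive / detected if detected else 0.0
    recall = true_positive / n_reference if n_reference else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# FISH–MACROZOOBENTHOS (Table 3): 20 manual mappings, 19 true positives, 0 false positives.
p, r, f = alignment_metrics(true_positive=19, false_positive=0, n_reference=20)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
# prints: precision=1.00 recall=0.95 F-measure=0.97
```

The result matches the values reported for that row: every detected mapping was correct, and one of the twenty manual mappings was missed.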

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

