
Data

Data is a peer-reviewed, open access journal on data in science, with the aim of enhancing data transparency and reusability.
The journal publishes in two sections: a section on the collection, treatment and analysis methods of data in science; a section publishing descriptions of scientific and scholarly datasets (one dataset per paper). The journal is published monthly online by MDPI.
Quartile Ranking JCR - Q2 (Multidisciplinary Sciences)

All Articles (1,266)

Labels4Rails: A Railway Image Annotation Tool and Associated Reference Dataset

  • Tina Hiebert,
  • Florian Hofstetter and
  • Carsten Thomas
  • + 3 authors

The development of autonomous train systems relies heavily on machine learning (ML) models, which in turn depend on large, high-quality annotated datasets for training and evaluation. The railway domain lacks adequate public datasets and efficient annotation tools. To address this gap, we present Labels4Rails, a tool designed specifically for the annotation of railway scenes. It captures track topology, switch states (including switch directions), and informational tags describing the images’ content. For annotation efficiency, it leverages the consistent camera perspectives and fixed track geometries inherent to railways. We used Labels4Rails to create the L4R_NLB reference dataset from Norwegian railway footage. The dataset contains 10,253 annotated images across four seasons, including 1,415 switch annotations. Both the tool and dataset are publicly available.

16 December 2025

Labels4Rails user interface with annotated ego track (yellow) and right neighbor track (green).
  • Data Descriptor
  • Open Access

AlimurgITA: A Database of the Italian Alimurgic Flora

  • Piera Di Marzio,
  • Angela Di Iorio and
  • Carmen Giancola
  • + 1 author

The AlimurgITA portal is a user-friendly and effective tool for researching Wild Edible Plants (WEPs). It provides valuable information on alimurgic plant species, aiding conservation and potential applications (agricultural, food, etc.). Users can interact with the authors to report errors and contribute to the knowledge base regarding local uses. The authors will update the site every six months to include new data. The online database currently contains data on 1116 taxa used in 20 Italian regions. Each entry gives the updated scientific name with a link to the Acta Plantarum site, family, main synonyms, common name in Italian and regional dialect, chorotype, life form, a map showing the regions where the taxon is known to be used, the part used, how it is used, and the bibliography. From the home page, users can search for taxa by scientific name, and dedicated pages summarize the entries by scientific name, family, chorotype, life form, method of use, and part used. Additionally, within the FuD WE PIC Project, the AlimurgITA entity list is being integrated with Italian vegetation data from the European Vegetation Archive to model WEP richness, identify diversity hotspots, and explore the relationship between WEP diversity and habitat types.

16 December 2025

AlimurgITA online database: a screenshot of the page corresponding to the species Achillea ligustica All.

The development of Natural Language Processing applications tailored for diverse Arabic-speaking users requires specialized Arabic corpora, which are currently lacking in existing Arabic linguistic resources. Therefore, in this study, a multidialectal parallel Arabic corpus is built, focusing on the travel and tourism domain. By leveraging the text generation and dialectal transformation capabilities of Large Language Models, an initial set of approximately 100,000 parallel sentences was generated. Following a rigorous multi-stage deduplication process, 50,010 unique parallel sentences were obtained from Modern Standard Arabic (MSA) and five major Arabic dialects—Saudi, Egyptian, Iraqi, Levantine, and Moroccan. This study presents the detailed methodology of corpus generation and refinement, describes the characteristics of the generated corpus, and provides a comprehensive statistical analysis highlighting the corpus size, lexical diversity, and linguistic overlap between MSA and the five dialects. This corpus represents a valuable resource for researchers and developers in Arabic dialect processing and AI applications that require nuanced contextual understanding.

12 December 2025

Flowchart illustrating the overall process.

Logistics operations demand real-time visibility and rapid response, yet minute-level traffic speed forecasting remains challenging due to heterogeneous data sources and frequent distribution shifts. This paper proposes a Deep Operator Network (DeepONet)-based framework that treats traffic prediction as learning a mapping from historical states and boundary conditions to future speed states, enabling robust forecasting under changing scenarios. We project logistics demand onto a road network to generate diverse congestion scenarios and employ a branch–trunk architecture to decouple historical dynamics from exogenous contexts. Experiments on both a controlled simulation dataset and the real-world Metropolitan Los Angeles (METR-LA) benchmark demonstrate that the proposed method outperforms classical regression and deep learning baselines in cross-scenario generalization. Specifically, the operator learning approach effectively adapts to unseen boundary conditions without retraining, establishing a promising direction for resilient and adaptive logistics forecasting.

12 December 2025

Project workflow: from Solomon demand mapping and SUMO simulation, through feature construction and operator-style branch–trunk modeling, to cross-scene evaluation and diagnostics.

Reprints of Collections

Data Mining and Computational Intelligence for E-learning and Education
Reprint

Editors: Antonio Sarasa Cabezuelo, Ramón González del Campo Rodríguez Barbero

Recent Advances and Applications in Partial Least Squares Structural Equation Modeling (PLS-SEM)
Reprint

Editors: María del Carmen Valls Martínez, José-María Montero, Pedro Antonio Martín Cervantes

Data - ISSN 2306-5729