
Data

Data is a peer-reviewed, open access journal on data in science, with the aim of enhancing data transparency and reusability.
The journal publishes in two sections: a section on the collection, treatment and analysis methods of data in science; a section publishing descriptions of scientific and scholarly datasets (one dataset per paper). The journal is published monthly online by MDPI.
Quartile Ranking JCR - Q2 (Multidisciplinary Sciences)

All Articles (1,258)

  • Data Descriptor
  • Open Access

Mine accidents pose severe threats to worker safety and sustainable mining development in China. However, existing mine accident data in China are often scattered, unstructured, and lack systematic integration, which limits their application in safety research and practice. This study constructed a standardized structured dataset using 532 mine accident reports from official channels covering the period 2010–2025. The dataset went through four stages: data collection, standardized cleaning, structured annotation, and quality validation. It is stored in JSON Lines (JSONL) format for easy reuse. The dataset covers 27 provinces/autonomous regions/municipalities in China. Among accident levels, general accidents account for 65.6%; among accident types, roof accidents account for 20.3%. Accidents are geographically concentrated, with 11.7%, 8.3%, and 7.7% occurring in Shanxi, Gansu, and Inner Mongolia, respectively. Official data have shown an annual average decrease of 9.7% in mine accidents from 2018 to 2022, reflecting improved safety governance. This dataset addresses the gap of a full-element structured mine accident database in China, providing high-quality data for accident causation modeling, regional risk early warning, and safety policy evaluation. It also supports mine enterprises in targeted risk prevention and regulatory authorities in precise regulatory enforcement.

4 December 2025

Coal mining mortality rate comparison (2010–2025).

Data Quality in the Age of AI: A Review of Governance, Ethics, and the FAIR Principles

  • Miriam Guillen-Aguinaga
  • Enrique Aguinaga-Ontoso
  • Laura Guillen-Aguinaga
  • + 2 authors

Data quality is fundamental to scientific integrity, reproducibility, and evidence-based decision-making. Nevertheless, many datasets lack transparency in their collection and curation, undermining trust and reusability across research domains. This narrative review synthesizes scientific and technical literature published between 1996 and 2025, complemented by international standards (ISO/IEC 25012, ISO 8000), to provide an integrated overview of data quality frameworks, governance, and ethical considerations in the era of Artificial Intelligence (AI). Sources were retrieved from PubMed, Scopus, Web of Science, and grey literature. Across sectors, accuracy, completeness, consistency, timeliness, and accessibility consistently emerged as universal quality dimensions. Evidence from healthcare, business, and public administration suggests that poor data quality leads to substantial financial losses, operational inefficiencies, and erosion of trust. Emerging frameworks are increasingly integrating FAIR principles (Findability, Accessibility, Interoperability, Reusability) and incorporating ethical safeguards, including bias mitigation in AI systems. Data quality is not solely a technical issue but a socio-organizational challenge that requires robust governance and continuous assurance throughout the data lifecycle. Embedding quality and ethical governance into data management practices is crucial for producing trustworthy, reusable, and reproducible data that supports sound science and informed decision-making.

4 December 2025

Conceptual framework integrating data quality, governance, ethics, and AI tools for reproducibility and trust. Light blue nodes represent core data quality management components; the green node highlights reproducibility and trust as the collective outcome. Bidirectional arrows indicate mutually reinforcing relationships, demonstrating that technical solutions alone are insufficient without robust governance and ethical safeguards.
  • Data Descriptor
  • Open Access

Liepāja Lake, a Natura 2000 protected area and one of the largest coastal freshwater bodies in Latvia, has been historically influenced by urbanization, diffuse agricultural inputs, and legacy contamination from metallurgy and ship-repair industries. Comprehensive, spatially explicit data on its sediment and water chemistry were previously lacking. The dataset used in this study provides an openly accessible record of major and trace element concentrations in surface sediments and surface waters collected during the 2024 field campaign. Sampling sites were distributed across northern, central, and southern zones to capture gradients in anthropogenic pressure and natural variability. Water samples were filtered and acidified following ISO 15587-2:2002, while sediments were homogenized, sieved, and digested following EPA 3051a. Both matrices were analyzed using Inductively Coupled Plasma Mass Spectrometry (ICP-MS, Agilent 8900 ICP-QQQ) with multi-element calibration traceable to NIST standards. The dataset comprises 31 analytes (Li–Bi) with paired standard deviation values, reported in mg kg⁻¹ (sediments) and µg L⁻¹ (water). Rigorous validation included certified reference materials, duplicates, blanks, and statistical outlier screening. The resulting data form a reliable geochemical baseline for assessing pollution sources, quantifying spatial heterogeneity, and supporting future monitoring, modeling, and restoration efforts in climate-sensitive Baltic coastal lakes.

3 December 2025

Liepāja Lake with water and sediment sampling points.
  • Data Descriptor
  • Open Access

With the increasing demand for sustainable building materials, it has become essential to identify sustainable alternatives to conventional sound absorbers, particularly in the context of waste reduction and the circular economy. The aim of this study was to compile and describe a structured dataset of sound absorption coefficients for laboratory-produced panels made from recycled textile materials. Five types of panels were developed using cotton, polyester, wool, linen, and a mixed composition of textiles. A biopolymer binder was applied to ensure structural stability of the materials. Following careful sorting, shredding, and homogenization of the textile waste, test specimens were prepared and examined under controlled laboratory conditions. The sound absorption coefficients were measured using an AFD 1000 impedance tube in accordance with the ISO 10534-2 standard, across a frequency range from 6.25 to 6393.75 Hz. For each material, three repeated measurements were performed, and mean values were calculated to ensure accuracy and reliability. The resulting dataset contains structured values of sound absorption coefficients, which can be applied in building acoustics modeling, comparative studies with conventional insulation materials, and the development of new sustainable products. In addition, the data can be used in educational contexts and machine learning applications to predict the acoustic properties of recycled textile composites.

2 December 2025

Textile Waste Collection and Preparation Process.

Reprints of Collections

Data Mining and Computational Intelligence for E-learning and Education
Reprint

Editors: Antonio Sarasa Cabezuelo, Ramón González del Campo Rodríguez Barbero
Recent Advances and Applications in Partial Least Squares Structural Equation Modeling (PLS-SEM)
Reprint

Editors: María del Carmen Valls Martínez, José-María Montero, Pedro Antonio Martín Cervantes

Data - ISSN 2306-5729