Journal Description
Data is a peer-reviewed, open access journal on data in science, with the aim of enhancing data transparency and reusability. The journal publishes in two sections: a section on the collection, treatment and analysis methods of data in science; a section publishing descriptions of scientific and scholarly datasets (one dataset per paper). The journal is published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), Ei Compendex, dblp, Inspec, RePEc, and other databases.
- Journal Rank: JCR - Q2 (Multidisciplinary Sciences) / CiteScore - Q2 (Information Systems and Management)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 25.2 days after submission; acceptance to publication is undertaken in 2.9 days (median values for papers published in this journal in the first half of 2025).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
- Journal Cluster of Information Systems and Technology: Analytics, Applied System Innovation, Cryptography, Data, Digital, Informatics, Information, Journal of Cybersecurity and Privacy, and Multimedia.
Impact Factor: 2.0 (2024); 5-Year Impact Factor: 2.1 (2024)
Latest Articles
Anonymized Dataset of Information Systems and Technology Students at a South African University for Learning Analytics
Data 2026, 11(1), 1; https://doi.org/10.3390/data11010001 (registering DOI) - 19 Dec 2025
Abstract
Advancements in data storage and data processing technologies have compelled higher education institutions to optimise the use of their data. Many universities globally have begun to implement learning analytics at their institutions to better understand and improve teaching and learning. African higher education institutions have been slow to implement learning analytics despite the continued accumulation of digital data. The research related to this study presents a dataset of Information Systems and Technology (IS&T) students from the University of KwaZulu-Natal, a South African university. The dataset comprises approximately 14,000 registered student records from 10 IS&T courses, primarily consisting of demographic data, academic performance (including past IS&T courses and school records), and Learning Management System (LMS) interaction data. The dataset exhibits an imbalance, characterised by a higher proportion of students who have successfully completed courses compared to those who have not. The dataset will be of interest to researchers engaged in learning analytics application studies, including early pass/fail prediction and grade classification, as well as those who want to test their techniques on a real-world dataset.
(This article belongs to the Special Issue Data Mining and Computational Intelligence for E-Learning and Education—3rd Edition)
Open Access Article
A Real-World Underwater Video Dataset with Labeled Frames and Water-Quality Metadata for Aquaculture Monitoring
by
Osbaldo Aragón-Banderas, Leonardo Trujillo, Yolocuauhtli Salazar, Guillaume J. V. E. Baguette and Jesús L. Arce-Valdez
Data 2025, 10(12), 211; https://doi.org/10.3390/data10120211 - 18 Dec 2025
Abstract
Aquaculture monitoring increasingly relies on computer vision to evaluate fish behavior and welfare under farming conditions. This dataset was collected in a commercial recirculating aquaculture system (RAS) integrated with hydroponics in Queretaro, Mexico, to support the development of robust visual models for Nile tilapia (Oreochromis niloticus). More than ten hours of underwater recordings were curated into 31 clips of 30 s each, a duration selected to balance representativeness of fish activity with a manageable size for annotation and training. Videos were captured using commercial action cameras at multiple resolutions (1920 × 1080 to 5312 × 4648 px), frame rates (24–60 fps), depths, and lighting configurations, reproducing real-world challenges such as turbidity, suspended solids, and variable illumination. For each recording, physicochemical parameters were measured, including temperature, pH, dissolved oxygen and turbidity, and are provided in a structured CSV file. In addition to the raw videos, the dataset includes 3520 extracted frames annotated using a polygon-based JSON format, enabling direct use for training object detection and behavior recognition models. This dual resource of unprocessed clips and annotated images enhances reproducibility, benchmarking, and comparative studies. By combining synchronized environmental data with annotated underwater imagery, the dataset contributes a non-invasive and versatile resource for advancing aquaculture monitoring through computer vision.
(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)
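As a rough illustration of how polygon-based JSON annotations such as those described in the abstract above might be consumed, the sketch below converts one polygon into an axis-aligned bounding box for object detection training. The key names (`label`, `points`) and the sample record are assumptions made for illustration only; the dataset's actual schema should be checked against its documentation.

```python
import json

def polygon_to_bbox(points):
    """Return (x_min, y_min, x_max, y_max) for a list of [x, y] vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical annotation record; the real key names may differ.
sample = json.loads(
    '{"label": "tilapia", "points": [[120, 80], [200, 95], [180, 160], [110, 140]]}'
)
bbox = polygon_to_bbox(sample["points"])
print(sample["label"], bbox)
```

Keeping the polygon as the source of truth and deriving boxes on demand means the same annotations can serve both detection and segmentation models.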
Open Access Article
Labels4Rails: A Railway Image Annotation Tool and Associated Reference Dataset
by
Tina Hiebert, Florian Hofstetter, Carsten Thomas, Savera Mushtaq, Eero Kaan and Biranavan Parameswaran
Data 2025, 10(12), 210; https://doi.org/10.3390/data10120210 - 16 Dec 2025
Abstract
The development of autonomous train systems relies heavily on machine learning (ML) models, which in turn depend on large, high-quality annotated datasets for training and evaluation. The railway domain lacks adequate public datasets and efficient annotation tools. To address this gap, we present Labels4Rails, a tool designed specifically for the annotation of railway scenes. It captures track topology, switch states including switch directions, and informational tags regarding the images’ content and leverages consistent camera perspectives and the fixed track geometries inherent to railways for annotation efficiency. We used Labels4Rails to create the L4R_NLB reference dataset from Norwegian railway footage. The dataset contains 10,253 annotated images across four seasons, including 1415 switch annotations. Both the tool and dataset are publicly available.
Open Access Data Descriptor
AlimurgITA: A Database of the Italian Alimurgic Flora
by
Piera Di Marzio, Angela Di Iorio, Carmen Giancola and Bruno Paura
Data 2025, 10(12), 209; https://doi.org/10.3390/data10120209 - 16 Dec 2025
Abstract
The AlimurgITA portal is a user-friendly and effective tool for researching Wild Edible Plants (WEPs). It provides valuable information on alimurgic plant species, aiding conservation and potential applications (agricultural, food, etc.). Users can interact with authors to report errors and contribute to the knowledge base regarding local uses. The authors will update the site every six months to include new data. Currently, the online database contains data on 1116 taxa used in 20 Italian regions: updated scientific name and link to the site Acta Plantarum, family, main synonyms, common name in Italian and regional dialect, chorotype, life form, a map showing the regions where it is known to be used, the part used, how it is used, and the bibliography. From the home page, you can search for taxa by scientific name, and there are pages dedicated to summaries of the entries: scientific name, family, chorotype, life form, method of use, and part used. Additionally, within the FuD WE PIC Project, the AlimurgITA entity list is being integrated with Italian vegetation data from the European Vegetation Archive to model WEPs richness, identify diversity hotspots, and explore the relationship between WEPs diversity and habitat types.
(This article belongs to the Section Information Systems and Data Management)
Open Access Article
Automated Building of a Multidialectal Parallel Arabic Corpus Using Large Language Models
by
Khalid Almeman
Data 2025, 10(12), 208; https://doi.org/10.3390/data10120208 - 12 Dec 2025
Abstract
The development of Natural Language Processing applications tailored for diverse Arabic-speaking users requires specialized Arabic corpora, which are currently lacking in existing Arabic linguistic resources. Therefore, in this study, a multidialectal parallel Arabic corpus is built, focusing on the travel and tourism domain. By leveraging the text generation and dialectal transformation capabilities of Large Language Models, an initial set of approximately 100,000 parallel sentences was generated. Following a rigorous multi-stage deduplication process, 50,010 unique parallel sentences were obtained from Modern Standard Arabic (MSA) and five major Arabic dialects—Saudi, Egyptian, Iraqi, Levantine, and Moroccan. This study presents the detailed methodology of corpus generation and refinement, describes the characteristics of the generated corpus, and provides a comprehensive statistical analysis highlighting the corpus size, lexical diversity, and linguistic overlap between MSA and the five dialects. This corpus represents a valuable resource for researchers and developers in Arabic dialect processing and AI applications that require nuanced contextual understanding.
(This article belongs to the Topic New Applications of Big Data Technology: Integration of Data Mining and Artificial Intelligence)
Open Access Article
Operator Learning with Branch–Trunk Factorization for Macroscopic Short-Term Speed Forecasting
by
Bin Yu, Yong Chen, Dawei Luo and Joonsoo Bae
Data 2025, 10(12), 207; https://doi.org/10.3390/data10120207 - 12 Dec 2025
Abstract
Logistics operations demand real-time visibility and rapid response, yet minute-level traffic speed forecasting remains challenging due to heterogeneous data sources and frequent distribution shifts. This paper proposes a Deep Operator Network (DeepONet)-based framework that treats traffic prediction as learning a mapping from historical states and boundary conditions to future speed states, enabling robust forecasting under changing scenarios. We project logistics demand onto a road network to generate diverse congestion scenarios and employ a branch–trunk architecture to decouple historical dynamics from exogenous contexts. Experiments on both a controlled simulation dataset and the real-world Metropolitan Los Angeles (METR-LA) benchmark demonstrate that the proposed method outperforms classical regression and deep learning baselines in cross-scenario generalization. Specifically, the operator learning approach effectively adapts to unseen boundary conditions without retraining, establishing a promising direction for resilient and adaptive logistics forecasting.
(This article belongs to the Topic Advanced Techniques and Modeling in Business and Economics)
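The branch–trunk factorization mentioned in the abstract above can be sketched generically: a branch network encodes the input function (here, a window of historical speeds), a trunk network encodes the query or context, and the prediction is the inner product of the two feature vectors. The code below is a minimal, untrained illustration of that general DeepONet structure and is not the authors' model; the layer sizes, the random weights, and the two-element context vector are all assumptions.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained weights are reproducible

def dense(n_in, n_out):
    """Random weight matrix (n_in x n_out); stands in for trained weights."""
    return [[random.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]

def matvec(x, w):
    """Multiply row vector x (length n_in) by matrix w (n_in x n_out)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]

def mlp(x, w1, w2):
    """Two-layer perceptron with a tanh hidden layer and no biases."""
    return matvec([math.tanh(v) for v in matvec(x, w1)], w2)

# Hypothetical sizes: history length, context dimension, hidden width, latent size.
HISTORY, CONTEXT, HIDDEN, LATENT = 12, 2, 16, 8
branch = (dense(HISTORY, HIDDEN), dense(HIDDEN, LATENT))
trunk = (dense(CONTEXT, HIDDEN), dense(HIDDEN, LATENT))

def deeponet(history, context):
    """Predict one value as the inner product of branch and trunk features."""
    b = mlp(history, *branch)   # encodes the historical speed states
    t = mlp(context, *trunk)    # encodes the exogenous context / query point
    return sum(bk * tk for bk, tk in zip(b, t))

speeds = [random.uniform(20.0, 60.0) for _ in range(HISTORY)]
print(deeponet(speeds, [0.3, 0.7]))
```

Because the history and the context enter through separate networks, a new context vector can be evaluated against an already-encoded history without touching the branch weights, which is one way to read the claim of adapting to unseen boundary conditions without retraining.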
Open Access Data Descriptor
A Dataset for the Medical Support Vehicle Location–Allocation Problem
by
Miguel Medina-Perez, Giovanni Guzmán, Magdalena Saldana-Perez, Adriana Lara and Miguel Torres-Ruiz
Data 2025, 10(12), 206; https://doi.org/10.3390/data10120206 - 10 Dec 2025
Abstract
In mass-casualty incidents, emergency responders require access to accurate and timely information to support informed decision-making and ensure the efficient allocation of resources. This article presents a dataset derived from a case study conducted in Mexico City (CDMX) based on the earthquake of 19 September 2017. The dataset presents hypothetical scenarios involving multiple demand points and large numbers of victims, making it suitable for analysis using optimization techniques. It integrates voluntary collaborative geographic information, open government data sources, and historical records, and details the data collection, cleaning, and preprocessing stages. The accompanying Python 3 source code enables users to update the original data for consistent analysis and processing. Researchers can adapt this dataset to other cities with similar risk characteristics, such as Santiago (Chile), Los Angeles (USA), or Tokyo (Japan), and extend it to other types of catastrophic events, including floods, landslides, or epidemics, to support emergency response and resource allocation planning.
Open Access Data Descriptor
Computational Dataset for Polymer–Pharmaceutical Interactions: MD/MM-PBSA and DFT Resources for Molecularly Imprinted Polymer (MIP) Design
by
David Visentin, Mario Lovrić, Dejan Milenković, Robert Vianello, Željka Maglica, Kristina Tolić Čop and Dragana Mutavdžić Pavlović
Data 2025, 10(12), 205; https://doi.org/10.3390/data10120205 - 10 Dec 2025
Abstract
Molecularly imprinted polymers (MIPs) are promising sorbents for selectively capturing pharmaceutically active compounds (PhACs), but design remains slow because candidate screening is largely experimental or based on computationally expensive methods. We present MIP–PhAC, an open, curated resource of polymer–pharmaceutical interaction energies generated from molecular dynamics (MD) followed by MM/PBSA analysis, with a small DFT subset for cross-method comparison. This resource comprises two complementary datasets: MIP–PhAC-Calibrated, a benchmark set with manually verified pH-7 microstates that reports both monomeric (pre-polymerized) and polymeric (short-chain) MD/MMPBSA energies and includes a DFT subset; and MIP–PhAC-Screen, a broader, high-throughput collection produced under a uniform automated workflow (including automated protonation) for rapid within-polymer ranking and machine learning development. For each MIP–PhAC pair we provide ΔG* components (electrostatics, van der Waals, polar and non-polar solvation; −TΔS omitted), summary statistics from post-convergence frames, simulation inputs, and chemical metadata. To our knowledge, MIP–PhAC is the largest open, curated dataset of polymer–pharmaceutical interaction energies to date. It enables benchmarking of end-point methods, reproducible protocol evaluation, data-driven ranking of polymer–pharmaceutical combinations, and training/validation of machine learning (ML) models for MIP design on modest compute budgets.
Open Access Data Descriptor
Early-Season Field Reference Dataset of Croplands in a Consolidated Agricultural Frontier in the Brazilian Cerrado
by
Ana Larissa Ribeiro de Freitas, Fábio Furlan Gama, Ivo Augusto Lopes Magalhães and Edson Eyji Sano
Data 2025, 10(12), 204; https://doi.org/10.3390/data10120204 - 10 Dec 2025
Abstract
This dataset presents field observations collected in the municipality of Goiatuba, Goiás State, Brazil, a consolidated and representative agricultural frontier of the Brazilian Cerrado biome. The region presents diverse land use dynamics, including annual cropping systems, irrigated fields with up to three harvests per year, and pasturelands. We conducted a field campaign from 3 to 7 November 2025, corresponding to the beginning of the 2025/2026 Brazilian crop season, when crops were at distinct early phenological stages. To ensure representativeness, we delineated 117 reference fields prior to the field campaign, and an additional 463 plots were surveyed during the fieldwork. Geographic coordinates, crop types, and photographic records were obtained using the GPX Viewer application, a handheld GPS receiver, and the QField 3.7.9 mobile GIS application running on a tablet loaded with Sentinel-2 true-color imagery and the municipal road network. Plot boundaries were subsequently digitized in QGIS Desktop 3.34.1 software, following a conservative mapping strategy to minimize edge effects and internal heterogeneity associated with trees and water catchment basins. In total, more than 26,000 hectares of agricultural fields were mapped, along with additional land use and land cover polygons representing water bodies, urban areas, and natural vegetation fragments. All reference fields were labeled based on in situ observations and linked to Sentinel-2 mosaics downloaded via the Google Earth Engine platform. This dataset is well-suited for training, testing, and validation of remote sensing classifiers, benchmarking studies, and agricultural mapping initiatives focused on the beginning of the agricultural season in the Brazilian Cerrado.
(This article belongs to the Special Issue New Progress in Big Earth Data)
Open Access Article
ANSEC-MM: Identifying Antecedents of Negative Public Sentiment Through Expression Capacity: A Mixed-Methods Approach to Crisis Mitigation
by
Zeeshan Rasheed, Shahzad Ashraf and Syed Kanza Mehak
Data 2025, 10(12), 203; https://doi.org/10.3390/data10120203 - 9 Dec 2025
Abstract
Social networks have emerged as integral platforms for communication and information dissemination in contemporary society. The spread of negative sentiments and its impact on activities of users in social networks is a crucial issue. When users receive negative reviews about news or articles, regardless of authenticity, they form opinions based on their own understanding, and statistics show that more than 90% of the time this reveals predictable behavior patterns. To address this situation, the proposed Antecedents of Negative Sentiment through Expression Capacity: Mixed Methods (ANSEC-MM) study identifies the antecedents of negative sentiment using expression capacity as a mixed-methods approach to mitigate the generation of negative sentiments. The proposed model introduces the concept of identification of influencer nodes with further categorization into active and inactive influencer nodes. The model separates negative influencer nodes from positive nodes and processes the negative influencer nodes further. A Node Expressive Capacity (NE) metric predicts the frequency with which users interact with neighboring influencer nodes, which contributes to the generation of negative sentiments. A Cognitive Effect Coefficient (φ) defines the temperament status of the users. Through further computation, the model distinguishes the proportion of negative sentiments from positive ones. Negative sentiment mitigation is achieved through a developed algorithmic approach. Performance is tested and compared across three datasets against state-of-the-art models: EANN, BERT, and AOAN. The proposed model demonstrated superior performance in negative sentiment detection and mitigation, achieving accuracy rates of 90% and 88%, respectively, compared to existing models.
(This article belongs to the Special Issue Advances in Graph-Structured Data: Methods and Applications)
Open Access Data Descriptor
China’s 15-Year Mine Accident Report Dataset (2010–2025): Construction and Analysis
by
Maoquan Wan, Hao Li, Hao Wang, Hanjun Gong and Jie Hou
Data 2025, 10(12), 202; https://doi.org/10.3390/data10120202 - 4 Dec 2025
Abstract
Mine accidents pose severe threats to worker safety and sustainable mining development in China. However, existing mine accident data in China are often scattered, unstructured, and lack systematic integration, which limits their application in safety research and practice. This study constructed a standardized structured dataset using 532 mine accident reports from official channels covering the period 2010–2025. The dataset went through four stages: data collection, standardized cleaning, structured annotation, and quality validation. It is stored in JSON Lines (JSONL) format for easy reuse. The dataset covers 27 provinces/autonomous regions/municipalities in China. Among accident levels, general accidents account for 65.6%; among accident types, roof accidents account for 20.3%. Accidents are geographically concentrated, with 11.7%, 8.3%, and 7.7% occurring in Shanxi, Gansu, and Inner Mongolia, respectively. Official data have shown an annual average decrease of 9.7% in mine accidents from 2018 to 2022, reflecting improved safety governance. This dataset addresses the gap of a full-element structured mine accident database in China, providing high-quality data for accident causation modeling, regional risk early warning, and safety policy evaluation. It also supports mine enterprises in targeted risk prevention and regulatory authorities in precise regulatory enforcement.
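Because the dataset described above is distributed as JSON Lines, each line can be parsed independently as one record. The sketch below shows one way regional shares like those quoted in the abstract could be recomputed; the field names (`province`, `accident_type`) and the sample records are hypothetical and are not taken from the dataset.

```python
import json
from collections import Counter

def province_shares(lines, key="province"):
    """Percentage of records per value of `key` in JSON Lines text."""
    records = [json.loads(line) for line in lines if line.strip()]
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

# Hypothetical records for illustration; blank lines are skipped.
sample = [
    '{"province": "Shanxi", "accident_type": "roof"}',
    '{"province": "Gansu", "accident_type": "gas"}',
    '{"province": "Shanxi", "accident_type": "flood"}',
    '',
    '{"province": "Inner Mongolia", "accident_type": "roof"}',
]
shares = province_shares(sample)
print(shares)
```

With a real file, an open file handle can be passed in place of the list, since iterating over it yields one line at a time.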
Open Access Review
Data Quality in the Age of AI: A Review of Governance, Ethics, and the FAIR Principles
by
Miriam Guillen-Aguinaga, Enrique Aguinaga-Ontoso, Laura Guillen-Aguinaga, Francisco Guillen-Grima and Ines Aguinaga-Ontoso
Data 2025, 10(12), 201; https://doi.org/10.3390/data10120201 - 4 Dec 2025
Abstract
Data quality is fundamental to scientific integrity, reproducibility, and evidence-based decision-making. Nevertheless, many datasets lack transparency in their collection and curation, undermining trust and reusability across research domains. This narrative review synthesizes scientific and technical literature published between 1996 and 2025, complemented by international standards (ISO/IEC 25012, ISO 8000), to provide an integrated overview of data quality frameworks, governance, and ethical considerations in the era of Artificial Intelligence (AI). Sources were retrieved from PubMed, Scopus, Web of Science, and grey literature. Across sectors, accuracy, completeness, consistency, timeliness, and accessibility consistently emerged as universal quality dimensions. Evidence from healthcare, business, and public administration suggests that poor data quality leads to substantial financial losses, operational inefficiencies, and erosion of trust. Emerging frameworks are increasingly integrating FAIR principles (Findability, Accessibility, Interoperability, Reusability) and incorporating ethical safeguards, including bias mitigation in AI systems. Data quality is not solely a technical issue but a socio-organizational challenge that requires robust governance and continuous assurance throughout the data lifecycle. Embedding quality and ethical governance into data management practices is crucial for producing trustworthy, reusable, and reproducible data that supports sound science and informed decision-making.
Open Access Data Descriptor
Georeferenced Sediment and Surface Water Element Concentrations in the Coastal Liepāja Lake (Latvia), 2024
by
Inga Grinfelde, Uldis Valainis, Maris Nitcis, Ieva Buske, Jana Grave, Normunds Stivrins, Vilda Grybauskiene, Gitana Vyciene, Maris Bertins and Jovita Pilecka-Ulcugaceva
Data 2025, 10(12), 200; https://doi.org/10.3390/data10120200 - 3 Dec 2025
Abstract
Liepāja Lake, a Natura 2000 protected area and one of the largest coastal freshwater bodies in Latvia, has been historically influenced by urbanization, diffuse agricultural inputs, and legacy contamination from metallurgy and ship-repair industries. Comprehensive, spatially explicit data on its sediment and water chemistry were previously lacking. The dataset used in this study provides an openly accessible record of major and trace element concentrations in surface sediments and surface waters collected during the 2024 field campaign. Sampling sites were distributed across northern, central, and southern zones to capture gradients in anthropogenic pressure and natural variability. Water samples were filtered and acidified following ISO 15587-2:2002, while sediments were homogenized, sieved, and digested following EPA 3051a. Both matrices were analyzed using Inductively Coupled Plasma Mass Spectrometry (ICP-MS, Agilent 8900 ICP-QQQ) with multi-element calibration traceable to NIST standards. The dataset comprises 31 analytes (Li–Bi) with paired standard deviation values, reported in mg kg⁻¹ (sediments) and µg L⁻¹ (water). Rigorous validation included certified reference materials, duplicates, blanks, and statistical outlier screening. The resulting data form a reliable geochemical baseline for assessing pollution sources, quantifying spatial heterogeneity, and supporting future monitoring, modeling, and restoration efforts in climate-sensitive Baltic coastal lakes.
Open Access Data Descriptor
Sound Absorption Coefficient Data for Laboratory-Produced Sound-Absorbing Panels from Textile Waste
by
Kristaps Siltumens, Inga Grinfelde, Raitis Brencis and Andris Paeglitis
Data 2025, 10(12), 199; https://doi.org/10.3390/data10120199 - 2 Dec 2025
Abstract
With the increasing demand for sustainable building materials, it has become essential to identify sustainable alternatives to conventional sound absorbers, particularly in the context of waste reduction and the circular economy. The aim of this study was to compile and describe a structured dataset of sound absorption coefficients for laboratory-produced panels made from recycled textile materials. Five types of panels were developed using cotton, polyester, wool, linen, and a mixed composition of textiles. A biopolymer binder was applied to ensure structural stability of the materials. Following careful sorting, shredding, and homogenization of the textile waste, test specimens were prepared and examined under controlled laboratory conditions. The sound absorption coefficients were measured using an AFD 1000 impedance tube in accordance with the ISO 10534-2 standard, across a frequency range from 6.25 to 6393.75 Hz. For each material, three repeated measurements were performed, and mean values were calculated to ensure accuracy and reliability. The resulting dataset contains structured values of sound absorption coefficients, which can be applied in building acoustics modeling, comparative studies with conventional insulation materials, and the development of new sustainable products. In addition, the data can be used in educational contexts and machine learning applications to predict the acoustic properties of recycled textile composites.
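The averaging step described above (three repeated measurements per material, reduced to mean values) can be sketched as a frequency-by-frequency mean. The data layout, a dictionary per run mapping frequency in Hz to absorption coefficient, and the numbers below are assumptions chosen for illustration; they are not values from the dataset.

```python
from statistics import mean

def mean_absorption(runs):
    """Average repeated runs frequency-by-frequency.

    runs: list of dicts mapping frequency (Hz) to absorption coefficient.
    Only frequencies present in every run are averaged.
    """
    common = sorted(set.intersection(*(set(run) for run in runs)))
    return {freq: mean(run[freq] for run in runs) for freq in common}

# Hypothetical values for one material, three repeated measurements.
runs = [
    {250.0: 0.20, 500.0: 0.40, 1000.0: 0.70},
    {250.0: 0.22, 500.0: 0.43, 1000.0: 0.73},
    {250.0: 0.24, 500.0: 0.46, 1000.0: 0.76},
]
averaged = mean_absorption(runs)
print(averaged)
```

Restricting the mean to frequencies shared by all runs avoids silently averaging over fewer than three measurements when a run is incomplete.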
Open Access Data Descriptor
Open Dataset on Neurocognitive Complaints and Physical Symptoms in Long COVID: A Six-Month Post-Infection Cohort
by
Somayeh Pour Mohammadi, Francisco Mercado Romero, Moein Noroozi Fashkhami and Irene Peláez
Data 2025, 10(12), 198; https://doi.org/10.3390/data10120198 - 1 Dec 2025
Abstract
Long COVID is frequently accompanied by enduring neurocognitive and physical symptoms that substantially affect quality of life. Cognitive complaints—including difficulties in memory, attention, and executive functioning—often co-occur with physical manifestations such as fatigue, dyspnea, and headache. Despite growing research, openly available datasets integrating demographic, cognitive, and physical symptom profiles assessed during chronic phases of Long COVID remain scarce. Here, we present two complementary self-report datasets collected ≥6 months after the most recent COVID-19 infection. The first dataset (“Neuro–Long COVID-212”) includes demographic information, binary neurocognitive symptom indicators, and a 14-item Post-COVID Cognitive Impairment Scale assessing memory and attention complaints. The second dataset (“Neuro–Long COVID–210”) provides a broad range of physical symptoms—operationally defined as somatic and neurological complaints (e.g., fatigue, pain, sleep disturbance, anosmia/ageusia)—recorded as binary indicators (present/absent). Data were collected online via the Porsline platform using individualized links, with remote researcher support to ensure accuracy. Quality assurance procedures included duplicate-response removal, consistency checks, and transparent handling of missing values. The datasets are released in Excel (.xlsx) format, fully de-identified and accompanied by a detailed data dictionary to facilitate reuse. These datasets enable reproducibility, secondary analyses, and meta-analyses on cognitive and physical outcomes in Long COVID, and may inform future cross-disciplinary rehabilitation research.
(This article belongs to the Special Issue Data in Behavioral and Experimental Research: Datasets and Applications)
Open Access Data Descriptor
Articulatory Data on Preboundary Lengthening Across Prominence Conditions in American English
by
Jiyoung Jang, Sahyang Kim and Taehong Cho
Data 2025, 10(12), 197; https://doi.org/10.3390/data10120197 - 1 Dec 2025
Abstract
This article presents articulatory–kinematic data on preboundary lengthening (Intonational Phrase-final lengthening) from the productions of ten native speakers of American English—a relatively rare class of phonetic data compared with the more widely available acoustic data. The dataset includes three trisyllabic nonce words (bábaba, babába, bababá), each designed to manipulate the location of lexical stress. These were produced under prosodic conditions that varied in boundary position and focus-induced phrasal prominence, enabling analysis of how preboundary lengthening is distributed across words with different lexical stress locations and how it interacts with prosodic prominence. Articulatory data were collected using electromagnetic articulography (EMA, Carstens AG200), providing kinematic measurements such as movement duration, peak velocity, and displacement of articulatory gestures. The accompanying files allow examination of individual speaker variation in these measures as modulated by prosodic structure, including boundary and prominence effects. While theoretical findings have been reported in a previous study, the full dataset, including detailed descriptions of individual speaker patterns, is made available here. By making these less commonly available articulatory data publicly available, we aim to promote broad reuse and support further research in prosody, articulatory phonetics, and speech production.
Open Access Article
Using Machine Learning to Identify Predictors of Heterogeneous Intervention Effects in Childhood Obesity Prevention
by
Elizabeth Mannion, Kristine Bihrmann, Nanna Julie Olsen, Berit Lilienthal Heitmann and Christian Ritz
Data 2025, 10(12), 196; https://doi.org/10.3390/data10120196 - 1 Dec 2025
Abstract
Obesity prevention interventions in children often produce small or null effects. However, ignoring heterogeneous responses may widen pre-existing inequalities. This secondary analysis explored baseline predictors of differential effects on BMI z-score, Fat mass (%), stress, and sleep outcomes in obesity-susceptible, healthy-weight children (n = 543). A modified LASSO regression was applied to baseline characteristics, including physical activity and socio-demographics. Few predictors were retained. For BMI z-score, weekly chores and parental divorce were the strongest predictors: children who did chores had a slightly larger increase in BMI z-score in the intervention group compared with controls (MD = 0.15, 95% CI: −0.03, 0.33), while children with divorced parents showed a smaller increase (MD = −0.19, 95% CI: −0.69, 0.31). These results align with evidence that low-intensity activity has limited impact on obesity outcomes and that children with compounded vulnerability may respond differently to tailored interventions. Even when overall effects are small, machine learning approaches can identify potential predictors of heterogeneous intervention effects, supporting the design of future targeted interventions aimed at reducing inequalities.
(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)
Open Access Data Descriptor
DECOVID: A UK Two-Center Harmonized Database of Acute Care Electronic Health Records for COVID-19 Research
by
DECOVID Consortium, Louis J. M. Aslett, Andreea Avramescu, Nicholas Bakewell, Isabel Birds, Louise Bowler, Michael P. J. Camilleri, Sheng-Chia Chung, David A. Clifton, Samuel N. Cohen, Nathan Constantine-Cooke, Eric G. Daub, Shaun Davidson, Spiros Denaxas, Karla Diaz-Ordaz, Richard Feltbower, Suzy Gallier, Stephen Gardiner, Francesca Gasperoni, Robert J. B. Goudie, Rebecca E. Green, Marlous Hall, Chris Holmes, John R. Hurst, Mark M. Iles, Joao Jorge, Emma Karoune, Ruth Keogh, Ruairidh King, Ruth King, Paul D. W. Kirk, Roman Klapaukh, Samaneh Kouchaki, Alvina G. Lai, Nathan Lea, Clemence Leyrat, Kezhi Li, Watjana Lilaonitkul, Huiqi Y. Lu, Terry Lyons, Ann Marie Mallon, Andrew Manderson, Nicolò Margaritella, Joshua Matteson, Sam Morley, Hannah Nicholls, Martin O’Reilly, Christina Pagel, Edward Palmer, Jack Roberts, Timothy J. Roberts, David S. Robertson, James Robinson, Patrick Rockenschaub, Roy Ruddle, Elizabeth Sapey, Luis Santos, Andrew A. S. Soltan, Fang Gao Smith, Colin Starr, Oliver Strickson, Li Su, Mia S. Tackney, Johan H. Thygesen, Ana Torralbo, Alice Turner, Catalina A. Vallejos, Chenyang Wang, Kirstie Whitaker, Tony Whitehouse, David R. Westhead, Wai Keong Wong, Yue Wu, Lingyi Yang and Xiaoxu Zou
Data 2025, 10(12), 195; https://doi.org/10.3390/data10120195 - 24 Nov 2025
Abstract
The DECOVID database contains harmonized pseudonymized electronic health record (EHR) data on all adult (≥18 years old) patients presenting to two large, digitally mature centers in the United Kingdom between 1 January 2020 and 28 February 2021, with follow-up until at least 28 March 2021. The database was originally developed to support the COVID-19 response but is now available via the PIONEER data hub for researchers to explore a wide range of research questions, including exploratory analyses, risk factor assessment, prediction modeling, and comparative effectiveness studies. Raw data were extracted from local EHRs and transformed into a standardized form (Observational Health Data Sciences and Informatics-Common Data Model version 5.3.1). The database includes 165,420 patients across 256,804 hospital presentations. For these patients, highly granular data are available, including patient demographics, longitudinal vital signs, physiology, treatments, laboratory findings, clinical diagnoses, and outcomes. There are 10,030 patients with COVID-19, of whom 1472 died in hospital.
Open Access Data Descriptor
SurfaceEMG Datasets for Hand Gesture Recognition Under Constant and Three-Level Force Conditions
by
Cinthya Alejandra Zúñiga-Castillo, Víctor Alejandro Anaya-Mosqueda, Natalia Margarita Rendón-Caballero, Marcos Aviles, José M. Álvarez-Alvarado, Roberto Augusto Gómez-Loenzo and Juvenal Rodríguez-Reséndiz
Data 2025, 10(12), 194; https://doi.org/10.3390/data10120194 - 22 Nov 2025
Abstract
This work introduces two complementary surface electromyography (sEMG) datasets for hand gesture recognition. Signals were collected from 40 healthy subjects aged 18 to 40 years, divided into two independent groups of 20 participants each. In both datasets, subjects performed five hand gestures. Most of the gestures are the same, although the exact set and the order differ slightly between datasets. For example, Dataset 2 (DS2) includes the simultaneous flexion of the thumb and index finger, which is not present in Dataset 1 (DS1). Data were recorded with three bipolar sEMG sensors placed on the dominant forearm (flexor digitorum superficialis, extensor digitorum, and flexor pollicis longus). A battery-powered acquisition system was used, with sampling rates of 1000 Hz for DS1 and 1500 Hz for DS2. DS1 contains recordings performed at a constant moderate force, while DS2 includes three force levels (low, medium, and high). Both datasets provide raw signals and pre-processed versions segmented into overlapping windows, with clear file structures and annotations, enabling feature extraction for machine learning applications. Together, they constitute a large-scale standardized sEMG resource that supports the development and benchmarking of gesture and force recognition algorithms for rehabilitation, assistive technologies, and prosthetic control.
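The segmentation into overlapping windows mentioned above can be sketched as follows. This is only an illustration, not the authors' pre-processing code: the function name `sliding_windows`, the 200 ms window length, and the 50% overlap are assumptions chosen for the example, and only the 1000 Hz sampling rate of DS1 comes from the abstract.

```python
def sliding_windows(samples, window_size, step):
    """Split a 1-D signal into fixed-length windows that start every `step` samples.

    Consecutive windows overlap by (window_size - step) samples when step < window_size.
    Trailing samples that do not fill a complete window are dropped.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    samples = list(samples)
    last_start = len(samples) - window_size
    return [samples[start:start + window_size]
            for start in range(0, last_start + 1, step)]


# Hypothetical settings: 200 ms windows with 50% overlap at DS1's 1000 Hz rate.
SAMPLING_RATE_HZ = 1000                                  # stated for DS1 in the abstract
WINDOW_SAMPLES = int(0.200 * SAMPLING_RATE_HZ)           # assumed window length
STEP_SAMPLES = WINDOW_SAMPLES // 2                       # assumed 50% overlap

one_second = [0.0] * SAMPLING_RATE_HZ                    # placeholder signal, not real sEMG
windows = sliding_windows(one_second, WINDOW_SAMPLES, STEP_SAMPLES)
```

Features such as the root mean square or mean absolute value would then typically be computed per window before being passed to a classifier.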
Open Access Data Descriptor
Sampling the Darcy Friction Factor Using Halton, Hammersley, Sobol, and Korobov Sequences: Data Points from the Colebrook Relation
by
Dejan Brkić and Marko Milošević
Data 2025, 10(11), 193; https://doi.org/10.3390/data10110193 - 20 Nov 2025
Abstract
When the Colebrook equation is used in its original implicit form, the unknown pipe flow friction factor can only be obtained through time-consuming and computationally demanding iterative calculations. The empirical Colebrook equation relates the unknown Darcy friction factor to a known Reynolds number and a known relative roughness of a pipe’s inner surface. It is widely used in engineering. To simplify computations, a variety of explicit approximations have been developed, the accuracy of which must be carefully evaluated. For this purpose, this Data Descriptor gives a sufficient number of pipe flow friction factor values that are computed using a highly accurate iterative algorithm to solve the implicit Colebrook equation. These values serve as reference data, spanning the range relevant to engineering applications, and provide benchmarks for evaluating the accuracy of the approximations. The sampling points within the datasets are distributed in a way that minimizes gaps in the data. In this study, a Python script (Version v1) was used to generate quasi-random samples, including Halton, Hammersley, Sobol, and deterministic lattice-based Korobov samples, which produce smaller gaps than purely random samples generated for comparison purposes. Using these sequences, a total of 2^20 = 1,048,576 data points were generated, and the corresponding datasets are provided in the Zenodo repository. When a smaller subset of points is needed, the required number of initial points from these sequences can be used directly.
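As a rough illustration of the approach described in this abstract, the sketch below solves the implicit Colebrook equation, 1/sqrt(f) = -2 log10(ε/3.7 + 2.51/(Re sqrt(f))), by simple fixed-point iteration and evaluates it at Halton points. It is not the authors' script: the function names, the fixed-point scheme, the starting guess, the Reynolds number range (4·10^3 to 10^8), the relative roughness range (0 to 0.05), and the log-uniform mapping of the Reynolds number are all assumptions made for the example.

```python
import math


def colebrook_friction_factor(reynolds, rel_roughness, tol=1e-12, max_iter=100):
    """Darcy friction factor from the implicit Colebrook equation.

    Iterates on x = 1/sqrt(f):  x <- -2*log10(rel_roughness/3.7 + 2.51*x/reynolds).
    """
    x = 7.0  # assumed starting guess, corresponds to f of about 0.02
    for _ in range(max_iter):
        x_new = -2.0 * math.log10(rel_roughness / 3.7 + 2.51 * x / reynolds)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return 1.0 / (x * x)


def radical_inverse(index, base):
    """Van der Corput radical inverse: mirror the base-`base` digits of `index` about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_reference_points(n, re_min=4.0e3, re_max=1.0e8, rr_max=0.05):
    """Return n tuples (Re, relative roughness, f) sampled with a 2-D Halton sequence (bases 2 and 3).

    The ranges and the log-uniform spacing in Re are assumptions for this sketch.
    """
    log_lo, log_hi = math.log10(re_min), math.log10(re_max)
    points = []
    for i in range(1, n + 1):
        u, v = radical_inverse(i, 2), radical_inverse(i, 3)
        reynolds = 10.0 ** (log_lo + u * (log_hi - log_lo))
        rel_roughness = v * rr_max
        points.append((reynolds, rel_roughness,
                       colebrook_friction_factor(reynolds, rel_roughness)))
    return points


reference = halton_reference_points(16)  # the published datasets use far more points
```

Because a Halton sequence is extensible, the first k points of a longer run are themselves a valid k-point sample, which matches the abstract's remark that a smaller subset can be taken from the initial points.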
Topics
Topic in
Applied Sciences, Batteries, Buildings, Data, Electricity, Electronics, Energies, Smart Cities
Smart Energy Systems, 2nd Edition
Topic Editors: Hugo Morais, Rui Castro, Cindy Guzman
Deadline: 30 December 2025
Topic in
Entropy, Future Internet, Healthcare, Sensors, Data
Communications Challenges in Health and Well-Being, 2nd Edition
Topic Editors: Dragana Bajic, Konstantinos Katzis, Gordana Gardasevic
Deadline: 28 February 2026
Topic in
Applied Sciences, Data, Electronics, Information, Mathematics
New Applications of Big Data Technology: Integration of Data Mining and Artificial Intelligence
Topic Editors: Xujuan Zhou, Yuefeng Li, Raj Gururajan, Ji Zhang, Revathi Venkataraman
Deadline: 31 March 2026
Topic in
Algorithms, Data, Earth, Geosciences, Mathematics, Land, Water, IJGI
Applications of Algorithms in Risk Assessment and Evaluation
Topic Editors: Yiding Bao, Qiang Wei
Deadline: 31 July 2026
Special Issues
Special Issue in
Data
Cutting-Edge Datasets and Algorithms for Enhancing Industrial Processes and Supply Chain Optimization
Guest Editors: Iván Pérez-Olguín, Luis Carlos Méndez González, Luis Alberto Rodríguez-Picón
Deadline: 30 March 2026
Special Issue in
Data
Data Management in Life Sciences
Guest Editor: Jorge dos Santos Oliveira
Deadline: 31 March 2026
Special Issue in
Data
Navigating Emerging Advancements and Challenges in AI and Big Data Technologies for Business and Society
Guest Editor: Michael Gerlich
Deadline: 31 March 2026
Special Issue in
Data
Interactive Visual Analytics: Bridging Human Cognition and Complex Data
Guest Editors: Kamran Sedig, Sheikh Shaugat Abdullah
Deadline: 30 April 2026
Topical Collections
Topical Collection in
Data
Modern Geophysical and Climate Data Analysis: Tools and Methods
Collection Editors: Vladimir Sreckovic, Zoran Mijic


