Data, Volume 11, Issue 1 (January 2026) – 23 articles

Cover Story: Magnetic resonance imaging and hyperspectral imaging offer complementary strengths for image-guided neurosurgery by combining anatomical detail with tissue-specific optical information. This work introduces a multimodal dataset based on agar phantoms designed for MRI–HSI integration. The phantoms replicate layered brain tissue structures, including white matter, gray matter, tumors, and superficial blood vessels, reproducing MRI contrasts of the rat brain while providing stable hyperspectral signatures. The dataset includes two phantom designs with synchronized MRI, HSI, RGB-D, and tracking data, along with pixel-wise annotations and 3D models. This reproducible dataset supports benchmarking of registration, segmentation, classification, depth estimation, and multimodal fusion methods.
38 pages, 12262 KB  
Article
A Reproducible FPGA–ADC Synchronization Architecture for High-Speed Data Acquisition
by Van Muoi Ngo and Thanh Dong Nguyen
Data 2026, 11(1), 23; https://doi.org/10.3390/data11010023 - 21 Jan 2026
Viewed by 200
Abstract
High-speed data acquisition systems based on field-programmable gate arrays (FPGAs) often face synchronization challenges when interfacing with commercial analog-to-digital converters (ADCs), particularly under constrained hardware routing conditions and vendor-specific clocking assumptions. This work presents a vendor-independent FPGA–ADC synchronization architecture that enables reliable and repeatable high-speed data acquisition without relying on clock-capable input resources. Clock and frame signals are internally reconstructed and phase-aligned within the FPGA using mixed-mode clock manager (MMCM) and input serializer/deserializer (ISERDES) resources, enabling time-sequential phase observation without the need for parallel snapshot or delay-line structures. Rather than targeting absolute metrological limits, the proposed approach emphasizes a reproducible and transparent data acquisition methodology applicable across heterogeneous FPGA–ADC platforms, in which clock synchronization is treated as a system-level design parameter affecting digital interface timing integrity and data reproducibility. Experimental validation using a custom Kintex-7 (XC7K325T) FPGA and an AFE7225 ADC demonstrates stable synchronization at sampling rates of up to 125 MS/s, with frequency-offset tolerance determined by the phase-tracking capability of the internal MMCM-based alignment loop. Consistent signal acquisition is achieved over the 100 kHz–20 MHz frequency range. The measured interface-level timing uncertainty remains below 10 ps RMS, confirming robust clock and frame alignment. Meanwhile, the observed signal-to-noise ratio (SNR) performance, exceeding 80 dB, reflects the phase-noise-limited measurement quality of the system. The proposed architecture provides a cost-effective, scalable, and reproducible solution for experimental and research-oriented FPGA-based data acquisition systems operating under practical hardware constraints. Full article
(This article belongs to the Topic Data Stream Mining and Processing)

9 pages, 513 KB  
Data Descriptor
A Curated Dataset on the Acute In Vivo Ecotoxicity of Metallic Nanomaterials from Published Literature
by Surendra Balraadjsing, Willie J. G. M. Peijnenburg and Martina G. Vijver
Data 2026, 11(1), 22; https://doi.org/10.3390/data11010022 - 15 Jan 2026
Viewed by 250
Abstract
Metallic engineered nanomaterials (ENMs) have enormous technological potential and are increasingly applied across different fields and products. However, substances (including ENMs) can be detrimental to the environment and human health, thus requiring systematic testing to uncover potential hazardous effects (in compliance with REACH). Although hazard testing traditionally involves the use of animal experiments, recent years have seen a shift towards in silico modeling. In silico modeling requires high-quality data, which are frequently not readily available for ENMs. Vast amounts of data have been published in the literature, but they are unstructured and scattered across numerous sources. To mitigate the limitations in data availability, we have compiled a nanotoxicity dataset based on published literature. The compiled dataset focuses mainly on acute in vivo endpoints measured in a laboratory setting using metallic nanomaterials. The data extracted from the literature include material information, physico-chemical properties, experimental conditions, endpoint information, and literature metadata. The dataset presented here is useful for meta-analysis or in silico modeling purposes. Full article

20 pages, 3333 KB  
Data Descriptor
Dataset for Device-Free Wireless Sensing of Crowd Size in Public Transportation Environments
by Robin Janssens, Rafael Berkvens and Ben Bellekens
Data 2026, 11(1), 21; https://doi.org/10.3390/data11010021 - 14 Jan 2026
Viewed by 232
Abstract
Congested platforms in public transportation systems can jeopardize the safety and comfort of passengers. Real-time crowd size estimation using Device-Free Wireless Sensing (DFWS) can offer a privacy-preserving solution for monitoring and preventing overcrowding. However, no public dataset exists on DFWS in public transportation environments. In this work, we introduce a new dataset comprising two different public transportation environments, which contains data on the presence of rail vehicles at the platform, as well as manual people counts at regular intervals. By providing this dataset, we aim to offer a foundation for other DFWS researchers to explore novel algorithms and methods in public transportation environments. Full article

16 pages, 6107 KB  
Data Descriptor
Actual Evapotranspiration Dataset of Mongolia Plateau from 2001 to 2020 Based on SFE-NP Model
by Yuhui Su, Juanle Wang and Baomin Han
Data 2026, 11(1), 20; https://doi.org/10.3390/data11010020 - 13 Jan 2026
Viewed by 174
Abstract
Evapotranspiration (ET) refers to the total water vapor flux transported by vegetation and surface soil to the atmosphere. It is an important component of water and heat regulation, and has an impact on plant productivity and water resource management. As a water-shortage region, the Mongolian Plateau is characterized by drought and an uneven distribution of rainwater resources. Understanding the spatiotemporal distribution characteristics of ET on the Mongolian Plateau is important for water resource regulation for climate change adaptation and regional sustainable development. This study calculated the spatiotemporal distribution characteristics of the actual ET on the Mongolian Plateau based on the SFE-NP model and generated a surface ET dataset with a spatial resolution of 1 km and a monthly temporal resolution from 2001 to 2020. Theil–Sen median and Mann–Kendall trend models were used to analyze the temporal and spatial distribution characteristics of the actual ET over the Mongolian Plateau. This dataset has been validated against the commonly used authoritative ET datasets ERA5_Land and MOD16A2, demonstrating high precision and accuracy. This dataset can provide data support for research and applications such as surface water resource allocation and drought detection on the Mongolian Plateau. Full article
(This article belongs to the Collection Modern Geophysical and Climate Data Analysis: Tools and Methods)
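The Theil–Sen median slope and Mann–Kendall trend test named in the abstract above can be sketched for a single time series (for example, one pixel's annual ET values). This is a minimal illustration and not the authors' implementation: it assumes evenly spaced time steps, applies no tie correction to the variance, and the function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen_slope(values):
    """Median of all pairwise slopes; time steps are assumed to be 0, 1, 2, ..."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    slopes = [
        (values[j] - values[i]) / (j - i)
        for i, j in combinations(range(len(values)), 2)
    ]
    return median(slopes)


def mann_kendall_z(values):
    """Mann-Kendall Z statistic (assumption: no correction for tied values)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # S counts concordant minus discordant pairs.
    s = 0
    for i, j in combinations(range(n), 2):
        diff = values[j] - values[i]
        s += (diff > 0) - (diff < 0)
    variance = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(variance)
    if s < 0:
        return (s + 1) / math.sqrt(variance)
    return 0.0


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical annual values
    print(theil_sen_slope(series), mann_kendall_z(series))
```

A |Z| above roughly 1.96 is conventionally read as a significant monotonic trend at the 5% level, while the Theil–Sen slope gives its magnitude per time step.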

24 pages, 1203 KB  
Article
Towards Data-Driven Decisions in Agriculture—A Proposed Data Quality Framework for Grains Trials Research
by Aakansha Chadha, Nathan Robinson and Judy Channon
Data 2026, 11(1), 19; https://doi.org/10.3390/data11010019 - 13 Jan 2026
Viewed by 265
Abstract
Future agriculture will depend on smart systems and digital technologies to improve food production and sustainability. Data-driven methods, such as artificial intelligence, will become integral to agricultural research and development, transforming how decisions are made and how sustainability goals are achieved. Reliable, high-quality data is essential to ensure that research users can trust their conclusions and decisions. To achieve this, a standard for assessing and reporting data quality is required to realise the full potential of data-driven agriculture. Two practical and empirical data quality assessment tools are proposed—a trial data quality test (primarily for data contributors) and a trial data quality statement (for data users). These tools provide information on data qualities assessed for contributors to the submitted trial data and those seeking to use the data for decision support purposes. An action case study using the Online Farm Trials platform illustrates their application. The proposed data quality framework provides a consistent approach for evaluating trial quality and determining fitness for purpose. Flexible and adaptable, the DQF and its tools can be tailored to different agricultural contexts, strengthening confidence in data-driven decision-making and advancing sustainable agriculture. Full article

9 pages, 2602 KB  
Data Descriptor
A Comprehensive Dataset and Workflow for Building Large-Scale, Highly Oxidized Graphene Oxide Models
by Merve Fedai, Albert L. Kwansa and Yaroslava G. Yingling
Data 2026, 11(1), 18; https://doi.org/10.3390/data11010018 - 13 Jan 2026
Viewed by 367
Abstract
Graphene (GRA) and graphene oxide (GO) have drawn significant attention in materials science, chemistry, and nanotechnology because of their tunable physicochemical properties and wide range of potential uses in biomedical and environmental applications. Building reliable, large-scale molecular models of GRA and GO is essential for molecular simulations of wetting, adsorption, and catalytic behavior. However, current methods often struggle to generate large, chemically consistent sheets at high oxidation levels. In addition, the resulting structures are frequently incompatible across different simulation packages. This work introduces a step-by-step protocol with custom Tool Command Language (Tcl) and modified Python version 3.12 scripts for building large-scale, AMBER-compatible GO structures with oxidation levels from 0% to 68%. The workflow applies a systematic surface modification strategy combined with post-processing and atom-type assignment routines to ensure chemical accuracy and force field consistency. The dataset includes fifteen MOL2 format files of 20 × 20 nm² GO sheets, ranging from pristine to highly oxidized surfaces, each validated through oxidation-ratio analysis and structural integrity checks. Together, the dataset and protocol support the design of scalable and chemically reliable GO molecular models for molecular dynamics simulations. Full article

9 pages, 1277 KB  
Data Descriptor
Experimental Data of a Pilot Parabolic Trough Collector Considering the Climatic Conditions of the City of Coatzacoalcos, Mexico
by Aldo Márquez-Nolasco, Roberto A. Conde-Gutiérrez, Luis A. López-Pérez, Gerardo Alcalá Perea, Ociel Rodríguez-Pérez, César A. García-Pérez, Josept D. Revuelta-Acosta and Javier Garrido-Meléndez
Data 2026, 11(1), 17; https://doi.org/10.3390/data11010017 - 13 Jan 2026
Viewed by 218
Abstract
This article presents a database focused on measuring the experimental performance of a pilot parabolic trough collector (PTC) combined with the meteorological conditions corresponding to the installation site. Water was chosen as the fluid to recirculate through the PTC circuit. The data were recorded between August and September, assuming that global radiation was adequate for use in the concentration process. The database comprises seven experimental tests, which contain variables such as time, inlet temperature, outlet temperature, ambient temperature, global radiation, diffuse radiation, wind direction, wind speed, and volumetric flow rate. Based on the data obtained from this pilot PTC system, it is possible to provide relevant information for the installation and construction of large-scale solar collectors. Furthermore, the climatic conditions considered allow key factors in the design of multiple collectors to be determined, such as the type of arrangement (series or parallel) and manufacturing materials. In addition, the data collected in this study are key to validating future theoretical models of the PTC. Finally, considering the real operating conditions of a PTC in conjunction with meteorological variables could also be useful for predicting the system’s thermal performance using artificial intelligence-based models. Full article

19 pages, 6871 KB  
Article
A BIM-Derived Synthetic Point Cloud (SPC) Dataset for Construction Scene Component Segmentation
by Yiquan Zou, Tianxiang Liang, Wenxuan Chen, Zhixiang Ren and Yuhan Wen
Data 2026, 11(1), 16; https://doi.org/10.3390/data11010016 - 12 Jan 2026
Viewed by 277
Abstract
In intelligent construction and BIM–Reality integration applications, high-quality, large-scale construction scene point cloud data with component-level semantic annotations constitute a fundamental basis for three-dimensional semantic understanding and automated analysis. However, point clouds acquired from real construction sites commonly suffer from high labeling costs, severe occlusion, and unstable data distributions. Existing public datasets remain insufficient in terms of scale, component coverage, and annotation consistency, limiting their suitability for data-driven approaches. To address these challenges, this paper constructs and releases a BIM-derived synthetic construction scene point cloud dataset, termed the Synthetic Point Cloud (SPC), targeting component-level point cloud semantic segmentation and related research tasks. The dataset is generated from publicly available BIM models through physics-based virtual LiDAR scanning, producing multi-view and multi-density three-dimensional point clouds while automatically inheriting component-level semantic labels from BIM without any manual intervention. The SPC dataset comprises 132 virtual scanning scenes, with an overall scale of approximately 8.75 × 10^9 points, covering typical construction components such as walls, columns, beams, and slabs. By systematically configuring scanning viewpoints, sampling densities, and occlusion conditions, the dataset introduces rich geometric and spatial distribution diversity. This paper presents a comprehensive description of the SPC data generation pipeline, semantic mapping strategy, virtual scanning configurations, and data organization scheme, followed by statistical analysis and technical validation in terms of point cloud scale evolution, spatial coverage characteristics, and component-wise semantic distributions. Furthermore, baseline experiments on component-level point cloud semantic segmentation are provided. The results demonstrate that models trained solely on the SPC dataset can achieve stable and engineering-meaningful component-level predictions on real construction point clouds, validating the dataset’s usability in virtual-to-real research scenarios. As a scalable and reproducible BIM-derived point cloud resource, the SPC dataset offers a unified data foundation and experimental support for research on construction scene point cloud semantic segmentation, virtual-to-real transfer learning, scan-to-BIM updating, and intelligent construction monitoring. Full article

14 pages, 1839 KB  
Data Descriptor
Whole-Genome Sequencing of Sinorhizobium Phage AP-202, a Novel Siphovirus from Agricultural Soil
by Marina L. Roumiantseva, Alexandra P. Kozlova, Victoria S. Muntyan, Maria E. Vladimirova, Alla S. Saksaganskaia, Andrey N. Gorshkov, Marsel R. Kabilov and Boris V. Simarov
Data 2026, 11(1), 15; https://doi.org/10.3390/data11010015 - 12 Jan 2026
Viewed by 263
Abstract
Bacteriophages are a key ecological factor in the legume rhizosphere, controlling bacterial populations and affecting introduced inoculant strains. Despite their importance, rhizobiophage genomic diversity remains poorly characterized. We report the complete genome of a novel predicted temperate Sinorhizobium phage, AP-202, isolated from agricultural Chernozem. This siphovirus infects the symbiont Sinorhizobium meliloti. Its 121,599 bp dsDNA genome has a strikingly low GC content (27.1%), likely reflecting adaptive evolution and a strategy to evade host defenses. The linear genome is flanked by 240 bp direct terminal repeats (DTRs), and its DNA packaging follows a T7-like strategy. Annotation predicted 178 protein-coding genes and one tRNA. Functional analysis revealed a complete lysogeny module and a divergent, two-pronged codon-usage strategy for translational control. A significant part of the proteome (74.2%) comprises hypothetical proteins, with 50 CDSs having no database homologs, underscoring its genetic novelty. Complete-genome comparison shows minimal similarity to known rhizobiophages, defining AP-202 as a distinct lineage. Phenotypic analysis indicates AP-202 acts as a selective ecological filter, with host resistance being more prevalent in agricultural than in natural soils. The AP-202 genome provides a unique model for studying phage–host coevolution in the rhizosphere and is a valuable resource for comparative genomics and soil virome research. Full article

19 pages, 9258 KB  
Data Descriptor
Data on Scuttle Flies (Diptera: Phoridae) Based on Extensive Sampling Regions in Central and Eastern European Russia
by Alexander B. Ruchin, Bernd Grundmann and Mikhail N. Esin
Data 2026, 11(1), 14; https://doi.org/10.3390/data11010014 - 12 Jan 2026
Viewed by 231
Abstract
Background: The Phoridae are one of the most poorly studied families of Diptera insects in Russia. They are small flies that play an important role in ecosystems. Methods: This dataset presents the results of a study on Phoridae conducted between 2019 and 2024 in European Russia. The overall study area covered 400,000 km². Results: A total of 16,265 specimens were reliably identified, representing 272 species and 22 genera from 180 localities. Of these, 2673 specimens were females (16.4%), while the remaining 83.6% were males. Conclusions: The genus Megaselia Rondani accounted for 200 species (73.5%) and 12,120 specimens (74.5%). Ten species were particularly common: Megaselia pusilla, M. angusta agg., Triphleba opaca, Diplonevra funebris, M. brevicostalis, M. plurispinulosa, M. flavicans, M. lutea, M. minuta, and M. lactipennis. The highest number of localities was recorded for M. angusta agg. (37.2%), M. flavicans (27.8%), and M. brevicostalis (25.0%). In terms of collection methods, the majority of both specimens and species were captured using Malaise traps and pan traps. The highest species richness and specimen abundance were recorded in floodplain habitats, steppified areas, and meadows. In contrast, forested sites showed lower species diversity and abundance. Full article

16 pages, 1579 KB  
Data Descriptor
Dataset on Citizens’ Perceptions of Urban Resilience: Survey Results from Veracruz—Boca Del Río Metropolitan Area, Mexico
by María de los Ángeles Martínez-Cosío, José Eriban Barradas-Hernández, Sergio Márquez-Domínguez, Alejandro Vargas-Colorado, Pedro Javier García-Ramírez, Gerardo Mario Ortigoza-Capetillo, José Piña-Flores, Franco Antonio Carpio-Santamaría, Abigail Zamora-Hernández, Erick Alejandro Ramírez-Martínez and Dariniel de Jesús Barrera-Jiménez
Data 2026, 11(1), 13; https://doi.org/10.3390/data11010013 - 12 Jan 2026
Viewed by 379
Abstract
This paper presents a dataset developed to characterize citizens’ perceptions of urban resilience in the Veracruz—Boca del Río Metropolitan Area (VBMA) in Mexico. The data were obtained through online surveys administered to a total of 147 subjects, including 89 from the municipality of Veracruz, 35 from Boca del Río, 15 from Medellín de Bravo, and 8 from Alvarado, with ages ranging from 16 years to over 61 years. The survey was designed to estimate the population’s perception of the Urban Resilience Index (URI) and the Urban Resilience Profile (URP). It was derived from a methodology based on IMPLAN and enriched with questionnaires from Villada and SEDATU, resulting in a final questionnaire comprising 10 axes, 33 indicators, and 156 variables. As a novel contribution, a case study uses the dataset to estimate the URI and URP for the VBMA by applying the Entropy Method, considering three criteria: age, gender, and municipality. Citizens’ perception of urban resilience was estimated at a URI of 0.4571, corresponding to a moderate level of resilience. Moreover, this estimate could be improved by conducting a full-scale survey with substantial financial investment. Full article
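The abstract above names the Entropy Method without detailing it. The sketch below shows the commonly used entropy weight calculation, in which indicators whose values vary more across respondents receive larger weights. It is an illustrative assumption about the general technique, not the authors' exact formulation; the function name, the fallback to equal weights, and the sample scores are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for the columns (indicators) of a non-negative score matrix.

    Rows are observations (e.g., respondents or groups); columns are indicators.
    """
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two rows")
    m = len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each indicator needs a positive column sum")
        # Shannon entropy of the column's share distribution, scaled to [0, 1].
        entropy = 0.0
        for x in column:
            p = x / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(n)
        # Divergence: how far the indicator is from being uninformative.
        divergences.append(max(0.0, 1.0 - entropy))
    d_total = sum(divergences)
    if d_total <= 1e-12:
        # Assumption: fall back to equal weights when no indicator discriminates.
        return [1.0 / m] * m
    return [d / d_total for d in divergences]


if __name__ == "__main__":
    scores = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # hypothetical scores
    print(entropy_weights(scores))
```

In this scheme a composite index would typically be a weighted sum of normalized indicator scores using these weights; how the URI itself is aggregated is not stated in the abstract.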

17 pages, 20645 KB  
Data Descriptor
Multimodal MRI–HSI Synthetic Brain Tissue Dataset Based on Agar Phantoms
by Manuel Villa, Jaime Sancho, Gonzalo Rosa-Olmeda, Aure Enkaoua, Sara Moccia and Eduardo Juarez
Data 2026, 11(1), 12; https://doi.org/10.3390/data11010012 - 8 Jan 2026
Viewed by 345
Abstract
Magnetic resonance imaging (MRI) and hyperspectral imaging (HSI) provide complementary information for image-guided neurosurgery, combining high-resolution anatomical detail with tissue-specific optical characterization. This work presents a novel multimodal phantom dataset specifically designed for MRI–HSI integration. The phantoms reproduce a three-layer tissue structure comprising white matter, gray matter, tumor, and superficial blood vessels, using agar-based compositions that mimic MRI contrasts of the rat brain while providing consistent hyperspectral signatures. The dataset includes two designs of phantoms with MRI, HSI, RGB-D, and tracking acquisitions, along with pixel-wise labels and corresponding 3D models, comprising 13 phantoms in total. The dataset facilitates the evaluation of registration, segmentation, and classification algorithms, as well as depth estimation, multimodal fusion, and tracking-to-camera calibration procedures. By providing reproducible, labeled multimodal data, these phantoms reduce the need for animal experiments in preclinical imaging research and serve as a versatile benchmark for MRI–HSI integration and other multimodal imaging studies. Full article

9 pages, 918 KB  
Data Descriptor
Soil Health Descriptors and Socio-Demographic-Economic Context: A Dataset for the European Union
by Lukas Bayer, Keerthi Bandru, Nora Naumann and Cenk Dönmez
Data 2026, 11(1), 11; https://doi.org/10.3390/data11010011 - 6 Jan 2026
Viewed by 232
Abstract
Soil degradation is a pressing concern in the European Union, affecting all major land use types, including agriculture, forests, and urban areas. Existing studies often identify explanatory variables for soil degradation, but large-scale, comprehensive datasets are limited. This dataset, compiled at the NUTS2 level (Nomenclature of Territorial Units for Statistics, level 2, a European regional classification system), integrates socio-demographic factors, land use changes, and soil health descriptors from 2005 to 2023. It includes variables such as population dynamics, material deprivation, land tenure, and soil health challenges (erosion, compaction, salinity, soil organic carbon levels, and industrial pollution). The soil descriptors used were derived from secondary geospatial datasets, including ESDAC, processed via GIS techniques. Designed for use in spatial planning, agriculture, and environmental research, this dataset facilitates multivariate and regression analyses to explore socio-economic impacts on soil health. By merging diverse descriptors from multiple sources, it provides a valuable resource for understanding soil degradation and supporting evidence-based policymaking. Full article

8 pages, 2719 KB  
Data Descriptor
Spatial Dataset for Comparing 3D Measurement Techniques on Lunar Regolith Simulant Cones
by Piotr Kędziorski, Janusz Kobaka, Jacek Katzer, Paweł Tysiąc, Marcin Jagoda and Machi Zawidzki
Data 2026, 11(1), 10; https://doi.org/10.3390/data11010010 - 6 Jan 2026
Viewed by 258
Abstract
The presented dataset contains spatial models of cones formed from lunar soil simulants. The cones were formed in a laboratory by allowing the soil to fall freely through a funnel. Then, the cones were measured using three methods: a high-precision handheld laser scanner (HLS), photogrammetry, and a low-cost LiDAR system integrated into an iPad Pro. The dataset consists of two groups. The first group contains raw measurement data, and the second group contains the geometry of the cones themselves, excluding their surroundings. This second group was prepared to support the calculation of the cones’ volume. All data are provided in a standard 3D file format (.STL). The dataset enables direct comparison of resolution and geometric reconstruction performance across the three techniques and can be reused for benchmarking 3D processing workflows, segmentation algorithms, and shape reconstruction methods. It provides complete geometric information suitable for validating automated extraction procedures for parameters such as cone height, base diameter, and angle of repose, as well as for further research into planetary soil and granular material morphology. Full article
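The relationship between the parameters listed above (cone height, base diameter, angle of repose, volume) can be sketched for an idealized right circular cone. This is only a geometric approximation under that assumption; real scanned cones are irregular, and the function names and sample dimensions here are hypothetical rather than taken from the dataset.

```python
import math


def angle_of_repose_deg(height, base_diameter):
    """Angle between the cone's slope and its base, in degrees (ideal cone)."""
    if height < 0 or base_diameter <= 0:
        raise ValueError("height must be >= 0 and base_diameter must be > 0")
    return math.degrees(math.atan2(height, base_diameter / 2.0))


def cone_volume(height, base_diameter):
    """Volume of an ideal right circular cone: pi * r^2 * h / 3."""
    if height < 0 or base_diameter <= 0:
        raise ValueError("height must be >= 0 and base_diameter must be > 0")
    radius = base_diameter / 2.0
    return math.pi * radius * radius * height / 3.0


if __name__ == "__main__":
    # Hypothetical cone: 60 mm high with a 200 mm base diameter.
    print(angle_of_repose_deg(60.0, 200.0), cone_volume(60.0, 200.0))
```

Comparing these idealized values with quantities computed directly from each .STL mesh is one way to check an automated extraction procedure across the three measurement techniques.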

11 pages, 2349 KB  
Article
Long-Term Temporal Variability of Flowering Day of Red Spider Lily (Lycoris radiata)
by Nagai Shin and Taku M. Saitoh
Data 2026, 11(1), 9; https://doi.org/10.3390/data11010009 - 5 Jan 2026
Viewed by 304
Abstract
In Japan, the flowering of the red spider lily (Lycoris radiata) marks the autumn equinox. To evaluate the effect of climate change on Japanese people’s sense of seasons and this cultural ecosystem service, we examined the spatiotemporal variability of the flowering day (FD) of red spider lily at 9 sites (Maebashi, Choshi, Nagano, Kanazawa, Shizuoka, Tsu, Nara, Wakayama, and Okayama) over the past 60 to 70 years through its relationship with the autumn equinox. (1) Delaying trends were statistically significant (0.12–0.16 days per year) at 4 sites (Nagano, Tsu, Nara, and Wakayama). (2) Bayesian inference analysis with a beta distribution showed that the probability of FD being later than the autumn equinox has increased in the 2010s at all sites. (3) The year-to-year variability of FD was positively correlated with average temperature during the period of flower stalk elongation (late August to mid-September) at 7 sites (except Nagano and Shizuoka). These results suggest that the probability of FD being later than the autumn equinox will increase under further warming during the period of flower stalk elongation, thus affecting people’s sense of seasons and this cultural ecosystem service. Full article
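The abstract above mentions Bayesian inference with a beta distribution but gives no model details. One simple way to estimate the probability that the flowering day falls after the autumn equinox in a given decade is a conjugate Beta-Binomial update, sketched below. The uniform prior, the function names, and the year counts are assumptions for illustration and are not taken from the article.

```python
def beta_posterior(late_years, total_years, prior_a=1.0, prior_b=1.0):
    """Posterior Beta(a, b) parameters after observing late_years out of total_years.

    Assumption: each year is an independent Bernoulli trial, with a
    Beta(prior_a, prior_b) prior (uniform by default).
    """
    if not 0 <= late_years <= total_years:
        raise ValueError("late_years must be between 0 and total_years")
    a = prior_a + late_years
    b = prior_b + (total_years - late_years)
    return a, b


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # Hypothetical counts: flowering later than the equinox in 2 of 10 years
    # in an early decade versus 7 of 10 years in a recent decade.
    early = beta_posterior(2, 10)
    recent = beta_posterior(7, 10)
    print(beta_mean(*early), beta_mean(*recent))
```

Comparing the posterior means (or full posterior distributions) between decades gives a probabilistic statement of the kind reported in result (2), without relying on a single trend line.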

10 pages, 4078 KB  
Data Descriptor
A Database of Fruit and Seed Morphological Traits and Images from Subtropical Flora of Hong Kong
by Ying Ki Law, Chun Chiu Pang, Ting Wing Shum, Theodora Chin-Tung Chan, Cheuk Yan Law and Billy Chi Hang Hau
Data 2026, 11(1), 8; https://doi.org/10.3390/data11010008 - 5 Jan 2026
Viewed by 335
Abstract
Plant functional traits are key to understanding species performance, community assembly and ecosystem processes. Fruit and seed traits play an important role in early life-cycle processes by influencing seed dispersal, germination, and establishment, ultimately shaping plant regeneration and ecosystem dynamics. While global initiatives such as TRY and Seed Information Database (SID) have assembled extensive trait data, coverage of reproductive traits remains limited, and high-quality images of diaspores are scarce, particularly in subtropical Asia. To address this need, we created an open-source, comprehensive database of fruit and seed traits, accompanied by diaspore images against a high-contrast background. This dataset documents 684 species in 128 families recorded in Hong Kong and provides standardised measurements of morphological attributes (e.g., length, mass, number of seeds per fruit) and dispersal characteristics (e.g., presence of appendages). Our measurements were validated against previously published records of common species in Hong Kong, showing strong consistency, with R² = 0.80 (p < 0.001) for fruit dry mass and R² = 0.91 (p < 0.001) for seed dry mass. This database provides a valuable resource for trait-based ecology, forest dynamics and conservation biology. Additionally, it supports applications in ecological restoration, habitat management, and predicting plant responses to environmental change. This initiative enhances our understanding of trait-based ecology by complementing global initiatives such as TRY and SID and improving the representation of reproductive traits from subtropical Asia, a region that is underrepresented in existing global databases. Full article
(This article belongs to the Section Information Systems and Data Management)

43 pages, 1151 KB  
Review
Clustering of Temporal and Visual Data: Recent Advancements
by Priyanka Mudgal
Data 2026, 11(1), 7; https://doi.org/10.3390/data11010007 - 4 Jan 2026
Cited by 1 | Viewed by 427
Abstract
Clustering plays a central role in uncovering latent structure within both temporal and visual data. It enables critical insights in various domains including healthcare, finance, surveillance, autonomous systems, and many more. With the growing volume and complexity of time-series and image-based datasets, there is an increasing demand for robust, flexible, and scalable clustering algorithms. Although these modalities differ—time-series being inherently sequential and vision data being spatial—they exhibit common challenges such as high dimensionality, noise, variability in alignment and scale, and the need for interpretable groupings. This survey presents a comprehensive review of recent advancements in clustering methods that are adaptable to both time-series and vision data. We explore a wide spectrum of approaches, including distance-based techniques (e.g., DTW, EMD), feature-based methods, model-based strategies (e.g., GMMs, HMMs), and deep learning frameworks such as autoencoders, self-supervised learning, and graph neural networks. We also survey hybrid and ensemble models, as well as semi-supervised and active clustering methods that leverage minimal supervision for improved performance. By highlighting both the shared principles and the modality-specific adaptations of clustering strategies, this work outlines current capabilities and open challenges, and suggests future directions toward unified, multimodal clustering systems. Full article
(This article belongs to the Section Featured Reviews of Data Science Research)
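Among the distance-based techniques the review mentions, dynamic time warping (DTW) is compact enough to illustrate. The sketch below is the textbook dynamic-programming formulation for two one-dimensional sequences with absolute difference as the local cost; it is not drawn from the review itself, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]
```

Because DTW allows one sample to align with several, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the sequences differ in length, which is why it suits series with variable alignment.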

22 pages, 777 KB  
Data Descriptor
Dataset on AI- and VR-Supported Communication and Problem-Solving Performance in Undergraduate Courses: A Clustered Quasi-Experiment in Mexico
by Roberto Gómez Tobías
Data 2026, 11(1), 6; https://doi.org/10.3390/data11010006 - 2 Jan 2026
Viewed by 302
Abstract
Behavioral and educational researchers increasingly rely on rich datasets that capture how students respond to technology-enhanced instruction, yet few open resources document the full pipeline from experimental design to data curation in authentic classroom settings. This data descriptor presents a clustered quasi-experimental dataset on the impact of an instructional architecture that combines virtual reality (VR) simulations with artificial intelligence (AI)-driven formative feedback to enhance undergraduate students’ communication and problem-solving performance. The study was conducted at a large private university in Mexico during the 2024–2025 academic year and involved six intact classes (three intervention, three comparison; n = 180). Exposure to AI and VR was operationalized as a session-level “dose” (minutes of use, number of feedback events, number of scenarios, perceived presence), while performance was assessed with analytic rubrics (six criteria for communication and seven for problem solving) scored independently by two raters, with interrater reliability estimated via ICC (2, k). Additional Likert-type scales measured presence, perceived usefulness of feedback and self-efficacy. The curated dataset includes raw and cleaned tabular files, a detailed codebook, scoring guides and replication scripts for multilevel models and ancillary analyses. By releasing this dataset, we seek to enable reanalysis, methodological replication and cross-study comparisons in technology-enhanced education, and to provide an authentic resource for teaching statistics, econometrics and research methods in the behavioral sciences. Full article
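The interrater reliability statistic named in this abstract, ICC(2,k), is the two-way random-effects, absolute-agreement, average-measures intraclass correlation of Shrout and Fleiss. The sketch below computes it from a subjects-by-raters table using the standard mean-square formula; it is an independent illustration with a hypothetical function name, not the replication script distributed with the dataset.

```python
def icc2k(ratings):
    """ICC(2,k) for a list of rows (subjects), each a list of k rater scores."""
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Shrout & Fleiss: (MSR - MSE) / (MSR + (MSC - MSE) / n)
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
```

Because this form measures absolute agreement, a constant offset between raters lowers the coefficient even when their rankings are identical.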

8 pages, 3871 KB  
Data Descriptor
A Georeferenced Field Dataset of Forest Cover Density and Composition for Vegetation Classification and Monitoring
by Lucio Di Cosmo, Patrizia Gasparini, Antonio Floris, Maria Rizzo, Hannes Markart and Marco Pietrogiovanna
Data 2026, 11(1), 5; https://doi.org/10.3390/data11010005 - 1 Jan 2026
Viewed by 228
Abstract
Forests provide a wide range of ecosystem services, and their importance in supporting human well-being is widely recognized. As goods and benefits from forests are exhaustible, it is essential to gather sound data for their monitoring and management. Remote sensing has gained increasing importance in collecting data on forests, driven by the growing demand for regularly updated environmental data. However, remote sensing modeling of vegetation requires reference data to be collected in the field. This article presents a dataset on tree crown cover, both total and by species, for 528 georeferenced forest plots located in the Eastern Alps, Italy, an area affected by extensive wind and snow damage and subsequent widespread damage caused by bark beetles. The characteristic species of the forest types in the dataset are widely distributed over the Eurasian continent, making the dataset potentially useful to many users and researchers studying forest biodiversity or remote sensing applications to monitor forest cover changes. Data were collected within an ongoing project aimed at detecting crown cover changes in small forest patches. Full article

8 pages, 656 KB  
Data Descriptor
Transcriptomic Profiling of HepaRG Cells During Differentiation and 3-Methylcholanthrene Induction Using Oxford Nanopore Direct RNA Sequencing
by Nataliya G. Luzgina, Svetlana N. Tarbeeva, Daniil D. Romashin, Konstantin G. Ptitsyn, Svetlana A. Khmeleva, Leonid K. Kurbatov, Sergey P. Radko, Anna S. Kozlova, Polina A. Veselova, Ekaterina V. Ilgisonis and Alexander L. Rusanov
Data 2026, 11(1), 4; https://doi.org/10.3390/data11010004 - 29 Dec 2025
Viewed by 308
Abstract
The aryl hydrocarbon receptor (AhR) plays a crucial role in mediating xenobiotic responses, as well as regulating broader metabolic, differentiation, and stress response programs. In this study, we present a comprehensive long-read RNA sequencing dataset that examines transcriptional changes in the HepaRG human cell line during differentiation induced by dimethyl sulfoxide (DMSO) and acute activation of the AhR with 3-methylcholanthrene (3-MC). We identified 946 genes that were differentially expressed between the NonDiff and Diff conditions (303 genes upregulated and 643 genes downregulated), and 1786 genes that showed differential expression between Diff and Ind conditions (961 genes upregulated and 825 genes downregulated). Acute induction with 3-MC produced a robust AhR signature, characterized by strong induction of CYP1A1 and CYP1B1, along with a coordinated downregulation of several constitutive hepatic genes involved in drug metabolism (e.g., CYP3A4 and CYP2C8). To facilitate further analysis and reuse of our data, we have provided processed gene-level count matrices, transcripts per million (TPM) tables, and detailed differential expression results, as well as analysis scripts. This resource supports research into AhR biology, pharmacogene regulation, and the development of methods for long-read transcriptomics in liver models. Full article

13 pages, 4911 KB  
Data Descriptor
Seasonal Trap Captures Data of Stink and Leaf-Footed Bugs in a Northern Italian Ecosystem
by Vito Antonio Giannuzzi, Valeria Rossi, Rihem Moujahed, Adriana Poccia, Florinda D’Archivio, Tiziano Rossi Magi, Elena Chierici, Luca Casoli, Gabriele Rondoni and Eric Conti
Data 2026, 11(1), 3; https://doi.org/10.3390/data11010003 - 24 Dec 2025
Viewed by 356
Abstract
An essential first step in implementing a control strategy against herbivorous insects is the monitoring of their populations. The efficacy of pheromone-based traps in capturing herbivorous insects can be enhanced by adding adjuvants and using slow-release dispensers to ensure long-lasting attractiveness. Here, we present datasets from a two-year field monitoring campaign of the invasive brown marmorated stink bug, Halyomorpha halys (Stål) (Hemiptera: Pentatomidae), using clear sticky traps baited with its aggregation pheromone and a synergist, tested with different dispensers and adjuvants. Bycatch data for native stink bugs (all Hemiptera: Pentatomidae) and leaf-footed bugs (Hemiptera: Coreidae) are also presented. The R code provided was used to organize data and generate weekly captures or weekly density of both H. halys and non-target species. The information provided in this article may contribute to the optimization of pest control strategies in agriculture. Full article

38 pages, 2216 KB  
Article
A Dual-Model Framework for Writing Assessment: A Cross-Sectional Interpretive Machine Learning Analysis of Linguistic Features
by Cheng Tang, George Engelhard, Yinying Liu and Jiawei Xiong
Data 2026, 11(1), 2; https://doi.org/10.3390/data11010002 - 21 Dec 2025
Viewed by 426
Abstract
Constructed-response items offer rich evidence of writing proficiency, but the linguistic signals they contain vary with grade level. This study presents a cross-sectional analysis of 5638 English Language Arts essays from Grades 6–12 to identify which linguistic features predict proficiency and to characterize how their importance shifts across grade levels. We extracted a suite of lexical, syntactic, and semantic-cohesion features, and evaluated their predictive power using an interpretive dual-model framework combining LASSO and XGBoost algorithms. Feature importance was assessed through LASSO coefficients, XGBoost Gain scores, and SHAP values, and interpreted by isolating both consensus and divergences of the three metrics. Results show moderate, generalizable predictive signals in Grades 6–8, but no generalizable predictive power was found in the Grades 9–12 cohort. Across the middle grades, three findings achieved strong consensus: essay length, syntactic density, and global semantic organization served as strong predictors of writing proficiency. Lexical diversity emerged as a key divergent feature: it was a top predictor for XGBoost but ignored by LASSO, suggesting its contribution depends on interactions with other features. These findings inform actionable, grade-sensitive feedback, highlighting stable, diagnostic targets for middle school while cautioning that discourse-level features are necessary to model high-school writing. Full article

11 pages, 335 KB  
Data Descriptor
Anonymized Dataset of Information Systems and Technology Students at a South African University for Learning Analytics
by Rushil Raghavjee, Prabhakar Rontala Subramaniam and Irene Govender
Data 2026, 11(1), 1; https://doi.org/10.3390/data11010001 - 19 Dec 2025
Viewed by 327
Abstract
Advancements in data storage and data processing technologies have compelled higher education institutions to optimise the use of their data. Many universities globally have begun to implement learning analytics at their institutions to better understand and improve teaching and learning. African higher education institutions have been slow to implement learning analytics despite the continued accumulation of digital data. This article presents a dataset of Information Systems and Technology (IS&T) students from the University of KwaZulu-Natal, a South African university. The dataset comprises approximately 14,000 registered student records from 10 IS&T courses, primarily consisting of demographic data, academic performance (including past IS&T courses and school records), and Learning Management System (LMS) interaction data. The dataset exhibits an imbalance, characterised by a higher proportion of students who have successfully completed courses compared to those who have not. The dataset will be of interest to researchers engaged in learning analytics application studies, including early pass/fail prediction and grade classification, as well as those who want to test their techniques on a real-world dataset. Full article