Journal Description
Data is a peer-reviewed, open access journal on data in science, with the aim of enhancing data transparency and reusability. The journal publishes in two sections: a section on the collection, treatment and analysis methods of data in science; a section publishing descriptions of scientific and scholarly datasets (one dataset per paper). The journal is published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), Ei Compendex, dblp, Inspec, RePEc, and other databases.
- Journal Rank: JCR - Q2 (Multidisciplinary Sciences) / CiteScore - Q2 (Information Systems and Management)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 26.8 days after submission; the time from acceptance to publication is 3.6 days (median values for papers published in this journal in the second half of 2024).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 2.2 (2023); 5-Year Impact Factor: 2.4 (2023)
Latest Articles
Sentiment Matters for Cryptocurrencies: Evidence from Tweets
Data 2025, 10(4), 50; https://doi.org/10.3390/data10040050 (registering DOI) - 1 Apr 2025
Abstract
This study provides empirical evidence that cryptocurrency market movements are influenced by sentiment extracted from social media. Using a high frequency dataset covering four major cryptocurrencies (Bitcoin, Ether, Litecoin, and Ripple) from October 2017 to September 2021, we apply state-of-the-art natural language processing techniques on tweets from influential Twitter accounts. We classify sentiment into positive, negative, and neutral categories and analyze its effects on log returns, liquidity, and price jumps by examining market reactions around tweet occurrences. Our findings show that tweets significantly impact trading volume and liquidity: neutral sentiment tweets enhance liquidity consistently, negative sentiments prompt immediate volatility spikes, and positive sentiments exert a delayed yet lasting influence on the market. This highlights the critical role of social media sentiment in influencing intraday market dynamics and extends the research on sentiment-driven market efficiency.

Open Access Data Descriptor
Predictors of Immune Fitness and the Alcohol Hangover: Survey Data from UK and Irish Adults
by
Joris C. Verster, Agnese Merlo, Maureen N. Zijlstra, Benthe R. C. van der Weij, Anne S. Boogaard, Sanne E. Schulz, Jessica Balikji, Andy J. Kim, Sherry H. Stewart, Simon B. Sherry, Johan Garssen, Gillian Bruce and Lydia E. Devenney
Data 2025, 10(4), 49; https://doi.org/10.3390/data10040049 (registering DOI) - 1 Apr 2025
Abstract
Immune fitness is defined as the capacity of the body to respond to health challenges (such as infections) by activating an appropriate immune response to promote health and prevent and resolve disease, which is essential for improving quality of life. Thus, immune fitness plays an essential role in health, and reduced immune fitness may be an important signal of increased susceptibility for disease. Lifestyle factors such as increased levels of alcohol consumption have been shown to negatively impact immune fitness. The alcohol hangover is the most frequently reported negative consequence of alcohol consumption and is defined as the combination of negative mental and physical symptoms, which can be experienced after a single episode of alcohol consumption, starting when blood alcohol concentration (BAC) approaches zero. Significant correlations have been reported between hangover severity and both immune fitness and biomarkers of systemic inflammation. The concepts of immune fitness and alcohol hangover are further linked by the fact that the inflammatory response to alcohol consumption plays an important role in the pathology of the alcohol hangover. Moreover, immune fitness has been related to the susceptibility of experiencing hangovers per se. It is therefore important to investigate the interrelationship between immune fitness and the alcohol hangover, and to identify possible predictor variables of both constructs. This data descriptor article describes a study that was conducted with adults living in the UK or Ireland, evaluating possible correlates and predictors of immune fitness and the alcohol hangover. Data on mood, personality, mental resilience, pain catastrophizing, and sleep were collected from n = 1178 participants through an online survey. Herein, the survey and corresponding dataset are described.

Open Access Data Descriptor
Plankton Dataset During Austral Spring and Summer in the Valdés Biosphere Reserve, Patagonia, Argentina
by
Ariadna Celina Nocera, Maité Latorre, Valeria Carina D’Agostino, Brenda Temperoni, Carla Derisio, María Sofía Dutto, Anabela Berasategui, Irene Ruth Schloss and Rodrigo Javier Gonçalves
Data 2025, 10(4), 48; https://doi.org/10.3390/data10040048 - 31 Mar 2025
Abstract
The present dataset served to evaluate the plankton community composition and abundance in Nuevo Gulf (42°42′ S, 64°30′ W), a World Heritage Site in Argentinian Patagonia and part of the Valdés Biosphere Reserve. It reports zooplankton abundance (>300 µm) and phytoplankton concentration (10–200 μm) during the spring and summer seasons from 2019 to 2021. Special attention was given to the taxonomic classification of zooplankton, leading to the first identification of jellyfish species within the Gulf and the detection of an unreported copepod for the area (Drepanopus forcipatus). Samples were collected at two depths—a surface and a deeper layer—to assess vertical distribution patterns of plankton communities and explore potential environmental drivers influencing their variability. This dataset provides a valuable baseline for future studies analyzing temporal variations in the Gulf’s plankton communities. Moreover, it encourages the local scientific community to contribute data and promote open access to marine biodiversity records in the region.

Open Access Data Descriptor
A Comprehensive Monte Carlo-Simulated Dataset of WAXD Patterns of Wood Cellulose Microfibrils
by
Ricardo Baettig and Ben Ingram
Data 2025, 10(4), 47; https://doi.org/10.3390/data10040047 - 29 Mar 2025
Abstract
Wide-angle X-ray diffraction analysis is a powerful tool for investigating the structure and orientation of cellulose microfibrils in plant cell walls, but the complex relationship between diffraction patterns and underlying structural parameters remains challenging to both understand and validate. This study presents a comprehensive dataset of 81,906 Monte Carlo-simulated wide-angle X-ray diffraction patterns for the cellulose Iβ 200 lattice. The dataset was generated using a mechanistic, physically informed simulation procedure that incorporates realistic cell wall geometries from wood anatomy, including circular and polygonal fibers, and accounts for the full range of crystallographic and anatomical parameters influencing diffraction patterns. Each simulated pattern required multiple nested Monte Carlo iterations, totaling approximately 10 million calculations per pattern. The resulting dataset pairs each diffraction pattern with its exact generating parameter set, including mean microfibril angle (MFA), MFA variability, fiber tilt angles, and cell wall cross-sectional shape. The dataset addresses a significant barrier in the field—the lack of validated reference data with known ground truth values for testing and developing new analytical methods. It enables the development, validation, and benchmarking of novel algorithms and machine learning models for MFA prediction from diffraction patterns. The simulated data also allow for systematic investigation of the effects of geometric factors on diffraction patterns and serves as an educational resource for visualizing structure–diffraction relationships. Despite some limitations, such as assuming ideal diffraction conditions and focusing primarily on the S2 cell wall layer, this dataset provides a valuable foundation for advancing X-ray diffraction analysis methods for cellulose microfibril architecture characterization in plant cell walls.

Open Access Data Descriptor
River Restoration Units: Riverscape Units for European Freshwater Ecosystem Management
by
Gonçalo Duarte, Angeliki Peponi, Pedro Segurado, Tamara Leite, Florian Borgwardt, Andrea Funk, Sebastian Birk, Maria Teresa Ferreira and Paulo Branco
Data 2025, 10(4), 46; https://doi.org/10.3390/data10040046 - 28 Mar 2025
Abstract
Freshwater habitats and biota are among the most threatened worldwide. In Europe, significant efforts are being taken to counteract detrimental human impacts on nature. In line with these efforts, the MERLIN project funded by the H2020 program focuses on mainstreaming ecosystem restoration for freshwater-related environments at the landscape scale. Additionally, the Dammed Fish project focuses on one of the main threats affecting European Networks—artificial fragmentation of the river. Meeting the objectives of both projects to work on a large, pan-European scale, we developed a novel spatial database for river units. These spatial units, named River Restoration Units (R2Us), abide by river network functioning while creating the possibility of aggregating multiple data sources with varying resolutions to size-wise comparable units. To create the R2U, we set a methodological framework that departs from the Catchment Characterization and Modelling—River and Catchment Database v2.1 (CCM2)—together with the capabilities of the River Network Toolkit (v2) software (RivTool) to implement a seven-step methodological procedure. This enabled the creation of 11,557 R2U units in European sea outlet river basins along with their attributes. Procedure outputs were associated with spatial layers and then reorganized to create a relational database with normalized data. Under the MERLIN project, R2Us have been used as the spatial analysis unit for a large-scale analysis using multiple input datasets (e.g., ecosystem services, climate, and European Directive reporting data). This database will be valuable for river management and conservation planning, being particularly well suited for large-scale restoration planning in accordance with European Nature legislation.
(This article belongs to the Topic Intersection Between Macroecology and Data Science)

Open Access Article
A Benchmark Dataset for the Validation of Phase-Based Motion Magnification-Based Experimental Modal Analysis
by
Pierpaolo Dragonetti, Marco Civera, Gaetano Miraglia and Rosario Ceravolo
Data 2025, 10(4), 45; https://doi.org/10.3390/data10040045 - 27 Mar 2025
Abstract
In recent years, the development of computer vision technology has led to significant implementations of non-contact structural identification. This study investigates the performance offered by the Phase-Based Motion Magnification (PBMM) algorithm, which employs video acquisitions to estimate the displacements of target pixels and amplify vibrations occurring within a desired frequency band. Using low-cost acquisition setups, this technique can potentially replace the pointwise measurements provided by traditional contact sensors. The main novelty of this experimental research is the validation of PBMM-based experimental modal analyses on multi-storey frame structures with different stiffnesses, considering six structural layouts with different configurations of diagonal bracings. The PBMM results, both in terms of time series and identified modal parameters, are validated against benchmarks provided by an array of physically attached accelerometers. In addition, the influence of pixel intensity on estimates’ accuracy is investigated. Although the PBMM method shows limitations due to the low frame rates of the commercial cameras employed, along with an increase in the signal-to-noise ratio in correspondence of bracing nodes, this method turned out to be effective in modal identification for structures with modest variations in stiffness in terms of height. Moreover, the algorithm exhibits modest sensitivity to pixel intensity. An open access dataset containing video and sensor data recorded during the experiments is available to support further research at https://doi.org/10.5281/zenodo.10412857.

Open Access Data Descriptor
Experimental Dataset for Fiber Optic Specklegram Sensing Under Thermal Conditions and Use in a Deep Learning Interrogation Scheme
by
Francisco J. Vélez, Juan D. Arango, Víctor H. Aristizábal, Carlos Trujillo and Jorge A. Herrera-Ramírez
Data 2025, 10(4), 44; https://doi.org/10.3390/data10040044 - 26 Mar 2025
Abstract
This dataset comprises specklegram images acquired from a multimode optical fiber subjected to varying thermal conditions. Designed for training neural networks focused on developing Fiber Optic Specklegram Sensors (FSSs), these experimental data enable the detection of changes in speckle patterns corresponding to applied temperature variations. The dataset includes 24,528 images captured over a temperature range from 25 °C to 200 °C, with incremental steps of approximately 0.175 °C. Key acquisition parameters include a wavelength of 633 nm, a sensing zone length of 20 mm, and a multimode fiber with a core diameter of 62.5 μm. This dataset supports developing and validating temperature-sensing models using fiber optic technology and can facilitate benchmarking against other experimental or synthetic datasets. Finally, an implementation is presented for utilizing the dataset in a deep learning interrogation scheme.

Open Access Article
Improved Script Identification Algorithm Using Unicode-Based Regular Expression Matching Strategy
by
Mamtimin Qasim and Wushour Silamu
Data 2025, 10(4), 43; https://doi.org/10.3390/data10040043 - 25 Mar 2025
Abstract
While script identification is the first step in many natural language processing and text mining tasks, at present, there is no open-source script identification algorithm for text. For this reason, we analyze the Unicode encoding of each type of script and construct regular expressions in this study, in order to design an improved script identification algorithm. Because some scripts share common characters, it’s impossible to count and summarize them. As a result, some extracted scripts are incomplete, which affects subsequent text processing tasks; furthermore, if a new script identification feature is required, the regular expression for each script must be re-adjusted. To improve the performance and scalability of script identification, we analyze the encoding range of each script provided on the official Unicode website and identify the shared characters, allowing us to design an improved script identification algorithm. Using this approach, we can fully consider all 169 Unicode script types. The proposed method is scalable and does not require numbers, punctuation marks, or other symbols to be filtered during script identification; furthermore, these items in the text are also included in the script identification results, thus ensuring the integrity of the provided information. The experimental results show that the proposed algorithm performs almost as well as our previous script identification algorithm while providing improvements on its basis.
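The general idea of matching text against per-script Unicode ranges can be sketched as follows. This is a minimal illustration and not the authors' algorithm: the four scripts, their approximate code point ranges, and the names `SCRIPT_RANGES`, `TOKEN_RE`, and `identify_scripts` are assumptions made for the example, whereas the paper covers all Unicode script types and handles shared characters in more detail.

```python
import re

# Hypothetical, deliberately incomplete ranges for four scripts.
# They are approximations chosen for this sketch, not the paper's tables.
SCRIPT_RANGES = {
    "Latin": r"A-Za-z\u00C0-\u024F",
    "Cyrillic": r"\u0400-\u04FF",
    "Arabic": r"\u0600-\u06FF",
    "Han": r"\u4E00-\u9FFF",
}

# One named group per script; everything outside the listed ranges
# (digits, punctuation, spaces, other scripts) falls into "Common".
_all_ranges = "".join(SCRIPT_RANGES.values())
_parts = [f"(?P<{name}>[{rng}]+)" for name, rng in SCRIPT_RANGES.items()]
_parts.append(f"(?P<Common>[^{_all_ranges}]+)")
TOKEN_RE = re.compile("|".join(_parts))


def identify_scripts(text):
    """Split text into consecutive (script, run) pairs, left to right."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for script, run in identify_scripts("Hello мир 123"):
        print(script, repr(run))
```

Digits, punctuation, and whitespace are kept in the output as "Common" runs rather than filtered out, which mirrors the abstract's point that such items remain part of the identification results.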
(This article belongs to the Section Information Systems and Data Management)

Open Access Data Descriptor
Linking Fungal Genomics to Thermal Growth Limits: A Dataset of 730 Sequenced Species
by
William Bains
Data 2025, 10(4), 42; https://doi.org/10.3390/data10040042 - 25 Mar 2025
Abstract
The response of fungal species to changes in temperature is of theoretical and practical importance in a world of changing temperatures, ecologies and populations. Genomic sequencing to identify fungal species and their potential metabolic capabilities is well established, but linking this to growth temperature conditions has been limited. To that end, I describe a dataset that brings together the maximum and minimum temperature growth limits for 730 species of Fungi and Oomycetes for which genome sequences are available, together with supporting proteome and taxonomic data and literature references. The set will provide an entry for studies into how genomic structure and sequence can be used to predict the potential for growth at low or high temperatures, and hence the potential industrial use or pathogenic liability of existing or new fungal species.
(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)

Open Access Data Descriptor
Terrestrial Carbon Storage Estimation in Guangdong Province (2000–2021)
by
Wei Wang, Yueming Hu, Xiaoyun Mao, Ying Zhang, Liangbo Tang and Junxing Cai
Data 2025, 10(4), 41; https://doi.org/10.3390/data10040041 - 25 Mar 2025
Abstract
(1) Terrestrial ecosystems are critical carbon sinks, and the accurate assessment of their carbon storage is vital for understanding global carbon cycles and formulating climate change mitigation strategies. (2) This study integrated vegetation indices, meteorological factors, land use data, soil/vegetation types, field sampling, and a convolutional neural network (CNN) model to estimate the carbon storage of terrestrial ecosystems in Guangdong Province. (3) Total carbon storage increased by 0.11 Pg from 2000 to 2021, with vegetation carbon gains (+0.19 Pg) offsetting soil carbon losses (−0.08 Pg), with the latter primarily being driven by reduced soil carbon in forest ecosystems. (4) Northern and eastern Guangdong exhibit high potential for enhancing carbon storage capacity, which is crucial for achieving regional carbon peaking and neutrality targets.
(This article belongs to the Section Spatial Data Science and Digital Earth)

Open Access Data Descriptor
Analysis of Minerals Using Handheld Laser-Induced Breakdown Spectroscopy Technology
by
Naila Mezoued, Cécile Fabre, Jean Cauzid, YongHwi Kim and Marjolène Jatteau
Data 2025, 10(3), 40; https://doi.org/10.3390/data10030040 - 20 Mar 2025
Abstract
Laser-induced breakdown spectroscopy (LIBS), a rapid and versatile analytical technique, is becoming increasingly widespread within the geoscience community. Suitable for fieldwork analyses using handheld analyzers, the elemental composition of a sample is revealed by generating plasma using a high-energy laser, providing a practical solution to numerous geological challenges, including identifying and discriminating between different mineral phases. This data paper presents over 12,000 reference mineral spectra acquired using a handheld LIBS analyzer (© SciAps), including those of silicates (e.g., beryl, quartz, micas, spodumene, vesuvianite, etc.), carbonates (e.g., dolomite, magnesite, aragonite), phosphates (e.g., amblygonite, apatite, topaz), oxides (e.g., hematite, magnetite, rutile, chromite, wolframite), sulfates (e.g., baryte, gypsum), sulfides (e.g., chalcopyrite, pyrite, pyrrhotite), halides (e.g., fluorite), and native elements (e.g., sulfur and copper). The datasets were collected from 170 pure mineral samples in the form of crystals, powders, and rock specimens, during three research projects: NEXT, Labex Ressources 21, and ARTeMIS. The extensive spectral range covered by the analyzer spectrometers (190–950 nm) allowed for the detection of both major (>1 wt.%) and trace (<1 wt.%) elements, recording a unique spectral signature for each mineral. Mineral spectra can serve as reference data to (i) identify relevant emission lines and spectral ranges for specific minerals, (ii) be compared to unknown LIBS spectra for mineral identification, or (iii) constitute input data for machine learning algorithms.
(This article belongs to the Topic Techniques and Science Exploitations for Earth Observation and Planetary Exploration)

Open Access Data Descriptor
The 1688 Sannio–Matese Earthquake: A Dataset of Environmental Effects Based on the ESI-07 Scale
by
Angelica Capozzoli, Valeria Paoletti, Sabina Porfido, Alessandro Maria Michetti and Rosa Nappi
Data 2025, 10(3), 39; https://doi.org/10.3390/data10030039 - 19 Mar 2025
Abstract
The 1688 Sannio–Matese earthquake, with a macroseismically derived magnitude of Mw = 7 and an epicentral intensity of IMCS = XI, had a deep impact on Southern Italy, causing thousands of casualties, extensive damage and significant environmental effects (EEEs) in the epicentral area. Despite a comprehensive knowledge of its economic and social impacts, information regarding the earthquake’s environmental effects remains poorly studied and far from complete, hindering accurate intensity calculations by the Environmental Seismic Intensity Scale (ESI-07). This study aims to address this knowledge gap by compiling a thorough dataset of the EEEs induced by the earthquake. By consulting over one hundred historical, geological and scientific reports, we have collected and classified, using the ESI-07 scale, its primary and secondary EEEs, most of which were previously undocumented in the literature. We verified the historical sources regarding some of these effects through reconnaissance field mapping. Analysis of the obtained dataset reveals some primary effects (surface faulting) and extensive secondary effects, such as slope movements, ground cracks, hydrological anomalies, liquefaction and gas exhalation, which affected numerous towns. These findings enabled us to reassess the Sannio earthquake intensity, considering its environmental impact and comparing traditional macroseismic scales with the ESI-07. Our analysis allowed us to provide an epicentral intensity ESI of I = X, one degree lower than the published IMCS = XI. This study highlights the importance of combining traditional scales with the ESI-07 for more accurate hazard assessments. The macroseismic revision provides valuable insights for seismic hazard evaluation and land-use planning in the Sannio–Matese region, especially considering the distribution of the secondary effects.

Open Access Data Descriptor
Historical Hourly Information of Four European Wind Farms for Wind Energy Forecasting and Maintenance
by
Javier Sánchez-Soriano, Pedro Jose Paniagua-Falo and Carlos Quiterio Gómez Muñoz
Data 2025, 10(3), 38; https://doi.org/10.3390/data10030038 - 19 Mar 2025
Abstract
For an electric company, having an accurate forecast of the expected electrical production and maintenance from its wind farms is crucial. This information is essential for operating in various existing markets, such as the Iberian Energy Market Operator—Spanish Hub (OMIE in its Spanish acronym), the Portuguese Hub (OMIP in its Spanish acronym), and the Iberian electricity market between the Kingdom of Spain and the Portuguese Republic (MIBEL in its Spanish acronym), among others. The accuracy of these forecasts is vital for estimating the costs and benefits of handling electricity. This article explains the process of creating the complete dataset, which includes the acquisition of the hourly information of four European wind farms as well as a description of the structure and content of the dataset, which amounts to 2 years of hourly information. The wind farms are located in three countries: France (Auvergne-Rhône-Alpes), Spain (Aragon), and Italy (Piemonte region). The dataset was built and validated following the CRISP-DM methodology, ensuring a structured and replicable approach to data processing and preparation. To confirm its reliability, the dataset was tested using a basic predictive model, demonstrating its suitability for wind energy forecasting and maintenance optimization. The dataset presented is available and accessible for improving the forecasting and management of wind farms, especially for the detection of faults and the elaboration of a preventive maintenance plan.
(This article belongs to the Special Issue Cutting-Edge Datasets and Algorithms for Enhancing Industrial Processes and Supply Chain Optimization)

Open Access Data Descriptor
Experimental Parametric Forecast of Solar Energy over Time: Sample Data Descriptor
by
Fernando Venâncio Mucomole, Carlos Augusto Santos Silva and Lourenço Lázaro Magaia
Data 2025, 10(3), 37; https://doi.org/10.3390/data10030037 - 17 Mar 2025
Abstract
Variations in solar energy when it reaches the Earth impact the production of photovoltaic (PV) solar plants and, in turn, the dynamics of clean energy expansion. This incentivizes the objective of experimentally forecasting solar energy by parametric models, the results of which are then refined by machine learning methods (MLMs). To estimate solar energy, parametric models consider all atmospheric, climatic, geographic, and spatiotemporal factors that influence decreases in solar energy. In this study, data on ozone, evenly mixed gases, water vapor, aerosols, and solar radiation were gathered throughout the year in the mid-north area of Mozambique. The results show that the calculated solar energy was close to the theoretical solar energy under a clear sky. When paired with MLMs, the clear-sky index had a correlational order of 0.98, with most full-sun days having intermediate and clear-sky types. This suggests the potential of this area for PV use, with high correlation and regression coefficients in the range of 0.86 and 0.89 and a measurement error in the range of 0.25. We conclude that evenly mixed gases and the ozone layer have considerable influence on transmittance. However, the parametrically forecasted solar energy is close to the energy forecasted by the theoretical model. By adjusting the local characteristics, the model can be used in diverse contexts to increase PV plants’ electrical power output efficiency.
(This article belongs to the Topic Smart Energy Systems, 2nd Edition)

Open Access Article
KRID: A Large-Scale Nationwide Korean Road Infrastructure Dataset for Comprehensive Road Facility Recognition
by
Hyeongbok Kim, Eunbi Kim, Sanghoon Ahn, Beomjin Kim, Sung Jin Kim, Tae Kyung Sung, Lingling Zhao, Xiaohong Su and Gilmu Dong
Data 2025, 10(3), 36; https://doi.org/10.3390/data10030036 - 14 Mar 2025
Abstract
Comprehensive datasets are crucial for developing advanced AI solutions in road infrastructure, yet most existing resources focus narrowly on vehicles or a limited set of object categories. To address this gap, we introduce the Korean Road Infrastructure Dataset (KRID), a large-scale dataset designed for real-world road maintenance and safety applications. Our dataset covers highways, national roads, and local roads in both city and non-city areas, comprising 34 distinct types of road infrastructure—from common elements (e.g., traffic signals, gaze-directed poles) to specialized structures (e.g., tunnels, guardrails). Each instance is annotated with either bounding boxes or polygon segmentation masks under stringent quality control and privacy protocols. To demonstrate the utility of this resource, we conducted object detection and segmentation experiments using YOLO-based models, focusing on guardrail damage detection and traffic sign recognition. Preliminary results confirm its suitability for complex, safety-critical scenarios in intelligent transportation systems. Our main contributions include: (1) a broader range of infrastructure classes than conventional “driving perception” datasets, (2) high-resolution, privacy-compliant annotations across diverse road conditions, and (3) open-access availability through AI Hub and GitHub. By highlighting critical yet often overlooked infrastructure elements, this dataset paves the way for AI-driven maintenance workflows, hazard detection, and further innovations in road safety.

Open Access Data Descriptor
A Comprehensive Indoor Environment Dataset from Single-Family Houses in the US
by
Sheik Murad Hassan Anik, Xinghua Gao and Na Meng
Data 2025, 10(3), 35; https://doi.org/10.3390/data10030035 - 5 Mar 2025
Abstract
The paper describes a dataset comprising indoor environmental factors such as temperature, humidity, air quality, and noise levels. The data were collected from 10 sensing devices installed in various locations within three single-family houses in Virginia, USA. The objective of the data collection was to study the indoor environmental conditions of the houses over time. The data were collected at a frequency of one record per minute for a year, for a total of over a million records. The paper provides actual floor plans with sensor placements to aid researchers and practitioners in creating reliable building performance models. The techniques used to collect and verify the data are also explained in the paper. The resulting dataset can be employed to enhance models for building energy consumption, occupant behavior, predictive maintenance, and other relevant purposes.

Open Access Data Descriptor
Draft Genome Sequence Data of the Ensifer sp. P24N7, a Symbiotic Bacteria Isolated from Nodules of Phaseolus vulgaris Grown in Mining Tailings from Huautla, Morelos, Mexico
by
José Augusto Ramírez-Trujillo, Maria Guadalupe Castillo-Texta, Mario Ramírez-Yáñez and Ramón Suárez-Rodríguez
Data 2025, 10(3), 34; https://doi.org/10.3390/data10030034 - 27 Feb 2025
Abstract
In this work, we report the draft genome sequence of Ensifer sp. P24N7, a symbiotic nitrogen-fixing bacterium isolated from nodules of Phaseolus vulgaris var. Negro Jamapa planted in pots that contained mining tailings from Huautla, Morelos, México. The genomic DNA was sequenced on an Illumina NovaSeq 6000 using the 250 bp paired-end protocol, yielding 1,188,899 reads. An assembly generated with SPAdes v. 3.15.4 resulted in a genome length of 7,165,722 bp composed of 181 contigs with an N50 of 323,467 bp, a coverage of 76X, and a GC content of 61.96%. The genome was annotated with the NCBI Prokaryotic Genome Annotation Pipeline and contains 6631 protein-coding sequences, 3 complete rRNAs, 52 tRNAs, and 4 non-coding RNAs. The Ensifer sp. P24N7 genome has 59 genes related to heavy metal tolerance predicted by the RAST server. These data may be useful to the scientific community because they can serve as a reference for other work on heavy metals, including studies in Huautla, Morelos.
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics, 2nd Edition)

Open Access Article
Data Quality Tools to Enhance a Network Anomaly Detection Benchmark
by
José Camacho and Rafael A. Rodríguez-Gómez
Data 2025, 10(3), 33; https://doi.org/10.3390/data10030033 - 25 Feb 2025
Abstract
Network traffic datasets are essential for the construction of traffic models, often using machine learning (ML) techniques. Among other applications, these models can be employed to solve complex optimization problems or to identify anomalous behaviors, i.e., behaviors that deviate from the established model. However, the performance of the ML model depends, among other factors, on the quality of the data used to train it. Benchmark datasets, with a profound impact on research findings, are often assumed to be of good quality by default. In this paper, we derive four variants of a benchmark dataset in network anomaly detection (UGR’16, a flow-based real-world traffic dataset designed for anomaly detection), and show that the choice among variants has a larger impact on model performance than the ML technique used to build the model. To analyze this phenomenon, we propose a methodology to investigate the causes of these differences and to assess the quality of the data labeling. Our results underline the importance of paying more attention to data quality assessment in network anomaly detection.

Open Access Data Descriptor
Spatial Dataset of Climate Robust and High-Yield Agricultural Areas in Brandenburg: Results of a Classification Framework Using Bio-Economic Climate Simulations
by
Hannah Jona von Czettritz, Sandra Uthes, Johannes Schuler, Kurt-Christian Kersebaum and Peter Zander
Data 2025, 10(3), 32; https://doi.org/10.3390/data10030032 - 25 Feb 2025
Abstract
Coherent spatial data are crucial for informed land use and regional planning decisions, particularly in the context of securing a crisis-proof food supply and adapting to climate change. This dataset provides spatial information on climate-robust and high-yield agricultural arable land in Brandenburg, Germany, based on the results of a classification using bio-economic climate simulations. The dataset is intended to support regional planning and policy makers in zoning decisions (e.g., photovoltaic power plants) by identifying climate-robust arable land with high current and stable future production potential that should be reserved for agricultural use. The classification method used to generate the dataset includes a wide range of indicators, including established approaches, such as a soil quality index, drought, water, and wind erosion risk, as well as a dynamic approach, using bio-economic simulations, which determine the production potential under future climate scenarios. The dataset is a valuable resource for spatial planning and climate change adaptation, contributing to long-term food security especially in dry areas such as the state of Brandenburg facing increased production risk under future climatic conditions, thereby serving globally as an example for land use planning challenges related to climate change.
(This article belongs to the Section Spatial Data Science and Digital Earth)

Open Access Article
Using Weather Data for Improved Analysis of Vehicle Energy Efficiency
by
Reno Filla
Data 2025, 10(3), 31; https://doi.org/10.3390/data10030031 - 24 Feb 2025
Abstract
In moving vehicles, the dominating energy losses are due to interactions with the environment: air resistance and rolling resistance. It is known that weather has a significant impact, yet there is a lack of literature showing how the wealth of openly available data from professional weather observations can be used in this context. This article will give an overview of how such data are structured and how they can be accessed in order to augment logs gained during vehicle operation or simulated trips. Two efficient algorithms for such data extraction and augmentation are discussed and several examples for use are provided, also demonstrating that some caveats do exist with respect to the source of weather data.
(This article belongs to the Section Spatial Data Science and Digital Earth)


Topics
Topic in
Algorithms, Data, Earth, Geosciences, Mathematics, Land, Water, IJGI
Applications of Algorithms in Risk Assessment and Evaluation
Topic Editors: Yiding Bao, Qiang Wei
Deadline: 31 July 2025
Topic in
AI, Data, Economies, Mathematics, Risks
Advanced Techniques and Modeling in Business and Economics
Topic Editors: José Manuel Santos-Jaén, Ana León-Gomez, María del Carmen Valls Martínez
Deadline: 30 September 2025
Topic in
Biology, Data, Diversity, Fishes, Animals, Conservation, Hydrobiology
Intersection Between Macroecology and Data Science
Topic Editors: Paulo Branco, Gonçalo Duarte
Deadline: 30 November 2025
Topic in
Applied Sciences, Batteries, Buildings, Data, Electricity, Electronics, Energies, Smart Cities
Smart Energy Systems, 2nd Edition
Topic Editors: Hugo Morais, Rui Castro, Cindy Guzman
Deadline: 30 December 2025

Special Issues
Special Issue in
Data
Cutting-Edge Datasets and Algorithms for Enhancing Industrial Processes and Supply Chain Optimization
Guest Editors: Iván Pérez-Olguín, Luis Carlos Méndez González, Luis Alberto Rodríguez-Picón
Deadline: 30 April 2025
Special Issue in
Data
Data-Driven Approaches for Safety in Industrial Sites
Guest Editors: Francesca Mauro, Mara Lombardi, Mario Fargnoli
Deadline: 30 June 2025
Special Issue in
Data
Benchmarking Datasets in Bioinformatics, 2nd Edition
Guest Editor: Pufeng Du
Deadline: 31 July 2025
Special Issue in
Data
Data Mining and Computational Intelligence for E-Learning and Education—3rd Edition
Guest Editor: Antonio Sarasa-Cabezuelo
Deadline: 20 August 2025
Topical Collections
Topical Collection in
Data
Modern Geophysical and Climate Data Analysis: Tools and Methods
Collection Editors: Vladimir Sreckovic, Zoran Mijic