Data, Volume 10, Issue 7 (July 2025) – 27 articles

Cover Story: Functional magnetic resonance imaging has become instrumental in the investigation of autism spectrum disorder (ASD). The Autism Brain Imaging Data Exchange (ABIDE) facilitates research using this modality through its data-sharing initiative. While ABIDE offers raw data and data preprocessed with atlases, independent component analysis (ICA) remains underutilized. ICA is a data-driven means of reducing dimensionality without making assumptions regarding delineations. Additionally, ICA identifies functional brain networks called resting-state networks (RSNs). No dataset preprocessed with extracted RSNs has been made available yet. We address this gap by presenting RSNs extracted from ABIDE data. These RSNs reveal neural activation clusters, providing a perspective on ASD analyses complementary to the predominantly atlas-based literature.
7 pages, 4461 KiB  
Data Descriptor
Dataset on Environmental Parameters and Greenhouse Gases in Port and Harbor Seawaters of Jeju Island, Korea
by Jae-Hyun Lim, Ju-Hyoung Kim, Hyo-Ryeon Kim, Seo-Young Kim and Il-Nam Kim
Data 2025, 10(7), 118; https://doi.org/10.3390/data10070118 (registering DOI) - 19 Jul 2025
Abstract
This dataset presents environmental observations collected in August 2021 from 18 port and harbor sites located around Jeju Island, Korea. It includes physical, biogeochemical, and greenhouse gas (GHG) variables measured in surface seawater, such as temperature, salinity, dissolved oxygen, nutrients, chlorophyll-a, pH, total alkalinity, and dissolved inorganic carbon. Concentrations and air–sea fluxes of nitrous oxide (N₂O), methane (CH₄), and carbon dioxide (CO₂) were also quantified. All measurements were conducted following standardized analytical protocols, and certified reference materials and duplicate analyses were used to ensure data accuracy. The dataset shows that nutrient accumulation and GHG concentrations in port and harbor waters tended to be higher at sites with stronger land-based influence. During August 2021, most sites functioned as sources of N₂O, CH₄, and CO₂ to the atmosphere. This integrated dataset offers valuable insights into the influence of anthropogenic and hydrological factors on coastal GHG dynamics and provides a foundation for future studies across diverse semi-enclosed marine systems.

19 pages, 1906 KiB  
Article
LADOS: Aerial Imagery Dataset for Oil Spill Detection, Classification, and Localization Using Semantic Segmentation
by Konstantinos Gkountakos, Maria Melitou, Konstantinos Ioannidis, Konstantinos Demestichas, Stefanos Vrochidis and Ioannis Kompatsiaris
Data 2025, 10(7), 117; https://doi.org/10.3390/data10070117 - 14 Jul 2025
Viewed by 223
Abstract
Oil spills on the water surface pose a significant environmental hazard, underscoring the critical need for developing Artificial Intelligence (AI) detection methods. Utilizing Unmanned Aerial Vehicles (UAVs) can significantly improve the efficiency of oil spill detection at early stages, reducing environmental damage; however, there is a lack of training datasets in the domain. This paper introduces LADOS, an aeriaL imAgery Dataset for Oil Spill detection, classification, and localization that incorporates both liquid and solid classes in low-altitude images. LADOS comprises 3388 images annotated at the pixel level across six distinct classes, including the background. In addition to including a general oil class describing various oil spill appearances, LADOS provides a detailed categorization by including emulsions and sheens. Both instance and semantic segmentation approaches are examined in detail to validate the dataset’s performance and significance to the domain. The results on the test set demonstrate an overall performance exceeding 66% mean Intersection over Union (mIoU), with specific classes such as oil and emulsion surpassing 74% IoU in part of the experiments.
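For readers unfamiliar with the metric quoted above, mean Intersection over Union averages the per-class IoU, which is the overlap between predicted and ground-truth pixels of a class divided by their union. The following is a minimal sketch only, not the authors' evaluation code; the function names and the toy label lists are hypothetical.

```python
def iou_per_class(pred, target, num_classes):
    """Return the IoU of each class over two flat label sequences.

    A class that appears in neither sequence gets None so that it can be
    excluded from the mean instead of counting as 0.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else None)
    return ious


def mean_iou(pred, target, num_classes):
    """Average the IoU over the classes present in pred or target."""
    present = [v for v in iou_per_class(pred, target, num_classes) if v is not None]
    return sum(present) / len(present)


# Toy example (hypothetical labels): six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(iou_per_class(pred, target, 3))
print(mean_iou(pred, target, 3))
```

In practice the label sequences would be flattened segmentation masks, and whether absent classes are skipped or counted as zero is a convention that should be stated alongside any reported mIoU.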

29 pages, 2211 KiB  
Article
Big Data Analytics Framework for Decision-Making in Sports Performance Optimization
by Dan Cristian Mănescu
Data 2025, 10(7), 116; https://doi.org/10.3390/data10070116 - 14 Jul 2025
Viewed by 255
Abstract
The rapid proliferation of wearable sensors and advanced tracking technologies has revolutionized data collection in elite sports, enabling continuous monitoring of athletes’ physiological and biomechanical states. This study proposes a comprehensive big data analytics framework that integrates data acquisition, processing, analytics, and decision support, demonstrated through synthetic datasets in football, basketball, and athletics case scenarios, modeled to represent typical data patterns and decision-making workflows observed in elite sport environments. Analytical methods, including gradient boosting classifiers, logistic regression, and multilayer perceptron models, were employed to predict injury risk, optimize in-game tactical decisions, and personalize sprint mechanics training. Key results include a 12% reduction in hamstring injury rates in football, a 16% improvement in clutch decision-making accuracy in basketball, and an 8% decrease in 100 m sprint times among athletes. The framework’s visualization tools and alert systems supported actionable insights for coaches and medical staff. Challenges such as data quality, privacy compliance, and model interpretability are addressed, with future research focusing on edge computing, federated learning, and augmented reality integration for enhanced real-time feedback. This study demonstrates the potential of integrated big data analytics to transform sports performance optimization, offering a reproducible and ethically sound platform for advancing personalized, data-driven athlete management.
(This article belongs to the Special Issue Big Data and Data-Driven Research in Sports)

6 pages, 171 KiB  
Data Descriptor
A Combined HF Radar and Drifter Dataset for Analysis of Highly Variable Surface Currents
by Bartolomeo Doronzo, Michele Bendoni, Stefano Taddei, Angelo Boccacci and Carlo Brandini
Data 2025, 10(7), 115; https://doi.org/10.3390/data10070115 - 12 Jul 2025
Viewed by 170
Abstract
This data descriptor presents the HF radar and drifter datasets, along with the methods used to process and apply them in a previously published study on the validation of surface current measurements in a region characterized by highly variable coastal dynamics. The data were collected in the framework of a large-scale Lagrangian experiment, which included extensive drifter deployment and the generation of virtual trajectories based on HF radar-derived flow fields. Both Eulerian and Lagrangian approaches were used to assess radar performance through correlation and RMSE metrics, with additional refinement achieved via Kriging interpolation. The validation results, published in Remote Sensing, demonstrated good agreement between HF radar and drifter observations, particularly when quality control parameters were optimized. The datasets and associated methodologies described here support ongoing efforts to enhance HF radar tuning strategies and improve surface current monitoring in complex marine environments.
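The correlation and RMSE metrics mentioned above can be illustrated for a pair of co-located velocity series. This is a sketch under stated assumptions, not the authors' processing chain: the function names and the sample velocity values are hypothetical, and the two series are assumed to be already matched in time and space.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical eastward velocity components in m/s (illustrative values only).
radar_u = [0.10, 0.20, 0.30, 0.40]
drifter_u = [0.12, 0.18, 0.33, 0.37]
print(rmse(radar_u, drifter_u))
print(pearson_r(radar_u, drifter_u))
```

A low RMSE together with a high correlation indicates agreement in both magnitude and variability; either metric alone can hide a systematic bias or a scaling error.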
15 pages, 2054 KiB  
Data Descriptor
Data on Brazilian Powdered Milk Formulations for Infants of Various Age Groups: 0–6 Months, 6–12 Months, and 12–36 Months
by Francisco José Mendes dos Reis, Antonio Marcos Jacques Barbosa, Elaine Silva de Pádua Melo, Marta Aratuza Pereira Ancel, Rita de Cássia Avellaneda Guimarães, Priscila Aiko Hiane, Flavio Santana Michels, Daniele Bogo, Karine de Cássia Freitas Gielow, Diego Azevedo Zoccal Garcia, Geovanna Vilalva Freire, João Batista Gomes de Souza and Valter Aragão do Nascimento
Data 2025, 10(7), 114; https://doi.org/10.3390/data10070114 - 9 Jul 2025
Viewed by 232
Abstract
Milk powder is a key nutritional alternative to breastfeeding, but its thermal properties, which vary with temperature, can affect its quality and shelf life. However, there is little information about the physical and chemical properties of powdered milk in several countries. This dataset contains the results of an analysis of the aflatoxins, macroelement and microelement concentrations, oxidative stability, and fatty acid profile of infant formula milk powder. The concentrations of Al, As, Ba, Cd, Co, Cr, Cu, Fe, Mg, Mn, Mo, Ni, Pb, Se, V, and Zn in digested powdered milk samples were quantified through inductively coupled plasma optical emission spectrometry (ICP OES). Thermogravimetry (TG) and differential scanning calorimetry (DSC) were used to estimate the oxidative stability of infant formula milk powder, while the methyl esters of the fatty acids were analyzed by gas chromatography. Most milk samples showed significant concentrations of As (0.5583–1.3101 mg/kg) and Pb (0.0847–0.2588 mg/kg). The concentrations of aflatoxins G2 and B2 are below the limits established by Brazilian regulatory agencies. The thermal degradation behavior differs among samples because of their fatty acid compositions. The data presented may be useful in identifying compounds present in infant milk powder used as a substitute for breast milk and in understanding the mechanism of thermal stability and degradation, helping to ensure food safety for the infants who consume it.

10 pages, 4821 KiB  
Data Descriptor
Multi-Resolution Remote Sensing Dataset for the Detection of Anthropogenic Litter: A Multi-Platform and Multi-Sensor Approach
by Robert Rettig, Felix Becker, Alexander Berghoff, Tobias Binkele, Wolfram Michael Butter, Tilman Floehr, Martin Kumm, Carolin Leluschko, Florian Littau, Elmar Reinders, Eike Rodenbäck, Tobias Schmid, Sabine Schründer, Sören Schweigert, Michael Sinhuber, Jens Wellhausen, Frederic Stahl and Christoph Tholen
Data 2025, 10(7), 113; https://doi.org/10.3390/data10070113 - 9 Jul 2025
Viewed by 207
Abstract
The dataset developed within the PlasticObs+ project aims to facilitate a multi-resolution approach for detecting and quantifying anthropogenic litter through aerial images. Traditional detection methods often suffer from narrow, use-case-specific limitations, reducing their transferability. To address this, an image dataset was created featuring various spatial and spectral resolutions. The highest spatial resolution images (ground sampling distance = 0.2 cm) were used to generate a labeled dataset, which was georeferenced for mapping onto coarser-resolution images.
(This article belongs to the Section Spatial Data Science and Digital Earth)

9 pages, 16281 KiB  
Data Descriptor
Advancements in Regional Weather Modeling for South Asia Through the High Impact Weather Assessment Toolkit (HIWAT) Archive
by Timothy Mayer, Jonathan L. Case, Jayanthi Srikishen, Kiran Shakya, Deepak Kumar Shah, Francisco Delgado Olivares, Lance Gilliland, Patrick Gatlin, Birendra Bajracharya and Rajesh Bahadur Thapa
Data 2025, 10(7), 112; https://doi.org/10.3390/data10070112 - 9 Jul 2025
Viewed by 271
Abstract
Some of the most intense thunderstorms and extreme weather events on Earth occur in the Hindu Kush Himalaya (HKH) region of Southern Asia. The need to provide end users, stakeholders, and decision makers with accurate forecasts and alerts of extreme weather is critical. To that end, a cutting-edge weather modeling framework, the High Impact Weather Assessment Toolkit (HIWAT), was created through the National Aeronautics and Space Administration (NASA) SERVIR Applied Sciences Team (AST) effort. HIWAT consists of a suite of varied numerical weather prediction (NWP) model runs that provide probabilities of straight-line damaging winds, hail, frequent lightning, and intense rainfall as part of a daily 54 h forecast tool. The HIWAT system was first deployed in 2018, and the recently released model archive hosted by the Global Hydrometeorology Resource Center (GHRC) Distributed Active Archive Center (DAAC) provides daily model outputs for the years 2018–2022. With a nested modeling domain covering Nepal, Bangladesh, Bhutan, and Northeast India, the HIWAT archive spans the critical pre-monsoon and monsoon months of March–October, when severe weather and flooding are most frequent. As part of NASA’s Transform to Open Science (TOPS) initiative, this data archive is freely available to practitioners and researchers.
(This article belongs to the Section Spatial Data Science and Digital Earth)

12 pages, 659 KiB  
Article
PlantDRs: A Database of Dispersed Repeats in Plant Genomes Identified by the Iterative Procedure Method
by Valentina Rudenko, Eugene Korotkov and Dmitrii Kostenko
Data 2025, 10(7), 111; https://doi.org/10.3390/data10070111 - 9 Jul 2025
Viewed by 226
Abstract
In this work, we searched for and analyzed highly divergent dispersed repeats (DRs) in the genomes of four plants: Arabidopsis thaliana, Capsicum annuum, Daucus carota, and Zea mays. DRs were detected using the iterative procedure method, which has shown efficacy in searches for highly divergent repeats in bacteria and algae. The results indicated that the number of DRs in the plant genomes depended on the genome size, whereas the number of repeat families did not. The DRs covered from 36% to 50% of the studied genomes. The shortest repeats were observed in the D. carota genome, but their consensus lengths were similar to those in the other species. Analysis of periodicity in various DR families showed that most periods were 3 bp long. We created a database of the detected DRs, which contains 5,392,216 DRs grouped in 150 families and can be accessed on the Research Center of Biotechnology RAS server. The server makes it possible to search for repeats based on various criteria and to download the obtained data.

18 pages, 1810 KiB  
Article
Analysis of Student Dropout Risk in Higher Education Using Proportional Hazards Model and Based on Entry Characteristics
by Liga Paura, Irina Arhipova, Gatis Vitols and Sandra Sproge
Data 2025, 10(7), 110; https://doi.org/10.3390/data10070110 - 8 Jul 2025
Viewed by 645
Abstract
The aim of this study is to identify the key factors contributing to student dropout and to develop a predictive model that estimates the dropout risk of students based on their entry characteristics and enrolment registration data. Our analysis is based on the registration and academic data of 971 full-time and part-time bachelor’s students in five faculties, who were enrolled in the academic year 2021–2022 at the Latvia University of Life Sciences and Technologies (LBTU). Dropout was analyzed over the first 3.5 years of study, by which time students in engineering and information technology, agriculture and food technology, economics and social sciences, and forest and environmental studies had started their last semester and veterinary medicine students had completed more than half of their program of study. Survival analysis methods were used. Students’ dropout risk in relation to gender, faculty, priority to study in the program, and secondary school performance (SM) was estimated using the Cox proportional hazards model. The highest student dropout was observed during the first year of study. Secondary school performance was a significant predictor of students’ dropout risk; students with higher SM had a lower dropout risk (HR = 0.66, p < 0.05). Dropout was also associated with faculty or study programme: students in economics and social sciences were at lower dropout risk than students from the other faculties. The model’s concordance index was 0.59, which indicates that additional or stronger predictors may be needed to improve model performance.
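In a Cox proportional hazards model, the hazard ratio of a covariate is the exponential of its regression coefficient, so the reported HR = 0.66 can be read as a relative change in dropout hazard. The sketch below only illustrates that arithmetic; it assumes the HR is expressed per one-unit increase in SM, which the abstract does not state, and the function names are hypothetical.

```python
import math


def coefficient_from_hr(hr):
    """Cox regression coefficient (log hazard ratio) for a given hazard ratio."""
    return math.log(hr)


def hr_for_change(beta, delta):
    """Hazard ratio implied by changing the covariate by `delta` units."""
    return math.exp(beta * delta)


def percent_change_in_hazard(hr):
    """Relative change in hazard, in percent, implied by a hazard ratio."""
    return (hr - 1.0) * 100.0


# HR taken from the abstract; the per-unit interpretation is an assumption.
beta = coefficient_from_hr(0.66)
print(percent_change_in_hazard(0.66))   # about a third lower hazard per unit
print(hr_for_change(beta, 2))           # implied ratio for a two-unit difference
```

Because the model is multiplicative, a two-unit difference in the covariate compounds the ratio (0.66 squared) rather than doubling the percentage reduction.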

16 pages, 3375 KiB  
Data Descriptor
ICA-Based Resting-State Networks Obtained on Large Autism fMRI Dataset ABIDE
by Sjir J. C. Schielen, Jesper Pilmeyer, Albert P. Aldenkamp, Danny Ruijters and Svitlana Zinger
Data 2025, 10(7), 109; https://doi.org/10.3390/data10070109 - 3 Jul 2025
Viewed by 437
Abstract
Functional magnetic resonance imaging (fMRI) has become instrumental in researching the functioning of the brain. One application of fMRI is investigating the brains of people with autism spectrum disorder (ASD). The Autism Brain Imaging Data Exchange (ABIDE) facilitates this research through its extensive data-sharing initiative. While ABIDE offers raw data and data preprocessed with various atlases, independent component analysis (ICA) for dimensionality reduction remains underutilized. ICA is a data-driven way to reduce dimensionality without prior assumptions on delineations. Additionally, ICA separates the noise from the signal, and the signal components correspond well to functional brain networks called resting-state networks (RSNs). Currently, no large, readily available dataset preprocessed with ICA exists. Here, we address this gap by presenting ABIDE’s data preprocessed to extract ICA-based resting-state networks, which are publicly available. These RSNs unveil neural activation clusters without atlas constraints, offering a perspective on ASD analyses that complements the predominantly atlas-based literature. This contribution provides a resource for further research into ASD, benchmarking between methodologies, and the development of new analytical approaches.
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics, 2nd Edition)

10 pages, 687 KiB  
Data Descriptor
A DNA Barcode Dataset for the Aquatic Fauna of the Panama Canal: Novel Resources for Detecting Faunal Change in the Neotropics
by Kristin Saltonstall, Rachel Collin, Celestino Aguilar, Fernando Alda, Laura M. Baldrich-Mora, Victor Bravo, María Fernanda Castillo, Sheril Castro, Luis F. De León, Edgardo Díaz-Ferguson, Humberto A. Garcés, Eyda Gómez, Rigoberto G. González, Maribel A. González-Torres, Hector M. Guzman, Alexandra Hiller, Roberto Ibáñez, César Jaramillo, Klara L. Kaiser, Yulang Kam, Mayra Lemus Peralta, Oscar G. Lopez, Maycol E. Madrid C., Matthew J. Miller, Natalia Ossa-Hernandez, Ruth G. Reina, D. Ross Robertson, Tania E. Romero-Gonzalez, Milton Sandoval, Oris Sanjur, Carmen Schlöder, Ashley E. Sharpe, Diana Sharpe, Jakob Siepmann, David Strasiewsky, Mark E. Torchin, Melany Tumbaco, Marta Vargas, Miryam Venegas-Anaya, Benjamin C. Victor and Gustavo Castellanos-Galindo
Data 2025, 10(7), 108; https://doi.org/10.3390/data10070108 - 2 Jul 2025
Viewed by 429
Abstract
DNA metabarcoding is a powerful biodiversity monitoring tool, enabling simultaneous assessments of diverse biological communities. However, its accuracy depends on the reliability of reference databases that assign taxonomic identities to obtained sequences. Here we provide a DNA barcode dataset for aquatic fauna of the Panama Canal, a region that connects the Western Atlantic and Eastern Pacific oceans. This unique setting creates opportunities for trans-oceanic dispersal while acting as a modern physical dispersal barrier for some terrestrial organisms. We sequenced 852 specimens from a diverse array of taxa (e.g., fishes, zooplankton, mollusks, arthropods, reptiles, birds, and mammals) using COI, and in some cases, 12S and 16S barcodes. These data were collected for a variety of studies, many of which have sought to understand recent changes in aquatic communities in the Panama Canal. The DNA barcodes presented here are all from captured specimens, which confirms their presence in Panama and, in many cases, inside the Panama Canal. Both native and introduced taxa are included. This dataset represents a valuable resource for environmental DNA (eDNA) work in the Panama Canal region and across the Neotropics aimed at monitoring ecosystem health, tracking non-native and potentially invasive species, and understanding the ecology and distribution of these freshwater and euryhaline taxa.
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics, 2nd Edition)

7 pages, 1300 KiB  
Data Descriptor
Global Database for Naturally Occurring Radionuclides Associated with Offshore Oil and Gas Production
by Ziran Wei, Songjie He, Stephanie Sharuga and Kanchan Maiti
Data 2025, 10(7), 107; https://doi.org/10.3390/data10070107 - 1 Jul 2025
Viewed by 324
Abstract
This study compiles a comprehensive dataset on the occurrence, distribution, and potential impacts of Naturally Occurring Radionuclides (NORMs) near offshore oil and gas platforms. It encompasses data, including activities (Bq/L) and exposure levels (mSv), derived from various environmental matrices. A particular emphasis is placed on petroleum products and waste, such as produced water, scales, and sludges. The dataset contributes to a better understanding of the distribution of NORM wastes in marine environments, informs future radiological safety standards, contributes to the formulation of regulatory policies, and facilitates the design of mitigation strategies. The information, drawn from literature and data from five continents over the past 70 years, has been carefully compiled and organized to support intuitive analysis, making it a valuable tool for policymakers and researchers.

27 pages, 1023 KiB  
Article
Exploring Legislative Textual Data in Brazilian Portuguese: Readability Analysis and Knowledge Graph Generation
by Gisliany Lillian Alves de Oliveira, Breno Santana Santos, Marianne Silva and Ivanovitch Silva
Data 2025, 10(7), 106; https://doi.org/10.3390/data10070106 - 1 Jul 2025
Viewed by 407
Abstract
Legislative documents are crucial to democratic societies, defining the legal framework for social life. In Brazil, legislative texts are particularly complex due to extensive technical jargon, intricate sentence structures, and frequent references to prior legislation. The country’s civil law tradition and multicultural context introduce further interpretative and linguistic challenges. Moreover, the study of Brazilian Portuguese legislative texts remains underexplored, lacking legal-specific models and datasets. To address these gaps, this work proposes a data-driven approach utilizing large language models (LLMs) to analyze these documents and extract knowledge graphs (KGs). A case study was conducted using 1869 proposals from the Legislative Assembly of Rio Grande do Norte (ALRN), spanning January 2019 to April 2024. The Llama 3.2 3B Instruct model was employed to extract KGs representing entities and their relationships. The findings support the method’s effectiveness in producing coherent graphs faithful to the original content. Nevertheless, challenges remain in resolving entity ambiguity and achieving full relationship coverage. Additionally, readability analyses using metrics for Brazilian Portuguese revealed that ALRN proposals require superior reading skills due to their technical style. Ultimately, this study advances legal artificial intelligence by providing insights into Brazilian legislative texts and promoting transparency and accessibility through natural language processing techniques.

21 pages, 287 KiB  
Article
Expert Experiences in Anonymizing Personal Data and Its Use as Open Data: Qualitative Insights
by Norbert Lichtenauer, Johann Guggumos, Matthias Kampmann, Juliane Kis, Florian Laumer, Elena März, Florian Wahl and Sebastian Wilhelm
Data 2025, 10(7), 105; https://doi.org/10.3390/data10070105 - 1 Jul 2025
Viewed by 358
Abstract
Introduction: The effective and meaningful use of anonymized personal data, including open data, is globally significant across various sectors. Enhancing data utilization aims to generate substantial societal benefits and added value through innovations, products, and services. However, several legal, ethical, and technical challenges currently hinder the development and broader adoption of open data. Furthermore, the availability of technical support tools with high usability is especially desirable to facilitate the anonymization process effectively. Methods: As part of the EAsyAnon research project, preliminary insights were gathered through a scoping review that identified factors promoting or impeding the anonymization and use of personal data. Based on these findings, a structured interview guide was developed. Following a pretest, 19 interviews were conducted with diverse stakeholders from healthcare institutions, research organizations, public authorities, and private companies. The collected data were analyzed using Kuckartz’s structural content analysis methodology, supported by qualitative analysis software. Results: The content analysis yielded five overarching categories and 21 subcategories. These encompassed stakeholder experiences related to anonymization and open data processes, the various types and formats of personal data, identified barriers and enabling factors, support services, and the ethical and legal considerations associated with anonymization. Discussion: The findings highlight significant uncertainty among stakeholders regarding the anonymization of personal data. Although the importance and potential applications of open data for innovation and continuous improvement are widely acknowledged and supported, numerous challenges persist at both the macro and micro levels. The results emphasize a clear need for targeted support measures to address these challenges effectively.
(This article belongs to the Special Issue Ethical AI and Responsible Data Science)

14 pages, 228 KiB  
Article
Extracting Information from Unstructured Medical Reports Written in Minority Languages: A Case Study of Finnish
by Elisa Myllylä, Pekka Siirtola, Antti Isosalo, Jarmo Reponen, Satu Tamminen and Outi Laatikainen
Data 2025, 10(7), 104; https://doi.org/10.3390/data10070104 - 1 Jul 2025
Viewed by 321
Abstract
In the era of digital healthcare, electronic health records generate vast amounts of data, much of which is unstructured and therefore not in a usable format for conventional machine learning and artificial intelligence applications. This study investigates how to extract meaningful insights from unstructured radiology reports written in Finnish, a minority language, using machine learning techniques for text analysis. With this approach, unstructured information could be transformed into a structured format. The results of this research show that relevant information can be effectively extracted from Finnish medical reports using classification algorithms with default parameter values. For the detection of breast tumour mentions in medical texts, classifiers achieved high accuracy, almost 90%. Detection of metastasis mentions, however, proved more challenging, with the best-performing models, Support Vector Machine (SVM) and logistic regression, achieving an F1-score of 81%. The lower performance in metastasis detection is likely due to the more complex problem, ambiguous labeling, and the smaller dataset size. The results of classical classifiers were also compared with FinBERT, a domain-adapted Finnish BERT model. However, classical classifiers outperformed FinBERT. This highlights the challenge of medical language processing when working with minority languages. Moreover, it was noted that parameter tuning based on translated English reports did not significantly improve the detection rates, likely due to linguistic differences between the datasets. The larger translated dataset used for tuning comes from a different clinical domain and employs noticeably simpler, less nuanced language than the Finnish breast cancer reports, which are written by native Finnish-speaking medical experts. This underscores the need for localised datasets and models, particularly for minority languages with unique grammatical structures.
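The F1-score quoted above is the harmonic mean of precision and recall, which is why it can sit well below accuracy when one class is rare or hard to detect. The following minimal sketch shows the computation from confusion-matrix counts; the function name and the example counts are hypothetical and are not taken from the study.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives.

    Returns 0.0 when precision or recall is undefined or zero.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a "metastasis mentioned" classifier.
print(f1_score(tp=8, fp=2, fn=2))
```

True negatives do not enter the formula, so F1 reflects only how well the positive class is found, unlike accuracy.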

15 pages, 770 KiB  
Data Descriptor
NPFC-Test: A Multimodal Dataset from an Interactive Digital Assessment Using Wearables and Self-Reports
by Luis Fernando Morán-Mirabal, Luis Eduardo Güemes-Frese, Mariana Favarony-Avila, Sergio Noé Torres-Rodríguez and Jessica Alejandra Ruiz-Ramirez
Data 2025, 10(7), 103; https://doi.org/10.3390/data10070103 - 30 Jun 2025
Viewed by 332
Abstract
The growing implementation of digital platforms and mobile devices in educational environments has generated the need to explore new approaches for evaluating the learning experience beyond traditional self-reports or instructor presence. In this context, the NPFC-Test dataset was created from an experimental protocol conducted at the Experiential Classroom of the Institute for the Future of Education. The dataset was built by collecting multimodal indicators such as neuronal, physiological, and facial data using a portable EEG headband, a medical-grade biometric bracelet, a high-resolution depth camera, and self-report questionnaires. The participants were exposed to a digital test lasting 20 min, composed of audiovisual stimuli and cognitive challenges, during which synchronized data from all devices were gathered. The dataset includes timestamped records related to emotional valence, arousal, and concentration, offering a valuable resource for multimodal learning analytics (MMLA). The recorded data were processed through calibration procedures, temporal alignment techniques, and emotion recognition models. It is expected that the NPFC-Test dataset will support future studies in human–computer interaction and educational data science by providing structured evidence to analyze cognitive and emotional states in learning processes. In addition, it offers a replicable framework for capturing synchronized biometric and behavioral data in controlled academic settings. Full article

30 pages, 30383 KiB  
Technical Note
Dataset and AI Workflow for Deep Learning Image Classification of Ulcerative Colitis and Colorectal Cancer
by Joaquim Carreras, Giovanna Roncador and Rifat Hamoudi
Data 2025, 10(7), 99; https://doi.org/10.3390/data10070099 - 24 Jun 2025
Viewed by 312
Abstract
Inflammatory bowel disease (IBD) is a chronic inflammatory condition of the gastrointestinal tract characterized by the deregulation of immuno-oncology markers. IBD includes ulcerative colitis and Crohn’s disease. Chronic active inflammation is a risk factor for the development of colorectal cancer (CRC). This technical note describes a dataset of histological images of ulcerative colitis, CRC (adenocarcinoma), and colon control. The samples were stained with hematoxylin and eosin (H&E), and immunohistochemically analyzed for LAIR1 and TOX2 markers. The methods used to collect, process, and analyze the dataset with convolutional neural networks (CNNs) are also described, together with information about the dataset’s use. This article is a companion to the manuscript “Ulcerative Colitis, LAIR1 and TOX2 Expression, and Colorectal Cancer Deep Learning Image Classification Using Convolutional Neural Networks”. Full article

20 pages, 2245 KiB  
Article
Data-Driven Modeling and Simulation in Forestry and Agricultural Product Transportation Management by Small Businesses: A Case Study
by Galina Merkurjeva, Vitalijs Bolsakovs, Jurijs Merkurjevs, Andrejs Romanovs and Wouter Faes
Data 2025, 10(7), 98; https://doi.org/10.3390/data10070098 - 24 Jun 2025
Viewed by 311
Abstract
This article proposes an innovative methodology for data-driven modeling and simulation of transportation management through cross-sectoral collaboration in small businesses. The present research is multidisciplinary and interdisciplinary in nature. We investigate the improvements in logistics management that can be achieved through cross-sector collaboration in agriculture and forestry. A data-driven method, such as symbolic regression, is used to identify the relationships between factors in a modeled system using mathematical expressions. These expressions are directly integrated into the simulation models. Simulation spreads the modeling of transportation processes over a period of time. The system dynamics model is designed to analyze and assess the performance of a system based on its past behavior and is, therefore, deterministic. The discrete-event model enables the simulation of future scenarios and outcomes over time, given random input variables. As new data become available, relationships within the symbolic regression method are discovered more accurately, and simulations are updated accordingly. The tools offered for implementation are supplemented by a multi-user web simulation. The proposed case study is based on a real-life example. The obtained results allow small agricultural companies to use transportation and labor resources more efficiently when organizing the transportation of their agricultural and forestry products. Integrating data-driven models into simulations enables a better interpretation of data across the entire data value chain. Full article

24 pages, 1586 KiB  
Article
Effective Education System for Athletes Utilising Big Data and AI Technology
by Martin Mičiak, Dominika Toman, Roman Adámik, Ema Kufová, Branislav Škulec, Nikola Mozolová and Aneta Hoferová
Data 2025, 10(7), 102; https://doi.org/10.3390/data10070102 - 24 Jun 2025
Viewed by 473
Abstract
Education leads to building successful careers. However, different groups of students have different studying preferences. Our target group is athletes who combine their education with sports training. The main objective is to provide recommendations for an effective education system for athletes, improving their chances of finding new careers after leaving sports. Such a system must include Big Data and utilise currently available AI tools that support athletes’ career planning and development in a meaningful way. The main objective is specified by the following partial objectives: identifying what types of Big Data to analyse in connection with the athletes’ education; revealing what AI tools to include in the athletes’ education for their better preparation for a career after sports; determining what knowledge of AI and Big Data athletes need to stay relevant once they enter the labour market. Our study combines secondary and primary data sources. The secondary data (used in the orientation analysis) include case studies on AI and Big Data connected to education. The primary data were collected via a survey of over 200 Slovak junior athletes. The results point to directions for sports policymakers and managers of sports organisations who want to improve their athletes’ career prospects. Full article

8 pages, 786 KiB  
Data Descriptor
OrthoKnow-SP: A Large-Scale Dataset on Orthographic Knowledge and Spelling Decisions in Spanish Adults
by Jon Andoni Duñabeitia
Data 2025, 10(7), 101; https://doi.org/10.3390/data10070101 - 24 Jun 2025
Viewed by 274
Abstract
Orthographic knowledge is a critical component of skilled language use, yet its large-scale behavioral signatures remain understudied in Spanish. To address this gap, we developed OrthoKnow-SP, a megastudy that captures spelling decisions from 27,185 native Spanish-speaking adults who completed an 80-item forced-choice task. Each trial required selecting the correctly spelled word from a pair comprising a real word and a pseudohomophone foil that preserved pronunciation while violating the correct graphemic representation. The stimuli targeted six high-confusability contrasts in Spanish orthography. We recorded response accuracy and reaction times for over 2.17 million trials, alongside demographic and device metadata. Results show robust variability across items and individuals, with item-level metrics closely aligned with independent norms of word prevalence. A composite difficulty index integrating speed and accuracy further allowed fine-grained item ranking. The dataset provides the first population-scale norms of Spanish spelling difficulty, capturing regional and generational diversity absent from traditional lab-based studies. Public release of OrthoKnow-SP enables new research on the cognitive and demographic factors shaping orthographic decisions, and provides educators, clinicians, and developers with a valuable benchmark for assessing spelling competence and modeling written language processing. Full article

14 pages, 296 KiB  
Article
Collecting and Analyzing IBD Clinical Data for Machine-Learning: Insights from an Italian Cohort
by Aldo Marzullo, Victor Savevski, Maddalena Menini, Alessandro Schilirò, Gianluca Franchellucci, Arianna Dal Buono, Cristina Bezzio, Roberto Gabbiadini, Cesare Hassan, Alessandro Repici and Alessandro Armuzzi
Data 2025, 10(7), 100; https://doi.org/10.3390/data10070100 - 24 Jun 2025
Viewed by 258
Abstract
Research on inflammatory bowel disease (IBD) involves integrating diverse and heterogeneous data sources, from clinical records to imaging and laboratory results, which presents significant challenges in data harmonization and exploration. These challenges are also reflected in the development of machine-learning applications, where inconsistencies in data quality, missing information, and variability in data formats can adversely affect the performance and generalizability of models. In this study, we describe the collection and curation of a comprehensive dataset focused on IBD. In addition, we present a dedicated research platform, with a focus on ethical standards, data protection, and seamless integration of different data types. We also discuss the challenges encountered and the insights gained during its implementation. Full article

27 pages, 1050 KiB  
Article
Developing Data Workflows: From Conceptual Blueprints to Physical Implementation
by Bruno Oliveira and Óscar Oliveira
Data 2025, 10(7), 97; https://doi.org/10.3390/data10070097 - 23 Jun 2025
Viewed by 216
Abstract
Data workflows are an important component of modern analytical systems, enabling structured data extraction, transformation, integration, and delivery across diverse applications. Despite their importance, these workflows are often developed using ad hoc approaches, leading to scalability and maintenance challenges. This paper proposes a structured, three-level methodology—conceptual, logical, and physical—for modeling data workflows using Business Process Model and Notation (BPMN). A custom BPMN metamodel is introduced, along with a tool built on BPMN.io, that enforces modeling constraints and supports translation from high-level workflow designs to executable implementations. Logical models are further enriched through blueprint definitions, specified in a formal, implementation-agnostic JSON schema. The methodology is validated through a case study, demonstrating its applicability across ETL and machine learning domains, promoting clarity, reuse, and automation in data pipeline development. Full article

20 pages, 4787 KiB  
Article
A Data Imputation Strategy to Enhance Online Game Churn Prediction, Considering Non-Login Periods
by JaeHong Lee, Pavinee Rerkjirattikal and SangGyu Nam
Data 2025, 10(7), 96; https://doi.org/10.3390/data10070096 - 23 Jun 2025
Viewed by 444
Abstract
User churn in online games refers to players becoming inactive for an extended period. Even a small increase in churn can lead to significant revenue loss, making churn prediction crucial for sustaining long-term player engagement. Although user churn prediction has been extensively studied, most existing approaches either ignore non-login periods or treat all inactivity uniformly, overlooking key behavioral differences. This study addresses this gap by categorizing non-login periods into three types: inactivity due to new or dormant users, genuine loss of interest, and temporary inaccessibility caused by external factors. These periods are treated as either non-existent or missing data and imputed using techniques such as mean or mode substitution, linear interpolation, and multiple imputation by chained equations (MICE). MICE was selected due to its ability to impute missing values more robustly by considering multivariate relationships. A random forest (RF) classifier, chosen for its interpretability and robustness to incomplete data, serves as the primary prediction model. Additionally, classifier chains are used to capture label dependencies, and principal component analysis (PCA) is applied to reduce dimensionality and mitigate overfitting. Experiments on real-world MMORPG data show that our approach improves predictive accuracy, achieving a micro-averaged AUC above 0.92 and a weighted F1 score exceeding 0.70. These findings suggest that our approach improves churn prediction and offers actionable insights for supporting personalized player retention strategies. Full article
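The abstract lists linear interpolation among the candidate imputation techniques for non-login periods. As a rough illustration of that one technique only (it does not implement MICE, which the authors selected), the sketch below fills gaps in a daily activity series. The function name, the choice to leave leading and trailing gaps unfilled, and the sample data are assumptions made for this example and are not drawn from the paper.

```python
from typing import List, Optional


def interpolate_gaps(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill runs of None that lie between two observed values by linear interpolation.

    Leading and trailing gaps have no anchor on one side and are left as None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


# Hypothetical daily play time in minutes; None marks a non-login day treated as missing.
play_minutes = [None, 60.0, None, None, 30.0, 45.0, None]
print(interpolate_gaps(play_minutes))
```

Interpolation of this kind looks at one variable at a time, whereas a multivariate method such as MICE also draws on the other features of each player record, which is the reason the abstract gives for preferring it.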
(This article belongs to the Section Information Systems and Data Management)

17 pages, 7465 KiB  
Data Descriptor
A Sub-Hourly Precipitation Dataset from a Pluviographic Network in Central Chile
by Claudia Sangüesa, Alfredo Ibañez, Roberto Pizarro, Cristian Vidal-Silva, Pablo Garcia-Chevesich, Romina Mendoza, Cristóbal Toledo, Juan Pino, Rodrigo Paredes and Ben Ingram
Data 2025, 10(7), 95; https://doi.org/10.3390/data10070095 - 22 Jun 2025
Viewed by 1067
Abstract
This data descriptor presents a unique high-resolution rainfall dataset derived from 14 pluviograph stations across central Chile’s Mediterranean region, covering variable periods that begin between 1969 and 1992, depending on the station, and run up to 2009. The dataset provides continuous precipitation records at a 5 min temporal resolution, obtained through the digitization and processing of pluviograph strip charts using specialized software. This high temporal resolution is unprecedented for the region and enables detailed analysis of rainfall intensity, duration, and frequency patterns critical for hydrological research, climate studies, and water resource management. Each station’s data was subjected to quality control procedures, including manual validation and correction of digitization errors to ensure data integrity. The dataset reveals the significant temporal variability of rainfall in central Chile, capturing both short-duration high-intensity events and longer precipitation patterns. By making this dataset publicly available, we provide researchers with a valuable resource for studying rainfall behavior in a Mediterranean climate zone subject to significant climate variability and change. The dataset supports various applications, including the development of intensity–duration–frequency curves, analysis of rainfall erosivity, calibration of hydrological models, and investigation of precipitation trends in the context of climate change. Full article
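Intensity–duration–frequency curves, one of the applications named above, start from the maximum mean intensity observed for each duration of interest. The sketch below shows one way such maxima might be derived from a 5 min depth series using a sliding window. The function name, its parameters, and the sample storm are hypothetical and do not reflect the dataset's actual file layout or the authors' processing software.

```python
from typing import List


def max_intensity_mm_per_h(depths_mm: List[float], duration_min: int, step_min: int = 5) -> float:
    """Maximum mean rainfall intensity (mm/h) over any window of `duration_min` minutes.

    `depths_mm` holds the rainfall depth recorded in each consecutive `step_min` interval.
    """
    if duration_min % step_min != 0:
        raise ValueError("duration must be a multiple of the record step")
    width = duration_min // step_min
    if width < 1 or width > len(depths_mm):
        raise ValueError("duration must fit within the record")
    window = sum(depths_mm[:width])
    best = window
    for i in range(width, len(depths_mm)):
        # Slide the window one step: add the new interval, drop the oldest.
        window += depths_mm[i] - depths_mm[i - width]
        best = max(best, window)
    return best * 60.0 / duration_min


# Hypothetical 5 min depths (mm) for a one-hour storm; not taken from the dataset.
storm = [0.0, 0.5, 1.5, 3.0, 2.0, 1.0, 0.5, 0.0, 0.0, 0.5, 0.0, 0.0]
for minutes in (5, 15, 30, 60):
    print(minutes, "min:", round(max_intensity_mm_per_h(storm, minutes), 2), "mm/h")
```

Repeating this per year and per duration would give the annual maxima to which a frequency distribution is usually fitted. That fitting step is outside the scope of this sketch.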
(This article belongs to the Section Spatial Data Science and Digital Earth)

15 pages, 847 KiB  
Data Descriptor
Mixtec–Spanish Parallel Text Dataset for Language Technology Development
by Hermilo Santiago-Benito, Diana-Margarita Córdova-Esparza, Juan Terven, Noé-Alejandro Castro-Sánchez, Teresa García-Ramirez, Julio-Alejandro Romero-González and José M. Álvarez-Alvarado
Data 2025, 10(7), 94; https://doi.org/10.3390/data10070094 - 21 Jun 2025
Viewed by 247
Abstract
This article introduces a freely available Spanish–Mixtec parallel corpus designed to foster natural language processing (NLP) development for an indigenous language that remains digitally low-resourced. The dataset, comprising 14,587 sentence pairs, covers Mixtec variants from Guerrero (Tlacoachistlahuaca, Northern Guerrero, and Xochapa) and Oaxaca (Western Coast, Southern Lowland, Santa María Yosoyúa, Central, Lower Cañada, Western Central, San Antonio Huitepec, Upper Western, and Southwestern Central). Texts are classified into four main domains: education, law, health, and religion. To compile these data, we conducted a two-phase collection process: first, an online search of government portals, religious organizations, and Mixtec language blogs; and second, an on-site retrieval of physical texts from the library of the Autonomous University of Querétaro. Scanning and optical character recognition were then performed to digitize physical materials, followed by manual correction to fix character misreadings and remove duplicates or irrelevant segments. We conducted a preliminary evaluation of the collected data to validate its usability in automatic translation systems. From Spanish to Mixtec, a fine-tuned GPT-4o-mini model yielded a BLEU score of 0.22 and a TER score of 122.86, while two fine-tuned open-source models, mBART-50 and M2M-100, yielded BLEU scores of 4.2 and 2.63 and TER scores of 98.99 and 104.87, respectively. All code demonstrating data usage, along with the final corpus itself, is publicly accessible via GitHub and Figshare. We anticipate that this resource will enable further research into machine translation, speech recognition, and other NLP applications while contributing to the broader goal of preserving and revitalizing the Mixtec language. Full article

12 pages, 379 KiB  
Data Descriptor
Wildfire Occurrence and Damage Dataset for Chile (1985–2024): A Real Data Resource for Early Detection and Prevention Systems
by Cristian Vidal-Silva, Roberto Pizarro, Miguel Castillo-Soto, Claudia de la Fuente, Vannessa Duarte, Claudia Sangüesa, Alfredo Ibañez, Rodrigo Paredes and Ben Ingram
Data 2025, 10(7), 93; https://doi.org/10.3390/data10070093 - 20 Jun 2025
Viewed by 541
Abstract
Wildfires represent an increasing global concern, threatening ecosystems, human settlements, and economies. Chile, characterized by diverse climatic zones and extensive forested areas, has been particularly vulnerable to wildfire events over recent decades. In this context, real, long-term data are essential to understand wildfire dynamics and to design effective early warning and prevention systems. This paper introduces a unique dataset containing detailed wildfire occurrence and damage information across Chilean municipalities from 1985 to 2024. Derived from official records of the National Forestry Corporation of Chile (CONAF), this dataset encompasses key variables such as the number of fires, total burned area, estimated material damages, and the number of affected individuals. It provides an invaluable resource for researchers and policymakers aiming to improve fire risk assessments, model fire behavior, and develop AI-driven early detection systems. The temporal span of nearly four decades offers opportunities for longitudinal analyses, the study of climate change impacts on fire regimes, and the evaluation of historical prevention strategies. Furthermore, by providing complete spatial coverage at the municipal level, it allows fine-grained assessments of regional vulnerabilities and resilience. Full article

31 pages, 1146 KiB  
Article
Benchmarking and Lessons Learned from Using SharePoint as an Electronic Lab Notebook in Engineering Joint Research Projects
by Kim Feldhoff, Tim Opatz, Hajo Wiemer, Martin Zinner and Steffen Ihlenfeldt
Data 2025, 10(7), 92; https://doi.org/10.3390/data10070092 - 20 Jun 2025
Viewed by 324
Abstract
The adoption of Electronic Lab Notebooks (ELNs) significantly enhances research operations by enabling the streamlined capture, storage, and dissemination of data. This promotes collaboration and ensures organised and efficient access to critical research information. Microsoft SharePoint® (SP) is an established, widely used, web-based platform with advanced collaboration capabilities. This study investigates whether SP can meet the needs of engineering research projects, particularly in a collaborative environment. The paper outlines the process of adapting SP into an ELN tool and evaluates its effectiveness compared to established ELN systems. The evaluation considers several categories related to data management, ranging from data collection to publication. Six distinct application scenarios are analysed, representing a spectrum of collaborative research projects, from small-scale initiatives with minimal processes and data to large-scale, complex projects with extensive data requirements. The results indicate that SP is competitive with established ELN tools, ranking second among the six alternatives evaluated. The adapted version of SP proves particularly effective for managing data in engineering research projects involving both academic and industrial partners, accommodating datasets for around 1000 samples. The practical implementation of SP is demonstrated through a collaborative engineering research project, showing its use in everyday research tasks such as data documentation, workflow automation, and data export. The study highlights the benefits and usability of the adapted SP version, including its support for regulatory compliance and reproducibility in research workflows. In addition, limitations and lessons learned are discussed, providing insights into the potential and challenges of using SP as an ELN tool in collaborative research projects. Full article
(This article belongs to the Section Information Systems and Data Management)