Data, Volume 9, Issue 5 (May 2024) – 13 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click the "PDF Full-text" link and open it with the free Adobe Reader.
22 pages, 1226 KiB  
Article
Comparative Analysis of the Predictive Performance of an ANN and Logistic Regression for the Acceptability of Eco-Mobility Using the Belgrade Data set
by Jelica Komarica, Draženko Glavić and Snežana Kaplanović
Data 2024, 9(5), 73; https://doi.org/10.3390/data9050073 (registering DOI) - 19 May 2024
Viewed by 115
Abstract
To solve the problem of environmental pollution caused by road traffic, alternatives to vehicles with internal combustion engines are often proposed. As such, eco-mobility microvehicles have significant potential in the fight against environmental pollution, but only on the condition that they are widely accepted and that they replace the vehicles that predominantly pollute the environment. With this in mind, this study aims to elucidate the main variables that influence the acceptability of these vehicles, using prediction models based on binary logistic regression and a multilayer artificial neural network—a multilayer perceptron (ANN). The data of a random sample obtained via an online questionnaire, answered by 503 inhabitants of Belgrade (Serbia), were used for training and testing the model. A multilayer perceptron with 9 and 7 neurons in two hidden layers, a hyperbolic tangent activation function in the hidden layer, and an identity function in the output layer performed slightly better than the binary logistic regression model. With an accuracy of 85%, a precision of 79%, a recall of 81%, and an area under the ROC curve of 0.9, the multilayer perceptron model recognized the influential variables in predicting acceptability. The results of the model indicate that a respondent’s relationship to their current environmental pollution, the frequency of their use of modes of transport such as bicycles and motorcycles, their mileage for commuting, and their personal income have the greatest influence on the acceptability of using eco-mobility vehicles. Full article
13 pages, 2679 KiB  
Article
A Benchmark Data Set for Long-Term Monitoring in the eLTER Site Gesäuse-Johnsbachtal
by Florian Lippl, Alexander Maringer, Margit Kurka, Jakob Abermann, Wolfgang Schöner and Manuela Hirschmugl
Data 2024, 9(5), 72; https://doi.org/10.3390/data9050072 (registering DOI) - 18 May 2024
Viewed by 287
Abstract
This paper gives an overview of all currently available data sets for the European Long-term Ecosystem Research (eLTER) monitoring site Gesäuse-Johnsbachtal. The site is part of the LTSER platform Eisenwurzen in the Alps of the province of Styria, Austria. It contains both protected (National Park Gesäuse) and non-protected areas (Johnsbachtal). Although the main research focus of the eLTER monitoring site Gesäuse-Johnsbachtal is on inland surface running waters, forests and other wooded land, the eLTER whole system (WAILS) approach was followed for data selection, systematically screening all available data for their suitability as eLTER’s Standard Observations (SOs). Thus, data from all system strata were included, incorporating Geosphere, Atmosphere, Hydrosphere, Biosphere and Sociosphere. In the WAILS approach these SOs are key data for a whole system approach towards long-term ecosystem research. Altogether, 54 data sets have been collected for the eLTER monitoring site Gesäuse-Johnsbachtal and included in the Dynamical Ecological Information Management System – Site and Data Registry (DEIMS-SDR), which is the eLTER data platform. The presented work provides all these data sets through dedicated data repositories for FAIR use. This paper gives an overview of all compiled data sets and their main properties. Additionally, the available data are evaluated in a concluding gap analysis with regard to the observation data needed according to WAILS, followed by an outlook on how to fill these gaps. Full article
24 pages, 545 KiB  
Article
Neural Architecture Comparison for Bibliographic Reference Segmentation: An Empirical Study
by Rodrigo Cuéllar Hidalgo, Raúl Pinto Elías, Juan Manuel Torres Moreno, Osslan Osiris Vergara Villegas, Gerardo Reyes Salgado and Andrea Magadán Salazar
Data 2024, 9(5), 71; https://doi.org/10.3390/data9050071 (registering DOI) - 18 May 2024
Viewed by 249
Abstract
In the realm of digital libraries, efficiently managing and accessing scientific publications necessitates automated bibliographic reference segmentation. This study addresses the challenge of accurately segmenting bibliographic references, a task complicated by the varied formats and styles of references. Focusing on the empirical evaluation of Conditional Random Fields (CRF), Bidirectional Long Short-Term Memory with CRF (BiLSTM + CRF), and Transformer Encoder with CRF (Transformer + CRF) architectures, this research employs Byte Pair Encoding and Character Embeddings for vector representation. The models underwent training on the extensive Giant corpus and subsequent evaluation on the Cora Corpus to ensure a balanced and rigorous comparison, maintaining uniformity across embedding layers, normalization techniques, and Dropout strategies. Results indicate that the BiLSTM + CRF architecture outperforms its counterparts by adeptly handling the syntactic structures prevalent in bibliographic data, achieving an F1-Score of 0.96. This outcome highlights the necessity of aligning model architecture with the specific syntactic demands of bibliographic reference segmentation tasks. Consequently, the study establishes the BiLSTM + CRF model as a superior approach within the current state-of-the-art, offering a robust solution for the challenges faced in digital library management and scholarly communication. Full article
17 pages, 6833 KiB  
Data Descriptor
Continuous Wave Measurements Collected in Intermediate Depth throughout the North Sea Storm Season during the RealDune/REFLEX Experiments
by Jantien Rutten, Marion Tissier, Paul van Wiechen, Xinyi Zhang, Sierd de Vries, Ad Reniers and Jan-Willem Mol
Data 2024, 9(5), 70; https://doi.org/10.3390/data9050070 - 17 May 2024
Viewed by 246
Abstract
High-resolution wave measurements at intermediate water depth are required to improve coastal impact modeling. Specifically, such data sets are desired to calibrate and validate models, and broaden the insight on the boundary conditions that force models. Here, we present a wave data set collected in the North Sea at three stations in intermediate water depth (6–14 m) during the 2021/2022 storm season as part of the RealDune/REFLEX experiments. Continuous measurements of synchronized surface elevation, velocity and pressure were recorded at 2–4 Hz by Acoustic Doppler Profilers and an Acoustic Doppler Velocimeter for a 5-month duration. Time series were quality-controlled, directional-frequency energy spectra were calculated and common bulk parameters were derived. Measured wave conditions vary from calm to energetic with 0.1–5.0 m sea-swell wave height, 5–16 s mean wave period and W-NNW direction. Nine storms, i.e., wave height beyond 2.5 m for at least six hours, were recorded including the triple storms Dudley, Eunice and Franklin. This unique data set can be used to investigate wave transformation, wave nonlinearity and wave directionality for higher and lower frequencies (e.g., sea-swell and infragravity waves) to compare with theoretical and empirical descriptions. Furthermore, the data can serve to force, calibrate and validate models during storm conditions. Full article
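The storm criterion quoted in the abstract (wave height beyond 2.5 m for at least six hours) can be illustrated as a run-length check over a regularly sampled height series. The following is a minimal sketch, not the authors' processing code: the function name `find_storms`, the hourly sampling interval, the convention that duration equals the number of consecutive samples times the interval, and the sample values are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def find_storms(heights_m: Sequence[float], dt_hours: float = 1.0,
                threshold_m: float = 2.5, min_hours: float = 6.0) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of runs that stay
    above threshold_m for at least min_hours.

    Assumption: duration is counted as (number of samples) * dt_hours.
    """
    storms = []
    start = None  # index where the current above-threshold run began
    for i, h in enumerate(heights_m):
        if h > threshold_m:
            if start is None:
                start = i
        elif start is not None:
            # The run just ended at sample i; keep it only if long enough.
            if (i - start) * dt_hours >= min_hours:
                storms.append((start, i))
            start = None
    # Handle a run that is still open at the end of the series.
    if start is not None and (len(heights_m) - start) * dt_hours >= min_hours:
        storms.append((start, len(heights_m)))
    return storms


# Hypothetical hourly wave heights in metres (not values from the data set).
hourly = [1.0, 2.6, 2.8, 3.0, 1.2,              # 3 h above threshold: too short
          2.7, 3.1, 3.4, 3.0, 2.9, 2.6, 1.0]    # 6 h above threshold: a storm
storms = find_storms(hourly)
print(storms)
```

Only the second run qualifies here, because the first lasts three hours. A real analysis would also need a rule for short dips below the threshold within one event, which this sketch does not attempt.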
38 pages, 390 KiB  
Review
Review of Data Processing Methods Used in Predictive Maintenance for Next Generation Heavy Machinery
by Ietezaz Ul Hassan, Krishna Panduru and Joseph Walsh
Data 2024, 9(5), 69; https://doi.org/10.3390/data9050069 - 15 May 2024
Viewed by 353
Abstract
Vibration-based condition monitoring plays an important role in maintaining reliable and effective heavy machinery in various sectors. Heavy machinery involves major investments and is frequently subjected to extreme operating conditions. Therefore, prompt fault identification and preventive maintenance are important for reducing costly breakdowns and maintaining operational safety. In this review, we look at different methods of vibration data processing in the context of vibration-based condition monitoring for heavy machinery. We divided primary approaches related to vibration data processing into three categories: signal processing methods, preprocessing-based techniques and artificial intelligence-based methods. We highlight the importance of these methods in improving the reliability and effectiveness of heavy machinery condition monitoring systems, emphasizing the need for precise and automated fault detection systems. To improve machinery performance and operational efficiency, this review aims to provide information on current developments and future directions in vibration-based condition monitoring by addressing issues like imbalanced data and integrating cutting-edge techniques like anomaly detection algorithms. Full article
15 pages, 1153 KiB  
Data Descriptor
EEG and Physiological Signals Dataset from Participants during Traditional and Partially Immersive Learning Experiences in Humanities
by Rebeca Romo-De León, Mei Li L. Cham-Pérez, Verónica Andrea Elizondo-Villegas, Alejandro Villarreal-Villarreal, Alexandro Antonio Ortiz-Espinoza, Carol Stefany Vélez-Saboyá, Jorge de Jesús Lozoya-Santos, Manuel Cebral-Loureda and Mauricio A. Ramírez-Moreno
Data 2024, 9(5), 68; https://doi.org/10.3390/data9050068 - 15 May 2024
Viewed by 355
Abstract
The relevance of the interaction between Humanities-enhanced learning using immersive environments and simultaneous physiological signal analysis contributes to the development of Neurohumanities and advancements in applications of Digital Humanities. The present dataset consists of recordings from 24 participants divided into two groups (12 participants in each group) engaging in simulated learning scenarios, traditional learning, and partially immersive learning experiences. Data recordings from each participant contain recordings of physiological signals and psychometric data collected from applied questionnaires. Physiological signals include electroencephalography, real-time engagement and emotion recognition calculation by a Python EEG acquisition code, head acceleration, electrodermal activity, blood volume pressure, inter-beat interval, and temperature. Before the acquisition of physiological signals, participants were asked to fill out the General Health Questionnaire and Trait Meta-Mood Scale. In between recording sessions, participants were asked to fill out Likert-scale questionnaires regarding their experience and a Self-Assessment Manikin. At the end of the recording session, participants filled out the ITC Sense of Presence Inventory questionnaire for user experience. The dataset can be used to explore differences in physiological patterns observed between different learning modalities in the Humanities. Full article
41 pages, 2238 KiB  
Article
Unveiling University Groupings: A Clustering Analysis for Academic Rankings
by George Matlis, Nikos Dimokas and Petros Karvelis
Data 2024, 9(5), 67; https://doi.org/10.3390/data9050067 - 11 May 2024
Viewed by 372
Abstract
The evaluation and ranking of educational institutions are of paramount importance to a wide range of stakeholders, including students, faculty members, funding organizations, and the institutions themselves. Traditional ranking systems, such as those provided by QS, ARWU, and THE, have offered valuable insights into university performance by employing a variety of indicators to reflect institutional excellence across research, teaching, international outlook, and more. However, these linear rankings may not fully capture the multifaceted nature of university performance. This study introduces a novel clustering analysis that complements existing rankings by grouping universities with similar characteristics, providing a multidimensional perspective on global higher education landscapes. Utilizing a range of clustering algorithms—K-Means, GMM, Agglomerative, and Fuzzy C-Means—and incorporating both traditional and unique indicators, our approach seeks to highlight the commonalities and shared strengths within clusters of universities. This analysis does not aim to supplant existing ranking systems but to augment them by offering stakeholders an alternative lens through which to view and assess university performance. By focusing on group similarities rather than ordinal positions, our method encourages a more nuanced understanding of institutional excellence and facilitates peer learning among universities with similar profiles. While acknowledging the limitations inherent in any methodological approach, including the selection of indicators and clustering algorithms, this study underscores the value of complementary analyses in enriching our understanding of higher educational institutions’ performance. Full article
9 pages, 191 KiB  
Data Descriptor
A Series Production Data Set for Five-Axis CNC Milling
by Anna-Maria Schmitt and Bastian Engelmann
Data 2024, 9(5), 66; https://doi.org/10.3390/data9050066 - 30 Apr 2024
Viewed by 534
Abstract
The described data set contains features from the machine control of a five-axis milling machine. The features were recorded during thirteen series productions. Each series production includes a changeover process in which the machine was set up for the production of a different product. In addition to the timestamps and the twenty recorded features derived from Numerical Control (NC) variables, the data set also contains labels for the different production phases. For this purpose, up to 23 phases were assigned, which are based on a generalized milling process. The data set consists of thirteen .csv files, each representing a series production. The data set was recorded in a production company in the contract manufacturing sector for components with real series orders in ongoing industrial production. Full article
16 pages, 2904 KiB  
Article
Spectral Library of Plant Species from Montesinho Natural Park in Portugal
by Isabel Pôças, Cátia Rodrigues de Almeida, Salvador Arenas-Castro, João C. Campos, Nuno Garcia, João Alírio, Neftalí Sillero and Ana C. Teodoro
Data 2024, 9(5), 65; https://doi.org/10.3390/data9050065 - 30 Apr 2024
Viewed by 1311
Abstract
In this work, we present and describe a spectral library (SL) with 15 vascular plant species from Montesinho Natural Park (MNP), a protected area in Northeast Portugal. We selected species from the vascular plants that are characteristic of the habitats in the MNP, based on their prevalence, and also included one invasive species: Alnus glutinosa (L.) Gaertn, Castanea sativa Mill., Cistus ladanifer L., Crataegus monogyna Jacq., Frangula alnus Mill., Fraxinus angustifolia Vahl, Quercus pyrenaica Willd., Quercus rotundifolia Lam., Trifolium repens L., Arbutus unedo L., Dactylis glomerata L., Genista falcata Brot., Cytisus multiflorus (L’Hér.) Sweet, Erica arborea L., and Acacia dealbata Link. We collected spectra (300–2500 nm) from five records per leaf and leaf side, which resulted in 538 spectra compiled in the SL. Additionally, we computed five vegetation indices from spectral data and analysed them to highlight specific characteristics and differences among the sampled species. We detail the data repository information and its organisation for a better understanding of the data and to facilitate its use. The SL structure can add valuable information about the selected plant species in MNP, contributing to conservation purposes. This plant species SL is publicly available on the Zenodo platform. Full article
17 pages, 7237 KiB  
Data Descriptor
A Comprehensive Dataset of the Aerodynamic and Geometric Coefficients of Airfoils in the Public Domain
by Kanak Agarwal, Vedant Vijaykrishnan, Dyutit Mohanty and Manikandan Murugaiah
Data 2024, 9(5), 64; https://doi.org/10.3390/data9050064 - 30 Apr 2024
Viewed by 644
Abstract
This study presents an extensive collection of data on the aerodynamic behavior at a low Reynolds number and geometric coefficients for 2900 airfoils obtained through the class shape transformation (CST) method. By employing a verified OpenFOAM-based CFD simulation framework, lift and drag coefficients were determined at a Reynolds number of 10⁵. Considering the limited availability of data on low Reynolds number airfoils, this dataset is invaluable for a wide range of applications, including unmanned aerial vehicles (UAVs) and wind turbines. Additionally, the study offers a method for automating CFD simulations that could be applied to obtain aerodynamic coefficients at higher Reynolds numbers. The breadth of this dataset also supports the enhancement and creation of machine learning (ML) models, further advancing research into the aerodynamics of airfoils and lifting surfaces. Full article
15 pages, 6850 KiB  
Article
Detailed Landslide Traces Database of Hancheng County, China, Based on High-Resolution Satellite Images Available on the Google Earth Platform
by Junlei Zhao, Chong Xu and Xinwu Huang
Data 2024, 9(5), 63; https://doi.org/10.3390/data9050063 - 29 Apr 2024
Viewed by 515
Abstract
Hancheng is located in the eastern part of China’s Shaanxi Province, near the west bank of the Yellow River. It is located at the junction of the active geological structure area. The rock layer is relatively fragmented, and landslide disasters are frequent. The occurrence of landslide disasters often causes a large number of casualties along with economic losses in the local area, seriously restricting local economic development. Although risk assessment and deformation mechanism analysis for single landslides have been performed for landslide disasters in the Hancheng area, this area lacks a landslide traces database. A complete landslide database comprises the basic data required for the study of landslide disasters and is an important requirement for subsequent landslide-related research. Therefore, this study used multi-temporal high-resolution optical images and human-computer interaction visual interpretation methods of the Google Earth platform to construct a landslide traces database in Hancheng County. The results showed that at least 6785 landslides had occurred in the study area. The total area of the landslides was about 95.38 km², accounting for 5.88% of the study area. The average landslide area was 1406.04 m², the largest landslide area was 377,841 m², and the smallest landslide area was 202.96 m². The results of this study provide an important basis for understanding the spatial distribution of landslides in Hancheng County, the evaluation of landslide susceptibility, and local disaster prevention and mitigation work. Full article
16 pages, 5947 KiB  
Data Descriptor
Stimulated Microcontroller Dataset for New IoT Device Identification Schemes through On-Chip Sensor Monitoring
by Alberto Ramos, Honorio Martín, Carmen Cámara and Pedro Peris-Lopez
Data 2024, 9(5), 62; https://doi.org/10.3390/data9050062 - 28 Apr 2024
Viewed by 507
Abstract
Legitimate identification of devices is crucial to ensure the security of present and future IoT ecosystems. In this regard, AI-based systems that exploit intrinsic hardware variations have gained notable relevance. Within this context, on-chip sensors included for monitoring purposes in a wide range of SoCs remain almost unexplored, despite their potential as a valuable source of both information and variability. In this work, we introduce and release a dataset comprising data collected from the on-chip temperature and voltage sensors of 20 microcontroller-based boards from the STM32L family. These boards were stimulated with five different algorithms, as workloads to elicit diverse responses. The dataset consists of five acquisitions (1.3 billion readouts) that are spaced over time and were obtained under different configurations using an automated platform. The raw dataset is publicly available, along with metadata and scripts developed to generate pre-processed T–V sequence sets. Finally, a proof of concept consisting of training a simple model is presented to demonstrate the feasibility of the identification system based on these data. Full article
10 pages, 290 KiB  
Data Descriptor
Training Datasets for Epilepsy Analysis: Preprocessing and Feature Extraction from Electroencephalography Time Series
by Christian Riccio, Angelo Martone, Gaetano Zazzaro and Luigi Pavone
Data 2024, 9(5), 61; https://doi.org/10.3390/data9050061 - 26 Apr 2024
Viewed by 760
Abstract
We describe 20 datasets derived through signal filtering and feature extraction steps applied to the raw time series EEG data of 20 epileptic patients, as well as the methods we used to derive them. Background: Epilepsy is a complex neurological disorder which has seizures as its hallmark. Electroencephalography plays a crucial role in epilepsy assessment, offering insights into the brain’s electrical activity and advancing our understanding of seizures. The availability of tagged training sets covering all seizure phases—inter-ictal, pre-ictal, ictal, and post-ictal—is crucial for data-driven epilepsy analyses. Methods: Using the sliding window technique with a two-second window length and a one-second time slip, we extract multiple features from the preprocessed EEG time series of 20 patients from the Freiburg Seizure Prediction Database. In addition, we assign a class label to each instance to specify its corresponding seizure phase. All these operations are made through a software application we developed, which is named Training Builder. Results: The 20 tagged training datasets each contain 1080 univariate and bivariate features, and are openly and publicly available. Conclusions: The datasets support the training of data-driven models for seizure detection, prediction, and clustering, based on features engineering. Full article
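The sliding window technique named in the Methods (a two-second window advanced by a one-second slip) can be sketched as follows. This is an illustrative sketch only and does not reproduce the Training Builder application: the function names `sliding_windows` and `mean_abs`, the 4 Hz sampling rate, the toy signal, and the single stand-in feature are assumptions, and the actual datasets contain 1080 features per instance.

```python
from typing import List, Sequence, Tuple


def sliding_windows(signal: Sequence[float], fs: int,
                    window_s: float = 2.0, step_s: float = 1.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices (end exclusive) of every full window."""
    win = int(round(window_s * fs))    # window length in samples
    step = int(round(step_s * fs))     # slip between consecutive windows in samples
    if win <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")
    # Windows that would run past the end of the signal are dropped.
    return [(s, s + win) for s in range(0, len(signal) - win + 1, step)]


def mean_abs(segment: Sequence[float]) -> float:
    """Toy feature: mean absolute amplitude (a stand-in, not one of the paper's features)."""
    return sum(abs(x) for x in segment) / len(segment)


fs = 4  # hypothetical sampling rate in Hz, kept small for readability
signal = [0.0, 1.0, -1.0, 2.0, -2.0, 3.0, -3.0, 4.0, -4.0, 5.0, -5.0, 6.0]  # 3 s of samples
bounds = sliding_windows(signal, fs)
features = [mean_abs(signal[a:b]) for a, b in bounds]
print(bounds, features)
```

With a one-second slip and a two-second window, consecutive windows overlap by half, so three seconds of signal yield two instances. In a labelled training set, each instance would additionally carry the seizure phase of its window.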