Data | March 2023 - Browse Articles

14 pages, 8743 KiB

Open Access Article

Instance and Data Generation for the Offline Nanosatellite Task Scheduling Problem

by Cezar Antônio Rigo, Edemar Morsch Filho, Laio Oriel Seman, Luís Loures and Valderi Reis Quietinho Leithardt

Data 2023, 8(3), 62; https://doi.org/10.3390/data8030062 - 21 Mar 2023

Cited by 3 | Viewed by 2499

Abstract

This paper discusses several instances of the Offline Nanosatellite Task Scheduling (ONTS) optimization problem, which seeks to schedule the start and finish times of payloads on a nanosatellite. Modeled after the FloripaSat-I nanosatellite mission, the instances were built expressly to test the performance of various solutions to the ONTS problem. Realistic input data for power harvesting calculations were used to generate the instances, and an instance creation procedure was employed to increase the instances’ difficulty. The instances are made publicly accessible to facilitate a fair comparison of various solutions and to aid in establishing a baseline for the ONTS problem. Additionally, the study discusses the various orbit types and their effects on energy harvesting and mission performance. Full article

(This article belongs to the Topic Techniques and Science Exploitations for Earth Observation and Planetary Exploration)

14 pages, 955 KiB

Open Access Data Descriptor

TKGQA Dataset: Using Question Answering to Guide and Validate the Evolution of Temporal Knowledge Graph

by Ryan Ong, Jiahao Sun, Ovidiu Șerban and Yi-Ke Guo

Data 2023, 8(3), 61; https://doi.org/10.3390/data8030061 - 14 Mar 2023

Cited by 3 | Viewed by 3637

Abstract

Temporal knowledge graphs can be used to represent the current state of the world and, as daily events happen, updating the temporal knowledge graph so that it stays consistent with the state of the world becomes very important. However, there is currently no reliable method to accurately validate the update and evolution of knowledge graphs. In a recent development in text summarisation, question answering is used to both guide and fact-check summarisation quality. The same process can be applied to the temporal knowledge graph update process. To the best of our knowledge, there is currently no dataset that connects temporal knowledge graphs with documents and question–answer pairs. In this paper, we propose the TKGQA dataset, consisting of over 5000 financial news documents related to M&A. Each document has extracted facts, question–answer pairs, and before and after temporal knowledge graphs, to highlight the state of temporal knowledge and any changes caused by the facts extracted from the document. As we parse through each document, we use question answering to check and guide the update process of the temporal knowledge graph. Full article

(This article belongs to the Section Information Systems and Data Management)

18 pages, 3814 KiB

Open Access Article

Development of a Machine-Learning-Based Novel Framework for Travel Time Distribution Determination Using Probe Vehicle Data

by Gurmesh Sihag, Praveen Kumar and Manoranjan Parida

Data 2023, 8(3), 60; https://doi.org/10.3390/data8030060 - 14 Mar 2023

Cited by 1 | Viewed by 2495

Abstract

Investigating travel time variability is critical for pre-trip planning, reliable route selection, traffic management, and the development of control strategies to mitigate traffic congestion problems cost-effectively. Hence, a large number of studies in the literature determine the most suitable distribution to fit travel time data. However, these studies recommend different distributions, and there is disagreement on the best distribution for fitting travel time data. The present study proposes a novel framework that uses modern machine learning techniques to determine the best distribution to represent the travel time data obtained from probe vehicles. This study employs vast travel time data collected by fitting GPS tracking units on the probe vehicles and offers a comprehensive investigation of travel time distribution in different scenarios arising from spatiotemporal variation of the travel time. The study also considers the effect of weather and uses the three most commonly used non-parametric goodness-of-fit tests (namely, Kolmogorov–Smirnov test, Anderson–Darling test, and chi-squared test) to fit and rank a comprehensive set of around 60 unimodal statistical distributions. The framework proposed in the study can determine the travel time distribution with 91% accuracy. Additionally, the distribution determined by the framework has an acceptance rate of 98.4%, which is better than the acceptance rates of the distributions recommended in existing studies. Because of its robustness and applicability in many different traffic situations, the proposed framework can also be used in developing countries with heterogeneous disordered traffic conditions to evaluate the road network’s performance in terms of travel time reliability. Full article

(This article belongs to the Special Issue Data-Driven Approach on Urban Planning and Smart Cities)

12 pages, 3082 KiB

Open Access Data Descriptor

WaRM: A Roof Material Spectral Library for Wallonia, Belgium

by Coraline Wyard, Rodolphe Marion and Eric Hallot

Data 2023, 8(3), 59; https://doi.org/10.3390/data8030059 - 7 Mar 2023

Cited by 2 | Viewed by 3387

Abstract

The exploitation of urban-material spectral properties is of increasing importance for a broad range of applications, such as urban climate-change modeling and mitigation or specific/dangerous roof-material detection and inventory. A new spectral library dedicated to the detection of roof material was created to reflect the regional diversity of materials employed in Wallonia, Belgium. The Walloon Roof Material (WaRM) spectral library accounts for 26 roof material spectra in the spectral range 350–2500 nm. Spectra were acquired using an ASD FieldSpec3 Hi-Res spectrometer in laboratory conditions, using a spectral sampling interval of 1 nm. The analysis of the spectra shows that spectral signatures are strongly influenced by the color of the roof materials, at least in the VIS spectral range. The SWIR spectral range is in general more relevant to distinguishing the different types of material. Exceptions are the similar properties and very close spectra of several black materials, meaning that their spectral signatures are not sufficiently different to distinguish them from each other. Although building materials can vary regionally due to different available construction materials, the WaRM spectral library can certainly be used for wider applications; Wallonia has always been strongly connected to the surrounding regions and has always encountered climatic conditions similar to all of Northwest Europe. Full article

13 pages, 1476 KiB

Open Access Data Descriptor

Home Comfort Dataset: Acquired from SGH

by Mariana Santos, Mário Antunes, Diogo Gomes and Rui L. Aguiar

Data 2023, 8(3), 58; https://doi.org/10.3390/data8030058 - 3 Mar 2023

Cited by 1 | Viewed by 2375

Abstract

In this work, we share the dataset collected during the Smart Green Homes (SGH) project. The project’s goal was to develop integrated products and technology solutions for households, as well as to improve the standards of comfort and user satisfaction. This was to be achieved while improving household energy efficiency and reducing the usage of gaseous pollutants, in response to the planet’s sustainability issues. One of the tasks executed within the project was the collection of data from volunteers’ homes, including environmental information and the level of comfort as perceived by the volunteers themselves. Although it was used in the original project, the resulting dataset contains valuable information that could not be explored at the time. We now share this dataset with the community; it can be used in various scenarios, including heating appliance optimisation, presence detection, and environmental prediction. Full article

(This article belongs to the Section Information Systems and Data Management)

7 pages, 2523 KiB

Open Access Data Descriptor

Dataset for Spectroscopic, Structural and Dynamic Analysis of Human Fe(II)/2OG-Dependent Dioxygenase ALKBH3

by Lyubov Yu. Kanazhevskaya, Alexey A. Gorbunov, Polina V. Zhdanova and Vladimir V. Koval

Data 2023, 8(3), 57; https://doi.org/10.3390/data8030057 - 3 Mar 2023

Cited by 1 | Viewed by 2009

Abstract

Fe(II)/2OG-dependent dioxygenases of the AlkB family catalyze the direct removal of alkylation damage in the course of DNA and RNA repair. ALKBH3, a human homolog of the E. coli AlkB protein, is able to hydroxylate N1-methyladenine, N3-methylcytosine, and N1-methylguanine in single-stranded DNA and RNA. Due to its contribution to antitumor drug resistance, this enzyme is considered a promising therapeutic target. Elucidating ALKBH3’s structural peculiarities is important for establishing a detailed mechanism of damaged DNA recognition and processing, as well as for developing specific inhibitors. This work presents new data on the wild-type ALKBH3 protein and its four mutant forms (Y143F, Y143A, L177A, and H191A) obtained by circular dichroism (CD) spectroscopy. The dataset includes the CD spectra of the proteins measured at different temperatures and a 3D visualization of the ALKBH3–DNA complex in which the mutated amino acid residues are marked. These results show how substitution of the key amino acids influences the secondary structure content of the protein. Full article

(This article belongs to the Section Computational Biology, Bioinformatics, and Biomedical Data Science)

33 pages, 1260 KiB

Open Access Article

Learned Sorted Table Search and Static Indexes in Small-Space Data Models

by Domenico Amato, Raffaele Giancarlo and Giosué Lo Bosco

Data 2023, 8(3), 56; https://doi.org/10.3390/data8030056 - 3 Mar 2023

Cited by 3 | Viewed by 2857

Abstract

Machine-learning techniques, properly combined with data structures, have resulted in Learned Static Indexes, innovative and powerful tools that speed up Binary Searches at the cost of additional space with respect to the table being searched. Such space is devoted to the machine-learning models. Although in their infancy, these are methodologically and practically important, due to the pervasiveness of Sorted Table Search procedures. In modern applications, model space is a key factor, and a major open question in this area is to assess to what extent one can enjoy the speed-up of Binary Searches achieved by Learned Indexes while using constant or nearly constant-space models. In this paper, we investigate this question by (a) introducing two new models, i.e., the Learned k-ary Search Model and the Synoptic Recursive Model Index; and (b) systematically exploring the time–space trade-offs of a hierarchy of existing models, i.e., the ones in the reference software platform Searching on Sorted Data, together with the new ones proposed here. We document a novel and rather complex time–space trade-off picture, which is informative for users as well as designers of Learned Indexing data structures. By adhering to and extending the current benchmarking methodology, we experimentally show that the Learned k-ary Search Model is competitive in time with respect to Binary Search in constant additional space. Our second model, together with the bi-criteria Piece-wise Geometric Model Index, can achieve a speed-up of Binary Search with a model space of 0.05% more than the one taken by the table, thereby being competitive in terms of the time–space trade-off with existing proposals. The Synoptic Recursive Model Index and the bi-criteria Piece-wise Geometric Model complement each other quite well across the various levels of the internal memory hierarchy. Finally, our findings stimulate research in this area since they highlight the need for further studies regarding the time–space relation in Learned Indexes. Full article
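
The general idea behind a learned index over a sorted table can be illustrated with a minimal Python sketch. It is a generic illustration only, not the Learned k-ary Search Model or the Synoptic Recursive Model Index from the paper: a single least-squares line predicts a key's position, and a bounded binary search corrects the prediction. The function names (`fit_linear_model`, `max_error`, `learned_search`) are hypothetical, and the table is assumed to be a non-empty sorted list of numbers.

```python
from bisect import bisect_left


def fit_linear_model(table):
    """Fit a least-squares line mapping a key to its approximate index."""
    n = len(table)
    mean_x = sum(table) / n
    mean_y = (n - 1) / 2
    var_x = sum((x - mean_x) ** 2 for x in table)
    if var_x == 0:
        # All keys are equal: predict the middle position for every key.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(table))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def max_error(table, slope, intercept):
    """Largest distance between a predicted index and the true index."""
    return max(abs(round(slope * x + intercept) - i) for i, x in enumerate(table))


def learned_search(table, key, slope, intercept, err):
    """Return an index holding `key`, or -1 if the key is absent."""
    guess = round(slope * key + intercept)
    # Any key stored in the table lies within `err` positions of the guess,
    # so binary search only needs to inspect this window.
    lo = min(max(0, guess - err), len(table))
    hi = max(lo, min(len(table), guess + err + 1))
    pos = bisect_left(table, key, lo, hi)
    return pos if pos < len(table) and table[pos] == key else -1


table = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
slope, intercept = fit_linear_model(table)
err = max_error(table, slope, intercept)
found = learned_search(table, 13, slope, intercept, err)
```

The model here occupies constant space (two coefficients and one error bound) regardless of table size. How much time this saves over a plain binary search depends on how small the error bound is for the data at hand.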

(This article belongs to the Section Information Systems and Data Management)

11 pages, 3083 KiB

Open Access Data Descriptor

Dataset AqADAPT: Physicochemical Parameters, Vibrio Abundance, and Species Determination in Water Columns of Two Adriatic Sea Aquaculture Sites

by Marija Purgar, Damir Kapetanović, Ana Gavrilović, Branimir K. Hackenberger, Božidar Kurtović, Ines Haberle, Jadranka Pečar Ilić, Sunčana Geček, Domagoj K. Hackenberger, Tamara Djerdj, Lav Bavčević, Jakov Žunić, Fran Barac, Zvjezdana Šoštarić Vulić and Tin Klanjšček

Data 2023, 8(3), 55; https://doi.org/10.3390/data8030055 - 3 Mar 2023

Cited by 1 | Viewed by 2623

Abstract

Aquaculture provides more than 50% of all seafood for human consumption. This important industrial sector is already under pressure from climate-change-induced shifts in water column temperature, nutrient loads, precipitation patterns, microbial community composition, and ocean acidification, all affecting fish welfare. Disease-related risks are also shifting, with important implications for risk from vibriosis, a disease that can lead to massive economic losses. Adaptation to these pressures poses numerous challenges for aquaculture producers, policy makers, and researchers. The dataset AqADAPT aims to help the development of management and adaptation tools by providing (i) measurements of physicochemical (temperature, salinity, total dissolved solids, pH, dissolved oxygen, conductivity, transparency, total nitrogen, ammonia, nitrate, nitrite, total phosphorus, total particulate matter, particulate organic matter, and particulate inorganic matter) and microbiological (heterotrophic (total) bacteria, fecal indicators, and Vibrio abundance) parameters of seawater and (ii) biochemical determination of culturable bacteria in two locations near floating cage fish farms in the Adriatic Sea. Water sampling was conducted seasonally between 2019 and 2021 in two fish farms (Cres and Vrgada) and corresponding reference (control) sites, at four vertical layers (the surface, 6 m, 12 m, and the bottom), for a total of 108 observations. Full article

17 pages, 3762 KiB

Open Access Data Descriptor

Manual of GUI Program Governing ABAQUS Simulations of Bar Impact Test for Calibrating Bar Properties, Measured Strain, and Impact Velocity

by Hyunho Shin

Data 2023, 8(3), 54; https://doi.org/10.3390/data8030054 - 1 Mar 2023

Cited by 2 | Viewed by 2776

Abstract

Bar impact instruments, such as the (split) Hopkinson bars and direct impact Hopkinson bars, measure blast/impact waves or mechanical properties of materials at high strain rates. To effectively use such instruments, it is essential to know (i) the elastic properties of the bar, (ii) the correction factor of the measured strain, and (iii) information on impact velocity. This paper presents a graphic-user-interface (GUI) program prepared for solving these fundamental issues. We describe the directory structure of the program, roles and relations of associated files, GUI panels, algorithm, and execution procedure of the program. This program employs a separately measured bar density value and governs the ABAQUS simulations (explicit finite element analyses) of the bar impact test at a given impact velocity for a range of bar properties (elastic modulus and Poisson’s ratio) and two correction factors (in compression and tension) of the measured strain. The simulation is repeated until the predicted elastic wave profile in the bar is reasonably consistent with the experimental counterpart. The bar properties and correction factors are determined as the calibrated values when the two wave profiles are reasonably consistent. The program is also capable of impact velocity calibration with reference to a reliably measured bar strain wave. The quantities of a 19.1 mm diameter bar (maraging steel) were successfully calibrated using the presented GUI program. The GUI program, auxiliary programs, pre-processing files, and an example ABAQUS input file are available in a publicly accessible data repository. Full article

20 pages, 2084 KiB

Open Access Article

Toward a Spatially Segregated Urban Growth? Austerity, Poverty, and the Demographic Decline of Metropolitan Greece

by Kostas Rontos, Enrico Maria Mosconi, Mattia Gianvincenzi, Simona Moretti and Luca Salvati

Data 2023, 8(3), 53; https://doi.org/10.3390/data8030053 - 1 Mar 2023

Cited by 4 | Viewed by 2832

Abstract

Metropolitan decline in southern Europe has been documented in only a few cases, being less intensively investigated than in other regions of the continent. Likely for the first time in recent history, the aftermath of the 2007 recession was a period associated with economic and demographic decline in Mediterranean Europe. However, the impacts and consequences of the great crisis were only occasionally verified and quantified, both in strictly urban contexts and in the surrounding rural areas. By exploiting official statistics, our study delineates sequential stages of demographic growth and decline in a large metropolitan region (Athens, Greece) as a response to economic expansion and stagnation. Having important implications for the extent and spatial direction of metropolitan cycles, the Athens case, taken as an example of urban cycles in Mediterranean Europe, indicates a possibly new dimension of urban shrinkage, with spatially varying population growth and decline along a geographical gradient of income and wealth. Heterogeneous dynamics led to a leapfrog urban expansion decoupled from agglomeration and scale, the factors most likely shaping long-term metropolitan expansion in advanced economies. Demographic decline in urban contexts was associated with multidimensional socioeconomic processes resulting in spatially complex demographic outcomes that require appropriate, and possibly more specific, regulation policies. By shedding further light on recession-driven metropolitan decline in advanced economies, the present study contributes to re-thinking short-term development mechanisms and medium-term demographic scenarios in Mediterranean Europe. Full article

(This article belongs to the Special Issue Data-Driven Approach on Urban Planning and Smart Cities)

9 pages, 5176 KiB

Open Access Data Descriptor

Dataset on SCADA Data of an Urban Small Wind Turbine Operation in São Paulo, Brazil

by Welson Bassi, Alcantaro Lemes Rodrigues and Ildo Luis Sauer

Data 2023, 8(3), 52; https://doi.org/10.3390/data8030052 - 28 Feb 2023

Cited by 4 | Viewed by 7182

Abstract

Small wind turbines (SWTs) represent an opportunity to promote energy generation technologies from low-carbon renewable sources in cities. Tall buildings are inherently suitable for placing SWTs in urban environments. Thus, the Institute of Energy and Environment of the University of São Paulo (IEE-USP) has installed an SWT on its existing tall High Voltage Laboratory building on its campus in São Paulo, Brazil. The dataset file contains data regarding the actual electrical and mechanical operational quantities and control parameters obtained and recorded by the internal inverter of a Skystream 3.7 SWT, with 1.8 kW rated power, from 2017 to 2022. The main electrical parameters are the generated energy, voltages, currents, and power frequency at the grid connection point. Rotation, reference wind speed, and temperatures measured at several points in the inverter and in the nacelle are also recorded. Several other parameters concerning the SWT inverter operation, including alarms and status codes, are also presented. This dataset can be helpful for reanalysis and for deriving information such as the capacity factor, and can also be used as overall input data of actual SWT operation quantities. Full article
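
As an illustration of the capacity factor mentioned in the abstract, the following Python sketch uses the standard definition: energy actually generated divided by the energy the turbine would deliver at continuous rated output. The function name `capacity_factor` and the example figures are hypothetical and are not taken from the dataset; only the 1.8 kW rated power comes from the abstract.

```python
RATED_POWER_KW = 1.8  # rated power of the Skystream 3.7 stated in the abstract


def capacity_factor(energy_kwh, hours, rated_power_kw=RATED_POWER_KW):
    """Ratio of generated energy to the energy at continuous rated output."""
    if hours <= 0 or rated_power_kw <= 0:
        raise ValueError("hours and rated_power_kw must be positive")
    return energy_kwh / (rated_power_kw * hours)


# Hypothetical figures, not taken from the dataset: 1576.8 kWh over 8760 hours.
example_cf = capacity_factor(1576.8, 8760)
```

With these illustrative inputs the capacity factor is about 10%. Real values would have to be computed from the generated-energy records in the dataset.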

1 page, 182 KiB

Open Access Correction

Correction: Michel et al. SEN2VENµS, a Dataset for the Training of Sentinel-2 Super-Resolution Algorithms. Data 2022, 7, 96

by Julien Michel, Juan Vinasco-Salinas, Jordi Inglada and Olivier Hagolle

Data 2023, 8(3), 51; https://doi.org/10.3390/data8030051 - 28 Feb 2023

Viewed by 1551

Abstract

There was an error in the original publication [...] Full article

5 pages, 436 KiB

Open Access Data Descriptor

Dataset of Partial Analytical Validation of the 1,2-O-Dilauryl-Rac-Glycero-3-Glutaric Acid-(6′-Methylresorufin) Ester (DGGR) Lipase Assay in Equine Plasma

by Laureen Michèle Peters and Judith Howard

Data 2023, 8(3), 50; https://doi.org/10.3390/data8030050 - 28 Feb 2023

Cited by 1 | Viewed by 1715

Abstract

Laboratory assays require analytical validation to prove they are providing accurate results. This dataset describes the partial analytical validation of lipase activity, measured with the 1,2-o-dilauryl-rac-glycero-3-glutaric acid-(6′-methylresorufin) ester (DGGR) lipase assay in equine plasma. Samples with low (approx. 12 U/L), moderately increased (approx. 79 U/L), and markedly increased lipase activity (approx. 298 U/L) were chosen. Linearity was assessed in samples of ascending dilution prepared by mixing samples with low and high lipase activity in different proportions. Repeatability or intra-assay replication was evaluated by measuring each level in 25 replicates within the same run. Reproducibility or inter-assay replication was calculated by measuring each level in five replicates on five consecutive days. The assay was linear in the range of 12–298 U/L (R² = 0.9998) with a <2.3% deviation from the calculated value at any point. Within-run coefficients of variation were 4.43%, 0.69%, and 1.00% for the low, medium, and high samples, respectively. Between-run coefficients of variation were 3.57%, 1.42%, and 1.16%, respectively. To our knowledge, these are the first published data on the analytical validation of the DGGR lipase assay in horses, which may be of interest to veterinary clinical pathologists and equine clinicians measuring DGGR lipase in equine blood for diagnostic and research purposes. Full article
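
The within-run and between-run coefficients of variation reported above follow the usual definition, standard deviation divided by the mean, expressed as a percentage. A minimal Python sketch follows. The replicate values are hypothetical, the function name `coefficient_of_variation` is illustrative, and the use of the sample (n − 1) standard deviation is an assumption, since the abstract does not state which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation over the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical replicate lipase activities in U/L, not taken from the dataset.
replicates = [12.0, 12.5, 11.5, 12.0, 12.0]
cv_percent = coefficient_of_variation(replicates)
```

For these made-up replicates the CV is just under 3%.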

14 pages, 2048 KiB

Open Access Article

Data Balancing Techniques for Predicting Student Dropout Using Machine Learning

by Neema Mduma

Data 2023, 8(3), 49; https://doi.org/10.3390/data8030049 - 27 Feb 2023

Cited by 30 | Viewed by 7853

Abstract

Predicting student dropout is a challenging problem in the education sector. This is due to an imbalance in student dropout data, mainly because the number of registered students is always higher than the number of dropout students. Developing a model without taking the data imbalance issue into account may lead to a model that generalizes poorly. In this study, different data balancing techniques were applied to improve prediction accuracy in the minority class while maintaining a satisfactory overall classification performance. Random Over Sampling, Random Under Sampling, Synthetic Minority Over Sampling, SMOTE with Edited Nearest Neighbor and SMOTE with Tomek links were tested, along with three popular classification models: Logistic Regression, Random Forest, and Multi-Layer Perceptron. Publicly accessible datasets from Tanzania and India were used to evaluate the effectiveness of balancing techniques and prediction models. The results indicate that SMOTE with Edited Nearest Neighbor achieved the best classification performance on the 10-fold holdout sample. Furthermore, Logistic Regression correctly classified the largest number of dropout students (57,348 for the Uwezo dataset and 13,430 for the India dataset) using the confusion matrix as the evaluation metric. The applications of these models allow for the precise prediction of at-risk students and the reduction of dropout rates. Full article
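
Of the balancing techniques listed, Random Over Sampling is the simplest to illustrate. The Python sketch below is a generic version written for this listing, not the author's code: it duplicates randomly chosen rows of each smaller class until every class matches the size of the largest one. The function name `random_oversample` and the toy data are hypothetical. The SMOTE variants, which synthesize new minority samples rather than duplicating existing ones, are not shown.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the majority size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: 8 "enrolled" rows (label 0) and 2 "dropout" rows (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.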

(This article belongs to the Special Issue Data Mining and Computational Intelligence for E-learning and Education)

12 pages, 3179 KiB

Open Access Data Descriptor

Reconstructed River Water Temperature Dataset for Western Canada 1980–2018

by Rajesh R. Shrestha and Jennifer C. Pesklevits

Data 2023, 8(3), 48; https://doi.org/10.3390/data8030048 - 26 Feb 2023

Cited by 6 | Viewed by 3178

Abstract

Continuous water temperature data are important for understanding historical variability and trends of river thermal regimes, as well as impacts of a warming climate on aquatic ecosystem health. We describe a reconstructed daily water temperature dataset that supplements sparse historical observations for 55 river stations across western Canada. We employed the air2stream model to reconstruct the water temperature dataset over the period 1980–2018, with air temperature and discharge data used as model inputs. The model was calibrated and validated by comparing with observed water temperature records, and the results indicate a reasonable statistical performance. We also present historical trends over the ice-free summer months from June to September using the reconstructed dataset, which indicate significantly increasing water temperature trends for most stations. Besides trend analysis, the dataset could be used for various applications, such as calculation of heat fluxes, calibration/validation of process-based water temperature models, establishment of baseline conditions for future climate projections, and assessment of impacts on ecosystem health and water quality. Full article

(This article belongs to the Collection Modern Geophysical and Climate Data Analysis: Tools and Methods)

15 pages, 581 KiB

Open Access Data Descriptor

A Dataset of Service Time and Related Patient Characteristics from an Outpatient Clinic

by Haolin Feng, Yiwu Jia, Siyi Zhou, Hongyi Chen and Teng Huang

Data 2023, 8(3), 47; https://doi.org/10.3390/data8030047 - 25 Feb 2023

Cited by 1 | Viewed by 6912

Abstract

Outpatient clinics’ productivity largely depends on their appointment scheduling systems. It is crucial for appointment scheduling to understand the intrinsic heterogeneity in patient and service types and act accordingly. This article describes an outpatient clinic dataset of consultation service time with heterogeneous characteristics. The dataset contains 6637 consultation records collected from 381 half-day sessions between 2018 and 2019. Each record includes encrypted session and patient IDs, consultation start and (approximated) end times, the month and day of the week, whether it was on a holiday, the patient’s visit count for a specific medical condition, gender, whether the consultation was cancer-related, and the distance from the patient’s mailing address to the clinic. These features can be used to classify patients into heterogeneous groups in studies of appointment scheduling. Therefore, this dataset with rich, heterogeneous patient characteristics provides a valuable opportunity for healthcare operations management researchers to develop, test, and benchmark the performance of their models and methods. It can also be used for studying appointment scheduling in other service industries. More generally, it provides pedagogical value in areas related to management science and operations research, applied statistics, and machine learning. Full article

17 pages, 1617 KiB

Open AccessArticle

Analysis of Government Policy Sentiment Regarding Vacation during the COVID-19 Pandemic Using the Bidirectional Encoder Representation from Transformers (BERT)

by Intan Nurma Yulita, Victor Wijaya, Rudi Rosadi, Indra Sarathan, Yusa Djuyandi and Anton Satria Prabuwono

Data 2023, 8(3), 46; https://doi.org/10.3390/data8030046 - 23 Feb 2023

Cited by 9 | Viewed by 4410

Abstract

To address the COVID-19 situation in Indonesia, the Indonesian government has adopted a number of policies. One of them is a vacation-related policy. Government measures with regard to this vacation policy have produced a wide range of viewpoints in society, which have been extensively shared on social media, including YouTube. However, no computerized system has been developed to date that can assess people’s social media reactions. Therefore, this paper provides a sentiment analysis application for this government policy by employing a bidirectional encoder representation from transformers (BERT) approach. The study method comprised data collection, data labeling, data preprocessing, BERT model training, and model evaluation. This study created a new dataset for this topic. The data were collected from the comments section of YouTube and were categorized into three categories: positive, neutral, and negative. This research yielded an F-score of 84.33%. Another contribution of this study is the methodology for processing sentiment analysis in Indonesian. In addition, the model was deployed as an application using the Python programming language and the Flask framework. Using this research, the government can learn the extent to which the public accepts the policies that have been implemented. Full article

(This article belongs to the Special Issue Sentiment Analysis in Social Media Data)

27 pages, 1261 KiB

Open AccessArticle

Multi-Level Analysis of Learning Management Systems’ User Acceptance Exemplified in Two System Case Studies

by Parisa Shayan, Roberto Rondinelli, Menno van Zaanen and Martin Atzmueller

Data 2023, 8(3), 45; https://doi.org/10.3390/data8030045 - 22 Feb 2023

Cited by 7 | Viewed by 4542

Abstract

There has recently been an increasing interest in Learning Management Systems (LMSs). It is currently unclear, however, exactly how these systems are perceived by their users. This article analyzes data on user acceptance for two LMSs (Blackboard and Canvas). The respective data are collected using a questionnaire modeled after the Technology Acceptance Model (TAM); it relates several variables that influence system acceptability, allowing for a detailed analysis of the system acceptance. We present analyses at two levels of the questionnaire data: questions and constructs (taken from TAM) as well as on different analysis levels using targeted methods. First, we investigate the differences between the above LMSs using statistical tests (t-test). Second, we provide results at the question level using descriptive indices, such as the mean and the Gini heterogeneity index, and apply methods for ordinal data using the Cumulative Link Mixed Model (CLMM). Next, we apply the same approach at the TAM construct level plus descriptive network analysis (degree centrality and bipartite motifs) to explore the variability of users’ answers and the degree of users’ satisfaction considering the extracted patterns. In the context of TAM, the statistical model is able to analyze LMS acceptance on the question level. As we are also very much interested in identifying LMS acceptance at the construct level, in this article, we provide both statistical analysis as well as network analysis to explore the connection between questionnaire data and relational data. A network analysis approach is particularly useful when analyzing LMS acceptance on the construct level, as this can take the structure of the users’ answers across questions per construct into account. Taken together, these results suggest a higher rate of user acceptance among Canvas users compared to Blackboard both for the question and construct level. Likewise, the descriptive network modeling for Canvas indicates a slightly higher concordance between Canvas users than Blackboard at the construct level. Full article

(This article belongs to the Special Issue Data Mining and Computational Intelligence for E-learning and Education)

Data, Volume 8, Issue 3 (March 2023) – 18 articles
