Next Issue
Volume 6, September
Previous Issue
Volume 6, July
 
 

Data, Volume 6, Issue 8 (August 2021) – 16 articles

Cover Story: Manipulated digital photos and videos are increasingly used in a myriad of cybercrimes. The diversity and richness of existing datasets are crucial for benchmarking ML models and for evaluating whether they are suitable for detecting manipulations in digital photos and videos. This paper presents a dataset obtained by extracting a set of features through a Discrete Fourier Transform (DFT) from genuine and manipulated photos and videos and assigning a label to each entry. The proposed dataset is balanced, contains a total of 40,588 photos and 12,400 video frames, and is available on GitHub. The dataset was validated with methods based on Convolutional Neural Networks and Support Vector Machines.
16 pages, 5223 KiB  
Data Descriptor
Do the European Data Portal Datasets in the Categories Government and Public Sector, Transport and Education, Culture and Sport Meet the Data on the Web Best Practices?
by Morgana Carneiro Andrade, Rafaela Oliveira da Cunha, Jorge Figueiredo and Ana Alice Baptista
Data 2021, 6(8), 94; https://doi.org/10.3390/data6080094 - 19 Aug 2021
Cited by 1 | Viewed by 2544
Abstract
The European Data Portal is one of several initiatives worldwide that aggregate and make open data available. This case study, which takes a qualitative approach, aims to determine to what extent the datasets in the Government and Public Sector; Transport; and Education, Culture and Sport categories published on the portal meet the W3C Data on the Web Best Practices. With the datasets sorted by last modified date and filtered by the ratings Excellent and Good+, we analyzed 50 different datasets from each category. The analysis revealed that the Government and Transport categories have the best-rated datasets and Education the least. The most observed BPs were BP1, BP2, BP4, BP5, BP10, BP11, BP12, BP13C, BP16, BP17, BP19, BP29, and BP34, while the least observed were BP3, BP7H, BP7C, BP13H, BP14, BP15, BP21, BP32, and BP35. These results fill a gap in the literature on the quality of the data made available by this portal and provide insights for European data managers into which best practices are most observed and which need more attention.
(This article belongs to the Topic Machine and Deep Learning)

32 pages, 6529 KiB  
Article
A Sustainable Method for Publishing Interoperable Open Data on the Web
by Raf Buyle, Brecht Van de Vyvere, Julián Rojas Meléndez, Dwight Van Lancker, Eveline Vlassenroot, Mathias Van Compernolle, Stefan Lefever, Pieter Colpaert, Peter Mechant and Erik Mannens
Data 2021, 6(8), 93; https://doi.org/10.3390/data6080093 - 19 Aug 2021
Cited by 4 | Viewed by 3078
Abstract
Smart cities need (sensor) data for better decision-making. However, while vast amounts of data are available about and from cities, an intermediary is needed to connect and interpret (sensor) data at Web scale. Today, governments in Europe are struggling to publish open data in a sustainable, predictable and cost-effective way. Our research question asks which methods are suitable for publishing Linked Open Data time series, in particular air quality data, in a sustainable and cost-effective way. Furthermore, we demonstrate the cross-domain applicability of our data publishing approach through a second use case: railway infrastructure Linked Open Data. Based on scenarios co-created with various governmental stakeholders, we researched methods to promote data interoperability, scalability and flexibility. The results show that applying a Linked Data Fragments-based approach to public endpoints for air quality and railway infrastructure data lowers the cost of publishing and increases availability thanks to better Web caching strategies.
(This article belongs to the Special Issue A European Approach to the Establishment of Data Spaces)
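As a rough illustration of why fragmenting a time series can help Web caching, the sketch below maps each observation timestamp to a fixed hourly fragment with a stable URL. Fragments whose time window has closed no longer change, so they could be served with long-lived cache headers, while only the current fragment needs a short lifetime. The base URL, the one-hour window and the function names are hypothetical and are not taken from the paper.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical endpoint and fragment size, for illustration only.
BASE_URL = "https://example.org/air-quality/fragments"
WINDOW = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def fragment_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its fragment window."""
    n = (ts - EPOCH) // WINDOW  # timedelta // timedelta -> int
    return EPOCH + n * WINDOW

def fragment_url(ts: datetime) -> str:
    """Stable URL of the fragment containing ts (same URL for every reading in it)."""
    start = fragment_start(ts)
    return f"{BASE_URL}?start={start.strftime('%Y-%m-%dT%H:%M:%SZ')}"

def cache_control(ts: datetime, now: datetime) -> str:
    """Closed fragments are immutable; the open one changes as new data arrives."""
    end = fragment_start(ts) + WINDOW
    if end <= now:
        return "public, max-age=31536000, immutable"
    return "public, max-age=60"
```

For example, a reading taken at 10:42 UTC falls into the fragment starting at 10:00, and once the clock passes 11:00 that fragment can be cached indefinitely by browsers and intermediary caches.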

21 pages, 2629 KiB  
Data Descriptor
Country-Specific Interests towards Fall Detection from 2004–2021: An Open Access Dataset and Research Questions
by Nirmalya Thakur and Chia Y. Han
Data 2021, 6(8), 92; https://doi.org/10.3390/data6080092 - 15 Aug 2021
Cited by 30 | Viewed by 6795
Abstract
Falls, which are increasing at an unprecedented rate in the global elderly population, are associated with a multitude of needs (healthcare, medical, caregiver, and economic) and pose various forms of burden on countries across the world, particularly low- and middle-income countries. For these countries to anticipate, respond to, address, and remedy these diverse needs, whether by using their existing resources, developing new policies and initiatives, or seeking support from other countries or international organizations dedicated to global public health, the timely identification of these needs and their associated trends is essential. This paper addresses this challenge with a study that leverages the modern Internet of Everything lifestyle: relevant Google Search data originating from different geographic regions can be interpreted to understand the underlying region-specific user interest in a specific topic, which in turn reflects the public health need related to that topic. The scientific contributions of this study are twofold. First, it presents an open-access dataset of user interest in fall detection for all 193 countries of the world from 2004 to 2021, with user interest data available for each month for every country over this period. Second, based on an analysis of potential and emerging research directions in the interrelated fields of Big Data, Data Mining, Information Retrieval, Natural Language Processing, Data Science, and Pattern Recognition in the context of fall detection research, this paper presents 22 research questions that researchers may study, evaluate, and investigate using this dataset.
(This article belongs to the Section Information Systems and Data Management)

20 pages, 641 KiB  
Article
NagareDB: A Resource-Efficient Document-Oriented Time-Series Database
by Carlos Garcia Calatrava, Yolanda Becerra Fontal, Fernando M. Cucchietti and Carla Diví Cuesta
Data 2021, 6(8), 91; https://doi.org/10.3390/data6080091 - 13 Aug 2021
Cited by 4 | Viewed by 3355
Abstract
Recent technological advances have led to a broad proliferation of Monitoring Infrastructures, which typically keep track of specific assets over time, ranging from factory machinery and device locations to people. Gathering this data has become crucial for a wide range of applications, such as exploration dashboards or Machine Learning techniques like Anomaly Detection. Time-Series Databases, designed to handle this kind of data, grew in popularity, becoming the fastest-growing database type from 2019 onward. Consequently, keeping track of and mastering these rapidly evolving technologies has become increasingly difficult. This paper introduces the holistic design approach followed to build NagareDB, a Time-Series database built on top of MongoDB, the most popular NoSQL database, which is typically discouraged in the Time-Series scenario. The goal of NagareDB is to ease access to three of the essential resources needed to build time-dependent systems: Hardware, since it is able to work on commodity machines; Software, as it is built on top of an open-source solution; and Expert Personnel, as its foundation database is considered the most popular NoSQL DB, lowering the learning curve. Concretely, NagareDB outperforms MongoDB's recommended implementation by up to 4.7 times when retrieving data, while also offering stream ingestion up to 35% faster than InfluxDB, the most popular Time-Series database. Moreover, by relaxing some requirements, NagareDB can reduce disk space usage by up to 40%.
(This article belongs to the Section Information Systems and Data Management)

16 pages, 6757 KiB  
Data Descriptor
Canadian Dental Patients with a Single-Unit Implant-Supported Restoration in the Aesthetic Region of the Mouth: Qualitative and Quantitative Patient-Reported Outcome Measures (PROMs)
by Kelvin I. Afrashtehfar, Kensuke Igarashi and S. Ross Bryant
Data 2021, 6(8), 90; https://doi.org/10.3390/data6080090 - 11 Aug 2021
Cited by 5 | Viewed by 2760
Abstract
This article contains quantitative and qualitative patient-reported outcome measures (PROMs) collected from nine dental patients with a single implant in the maxillary anterior region of the mouth, who were recruited after providing consent. The quantitative data were obtained from participants' demographics, frontal extraoral digital photographs, intraoral scans (IOS) of the maxillary arch, and self-administered questionnaires (in which patients rated the overall outcome, appearance, function, and comfort of their single-implant-supported crowns). Objective single-implant aesthetic index mean scores (Pink Esthetic Score/White Esthetic Score [PES/WES]) were obtained after two experienced, calibrated clinicians analyzed the photographs and the three-dimensional models generated from the IOS. The self-administered questionnaires used a visual analogue scale (VAS) to capture the patients' subjective perceptions. The qualitative data were obtained from in-depth, semi-structured one-to-one interviews. The transcriptions of the audio-recorded interviews were managed and coded with the aid of Computer-Assisted Qualitative Data Analysis Software (CAQDAS). The data are stored in a public Mendeley Data repository, from which they can be easily downloaded (DOI: 10.17632/sv8t6tkvjv.1).

13 pages, 7853 KiB  
Data Descriptor
Geodatabase of Publicly Available Information about Czech Municipalities’ Local Administration
by Vít Pászto, Jiří Pánek and Jaroslav Burian
Data 2021, 6(8), 89; https://doi.org/10.3390/data6080089 - 10 Aug 2021
Cited by 2 | Viewed by 2518
Abstract
In this data description, we introduce a unique (geo)dataset of publicly available information about municipalities, focused on (geo)participatory aspects of local administration. The dataset comprises 6258 Czech municipalities linked with their respective administrative boundaries, with 55 attributes prepared for each municipality. We also describe the process of data collection, processing, verification, and publication as open data. The dataset is unique in that no dataset this comprehensive in geographical coverage, at such a high level of detail (individual municipalities), has been collected in Czechia before. In addition, it can be applied in various research agendas on public participation and local administration and explored thematically through selected indicators from various participation domains. The dataset is freely available as an Esri geodatabase, through geospatial API services (REST, GeoJSON), and in other common non-spatial formats (MS Excel and CSV).
(This article belongs to the Section Spatial Data Science and Digital Earth)

15 pages, 4791 KiB  
Data Descriptor
VHR-REA_IT Dataset: Very High Resolution Dynamical Downscaling of ERA5 Reanalysis over Italy by COSMO-CLM
by Mario Raffa, Alfredo Reder, Gian Franco Marras, Marco Mancini, Gabriella Scipione, Monia Santini and Paola Mercogliano
Data 2021, 6(8), 88; https://doi.org/10.3390/data6080088 - 9 Aug 2021
Cited by 18 | Viewed by 7314
Abstract
This work presents a new dataset of recent climate, developed within the Highlander project by dynamically downscaling the ERA5 reanalysis, originally available at ≃31 km horizontal resolution, to ≃2.2 km resolution (i.e., convection-permitting scale). Dynamical downscaling was conducted with the COSMO Regional Climate Model (RCM). The temporal resolution of the output is hourly (as for ERA5). Runs cover the whole Italian territory (and neighboring areas, as required by the computational boundary) to provide a very detailed (in terms of space–time resolution) and comprehensive (in terms of meteorological fields) dataset of climatological data for at least the last 30 years (01/1989–12/2020). Datasets of this type can be used for (applied) research and downstream services (e.g., decision support systems).
(This article belongs to the Section Spatial Data Science and Digital Earth)

15 pages, 1409 KiB  
Data Descriptor
A Dataset of Photos and Videos for Digital Forensics Analysis Using Machine Learning Processing
by Sara Ferreira, Mário Antunes and Manuel E. Correia
Data 2021, 6(8), 87; https://doi.org/10.3390/data6080087 - 5 Aug 2021
Cited by 11 | Viewed by 6341
Abstract
Deepfakes and manipulated digital photos and videos are increasingly used in a myriad of cybercrimes. Ransomware, the dissemination of fake news, and digital kidnapping-related crimes are the most recurrent, with tampered multimedia content as the primary vehicle of dissemination. Digital forensic analysis tools are widely used in criminal investigations to automate the identification of digital evidence in seized electronic equipment. The number of files to be processed and the complexity of the crimes under analysis have highlighted the need for efficient digital forensics techniques grounded in state-of-the-art technologies. Machine Learning (ML) researchers have been challenged to apply techniques and methods that improve the automatic detection of manipulated multimedia content. However, such methods have not yet been widely incorporated into digital forensic tools, mostly due to the lack of realistic and well-structured datasets of photos and videos. The diversity and richness of datasets are crucial for benchmarking ML models and evaluating their suitability for real-world digital forensics applications, such as the development of third-party modules for the widely used Autopsy digital forensic application. This paper presents a dataset obtained by extracting a set of simple features from genuine and manipulated photos and videos drawn from existing state-of-the-art datasets. The resulting dataset is balanced, and each entry comprises a label and a vector of numeric values corresponding to the features extracted through a Discrete Fourier Transform (DFT). The dataset is available in a GitHub repository and contains 40,588 photos and 12,400 video frames.
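The abstract does not define the DFT features precisely. One common choice for exposing manipulation artifacts is to take the 2D DFT of a grayscale image and reduce its log-magnitude spectrum to a 1D profile by averaging over rings of equal radius (an azimuthal average). The sketch below assumes that choice; the function names are hypothetical, and it uses a naive DFT from the standard library only to stay dependency-free, where a real pipeline would use an FFT.

```python
import cmath
import math

def dft2(img):
    """Naive 2D DFT of a list-of-lists grayscale image (O((H*W)^2), demo only)."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    s += img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
            out[u][v] = s
    return out

def azimuthal_profile(img, eps=1e-8):
    """1D feature vector: mean log-magnitude of the spectrum per integer radius."""
    h, w = len(img), len(img[0])
    spec = dft2(img)
    cy, cx = h // 2, w // 2
    sums, counts = {}, {}
    for u in range(h):
        for v in range(w):
            # Shift indices so the zero frequency sits at the centre (like fftshift).
            su, sv = (u + cy) % h, (v + cx) % w
            r = int(round(math.hypot(su - cy, sv - cx)))
            sums[r] = sums.get(r, 0.0) + math.log(abs(spec[u][v]) + eps)
            counts[r] = counts.get(r, 0) + 1
    return [sums[r] / counts[r] for r in sorted(sums)]
```

Each image then yields a fixed-length numeric vector (for a fixed image size), which can be paired with its genuine/manipulated label as one dataset entry.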
The dataset was validated and benchmarked with deep learning Convolutional Neural Network (CNN) and Support Vector Machine (SVM) methods, although many other existing methods can be applied. In general, the results show a better F1-score for CNN than for SVM, for both photo and video processing. CNN achieved F1-scores of 0.9968 and 0.8415 for photos and videos, respectively. SVM, evaluated with 5-fold cross-validation, achieved 0.9953 and 0.7955 for photos and videos, respectively. A set of Python methods is available to researchers for preprocessing and extracting features from the original photo and video files and for building the training and testing sets. Additional methods convert the original PKL files into CSV and TXT, giving ML researchers more flexibility to use the dataset with existing ML frameworks and tools.
(This article belongs to the Section Information Systems and Data Management)
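The repository's own converters are not reproduced here. As a hedged sketch, assuming each PKL file holds a list of `(label, feature_vector)` pairs (a hypothetical layout), a conversion to CSV with the label in the first column could look like this; the function names are illustrative.

```python
import csv
import io
import pickle

def entries_to_csv_text(entries):
    """Render (label, feature_vector) pairs as CSV text: label, f0, f1, ..."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    n_features = len(entries[0][1]) if entries else 0
    writer.writerow(["label"] + [f"f{i}" for i in range(n_features)])
    for label, features in entries:
        writer.writerow([label, *features])
    return buf.getvalue()

def convert_pkl_to_csv(pkl_path, csv_path):
    """Load a pickled list of (label, features) pairs and save it as CSV."""
    # pickle can execute arbitrary code while loading: only open trusted files.
    with open(pkl_path, "rb") as f:
        entries = pickle.load(f)
    with open(csv_path, "w", newline="") as out:
        out.write(entries_to_csv_text(entries))
```

A TXT variant could reuse the same logic with `csv.writer(buf, delimiter="\t")`.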

11 pages, 700 KiB  
Review
Contemporary Business Analytics: An Overview
by Wullianallur Raghupathi and Viju Raghupathi
Data 2021, 6(8), 86; https://doi.org/10.3390/data6080086 - 4 Aug 2021
Cited by 12 | Viewed by 15078
Abstract
We examine the state of the art in business analytics by identifying and describing the four types of analytics and the three pillars of modeling. Further, we offer a framework of the interplay between the types of analytics and those pillars of modeling. The article describes the architectural framework and outlines an analytics methodology life cycle. Additionally, key contemporary design issues and challenges are highlighted. In this paper, we offer researchers and practitioners a contemporary overview of business analytics. As business analytics has emerged as a distinct discipline whose key objective is to gain insight for informed decision-making, this state-of-the-art survey sheds light on recent developments in the discipline.
(This article belongs to the Special Issue Challenges in Business Intelligence)

19 pages, 2921 KiB  
Article
VISEMURE: A Visual Analytics System for Making Sense of Multimorbidity Using Electronic Medical Record Data
by Maede S. Nouri, Daniel J. Lizotte, Kamran Sedig and Sheikh S. Abdullah
Data 2021, 6(8), 85; https://doi.org/10.3390/data6080085 - 4 Aug 2021
Cited by 2 | Viewed by 2694
Abstract
Multimorbidity is a growing healthcare problem, especially for aging populations. Traditional single-disease-centric approaches are not suitable for multimorbidity, and a holistic framework is required for health research and for enhancing patient care. Patterns of multimorbidity within populations are complex and difficult to communicate with static visualization techniques such as tables and charts. We designed a visual analytics system called VISEMURE that helps users make sense of data collected from patients with multimorbidity. With VISEMURE, users can interactively create different subsets of electronic medical record data to investigate multimorbidity within different subsets of patients with pre-existing chronic diseases. It also allows users to create groups of patients based on age, gender, and socioeconomic status for investigation. VISEMURE can use a range of statistical and machine learning techniques and can integrate them seamlessly to compute prevalence and correlation estimates for selected diseases. It presents results using interactive visualizations to help healthcare researchers make sense of multimorbidity. Using a case study, we demonstrate how VISEMURE can be used to explore the high-dimensional joint distribution of random variables that describe the multimorbidity present in a patient population.
(This article belongs to the Section Information Systems and Data Management)
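VISEMURE's own estimators are not detailed in the abstract. As a minimal illustration of what "prevalence and correlation estimates for selected diseases" can mean, the sketch below computes the prevalence of a condition and the phi coefficient (the Pearson correlation of two binary indicators) over a small, made-up patient table; the data and function names are hypothetical.

```python
import math

def prevalence(patients, disease):
    """Fraction of patients who have the given disease."""
    return sum(disease in p for p in patients) / len(patients)

def phi_coefficient(patients, a, b):
    """Pearson correlation between the binary indicators of diseases a and b."""
    n11 = sum(a in p and b in p for p in patients)
    n10 = sum(a in p and b not in p for p in patients)
    n01 = sum(a not in p and b in p for p in patients)
    n00 = len(patients) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A disease present in all or no patients has zero variance: return 0.0.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical toy data: each patient is the set of their chronic conditions.
patients = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "copd"},
    {"hypertension"},
    {"copd"},
    set(),
]
```

Here `phi_coefficient(patients, "diabetes", "hypertension")` is 2/3, indicating the two conditions co-occur more often than independence would suggest in this toy cohort.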

19 pages, 305 KiB  
Data Descriptor
The Automatic Detection of Dataset Names in Scientific Articles
by Jenny Heddes, Pim Meerdink, Miguel Pieters and Maarten Marx
Data 2021, 6(8), 84; https://doi.org/10.3390/data6080084 - 4 Aug 2021
Cited by 8 | Viewed by 3952
Abstract
We study the task of recognizing named datasets in scientific articles as a Named Entity Recognition (NER) problem. Finding that the available annotated datasets were not adequate for our goals, we annotated 6000 sentences extracted from four major AI conferences, roughly half of which contain one or more named datasets. A distinguishing feature of this set is the large number of sentences using enumerations, conjunctions and ellipses, which result in long BI+ tag sequences. On all measures, the SciBERT NER tagger performed best and most robustly. Our baseline rule-based tagger performed remarkably well, better than several state-of-the-art methods. The gold standard dataset, with links and offsets from each sentence to the (open access) articles, together with the annotation guidelines and all code used in the experiments, is available on GitHub.
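To make the BIO tagging scheme concrete, the sketch below converts token-level dataset-name spans into BIO tags and back: a multi-token name becomes one B tag followed by I tags, so longer mentions produce longer BI+ runs. The example sentence, its spans and the single `DATASET` label are illustrative assumptions, not items from the annotated corpus.

```python
def spans_to_bio(n_tokens, spans, label="DATASET"):
    """Turn (start, end) token spans (end exclusive) into BIO tags."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

def bio_to_spans(tags):
    """Recover (start, end) spans from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O" or tag.startswith("B-"):
            if start is not None:  # close the currently open span
                spans.append((start, i))
                start = None
            if tag.startswith("B-"):
                start = i
        elif start is None:  # stray I- tag without a B-: treat it as a start
            start = i
    if start is not None:
        spans.append((start, len(tags)))
    return spans

tokens = "results on the Penn Treebank and WikiText-2 corpora".split()
tags = spans_to_bio(len(tokens), [(3, 5), (6, 7)])
```

The round trip `bio_to_spans(tags)` returns the original spans, which is a handy sanity check when evaluating a tagger at the span level.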

11 pages, 1911 KiB  
Article
A Global Book Reading Dataset
by Nazanin Sabri and Ingmar Weber
Data 2021, 6(8), 83; https://doi.org/10.3390/data6080083 - 4 Aug 2021
Cited by 3 | Viewed by 6737
Abstract
The choice of what to read is both influenced by and indicative of such factors as a person’s beliefs, culture, gender, and socioeconomic status. However, obtaining data that include such personal attributes, as well as detailed reading habits and activities of individuals, is difficult and would usually require either (i) data from e-readers, such as the Amazon Kindle, or from library checkouts, both of which are hard to obtain, or (ii) distributing questionnaires and conducting interviews, which can be expensive and suffer from recall bias. In this study, we present a dataset of over 40 million reading instances by 1,872,677 unique individuals collected from Goodreads, a book-cataloging social media platform with millions of users, where users share comments on the books they have read while creating and maintaining social connections. We enrich the dataset with gender and location information. The dataset can be used to perform cross-national and cross-gender analyses of reading behavior among book enthusiasts.
(This article belongs to the Section Information Systems and Data Management)

12 pages, 541 KiB  
Article
Factors Influencing Business Analytics Solutions and Views on Business Problems
by Martin Potančok, Jan Pour and Wui Ip
Data 2021, 6(8), 82; https://doi.org/10.3390/data6080082 - 4 Aug 2021
Cited by 2 | Viewed by 6160
Abstract
The main aim of this paper is to identify and specify factors that influence business analytics. A factor in this context refers to any significant characteristic that defines the environment in which business analytics, and business in general, is conducted. Given their complexity and interconnectedness, understanding these factors is essential for the quality of final business analytics solutions. Factors play an extremely important role in analytic thinking and in business analysts' skills and knowledge. They determine effective approaches and procedures for business analytics and, in some situations, also support the decision to delay a business analytics solution. This paper uses the case study method, a qualitative research method, because the investigation had to be carried out within the actual business (company) environment to fully understand and verify the factors affecting analytics from the viewpoint of all stakeholders. This study provides a set of 15 factors from business, company, and market environments, including their importance in business analytics.
(This article belongs to the Section Information Systems and Data Management)

16 pages, 7703 KiB  
Article
Dataset of Gravity-Induced Landforms and Sinkholes of the Northeast Coast of Malta (Central Mediterranean Sea)
by Stefano Devoto, Linley J. Hastewell, Mariacristina Prampolini and Stefano Furlani
Data 2021, 6(8), 81; https://doi.org/10.3390/data6080081 - 31 Jul 2021
Cited by 18 | Viewed by 3301
Abstract
This study investigates gravity-induced landforms along the northeastern coast of Malta. Attention is focused on tens of persistent joints and thousands of boulders associated with deep-seated gravitational slope deformations (DGSDs), such as lateral spreads and block slides. Lateral spreads produce deep and long joints, which partially isolate limestone boulders along the edges of wide plateaus. These lateral spreads evolve into large block slides that detach thousands of limestone boulders from the cliffs and transport them towards the sea. The boulders are grouped in large slope-failure deposits that surround limestone plateaus and cover downslope terrain. Gravity-induced joints (n = 124) and downslope boulders (n = 39,861) were identified and categorized using Google Earth (GE) images and later validated by field surveys. The datasets were digitized in QGIS and stored as ESRI shapefiles, a common digital format for storing vector GIS data. These types of landslides are characterized by slow-moving mechanisms, which evolve into destructive failures and present an elevated level of risk to coastal populations and infrastructure. Hundreds of blocks identified along the shore also provide evidence of sinkholes; for this reason, the paper also provides a catalogue of sinkholes. The outputs of this research can provide coastal managers with important information on the occurrence of coastal geohazards and represent a key resource for future landslide hazard assessment.
(This article belongs to the Section Spatial Data Science and Digital Earth)

16 pages, 4110 KiB  
Article
Machine-Learning-Based Prediction of Corrosion Behavior in Additively Manufactured Inconel 718
by O. V. Mythreyi, M. Rohith Srinivaas, Tigga Amit Kumar and R. Jayaganthan
Data 2021, 6(8), 80; https://doi.org/10.3390/data6080080 - 26 Jul 2021
Cited by 20 | Viewed by 4326
Abstract
This research work focuses on machine-learning-assisted prediction of the corrosion behavior of laser-powder-bed-fused (LPBF) and postprocessed Inconel 718. Corrosion testing data of these specimens were collected and fitted with the following machine learning algorithms: polynomial regression, support vector regression, decision tree, and extreme gradient boosting. Model performance, after hyperparameter optimization, was evaluated using a set of established metrics: R², mean absolute error, and root mean square error. Among the algorithms, extreme gradient boosting performed best in predicting the corrosion behavior, closely followed by the other algorithms. Feature importance analysis was carried out to determine which postprocessing parameters most influenced the corrosion behavior of Inconel 718 manufactured by LPBF.
(This article belongs to the Section Chemoinformatics)
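The three evaluation metrics named in this abstract have standard definitions. The sketch below implements them directly so that a comparison between regression models can be reproduced on any set of predictions; the sample values are made up for illustration and are not corrosion measurements from the study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up example values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
```

On these values MAE is 0.25, RMSE is about 0.354, and R² is 0.9. RMSE penalizes large errors more heavily than MAE, which is why reporting both is informative.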

3 pages, 170 KiB  
Data Descriptor
Temporal Changes in Delaware Waters Using Long-Term (1967–2019) Water Temperature Data
by Bhanu Paudel and Lori M. Brown
Data 2021, 6(8), 79; https://doi.org/10.3390/data6080079 - 24 Jul 2021
Viewed by 1804
Abstract
The present article provides long-term (1967–2019) water temperature data collected from Delaware water quality monitoring sites. Delaware has approximately 140 water quality monitoring sites in the Piedmont, Delaware Bay, Chesapeake Bay, and Inland Bay drainage basins. Long-term quarterly water temperature data (i.e., four values per year: Q1, January to March; Q2, April to June; Q3, July to September; Q4, October to December) were calculated from each monitoring site's continuous monthly data. This study focuses on monitoring sites with statistically significant increasing or decreasing water temperature trends, identified by the p-value of a linear regression model. Quarterly water temperature data, statistical analysis, and maps showing increasing and decreasing trends at the sites with significant trends are presented in this article.
(This article belongs to the Section Spatial Data Science and Digital Earth)
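The quarterly aggregation and trend analysis described in this abstract can be sketched as follows: monthly temperatures are averaged into Q1 (January to March) through Q4 (October to December), and an ordinary least-squares slope is fitted to one quarter's yearly series. Computing the p-value requires the t-distribution, which the Python standard library lacks, so this sketch stops at the slope and its t-statistic; a library routine such as `scipy.stats.linregress` reports the p-value directly. The input layout (a dict keyed by `(year, month)`) is an assumption.

```python
import math
from collections import defaultdict

def quarterly_means(monthly):
    """Average {(year, month): temp} into {(year, quarter): mean temp}."""
    groups = defaultdict(list)
    for (year, month), temp in monthly.items():
        groups[(year, (month - 1) // 3 + 1)].append(temp)
    return {key: sum(v) / len(v) for key, v in groups.items()}

def ols_trend(xs, ys):
    """Least-squares slope, intercept and t-statistic of the slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance, n - 2 dof
    se = math.sqrt(s2 / sxx)  # standard error of the slope
    t = slope / se if se > 0 else float("nan")  # undefined for a perfect fit
    return slope, intercept, t
```

The t-statistic would then be compared against a t-distribution with n - 2 degrees of freedom to decide whether a site's warming or cooling trend is significant.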