Data, Volume 6, Issue 12 (December 2021) – 12 articles

Cover Story: This paper presents an algorithm for learning local Weibull models, whose operating regions are represented by fuzzy rules. The applicability of the proposed method is demonstrated in estimating the mortality rate of the COVID-19 pandemic. The reproducible results show that there is a significant difference between mortality rates of countries due to their economic situation, urbanization, and the state of the health sector. The proposed method is compared with the semiparametric Cox proportional hazard regression method. The distribution functions of these two methods are close to each other, so the proposed method can estimate efficiently.
20 pages, 10350 KiB  
Article
A Prototypical Network-Based Approach for Low-Resource Font Typeface Feature Extraction and Utilization
by Kangying Li, Biligsaikhan Batjargal and Akira Maeda
Data 2021, 6(12), 134; https://doi.org/10.3390/data6120134 - 16 Dec 2021
Cited by 1 | Viewed by 2622
Abstract
This paper introduces a framework for retrieving low-resource font typeface databases by handwritten input. A new deep learning model structure based on metric learning is proposed to extract the features of a character typeface and predict the category of handwritten input queries. Rather than using large amounts of training data, we aim to utilize ancient character font typefaces with only one sample per category. Our research aims to achieve decent retrieval performance over more than 600 categories of handwritten characters automatically. We consider utilizing generic handcrafted features to train a model to help the voting classifier make the final prediction. The proposed method is implemented on the ‘Shirakawa font oracle bone script’ dataset as an isolated ancient-character-recognition system based on free ordering and connective strokes. We evaluate the proposed model on several standard character and symbol datasets. The experimental results showed that the proposed method provides good performance in extracting the features of symbols or characters’ font images necessary to perform further retrieval tasks. The demo system has been released, and it requires only one sample for each character to predict the user input. The extracted features are more effective in finding the highest-ranked relevant item in retrieval tasks; they can also be utilized in various technical frameworks for ancient character recognition and applied to educational application development.
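The one-sample-per-category setting described in this abstract can be illustrated with a nearest-prototype lookup, the basic idea behind prototypical networks. This is only a minimal sketch and not the authors' model: the two-dimensional embeddings, the category labels, and the function name `nearest_prototype` are hypothetical, and a real system would obtain the embeddings from a trained network.

```python
import math

def nearest_prototype(query, prototypes):
    """Return the label whose prototype embedding is closest to `query`
    (Euclidean distance). `prototypes` maps label -> embedding tuple."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical embeddings: exactly one reference sample per category.
prototypes = {
    "sun": (0.9, 0.1),
    "moon": (0.1, 0.8),
    "tree": (0.5, 0.5),
}

# Hypothetical embedding of a handwritten query.
predicted = nearest_prototype((0.2, 0.7), prototypes)
print(predicted)
```

Because each category contributes a single prototype, adding a new character only requires embedding one sample, with no retraining of the lookup step.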

12 pages, 2596 KiB  
Data Descriptor
Indoor Environment Dataset to Estimate Room Occupancy
by Andreé Vela, Joanna Alvarado-Uribe and Hector G. Ceballos
Data 2021, 6(12), 133; https://doi.org/10.3390/data6120133 - 13 Dec 2021
Cited by 6 | Viewed by 4344
Abstract
The estimation of occupancy is a crucial contribution to achieving improvements in energy efficiency. The lack of data, or incomplete data, related to occupancy in enclosed spaces makes it challenging to develop new models focused on estimating occupancy with high accuracy. Furthermore, considerable variation in the monitored spaces also makes it difficult to compare the results of different approaches. This dataset comprises the indoor environmental information (pressure, altitude, humidity, and temperature) and the corresponding occupancy level for two different rooms: (1) a fitness gym and (2) a living room. The fitness gym data were collected for six days between 18 September and 2 October 2019, obtaining 10,125 objects with a 1 s resolution according to the following occupancy levels: low (2442 objects), medium (5325 objects), and high (2358 objects). The living room data were collected for 11 days between 14 May and 4 June 2020, obtaining 295,823 objects with a 1 s resolution, according to the following occupancy levels: empty (50,978 objects), low (202,613 objects), medium (35,410 objects), and high (6822 objects). Additionally, the number of fans turned on is provided for the living room data. The data are publicly available in the Mendeley Data repository. This dataset can be used to train and compare different machine learning, deep learning, and physical models for estimating occupancy in enclosed spaces.
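The per-level counts quoted in this abstract can be checked against the stated totals, and the resulting class shares show how imbalanced the occupancy levels are. The sketch below uses only the numbers given above; the variable and function names are illustrative and do not come from the dataset itself.

```python
# Object counts per occupancy level, as quoted in the abstract.
gym = {"low": 2442, "medium": 5325, "high": 2358}
living_room = {"empty": 50978, "low": 202613, "medium": 35410, "high": 6822}

def class_shares(counts):
    """Return (total, {level: share}) for a dict of per-level counts."""
    total = sum(counts.values())
    return total, {level: n / total for level, n in counts.items()}

gym_total, gym_shares = class_shares(gym)
living_total, living_shares = class_shares(living_room)

for name, total, shares in (("gym", gym_total, gym_shares),
                            ("living room", living_total, living_shares)):
    print(name, total, {level: round(s, 3) for level, s in shares.items()})
```

The imbalance in the living room data may be worth accounting for (for example, with stratified splits) when training classifiers on it.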

7 pages, 561 KiB  
Data Descriptor
Lipid Profiles of Human Brain Tumors Obtained by High-Resolution Negative Mode Ambient Mass Spectrometry
by Denis S. Zavorotnyuk, Stanislav I. Pekov, Anatoly A. Sorokin, Denis S. Bormotov, Nikita Levin, Evgeny Zhvansky, Savva Semenov, Polina Strelnikova, Konstantin V. Bocharov, Alexander Vorobiev, Alexey Kononikhin, Vsevolod Shurkhay, Eugene N. Nikolaev and Igor A. Popov
Data 2021, 6(12), 132; https://doi.org/10.3390/data6120132 - 12 Dec 2021
Cited by 5 | Viewed by 3105
Abstract
Alterations in cell metabolism, including changes in lipid composition occurring during malignancy, are well characterized for various tumor types. However, a significant share of the studies that deal with brain tumors have been performed using cell cultures and animal models. Here, we present a dataset of 124 high-resolution negative ionization mode lipid profiles of human brain tumors resected during neurosurgery. The dataset is supplemented with 38 non-tumor pathological brain tissue samples resected during elective surgery. The altered lipid composition of brain tumors makes it possible to discriminate between malignant and healthy tissues using ambient mass spectrometry. In addition, the collection of clinical samples allows the comparison of the metabolism alteration patterns in animal models or in vitro models with natural tumor samples ex vivo. The presented dataset is intended to be a data sample for bioinformaticians to test various data analysis techniques with ambient mass spectrometry profiles, or to be a source of clinically relevant data for lipidomic research in oncology.

10 pages, 387 KiB  
Data Descriptor
Panel Dataset to Assess Proactive Eco-Innovation in the Paradigm of Firm Financial Progression
by Md Abu Toha and Satirenjit Kaur Johl
Data 2021, 6(12), 131; https://doi.org/10.3390/data6120131 - 10 Dec 2021
Viewed by 2043
Abstract
Recently, eco-innovation has received a lot of attention in the academic and corporate world due to its potential to accelerate firm financial progression. To measure eco-innovation, mostly primary data and a reactive approach were employed. By emphasising the proactive approach and utilising a secondary panel dataset, this study fills the existing research gap. Data presented in this paper comprise 31 energy firms from Bursa Malaysia for the years between 2015 and 2019. Panel data associated with eco-innovation proactiveness and firm financial progression were collected from three sources (company websites, annual reports, and sustainability reports) using content analysis. For data collection, an index was adapted comprising five dimensions of eco-innovation, namely product, process, technology, organizational, and marketing. In addition, Tobin’s Q was considered as a proxy dimension for firm financial progression because it considers both market value and book value. Following a unit root test, six specific data diagnostic tests were performed to ensure data reliability and validity for future potential usage. The results reveal that the panel dataset was organised and is eligible for further statistical model analysis.
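The abstract names Tobin’s Q as the proxy for firm financial progression but does not state the exact formula used. The sketch below assumes one widely used approximation, (market capitalisation + book value of liabilities) / book value of total assets; the function name, parameter names, and figures are hypothetical and are not taken from the dataset.

```python
def tobins_q(market_cap, total_liabilities, total_assets):
    """Approximate Tobin's Q (assumed formula, not confirmed by the paper):
    (market value of equity + book value of liabilities) / book value of assets."""
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return (market_cap + total_liabilities) / total_assets

# Hypothetical firm-year figures, all in the same currency unit.
q = tobins_q(market_cap=120.0, total_liabilities=80.0, total_assets=160.0)
print(q)
```

Under this approximation, a value above 1 indicates that the market values the firm above the book value of its assets.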

34 pages, 8431 KiB  
Article
Mexican Emotional Speech Database Based on Semantic, Frequency, Familiarity, Concreteness, and Cultural Shaping of Affective Prosody
by Mathilde Marie Duville, Luz María Alonso-Valerdi and David I. Ibarra-Zarate
Data 2021, 6(12), 130; https://doi.org/10.3390/data6120130 - 6 Dec 2021
Cited by 10 | Viewed by 4149
Abstract
This paper describes the Mexican Emotional Speech Database (MESD), which contains single-word emotional utterances for anger, disgust, fear, happiness, neutral and sadness with adult (male and female) and child voices. To validate the emotional prosody of the uttered words, a cubic Support Vector Machines classifier was trained on the basis of prosodic, spectral and voice quality features for each case study: (1) male adult, (2) female adult and (3) child. In addition, cultural, semantic, and linguistic shaping of emotional expression was assessed by statistical analysis. This study was registered at BioMed Central and is part of the implementation of a published study protocol. Mean emotional classification accuracies were 93.3%, 89.4% and 83.3% for male, female and child utterances, respectively. Statistical analysis emphasized the shaping of emotional prosodies by semantic and linguistic features. A cultural variation in emotional expression was highlighted by comparing the MESD with the INTERFACE for Castilian Spanish database. The MESD provides reliable content for linguistic emotional prosody shaped by the Mexican cultural environment. To facilitate further investigations, a corpus controlled for linguistic features and emotional semantics, as well as one containing words repeated across voices and emotions, are provided. The MESD is made freely available.

19 pages, 959 KiB  
Article
Shipping Accidents Dataset: Data-Driven Directions for Assessing Accident’s Impact and Improving Safety Onboard
by Panagiotis Panagiotidis, Kyriakos Giannakis, Nikolaos Angelopoulos and Angelos Liapis
Data 2021, 6(12), 129; https://doi.org/10.3390/data6120129 - 3 Dec 2021
Cited by 5 | Viewed by 6570
Abstract
Recent tragic marine incidents indicate that more efficient safety procedures and emergency management systems are needed. During the 2014–2019 period, 320 accidents cost 496 lives, and 5424 accidents caused 6210 injuries. Ideally, we need historical data from real accident cases of ships to develop data-driven solutions. According to the literature, the most critical factor in the post-incident management phase is human error. However, no structured datasets record the crew’s actions during an incident and the human factors that contributed to its occurrence. To overcome these limitations, we decided to utilise the unstructured information from accident reports produced by governmental organisations to create a new, well-structured dataset of maritime accidents and provide intuitions for its usage. Our dataset contains all the information that the majority of the marine datasets include, such as the place, the date, and the conditions during the post-incident phase, e.g., weather data. Additionally, the proposed dataset contains attributes related to each incident’s environmental/financial impact, as well as a concise description of the post-incident events, highlighting the crew’s actions and the human factors that contributed to the incident. We utilise this dataset to predict the incident’s impact and provide data-driven directions regarding the improvement of the post-incident safety procedures for specific types of ships.
(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

10 pages, 893 KiB  
Data Descriptor
Geo-Questionnaire for Environmental Planning: The Case of Ecosystem Services Delivered by Trees in Poland
by Patrycja Przewoźna, Adam Inglot, Marcin Mielewczyk, Krzysztof Mączka, Piotr Matczak and Piotr Wężyk
Data 2021, 6(12), 128; https://doi.org/10.3390/data6120128 - 1 Dec 2021
Cited by 4 | Viewed by 2840
Abstract
Studies on the interface between society and the environment are often based on simple questionnaires that do not allow for an in-depth analysis. Research conducted with geo-questionnaires is an increasingly common method. However, even if data collected via a geo-questionnaire are available, the shared databases provide limited information due to personal data protection. In the article, we present open databases that overcome those limitations. They are the result of the iTre-es project concerning public opinion on the benefits provided by trees and shrubs in four different research areas. The databases provide information on the location of trees that are valuable to the residents, the distances from the respondents’ place of residence, their attitude toward tree removal, socio-demographic variables, attachment to the place of life, and environmental attitudes. The presentation of all these aspects was possible thanks to the appropriate aggregation of the results. A method to anonymize the respondents is presented. We discuss the collected data and their possible areas of application.
(This article belongs to the Section Spatial Data Science and Digital Earth)

14 pages, 4464 KiB  
Article
Development of A Spatiotemporal Database for Evolution Analysis of the Moscow Backbone Power Grid
by Andrey Karpachevskiy, German Titov and Oksana Filippova
Data 2021, 6(12), 127; https://doi.org/10.3390/data6120127 - 30 Nov 2021
Cited by 4 | Viewed by 3322
Abstract
Currently, in the field of transport geography, the spatial evolution of electrical networks remains globally understudied. Publicly available data sources, including remote sensing data, have made it possible to collect spatial data on electrical networks, but a suitable data structure for storing them has not been defined. The main purpose of this study was the collection and structuring of spatiotemporal data on electric networks with the possibility of their further processing and analysis. To collect data, we used publicly available remote sensing and geoinformation systems, archival schemes and maps, as well as other documents related to the Moscow power grid. Additionally, we developed a web service for data publication and visualization. We conducted a small morphological analysis of the evolution of the network to show the possibilities of working with the database using a Python script. For example, we found that the portion of new lines has been declining since the 1950s and that, in the 2010s, the portion of partial reconstruction reached its maximum. Thus, the developed data structure and the database itself provide ample opportunities for the analysis and interpretation of the spatiotemporal development of electric networks. This can be used as a basis to study other territories. The main results of the study are published on the web service, where the user can interactively choose a year and two forms of power line representation to visualize on a map.

12 pages, 7451 KiB  
Data Descriptor
Spatial Interpolation of Air Pollutant and Meteorological Variables in Central Amazonia
by Renato Okabayashi Miyaji, Felipe Valencia de Almeida, Lucas de Oliveira Bauer, Victor Madureira Ferrari, Pedro Luiz Pizzigatti Corrêa, Luciana Varanda Rizzo and Giri Prakash
Data 2021, 6(12), 126; https://doi.org/10.3390/data6120126 - 30 Nov 2021
Viewed by 2841
Abstract
The Amazon Rainforest is highlighted by the global community both for its extensive vegetation cover, which constantly suffers the effects of anthropic action, and for its substantial biodiversity. This dataset presents data on meteorological variables from the Amazon Rainforest region with a spatial resolution of 0.001° in latitude and longitude, resulting from an interpolation process. The original data were obtained from the GoAmazon 2014/5 project, in the Atmospheric Radiation Measurement (ARM) repository, and then processed through mathematical and statistical methods. The dataset presented here can be used in experiments in the field of Data Science, such as training models for predicting climate variables or modeling the distribution of species.

11 pages, 1339 KiB  
Article
Learning Interpretable Mixture of Weibull Distributions—Exploratory Analysis of How Economic Development Influences the Incidence of COVID-19 Deaths
by Róbert Csalódi, Zoltán Birkner and János Abonyi
Data 2021, 6(12), 125; https://doi.org/10.3390/data6120125 - 26 Nov 2021
Viewed by 2607
Abstract
This paper presents an algorithm for learning local Weibull models, whose operating regions are represented by fuzzy rules. The applicability of the proposed method is demonstrated in estimating the mortality rate of the COVID-19 pandemic. The reproducible results show that there is a significant difference between mortality rates of countries due to their economic situation, urbanization, and the state of the health sector. The proposed method is compared with the semi-parametric Cox proportional hazard regression method. The distribution functions of these two methods are close to each other, so the proposed method can estimate efficiently.
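A mixture of Weibull distributions, as named in this article's title, combines several Weibull survival functions S(t) = exp(-(t/scale)^shape) with weights that sum to one. The sketch below is a generic illustration only: in the paper the local models are tied to fuzzy rules, whereas here the weights are fixed numbers, and all parameter values and function names are hypothetical.

```python
import math

def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape), for t >= 0."""
    return math.exp(-((t / scale) ** shape))

def mixture_survival(t, components):
    """Weighted sum of Weibull survival functions.
    `components` is a list of (weight, shape, scale); weights should sum to 1."""
    return sum(w * weibull_survival(t, k, lam) for w, k, lam in components)

# Hypothetical two-component mixture (weights stand in for rule memberships).
components = [(0.6, 1.5, 30.0), (0.4, 0.8, 90.0)]

s0 = mixture_survival(0.0, components)
s_curve = [mixture_survival(t, components) for t in (0, 10, 50, 200)]
print(s_curve)
```

Each component keeps its own shape and scale, so the individual local models remain interpretable even though the combined curve is more flexible than a single Weibull distribution.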
(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)

18 pages, 1452 KiB  
Article
Crystal Clear: Investigating Databases for Research, the Case of Drone Strikes
by Giampiero Giacomello and Damiano Martinelli
Data 2021, 6(12), 124; https://doi.org/10.3390/data6120124 - 25 Nov 2021
Cited by 1 | Viewed by 2639
Abstract
The availability of numerous online databases offers new and tremendous opportunities for social science research. Furthermore, databases based on news reports often allow scholars to investigate issues otherwise hard to tackle, such as the impact and consequences of drone strikes. Crucial to the campaign against terrorism, official data on drone strikes are classified, but news reports permit a certain degree of independent scrutiny. The quality of such research may be improved if scholars can rely on two (or more) databases independently reporting on the same issue (a solution akin to ‘data triangulation’). Given these conditions, such databases should be as reliable and valid as possible. This paper aimed to discuss the ‘validity and reliability’ of two such databases, as well as open up a debate on the evaluation of the quality, reliability and validity of research data on ‘problematic’ topics that have recently become more accessible thanks to online sources.

7 pages, 1094 KiB  
Data Descriptor
Collection of Bacterial Community Associated with Size Fractionated Aerosols from Kuwait
by Nazima Habibi, Saif Uddin, Fadila Al Salameen, Montaha Behbehani, Faiz Shirshikhar, Nasreem Abdul Razzack, Anisha Shajan and Farhana Zakir Hussain
Data 2021, 6(12), 123; https://doi.org/10.3390/data6120123 - 24 Nov 2021
Cited by 8 | Viewed by 2270
Abstract
Airborne particles play a significant role in the spread of bacterial communities. The prevalence of both pathogenic and non-pathogenic forms in the inhalable fractions of aerosols is known. The abundance of microorganisms in the aerosols heightens the likely health hazards due to inhalation since they serve as carriers for pathogens and allergens, often acting as a vector for pulmonary/respiratory infections. Not much information is available on the occurrence and prevalence of bacterial communities in different size-fractionated aerosols in Kuwait. A high-volume air sampler with a six-stage cascade impactor was deployed for sample collection at two sites, one remote and one urban. A total volume of 815 ± 5 m³ of air was passed through the filters to trap the particulate matter ranging from 0.39 to >10.2 μm in size (Stage 1 to Stage 5 and base filter). Aeromonas dominated all the stages at the urban site and Stage 5 at the remote site, whereas Sphingobium was prevalent at Stages 2, 3 and 4 at the remote site. Brevundimonas was found at Stages 1 and 5 and on the base filter at the remote site. These results show that the bacterial community is altered in different size fractions of aerosols. Stages 1–4 form the respirable fraction, whereas Stage 5 and particles on the base filter are the inhalable fractions. Many species of Aeromonas cause disease, and hence their presence in inhalable fractions is a health concern, meaning that species-level identification is warranted.