
Data, Volume 6, Issue 6 (June 2021) – 15 articles

Cover Story: For the public sector, Application Programming Interfaces (APIs) could greatly facilitate the exchange of data and digital functionalities in a flexible, controlled and secure way. This first-of-its-kind landscape analysis looks at the main European Commission policy instruments on the adoption of APIs, the available web API standards, government API strategies and cases, and current practices. This research reveals that European governments’ API strategies are rather young, but that policy and associated instruments are emerging. There are well-known API standards and promising practices ready to support the digital transformation of governments through rapid, harmonised and successful adoption of APIs.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • PDF is the official format for papers published in both HTML and PDF forms.
Data Descriptor
Analyses of Li-Rich Minerals Using Handheld LIBS Tool
Data 2021, 6(6), 68; https://doi.org/10.3390/data6060068 - 21 Jun 2021
Viewed by 513
Abstract
Lithium (Li) is one of the most recent additions to Europe's list of critical materials; lithium exploration in Europe has therefore become necessary to guarantee a stable mid- to long-term supply. Laser-induced breakdown spectroscopy (LIBS) is a powerful analytical technique that allows simultaneous multi-elemental analysis with excellent coverage of light elements (Z < 13). This data paper provides more than 4000 LIBS spectra obtained with a handheld LIBS tool on approximately 140 Li-bearing materials (minerals, powder pellets, and rocks), together with their Li concentrations. The high resolution of the spectrometers, combined with the low detection limits for light elements, makes LIBS a powerful option for detecting Li and trace elements of primary interest, such as Be, Cs, F, and Rb. The LIBS spectra dataset, combined with the Li content dataset, can be used to obtain quantitative estimates of Li in Li-rich matrices. This paper can serve as technical and spectroscopic support for Li detection in the field using a portable LIBS instrument.
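One common way to turn spectra plus reference concentrations into quantitative estimates is a univariate calibration curve: regress the intensity of a Li emission line (for example, the Li I line near 670.8 nm) against the known Li content, then invert the fit for unknown samples. The sketch below assumes that approach; the function names `fit_calibration` and `predict_concentration` and all numbers are hypothetical, and the data descriptor does not prescribe a particular model.

```python
# Hypothetical univariate LIBS calibration: fit line intensity vs. known Li content,
# then invert the fit to estimate Li in an unknown sample.

def fit_calibration(concentrations_ppm, intensities):
    """Ordinary least-squares fit of intensity = slope * concentration + intercept."""
    n = len(concentrations_ppm)
    if n < 2:
        raise ValueError("need at least two reference samples")
    mean_c = sum(concentrations_ppm) / n
    mean_i = sum(intensities) / n
    sxy = sum((c - mean_c) * (i - mean_i) for c, i in zip(concentrations_ppm, intensities))
    sxx = sum((c - mean_c) ** 2 for c in concentrations_ppm)
    slope = sxy / sxx
    intercept = mean_i - slope * mean_c
    return slope, intercept


def predict_concentration(intensity, slope, intercept):
    """Invert the calibration line to turn a measured line intensity into Li content."""
    return (intensity - intercept) / slope


# Made-up reference data: Li line intensity (arbitrary units) for known Li contents.
reference_ppm = [0.0, 1000.0, 2000.0, 4000.0]
reference_intensity = [5.0, 15.0, 25.0, 45.0]

slope, intercept = fit_calibration(reference_ppm, reference_intensity)
estimate_ppm = predict_concentration(35.0, slope, intercept)  # roughly 3000 ppm
```

A single-line linear fit is only a starting point; across very different mineral matrices the relationship may not stay linear, which is one reason to pair the spectra with measured Li contents as this dataset does.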

Article
Semantic Partitioning and Machine Learning in Sentiment Analysis
Data 2021, 6(6), 67; https://doi.org/10.3390/data6060067 - 21 Jun 2021
Viewed by 442
Abstract
This paper investigates sentiment analysis of Arabic tweets containing Jordanian dialect. A new dataset was collected during the coronavirus disease (COVID-19) pandemic. We present two models, the Traditional Arabic Language (TAL) model and the Semantic Partitioning Arabic Language (SPAL) model, to predict the polarity of the collected tweets using several well-known classifiers. Numerous Arabic features, such as lexical, writing-style, grammatical, and emotional features, were extracted and used to analyze and classify the collected tweets semantically. In the SPAL model, the original dataset was partitioned according to the hidden semantic meaning shared between tweets before the classifiers were applied. The experiments show that the SPAL model competes with, and overall outperforms, the TAL model, owing to the semantic partitioning applied to the collected dataset.

Data Descriptor
The NCAR Airborne 94-GHz Cloud Radar: Calibration and Data Processing
Data 2021, 6(6), 66; https://doi.org/10.3390/data6060066 - 19 Jun 2021
Viewed by 384
Abstract
The 94-GHz airborne HIAPER Cloud Radar (HCR) has been deployed in three major field campaigns: sampling clouds over the Pacific between California and Hawaii (2015) and over the cold waters of the Southern Ocean (2018), and characterizing tropical convection in the Western Caribbean and the Pacific waters off Panama and Costa Rica (2019). An extensive set of quality assurance and quality control procedures was developed and applied to all collected data. Engineering measurements yielded calibration characteristics for the antenna, reflector, and radome, which were applied during flight to produce the radar moments in real time. Temperature changes in the instrument during flight affect the receiver gains, introducing some bias. In post-project processing, we estimate the temperature-induced gain errors and apply gain corrections to improve data quality. The reflectivity calibration is monitored by comparing sea surface cross-section measurements against theoretically calculated model values. These comparisons indicate that the HCR is calibrated to within 1–2 dB of theory. A radar echo classification algorithm was developed to identify “cloud echo” and distinguish it from artifacts. Model reanalysis data and digital terrain elevation data were interpolated to the time-range grid of the radar data to provide an environmental reference.
(This article belongs to the Section Spatial Data Science and Digital Earth)
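The sea-surface calibration check amounts to comparing measured and modelled cross-sections in decibels and asking how far apart they are on average. The minimal sketch below assumes matched samples of normalized radar cross-section in linear units and averages the differences in dB; the function names, the 2 dB tolerance parameter, and the sample values are hypothetical, and the paper's actual averaging and matching steps are not described in the abstract.

```python
import math


def to_db(linear_value):
    """Convert a linear power ratio (e.g., normalized radar cross-section) to dB."""
    return 10.0 * math.log10(linear_value)


def calibration_bias_db(measured_sigma0, model_sigma0):
    """Mean measured-minus-model difference in dB over matched sea-surface samples."""
    if len(measured_sigma0) != len(model_sigma0) or not measured_sigma0:
        raise ValueError("need equally sized, non-empty sequences")
    diffs = [to_db(m) - to_db(t) for m, t in zip(measured_sigma0, model_sigma0)]
    return sum(diffs) / len(diffs)


def within_tolerance(bias_db, tolerance_db=2.0):
    """Hypothetical pass/fail check; 2 dB is chosen only to echo the abstract's 1-2 dB figure."""
    return abs(bias_db) <= tolerance_db


# Made-up matched samples (linear units): measurements about 1 dB above the model.
model = [0.10, 0.20, 0.40]
measured = [v * 10 ** 0.1 for v in model]
bias = calibration_bias_db(measured, model)
```

Averaging in dB weights each sample's relative error equally; averaging in linear units first and converting afterwards is an equally defensible alternative.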

Article
Sustainability of Urbanization, Non-Agricultural Output and Air Pollution in the World’s Top 20 Polluting Countries
Data 2021, 6(6), 65; https://doi.org/10.3390/data6060065 - 17 Jun 2021
Viewed by 352
Abstract
Rapid urbanization is increasingly recognized as a significant factor in environmental pollution across the world. However, research on the role of sustainable urbanization in controlling both pollution and population remains limited in scope for developed countries and scarce for developing nations. To fill this gap, the present study employed both theoretical and empirical tools to investigate the link between sustainable urbanization, pollution, and non-agricultural output. To examine this link empirically, the study considered a panel of the world’s top 20 polluting countries over the 1991–2018 period, which includes both developed and developing nations. Panel co-integration techniques and a panel vector error correction model were employed to derive the possible relationships between the variables through sustainable urbanization. The empirical findings show no equilibrium relationship among the three variables in the panel of developed countries. However, the study clearly finds that all three indicators maintain long-run associations in the panel of developing countries. Furthermore, in the short run, the results unambiguously show significant causal interplay between any two of the variables and the remaining one, for both the developed and the developing panels; short-run interplay among the variables therefore exists in both groups of economies. From a policy perspective, the study shows that policy makers in both developed and developing nations should be cautious before encouraging urbanization, at least in the short term. However, the combined short- and long-term effects suggest that policy makers should be more careful before encouraging urbanization in developing economies.

Data Descriptor
A Geo-Tagged COVID-19 Twitter Dataset for 10 North American Metropolitan Areas over a 255-Day Period
Data 2021, 6(6), 64; https://doi.org/10.3390/data6060064 - 16 Jun 2021
Viewed by 583
Abstract
One of the unfortunate findings from the ongoing COVID-19 crisis is the disproportionate impact it has had on people and communities who were already socioeconomically disadvantaged. However, it has been difficult to study this issue at scale and in greater detail using social media platforms like Twitter. Several COVID-19 Twitter datasets have been released, but they are very broad in scope, both topically and geographically. In this paper, we present a more controlled and compact dataset that can be used to answer a range of potential research questions (especially in computational social science) without requiring extensive preprocessing or tweet hydration from the earlier datasets. The dataset comprises tens of thousands of geotagged (and, in many cases, reverse-geocoded) tweets collected over a 255-day period in 2020 across 10 metropolitan areas in North America. Because there are socioeconomic disparities within these cities (sometimes to an extreme extent, as witnessed in ‘inner city neighborhoods’ in some of them), the dataset can be used to assess such disparities through a social media lens, as well as to compare and contrast behavior across cities.
(This article belongs to the Section Information Systems and Data Management)

Data Descriptor
A Disease Control-Oriented Land Cover Land Use Map for Myanmar
Data 2021, 6(6), 63; https://doi.org/10.3390/data6060063 - 13 Jun 2021
Viewed by 641
Abstract
Malaria is a serious infectious disease that causes massive casualties globally. Myanmar is a key battleground in the global fight against malaria because it is where the emergence of drug-resistant malaria parasites has been documented. Controlling the spread of malaria in Myanmar thus carries global significance: failure to do so would have devastating consequences across the vast tropical and subtropical regions of the world where malaria is prevalent. Thanks to its wide and consistent spatial coverage, remote sensing is increasingly used in the public health domain. Specifically, remote sensing-based land cover/land use (LCLU) maps are a powerful tool that provides critical information on population distribution and on potential human-vector interaction interfaces over large spatial scales. Here, we present a 30-meter LCLU map created specifically for malaria control and eradication efforts in Myanmar. This bottom-up approach can be modified and customized for other vector-borne infectious diseases in Myanmar or other Southeast Asian countries.
(This article belongs to the Section Spatial Data Science and Digital Earth)

Data Descriptor
Measurements of LoRaWAN Technology in Urban Scenarios: A Data Descriptor
Data 2021, 6(6), 62; https://doi.org/10.3390/data6060062 - 10 Jun 2021
Viewed by 674
Abstract
This work is a data descriptor for measurements related to various operational aspects of LoRaWAN communication technology, collected in Brno, Czech Republic. The paper also provides data characterizing the long-term behavior of the LoRaWAN channel, collected during a two-month measurement campaign. It covers two measurement locations, one on the university premises and the other near the city center. The dataset's primary goal is to give researchers who lack LoRaWAN devices the opportunity to compare and analyze information obtained from 303 different outdoor test locations transmitting to up to 20 gateways operating in the 868 MHz band across a varied metropolitan landscape. To collect the data, we developed a prototype equipped with a Microchip RN2483 Low-Power Wide-Area Network (LPWAN) LoRaWAN transceiver module for the field measurements. As an example of data utilization, we show the Signal-to-Noise Ratio (SNR) and Received Signal Strength Indicator (RSSI) in relation to the distance to the closest gateway.
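Plotting SNR and RSSI against the closest-gateway distance requires one distance per test location. A minimal sketch, assuming each record carries latitude/longitude in degrees and using the spherical haversine formula; the gateway IDs, coordinates, and record fields below are hypothetical placeholders, not values from the dataset.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def closest_gateway(point, gateways):
    """Return (gateway_id, distance_m) for the gateway nearest to point=(lat, lon)."""
    lat, lon = point
    return min(
        ((gw_id, haversine_m(lat, lon, g_lat, g_lon)) for gw_id, (g_lat, g_lon) in gateways.items()),
        key=lambda item: item[1],
    )


# Hypothetical gateway positions and one hypothetical measurement record.
gateways = {"gw-A": (49.2260, 16.5750), "gw-B": (49.1950, 16.6080)}
record = {"lat": 49.1960, "lon": 16.6100, "snr_db": 7.5, "rssi_dbm": -98}

gw_id, dist_m = closest_gateway((record["lat"], record["lon"]), gateways)
row = (dist_m, record["snr_db"], record["rssi_dbm"])  # one point of an SNR/RSSI-vs-distance plot
```

The spherical model is accurate to well under one percent at city scale, which is ample for distance bins of tens or hundreds of metres.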

Article
A Framework Using Contrastive Learning for Classification with Noisy Labels
Data 2021, 6(6), 61; https://doi.org/10.3390/data6060061 - 09 Jun 2021
Viewed by 520
Abstract
We propose a framework that uses contrastive learning as a pre-training task to perform image classification in the presence of noisy labels. Recent strategies, such as pseudo-labeling, sample selection with Gaussian Mixture Models, and weighted supervised contrastive learning, have been combined into a fine-tuning phase following the pre-training. In this paper, we provide an extensive empirical study showing that a preliminary contrastive learning step brings a significant gain in performance when using different loss functions: non-robust, robust, and early-learning regularized. Our experiments on standard benchmarks and real-world datasets demonstrate that (i) contrastive pre-training increases the robustness of any loss function to noisy labels and (ii) the additional fine-tuning phase can further improve accuracy, but at the cost of additional complexity.
(This article belongs to the Special Issue Machine Learning with Label Noise)
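The abstract does not name the contrastive objective used for pre-training. One widely used choice is the SimCLR-style NT-Xent loss, in which two augmented views of the same image form a positive pair and every other sample in the batch acts as a negative. The dependency-free sketch below illustrates that loss on toy vectors; the function names, the temperature of 0.5, and the embeddings are assumptions, and real training would use batched tensors in a deep learning framework.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """SimCLR-style NT-Xent loss; view_a[k] and view_b[k] embed two augmentations of image k."""
    n = len(view_a)
    z = list(view_a) + list(view_b)  # 2N embeddings
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # the other augmented view of the same image
        logits = {k: cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i}
        # Log-sum-exp over all other samples (positive + negatives), stabilised by the max.
        m = max(logits.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)


# Two images, two augmented views each (toy 2-D embeddings).
view_a = [[1.0, 0.0], [0.0, 1.0]]
matched = nt_xent(view_a, [[1.0, 0.0], [0.0, 1.0]])     # views agree
mismatched = nt_xent(view_a, [[0.0, 1.0], [1.0, 0.0]])  # views swapped
```

The loss is lower when the two views of each image map to similar embeddings, which is the property that pre-training exploits before any (possibly noisy) labels are used.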

Article
Information Quality Assessment for Data Fusion Systems
Data 2021, 6(6), 60; https://doi.org/10.3390/data6060060 - 08 Jun 2021
Viewed by 572
Abstract
This paper provides a comprehensive review of the current literature on data fusion, with an emphasis on Information Quality (IQ) and performance evaluation. It highlights recent studies that reveal existing gaps, the need for synergy between data fusion and IQ, several open research issues, and the challenges and pitfalls in this field. First, the main models, frameworks, architectures, algorithms, solutions, problems, and requirements are analyzed. Second, a general data fusion engineering process is presented to show how complex it is to design a framework for a specific application. Third, an IQ approach and the different methodologies and frameworks used to assess IQ in information systems are addressed; in addition, data fusion systems are presented along with their related criteria. Furthermore, context information in data fusion systems and its IQ assessment are discussed. Subsequently, the performance of data fusion systems is reviewed. Finally, some key aspects and concluding remarks are outlined, and some future lines of work are proposed.
(This article belongs to the Section Information Systems and Data Management)

Article
APIs for EU Governments: A Landscape Analysis on Policy Instruments, Standards, Strategies and Best Practices
Data 2021, 6(6), 59; https://doi.org/10.3390/data6060059 - 08 Jun 2021
Viewed by 1091
Abstract
Application Programming Interfaces (APIs) could greatly facilitate the exchange of data and functionalities between software applications in a flexible, controlled and secure way, especially on the web. Private companies, from startups to enterprises, have been using APIs for several years, but only recently have APIs attracted increased interest in the public sector. API adoption in the public sector faces organisational, technical, legal and economic obstacles; to overcome these barriers, methods proposed by the private sector and by early adopters in the public sector provide a way forward. However, the available documentation is often sparse, difficult to find and hard to reuse in new contexts, and no previous effort has been made to collect and analyse these resources. To address this shortcoming, this paper describes a landscape analysis in four areas: the main European Commission policy instruments on the adoption of APIs, the available web API standards, a set of European government API strategies and cases, and a list of government-proposed methods distilled from more than 3900 documents. Our results reveal that European policy legislation and associated instruments promote, and in some cases mandate, the use of APIs; that governments’ API strategies in the European Union are rather young; and that there are well-known web API standards and proposed methods ready to support the digital transformation of governments through rapid, harmonised and successful adoption of APIs.
(This article belongs to the Special Issue A European Approach to the Establishment of Data Spaces)

Data Descriptor
A Large-Scale Dataset of Barley, Maize and Sorghum Variety Identification Using DNA Fingerprinting in Ethiopia
Data 2021, 6(6), 58; https://doi.org/10.3390/data6060058 - 03 Jun 2021
Viewed by 651
Abstract
The data described in this paper were collected as part of a large-scale, nationally representative household survey, the Ethiopian Socioeconomic Survey (ESS 2018/19). Grain samples of barley, maize and sorghum were collected in six regions of Ethiopia. Varieties were identified by matching samples to a reference library of released improved materials, using approximately 50,000 markers from DArTseq platforms. These data were part of a study documenting the reach of CGIAR-related germplasm in Ethiopia. These objective measures of crop varietal adoption, unique in the public domain, can be analyzed together with the large set of variables on agro-ecologies, household characteristics and plot management practices available in the Ethiopian Socioeconomic Survey 2018/19.
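Matching a grain sample to a reference library can be pictured as scoring marker-by-marker agreement against every reference profile and keeping the best match only if it is close enough. The sketch below assumes genotype calls coded 0/1/2 with `None` for missing data, uses a simple identical-call rate as the similarity, and applies a hypothetical 0.9 threshold; the study's actual distance measure and cut-off are not given in the abstract, and the variety names are placeholders.

```python
def match_rate(sample, reference):
    """Share of markers with identical calls, skipping markers missing (None) in either profile."""
    scored = [(s, r) for s, r in zip(sample, reference) if s is not None and r is not None]
    if not scored:
        return 0.0
    return sum(s == r for s, r in scored) / len(scored)


def identify_variety(sample, library, threshold=0.9):
    """Best-matching reference variety, or None if the best match is below threshold."""
    best_name, best_rate = None, 0.0
    for name, profile in library.items():
        rate = match_rate(sample, profile)
        if rate > best_rate:
            best_name, best_rate = name, rate
    return (best_name, best_rate) if best_rate >= threshold else (None, best_rate)


# Toy profiles with 10 markers; real profiles would have tens of thousands.
library = {
    "variety_A": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0],
    "variety_B": [2, 1, 0, 2, 1, 0, 2, 1, 0, 2],
}
sample = [0, 1, 2, 0, 1, 2, 0, 1, 2, None]  # one missing call
result = identify_variety(sample, library)
```

Returning `None` below the threshold keeps unmatched samples (for example, local landraces absent from the library) from being forced onto the nearest improved variety.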

Data Descriptor
Dataset for the Solar Incident Radiation and Electricity Production BIPV/BAPV System on the Northern/Southern Façade in Dense Urban Areas
Data 2021, 6(6), 57; https://doi.org/10.3390/data6060057 - 26 May 2021
Viewed by 864
Abstract
Successful implementation of Building Integrated Photovoltaics (BIPV) and Building Attached Photovoltaics (BAPV) requires an accurate and detailed assessment of the solar irradiation potential and electricity production of various commercialised technologies in different orientations on the building envelope. This article presents a dataset of the solar incident radiation and electricity production of PV systems facing north and south in a dense urban area (in the northern hemisphere). The solar incident radiation and electricity production of two back-to-back PV panels with a ten-centimetre gap were monitored and logged for one year as primary data sources. The efficiency of both panels, calculated in Microsoft Excel, is also presented as a secondary data source. The PV panels are composed of polycrystalline silicon cells with a standard efficiency of 16.9%. The results show that the actual efficiency of the south-facing panel (13–15%) is always closer to the panel's standard efficiency than that of the north-facing panel (8–12%). Moreover, whereas the efficiency of the south-facing panel is almost constant on sunny days throughout the year, the efficiency of the north-facing panel decreases significantly in winter. This phenomenon might be linked to the spectral response of the polycrystalline silicon cells and to the different spectra of solar radiation incident on the two panels. Although the monitored data cover radiation and electricity production in various weather conditions, the analysis is mainly conducted for sunny days, and further investigation is needed to analyse system performance in other conditions (such as cloudy and overcast skies). The dataset could be used to analyse the performance and operational efficiency of polycrystalline silicon PV panels in a dense urban area and for different orientations.
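The secondary efficiency data follow from the primary logs: actual efficiency is the electrical energy produced divided by the solar energy incident on the panel over the same period. The sketch below reproduces that spreadsheet arithmetic in Python under stated assumptions; the panel area and daily values are hypothetical, and only the 16.9% standard efficiency comes from the abstract.

```python
STANDARD_EFFICIENCY = 0.169  # rated efficiency quoted in the abstract


def panel_efficiency(energy_kwh, irradiation_kwh_per_m2, area_m2):
    """Actual efficiency = electrical energy out / solar energy incident on the panel."""
    incident_kwh = irradiation_kwh_per_m2 * area_m2
    if incident_kwh <= 0:
        raise ValueError("incident energy must be positive")
    return energy_kwh / incident_kwh


def relative_to_standard(actual_efficiency, standard=STANDARD_EFFICIENCY):
    """How close the measured efficiency is to the rated one (1.0 = identical)."""
    return actual_efficiency / standard


# Hypothetical sunny day for a 1.6 m^2 south-facing panel.
eta_south = panel_efficiency(energy_kwh=1.2, irradiation_kwh_per_m2=5.0, area_m2=1.6)
ratio_south = relative_to_standard(eta_south)
```

With these made-up numbers the panel reaches 15% efficiency, which sits inside the 13–15% range reported for the south-facing panel; the same two functions apply unchanged to the north-facing logs.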

Article
Automation of Work Processes and Night Work
Data 2021, 6(6), 56; https://doi.org/10.3390/data6060056 - 26 May 2021
Viewed by 741
Abstract
Background: Automation of production processes is not simply the replacement of a person in production; it should lead to the success of an organization and contribute to the sustainable development of society and the natural environment. The aim of our study was to determine whether the level of automation of production processes affects the proportion of night work hours of production workers, and whether employers are willing to automate production processes to reduce the number of night work hours. Methods: We used a quantitative approach, collecting primary data through a survey. The questionnaire was completed by 502 large and medium-sized manufacturing companies in Slovenia. Results: We found no statistically significant correlation between the level of automation of production processes and the percentage of night work hours of production workers. We also found that reducing the proportion of night work does not appear to be the main motivator for introducing automation of production processes. Conclusions: Based on these results, we rejected the assumption that automation of production processes has a direct impact on the proportion of night work. Our study will benefit all those concerned with the automation of production processes and night work.
(This article belongs to the Special Issue Development of a Smart Future under Society 5.0)

Review
Machine Learning-Based Algorithms to Knowledge Extraction from Time Series Data: A Review
Data 2021, 6(6), 55; https://doi.org/10.3390/data6060055 - 25 May 2021
Viewed by 734
Abstract
To predict the future behavior of a system, we can exploit information collected in the past, identifying recurring structures in what happened in order to predict what could happen if the same structures repeat themselves in the future. A time series is a sequence of numerical values of a measurable variable observed over time. The values are sampled at equidistant time intervals, at an appropriate frequency (granularity) such as daily, weekly, or monthly, and are expressed in physical units of measurement. In machine learning-based algorithms, knowledge is extracted from the data themselves, which are explored and analyzed in search of recurring patterns or to discover hidden causal associations or relationships. The prediction model extracts knowledge through an inductive process: the input is the data and, possibly, a first example of the expected output; the machine then learns the algorithm to follow to obtain the same result. This paper reviews the most recent work that has used machine learning-based techniques to extract knowledge from time series data.
(This article belongs to the Special Issue Knowledge Extraction from Data Using Machine Learning)
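The inductive setup described above (past values as input, the next value as the example output) is usually built by sliding a window over the series. The sketch below shows that transformation and then fits the simplest possible learner, a least-squares first-order autoregressive line, purely for illustration; it is not a method taken from the review, and the function names and toy series are hypothetical.

```python
def make_supervised(series, window):
    """Slide a window over the series: `window` past values become inputs, the next value the target."""
    if window < 1:
        raise ValueError("window must be at least 1")
    inputs, targets = [], []
    for t in range(window, len(series)):
        inputs.append(series[t - window:t])
        targets.append(series[t])
    return inputs, targets


def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + b, learned from (past, next) pairs."""
    inputs, targets = make_supervised(series, 1)
    xs = [row[0] for row in inputs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(targets) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, targets))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Toy equally spaced (e.g., daily) series; predict the next, unseen value.
daily = [1.0, 2.0, 3.0, 4.0, 5.0]
a, b = fit_ar1(daily)
forecast = a * daily[-1] + b
```

Any of the learners covered in the review (trees, neural networks, and so on) can replace `fit_ar1`, because they all consume the same windowed input/target pairs.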

Data Descriptor
Data on the Quantification of Aspartate, GABA and Glutamine Levels in the Spinal Cord of Larval Sea Lampreys after a Complete Spinal Cord Injury
Data 2021, 6(6), 54; https://doi.org/10.3390/data6060054 - 24 May 2021
Viewed by 471
Abstract
We used high-performance liquid chromatography (HPLC) to quantify aspartate, GABA, and glutamine levels in the spinal cord of larval sea lampreys following a complete spinal cord injury. Mature larval sea lampreys recover spontaneously from a complete spinal cord transection, and the changes in neurotransmitter systems after spinal cord injury might be related to their remarkable regenerative capabilities. The data presented here show the concentrations of the aminoacidergic neurotransmitters GABA (and its precursor glutamine) and aspartate in the spinal cord of control (non-injured) animals and of animals at 2, 4, and 10 weeks post-lesion. Statistical analyses showed that GABA and aspartate levels increase significantly in the spinal cord four weeks after a complete spinal cord injury, and that glutamine levels decrease 10 weeks after injury compared with controls. These data might be of interest to those studying the role of neurotransmitters and neuromodulators in recovery from spinal cord injury in vertebrates.