Special Issue "Data Science and Knowledge Discovery"

A special issue of Future Internet (ISSN 1999-5903). This special issue belongs to the section "Big Data and Augmented Intelligence".

Deadline for manuscript submissions: closed (31 January 2021).

Special Issue Editor

Dr. Carlos Filipe Da Silva Portela
Guest Editor
Founder & CEO of IOTech; Information Systems and Technologies, Algoritmi Research Centre, University of Minho, 4800 Guimarães, Portugal
Interests: knowledge discovery; data science; progressive web apps; research and development

Special Issue Information

Dear Colleagues,

The importance and impact of data science (DS) in the decision process are increasing significantly. DS is an interdisciplinary field that combines areas including computer science, machine learning, mathematics and statistics, domain/business knowledge, software development, and traditional research. As a research topic, DS applies scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

Knowledge discovery (KD) is the basis of data science and consists of creating knowledge from structured and unstructured sources (e.g., text, data, and images, among others). The output needs to be in a readable and interpretable format, and it must represent knowledge in a manner that facilitates inference.

Several areas, such as education, health, accounting, energy, and public administration, are exploring this new trend. In this context, this Special Issue provides an excellent opportunity to share scientific knowledge and disseminate findings and achievements across several communities.

This Special Issue will discuss this trending topic and present innovative solutions to show the importance of data science and knowledge discovery to researchers, managers, industry, society, and other communities.

Finally, I would like to thank Gisela Fernandes for her valuable work in assisting me with this Special Issue.

Dr. Carlos Filipe Da Silva Portela
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Future Internet is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Data mining
  • Big data
  • Machine learning
  • Text mining
  • Applied data science
  • Image mining
  • Business intelligence
  • Artificial intelligence
  • Intelligent data systems
  • Expert systems

Published Papers (14 papers)


Research


Article
Pervasive Intelligent Models to Predict the Outcome of COVID-19 Patients
Future Internet 2021, 13(4), 102; https://doi.org/10.3390/fi13040102 - 20 Apr 2021
Cited by 1 | Viewed by 395
Abstract
Nowadays, there is an increasing need to understand the behavior of COVID-19. After the Directorate-General of Health of Portugal made infected patients' data available, it became possible to analyze the data and draw conclusions, obtaining a better understanding of the matter. In this context, the developed project, ioCOVID19 (Intelligent Decision Support Platform), aims to identify patterns and develop intelligent models to predict and support clinical decisions. This article explores which typologies are associated with different outcomes to help clinicians fight the virus with a decision support system. To achieve this purpose, classification algorithms were used, and one target was studied: patient outcome, that is, predicting whether the patient will die or recover. Regarding the obtained results, the model that stood out combines scenario s4 (comprising all comorbidities, symptoms, and age), the decision tree algorithm, and the oversampling method. The results for the studied metrics were (in order of importance): sensitivity of 95.20%, accuracy of 90.67%, and specificity of 86.08%. The models were deployed as a service and are part of a clinical decision support system that is available to authorized users anywhere and anytime.
(This article belongs to the Special Issue Data Science and Knowledge Discovery)
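The abstract above reports sensitivity, accuracy, and specificity on an oversampled training set. As a minimal illustrative sketch, not the authors' actual pipeline, the two ingredients can be written in a few lines of plain Python: random oversampling to balance the classes, and the three metrics computed from a binary confusion matrix.

```python
import random

def random_oversample(X, y, seed=0):
    """Balance a dataset by duplicating minority-class rows at random."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        padded = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(padded)
        y_out.extend([label] * target)
    return X_out, y_out

def outcome_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from a binary confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy
```

After balancing, any classifier (a decision tree in the paper's best scenario) can be trained on `X_out, y_out` and scored with `outcome_metrics`.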

Article
Adapting Data-Driven Research to the Fields of Social Sciences and the Humanities
Future Internet 2021, 13(3), 59; https://doi.org/10.3390/fi13030059 - 26 Feb 2021
Cited by 1 | Viewed by 707
Abstract
Recent developments in the fields of computer science, such as advances in the areas of big data, knowledge extraction, and deep learning, have triggered the application of data-driven research methods to disciplines such as the social sciences and humanities. This article presents a collaborative, interdisciplinary process for adapting data-driven research to research questions within other disciplines, which considers the methodological background required to obtain a significant impact on the target discipline and guides the systematic collection and formalization of domain knowledge, as well as the selection of appropriate data sources and methods for analyzing, visualizing, and interpreting the results. Finally, we present a case study that applies the described process to the domain of communication science by creating approaches that aid domain experts in locating, tracking, analyzing, and, finally, better understanding the dynamics of media criticism. The study clearly demonstrates the potential of the presented method, but also shows that data-driven research approaches require a tighter integration with the methodological framework of the target discipline to achieve a truly significant impact.

Article
Dashboard COMPRIME_COMPRI_MOv: Multiscalar Spatio-Temporal Monitoring of the COVID-19 Pandemic in Portugal
Future Internet 2021, 13(2), 45; https://doi.org/10.3390/fi13020045 - 12 Feb 2021
Viewed by 834
Abstract
Due to its novelty, the recent pandemic of the coronavirus disease (COVID-19), which is associated with the spread of the new severe acute respiratory syndrome coronavirus (SARS-CoV-2), triggered the public's interest in accessing information, demonstrating the importance of obtaining and analyzing credible and updated information in an epidemiological surveillance context. For this purpose, health authorities, international organizations, and university institutions have published online various graphic and cartographic representations of the evolution of the pandemic, with daily updates that allow near real-time monitoring of the spread, lethality, and territorial distribution of the disease. The purpose of this article is to describe the technical solution and the main results associated with the publication of the COMPRIME_COMPRI_MOv dashboard for the dissemination of information and multi-scale knowledge of COVID-19. Under two rapidly implemented research projects for innovative responses to the COVID-19 pandemic, promoted in Portugal by the FCT (Foundation for Science and Technology), a website was created that brings together a diverse set of variables and indicators in a dynamic and interactive way, reflecting the evolutionary behavior of the pandemic in Portugal from a multi-scale perspective and constituting a system for monitoring its evolution. In the current situation, this type of exploratory solution proves crucial to guarantee everyone's access to information, while simultaneously emerging as an epidemiological surveillance tool capable of assisting decision-making by public authorities with competence in defining policies to control and fight the spread of the new coronavirus.

Article
Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks
Future Internet 2021, 13(1), 3; https://doi.org/10.3390/fi13010003 - 25 Dec 2020
Viewed by 972
Abstract
The article explores approaches to determining the author of a natural-language text, along with the advantages and disadvantages of these approaches. The importance of the problem stems from the active digitalization of society and the migration of many everyday activities online. Text authorship methods are particularly useful for information security and forensics; for example, they can be used to identify the authors of suicide notes and other texts subjected to forensic examination. Another area of application is plagiarism detection, a relevant issue both for intellectual property protection in the digital space and for the educational process. The article describes identifying the author of a Russian-language text using a support vector machine (SVM) and deep neural network architectures (long short-term memory (LSTM), convolutional neural networks (CNN) with attention, and the Transformer). The results show that all the considered algorithms are suitable for solving the authorship identification problem, but the SVM shows the best accuracy, reaching 96% on average. This is due to thoroughly chosen parameters and a feature space that includes statistical and semantic features (including those extracted through aspect analysis). The deep neural networks are inferior to the SVM in accuracy and reach only 93%. The study also evaluates the impact of attacks on the models' accuracy. Experiments show that the SVM-based methods are unstable to deliberate text anonymization, whereas the loss in accuracy of the deep neural networks does not exceed 20%. The Transformer architecture is the most effective for anonymized texts, achieving 81% accuracy.
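The paper's SVM relies on a carefully chosen feature space of statistical and semantic features. As a purely illustrative sketch (the letter set and the two length statistics below are assumptions, not the paper's actual feature set), a crude stylometric feature vector can be built like this:

```python
from collections import Counter

COMMON = "etaoinshrdlu"  # twelve frequent letters; an illustrative choice

def stylometric_features(text):
    """A crude statistical feature vector: mean word length, mean sentence
    length (in words), and relative frequencies of common letters."""
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    counts = Counter(text.lower())
    total = sum(counts[c] for c in COMMON) or 1
    return ([sum(map(len, words)) / max(len(words), 1),
             len(words) / max(len(sentences), 1)] +
            [counts[c] / total for c in COMMON])
```

Vectors like this would then be fed to an off-the-shelf SVM classifier; the paper's instability under anonymization follows naturally, since rewriting a text shifts exactly these surface statistics.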
Article
A Data Augmentation Approach to Distracted Driving Detection
Future Internet 2021, 13(1), 1; https://doi.org/10.3390/fi13010001 - 22 Dec 2020
Cited by 1 | Viewed by 995
Abstract
Distracted driving behavior has become a leading cause of vehicle crashes. This paper proposes a data augmentation method for distracted driving detection based on the driving operation area. First, the class activation mapping method is used to show the key feature areas for driving behavior analysis; then, the driving operation areas are detected by a Faster R-CNN detection model for data augmentation. Finally, a convolutional neural network classification model is implemented and evaluated on both the original dataset and the driving operation area dataset. The classification achieves 96.97% accuracy on the distracted driving dataset. The results show the necessity of driving operation area extraction in the preprocessing stage, which can effectively remove redundant information from the images to obtain a higher classification accuracy. The method can be used to detect drivers in real application scenarios and identify dangerous driving behaviors, which helps to give early warning of unsafe driving and avoid accidents.

Article
Role of Artificial Intelligence in Shaping Consumer Demand in E-Commerce
Future Internet 2020, 12(12), 226; https://doi.org/10.3390/fi12120226 - 08 Dec 2020
Cited by 3 | Viewed by 1327
Abstract
The advent and incorporation of technology in businesses have reformed operations across industries. Notably, major technical shifts in e-commerce aim to influence customer behavior in favor of some products and brands. Artificial intelligence (AI) comes on board as an essential innovative tool for personalization and for customizing products to meet specific demands. This research finds that, despite the contribution of AI systems to e-commerce, their ethical soundness is a contentious issue, especially regarding the concept of explainability. The study adopted word cloud analysis, voyance analysis, and concordance analysis to gain a detailed understanding of how the idea of explainability has been used by researchers in the context of AI. Motivated by a corpus analysis, this research lays the groundwork for a uniform front, thus contributing to a scientific breakthrough that seeks to formulate Explainable Artificial Intelligence (XAI) models. XAI is a machine learning field that inspects and tries to understand the models and steps involved in how the black-box decisions of AI systems are made; it provides insights into the decision points, variables, and data used to make a recommendation. This study suggests that, to deploy explainable XAI systems, ML models should be improved, making them interpretable and comprehensible.

Article
Predicting Activities of Daily Living with Spatio-Temporal Information
Future Internet 2020, 12(12), 214; https://doi.org/10.3390/fi12120214 - 27 Nov 2020
Viewed by 696
Abstract
The smart home has begun playing an important role in supporting independent living by monitoring the activities of daily living, typically for the elderly who live alone. Activity recognition in smart homes has been studied by many researchers with much effort spent on modeling user activities to predict behaviors. Most people, when performing their daily activities, interact with multiple objects both in space and through time. The interactions between user and objects in the home can provide rich contextual information in interpreting human activity. This paper shows the importance of spatial and temporal information for reasoning in smart homes and demonstrates how such information is represented for activity recognition. Evaluation was conducted on three publicly available smart-home datasets. Our method achieved an average recognition accuracy of more than 81% when predicting user activities given the spatial and temporal information.

Article
Monitoring and Support for Elderly People Using LoRa Communication Technologies: IoT Concepts and Applications
Future Internet 2020, 12(11), 206; https://doi.org/10.3390/fi12110206 - 20 Nov 2020
Cited by 2 | Viewed by 951
Abstract
The pandemic declared by the World Health Organization due to the SARS-CoV-2 virus (COVID-19) awakened us to a reality that most of us were previously unaware of: isolation, confinement, and the massive use of information and communication technologies, as well as increased knowledge of the difficulties and limitations of their use. This article focuses on the rapid implementation of low-cost technologies that allow us to answer a fundamental question: how can near real-time monitoring and follow-up of the elderly, their health conditions, and their homes, especially for those living in isolated and remote areas, be provided within their care to protect them from risky events? The system proposed here as a proof of concept uses low-cost devices for communication and data processing, supported by Long-Range (LoRa) technology and a connection to The Things Network. It incorporates various sensors, both personal and in the residence, allowing family members, neighbors, and authorized entities, including security forces, to access the health condition of system users, the habitability of their homes, and their urgent needs. This shows that it is possible, using low-cost systems, to implement sensor networks for monitoring the elderly using a LoRa gateway and other support infrastructures.

Article
Geospatial Assessment of the Territorial Road Network by Fractal Method
Future Internet 2020, 12(11), 201; https://doi.org/10.3390/fi12110201 - 17 Nov 2020
Viewed by 654
Abstract
This paper proposes an approach to the geospatial assessment of a territorial road network based on fractal theory. This approach allows us to obtain quantitative values of spatial complexity for any transport network and, in contrast to the classical indicators of the transport provision of a territory (Botcher, Henkel, Engel, Goltz, Uspensky, etc.), considers only the complexity level of the network itself, regardless of the area of the territory. The degree of complexity is measured by a fractal dimension, and a method for calculating it based on a combination of box counting and GIS analysis is proposed. We created a geoprocessing script tool for the GIS software system ESRI ArcGIS 10.7 and studied the spatial pattern of the transport network of the territory of Ukraine and of other countries of the world. The results of the study will help to better understand different aspects of the development of transport networks, their changes over time, and their impact on the socioeconomic indicators of urban development.
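The box-counting idea behind the paper's fractal dimension is simple to sketch: overlay grids of decreasing cell size, count the cells the network touches, and take the slope of log(count) against log(1/size). A minimal pure-Python version over a 2-D point sample (an assumption; the paper operates on GIS line geometry in ArcGIS, not point lists):

```python
import math

def box_count(points, size):
    """Number of grid boxes of side `size` that contain at least one point."""
    return len({(int(x // size), int(y // size)) for x, y in points})

def fractal_dimension(points, sizes):
    """Least-squares slope of log(box count) against log(1/size)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
```

A straight line sampled densely yields a dimension near 1, while a network that fills the plane more thoroughly drifts toward 2, which is exactly the spatial-complexity signal the paper exploits.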

Article
An Analysis of the Supply of Open Government Data
Future Internet 2020, 12(11), 186; https://doi.org/10.3390/fi12110186 - 29 Oct 2020
Viewed by 732
Abstract
An index of the release of open government data, published in 2016 by the Open Knowledge Foundation, shows that there is significant variability in countries' supply of this public good. What explains these cross-country differences? Adopting an interdisciplinary approach based on data science and economic theory, we developed the following research workflow. First, we gather, clean, and merge different datasets released by institutions such as the Open Knowledge Foundation, World Bank, United Nations, World Economic Forum, Transparency International, Economist Intelligence Unit, and International Telecommunication Union. Then, we conduct feature extraction and variable selection founded on economic domain knowledge. Next, we estimate several linear regression models, testing whether cross-country differences in the supply of open government data can be explained by differences in countries' economic, social, and institutional structures. Our analysis provides evidence that a country's civil liberties, government transparency, quality of democracy, efficiency of government intervention, economies of scale in the provision of public goods, and the size of the economy are statistically significant in explaining the cross-country differences in the supply of open government data. Our analysis also suggests that political participation, sociodemographic characteristics, and demographic and global income distribution dummies do not help to explain a country's supply of open government data. In summary, we show that cross-country differences in governance, social institutions, and the size of the economy can explain the global distribution of open government data.

Article
A Knowledge-Driven Multimedia Retrieval System Based on Semantics and Deep Features
Future Internet 2020, 12(11), 183; https://doi.org/10.3390/fi12110183 - 28 Oct 2020
Cited by 1 | Viewed by 727
Abstract
In recent years, users' information needs have changed due to the heterogeneity of web content, which increasingly involves multimedia. Although modern search engines support visual queries, it is not easy to find systems that allow searching within a particular domain of interest and that perform such a search by combining text and visual queries. Different approaches have been proposed over the years, and in the semantic research field many authors have proposed techniques based on ontologies. On the other hand, in the context of image retrieval systems, techniques based on deep learning have obtained excellent results. In this paper, we present novel approaches for semantic image retrieval and a possible combination for multimedia document analysis. Several results are presented to show the performance of our approach compared with literature baselines.

Article
Visualization, Interaction and Analysis of Heterogeneous Textbook Resources
Future Internet 2020, 12(10), 176; https://doi.org/10.3390/fi12100176 - 21 Oct 2020
Viewed by 610
Abstract
Historically grown research projects, run by researchers with limited understanding of data sustainability, data reusability, and standards, often lead to data silos. While the data are very valuable, they cannot be used by any service except the tool they were prepared for. Over the years, the number of such data graveyards will increase because new projects will always be designed from scratch. In this work, we propose a Component Metadata Infrastructure (CMDI)-based approach for data rescue and data reuse, in which data are retroactively joined into one repository, minimizing the implementation effort of future research projects.

Article
Employing a Chatbot for News Dissemination during Crisis: Design, Implementation and Evaluation
Future Internet 2020, 12(7), 109; https://doi.org/10.3390/fi12070109 - 30 Jun 2020
Cited by 5 | Viewed by 2983
Abstract
The use of chatbots in news media platforms, although relatively recent, offers many advantages to journalists and media professionals and, at the same time, facilitates users' interaction with useful and timely information. This study shows the usability of a news chatbot during a crisis situation, employing the 2020 COVID-19 pandemic as a case study. The basic targets of the research are to design and implement a chatbot in a news media platform with a two-fold aim in regard to evaluation: first, the technical effort of creating a functional and robust news chatbot in a crisis situation both from the AI perspective and interoperability with other platforms, which constitutes the novelty of the approach; and second, users' perception regarding the appropriation of this news chatbot as an alternative means of accessing existing information during a crisis situation. The chatbot designed was evaluated in terms of effectively fulfilling the social responsibility function of crisis reporting, to deliver timely and accurate information on the COVID-19 pandemic to a wide audience. In this light, this study shows the advantages of implementing chatbots in news platforms during a crisis situation, when the audience's needs for timely and accurate information rapidly increase.

Other


Technical Note
About Rule-Based Systems: Single Database Queries for Decision Making
Future Internet 2020, 12(12), 212; https://doi.org/10.3390/fi12120212 - 27 Nov 2020
Viewed by 789
Abstract
One of the developmental directions of Future Internet technologies is the implementation of artificial intelligence systems for manipulating data and the surrounding world in more complex ways. Rule-based systems, very accessible for people's decision-making, play an important role in the family of computational intelligence methods. Decision-making rules, along with decision trees, are among the simplest forms of presenting complex decision-making processes. Decision support systems, according to the cross-industry standard process for data mining (CRISP-DM) framework, require the final embedding of the learned model in a given computer infrastructure, integrated circuits, etc. In this work, we deal with placing a learned rule-based decision support model in a database environment, specifically in SQL database tables. Our main goal is to place the previously trained model in the database and apply it by means of single queries. We assume that the decision-making rules applied are mutually consistent, and additionally the Minimal Description Length (MDL) rule is introduced. We propose a universal solution for any IF-THEN rule induction algorithm.
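The core idea of the technical note, storing induced IF-THEN rules as table rows and applying them with a single query, can be sketched with Python's built-in sqlite3 module. The two-attribute rule set below is a hypothetical toy example; the paper's actual schema and rule induction algorithm are not reproduced here.

```python
import sqlite3

# Hypothetical toy rule set: interval conditions on age and body temperature
# mapping to a decision label, one rule per row.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE rules (
    min_age REAL, max_age REAL, min_temp REAL, max_temp REAL, decision TEXT)""")
conn.executemany("INSERT INTO rules VALUES (?, ?, ?, ?, ?)", [
    (0, 40, 36.0, 37.5, "healthy"),
    (0, 40, 37.5, 42.0, "sick"),
    (40, 120, 36.0, 42.0, "refer"),
])

def classify(age, temp):
    """Apply the stored rule set to a single case with one SQL query.
    Rules are assumed mutually consistent, so at most one row matches."""
    row = conn.execute(
        "SELECT decision FROM rules "
        "WHERE ? >= min_age AND ? < max_age AND ? >= min_temp AND ? < max_temp",
        (age, age, temp, temp)).fetchone()
    return row[0] if row else None
```

Because the trained model lives entirely in a table, any SQL-capable client can run the decision logic without re-implementing the induction algorithm, which is the portability argument the note makes.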
