Special Issue "Trends of Data Science and Knowledge Discovery"

A special issue of Future Internet (ISSN 1999-5903). This special issue belongs to the section "Big Data and Augmented Intelligence".

Deadline for manuscript submissions: 5 October 2022

Special Issue Editor

Prof. Dr. Filipe Portela
Guest Editor
Founder & CEO of IOTech; Information Systems and Technologies, Algoritmi Research Centre, University of Minho, 4800 Guimarães, Portugal
Interests: knowledge discovery; data science; progressive web apps; research and development

Special Issue Information

Dear Colleagues,

In an ever more digital world, the importance of data in our lives is increasing significantly, and new approaches and solutions are emerging everywhere in different formats and contexts. Data science (DS) is an interdisciplinary field that combines various areas, including computer science, machine learning, mathematics and statistics, domain/business knowledge, software development, and traditional research. As a research topic, DS applies scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

Knowledge discovery (KD) is the basis of data science and consists of creating knowledge from structured and unstructured sources (e.g., text, data, and images).

Trending topics such as gamification, chatbots, and blockchain are taking advantage of data science and knowledge discovery to improve their solutions and create emerging and pervasive environments.

This Special Issue is an excellent opportunity to provide scientific knowledge and disseminate findings and achievements across several communities. It will discuss trends and new approaches in this area and present innovative solutions to show the importance of data science and knowledge discovery to researchers, managers, industry, society, and other communities.

Dr. Carlos Filipe Da Silva Portela
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Future Internet is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Applied data science
  • Artificial intelligence
  • Big data
  • Blockchain
  • Business intelligence
  • Chatbots
  • Data analytics
  • Data mining, text mining and image mining
  • Expert systems
  • Gamification
  • Intelligent data systems
  • Machine learning
  • Pervasive data
  • Smart cities

Published Papers (9 papers)

Research

Article
Correlation between Human Emotion and Temporal·Spatial Contexts by Analyzing Environmental Factors
Future Internet 2022, 14(7), 203; https://doi.org/10.3390/fi14070203 - 30 Jun 2022
Abstract
In this paper, we propose a method for extracting emotional factors through audiovisual quantitative feature analysis from images of the surrounding environment. Nine features were extracted: time complexity, spatial complexity (horizontal and vertical), color components (hue and saturation), intensity, contrast, sound amplitude, and sound frequency. These nine features were used to infer “pleasant-unpleasant” and “arousal-relaxation” scores through two support vector regressions (SVR). First, the inference accuracy for each of the nine features was calculated as a hit ratio to check the distinguishing power of the features. Next, the difference between the position in the two-dimensional emotional plane inferred through SVR and the ground truth determined subjectively by the subject was examined. The experiment confirmed that the time-complexity feature had the best classification performance and that the emotion inferred through SVR can be valid when the two-dimensional emotional plane is divided into a 3 × 3 grid.
(This article belongs to the Special Issue Trends of Data Science and Knowledge Discovery)

Article
A Multi-View Framework to Detect Redundant Activity Labels for More Representative Event Logs in Process Mining
Future Internet 2022, 14(6), 181; https://doi.org/10.3390/fi14060181 - 09 Jun 2022
Abstract
Process mining aims to gain knowledge of business processes via the discovery of process models from event logs generated by information systems. The insights revealed by process mining rely heavily on the quality of the event logs. Activities extracted from different data sources, or entered as free text within the same system, may lead to inconsistent labels. Such inconsistency then leads to redundant activity labels, i.e., labels that have different syntax but share the same behaviour. Redundant activity labels can introduce unnecessary complexity to the event logs. The identification of these labels in data-driven process discovery is difficult and relies heavily on human intervention. Neither existing process discovery algorithms nor event data preprocessing techniques can solve such redundancy efficiently. In this paper, we propose a multi-view approach to automatically detect redundant activity labels by using not only context-aware features such as control–flow relations and attribute values but also semantic features from the event logs. Our evaluation of several publicly available datasets and a real-life case study demonstrate that our approach can efficiently detect redundant activity labels even with low-occurrence frequencies. The proposed approach can add value to the preprocessing step to generate more representative event logs.

Article
Gamifying Community Education for Enhanced Disaster Resilience: An Effectiveness Testing Study from Australia
Future Internet 2022, 14(6), 179; https://doi.org/10.3390/fi14060179 - 09 Jun 2022
Abstract
Providing convenient and effective online education is important for the public to be better prepared for disaster events. Nonetheless, the effectiveness of such education is questionable due to the limited use of online tools and platforms, which also results in narrow community outreach. Correspondingly, understanding public perceptions of disaster education methods and experiences for the adoption of novel methods is critical, but this is an understudied area of research. The aim of this study is to understand public perceptions towards online disaster education practices for disaster preparedness and evaluate the effectiveness of the gamification method in increasing public awareness. This study utilizes social media analytics and conducts a gamification exercise. The analysis involved Twitter posts (n = 13,683) related to the 2019–2020 Australian bushfires, and surveyed participants (n = 52) before and after experiencing a gamified application—i.e., STOP Disasters! The results revealed that: (a) The public satisfaction level is relatively low for traditional bushfire disaster education methods; (b) The study participants’ satisfaction level is relatively high for an online gamified application used for disaster education; and (c) The use of virtual and augmented reality was found to be promising for increasing the appeal of gamified applications, along with using a blended traditional and gamified approach.

Article
Optimization of the System of Allocation of Overdue Loans in a Sub-Saharan Africa Microfinance Institution
Future Internet 2022, 14(6), 163; https://doi.org/10.3390/fi14060163 - 27 May 2022
Abstract
In microfinance, a growing number of loans brings a high risk of increasing overdue loans by overloading the resources available to act on repayment. Three experiments were therefore conducted to search for a distribution of the loans among the available officers that maximizes the probability of recovery. Firstly, the relation between the loan and some characteristics of the officers was analyzed. The results were not strong, with F1 scores between 0 and 0.74 and a lot of variation in the scores of the good predictions. Secondly, the loan was classified as paid/unpaid based on a prediction derived from the analysis of the characteristics of the loan. The Support Vector Machine had potential to be a solution, with an average F1 score of 0.625; however, when predicting the unpaid loans, it proved to be random, with a score of 0.55. Finally, the experiment focused on segmenting the overdue loans into different groups, from which their prioritization could be derived. The visualization of three clusters in the data was clear through Principal Component Analysis. To reinforce this good visualization, the final silhouette score was 0.194, which suggests that the model can be trusted. Thus, clustering the loans into three groups, with a corresponding prioritization scale, would be the best strategy to organize and assign the loans to maximize recovery.

Article
The Whole Is Greater than the Sum of the Parts: A Multilayer Approach on Criminal Networks
Future Internet 2022, 14(5), 123; https://doi.org/10.3390/fi14050123 - 20 Apr 2022
Abstract
Traditional social network analysis can be generalized to model some networked systems by multilayer structures where the individual nodes develop relationships in multiple layers. A multilayer network is called multiplex if each layer shares at least one node with some other layer. In this paper, we built a unique criminal multiplex network from the pre-trial detention order by the Preliminary Investigation Judge of the Court of Messina (Sicily) issued at the end of the Montagna anti-mafia operation in 2007. Montagna focused on two families who infiltrated several economic activities through a cartel of entrepreneurs close to the Sicilian Mafia. Our network possesses three layers which share 20 nodes. The first captures meetings between suspected criminals, the second records phone calls, and the third detects crimes committed by pairs of individuals. We used measures from multilayer network analysis to characterize the actors in the network based on their local edges and their relevance to each specific layer. Then, we used measures of layer similarity to study the relationships between different layers. By studying the actor connectivity and the layer correlation, we demonstrated that a complete picture of the structure and the activities of a criminal organization can be obtained only by considering the three layers as a whole multilayer network and not as single-layer networks. Specifically, we showed the usefulness of the multilayer approach by bringing out the importance of actors, which does not emerge when the three layers are studied separately.

Article
Automated Business Goal Extraction from E-mail Repositories to Bootstrap Business Understanding
Future Internet 2021, 13(10), 243; https://doi.org/10.3390/fi13100243 - 23 Sep 2021
Abstract
The Cross-Industry Standard Process for Data Mining (CRISP-DM), despite being the most popular data mining process for more than two decades, is known to leave organizations that lack operational data mining experience puzzled and unable to start their data mining projects. This is especially apparent in the first phase, Business Understanding, at the conclusion of which the data mining goals of the project at hand should be specified, which arguably requires at least a conceptual understanding of the knowledge discovery process. We propose to bridge this knowledge gap from a Data Science perspective by applying Natural Language Processing (NLP) techniques to the organizations’ e-mail exchange repositories to extract explicitly stated business goals from the conversations, thus bootstrapping the Business Understanding phase of CRISP-DM. Our NLP-Automated Method for Business Understanding (NAMBU) generates a list of business goals which can subsequently be used for further specification of data mining goals. Validation of the results against manual business goal extraction from the Enron corpus demonstrates the usefulness of our NAMBU method when applied to large datasets.

Article
A Review on Clustering Techniques: Creating Better User Experience for Online Roadshow
Future Internet 2021, 13(9), 233; https://doi.org/10.3390/fi13090233 - 13 Sep 2021
Abstract
The online roadshow is a relatively new concept that has higher flexibility and scalability compared to the physical roadshow. This is because an online roadshow is accessible through digital devices anywhere and anytime. In a physical roadshow, organizations can measure the effectiveness of the roadshow by interacting with the customers. However, organizations cannot monitor the effectiveness of the online roadshow by using the same method. A good user experience is important to increase the advertising effects on the online roadshow website. In web usage mining, clustering can discover user access patterns from the weblog. By applying a clustering technique, the online roadshow website can be further improved to provide a better user experience. This paper presents a review of clustering techniques used in web usage mining, namely the partition-based, hierarchical, density-based, and fuzzy clustering techniques. These clustering techniques are analyzed from three perspectives: their similarity measures, the evaluation metrics used to determine the optimality of the clusters, and the functional purpose of applying the techniques to improve the user experience of the website. By applying clustering techniques in different stages of the user activities in the online roadshow website, the advertising effectiveness of the website can be enhanced in terms of its affordance, flow, and interactivity.

Article
Improving RE-SWOT Analysis with Sentiment Classification: A Case Study of Travel Agencies
Future Internet 2021, 13(9), 226; https://doi.org/10.3390/fi13090226 - 30 Aug 2021
Abstract
Nowadays, many companies collect online user reviews to determine how users evaluate their products. Dalpiaz and Parente proposed the RE-SWOT method to automatically generate a SWOT matrix based on online user reviews. The SWOT matrix is an important basis for a company to perform competitive analysis; therefore, RE-SWOT is a very helpful tool for organizations. Dalpiaz and Parente calculated feature performance scores based on user reviews and ratings to generate the SWOT matrix. However, the authors did not propose a solution for situations in which user ratings are not available. Unfortunately, it is not uncommon for forums to have only user reviews but no user ratings. In this paper, sentiment analysis is used to deal with the situation where user ratings are not available. We also use KKday, a start-up online travel agency in Taiwan, as an example to demonstrate how to use the proposed method to build a SWOT matrix.

Article
Implementation of a Virtual Assistant for the Academic Management of a University with the Use of Artificial Intelligence
Future Internet 2021, 13(4), 97; https://doi.org/10.3390/fi13040097 - 13 Apr 2021
Abstract
Currently, private universities, as a result of the pandemic that the world is facing, are going through very delicate moments in several areas, both academic and financial. Academically, there are learning problems, and these are directly related to the dropout rate, which brings financial problems. Added to this are the economic problems caused by the pandemic, where the rates of students who want to access a private education have dropped considerably. For this reason, it is necessary for all private universities to have support to improve their student income and avoid cuts in budgets and resources. However, the academic part represents a great effort to fulfill their academic activities, which are the priority, with attention on those interested in pursuing a training program. To solve these problems, it is important to integrate technologies such as chatbots, which use artificial intelligence in such a way that tasks such as providing information on academic courses are addressed by them, reducing the administrative burden and improving the user experience. At the same time, this encourages people to be a part of the college.