Special Issue "Integrated Artificial Intelligence in Data Science"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 April 2023 | Viewed by 27086

Special Issue Editors

Prof. Dr. Jerry Chun-Wei Lin
Guest Editor
Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Bergen, Norway
Interests: AI and machine learning; data analytics; optimization; soft computing
Dr. Stefania Tomasiello
Guest Editor
Institute of Computer Science, University of Tartu, Narva mnt 18, 50090 Tartu, Estonia
Interests: soft computing; machine learning; dynamical systems and control

Special Issue Information

Dear Colleagues,

Artificial Intelligence (AI) is a thriving research area because it can solve high-complexity problems and find optimized solutions across many applications and domains, and thus has the potential to create a better society. Its benefits in science, medicine, technology, and the social sciences have already been demonstrated. Data science, also referred to as pattern analytics and mining, retrieves useful and meaningful information from databases, supporting efficient decision-making and strategy-building in different domains. In particular, the exponential growth of data in recent years has made the pairing of big data and AI a source of many research topics, such as the scaling behavior of classical algorithms.

In addition, a recent challenge is the integration of multiple AI technologies emerging from different fields (e.g., vision, security, control, bioinformatics) to develop efficient and robust systems that interact with the real world. Despite the tremendous progress in core AI technologies in recent years, the integration of such capabilities into larger systems that are reliable, transparent, and maintainable is still in its infancy, and numerous issues remain open from both theoretical and practical perspectives.

Topics of interest include, but are not restricted to, the following areas:

  • Data analytics using AI techniques;
  • Evolutionary computation in big datasets;
  • Data-driven AI systems;
  • Machine learning algorithms;
  • Fuzzy modeling and uncertain systems;
  • Data reduction techniques;
  • Deep-learning algorithms in big datasets;
  • Information granularity in high-dimensional data;
  • Pattern mining by machine learning and optimization techniques;
  • Neural network data analytics and prediction;
  • AI-based applications in data science.
Prof. Jerry Chun-Wei Lin
Dr. Stefania Tomasiello
Dr. Gautam Srivastava
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2300 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • AI
  • data-driven analytics
  • machine learning
  • optimization
  • deep learning

Published Papers (17 papers)


Research

Article
Investigation of Classification and Anomalies Based on Machine Learning Methods Applied to Large Scale Building Information Modeling
Appl. Sci. 2022, 12(13), 6382; https://doi.org/10.3390/app12136382 - 23 Jun 2022
Viewed by 577
Abstract
Building Information Models (BIM) capable of collecting and synchronizing all the data related to a construction project into a unified numerical model consisting of a 3D representation and additional metadata (e.g., materials, physical properties, cost) have become commonplace in the building sector. Their extensive use today, alongside the increase in experience with BIM models, offers new perspectives and potentials for design and planning. However, large-scale complex data collection leads to two main challenges: the first is related to the automatic classification of BIM elements, namely windows, walls, beams, columns, etc., and the second to detecting abnormal elements without manual intervention, particularly in the case of misclassification. In this work, we propose machine learning for the automated classification of elements, and for the detection of anomalies based on geometric inputs and additional metadata properties that are extracted from the building model. More precisely, a Python program is used to decipher the BIM models (available as IFC files) for a series of complex buildings, and three types of machine learning methods are then tested to classify and detect objects from a large set of BIM data. The approach is tested on a variety of practical test cases. Full article
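As a rough illustration of the classification step described in the abstract, a nearest-centroid rule over simple geometric features might look like the sketch below. The feature choices and class names are illustrative assumptions; the paper itself tests several machine learning methods on features extracted from IFC files.

```python
def nearest_centroid_classify(element, centroids):
    """Assign a BIM element, given as a feature vector (e.g., width, height),
    to the class whose feature centroid is closest in Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(element, centroids[label]))

# Hypothetical centroids learned from labelled BIM elements.
centroids = {"wall": (0.2, 3.0), "beam": (0.3, 0.4)}
label = nearest_centroid_classify((0.25, 2.8), centroids)
```

Anomaly detection in the same spirit could flag elements whose distance to every centroid exceeds a threshold.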
(This article belongs to the Special Issue Integrated Artificial Intelligence in Data Science)

Article
Red Fox Optimizer with Data-Science-Enabled Microarray Gene Expression Classification Model
Appl. Sci. 2022, 12(9), 4172; https://doi.org/10.3390/app12094172 - 21 Apr 2022
Cited by 4 | Viewed by 945
Abstract
Microarray data examination is a relatively new technology that intends to determine the proper treatment for various diseases and a precise medical diagnosis by analyzing a massive number of genes in various experimental conditions. The conventional data classification techniques suffer from overfitting and the high dimensionality of gene expression data. Therefore, the feature (gene) selection approach plays a vital role in handling a high dimensionality of data. Data science concepts can be widely employed in several data classification problems, and they identify different class labels. In this aspect, we developed a novel red fox optimizer with deep-learning-enabled microarray gene expression classification (RFODL-MGEC) model. The presented RFODL-MGEC model aims to improve classification performance by selecting appropriate features. The RFODL-MGEC model uses a novel red fox optimizer (RFO)-based feature selection approach for deriving an optimal subset of features. Moreover, the RFODL-MGEC model involves a bidirectional cascaded deep neural network (BCDNN) for data classification. The parameters involved in the BCDNN technique were tuned using the chaos game optimization (CGO) algorithm. Comprehensive experiments on benchmark datasets indicated that the RFODL-MGEC model accomplished superior results for subtype classifications. Therefore, the RFODL-MGEC model was found to be effective for the identification of various classes for high-dimensional and small-scale microarray data. Full article

Article
A One-Phase Tree-Structure Method to Mine High Temporal Fuzzy Utility Itemsets
Appl. Sci. 2022, 12(6), 2821; https://doi.org/10.3390/app12062821 - 09 Mar 2022
Cited by 1 | Viewed by 854
Abstract
Compared to fuzzy utility itemset mining (FUIM), temporal fuzzy utility itemset mining (TFUIM) has attracted attention in recent years. It treats transaction time, sold quantities of items, unit profit, and transformed semantic terms as essential factors. A tree-structure method with two phases was previously presented to solve this problem; however, it was time-consuming because of the number of candidates generated. This paper thus proposes a one-phase tree-structure method to find the high temporal fuzzy utility itemsets in a temporal database. The tree is designed to maintain candidate 1-itemsets whose upper-bound values meet the defined threshold constraint, and each node in this tree keeps the data required to mine a 1-itemset. We also designed an algorithm to construct the tree and gave an example to illustrate the mining process in detail. Computational experiments on three real datasets demonstrate that the one-phase tree-structure method outperforms the previous one in execution time. Full article

Article
Applying Machine Learning Techniques to the Audit of Antimicrobial Prophylaxis
Appl. Sci. 2022, 12(5), 2586; https://doi.org/10.3390/app12052586 - 02 Mar 2022
Cited by 1 | Viewed by 721
Abstract
High rates of inappropriate use of surgical antimicrobial prophylaxis have been reported in many countries. Auditing prophylactic antimicrobial use in enormous numbers of medical records by manual review is labor-intensive and time-consuming. The purpose of this study is to develop accurate and efficient machine learning models for auditing appropriate surgical antimicrobial prophylaxis. Supervised machine learning classifiers (Auto-WEKA, multilayer perceptron, decision tree, SimpleLogistic, Bagging, and AdaBoost) were applied to an antimicrobial prophylaxis dataset containing 601 instances with 26 attributes. Multilayer perceptron, SimpleLogistic selected by Auto-WEKA, and decision tree algorithms had outstanding discrimination, with weighted average AUC > 0.97. The Bagging and SMOTE algorithms could improve the predictive performance of the decision tree on imbalanced datasets. Although they had better performance measures, multilayer perceptron and Auto-WEKA took more execution time than the other algorithms. Multilayer perceptron, SimpleLogistic, and decision tree algorithms have outstanding performance measures for identifying the appropriateness of surgical prophylaxis. The efficient models developed by machine learning can be used to assist the antimicrobial stewardship team in the audit of surgical antimicrobial prophylaxis. In future research, there remain challenges and opportunities in enriching our datasets with more useful clinical information to improve the performance of the algorithms. Full article
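The abstract mentions SMOTE for imbalanced datasets. The core SMOTE idea, generating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched as below. All names and parameter values are illustrative assumptions, not the paper's implementation.

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=42):
    """Generate synthetic minority-class samples by interpolating between
    a randomly chosen point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x by squared Euclidean distance (excluding x)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

Each synthetic point lies on the segment between two real minority samples, which is why SMOTE tends to densify, rather than merely duplicate, the minority region.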

Article
Week-Wise Student Performance Early Prediction in Virtual Learning Environment Using a Deep Explainable Artificial Intelligence
Appl. Sci. 2022, 12(4), 1885; https://doi.org/10.3390/app12041885 - 11 Feb 2022
Cited by 1 | Viewed by 1442
Abstract
Early prediction of students’ learning performance and analysis of student behavior in a virtual learning environment (VLE) are crucial to minimize the high failure rate in online courses during the COVID-19 pandemic. Nevertheless, traditional machine learning models fail to predict student performance in the early weeks due to the lack of students’ activities’ data in a week-wise timely manner (i.e., spatiotemporal feature issues). Furthermore, the imbalanced data distribution in the VLE impacts the prediction model performance. Thus, there are severe challenges in handling spatiotemporal features, imbalanced data sets, and a lack of explainability for enhancing the confidence of the prediction system. Therefore, an intelligent framework for explainable student performance prediction (ESPP) is proposed in this study in order to provide the interpretability of the prediction results. First, this framework utilized a time-series weekly student activity data set and dealt with the VLE imbalanced data distribution using a hybrid data sampling method. Then, a combination of convolutional neural network (CNN) and long short-term memory (LSTM) was employed to extract the spatiotemporal features and develop the early prediction deep learning (DL) model. Finally, the DL model was explained by visualizing and analyzing typical predictions, students’ activities’ maps, and feature importance. The numerical results of cross-validation showed that the proposed new DL model (i.e., the combined CNN-LSTM and ConvLSTM), in the early prediction cases, performed better than the baseline models of LSTM, support vector machine (SVM), and logistic regression (LR) models. Full article

Article
A Comparative Study of Ensemble Models for Predicting Road Traffic Congestion
Appl. Sci. 2022, 12(3), 1337; https://doi.org/10.3390/app12031337 - 27 Jan 2022
Cited by 1 | Viewed by 972
Abstract
Increased road traffic congestion is due to different factors, such as population and economic growth, in cities around the globe. At the same time, many households can now afford personal vehicles, contributing to the high volume of cars. The primary purpose of this study is to perform a comparative analysis of ensemble methods using road traffic congestion data. Ensemble methods are capable of enhancing the performance of weak classifiers. The comparative analysis was conducted on a real-world dataset using bagging, boosting, stacking and random forest ensemble models to compare the predictive performance of the methods. The ensemble prediction models were developed to predict road traffic congestion and are evaluated using the following performance metrics: accuracy, precision, recall, f1-score, and the misclassification cost, viewed as a penalty for errors incurred during the classification process. The combination of AdaBoost with decision trees exhibited the best performance on all performance metrics. Additionally, the results showed that variables including travel time, traffic volume, and average speed helped predict vehicle traffic flow on the roads. The model was thus developed to help transport planners, researchers, and transport stakeholders allocate resources accordingly. Furthermore, adopting this model would benefit commuters and businesses in tandem with other interventions proffered by the transport authorities. Full article
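The misclassification cost mentioned in the abstract can be made concrete with a small sketch: compute the confusion-matrix counts, then weight false positives and false negatives asymmetrically. The per-error costs below are illustrative assumptions, not the paper's values.

```python
def cost_sensitive_eval(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Return (accuracy, total misclassification cost) for binary labels,
    penalising a missed congestion event (false negative) more heavily."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    total_cost = cost_fp * fp + cost_fn * fn
    return accuracy, total_cost
```

Two models with identical accuracy can then be ranked differently once the cost of each error type is taken into account.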

Article
An Advanced Optimization Approach for Long-Short Pairs Trading Strategy Based on Correlation Coefficients and Bollinger Bands
Appl. Sci. 2022, 12(3), 1052; https://doi.org/10.3390/app12031052 - 20 Jan 2022
Viewed by 1670
Abstract
In the financial market, commodity prices change over time, yielding profit opportunities. Various trading strategies have been proposed to yield good earnings. Pairs trading is one such critical, widely-used strategy with good effect. Given two highly correlated paired target stocks, the strategy suggests buying one when its price falls behind, selling it when its stock price converges, and operating the other stock inversely. In the existing approach, the genetic Bollinger Bands and correlation-coefficient-based pairs trading strategy (GBCPT) utilizes optimization technology to determine the parameters for correlation-based candidate pairs and discover Bollinger Bands-based trading signals. The correlation coefficients are used to calculate the relationship between two stocks through their historical stock prices, and the Bollinger Bands are indicators composed of the moving averages and standard deviations of the stocks. In this paper, to achieve more robust and reliable trading performance, AGBCPT, an advanced GBCPT algorithm, is proposed to take into account volatility and more critical parameters that influence profitability. It encodes six critical parameters into a chromosome. To evaluate the fitness of a chromosome, the encoded parameters are utilized to observe the trading pairs and their trading signals generated from Bollinger Bands. The fitness value is then calculated by the average return and volatility of the long and short trading pairs. The genetic process is repeated to find suitable parameters until the termination condition is met. Experiments on 44 stocks selected from the Taiwan 50 Index are conducted, showing the merits and effectiveness of the proposed approach. Full article
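A minimal sketch of the Bollinger Bands indicator referenced in the abstract: the middle band is a moving average, and the upper and lower bands sit k standard deviations above and below it. The window and band width here are conventional defaults, not necessarily the parameters the genetic algorithm would select.

```python
def bollinger_bands(prices, window=20, k=2.0):
    """Return a list of (lower, middle, upper) bands over a sliding window,
    where middle = moving average and the outer bands are middle +/- k*std."""
    bands = []
    for i in range(window - 1, len(prices)):
        win = prices[i - window + 1 : i + 1]
        ma = sum(win) / window
        sd = (sum((p - ma) ** 2 for p in win) / window) ** 0.5
        bands.append((ma - k * sd, ma, ma + k * sd))
    return bands
```

In a pairs-trading setting, the spread between the two paired stocks (rather than a single price series) is typically fed to such an indicator, and band crossings generate the open/close signals.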

Article
Active Learning Based on Crowdsourced Data
Appl. Sci. 2022, 12(1), 409; https://doi.org/10.3390/app12010409 - 01 Jan 2022
Viewed by 691
Abstract
The paper proposes a crowdsourcing-based approach to annotated data acquisition as a means of supporting the Active Learning training approach. In the proposed solution, aimed at data engineers, the knowledge of the crowd serves as an oracle that judges whether a given sample is informative. The proposed solution reduces the amount of work needed to annotate large sets of data. Furthermore, it allows a perpetual increase in the trained network's quality through the inclusion of new samples gathered after network deployment. The paper also discusses means of limiting network training times, especially in the post-deployment stage, where the size of the training set can increase dramatically. This is done by introducing a fourth set composed of samples gathered during the network's actual usage. Full article
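The crowd-as-oracle idea can be sketched as an uncertainty-based routing step: samples on which the current model is unsure go to the crowd for annotation, while confident predictions are auto-labelled. The margin threshold and function names below are illustrative assumptions, not the paper's design.

```python
def select_informative(samples, predict_proba, threshold=0.2):
    """Split samples into (to_crowd, auto_labelled) by the margin between
    the model's top two class probabilities; small margin = informative."""
    to_crowd, auto = [], []
    for s in samples:
        probs = sorted(predict_proba(s), reverse=True)
        margin = probs[0] - probs[1]
        (to_crowd if margin < threshold else auto).append(s)
    return to_crowd, auto
```

In an active learning loop, the crowd-annotated batch is added to the training set and the model is retrained, repeating until the labelling budget is spent.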

Article
Enhanced Image Captioning with Color Recognition Using Deep Learning Methods
Appl. Sci. 2022, 12(1), 209; https://doi.org/10.3390/app12010209 - 26 Dec 2021
Cited by 3 | Viewed by 2636
Abstract
Automatically describing the content of an image is an interesting and challenging task in artificial intelligence. In this paper, an enhanced image captioning model—including object detection, color analysis, and image captioning—is proposed to automatically generate the textual descriptions of images. In an encoder–decoder model for image captioning, VGG16 is used as an encoder and an LSTM (long short-term memory) network with attention is used as a decoder. In addition, Mask R-CNN with OpenCV is used for object detection and color analysis. The integration of the image caption and color recognition is then performed to provide better descriptive details of images. Moreover, the generated textual sentence is converted into speech. The validation results illustrate that the proposed method can provide more accurate description of images. Full article

Article
Efficient Detection of DDoS Attacks Using a Hybrid Deep Learning Model with Improved Feature Selection
Appl. Sci. 2021, 11(24), 11634; https://doi.org/10.3390/app112411634 - 08 Dec 2021
Cited by 13 | Viewed by 2264
Abstract
DDoS (Distributed Denial of Service) attacks have become a serious risk to the integrity and confidentiality of computer networks and systems, which are essential assets in today's world. Detecting DDoS attacks is a difficult task that must be accomplished before any mitigation strategies can be applied. Machine learning/deep learning (ML/DL) has already been used successfully to identify DDoS attacks; however, an inherent limitation of ML/DL frameworks, namely optimal feature selection, keeps complete accuracy out of reach, and such systems do not always produce promising results for identifying DDoS attacks. Existing research on forecasting DDoS attacks has yielded a variety of unexpected predictions using machine learning (ML) classifiers and conventional approaches to feature encoding; these efforts also used deep neural networks to extract features without keeping track of sequence information. The current work predicts DDoS attacks using a hybrid deep learning (DL) model, namely a CNN with BiLSTM (bidirectional long short-term memory), to effectively anticipate DDoS attacks on benchmark data. Only the most pertinent features were picked, by ranking and choosing the features that scored highest in the provided data set. Experimental findings demonstrate that the proposed CNN-BI-LSTM attained an accuracy of up to 94.52 percent on the CIC-DDoS2019 data set during training, testing, and validation. Full article

Article
U-SSD: Improved SSD Based on U-Net Architecture for End-to-End Table Detection in Document Images
Appl. Sci. 2021, 11(23), 11446; https://doi.org/10.3390/app112311446 - 02 Dec 2021
Viewed by 861
Abstract
Tables are an important element of a document and can express more information in fewer words. Due to the different arrangements of tables and texts, as well as the variety of layouts, table detection is a challenge in the field of document analysis. Now that Optical Character Recognition technology has gradually matured, it can help us obtain text information quickly, and the ability to accurately detect table structures improves the efficiency of obtaining text content. The process of document digitization is influenced by the editor's style of table layout. In addition, many industries rely on a large number of people to process data, at high expense; the industry therefore imports artificial intelligence and Robotic Process Automation to handle simple and complicated routine text digitization work. Therefore, this paper proposes an end-to-end table detection model, U-SSD, based on the deep learning object detection method: it takes the Single Shot MultiBox Detector (SSD) as the basic model architecture, improves it with U-Net, and adds dilated convolution to enhance the feature learning capability of the network. The experiment in this study uses a dataset of accident claim documents provided by a Taiwanese law firm to conduct table detection. The experimental results show that the proposed method is effective. In addition, evaluation results on the open datasets TableBank, Github, and ICDAR13 show that the SSD-based network architectures can achieve good performance. Full article

Article
An Improved VGG16 Model for Pneumonia Image Classification
Appl. Sci. 2021, 11(23), 11185; https://doi.org/10.3390/app112311185 - 25 Nov 2021
Cited by 3 | Viewed by 2958
Abstract
Image recognition has been applied to many fields, but relatively rarely to medical images. Recent significant deep learning progress for image recognition has raised strong research interest in medical image recognition. We first found that the VGG16 model failed on some pneumonia X-ray images. Thus, this paper proposes IVGG13 (Improved Visual Geometry Group-13), a modified VGG16 model for classifying pneumonia X-ray images. Open-source thoracic X-ray images acquired from the Kaggle platform were employed for pneumonia recognition, but only a few data were obtained, and the datasets were unbalanced after classification, either of which can result in extremely poor recognition by trained neural network models. Therefore, we applied augmentation pre-processing to compensate for the low data volume and poorly balanced datasets. The original datasets, without data augmentation, were trained using the proposed model and some well-known convolutional neural networks, such as LeNet, AlexNet, GoogLeNet and VGG16. In the experimental results, the recognition rates and other evaluation criteria, such as precision, recall and f-measure, were evaluated for each model. This process was repeated for the augmented and balanced datasets, with greatly improved metrics such as precision, recall and F1-measure. The proposed IVGG13 model produced superior outcomes on the F1-measure compared with the current best-practice convolutional neural networks for medical image recognition, confirming that data augmentation effectively improves model accuracy. Full article

Article
SLA-DQTS: SLA Constrained Adaptive Online Task Scheduling Based on DDQN in Cloud Computing
Appl. Sci. 2021, 11(20), 9360; https://doi.org/10.3390/app11209360 - 09 Oct 2021
Cited by 2 | Viewed by 1011
Abstract
Task scheduling is key to performance optimization and resource management in cloud computing systems. Because of its complexity, it has been defined as an NP-hard problem. We introduce an online scheme to solve the problem of task scheduling under a dynamic load in the cloud environment. After analyzing the process, we propose a service level agreement constrained adaptive online task scheduling algorithm based on double deep Q-learning (SLA-DQTS) to reduce the makespan, cost, and average overdue time under the constraints of virtual machine (VM) resources and deadlines. In the algorithm, we keep the model's input dimension independent of the number of VMs by taking the Gaussian distribution of related parameters as part of the state space. Through the design of the reward function, the model can be optimized for different goals and task loads. We evaluate the performance of the algorithm by comparing it with three heuristic algorithms (Min-Min, random, and round robin) under different loads. The results show that the proposed algorithm can achieve similar or better results than the comparison algorithms at a lower cost. Full article
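The fixed-dimension state trick described in the abstract, summarising per-VM parameters by their distribution statistics so that the RL input does not grow with the number of VMs, can be sketched as below. This is an illustration of the idea, not the paper's exact state design.

```python
def vm_state(vm_loads):
    """Summarise any number of per-VM load values as (mean, std), giving
    the RL agent a state component of fixed size regardless of VM count."""
    n = len(vm_loads)
    mean = sum(vm_loads) / n
    std = (sum((x - mean) ** 2 for x in vm_loads) / n) ** 0.5
    return (mean, std)
```

Whether the cluster has 4 VMs or 400, the state vector component stays two numbers, so the Q-network's input layer never needs resizing as VMs are added or removed.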

Article
Variant of Data Particle Geometrical Divide for Imbalanced Data Sets Classification by the Example of Occupancy Detection
Appl. Sci. 2021, 11(11), 4970; https://doi.org/10.3390/app11114970 - 28 May 2021
Cited by 7 | Viewed by 1206
Abstract
The history of gravitational classification started in 1977. Over the years, gravitational approaches have gained many extensions, which were adapted to different classification problems. This article is the next stage of research concerning algorithms that create data particles by their geometrical divide. Previous analyses established that the Geometrical Divide (GD) method outperforms the algorithm creating data particles based on classes by a compound of 1 ÷ 1 cardinality. This occurs in the classification of balanced data sets in which class centroids are close to each other and the groups of objects described by different labels overlap. The purpose of the article was to examine the efficiency of the Geometrical Divide method in imbalanced data sets classification, using the example of a real case: occupancy detection. In addition, the paper develops the concept of the Unequal Geometrical Divide (UGD). The evaluation was conducted on 26 imbalanced data sets: 16 with the features of the Moons and Circles data sets and 10 created from a real occupancy data set. In the experiment, the GD method and its imbalanced variant (UGD), as well as the 1CT1P approach, were compared. Each method was combined with three data particle mass determination algorithms: the n-Mass Model (n-MM), the Stochastic Learning Algorithm (SLA) and the Bath-update Algorithm (BLA). The k-fold cross-validation method, precision, recall, F-measure, and the number of data particles used were applied in the evaluation process. The obtained results showed that the methods based on the geometrical divide outperform the 1CT1P approach in imbalanced data sets classification. The article's conclusion describes the observations and indicates potential directions for further research and development of methods concerning the creation of data particles through geometrical divide. Full article
(This article belongs to the Special Issue Integrated Artificial Intelligence in Data Science)
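The evaluation protocol mentioned in the abstract above (k-fold cross-validation with precision, recall, and F-measure) can be sketched in plain Python. This is a minimal illustration of the metrics, not code from the paper; all function names are ours.

```python
from typing import List, Tuple

def precision_recall_f1(y_true: List[int], y_pred: List[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F-measure for one class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) folds, round-robin style."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]
```

On imbalanced data, these per-class metrics are exactly why precision and recall are preferred over plain accuracy: a classifier that always predicts the majority class scores high accuracy but zero recall on the minority class.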

Article
A New Approach to Group Multi-Objective Optimization under Imperfect Information and Its Application to Project Portfolio Optimization
Appl. Sci. 2021, 11(10), 4575; https://doi.org/10.3390/app11104575 - 17 May 2021
Cited by 4 | Viewed by 1126
Abstract
This paper addresses group multi-objective optimization from a new perspective. For each point in the feasible decision set, satisfaction or dissatisfaction from each group member is determined by a multi-criteria ordinal classification approach, based on comparing solutions with a limiting boundary between the classes “unsatisfactory” and “satisfactory”. The whole group's satisfaction can be maximized by finding solutions as close as possible to the ideal consensus. The group moderator is in charge of making the final decision, finding the best compromise between collective satisfaction and dissatisfaction. Imperfect information on the values of objective functions, required and available resources, and decision model parameters is handled by using interval numbers. Two different kinds of multi-criteria decision models are considered: (i) an interval outranking approach and (ii) an interval weighted-sum value function. The proposal is more general than other approaches to group multi-objective optimization since (a) some (even all) objective values may not be the same for different decision makers (DMs); (b) each group member may consider their own set of objective functions and constraints; (c) objective values may be imprecise or uncertain; (d) imperfect information on resource availability and requirements may be handled; (e) each group member may have their own perception of the availability of resources and the resource requirements per activity. An important application of the new approach is collective multi-objective project portfolio optimization. This is illustrated by solving a real-sized group many-objective project portfolio optimization problem using evolutionary computation tools. Full article
(This article belongs to the Special Issue Integrated Artificial Intelligence in Data Science)
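Interval numbers, used in the abstract above to handle imperfect information, can be sketched in a few lines. The possibility-degree comparison below is one common formulation from the interval-optimization literature, not necessarily the paper's exact model; all names are ours.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Interval:
    """An interval number [lb, ub] representing an uncertain quantity."""
    lb: float
    ub: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lb + other.lb, self.ub + other.ub)

    def scale(self, w: float) -> "Interval":
        # Assumes a non-negative weight, so the bounds keep their order.
        return Interval(w * self.lb, w * self.ub)

def weighted_sum(weights: List[float], values: List[Interval]) -> Interval:
    """Interval weighted-sum value function: sum of w_i * [lb_i, ub_i]."""
    total = Interval(0.0, 0.0)
    for w, v in zip(weights, values):
        total = total + v.scale(w)
    return total

def possibility_geq(a: Interval, b: Interval) -> float:
    """One common possibility degree that interval a >= interval b."""
    width = (a.ub - a.lb) + (b.ub - b.lb)
    if width == 0:
        return 1.0 if a.lb >= b.lb else 0.0
    return max(0.0, min(1.0, (a.ub - b.lb) / width))
```

With such a comparison, two candidate portfolios whose imprecise objective values overlap can still be ranked by how strongly one dominates the other, which is what a consensus-seeking moderator needs.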

Article
Natural Language Description of Videos for Smart Surveillance
Appl. Sci. 2021, 11(9), 3730; https://doi.org/10.3390/app11093730 - 21 Apr 2021
Cited by 7 | Viewed by 1427
Abstract
After the September 11 attacks, security and surveillance measures changed across the globe. Now, surveillance cameras are installed almost everywhere to monitor video footage. Though quite handy, these cameras produce massive volumes of video. The major challenge faced by security agencies is analyzing the surveillance video data collected and generated daily. The problems related to these videos are twofold: (1) understanding the contents of the video streams, and (2) converting the video contents to condensed formats, such as textual interpretations and summaries, to save storage space. In this paper, we propose a video description framework for a surveillance dataset. The framework is based on multitask learning of high-level features (HLFs) using a convolutional neural network (CNN) and natural language generation (NLG) through bidirectional recurrent networks. For each specific task, a parallel pipeline is derived from the base visual geometry group (VGG)-16 model. The tasks include scene recognition, action recognition, object recognition, and human-face-specific feature recognition. Experimental results on the TRECViD, UET Video Surveillance (UETVS), and AGRIINTRUSION datasets show that the model outperforms state-of-the-art methods, achieving METEOR (Metric for Evaluation of Translation with Explicit ORdering) scores of 33.9%, 34.3%, and 31.2%, respectively. Our results show that our framework has distinct advantages over traditional rule-based models for the recognition and generation of natural language descriptions. Full article
(This article belongs to the Special Issue Integrated Artificial Intelligence in Data Science)
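As a toy illustration of the final step described in the abstract above — turning recognized high-level features (scene, action, object, person attributes) into a sentence — here is a template-based stand-in. The paper itself learns this mapping with bidirectional recurrent networks; this sketch only shows the shape of the inputs and the intended output, and all names are hypothetical.

```python
def describe(hlf: dict) -> str:
    """Compose a crude natural-language description from high-level features.

    The real framework learns this mapping with a bidirectional RNN over
    CNN features; this template version only illustrates the idea.
    """
    person = hlf.get("person", "a person")
    action = hlf.get("action", "is present")
    obj = hlf.get("object")
    scene = hlf.get("scene")
    parts = [person, action]
    if obj:
        parts.append(f"near {obj}")
    if scene:
        parts.append(f"in the {scene}")
    return " ".join(parts) + "."
```

For example, the features {"person": "a man", "action": "is climbing a fence", "scene": "parking lot"} would yield "a man is climbing a fence in the parking lot." — the kind of condensed textual summary the paper targets for storage-efficient surveillance archives.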

Article
Improving Monte Carlo Tree Search with Artificial Neural Networks without Heuristics
Appl. Sci. 2021, 11(5), 2056; https://doi.org/10.3390/app11052056 - 25 Feb 2021
Cited by 3 | Viewed by 1671
Abstract
Monte Carlo Tree Search is one of the most widely studied search methods today. It has demonstrated its efficiency in many games, such as Go and Settlers of Catan, and in other problem domains. There are several optimizations of Monte Carlo Tree Search, but most of them require heuristics or some domain knowledge at some point, which makes their application to other problems very difficult. We propose a general, optimized implementation of Monte Carlo Tree Search using neural networks without extra knowledge of the problem. As an example of our proposal, we use the game of Dots and Boxes. We tested it against another Monte Carlo system that implements knowledge specific to this problem. Our approach improves accuracy, reaching a winning rate of 81% over previous research, although the generalization penalizes performance. Full article
(This article belongs to the Special Issue Integrated Artificial Intelligence in Data Science)
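The abstract above builds on plain Monte Carlo Tree Search. As a generic sketch of the underlying UCT loop — with random rollouts and no heuristics, but without the paper's neural-network guidance — here it is applied to a toy single-pile Nim game rather than Dots and Boxes; all names are ours.

```python
import math
import random

# Toy game: one-pile Nim. A move removes 1 or 2 stones, and the player
# who takes the last stone wins. Players are 0 and 1.
def moves(stones):
    return [m for m in (1, 2) if m <= stones]

class Node:
    def __init__(self, stones, player, parent=None):
        self.stones = stones      # stones remaining in this position
        self.player = player      # player to move in this position
        self.parent = parent
        self.children = {}        # move -> child Node
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node

def uct_search(stones, player, iters=2000, c=1.4):
    """Plain UCT: selection, expansion, random rollout, backpropagation."""
    root = Node(stones, player)
    for _ in range(iters):
        node = root
        # Selection: descend while the current node is fully expanded.
        while node.children and len(node.children) == len(moves(node.stones)):
            node = max(node.children.values(),
                       key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # Expansion: add one untried child, unless the position is terminal.
        if node.stones > 0:
            untried = [m for m in moves(node.stones) if m not in node.children]
            m = random.choice(untried)
            node.children[m] = Node(node.stones - m, 1 - node.player, node)
            node = node.children[m]
        # Rollout: play uniformly random moves to the end (no heuristics).
        left, p = node.stones, node.player
        while left > 0:
            left -= random.choice(moves(left))
            p = 1 - p
        winner = 1 - p  # the player who took the last stone
        # Backpropagation: credit each node's incoming move.
        while node is not None:
            node.visits += 1
            if winner != node.player:
                node.wins += 1
            node = node.parent
    # Final choice: the most-visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

From 4 stones the optimal move is to take 1, leaving the opponent a losing multiple of 3; with a few thousand iterations the search finds this using nothing but random playouts. Systems like the one in the abstract replace the random rollout and the selection statistics with a learned neural-network evaluation.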
