Special Issue "Integrated Artificial Intelligence in Data Science"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 April 2023 | Viewed by 27086

Special Issue Editors

Prof. Dr. Jerry Chun-Wei Lin
Guest Editor
Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Bergen, Norway
Interests: AI and machine learning; data analytics; optimization; soft computing
Dr. Stefania Tomasiello
Guest Editor
Institute of Computer Science, University of Tartu, Narva mnt 18, 50090 Tartu, Estonia
Interests: soft computing; machine learning; dynamical systems and control

Special Issue Information

Dear Colleagues,

Artificial Intelligence (AI) is a thriving research area because it can solve high-complexity problems and find optimized solutions across many applications and domains, and thus has the potential to create a better society. Its benefits in science, medicine, technology, and the social sciences have already been demonstrated. Data science, also referred to as pattern analytics and mining, retrieves useful and meaningful information from databases, supporting efficient decision-making and strategy-building in different domains. In particular, the exponential growth of data in recent years has made the pairing of big data and AI a source of many research topics, such as the scaling behavior of classical algorithms.

In addition, a recent challenge is the integration of multiple AI technologies emerging from different fields (e.g., vision, security, control, bioinformatics) to develop efficient and robust systems that interact with the real world. Despite the tremendous progress in core AI technologies in recent years, the integration of such capabilities into larger systems that are reliable, transparent, and maintainable is still in its infancy, and numerous issues remain open from both theoretical and practical perspectives.

Topics of interest include, but are not restricted to, the following areas:

  • Data analytics using AI techniques;
  • Evolutionary computation in big datasets;
  • Data-driven AI systems;
  • Machine learning algorithms;
  • Fuzzy modeling and uncertain systems;
  • Data reduction techniques;
  • Deep-learning algorithms in big datasets;
  • Information granularity in high-dimensional data;
  • Pattern mining by machine learning and optimization techniques;
  • Neural network data analytics and prediction;
  • AI-based applications in data science.
Prof. Jerry Chun-Wei Lin
Dr. Stefania Tomasiello
Dr. Gautam Srivastava
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2300 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • AI
  • data-driven analytics
  • machine learning
  • optimization
  • deep learning

Published Papers (17 papers)


Research

Article
Investigation of Classification and Anomalies Based on Machine Learning Methods Applied to Large Scale Building Information Modeling
Appl. Sci. 2022, 12(13), 6382; https://doi.org/10.3390/app12136382 - 23 Jun 2022
Viewed by 577
Abstract
Building Information Models (BIM) capable of collecting and synchronizing all the data related to a construction project into a unified numerical model consisting of a 3D representation and additional metadata (e.g., materials, physical properties, cost) have become commonplace in the building sector. Their extensive use today, alongside the increase in experience with BIM models, offers new perspectives and potentials for design and planning. However, large-scale complex data collection leads to two main challenges: the first is related to the automatic classification of BIM elements, namely windows, walls, beams, columns, etc., and the second to detecting abnormal elements without manual intervention, particularly in the case of misclassification. In this work, we propose machine learning for the automated classification of elements, and for the detection of anomalies based on geometric inputs and additional metadata properties that are extracted from the building model. More precisely, a Python program is used to decipher the BIM models (available as IFC files) for a series of complex buildings, and three types of machine learning methods are then tested to classify and detect objects from a large set of BIM data. The approach is tested on a variety of practical test cases. Full article
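As a rough illustration of the classification step described in the abstract, a nearest-centroid rule over simple geometric features might look like the sketch below. The feature choices and class names are illustrative assumptions; the paper itself tests several machine learning methods on features extracted from IFC files.

```python
def nearest_centroid_classify(element, centroids):
    """Assign a BIM element, given as a feature vector (e.g., width, height),
    to the class whose feature centroid is closest in Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(element, centroids[label]))

# Hypothetical centroids learned from labelled BIM elements.
centroids = {"wall": (0.2, 3.0), "beam": (0.3, 0.4)}
label = nearest_centroid_classify((0.25, 2.8), centroids)
```

Anomaly detection in the same spirit could flag elements whose distance to every centroid exceeds a threshold.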
(This article belongs to the Special Issue Integrated Artificial Intelligence in Data Science)

Article
Red Fox Optimizer with Data-Science-Enabled Microarray Gene Expression Classification Model
Appl. Sci. 2022, 12(9), 4172; https://doi.org/10.3390/app12094172 - 21 Apr 2022
Cited by 4 | Viewed by 945
Abstract
Microarray data examination is a relatively new technology that intends to determine the proper treatment for various diseases and a precise medical diagnosis by analyzing a massive number of genes in various experimental conditions. The conventional data classification techniques suffer from overfitting and the high dimensionality of gene expression data. Therefore, the feature (gene) selection approach plays a vital role in handling a high dimensionality of data. Data science concepts can be widely employed in several data classification problems, and they identify different class labels. In this aspect, we developed a novel red fox optimizer with deep-learning-enabled microarray gene expression classification (RFODL-MGEC) model. The presented RFODL-MGEC model aims to improve classification performance by selecting appropriate features. The RFODL-MGEC model uses a novel red fox optimizer (RFO)-based feature selection approach for deriving an optimal subset of features. Moreover, the RFODL-MGEC model involves a bidirectional cascaded deep neural network (BCDNN) for data classification. The parameters involved in the BCDNN technique were tuned using the chaos game optimization (CGO) algorithm. Comprehensive experiments on benchmark datasets indicated that the RFODL-MGEC model accomplished superior results for subtype classifications. Therefore, the RFODL-MGEC model was found to be effective for the identification of various classes for high-dimensional and small-scale microarray data. Full article

Article
A One-Phase Tree-Structure Method to Mine High Temporal Fuzzy Utility Itemsets
Appl. Sci. 2022, 12(6), 2821; https://doi.org/10.3390/app12062821 - 09 Mar 2022
Cited by 1 | Viewed by 854
Abstract
Compared to fuzzy utility itemset mining (FUIM), temporal fuzzy utility itemset mining (TFUIM) has attracted attention in recent years. It treats transaction time, sold quantities of items, unit profit, and transformed semantic terms as essential factors. A tree-structure method with two phases was previously presented to solve this problem; however, it was time-consuming because of the number of candidates generated. This paper thus proposes a one-phase tree-structure method to find the high temporal fuzzy utility itemsets in a temporal database. The tree is designed to maintain candidate 1-itemsets whose upper-bound values meet the defined threshold constraint, and each node in this tree keeps the data required to mine a 1-itemset. We also designed an algorithm to construct the tree and gave an example to illustrate the mining process in detail. Computational experiments on three real datasets demonstrate that the one-phase tree-structure method outperforms the previous one in execution time. Full article

Article
Applying Machine Learning Techniques to the Audit of Antimicrobial Prophylaxis
Appl. Sci. 2022, 12(5), 2586; https://doi.org/10.3390/app12052586 - 02 Mar 2022
Cited by 1 | Viewed by 721
Abstract
High rates of inappropriate use of surgical antimicrobial prophylaxis have been reported in many countries. Auditing prophylactic antimicrobial use in enormous numbers of medical records by manual review is labor-intensive and time-consuming. The purpose of this study is to develop accurate and efficient machine learning models for auditing appropriate surgical antimicrobial prophylaxis. Supervised machine learning classifiers (Auto-WEKA, multilayer perceptron, decision tree, SimpleLogistic, Bagging, and AdaBoost) were applied to an antimicrobial prophylaxis dataset containing 601 instances with 26 attributes. Multilayer perceptron, SimpleLogistic selected by Auto-WEKA, and decision tree algorithms had outstanding discrimination, with weighted average AUC > 0.97. The Bagging and SMOTE algorithms could improve the predictive performance of the decision tree on imbalanced datasets. Although they had better performance measures, multilayer perceptron and Auto-WEKA took more execution time than the other algorithms. Multilayer perceptron, SimpleLogistic, and decision tree algorithms have outstanding performance measures for identifying the appropriateness of surgical prophylaxis. The efficient models developed by machine learning can be used to assist the antimicrobial stewardship team in the audit of surgical antimicrobial prophylaxis. In future research, there remain challenges and opportunities in enriching our datasets with more useful clinical information to improve the performance of the algorithms. Full article
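The abstract mentions SMOTE for imbalanced datasets. The core SMOTE idea, generating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched as below. All names and parameter values are illustrative assumptions, not the paper's implementation.

```python
import random

def smote_sample(minority, k=2, n_new=4, seed=42):
    """Generate synthetic minority-class samples by interpolating between
    a randomly chosen point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x by squared Euclidean distance (excluding x)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic
```

Each synthetic point lies on the segment between two real minority samples, which is why SMOTE tends to densify, rather than merely duplicate, the minority region.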

Article
Week-Wise Student Performance Early Prediction in Virtual Learning Environment Using a Deep Explainable Artificial Intelligence
Appl. Sci. 2022, 12(4), 1885; https://doi.org/10.3390/app12041885 - 11 Feb 2022
Cited by 1 | Viewed by 1442
Abstract
Early prediction of students’ learning performance and analysis of student behavior in a virtual learning environment (VLE) are crucial to minimize the high failure rate in online courses during the COVID-19 pandemic. Nevertheless, traditional machine learning models fail to predict student performance in the early weeks due to the lack of students’ activities’ data in a week-wise timely manner (i.e., spatiotemporal feature issues). Furthermore, the imbalanced data distribution in the VLE impacts the prediction model performance. Thus, there are severe challenges in handling spatiotemporal features, imbalanced data sets, and a lack of explainability for enhancing the confidence of the prediction system. Therefore, an intelligent framework for explainable student performance prediction (ESPP) is proposed in this study in order to provide the interpretability of the prediction results. First, this framework utilized a time-series weekly student activity data set and dealt with the VLE imbalanced data distribution using a hybrid data sampling method. Then, a combination of convolutional neural network (CNN) and long short-term memory (LSTM) was employed to extract the spatiotemporal features and develop the early prediction deep learning (DL) model. Finally, the DL model was explained by visualizing and analyzing typical predictions, students’ activities’ maps, and feature importance. The numerical results of cross-validation showed that the proposed new DL model (i.e., the combined CNN-LSTM and ConvLSTM), in the early prediction cases, performed better than the baseline models of LSTM, support vector machine (SVM), and logistic regression (LR) models. Full article

Article
A Comparative Study of Ensemble Models for Predicting Road Traffic Congestion
Appl. Sci. 2022, 12(3), 1337; https://doi.org/10.3390/app12031337 - 27 Jan 2022
Cited by 1 | Viewed by 972
Abstract
Increased road traffic congestion is due to different factors, such as population and economic growth, in cities around the globe. At the same time, many households can now afford personal vehicles, contributing to the high volume of cars. The primary purpose of this study is to perform a comparative analysis of ensemble methods using road traffic congestion data. Ensemble methods are capable of enhancing the performance of weak classifiers. The comparative analysis was conducted on a real-world dataset using bagging, boosting, stacking and random forest ensemble models to compare the predictive performance of the methods. The ensemble prediction models were developed to predict road traffic congestion and are evaluated using the following performance metrics: accuracy, precision, recall, f1-score, and the misclassification cost, viewed as a penalty for errors incurred during the classification process. The combination of AdaBoost with decision trees exhibited the best performance on all performance metrics. Additionally, the results showed that variables including travel time, traffic volume, and average speed helped predict vehicle traffic flow on the roads. The model was thus developed to help transport planners, researchers, and transport stakeholders allocate resources accordingly. Furthermore, adopting this model would benefit commuters and businesses in tandem with other interventions proffered by the transport authorities. Full article
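The misclassification cost mentioned in the abstract can be made concrete with a small sketch: compute the confusion-matrix counts, then weight false positives and false negatives asymmetrically. The per-error costs below are illustrative assumptions, not the paper's values.

```python
def cost_sensitive_eval(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Return (accuracy, total misclassification cost) for binary labels,
    penalising a missed congestion event (false negative) more heavily."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    total_cost = cost_fp * fp + cost_fn * fn
    return accuracy, total_cost
```

Two models with identical accuracy can then be ranked differently once the cost of each error type is taken into account.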

Article
An Advanced Optimization Approach for Long-Short Pairs Trading Strategy Based on Correlation Coefficients and Bollinger Bands
Appl. Sci. 2022, 12(3), 1052; https://doi.org/10.3390/app12031052 - 20 Jan 2022
Viewed by 1670
Abstract
In the financial market, commodity prices change over time, yielding profit opportunities. Various trading strategies have been proposed to yield good earnings. Pairs trading is one such critical, widely-used strategy with good effect. Given two highly correlated paired target stocks, the strategy suggests buying one when its price falls behind, selling it when its stock price converges, and operating the other stock inversely. In the existing approach, the genetic Bollinger Bands and correlation-coefficient-based pairs trading strategy (GBCPT) utilizes optimization technology to determine the parameters for correlation-based candidate pairs and discover Bollinger Bands-based trading signals. The correlation coefficients are used to calculate the relationship between two stocks through their historical stock prices, and the Bollinger Bands are indicators composed of the moving averages and standard deviations of the stocks. In this paper, to achieve more robust and reliable trading performance, AGBCPT, an advanced GBCPT algorithm, is proposed to take into account volatility and more critical parameters that influence profitability. It encodes six critical parameters into a chromosome. To evaluate the fitness of a chromosome, the encoded parameters are utilized to observe the trading pairs and their trading signals generated from Bollinger Bands. The fitness value is then calculated by the average return and volatility of the long and short trading pairs. The genetic process is repeated to find suitable parameters until the termination condition is met. Experiments on 44 stocks selected from the Taiwan 50 Index are conducted, showing the merits and effectiveness of the proposed approach. Full article
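A minimal sketch of the Bollinger Bands indicator referenced in the abstract: the middle band is a moving average, and the upper and lower bands sit k standard deviations above and below it. The window and band width here are conventional defaults, not necessarily the parameters the genetic algorithm would select.

```python
def bollinger_bands(prices, window=20, k=2.0):
    """Return a list of (lower, middle, upper) bands over a sliding window,
    where middle = moving average and the outer bands are middle +/- k*std."""
    bands = []
    for i in range(window - 1, len(prices)):
        win = prices[i - window + 1 : i + 1]
        ma = sum(win) / window
        sd = (sum((p - ma) ** 2 for p in win) / window) ** 0.5
        bands.append((ma - k * sd, ma, ma + k * sd))
    return bands
```

In a pairs-trading setting, the spread between the two paired stocks (rather than a single price series) is typically fed to such an indicator, and band crossings generate the open/close signals.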

Article
Active Learning Based on Crowdsourced Data
Appl. Sci. 2022, 12(1), 409; https://doi.org/10.3390/app12010409 - 01 Jan 2022
Viewed by 691
Abstract
The paper proposes a crowdsourcing-based approach to annotated data acquisition as a means of supporting the Active Learning training approach. In the proposed solution, aimed at data engineers, the knowledge of the crowd serves as an oracle that judges whether a given sample is informative. The proposed solution reduces the amount of work needed to annotate large sets of data. Furthermore, it allows a perpetual increase in the trained network's quality through the inclusion of new samples gathered after network deployment. The paper also discusses means of limiting network training times, especially in the post-deployment stage, where the size of the training set can increase dramatically. This is done by introducing a fourth set composed of samples gathered during the network's actual usage. Full article
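The crowd-as-oracle idea can be sketched as an uncertainty-based routing step: samples on which the current model is unsure go to the crowd for annotation, while confident predictions are auto-labelled. The margin threshold and function names below are illustrative assumptions, not the paper's design.

```python
def select_informative(samples, predict_proba, threshold=0.2):
    """Split samples into (to_crowd, auto_labelled) by the margin between
    the model's top two class probabilities; small margin = informative."""
    to_crowd, auto = [], []
    for s in samples:
        probs = sorted(predict_proba(s), reverse=True)
        margin = probs[0] - probs[1]
        (to_crowd if margin < threshold else auto).append(s)
    return to_crowd, auto
```

In an active learning loop, the crowd-annotated batch is added to the training set and the model is retrained, repeating until the labelling budget is spent.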

Article
Enhanced Image Captioning with Color Recognition Using Deep Learning Methods
Appl. Sci. 2022, 12(1), 209; https://doi.org/10.3390/app12010209 - 26 Dec 2021
Cited by 3 | Viewed by 2636
Abstract
Automatically describing the content of an image is an interesting and challenging task in artificial intelligence. In this paper, an enhanced image captioning model—including object detection, color analysis, and image captioning—is proposed to automatically generate the textual descriptions of images. In an encoder–decoder model for image captioning, VGG16 is used as an encoder and an LSTM (long short-term memory) network with attention is used as a decoder. In addition, Mask R-CNN with OpenCV is used for object detection and color analysis. The integration of the image caption and color recognition is then performed to provide better descriptive details of images. Moreover, the generated textual sentence is converted into speech. The validation results illustrate that the proposed method can provide more accurate description of images. Full article

Article
Efficient Detection of DDoS Attacks Using a Hybrid Deep Learning Model with Improved Feature Selection
Appl. Sci. 2021, 11(24), 11634; https://doi.org/10.3390/app112411634 - 08 Dec 2021
Cited by 13 | Viewed by 2264
Abstract
DDoS (Distributed Denial of Service) attacks have become a serious risk to the integrity and confidentiality of computer networks and systems, which are essential assets in today's world. Detecting DDoS attacks is a difficult task that must be accomplished before any mitigation strategies can be applied. Machine learning/deep learning (ML/DL) has already been used successfully to identify DDoS attacks; however, an inherent limitation of ML/DL frameworks, namely optimal feature selection, keeps complete accuracy out of reach, and such systems do not always produce promising results for identifying DDoS attacks. Existing research on forecasting DDoS attacks has yielded a variety of unexpected predictions using machine learning (ML) classifiers and conventional approaches to feature encoding; these efforts also used deep neural networks to extract features without keeping track of sequence information. The current work predicts DDoS attacks using a hybrid deep learning (DL) model, namely a CNN with BiLSTM (bidirectional long short-term memory), to effectively anticipate DDoS attacks on benchmark data. Only the most pertinent features were picked, by ranking and choosing the features that scored highest in the provided data set. Experimental findings demonstrate that the proposed CNN-BI-LSTM attained an accuracy of up to 94.52 percent on the CIC-DDoS2019 data set during training, testing, and validation. Full article

Article
U-SSD: Improved SSD Based on U-Net Architecture for End-to-End Table Detection in Document Images
Appl. Sci. 2021, 11(23), 11446; https://doi.org/10.3390/app112311446 - 02 Dec 2021
Viewed by 861
Abstract
Tables are an important element of a document and can express more information in fewer words. Due to the different arrangements of tables and texts, as well as the variety of layouts, table detection is a challenge in the field of document analysis. Now that Optical Character Recognition technology has gradually matured, it can help us obtain text information quickly, and the ability to accurately detect table structures improves the efficiency of obtaining text content. The process of document digitization is influenced by the editor's style of table layout. In addition, many industries rely on a large number of people to process data, at high expense; the industry therefore imports artificial intelligence and Robotic Process Automation to handle simple and complicated routine text digitization work. Therefore, this paper proposes an end-to-end table detection model, U-SSD, based on the deep learning object detection method: it takes the Single Shot MultiBox Detector (SSD) as the basic model architecture, improves it with U-Net, and adds dilated convolution to enhance the feature learning capability of the network. The experiment in this study uses a dataset of accident claim documents provided by a Taiwanese law firm to conduct table detection. The experimental results show that the proposed method is effective. In addition, evaluation results on the open datasets TableBank, Github, and ICDAR13 show that the SSD-based network architectures can achieve good performance. Full article

Article
An Improved VGG16 Model for Pneumonia Image Classification
Appl. Sci. 2021, 11(23), 11185; https://doi.org/10.3390/app112311185 - 25 Nov 2021
Cited by 3 | Viewed by 2958
Abstract
Image recognition has been applied to many fields, but relatively rarely to medical images. Recent significant deep learning progress for image recognition has raised strong research interest in medical image recognition. We first found that the VGG16 model failed on some pneumonia X-ray images. Thus, this paper proposes IVGG13 (Improved Visual Geometry Group-13), a modified VGG16 model for classifying pneumonia X-ray images. Open-source thoracic X-ray images acquired from the Kaggle platform were employed for pneumonia recognition, but only a few data were obtained, and the datasets were unbalanced after classification, either of which can result in extremely poor recognition by trained neural network models. Therefore, we applied augmentation pre-processing to compensate for the low data volume and poorly balanced datasets. The original datasets, without data augmentation, were trained using the proposed model and some well-known convolutional neural networks, such as LeNet, AlexNet, GoogLeNet and VGG16. In the experimental results, the recognition rates and other evaluation criteria, such as precision, recall and f-measure, were evaluated for each model. This process was repeated for the augmented and balanced datasets, with greatly improved metrics such as precision, recall and F1-measure. The proposed IVGG13 model produced superior outcomes on the F1-measure compared with the current best-practice convolutional neural networks for medical image recognition, confirming that data augmentation effectively improves model accuracy. Full article

Article
SLA-DQTS: SLA Constrained Adaptive Online Task Scheduling Based on DDQN in Cloud Computing
Appl. Sci. 2021, 11(20), 9360; https://doi.org/10.3390/app11209360 - 09 Oct 2021
Cited by 2 | Viewed by 1011
Abstract
Task scheduling is key to performance optimization and resource management in cloud computing systems. Because of its complexity, it has been defined as an NP-hard problem. We introduce an online scheme to solve the problem of task scheduling under a dynamic load in the cloud environment. After analyzing the process, we propose a service level agreement constrained adaptive online task scheduling algorithm based on double deep Q-learning (SLA-DQTS) to reduce the makespan, cost, and average overdue time under the constraints of virtual machine (VM) resources and deadlines. In the algorithm, we keep the model's input dimension independent of the number of VMs by taking the Gaussian distribution of related parameters as part of the state space. Through the design of the reward function, the model can be optimized for different goals and task loads. We evaluate the performance of the algorithm by comparing it with three heuristic algorithms (Min-Min, random, and round robin) under different loads. The results show that the proposed algorithm can achieve similar or better results than the comparison algorithms at a lower cost. Full article
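The fixed-dimension state trick described in the abstract, summarising per-VM parameters by their distribution statistics so that the RL input does not grow with the number of VMs, can be sketched as below. This is an illustration of the idea, not the paper's exact state design.

```python
def vm_state(vm_loads):
    """Summarise any number of per-VM load values as (mean, std), giving
    the RL agent a state component of fixed size regardless of VM count."""
    n = len(vm_loads)
    mean = sum(vm_loads) / n
    std = (sum((x - mean) ** 2 for x in vm_loads) / n) ** 0.5
    return (mean, std)
```

Whether the cluster has 4 VMs or 400, the state vector component stays two numbers, so the Q-network's input layer never needs resizing as VMs are added or removed.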

Article
Variant of Data Particle Geometrical Divide for Imbalanced Data Sets Classification by the Example of Occupancy Detection
Appl. Sci. 2021, 11(11), 4970; https://doi.org/10.3390/app11114970 - 28 May 2021
Cited by 7 | Viewed by 1206
Abstract
The history of gravitational classification started in 1977. Over the years, gravitational approaches have gained many extensions, which were adapted to different classification problems. This article is the next stage of research concerning algorithms that create data particles by their geometrical divide. Previous analyses established that the Geometrical Divide (GD) method outperforms the algorithm creating data particles based on classes by a compound of 1 ÷ 1 cardinality. This occurs in the classification of balanced data sets in which class centroids are close to each other and the groups of objects described by different labels overlap. The purpose of the article was to examine the efficiency of the Geometrical Divide method in imbalanced data sets classification, using the example of a real case: occupancy detection. In addition, the paper develops the concept of the Unequal Geometrical Divide (UGD). The evaluation was conducted on 26 imbalanced data sets: 16 with the features of the Moons and Circles data sets and 10 created from a real occupancy data set. In the experiment, the GD method and its imbalanced variant (UGD), as well as the 1CT1P approach, were compared. Each method was combined with three data particle mass determination algorithms: the n-Mass Model (n-MM), the Stochastic Learning Algorithm (SLA) and the Bath-update Algorithm (BLA). The k-fold cross-validation method, precision, recall, F-measure, and the number of data particles used were applied in the evaluation process. The obtained results showed that the methods based on the geometrical divide outperform the 1CT1P approach in imbalanced data sets classification. The article's conclusion describes the observations and indicates potential directions for further research and development of methods concerning the creation of data particles through geometrical divide. Full article
(This article belongs to the Special Issue Integrated Artificial Intelligence in Data Science)
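The evaluation protocol mentioned in the abstract above (k-fold cross-validation with precision, recall, and F-measure) can be sketched in plain Python. This is a minimal illustration of the metrics, not code from the paper; all function names are ours.

```python
from typing import List, Tuple

def precision_recall_f1(y_true: List[int], y_pred: List[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall and F-measure for one class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) folds, round-robin style."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]
```

On imbalanced data, these per-class metrics are exactly why precision and recall are preferred over plain accuracy: a classifier that always predicts the majority class scores high accuracy but zero recall on the minority class.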

Article
A New Approach to Group Multi-Objective Optimization under Imperfect Information and Its Application to Project Portfolio Optimization
Appl. Sci. 2021, 11(10), 4575; https://doi.org/10.3390/app11104575 - 17 May 2021
Cited by 4 | Viewed by 1126
Abstract
This paper addresses group multi-objective optimization from a new perspective. For each point in the feasible decision set, satisfaction or dissatisfaction from each group member is determined by a multi-criteria ordinal classification approach, based on comparing solutions with a limiting boundary between the classes “unsatisfactory” and “satisfactory”. The whole group's satisfaction can be maximized by finding solutions as close as possible to the ideal consensus. The group moderator is in charge of making the final decision, finding the best compromise between collective satisfaction and dissatisfaction. Imperfect information on the values of objective functions, required and available resources, and decision model parameters is handled by using interval numbers. Two different kinds of multi-criteria decision models are considered: (i) an interval outranking approach and (ii) an interval weighted-sum value function. The proposal is more general than other approaches to group multi-objective optimization since (a) some (even all) objective values may not be the same for different decision makers (DMs); (b) each group member may consider their own set of objective functions and constraints; (c) objective values may be imprecise or uncertain; (d) imperfect information on resource availability and requirements may be handled; (e) each group member may have their own perception of the availability of resources and the resource requirements per activity. An important application of the new approach is collective multi-objective project portfolio optimization. This is illustrated by solving a real-sized group many-objective project portfolio optimization problem using evolutionary computation tools. Full article
(This article belongs to the Special Issue Integrated Artificial Intelligence in Data Science)
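Interval numbers, used in the abstract above to handle imperfect information, can be sketched in a few lines. The possibility-degree comparison below is one common formulation from the interval-optimization literature, not necessarily the paper's exact model; all names are ours.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Interval:
    """An interval number [lb, ub] representing an uncertain quantity."""
    lb: float
    ub: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lb + other.lb, self.ub + other.ub)

    def scale(self, w: float) -> "Interval":
        # Assumes a non-negative weight, so the bounds keep their order.
        return Interval(w * self.lb, w * self.ub)

def weighted_sum(weights: List[float], values: List[Interval]) -> Interval:
    """Interval weighted-sum value function: sum of w_i * [lb_i, ub_i]."""
    total = Interval(0.0, 0.0)
    for w, v in zip(weights, values):
        total = total + v.scale(w)
    return total

def possibility_geq(a: Interval, b: Interval) -> float:
    """One common possibility degree that interval a >= interval b."""
    width = (a.ub - a.lb) + (b.ub - b.lb)
    if width == 0:
        return 1.0 if a.lb >= b.lb else 0.0
    return max(0.0, min(1.0, (a.ub - b.lb) / width))
```

With such a comparison, two candidate portfolios whose imprecise objective values overlap can still be ranked by how strongly one dominates the other, which is what a consensus-seeking moderator needs.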

Article
Natural Language Description of Videos for Smart Surveillance
Appl. Sci. 2021, 11(9), 3730; https://doi.org/10.3390/app11093730 - 21 Apr 2021
Cited by 7 | Viewed by 1427
Abstract
After the September 11 attacks, security and surveillance measures changed across the globe. Now, surveillance cameras are installed almost everywhere to monitor video footage. Though quite handy, these cameras produce massive volumes of video. The major challenge faced by security agencies is analyzing the surveillance video data collected and generated daily. The problems related to these videos are twofold: (1) understanding the contents of the video streams, and (2) converting the video contents to condensed formats, such as textual interpretations and summaries, to save storage space. In this paper, we propose a video description framework for a surveillance dataset. The framework is based on multitask learning of high-level features (HLFs) using a convolutional neural network (CNN) and natural language generation (NLG) through bidirectional recurrent networks. For each specific task, a parallel pipeline is derived from the base visual geometry group (VGG)-16 model. The tasks include scene recognition, action recognition, object recognition, and human-face-specific feature recognition. Experimental results on the TRECViD, UET Video Surveillance (UETVS), and AGRIINTRUSION datasets show that the model outperforms state-of-the-art methods, achieving METEOR (Metric for Evaluation of Translation with Explicit ORdering) scores of 33.9%, 34.3%, and 31.2%, respectively. Our results show that our framework has distinct advantages over traditional rule-based models for the recognition and generation of natural language descriptions. Full article
(This article belongs to the Special Issue Integrated Artificial Intelligence in Data Science)
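As a toy illustration of the final step described in the abstract above — turning recognized high-level features (scene, action, object, person attributes) into a sentence — here is a template-based stand-in. The paper itself learns this mapping with bidirectional recurrent networks; this sketch only shows the shape of the inputs and the intended output, and all names are hypothetical.

```python
def describe(hlf: dict) -> str:
    """Compose a crude natural-language description from high-level features.

    The real framework learns this mapping with a bidirectional RNN over
    CNN features; this template version only illustrates the idea.
    """
    person = hlf.get("person", "a person")
    action = hlf.get("action", "is present")
    obj = hlf.get("object")
    scene = hlf.get("scene")
    parts = [person, action]
    if obj:
        parts.append(f"near {obj}")
    if scene:
        parts.append(f"in the {scene}")
    return " ".join(parts) + "."
```

For example, the features {"person": "a man", "action": "is climbing a fence", "scene": "parking lot"} would yield "a man is climbing a fence in the parking lot." — the kind of condensed textual summary the paper targets for storage-efficient surveillance archives.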

Article
Improving Monte Carlo Tree Search with Artificial Neural Networks without Heuristics
Appl. Sci. 2021, 11(5), 2056; https://doi.org/10.3390/app11052056 - 25 Feb 2021
Cited by 3 | Viewed by 1671
Abstract
Monte Carlo Tree Search is one of the most widely studied search methods today. It has demonstrated its efficiency in many games, such as Go and Settlers of Catan, and in other problem domains. There are several optimizations of Monte Carlo Tree Search, but most of them require heuristics or some domain knowledge at some point, which makes their application to other problems very difficult. We propose a general, optimized implementation of Monte Carlo Tree Search using neural networks without extra knowledge of the problem. As an example of our proposal, we use the game of Dots and Boxes. We tested it against another Monte Carlo system that implements knowledge specific to this problem. Our approach improves accuracy, reaching a winning rate of 81% over previous research, although the generalization penalizes performance. Full article
(This article belongs to the Special Issue Integrated Artificial Intelligence in Data Science)
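The abstract above builds on plain Monte Carlo Tree Search. As a generic sketch of the underlying UCT loop — with random rollouts and no heuristics, but without the paper's neural-network guidance — here it is applied to a toy single-pile Nim game rather than Dots and Boxes; all names are ours.

```python
import math
import random

# Toy game: one-pile Nim. A move removes 1 or 2 stones, and the player
# who takes the last stone wins. Players are 0 and 1.
def moves(stones):
    return [m for m in (1, 2) if m <= stones]

class Node:
    def __init__(self, stones, player, parent=None):
        self.stones = stones      # stones remaining in this position
        self.player = player      # player to move in this position
        self.parent = parent
        self.children = {}        # move -> child Node
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node

def uct_search(stones, player, iters=2000, c=1.4):
    """Plain UCT: selection, expansion, random rollout, backpropagation."""
    root = Node(stones, player)
    for _ in range(iters):
        node = root
        # Selection: descend while the current node is fully expanded.
        while node.children and len(node.children) == len(moves(node.stones)):
            node = max(node.children.values(),
                       key=lambda ch: ch.wins / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # Expansion: add one untried child, unless the position is terminal.
        if node.stones > 0:
            untried = [m for m in moves(node.stones) if m not in node.children]
            m = random.choice(untried)
            node.children[m] = Node(node.stones - m, 1 - node.player, node)
            node = node.children[m]
        # Rollout: play uniformly random moves to the end (no heuristics).
        left, p = node.stones, node.player
        while left > 0:
            left -= random.choice(moves(left))
            p = 1 - p
        winner = 1 - p  # the player who took the last stone
        # Backpropagation: credit each node's incoming move.
        while node is not None:
            node.visits += 1
            if winner != node.player:
                node.wins += 1
            node = node.parent
    # Final choice: the most-visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

From 4 stones the optimal move is to take 1, leaving the opponent a losing multiple of 3; with a few thousand iterations the search finds this using nothing but random playouts. Systems like the one in the abstract replace the random rollout and the selection statistics with a learned neural-network evaluation.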
