Text and Data Mining (TDM) Techniques for Personalized Services and Their Policy

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (1 December 2021) | Viewed by 6616

Special Issue Editor


Prof. Dr. Seungmin Rho
Guest Editor
Department of Industrial Security, Chung-Ang University, Seoul 06974, Korea
Interests: databases; big data analysis; music retrieval; multimedia systems; machine learning; knowledge management; computational intelligence

Special Issue Information

Dear Colleagues,

Massive amounts of personal data are now collected through digital tools and communication platforms, and the companies that operate them can analyze these data to provide better individualized services. For instance, to help control a virus such as COVID-19, companies can share medical data with people based on geographic information system (GIS) data from their smart devices. A key requirement for providing optimized services to individuals is the active use of big data processing and artificial intelligence (AI) technologies.

However, because most of the collected data are unstructured (typically text-heavy), text and data mining (TDM) is needed to prepare them for AI-based modeling. TDM applies diverse techniques, such as natural language processing (NLP), machine learning (ML), information retrieval, and knowledge management, to the automated analysis of digital content. In doing so, TDM can extract information, identify patterns, and discover new trends, insights, and correlations.

This Special Issue solicits original research and survey papers on personalization service technologies that apply TDM techniques to personal data. Because concerns about personal data protection have been growing, several governments have enacted personal data protection laws, including exemptions for national security or the public interest. Hence, this Special Issue also solicits papers on data protection (ownership) policy for sustainable technology implementation.

  • Data-driven AI-based personalized services;
  • Big data processing;
  • Unstructured data analysis;
  • Text and data mining (TDM);
  • Artificial intelligence;
  • Natural language processing;
  • Machine learning;
  • Sustainable technology implementation;
  • Data protection law;
  • Data protection policy.

Prof. Dr. Seungmin Rho
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (3 papers)

Research

17 pages, 732 KiB  
Article
A Study on High-Speed Outlier Detection Method of Network Abnormal Behavior Data Using Heterogeneous Multiple Classifiers
by Jaeik Cho, Seonghyeon Gong and Ken Choi
Appl. Sci. 2022, 12(3), 1011; https://doi.org/10.3390/app12031011 - 19 Jan 2022
Cited by 2 | Viewed by 1369
Abstract
As the complexity and scale of the network environment increase continuously, various methods to detect attacks and intrusions from network traffic by classifying normal and abnormal network behaviors show their limitations. The number of network traffic signatures is increasing exponentially to the extent that semi-real-time detection is not possible. However, machine learning-based intrusion detection only gives simple guidelines as simple contents of security events. This is why security data for a specific environment cannot be configured due to data noise, diversification, and continuous alteration of system and network environments. Although machine learning is performed and evaluated using a generalized data set, its performance is expected to be similar in that specific network environment only. In this study, we propose a high-speed outlier detection method for a network dataset to customize the dataset in real time for a continuously changing network environment. The proposed method uses an ensemble-based noise data filtering model using the voting results of six classifiers (decision tree, random forest, support vector machine, naive Bayes, k-nearest neighbors, and logistic regression) to reflect the distribution and various environmental characteristics of datasets. Moreover, to prove the performance of the proposed method, we experimented with the accuracy of attack detection by gradually reducing the noise data in the time series dataset. As a result of the experiment, the proposed method maintains a training dataset of a size capable of semi-real-time learning, which is 10% of the total training dataset, and at the same time, shows the same level of accuracy as a detection model using a large training dataset. The improved research results would be the basis for automatic tuning of network datasets and machine learning that can be applied to special-purpose environments and devices such as ICS environments.
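The voting-based noise filtering described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `filter_noise_by_vote`, the `min_agree` threshold, and the three toy threshold rules (standing in for the six trained classifiers) are all assumptions made for illustration.

```python
from collections import Counter


def filter_noise_by_vote(samples, labels, classifiers, min_agree):
    """Keep samples whose recorded label is backed by at least `min_agree` classifiers.

    `classifiers` is a list of callables mapping a feature tuple to a label
    (a hypothetical interface, not one taken from the paper).
    """
    kept = []
    for features, label in zip(samples, labels):
        # Count how many classifiers predict each label for this sample.
        votes = Counter(clf(features) for clf in classifiers)
        if votes[label] >= min_agree:
            kept.append((features, label))
    return kept


# Three toy rules stand in for the six trained models named in the abstract.
toy_classifiers = [
    lambda x: "attack" if x[0] > 0.5 else "normal",
    lambda x: "attack" if x[1] > 0.5 else "normal",
    lambda x: "attack" if x[0] + x[1] > 1.0 else "normal",
]

samples = [(0.9, 0.8), (0.1, 0.2), (0.9, 0.9), (0.2, 0.1)]
labels = ["attack", "normal", "normal", "attack"]  # last two are deliberately mislabeled

clean = filter_noise_by_vote(samples, labels, toy_classifiers, min_agree=2)
print(clean)
```

In this toy run the two deliberately mislabeled samples receive no supporting votes and are dropped, which shrinks the training set while keeping the samples the ensemble agrees on.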

18 pages, 1858 KiB  
Article
Improved Text Summarization of News Articles Using GA-HC and PSO-HC
by Muhammad Mohsin, Shazad Latif, Muhammad Haneef, Usman Tariq, Muhammad Attique Khan, Sefedine Kadry, Hwan-Seung Yong and Jung-In Choi
Appl. Sci. 2021, 11(22), 10511; https://doi.org/10.3390/app112210511 - 9 Nov 2021
Cited by 4 | Viewed by 2799
Abstract
Automatic Text Summarization (ATS) is gaining attention because a large volume of data is being generated at an exponential rate. Due to easy internet availability globally, a large amount of data is being generated from social networking websites, news websites and blog websites. Manual summarization is time consuming, and it is difficult to read and summarize a large amount of content. Automatic text summarization is the solution to deal with this problem. This study proposed two automatic text summarization models, which are Genetic Algorithm with Hierarchical Clustering (GA-HC) and Particle Swarm Optimization with Hierarchical Clustering (PSO-HC). The proposed models use a word embedding model with a Hierarchical Clustering Algorithm to group sentences conveying almost the same meaning. Modified GA and adaptive PSO based sentence ranking models are proposed for text summary in news text documents. Simulations are conducted and compared with other understudied algorithms to evaluate the performance of the proposed methodology. Simulation results validate the superior performance of the proposed methodology.

19 pages, 4680 KiB  
Article
TREASURE: Text Mining Algorithm Based on Affinity Analysis and Set Intersection to Find the Action of Tuberculosis Drugs against Other Pathogens
by Pradeepa Sampath, Nithya Shree Sridhar, Vimal Shanmuganathan and Yangsun Lee
Appl. Sci. 2021, 11(15), 6834; https://doi.org/10.3390/app11156834 - 25 Jul 2021
Cited by 1 | Viewed by 1513
Abstract
Tuberculosis (TB) is one of the top causes of death in the world. Though TB is known as the world’s most infectious killer, it can be treated with a combination of TB drugs. Some of these drugs can be active against other infective agents, in addition to TB. We propose a framework called TREASURE (Text mining algoRithm basEd on Affinity analysis and Set intersection to find the action of tUberculosis dRugs against other pathogEns), which particularly focuses on the extraction of various drug–pathogen relationships in eight different TB drugs, namely pyrazinamide, moxifloxacin, ethambutol, isoniazid, rifampicin, linezolid, streptomycin and amikacin. More than 1500 research papers from PubMed are collected for each drug. The data collected for this purpose are first preprocessed, and various relation records are generated for each drug using affinity analysis. These records are then filtered based on the maximum co-occurrence value and set intersection property to obtain the required inferences. The inferences produced by this framework can help medical researchers in finding cures for other bacterial diseases. Additionally, the analysis presented in this model can be utilized by medical experts in their disease and drug experiments.
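One possible reading of the filtering step in this abstract (co-occurrence counting, keeping the maximum, then intersecting sets) is sketched below. It is an illustration under stated assumptions, not the TREASURE algorithm itself: the record layout, the function name `top_cooccurring`, and the placeholder names `drug_x`, `drug_y`, and `pathogen_A` to `pathogen_C` are all hypothetical.

```python
from collections import Counter


def top_cooccurring(records, drug):
    """Return the pathogens that co-occur most often with `drug`.

    Each record is a (drugs_mentioned, pathogens_mentioned) pair of sets,
    a hypothetical stand-in for one preprocessed paper.
    """
    counts = Counter()
    for drugs_mentioned, pathogens_mentioned in records:
        if drug in drugs_mentioned:
            counts.update(pathogens_mentioned)
    if not counts:
        return set()
    best = max(counts.values())
    # Keep every pathogen tied for the maximum co-occurrence value.
    return {pathogen for pathogen, n in counts.items() if n == best}


# Placeholder records; no real drug-pathogen findings are implied.
records = [
    ({"drug_x"}, {"pathogen_A", "pathogen_B"}),
    ({"drug_x", "drug_y"}, {"pathogen_B"}),
    ({"drug_y"}, {"pathogen_B", "pathogen_C"}),
    ({"drug_y"}, {"pathogen_C"}),
]

top_x = top_cooccurring(records, "drug_x")
top_y = top_cooccurring(records, "drug_y")
shared = top_x & top_y  # set intersection across the two drugs
print(top_x, top_y, shared)
```

Here the intersection keeps only the pathogen that ranks highest for both placeholder drugs; in a real pipeline the records would come from the preprocessed PubMed papers.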
