Search Results (4)

Search Parameters:
Keywords = Kullback Leibler summarizer

14 pages, 1366 KB  
Article
Data-Driven Analysis of Privacy Policies Using LexRank and KL Summarizer for Environmental Sustainability
by Abdul Quadir Md, Raghav V. Anand, Senthilkumar Mohan, Christy Jackson Joshua, Sabhari S. Girish, Anthra Devarajan and Celestine Iwendi
Sustainability 2023, 15(7), 5941; https://doi.org/10.3390/su15075941 - 29 Mar 2023
Cited by 1 | Viewed by 2455
Abstract
Natural language processing (NLP) is a field of machine learning that analyzes and manipulates huge amounts of data and generates human language. NLP has a variety of applications, such as sentiment analysis, text summarization, spam filtering, and language translation. Since privacy documents are important legal texts, they play a vital part in any agreement. These documents are very long, yet their important points still have to be read thoroughly. Customers may not have the time or the knowledge to understand all the complexities of a privacy policy document. In this context, this paper proposes an optimal model to summarize a privacy policy in the best possible way. Text summarization is the process of extracting a summary from a large original text without losing any vital information. Using the proposed common word reduction process combined with natural language processing algorithms, this paper extracts the sentences in a privacy policy document that carry high weight and displays them to the customer, saving the customer the time of reading the entire policy while providing only the important lines they need to know before signing the document. The proposed method uses two different extractive text summarization algorithms, namely LexRank and the Kullback-Leibler (KL) Summarizer, to summarize the obtained text. According to the results, the summarized sentences obtained via the common word reduction process and the text summarization algorithms were more significant than the raw privacy policy text. The proposed methodology also helps identify the important common words used in a particular sector, allowing a more in-depth study of a privacy policy. Using the common word reduction process, the sentences were reduced by 14.63%, and applying the extractive NLP algorithms yielded the significant sentences. The results showed a 191.52% increase in the repetition of common words per sentence with the KL Summarizer algorithm and a 361.01% increase with the LexRank algorithm. This implies that common words play a large role in determining a sector's privacy policies, making the proposed method a real-world solution for environmental sustainability.
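The paper's own implementation is not reproduced here, but the core idea of a KL summarizer can be sketched from first principles: greedily select the sentence whose addition makes the summary's unigram distribution closest (in KL divergence) to the document's. This is a simplified illustration with a toy whitespace tokenizer, not the authors' pipeline:

```python
import math
from collections import Counter

def unigram_dist(words, vocab, eps=1e-9):
    """Smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(words)
    total = len(words)
    return {w: (counts[w] + eps) / (total + eps * len(vocab)) for w in vocab}

def kl(p, q):
    """KL divergence D(p || q) over a shared support."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def kl_summarize(sentences, n_select=1):
    """Greedy KL-sum: pick sentences minimizing KL(document || summary)."""
    doc_words = [w for s in sentences for w in s.lower().split()]
    vocab = set(doc_words)
    doc_p = unigram_dist(doc_words, vocab)
    chosen, pool = [], list(sentences)
    while pool and len(chosen) < n_select:
        best = min(pool, key=lambda s: kl(doc_p, unigram_dist(
            [w for t in chosen + [s] for w in t.lower().split()], vocab)))
        chosen.append(best)
        pool.remove(best)
    return chosen
```

In this greedy scheme, sentences covering the document's most frequent words are favored, which is why the common word reduction step described above interacts directly with the summarizer's ranking.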

16 pages, 1821 KB  
Article
Informative Language Encoding by Variational Autoencoders Using Transformer
by Changwon Ok, Geonseok Lee and Kichun Lee
Appl. Sci. 2022, 12(16), 7968; https://doi.org/10.3390/app12167968 - 9 Aug 2022
Cited by 5 | Viewed by 4580
Abstract
In natural language processing (NLP), the Transformer is widely used and has reached state-of-the-art performance in numerous NLP tasks such as language modeling, summarization, and classification. Moreover, the variational autoencoder (VAE) is an efficient generative model for representation learning, combining deep learning with statistical inference over encoded representations. However, using VAEs in natural language processing often brings practical difficulties such as posterior collapse, also known as Kullback–Leibler (KL) vanishing. To mitigate this problem, while taking advantage of the parallelization of language data processing, we propose a new language representation model that integrates two seemingly different deep learning models: a Transformer coupled with a variational autoencoder. We compare the proposed model with previous works, such as a VAE connected to a recurrent neural network (RNN). Our experiments on four real-life datasets show that implementation with KL annealing mitigates posterior collapse. The results also show that the proposed Transformer model outperforms RNN-based models in reconstruction and representation learning, and that its encoded representations are more informative than those of the other tested models.
(This article belongs to the Special Issue Application of Machine Learning in Text Mining)
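The KL annealing mentioned in the abstract is commonly implemented as a schedule on the weight of the KL term in the VAE objective. A minimal sketch, assuming a linear warmup and the closed-form KL between a diagonal Gaussian posterior and a standard normal prior (the warmup length is an illustrative choice, not a value from the paper):

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, logvar))

def kl_weight(step, warmup_steps=10000):
    """Linear KL annealing: ramp the weight from 0 to 1 over warmup_steps."""
    return min(1.0, step / warmup_steps)

def annealed_kl_loss(mu, logvar, step, warmup_steps=10000):
    """KL term of the VAE loss with the annealed weight applied."""
    return kl_weight(step, warmup_steps) * gaussian_kl(mu, logvar)
```

Because the weight starts near zero, early training is dominated by the reconstruction term, which is the standard remedy for the posterior collapse the abstract describes.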

17 pages, 1074 KB  
Article
Adversarially Learned Total Variability Embedding for Speaker Recognition with Random Digit Strings
by Woo Hyun Kang and Nam Soo Kim
Sensors 2019, 19(21), 4709; https://doi.org/10.3390/s19214709 - 30 Oct 2019
Cited by 4 | Viewed by 2633
Abstract
In recent years, various studies have investigated methods for verifying users with a short randomized pass-phrase, driven by the increasing demand for voice-based authentication systems. In this paper, we propose a novel technique for extracting an i-vector-like feature based on an adversarially learned inference (ALI) model, which summarizes the variability within the Gaussian mixture model (GMM) distribution through a nonlinear process. Analogous to the previously proposed variational autoencoder (VAE)-based feature extractor, the proposed ALI-based model is trained to generate the GMM supervector according to the maximum likelihood criterion given the Baum–Welch statistics of the input utterance. However, to prevent the potential loss of information caused by the Kullback–Leibler (KL) divergence regularization adopted in VAE-based model training, the newly proposed ALI-based feature extractor exploits a joint discriminator to ensure that the generated latent variable and the GMM supervector are more realistic. The proposed framework is compared with the conventional i-vector and VAE-based methods on the TIDIGITS dataset. Experimental results show that the proposed method represents the uncertainty caused by short duration better than the VAE-based method. Furthermore, the proposed approach performs well when applied in association with the standard i-vector framework.
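The Baum–Welch statistics that both the VAE- and ALI-based extractors consume are the zeroth- and first-order sufficient statistics of the utterance under the GMM. A rough sketch, assuming the frame-level component posteriors have already been computed by the GMM (the function name and plain-list representation are illustrative):

```python
def baum_welch_stats(frames, posteriors):
    """Zeroth-order (N_c) and first-order (F_c) statistics per GMM component.

    frames:     list of T feature vectors (lists of floats)
    posteriors: list of T lists; posteriors[t][c] = p(component c | frame t)
    """
    n_comp = len(posteriors[0])
    dim = len(frames[0])
    N = [0.0] * n_comp                       # soft frame counts per component
    F = [[0.0] * dim for _ in range(n_comp)] # posterior-weighted feature sums
    for x, gamma in zip(frames, posteriors):
        for c in range(n_comp):
            N[c] += gamma[c]
            for d in range(dim):
                F[c][d] += gamma[c] * x[d]
    return N, F
```

For short utterances the counts N are small, which is exactly the duration-induced uncertainty that the abstract says the proposed method represents better than the VAE-based extractor.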

15 pages, 280 KB  
Article
Symmetry Properties of Bi-Normal and Bi-Gamma Receiver Operating Characteristic Curves are Described by Kullback-Leibler Divergences
by Gareth Hughes and Bhaskar Bhattacharya
Entropy 2013, 15(4), 1342-1356; https://doi.org/10.3390/e15041342 - 10 Apr 2013
Cited by 14 | Viewed by 7278
Abstract
Receiver operating characteristic (ROC) curves are used to analyze the performance of diagnostic indicators in the assessment of disease risk in clinical and veterinary medicine and in crop protection. For a binary indicator, an ROC curve summarizes the two distributions of risk scores obtained by retrospectively categorizing subjects as cases or controls using a gold standard. An ROC curve may be symmetric about the negative diagonal of the graphical plot, or skewed towards the left-hand axis or the upper axis of the plot. ROC curves with different symmetry properties may have the same area under the curve. Here, we characterize the symmetry properties of bi-Normal and bi-gamma ROC curves in terms of the Kullback-Leibler divergences (KLDs) between the case and control distributions of risk scores. The KLDs describe the known symmetry properties of bi-Normal ROC curves, and newly characterize the symmetry properties of constant-shape and constant-scale bi-gamma ROC curves. It is also of interest to note an application of KLDs in which their asymmetry, often an inconvenience, has a useful interpretation.
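The asymmetry of the KLD that the abstract exploits is easy to see in the bi-Normal case via the closed-form KL divergence between two univariate normals. A minimal sketch (not the paper's code): with equal variances the two divergences coincide, matching the symmetric bi-Normal ROC curve, while unequal variances break the symmetry.

```python
import math

def kl_normal(mu0, sd0, mu1, sd1):
    """Closed-form KL( N(mu0, sd0^2) || N(mu1, sd1^2) )."""
    return (math.log(sd1 / sd0)
            + (sd0 ** 2 + (mu0 - mu1) ** 2) / (2 * sd1 ** 2)
            - 0.5)
```

For example, kl_normal(0, 1, 1, 1) equals kl_normal(1, 1, 0, 1), whereas kl_normal(0, 1, 0, 2) and kl_normal(0, 2, 0, 1) differ.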
