Authorship Attribution Using Principal Component Analysis and Competitive Neural Networks

Mehmet Can

doi:10.3390/mca19010021

Abstract

Feature extraction is a common problem in statistical pattern recognition. It refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. The transformation is designed, however, so that the data set can be represented by a reduced number of "effective" features while retaining most of the intrinsic information content of the data; in other words, the data set undergoes a dimensionality reduction. Principal component analysis is one such process. In this paper, data collected by counting selected syntactic characteristics in around a thousand paragraphs from each of the sample books are subjected to principal component analysis. The authors of the texts are then identified by competitive neural networks that use these effective features.
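The pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (pca_reduce, train_competitive, assign), the learning rate, the number of epochs, and the synthetic feature counts are all assumptions made for illustration, and a plain winner-take-all update stands in for whatever competitive network the paper actually uses.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)                      # centre each feature
    # SVD of the centred data; rows of Vt are the principal directions
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    variance_ratio = (S ** 2) / np.sum(S ** 2)   # share of variance per component
    return Xc @ components.T, components, variance_ratio[:n_components]

def train_competitive(Z, n_units, epochs=20, lr=0.1, seed=0):
    """Winner-take-all learning: only the closest unit moves toward each sample."""
    rng = np.random.default_rng(seed)
    # Initialise the weight vectors on randomly chosen samples
    W = Z[rng.choice(len(Z), size=n_units, replace=False)].copy()
    for _ in range(epochs):
        for i in rng.permutation(len(Z)):
            winner = np.argmin(np.linalg.norm(W - Z[i], axis=1))
            W[winner] += lr * (Z[i] - W[winner])
    return W

def assign(Z, W):
    """Return the index of the nearest unit for every row of Z."""
    distances = np.linalg.norm(Z[:, None, :] - W[None, :, :], axis=2)
    return distances.argmin(axis=1)

# Hypothetical data: 6 syntactic counts per paragraph for two made-up authors
rng = np.random.default_rng(42)
author_a = rng.normal(loc=3.0, scale=1.0, size=(200, 6))
author_b = rng.normal(loc=6.0, scale=1.0, size=(200, 6))
X = np.vstack([author_a, author_b])

Z, components, variance_ratio = pca_reduce(X, n_components=2)
W = train_competitive(Z, n_units=2)
clusters = assign(Z, W)   # one cluster index per paragraph
```

Each unit's weight vector ends up acting as a prototype in the reduced feature space, and a paragraph is attributed to whichever prototype lies closest. Mapping cluster indices to author names would require labelled paragraphs, which this sketch does not include.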

Keywords:

principal components; authorship attribution; stylometry; text categorization; stylistic features; syntactic characteristics; multilayer perceptron; competitive learning; artificial neural network
