Special Issue "Humanistic Data Processing"

A special issue of Algorithms (ISSN 1999-4893).

Deadline for manuscript submissions: closed (6 December 2016)

Special Issue Editors

Guest Editor
Dr. Katia Lida Kermanidis

Department of Informatics, Ionian University, Greece
Interests: artificial intelligence; natural language processing; grammar development; information retrieval; linguistic data mining; ontology extraction
Guest Editor
Dr. Christos Makris

Department of Computer Engineering & Informatics, University of Patras, Greece
Interests: data structures; information retrieval; data mining; bioinformatics; string algorithmics; computational geometry; multimedia databases; internet technologies
Guest Editor
Prof. Dr. Phivos Mylonas

Department of Informatics, Ionian University, 7 Tsirigoti Square, 49100 Corfu, Greece
Interests: content-based information retrieval; visual context representation and analysis; knowledge-assisted multimedia analysis; issues related to multimedia personalization; user adaptation; user modeling and profiling
Guest Editor
Dr. Spyros Sioutas

Department of Informatics, Ionian University, Greece
Interests: algorithmic data management; spatio-temporal database systems; distributed data structures and P2P overlays; cloud infrastructures; indexing; query processing and query optimization

Special Issue Information

Dear Colleagues,

Data processing and analysis is one of the most important, yet challenging, tasks of our era. The abundance of available information retrieved from, or related to, the Humanistic Sciences poses significant challenges to the research community. The ultimate goal is two-fold: on the one hand, to extract knowledge that aids the understanding of human behavior, creativity, learning, decision making, socializing, and even biological processing; on the other hand, to extract the underlying semantic knowledge and exploit it in computationally intelligent systems.

The nature of humanistic data can be multimodal, semantically heterogeneous, dynamic, time- and space-dependent, and highly complex. Translating humanistic information (e.g., behavior, state of mind, artistic creation, linguistic utterance, learning, genomic information) into numerical or categorical low-level data is a significant challenge in its own right. New algorithms appropriate for this type of data need to be proposed, and existing ones adapted to its particular characteristics.

This Special Issue aims to bring together interdisciplinary approaches that apply innovative, as well as existing, data matching, fusion, and mining techniques, together with knowledge discovery and management techniques (such as decision rules, decision trees, association rules, ontologies and alignments, clustering, filtering, learning, classifier systems, neural networks, support vector machines, preprocessing, post-processing, feature selection, and visualization techniques), to data derived from all areas of the Humanistic Sciences, e.g., linguistic, historical, behavioral, psychological, artistic, musical, educational, and social data. The Issue is devoted to exploiting the many facets of the above fields and will explore the current state-of-the-art. Its topics of interest cover the scope of the MHDW 2016 workshop (https://conferences.cwa.gr/mhdw2016/). Extended versions of papers presented at MHDW 2016 are sought, but this Call for Papers is fully open to anyone who wants to contribute a relevant research manuscript.

Dr. Katia Lida Kermanidis
Dr. Christos Makris
Prof. Dr. Phivos Mylonas
Dr. Spyros Sioutas
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website, then using the online submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and a short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 550 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • humanistic sciences
  • data matching and fusion
  • data mining
  • knowledge discovery and management
  • artificial intelligence
  • information retrieval
  • context
  • social data analytics

Published Papers (6 papers)


Research

Open Access Article Fuzzy Random Walkers with Second Order Bounds: An Asymmetric Analysis
Algorithms 2017, 10(2), 40; doi:10.3390/a10020040
Received: 22 December 2016 / Revised: 22 March 2017 / Accepted: 27 March 2017 / Published: 30 March 2017
Abstract
Edge-fuzzy graphs constitute an essential modeling paradigm across a broad spectrum of domains, ranging from artificial intelligence to computational neuroscience and social network analysis. Under this model, fundamental graph properties such as edge length and graph diameter become stochastic and are consequently expressed in probabilistic terms. Thus, algorithms for fuzzy graph analysis must rely on non-deterministic design principles. One such principle is the random walker, a virtual entity that selects either edges or, as in this case, vertices of a fuzzy graph to visit. This allows the estimation of global graph properties through a long sequence of local decisions, making it a viable candidate strategy for graph processing software relying on native graph databases such as Neo4j. As a concrete example, Chebyshev Walktrap, a heuristic fuzzy community discovery algorithm relying on second order statistics and on the teleportation of the random walker, is proposed, and its performance, expressed in terms of community coherence and number of vertex visits, is compared to the previously proposed Markov Walktrap, Fuzzy Walktrap, and Fuzzy Newman–Girvan algorithms. To facilitate this comparison, a metric based on the asymmetric Tversky index and the Kullback–Leibler divergence is used.
(This article belongs to the Special Issue Humanistic Data Processing)
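As a generic illustration of the random-walker principle described in the abstract (not the paper's Chebyshev Walktrap itself), the sketch below walks an edge-fuzzy graph, treating membership weights as transition preferences and occasionally teleporting to a random vertex; the graph, weights, and parameter values are invented for the example.

```python
import random

def fuzzy_random_walk(graph, start, steps=10_000, teleport=0.15, seed=42):
    """Walk an edge-fuzzy graph: at each step either teleport to a uniformly
    random vertex (with probability `teleport`) or move to a neighbour chosen
    proportionally to the fuzzy membership weight of the connecting edge.
    Returns per-vertex visit counts, from which global properties such as
    community structure can be estimated."""
    rng = random.Random(seed)
    vertices = list(graph)
    visits = {v: 0 for v in vertices}
    current = start
    for _ in range(steps):
        visits[current] += 1
        neighbours = graph[current]
        if not neighbours or rng.random() < teleport:
            current = rng.choice(vertices)          # teleportation step
        else:
            nodes, weights = zip(*neighbours.items())
            current = rng.choices(nodes, weights=weights, k=1)[0]
    return visits

# Toy edge-fuzzy graph: weights in (0, 1] are membership degrees.
g = {
    "a": {"b": 0.9, "c": 0.2},
    "b": {"a": 0.9, "c": 0.8},
    "c": {"a": 0.2, "b": 0.8},
}
counts = fuzzy_random_walk(g, "a")
```

Visit counts concentrate on strongly connected (high-membership) neighbourhoods, which is the signal community-discovery heuristics of this family exploit.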

Open Access Article A Geo-Clustering Approach for the Detection of Areas-of-Interest and Their Underlying Semantics
Algorithms 2017, 10(1), 35; doi:10.3390/a10010035
Received: 20 December 2016 / Revised: 1 March 2017 / Accepted: 13 March 2017 / Published: 18 March 2017
Abstract
Living in the “era of social networking”, we are experiencing a data revolution, generating an astonishing amount of digital information every single day. Due to this proliferation of data volume, there has been an explosion of new application domains for information mined from social networks. In this paper, we leverage this “socially-generated knowledge” (i.e., user-generated content derived from social networks) towards the detection of areas-of-interest within an urban region. These large and homogeneous areas contain multiple points-of-interest which are of special interest to particular groups of people (e.g., tourists and/or consumers). In order to identify them, we exploit two types of metadata, namely location-based information included within geo-tagged photos that we collect from Flickr, along with plain textual information from user-generated tags. We propose an algorithm that divides a predefined geographical area (i.e., the center of Athens, Greece) into “tile”-shaped sub-regions and, based on an iterative merging procedure, detects larger, cohesive areas. We examine the performance of the algorithm both qualitatively and quantitatively. Our experiments demonstrate that the proposed geo-clustering algorithm correctly detects regions that contain popular tourist attractions, with very promising results.
(This article belongs to the Special Issue Humanistic Data Processing)
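A minimal sketch of the tile-and-merge idea, assuming square tiles and a simple connected-components merge as a stand-in for the paper's iterative merging procedure; the coordinates, tile size, and density threshold below are illustrative only.

```python
from collections import defaultdict, deque

def detect_areas(points, bbox, tile_size, min_count=3):
    """Grid a bounding box into square tiles, keep tiles containing at least
    `min_count` geo-tagged points, and merge 4-adjacent dense tiles into
    larger areas-of-interest via breadth-first connected components."""
    min_lat, min_lon, _, _ = bbox
    counts = defaultdict(int)
    for lat, lon in points:
        tile = (int((lat - min_lat) // tile_size),
                int((lon - min_lon) // tile_size))
        counts[tile] += 1
    dense = {t for t, c in counts.items() if c >= min_count}
    areas, seen = [], set()
    for t in dense:
        if t in seen:
            continue
        area, queue = [], deque([t])
        seen.add(t)
        while queue:                      # flood-fill over adjacent dense tiles
            i, j = queue.popleft()
            area.append((i, j))
            for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if n in dense and n not in seen:
                    seen.add(n)
                    queue.append(n)
        areas.append(area)
    return areas

# Two synthetic photo clusters in opposite corners of a unit bounding box.
pts = [(0.01, 0.01), (0.02, 0.015), (0.015, 0.02),
       (0.91, 0.91), (0.92, 0.915), (0.915, 0.92)]
areas = detect_areas(pts, (0.0, 0.0, 1.0, 1.0), tile_size=0.1)
```

Each returned area is a list of tile coordinates; in a full pipeline, the user-generated tags of the photos inside an area would then be aggregated to describe its semantics.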

Open Access Article A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for Modern Greek
Algorithms 2017, 10(1), 34; doi:10.3390/a10010034
Received: 11 December 2016 / Accepted: 24 February 2017 / Published: 6 March 2017
Abstract
Sentiment analysis has played a primary role in text classification. Some years ago, textual information was spreading at manageable rates; nowadays, however, it has outgrown even the most ambitious expectations and grows constantly, within seconds. It is therefore quite complex to cope with the vast amount of textual data, particularly if we also take the incremental production speed into account. Social media posts, e-commerce reviews, news articles, comments, and opinions are broadcast on a daily basis. A rational solution for handling this abundance of data is to build automated information processing systems for analyzing and extracting meaningful patterns from text. The present paper focuses on sentiment analysis applied to Greek texts. Thus far, there is no wide availability of natural language processing tools for Modern Greek; hence, a thorough analysis of Greek, from the lexical to the syntactic level, is difficult to perform. This paper attempts a different approach, based on the proven capabilities of gradient boosting, a well-known technique for dealing with high-dimensional data. The main rationale is that, since English dominates the area of preprocessing tools and quite reliable translation services exist, we can exploit them to translate Greek tokens into English; token-level translation helps preserve precision, since the translation of large texts is not always reliable and meaningful. The new feature set of English tokens is combined with the original set of Greek tokens, producing a high-dimensional dataset that poses certain difficulties for any traditional classifier. Accordingly, we apply gradient boosting machines, an ensemble algorithm that can learn with different loss functions and work efficiently with high-dimensional data.
Moreover, for the task at hand, we deal with class imbalance, since the distribution of sentiments in real-world applications is often unequal. For example, in political forums or electronic discussions about immigration or religion, negative comments overwhelm positive ones. The class imbalance problem was confronted using a hybrid technique that performs a variation of under-sampling the majority class and over-sampling the minority class. Experimental results, considering different settings such as translation of tokens versus translation of sentences, limited Greek text preprocessing, and omission of the translation phase, demonstrate that the proposed gradient boosting framework can effectively cope with both high-dimensional and imbalanced datasets and performs significantly better than a plethora of traditional machine learning classification approaches in terms of precision and recall.
(This article belongs to the Special Issue Humanistic Data Processing)
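The hybrid rebalancing step described above can be sketched as random under-sampling of the majority class combined with random over-sampling (duplication) of the minority class. This is a generic variant, not the authors' exact technique; the choice of the mean class size as the common target is an assumption made for the example.

```python
import random

def hybrid_resample(X, y, seed=0):
    """Rebalance a binary dataset: under-sample the majority class down to the
    mean class size and over-sample (draw with replacement) the minority class
    up to the same size, so both classes end up equally represented."""
    rng = random.Random(seed)
    by_cls = {}
    for xi, yi in zip(X, y):
        by_cls.setdefault(yi, []).append(xi)
    # Order the two classes by size: minority first, majority second.
    (min_lbl, min_x), (maj_lbl, maj_x) = sorted(
        by_cls.items(), key=lambda kv: len(kv[1]))
    target = (len(min_x) + len(maj_x)) // 2
    maj_kept = rng.sample(maj_x, target)                          # under-sampling
    min_kept = min_x + rng.choices(min_x, k=target - len(min_x))  # over-sampling
    X_new = maj_kept + min_kept
    y_new = [maj_lbl] * len(maj_kept) + [min_lbl] * len(min_kept)
    return X_new, y_new

# 8 negative (label 1) vs. 2 positive (label 0) examples -> 5 vs. 5 after.
X = list(range(10))
y = [1] * 8 + [0] * 2
X_new, y_new = hybrid_resample(X, y)
```

The rebalanced set would then be fed to the gradient boosting classifier in place of the raw, skewed training data.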

Open Access Article Large Scale Implementations for Twitter Sentiment Classification
Algorithms 2017, 10(1), 33; doi:10.3390/a10010033
Received: 8 December 2016 / Revised: 28 February 2017 / Accepted: 1 March 2017 / Published: 4 March 2017
Abstract
Sentiment analysis on Twitter data is a challenging problem due to the nature, diversity, and volume of the data. People tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast amount of opinions on a wide spectrum of topics. This information offers huge potential and can be harnessed to gauge the sentiment tendency towards these topics. However, since no one can invest an infinite amount of time in reading through these tweets, an automated decision-making approach is necessary. Nevertheless, most existing solutions are limited to centralized environments and can therefore process at most a few thousand tweets; given the massive number of tweets published daily, such a sample is not representative enough to define the sentiment polarity towards a topic. In this work, we develop two systems, one in MapReduce and one in the Apache Spark framework for Big Data programming. The algorithm exploits all hashtags and emoticons inside a tweet as sentiment labels and performs classification of diverse sentiment types in a parallel and distributed manner. Moreover, the sentiment analysis tool is based on machine learning methodologies alongside natural language processing techniques and utilizes Apache Spark’s machine learning library, MLlib. To address the nature of Big Data, we introduce pre-processing steps for achieving better sentiment analysis results, as well as Bloom filters to compact the storage size of intermediate data and boost the performance of our algorithm. Finally, the proposed system was trained and validated with real data crawled from Twitter, and, through an extensive experimental evaluation, we show that our solution is efficient, robust, and scalable, while confirming the quality of our sentiment identification.
(This article belongs to the Special Issue Humanistic Data Processing)
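The Bloom filters mentioned above can be sketched as follows. This is the textbook construction (k hash positions per item, here derived from salted SHA-256 digests), not the paper's implementation; the filter size, hash count, and sample labels are illustrative.

```python
import hashlib

class BloomFilter:
    """Compact set-membership sketch: each added item sets k bit positions.
    Lookups may yield rare false positives but never false negatives, which
    makes the structure suitable for shrinking intermediate data in
    distributed pipelines."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive k independent positions by salting the item with an index.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

# Hypothetical sentiment labels harvested from hashtags and emoticons.
bf = BloomFilter()
for tag in ("#happy", "#sad", ":)"):
    bf.add(tag)
```

With 1024 bits and 3 hashes, three items occupy at most nine bits, so the false-positive rate for unseen labels is negligible while the storage cost stays fixed.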

Open Access Article Mining Domain-Specific Design Patterns: A Case Study †
Algorithms 2017, 10(1), 28; doi:10.3390/a10010028
Received: 16 November 2016 / Revised: 24 January 2017 / Accepted: 16 February 2017 / Published: 21 February 2017
Abstract
Domain-specific design patterns provide developers with proven solutions to common design problems that arise in a target application domain, helping them produce quality designs in domain contexts. However, research in this area is not mature, and there are no techniques to support their detection. Towards this end, we propose a methodology which, when applied to a collection of websites in a specific domain, facilitates the automated identification of domain-specific design patterns. The methodology automatically extracts the conceptual models of the websites, which are subsequently analyzed in terms of the reusable design fragments used in them to support common domain functionalities. At the conceptual level, we consider these fragments as recurrent patterns, each consisting of a configuration of front-end interface components that interrelate and interact with end-users to support a certain functionality. By performing a pattern-based analysis of the models, we locate the occurrences of all recurrent patterns in the various website designs, which are then evaluated for their consistent use. The detected patterns can be used as building blocks in future designs, assisting developers to produce consistent, quality designs in the target domain. To support our case, we present a case study in the educational domain.
(This article belongs to the Special Issue Humanistic Data Processing)

Open Access Article Evaluation of Diversification Techniques for Legal Information Retrieval
Algorithms 2017, 10(1), 22; doi:10.3390/a10010022
Received: 22 November 2016 / Revised: 17 January 2017 / Accepted: 19 January 2017 / Published: 29 January 2017
Abstract
“Public legal information from all countries and international institutions is part of the common heritage of humanity. Maximizing access to this information promotes justice and the rule of law.” In accordance with this declaration on free access to law by the legal information institutes of the world, a plethora of legal information is available through the Internet, and the provision of legal information has never been easier. Given that the law is now accessed by a much wider group of people, the majority of whom are not legally trained or qualified, diversification techniques should be employed in legal information retrieval so as to increase user satisfaction. We address the diversification of results in legal search by adopting several state-of-the-art methods from the web search, network analysis, and text summarization domains. We provide an exhaustive evaluation of the methods, using a standard dataset from the common-law domain that we objectively annotated with relevance judgments for this purpose. Our results: (i) reveal that users receive broader insights across the results they get from a legal information retrieval system; (ii) demonstrate that web search diversification techniques outperform other approaches (e.g., summarization-based and graph-based methods) in the context of legal diversification; and (iii) offer balance boundaries between reinforcing relevant documents and sampling the information space around the legal query.
(This article belongs to the Special Issue Humanistic Data Processing)
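One classic web-search diversification method of the kind evaluated in the paper is Maximal Marginal Relevance (MMR), which greedily trades off relevance to the query against redundancy with already-selected results. The sketch below is a generic implementation with toy similarity scores, not the authors' configuration; the document names and score values are invented.

```python
def mmr(query_sim, doc_sim, k, lam=0.7):
    """Maximal Marginal Relevance: repeatedly pick the candidate maximizing
    lam * relevance(d) - (1 - lam) * max similarity to the selected set.
    query_sim maps doc -> relevance; doc_sim maps (doc, doc) -> similarity."""
    selected = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def score(d):
            redundancy = max(
                (doc_sim.get((d, s), doc_sim.get((s, d), 0.0))
                 for s in selected),
                default=0.0)
            return lam * query_sim[d] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# d1 and d2 are near-duplicates (e.g., two rulings citing the same precedent);
# d3 is less relevant but covers a distinct aspect of the query.
query_sim = {"d1": 0.9, "d2": 0.85, "d3": 0.6}
doc_sim = {("d1", "d2"): 0.95, ("d1", "d3"): 0.1, ("d2", "d3"): 0.1}
```

With `lam=0.7`, `mmr(query_sim, doc_sim, k=2)` prefers the distinct d3 over the near-duplicate d2 for the second slot, which is exactly the behavior diversification is meant to produce.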
