Open Access Article
Pokémon GO Forensics: An Android Application Analysis
Information 2017, 8(3), 71; doi:10.3390/info8030071
Abstract
As the geolocation capabilities of smartphones continue to improve, developers have continued to create more innovative applications that rely on this location information for their primary function. This can be seen with Niantic’s release of Pokémon GO, a massively multiplayer online role-playing and augmented reality game. This game became immensely popular within just a few days of its release. However, it also had the propensity to be a distraction to drivers, resulting in numerous accidents, and was used as a tool by armed robbers to lure unsuspecting users into secluded areas. This creates a need for forensic investigators to be able to analyze the data within the application in order to determine whether it may have been involved in these incidents. Because this application is new, limited research has been conducted regarding the artifacts that can be recovered from it. In this paper, we aim to fill the gaps in the current research by assessing what forensically relevant information may be recovered from the application and understanding the circumstances behind the creation of this information. Our research focuses primarily on the artifacts generated by the Upsight analytics platform, those contained within the bundles directory, and those related to the Pokémon GO Plus accessory. Moreover, we present our new application-specific analysis tool that is capable of extracting forensic artifacts from a backup of the Android application and presenting them to an investigator in an easily readable format. This analysis tool exceeds the capabilities of the well-known mobile forensic tool Cellebrite’s UFED (Universal Forensic Extraction Device) Physical Analyzer in processing Pokémon GO application data. Full article
Open Access Communication
Adopting Sector-Based Replacement (SBR) and Utilizing Air-R to Achieve R-WSN Sustainability
Information 2017, 8(2), 70; doi:10.3390/info8020070
Abstract
Sensor replacement in the rechargeable wireless sensor network (R-WSN) is important to provide continuous sensing services once sensor node failure or damage occurs. However, satisfactory solutions have not yet been found for developing a sustainable network and effectively prolonging its lifetime. Thus, we propose a new technique for detecting, reporting, and handling sensor failure, called sector-based replacement (SBR). Base station (BS) features are utilized in dividing the monitoring field into sectors and analyzing the incoming data from the nodes to detect the failed nodes. An airplane robot (Air-R) is then dispatched on a replacement trip. The goals of this study are to (i) increase and guarantee the sustainability of the R-WSN; (ii) rapidly detect the failed nodes in sectors by utilizing the BS capabilities in analyzing data and achieve the highest performance in replacing the failed nodes using the Air-R; and (iii) minimize the Air-R’s movement effort by applying the new field-dividing mechanism that leads to fast replacement. Extensive simulations are conducted to verify the effectiveness and efficiency of the SBR technique. Full article
Open Access Article
Expression and Analysis of Joint Roughness Coefficient Using Neutrosophic Number Functions
Information 2017, 8(2), 69; doi:10.3390/info8020069
Abstract
In nature, the mechanical properties of geological bodies are very complex, and their various mechanical parameters are vague, incomplete, imprecise, and indeterminate. In these cases, we cannot always compute or provide exact/crisp values for the joint roughness coefficient (JRC), which is a quite crucial parameter for determining the shear strength in rock mechanics, but we need to approximate them. Hence, we need to investigate the anisotropy and scale effect of indeterminate JRC values by neutrosophic number (NN) functions, because the NN is composed of a determinate part and an indeterminate part and is thus very suitable for the expression of JRC data with determinate and/or indeterminate information. In this study, the lower limit of the JRC data is chosen as the determinate information, and the difference between the lower and upper limits is chosen as the indeterminate information. In this case, the NN functions of the anisotropic ellipse and logarithmic equation of JRC are developed to reflect the anisotropy and scale effect of JRC values. Additionally, the NN parameter ψ is defined to quantify the anisotropy of JRC values. Then, a two-variable NN function is introduced based on the factors of both the sample size and measurement orientation. Further, the changing rates in various sample sizes and/or measurement orientations are investigated by their derivative and partial derivative NN functions. An actual case study shows that the proposed NN functions are effective and reasonable in the expression and analysis of the indeterminate values of JRC. Obviously, NN functions provide a new, effective way of passing from classical crisp expression and analysis to neutrosophic ones. Full article
Open Access Article
Computer-Generated Abstract Paintings Oriented by the Color Composition of Images
Information 2017, 8(2), 68; doi:10.3390/info8020068
Abstract
Designers and artists often require reference images at authoring time. The emergence of computer technology has provided new conditions and possibilities for artistic creation and research. It has also expanded the forms of artistic expression and attracted many artists, designers and computer experts to explore different artistic directions and collaborate with one another. In this paper, we present an efficient k-means-based method that segments the colors of an original picture, analyzes the composition ratio of the color information, and calculates the individual color areas together with their sizes. This information is transformed into regular geometries to reconstruct the colors of the picture and generate abstract images. Furthermore, we designed an application system using the proposed method and generated many works; some artists and designers have used it as an auxiliary tool for art and design creation. Experimental results on our datasets demonstrate the effectiveness of the method and can provide inspiration for creative work. Full article
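As a rough illustration of the color-composition step described above (not the authors' actual system), the following sketch clusters pixel colors with a plain numpy k-means and reports each cluster's mean color and area ratio; the helper name `color_composition` and the toy image are assumptions for the example.

```python
import numpy as np

def color_composition(image, k=2, iters=20, seed=0):
    """Cluster pixel colors with k-means and return each cluster's mean
    color together with the fraction of the image area it covers."""
    rng = np.random.default_rng(seed)
    pixels = image.reshape(-1, 3).astype(float)
    # initialize centers from distinct pixel colors
    uniq = np.unique(pixels, axis=0)
    centers = uniq[rng.choice(len(uniq), k, replace=False)].copy()
    for _ in range(iters):
        # assign every pixel to its nearest center
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    ratios = np.bincount(labels, minlength=k) / len(pixels)
    return centers, ratios

# toy "image": left half red, right half blue
img = np.zeros((10, 10, 3), dtype=np.uint8)
img[:, :5] = (255, 0, 0)
img[:, 5:] = (0, 0, 255)
centers, ratios = color_composition(img, k=2)
print(np.sort(ratios))  # each color covers half the image: [0.5 0.5]
```

The area ratios are what would drive the sizes of the regular geometries in the generated abstract image.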
Open Access Article
Understanding the Impact of Human Mobility Patterns on Taxi Drivers’ Profitability Using Clustering Techniques: A Case Study in Wuhan, China
Information 2017, 8(2), 67; doi:10.3390/info8020067
Abstract
Taxi trajectories reflect human mobility over the urban road network. Although taxi drivers cruise the same city streets, there is an observed variation in their daily profit. To reveal the reasons behind this issue, this study introduces a novel approach for investigating and understanding the impact of human mobility patterns (taxi drivers’ behavior) on drivers’ daily profit. Firstly, a K-means clustering method is adopted to group taxi drivers into three profitability groups according to their driving duration, driving distance and income. Secondly, the cruising trips and stopping spots for each profitability group are extracted. Thirdly, a comparison among the profitability groups in terms of spatial and temporal patterns of cruising trips and stopping spots is carried out, applying various methods including the mash map matching method and the DBSCAN clustering method. Finally, an overall analysis of the results is discussed in detail. The results show that there is a significant relationship between human mobility patterns and taxi drivers’ profitability. High-profitability drivers earn more than the other groups because, through experience, they know which places are the most active for cruising and stopping, and at what times. This study provides suggestions and insights for taxi companies and taxi drivers in order to increase their daily income and to enhance the efficiency of the taxi industry. Full article
Open Access Article
An Energy-Efficient Routing Algorithm in Three-Dimensional Underwater Sensor Networks Based on Compressed Sensing
Information 2017, 8(2), 66; doi:10.3390/info8020066
Abstract
Compressed sensing (CS) has become a powerful tool for processing the correlated data gathered in underwater sensor networks (USNs). Based on CS, certain signals can be recovered from a relatively small number of random linear projections. Since the battery-driven sensor nodes work in adverse environments, energy-efficient routing well matched with CS is needed to realize data gathering in USNs. In this paper, a clustering, uneven-layered, multi-hop routing scheme based on CS (CS-CULM) is proposed. Inter-cluster transmission and fusion are fulfilled by an improved LEACH protocol; uneven-layered, multi-hop routing is then adopted to forward the fused packets to the sink node for data reconstruction. Simulation results show that CS-CULM achieves better performance in energy saving and data reconstruction. Full article
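The claim that certain signals can be recovered from a small number of random linear projections can be illustrated with a generic orthogonal matching pursuit (OMP) sketch; this is a standard CS recovery routine, not the CS-CULM reconstruction used in the paper, and the signal sizes are assumptions for the example.

```python
import numpy as np

def omp(Phi, y, n_iter):
    """Orthogonal matching pursuit: greedily recover a sparse vector x
    from the compressed measurements y = Phi @ x."""
    support, residual = [], y.copy()
    for _ in range(n_iter):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit on the selected columns
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(1)
n, m = 50, 30                      # signal length, number of projections
x_true = np.zeros(n)
x_true[[4, 17]] = [2.0, -2.0]      # 2-sparse signal
Phi = rng.standard_normal((m, n)) / np.sqrt(m)  # random projection matrix
y = Phi @ x_true                   # only m < n measurements
x_hat = omp(Phi, y, n_iter=4)
recovered = np.flatnonzero(np.abs(x_hat) > 1e-3)
print(recovered)                   # indices of the recovered nonzero entries
```

With high probability over the random projection matrix, the two nonzero entries are recovered exactly from 30 of the 50 samples.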
Open Access Article
Turbo Coded OFDM Combined with MIMO Antennas Based on Matched Interleaver for Coded-Cooperative Wireless Communication
Information 2017, 8(2), 63; doi:10.3390/info8020063
Abstract
A turbo coded cooperative orthogonal frequency division multiplexing (OFDM) scheme with multiple-input multiple-output (MIMO) antennas is considered, and its performance over a fast Rayleigh fading channel is evaluated. The turbo coded OFDM incorporates the MIMO (2 × 2) Alamouti space-time block code. The interleaver design and its placement always play a vital role in the performance of a turbo coded cooperation scheme. Therefore, a code-matched interleaver (CMI) is selected as the optimum choice of interleaver and is placed at the relay node. The performance of the CMI is evaluated in a turbo coded OFDM system over an additive white Gaussian noise (AWGN) channel. Moreover, the performance of the CMI is also evaluated in the turbo coded OFDM system with MIMO antennas over a fast Rayleigh fading channel. The modulation schemes chosen are binary phase shift keying (BPSK), quadrature phase shift keying (QPSK) and 16-quadrature amplitude modulation (16QAM). Soft demodulators are employed along with a joint iterative soft-input soft-output (SISO) turbo decoder at the destination node. Monte Carlo simulation results reveal that the turbo coded cooperative OFDM scheme with MIMO antennas successfully achieves coding gain, diversity gain and cooperation gain over the direct transmission scheme under identical conditions. Full article
Open Access Article
Security Policy Scheme for an Efficient Security Architecture in Software-Defined Networking
Information 2017, 8(2), 65; doi:10.3390/info8020065
Abstract
In order to build an efficient security architecture, previous studies have attempted to understand complex system architectures and message flows to detect various attack packets. However, the existing hardware-based single security architecture cannot efficiently handle a complex system structure. To solve this problem, we propose a software-defined networking (SDN) policy-based scheme for an efficient security architecture. The proposed scheme considers four policy functions: separating, chaining, merging, and reordering. If SDN network functions virtualization (NFV) system managers use these policy functions to deploy a security architecture, they only submit some of the requirement documents to the SDN policy-based architecture. After that, the entire security network can be easily built. This paper presents information about the design of a new policy functions model, and it discusses the performance of this model using theoretical analysis. Full article
Open Access Article
Identifying High Quality Document–Summary Pairs through Text Matching
Information 2017, 8(2), 64; doi:10.3390/info8020064
Abstract
Text summarization, namely automatically generating a short summary of a given document, is a difficult task in natural language processing. Nowadays, deep learning as a new technique has gradually been deployed for text summarization, but there is still a lack of large-scale, high-quality datasets for this technique. In this paper, we propose a novel deep learning method to identify high-quality document–summary pairs for building a large-scale pairs dataset. Concretely, a long short-term memory (LSTM)-based model was designed to measure the quality of document–summary pairs. In order to leverage information across all parts of each document, we further propose an improved LSTM-based model that removes the forget gate in the LSTM unit. Experiments conducted on the training set and the test set built upon Sina Weibo (a Chinese microblog website similar to Twitter) showed that the LSTM-based models significantly outperformed baseline models with regard to the area under the receiver operating characteristic curve (AUC) value. Full article
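The forget-gate modification mentioned above can be sketched as a single numpy LSTM step: pinning the forget gate to 1 lets the cell state accumulate contributions from every part of the sequence. This is an illustrative cell, not the authors' trained model; the stacked weight layout and gate ordering are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b, keep_forget_gate=True):
    """One step of a standard LSTM cell. With keep_forget_gate=False the
    forget gate is pinned to 1, so nothing is forgotten and the cell state
    accumulates information from the whole sequence."""
    n = h.shape[0]
    z = W @ x + U @ h + b        # stacked pre-activations for all gates
    i = sigmoid(z[:n])           # input gate
    f = sigmoid(z[n:2 * n]) if keep_forget_gate else np.ones(n)
    o = sigmoid(z[2 * n:3 * n])  # output gate
    g = np.tanh(z[3 * n:])       # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
dx, dh = 4, 3
W = rng.standard_normal((4 * dh, dx)) * 0.1
U = rng.standard_normal((4 * dh, dh)) * 0.1
b = np.zeros(4 * dh)
h = np.zeros(dh)
c = np.zeros(dh)
for t in range(5):               # run a short input sequence
    h, c = lstm_step(rng.standard_normal(dx), h, c, W, U, b,
                     keep_forget_gate=False)
print(h.shape, c.shape)
```

A quality-scoring model would feed the final hidden state `h` of a document–summary pair into a classifier layer.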
Open Access Article
Exponential Operations and an Aggregation Method for Single-Valued Neutrosophic Numbers in Decision Making
Information 2017, 8(2), 62; doi:10.3390/info8020062
Abstract
As an extension of an intuitionistic fuzzy set, a single-valued neutrosophic set is described independently by the membership functions of its truth, indeterminacy, and falsity, which is a subclass of a neutrosophic set (NS). However, in existing exponential operations and their aggregation methods for neutrosophic numbers (NNs) (basic elements in NSs), the exponents (weights) are positive real numbers in unit intervals under neutrosophic decision-making environments. As a supplement, this paper defines new exponential operations of single-valued NNs (basic elements in a single-valued NS), where positive real numbers are used as the bases, and single-valued NNs are used as the exponents. Then, we propose a single-valued neutrosophic weighted exponential aggregation (SVNWEA) operator based on the exponential operational laws of single-valued NNs and the SVNWEA operator-based decision-making method. Finally, an illustrative example shows the applicability and rationality of the presented method. A comparison with a traditional method demonstrates that the new decision-making method is more appropriate and effective. Full article
Open Access Article
Information and Inference
Information 2017, 8(2), 61; doi:10.3390/info8020061
Abstract
Inference is expressed using information and is therefore subject to the limitations of information. The conventions that determine the reliability of inference have developed in information ecosystems under the influence of a range of selection pressures. These conventions embed limitations in information measures like quality, pace and friction caused by selection trade-offs. Some selection pressures improve the reliability of inference; others diminish it by reinforcing the limitations of the conventions. This paper shows how to apply these ideas to inference in order to analyse the limitations; the analysis is applied to various theories of inference including examples from the philosophies of science and mathematics as well as machine learning. The analysis highlights the limitations of these theories and how different, seemingly competing, ideas about inference can relate to each other. Full article
Open Access Article
Correction of Outliers in Temperature Time Series Based on Sliding Window Prediction in Meteorological Sensor Network
Information 2017, 8(2), 60; doi:10.3390/info8020060
Abstract
In order to detect outliers in temperature time series data and thereby improve data quality and the quality of decision-making related to design and operation, we propose an algorithm based on sliding window prediction. Firstly, the time series is segmented based on the sliding window. Then, a prediction model is established from the historical data to predict the future value. If the difference between a predicted value and a measured value is larger than the preset threshold, the point is judged to be an outlier and is then corrected. In this paper, the sliding window and parameter settings of the algorithm are discussed and the algorithm is verified on actual data. This method does not require pre-classifying the abnormal points, runs quickly, and can handle large-scale data. The experimental results show that the proposed algorithm can not only effectively detect outliers in the time series of meteorological data but also notably improve the correction efficiency. Full article
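A minimal sketch of the sliding-window prediction idea, using a window mean as the predictor; the paper's actual prediction model and threshold settings may differ, and the helper name `correct_outliers` and the toy temperature series are assumptions.

```python
import numpy as np

def correct_outliers(series, window=5, threshold=3.0):
    """Predict each point from the mean of the preceding window; if the
    measurement deviates from the prediction by more than `threshold`,
    flag it as an outlier and replace it with the predicted value."""
    data = np.asarray(series, dtype=float).copy()
    outliers = []
    for t in range(window, len(data)):
        predicted = data[t - window:t].mean()   # simple window predictor
        if abs(data[t] - predicted) > threshold:
            outliers.append(t)
            data[t] = predicted                 # correct the outlier
    return data, outliers

temps = [20.1, 20.3, 20.2, 20.4, 20.3, 35.0, 20.5, 20.4]  # 35.0 is a spike
cleaned, idx = correct_outliers(temps, window=5, threshold=3.0)
print(idx)  # [5]
print(round(cleaned[5], 2))  # 20.26, the window-mean prediction
```

Because corrected values feed back into later windows, one spike does not contaminate the predictions for the points that follow it.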
Open Access Article
A Two-Stage Joint Model for Domain-Specific Entity Detection and Linking Leveraging an Unlabeled Corpus
Information 2017, 8(2), 59; doi:10.3390/info8020059
Abstract
The intensive construction of domain-specific knowledge bases (DSKBs) has posed an urgent demand for research on domain-specific entity detection and linking (DSEDL). Joint models are usually adopted in DSEDL tasks, but data imbalance and high computational complexity exist in these models. Besides, traditional feature representation methods are insufficient for domain-specific tasks, due to problems such as the lack of labeled data, link sparseness in DSKBs, and so on. In this paper, a two-stage joint (TSJ) model is proposed to solve the data imbalance problem by discriminatively processing entity mentions with different degrees of ambiguity. In addition, three novel methods are put forward to generate effective features by incorporating an unlabeled corpus. One crucial feature involving entity detection is the mention type, extracted by a long short-term memory (LSTM) model trained on automatically annotated data. The other two types of features mainly involve entity linking, including the inner-document topical coherence, which is measured based on entity co-occurrence relationships in the corpus, and the cross-document entity coherence, evaluated using similar documents. An overall 74.26% F1 value is obtained on a dataset of real-world movie comments, demonstrating the effectiveness of the proposed approach and indicating its potential for use in real-world domain-specific applications. Full article
Open Access Article
A Novel Identity-Based Signcryption Scheme in the Standard Model
Information 2017, 8(2), 58; doi:10.3390/info8020058
Abstract
Identity-based signcryption is a useful cryptographic primitive that provides both authentication and confidentiality for identity-based crypto systems. It is challenging to build a secure identity-based signcryption scheme that can be proven secure in a standard model. In this paper, we address the issue and propose a novel construction of identity-based signcryption which enjoys IND-CCA security and existential unforgeability without resorting to the random oracle model. Comparisons demonstrate that the new scheme achieves stronger security, better performance efficiency and shorter system parameters. Full article
Open Access Article
An Effective and Robust Single Image Dehazing Method Using the Dark Channel Prior
Information 2017, 8(2), 57; doi:10.3390/info8020057
Abstract
In this paper, we propose a single image dehazing method aimed at addressing the inherent limitations of the extensively employed dark channel prior (DCP). More concretely, we introduce the Gaussian mixture model (GMM) to segment the input hazy image into scenes based on the haze density feature map. With the segmentation results, combined with the proposed sky region detection method, we can effectively recognize the sky region, which the DCP cannot handle well. On the basis of sky region detection, we then present an improved global atmospheric light estimation method to increase the estimation accuracy of the atmospheric light. Further, we present a multi-scale fusion-based strategy to obtain the transmission map based on the DCP, which can significantly reduce the blocking artifacts of the transmission map. To further rectify the error-prone transmission within the sky region, an adaptive sky region transmission correction method is also presented. Finally, due to the segmentation blindness of the GMM, we adopt the guided total variation (GTV) to tackle this problem while eliminating the extensive texture details contained in the transmission map. Experimental results verify the power of our method and show its superiority over several state-of-the-art methods. Full article
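The dark channel prior itself is simple to state: for each pixel, take the minimum intensity over all color channels within a local patch. In haze-free outdoor images this value is near zero almost everywhere except in sky regions, which is exactly why the paper adds sky detection. A minimal numpy sketch (not the authors' full pipeline; the patch size and toy image are assumptions):

```python
import numpy as np

def dark_channel(image, patch=3):
    """Dark channel prior: per-pixel minimum over the color channels,
    followed by a minimum filter over a local patch."""
    min_rgb = image.min(axis=2)            # per-pixel channel minimum
    h, w = min_rgb.shape
    r = patch // 2
    padded = np.pad(min_rgb, r, mode='edge')
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

# toy image: dark scene with a bright "sky" stripe on top
img = np.full((6, 6, 3), 0.05)
img[:2, :, :] = 0.9                        # bright sky rows
dc = dark_channel(img, patch=3)
print(dc[0, 0], dc[5, 5])                  # sky stays bright, scene stays dark
```

The bright dark-channel values in the sky rows illustrate where DCP-based transmission estimates break down.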
Open Access Article
Dynamic, Interactive and Visual Analysis of Population Distribution and Mobility Dynamics in an Urban Environment Using the Mobility Explorer Framework
Information 2017, 8(2), 56; doi:10.3390/info8020056
Abstract
This paper investigates the extent to which a mobile data source can be utilised to generate new information intelligence for decision-making in smart city planning processes. In this regard, the Mobility Explorer framework is introduced and applied to the City of Vienna (Austria) by using anonymised mobile phone data from a mobile phone service provider. This framework identifies five necessary elements that are needed to develop complex planning applications. As part of the investigation and experiments a new dynamic software tool, called Mobility Explorer, has been designed and developed based on the requirements of the planning department of the City of Vienna. As a result, the Mobility Explorer enables city stakeholders to interactively visualise the dynamic diurnal population distribution, mobility patterns and various other complex outputs for planning needs. Based on the experiences during the development phase, this paper discusses mobile data issues, presents the visual interface, performs various user-defined analyses, demonstrates the application’s usefulness and critically reflects on the evaluation results of the citizens’ motion exploration that reveal the great potential of mobile phone data in smart city planning but also depict its limitations. These experiences and lessons learned from the Mobility Explorer application development provide useful insights for other cities and planners who want to make informed decisions using mobile phone data in their city planning processes through dynamic visualisation of Call Data Record (CDR) data. Full article
Open Access Article
A Filter Structure for Arbitrary Re-Sampling Ratio Conversion of a Discrete Signal
Information 2017, 8(2), 53; doi:10.3390/info8020053
Abstract
In this report, we studied the sampling synchronization of a discrete signal in the receiver of a communication system and found that the frequency of the received signal usually exhibits some unpredictable deviations. We observed many harmonics caused by the frequency deviations of the discrete received signal. These findings indicate that signal sampling synchronization is an important research technique when using discrete Fourier transforms (DFT) to analyze the harmonics of discrete signals. We investigated the influence of these harmonics on the performance of signal sampling and studied the frequency estimation of the received signal. Based on the frequency estimation of the received signal, the sampling rate of the discrete signal was converted using a modified Farrow filter to achieve sampling synchronization for the received signal. The algorithm discussed here can be applied to sampling synchronization for monitoring and control systems. Finally, simulations and experimental results are presented. Full article
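A first-order (linear) interpolator illustrates the control structure of a Farrow-type arbitrary-ratio resampler: each output sample is taken at a fractional position mu between two input samples. The modified Farrow filter in the paper uses higher-order polynomial branches, so this is only a structural sketch; the helper name `resample` and the sample rates are assumptions.

```python
import numpy as np

def resample(x, ratio):
    """Arbitrary-ratio re-sampling with a first-order (linear)
    Farrow-style interpolator."""
    n_out = int((len(x) - 1) * ratio) + 1
    t = np.arange(n_out) / ratio        # output instants in input-sample units
    m = np.clip(np.floor(t).astype(int), 0, len(x) - 2)
    mu = t - m                          # fractional interval within [m, m+1]
    return (1 - mu) * x[m] + mu * x[m + 1]

fs_in, fs_out = 8000.0, 11025.0         # e.g. telephone rate to 11.025 kHz
t = np.arange(64) / fs_in
x = np.sin(2 * np.pi * 440.0 * t)       # 440 Hz test tone
y = resample(x, fs_out / fs_in)
print(len(x), len(y))                   # 64 input samples -> 87 output samples
```

Because the ratio enters only through the fractional interval `mu`, the same structure handles any (even irrational or slowly drifting) re-sampling ratio, which is what makes it useful for sampling synchronization.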
Open Access Article
An Experience-Based Framework for Evaluating Tourism Mobile Commerce Platforms
Information 2017, 8(2), 55; doi:10.3390/info8020055
Abstract
This research presents and studies an evaluation framework for tourism mobile commerce platforms based on tourists’ experience. Synthesizing from prior literature, relevant theories, and the results of online questionnaires, we select 24 evaluation indices for preliminary evaluation. Using exploratory factor analysis method, we then extract from these indices the following five principal factors: interactive experience, infrastructure experience, personalization experience, product or service quality experience, and product operation experience. We further employ the confirmatory factor analysis to test the construction of the evaluation framework and demonstrate that the evaluation framework is both robust and effective. Finally, based on our proposed evaluation framework, we empirically evaluate the most popular mobile commerce platforms (Ctrip and Qunaer) in China by using fuzzy comprehensive evaluation method. Full article
Open Access Article
A Method for Multi-Criteria Group Decision Making with 2-Tuple Linguistic Information Based on Cloud Model
Information 2017, 8(2), 54; doi:10.3390/info8020054
Abstract
This paper presents a new approach to solving the multi-criteria group decision making (MCGDM) problem where criteria values take the form of 2-tuple linguistic information. Firstly, a 2-tuple hybrid ordered weighted geometric (THOWG) operator is proposed, which considers the importance of both the individual values and their ordered positions so as to overcome the defects of existing operators. Secondly, combining the advantages of the cloud model and the 2-tuple linguistic variable, a new cloud generating method is proposed to transform 2-tuple linguistic variables into clouds. Thirdly, we further define some new cloud algorithms, such as the cloud possibility degree and the cloud support degree, which can be used to compare clouds and to determine the criteria weights, respectively. Furthermore, a new approach for 2-tuple linguistic group decision making is presented on the basis of the THOWG operator, the improved cloud generating method, and the new cloud algorithms. Finally, an example of assessing the social effects of biomass power plants (BPPs) is illustrated to verify the applicability and feasibility of the developed approach, and a comparative analysis is also conducted to validate the effectiveness of the proposed method. Full article
Open Access Article
Multi-Label Classification from Multiple Noisy Sources Using Topic Models
Information 2017, 8(2), 52; doi:10.3390/info8020052
Abstract
Multi-label classification is a well-known supervised machine learning setting where each instance is associated with multiple classes. Examples include annotation of images with multiple labels, assigning multiple tags for a web page, etc. Since several labels can be assigned to a single instance, one of the key challenges in this problem is to learn the correlations between the classes. Our first contribution assumes labels from a perfect source. Towards this, we propose a novel topic model (ML-PA-LDA). The distinguishing feature in our model is that classes that are present as well as the classes that are absent generate the latent topics and hence the words. Extensive experimentation on real world datasets reveals the superior performance of the proposed model. A natural source for procuring the training dataset is through mining user-generated content or directly through users in a crowdsourcing platform. In this more practical scenario of crowdsourcing, an additional challenge arises as the labels of the training instances are provided by noisy, heterogeneous crowd-workers with unknown qualities. With this motivation, we further augment our topic model to the scenario where the labels are provided by multiple noisy sources and refer to this model as ML-PA-LDA-MNS. With experiments on simulated noisy annotators, the proposed model learns the qualities of the annotators well, even with minimal training data. Full article