Pattern Recognition and Data Clustering in Information Theory

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (30 November 2023) | Viewed by 17706

Special Issue Editors

Guest Editor
Higher School of Mechanical and Electrical Engineering (ESIME), National Polytechnic Institute of Mexico (Instituto Politécnico Nacional, IPN), Mexico City 07738, CDMX, Mexico
Interests: pattern recognition; artificial intelligence; neural networks; image processing; segmentation

Guest Editor
Higher School of Mechanical and Electrical Engineering (ESIME), National Polytechnic Institute of Mexico (Instituto Politécnico Nacional, IPN), Mexico City 07738, CDMX, Mexico
Interests: image processing; real-time processing; computer vision; deep learning

Special Issue Information

Dear Colleagues,

This Special Issue on Pattern Recognition and Data Clustering in Information Theory focuses on specialized algorithms applied to signals acquired by different sensors. These algorithms address the automated recognition of patterns and regularities in data in engineering and computer science.

In pattern recognition, data analysis is closely tied to predictive modeling, which uses training data to predict the behavior of unseen test data. This task is known as “learning”. Clustering solves one type of learning problem: the unsupervised case, in which no labels are available.

Clustering is the process of partitioning a set of objects (pattern vectors) into subsets of similar objects called clusters. Common families of clustering algorithms include connectivity models (hierarchical clustering), centroid models (k-means and fuzzy C-means), distribution models (multivariate normal distributions used by the expectation-maximization algorithm), density models (DBSCAN and OPTICS), subspace models (biclustering), graph-based models (HCS, highly connected subgraphs), and neural models (artificial neural networks, self-organizing maps, and principal component analysis). In recent years, considerable effort has gone into improving the performance of existing clustering algorithms and into developing new methods.
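The following minimal Python sketch illustrates the centroid models listed above. It implements standard k-means (Lloyd's algorithm) on plain coordinate lists. The function names, the seeded random initialization, and the iteration cap are illustrative assumptions. None of them comes from any paper in this issue.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100, seed=0):
    """Standard k-means (Lloyd's algorithm); returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Fuzzy C-means follows the same alternating structure. The difference is that each point receives graded memberships in every cluster instead of a single hard label.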

The goal of the Special Issue is to collect original clustering-based research papers that develop or apply new theory to solve problems in fields such as artificial vision, signal and image processing, information retrieval, data compression, computer graphics, and machine learning. Topics of interest include, but are not limited to:

  • Filtering;
  • Enhancement and restoration;
  • Segmentation;
  • Classification and recognition.

Dr. Francisco J. Gallegos-Funes
Dr. Alberto J. Rosales Silva
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can access the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as they are accepted) and are listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • information theory
  • data analysis
  • statistics
  • computing
  • machine learning and systems theory

Published Papers (10 papers)

Research

19 pages, 1924 KiB  
Article
Multiview Data Clustering with Similarity Graph Learning Guided Unsupervised Feature Selection
by Ni Li, Manman Peng and Qiang Wu
Entropy 2023, 25(12), 1606; https://doi.org/10.3390/e25121606 - 30 Nov 2023
Viewed by 730
Abstract
In multiview data clustering, consistent or complementary information across views can yield better clustering results. However, the high dimensionality, lack of labels, and redundancy of multiview data degrade the clustering effect, which poses a challenge to multiview clustering. This study designs a multiview feature selection clustering (MFSC) algorithm that combines similarity graph learning and unsupervised feature selection. In MFSC, local manifold regularization is integrated into similarity graph learning. The clustering labels obtained from similarity graph learning then serve as the criterion for unsupervised feature selection. MFSC thus retains the characteristics of the clustering labels while preserving the manifold structure of multiview data. The algorithm is systematically evaluated on benchmark multiview data and simulated data. The clustering experiments show that MFSC is more effective than traditional algorithms.
(This article belongs to the Special Issue Pattern Recognition and Data Clustering in Information Theory)

17 pages, 8159 KiB  
Article
Denoising Vanilla Autoencoder for RGB and GS Images with Gaussian Noise
by Armando Adrián Miranda-González, Alberto Jorge Rosales-Silva, Dante Mújica-Vargas, Ponciano Jorge Escamilla-Ambrosio, Francisco Javier Gallegos-Funes, Jean Marie Vianney-Kinani, Erick Velázquez-Lozada, Luis Manuel Pérez-Hernández and Lucero Verónica Lozano-Vázquez
Entropy 2023, 25(10), 1467; https://doi.org/10.3390/e25101467 - 20 Oct 2023
Cited by 2 | Viewed by 1035
Abstract
Noise suppression algorithms have been used in various tasks such as computer vision, industrial inspection, and video surveillance. Robust image processing systems need input images that are close to the real scene. However, external factors sometimes alter the data representing the captured image, which results in a loss of information. Procedures are therefore required to recover data as close as possible to the real scene. This research project proposes a Denoising Vanilla Autoencoding (DVA) architecture, based on unsupervised neural networks, for Gaussian denoising of color and grayscale images. According to objective numerical results, the methodology improves on other state-of-the-art architectures. Additional experiments on a validation set and a set of high-resolution noisy images show that the proposal outperforms other types of neural networks designed to suppress noise in images.

16 pages, 2442 KiB  
Article
Graph Clustering with High-Order Contrastive Learning
by Wang Li, En Zhu, Siwei Wang and Xifeng Guo
Entropy 2023, 25(10), 1432; https://doi.org/10.3390/e25101432 - 10 Oct 2023
Cited by 1 | Viewed by 1083
Abstract
Graph clustering is a fundamental and challenging task in unsupervised learning, and it has progressed considerably thanks to contrastive learning. However, two problems remain:

(1) The augmentations in most graph contrastive clustering methods are manual, which can result in semantic drift.

(2) Contrastive learning is usually implemented at the feature level and ignores the structure level, which can lead to sub-optimal performance.

In this work, we propose a method termed Graph Clustering with High-Order Contrastive Learning (GCHCL) to solve these problems. First, we construct two views by applying Laplacian smoothing with different normalizations to the raw features. We also design a structure alignment loss that forces these two views to be mapped into the same space. Second, we build a contrastive similarity matrix from two structure-based similarity matrices and force it to align with an identity matrix. In this way, our contrastive learning encompasses a larger neighborhood. This enables the model to learn clustering-friendly embeddings without an extra clustering module. In addition, our model can be trained on a large dataset.

Extensive experiments on five datasets validate the effectiveness of our model. For example, compared with the second-best baselines on four small and medium datasets, our model achieved an average accuracy improvement of 3%. On the largest dataset, our model achieved an accuracy of 81.92%, whereas the compared baselines ran out of memory.
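The abstract mentions building two views by Laplacian smoothing of raw features with different normalizations. The Python sketch below illustrates one common form of Laplacian smoothing, H = (I - L~)^t X, computed with self-loops added (A~ = A + I). In this formula, I - L~ is either the symmetric normalized adjacency D~^(-1/2) A~ D~^(-1/2) or the random-walk form D~^(-1) A~. Several details are assumptions: whether GCHCL uses exactly this filter, this number of steps t, or these two normalizations. The function names are also hypothetical.

```python
import math


def normalized_adjacency(adj, mode="sym"):
    """Return I - L~ for the self-looped graph A~ = A + I."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    if mode == "sym":  # symmetric: D~^(-1/2) A~ D~^(-1/2)
        return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    if mode == "rw":  # random walk: D~^(-1) A~ (rows sum to 1)
        return [[a[i][j] / deg[i] for j in range(n)] for i in range(n)]
    raise ValueError(f"unknown normalization: {mode}")


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def laplacian_smooth(adj, feats, t=2, mode="sym"):
    """Low-pass filter the node features: H = (I - L~)^t X."""
    s = normalized_adjacency(adj, mode)
    h = [row[:] for row in feats]
    for _ in range(t):
        h = matmul(s, h)  # each step averages a node with its neighbors
    return h


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path graph 0-1-2
feats = [[1.0], [0.0], [0.0]]
view_sym = laplacian_smooth(adj, feats, t=1, mode="sym")
view_rw = laplacian_smooth(adj, feats, t=1, mode="rw")
print(view_sym, view_rw)
```

The two normalizations give two differently smoothed versions of the same features. This shows how two views can be produced without hand-designed augmentations.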

16 pages, 36164 KiB  
Article
Enhancing Image Quality via Robust Noise Filtering Using Redescending M-Estimators
by Ángel Arturo Rendón-Castro, Dante Mújica-Vargas, Antonio Luna-Álvarez and Jean Marie Vianney Kinani
Entropy 2023, 25(8), 1176; https://doi.org/10.3390/e25081176 - 7 Aug 2023
Viewed by 840
Abstract
In image processing, noise is an unwanted component that can arise during signal acquisition, transmission, and storage. In this paper, we introduce an efficient method that incorporates redescending M-estimators within the framework of Wiener estimation. The proposed approach effectively suppresses impulsive, additive, and multiplicative noise at various densities. The filter operates on both grayscale and color images. It uses local information obtained from the Wiener filter together with robust outlier rejection based on Insha and Hampel’s tripartite redescending influence functions. Qualitative and quantitative results, using metrics such as PSNR, MAE, and SSIM, verify the effectiveness of the proposed method.
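The Python sketch below illustrates two ingredients named in this abstract. The first is Hampel's three-part redescending influence (psi) function, which is linear near zero, constant in a middle band, and ramps down to zero so that gross outliers are ignored. The second is the PSNR and MAE metrics computed on flat pixel lists. The sketch does not reproduce the paper's Wiener-based filter or Insha's function. The tuning constants a, b, and c are arbitrary placeholders.

```python
import math


def hampel_psi(x, a=1.0, b=2.0, c=4.0):
    """Hampel's three-part redescending psi function (requires 0 < a <= b < c)."""
    ax = abs(x)
    s = math.copysign(1.0, x)
    if ax <= a:
        return x  # linear core: acts like least squares for small residuals
    if ax <= b:
        return a * s  # constant band: influence is bounded
    if ax <= c:
        return a * s * (c - ax) / (c - b)  # descending ramp back to zero
    return 0.0  # gross outliers are rejected entirely


def mae(ref, est):
    """Mean absolute error between two equal-length pixel sequences."""
    return sum(abs(r - e) for r, e in zip(ref, est)) / len(ref)


def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


print([hampel_psi(r) for r in (0.5, 1.5, 3.0, 5.0)])
print(mae([1, 2, 3], [1, 2, 6]), psnr([10, 20, 30], [12, 19, 30]))
```

Because psi returns to zero beyond c, a residual far from the local estimate contributes nothing. This is the property that lets redescending M-estimators reject impulsive noise.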

32 pages, 9097 KiB  
Article
Benign and Malignant Breast Tumor Classification in Ultrasound and Mammography Images via Fusion of Deep Learning and Handcraft Features
by Clara Cruz-Ramos, Oscar García-Avila, Jose-Agustin Almaraz-Damian, Volodymyr Ponomaryov, Rogelio Reyes-Reyes and Sergiy Sadovnychiy
Entropy 2023, 25(7), 991; https://doi.org/10.3390/e25070991 - 28 Jun 2023
Cited by 8 | Viewed by 2652
Abstract
Breast cancer is a disease that affects women around the world. Its underlying cause is particularly difficult to determine. Given the high risks associated with the disease, early detection is necessary to reduce the death rate. Treatment in the early period can increase life expectancy and quality of life for women.

Computer-Aided Diagnostic (CAD) systems can diagnose benign and malignant breast lesions using technologies and tools based on image processing. By giving a second opinion, they help specialist doctors reach a more precise view with fewer procedures when making their diagnosis. This study presents a novel CAD system for automated breast cancer diagnosis, consisting of several stages:

1. In the preprocessing stage, an image is segmented and a lesion mask is obtained.
2. In the next stage, a CNN (specifically, DenseNet 201) extracts deep learning features.
3. Handcrafted features are also obtained from the image: HOG-based (Histogram of Oriented Gradients) and ULBP-based features, perimeter area, area, eccentricity, and circularity.

The hybrid system thus combines a CNN architecture for deep learning features with traditional methods that compute handcrafted features reflecting the medical properties of the disease. Both sets are later fused via the proposed statistical criteria. In the fusion stage, genetic algorithms and a mutual information selection algorithm are applied, followed by several classifiers based on stochastic measures: XGBoost, AdaBoost, and a multilayer perceptron (MLP). This process chooses the most relevant group of features.

The CAD design was validated in two modalities covering two types of medical studies: mammography (MG) and ultrasound (US). The experiments used the mini-DDSM (Digital Database for Screening Mammography) and BUSI (Breast Ultrasound Images Dataset) databases. The novel CAD systems were evaluated against recent state-of-the-art systems and performed better on commonly used criteria. On the abovementioned datasets, they obtained an accuracy (ACC) of 97.6%, precision (PRE) of 98%, recall of 98%, F1-score of 98%, and IBA of 95%.
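The fusion stage above relies partly on mutual information to select features. The Python sketch below shows a generic mutual-information filter. It estimates I(X;Y) in bits from empirical frequencies of a discrete feature and the class labels, then ranks features by that score. This is a generic sketch under stated assumptions, not the paper's selection algorithm or its genetic-algorithm step. Continuous features such as HOG values would first need to be discretized (for example, by binning). The function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), cxy in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += (cxy / n) * math.log2(cxy * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return feature names sorted by mutual information with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


labels = [0, 0, 1, 1]
columns = {"informative": [0, 0, 1, 1], "irrelevant": [0, 1, 0, 1]}
print(rank_features(columns, labels))
```

A feature that determines the label scores its full entropy (1 bit here). A feature independent of the label scores 0.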

18 pages, 3785 KiB  
Article
Infrared Image Caption Based on Object-Oriented Attention
by Junfeng Lv, Tian Hui, Yongfeng Zhi and Yuelei Xu
Entropy 2023, 25(5), 826; https://doi.org/10.3390/e25050826 - 22 May 2023
Cited by 2 | Viewed by 1247
Abstract
With the ongoing development of image technology, deploying intelligent applications on embedded devices has attracted increasing attention in industry. One such application is automatic captioning of infrared images, which converts images into text. This practical task is widely used in night security and for understanding night scenes, among other scenarios. However, because of differences in image features and the complexity of semantic information, generating captions for infrared images remains challenging.

From the perspective of deployment and application, and to improve the correlation between descriptions and objects, we adopted YOLOv6 and an LSTM as the encoder-decoder structure. On this basis, we proposed an infrared image captioning method based on object-oriented attention. First, to improve the domain adaptability of the detector, we optimized the pseudo-label learning process. Second, we proposed an object-oriented attention method to address the alignment problem between complex semantic information and embedded words. This method selects the most crucial features of the object region and guides the captioning model to generate words that are more relevant to the object.

Our methods perform well on infrared images and produce words explicitly associated with the object regions located by the detector. Their robustness and effectiveness were demonstrated through evaluation on various datasets and comparison with other state-of-the-art methods. Our approach achieved BLEU-4 scores of 31.6 and 41.2 on the KAIST and Infrared City and Town datasets, respectively. It provides a feasible solution for deployment on embedded devices in industrial applications.

28 pages, 22100 KiB  
Article
Adaptive Density Spatial Clustering Method Fusing Chameleon Swarm Algorithm
by Wei Zhou, Limin Wang, Xuming Han, Yizhang Wang, Yufei Zhang and Zhiyao Jia
Entropy 2023, 25(5), 782; https://doi.org/10.3390/e25050782 - 11 May 2023
Cited by 6 | Viewed by 1641
Abstract
The density-based spatial clustering of applications with noise (DBSCAN) algorithm can cluster datasets with arbitrary structure. However, its clustering result is highly sensitive to the neighborhood radius (Eps) and to noise points, which makes it hard to obtain the best result quickly and accurately. To solve these problems, we propose an adaptive DBSCAN method based on the chameleon swarm algorithm (CSA-DBSCAN).

First, we take the clustering evaluation index of the DBSCAN algorithm as the objective function. We then use the chameleon swarm algorithm (CSA) to iteratively optimize this index and obtain the best Eps value and clustering result. Next, we apply the theory of deviation to the spatial distances between data points in the nearest-neighbor search mechanism in order to assign the identified noise points. This solves the problem of the algorithm over-identifying noise points. Finally, we construct color image superpixel information to improve the performance of CSA-DBSCAN in image segmentation.

Simulation results on synthetic datasets, real-world datasets, and color images show that CSA-DBSCAN can quickly find accurate clustering results and segment color images effectively. These results demonstrate a degree of clustering effectiveness and practicality for the method.
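To show why Eps matters so much, the sketch below implements classical DBSCAN in plain Python (not CSA-DBSCAN). A point is a core point when at least min_pts points, itself included, lie within distance eps. Clusters grow outward from core points, and points reached from a core point that are not themselves core become border points. This is a minimal O(n^2) sketch. The paper's CSA-based Eps search and noise reassignment are not shown. Changing eps alters which points count as core and therefore reshapes every cluster.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)

    def neighbors(i):
        # Region query: all points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not yet visited"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is also a core point: expand through it
                queue.extend(j_nbrs)
    return labels


pts = [(0, 0), (0, 0.5), (0.5, 0), (10, 10), (10, 10.5), (10.5, 10), (50, 50)]
print(dbscan(pts, eps=1.0, min_pts=2))
```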

19 pages, 410 KiB  
Article
An Ensemble and Multi-View Clustering Method Based on Kolmogorov Complexity
by Juan Zamora and Jérémie Sublime
Entropy 2023, 25(2), 371; https://doi.org/10.3390/e25020371 - 17 Feb 2023
Cited by 2 | Viewed by 1467
Abstract
Building a more robust clustering from many clustering models with different solutions is relevant in several scenarios. These include settings with privacy-preserving constraints, settings where data features are of different natures, and settings where these features are not available in a single computation unit. Multi-view data keeps growing, and so does the number of clustering algorithms that can produce a wide variety of representations for the same objects. As a result, merging clustering partitions into a single clustering result has become a complex problem with numerous applications.

To tackle this problem, we propose a clustering fusion algorithm that takes existing clustering partitions from multiple vector space models, sources, or views, and merges them into a single partition. Our merging method relies on an information theory model based on Kolmogorov complexity that was originally proposed for unsupervised multi-view learning. The proposed algorithm features a stable merging process. On several real and artificial datasets, it shows competitive results compared with other state-of-the-art methods that have similar goals.

12 pages, 4712 KiB  
Article
Efficient System for Delimitation of Benign and Malignant Breast Masses
by Dante Mújica-Vargas, Manuel Matuz-Cruz, Christian García-Aquino and Celia Ramos-Palencia
Entropy 2022, 24(12), 1775; https://doi.org/10.3390/e24121775 - 5 Dec 2022
Cited by 2 | Viewed by 1292
Abstract
In this study, a high-performing scheme is introduced to delimit benign and malignant masses in breast ultrasound images. The proposal has three components:

- the Nonlocal Means filter, for image quality improvement;
- an Intuitionistic Fuzzy C-Means local clustering algorithm, for superpixel generation with high adherence to edges;
- the DBSCAN algorithm, for global clustering of those superpixels to delimit mass regions.

The empirical study used two datasets, both containing benign and malignant breast tumors. The quantitative results were as follows:

- BUSI dataset, benign masses: JSC 0.907, DM 0.913, HD 7.025, and MCR 6.431.
- BUSI dataset, malignant masses: JSC 0.897, DM 0.900, HD 8.666, and MCR 8.016.
- MID dataset, benign masses: JSC 0.890, DM 0.905, HD 8.370, and MCR 7.241.
- MID dataset, malignant masses: JSC 0.881, DM 0.898, HD 8.865, and MCR 7.808.

These numerical results show that our proposal outperformed all the state-of-the-art methods evaluated for comparison in mass delimitation. The visual results confirm this, since the segmented regions had better edge delimitation.
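The abstract reports JSC, DM, and HD without expanding them. Interpreting them as the Jaccard similarity coefficient, the Dice metric, and the Hausdorff distance is an assumption based on common usage in segmentation work. The Python sketch below computes these three measures for masks represented as sets of foreground pixel coordinates. MCR is omitted because its definition varies between papers. The helper names are hypothetical.

```python
import math


def jaccard(a, b):
    """|A & B| / |A | B| for two sets of foreground pixel coordinates."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a, b):
    """2|A & B| / (|A| + |B|); related to Jaccard by D = 2J / (1 + J)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest distance from a point of p to its nearest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 0), (0, 1), (1, 0), (2, 0)}
print(jaccard(pred, truth), dice(pred, truth), hausdorff(pred, truth))
```

Jaccard and Dice measure region overlap, so higher is better. The Hausdorff distance measures the worst boundary disagreement in pixels, so lower is better.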

19 pages, 11569 KiB  
Article
Grid-Based Clustering Using Boundary Detection
by Mingjing Du and Fuyu Wu
Entropy 2022, 24(11), 1606; https://doi.org/10.3390/e24111606 - 4 Nov 2022
Cited by 5 | Viewed by 4026
Abstract
Clustering algorithms can be divided into five categories: partitioning, hierarchical, model-based, density-based, and grid-based. Among them, grid-based clustering is highly efficient at handling spatial data. However, traditional grid-based clustering algorithms still face several problems:

(1) Parameter tuning: density thresholds are difficult to adjust.

(2) Data challenges: clusters with overlapping regions and varying densities are not handled well.

We propose a new grid-based clustering algorithm named GCBD that addresses these problems. First, the density of nodes is estimated using the standard grid structure. Second, GCBD uses an iterative boundary detection strategy to distinguish core nodes from boundary nodes. Finally, two clustering strategies are combined to group core nodes and assign boundary nodes. Experiments on 18 datasets demonstrate that the proposed algorithm outperforms six grid-based competitors.
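For context, the Python sketch below outlines a classic grid-based clustering pipeline (not GCBD). Points are mapped to square cells, and each cell's density is its point count. Cells meeting a fixed threshold count as dense, and dense cells that touch (8-connectivity) form a cluster. The hand-set threshold is the kind of parameter the abstract describes as hard to tune. GCBD's iterative boundary detection is not shown. The cell size, threshold, and function names are illustrative assumptions.

```python
from collections import Counter


def grid_density(points, cell_size):
    """Map each 2-D point to its grid cell and count points per cell."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)


def dense_cells(density, threshold):
    """Keep the cells whose point count reaches a fixed density threshold."""
    return {cell for cell, count in density.items() if count >= threshold}


def cluster_cells(cells):
    """Label dense cells by 8-connected components using an iterative flood fill."""
    labels, next_id = {}, 0
    for start in cells:
        if start in labels:
            continue
        labels[start] = next_id
        stack = [start]
        while stack:
            cx, cy = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in cells and nb not in labels:
                        labels[nb] = next_id
                        stack.append(nb)
        next_id += 1
    return labels


pts = [(0.1, 0.1), (0.2, 0.5), (0.7, 0.3), (1.1, 0.2), (1.5, 0.5), (1.9, 0.9),
       (5.2, 5.2), (5.5, 5.1), (5.9, 5.8), (9.5, 9.5)]
density = grid_density(pts, cell_size=1.0)
clusters = cluster_cells(dense_cells(density, threshold=2))
print(density, clusters)
```

Work scales with the number of occupied cells rather than the number of point pairs. This is the source of the efficiency on spatial data that the abstract mentions.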