Search Results (25)

Search Parameters:
Keywords = permuted labeling

14 pages, 2091 KiB  
Article
PyGlaucoMetrics: A Stacked Weight-Based Machine Learning Approach for Glaucoma Detection Using Visual Field Data
by Mousa Moradi, Saber Kazeminasab Hashemabad, Daniel M. Vu, Allison R. Soneru, Asahi Fujita, Mengyu Wang, Tobias Elze, Mohammad Eslami and Nazlee Zebardast
Medicina 2025, 61(3), 541; https://doi.org/10.3390/medicina61030541 - 20 Mar 2025
Viewed by 704
Abstract
Background and Objectives: Glaucoma (GL) classification is crucial for early diagnosis and treatment, yet relying solely on stand-alone models or International Classification of Diseases (ICD) codes is insufficient due to limited predictive power and inconsistencies in clinical labeling. This study aims to improve GL classification using stacked weight-based machine learning models. Materials and Methods: We analyzed a subset of 33,636 participants (58% female) with 340,444 visual fields (VFs) from the Mass Eye and Ear (MEE) dataset. Five clinically relevant GL detection models (LoGTS, UKGTS, Kang, HAP2_part1, and Foster) were selected to serve as base models. Two multi-layer perceptron (MLP) models were trained using 52 total deviation (TD) and pattern deviation (PD) values from Humphrey field analyzer (HFA) 24-2 VF tests, along with four clinical variables (age, gender, follow-up time, and race) to extract model weights. These weights were then utilized to train three meta-learners, including logistic regression (LR), extreme gradient boosting (XGB), and MLP, to classify cases as GL or non-GL. Results: The MLP meta-learner achieved the highest performance, with an accuracy of 96.43%, an F-score of 96.01%, and an AUC of 97.96%, while also demonstrating the lowest prediction uncertainty (0.08 ± 0.13). XGB followed with 92.86% accuracy, a 92.31% F-score, and a 96.10% AUC. LR had the lowest performance, with 89.29% accuracy, an 86.96% F-score, and a 94.81% AUC, as well as the highest uncertainty (0.58 ± 0.07). Permutation importance analysis revealed that the superior temporal sector was the most influential VF feature, with importance scores of 0.08 in Kang’s and 0.04 in HAP2_part1 models. Among clinical variables, age was the strongest contributor (score = 0.3). 
Conclusions: The meta-learner outperformed stand-alone models in GL classification, achieving an accuracy improvement of 8.92% over the best-performing stand-alone model (LoGTS with 87.51%), offering a valuable tool for automated glaucoma detection. Full article
(This article belongs to the Special Issue Advances in Diagnosis and Therapies of Ocular Diseases)

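The stacked approach above feeds the outputs of several base detectors into a meta-learner. A minimal sketch of that stacked-generalization pattern, using synthetic data and generic scikit-learn base models as stand-ins for the clinical detection rules (all names and parameters here are illustrative, not the authors' implementation):

```python
# Sketch of stacked generalization: probabilities from several base
# classifiers become the input features of a meta-learner, echoing the
# weight-stacking idea described in the abstract.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base models stand in for the five clinical GL detection models.
bases = [LogisticRegression(max_iter=1000),
         MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)]
for m in bases:
    m.fit(X_tr, y_tr)

# Stack base-model probabilities as meta-features.
meta_tr = np.column_stack([m.predict_proba(X_tr)[:, 1] for m in bases])
meta_te = np.column_stack([m.predict_proba(X_te)[:, 1] for m in bases])

meta = LogisticRegression().fit(meta_tr, y_tr)  # meta-learner (LR variant)
print(round(meta.score(meta_te, y_te), 3))
```

The paper's XGB and MLP meta-learners would drop into the last step in place of the logistic regression.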
18 pages, 16958 KiB  
Article
Investigating Energy Performance Criteria in Compliance with Iranian National Building Regulations: The Role of Residential Building Envelope Adjacency
by Payam Soltan Ahmadi, Ahmad Khoshgard and Hossein Ahmadi Danesh Ashtiani
Buildings 2025, 15(1), 44; https://doi.org/10.3390/buildings15010044 - 26 Dec 2024
Cited by 1 | Viewed by 1198
Abstract
Energy consumption modeling in buildings is crucial for calculating energy performance indices and establishing criteria for energy labeling. Different countries utilize diverse approaches to calculate these indices based on energy efficiency regulations and classifications. In recent years, Iran has established energy compliance standards, outlined in Article 19 of the National Building Regulations, to improve the energy efficiency of buildings. This study aims to develop a systematic methodology for assessing energy consumption indicators in residential buildings using the criteria specified in the Iranian National Building Regulations. Our research examines three specific energy standard categories in residential buildings to evaluate the suitability of the energy compliance specifications and identify the distribution of energy indices, rather than relying solely on the fixed values prescribed in the regulations. Initially, three model building shapes were analyzed to demonstrate how different building envelope designs affect energy performance. This study fills a critical research gap by estimating energy consumption indices through a novel methodology that combines regression analysis and Monte Carlo simulation for the three energy classifications specified in Article 19 of the Iranian National Building Regulations. The study employs a permutation approach to evaluate the primary energy consumption indicators and the uncertainties arising from various adjacency configurations. Extensive simulations were conducted, resulting in the development of regression equations that account for the surface area of the building envelope adjacent to the outdoor environment. The Monte Carlo method was used to assess potential fluctuations in the adiabatic area of the building envelope and the area adjacent to the external environment for buildings with varying orientations, allowing for the generation of probability distributions for energy consumption intensities. 
The sensitivity analysis identified the critical components of the building envelope and their orientation that significantly impact the uncertainty of energy efficiency. The findings revealed that the west and east walls of buildings adjacent to the outdoor environment substantially influence the uncertainty of energy consumption. In contrast, the floor surface and south wall had the least significant effect on annual energy uncertainty. This innovative approach represents a significant advancement in the field. It plays a specific role in energy labeling for buildings by calculating the required standard deviation in energy consumption indices resulting from various envelope adjacencies. This research also has practical implications for building design and energy efficiency measurement. Full article
(This article belongs to the Section Building Energy, Physics, Environment, and Systems)

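The regression-plus-Monte-Carlo step described above can be sketched in a few lines: sample the uncertain outdoor-facing envelope area and push each sample through a regression for the energy index to obtain a probability distribution. The coefficients and area range below are invented for illustration, not taken from Article 19 or the study:

```python
# Illustrative Monte Carlo propagation: a hypothetical regression links the
# energy index to the envelope area adjacent to the outdoors; sampling that
# area yields a distribution of energy consumption intensities.
import numpy as np

rng = np.random.default_rng(42)

def energy_index(outdoor_area_m2):
    # Hypothetical linear regression: base load plus an area-driven term
    # (kWh/m2/year, coefficients invented for illustration).
    return 120.0 + 0.35 * outdoor_area_m2

# Uncertain adjacency: the outdoor-facing area varies with orientation and
# neighbouring buildings (uniform between 80 and 240 m2, assumed).
areas = rng.uniform(80.0, 240.0, size=10_000)
samples = energy_index(areas)

print(f"mean={samples.mean():.1f}, std={samples.std():.1f} kWh/m2/year")
```

The standard deviation of `samples` is the kind of envelope-adjacency-driven spread the study computes for energy labeling.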
25 pages, 5565 KiB  
Article
Unsupervised Modelling of E-Customers’ Profiles: Multiple Correspondence Analysis with Hierarchical Clustering of Principal Components and Machine Learning Classifiers
by Vijoleta Vrhovac, Marko Orošnjak, Kristina Ristić, Nemanja Sremčev, Mitar Jocanović, Jelena Spajić and Nebojša Brkljač
Mathematics 2024, 12(23), 3794; https://doi.org/10.3390/math12233794 - 30 Nov 2024
Viewed by 1353
Abstract
The rapid growth of e-commerce has transformed customer behaviors, demanding deeper insights into how demographic factors shape online user preferences. This study performed a threefold analysis to understand the impact of these changes. Firstly, this study investigated how demographic factors (e.g., age, gender, education) influence e-customer preferences in Serbia. From a sample of n = 906 respondents, conditional dependencies between demographics and user preferences were tested. From a hypothetical framework of 24 tested hypotheses, this study successfully rejected 8/24 (with p < 0.05), suggesting a high association between demographics with purchase frequency and reasons for quitting the purchase. However, although the reported test statistics suggested an association, understanding how interactions between categories shape e-customer profiles was still required. Therefore, the second part of this study considers an MCA-HCPC (Multiple Correspondence Analysis with Hierarchical Clustering on Principal Components) to identify user profiles. The analysis revealed three main clusters: (1) young, female, unemployed e-customers driven mainly by customer reviews; (2) retirees and older adults with infrequent purchases, hesitant to buy without experiencing the product in person; and (3) employed, highly educated, male, middle-aged adults who prioritize fast and accurate delivery over price. In the third stage, the clusters are used as labels for Machine Learning (ML) classification tasks. Particularly, Gradient Boosting Machine (GBM), Decision Tree (DT), k-Nearest Neighbors (kNN), Gaussian Naïve Bayes (GNB), Random Forest (RF), and Support Vector Machine (SVM) were used. The results suggested that GBM, RF, and SVM had high classification performance in identifying user profiles. 
Lastly, after performing Permutation Feature Importance (PFI), the findings suggested that age, work status, education, and income are the main determinants of shaping e-customer profiles and developing marketing strategies. Full article

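The Permutation Feature Importance (PFI) step in the third stage is a standard model-agnostic technique: shuffle one feature at a time and measure the resulting drop in accuracy. A small sketch with synthetic stand-in data (the classifier and dataset are illustrative, not the study's):

```python
# Sketch of permutation feature importance: the importance of a feature is
# the mean accuracy drop when its values are randomly shuffled.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

clf = RandomForestClassifier(random_state=1).fit(X_tr, y_tr)
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=1)

# Rank features by mean importance, highest first.
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")
```

Applied to the cluster-labeled profiles, the same ranking surfaces determinants such as age, work status, education, and income.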
23 pages, 5944 KiB  
Article
Examining Sentiment Analysis for Low-Resource Languages with Data Augmentation Techniques
by Gaurish Thakkar, Nives Mikelić Preradović and Marko Tadić
Eng 2024, 5(4), 2920-2942; https://doi.org/10.3390/eng5040152 - 7 Nov 2024
Viewed by 2105
Abstract
This study investigates the influence of a variety of data augmentation techniques on sentiment analysis in low-resource languages, with a particular emphasis on Bulgarian, Croatian, Slovak, and Slovene. The primary research question addressed is: can sentiment analysis efficacy in low-resource languages be improved through data augmentation? Our sub-questions look at how different augmentation methods affect performance, how effective WordNet-based augmentation is compared to other methods, and whether lemma-based augmentation techniques can be used, especially for Croatian sentiment tasks. The sentiment-labelled evaluations in the selected languages are included in our data sources, which were curated with additional annotations to standardise labels and mitigate ambiguities. Our findings show that techniques such as replacing words with synonyms, masked language model (MLM)-based generation, and permuting and combining sentences enlarge the training datasets, but provide only limited improvements in model accuracy for low-resource language sentiment classification. WordNet-based techniques, in particular, exhibit a marginally superior performance compared to other methods; however, they fail to substantially improve classification scores. From a practical perspective, this study emphasises that conventional augmentation techniques may require refinement to address the complex linguistic features that are inherent to low-resource languages, particularly in mixed-sentiment and context-rich instances. Theoretically, our results indicate that future research should concentrate on the development of augmentation strategies that introduce novel syntactic structures rather than solely relying on lexical variations, as current models may not effectively leverage synonymic or lemmatised data. 
These insights emphasise the nuanced requirements for meaningful data augmentation in low-resource linguistic settings and contribute to the advancement of sentiment analysis approaches. Full article
(This article belongs to the Special Issue Feature Papers in Eng 2024)

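Two of the augmentation ideas mentioned, permuting and combining labelled sentences, are simple enough to sketch in plain Python. The toy corpus and labels below are invented; a real pipeline would also apply synonym replacement and MLM-based generation:

```python
# Toy sketch of sentence-level augmentation: word-order permutation and
# same-label sentence combination, two of the techniques named above.
import itertools
import random

random.seed(0)
corpus = [("the film was great", "pos"), ("i loved the acting", "pos"),
          ("the plot was dull", "neg"), ("i disliked the pacing", "neg")]

def permute(sentence):
    # Shuffle word order; crude, but enlarges the training set.
    words = sentence.split()
    random.shuffle(words)
    return " ".join(words)

def combine(corpus, label):
    # Concatenate pairs of same-label sentences into new examples.
    same = [s for s, l in corpus if l == label]
    return [(a + " and " + b, label) for a, b in itertools.combinations(same, 2)]

augmented = corpus + [(permute(s), l) for s, l in corpus]
augmented += combine(corpus, "pos") + combine(corpus, "neg")
print(len(augmented))  # original 4 + 4 permuted + 2 combined
```

As the abstract notes, such lexical and order-level transformations enlarge the dataset without introducing the novel syntactic structures that models appear to need.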
13 pages, 1965 KiB  
Article
Binary-Tree-Fed Mixnet: An Efficient Symmetric Encryption Solution
by Diego Antonio López-García, Juan Pérez Torreglosa, David Vera and Manuel Sánchez-Raya
Appl. Sci. 2024, 14(3), 966; https://doi.org/10.3390/app14030966 - 23 Jan 2024
Cited by 1 | Viewed by 1648
Abstract
Mixnets are an instrument to achieve anonymity. They are generally a sequence of servers that apply a cryptographic process and a permutation to a batch of user messages. Most use asymmetric cryptography, with the high computational cost that this entails. The main objective of this study is to reduce delay in mixnet nodes. To this end, this paper presents a new scheme based only on symmetric cryptography. The novelty of this scheme is the use of binary graphs built by mixnet nodes. The root node collects user keys and labels without knowing their owners. Once each node is fed its graph, it can establish a random permutation and relate its keys to the incoming batch positions through labels. The differences with previous symmetric schemes are that users do not need long headers and nodes avoid the searching process. The outcomes are security and efficiency improvements. As far as we know, it is the fastest mixnet system. Therefore, it is appropriate for high-throughput applications like national polls (many users) or debates (many messages). Full article
(This article belongs to the Special Issue Cryptography and Information Security)

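The core operation of any mix node, symmetric or asymmetric, is the one described above: strip an encryption layer from each message in a batch, then output the batch under a random permutation so inputs cannot be linked to outputs. A toy sketch, where a one-time-pad-style XOR stands in for a real symmetric cipher (the binary-graph key/label setup of the paper is not reproduced here):

```python
# Toy symmetric mix node: decrypt each slot with its key, then permute the
# batch so message positions no longer reveal senders.
import random

def xor_layer(msg: bytes, key: bytes) -> bytes:
    # XOR "cipher" as a stand-in for a real symmetric scheme; it is its own
    # inverse, so the same call encrypts and decrypts.
    return bytes(m ^ k for m, k in zip(msg, key))

def mix_node(batch, keys, seed=0):
    plain = [xor_layer(m, k) for m, k in zip(batch, keys)]
    perm = list(range(len(plain)))
    random.Random(seed).shuffle(perm)   # the node's random permutation
    return [plain[i] for i in perm]

keys = [bytes([i] * 5) for i in (7, 42, 99)]
batch = [xor_layer(m, k) for m, k in zip([b"alice", b"bobby", b"carol"], keys)]
out = mix_node(batch, keys)
print(sorted(out))
```

The scheme's contribution is how the node learns which key belongs to which batch position (via labels from the binary graph) without learning the key's owner.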
22 pages, 19830 KiB  
Article
D-Net: A Density-Based Convolutional Neural Network for Mobile LiDAR Point Clouds Classification in Urban Areas
by Mahdiye Zaboli, Heidar Rastiveis, Benyamin Hosseiny, Danesh Shokri, Wayne A. Sarasua and Saeid Homayouni
Remote Sens. 2023, 15(9), 2317; https://doi.org/10.3390/rs15092317 - 27 Apr 2023
Cited by 6 | Viewed by 2767
Abstract
The 3D semantic segmentation of a LiDAR point cloud is essential for various complex infrastructure analyses such as roadway monitoring, digital twin, or even smart city development. Different geometric and radiometric descriptors or diverse combinations of point descriptors can extract objects from LiDAR data through classification. However, the irregular structure of the point cloud is a typical descriptor learning problem—how to consider each point and its surroundings in an appropriate structure for descriptor extraction? In recent years, convolutional neural networks (CNNs) have received much attention for automatic segmentation and classification. Previous studies demonstrated deep learning models’ high potential and robust performance for classifying complicated point clouds and permutation invariance. Nevertheless, such algorithms still extract descriptors from independent points without investigating the deep descriptor relationship between the center point and its neighbors. This paper proposes a robust and efficient CNN-based framework named D-Net for automatically classifying a mobile laser scanning (MLS) point cloud in urban areas. Initially, the point cloud is converted into a regular voxelized structure during a preprocessing step. This helps to overcome the challenge of irregularity and inhomogeneity. A density value is assigned to each voxel that describes the point distribution within the voxel’s location. Then, by training the designed CNN classifier, each point will receive the label of its corresponding voxel. The performance of the proposed D-Net method was tested using a point cloud dataset in an urban area. Our results demonstrated a relatively high level of performance with an overall accuracy (OA) of about 98% and precision, recall, and F1 scores of over 92%. Full article
(This article belongs to the Special Issue Machine Learning for LiDAR Point Cloud Analysis)

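The preprocessing step described for D-Net, converting an irregular point cloud into a regular voxel grid with a density value per voxel, can be sketched with NumPy alone. Grid size and the synthetic points are illustrative:

```python
# Voxelize a point cloud and assign each occupied voxel a density value
# (its share of the total point count), as in D-Net's preprocessing step.
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(1000, 3))   # synthetic MLS points (m)
voxel_size = 1.0

# Map each point to integer voxel indices, then count points per voxel.
idx = np.floor(points / voxel_size).astype(int)
voxels, counts = np.unique(idx, axis=0, return_counts=True)

density = counts / counts.sum()                   # normalised density
print(voxels.shape[0], counts.sum())
```

Each point then inherits the label predicted for its voxel, which is how the CNN's voxel-level output is mapped back to the original cloud.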
28 pages, 888 KiB  
Article
Efficient False Positive Control Algorithms in Big Data Mining
by Xuze Liu, Yuhai Zhao, Tongze Xu, Fazal Wahab, Yiming Sun and Chen Chen
Appl. Sci. 2023, 13(8), 5006; https://doi.org/10.3390/app13085006 - 16 Apr 2023
Cited by 6 | Viewed by 2665
Abstract
The typical hypothesis testing issue in statistical analysis is determining whether a pattern is significantly associated with a specific class label. This usually leads to highly challenging multiple-hypothesis testing problems in big data mining scenarios, as millions or billions of hypothesis tests in large-scale exploratory data analysis can result in a large number of false positive results. The permutation testing-based FWER control method (PFWER) is theoretically effective in dealing with multiple hypothesis testing issues. In practice, however, this approach faces a serious computational efficiency problem: computing an appropriate FWER false positive control threshold using PFWER on medium- or large-scale data takes prohibitively long. Although some methods for improving the efficiency of the FWER false positive control threshold calculation have been proposed, most of them are single-machine, and there is still considerable room for efficiency improvement. To address this problem, this paper proposes a distributed PFWER false-positive threshold calculation method for large-scale data. Computational efficiency increases significantly compared with current approaches. The FP-growth algorithm is used first for pattern mining, and the mining process reduces the computation of invalid patterns by using pruning operations and index optimization for merging patterns with index transactions. The distributed computing technique is introduced on this basis, and the constructed FP tree is decomposed into a set of subtrees, each corresponding to a subtask. All subtrees (subtasks) are distributed to different computing nodes. Each node independently calculates the local significance threshold according to the designated subtasks. 
Finally, all local results are aggregated to compute the FWER false positive control threshold, which is completely consistent with the theoretical result. A series of experimental findings on 11 real-world datasets demonstrate that the distributed algorithm proposed in this paper can significantly improve the computation efficiency of PFWER while ensuring its theoretical accuracy. Full article
(This article belongs to the Special Issue Big Data Engineering and Application)

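The PFWER idea being distributed here is the classic permutation-based FWER threshold: permute the class labels many times, record the maximum test statistic over all patterns for each permutation, and take a quantile of those maxima as the significance cutoff. A single-machine sketch on synthetic data (the toy statistic and sizes are invented; the paper's FP-tree decomposition is not shown):

```python
# Single-machine sketch of a permutation-based FWER threshold: the
# (1 - alpha) quantile of per-permutation maximum statistics controls the
# family-wise error rate over all patterns simultaneously.
import numpy as np

rng = np.random.default_rng(7)
n, n_patterns, n_perm, alpha = 200, 50, 500, 0.05

X = rng.integers(0, 2, size=(n, n_patterns))    # pattern occurrence matrix
y = rng.integers(0, 2, size=n)                  # class labels

def stats(X, y):
    # Toy statistic: |difference in pattern frequency between classes|.
    return np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))

max_stats = np.empty(n_perm)
for b in range(n_perm):
    max_stats[b] = stats(X, rng.permutation(y)).max()

threshold = np.quantile(max_stats, 1.0 - alpha)  # FWER-controlling cutoff
print(f"threshold={threshold:.3f}")
```

The paper's contribution is making the permutation loop tractable at scale by splitting the pattern space into FP-subtree subtasks across computing nodes and aggregating the local thresholds.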
10 pages, 258 KiB  
Article
A Novel Problem for Solving Permuted Cordial Labeling of Graphs
by Ashraf ELrokh, Mohammed M. Ali Al-Shamiri, Mohammed M. A. Almazah and Atef Abd El-hay
Symmetry 2023, 15(4), 825; https://doi.org/10.3390/sym15040825 - 29 Mar 2023
Cited by 6 | Viewed by 2187
Abstract
In this paper, we use the permutation group together with the concept of cordiality in graph theory to introduce a new method of labeling. This permuted cordial labeling can be applied to all paths, cycles, fans and wheel graphs. Moreover, we investigate some further properties and show that the union of any two paths and the union of any two cycles are permuted cordial graphs. In addition, we investigate the permuted cordiality of the union of any path with a cycle. Full article
(This article belongs to the Special Issue Graph Algorithms and Graph Theory II)
21 pages, 5282 KiB  
Article
A Novel Separable Scheme for Encryption and Reversible Data Hiding
by Pei Chen, Yang Lei, Ke Niu and Xiaoyuan Yang
Electronics 2022, 11(21), 3505; https://doi.org/10.3390/electronics11213505 - 28 Oct 2022
Cited by 6 | Viewed by 1803
Abstract
With the increasing emphasis on security and privacy, video in the cloud sometimes needs to be stored and processed in an encrypted format. To facilitate the indexing and tampering detection of encrypted videos, data hiding is performed in encrypted videos. This paper proposes a novel separable scheme for encryption and reversible data hiding. In terms of encryption method, intra-prediction mode and motion vector difference are encrypted by XOR encryption, and quantized discrete cosine transform block is permutated based on logistic chaotic mapping. In terms of the reversible data hiding algorithm, difference expansion is applied in encrypted video for the first time in this paper. The encryption method and the data hiding algorithm are separable, and the embedded information can be accurately extracted in both encrypted video bitstream and decrypted video bitstream. The experimental results show that the proposed encryption method can resist sketch attack and has higher security than other schemes, keeping the bit rate unchanged. The embedding algorithm used in the proposed scheme can provide higher capacity in the video with lower quantization parameter and good visual quality of the labeled decrypted video, maintaining low bit rate variation. The video encryption and the reversible data hiding are separable and the scheme can be applied in more scenarios. Full article
(This article belongs to the Topic Computer Vision and Image Processing)

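The block permutation "based on logistic chaotic mapping" mentioned above is a standard construction: iterate the logistic map x → r·x·(1 − x) in its chaotic regime, then sort the resulting sequence to obtain a permutation of block indices. A sketch with illustrative parameters (the XOR encryption of the syntax elements and the difference-expansion embedding are not shown):

```python
# Permutation from a logistic chaotic map: the ranks of the chaotic
# sequence define a key-dependent, invertible scrambling of block indices.
import numpy as np

def logistic_permutation(n, x0=0.3, r=3.99):
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1.0 - x)       # logistic map in its chaotic regime
        xs[i] = x
    return np.argsort(xs)           # ranking the sequence gives a permutation

perm = logistic_permutation(16)
blocks = np.arange(16)              # stand-in for 16 quantized DCT blocks
scrambled = blocks[perm]

# The permutation is invertible, so decryption can restore block order.
inverse = np.argsort(perm)
print(np.array_equal(scrambled[inverse], blocks))
```

Because the permutation is fully determined by the secret parameters `x0` and `r`, a receiver holding the key can rebuild and invert it exactly.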
12 pages, 1139 KiB  
Article
Domain Adaptation for In-Line Allergen Classification of Agri-Food Powders Using Near-Infrared Spectroscopy
by Alexander Lewis Bowler, Samet Ozturk, Ahmed Rady and Nicholas Watson
Sensors 2022, 22(19), 7239; https://doi.org/10.3390/s22197239 - 24 Sep 2022
Cited by 9 | Viewed by 2614
Abstract
The addition of incorrect agri-food powders to a production line due to human error is a large safety concern in food and drink manufacturing, owing to incorporation of allergens in the final product. This work combines near-infrared spectroscopy with machine-learning models for early detection of this problem. Specifically, domain adaptation is used to transfer models from spectra acquired under stationary conditions to moving samples, thereby minimizing the volume of labelled data required to collect on a production line. Two deep-learning domain-adaptation methodologies are used: domain-adversarial neural networks and semisupervised generative adversarial neural networks. Overall, accuracy of up to 96.0% was achieved using no labelled data from the target domain moving spectra, and up to 99.68% was achieved when incorporating a single labelled data instance for each material into model training. Using both domain-adaptation methodologies together achieved the highest prediction accuracies on average, as did combining measurements from two near-infrared spectroscopy sensors with different wavelength ranges. Ensemble methods were used to further increase model accuracy and provide quantification of model uncertainty, and a feature-permutation method was used for global interpretability of the models. Full article
(This article belongs to the Special Issue Artificial Intelligence and Sensor Technologies in Agri-Food)

12 pages, 555 KiB  
Article
An Information-Theoretic Bound on p-Values for Detecting Communities Shared between Weighted Labeled Graphs
by Predrag Obradovic, Vladimir Kovačević, Xiqi Li and Aleksandar Milosavljevic
Entropy 2022, 24(10), 1329; https://doi.org/10.3390/e24101329 - 21 Sep 2022
Viewed by 1924
Abstract
Extraction of subsets of highly connected nodes (“communities” or modules) is a standard step in the analysis of complex social and biological networks. We here consider the problem of finding a relatively small set of nodes in two labeled weighted graphs that is highly connected in both. While many scoring functions and algorithms tackle the problem, the typically high computational cost of permutation testing required to establish the p-value for the observed pattern presents a major practical obstacle. To address this problem, we here extend the recently proposed CTD (“Connect the Dots”) approach to establish information-theoretic upper bounds on the p-values and lower bounds on the size and connectedness of communities that are detectable. This is an innovation on the applicability of CTD, broadening its use to pairs of graphs. Full article
(This article belongs to the Special Issue Information Theory in Computational Biology)

39 pages, 1576 KiB  
Article
A Catalog of Enumeration Formulas for Bouquet and Dipole Embeddings under Symmetries
by Mark N. Ellingham and Joanna A. Ellis-Monaghan
Symmetry 2022, 14(9), 1793; https://doi.org/10.3390/sym14091793 - 29 Aug 2022
Cited by 2 | Viewed by 1712
Abstract
Motivated by the problem arising out of DNA origami, we give a general counting framework and enumeration formulas for various cellular embeddings of bouquets and dipoles under different kinds of symmetries. Our algebraic framework can be used constructively to generate desired symmetry classes, and we use Burnside’s lemma with various symmetry groups to derive the enumeration formulas. Our results assimilate several existing formulas into this unified framework. Furthermore, we provide new formulas for bouquets with colored edges (and thus for bouquets in nonorientable surfaces) as well as for directed embeddings of directed bouquets. We also enumerate vertex-labeled dipole embeddings. Since dipole embeddings may be represented by permutations, the formulas also apply to certain equivalence classes of permutations and permutation matrices. The resulting bouquet and dipole symmetry formulas enumerate structures relevant to a wide variety of areas in addition to DNA origami, including RNA secondary structures, Feynman diagrams, and topological graph theory. For uncolored objects, we catalog 58 distinct sequences, of which 43 have not, as far as we know, been described previously. Full article
(This article belongs to the Special Issue Topological Methods in Chemistry and Molecular Biology)

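A worked example of the Burnside's-lemma counting the paper applies: the number of distinct colorings fixed on average over a symmetry group. For the cyclic group (rotations only), the count of k-colored necklaces of n beads is (1/n) Σᵢ k^gcd(i, n); bouquet and dipole embeddings are enumerated with the same lemma under richer groups:

```python
# Burnside's lemma for necklaces: average, over all n rotations, the number
# of colorings each rotation fixes (k^gcd(i, n) for rotation by i).
from math import gcd

def necklaces(n: int, k: int) -> int:
    return sum(k ** gcd(i, n) for i in range(n)) // n

print(necklaces(4, 2))  # 6 distinct two-color necklaces of length 4
```

Swapping the cyclic group for a dihedral or edge-coloring symmetry group changes only which elements are averaged over, which is how the paper's unified framework yields its various enumeration formulas.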
21 pages, 7529 KiB  
Article
A Novel Query Strategy-Based Rank Batch-Mode Active Learning Method for High-Resolution Remote Sensing Image Classification
by Xin Luo, Huaqiang Du, Guomo Zhou, Xuejian Li, Fangjie Mao, Di’en Zhu, Yanxin Xu, Meng Zhang, Shaobai He and Zihao Huang
Remote Sens. 2021, 13(11), 2234; https://doi.org/10.3390/rs13112234 - 7 Jun 2021
Cited by 10 | Viewed by 3476
Abstract
An informative training set is necessary for ensuring the robust performance of the classification of very-high-resolution remote sensing (VHRRS) images, but labeling work is often difficult, expensive, and time-consuming. This makes active learning (AL) an important part of an image analysis framework. AL aims to efficiently build a representative and efficient library of training samples that are most informative for the underlying classification task, thereby minimizing the cost of obtaining labeled data. Based on ranked batch-mode active learning (RBMAL), this paper proposes a novel combined query strategy of spectral information divergence lowest confidence uncertainty sampling (SIDLC), called RBSIDLC. The base classifier of random forest (RF) is initialized by using a small initial training set, and each unlabeled sample is analyzed to obtain the classification uncertainty score. A spectral information divergence (SID) function is then used to calculate the similarity score, and according to the final score, the unlabeled samples are ranked in descending lists. The most “valuable” samples are selected according to ranked lists and then labeled by the analyst/expert (also called the oracle). Finally, these samples are added to the training set, and the RF is retrained for the next iteration. The whole procedure is iteratively implemented until a stopping criterion is met. The results indicate that RBSIDLC achieves high-precision extraction of urban land use information based on VHRRS; the accuracy of extraction for each land-use type is greater than 90%, and the overall accuracy (OA) is greater than 96%. After the SID replaces the Euclidean distance in the RBMAL algorithm, the RBSIDLC method greatly reduces the misclassification rate among different land types. Therefore, the similarity function based on SID performs better than that based on the Euclidean distance. 
In addition, the OA of RF classification is greater than 90%, suggesting that it is feasible to use RF to estimate the uncertainty score. Compared with the three single query strategies of other AL methods, sample labeling with the SIDLC combined query strategy yields a lower cost and higher quality, thus effectively reducing the misclassification rate of different land use types. For example, compared with the Batch_Based_Entropy (BBE) algorithm, RBSIDLC improves the precision of barren land extraction by 37% and that of vegetation by 14%. The 25 characteristics of different land use types screened by RF cross-validation (RFCV) combined with the permutation method exhibit an excellent separation degree, and the results provide the basis for VHRRS information extraction in urban land use settings based on RBSIDLC. Full article
24 pages, 406 KiB  
Article
Mixture-Based Probabilistic Graphical Models for the Label Ranking Problem
by Enrique G. Rodrigo, Juan C. Alfaro, Juan A. Aledo and José A. Gámez
Entropy 2021, 23(4), 420; https://doi.org/10.3390/e23040420 - 31 Mar 2021
Cited by 9 | Viewed by 3153
Abstract
The goal of the Label Ranking (LR) problem is to learn preference models that predict the preferred ranking of class labels for a given unlabeled instance. Different well-known machine learning algorithms have been adapted to deal with the LR problem. In particular, fine-tuned instance-based algorithms (e.g., k-nearest neighbors) and model-based algorithms (e.g., decision trees) have performed remarkably well in tackling the LR problem. Probabilistic Graphical Models (PGMs, e.g., Bayesian networks) have not been considered for this problem because of the difficulty of modeling permutations in that framework. In this paper, we propose a Hidden Naive Bayes classifier (HNB) to cope with the LR problem. By introducing a hidden variable, we can design a hybrid Bayesian network in which several types of distributions can be combined: multinomial for discrete variables, Gaussian for numerical variables, and Mallows for permutations. We consider two kinds of probabilistic models: one based on a Naive Bayes graphical structure (where only univariate probability distributions are estimated for each state of the hidden variable) and another where we allow interactions among the predictive attributes (using a multivariate Gaussian distribution for the parameter estimation). The experimental evaluation shows that our proposals are competitive with the state-of-the-art algorithms in both accuracy and CPU time requirements. Full article
(This article belongs to the Special Issue Bayesian Inference in Probabilistic Graphical Models)
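The Mallows model the abstract combines with the multinomial and Gaussian components assigns each permutation a probability that decays exponentially with its Kendall tau distance from a central ranking. A minimal sketch, using a central ranking and dispersion parameter chosen purely for illustration (not the paper's estimated values):

```python
# Exact Mallows probabilities over a small label set (toy illustration;
# theta and the central ranking are arbitrary choices, not fitted values).
import itertools
import math

def kendall_tau(a, b):
    """Number of item pairs ranked in opposite order by rankings a and b."""
    pos_b = {item: i for i, item in enumerate(b)}
    d = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            if pos_b[a[i]] > pos_b[a[j]]:
                d += 1
    return d

def mallows_pmf(center, theta):
    """P(pi) proportional to exp(-theta * d(pi, center)), normalized
    by enumerating all permutations (feasible only for few labels)."""
    perms = list(itertools.permutations(center))
    weights = [math.exp(-theta * kendall_tau(p, center)) for p in perms]
    z = sum(weights)
    return {p: w / z for p, w in zip(perms, weights)}

pmf = mallows_pmf(center=(0, 1, 2), theta=1.0)
# The central ranking is the mode, and permutations at equal Kendall
# distance from it receive equal probability.
```

In the HNB setting, one such Mallows component would be estimated per state of the hidden variable, alongside the multinomial and Gaussian components for the predictive attributes.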
16 pages, 2733 KiB  
Article
Local Feature Extraction Network for Point Cloud Analysis
by Zehao Zhou, Yichun Tai, Jianlin Chen and Zhijiang Zhang
Symmetry 2021, 13(2), 321; https://doi.org/10.3390/sym13020321 - 16 Feb 2021
Cited by 4 | Viewed by 4188
Abstract
Geometric feature extraction from 3D point clouds plays an important role in many 3D computer vision applications, such as region labeling, 3D reconstruction, object segmentation, and recognition. However, hand-designed features on point clouds lack semantic information and so cannot meet these requirements. In this paper, we propose a local feature extraction network (LFE-Net) which focuses on extracting local features for point cloud analysis. Such geometric features, learned from the relations among local points, can be used in a variety of shape analysis problems such as classification, part segmentation, and point matching. LFE-Net consists of a local geometric relation (LGR) module, which aims to learn a high-dimensional local feature expressing the relation between points and their neighbors. Benefiting from the additional singular values of local points and hierarchical neural networks, the learned local features are robust to permutation and rigid transformation, so they can be transformed into 3D descriptors. Moreover, we embed prior spatial information of the local points into the sub-features to combine features from multiple levels. LFE-Net achieves state-of-the-art performance on standard benchmarks including ModelNet40 and ShapeNetPart. Full article
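The invariance property the abstract exploits can be checked directly: the singular values of a centered local neighborhood do not change when the points are reordered or the neighborhood is rigidly rotated. The NumPy toy below demonstrates only this property; LFE-Net itself is a learned network, not this hand computation.

```python
# Why singular values of a local neighborhood are permutation- and
# rotation-robust (toy check, not the LFE-Net architecture).
import numpy as np

def local_singular_values(points):
    """Singular values of a mean-centered k x 3 neighborhood matrix."""
    centered = points - points.mean(axis=0)
    return np.linalg.svd(centered, compute_uv=False)

rng = np.random.default_rng(1)
nbhd = rng.normal(size=(16, 3))  # a synthetic 16-point local neighborhood

s = local_singular_values(nbhd)
# Permuting the points leaves the singular values unchanged...
s_perm = local_singular_values(nbhd[rng.permutation(16)])
# ...and so does a rigid rotation of the whole neighborhood.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
s_rot = local_singular_values(nbhd @ R.T)
assert np.allclose(s, s_perm) and np.allclose(s, s_rot)
```

Because row permutation and orthogonal rotation leave the Gram matrix's eigenvalues unchanged, features built on these singular values inherit the robustness the abstract claims.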