MDPI - Publisher of Open Access Journals

27 pages, 3412 KiB

Open Access Article

Efficient Clustering Method for Graph Images Using Two-Stage Clustering Technique

by Hyuk-Gyu Park, Kwang-Seong Shin and Jong-Chan Kim

Electronics 2025, 14(6), 1232; https://doi.org/10.3390/electronics14061232 - 20 Mar 2025

Cited by 1 | Viewed by 546

Graph images, which represent data structures through nodes and edges, present significant challenges for clustering due to their intricate topological properties. Traditional clustering algorithms, such as K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), often struggle to capture both the spatial and the structural relationships within graph images. To overcome these limitations, we propose a novel two-stage clustering approach that integrates conventional clustering techniques with graph-based methodologies to improve both accuracy and efficiency. In the first stage, a distance- or density-based clustering algorithm (e.g., K-means or DBSCAN) is applied to generate initial clusters. In the second stage, these clusters are refined using spectral clustering or community detection techniques to better preserve and exploit topological features. We evaluate our approach on a dataset of 8118 graph images derived from depth measurements taken at various angles. The experimental results demonstrate that our method outperforms single-method clustering approaches in terms of the silhouette score, Calinski-Harabasz index (CHI), and modularity. The silhouette score measures how similar an object is to its own cluster compared with other clusters, while the CHI, also known as the Variance Ratio Criterion, evaluates cluster quality as the ratio of between-cluster dispersion to within-cluster dispersion. Modularity, a metric commonly used in graph-based clustering, assesses the strength of the division of a network into communities. Furthermore, qualitative analysis through visualization confirms that the proposed two-stage approach more effectively differentiates structural similarities within graph images. These findings underscore the potential of hybrid clustering techniques for applications including three-dimensional (3D) measurement analysis, medical imaging, and social network analysis.

(This article belongs to the Special Issue Pattern Recognition and Image Processing: Latest Advances and Prospects)
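
The modularity metric described in the abstract above can be computed directly from an edge list. The following is a minimal pure-Python sketch of the standard Newman-Girvan formula for an unweighted, undirected graph, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the summed degree of its nodes. It is not the authors' implementation; the function name `modularity`, its input format (an edge list plus a node-to-community mapping), and the example graph are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges:       list of (u, v) pairs, each undirected edge listed once
    communities: dict mapping every node to a community label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in community c
    for u, v in edges:
        cu, cv = communities[u], communities[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of [L_c / m - (d_c / 2m)^2]
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    # Hypothetical graph: two triangles (0-1-2 and 3-4-5) joined by the bridge 2-3.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(round(modularity(edges, split), 3))  # 0.357
```

Splitting the example graph into its two triangles gives Q = 5/14, while putting every node in one community gives Q = 0. For real workloads, NetworkX's community module also provides a tested `modularity` function.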

17 pages, 5620 KiB

Open Access Article

Image Segmentation and Filtering of Anaerobic Lagoon Floating Cover in Digital Elevation Model and Orthomosaics Using Unsupervised k-Means Clustering for Scum Association Analysis

by Benjamin Steven Vien, Thomas Kuen, Louis Raymond Francis Rose and Wing Kong Chiu

Remote Sens. 2023, 15(22), 5357; https://doi.org/10.3390/rs15225357 - 14 Nov 2023

Cited by 6 | Viewed by 1919

Abstract

In various engineering applications, remote sensing images such as digital elevation models (DEMs) and orthomosaics provide a convenient means of generating 3D representations of physical assets, enabling new insights and analyses. However, noise and artefacts, particularly unwanted natural features, pose significant challenges, and removing them requires filtering before analysis. Unmanned aerial vehicle-based photogrammetry is used at Melbourne Water’s Western Treatment Plant as a cost-effective and efficient method of inspecting the floating covers on the anaerobic lagoons. The focus of interest is the elevation profile of the floating covers on these sewage-processing lagoons and its implications for sub-surface scum accumulation, which can compromise the structural integrity of the engineered assets. However, artefacts due to trapped rainwater, debris, dirt, and other irrelevant structures can significantly distort the elevation profile. In this study, a machine learning algorithm is used to group distinct features on the floating cover through an image segmentation process. An unsupervised k-means clustering algorithm is applied to a stacked 4D array composed of the DEM elevation and the RGB channels of the associated orthomosaic. In the cluster validation process, seven clusters were considered optimal based on the Calinski–Harabasz criterion. Furthermore, when k-means is used as a filtering technique, three clusters contain features related to the elevations of the floating cover membrane; together they represent 84% of the asset, with each contributing at least 19%. The artefact groups constitute less than 6% of the asset and differ significantly from the membrane groups in their features, colour characteristics, and statistical measurements.
The study found notable improvements with the k-means filtering method, including a 59.4% average reduction in outliers and a 36.3% decrease in standard deviation compared with the raw data. Additionally, applying the proposed method in the scum hardness analysis improved correlation strength by 13.1%, removing approximately 16% of the artefacts in total assets, in contrast to a 3.6% improvement with the median filtering method. This improved imaging will lead to significant benefits when integrating imagery into deep learning models for structural health monitoring and asset performance.
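
The core step described in this abstract, k-means on a per-pixel feature vector that stacks DEM elevation with the orthomosaic's R, G and B values, can be sketched in plain Python as below. This is an illustrative sketch, not the authors' pipeline: the helper names (`stack_pixels`, `standardize`, `kmeans`), the per-channel z-score scaling, the seeded random initialisation, and the tiny example rasters are all assumptions, since the abstract does not specify them.

```python
import random


def stack_pixels(dem, rgb):
    """Flatten a DEM (rows x cols) and an RGB image (rows x cols x 3) into one
    4-feature vector [elevation, R, G, B] per pixel."""
    return [[dem[r][c], *rgb[r][c]] for r in range(len(dem)) for c in range(len(dem[0]))]


def standardize(points):
    """Z-score every feature so elevation and colour share a comparable scale
    (assumed; the abstract does not say how channels were scaled)."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    stds = [(sum((p[j] - means[j]) ** 2 for p in points) / n) ** 0.5 or 1.0 for j in range(d)]
    return [[(p[j] - means[j]) / stds[j] for j in range(d)] for p in points]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means with seeded random initial centroids; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical 2 x 2 tiles: two high grey "membrane" pixels, two low blue "puddle" pixels.
    dem = [[10.0, 10.2], [2.0, 2.1]]
    rgb = [[(120, 120, 120), (122, 121, 119)], [(30, 60, 200), (32, 58, 205)]]
    labels, _ = kmeans(standardize(stack_pixels(dem, rgb)), k=2)
    print(labels)  # pixels 0-1 share one label, pixels 2-3 the other
```

Scaling matters here because raw elevations (metres) and 8-bit colour values differ by orders of magnitude; without it the RGB channels would dominate the Euclidean distance.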

13 pages, 1486 KiB

Open Access Article

Longitudinal Trajectory Modeling to Assess Adherence to Sacubitril/Valsartan among Patients with Heart Failure

by Sara Mucherino, Alexandra Lelia Dima, Enrico Coscioni, Maria Giovanna Vassallo, Valentina Orlando and Enrica Menditto

Pharmaceutics 2023, 15(11), 2568; https://doi.org/10.3390/pharmaceutics15112568 - 1 Nov 2023

Cited by 2 | Viewed by 1980

Abstract

Medication adherence in chronic conditions is a long-term process. Modeling longitudinal trajectories using routinely collected prescription data is a promising method for describing adherence patterns and identifying at-risk groups. The study aimed to characterize distinct long-term sacubitril/valsartan (sac/val) adherence trajectories, and the factors associated with them, in patients with heart failure (HF). Subjects with incident HF starting sac/val in 2017–2018 were identified from the Campania Regional Database for Medication Consumption. We estimated patients’ continuous medication availability (CMA9; R package AdhereR) over a 12-month period. We grouped patients with similar CMA9 trajectories using the R package kml, with the Calinski-Harabasz criterion guiding the choice of the number of groups. We performed multinomial regression analysis to assess the relationship between demographic and clinical factors and adherence trajectory groups. The cohort included 4455 subjects, 70% of them male. Group-based trajectory modeling identified four distinct adherence trajectories: high adherence (42.6% of subjects; CMA mean 0.91 ± 0.08), partial drop-off (19.6%; CMA 0.63 ± 0.13), moderate adherence (19.3%; CMA 0.54 ± 0.11), and low adherence (18.4%; CMA 0.17 ± 0.12). Polypharmacy was associated with partial drop-off adherence (OR 1.194, 95%CI 1.175–1.214), while ≥1 HF hospitalization (OR 1.165, 95%CI 1.151–1.179) and other hospitalizations (OR 1.481, 95%CI 1.459–1.503) were each associated with low adherence. This study found that tailoring patient education, providing support, and ongoing monitoring can boost adherence within different groups, potentially improving health outcomes.

(This article belongs to the Topic Drug Utilization and Medication Adherence: Strategies, Technologies and Practices)
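
The odds ratios in the abstract above are exponentiated regression coefficients, so each reported interval can be checked or reconstructed on the log scale. The sketch below assumes Wald-type intervals (coefficient ± 1.96 standard errors, then exponentiated), which the abstract does not state explicitly; the helper names `odds_ratio_ci` and `coef_from_ci` are hypothetical. In a multinomial model each coefficient compares one trajectory group with a reference group, which the abstract does not name.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a log-odds coefficient.

    z = 1.96 gives an approximate 95% interval.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_ci(lower, upper, z=1.96):
    """Recover the log-scale coefficient and standard error from a reported interval,
    assuming it is a symmetric Wald interval on the log scale."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


if __name__ == "__main__":
    # Polypharmacy vs partial drop-off, as reported: 95%CI 1.175-1.214
    beta, se = coef_from_ci(1.175, 1.214)
    or_, lo, hi = odds_ratio_ci(beta, se)
    print(round(or_, 3))  # 1.194
```

Feeding in the polypharmacy interval recovers a point estimate of about 1.194, matching the reported OR, which is consistent with an interval that is symmetric on the log scale.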

17 pages, 1433 KiB

Open Access Article

Identifying Hazardous Crash Locations Using Empirical Bayes and Spatial Autocorrelation

by Anteneh Afework Mekonnen, Tibor Sipos and Nóra Krizsik

ISPRS Int. J. Geo-Inf. 2023, 12(3), 85; https://doi.org/10.3390/ijgi12030085 - 21 Feb 2023

Cited by 7 | Viewed by 2759

Abstract

Identifying and prioritizing hazardous road traffic crash locations is an efficient way to mitigate road traffic crashes, treat point locations, and introduce regulations for area-wide changes. A sound method for identifying blackspots (BS) and area-wide hotspots (HS) would help increase the precision of interventions, reduce future crashes, and support the introduction of appropriate measures. In this study, we applied the operational-definition criteria of the Hungarian design guideline for road planning to reduce the large number of crashes recorded over three years, for accuracy and simplicity of analysis. K-means and hierarchical clustering algorithms were compared for the segmentation process; K-means performed better and was selected after the two algorithms were compared using three indexes: Silhouette, Davies–Bouldin, and Calinski–Harabasz. The Empirical Bayes (EB) method was employed in the final stage of BS identification. Three BS were identified in Budapest based on a three-year crash data set from 2016 to 2018. An optimized hotspot analysis (Getis-Ord Gi*) was then conducted using Geographic Information System (GIS) techniques. The spatial autocorrelation analysis separates hotspots, cold spots, and insignificant areas at 95% and 90% confidence levels.
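
The Empirical Bayes step can be illustrated with the weighting commonly used in road-safety practice (for example in the Highway Safety Manual): the expected crash count at a site blends the count predicted by a safety performance function (SPF) with the observed count, using w = 1 / (1 + k · N_predicted), where k is the SPF's negative binomial overdispersion parameter. The abstract does not give the authors' SPF, overdispersion value, or ranking measure, so this is an assumption-laden illustration; ranking by excess expected crashes, the helper names, and the site data are hypothetical.

```python
def empirical_bayes(observed, predicted, k):
    """EB-adjusted expected crash count for one site.

    observed:  crashes recorded at the site during the study period
    predicted: crashes predicted for the same period by a safety performance function
    k:         negative binomial overdispersion parameter of that function
    """
    w = 1.0 / (1.0 + k * predicted)  # weight on the SPF prediction
    return w * predicted + (1.0 - w) * observed


def excess_expected(observed, predicted, k):
    """Hypothetical ranking measure: EB expected crashes minus the SPF prediction."""
    return empirical_bayes(observed, predicted, k) - predicted


def rank_sites(sites, k):
    """Order site ids from most to least excess expected crashes.

    sites: dict of site id -> (observed, predicted)
    """
    return sorted(sites, key=lambda s: excess_expected(*sites[s], k), reverse=True)


if __name__ == "__main__":
    # Hypothetical three-year counts: (observed, SPF-predicted) per site.
    sites = {"A": (12, 4.0), "B": (5, 6.0), "C": (9, 3.5)}
    print(rank_sites(sites, k=0.5))  # ['A', 'C', 'B']
```

The weight shrinks noisy observed counts toward the SPF prediction: with k = 0 the estimate equals the prediction, and as k grows it approaches the observed count, which guards against flagging sites that merely had a chance spike.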

18 pages, 3794 KiB

Open Access Article

Characterising Free-Range Layer Flocks Using Unsupervised Cluster Analysis

by Terence Zimazile Sibanda, Mitchell Welch, Derek Schneider, Manisha Kolakshyapati and Isabelle Ruhnke

Animals 2020, 10(5), 855; https://doi.org/10.3390/ani10050855 - 15 May 2020

Cited by 13 | Viewed by 3621

Abstract

This study aimed to identify sub-populations of free-range laying hens and describe the pattern of their resource usage, which can affect hen performance and welfare. In three commercial flocks, 3125 Lohmann Brown hens were equipped with radio-frequency identification (RFID) transponder leg bands and placed with their flock companions, giving a total of 40,000 hens per flock. Hens were monitored for their use of the aviary system, including feeder lines, nest boxes, and the outdoor range. K-means and agglomerative cluster analysis, optimized with the Calinski-Harabasz criterion, identified three clusters. Individual variation in time duration was observed in all clusters, with the highest individual differences on the upper feeder (140 ± 1.02%) and the range (176 ± 1.03%). Hens in cluster 1 spent the least amount of time on the range and the most time on the feed chain located at the upper aviary tier (p < 0.05). We conclude that an uneven load on the resources, as well as consistent and inconsistent movement patterns, occur in the hen house. Further analysis of the data sets using classification models based on support vector machines, artificial neural networks, and decision trees is warranted to investigate the contribution of these and other parameters to hen performance.

(This article belongs to the Section Animal Welfare)
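
The Calinski-Harabasz criterion used in the study above to choose the number of clusters is the ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom: CH = [tr(B)/(k − 1)] / [tr(W)/(n − k)] for n points in k clusters. A minimal pure-Python sketch follows; it is an illustration rather than the authors' code, and the function name and toy data are assumptions. In practice, scikit-learn's `sklearn.metrics.calinski_harabasz_score` computes the same quantity.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz (variance ratio) index; higher means better-separated clusters.

    CH = [tr(B) / (k - 1)] / [tr(W) / (n - k)]
    """
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if not 2 <= k <= n - 1:
        raise ValueError("need between 2 and n - 1 clusters")
    dim = len(points[0])
    overall = [sum(p[j] for p in points) / n for j in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        centroid = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        # tr(B): cluster size times squared distance from its centroid to the overall mean
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        # tr(W): squared distances from each member to its own centroid
        within += sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members)
    if within == 0.0:
        return float("inf")  # all points coincide with their cluster centroids
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    pts = [[0.0], [1.0], [10.0], [11.0]]
    print(calinski_harabasz(pts, [0, 0, 1, 1]))  # 200.0
    print(calinski_harabasz(pts, [0, 1, 0, 1]))  # 0.02
```

To select k, one would cluster the data for each candidate k (for instance with k-means or agglomerative clustering), score each labelling this way, and keep the k with the largest value.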

19 pages, 2687 KiB

Open Access Article

Emergence Patterns of Rare Arable Plants and Conservation Implications

by Joel Torra, Frank Forcella, Jordi Recasens and Aritz Royo-Esnal

Plants 2020, 9(3), 309; https://doi.org/10.3390/plants9030309 - 1 Mar 2020

Cited by 1 | Viewed by 2810

Abstract

Knowledge of the emergence patterns of rare arable plants (RAP) is essential for designing their conservation in Europe. This study hypothesizes that it is possible to find functional groups with similar emergence patterns within RAP, with the aim of establishing management strategies. Seeds of 30 different species were collected from Spanish arable fields and sown under two tillage treatments: (a) at 1 cm depth without soil disturbance, to simulate no-till, and (b) at 1–10 cm depth with soil disturbance every autumn, to simulate tillage to 10 cm depth. Two trials were established, the first maintained for three seasons and the second for two seasons. Relative emergence in autumn, winter and spring was calculated each season. Afterwards, multivariate analysis was performed by K-means clustering and Principal Component Analysis to find groups of RAP species with similar emergence patterns. Four RAP groups were defined, each based on its main emergence season: autumn, winter, spring, or autumn-winter. Tillage treatment and the year of sowing had little effect on emergence patterns, which depended mostly on environmental factors, particularly temperature and rainfall. Therefore, conservation strategies could be designed for each of these RAP functional groups based on emergence patterns, rather than on a species-by-species basis.

(This article belongs to the Special Issue Weed Ecology and Management)
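
In the study above, seasonal emergence is expressed as relative emergence per species before the multivariate analysis, and Principal Component Analysis then summarises those profiles. The NumPy sketch below shows one way to do both steps; the species-by-season count matrix is hypothetical, and computing PCA via an SVD of the column-centred data is a standard choice assumed here rather than a detail given in the abstract.

```python
import numpy as np


def relative_emergence(counts):
    """Convert seedling counts (species x seasons) into per-species proportions."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def pca(X, n_components=2):
    """PCA via SVD of the column-centred data.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # projections onto the components
    ratio = S**2 / np.sum(S**2)
    return scores, ratio[:n_components]


if __name__ == "__main__":
    # Hypothetical counts per species: [autumn, winter, spring]
    counts = [[80, 15, 5], [10, 75, 15], [5, 10, 85], [45, 45, 10]]
    P = relative_emergence(counts)
    scores, ratio = pca(P)
    print(round(float(ratio.sum()), 6))  # 1.0
```

Because each row of relative emergence sums to 1, the centred profiles for three seasons lie in a plane, so the first two components capture essentially all the variance; clustering could then be applied to the proportions or to these scores. A species with no emergence at all would need to be excluded before normalising.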

30 pages, 18220 KiB

Open Access Article

Optimal Clustering and Cluster Identity in Understanding High-Dimensional Data Spaces with Tightly Distributed Points

by Oliver Chikumbo and Vincent Granville

Mach. Learn. Knowl. Extr. 2019, 1(2), 715-744; https://doi.org/10.3390/make1020042 - 5 Jun 2019

Cited by 11 | Viewed by 7053

Abstract

The sensitivity of the elbow rule in determining an optimal number of clusters in high-dimensional spaces characterized by tightly distributed data points is demonstrated. The high-dimensional data samples are not artificially generated; they are taken from a real-world evolutionary many-objective optimization. They consist of Pareto fronts from the last 10 generations of an evolutionary optimization computation with 14 objective functions. The choice of Pareto fronts is strategic: it is intended to benefit the user who needs only one solution from the Pareto set to implement, so a systematic means of reducing the cardinality of solutions is imperative. Accordingly, this manuscript covers clustering the data and identifying the cluster from which to pick the desired solution, highlighting the implementation of the elbow rule and the use of hyper-radial distances for cluster identity. The Calinski-Harabasz statistic was favored as the criterion for the elbow rule because of its robustness; it takes into account both the variance within clusters and the variance between clusters. This exercise also offered an opportunity to revisit the justification for using the highest Calinski-Harabasz criterion to determine the optimal number of clusters for multivariate data. The elbow rule predicted the upper end of the range for the optimal number of clusters, whereas the highest Calinski-Harabasz criterion favored the lower end. Both results are used in a unique way for understanding high-dimensional data, although it remains inconclusive which of the two methods determines the true optimal number of clusters.

(This article belongs to the Section Data)
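
The two selection rules compared in the abstract above can be contrasted on a curve of Calinski-Harabasz values computed over a range of k. The sketch below picks k by the maximum CH value and, separately, locates the elbow as the point farthest from the straight line joining the curve's endpoints after both axes are rescaled to [0, 1]. That geometric elbow heuristic is an assumption, since the abstract does not say how the elbow was located, and the CH values in the example are hypothetical.

```python
def best_k_by_max(ks, scores):
    """Number of clusters with the highest Calinski-Harabasz value."""
    return max(zip(ks, scores), key=lambda pair: pair[1])[0]


def elbow_k(ks, scores):
    """Elbow as the point farthest from the chord joining the first and last points,
    after rescaling both axes to [0, 1] (an assumed geometric heuristic).
    ks must be sorted in ascending order."""
    if len(ks) < 3:
        raise ValueError("need at least three candidate values of k")
    kmin, kmax = min(ks), max(ks)
    smin, smax = min(scores), max(scores)
    if smax == smin:
        return ks[0]  # flat curve: no elbow to find
    xs = [(k - kmin) / (kmax - kmin) for k in ks]
    ys = [(s - smin) / (smax - smin) for s in scores]
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = (dx * dx + dy * dy) ** 0.5
    # Perpendicular distance from each point to the line through the endpoints.
    dists = [abs(dy * x - dx * y + x1 * y0 - y1 * x0) / norm for x, y in zip(xs, ys)]
    return ks[dists.index(max(dists))]


if __name__ == "__main__":
    ks = [2, 3, 4, 5, 6, 7, 8]
    ch = [900, 700, 520, 380, 350, 330, 315]  # hypothetical CH values
    print(best_k_by_max(ks, ch))  # 2
    print(elbow_k(ks, ch))        # 5
```

On a steadily falling CH curve like this one, the maximum rule lands on the smallest k while the elbow sits further along, mirroring the lower-end versus upper-end pattern the abstract reports.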

Search Results (7)
