Search Results (48)

Search Parameters:
Keywords = Calinski Harabasz

34 pages, 3431 KiB  
Article
Evaluation of Hierarchical Clustering Methodologies for Identifying Patterns in Timeout Requests in EuroLeague Basketball
by José Miguel Contreras, Elena Molina Portillo and Juan Manuel Fernández Luna
Mathematics 2025, 13(15), 2414; https://doi.org/10.3390/math13152414 - 27 Jul 2025
Viewed by 210
Abstract
This study evaluates hierarchical clustering methodologies to identify patterns associated with timeout requests for EuroLeague basketball games. Using play-by-play data from 3743 games spanning the 2008–2023 seasons (over 1.9 million instances), we applied Principal Component Analysis to reduce dimensionality and tested multiple agglomerative and divisive clustering techniques (e.g., Ward and DIANA) with different distance metrics (Euclidean, Manhattan, and Minkowski). Clustering quality was assessed using internal validation indices such as Silhouette, Dunn, Calinski–Harabasz, Davies–Bouldin, and Gap statistics. The results show that Ward.D and Ward.D2 methods using Euclidean distance generate well-balanced and clearly defined clusters. Two clusters offer the best overall quality, while four clusters allow for meaningful segmentation of game situations. The analysis revealed that teams that did not request timeouts often exhibited better scoring efficiency, particularly in the advanced game phases. These findings offer data-driven insights into timeout dynamics and contribute to strategic decision-making in professional basketball. Full article
(This article belongs to the Section E: Applied Mathematics)
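The silhouette index named in the abstract above can be illustrated with a short sketch. This is a minimal pure-Python version using Euclidean distance, not the authors' implementation; the function name `silhouette_score` and the convention of scoring singleton clusters as 0 are assumptions made here for illustration.

```python
from math import dist, fsum


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = fsum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            fsum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return fsum(scores) / len(scores)
```

Values lie in [-1, 1]; values near 1 indicate compact, well-separated clusters, while negative values suggest points assigned to the wrong cluster.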

7 pages, 808 KiB  
Proceeding Paper
Performance of a Single-Flicker SSVEP BCI Using Single Channels
by Gerardo Luis Padilla and Fernando Daniel Farfán
Eng. Proc. 2024, 81(1), 19; https://doi.org/10.3390/engproc2024081019 - 6 Jun 2025
Viewed by 661
Abstract
This study investigated performance characteristics and channel selection strategies for single-flicker steady-state visual evoked potential (SSVEP) brain–computer interfaces (BCIs) using minimal recording channels. SSVEP clustering patterns from seven subjects, who focused on four static targets while being exposed to a central 15 Hz stimulus, were analyzed. Using a single-channel approach, signal energy patterns were examined, and principal component analysis (PCA) was performed, which explained over 90% of the data variance. The Calinski–Harabasz Index quantified state separability, identifying channels and comparisons with maximum clustering efficiency. The results demonstrate the feasibility of implementing single-flicker SSVEP BCIs with reduced recording channels, contributing to more practical and efficient BCI systems. Full article
(This article belongs to the Proceedings of The 1st International Online Conference on Bioengineering)

20 pages, 13652 KiB  
Article
Classification of Tropical Cyclone Tracks in the Northwest Pacific Based on the SD-K-Means Model
by Nan Xu, Baisong Yang and Jia Ren
Appl. Sci. 2025, 15(11), 6160; https://doi.org/10.3390/app15116160 - 30 May 2025
Viewed by 448
Abstract
Tropical cyclone (TC) track clustering plays a crucial role in understanding cyclone movement patterns, which is essential for risk assessment and disaster preparedness. This study proposes an improved SD-K-Means clustering algorithm for classifying TC tracks. Using the best-track datasets of TCs from 2000 to 2022, provided by NOAA (National Oceanic and Atmospheric Administration) and JMA (Japan Meteorological Agency), it explores the quantitative relationships between various TC features, such as latitude, longitude, and wind speed, and their motion speed and deflection angles. Based on these analyses, clustering indicators coupled with TC tracks and motion characteristics are identified. To evaluate the model’s performance, three clustering methods—standard K-Means, DTW (Dynamic Time Warping)-based K-Means, and DBSCAN (Density-Based Spatial Clustering of Applications with Noise)—are compared using the Calinski–Harabasz (CH) index and the Davies–Bouldin Index (DBI) as evaluation metrics. The experimental results show that the SD-K-Means algorithm achieved high consistency across the majority of clustering indices, with the optimal number of clusters determined to be four. The spatial distribution of the clustering results demonstrates that SD-K-Means is effective in distinguishing different TC track patterns, providing valuable insights for regional disaster prevention and risk management efforts. Full article

13 pages, 2071 KiB  
Article
Exploratory Cluster-Based Radiographic Phenotyping of Degenerative Cervical Disorder: A Retrospective Study
by Si-Hyung Lew, Ye-Jin Jeong, Ye-Ri Roh and Dong-Ho Kang
Medicina 2025, 61(5), 916; https://doi.org/10.3390/medicina61050916 - 19 May 2025
Viewed by 476
Abstract
Background and Objectives: Degenerative cervical myelopathy (DCM), a major subtype of degenerative cervical disorders, presents with diverse sagittal alignment patterns. However, radiography-based phenotyping remains underexplored. This study aimed to identify distinct cervical alignment subgroups using unsupervised clustering analysis and to explore their potential clinical relevance. Materials and Methods: We analyzed 1371 lateral cervical radiographs of patients with DCM. C3–C7 sagittal vertical axis (SVA), lordosis, vertical length, and curved length were determined. K-means clustering was applied, and the optimal cluster number was determined using the elbow method and silhouette analysis. Clustering validity was assessed using the Calinski–Harabasz and Davies–Bouldin indices. Results: The final clustering solution was validated with a high Calinski–Harabasz index (1171.70) and an acceptable Davies–Bouldin index (0.99) at k = 3, confirming the stability and robustness of the classification. Cluster 1 (forward-head type) exhibited low lordosis (8.3° ± 4.7°), moderate SVA (95.9 ± 60.2 mm), and a compact cervical structure, consistent with kyphotic alignment and forward-head displacement. Cluster 2 (normal) showed the highest lordosis (24.1° ± 6.8°), moderate SVA (70.6 ± 50.2 mm), and balanced sagittal alignment, indicating a biomechanically stable cervical posture. Cluster 3 (long-neck type) displayed the highest SVA (135.6 ± 76.7 mm), the longest vertical and curved lengths, and moderate lordosis, suggesting a structurally elongated cervical spine with anterior head displacement. Significant differences (p < 0.01) were observed across all clusters, confirming distinct phenotypic patterns in cervical sagittal alignment. Conclusions: This exploratory clustering analysis identified three distinct radiographic phenotypes of DCM, reflecting biomechanical heterogeneity. 
Although prospective studies linking these phenotypes to clinical outcomes are warranted, our findings provide a framework for personalized spinal care in the future. Full article
(This article belongs to the Special Issue Clinical Advances in Spine Surgery)
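The Davies–Bouldin index reported in the abstract above (0.99 at k = 3) is a lower-is-better measure with a minimum of 0, whereas the Calinski–Harabasz index is unbounded and its scale depends on the sample size, so a "high" value has no absolute threshold. The sketch below shows one common way to compute the Davies–Bouldin index in pure Python. It is illustrative only; the function name is hypothetical and it assumes Euclidean distance and distinct cluster centroids.

```python
from math import dist, fsum


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of the worst similarity ratio."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")

    dim = len(points[0])
    centroids, scatters = [], []
    for rows in clusters.values():
        c = [fsum(r[d] for r in rows) / len(rows) for d in range(dim)]
        centroids.append(c)
        # Scatter: mean distance of the cluster's points to its centroid.
        scatters.append(fsum(dist(r, c) for r in rows) / len(rows))

    k = len(centroids)
    worst = []
    for i in range(k):
        # Assumes distinct centroids; coincident centroids would divide by zero.
        worst.append(
            max(
                (scatters[i] + scatters[j]) / dist(centroids[i], centroids[j])
                for j in range(k)
                if j != i
            )
        )
    return fsum(worst) / k
```

Each cluster contributes its worst ratio of combined scatter to centroid separation, so a single poorly separated pair of clusters is enough to raise the index.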

18 pages, 2692 KiB  
Article
Unit Size Determination for Exploratory Brain Imaging Analysis: A Quest for a Resolution-Invariant Metric
by Jihnhee Yu, HyunAh Lee and Zohi Sternberg
Mathematics 2025, 13(7), 1195; https://doi.org/10.3390/math13071195 - 4 Apr 2025
Viewed by 378
Abstract
Defining an adequate unit size is often crucial in brain imaging analysis, where datasets are complex, high-dimensional, and computationally demanding. Unit size refers to the spatial resolution at which brain data is aggregated for analysis. Optimizing unit size in data aggregation requires balancing computational efficiency in handling large-scale data sets with the preservation of brain activity patterns, minimizing signal dilution. We propose using the Calinski–Harabasz index, demonstrating its invariance to sample size changes due to varying image resolutions when no distributional differences are present, while the index effectively identifies an appropriate unit size for detecting suspected regions in image comparisons. The resolution-independent metric can be used for unit size evaluation, ensuring adaptability across different imaging protocols and modalities. This study enhances the scalability and efficiency of brain imaging research by providing a robust framework for unit size optimization, ultimately strengthening analytical tools for investigating brain function and structure. Full article
(This article belongs to the Special Issue Mathematical Methods for Image Processing and Computer Vision)

26 pages, 4679 KiB  
Article
Importance Classification Method for Signalized Intersections Based on the SOM-K-GMM Clustering Algorithm
by Ziyi Yang, Yang Chen, Dong Guo, Fangtong Jiao, Bin Zhou and Feng Sun
Sustainability 2025, 17(7), 2827; https://doi.org/10.3390/su17072827 - 22 Mar 2025
Viewed by 407
Abstract
Urbanization has intensified traffic loads, posing significant challenges to the efficiency and stability of urban road networks. Overloaded nodes risk congestion, thus making accurate intersection importance classification essential for resource optimization. This study proposes a hybrid clustering method that combines Self-Organizing Maps (SOMs), K-Means, and the Gaussian Mixture Model (GMM), which is supported by the Traffic Flow–Network Topology–Social Economy (TNS) evaluation framework. This framework integrates three dimensions—traffic flow, road network topology, and socio-economic features—capturing six key indicators: intersection saturation, traffic flow balance, mileage coverage, capacity, betweenness efficiency, and node activity. The SOMs method determines the optimal k value and centroids for K-Means, while GMM validates the cluster membership probabilities. The proposed model achieved a silhouette coefficient of 0.737, a Davies–Bouldin index of 1.003, and a Calinski–Harabasz index of 57.688, with the silhouette coefficient improving by 78.1% over SOMs alone, 65.2% over K-Means, and 11.5% over SOM-K-Means, thus demonstrating high robustness. The intersection importance ranking was conducted using the Mahalanobis distance method, and it was validated on 40 intersections within the road network of Zibo City. By comparing the importance rankings across static, off-peak, morning peak, and evening peak periods, a dynamic ranking approach is proposed. This method provides a robust basis for optimizing resource allocation and traffic management at urban intersections. Full article
(This article belongs to the Section Sustainable Transportation)

27 pages, 3412 KiB  
Article
Efficient Clustering Method for Graph Images Using Two-Stage Clustering Technique
by Hyuk-Gyu Park, Kwang-Seong Shin and Jong-Chan Kim
Electronics 2025, 14(6), 1232; https://doi.org/10.3390/electronics14061232 - 20 Mar 2025
Cited by 1 | Viewed by 546
Abstract
Graph images, which represent data structures through nodes and edges, present significant challenges for clustering due to their intricate topological properties. Traditional clustering algorithms, such as K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), often struggle to effectively capture both spatial and structural relationships within graph images. To overcome these limitations, we propose a novel two-stage clustering approach that integrates conventional clustering techniques with graph-based methodologies to enhance both accuracy and efficiency. In the first stage, a distance- or density-based clustering algorithm (e.g., K-means or DBSCAN) is applied to generate initial cluster formations. In the second stage, these clusters are refined using spectral clustering or community detection techniques to better preserve and exploit topological features. We evaluate our approach using a dataset of 8118 graph images derived from depth measurements taken at various angles. The experimental results demonstrate that our method surpasses single-method clustering approaches in terms of the silhouette score, Calinski–Harabasz index (CHI), and modularity. The silhouette score measures how similar an object is to its own cluster compared to other clusters, while the CHI, also known as the Variance Ratio Criterion, evaluates cluster quality based on the ratio of between-cluster dispersion to within-cluster dispersion. Modularity, a metric commonly used in graph-based clustering, assesses the strength of division of a network into communities. Furthermore, qualitative analysis through visualization confirms that the proposed two-stage clustering approach more effectively differentiates structural similarities within graph images. These findings underscore the potential of hybrid clustering techniques for various applications, including three-dimensional (3D) measurement analysis, medical imaging, and social network analysis. Full article
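The ratio of between-cluster to within-cluster dispersion described in the abstract above can be written out directly. The following is a minimal pure-Python sketch of the Calinski–Harabasz index (Variance Ratio Criterion), CH = [B / (k - 1)] / [W / (n - k)], where B and W are the between- and within-cluster sums of squares. The function name and the choice to return infinity when W is zero are assumptions made for this illustration, not taken from the article.

```python
from math import fsum


def calinski_harabasz(points, labels):
    """Variance Ratio Criterion: [B / (k - 1)] / [W / (n - k)]."""
    n = len(points)
    dim = len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need between 2 and n - 1 clusters")

    def mean(rows):
        return [fsum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sqdist(a, b):
        return fsum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = 0.0  # B: size-weighted squared distances of centroids to the global mean
    within = 0.0   # W: squared distances of points to their own centroid
    for rows in clusters.values():
        c = mean(rows)
        between += len(rows) * sqdist(c, overall)
        within += fsum(sqdist(r, c) for r in rows)

    if within == 0:
        return float("inf")  # assumed convention for perfectly tight clusters
    return (between / (k - 1)) / (within / (n - k))
```

Higher values indicate better-defined clusters. Because the index is unbounded and depends on n and k, it is mainly useful for comparing clusterings of the same dataset.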

15 pages, 2587 KiB  
Article
Understanding Student Learning Behavior: Integrating the Self-Regulated Learning Approach and K-Means Clustering
by Buchaputara Pansri, Sandhya Sharma, Suresh Timilsina, Worawudh Choonhapong, Kentarou Kurashige, Shinya Watanabe and Kazuhiko Sato
Educ. Sci. 2024, 14(12), 1291; https://doi.org/10.3390/educsci14121291 - 25 Nov 2024
Cited by 2 | Viewed by 2481
Abstract
Information and communication technology considerably impacts students’ engagement with self-regulated learning (SRL) methodologies. However, there has been a lack of any comprehensive visualization of the SRL process, making it difficult to interpret student behaviors. To address this issue, the REXX platform is used in this study to visualize SRL outputs. While REXX has previously been used to present educational metrics more comprehensively and personally in the learning management system (LMS) framework, research on understanding student behavior through the learning analytics platform (LAP) remains unexplored. This study focused on transforming REXX from an LMS to an LAP to capture detailed features of individual student profiles, thereby reflecting specific SRL characteristics. We collected profile data from 215 high school students via an e-learning web application and used K-means clustering to categorize their behaviors. The method yielded a Davies–Bouldin score of 0.9718, a silhouette score of 0.54, and a Calinski–Harabasz score of 124.1805. This study addresses both teaching and learning strategies for educators and students. It represents a considerable step toward understanding student behavior in the e-learning environment. However, we recommend integrating machine learning models to enhance automated learning strategies alongside this baseline framework. Full article

32 pages, 7951 KiB  
Article
Hybrid Machine Learning for Stunting Prevalence: A Novel Comprehensive Approach to Its Classification, Prediction, and Clustering Optimization in Aceh, Indonesia
by Novia Hasdyna, Rozzi Kesuma Dinata, Rahmi and T. Irfan Fajri
Informatics 2024, 11(4), 89; https://doi.org/10.3390/informatics11040089 - 21 Nov 2024
Cited by 2 | Viewed by 2413
Abstract
Stunting remains a significant public health issue in Aceh, Indonesia, and is influenced by various socio-economic and environmental factors. This study aims to address key challenges in accurately classifying stunting prevalence, predicting future trends, and optimizing clustering methods to support more effective interventions. To this end, we propose a novel hybrid machine learning framework that integrates classification, predictive modeling, and clustering optimization. Support Vector Machines (SVM) with Radial Basis Function (RBF) and Sigmoid kernels were employed to improve the classification accuracy, with the RBF kernel outperforming the Sigmoid kernel, achieving an accuracy rate of 91.3% compared with 85.6%. This provides a more reliable tool for identifying high-risk populations. Furthermore, linear regression was used for predictive modeling, yielding a low Mean Squared Error (MSE) of 0.137, demonstrating robust predictive accuracy for future stunting prevalence. Finally, the clustering process was optimized using a weighted-product approach to enhance the efficiency of K-Medoids. This optimization reduced the number of iterations from seven to three and improved the Calinski–Harabasz Index from 85.2 to 93.7. This comprehensive framework not only enhances the classification, prediction, and clustering of results but also delivers actionable insights for targeted public health interventions and policymaking aimed at reducing stunting in Aceh. Full article
(This article belongs to the Section Health Informatics)

27 pages, 15476 KiB  
Article
Explainable AI-Based Ensemble Clustering for Load Profiling and Demand Response
by Elissaios Sarmas, Afroditi Fragkiadaki and Vangelis Marinakis
Energies 2024, 17(22), 5559; https://doi.org/10.3390/en17225559 - 7 Nov 2024
Cited by 7 | Viewed by 1462
Abstract
Smart meter data provide an in-depth perspective on household energy usage. This research leverages such data to enhance demand response (DR) programs through a novel application of ensemble clustering. Despite its promising capabilities, our literature review identified a notable under-utilization of ensemble clustering in this domain. To address this shortcoming, we applied an advanced ensemble clustering method and compared its performance with traditional algorithms, namely, K-Means++, fuzzy K-Means, Hierarchical Agglomerative Clustering, Spectral Clustering, Gaussian Mixture Models (GMMs), BIRCH, and Self-Organizing Maps (SOMs), across a dataset of 5567 households for a range of cluster counts from three to nine. The performance of these algorithms was assessed using an extensive set of evaluation metrics, including the Silhouette Score, the Davies–Bouldin Score, the Calinski–Harabasz Score, and the Dunn Index. Notably, while ensemble clustering often ranked among the top performers, it did not consistently surpass all individual algorithms, indicating its potential for further optimization. Unlike approaches that seek the algorithmically optimal number of clusters, our method proposes a practical six-cluster solution designed to meet the operational needs of utility providers. For this case, the best performing algorithm according to the evaluation metrics was ensemble clustering. This study is further enhanced by integrating Explainable AI (xAI) techniques, which improve the interpretability and transparency of our clustering results. Full article
(This article belongs to the Special Issue Advances in Energy Market and Distributed Generation)
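Of the metrics listed in the abstract above, the Dunn Index is the least commonly available in standard libraries, so a short sketch may help. Several variants of the index exist. The version below is an assumption for illustration, using the smallest distance between points in different clusters divided by the largest distance between points in the same cluster; the abstract does not say which variant was used.

```python
from math import dist


def dunn_index(points, labels):
    """Dunn index: min inter-cluster distance / max intra-cluster diameter."""
    n = len(points)
    min_between = float("inf")  # closest pair of points in different clusters
    max_within = 0.0            # widest pair of points inside one cluster
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(points[i], points[j])
            if labels[i] == labels[j]:
                max_within = max(max_within, d)
            else:
                min_between = min(min_between, d)

    if min_between == float("inf"):
        raise ValueError("Dunn index needs at least two clusters")
    if max_within == 0.0:
        return float("inf")  # every cluster is a single point or exact duplicates
    return min_between / max_within
```

Higher values are better. The pairwise loop is quadratic in the number of points, so for a dataset the size of the one in the abstract a sampled or vectorized computation would likely be needed.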

19 pages, 4338 KiB  
Article
Discovering Electric Vehicle Charging Locations Based on Clustering Techniques Applied to Vehicular Mobility Datasets
by Elmer Magsino, Francis Miguel M. Espiritu and Kerwin D. Go
ISPRS Int. J. Geo-Inf. 2024, 13(10), 368; https://doi.org/10.3390/ijgi13100368 - 18 Oct 2024
Cited by 1 | Viewed by 2087
Abstract
With the proliferation of vehicular mobility traces because of inexpensive on-board sensors and smartphones, utilizing them to further understand road movements has become easily accessible. These huge numbers of vehicular traces can be utilized to determine where to enhance road infrastructures such as the deployment of electric vehicle (EV) charging stations. As more EVs are plying today’s roads, the driving anxiety is minimized with the presence of sufficient charging stations. By correctly extracting the various transportation parameters from a given dataset, one can design an adequate and adaptive EV charging network that can provide comfort and convenience for the movement of people and goods from one point to another. In this study, we determined the possible EV charging station locations based on an urban city’s vehicular capacity distribution obtained from taxi and ride-hailing mobility GPS traces. To achieve this, we first transformed the dynamic vehicular environment based on vehicular capacity into its equivalent urban single snapshot. We then obtained the various traffic zone distributions by initially utilizing k-means clustering to allow flexibility in the total number of wanted traffic zones in each dataset. In each traffic zone, iterative clustering techniques employing Density-based Spatial Clustering of Applications with Noise (DBSCAN) or clustering by fast search and find of density peaks (CFS) revealed various area separations where EV chargers were needed. Finally, to find the exact location of the EV charging station, we ran k-means to locate centroids, depending on the constraint on how many EV chargers were needed. Extensive simulations revealed the strengths and weaknesses of the clustering methods when applied to our datasets. We utilized the silhouette and Calinski–Harabasz indices to measure the validity of cluster formations. We also measured the inter-station distances to understand the closeness of the locations of EV chargers. 
Our study shows how CFS + k-means clustering techniques are able to pinpoint EV charger locations. However, when utilizing DBSCAN initially, the results did not present any notable outcome. Full article
(This article belongs to the Topic Spatial Decision Support Systems for Urban Sustainability)

27 pages, 8384 KiB  
Article
Energy-Efficient Anomaly Detection and Chaoticity in Electric Vehicle Driving Behavior
by Efe Savran, Esin Karpat and Fatih Karpat
Sensors 2024, 24(17), 5628; https://doi.org/10.3390/s24175628 - 30 Aug 2024
Cited by 5 | Viewed by 2045
Abstract
Detection of abnormal situations in mobile systems not only provides predictions about risky situations but also has the potential to increase energy efficiency. In this study, two real-world drives of a battery electric vehicle and unsupervised hybrid anomaly detection approaches were developed. The anomaly detection performances of hybrid models created with the combination of Long Short-Term Memory (LSTM)-Autoencoder, the Local Outlier Factor (LOF), and the Mahalanobis distance were evaluated with the silhouette score, Davies–Bouldin index, and Calinski–Harabasz index, and the potential energy recovery rates were also determined. Two driving datasets were evaluated in terms of chaotic aspects using the Lyapunov exponent, Kolmogorov–Sinai entropy, and fractal dimension metrics. The developed hybrid models are superior to the sub-methods in anomaly detection. Hybrid Model-2 had 2.92% more successful results in anomaly detection compared to Hybrid Model-1. In terms of potential energy saving, Hybrid Model-1 provided 31.26% superiority, while Hybrid Model-2 provided 31.48%. It was also observed that there is a close relationship between anomaly and chaoticity. In the literature where cyber security and visual sources dominate in anomaly detection, a strategy was developed that provides energy efficiency-based anomaly detection and chaotic analysis from data obtained without additional sensor data. Full article
(This article belongs to the Special Issue Anomaly Detection and Fault Diagnosis in Sensor Networks)

18 pages, 3527 KiB  
Article
Identification of Patterns in CO2 Emissions among 208 Countries: K-Means Clustering Combined with PCA and Non-Linear t-SNE Visualization
by Ana Lorena Jiménez-Preciado, Salvador Cruz-Aké and Francisco Venegas-Martínez
Mathematics 2024, 12(16), 2591; https://doi.org/10.3390/math12162591 - 22 Aug 2024
Cited by 4 | Viewed by 2266
Abstract
This paper identifies patterns in total and per capita CO2 emissions among 208 countries considering different emission sources, such as cement, flaring, gas, oil, and coal. This research uses linear and non-linear dimensional reduction techniques, combining K-means clustering with principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), which allows the identification of distinct emission profiles among nations. This approach allows effective clustering of heterogeneous countries despite the highly dimensional nature of emissions data. The optimal number of clusters is determined using Calinski–Harabasz and Davies–Bouldin scores, yielding five and six clusters for total and per capita CO2 emissions, respectively. The findings reveal that for total emissions, t-SNE brings together the world’s largest economies and emitters, i.e., China, USA, India, and Russia, into a single cluster, while PCA places China, USA, and Russia each in a single-country cluster. Regarding per capita emissions, PCA generates a cluster with only one country, Qatar, due to its significant flaring emissions, a byproduct of the oil industry, and its low population. This study concludes that international collaboration and coherent global policies are crucial for effectively addressing CO2 emissions and developing targeted climate change mitigation strategies. Full article
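Choosing the number of clusters from an internal validity score, as the abstract above describes, usually means running the clustering for several candidate values of k and keeping the k with the best score. The sketch below illustrates that loop under simplifying assumptions: it uses a small deterministic k-means (farthest-first seeding, Lloyd iterations) and only the Calinski–Harabasz score, whereas the article also uses Davies–Bouldin. All function names are hypothetical, and the code assumes distinct points and candidate values with 2 <= k < n.

```python
from math import dist, fsum


def centroid(rows):
    return tuple(fsum(r[d] for r in rows) / len(rows) for d in range(len(rows[0])))


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-first seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels


def calinski_harabasz(points, labels):
    n, groups = len(points), {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k, overall = len(groups), centroid(points)
    between = fsum(len(g) * dist(centroid(g), overall) ** 2 for g in groups.values())
    within = fsum(dist(p, centroid(g)) ** 2 for g in groups.values() for p in g)
    return (between / (k - 1)) / (within / (n - k))


def best_k(points, candidates):
    """Return the candidate k with the highest CH score, plus all scores."""
    scores = {k: calinski_harabasz(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores
```

In practice the scores from different indices can disagree, so it is common to inspect several of them side by side rather than rely on a single maximum.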

29 pages, 2253 KiB  
Article
Clustering Molecules at a Large Scale: Integrating Spectral Geometry with Deep Learning
by Ömer Akgüller, Mehmet Ali Balcı and Gabriela Cioca
Molecules 2024, 29(16), 3902; https://doi.org/10.3390/molecules29163902 - 17 Aug 2024
Cited by 2 | Viewed by 2126
Abstract
This study conducts an in-depth analysis of clustering small molecules using spectral geometry and deep learning techniques. We applied a spectral geometric approach to convert molecular structures into triangulated meshes and used the Laplace–Beltrami operator to derive significant geometric features. By examining the eigenvectors of these operators, we captured the intrinsic geometric properties of the molecules, aiding their classification and clustering. The research utilized four deep learning methods: Deep Belief Network, Convolutional Autoencoder, Variational Autoencoder, and Adversarial Autoencoder, each paired with k-means clustering at different cluster sizes. Clustering quality was evaluated using the Calinski–Harabasz and Davies–Bouldin indices, Silhouette Score, and standard deviation. Nonparametric tests were used to assess the impact of topological descriptors on clustering outcomes. Our results show that the DBN + k-means combination is the most effective, particularly at lower cluster counts, demonstrating significant sensitivity to structural variations. This study highlights the potential of integrating spectral geometry with deep learning for precise and efficient molecular clustering. Full article
(This article belongs to the Special Issue Deep Learning in Molecular Science and Technology)

18 pages, 1222 KiB  
Article
A Nature-Inspired Partial Distance-Based Clustering Algorithm
by Mohammed El Habib Kahla, Mounir Beggas, Abdelkader Laouid and Mohammad Hammoudeh
J. Sens. Actuator Netw. 2024, 13(4), 36; https://doi.org/10.3390/jsan13040036 - 21 Jun 2024
Cited by 4 | Viewed by 2155
Abstract
In the rapidly advancing landscape of digital technologies, clustering plays a critical role in the domains of artificial intelligence and big data. Clustering is essential for extracting meaningful insights and patterns from large, intricate datasets. Despite the efficacy of traditional clustering techniques in handling diverse data types and sizes, they encounter challenges posed by the increasing volume and dimensionality of data, as well as the complex structures inherent in high-dimensional spaces. This research recognizes the constraints of conventional clustering methods, including sensitivity to initial centroids, dependence on prior knowledge of cluster counts, and scalability issues, particularly in large datasets and Internet of Things implementations. In response to these challenges, we propose a K-level clustering algorithm inspired by the collective behavior of fish locomotion. K-level introduces a novel clustering approach based on greedy merging driven by distances in stages. This iterative process efficiently establishes hierarchical structures without the need for exhaustive computations. K-level gives users enhanced control over computational complexity, enabling them to specify the number of clusters merged simultaneously. This flexibility ensures accurate and efficient hierarchical clustering across diverse data types, offering a scalable solution for processing extensive datasets within a reasonable timeframe. The internal validation metrics, including the Silhouette Score, Davies–Bouldin Index, and Calinski–Harabasz Index, are utilized to evaluate the K-level algorithm across various types of datasets. Additionally, comparisons are made with rivals in the literature, including UPGMA, CLINK, UPGMC, SLINK, and K-means. The experiments and analyses show that the proposed algorithm overcomes many of the limitations of existing clustering methods, presenting scalable and adaptable clustering in the dynamic landscape of evolving data challenges. 
Full article