Evolution of SOMs’ Structure and Learning Algorithm: From Visualization of High-Dimensional Data to Clustering of Complex Data
Abstract
1. Introduction
2. Conventional SOM, Its Selected Generalizations, and Related Work
2.1. Conventional SOM
2.2. Growing SOMs (GSOMs) and Growing Grid Networks (GGNs)
2.3. Incremental Grid Growing (IGG) Approach
2.4. Growing Neural Gas (GNG) Approach
2.5. Related Recent Work
3. The Proposed Generalized SOMs with Splitting-Merging Structures
1. Removal of single, low-active neurons while preserving network continuity: a neuron is removed if its activity, measured by its number of wins, falls below an assumed level.
2. Disconnection of the network (or a subnetwork) into two subnetworks: two neighboring neurons are disconnected if the Euclidean distance between them exceeds an assumed level.
3. Removal of very small subnetworks with two or three neurons (they usually represent noise).
4. Insertion of additional neurons into the neighborhood of high-active neurons so that they take over some of the activity (this distributes the system’s activity more evenly across the network).
5. Reconnection of two selected subnetworks:
   - 5.1. The GeSOM with 1DN case: the nearest end-neurons of two neighboring sub-chains are connected if the Euclidean distance between them is below an assumed level.
   - 5.2. The GeSOM with T-LSs case: the nearest single neurons of two neighboring subnetworks are connected if the Euclidean distance between them is below an assumed level (this mechanism supports the growth of the network’s tree-like structure).
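The five operations above can be illustrated for the 1DN (chain) case with a minimal Python sketch. It is not the authors' implementation: the `Neuron` class, the function names, and the threshold parameters (`min_wins`, `max_dist`, `max_small`, `max_wins`, `min_dist`) are hypothetical. The rule that a newly inserted neuron sits midway between a high-active neuron and its next neighbor and takes over half of its wins is an assumption made only for illustration, as is the choice to drop every sub-chain with at most three neurons. The network is modeled as a list of non-empty sub-chains, each a Python list of neurons.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Neuron:
    w: np.ndarray  # weight vector (prototype position in data space)
    wins: int = 0  # activity: number of wins in the last learning epoch


def dist(a: Neuron, b: Neuron) -> float:
    """Euclidean distance between the weight vectors of two neurons."""
    return float(np.linalg.norm(a.w - b.w))


def remove_low_active(chains, min_wins):
    """Operation 1: drop neurons with fewer than min_wins wins."""
    # Deleting a list element leaves its two neighbors adjacent,
    # so each chain stays continuous.
    kept = [[n for n in c if n.wins >= min_wins] for c in chains]
    return [c for c in kept if c]


def split_long_links(chains, max_dist):
    """Operation 2: cut every link longer than max_dist."""
    out = []
    for c in chains:
        part = [c[0]]
        for prev, cur in zip(c, c[1:]):
            if dist(prev, cur) > max_dist:
                out.append(part)
                part = [cur]
            else:
                part.append(cur)
        out.append(part)
    return out


def remove_small(chains, max_small=3):
    """Operation 3: discard sub-chains with at most max_small neurons."""
    return [c for c in chains if len(c) > max_small]


def insert_near_high_active(chains, max_wins):
    """Operation 4: add a neuron after each neuron with more than max_wins wins."""
    out = []
    for c in chains:
        new_c = []
        for i, n in enumerate(c):
            new_c.append(n)
            if n.wins > max_wins and i + 1 < len(c):
                half = n.wins // 2
                n.wins -= half  # assumed rule: the new neuron takes half of the wins
                new_c.append(Neuron((n.w + c[i + 1].w) / 2, half))
        out.append(new_c)
    return out


def reconnect(chains, min_dist):
    """Operation 5.1: join sub-chains whose nearest end-neurons are closer than min_dist."""
    chains = [list(c) for c in chains]
    while True:
        best = None
        for i in range(len(chains)):
            for j in range(i + 1, len(chains)):
                a, b = chains[i], chains[j]
                # Four ways to pair the ends; orient the chains so the
                # paired end-neurons become adjacent after concatenation.
                for d, left, right in (
                    (dist(a[-1], b[0]), a, b),
                    (dist(a[-1], b[-1]), a, b[::-1]),
                    (dist(a[0], b[0]), a[::-1], b),
                    (dist(a[0], b[-1]), b, a),
                ):
                    if d < min_dist and (best is None or d < best[0]):
                        best = (d, i, j, left + right)
        if best is None:
            return chains
        _, i, j, joined = best
        chains[i] = joined
        del chains[j]
```

In a learning loop these functions would typically be called once per epoch, after the win counters have been updated. The T-LSs case (operation 5.2) would need a tree or graph representation instead of lists, because any neuron, not only an end-neuron, may then be reconnected.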
- (a) in [13] we applied our GeSOMs with 1DN (DSOMs) to WWW-newsgroup-document clustering (a collection of 19997 documents was considered); our approach generated 58.41% of correct decisions, whereas alternative approaches achieved from 33.98% to 49.12% of correct decisions,
- (b) in [11] we tested the ability of our GeSOMs with 1DN (DSOMs) to correctly determine the number of clusters in a given data set (8 benchmark data sets available from the University of California (UCI) Database Repository at https://archive.ics.uci.edu/ml were considered); our approach achieved 100% of correct decisions for 6 out of the 8 considered data sets, whereas an alternative method obtained such an accuracy for only 1 data set,
- (c) in [15] we applied our GeSOMs with T-LSs to microarray leukemia gene data clustering (the benchmark leukemia cancer data set containing 7129 genes and 72 samples was considered); our approach achieved 98.6% of correct decisions regarding the cluster assignments of particular data samples, whereas an alternative method gave only 93.14% accuracy,
- (d) in [16] we applied our GeSOMs with T-LSs to WWW-document clustering (a collection of 548 abstracts of technical reports and its 476-element subset, both available from the WWW server of the Department of Computer Science, University of Rochester, USA at https://www.cs.rochester.edu/trs, were considered); our approach obtained 87.23% and 84.87% clustering accuracies for the larger and smaller collections, respectively, whereas alternative approaches gave from 36.68% to 65.33% accuracy for the larger collection and from 38.45% to 69.96% for the smaller collection,
- (e) in [18] our GeSOMs with T-LSs were used to uncover informative genes from colon cancer gene expression data via multi-step clustering (the benchmark colon cancer microarray data set containing 6500 genes and 62 samples was considered); our approach generated 88.71% of correct decisions regarding the clustering of samples, whereas alternative methods achieved from 51.61% to 85.73% accuracy,
- (f) in [19] we applied our GeSOMs with T-LSs to electricity consumption data clustering for load profiling (the benchmark Irish Commission for Energy Regulation data set containing 4066 customer profiles with 25728 recordings per profile was considered); our approach achieved 94.86% of correct decisions, whereas alternative methods generated from 89.77% to 94.76% of correct decisions,
- (g) finally, in [17] we applied both of our approaches to microarray lymphoma gene data clustering (the benchmark lymphoma cancer data set containing 4026 genes and 62 samples was considered); our approaches achieved 91.9% (GeSOMs with 1DN) and 93.6% (GeSOMs with T-LSs) of correct decisions, whereas alternative techniques gave from 61.3% to 75.8% of correct decisions.
4. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
1. Kohonen, T. Self-Organizing Maps, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2001.
2. Kohonen, T. Self-organized formation of topologically correct feature maps. Biol. Cybern. 1982, 43, 59–69.
3. Gorricha, J.; Lobo, V. On the Use of Three-Dimensional Self-Organizing Maps for Visualizing Clusters in Georeferenced Data. In Lecture Notes in Geoinformation and Cartography; Springer: Berlin/Heidelberg, Germany, 2011; pp. 61–75.
4. Pal, N.R.; Bezdek, J.C.; Tsao, E.C.K. Generalized clustering networks and Kohonen’s self-organizing scheme. IEEE Trans. Neural Netw. 1993, 4, 549–557.
5. Ultsch, A. Clustering with SOM: U*C. In Proceedings of the Workshop on Self-Organizing Maps, Paris, France, 5–8 September 2005; pp. 75–82.
6. Vellido, A.; Gibert, K.; Angulo, C.; Guerrero, M.J.D. Advances in Self-Organizing Maps, Learning Vector Quantization, Clustering and Data Visualization. In Proceedings of the 13th International Workshop, WSOM+ 2019, Barcelona, Spain, 26–28 June 2019; Volume 976.
7. Rodrigues, J.S.; Almeida, L.B. Improving the learning speed in topological maps of patterns. In The International Neural Network Society (INNS), the IEEE Neural Network Council Cooperating Societies, International Neural Network Conference (INNC); Springer: Paris, France, 1990; pp. 813–816.
8. Fritzke, B. Growing grid—A self-organizing network with constant neighborhood range and adaptation strength. Neural Process. Lett. 1995, 2, 9–13.
9. Blackmore, J.; Miikkulainen, R. Incremental grid growing: Encoding high-dimensional structure into a two-dimensional feature map. In Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, USA, 28 March–1 April 1993; Volume 1, pp. 450–455.
10. Fritzke, B. A growing neural gas network learns topologies. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1995; Volume 7, pp. 625–632.
11. Gorzałczany, M.B.; Rudziński, F. Generalized Self-Organizing Maps for Automatic Determination of the Number of Clusters and Their Multiprototypes in Cluster Analysis. IEEE Trans. Neural Networks Learn. Syst. 2018, 29, 2833–2845.
12. Gorzałczany, M.B.; Rudziński, F. Cluster analysis via dynamic self-organizing neural networks. In Artificial Intelligence and Soft Computing—ICAISC 2006; Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4029, pp. 593–602.
13. Gorzałczany, M.B.; Rudziński, F. WWW-newsgroup-document clustering by means of dynamic self-organizing neural networks. In Artificial Intelligence and Soft Computing—ICAISC 2008; Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; Volume 5097, pp. 40–51.
14. Gorzałczany, M.B.; Piekoszewski, J.; Rudziński, F. Generalized tree-like self-organizing neural networks with dynamically defined neighborhood for cluster analysis. In Artificial Intelligence and Soft Computing—ICAISC 2014; Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; Volume 8468, pp. 725–737.
15. Gorzałczany, M.B.; Piekoszewski, J.; Rudziński, F. Microarray leukemia gene data clustering by means of generalized self-organizing neural networks with evolving tree-like structures. In Artificial Intelligence and Soft Computing—ICAISC 2015; Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9119, pp. 15–25.
16. Gorzałczany, M.B.; Rudziński, F.; Piekoszewski, J. Generalized SOMs with splitting-merging tree-like structures for WWW-document clustering. In Proceedings of the 2015 Conference of the International Fuzzy Systems Association and the European Society for Fuzzy Logic and Technology (IFSA-EUSFLAT-15), Gijón, Spain, 15–19 June 2015; Alonso, J.M., Bustince, H., Reformat, M., Eds.; Atlantis Press: Gijón, Spain, 2015; Volume 89, pp. 186–193.
17. Gorzałczany, M.B.; Rudziński, F.; Piekoszewski, J. Gene expression data clustering using tree-like SOMs with evolving splitting-merging structures. In Proceedings of the IEEE World Congress on Computational Intelligence (IEEE WCCI 2016), International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 3666–3673.
18. Gorzałczany, M.B.; Piekoszewski, J.; Rudziński, F. Uncovering informative genes from colon cancer gene expression data via multi-step clustering based on generalized SOMs with splitting-merging structures. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019; pp. 533–539.
19. Gorzałczany, M.B.; Piekoszewski, J.; Rudziński, F. Electricity Consumption Data Clustering for Load Profiling Using Generalized Self-Organizing Neural Networks with Evolving Splitting-Merging Structures. In Proceedings of the 2018 IEEE 27th International Symposium on Industrial Electronics (ISIE), Cairns, Australia, 13–15 June 2018; pp. 747–752.
20. Cottrell, M.; Fort, J.; Pages, G. Theoretical aspects of the SOM algorithm. Neurocomputing 1998, 21, 119–138.
21. Fritzke, B. Growing cell structures—A self-organizing network for unsupervised and supervised learning. Neural Netw. 1994, 7, 1441–1460.
22. Martinetz, T. Competitive Hebbian learning rule forms perfectly topology preserving maps. In Proceedings of the ICANN ’93, Amsterdam, The Netherlands, 13–16 September 1993; Gielen, S., Kappen, B., Eds.; Springer: London, UK, 1993; pp. 427–434.
23. Martinetz, T.; Schulten, K. Topology representing networks. Neural Netw. 1994, 7, 507–522.
24. Ultsch, A.; Thrun, M.C. Credible visualizations for planar projections. In Proceedings of the 2017 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization (WSOM), Nancy, France, 28–30 June 2017; pp. 1–5.
25. Thrun, M. Projection-Based Clustering through Self-Organization and Swarm Intelligence; Springer Vieweg: Berlin, Germany, 2018.
26. Onishi, A. Landmark map: An extension of the self-organizing map for a user-intended nonlinear projection. Neurocomputing 2020, 388, 228–245.
27. Hu, R.; Ratner, K.; Ratner, E.; Miche, Y.; Björk, K.M.; Lendasse, A. ELM-SOM+: A continuous mapping for visualization. Neurocomputing 2019, 365, 147–156.
28. Boualem, M.; Chemseddine, R.; Djamel, B.; Belkacem, O.B. A novel gearbox fault feature extraction and classification using Hilbert empirical wavelet transform, singular value decomposition, and SOM neural network. J. Vib. Control 2018, 24, 2512–2531.
29. Rezaei, F.; Ahmadzadeh, M.; Safavi, H. SOM-DRASTIC: Using self-organizing map for evaluating groundwater potential to pollution. Stoch. Environ. Res. Risk Assess. 2017, 31, 1941–1956.
30. Feng, N.; Yang, L.; XueYong, J.; LiYan, D.; YongJie, C. Application of improved SOM network in gene data cluster analysis. Measurement 2019, 145, 370–378.
31. Delgado, S.; Higuera, C.; Calle-Espinosa, J.; Morán, F.; Montero, F. A SOM prototype-based cluster analysis methodology. Expert Syst. Appl. 2017, 88, 14–28.
32. Prasad, H. Mixed data clustering using dynamic growing hierarchical self-organizing map with improved LM learning. Int. Res. J. Eng. Technol. 2016, 3, 150–156.
33. Hung, W.L.; Yang, J.H.; Song, I.W.; Chang, Y.C. A modified self-updating clustering algorithm for application to dengue gene expression data. Commun. Stat. Simul. Comput. 2019.
34. Febrita, R.E.; Mahmudy, W.F.; Wibawa, A.P. High Dimensional Data Clustering using Self-Organized Map. Knowl. Eng. Data Sci. 2019, 2, 31–40.
35. Vesanto, J.; Alhoniemi, E. Clustering of the self-organizing map. IEEE Trans. Neural Netw. 2000, 11, 586–600.
36. Brugger, D.; Bogdan, M.; Rosenstiel, W. Automatic cluster detection in Kohonen’s SOM. IEEE Trans. Neural Netw. 2008, 19, 442–459.
37. Tasdemir, K.; Merenyi, E. Exploiting data topology in visualization and clustering of self-organizing maps. IEEE Trans. Neural Netw. 2009, 20, 549–562.
38. Tasdemir, K.; Milenov, P.; Tapsall, B. Topology-based hierarchical clustering of self-organizing maps. IEEE Trans. Neural Netw. 2011, 22, 474–485.
39. Cabanes, G.; Bennani, Y. Learning the number of clusters in self organizing map. In Self-Organizing Map; Matsopoulos, G.K., Ed.; Intech: Vienna, Austria, 2010; pp. 15–28.
40. Wu, S.; Chow, T.W.S. Self-organizing-map based clustering using a local clustering validity index. Neural Process. Lett. 2003, 17, 253–271.
41. Bezdek, J.C.; Reichherzer, T.R.; Lim, G.S.; Attikiouzel, Y. Multiple-prototype classifier design. IEEE Trans. Syst. Man Cybern. Part C 1998, 28, 67–79.
| Methods | | SOM [1,2] | GSOM [7] | GGN [8] | IGG [9] | GNG [10] | GeSOM with 1DN [11] (DSOM [12,13]) | GeSOM with T-LSs [14,15,16,17,18,19] |
|---|---|---|---|---|---|---|---|---|
| Modification mechanisms | Adding neurons | no | yes | yes | yes | yes | yes | yes |
| | Removing neurons | no | no | no | no | yes | yes | yes |
| | Adding connections | no | no | no | yes | yes | yes | yes |
| | Removing connections | no | no | no | yes | yes | yes | yes |
| Range of network modifications | Increasing the size | no | yes | yes | yes | yes | yes | yes |
| | Reducing the size | no | no | no | no | yes | yes | yes |
| | Disconnection into subnetworks | no | no | no | yes | yes | yes | yes |
| | Reconnection of some of the subnetworks | no | no | no | n/r | n/r | yes | yes |
| Structure regularity | Fully connected | yes | yes | yes | no | no | no | no |
| | Regular, rectangular structure | yes | yes | yes | yes | no | no | no |
| Form of data visualization | | (4) | (4) | (4) | (4) | - | (5) | - |
| Effectiveness of | data visualization | ── | ── | | | | | |
| | data clustering | | | | | | | |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Gorzałczany, M.B.; Rudziński, F. Evolution of SOMs’ Structure and Learning Algorithm: From Visualization of High-Dimensional Data to Clustering of Complex Data. Algorithms 2020, 13, 109. https://doi.org/10.3390/a13050109