# Assessing the Robustness of Cluster Solutions in Emotionally-Annotated Pictures Using Monte-Carlo Simulation Stabilized K-Means Algorithm


## Abstract


## 1. Introduction

## 2. Affective Multimedia Databases

#### 2.1. Models of Affect in Affective Multimedia Databases

_{i} = {val, ar, dom}, where val, ar, and dom are continuous variables representing the valence, arousal, and dominance emotional dimensions [17]. These three dimensions form mutually orthogonal axes, and their values are normalized to the interval [1, 9]: val ∈ Val = [1, 9], ar ∈ Ar = [1, 9], and dom ∈ Dom = [1, 9]. Dominance (dom) is frequently omitted from the description of the emotion space because it was shown to be the least informative measure of the elicited affect [18]. Thus, following the dimensional model of affect, a single emotionally annotated multimedia document can for all practical purposes be represented as a coordinate in the two-dimensional space of emotion Ω_{Emo} = Val × Ar. Russell estimated the approximate central coordinates of specific discrete emotions in the dimensional model's space [17]. He hypothesized that these locations are not fixed but change during a person's lifetime, and that they also differ from one person to another and between homogeneous groups of persons based on their character traits. An illustration of the circumplex model of emotion, as proposed in [17], is shown in Figure 1. Emotionally annotated pictures, listed in Table 1 and shown in Figure 2, are projected onto the two-dimensional space of emotion Ω_{Emo}, with each point representing one picture.
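As a minimal sketch of this representation, the snippet below stores each picture as a (val, ar) point after checking that both ratings lie in [1, 9]. The class name `AffectivePicture` and its fields are hypothetical and chosen only for illustration; the ratings are the averages listed in Table 1.

```python
from dataclasses import dataclass

SCALE_MIN, SCALE_MAX = 1.0, 9.0  # rating interval stated in the text


@dataclass(frozen=True)
class AffectivePicture:
    """Hypothetical container for one annotated picture (illustrative only)."""
    picture_id: str
    val: float  # average valence rating
    ar: float   # average arousal rating

    def __post_init__(self):
        # Reject ratings outside the normalized interval [1, 9].
        for name in ("val", "ar"):
            value = getattr(self, name)
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{name}={value} is outside [1, 9]")

    def as_point(self):
        """Return the picture as a coordinate in Val x Ar (dominance omitted)."""
        return (self.val, self.ar)


# Average ratings of the six example pictures from Table 1.
pictures = [
    AffectivePicture("Animals_002_v", 6.45, 6.86),
    AffectivePicture("Animals_003_h", 5.02, 5.51),
    AffectivePicture("Animals_004_v", 4.54, 7.10),
    AffectivePicture("Animals_005_h", 5.57, 5.73),
    AffectivePicture("Faces_001_h", 7.80, 4.97),
    AffectivePicture("Faces_242_h", 6.66, 3.76),
]

points = [p.as_point() for p in pictures]
```

A list of such points is the only input a two-dimensional clustering procedure needs; the picture identifiers can be kept alongside for later interpretation of the clusters.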

#### 2.2. The NAPS Affective Picture Database

## 3. Related Work

## 4. Unsupervised Machine Learning Methods

#### 4.1. k-Means Algorithm

#### 4.2. Disadvantages of the k-Means Algorithm and the Solutions Used

#### 4.2.1. Unstable Cluster Indexes

#### 4.2.2. Statistical Distribution Undecidability

#### 4.3. Defining the Optimal Number of Clusters (Parameter k)

## 5. Experiment and Results

#### 5.1. The Optimal Number of Clusters

#### 5.2. Reliability of the Stable Distribution Method

1. Calculate the histogram, i.e., the n × k matrix of cluster affiliation counts accumulated over s simulations.
2. Reset to zero all elements of the matrix that are equal to s, because these points are stable.
3. For each row (example) in the matrix, count the nonzero columns.
4. Subtract 1 from each such row count (one column is considered correct).
5. The total error e is then the sum over all rows of the values from Step 4.
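The steps above can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the authors' code: the function name `stability_error` is hypothetical, and it assumes that the subtraction of 1 applies only to rows that still contain nonzero entries, so that fully stable rows contribute zero error.

```python
def stability_error(histogram, s):
    """Total stability error e for an n x k histogram built from s simulations.

    histogram[i][j] is the number of simulations in which example i
    was assigned to cluster j.
    """
    total = 0
    for row in histogram:
        # Reset entries equal to s: the example landed in one cluster every time.
        cleared = [0 if count == s else count for count in row]
        # Count the clusters the example was still assigned to at least once.
        nonzero = sum(1 for count in cleared if count != 0)
        # One of those clusters is considered correct; the rest count as errors.
        # Assumption: rows with no nonzero entries (stable examples) add nothing.
        if nonzero > 0:
            total += nonzero - 1
    return total


# Toy histogram for n = 4 examples, k = 3 clusters, s = 10 simulations.
example = [
    [10, 0, 0],  # stable
    [7, 3, 0],   # split over two clusters, error 1
    [4, 3, 3],   # split over three clusters, error 2
    [0, 10, 0],  # stable
]
print(stability_error(example, s=10))  # prints 3
```

Under this reading, e is zero exactly when every example is assigned to the same cluster in all s simulations.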

_{ideal} looks like this:

## 6. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A

- (1) Analysis: the main program that runs the selected computation (snippet) and produces a graph or textual output. These outputs were directly used for analysis and are included in the paper as figures or tables.
- (2) Runner: a class with all the computation and plotting logic at the higher abstraction level, e.g., for computing stable argmax partitions, plotting stability error curves, and computing silhouette scores.

Lib implements the lower-level library functions and abstractions and contains the following classes:

- (3) InputData: an abstraction for data input and output for the NAPS or other affective picture datasets with similar architectures;
- (4) Config: a class for configuring the k-means algorithm, the evaluation parameters, and other methods such as dataset partitioning;
- (5) PlotAnnotator: a class module that provides support for rendering interactive data plots in the tool's graphical user interface.

**Figure A1.** UML class diagram showing the software tool's five functional class modules (Analysis, Runner, InputData, Config, PlotAnnotator), their attributes, operations and mutual relationships.

**Figure A2.** The clustering procedure using Monte-Carlo simulation stabilized k-means implemented in the Python software tool. UML activity diagrams illustrating functions StableColoredKMeans (**left**) and MonteCarloKMeans (**right**).
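The general idea of a Monte-Carlo stabilized k-means can be sketched as follows. This is a simplified illustration under stated assumptions, not a reproduction of StableColoredKMeans or MonteCarloKMeans: the function names `kmeans_once` and `monte_carlo_partition` are hypothetical, cluster indexes are made comparable across runs by sorting centroids by (valence, arousal), which is only one possible choice, and the final partition is taken as the argmax over the affiliation histogram.

```python
import random


def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's k-means on 2-D points with random initialization."""
    centroids = rng.sample(points, k)  # pick k data points as initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                        + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    # Assumed stabilization: relabel clusters by sorted centroid position so
    # that index j refers to the same region of the plane in every run.
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {old: new for new, old in enumerate(order)}
    return [rank[lab] for lab in labels]


def monte_carlo_partition(points, k, s, seed=0):
    """Repeat k-means s times and return (argmax partition, n x k histogram)."""
    rng = random.Random(seed)
    histogram = [[0] * k for _ in points]
    for _ in range(s):
        for i, lab in enumerate(kmeans_once(points, k, rng)):
            histogram[i][lab] += 1
    # Each point is assigned to the cluster it fell into most often.
    partition = [row.index(max(row)) for row in histogram]
    return partition, histogram


# Synthetic, well-separated (val, ar) points used only to exercise the sketch.
demo_points = [(2.0, 2.0), (2.5, 2.2), (2.2, 2.6),
               (7.5, 7.0), (8.0, 7.6), (7.8, 8.1)]
demo_partition, demo_histogram = monte_carlo_partition(demo_points, k=2, s=20)
```

Each row of the returned histogram sums to s, so it can be inspected directly to see which points change cluster between runs.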

## References

1. Omran, M.G.H.; Engelbrecht, A.P.; Salman, A. An overview of clustering methods. Intell. Data Anal. **2007**, 11, 583–605.
2. Alelyani, S.; Tang, J.; Liu, H. Feature Selection for Clustering: A Review. In Data Clustering: Algorithms and Applications; Aggarwal, C., Reddy, C., Eds.; CRC Press: Boca Raton, FL, USA, 2013.
3. de Amorim, R.C.; Hennig, C. Recovering the number of clusters in data sets with noise features using feature rescaling factors. Inf. Sci. **2015**, 324, 126–145.
4. Calvo-Zaragoza, J.; Valero-Mas, J.J.; Rico-Juan, J.R. Prototype generation on structural data using dissimilarity space representation. Neural Comput. Appl. **2017**, 28, 2415–2424.
5. Cios, K.J.; Swiniarski, R.W.; Pedrycz, W.; Kurgan, L.A. Unsupervised learning: Clustering. In Data Mining; Springer: Boston, MA, USA, 2007; pp. 257–288.
6. Celebi, M.E.; Aydin, K. (Eds.) Unsupervised Learning Algorithms; Springer: Berlin, Germany, 2016.
7. Kameshwaran, K.; Malarvizhi, K. Survey on clustering techniques in data mining. Int. J. Comput. Sci. Inf. Technol. **2014**, 5, 2272–2276.
8. Kanungo, T.; Mount, D.M.; Netanyahu, N.S.; Piatko, C.D.; Silverman, R.; Wu, A.Y. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. **2002**, 24, 881–892.
9. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd **1996**, 96, 226–231.
10. Sinaga, K.P.; Yang, M.S. Unsupervised K-means clustering algorithm. IEEE Access **2020**, 8, 80716–80727.
11. Horvat, M.; Popović, S.; Ćosić, K. Towards semantic and affective coupling in emotionally annotated databases. In Proceedings of the 35th International Convention on Information and Communication Technology, Electronics and Microelectronics MIPRO 2012, Opatija, Croatia, 21–25 May 2012; pp. 1003–1008.
12. Colden, A.; Bruder, M.; Manstead, A.S. Human content in affect-inducing stimuli: A secondary analysis of the international affective picture system. Motiv. Emot. **2008**, 32, 260–269.
13. Horvat, M. A Brief Overview of Affective Multimedia Databases. In Central European Conference on Information and Intelligent Systems; Faculty of Organization and Informatics: Varaždin, Croatia, 2017; pp. 3–9.
14. Marchewka, A.; Żurawski, Ł.; Jednorog, K.; Grabowska, A. The Nencki Affective Picture System (NAPS): Introduction to a novel, standardized, wide-range, high-quality, realistic picture database. Behav. Res. Methods **2014**, 46, 596–610.
15. Riegel, M.; Żurawski, Ł.; Wierzba, M.; Moslehi, A.; Klocek, Ł.; Horvat, M.; Grabowska, A.; Michałowski, J.; Marchewka, A. Characterization of the Nencki Affective Picture System by discrete emotional categories (NAPS BE). Behav. Res. Methods **2016**, 48, 600–612.
16. Peter, C.; Herbon, A. Emotion representation and physiology assignments in digital systems. Interact. Comput. **2006**, 18, 139–170.
17. Posner, J.; Russell, J.A.; Peterson, B.S. The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology. Dev. Psychopathol. **2005**, 17, 715.
18. Lang, P.J.; Bradley, M.M.; Cuthbert, B.N. International Affective Picture System (IAPS): Affective Ratings of Pictures and Instruction Manual; Technical Report A-8; University of Florida: Gainesville, FL, USA, 2008.
19. Wierzba, M.; Riegel, M.; Pucz, A.; Leśniewska, Z.; Dragan, W.Ł.; Gola, M.; Jednorog, K.; Marchewka, A. Erotic subset for the Nencki Affective Picture System (NAPS ERO): Cross-sexual comparison study. Front. Psychol. **2015**, 6, 1336.
20. Kensinger, E.A.; Schacter, D.L. Processing emotional pictures and words: Effects of valence and arousal. Cogn. Affect. Behav. Neurosci. **2006**, 6, 110–126.
21. Horvat, M.; Jednoróg, K.; Marchewka, A. Clustering of Affective Dimensions in Pictures: An exploratory analysis of the NAPS database. In Proceedings of the 39th International Convention on Information and Communication Technology, Electronics and Microelectronics MIPRO 2016, Opatija, Croatia, 30 May–3 June 2016; pp. 1496–1501.
22. Horvat, M.; Popović, S.; Ćosić, K. Multimedia stimuli databases usage patterns: A survey report. In Proceedings of the 36th International Convention on Information and Communication Technology, Electronics and Microelectronics MIPRO 2013, Opatija, Croatia, 20–24 May 2013; pp. 993–997.
23. Constantinescu, A.C.; Wolters, M.; Moore, A.; MacPherson, S.E. A cluster-based approach to selecting representative stimuli from the International Affective Picture System (IAPS) database. Behav. Res. Methods **2017**, 49, 896–912.
24. Hamerly, G.; Drake, J. Accelerating Lloyd's algorithm for k-means clustering. In Partitional Clustering Algorithms; Springer: Cham, Switzerland, 2015; pp. 41–78.
25. Mahajan, M.; Nimbhorkar, P.; Varadarajan, K. The planar k-means problem is NP-hard. Theor. Comput. Sci. **2012**, 442, 13–21.
26. Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2000.
27. Kroese, D.P.; Brereton, T.; Taimre, T.; Botev, Z.I. Why the Monte Carlo method is so important today. Wiley Interdiscip. Rev. Comput. Stat. **2014**, 6, 386–392.
28. Cluster Validation Essentials. Available online: https://www.datanovia.com/en/lessons/determining-the-optimal-number-of-clusters-3-must-know-methods/ (accessed on 31 March 2021).
29. Ketchen, D.J.; Shook, C.L. The application of cluster analysis in strategic management research: An analysis and critique. Strateg. Manag. J. **1996**, 17, 441–458.
30. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. **1987**, 20, 53–65.

**Figure 1.** The circumplex model of emotion as described in [17]. Valence (val) is plotted on the x-axis and arousal (ar) on the y-axis. Red points mark pictures from the experimental dataset listed in Table 1. Approximate (val, ar) coordinates of basic emotions in the dimensional emotion model space Ω_{Emo} are indicated.

**Figure 2.** Example pictures from the NAPS dataset. Reproduced with permission from Marchewka, A.; Żurawski, Ł.; Jednorog, K.; Grabowska, A. The Nencki Affective Picture System (NAPS): Introduction to a novel, standardized, wide-range, high-quality, realistic picture database (2014), Springer.

**Figure 3.** Examples of unstable distribution indexes and cluster order permutations for four possible distribution indexes.

**Figure 4.** An example of the undecidability of the distribution. Black dots represent the unstable cluster affiliation of pictures in the feature space (valence, arousal).

**Figure 7.** Overall stability error of the distribution method with respect to the number of simulation iterations.

**Figure 8.** Overall stability error of the distribution method with respect to the number of clusters.

| ID | Description | Valence (Avg) | Arousal (Avg) |
|---|---|---|---|
| Animals_002_v | lion | 6.45 | 6.86 |
| Animals_003_h | snake | 5.02 | 5.51 |
| Animals_004_v | wolf | 4.54 | 7.10 |
| Animals_005_h | bat | 5.57 | 5.73 |
| Faces_001_h | children with a dog | 7.80 | 4.97 |
| Faces_242_h | man and woman smiling | 6.66 | 3.76 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Horvat, M.; Jović, A.; Burnik, K.
Assessing the Robustness of Cluster Solutions in Emotionally-Annotated Pictures Using Monte-Carlo Simulation Stabilized K-Means Algorithm. *Mach. Learn. Knowl. Extr.* **2021**, *3*, 435-452.
https://doi.org/10.3390/make3020022
