Characterizing User Archetypes and Discussions on Social Hypernetworks

Failla, Andrea; Citraro, Salvatore; Rossetti, Giulio; Cauteruccio, Francesco

doi:10.3390/bdcc9090236

¹ Department of Computer Science, University of Pisa, I56126 Pisa, Italy

² Institute of Information Science and Technologies “A. Faedo” (ISTI), National Research Council (CNR), I56127 Pisa, Italy

³ Department of Information Engineering, Electrical Engineering and Applied Mathematics, University of Salerno, I84084 Fisciano, Italy

* Author to whom correspondence should be addressed.

Big Data Cogn. Comput. 2025, 9(9), 236; https://doi.org/10.3390/bdcc9090236

Submission received: 2 July 2025 / Revised: 1 September 2025 / Accepted: 13 September 2025 / Published: 16 September 2025

Abstract

In recent years, the proliferation of social platforms has drastically transformed how individuals interact, organize, and share information. In this scenario, there has been an unprecedented increase in the scale and complexity of interactions and, at the same time, little to no research about certain fringe social platforms. In this paper, we present a multi-dimensional framework for characterizing nodes and hyperedges in social hypernetworks, with a focus on the understudied alt-right platform Scored.co. Our approach integrates the possibility of studying higher-order interactions, thanks to the hypernetwork representation, and various node features such as user activity, sentiment, and toxicity, with the aim of defining distinct user archetypes and understanding their roles within the network. Utilizing a comprehensive dataset from Scored.co, consisting of more than 4.4 M posts and 36.9 M comments, we analyze the dynamics of these archetypes over time and explore their interactions and influence within the community. We identify eight archetypes, with the largest group comprising over 15,000 users, and observe that 44% of interactions involve at least five participants, highlighting the importance of higher-order modeling. Furthermore, we find significant archetype transitions and stable yet locally dense interaction patterns, with users exposed to roughly 1000 unique peers on average. The framework’s versatility allows for detailed analysis of both individual user behaviors and broader social structures. Our findings highlight the importance of higher-order interactions and node features in understanding social dynamics, and offer new insights into the roles and behaviors that emerge in complex online environments.

Keywords: social media; hypergraphs; higher-order networks; debate networks; node roles; network datasets

1. Introduction

In recent decades, the proliferation of social platforms and the opportunities tied to each of them have drastically transformed the way individuals interact, share information, and form communities. This unprecedented growth in the scale and complexity of online interactions is largely due to social networks like X (formerly known as Twitter), discussion fora like Reddit, and even more niche platforms, such as the now-defunct Voat, known for their explorations of censorship-free models from the point of view of specific political groups [1]. Hence, understanding user behavior in general, and especially within these platforms, has become a crucial area of research in computational social science [2,3]. By studying how users interact with each other and with others’ content, researchers can uncover how influence spreads, how opinions form, and how communities are created and dissolved [4]. These insights are not only academically interesting, but also have practical implications for designing better social platforms, enhancing user experience, and mitigating issues such as misinformation and online harassment [5,6]. Moreover, understanding these behaviors can aid in developing strategies for marketing, public health campaigns, and even political mobilization, making this research area highly relevant in multiple domains. Furthermore, all of these aspects are even more critical in the analysis of understudied platforms such as Scored.co [7,8], which have not been extensively explored and may therefore exhibit unique social dynamics that differ significantly from those of mainstream platforms. Investigating these less-explored environments can uncover novel patterns of user behavior and mechanisms of information dissemination. In addition, such platforms often cater to niche communities or specialized interests, thus providing insights into subcultures and micro-level social dynamics that might not be readily visible on larger social platforms.

Understanding these dynamics becomes particularly important when addressing the general challenges inherent in studying complex social platforms. Among these, a significant challenge is the identification of the roles that users play within complex social structures. Indeed, role identification is crucial for various applications, including targeted information dissemination [9] and community detection [3]. In traditional social network analysis, roles are often determined based on patterns of dyadic connectivity and interactions. Nevertheless, in a discussion on a social platform, two users can interact directly, e.g., by sharing comments, but also indirectly, i.e., through other users. Hence, it is reasonable to assume that taking into account group dynamics and higher-order interactions could allow a more reliable identification of the roles played by users. A promising approach to encapsulating group interactions is defined by the concept of a social hypernetwork [10,11,12], which allows for the modeling of relationships among groups of nodes, generally called hyperedges. Such a representation is particularly useful for studying social platforms from a higher-order point of view. Despite its importance and the existing methodologies designed for pairwise graphs, current literature about the problem of role identification lacks comprehensive approaches for social hypernetworks.

To address this gap, in this work, we formally introduce an approach for defining higher-order roles in a social hypernetwork and characterize higher-order interactions. We define the concept of an archetype, a general characterization of higher-order roles that serves as a “template” for identifying nodes representative of specific roles within the context of hyperedges. These archetypes provide a structured way to categorize and understand the diverse roles users play in group interactions. Then, we propose a general hyperedge characterization function, which allows one to characterize a hyperedge with regards to the features of the nodes it contains. To validate the effectiveness of our proposed framework, we conducted an exhaustive experimental campaign on the understudied social platform Scored.co, which offers a rich, novel dataset for analyzing user interactions in a hypernetwork context, making it an ideal testbed for our framework. Our experiments demonstrated the utility of our approach in characterizing user roles within social hypernetworks, and allowed us to derive several insights and implications enabled by our framework.

Summarizing, the main contributions of this work are as follows:

We define the concept of archetype, as a high-level description of user roles acting as a “template” characterizing node representatives. We also propose a way of quantifying how “distant” a node is from its archetypal representation;
We introduce an approach to characterize higher-order entities, such as hyperedges, taking into account exhibited features and their higher-order dynamics;
We apply our framework to a newly collected dataset of content and interactions from Scored.co, which we also release (anonymized) [13]. Notably, this is also the first work to study the structural properties of Scored.co.

We believe that our contribution is relevant in several respects, as we illustrate in the rest of the paper. Indeed, understanding user roles within complex social structures is a key task in computational social science, especially on platforms where interactions extend beyond dyadic relationships. By defining archetypes as role-based templates within a hypernetwork, our framework provides a better understanding of user behaviors in settings where interactions involve multiple participants simultaneously. Closely related is the analysis of group-based dynamics that are not visible through dyadic connections alone. Our proposed approach for higher-order entity characterization makes this analysis possible and has practical utility in understanding complex interactions within both social and heterogeneous systems.

The outline of this paper is as follows: in Section 2, we provide a detailed overview of related literature. In Section 3, we formalize our proposed approach by first defining the concept of archetype and then the hyperedge characterization function. In Section 4, we illustrate the experimental campaign we carried out to evaluate the effectiveness of our framework. Afterward, in Section 5, we discuss our approach and highlight some important implications. Finally, in Section 6, we draw our conclusions and outline directions for future work.

2. Related Literature

The identification of roles within social networks stands as a fundamental aspect for understanding the dynamics and function of these complex systems [14]. In traditional social network analysis, roles are often determined based on patterns of connectivity, which provide insights into the influence, responsibilities, and grouping of individuals within the network [15]. This analysis is crucial for applications ranging from the characterization of polarization [16] and user behaviors [17] to the analysis of biological networks [18] and animal populations [19]. In different cases, role identification is also framed within the task of discovering influential users in a social network. In light of this, we refer the interested reader to the comprehensive survey proposed in [15].

Nevertheless, extending role identification to hypernetwork scenarios poses many challenges. Unlike traditional networks, hypernetworks consist of interactions, represented by hyperedges, which can encompass multiple nodes simultaneously. A toy example of a hypernetwork, modeled as a hypergraph, is depicted in Figure 1. Higher-order interaction models can better represent complex real-world systems [10,20]; however, the complexity of higher-order interactions can introduce difficulties in accurately defining and identifying roles, as traditional centrality metrics and methodologies (e.g., community detection algorithms) often fall short in capturing the multidimensional nature of hyperedges [14,21,22]. Despite the relevance of such a task, to the best of our knowledge, there are no existing papers that explicitly discuss role identification within the context of hypernetworks. Given this gap in the literature, the rest of the section concentrates on providing an overview of the methodologies and findings from conventional settings based on dyadic interactions, which may serve as a baseline for future exploration into hypernetwork-specific role identification.

The general task of role identification consists of capturing the multifaceted characteristics of a node’s role in a pairwise network. Huang et al. [23] identified roles in undirected, unweighted networks by evaluating node importance using correlated indicators like degree and centrality measures. A node’s role is then determined by comparing its indicator relationships to statistical correlations within the overall network. While their work shares the task of role identification with ours, the process does not involve group-based indicators leveraging the concept of “groups” such as communities or subgraphs. Bhagat et al. [24], instead, framed the node role identification task as a node classification approach based on random walks. Similarly, framed within a classification task, Buntain and Golbeck [25] focused on the so-called “answer-person” role on Reddit, characterized by users who predominantly respond to questions posed by others, with minimal engagement in broader discussions. The works by Bhagat et al. [24] and Buntain and Golbeck [25] share a similar aim with our study. Nevertheless, they relied on supervised approaches on pairwise graphs. In addition, their methods might not generalize straightforwardly to higher-order interactions.

The approach proposed by Brendel and Krawczyk [26] focuses on the identification of node roles in dynamic social networks as a sequence of different types of activities, by leveraging pattern subgraphs and sequence diagrams. In doing so, the authors were able to capture roles such as “gossipmongers”, i.e., users who replicate every received message at least three times. This approach can be considered as orthogonal to ours, as a hypergraph mining approach could also be applied in our context to enhance role identification. Hacker and Riemer [27] outlined a process for identifying user roles in Enterprise Social Networks (ESNs), using a mix of design science research and data mining, involving data collection, preparation, and evaluation, with user roles identified through statistical analysis, including PCA and clustering. The work by Hacker and Riemer [27] shares a similar focus to ours, although using a different methodology. The identification of emergent leadership roles was addressed by Temdee et al. [28], where the authors examined collaboration patterns between teams and proposed the so-called “leadership index”, a combination of centrality measures including closeness and betweenness. Zhou et al. [9] proposed a mixed-methods methodology for user role identification, focusing on dynamic user profiling. Several types of special users are defined and identified to support information dissemination. They proposed an approach based on computational methods and questionnaire-based evaluation to quantitatively describe user features. Such a work parallels ours in the sense that it also considers the network’s features. Cauteruccio et al. [3] studied community and user stereotypes on Reddit. Here, with the term stereotype, the authors refer to the tendency to classify people into groups and to associate each group with a general idea or a label. 
The authors proposed a rule-based approach based on different quantitative views of the data, and defined author stereotypes on the basis of two orthogonal taxonomies, namely, the number of posts, and the number of comments by an author. While the work by Cauteruccio et al. and ours share some similarities, the employed methodologies are substantially different. The former defined stereotypes based on a single quantitative measure called score, which is inherent to the social platform itself. Instead, our approach proposes a general framework in which the characterization of users and the identification of their role are both defined by their feature-rich, higher-order surroundings. Moreover, Kou et al. [6] identified five distinct social roles in a specific community on Reddit. Among them, we cite the “knowledge broker”, i.e., a member who introduces knowledge to the community by sharing links, and the “translator”, i.e., a member who contributes academic knowledge. While the contribution by Kou et al. is notable and similar to ours in considering various aspects of user characterization, these roles are specific and applicable only to the analyzed community.

Finally, it is worth mentioning embedding-based methodologies [29], which could potentially be adapted to hypernetwork scenarios. For instance, Rozemberczki et al. [30] introduced a technique that incorporates node attributes to generate embeddings that capture similarities based on neighborhood structures. Instead, Dehgan et al. [31] leveraged both node and structural embeddings to detect nodes impersonating social bots. Indeed, node embeddings based on structural properties and exhibited attributes have attracted substantial interest in recent years [32]. While these methods use embedding techniques that effectively summarize and exploit the structural information within traditional networks, their application to hypernetworks might not be straightforward. Our approach diverges from embedding-based methods in that it is specifically tailored to the unique features of nodes within hypernetworks, such as the features they exhibit with regard to the hyperedges containing them. By focusing on this, without relying on an embedding, our methodology seeks to offer a more direct and fitting analysis of node roles.

We conclude this bird’s-eye view of the related literature by focusing on the related aspect of understudied platforms. Intuitively, the reasoning provided in the above discussion of related work also holds for these platforms. In fact, there has been a small number of works on understudied platforms, and generally, these were limited to presenting a dataset collection about the considered platform [7,33,34,35]. For instance, a large-scale dataset for the social platform Scored.co was presented by Patel et al. [7]. In their work, they studied aspects such as posting activity and user characterization, as well as the phenomenon of user migration from other platforms. A recent social platform, called Bluesky, has also been targeted by different studies [33,34]. In the former work, the authors presented a comprehensive study of the social structure of the platform, as well as posting activity and content analysis. In addition, the complete post history of over 4M users was released. Similarly, the latter work studied users’ political leaning and ideological polarization on Bluesky, while also presenting a characterization of the network topology over time. Finally, Mekacher et al. [35] released a dataset targeting the Indian microblogging platform Koo. The dataset consists of more than 72M posts and 75M comments, with related features such as shares and likes. Moreover, a thorough overview of the platform was presented, consisting of a discussion of the news ecosystem on the platform, hashtag usage, and user engagement.

3. Materials and Methods

In this section, we introduce our framework for characterizing nodes and hyperedges in a social hypernetwork, as well as the definition of roles, called archetypes, in such a network. We start by providing some background, which is useful to understand the frame of our context. Then, we introduce the definition of archetypes and our method to calculate them. Finally, we detail the proposed framework to characterize nodes and hyperedges.

3.1. Background

In a hypergraph $H = (V, E)$, $V = \{v_{1}, \dots, v_{n}\}$ is a set of nodes and $E = \{e_{1}, \dots, e_{m}\}$ is a set of hyperedges, where $e_{j} \subseteq V$, for $j = 1, \dots, m$. A visual representation of a hypergraph is depicted in Figure 1. The order of $H$ is $n = |V|$, while the size of $H$ is $m = |E|$. We denote with $E_{v_{i}}$ the set of hyperedges containing $v_{i}$, that is, $E_{v_{i}} = \{e_{j} \in E : v_{i} \in e_{j}\}$. Given a hyperedge $e_{j} \in E$, its size is the number of nodes belonging to it, that is, $|e_{j}|$. The degree of a node $v_{i}$, denoted as $\mathrm{deg}(v_{i})$, is the number of neighbors of $v_{i}$; a node $v_{k} \in V$ is the neighbor of node $v_{i}$ if and only if there exists at least one hyperedge $e_{j}$ which $v_{i}$ and $v_{k}$ both belong to. The hyperdegree of a node $v_{i}$, denoted as $\mathrm{hdeg}(v_{i})$, is the number of hyperedges to which $v_{i}$ belongs. A graphical depiction of a hypergraph is given in Figure 1.
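The definitions above can be illustrated with a minimal Python sketch. The toy hyperedges, the node names, and the function names are hypothetical and chosen only for illustration; the sketch also assumes that a node is not counted among its own neighbors and that every node belongs to at least one hyperedge, so that $V$ can be recovered from $E$.

```python
from itertools import chain

# Toy hypergraph (hypothetical data): each hyperedge is a set of node identifiers.
hyperedges = {
    "e1": {"v1", "v2", "v3"},
    "e2": {"v2", "v3"},
    "e3": {"v3", "v4", "v5", "v6"},
}

def incident_hyperedges(node, hyperedges):
    """Return E_v: the names of the hyperedges that contain `node`."""
    return {name for name, members in hyperedges.items() if node in members}

def hyperdegree(node, hyperedges):
    """hdeg(v): the number of hyperedges to which `node` belongs."""
    return len(incident_hyperedges(node, hyperedges))

def neighbors(node, hyperedges):
    """Nodes sharing at least one hyperedge with `node` (assumption: the node itself is excluded)."""
    shared = chain.from_iterable(
        members for members in hyperedges.values() if node in members
    )
    return set(shared) - {node}

def degree(node, hyperedges):
    """deg(v): the number of neighbors of `node`."""
    return len(neighbors(node, hyperedges))

# Order n = |V| (assumption: no isolated nodes) and size m = |E|.
order = len(set().union(*hyperedges.values()))
size = len(hyperedges)
```

In this toy example, the node `v3` takes part in all three hyperedges, so its hyperdegree is 3 while its degree is 5, which shows how the two quantities can diverge.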

We use a hypergraph H to model a social hypernetwork. Here, nodes are users of the social hypernetwork, while a hyperedge represents a discussion between users. In what follows, we will refer to social hypernetwork simply as hypernetwork.

Moreover, features can be associated with users, and these are generally based on the content the users interact with and the discussions they participate in. To formalize them, we employ a set of features $F = \{f_{1}, \dots, f_{l}\}$ characterizing the nodes in $H$. Given a node $v_{i} \in V$, $F_{i} = \{f_{1_{i}}, \dots, f_{l_{i}}\}$ represents the values of each feature of $v_{i}$, that is, $f_{k_{i}}$ indicates the value of the $k$-th feature of $v_{i}$. We assume features can be either numerical or categorical and that numerical features are always normalized. In addition, with $H_{F}$, we indicate the hypergraph equipped with the set of node features $F$.

3.2. Characterizing Nodes via Archetypes

The first aim of our approach is the definition of archetypes. An archetype serves as a “template” to represent nodes characterized by a certain subset of features from the set F. The concept of an archetype is pivotal in characterizing higher-order node roles based on subsets of hyperedges. Often, the study of a representation of nodes in a social network involves the definition of a taxonomy or a more thorough analysis of the behavior of such nodes [3]. Instead, our definition of archetype enables a more general characterization of possible behavioral dynamics occurring among nodes, and does not rely on a single attribute.

Let $F_{A} = \{f_{1}, \dots, f_{p}\} \subseteq F$ be a subset of features. Then, an archetype $A$ is defined as a tuple of values based on the features indicated by $F_{A}$. Formally, let $A = \langle a_{1}, a_{2}, \dots, a_{p} \rangle$, where each $a_{k}$, for $k = 1, \dots, p$, is a particular value of the $k$-th feature from $F_{A}$ that characterizes the nodes associated with this archetype. As an example, let $F_{A} = \{f_{sent}, f_{tox}\}$, where $f_{sent}$ (resp., $f_{tox}$) is the value of a quantitative feature indicating the average sentiment value (resp., average toxicity value) expressed by a user. Such features are extensively used in various data-science-based studies and can be easily computed via classical sentiment analysis methods [36,37]. Suppose $f_{sent} \in [-1, 1]$, where $-1$ (resp., $1$) indicates a mostly negative (resp., mostly positive) average sentiment, and $f_{tox} \in [0, 1]$, with higher values indicating a higher degree of toxicity. Then, various archetypes could be defined based on these features. For instance, we could define the archetype $A = \langle -1, 0.5 \rangle$, which describes a template for users that are extremely negative and toxic on average, and we could name this archetype Cynical Commenter; instead, the archetype $A = \langle 1, 0.5 \rangle$ would be a template for users that are extremely positive and exhibit moderate toxicity, and we could name this Overzealous User. In addition, it is noteworthy that, for the sake of presentation, here the definition of an archetype is only given with regard to nodes. Nevertheless, when features are also available for hyperedges, the same definition applies, and archetypes for the latter can be defined.

Essentially, an archetype $A$ can be viewed as a prototypical example of nodes that share similar feature values. To simplify the analysis and categorization of archetypes, instead of taking into account the feature values directly, we can map them into categorical values, indicating the state of the feature for the particular archetype. This approach involves setting a series of thresholds $T = \langle t_{1}, \dots, t_{p} \rangle$, where the value $t_{j}$ has the same domain as the feature $f_{j}$ in $F$, a set of labels $L$, and a labeling function $\iota$. Then, given an archetype $A = \langle a_{1}, \dots, a_{p} \rangle$, by applying to each value $a_{j}$ the corresponding threshold $t_{j}$ from $T$ via the labeling function $\iota$, we derive the archetype $A_{T} = \langle a_{1}^{t}, \dots, a_{p}^{t} \rangle$, where $\iota(a_{j}, t_{j}) = a_{j}^{t}$ is a label in $L$. Practically speaking, $A_{T}$ represents the same archetype as $A$, but its representation is now built over a specific set of labels. Let us take the aforementioned archetype $A = \langle 1, 0.5 \rangle$, defined over the set of features $F_{A} = \{f_{sent}, f_{tox}\}$, and representing Overzealous Users. Suppose we set $T = \langle 0, 0.75 \rangle$ and $L = \{low, high\}$. Moreover, suppose we define the labeling function $\iota$ as in Equation (1):

$\iota(a_{j}, t_{j}) = \begin{cases} low & \text{if } a_{j} \leq t_{j} \\ high & \text{otherwise} \end{cases}$

(1)

Therefore, in this example, we derive $A_{T} = \langle high, low \rangle$, since $1 > 0$ and $0.5 \leq 0.75$. This can be readily interpreted as representing users exhibiting a high sentiment value and a low toxicity value. Note how the labeling depends on the thresholds $T$, thus allowing for flexibility that can accommodate different contexts of analysis.
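The thresholding step can be sketched in a few lines of Python. The function names `iota` and `label_archetype` are illustrative choices, not part of any released code; the values reproduce the Overzealous User example, with $T = \langle 0, 0.75 \rangle$ and the low/high rule of Equation (1).

```python
def iota(value, threshold):
    """Labeling rule of Equation (1): 'low' if value <= threshold, 'high' otherwise."""
    return "low" if value <= threshold else "high"

def label_archetype(archetype, thresholds):
    """Map an archetype A to its labeled form A_T, one label per feature."""
    if len(archetype) != len(thresholds):
        raise ValueError("archetype and thresholds must have the same length p")
    return tuple(iota(a, t) for a, t in zip(archetype, thresholds))

# Features are ordered as (f_sent, f_tox).
thresholds = (0.0, 0.75)          # T = <0, 0.75>
overzealous_user = (1.0, 0.5)     # A = <1, 0.5>
cynical_commenter = (-1.0, 0.5)   # A = <-1, 0.5>

overzealous_labeled = label_archetype(overzealous_user, thresholds)
cynical_labeled = label_archetype(cynical_commenter, thresholds)
```

With these thresholds, both archetypes receive a low toxicity label, which shows how the choice of $T$ determines which distinctions the labeled representation preserves.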

Finally, we are now able to state when a node is represented by a given archetype. Given an archetype $A = \langle a_{1}, a_{2}, \dots, a_{p} \rangle$, based on a subset of features $F_{A} = \{f_{1}, \dots, f_{p}\} \subseteq F$, we want to determine which nodes are represented by it. Let $v_{i} \in V$ be a node, and let $F_{A}(v_{i}) = \langle f_{1_{i}}, f_{2_{i}}, \dots, f_{p_{i}} \rangle$ be the feature vector of node $v_{i}$ according to the features selected in $F_{A}$. The node $v_{i}$ can be considered represented by the archetype $A$ if $F_{A}(v_{i})$ is sufficiently close to $A$, according to a predefined distance metric $d$. We express this as $d(F_{A}(v_{i}), A) \leq \epsilon$, where $\epsilon$ is a small positive threshold value that determines the acceptable distance between the node’s feature vector and the archetype. Note that both $A$ and $F_{A}(v_{i})$ can be considered vectors of the same length $p$; thus, classical measures, such as the cosine distance (one minus the cosine similarity [38]), can be used. Furthermore, as we will see in the experiments, in the simplest case, a feature-wise comparison can also be used to assess when a node can be considered represented by a given archetype.
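As a minimal sketch of this matching step, the Python code below checks $d(F_{A}(v_{i}), A) \leq \epsilon$ using the cosine distance, and also shows one possible reading of the feature-wise comparison, namely labeling the node's features with the low/high rule of Equation (1) and comparing the result with $A_{T}$. The function names, the example users, and the value of $\epsilon$ are assumptions made for illustration only.

```python
import math

def cosine_distance(x, y):
    """One minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)

def is_represented(node_features, archetype, epsilon, distance=cosine_distance):
    """True when d(F_A(v_i), A) <= epsilon for the chosen distance."""
    return distance(node_features, archetype) <= epsilon

def matches_feature_wise(node_features, labeled_archetype, thresholds):
    """Assumed feature-wise check: label each node feature and compare with A_T."""
    labels = tuple(
        "low" if f <= t else "high" for f, t in zip(node_features, thresholds)
    )
    return labels == tuple(labeled_archetype)

overzealous_user = (1.0, 0.5)   # archetype A over (f_sent, f_tox)
close_user = (0.9, 0.4)         # hypothetical node feature vector
far_user = (-0.8, 0.6)          # hypothetical node feature vector
```

Note that the cosine distance ignores vector magnitude: the vectors $\langle 2, 1 \rangle$ and $\langle 1, 0.5 \rangle$ are at distance zero. When magnitude matters, a different choice of $d$, such as the Euclidean distance, may be more appropriate.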

3.2.1. A Characterization of Archetypes

To comprehensively understand and categorize user archetypes, we decided to highlight them through three different expressions, namely, (i) emotional, (ii) psycho-emotional, and (iii) moral expressions. By leveraging well-established psychological theories and lexicons, we aimed to create detailed profiles that reflect the peculiar ways in which users interact and participate within the social platform. In what follows, we describe in detail these three expressions, which we subsequently used in our experiments.

Emotional Profiles

We aim to characterize archetypes based on the emotions they express. To do so, we refer to the psychological theory of emotions [39] developed in 1980 by the American psychologist Robert Plutchik. This theory identifies eight basic emotions—joy, trust, fear, surprise, sadness, anticipation, anger, and disgust—and claims that all other emotions derive from a mixture of these primary ones. To quantify feelings expressed by Scored users, we leverage the NRCLexicon (National Research Council Lexicon) [40], a resource containing over 14,000 English words and their associated emotional ratings according to Plutchik’s theory. This dictionary was further expanded by the National Research Council of Canada to include WordNet synonyms, reaching over 27,000 terms [41]. For each of the users’ texts, we compute emotion scores, and normalize them in [0, 1]. In this context, 0 indicates texts that do not elicit any emotion, while 1 signifies texts that strongly evoke the specified emotion.

Psycho-Emotional Profiles

We also characterize how user archetypes relate to their surrounding social environments. To do so, we refer to the PAD model (Pleasure, Arousal, Dominance) introduced by Mehrabian and Russell in 1974 [42]. According to the PAD model, three dimensions characterize the perception an individual has of the environment in which she finds herself. Pleasure (sometimes referred to as valence) concerns whether an individual perceives the environment as enjoyable or not. Arousal measures how stimulating the environment is for the individual. Dominance indicates whether the individual feels in control of the environment. To operationalize these dimensions, we leverage the VAD Lexicon (Valence, Arousal, Dominance) [43]. This resource contains over 20,000 English terms, along with their associated valence/pleasure, arousal, and dominance values. For each of the three dimensions, we associate each text with the total score of its words. Then, we normalize results to [0, 1], where 0 implies an absence of the corresponding dimension, and 1 implies the strong presence thereof.
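The scoring procedure described above (summing word scores per dimension, then normalizing to [0, 1]) can be sketched as follows. This is only an illustration under stated assumptions: the three-word lexicon and its values are invented and do not come from the VAD Lexicon, the tokenizer is a simple regular expression, and min-max scaling over the corpus is one possible normalization, since the text does not specify which scheme is used.

```python
import re

DIMENSIONS = ("valence", "arousal", "dominance")

# Hypothetical toy lexicon: values are invented, not taken from the VAD Lexicon.
TOY_LEXICON = {
    "great": {"valence": 0.9, "arousal": 0.6, "dominance": 0.7},
    "calm": {"valence": 0.8, "arousal": 0.1, "dominance": 0.6},
    "awful": {"valence": 0.1, "arousal": 0.7, "dominance": 0.3},
}

def text_scores(text, lexicon):
    """Total score of the words of `text` for each dimension."""
    totals = dict.fromkeys(DIMENSIONS, 0.0)
    for token in re.findall(r"[a-z']+", text.lower()):
        entry = lexicon.get(token)
        if entry is not None:
            for dim in DIMENSIONS:
                totals[dim] += entry[dim]
    return totals

def min_max_normalize(values):
    """Rescale values to [0, 1] (assumed scheme); constant input maps to 0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

texts = ["A great and calm day", "awful", "nothing here"]
raw_valence = [text_scores(t, TOY_LEXICON)["valence"] for t in texts]
normalized_valence = min_max_normalize(raw_valence)
```

Under this scheme, a text with no lexicon matches receives a score of 0 and the highest-scoring text in the corpus receives 1, matching the interpretation of the endpoints given in the text.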

Moral Profiles

We aim to characterize archetypes based on the moral dimensions that emerge from the content they produce. We rely on the Moral Foundations Theory, a psychological framing rooted in cultural anthropology that postulates the existence of five universal moral dimensions [44]: authority/subversion, care/harm, fairness/cheating, loyalty/betrayal, and sanctity/degradation. Each dimension is composed of a virtue (e.g., loyalty), and a corresponding vice (e.g., betrayal). Virtues can be understood as follows, while vices can be considered their opposite: The concept of authority can be defined in relation to specific traits, such as deference to higher authorities, in order to maintain group cohesion. Similarly, the concept of care can be understood in terms of nurturing and protection. Fairness can be conceptualized in terms of equal treatment and reward. Loyalty can be understood in relation to the prioritization of one’s group and alliances. Finally, sanctity can be defined in terms of the maintenance of the sacredness of the body and the avoidance of moral contamination. We operationalize this framework via the eMFD (Extended Moral Foundations Dictionary), a lexicon containing more than 3000 words [45]. Each word has an associated score in [−1, 1] for each foundation, ranging from strong vice outage (−1) to strong virtue outage (1).

3.3. Analyzing Higher-Order Entities

While archetypes are a particular yet effective way to analyze entities within the social platform, they mainly focus on node features, whereas interactions are not taken into account. Hence, we define here a general characterization function to characterize nodes and hyperedges according to their exhibited features and higher-order dynamics. Without loss of generality, we propose the definition of such a function with regard to hyperedges. The same can also be applied to nodes.

Let $H_{F} = (V, E)$ be a hypergraph equipped with the node feature set $F$. We denote with $\omega$ a function which we call the hyperedge characterization function. A function such as $\omega$ is needed to address the challenge of characterizing a hyperedge with regard to the nodes it contains. Let us recall that our approach deals with analyzing higher-order entities. To do so, we exploit the representation of node relationships via hyperedges. While this effectively captures the higher-order structural interactions between nodes, it might not be sufficient for acquiring insights into the semantics of such interactions. Therefore, we focus on the latter aspect through a characterization of hyperedges that is based not only on the contained nodes but also on their features. Formally, the domain of our hyperedge characterization function is $E$: $\omega$ takes as input a hyperedge $e \in E$ and returns a value $\omega(e)$, which we call its characteristic value. Such a value depends on the actual implementation of $\omega$: in fact, $\omega$ is general, and different approaches can be exploited to accommodate the hyperedge characterization. Furthermore, to address the aforementioned challenge, there are cases in which $\omega$ should be defined to consider the values of the features of each node contained in the considered hyperedge. Given a feature of interest $f_{k}$, we write $\omega^{k}$ to denote that the actual implementation of $\omega$ considers the feature $f_{k}$.

In the following, we propose various specializations of $ω$ and a brief rationale for each of them. Some of these specializations are used in Section 4. We separate them into three families, namely, (i) numerical-only, (ii) categorical-only, and (iii) structural-based specializations. In describing each of these, we assume we are interested in characterizing a hyperedge $e \in E$. In addition, some specializations are intended for feature-oriented analyses; in these cases, we assume a feature $f_{k}$ of interest.

3.3.1. Numerical-Only Specializations

The following specializations are intended to be used when numerical features are considered within the investigation. Therefore, here we assume the feature $f_{k}$ of interest is numerical, and we write $f_{k_{u}}$ for its value on node u. Some specializations are as follows:

Statistics Descriptors: common statistics descriptors such as the mean, median, mode, variance, and standard deviation of the feature $f_{k}$ among the nodes in e can be easily computed. For instance, the mean is defined and denoted as

$ω_{avg} (e) = \frac{\sum_{u \in e} f_{k_{u}}}{| e |}$

(2)
MAD: calculates the Mean Absolute Deviation of the feature $f_{k}$ among the nodes in e, defined as

$ω_{mad} (e) = \sum_{u \in e} \frac{| f_{k_{u}} - ω_{avg} (e) |}{| e |}$

(3)
Gini Coefficient: employs the Gini Coefficient to compute the dispersion of the values of $f_{k}$ among the nodes in e, defined as

$ω_{Gini} (e) = \sum_{u \in e} \sum_{v \in e, u \neq v} \frac{| f_{k_{u}} - f_{k_{v}} |}{2 \bar{f_{k}} {| e |}^{2}} ,$

(4)
where $\bar{f_{k}} = ω_{avg} (e)$ is the mean of $f_{k}$ over the nodes in e.
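
As an illustration, the numerical specializations above can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used in the experiments: the function names and the toy toxicity values are hypothetical, and a hyperedge is represented simply as the list of feature values of its nodes.

```python
from statistics import fmean

def omega_avg(values):
    """Mean of a numerical feature over the nodes of one hyperedge (Eq. 2)."""
    return fmean(values)

def omega_mad(values):
    """Mean absolute deviation around the hyperedge mean (Eq. 3)."""
    mean = fmean(values)
    return sum(abs(x - mean) for x in values) / len(values)

def omega_gini(values):
    """Gini coefficient over all ordered pairs of nodes (Eq. 4)."""
    n = len(values)
    mean = fmean(values)
    if mean == 0:
        # The coefficient is undefined for a zero mean; returning 0.0 is a
        # convention chosen for this sketch, not something stated in the text.
        return 0.0
    # Pairs with u == v contribute zero, so including them changes nothing.
    pair_sum = sum(abs(x - y) for x in values for y in values)
    return pair_sum / (2 * mean * n * n)

# Hypothetical toxicity values of the four users in one discussion.
toxicity = [0.1, 0.2, 0.3, 0.8]
print(omega_avg(toxicity), omega_mad(toxicity), omega_gini(toxicity))
```

In this toy discussion a single highly toxic user raises the Gini coefficient well above zero, while a discussion whose users all share the same value would score exactly zero.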

3.3.2. Categorical-Only Definitions

Differently from the above, the following specializations are intended to be used when categorical features are considered. Therefore, here we assume the feature $f_{k}$ of interest is categorical. Some specializations are as follows:

Entropy: measures the cohesion of the values of $f_{k}$ among the nodes in e. We define and denote this in Equation (5), where $r_{k_{u}}$ denotes the proportion of the value of feature $f_{k_{u}}$ over all nodes in e.

$ω_{entr} (e) = - \sum_{u \in e} r_{k_{u}} \log (r_{k_{u}})$

(5)
Gini Impurity: measures the probability of incorrectly classifying a randomly chosen element in the dataset if it were randomly labeled according to the distribution of all categories in the dataset. Here, we employ this on the feature $f_{k}$ , thus we define and denote it in Equation (6), where $r_{k_{u}}$ denotes the proportion of the value of feature $f_{k_{u}}$ over all nodes in e.

$ω_{GiniImp} (e) = 1 - \sum_{u \in e} r_{k_{u}}^{2}$

(6)
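
The two categorical specializations can be sketched as follows. The sketch reads the sums in Equations (5) and (6) as running once over each distinct value observed in the hyperedge, which is the standard definition of both measures and is an assumption about the intended reading; the function names and the example labels are hypothetical.

```python
from collections import Counter
from math import log

def value_proportions(labels):
    """Share of nodes in the hyperedge holding each distinct feature value."""
    counts = Counter(labels)
    n = len(labels)
    return {value: count / n for value, count in counts.items()}

def omega_entropy(labels):
    """Shannon entropy (natural logarithm) of the value distribution (Eq. 5)."""
    return -sum(r * log(r) for r in value_proportions(labels).values())

def omega_gini_impurity(labels):
    """Gini impurity of the value distribution (Eq. 6)."""
    return 1.0 - sum(r * r for r in value_proportions(labels).values())

# Hypothetical archetype labels of the four users in one discussion.
archetypes = ["HLL", "HLL", "LLL", "HHL"]
print(omega_entropy(archetypes), omega_gini_impurity(archetypes))
```

Both measures are zero for a discussion whose users all share the same label and grow as the labels become more mixed.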

3.3.3. Structural-Based Specializations

The structural-based specializations focus on characterizing a hyperedge based on its exhibited structural properties rather than only focusing on the features of the contained nodes. Some proposed specializations include the following:

Hyperedge Size: This measure is simply given by the size of the hyperedge e, that is $ω_{size} (e) = | e |$ , regardless of the feature $f_{k}$ .
Purity: Here, the measure indicates the homogeneity of the values of $f_{k}$ in e, defined as

$ω_{purity} (e) = \frac{\max (\mathrm{count} (f_{k_{u}}) : u \in e)}{| e |}$

(7)
Cohesion: The characterization of the hyperedge e is given by the mean pairwise similarity of the value of the feature $f_{k}$ over the pairs of nodes $u, v \in e$ . Formally, given a similarity function $\mathrm{sim} (\cdot, \cdot)$ defined for two feature values, we define this as

$ω_{cohes} (e) = \frac{2}{| e | (| e | - 1)} \sum_{\{u, v\} \subseteq e, u \neq v} \mathrm{sim} (f_{k_{u}}, f_{k_{v}})$

(8)
where the sum runs over unordered pairs of nodes, so that each pair is counted once.
Interaction potential: The hyperedge e is characterized by its capacity to create external connections, i.e., connections that span outside the hyperedge. This measures the extent to which the hyperedge facilitates interactions beyond its mere composition. It is formally defined in Equation (9), where $v \sim u$ indicates that v and u are neighbors within a hyperedge different from e. The denominator can be $| e |$ , i.e., only the nodes in e are taken into account, or $| V \setminus e |$ , i.e., $ω_{intpot}$ is sensitive to the order of the hypergraph.

$ω_{intpot} (e) = \frac{| \{ v \in V \setminus e : \exists u \in e, v \sim u \} |}{| e |}$

(9)
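
A minimal Python sketch of purity, cohesion, and interaction potential follows. It assumes a hypergraph stored as a dictionary from hypothetical discussion identifiers to sets of users, takes the mean in the cohesion measure over unordered pairs, and uses $| e |$ as the denominator of the interaction potential; all names and toy values are illustrative.

```python
from collections import Counter
from itertools import combinations

def omega_purity(labels):
    """Share of nodes holding the most frequent feature value (Eq. 7)."""
    return max(Counter(labels).values()) / len(labels)

def omega_cohesion(values, sim):
    """Mean similarity over unordered node pairs; needs at least two nodes (Eq. 8)."""
    pairs = list(combinations(values, 2))
    return sum(sim(a, b) for a, b in pairs) / len(pairs)

def omega_intpot(edge_id, hyperedges):
    """Outside nodes reachable through other hyperedges, divided by |e| (Eq. 9)."""
    e = hyperedges[edge_id]
    outside = set()
    for other_id, other in hyperedges.items():
        # A different hyperedge that shares a node with e exposes its
        # remaining members as external neighbours of e.
        if other_id != edge_id and other & e:
            outside |= other - e
    return len(outside) / len(e)

# Toy hypergraph: d1 and d2 share user "c", d3 is disconnected from both.
discussions = {"d1": {"a", "b", "c"}, "d2": {"c", "d", "e"}, "d3": {"f", "g", "h"}}
print(omega_intpot("d1", discussions))
print(omega_purity(["HLL", "HLL", "LLL", "HHL"]))
print(omega_cohesion([0.2, 0.4, 0.9], lambda a, b: 1 - abs(a - b)))
```

The similarity function passed to `omega_cohesion` is left to the caller; one minus the absolute difference is only one possible choice for features already scaled to [0, 1].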

4. Experiments

In what follows, we first provide a characterization of the studied dataset regarding the social platform Scored.co. We studied archetypes based on platform-specific and linguistic features, and we illustrate how our proposed framework allowed for a comprehensive analysis of them. Then, we focus on the constructed hypernetworks, and we provide a characterization of discussions, that is, hyperedges, through our proposed hyperedge characterization function.

4.1. Dataset

We focused on the Scored.co social platform, which hosts a network of communities that users can create and join. The platform shares some similarities with the more well-known platform Reddit [46]. Users can join communities related to various topics, from politics to memes, and upvote content they like, and downvote content they dislike. The number of upvotes minus the downvotes constitutes a post’s score, i.e., a value that indicates the content’s usefulness and/or relevance to the discussion.

Interestingly, Scored.co has included many far-right/alt-right communities since the 2020 massive Reddit bans [47]. Some of these maintained their names and key figures, e.g., c/TheDonald, c/GreatAwakening, ultimately acting as a continuation of the original communities. Scored.co is an emerging yet understudied platform, potentially hosting dangerous content. Despite its growing user base, it has received relatively little academic attention compared to other social media platforms, as we pointed out in Section 2. The loosely moderated nature of some communities can lead to the spread of misinformation, hate speech, and radical ideologies. This makes Scored.co a platform of interest for researchers studying online extremism, digital sociology, and the impacts of social media on public discourse.

To build a dataset for our experiments, we collected data from the Scored.co platform via the official API [48]. To retrieve as much content as possible, we used a breadth-first search technique by first gathering all posts and comments authored by C, one of the platform’s admins, who is one of the most active users on the platform. Then, the content was parsed to retrieve other users and their related content. We repeated this procedure until no new users were found. Eventually, we found 207,554 unique users (referenced by their username), 4,398,074 posts, and 36,978,685 comments. Our data collection process started on 9 June 2024, and ended on 22 June 2024. The final dataset comprised higher-order interactions on Scored.co, starting from the platform’s launch on 15 October 2019, and ending on 1 June 2024.
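
The breadth-first collection strategy described above can be sketched as follows. The sketch does not reproduce the Scored.co API: `fetch_mentioned_users` is a hypothetical callable standing in for the step that downloads a user's posts and comments and parses them for other usernames, and the toy dictionary replaces real platform data.

```python
from collections import deque

def snowball_users(seed_user, fetch_mentioned_users):
    """Breadth-first discovery of users, starting from a single seed account."""
    seen = {seed_user}
    queue = deque([seed_user])
    while queue:
        user = queue.popleft()
        # Hypothetical step: retrieve the user's content and extract usernames.
        for other in fetch_mentioned_users(user):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen  # the loop ends once no new users are found

# Toy stand-in for the platform: who appears in the content of whom.
toy_threads = {"C": ["u1", "u2"], "u1": ["u2", "u3"], "u2": [], "u3": ["C"]}
found = snowball_users("C", lambda user: toy_threads.get(user, []))
print(sorted(found))
```

A crawl of this kind can only reach users connected to the seed through some chain of shared content, which is why a very active seed account is a natural starting point.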

The dataset was released on Zenodo [13], anonymized and providing information for (i) higher-order interactions extracted from discussion threads and (ii) user profiles as sets of features characterizing each user at each time stamp. A Python data collection script was also enclosed in the repository.

4.2. Higher-Order Interactions and Analysis

Given the previously described dataset, we were now able to construct the hypernetworks, which constituted the basis of the experiments. We started by constructing a hypernetwork $H = (V, E)$, where V denotes the set of Scored.co users who participated in at least one discussion, while E denotes the set of hyperedges encoding discussion threads. In addition, we discarded all hyperedges of size 2, since we are interested in studying higher-order interactions. This means that, for all $e \in E$, $| e | > 2$. Since discussion topics on Scored.co are varied, ranging from politics to sports, memes, and more, we decided to center our analysis on discussions taking place in c/TheDonald during 2023. This community is a controversial and highly active online group that originally formed on Reddit, before migrating to platforms like Scored.co [47]. We chose it because it is the oldest and most active community on the platform, covering 74% of all collected posts and comments. This allowed us to maintain topic continuity without sacrificing large portions of the data. Note that $H$ is a hypernetwork that represents the whole dataset. To enable analyses in the time dimension, we constructed a hypergraph $H_{t} = (V_{t}, E_{t})$ for each month $t = 1, \dots, 12$, containing users and discussions relative to month t. In Table 1, we report a summary of the statistics of all constructed hypernetworks. The table reports basic statistics, namely, the number of nodes $| V |$ and of hyperedges $| E |$, the largest hyperedge size $\max_{| e |}$, the average hyperdegree $\overline{hdeg}$, and the average degree $\overline{deg}$. In addition, for each pair of adjacent timestamps $t$ and $t + 1$, with $1 \leq t < 12$, we report the Jaccard similarity index between node sets. The last row of the table reports the same information for the hypernetwork $H$.
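
The statistics summarized in Table 1 can be computed along the following lines. This is a sketch under stated assumptions: each discussion is given as a set of usernames, the hyperdegree of a user is the number of hyperedges containing them, and the degree is taken to be the number of distinct peers they share a hyperedge with. The function names and the toy data are hypothetical.

```python
def hypergraph_stats(hyperedges):
    """Basic statistics of one snapshot, given a list of sets of users."""
    edges = [e for e in hyperedges if len(e) > 2]  # keep higher-order interactions only
    hyperdegree = {}
    neighbours = {}
    for e in edges:
        for u in e:
            hyperdegree[u] = hyperdegree.get(u, 0) + 1
            neighbours.setdefault(u, set()).update(e - {u})
    n = len(hyperdegree)
    return {
        "nodes": n,
        "hyperedges": len(edges),
        "max_size": max(len(e) for e in edges),
        "avg_hdeg": sum(hyperdegree.values()) / n,
        "avg_deg": sum(len(peers) for peers in neighbours.values()) / n,
    }

def jaccard(a, b):
    """Jaccard similarity between two node sets."""
    return len(a & b) / len(a | b)

# Toy monthly snapshots; the pair {"x", "y"} is dropped because its size is 2.
january = [{"a", "b", "c"}, {"a", "c", "d", "e"}, {"x", "y"}]
february = [{"a", "b", "f"}]
stats = hypergraph_stats(january)
overlap = jaccard({"a", "b", "c", "d", "e"}, {"a", "b", "f"})
print(stats, overlap)
```

Computing `jaccard` on the node sets of consecutive months gives the overlap values reported for adjacent timestamps.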

From an analysis of this table, different insights can be pinpointed. First off, the aggregated hypernetwork $H$ includes interactions among roughly 20% of all discovered users. On average, users participated in 113 discussions (hyperedges) throughout the year, and in 18–20 discussions each month. Scored.co users were exposed to roughly 1000 unique peers, with a monthly average of 300–400, thus consistently showing locally dense patterns over time. We can also observe how the largest hyperedge size exhibited some variations but remained of the same order of magnitude throughout the whole period. This indicates the presence of a highly connected subgroup at certain timestamps. The average hyperdegree $\overline{hdeg}$ remained relatively stable across timestamps, indicating that on average, each node was part of ∼20 hyperedges. The average degree $\overline{deg}$ also remained fairly consistent. To complement these statistics, we performed a more detailed examination of the distributions of hyperdegree and hyperedge size, which are depicted in Figure 2a and Figure 2b, respectively. From an analysis of these figures, we can see that both show the typical power-law behavior observed in many real-world hypernetworks [21]. We also note how a few users participated in a large number of discussions, while most users participated in just a few. The hyperedge size distribution (Figure 2b) displays a similar shape, with mostly small discussions and a few large ones. We also note how these patterns were consistent over time, with 44% of interactions of at least size 5, 15% of interactions of at least size 10, and 2% of interactions of at least size 50. Finally, in Table 1, we can observe that the overlap between adjacent timestamps, calculated via the Jaccard similarity index between subsequent node sets $V_{t}$ and $V_{t + 1}$, was large and coherent over time, stabilizing at 64% on average. Overall, the studied part of this platform emerged as a locally dense online community with temporally stable structural patterns and significant user engagement, which made it interesting to study with our approach.

4.3. Identification and Analysis of Archetypes

We recall that a core aspect of our approach relies on identifying archetypes, namely typical patterns that can give us approximate descriptions of individual element behaviors. As we also noted in Section 3.2, archetypes can be easily defined for nodes and hyperedges. In this section, we aim to characterize user behaviors. Therefore, henceforth, we refer to archetypes as related to nodes/users.

To define user archetypes, we combine platform-specific and linguistic features. Specifically, we outline eight archetypes based on these features and their values. The features we decided to use are (i) score, (ii) sentiment, and (iii) toxicity. As for the score, we computed the average post score for each user, then we normalized values in $[0, 1]$. For the sentiment feature, we used the Valence Aware Dictionary for sEntiment Reasoning (VADER) [36], a well-known rule-based sentiment classifier that assigns each text positive, negative, neutral, and compound sentiment intensity values. Out of these, the compound sentiment is a score in $[- 1, 1]$, going from the most extreme negative ($- 1$) to the most extreme positive (1) sentiment. For our purposes, we computed the average compound scores for each user and rescaled values in $[0, 1]$. Moreover, for the toxicity feature, we computed its values with Detoxify [37], a neural toxicity detection model that returns the probability that a text contains toxic language, and in this case, we also normalized the obtained values. Finally, to discern between high and low values for each feature, we set a threshold of 0.5, such that when a feature has a value higher than this threshold, it is considered high, and low otherwise.

The list of the eight archetypes defined is depicted in Table 2. Here, we highlight how setting the various combinations of the state of a feature (high or low) creates different archetypes. We defined these archetypes in order to encapsulate the inherent semantics of the discussions taking place in the studied dataset. For instance, when we refer to the HHL combination, this represents users showing (H)igh scores, (H)igh sentiment (thus, in the range of positive levels), and (L)ow levels of toxicity. Hence, they could be interpreted as users who consistently contribute positively to the discussion by maintaining a positive and supportive attitude, without engaging in toxic behavior. A detailed characterization of these archetypes is given in the following section.
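
The labeling rule can be illustrated with a short Python sketch. It assumes the three features have already been averaged per user, and it rescales the compound sentiment linearly from [−1, 1] to [0, 1]; the text does not specify the rescaling, so the linear map is an assumption, and the function names are hypothetical.

```python
def rescale_compound(compound):
    """Map a compound sentiment score from [-1, 1] to [0, 1] (assumed linear)."""
    return (compound + 1.0) / 2.0

def archetype(score, sentiment, toxicity, threshold=0.5):
    """Three-letter label in the order score, sentiment, toxicity."""
    return "".join("H" if value > threshold else "L"
                   for value in (score, sentiment, toxicity))

# Hypothetical user: well-received posts, positive tone, little toxic language.
label = archetype(score=0.8, sentiment=rescale_compound(0.6), toxicity=0.1)
print(label)  # HHL
```

Since each of the three features is either high or low, the rule yields exactly 2³ = 8 possible labels; a value equal to the threshold is treated as low, following the strict inequality stated above.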

4.3.1. Archetype Characterization

Having defined the archetypes, we are now able to provide a characterization of them, based on multiple psychological and social dimensions. In more detail, we describe the behaviors of these types with respect to the different aspects highlighted in Section 3.2.1. We recall that these aspects are (i) environment perception, (ii) emotions, and (iii) morality.

To control for class size imbalances, we compared average values for multiple features across the top 10 most archetypal users for each archetype, i.e., the most representative users for each category. To do so, we ranked users according to their typicality, which we define in Equation (10):

$\mathrm{Typicality} (u) = \prod_{f} α_{f} f (u)$

(10)

Here, u is the target user; f is a function computing u’s characteristic score, sentiment, or toxicity; $α_{f}$ is a coefficient that equals 1 if u’s archetype has a high value for f and, conversely, −1 if u’s archetype has a low value for f. As an example, a HHL’s typicality is maximized when its score is 1 (H), its sentiment value is 1 (H), and its toxicity value is 0 (L). Along with this characterization, in Figure 3, we depict the profiles of each archetype. The top part of the figure represents the profiles according to Plutchik’s wheel of emotions. The middle part, instead, represents the profiles according to the Pleasure/Arousal/Dominance model. Finally, the bottom part depicts the profiles according to the Moral Foundations theory.

Low Score—High Sentiment—Low Toxicity (LHL)

With respect to their psychological features, LHL users are characterized by high trust (0.18), joy (0.14), and anticipation (0.11); low arousal (0.44); and high valence (0.66). Hence, from their feature values, a possible interpretation can suggest them as users focusing on creating a supportive environment and emphasizing social cohesion, by manifesting a positive and optimistic outlook. It is important to note that, for this archetype, as well as all the following ones, the descriptors above can only lead to qualitative suggestions and interpretations of users’ behaviors.

High Score—High Sentiment—Low Toxicity (HHL)

Such users are characterized by high trust (0.17), joy (0.12), and anticipation (0.14); low arousal (0.45); and high valence (0.64). Their psychological profile shows a generally positive outlook. From a moral focus perspective, the values suggest they mostly emphasize loyalty and, to an extent, authority. We may hypothesize that these users would provide support in more prominent and/or critical situations, gaining greater acknowledgment from the community.

High Score—High Sentiment—High Toxicity (HHH)

With respect to their psychological features, HHH users show signs of joy (0.13), trust (0.12), and anger (0.10), as well as high valence (0.60). This indicates that, while they have a positive emotional tone and are capable of experiencing happiness and trust, they can also exhibit anger. This leads to an interpretation of this archetype as an emotionally charged one, with a tendency to express both positive and negative tones. Moreover, they mostly emphasize the loyalty and authority moral dimensions.

High Score—Low Sentiment—High Toxicity (HLH)

This archetype shows high signs of anger (0.14), fear (0.14), and disgust (0.10). Arousal (0.53) and Dominance (0.50) are average, while Valence is slightly below average (0.48). This complex psychological profile hints at expressions of negative tones, perhaps toxic interactions. High levels of fear could indicate a tendency to anticipate negative outcomes, which can make their discourse more alarmist. Disgust may contribute to a critical and often harsh tone. Average arousal and dominance could suggest a moderate level of engagement with the environment, whereas their slightly below-average valence could be a sign of negative interactivity. Moral dimensions of care, fairness, loyalty, authority, and sanctity display negative values. This moral outlook, combined with a negative emotional profile, contributes to an intuition of a provocative, even controversial, behavior.

Low Score—Low Sentiment—High Toxicity (LLH)

This archetype is characterized by high values for disgust (0.13), and anger (0.11), paired with below-average values for Valence (0.39) and Dominance (0.45), and high Arousal (0.61). These emotional values manifest a fully negative profile. The archetype shows high arousal, as well as strong emotional responses and engagement, but also low valence and dominance. This could mean the users belonging to it often display dissatisfaction, leading to discontented interactions. Morally, LLH users use vice-oriented language, focusing on sanctity/degradation and care/harm. Low values in these areas could indicate strong criticism and a tendency to vent.

Low Score—Low Sentiment—Low Toxicity (LLL)

This psychological profile outlines an emotional landscape characterized by high levels of sadness (0.11), fear (0.14), and anger (0.12). These predominant features suggest unhappiness with their environment. Anger that is present but paired with low toxicity values could suggest a profile expressing criticism in a non-confrontational way rather than being directly opposed. With average levels of Valence, Arousal, and Dominance, their emotional responses are steady and moderated. From a moral perspective, LLL’s language frequently involves vice-oriented terms, particularly focusing on sanctity/degradation, care/harm, and loyalty/betrayal. Their emphasis on sanctity interestingly could suggest a concern for moral purity, whereas the focus on care/harm could highlight preoccupation with injustice. Their attention to loyalty/betrayal could indicate a sensitivity to perceived trust, which might align with their cautious yet critical nature.

High Score—Low Sentiment—Low Toxicity (HLL)

This psychological profile mostly elicits sadness (0.8), fear (0.14), and anger (0.12). Sadness and fear highlight anticipation of negative outcomes. Anger drives the behavior of these users, moderated by average levels of valence, arousal, and dominance.

Low Score—High Sentiment—High Toxicity (LHH)

The users in this archetype are characterized by high levels of trust (0.18) and joy (0.13). They show average Dominance (0.54) and Arousal (0.49) values, as well as high Valence (0.60). Their high trust and joy could suggest they generally expect positive interactions from others and have a positive disposition. The high Valence reflects an overall positive emotional tone. However, despite these positive traits, their high toxicity scores reveal a tendency toward a duality, which we could interpret as a confrontational/provocative communication style. This dichotomy of positivity and toxicity illustrates a complex profile, where users’ intentions may be good, but their execution can be damaging to others.

4.3.2. Archetypes Transitions

In this section, we aim to explore the temporal dimension of user interactions within the Scored.co community and understand whether significant transitions exist between the identified archetypes. This analysis can not only help in understanding the fluidity of user roles, but also potentially provide valuable information for targeted interventions and community management.

In order to carry out this analysis, we propose the following methodology. Given the imbalanced class sizes (see Table 2) throughout the observation periods, we employed a null model to test the transition probabilities between archetypes. This null model computed the expected transition probabilities on a copy of our dataset where archetype labels were shuffled. The detailed steps were as follows:

1. We generated $N = 500$ shuffled copies of the dataset, randomizing the archetype labels while preserving the overall distribution of interactions and activities;
2. For each archetype pair $(A, B)$ in the shuffled copies, we computed the transition probability $P (B | A)$ , which represents the likelihood of a user transitioning from archetype A to archetype B;
3. We calculated the mean and standard deviation of $P (B | A)$ across all shuffled copies. These statistics were used to compute z-scores and p-values for the observed transition probabilities in the original dataset;
4. Finally, transitions with p-values less than $0.01$ were considered statistically significant, indicating that these transitions considerably deviated from what was expected from the null model.
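
The four steps above can be sketched as a simple permutation test. The sketch makes several simplifying assumptions that are not taken from the text: each observation is a pair (archetype at one time step, archetype at the next) for the same user, the null model shuffles only the destination labels, and p-values are two-sided and derived from the z-score under a normal approximation. All names and the toy data are hypothetical.

```python
import random
from collections import Counter
from math import erfc, sqrt
from statistics import fmean, pstdev

def transition_probs(pairs):
    """P(B | A) estimated from a list of (A, B) archetype pairs."""
    totals = Counter(a for a, _ in pairs)
    counts = Counter(pairs)
    return {(a, b): c / totals[a] for (a, b), c in counts.items()}

def null_model_test(pairs, n_shuffles=500, alpha=0.01, seed=0):
    """Compare observed transition probabilities with a label-shuffling null model."""
    rng = random.Random(seed)
    observed = transition_probs(pairs)
    sources = [a for a, _ in pairs]
    targets = [b for _, b in pairs]
    samples = {key: [] for key in observed}
    for _ in range(n_shuffles):
        rng.shuffle(targets)  # break the link between source and destination labels
        shuffled = transition_probs(list(zip(sources, targets)))
        for key in samples:
            samples[key].append(shuffled.get(key, 0.0))
    results = {}
    for key, p_obs in observed.items():
        mu, sigma = fmean(samples[key]), pstdev(samples[key])
        if sigma == 0:
            continue  # the z-score is undefined when the null model never varies
        z = (p_obs - mu) / sigma
        p_value = erfc(abs(z) / sqrt(2))  # two-sided, normal approximation
        results[key] = (p_obs, z, p_value, p_value < alpha)
    return results

# Toy data: LHL users mostly move to HLL, and HLL users mostly move to LHL.
pairs = ([("LHL", "HLL")] * 40 + [("LHL", "LHL")] * 10
         + [("HLL", "HLL")] * 10 + [("HLL", "LHL")] * 40)
results = null_model_test(pairs, n_shuffles=200, seed=1)
print(results[("LHL", "HLL")])
```

Shuffling the destination labels keeps the overall number of users per archetype fixed, so any deviation of the observed probability from the shuffled ones cannot be explained by class imbalance alone.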

Figure 4 depicts the observed statistically significant transitions. One notable transition was from LHL to HLL, occurring at a significant rate of 39.23%. The former are characterized by positive sentiments and low post scores, typically engaging in supportive and encouraging behaviors. This transition may indicate that even users who start with low recognition can, through consistent positive contributions, build credibility and eventually gain influence within the community. HHL, another archetype defined by high trust, joy, and engagement, showed a tendency to transition to LHL or, more rarely, to LHH. The slight shift to LHL suggests that HHLs, despite their initially high engagement and recognition, may sometimes experience a reduction in visibility or engagement, leading them to adopt a more low-profile but still positive role. In addition, we observed an occasional drift towards more toxic behaviors. HLHs, which have high post scores and negative sentiment, exhibited a significant probability of remaining in their “contentious” roles (13.71%). However, there was also an 11.71% likelihood of transitioning to LLH. This shift reflects a potential decrease in engagement and influence, where high-profile negative behavior can lead to more discontent and a less active state. This also suggests a trajectory where persistent negativity can diminish a user’s central role within the community. LLHs, who are driven by negative emotions and low post scores, present an interesting case, with a substantial 33.29% chance of shifting into HLL. This transition implies that their critical nature can eventually earn them credibility, provided they moderate their toxic behaviors. Another significant transition was present from LLL to HLL, with a remarkably high probability of 43.29%.

4.3.3. Temporal Analysis of Archetype Higher-Order Interactions

Above, we proposed a thorough characterization of a series of archetypes. At this point, an interesting aspect to study is the dynamics of archetypes within the social platform, as well as their evolution over time. To do so, we now focus on temporal trends in archetypes by examining some features on a monthly basis. We focus on two key metrics, that is, the average hyperdegree $\overline{hdeg}$ and the average degree $\overline{deg}$ of all nodes represented by each archetype.

We considered the archetypes defined in the previous section, and we analyzed the nodes each archetype represents in each month. More formally, for each archetype and for each node u, we computed its hyperdegree $hdeg (u)$ and its degree $deg (u)$, and we did this for each hypergraph $H_{t}$, $t = 1, \dots, 12$. In Figure 5, we show the obtained results. The figure consists of 12 subplots, one for each hypergraph, corresponding to a different month. For each subplot, the x-axis represents the average hyperdegree $\overline{hdeg}$ of each archetype, while the y-axis represents the average degree $\overline{deg}$
of each archetype. From the analysis of this figure, different insights can be derived. First off, we can notice how the HLL archetype had a higher hyperdegree and degree than all other archetypes. This is somewhat expected, as their characterization encompasses a considerable number of nodes, as we have seen in Section 3.2.1. It is straightforward to observe how there are pairs of archetypes that are often collocated together with regards to these dimensions. This is the case for HLH and LLL, which appear very close in almost all monthly snapshots, although they are very different in size. The relatively small number of archetypes such as HHH and LHH is also reflected in their average hyperdegree and degree values: indeed, they remained stable over the months and had little to no fluctuations. This is somewhat expected, considering that these archetypes are characterized by emotionally charged and often provocative interactions, leading to fluctuating levels of activity and influence. This variability suggests that their presence is more dynamic, potentially driven by specific events or contentious discussions that peak at different times. Such dynamics underline that archetypes act as evolving “roles” rather than rigid categories: supportive archetypes (e.g., HHL) tend to maintain a stable influence over time, while more disruptive ones (e.g., LLH, HLH) fluctuate depending on whether the community context amplifies or marginalizes them. Also in this case, these results could be somewhat enhanced by a more qualitative-oriented approach: indeed, aspects such as the content shared between archetypes would be of great importance in a more in-depth analysis of their interaction. Nevertheless, we believe the provided observation already suggests different implications. 
For instance, archetypes with stable engagement patterns are likely to be reliable contributors, therefore they could be taken into account in systematic operations of the platforms, such as moderation activities. In addition, archetypes that show fluctuating feature values might be highly responsive to specific events or topics, which could fuel critical discussions.

4.4. Characterization of Discussions

We recall that one of the objectives of our approach is to analyze higher-order entities. To do so, it offers a so-called hyperedge characterization function $ω$, which we presented in Section 3.3. We are now able to leverage this to describe the dataset with regard to some of the properties exhibited by the discussions.

First off, we focus on examining our dataset on a monthly basis, thus we consider each hypergraph $H_{t}$, $t = 1, \dots, 12$. We do not focus on the overall hypergraph $H$, since we are interested in observing how the structure and the interactions within Scored.co evolved over time. In addition, this approach allows for a more granular understanding of user behavior and interactions. For each hypergraph $H_{t}$, we compute the s-betweenness hyperedge centrality [11]. This measure is a higher-order extension of classical graph betweenness centrality, and is computed on the line graph projection of the hypergraph to estimate the importance of hyperedges. We recall that the line graph of a hypergraph is a graph that represents the relationships between the hyperedges of the original hypergraph. In particular, there is a node in the line graph for each hyperedge in the hypergraph, and there is an edge between two nodes if the corresponding hyperedges share at least one node. After the computation of the centrality, we select the top 50 hyperedges in each hypergraph $H_{t}$, and for each of them we compute some descriptive features based on the linguistic productions of the contained nodes. In particular, we focus on the following features: (i) average word count, (ii) average unique word count, and (iii) average purity with regard to user archetypes. These values can be easily calculated via the $ω$ function, specialized for each of them. For instance, the specialization for the average purity is given by $ω_{purity}$ in Section 3.3.
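
The selection of prominent discussions can be sketched as follows. This is a minimal, dependency-free illustration and not the implementation used in the experiments: it builds the line graph by linking two hyperedges that share at least s nodes (s = 1 matches the description above), computes unnormalized betweenness with Brandes' algorithm for unweighted graphs, and returns the top-ranked hyperedges. Identifiers such as `d1` and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def line_graph(hyperedges, s=1):
    """Adjacency of hyperedges (dict id -> set of nodes) sharing at least s nodes."""
    adj = {eid: set() for eid in hyperedges}
    for a, b in combinations(hyperedges, 2):
        if len(hyperedges[a] & hyperedges[b]) >= s:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def betweenness(adj):
    """Unnormalized betweenness of an unweighted, undirected graph (Brandes)."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        dist = {source: 0}
        paths = dict.fromkeys(adj, 0)  # number of shortest paths from source
        paths[source] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([source])
        while queue:  # breadth-first search from the source
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    return {v: c / 2 for v, c in score.items()}  # each undirected path was counted twice

def top_hyperedges(hyperedges, k=50, s=1):
    """Identifiers of the k hyperedges with the highest betweenness."""
    centrality = betweenness(line_graph(hyperedges, s))
    return sorted(centrality, key=centrality.get, reverse=True)[:k]

# Toy chain of discussions: d2 bridges d1 and d3 through users "c" and "e".
discussions = {"d1": {"a", "b", "c"}, "d2": {"c", "d", "e"}, "d3": {"e", "f", "g"}}
print(top_hyperedges(discussions, k=1))
```

The pairwise construction of the line graph is quadratic in the number of hyperedges, which is acceptable for a sketch but would call for an inverted index from users to discussions on monthly snapshots of realistic size.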

In Figure 6, we report the temporal trends in the average values of average word count, average unique word count, and average purity for the top 50 most prominent discussions. Note how discussion here is another term for hyperedge: in fact, in our setting, a hyperedge represents a discussion between users in the social hypernetwork. From the analysis of this figure, we can derive different insights. First off, the average word count graph, in the left part of the figure, shows a relatively stable trend, with slight fluctuations, indicating a consistent level of engagement in discussions over the year. There are some notable peaks, such as a slight increase around mid-year, followed by a step-down and a subsequent rise towards the end of the year. Similarly, the average vocabulary size, which indicates the average unique word count, mirrors the previous measure but remains consistently lower, which is somewhat expected because of non-content words (e.g., articles, pronouns, etc.) appearing more frequently in texts. A slight increase in the middle of the year is also depicted here, suggesting periods of higher and lower lexical diversity in user interactions.

Instead, the middle part of Figure 6 reports the trend for average subjectivity. We recall that this metric evaluates how subjective (as opposed to objective) the discussions are. It remains relatively stable throughout the year, which indicates that discussions typically contain a balanced mix of personal opinions and factual statements. Indeed, this calls for a more qualitative-oriented analysis of these discussions, an approach often used in the content-based investigation of social platforms [17]. Finally, the right part of the image depicts the graph of the average purity with regards to the user archetypes. This refers to the consistency of user behavior and content relative to the defined archetypes. We can see how this trend shows more variability compared to the previous features. There are sharp declines in some months, suggesting periods of more inconsistent behavior among users, while the different peaks could indicate times when user behavior is more aligned with archetypes.

It is worth noting that, while all these insights are interesting and show the applicability of our proposed approach, more in-depth analyses could be performed, as also noted above. Nevertheless, the various specializations of

ω

enable a series of different lenses through which social interactions can be studied, especially in understudied platforms.

5. Discussion and Implications

In this study, we have proposed an approach for defining higher-order roles in a social hypernetwork and characterizing higher-order interactions. The approach culminated in a general framework allowing the definition of archetypes, i.e., high-level descriptions of user roles, and the characterization of higher-order entities by taking into account their exhibited features and higher-order dynamics. Our experimental evaluation, conducted on the Scored.co platform, showed the advantages of the framework in enhancing user behavior modeling, revealing mechanisms of role evolution and sentiment observability in a social platform characterized by higher-order interactions. We believe it is worth discussing the obtained results, together with the framework’s implications. To this end, we start by considering how our framework advances the task of modeling user behavior and temporal dynamics.

We observed how the archetypes derived from our analysis capture typical behavior patterns among users, revealing distinct roles and interactions within hyperedges. For instance, supportive archetypes such as HHL (high score, high sentiment, low toxicity) consistently align with positive and constructive participation, while archetypes like LLH (low score, low sentiment, high toxicity) are often associated with disruptive behaviors. These role-specific profiles provide interpretable models of user behavior within different contexts, even though our analysis focused on only three features (score, sentiment, toxicity). In principle, additional dimensions—such as topical interests, engagement persistence, or structural centrality—could be incorporated to achieve a more comprehensive, multi-dimensional view of user roles. Beyond these static characterizations, our analysis also highlights the temporal dynamics of archetypes, uncovering plausible mechanisms of role evolution. Notably, we found a significant 39% transition from LHL (low score, high sentiment, low toxicity) to HLL (high score, low sentiment, low toxicity), suggesting that users with initially low visibility but consistently positive contributions can gradually build credibility and influence within the community. Similarly, the 33% transition from LLH to HLL indicates that critical or dissatisfied users may earn legitimacy when their engagement becomes less toxic. Conversely, transitions such as the 11% shift from HLH (high score, low sentiment, high toxicity) to LLH reveal how persistent negativity can diminish visibility and reduce influence over time. These findings emphasize that archetypes should not be seen as fixed categories, but rather as fluid templates, sensitive to contextual events, shifting community norms, and feedback mechanisms. 
From a practical perspective, understanding such role progressions can inform community management, by helping platforms identify users who are likely to evolve into constructive contributors, as well as those whose negative trajectories may warrant closer attention.
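To make the notion of archetype transitions concrete, the following sketch estimates row-normalized transition probabilities from per-user sequences of archetype labels, one label per consecutive snapshot. The function name and the toy sequences are hypothetical, and the sketch does not reproduce the statistical significance filtering (p < 0.01) applied in Figure 4.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate P(next archetype | current archetype).

    sequences: dict mapping user -> list of archetype labels,
               one per consecutive snapshot (assumed input format).
    """
    counts = defaultdict(Counter)
    for labels in sequences.values():
        # Count each pair of labels observed in consecutive snapshots.
        for src, dst in zip(labels, labels[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs

# Hypothetical toy data: three users observed over three snapshots.
sequences = {
    "u1": ["LHL", "HLL", "HLL"],
    "u2": ["LHL", "LHL", "HLL"],
    "u3": ["LLH", "HLL", "HLL"],
}
probs = transition_probabilities(sequences)
```

Each row of the result sums to one, so an entry such as `probs["LHL"]["HLL"]` reads as the fraction of observed LHL snapshots that were followed by HLL.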

Moreover, an important implication is the observation of sentiment and emotional dynamics. Our framework integrates psychological and emotional dimensions, which in turn allow a heterogeneous archetype characterization, enhancing the analysis of sentiment trends within specific agglomerates of users. We revealed different insights that can be valuable in different tasks, such as designing community guidelines and identifying disruptive sentiment dynamics. For instance, we noted that LLL (low score, low sentiment, low toxicity) archetype users frequently displayed high levels of sadness, fear, and anger, with low toxicity scores, indicating that these users express negative emotions without resorting to harmful language that could disrupt the experience of other users. Such insights could also be of great use in contexts such as analyzing health discourses and information [5,49].

Finally, we believe it is worth discussing the differences and advantages of the current method compared to existing techniques. As we reported in Section 2, a key advantage of our framework lies in its explicit treatment of higher-order interactions. Most existing approaches to role identification rely on pairwise graphs, where relationships are reduced to dyads, and thus cannot capture the collective dynamics that arise when multiple users interact simultaneously. This dyadic simplification risks obscuring important group-level behaviors, such as coalition-building, echo-chamber effects, or the emergence of influential subgroups. In contrast, by representing discussions as hyperedges, our framework preserves the natural multi-user structure of online conversations. This allows us to characterize roles and archetypes, not only through individual features, but also through their positions and behaviors within group interactions. Furthermore, our hyperedge characterization function extends the analysis beyond structure to incorporate semantic dimensions such as sentiment and toxicity, providing a richer picture of how higher-order contexts shape user behavior. In this way, the proposed method goes beyond traditional role identification techniques and offers a novel contribution: it systematically integrates structural and content-based perspectives to reveal dynamics that are only visible when higher-order interactions are taken into account.
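The difference between a hypergraph and its dyadic reduction can be seen on the toy hypergraph of Figure 1. The sketch below is illustrative only: the helper names are hypothetical, and the pairwise projection shown (every hyperedge replaced by all pairs of its members) is one common way to reduce a hypergraph to a graph, not necessarily the one used by the approaches discussed above.

```python
from itertools import combinations

# Toy hypergraph of Figure 1: one hyperedge of size 4 plus four pairs.
hyperedges = [
    {"A", "B", "C", "D"},
    {"C", "E"},
    {"D", "E"},
    {"D", "F"},
    {"E", "F"},
]

def hyperdegree(node):
    """Number of hyperedges (discussions) the node takes part in."""
    return sum(1 for e in hyperedges if node in e)

def neighbors(node):
    """Distinct peers sharing at least one hyperedge with the node."""
    return set().union(*(e for e in hyperedges if node in e)) - {node}

def pairwise_projection(edges):
    """Replace each hyperedge with all pairs of its members."""
    return {frozenset(p) for e in edges for p in combinations(sorted(e), 2)}

dyads = pairwise_projection(hyperedges)
```

After projection, the four-user discussion among A, B, C, and D becomes six separate dyads that cannot be told apart from six independent pairwise exchanges, which is the loss of group-level information that the hyperedge representation avoids.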

6. Conclusions

The identification of node roles within complex networks is significant when analyzing their dynamics and function. To advance in this context, in this paper we proposed a multi-dimensional, general framework to characterize nodes and hyperedges in a social hypernetwork. The aim of our framework is two-fold: (i) to characterize nodes and hyperedges, taking into account their exhibited features and higher-order dynamics, and (ii) to define “archetypes”, serving as a template to represent the higher-order roles of nodes. Our framework consists of different components, namely, hyperedge ensembles, and hyperedge- and node-based characterization functions. The combination of such components allows one to analyze a social hypernetwork from a more general point of view; each component can also be exploited in a standalone fashion, thus enabling a detailed analysis of the single aspect it captures. To assess the effectiveness of the framework, we carried out an exhaustive experimental campaign on Scored.co, an understudied social platform, focusing on different aspects such as the characterization of nodes, as well as their behaviors and surroundings. Indeed, our research could be relevant to multiple audiences. For scholars in computational social science, it provides a novel framework to study roles and dynamics in higher-order social interactions. For platform designers and moderators, the identification of archetypes and their transitions may inform strategies for community management and moderation, particularly in contentious or fringe environments. Finally, for policymakers and practitioners studying online extremism and digital sociology, the findings offer insights into the mechanisms of engagement, role evolution, and influence within politically charged communities.

6.1. Limitations and Applications

While we believe our approach presents a solid contribution, it is useful to discuss some of its limitations. First, the analysis was restricted to a single understudied platform (Scored.co), which may limit the generalizability of our findings to other social environments with different cultural, structural, or moderation characteristics. Second, our sentiment, toxicity, and moral profiling relied on lexicon-based approaches and pretrained models, which, although well-established, may not fully capture the nuances of user expression or evolving linguistic trends. Third, the archetypes we defined are shaped by a particular set of features (score, sentiment, toxicity), and additional dimensions (e.g., topical interests, network centrality, or temporal engagement) could further enrich the analysis. Further considerations concern potential biases in the data, which we collected ourselves from Scored.co. First, Scored.co communities tend to overrepresent alt-right and fringe political discourse, limiting generalizability to broader social platforms. Second, the breadth-first expansion strategy, while effective for maximizing coverage, may have biased the sample towards highly connected users and active communities, thereby underrepresenting isolated or inactive accounts.

As far as applications are concerned, we believe our framework could be of use in different contexts. First, it could be applied to study role dynamics in social hypernetworks during crisis situations, e.g., natural disasters and pandemics, to understand how roles and behaviors shift under stress and how information and support are mobilized, as well as to identify roles prone to misbehavior, e.g., diffusing fake news [50,51]. More generally, it can support platform designers and moderators in identifying archetypes associated with toxic or disruptive behaviors, thereby informing targeted interventions and moderation strategies. Second, marketers and communication strategists can leverage archetype dynamics to tailor message dissemination, exploiting insights into how different roles contribute to the spread of content within communities. Finally, public health and civic organizations may apply the framework to design more effective campaigns, by recognizing archetypes most likely to amplify constructive or prosocial narratives, thus complementing the first application context. In these ways, the approach extends beyond theoretical contributions and provides concrete utility for both academic and applied contexts.

6.2. Future Work

In our opinion, this paper does not represent an end point, but rather a starting point for future research. A first avenue concerns its application to other social platforms with different structural and cultural characteristics. Applying the framework to mainstream environments such as Reddit or X would enable the exploration of whether the archetypes and higher-order interaction patterns identified on Scored.co also emerge in large-scale, highly moderated ecosystems. Conversely, examining alternative or understudied platforms such as Bluesky [33] or region-specific communities could reveal unique user behaviors shaped by different governance models, linguistic practices, and cultural norms. Obviously, the framework could be applied as it is to all these platforms, and different extensions could be integrated in case of special aspects. For instance, when studying topical communities, the identification and analysis of archetypes could play a key role in understanding how users in communities shift and radicalize around a certain topic [17]. A second avenue involves cross-platform comparisons: by jointly analyzing datasets from multiple platforms, one could investigate how platform affordances—e.g., anonymity, moderation policies, and algorithmic recommendation systems—influence the prevalence, transitions, and interactions of archetypes. A third avenue concerns the moral characterization of user behaviors. While our study relied on the Moral Foundations Theory, future work could explore adaptations or extensions of this framework to capture value systems in diverse cultural contexts. For example, developing lexicons tailored to non-Western societies or integrating complementary moral theories could uncover cultural nuances in online interactions that are not visible through existing models. 
These directions would broaden the applicability of our framework and contribute to a more comprehensive understanding of user roles and dynamics in heterogeneous online environments. Finally, we note that by exploiting the temporal component, our framework could be employed to analyze higher-order roles in a multi-dimensional analysis of group evolution in temporal data [4].

Author Contributions

A.F., S.C., G.R., and F.C. designed and performed the research, and wrote the paper. A.F. collected the data. A.F., S.C., and F.C. contributed to data analysis and interpretation of the experiments. A.F. and S.C. contributed to the software used to perform the experiments. G.R. and F.C. coordinated and supervised all of the research. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data collected and analyzed during the current study are available in a dedicated Zenodo repository [13].

Acknowledgments

This work was supported by (i) the European Union – Horizon 2020 Program under the scheme “INFRAIA-01-2018-2019—Integrating Activities for Advanced Communities”, Grant Agreement n.871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (http://www.sobigdata.eu); (ii) SoBigData.it which receives funding from the European Union—NextGenerationEU—National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR)—Project: “SoBigData.it—Strengthening the Italian RI for Social Mining and Big Data Analytics”—Prot. IR0000013—Avviso n. 3264 del 28/12/2021; (iii) EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research).

Conflicts of Interest

The authors declare that they have no competing interests.

Abbreviations

The following abbreviations are used in this manuscript:

VADER	Valence Aware Dictionary for sEntiment Reasoning
NRCLexicon	National Research Council Lexicon
PAD	Pleasure, Arousal, Dominance
VAD	Valence, Arousal, Dominance
eMFD	Extended Moral Foundations Dictionary
MAD	Mean Absolute Deviation

References

1. Mekacher, A.; Papasavva, A. “I Can’t Keep It Up.” A Dataset from the Defunct Voat.co News Aggregator. In Proceedings of the International AAAI Conference on Web and Social Media, Copenhagen, Denmark, 23–26 June 2022; Volume 16, pp. 1302–1311.
2. Lazer, D.; Pentland, A.; Adamic, L.; Aral, S.; Barabási, A.L.; Brewer, D.; Christakis, N.; Contractor, N.; Fowler, J.; Gutmann, M.; et al. Computational social science. Science 2009, 323, 721–723.
3. Cauteruccio, F.; Corradini, E.; Terracina, G.; Ursino, D.; Virgili, L. Investigating Reddit to detect subreddit and author stereotypes and to evaluate author assortativity. J. Inf. Sci. 2022, 48, 783–810.
4. Failla, A.; Cazabet, R.; Rossetti, G.; Citraro, S. Describing group evolution in temporal data using multi-faceted events. Mach. Learn. 2024, 113, 7591–7615.
5. Record, R.A.; Silberman, W.R.; Santiago, J.E.; Ham, T. I sought it, I Reddit: Examining health information engagement behaviors among Reddit users. J. Health Commun. 2018, 23, 470–476.
6. Kou, Y.; Gray, C.M.; Toombs, A.L.; Adams, R.S. Understanding social roles in an online community of volatile practice: A study of user experience practitioners on reddit. ACM Trans. Soc. Comput. 2018, 1, 1–22.
7. Patel, J.; Paudel, P.; De Cristofaro, E.; Stringhini, G.; Blackburn, J. iDRAMA-Scored-2024: A Dataset of the Scored Social Media Platform from 2020 to 2023. In Proceedings of the International AAAI Conference on Web and Social Media, Buffalo, New York, USA, 3–6 June 2024; Volume 18, pp. 2014–2024.
8. Scored.co. Available online: https://scored.co (accessed on 27 August 2025).
9. Zhou, X.; Wu, B.; Jin, Q. User role identification based on social behavior and networking analysis for information dissemination. Future Gener. Comput. Syst. 2019, 96, 639–648.
10. Battiston, F.; Cencetti, G.; Iacopini, I.; Latora, V.; Lucas, M.; Patania, A.; Young, J.G.; Petri, G. Networks beyond pairwise interactions: Structure and dynamics. Phys. Rep. 2020, 874, 1–92.
11. Aksoy, S.G.; Joslyn, C.; Marrero, C.O.; Praggastis, B.; Purvine, E. Hypernetwork science via high-order hypergraph walks. EPJ Data Sci. 2020, 9, 16.
12. Failla, A.; Citraro, S.; Rossetti, G. Attributed Stream Hypergraphs: Temporal modeling of node-attributed high-order interactions. Appl. Netw. Sci. 2023, 8, 31.
13. Failla, A.; Citraro, S.; Rossetti, G.; Cauteruccio, F. Scored.co Hypernetwork Dataset; 2024. Available online: https://zenodo.org/doi/10.5281/zenodo.13142207 (accessed on 4 September 2025).
14. Sekara, V.; Stopczynski, A.; Lehmann, S. Fundamental structures of dynamic social networks. Proc. Natl. Acad. Sci. USA 2016, 113, 9977–9982.
15. Al-Garadi, M.A.; Varathan, K.D.; Ravana, S.D.; Ahmed, E.; Mujtaba, G.; Khan, M.U.S.; Khan, S.U. Analysis of online social network connections for identification of influential users: Survey and open research issues. ACM Comput. Surv. (CSUR) 2018, 51, 1–37.
16. Recuero, R.; Zago, G.; Soares, F. Using social network analysis and social capital to identify user roles on polarized political conversations on Twitter. Soc. Media+ Soc. 2019, 5, 2056305119848745.
17. Cauteruccio, F.; Kou, Y. Investigating the emotional experiences in eSports spectatorship: The case of League of Legends. Inf. Process. Manag. 2023, 60, 103516.
18. Wang, P.; Lü, J.; Yu, X. Identification of important nodes in directed biological networks: A network motif approach. PLoS ONE 2014, 9, e106132.
19. Bajardi, P.; Barrat, A.; Natale, F.; Savini, L.; Colizza, V. Dynamical patterns of cattle trade movements. PLoS ONE 2011, 6, e19869.
20. Torres, L.; Blevins, A.S.; Bassett, D.; Eliassi-Rad, T. The why, how, and when of representations for complex systems. SIAM Rev. 2021, 63, 435–485.
21. Patania, A.; Petri, G.; Vaccarino, F. The shape of collaborations. EPJ Data Sci. 2017, 6, 1–16.
22. Bonacich, P.; Holdren, A.C.; Johnston, M. Hyper-edges and multidimensional centrality. Soc. Netw. 2004, 26, 189–203.
23. Huang, S.; Lv, T.; Zhang, X.; Yang, Y.; Zheng, W.; Wen, C. Identifying node role in social network based on multiple indicators. PLoS ONE 2014, 9, e103733.
24. Bhagat, S.; Cormode, G.; Muthukrishnan, S. Node classification in social networks. In Social Network Data Analytics; Springer: Boston, MA, USA, 2011; pp. 115–148.
25. Buntain, C.; Golbeck, J. Identifying social roles in reddit using network structure. In Proceedings of the 23rd International Conference on World Wide Web (WWW 2014), Seoul, Republic of Korea, 7–11 April 2014; pp. 615–620.
26. Brendel, R.; Krawczyk, H. Primary role identification in dynamic social networks. In Proceedings of the 2011 International Conference on Computational Aspects of Social Networks (CASoN 2011), Salamanca, Spain, 19–21 October 2011; pp. 54–59.
27. Hacker, J.; Riemer, K. Identification of user roles in enterprise social networks: Method development and application. Bus. Inf. Syst. Eng. 2021, 63, 367–387.
28. Temdee, P.; Thipakorn, B.; Sirinaovakul, B.; Schelhowe, H. Of collaborative learning team: An approach for emergent leadership roles identification by using social network analysis. In Proceedings of the Technologies for E-Learning and Digital Entertainment: First International Conference, Hangzhou, China, 16–19 April 2006; pp. 745–754.
29. Zhou, J.; Liu, L.; Wei, W.; Fan, J. Network representation learning: From preprocessing, feature extraction to node embedding. ACM Comput. Surv. (CSUR) 2022, 55, 1–35.
30. Rozemberczki, B.; Allen, C.; Sarkar, R. Multi-scale attributed node embedding. J. Complex Netw. 2021, 9, cnab014.
31. Dehghan, A.; Siuta, K.; Skorupka, A.; Dubey, A.; Betlen, A.; Miller, D.; Xu, W.; Kamiński, B.; Prałat, P. Detecting bots in social-networks using node and structural embeddings. J. Big Data 2023, 10, 119.
32. Jin, J.; Heimann, M.; Jin, D.; Koutra, D. Toward understanding and evaluating structural node embeddings. ACM Trans. Knowl. Discov. Data (TKDD) 2021, 16, 1–32.
33. Failla, A.; Rossetti, G. “I’m in the Bluesky Tonight”: Insights from a year worth of social data. PLoS ONE 2024, 19, e0310330.
34. Quelle, D.; Bovet, A. Bluesky: Network Topology, Polarisation, and Algorithmic Curation. arXiv 2024, arXiv:2405.17571.
35. Mekacher, A.; Falkenberg, M.; Baronchelli, A. The Koo Dataset: An Indian Microblogging Platform With Global Ambitions. In Proceedings of the International AAAI Conference on Web and Social Media, Buffalo, New York, USA, 3–6 June 2024; Volume 18, pp. 1991–2002.
36. Hutto, C.; Gilbert, E. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM 2014), Ann Arbor, MI, USA, 1–4 June 2014; Volume 8, pp. 216–225.
37. Hanu, L.; Unitary Team. Detoxify; 2020. Github. Available online: https://github.com/unitaryai/detoxify (accessed on 27 August 2025).
38. Singhal, A. Modern information retrieval: A brief overview. IEEE Data Eng. Bulletin 2001, 24, 35–43.
39. Plutchik, R. A general psychoevolutionary theory of emotion. In Theories of Emotion; Elsevier: Amsterdam, The Netherlands, 1980; pp. 3–33.
40. Mohammad, S.M.; Turney, P.D. Crowdsourcing a word–emotion association lexicon. Comput. Intell. 2013, 29, 436–465.
41. NRCLex. Available online: https://github.com/metalcorebear/NRCLex (accessed on 27 August 2025).
42. Mehrabian, A.; Russell, J.A. An Approach to Environmental Psychology; The MIT Press: Cambridge, MA, USA, 1974.
43. Mohammad, S. Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 174–184.
44. Haidt, J.; Joseph, C. Intuitive ethics: How innately prepared intuitions generate culturally variable virtues. Daedalus 2004, 133, 55–66.
45. Hopp, F.R.; Fisher, J.T.; Cornell, D.; Huskey, R.; Weber, R. The extended Moral Foundations Dictionary (eMFD): Development and applications of a crowd-sourced approach to extracting moral intuitions from text. Behav. Res. Methods 2021, 53, 232–246.
46. Reddit. Reddit—The Heart of the Internet. Available online: https://reddit.com (accessed on 27 August 2025).
47. Cima, L.; Trujillo, A.; Avvenuti, M.; Cresci, S. The Great Ban: Efficacy and Unintended Consequences of a Massive Deplatforming Operation on Reddit. In Proceedings of the Companion Publication of the 16th ACM Web Science Conference, Stuttgart, Germany, 21–24 May 2024; pp. 85–93.
48. Scored.co. Getting Started with the API. Available online: https://help.scored.co/knowledge-base/getting-started-with-the-api/ (accessed on 27 August 2025).
49. De Choudhury, M.; De, S. Mental health discourse on reddit: Self-disclosure, social support, and anonymity. In Proceedings of the International AAAI Conference on Web and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; Volume 8, pp. 71–80.
50. Basile, V.; Cauteruccio, F.; Terracina, G. How dramatic events can affect emotionality in social posting: The impact of covid-19 on reddit. Future Internet 2021, 13, 29.
51. Zhang, C.; Fan, C.; Yao, W.; Hu, X.; Mostafavi, A. Social media for intelligent public information and warning in disasters: An interdisciplinary review. Int. J. Inf. Manag. 2019, 49, 190–207.

Figure 1. A toy hypergraph. Nodes are labeled with capital letters. Nodes A, B, C, and D are connected by a hyperedge of size 4. Nodes C and E, D and E, D and F, and E and F are pairwise connected.

Figure 2. Hyperdegree (a) and hyperedge size (b) distributions for the aggregated hypergraph.

Figure 3. Archetype profiles according to (a) Plutchik’s wheel of emotions, (b) the Pleasure/Arousal/Dominance model, and (c) Moral Foundations Theory.

Figure 4. Transition probabilities across archetypes (expressed in percentages). Only statistically significant probabilities are shown ($p < 0.01$).

Figure 5. Monthly average hyperdegree (x-axis) and number of neighbors (y-axis) for each archetype.

Figure 6. Average word count and average unique word count (a), average subjectivity (b), and average purity with regards to user archetypes (c) for the top 50 most central discussions.

Table 1. Statistics of the constructed hypergraphs.

t	$|V|$	$|E|$	$\max |e|$	$\overline{hdeg}$	$\overline{deg}$	$\mathrm{Jaccard}_{t,t+1}$
1	10,889	28,280	293	19.59	394.17	0.63
2	10,018	24,234	230	18.24	348.39	0.63
3	10,245	26,393	411	19.18	382.09	0.63
4	10,203	25,406	428	18.72	375.51	0.64
5	9922	25,681	368	19.43	391.11	0.64
6	9594	24,558	357	19.13	359.61	0.63
7	9400	23,869	225	19.00	335.22	0.62
8	9449	24,542	380	19.30	358.27	0.63
9	9022	24,132	454	19.90	364.10	0.63
10	9179	29,632	241	23.39	376.16	0.64
11	8865	24,962	258	20.77	351.09	0.65
12	8402	22,711	216	20.34	333.75	-
all	20,937	321,860	454	112.8	1090.89	-

Table 2. User archetypes based on score, sentiment, and toxicity.

Score	Sentiment	Toxicity	# Users
H	H	L	419
H	H	H	21
H	L	L	15,286
H	L	H	807
L	H	L	267
L	H	H	25
L	L	L	3643
L	L	H	469

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
