1. Introduction
In recent decades, the proliferation of social platforms and the opportunities tied to each of them have drastically transformed the way individuals interact, share information, and form communities. This unprecedented growth in the scale and complexity of online interactions is largely due to social networks like X (formerly known as Twitter), discussion fora like Reddit, and even more niche platforms, such as the now-defunct Voat, known for their explorations of censorship-free models from the point of view of specific political groups [
1]. Hence, understanding user behavior in general, and especially within these platforms, has become a crucial area of research in computational social science [
2,
3]. By studying how users interact with each other and with others’ content, researchers can uncover patterns of influence spread, opinion formation, and the creation and dissolution of communities [
4]. These insights are not only academically interesting, but also have practical implications for designing better social platforms, enhancing user experience, and mitigating issues such as misinformation and online harassment [
5,
6]. Moreover, understanding these behaviors can aid in developing strategies for marketing, public health campaigns, and even political mobilization, making this research area highly relevant in multiple domains. Furthermore, all of these aspects are even more critical in the analysis of understudied platforms such as Scored.co [
7,
8], which have not been extensively explored and may thus offer unique social dynamics that differ significantly from those of mainstream platforms. Investigating these less-explored environments can uncover novel patterns of user behavior and mechanisms of information dissemination. In addition, such platforms often cater to niche communities or specialized interests, thus providing insights into subcultures and micro-level social dynamics that might not be readily visible on larger social platforms.
Understanding these dynamics becomes particularly important when addressing the general challenges inherent in studying complex social platforms. Among these, a significant challenge is the identification of the roles that users play within complex social structures. Indeed, role identification is crucial for various applications, including targeted information dissemination [
9] and community detection [
3]. In traditional social network analysis, roles are often determined based on patterns of dyadic connectivity and interactions. Nevertheless, in a discussion on a social platform, two users can interact directly, e.g., by sharing comments, but also indirectly, i.e., through other users. Hence, it is reasonable to assume that taking into account group dynamics and higher-order interactions could allow a more reliable identification of the roles played by users. A promising approach to encapsulating group interactions is defined by the concept of a social hypernetwork [
10,
11,
12], which allows for the modeling of relationships among groups of nodes, generally called hyperedges. Such a representation is particularly useful for studying social platforms from a higher-order point of view. Despite its importance and the existing methodologies designed for pairwise graphs, current literature about the problem of role identification lacks comprehensive approaches for social hypernetworks.
To address this gap, in this work, we formally introduce an approach for defining higher-order roles in a social hypernetwork and characterize higher-order interactions. We define the concept of an archetype, a general characterization of higher-order roles that serves as a “template” for identifying nodes representative of specific roles within the context of hyperedges. These archetypes provide a structured way to categorize and understand the diverse roles users play in group interactions. Then, we propose a general hyperedge characterization function, which allows one to characterize a hyperedge with regard to the features of the nodes it contains. To validate the effectiveness of our proposed framework, we conducted an extensive experimental campaign on the understudied social platform Scored.co, which offers a rich, novel dataset for analyzing user interactions in a hypernetwork context, making it an ideal testbed for our framework. Our experiments demonstrated the utility of our approach in characterizing user roles within social hypernetworks, and allowed us to derive several insights and implications enabled by our framework.
Summarizing, the main contributions of this work are as follows:
We define the concept of archetype, as a high-level description of user roles acting as a “template” characterizing node representatives. We also propose a way of quantifying how “distant” a node is from its archetypal representation;
We introduce an approach to characterize higher-order entities, such as hyperedges, taking into account exhibited features and their higher-order dynamics;
We apply our framework to a newly collected dataset of content and interactions from Scored.co, which we also release (anonymized) [
13]. Notably, this is also the first work to study the structural properties of Scored.co.
We believe that our contribution could be pivotal in several respects, as we will illustrate in the rest of the paper. Indeed, understanding user roles within complex social structures is a key task in computational social science, especially on platforms where interactions extend beyond dyadic relationships. By defining archetypes as role-based templates within a hypernetwork, our framework provides a better understanding of user behaviors in settings where interactions involve multiple participants simultaneously. Closely related is the analysis of group-based dynamics that are not visible through dyadic connections alone; this analysis is made possible by our proposed approach for higher-order entity characterization, which has practical utility in understanding complex interactions within both social and heterogeneous systems.
The outline of this paper is as follows: in
Section 2, we provide a detailed overview of related literature. In
Section 3, we provide the formalization of our proposed approach by defining the concept of archetype first, and the hyperedge characterization function after. In
Section 4, we illustrate the experimental campaign we carried out to evaluate the effectiveness of our framework. Afterward, in
Section 5, we discuss our approach and highlight some important implications. Finally, in
Section 6, we draw our conclusions and outline directions for future work.
2. Related Literature
The identification of roles within social networks stands as a fundamental aspect for understanding the dynamics and function of these complex systems [
14]. In traditional social network analysis, roles are often determined based on patterns of connectivity, which provide insights into the influence, responsibilities, and grouping of individuals within the network [
15]. This analysis is crucial for applications ranging from the characterization of polarization [
16] and user behaviors [
17] to the analysis of biological networks [
18] and animal populations [
19]. In different cases, role identification is also framed within the task of discovering influential users in a social network. In light of this, we refer the interested reader to the comprehensive survey proposed in [
15].
Nevertheless, extending role identification into hypernetwork scenarios poses many challenges. Unlike traditional networks, hypernetworks consist of interactions, represented by hyperedges, which can encompass multiple nodes simultaneously. A toy example of a hypernetwork, modeled as a hypergraph, is depicted in
Figure 1. Higher-order interaction models can better represent complex real-world systems [
10,
20]; however, the complexity of higher-order interactions can introduce difficulties in accurately defining and identifying roles, as traditional centrality metrics and methodologies (e.g., community detection algorithms) often fall short in capturing the multidimensional nature of hyperedges [
14,
21,
22]. Despite the relevance of such a task, to the best of our knowledge, there are no existing papers that explicitly discuss role identification within the context of hypernetworks. Given this gap in the literature, the rest of the section will concentrate on providing an overview of the methodologies and findings from conventional settings in dyadic interactions, which may serve as a baseline for future exploration into hypernetwork-specific role identification.
The general task of role identification consists in capturing the multifaceted characteristics of a node’s role in a pairwise network. Huang et al. [
23] identified roles in undirected, unweighted networks by evaluating node importance using correlated indicators like degree and centrality measures. A node’s role is then determined by comparing its indicator relationships to statistical correlations within the overall network. While their work shares the task of role identification with ours, the process does not involve group-based indicators leveraging the concept of “groups” such as communities or subgraphs. Bhagat et al. [
24], instead, framed the node role identification task as a node classification approach based on random walks. Similarly, framed within a classification task, Buntain and Golbeck [
25] focused on the so-called “answer-person” role on Reddit, characterized by users who predominantly respond to questions posed by others, with minimal engagement in broader discussions. The works from Bhagat et al. [
24] and Buntain and Golbeck [
25] share an aim similar to ours. Nevertheless, they relied on supervised approaches on pairwise graphs. In addition, their methods might not be straightforwardly generalized to higher-order interactions.
The approach proposed by Brendel and Krawczyk [
26] focuses on the identification of node roles in dynamic social networks as a sequence of different types of activities, by leveraging pattern subgraphs and sequence diagrams. In doing so, the authors were able to capture roles such as “gossipmongers”, i.e., users who replicate every received message at least three times. This approach can be considered as orthogonal to ours, as a hypergraph mining approach could also be applied in our context to enhance role identification. Hacker and Riemer [
27] outlined a process for identifying user roles in Enterprise Social Networks (ESNs), using a mix of design science research and data mining, involving data collection, preparation, and evaluation, with user roles identified through statistical analysis, including PCA and clustering. The work by Hacker and Riemer [
27] shares a similar focus to ours, although using a different methodology. The identification of emergent leadership roles was addressed by Temdee et al. [
28], where the authors examined collaboration patterns between teams and proposed the so-called “leadership index”, a combination of centrality measures including closeness and betweenness. Zhou et al. [
9] proposed a mixed-methods methodology for user role identification, focusing on dynamic user profiling. Several types of special users are defined and identified to support information dissemination. They proposed an approach based on computational methods and questionnaire-based evaluation to quantitatively describe user features. Such a work parallels ours in the sense that it also considers the network’s features. Cauteruccio et al. [
3] studied community and user stereotypes on Reddit. Here, with the term stereotype, the authors refer to the tendency to classify people into groups and to associate each group with a general idea or a label. The authors proposed a rule-based approach based on different quantitative views of the data, and defined author stereotypes on the basis of two orthogonal taxonomies, namely, the number of posts, and the number of comments by an author. While the work by Cauteruccio et al. and ours share some similarities, the employed methodologies are substantially different. The former defined stereotypes based on a single quantitative measure called score, which is inherent to the social platform itself. Instead, our approach proposes a general framework in which the characterization of users and the identification of their role are both defined by their feature-rich, higher-order surroundings. Moreover, Kou et al. [
6] identified five distinct social roles in a specific community on Reddit. Among them, we cite the “knowledge broker”, i.e., a member who introduces knowledge to the community by sharing links, and the “translator”, i.e., a member who contributes academic knowledge. While the contribution by Kou et al. is notable and similar to ours in considering various aspects of user characterization, these roles are specific and applicable only to the analyzed community.
Finally, it is worth mentioning embedding-based methodologies [
29], which could potentially be adapted to hypernetwork scenarios. For instance, Rozemberczki et al. [
30] introduced a technique that incorporates node attributes to generate embeddings that capture similarities based on neighborhood structures. Instead, Dehgan et al. [
31] leveraged both node and structural embeddings to detect nodes impersonating social bots. Indeed, node embeddings based on structural properties and exhibited attributes have attracted substantial interest in recent years [
32]. While these methods use embedding techniques that effectively summarize and exploit the structural information within traditional networks, their application to hypernetworks might not be straightforward. Our approach diverges from embedding-based methods in the sense that it is specifically tailored to the unique features of nodes within hypernetworks, such as the features they exhibit with regard to the hyperedges containing them. By focusing on this, without relying on an embedding, our methodology seeks to offer a more direct and fitting analysis of node roles.
We conclude this bird’s-eye view of the related literature by focusing on the correlated aspect of understudied platforms. Intuitively, the reasoning provided in the above discussion of related work also holds for these platforms. In fact, there has been a small number of works on understudied platforms, and generally, these were limited to presenting a dataset collection about the considered platform [
7,
33,
34,
35]. For instance, a large-scale dataset for the social platform Scored.co was presented by Patel et al. [
7]. In their work, they studied aspects such as posting activity and user characterization, as well as the phenomenon of user migration from other platforms. A recent social platform, called Bluesky, has also been targeted by different studies [
33,
34]. In the former work, the authors presented a comprehensive study of the social structure of the platform, as well as posting activity and content analysis. In addition, the complete post history of over 4M users was released. Similarly, the latter work studied users’ political leaning and ideological polarization on Bluesky, while also presenting a characterization of the network topology over time. Finally, Mekacher et al. [
35] released a dataset targeting the Indian microblogging platform Koo. The dataset consists of more than 72M posts and 75M comments, with related features such as shares and likes. Moreover, a thorough overview of the platform was presented, consisting of a discussion of the news ecosystem on the platform, hashtag usage, and user engagement.
3. Materials and Methods
In this section, we introduce our framework for characterizing nodes and hyperedges in a social hypernetwork, as well as the definition of roles, called archetypes, in such a network. We start by providing some background, which is useful to understand the frame of our context. Then, we introduce the definition of archetypes and our method to calculate them. Finally, we detail the proposed framework to characterize nodes and hyperedges.
3.1. Background
A hypergraph is a pair $H = (V, E)$, where $V = \{v_1, \dots, v_n\}$ is a set of nodes and $E = \{e_1, \dots, e_m\}$ is a set of hyperedges, where $e_j \subseteq V$, for $j = 1, \dots, m$. A visual representation of a hypergraph is depicted in Figure 1. The order of $H$ is $|V|$, while the size of $H$ is $|E|$. We denote with $E(v)$ the set of hyperedges containing $v \in V$, that is $E(v) = \{e \in E : v \in e\}$. Given a hyperedge $e \in E$, its size is the number of nodes belonging to it, that is $|e|$. The degree of a node $v$, denoted as $\deg(v)$, is the number of neighbors of $v$; a node $u$ is the neighbor of node $v$ if and only if there exists at least one hyperedge $e \in E$ which $u$ and $v$ both belong to. The hyperdegree of a node $v$, denoted as $\deg_h(v)$, is the number of hyperedges to which $v$ belongs.
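The quantities defined above can be illustrated with a minimal Python sketch. The toy hypergraph, the node and hyperedge names, and the function names are all hypothetical and chosen only for illustration; hyperedges are stored as plain sets of node identifiers.

```python
from itertools import chain

# Toy hypergraph (hypothetical data): each hyperedge is a set of node identifiers.
hyperedges = {
    "e1": {"a", "b", "c"},
    "e2": {"b", "d"},
    "e3": {"c", "d", "e"},
}
nodes = set(chain.from_iterable(hyperedges.values()))


def incident_hyperedges(v):
    """Names of the hyperedges that contain node v."""
    return {name for name, members in hyperedges.items() if v in members}


def hyperdegree(v):
    """Number of hyperedges to which v belongs."""
    return len(incident_hyperedges(v))


def degree(v):
    """Number of distinct neighbors of v, i.e., nodes sharing a hyperedge with v."""
    neighbors = set()
    for name in incident_hyperedges(v):
        neighbors |= hyperedges[name]
    neighbors.discard(v)  # a node is not its own neighbor
    return len(neighbors)


order = len(nodes)       # order of the hypergraph: number of nodes
size = len(hyperedges)   # size of the hypergraph: number of hyperedges
```

Note that the degree and the hyperdegree generally differ: here node `d` belongs to two hyperedges but has three neighbors.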
We use a hypergraph $H$ to model a social hypernetwork. Here, nodes are users of the social hypernetwork, while a hyperedge represents a discussion between users. In what follows, we will refer to a social hypernetwork simply as a hypernetwork.
Moreover, features can be associated with users, and these are generally based on the content the users interact with and the discussions they participate in. To formalize them, we employ a set $F = \{f_1, \dots, f_q\}$ of features characterizing the nodes in $H$. Given a node $v \in V$, $F(v)$ represents the values of each feature of $v$, that is $f_k(v)$ indicates the value of the $k$-th feature of $v$. We assume features can be either numerical or categorical and that numerical features are always normalized. In addition, with $H_F = (V, E, F)$, we indicate the hypergraph equipped with the set of node features $F$.
3.2. Characterizing Nodes via Archetypes
The first aim of our approach is the definition of archetypes. An archetype serves as a “template” to represent nodes characterized by a certain subset of features from the set
F. The concept of an archetype is pivotal in characterizing higher-order node roles based on subsets of hyperedges. Often, the study of a representation of nodes in a social network involves the definition of a taxonomy or a more thorough analysis of the behavior of such nodes [
3]. Instead, our definition of archetype enables a more general characterization of possible behavioral dynamics occurring among nodes, and does not rely on a single attribute.
Let $F' \subseteq F$ be a subset of features. Then, an archetype $A$ is defined as a tuple of values based on the features indicated by $F'$. Formally, let $A = (a_1, \dots, a_p)$, with $p = |F'|$, where $a_k$ is the value of the $k$-th feature from $F'$ for the archetype $A$. Then, each $a_k$, for $k = 1, \dots, p$, represents a particular value of the $k$-th feature from $F'$ that characterizes the nodes associated with this archetype. As an example, let $F' = \{f_s, f_t\}$, where $f_s$ (resp., $f_t$) is the value of a quantitative feature indicating the average sentiment value (resp., average toxicity value) expressed by a user. Such features are extensively used in various data-science-based studies and can be easily computed via classical sentiment analysis methods [36,37]. Suppose $f_s \in [-1, 1]$, where $-1$ (resp., 1) indicates a mostly negative (resp., mostly positive) average sentiment, and $f_t \in [0, 1]$, with higher values indicating a high degree of toxicity. Then, various archetypes could be defined based on these features. For instance, we could define the archetype $A_1 = (-1, 1)$, which describes a template for users that are extremely negative and toxic on average, and we could name this archetype Cynical Commenter; instead, the archetype $A_2 = (1, 0.5)$ would be a template for users that are extremely positive and exhibit moderate toxicity, and we could name this Overzealous User. In addition, it is noteworthy that, for the sake of presentation, here the definition of an archetype is only given with regard to nodes. Nevertheless, also in the case of features being available for hyperedges, the same definition applies, and archetypes for the latter can be defined.
Essentially, an archetype $A$ can be viewed as a prototypical example of nodes that share similar feature values. To simplify the analysis and categorization of archetypes, instead of taking into account the feature values directly, we can map them into categorical values, indicating the state of the feature for the particular archetype. This approach involves setting a series of thresholds $T = (t_1, \dots, t_p)$, where the value $t_k$ has the same domain as the feature $f_k$ in $F$, a set of labels $L$ and a labeling function $\lambda$. Then, given an archetype $A = (a_1, \dots, a_p)$, by applying to each feature value $a_k$ the corresponding threshold $t_k$ from $T$ via the labeling function $\lambda$, we derive the archetype $\tilde{A} = (l_1, \dots, l_p)$, where $l_k = \lambda(a_k, t_k)$ is a label in $L$. Practically speaking, $\tilde{A}$ represents the same archetype as $A$, but its representation is now built over a specific set of labels. Let us take the aforementioned archetype $A_2$, defined over the set of features $F' = \{f_s, f_t\}$, and representing Overzealous Users. Suppose we set $T = (t_s, t_t)$, and we set $L = \{\text{low}, \text{high}\}$. Moreover, suppose we define the labeling function $\lambda$ as in Equation (1):
$$\lambda(a_k, t_k) = \begin{cases} \text{high} & \text{if } a_k > t_k, \\ \text{low} & \text{otherwise.} \end{cases} \qquad (1)$$
Therefore, in this example, whenever the sentiment value of $A_2$ exceeds $t_s$ and its toxicity value does not exceed $t_t$, we derive $\tilde{A}_2 = (\text{high}, \text{low})$, which can be seamlessly interpreted as representing users exhibiting a high sentiment value and a low toxicity value. Note how the labeling depends on the thresholds $T$, thus allowing for flexibility that can accommodate different contexts of analysis.
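As a minimal sketch of the threshold-based labeling described above, the following Python function maps a numerical archetype to a labeled one. The archetype values, the thresholds, the label names, and the strict comparison are assumptions made for illustration only; they are not taken from the experiments.

```python
def label_archetype(archetype, thresholds, high="high", low="low"):
    """Map each numeric archetype value to a label by comparing it with its threshold."""
    if len(archetype) != len(thresholds):
        raise ValueError("one threshold per feature is required")
    # Assumption: a value strictly above its threshold is labeled as high.
    return tuple(high if a > t else low for a, t in zip(archetype, thresholds))


# Hypothetical values: (average sentiment in [-1, 1], average toxicity in [0, 1]).
overzealous_user = (1.0, 0.5)
thresholds = (0.0, 0.5)

labeled = label_archetype(overzealous_user, thresholds)
```

With these assumed thresholds, the archetype is labeled as high sentiment and low toxicity. Raising or lowering a threshold changes the label without touching the underlying feature values, which is the flexibility the thresholds are meant to provide.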
Finally, we are now able to state when a node is represented by a given archetype. Given an archetype $A = (a_1, \dots, a_p)$, based on a subset of features $F' \subseteq F$, we want to effectively understand what nodes are represented by it. Let $v \in V$ be a node, and let $F'(v)$ be the feature vector of node $v$ according to the features selected in $F'$. The node $v$ can be considered represented by the archetype $A$ if $F'(v)$ is sufficiently close to $A$, according to a predefined distance metric $d$. We express this as $d(F'(v), A) \leq \epsilon$, where $\epsilon$ is a small positive threshold value that determines the acceptable distance between the node’s feature vector and the archetype. Note that both $A$ and $F'(v)$ can be considered vectors of the same length $p$; thus, classical measures, such as the cosine distance derived from the cosine similarity [38], can be used. Furthermore, as we will see in the experiments, in the simplest case, a feature-wise comparison can also be used to assess when a node can be considered represented by a given archetype.
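The closeness test described above can be sketched as follows. This is an illustrative implementation under stated assumptions: the function names, the example vectors, and the tolerance are hypothetical, and the Euclidean distance is used as the default only for simplicity.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_distance(x, y):
    """One minus the cosine similarity; undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_x * norm_y)


def is_represented(node_features, archetype, epsilon, distance=euclidean):
    """True if the node's feature vector lies within epsilon of the archetype."""
    return distance(node_features, archetype) <= epsilon


# Hypothetical archetype and node vectors over (sentiment, toxicity).
archetype = (1.0, 0.5)
close_node = (0.9, 0.45)
far_node = (-0.8, 0.9)
```

One design consideration: the cosine distance ignores vector magnitude, so two users whose feature vectors point in the same direction are treated as identical even if their values differ in scale. Whether this is desirable depends on the features at hand.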
3.2.1. A Characterization of Archetypes
To comprehensively understand and categorize user archetypes, we decided to highlight them through three different expressions, namely, (i) emotional, (ii) psycho-emotional, and (iii) moral expressions. By leveraging well-established psychological theories and lexicons, we aimed to create detailed profiles that reflect the peculiar ways in which users interact and participate within the social platform. In what follows, we describe in detail these three expressions, which we subsequently used in our experiments.
Emotional Profiles
We aim to characterize archetypes based on the emotions they express. To do so, we refer to the psychological theory of emotions [
39] developed in 1980 by the American psychologist Robert Plutchik. This theory identifies eight basic emotions—joy, trust, fear, surprise, sadness, anticipation, anger, and disgust—and claims that all other emotions derive from a mixture of these primary ones. To quantify feelings expressed by Scored users, we leverage the NRCLexicon (National Research Council Lexicon) [
40], a resource containing over 14,000 English words and their associated emotional ratings according to Plutchik’s theory. This dictionary was further expanded by the National Research Council of Canada to include WordNet synonyms, reaching over 27,000 terms [
41]. For each of the users’ texts, we compute emotion scores, and normalize them in [0, 1]. In this context, 0 indicates texts that do not elicit any emotion, while 1 signifies texts that strongly evoke the specified emotion.
Psycho-Emotional Profiles
We also characterize how user archetypes relate to their surrounding social environments. To do so, we refer to the PAD model (Pleasure, Arousal, Dominance) introduced by Mehrabian and Russell in 1974 [
42]. According to the PAD model, three dimensions characterize the perception an individual has of the environment in which she finds herself. Pleasure (sometimes referred to as
valence) concerns whether an individual perceives the environment as enjoyable or not. Arousal measures how stimulating the environment is for the individual. Dominance indicates whether the individual feels in control of the environment. To operationalize these dimensions, we leverage the VAD Lexicon (Valence, Arousal, Dominance) [
43]. This resource contains over 20,000 English terms, along with their associated valence/pleasure, arousal, and dominance values. For each of the three dimensions, we associate each text with the total score of its words. Then, we normalize results to [0, 1], where 0 implies an absence of the corresponding dimension, and 1 implies the strong presence thereof.
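A rough sketch of this lexicon-based scoring is given below, under stated assumptions: the three-word lexicon is hypothetical (it is not the VAD Lexicon), tokenization is naive whitespace splitting, and min-max rescaling across texts is only one possible normalization, since the description above does not specify which one is used.

```python
# Hypothetical mini-lexicon: word -> (valence, arousal, dominance).
TOY_VAD = {
    "great": (0.9, 0.6, 0.7),
    "calm": (0.8, 0.1, 0.6),
    "awful": (0.1, 0.7, 0.3),
}


def text_totals(text, lexicon=TOY_VAD):
    """Sum the lexicon scores of the words in a text, one total per dimension."""
    totals = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        scores = lexicon.get(word)
        if scores is not None:  # words missing from the lexicon contribute nothing
            for i, s in enumerate(scores):
                totals[i] += s
    return tuple(totals)


def min_max_normalize(rows):
    """Rescale each dimension to [0, 1] across all texts (constant columns map to 0)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


texts = ["great great day", "awful weather", "nothing to report"]
totals = [text_totals(t) for t in texts]
normalized = min_max_normalize(totals)
```

Because totals are summed over words, longer texts tend to receive higher raw scores; the rescaling step makes the values comparable across texts but does not remove this length effect.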
Moral Profiles
We aim to characterize archetypes based on the moral dimensions that emerge from the content they produce. We rely on the Moral Foundations Theory, a psychological framing rooted in cultural anthropology that postulates the existence of five universal moral dimensions [
44]:
authority/subversion,
care/harm,
fairness/cheating,
loyalty/betrayal, and
sanctity/degradation. Each dimension is composed of a virtue (e.g., loyalty), and a corresponding vice (e.g., betrayal). Virtues can be understood as follows, while vices can be considered their opposite: The concept of authority can be defined in relation to specific traits, such as deference to higher authorities, in order to maintain group cohesion. Similarly, the concept of care can be understood in terms of nurturing and protection. Fairness can be conceptualized in terms of equal treatment and reward. Loyalty can be understood in relation to the prioritization of one’s group and alliances. Finally, sanctity can be defined in terms of the maintenance of the sacredness of the body and the avoidance of moral contamination. We operationalize this framework via the eMFD (Extended Moral Foundations Dictionary), a lexicon containing more than 3000 words [
45]. Each word has an associated score in [−1, 1] for each foundation, ranging from strong vice outage (−1) to strong virtue outage (1).
3.3. Analyzing Higher-Order Entities
While archetypes are a particular yet effective way to analyze entities within the social platform, they mainly focus on node features, whereas interactions are not taken into account. Hence, we define here a general characterization function to characterize nodes and hyperedges according to their exhibited features and higher-order dynamics. Without loss of generality, we propose the definition of such a function with regard to hyperedges. The same can also be applied to nodes.
Let $H_F = (V, E, F)$ be a hypergraph equipped with the node feature set $F$. We denote with $\phi$ a function which we call the hyperedge characterization function. The need for a function such as $\phi$ addresses the challenge of characterizing a hyperedge with regard to the nodes it contains. Let us recall that our approach deals with analyzing higher-order entities. To do so, we exploit the representation of node relationships via hyperedges. While this effectively captures the higher-order structural interactions between nodes, it might not be sufficient to gain insights into the semantics of such interactions. Therefore, we focus on the latter aspect through a characterization of hyperedges that is based not only on the contained nodes but also on their features. Formally, the domain of our hyperedge characterization function is $E$. The function $\phi$ takes as input a hyperedge $e \in E$ and returns a value $\phi(e)$, which we call its characteristic value. Such value depends on the actual implementation of $\phi$: in fact, $\phi$ is general, and different approaches can be exploited to accommodate the hyperedge characterization. Furthermore, to address the aforementioned challenge, there are cases in which $\phi$ should be defined to consider the values of each node’s features contained in the considered hyperedge. Given a feature of interest $f \in F$, we write $\phi_f$ to denote that the actual implementation of $\phi$ considers the feature $f$.
In the following, we propose various specializations of the hyperedge characterization function $\phi$ and a brief rationale for each of them. Some of these specializations are used in Section 4. We separate them into three families, namely, (i) numerical-only, (ii) categorical-only, and (iii) structural-based specializations. In describing each of these, we assume we are interested in characterizing a hyperedge $e \in E$. In addition, some specializations are intended to be exploited when the analysis we are carrying out is feature-oriented. Therefore, in these cases, we assume a feature $f \in F$ of interest.
3.3.1. Numerical-Only Specializations
The following specializations are intended to be used when numerical features are considered within the investigation. Therefore, here we assume the feature of interest is numerical. Some specializations are as follows:
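Statistical Descriptors: common statistical descriptors such as the mean, median, mode, variance, and standard deviation of the feature $f$ among the nodes in $e$ can be easily computed. For instance, writing $f(v)$ for the value of $f$ at node $v$, the mean would be simply defined and denoted as
$$\phi_f^{\mathrm{mean}}(e) = \frac{1}{|e|} \sum_{v \in e} f(v) \qquad (2)$$
MAD: calculates the Mean Absolute Deviation of the feature $f$ among the nodes in $e$, defined as
$$\phi_f^{\mathrm{MAD}}(e) = \frac{1}{|e|} \sum_{v \in e} \left| f(v) - \phi_f^{\mathrm{mean}}(e) \right| \qquad (3)$$
Gini Coefficient: employs the Gini Coefficient to compute the dispersion of the values of $f$ among the nodes in $e$, defined as
$$\phi_f^{\mathrm{Gini}}(e) = \frac{\sum_{u \in e} \sum_{v \in e} \left| f(u) - f(v) \right|}{2\,|e|^2\,\phi_f^{\mathrm{mean}}(e)} \qquad (4)$$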
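The numerical specializations listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the hyperedge, the feature values, and the function names are hypothetical, and the Gini coefficient is computed with the standard mean-absolute-difference formula, which presumes non-negative values.

```python
def feature_values(hyperedge, feature):
    """Values of a numerical feature for the nodes contained in a hyperedge."""
    return [feature[v] for v in hyperedge]


def mean(values):
    return sum(values) / len(values)


def mean_absolute_deviation(values):
    """Average absolute deviation of the values from their mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def gini_coefficient(values):
    """Gini coefficient via pairwise absolute differences (non-negative values assumed)."""
    n = len(values)
    m = mean(values)
    if m == 0:
        return 0.0  # all values are zero: no dispersion
    total = sum(abs(x - y) for x in values for y in values)
    return total / (2 * n * n * m)


# Hypothetical hyperedge and per-node toxicity values in [0, 1].
hyperedge = {"a", "b", "c"}
toxicity = {"a": 0.0, "b": 0.5, "c": 1.0}
values = feature_values(hyperedge, toxicity)
```

The pairwise formulation is quadratic in the hyperedge size, which is usually acceptable for discussion-sized hyperedges; a sorted-values formulation would be preferable for very large ones.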
3.3.2. Categorical-Only Definitions
Differently from the above, the following specializations are intended to be used when categorical features are considered. Therefore, here we assume the feature of interest is categorical. Some specializations are as follows:
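Entropy: measures the cohesion of the values of $f$ among the nodes in $e$ (the lower the entropy, the more cohesive the values). We define and denote this in Equation (5), where $p_c$ denotes the proportion of the value $c$ of feature $f$ over all nodes in $e$, and the sum ranges over the distinct values $c$ that $f$ takes in $e$.
$$\phi_f^{\mathrm{ent}}(e) = -\sum_{c} p_c \log p_c \qquad (5)$$
Gini Impurity: measures the probability of incorrectly classifying a randomly chosen element in the dataset if it were randomly labeled according to the distribution of all categories in the dataset. Here, we employ this on the feature $f$, thus we define and denote it in Equation (6), where $p_c$ denotes the proportion of the value $c$ of feature $f$ over all nodes in $e$.
$$\phi_f^{\mathrm{imp}}(e) = 1 - \sum_{c} p_c^2 \qquad (6)$$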
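A minimal sketch of the two categorical specializations follows. The label values and function names are hypothetical, and the base-2 logarithm is an assumption, since the base is not fixed by the description above.

```python
import math
from collections import Counter


def proportions(labels):
    """Proportion of each distinct categorical value among the given labels."""
    counts = Counter(labels)
    n = len(labels)
    return [c / n for c in counts.values()]


def entropy(labels, base=2):
    """Shannon entropy of the label distribution (0 when all labels coincide)."""
    return -sum(p * math.log(p, base) for p in proportions(labels))


def gini_impurity(labels):
    """Probability of mislabeling a random element drawn from the distribution."""
    return 1.0 - sum(p * p for p in proportions(labels))


# Hypothetical sentiment labels of the nodes in two hyperedges.
mixed = ["high", "high", "low", "low"]
uniform = ["high", "high", "high"]
```

Both measures are zero for a hyperedge whose nodes all share the same value and grow as the values become more evenly spread, so low values correspond to cohesive hyperedges.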
3.3.3. Structural-Based Specializations
The structural-based specializations focus on characterizing a hyperedge based on its exhibited structural properties rather than only focusing on the features of the contained nodes. Some proposed specializations include the following:
5. Discussion and Implications
In this study, we have proposed an approach for defining higher-order roles in a social hypernetwork and characterizing higher-order interactions. The approach culminated in a general framework allowing the definition of archetypes, i.e., high-level descriptions of user roles, and the characterization of higher-order entities by taking into account their exhibited features and higher-order dynamics. Our experimental evaluation, conducted on the Scored.co platform, showed the advantages of the framework in enhancing user behavior modeling, revealing mechanisms of role evolution and sentiment observability in a social platform characterized by higher-order interactions. We believe it is worth discussing the obtained results, together with the framework’s implications. To this end, we start by considering how our framework advances the task of modeling user behavior and temporal dynamics.
We observed how the archetypes derived from our analysis capture typical behavior patterns among users, revealing distinct roles and interactions within hyperedges. For instance, supportive archetypes such as HHL (high score, high sentiment, low toxicity) consistently align with positive and constructive participation, while archetypes like LLH (low score, low sentiment, high toxicity) are often associated with disruptive behaviors. These role-specific profiles provide interpretable models of user behavior within different contexts, even though our analysis focused on only three features (score, sentiment, toxicity). In principle, additional dimensions—such as topical interests, engagement persistence, or structural centrality—could be incorporated to achieve a more comprehensive, multi-dimensional view of user roles. Beyond these static characterizations, our analysis also highlights the temporal dynamics of archetypes, uncovering plausible mechanisms of role evolution. Notably, we found a significant 39% transition from LHL (low score, high sentiment, low toxicity) to HLL (high score, low sentiment, low toxicity), suggesting that users with initially low visibility but consistently positive contributions can gradually build credibility and influence within the community. Similarly, the 33% transition from LLH to HLL indicates that critical or dissatisfied users may earn legitimacy when their engagement becomes less toxic. Conversely, transitions such as the 11% shift from HLH (high score, low sentiment, high toxicity) to LLH reveal how persistent negativity can diminish visibility and reduce influence over time. These findings emphasize that archetypes should not be seen as fixed categories, but rather as fluid templates, sensitive to contextual events, shifting community norms, and feedback mechanisms. 
From a practical perspective, understanding such role progressions can inform community management, by helping platforms identify users who are likely to evolve into constructive contributors, as well as those whose negative trajectories may warrant closer attention.
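To make the archetype labels and the transition percentages discussed above more concrete, the following minimal sketch shows one way such quantities could be computed. It is an illustration under stated assumptions and not necessarily the procedure used in the experiments: the thresholding rule, the threshold values, the function names `archetype` and `transition_shares`, and the toy users are all hypothetical.

```python
from collections import Counter, defaultdict


def archetype(score, sentiment, toxicity, thresholds):
    """Return a three-letter label such as 'HHL'.

    Assumption: a feature is 'H' when it reaches its threshold, 'L' otherwise.
    The order of the letters is (score, sentiment, toxicity).
    """
    values = (score, sentiment, toxicity)
    return "".join("H" if v >= t else "L" for v, t in zip(values, thresholds))


def transition_shares(before, after):
    """Row-normalized shares of users moving between archetypes.

    `before` and `after` map a user id to an archetype label in two
    consecutive time windows. Users missing from `after` are skipped.
    """
    counts = defaultdict(Counter)
    for user, source in before.items():
        if user in after:
            counts[source][after[user]] += 1
    return {
        source: {target: n / sum(row.values()) for target, n in row.items()}
        for source, row in counts.items()
    }


# Hypothetical thresholds for (score, sentiment, toxicity).
THRESHOLDS = (10, 0.0, 0.5)

# Hypothetical users observed in two time windows.
before = {"u1": "LHL", "u2": "LHL", "u3": "LLH", "u4": "HLH"}
after = {"u1": "HLL", "u2": "LHL", "u3": "HLL"}

shares = transition_shares(before, after)
```

In this toy example, `shares["LHL"]["HLL"]` is the fraction of users labeled LHL in the first window who are labeled HLL in the second, which is the kind of quantity behind a statement such as "a 39% transition from LHL to HLL".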
Another important implication concerns the observation of sentiment and emotional dynamics. Our framework integrates psychological and emotional dimensions, which allow a heterogeneous archetype characterization and enhance the analysis of sentiment trends within specific agglomerates of users. The resulting insights can be valuable in different tasks, such as designing community guidelines and identifying disruptive sentiment dynamics. For instance, we noted that LLL (low score, low sentiment, low toxicity) archetype users frequently displayed high levels of sadness, fear, and anger together with low toxicity scores, indicating that these users express negative emotions without resorting to harmful language that could disrupt the experience of other users. Such insights could be of great use in contexts such as analyzing health discourses and information [5,49].
Finally, we believe it is worth discussing the differences and advantages of the current method compared to existing techniques. As we reported in Section 2, a key advantage of our framework lies in its explicit treatment of higher-order interactions. Most existing approaches to role identification rely on pairwise graphs, where relationships are reduced to dyads, and thus cannot capture the collective dynamics that arise when multiple users interact simultaneously. This dyadic simplification risks obscuring important group-level behaviors, such as coalition-building, echo-chamber effects, or the emergence of influential subgroups. In contrast, by representing discussions as hyperedges, our framework preserves the natural multi-user structure of online conversations. This allows us to characterize roles and archetypes, not only through individual features, but also through their positions and behaviors within group interactions. Furthermore, our hyperedge characterization function extends the analysis beyond structure to incorporate semantic dimensions such as sentiment and toxicity, providing a richer picture of how higher-order contexts shape user behavior. In this way, the proposed method goes beyond traditional role identification techniques and offers a novel contribution: it systematically integrates structural and content-based perspectives to reveal dynamics that are only visible when higher-order interactions are taken into account.
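The information loss caused by the dyadic simplification can be illustrated with a small sketch. It assumes the standard clique expansion as the pairwise projection, which is only one possible way of reducing a hypergraph to a graph; the function name `clique_expansion` and the user identifiers are hypothetical.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Project hyperedges onto the set of unordered user pairs they induce."""
    pairs = set()
    for edge in hyperedges:
        pairs.update(frozenset(pair) for pair in combinations(sorted(edge), 2))
    return pairs


# One discussion in which three users interact together.
group_discussion = [{"a", "b", "c"}]

# Three separate discussions, each involving only two users.
pairwise_discussions = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

# Both scenarios collapse onto the same pairwise graph.
same_projection = (
    clique_expansion(group_discussion) == clique_expansion(pairwise_discussions)
)
```

Here `same_projection` is `True`: after the projection, a single three-user discussion cannot be told apart from three independent two-user exchanges, whereas the hyperedge lists themselves remain distinct.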
6. Conclusions
The identification of node roles within complex networks is significant when analyzing their dynamics and function. To advance in this context, in this paper we proposed a multi-dimensional, general framework to characterize nodes and hyperedges in a social hypernetwork. The aim of our framework is two-fold: (i) to characterize nodes and hyperedges, taking into account their exhibited features and higher-order dynamics, and (ii) to define “archetypes”, serving as templates to represent the higher-order roles of nodes. Our framework consists of different components, namely, hyperedge ensembles, and hyperedge- and node-based characterization functions. The combination of such components allows one to analyze a social hypernetwork from a more general point of view; each component can also be exploited in a standalone fashion, thus enabling a detailed analysis of the single aspect it captures. To assess the effectiveness of the framework, we carried out an extensive experimental campaign on Scored.co, an understudied social platform, focusing on different aspects such as the characterization of nodes, as well as their behaviors and surroundings. Our research could be relevant to multiple audiences. For scholars in computational social science, it provides a novel framework to study roles and dynamics in higher-order social interactions. For platform designers and moderators, the identification of archetypes and their transitions may inform strategies for community management and moderation, particularly in contentious or fringe environments. Finally, for policymakers and practitioners studying online extremism and digital sociology, the findings offer insights into the mechanisms of engagement, role evolution, and influence within politically charged communities.
6.1. Limitations and Applications
While we believe our approach presents a solid contribution, it is useful to discuss some of its limitations. First, the analysis was restricted to a single understudied platform (Scored.co), which may limit the generalizability of our findings to other social environments with different cultural, structural, or moderation characteristics. Second, our sentiment, toxicity, and moral profiling relied on lexicon-based approaches and pretrained models, which, although well-established, may not fully capture the nuances of user expression or evolving linguistic trends. Third, the archetypes we defined are shaped by a particular set of features (score, sentiment, toxicity), and additional dimensions (e.g., topical interests, network centrality, or temporal engagement) could further enrich the analysis. Since we collected the Scored.co data ourselves, potential data biases must also be considered. First, Scored.co communities tend to overrepresent alt-right and fringe political discourse, limiting generalizability to broader social platforms. Second, the breadth-first expansion strategy, while effective for maximizing coverage, may have biased the sample towards highly connected users and active communities, thereby underrepresenting isolated or inactive accounts.
As far as applications are concerned, we believe our framework could be of use in different contexts. First, it can be applied to study social hypernetworks in crisis situations, e.g., natural disasters and pandemics, to understand how roles and behaviors shift under stress and how information and support are mobilized, as well as to identify roles that may misbehave, e.g., by diffusing fake news [50,51]. More generally, it can support platform designers and moderators in identifying archetypes associated with toxic or disruptive behaviors, thereby informing targeted interventions and moderation strategies. Second, marketers and communication strategists can leverage archetype dynamics to tailor message dissemination, exploiting insights into how different roles contribute to the spread of content within communities. Finally, public health and civic organizations may apply the framework to design more effective campaigns by recognizing the archetypes most likely to amplify constructive or prosocial narratives, which complements the crisis-related application described above. In these ways, the approach extends beyond theoretical contributions and provides concrete utility for both academic and applied contexts.
6.2. Future Work
In our opinion, this paper does not represent an end point, but rather a starting point for future research. A first avenue concerns its application to other social platforms with different structural and cultural characteristics. Applying the framework to mainstream environments such as Reddit or X would enable the exploration of whether the archetypes and higher-order interaction patterns identified on Scored.co also emerge in large-scale, highly moderated ecosystems. Conversely, examining alternative or understudied platforms such as Bluesky [33] or region-specific communities could reveal unique user behaviors shaped by different governance models, linguistic practices, and cultural norms. The framework could be applied as it is to all these platforms, and extensions could be integrated to handle platform-specific aspects. For instance, when studying topical communities, the identification and analysis of archetypes could play a key role in understanding how users in communities shift and radicalize around a certain topic [17]. A second avenue involves cross-platform comparisons: by jointly analyzing datasets from multiple platforms, one could investigate how platform affordances (e.g., anonymity, moderation policies, and algorithmic recommendation systems) influence the prevalence, transitions, and interactions of archetypes. A third avenue concerns the moral characterization of user behaviors. While our study relied on the Moral Foundations Theory, future work could explore adaptations or extensions of this framework to capture value systems in diverse cultural contexts. For example, developing lexicons tailored to non-Western societies or integrating complementary moral theories could uncover cultural nuances in online interactions that are not visible through existing models. These directions would broaden the applicability of our framework and contribute to a more comprehensive understanding of user roles and dynamics in heterogeneous online environments. Finally, we note that by exploiting the temporal component, our framework could be employed to analyze higher-order roles in a multi-dimensional analysis of group evolution in temporal data [4].