3. Problem Definition
We use a boldface capital letter to denote a matrix (e.g.,
X), a boldface lowercase letter to denote a vector (e.g.,
u), a capital squiggle letter to denote a set (e.g.,
), a capital letter to denote a quantity (e.g., M), and a lowercase letter to denote a scalar (e.g., g). The notation used in our model MAGER is summarized in
Table 3.
A group set is represented as , an event set is represented as , and a user set is represented as . Suppose that a group has M members, i.e., , and that an event has four features: organizer , venue , day of week , and hour of day . Specifically, the organizer is the initiator of an event, the venue is its location, and the starting time of the event is regarded as a two-dimensional time pattern consisting of “week” and “hour”. Accordingly, we define an event participation record as follows:
An Event Participation Record. When a user participates in an event, we create an event participation record . Likewise, when a group participates in an event, we create an event participation record . Here, w and h form the two-dimensional coordinate at which event e started, i.e., the day of week w and the hour of day h.
There are two forms of participation interaction: user–event and group–event. We use X to denote the group–event interaction matrix, i.e., , where indicates that group has participated in event , creating a participation record . We use Y to denote the user–event interaction matrix, i.e., , where indicates that user has participated in event , creating a participation record . Our task is to recommend a ranked event list to a group or a user, that is, to address both the group recommendation task and the personalized recommendation task. Formally, it is written as follows:
Input: User set , Group set , Event set , user–event interaction , group–event interaction .
Output: A function and a function that map an event to a real-valued score for each user or each group.
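As an illustration of the problem setting above, the two binary interaction matrices X (group–event) and Y (user–event) can be built from participation records as follows. This is a minimal sketch with toy data; the function and variable names are illustrative, not the paper's notation.

```python
# Illustrative sketch: building the binary group-event matrix X and
# user-event matrix Y from participation records.

def build_interaction_matrix(records, num_rows, num_events):
    """records: list of (row_id, event_id) participation pairs.
    Entry [i][j] = 1 iff row i (a group or a user) participated in event j."""
    m = [[0] * num_events for _ in range(num_rows)]
    for row_id, event_id in records:
        m[row_id][event_id] = 1
    return m

# Toy data: 2 groups, 3 users, 4 events (all IDs are made up).
group_records = [(0, 1), (0, 3), (1, 2)]
user_records = [(0, 1), (1, 1), (1, 2), (2, 0)]

X = build_interaction_matrix(group_records, num_rows=2, num_events=4)
Y = build_interaction_matrix(user_records, num_rows=3, num_events=4)
```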
4. Data Analysis and Motivation
Previous studies ignore the impact of users’ personalized preferences for fine-grained item features on group decision-making. For instance, various aggregation-based methods such as AGREE [14], ConsRec [17] and PGR-PM [40] merely learn members’ dynamic preferences toward items as a whole. We argue that not only do users have different preferences for different fine-grained features, but these preferences also dynamically influence the behavior of the different groups they belong to. For example, a student may personally prefer attending events on weekends, and their family group may frequently engage in weekend events, while their social group may focus only on party-related events without considering time constraints. It may be that the student carries high weight in their family group, so the family group’s decisions follow their preferences. However, when the student is in the social group, their time preference no longer applies; rather, they align with the will of the social group. This indicates that the student’s preferences for fine-grained features change dynamically depending on the group they are in.
To elaborate further, we conduct a case study: a distribution chart is used to quantify the heterogeneity of individual preferences in terms of fine-grained features. Specifically, we randomly select a user and statistically analyze the distribution of their fine-grained preferences over the events they participated in, as shown in
Figure 2. The user participated in a total of 18 events, which were held by 10 different organizers at 7 different venues, distributed across 3 different days of the week, and all scheduled within the same hour of the day. From the distribution of these features, it is evident that the user exhibits distinct preferences toward event features. For example, the user shows diverse preferences with respect to organizers, whereas their preference for the hour feature is highly concentrated.
Moreover, we demonstrate that there are significant differences in the attention weights that individual users assign to event features when they participate in different groups. (1) We show the changes in a single user’s preference for specific features across different groups, as shown in
Figure 3a. We select the same user and randomly identify two groups he had participated in, namely group 1 and group 2. We select the features that this user interacted with most frequently, such as organizer o_86, venue v_124, week w_0, and hour h_18, and calculate the participation frequency for these features in group 1 and group 2. Here, group 1 and group 2 each participated in a total of 10 events. It can be seen that group 1 has no preference for the o_86 and v_124 features, and group 2 has no preference for the o_86 feature. This shows that a single user’s preference for fine-grained features changes significantly as he moves between different groups. (2) We present the heat map of the attention weights over event features when group 1 and group 2 participate in the same event. As shown in
Figure 3b, we can observe a significant difference in the attention weights that group 1 and group 2 assign to the features.
These findings corroborate our motivation: users not only exhibit dynamic preferences for fine-grained features, but these preferences also further influence the decision-making of their groups.
5. Methodology
The differences between our MAGER and a general co-attention network are shown in
Figure 4. The novelty of MAGER lies first in the use of multi-attention cross-feature modeling to capture the fine-grained features of events, rather than simply aggregating them through an MLP. Second, MAGER includes dual-layer attention on both the user side and the group side, while most general co-attention networks contain only a single-layer user–item attention. Most importantly, the attention computation in MAGER is conditioned on the candidate event context, rather than statically weighting members or historical behaviors. The overall architecture of our model MAGER is shown in
Figure 5, which illustrates the recommendation tasks in this paper, i.e., recommending well-suited events to groups or users. Specifically, MAGER is divided into three components. The first is Multiple Attention Group Representation Learning; the second is Neural Collaborative Filtering for modeling group–event and user–event interactions; and the third is Recommendation and Optimization for learning model parameters. In the first component, we design three attention modules: the first obtains the event representation
by using fine-grained features (such as
); the second generates the group
’s aggregated preference embedding
based on the preferences of its members; the third obtains the group-level event representation
by using fine-grained features.
Furthermore, to generate the initial embeddings of users, we take into account the real-world scenario in which the interests of users and groups change over time; we therefore design a time-decay-based mechanism to generate users’ dynamic preference representations. We assume that the historical interaction sequence of user
u is
, where
represents an interactive event and
is the timestamp. The influence of the interaction term decreases with the passage of time. Therefore, we define the time decay weight as follows:
where
denotes the attenuation rate. Then the user’s dynamic preference embedding is
where
is an event embedding that user
u interacted with, which is obtained from the “Fine-grained Attention” module. The dynamic user preference embedding is used to generate the group aggregated preference representation.
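The time-decay mechanism above can be sketched as follows. This is an illustrative implementation under assumptions: the exact decay function is not reproduced here, so an exponential form with normalized weights is used as a common stand-in, and all names are hypothetical.

```python
import math

def time_decay_weights(timestamps, t_now, decay_rate):
    """Exponential decay: older interactions receive smaller weights;
    the weights are normalized to sum to 1."""
    raw = [math.exp(-decay_rate * (t_now - t)) for t in timestamps]
    total = sum(raw)
    return [w / total for w in raw]

def dynamic_preference(event_embeddings, timestamps, t_now, decay_rate=0.1):
    """User's dynamic preference embedding: decay-weighted sum of the
    embeddings of the events in the user's interaction history."""
    weights = time_decay_weights(timestamps, t_now, decay_rate)
    dim = len(event_embeddings[0])
    return [sum(w * e[d] for w, e in zip(weights, event_embeddings))
            for d in range(dim)]
```

The same routine applies unchanged to a group's interaction history when computing the group consensus preference embedding in Section 5.1.3.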
5.1. Multiple Attention Group Representation Learning
In this section, we introduce the core module of our framework, the Multiple Attention Group Representation Learning module, which consists of three attention structures: (1) fine-grained feature-based attention for event representation, which learns an event representation from event features and the target user via an attention mechanism; (2) group member embedding aggregation, which estimates group preference for an event via an aggregation function; (3) group representation and group-attention event representation, which obtains the final group embedding and the corresponding event embedding. This three-level hierarchical attention structure, combined with the joint modeling of user- and group-level recommendations, is unique to our framework and specifically addresses the challenges in EBSNs.
5.1.1. Fine-Grained Feature-Based Attention for Event Representation
In EBSN scenarios, different users may have varying preferences for different features of an event. Therefore, when modeling event representations, simply averaging or computing a fixed weighted sum over all features may fail to adequately capture users’ personalized preferences. To address this, we introduce a feature-based attention mechanism in this module to learn refined event representations, thereby better modeling users’ preference weights for different event features.
In this module, we learn the event representation from its features and the target user. We propose to compute a weighted sum of the embeddings of the different features, in which the weights represent the influence of each feature on the user’s behavioral preference toward the event. Intuitively, when a feature of an event appears more often in a user’s preference behavior, it should carry more weight in the event representation. We assume event
has four fine-grained features
, whose embedding vectors are
,
,
,
, respectively. We assign an independent attention head to each feature
as follows:
then the influence weight of each feature is obtained through feature-level attention. Taking the organizer feature as an example, the formula is as follows:
where the
is a learnable weight matrix, the
is a learnable vector. Then the final event representation is as follows:
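The fine-grained feature attention described above can be sketched as follows. This is a simplified, illustrative version: a dot product between feature and user embeddings stands in for the paper's learnable attention network (with weight matrix and vector), and all names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attentive_event_repr(feature_embs, user_emb):
    """Score each feature embedding (organizer, venue, week, hour) against
    the target user, softmax the scores into attention weights, and return
    the weighted sum as the event representation, plus the weights."""
    scores = [dot(f, user_emb) for f in feature_embs]
    weights = softmax(scores)
    dim = len(feature_embs[0])
    rep = [sum(w * f[d] for w, f in zip(weights, feature_embs))
           for d in range(dim)]
    return rep, weights
```

A feature whose embedding aligns with the user's embedding receives a higher attention weight and thus dominates the resulting event representation.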
5.1.2. Member of Group Embedding Aggregation
After learning the feature-based attentive event representation, we further introduce a group representation module to integrate the interest features of group members and generate the final group embedding representation. Group recommendation differs from personalized recommendation in that a group consists of multiple members who may exhibit varying interests in the same event; thus, how to effectively aggregate members’ interests is a critical challenge in the group recommendation task. Inspired by [
14], we use a neural attention network to highlight the influence of group members on the target event when aggregating member embedding vectors. In this module, for each group
, where the embedding representation of each member
is derived through attention-weighted computation with the target event. Subsequently, we assign a weight to each member in the group to measure their influence on the target event. The final group embedding representation is then generated by performing a weighted summation of the members’ embedding representations based on these weights.
Specifically, our goal is to learn a member-level attention weight to quantify the influence of member
in the group
. This weight holistically considers both the member’s personal interest preferences and their social influence within the group. For instance, in social networks, certain members (e.g., opinion leaders) exert greater influence on group event decisions, and thus their weights should be higher. Based on this, we define the group embedding representation as follows:
where
is an aggregate dynamic preference representation of group
.
denotes the dynamic preference embedding of a member.
denotes the number of members of group
. We use the
, which is obtained from the “Fine-grained Feature Attention” module, to compute the influence weight of member
. Then the attention weight
of each member is computed as follows:
where
is a learnable weight matrix, and the
is a learnable vector.
denotes the concatenation operation.
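The member-level aggregation above can be sketched as follows. Again this is an illustrative simplification: a dot product between each member's dynamic preference embedding and the event embedding replaces the learnable attention network over the concatenated pair, and all names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate_members(member_embs, event_emb):
    """Member-level attention: score each member's dynamic preference
    embedding against the target event, then return the attention-weighted
    sum of member embeddings as the aggregated group preference."""
    scores = [dot(m, event_emb) for m in member_embs]
    weights = softmax(scores)
    dim = len(member_embs[0])
    agg = [sum(w * m[d] for w, m in zip(weights, member_embs))
           for d in range(dim)]
    return agg, weights
```

Because the weights are conditioned on the candidate event, the same member can carry different influence in different groups and for different events, matching the observation in Section 4.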
5.1.3. Group Representation and Group-Attention Event Representation
Besides the group embedding vector derived from member embedding aggregation, the group’s own preference embedding cannot be ignored. In some scenarios, a group’s event decision is influenced not only by the individual interests of its members but also by the group’s consensus preferences, particularly when members share a common purpose. For instance, a group formed by a reading enthusiast and a travel enthusiast might ultimately choose to watch a movie together. In such cases, relying solely on individual member preferences may fail to capture the group’s holistic intent. Thus, for the group recommendation task, we propose integrating both the member-aggregated embedding and the group consensus preference embedding into the final group representation. The group consensus preference embedding, denoted as
, can be learned from group–event historical interactions. Like the user dynamic preference embedding, the group consensus preference embedding is also calculated based on a time decay mechanism as follows:
where
denotes the historical interaction sequence of group
,
denotes the time decay weight, and the
is an event embedding that group
interacted with, which is obtained from the “Fine-grained Feature Attention” module. Therefore, we define the final group embedding as follows:
where the
is the group’s dynamic aggregation representation obtained in the previous section by aggregating the members’ embeddings.
After modeling the aggregation of member interests and the group consensus preferences, we further generate the event representation for group
to better characterize the group’s consensus interest in different events. To achieve this, we introduce a “Norm and Add” (Normalization and Weighted Summation) module to integrate the personalized event embeddings of group members and produce the final group event representation. Specifically, we first utilize the “Fine-grained Feature Attention” module to obtain event embeddings for each member. These embeddings capture the degree of preference each member holds for various event features. However, since different members exert varying levels of influence in group decision-making, we further normalize these member-specific event representations and assign appropriate weights to ensure that the final group event representation effectively reflects the collective preference trend of the group. Thus, the event representation for group
is as follows:
where the
is the event embedding for group
.
denotes feature embedding of event
. Then the weight
of each feature is as follows:
where
is a learnable weight matrix, and the
is the final group representation. We can predict the group preferences after obtaining the final group representations and the corresponding event representations.
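The “Norm and Add” step described above can be sketched as follows. This is a minimal illustration assuming L2 normalization of each member's personalized event embedding followed by a weighted sum with given member influence weights; the function names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (leave zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def norm_and_add(member_event_embs, member_weights):
    """Norm and Add: L2-normalize each member's personalized event
    embedding, then take the weighted sum using the members'
    influence weights to form the group event representation."""
    normed = [l2_normalize(e) for e in member_event_embs]
    dim = len(normed[0])
    return [sum(w * e[d] for w, e in zip(member_weights, normed))
            for d in range(dim)]
```

Normalizing first prevents a member whose event embedding merely has a large magnitude from dominating the group representation; only the assigned influence weights control each member's contribution.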
5.2. Interaction Learning with Neural Collaborative Filtering
We adopt Neural Collaborative Filtering (NCF) to model the interaction information between group–event and user–event. NCF is a deep-learning-based approach that learns complex interaction relationships between users/groups and target event through nonlinear transformations and Multi-Layer Perceptrons (MLP), thereby enhancing recommendation accuracy. Specifically, for each group–event interaction pair
, we first obtain the embedding representations of group
and event
via a representation learning module, which can be acquired from a pre-trained embedding layer. Subsequently, the obtained group and event embeddings are fed into a pooling layer to further learn interaction relationships. The output embedding vectors of the pooling layer for user and group are as follows:
where the two cases correspond to the two recommendation tasks: group recommendation and user recommendation.
After the pooling layer, we feed the interaction representation into a fully connected layer to further learn the complex relationships between users/groups and events and compute the final prediction score. The primary role of the fully connected layer is to perform nonlinear transformations, enhancing the model’s expressive capacity. Specifically, for the target group
and target user
, we construct independent fully connected neural networks, obtaining the final hidden representation through layer-wise computation. The output of the l-th layer is defined as follows:
where the
is the weight matrix of the
l-th layer, mapping input features to the hidden space.
is the bias vector of the
l-th layer, enhancing the model’s learning capacity. The ReLU activation function introduces nonlinearity, empowering the model with stronger expressive capabilities. After obtaining the hidden representation of the final layer, we compute the final recommendation scores using a single-layer linear transformation. The predicted score
for target group
on event
and the predicted score
for target user
on event
are calculated as follows:
where the
is the weight parameter of the prediction layer for group recommendation, and the
is the weight parameter of the prediction layer for user recommendation.
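The NCF prediction path described above can be sketched as follows. This is an illustrative version under assumptions: the pooling layer is taken to be an element-wise product (the text only names a pooling layer), the hidden layers use ReLU as stated, and all parameter values and names are hypothetical, untrained examples.

```python
def relu(x):
    """ReLU activation, introducing nonlinearity."""
    return x if x > 0.0 else 0.0

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def ncf_predict(entity_emb, event_emb, hidden_layers, out_weights):
    """Pool the (group or user) embedding with the event embedding, pass
    the result through ReLU fully connected layers, then apply a single
    linear output layer to obtain the prediction score."""
    # Pooling layer: element-wise product of the two embeddings (assumed).
    h = [a * b for a, b in zip(entity_emb, event_emb)]
    # Fully connected layers: h <- ReLU(W h + b), layer by layer.
    for W, b in hidden_layers:
        h = [relu(dot(w_row, h) + b_i) for w_row, b_i in zip(W, b)]
    # Single-layer linear prediction.
    return dot(out_weights, h)

# Toy 2-d example with one identity hidden layer.
score = ncf_predict([1.0, 2.0], [0.5, 1.0],
                    hidden_layers=[([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])],
                    out_weights=[1.0, 1.0])
```

Separate parameter sets would be instantiated for the group network and the user network, matching the two independent prediction layers in the text.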
5.3. Recommendation and Optimization
We adopt the Regression-based Pairwise Loss (RPL) [
41] as the optimization objective to learn model parameters and jointly model group–event and user–event interactions. The core idea of this loss function is to optimize the preference ranking of users or groups across different events through pairwise comparisons, enabling the model to better capture the true interests of users or groups. To ensure effective learning of group preferences, we define the group–event loss function as follows:
where
denotes the group–event interaction data in the training set for group
g. The triplet
indicates that group
g has interacted with event
j but not with event
. Here
and
represent the model’s predicted preference scores of group
g for event
j and event
, respectively. The objective of this loss function is to ensure that the predicted score for the interacted event
j is higher than that for the non-interacted event
, i.e.,
. The optimization aims to minimize the squared error of
, thereby enforcing a margin of approximately 1 between the predicted scores to enhance the model’s discriminative capability.
Similarly, for the user–event recommendation task, we apply the same optimization strategy, defined as follows:
By maintaining a reasonable margin between scores, i.e., , the model effectively learns the user’s true preferences. To optimize the model, we employ Stochastic Gradient Descent (SGD) to minimize the combined losses and .
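The regression-based pairwise loss described above can be sketched as follows: for each triplet, the squared deviation of the positive–negative score gap from a margin of 1 is penalized. The batching over triplets is an illustrative simplification.

```python
def rpl_loss(pos_scores, neg_scores):
    """Regression-based pairwise loss: for each (positive, negative) score
    pair, penalize the squared deviation of the score gap from a margin
    of 1, so interacted events are pushed ~1 above non-interacted ones."""
    return sum((p - n - 1.0) ** 2 for p, n in zip(pos_scores, neg_scores))
```

The loss is zero exactly when every positive score exceeds its paired negative score by the margin, which is why uniformly rescaling all scores cannot trivially shrink it, unlike BPR as discussed next.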
In recommendation tasks, another widely used pairwise learning method is Bayesian Personalized Ranking (BPR) [
42]. However, BPR suffers from a potential issue in multi-layer models: its loss can be reduced simply by scaling up the hidden weights, leading to a trivial solution and necessitating strict
regularization to constrain the parameters. In contrast, the regression-based pairwise loss (RPL) directly enforces a fixed margin between positive and negative samples (i.e.,
), which prevents such trivial solutions and enables stable optimization without additional regularization. This property makes the loss particularly suitable for integration with our fine-grained attention mechanism, as it provides clearer and more robust optimization signals for the model.
7. Discussion
In this work, we propose MAGER, a group event recommendation framework based on multiple attention mechanisms, which effectively captures user and group preferences at the fine-grained feature level through a heterogeneous attention structure. Experiments on three real-world datasets demonstrate that MAGER significantly outperforms existing methods in both group and user recommendation tasks, validating the importance of feature-level attention and the heterogeneous attention structure in modeling dynamic group preferences.
7.1. Theoretical and Practical Implications
This work demonstrates that the Fine-grained Feature Attention (FFA) module plays a crucial role in capturing user and group preferences in EBSNs. A key theoretical insight revealed by our empirical analysis is that, among all event features, the organizer exerts the most dominant influence on both user and group behaviors. This finding indicates that users tend to rely heavily on organizer credibility when selecting events, which substantially shapes group preferences. Therefore, future EBSN recommendation models should avoid treating all features uniformly and instead explicitly differentiate the contributions of individual features, especially organizer-related signals, when modeling user and group decisions. This insight enhances our theoretical understanding of how fine-grained features affect group preferences in EBSN scenarios. From a practical perspective, MAGER demonstrates robust performance under sparse interactions, which are common in real-world platforms such as Douban-Event and Meetup, where new events emerge frequently and user interaction histories are limited. By accurately learning feature-level preferences even when user histories are short, MAGER improves recommendation quality in dynamic and data-sparse settings. This highlights the practical value of incorporating feature-aware attention mechanisms in EBSN recommenders, particularly on platforms characterized by fast-evolving event life cycles. Overall, this study provides both theoretical insights into feature-specific preference modeling and practical evidence of adaptability to sparse and dynamic social settings.
7.2. Limitations
Despite its advantages, MAGER has several limitations. First, MAGER does not explicitly capture latent correlations among different features, which may reduce the richness of event representations. Second, our experiments are conducted on two structurally similar EBSNs, so further validation is needed on more heterogeneous platforms. Third, although the hierarchical attention design improves performance, its computational cost increases with the number of event features, which may limit scalability in large-scale applications. More importantly, the use of synthetic groups constitutes a key limitation. In this study, groups are generated by clustering users who have participated in the same event more than three times. As shown in
Section 6.3.5, this construction method leads to strong preference homogeneity among group members, which in turn results in minimal performance variation across different group sizes, potentially masking the true challenges of modeling dynamic group preferences. Future work should therefore explore more realistic or dynamic group formation mechanisms to better reflect real-world group behaviors and to more rigorously evaluate the model’s capability in handling preference diversity.
7.3. Future Work
Future work could explore more diverse group construction methods, such as forming groups based on social or geographical criteria with minimal overlap in historical activities, to evaluate the model’s robustness under preference conflicts and low-consistency conditions. Additionally, introducing a cross-feature attention mechanism to explicitly model correlations between features, e.g., organizer and venue, and incorporating this information into NCF could further enrich event representations. Finally, applying MAGER to a wider range of heterogeneous EBSN platforms and integrating multi-modal event features would help validate its generalizability and practical applicability.