Article
Peer-Review Record

Attribute-Aware Graph Aggregation for Sequential Recommendation

Mathematics 2025, 13(9), 1386; https://doi.org/10.3390/math13091386
by Yiming Qu 1, Yang Fang 2,*, Zhen Tan 1 and Weidong Xiao 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 7 March 2025 / Revised: 10 April 2025 / Accepted: 21 April 2025 / Published: 24 April 2025
(This article belongs to the Section E1: Mathematics and Computer Science)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. There are only four baseline models, all of which were published before 2022. Please compare them with the latest models from the literature in 2024 and 2025.

2. How can it be demonstrated that LightGCN in Attribute Attention Graph Embedding is a state-of-the-art method?

3. Regarding Experiment 3, is there a consistent pattern in the selection of the maximum attribute sequence length across different datasets? And what is this pattern related to?

Comments on the Quality of English Language

The overall readability of the paper is poor. The complete process from input to output is not well-connected, making it difficult to follow each step. Additionally, some variables in the equations are not clearly explained. For example, why does Equation (5) contain two E_a? Where do s_1 and s_2 in Equation (7) come from, and what do they represent? What does S^O in Equation (8) mean? There are similar problems elsewhere in the paper!

Author Response

Comments 1: There are only four baseline models, all of which were published before 2022. Please compare them with the latest models from the literature in 2024 and 2025.

Response 1: Thank you for pointing this out. The state-of-the-art work on using attribute information for sequential recommendation is the 2022 paper "Context and Attribute-aware Sequential Recommendation via Cross-attention". Some papers use attribute information for session-based recommendation, but that is not within the scope of our research.

Comments 2: How can it be demonstrated that LightGCN in Attribute Attention Graph Embedding is a state-of-the-art method?

Response 2: We agree with this comment. We applied the approach of using graph neural networks to model user-item interactions in recommender systems to the interactions between items and attributes. We therefore did not consider more complex graph neural network models, and calling LightGCN the most advanced graph neural network model is not particularly accurate. We have revised this in the article.

Comments 3: Regarding Experiment 3, is there a consistent pattern in the selection of the maximum attribute sequence length across different datasets? And what is this pattern related to?

Response 3: The maximum attribute sequence length is determined by both the length of the item sequence and the number of item attributes in the dataset. We used a grid search on each dataset to find the maximum attribute sequence length that yields the best experimental results; a sketch of this search follows.
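As a minimal sketch of such a grid search (the candidate lengths and the training helper here are illustrative placeholders, not the paper's actual setup):

```python
# Illustrative grid search over the maximum attribute sequence length.
# `train_and_evaluate` is a stub standing in for a full training run.

def train_and_evaluate(dataset, max_attr_len):
    """Placeholder: train with this maximum attribute sequence length
    and return a validation metric such as NDCG@10."""
    return 0.0  # stub for illustration

def search_max_attr_len(dataset, candidates=(10, 20, 30, 40, 50)):
    best_len, best_score = None, float("-inf")
    for la in candidates:
        score = train_and_evaluate(dataset, max_attr_len=la)
        if score > best_score:
            best_len, best_score = la, score
    return best_len, best_score
```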

Comments on the Quality of English Language: The overall readability of the paper is poor. The complete process from input to output is not well-connected, making it difficult to follow each step. Additionally, some variables in the equations are not clearly explained. For example, why does Equation (5) contain two E_a? Where do s_1 and s_2 in Equation (7) come from, and what do they represent? What does S^O in Equation (8) mean? There are similar problems elsewhere in the paper!

Response: We agree with this comment. We have therefore provided more specific descriptions for each component and formula in the model. Equation (5) contains two E_a because we designed an attention mechanism that aggregates multiple embeddings into the final user representation: we compute the similarity between the item-level basic interest and the attribute-level embeddings, and take the similarity-weighted sum of the attribute-level embeddings as the final user representation. This attention mechanism uses the item-level interest as the query and the attribute-level multi-interest embeddings as keys and values. s_1 and s_2 in Equation (7) denote the components of s from Equation (6) in two-dimensional space. S^O in Equation (8) denotes the embedding after rotary encoding in Equation (7).
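For concreteness, the aggregation and rotation just described can be written as follows. This is a sketch reconstructed from the response: the symbols (q for the item-level interest, E_a^{(k)} for the k-th attribute-level embedding, m and theta for the position and rotation angle) are our notation and may differ from the paper's.

```latex
% Attention aggregation (cf. Eq. (5)): item-level interest q as query,
% attribute-level embeddings E_a^{(k)} as keys and values.
\alpha_k = \frac{\exp\big(\mathbf{q}^{\top}\mathbf{E}_a^{(k)}\big)}
                {\sum_{j}\exp\big(\mathbf{q}^{\top}\mathbf{E}_a^{(j)}\big)},
\qquad
\mathbf{u} = \sum_{k}\alpha_k\,\mathbf{E}_a^{(k)}

% Rotary encoding (cf. Eq. (7)): the paired components (s_1, s_2) of s
% are rotated by an angle proportional to the position m.
\begin{pmatrix} s_1' \\ s_2' \end{pmatrix}
=
\begin{pmatrix} \cos m\theta & -\sin m\theta \\ \sin m\theta & \cos m\theta \end{pmatrix}
\begin{pmatrix} s_1 \\ s_2 \end{pmatrix}
```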

Reviewer 2 Report

Comments and Suggestions for Authors

The paper proposes an attribute-aware graph aggregation method for sequential recommendation, aiming to improve recommendation accuracy by modeling user preferences through item attributes and their changes over time.

Pros:

1. The paper is well-organized and easy to read.
2. The proposed model is well-motivated.
3. The experiments are comprehensive.

Cons:
1. The method for constructing attribute sequences is not clearly explained. It is unclear how the mapping rules are applied to form the attribute change sequence.
2. The use of rotary encoding is complex, and the paper lacks a clear explanation of how it enhances the model's ability to capture relative positional relationships in the sequence.
3. The paper does not provide a detailed analysis of how hyperparameters, such as the maximum attribute sequence length (La) and embedding dimension (d), affect the model's performance.
4. The article mainly discusses graph-based recommendation. It is recommended to introduce a related method: Learning Graph ODE for Continuous-Time Sequential Recommendation.
5. The paper does not provide code or detailed instructions for reproducing the results. This makes it difficult for other researchers to validate and build upon the proposed methods.

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Comments 1: The method for constructing attribute sequences is not clearly explained. It is unclear how the mapping rules are applied to form the attribute change sequence. 

Response 1: Thank you for pointing this out. To capture more accurate user interests, we propose in this paper to replace items with their attributes and to construct an attribute sequence for each item sequence. Note that different items may share common attributes, and the attribute sequence is supposed to indicate the evolution of user interests. When an attribute recurs, an intuitive solution is to keep only its latest occurrence, because this reflects the persistence of the user's preference for that attribute. A sketch of this construction follows.
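As an illustration only (the paper's exact mapping rules may differ, and the item/attribute names here are invented), a minimal sketch of the "keep the latest occurrence" construction:

```python
# Sketch of the "keep the latest occurrence" rule: flatten the item
# sequence into attributes, then drop all but the newest copy of each.

def build_attr_sequence(item_seq, item_attrs):
    flat = [a for item in item_seq for a in item_attrs[item]]
    seen, kept_reversed = set(), []
    for a in reversed(flat):            # scan from the newest attribute
        if a not in seen:
            seen.add(a)
            kept_reversed.append(a)
    return kept_reversed[::-1]          # restore chronological order

item_attrs = {"i1": ["brand_A", "red"], "i2": ["brand_B"], "i3": ["brand_A", "blue"]}
print(build_attr_sequence(["i1", "i2", "i3"], item_attrs))
# -> ['red', 'brand_B', 'brand_A', 'blue']  (only the latest brand_A is kept)
```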

Comments 2: The use of rotary encoding is complex and lacks a clear explanation of how it enhances the model's ability to capture relative positional relationships in the sequence.

Response 2: Thank you for pointing this out. Due to space limitations, the paper did not elaborate on rotary encoding, so we have added appropriate supplements to the paper. Rotary position encoding (RoPE) effectively preserves the relative relationships of positional information: encodings of adjacent positions are similar to each other, while encodings of distant positions differ. This enhances the model's perception and use of positional information. Absolute position encoding methods (such as sinusoidal position encoding or learned position embeddings) lack this property, as they can only represent absolute positions, not relative ones.
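A small numerical illustration of the relative-position property (a NumPy sketch, not the paper's implementation): after rotating a query and a key by angles proportional to their positions, their dot product depends only on the position offset.

```python
# After rotary encoding, attention scores depend only on the offset
# between positions, not on the absolute positions themselves.
import numpy as np

def rope_2d(vec, pos, theta=0.1):
    """Rotate a 2-d vector by pos * theta radians."""
    c, s = np.cos(pos * theta), np.sin(pos * theta)
    return np.array([[c, -s], [s, c]]) @ vec

q, k = np.array([1.0, 0.5]), np.array([0.3, 0.8])
a = rope_2d(q, 2) @ rope_2d(k, 5)     # positions 2 and 5: offset 3
b = rope_2d(q, 10) @ rope_2d(k, 13)   # positions 10 and 13: offset 3
print(np.isclose(a, b))               # True: only the offset matters
```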

Comments 3: The paper does not provide a detailed analysis of how hyperparameters, such as the maximum attribute sequence length (La) and embedding dimension (d), affect the model's performance.

Response 3: Under our attribute sequence construction method, when La is small, only the first La attributes with the most item neighbors are retained, and these represent the attributes that users mainly care about. When La is too large, the attributes of recent items become dominant in multi-interest extraction. As for the embedding dimension d, we use a grid search to find the dimension that yields the best experimental results. Higher-dimensional embeddings can achieve higher accuracy and lower popularity bias; that is, increasing the embedding dimension can improve the performance and accuracy of the recommender system, but it may at the same time cause overfitting issues. This type of issue is not the main focus of the paper.

Comments 4: The article mainly discusses graph-based recommendation. It is recommended to introduce related methods: Learning Graph ODE for Continuous-Time Sequential Recommendation. 

Response 4: We agree with this comment. We have therefore introduced this paper in the Related Work section.

Comments 5: The paper does not provide code or detailed instructions for reproducing the results. This makes it difficult for other researchers to validate and build upon the proposed methods.

Response 5: The code can be obtained from the corresponding author once the paper is published in the journal.

Reviewer 3 Report

Comments and Suggestions for Authors
  1. The study argues that current methods do not investigate the fundamental causes of every encounter. Compared to conventional item-based approaches, how does the suggested attribute-centric approach provide a more profound comprehension of user preferences?
  2. According to the authors, this approach improves generality and accounts for shifting user preferences. Could you provide more details on the precise processes behind this enhanced anticipation and generalization?
  3. The authors provide the notion of preserving each property in its most recent condition. How does this "most recent state" method resolve inconsistencies or conflicts between various instances of the same attribute in a user's sequence?
  4. New encoding techniques for attention networks are mentioned in the introduction. Could you provide a more thorough explanation of these encoding techniques and why they are required for better stability and performance?
  5. The study claims to create 'item-attribute interaction graphs.' What kind of graph is expressly used, and how is it built using the interaction data? What details does this graph provide that are not included in a typical interaction sequence?
  6. The suggested approach manages data sparsity, according to the research. Could you elaborate on how shifting the emphasis to characteristics helps data sparsity?
  7. What are the possible drawbacks of concentrating just on attributes? Are item-level data still essential for making precise recommendations in any situation?
  8. The main topic of the study is dynamic user preferences. How well does this method work for cold-start issues when there is little to no user interaction history?
  9. The shift from Markov chains to RNNs and then to self-attention mechanisms is covered in the study. Regarding sequential suggestions, what are each strategy's main benefits and drawbacks?
  10. TiSASRec and Chorus are mentioned in the report as tools for managing temporal information. What distinguishes the suggested model from these current techniques for capturing temporal dynamics?
  11. Given the rapid advancement in sequential recommendation, how does the proposed method differentiate itself from the latest state-of-the-art models beyond the baselines mentioned?
  12. A variety of attribute-aware models, including factorization machines and neural network-based techniques, are covered in the related work section. In what ways does the suggested model incorporate attribute data differently or more successfully than these current methods?
  13. Attributes are used as supplemental information in some models, according to the research. What issues result from considering characteristics as just auxiliary, and how does the suggested approach resolve these issues?
  14. SOTA is used to refer to the CARCA model. What benefits does the suggested model provide, and what are the main distinctions between it and CARCA?
  15. The study examines several embedding vector-based attribute-aware recommendation models. What distinguishes the attribute embedding technique of the suggested model from these other models?
  16. How does the proposed method bridge the gap between sequential recommendation and attribute-aware recommendation, and what are the benefits of this integration?
  17. Based on the related work, what are the most significant challenges that remain in sequential and attribute-aware recommendation, and how does the proposed model attempt to address them?
  18. How does the proposed model deal with items with very few attributes or a large number of attributes?
Comments on the Quality of English Language

To better convey the study, the English may be enhanced.

Author Response

Comments 1: The study argues that current methods do not investigate the fundamental causes of every encounter. Compared to conventional item-based approaches, how does the suggested attribute-centric approach provide a more profound comprehension of user preferences?

Response 1: Many methods for analyzing user interests are designed to extract multiple interests from item sequences. Usually, each preference embedding is computed as a linear combination of the input item embeddings, where the weights represent the affinity between items and interests. However, all these methods treat items as the basic units of user preference, without considering item attributes and ignoring the true reasons behind each interaction. We believe that users always pay attention to certain attributes in each interaction, and different users exhibit different points of interest. By finding the actual attributes that users care about and generating interests directly from those attributes, fine-grained user preferences can be obtained and model performance can be improved.

Comments 2: According to the authors, this approach improves generality and accounts for shifting user preferences. Could you provide more details on the precise processes behind this enhanced anticipation and generalization?

Response 2: The paper replaces traditional item sequences with attribute sequence modeling, decoupling the strong binding between attributes and items and achieving more general optimization. A rotation matrix in the complex domain encodes positional relationships, alleviating the bias in learning sequence position information caused by data sparsity. To exploit the temporal patterns of attribute sequences (such as order and repetition frequency), we use a GAT variant that introduces a time-decay factor to focus on recent key attributes; a sketch of such a decay follows.
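Purely as a plausible reading of the time-decay factor mentioned above (the paper's exact formulation is not reproduced here), older positions could be penalized in the attention logits like this:

```python
# Hedged sketch: attention logits shifted down by the age of each
# position, so recent attributes receive higher normalized weight.
import numpy as np

def decayed_attention(logits, positions, now, decay=0.5):
    """Softmax over logits shifted by -decay * age."""
    age = now - np.asarray(positions, dtype=float)
    shifted = np.asarray(logits, dtype=float) - decay * age
    weights = np.exp(shifted - shifted.max())  # numerically stable softmax
    return weights / weights.sum()

# Equal raw logits, but the most recent attribute (position 9) dominates:
print(decayed_attention([1.0, 1.0, 1.0], positions=[1, 5, 9], now=10))
```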

Comments 3: The authors provide the notion of preserving each property in its most recent condition. "How does this most recent state method resolve inconsistencies or conflicts between various instances of the same attribute in a user's sequence?

Response 3: If a user shows different preferences for the same attribute (for example, liking one characteristic of attribute X early on and focusing on another later), the new instance directly overwrites the old one, implicitly assuming that the latest behavior already reflects the corrected preference.

Comments 4: New encoding techniques for attention networks are mentioned in the introduction. Could you provide a more thorough explanation of these encoding techniques and why they are required for better stability and performance?

Response 4: Rotary position encoding (RoPE) effectively preserves the relative relationships of positional information: encodings of adjacent positions are similar to each other, while encodings of distant positions differ. This enhances the model's perception and use of positional information. Absolute position encoding methods (such as sinusoidal position encoding or learned position embeddings) lack this property, as they can only represent absolute positions, not relative ones.

 

Comments 5: The study claims to create 'item-attribute interaction graphs.' What kind of graph is expressly used, and how is it built using the interaction data? What details does this graph provide that are not included in a typical contact sequence?

Response 5: The graph is a bipartite graph, and the associations between items and attributes come directly from the item metadata (each item's attribute list).

By explicitly constructing an item-attribute interaction graph, the paper supplements key information that traditional user behavior sequences cannot provide: semantic associations across items, implicit relationships and dynamic propagation of attributes, and fine-grained interest decoupling. Compared to models that rely solely on users' historical clicks, this graph structure introduces semantic prior knowledge about items into the recommender system, compensating for the sparsity and noise of behavioral data.
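A minimal sketch of how such a bipartite graph can be assembled from item metadata (the data layout and names here are illustrative assumptions):

```python
# Build item-attribute adjacency lists from each item's attribute list.
from collections import defaultdict

def build_item_attr_graph(item_attrs):
    """item_attrs: dict mapping item id -> list of attribute ids.
    Returns adjacency lists for both node types of the bipartite graph."""
    item_to_attrs = defaultdict(set)
    attr_to_items = defaultdict(set)
    for item, attrs in item_attrs.items():
        for a in attrs:
            item_to_attrs[item].add(a)
            attr_to_items[a].add(item)
    return item_to_attrs, attr_to_items

# Items i1 and i3 become two-hop neighbors via the shared attribute:
i2a, a2i = build_item_attr_graph(
    {"i1": ["brand_A"], "i2": ["brand_B"], "i3": ["brand_A", "blue"]})
print(a2i["brand_A"])  # {'i1', 'i3'}
```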

Comments 6: The suggested approach manages data sparsity, according to the research. Could you elaborate on how shifting the emphasis to characteristics helps data sparsity?

Response 6: By elevating user interest modeling from the item level to the attribute level, and by combining the GNN's cross-item information propagation with dynamic sequence maintenance, the model alleviates sparsity in the following ways:

Generalized representation: sharing attribute links with cold-start items reduces dependence on the data volume of any single item.

Dynamic selection: maintaining the latest attribute state focuses on effective signals and filters out historical noise.

Hierarchical complementarity: the gating mechanism adaptively selects trustworthy signal sources in sparse scenarios.

Comments 7: What are the possible drawbacks of concentrating just on attributes? Are item-level data still essential for making precise recommendations in any situation?

Response 7: The drawbacks of a purely attribute-oriented approach include coarse-grained information, attribute noise, insufficient modeling of complex preferences, and neglect of unstructured features. The best practice is to balance attribute-level generalization and item uniqueness with a mixed strategy rather than a one-sided choice: the core value of attributes lies in expanding the breadth of recommendations, while item-level data ensures recommendation accuracy.

Comments 8: The main topic of the study is dynamic user preferences. How well does this method work for cold-start issues when there is little to no user interaction history?

Response 8: The effectiveness of the item-attribute interaction graph proposed in the paper in cold-start scenarios mainly depends on:

The generalization ability of attribute propagation on sparse data: mining latent associations across items through shared attributes.

Dynamic preference modeling with graph neural networks: combining long- and short-term interests and responding in real time to limited interactions.

Explainable recommendation decisions: a cold-start recommendation can be traced back to specific attributes, making it easy for users to understand and for developers to debug the system.

Comments 9: The shift from Markov chains to RNNs and then to self-attention mechanisms is covered in the study. Regarding sequential suggestions, what are each strategy's main benefits and drawbacks?

Response 9: The advantages of Markov chains include low computational complexity, strong parameter interpretability, strong short-sequence modeling, and friendliness to sparse data. Disadvantages: they cannot capture long-range dependencies, the parameter count grows exponentially with the number of states, and the state space explodes.

The advantages of RNNs include the ability to model long sequences, dynamic adaptation to variable-length sequences, and feature-fusion capability. Disadvantages: vanishing/exploding gradients remain a problem, computation cannot be parallelized, and compressing the entire sequence into a fixed-dimensional hidden vector may lose local details.

The advantages of the self-attention mechanism: it directly models dependencies between arbitrary positions in the whole sequence without information decay; parallelization via matrix multiplication makes training much faster than RNNs; and visualizing the attention weights reveals associations in user behavior. Disadvantages: high computational resource requirements, the need to introduce explicit positional encoding to preserve order information, and, when there are many parameters, a tendency to overfit on small datasets.

Comments 10: TiSASRec and Chorus are mentioned in the report as tools for managing temporal information. What distinguishes the suggested model from these current techniques for capturing temporal dynamics?

Response 10: In TiSASRec, the authors use relative positional encoding and a self-attention mechanism to explicitly model the time intervals and order between items in user behavior sequences. This mechanism draws inspiration from the Transformer architecture, but considers the actual time difference between consecutive user behaviors when computing attention weights. Chorus incorporates a knowledge graph to model the relationships between items and introduces a dynamic temporal kernel function for each item relation to control how historical interactions under different relations affect the target item. This captures dynamic user demand from both the knowledge-enhancement and time-sensitivity perspectives.

Comments 11: Given the rapid advancement in sequential recommendation, how does the proposed method differentiate itself from the latest state-of-the-art models beyond the baselines mentioned?

Response 11: Through the innovative designs described above, our model achieves advances in fine-grained preference capture, dynamic evolution modeling, and multi-level feature fusion, surpassing existing SOTA models especially in scenarios with highly structured attributes, and offering a new paradigm for the interpretability and dynamic adaptability of recommender systems. In the future, the dynamic fusion of cross-modal attributes (such as text + image) could be explored further to broaden the applicable scenarios.

Comments 12: A variety of attribute-aware models, including factorization machines and neural network-based techniques, are covered in the related work section. In what ways does the suggested model incorporate attribute data differently or more successfully than these current methods?

Response 12: The model in the paper differs significantly from traditional factorization machines (FM) and neural network methods through three innovative mechanisms: dynamic attribute sequence construction, graph-based cross-level attribute association modeling, and sequence-aware rotary encoding. It improves both fine-grained preference capture and long-term evolution modeling:

Each item in the user behavior sequence is replaced with its attributes, and a dynamic attribute sequence is constructed by deduplicating and retaining the latest attribute state. This sequence directly reflects shifts of user preference focus in the attribute dimension, rather than an arrangement of item IDs; long-term dependencies between attributes are then captured through a self-attention mechanism (a GAT variant).

An item-attribute heterogeneous graph is built (with items and attributes as nodes and item-attribute relations as edges), and LightGCN's multi-layer propagation fuses implicit associations between attributes (such as the synergy between brand and price); see the sketch after this list. Meanwhile, GAT variants over the attribute sequences learn differences in the temporal importance of attributes (such as higher weights for recently attended attributes).

Rotary encoding is introduced to express relative positional differences through vector rotation. For example, for the attribute sequence "Brand A → Price B → Brand C", rotary encoding can explicitly model the relative distance between Brand A and Brand C (two positions apart) and the contextual influence of Price B on Brand C, without relying on fixed sliding windows or positional embedding matrices.
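For the LightGCN propagation mentioned in the second point above, a minimal sketch under assumed shapes: LightGCN drops feature transforms and nonlinearities, repeatedly averaging symmetric-normalized neighbor embeddings and taking the mean over layers. The toy adjacency matrix is an illustrative assumption.

```python
# LightGCN-style propagation over an item-attribute bipartite graph.
import numpy as np

def lightgcn_propagate(adj, emb, num_layers=2):
    """adj: (n, n) 0/1 adjacency over item + attribute nodes;
    emb: (n, d) initial embeddings; returns layer-averaged embeddings."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    layers = [emb]
    for _ in range(num_layers):
        layers.append(norm_adj @ layers[-1])   # pure neighbor averaging
    return np.mean(layers, axis=0)             # E = mean(E^0 ... E^K)

# Toy graph: nodes 0-1 are items sharing attribute node 2.
adj = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=float)
emb = np.random.default_rng(0).normal(size=(3, 4))
print(lightgcn_propagate(adj, emb).shape)      # (3, 4)
```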

Comments 13: Attributes are used as supplemental information in some models, according to the research. What issues result from considering characteristics as just auxiliary, and how does the suggested approach resolve these issues?

Response 13: Common problems with treating attributes as merely auxiliary include: static feature crossing cannot capture dynamic evolution, item-centric sequence modeling ignores attribute commonalities, feature concatenation leads to semantic confusion, and high-dimensional sparse features introduce noise. The paper elevates attributes from "supplementary information" to the core modeling unit of dynamic evolution (i.e., "attributes as sequences") and combines graph networks with rotary encoding to resolve the loss of dynamics, the sensitivity to sparsity, and the semantic confusion caused by the auxiliary use of attributes in traditional methods. Its core contribution lies in demonstrating that:

Representing the evolution of user interest in the attribute dimension is more direct, and generalizes better, than item IDs.

Explicit attribute sequences combined with graph-structure modeling can compensate for the information loss of static feature interaction, especially improving applicability in scenarios with structured attributes such as brand and price.

The introduction of rotary encoding gives the model a flexibility in modeling complex sequence dependencies that traditional methods cannot match.

Comments 14: SOTA is used to refer to the CARCA model. What benefits does the suggested model provide, and what are the main distinctions between it and CARCA?

Response 14: The paper's model uses attribute sequences in place of item sequences, attribute-graph association mining, and rotary encoding to strengthen temporal modeling, achieving three major advances over CARCA:

Granularity of interest modeling: from the item level to the attribute level, with stronger generalization;

Dynamic capture efficiency: real-time shifts in user attribute preferences are reflected directly, with faster response;

Computational efficiency: a simplified sequence structure reduces redundant parameters, making the model more suitable for large-scale deployment.

The core difference lies in the paradigm shift from CARCA's "item-attribute dual-stream parallelism" to "attribute single-stream dominance", which addresses CARCA's inherent bottlenecks in attribute sparsity, cold start, and long-sequence modeling.

Comments 15: The study examines several embedding vector-based attribute-aware recommendation models. What distinguishes the attribute embedding technique of the suggested model from these other models?

Response 15: The core differences between the attribute embedding technique proposed in the paper and traditional models can be summarized in three aspects:

Dynamic temporality: attribute embeddings are upgraded from static features to dynamic representations that evolve with the user behavior sequence, replacing the dominant role of item IDs;

Implicit structural association: collaborative and substitutive patterns between attributes are captured through graph propagation rather than hand-crafted rules or explicit feature crossing;

Rotary spatial encoding: complex-domain transformations preserve positional information and avoid the directional disruption of the embedding space caused by traditional additive encodings.

These innovations make attribute embeddings not merely an auxiliary signal but the core carrier of user interest modeling, especially suited to recommendation scenarios with high sparsity, strong timeliness, and heterogeneous multi-attribute items.

Comments 16: How does the proposed method bridge the gap between sequential recommendation and attribute-aware recommendation, and what are the benefits of this integration?

Response 16: The paper achieves a seamless integration of sequential recommendation and attribute-aware recommendation through attribute serialization, bidirectional temporal-semantic modeling, and dynamic gated fusion. Its core value lies in:

Modeling uniformity: shifting from the fragmented design of "item sequence + attribute features" to an integrated paradigm driven by attribute sequences;

Dynamic adaptability: capturing the real-time migration of user attribute preferences (temporal dimension) while characterizing the global semantic network of attributes (spatial dimension);

Scenario breakthroughs: achieving performance gains that traditional models cannot match in cold start, interpretability, and long-tail coverage.

This integration not only compensates for the inherent deficiencies of the two types of recommendation techniques, but also provides a new architectural paradigm for building next-generation user-intent-aware systems.

Comments 17: Based on the related work, what are the most significant challenges that remain in sequential and attribute-aware recommendation, and how does the proposed model attempt to address them?

Response 17: The core challenges of current sequential and attribute-aware recommendation center on dynamic preference adaptation, complex relationship mining, cold-start generalization, positional noise suppression, and the balance between efficiency and interpretability. The paper's model systematically addresses these challenges through dynamic state updates, graph-context enhancement, rotary position encoding, and a gated fusion mechanism. Its innovation lies in transforming attributes from auxiliary information into the core modeling object, achieving bidirectional enhancement between temporal dynamics and semantic associations, and providing key technical support for the next-generation evolution of recommender systems.

Comments 18: How does the proposed model deal with items with very few attributes or a large number of attributes?

Response 18: The paper's model handles items with very few attributes and items with many attributes through four strategies: dimension alignment, graph-network enhancement, dual-gate fusion, and meta-learning transfer.

Few-attribute side: external knowledge completion, shared graph-neighbor information, and temporal weight bias break through the limits of sparse information;

Many-attribute side: attention-based pruning, subgraph focusing, and meta-knowledge distillation suppress noise interference and strengthen the core attribute signals.

This design balances the modeling needs of items with different attribute densities and provides unified support for complex scenarios such as cold start and cross-category recommendation, clearly outperforming traditional static attribute processing.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

There are no other questions.

Reviewer 2 Report

Comments and Suggestions for Authors

The author addressed my concerns and I am inclined to accept.
