Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Dear Authors,
I think the manuscript is thorough and well-structured; however, some shortcomings mean it isn't ready for publication in its current state.
My comments are as follows:
- It would be better to use passive expressions rather than subjective explanations in the summary section.
- You should correct reference 1 in the introduction from [1?] to [1].
- The problem description is given clearly.
- The contribution points to the literature are emphasized in detail.
- The word "Embedding" in Figure 1 is misspelled and needs to be corrected.
- The methodology is detailed and clearly specified.
- It is not clear what the findings in Table 3, Table 4, and Table 5 are. Are these precision, accuracy, or RMSE? You should also state the metric explicitly so that it is clear what has been improved and by how much.
- Similarly, in Figure 2, it is not clear what is being compared, as there are no metrics or units. These also require further explanation.
- Additionally, a discussion section should be added where not only the advantages but also the disadvantages are clearly listed and discussed using known methods.
- In the conclusion section, the improvements achieved should be highlighted with numerical data.
- In the references, no studies from 2025 are cited. It would be better to review this section and add recent work where needed.
I would be glad to review the manuscript again after revision.
Best
Author Response
Dear Reviewer,
We sincerely appreciate the time and effort you devoted to evaluating our manuscript. We have carefully addressed each of your comments, and a full point-by-point response is provided in the attachment together with all revised text.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The article presents mature and relevant research, with very detailed solution architecture and experimental validation. Testing the generalization of the framework on multiple encoders is a plus.
Can be accepted in its current form.
My only recommendation is to use more graphical content, for example to describe the model architecture in more detail. Also, some formulas are well known and the article does not benefit from including them (e.g., Equation (23)).
Some minor proofreading is required (e.g., reference [1], "iintent" on line 119).
Author Response
Dear Reviewer,
We sincerely appreciate the time and effort you devoted to evaluating our manuscript. We have carefully addressed each of your comments, and a full point-by-point response is provided in the attachment together with all revised text.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
While I enjoyed reviewing the paper, I should say it needs major revision. My comments are listed below:
In the abstract, you mention running experiments on two datasets, but later you use three: Beauty, Toys, and ML 1M. Make sure you are consistent and update the numbers to match.
The introduction still has a placeholder reference tag. Fix this and make sure all the references are complete before you submit.
The abstract keeps calling the framework new, but combining semantic item representations, a global item graph, and a transformer-style sequential encoder is similar to existing recommenders. Be more careful with the novelty claim and explain more clearly how your work differs from the closest papers.
The paper uses the same symbol for both randomly initialized embeddings and LLM based embeddings. This is confusing. Say clearly whether there are two different embedding matrices or just one that comes from the LLM, and fix the notation so it is easy to follow.
You say you use a pretrained LLM for the item encoder and mention SBERT, but you do not say which SBERT version you use, whether it is frozen or fine-tuned, or how long the prompts are. Add these details and state clearly whether you train the LLM or just use it as is.
You define top K neighbors and a similarity threshold for the semantic graph, but never say what values you use or how you pick them. Report your choices for each dataset and say a few words about how these choices affect performance.
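For illustration, a minimal sketch of how a top-K / threshold semantic graph is typically built from item text embeddings; the values of k and the threshold below are placeholders, not the authors' settings:

import numpy as np

def build_semantic_graph(item_emb, k=10, sim_threshold=0.5):
    # item_emb: (num_items, dim) array of item text embeddings (e.g., SBERT).
    normed = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    sim = normed @ normed.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)               # exclude trivial self matches
    edges = []
    for i in range(sim.shape[0]):
        for j in np.argsort(sim[i])[::-1][:k]:   # top-K most similar items
            if sim[i, j] >= sim_threshold:       # keep only confident edges
                edges.append((i, int(j), float(sim[i, j])))
    return edges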
The adaptive edge learning module is important, but you only describe it in general terms. Add details like hidden size, how you initialize things, and any regularization on the edge weights. This helps others reimplement the module.
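To indicate the level of detail requested, a hypothetical edge-weight scorer is sketched below; the hidden size, initialization, and output squashing are exactly the kinds of choices that should be reported. This is a generic module, not the authors' implementation:

import torch
import torch.nn as nn

class EdgeScorer(nn.Module):
    # Hypothetical adaptive edge-weight module; all sizes are placeholders.
    def __init__(self, emb_dim, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        for m in self.mlp:
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)   # initialization choice to document

    def forward(self, src_emb, dst_emb):
        score = self.mlp(torch.cat([src_emb, dst_emb], dim=-1))
        return torch.sigmoid(score)                 # bounds learned weights to (0, 1)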
You talk about edge dropout and node dropout, but do not say if you regularize the learned edge weights or limit their range. Say if you tried other regularization and what effect it had on overfitting.
You say you use binary cross entropy on one positive and one negative item, but you do not say how many negatives you sample per interaction or if you pick them randomly or by popularity. Explain your negative sampling strategy and why you chose it.
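As a reference point, the standard one-positive / one-negative setup with uniformly sampled negatives looks like the sketch below; whichever variant is actually used (uniform vs. popularity-weighted, one vs. many negatives) should be stated:

import torch
import torch.nn.functional as F

def bce_pairwise_loss(pos_logit, neg_logit):
    # One positive and one negative logit per interaction.
    pos = F.binary_cross_entropy_with_logits(pos_logit, torch.ones_like(pos_logit))
    neg = F.binary_cross_entropy_with_logits(neg_logit, torch.zeros_like(neg_logit))
    return pos + neg

def sample_uniform_negative(num_items, seen_items):
    # Resample until the negative item was not interacted with by the user.
    neg = torch.randint(0, num_items, (1,)).item()
    while neg in seen_items:
        neg = torch.randint(0, num_items, (1,)).item()
    return neg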
You use Recall when defining metrics, but later in the tables you switch to HR. Say clearly that HR at K is the same as Recall at K here, and use the same words in both the text and the tables.
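For clarity, the equivalence in question: when each user has exactly one held-out ground-truth item, Recall at K and HR at K reduce to the same quantity, as in this small check:

def hit_rate_at_k(ranked_lists, ground_truth, k=10):
    # With a single relevant item per user, HR@K == Recall@K:
    # each user contributes 1 if the item is in the top-K list, else 0.
    hits = sum(1 for ranked, gt in zip(ranked_lists, ground_truth) if gt in ranked[:k])
    return hits / len(ground_truth)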
You say your method beats strong baselines, but you only give numbers from a single run and do not report any variance or significance tests. Add multiple runs with standard deviations or basic significance testing so readers know if the improvements are real.
You list several hyperparameters, but do not say what hardware you used or how long training took. Add a short paragraph with GPU type, training time per epoch, and total training time compared to at least one strong baseline. This will back up your claim that the method is practical.
You say the maximum sequence length is 50 for Beauty and 165 for ML 1M, but do not mention Toys. Give the maximum sequence length for all datasets and say if this choice helps any method more than others.
You say you merge metadata for Amazon and use titles and genres for ML 1M, but do not say if the baselines also get to use these features. Make it clear if LLM based semantic information is only used by SAGERec or also by some baselines, and talk about whether the comparison is fair.
You mention recent recommendation work that uses language models, but do not compare with any of these models in your experiments. Either add at least one strong LLM based baseline or explain clearly why you cannot do this in your study.
The conclusion says your approach is promising for real world systems, but you do not provide any case studies or examples where the method helps with long tail items. Add a short analysis, such as nearest neighbor examples or user sequences where your method changes the ranking in a clear way.
There are small typos and words that are broken up, like 'iintent' and 'Attribution' split across a line. Some references also say 'Proceedings of the Proceedings.' Go through the paper and fix these issues to improve the writing.
In Section 4, the move from describing the framework to talking about embeddings and the semantic graph is dense. Add a small example with a real item to help readers understand the steps from metadata to LLM embeddings to graph edges.
You say you do not include self loops in the adjacency matrix. Say if you tried using self loops and whether it made any difference.
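For reference, the self-loop variant is typically a one-line change to the adjacency matrix, so an ablation is cheap to report (generic sketch, not the authors' code):

import numpy as np

def add_self_loops(adj):
    # A' = A + I, the usual GCN-style self-loop augmentation.
    return adj + np.eye(adj.shape[0])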
You mention edge dropout on the semantic graph, but do not say if you rebuild or update the graph during training, for example by recomputing similarities. Say if the graph is fixed after you build it, or if you update it, and explain why you chose this.
Your discussion of related work on hybrid graph and sequential models is brief, even though some are very similar to your framework. Expand this section and compare more clearly how those models use graph information compared to SAGERec.
The experimental section would be better with a short complexity comparison against SINE or another graph based method, since both use item item graphs. Adding a table with node and edge counts, memory use, and training time per epoch would make your efficiency claims clearer.
The captions for Tables 3 and 4 are not clear about whether the numbers are for top 10 or top 20. Make sure each table says if it is top 10 or top 20, and use the same wording in the captions and in the main text.
You say your method helps with sparse or noisy interactions, but there is no test on short sequences or cold users. Try adding results where you report metrics for users with short histories to support this claim.
The limitations section says you will work on flexibility and aligning semantics with behavior in the future, but it would help to be more specific about current weaknesses. For example, talk about problems with noisy metadata or if SBERT was trained on a different domain. List these so readers know where SAGERec might not work well.
Author Response
Dear Reviewer,
We sincerely appreciate the time and effort you devoted to evaluating our manuscript. We have carefully addressed each of your comments, and a full point-by-point response is provided in the attachment together with all revised text.
Author Response File:
Author Response.pdf
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
Thanks to the authors for addressing my comments.