Peer-Review Record

Introducing CSP Dataset: A Dataset Optimized for the Study of the Cold Start Problem in Recommender Systems

Information 2023, 14(1), 19; https://doi.org/10.3390/info14010019
by Julio Herce-Zelaya 1,*, Carlos Porcel 1, Álvaro Tejeda-Lorente 2, Juan Bernabé-Moreno 2 and Enrique Herrera-Viedma 1
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 17 September 2022 / Revised: 20 December 2022 / Accepted: 22 December 2022 / Published: 29 December 2022
(This article belongs to the Special Issue Information Retrieval, Recommender Systems and Adaptive Systems)

Round 1

Reviewer 1 Report (Previous Reviewer 2)

The authors resubmitted the paper under a new ID. Compared to the previous submission (1780253), the new version resolves most of the concerns. However, there are still some language and style issues in the writing. For instance, in line 82, a paragraph consists of only one sentence. It is suggested that the authors find somebody to help revise the entire paper.

Author Response

Dear reviewer, we want to thank you for your valuable comments, recommendations and findings. Please find below the response to your points:

- However, there are still some language and style issues in the writing. For instance, in line 82, a paragraph consists of only one sentence. It is suggested that the authors find somebody to help revise the entire paper.

An extensive grammar proof has been performed on the whole article. We have also rewritten and split some paragraphs to add clarity.

Reviewer 2 Report (New Reviewer)

The paper proposes a new dataset for the evaluation of recommender systems. The dataset, Filmaffinity enriched with user data from Twitter, is tailored towards the cold start problem. Having a standard benchmark for the cold-start recommendation task would be a valuable addition to the literature. However, I miss clear specifications of the evaluation protocol and the results of at least a simple baseline, if not some state-of-the-art methods. A clear evaluation protocol is particularly important for future work to obtain comparable results, especially because the dataset aims at the cold start problem.

In the end, I appreciate the effort of putting together a dedicated benchmark dataset, from which the recommender systems community can profit. Yet, I believe the paper would require some baseline results and a clear evaluation protocol to be published. I encourage the authors to add these missing items in a revision or resubmission.

Major issues:

1) No results for a baseline. Running a baseline, such as the simplest imaginable one, or a state-of-the-art method would give readers a hint of how challenging this dataset is to work with, and what can be gained from focusing their research on it. For example, I point the authors to [1] and [2], which consider the cold start problem as a central theme of their research.

2) No clear evaluation protocol, such as a fixed train/dev/test split or a suggestion of how to cross-validate results. I deem this important, because only with a clear protocol will future researchers working on this dataset generate comparable results.

Minor issues:

- line 441: The prediction that this dataset will be used by a "massive amount of researchers" is quite bold. Please consider rephrasing this part as your expectation.

- Style: Readability suffers from the combination of indented paragraphs and very short paragraphs. I suggest expanding where appropriate, and otherwise merging paragraphs or removing the indentation.

- Styling of equations: Eq 2 is centered, while Eq 3 is not.

- line 424: double comma

- line 441: This work provides

- Some figures need to be improved. What is Figure 7? The caption says it would be a formula, while it seems to be a screenshot of the result of a calculated metric. The presentation needs improvement.

- In Table 3, it says "Figure??"

- If space is needed, some of the exemplary data can be removed/shortened

- Please check comma placement (like after 'Therefore', 'Moreover', in the entire draft)


[1] Majumdar, Angshul, and Anant Jain. “Cold-Start, Warm-Start and Everything in between: An Autoencoder Based Approach to Recommendation.” In 2017 International Joint Conference on Neural Networks (IJCNN), 3656–63, 2017. https://doi.org/10.1109/IJCNN.2017.7966316.

[2] Vagliano, I., L. Galke, and A. Scherp. “Recommendations for Item Set Completion: On the Semantics of Item Co-Occurrence with Data Sparsity, Input Size, and Input Modalities.” Information Retrieval Journal, April 4, 2022. https://doi.org/10.1007/s10791-022-09408-9.


Author Response

Dear reviewer, we want to thank you for your valuable comments, recommendations and findings. Please find below the response to your points:

1) No results for a baseline. Running a baseline, such as the simplest imaginable one, or a state-of-the-art method would give readers a hint of how challenging this dataset is to work with, and what can be gained from focusing their research on it. For example, I point the authors to [1] and [2], which consider the cold start problem as a central theme of their research.

We have added references to the suggested works. We have also chosen a baseline and explained how our work compares against it: "For the baseline, we have chosen the recent work on recommendations for item set completion \cite{Vagliano2022}, taking the scenario of predicting subject labels for the EconBiz dataset, where a partial set of items along with the title is given, using an SVD model. This scenario, which is comparable to the cold-start problem faced in this work, obtains an MRR of around 45%. In comparison, the model proposed in our work on the CSP dataset obtains an MRR of 0.457."

2) No clear evaluation protocol, such as a fixed train/dev/test split or a suggestion of how to cross-validate results. I deem this important, because only with a clear protocol will future researchers working on this dataset generate comparable results.

We have added a description of the protocol for the train-test split: "The split between training and test data is 80%-20%. The split is made on user IDs, so that users in the test data do not appear in the training data, emulating a new user's cold-start scenario." In the "Materials and Methods" section we have also shared the dataset and the code used for creating the recommendations.
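For illustration, a minimal sketch of such a user-based split, assuming pandas-style data with a user_id column (the column name is an assumption for illustration; the dataset and code shared in "Materials and Methods" are authoritative):

```
# Minimal sketch of an 80/20 split on user IDs, so that test users
# never appear in the training data (new-user cold-start scenario).
import numpy as np
import pandas as pd

def split_by_user(ratings: pd.DataFrame, test_frac: float = 0.2, seed: int = 42):
    rng = np.random.default_rng(seed)
    users = ratings["user_id"].unique()
    rng.shuffle(users)                       # randomize user order
    n_test = int(len(users) * test_frac)
    test_users = set(users[:n_test])         # 20% of users held out
    test_mask = ratings["user_id"].isin(test_users)
    return ratings[~test_mask], ratings[test_mask]
```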

- line 441: The prediction that this dataset will be used by a "massive amount of researchers" is quite bold. Please consider rephrasing this part as your expectation.

We have replaced the quoted text "This work provides a dataset that will be used by a massive amount of researchers to create their models in the future with cutting-edge algorithms..." with the following: "This work provides a dataset of high interest due to the scarcity of datasets providing extensive item and user behavioral features. The dataset will enable researchers to create their models in the future with cutting-edge algorithms..."

- Style: Readability suffers from the combination of indented paragraphs and very short paragraphs. I suggest expanding where appropriate, and otherwise merging paragraphs or removing the indentation.

Following your advice, we have merged some related short paragraphs. Unfortunately, we cannot do anything about the indentation, since it is imposed by the journal template.

- Styling of equations: Eq 2 is centered, while Eq 3 is not.

We have fixed that.

- line 424: double comma

We have removed the double comma.

- line 441: This work provides

We have fixed this.

- Some figures need to be improved. What is Figure 7? The caption says it would be a formula, while it seems to be a screenshot of the result of a calculated metric. The presentation needs improvement.

We have replaced the caption with: "Accuracy classification score result".

- In Table 3, it says "Figure??"

We have fixed that.

- Please check comma placement (like after 'Therefore' and 'Moreover') in the entire draft.

We have performed an extensive grammar check on the whole article.

Round 2

Reviewer 2 Report (New Reviewer)

Dear authors,

I carefully checked your response and the revised manuscript. I appreciate the changes. I feel the addition of a baseline model and the clarity on the train/test split have improved the paper. However, I do have a few more points, which I hope will further improve the manuscript:

- I understand that the dataset is the main contribution of the article. Yet the extensive figures and exemplary tables somewhat hinder the reading flow. I suggest double-checking whether the number of rows presented in the exemplary tables can be reduced.

- Line 170: I suggest spelling out the link, in case people read a printed version of the article.

- Where does the model in Section 3.2.3 come from? I suggest adding a citation to give credit if it is an existing method. Or do you consider it a contribution of the paper?

- The clarity of the model description in Section 3.2.3 could be improved. Although formula (1) provides a good overview, the following subsubsections do not connect with the introduced formula. Side note: the actual label (1) is missing for formula 1.

- Table 4: what is the meaning of the NaN values? It should be described in the text. Is it a problem, or expected?

- Line 400: it is not an actual evaluation, but rather an example of the final output, if I'm not mistaken.

- Figure 7 does not need to be a Figure; the only information is 60% accuracy and 0.457 MRR, which can be expressed in a sentence. In addition, it is not clear whether this is only one example or an average over the entire test set. Normally, I'd assume it is an average, but given that the previous tables contain so much example data, I suggest making it explicit.

- Lines 424-426: I understand that this part aims to explain the choice of model. However, the writing needs to be improved: "we have picked the predicted subject labels [...]" does not seem accurate. 


Author Response

Dear Reviewer 2, we want to thank you enormously for your thorough review and great suggestions to help us improve the work. Please find below our responses to your comments:

- I understand that the dataset is the main contribution of the article. Yet the extensive figures and exemplary tables somewhat hinder the reading flow. I suggest double-checking whether the number of rows presented in the exemplary tables can be reduced.

We have removed exemplary rows and columns from Tables 2, 4, 5, 6, and 7.

- Line 170: I suggest spelling out the link, in case people read a printed version of the article.

We have spelled out the link.

- Where does the model in Section 3.2.3 come from? I suggest adding a citation to give credit if it is an existing method. Or do you consider it a contribution of the paper?

This is a contribution of our work.

- The clarity of the model description in Section 3.2.3 could be improved. Although formula (1) provides a good overview, the following subsubsections do not connect with the introduced formula. Side note: the actual label (1) is missing for formula 1.

We have added the missing formula for the affinity-based approach and a formula for the overall aggregation of the two models, to clarify the overall logic of the model; the formulas now match the following subsections.

- Table 4: what is the meaning of the NaN values? It should be described in the text. Is it a problem, or expected?

We have added text explaining what the NaN values mean:
```
The NaN values mean that the user has not rated the movie. Movies that have not been rated are ignored when calculating the total average rating for every movie. Movies with no rating from any high-affinity user are not taken into consideration for the recommendations.
```
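As a minimal illustration of this NaN handling (not the released code; column and variable names are hypothetical), pandas-style averaging skips NaN entries, and movies nobody rated drop out:

```
# Sketch: unrated movies are NaN; they are skipped when averaging,
# and movies with no rating from any high-affinity user drop out.
import numpy as np
import pandas as pd

ratings = pd.DataFrame(
    {"movie_a": [4.0, np.nan, 5.0], "movie_b": [np.nan] * 3},
    index=["user_1", "user_2", "user_3"],  # high-affinity users
)
avg = ratings.mean(axis=0, skipna=True).dropna()
print(avg)  # only movie_a survives, with mean 4.5
```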

- Line 400: it is not an actual evaluation, but rather an example of the final output, if I'm not mistaken.

We have rephrased the explanation of the table:
```
In Table~\ref{tab:table6} an example of the final outcome can be found, with the actual ratings from a certain user for every movie and the predictions from both models, together with the aggregated prediction.
```
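Purely as a hypothetical sketch of such an outcome table (the aggregation rule is not stated here, so a simple mean of the two model predictions is assumed, with made-up toy values):

```
# Hypothetical outcome table: actual ratings, two model predictions,
# and an aggregated prediction (assumed here to be their mean).
import pandas as pd

outcome = pd.DataFrame({
    "actual_rating": [4.0, 2.5, 5.0],
    "behavior_pred": [3.8, 3.0, 4.6],   # illustrative model A output
    "affinity_pred": [4.2, 2.2, 4.8],   # illustrative model B output
})
outcome["aggregated_pred"] = outcome[["behavior_pred", "affinity_pred"]].mean(axis=1)
print(outcome)
```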

- Figure 7 does not need to be a Figure; the only information is 60% accuracy and 0.457 MRR, which can be expressed in a sentence. In addition, it is not clear whether this is only one example or an average over the entire test set. Normally, I'd assume it is an average, but given that the previous tables contain so much example data, I suggest making it explicit.

We have removed Figure 7 and added a description of the results instead:
```
The average results of these two metrics applied to the whole dataset are an MRR of 0.457 and an accuracy of 60%.
```
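For reference, a minimal sketch of MRR averaged over a test set, assuming one ranked recommendation list and one set of relevant items per user (the exact definitions used in the released code may differ):

```
# Mean Reciprocal Rank: average of 1/rank of the first relevant item
# in each user's ranked recommendation list (0 if none is relevant).
def mean_reciprocal_rank(rankings, relevant):
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        rr = 0.0
        for i, item in enumerate(ranked, start=1):
            if item in rel:
                rr = 1.0 / i
                break
        total += rr
    return total / len(rankings)

# Example: first user's hit at rank 2, second user's at rank 1 -> 0.75
print(mean_reciprocal_rank([["a", "b"], ["c", "d"]], [{"b"}, {"c"}]))
```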

- Lines 424-426: I understand that this part aims to explain the choice of model. However, the writing needs to be improved: "we have picked the predicted subject labels [...]" does not seem accurate.

We have rephrased it and added a more appropriate description of the choice of the model:
```
For the baseline, we have chosen the recent work on recommendations for item set completion \cite{Vagliano2022}. In order to choose a scenario similar to the user cold-start problem covered in this work, we selected the prediction of subject labels for the EconBiz dataset, where a partial set of items along with the title is given, using an SVD model. This scenario, which is comparable to the cold-start problem faced in this work, obtains an MRR of around 45%. In comparison, the model proposed in our work on the CSP dataset obtains an MRR of 0.457.
```

Author Response File: Author Response.pdf

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

This paper released a dataset for cold-start recommendation. As we know, the cold-start problem is a big concern in today's recommender systems, and thus the release of such a dataset can benefit a lot of researchers. The authors introduce the details of how to obtain the dataset, give some analysis, and then conduct some simple recommendation tasks on this dataset.

The positive points include:

(1) It is beneficial to the community to build such a dataset.

(2) The details of datasets are well introduced.

(3) The dataset is released at a public link.

The negative points include:

(1) The presentation is awful. The figures and tables are a mess, which may make readers struggle. The authors should pay more attention to organizing the paper well and improving the tables and figures, for example by referring to recently published papers in this journal.

(2) There is no explicit comparison between the released dataset and existing datasets for cold-start recommendation. More discussion is required to establish that the dataset is a significant contribution.

(3) The experiments (Section 4) should be improved. There should be clearer settings, strong methods, well-discussed conclusions, etc.


Author Response

Dear reviewer, we want to thank you for your valuable remarks. Please find below our response to your comments:

(1) The presentation is awful. The figures and tables are a mess, which may make readers struggle. The authors should pay more attention to organizing the paper well and improving the tables and figures, for example by referring to recently published papers in this journal.

We have changed the structure of the paper, adding more clarity to the overall picture. Additionally, we have removed some code blocks that were redundant, since the code is linked from the project. We have improved some tables and added formulas to show more clearly how the algorithm works.

(2) There is no explicit comparison between the released dataset and existing datasets for cold-start recommendation. More discussion is required to establish that the dataset is a significant contribution.

We have added to Section 2.2 a description of the datasets that are often used for models that deal with the cold-start problem, presenting their pitfalls. In Section 5 (Discussion) we have provided arguments on why our dataset provides implicit user behavioral features that other datasets lack.

(3) The experiments (Section 4) should be improved. There should be clearer settings, strong methods, well-discussed conclusions, etc.

We have added a clearer description of the model, providing a formula for it. We have removed unnecessary code blocks and improved the conclusion section.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper presents a multi-source dataset optimized for the study and alleviation of the cold start problem. The dataset contains users, items (movies), and ratings with some contextual information. An example user-behavior-driven algorithm is also presented on the constructed dataset for creating recommendations under the cold start situation.


Some concerns need to be addressed in the revision.

(1) The title might be better if the authors consider “Introducing CSP dataset: A dataset optimized for the study of the cold start problem in recommender systems”.

(2) In Section 2.3, more recently published references on the cold start problem should be added and analyzed, such as [1].

(3) The data processing content might be summarized more briefly; it is not necessary to show all the specific code and result matrices in the text.

(4) Both British English and American English spellings can be found in this article. It might be better to unify them in the revision.

(5) Many typos need to be corrected. For example, “recommender systems models” should probably be “recommender system models” in line 39; “To” should be “to” in line 80; and lines 160, 288, and 422 (or more) might be organized under subtitles.

[1] Feng et al. RBPR: A hybrid model for the new user cold start problem in recommender systems. Knowledge-Based Systems, 2021.

Author Response

Dear reviewer, we want to thank you for your valuable remarks. Please find below our response to your comments:

(1) The title might be better if the authors consider “Introducing CSP dataset: A dataset optimized for the study of the cold start problem in recommender systems”.

We have changed the title to the suggested one.

(2) In Section 2.3, more recently published references on the cold start problem should be added and analyzed, such as [1].

We have added more recent references, including the suggested one.

(3) The data processing content might be summarized more briefly; it is not necessary to show all the specific code and result matrices in the text.

We have summarized the description of the model, removing unnecessary code blocks and adding a formula for the model.

(4) Both British English and American English spellings can be found in this article. It might be better to unify them in the revision.

We have unified the style to American English throughout.

(5) Many typos need to be corrected. For example, “recommender systems models” should probably be “recommender system models” in line 39; “To” should be “to” in line 80; and lines 160, 288, and 422 (or more) might be organized under subtitles.

We have performed a thorough review in which we have corrected the typos indicated and improved some other expressions. We have also organized some subsections to add more clarity.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

We thank the authors for their quick revision.

The authors have marked the revised text in another color. The revisions indeed lead to some improvement. However, the concerns from my last review still remain, as follows.

1. The figures and tables are still a mess. The revised version does not improve the tables and figures. The textual descriptions in the figures are still very unclear.

2. Although the authors have discussed the difference between the existing datasets and the dataset in this paper (line 424), the explanations are not so solid. For example, the provided weather information may not be so useful.

3. The weaknesses in the experiments, including the settings, strong baselines, etc., have not been well addressed.

Given the above reasons, I still hold my score.

Author Response

Dear reviewer, we want to thank you again for your valuable remarks. Please find below our response to your comments:

1. The figures and tables are still a mess. The revised version does not improve the tables and figures. The textual descriptions in the figures are still very unclear.

We have fixed all the tables and removed some unnecessary figures. We have also improved some figures to make them more readable.

2. Although the authors have discussed the difference between the existing datasets and the dataset in this paper (line 424), the explanations are not so solid. For example, the provided weather information may not be so useful.

We have reformulated part of the conclusion and of Section 2.2, adding more information to make them clearer and to better explain the key strength of the CSP dataset (user-related features that make more customized recommendations possible).

3. The weaknesses in the experiments, including the settings, strong baselines, etc., have not been well addressed.

Through the changes in the tables and figures, we have made clearer how the experiments are performed. Although the main actor of this work is the dataset itself, the algorithm used is, as shown in the results, performing at an optimal level.

Author Response File: Author Response.pdf

Reviewer 2 Report

Thanks for the authors' revisions. All my concerns have been solved in this version.

Author Response

Thank you very much for your help.
