Peer-Review Record

Predicting Implicit User Preferences with Multimodal Feature Fusion for Similar User Recommendation in Social Media

Appl. Sci. 2021, 11(3), 1064; https://doi.org/10.3390/app11031064
by Jenq-Haur Wang 1,*, Yen-Tsang Wu 1 and Long Wang 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 23 December 2020 / Revised: 19 January 2021 / Accepted: 21 January 2021 / Published: 25 January 2021
(This article belongs to the Special Issue Principles and Applications of Data Science)

Round 1

Reviewer 1 Report

The task of automatic recommendation has traditionally been based on user-based and object-based systems, often using a collaborative filtering approach (and content filtering), and often combining the two to create hybrid models. These approaches make use of past interactions and experiences and assume that what created a relationship in the past will also do so in the future.

The authors of this paper take a quite different approach: first by using hashtags as a basis for what users want and like, and eventually a combination of the hashtags. While I do not assert that this is wrong and should not be used, a good starting point would be to show that it works for real people in real-world user-recommendation situations. At least, please show some evidence that the connection you are proposing exists.

It seems to me that the general idea of combining text and image features is interesting and, ultimately – I believe! – it will play a role of major importance; autoencoders for dimensionality reduction are also a good idea (although in this case you are using a technique that will lead you to poorly controlled results); the division between early and late fusion is also interesting to analyze.

However, from reading your paper, I feel many of the connections between these ideas are not really assessed for their quality.

You use the accuracy metric to assess ImageCNN and TextCNN, and a score to assess the result of your recommendation system, which seems fair to me. However, I was then expecting to see the evolution of that score in a ROC-based evaluation, or at least some metric to assess the recall of the method.

On the other hand, what I do not think is fair is the baseline models you made: the methodology for their creation simply leaves out the interesting parts of your own methodology. This will clearly, and deliberately, lead to worse results, and that is not fair. What I think is a fair comparison is to pick a standard recommendation methodology (object- or user-based) and compare it with yours.

In section 5 you mix results in terms of 'scores' and 'accuracy', which makes it harder to understand how good the system is.

From the results, you conclude that the methodology is good because it led to a big improvement over the baseline (which I criticized above as unfair) and compare the score between two datasets. But, actually, what you got was the system saying that if it gets more hashtags it becomes confused, and that it would probably not scale well.

Therefore, I believe you would need to revise some aspects of your experiments to get more convincing results.

I also noted some minor issues:

Avoid situations like the heading of a new section alone at the end of a page (line 146).

In line 257, correct the phrase "to find out more important" (most).

In line 258 you refer to the use of autoencoders using a “single-hidden h-dimensional neural network”, where h is first mentioned in line 185 as the size of a window with h words. However, it is not clear how you arrive at that value. Please make sure these two ‘h’ represent the same quantity, and explain how they were actually computed. Saying “experimentally” is clearly not enough.

In table 2 you should maintain a more consistent division of the rows.

In line 282 you refer to the calculation of cluster similarity. This is not new research, and there is vast work on the subject from which you can profit. To deal with this you use a mixture of online and offline methods (due to time constraints in real-world situations). However, the clusters cannot stay the same forever: as new users join them, the centroids will progressively change/move. Therefore, I wonder whether you have some kind of threshold to trigger a new clustering process to update the centroids?
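A minimal sketch of the kind of trigger the reviewer is asking about (purely illustrative, not part of the manuscript; the Euclidean drift measure, the helper name, and the threshold value are all assumptions):

```python
import numpy as np

def needs_reclustering(old_centroids, features, labels, threshold=0.5):
    """Recompute each centroid from its current members and flag a full
    re-clustering when any centroid has drifted beyond the threshold."""
    max_drift = 0.0
    for k, old in enumerate(old_centroids):
        members = features[labels == k]
        if len(members) == 0:
            continue  # empty cluster: nothing to compare
        new = members.mean(axis=0)
        max_drift = max(max_drift, float(np.linalg.norm(new - old)))
    return max_drift > threshold
```

When the check fires, the offline clustering step would be re-run and the online similarity lookups switched to the fresh centroids.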

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Dear authors!

New methods for predicting users' preferences in social networks are a very interesting subject. It was my pleasure to read and review your comprehensive paper. I have some proposals to improve your paper:

  1. Does your system distinguish posted and re-posted images? One user can post various content, e.g. drawings and photos of the same thing. Is your algorithm able to take this into account?
  2. You declare that your approach gives "an accuracy of 0.491". This is a relatively low accuracy for ANN technology. Please explain this in the text.
  3. It is known that Instagram contains mainly images and videos. What about audio, GIF animations, and files, as in other social networks? Is your approach suitable for this task?
  4. Subsections 4.2.1-4.2.6 are too short. Please restructure the paper.
  5. A more thorough comparison with existing predictors should be given.
  6. Is your approach able to act as an intelligent data crawler in social networks? If yes, please indicate this in the conclusion as a possible application.

I like this paper and will recommend it for publication after revisions.

Author Response

Q: New methods for predicting users' preferences in social networks are a very interesting subject. It was my pleasure to read and review your comprehensive paper. I have some proposals to improve your paper:

A: Thanks for your encouragement and valuable comments. We are doing our best to improve our paper.

 

Q: Does your system distinguish posted and re-posted images? One user can post various content, e.g. drawings and photos of the same thing. Is your algorithm able to take this into account?

A: Thanks for your valuable comments. We have included the corresponding explanation in the discussion section.

No, we do not distinguish between posted and re-posted images if they are captured at the same resolution. Since the features of all images are extracted by the same CNN architecture (VGG16), posted and re-posted images will have the same features if they are represented by the same pixels. This will classify them into the same category, from which our recommendation is made. Following the same line of thought, various content such as drawings and photos of the same thing will not have exactly the same features, since it would be difficult to mimic a photo when drawing the same thing. However, in recent years CNN models have been demonstrated [1] to improve the performance of sketch-based image retrieval (SBIR) by extracting deep features. Since we utilize VGG16 with multiple convolutional and pooling layers, our algorithm is able to extract the important semantic features from drawings and photos of the same thing that give similar classification results.

[1] Deng Yu, Yujie Liu, Yunping Pang, Zongmin Li, Hua Li, A multi-layer deep fusion convolutional neural network for sketch based image retrieval, Neurocomputing, Vol. 296, Pages 23-32, 2018.

 

Q: You declare that your approach gives "an accuracy of 0.491". This is a relatively low accuracy for ANN technology. Please explain this in the text.

A: Thanks a lot for your valuable comments. We have changed the term “score” to “top-k accuracy”, which is the evaluation metric defined in Eq.(8). The corresponding explanation has been clarified in the text.

Text or image classification using ANN models does have better performance in terms of accuracy than the baseline models, which are not neural network-based, as we have shown in our experiments (Sec. 4.2.2).

However, in this paper, we focus on the task of user recommendation. Existing recommender systems focus on item recommendation based on user ratings of different items. User recommendation is more challenging since we have many users, and each user has different attributes and preferences that cannot be exactly the same as anyone else's. Multiclass classification is more difficult than binary classification. To the best of our knowledge, there is no publicly available dataset for user recommendation. Since there is no ground truth in real-world data, we assume that the hashtags posted by users represent their implicit preferences. For each user, the users with more similar hashtags are taken as the ground truth of his/her similar users. Then, we evaluate the performance by the average top-k accuracy of each system-generated user list, as shown in Eq.(8).
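As a sketch of how such a metric could be computed (Eq.(8) is not reproduced in this response, so the exact normalization and the helper names below are assumptions):

```python
def top_k_accuracy(recommended, ground_truth, k):
    """Fraction of the ground-truth similar users found in the top-k
    positions of the system-generated list (hit rate at k)."""
    hits = len(set(recommended[:k]) & set(ground_truth))
    return hits / min(k, len(ground_truth))

def average_top_k_accuracy(rec_lists, truth_lists, k):
    """Average the per-user top-k accuracy over all users."""
    scores = [top_k_accuracy(r, t, k) for r, t in zip(rec_lists, truth_lists)]
    return sum(scores) / len(scores)
```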

One of the reasons why the top-k accuracy for user recommendation is not as high as the accuracy in classification tasks is that user-provided hashtags are very diverse. Since our approach is evaluated based on hashtags, this shows a limitation of our proposed approach. But as shown in the experimental results, our proposed approach can outperform the single-modal feature cases. We have demonstrated the effectiveness of multi-modal feature fusion from texts and images for user recommendation. In the future, we plan to resolve the issue of diverse hashtags by consolidating the semantic meanings of hashtags using word embedding models or state-of-the-art deep learning models for linguistic tasks, such as transformers or BERT.

 

Q: It is known that Instagram contains mainly images and videos. What about audio, GIF animations, and files, as in other social networks? Is your approach suitable for this task?

A: Thanks for your valuable comments. Currently, we only focus on the major multimedia types in social networks: images and texts. Our approach is able to handle GIF animations if we add a preprocessing step that extracts the individual image frames and combines their features (for example, by averaging) into one feature vector for the whole animation. Audio and other file types are not currently considered and are left as future work.
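The frame-averaging idea mentioned above could be sketched as follows (an illustration assuming the per-frame CNN feature vectors have already been extracted; frame decoding and VGG16 itself are not shown):

```python
import numpy as np

def animation_feature(frame_features):
    """Combine the per-frame feature vectors of a GIF animation into a
    single vector by element-wise averaging."""
    frames = np.asarray(frame_features, dtype=float)  # shape: (n_frames, dim)
    return frames.mean(axis=0)
```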

 

Q: Subsections 4.2.1-4.2.6 are too short. Please restructure the paper.

A more thorough comparison with existing predictors should be given.

A: Thanks a lot for your valuable comments. We have restructured Sec. 4.2.1-4.2.6 and extensively revised and elaborated on the performance comparisons and discussions of the different parameters in each experiment in this revision.

 

Q: Is your approach able to act as an intelligent data crawler in social networks? If yes, please indicate this in the conclusion as a possible application.

I like this paper and will recommend it for publication after revisions.

A: Thanks a lot for your encouragement and valuable comments. We have added the following paragraph in our discussion.

As a potential application of our proposed approach, we can utilize the similar-user recommendation algorithm to build an intelligent data crawler for social networks. For example, based on the targeted topics of interest, we can utilize our proposed approach to discover related posts, with multimedia content, and related user information. Then, by clustering similar users based on user preferences, it would be possible to further expand our proposed approach across multiple social networks that might differ in structure. This could help with the problem of social network analysis across different social networks.

 

Reviewer 3 Report

The manuscript presents a feature fusion approach for user-preference learning by combining user post contents and related image features for recommending similar users on Instagram. The manuscript is technically and scientifically interesting, and I would be glad to see it published.

In my opinion, the only issue is that a further comparison with some state-of-the-art approaches could be useful. The authors compared their proposal with two baselines based on classical cosine similarity. If other recommendation models based on feature fusion are currently available in the literature, a comparison with such approaches would better support the authors' claim. Conversely, if there are no other comparable approaches, the authors should state and clarify that.

The other key points of the paper are perfectly clear and satisfying: the approach is well described and detailed, all choices are well explained and motivated, the experiments are performed in an exhaustive way, and the related work section is adequate.

I would suggest the following modifications:

  • In the related work section, from line 130, both early and late feature fusion methods are described in detail (formulas included). I would move this part from Related Work directly to Section 3 (The proposed method).
  • I would describe the baseline models not in the Evaluation Metric paragraph, but in an appropriate subsection, to emphasize their adoption.
  • I suggest also motivating, in the Experiments section, the choice of the minimum and maximum numbers of posts per user (respectively, 9 and 45).

Spelling and formatting: the paper is, in general, well structured and readable. 
Minor errors and typos:

  • Line 32: Given huge amount... --> Given the huge amount...
  • Line 34: ...information that interest... --> ...information that interests...
  • Line 39: ...information which are often... --> ...information which is often...
  • Line 51: ...deep learning techniques such as... --> ...deep learning techniques, such as...
  • from line 183: all terms of formulas, such as xi, should be written in a proper font (if using latex, I suggest writing all terms in a mathematic environment). Please, pay attention also to subscripts chars.
  • Line 302: ...dataset that include... --> ...dataset that includes...
  • Line 411: calculations are need --> calculations are needed

 

 

Author Response

Q: The manuscript presents a feature fusion approach for user-preference learning by combining user post contents and related image features for recommending similar users on Instagram. The manuscript is technically and scientifically interesting, and I would be glad to see it published.

A: Thanks a lot for your encouragement and valuable comments.

 

Q: In my opinion, the only issue is that a further comparison with some state-of-the-art approaches could be useful. The authors compared their proposal with two baselines based on classical cosine similarity. If other recommendation models based on feature fusion are currently available in the literature, a comparison with such approaches would better support the authors' claim. Conversely, if there are no other comparable approaches, the authors should state and clarify that.

A: Thanks very much for your valuable comments. We have added a comparison with some related works, as follows:

For user or people recommendation in social networks, most existing approaches rely on social relations and network structures in addition to content similarity. For example, Chen et al. [1] found that social network structure tends to surface known contacts, while content similarity helps to find new friends. Hannon et al. [2] considered content-based techniques and collaborative-filtering approaches based on the followees and followers of users.

Armentano et al. [3] proposed recommending relevant users by exploring the topology of the network. Since content-based approaches tend to have low precision, while collaborative-filtering approaches based on follower-followee relations have data sparsity issues, Zhao et al. [4] proposed a community-based approach that applies an LDA-based method to follower-followee relations to discover communities before applying matrix factorization for user recommendation. Gurini et al. [5] proposed extracting semantic attitudes from user-generated content, including sentiment, volume, and objectivity, and conducted people recommendation using matrix factorization. In this paper, since there are no social network structures available, we use a content-based approach as our baseline model. We also included word embeddings from a pretrained Word2Vec model to capture the semantic information of texts.

[1] Jilin Chen, Werner Geyer, Casey Dugan, Michael Muller, Ido Guy, "Make New Friends, but Keep the Old”–Recommending People on Social Networking Sites, CHI '09: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2009, pp.201–210.

[2] John Hannon, Mike Bennett, Barry Smyth, Recommending Twitter Users to Follow Using Content and Collaborative Filtering Approaches, RecSys '10: Proceedings of the fourth ACM conference on Recommender systems, 2010, pp.199–206.

[3] Armentano MG, Godoy D, Amandi A. Topology-based recommendation of users in micro-blogging communities. Journal of Computer Science and Technology, 27(3): 624-634, 2012.

[4] Gang Zhao, Mong Li Lee, Wynne Hsu, Wei Chen, Haoji Hu, “Community-Based User Recommendation in Uni-Directional Social Networks,” CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management, 2013, pp.189–198.

[5] Davide Feltoni Gurini, Fabio Gasparetti, Alessandro Micarelli, and Giuseppe Sansonetti, Temporal people-to-people recommendation on social networks with sentiment-based matrix factorization, Future Generation Computer Systems 78, 2018, 430–439.
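A minimal sketch of the content-based baseline described above (illustrative only; the tiny embedding table stands in for the pretrained Word2Vec model, and the helper names are hypothetical):

```python
import numpy as np

def user_vector(tokens, embeddings):
    """Represent a user by the average embedding of the words in his/her
    posts; words missing from the vocabulary are skipped."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0)

def cosine_similarity(a, b):
    """Cosine similarity between two user vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Candidate users would then be ranked for each target user by descending cosine similarity.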

 

Q: The other key points of the paper are perfectly clear and satisfying: the approach is well described and detailed, all choices are well explained and motivated, the experiments are performed in an exhaustive way, and the related work section is adequate.

A: Thanks a lot for your encouragement and valuable comments.

 

Q: I would suggest the following modifications:

In the related work section, from line 130, both early and late feature fusion methods are described in detail (formulas included). I would move this part from Related Work directly to Section 3 (The proposed method).

A: Thanks for your valuable comments. We have moved the description of early and late fusion to Sec.3.2.1 and 3.2.2 respectively in our revision.

 

Q: I would describe the baseline models not in the Evaluation Metric paragraph, but in an appropriate subsection, to emphasize their adoption.

A: Thanks for your valuable comments. We have moved the description of baseline models to Sec.4.2.1 in our revision.

 

Q: I suggest also motivating, in the Experiments section, the choice of the minimum and maximum numbers of posts per user (respectively, 9 and 45).

A: Thanks for your valuable comments. We have added the following explanation in the Experiments section.

To collect the Instagram dataset, we randomly selected 6 hashtags from the list of Top 100 Hash Tags on Instagram in 2018 as queries to get user posts. This can be done using the Hashtag Search API, which is only available for Instagram Professional accounts. Thus, we had to directly crawl the latest updates for each user from Instagram webpages for tag-based queries. There is a limit of at most 45 posts per user that can be crawled. To avoid large variations in user participation, where there are too few posts to learn the features, we chose to keep only users who had posted at least 9 posts, a 5:1 ratio (20% of the maximum).
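This filtering step could be sketched as follows (illustrative; the dict-of-lists layout and the helper name are assumptions, with the 9/45 thresholds taken from the explanation above):

```python
def keep_active_users(user_posts, min_posts=9, max_posts=45):
    """Keep only users with at least min_posts posts; max_posts reflects
    the crawling limit of 45 posts per user."""
    return {user: posts[:max_posts]
            for user, posts in user_posts.items()
            if len(posts) >= min_posts}
```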

 

Q: Spelling and formatting: the paper is, in general, well structured and readable.

Minor errors and typos:

Line 32: Given huge amount... --> Given the huge amount...

Line 34: ...information that interest... --> ...information that interests...

Line 39: ...information which are often... --> ...information which is often...

Line 51: ...deep learning techniques such as... --> ...deep learning techniques, such as...

from line 183: all terms of formulas, such as xi, should be written in a proper font (if using latex, I suggest writing all terms in a mathematic environment). Please, pay attention also to subscripts chars.

Line 302: ...dataset that include... --> ...dataset that includes...

Line 411: calculations are need --> calculations are needed

A: Thanks for your valuable comments. We have corrected all errors and typos accordingly.

 

 

Round 2

Reviewer 1 Report

Thank you for your modifications. I am still convinced that you should compare your approach with other types of user recommendation systems.

However, the content is now much clearer and there is sufficient background on the techniques you use. The minor flaws were also corrected.

 
