Article
Peer-Review Record

NLP-Based Bi-Directional Recommendation System: Towards Recommending Jobs to Job Seekers and Resumes to Recruiters

Big Data Cogn. Comput. 2022, 6(4), 147; https://doi.org/10.3390/bdcc6040147
by Suleiman Ali Alsaif, Minyar Sassi Hidri *, Imen Ferjani, Hassan Ahmed Eleraky and Adel Hidri
Submission received: 13 October 2022 / Revised: 5 November 2022 / Accepted: 24 November 2022 / Published: 1 December 2022

Round 1

Reviewer 1 Report

I congratulate the authors on this work because of its positive impact on the development of job-search tools. Here are some notes on the paper with the aim of improving it. I wish you success.

1- An abstract is usually a paragraph that gives the full picture of your research in terms of literature review, methodology, findings, and conclusion. Readers use the abstract to quickly learn the topic of your research. A well-written abstract is critical to attracting readers and drawing them in to read the full work.

The abstract should contain a clear introduction that summarizes the field of research, and the results of the study should be briefly explained, specifically the results of the machine learning models, so that the reader is eager to read more.

2- A flowchart of the proposed reciprocal recommendation system must be presented in the introduction. Flowcharts help identify the essential steps and simultaneously offer the bigger picture of the process, organizing the tasks in chronological order.

3- The authors should add a dataset description section; describing the volume and structure of the collected dataset helps readers understand the data more easily.

4- The authors should provide a scientific description of the model in terms of the algorithms used; in particular, the mechanism for calculating the model's accuracy must be explained.

5- The authors should make sure to cite all references listed in the bibliography; citation numbers (20, 23, 28, 29, 30, 31, 32) cannot be found in the bibliography.

6- The authors should make sure that volume, pagination, and DOI numbers are mentioned in the bibliography.

Author Response

1. An abstract is usually a paragraph that gives the full picture of your research in terms of literature review, methodology, findings, and conclusion. Readers use the abstract to quickly learn the topic of your research. A well-written abstract is critical to attracting readers and drawing them in to read the full work.

The abstract should contain a clear introduction that summarizes the field of research, and the results of the study should be briefly explained, specifically the results of the machine learning models, so that the reader is eager to read more.

The abstract was updated in the revised manuscript.

"More than ten years ago, online job boards have provided their services to both job seekers and employers who want to hire potential candidates. The provided services are generally based on traditional information retrieval (IR) techniques which may not be appropriate for both job seekers and employers. The reason is that the number of produced results for job seekers may be enormous. Therefore, they are required to spend time reading and reviewing their finding criteria. Reciprocally, recruitment is a crucial process for every organization. Identifying potential candidates and matching them with job offers requires a wide range of expertise and knowledge. This article proposes a reciprocal recommendation based on bi-directional correspondence as a way to support both recruiters' and job seekers' work. Recruiters can find the best-fit candidates for every job position in their job postings, and job seekers can find the best-match jobs to match their resumes. We show how machine learning (ML) can solve problems in natural language processing (NLP) of text content and similarity scores depending on job offers in Saudi major cities scrapped from Indeed. For bi-directional matching, a similarity calculation based on the integration of explicit and implicit job information from two sides (recruiters and job seekers) has been used. The proposed system is evaluated using a resume/job offer dataset. The performance of generated recommendations is evaluated using ML decision support measures. Obtained results confirm that the proposed system can not only solve the problem of bi-directional recommendation but also improve the prediction accuracy."

2. A flowchart of the proposed reciprocal recommendation system must be presented in the introduction. Flowcharts help identify the essential steps and simultaneously offer the bigger picture of the process, organizing the tasks in chronological order.

A flowchart was added in the introduction section to identify the essential steps and simultaneously offer the bigger picture of the process.

3. The authors should add a dataset description section; describing the volume and structure of the collected dataset helps readers understand the data more easily.

A dataset description was added to the revised manuscript (subsection 5.1).

We collected data from two different sources for this study. We have two sets of data, one relating to user profiles and another relating to job profiles. We trained the model on 138 resumes and tested it on 25 resumes and 250 job descriptions.

    • User profile data: The data we used to train our NLP model was acquired from https://github.com/DataTurks-Engg/Entity-Recognition-In-Resumes-SpaCy. The dataset consists of 138 resumes annotated for named entity recognition (NER); each resume's skills were extracted from its content as entities.

A total of 25 resumes were collected from https://www.hireitpeople.com/resume-database/ to test our model.

    • Job profile data: Job listing data was web scraped from published jobs on sa.indeed.com, and the extracted data was then saved to JSON files. There are five files, each containing nearly 50 job descriptions for one IT job (from our selected jobs) published during a month. Each job description contains five key pieces of information: the job URL link, the company name, the location, the salary, and the full job description text.

The information in the job specifications has been extracted and analyzed using natural language processing (NLP) based on named entity recognition (NER) techniques to find skills.

Kindly refer to subsection 5.1.
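
For illustration, below is a minimal sketch of the NER training step, assuming spaCy 3.x; the sample resume text, the character offsets, the SKILLS label name, and the training-loop settings are hypothetical stand-ins for the actual annotated data, not our exact configuration.

    # Minimal sketch: train a blank spaCy NER pipeline to tag skills.
    # The training pair and the SKILLS label are illustrative only.
    import spacy
    from spacy.training import Example

    TRAIN_DATA = [
        ("Experienced Python developer with SQL and AWS skills.",
         {"entities": [(12, 18, "SKILLS"), (34, 37, "SKILLS"), (42, 45, "SKILLS")]}),
    ]

    nlp = spacy.blank("en")        # start from a blank English pipeline
    ner = nlp.add_pipe("ner")      # add an empty NER component
    for _text, ann in TRAIN_DATA:
        for _start, _end, label in ann["entities"]:
            ner.add_label(label)   # register the entity label

    optimizer = nlp.initialize()
    for _epoch in range(20):       # toy loop; real training shuffles and batches
        for text, ann in TRAIN_DATA:
            example = Example.from_dict(nlp.make_doc(text), ann)
            nlp.update([example], sgd=optimizer)

    doc = nlp("Looking for a developer who knows Python and SQL.")
    print([(ent.text, ent.label_) for ent in doc.ents])  # extracted skill entities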

4. The authors should provide a scientific description of the model in terms of the algorithms used; in particular, the mechanism for calculating the model's accuracy must be explained.

The accuracy calculation was added to the revised manuscript (subsection 5.2).

A predictive model's performance can be easily evaluated by calculating the percentage of its predictions that are correct. Accuracy is a metric that measures the proportion of predictions a model got right; it is computed from the ground-truth classes and the model's predictions.

It is calculated by dividing the number of correct predictions by the total number of predictions:

Accuracy = Number of Correct Predictions / Total Number of Predictions

Since our predictive model deals with multiclass classification, the accuracy computation proceeds as follows:

    1. Get predictions from the model.
    2. Count the number of correct predictions.
    3. Divide it by the total number of predictions.
    4. Analyze the obtained value.

The higher the metric value, the better the predictions. The best possible value is 1 (if the model got all predictions right), and the worst is 0 (if the model did not make a single correct prediction).
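
As a concrete illustration of these steps, here is a minimal Python sketch; the class labels are invented for the example:

    # Minimal sketch of the accuracy computation described above;
    # the job-category labels are invented purely for illustration.
    def accuracy(y_true, y_pred):
        """Fraction of predictions that match the ground-truth classes."""
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        return correct / len(y_true)

    # Toy multiclass example: 4 of the 5 predictions are correct.
    y_true = ["data_eng", "web_dev", "data_eng", "sysadmin", "web_dev"]
    y_pred = ["data_eng", "web_dev", "sysadmin", "sysadmin", "web_dev"]
    print(accuracy(y_true, y_pred))  # 0.8, between the worst (0) and best (1)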

5. The authors should make sure to cite all references listed in the bibliography; citation numbers (20, 23, 28, 29, 30, 31, 32) cannot be found in the bibliography.

All references are cited in the text and listed in the references section. The MDPI template collapses reference numbers when they fall in a consecutive run.

For example, if three references follow each other, instead of citing them as [19,20,21], it reduces them to [19-21]. So, all references are cited in the text and listed in the references section.

6. The authors should make sure that volume, pagination, and DOI numbers are mentioned in the bibliography.

References have been checked and updated; DOI numbers are now included for all references.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors have elaborated in detail the NLP-based recommendation system for the job-recruiting scenario. I have the following suggestions to further improve the manuscript.

1. The dataset consists of only 25 resumes. How do you justify potential bias in a model built on such a small dataset, and do you think the model could be used on large datasets? Answering these questions would help generate interest in using your methodology.

2. The authors have coined the recommender system a bidirectional NLP-based recommendation system. It would be interesting to know how this recommender system compares with other recommender systems, specifically in job-recruitment environments; for example, how does this system compare with a conversational recommender system (Mentec, François, et al. "Conversational recommendations for job recruiters." Knowledge-aware and Conversational Recommender Systems. 2021) or a collaborative recommendation system based on the EM algorithm (for example, Mao, Yu, et al. "A bidirectional collaborative filtering recommender system based on EM algorithm." International Conference on Smart Vehicular Technology, Transportation, Communication and Applications. Springer, Cham, 2017)?

3. Some suggestions for the literature review: include BERT and other language-model-based recommendation systems, since the paper poses the problem being solved as an NLP-based recommendation system. For example: Sun, Fei, et al. "BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer." Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019.

4. There are a few minor spelling errors when mentioning spaCy.

Author Response

1. The dataset consists of only 25 resumes. How do you justify potential bias in a model built on such a small dataset, and do you think the model could be used on large datasets? Answering these questions would help generate interest in using your methodology.

A dataset description was added to the revised manuscript (subsection 5.1).

We collected data from two different sources for this study. We have two sets of data, one relating to user profiles and another relating to job profiles. We trained the model on 138 resumes and tested it on 25 resumes and 250 job descriptions.

    • User profile data: The data we used to train our NLP model was acquired from https://github.com/DataTurks-Engg/Entity-Recognition-In-Resumes-SpaCy. The dataset consists of 138 resumes annotated for named entity recognition (NER); each resume's skills were extracted from its content as entities.

A total of 25 resumes were collected from https://www.hireitpeople.com/resume-database/ to test our model.

    • Job profile data: Job listing data was web scraped from published jobs on sa.indeed.com, and the extracted data was then saved to JSON files. There are five files, each containing nearly 50 job descriptions for one IT job (from our selected jobs) published during a month. Each job description contains five key pieces of information: the job URL link, the company name, the location, the salary, and the full job description text.

The information in the job specifications has been extracted and analyzed using natural language processing (NLP) based on named entity recognition (NER) techniques to find skills.

Kindly refer to subsection 5.1.

2. The authors have coined the recommender system a bidirectional NLP-based recommendation system. It would be interesting to know how this recommender system compares with other recommender systems, specifically in job-recruitment environments; for example, how does this system compare with a conversational recommender system (Mentec, François, et al. "Conversational recommendations for job recruiters." Knowledge-aware and Conversational Recommender Systems. 2021) or a collaborative recommendation system based on the EM algorithm (for example, Mao, Yu, et al. "A bidirectional collaborative filtering recommender system based on EM algorithm." International Conference on Smart Vehicular Technology, Transportation, Communication and Applications. Springer, Cham, 2017)?

The proposed recommender system is static, and it is difficult for such a system to answer two important questions well due to inherent shortcomings: (a) What exactly does a user like? (b) Why does a user like an item? The shortcomings are due to the way static models learn user preferences, i.e., without explicit instructions and active feedback from users. The recent rise of conversational recommender systems (CRSs) changes this situation fundamentally. Within a CRS, users and the system can dynamically communicate through natural-language interactions, which provides unprecedented opportunities to explicitly obtain users' exact preferences. Conversational recommendation aims at finding or recommending the most relevant information for users based on textual or spoken dialogs, through which users can communicate with the system more efficiently using natural-language conversations.

We believe that conversational systems will have a major impact on human-computer interaction. Given users' constant need to look for information to support both work and daily life, we think conversational recommender systems will be one of the key techniques toward an intelligent web. This could be our future research track, which consists of integrating deep learning into CRSs.

Regarding the comparison with the collaborative recommendation system based on EM, the context of our work is very different from that of the work presented in [1]. Our recommender system uses a content-based filtering model, which does not need any data about other job seekers because the recommendations are specific to a particular job seeker. This makes it easier to scale the approach to a large number of job seekers; the same cannot be said for collaborative filtering methods.

[1] Mao, Yu, et al. "A bidirectional collaborative filtering recommender system based on EM algorithm." International Conference on Smart Vehicular Technology, Transportation, Communication and Applications. Springer, Cham, 2017.

→ However, we enriched the related work section while highlighting these approaches.
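
To make the content-based matching idea concrete, here is a minimal, hypothetical sketch using TF-IDF vectors over extracted skill text and cosine similarity (scikit-learn); the sample skill strings are invented, and this is a sketch of the technique rather than our exact pipeline.

    # Minimal sketch of content-based, bi-directional matching; the skill
    # strings are invented examples, not data from the actual corpus.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    resume_skills = ["python sql spark etl", "html css javascript react"]
    job_skills = ["python spark data pipelines sql",  # data-engineering offer
                  "react javascript frontend css"]    # front-end offer

    # Fit one shared vocabulary over both sides so the vectors are comparable.
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(resume_skills + job_skills)
    resume_vecs = matrix[:len(resume_skills)]
    job_vecs = matrix[len(resume_skills):]

    # Rows index resumes, columns index job offers; reading a row recommends
    # jobs to a seeker, reading a column recommends candidates to a recruiter.
    scores = cosine_similarity(resume_vecs, job_vecs)
    print(scores.round(2))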

3. Some suggestions for the literature review: include BERT and other language-model-based recommendation systems, since the paper poses the problem being solved as an NLP-based recommendation system. For example: Sun, Fei, et al. "BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer." Proceedings of the 28th ACM International Conference on Information and Knowledge Management. 2019.

References were added to the related work section, including BERT4Rec by Sun, F., et al. (2019) and the collaborative recommendation system based on the EM algorithm by Mao, Y., et al. (2017).

4. There are a few minor spelling errors when mentioning spaCy.

We fixed spelling and grammatical errors.

Reviewer 3 Report

1. This paper has a concerningly high similarity/repeat rate of 45%.

2. Important parts of the manuscript are copied and pasted with no proper referencing. For example:

a. Related work was copied word for word from: https://ieeexplore.ieee.org/document/9336532/authors#authors

b. Problem formulation was copied from https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/154963/GUO-THESIS-2015.pdf?sequence=1 (you only replaced monster and indeed.com with sa.indeed) and http://ceur-ws.org/Vol-2960/paper17.pdf

c. All word-embedding discussion was copied from http://ceur-ws.org/Vol-2960/paper17.pdf

d. 5.2 Data Scrapping was copied from http://ceur-ws.org/Vol-2960/paper17.pdf, https://brightdata.com/blog/how-tos/how-to-use-beautiful-soup-for-web-scraping-with-python, and https://medium.com/ymedialabs-innovation/web-scraping-using-beautiful-soup-and-selenium-for-dynamic-page-2f8ad15efe25

e. 5.3 Data preprocessing was copied from http://ceur-ws.org/Vol-2960/paper17.pdf

f. 5.4 was copied from http://ceur-ws.org/Vol-2960/paper17.pdf and https://pergamos.lib.uoa.gr/uoa/dl/frontend/file/lib/default/data/2964276/theFile

g. The list goes on and on, including 5.5, 5.6, and 6.1 (very concerning when the results part is copied)

3. It is concerning that this paper heavily used http://ceur-ws.org/Vol-2960/paper17.pdf, but it was never mentioned in the references list.

4. Many figures were basically copied from other published work without even referencing that work, for example Figure 7.

5. The work lacks novelty and originality; similar methods can be found in basic ML tutorials: https://towardsdatascience.com/a-review-of-named-entity-recognition-ner-using-automatic-summarization-of-resumes-5248a75de175

6. The data are neither interesting nor representative.

7. The data were not properly described; lines 250-254 describe which CVs were collected, but I cannot see how many CVs were downloaded or what the train/test split ratio is.

8. It is mentioned that 25 resumes were used for testing, but what about training, and is this a sufficient number of resumes to yield conclusions?

9. No comparison with SOTA approaches was presented.

10. Around 60% of the references are old.

Author Response

1. This paper has a concerningly high similarity/repeat rate of 45%.

The similarity rate indicated in the document sent by the journal editor is 23% (kindly find attached the iThenticate report sent by the editor, combined with answers to all your comments/suggestions). Even with a rate of 23%, we agree with you that this rate is a little high, since we used some preliminaries related to the prediction and validation models of our reciprocal recommendation system. According to the similarity report sent by the editor, the part that contributes most to this rate is the related work section (4% according to the editor's similarity report), and a further 3% corresponds to the text of the header and footer of the paper ("2022, Submitted to Big Data Cogn. Comput." and three journal URLs in the footer). The similarity of the rest does not exceed 1%.

→ However, we updated all parts of the paper with overlapping words, as flagged in the similarity report sent by the editor.

We attached the similarity report sent by the editor as a supplementary file with the revised manuscript.

2. Important parts of the manuscript are copied and pasted with no proper referencing. For example:

a. Related work was copied word for word from: https://ieeexplore.ieee.org/document/9336532/authors#authors

This section has been updated in the revised manuscript and the similarity was reduced.

b. Problem formulation was copied from https://oaktrust.library.tamu.edu/bitstream/handle/1969.1/154963/GUO-THESIS-2015.pdf?sequence=1 (you only replaced monster and indeed.com with sa.indeed) and http://ceur-ws.org/Vol-2960/paper17.pdf

This section was removed from the revised manuscript and merged into the Introduction section (to make the paper more concise).

c. All word-embedding discussion was copied from http://ceur-ws.org/Vol-2960/paper17.pdf

Kindly check the similarity report sent by the journal editor and attached to the revised manuscript. The similarity in this part is minimal and does not exceed 1%. The mentioned paper has no relation to the word-embedding discussion included in our article.

The paper http://ceur-ws.org/Vol-2960/paper17.pdf does not discuss word embeddings.

d. 5.2 Data Scrapping was copied from http://ceur-ws.org/Vol-2960/paper17.pdf, https://brightdata.com/blog/how-tos/how-to-use-beautiful-soup-for-web-scraping-with-python, and https://medium.com/ymedialabs-innovation/web-scraping-using-beautiful-soup-and-selenium-for-dynamic-page-2f8ad15efe25

The content of this section has no relation to the content of the paper http://ceur-ws.org/Vol-2960/paper17.pdf and does not share any concepts with it.

You are right; we explored the two blogs at https://brightdata.com/blog/how-tos/how-to-use-beautiful-soup-for-web-scraping-with-python and https://medium.com/ymedialabs-innovation/web-scraping-using-beautiful-soup-and-selenium-for-dynamic-page-2f8ad15efe25 in order to understand the principles of the Beautiful Soup web scraper and how to automate scraping in our approach.

We have substantially reduced the similarity of this part.
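
For illustration, here is a minimal sketch of the Beautiful Soup scraping pattern discussed in those blogs; the query URL, CSS selectors, and field names are hypothetical (Indeed's real markup changes frequently, and dynamic pages may additionally require Selenium):

    # Minimal sketch of scraping job cards with Beautiful Soup and saving
    # them to JSON; the URL and CSS selectors below are hypothetical.
    import json
    import requests
    from bs4 import BeautifulSoup

    url = "https://sa.indeed.com/jobs?q=data+engineer&l=Riyadh"  # illustrative query
    html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    jobs = []
    for card in soup.select("div.job_seen_beacon"):  # hypothetical card selector
        title = card.select_one("h2")
        company = card.select_one(".companyName")    # hypothetical class name
        jobs.append({
            "title": title.get_text(strip=True) if title else None,
            "company": company.get_text(strip=True) if company else None,
            "description": card.get_text(" ", strip=True),
        })

    with open("jobs.json", "w", encoding="utf-8") as f:
        json.dump(jobs, f, ensure_ascii=False, indent=2)  # one JSON file per run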

e. 5.3 Data preprocessing was copied from http://ceur-ws.org/Vol-2960/paper17.pdf

Kindly check: this section is not copied from http://ceur-ws.org/Vol-2960/paper17.pdf.

f. 5.4 was copied from http://ceur-ws.org/Vol-2960/paper17.pdf

Sub-section 5.4 presents how to train the model to auto-detect named entities (NER) and has no relation to the paper http://ceur-ws.org/Vol-2960/paper17.pdf.

g. The list goes on and on, including 5.5, 5.6, and 6.1 (very concerning when the results part is copied)

Thank you for this comment. The similarities detected in 5.5, 5.6, and 6.1 are minimal (<1%) and relate to definitions (a few words) from the literature that we had already reformulated; we have reformulated them again to further decrease the similarity rate.

3. It is concerning that this paper heavily used http://ceur-ws.org/Vol-2960/paper17.pdf, but it was never mentioned in the references list.

The proposed recommender system in our article does not share similarity with the paper http://ceur-ws.org/Vol-2960/paper17.pdf, since our system is static while theirs is dynamic and ontology-based. The shortcomings of static systems are due to the way they learn user preferences, i.e., without explicit instructions and active feedback from users. The conversational recommendation approach used in http://ceur-ws.org/Vol-2960/paper17.pdf aims at finding or recommending the most relevant information for users based on textual or spoken dialogs, through which users can communicate with the system more efficiently using natural-language conversations.

→ However, we have enriched the related work section by referencing this paper in the part that presents conversational recommender systems.

4. Many figures were basically copied from other published work without even referencing that work, for example Figure 7.

→ Figure 7 was deleted, and a spaCy reference was added to help readers better understand how to train NER.

5. The work lacks novelty and originality; similar methods can be found in basic ML tutorials: https://towardsdatascience.com/a-review-of-named-entity-recognition-ner-using-automatic-summarization-of-resumes-5248a75de175

The content at the mentioned URL presents NER (named entity recognition) and how it can be applied to automatically generate summaries of resumes by extracting only the chief entities, such as name, education background, and skills.

This blog was very helpful for extracting our entities (Name, Location, Skills, and Education), which are used in the rest of our approach.

The similarity of this part does not exceed 1% of the global similarity rate (23%), and this part corresponds to only 4% of the presented work concerning the bi-directional recommendation system.

The main contributions of this article can be summarized as follows:

    • Helping job seekers see how their skills stack up against the most sought-after jobs and recommending job postings that match the skills reflected in their resumes.
    • Offering recruiters an engine that facilitates identifying and hiring candidates.
    • Improving the matching between candidates and job offers.
    • Improving the computerized recruitment process.

6. The data are neither interesting nor representative.

Kindly see the detailed answer to comment 7.

7. The data were not properly described; lines 250-254 describe which CVs were collected, but I cannot see how many CVs were downloaded or what the train/test split ratio is.

A dataset description was added to the revised manuscript (subsection 5.1).

We collected data from two different sources for this study. We have two sets of data, one relating to user profiles and another relating to job profiles. We trained the model on 138 resumes and tested it on 25 resumes and 250 job descriptions.

    • User profile data: The data we used to train our NLP model was acquired from https://github.com/DataTurks-Engg/Entity-Recognition-In-Resumes-SpaCy. The dataset consists of 138 resumes annotated for named entity recognition (NER); each resume's skills were extracted from its content as entities. The figure below shows a snapshot of the dataset.

A total of 25 resumes were collected from https://www.hireitpeople.com/resume-database/ to test our model.

    • Job profile data: Job listing data was web scraped from published jobs on sa.indeed.com, and the extracted data was then saved to JSON files. There are five files, each containing nearly 50 job descriptions for one IT job (from our selected jobs) published during a month. Each job description contains five key pieces of information: the job URL link, the company name, the location, the salary, and the full job description text.

The information in the job specifications has been extracted and analyzed using natural language processing (NLP) based on named entity recognition (NER) techniques to find skills.

8. It is mentioned that 25 resumes were used for testing, but what about training, and is this a sufficient number of resumes to yield conclusions?

→ We trained the model on 138 resumes and tested it on 25 resumes and 250 job descriptions (for more details, see subsection 5.2 in the revised manuscript).

9. No comparison with SOTA approaches was presented.

→ Since we coined the recommender system a bidirectional NLP-based recommendation system, it would surely be interesting to know how it compares with other recommender systems, specifically in job-recruitment environments.

The two closest works are the conversational recommender system of Mentec, François, et al. ("Conversational recommendations for job recruiters." Knowledge-aware and Conversational Recommender Systems, 2021) and the collaborative recommendation system based on the EM algorithm of Mao, Yu, et al. ("A bidirectional collaborative filtering recommender system based on EM algorithm." International Conference on Smart Vehicular Technology, Transportation, Communication and Applications. Springer, Cham, 2017).

First of all, the proposed recommender system is static, and it is difficult for such a system to answer two important questions well due to inherent shortcomings: (a) What exactly does a user like? (b) Why does a user like an item? The shortcomings are due to the way static models learn user preferences, i.e., without explicit instructions and active feedback from users. Within conversational recommender systems (CRSs), users and the system can dynamically communicate through natural-language interactions, which provides unprecedented opportunities to explicitly obtain users' exact preferences. Conversational recommendation aims at finding or recommending the most relevant information for users based on textual or spoken dialogs, through which users can communicate with the system more efficiently using natural-language conversations.

Regarding the comparison with the collaborative recommendation system based on EM, the context of our work is very different from that of the work of Mao et al. Our recommender system uses a content-based filtering model, which does not need any data about other job seekers because the recommendations are specific to a particular job seeker. This makes it easier to scale the approach to a large number of job seekers; the same cannot be said for collaborative filtering methods.

Given this diversity of frameworks and data behavior, we have limited ourselves to demonstrating the effectiveness of the recommendation system from the points of view of the job seeker and the recruiter. The decision support measures were used only to calculate the accuracy of the system.

10. Around 60% of the references are old.

→ References have been updated to ensure that more than 60% are recent.

Thank you for all your constructive comments and suggestions.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

After the clarifications regarding similarity, there are no restrictions on my part, so I accept the study in its present form.

Reviewer 3 Report

The manuscript has been improved significantly and all my comments were addressed properly; therefore, I recommend acceptance of this manuscript.
