Article
Peer-Review Record

Survey on Machine Learning Biases and Mitigation Techniques

Digital 2024, 4(1), 1-68; https://doi.org/10.3390/digital4010001
by Sunzida Siddique 1,*, Mohd Ariful Haque 2, Roy George 2, Kishor Datta Gupta 2, Debashis Gupta 3 and Md Jobair Hossain Faruk 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 5 September 2023 / Revised: 27 November 2023 / Accepted: 28 November 2023 / Published: 20 December 2023

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper suffers from its large size. It needs to be seriously re-edited and perhaps shortened.

Content-related issues:

1. What do you mean by good or bad possibility of bias?

2. What is the meaning of Eq. 3? It has no merit and is not a mathematical formula; it merely mentions preprocessing. Please explain or modify.

Technical issues

1. Most of the examples presented here are for classification-based systems. Maybe it would be worthwhile to limit this paper to classification?

2. It is paramount to re-edit this paper and remove repetitions of text, e.g., lines 2420, 2432, and more:

"proving fairness in a variety of domains.
 In paper [12]The good possibility of bias is The paper highlights"

3. Sometimes citations are missing, e.g., line 2442, and there are more instances.

 

Comments on the Quality of English Language

This paper really needs careful editing: avoid capital letters inside a phrase (e.g., line 30), add missing spaces, and fill empty lines (e.g., line 551).

Author Response

Comments and Suggestions for Authors

Content-related issues:

  1. What do you mean by good or bad possibility of bias?

 

  • The "good" or "bad" possibility of bias refers to whether the bias in question is beneficial or harmful.

 

  2. What is the meaning of Eq. 3? It has no merit and is not a mathematical formula; it merely mentions preprocessing. Please explain or modify.

 

  • In machine learning, data pre-processing is denoted by the equation D_processed = PreProcess(D), where D_processed is the dataset obtained by applying pre-processing methods to the original dataset D. Pre-processing involves removing biases and imbalances, thereby enhancing machine learning model training; for example, oversampling or undersampling can be used to balance data subsets. Training on D_processed improves model accuracy and fairness.
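As an illustration of the D_processed = PreProcess(D) step described above, here is a minimal Python sketch of random oversampling to balance class subsets. The function name and the toy dataset are illustrative assumptions, not code from the paper:

```python
import random
from collections import defaultdict

def preprocess(dataset):
    """Balance a labeled dataset by randomly oversampling minority classes.

    `dataset` is a list of (features, label) pairs; names and data here
    are illustrative, not taken from the surveyed paper.
    """
    by_label = defaultdict(list)
    for features, label in dataset:
        by_label[label].append((features, label))

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_label.values())
    processed = []
    for group in by_label.values():
        processed.extend(group)
        processed.extend(random.choices(group, k=target - len(group)))
    return processed

# Toy imbalanced dataset D: three "a" examples, one "b" example.
D = [([0.1], "a"), ([0.2], "a"), ([0.3], "a"), ([0.9], "b")]
D_processed = preprocess(D)  # now 3 "a" and 3 "b" examples, 6 in total
```

Undersampling works the same way in reverse (shrinking every class to the smallest class size) and trades discarded data for a smaller balanced set.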

Technical issues

  1. Most of the examples presented here are for classification-based systems. Maybe it would be worthwhile to limit this paper to classification?

 

  • Thank you for the suggestion; we added a few regression examples as well.

 

  2. It is paramount to re-edit this paper and remove repetitions of text, e.g., lines 2420, 2432, and more:

  • Removed the repeated lines.

 

  3. "proving fairness in a variety of domains.
    In paper [12]The good possibility of bias is The paper highlights"

 

  • Paper [12] underscores the importance of tackling bias and advocating for fairness within machine learning models. The authors shed light on the present state of research concerning bias and unfairness in these models, providing an extensive overview of existing literature, tools, metrics, and datasets. The use of particular fairness metrics such as Demographic Parity, Equalized Odds, and Equality of Opportunity plays a crucial role in establishing fairness across diverse domains, facilitating impartial decision-making in critical sectors like hiring, healthcare, and criminal justice. Moreover, employing datasets that contain demographic annotations allows for a thorough evaluation and efficient mitigation of biases, ultimately advancing fairness in decision-making processes.

 

  4. Sometimes citations are missing, e.g., line 2442, and there are more instances.
    • Done.
  5. This paper really needs careful editing: avoid capital letters inside a phrase (e.g., line 30), add missing spaces, and fill empty lines (e.g., line 551).
    • Done.
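To make the response above concrete, the fairness metrics it names can be computed directly from model outputs. The following is a minimal Python sketch of demographic parity and equality of opportunity; the function names, the binary group encoding, and the toy data are our own illustrative choices, not the implementation from paper [12]:

```python
def demographic_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between group 1 and group 0.

    A value near zero indicates demographic parity.
    """
    def rate(g):
        members = [p for p, a in zip(y_pred, group) if a == g]
        return sum(members) / len(members)
    return rate(1) - rate(0)


def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true-positive rates between the two groups
    (equality of opportunity); zero means equal TPRs."""
    def tpr(g):
        preds = [p for p, t, a in zip(y_pred, y_true, group) if a == g and t == 1]
        return sum(preds) / len(preds)
    return tpr(1) - tpr(0)


# Toy example: binary predictions, ground-truth labels, and a binary
# sensitive attribute for six individuals.
y_pred = [1, 0, 1, 1, 0, 1]
y_true = [1, 0, 1, 0, 1, 1]
group  = [0, 0, 0, 1, 1, 1]

dpd = demographic_parity_difference(y_pred, group)          # 0.0: equal positive rates
eod = equal_opportunity_difference(y_true, y_pred, group)   # -0.5: unequal TPRs
```

Note that the two metrics can disagree, as here: the groups receive positive predictions at the same rate, yet qualified members of group 1 are recognized only half as often, which is why context determines which metric is appropriate.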

Reviewer 2 Report

Comments and Suggestions for Authors

The authors should add a novelty section.

Author Response

Thank you for the reviewer's suggestion; we addressed it as best as possible. Please check the responses.


Reviewer 2:

The authors should add a novelty section.

 

This work systematically reviews ML biases. The seven-step assessment method, which includes important databases and paper analysis, is unique. The survey analyzes past surveys, highlights significant ML bias findings, and categorizes and examines the bias types, origins, detection techniques, and reduction tactics within the ML pipeline. The paper's conclusion emphasizes the usefulness of the offered approaches and the significance of context in bias reduction, making it a valuable addition to the field of ML bias research.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper introduces a survey of bias mitigation techniques in ML. While they do cover several interesting topics, I think it would be nice to have a more focused survey. There are indeed several survey papers on fairness in machine learning now (many of which have been cited by the authors). Instead of listing them in a table and pointing out differences, if the authors would write a few sentences describing the knowledge gap that they fill which these other papers "collectively" lack, that would be easier to read.

It would be nice for the authors to focus on a trend/subset within fairness in ML in the title itself. I think they are trying to emphasize more on recent papers but it is still a very huge set and they are only discussing a few papers. 

I would recommend narrowing down the scope of the survey a little to make it very clear from the title and abstract itself what they would like to focus on and why this survey paper is different and unique from other surveys. If it is a "recent" fairness survey paper, all the papers after a certain date should be referenced (like Hort et al. cite a vast majority) and described (not just other survey papers). So, if a reader is interested in a paper, they can refer to it. 

I particularly liked some parts of the paper, like, Section 4, and Fig. 4. I think reducing the scope and emphasizing certain parts would make this an interesting read.

 

Comments on the Quality of English Language

Many errors in grammar: full-stops, capitalization, etc.

Author Response

Reviewer 3:

This paper introduces a survey of bias mitigation techniques in ML. While they do cover several interesting topics, I think it would be nice to have a more focused survey. There are indeed several survey papers on fairness in machine learning now (many of which have been cited by the authors). Instead of listing them in a table and pointing out differences, if the authors would write a few sentences describing the knowledge gap that they fill which these other papers "collectively" lack, that would be easier to read.

Our paper stands out by offering a comprehensive review of biases in machine learning. Unlike other papers that focus on specific aspects, our work takes a holistic approach: it systematically categorizes and examines various bias types, sources, detection methods, and reduction strategies. This provides a broader and deeper understanding of bias, addressing a collective knowledge gap in the field.

It would be nice for the authors to focus on a trend/subset within fairness in ML in the title itself. I think they are trying to emphasize more on recent papers but it is still a very huge set and they are only discussing a few papers. 

Answer: We selected papers from top journals and conferences; this keeps the discussion limited to a few papers without affecting quality.

I would recommend narrowing down the scope of the survey a little to make it very clear from the title and abstract itself what they would like to focus on and why this survey paper is different and unique from other surveys.

Answer: Added as a novelty section.

If it is a "recent" fairness survey paper, all the papers after a certain date should be referenced (like Hort et al. cite a vast majority) and described (not just other survey papers). So, if a reader is interested in a paper, they can refer to it. 

Answer: Shortened the other table.

I particularly liked some parts of the paper, like, Section 4, and Fig. 4. I think reducing the scope and emphasizing certain parts would make this an interesting read.
Reviewer 4 Report

Comments and Suggestions for Authors

The paper attempts a systematic review of the numerous biases in Machine Learning (ML), such as data, model, and algorithmic bias, and of the methods for their mitigation. The authors conducted this review through a seven-step procedure that included selection of peer-reviewed papers from well-known knowledge databases (IEEE Xplore, ArXiv, Google Scholar...) using appropriate keywords and inclusion and exclusion criteria; the 110 most relevant papers were selected and further analyzed. To gain a more qualitative insight into the research landscape and to identify gaps in the literature, the authors then performed bibliometric and document analysis using VOSviewer (a free program for bibliometric analysis and visualization), including co-occurrence analysis of keywords.

Within the document analysis, an overview of the previous surveys in the literature is provided first; these are analyzed, and the results of the analysis are summarized in the appropriate tables. Then, in Section 3 of the paper, the authors provide details and findings from two papers related to ML bias: the first presents a comprehensive study of the impact of bias mitigation algorithms on classification performance and fairness, and the other is a large-scale empirical study of 17 different bias mitigation methods for ML classifiers applied to 8 widely adopted software classification tasks.

Sections 4-6 of the paper name, describe, systematize, and comment on numerous types of bias in ML, the sources of bias within the ML pipeline, the techniques for detecting and measuring bias, and various bias reduction strategies and methods.

As the overall conclusion, the authors state that all presented methods have shown potential in lowering ML bias, but that each has drawbacks and might not be appropriate in all situations. On the other hand, with the analysis conducted in this paper, the context of a particular situation may determine which measure is best for detecting bias against protected groups, and whether a sensitive property can be used to create a fairness metric.

There are some deficiencies that must be corrected before the final acceptance of the paper:

- lines 147-150: Table 2 is mentioned (which represents 922 documents...), but this table is missing from the paper. It should be included, and all other tables in the paper renumbered.

- line 160/161: the final number of documents is missing (after the exclusion of some types of publications...). Also, the paragraph begins with a lowercase letter.

- sub-section 3.1: in lines 569 and 573 the personal pronoun "we" is used (instead of "they"). Also, two paragraphs in this sub-section begin with a lowercase letter.

- lines 583-592: there are repeated (or very similar) sentences in the paragraph (the paragraph should be rewritten, or the repeated sentences deleted).

Author Response

Reviewer 4:


There are some deficiencies that must be corrected before the final acceptance of the paper:

- lines 147-150: Table 2 is mentioned (which represents 922 documents...), but this table is missing from the paper. It should be included, and all other tables in the paper renumbered.

Done.

- line 160/161: the final number of documents is missing (after the exclusion of some types of publications...). Also, the paragraph begins with a lowercase letter.

Fixed.

- sub-section 3.1: in lines 569 and 573 the personal pronoun "we" is used (instead of "they"). Also, two paragraphs in this sub-section begin with a lowercase letter.

Done.

- lines 583-592: there are repeated (or very similar) sentences in the paragraph (the paragraph should be rewritten, or the repeated sentences deleted).

Removed.

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

This is an interesting survey of the fairness literature. I think the presentation can be made a bit more precise in several places. For example, a significant portion is devoted to adversarial learning methods in general, while the focus should be fairness using adversarial training. Similarly, there is a lot of digression in many places. Section 4 is quite nice; it can be brought up or emphasized more. The abstract does not seem to discuss what is novel about this survey.

Comments on the Quality of English Language

Many grammar errors

Author Response

We updated the abstract to emphasize Section 4. Our survey intentionally devotes a significant portion to adversarial learning methods because we believe these techniques are pivotal in advancing fairness in machine learning. While we understand this might seem like overemphasis, we aimed to highlight the potential and challenges of these methods in ensuring fairness. We made an effort to provide a comprehensive overview of the topic, which may have led to sections that seem like digressions; however, we believe these sections contribute to a fuller understanding of the complexities and interrelated aspects of fairness in machine learning.

Thank you
