Identification of Challenges and Best Practices for Including Users with Disabilities in User-Based Testing

Round 1
Reviewer 1 Report
The article presents a systematic literature review to investigate trends, challenges, and best practices during usability testing involving users with disability. The paper is well-written, and the results are nicely discussed. However, some methodological issues need clarification.
- Page 6: the use of the inclusion criteria is wrong.
- The "inclusion criteria" define when a given publication is included, whether by one, two, or "n" addressed criterion. According to the criteria presented in Table 3, a paper just written in English would be accepted, as it meets the I2 inclusion criterion.
- "I1 - The paper seems to provide information about user testing" is not objective. I suggest removing the "seems to."
- Given that the goal of the paper is to address usability testing with users with a disability, why the involvement of people with accessibility was not considered in the criteria?
- Why limit publications from 2012 onwards?
- Details of the review process are missing. How the review process was performed? The guidelines from Kitchenham suggest the use of two steps called filters. The first filter is to read the title and the abstract to perform an initial screening of the publications. In the second filter, the researchers read the entire publication and apply the same selection criteria. Did the authors read the entire paper to confirm whether it fits the selection criteria? How many people were involved in the process? Have any agreement assessment criteria like Cohen's Kappa been defined? Please clarify.
- Page 6 - Table 4:
- How was Venue's quality assessed? What would be a sufficient description of the testing process?
- Q4: if the goal was publications with people with disabilities, wouldn't the study already have to involve this profile instead of being a quality criterion?
- Page 19: How did the authors map the best practices with the challenges? How many researchers were involved? How disagreements were resolved?
- Page 20: The authors suggest reporting how the researchers achieved the approval from the ethical committee. There is usually a reference to the research ethics committee approval document. Did the authors review the documents in these publications to identify what steps (such as mitigating possible discomforts) were taken to conduct such studies?
Minor issues:
- Page 3, line 145: I could not understand who performs better when using products for users with disabilities. Moreover, why is there such a difference?
- I suggest presenting the results from Table 8 and Figure 6 in descending order.
- Page 9, line 244: the last sentence is duplicated.
- IU --> UI (throughout the article)
Author Response
Dear Reviewer,
We are most grateful for your time and effort in reviewing our manuscript and for very positive feedback. The manuscript has been revised according to comments and suggestions provided by all reviewers. The revisions are highlighted using the “Track Changes” function in Microsoft Word, so that changes are easily visible. The comments and the corresponding responses are listed below.
Reply to reviewer’s comments:
Comment 1.1: “The article presents a systematic literature review to investigate trends, challenges, and best practices during usability testing involving users with disability. The paper is well-written, and the results are nicely discussed. However, some methodological issues need clarification.”
Authors’ response 1.1: We want to thank the reviewer for recognizing the positive aspects of our manuscript. We are nevertheless aware that there are still opportunities for improvement, which we have addressed together with the other reviewers' comments by improving the structure and content of the article.
Comment 1.2: “- Page 6: the use of the inclusion criteria is wrong.
- The "inclusion criteria" define when a given publication is included, whether by one, two, or "n" addressed criterion. According to the criteria presented in Table 3, a paper just written in English would be accepted, as it meets the I2 inclusion criterion.
- "I1 - The paper seems to provide information about user testing" is not objective. I suggest removing the "seems to."
- Given that the goal of the paper is to address usability testing with users with a disability, why the involvement of people with accessibility was not considered in the criteria?
- Why limit publications from 2012 onwards?”
Authors’ response 1.2: We want to thank the reviewer for this comment. We have reorganized the inclusion/exclusion criteria, as the second reviewer also suggested. To highlight the classification into inclusion and exclusion criteria, we divided Table 3 into two tables. We rewrote the inclusion criteria to make their definitions more homogeneous. As suggested, we removed the phrase “seems to” to make the criteria more objective. In this study, we did include research dealing with accessibility, although this was not explicitly stated in the inclusion criteria: we included all articles that reported user-based studies, which could also include accessibility testing. In the list of articles in the supplementary materials, several studies related to accessibility testing are listed (e.g., S1, S6, S9, S24, and more; please see the supplementary materials). The research team decided to restrict the search to the last ten years (2012-2022). The main reason for that decision was to get a balanced inclusion of different types of disabilities; for instance, cognitive accessibility is an issue that only started to appear in research in the last few years.
Comment 1.3: “- Details of the review process are missing. How the review process was performed? The guidelines from Kitchenham suggest the use of two steps called filters. The first filter is to read the title and the abstract to perform an initial screening of the publications. In the second filter, the researchers read the entire publication and apply the same selection criteria. Did the authors read the entire paper to confirm whether it fits the selection criteria? How many people were involved in the process? Have any agreement assessment criteria like Cohen's Kappa been defined? Please clarify. “
Authors’ response 1.3: We thank the reviewer for this valuable comment about the insufficiently described review process. We agree that more information about the review process had to be provided. We have extended the process description with more details about the individual phases and the steps within them. We added an explanation of the measures taken to address threats to the validity of the data extracted during the process. Agreement between researchers was assessed through cross-checking, which we additionally explained in the Materials and Methods section. We did not define a Cohen's Kappa criterion in this study, which is one of the threats of bias in the selected studies that we additionally explained in Section 5.3.
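For readers unfamiliar with the agreement statistic the reviewer mentions, a minimal sketch of Cohen's kappa for two screeners is shown below. The screener names and decisions are purely hypothetical illustration, not data from the study under review.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same set of items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap given each rater's label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical include/exclude decisions for ten candidate papers.
screener_1 = ["include", "exclude", "include", "include", "exclude",
              "exclude", "include", "exclude", "include", "exclude"]
screener_2 = ["include", "exclude", "include", "exclude", "exclude",
              "exclude", "include", "exclude", "include", "include"]

print(cohens_kappa(screener_1, screener_2))  # prints 0.6 (8/10 agreement, 0.5 by chance)
```

Kappa corrects the raw agreement rate (here 0.8) for agreement expected by chance, which is why review guidelines prefer it over a simple percentage when reporting screening reliability.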
Comment 1.4: ” - Page 6 - Table 4:
- How was Venue's quality assessed? What would be a sufficient description of the testing process?
- Q4: if the goal was publications with people with disabilities, wouldn't the study already have to involve this profile instead of being a quality criterion?”
Authors’ response 1.4: We want to thank the reviewer for this comment. We agree that the description of the quality criteria had to be improved. The quality evaluation process was not properly explained in the text, as the other reviewer also pointed out. We have added a paragraph to Table 5 (Table 4 in the original manuscript, which we divided into two tables in the revised version according to the suggestion in comment 2.6) explaining the four quality criteria that we used for assessing the identified articles before final inclusion in the data extraction.
Comment 1.5: ” - Page 19: How did the authors map the best practices with the challenges? How many researchers were involved? How disagreements were resolved?”
Authors’ response 1.5: We thank the reviewer for this valuable comment identifying the missing information about the mapping process. Based on the suggestion, the description of Table 12 was extended, explaining how the opinions of the three researchers were summarized and how their disagreements were resolved.
Comment 1.6: ”- Page 20: The authors suggest reporting how the researchers achieved the approval from the ethical committee. There is usually a reference to the research ethics committee approval document. Did the authors review the documents in these publications to identify what steps (such as mitigating possible discomforts) were taken to conduct such studies?”
Authors’ response 1.6: The part that starts at line 402 (“One such example is the very rarely described activity related to obtaining ethical approval for conducting the study. Since users with disabilities are invited to participate in these studies, we expect to see more papers reporting how ethics committee approvals were obtained.”) was extended, explaining that activities connected to obtaining ethical committee approvals are lacking, as they were not reported in most of the identified studies:
Since users with disabilities are invited to participate in these studies, we expect to see more papers reporting how ethics committee approvals were obtained and how user testing activities should be adapted accordingly, for example, involving a detailed review of research proposals to ensure that they meet ethical and legal standards, protecting the rights and well-being of research participants, and maintaining the integrity of the research itself.
Comment 1.7: ”Page 3, line 145: I could not understand who performs better when using products for users with disabilities. Moreover, why is there such a difference?”
Authors’ response 1.7: We want to thank the reviewer for this comment. In the revised manuscript, we better explained the statements in section 2. We added more information to the following paragraph:
When user-based testing aims to assess a solution's performance in terms of task completion speed or errors made while using the solution, it does not matter which users we include. Existing research has shown that the performance of users with disabilities while testing a product can be very close to the performance of users without disabilities [26,32,33]. However, when analyzing and comparing the benefits of products for users with disabilities, the results between users with disabilities and users without disabilities can differ significantly. The difference could be attributed to users with disabilities making greater use of a new technology's advantages, while those without disabilities take less advantage of the benefits or do not need them as much. For example, the study by Giudice et al. demonstrated that older adults who are blind or have a visual disability and experience difficulties in navigating could significantly benefit from a navigation system, and evaluating older participants with a visual disability is important, as the majority of vision loss is age-related [32].
Comment 1.8: “I suggest presenting the results from Table 8 and Figure 6 in descending order.”
Authors’ response 1.8: We want to thank the reviewer for this comment. Table 8 (Table 9 in the revised version) was reordered in descending order to improve its clarity. However, the order in Figure 6 was kept as it was and not reorganized in descending order, because it follows the chronological sequence of the testing stages. Presenting these results in descending order would obscure the nine-stage user testing process.
Comment 1.9: “Page 9, line 244: the last sentence is duplicated.”
Authors’ response 1.9: We want to thank the reviewer for this comment. We are not sure how this duplication was introduced. We have removed the duplicate sentence.
Comment 1.10: “IU --> UI (throughout the article)”
Authors’ response 1.10: We want to thank the reviewer for this comment. We also do not know how this typo was introduced. We have, of course, changed all occurrences of IU to UI.
Reviewer 2 Report
TITLE
Identification of challenges and best practices for including users with disabilities in user-based testing.
SUMMARY OF CONTRIBUTION
The paper describes a systematic literature review focused on disabled people involved in user testing and produces some best practices as a synthesis.
FINAL EVALUATION
The paper is well structured, pleasant to read. It describes a good literature review and some interesting outcomes. To me, if minor revisions are placed following the suggestions below, the paper can be considered for publication in the Applied Sciences Journal.
COMMENTS
Abstract.
Line(s) 22 to 25: the last two sentences of the abstract should appear before.
1. Introduction.
Line(s) 31 to 142: all this stuff should be reduced into few lines (10-20 at most). The keyword disability never appears… everything refers to old (sometimes very old) concepts or definitions… Please keep in mind that the research focuses on disability.
2. Materials and methods.
Line(s) 168 to 178: “Phase 4” must appear explicitly (instead of “Next” in line 176).
Line(s) 181 (and followings): please recall the phases in the headings (instead of “2.1 Definition of research questions”, better “2.1 Phase 1: Definition of the Research Questions”).
Table 2: double included? Of course yes, but better make it explicit in the text.
Table 3: there are some glitches in this table. First, labels (In VS. En) are not enough to highlight the classification into inclusion and exclusion criteria; I suggest to divide the table in two. Second, criteria are not written homogeneously; all of them should start with the keywords “Include” or “Exclude”. Third, criteria are not linearly independent (there shouldn’t appear “must be in English” and “exclude what is not written in English” as two separate criteria).
Table 4: I suggest to delete “is sufficient” in every item of the table. Otherwise, you must define what you mean by sufficiency…
Table 4: question Q4 is really weird. I thought that the presence of disable people was mandatory to consider the papers in the review! To me, this could introduce lot of bias in setting importance weights during data reduction and processing…
3. Results.
Table 9: Do not use simple numbers to itemize the table voices. This could make the paper reading difficult. Instead, use labels like C1.1 (where C comes from Challenge).
Lines 339-342: here the authors write that the major problem in reducing data was the presence of much different disabilities. For this reason, it was impossible to generate practices good for every situation. That’s ok. Then? What about those described in the following? Are they for specific disabilities only? Are they the sole applicable in any situation, independently from the disability type? Unclear…
Table 10: right as suggested for Table 9, I suggest to make labels more meaningful for Table 10 too. BL1.1 (from Best Practices) would be much better. Moreover, all of this would make Table 11 clearer, having C-something crossing BP-something.
Finally, I am quite surprised to see that the following paper, answering to every requirement considered in the research, is not included in it.
2016 - Filippi, S., Barattin, D., 2016. Involving Autism Spectrum Disorder (ASD) affected people in design. Proceedings of the JCM2016, International Joint Conference on Mechanics, Design Engineering & Advanced Manufacturing, Pages 373-383. 14-16 September, 2016, Catania, Italy. Eynard, B., Nigrelli, V., Oliveri, M.S., Peris-Fajarnes, G., Rizzuti, S. (Eds.), Advances on Mechanics, Design Engineering and Manufacturing. Springer International Publishing AG, Cham, Switzerland. ISSN: 2195-4356. ISBN: 978-3-319-45780-2. eISBN: 978-3-319-45781-9. DOI: 10.1007/978-3-319-45781-9_38. SCOPUS: 2-s2.0-85019412612. WOS: 000400330900009.
Author Response
Dear Reviewer,
We are most grateful for your time and effort in reviewing our manuscript and for your very positive feedback. The manuscript has been revised according to the comments and suggestions provided by all reviewers. The revisions are highlighted in the Microsoft Word file to make changes easily visible. The comments and the corresponding responses are listed below.
Reply to reviewer’s comments:
Comment 2.1: “FINAL EVALUATION
The paper is well structured, pleasant to read. It describes a good literature review and some interesting outcomes. To me, if minor revisions are placed following the suggestions below, the paper can be considered for publication in the Applied Sciences Journal.”
Authors’ response 2.1: We want to thank the reviewer for recognizing the positive aspects of our manuscript.
Comment 2.2:” Abstract.
Line(s) 22 to 25: the last two sentences of the abstract should appear before.”
Authors’ response 2.2: We want to thank the reviewer for this comment. According to the comment, we moved the last two sentences before the sentence “The main result of this study is a list of challenges and good practices that are important in the different phases of user-based testing with users with disabilities.”.
Comment 2.3: “1. Introduction.
Line(s) 31 to 142: all this stuff should be reduced into few lines (10-20 at most). The keyword disability never appears… everything refers to old (sometimes very old) concepts or definitions… Please keep in mind that the research focuses on disability. “
Authors’ response 2.3: We want to thank the reviewer for this comment. We agree that the introduction was too long and that disability needs to be defined in it. We added a definition of disability and reduced the introduction so that it briefly places the study in a broad context and highlights why it is important. We were not able to reduce the introduction to only 20 lines; however, we hope that we have improved it by increasing the readability and understandability of the background and context of this study. To keep some basic definitions and related work, we moved several parts from the introduction to a newly added Section 2, Background and related work, which is highlighted with a blue background.
Comment 2.4: “2. Materials and methods.
Line(s) 168 to 178: “Phase 4” must appear explicitly (instead of “Next” in line 176).
Line(s) 181 (and followings): please recall the phases in the headings (instead of “2.1 Definition of research questions”, better “2.1 Phase 1: Definition of the Research Questions”).”
Authors’ response 2.4: We want to thank the reviewer for this comment. We fully agree with the reviewer that referring to individual phases in the text is necessary. It is better to explicitly state the name of the phase because otherwise, it can confuse the reader. And yes, in the headings of the subsections, we should have used the same terms. We have corrected this part of the manuscript according to your kind advice.
Comment 2.5:” Table 2: double included? Of course yes, but better make it explicit in the text.”
Authors’ response 2.5: We want to thank the reviewer for this suggestion. We have added an explanation to the cross-reference to Table 2 in both appearances. In the first cross-reference to Table 2, we added: “(see selected libraries in Table 2)”. In the second cross-reference to Table 2, we explained the data presented: “The number of articles retrieved from selected digital libraries that were used as input into the next selection process steps is presented in Table 2. The most considerable number of articles was found in ACM Digital Library (562), followed by Scopus (255), IEEE Xplore (152) and Web of Science (116).”
Comment 2.6: ” Table 3: there are some glitches in this table. First, labels (In VS. En) are not enough to highlight the classification into inclusion and exclusion criteria; I suggest to divide the table in two. Second, criteria are not written homogeneously; all of them should start with the keywords “Include” or “Exclude”. Third, criteria are not linearly independent (there shouldn’t appear “must be in English” and “exclude what is not written in English” as two separate criteria).”
Authors’ response 2.6: We want to thank the reviewer for this suggestion. To highlight the classification into inclusion and exclusion criteria, we divided Table 3 into two tables. We rewrote the inclusion criteria to make their definitions more homogeneous. Of course, since there was already a criterion that included only articles written in English, there is no need for an additional exclusion criterion excluding non-English articles. We want to thank you again for finding this error in the list of exclusion criteria. We have removed exclusion criterion E5.
Comment 2.7:” Table 4: I suggest to delete “is sufficient” in every item of the table. Otherwise, you must define what you mean by sufficiency…
Table 4: question Q4 is really weird. I thought that the presence of disable people was mandatory to consider the papers in the review! To me, this could introduce lot of bias in setting importance weights during data reduction and processing…”
Authors’ response 2.7: We want to thank the reviewer for this comment. We agree that the description of the quality criteria had to be improved. The quality evaluation process was not properly explained in the text, as the first reviewer also pointed out. We have added a paragraph to Table 5 (Table 4 in the original manuscript, which we divided into two tables in the revised version according to the suggestion in comment 2.6) explaining the four quality criteria that we used for assessing the identified articles before final inclusion in the data extraction. Since our research aimed to identify barriers and good practices in the inclusion of users with disabilities, an article needed to report the implementation of a study involving such users (Q4). We were interested in studies that, based on practical experience, could provide both the challenges of including users with disabilities in user-based testing and good practices.
Comment 2.8:” Table 9: Do not use simple numbers to itemize the table voices. This could make the paper reading difficult. Instead, use labels like C1.1 (where C comes from Challenge).”
Authors’ response 2.8: We want to thank the reviewer for this suggestion. We changed simple numbers to labels in both tables – CX.X in Table 10 (in the original version of the manuscript, it was Table 9) and BPX.X in Table 11 (Table 10 in the original manuscript) to emphasize when we are writing about challenges and best practices.
Comment 2.9:” Lines 339-342: here the authors write that the major problem in reducing data was the presence of much different disabilities. For this reason, it was impossible to generate practices good for every situation. That’s ok. Then? What about those described in the following? Are they for specific disabilities only? Are they the sole applicable in any situation, independently from the disability type? Unclear…”
Authors’ response 2.9: We want to thank the reviewer for the excellent observation and agree with the point. This limitation was additionally explained with the following statements in Section 5. Discussion:
The needs and expectations of end users can differ significantly depending on the type of disability. As a result, the challenges and best practices that apply to users with a specific type of disability can also differ. Due to the lack of research in the existing literature, it is currently difficult to develop either general guidelines for implementing best practices across all types of disabilities or a list of best practices associated with a specific disability. For this reason, it will be necessary to wait for new research, which can be expected in the future given the established trend of published research in recent years.
And the following statements in subsection 5.2 Limitations:
Within the framework of the identified challenges and best practices of involving users with disabilities in user-based testing, it is necessary to consider the limitations associated with the scope of studies published in the existing literature.
In section 6. Conclusions, we added proposals for future research that is needed in this field based on the outcomes:
Results of this study showed an opportunity for further systematic mapping research, the goal of which would be to prepare a systematic mapping of challenges and good practices according to the type of disability. We also see the possibility for future research, which will enable the confirmation or validation of identified challenges and good practices in cooperation with users with specific types of disability who have practical experience with active involvement in user-based testing.
Comment 2.10:” Table 10: right as suggested for Table 9, I suggest to make labels more meaningful for Table 10 too. BL1.1 (from Best Practices) would be much better. Moreover, all of this would make Table 11 clearer, having C-something crossing BP-something.”
Authors’ response 2.10: We want to thank the reviewer for this excellent suggestion. As explained in answer to comment 2.8, we made labels more meaningful in Table 11 (previously Table 10) to clarify the table.
Comment 2.11:” Finally, I am quite surprised to see that the following paper, answering to every requirement considered in the research, is not included in it.
2016 - Filippi, S., Barattin, D., 2016. Involving Autism Spectrum Disorder (ASD) affected people in design. Proceedings of the JCM2016, International Joint Conference on Mechanics, Design Engineering & Advanced Manufacturing, Pages 373-383. 14-16 September, 2016, Catania, Italy. Eynard, B., Nigrelli, V., Oliveri, M.S., Peris-Fajarnes, G., Rizzuti, S. (Eds.), Advances on Mechanics, Design Engineering and Manufacturing. Springer International Publishing AG, Cham, Switzerland. ISSN: 2195-4356. ISBN: 978-3-319-45780-2. eISBN: 978-3-319-45781-9. DOI: 10.1007/978-3-319-45781-9_38. SCOPUS: 2-s2.0-85019412612. WOS: 000400330900009.”
Authors’ response 2.11: As we wrote in the Limitations section, although several digital libraries were used and five researchers were involved in searching for relevant literature, not all relevant studies were identified and included in the review after applying inclusion and exclusion criteria and quality assessment. We want to thank the reviewer for directing us to the study. We have added this study to the list of included studies, extracted the data and updated the information in different parts of the manuscript. We have also highlighted all data updates in the manuscript accordingly.
Reviewer 3 Report
I do not see the purpose of your study. I think introduction could be more organized. It would be better to highlight the significance of the study. The tables should be interpreted better. Too long tables. Very complicated. It is not easy to follow.
Author Response
Dear Reviewer,
We are most grateful for your time and effort in reviewing our manuscript. The manuscript has been revised according to all reviewers' suggestions. The revisions are highlighted in the Microsoft Word file so that changes are easily visible. The comments and the corresponding responses are listed below.
Reply to reviewer’s comments:
Comment 3.1: “I do not see the purpose of your study. I think introduction could be more organized. It would be better to highlight the significance of the study.”
Authors’ response 3.1: Thank you for pointing out this shortcoming. As other reviewers similarly pointed out, we reorganized the introduction. We added some missing definitions and reduced the introduction to briefly place the study in a broad context and highlight why it is important. To keep some important definitions and related work, we moved that content from the introduction into a newly added Section 2, Background and related work. We also added the following statements to emphasize the main contribution of this study:
The main contribution of this study is a list of challenges and best practices for involving users with disabilities in user-based testing that have been documented in the existing scientific literature. The result of our research can be used to identify shortcomings in the existing literature in this field. The results of this study also provide the basis for building a catalogue of patterns for the adequate inclusion of users with disabilities in user testing.
Comment 3.2: “The tables should be interpreted better. Too long tables. Very complicated. It is not easy to follow.”
Authors’ response 3.2: Thank you for pointing out this shortcoming. We agree with the reviewer, as did the other reviewers, that the tables had to be improved. Accordingly, we added improved descriptions of several tables and shortened the text in Tables 10 and 11 (Tables 9 and 10 in the original manuscript). We shortened the descriptions of individual challenges and good practices without omitting the most crucial information extracted from the existing articles.
Round 2
Reviewer 3 Report
It looks improved.