Article
Peer-Review Record

The Think-Aloud Method for Evaluating the Usability of a Regional Atlas

ISPRS Int. J. Geo-Inf. 2023, 12(3), 95; https://doi.org/10.3390/ijgi12030095
by Tomas Vanicek and Stanislav Popelka *
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 26 January 2023 / Revised: 14 February 2023 / Accepted: 23 February 2023 / Published: 26 February 2023

Round 1

Reviewer 1 Report

Thank you for the opportunity to review this very interesting article entitled "The Think-aloud Method for Evaluating the Usability of a Regional Atlas". The article very aptly describes the use of think-aloud methods in evaluating the usability of a cartographic product.
The authors describe their chosen method very forcefully both in the introduction and in the following chapters, but rely on outdated literature at crucial points in their argumentation. I would suggest adding more recent studies, e.g. in lines 42-45 and 164-167.
Line 221 - it seems that the earlier review of the literature already confirms this, especially sub-chapter 1.3.
Lines 244-245 - a subjective selection will remain subjective, even if made in consultation with the authors of the Atlas; please specify how the questions were chosen.
Subsection 2.2 - did the authors consider cognitive tests as part of the selection of question difficulty?
There is no statistical confirmation of the numbers obtained.
It would be reasonable to calculate an efficiency index relating the time of task completion to the correctness of task completion.

Author Response

Thank you for the opportunity to review this very interesting article entitled "The Think-aloud Method for Evaluating the Usability of a Regional Atlas". The article very aptly describes the use of think-aloud methods in evaluating the usability of a cartographic product.

The authors describe their chosen method very forcefully both in the introduction and in the following chapters, but rely on outdated literature at crucial points in their argumentation. I would suggest adding more recent studies, e.g. in lines 42-45 and 164-167.

More recent studies have been added to the end of the Introduction section and also at the end of section 1.2.

Line 221 - it seems that the earlier review of the literature already confirms this, especially sub-chapter 1.3.

This sentence has been made more specific: "The purpose of the study was not only to find the weaknesses of the atlas, but also to see if the think-aloud method was at all suitable for an atlas evaluation."

Lines 244-245 - a subjective selection will remain subjective, even if made in consultation with the authors of the Atlas; please specify how the questions were chosen.

Since the general public can be users of the atlas, the tasks have been designed so that the "average" person (regardless of age) can solve them. This sentence was added to Section 2.2 of the paper.

Subsection 2.2 - did the authors consider cognitive tests as part of the selection of question difficulty?

Cognitive tests were not used to develop the questions because of the breadth of potential product users (and participants). On the other hand, the tasks created were (subjectively) judged to be relevant to the research needs and did not require any expertise from the participant. These sentences were added to Section 2.2 of the paper.

There is no statistical confirmation of the numbers obtained.

It is not necessary to add any further statistical calculations, since the evaluation was performed qualitatively. Moreover, the article would become unnecessarily long and overly detailed. The reader can confirm the correspondence between the data in the two tables visually. On the other hand, some sentences have been rewritten to make them more understandable.

It would be reasonable to calculate an efficiency index relating the time of task completion to the correctness of task completion.

The think-aloud method primarily focuses on the participant's views. Usability metrics are thus more of an add-on to the evaluation and serve mainly as a basis for justifying any negative opinions of the user. The results already show which parts of the product the participants had problems with, so there is no need to calculate additional metrics or to deepen the research further.
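For reference, one common formulation of such a time-based efficiency index, which could be computed from the recorded completion times and correctness should a quantitative summary ever be required, is

Efficiency = ( Σ_j Σ_i n_ij / t_ij ) / (N · R)

where N is the number of tasks, R is the number of participants, n_ij equals 1 if participant j solved task i correctly (0 otherwise), and t_ij is the time participant j spent on task i.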

 

Reviewer 2 Report

The paper presents an application of the think-aloud method to evaluate the usability of the "Atlas of the Moravian-Silesian Region". The topic of the article is interesting and belongs to a category that is rare in the IJGI journal, and thus may be interesting to its audience. A coherent and comprehensive overview of the method and its application is provided, and the problems of applying the method are discussed.

 

The testing scenario given in Table 1 should include more details - the order of the tasks should have been provided (whether the tasks were carried out one after the other in the order specified, or whether the order could be changed) and the time allocated to each task.

 

In 2.2, the authors contradictorily state: "test participants should be able to solve the task in a virtually automated manner, but at the same time, the task must be sufficiently difficult for them." In any case, the assumption of task optimality should be more strongly supported by data on the study group and the specificity of the questions. Which methodology was used to formulate the particular questions? This is particularly important given that the participants in the experiment were of different ages, so the questions could not be linked to the school curriculum alone.

 

The methodological validity of having such different age groups in a relatively small group of 13 people is questionable. People aged 18 and 60+ may use the atlases very differently, and when the number of participants in each group is very small, the results are less reliable. 

 

Section 2.4 should be more systematic, clearer and focused on concrete experiment design details. Phrases like "the other side contained the introductory task. This logic task was presented to the participant before the actual testing began." are vague and should be revised. This section should not be about what "should" or "could" be done, but only about how it was actually done in this experiment.

 

Figure 8. Error rate analysis - only includes facts (number of errors), not the results of analysis.

 

The analysis of the results presented here is not very deep, although it reveals key observations. Concerning the satisfaction issue, I would like to see more quantitative analysis and a generalisation, not just a mention of the facts.

 

Discussion and conclusion parts should be shorter, more specific and more systematic, focusing on the findings of the study and, where possible, based on quantitative data.  

 

Despite the shortcomings mentioned above, I believe that the authors have achieved their goal of demonstrating the applicability of the think-aloud method for an atlas as a complex cartographic product.

 

The style and, in some cases, the grammar of the English should be improved. For example, the term "sub-goal" doesn't seem appropriate in section 1.4 - the listed items are tasks, and "creation of the experiment" (design? conducting of the experiment?) sounds strange. Constructions like "the questions [...] should be colour coded below each other" in 2.4 and similar must be revised.

I recommend a thorough revision by a native speaker.

Author Response

The paper presents an application of the think-aloud method to evaluate the usability of the "Atlas of the Moravian-Silesian Region". The topic of the article is interesting and belongs to a category that is rare in the IJGI journal, and thus may be interesting to its audience. A coherent and comprehensive overview of the method and its application is provided, and the problems of applying the method are discussed.

The testing scenario given in Table 1 should include more details - the order of the tasks should have been provided (whether the tasks were carried out one after the other in the order specified, or whether the order could be changed) and the time allocated to each task.

Participants went through the questions one by one (from 1 to 10). Participants strictly followed the test scenario, which is an advantage because the researcher does not have to interfere in the testing process. At the same time, there was no time limit for completing the tasks. These sentences were added to Section 2.2 of the paper.

In 2.2, the authors contradictorily state: "test participants should be able to solve the task in a virtually automated manner, but at the same time, the task must be sufficiently difficult for them." In any case, the assumption of task optimality should be more strongly supported by data on the study group and the specificity of the questions. Which methodology was used to formulate the particular questions? This is particularly important given that the participants in the experiment were of different ages, so the questions could not be linked to the school curriculum alone.

Cognitive tests were not used to develop the questions because of the breadth of potential product users (and participants). On the other hand, the tasks created were (subjectively) judged to be relevant to the research needs and did not require any expertise from the participant. Since the general public can be users of the atlas, the tasks have been designed so that the "average" person (regardless of age) can solve them. These sentences were added to Section 2.2 of the paper.

The methodological validity of having such different age groups in a relatively small group of 13 people is questionable. People aged 18 and 60+ may use the atlases very differently, and when the number of participants in each group is very small, the results are less reliable.

This "random" strategy of selecting participants from different age groups was done deliberately because potential users of the product could be all people from these age groups. The more participants from the selected groups, the more reliable the data would be, but the research costs would increase. Part of the discussion in the article was enriched by these sentences.

Section 2.4 should be more systematic, clearer and focused on concrete experiment design details. Phrases like "the other side contained the introductory task. This logic task was presented to the participant before the actual testing began." are vague and should be revised. This section should not be about what "should" or "could" be done, but only about how it was actually done in this experiment.

This section has been checked and rewritten.

Figure 8. Error rate analysis - only includes facts (number of errors), not the results of analysis.

The word "analysis" has been deleted from the image description.

The analysis of the results presented here is not very deep, although it reveals key observations. Concerning the satisfaction issue, I would like to see more quantitative analysis and a generalisation, not just a mention of the facts.

As described in Section 3.3, satisfaction characteristics were collected through a simple post-test interview and also through observer notes. For this reason, a quantitative type of analysis cannot be used. However, the results sections have been rewritten slightly to make them more understandable.

Discussion and conclusion parts should be shorter, more specific and more systematic, focusing on the findings of the study and, where possible, based on quantitative data.

These sections have been shortened and streamlined.

Despite the shortcomings mentioned above, I believe that the authors have achieved their goal of demonstrating the applicability of the think-aloud method for an atlas as a complex cartographic product.

The style and, in some cases, the grammar of the English should be improved. For example, the term "sub-goal" doesn't seem appropriate in section 1.4 - the listed items are tasks, and "creation of the experiment" (design? conducting of the experiment?) sounds strange. Constructions like "the questions [...] should be colour coded below each other" in 2.4 and similar must be revised. I recommend a thorough revision by a native speaker.

The article has been submitted for an English language review performed by a native speaker. Certain words have been changed and made more specific.

 

Reviewer 3 Report

 

An interesting topic and research that can help cartographers are presented.

It is not clear why the think-aloud method was chosen for the research presented in the article. There is a short statement on this question (lines 43-45), which is not sufficient as an answer. It would be good if the authors did some additional research and provided evidence for the choice of method.

Another question should be clarified: how were the "people with an extroverted nature [who] are most often chosen for testing" identified for the test?

If the think-aloud method has its roots in psychological research, it is not clear whether other cartographers have used the same method for cartographic research, although some applications in cartography are shown in 1.3. Point 1.2 should be supplemented with a cartographic aspect, or the name of point 1.2 should be changed. The text as constructed suggests that the method is new to cartography. Is that correct?

There is no localization of the objects represented on the maps in the atlas because the map graticules are missing (Fig. 3, Fig. 11). This will cause many difficulties for readers; for example, participants may go wrong on the questions starting with "Where?" (line 265). It would be better if appropriate maps were used for this research.

It is not clear how some of the questions will help to improve the atlas, which is one of the main goals of the research: e.g., No. 2. Do the questions cover the Atlas content or not? If not, the results will help to improve only the maps that were given attention.

Thirteen participants are not enough to obtain and analyze results (one of the age groups contains only a single participant). At least 20-30 participants per age group would be sufficient. Professionals could be separated from the other participants, and results should be reported for every age group.

According to Nielsen [31], there is a simple answer for the number of participants in these qualitative experiments, namely five. This is because testing with five participants will find almost the same number of product usability problems (up to 80%) as with a much larger number of test participants. It is difficult to agree with this in the case of an atlas being evaluated by participants in different age groups.

Professional reviews of the Atlas could also improve its quality.

Author Response

An interesting topic and research that can help cartographers are presented.

It is not clear why the think-aloud method was chosen for the research presented in the article. There is a short statement on this question (lines 43-45), which is not sufficient as an answer. It would be good if the authors did some additional research and provided evidence for the choice of method.

Two sentences have been added to the Introduction section to give an idea of the purpose of the method: "For this study, the think-aloud method was chosen. The purpose was to see if the method was applicable to a more complex cartographic product, specifically an atlas evaluation." In subsection 1.4, Objectives of the study, the objective was specified in more detail: "The purpose of the study was not only to find the weaknesses of the atlas, but also to see if the think-aloud method was at all suitable for an atlas evaluation."

Another question should be clarified: how were the "people with an extroverted nature [who] are most often chosen for testing" identified for the test?

Subsection 2.3 (Participants) has been expanded with a few sentences: "As a matter of method principle, it is preferable to select participants with an extroverted nature, and extroverts were preferred for this study. However, this characteristic is difficult to estimate in advance, so before the actual testing began, participants were given a logic task to test how easily they could verbalize their thoughts." At the same time, subsection 2.4 was slightly expanded to include information on the logic training task, and one sentence was added to the end of section 1.1: "According to Alnashri, extroverts are able to identify a higher number of usability issues, have a higher success rate in completing tasks, and are more comfortable verbalizing their thoughts."

If the think-aloud method has its roots in psychological research, it is not clear whether other cartographers have used the same method for cartographic research, although some applications in cartography are shown in 1.3. Point 1.2 should be supplemented with a cartographic aspect, or the name of point 1.2 should be changed. The text as constructed suggests that the method is new to cartography. Is that correct?

Section 1.2 has been renamed "General origins of the think-aloud method" to better reflect its general scope. This section should give the reader an insight into the reason for the origin of the method and the importance of its emergence, and also show some flexibility of the method. The reader should get the idea that the emergence of the method was an important milestone and that the method is still applicable today.

Section 1.3 deals more with cartographic applications. Unfortunately, the method has not been used that much in cartography (see also the second line in the abstract), which is also the reason why this section is so short. However, the takeaway from this section should be that, by its very principle, the method is fully applicable to cartographic products.

There is no localization of the objects represented on the maps in the atlas because the map graticules are missing (Fig. 3, Fig. 11). This will cause many difficulties for readers; for example, participants may go wrong on the questions starting with "Where?" (line 265). It would be better if appropriate maps were used for this research.

The main aim of the research was to evaluate the usability of the atlas. However, the atlas is a complex product, and it is not realistic to evaluate all of its pages and elements. Therefore, in consultation with the authors, only specific (important) topics and the associated maps were selected for testing (Section 2.2). Finding a specific object on the map was not usually the aim of the questions; rather, the thought processes during the search for answers were important.

It is not clear how some of the questions will help to improve the atlas, which is one of the main goals of the research: e.g., No. 2. Do the questions cover the Atlas content or not? If not, the results will help to improve only the maps that were given attention.

The first three questions served primarily to get the participant used to working with the atlas. More important were the subsequent topics (starting with question 4). These sentences were added to Section 2.2 of the paper.

Thirteen participants are not enough to obtain and analyze results (one of the age groups contains only a single participant). At least 20-30 participants per age group would be sufficient. Professionals could be separated from the other participants, and results should be reported for every age group.

This "random" strategy of selecting participants from different age groups was done deliberately because potential users of the product could be all people from these age groups. The more participants from the selected groups, the more reliable the data would be, but the research costs would increase. Part of the discussion in the article was enriched by these sentences. The reason why the number of thirteen participants is sufficient is given in the discussion section of the paper.

According to Nielsen [31], there is a simple answer for the number of participants in these qualitative experiments, namely five. This is because testing with five participants will find almost the same number of product usability problems (up to 80%) as with a much larger number of test participants. It is difficult to agree with this in the case of an atlas being evaluated by participants in different age groups.

Thirteen participants took part in the testing, more than Nielsen and other experts recommend. Moreover, as has been shown, this was a sufficient number and many of the product's shortcomings were revealed, although the representation for each age group was not equal.
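For context, Nielsen's figure follows from the problem-discovery model of Nielsen and Landauer,

Found(n) = N · (1 − (1 − λ)^n)

where N is the total number of usability problems in the product, n is the number of test participants, and λ is the average probability that a single participant reveals a given problem (about 0.31 in Nielsen's data). With five participants this gives roughly 1 − 0.69^5 ≈ 84% of the problems, and with the thirteen participants used here more than 99%, although λ varies between studies and user groups.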

Professional reviews of the Atlas could also improve its quality.

Yes, that is why experts with a cartographic background were part of the testing. Moreover, the atlas was sent to professional cartographers before its publication, and they provided professional reviews.

 

Round 2

Reviewer 3 Report

I accept the revised version of the article.
