The Effect of an Alternative Definition of “Percent Highly Annoyed” on the Exposure–Response Relationship: Comparison of Noise Annoyance Responses Measured by ICBEN 5-Point Verbal and 11-Point Numerical Scales

Since ICBEN Team 6 developed the 5-point verbal and 11-point numerical scales for measuring noise annoyance, these scales have been widely used in socio-acoustic surveys worldwide, and annoyance responses have been readily compared internationally. However, both the top two categories of the 5-point verbal scale and the top three categories of the 11-point numerical scale are taken to correspond to high annoyance, so it is difficult to compare annoyance responses precisely. Therefore, we calculated differences in day–evening–night-weighted sound pressure levels (Lden) by comparing the values corresponding to 10% highly annoyed (HA) on Lden-%HA curves obtained from 40 datasets from surveys conducted in Japan and Vietnam. The results showed that the Lden value corresponding to 10% HA using the 5-point verbal scale was approximately 5 dB lower than that using the 11-point numerical scale. Thus, some correction is required to compare annoyance responses measured by the 5-point verbal and the 11-point numerical scales. The results of this study were also compared with those of a survey in Switzerland.


Introduction
Schultz [1] used the term "percent highly annoyed" (%HA) to define the rate of people who were classified either in the top two categories of a 7-point scale (cutoff value: 71%) or in the top three categories of an 11-point scale (cutoff value: 73%) for measuring noise annoyance. He also emphasized the importance of high annoyance rather than median annoyance, because median annoyance is more influenced by non-acoustical variables than high annoyance. He also pointed out that the median response is much more difficult to translate from one annoyance scale to another and, furthermore, corresponds to no complaint and thus cannot be used for policy purposes.
Schultz [1] showed the synthesized curve relating the day-night-weighted sound pressure level (Ldn) to %HA regardless of the noise source based on social survey data reported at the time. Miedema and Vos [2] proposed separate exposure-response relationships for individual noise sources through secondary analysis, adding data to the work of Schultz. They defined the upper 28% of annoyance scales (cutoff value: 72%) as %HA, assuming that the scale intervals were equidistant from 0 through 100, regardless of the number of scale points. The 72% cutoff point was obtained by weighting the annoyance responses of category 4 (very annoyed) with a weight of 0.4. On the other hand, Fields et al. and Team 6 of the International Commission on Biological Effects of Noise (ICBEN) [3] proposed that 5-point verbal ("not at all," "slightly," "moderately," "very," and "extremely" annoyed) and 11-point numerical scales (labeling the two extremes as "not at all" and "extremely" annoyed) should be used in socio-acoustic surveys, and these scales were adopted in the International Organization for Standardization technical specification (ISO/TS) 15666:2003 [4]. Fields et al. [3] proposed defining the top two categories of the 5-point verbal scale (cutoff value: 60%) as high annoyance, because the meanings of "very" and "highly" are similar, but %HA is not defined in ISO/TS 15666:2003.
Initially, Fields et al. constructed the standardized annoyance scales in nine languages in an international joint study [3], and these have since been developed in other languages. Gjestland [5] collected the annoyance scales and question wordings published through 2017 in 17 languages: English, Dutch, French, German, Hungarian, Japanese, Norwegian, Spanish, Turkish, Polish, Danish, Portuguese, Rumanian, Chinese, Korean, Vietnamese, and Thai. Slovenian scales were published in 2018 [6], and to our knowledge, the standardized annoyance scales have been published in these 18 languages only.
Fields et al. [3] measured the intensities of 21 modifiers in English by line marking on a 0-100 scale. Since the average intensity of "Highly" is 79 [3] (p. 661), two categories of the 7-point scale (cutoff value: 71) and three categories of the 11-point scale (cutoff value: 73) corresponding to high annoyance are reasonable considering the range of values indicating %Highly Annoyed. If only the highest category of the 7-point scale (cutoff value, 86) is used to designate a high annoyance, this would result in an extreme response. On the other hand, if the top three categories of the 7-point scale (cutoff value, 57) are included, the response would be classified as median annoyance.
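The cutoff values quoted above follow from simple arithmetic once the scale categories are assumed to be of equal width on a 0-100 range. The following is a minimal sketch of that arithmetic; the function names are hypothetical and are not taken from any of the cited studies.

```python
def cutoff_percent(n_points, n_top):
    """Lower boundary (on a 0-100 range) of the top `n_top` categories of an
    `n_points`-category scale, assuming equal-width categories."""
    return 100.0 * (n_points - n_top) / n_points


def weight_above_cutoff(n_points, category, cutoff):
    """Share of one category (1 = lowest) that lies above `cutoff`,
    again assuming equal-width categories on a 0-100 range."""
    width = 100.0 / n_points
    lower = (category - 1) * width
    upper = category * width
    above = max(0.0, upper - max(lower, cutoff))
    return above / width


# (scale points, number of top categories counted as HA)
for n_points, n_top in [(7, 2), (11, 3), (5, 2), (7, 1), (7, 3)]:
    print(n_points, n_top, round(cutoff_percent(n_points, n_top)))
# rounded cutoffs, in order: 71, 73, 60, 86, 57

# Category 4 of the 5-point scale spans 60-80, so 8/20 of it lies above 72.
print(weight_above_cutoff(5, 4, 72.0))
```

Under the same equal-width assumption, the weight of 0.4 used for the 72% cutoff is the share of the "very annoyed" category (60-80) that lies above 72.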
Since the development of the standardized annoyance scales by the ICBEN, they have been used in most socio-acoustic surveys. However, either a 73% cutoff HA (top three categories of the 11-point numerical scale) or a 60% cutoff HA (top two categories of the 5-point verbal scale) has usually been reported in publications. An exception is the study by Wothge et al. [7], who reported the combined effects of aircraft and road traffic noise and of aircraft and railway noise in the Noise-Related Annoyance, Cognition, and Health (NORAH) study. They used only the 5-point verbal scale and demonstrated the relationships between day-evening-night-weighted sound pressure level (Lden) and %HAs for both 60% and 72% cutoffs in their article. The calculation of the 72% cut-off-point was performed according to the method of Miedema and Vos [2]. They investigated differences in Lden between exposure-response curves for 60% and 72% cutoffs at 10% HA and found that the differences were quite large (8-14 dB). This difference is not negligible, considering that Lden values in the exposure-response curves at 10% HA were adopted in the World Health Organization (WHO) Environmental Noise Guidelines in 2018 [8].
Guski et al. [9] conducted a systematic review on environmental noise annoyance for the development of the WHO Environmental Noise Guidelines. They selected survey studies through a predefined framework (population, intervention and/or exposure, control, confounder, outcome, and study design; PECCOS) and systematically meta-analyzed them. However, %HAs for 73% and 60% cutoffs were found to coexist in the studies. Of the 15 selected aircraft noise surveys, 14 used a 73% cutoff, and 1 used a 60% cutoff. Of 26 road traffic noise surveys, 23 used a 73% cutoff, and 3 used a 60% cutoff. Of 11 railway noise surveys, the final analysis was performed using data from 10 surveys, of which 4 used a 73% cutoff and 6 used a 60% cutoff. As shown by Wothge et al. [7], there is a large difference in the Lden-%HA relationships between %HA cutoffs of 60% and 72%. Therefore, a correction for this difference should be applied to conduct a precise meta-analysis.
While Guski et al. [9] stated that the de facto %HA cutoff should be 73%, they analyzed the data as reported in the articles without correction, such as a conversion from a 60% cutoff to a 72% cutoff. For aircraft noise, the survey with a 60% cutoff was not used in the final analysis. For road traffic noise, they showed results excluding surveys with a 60% cutoff as well as Japanese and Vietnamese surveys. However, for railway noise, they showed the representative exposure-response relationships excluding only a shinkansen (bullet train) noise survey in Japan, because the exposure-response relationships cannot be drawn if all the surveys with a 60% cutoff are excluded. Therefore, Guski et al. emphasized the need for a re-evaluation including older data (only surveys after 2000 were included in the systematic review).
Brink et al. [10] investigated the following factors to be considered when measuring annoyance: scale type (5-point verbal and 11-point numerical scales), position of the annoyance questions, order of the modifiers of the annoyance scale (ascending or descending), and season (spring or autumn). The value of %HA with a 72% cutoff was obtained according to a method previously described [2]. In terms of scale type, there was lower weighting of the relationship between Ldn and 72% cutoff HA for responses that fell into the second category from the top of the 5-point verbal scale compared with that between Ldn and 73% cutoff HA measured by the 11-point numerical scale. Nguyen et al. [11] investigated corresponding relationships between responses measured by 5-point verbal and 11-point numerical scales based on data from 15 social surveys carried out in Japan and Vietnam and compared (1) quadratic regression curves between Lden and 72% cutoff HA following Miedema and Vos [2], (2) logistic regression curves between Lden and 73% cutoff HA measured by the 11-point numerical scale, and (3) logistic regression curves between Lden and 60% cutoff HA measured by the 5-point verbal scale. They showed that curve (1) was consistent with curve (2), which was lower than curve (3).
We have conducted socio-acoustic surveys using both the 5-point verbal and the 11-point numerical scales in Japan and Vietnam since the standardized annoyance scales were proposed by ICBEN. In this paper, we drew three curves by using data from 29 social surveys, including the abovementioned 15 surveys that Nguyen et al. [11] used: (a) the exposure-response relationship between Lden and 60% cutoff HA (top two categories) of the 5-point verbal scale, (b) that between Lden and 73% cutoff HA (top three categories) of the 11-point numerical scale, and (c) that between Lden and 72% cutoff HA calculated following Schreckenberg's method [12], in which responses to the second category from the top of the 5-point verbal scale were randomly divided into two groups, i.e., 40% (HA) and 60% (not HA). Schreckenberg's method is based on essentially the same idea as the method of Miedema and Vos [2], but it is useful when analyzing individual data, such as in logistic regression analysis, which was applied in the present study. Then, differences in Lden at 10% HA between curve (a) and curves (b) or (c) were calculated. The objectives of this study were (1) to investigate whether some correction is necessary when using a 60% cutoff to obtain the Lden at 10% HA equivalent to that of the exposure-response curve using a 73% cutoff; (2) if the correction is necessary, to investigate whether the correction values differ between 72% and 73% cutoffs, between Japanese and Vietnamese results, and among noise sources; and (3) to compare the results with those obtained in Switzerland by Brink et al. [10].

Dataset
As shown in Table 1, we conducted 29 social surveys over 18 years using the 5-point verbal and 11-point numerical scales proposed by ICBEN. The number of respondents ranged from approximately 200 to 1500, and the response rates ranged from 29% to 99%, being low in Japan and high in Vietnam. There were 14 surveys conducted in Japan and 15 in Vietnam. There were five surveys on conventional railway noise, four on shinkansen noise, five on combined noise, nine on aircraft noise, and six on road traffic noise. The conventional railway and shinkansen railway noise surveys were mainly conducted in Japan, and the aircraft and road traffic noise surveys were mainly conducted in Vietnam. In the combined noise surveys, we evaluated three kinds of annoyance: either those caused by road traffic, aircraft, and total (road traffic + aircraft) noise, or those caused by conventional railway, shinkansen railway, and total (conventional + shinkansen railway) noise. Thus, each of the five combined noise surveys was divided into three datasets. In addition, the survey "2016_KNZ_SR" was divided into two datasets because it was conducted in two different regions. Accordingly, a total of 40 (23 + 3 × 5 + 1 × 2) datasets were reanalyzed. In Japan, respondents were selected using a nearest birthday method on a one person per family basis. In Vietnam, on the other hand, family members were asked to answer in order of age: father, mother, and other adults over 18 years old. While the distribute-collect and distribute-mail methods were used in Japan, the face-to-face interview method was used in Vietnam.

Table 1. List of datasets used in the present analysis. Abbreviations in the "Noise source" column indicate the following: CR, conventional railway; SR, shinkansen railway; CB, combined noise source; CA, civil aircraft; RT, road traffic. The numbers in parentheses in the first column correspond to those of the references at the end of this paper.

Lden was available for all surveys except 2001_SAP_CR, for which only Ldn was available. Therefore, the difference between Lden and Ldn was checked using the five railway noise datasets (2002_FUK_CR, 2009_KUM_CR, 2010_KUM_CR, 2011_KUM_CR, 2012_KUM_CB) for which both Lden and Ldn were available. The difference between the two metrics ranged from 0.4 dB to 0.6 dB and was not significant. It was therefore judged that the difference between Lden and Ldn for conventional railway noise in Japan is small, and Ldn was used instead of Lden for 2001_SAP_CR. All noise exposure data were obtained using field noise measurements and distance reduction equations, except for aircraft noise, which was measured at a reference point at each survey site.

If the 5-point verbal and 11-point numerical scales are each equidistant, their cutoff points are 60% and 73%, respectively, and the shaded areas show the range of HA. Because the range above a 60% cutoff is wider than that above a 73% cutoff, the %HA measured by the 5-point verbal scale is expected to be larger than that measured by the 11-point numerical scale. The relationships between Lden and %HA for 60% and 73% cutoffs are shown schematically in Figure 2. The WHO guidelines for environmental noise [8] adopt the Lden values corresponding to 10% HA as the guideline values. The Lden value of a 60% cutoff at 10% HA (abbreviated as Lden_10% (60)) is usually smaller than that of a 73% cutoff (Lden_10% (73)), so a difference between the two values is expected (ΔL = Lden_10% (73) − Lden_10% (60)). In this paper, Lden at 10% HA was estimated from the results of logistic regression analysis with HA or not HA as the dependent variable and Lden as the independent variable. In the analysis, Lden was treated as a continuous variable, and individual data with Lden above 30 dB were used. The difference between the 60% and 73% cutoffs is denoted by ΔL1.
In addition, following Schreckenberg (2013) [12], responses to the second category from the top of the 5-point verbal scale were randomly divided into two groups: 40% (HA) and 60% (not HA). The averages of the Lden values in both groups were not significantly different (t-test, p > 0.05). In this case, the cutoff point was 72%. Logistic regression analysis was applied to the above processed data. The Lden value of a 72% cutoff at 10% HA is represented by Lden_10% (72), and the difference between Lden_10% (60) and Lden_10% (72) (ΔL2) was obtained. All analyses were performed using JMP 11 software (SAS Institute Inc., Cary, NC, USA, 2013).
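The estimation procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis actually run in JMP: responses on the 5-point verbal scale are assumed to be coded 1-5, the logistic model is fitted with a hand-written Newton-Raphson loop, and all function names are hypothetical.

```python
import math
import random


def ha_top_categories(responses, lowest_ha_category):
    """Code each response as 1 (HA) if it is in the top categories, else 0."""
    return [1 if r >= lowest_ha_category else 0 for r in responses]


def ha72_from_verbal(responses, rng, share=0.4):
    """72% cutoff on the 5-point verbal scale: category 5 is HA, and a random
    `share` of the category-4 responses is also coded as HA."""
    ha = [1 if r == 5 else 0 for r in responses]
    fours = [i for i, r in enumerate(responses) if r == 4]
    rng.shuffle(fours)
    for i in fours[: int(round(share * len(fours)))]:
        ha[i] = 1
    return ha


def fit_logistic(x, y, iters=50):
    """Fit P(HA) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    mean_x = sum(x) / len(x)          # center x for numerical stability
    xc = [v - mean_x for v in x]
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(xc, y):
            z = max(-700.0, min(700.0, b0 + b1 * xi))
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0 - b1 * mean_x, b1       # undo the centering


def lden_at_prevalence(p, b0, b1):
    """Lden at which the fitted curve reaches prevalence p (e.g. 0.10)."""
    return (math.log(p / (1.0 - p)) - b0) / b1
```

With individual records of Lden and verbal-scale responses, ΔL2 would then be obtained by fitting the model once to the output of `ha_top_categories(responses, 4)` and once to the output of `ha72_from_verbal(responses, random.Random(seed))`, and comparing the two values of `lden_at_prevalence(0.10, b0, b1)`. The seed is an assumption of this sketch; the source does not state how the random division was seeded.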

Analysis of Individual Datasets
We applied logistic regression analysis to the 40 individual datasets and calculated Lden_10% (60), Lden_10% (73), Lden_10% (72), ΔL1, and ΔL2. First, we defined HA using the 60% cutoff on the 5-point verbal scale, the 73% cutoff on the 11-point numerical scale, and the 72% cutoff according to the method reported in the literature [12], and applied logistic regression analysis to each. Next, Lden corresponding to 10% HA was estimated from the exposure-response relationships. Finally, we calculated the differences between Lden_10% (60) and Lden_10% (73) and between Lden_10% (60) and Lden_10% (72) (ΔL1 and ΔL2). The results are shown in Table 2. The values of ΔL1 and ΔL2 varied widely, particularly when the odds ratio of Lden was not significant and the area under the curve (AUC) was less than 0.7. For example, 2016_KNZ_SR and 2007_HCM_RT had very narrow noise exposure ranges, from 45 to 55 dB and from 75 to 83 dB, respectively, and thus the slopes of the curves were small. Therefore, the datasets for which the odds ratio of Lden was not significant and the AUC was less than 0.7 were excluded. The averages of ΔL1 and ΔL2 are shown in Table 3. The overall averages of ΔL1 and ΔL2 regardless of noise source and country were almost the same, at 4.6 dB and 4.3 dB, respectively. While ΔL1 and ΔL2 ranged from 3 to 6 dB depending on the noise source, the differences between ΔL1 and ΔL2 were small, except for road traffic noise. The averages of ΔL1 and ΔL2 regardless of noise source were almost the same in Japan and Vietnam, although there was a 1.4 dB difference (5.0 dB − 3.6 dB) in ΔL2 between Japan and Vietnam.

Table 2. Lden values at 10% HA for 60%, 73%, and 72% cutoffs of HA, ΔL1, and ΔL2. "CB" in the "Survey ID" column indicates the combined noise survey. For example, "CB_CR" indicates data of conventional railway noise in the combined noise survey. The odds ratio for Lden in each dataset analysis is also shown with the 95% confidence interval (CI).
Numbers in red indicate that the AUC values of the logistic regression model are below 0.7.
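The screening step can be illustrated as follows. This is a sketch under two assumptions that are not stated in the source: the AUC is computed directly from Lden (a one-predictor logistic model with a positive slope ranks respondents in the same order as Lden itself), and a dataset is dropped only when both conditions hold, i.e., the 95% CI of the odds ratio includes 1 and the AUC is below 0.7. A stricter reading that drops a dataset when either condition holds is also possible. The function names are hypothetical.

```python
def auc(scores, labels):
    """Probability that a randomly chosen HA respondent has a higher score
    than a randomly chosen non-HA respondent (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both HA and non-HA respondents are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def keep_dataset(auc_value, or_ci_low, or_ci_high, min_auc=0.7):
    """Assumed rule: exclude only if the odds ratio is not significant
    (95% CI includes 1) AND the AUC is below `min_auc`."""
    odds_ratio_significant = or_ci_low > 1.0 or or_ci_high < 1.0
    return odds_ratio_significant or auc_value >= min_auc
```

The pairwise comparison is quadratic in the number of respondents, which is acceptable for survey samples of a few hundred to about 1500 respondents.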

Analysis of the Total Dataset
In the results of Table 3, the datasets with low model fit were excluded. In this subsection, multiple logistic regression analysis with HA as the dependent variable and Lden and dichotomous variables for noise sources as independent variables was applied to the total dataset to confirm whether the same trend for ΔL1 and ΔL2 was obtained as in the above subsection. The results for Lden_10% (60), Lden_10% (73), Lden_10% (72), ΔL1, and ΔL2 are shown in Table 4. The odds ratios of Lden were significant (p < 0.05), and the AUCs were larger than 0.7 in all three analyses. The values of ΔL1 and ΔL2 were slightly larger for road traffic noise than those for the other noise sources, for which the values were almost the same as those in Table 3. The difference between ΔL1 and ΔL2 was around 2 dB at the maximum, and overall, the results were consistent with those in Table 3. The average ΔL1 and ΔL2 were 4.8 and 4.9 dB, respectively.
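The source-specific Lden at 10% HA follows from inverting the fitted model. As a sketch, with symbols introduced here for illustration only ($\beta_0$ for the intercept, $\beta_L$ for the coefficient of Lden, and $\gamma_s$ for the coefficient of the dichotomous variable $D_s$ of noise source $s$, with $\gamma_s = 0$ for the reference source):

```latex
\begin{align}
\ln\frac{P(\mathrm{HA})}{1 - P(\mathrm{HA})}
  &= \beta_0 + \beta_L L_{\mathrm{den}} + \sum_{s} \gamma_s D_s \\
L_{\mathrm{den},10\%}^{(s)}
  &= \frac{\ln(0.1/0.9) - \beta_0 - \gamma_s}{\beta_L} \\
\Delta L_1^{(s)}
  &= L_{\mathrm{den},10\%}^{(s)}(73) - L_{\mathrm{den},10\%}^{(s)}(60)
\end{align}
```

Here $\ln(0.1/0.9) \approx -2.197$, and the two terms of $\Delta L_1^{(s)}$ come from two separately fitted models, one for each definition of HA; $\Delta L_2^{(s)}$ is obtained in the same way with the 72% cutoff in place of the 73% cutoff.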

Difference between Japanese and Vietnamese Data and Swiss Data
Brink et al. [10] showed that there appeared to be lower weighting of the relationship between Ldn and 72% cutoff HA calculated from responses of the second category from the top of the 5-point verbal scale compared to that between Ldn and 73% cutoff HA measured by the 11-point numerical scale, particularly in the range from 62.5 dB to 70.0 dB. The present results are shown in Figure 3, which compares exposure-response relationships for conventional railway and shinkansen railway noise in Japan and aircraft and road traffic noise in Vietnam between 73% HA measured by the 11-point numerical scale and 72% HA calculated from responses obtained by the 5-point verbal scale. Although the exposure-response relationship for 72% HA by the 5-point verbal scale is slightly higher than that for 73% HA by the 11-point scale in Figure 3b and the opposite trend is seen in Figure 3c, there seems to be no consistent difference between the two curves. To investigate in detail the difference between 72% HA using the 5-point verbal scale and 73% HA using the 11-point numerical scale, multiple logistic regression analysis was applied to these data, with HA or not HA as the dependent variable, and Lden, scale type (5-point verbal scale vs. 11-point numerical scale), and the interaction between Lden and scale type as the independent variables. The results are summarized in Tables 5-8. While there was no significant difference between the scales in conventional railway noise surveys in Japan (see Table 5) and road traffic noise surveys in Vietnam (see Table 8), there was a significant difference in shinkansen railway noise surveys in Japan (see Table 6) and aircraft noise surveys in Vietnam (see Table 7). Also, the interaction between Lden and scale type was significant only in the road traffic noise surveys in Vietnam.

Brink et al. [10] converted the 5-point verbal scale and the 11-point numerical scale to an evenly spaced scale ranging from 0 to 100 (discrete points) and examined the correspondence between the two scales. The resulting 5-point verbal scale points were 0, 25, 50, 75, and 100, and the resulting 11-point numerical scale points were 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, and 100. Therefore, we also converted the present data into discrete points and compared the range of %HA between those shown in the previous study and those calculated in this study. Here, we analyzed data from the Japanese conventional railway and shinkansen noise surveys and the Vietnamese road traffic and aircraft noise surveys. To explain how the data were converted to the discrete scale, Table 9 shows an example conversion for the dataset of conventional railway noise surveys conducted in Japan from 2001 to 2017. The values shown in the gray cells in the table indicate the number of respondents for each scale value. The average discrete score of the 5-point verbal scale was calculated by weighting by the number of respondents at the corresponding 11-point numerical scale value, and the average discrete score of the 11-point numerical scale was calculated by weighting by the number of respondents at the corresponding 5-point verbal scale value. Table 10 shows the average discrete score of the 5-point verbal scale, and Table 11 shows the average discrete score of the 11-point numerical scale, for the Japanese conventional railway and shinkansen noise surveys and the Vietnamese road traffic and aircraft noise surveys, in comparison with the results of a former study in Switzerland [10]. In Table 10, it was assumed that the range of category 4 is from the midpoint of categories 4 and 5 to the midpoint of categories 4 and 3. This range was divided into 40% (HA) and 60% (not HA), and the border of HA and the range of HA were calculated. In Table 11, the range of HA was calculated assuming that the boundary of HA was at the midpoint between categories 7 and 8. As seen in Table 10, the HA range corresponding to the 11-point numerical scale on the 5-point verbal scale (22%) in the Swiss survey was narrower than those in the Japanese and Vietnamese surveys (29%-39%), and as seen in Table 11, the HA range corresponding to the 5-point verbal scale on the 11-point numerical scale (33%) in the Swiss survey was wider than those in the Japanese and Vietnamese surveys (23%-28%).

Table 9. Frequency-weighted average of discrete values of the 11-point numerical scale scores on the 5-point verbal scale.
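The conversion to discrete points and the frequency-weighted averaging can be sketched as follows. The cross-tabulation layout, the function names, and in particular the reading of "range of HA" as the distance from the HA border to 100 are assumptions of this sketch and are not confirmed by the source.

```python
def numeric_to_100(score):
    """11-point numerical scale (0-10) mapped to 0, 10, ..., 100."""
    return score * 10.0


def verbal_to_100(category):
    """5-point verbal scale (1-5) mapped to 0, 25, 50, 75, 100."""
    return (category - 1) * 25.0


def mean_numeric_by_verbal(crosstab):
    """crosstab[v][n] is the number of respondents who chose verbal category v
    (1-5) and numerical score n (0-10). Returns, for each verbal category,
    the frequency-weighted mean of the numerical scores on the 0-100 scale."""
    means = {}
    for v, row in crosstab.items():
        total = sum(row.values())
        if total > 0:
            means[v] = sum(numeric_to_100(n) * c for n, c in row.items()) / total
    return means


def ha_border_and_range(means, ha_share=0.4):
    """Category 4 is taken to span from the midpoint of categories 3 and 4 to
    the midpoint of categories 4 and 5; the upper `ha_share` of that span is
    HA. The HA range is assumed here to be 100 minus the border."""
    lower = (means[3] + means[4]) / 2.0
    upper = (means[4] + means[5]) / 2.0
    border = upper - ha_share * (upper - lower)
    return border, 100.0 - border
```

For the opposite direction (Table 11), the same averaging would be applied with the roles of the two scales swapped, using `verbal_to_100` for the scores and the midpoint between the means for numerical categories 7 and 8 as the HA border.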

Discussion
From the results in Tables 3 and 4, ΔL1 and ΔL2 were around 5 dB on average, although this value was smaller than the 8-14 dB found in the NORAH study by Wothge et al. [7]. Gjestland criticized the systematic review by Guski et al., particularly for the aircraft and road traffic noise guideline values, and Guski et al. rebutted the criticism [27][28][29][30]. If the guidelines are updated in the future, we hope that these scientific findings will be reflected. While the exposure-response function applied in the WHO guidelines is based on a meta-analysis weighted by the square root of the number of respondents in each dataset [8,9], Gjestland [27] is also critical of the use of weighting in this meta-analysis. As shown in the introduction, the recommended value in the WHO guidelines for railway noise was determined using data from 10 surveys, of which 4 used a 73% cutoff and 6 used a 60% cutoff. If the correction of 5 dB, which can be roughly derived from the present study (Tables 3 and 4), is applied to the six railway noise surveys that used a 60% cutoff, with no weighting by sample size, the guideline value might be approximately 3 dB larger. Note that this value is the simple arithmetic mean of a 5 dB increase in 6 surveys over the total number of surveys, i.e., 10 (5 dB × 6 surveys/10 surveys).
As shown in Tables 3 and 4, the difference between ΔL1 and ΔL2 was found to be small among noise sources and between countries (Japan and Vietnam). This supports the findings obtained by Nguyen et al. [11] and indicates that Schreckenberg's method [12] is applicable for calculating the 72% cutoff HA from responses evaluated using the 5-point verbal scale.
There was no systematic difference for ΔL1 and ΔL2 in regard to the noise source in the results of the analysis using individual datasets, as shown in Figure 3 and Tables 5-8. This result was also obtained for the average of the total dataset regarding noise sources, shown in Table 3. There was also no large difference for ΔL1 and ΔL2 between Japan and Vietnam, as is indicated in Table 3. Accordingly, factors such as noise source and country did not affect the level difference of 5 dB between Lden_10% (60) and Lden_10% (73) and between Lden_10% (60) and Lden_10% (72).
Brink et al. [10] indicated a difference between exposure-response curves for 73% HA obtained using the 11-point numerical scale and for 72% HA using the 5-point scale, particularly in the range from 62.5 dB to 70 dB. To investigate the difference in the present datasets in detail, multiple logistic regression analysis was applied to conventional railway, shinkansen railway, aircraft, and road traffic noise survey data separately. Although significant differences were found in exposure-response relationships between 73% HA using the 11-point numerical scale and 72% HA using the 5-point scale in the Shinkansen and aircraft noise surveys, the effect sizes were small, and the directions were opposite. Therefore, we might consider that there is practically no systematic difference in exposure-response relationships between 72% HA using the 5-point verbal scale and 73% HA using the 11-point numerical scale. As suggested by the results in Tables 10 and 11, one of the reasons for the difference in the results was the difference in the correspondence between the 5-point verbal and the 11-point numerical scales among the Japanese, Vietnamese, and Swiss surveys. As shown in Table 10, the HA range (22%) in the Swiss survey was narrower than in the Japanese and Vietnamese surveys (29-39%), and in Table 11, the HA range (33%) in the Swiss survey was wider than in the Japanese and Vietnamese surveys (23-28%). The relatively wider range of HA when using the 11-point numerical scale may explain why the Lden-%HA relationship for the 11-point numerical scale was higher than that for the 5-point scale in the Swiss survey. Nonetheless, the difference in exposure-response relationships between 60% HA and 72% or 73% HA is important.
The existence of various definitions of %HA is inconvenient. However, each scale (11-point numerical scale and 5-point verbal scale) has its own merits. For this reason, it is difficult to define %HA choosing only one of the scales. In fact, ISO/TS 15666:2021 [31], which was published in May 2021, points out that we should pay attention to the difference in the definition of %HA by different scales. It is further stated that as an improvement method, the method presented in Reference [2] can be used. It is desirable that such a method of transformation be proposed. The method used in this study is also effective to conduct a secondary analysis based on individual data.
The application of the findings of this study is limited, because the results are based on socio-acoustic surveys conducted only in Japan and Vietnam. Because many surveys using both 5-point verbal and 11-point numerical scales were conducted in developed countries, ΔL1 and/or ΔL2 should be validated in those countries.

Conclusions
In this study, Lden values at 10% HA were calculated from exposure-response relationships for a 60% cutoff using the 5-point verbal scale, for a 73% cutoff using the 11-point numerical scale, and for a 72% cutoff obtained by weighting responses to the 5-point verbal scale, and the differences in Lden at 10% HA between the 60% cutoff curve and the 73% or 72% cutoff curves were compared. These results were compared with those of a previous study conducted in Switzerland. This study concludes that: (1) If 73% or 72% is the de facto standard cutoff point for %HA, the Lden value at 10% HA for a 60% cutoff should be corrected by adding approximately 5 dB on average in Japan and Vietnam. (2) There was practically no difference in the correction with regard to noise sources or between Japan and Vietnam. (3) Although there appeared to be differences in exposure-response relationships between 73% HA using the 11-point scale and 72% HA using the 5-point scale in the Swiss road traffic noise, Japanese Shinkansen noise, and Vietnamese aircraft noise surveys, there was no significant difference in the Japanese conventional railway noise and Vietnamese road traffic noise surveys. We might consider that there is practically no systematic difference in exposure-response relationships between 73% HA determined by the 11-point numerical scale and 72% HA determined by the 5-point verbal scale.