Usability of Visual Analogue Scales in Assessing Human Perception of Sound with University Students Using a Web-Based Tablet Interface

Response scales in auditory perception assessment are critical for capturing the true responses of listeners. Despite their impact on data, response scales have received the least attention in auditory perception assessment. In this study, the usability of visual analogue scales for auditory perception assessment was investigated. Five response scales for auditory perception assessment are presented: a unipolar visual analogue scale (negated to regular), a unipolar visual analogue scale (regular to negated), a bipolar visual analogue scale (positive to negative), a bipolar visual analogue scale (negative to positive), and a unipolar 11-point scale (ISO/TS 15666:2021). Music and traffic noise were presented to 60 university students at two different levels, i.e., 45 and 65 dBA. A web-based experimental design was implemented, and tablets were provided to the respondents to record their responses. The unipolar 11-point scale required the longest response time, followed by the two unipolar visual analogue scales and the two bipolar visual analogue scales, with statistical significance. All response scales used in this study achieved statistical reliability and sensitivity for the auditory perception assessment. Among the five response scales, the bipolar visual analogue scale (negative to positive) ranked first in reliability over repeated measures, exhibited sensitivity in differentiating sound sources, and was preferred by the respondents under the conditions of the present study. None of the respondents preferred the unipolar 11-point scale. The visual analogue scale was favoured over the traditional unipolar 11-point scale by young educated adults in a mobile-based testing environment. Moreover, the bipolar visual analogue scale demonstrated the highest reliability and sensitivity, and it was preferred the most by the respondents. The semantic labelling direction from negated to regular, or from negative to positive, is preferred over its opposite counterpart.
Further research is necessary to investigate the use of response scales for the general public including children and the elderly, as well as that of semantic adjectives and their counterparts for auditory perception assessment.

Author Contributions: Conceptualisation, W.Y.; methodology, W.Y.; software, W.Y.; validation, W.Y. and J.Y.J.; formal analysis, W.Y.; investigation, W.Y.; resources, W.Y. and J.Y.J.; data curation, W.Y.; writing (original draft preparation), W.Y.; writing (review and editing), J.Y.J.; visualisation, W.Y.; supervision, J.Y.J.; project administration, W.Y. and J.Y.J.;


Background
Auditory perception is typically measured using a questionnaire survey. The response scales used in the questionnaire survey are critical for obtaining true responses and minimising the impact of the survey design [1]. However, the choice of response scales has not been prioritised, even in broad psychological assessments [2].
In auditory perception, the unipolar five-point verbal and unipolar positive 11-point numerical scales proposed by the International Commission on Biological Effects of Noise [3], adopted as the international standard ISO/TS 15666:2021 [4], are widely used to assess noise. Owing to the negative aspects of noise, the two above-mentioned unipolar scales span from the neutral to the extremely negative end. The positive aspects of sound assessment can be measured through a soundscape, which is an acoustic environment perceived or experienced and/or understood by one or more persons [5]. The unipolar positive five-point verbal response scale and the unipolar negative and positive five-point verbal response scale based on questions were adopted by the ISO/TS 12913-2:2018 [6] for soundscape data collection and reporting.
The visual analogue scale (VAS) is another option for auditory perception experiments. The VAS has been widely used for subjective rating in many disciplines owing to its high sensitivity in rating subjective feelings [7]. In particular, numerous studies pertaining to the use of the VAS in clinical settings have been performed. Compared with multipoint discretised scales, the VAS offers a few benefits. The VAS appears to be more satisfactory than the multipoint scale for patient self-rating of chronic pain intensity [8]. It is more sensitive than multipoint verbal or numerical scales, and can distinguish slight changes effectively [9,15,18]. The VAS is a valuable instrument for comparing scores in different groups of subjects, comparing treatments in individuals, and observing the scores of individual subjects over a duration [14]. Furthermore, the VAS avoids the ceiling effect better than the multipoint scale, and the time required to complete a VAS questionnaire was 28% shorter than that required for the multipoint scale [22].
Despite the apparent advantages of the VAS, it has not been widely used in survey research. Couper et al. [30] identified two key features behind the lower popularity of the VAS: first, it requires self-administration, and second, it is visual; that is, it cannot be administered using an aural medium, such as the telephone. These characteristics, along with the extra effort needed to measure and record the answers provided, may limit the use of the VAS in surveys.
However, recent developments in graphical user interfaces have raised the possibility of greater use of the VAS in web-based survey applications [30][31][32][33]. Furthermore, the use of mobile devices or multiple devices in web-based surveys has been addressed [34,35]. Recently, head-mounted displays (HMDs) have been used for perception evaluation in various disciplines, such as soundscape [36] and noise annoyance [37]. However, research on the optimal response scales for capturing true responses with newly adopted devices in auditory perception studies is still in its early stages.

Literature Review
A few studies [38][39][40] have investigated the effects of response scales in noise surveys based on ISO/TS 15666:2021 [4]. Brink et al. [38] discovered some disagreements between a five-point verbal scale and an 11-point numerical scale in a survey of 2386 Swiss residents. Standardised average annoyance scores were higher using the 11-point numerical scale than the 5-point verbal scale with statistical significance. The percentage of highly annoyed respondents was significantly higher based on the 5-point verbal scale than the 11-point numerical scale. Nguyen et al. [39] investigated the correspondence between 5-point verbal and 11-point numerical scales from 15 social surveys conducted in Japan (N = 3652) and Vietnam (N = 7149). The exposure-response relationships obtained by the logistic regression function with high annoyance defined by the top three categories of the 11-point scale were located lower than the logistic regression relationships with high annoyance defined by the top two categories of the 5-point scale. However, other studies with relatively small numbers of participants (N = 22 [41], N = 33 [40]) found no statistical differences between the two standardised response scales for noise assessment. Tristan-Hernandez et al. [40] evaluated psychoacoustic annoyance and perception of noise annoyance inside university facilities using both a 5-point verbal scale and an 11-point numerical scale [4], and found no significant differences between the answers on the two scales. Bjerre et al. [41] also used both a 5-point verbal scale and an 11-point numerical scale [4] for their on-site and laboratory evaluations of soundscape quality in recreational urban spaces, and found no significant differences.
The visual analogue scale (VAS) is not widely used in auditory perception experiments. Previous studies using VASs for auditory perception primarily involved thermal comfort [42][43][44][45][46][47]. For thermal comfort assessments, a bipolar seven-point scale was standardised according to ISO 10551:2019 [48]. The choice of visual analogue scale could be a mediated choice, since the standardised response scales for subjective surveys in each research field were different. The effects of the VAS were not the main topic of the previous studies. Recently, Yang and Jeon [49] compared the performance and preference of a bipolar VAS and standardised unipolar 11-point numerical scale for auditory perception. Both response scales were acceptable for reliability and sensitivity; however, subtle differences were observed. The bipolar visual analogue scale was more reliable than the unipolar 11-point numerical scale in repeated measures, whereas the unipolar 11-point numerical scale was more sensitive than the bipolar visual analogue scale in distinguishing differences between sound sources. The respondents preferred the bipolar VAS. Yang et al. [50] expanded the response scale study to assess the combined environmental perception. They compared four different response scales based on ISO/TS 15666:2021 [4] and ISO 10551:2019 [48]. The degree of relative differentiation based on indoor physical factors did not differ significantly across the four response scales, including the VAS score. Respondents subjectively preferred the bipolar VAS. Despite the two previous studies pertaining to the response scale in auditory perception, the effects of the response scale on aural stimuli have yet to be fully investigated.
Chyung et al. [51] reported on the effects of delivery media and its transition in a survey. Traditionally, surveys have been administered on paper and ruler-type devices. However, owing to the popularity of web-based survey systems and respondents' increased access to the web via desktop and mobile devices, practitioners and researchers now frequently administer their surveys via the web. The administrative drawbacks of using a VAS can be mitigated via the web. The selection of the response scale should be reconsidered in light of the shift in delivery media from analogue to digital.

Research Objectives
The objective of the present study was to investigate the usability of VASs for auditory perception assessments of university students exposed to mobile applications in a web-based testing environment. Four VASs (two unipolar and two bipolar) were compared with the unipolar 11-point numerical scale corresponding to the ISO/TS 15666:2021 [4] for auditory perception experiments. The effects of labelling direction on the VAS were investigated using negated and reversed end labels.

Methods
The present study was designed as an experimental study with repeated measures in a laboratory. All participants responded in all conditions of the experimental design. No separate participant groups were designed for comparisons. This study was approved by the Institutional Review Board of Hanyang University.

Participants
Sixty university students (30 men and 30 women) participated in the response scale comparison testing. No hearing-impaired participants were identified during the interviews. All participants provided written informed consent before the start of the study and received financial compensation for their participation. The mean age of the participants was 21.1 (SD 2.3) years. All participants were mobile phone users who reported being proficient in using mobile devices.

Experimental Conditions
The experiment was conducted in a small laboratory dedicated to indoor environmental experiments. The indoor environment was maintained at an air temperature of 24 °C and humidity level of 45%. The mean illuminance level along the testing desk surface was 860 lx (Konica Minolta T-10A, Tokyo, Japan). The background noise level was measured to be 41 dBA (Rion Rionote, Tokyo, Japan) while the air handling unit was turned on. The reverberation time in the laboratory was measured to be 0.93 s at mid-frequencies (mean of 500 Hz and 1 kHz).
Four loudspeakers (Genelec 8020C, Iisalmi, Finland) were placed at each corner of the room facing the corner to present sounds with no directional information in order to normalise the sound direction for all participants in the testing room. Four different sound sources (music sound and traffic noise, each at 45 and 65 dBA) were presented through the loudspeaker system. The differences in sound level across the room were measured to be ±0.4 dBA. The first movement of Vivaldi's The Four Seasons, "Spring", performed by Amsterdam Sinfonietta in Concertgebouw in 2014 was used as the music sound, i.e., a positive sound source. Meanwhile, traffic noise, representing a negative sound, was recorded on a street near the university. Figure 1 shows the frequency spectra of the sound sources.

Response Scales and Semantic Adjectives
Four different VASs and a unipolar 11-point numerical scale were used in this study, as shown in Figure 2. The unipolar 11-point numerical scale with endpoint and midpoint labels was adopted based on the ISO/TS 15666:2021 [4], which was developed for socioacoustic noise annoyance surveys. The four VASs comprised a pair of unipolar VASs and a pair of bipolar analogue scales. Each pair had reversed left and right verbal end labels. The subjective attributes asked were identical in all five response scales; however, the number of questions for the unipolar scales doubled on the bipolar scales.
The semantic attributes of the questionnaire were four pairs of adjectives: soft vs. loud, quiet vs. noisy, pleasant vs. annoying, and uncomfortable vs. comfortable [52]. Loud, noisy, and annoying are widely used attributes for subjective noise evaluation [47,53–57]. Comfortable is an attribute used in many human perception evaluations other than auditory perception [58][59][60][61], and was selected as a higher-level semantic attribute inferred from the results of the other semantic attributes. The reversed items were then selected for counterbalanced rating scales for validity [62].
For the two unipolar VASs, each semantic attribute was positioned at the left end in one scale and at the right end in the other, and its minimum level, expressed using "not", was placed at the opposite end; such items are called negated items [63]. For the two bipolar VASs, two adjective antonyms, so-called reversed items [63] in auditory perception, were used on both sides of the line.
The VAS conventionally comprises a plain, horizontal 100-mm-long line and primarily verbal end labels. In this study, a web-based tablet interface was developed, and the length of the horizontal line was approximately 120 mm on the tablet screen. Respondents provided ratings by tapping a point on the line. By default, a slider was placed at the left end of the line as an indicator of the rating mark.

Experimental Design and Procedure
In each 60-min-long testing session, a 15-min adaptation period was implemented at the beginning of the session. A total of 25 cases were created by combining five sound sources and five response scales. Ambient noise perception was assessed in the first to fifth cases at the beginning of the session to familiarise the participants with the response scales before the music sound and traffic noise were tested, as shown in Figure 3. Subsequently, the response scale was randomised, and the sound source was presented randomly within the same response scale. When the 25 cases were completed, the same 25 cases were tested again as duplicates. A maximum of eight respondents participated simultaneously during the testing session. A quick demonstration of how to use the tablet was provided to the respondents. The respondents were required to tap the line instead of dragging the slider to avoid potential technical problems from dragging using their fingers. Most respondents were familiar with the use of touchscreen devices.
Each sound source was prepared for 50 s. However, the response time was not fixed. Regardless of whether the 50 s playback of a sound source had been completed, the next sound source started once all respondents in the test group had submitted their responses. The response data were automatically saved on a server.
Repeated measures analysis of variance (ANOVA) was employed with two independent variables, i.e., response scale and sound source, for eight subjective attributes to confirm the effects of the repeated measures and response scales simultaneously. ANOVA is an effective statistical test, although normality cannot be guaranteed for subjective ratings [64,65]. In this study, the response time was analysed using the ANOVA as well. The response time for each case was calculated by subtracting the previous submission time from the current submission time; consequently, the first submission of each respondent had no response duration value in this study. Fisher's Z-transformation was applied to compare the correlation coefficients of response scales.
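The response time calculation described above can be illustrated with a minimal Python sketch. The function name `response_times` and the example timestamps are hypothetical and are not taken from the study's software; the sketch only assumes that each respondent's submission times are available in chronological order.

```python
from datetime import datetime, timedelta


def response_times(submission_times):
    """Return per-case response times in seconds for one respondent.

    Each response time is the current submission time minus the previous one.
    The first submission has no preceding submission, so its value is None.
    """
    if not submission_times:
        return []
    durations = [None]  # first submission: no response duration available
    for earlier, later in zip(submission_times, submission_times[1:]):
        durations.append((later - earlier).total_seconds())
    return durations


# Hypothetical timestamps for illustration only.
start = datetime(2021, 1, 1, 10, 0, 0)
example = [start, start + timedelta(seconds=30), start + timedelta(seconds=75)]
print(response_times(example))  # [None, 30.0, 45.0]
```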
In this study, a numerical value from 0.00 to 10.00 was assigned to the responses from the VASs for statistical analysis. Furthermore, original responses from the numerical scale were used. The responses from the bipolar VASs (−10.00 to +10.00) were converted to unipolar 0.00 to 10.00 scales to perform an ANOVA on the five response scales for all eight subjective attributes. In the conversion rule, if the response is negative, then the absolute value of the response is assigned to the left-end subjective attribute, and the value of 10 minus the absolute value of the response is automatically assigned to the right-end subjective attribute. If the response is positive, then the response value is assigned to the right-end subjective attribute, and the left-end subjective attribute automatically assumes a value of 10 minus the response value.
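The conversion rule above can be written out as a short Python sketch. The function name `bipolar_to_unipolar` is hypothetical, and the handling of a response of exactly 0 is an assumption, since the text only describes negative and positive responses; here 0 is treated with the positive branch.

```python
def bipolar_to_unipolar(response):
    """Convert a bipolar VAS response (-10.00 to +10.00) into two unipolar values.

    Returns (left_value, right_value), each on a 0.00 to 10.00 scale, following
    the rule as worded in the text. Treating 0 as positive is an assumption.
    """
    if not -10.0 <= response <= 10.0:
        raise ValueError("bipolar response must lie between -10 and +10")
    if response < 0:
        # Negative response: its absolute value goes to the left-end attribute.
        left_value = abs(response)
        right_value = 10.0 - abs(response)
    else:
        # Positive response: the value goes to the right-end attribute.
        right_value = response
        left_value = 10.0 - response
    return left_value, right_value


# Hypothetical responses for illustration only.
print(bipolar_to_unipolar(-3.0))  # (3.0, 7.0)
print(bipolar_to_unipolar(4.0))   # (6.0, 4.0)
```

In both branches the two converted values sum to 10, so each bipolar response yields a complementary pair of unipolar ratings.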

Response Times
Response time was dependent on the order of the questionnaire. Table 1 lists the mean response times based on questionnaire order. The response times from numbers 2 to 6 were significantly longer than that of number 25, regardless of the test combinations. To achieve an order-free dataset, numbers 2 to 7 and 25 were excluded, as listed in Table 2. The response scale significantly affected response time. The numerical 11-point response scale showed the longest response time. The bipolar VASs required less time than the others, as expected, because the number of responding scales was half that of the unipolar scales. Repeated measures significantly affected response time. The response times for the duplicates were shorter than the first response times. Moreover, the sound source affected the response time. The response time for 45 dBA traffic noise was longer than that for other sound sources.

Data Normalisation
The numerical 11-point scale (scale 5) and unipolar VASs (scales 1 and 2) can be compared directly from the responses; however, the bipolar VASs had different increments on the same line. Although the values allotted for the bipolar VASs ranged from −10.00 to +10.00, the actual length of the response scale line was the same as that of the unipolar VASs.
Table 3 lists the p-values and effect size (µ) of the repeated measures ANOVA for all subjective attributes. As expected, the sound source was a dominant factor, according to the effect size, whereas the response scale affected the subjective perception with a relatively small effect size. Loudness and noisiness were not affected by the response scale. Softness, quietness, pleasantness, and comfort, which are positive semantic attributes, had a relatively larger effect size than negative attributes. In scale 5, subjective perception was consistently rated lower compared with the other response scales, particularly for positive semantic attributes. The response values from the second time were always higher than those from the first time; however, statistical significance was only achieved in terms of pleasantness and annoyance.
Table 4 lists the Bonferroni post-hoc test results for the interaction between response scales and sound sources. These results were used to investigate the sensitivity to differentiate each sound source used in this study. Only the softness of music sound and traffic noise with the same sound level was distinguishable by all the five scales.
The numerical 11-point scale could not distinguish the quietness, pleasantness, annoyance, comfort, and discomfort of music sound and traffic noise at 45 dBA, nor could it distinguish the loudness and quietness of music sound and traffic noise at 65 dBA. For loudness, the bipolar VAS from negative to positive indicated the highest sensitivity. The unipolar VASs did not indicate high sensitivity at 65 dBA for loudness, quietness, and noisiness. Meanwhile, the bipolar VASs did not indicate high sensitivity at 45 dBA for quietness and noisiness.
Table 4. Bonferroni post-hoc (p < 0.05) results for interaction between response scales and sound sources (coefficients that do not share a letter are significantly different within a cell; A > B > C > D).

Correlation Coefficients and Fisher's Z-Transformation
In this study, Pearson's correlation coefficients between repeated measures were analysed using Fisher's Z-transformation (p < 0.05), as listed in Table 5. For softness, the most reliable scale of response was scale 4, followed by scale 3. No statistical difference in reliability was discovered between the unipolar VAS and numerical 11-point scales. For loudness, scales 2 and 1 were more reliable than the other scales. For quietness, scale 5 was rated the worst in terms of reliability; however, for noisiness, scale 5 was rated the most reliable. Scale 4 was the most reliable scale for pleasantness. For annoyance, scale 5 was more reliable than the others. For comfort and discomfort, scale 4 was more reliable than scales 1 and 2. In summary, the bipolar VASs were more reliable than other scales, and scale 4, which is a bipolar VAS from negative to positive semantic attributes, was ranked the most reliable response scale.
Table 6 lists Pearson's correlation coefficients between paired scales and their Fisher's Z-transformation results for testing the end point label order. The end point label order was more reliable for the VASs than for the numerical 11-point scale. Among the VASs, the bipolar VASs were more reliable than the unipolar VASs for end point label ordering in most subjective attributes.
Table 6. Pearson's correlation coefficients between paired scales (p < 0.0005) and Fisher's Z-transformation (p < 0.05) results (coefficients that do not share a letter are significantly different within a row; A > B > C).
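As an illustration of how two correlation coefficients can be compared through Fisher's Z-transformation, a minimal Python sketch is given here. It is not the study's analysis code: the function names are hypothetical, the coefficients in the example are made up, and the test shown is the common textbook form that treats the two correlations as coming from independent samples, which is a simplifying assumption for repeated-measures data.

```python
import math


def fisher_z(r):
    """Fisher's Z-transformation of a correlation coefficient (-1 < r < 1)."""
    return math.atanh(r)


def compare_correlations(r1, n1, r2, n2):
    """Two-sided z-test for the difference between two correlations.

    Assumes independent samples (a simplification). Returns (z, p_value).
    """
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical coefficients; n = 60 mirrors the number of participants.
z, p = compare_correlations(0.9, 60, 0.5, 60)
print(f"z = {z:.2f}, significant at 0.05: {p < 0.05}")
```

Under this sketch, two scales would be assigned different letters when the p-value falls below 0.05.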

Respondents' Preference
In general, 37.5% of the respondents preferred the bipolar VAS from negative to positive semantic attributes, as shown in Figure 5. No respondent selected the numerical 11-point scale as the preferred scale. For the first response, preferences for scales 3 and 4 appeared similar, but for the second response, 43.3% of the respondents preferred scale 4, i.e., the bipolar VAS from negative to positive semantic attributes. The bipolar VAS from negative to positive was preferred twice as often as the bipolar VAS from positive to negative.
The mean response times for each respondent's preferred scale were analysed, as shown in Figure 6. The response time appeared to have contributed to the selection of the preferred response scale. To confirm the relationship between response time and response scale preference, the Pearson correlation coefficients between them were calculated, as listed in Table 7. For the first response, the correlation between response scale preference and response time was significant. For the second response, although the correlation strength was not as high as that of the first response, a negative correlation was observed.

Reliability over Repeated Measures
In general, the most reliable response scale was the bipolar VAS from negative to positive, followed by that from positive to negative, as listed in Table 5. The unipolar scales, including numerical and analogue scales, did not demonstrate high reliability compared with the bipolar scales, which is consistent with previous studies [49,50]. Yang and Jeon [49] reported that the bipolar visual analogue scale was more reliable than the unipolar 11-point numerical scale for auditory perception. Furthermore, Yang et al. [50] discovered that the bipolar VAS appeared reliable over repeated measures for all subjective attributes, including auditory perception.
It is noteworthy that the reliability over repeated measures was independent of the type of response scale but dependent on the polarity of the response scale. For unipolar scales, the unipolar 11-point scale was more reliable than the unipolar VAS in this study. To the best of our knowledge, the present study is the first to assess the effects of the polarity of response scales.

Sensitivity for Degree of Differentiation in Auditory Perception
The sensitivity of the response scales differed across the subjective attributes, as listed in Tables 3 and 4. In general, the bipolar VAS from negative to positive indicated the highest sensitivity for all subjective attributes. The unipolar 11-point scale is applicable for loudness, noisiness, and annoyance perception with traffic noise, which shows that the unipolar 11-point scale fulfils the purpose of ISO/TS 15666:2021 [4]; however, its sensitivity in differentiating quietness, pleasantness, comfort, and discomfort was insufficient compared with the other scales. Only the bipolar VAS from negative to positive could distinguish loudness and noisiness between music and traffic noise at 65 dBA (Table 4).
In previous studies [49,50] where water sound and traffic noise at 42 and 61 dBA were used, respectively, no statistical difference was exhibited in response scale sensitivity. In this study, music sounds were presented to the participants instead of water sounds. The music sound might be a more explicit positive sound than the water sound, thereby yielding differences in the results.

Preference by Young Adults as Mobile Users
Two-thirds of the participants preferred bipolar VASs, which is consistent with previous studies [49,50]. None of the respondents selected the unipolar 11-point scale, which was the only numerical scale in the present study. The respondents preferred a particular semantic direction on the response scales, namely from negated to regular and from negative to positive. The polarity of the response scale (unipolar vs. bipolar) had a stronger effect on respondents' preferences than the semantic direction of the response scale. The shorter response time of the bipolar scales might have contributed to this effect of polarity on the respondents' preferences.
The participants were familiar with the mobile environment, and their preferences might differ from those of people who are not familiar with mobile devices. Comparisons between a traditional paper VAS and an electronic VAS were beyond the scope of this research.
Socioeconomic educational (SES) factors significantly affect user preference for response scales in medicine [17]. For pain scales, the VAS was not the preferred response scale [17,19,20], since the respondents' socioeconomic educational backgrounds covered a much wider range than those in the present study. The mean age of the respondents in this study was 21 years, and all were university students. Respondents' preferences matter for the selection of the response scale, owing to the positive association between respondents' performance and their preferences [66]. Therefore, the socioeconomic educational factors of the respondents should be considered when a response scale is selected for a survey.

Numerical Scale vs. VAS: Types of Response Scale
To the best of our knowledge, this is the first study to directly compare VASs and numerical scales to assess auditory perception. Only unipolar scales were compared in this study because the experimental design was based on ISO/TS 15666:2021 [4].
The comparison between the numerically discretised scale and the VAS yielded mixed results; hence, a simple conclusion could not be drawn. In the overall analysis, the numerical 11-point scale yielded significantly lower values for softness and pleasantness (i.e., the positive aspects of the sound). The reliability of the numerical 11-point scale for noisiness and annoyance was stronger than that of the unipolar VAS from negated to regular. None of the respondents preferred the numerical 11-point scale over the VAS.
If the positive attributes are not intentionally assessed in the questionnaire, then the two response scales can be substituted for each other in auditory perception. For comparisons with previous noise assessment studies, the use of standardised response scales may be practical for noise assessment. However, considering the association between respondents' preferences and their performance [66], the VAS may be an alternative to numerical scales for the auditory perception of young adults.

Unipolar vs. Bipolar: Polarity of Response Scale
In the present study, comparisons between unipolar and bipolar scales were analysed using VAS. In auditory perception, unipolar response scales have been used as international standards, because the sensation and perception of the sounds are considered unipolar. However, bipolar response scales have been discovered to be the most reliable, sensitive, and preferred for auditory perception assessment. Consequently, the preference and performance associations were validated using bipolar VASs.
A normalisation formula was necessary to apply ANOVA across the unipolar and bipolar scales. The normalisation formula developed in this study showed that the respondents selected visually similar positions on the scale line rather than linearly proportional positions, regardless of whether the response scale was unipolar or bipolar. This held even though, for each semantic attribute, the actual increment of the scale line of the bipolar scale was half that of the unipolar scale. For example, once the music sound of 45 dBA was judged as soft, the respondents' selections for softness on the VAS were similar regardless of the polarity of the scale. Therefore, rating the loudness of the sound was less valuable than rating its softness because the sound had already been judged as soft, and vice versa. This might explain the respondents' strong preference for bipolar scales.
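The distinction between a "visually similar" and a "linearly proportional" position can be made concrete with a small sketch. It is not the normalisation formula developed in this study. It assumes, purely for illustration, that mark positions are recorded from 0 (left end) to 100 (right end), that the bipolar scale has its neutral point at 50, and that the positive attribute (e.g., soft) lies to the right. Both function names are hypothetical.

```python
SCALE_MAX = 100.0       # assumed right end of the scale line
BIPOLAR_NEUTRAL = 50.0  # assumed neutral midpoint of the bipolar line


def _check(unipolar_mark):
    if not 0.0 <= unipolar_mark <= SCALE_MAX:
        raise ValueError("mark must lie on the scale line (0 to 100)")


def bipolar_mark_if_proportional(unipolar_mark):
    """Expected bipolar mark if the half-line from the neutral point to the
    positive end represented the full unipolar range (linear rescaling)."""
    _check(unipolar_mark)
    return BIPOLAR_NEUTRAL + unipolar_mark / 2.0


def bipolar_mark_if_visually_similar(unipolar_mark):
    """Expected bipolar mark if respondents simply reuse the same physical
    position on the line, whatever the polarity of the scale."""
    _check(unipolar_mark)
    return float(unipolar_mark)


# A softness rating marked at 80 on the unipolar line (illustrative value):
print(bipolar_mark_if_proportional(80))      # 90.0
print(bipolar_mark_if_visually_similar(80))  # 80.0
```

Under these assumptions, the pattern described above corresponds to responses lying closer to the second mapping than to the first.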
In the present study, equivalent sound levels of 45 and 65 dBA, which can be classified as likes and dislikes, were used. Therefore, an increased number of questions with unipolar scales may not be preferred because of the explicit sound level differences. Another possible explanation for the bipolar preference is that a semantic attribute is perceived more clearly when its semantic counterpart is provided. Therefore, semantic attributes and their semantic counterparts for auditory perception should be further investigated.

Negation and Reversal Labels
The respondents preferred the bipolar VAS with end labels running from negative to positive meaning over the bipolar VAS with end labels running from positive to negative meaning. For unipolar VASs, the order of end labels from negation to regular was preferred by the respondents. The participants might associate "negative" meaning or form with left and "positive" meaning or form with right. Spatial-numerical associations have been studied extensively [67–69], and a leftward bias in visual perception has been reported [70]. However, to the knowledge of the authors, references regarding spatial-positive relations have not been found.

Conclusions
VASs can substitute for the traditional numerical 11-point response scale in assessing auditory perception among young educated adults in a mobile environment.
Among the VASs investigated in this study, the bipolar VAS with a negative left end and a positive right end was considered the most reliable, sensitive, and preferred response scale.
The effect of the polarity (unipolar vs. bipolar) was stronger than that of the labelling direction (negated to regular; negative to positive) on the VASs. Further research is necessary to investigate the use of response scales for the general public including children and the elderly, as well as that of semantic adjectives and their counterparts for auditory perception assessment in mobile and virtual reality environments.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: The data presented in this study are available on request from the corresponding author.