Reliability of Universal-Platform-Based Voice Screen Application in AVQI Measurements Captured with Different Smartphones

The aim of the study was to develop a universal-platform-based (UPB) application suitable for different smartphones for estimation of the Acoustic Voice Quality Index (AVQI) and to evaluate its reliability in AVQI measurements and in differentiating normal and pathological voices. The study group consisted of 135 adult individuals, including 49 with normal voices and 86 patients with pathological voices. The developed UPB “Voice Screen” application, installed on five iOS and Android smartphones, was used for AVQI estimation. The AVQI measures calculated from voice recordings obtained with a reference studio microphone were compared with AVQI results obtained using smartphones. The diagnostic accuracy of differentiating normal and pathological voices was evaluated using receiver operating characteristic (ROC) analysis. One-way ANOVA did not detect statistically significant differences between the mean AVQI scores obtained using a studio microphone and different smartphones (F = 0.759; p = 0.58). Almost perfect direct linear correlations (r = 0.987–0.991) were observed between the AVQI results obtained with a studio microphone and different smartphones. The AVQI showed an acceptable level of precision in discriminating between normal and pathological voices, with areas under the curve (AUC) of 0.834–0.862. There were no statistically significant differences between the AUCs (p > 0.05) obtained from studio and smartphone microphones; the largest difference between the AUCs was only 0.028. The UPB “Voice Screen” application represents an accurate and robust tool for voice quality measurements and normal vs. pathological voice screening purposes, demonstrating the potential to be used by patients and clinicians for voice assessment, employing both iOS and Android smartphones.


Introduction
Mobile communication devices such as smartphones or tablets are widely available to most of the global population, with the number of smartphone subscriptions expected to reach about 7.145 billion by 2024 [1]. The growing number of validated smartphone applications in general otorhinolaryngology, and especially in voice assessment and the management of voice disorders, is continually reported in the literature [2][3][4][5][6]. Advances in smartphone technology and microphone quality offer an affordable and accessible alternative to studio microphones traditionally used for speech analysis, thus providing an effective tool for assessing, detecting, and caring for voice disorders [7][8][9].
The combination of variables in smartphone hardware and software may lead to differences between voice quality measures. Whether acoustic voice features recorded using smartphones sufficiently match the current gold standard for remote monitoring and clinical assessment with a studio microphone remains uncertain [7,10,11]. Some controversies [...] different smartphones for voice recordings and estimations of AVQI will be feasible for the quantitative voice assessment. Therefore, the present study aimed to develop a universal-platform-based (UPB) application suitable for different smartphones for the estimation of AVQI and evaluate its reliability in AVQI measurements and normal/pathological voice differentiation.

Materials and Methods
All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki of 1975, and the protocol was approved by the Kaunas Regional Ethics Committee for Biomedical Research (2022-04-20 No. BE-2-49).
The study group consisted of 135 adult individuals: 58 men and 77 women. The mean age of the study group was 42.9 (SD 15.26) years. They were all examined at the Department of Otolaryngology of the Lithuanian University of Health Sciences, Kaunas, Lithuania. The pathological voice subgroup consisted of 86 patients: 42 men and 44 women, with a mean age of 50.8 years (SD 14.3). They presented with a relatively common and clinically discriminative group of laryngeal diseases and related voice disturbances, i.e., benign and malignant mass lesions of the vocal folds and unilateral paralysis of the vocal fold. The normal voice subgroup consisted of 49 selected healthy volunteer individuals: 16 men and 33 women, mean age 31.69 (SD 9.89) years. This subgroup was collected following three criteria to define a vocally healthy subject: (1) all selected subjects considered their voice as normal and had no actual voice complaints and no history of chronic laryngeal diseases or voice disorders; (2) no pathological alterations in the larynx of the healthy subjects were found during video laryngoscopy; and (3) all these voice samples were evaluated as normal voices by otolaryngologists working in the field of voice. Demographic data of the study group and diagnoses of the pathological voice subgroup are presented in Table 1. No correlations between the subject's age, sex, and AVQI measurements were found in the previous study [28]. Therefore, in the present study, the control and patient groups were considered suitable for AVQI-related data analysis, despite these groups not being matched by sex and age.

Original Voice Recordings
Voice samples from each subject were recorded in a T-series sound-proof room for hearing testing (T-room, CATegner AB, Bromma, Sweden) using a studio oral cardioid AKG Perception 220 microphone (AKG Acoustics, Vienna, Austria). The microphone was placed at a 10.0 cm distance from the mouth, keeping a 90° microphone-to-mouth angle. Each participant was asked to complete two vocal tasks, which were digitally recorded. The tasks consisted of (1) sustaining phonation of the vowel sound /a:/ for at least 4 s duration and (2) reading a phonetically balanced text segment in Lithuanian "Turėjo senelė žilą oželį" ("The granny had a small grey goat"). The participants completed both vocal tasks at a personally comfortable loudness and pitch. All voice recordings were captured with Audacity recording software (https://www.audacityteam.org/, accessed on 30 May 2023) at a sampling frequency of 44.1 kHz and exported in a 16-bit depth lossless "wav" audio file format onto the computer's hard disk drive (HDD).

Auditory-Perceptual Evaluation
Five experienced physician-laryngologists, who were all native Lithuanians, served as the rater panel. Blind to all relevant information regarding the subject (i.e., identity, age, gender, diagnosis, and disposition of the voice samples), they performed auditory-perceptual evaluations to quantify the vocal deviations, judging the voice samples into four ordinal severity classes of grade from the GRBAS scale (i.e., 0 = normal, 1 = slight, 2 = moderate, 3 = severe dysphonia) [35]. A detailed description of the auditory-perceptual evaluation is presented elsewhere [22].

Transmitting Studio Microphone Voice Recordings to Smartphones
The impact on voice recordings caused by technical differences between studio and smartphone microphones was avoided by filtering (equalizing) the already recorded flat-frequency audio using data from the smartphone frequency response curves. The filtered result represents audio as recorded with the selected smartphone. Using this method, the only variable affected was the frequency response, while other variables, i.e., room reflections, distance to the microphone, directionality, and user loudness, were kept constant. Ableton DAW (digital audio workstation) was used as the editing environment, and the VST (Virtual Studio Technology) plugin MFreeformEqualizer by MeldaProduction (https://www.meldaproduction.com/MFreeformEqualizer/features, accessed on 4 June 2023) was used to import the frequency response datasets and equalize the frequencies according to the required frequency response. The MFreeformEqualizer filter quality was set to extreme (the highest available), with 0% curve smoothing. All the audio files were then re-exported as 44,100 Hz 16-bit wav files. With this method, the digital voice recordings obtained with a studio microphone were directly transmitted to different smartphones for analysis. This avoided the impact of the surrounding environment and also synchronized all voice samples across all devices without additional audio synchronization methods, ensuring that exactly the same parts of vowels and speech were used for each smartphone's analysis.
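The equalization idea described above can be illustrated with a minimal sketch. It is not the algorithm used by MFreeformEqualizer, whose internal processing is not described here; it only shows the general principle of scaling each frequency component of a flat recording by a gain read from a device's frequency response curve. The function names `gain_db_at` and `equalize`, the curve format, and the naive DFT (chosen to avoid dependencies, and practical only for very short frames) are all assumptions made for illustration.

```python
import cmath
import math


def gain_db_at(freq_hz, curve):
    """Interpolate a response curve [(Hz, dB), ...] on a log-frequency axis.

    Assumes the curve is sorted by strictly increasing, positive frequency.
    """
    if freq_hz <= curve[0][0]:
        return curve[0][1]
    if freq_hz >= curve[-1][0]:
        return curve[-1][1]
    for (f0, g0), (f1, g1) in zip(curve, curve[1:]):
        if f0 <= freq_hz <= f1:
            t = (math.log(freq_hz) - math.log(f0)) / (math.log(f1) - math.log(f0))
            return g0 + t * (g1 - g0)


def equalize(samples, sample_rate, curve):
    """Scale each DFT bin by the curve gain (naive O(n^2) DFT, illustration only)."""
    n = len(samples)
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 mirror negative frequencies, so both halves get the same gain.
        freq = min(k, n - k) * sample_rate / n
        spectrum[k] *= 10 ** (gain_db_at(freq, curve) / 20)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

Because only the magnitude of each frequency component is changed, timing is untouched, which is consistent with the synchronization property noted above. A practical implementation would use an FFT library and a measured response curve rather than hypothetical values.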

AVQI Estimation
For AVQI calculations, the signal processing of the voice samples was performed in the Praat software (version 5.3.57; https://www.fon.hum.uva.nl/praat/, accessed on 4 June 2023). Only voiced parts of the continuous speech were manually extracted and concatenated to the medial 3 s of the sustained /a/ phonation. The voice samples were concatenated for auditory-perceptual judgment in the following order: text segment, a 2 s pause, followed by a 3 s sustained vowel /a/ segment. This chain of signals was used for acoustic analysis with the AVQI script version 02.02 developed for the program Praat (https://www.vvl.be/documenten-en-paginas/praat-script-avqi-v0203?download=AcousticVoiceQualityIndexv.02.03.txt, accessed on 4 June 2023).
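As an illustration of the concatenation order described above, the following sketch builds a chain from a speech segment, a 2 s pause, and the medial 3 s of a sustained vowel. It assumes the speech samples have already been reduced to their voiced parts (done manually in the study) and that audio is held as plain lists of samples; `medial_segment` and `build_chain` are hypothetical helper names, not part of Praat or the AVQI script.

```python
def medial_segment(samples, sample_rate, seconds=3.0):
    """Return the central `seconds` of a sustained-vowel recording."""
    wanted = int(round(seconds * sample_rate))
    if len(samples) < wanted:
        raise ValueError("vowel recording is shorter than the requested segment")
    start = (len(samples) - wanted) // 2
    return samples[start:start + wanted]


def build_chain(speech, vowel, sample_rate, pause_seconds=2.0):
    """Concatenate the text segment, a silent pause, and the medial 3 s of the vowel."""
    pause = [0.0] * int(round(pause_seconds * sample_rate))
    return list(speech) + pause + list(medial_segment(vowel, sample_rate))
```

Taking the medial part of the vowel discards the onset and offset of phonation, which is why the recording task asks for at least 4 s of sustained phonation.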

Development of a Universal-Platform-Based "Voice Screen" Application for Automated AVQI Estimation
The "Voice Screen" application for use with iOS operating devices was developed in the initial stage. Background noise monitoring, voice recording, and developed automated AVQI calculations were implemented in the application. Consequently, the "Voice Screen" application allowed voice recording, automatically extracted acoustic voice features, and displayed the AVQI result alongside a recommendation to the user [9].
The upgraded UPB version of the "Voice Screen" application, suitable for iOS and Android devices, was elaborated in the next stage. In this case, the calculation of the AVQI and its characteristics was performed on the server; therefore, the computationally costly sound processing was not dependent on the user's device's computational capabilities. We used the Flutter framework (https://flutter.dev/, accessed on 4 June 2023) to create our client application. It allowed for compiling applications for different platforms (devices and their operating systems) from a single code base. The framework ensured that the same algorithms ran on different devices and that no new software errors were introduced while porting the application. Currently, our application works with both iOS and Android devices. Figure 1 shows the structure of the system. The numbers in the picture depict the flow of the operations. In the first step, the given smartphone (iOS or Android) records sound waves acquired while saying given phrases aloud. The sound waves are preprocessed (see Step 1 in Figure 1) in real-time. The preprocessing aims to clean the sound waves from pauses and ensure the minimum amount of sound suitable for further analysis.
Step 2 sends the preprocessed sound wave to the server for further analysis. The server runs a Linux operating system and provides web services for software in Python. That software is based on the Praat (https://www.fon.hum.uva.nl/praat/, accessed on 4 June 2023) application ported into a Python library by the Parselmouth project (https://parselmouth.readthedocs.io/, accessed on 4 June 2023). We use this library to calculate AVQI and other sound characteristics used in AVQI calculation. In Step 3, the AVQI index and the related data are returned to the smartphone and displayed to the user.
Step 4 is optional. If the user chooses to save the results, the sound waves and calculated characteristics are saved into the server's database. No personal data linking a specific person to the calculated AVQI and its parameters are saved on the server.
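The Step 1 preprocessing (removing pauses and checking that enough sound remains) could be sketched as follows. The actual method used in the "Voice Screen" client is not described in detail, so this is only one plausible approach based on a frame-wise RMS energy threshold; the function names, the 20 ms frame length, the threshold of 0.01, and the 3 s minimum are all hypothetical values chosen for illustration.

```python
import math


def remove_pauses(samples, sample_rate, frame_ms=20, threshold=0.01):
    """Keep only frames whose RMS amplitude reaches `threshold` (assumed values)."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    kept = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            kept.extend(frame)
    return kept


def has_enough_sound(samples, sample_rate, min_seconds=3.0):
    """Check that at least `min_seconds` of non-silent audio remain."""
    return len(samples) >= min_seconds * sample_rate
```

In such a design, the client would only send the recording to the server (Step 2) once `has_enough_sound` returns true, and would otherwise prompt the user to continue or repeat the recording.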
In the present study, the UPB "Voice Screen" application was installed on five different smartphones (namely, iPhone 13 Pro Max and iPhone SE (iOS operating system); OnePlus 9 Pro, Samsung S22 Ultra, and Huawei P50 Pro (Android operating system)) used for AVQI estimation. The AVQI measures estimated with the "Voice Screen" application from voice recordings obtained from a flat frequency response studio microphone AKG Perception 220 were compared with AVQI results obtained using these smartphone devices.

Statistical Analysis
Statistical analysis was performed using IBM SPSS Statistics for Windows, version 20.0 (IBM Corp., Armonk, NY, USA) and MedCalc Version 20.118 (MedCalc Software Ltd., Ostend, Belgium). The chosen level of statistical significance was 0.05.
The normality of the data distribution was assessed using the Shapiro-Wilk test and by calculating the coefficients of skewness and kurtosis. Student's t-test was used to test the equality of means in normally distributed data [36]. An analysis of variance (ANOVA) was employed to determine whether there were significant differences between the means of multiple independent groups [37]. Cronbach's alpha was used to measure the internal consistency of measures [38]. Pearson's correlation coefficient was applied to assess the linear relationship between variables obtained from continuous scales. Spearman's correlation coefficient was used to determine the relationship in ordinal results. Receiver operating characteristic (ROC) curves were used to obtain the optimal sensitivity and specificity at optimal AVQI cut-off points. The area under the ROC curve (AUC) served to quantify the discriminatory accuracy of the AVQI obtained with a studio microphone and different smartphones. A pairwise comparison of ROC curves, as described by DeLong et al., was used to determine whether there was a statistically significant difference between the ROC curves of two or more variables when categorizing normal/pathological voices [39].
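For readers less familiar with ROC analysis, the following sketch shows how an AUC and a Youden-index-based cut-off can be computed from two groups of scores. It is not the SPSS or MedCalc procedure used in the study. It assumes that higher AVQI values indicate poorer voice quality, so a voice is classified as pathological when its score is at or above the cut-off; the function names are illustrative.

```python
def auc(normal_scores, pathological_scores):
    """AUC as the probability that a pathological score exceeds a normal one.

    Ties count as one half (equivalent to the Mann-Whitney U statistic).
    """
    wins = 0.0
    for p in pathological_scores:
        for n in normal_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(normal_scores) * len(pathological_scores))


def youden_cutoff(normal_scores, pathological_scores):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1."""
    best = None
    for cutoff in sorted(set(normal_scores) | set(pathological_scores)):
        sens = sum(p >= cutoff for p in pathological_scores) / len(pathological_scores)
        spec = sum(n < cutoff for n in normal_scores) / len(normal_scores)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1:]
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.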

Raters' Perceptual Evaluation Outcomes
The rater panel demonstrated excellent inter-rater agreement (Cronbach's α = 0.967) with a mean intra-class correlation coefficient of 0.967 between five raters (from 0.961 to 0.973).
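As a brief illustration of the agreement statistic reported above, Cronbach's alpha can be computed from a table of ratings as sketched below. The layout of the input (one row per voice sample, one column per rater) and the function name are assumptions for this example; the study's values were obtained with statistical software, not with this code.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha; `ratings[i][j]` is the score of sample i by rater j."""
    k = len(ratings[0])
    # Variance of each rater's scores across samples.
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the per-sample total score.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha approaches 1 when raters rank the samples consistently and drops as their scores diverge.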

AVQI Evaluation Outcomes
An individual smartphone AVQI evaluation displayed excellent agreement by achieving a Cronbach's alpha of 0.984. The inter-smartphone AVQI measurements' reliability was excellent, with an average Intra-class Correlation Coefficient (ICC) of 0.983 (ranging from 0.979 to 0.987).
The mean AVQI scores provided by different smartphones and a studio microphone are presented in Table 2. As shown in Table 2, the one-way ANOVA did not detect statistically significant differences between the mean AVQI scores obtained using different smartphones (F = 0.759; p = 0.58). Further Bonferroni analysis reaffirmed the lack of difference between the AVQI scores obtained from different smartphones (p = 1.0; Bonferroni-adjusted significance threshold p = 0.01). The mean AVQI differences ranged from 0.01 to 0.4 points when comparing different smartphones.
Almost perfect direct linear correlations were observed between the AVQI results obtained with a studio microphone and different smartphones. Pearson's correlation coefficients ranged from 0.987 to 0.991 and are presented in Table 3.
The relationships between the AVQI scores obtained with a studio microphone and different smartphones are graphically presented in Figure 2. As demonstrated in Figure 2, AVQI results obtained with different smartphones closely resemble the AVQI results obtained with a studio microphone, with very few data points outside of the 95% confidence interval (R² = 0.961). Therefore, it is safe to conclude that the AVQI scores obtained with smartphones are directly compatible with the ones obtained with the reference studio microphone.
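The correlation measure used above can be reproduced with a few lines of code. The sketch below computes Pearson's r for two paired series, for example studio-microphone and smartphone AVQI scores of the same voice samples; the function name is an assumption for this example, and no study data are included.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For a simple linear regression with a single predictor, the coefficient of determination equals the square of Pearson's r.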

The Normal vs. Pathological Voice Diagnostic Accuracy of the AVQI Using Different Smartphones
First, the ROC curves of the AVQI obtained from studio microphone and different smartphone voice recordings were inspected visually to identify optimum cut-off scores according to general interpretation guidelines [40]. All of the ROC curves were visually almost identical and occupied the largest part of the graph, clearly revealing their respectable power to discriminate between normal and pathological voices (Figure 3). Second, the AUC analysis revealed a high level of precision of the AVQI in discriminating between normal and pathological voices, relative to the suggested threshold of AUC = 0.800. The results of the ROC statistical analysis are presented in Table 4.
As demonstrated in Table 4, the ROC analysis determined the optimal AVQI cut-off values for distinguishing between normal and pathological voices for each smartphone. All employed microphones passed the proposed 0.8 AUC threshold and revealed an acceptable Youden-index value.
Third, a pairwise comparison of the significance of the differences between the AUCs revealed in the present study is presented in Table 5. As shown in Table 5, a comparison of the AUC-dependent ROC curves (AVQI measurements obtained from studio microphone and different smartphones), according to the test of DeLong et al., confirmed no statistically significant differences between the AUCs (p > 0.05). The largest observed difference between the AUCs was only 0.028. These results confirmed that the AVQI's diagnostic accuracy in differentiating normal vs. pathological voices was comparable when using voice recordings from a studio microphone and different smartphones.

Discussion
In the present study, the novel UPB "Voice Screen" application for the estimation of the AVQI and detection of voice deterioration in patients with various voice disorders and healthy controls was tested for the first time simultaneously with different smartphones. The AVQI was chosen for voice quality assessment because of several favorable features of this multiparametric measurement: the AVQI is less vulnerable to environmental noise than other complex acoustic markers, it is robust regarding the interaction between acoustic voice quality measurements and room acoustics, and no significant within-subject differences were found for either women or men when comparing the AVQI across different voice analysis programs [11,14,41]. Another essential attribute of the AVQI is that Praat is the only freely available program that estimates the AVQI, which eliminates the impact of possible software differences on AVQI computation.
In the present study, the ANOVA did not detect statistically significant differences between the mean AVQI scores obtained using different smartphones (F = 0.759; p = 0.58). Moreover, the mean AVQI differences ranged from 0.01 to 0.4 points when comparing the AVQI estimated with different smartphones, thus establishing a low level of variability. These differences are below the value of 0.54 proposed by Barsties and Maryn in 2013 as the absolute retest difference of AVQI values [20,42]. Consequently, the differences in AVQI measurements between smartphones were considered neither statistically nor clinically significant, justifying the possibility of practical use of the UPB "Voice Screen" app.
The correlation analysis showed that all AVQI measurements were highly correlated (Pearson's r ranged from 0.987 to 0.991) across the devices used in the present study. This concurs with the literature data on the high correlation between acoustic voice features derived from studio microphones and smartphones, examined both for control participants and synthesized voice data [7,12,13,14].
Furthermore, analysis of the results revealed that the AVQI showed a remarkable ability to discriminate between normal and pathological voices as determined by auditory-perceptual judgment. The ROC analysis determined the optimal AVQI cut-off values for distinguishing between normal and pathological voices for each smartphone used. The AVQI discriminated between normal and pathological voices with remarkable precision (AUC 0.834-0.862), resulting in an acceptable balance between sensitivity and specificity. These findings suggest that the AVQI is a reliable tool for differentiating normal/pathological voices regardless of whether the voice recordings come from the tested studio microphone or from different smartphones. The comparison of the AUC-dependent ROC curves (AVQI measurements obtained from studio microphone and different smartphones) demonstrated no statistically significant differences between the AUCs (p > 0.05), with the largest difference between the AUCs being only 0.028. These results confirmed that the AVQI's diagnostic accuracy in differentiating normal vs. pathological voices is comparable when using voice recordings from a studio microphone and different smartphones, which is of considerable practical importance.
Several limitations of the present study have to be considered. Despite the encouraging results of the AVQI measurements, some individual discrepancies between the AVQI results obtained with different smartphones still exist. Therefore, further research in a wide diversity of voice pathologies, including functional voice disorders, is needed to ensure the maximum comparability of acoustic voice features derived from voice recordings obtained with mobile communication devices and reference studio microphones. In the present study, the voice recordings were performed in a sound-proof room. However, in real clinical situations where environmental noise exists, the omni-directional built-in microphones of smartphones may produce different results. Therefore, further studies of the "Voice Screen" application's performance in a real clinical setting, based on simultaneous voice recordings with different smartphones, are required to evaluate the impact of both the recording environment and the peculiarities of the microphones on the AVQI estimation. The outcomes of such studies may allow the results and improvements to be employed in healthcare applications.
Summarizing the results of the previous and present studies suggests that the novel UPB "Voice Screen" app provides adequate and comparable AVQI estimation across different smartphones. However, it is important to note that due to existing differences in recording conditions, microphones, hardware, and software, the results of acoustic voice quality measures may differ between recording systems [11]. Therefore, using the UPB "Voice Screen" app with some caution is advisable. For voice screening purposes, it is more reliable to perform AVQI measurements using the same device, especially when performing repeated measurements. Moreover, these recommendations should be considered when comparing data of acoustic voice analysis between different voice recording systems, i.e., different smartphones or other mobile communication devices, and when using them for diagnostic purposes or monitoring voice treatment outcomes.

Conclusions
The UPB "Voice Screen" app represents an accurate and robust tool for voice quality measurement and normal vs. pathological voice screening purposes, demonstrating the potential to be used by patients and clinicians for voice assessments, employing both iOS and Android smartphones.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.