Inﬂuence of Analyzed Sequence Length on Parameters in Laryngeal High-Speed Videoendoscopy

: Laryngeal high-speed videoendoscopy (HSV) allows objective quantiﬁcation of vocal fold vibratory characteristics. However, it is unknown how the analyzed sequence length affects some of the computed parameters. To examine if varying sequence lengths inﬂuence parameter calculation, 20 HSV recordings of healthy females during sustained phonation were investigated. The clinical prevalent Photron Fastcam MC2 camera with a frame rate of 4000 fps and a spatial resolution of 512 × 256 pixels was used to collect HSV data. The glottal area waveform (GAW), describing the increase and decrease of the area between the vocal folds during phonation, was extracted. Based on the GAW, 16 perturbation parameters were computed for sequences of 5, 10, 20, 50 and 100 consecutive cycles. Statistical analysis was performed using SPSS Statistics, version 21. Only three parameters (18.8%) were statistically signiﬁcantly inﬂuenced by changing sequence lengths. Of these parameters, one changed until 10 cycles were reached, one until 20 cycles were reached and one, namely Amplitude Variability Index ( AVI ), changed between almost all groups of different sequence lengths. Moreover, visually observable, but not statistically signiﬁcant, changes within parameters were observed. These changes were often most prominent between shorter sequence lengths. Hence, we suggest using a minimum sequence length of at least 20 cycles and discarding the parameter AVI .


Introduction
The vocal folds are located in the larynx and produce the source signal for voice and speech. They start vibrating when the tracheal airflow, coming from the lungs, sets them in motion. During vibration of the vocal folds this airflow is interrupted, resulting in audible sound. After passing the vocal folds, the airflow is further modulated by tongue and lips, producing voice and speech in the process [1,2]. The vocal folds vibrate in varying frequency. Upper range of females' fundamental frequency (F0) were reported to range from 250 Hz [3,4] to 1000 Hz [5]. During singing even higher frequencies of up to 1568 Hz were reported [6].
Vocal fold vibratory patterns can be investigated using several imaging techniques. Videostroboscopy (VS) produces an illusory slow motion by relying on the assumption of the periodic nature of vocal fold vibration. With short strobe light flashes, single images from consecutive oscillation Vocal fold vibratory patterns can be investigated using several imaging techniques. Videostroboscopy (VS) produces an illusory slow motion by relying on the assumption of the periodic nature of vocal fold vibration. With short strobe light flashes, single images from consecutive oscillation cycles are recorded with a small delay to the previous cycle. These images are then assembled to artificial glottal cycles. However, since VS presents only an artificial slow motion, even subtle variation in periodicity of the vocal fold vibration can result in completely distorted or unrealistic image sequences [7]. Another technique in use is videokymography (VK), which, in contrast to VS, records the vocal fold oscillation at frame rates of about 7000 to 8000 Hz [7][8][9][10], which is distinctly higher than the vocal folds vibration frequency, but can only scan a single line across the glottis [7]. With high-speed videoendoscopy (HSV), the whole glottis is recorded using a high-speed camera [7,11] with frame rates of currently about 4000 Hz in clinical applications [12][13][14][15]. Hence, HSV overcomes the limitations of VS and VK and combines the advantages of both techniques [7,11].
Since the introduction of HSV to laryngeal examination, numbers of different studies using HSV have been published [16][17][18][19][20][21]. Also, HSV is no longer reserved for scientific use only; the clinical applicability of HSV was tested recently on a larger scale in comparison with VS Ratings of all vibratory features which showed changes between VS and HSV and it was concluded that HSV could enable important refinements in diagnosis and management of vocal fold pathology [22]. As HSV is superior to alternative procedures such as VS and VK [7,14,23], it possesses the potential to replace VS [11], the longtime "gold standard" and widely used technique of laryngeal examination [24][25][26]. However, HSV systems are expensive and these high costs are considered as the most prohibitive factor for the widespread clinical implementation of HSV [7].
A typical clinical examination situation, as it is used for HSV using rigid endoscope, is illustrated in Figure 1. The vibration of the vocal folds is recorded from above [27]. From the recorded data, different features can be extracted. The most prominent and significant feature is the glottal area waveform (GAW). The GAW describes the area between the vocal folds, the "glottal area", which opens and closes periodically during normal phonation. For each individual video frame, the glottal area is segmented and lined up in a function as shown in Figure 1b,c. The GAW is defined slightly differently in different works [28][29][30][31]. In this work, the GAW is defined as the function of the glottal area in pixels over frames. All parameters used in this work were calculated using this definition of the GAW.  Even though HSV, sometimes done in combination with recording of the audio signal [32,33], is a powerful method for examining the phonation process [7], the objective parameters obtained from  [34][35][36][37][38]. One of these factors is the recording frame rate, which was already investigated for acoustic and GAW signals. For acoustic measures, a sampling frequency of at least 26 kHz was suggested to avoid the introduction of errors [34]. For GAW signals it is reported that up to 90% of parameters were affected by the changes in the frame rate [35]. That study suggested that normative parameter values based on the recording frame rate should be determined and a recording frequency of 4000 Hz seemed to be too low to register all details of vocal fold vibratory patterns. Still, the application of recording frame rates of 4000 Hz in clinical studies was judged as justified, since the parameter changes between 4000 Hz and 15,000 Hz were relatively small for glottal dynamic characteristics and glottal perturbation characteristics. For acoustic signals, the stability of perturbation measures was investigated with deviating results [36][37][38]. Scherer et al. suggested a minimal sequence length in the order of 100 cycles for the calculation of stable perturbation measures in the acoustic signal [36]. Karnell et al. found that frequency and amplitude perturbation measures (APM) were not in agreement for three different analysis systems, even for 110 consecutive cycles [37]. Another investigation was done for the electroglottographic (EGG) signal, which describes the electrical impedance between two electrodes placed on the left and right side of the larynx and changes with vocal fold vibration. The influence of different sequence lengths on EGG and audio was investigated and it was found that two of nine perturbation measures for the EGG signal and two of nine perturbation measures for the audio signal (although not the same measures) were affected by changing sequence lengths [38]. However, to the best of the authors' knowledge, no studies exist examining the influence of the analyzed interval length especially for GAW parameters computed from HSV data.
In various studies, perturbation parameters are calculated for the GAW, and often the analyzed sequence length varies [39][40][41][42]. Moreover, the sequence lengths are often given in milliseconds [39,40]; hence the number of cycles ultimately used to calculate the perturbation measures may vary within these studies. To find out if and how this affects the comparability of these studies, the current work investigated the influence of a differing sequence length on 16 different perturbation parameters. Specifically, period, amplitude and energy perturbation parameters were investigated. The aims of this work can be summarized in the following way:

1.
Examine if varying sequence length affects GAW perturbation parameters.

2.
Determine if there is a statistical change in parameters by varying sequence length.

3.
Investigate the reason for the susceptibility of these parameters to a changing sequence length.
These goals are met by a systematic analysis of all 16 examined perturbation measures. A detailed discussion of the statistically significantly changes in parameters due to varying sequence length was given. The suggestion of the use of at least 20 cycles was given for future studies using HSV data.

Materials and Methods
Twenty endoscopically recorded HSV data from 20 healthy female subjects were investigated. All recordings were chosen from our existing clinical database. Data collection and usage was approved by the ethic committee of the Medical School at Friedrich-Alexander-University Erlangen-Nürnberg (no. 290_13B). All subjects phonated the vowel /i/ at a comfortable pitch and loudness level during examination. All 20 videos chosen for this study had a comparatively good recording quality with visibility of the entire glottis and good brightness and contrast. The chosen videos were recorded by the clinically used Photron Fastcam MC2 with a spatial resolution of 512 × 256 pixels and a frame rate of 4000 fps. All chosen videos included at least 102 consecutive cycles of glottis closing and opening. The sequences of 100 cycles used for analysis ranged in length from 234.75 ms (427.11 Hz F0) to 426.50 ms (234.69 Hz F0). Therefore, with a sampling rate of 4000 Hz the Nyquist sampling criterion was more than satisfied with respect to GAW F0.
All recordings were segmented using a modified version of our in house developed software, Glottis Analysis Tools (GAT-2018). This modified version was slightly adjusted to allow a smaller inter seed point distance and a more precise segmentation. The segmentation procedure is depicted in Figure 2 and was as follows:

1.
A region of interest in the video was selected, which included full view of glottis.

2.
An interval containing at least 102 cycles during constant phonation was selected. When selecting the intervals, care was taken to choose sections in which the glottis was completely visible and the field of view moved as little as possible.

3.
For the initial pre-segmentation, seed points (green crosses in Figure 2(3,4)) were set and brightness thresholds were used. All pixels surrounding a seed point position including the pixel on the position itself are marked, if they are darker than the selected brightness thresholds.

4.
Afterwards the seed points were substituted by a regular seed point grid. In the grid region every second pixel was marked with a seed point. The grid was created semi-automatically by using a seed point drawing tool. 5.
The brightness thresholds were adjusted yielding the finalized brightness settings. 6.
The total GAW (GAW T ) was extracted for each recording. All recordings were segmented using a modified version of our in house developed software, Glottis Analysis Tools (GAT-2018). This modified version was slightly adjusted to allow a smaller inter seed point distance and a more precise segmentation. The segmentation procedure is depicted in Figure 2 and was as follows: 1. A region of interest in the video was selected, which included full view of glottis.
2. An interval containing at least 102 cycles during constant phonation was selected. When selecting the intervals, care was taken to choose sections in which the glottis was completely visible and the field of view moved as little as possible. 3. For the initial pre-segmentation, seed points (green crosses in Figure 2.3.,2.4.) were set and brightness thresholds were used. All pixels surrounding a seed point position including the pixel on the position itself are marked, if they are darker than the selected brightness thresholds. 4. Afterwards the seed points were substituted by a regular seed point grid. In the grid region every second pixel was marked with a seed point. The grid was created semi-automatically by using a seed point drawing tool. 5. The brightness thresholds were adjusted yielding the finalized brightness settings. 6. The total GAW (GAWT) was extracted for each recording. The segmentation was performed using regular grids of seed points (i.e., setting the seed points in an organized mesh, as it can be seen in Figure 2.4.). This segmentation style was chosen to ensure a more objective segmentation and minimalize errors by missed small sections of the glottal area. However, this method of segmentation is only applicable for recordings with sufficiently good The segmentation was performed using regular grids of seed points (i.e., setting the seed points in an organized mesh, as it can be seen in Figure 2(4)). This segmentation style was chosen to ensure a more objective segmentation and minimalize errors by missed small sections of the glottal area. However, this method of segmentation is only applicable for recordings with sufficiently good contrast and clearly visible boundaries of the glottal area. Altogether 20 GAW T signals were calculated.
Maximum based cycle detection was chosen to determine the cycles of the GAW T signals. Each cycle starts at a significant local maximum and ends one frame before the next one. Beginning with the second detected cycle, as Figure 3 illustrates, for each GAW 5, 10, 20, 50 and 100 consecutive cycles were selected for parameter computation, yielding five "cycle sets" per GAW. Since significant influences on the parameter calculation by frequency shifts in the phonation or field of view movements become more likely with growing recording length [43], no longer cycle sets were chosen. Furthermore, greater numbers of cycles will add more analysis time and would not be feasible in a clinical setting. From the cycle sets, 16 different perturbation parameters were calculated. All 16 parameters, their origin and a brief description are summarized in Table 1. contrast and clearly visible boundaries of the glottal area. Altogether 20 GAWT signals were calculated. Maximum based cycle detection was chosen to determine the cycles of the GAWT signals. Each cycle starts at a significant local maximum and ends one frame before the next one. Beginning with the second detected cycle, as Figure 3 illustrates, for each GAW 5, 10, 20, 50 and 100 consecutive cycles were selected for parameter computation, yielding five "cycle sets" per GAW. Since significant influences on the parameter calculation by frequency shifts in the phonation or field of view movements become more likely with growing recording length [43], no longer cycle sets were chosen. Furthermore, greater numbers of cycles will add more analysis time and would not be feasible in a clinical setting. From the cycle sets, 16 different perturbation parameters were calculated. All 16 parameters, their origin and a brief description are summarized in Table 1.

Parameter (Unit) and Reference Abbreviation Parameter Description Period Perturbation Measures (PPM)
Mean Jitter (ms) [44] MJit Mean deviation in duration between cycle pairs Jitter (%) (a.u.) [44] Jit(%) Normalized mean deviation in duration between cycle pairs Jitter Factor (a.u.) [45] JitFac Normalized mean deviation of reciprocal in duration between cycle pairs Jitter Ratio (a.u.) [46] JitRat Normalized mean deviation in duration between cycle pairs Period Perturbation Quotient-3% (a.u.) [ Energy Perturbation Quotient-3% (a.u.) [47] EPQ3 Difference in energy based on the mean difference between each inner cycle and its neighboring cycles Energy Perturbation Factor (a.u.) [47] EPF Mean normalized deviation in energy between cycle pairs 1 In the source material one formula is given as "Perturbation Quotient" and one as "Perturbation Factor". The different types of Perturbation Quotients and Factors in this work were calculated by inserting cycle lengths, dynamic ranges and cycle energies in these original formulas for, in case of the Perturbation Quotient, k = 3. 2 The "dynamic range" is defined as the maximum of the glottal area in one cycle minus the minimum of the glottal area in the same cycle.
Each parameter was computed for each of the five cycle sets for each of the 20 GAW T signals. All values of one parameter calculated from one cycle set were grouped together resulting in five sets of 20 values each for every parameter. Each set of values referring to a sequence length from 5 to 100 cycles. These five sets were compared with each other for every parameter. Therefore, pairwise tests for connected samples using SPSS Statistics version 21 were performed. For each test the H 0 Hypothesis was rejected if the p-value was equal or less than 0.05. For the general linear model (GLM), repeated measures with five within-subject variables (i.e., the five sequence lengths) were chosen. The default setting of a saturated model with a Type III sum of squares was retained. We applied Bonferroni correction to pairwise comparisons (see Figure 4) by multiplying p-values of post hoc tests by five. The p-values were clipped at 1. The workflow of the entire statistical analysis is shown in Figure 4.
Hypothesis was rejected if the p-value was equal or less than 0.05. For the general linear model (GLM), repeated measures with five within-subject variables (i.e., the five sequence lengths) were chosen. The default setting of a saturated model with a Type III sum of squares was retained. We applied Bonferroni correction to pairwise comparisons (see Figure 4) by multiplying p-values of post hoc tests by five. The p-values were clipped at 1. The workflow of the entire statistical analysis is shown in Figure 4.

Results
Statistical analysis revealed a statistically significant change in three out of 16 examined parameters for different sequence lengths. The significantly changing parameters were Amplitude Variability Index (AVI) (p < 0.001), Relative Average Perturbation Bielamowicz (RAP B ) (p < 0.001) and Amplitude Perturbation Quotient-3% (APQ3) (p = 0.017).
Post hoc tests disclosed that AVI changed between almost all different pairings of sequence lengths. The only not statistically significantly different pairings were between 5 and 10 and between 10 and 20 cycles. RAP B changed statistically significantly until 20 consecutive cycles were reached and APQ3 changed statistically significantly until 10 consecutive cycles were reached. Statistical p-values of all parameters can be seen in Table S1 in the supplementary information.
This table contains the p-values for all Friedman and GLM tests and all performed post hoc tests. Additionally, descriptive values, i.e., group means, standard deviation, maximum and minimum values for period, amplitude and energy perturbation parameters for all sequence lengths are represented in Appendix A in Tables A1-A3. Last in Table 2 a summary of all observed statistically significant changes and also systematic in-or decreases for all parameters is given. x Indicates that the calculated descriptive value increased monotonically until x consecutive cycles were reached. 2 x Indicates that the calculated descriptive value decreased monotonically until x consecutive cycles were reached.
In addition to the statistically significant changes, visual subjectively observable trends were found. As depicted in Figure 5, for the Period Perturbation Measures (PPM) the descriptive values i.e., group mean, standard deviation, maximum and minimum of most parameters increased or decreased consistently up to certain sequence lengths. To give a visual impression for parameter behavior in this figure, the descriptive values were normalized to their maximum values for a better comparability. The same standardization was applied to the data depicted in Figures 6 and 7. Detailed information of observed systematic increases or decreases in descriptive values for all parameters is given in Table 2.
Appl. Sci. 2018, 8, x FOR PEER REVIEW 9 of 18 behavior in this figure, the descriptive values were normalized to their maximum values for a better comparability. The same standardization was applied to the data depicted in Figures 6 and 7. Detailed information of observed systematic increases or decreases in descriptive values for all parameters is given in Table 2. behavior in this figure, the descriptive values were normalized to their maximum values for a better comparability. The same standardization was applied to the data depicted in Figures 6 and 7. Detailed information of observed systematic increases or decreases in descriptive values for all parameters is given in Table 2.   The descriptive values for amplitude perturbation are depicted in Figure 6. The AVI was excluded from this figure since it can become negative and was hence not suitable for relative comparison. In other words, if AVI would be normalized in the same way as the other parameters, it would map to a number space outside the 0 to 1 interval. The descriptive values for amplitude perturbation are depicted in Figure 6. The AVI was excluded from this figure since it can become negative and was hence not suitable for relative comparison. In other words, if AVI would be normalized in the same way as the other parameters, it would map to a number space outside the 0 to 1 interval.
In Figure 7, the descriptive values for all examined Energy Perturbation measures (EPM) are plotted.

Discussion
The segmented glottal area can be affected by changing illumination, camera movement and larynx movement itself, which influences the calculated dynamic ranges (maximum minus minimum of the glottal area in 1 cycle). Hence the dynamic ranges may increase or decrease over time for some segments of the signal. This also explains the statistically significant change in AVI between all groups of sequence lengths in contrast to the other unaffected APM. AVI does not compare the dynamic ranges of consecutive cycles in pairs but instead compares each single dynamic range to an average dynamic range calculated for all cycles (see Table 1). For this reason, AVI is more sensitive to long term changes in the signal. As the dynamic ranges continue to increase or decrease in the signal course, the distance between the average dynamic range and the dynamic ranges of each cycle increases with the signal length, which in turn affects the AVI. As opposed to this, the influence of such long-term effects on perturbation parameters comparing only neighboring cycles does not grow with the sequence length. Analogous to AVI also Period Variability Index (PVI) compares an average cycle length to every single cycle length. The reason why it does not change statistically significantly is that, with constant phonation, the cycle lengths do not increase or decrease over time and hence no long-term effects similar to the effects influencing the dynamic ranges occurred.
RAP B changes statistically significantly until at least a signal length of 20 consecutive cycles is reached. In contrast RAP K , which is a normalized version of RAP B , does not show statistically significant changes. In a previous work it was found that the maximum reachable value of RAP K depends on the number of analyzed cycles [51], which is not the case for RAP B , if the sequence length exceeds five cycles. Hence it seems natural to assume that RAP K changes more strongly with changing sequence lengths than RAP B . Still RAP K was the more stable measure in this study. For that reason, it can be assumed that for healthy female subjects RAP K is more consistent for different sequence lengths than RAP B . Nevertheless because of the previous findings regarding the maximum reachable values, there is the possibility that for other types of phonation, for example voices with high period perturbation [3,52], RAP B would be more consistent than RAP K for different sequence lengths of GAW-cycles.
APQ3 only deviated statistically significantly between a sequence length of five analyzed cycles and the larger sequence lengths (with exception of the 5 cycles/100 cycles pairing). This could be the case since APQ3 seems to be generally less stable than comparable parameters like MShim. In Figure 8a, a series of ten consecutive dynamic ranges is depicted for which the difference in behavior between APQ3 and exemplary MShim is clearly visible. For the different intervals of five cycles and the entire ten cycles, APQ3 and MShim were calculated. MShim behaves consistently across the various intervals and the MShim value for all ten consecutive cycles lies in between the values for the shorter intervals. In contrast, APQ3 varies more strongly for the different five-cycle intervals and additionally the APQ3 value calculated over all ten cycles is lower than the APQ3 value for most of the shorter intervals. Figure 8b depicts the period lengths for the same subject. In contrast to the dynamic ranges, they are generally much more regular. Hence for this example, the PPQ3 values that are calculated using the same formula as the APQ3 values, but using period lengths instead of dynamic ranges, do not change at all for different starting positions. Since the cycle lengths were more uniform than the dynamic ranges, PPQ3 did not change statistically significantly but APQ3 did. The mean and maximum values and standard deviations for most parameters displayed consistent tendencies and changed most clearly between the shorter sequence lengths (for details see Table 2 and Figures 5-7). Minimum values usually increased with an increasing sequence length without reaching a stable region. The instability of the minimum values for all parameters could be due to the rising probability of changes in phonation with increasing sequence length. Furthermore, it is noteworthy that all Perturbation Quotients (PPQ, APQ and EPQ) behaved clearly distinctively from the other parameters of their groups but rather similar in comparison to each other. This is because they are calculated using the same formula only for different input data [47]. However, The mean and maximum values and standard deviations for most parameters displayed consistent tendencies and changed most clearly between the shorter sequence lengths (for details see Table 2 and Figures 5-7). Minimum values usually increased with an increasing sequence length without reaching a stable region. The instability of the minimum values for all parameters could be due to the rising probability of changes in phonation with increasing sequence length. Furthermore, it is noteworthy that all Perturbation Quotients (PPQ, APQ and EPQ) behaved clearly distinctively from the other parameters of their groups but rather similar in comparison to each other. This is because they are calculated using the same formula only for different input data [47]. However, except for AVI, none of these changes were found to be statistically significant for comparisons between sequence lengths of 20 cycles and longer sequences. Furthermore, even though systematic increases and decreases were often visually observed up to a sequence length of 50 or 100 cycles (see Table 2), the largest changes were observed for almost all parameters between shorter sequence lengths. Hence, we suggest avoiding smaller sequence lengths than 20 cycles for calculation of all GAW perturbation measures. Additionally, we suggest avoiding the use of the parameter AVI in general. We make this general suggestion because taking into account the observed often systematic behavior of the descriptive values, it is possible that other more subtle effects exist that were not significant in our analysis. To be able to make a more precise statement, it is necessary to confirm these findings for larger datasets and especially for subjects with vocal disorders.

Shortcomings
Since only recordings of healthy females were investigated, the conclusions of this work are not necessarily transferable to male subjects and subjects with voice disorders. Especially for heavily disturbed vocal fold oscillations, the selection of a sequence length greater than 20 cycles for analysis may be necessary.
Since there is a significant overlap of the cycle sets (see Figure 3), the parameters for different sequence lengths are more likely to attain similar values. This overlap was preferred, since otherwise the influences by camera movement and other long-term effects might increase. Additionally this study only provides a small sample size, which limits its statistical significance.
More perturbation parameters than in this evaluated set of parameters may exist. It is also possible that in other works parameters with the same name as the parameters examined in this work are defined differently. In particular, it should be noted that different software tools may deviate significantly in the calculation of various parameters [37,48]. This may limit the transferability of the results of this study to those. Furthermore, other GAW definitions exist that were not considered here [28][29][30][31].

Conclusions
The comparability of studies using different sequence lengths for GAW perturbation parameter calculations is given with certain limitations. First, the chosen sequence length should be at least 20 cycles to minimize the influence of statistically significant effects on certain parameters. More subtle influences on descriptive values of the investigated parameters were also observed, most clearly between shorter sequence lengths. This further justifies the lower limit of 20 cycles. Second, the parameter AVI is generally not comparable for different GAW sequence lengths. With this study another potential influence factor on voice disorder parameters was investigated, as different other influencing factors on other parameter types were investigated before. This will pave the way to the reduction of the great number of measures in use to a smaller set of meaningful, standardized parameters to greatly improve the information exchange between different studies and the relevance of clinical data.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

Appendix A
The Tables A1-A3 list descriptive values for all parameters for period perturbation measures (Table A1), amplitude perturbation measures (Table A2) and energy perturbation measures (Table A3).