A Scoping Literature Review of Relative Fundamental Frequency (RFF) in Individuals with and without Voice Disorders

Relative fundamental frequency (RFF) is an acoustic measure that characterizes changes in voice fundamental frequency during voicing transitions. Despite showing promise as an indicator of vocal disorder and laryngeal muscle tension, the clinical adoption of RFF remains challenging, partly due to a lack of research integration. As such, this review sought to provide summative information and highlight next steps for the clinical implementation of RFF. A systematic literature search was completed across 5 databases, yielding 37 articles that met inclusion criteria. Studies most often included adults with and without tension-based voice disorders (e.g., muscle tension dysphonia), though patient and control groups were directly compared in only 32% of studies. Only 11% of studies tracked therapeutic progress, making it difficult to understand how RFF can be used as a clinical outcome. Specifically, there is evidence to support within-person RFF tracking as a clinical outcome, but more research is needed to understand how RFF correlates with auditory-perceptual ratings (strain, effort, and overall severity of dysphonia) both before and after therapeutic interventions. Finally, a marked increase in the use of automated estimation methods has been noted since 2016, yet there remains a critical need for a universally available algorithm to support widespread clinical adoption.


Introduction
Relative fundamental frequency (RFF) is an acoustic measure that has gained substantial traction for its use in clinical voice evaluation. Over the last two decades, research into RFF has yielded evidence to suggest that this non-invasive, objective measure may assist in assessing and tracking the degree of laryngeal muscle tension [1] and its associated symptoms [2]. It has been estimated that 65% of patients with voice disorders have excessive laryngeal tension [3], making RFF an appealing acoustic measure for clinical diagnosis and tracking.
Despite the promising results demonstrated by these studies, RFF is not yet a staple of standardized voice assessments. The discrepancy between research and clinical support for RFF may be due, in part, to a lack of clear understanding of its application and interpretation across patient populations. Yet a comprehensive review of the RFF literature has not been undertaken, and summative information on RFF as a clinical tool is nonexistent. Thus, the purpose of this scoping review was to search the existing RFF literature exhaustively in order to condense current knowledge and provide recommendations for the next steps needed toward the clinical adoption and interpretation of RFF.

Stimuli and Calculation
Traditional acoustic measures of voice, such as jitter and shimmer, are computed during steady-state voicing, most often during sustained vowel productions. RFF is a unique voice measure because it quantifies changes in fundamental frequency (fo) during voicing transitions. Specifically, RFF is measured from the instantaneous changes in fo during the devoicing and voicing gestures of a vowel-voiceless consonant-vowel (VCV) utterance. Typical RFF stimuli include the VCV utterances /ifi/, /ɑfɑ/, and /ufu/ [10] or specific words from the phonetically balanced Rainbow Passage [11]. For example, the VCV transition within the word "beautiful" (i.e., /ɪfə/) meets the criteria for RFF extraction.
RFF is calculated from the 10 voicing cycles immediately before the voiceless consonant ("offset cycles") and the 10 voicing cycles immediately after it ("onset cycles"). As such, an estimate of RFF for one speaker comprises 20 individual values; each value corresponds to a change in fo from steady state and can be analyzed either alone, as an estimate of laryngeal tension, or alongside other values to understand trends in the speaker's devoicing and/or voicing gestures. When reporting RFF findings, individual voicing cycle values are labeled sequentially from 1 to 10 as they occur in time, with offset cycle 10 and onset cycle 1 located closest to the consonant (see Figure 1 for an example of labeled offset and onset cycles).
The RFF value for a given voicing cycle is determined by first calculating the instantaneous fo of the cycle (i.e., the inverse of the cycle period) in hertz (Hz). The cycle fo is then normalized to that of the voicing cycle closest to the steady-state portion of its corresponding vowel (fo,ref): for voicing offset cycles, this is the first voicing cycle (cycle 1) of the first vowel of the VCV utterance, and for voicing onset cycles, this is the last voicing cycle (cycle 10) of the second vowel. Finally, the normalized cycle values are converted to semitones (ST), as shown in Equation (1):
$$\mathrm{RFF} = 12 \log_2\left(\frac{f_o}{f_{o,\mathrm{ref}}}\right) \qquad (1)$$
Normalizing to fo,ref and expressing the result in ST standardizes the measurement, accounting for a speaker's own variations in fo production across vowels and allowing comparisons across speakers. Therefore, RFF cycle values reflect a standardized change in the rate of vocal fold vibration from the steady state of each vowel.
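The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the aRFF algorithm or any published implementation: it assumes that the 10 offset and 10 onset cycle periods (in seconds) have already been identified and are supplied in time order, and the function names are hypothetical. Each cycle's fo is the inverse of its period, and RFF = 12·log2(fo/fo,ref), with offset cycle 1 and onset cycle 10 serving as the references.

```python
import math
from typing import List, Sequence

N_CYCLES = 10  # offset (and onset) cycles per VCV token, as described in the text


def cycle_fo_hz(periods_s: Sequence[float]) -> List[float]:
    """Instantaneous fo of each cycle (Hz): the inverse of its period (s)."""
    return [1.0 / p for p in periods_s]


def to_semitones(fo_hz: float, fo_ref_hz: float) -> float:
    """Normalize to the reference cycle and convert to ST: 12 * log2(fo / fo_ref)."""
    return 12.0 * math.log2(fo_hz / fo_ref_hz)


def rff_offset(offset_periods_s: Sequence[float]) -> List[float]:
    """RFF for offset cycles 1..10 (time order). Cycle 1, nearest the first
    vowel's steady state, is the reference, so its value is always 0 ST."""
    if len(offset_periods_s) != N_CYCLES:
        raise ValueError("expected exactly 10 offset cycle periods")
    fo = cycle_fo_hz(offset_periods_s)
    return [to_semitones(f, fo[0]) for f in fo]


def rff_onset(onset_periods_s: Sequence[float]) -> List[float]:
    """RFF for onset cycles 1..10 (time order). Cycle 10, nearest the second
    vowel's steady state, is the reference, so its value is always 0 ST."""
    if len(onset_periods_s) != N_CYCLES:
        raise ValueError("expected exactly 10 onset cycle periods")
    fo = cycle_fo_hz(onset_periods_s)
    return [to_semitones(f, fo[-1]) for f in fo]


if __name__ == "__main__":
    # Hypothetical token: steady 200 Hz voicing, with offset cycle 10 lowered
    # by 1 ST and onset cycle 1 raised by 2 ST.
    steady = 1.0 / 200.0
    offset = [steady] * 9 + [1.0 / (200.0 * 2 ** (-1 / 12))]
    onset = [1.0 / (200.0 * 2 ** (2 / 12))] + [steady] * 9
    print(round(rff_offset(offset)[-1], 2))  # -1.0
    print(round(rff_onset(onset)[0], 2))     # 2.0
```

Because the reference cycle is the one nearest each vowel's steady state, offset cycle 1 and onset cycle 10 are 0 ST by construction; only the remaining 18 values carry information about the transition.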
Specifically, an RFF cycle value of 0 ST reflects no change in fo from the vowel steady state, whereas a positive value indicates a higher fo relative to steady state (i.e., faster vocal fold vibration) and a negative value indicates a lower fo relative to steady state (i.e., slower vocal fold vibration). Notably, Watson [12] was the first to publish on the RFF measure using the normalization and conversion procedures described above, although at the time he referred to the measures only as "cycle values during devoicing." The term "relative fundamental frequency" was not coined until Stepp et al. [13], after which it was adopted across the voice literature.

Physiological Basis
Prior to the development of the RFF metric, several researchers examined cycle-specific changes in fo to describe phonatory offset and onset behaviors in similar phonetically constrained contexts. It has been postulated that two primary laryngeal adjustments occur to cease vocal fold vibration during intervocalic offsets: vocal fold abduction and increased laryngeal muscle tension [14]. Vocal fold abduction is primarily attributed to the posterior cricoarytenoid muscle and has been shown to reduce both transglottal pressure and the duration of vocal fold contact [15], thereby inhibiting vocal fold vibration. At the same time, increased laryngeal muscle tension, arising from greater activity of the cricothyroid (CT) [16] and vocalis [17] muscles, is suspected to help cease vibration, though additional evidence suggests that overall laryngeal height and/or tilt (mediated primarily by the extrinsic laryngeal musculature) may also play a role [18].
Intervocalic onset behavior is influenced by the preceding voiceless consonant [19,20], with laryngeal tension established during the voiceless consonant carrying over into the initial voicing cycles. The existing tension in the CT and vocalis muscles is thought to raise fo [16,21], while vocal fold adduction acts to raise transglottal pressure and assist in the re-initiation of voicing. Prior work has shown that initial voicing onset cycles exhibit higher fo than subsequent voicing cycles [22,23].
Stepp et al. [24] integrated information from these previous works to propose a theoretical model for interpreting RFF in speakers with and without voice disorders. The authors hypothesized that offset and onset RFF values depend on three laryngeal factors: kinematics, aerodynamics, and muscle tension. In speakers with typical voices, intervocalic offset cycles generally stabilize around 0 ST, indicating a balance between the increasing laryngeal muscle tension needed to stop vocal fold vibration (which raises the rate of vocal fold vibration) and the reduced transglottal pressure caused by vocal fold abduction for the voiceless consonant (which lowers it). Conversely, in speakers with typical voices, onset cycles at the start of the second vowel are consistently more positive than offset cycles, hovering between 2 and 3 ST for onset cycle 1. These high RFF values during initial onset cycles are thought to arise from increased transglottal pressure and peak flow during adduction [20], combined with elevated longitudinal tension carried over from the consonant production.
It follows that someone with a voice disorder affecting airflow, laryngeal kinematics, and/or laryngeal muscle tension would show aberrant RFF. Stepp et al. [24] hypothesized that patients with voice disorders characterized by excessive tension of the intrinsic and/or extrinsic laryngeal musculature would exhibit lower RFF values. Specifically, the authors proposed that elevated laryngeal muscle tension at baseline would prevent further increases in tension during voicing offset, which would typically help to counteract the effects of abduction. This lack of a tension increase from baseline was speculated to result in more negative RFF offset cycle values than in adults with typical voices, whose offset values hover around 0 ST. It was also hypothesized that voicing onset would be affected by this inability to further modulate laryngeal tension, resulting in lower RFF onset values than in adults with typical voices.
Heller Murray et al. [6] expanded on this model by distinguishing the effects of longitudinal and transverse laryngeal muscle tension for specific patient populations. The authors proposed that patients with phonotraumatic lesions may have elevated longitudinal and transverse tension (i.e., tight adduction during phonation [25]) that interacts with the laryngeal kinematics proposed in the Stepp et al. [24] model. They hypothesized that additional transverse vocal fold tension would increase vocal fold contact time, thereby reducing the effects of vocal fold abduction and lowering RFF values beyond the effects of longitudinal tension alone. Likewise, increased transverse tension was thought to shorten the adductory gesture and limit the impact of aerodynamic forces on RFF, again lowering RFF values compared with patients who have only elevated longitudinal tension at baseline. This model was supported by significantly lower RFF offset and onset values for the voicing cycles closest to the voiceless consonant (i.e., offset cycle 10 and onset cycle 1) in patients with phonotraumatic lesions compared with those with non-phonotraumatic vocal hyperfunction (VH).

Clinical Challenges
Despite growing theoretical and empirical evidence for RFF as an acoustic indicator of laryngeal muscle tension, RFF is not included in the battery of acoustic metrics recommended for standardized clinical voice evaluations [26]. Two primary barriers to its clinical implementation are the extensive training required to reliably identify RFF cycles during voicing transitions and the time-consuming nature of manually extracting RFF values from the acoustic signal. It has been shown that at least six RFF extractions are needed to establish a consistent and reliable estimate for a single speaker [4]. Given the time demands placed on clinical staff, an extraction time of 20-30 min for a single acoustic measure is not feasible. An algorithm that extracts RFF estimates is an attractive alternative to time- and training-intensive manual estimation and, moreover, is vital for clinical adoption.
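To make the "at least six extractions" guideline concrete, the sketch below pools per-token RFF vectors into one speaker-level estimate. It assumes, for illustration only, that each token yields 20 values (10 offset, 10 onset) and that the speaker estimate is their cycle-wise mean; the aggregation rule and the function name are hypothetical rather than taken from [4].

```python
from statistics import mean
from typing import List, Sequence

VALUES_PER_TOKEN = 20  # 10 offset + 10 onset RFF values per VCV token
MIN_TOKENS = 6         # minimum extractions per speaker reported in the text [4]


def speaker_rff(tokens: Sequence[Sequence[float]],
                min_tokens: int = MIN_TOKENS) -> List[float]:
    """Cycle-wise mean of per-token RFF vectors (hypothetical aggregation rule)."""
    if len(tokens) < min_tokens:
        raise ValueError(f"need at least {min_tokens} tokens, got {len(tokens)}")
    if any(len(t) != VALUES_PER_TOKEN for t in tokens):
        raise ValueError(f"each token must contain {VALUES_PER_TOKEN} RFF values")
    # zip(*tokens) groups the i-th cycle value of every token together.
    return [mean(cycle_values) for cycle_values in zip(*tokens)]
```

Rejecting a speaker with fewer than six usable tokens, rather than silently averaging what is available, mirrors the reliability threshold cited above; in an automated pipeline, tokens discarded at a rejection step would count against that minimum.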
In addition to faster processing methods, more information is needed on how to interpret RFF values in the context of disease-specific processes and treatment outcomes. RFF offset cycle 10 and onset cycle 1 have received the most attention in the RFF literature as potential clinical markers. It has been hypothesized that RFF values at these cycles are the most sensitive to the aerodynamic, tension, and kinematic factors proposed in the theoretical models due to their location furthest from the steady-state of each vowel (and closest to the voiceless consonant). Although studies have reported significant differences between individuals with and without tension-based voice disorders [5,6,27], there is still no universally accepted clinical cut-off value for offset cycle 10 or onset cycle 1 to distinguish between vocal health and disorder.

Purpose
To our knowledge, no study has formally undertaken a review of the RFF literature to integrate findings and synthesize recommendations for clinical adoption. Therefore, we completed a scoping review to identify and describe the current published literature on RFF. Through this review, we aimed to: (i) summarize the methodology for acquiring and processing RFF measures, (ii) compile reported RFF offset cycle 10 and onset cycle 1 values, (iii) describe statistical comparisons and outline relationships with perceptual measures, and (iv) provide recommendations for future steps towards clinical implementation and interpretation.

Materials and Methods
A scoping review is a systematic way to summarize findings and identify gaps in the literature across studies with heterogeneous methods. This design was chosen because it appeared most appropriate for undertaking a comprehensive review of all RFF literature, without limitation to specific patient populations, study designs, or methodological approaches. We followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist as outlined in Tricco et al. [28], which identifies 20 essential reporting items.
Appl. Sci. 2022, 12, 8121

Literature Search
A systematic search of the literature was facilitated by a library scientist (author M.P.) on 26 February 2020. Five databases were searched: Embase (Elsevier, Amsterdam, The Netherlands), Medline (EBSCOhost), PsycINFO, CINAHL, and INSPEC. No filters or time limits were applied to the search. An example search string is provided in Appendix A; in general, the inputs for each database, modified as appropriate, were: relative fundamental frequency; fundamental frequency and (i) devoicing, (ii) offset, (iii) onset, (iv) voiceless obstruent, and (v) cycle-to-cycle. At the same time, a grey literature search was conducted by asking a lead researcher in the field to share manuscripts that had been peer-reviewed and accepted for publication but not yet formally published. All identified articles were first saved to EndNote, where de-duplication began. Files were then transferred to review management software, Rayyan [29], in which any remaining duplicates were identified and removed.
To ensure that this scoping review was as current as possible, a second search was completed on 9 April 2021 using the same databases and search strings, limited to dates between the first and second searches (i.e., 26 February 2020 to 9 April 2021). Finally, in September 2021, a hand search of the literature, including accepted manuscripts that had not yet been published, was completed to identify any final studies.

Study Inclusion and Exclusion Criteria
Inclusion and exclusion criteria were set prior to study review. Inclusion criteria were as follows: (i) enrollment of human subjects, (ii) the entire study reported in English, and (iii) analysis of relative fundamental frequency. Exclusion criteria were as follows: (i) conference abstracts, (ii) case studies or single-case designs, (iii) meta-analyses or reviews, and (iv) studies in which voicing cycles were not normalized to ST. We did not restrict inclusion by subject age, medical status (typical voice, disordered voice), or the type of equipment used to acquire voice acoustics (e.g., microphone, neck-surface accelerometer).

Article Review and Data Extraction
Prior to the formal title/abstract review, two authors (V.M., C.K.) independently reviewed a random selection of 200 titles/abstracts from the literature search. This reliability check yielded excellent agreement (99.5% absolute agreement). The two authors then divided the title/abstract review, with each author reviewing half of the identified studies.
Following the title/abstract screening, the two authors independently read each remaining study in full; their independent decisions did not differ, yielding 100% agreement on final inclusion in this review. The same process was repeated in April 2021 for the second database search and in September 2021 for the final hand search.
Once the included studies were identified, the authors independently extracted information and entered all data into a structured Excel (Microsoft; Redmond, WA, USA) spreadsheet. Author V.M. extracted information from all included studies, whereas authors C.K. and K.C. each extracted information from half of the studies. The researchers were blind to each other's extractions; they then met, reviewed their extractions together, and resolved any discrepancies by consensus. The information of interest for this review included the following:
1. Study Information: author(s), year published, journal
2. Subject Information: number of subjects, ages, sex/gender, diagnoses
3. RFF Extraction Methods: speech stimuli, processing methods
4. RFF Values for Offset Cycle 10 and Onset Cycle 1: for each population and/or study condition. Values that were reported in graphical form were neither included nor estimated.
5. Statistical Results and Interpretations.

RFF Acquisition and Processing
The speech stimuli used to acquire RFF were roughly evenly split across studies: 19 studies (51%) analyzed VCV utterances, 16 studies (43%) analyzed RFF from continuous speech, and only two studies (5%) employed both VCV and continuous speech in their methodologies. The intervocalic consonant /f/ results in the lowest within-subject variation in RFF measures [38] and was, unsurprisingly, the most common voiceless consonant in RFF VCV stimuli.
Five studies focused on algorithmic development for automated RFF extraction [33,39,40,51,52]. Semi-automated algorithmic development began in 2013 with a custom MATLAB (The MathWorks, Natick, MA, USA) program that extracted RFF values from VCV utterances recorded with a microphone [40]. The algorithm (aRFF) [39] was updated in 2017 and openly shared on Boston University's website at no cost to the user. Vojtech and colleagues [51] refined the algorithm by incorporating rule-specific criteria for tuning algorithmic parameters based on patient-specific voice characteristics and the acoustic recording environment (referred to as aRFF-AP; current version) and, later, by cross-referencing simultaneous high-speed laryngeal imaging to home in on cycle-specific extraction decisions around the voiceless consonant (referred to as aRFF-APH; unreleased) [52].
These semi-automated algorithms operate in six steps: (1) vowel and voiceless fricative detection via high-to-low energy ratios, (2) fo estimation via autocorrelation (aRFF) or Auditory-SWIPE (aRFF-AP, aRFF-APH), (3) vocal cycle peak detection in the vowels, (4) vowel-fricative boundary detection via acoustic features, (5) rejection of instances that do not meet specified criteria (e.g., fewer than 10 onset or offset cycles, glottalization, misarticulation, voicing during the voiceless consonant), and (6) RFF calculation. Each step is fully automated except step (1), in which the user must confirm or alter the algorithmically identified location of the fricative within the acoustic signal.
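Step (2) can be illustrated with a generic autocorrelation pitch estimator. The sketch below is a simplified stand-in, not the aRFF code: it assumes a single mono analysis frame held in a NumPy array and a hypothetical 75-500 Hz search range, and it omits the windowing, voicing decisions, and peak interpolation that practical trackers such as Praat's autocorrelation method add. The lag of the strongest autocorrelation peak within the allowed range is taken as the period, and fo is its inverse.

```python
import numpy as np


def autocorr_fo(frame: np.ndarray, fs: float,
                fmin: float = 75.0, fmax: float = 500.0) -> float:
    """Estimate fo (Hz) of one frame from its autocorrelation peak."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()  # remove DC offset so it does not dominate the correlation
    # Full autocorrelation; index len(x) - 1 is lag 0, so keep lags >= 0.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(fs / fmax)                    # shortest period considered
    lag_max = min(int(fs / fmin), len(r) - 1)   # longest period considered
    if lag_min < 1 or lag_min >= lag_max:
        raise ValueError("frame too short or fo range invalid for this fs")
    best_lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / best_lag


if __name__ == "__main__":
    fs = 16000.0
    t = np.arange(int(0.04 * fs)) / fs        # 40 ms frame
    frame = np.sin(2 * np.pi * 200.0 * t)     # synthetic 200 Hz tone
    print(autocorr_fo(frame, fs))             # an estimate close to 200 Hz
```

Note that a frame-level estimate like this is not yet a cycle-level fo; RFF additionally requires locating individual cycle boundaries (step 3) so that each of the 10 offset and 10 onset periods can be measured.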
Of the five studies that focused on algorithmic development, four studies developed automated extraction methods for acoustic signals acquired with a microphone, whereas Groll et al. [33] developed automated methods from neck-surface accelerometer signals. Accelerometers are wearable sensors that have the potential to expand the use of RFF to ambulatory monitoring where it could have utility for long-term clinical tracking. Unlike the microphone-based algorithms, the accelerometer-based algorithm is fully automated and does not require user input. See Table S1 for more information on the studies devoted to automated RFF extraction.
A total of 32 studies in this review did not focus on automated RFF development. Of these, 23 studies (72%) used manual estimation to calculate RFF and 9 studies (28%), beginning in 2016, used semi-automated estimation methods. With manual estimation, voicing onset and offset cycles were identified through visual inspection of the acoustic waveform. Praat, a freely available acoustic analysis program [54], was used in 74% (n = 17) of the studies that used manual estimation. Praat's "pulse" function uses an autocorrelation method to determine cycle fo from the acoustic waveform. Of the nine studies that used semi-automated estimation, eight employed the aRFF algorithm and only one used aRFF-AP. The aRFF algorithm also uses autocorrelation to determine cycle fo values from the acoustic waveform. Given that manual RFF estimation via Praat and semi-automated RFF estimation via aRFF have been the most widely implemented methods, autocorrelation was the primary technique used to determine fo in most published studies. See Table S2 for characteristics and data extraction from all 32 studies.
Mean offset cycle 10 values were consistently lower for individuals with voice disorders than for those with typical voices. Individuals with adductor laryngeal dystonia had reported values of −1.20 ST [2,4], and those with Parkinson's disease (PD) exhibited −1.90 ST [49] and −2.20 ST [5]. Among studies that reported values in patients with VH (including both phonotraumatic lesion and non-phonotraumatic VH subgroups), offset cycle 10 ranged from −1.76 ST to −0.80 ST, with a median value of −1.35 ST [6,13,24,27]. Two studies used formal receiver operating characteristic (ROC) curve analysis to determine the RFF value that would most accurately distinguish between typical voices and those with VH. Stepp et al. [27] identified a value of −0.56 ST (97% sensitivity, 90% specificity), whereas Heller Murray et al. [6] reported lower cut-off values of −1.92 ST (VH subgroups vs. controls), though the sensitivity of these values was low, ranging from 29% to 43%.
Onset cycle 1 trended lower in individuals with voice disorders but showed substantial overlap with the range of values reported for those with typical voices. For example, values reported for adductor laryngeal dystonia hovered around 2.60 ST [2,4], and those for PD ranged from 1.75 ST [5] to 2.70 ST [49]. Studies on patients with VH reported a similar range of 1.18 to 2.54 ST, with a median of 1.90 ST. ROC analyses showed close-to-chance detection rates [6,27], indicating a poor ability to discriminate between those with and without VH.
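The cut-off searches described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of [6] or [27]: it treats an offset cycle 10 value at or below the cut-off as "disordered" (since lower RFF is associated with voice disorder), scans every observed value as a candidate cut-off, and keeps the one that maximizes Youden's J (sensitivity + specificity - 1), a common but not universal criterion. The input values are hypothetical.

```python
from typing import Sequence, Tuple


def sens_spec(patients: Sequence[float], controls: Sequence[float],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when RFF <= cutoff is classed as disordered."""
    sensitivity = sum(v <= cutoff for v in patients) / len(patients)
    specificity = sum(v > cutoff for v in controls) / len(controls)
    return sensitivity, specificity


def youden_cutoff(patients: Sequence[float],
                  controls: Sequence[float]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.
    Ties keep the lowest cut-off, since candidates are scanned in ascending order."""
    best = None
    for cutoff in sorted(set(patients) | set(controls)):
        sens, spec = sens_spec(patients, controls, cutoff)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    _, cutoff, sens, spec = best
    return cutoff, sens, spec


if __name__ == "__main__":
    # Hypothetical offset cycle 10 values (ST); not data from any cited study.
    vh = [-2.1, -1.6, -1.3, -0.9, -0.4, -0.2]
    typical = [-0.7, -0.3, 0.0, 0.2, 0.5]
    print(youden_cutoff(vh, typical))
```

When the two groups overlap heavily, as reported for onset cycle 1, every candidate cut-off yields a J near zero, which is the code-level counterpart of a close-to-chance ROC curve.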

Between- and Within-Subject Analyses
A total of 12 studies completed formalized comparisons between experimental and control groups, with 9 studies finding statistically significant differences. The strongest evidence was found in comparisons involving adults with typical voices and subjects with VH, wherein 100% of studies (n = 4) reported significantly lower RFF values for those with VH. However, discrepancies arose when trying to pinpoint whether offset or onset cycles were different. One study reported significantly lower offsets [27], one reported significantly lower onsets [47], and the remaining two studies reported lower RFF values across both offsets and onsets [6,13]. Across other studies, there was consistent evidence that patients with PD and those with adductor laryngeal dystonia had significantly lower RFF values as a whole compared to those with typical voices [4,5,49].
Other studies examined RFF across the lifespan. One study compared children who do and do not stutter [30] and another compared children with and without vocal fold nodules [35]; neither found differences between groups for any RFF measure. However, Heller Murray et al. [35] reported that RFF onset cycle 1 was lower in younger children than in older children, pointing to a potential maturational effect on revoicing. A comparison of young children (age 4), older children (age 8), and young adults showed similar results: young children had lower values for onset cycles 1 and 2 than young adults [46]. Park et al. [44] compared young and older adults and found no differences in RFF values; however, Watson [12] did find significant differences, though for onset values only.
Within-subject analyses were undertaken to understand the sensitivity of RFF to vocal fatigue. Four studies examined how vocal load affected RFF [32,34,36,43] over both short-term (a few hours) and long-term (multiple weeks) time frames. Findings were mixed: pre- to post-fatigue comparisons showed that offset cycle 10 may assist in tracking fatigue over the workday [43], whereas onset cycle 1 appeared more sensitive to long-term tracking over several weeks [34]. Fujiki et al. [32] saw no impact of a lab-based vocal loading task on offset cycle 10 or onset cycle 1, whereas Kagan and Heaton [36] reported composite changes in both measures following a formalized loading paradigm.
The relationships between the proposed physiological mechanisms of RFF (i.e., kinematics, laryngeal muscle tension, and aerodynamics) and RFF offset cycle 10 and onset cycle 1 were examined via kinematic estimates of laryngeal stiffness [1,44], an aerodynamic ratio of vocal intensity to estimated subglottal pressure [7], and vocal fold ab/adduction degree and timing [44]. To create within-subject variation, speakers with typical voices modulated vocal effort and/or strain across several voice productions. Group analyses consistently yielded poor-to-moderate relationships between RFF and the physiological parameters of laryngeal stiffness, aerodynamic ratio, and both kinematic degree and timing. Importantly, however, within-subject analyses showed markedly stronger relationships, ranging from moderate to strong [1,7]. Accordingly, the authors supported the use of RFF as a clinical tracking tool to monitor an individual's voice changes over a course of therapeutic intervention (where vocal effort and/or strain were assumed to decrease over time).
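A toy example can show one way pooling speakers may weaken a relationship that is strong within each speaker. In the sketch below, two hypothetical speakers each show a perfectly linear RFF decrease as a physiological measure increases, but their baseline RFF differs; pooling the points lets that baseline difference dominate the variance. The data, the function names, and the use of the squared Pearson correlation are assumptions for illustration; the cited studies [1,7] may have used different statistical models.

```python
from typing import Dict, Sequence, Tuple

import numpy as np


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    r = np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1]
    return float(r ** 2)


def pooled_and_within(data: Dict[str, Tuple[Sequence[float], Sequence[float]]]):
    """data maps speaker id -> (physiological measure, RFF values).

    Returns the pooled (group) R^2 and a dict of per-speaker R^2 values.
    """
    xs = np.concatenate([np.asarray(x, dtype=float) for x, _ in data.values()])
    ys = np.concatenate([np.asarray(y, dtype=float) for _, y in data.values()])
    within = {spk: r_squared(x, y) for spk, (x, y) in data.items()}
    return r_squared(xs, ys), within


if __name__ == "__main__":
    # Hypothetical: same slope within each speaker, different baseline RFF.
    data = {
        "A": ([1.0, 2.0, 3.0], [-2.0, -2.5, -3.0]),
        "B": ([1.0, 2.0, 3.0], [1.0, 0.5, 0.0]),
    }
    pooled, within = pooled_and_within(data)
    print(f"pooled R^2 = {pooled:.2f}")  # low, about 0.07
    print({k: round(v, 2) for k, v in within.items()})
```

Here each speaker's R^2 is 1.0, yet the pooled R^2 is about 0.07, which is consistent with the rationale for tracking RFF within an individual rather than comparing single values against a group norm.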
Four studies specifically investigated how RFF may change over the course of various interventions. Two studies examined RFF changes over the course of voice therapy: one used a single-session treatment paradigm, and the other a full course of several therapy sessions. Roy et al. [47] found that onset values became higher and more similar to control values in a group of 111 patients with MTD following one hour of manual circumlaryngeal therapy; however, therapy had no effect on offset cycle values. Similarly, a group of 16 women (3 with nodules, 13 with MTD) who completed a successful course of voice therapy showed significant improvements in RFF cycles as a whole [24]. The largest change was in onset cycle 1, with post-therapy values (M = 2.71 ST) significantly greater than pre-therapy values (M = 1.90 ST). Interestingly, comparing RFF in individuals with phonotraumatic lesions before and after surgical removal revealed no significant changes [13]. The authors hypothesized that surgery altered only laryngeal anatomy and did not mitigate the hyperfunctional behaviors and associated muscular tension, so RFF values did not change. The final treatment study investigated whether low-level light therapy may promote healing and reduce inflammation in adults with typical voices undergoing a vocal fatigue paradigm [36]. Subjects were divided into four intervention groups: infrared wavelength, red wavelength, heat, and no-heat light (control). Due to the small sample size (n = 4 per group), a formal statistical analysis could not be undertaken; however, the red-light group showed a faster return-to-baseline trend in RFF values than all other groups.
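RFF values such as those above are expressed in semitones (ST) relative to a steady-state reference cycle. As RFF is commonly described, each of the 10 offset cycles is referenced to offset cycle 1 and each of the 10 onset cycles to onset cycle 10 (the cycles farthest from the voiceless consonant), using ST = 12·log2(f / f_ref). The sketch below assumes that convention; the function names and the cycle fo values are hypothetical. The last line reproduces the 0.81 ST pre- to post-therapy difference in mean onset cycle 1 reported by Stepp et al. [24].

```python
import math


def to_semitones(f_hz, ref_hz):
    """Frequency of one cycle relative to a reference cycle, in semitones."""
    return 12.0 * math.log2(f_hz / ref_hz)


def offset_rff(offset_f0s):
    """fo (Hz) of offset cycles 1..10, where cycle 10 is closest to the consonant.
    Every cycle is referenced to offset cycle 1 (the steady-state cycle)."""
    if len(offset_f0s) != 10:
        raise ValueError("expected exactly 10 offset cycles")
    return [to_semitones(f, offset_f0s[0]) for f in offset_f0s]


def onset_rff(onset_f0s):
    """fo (Hz) of onset cycles 1..10, where cycle 1 is closest to the consonant.
    Every cycle is referenced to onset cycle 10 (the steady-state cycle)."""
    if len(onset_f0s) != 10:
        raise ValueError("expected exactly 10 onset cycles")
    return [to_semitones(f, onset_f0s[-1]) for f in onset_f0s]


if __name__ == "__main__":
    # Hypothetical onset: fo starts high after the consonant and settles to 200 Hz.
    onset = [220, 214, 209, 205, 203, 201, 200, 200, 200, 200]
    print(round(onset_rff(onset)[0], 2))  # onset cycle 1: 1.65 ST
    # Hypothetical offset: fo drifts down toward the consonant.
    offset = [200, 200, 200, 199, 199, 198, 197, 196, 194, 190]
    print(round(offset_rff(offset)[-1], 2))  # offset cycle 10: -0.89 ST
    print(round(2.71 - 1.90, 2))  # 0.81 ST change in mean onset cycle 1 [24]
```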

Perceptual Measures: Effort, Strain, and Dysphonia
The relationships between RFF values and perceptual measures of vocal effort, strain, and dysphonia have varied widely across studies (n = 7). Lien et al. [7] reported poor relationships between listener-perceived vocal effort and offset cycle 10 and onset cycle 1 (R² = 0.21 and R² = 0.26, respectively) in individuals with typical voices who self-modulated their vocal effort. The relationships strengthened to moderate for offset cycle 10 (R² = 0.46) and onset cycle 1 (R² = 0.56) when analyzed within speaker. In contrast, three other studies of individuals with typical and disordered voices found significant group-level relationships between offset cycle 10 and listener-perceived vocal effort, but no relationships for onset cycle 1 [4,8,27].
Roy et al. [47] found that RFF onset cycle 1 values were significantly correlated with overall dysphonia severity in a large group (n = 111) of patients with MTD. In contrast, Stepp and colleagues [27], who investigated subjects with VH, reported weak relationships between onset cycle 1 and dysphonia ratings. Buckley et al. [2] took a slightly different approach in a group of subjects with adductor laryngeal dystonia by modeling a combination of several acoustic measures; offset cycle 10, changes between specific RFF cycles (e.g., the change from onset cycle 1 to onset cycle 6), and additional spectral characteristics were significantly related to dysphonia in their model.
A study by Park et al. [9] manually manipulated cycles in acoustic waveforms to understand how raising and lowering RFF would affect auditory-perceptual ratings of vocal strain. Eight adults without voice disorders produced VCV utterances in their typical voices and then again with maximal vocal effort. When intervocalic RFF cycle values were artificially lowered in the typical voice samples, listeners perceived the voices as more strained; conversely, when RFF values were artificially raised in samples collected during effortful speech, those samples were perceived as less strained. This manipulation paradigm provides perhaps the most convincing evidence that cycle changes within a speaker can contribute to changes in perceived vocal characteristics. The direction of the changes (i.e., lower RFF increased perceived strain, while higher RFF reduced it) is consistent with the existing theoretical model, in which RFF is not only an indicator of laryngeal tension but also a cue that listeners may perceive as vocal strain.

Discussion
This scoping review summarized the current state of research on RFF as a tool for assessing and tracking changes in tension-based voice disorders. We found 37 articles published since 1998 that lend support to RFF as a useful metric to characterize the human voice. The identified studies were heterogeneous in nature, spanning patient populations, speech stimuli, extraction methods, and statistical testing, among other factors. Despite this heterogeneity, these works form a clear picture of the potential utility of RFF for clinical voice evaluation, as well as of the areas that need further investigation before RFF can be adopted clinically.

Method of RFF Computation
Prior work on RFF indicates that approximately 20-30 min of manual processing time is needed to calculate a reliable estimate for a single subject. Numerous studies (e.g., [51]) discuss the clinical implications of such extensive processing time (not to mention the considerable time required to learn to estimate RFF manually), pointing to a need for faster, automated methods if RFF is to be implemented in voice clinics. Our review indicated that algorithmic investigations, and the use of these methods, have indeed increased since the algorithms became available in 2016: 9 of the 17 studies (53%) published since 2016 used algorithmic extraction, indicating that algorithmic adoption is a viable option for many researchers.
The aRFF algorithm [39] is the most commonly used automated method to date. Much like software used during manual estimation (e.g., Praat), this algorithm uses autocorrelation to determine cycle fo, so one would expect close correspondence between automated and manual estimates. Lien et al. [39] reported strong relationships (r ≥ 0.82) between aRFF and manual estimates for offset and onset values; however, the authors also found that the aRFF algorithm consistently estimated higher offset cycle 10 values (+0.22 to +0.41 ST) and lower onset cycle 1 values (−0.11 to −0.10 ST) than manual estimates. These systematic errors are smaller than the increase in RFF that can be expected after voice therapy (e.g., the 0.81 ST increase in mean onset cycle 1 reported by Stepp et al. [24]), indicating that, on average, clinically meaningful changes in RFF will not be masked by errors associated with the aRFF algorithm. It is nevertheless important to consider the source of these errors. One reason for the discrepancy may be that manual estimation allows the user to visually identify voicing offset and onset, whereas the algorithm must identify the vowel-fricative boundary computationally. Shifts in the identified boundary cycle are most likely to affect the cycle values closest to the fricative (i.e., offset cycle 10 and onset cycle 1), so boundary detection requires refinement for consistent value interpretation and clinical adoption.
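The autocorrelation principle mentioned above can be sketched for a single analysis frame: the lag at which the frame best matches a shifted copy of itself, within a plausible pitch range, is taken as the period, and fo is the sampling rate divided by that lag. This is a simplified illustration under assumed parameters (a 75 to 500 Hz search range, no windowing or normalization); it is not the aRFF implementation, and tools such as Praat add windowing, normalization, and candidate path-finding steps that are omitted here.

```python
import math


def autocorr_f0(frame, fs, fmin=75.0, fmax=500.0):
    """Estimate fo (Hz) of one frame from its autocorrelation peak.

    Only lags corresponding to fmin..fmax Hz are searched. The autocorrelation
    is unnormalized, so longer lags (fewer overlapping samples) are slightly
    penalized, which favors the true period over its multiples.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset
    min_lag = max(1, int(fs / fmax))
    max_lag = min(int(fs / fmin), n - 1)
    if min_lag > max_lag:
        raise ValueError("frame too short for the requested pitch range")
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


if __name__ == "__main__":
    fs = 16000
    # 50 ms synthetic "vowel": a pure 200 Hz tone (hypothetical test signal).
    frame = [math.sin(2 * math.pi * 200 * t / fs) for t in range(800)]
    print(autocorr_f0(frame, fs))  # 200.0
```

In a cycle-by-cycle RFF workflow, an estimate like this would be made for each of the 10 cycles on either side of the consonant; how those cycles are delimited is exactly where the boundary errors discussed above arise.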
Each new algorithm iteration has improved upon the last by increasing the accuracy of cycle detection through various signal processing considerations. For example, Vojtech et al. [51] incorporated new pitch estimation methods in aRFF-AP to align algorithmically extracted RFF values more closely with manually acquired values. Within this algorithm, the authors also introduced a set of signal-specific parameters, selected on the basis of an acoustic measure (pitch strength [55,56]), to account for differences in voice sample characteristics (e.g., overall severity of dysphonia, signal acquisition quality). In doing so, the errors of the RFF algorithm relative to manually acquired values were reduced by 88.4%.
Further work to improve the accuracy of algorithmically extracted RFF values has yielded a new set of acoustic feature processing techniques to locate the voicing cycles closest to the voiceless consonant (i.e., offset cycle 10 and onset cycle 1). This most recent version of the RFF algorithm (aRFF-APH) [52] was built by integrating information from simultaneous recordings made with a microphone and high-speed flexible laryngoscopy, which allowed the authors to identify the physiological (rather than acoustic) initiation and termination of vocal fold vibration that marks the boundaries of the voiceless consonant. Interestingly, the study reported that both algorithm iterations (aRFF-AP, aRFF-APH) were significantly more likely than manual estimation to identify offset cycle 10 and onset cycle 1 values that coincided with physiologically identified vocal fold vibratory timings, with aRFF-APH showing the strongest correspondence with physiologic events. The results of this work call into question the use of manual RFF as a "ground truth" for algorithmic testing. Instead, aligning automated RFF extraction with physiologic events may increase the precision of the measure and provide insight into the underlying physiological factors proposed to influence RFF values. Although the aRFF-APH algorithm demonstrated the highest association with physiologically identified vocal fold vibratory timings, the aRFF-AP algorithm remains the gold-standard method for algorithmically estimating RFF because it has been validated across a large dataset (483 independent speakers) and a broad range of vocal disorders and severities. A large-scale analysis with the aRFF-APH algorithm is still needed to determine its generalizability across voice disorders.
Although algorithms developed for microphone signals (aRFF, aRFF-AP) are publicly available and relatively easy to use, automated RFF extraction continues to pose problems for clinical practice. First, these RFF algorithms are semi-automated, meaning that they still require user interaction to guide extraction and verify its accuracy. The algorithm interface allows the user to check the accuracy of voiceless consonant detection against the acoustic waveform and override the automated decision if necessary, which improves extraction accuracy but also increases processing time. Second, all publicly available algorithms [57] require proprietary MATLAB software, including several toolboxes (e.g., econometrics, curve fitting, signal processing) at additional cost. MATLAB is not commonly used in voice clinics, further limiting the use of these algorithms for RFF processing. Ideally, automated RFF methods would be implemented in freely available software to increase their usability at clinical sites. Third, and perhaps most importantly, automated algorithms can only extract RFF from VCV stimuli. An algorithm that could extract RFF from continuous speech would enable retrospective analyses of established voice databases that include standard reading passages (e.g., the Rainbow Passage). Researchers could then undertake large-scale analyses across patients with various clinical diagnoses, which would allow for refinement of clinical ranges, cut-off values for classifying vocal health versus disorder, and clinical tracking over the course of intervention.
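The semi-automated workflow described above can be illustrated with a deliberately crude voicing-offset detector whose decision a user can override. Everything here is an assumption for illustration: the frame-level voicing rule (short-time energy combined with zero-crossing rate), its thresholds, the synthetic signal, and the function names are hypothetical and do not describe how aRFF or aRFF-AP identify the vowel-fricative boundary.

```python
import math


def frame_is_voiced(frame, energy_thresh=0.01, zcr_thresh=0.25):
    """Crude rule: voiced frames have high energy and few zero crossings,
    whereas fricative noise has many zero crossings."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(frame) - 1)
    return energy > energy_thresh and zcr < zcr_thresh


def find_voicing_offset(signal, fs, frame_ms=10.0, manual_offset_s=None):
    """Return the time (s) of the first voiced-to-voiceless transition.

    A user-supplied manual_offset_s overrides the automated estimate,
    mirroring the manual check-and-override step of semi-automated tools.
    Returns None if no transition is found.
    """
    if manual_offset_s is not None:
        return manual_offset_s
    hop = int(fs * frame_ms / 1000)
    was_voiced = False
    for start in range(0, len(signal) - hop + 1, hop):
        voiced = frame_is_voiced(signal[start:start + hop])
        if was_voiced and not voiced:
            return start / fs
        was_voiced = voiced
    return None


if __name__ == "__main__":
    fs = 16000
    vowel = [0.5 * math.sin(2 * math.pi * 200 * t / fs) for t in range(1600)]
    fricative = [0.05 * (-1) ** t for t in range(1600)]  # stand-in for frication noise
    signal = vowel + fricative
    print(find_voicing_offset(signal, fs))  # 0.1 (automated)
    print(find_voicing_offset(signal, fs, manual_offset_s=0.098))  # 0.098 (override)
```

The override argument makes the trade-off noted above concrete: accuracy can be improved by a human check, but every checked token adds processing time.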
A more recent algorithmic iteration extended previous work from acoustic microphone signals to signals gathered with neck-surface accelerometers [33]. A neck-surface accelerometer is a small sensor (approximately the size of a dime) placed at midline on the anterior surface of the neck, inferior to the cricoid cartilage and superior to the sternal notch. The sensor captures vibration from the surface of the skin and is less susceptible to disruption from environmental/background noise and from speech frication and/or aspiration noise [58]. Furthermore, the accelerometer does not adequately capture oral resonances, making it difficult or impossible to discern what the speaker is saying and thereby protecting speaker privacy during communication exchanges.
Accelerometers coupled with smartphone technology are therefore under development for ambulatory voice monitoring and biofeedback, both for those at risk for voice disorders (e.g., occupational voice users) and for those with current voice disorders [59][60][61][62]. Incorporating RFF into daily biofeedback could benefit within-subject clinical tracking and/or the identification of maladaptive tension-based vocal patterns in pre-clinical populations. Lien and colleagues [37,41] have shown that RFF values calculated from accelerometer signals are lower than those calculated from simultaneously acquired microphone signals. Accelerometer-derived RFF values therefore need to be interpreted with caution and compared only with other accelerometer-derived values. Finally, as recommended for microphone-based algorithmic processing, extraction from continuous speech would increase the usability of accelerometer-based RFF as a monitoring tool during daily communication. These areas of development are ongoing and offer a potentially exciting avenue for continuous monitoring to assist in the remediation of voice disorders as well as the prevention of aberrant vocal behaviors.

Interpreting RFF in Clinical Practice
Regardless of the method used to calculate RFF, one of the primary challenges hindering its clinical adoption remains the lack of standardized values for typical and disordered voices. Unfortunately, our review of the literature does not provide a clear answer as to which RFF values should be used for a clinically meaningful distinction between vocal health and disorder. On the one hand, we found robust evidence that individuals with voice disorders have consistently lower offset cycle 10 values when the range of their reported mean values is compared directly with that reported for adults with typical voices. Onset cycle 1, however, showed considerable overlap between typical and disordered groups. This overlap is surprising, since statistical comparisons between those with and without voice disorders were significant in 9 of 12 studies. The discrepancy could be due to analyses that combined offset and onset cycles rather than reporting offset and onset values in isolation. Thus, at present, there is no clear recommendation as to which cycle, or combination of cycles, would effectively distinguish between typical and disordered voices. This area of research is ongoing, with recent studies seeking to enhance the sensitivity of RFF by advancing computational methods (e.g., automated, rule-based decision criteria [51]) and by combining RFF with other acoustic parameters to improve its accuracy as a diagnostic tool [63].
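Deriving a cut-off value amounts to choosing a threshold and checking how well it separates groups. The sketch below uses invented offset cycle 10 values rather than data from the reviewed studies. It classifies a speaker as disordered when offset cycle 10 falls below a threshold (consistent with the lower offset cycle 10 values reported for voice disorders), then sweeps candidate thresholds and keeps the one that maximizes Youden's J (sensitivity + specificity - 1). Sweeping the threshold in this way is also how a receiver operating characteristic (ROC) curve is traced; the function names are hypothetical.

```python
def sens_spec(disordered, typical, threshold):
    """Sensitivity and specificity when RFF < threshold is labeled 'disordered'."""
    true_pos = sum(v < threshold for v in disordered)
    true_neg = sum(v >= threshold for v in typical)
    return true_pos / len(disordered), true_neg / len(typical)


def best_threshold(disordered, typical):
    """Try midpoints between adjacent observed values; keep the one with the
    largest Youden's J (ties go to the lowest threshold)."""
    values = sorted(set(disordered) | set(typical))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = max(candidates, key=lambda t: sum(sens_spec(disordered, typical, t)) - 1)
    return best, sens_spec(disordered, typical, best)


if __name__ == "__main__":
    # Hypothetical offset cycle 10 RFF values (ST), invented for illustration.
    disordered = [-3.2, -2.8, -2.5, -2.1, -1.6]
    typical = [-2.0, -1.5, -1.2, -0.9, -0.7]
    threshold, (sens, spec) = best_threshold(disordered, typical)
    print(f"threshold {threshold:.2f} ST: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

A weighted set of normative values and a much larger, independent sample would be needed before any such threshold could be treated as clinically meaningful.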
Importantly, the offset cycle 10 and onset cycle 1 ranges reported in this review were compiled without considering the number of subjects enrolled in each study, so studies that enrolled few subjects were weighted equally with those that enrolled many. A weighted average (e.g., weighting each study's mean by its sample size) would be the best way to help establish normative ranges and values across healthy and disordered populations. At present, the number of studies reporting RFF values for adults with typical voices (n = 14) may be sufficient to establish an acceptable normative range, but more studies need to report raw RFF values (M, SD, range) for aging adults and children with typical voices to establish normative ranges across the lifespan.
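A sample-size-weighted summary can be sketched as follows. The hypothetical helper weights each study's mean by its number of participants and combines within-study and between-study spread into an overall SD, as if all participants had been pooled into one sample. The two studies in the example are invented placeholders, not values from the reviewed literature.

```python
import math


def pooled_mean_sd(studies):
    """studies: iterable of (n, mean, sd) tuples, one per study.

    Returns the sample-size-weighted mean and the SD of the combined sample:
    SD^2 = [sum((n_i - 1) * sd_i^2) + sum(n_i * (mean_i - grand_mean)^2)] / (N - 1)
    """
    studies = list(studies)
    total_n = sum(n for n, _, _ in studies)
    grand_mean = sum(n * m for n, m, _ in studies) / total_n
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in studies)
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in studies)
    return grand_mean, math.sqrt((ss_within + ss_between) / (total_n - 1))


if __name__ == "__main__":
    # Hypothetical offset cycle 10 summaries: (n, M in ST, SD in ST).
    studies = [(10, -1.0, 0.5), (40, -0.5, 0.5)]
    mean, sd = pooled_mean_sd(studies)
    print(f"weighted M = {mean:.2f} ST, SD = {sd:.2f} ST")  # M = -0.60, SD = 0.53
    print(f"unweighted M = {(-1.0 + -0.5) / 2:.2f} ST")       # -0.75
```

With equal weighting, the small study pulls the mean to −0.75 ST; weighting by sample size gives −0.60 ST, closer to the larger study.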
At present, there is stronger evidence for the utility of RFF as a clinical tracking tool, both for within-subject changes over several voice productions and for quantifying progress pre- and post-therapy. Physiological parameters (kinematic stiffness, aerodynamics) were moderately to strongly correlated with RFF in individual analyses of vocally healthy speakers varying their vocal effort and strain [1,7]. Furthermore, Roy et al. [47] and Stepp et al. [24] reported improvements in onset cycle 1 values (i.e., more positive values) in patients with tension-based voice disorders following voice therapy. Despite these promising results, only four studies examined RFF as a clinical outcome measure [13,24,36,47].
A systematic review with meta-analysis is considered the highest level of evidence for understanding treatment efficacy and establishing clinical guidelines [64]. However, the variety of intervention approaches (single therapy visit, multiple therapy visits, surgery) and treatment groups (e.g., MTD, VH, polyps, typical) across the studies described in our review makes a meta-analysis impossible to undertake at this time. We recommend that future treatment studies of patients with tension-based voice disorders incorporate RFF into their standard voice protocols, both to increase the number of studies available for a large-scale analysis and to help validate RFF as a treatment outcome. This would not only provide evidence for RFF as an appropriate outcome measure in specific patient populations, but also help define what constitutes a clinically meaningful change in RFF over the course of intervention.
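For context, the core pooling step of a basic fixed-effect meta-analysis weights each study's effect (here, a pre- to post-treatment change in RFF) by the inverse of its squared standard error. The sketch below is a minimal illustration with invented inputs and a hypothetical function name; given the heterogeneity of interventions and populations described above, a real analysis would more likely require a random-effects model and standardized effect sizes, which are not shown.

```python
import math


def fixed_effect_pool(effects, standard_errors):
    """Inverse-variance weighted pooled effect with an approximate 95% CI."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_w = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)


if __name__ == "__main__":
    # Hypothetical onset cycle 1 changes (ST) and their standard errors.
    est, ci = fixed_effect_pool([0.8, 0.4], [0.2, 0.4])
    print(f"pooled change = {est:.2f} ST, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

The more precise study (smaller standard error) dominates the pooled estimate, which is why consistently reported RFF summaries (M, SD, n) across treatment studies matter.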
A second recommendation is to incorporate RFF measures into therapeutic assessment protocols and tracking for patients with listener-perceived vocal strain (defined as excessive vocal effort [65]) or self-reported vocal effort. Historically, vocal strain and effort have proved difficult to quantify via acoustic measures [66][67][68]. This may be due to variation in strained and effortful productions across speakers, as well as to the potential overlap of percepts in individuals with voice disorders, in whom a strained voice may co-occur with a breathy voice. Our review of the literature indicates that RFF may be an appropriate clinical tool related to the percepts of strain and effort, with offset cycle 10, in particular, being significantly correlated with either speaker or listener ratings [4,7,8,27].
Perhaps the most substantial evidence for RFF as an objective indicator of vocal strain was described by Park et al. [9]. The study design controlled for speaker variation and overlapping voice percepts by synthetically altering RFF cycle values in speech samples gathered from speakers with typical voices. The authors manipulated RFF cycle values in two ways: (i) artificially lowering RFF cycle values during typical speech, which was hypothesized to increase the perception of strain because lower values are seen in individuals with tension-based voice disorders, and (ii) artificially raising RFF cycle values in productions of maximal vocal effort so that the cycle values would resemble those found in typical voice productions. The hypotheses were supported: lowering RFF values increased the perception of strain, and raising RFF values reduced it. The authors also included a condition in which they artificially increased mid-to-high frequency noise in the signal (quantified via a harmonics-to-noise ratio), which likewise increased perceived strain. This study provides a foundation for future perceptual work, suggesting that studying RFF in combination with other acoustic measures [8] may increase the correspondence between acoustic and perceptual measures.
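Park et al. [9] quantified added noise via a harmonics-to-noise ratio (HNR). One widely used autocorrelation-based definition converts the normalized autocorrelation r at the period lag into decibels as HNR = 10·log10(r / (1 - r)). The sketch below assumes that definition (not necessarily the computation used by Park et al.); the signal is synthetic and the function names are hypothetical.

```python
import math
import random


def normalized_autocorr(x, lag):
    """Normalized correlation between x and x shifted by `lag` samples."""
    a, b = x[:-lag], x[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den


def hnr_db(r):
    """HNR in dB from a normalized autocorrelation peak r (0 < r < 1)."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return 10.0 * math.log10(r / (1.0 - r))


if __name__ == "__main__":
    print(round(hnr_db(0.99), 1))  # 20.0 dB: 99% of the energy is periodic
    print(round(hnr_db(0.5), 1))   # 0.0 dB: equal periodic and noise energy
    # Synthetic 200 Hz tone plus uniform noise, sampled at 16 kHz.
    rng = random.Random(0)
    period = 16000 // 200  # 80 samples
    x = [math.sin(2 * math.pi * t / period) + rng.uniform(-0.1, 0.1) for t in range(800)]
    r = normalized_autocorr(x, period)
    print(f"r = {r:.3f}, HNR = {hnr_db(r):.1f} dB")
```

Adding more noise lowers r at the period lag and therefore lowers HNR, which is the direction of change in Park et al.'s added-noise condition.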
Finally, a thorough understanding of the relationship between RFF and underlying laryngeal physiology is needed to interpret RFF values appropriately. That is, to use RFF as a clinical outcome measure, it should be validated as an indicator of laryngeal tension. In our review, few studies investigated the laryngeal factors proposed by Stepp et al. [24]; however, their results provide preliminary evidence that RFF is related to measures of laryngeal stiffness [1,44], aerodynamics [7], and vocal fold adduction parameters [44]. Our review did not delve into the physiological basis of RFF, partly because we limited our search to studies that enrolled human subjects, excluding modeling and theoretical studies. Recent work on modeling voicing offset behaviors [69] suggests a physiological relationship between abduction duration and RFF values; however, these findings should be expanded upon and evaluated empirically in human subjects. Combining computational modeling with human subject research is an important step towards isolating the distinct components associated with intervocalic voicing offset and onset (i.e., laryngeal kinematics, aerodynamics, and tension) and, in turn, may provide useful insights into the clinical utility of RFF for assessing and treating vocal disorders.

Limitations
Although every effort was made to keep this scoping review as current as possible, researchers have continued to publish on RFF. Since the hand search was completed in September 2021, at least five additional studies that may have qualified for this review have been published or are in press [63,70-73]. We acknowledge that all reviews are meant to be updated and expanded upon, and we recommend that these studies be considered in future reviews of RFF.
This scoping review focused on specific topics pertaining to the clinical adoption of RFF in patient populations and the processing methods required for it. Other topics were not discussed extensively here but should be addressed in future summaries. These include (but are not limited to): (i) the impacts of phoneme, stress, and pitch on RFF measures, (ii) comparisons between VCV and continuous speech estimations of RFF values, (iii) the inability to calculate RFF in specific cases (e.g., glottalization, fewer than 10 voicing cycles, no established steady state) and the challenges of missing data in individuals with voice disorders, and (iv) physiological modeling [69] and cross-methodological studies to validate RFF as a measure of tension, aerodynamics, and/or laryngeal kinematics [73]. This last point, in particular, is needed to provide quantitative evidence to support, refute, or update the models proposed by Stepp et al. [24] and Heller Murray et al. [6].

Conclusions
RFF is a relatively new voice measure that has been described in the literature since 1998. At present, its clinical adoption is limited by two primary factors: the availability of automated processing methods and the interpretation of RFF values. Our review showed that automated algorithms have been used in just over half (9 of 17, or 53%) of the studies published since the algorithms became available in 2016. The algorithms have undergone several iterations and have been extended to ambulatory monitoring of voice disorders, a key clinical advancement for daily monitoring and biofeedback for patients. We found evidence for RFF as a potential treatment outcome measure and clinical tracking tool. However, more intervention studies with RFF as a clinical outcome measure are needed to complete a formal meta-analysis and identify changes in RFF that are clinically meaningful and applicable. Finally, investigations into the underlying physiology of RFF are still needed to fully understand how this measure corresponds to the identified factors of tension, aerodynamics, and kinematics.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app12168121/s1, Table S1: Studies focused on automated algorithmic processing of RFF; Table S2: Summary of studies (n = 32) examining RFF in people with and without voice disorders.

M = 2.90
There was only a small effect of cognitive load on RFF offset cycles as a whole and no effect on onset cycles. Only 60% of subjects showed the expected change.
No group differences for onset cycles.
Weerathunge et al. [53] n/a Voice Disorders Auditory-Perceptual Evaluation of Voice [65]; STRAIGHT = Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum; n/a = not applicable; NR = not reported; tx = therapy; sx = surgery; ROC = receiver operating characteristic. a The same subjects were analyzed in the studies by Buckley et al. [2] and Eadie and Stepp [4]. b Values are estimates reported in Stepp [49]. c Offset cycle 1 values were not reported in the study but were obtained directly from the authors. d Values were calculated from information provided in the study, as raw values were not reported. e Values reported in Stepp et al. [24].