Real-Time Visual Feedback in Singing Pedagogy: Current Trends and Future Directions

The technological tools described here can be applied to provide real-time visual


Introduction
Over the last decade, increasing numbers of singing teachers have combined traditional pedagogical tools with kinesthetic and visual feedback [1,2]. Good results have been reported from combining visual feedback with specific exercises in voice lessons [3,4]. Such pedagogical approaches are highly recommended, as they serve the current educational goals of a student-centered learning approach, known to facilitate critical reflective abilities, self-regulation and appraisal skills [5]. These competences are crucial for today's expectations of professional lifelong learners; like other professionals, singers are expected to be able to handle fast-changing technologies, manage highly competitive global working environments and make career choices to maintain employability [6]. Given these expectations, guided awareness is currently a central point in music education; it facilitates a successful transition from a student to a professional musician [7,8]. One means of promoting guided awareness is the use of technological tools that monitor voice production.
The human voice, unlike any other musical instrument, is hidden from the eye [2,9]. Therefore, one may argue that modification of neuromuscular behavior in 'voice building', which pertains to the pedagogical responsibilities of singing teachers, should not be limited to or solely based on personal experience [10-12]. Singing requires motor learning, which is facilitated by a knowledge of processes and procedures [13,14], resulting in two types of responses: (i) knowledge of performance (KP) or, in other words, knowledge of how the body develops and acts; and (ii) knowledge of results (KR), that is, the outcomes associated with a particular bodily action [9,15]. KR is of particular interest for the developing singer, as it leads to the promotion of self-regulation, increased motivation to practice and neuromotor improvement, provided that the offered feedback is objective, positive, instructive, task-orientated and meaningful [9,16].
Real-time visual feedback in singing assists with the acquisition of KR by establishing effective biomechanical behavior based on meaningful visual information [4,16,17]. When combined with verbal instruction, it has been shown to be effective in the training of particular singing skills, such as intonation [16,18]. Providing a direct visualization of the singer's vocal response to a given instruction helps circumvent "critical points" commonly observed in more traditional voice teaching models. In these models, the student needs to wait and process the verbal instruction of the teacher before attempting a subsequent vocal response [16,19].
The increasing number of technological tools that provide real-time visual feedback of the voice justifies a comprehensive overview and discussion of their possible current and future pedagogical applications. Thus, the present article focuses on how to visualize salient aspects of voice production, ordered according to the three subsystems that constitute the vocal apparatus: respiratory, oscillatory and resonatory. The visualization of key components within these subsystems facilitates the connection between perceived voice qualities and their underlying physiological, aerodynamical and acoustical correlates, as illustrated in Figure 1. The ultimate goal is the development of both vocal and expert listening competences in singing students.

With respect to the respiratory subsystem, we will discuss real-time visual feedback of breathing patterns. This is relevant to voice pedagogy because these patterns control lung volume, which affects subglottal pressure (p_sub), i.e., the air pressure in the lungs [18]. Variation in p_sub results in changes in both sound pressure level and spectrum tilt, which determine vocal loudness [20].
With regard to the oscillatory subsystem, technology used to visually monitor variations in the tension, extension and adduction of the vocal folds will be discussed. These are relevant as they determine the number of vocal fold oscillations per second and glottal resistance. These, in turn, determine fundamental frequency (f_o) and phonation type, respectively, which are significant to qualities such as pitch, roughness, vibrato and voice timbre [18].
Finally, concerning the resonatory system, movements of the larynx, jaw, soft palate, lips and tongue are associated with modifications of sound radiation and vocal tract resistance. Acoustically, this leads to various vocal tract transfer functions that differentiate vowels, consonants and voice timbre, characteristics that can be visualized in real-time by spectrographic displays.

Breathing Patterns
Breathing behaviors are crucial in determining voice quality and, therefore, play an important role in voice pedagogy. Terms such as 'appoggio' and 'support' are extensively used, referring to the coordination between laryngeal and breathing events [21,22]. Also, less efficient phonatory habits, such as the habitual use of pressed (hyperfunctional) phonation, can be modified as a consequence of altering breathing patterns [23,24]. However, finding an optimal breathing strategy for a particular song is highly idiosyncratic. Different muscular strategies can be applied depending on the singer's individual characteristics [18]. For example, some singers mainly use the ribcage to vary lung volume, whereas others also recruit the abdominal wall to either assist with changes in the ribcage or stabilize it [25]. After inhalation and just before a phrase starts, singers have been found: (1) to apply a slight contraction of the abdominal wall [26]; or (2) to modify ribcage volume [27].
Visual feedback of transdiaphragmatic pressure can help both trained and untrained singers to direct their attention to the act of contracting the diaphragm [23]; such feedback might contribute to developing an optimal breathing strategy. Breathing patterns during singing have been monitored using magnetometers [28], optoelectronic plethysmography [29] and respiratory inductance plethysmography (RIP systems) [25,30]. RIP systems are relatively easy to manage and allow for the generation of real-time feedback [31]. An example is the RespTrack system (by J. Stark, available at www.columbi.se, accessed on 6 June 2022). It has two elastic belts, one placed around the ribcage and the other around the abdominal wall (Figure 2). These belts are connected to a unit equipped with an on-board AD converter and a reset button. A potentiometer knob allows the balance between the ribcage and abdominal signals to be varied so that they reflect lung volume. The unit has outputs for the volumes of the ribcage (RC) and the abdomen (AB) and for their sum, which, when accurately balanced, corresponds to lung volume (LV). The RespTrack unit can be connected to a portable computer with a USB cable. The RespTrack Recorder software (Columbi Computers AB, Stockholm, Sweden) displays the RC, abdominal wall (AW) and LV signals simultaneously in real-time. To record these signals, data acquisition devices or audio interfaces with direct current-coupled inputs are required; the signals vary slowly and, therefore, cannot be recorded with normal sound cards.
Figure 3 displays audio, LV, RC and AW signals, recorded with a microphone and a RespTrack unit, for a female jazz singer performing a phrase from an aria. The audio signal of the phrase is shown at the top, with the LV, RC and AW signals below. The red box highlights the inhalatory behavior, showing the expansion of AW and RC and the associated increase in LV. Note that the AW is contracting during the first part of the phrase. This breathing behavior is consistent with previous descriptions [26].
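The way a balanced sum of the two belt signals can stand in for lung volume may be illustrated with a minimal Python sketch. It is not the RespTrack implementation: the function name `lung_volume`, the `balance` parameter (a hypothetical stand-in for the potentiometer setting) and the sample values in arbitrary units are all assumptions made for illustration.

```python
def lung_volume(rc, ab, balance=0.5):
    """Weighted sum of ribcage (rc) and abdominal (ab) belt signals.

    `balance` (0..1) is a hypothetical stand-in for the potentiometer
    setting: it is the weight given to the ribcage signal, and the
    abdominal signal receives the remainder.
    """
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must lie between 0 and 1")
    if len(rc) != len(ab):
        raise ValueError("rc and ab must have the same length")
    return [balance * r + (1.0 - balance) * a for r, a in zip(rc, ab)]


# Invented belt signals (arbitrary units) for one inhalation and phrase onset.
rc = [0.0, 0.4, 0.8, 0.6, 0.2]
ab = [0.0, 0.6, 0.4, 0.2, 0.0]
lv = lung_volume(rc, ab, balance=0.5)
```

In this toy example the abdominal signal starts to fall while the ribcage signal is still rising, which is the kind of pattern a combined display makes visible.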

Subglottal Pressure
Breathing patterns influence lung volume, which affects the elasticity of the breathing apparatus and, hence, is significant to p_sub. Along with adduction, extension and tension of the vocal folds, p_sub is also a key physiological parameter for controlling voice quality.
Subglottal pressure is the main tool for controlling sound pressure level (SPL), which is significant to perceived vocal loudness [18,32]. Besides affecting SPL and loudness, p_sub also has a strong influence on vocal fold contact time and closing speed [20,33]. Acoustically speaking, increasing p_sub will decrease the tilt of the voice source spectrum, thus enhancing the higher frequency partials more than the lower ones [18,32,34]. Further, p_sub needs to be increased with increasing f_o. Conversely, f_o is slightly affected by p_sub: an increase in p_sub tends to slightly increase f_o [18]. Thus, singers need to learn to produce the correct pressure for a given pitch and loudness before the tone starts [2]. This pre-planned fine-tuning is essential for vocal control. One way of automatizing this ability is to regularly practice staccato and arpeggio exercises [35]. In addition, exaggerated p_sub may lead to stronger vocal fold collisions, which, when habitual, often produce voice disorders [33].
Subglottal pressure peaks can be estimated from the intraoral pressure during the occlusion of the consonant /p/, as this consonant is produced with an open glottis and a closed mouth, so that the airway from the lungs to the lips is open and the pressure behind the lips approximates p_sub [36].
The intraoral pressure can be captured with a small tube inserted into the corner of the mouth. If attached to a pressure meter, e.g., the PG-100E unit (Glottal Enterprises, Syracuse, NY, USA), it can be visualized while singing syllables /p + vowel/ (Figure 4). Intraoral pressure peaks can be monitored in real-time using an oscilloscope or by means of the RespTrack Recorder software. Providing real-time visual feedback of p_sub can be helpful not only for vocal health, but also for controlling vocal expressiveness. For example, p_sub is crucial to musical phrasing. Figure 5 shows audio and corresponding pressures for the first six bars of the aria "O mio babbino caro" from the opera Gianni Schicchi by G. Puccini. The excerpt was sung by a female soprano, substituting the lyrics with the syllable /pa/. The left red box highlights the first three notes of the first bar. Although they have the same pitch (Ab3), the p_sub peaks increase in value throughout this note sequence. This is required to produce a crescendo. The right red box illustrates another example of the use of p_sub for the purpose of musical phrasing; the highest pressure was not produced for the highest pitch in the phrase (Ab5), which occurs in an unstressed position of the bar, but for the following note (Eb5), which occurs in a stressed position in the bar (the first beat).
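Reading p_sub estimates off an intraoral pressure trace amounts to picking one peak per /p/ occlusion. The following Python sketch shows one simple way this could be done, assuming a sampled pressure trace; the function name `occlusion_peaks`, the threshold `min_peak`, the unit (cm H2O) and the sample values are illustrative assumptions, not part of any of the tools named above.

```python
def occlusion_peaks(pressure, min_peak=2.0):
    """Return (index, value) of the highest sample in each run of
    consecutive samples at or above `min_peak`.

    Each run is taken to be one /p/ occlusion; its maximum serves as
    the p_sub estimate for the following vowel.
    """
    peaks = []
    best = None
    for i, p in enumerate(pressure):
        if p >= min_peak:
            if best is None or p > best[1]:
                best = (i, p)
        elif best is not None:
            peaks.append(best)
            best = None
    if best is not None:  # trace ended during an occlusion
        peaks.append(best)
    return peaks


# Invented trace (assumed unit: cm H2O) for three /pa/ syllables.
trace = [0, 1, 6, 8, 7, 1, 0, 0, 2, 9, 10, 9, 1, 0, 1, 11, 12, 3, 0]
peaks = occlusion_peaks(trace, min_peak=5.0)
rising = all(a[1] < b[1] for a, b in zip(peaks, peaks[1:]))
```

In this invented trace the three peaks rise from syllable to syllable, the pattern the text associates with a crescendo on repeated notes.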

The Oscillatory Subsystem
In addition to p_sub, singers must learn to master vocal fold tension, extension and adduction, as these are crucial to intonation, vocal registers and phonation types. These parameters can be visualized by means of an electroglottograph (EGG), an electrolaryngograph (ELG) and an inverse filter unit. Both EGG and ELG can show vocal fold contact variations in real-time [37,38], while an inverse filter can provide feedback of variations in glottal airflow [33].

Vocal Folds Contact Variations
The electroglottograph is a non-invasive device that has two electrodes, placed one on each side of the thyroid notch. When the vocal folds are in contact, an imperceptible electric current passes between the electrodes. The resulting signal displays vocal fold contact area in terms of a waveform corresponding to the voltage variation across the glottis; it reaches a maximum at full contact and a minimum when the vocal folds are separated (Figure 6) [37].
Real-time visual feedback of EGG shapes can be provided by several software systems, such as SpeechStudio (Laryngograph, UK) and VoceVista Video Pro (Sygyt Software, Bochum, Germany). By visualizing EGG shapes, different degrees of vocal fold adduction can be monitored. For example, the top panel of Figure 7 shows EGG signals and their corresponding derivative (dEGG), as displayed by the VoceVista Video Pro software. This was recorded from a male singer sustaining the vowel /a/ with different degrees of glottal adduction on the same pitch and sound pressure level; the longer the vocal fold contact, the broader and more knee-like the EGG shape. The rightmost bottom EGG shapes illustrate the variation from breathy (hypofunctional) to pressed (hyperfunctional) phonation. Minimal glottal adduction, as used in breathy phonation, results in a waveform with a narrow pulse. In flow phonation, the glottal closure is complete and the pulse is wider. The pulse is still wider for firmer glottal adduction, as it is in neutral and pressed phonation [34,39]. Firm phonation refers to an elevated but not maximal degree of adduction, reflected in a wide pulse that is still narrower than it is for pressed phonation.
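The relation between pulse width and adduction can be illustrated numerically. The Python sketch below computes a first-difference dEGG and a criterion-level contact quotient for one EGG cycle. The 35% criterion level, the function names and the two invented cycles are assumptions for illustration; they are not taken from VoceVista Video Pro or SpeechStudio.

```python
def degg(egg):
    """First difference of the EGG signal (a simple discrete derivative)."""
    return [b - a for a, b in zip(egg, egg[1:])]


def contact_quotient(cycle, level=0.35):
    """Fraction of one EGG cycle spent above `level` of its
    peak-to-peak amplitude (criterion-level method; the 35% default
    is an assumed, commonly quoted choice)."""
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        return 0.0
    threshold = lo + level * (hi - lo)
    return sum(1 for s in cycle if s > threshold) / len(cycle)


# Invented single cycles: narrow pulse (breathy) and wide pulse (pressed).
breathy = [0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0]
pressed = [0, 0.5, 1, 1, 1, 0.9, 0.8, 0.5, 0, 0]

cq_breathy = contact_quotient(breathy)
cq_pressed = contact_quotient(pressed)
steepest_rise = max(degg(breathy))
```

Under these assumptions the wider pulse gives the larger quotient, mirroring the visual impression of a broader EGG shape for firmer adduction.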
Real-time displays of EGG signals for phonation types may be advantageous for a developing singer, for both aesthetic and vocal health reasons. As mentioned, glottal adduction determines phonation types which, in turn, determine voice timbre [20,40]. Moreover, habitual use of pressed (hyperfunctional) phonation may lead to phonotrauma [33]. Finally, in classically trained singing, flow phonation is often considered a baseline phonation type and seems to be associated with a more resonant voice quality [41].
VoceVista Video Pro software also offers a wavegram analysis of EGG (by C. Herbst) [42], showing continuous sequences rather than single EGG periods. The amplitude of the wavegram patterns reflects the pulse width and, thus, glottal adduction. The wavegram can be complemented with a display of the EGG derivative sequence (dEGG wavegram), as well as a narrow-band spectrogram [38]. This combination may help a student learn how to avoid or intentionally produce phonatory events, such as register breaks. Figure 8 shows an example of one voice break during an ascending glissando sung by a male singer. The red arrows highlight f_o variations, while the red box highlights a voice break. The chaotic event in the right part of both EGG and dEGG wavegrams illustrates a loss of vocal fold contact.
More recently, real-time visual feedback of EGG shapes and their related metrics has also become freely available through the software FonaDyn (https://github.com/ElsevierSoftwareX/SOFTX_2019_251, accessed on 6 September 2021) [43]. Apart from clinical and voice research applications, it can also be used to map the entire dynamic and pitch range of a voice in real-time. FonaDyn combines the audio and EGG signals to create voice maps of a number of acoustic and EGG metrics and also statistical clustering of these metrics [43]. Figure 9 shows a voice map of the vocal range of a female singer. One normalized EGG shape is also displayed, representing how the contact quotient is calculated by FonaDyn; this corresponds to the total area of the normalized EGG shape during contact, i.e., the contact quotient by integration (Q_ci) [43]. This EGG metric can be plotted as a voice map. The voice map displays Q_ci at different combinations of f_o (horizontal axis) and SPL (vertical axis). The color scale indicates short contacting in blue and long contacting in red. Contact quotient by integration can be used to assess a student's vocal progress. According to previous research, there is a tendency to increase vocal fold contact time and, hence, diminish acoustical losses with training [44]. It should be recalled, however, that an increase in p_sub also results in an increase in contact time [45], so very long contact times may reflect pressed phonation. Thus, EGG shapes should be interpreted in combination with perceptual evaluations of the corresponding acoustical output.
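The idea of a contact quotient obtained by integration can be sketched in a few lines of Python. The sketch assumes that one EGG cycle is normalized to the range 0..1 in amplitude and to unit length in time, so that the area under the curve equals the mean of the normalized samples. This is an illustrative reading of the verbal definition above, not FonaDyn's actual code; the function name and the sample cycles are hypothetical.

```python
def contact_quotient_by_integration(cycle):
    """Area under one EGG cycle after normalizing amplitude to 0..1
    and time to unit length (rectangle rule: mean of the samples)."""
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        return 0.0
    normalized = [(s - lo) / (hi - lo) for s in cycle]
    return sum(normalized) / len(normalized)


# Invented cycles: short contacting (narrow pulse) vs. long contacting.
short_contact = [0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0]
long_contact = [0, 0.5, 1, 1, 1, 0.9, 0.8, 0.5, 0, 0]

q_short = contact_quotient_by_integration(short_contact)
q_long = contact_quotient_by_integration(long_contact)
```

Because the whole area is used rather than a single threshold crossing, the value changes smoothly as the pulse widens, which suits a color-coded map.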
Voice maps can also display other EGG metrics, such as the normalized peak derivative (Q_Δ) and the index of contacting (I_c) (Figure 10). The former provides information on the speed of the vocal fold contact: the faster the contact, the louder the voice. The latter combines information from both Q_ci and Q_Δ, providing a relative indication of vocal fold collision force. It should be noted that this metric has not yet been completely validated, but it is reasonable to assume that a high I_c (red color in the map) could be related to a high collision impact stress. This is quite important, especially when the target is a more sustainable vocal technique; habitual use of high collision force tends to result in voice disorders [46,47].
Voice maps of EGG metrics using FonaDyn can also visualize phonation types in terms of pre-clustered EGG shapes. These are displayed within the yellow boxes in Figure 11, with the left representing firm (hyperfunctional) and the right representing breathy (hypofunctional) phonation. Singing students can try to model the real-time EGG shape (red boxes) such that it matches the pre-clustered EGG shapes. The position of a single point in the pre-recorded voice map can be changed in real-time according to the phonation type. The grey color corresponds to vocalizations previously made to match other intended clustered shapes rather than the one selected as the current model.
Building a voice map that corresponds to a voice range profile can be time-consuming, requiring singers to phonate over their entire pitch and dynamic range without leaving large 'holes' in the map. Also, staying on the vowel /a/ is recommended, because modifying the vowel can affect SPL. However, the visualization of voice properties over such a wide range of frequencies and intensities can be worthwhile, particularly if the goal is to monitor vocal development. For example, Figure 12 shows voice maps of a female amateur singer.
As expected, EGG voice maps differ between individuals. This is particularly relevant when training a singer; teaching tools should be tailored to the student and not the other way around [48]. For example, the voice maps in Figures 10 and 12, pertaining to a female trained jazz singer and to a female amateur singer, respectively, differ substantially. As compared with the amateur singer, the jazz singer phonated with stronger vocal fold contact over a wider range of frequencies and intensities. This singer reduced vocal fold contact only at higher pitches (approximately above 500 Hz). This can be seen as the green area in the Q_Δ map, which corresponds to a slower vocal fold contact speed, and the light blue area in the I_c map, corresponding to a weaker collision force.
Voice maps can also be used to investigate whether implemented teaching approaches resulted in the intended pedagogical goals. For example, immediate effects of flow ball exercises were analyzed as differences between pre- and post-exercise voice maps. The results showed that the use of flow ball phonation assisted with the development of less pressed phonation and gentler vocal fold collisions [49].
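The construction of a voice map and of a pre/post difference map can be sketched as a simple binning procedure. The Python example below is a minimal illustration under stated assumptions: analysis frames are given as (pitch in semitones, SPL in dB, metric value), cells are one semitone wide and one dB high, and each cell stores the mean of the metric. The function names and all data values are hypothetical and do not reproduce FonaDyn's implementation or the results in [49].

```python
import math
from collections import defaultdict


def build_voice_map(frames):
    """Average a metric per (semitone, dB) cell.

    `frames` is an iterable of (pitch_semitones, spl_db, value).
    The 1 semitone x 1 dB cell size is an assumption of this sketch.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pitch, spl, value in frames:
        cell = (math.floor(pitch + 0.5), math.floor(spl + 0.5))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def map_difference(before, after):
    """Post minus pre, only for cells visited in both recordings."""
    return {cell: after[cell] - before[cell]
            for cell in before.keys() & after.keys()}


# Invented frames: (pitch in MIDI semitones, SPL in dB, Q_ci-like value).
pre = [(60.2, 70.4, 0.50), (59.9, 69.8, 0.60), (64.0, 75.0, 0.55)]
post = [(60.0, 70.0, 0.45), (64.1, 75.2, 0.50), (67.0, 80.0, 0.40)]

pre_map = build_voice_map(pre)
post_map = build_voice_map(post)
change = map_difference(pre_map, post_map)
```

Restricting the difference to cells present in both maps avoids reading 'holes' in either recording as a change in the voice.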

Flow Glottograms
Glottal adduction can be observed by means of a real-time inverse filter unit connected to a pressure transducer in a flow mask. When appropriately tuned, the inverse filter displays the oscillation of glottal airflow, henceforth the flow glottogram (FLOGG), which can be visualized by, for example, an oscilloscope [50,51]. The inverse filter is tuned with frequency knobs that introduce antiresonances and, thus, cancel the effects of the vocal tract resonances on the signal [50,52,53].
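How an antiresonance cancels a vocal tract resonance can be shown with a digital analogue of one tuning knob. The Python sketch below models a single resonance as a two-pole filter and its matching antiresonance as a two-zero filter with the same frequency and bandwidth; applying the second to the output of the first restores the input. The resonance frequency (700 Hz), bandwidth (80 Hz), sampling rate and function names are illustrative assumptions, and a real inverse filter would need one such section per resonance, tuned by hand or by an algorithm.

```python
import math


def _coefficients(freq_hz, bandwidth_hz, fs):
    """Feedback/feedforward coefficients for one resonance."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    a2 = r * r
    return a1, a2


def resonance(x, freq_hz, bandwidth_hz, fs):
    """Two-pole filter: y[n] = x[n] + a1*y[n-1] - a2*y[n-2]."""
    a1, a2 = _coefficients(freq_hz, bandwidth_hz, fs)
    y = []
    for n, s in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(s + a1 * y1 - a2 * y2)
    return y


def antiresonance(y, freq_hz, bandwidth_hz, fs):
    """Two-zero filter: x[n] = y[n] - a1*y[n-1] + a2*y[n-2]."""
    a1, a2 = _coefficients(freq_hz, bandwidth_hz, fs)
    out = []
    for n, s in enumerate(y):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        out.append(s - a1 * y1 + a2 * y2)
    return out


FS = 16000
source = [1.0] + [0.0] * 63            # a single idealized glottal pulse
voiced = resonance(source, 700.0, 80.0, FS)
recovered = antiresonance(voiced, 700.0, 80.0, FS)
max_error = max(abs(a - b) for a, b in zip(source, recovered))
```

If the antiresonance is mistuned, the recovered signal retains a ripple at the resonance frequency, which is what a user watches for when adjusting the knobs.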
An example of a typical FLOGG for a male singer is displayed in Figure 13. The right panel shows the FLOGG [18,54]. Its ascending part corresponds to the airflow increase during the glottal opening and the descending part corresponds to the airflow decrease during glottal closing. The peak amplitude is strongly related to the amplitude of the voice source fundamental, a parameter relevant to voice quality and phonation type [55].
The real-time visual feedback provided by a FLOGG has assisted both trained and untrained singers to achieve specific phonation types [33]. Figure 14 shows examples of audio, FLOGG and EGG signals typical of three phonation types: breathy, flow and pressed. FLOGG and EGG metrics reflect different, but related, aspects of phonation, as shown in this figure [56]. Breathy phonation is produced with a substantial airflow as the vocal folds fail to close completely. Thus, the quasi-closed phase is short. The EGG signal, therefore, has a long non-contact time. Pressed phonation, by contrast, is produced with firm glottal adduction and with a complete glottal closure, resulting in minimal airflow and pulse amplitude and a long closed phase. The EGG reveals a knee-like shape in its descending part. For flow phonation, the vocal folds allow generous airflow, as shown by the large pulse amplitude, combined with complete glottal closure. The EGG lacks the sharp knee of pressed phonation.
As mentioned above, changes in lung volume are associated with changes in tracheal pull, which, in turn, tends to affect glottal adduction, a crucial parameter for phonation type [18]. From a pedagogical point of view, it should be mentioned that a student singer may be advised to avoid a pressed (hyperfunctional) voice at high pitches by producing them at high lung volumes, i.e., after a deep inhalation, or by practicing exercises with a descending melodic pitch direction. The opposite will apply for students tending to produce a breathy phonation at higher pitches [35].

The Resonatory System
Among available real-time visual feedback tools with pedagogical applications, those concerning spectrographic displays are not unknown to singing teachers [4,9,17,57-62]. A commonly used method for obtaining such displays is the fast Fourier transform, which calculates the spectrum of any periodic or non-periodic signal [63]. Currently, there are several freeware recording programs that can be used to display spectrograms and spectra (see, for example, RTSect, by S. Granqvist, available at www.tolvan.com, Wavesurfer, available at www.speech.kth.se/wavesurfer/ or Praat, available at https://www.fon.hum.uva.nl/praat/, accessed on 20 February 2020). As teaching tools for singing, spectrographic displays should preferably be used with a condenser omnidirectional microphone [64] and an external sound card (for more information on this equipment, please visit the "EVTA zoom panel 2: equipment for Online Teaching", available at https://www.youtube.com/watch?v=mNxRyyMVUw4, accessed on 20 February 2020).

Spectrographic Displays
There are two types of spectrographic displays: spectrogram and spectrum. The underlying analysis can be performed with a narrow- or wide-band filter. The former provides information on individual spectral components and the latter on spectral envelope peaks and valleys.
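The difference between narrow- and wide-band analysis follows from the length of the analysis window: the longer the window, the finer the frequency resolution. The Python sketch below computes a plain discrete Fourier transform of a synthetic tone with three harmonic partials. The sampling rate, f_o, partial amplitudes and window lengths are illustrative assumptions, and the function names are hypothetical; real spectrographs use the fast Fourier transform and tapered windows.

```python
import math


def magnitude_spectrum(samples, fs):
    """Return (frequency_hz, magnitude) pairs from a plain DFT."""
    n = len(samples)
    pairs = []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n)
                 for i, s in enumerate(samples))
        pairs.append((k * fs / n, math.hypot(re, im)))
    return pairs


def bin_spacing_hz(fs, window_samples):
    """Spacing between adjacent analysis frequencies."""
    return fs / window_samples


FS = 4000                      # assumed sampling rate (Hz)
F_O = 200.0                    # assumed fundamental frequency (Hz)
AMPLITUDES = [1.0, 0.5, 0.25]  # invented partial amplitudes


def tone(n_samples):
    """Synthetic tone with partials at F_O, 2*F_O and 3*F_O."""
    return [sum(a * math.sin(2 * math.pi * (h + 1) * F_O * i / FS)
                for h, a in enumerate(AMPLITUDES))
            for i in range(n_samples)]


# Narrow band: 100 ms window, 10 Hz spacing, partials are resolved.
narrow = magnitude_spectrum(tone(400), FS)
partials = [f for f, m in narrow if m > 10.0]

# Wide band: 2.5 ms window, spacing wider than F_O, partials merge.
narrow_spacing = bin_spacing_hz(FS, 400)
wide_spacing = bin_spacing_hz(FS, 10)
```

With the long window each partial occupies its own analysis frequency, whereas with the short window the spacing exceeds f_o, so only the overall envelope can be represented.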
Figure 15 provides examples of narrow-band analyses of the word yes, spoken with an ascending pitch. The left panel shows the spectrogram, where the vertical axis represents frequency and the horizontal axis shows time, with grey scale representing amplitude [4,9]. This figure visualizes the changing spectrum peaks, i.e., the formant pattern. The right panel shows the spectrum for a single moment in time (a few milliseconds) of the vowel /ε/. Here, frequency runs along the horizontal axis and intensity along the vertical. The individual harmonic partials are displayed as vertical spikes.
Figure 16 visualizes a wide-band analysis of the same utterance. In the spectrogram (left panel), individual partials cannot be seen, but rather the formant peaks in the spectrum envelope. The first formant starts at a low frequency for the vowel /i/ in yes and rises to a high frequency in the vowel /ε/. At the same time, the second formant starts at a high frequency and changes to a low frequency. The spectrum (right panel) shows no individual harmonic partials, but rather spectrum envelope peaks at a given moment of the /ε/ vowel.
Both spectrograms and spectra have been applied in the voice studio for visualizing, for example, voice onset, i.e., the manner in which the voice is initiated [4]. An onset can be breathy if the increase of p_sub precedes glottal adduction and vocal fold vibration. In a hard onset, adduction occurs first, followed by a rise in p_sub and, hence, vocal fold vibration. The staccato onset is typically produced with a simultaneous start of glottal adduction and p_sub rise and, hence, also of vocal fold vibration [8]. In addition, expressive elements of singing, such as legato, vibrato, ornamentations and intonation, can also be displayed by real-time spectrographs [17,65-67]. Moreover, the synchronization of singing and piano accompaniment can be visualized and, thus, improved [4]. Finally, several singing teachers use spectrographs to fine-tune vocal tract resonances in accordance with the aesthetic demands of the music style [9,16,57,58,60-62].
Spectrograms are also quite useful for visualizing register breaks, phonation types, intentional voice distortions and intonation. The visualization of such events can be relevant to voice pedagogy for three main reasons. First, it can assist with the training of intentional voice breaks for expressive purposes, e.g., in yodeling. Second, as register breaks normally occur at specific pitches, depending on voice classification, visualizing spectrograms may help the teacher to classify a voice [68]. Third, it can aid the learning of vowel modification for equalizing voice timbre across the whole vocal range [69].

Voice Breaks
Figure 17 provides VoceVista Video Pro examples of pitch glides sung by a male singer. On the left, a case of a voice break during the ascending part of the glissando is shown, clearly observed as a significant discontinuity in all harmonic partials. As the singer learns the required adjustments to reduce voice breaks (middle), the register discontinuity is less marked. When the singer has learnt to successfully avoid a register break (right), the harmonic partials show an even change.
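Since all harmonic partials are multiples of f_o, a discontinuity in the partials corresponds to a jump in the f_o contour. The Python sketch below flags such jumps between consecutive analysis frames. The function name, the 2-semitone threshold and the f_o values are assumptions chosen for illustration; this is not a feature of VoceVista Video Pro.

```python
import math


def find_breaks(f0_contour_hz, max_jump_semitones=2.0):
    """Return (frame_index, jump_in_semitones) wherever f_o changes by
    more than `max_jump_semitones` from one frame to the next.

    Frames with f_o <= 0 are treated as unvoiced and skipped.
    """
    breaks = []
    for i in range(1, len(f0_contour_hz)):
        prev, cur = f0_contour_hz[i - 1], f0_contour_hz[i]
        if prev <= 0 or cur <= 0:
            continue
        jump = 12.0 * math.log2(cur / prev)
        if abs(jump) > max_jump_semitones:
            breaks.append((i, jump))
    return breaks


# Invented ascending glide with one abrupt jump (231 Hz to 300 Hz).
glide_with_break = [200, 210, 220, 231, 300, 315, 330]
smooth_glide = [200, 210, 220, 231, 243, 255, 268]

breaks = find_breaks(glide_with_break)
no_breaks = find_breaks(smooth_glide)
```

A smooth glide produces no flagged frames, corresponding to the even change of the partials in a break-free glissando.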

Phonation Types
Spectrographic displays can also be used to visualize phonation types. However, it should be mentioned that they will often show effects of both glottal and resonance events. For clearer displays of phonation type, EGG, ELG and FLOGG will always be preferable.
Spectrograms cannot visualize vocal tract resonances per se, but only the resulting spectrum peaks; partials with higher amplitudes appear near or at vocal tract resonances [70,71]. Figure 18 provides an example of narrow-band spectrograms of the vowel /a/ sung by a male singer, demonstrating different degrees of glottal adduction. The color scale indicates the amplitudes of the individual harmonic partials, with red representing higher and blue lower amplitudes. Changing from breathy to pressed phonation is associated with a decrease in the intensity of the first partial. Simultaneously, the partials above 2 kHz increase in amplitude.
A caveat regarding real-time spectrographic displays as visual feedback is that the resulting display heavily depends on the choice and placement of the microphone. For example, dynamic microphones (also called "stage microphones") enhance certain frequency ranges and reduce others. Moreover, their proximity effect heavily enhances low frequencies when placed near the lip opening. To circumvent these drawbacks, the same microphone should be used for all lessons, preferably an omnidirectional one, always placed at the same distance and position from the student's mouth.

Intentional Voice Distortion
Intentional voice distortions (IVD) are used quite commonly for aesthetic and expressive purposes in several music genres and can be produced without damaging the voice [59]. They result from aperiodic vocal fold vibrations, vibration modulations or even the vibration of laryngeal or supralaryngeal structures.
IVD can be visualized by spectrograms [59,72]. Figure 19 shows examples of four different types: (1) vocal fold vibration produced in the absence of vocal fold contact; (2) vocal fold vibrations that produce two independent frequencies (red filled arrows), as well as inharmonic partials (empty red arrows); (3) periodic vocal fold vibrations that produce harmonic (filled red arrows), as well as inharmonic partials (empty red arrows) shaped by simultaneous vibrations of supralaryngeal structures; and (4) minimal vocal fold vibrations (red filled arrows) combined with disturbances produced by chaotic vibrations of supraglottal structures.
Pedagogical applications of spectrographic displays of IVD include: (i) improving the understanding of a given adjustment, e.g., the presence or absence of periodic vocal fold vibration; (ii) enhancing the stability and quality of the IVD; and (iii) facilitating the knowledge transfer from exercise outcomes to songs [59].
[Figure 19 caption, partially recovered: (3) subharmonic phonation with a predominance of harmonic components; (4) subharmonic phonation with a predominance of noise components (adapted from [59] with authors' permission). Red filled arrows correspond to independent frequencies; red empty arrows indicate inharmonic partials.]

Intonation
Nowadays, there are many software and cell phone applications that display the f_o contour in real-time and thus keep a visual track of pitches [57,58]. Tracking the f_o contour is relevant in voice pedagogy, as intonation is used as an expressive tool [73,74]. Most of these applications quantify f_o in semitones, thus disregarding finer pitch effects. This excludes the possibility of monitoring microtonal effects, which are used in many singing styles [74-76]. The VoceVista Video Pro software, by contrast, can also visualize micro-intonation effects; Figure 20 shows an example of this. Each semitone is marked by a white line and the f_o contour by the blue curve. The fifth tone (red rectangle) is centered between the pitches C4 and C#4.
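What is lost when f_o is quantized to semitones can be made explicit by also reporting the deviation in cents. The Python sketch below converts an f_o value to the nearest equal-tempered note and its deviation from that note. The reference tuning of A4 = 440 Hz, the function name and the example frequency are assumptions made for illustration, not settings of any of the applications mentioned.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(f0_hz, a4_hz=440.0):
    """Return (note_name, deviation_in_cents) for a given f_o.

    Uses MIDI note numbers (A4 = 69) in equal temperament; the
    reference `a4_hz` is an assumed tuning standard.
    """
    midi = 69.0 + 12.0 * math.log2(f0_hz / a4_hz)
    nearest = math.floor(midi + 0.5)
    cents = 100.0 * (midi - nearest)
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents


# A tone sung 40 cents above C4 (hypothetical value, about 267.7 Hz).
f0 = 440.0 * 2.0 ** ((60.4 - 69.0) / 12.0)
name, cents = nearest_note(f0)
```

A semitone-only display would show this tone simply as C4, while the cents value reveals that it lies well on the way towards C#4.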

Conclusions
The current article aimed at a comprehensive review of technological tools with pedagogical applications for the teaching of singing students.The benefits of the real-time visualization of physiological, aerodynamic and acoustical aspects of voice production in singing were discussed, especially concerning knowledge of results.As in physiological voice therapy programs, teaching singing can strive for the achievement of an aesthetic acoustical output grounded in the balance between the three subsystems of voice production-respiratory, oscillatory and resonatory-which is a goal clearly facilitated by their visualization in real-time.The voice is a unique musical instrument, hidden to the naked eye.Therefore, teaching singing requires pedagogical approaches, among which guided awareness enhanced by meaningful feedback constitutes an example.
The RespTrack system allows for the guided awareness of breathing patterns, which is crucial for controlling lung volume and, thus, the elasticity of the breathing apparatus and p sub. This physiological parameter controls SPL, a significant acoustical parameter for perceived vocal loudness. Also in relation to the respiratory system, the use of p sub meters in singing lessons was discussed for the purposes of monitoring behaviors related to vocal health and musical phrasing.
As for the oscillatory subsystem, visualizing the vibratory patterns of the vocal folds helps singers master vocal fold tension, extension and adduction at different degrees of vocal loudness. Real-time visual feedback can be provided by means of an electroglottograph, an electrolaryngograph or an inverse filter. Displays using these technologies allow for the visualization of phonation types and an understanding of the associated voice qualities. The recent FonaDyn software also allows the display of voice maps, which is crucial for monitoring vocal development and controlling glottal adduction at different pitches and intensities.
Real-time visual feedback on adjustments of the resonatory system can be provided by spectrographic displays. Several software packages can be used for this purpose, showing details of voice onset, intonation, legato, vibrato, ornamentations, synchronization between singing and piano accompaniment and formant tuning or de-tuning, according to the aesthetic requirements of the singing style. Voice breaks, phonation types and intentional voice distortions are other parameters relevant to voice quality and expressiveness in singing that can be displayed in real time using spectrographic displays.
Singing teachers have used spectrographic displays since the 1970s; such displays have become more accessible, are easy to handle and can be complemented by real-time visual feedback on breathing behaviors and vibratory patterns of the vocal folds during singing. We believe that, in the near future, the use of technology in singing lessons can become a gold standard for the development of effective pedagogical approaches. Singing pedagogy is already moving towards the development of science-based practices. Like any other musical instrument, the voice may be impaired by misuse; however, unlike other instruments, the voice cannot be replaced if 'ruined'. It is a teacher's pedagogical responsibility to guide the student in achieving individual vocal homeostasis while fulfilling artistic expectations. This guidance greatly benefits from real-time visual feedback displays such as the ones reviewed here.

Figure 1. Schematic representation of the subsystems that constitute the vocal apparatus and underlying physiological, aerodynamical, acoustical and perceptual correlates.

Figure 4. (1) Subglottal pressures for individual notes of an arpeggio, recorded by a PG-100E unit (Glottal Enterprises, USA) and displayed by the software Sopran. (2) Plastic tube placed in the corner of the mouth. (3) PG-100E pressure meter unit (Glottal Enterprises, New York, NY, USA).

Figure 5. Recording of the first 6 bars of the aria "O mio babbino caro", from the opera Gianni Schicchi by G. Puccini, sung by a soprano on the syllable /pa/. Intraoral pressures were recorded with the PG-100E unit and audio with an omnidirectional microphone. Both signals are displayed with the software Sopran. Red boxes highlight relevant pressure events related to musical phrasing (see text).

Figure 6. Electroglottograph signal for three vocal fold vibratory cycles recorded by an electrolaryngograph, here displayed by the software Sopran. The sudden voltage changes (red arrows) represent the initiation of vocal fold contact.

Figure 7. Electroglottograph (EGG) shapes displayed by the VoceVista Video Pro software for a single vibratory cycle of a male singer sustaining the vowel /a/ with the indicated phonation types. The top panel shows EGG shapes with their corresponding derivative (dEGG) for the indicated phonation types. In the middle and lower panels, the waveforms of the indicated phonation types are shown in greater detail, with normalized amplitudes and differentiated by colored lines for comparison.

Figure 8. VoceVista Video Pro display of an ascending glissando sung by a male singer. Top panel: EGG wavegram. Middle panel: dEGG wavegram. Bottom panel: narrow-band spectrogram. The red box highlights a voice break and the red arrow highlights f o variations.

Figure 9. Typical voice map displayed in FonaDyn, showing sound pressure level (SPL) as a function of fundamental frequency (f o ) for the metric of contact quotient by integration (Q ci ): (1) Electroglottograph (EGG) shape showing the calculation of Q ci ; (2) Real-time voice map of a female singer.

Figure 10. Real-time voice maps of a female singer's voice displayed by FonaDyn software. (Left panel): normalized peak derivative (Q ∆ ), with red and green indicating high and low values, respectively. (Right panel): index of contact (I c ), with red and blue indicating high and low values, respectively.

Figure 11. Examples of voice maps displayed by FonaDyn software for (1) firm and (2) breathy phonation, produced by a male singer while observing the EGG waveshape (red boxes) in real time and trying to match the EGG waveshape model of the intended phonation type (yellow boxes). The result is also presented in real time as an individual cell in the voice map (red arrows). Note: SPL, sound pressure level; f o , fundamental frequency.
Figure 12 shows voice maps of Q ∆ and I c metrics for an amateur female singer. As shown in both maps, this singer has two main vibratory patterns, depending on f o and SPL. Above approximately 300 Hz, phonation is mainly achieved by reducing vocal fold adduction, as shown by a clear color change from yellow to green in the Q ∆ map (left panel) and from green to light blue in the I c map (right panel).

Figure 12. Voice maps of a female amateur singer displayed by FonaDyn software. (Left panel): normalized peak derivative (Q ∆ ), with red and green indicating high and low values, respectively. (Right panel): index of contact (I c ), with red and blue indicating high and low values, respectively.

Figure 13. Left panel: (1) a flow mask with (2) a pressure transducer. Its output can be connected to (3) an inverse filter unit and the resulting (4) FLOGG can be displayed on an oscilloscope.

Figure 14. Examples of three phonation types (breathy, flow and pressed) and corresponding (A) audio signals, (B) flow glottograms (FLOGG) and (C) EGG shapes, displayed by the software Sopran, recorded with an omnidirectional microphone, a flow mask and an electrolaryngograph. The shapes of both FLOGG and EGG signals reflect different types of phonation.

Figure 15. Spectrographic display of the word "yes", pronounced in an ascending pitch, displayed by the software Praat (by P. Boersma and D. Weenink) as a narrow-band spectrogram (left) and a narrow-band spectrum (right).

Figure 16. Spectrographic display of the word "yes", pronounced in an ascending pitch, displayed by the software Praat as a narrow-band spectrogram (left) and a narrow-band spectrum (right).

Figure 17. VoceVista Video Pro narrow-band spectrograms of ascending and descending glissandi sung by a male singer (1) with a voice break (red arrow), (2) with a minor instability (red arrow) and (3) with a smooth transition between registers.

Figure 18. Narrow-band spectrograms of different degrees of glottal adduction displayed by Voce Vista Pro software. Glottal adduction increases from left (breathy) to right (pressed).

Figure 20. VoceVista Video Pro software display of a male singer practicing micro-intonation between pitches C4 and C#4, marked within the red box.