Analysis of the Fundamental Frequency F0 of Oesophageal Speech in Patients Following Total Laryngectomy Surgery

Tyburek, Krzysztof

doi:10.3390/app15084402


Faculty of Computer Science, Kazimierz Wielki University, 85-064 Bydgoszcz, Poland

Appl. Sci. 2025, 15(8), 4402; https://doi.org/10.3390/app15084402

Submission received: 19 February 2025 / Revised: 9 April 2025 / Accepted: 15 April 2025 / Published: 16 April 2025


Abstract

The aim of this article is to analyse the fundamental frequency (F0) of oesophageal speech (ES) and to compare the results with the physiological speech of healthy people. The research focused on spectrogram analysis within a frequency range appropriate for both people following total laryngectomy and healthy people; a range of 50 Hz to 200 Hz was therefore adopted. F0 was determined by segmenting the speech signal with a moving time window. As a result, an F0 vector was obtained for each tested word, the length of which depends on the number of frames. The obtained set of fundamental frequencies (pitch listing) was analysed statistically to determine the mean, minimum, maximum, median, and standard deviation of F0. Voice samples were taken from 12 people aged between 30 and 70 following total laryngectomy. In accordance with the rehabilitation process, the Polish words for “barrel”, “bread roll”, “egg”, “package”, and “snow” were analysed, each as a separate pattern.

Keywords:

oesophageal speech; laryngectomy; speech analysis; fundamental frequency; speech rehabilitation

1. Introduction

Following laryngeal cancer and total laryngectomy, patients lose the ability to communicate using spoken language. Total laryngectomy involves removing the larynx (which contains the vocal cords) and separating the respiratory tract from the nasal cavity and mouth. The organ system of a healthy and of a laryngectomised person, and hence the consequences of total laryngectomy, are shown in Figure 1.

Total laryngectomy is a life-saving procedure, performed only when the cancer is so advanced that alternative methods of treatment cannot be used. However, the post-operative mutilation of the patient is significant, and effective rehabilitation aimed at developing substitute speech is a priority for these people. There are three methods of speech rehabilitation:

tracheoesophageal speech (TE);
electrolarynx (EL);
oesophageal speech (ES).

TE is a surgical and prosthetic procedure involving the implantation of a voice prosthesis, most often during laryngectomy surgery, although the prosthesis can also be implanted after oncological treatment has been completed. In this procedure, a fistula is created between the trachea and the oesophagus, allowing patients to produce tracheoesophageal phonation with pulmonary air: when the tracheostomy opening is closed with a finger, air exhaled from the lungs is directed to the oesophagus and hypopharynx [1].

Figure 1. Organ system in a healthy and a laryngectomised person. Visible tracheostomy [2].

EL is a portable, battery-powered device that is pressed against the tissues of the neck to introduce electromechanical energy into the vocal tract [3]. Although EL is easier to use, it has limitations that make it difficult to generate phonetic information such as accent and intonation, and EL speech sounds like machine speech.

ES is one of the major methods of speech rehabilitation. It is a learned technique in which air is swallowed (“injected”) into the oesophagus and then allowed to escape through the pharynx. Appropriate tensing of the pharyngeal walls during expiratory airflow causes them to vibrate, which creates sound. Oesophageal speech typically has a lower pitch than normal speech because of the characteristics of the pharyngeal wall. The advantages of this technique are that it requires no surgical procedure, prosthesis, or cumbersome hand motions [4,5]. Sentences spoken with ES take longer than in physiological speech because the patient has to replenish the swallowed air. ES is one of the most popular voice rehabilitation methods, and the sounds it produces are closer to the laryngeal voice than speech generated by TE and EL. However, depending on the individual predispositions of the patient, intelligibility and quality may be worse than in laryngeal speech. Reduced oesophageal speech quality results from many factors, including low intensity, low fundamental frequency (F0), and noise generated by the production mechanism [6,7]. The advantages of speech rehabilitation by ES include:

Non-surgical method;
Hands-free talking;
Closest to physiological speech;
No need to implant a foreign body.

Disadvantages of oesophageal speech:

Learning takes a lot of time and must be intensive;
Not all people are able to master this method well;
Speech may be incomprehensible;
Speech is limited to short sentences at a slower pace, because air has to be swallowed while speaking [8,9].

Speech signals can be analysed in the time domain, the spectrum domain, the cepstrum domain, and with other approaches proposed by researchers. The MPEG-7 standard [10] provides descriptors for identifying physical features of speech. In [11], the authors present the possibilities of speech parametrisation using popular time-domain and spectrum-domain descriptors, whose high efficiency in recognising the general speech signal was also demonstrated in [8,12,13].

The group of these popular MPEG-7 database descriptors includes, among others,

ZCR (zero-crossing rate)—a measurement used to determine the rate of zero crossings (crossing of the OX axis). This is determined as the percentage of audio samples in a given fragment that change sign. The ZCR is defined by the following equation [14]:

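$Z(i) = \frac{1}{2 W_{L}} \sum_{n = 1}^{W_{L}} \left| \operatorname{sgn}[x_{i}(n)] - \operatorname{sgn}[x_{i}(n - 1)] \right|$

(1)

where $x_{i}(n)$ is the nth sample of the ith frame, $W_{L}$ is the frame length in samples, and sgn(·) is the sign function, i.e.,

$\operatorname{sgn}[x_{i}(n)] = \begin{cases} 1, & x_{i}(n) \geq 0, \\ -1, & x_{i}(n) < 0. \end{cases}$

(2)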
Short-time energy (STE) is an audio descriptor from the MPEG-7 standard also used in speech classification. It describes the envelope of the signal. STE is the sum of squares computed in the time domain over the length of the test frame of the signal. The STE is expressed by the formula [14]:

$\mathrm{STE} = \sum_{n = 1}^{N} x^{2}(n)$

(3)

where x(n) is the value of the nth sample, n is the sample index, and N is the signal length (the total number of samples in the processing window).
The signal mean value (SMV) descriptor expresses the average value of the input speech signal. Its value is estimated in the tested frame of the audio signal. It is calculated by summing the values of all samples and dividing by N. The SMV is given by [14]:

$\mathrm{SMV} = \frac{1}{N} \sum_{n = 1}^{N} x(n)$

(4)
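
As a minimal sketch of Equations (1)–(4), the three descriptors can be computed for a single frame in plain Python. The function names and the `prev_sample` parameter are illustrative assumptions and are not taken from the cited works; when the sample preceding the frame is not available, the first term of the ZCR sum is taken as zero.

```python
def sgn(x):
    """Sign function of Equation (2): 1 for x >= 0, otherwise -1."""
    return 1 if x >= 0 else -1


def zero_crossing_rate(frame, prev_sample=None):
    """ZCR of Equation (1) for one frame of W_L = len(frame) samples.

    prev_sample stands in for x_i(0), the sample just before the frame.
    Assumption: if it is not supplied, the first sample is reused, so the
    first term of the sum contributes nothing.
    """
    if not frame:
        raise ValueError("frame must contain at least one sample")
    previous = frame[0] if prev_sample is None else prev_sample
    total = 0
    for sample in frame:
        total += abs(sgn(sample) - sgn(previous))
        previous = sample
    return total / (2 * len(frame))


def short_time_energy(frame):
    """STE of Equation (3): sum of squared samples in the frame."""
    return sum(sample * sample for sample in frame)


def signal_mean_value(frame):
    """SMV of Equation (4): arithmetic mean of the samples in the frame."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    return sum(frame) / len(frame)


if __name__ == "__main__":
    frame = [0.5, -0.5, 0.5, -0.5]  # toy frame that changes sign at every sample
    print(zero_crossing_rate(frame), short_time_energy(frame), signal_mean_value(frame))
```

In practice these functions would be applied to each frame of a recording in turn, giving one descriptor value per frame.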

The possibilities of ES parametrisation in the time domain and the spectrum domain using the above descriptors are also presented in [12]. These studies have shown the effectiveness of the descriptors for describing ES. Speech parametrisation can also be carried out in the cepstrum domain, which is mainly used to describe the laryngeal tone. In [15], the authors show that automatic speech recognition (ASR) performance on noisy speech can be improved by working in the cepstrum domain. Cepstrum analysis and MFCC are also used to study ES, as presented at length in [8,12], which demonstrate the effectiveness of cepstrum analysis in recognising differences between ES and physiological speech.

One of the most important parameters characterising the source of voiced speech is the fundamental frequency of the laryngeal tone (denoted F0), i.e., the fundamental frequency of the vocal fold vibrations, which is primarily a function of the mass, elasticity (stiffness), and the tension coefficient of the vocal cords [16]. The value of F0 depends, among other things, on the gender, age, and emotional state of the speaker. For men, F0 takes values in the 80–480 Hz range, and for women, in the 160–960 Hz range (speech and singing are both taken into account). It should also be noted that F0 is usually not stationary but changes constantly within a sentence or spoken word. As mentioned above, ES is characterised by a lower sound frequency than natural speech due to the characteristics of the pharyngeal wall, and this also applies to F0. This article presents an in-depth analysis of the F0 of ES in people following total laryngectomy. The obtained results were compared with natural speech. For the purposes of the study, a common frequency range was found for laryngectomised and healthy people, and the same spoken words (in Polish) were analysed, each as a separate pattern.

The author aims to find significant differences between the F0 of oesophageal speech and that of natural speech using words that are used during the rehabilitation of patients following total laryngectomy. This research continues the ES analysis published earlier, but this time focuses on a narrow aspect of speech, namely the fundamental frequency F0. This study was approved by the bioethics committee (KB 178/2020).

2. Materials and Methods

The study group consisted of six people who had undergone total laryngectomy: three men aged 30 to 70 and three women aged 30 to 60. These patients were undergoing speech rehabilitation to learn ES and had mastered oesophageal speech at a communicative level. The words studied were chosen from those used by therapists in speech rehabilitation, and the selection criteria were left to the speech therapists. All words were spoken in Polish. The selection reflects the characteristic features of the Polish language and is related to articulation. Five words were selected for the study, which are presented in Table 1.

The words studied are the basis for the rehabilitation of fluent speech, adapted to the specifics of the Polish language. Of particular importance is the articulation of the Polish “cz” (/t͡ʂ/) and “ś” (/ɕ/), which requires a specific positioning of the lips and tongue; both sounds occur in the tested words. The same is true of the Polish “buł” (/bu͜w/), which is part of the test word “bułka” (Eng. a bread roll). The tested words were also recorded from healthy people, and the numerical F0 results were used as a reference point to assess the differences between oesophageal and physiological speech. The words to be analysed were recorded at the Bydgoszcz Laryngectomy Association (Bydgoszcz, Poland) in a specially prepared room, using an OMNITRONIC IM-1000 PRO condenser microphone (Steinigke Showtechnik GmbH, Waldbüttelbrunn, Germany). Details of the microphone are as follows:

XLR output
Frequency range: 60–18,000 Hz
Directivity pattern: Cardioid
Impedance: 400 ohms
Sensitivity: −70 dB

All speech samples were recorded in WAV format with a sampling rate of 44,100 Hz and 16 bits/sample. The mean durations of the studied words for healthy and laryngectomised people are presented in Table 2.
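
As an illustration of how a recording in this format might be loaded and its duration measured, the following sketch uses only the Python standard library. It assumes a mono, 16-bit PCM file; the channel count is not stated in the text, and the function name `read_wav_mono16` is hypothetical.

```python
import struct
import wave


def read_wav_mono16(path):
    """Read a mono 16-bit PCM WAV file.

    Returns (sampling rate in Hz, samples scaled to [-1, 1), duration in s).
    Mono input is an assumption made for this sketch.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a mono, 16-bit PCM WAV file")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2  # two bytes per 16-bit sample
    # WAV PCM data is little-endian signed 16-bit.
    samples = [value / 32768.0 for value in struct.unpack("<%dh" % count, raw)]
    duration = count / sample_rate
    return sample_rate, samples, duration
```

The duration returned here is simply the number of samples divided by the sampling rate, which is the quantity that would be averaged per word and per group to build a table of mean durations.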

The research being carried out is a continuation of the analysis of oesophageal speech. In previous research, a broad analysis of the time domain, frequency domain, and cepstrum was carried out, and the results were presented in paper [12]. In addition, an analysis of ES was carried out using division of the studied words into phonetic segments, and the results of the research were presented in paper [8]. The author’s goal is a broad numerical analysis of oesophageal speech, the synthesis of which will lead to the definition of a precise description of the physical features of ES. The generated ES feature vector will be based on the results of the research included in the publications [8,12] and the results of this research. According to the author, the analysis of oesophageal speech is not yet complete and will be continued, with the results published in subsequent papers.

The Octave programming environment was used for signal pre-processing, such as cutting out silence and removing noise and artifacts. All work related to obtaining information about F0 was performed using the Praat program v.6.4.13. The pitch information was taken from the “Pitch” menu option of Praat. One of the key options is the pitch range setting, which sets the frequency range analysed for a word. An important issue in the research was determining the common frequency range for ES and physiological speech, i.e., the range in which frequency components are present for both laryngectomised patients and healthy people.

In physiological speech, the average male voice falls within the frequency range of 50 Hz to 600 Hz, and the average female voice within 100 Hz to 800 Hz. In children, the upper limit can reach 2000 Hz, especially while shouting during play among peers. These cases differ significantly from oesophageal speech (ES), which at the low end of the frequency range may take on the character of a “creaky” voice and is much lower than physiological speech. In addition, the speech of laryngectomised women is not necessarily higher than that of laryngectomised men, whereas among healthy people it very commonly is. Figure 2 shows the F0 of oesophageal speech (ES) for selected words and selected people following laryngectomy surgery. The figure shows that the F0 values within individual words are very similar regardless of the gender of the speaker.

In further stages of the research, the frequency analysis range from 50 Hz to 200 Hz was established, taking into account the following settings of the Praat program:

Pre-processing: attenuation at ceiling = 0.03
Finding a path:
- Silence threshold = 0.09
- Voicing threshold = 0.50
- Octave cost = 0.055
- Octave-jump cost = 0.35
- Voiced/unvoiced cost = 0.14
Time step: fixed time step = 0.01 s
The filtered autocorrelation analysis method was chosen.
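
To illustrate the general idea of frame-wise F0 estimation within the 50–200 Hz range and with a 0.01 s time step, the sketch below uses a plain normalised autocorrelation. It is not Praat's filtered autocorrelation method and does not model the path-finding costs listed above; the 0.06 s frame length, the `min_strength` threshold, and all function names are assumptions made for this example.

```python
import math


def estimate_f0(frame, fs, floor_hz=50.0, ceiling_hz=200.0, min_strength=0.5):
    """Return an F0 estimate in Hz for one frame, or None if it looks unvoiced.

    Plain normalised autocorrelation over the lags that correspond to
    floor_hz..ceiling_hz. min_strength is a hypothetical voicing threshold.
    """
    min_lag = int(fs / ceiling_hz)
    max_lag = int(fs / floor_hz)
    m = len(frame) - max_lag  # number of products summed for every lag
    if m <= 0:
        raise ValueError("frame is too short for the requested pitch floor")
    head = frame[:m]
    e0 = sum(x * x for x in head)
    if e0 == 0:
        return None  # silent frame
    best_lag, best_rho = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        seg = frame[lag:lag + m]
        e_lag = sum(x * x for x in seg)
        if e_lag == 0:
            continue
        rho = sum(a * b for a, b in zip(head, seg)) / math.sqrt(e0 * e_lag)
        # The small margin keeps the shortest lag when peaks tie, which
        # avoids reporting a multiple of the true period.
        if rho > best_rho + 1e-9:
            best_lag, best_rho = lag, rho
    if best_lag is None or best_rho < min_strength:
        return None
    return fs / best_lag


def pitch_listing(signal, fs, step_s=0.01, frame_s=0.06, **kwargs):
    """Slide a frame over the signal and collect the F0 of the voiced frames."""
    step = int(round(step_s * fs))
    size = int(round(frame_s * fs))
    listing = []
    for start in range(0, len(signal) - size + 1, step):
        f0 = estimate_f0(signal[start:start + size], fs, **kwargs)
        if f0 is not None:
            listing.append(f0)
    return listing
```

The length of the returned list grows with the duration of the word, which is why longer oesophageal utterances yield longer F0 vectors. A pure-Python loop of this kind is slow at 44,100 Hz, so a real analysis would rely on Praat or a vectorised implementation.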

For each tested word (physiological speech and oesophageal speech [ES]), the following parameters were analysed: the number of F0 values obtained in the “pitch listing” from Praat, the mean F0, and the minimum and maximum F0 of each word.

3. Results

This article focuses on the statistical analysis of the F0 of oesophageal speech, comparing it with the physiological speech of healthy people. The focus is on determining the instantaneous value of F0 obtained by segmenting the analysed signal. This means that F0 was determined from a speech signal segment that represents at least several cycles of vocal fold vibration or, in the case of oesophageal speech (ES), of pseudo-glottis vibration. The segment is extracted using a moving time window (called a frame), and F0 is assumed constant within the frame. An F0 vector (pitch listing) is therefore obtained, the length of which depends on the number of frames, which in turn depends on the length of the spoken word. A word spoken in oesophageal speech (ES) lasts longer than the same word spoken by a healthy person; the difference depends on the level of the patient’s rehabilitation, including the ability to swallow a sufficient amount of air. For single words, the duration may differ less than for full sentences. Figure 3 and Figure 4 show the time waveform and spectrogram of the word “paczka” (“a package”) spoken by a man who had undergone laryngectomy surgery and by a healthy man. The following spectrogram settings were used:

Time and frequency resolution:
- Number of time steps: 1000
- Number of frequency steps: 250
Spectrogram analysis settings:
- Method: Fourier
- Window shape: Gaussian

The duration of the spoken word differs significantly between the two recordings. A pitch contour is also displayed for both.

The information collected regarding the F0 frequency of each single word tested included:

the mean value of the F0 frequencies contained in the pitch listing
the minimum value of the F0 frequencies from the pitch listing
the maximum value of the F0 frequencies from the pitch listing
the median value of the F0 frequencies contained in the pitch listing
the standard deviation value of the F0 frequencies contained in the pitch listing
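
The five summary values listed above can be computed from a pitch listing with Python's standard `statistics` module, as in the following sketch. The function name is hypothetical, and the sample standard deviation is used here as an assumption, since the text does not state whether the sample or the population formula was applied.

```python
import statistics


def summarise_pitch_listing(f0_values):
    """Return mean, minimum, maximum, median and standard deviation of F0 (Hz).

    Assumption: the sample standard deviation (n - 1 in the denominator).
    """
    if len(f0_values) < 2:
        raise ValueError("at least two F0 values are needed")
    return {
        "mean": statistics.mean(f0_values),
        "min": min(f0_values),
        "max": max(f0_values),
        "median": statistics.median(f0_values),
        "std": statistics.stdev(f0_values),
    }


if __name__ == "__main__":
    # Made-up F0 values in Hz, for illustration only (not data from the study).
    print(summarise_pitch_listing([80.0, 90.0, 100.0, 110.0, 120.0]))
```

Applying the function to the pitch listing of each recording and then averaging the results per word and per group would give per-word averages of the kind reported in the tables.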

The results for the fundamental frequency F0 of oesophageal speech (ES) and of the speech of healthy individuals (from the pitch listing for each studied word) are presented later in this section. As mentioned earlier, a frequency band from 50 Hz to 200 Hz was selected for the analysis. This range was chosen because it contains the frequency components of both oesophageal speech (ES) and physiological speech, which prevents F0 from going unregistered for some speech samples. Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 present, for the studied words, the average mean F0, the average minimum and maximum F0, the average standard deviation, and the average median of the F0 values in the pitch listing.

Figure 10 presents the average values of the fundamental frequency F0 without taking into account the division according to the studied words. The main criterion is the differences between oesophageal speech (ES) and the speech of healthy people.

Figure 10 shows that the greatest difference occurs for max. pitch F0. The difference is 29.68 Hz: with 124.22 Hz for healthy people and 94.54 Hz for laryngectomised people, the value for laryngectomised people is 76.1% of the value for healthy people. The full set of values for the average fundamental frequency F0 for all the words tested is presented in Table 3.
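
The percentage ratio and the frequency difference reported in the tables follow from two simple operations, sketched here with the max. pitch F0 values quoted above. The function name `compare_f0` is hypothetical; small deviations in the last decimal place from the tabulated values are to be expected if the tables were computed from unrounded data.

```python
def compare_f0(healthy_hz, laryngectomised_hz):
    """Return (ratio of laryngectomised to healthy in %, difference in Hz)."""
    ratio_percent = 100.0 * laryngectomised_hz / healthy_hz
    difference_hz = healthy_hz - laryngectomised_hz
    return ratio_percent, difference_hz


if __name__ == "__main__":
    # Max. pitch F0 averaged over all tested words, as quoted in the text.
    ratio, difference = compare_f0(124.22, 94.54)
    print(f"ratio = {ratio:.1f} %, difference = {difference:.2f} Hz")
```

A negative difference, as for the minimum F0 of “paczka”, simply means that the value for oesophageal speech exceeded the value for physiological speech.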

The research performed shows that the largest difference regarding max. pitch F0 occurs in the tested word “egg” and is 47.58 Hz, as shown in Figure 11 and Table 4.

The result sets for the rest of the words tested are presented in Table 5, Table 6, Table 7 and Table 8.

4. Conclusions

This study attempted to provide broad information on the distribution of the fundamental frequency F0 for people who had undergone total laryngectomy and used oesophageal speech (ES). The results in the pitch listing were subjected to detailed statistical analysis, which was intended to show significant differences in F0 between oesophageal speech (ES) and the speech of healthy people. Because of the specific features of the Polish language, in which some syllables are difficult to articulate and pronounce, words used during the speech rehabilitation process were selected for the study. The selected words were not random but followed the recommendations of speech therapists. Only single words were analysed, not full sentences. Analysis of full sentences is a more complex task and will be carried out by the author in later studies.

The results of the analysis clearly indicate a significantly higher average F0 for physiological speech for all the tested words. The largest difference in mean F0 between the speech of healthy individuals and oesophageal speech was noted for the word barrel (in Polish “beczka”), at 23.29 Hz; the smallest was noted for the word package (in Polish “paczka”), at 8.38 Hz. For the maximum F0 values obtained from the pitch listing, the largest difference occurs for the word egg (in Polish “jajko”), at 47.58 Hz, and the smallest for the word package (in Polish “paczka”), at 15.45 Hz. For the minimum F0 values, the largest difference is 8.34 Hz for the term bread roll (in Polish “bułka”) and the smallest is −1 Hz for the word package (in Polish “paczka”). The negative value results from the fact that the minimum F0 for oesophageal speech (59.74 Hz) was higher than that for the speech of healthy people (58.74 Hz). For the median F0, the largest difference was noted for the word barrel (in Polish “beczka”) (18.39 Hz) and the smallest for the word package (in Polish “paczka”) (−1.39 Hz); the negative value again results from the median F0 being higher for laryngectomised people than for healthy people. For the standard deviation of F0, the largest difference is 8.69 Hz for the word egg (in Polish “jajko”) and the smallest is 0.56 Hz for the word package (in Polish “paczka”).

The fundamental frequency F0 is a key physical feature of speech, so its analysis is necessary to fully understand the changes occurring in the speech signal. Monitoring changes in the numerical values of this descriptor can contribute to a more effective rehabilitation process using oesophageal speech. Information on the distribution of F0, combined with the previously mentioned MPEG-7 descriptors, can not only support the rehabilitation process but also contribute to the development of new methods of teaching oesophageal speech. The research results can be used to improve the quality of speech rehabilitation after laryngectomy. A key element of learning oesophageal speech is breathing exercises, which allow for the correct articulation of vowels, consonants, single words, and full sentences. In the first stage, air is drawn into the oesophagus and then directed back to the oral cavity. Words are then pronounced and substitute speech is learned. However, the words pronounced have different physical features from those of physiological speech.

The obtained research results can be used to indicate how the numerical values of oesophageal speech features deviate from those of physiological speech. On this basis, a feature vector can be implemented in a device that indicates to the patient and the therapist whether the rehabilitation process is going optimally, that is, whether the difference between the descriptor values of oesophageal and physiological speech decreases or increases during rehabilitation. It can thus be determined whether the rehabilitation is proceeding correctly or whether the rehabilitation method needs to be changed or improved.

Funding

This research is being carried out as part of the mini-grant “Computational recognition and analysis of oesophageal speech in patients following total laryngectomy surgery” in the project funded by the Polish Minister of Science and Higher Education under the ‘Regional Initiative of Excellence’ program (RID/SP/0048/2024/01) for Kazimierz Wielki University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The author declares no conflicts of interest.

References

[1] Chenausky, K.; MacAuslan, J. Utilization of microprocessors in voice quality improvement: The electrolarynx. Curr. Opin. Otolaryngol. Head Neck Surg. 2000, 8, 138–142.
[2] Available online: http://www.cancerresearchuk.org/about-cancer (accessed on 8 April 2025).
[3] Oe, K. An Electrolarynx Control Method Using Myoelectric Signals from the Neck. J. Robot. Mechatron. 2021, 33, 804–813.
[4] Eibling, D.E. Chapter 50: Voice Restoration after Total Laryngectomy. In Operative Otolaryngology: Head and Neck Surgery, 2nd ed.; Two-Volume Set; Elsevier: Amsterdam, The Netherlands, 2008; Volume 1, pp. 431–437.
[5] Sahin, M.; Vardar, R.; Kirazli, T.; Ogut, F.; Akyildiz, S.; Bor, S. Predictive value of esophageal motility test in the proficiency of esophageal speech. Dis. Esophagus 2015, 28, 151–155.
[6] Doi, H.; Nakamura, K.; Toda, T.; Saruwatari, H.; Shikano, K. An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 5136–5139.
[7] de Medeiros João Paulo Cabral, B.R.; Meireles, A.R.; Baceti, A.A. A comparative study of fundamental frequency stability between speech and singing. Speech Commun. 2021, 128, 15–23.
[8] Tyburek, K.; Mikołajewski, D.; Rojek, I. Analysis of Phonetic Segments of Oesophageal Speech in People Following Total Laryngectomy. Appl. Sci. 2023, 13, 4995.
[9] Lee, J.H.; Humes, L.E. Effect of fundamental-frequency and sentence-onset differences on speech-identification performance of young and older adults in a competing-talker background. J. Acoust. Soc. Am. 2012, 132, 1700–1717.
[10] Manjunath, B.S.; Salembier, P.; Sikora, T. Introduction to MPEG-7: Multimedia Content Description Interface; Wiley: Hoboken, NJ, USA, 2002.
[11] Joy, S.; Upadhya, S. Speech Analysis in Time and Frequency Domain. Int. J. Eng. Res. Technol. 2015, 3, 1–4.
[12] Tyburek, K. Parameterisation of human speech after total laryngectomy surgery. Comput. Speech Lang. 2022, 72, 101313.
[13] Most, T.; Tobin, Y.; Mimran, R.C. Acoustic and perceptual characteristics of esophageal and tracheoesophageal speech production. J. Commun. Disord. 2000, 33, 165–181.
[14] Giannakopoulos, T.; Pikrakis, A. Introduction to Audio Analysis: A Matlab Approach; Academic Press: Cambridge, MA, USA, 2014.
[15] Kim, H.K.; Rose, R.C. Cepstrum-domain acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments. IEEE Trans. Speech Audio Process. 2003, 11, 435–446.
[16] Yao, X.; Jitsuhiro, T.; Miyajima, C.; Kitaoka, N.; Takeda, K. Classification of speech under stress based on modeling of the vocal folds and vocal tract. EURASIP J. Audio Speech Music. Process. 2013, 2013, 17.

Figure 2. F0 frequencies of selected oesophageal speech words.

Figure 3. Time waveform and spectrogram of the word “a package”—laryngectomised man.

Figure 4. Time waveform and spectrogram of the word “a package”—healthy man.

Figure 5. Mean value of the F0 frequencies contained in the pitch listing.

Figure 6. Minimum value of the F0 frequency from the pitch listing.

Figure 7. Maximum value of the F0 frequency from the pitch listing.

Figure 8. Standard deviation value of the F0 frequencies contained in the pitch listing.

Figure 9. Median value of the F0 frequencies contained in the pitch listing.

Figure 10. Average values of fundamental frequency F0 for all tested words.

Figure 11. Average values of fundamental frequency F0 for tested word “jajko” (“an egg”).

Table 1. List of studied words.

Studied Words
In Polish	In English	IPA Notations
beczka	a barrel	/bɛt͡ʂka/
bułka	a bread roll	/bu͜wka/
jajko	an egg	/jaj.kɔ/
paczka	a package	/pat͡ʂka/
śnieg	snow	/ɕɲɛɡ/

Table 2. Mean durations of the studied words for healthy and laryngectomised people.

Average Duration of the Word Studied [s]
In Polish	In English	Healthy	Laryngectomised
jajko	an egg	0.6050	0.7688
beczka	a barrel	0.6035	0.6758
bułka	a bread roll	0.6701	0.8442
paczka	a package	0.6687	0.6901
śnieg	snow	0.3283	0.4685

Table 3. Set of values for the average fundamental frequency F0 for all the tested words.

Average Values of Fundamental Frequency F0 for All Tested Words
F0 Frequency	Healthy	Laryngectomised	Percentage Value in the Ratio of People Following Laryngectomy to Healthy People [%]	Frequency Difference—Healthy and Laryngectomised People
Mean pitch F0 [Hz]	96.87	80.86	83.48	16.01
Min. pitch F0 [Hz]	79.87	64.70	81.00	15.17
Max. pitch F0 [Hz]	124.22	94.54	76.10	29.68
Median F0	94.68	83.14	87.82	11.53
Standard deviation F0	17.54	12.61	71.89	4.93

Table 4. Set of values for the average fundamental frequency F0 for the tested word “jajko” (“an egg”).

Average Values of Fundamental Frequency F0 for the Tested Word “Jajko” (“an Egg”)
F0 Frequency	Healthy	Laryngectomised	Percentage Value in the Ratio of People Following Laryngectomy to Healthy People [%]	Frequency Difference—Healthy and Laryngectomised People
Mean pitch F0 [Hz]	88.59	73.19	82.62	15.40
Min. pitch F0 [Hz]	71.02	45.15	63.57	25.88
Max. pitch F0 [Hz]	112.39	64.81	57.67	47.58
Median F0	90.95	73.49	80.80	17.46
Standard deviation F0	16.55	7.86	47.52	8.69

Table 5. Set of values for the average fundamental frequency F0 for the tested word “beczka” (“a barrel”).

Average Values of Fundamental Frequency F0 for the Tested Word “Beczka” (“a Barrel”)
F0 Frequency	Healthy	Laryngectomised	Percentage Value in the Ratio of People Following Laryngectomy to Healthy People [%]	Frequency Difference—Healthy and Laryngectomised People
Mean pitch F0 [Hz]	98.26	74.97	76.30	23.29
Min. pitch F0 [Hz]	88.94	64.00	71.96	24.94
Max. pitch F0 [Hz]	121.40	87.72	72.25	33.68
Median F0	95.29	76.90	80.70	18.39
Standard deviation F0	14.29	9.74	68.21	4.54

Table 6. Set of values for the average fundamental frequency F0 for the tested word “bułka” (“a bread roll”).

Average Values of Fundamental Frequency F0 for the Tested Word “Bułka” (“a Bread Roll”)
F0 Frequency	Healthy	Laryngectomised	Percentage Value in the Ratio of People Following Laryngectomy to Healthy People [%]	Frequency Difference—Healthy and Laryngectomised People
Mean pitch F0 [Hz]	107.81	88.71	82.28	19.10
Min. pitch F0 [Hz]	90.56	82.22	90.79	8.34
Max. pitch F0 [Hz]	145.98	115.74	79.28	30.24
Median F0	99.11	84.14	84.90	14.97
Standard deviation F0	23.85	16.27	68.23	7.58

Table 7. Set of values for the average fundamental frequency F0 for the tested word “paczka” (“a package”).

Average Values of Fundamental Frequency F0 for the Tested Word “Paczka” (“a Package”)
F0 Frequency	Healthy	Laryngectomised	Percentage Value in the Ratio of People Following Laryngectomy to Healthy People [%]	Frequency Difference—Healthy and Laryngectomised People
Mean pitch F0 [Hz]	89.87	81.49	90.67	8.38
Min. pitch F0 [Hz]	58.74	59.74	101.71	−1.00
Max. pitch F0 [Hz]	120.31	104.86	87.16	15.45
Median F0	91.27	92.66	101.52	−1.39
Standard deviation F0	19.32	18.76	97.13	0.56

Table 8. Set of values for the average fundamental frequency F0 for the tested word “śnieg” (“snow”).

Average Values of Fundamental Frequency F0 for the Tested Word “śnieg” (“Snow”)
F0 Frequency	Healthy	Laryngectomised	Percentage Value in the Ratio of People Following Laryngectomy to Healthy People [%]	Frequency Difference—Healthy and Laryngectomised People
Mean pitch F0 [Hz]	99.80	85.94	86.11	13.86
Min. pitch F0 [Hz]	90.09	72.37	80.34	17.71
Max. pitch F0 [Hz]	121.01	99.56	82.27	21.46
Median F0	96.76	88.52	91.48	8.24
Standard deviation F0	13.71	10.42	75.96	3.30

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
