Article

Smart-Median: A New Real-Time Algorithm for Smoothing Singing Pitch Contours

Department of Computer Science, Maynooth University, W23 F2H6 Maynooth, Co. Kildare, Ireland
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(14), 7026; https://doi.org/10.3390/app12147026
Submission received: 27 May 2022 / Revised: 30 June 2022 / Accepted: 7 July 2022 / Published: 12 July 2022
(This article belongs to the Special Issue Processing Techniques Applied to Audio, Image and Brain Signals)

Featured Application

Applications of this study include improving pitch estimation, removing outliers and errors, singing analysis, voice analysis, singing assessment, and singing information retrieval.

Abstract

Pitch detection is usually one of the fundamental steps in audio signal processing. However, pitch detectors commonly estimate a portion of the fundamental frequencies incorrectly, especially in real-time environments and when applied to singing. Therefore, the estimated pitch contour usually contains errors. To remove these errors, a contour-smoother algorithm should be employed. However, because none of the current contour-smoother algorithms has been explicitly designed for contours generated from singing, they are often unsuitable for this purpose. Therefore, this article introduces a new smoother algorithm that rectifies this. The proposed smoother algorithm is compared with 15 other smoother algorithms over approximately 2700 pitch contours, using four metrics. According to all the metrics, the proposed algorithm smoothed the contours more accurately than the other algorithms. A clear conclusion is that smoother algorithms should be designed according to the contour type and the final application of the result.

1. Introduction

Estimating the fundamental frequency is usually one of the main steps in audio signal processing algorithms. However, it is common for pitch detector algorithms to make some incorrect estimations, resulting in a pitch contour that is not smooth and includes errors. These errors are often due to doubling or halving estimates of the true pitch value, and are therefore impulsive in appearance rather than random [1,2,3,4,5]. Furthermore, incorrect pitch estimation often happens in real-time pitch detection, especially when the sound source is a human voice [5]. Therefore, a contour-smoother algorithm is necessary to filter the incorrectly estimated F0 before further analysis.
Generally, contour smoothers can be divided into two categories: (1) contour smoothing to show the data trend; and (2) contour smoothing to remove errors, noise, and outlier points.
There are several algorithms for showing a contour's trend, such as polynomial [6], spline [7,8], Gaussian [9], Locally Weighted Scatterplot Smoothing (LOWESS) [10,11], and seasonal decomposition [12]. One application of trend detection using pitch contours is finding the similarity between melodies [13,14,15,16]. Other contour smoothers, such as the moving average [17] and the Median filter, work by attenuating or removing outliers in the contour [5]. None of these contour-smoother algorithms was explicitly designed for smoothing pitch contours; they can be applied to any contour from any data series. They have been applied to the smoothing of pitch contours, such as in the study by Kasi and Zahorian [18] that used the Median filter. However, there are certain adjusted versions of these algorithms for smoothing estimated pitches; for example, Okada et al. [19] and Jlassi et al. [20] introduced pitch-contour-smoothing algorithms based on the Median filter. Some of these adjusted algorithms are discussed below.
Zhao et al. [2] introduced a pitch smoothing method for the Mandarin language based on autocorrelation and cepstral F0 detection approaches. They first used two pitch estimation techniques to estimate two separate pitch contours, and then both were smoothed. Finally, combining the two estimated pitch contours created the smoothed contour. Generally, their approach was very similar to the idea of this paper, moving through a pitch contour to identify noisy estimates by comparing each point to its previous and succeeding points, and finally editing out the noise. However, their approach involved altering some correct parts of the data, which impacted peaks that were not incorrect. Moreover, in their evaluation, they only checked the error reduction capability of their algorithm for removing octave-doubling and sharp rises in estimated F0s. It would have been preferable to compare their smoothed contours with ground truth, to show how well their algorithm could adjust the estimated contour to make it similar to that of the ground truth.
Liu et al. [21] introduced a pitch-contour-smoother algorithm for Mandarin tone recognition. They used several thresholds for finding half, double, and triple errors by comparing each point with its previous point. An incorrect frequency was then doubled, halved, or divided by three, according to the type of error detected. They indicated that experiments should determine the threshold values but did not provide any guidelines for selecting or adjusting these, and the threshold values they used were not revealed. Therefore, it is unclear how one could change the thresholds to optimize the result. In addition, they tested their algorithm only on isolated Mandarin syllables, although realistically they should also have tried their approach on continuously spoken language. Moreover, they did not compare the accuracy of their algorithm with other contour-smoother algorithms to show how well their method performed compared to others.
The smoothing approach presented by Jlassi et al. [20] was designed for spoken English. Their smoothing system was based on the moving average filter. However, they only calculated the average of the two immediately previous F0 points for those points that showed more than a 30 Hz difference from their previous and following points. They compared their algorithm with the Median filter and Exponentially Weighted Moving Average (EWMA), and found improved accuracy using their approach. However, the dataset [22] used in their study was small (15 people reading a phonetically balanced text, “The North Wind Story”). Their results would have been much more convincing if they had evaluated their algorithm with a more extensive dataset generated by various pitch-detector algorithms. Moreover, several metrics could have been employed to measure how well they smoothed the errors. Furthermore, their algorithm considered a difference of more than 30 Hz from both the immediately previous and following points as an error; therefore, it was unable to identify and smooth any errors existing over more than one point on the contour.
Ferro and Tamburini [1] introduced another pitch-smoother technique for spoken English, based on Deep Neural Networks (DNN) and implemented explicitly as a Recurrent Neural Network (RNN). However, they did not provide a comparison between the improvement offered by their approach and that of any other method. In addition, a comparison across their datasets, and mixtures of them, suggests that their DNN architecture may not generalize well to a new dataset.
As exemplified above, many pitch detection algorithms have been designed for and tested on speech. However, although both speech and singing are produced with the same human vocal system, because of the differences between speaking and singing, separate studies are required for pitch analysis of singing [23]. In addition, in real-time environments, the smoother algorithm should alter the contour with a reasonable delay, mainly based on previous data because future data is unavailable.
We believe that the smoother algorithm should be based on the features and applications of the contour, similar to the approach taken by Ferro and Tamburini [1] and the studies by So et al. [3] on smoothing contours generated from speech. In other words, expected error types in the pitch contours for the specific data type should be identified. Then, an investigation for a targeted contour-smoother algorithm to solve these errors should be made. In addition, the applications of the smoothed contour should also be considered. For example, when a highly accurate estimate of the F0 value at each point is required, the smoother algorithm should not change any data except those points identified as incorrectly estimated. Moreover, in real-time environments the smoother algorithm should not have a long delay.
This paper, therefore, introduces a new contour-smoother algorithm based on the features and applications of pitch contours that are derived only from singing. For this purpose, after describing the methodology applied, several typical contour-smoother algorithms are described. Then, the proposed algorithm is explained in Section 4, followed by the results and discussion. Finally, a conclusion and suggestions for future work are provided in Section 7.

2. Materials and Methods

2.1. Dataset

The VocalSet dataset [24] was used to evaluate the algorithms' accuracy. This dataset includes more than 10 h of recordings of 20 professional singers (11 male and 9 female). VocalSet covers a complete set of vowels and a diverse set of voices exhibiting many different vocal techniques, sung in the contexts of scales, arpeggios, long tones, and melodic excerpts. For this study, a portion of VocalSet was selected: the scales and arpeggios sung across the vowels in loud performances at slow and fast tempi. The total number of files used from VocalSet was 511.

2.2. Ground Truth

In order to evaluate the accuracy of each of the smoother algorithms, ground-truth pitch contours were required against which to compare the smoothed pitch contours. In other words, in this study, the best smoothing algorithm was considered to be the one that produced contours most similar to the ground truth. Following the studies by Faghih and Timoney [4,5], a reliable offline pitch-detector algorithm, PYin [24], was used. The pitch contours estimated by PYin were saved in CSV files with two columns, time in seconds and F0. These were all plotted to verify the accuracy of the pitch contours estimated by PYin; those that included implausible jumps were considered incorrect and deleted. After removing those contours, 447 ground-truth files remained.

2.3. Pitch Detection Algorithms to Generate Pitch Contours

To evaluate the proposed smoother algorithm, we used a similar approach to that of [1], employing several pitch contours with different random error (unsmoothed) points. As discussed in Faghih and Timoney's study [5], six real-time pitch detection algorithms with different estimated contours were employed to obtain the required contours. The pitch detector algorithms were Yin [25], spectral YIN or YIN Fast Fourier transform (YinFFT), Fast comb spectral model (FComb), Multi-comb spectral filtering (MComb), Schmitt trigger, and the spectral auto-correlation function (Specacf). The implementation of these algorithms came from a Python library, Aubio (https://aubio.org/manual/latest/cli.html#aubiopitch, accessed on 10 June 2021) [26], a well-known library for music information retrieval. Since the focus of this paper is on smoothing pitch contours, descriptions of these algorithms are not provided here but can be found in [5,27]. The reason for selecting these real-time pitch estimators was that, based on the study by Faghih and Timoney [5], none of them can estimate F0s without error in singing signals. In addition, the accuracy of these algorithms varies, which helped us evaluate the contour-smoother algorithms in different situations.
In addition, to compare the accuracy of the algorithms in conditions where the pitch contours included no or only a few errors, an offline pitch-detector algorithm provided in the Praat tool [28] based on the Boersma algorithm [29] was used. According to Faghih and Timoney’s studies [4,5], the Praat and Pyin accuracies tend to be similar.
The settings used for pitch detection of women's voices were a sample rate of 44,100 Hz, a window size of 1024 samples, and a hop size of 512 samples. The corresponding settings for men's voices were 44,100 Hz, 2048 samples, and 1024 samples for sample rate, window size, and hop size, respectively. Therefore, the distance between two consecutive points in a pitch contour was 11.61 milliseconds for women's voices and 23.22 milliseconds for men's voices.
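As a quick check of these figures, the frame step follows directly from the hop size and sample rate; the short sketch below (plain Python, values taken from the settings above) reproduces the 11.61 ms and 23.22 ms frame steps.

```python
# Frame step between consecutive pitch estimates = hop_size / sample_rate.
# Values are the settings listed above.
sample_rate = 44_100          # Hz, both voice types

hop_female = 512              # samples
hop_male = 1024               # samples

print(f"women's voices: {hop_female / sample_rate * 1000:.2f} ms")  # ~11.61 ms
print(f"men's voices:   {hop_male / sample_rate * 1000:.2f} ms")    # ~23.22 ms
```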
As shown in Figure 1, the contours generated by the different pitch detectors exhibited various errors. Therefore, the total number of contours used to evaluate the smoother algorithms was 2682 (corresponding to the six pitch detectors run on each of the 447 wav files).
All the provided files, such as the dataset and codes, are available in a GitHub repository at https://github.com/BehnamFaghihMusicTech/Smart-Median, accessed on 6 July 2022.

2.4. Evaluation Method

Several evaluation metrics were used to compare the accuracy of the smoothing algorithms: R-squared (R2), Root-Mean-Square Error (RMSE), Mean-Absolute-Error (MAE), and F0 Frame Error (FFE). A well-known Python library, Sklearn [30], was used for all metrics except FFE, which was implemented by the authors. These metrics are explained in the following subsections.

2.4.1. R-Squared (R2)

The formula for this metric is as follows (1) [31]:
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (GT_i - SM_i)^2}{\sum_{i=1}^{N} \left(GT_i - \mathrm{mean}(GT)\right)^2} = 1 - \frac{\text{Residual Sum of Squares (RSS)}}{\text{Total Sum of Squares (TSS)}}$$
where $N$ is the total number of frames, $GT$ is the ground-truth contour, $SM$ is the smoothed contour, and $\mathrm{mean}(GT) = \frac{1}{N}\sum_{i=1}^{N} GT_i$.
In the best case, when all the points in the ground-truth contour and the estimated contour are identical, $R^2$ equals 1; otherwise, $R^2$ is less than 1. A value closer to 1 means greater similarity between the two contours.

2.4.2. Root-Mean-Square Error (RMSE)

This metric is calculated according to the following Formula (2):
$$RMSE = \sqrt{\frac{\sum_{i=1}^{N} (GT_i - SM_i)^2}{N}}$$
In the best case, when the two contours have precisely the same values, the RMSE is 0; otherwise, it is greater than 0. Values closer to 0 indicate greater similarity between the two contours.

2.4.3. Mean-Absolute-Error (MAE)

Equation (3) shows how to calculate this metric:
$$MAE = \frac{\sum_{i=1}^{N} |GT_i - SM_i|}{N}$$
MAE is similar to RMSE but, because RMSE squares the differences, RMSE penalizes points that lie at a greater distance from their corresponding ground-truth points more heavily.

2.4.4. F0 Frame Error (FFE)

FFE is typically defined as the proportion of frames in which an error is made; therefore, FFE alone can provide an overall measure of the accuracy of a pitch detection algorithm [32]. Here, its complementary form is used: the percentage of points in the estimated pitch contour that lie within a Threshold distance of the corresponding points in the ground-truth pitch contour (4):
$$FFE = \frac{\sum_{i=1}^{N} \begin{cases} 1 & |SM_i - GT_i| \le \mathrm{Threshold} \\ 0 & \text{otherwise} \end{cases}}{N} \times 100$$
where N is the total number of frames/points.
For the Threshold, in studies such as [1], a constant value, e.g., 16 Hz, was used as an acceptable variation from the ground truth. However, as discussed by Faghih and Timoney [5], a fixed distance from the ground truth may not be a good approach, because the perceptual effect of 16 Hz is different when the estimated pitch is 100 Hz compared to 1000 Hz. It is also common to use a percentage, usually 20%, as the threshold [20], and a similar approach is used in this study.
Higher values of this metric indicate a higher similarity between the smoothed pitch contour and the ground truth pitch contour.
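For illustration, the sketch below computes the four metrics for a pair of short, hypothetical contours. R2, RMSE, and MAE use Sklearn as in this study, while the FFE helper is our own reading of Equation (4), with the 20% threshold interpreted relative to the ground-truth value at each frame.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def ffe(gt, sm, threshold=0.20):
    """Percentage of frames whose smoothed F0 lies within `threshold`
    (here a fraction of the ground-truth F0) of the ground truth."""
    gt, sm = np.asarray(gt, float), np.asarray(sm, float)
    within = np.abs(sm - gt) <= threshold * np.abs(gt)
    return 100.0 * within.mean()

# Hypothetical contours in Hz; the smoothed one keeps a single octave error.
gt = np.array([220.0, 220.0, 246.9, 246.9, 261.6])
sm = np.array([220.0, 225.0, 246.9, 493.8, 261.6])

print("R2  :", r2_score(gt, sm))
print("RMSE:", np.sqrt(mean_squared_error(gt, sm)))
print("MAE :", mean_absolute_error(gt, sm))
print("FFE :", ffe(gt, sm), "%")
```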
It should be mentioned that there are other algorithms for finding the similarities between pitch contours, such as those of Sampaio [13], Wu [14], and Lin et al. [15]. However, these aim to determine perceptual similarity between two pitch contours. In other words, those researchers were seeking to determine the similarity of one melody to another. The purpose of the current paper is not to ascertain the overall similarities between two tunes, but rather the numerical relationship between each point on two different pitch contours. Therefore, those algorithms were not suitable for this study.

3. Current Contour Smoother Algorithms

Several contour-smoother algorithms are commonly used to smooth pitch contours. This section provides a list of these algorithms.
To refer to the smoother algorithms within this paper, a code has been assigned to each algorithm listed in Table 1.
Figure 2 illustrates the effect of the smoother algorithms on a single estimated pitch contour. A female singer sang an arpeggio in the C major scale, and the FComb algorithm estimated the pitches. The smoothed contours are plotted in eight different panels. Each panel includes the ground truth (GT) and original estimated (ES) contours, and the smoothed contours generated by some of the smoother algorithms.
In addition, the Python libraries employed to implement these smoothers are listed in Appendix A.
Each of the algorithms is described below.

3.1. Gaussian Filter

Generally, in signal processing, filtering removes or modifies unwanted error and noise from a series of data. Gaussian filters smooth out fluctuations in the data by convolution with a Gaussian function [9]. The one-dimensional Gaussian filter is expressed as (5):
$$Sm_i = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(Es_i)^2}{2\sigma^2}\right)$$
where $Es_i$ is the original signal at position $i$, and $Sm_i$ is the smoothed signal at position $i$. In addition, $\sigma^2$ indicates the variance of the Gaussian filter; the degree of smoothing depends on the size of the variance [9]. Although the Gaussian filter smooths out the noise, as shown in Figure 2a, some correctly estimated F0 values may also change, i.e., become distorted [9].
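A minimal sketch of this kind of smoothing, using SciPy's one-dimensional Gaussian filter (the library listed in Appendix A for the Gaussian smoother with code 01); the contour values are made up for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Hypothetical pitch contour (Hz) with one octave-doubled frame.
f0 = np.array([220.0, 221.0, 219.5, 440.0, 220.5, 220.0])

# A larger sigma gives stronger smoothing but distorts more correct points.
print(gaussian_filter1d(f0, sigma=1.0))
```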

3.2. Savitzky–Golay Filter

This particular type of low-pass filter was introduced into analytical chemistry, but soon found many applications in other fields [33]. It can be considered a weighted moving average [34], and is defined as follows (6):
$$Sm_i = \sum_{k=-M}^{M} h_k\, Es_{i-k}$$
where $Es_i$ is the original signal at position $i$, and $Sm_i$ is the smoothed signal at position $i$. $M$ determines the window length, and $h_k$ are the filter coefficients. The drawback of the Savitzky–Golay (SG) filter, according to Schmid et al. [35], is that the data near the edges are prone to artefacts. Figure 2a illustrates its effect on a contour.
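A short example of the SG filter via scipy.signal.savgol_filter (the SciPy routine listed in Appendix A); the window length and polynomial order are illustrative choices only.

```python
import numpy as np
from scipy.signal import savgol_filter

f0 = np.array([220.0, 221.0, 219.5, 440.0, 220.5, 220.0, 219.8])

# window_length must be odd and greater than polyorder.
print(savgol_filter(f0, window_length=5, polyorder=2))
```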

3.3. Exponential Filter

This approach is based on weighting the current values by the previously observed data, assuming that the most recent observations are more important than the older ones. The smoothed series starts with the second point in the contour. It is calculated by [36], (7):
$$Sm_i = \alpha\, Es_{i-1} + (1-\alpha)\, Sm_{i-1}, \qquad 0 < \alpha \le 1,\ i \ge 3$$
where $\alpha$ is called the smoothing constant. An illustration of this smoothing effect can be seen in Figure 2a.
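A small sketch of exponential smoothing in the common textbook form; its indexing may differ slightly from Equation (7), and the value of alpha is an arbitrary illustration.

```python
def exponential_smooth(es, alpha=0.3):
    """Each output blends the newest observation with the previous smoothed
    value; smaller alpha means heavier smoothing of the contour."""
    sm = [es[0]]                      # seed with the first observation
    for x in es[1:]:
        sm.append(alpha * x + (1 - alpha) * sm[-1])
    return sm

print(exponential_smooth([220.0, 221.0, 440.0, 220.5, 220.0]))
```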

3.4. Window-Based Finite Impulse Response Filter

In this approach, a window works as a mask to filter the data series. Different window shapes can be used for filtering the data, and each window value usually lies between 0 and 1, so this method uses weighted windows. If $Es_i$ is the signal at index $i$ and $w_i$ is the window value at index $i$, the smoothed signal $Sm_i$ is calculated as follows (8):
$$Sm_i = w_i\, Es_i$$
The window types used in this study are described below.

3.4.1. Rectangular Window

For the rectangular window, all the window values equal one; see Figure 2b.

3.4.2. Hanning Window

The Hanning window is defined as follows, from [37] (9):
$$w_H(i) = \begin{cases} 0.5\left[1 - \cos\left(\frac{2\pi i}{N}\right)\right] & 0 \le i \le N-1 \\ 0 & \text{otherwise} \end{cases}$$
where N is the length of the window; Figure 2b.

3.4.3. Hamming Window

The Hamming window is defined as follows, from [37] (10):
$$w_{HM}(i) = \begin{cases} 0.54 - 0.46\cos\left(\frac{2\pi i}{N}\right) & 0 \le i \le N-1 \\ 0 & \text{otherwise} \end{cases}$$
where N is the length of the window; Figure 2b.

3.4.4. Bartlett Window

The Bartlett (triangular) window is defined [37] using (11), Figure 2b:
$$w_b(i) = \begin{cases} 1 - \left|\frac{2i}{N-1} - 1\right| & 0 \le i \le N-1 \\ 0 & \text{otherwise} \end{cases}$$
where $N$ is the length of the window.

3.4.5. Blackman Window

The Blackman window is defined [38] by (12):
$$w_{black}(i) = a_0 + a_1\cos\left(\frac{2\pi i}{N-1}\right) + a_2\cos\left(\frac{4\pi i}{N-1}\right), \qquad -\frac{N-1}{2} \le i \le \frac{N-1}{2}$$
where $N$ is the window length, and $a_0$, $a_1$, and $a_2$ are constants (13):
$$a_0 = \frac{1-\alpha}{2}, \qquad a_1 = \frac{1}{2}, \qquad a_2 = \frac{\alpha}{2}$$
Here $\alpha$ is fixed and equals 0.16; Figure 2c.
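The sketch below illustrates window-based smoothing as a convolution with a normalised NumPy window, covering the window shapes listed in this subsection; this is one common reading of the approach and is not the exact implementation used in the study (the TSmoothie library listed in Appendix A was used there).

```python
import numpy as np

def window_smooth(f0, window="hanning", length=5):
    """Smooth a contour by convolving it with a normalised window;
    a rectangular window reduces to a plain moving average."""
    windows = {
        "hanning": np.hanning,
        "hamming": np.hamming,
        "blackman": np.blackman,
        "bartlett": np.bartlett,
        "rectangular": np.ones,
    }
    w = windows[window](length)
    w = w / w.sum()                   # keep the overall level unchanged
    return np.convolve(f0, w, mode="same")

f0 = np.array([220.0, 221.0, 219.5, 440.0, 220.5, 220.0, 219.8])
print(window_smooth(f0, "hamming", 5))
```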

3.5. Direct Spectral Filter

In this approach, a time series is smoothed by employing a Fourier transform: the essential frequencies are retained and the others are removed. This operates like multiplying the spectrum by a rectangular window, which corresponds to a circular convolution with the transformed window in the time domain; Figure 2c.

3.6. Polynomial

This approach uses weighted linear regression on an ad-hoc expansion basis to smooth the time series. It is a generalization of the Finite Impulse Response (FIR) filter that can better preserve the desired signal’s higher frequency content without removing as much noise as the average [39]. The first derivative of the polynomial evaluated at the midpoint of the N-interval is generated by multiplying the position data E s i by coefficients and adding these multiplications, as shown in (14) [6]:
$$Sm_{(N+1)/2} = \sum_{i=1}^{N} W_i\, Es_i \qquad (N = \text{number of data points, odd})$$
where $W_i$ are the weights (coefficients) of the polynomial fit of degree $p$. The weights depend on the degree $p$ and the number of points $N$ used in the fit; Figure 2c is an example.

3.7. Spline

This approach employs Spline functions to eliminate the noise from the data. It works by estimating the optimum amount of smoothing required for the data. Three types of spline smoothing were used in this study: ‘linear’ (Figure 2c), ‘cubic’ (Figure 2d), and ‘natural cubic’ (Figure 2d). The details of this approach are provided in [7,8].

3.8. Binner

This approach applies linear regression on an ad-hoc expansion basis within a time series. The features created by this method are obtained via binning the input space into intervals. An indicator feature is designed for each bin, indicating into which bin a given observation falls. The input space consists of a single continuous increasing sequence in the time series domain [40]; an illustration is shown in Figure 2d.

3.9. Locally Weighted Scatterplot Smoothing (LOWESS) Smoother

This is a non-parametric regression method. LOWESS attempts to fit a linear model to each data point, based on local data points; Figure 2e. This makes the procedure more versatile than simply including a high-order polynomial [10,11].

3.10. Seasonal Decomposition

One of the considerations in analysing time series data is dealing with seasonality. A seasonal decomposition deconstructs a time series into several components: a trend, a repeating seasonal time series, and the remainder. One of the benefits of seasonal decomposition is its capacity to locate anomalies and errors in data [12]. Seasonal decomposition can estimate the notes and seasonal components in a pitch contour, but the vibrato sung within each note is removed. Therefore, it can show the movements between seasons and notes in a pitch contour, as shown in Figure 2e,f.
Two seasonal component assessments may be used: ‘additive’ and ‘multiplicative’. In the additive method, the variables are assumed to be mutually independent and calculated by summation of the variables. The multiplicative approach considers that components are dependent on each other, and it is calculated by the multiplication of the variables [41].
Seasonal decomposition can be employed using different smoothing techniques. The smoothing techniques used in this study are Window-based, ‘LOWESS’, and ‘natural_cubic_spline’.
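As an illustration of the additive form, the sketch below uses statsmodels' seasonal_decompose on a made-up contour with a vibrato-like repeating cycle; the period and values are assumptions for demonstration only, and this is not the TSmoothie implementation used in the study.

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical contour: a slow rise (trend) plus an 8-frame repeating cycle.
f0 = pd.Series([220.0 + 0.2 * i + 3.0 * ((i % 8) - 3.5) for i in range(64)])

result = seasonal_decompose(f0, model="additive", period=8)
print(result.trend.dropna().head())    # note-level movement
print(result.seasonal.head())          # the repeating (vibrato-like) component
```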

3.11. Kalman Filter

The Kalman filter is a set of mathematical equations that provides an efficient recursive means to estimate the state of a process in a way that minimises the norm of the squared error. The Kalman filter uses a form of feedback control, assessing the process state and then obtaining feedback in the form of (noisy) measurements. The equations for the Kalman filter have two parts: time update equations and measurement update equations. The time update equations operate as predictor equations, while the measurement update equations are corrector equations. Thus, the overall estimation algorithm is close to a predictor–corrector algorithm, i.e., correcting to improve the predicted value. In the standard Kalman filter, it is assumed that the noise is Gaussian, which may or may not reflect the reality of the system that is being modelled [42]. Thus, the more accurate the model used in the Kalman algorithm, the better the performance.
The Kalman smoother can be represented in state-space form; therefore, a matrix representation of all the components is required. Four structural representations of the contours are considered: 'level', 'trend', 'seasonality', and 'long seasonality', and combinations of these structures can also be used. Examples of the effects of different variations of the Kalman filter are shown in Figure 2f,g.

3.12. Moving Average

This simple filter aims to reduce random noise in a data series [17] by following the Formula (15):
$$Sm_i = \frac{1}{n} \sum_{j=0}^{n-1} Es_{i+j}$$
where $Es$ is the original pitch contour, $Sm$ is the smoothed pitch contour, and $n$ is the number of points analysed at any given time, referred to as the window length of the filter. The larger the value of $n$, the greater the level of smoothing. An example can be seen in Figure 2h.
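A minimal pandas example of the moving average (pandas is the library listed for it in Appendix A); the window length of 3 is an arbitrary illustration.

```python
import pandas as pd

f0 = pd.Series([220.0, 221.0, 219.5, 440.0, 220.5, 220.0])

# Rolling mean over a 3-point window; min_periods=1 keeps the contour length.
print(f0.rolling(window=3, min_periods=1).mean().tolist())
```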

3.13. Median Filter

The Median filter approach is similar to the moving average but, instead of calculating the average of a window of length $n$, the median of the window is used (16). Unlike the moving average filter, which is a linear system, this filter is nonlinear, which makes its analysis more complicated:
$$Sm_i = \mathrm{Median}\left(Es_i,\ Es_{i+1},\ Es_{i+2},\ \ldots,\ Es_{i+n-2},\ Es_{i+n-1}\right)$$
where $Es$ is the original pitch contour, $Sm$ is the smoothed pitch contour, and $n$ is the number of points over which the median is calculated at each instant. Figure 2h illustrates the effect of this method.
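A short example with SciPy's median filter (the routine listed in Appendix A); a kernel of three removes isolated single-frame outliers such as octave errors.

```python
import numpy as np
from scipy.signal import medfilt

f0 = np.array([220.0, 221.0, 440.0, 220.5, 220.0, 110.0, 219.8])

print(medfilt(f0, kernel_size=3))   # the isolated doubled/halved frames vanish
```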

3.14. Okada Filter

This filter is an interesting combination of the moving average and Median filters. It aims to remove the outliers from a contour while closely retaining its shape, avoiding the softening of contour definition at transitions typically observed with smoothing. Each estimated point $Es_i$ in a contour is compared with its immediately preceding and succeeding points, $Es_{i-1}$ and $Es_{i+1}$, respectively. If $Es_i$ is the median of $Es_{i-1}$, $Es_i$, and $Es_{i+1}$, it does not need to be changed; otherwise, $Es_i$ is replaced by the average of $Es_{i-1}$ and $Es_{i+1}$, as shown in (17). The first and last points are never changed [19].
$$Sm_i = Es_i + \frac{Es_{i-1} + Es_{i+1} - 2\,Es_i}{2\left(1 + e^{-\alpha (Es_i - Es_{i-1})(Es_i - Es_{i+1})}\right)}$$
When $\alpha$ is sufficiently large, this performs two operations: (1) if $(Es_i - Es_{i-1})(Es_i - Es_{i+1}) \le 0$, $Sm_i$ is assigned $Es_i$; and (2) if $(Es_i - Es_{i-1})(Es_i - Es_{i+1}) > 0$, $Sm_i$ is assigned $(Es_{i-1} + Es_{i+1})/2$.
Figure 2h provides an example of this algorithm effect.
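A small sketch of the Okada rule in the limiting (large-alpha) form described above: a point is replaced by the mean of its neighbours only when it is not the median of the local triple. This is our reading of Equation (17), not the authors' code.

```python
def okada_filter(es):
    """Replace a point by the mean of its neighbours only when it is a local
    peak or dip (i.e., not the median of the triple); endpoints are untouched."""
    sm = list(es)
    for i in range(1, len(es) - 1):
        prev, cur, nxt = es[i - 1], es[i], es[i + 1]
        if (cur - prev) * (cur - nxt) > 0:      # cur lies outside its neighbours
            sm[i] = (prev + nxt) / 2.0
    return sm

print(okada_filter([220.0, 221.0, 440.0, 220.5, 220.0]))
```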

3.15. Jlassi Filter

This technique was presented by Jlassi et al. [20]. The approach has two main steps: first, finding the incorrect points in the pitch contour, defined as those that differ by more than a set threshold from both their previous and successive points; and second, replacing each incorrect point with the average of the previous two points (18):
$$Sm_i = \begin{cases} \frac{Es_{i-2}}{2} + \frac{Es_{i-1}}{2} & |Es_i - Es_{i-1}| > \mathrm{Threshold}\ \text{and}\ |Es_i - Es_{i+1}| > \mathrm{Threshold} \\ Es_i & \text{otherwise} \end{cases}$$
The value of Threshold is 30 Hz, as in the original paper. Figure 2h illustrates the effect of the algorithm.
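A minimal sketch of the Jlassi rule as given in Equation (18), with the 30 Hz threshold; this is our reading of the published description rather than the original implementation.

```python
def jlassi_filter(es, threshold=30.0):
    """Replace a point with the mean of the two preceding points when it
    differs from both neighbours by more than `threshold` Hz."""
    sm = list(es)
    for i in range(2, len(es) - 1):
        if (abs(es[i] - es[i - 1]) > threshold and
                abs(es[i] - es[i + 1]) > threshold):
            sm[i] = (es[i - 2] + es[i - 1]) / 2.0
    return sm

print(jlassi_filter([220.0, 221.0, 440.0, 220.5, 220.0]))
```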

4. Smart-Median: A Real-Time Pitch Contour Smoother Algorithm

The approach applied in this study to adjust the incorrectly determined pitch values was based on the Median method, and has been named Smart-Median. The Smart-Median method is based on the belief that each contour should be smoothed based on its data features and intended applications. In other words, a general contour smoother may not be suitable for all applications. The considerations for designing the Smart-Median are given below.

4.1. Considerations

The Smart-Median algorithm is based on the following considerations:
  • Only the incorrectly estimated pitches need to be changed. Therefore, it is necessary to decide which jumps in a contour are incorrect.
  • To calculate the median, some of the estimated pitches around the incorrectly detected F0 should be selected. This represents the window length for calculating the median. Therefore, the decision on the number of estimated pitches before and/or after the erroneously estimated pitches provides the median window length. Thus, a delay is required in real-time scenarios to ensure sufficient successive pitch frequencies are available when correcting the current pitch frequency.
  • There is a minimum duration for which a human can sing.
  • There is a minimum duration for which a human can rest between singing two notes.
  • There is a maximum frequency that a human can sing.
  • There is a maximum pitch interval over which humans can move from one note to another when singing.
  • A large pitch interval in a very short time is impossible.
The following section explains our decisions regarding each of the above considerations.

4.2. Smart-Median Algorithm

The flowchart shown in Figure 3 illustrates how incorrectly estimated pitches can be distinguished. In addition, it indicates which estimated pitches should be selected to calculate the median for the wrongly detected pitches.
There are several variables and functions in the flowchart, explained as follows:
  • Fi refers to the frequency at index i.
  • AFD (Acceptable Frequency Difference) indicates the maximum pitch-frequency interval acceptable between two consecutive detected pitches. In two studies on speech contour-smoother algorithms [2,20], 30 Hz was selected as the AFD according to the researchers' experience. Because the frequency range that humans use for singing is wider than for speaking, a larger AFD is needed for singing. In the dataset used, the largest interval between two consecutive notes sung by men was from C4 to F4, at frequencies of approximately 261 Hz and 349 Hz, respectively, giving a maximum interval of 88 Hz for men. The largest interval between notes sung by women was C5 to F5, at frequencies of approximately 523 Hz and 698 Hz, respectively; therefore, the biggest interval for women was 175 Hz. According to our observations of pitch contours, the human voice cannot physically produce such a big jump within a 30 ms timestep; i.e., moving from C4 to F4 or from C5 to F5 takes more than 30 ms. Therefore, an AFD of 75 Hz was found to be an acceptable choice for pitch contours comprised mostly of frequencies below 300 Hz (male voices). For those with frequencies mostly above 300 Hz (female singers), 110 Hz was a good choice of AFD.
  • noZero: the minimum number of consecutive zero pitch frequencies that should be considered a correctly estimated silence or rest. In this study, 50 milliseconds was regarded as the minimum duration for a silence to be accepted as correct [43]; otherwise, the silence is adjusted to the local median value.
  • The ZeroCounter(i) method calculates how many frequencies (pitches) of zero value exist after index i. The reason for checking the number of zero values (silence) is to ascertain whether or not the pitch detector algorithm has estimated a region of silence correctly or in error.
  • Median(i,j): calculates the median based on pitch frequencies from index i to index j.
  • PD (Prior Distance): indicates how many estimated pitches before the current pitch frequency should be considered for the median. In this study, the PD was set to cover three estimated pitch frequencies, approximately 35 and 70 milliseconds for women's and men's voices, respectively. Nevertheless, the algorithm does not need to wait until this duration becomes available; e.g., at a time of 20 milliseconds, covering 20 milliseconds with PD is sufficient.
  • FD (Following Distance): indicates how many estimated pitches after the current pitch frequency should be considered for the median. In this study, the number three was assigned to FD, meaning that to calculate the median of the current wrongly estimated pitch required 35 milliseconds for women’s voices and 70 milliseconds for men’s voices. Therefore, in real-time environments a delay is required until three more estimated pitches are available.
  • MaxF0: indicates the maximum acceptable frequency. In this study, for male voices, a value of 600 Hz (near to tenor) and for female voices, a maximum of 1050 Hz (soprano) were considered for MaxF0. Rarely, male and female voices may exceed these boundaries. However, if the singer’s voice range is higher than these boundaries, a higher value can be considered for MaxF0.
The first condition in Figure 3 determines whether the frequency at index i is valid. Three conditions must hold for an estimated pitch frequency to be considered invalid. First, the previously estimated pitch should not be zero, because after a silence there will naturally be a significant difference between the current pitch frequency and the rest (silence). Second, the absolute difference between the current estimated pitch and the previous one should be greater than the AFD. Finally, the number of consecutive zeros from the current index should be less than noZero; this covers the case where the current value is zero but cannot be considered a proper rest.
If the current estimated pitch is not a valid frequency, the flowchart branches to the right, to "Yes". The algorithm then continues by reducing the value of FD until the second condition is no longer true; in other words, the window for calculating the median shrinks until the difference between the calculated median and the previous point is less than the AFD. Finally, the resulting median is held in the Med variable. If it is less than MaxF0, it is considered a valid replacement value; otherwise, a zero is substituted instead.
Since several incorrectly estimated pitches have been observed after silences, the third condition in Figure 3 checks whether the estimated F0 immediately follows a silence; in this case, the difference between the current estimated F0 and the next estimated F0 is considered instead. If neither the first nor the third condition is true, the estimated F0 is assumed to be accurate and does not need to be changed.
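To make the flowchart more concrete, the sketch below is a much-simplified reading of the Smart-Median logic using the variables defined above (AFD, PD, FD, noZero, MaxF0). The constants are illustrative, noZero is expressed as a frame count rather than a millisecond duration, and the authoritative implementation is the one in the repository.

```python
import statistics

def smart_median(f0, afd=75.0, pd_=3, fd=3, max_f0=600.0, no_zero=3):
    """Simplified sketch of the Smart-Median idea inferred from the flowchart
    description; not the authors' reference implementation."""
    sm = list(f0)
    n = len(f0)

    def zero_run(i):
        """Number of consecutive zero frames starting at index i."""
        k = 0
        while i + k < n and f0[i + k] == 0:
            k += 1
        return k

    for i in range(1, n - 1):
        prev = sm[i - 1]
        # Condition 1: a jump larger than AFD that does not follow a rest and
        # is not part of a long-enough run of zeros is treated as an error.
        suspicious = prev != 0 and abs(f0[i] - prev) > afd and zero_run(i) < no_zero
        # Condition 3: immediately after a silence, compare with the next frame.
        after_silence = prev == 0 and abs(f0[i] - f0[i + 1]) > afd
        if not (suspicious or after_silence):
            continue
        # Shrink the look-ahead window until the local median is close enough
        # to the previous accepted value (or the window cannot shrink further).
        med = f0[i]
        for k in range(fd, 0, -1):
            window = sm[max(0, i - pd_):i] + list(f0[i:i + k + 1])
            med = statistics.median(window)
            if prev == 0 or abs(med - prev) <= afd:
                break
        sm[i] = med if med <= max_f0 else 0.0
    return sm
```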
For more detail, the algorithm’s source code is available from the GitHub repository mentioned above.

5. Results

This section provides the results of the comparisons between the Smart-Median algorithm and the other 35 contour smoothers listed in Section 3. Three groups of data were obtained for evaluation: (1) the ground-truth pitch contour (GT), (2) the original estimated pitches (ES), and (3) the smoothed contour (SM). The metrics explained in Section 2.4 were employed to compare these data groups. The data series were compared two by two, i.e., GT with ES, GT with SM, and ES with SM.
Table A4, Table A5, Table A6, Table A7 and Table A8 show the accuracy of each of the pitch detector algorithms and the accuracy of the contour-smoother algorithms applied to the estimated pitch contours to bring them closer to the ground-truth pitch contour. The GT–ES columns show the initial difference between ground truth and the original estimated pitch contour. Next, the differences between ground truth and the smoothed contours are shown in the GT–SM columns. Finally, the results of comparing the initially estimated pitch contour and the smoothed pitch contour are provided in the ES–SM columns. The metrics comparing GT and SM are more important than those comparing GT–ES and ES–SM, because the values of GT–SM illustrate the improvement supplied by each algorithm. For example, in the Specacf column in Table A7, the first row (smoother algorithm code 00) shows that, according to the FFE metric, GT–ES = 40, GT–SM = 48, and ES–SM = 61. That is, 40 per cent of the pitches estimated by the Specacf algorithm were correct; the smoother algorithm improved this to 48 per cent acceptable data; and 61 per cent of the values in the estimated and smoothed contours remained in the same range, i.e., the smoother algorithm significantly changed 39 per cent of the values.
According to Table A4, Table A5, Table A6 and Table A7, the Smart-Median was the best algorithm for all pitch contours estimated by Specacf, FComb, MComb, Yin, or YinFFT. However, the best accuracy for the pitch contours calculated by Praat was recorded by the contour smoother with code 33 (standard median), and there was no agreement between the metrics on the best smoother for the pitch contours generated by Schmitt or PYin.
Table A8 aggregates all the data in Table A4, Table A5, Table A6 and Table A7. It can be observed in Table A8 that all the metrics agree that the Smart-Median worked better than the other smoother algorithms.
Only the GT–SM column was considered when identifying significant differences between the accuracies of the algorithms. All the algorithms within the range of the column mean plus or minus one standard deviation were considered to exhibit similar accuracy; algorithms with values outside this range were placed in the best or worst category, as shown in Table 2. There were certain agreements and disagreements between the metrics employed to find the best and worst algorithms. For example, smoother code 07 was in the worst category based on the MAE and RMSE metrics, but in the best category based on the FFE metric. These agreements and disagreements are discussed in Section 6.
An ANOVA test was used to check the accuracy of the smoother algorithms. For all the metrics, the p-value calculated for each smoother algorithm was 0. That means that the accuracy of all the smoother algorithms depended on errors that occurred in the pitch contours, i.e., the smoother algorithms did not work with the same accuracy when each pitch contour was affected by different sources of error.

6. Discussion

This section discusses several aspects of the results obtained in Section 5. Because this paper focuses on the Smart-Median method, the only considerations provided here are those relating to comparisons of the accuracy of Smart-Median with that of other smoother algorithms.

6.1. Comparing the Results of Each Metric

A higher R2 value does not always mean a better fitting [44]. For example, Table 3 shows the R-squared scores of three series of predicted data. According to the R-squared (R2) scores in Table 3, the order of the best prediction to the worst was 4, 3, 2, then 1. However, Predict 3 estimated two wrong notes, such that each was one tone above the corresponding ground truth notes (A2 instead of G2), while Predicts 1 and 2 each had only one wrong estimated note (B2 instead of A2). Therefore, musically, the third was the worst, but based on R-squared, it was the second-best. In addition, musically, Predict 1 and Predict 2 were similar, and the 0.2 Hz pitch frequency difference could easily have resulted from a different method of F0 tracking, but their R-squared scores were different. In conclusion, we cannot compare two series of smoothed pitches based only on R-squared.
According to the RMSE and MAE columns in Table 3, the best to worst series were 4, 2, 1, then 3. This order is better than that based on R-squared. However, musically, we need to consider the similarity of Predict 1 and Predict 2; based on the FFE column in Table 3, Predicts 1 and 2 both had the same value. As shown in Table 3, Predict 4 was the best according to all the metrics, and musically, it was also the best. Moreover, although Predicts 1 and 2 were musically similar (FFE metric), Predict 2 was more accurate than Predict 1 (R2, RMSE, and MAE metrics).
To conclude, a single metric alone cannot provide a clear and accurate evaluation to compare pitch contours, but a firm conclusion can be reached by using all of them.

6.2. Comparing Moving Average, Median, Okada, Jlassi, and Smart-Median

The main weakness of the Median, Okada [19], and Jlassi [20] filters is that they only correct errors with a duration of one point in the contour. In other words, if more than one consecutive wrongly estimated pitch occurs within a contour, these algorithms cannot smooth the errors. The following example illustrates the operation of the moving average, Median, Okada, Jlassi, and Smart-Median approaches on a data series.
As shown in Table 4, the moving average and Median methods changed some of the correctly estimated values, i.e., the 102 value which was the second piece of input data. On the other hand, Okada’s and Jlassi’s approaches did not change any of the values, because they look for significant differences with immediately preceding and following points. However, the Smart-Median is mainly concerned with finding an acceptable jump by comparing the current and previous points. Because of this different approach to identification of errors, when the pitch contour was already almost smooth (contours estimated by Praat and PYin) there was no significant difference between the accuracy of these approaches (as seen by comparing rows 00, 33, 34, and 35 in Praat and PYin columns in Table A4, Table A5, Table A6 and Table A7). However, while the pitch contours estimated by the other pitch detection algorithms exhibited several errors, Smart-Median appears to have worked in a meaningful manner that outperformed all other methods (observable in Specacf, Schmitt, FComb, MComb, Yin, and YinFFT columns in Table A4, Table A5, Table A6 and Table A7).
Generally, according to Table A8, the accuracy of Smart-Median based on the four metrics was much better than all the other algorithms.

6.3. Accuracy of the Contour Smoother Algorithms

All the contour-smoother algorithms provided strong results according to the R2 and RMSE metrics (comparing the GT–ES columns with the GT–SM columns in Table A8). However, only the Smart-Median (00), Median (33), and Jlassi (35) approaches changed the pitch contour such that more of the estimated F0 values fell within 20% of the ground-truth pitch contour (Table A4, Table A5, Table A6, Table A7 and Table A8). Therefore, although all the algorithms smoothed contour errors, many also altered the values of correctly estimated pitches.

7. Conclusions

This paper has introduced a new pitch-contour-smoother targeted towards the singing voice in real-time environments. The proposed algorithm is based on the median filter and considers the features of fundamental frequencies in singing. The algorithm’s accuracy was compared with 35 other smoother techniques, and four metrics evaluated their results: R-Squared, Root-Mean-Square Error, Mean Absolute Error, and F0 Frame Error. The proposed Smart-Median algorithm achieved better results across all the metrics, in comparison to the other smoother algorithms. According to this study, a buffer delay of 35 to 70 milliseconds is required for the algorithm to smooth the contour appropriately.
Most of the general smoother algorithms did not show acceptable accuracy. A general observation is that in the ideal case, a smoother algorithm should be defined based on the essential features of the data in the contour and how that data is to be used after smoothing.
For future work, one short-term task is based on recognizing that the parameters of the Smart-Median can be set according to the specific properties of the sound input, such as those of particular musical instruments or their families, to improve accuracy in a targeted way. Another task considers that the Smart-Median finds the incorrect F0 based on its interval from the previous F0; this approach can be improved by considering a maximum noise duration. For example, if there is a considerable frequency interval between the previous F0 and the current one, or if several immediately subsequent F0s are near to the current F0, then we may not consider the large jump to be noise but rather a new musical articulation. This requires the introduction of an extra decision-making stage into the algorithm. In the longer term, further testing can be carried out on vocal material from a wide variety of genres and techniques. This would require the creation of new, specialist corpora, requiring considerable manual effort in both the gathering and labelling. This can be supported by machine learning. Such a dataset would also benefit the research field at large.

Author Contributions

Conceptualization, B.F.; methodology, B.F.; software, B.F.; validation, B.F.; formal analysis, B.F.; investigation, B.F.; resources, B.F.; data curation, B.F.; writing—original draft preparation, B.F.; writing—review and editing, B.F. and J.T.; visualization, B.F.; supervision, J.T.; project administration, B.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the files relevant to this study, including an implementation of the proposed algorithm and the dataset generated by all the algorithms mentioned in this paper, are available online at https://github.com/BehnamFaghihMusicTech/Onset-Detection, accessed on 6 July 2022.

Acknowledgments

Thanks to Maynooth University and the Higher Education Authority in the Department of Further and Higher Education, Research, Innovation and Science in Ireland for their support.

Conflicts of Interest

The authors declare no conflict of interest. In addition, the funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. The Python Libraries Used

Table A1. Python libraries used for pitch detection.
Aubio [26]: YinFFT, FComb, MComb, Schmitt, and Specacf
Librosa [45]: PYin
Table A2. Python libraries used for smoothing pitch contours.
TSmoothie (https://pypi.org/project/tsmoothie/, accessed on 1 February 2022): Exponential, Window-based (Convolution), Direct Spectral, Polynomial, Spline, Gaussian (code 14), Lowess, Decompose, Kalman
Scipy [46]: Savitzky–Golay filter, Gaussian (code 01), Median
Pandas [47]: Moving average
Table A3. Python libraries used for evaluating smoothed pitch contours.
Sklearn [48]: Mean Squared Error, Mean Absolute Error, R2 Score

Appendix B. Comparisons of Contour Smoother Algorithms

Table A4. Comparing pitch estimators and contour-smoother algorithms by ground truth based on the mean absolute error (MAE) metric. GT = Ground Truth, ES = Estimated pitch contour, SM = Smoothed contour.
Algorithm | Specacf | Schmitt | FComb | MComb | Yin | YinFFT | Praat | PYin (GT)
(each pitch-detector column holds three sub-columns: GT-ES, GT-SM, ES-SM)
002561232086861261234210846322858811454361263914140.701.21.2
012562331126863201231225646471558858129161603514140.2033
022562361286864211231225946471558858431661603714140.202.72.7
032562341186862261231226246472058858330261584014131.205.55.5
042562361286864211231225946471558858431661603714140.202.72.7
052562321246863231231226346481758858232561604014140.303.43.4
062562351046864181231225146471358858226661603214140.202.52.5
07256237966864161231224446471158858423761602814140.1022
082562331136864201231225646471558858229161603514140.202.92.9
092562441476869301231378346542458874354561775914140.505.65.6
102562262086877721231381464672645886286246187100144338.2044.344.3
11256227184686855123131122465946588626567617481142113.4023.623.6
12256227181686852123131120465844588648579617479142113.1021.521.5
13256227190687159123133128466352588613580617787142113.9028.728.7
14256226186686856123132127466048588660619617784142214.3024.224.2
15256227189687261123132126466453588580535617684142517.5031.231.2
16256223168686444123123104465337588580478616667141810016.916.9
172562361286864211231225946471558858431661603714140.202.72.7
18256214195686964123126132466255588571569617488142315.8031.131.1
19256227190687159123133128466352588613580617787142113.9028.728.7
20256227190687159123133128466352588613580617787142113.9028.728.7
21264227189686656128129125475847591577519627283142113.7024.524.5
22256227190687159123133128466352588613580617787142113.9028.728.7
232562251366862321231217846502658857638161625214140.807.37.3
242562341166864221231246046481758859932261623914140.203.23.2
252562311376865381231298746543258866548361736114141.7011.911.9
262562451026866201231325746501658870338061724114140.303.63.6
272562291336864301231257646512558862341061675114141.5099
282562351226865231231266546491858863437361664314140.304.54.5
292562361146865271231286746522358865638661714714141.5099
302562441056866211231315946501758869438061724114140.404.44.4
31261235153696239126125884751306005924116261571413209.49.4
32256232976861231231215446471858858025861583514131.305.75.7
332562289668631112310833464565884172016148201414000.30.3
342562261226862201231135446461358851029261543514140.101.91.9
3525623410268649123107364644558841024061461914140000
Table A5. Comparing pitch estimators and contour-smoother algorithms by ground truth based on the R-squared (R2) metric. GT = Ground Truth, ES = Estimated pitch contour, SM = Smoothed contour.
Algorithm | Specacf | Schmitt | FComb | MComb | Yin | YinFFT | Praat | PYin (GT)
(each pitch-detector column holds three sub-columns: GT-ES, GT-SM, ES-SM)
00−28−30−0.5−0.30.7−22−10.3−10.20.7−1153−3−0.4−2210.70.80.84110.970.97
01−28−200.8−0.5−0.21−22−170.8−1−0.91−1153−4620.7−22−100.90.80.81110.990.99
02−28−200.7−0.5−0.30.9−22−170.8−1−0.90.9−1153−5160.6−22−110.90.80.81110.990.99
03−28−200.7−0.5−0.30.9−22−170.8−1−0.90.9−1153−5260.6−22−110.90.80.81110.980.98
04−28−200.7−0.5−0.30.9−22−170.8−1−0.90.9−1153−5170.6−22−110.90.80.81110.990.99
05−28−190.7−0.5−0.20.9−22−160.8−1−0.80.9−1153−4280.6−22−100.90.80.81110.990.99
06−28−200.8−0.5−0.31−22−170.9−1−0.91−1153−5010.7−22−110.90.80.81110.990.99
07−28−210.8−0.5−0.31−22−180.9−1−11−1153−5610.8−22−120.90.80.811111
08−28−200.7−0.5−0.31−22−170.8−1−0.91−1153−4690.7−22−110.90.80.81110.990.99
09−28−200.6−0.5−0.30.9−22−170.8−1−0.90.9−1153−4900.5−22−110.80.80.81110.990.99
10−28−150.4−0.500.6−22−100.5−1−0.40.7−1153−1910.2−22−50.60.80.60.710.790.79
11−28−170.5−0.500.8−22−130.6−1−0.50.8−1153−2390.3−22−60.70.80.8110.930.93
12−28−170.5−0.500.8−22−130.6−1−0.50.8−1153−2670.4−22−70.70.80.8110.940.94
13−28−160.4−0.500.7−22−120.6−1−0.50.8−1153−2040.3−22−60.70.80.8110.90.9
14−28−160.4−0.500.8−22−120.6−1−0.50.8−1153−2450.3−22−60.70.80.79110.930.93
15−28−160.4−0.500.7−22−120.5−1−0.50.7−1153−2380.3−22−60.60.80.770.910.850.85
16−28−170.5−0.500.8−22−130.6−1−0.50.8−1153−2590.4−22−60.70.80.81110.950.95
17−28−200.7−0.5−0.30.9−22−170.8−1−0.90.9−1153−5170.6−22−110.90.80.81110.990.99
18−28−150.4−0.50.20.7−22−100.5−1−0.30.7−1153−1550.3−22−40.60.80.80.910.890.89
19−28−160.4−0.500.7−22−120.6−1−0.50.8−1153−2040.3−22−60.70.80.8110.90.9
20−28−160.4−0.500.7−22−120.6−1−0.50.8−1153−2040.3−22−60.70.80.8110.90.9
21−30−170.4−0.60.10.7−24−120.5−1−0.40.8−1208−1940.3−23−50.70.80.8110.920.92
22−28−160.4−0.500.7−22−120.6−1−0.50.8−1153−2040.3−22−60.70.80.8110.90.9
23−28−180.7−0.5−0.10.9−22−140.8−1−0.60.9−1153−2880.6−22−70.80.80.81110.980.98
24−28−200.7−0.5−0.30.9−22−170.8−1−0.91−1153−4570.7−22−100.90.80.81110.990.99
25−28−180.7−0.500.9−22−140.8−1−0.60.9−1153−3490.7−22−80.80.80.81110.980.98
26−28−210.8−0.5−0.31−22−170.9−1−0.91−1153−5840.8−22−130.90.80.811111
27−28−180.7−0.5−0.10.9−22−140.8−1−0.60.9−1153−3530.6−22−90.90.80.81110.980.98
28−28−190.7−0.5−0.20.9−22−160.8−1−0.80.9−1153−4490.7−22−100.90.80.81110.990.99
29−28−190.8−0.5−0.10.9−22−150.9−1−0.70.9−1153−4680.8−22−110.90.80.81110.990.99
30−28−210.8−0.5−0.31−22−170.9−1−0.91−1153−5630.8−22−130.90.80.81110.990.99
31−31−220.6−0.7−0.30.8−25−180.7−2−10.8−1308−4780.5−25−110.80.80.81110.960.96
32−28−200.8−0.5−0.20.9−22−170.9−1−0.80.9−1153−5010.8−22−110.90.80.81110.990.99
33−28−230.6−0.5−0.41−22−190.8−1−1.11−1153−2660.4−22−120.80.80.811111
34−28−200.6−0.5−0.30.9−22−170.7−1−0.90.9−1153−3890.5−22−90.80.80.81110.990.99
35−28−220.5−0.5−0.40.9−22−200.7−1−1.10.9−1153−3760.4−22−110.80.80.811111
Table A6. Comparing pitch estimators and contour-smoother algorithms by ground truth based on the Root-Mean-Square Error (RMSE) metric. GT = Ground Truth, ES = Estimated pitch contour, SM = Smoothed contour.
Algorithm | Specacf | Schmitt | FComb | MComb | Yin | YinFFT | Praat | PYin (GT)
(each pitch-detector column holds three sub-columns: GT-ES, GT-SM, ES-SM)
00394161370111967325879240966662208615320771945815921213.104.54.5
01394307188111973625820611296843220861342121019413710221211.109.59.5
02394315220111994025821112796863620861417140519414311721211.1010.210.2
03394315201111964825821013196844320861427130619414111521202.4014.814.8
04394315220111994025821112796863620861417140519414311721211.1010.210.2
05394302207111964025820212496843620861291133219413411321211.2010.710.7
063943121761119933258210104968530208613961130194142952121108.68.6
0739432116511110130258216959687272086147510541941488821210.907.77.7
08394308190111983625820611396853220861351122119413810321211.109.59.5
093943112281111014725821114196874220861373148419414112721211.3012.412.4
10394256300111969725816721896919020868181839194117185215346.8060.660.6
11394265277111887725817219296807020869361768194110165212717.5034.834.8
12394266274111887525817319096796720869901741194113161212717.1031.531.5
13394263281111908125817219896827420868511816194109171212717.9040.640.6
14394260280111877925816719696797020869401770194110165212818.6034.334.3
15394266285111958725817620296888120869061800194119175213323.8049.149.1
16394266263111876725817217696776020869641685194110152212413.402828
17394315220111994025821112796863620861417140519414311721211.1010.210.2
18394243288111868625815520596807920867721836194102175212920.5044.244.2
19394263281111908125817219896827420868511816194109171212717.9040.640.6
20394263281111908125817219896827420868511816194109171212717.9040.640.6
21403259283110847826516420096787220388231739199105172212717.8036.736.7
22394263281111908125817219896827420868511816194109171212717.9040.640.6
23394279216111894925818313896784420861071140419411712321211.9014.614.6
24394306192111983825820611696853420861331123519413710521211.209.99.9
25394284203111885325818313796784820861199127819412611621212.9018.518.5
263943221531111003125821594968727208615119541941508121211088
27394287207111924525819012996814120861197133419412711621212.7015.515.5
28394304197111973925820411996843520861332126819413710721211.2010.510.5
2939430316911193392581981089682362086137610521941409321212.5014.314.3
30394319158111993225821296968628208614959781941498421211.108.78.7
31397303247112936526120016897825921091294161319613014721193.902222
3239431116211193402582051069681362086140010441941389321202.5013.413.4
33394319250111103332582191279690262086790159119412412321210.601.51.5
34394305249111974625820614496863920861143156619413213121210.909.19.1
35394317277111105432582201429690282086729177919411712321210.6000
Table A7. Comparing pitch estimators and contour-smoother algorithms by ground truth based on the F0 Frame Error (FFE) metric. GT = Ground Truth, ES = Estimated pitch contour, SM = Smoothed contour.
Algorithm | Specacf | Schmitt | FComb | MComb | Yin | YinFFT | Praat | PYin (GT)
(for each pitch estimator, the three sub-columns are GT-ES, GT-SM, and ES-SM)
004048616769897782848487924552638890979595.199.810099.899.8
014033606764867763728477864536778881859595.110010095.795.7
024035616766907766778479914539838884919595.110010098.398.3
034036646767877767788480914540838884919595.110010098.398.3
044035616766907766778479914539838884919595.110010098.398.3
054034616765887764748478884537798882889595.110010097.497.4
064035656765897765768479894538818882889595.110010097.497.4
074036696766927766808479924539858884919595.110010098.398.3
084034636765897765758478894538808882889595.110010097.497.4
094020416752707747548465734515438864689595.110010084.684.6
104021326746517741428454584513338855579583.386.91007272
114021366749587744468460654517448861639595.299.610080.180.1
124021366749597744478461664517428862649595.299.610081.381.3
134021356747567742448457624516418858609595.299.610076.876.8
144020356748587743458460654513358860629595.299.510080.680.6
154027426755647751548465704526568868709594.598.810083.283.2
164027446756687752558467734526588870729595.299.71008686
174035616766907766778479914539838884919595.110010098.398.3
184022346747547743448456614517418858609595.399.410075.675.6
194021356747567742448457624516418858609595.299.610076.876.8
204021356747567742448457624516418858609595.299.610076.876.8
213821366650587645478360654318478862649595.399.610079.879.8
224021356747567742448457624516418858609595.299.610076.876.8
234022456752707750568464724522598867719595.110010085.185.1
244023496753747753618467754525648870749595.110010086.786.7
254021436751677747538463704516438863669595.199.910083.383.3
264021516753767751618466764519548866709595.110010084.884.8
274022456752717750578465744521568867719595.199.910084.784.7
284022476753737751598466744522598868729595.110010084.684.6
294021496752747750598466744519528866709595.199.910084.884.8
304021506753757750608466764519538866719595.110010084.984.9
313934576666807664728378874437768882879595.299.910097.297.2
324030606760807761708474824532728877829595.110010092.892.8
334042806769957779948484984545958889999595.1100100100100
344034616763817768768477854536788881859595.110010094.794.7
354042826769967780948484984545958889999595.1100100100100
Table A8. Comparing the mean of pitch estimators and contour-smoother algorithms by ground truth based on the four metrics. GT = Ground Truth, ES = Estimated pitch contour, SM = Smoothed contour.
Algorithm | MAE | R2 | RMSE | FFE
(for each metric, the three sub-columns are GT-ES, GT-SM, and ES-SM)
0016559136−175−10.445191426717584
0116516076−175−730.9451313240716481
0216516182−175−810.8451327278716685
0316516081−175−820.8451327264716785
0416516182−175−810.8451327278716685
0516516085−175−680.8451304265716582
0616516169−175−790.9451324224716684
0716516162−175−870.9451338209716687
0816516076−175−740.9451315242716583
09165191127−175−770.8451321296715164
10165181179−175−320.5451228397714551
11165172153−175−390.7451240367715059
12165175153−175−430.7451248361715059
13165172158−175−340.6451228377714857
14165178162−175−400.6451239368714957
15165168152−175−390.6451241379715565
16165161130−175−420.7451243345715667
1716516182−175−810.8451327278716685
18165163160−175−260.6451210384714856
19165172158−175−340.6451228377714857
20165172158−175−340.6451228377714857
21168164147−184−320.6448220366705060
22165172158−175−340.6451228377714857
23165159101−175−470.8451262282715367
2416516482−175−720.9451312246715571
25165176120−175−550.8451283262715163
2616518388−175−910.9451344192715370
27165168104−175−560.8451285268715368
2816517092−175−710.9451311252715469
2916517595−175−730.9451316214715368
3016518289−175−880.9451340197715369
31168163111−199−760.7456303329706580
3216515970−175−780.9451321212716178
3316513252−175−460.8451238307717294
3416514676−175−620.8451284311716581
3516513159−175−610.7451228342717295

References

1. Ferro, M.; Tamburini, F. Using Deep Neural Networks for Smoothing Pitch Profiles in Connected Speech. Ital. J. Comput. Linguist. 2019, 5, 33–48.
2. Zhao, X.; O’Shaughnessy, D.; Nguyen, M.Q. A Processing Method for Pitch Smoothing Based on Autocorrelation and Cepstral F0 Detection Approaches. In Proceedings of the 2007 International Symposium on Signals, Systems and Electronics, Montreal, QC, Canada, 30 July–2 August 2007; pp. 59–62.
3. So, Y.; Jia, J.; Cai, L. Analysis and Improvement of Auto-Correlation Pitch Extraction Algorithm Based on Candidate Set; Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2012; Volume 128, pp. 697–702. ISBN 9783642257919.
4. Faghih, B.; Timoney, J. An Investigation into Several Pitch Detection Algorithms for Singing Phrases Analysis. In Proceedings of the 2019 30th Irish Signals and Systems Conference (ISSC), Maynooth, Ireland, 17–18 June 2019; pp. 1–5.
5. Faghih, B.; Timoney, J. Real-Time Monophonic Singing Pitch Detection. 2022; preprint.
6. Luers, J.K.; Wenning, R.H. Polynomial Smoothing: Linear vs Cubic. Technometrics 1971, 13, 589.
7. Craven, P.; Wahba, G. Smoothing Noisy Data with Spline Functions. Numer. Math. 1978, 31, 377–403.
8. Hutchinson, M.F.; de Hoog, F.R. Smoothing Noisy Data with Spline Functions. Numer. Math. 1985, 47, 99–106.
9. Deng, G.; Cahill, L.W. An Adaptive Gaussian Filter for Noise Reduction and Edge Detection. In Proceedings of the 1993 IEEE Conference Record Nuclear Science Symposium and Medical Imaging Conference, San Francisco, CA, USA, 31 October–6 November 1993; pp. 1615–1619.
10. Cleveland, W.S. Robust Locally Weighted Regression and Smoothing Scatterplots. J. Am. Stat. Assoc. 1979, 74, 829–836.
11. Cleveland, W.S. LOWESS: A Program for Smoothing Scatterplots by Robust Locally Weighted Regression. Am. Stat. 1981, 35, 54.
12. Wen, Q.; Zhang, Z.; Li, Y.; Sun, L. Fast RobustSTL: Efficient and Robust Seasonal-Trend Decomposition for Time Series with Complex Patterns. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, 6–10 July 2020; pp. 2203–2213.
13. Sampaio, M.D.S. Contour Similarity Algorithms. MusMat-Braz. J. Music Math. 2018, 2, 58–78.
14. Wu, Y.D. A New Similarity Measurement of Pitch Contour for Analyzing 20th- and 21st-Century Music: The Minimally Divergent Contour Network. Indiana Theory Rev. 2013, 31, 5–51.
15. Lin, H.; Wu, H.-H.; Kao, Y.-T. Geometric Measures of Distance between Two Pitch Contour Sequences. J. Comput. 2008, 19, 55–66.
16. Chatterjee, I.; Gupta, P.; Bera, P.; Sen, J. Pitch Tracking and Pitch Smoothing Methods-Based Statistical Approach to Explore Singers’ Melody of Voice on a Set of Songs of Tagore; Springer: Singapore, 2018; Volume 462, ISBN 9789811079009.
17. Smith, S.W. Moving Average Filters. In The Scientist & Engineer’s Guide to Digital Signal Processing; California Technical Publishing: San Clemente, CA, USA, 1999; pp. 277–284. ISBN 0-9660176-7-6.
18. Kasi, K.; Zahorian, S.A. Yet Another Algorithm for Pitch Tracking. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Orlando, FL, USA, 13–17 May 2002; Volume 1, pp. I-361–I-364.
19. Okada, M.; Ishikawa, T.; Ikegaya, Y. A Computationally Efficient Filter for Reducing Shot Noise in Low S/N Data. PLoS ONE 2016, 11, e0157595.
20. Jlassi, W.; Bouzid, A.; Ellouze, N. A New Method for Pitch Smoothing. In Proceedings of the 2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Monastir, Tunisia, 21–23 March 2016; pp. 657–661.
21. Liu, Q.; Wang, J.; Wang, M.; Jiang, P.; Yang, X.; Xu, J. A Pitch Smoothing Method for Mandarin Tone Recognition. Int. J. Signal Process. Image Process. Pattern Recognit. 2013, 6, 245–254.
22. Plante, F.; Meyer, G.; Ainsworth, W.A. A Pitch Extraction Reference Database. In Proceedings of the Fourth European Conference on Speech Communication and Technology (EUROSPEECH), Madrid, Spain, 18–21 September 1995.
23. Gawlik, M.; Wiesław, W. Modern Pitch Detection Methods in Singing Voices Analyzes. In Proceedings of Euronoise 2018, Crete, Greece, 27–31 May 2018; pp. 247–254.
24. Mauch, M.; Dixon, S. PYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 659–663.
25. De Cheveigné, A.; Kawahara, H. YIN, a Fundamental Frequency Estimator for Speech and Music. J. Acoust. Soc. Am. 2002, 111, 1917–1930.
26. Aubio. Available online: https://aubio.org/ (accessed on 1 February 2022).
27. Brossier, P.M. Automatic Annotation of Musical Audio for Interactive Applications. Ph.D. Thesis, Queen Mary University of London, London, UK, 2006.
28. Boersma, P.; van Heuven, V. PRAAT, a System for Doing Phonetics by Computer. Glot Int. 2001, 5, 341–347.
29. Boersma, P. Accurate Short-Term Analysis of the Fundamental Frequency and the Harmonics-to-Noise Ratio of a Sampled Sound. In Proceedings of the Institute of Phonetic Sciences; University of Amsterdam: Amsterdam, The Netherlands, 1993; Volume 17, pp. 97–110.
30. Buitinck, L.; Louppe, G.; Blondel, M.; Pedregosa, F.; Mueller, A.; Grisel, O.; Niculae, V.; Prettenhofer, P.; Gramfort, A.; Grobler, J.; et al. API Design for Machine Learning Software: Experiences from the Scikit-Learn Project. arXiv 2013, arXiv:1309.0238.
31. Colin Cameron, A.; Windmeijer, F.A.G. An R-Squared Measure of Goodness of Fit for Some Common Nonlinear Regression Models. J. Econom. 1997, 77, 329–342.
32. Drugman, T.; Alwan, A. Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Florence, Italy, 27–31 August 2011.
33. Savitzky, A.; Golay, M.J.E. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964, 36, 1627–1639.
34. Dai, W.; Selesnick, I.; Rizzo, J.-R.; Rucker, J.; Hudson, T. A Nonlinear Generalization of the Savitzky-Golay Filter and the Quantitative Analysis of Saccades. J. Vis. 2017, 17, 10.
35. Schmid, M.; Rath, D.; Diebold, U. Why and How Savitzky–Golay Filters Should Be Replaced. ACS Meas. Sci. Au 2022, 2, 185–196.
36. Rej, R. NIST/SEMATECH e-Handbook of Statistical Methods; American Association for Clinical Chemistry: Washington, DC, USA, 2003.
37. Braun, S. Windows. In Encyclopedia of Vibration; Elsevier: Amsterdam, The Netherlands, 2001; Volume 2, pp. 1587–1595.
38. Podder, P.; Zaman Khan, T.; Haque Khan, M.; Muktadir Rahman, M. Comparative Performance Analysis of Hamming, Hanning and Blackman Window. Int. J. Comput. Appl. 2014, 96, 1–7.
39. Orfanidis, S.J. Local Polynomial Filters. In Applied Optimum Signal Processing; McGraw-Hill Publishing Company: New York, NY, USA, 2018; pp. 119–163.
40. Wand, M.P.; Jones, M.C. Kernel Smoothing; Chapman & Hall: London, UK, 1995.
41. Dagum, E.B. Time Series Modeling and Decomposition. Statistica 2010, 70, 433–457.
42. Welch, G.F. Kalman Filter. In Computer Vision; Springer International Publishing: Cham, Switzerland, 2021; pp. 1–3. ISBN 9789533070940.
43. Kroher, N.; Gomez, E. Automatic Transcription of Flamenco Singing from Polyphonic Music Recordings. IEEE/ACM Trans. Audio Speech Lang. Process. 2016, 24, 901–913.
44. Lewis-Beck, M.S.; Skalaban, A. The R-Squared: Some Straight Talk. Polit. Anal. 1990, 2, 153–171.
45. McFee, B.; Metsai, A.; McVicar, M.; Balke, S.; Thomé, C.; Raffel, C.; Zalkow, F.; Malek, A.; Dana; Lee, K.; et al. Librosa/Librosa: 0.9.1; 2022. Available online: https://librosa.org/doc/latest/index.html (accessed on 5 February 2022).
46. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272.
47. Reback, J.; McKinney, W.; Jbrockmendel; den Bossche, J.V.; Augspurger, T.; Cloud, P.; Gfyoung; Sinhrks; Klein, A.; Roeschke, M.; et al. Pandas-Dev/Pandas: Pandas 1.0.3; Zenodo: Genève, Switzerland, 2020.
48. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Figure 1. Pitch contours for a female singer performing arpeggios in the C major scale. (a) Pitch contour estimated by the PYin (ground truth), Praat, Yin, and YinFFT algorithms. (b) Pitch contour estimated by the PYin (ground truth), FComb, Schmitt, MComb, and Specacf algorithms.
Figure 2. The effect of each contour-smoother algorithm on a pitch contour from a female singer performing arpeggios in the C major scale. The pitch estimator algorithm was FComb. GT = Ground Truth (PYin), ES = Estimated pitch contour. For easier viewing, the smoothed contours are spread across panels (a–h): panels (a–f) each plot three smoothed contours, while panels (g) and (h) each plot four. Descriptions of the algorithms’ codes are provided in Table 1.
Figure 3. The central part of the Smart-Median algorithm for smoothing a pitch contour.
Table 1. Code of each of the contour smoother algorithms.
Code | Algorithm
00 | Smart-Median
01 | Gaussian (sigma = 1)
02 | Savitzky–Golay filter
03 | Exponential
04 | Window-based (window_type = ‘rectangular’)
05 | Window-based (window_type = ‘hanning’)
06 | Window-based (window_type = ‘hamming’)
07 | Window-based (window_type = ‘bartlett’)
08 | Window-based (window_type = ‘blackman’)
09 | Direct Spectral
10 | Polynomial
11 | Spline (type = ‘linear_spline’)
12 | Spline (type = ‘cubic_spline’)
13 | Spline (type = ‘natural_cubic_spline’)
14 | Gaussian (sigma = 0.2, n_knots = 10)
15 | Binner
16 | LOWESS
17 | Decompose (type = ‘Window-based’, method = ‘additive’)
18 | Decompose (type = ‘lowess’, method = ‘additive’)
19 | Decompose (type = ‘natural_cubic_spline’, method = ‘additive’)
20 | Decompose (type = ‘natural_cubic_spline’, method = ‘multiplicative’)
21 | Decompose (type = ‘lowess’, method = ‘multiplicative’)
22 | Decompose (type = ‘natural_cubic_spline’, method = ‘multiplicative’)
23 | Kalman (component = ‘level’)
24 | Kalman (component = ‘level_trend’)
25 | Kalman (component = ‘level_season’)
26 | Kalman (component = ‘level_trend_season’)
27 | Kalman (component = ‘level_longseason’)
28 | Kalman (component = ‘level_trend_longseason’)
29 | Kalman (component = ‘level_season_longseason’)
30 | Kalman (component = ‘level_trend_season_longseason’)
31 | Moving Average (simple = True)
32 | Moving Average (simple = False)
33 | Median Filter
34 | Okada Filter
35 | Jlassi Filter
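To make the comparison concrete, the short sketch below applies three of the baseline smoother families listed in Table 1 (the median filter, a Gaussian filter with sigma = 1, and a simple moving average) to a toy pitch contour. The specific NumPy/SciPy calls, the window sizes, and the toy contour are illustrative assumptions only; they are not necessarily the implementations or parameter settings benchmarked in this study.

```python
import numpy as np
from scipy.signal import medfilt
from scipy.ndimage import gaussian_filter1d

# A toy estimated pitch contour (Hz) with two octave-style outliers.
f0 = np.array([196.0, 197.5, 392.0, 198.0, 196.5, 98.0, 197.0, 196.0])

median_smoothed = medfilt(f0, kernel_size=3)               # cf. code 33 (Median filter)
gaussian_smoothed = gaussian_filter1d(f0, sigma=1.0)       # cf. code 01 (Gaussian, sigma = 1)
moving_avg = np.convolve(f0, np.ones(3) / 3, mode="same")  # cf. codes 31/32 (moving average)

print(median_smoothed)
print(np.round(gaussian_smoothed, 1))
print(np.round(moving_avg, 1))
```

The median filter removes the isolated outliers, while the Gaussian and moving-average filters only spread them over neighbouring frames, which is one reason the algorithms behave so differently on singing pitch contours.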
Table 2. Dividing the contour smoother algorithms into three categories (best, normal, and worst) based on the standard deviation.
Metric | Best: Code (Value) | Normal | Worst: Code (Value)
MAE | 00 (58.71); 33 (131.85); 35 (131.3) | All the other algorithms (Avg = 162.56, Std = 21.25, Min = 141.31, Max = 183.81) | 09 (190.95)
R2 | 00 (−0.72); 10 (−31.59); 13 (−34.07); 18 (−26.9); 19 (−34.7); 20 (−34.07); 21 (−32.46); 22 (−34.07) | All the other algorithms (Avg = −58.01, Std = 21.98, Min = −79.99, Max = −36.03) | 02 (−80.79); 03 (−82.22); 04 (−80.81); 07 (−87.46); 17 (−80.81); 26 (−90.75); 30 (−87.66)
RMSE | 00 (90.67); 18 (209.56); 21 (220.12) | All the other algorithms (Avg = 275.62, Std = 53.1, Min = 222.52, Max = 328.72) | 07 (338.41); 26 (343.63); 30 (339.93)
FFE | 00 (74.73); 02 (66.21); 03 (66.87); 04 (66.22); 07 (66.48); 17 (66.22); 33 (71.87); 35 (71.99) | All the other algorithms (Avg = 57.59, Std = 8.35, Min = 49.24, Max = 65.94) | 10 (44.83); 13 (48.24); 14 (48.6); 18 (48.47); 19 (48.24); 20 (48.24); 22 (48.24)
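The Normal band in Table 2 spans Avg ± Std for each metric (e.g., 162.56 ± 21.25 gives the MAE range 141.31–183.81), so the grouping can be reproduced with a one-standard-deviation rule. The sketch below illustrates that rule; the small dictionary of MAE means is a made-up subset, not the full set of values from Table A8, and for FFE, where Table 2 places the higher values in the Best column, lower_is_better would be set to False.

```python
import numpy as np

def categorize(scores, lower_is_better=True):
    """Split {code: mean metric value} into best/normal/worst groups using
    the mean +/- one standard deviation band, mirroring Table 2."""
    values = np.array(list(scores.values()))
    avg, std = values.mean(), values.std()
    best, normal, worst = {}, {}, {}
    for code, value in scores.items():
        if value < avg - std:
            (best if lower_is_better else worst)[code] = value
        elif value > avg + std:
            (worst if lower_is_better else best)[code] = value
        else:
            normal[code] = value
    return best, normal, worst

# Illustrative MAE means for a few smoother codes (not the full set of 36).
mae = {"00": 58.71, "33": 131.85, "35": 131.3, "09": 190.95, "17": 161.0, "23": 159.0}
best, normal, worst = categorize(mae, lower_is_better=True)
print(best, normal, worst, sep="\n")
```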
Table 3. Comparison of metrics in different series of predicted data.
Series | 1st | 2nd | 3rd | 4th | 5th | R2 Score | RMSE | MAE | FFE
Ground Truth | 98 (G2) | 98 (G2) | 110 (A2) | 98 (G2) | 98 (G2) | NA | NA | NA | NA
Predict 1 | 98.2 (G2) | 98.2 (G2) | 123.2 (B2) | 98.2 (G2) | 98.2 (G2) | −0.61 | 5.91 | 2.8 | 0.8
Predict 2 | 98 (G2) | 98 (G2) | 123 (B2) | 98 (G2) | 98 (G2) | −0.56 | 5.81 | 2.6 | 0.8
Predict 3 | 98 (G2) | 110 (A2) | 110 (A2) | 110 (A2) | 98 (G2) | −0.33 | 7.59 | 4.8 | 0.6
Predict 4 | 98.2 (G2) | 98.2 (G2) | 110.2 (A2) | 98.2 (G2) | 98.2 (G2) | 0.999 | 0.2 | 0.2 | 1
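The MAE and RMSE columns of Table 3 can be checked with a few lines of NumPy; the snippet below reproduces them and adds a frame-level accuracy under an assumed 50-cent tolerance to mimic the behaviour of the last column. The tolerance value is an assumption made only for this illustration, and the R2 column is not recomputed here.

```python
import numpy as np

ground_truth = np.array([98.0, 98.0, 110.0, 98.0, 98.0])
predictions = {
    "Predict 1": np.array([98.2, 98.2, 123.2, 98.2, 98.2]),
    "Predict 2": np.array([98.0, 98.0, 123.0, 98.0, 98.0]),
    "Predict 3": np.array([98.0, 110.0, 110.0, 110.0, 98.0]),
    "Predict 4": np.array([98.2, 98.2, 110.2, 98.2, 98.2]),
}

for name, pred in predictions.items():
    err = pred - ground_truth
    mae = np.mean(np.abs(err))            # Mean Absolute Error
    rmse = np.sqrt(np.mean(err ** 2))     # Root-Mean-Square Error
    # Fraction of frames within an assumed 50-cent tolerance of the ground truth.
    cents = 1200 * np.abs(np.log2(pred / ground_truth))
    frame_acc = np.mean(cents < 50)
    print(f"{name}: MAE={mae:.1f}, RMSE={rmse:.2f}, frames within tolerance={frame_acc:.1f}")
```

Running this gives MAE/RMSE values of 2.8/5.91, 2.6/5.81, 4.8/7.59, and 0.2/0.2 for Predict 1–4, matching the table, and shows why Predict 4, whose small constant offset never changes the perceived note, scores best on every measure.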
Table 4. An example to illustrate the weakness of the moving average, Median, Okada, and Jlassi algorithms as compared to the Smart-Median.
Input | 100 | 102 | 2000 | 2000 | 100
Moving average (window size = 3) | 734 | 1367.333 | 1366.667 | 1050 | 100
Median (window size = 3) | 102 | 2000 | 2000 | 1050 | 100
Okada | 100 | 102 | 2000 | 2000 | 100
Jlassi | 100 | 102 | 2000 | 2000 | 100
Smart-Median | 100 | 102 | 102 | 102 | 100
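The sketch below reproduces the moving-average and median rows of Table 4 with a forward-looking window of three samples, and adds a deliberately simplistic outlier-rejection rule that happens to produce the same output as the Smart-Median row for this particular input. That last function is only a stand-in to illustrate the behaviour the table highlights; it is not the published Smart-Median algorithm (see Figure 3 and the main text), and the 1.5 jump ratio is an arbitrary assumption.

```python
import statistics

samples = [100, 102, 2000, 2000, 100]

def moving_average(values, size=3):
    # Forward-looking window, truncated at the end of the series (matches Table 4).
    return [sum(values[i:i + size]) / len(values[i:i + size]) for i in range(len(values))]

def median_filter(values, size=3):
    return [statistics.median(values[i:i + size]) for i in range(len(values))]

def naive_outlier_rejection(values, max_ratio=1.5):
    # NOT the published Smart-Median: a simplistic stand-in that repeats the last
    # plausible output whenever the new sample jumps by more than max_ratio.
    out = [values[0]]
    for v in values[1:]:
        prev = out[-1]
        out.append(v if max(v, prev) / min(v, prev) <= max_ratio else prev)
    return out

print(moving_average(samples))           # [734.0, 1367.33..., 1366.66..., 1050.0, 100.0]
print(median_filter(samples))            # [102, 2000, 2000, 1050, 100]
print(naive_outlier_rejection(samples))  # [100, 102, 102, 102, 100]
```

The comparison makes the weakness explicit: once two consecutive outliers appear, the moving average and the median filter propagate them into the output, whereas an approach that tracks the recent plausible pitch history keeps the contour on the correct values.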