1. Introduction
Digital signal processing (DSP) is widely used in different areas, including the processing of musical signals [1,2]. However, different types of signals require tailored approaches. Algorithms that work well for certain signals may produce inaccurate or misleading results when applied to others. For example, a musical tone recorded at a standard sampling rate of 48 kHz is technically a wideband and non-stationary signal, exhibiting rapid changes over time and frequency. For this reason, it is important to adapt DSP techniques to this type of signal and to use algorithms designed with the nature of the signal in mind.
Musical tones generated by real instruments typically consist of a fundamental frequency and a series of partials. These components form a complex tone that can deviate from strict harmonic alignment due to the physical properties of the instrument [3]. Such deviations are especially pronounced in plucked or struck string instruments, where the stiffness of the string causes inharmonicity, a phenomenon in which the partials are not located at exact integer multiples of the fundamental frequency [4,5,6]. In the literature, the phenomenon of inharmonicity is described by the inharmonicity coefficient B. This coefficient is determined not only by the stiffness of the string but also by other physical parameters, such as Young's modulus of elasticity, tension, radius, and the length of the string [7]. Furthermore, a mathematical model has been developed that relates the inharmonicity coefficient B to the partial frequencies fn of the complex tone:

fn = n f0 √(1 + B n²),  (1)

where n denotes the order of the partial, f0 is the fundamental frequency, and B represents the inharmonicity coefficient [8]. It should be noted that in the presence of inharmonicity, the fundamental frequency is not exactly the same as the frequency of the first (fundamental) partial.
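As an illustration of this model, the partial frequency array can be computed directly. The following Python sketch (the MuPI application itself is written in MATLAB; the tone parameters below are illustrative, not taken from the paper) evaluates fn = n f0 √(1 + B n²):

```python
import numpy as np

def partial_frequencies(f0, B, N):
    """Frequencies of the first N partials of an inharmonic tone.

    Implements f_n = n * f0 * sqrt(1 + B * n^2), the stiff-string
    model referred to as Eq. (1) in the text.
    """
    n = np.arange(1, N + 1)
    return n * f0 * np.sqrt(1.0 + B * n**2)

# Example: A2-like tone, f0 = 110 Hz, moderate inharmonicity
f = partial_frequencies(f0=110.0, B=5e-4, N=10)
# Without inharmonicity the 10th partial would lie at 1100 Hz;
# with B = 5e-4 it is shifted upward by roughly 2.5%.
```

With B = 0 the function reduces to the exact harmonic series, which makes it easy to see how quickly the deviation grows with the partial index n.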
Typical values of the inharmonicity coefficient B lie within the range of 10⁻⁵ to 10⁻³. The inharmonicity coefficient B and string properties are related as described by the equation [5]:

B = π³ E d⁴ / (64 l² T),  (2)

where E represents Young's modulus of elasticity (in pascals), T is the wire tension (in newtons), and d and l are the wire diameter and active wire length (in meters).
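A minimal numeric sketch of this relation follows, using the commonly cited stiff-string form B = π³ E d⁴ / (64 l² T); the string parameter values are illustrative assumptions for a steel piano string, not data from the paper:

```python
import math

def inharmonicity_coefficient(E, T, d, l):
    """Inharmonicity coefficient B of a stiff string.

    E: Young's modulus (Pa), T: tension (N),
    d: wire diameter (m), l: active wire length (m).
    Commonly cited form: B = pi^3 * E * d^4 / (64 * l^2 * T).
    """
    return math.pi**3 * E * d**4 / (64.0 * l**2 * T)

# Illustrative steel-string values (assumed): E = 200 GPa,
# T = 700 N, d = 1 mm, l = 0.4 m
B = inharmonicity_coefficient(E=2.0e11, T=700.0, d=1.0e-3, l=0.4)
# B lands in the typical 1e-5 ... 1e-3 range quoted above
```

Note the strong d⁴ dependence: doubling the wire diameter multiplies B by 16, which is why thick bass strings exhibit the most pronounced inharmonicity.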
The perceptual consequences of inharmonicity have been examined from multiple perspectives, including pitch perception [9,10], timbre characteristics [11,12], the perception of musical intervals [13,14,15], and overall tone quality [7,16,17,18]. Studies have also investigated perceptual thresholds for inharmonicity, identifying limits beyond which the sound of an instrument is perceived as unnatural or unpleasant [19], thereby underscoring the role of inharmonicity in tone production and instrument tuning. Notably, inharmonicity alters the spacing between successive partials, which affects pitch perception by disrupting the harmonic template typically used by the auditory system. This effect has been empirically demonstrated in listening experiments with piano tones, where inharmonicity causes a perceived upward shift in pitch relative to harmonic tones, an effect that contributes to tuning practices such as the Railsback stretch [18].
Given the inharmonicity, an inharmonic complex tone {x[k]} can be represented as a sum of its partial components:

x[k] = Σ_{n=1}^{N} An[k] sin(2π (fn + Δfn[k]) k/fs + φn) + rest[k],  (3)

where N is the number of prominent partials, An[k] is a time-varying envelope, fn is the frequency of the n-th inharmonic partial, Δfn[k] is a slowly varying component of the frequency, φn is the signal phase, and fs is the sampling frequency. The component rest[k] contains random noise and other artifacts, including quasi-sinusoidal components with frequencies other than the expected partial values. Those additional sinusoidal components can arise from different phenomena specific to each instrument. The variation of the partial frequency, often referred to as pitch glide, is typically small and is usually neglected in rough signal analyses. This assumption is supported by empirical findings in the analysis of plucked string instruments, where pitch glide occurs primarily during the note onset and decays rapidly, often falling below perceptual thresholds [9,20]. For many DSP applications, including spectral analysis and partial tracking, this small, transient variation can be neglected in models.
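The signal model above can be sketched numerically. The following Python fragment synthesizes a simple inharmonic tone with exponentially decaying partial envelopes; all parameter values (decay rates, phases, number of partials) are illustrative assumptions, and the pitch-glide term Δfn[k] is neglected, as discussed above:

```python
import numpy as np

fs = 48000                      # sampling frequency (Hz)
f0, B, N = 220.0, 3e-4, 8       # illustrative tone parameters
K = fs                          # one second of signal
k = np.arange(K)

x = np.zeros(K)
for n in range(1, N + 1):
    fn = n * f0 * np.sqrt(1.0 + B * n**2)   # inharmonic partial, Eq. (1)
    An = np.exp(-3.0 * n * k / K)           # decaying envelope (assumed shape)
    x += An * np.sin(2 * np.pi * fn * k / fs + 0.1 * n)
x /= np.max(np.abs(x))                      # normalize to full scale
```

Such synthetic tones are convenient for testing analysis chains, because both the partial frequencies and the envelopes are known exactly.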
In this paper, a set of digital signal processing procedures developed for the analysis of string instrument musical tones with prominent inharmonicity is proposed and presented. These procedures are organized in the form of the MuPI (Musical Signal Processing with Prominent Inharmonicity) application, comprising three major modules. The application, developed in MATLAB 2024, is focused on the quick inspection of recorded inharmonic tones following a processing chain. The application is intended for users with a good understanding of musical signal phenomena and basic DSP skills. Each module is designed as a Graphical User Interface (GUI) application.
The approach presented in this paper relies on observing the signal in both the time and frequency domains in parallel. As a result, additional information can be extracted from the signal and included in its description, such as the fundamental frequency, the number of prominent partials, and the inharmonicity coefficient B. The last submodule of the application performs processing of subchannel signals, each corresponding to the frequency band of one prominent partial. Additional features of the time envelopes of single partials can be extracted from the subchannel analysis. For the decomposition of the signal, a multichannel doubly complementary filter bank with an additional phase correction block is proposed, which improves the time alignment between the signals of the subchannels.
To position the proposed MuPI application within the landscape of existing audio signal analysis tools, a comparative evaluation was conducted against three well-established frameworks: LTFAT [21], Essentia [22], and MIRtoolbox [23]. The comparison was performed across several criteria relevant to time–frequency analysis, feature extraction, signal manipulation, and system flexibility (Table 1).
Unlike LTFAT and MIRtoolbox, which primarily rely on standard forms of frequency analysis, MuPI integrates a specific doubly complementary filter bank design, enabling more precise decomposition of signals into partials. Furthermore, while Essentia and MIRtoolbox emphasize high-level musical descriptors (e.g., tempo, key, chord progression), MuPI focuses on domain-specific parameters such as the inharmonicity coefficient and partial envelope extraction, which are not directly supported in the aforementioned tools. In terms of signal manipulation, MuPI offers functionality comparable to LTFAT by providing users with detailed control over subchannel signals and their recombination. This capability is particularly important for preparing stimuli in perceptual listening experiments. While Essentia and MIRtoolbox offer limited or no reconstruction capabilities, MuPI supports ideal reconstruction from filter banks, with a user-defined number of channels, further enhancing its flexibility.
Compared to LTFAT, MuPI is less broad but more focused on the specific task of analyzing a single complex tone of a string instrument. The main goal was to keep the application as simple and user-friendly as possible for users and researchers in the field of musical signal processing who are not deeply familiar with advanced DSP topics. For that reason, the application is designed as an intuitive GUI with some parameters set in advance and a limited number of options.
In Section 2, an overview of the major algorithms used in the developed application is presented. In Section 3, the details of the developed application are provided. Section 4 contains examples and results of the proposed application. In Section 5, a discussion of the results is given, and Section 6 presents concluding remarks.
2. DSP Algorithms Used for the Analysis of String Instruments’ Recordings
The single musical tone signal is a wideband signal with prominent sinusoidal components. The signal is usually short, with very short quasi-stationary intervals. For this reason, the analysis in the developed application is performed in sub-bands: the recorded signal is split into sub-bands. The application supports time and frequency analysis, as well as a set of simple manipulations of the subchannels, i.e., simple modifications of the filtered signals followed by their recombination into the wideband signal.
2.1. Digital Filter Bank
A digital filter bank is used to decompose the signal into subchannels. The goal is to provide insight into the time and frequency parameters of each partial separately. Although any set of filters can be used for this task [2], the designated filter bank has the advantage of being doubly complementary (both all-pass and power complementary) [24]. The filter bank can be multirate [25] or single-rate. The output signals and subchannels are not decimated, allowing users to listen to each subchannel at the same playback sampling frequency as the original signal. The double complementarity of the proposed bank enables the reconstruction of the input signal by simply summing all subchannels.
The filter bank is designed as a tree structure, a cascaded connection of doubly (all-pass and power) complementary two-channel filter banks [26,27], as shown in Figure 1 for four channels. The number of subchannels can be arbitrarily large. In the proposed application, with the additional all-pass filters (A02(z) and A03(z) in Figure 1), this structure preserves the double complementarity property. The sum of the channel signals {xn0[k]} is equal to the input signal filtered by the cascaded connection of all-pass filters:

Σ_{n=0}^{N+1} Xn0(z) = A0total(z) X(z),  (4)

where N + 2 is the number of channels and A0total(z) denotes the cascade of all all-pass filters in the tree. The final block in each channel is the phase correction block. It is implemented by inverse filtering of the subchannel signals by A0total(z) [28]. With this additional phase correction block, the sum of all channel outputs is a zero-phase all-pass filtered input signal. In theory, this means that the input signal is completely restored by the sum of all channels. The only differing samples are at the end of the signal; they cannot be completely avoided due to the limitations of zero-phase filtering. The proposed application is developed for off-line processing of prerecorded signals, so the proposed phase correction can be easily implemented. The proposed method preserves the time differences between the signal components. Additionally, it allows modifications of the signal, such as canceling one of the channels or similar simple manipulations.
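The effect of the phase correction can be illustrated on a single all-pass filter: filtering forward and then backward in time (the off-line inverse filtering mentioned above) cancels the all-pass phase, so the result matches the input except near the signal end. A small Python sketch with a first-order all-pass section (the coefficient value is arbitrary, chosen only for illustration):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# First-order all-pass A(z) = (a + z^-1) / (1 + a z^-1), a assumed 0.5
a = 0.5
b_coeffs, a_coeffs = [a, 1.0], [1.0, a]

y = lfilter(b_coeffs, a_coeffs, x)              # forward (causal) pass
z = lfilter(b_coeffs, a_coeffs, y[::-1])[::-1]  # backward pass cancels the phase

# A(z) * A(1/z) = 1 for an all-pass, so z ~ x; the residual error is
# confined to the end of the signal, as noted in the text.
err_start = np.max(np.abs(z[:900] - x[:900]))
```

The truncated anticausal tail of the backward pass is what produces the differing samples at the signal end described above; away from the end, the reconstruction is exact to machine precision.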
Each two-channel filter bank is obtained by spectral transformation of a prototype half-band filter bank [24,29]. The prototype low-pass filter is designed as an odd-order Butterworth filter or an elliptic minimal Q-factor (EMQF) filter [30,31]. The poles of the real-coefficient half-band low-pass transfer function HLP(z) are located on the imaginary axis. One of them is located at the origin, and the remaining poles form complex conjugate pairs. In that case, the complementary filter pair, the low-pass filter HLP(z) and the high-pass filter HHP(z), can be implemented as a parallel connection of two all-pass filters, A0(z) and A1(z) [31]:

HLP(z) = [A0(z) + A1(z)]/2,  HHP(z) = [A0(z) − A1(z)]/2.  (5)

The all-pass branches consist of a cascade connection of simple second-order all-pass sections, with an additional delay corresponding to the pole located at the origin in the A1(z) branch:

A0(z) = Π_{m even} (βm + z⁻²)/(1 + βm z⁻²),  A1(z) = z⁻¹ Π_{m odd} (βm + z⁻²)/(1 + βm z⁻²),  (6)

where βm is the square of the modulus of the pole zm, m = 2, 3, …, (M + 1)/2, and β1 = 0 (the pole at the origin). The odd half-band filter order M is calculated based on the required stop-band attenuation As and the required pass-band edge frequency ωp0. As can be concluded from (6), the zeros of the all-pass transfer functions are reciprocal to the poles.
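The double complementarity of such a pair can be checked numerically. The sketch below uses a third-order Butterworth half-band pair, whose branches are the textbook choice A0(z) = (1/3 + z⁻²)/(1 + z⁻²/3) and A1(z) = z⁻¹ (the specific coefficient 1/3 is a standard textbook value, not taken from the paper):

```python
import numpy as np
from scipy.signal import freqz

w = np.linspace(0, np.pi, 512, endpoint=False)

# All-pass branches of a 3rd-order Butterworth half-band filter
H0 = freqz([1/3, 0, 1], [1, 0, 1/3], worN=w)[1]  # (1/3 + z^-2)/(1 + z^-2/3)
H1 = freqz([0, 1], [1], worN=w)[1]               # pure delay z^-1

Hlp = 0.5 * (H0 + H1)     # low-pass:  half the sum of the branches
Hhp = 0.5 * (H0 - H1)     # high-pass: half the difference

power = np.abs(Hlp)**2 + np.abs(Hhp)**2  # power complementary -> 1
allp = np.abs(Hlp + Hhp)                 # all-pass complementary -> 1
```

Because |A0| = |A1| = 1 on the unit circle, both complementarity properties hold identically, which is exactly what allows reconstruction of the input by summing the channels.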
The advantage of Butterworth and EMQF filters is that this half-band filter pair (a simple two-channel filter bank) can be easily transformed into a filter bank with an arbitrary crossover frequency [24,27,30]. The only additional parameter needed for the spectral transformation is the exact value of the crossover frequency ωc. By the transformation, each second-order section (6) is transformed into a section with changed poles and zeros:

A′m(z) = (c2m + c1m z⁻¹ + z⁻²)/(1 + c1m z⁻¹ + c2m z⁻²),  (7)

where c1m = −2α(1 + βm)/(1 + βm α²), c2m = (βm + α²)/(1 + βm α²), and α = (1 − tan(ωc/2))/(1 + tan(ωc/2)).
The trivial first-order section (the delay z⁻¹) is transformed into a non-trivial all-pass section:

A′1(z) = (z⁻¹ − α)/(1 − α z⁻¹).  (8)
The transformation procedures (7) and (8) rely on the spectral transformation of digital filters [32], which can be applied to different classes of filters. In the proposed application, the starting half-band filter is limited to either the Butterworth or EMQF type, because for these filter types the coefficient α is the same in all second-order sections (7), allowing rapid calculation of the resulting filter pair. By the transformation, the pass-band edge frequency of the resulting low-pass filter ωp is set to a value determined by the following [24]:

tan(ωp/2) = ξ tan(ωc/2),  (9)

where ωc is the crossover frequency of the transformed filter pair, and ξ is the selectivity parameter determined by the pass-band edge frequency ωp0 of the prototype low-pass half-band filter,

ξ = tan(ωp0/2).

The stop-band edge frequency of the resulting low-pass filter ωs is determined by the following:

tan(ωs/2) = tan(ωc/2)/ξ.  (10)

It can be seen from (9) and (10) that the transition band of the resulting filter pair depends on its crossover frequency and on the pass-band edge frequency of the prototype low-pass half-band filter.
In the application presented in this paper, the array of crossover frequencies ωc is calculated based on the estimated positions of the partials, for determined values of the fundamental frequency f0 and inharmonicity coefficient B, as a normalized geometric mean of two successive partials:

ωc,n = 2π √(fn fn+1) / fs,  (11)

where fn, n = 1, 2, …, N + 1, are the frequencies (in Hz) calculated as in (1), and fs is the sampling frequency of the recorded signal. It should be noted that in the recorded signal, some partials can be totally suppressed (i.e., missing). The 3 dB bandwidth of channel n is defined by the crossover frequencies ωc,n and ωc,n+1. In the proposed application, the total number of channels is the number of partials N enlarged by 2. The first channel (with index zero) contains the components of the signal below the first partial, i.e., the pitch frequency, and the last channel contains the higher-frequency components of the signal, usually with no prominent partials. However, the same structure of the filter bank can be used for different analysis scenarios, including octave or uniform bank designs. The overall computational complexity depends on the order of the prototype filter and the number of channels. In the current form of the application, each additional channel requires an additional filter pair, obtained from the prototype filter by applying the transformations (7) and (8). The number of channels used in the testing phase of the design was up to 90, with stop-band attenuations of up to 80 dB and ωp0 in the range [0.45, 0.495]. Due to the inherent properties of the Butterworth filter, its applicability is limited to stop-band attenuations of up to 60 dB and ωp0 up to 0.45.
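A short sketch of this crossover computation (the geometric mean of successive partial frequencies, normalized by the sampling frequency; tone parameters below are illustrative):

```python
import numpy as np

def crossover_frequencies(f0, B, N, fs):
    """Normalized crossover frequencies between successive partials.

    Each crossover is placed at the geometric mean of two neighboring
    inharmonic partial frequencies (cf. the description of Eq. (11)).
    """
    n = np.arange(1, N + 2)                 # partials 1 .. N+1
    fn = n * f0 * np.sqrt(1.0 + B * n**2)   # Eq. (1)
    return 2 * np.pi * np.sqrt(fn[:-1] * fn[1:]) / fs

wc = crossover_frequencies(f0=110.0, B=5e-4, N=12, fs=48000)
# wc is strictly increasing and lies in (0, pi) for a valid bank design
```

Placing each crossover at the geometric mean keeps every partial roughly centered (on a logarithmic frequency axis) within its own channel, even as the inharmonic partials drift upward with n.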
The proposed design of the filter bank is suitable for analyzing a single tone of a harmonic musical signal, with or without noticeable inharmonicity, and for different numbers of prominent partials and fundamental frequencies within a wide range.
2.2. The Short-Time Fourier Transform
The single tone musical signal is a non-stationary signal. For these types of signals, analysis is typically performed frame by frame. Frame-by-frame processing is included in the proposed application, primarily for time-dependent spectrum estimation.
The Short-Time Fourier Transform (STFT) is used for spectral analysis [1]:

X(e^jω, p) = Σ_{l=0}^{L−1} w[l] x[l + pR] e^(−jω(l+pR)),  (12)

where w[l], l = 0, 1, …, L − 1, is the window function of length L, p is the frame index, and R (in samples) is the shift between successive frames. The expression can be transformed into the form

X(e^jω, p) = e^(−jωpR) Σ_{l=0}^{L−1} w[l] x[l + pR] e^(−jωl),  (13)

and it can be implemented by the Discrete Fourier Transform (DFT) of each frame as follows:

X[i, p] = Σ_{l=0}^{Nfft−1} w[l] x[l + pR] e^(−j2πil/Nfft), i = 0, 1, …, Nfft − 1,  (14)

where Nfft is the length of the DFT. If it is set to a value greater than L, the time array is zero-padded. Throughout this paper, the term STFT is used for the time-dependent analysis, and the term DFT is used for a single frame or the whole-length signal.
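The frame-wise DFT implementation described above can be sketched as follows (a plain NumPy version; the application itself is written in MATLAB, and the signal and window parameters below are illustrative):

```python
import numpy as np

def stft_frames(x, w, R, Nfft):
    """STFT as a series of zero-padded DFTs of windowed frames.

    x: signal, w: window of length L, R: frame shift in samples,
    Nfft >= L: DFT length (zero padding refines the frequency grid).
    """
    L = len(w)
    P = 1 + (len(x) - L) // R                 # number of full frames
    X = np.empty((P, Nfft), dtype=complex)
    for p in range(P):
        frame = w * x[p * R : p * R + L]
        X[p] = np.fft.fft(frame, n=Nfft)      # zero-padded when Nfft > L
    return X

# Example: 440 Hz tone, Hann window, 8x zero padding
fs, L = 48000, 1024
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
X = stft_frames(x, np.hanning(L), R=L // 8, Nfft=8 * L)
```

As noted in the text, zero padding interpolates the spectrum and sharpens peak localization, but it does not improve the underlying resolution set by the window length L.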
Currently, two types of window functions are supported: Hann and Blackman–Harris. The frame shift R is chosen based on the Constant Overlap-Add (COLA) condition [1]:

Σ_{p=−∞}^{∞} w[l − pR] = C, for all l,  (16)

where C is a constant. In the current version of the application, the relative shift R/L is set to 0.125 because, for that value, condition (16) is satisfied for both supported window functions. In the current form of the application, the inverse STFT is not used, and for that reason the COLA condition is not strictly required. However, if additional modules relying on the inverse STFT are included, this issue will be considered more carefully.
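This choice can be checked with scipy.signal.check_COLA, which verifies the constant overlap-add property numerically (the window length of 512 samples below is an arbitrary illustrative choice):

```python
from scipy.signal import check_COLA

L = 512
hop = L // 8            # relative shift of 0.125, as in the application
noverlap = L - hop      # check_COLA takes the overlap, not the hop

# Both supported window types satisfy COLA at a 1/8-window hop
hann_ok = check_COLA("hann", L, noverlap)
bh_ok = check_COLA("blackmanharris", L, noverlap)
```

Both windows are short sums of cosine terms, so their shifted copies sum to a constant exactly at this hop; the same check would fail for larger hops with the Blackman–Harris window, which is why a common, conservative shift is used for both.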
In the presented application, an overlapping DFT series is used for visualization instead of the typically used spectrogram. The reason is that the STFT is calculated only for a limited number of frames and is used to track the frequency glide of the signal's partials. A window length chosen for good time resolution is usually not sufficient for satisfactory resolution in the frequency domain. For the detection of the positions of the maxima of a channel signal's spectrum, zero padding improves the result; however, it does not resolve the resolution problem, i.e., components close in frequency remain difficult to resolve.
2.3. Envelope Estimation
By the proposed digital filter bank, an array of N + 2 subchannel signals is obtained, {xn[k]}, k = 0, 1, …, K − 1, n = 0, 1, …, N + 1. Each element of the array is a signal of the same length K as the selected segment of the recorded signal. If the bank parameters are properly chosen, for channels 1 to N, the content corresponds to one harmonic partial of the signal. The signals xn[k] can be further analyzed in the time and frequency domains. The estimation of the envelope of a musical signal can be performed in various ways; an overview of methods is presented in [33,34]. Bearing in mind that in this case the signals are prefiltered, i.e., can be classified as narrow-band signals, in the proposed application the estimation by the Hilbert transform is applied:

An[k] = √(xn²[k] + x̂n²[k]),  (17)

where x̂n[k] is the Hilbert transform of xn[k]. In some cases, it is useful to express the envelopes mathematically. In the proposed application, the approximation equation is a modified version of the expression proposed in [35]:
where the parameters An1–An5 are calculated by fitting the envelope An[k] in MATLAB with the fit function [36]. In Table 2, the starting, minimum, and maximum input values for all parameters are provided. The other input parameters of the fit function are set to their default values.
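The Hilbert-transform envelope estimation of Eq. (17) can be sketched in a few lines: scipy.signal.hilbert returns the analytic signal, whose magnitude is the envelope. The test signal below (one partial with an exponentially decaying envelope) is an illustrative stand-in for a subchannel signal:

```python
import numpy as np
from scipy.signal import hilbert

fs = 48000
k = np.arange(fs)                        # one second of signal
# Narrow-band test signal: a single 440 Hz partial with a known envelope
true_env = np.exp(-2.0 * k / fs)
xn = true_env * np.sin(2 * np.pi * 440 * k / fs)

analytic = hilbert(xn)                   # xn + j * Hilbert{xn}
An = np.abs(analytic)                    # envelope estimate, Eq. (17)

# Away from the signal edges the estimate tracks the true envelope
err = np.max(np.abs(An[1000:-1000] - true_env[1000:-1000]))
```

The estimate is accurate precisely because the subchannel signals are narrow-band, as noted above; for wideband signals the analytic-signal magnitude would not separate carrier from envelope this cleanly.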
The proposed approach is simplified compared to [35] because the signal is not split into the segments traditionally used in music tone analysis and synthesis [37]. Instead, an additional component is inserted into (18). The reason is that the proposed application is intended for the rapid automatic processing of recordings, with only a very limited set of allowed modifications and recombinations of subchannel signals. On the other hand, the extracted envelopes are usually not perfect, i.e., additional smoothing or other preprocessing is needed before split-point detection [33,35]. With that in mind, expression (18) is chosen as a compromise between a perfect envelope fit and the complexity of the solution. This form of the envelope expression has been experimentally shown to be sufficient for the automatic processing of most test signals in the expected scenario of application usage. However, if different approaches are needed, the modular form of the current application allows adding new modules or replacing the third submodule that uses the described envelope approximation.
2.4. Inharmonicity Coefficient Estimation from the Recorded Signal
The presented application is primarily designed for analyzing musical tones produced by string musical instruments. For the plucked or struck string instruments, the inharmonicity is not negligible. Although the inharmonicity analysis is not the primary goal of the presented application, it must be estimated because the design of the filter bank relies on the values of the partial frequencies. Even if it seems that the variations in position raised by the inharmonicity are small, for the signals with a long array of prominent partials, the misadjustment between harmonic and inharmonic array can be significant for the larger index partials. For this reason, it is necessary to estimate the value of the inharmonicity coefficient
B before designing the filter bank. In this application, it is achieved through simple visual matching between spectral coefficients and an array of frequencies obtained with the assumed inharmonicity coefficient value. By manually changing the value of
B, a result that is good enough for the filter bank design can be obtained. In the second step, the result is verified once again. After the filtering of the output signal, the spectral maximums of each subchannel are estimated. Based on those values, the inharmonicity coefficient is recomputed using the algorithm proposed in [
38]. A mismatch of the values can be used as the cause for further analysis of the recorded signal. The algorithm [
38] is simple and suitable for automatic processing. It begins with an initially presumed value of inharmonicity coefficient and an array of frequencies corresponding to partials of the signal extracted from the recording before the core algorithm starts. The length of the partial frequency array depends on the instrument and can range from just a few to more than 50 components. The algorithm is an iterative procedure that calculates the new inharmonicity coefficient value in each iteration, thereby minimizing the difference between the actual partial frequencies estimated from the recorded signal and the array of frequencies calculated using (1) in each iteration. In the presented application, it is implemented as an additional module that can be activated after the filter bank is designed.
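A simplified sketch of such an estimation step follows. It is not the exact iterative PFD procedure of [38]; instead, f0 and B are obtained here by a least-squares fit of the model (1) to a measured partial frequency array, which illustrates the same underlying idea of minimizing the deviation between measured and modeled partials:

```python
import numpy as np

def estimate_f0_B(fn, n=None):
    """Least-squares fit of f_n^2 = f0^2 * n^2 + (f0^2 * B) * n^4.

    fn: measured partial frequencies (Hz); returns (f0, B).
    A simplified stand-in for the iterative procedure of [38],
    exploiting that Eq. (1) squared is linear in (f0^2, f0^2 * B).
    """
    fn = np.asarray(fn, dtype=float)
    if n is None:
        n = np.arange(1, len(fn) + 1)
    A = np.column_stack([n**2, n**4])
    c1, c2 = np.linalg.lstsq(A, fn**2, rcond=None)[0]
    return np.sqrt(c1), c2 / c1

# Synthetic check: partials generated with known f0 and B are recovered
n = np.arange(1, 21)
fn = n * 110.0 * np.sqrt(1.0 + 2e-4 * n**2)
f0_est, B_est = estimate_f0_B(fn, n)
```

Passing the partial indices explicitly matters in practice: as noted above, some partials may be missing from the recording, and the fit must then skip the corresponding values of n.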
To evaluate the performance of the MuPI implementation of the algorithm [38] for estimating the inharmonicity coefficient, a comparative test was conducted using reference data published in the original paper [38]. In principle, the algorithm consists of two steps: estimation of the partial frequency array and iterative evaluation of the inharmonicity coefficient B. For the second step, the reference paper introduced a partial frequency deviation (PFD) algorithm and validated it using synthetic piano tones with known inharmonicity profiles for key numbers 1–35 (f0 ∈ [27.5, 195.99] Hz). The same synthetic dataset was analyzed using the MuPI implementation of the inharmonicity estimation method, applied to the array of frequencies estimated as the maximal spectral values of each subchannel. The results obtained from MuPI were compared against the original synthetic reference curve published in the paper.
Figure 2 illustrates the comparison: the solid black line represents the reference value of the inharmonicity coefficient B used for the synthetic piano model, while the red dots denote the values estimated by the MuPI implementation.
The visual overlap of the two curves confirms that the MuPI implementation of the algorithm [38] closely matches the target values. Quantitatively, all estimated inharmonicity coefficients are within ±2% of the reference values, indicating high numerical accuracy. In particular, the shape of the inharmonicity profile, characterized by a decreasing trend in the bass range and an increase above key number 28, is faithfully reproduced.
This comparison validates the accuracy of the MuPI implementation of the estimation procedure and demonstrates that MuPI's concept is suitable for fine-grained spectral parameter analysis of inharmonic tones.
5. Discussion
In Section 3, three different recordings were used to present the different algorithms included in the current form of the application. In Table 4, the major characteristics of the analyzed recordings that are important for the present experiment are summarized.
The characteristics of the signals differ in the number of prominent partials and the values of the inharmonicity coefficient B. The harp tone is characterized by a low number of prominent partials as well as the presence of other sinusoidal components near the expected positions of the partials. Those components can influence the estimation of the frequency array that is used as input to the PFD algorithm for estimating the inharmonicity coefficient B. By simultaneous inspection of the signal in time and frequency, as described in Example 2, the problem can be detected and even corrected by replacing the "corrupted" partial with the expected one.
The piano tone is usually easily recognized. Because of that, the example based on the piano tone is used to illustrate the tool for the simple recombination of subchannels, which can be used to modify the signal in a controlled manner. The realized application can thus serve as a signal preparation tool for subjective tests.
The harpsichord recording is characterized by a large number of prominent partials, requiring a large number of channels in the filter bank design. It is shown that the proposed approach can provide the required bank.
All presented examples illustrate the scope of the proposed approach based on the decomposition of the signal into channels corresponding to signal partials.
6. Conclusions
In this paper, the analysis of single-tone signals from string musical instruments based on subchannel decomposition is presented. With this approach, the complex signal is divided into an array of less complex signals, and further analysis of each channel is performed simultaneously in both the time and frequency domains. The decomposition of the signal is performed by a non-uniform multichannel filter bank with additional phase correction, preserving the time relations between the channels. It is expected that each subchannel contains only one strong quasi-sinusoidal component. Sometimes a subchannel signal has an additional strong quasi-sinusoidal component, such as a phantom partial. Due to the relatively short intervals of quasi-stationarity in the musical signal, it is challenging to detect and accurately verify the presence of such irregularities. The STFT is a powerful tool for analyzing non-stationary signals; however, due to the short window lengths, peaks that are close in frequency are not separable. In the presented approach, the combination of time and frequency analysis of each channel with the analysis of the overall signal improves the results. MuPI enables more accurate analysis of tone signals thanks to a specialized filter bank design that is not present in applications of similar purpose. Unlike other tools, it is focused on parameters such as inharmonicity and on signal reconstruction, which makes it suitable for the analysis of stringed instruments.
The presented application features a modular structure, enabling the implementation of various additional modules. In future work, an additional processing block can be developed that relies on the extrapolation of the signal as a tool for improving the resolution of the STFT [39].
The parameter M2M is calculated as a reasonable measure of the prominence of a partial. However, it is currently only reported as a value and not fully used in the application. In further work, it can serve as a tool for excluding non-prominent or completely "missing" partials from the evaluation of the signal parameters, including the inharmonicity coefficient B. For the proposed structure of the filter bank, detecting two strong partials in the same band is not expected. Future work will also focus on finding a quick and simple tool for the detection of channels with more than one frequency peak.