Article

Recasting the (Synchrosqueezed) Short-Time Fourier Transform as an Instantaneous Spectrum

by Steven Sandoval * and Phillip L. De Leon
Klipsch School of Electrical and Computer Engineering, New Mexico State University, Las Cruces, NM 88003, USA
* Author to whom correspondence should be addressed.
Entropy 2022, 24(4), 518; https://doi.org/10.3390/e24040518
Submission received: 13 January 2022 / Revised: 14 March 2022 / Accepted: 29 March 2022 / Published: 6 April 2022
(This article belongs to the Special Issue Time-Frequency Analysis, AM-FM Models, and Mode Decompositions)

Abstract

In a previous work, we proposed a time-frequency analysis called instantaneous spectral analysis (ISA), which generalizes the notion of the Fourier spectrum and in which instantaneous frequency is utilized to the fullest extent. In this paper, we recast both the Fourier transform (FT) and filterbank (FB) interpretations of the short-time Fourier transform (STFT) as instantaneous spectra. We show that to recast the FB interpretation of STFT as an instantaneous spectrum (IS) with valid structure, frequency reassignment is a fundamental necessity, thus demonstrating that this IS is closely related to the synchrosqueezed STFT. This result provides a new theoretical motivation for the synchrosqueezed STFT. Finally, we illustrate the instantaneous spectra corresponding to the FT and FB interpretations of STFT using two closed-form examples.

1. Introduction

In Gabor’s seminal work [1], the notion of joint time-frequency analysis was proposed and led to the development of the short-time Fourier transform (STFT) [2,3]. Even today, STFT is the most well-known and utilized method in time-frequency analysis [4,5,6,7,8,9], and extensions such as the synchrosqueezing transform (SST) and synchrosqueezed STFT are still actively being investigated [10]. Synchrosqueezing and reassignment are typically motivated as post-processing techniques in order to improve readability in the time-frequency plane [11,12] or as refinements based on phase information.
In [13], we developed a generalized framework for time-frequency analysis which we termed instantaneous spectral analysis (ISA) and proved that the instantaneous spectrum (IS) exactly localizes signal components in an instantaneous bandwidth sense. Using ISA, we are able to define more general time-frequency spectra than is possible using STFT. This is a result of the basic time-frequency atom or component utilized in these methodologies. More specifically, ISA theory allows the use of an AM–FM component whereas STFT uses a far more restrictive component. Moreover, IS theory allows the unambiguous specification of an IS $S(t,\omega)$ and a related complex-valued signal $z(t)$ corresponding to a component set $\mathcal{S}$
$$\mathcal{S} \longmapsto S(t,\omega) \longmapsto z(t).$$
Although this may be considered as an ideal synthesis model for instantaneous spectra and AM–FM models, the many-to-one mappings are sources of information loss which allow an infinite number of instantaneous spectra and component sets to map to the same signal. As a result, the reverse process is underdetermined and no unique analysis model exists.
One approach which can be taken for the analysis stage is to consider two stages: (1) signal decomposition and (2) component demodulation
[Equation image not recovered: diagram of the two-stage analysis of $z(t)$, signal decomposition followed by component demodulation]
where all ambiguity lies in how the signal $z(t)$ is decomposed into a set of components; there exist an infinite number of ways to express a whole as a sum of parts. However, every decomposition has the advantage that it can be associated with an instantaneous spectrum in which each component is exactly localized. Thus, ISA can be immediately used to enhance existing decomposition methods with an associated instantaneous spectrum with exact time-frequency localization. However, while the distinct separation of the analysis stage into decomposition and demodulation is well suited for pairing ISA with decomposition methods and AM–FM models, alternate approaches to the analysis stage exist.
For example, there exists a great body of literature devoted to the study of time-frequency distributions (TFDs), and although TFDs and ISs are both mathematical objects which describe signals in time-frequency spaces, they describe different types of spaces. Thus, it is natural to seek to establish the connection of particular TFDs as special cases of ISs, if and when this is possible. Let $T$ denote an integral transformation relating a complex-valued signal $z(t)$ and a TFD $Z(t,\omega)$, and $T^{-1}$ denote its inverse, assuming it exists
$$z(t) \;\underset{T^{-1}}{\overset{T}{\rightleftarrows}}\; Z(t,\omega).$$
The focus of traditional TFD analysis may be considered as the theory regarding the existence and mathematical properties for different choices of the transformation $T$. ISA theory can be used in conjunction with other time-frequency methods as a framework to pose and address other important questions. For example, suppose we wish to study the performance of a particular integral transformation $T$ on a particular class of signals. If we represent the particular class of signals by constraints on the form of the components in $\mathcal{S}$, then we can compare the IS obtained by recasting the time-frequency distribution $Z(t,\omega)$, denoted by $\hat{S}(t,\omega)$, by considering
[Equation image not recovered: diagram relating the component set $\mathcal{S}$, the IS $S(t,\omega)$, the signal $z(t)$, the TFD $Z(t,\omega)$, and the recast IS $\hat{S}(t,\omega)$]
and comparing $\hat{S}(t,\omega)$ to $S(t,\omega)$.
The purpose of this work is to show how to enforce the structure necessary to recast the time-frequency distribution $Z(t,\omega)$ to an instantaneous spectrum $\hat{S}(t,\omega)$, if we choose the integral transformation $T$ as the STFT. We show that there are two ways to do the recasting, each corresponding to one of the two classic interpretations of STFT. Our contributions are as follows:
  • We show that the two equivalent STFT interpretations lead to different ISs, and thus provide new insights into STFT. In particular, we show that the IS corresponding to the FT interpretation of STFT corresponds to an IS for each window grain, while on the other hand, a single IS corresponding to the FB interpretation of STFT exists when STFT is synchrosqueezed [14,15]. As a result, in the FT interpretation, the components have a restrictive fixed amplitude and fixed frequency, while in the FB interpretation, the components are AM–FM in nature. This results in significant conceptual and practical differences between the two interpretations.
  • We contribute a new theoretical motivation for synchrosqueezing. In particular, in order to recast the FB interpretation as an IS, we show that reassignment in frequency is a fundamental requirement. This is in contrast to the view of synchrosqueezing largely as a heuristic approach to improve energy concentration in the time-frequency plane.
  • By recasting the two STFT interpretations as an IS, we can leverage the 3D IS visualization [13] to contribute a novel visualization of STFT. This is advantageous because the 3D IS allows visualization of multiple aspects of the signal decomposition simultaneously, including both magnitude and phase of each signal component. While it may take the reader some time to become comfortable with the 3D visualization, we believe it has significant advantages in terms of interpretability and note that the STFT phase spectrum is almost never considered or visualized.
For the benefit of the reader, this paper reviews key concepts of STFT and synchrosqueezed STFT; however, this work is not intended to be a review paper. For such reviews, we refer the reader to [2,3] for STFT and [14,16] for reassignment and synchrosqueezing. Rather, this paper recasts STFT as an IS, which we believe to be a more powerful signal analysis framework. The remainder of this paper is organized as follows. In Section 2, we introduce our notation for the continuous STFT and provide the expressions specifically related to the FT and FB interpretations. In Section 3, we provide a brief history of the development of reassignment techniques leading to the synchrosqueezed STFT. In Section 4, we provide a brief summary of ISA theory. However, as this work is an extension of our prior work, it is strongly recommended that [13] be read in advance. In Section 5, we give our main contribution by recasting the FT and FB interpretations of STFT as an IS. In Section 6, we provide illustrations and discussion on the relationships of the FT and FB interpretations of STFT to IS for example signals. Finally, in Section 7, we provide concluding remarks.

2. The Short-Time Fourier Transform

In this section, we review the (continuous) STFT following the development of the discrete STFT by Allen and Rabiner [2]. We begin by choosing a real, even-symmetric window function $w(\cdot)$ such that
$$\int_{-\infty}^{\infty} w(\tau)\,d\tau = 1. \tag{1}$$
Even symmetry ensures that
$$w(t-\tau) = w(\tau-t) \tag{2}$$
and, with the normalization in (1), superimposing all “window grains” $z(t)\,w(\tau-t)$ over $\tau$ gives
$$z(t) = \int_{-\infty}^{\infty} z(t)\,w(\tau-t)\,d\tau. \tag{3}$$
Next, we review the two interpretations of STFT [2,3].

2.1. Fourier Transform Interpretation of STFT

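The FT of (3) yields
$$\begin{aligned}
Z(j\omega) &= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} z(t)\,w(\tau-t)\,d\tau\right] e^{-j\omega t}\,dt && \text{(4a)}\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} z(t)\,w(\tau-t)\,e^{-j\omega t}\,dt\,d\tau && \text{(4b)}\\
&= \int_{-\infty}^{\infty} \mathcal{F}\big\{z(t)\,w(\tau-t)\big\}\,d\tau && \text{(4c)}\\
&= \int_{-\infty}^{\infty} Z_w(\omega;\tau)\,d\tau && \text{(4d)}
\end{aligned}$$
where $Z_w(\omega;\tau)$ denotes the classical STFT (the use of “classical” and “modified” for describing STFT is based on the current literature; see, for example, [4,17,18,19]). Equating the expressions inside the integrals of (4c) and (4d) shows that STFT may be considered as the FT of all window grains
$$Z_w(\omega;\tau) \triangleq \mathcal{F}\big\{z(t)\,w(\tau-t)\big\}. \tag{5}$$
This may be viewed as a function of $\omega$ at a fixed value of time shift $\tau$. The signal may be recovered by means of the overlap-add (OLA) method for short-time synthesis
$$z(t) = \int_{-\infty}^{\infty} \mathcal{F}^{-1}\big\{Z_w(\omega;\tau)\big\}\,d\tau. \tag{6}$$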
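As a numerical sanity check of (3), the following minimal sketch superimposes window grains on a sampled grid and confirms that they sum back to the signal. The Gaussian window, the grid sizes, the test tone, and the helper names (`gaussian_window`, `window_grains`, `overlap_add`) are illustrative assumptions and not part of the original development; the integral over the time shift is approximated by a Riemann sum.

```python
import numpy as np


def gaussian_window(t, beta):
    """Unit-area, even-symmetric Gaussian window (illustrative choice)."""
    return beta / np.sqrt(2.0 * np.pi) * np.exp(-0.5 * (beta * t) ** 2)


def window_grains(z, t, taus, beta):
    """Row i holds the window grain z(t) * w(tau_i - t)."""
    return z[None, :] * gaussian_window(taus[:, None] - t[None, :], beta)


def overlap_add(grains, taus):
    """Riemann-sum approximation of the integral over tau in (3)."""
    return grains.sum(axis=0) * (taus[1] - taus[0])


# Hypothetical test signal: a complex tone observed on t in [-1, 1].
t = np.linspace(-1.0, 1.0, 401)
taus = np.linspace(-3.0, 3.0, 1201)  # shifts must cover t plus the window tails
z = np.exp(1j * 2.0 * np.pi * 3.0 * t)

z_rec = overlap_add(window_grains(z, t, taus, beta=4.0), taus)
ola_error = np.max(np.abs(z_rec - z))  # small when the tau grid is fine and wide
```

The reconstruction only holds where the range of time shifts extends well past the analysis interval; truncating `taus` to the support of `t` would visibly attenuate the edges.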

2.2. Filterbank Interpretation of STFT

The FB interpretation of the modified STFT may be developed by considering $w(t)\,e^{j\nu t}$ as a channelizer with center frequency $\nu$
$$Z_w(t;\nu) = z(t) \ast w(t)\,e^{j\nu t} \tag{7}$$
where $\ast$ denotes convolution. Equivalently, the classical STFT is
$$Z_w(t;\nu)\,e^{-j\nu t} = \big[z(t)\,e^{-j\nu t}\big] \ast w(t) \tag{8}$$
which may be viewed as a function of $t$ at a fixed value of $\nu$, i.e., the signal is frequency shifted and filtered with the impulse response $w(t)$ on a continuum of frequency shifts $\nu$. The signal may be recovered by means of the filterbank summation (FBS) method for short-time synthesis
$$\begin{aligned}
z(t) &= \frac{1}{2\pi w(0)}\int_{-\infty}^{\infty} Z_w(t;\nu)\,d\nu\\
&= \frac{1}{2\pi w(0)}\int_{-\infty}^{\infty} z(t) \ast w(t)\,e^{j\nu t}\,d\nu\\
&= \frac{1}{2\pi w(0)}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} z(\tau)\,w(t-\tau)\,e^{j\nu(t-\tau)}\,d\nu\,d\tau\\
&= \frac{1}{2\pi w(0)}\int_{-\infty}^{\infty} z(\tau)\,w(t-\tau)\,2\pi\,\delta(t-\tau)\,d\tau\\
&= \frac{1}{w(0)}\,z(t)\,w(0).
\end{aligned} \tag{9}$$

2.3. Complementary Interpretations

We point out to the reader a crucial difference between the meaning of the independent variables in (5) and (8) even though these equations are equivalent
$$Z_w(t;\nu)\,e^{-j\nu t} \equiv Z_w(\omega;\tau). \tag{10}$$
In (5), $\tau$ is a time shift variable and $\omega$ is instantaneous frequency (IF), while in (8), $t$ is time instant and $\nu$ is a frequency shift variable. While this difference is well known and insignificant in STFT theory, it results in major differences in the relationship of STFT to IS based on the interpretation taken. Note that regardless of how STFT is computed, we may switch interpretations by interchanging variables $t \leftrightarrow \tau$ and variables $\omega \leftrightarrow \nu$ in (10).
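The equivalence in (10) can be checked numerically. The sketch below evaluates the FT interpretation (5) and the FB interpretation (7) by direct Riemann sums at one time-frequency point and compares them after the variable interchange. The Gaussian window, the chirp test signal, the integration grid, and the function names are assumptions made purely for illustration.

```python
import numpy as np


def gaussian_window(t, beta):
    """Unit-area, even-symmetric Gaussian window (illustrative choice)."""
    return beta / np.sqrt(2.0 * np.pi) * np.exp(-0.5 * (beta * t) ** 2)


def stft_ft(z_fn, omega, tau, beta, grid):
    """Classical STFT, FT interpretation (5): FT of the grain z(t) w(tau - t)."""
    dt = grid[1] - grid[0]
    grain = z_fn(grid) * gaussian_window(tau - grid, beta)
    return np.sum(grain * np.exp(-1j * omega * grid)) * dt


def stft_fb(z_fn, t, nu, beta, grid):
    """Modified STFT, FB interpretation (7): z convolved with w(t) exp(j nu t)."""
    ds = grid[1] - grid[0]
    channel = gaussian_window(t - grid, beta) * np.exp(1j * nu * (t - grid))
    return np.sum(z_fn(grid) * channel) * ds


def chirp(t, omega0=4.0, omega_c=3.0):
    """Hypothetical test signal: linear FM chirp."""
    return np.exp(1j * (omega0 * t + 0.5 * omega_c * t ** 2))


grid = np.linspace(-10.0, 10.0, 20001)
t0, nu0, beta = 0.7, 5.0, 2.0
lhs = stft_fb(chirp, t0, nu0, beta, grid) * np.exp(-1j * nu0 * t0)  # left side of (10)
rhs = stft_ft(chirp, omega=nu0, tau=t0, beta=beta, grid=grid)       # right side of (10)
mismatch = abs(lhs - rhs)  # expected to be at floating-point level
```

The two routines differ only in which variable is treated as the running one, which is the point of the complementary interpretations: the numbers agree, but the meaning attached to the arguments does not.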

3. Synchrosqueezed Short-Time Fourier Transform

In the 1970s, Kodera et al. [11,20] proposed to modify the spectrogram by taking into account the phase information that is usually discarded. The basic idea is to reassign energy in the spectrogram to a new time and frequency location by utilizing phase derivatives. Kodera’s work received little attention for decades [14], and in the 1980s Friedman also proposed spectrogram reassignment [21], apparently without knowledge of the work by Kodera. In Friedman’s approach, reassignment occurs in frequency but not in time. In both approaches, phase is used to perform reassignment but is subsequently discarded, thus preventing reconstruction. Slow adoption of these methods is often attributed to the inability to reconstruct the signal and the numerical problems associated with derivative approximations [12,14,16]. Auger and Flandrin showed that the numerical problems associated with the phase derivative could be avoided by computing three STFTs with related window functions, which led to a more efficient implementation.
In the 1990s, reassignment resurfaced when two independent groups developed the reassignment method (RM) [12,14,16] and the SST [22,23]. The RM developed by Auger and Flandrin is similar to Kodera’s work in that reassignments occur in both time and frequency. Additionally, they showed that the reassignment concept could be generalized to work for a broader class of time-frequency representations, e.g.,  in the Wigner–Ville distribution by recasting the problem in terms of centroids instead of phase derivatives. The SST developed by Maes and Daubechies is similar to Friedman’s work in that reassignments occur only in frequency using phase derivatives. However, differences from Friedman’s work include the use of a complex wavelet transform instead of the STFT and a reassignment of a complex value rather than a real value.
In [14,24], the SST is computed using an STFT rather than a wavelet transform, leading to the synchrosqueezed STFT
$$\mathrm{SST}(t,\omega) = \frac{1}{2\pi w(0)}\int_{-\infty}^{\infty} Z_w(t;\nu)\,e^{j\omega t}\,\delta\big(\omega - \hat{\omega}(t;\nu)\big)\,d\nu$$
where
$$\hat{\omega}(t;\nu) = \frac{d}{dt}\arg\{Z_w(t;\nu)\}.$$
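To make the frequency reassignment concrete, the following is a minimal discretized sketch, not the implementation used in [14,24]. It computes the modified STFT of a tone at one time instant by direct integration, estimates the phase derivative with a central difference, and accumulates each channel into the frequency bin that contains its reassigned frequency. The delta is replaced by binning, the normalization and any exponential factor are omitted, and the window, signal, grids, and function names are all assumptions.

```python
import numpy as np


def gaussian_window(t, beta):
    """Unit-area, even-symmetric Gaussian window (illustrative choice)."""
    return beta / np.sqrt(2.0 * np.pi) * np.exp(-0.5 * (beta * t) ** 2)


def modified_stft(z_fn, t, nus, beta, s):
    """Z_w(t; nu) = integral of z(s) w(t - s) exp(j nu (t - s)) ds (Riemann sum)."""
    ds = s[1] - s[0]
    kernel = gaussian_window(t - s, beta) * np.exp(1j * np.outer(nus, t - s))
    return (kernel * z_fn(s)).sum(axis=1) * ds


def reassigned_frequency(z_fn, t, nus, beta, s, h=1e-4):
    """Central-difference estimate of d/dt arg{Z_w(t; nu)} for each channel nu."""
    z_plus = modified_stft(z_fn, t + h, nus, beta, s)
    z_minus = modified_stft(z_fn, t - h, nus, beta, s)
    return np.angle(z_plus * np.conj(z_minus)) / (2.0 * h)


def synchrosqueeze(Z, omega_hat, omega_edges, dnu):
    """Discretized delta: add Z dnu to the frequency bin that contains omega_hat."""
    out = np.zeros(len(omega_edges) - 1, dtype=complex)
    idx = np.digitize(omega_hat, omega_edges) - 1
    valid = (idx >= 0) & (idx < len(out))
    np.add.at(out, idx[valid], Z[valid] * dnu)
    return out


def tone(t, omega0=10.0):
    """Hypothetical test signal: complex exponential at omega0."""
    return np.exp(1j * omega0 * t)


s = np.linspace(-6.0, 6.0, 6001)
nus = np.linspace(4.0, 16.0, 121)
edges = np.linspace(3.95, 16.05, 122)  # bins centered on the channel frequencies
Z = modified_stft(tone, 0.3, nus, 2.0, s)
omega_hat = reassigned_frequency(tone, 0.3, nus, 2.0, s)
squeezed = synchrosqueeze(Z, omega_hat, edges, nus[1] - nus[0])
```

For this tone every channel reports the same reassigned frequency, so all contributions land in the single bin centered on the tone frequency. The finite-difference step `h` is a tuning choice; in practice the derivative is often obtained from an additional STFT with a differentiated window instead.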

4. Instantaneous Spectral Analysis

In [13], we introduced ISA as a general framework for time-frequency analysis consisting of three parts: (1) a parameter set, (2) an IS, and (3) a complex AM–FM signal model. More specifically, in this framework: (1) a signal is represented by a set of canonical triplets $\mathcal{S} \triangleq \{C_0, C_1, \ldots, C_{K-1}\}$, (2) each set has a single-valued mapping to an IS $\mathcal{S} \mapsto S(t,\omega)$, and (3) each IS has a single-valued mapping to a signal $S(t,\omega) \mapsto z(t)$
$$\mathcal{S} = \{C_k\} \xrightarrow{\text{Equation (14)}} S(t,\omega) \xrightarrow{\text{Equation (15)}} z(t).$$
The canonical triplet for the $k$th AM–FM component is
$$C_k \triangleq \big(a_k(t),\, \omega_k(t),\, \phi_k\big) \tag{12}$$
where $a_k(t)$ is the instantaneous amplitude (IA), $\omega_k(t)$ is the IF, and $\phi_k$ is the phase reference. The $k$th complex AM–FM component is then given by
$$\begin{aligned}
\psi_k(t; C_k) &\triangleq a_k(t)\,\exp\!\left(j\left[\int_{-\infty}^{t}\omega_k(\tau)\,d\tau + \phi_k\right]\right) && \text{(13a)}\\
&= a_k(t)\,e^{j\theta_k(t)} && \text{(13b)}\\
&= s_k(t) + j\sigma_k(t) && \text{(13c)}
\end{aligned}$$
where $\theta_k(t)$ is the phase function, $s_k(t)$ is the real part, and $\sigma_k(t)$ is the imaginary part. With (12) and (13), the IS is defined as
$$S(t,\omega;\mathcal{S}) \triangleq 2\pi\sum_{k=0}^{K-1}\int_{-\infty}^{\infty}\psi_k(\tau; C_k)\;{}^{2}\delta\big(t-\tau,\ \omega-\omega_k(\tau)\big)\,d\tau = 2\pi\sum_{k=0}^{K-1}\psi_k(t; C_k)\,\delta\big(\omega-\omega_k(t)\big). \tag{14}$$
The IS $S(t,\omega;\mathcal{S})$ maps to the complex signal $z(t;\mathcal{S})$ with
$$\frac{1}{2\pi}\int_{-\infty}^{\infty} S(t,\omega;\mathcal{S})\,d\omega = z(t;\mathcal{S}). \tag{15}$$
Finally, the complex signal $z(t;\mathcal{S})$ is represented as a superposition of $K$ (possibly infinite) complex AM–FM components
$$\begin{aligned}
z(t;\mathcal{S}) &\triangleq \sum_{k=0}^{K-1}\psi_k(t; C_k) && \text{(16a)}\\
&= x(t) + j\,y(t). && \text{(16b)}
\end{aligned}$$
We refer the reader to [13] for additional details.
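The synthesis chain from canonical triplets to a signal in (13a) and (16a) can be sketched on a sampled time axis as follows. This is an illustrative sketch only: the phase integral is started at the first sample rather than at minus infinity, it is approximated with the trapezoidal rule, and the function names and the example triplet are assumptions.

```python
import numpy as np


def am_fm_component(t, amplitude, inst_freq, phase_ref=0.0):
    """Sample psi_k(t) = a_k(t) exp(j theta_k(t)) from a canonical triplet.

    Assumption: the phase integral starts at t[0] rather than at minus infinity.
    """
    omega = inst_freq(t)
    # Cumulative trapezoidal integration of the IF gives the phase function.
    steps = 0.5 * (omega[1:] + omega[:-1]) * np.diff(t)
    theta = np.concatenate(([0.0], np.cumsum(steps))) + phase_ref
    return amplitude(t) * np.exp(1j * theta)


def synthesize(t, triplets):
    """Superpose the components of a set of triplets, as in (16a)."""
    z = np.zeros(len(t), dtype=complex)
    for amplitude, inst_freq, phase_ref in triplets:
        z += am_fm_component(t, amplitude, inst_freq, phase_ref)
    return z


# Hypothetical triplet: unit amplitude, linear IF, zero phase reference.
t = np.linspace(0.0, 2.0, 2001)
omega0, omega_c = 5.0, 3.0
chirp_triplet = (lambda t: np.ones_like(t), lambda t: omega0 + omega_c * t, 0.0)
z = synthesize(t, [chirp_triplet])
```

With a linear IF the trapezoidal rule is exact up to rounding, so `z` matches the closed-form chirp with phase `omega0 * t + 0.5 * omega_c * t**2` on this grid. The reverse direction, from `z` back to a set of triplets, is not unique, which is the ambiguity discussed in Section 1.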
We emphasize to the reader that although IS is expressed by a (complex-valued) function of t and ω , not every function of t and ω has the necessary structural requirements to be a valid IS. This is not unlike STFT which is also a (complex-valued) function of time and frequency and where not every function of time and frequency has the necessary structural requirements to be a valid STFT. For example, it is well understood that when modifying the STFT magnitude there is the distinct possibility that modification may lead to an invalid STFT. In this case, inversion requires algorithms such as least-squared error inverse STFT (LSE-ISTFT) which inverts the invalid STFT to the signal which has an STFT closest (in an LSE sense) to the invalid STFT [25,26,27].
Moreover, although both the STFT and IS provide spectral representations in time and frequency, they have different structural requirements. This is due to the fact that the requirements of STFT are imposed by the analysis equations in (5) and (8), whereas the requirements of IS are imposed by the definition in (14). One implication of this is that one cannot assume that an STFT has the necessary structure to be a valid IS. On one hand, we show that although the FT interpretation of STFT does not possess the necessary structure to be a valid IS, it may be interpreted as a continuum of ISs. On the other hand, we show that while the FB interpretation of STFT does not possess the necessary structure to be a valid IS, the structure necessary to utilize ISA theory may be imposed by synchrosqueezing the STFT.

Relation to Frequency Domain Analysis

In [13], we gave proof that frequency domain analysis corresponds to a specialized (and restricted) form of an IS when $a_k(t) = a_k\,\omega_0$, $\omega_k(t) = k\omega_0$, and the discrete set takes on a continuum, i.e., $\omega_0 \to 0$,
$$Z(j\omega)\,e^{j\omega t} = \lim_{\omega_0\to 0} S(t,\omega)\Big|_{\substack{a_k(t)\,=\,a_k\omega_0\\ \omega_k(t)\,=\,k\omega_0}} = S^{\mathrm{FD}}(t,\omega). \tag{17}$$
Finally, evaluating (17) at $t = 0$ yields
$$Z(j\omega) = S^{\mathrm{FD}}(0,\omega). \tag{18}$$
We refer the reader to [13] for additional details.

5. Recasting the Short-Time Fourier Transform as an IS

IS provides a signal analysis which is instantaneous in both $t$ and $\omega$. From Section 2, we see that STFT allows for instantaneous analysis in only one of the variables, i.e., the FB interpretation is instantaneous in time while the FT interpretation is instantaneous in frequency (albeit constant frequency). In this section, we recast each STFT interpretation as an IS. While the two interpretations are conceptual in nature, the ISs corresponding to these interpretations take on different mathematical forms. As we show, the IS corresponding to the FB interpretation $S^{\mathrm{FB}}(t,\omega)$ makes use of AM–FM components, and thus is easily understood in terms of a single IS. On the other hand, the IS corresponding to the FT interpretation $S_\tau^{\mathrm{FT}}(t,\omega)$ uses a more restrictive component and is best understood using a continuum of ISs. In this section, we continue from (11) and develop (19), (20), and (24), whose context in the overall theory is illustrated below
$$\begin{array}{l}
z(t) \xrightarrow{\ \mathcal{F}\ } Z(j\omega) \xrightarrow{\text{Equation (17)}} S^{\mathrm{FD}}(t,\omega)\\[4pt]
z(t) \xrightarrow{\text{Equation (5)}} Z_w(\omega;\tau) \xrightarrow{\text{Equation (19)}} S_\tau^{\mathrm{FT}}(t,\omega) \xrightarrow{\text{Equation (20)}} S^{\mathrm{FD}}(t,\omega)\\[4pt]
z(t) \xrightarrow{\text{Equation (7)}} Z_w(t;\nu) \xrightarrow{\text{Equation (24)}} S^{\mathrm{FB}}(t,\omega).
\end{array}$$

5.1. IS Corresponding to the FT Interpretation of STFT

The FT interpretation of the classical STFT in (5) is that of a continuum of FTs indexed by $\tau$. Thus, from the relation of the FT to the IS in (17), the IS corresponding to the window grain at $t = \tau$ is
$$S_\tau^{\mathrm{FT}}(t,\omega) = Z_w(\omega;\tau)\,e^{j\omega t}. \tag{19}$$
Superimposing the ISs in (19) gives the IS corresponding to the FT interpretation of STFT
$$S^{\mathrm{FD}}(t,\omega) = \int_{-\infty}^{\infty} S_\tau^{\mathrm{FT}}(t,\omega)\,d\tau. \tag{20}$$
Equation (4d) shows that STFT is a decomposition of FT. Likewise, the continuum of ISs corresponding to the window grains decomposes the IS corresponding to the FT interpretation
$$\begin{aligned}
\int_{-\infty}^{\infty} S_\tau^{\mathrm{FT}}(t,\omega)\,d\tau &= \int_{-\infty}^{\infty} Z_w(\omega;\tau)\,e^{j\omega t}\,d\tau && \text{(21a)}\\
&= e^{j\omega t}\int_{-\infty}^{\infty} Z_w(\omega;\tau)\,d\tau && \text{(21b)}\\
&= e^{j\omega t}\,Z(j\omega) && \text{(21c)}\\
&= S^{\mathrm{FD}}(t,\omega). && \text{(21d)}
\end{aligned}$$
In other words, superimposing the ISs corresponding to the FT interpretation of STFT yields the IS corresponding to frequency domain analysis (Fourier transform) $S^{\mathrm{FD}}(t,\omega)$, and as a result does not provide a new IS to study. Rather, when taking the FT interpretation of STFT, we only gain new insights by studying the ISs corresponding to the window grains.

5.2. IS Corresponding to the FB Interpretation of STFT

The FB interpretation of the modified STFT in (7) is that of an infinite number of signal components, each corresponding to frequency shift $\nu$ followed by filtering (convolution) with $w(\cdot)$. Naively comparing (15) with (9), one might be tempted to assume that a corresponding IS may be formed with
$$S^{\mathrm{FB}}(t,\omega) = \frac{1}{w(0)}\,Z_w(t;\nu)\Big|_{\nu\to\omega}. \tag{22}$$
However, this would be incorrect because it does not provide the structure necessary to be a valid IS. This is further illustrated and discussed below. On the other hand, we can construct an IS with valid structure from $Z_w(t;\nu)$ by reassigning the component associated with frequency shift $\nu$ to the appropriate IF. We begin by writing the modified STFT in polar form as
$$Z_w(t;\nu) = a_w(t;\nu)\,\exp\big[j\theta_w(t;\nu)\big]. \tag{23}$$
Using the IS definition in (14) we have
$$S^{\mathrm{FB}}(t,\omega) = \int_{-\infty}^{\infty} a_w(t;\nu)\,e^{j\theta_w(t;\nu)}\,\delta\!\left(\omega - \frac{d}{dt}\theta_w(t;\nu)\right)d\nu \tag{24}$$
which is immediately recognized as a synchrosqueezed STFT [14,15]. We note that while most developments of reassignment/synchrosqueezing are motivated as post-processing techniques in order to improve readability of spectrograms, in our development, reassignment in frequency is a fundamental necessity to ensure a valid IS structure. Furthermore, techniques which reassign in time are not compatible with a valid IS structure.

5.3. Discussion

A critical difference between the ISs corresponding to FT and FB interpretations of STFT is the form of the components utilized. With the FT interpretation, the individual components are obtained from the classical STFT $Z_w(\omega;\tau)$, as follows. For each point $(\tau,\omega)$, the component is formed as
$$\psi_{\tau,\omega}(t) = Z_w(\omega;\tau)\,e^{j\omega t}. \tag{25}$$
Here, $Z_w(\omega;\tau)$ acts as an initial condition, and multiplication with $e^{j\omega t}$ projects the component forward and backward in time. With the FB interpretation, the individual components are obtained from the modified STFT $Z_w(t;\nu)$, as follows. For each frequency shift $\nu$, the component is formed as
$$\psi_\nu(t) = Z_w(t;\nu). \tag{26}$$
While (25) and (26) have similar mathematical forms, they are very different because $Z_w(\omega;\tau)$ is independent of $t$ while $Z_w(t;\nu)$ is dependent on $t$. As a result, the components in (25) have a fixed amplitude $|Z_w(\omega;\tau)|$ and fixed frequency $\omega$, while the components in (26) have, in general, a time-varying amplitude $|Z_w(t;\nu)|$ and time-varying frequency $\frac{d}{dt}\arg\{Z_w(t;\nu)\}$. Thus, the former component is very restrictive, whereas the latter component is AM–FM in nature. This results in significant conceptual and practical differences between $S_\tau^{\mathrm{FT}}(t,\omega)$ in (19) and $S^{\mathrm{FB}}(t,\omega)$ in (24), even though mathematically there is little practical difference between $Z_w(\omega;\tau)$ and $Z_w(t;\nu)$. Moreover, (26) along with (23) explains why $Z_w(t;\nu)$ does not have the necessary structure to be interpreted as an IS: the energy associated with the component $\psi_{\nu_0}(t)$ is located at channelizer frequency $\nu = \nu_0$ in $Z_w(t;\nu_0)$, rather than at the appropriate IF location $\frac{d}{dt}\arg\{Z_w(t;\nu_0)\}$.

6. Instantaneous Spectra Corresponding to STFT Interpretations for Example Signals

In this section, we illustrate through example the IS corresponding to the two interpretations of STFT. Information regarding visualization of the IS can be found in [13] and software for IS visualization at [28,29]. The examples shown below consist of two signals which can be expressed and analyzed in closed form, as well as a real-world signal, i.e., an acoustic recording. In order to develop closed-form expressions for STFT, we choose to analyze the complex exponential and linear FM chirp with a Gaussian window. Our analysis uses the following FT pairs. First, the FT of a quadratic chirplet is given by
$$z(t) = \exp\!\big(-p_1 t^2\big) \;\overset{\mathcal{F}}{\longleftrightarrow}\; Z(j\omega) = \sqrt{\frac{\pi}{p_1}}\,\exp\!\left(-\frac{1}{4p_1}\,\omega^2\right) \tag{27}$$
where $p_1 \in \mathbb{C}$ and $\mathrm{Re}\{p_1\} > 0$. Second, it can be shown by completing the square and using (27) that the FT of a product of time-shifted quadratic chirplets is given by
$$z(t) = \sqrt{\frac{p_2}{\pi}}\,\exp\!\big(-p_1(t-t_1)^2\big)\exp\!\big(-p_2(t-t_2)^2\big) \;\overset{\mathcal{F}}{\longleftrightarrow}\; Z(j\omega) = c\,\exp\!\left(-\frac{1}{4p_3}\,\omega^2\right)\exp(-jT\omega) \tag{28}$$
with the chirp parameters $p_1 \in \mathbb{C}$, $p_2 \in \mathbb{C}$, $\mathrm{Re}\{p_1\} > 0$, $\mathrm{Re}\{p_2\} > 0$, $p_3 = p_1 + p_2$, $T = (t_1 p_1 + t_2 p_2)/p_3$, and $c = \sqrt{p_2/p_3}\,\exp\!\big(-(t_1^2 p_1 + t_2^2 p_2 - T^2 p_3)\big)$.
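The FT pair (28) can be spot-checked numerically. The following sketch compares the closed form against a direct Riemann-sum evaluation of the Fourier integral at a single frequency. The parameter values, the integration grid, and the function names are arbitrary assumptions chosen for illustration; a real, positive $p_2$ is used so that the principal square roots combine without branch issues.

```python
import numpy as np


def chirplet_product(t, p1, p2, t1, t2):
    """Time-domain side of (28): product of two time-shifted quadratic chirplets."""
    return np.sqrt(p2 / np.pi) * np.exp(-p1 * (t - t1) ** 2) * np.exp(-p2 * (t - t2) ** 2)


def chirplet_product_ft(omega, p1, p2, t1, t2):
    """Frequency-domain side of (28), using p3, T, and c as defined in the text."""
    p3 = p1 + p2
    T = (t1 * p1 + t2 * p2) / p3
    c = np.sqrt(p2 / p3) * np.exp(-(t1 ** 2 * p1 + t2 ** 2 * p2 - T ** 2 * p3))
    return c * np.exp(-omega ** 2 / (4.0 * p3)) * np.exp(-1j * T * omega)


def numerical_ft(z, t, omega):
    """Riemann-sum approximation of the Fourier integral at one frequency."""
    return np.sum(z * np.exp(-1j * omega * t)) * (t[1] - t[0])


# Hypothetical parameters (complex p1 gives a chirped envelope).
p1, p2, t1, t2, omega = 0.8 - 1.5j, 2.0, 0.3, -0.4, 1.7
t = np.linspace(-12.0, 12.0, 48001)
closed = chirplet_product_ft(omega, p1, p2, t1, t2)
numeric = numerical_ft(chirplet_product(t, p1, p2, t1, t2), t, omega)
pair_error = abs(closed - numeric)
```

Agreement at one frequency is not a proof, but it is a quick guard against sign errors when the pair is specialized to the window and signals used in the examples.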

6.1. Complex Exponential

In the first example, consider the canonical triplet
$$C_0 = \big(1,\ \omega_0,\ 0\big) \tag{29}$$
which using (16a) gives the complex exponential signal
$$z(t) = \exp(j\omega_0 t) \tag{30}$$
and with (14) gives the IS
$$S(t,\omega) = 2\pi\,\delta(\omega-\omega_0)\,e^{j\omega t}. \tag{31}$$
Choosing a Gaussian window
$$w(t) = \sqrt{\frac{\beta^2}{2\pi}}\,\exp\!\left(-\frac{\beta^2 t^2}{2}\right) \tag{32}$$
we compute the STFT corresponding to the FT interpretation by using (5), choosing $p_1 = \beta^2/2$ in (27), and using time- and frequency-shift properties of the FT
$$Z_w(\omega;\tau) = \exp\!\left[-\frac{(\omega-\omega_0)^2}{2\beta^2} - j\tau(\omega-\omega_0)\right]. \tag{33}$$
With (33) and (19) we then form the IS corresponding to the FT interpretation
$$S_\tau^{\mathrm{FT}}(t,\omega) = \exp\!\left[-\frac{(\omega-\omega_0)^2}{2\beta^2} - j\tau(\omega-\omega_0)\right] e^{j\omega t}. \tag{34}$$
Next, we compute the STFT corresponding to the FB interpretation using (33) and (10)
$$Z_w(t;\nu) = \exp\!\left[-\frac{(\nu-\omega_0)^2}{2\beta^2} + j\theta_w(t;\nu)\right] \tag{35}$$
where
$$\theta_w(t;\nu) = \omega_0 t. \tag{36}$$
Substituting into (24), we form the IS corresponding to the FB interpretation
$$S^{\mathrm{FB}}(t,\omega) = \int_{-\infty}^{\infty}\exp\!\left[-\frac{(\nu-\omega_0)^2}{2\beta^2}\right] e^{jt\omega_0}\,\delta(\omega-\omega_0)\,d\nu = \beta\sqrt{2\pi}\,\delta(\omega-\omega_0)\,e^{j\omega_0 t}. \tag{37}$$
Comparing (37) with (31), we see that after reassignment $S^{\mathrm{FB}}(t,\omega)$ yields the correct IS (to within a scale factor) for this signal. On the other hand, direct comparison of (34) with (31) is not possible. While one could superimpose (34) on the continuum $\tau$ as described in (20), this would only lead to $S^{\mathrm{FD}}(t,\omega)$. Although this does lead to a valid IS, it is not useful for time-frequency analysis because it provides the same information as FT.
The ISs corresponding to the FT interpretation of the STFT in (34) are shown in the left column of Figure 1. The top plot shows $S^{\mathrm{FD}}(t,\omega)$ for the complex exponential while the lower three plots show $S_\tau^{\mathrm{FT}}(t,\omega)$ for three different $\tau$ [see (19)]. The fact that the FT interpretation yields a representation that may be considered as a decomposition into a continuum of ISs of the window grains at each $\tau$ [see (21d)] is visually demonstrated by the “+” notation used in the figure. From the figure, it is apparent that the frequency spectrum which results from taking the FT of any window grain has components that extend beyond the time support of the window. Thus, even if a window grain has finite time support, the associated frequency spectrum has infinite extent and is not simply limited to the local vicinity of the window. This demonstrates that while the STFT is mathematically correct, there is a conceptual flaw in the IS obtained by recasting the FT interpretation of STFT.
The FB interpretation of the STFT in (35) is shown in Figure 2a. As discussed in Section 5.2, this STFT is not a valid IS and the structural problem may be seen in (36) and observed in the figure. In particular, the IF of the component corresponding to the channelizer with center frequency $\nu$ is given by $\frac{d}{dt}\theta_w(t;\nu) = \omega_0$, i.e., the IF has constant value $\omega_0$ and is thus independent of the value $\nu$. This can be seen in the figure by observing that the oscillation rate of components is fixed and does not change along the frequency axis. On the other hand, reassigning this STFT using (24) gives (37), which is illustrated in Figure 2b.

6.2. Linear FM Chirp

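In the second example, consider the canonical triplet
$$C_0 = \big(1,\ \omega_0 + \omega_c t,\ 0\big) \tag{38}$$
which using (16a) gives the linear FM chirp signal
$$z(t) = \exp\!\left(j\left(\omega_0 t + \frac{\omega_c}{2}t^2\right)\right) \tag{39}$$
and with (14) gives the IS
$$S(t,\omega) = 2\pi\,\delta\big(\omega - (\omega_0 + \omega_c t)\big)\,e^{j\left(\omega_0 t + \frac{\omega_c}{2}t^2\right)}. \tag{40}$$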
Choosing the Gaussian window in (32), we compute the STFT corresponding to the FT interpretation by using (5), choosing $p_1 = -j\omega_c/2$, $p_2 = \beta^2/2$, $t_1 = 0$, and $t_2 = \tau$ in (28), and using time- and frequency-shift properties of the FT
$$Z_w(\omega;\tau) = |\lambda|\,\exp\!\left[\frac{\beta^2(\beta^4-\gamma)\tau^2 + 2\beta^2\omega_c\tau(\omega-\omega_0) - \beta^2(\omega-\omega_0)^2}{2\gamma}\right] \times \exp\!\left[j\,\frac{\beta^4\omega_c\tau^2 - 2\beta^4\tau(\omega-\omega_0) - \omega_c(\omega-\omega_0)^2}{2\gamma} + j\arg\lambda\right] \tag{41}$$
where $\lambda = \beta/\sqrt{\beta^2 - j\omega_c}$ and $\gamma = \beta^4 + \omega_c^2$. The IS corresponding to the FT interpretation $S_\tau^{\mathrm{FT}}(t,\omega)$ is then formed by substituting the above into (19).
Next, we compute the STFT corresponding to the FB interpretation using (41) and (10). In polar form (see (23)), we have
$$a_w(t;\nu) = |\lambda|\,\exp\!\left[\frac{(\beta^4-\gamma)t^2 + 2\omega_c t(\nu-\omega_0) - (\nu-\omega_0)^2}{2\gamma\beta^{-2}}\right] \tag{42}$$
and
$$\theta_w(t;\nu) = \frac{\beta^4\omega_c}{2\gamma}\,t^2 - \frac{\beta^4(\nu-\omega_0)}{\gamma}\,t - \frac{\omega_c(\nu-\omega_0)^2}{2\gamma} + \nu t + \arg\lambda. \tag{43}$$
Finally, the IF as a function of $\nu$ is given by
$$\frac{d}{dt}\theta_w(t;\nu) = \frac{\beta^4\omega_c}{\gamma}\,t - \frac{\beta^4(\nu-\omega_0)}{\gamma} + \nu. \tag{44}$$
The IS corresponding to the FB interpretation of the STFT $S^{\mathrm{FB}}(t,\omega)$ is readily obtained from the equations above together with (24).
As in the previous example, $S_\tau^{\mathrm{FT}}(t,\omega)$ does not allow direct comparison with (40) because it yields a continuum of ISs. Thus, superposition of the IS continuum across $\tau$ would only lead to $S^{\mathrm{FD}}(t,\omega)$. However, unlike in the previous example, reassignment of $Z_w(t;\nu)$ to form $S^{\mathrm{FB}}(t,\omega)$ does not yield the correct IS given in (40). Although reassignment improves energy concentration in time-frequency representations, it is unlikely to lead to the correct IS in general. We note that other variations of synchrosqueezing methods (e.g., higher order and adaptive methods) exist [18,19,30,31,32] that may perform well for specific signals, but in general there exists no single method that is well suited for all signals.
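A short calculation illustrates why reassignment concentrates the chirp but does not recover (40) exactly. The sketch below evaluates the closed-form IF in (44), assuming $\gamma = \beta^4 + \omega_c^2$ (the value obtained by completing the square for the Gaussian window), and measures how far each channel lands from the true IF $\omega_0 + \omega_c t$. The parameter values and function names are illustrative assumptions.

```python
import numpy as np


def reassigned_if(t, nu, omega0, omega_c, beta):
    """Closed-form IF of channel nu from (44), assuming gamma = beta**4 + omega_c**2."""
    gamma = beta ** 4 + omega_c ** 2
    return beta ** 4 * omega_c * t / gamma - beta ** 4 * (nu - omega0) / gamma + nu


def residual_offset(t, nu, omega0, omega_c, beta):
    """Distance between the reassigned IF and the true chirp IF omega0 + omega_c t."""
    return reassigned_if(t, nu, omega0, omega_c, beta) - (omega0 + omega_c * t)


omega0, omega_c, beta, t0 = 10.0, 4.0, 2.0, 0.5
ridge = omega0 + omega_c * t0
on_ridge = residual_offset(t0, ridge, omega0, omega_c, beta)         # channel on the ridge
off_ridge = residual_offset(t0, ridge + 3.0, omega0, omega_c, beta)  # channel 3 rad/s above
shrink = omega_c ** 2 / (beta ** 4 + omega_c ** 2)                   # predicted contraction
```

Under these assumptions a channel offset from the ridge by some amount is reassigned to an offset scaled by `shrink`, which is zero only when the chirp rate is zero. This is consistent with the complex exponential being recovered exactly while the linear FM chirp is only sharpened.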
The ISs corresponding to the FT interpretation of the STFT in (41) are shown in the right column of Figure 1. The top plot shows $S^{\mathrm{FD}}(t,\omega)$ for the linear FM chirp while the lower three plots show $S_\tau^{\mathrm{FT}}(t,\omega)$ for three different $\tau$ [see (19)]. As before, it is apparent that the frequency spectrum which results from taking the FT of any window grain has components that extend beyond the time support of the window, further demonstrating the conceptual flaw in the FT interpretation of the STFT.
The FB interpretation of the STFT in (41) is shown in Figure 3a. Again, we see in the figure that the oscillation rate of components is fixed and does not change along the frequency axis. With reassignment, the resulting IS shown in Figure 3b [15] has improved energy concentration but does not give the correct IS provided in (40) and shown in Figure 3c.

6.3. Bat Vocalization

Finally, we illustrate the recasting of a synchrosqueezed STFT as an IS (i.e., the IS corresponding to the FB interpretation of the STFT) using a bat vocalization signal which is popular in the time-frequency literature [18,33,34,35]. The acoustic recording features a ∼2.5 ms pulse emitted by the Large Brown Bat (Eptesicus fuscus). The original recording consists of 400 samples captured with a sampling period of 7 μs. In order to alleviate issues associated with numerical derivatives when recasting as an IS, we up-sampled the signal by a factor of 4. Finally, a 128-point Hann window was used in the analysis.
The synchrosqueezed STFT, shown in Figure 4a, was computed using the fsst() function in Matlab and plotted using the default Matlab visualization (with a perceptual colormap). For comparison, the 2D IS corresponding to the FB interpretation of the STFT is shown in Figure 4b. Finally, the 3D IS corresponding to the FB interpretation of the STFT is shown in Figure 4c. Broadly speaking, the energy distributions in Figure 4a,b are in general agreement; however, the IS provides more precision that allows the display of finer details. Moreover, while both Figure 4a,b provide information about the magnitude in the time-frequency plane, by leveraging the 3D IS visualization, we are able to additionally illustrate the spectral phase.

7. Conclusions

In this paper, we used the ISA framework to recast the FT and FB interpretations of STFT in terms of an IS. We showed that these two equivalent STFT interpretations lead to different ISs, and thus provide new insights into STFT: the FT interpretation of STFT corresponds to an IS for each window grain, while the FB interpretation of STFT is a valid IS if the STFT is synchrosqueezed. Thus, we provided a new theoretical motivation for synchrosqueezing, which is a fundamental necessity in order to cast the FB interpretation of STFT as a valid IS. We also highlighted the differences in the components for these interpretations, which have significant conceptual and practical differences. Specifically, in the FT interpretation, the components have a restrictive fixed amplitude and fixed frequency, while in the FB interpretation the components are AM–FM in nature. We leveraged the 3D IS visualization to provide a novel visualization of an STFT in which multiple aspects, i.e., magnitude and phase of each signal component, can be viewed simultaneously. Moreover, the phase is visualized in a way that is easily interpreted—this is in stark contrast with typical STFT analysis where phase is rarely visualized because it is not easily interpreted. Finally, in order to demonstrate these relations and results, we provided examples and illustrations.

Author Contributions

Conceptualization, S.S.; methodology, S.S. and P.L.D.L.; formal analysis, S.S. and P.L.D.L.; writing—original draft preparation, S.S. and P.L.D.L.; writing—review and editing, S.S. and P.L.D.L.; visualization, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Notation

The following notation is used throughout this work:
$z(t)$: signal under analysis
$w(t)$: window signal
$Z_w(\omega;\tau)$: classical STFT corresponding to the FT interpretation
$Z_w(t;\nu)$: modified STFT corresponding to the FB interpretation
$Z_w(t;\nu)\,e^{j\nu t}$: classical STFT corresponding to the FB interpretation
$S(t,\omega)$: an IS corresponding to the signal $z(t)$
$S_\tau^{\mathrm{FT}}(t,\omega)$: IS corresponding to the FT interpretation of the STFT
$S^{\mathrm{FB}}(t,\omega)$: IS corresponding to the FB interpretation of the STFT
$S^{\mathrm{FD}}(t,\omega)$: IS corresponding to the frequency domain (FT)

References

  1. Gabor, D. Theory of communication. Part 1: The analysis of information. J. Inst. Electr. Eng. III 1946, 93, 429–441.
  2. Allen, J.B.; Rabiner, L. A unified approach to short-time Fourier analysis and synthesis. Proc. IEEE 1977, 65, 1558–1564.
  3. Lim, J.S.; Oppenheim, A.V. Advanced Topics in Signal Processing; Prentice-Hall, Inc.: Hoboken, NJ, USA, 1987.
  4. Cohen, L. Time-Frequency Analysis; Prentice Hall: Hoboken, NJ, USA, 1995.
  5. Papandreou-Suppappola, A. (Ed.) Applications in Time-Frequency Signal Processing; CRC Press: Boca Raton, FL, USA, 2002.
  6. Stanković, L.; Daković, M.; Thayaparan, T. Time-Frequency Signal Analysis with Applications; Artech House: Norwood, MA, USA, 2014.
  7. Boashash, B. Time-Frequency Signal Analysis and Processing: A Comprehensive Reference; Academic Press: Cambridge, MA, USA, 2015.
  8. Flandrin, P. Explorations in Time-Frequency Analysis; Cambridge University Press: Cambridge, UK, 2018.
  9. Gröchenig, K. Foundations of Time-Frequency Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2001.
  10. Meignen, S.; Oberlin, T.; Pham, D. Synchrosqueezing transforms: From low- to high-frequency modulations and perspectives. Comptes Rendus Phys. 2019, 20, 449–460.
  11. Kodera, K.; Gendrin, R.; DeVilledary, C. Analysis of time-varying signals with small BT values. IEEE Trans. Acoust. Speech Signal Process. 1978, 26, 64–76.
  12. Auger, F.; Flandrin, P. Improving the readability of time-frequency and time-scale representations by the reassignment method. IEEE Trans. Signal Process. 1995, 43, 1068–1089.
  13. Sandoval, S.; De Leon, P.L. The Instantaneous Spectrum: A General Framework for Time-Frequency Analysis. IEEE Trans. Signal Process. 2018, 66, 5679–5693.
  14. Auger, F.; Flandrin, P.; Lin, Y.; McLaughlin, S.; Meignen, S.; Oberlin, T.; Wu, H. Time-frequency reassignment and synchrosqueezing: An overview. IEEE Signal Process. Mag. 2013, 30, 32–41.
  15. Oberlin, T.; Meignen, S.; Perrier, V. The Fourier-based synchrosqueezing transform. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 315–319.
  16. Auger, F.; Flandrin, P. The why and how of time-frequency reassignment. In Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, Philadelphia, PA, USA, 25–28 October 1994; pp. 197–200.
  17. Flandrin, P. Time-Frequency/Time-Scale Analysis; Academic Press: Cambridge, MA, USA, 1998.
  18. Oberlin, T.; Meignen, S.; Perrier, V. Second-order synchrosqueezing transform or invertible reassignment? Towards ideal time-frequency representations. IEEE Trans. Signal Process. 2015, 63, 1335–1344.
  19. Pham, D.H.; Meignen, S. High-order synchrosqueezing transform for multicomponent signals analysis—With an application to gravitational-wave signal. IEEE Trans. Signal Proc. 2017, 65, 3168–3178.
  20. Kodera, K.; DeVilledary, C.; Gendrin, R. A new method for the numerical analysis of non-stationary signals. Phys. Earth Planet. Int. 1976, 12, 142–150.
  21. Friedman, D. Instantaneous-frequency distribution vs. time: An interpretation of the phase structure of speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Tampa, FL, USA, 26–29 April 1985; Volume 10, pp. 1121–1124.
  22. Maes, S. Synchrosqueezed representation yields a new reading of the wavelet transform. Wavelet Appl. II 1995, 2491, 532–560.
  23. Daubechies, I.; Maes, S. A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models. Wavelets Med. Biol. 1996, 527–546.
  24. Thakur, G.; Wu, H. Synchrosqueezing-based recovery of instantaneous frequency from nonuniform samples. SIAM J. Math. Anal. 2011, 43, 2078–2095.
  25. Griffin, D.; Lim, J. Signal estimation from modified short-time Fourier transform. IEEE Trans. Acoust. Speech Signal Process. 1984, 32, 236–243.
  26. Nawab, S.; Quatieri, T.; Lim, J. Signal reconstruction from short-time Fourier transform magnitude. IEEE Trans. Acoust. Speech Signal Process. 1983, 31, 986–998.
  27. Boucheron, L.E.; De Leon, P.L. On the inversion of mel-frequency cepstral coefficients for speech enhancement applications. In Proceedings of the International Conference on Signals and Electronic Systems, Krakow, Poland, 14–17 September 2008; pp. 485–488.
  28. ISA.jl: Instantaneous Spectral Analysis in Julia. 2022. Available online: https://github.com/NMSU-ISA/ISA/ (accessed on 3 January 2022).
  29. Sandoval, S. Instaneous Spectral Analysis (in Matlab). 2018. Available online: https://github.com/ssandova/ISA-public (accessed on 3 January 2022).
  30. Daubechies, I.; Wang, Y.; Wu, H.T. ConceFT: Concentration of frequency and time via a multitapered synchrosqueezed transform. Philos. Trans. R. Soc. A 2016, 374, 20150193.
  31. Behera, R.; Meignen, S.; Oberlin, T. Theoretical analysis of the second-order synchrosqueezing transform. Appl. Comput. Harmon. Anal. 2018, 45, 379–404.
  32. Pham, D.H.; Meignen, S. Demodulation algorithm based on higher order synchrosqueezing. In Proceedings of the European Signal Processing Conference (EUSIPCO), A Coruna, Spain, 2–6 September 2019; pp. 1–5.
  33. Bat Echolocation Chirp. 2022. Available online: https://www.ece.rice.edu/dsp/software/bat.shtml (accessed on 3 January 2022).
  34. Wang, S.; Chen, X.; Tong, C.; Zhao, Z. Matching synchrosqueezing wavelet transform and application to aeroengine vibration monitoring. IEEE Trans. Instrum. Meas. 2016, 66, 360–372.
  35. Yu, G.; Yu, M.; Xu, C. Synchroextracting transform. IEEE Trans. Ind. Electron. 2017, 64, 8042–8054.
Figure 1. Illustration of the instantaneous spectra associated with the Fourier transform interpretation of the STFT for a (left column) complex exponential and (right column) linear FM chirp. For each signal, the top plot shows $S^{\mathrm{FT}}(t,\omega)$, while the lower three plots show $S_\tau^{\mathrm{FT}}(t,\omega)$ for three instances of $\tau$. The superposition of the ISs of the window grains yields the IS of the FT, as indicated by the “+” notation used in the figure.
Figure 2. Illustrations associated with the filterbank interpretation of the STFT for a complex exponential. (a) A visualization of $Z_w(t,\nu)$, where the coloring is based on magnitude and the height reflects the real value; this plot is not a valid IS. (b) $S^{\mathrm{FB}}(t,\omega)$, which is the correct IS after reassignment.
Figure 3. Illustrations associated with the filterbank interpretation of the STFT for a linear FM chirp. (a) A visualization of $Z_w(t,\nu)$, where the coloring is based on magnitude and the height reflects the real value; this plot is not a valid IS. (b) $S^{\mathrm{FB}}(t,\omega)$, which shows that reassignment improves energy concentration but does not lead to the correct IS [shown in (c)], as explained in Section 6.
Figure 4. Illustrations associated with an acoustic bat vocalization recording for the (a) synchrosqueezed STFT, (b) 2D IS obtained by recasting the FB interpretation of the STFT, and (c) 3D IS. Broadly speaking, the energy distributions in the subplots are in general agreement; however, the IS provides more precision, which allows finer details to be displayed, and the 3D IS additionally illustrates the spectral phase.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
