Recasting the (Synchrosqueezed) Short-Time Fourier Transform as an Instantaneous Spectrum

In a previous work, we proposed a time-frequency analysis called instantaneous spectral analysis (ISA), which generalizes the notion of the Fourier spectrum and makes full use of instantaneous frequency. In this paper, we recast both the Fourier transform (FT) and filterbank (FB) interpretations of the short-time Fourier transform (STFT) as instantaneous spectra. We show that to recast the FB interpretation of STFT as an instantaneous spectrum (IS) with valid structure, frequency reassignment is a fundamental necessity, thus demonstrating that this IS is closely related to the synchrosqueezed STFT. This result provides a new theoretical motivation for the synchrosqueezed STFT. Finally, we illustrate the instantaneous spectra corresponding to the FT and FB interpretations of STFT using two closed-form examples.


Introduction
In Gabor's seminal work [1], the notion of joint time-frequency analysis was proposed and led to the development of the short-time Fourier transform (STFT) [2,3]. Even today, STFT is the most well-known and widely used method in time-frequency analysis [4,5,6,7,8,9], and extensions such as the synchrosqueezing transform (SST) and synchrosqueezed STFT are still actively being investigated [10]. Synchrosqueezing and reassignment are typically motivated as post-processing techniques that improve readability in the time-frequency plane [11,12] or as refinements based on phase information.
In [13], we developed a generalized framework for time-frequency analysis which we termed instantaneous spectral analysis (ISA) and proved that the instantaneous spectrum (IS) exactly localizes signal components in an instantaneous bandwidth sense. Using ISA, we are able to define more general time-frequency spectra than is possible using STFT. This is a result of the basic time-frequency atom or component utilized in these methodologies. More specifically, ISA theory allows the use of an AM-FM component whereas STFT uses a far more restrictive component. Moreover, IS theory allows the unambiguous specification of an IS $S(t,\omega)$ and a related complex-valued signal $z(t)$ corresponding to a component set $\mathcal{S}$, i.e., $\mathcal{S} \rightarrow S(t,\omega) \rightarrow z(t)$. Although this may be considered an ideal synthesis model for instantaneous spectra and AM-FM models, the many-to-one mappings are sources of information loss which allow an infinite number of instantaneous spectra and component sets to map to the same signal. As a result, the reverse process is underdetermined and no unique analysis model exists.
One approach which can be taken for the analysis stage is to consider two stages: (1) signal decomposition and (2) demodulation, where all ambiguity lies in how the signal $z(t)$ is decomposed into a set of components; there exist an infinite number of ways to express a whole as a sum of parts. However, every decomposition has the advantage that it can be associated with an instantaneous spectrum in which each component is exactly localized. Thus, ISA can be immediately used to enhance existing decomposition methods with an associated instantaneous spectrum with exact time-frequency localization. However, while the distinct separation of the analysis stage into decomposition and demodulation is well suited for pairing ISA with decomposition methods and AM-FM models, alternate approaches to the analysis stage exist.
For example, there exists a great body of literature devoted to the study of time-frequency distributions (TFDs), and although TFDs and ISs are both mathematical objects which describe signals in time-frequency spaces, they describe different types of spaces. Thus, it is natural to seek to establish the connection of particular TFDs as special cases of ISs, if and when this is possible. Let $T$ denote an integral transformation relating a complex-valued signal $z(t)$ and a TFD $Z(t,\omega)$, and $T^{-1}$ denote its inverse, assuming it exists. The focus of traditional TFD analysis may be considered as the theory regarding the existence and mathematical properties for different choices of the transformation $T$. ISA theory can be used in conjunction with other time-frequency methods as a framework to pose and address other important questions. For example, suppose we wish to study the performance of a particular integral transformation $T$ on a particular class of signals. If we represent the particular class of signals by constraints on the form of the components in $\mathcal{S}$, then we can recast the time-frequency distribution $Z(t,\omega)$ as an IS, denoted by $\hat{S}(t,\omega)$, and compare $\hat{S}(t,\omega)$ to $S(t,\omega)$.
The purpose of this work is to show how to enforce the structure necessary to recast the time-frequency distribution $Z(t,\omega)$ to an instantaneous spectrum $\hat{S}(t,\omega)$, if we choose the integral transformation $T$ as the STFT. We show that there are two ways to do the recasting, each corresponding to one of the two classic interpretations of STFT. Our contributions are as follows:

1. We show that the two equivalent STFT interpretations lead to different ISs, and thus provide new insights into STFT. In particular, we show that the IS corresponding to the FT interpretation of STFT corresponds to an IS for each window grain, while on the other hand, a single IS corresponding to the FB interpretation of STFT exists when STFT is synchrosqueezed [14,15]. As a result, in the FT interpretation, the components have a restrictive fixed amplitude and fixed frequency, while in the FB interpretation, the components are AM-FM in nature. This results in significant conceptual and practical differences between the two interpretations.

2. We contribute a new theoretical motivation for synchrosqueezing. In particular, in order to recast the FB interpretation as an IS, we show that reassignment in frequency is a fundamental requirement. This is in contrast to the view of synchrosqueezing largely as a heuristic approach to improve energy concentration in the time-frequency plane.

3. By recasting the two STFT interpretations as an IS, we can leverage the 3D IS visualization [13] to contribute a novel visualization of STFT. This is advantageous because the 3D IS allows visualization of multiple aspects of the signal decomposition simultaneously, including both magnitude and phase of each signal component. While it may take the reader some time to become comfortable with the 3D visualization, we believe it has significant advantages in terms of interpretability and note that the STFT phase spectrum is almost never considered or visualized.
For the benefit of the reader, this paper reviews key concepts of STFT and synchrosqueezed STFT; however, this work is not intended to be a review paper. For such reviews, we refer the reader to [2,3] for STFT and [14,16] for reassignment and synchrosqueezing. Rather, this paper recasts STFT as an IS, which we believe to be a more powerful signal analysis framework. The remainder of this paper is organized as follows. In Section 2, we introduce our notation for the continuous STFT and provide the expressions specifically related to the FT and FB interpretations. In Section 3, we provide a brief history of the development of reassignment techniques leading to the synchrosqueezed STFT. In Section 4, we provide a brief summary of ISA theory. However, as this work is an extension of our prior work, it is strongly recommended that [13] be read in advance. In Section 5, we give our main contribution by recasting the FT and FB interpretations of STFT as an IS. In Section 6, we provide illustrations and discussion on the relationships of the FT and FB interpretations of STFT to IS for example signals. Finally, in Section 7, we provide concluding remarks.

The Short-Time Fourier Transform
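In this section, we review the (continuous) STFT following the development of the discrete STFT by Allen and Rabiner [2]. We begin by choosing a real, even-symmetric window function $w(\cdot)$ such that

$$\int_{-\infty}^{\infty} w(t)\,dt = 1.$$

This ensures that

$$\int_{-\infty}^{\infty} w(\tau - t)\,d\tau = 1,$$

and superimposing all "window grains" $z(t)w(\tau - t)$ over $\tau$ gives

$$z(t) = \int_{-\infty}^{\infty} z(t)\,w(\tau - t)\,d\tau. \tag{3}$$

Next, we review the two interpretations of STFT [2,3].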
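The window-grain superposition can be checked numerically. The following is a minimal sketch, assuming a unit-area Gaussian window and a complex exponential test signal; these choices, the parameter values, and the function names are illustrative assumptions and are not taken from the paper. The integral over $\tau$ is approximated by a Riemann sum.

```python
import cmath
import math

def gaussian_window(t, beta=2.0):
    """Real, even window with unit area (assumed form, not from the paper)."""
    return beta / math.sqrt(2.0 * math.pi) * math.exp(-0.5 * (beta * t) ** 2)

def superimpose_grains(z, t, taus, dtau, window=gaussian_window):
    """Riemann-sum approximation of the integral over tau of z(t) * w(tau - t)."""
    return sum(z(t) * window(tau - t) for tau in taus) * dtau

# Hypothetical test signal: a complex exponential at 3 rad/s.
signal = lambda t: cmath.exp(1j * 3.0 * t)

dtau = 0.01
taus = [-10.0 + n * dtau for n in range(2001)]  # tau grid covering [-10, 10]
t0 = 0.7
recovered = superimpose_grains(signal, t0, taus, dtau)
grain_error = abs(recovered - signal(t0))       # deviation from z(t0)
```

Under these assumptions `grain_error` should sit at the level of floating-point round-off, because the shifted windows integrate to one at every time instant.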

Fourier Transform Interpretation of STFT
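The FT of (3) yields

$$
\begin{aligned}
Z(\omega) &= \int_{-\infty}^{\infty} z(t)\,e^{-j\omega t}\,dt && \text{(4a)}\\
&= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} z(t)\,w(\tau - t)\,d\tau\right] e^{-j\omega t}\,dt && \text{(4b)}\\
&= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} z(t)\,w(\tau - t)\,e^{-j\omega t}\,dt\right] d\tau && \text{(4c)}\\
&= \int_{-\infty}^{\infty} Z_w(\omega;\tau)\,d\tau && \text{(4d)}
\end{aligned}
$$

where $Z_w(\omega;\tau)$ denotes the classical STFT (the use of "classical" and "modified" for describing STFT is based on the current literature; see, for example, [4,17,18,19]). Equating the expressions inside the integrals of (4c) and (4d) shows that STFT may be considered as the FT of all window grains

$$Z_w(\omega;\tau) = \int_{-\infty}^{\infty} z(t)\,w(\tau - t)\,e^{-j\omega t}\,dt. \tag{5}$$

This may be viewed as a function of $\omega$ at a fixed value of time shift $\tau$. The signal may be recovered by means of the overlap-add (OLA) method for short-time synthesis

$$z(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} Z_w(\omega;\tau)\,e^{j\omega t}\,d\omega\,d\tau. \tag{6}$$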
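The statement that the STFT decomposes the FT into spectra of window grains can be illustrated with a discrete, circular analogue. The sketch below is an assumption-laden toy, not the continuous development of the paper: it uses a length-16 sequence, a direct DFT in place of the FT, a circularly symmetric window whose samples sum to one, and a hypothetical chirp as the test signal. It checks that adding the spectra of all window grains returns the spectrum of the signal.

```python
import cmath
import math

N = 16

def dft(x):
    """Direct O(N^2) discrete Fourier transform of a sequence."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]

# Real, circularly even window normalised so that its samples sum to 1.
raw = [math.exp(-0.5 * (min(m, N - m) / 2.0) ** 2) for m in range(N)]
total = sum(raw)
window = [v / total for v in raw]

# Hypothetical test signal: a sampled linear FM chirp.
z = [cmath.exp(1j * 0.15 * n * n) for n in range(N)]

def stft_ft(z, window, tau):
    """Spectrum of the window grain z[n] * w[tau - n] (circular shift)."""
    grain = [z[n] * window[(tau - n) % N] for n in range(N)]
    return dft(grain)

# Superimpose the spectra of all grains and compare with the spectrum of z.
grain_spectra = (stft_ft(z, window, tau) for tau in range(N))
stft_sum = [sum(column) for column in zip(*grain_spectra)]
ft_error = max(abs(a - b) for a, b in zip(stft_sum, dft(z)))
```

In this discrete setting the identity holds exactly up to round-off because the circularly shifted windows sum to one at every sample index.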

Filterbank Interpretation of STFT
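The FB interpretation of the modified STFT may be developed by considering $w(t)e^{j\nu t}$ as a channelizer with center frequency $\nu$

$$Z_w(t;\nu) = z(t) * \left[w(t)\,e^{j\nu t}\right] \tag{7}$$

where $*$ denotes convolution. Equivalently, the classical STFT is

$$Z_w(\nu;t) = e^{-j\nu t}\,Z_w(t;\nu) = \left[z(t)\,e^{-j\nu t}\right] * w(t) \tag{8}$$

which may be viewed as a function of $t$ at a fixed value of $\nu$, i.e., the signal is frequency shifted and filtered with the impulse response $w(t)$ on a continuum of frequency shifts $\nu$. The signal may be recovered by means of the filterbank summation (FBS) method for short-time synthesis

$$z(t) = \frac{1}{2\pi\,w(0)}\int_{-\infty}^{\infty} Z_w(t;\nu)\,d\nu. \tag{9}$$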
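The filterbank view and the FBS synthesis can likewise be sketched in a discrete, circular setting. The example below is an illustrative assumption rather than the paper's continuous formulation: each channel convolves the signal circularly with the window modulated to the channel center frequency, and summing all channel outputs returns the signal scaled by `N * window[0]`, which plays the role of the $2\pi w(0)$ normalization in the continuous case. The test signal and all names are hypothetical.

```python
import cmath
import math

N = 16
raw = [math.exp(-0.5 * (min(m, N - m) / 2.0) ** 2) for m in range(N)]
total = sum(raw)
window = [v / total for v in raw]                     # real, even, unit sum
z = [cmath.exp(1j * 0.15 * n * n) for n in range(N)]  # hypothetical test chirp

def channel_output(z, window, k):
    """Channel k: z circularly convolved with w[m] * exp(j 2 pi k m / N)."""
    h = [window[m] * cmath.exp(2j * math.pi * k * m / N) for m in range(N)]
    return [sum(z[s] * h[(t - s) % N] for s in range(N)) for t in range(N)]

channels = [channel_output(z, window, k) for k in range(N)]

# Filterbank summation: adding all channel outputs at time t gives N * w[0] * z[t].
fbs = [sum(channels[k][t] for k in range(N)) / (N * window[0]) for t in range(N)]
fbs_error = max(abs(a - b) for a, b in zip(fbs, z))
```

Each row `channels[k]` is a time series for one fixed frequency shift, which is the sense in which this interpretation is instantaneous in time.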

Complementary Interpretations
We point out to the reader a crucial difference between the meaning of the independent variables in (5) and (8), even though these equations are equivalent. In (5), $\tau$ is a time shift variable and $\omega$ is instantaneous frequency (IF), while in (8), $t$ is time instant and $\nu$ is a frequency shift variable. While this difference is well known and insignificant in STFT theory, it results in major differences in the relationship of STFT to IS based on the interpretation taken. Note that regardless of how STFT is computed, we may switch interpretations by interchanging variables $t \leftrightarrow \tau$ and variables $\omega \leftrightarrow \nu$ in (10).

Synchrosqueezed Short-Time Fourier Transform
In the 1970s, Kodera et al. [11,20] proposed to modify the spectrogram by taking into account the phase information that is usually discarded. The basic idea is to reassign energy in the spectrogram to a new time and frequency location by utilizing phase derivatives. Kodera's work received little attention for decades [14], and in the 1980s Friedman also proposed spectrogram reassignment [21], apparently without knowledge of the work by Kodera. In Friedman's approach, reassignment occurs in frequency but not in time. In both approaches, phase is used to perform reassignment but is subsequently discarded, thus preventing reconstruction. Slow adoption of these methods is often attributed to the inability to reconstruct the signal and the numerical problems associated with derivative approximations [12,14,16]. Auger and Flandrin showed that the numerical problems associated with the phase derivative could be avoided by computing three STFTs with related window functions, which led to a more efficient implementation.
In the 1990s, reassignment resurfaced when two independent groups developed the reassignment method (RM) [12,14,16] and the SST [22,23]. The RM developed by Auger and Flandrin is similar to Kodera's work in that reassignments occur in both time and frequency. Additionally, they showed that the reassignment concept could be generalized to work for a broader class of time-frequency representations, e.g., in the Wigner-Ville distribution by recasting the problem in terms of centroids instead of phase derivatives. The SST developed by Maes and Daubechies is similar to Friedman's work in that reassignments occur only in frequency using phase derivatives. However, differences from Friedman's work include the use of a complex wavelet transform instead of the STFT and a reassignment of a complex value rather than a real value.
In [14,24], the SST is computed using an STFT rather than a wavelet transform, leading to the synchrosqueezed STFT.

Instantaneous Spectral Analysis
In [13], we introduced ISA as a general framework for time-frequency analysis consisting of three parts: (1) a parameter set, (2) an IS, and (3) a complex AM-FM signal model. More specifically, in this framework: (1) a signal is represented by a set of canonical triplets $\mathcal{S} \triangleq \{\mathcal{C}_0, \mathcal{C}_1, \cdots, \mathcal{C}_{K-1}\}$, (2) each set has a single-valued mapping to an IS, $\mathcal{S} \rightarrow S(t,\omega)$, and (3) each IS has a single-valued mapping to a signal, $S(t,\omega) \rightarrow z(t)$. The canonical triplet for the $k$th AM-FM component is

$$\mathcal{C}_k \triangleq \left(a_k(t),\,\omega_k(t),\,\phi_k\right) \tag{12}$$

where $a_k(t)$ is the instantaneous amplitude (IA), $\omega_k(t)$ is the IF, and $\phi_k$ is the phase reference.
The $k$th complex AM-FM component is then given by

$$\psi_k(t;\mathcal{C}_k) \triangleq a_k(t)\,e^{j\theta_k(t)} = s_k(t) + j\sigma_k(t) \tag{13}$$

where $\theta_k(t)$ is the phase function, $s_k(t)$ is the real part, and $\sigma_k(t)$ is the imaginary part. With (12) and (13), the IS is defined as

$$S(t,\omega;\mathcal{S}) \triangleq \sum_{k=0}^{K-1} \psi_k(t;\mathcal{C}_k)\,\delta\big(\omega - \omega_k(t)\big). \tag{14}$$

The IS $S(t,\omega;\mathcal{S})$ maps to the complex signal $z(t;\mathcal{S})$ through

$$z(t;\mathcal{S}) = \int_{-\infty}^{\infty} S(t,\omega;\mathcal{S})\,d\omega. \tag{15}$$

Finally, the complex signal $z(t;\mathcal{S})$ is represented as a superposition of $K$ (possibly infinite) complex AM-FM components

$$z(t;\mathcal{S}) = \sum_{k=0}^{K-1} \psi_k(t;\mathcal{C}_k). \tag{16}$$

We refer the reader to [13] for additional details. We emphasize to the reader that although IS is expressed by a (complex-valued) function of $t$ and $\omega$, not every function of $t$ and $\omega$ has the necessary structural requirements to be a valid IS. This is not unlike STFT which is also a (complex-valued) function of time and frequency and where not every function of time and frequency has the necessary structural requirements to be a valid STFT. For example, it is well understood that when modifying the STFT magnitude there is the distinct possibility that modification may lead to an invalid STFT. In this case, inversion requires algorithms such as least-squared error inverse STFT (LSE-ISTFT) which inverts the invalid STFT to the signal which has an STFT closest (in an LSE sense) to the invalid STFT [25,26,27].
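The synthesis direction of ISA (triplets to components to signal) can be sketched in a few lines. The following is a minimal illustration under stated assumptions: the `Triplet` class and function names are hypothetical, the phase function is assumed to equal the phase reference at the first time sample, and the phase is accumulated from the IF with the trapezoidal rule. It is not the software referenced in the paper.

```python
import cmath
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class Triplet:
    """Canonical triplet (a_k(t), omega_k(t), phi_k) of one AM-FM component."""
    amplitude: Callable[[float], float]  # instantaneous amplitude a_k(t)
    frequency: Callable[[float], float]  # instantaneous frequency omega_k(t), rad/s
    phase_ref: float                     # phase reference phi_k, rad

def component(triplet, times):
    """Sample psi_k(t) = a_k(t) exp(j theta_k(t)) on a time grid.

    Assumption: theta_k(times[0]) = phi_k; the phase is then accumulated
    from the IF with the trapezoidal rule.
    """
    theta = triplet.phase_ref
    samples = [triplet.amplitude(times[0]) * cmath.exp(1j * theta)]
    for t_prev, t_next in zip(times, times[1:]):
        mean_if = 0.5 * (triplet.frequency(t_prev) + triplet.frequency(t_next))
        theta += mean_if * (t_next - t_prev)
        samples.append(triplet.amplitude(t_next) * cmath.exp(1j * theta))
    return samples

def synthesize(triplets, times):
    """Superimpose all components to obtain the complex signal z(t)."""
    parts = [component(c, times) for c in triplets]
    return [sum(values) for values in zip(*parts)]

times = [0.01 * n for n in range(200)]
triplets = [
    Triplet(lambda t: 1.0, lambda t: 5.0, 0.0),                      # pure tone
    Triplet(lambda t: math.exp(-t), lambda t: 20.0 + 4.0 * t, 0.5),  # decaying chirp
]
z = synthesize(triplets, times)
```

At each time instant, the IS of this toy signal would hold the value of each component at the frequency location given by that component's IF, while `z` is what remains after summing over frequency.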
Moreover, although both the STFT and IS provide spectral representations in time and frequency, they have different structural requirements. This is due to the fact that the requirements of STFT are imposed by the analysis equations in (5) and (8), whereas the requirements of IS are imposed by the definition in (14). One implication of this is that one cannot assume that an STFT has the necessary structure to be a valid IS. On one hand, we show that although the FT interpretation of STFT does not possess the necessary structure to be a valid IS, it may be interpreted as a continuum of ISs. On the other hand, we show that while the FB interpretation of STFT does not possess the necessary structure to be a valid IS, the structure necessary to utilize ISA theory may be imposed by synchrosqueezing the STFT.

Relation to Frequency Domain Analysis
In [13], we gave proof that frequency domain analysis corresponds to a specialized (and restricted) form of an IS when $a_k(t) = a_k\omega_0$, $\omega_k(t) = k\omega_0$, and the discrete set takes on a continuum, i.e., $\omega_0 \rightarrow 0$; the resulting IS, $S^{\mathrm{FD}}(t,\omega)$, is given in (17). Finally, evaluating (17) at $t = 0$ yields (18), which relates this IS to the Fourier spectrum. We refer the reader to [13] for additional details.

Recasting the Short-Time Fourier Transform as an IS
IS provides a signal analysis which is instantaneous in both $t$ and $\omega$. From Section 2, we see that STFT allows for instantaneous analysis in only one of the variables, i.e., the FB interpretation is instantaneous in time while the FT interpretation is instantaneous in frequency (albeit constant frequency). In this section, we recast each STFT interpretation as an IS. While the two interpretations are conceptual in nature, the ISs corresponding to these interpretations take on different mathematical forms. As we show, the IS corresponding to the FB interpretation, $S^{\mathrm{FB}}(t,\omega)$, makes use of AM-FM components, and thus is easily understood in terms of a single IS. On the other hand, the IS corresponding to the FT interpretation, $S^{\mathrm{FT}}_{\tau}(t,\omega)$, uses a more restrictive component and is best understood using a continuum of ISs. In this section, we continue from (11) and develop (19), (20), and (24).

IS Corresponding to the FT Interpretation of STFT
The FT interpretation of the classical STFT in (5) is that of a continuum of FTs indexed by $\tau$. Thus, from the relation of the FT to the IS in (17), the IS corresponding to the window grain at $t = \tau$, denoted $S^{\mathrm{FT}}_{\tau}(t,\omega)$ in (19), is obtained by applying (17) to the grain spectrum $Z_w(\omega;\tau)$. Superimposing the ISs in (19) over $\tau$ gives the IS corresponding to the FT interpretation of STFT in (20). Equation (4d) shows that STFT is a decomposition of FT. Likewise, the continuum of ISs corresponding to the window grains decomposes the IS corresponding to the FT interpretation [see (21)]. In other words, superimposing the ISs corresponding to the FT interpretation of STFT yields the IS corresponding to frequency domain analysis (Fourier transform), $S^{\mathrm{FD}}(t,\omega)$, and as a result does not provide a new IS to study. Rather, when taking the FT interpretation of STFT, we only gain new insights by studying the ISs corresponding to the window grains.

IS Corresponding to the FB Interpretation of STFT
The FB interpretation of the modified STFT in (7) is that of an infinite number of signal components, each corresponding to frequency shift $-\nu$ followed by filtering (convolution) with $w(\cdot)$. Naively comparing (15) with (9), one might be tempted to assume that a corresponding IS may be formed directly from $Z_w(t;\nu)$ by treating the frequency shift $\nu$ as the IF [see (22)]. However, this would be incorrect because it does not provide the structure necessary to be a valid IS. This is further illustrated and discussed below. On the other hand, we can construct an IS with valid structure from $Z_w(t;\nu)$ by reassigning the component associated with frequency shift $\nu$ to the appropriate IF. We begin by writing the modified STFT in polar form as

$$Z_w(t;\nu) = |Z_w(t;\nu)|\,e^{j\theta_w(t;\nu)}, \qquad \theta_w(t;\nu) = \arg\{Z_w(t;\nu)\}. \tag{23}$$

Using the IS definition in (14), each component is placed at its IF $\frac{d}{dt}\theta_w(t;\nu)$, which gives $S^{\mathrm{FB}}(t,\omega)$ in (24) and is immediately recognized as a synchrosqueezed STFT [14,15]. We note that while most developments of reassignment/synchrosqueezing are motivated as post-processing techniques in order to improve readability of spectrograms, in our development, reassignment in frequency is a fundamental necessity to ensure a valid IS structure. Furthermore, techniques which reassign in time are not compatible with a valid IS structure.
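Frequency reassignment of the channel outputs can be sketched numerically. The example below is a simplified illustration, not the authors' implementation or MATLAB's `fsst()`: it assumes a Gaussian window $w(u) = e^{-(\beta u)^2/2}$, approximates the convolution by a Riemann sum, estimates the IF of each channel with a central difference of the phase, and accumulates each channel value in a hypothetical frequency bin centered on that IF. The test signal is a complex exponential at 6 rad/s.

```python
import cmath
import math

def modified_stft(z, t, nu, beta=2.0, du=0.01, half_width=5.0):
    """Riemann sum for Z_w(t; nu) = integral of z(t - u) w(u) exp(j nu u) du,
    with the assumed Gaussian window w(u) = exp(-(beta u)^2 / 2)."""
    steps = int(round(half_width / du))
    total = 0j
    for n in range(-steps, steps + 1):
        u = n * du
        total += z(t - u) * math.exp(-0.5 * (beta * u) ** 2) * cmath.exp(1j * nu * u)
    return total * du

def instantaneous_frequency(z, t, nu, dt=1e-3):
    """Central-difference estimate of d/dt arg Z_w(t; nu)."""
    ahead = modified_stft(z, t + dt, nu)
    behind = modified_stft(z, t - dt, nu)
    return cmath.phase(ahead / behind) / (2.0 * dt)

def reassign(z, t, nus, dnu, bin_width=0.5):
    """Move each channel value Z_w(t; nu) to the bin holding its estimated IF."""
    spectrum = {}
    for nu in nus:
        if_estimate = instantaneous_frequency(z, t, nu)
        bin_centre = round(if_estimate / bin_width) * bin_width
        spectrum[bin_centre] = spectrum.get(bin_centre, 0j) + modified_stft(z, t, nu) * dnu
    return spectrum

omega0 = 6.0
tone = lambda t: cmath.exp(1j * omega0 * t)  # hypothetical test signal
nus = [4.0 + 0.25 * n for n in range(17)]    # channel centre frequencies, 4 to 8 rad/s
squeezed = reassign(tone, 0.3, nus, dnu=0.25)
```

For this signal every channel reports the same IF, so all channel values collapse into the single bin at 6 rad/s and the accumulated value carries the phase of the tone at the analysis time.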

Discussion
A critical difference between the ISs corresponding to FT and FB interpretations of STFT is the form of the components utilized. With the FT interpretation, the individual components are obtained from the classical STFT $Z_w(\omega;\tau)$, as follows. For each point $(\tau,\omega)$, the component is formed as in (25) by multiplying $Z_w(\omega;\tau)$ with $e^{j\omega t}$. Here, $Z_w(\omega;\tau)$ acts as an initial condition, and multiplication with $e^{j\omega t}$ projects the component forward and backward in time. With the FB interpretation, the individual components are obtained from the modified STFT $Z_w(t;\nu)$, as follows. For each frequency shift $\nu$, the component $\psi_{\nu}(t)$ is formed from $Z_w(t;\nu)$ as in (26). While (25) and (26) have similar mathematical forms, they are very different because $Z_w(\omega;\tau)$ is independent of $t$ while $Z_w(t;\nu)$ is dependent on $t$. As a result, the components in (25) have a fixed amplitude $|Z_w(\omega;\tau)|$ and fixed frequency $\omega$, while the components in (26) have, in general, a time-varying amplitude $|Z_w(t;\nu)|$ and time-varying frequency $\frac{d}{dt}\arg\{Z_w(t;\nu)\}$. Thus, the former component is very restrictive, whereas the latter component is AM-FM in nature. This results in significant conceptual and practical differences between $S^{\mathrm{FT}}_{\tau}(t,\omega)$ in (19) and $S^{\mathrm{FB}}(t,\omega)$ in (24), even though mathematically there is little practical difference between $Z_w(\omega;\tau)$ and $Z_w(t;\nu)$. Moreover, (26) along with (23) explains why $Z_w(t;\nu)$ does not have the necessary structure to be interpreted as an IS: the energy associated with the component $\psi_{\nu_0}(t)$ is located at channelizer frequency $\nu = \nu_0$ in $Z_w(t;\nu_0)$, rather than at the appropriate IF location $\frac{d}{dt}\arg\{Z_w(t;\nu_0)\}$.

Instantaneous Spectra Corresponding to STFT Interpretations for Example Signals
In this section, we illustrate through example the IS corresponding to the two interpretations of STFT. Information regarding visualization of the IS can be found in [13] and software for IS visualization at [28,29]. The examples shown below consist of two signals which can be expressed and analyzed in closed form, as well as a real-world signal, i.e., an acoustic recording. In order to develop closed-form expressions for STFT, we choose to analyze the complex exponential and linear FM chirp with a Gaussian window. Our analysis uses the following FT pairs. First, the FT of a quadratic chirplet is given by (27), where $p_1 \in \mathbb{C}$ and $\mathrm{Re}\{p_1\} > 0$. Second, it can be shown by completing the square and using (27) that the FT of a product of time-shifted quadratic chirplets is given by (28), with the chirp parameters $p_1 \in \mathbb{C}$, $p_2 \in \mathbb{C}$, $\mathrm{Re}\{p_1\} > 0$, $\mathrm{Re}\{p_2\} > 0$, $p_3 = p_1 + p_2$, $T = (t_1 p_1 + t_2 p_2)/p_3$, and $c = p_2/p_3 \exp\left[-(t_1^2 p_1 + t_2^2 p_2 - T^2 p_3)\right]$.
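The two FT pairs can be checked numerically. The sketch below assumes the chirplet has the form $e^{-p t^2}$ with $\mathrm{Re}\{p\} > 0$, for which the standard Gaussian integral gives the FT $\sqrt{\pi/p}\,e^{-\omega^2/(4p)}$; the exact normalization used in (27) and (28) may differ. The parameter values and function names are hypothetical, and the FT is approximated by a Riemann sum.

```python
import cmath
import math

def ft_numeric(f, omega, dt=0.005, half_width=12.0):
    """Riemann-sum approximation of the Fourier transform of f at omega."""
    steps = int(round(half_width / dt))
    return sum(f(n * dt) * cmath.exp(-1j * omega * n * dt)
               for n in range(-steps, steps + 1)) * dt

def chirplet_ft(p, omega):
    """Closed-form FT of exp(-p t^2) for Re{p} > 0 (principal square root)."""
    return cmath.sqrt(math.pi / p) * cmath.exp(-omega ** 2 / (4.0 * p))

# Hypothetical chirp parameters and time shifts.
p1, p2 = 0.8 - 1.5j, 0.5 + 0.3j
t1, t2 = 0.4, -0.7
omega = 2.0

single = lambda t: cmath.exp(-p1 * t * t)
single_error = abs(ft_numeric(single, omega) - chirplet_ft(p1, omega))

# Completing the square merges the product of two shifted chirplets into one
# chirplet with parameter p3, (complex) shift T, and a constant gain.
p3 = p1 + p2
T = (t1 * p1 + t2 * p2) / p3
gain = cmath.exp(-(t1 ** 2 * p1 + t2 ** 2 * p2 - T ** 2 * p3))
product = lambda t: cmath.exp(-p1 * (t - t1) ** 2 - p2 * (t - t2) ** 2)
closed = gain * cmath.exp(-1j * omega * T) * chirplet_ft(p3, omega)
product_error = abs(ft_numeric(product, omega) - closed)
```

Both errors should be at round-off level, which supports the expressions for $p_3$ and $T$ and the exponential factor obtained by completing the square.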

Complex Exponential
In the first example, consider the canonical triplet in (29), which using (16a) gives the complex exponential signal in (30) and with (14) gives the IS in (31). Choosing the Gaussian window in (32), we compute the STFT corresponding to the FT interpretation in (33) by using (5), (27), and the time- and frequency-shift properties of the FT. With (33) and (10), we then form the IS corresponding to the FT interpretation in (34). Next, we compute the STFT corresponding to the FB interpretation using (33) and (10), which gives (35) and, in polar form, (36), where $\theta_w(t;\nu) = \omega_0 t$.
Substituting into (24), we form the IS corresponding to the FB interpretation in (37). Comparing (37) with (31), we see that after reassignment $S^{\mathrm{FB}}(t,\omega)$ yields the correct IS (to within a scale factor) for this signal. On the other hand, direct comparison of (34) with (31) is not possible. While one could superimpose (34) on the continuum $\tau$ as described in (20), this would only lead to $S^{\mathrm{FD}}(t,\omega)$. Although this does lead to a valid IS, it is not useful for time-frequency analysis because it provides the same information as FT.
The ISs corresponding to the FT interpretation of the STFT in (34) are shown in the left column of Figure 1. The top plot shows S FD (t, ω) for the complex exponential while the lower three plots show S FT τ (t, ω) for three different τ [see (19)]. The fact that the FT interpretation yields a representation that may be considered as a decomposition into a continuum of ISs of the window grains at each τ [see (21d)] is visually demonstrated by the "+" notation used in the figure. From the figure, it is apparent that the frequency spectrum which results from taking the FT of any window grain has components that extend beyond the time support of the window. Thus, even if a window grain has finite time support, the associated frequency spectrum has infinite extent and is not simply limited to the local vicinity of the window. This demonstrates that while the STFT is mathematically correct, there is a conceptual flaw in the IS obtained by recasting the FT interpretation of STFT.
The FB interpretation of the STFT in (35) is shown in Figure 2a. As discussed in Section 5.2, this STFT is not a valid IS and the structural problem may be seen in (36) and observed in the figure. In particular, the IF of the component corresponding to the channelizer with center frequency $\nu$ is given by $\frac{d}{dt}\theta_w(t;\nu) = \omega_0$, i.e., the IF has constant value $\omega_0$ and is thus independent of the value $\nu$. This can be seen in the figure by observing that the oscillation rate of components is fixed and does not change along the frequency axis. On the other hand, reassigning this STFT using (24) gives (37), which is illustrated in Figure 2b.

Figure 2. Illustrations associated with the filterbank interpretation of the STFT for a complex exponential. (a) Visualization of $Z_w(t,\nu)$, where the coloring is based on magnitude and height reflects the real value; this plot is not a valid IS. (b) $S^{\mathrm{FB}}(t,\omega)$, which is the correct IS after reassignment.
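
Linear FM Chirp
In the second example, consider the linear FM chirp, whose canonical triplet, signal, and IS are given in (38), (39), and (40), respectively. Choosing the Gaussian window in (32), we compute the STFT corresponding to the FT interpretation in (41) by using (5); choosing $p_1 = -j\omega_c/2$, $p_2 = \beta^2/2$, $t_1 = 0$, and $t_2 = \tau$ in (28); and using the time- and frequency-shift properties of the FT. The IS corresponding to the FT interpretation, $S^{\mathrm{FT}}_{\tau}(t,\omega)$, is then formed by substituting (41) into (19).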
Next, we compute the STFT corresponding to the FB interpretation using (41) and (10) and write it in polar form [see (23)]. Finally, the IF as a function of $\nu$ is obtained from the time derivative of the phase. The IS corresponding to the FB interpretation of the STFT, $S^{\mathrm{FB}}(t,\omega)$, is readily obtained from these expressions together with (24). As in the previous example, $S^{\mathrm{FT}}_{\tau}(t,\omega)$ does not allow direct comparison with (40) because it yields a continuum of ISs, and superposition of the IS continuum across $\tau$ would only lead to $S^{\mathrm{FD}}(t,\omega)$. However, unlike in the previous example, reassignment of $Z_w(t;\nu)$ to form $S^{\mathrm{FB}}(t,\omega)$ does not yield the correct IS given in (40). Although reassignment improves energy concentration in time-frequency representations, it is unlikely to lead to the correct IS in general. We note that other variations of synchrosqueezing methods (e.g., higher order and adaptive methods) exist [18,19,30,31,32] that may perform well for specific signals, but in general there exists no single method that is well suited for all signals.
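The claim that reassignment does not recover the true IF of a chirp can be illustrated numerically. The sketch below assumes a chirp of the form $z(t) = e^{j c t^2/2}$ and a Gaussian window $w(u) = e^{-(\beta u)^2/2}$; the values of `CHIRP_RATE` and `BETA` and all function names are hypothetical. Under these assumptions, completing the square suggests the phase-derivative estimate is $ct + \frac{c^2}{\beta^4 + c^2}(\nu - ct)$, which equals the true IF $ct$ only for the channel centered on it.

```python
import cmath
import math

CHIRP_RATE = 3.0  # hypothetical chirp rate c in z(t) = exp(j c t^2 / 2), rad/s^2
BETA = 2.0        # hypothetical window parameter in w(u) = exp(-(BETA u)^2 / 2)

def chirp(t):
    return cmath.exp(0.5j * CHIRP_RATE * t * t)

def modified_stft(t, nu, du=0.005, half_width=5.0):
    """Riemann sum for Z_w(t; nu) = integral of z(t - u) w(u) exp(j nu u) du."""
    steps = int(round(half_width / du))
    total = 0j
    for n in range(-steps, steps + 1):
        u = n * du
        total += chirp(t - u) * math.exp(-0.5 * (BETA * u) ** 2) * cmath.exp(1j * nu * u)
    return total * du

def estimated_if(t, nu, dt=1e-3):
    """Central-difference estimate of d/dt arg Z_w(t; nu)."""
    return cmath.phase(modified_stft(t + dt, nu) / modified_stft(t - dt, nu)) / (2.0 * dt)

t0 = 1.0
true_if = CHIRP_RATE * t0                       # IF of the chirp at t0
on_ridge = estimated_if(t0, nu=true_if)         # channel centred on the true IF
off_ridge = estimated_if(t0, nu=true_if + 2.0)  # channel 2 rad/s above the true IF

# Prediction from completing the square under the assumed signal and window.
shrink = CHIRP_RATE ** 2 / (BETA ** 4 + CHIRP_RATE ** 2)
predicted_off_ridge = true_if + shrink * 2.0
```

With these values the off-ridge channel is pulled toward the true IF but not onto it, so its energy is reassigned to a nearby but incorrect frequency, consistent with improved concentration without exact localization.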
The ISs corresponding to the FT interpretation of the STFT in (41) are shown in the right column of Figure 1. The top plot shows S FD (t, ω) for the linear FM chirp while the lower three plots show S FT τ (t, ω) for three different τ [see (19)]. As before, it is apparent that the frequency spectrum which results from taking the FT of any window grain has components that extend beyond the time support of the window, further demonstrating the conceptual flaw in the FT interpretation of the STFT.
The FB interpretation of the STFT in (41) is shown in Figure 3a. Again, we see in the figure that the oscillation rate of components is fixed and does not change along the frequency axis. With reassignment, the resulting IS shown in Figure 3b [15] has improved energy concentration but does not give the correct IS provided in (40) and shown in Figure 3c.

Figure 3. Illustrations associated with the filterbank interpretation of the STFT for a linear FM chirp. (a) Visualization of $Z_w(t,\nu)$, where the coloring is based on magnitude and height reflects the real value; this plot is not a valid IS. (b) $S^{\mathrm{FB}}(t,\omega)$, which shows that reassignment improves energy concentration, but does not lead to the correct IS [shown in (c)] as explained in Section 6.

Bat Vocalization
Finally, we illustrate the recasting of a synchrosqueezed STFT as an IS (i.e., the IS corresponding to the FB interpretation of the STFT) using a bat vocalization signal which is popular in the time-frequency literature [18,33,34,35]. The acoustic recording features a ∼2.5 ms pulse emitted by the Large Brown Bat (Eptesicus fuscus). The original recording consists of 400 samples captured with a sampling period of 7 µs. In order to alleviate issues associated with numerical derivatives when recasting as an IS, we up-sampled the signal by 4×. Finally, a 128-point Hann window was used in the analysis.
The synchrosqueezed STFT, shown in Figure 4a, was computed using the fsst() function in MATLAB and plotted using the default MATLAB visualization (with a perceptual colormap). For comparison, the 2D IS corresponding to the FB interpretation of the STFT is shown in Figure 4b. Finally, the 3D IS corresponding to the FB interpretation of the STFT is shown in Figure 4c. Broadly speaking, the energy distributions in Figure 4a,b are in general agreement; however, the IS provides more precision that allows the display of finer details. Moreover, while both Figure 4a,b provide information about the magnitude in the time-frequency plane, by leveraging the 3D IS visualization, we are able to additionally illustrate the spectral phase.

Figure 4. Bat vocalization. (a) Synchrosqueezed STFT. (b) 2D IS corresponding to the FB interpretation of the STFT. (c) 3D IS corresponding to the FB interpretation of the STFT. Broadly speaking, the energy distributions in the subplots are in general agreement; however, the IS provides more precision that allows the display of finer details and the 3D IS allows the illustration of the spectral phase.

Conclusions
In this paper, we used the ISA framework to recast the FT and FB interpretations of STFT in terms of an IS. We showed that these two equivalent STFT interpretations lead to different ISs, and thus provide new insights into STFT: the FT interpretation of STFT corresponds to an IS for each window grain, while the FB interpretation of STFT is a valid IS if the STFT is synchrosqueezed. Thus, we provided a new theoretical motivation for synchrosqueezing, which is a fundamental necessity in order to cast the FB interpretation of STFT as a valid IS. We also highlighted the differences in the components for these interpretations, which have significant conceptual and practical consequences. Specifically, in the FT interpretation, the components have a restrictive fixed amplitude and fixed frequency, while in the FB interpretation the components are AM-FM in nature. We leveraged the 3D IS visualization to provide a novel visualization of an STFT in which multiple aspects, i.e., magnitude and phase of each signal component, can be viewed simultaneously. Moreover, the phase is visualized in a way that is easily interpreted; this is in stark contrast with typical STFT analysis where phase is rarely visualized because it is not easily interpreted. Finally, in order to demonstrate these relations and results, we provided examples and illustrations.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.