Multiscale Decomposition and Spectral Analysis of Sector ETF Price Dynamics

: We present a multiscale analysis of the price dynamics of U.S. sector exchange-traded funds (ETFs). Our methodology features a multiscale noise-assisted approach, called the complementary ensemble empirical mode decomposition (CEEMD), that decomposes any ﬁnancial time series into a number of intrinsic mode functions from high to low frequencies. By combining high-frequency modes or low-frequency modes, we show how to ﬁlter the ﬁnancial time series and estimate conditional volatilities. The results show the different dynamics of the sector ETFs on multiple timescales. We then apply Hilbert spectral analysis to derive the instantaneous energy-frequency spectrum of each sector ETF. Using historical ETF prices, we illustrate and compare the properties of various timescales embedded in the original time series. Through the new metrics of the Hilbert power spectrum and frequency deviation, we are able to identify differences among sector ETF and with respect to SPY that were not obvious before.


Introduction
Asset prices are driven by factors of different timescales, ranging from long-term market regimes to short-term fluctuations. Many market observations and empirical studies (Fouque et al. 2003;In and Kim 2012;Maghyereh et al. 2019;Yahya et al. 2019) suggest that many financial time series often exhibit nonstationary behaviors, such as time-varying volatility and trends. These characteristics can hardly be captured by linear models and call for an adaptive and nonlinear approach for analysis.
One approach for analyzing nonstationary time series is the Hilbert-Huang transform (HHT) introduced by Huang et al. (1998). The HHT method can decompose any time series into oscillating components with nonstationary amplitudes and frequencies using empirical mode decomposition (EMD). This fully adaptive method provides a multiscale decomposition for the original time series, which gives richer information about the time series. The instantaneous frequency and instantaneous amplitude of each component are later extracted using the Hilbert transform. The decomposition onto different timescales also allows for reconstruction up to different resolutions, providing a smoothing and filtering tool that is ideal for noisy financial time series.
As discussed in Huang (2014), the method of HHT and its variations have been applied in numerous fields, from engineering to geophysics. Applications of HHT to finance date back to the work by Huang and co-authors on modeling mortgage rate data (Huang et al. 2003). Empirical mode decomposition (EMD) has been used for financial time series forecasting (Nava et al. 2018b;Wang and Wang 2017) and for examining the correlation between financial time series (Nava et al. 2018a).
In terms of methodology, there have been several studies on the variations and alternatives to EMD, including optimization-based methods Shi 2011, 2013;Hou et al. 2009;Huang and Kunoth 2013), ensemble empirical-mode decomposition (EEMD) for tackling mode mixing (Wu and Huang 2009), and noise-assisted alternative approaches (Yeh et al. 2010).
For financial time series with high levels of intrinsic noise, we apply the complementary ensemble empirical mode decomposition (CEEMD). Like EMD, CEEMD decomposes any time series-stationary or not-into a number of intrinsic mode functions representing the local characteristics of the time series at different timescales, but the timescale separation is improved by resolving mode mixing in EMD (Huang et al. 1999). The noise-assisted approach is also more robust to intrinsic noise in the data. CEEMD have been found to be useful for forecasting (Niu et al. 2016;Tang et al. 2015) and signal processing Li et al. (2015). In our companion papers Zhao (2021a, 2021b), we introduce variations of the complementary ensemble EMD method to analyze cryptocurrency and equity prices, respectively.
We focus our study on the multiscale analysis of sector ETF price dynamics. Our noise-assisted approach decomposes any financial time series into a number of intrinsic mode functions, along with the corresponding instantaneous amplitudes and instantaneous frequencies. Different combinations of modes allow us to reconstruct the time series using components of different timescales. We then apply Hilbert spectral analysis to define and compute the associated instantaneous energy-frequency spectrum to illustrate the properties of various timescales embedded in the original time series. In particular, we derive and compute the central frequency associated with a collection of sector ETFs, which allows us to compare the distinct behaviors of ETF prices.
In the literature, there are relatively few studies that examine and compare the price dynamics of sector ETFs. In Arugaslan and Samant (2014), traditional statistical estimation techniques are used to compare the price dynamics of ETFs. In Tiwari et al. (2017), the authors use multifractal detrended fluctuation analysis (MF-DFA) and the Hurst exponent to show that the sector ETF market is multifractal. They also argue that the dynamics on long and short timescales challenge the market efficiency hypothesis. Similar studies in other markets can also be found in Aloui et al. (2018); Mensi et al. (2017); Yang et al. (2016). These studies provide evidence and motivation that multiscale analysis is necessary to fully understand the price dynamics of various assets.
Our primary objective is to present a new method to analyze sector ETFs on a multiscale level. Comparing to previous studies, our methodology provides a new multiscale time-frequency perspective to investigate sector ETF price dynamics. To our best knowledge, our study is the first application of CEEMD to sector ETFs, and our Hilbert spectral analysis and derivation of power spectrum for sector ETFs are also new.
Moreover, our approach is a fully data-driven method and flexible to adapt to the different timescales embedded in the time series. Through the new metrics like filtered conditional volatility and the Hilbert power spectrum, we provide new ways to discern differences among sector ETFs that are hidden in certain timescales. For example, we introduce frequency deviation in order to measure the synchronization between a sector ETF and the market index.
Among our results, we find that sector ETFs exhibit different degrees of volatility asymmetry. Perhaps surprisingly, some sector ETFs, such as XLP, XLY, and XLC, tend to have higher volatility in an upside market than that in a downside market.
The rest of the paper is structured as follows. We present our time series decomposition method in Section 2. As a direct application, CEEMD is used for filtering ETF time series in Section 3. In Section 4, we discuss the estimation of multiscale conditional volatilities and compare them across sector ETFs. In Section 5, we derive the energy-frequency spectra of various sector ETFs and provide a comparative analysis. Lastly, concluding remarks are provided in Section 6.

Multiscale Decomposition Methodology
For our analysis of financial time series, a major building block is the empirical mode decomposition (EMD) introduced by Huang et al. (1998). In its simplest form, this method iteratively decomposes a time series x(t) into a finite sequence of oscillating components c 1 (t), · · · , c n (t), plus a nonoscillatory trend called the residual term r n (t). Precisely, we have The resulting components c j (t)s are called intrinsic mode functions (IMFs) by Huang et al. (1998). They have certain oscillatory properties and admit well-behaved and physically meaningful Hilbert transform. For each IMF, the numbers of extrema and zero crossings must be equal or at most, differ by one, and the maxima of the function defined by the upper envelope and the minima defined by the lower envelope must sum up to zero at any time. These conditions ensure pure oscillation while allowing time-varying frequency and amplitude. Mathematically, an IMF c(t) can be expressed as where a(t) ≥ 0 is the instantaneous amplitude, and θ(t) is the phase function with θ (t) ≥ 0.

EMD Algorithm and Variations
In other words, the standard EMD algorithm (Huang et al. 1998;Rilling et al. 2003) is a sifting process that decomposes any time series into a finite set of IMFs that oscillate on different timescales, plus a nonoscillatory residual term. The key idea is as follows: look for the finest oscillation by finding all the local maxima and minima, and then subtract the remaining trend until the oscillation satisfies the IMF conditions. Each IMF discovered is removed sequentially from the time series until a nonoscillatory residual term remains. The residual term is a constant or monotonic function, or has, at most, one maximum or minimum.
The algorithm is summarized as follows: • Initialize the residual term as r 0 (t) = x(t) and set j = 1. • While r j−1 (t) is not nonoscillatory, which means having more than one maximum/ minimum, apply the following sifting process: • Initialize the component as c j (t) = r j−1 (t).
-Interpolate the maxima of c j (t) using a cubic spline as the upper envelope u(t), and interpolate the minima of c j (t) using a cubic spline as the lower envelope l(t). Compute the mean of the upper and lower envelopes m(t) = 1 2 (u(t) + l(t)).
Stop when c j (t) satisfies the criteria of an IMF. Let c j (t) be the j-th component, and iterate r j (t) = r j−1 (t) − c j (t) and j ← j + 1.
This algorithm is purely empirically driven and adaptive with minimal assumption on temporal changes and distributional properties, making it ideal for nonstationary financial time series.
When decomposing a time series into IMF components of different frequencies, the phenomenon of mode mixing may arise. Mode mixing is defined as either one IMF consisting of widely disparate scales, or signals of similar scales residing in several IMF components (Huang et al. 1999;Wu and Huang 2009). This problem poses potential challenges on the interpretation of IMFs, and can be exacerbated by the high degrees of nonstationarity and noise commonly observed in sector ETF prices.
The ensemble empirical mode decomposition (EEMD) was proposed by Wu and Huang (2009) to resolve the mode mixing issue. It is a noise-assisted signal processing technique that extracts each component from an ensemble mean computed based on N trials. In each trial i, an i.i.d. white noise w i (t) with a zero mean and finite variance is added to the original time series x(t), and x(t) + w i (t) is referred to as the signal in this trial. The plain EMD algorithm is then applied to the signal, outputting the IMFs c ij (t), j = 1, · · · , n, and the residual term r in (t). Finally, the ensemble mean of the IMF components and residual terms across all the N trails is regarded as the true mode extraction. The components are given by Applying these to (1), we obtain Summing up (4) over the trials divided by N, the ensemble mean of the IMF components is ). Hence, a large ensemble size is desired, though it can also be prohibitive in terms of computational cost and speed. To address this particular issue, Yeh et al. (2010) introduced the complementary ensemble EMD (CEEMD) method. CEEMD adds a complementary negative noise −w i (t) to the ensemble for each trial, thus expanding the total ensemble size to 2N. The components from the ensemble mean sum up to equal the original time series: This holds exactly regardless of the choice of N, thus reducing the need to have a very large ensemble size. In addition to reducing mode mixing, CEEMD is also more robust to intrinsic noise in the original data, as it automatically averages out extra independent noises in the process.

CEEMD for Sector ETFs
We now apply CEEMD to analyze the empirical price dynamics of U.S. sector ETFs. As summarized in Table 1, there are, in total, eleven funds representing all the sectors in the U.S. market. Each ETF represents a portfolio of equities within the corresponding sector. The ETFs hold different numbers of stocks, and their assets under management (AUM) range from under U.S.$10 bil to over U.S.$70 bil. All these sector ETFs, except the real estate sector ETF VNQ, are the Select Sector SPDR Funds issued by State Street Global Advisors (SSGA). SSGA also has a real estate sector ETF (ticker: XLRE), but its AUM is much smaller than a similar fund, VNQ, issued by Vanguard. Hence, we select VNQ over XLRE as the real estate sector ETF in our study. Moreover, the communication services sector ETF XLC has the shortest price history since it was created in June 2018. To make a fair comparison across all the ETFs, we fix the common time frame from 20 June 2018 to 15 June 2021 to accommodate for the shortest price history of the ETFs in Table 1.
In Figures 1-6, we illustrate the decomposition implemented from CEEMD. We take the log prices of the top six sector ETFs: {XLK, XLE, XLF, XLV, XLY, VNQ}, and apply CEEMD to obtain five IMFs along with the residuals. As such, the top row in each plot shows the original log price time series x(t), followed by the IMF c j (t)s with decreasing frequencies, ending with the residual term. The number of IMF components n = 5, so along with the residual term, there are six components in total. As we can see, the first IMF exhibits the highest frequency of fluctuation, whereas the smooth residual term reflects the overall trend.  The modes on different timescales can indicate different behavior of the time series. Let's consider the recent COVID-19 event, for example. When S&P 500 and all sector ETFs began the sharp decline from 2/21/2020 on, intrinsic modes 1, 2, and 3 also showed a notable peak around the time period. The lower-frequency modes 4 and 5 exhibited a steep decrease towards the day of market crash.

Multiscale Filtering
The CEEMD algorithm is an iterative sifting process that identifies the rapid fluctuations and long-term trends from any given time series. As a consequence, the first few components have higher frequency and the last few components have lower frequency, and they all sum up to the original time series. Hence, by combining different components from the decomposition, CEEMD can be used as a filtering and reconstruction tool.
In the reconstruction of the original time series using the IMF components, we can choose a subset of modes as a filter for desired information. By removing the first few high-frequency components, we create a low-pass filter; that is, This reconstruction using only the last few components can serve as a smoothing of the time series. Similarly, we can also build a high-pass filter with which captures the high-frequency local behaviors, and can also be used to estimate volatility.
In each case, the number of components (including the residual term) equals to m l and m h , respectively. We use x (m l ) H (t) to denote the low-pass and high-pass filtered reconstruction of x(t) with m l and m h components. Note that the low-pass filter and the high-pass filter are complementary to each other when m l + m h = n + 1, where n + 1 is the total number of components (including the residual term).
To illustrate these filters, we consider four sector ETFs over a 3-year period. For each ETF, let s(t) be the value of the time series and x(t) := log(s(t)) be its log price. In each case, CEEMD is applied to the time series x(t) to extract the IMFs and residual.
In Figure 7, we show the low-pass filters of the price time series of four sector ETFs: {XLK, XLE, XLF, and XLV}. This involves applying (6) using different collections of components. Specifically, we used the last 4, 3, and 2 components including the residual term, that is, x (m l ) L (t) with m l = 4, 3, 2. The reconstructed log prices were taken exponentially to approximate the original price data. We can see that, with more components included in the reconstruction, the resulting time series resembles the original time series on a finer timescale. Compared to some other time series smoothing techniques, such as the moving average, the CEEMD low-pass filter achieves smoothing without lags.

Multiscale Volatility
We now examine the multiscale properties of volatility we apply CEEMD to filter the original time series into low-pass and high-pass components. Statistical analysis, such as mean and volatility, can be defined on the filtered time series.

High-/Low-Pass Filters
To set up, let s(t) be the price of the asset price, and x(t) = log s(t) be the log price. Implement CEEMD on x(t) and apply (6) and (7) to get x (m l ) H (t), with m l and m h being the number of components in the low-pass and high-pass filters, respectively. To study the statistical properties of the asset prices, we define the low-pass and high-pass log returns: The low-pass volatility and high-pass volatility are the standard deviation of the low-pass and high-pass returns, defined by the unbiased estimators where are the mean low-pass and high-pass log returns. Figures 8 and 9 show the 3-month rolling volatility of XLK, XLE, and SPY computed using all components, high-pass filtered data, and low-pass filtered data. Recall from Figures 1 and 6 that the time series are decomposed into six components, including the residual terms. We use m l = 4 for low-pass filtering and m h = 2 for high-pass filtering. Notice that the high-pass filtered time series captures most of the volatility residing in the original times series. This is also observed in Table 2. Hence, we consider using the stationary high-pass filtered data for statistical analysis. Annualized Volatility (%)

3-month Rolling Volatility of SPY
All Components Low-pass filtered High-pass filtered

Volatility Asymmetry
It has been observed in the financial market that the volatility of asset returns is usually asymmetric, that is, the volatility is higher in a downside market than that in an upside market. Following Bekaert and Wu (2000), we capture the asymmetry by looking at the conditional volatility. Specifically, we examine whether the asymmetry exists in the price dynamics. To that end, we define the conditional volatilities based on the ACE-EMD high-pass filter as follows: In essence, σ −H capture the high-pass volatilities conditioned on an upside movement and a downside movement, respectively. Figure 10 shows the 3-month rolling estimation of the conditional volatility using high-pass filtered data of XLK, XLE, and SPY. In the high-pass components, we see that S&P 500 shows asymmetric volatility with a larger downside volatility most of the time. The sector ETFs, however, exhibit both directions of asymmetry, where upside and downside movements trigger high volatility alternately during different periods of time. This phenomenon suggests distinct behavior of the sector ETFs comparing with the traditional equity indices.
Define the events of upside volatility asymmetry and downside volatility asymmetry as: Annualized Volatility (%)  Annualized Volatility (%)

3-month Rolling Conditional Volatility of SPY
High-pass unconditional High-pass upside High-pass downside In Figure 11, we plot the frequency of event A + , as defined in (15), against the frequency of event A − , as defined in (16). The dashed line corresponds to P(A + ) + P(A − ) = 1, and the closeness to the dashed line indicates the level of volatility asymmetry in general. We can see there are clear clusters separating the ETFs. In the upper left region, the corresponding ETFs, such as XLP, XLY, and XLC, are upside volatility asymmetry dominated. In contrast, towards the lower right direction the corresponding ETFs are downside volatility asymmetry dominated. The sector ETFs with the highest probability of downside asymmetry are XLU, XLE, XLK, XLV, and XLB. The S&P 500 index ETF, along with XLF and XLI, lie somewhere in the middle in terms of the upside/downside asymmetry.

Energy-Frequency Spectrum
As is well-known (see Koopmans 1995;Huang et al. 2003, among others), spectral analysis provides an alternative aspect to analyze time series on the frequency domain. The definition of an IMF is very amenable to Hilbert spectral analysis. In this section, we apply this approach to compute and analyze the energy-frequency spectra of the sector ETFs.

Hilbert Spectral Analysis
Every oscillating real-valued function can be viewed as the projection of an orbit on the complex plane onto the real axis. For any function in time X(t), the Hilbert transform is given by where the improper integral is defined as the Cauchy principle value. The transform exists for any function in L p (see Titchmarsh 1948). As a result, Y(t) provides the complementary imaginary part of X(t) to form an analytic function in the upper half-plane defined by where Recall from Equation (2) that an IMF admits the form c(t) = a(t) cos(θ(t)). It is known that if the amplitude and frequency are slow modulations, then the corresponding Hilbert transform gives a π/2 shift to the phase θ(t) (see Bedrosian 1963;Nuttall and Bedrosian 1966). Therefore, the a(t) given by (19) is the instantaneous amplitude, and the θ(t) given by (20) is the instantaneous phase function. The instantaneous frequency is then defined as the 2π-standardized rate of change of the phase function: Applying Hilbert transform to each of the IMF components individually yields a sequence of analytic signals (see Huang et al. 1998): for j = 1, · · · , n. In turn, the original time series can be represented as This decomposition can be seen as a sparse spectral representation of the time series with time-varying amplitude and frequency. In other words, each IMF represents a generalized Fourier expansion that are suitable for nonlinear and nonstationary financial time series. In summary, the procedure generates a series of complex functions that are analytic in time, along with their associated instantaneous amplitudes and instantaneous frequencies. These components capture different time scales and resolutions embedded in the time series and are used for time series filtering and reconstruction.
Lastly, the Hilbert spectrum is defined by The instantaneous energy of the j-th component is defined as We examine the behavior of the Hilbert spectrum through the pair ( f j (t), E j (t)) for t ∈ [0, T] and j = 1, · · · , n (see Figure 12), which forms a sparse energy-frequency spectrum. Through this new lens, we examine the behaviors of the time series.

Central Frequency and Power Spectrum
For each time series, we can obtain the instantaneous frequency f j (t) from (21) and instantaneous energy E j (t) from (25), corresponding to mode j = 1, · · · , n. This allows us to derive the instantaneous energy-frequency spectrum as shown in Figures 12-15. Each point on the plots is a pair of ( f j (t), E j (t)), for mode j = 1, · · · , n, and time t = 1, · · · , T. We see that for each mode j, the instantaneous energy-frequency pairs ( f j (t), E j (t)), t = 1, · · · , T form a cluster of points. We define the central frequency and central energy of mode j during the time period [0, T] as follows: While the instantaneous frequency and energy are time-varying, they are typically fluctuating or orbiting around the central points. Dragomiretskiy and Zosso (2013) assumed a central frequency in each mode and used the concept to derive the variational mode decomposition (VMD). Intuitively, the central frequency and central energy capture the overall spectral properties of the time series (see Wu and Huang 2004).
From Figures 12-15 we also observe a clear linear relationship in the log space of central frequency and energy pair (f j ,Ē j ), which are marked as the black crosses. A linear regression is run on the central frequencies to estimate the slope of the spectrum. This indicates a power spectrum relation The power spectrum exponent α controls how fast the energy decays from lower to higher frequency, and is key to the property of the spectrum and the associated time series. It has been long observed that many time series in nature have α close to one, well-known as the 1/ f spectrum or "pink noise". In Figures 12-15, the solid line in each plot is obtained from linear regression of (log(f j ), log(Ē j )). The negative slope of the line estimates the power spectrum exponent α for the time series. The power spectrum exponents for the S&P 500 index and U.S. sector ETFs are summarized in Table 3.  Instantaneous Energy E(t)

Frequency Synchronization
To further investigate the spectral characteristics of sector ETFs, we conduct a pairwise comparison between each sector ETF and the S&P500 ETF (SPY). For each pair of ETFs, we plot the associated central frequencies derived from their price time series, denoted bȳ f j , j = 1, · · · , n. In Figures 16-19, we show the log-log scatter plots of the instantaneous frequencies of sector ETFs and SPY. Each point on the plot is a pair of instantaneous frequencies of the two time series recorded at the same time, and the central frequencies are marked as black crosses. The solid straight line of unit slope shows the reference line for identical frequencies.
We observe that some sector ETFs, such as XLF and VNQ, appear to share similar frequency profiles and their central frequencies are very close. On the other hand, the central frequencies deviate from the identical line at low-frequency modes for all sector ETFs vs. SPY, with XLK being the most extreme example. This means that sector ETFs tend to exhibit slower dynamics in the longer term components, while the fast modes of both sector ETFs and SPY have similar mean frequencies. Similar frequency profiles suggest synchronization, which is a typical phenomenon in nonlinear dynamics with interaction (see e.g., Pikovsky et al. 2003). In order to quantify the frequency synchronization level between two time series x 1 and x 2 , we define the associated frequency deviation as follows: A lower frequency deviation value represents a higher synchronization level. In particular, we have D(x 1 , x 2 ) = 0 if, and only if the central frequencies of all the IMF components are identical for x 1 and x 2 , meaning the two time series are fully synchronized.
To see how the sector ETFs have evolved compared to the S&P 500, we plot their power spectrum exponent α estimated from a 2-year rolling window. As shown in Figure 20, the power spectrum exponents α of most sector ETFs are consistently higher than that of SPY. Over the one-year period, the power spectrum exponent of SPY is gradually decreasing from close to 1 to less than 0.8. Several sector ETFs, such as XLE and XLY, continue to have the highest power spectrum exponents through time. In contrast, the real estate ETF VNQ is seen to have the lowest power spectrum exponent over time.
Furthermore, in Figure 21, we see that the sector ETF frequency deviation, when compared against SPY as defined in (29), has shown a general trend of increase from 2020 to 2021. The largest frequency deviation comes from the technology ETF XLK, while the industrial sector ETF XLI has experienced a drastic continual increase in frequency deviation. These observations are evidence of partial asynchronization over a relatively long period of time.
The power spectra and frequency profiles provide new perspectives on how to differentiate and compare the price dynamics of sector ETFs. Our findings on these ETFs are useful for a number of practical purposes, including sector rotation, asset allocation, and portfolio diversification, which are vital not only for individual and institutional investors. Our study also discovers some new price behaviors and properties of sector ETFs. The multiscale metrics used herein, such as asymmetric volatility and frequency deviation, provide better understanding of the risks associated with these ETFs, which is important for both investors and policy makers.

Conclusions
We developed a data-driven algorithm to discover and examine the multiscale characteristics of U.S. sector ETF price dynamics, including intrinsic mode functions of various frequencies. Different combinations of modes allow us to filter the financial time series or estimate its volatility based on different timescales. Across all sector ETFs, we compare their price behaviors as well as multiscale properties. In the same spirit, we apply Hilbert spectral analysis to compute the associated instantaneous energy-frequency spectrum for each sector ETF. In particular, the power spectrum exponents of most sector ETFs have been significantly higher than that of the S&P 500, and the phenomenon tends to persist.
The EMD approach also has its shortcomings. One common issue is the end-effect, whereby the decomposition becomes less accurate near both ends of the time series (see Chen and Feng 2003, among others). The noise-assisted CEEMD method here partially addresses the end effect, but there is no guarantee. In addition, the number of modes from EMD may depend on a number of factors, such as the time frame of the data. Therefore, if two time series result in different numbers of modes, then comparison becomes less direct. As a remedy, one can pre-specify a common number of modes for the decomposition of multiple time series.
For future research, one direction is to apply or extend CEEMD to examine the highfrequency price evolution of ETFs and other assets. The CEEMD method can also be used to generate a collection of new HHT features. These features can be integrated into machine learning models, leading to several practical applications, such as forecasting and classification (see e.g., Leung and Zhao 2021b).