Tidal Analysis Using Time – Frequency Signal Processing and Information Clustering

Geophysical time series have a complex nature that poses challenges to reaching assertive conclusions, and require advanced mathematical and computational tools to unravel embedded information. In this paper, time–frequency methods and hierarchical clustering (HC) techniques are combined for processing and visualizing tidal information. In a first phase, the raw data are pre-processed for estimating missing values and obtaining dimensionless reliable time series. In a second phase, the Jensen–Shannon divergence is adopted for measuring dissimilarities between data collected at several stations. The signals are compared in the frequency and time–frequency domains, and the HC is applied to visualize hidden relationships. In a third phase, the long-range behavior of tides is studied by means of power law functions. Numerical examples demonstrate the effectiveness of the approach when dealing with a large volume of real-world data.


Introduction
Geophysical time series (TS) can be interpreted as the output of multidimensional dynamical systems influenced by many distinct factors at different scales in space and time.In light of Takens' embedding theorem, these TS can reveal-at least partially-the underlying dynamics of the corresponding systems [1].
Jalón-Rojas et al. [34] compared different spectral methods for the analysis of high-frequency and long TS collected at the Girond estuary.They considered specific evaluation criteria and concluded that the combination of distinct methods could be a good strategy for dealing with data measured at coastal waters.Grinsted et al. [35] adopted the cross wavelet transform and wavelet coherence for examining relationships in time and frequency between two TS.They applied these methods to the Arctic Oscillation index and the Baltic maximum sea ice extent record.Vautard et al. [7] used the singular-spectrum analysis, demonstrating the effectiveness of the technique when dealing with short and noisy TS.Malamud and Turcotte [36] introduced the self-affine TS, characterized by a power spectral density (PSD) that is described by a power law (PL) function of the frequency.They addressed a variety of techniques to quantify the strength of long-range persistence-namely, the Fourier power spectral, semivariogram, rescaled-range, average extreme-event, and wavelet variance analysis.Ding and Chao [9] adopted autoregressive methods for detecting harmonic signals with exponential decay or growth contained in noisy TS.Donelan et al. [10] used the maximum likelihood, maximum entropy, and wavelets for estimating the directional spectra of water waves.Gong et al. [37] adopted the S-transform for analyzing seismic data.Huang et al. [22] proposed empirical mode decomposition and the Hilbert-Huang transform.First, a TS is decomposed into a finite and often small number of intrinsic mode functions, and then the Hilbert transform is applied to the modes.Forootan and Kusche [38] used independent component analysis to separate unknown mixtures of deterministic sinusoids with non-null trend.Doner et al. [39] explored recurrence networks, interpreting the recurrence matrix of a TS as the adjacency matrix of an associated complex network that links different points in time if the considered states are closely neighbored in the phase space.The recurrence matrix yields new quantitative characteristics (such as average path length, clustering coefficient, or centrality measures of the recurrence network) related to the dynamical complexity of the TS.Lopes at al. [32,40,41] investigated geophysical data by means of multidimensional scaling and fractional order techniques.
Tides are variations in the sea level mainly caused by astronomical components, such as gravitational forces exerted by the Moon, the Sun, and the rotation of the Earth, but also reflect non-astronomical sources such as the weather [42].Understanding the sea-level variations is of great importance for both safe navigation and for planning and promoting the sustainable development of coastal areas.Moreover, sea-level observations provide valuable data to ocean sciences, geodynamics, and geosciences [43,44].Tides can be measured by means of gauges, with respect to a datum, and the values are recorded over time.A large volume of tidal information is presently available for scientific research.Tidal TS include harmonic constituents and other components with multiple time scales that span from hours to decades.On such time scales, tidal data are often non-stationary, and as with most geophysical TS, standard mathematical tools are insufficient to satisfactorily assess the information that they embed.
In this paper we combine time-frequency methods and hierarchical clustering (HC) techniques to process and visualize tidal information.In a first phase, we pre-process the raw data (i.e., we fill the gaps in the TS with values calculated with a suitable tidal model), and then we normalize the data to obtain dimensionless TS.In a second phase, we use the Jensen-Shannon divergence (JSD) to measure the dissimilarities between TS collected at several stations located worldwide.The TS are compared in the frequency and time-frequency domains.The frequency domain information consists of the PSD generated by the MM.The time-frequency information corresponds to the magnitudes of the fractional Fourier transform (FrFT) and the continuous wavelet transform (CWT) of the TS.In the three cases, HC generates maps that are interpreted based on the emerging clusters of the points that represent tidal stations.In a third phase, the long-range behavior of tides is modeled by means of PL functions using the TS spectra at low frequencies.Numerical examples demonstrate the effectiveness of the approach when dealing with a large volume of real-world data.
In this line of thought, the structure of the paper is as follows.Section 2 presents the main mathematical tools used for processing the TS.Section 3 introduces the data set and the pre-processing used to generate well-formatted TS.Section 4 applies the HC method and discusses the results.Section 5 studies the long-range behavior of tides by means of PL functions.Finally, Section 6 draws the main conclusions.

Mathematical Fundamentals
This section introduces the main mathematical tools adopted for data processing; namely, the MM, FrFt, CWT, JSD, and HC techniques.These tools are well-suited to TS generated by most naturally-occurring phenomena, as is the case of biological, climatic, and geophysical processes.

Multitaper Method
The MM is a robust numerical algorithm for estimating the PSD of a signal.Given an N-length sequence x(t), its PSD can be estimated by the single-taper, or modified periodogram function, T( f ), derived directly from the FT of x(t) [45].Therefore, we have: where t and f denote time and frequency, respectively, and j = √ −1.The function a(t) is called a taper, or window, and represents a series of weights that verify the condition ∑ N−1 t=0 |a(t)| = 1.If a(t) is a rectangular (or boxcar) function, then (1) yields the standard periodogram of x(t) [46].
Expression (1) leads to a biased estimate of the PSD due to both spectral leakage (i.e., power spreading from strong peaks at a given frequency towards neighboring frequencies) and variance of T( f ) (i.e., noise affecting the spectra).To avoid these artifacts, the MM method was introduced by Thomson [47].In this method, x(t) is multiplied by a set of orthogonal sequences, or tapers, to obtain a set of single-taper periodograms.The set is then averaged to yield an improved estimate of the PSD, S( f ), given by: where obtained with K Slepian sequences, v k (t), that verify [48]: Instead of (2), a weighted average is often adopted that minimizes some measure of discrepancy of Y k , yielding the estimate: where d k ( f ) are weights [47].Some variants of the MM can process TS with gaps [49,50], but we consider herein the "standard" MM implementation, which requires evenly sampled TS without gaps.

Fractional Fourier Transform
The FrFT of order a ∈ R, F a , is a linear integral operator that maps a given function (or signal) x(t) onto x a (τ), {t, τ} ∈ R, by the expression [51]: where, setting α = aπ/2, the kernel K a (τ, t) is defined as: with For a = 2k, k ∈ Z, α ∈ πk, we should take limiting values.Furthermore, when a = 4k and a = 2 + 4k, the FrFT becomes f 4k (τ) = f (τ) and f 2+4k (τ) = f (−τ), respectively, and the kernels are: When a = 1 + 4k, we have F a = F 1 that corresponds to the ordinary Fourier transform (FT), and when a = 3 + 4k, we have F a = F 3 = F 2 F 1 .Therefore, the operator F a can be interpreted as the ath power of the ordinary FT, that may be considered modulo 4 [51,52].For the digital computation of F a , different algorithms were proposed [51].Here we adopt the Fast Approximate FrFT [51] (https://nalag.cs.kuleuven.be/research/software/FRFT/).The signal x(t) must be evenly sampled and without gaps.

Wavelet Transform
The wavelet transform converts a given function, x(t), from standard time into the generalized time-frequency domain, and represents a powerful tool for identifying intermittent periodicities in the data.The discrete wavelet transform is particularly useful for noise reduction and data compression, while the CWT is better for feature extraction [35].
The CWT of x(t) is given by [53][54][55]: where ψ denotes the mother wavelet function, (•) * represents the complex conjugate of the argument, and the parameters (a, b) represent the dyadic dilation and translation of ψ, respectively.The CWT processes data at different scales.The temporal analysis is performed with a contracted version of the prototype wavelet, while frequency analysis is derived with a dilated version of ψ.The parameter a is related to frequency, and b often represents time or space.
The choice of an appropriate mother wavelet represents a key issue in the analysis [53,56].Some initial knowledge about the signal characteristics is important, but we often choose based on several trials and the results obtained.Therefore, the best ψ is the one that more assertively highlights the features that we are looking for.
Two TS can be compared directly by computing their wavelet coherence as a function of time and frequency.In other words, wavelet coherence measures time-varying correlations as a function of frequency [35,44,57].
Given two TS, x i (t) and x j (t), their wavelet coherence is given by [35,44,58]: where S(•) is a smoothing function in time and frequency.
Similarly to the MM and FrFT, the CWT can be applied to TS evenly sampled and without missing data.

Jensen-Shannon Divergence
The JSD measures the dissimilarity between two probability distributions, P and Q [59], and is the smoothed and symmetrical version of the Kullback-Leibler divergence, or relative entropy, given by: The JSD is formulated as: where M = P+Q 2 is a mixture distribution.Alternatively, we can write:

Hierarchichal Clustering
Clustering analysis groups objects that are similar to each other in some sense.In HC, a hierarchy of object clusters is built based on one of two alternative algorithms.In agglomerative clustering, each object starts in its own singleton cluster, and at each step the two most similar clusters are greedily merged.The algorithm stops when all objects are in the same cluster.In divisive clustering, all objects start in one cluster, and at each step the algorithm removes the outsiders from the least cohesive cluster.The iterations stop when each object is in its own singleton cluster.The clusters are combined (split) for agglomerative (divisive) clustering based on their dissimilarity.Therefore, given two clusters, I and J, a metric is specified to measure the distance, δ(x i , x j ), between objects x i ∈ I and x j ∈ J, and the dissimilarity between clusters, d(I, J), is calculated by the maximum, minimum, or average linkage, given by: The results of HC are usually presented in a dendrogram or a tree diagram.

Dataset
The tidal information are available at the University of Hawaii Sea Level Center (http://uhslc.soest.hawaii.edu/).Worldwide stations have records covering different time periods.We consider hourly data collected between January, 1 1994 and December, 31 2014 at s = 75 stations.Their labels, names, and percentage of missing data are shown in Table 1.The stations' geographical location is depicted in Figure 1.Occasional gaps in the TS, x(t), must be filled before applying the MM and CWT processing tools.The missing values are replaced by artificial data generated by a tidal model, x(t), given by: where U 0 = x(t) denotes the average value of x(t), and the sinusoidal terms represent standard tidal constituents of known frequency, f k , k = 1, • • • , T, according to the International Hydrographic Organization (https://www.iho.int/srv1/index.php?lang=en).The amplitude and phase shift, U k and φ k , are computed by the least-squares method.
Herein, we adopt T = 37 and the components listed in Table 2.For example, Figure 2 depicts the original, x(t), and the reconstructed, x(t), TS of Boston, illustrating the effectiveness of the model.Identical results are obtained for other tidal stations.

Analysis and Visualization of Tidal Data
In this section, we use HC for visualizing the relationships between s = 75 tidal TS.The signals, are first normalized to zero mean and unit variance in order to get a dimensionless TS: where µ i and σ i represent the mean and standard deviation values of x i (t), respectively.In Subsections 4.1 and 4.2, we use the JSD to measure the dissimilarities between the tidal data in the frequency and time-frequency domains, respectively, and we apply the HC algorithm to visualize relationships.It should be noted that other dissimilarity measures are possible [32,33], but several numerical experiments led to the conclusion that the JSD yields reliable results.

HC Analysis in the Frequency Domain
Data in the frequency domain corresponds to the TS PSD estimates, Ŝ( f ), calculated with the MM as defined in (5).The superiority of the MM over the standard periodogram is illustrated in Figure 3 for data collected at the Boston tidal station (lat: 42.35 • , lon: −71.05 • ).We observe that the variance (spectral noise) of Ŝ( f ) is considerably smaller than the one obtained for the classical periodogram, T( f ).We obtain similar results for other tidal stations.
We normalize the PSD estimates, Ŝ( f ), by calculating the ratio: where Φ( f ) is interpreted as a probability distribution [60], and we feed the HC with the matrix ∆ = [δ ij ], i, j = 1, ..., 75, where δ ij = JSD(Φ i , Φ j ) represents the JDS between the normalized PSD estimates (Φ i , Φ j ). Figure 4 depicts the tree generated by applying the successive (agglomerative) and average-linkage methods [32,40].The software PHYLIP was used for generating the graph [61].We observe not only the emergence of two main (level 1) clusters, C 1 and C 2 , but also the presence of various sub-clusters at different lower levels.For example, cluster C 1 is composed of level 2 sub-clusters C 11 and C 12 , while C 2 comprises C 21 , C 22 , and the "outlier" station 50.Nevertheless, at lower levels of the hierarchical tree, the elements of certain sub-clusters emerge very close to each other, making visualization more difficult.

The FFrT-Based Approach
The FrFT converts a function to a continuum of intermediate domains between the orthogonal time (or space) and frequency domains.Therefore, it can be thought of as an operator that rotates a signal by any angle, instead of just π/2 radians as performed by the ordinary FT.For a = 0, the FrFT corresponds to the time domain signal.For a = 1, the FrFT yields the ordinary FT.The main peaks observed in the time domain propagate along the continuum of pseudofrequency (or time-frequency) domains (as a increases), originating high-energy paths that determine the shape of the FrFT charts.Close to τ = 92028 (i.e., to half of the total number of samples of the TS), we observe a high-energy component that corresponds to the DC frequency, but other details are difficult to perceive.We obtain similar patterns for other tidal stations.The structure of the FrFT plots reflect the characteristics of the TS.Nevertheless, to the authors best knowledge, there are not yet assertive tools to explore this three-dimensional information.
For each TS, we calculate the corresponding FrFT, and we generate an L × N dimensional complex-valued matrix, W, where L and N denote the number of points in frequency and time, respectively.We then compute the P = LN dimensional vector w(p), p = 1, • • • , P, composed of the columns of |W|, and we perform the normalization: where the function Ω(p) is interpreted as a probability distribution.Finally, we feed the HC with the matrix ∆ = [δ ij ], i, j = 1, ..., 75, where δ ij = JSD(Ω i , Ω j ) represents the JSD between the normalized vectors (Ω i , Ω j ).
Figure 6 depicts the tree generated by the HC.As before, the successive (agglomerative) and average-linkage methods were used [32,40].We observe two main clusters, U 1 and U 2 , that are similar to the ones identified by the MM-based approach, C 1 and C 2 , respectively, revealing good consistency between the two processing alternatives.

The CWT-Based Approach
The CWT is well suited to non-stationary signals and establishes a compromise between precision analysis in the time and frequency domains [62].We adopt here the complex Morlet wavelet, since several numerical experiments were revealed to be a good choice in the context of continuous analysis and feature extraction [35,44].The complex Morlet wavelet is defined as: where f b is related to the wavelet bandwidth and f c is its center frequency.These constants can be interpreted as the parameters of a time-localized filtering, or correlation, operator.Figure 7 depicts the CWT for Boston (lat: 42.35 • , lon: −71.05 • ) and Christmas Is. (lat: 1.983 • , lon: −157.467• ) tidal stations.We observe two main patterns at frequencies around f = 0.08 h −1 and f = 0.042 h −1 , corresponding to the semi-diurnal and diurnal tidal components, but other objects are difficult to identify.For other tidal stations we obtain similar patterns.
Figure 8 shows the similarities between the two station pairs Boston (lat: 42.35  , lon: −74.02 • ).That is, we present one pair of distant and one pair of neighbor stations.We verify that coherence between neighbors is higher and-as expected-we observe regions of strong coherence at the frequencies of the main tidal components (Table 2).However, other strong coherence regions emerge throughout the data which are difficult to infer from the bare CWT charts.Therefore, from Figure 8 we conclude that wavelet coherence is a powerful tool for unveiling hidden similarities between data.Yet, since it produces one chart per TS pair, a large amount of data is generated for all combinations of pairs, and the global perspective is difficult to obtain.To overcome these problems, in the follow up, we combine CWT and HC tools.For all TS, we determine the corresponding CWT, and as described in Subsection 4.2.1, we calculate the function Ω(p), where w(p) now denotes a vector obtained from |W|, with W generated by the CWT.Finally, we feed the HC with the matrix ∆ = [δ ij ], i, j = 1, ..., 75.
Figure 9 depicts the tree generated for matrix ∆.We observe two main clusters, V 1 and V 2 , that are similar to those already identified in the MM-and FrFT-based trees.For example, relative to C 1 and C 2 , the main differences are for stations 30 (Keelung) and 50 (Papeete), which swapped places.Sub-clusters at lower levels are now well separated, demonstrating the superiority of the time-frequency analysis in discriminating differences between the data.
In conclusion, the trees from Figures 4-9 reveal the same type of clusters, with slightly distinct levels of discrimination of the sub-clusters, Figure 9 apparently being slightly superior to the others.This global comparison shows that geographically close stations can behave differently from each other due to local factors.However, this may be not perceived when using standard processing tools.

Long-Range Behavior of Tides
The previous analysis revealed similarities embedded into distinct TS, but does not focus on long memory effects that often occur in complex systems.Having this fact in mind, in this section we study the long-range behavior of tides based on the characteristics of the TS PSD at low frequencies.Therefore, we model the MM estimates, Ŝi ( f ), i = 1, • • • , 75, within the bandwidth f ∈ [ f L , f H ], where f L and f H denote the lower and upper frequency limits by means of PL functions: In this perspective, "low frequencies" means the bandwidth bellow the first harmonics with significant amplitude; that is, f ≈ 24 h −1 .
The values obtained for parameter b reveal underlying characteristics of the tidal dynamics; namely, a fractional value of b may be indicative of dynamical properties similar to those usually found in fractional-order systems [41,63,64].Moreover, Equation (23) implies a relationship between PL behavior and fractional Brownian motion (fBm) [30,65] (1/ f noise [66]), since for many systems fBm represents a signature of complexity [67].The parameters (a, b) are computed for the whole set of time-series (s = 75 in total), and the corresponding locus is depicted in Figure 11.The size and color of the markers are proportional to the value of the root mean squared error (RMSE) of the PL fit.We verify that b has values between 0.2 and 0.8, corresponding to TS including long memory effects typical of fBm.Values of b close to zero mean that tidal TS are close to white noise; that is, to a random signal having equal intensity at different frequencies.On the other hand, values of b close to 1 follow the so-called pink or 1/ f noise, which occurs in many physical and biological systems.In general, for non-integer values of b, signals are related to the ubiquitous fractional Brownian noise.So, we can say loosely that the smaller/higher the values of b, the less/more correlated are consecutive signal samples and the smaller/larger is the content of long-range memory effects.
In Figure 11, we group points in the locus (a, b) into four clusters L i , i = 1, • • • , 4, loosely having the correspondence V 11 ∪ V 12 → L 1 , V 21 → L 2 , and V 22 → L 3 ∪ L 4 .Therefore, we find that the clusters previously identified with the tree diagrams for the global time scale have a distinct arrangement in the long-range perspective.The chart also includes the approximation curves to L i , i = 1, • • • , 4, yielding lines resembling isoclines in a vector field.Third-order polynomials (i.e., degree n = 3) were interpolated since they lead to a good compromise between reducing the RMSE of the fit and avoiding overfitting.From the gradient generated by the isoclines approximation, we observe not only a gradual and smooth evolution between the four isoclines, but also a clear separation between them, with particular emphasis on L 3 and L 4 .This property was not clear in the previous diagram trees.Long-range memory effects are diluted when handling TS simultaneously with long and short time scales, but the (a, b) locus unveils properties that reflect distinct classes of phenomena, and their identification needs further study.
We should also note that the low-frequency range covers time scales between 1 year and several decades.So, the results demonstrate the presence of phenomena influencing tides during long periods of time.The limits of such time scales remain to be explored, since present-day TS do not include sufficiently long records.In other words, the results point toward obtaining longer TS, since relevant phenomena may be not completely captured with the available data.
are spectral estimates, or eigenspectra functions, and Y k ( f ) are the eigencomponents:

Table 1 .
Stations' labels, names, and percentage of missing data.Geographic location of the s = 75 stations considered in the study.