**José A. Tenreiro Machado 1,\* and António M. Lopes 2**


*Received: 26 July 2013; in revised form: 10 September 2013 / Accepted: 12 September 2013 / Published: 16 September 2013* 

**Abstract:** Seismic data is difficult to analyze and classical mathematical tools reveal strong limitations in exposing hidden relationships between earthquakes. In this paper, we study earthquake phenomena in the perspective of complex systems. Global seismic data, covering the period from 1962 up to 2011 is analyzed. The events, characterized by their magnitude, geographic location and time of occurrence, are divided into groups, either according to the Flinn-Engdahl (F-E) seismic regions of Earth or using a rectangular grid based in latitude and longitude coordinates. Two methods of analysis are considered and compared in this study. In a first method, the distributions of magnitudes are approximated by Gutenberg-Richter (G-R) distributions and the parameters used to reveal the relationships among regions. In the second method, the mutual information is calculated and adopted as a measure of similarity between regions. In both cases, using clustering analysis, visualization maps are generated, providing an intuitive and useful representation of the complex relationships that are present among seismic data. Such relationships might not be perceived on classical geographic maps. Therefore, the generated charts are a valid alternative to other visualization tools, for understanding the global behavior of earthquakes.

**Keywords:** seismic events; mutual information; clustering; visualization

Earthquakes are caused by a sudden release of elastic strain energy accumulated between the surfaces of tectonic plates. Big earthquakes often manifest by ground shaking and can trigger tsunamis, landslides and volcanic activity. When affecting urban areas, earthquakes usually cause destruction and casualties [1–4]. Better understanding earthquake behavior can help to delineate pre-disaster policies, saving human lives and mitigating the economic efforts involved in assembling emergency teams, gathering medical and food supplies and rebuilding the affected areas [5–8].

Earthquakes reveal self-similarity and absence of characteristic length-scale in magnitude, space and time, caused by the complex dynamics of Earth's tectonic plates [9,10]. The plates meet each other at fault zones, exhibiting friction and stick-slip behavior when moving along the fault surfaces [11,12]. The irregularities on the fault surfaces resemble rigid body fractals sliding over each other, originating the fractal scaling behavior observed in earthquakes [13]. The tectonic plates form a complex system due to interactions among faults, where motion and strain accumulation processes interact on different scales ranging from a few millimeters to thousands of kilometers [14–16]. Moreover, loading rates are not uniform in time. Earthquakes are likely to come in clusters, meaning that a cluster is most probable to occur shortly after another cluster and a cluster of clusters soon after another cluster of clusters [17]. Earthquakes unveil long range correlations and long memory characteristics [18], which are typical of fractional order systems [19,20]. Some authors also suggest that Self-Organized Criticality (SOC) is relevant for understanding earthquakes as a relaxation mechanism that organizes the terrestrial crust at both spatial and temporal levels [21]. Other researchers [22,23] emphasize the relationships between complex systems, fractals and fractional calculus [24–27].

In this paper, we analyze seismic data in the perspective of complex systems. Such data is difficult to analyze using classical mathematical tools, which reveal strong limitations in exposing hidden relationships between earthquakes. In our approach global data is collected from the Bulletin of the International Seismological Centre [28] and the period from 1962 up to 2011 is considered. The events, characterized by their magnitude, geographic location and time, are divided into groups, either according to the Flinn-Engdahl (F-E) seismic regions of Earth or using a rectangular grid based on latitude and longitude coordinates. We develop and compare two alternative approaches. In a first methodology, the distributions of magnitudes are approximated by Gutenberg-Richter (G-R) distributions and the corresponding parameters are used to reveal the relationships among regions. In the second approach, the mutual information is adopted as a measure of similarity between events in the distinct regions. In both cases, clustering analysis and visualization maps are adopted as an intuitive and useful representation of the complex relationships among seismic events. The generated maps are evidenced as a valid alternative to standard visualization tools, for understanding the global behavior of earthquakes.

Bearing these ideas in mind, this paper is organized as follows: in Section 2, we give a brief review of the techniques used. Section 3 analyses earthquakes' data and discusses results, adopting F-E seismic regions. Section 4 extends the analysis to an alternative seismic regionalization of Earth. Finally, Section 5 outlines the main conclusions.

#### **2. Mathematical Tools**

This section presents the main mathematical tools adopted in this study, namely G-R distributions, mutual information and clustering analysis. The G-R distribution is a two-parameter power-law (PL) that establishes a relationship between frequency and magnitude of earthquakes [29–31].

The concepts of entropy and mutual information [32–35], taken from the information theory, have been a common approach to the analysis of complex systems [36]. In particular, mutual information is adopted as a general measure of correlation between two systems. Mutual information, as well as entropy, have found significance in various applications in diverse fields, such as in analyzing experimental time series [37–39], in characterizing symbol sequences such as DNA sequences [40–42] and in providing a theoretical basis for the notion of complexity [43–47], just to name a few.

Clustering analysis consists on grouping objects in such a way that objects that are, in some sense, similar to each other are placed in the same group (cluster). Clustering is a common technique for statistical data analysis, used in many fields, such as data mining, machine learning, pattern recognition, image analysis, information retrieval and bioinformatics [48–50].

#### *2.1. Gutenberg-Richter Law*

The G-R law is given by:

$$a \log\_{10} N = a - bM \tag{10}$$

where *N* **N** is the number of earthquakes of magnitude greater than or equal to *M* **R**, occurred in a specified region and period of time. Parameters (*a*, *b*) **R** represent the activity level and the scaling exponent, respectively. The former is a measure of the level of seismicity, being related to the number of occurrences. The later has regional variation, being in the range *b* [0.8, 1.06] and *b* [1.23, 1.54] for small and big earthquakes, respectively [30].

#### *2.2. Mutual Information*

Mutual information measures the statistical dependence between two random variables. In other words, it gives the amount of information that one variable "contains" about the other. Let *X* and *Y* represent two discrete random variables with alphabet *X* and *Y*, respectively. The mutual information between *X* and *Y*, *I*(*X*, *Y*), is given by [51]:

$$I(X,Y) = \sum\_{\mathbf{y} \in \mathcal{Y}} \sum\_{\mathbf{x} \in \mathcal{X}} p(\mathbf{x}, \mathbf{y}) \cdot \log\_2 \left( \frac{p(\mathbf{x}, \mathbf{y})}{p(\mathbf{x}) \cdot p(\mathbf{x})} \right) \tag{11}$$

where *p*(*x*, *y*) is the joint probability distribution function of (*X*, *Y*), and *p*(*x*) and *p*(*y*) are the marginal probability distribution functions of *X* and *Y*, respectively. Mutual information is always symmetrical (*i.e.*, *I*(*X*, *Y*) = *I*(*Y*, *X*)). If the two variables are independent, the mutual information is zero.

#### *2.3. K-means Clustering*

*K*-means is a popular non-hierarchical clustering method, extensively used in machine learning and data mining. *K*-means starts with a collection of *N* objects *XN* ={*x*1, *x*2, …, *xN*}, where each object *xn* (1 # *n* < *N*) is a point in *D*-dimensional space (*xn* **R***<sup>D</sup>*), and a user specified number of clusters, *K*. The *K*-means method aims to partition the *N* objects into *K N* clusters, *CK* = {*c*1, *c*2,…, *cK*}, so as to minimize the sum of distances, *J*, between the points and the centers of their clusters, *MK* = {*μ*1, *μ*2, …, *μK*}:

$$J = \sum\_{n=1}^{N} \sum\_{k=1}^{K} r\_{nk} \left\| x\_n - \mu\_k \right\|^2 \tag{12}$$

where *rnk* {0, 1} is a parameter denoting whether object *xn* belongs to cluster *k* [52]. The result can be seen as partitioning the data space into *K* Voronoi cells.

The exact optimization of the *K*-means objective function, *J*, is *NP*-hard. Several efficient heuristic algorithms are commonly used, aiming to converge quickly to local minima. Among others [53] Lloyd's algorithm, described in the sequel, is one of the most popular. It initializes computing the cluster centers *MK* = {*μ*1, *μ*2,…, *μK*}. This can is done randomly choosing the centers, adopting *K* objects as the cluster centers, or using other heuristics. After initialization, the algorithm iterates assigning each object to its closest cluster center:

$$\mathbf{c}\_{k} = \{ n \colon \quad k = \arg\min\_{k} \| \mathbf{x}\_{n} - \boldsymbol{\mu}\_{k} \| ^{2} \} \tag{13}$$

where *ck* represents the set of objects closest to *μk*.

New cluster centers, *<sup>k</sup>*, are then calculated using:

$$\mu\_k = \frac{1}{\left| c\_k \right|} \sum\_{n \in c\_k} x\_n \tag{14}$$

and Equations (4) and (5) are repeated until some criterion is met (e.g., cluster centers do not change in space anymore).

One way to select the appropriate number of clusters, *K*, for the *K*-means algorithm is plotting the *K*-means objective, *J*, versus *K*, and looking at the "elbow" of the curve. The "optimum" value for *K* corresponds to the point of maximum curvature.

#### *2.4. Hierarchical Clustering*

Hierarchical clustering aims to build a hierarchy of clusters [54–57]. In agglomerative clustering each object starts in its own singleton cluster and, at each step, the two most similar (in some sense) clusters are greedily merged. The algorithm iterates until there is a single cluster containing all objects. In divisive clustering, all objects start in one single cluster. At each step, the algorithm removes the "outsiders" from the least cohesive cluster, stopping when each object is in its own singleton cluster. The results of hierarchical clustering are usually presented in the form of a dendrogram.

The clusters are combined (for agglomerative), or split (for divisive) based on a measure of dissimilarity between clusters. This is often achieved by using an appropriate metric (a measure of the distance between pairs of objects) and a linkage criterion, which defines the dissimilarity between clusters as a function of the pairwise distances between objects. The chosen metric will influence the composition of the clusters, as some elements may be closer to one another, according to one metric, and farther away, according to another.

Given two clusters, *R* and *S*, any metric can be used to measure the distance, *d*(*xR*, *xS*), between objects (*xR*, *xS*). The Euclidean and Manhattan distances are often adopted. Based on these metrics, the maximum, minimum and average linkages are commonly used, being, respectively:

$$d\_{\max}(R, S) = \max\_{\mathbf{x}\_R \in R, \mathbf{x}\_S \in S} d(\mathbf{x}\_R, \mathbf{x}\_S) \tag{15}$$

$$d\_{\min}(R, S) = \min\_{\mathbf{x}\_R \in R, \mathbf{x}\_S \in S} d(\mathbf{x}\_R, \mathbf{x}\_S) \tag{16}$$

$$d\_{ave}(R, S) = \frac{1}{|R \| S|} \sum\_{\mathbf{x}\_R \in R, \mathbf{x}\_S \in S} d(\mathbf{x}\_R, \mathbf{x}\_S) \tag{17}$$

While non-hierarchical clustering produces a single partitioning of *K* clusters, hierarchical clustering can give different partitioning spaces, depending on the chosen distance threshold.

#### **3. Analysis Global Seismic Data**

The Bulletin of the International Seismological Centre (ISC) [28] is adopted in what follows. The ISC Bulletin contains seismic events since 1904, contributed by more than 17,000 seismic stations located worldwide. Each data record contains information about magnitude, geographic location and time. Occurrences with magnitude in the interval *M* [–2.1, 9.2], expressed in a logarithm scale consistent with the local magnitude or Richter scale, are available [28]. In the first period of registers (about half a century) the number of records is remarkable smaller and lower magnitude events are scarce, when compared to the most recent fifty years. This may be justified by the technological constraints associated to the instrumentation available in the early decades of the last century. Therefore, to prevent misleading results, we study the fifty-year period from 1962 up to 2011. The events are divided into the fifty groups corresponding to the Flinn-Engdahl (F-E) regions of Earth [58,59], which correspond to seismic zones usually used by seismologists for localizing earthquakes (Table 1).


**Table 1.** Flinn-Engdahl regions of Earth and characterization of the seismic data.


**Table 1.** *Cont.*

### *3.1. K-means Analysis Based on G-R Law Parameters*

In this subsection the data is analyzed in a per region basis. Events with magnitude *M* \$ 4.5 are considered [60]. Above this threshold the cumulative number of earthquakes obeys the G-R law. The corresponding (*a*, *b*) parameters, as well as the coefficients of determination of each fit, *R*, are shown in Table 2.


**Table 2.** G-R law parameters corresponding to the data of each F-E region. The time period of analysis is 1962–2011. Events with magnitude M \$ 4.5 are considered.


**Table 2.** *Cont.*

The (*a*, *b*) parameters are analyzed using the non-hierarchical clustering technique *K*-means. We adopt *K* = 9 clusters as a compromise between a reliable interpretation of the maps and how well-separated the resulting clusters are. The obtained partition is depicted in Figure 1, where the axes values are normalized by the corresponding maximum values. Figure 2 shows the silhouette diagram. The silhouette value, for each object, is a measure of how well each object lies within its cluster [61]. Silhouette values vary in the interval *S* = 1 to *S* = +1 and are computed as

$$S(n) = \frac{b(n) - a(n)}{\max\left\{b(n), a(n)\right\}}\tag{18}$$

where *a*(*n*) is the average dissimilarity between object *n* and all other objects in the cluster to which the object *n* belongs, *ck*. On the other hand, *b*(*n*) represents the average dissimilarity between object *n* and the objects in the cluster closest to *ck*. Silhouette values closer to *S* = +1 correspond to objects that are very distant from neighboring clusters and, therefore, they are assigned to the right cluster. For *S* = 0 the objects could be assigned to another cluster. When *S* = –1 the objects are assigned to the wrong cluster.

From Figure 1, we obtain the *K* = 9 clusters: *A* = {4, 9, 34, 38, 39, 40, 47}, *B* = {36, 44}, *C* = {10, 14, 15, 16, 20, 26, 46}, *D* = {2, 3, 11, 21, 25, 27, 28, 41, 42, 45}, *E* = {49, 50}, *F* = {1, 8, 19, 22, 24}, *G* = {5, 6, 7, 17, 29, 30, 31, 33, 37, 43, 48}, *H* = {12, 13, 18, 23, 32}, *I* = {35}. Adopting the same colour map used in Figure 1, we depict the F-E regions in the geographical map of Figure 3. It can be noted that the obtained clusters correspond quite well to large contiguous regions.

**Figure 1.** *K*-means clustering of all F-E regions and Voronoi cells. Analysis based on the (*a*, *b*) parameters of the G-R law. The time period of analysis is 1962–2011. Events with magnitude *M* \$ 4.5 are considered.

**Figure 2.** Silhouette corresponding to the *K*-means clustering of all F-E regions. Analysis based on the (*a*, *b*) parameters of the G-R law. The time period of analysis is 1962–2011. Events with magnitude *M* \$ 4.5 are considered.

**Figure 3.** Geographical map of the F-E regions adopting the same colour map used in Figure 1 (green lines correspond to tectonic faults).

### *3.2. Analysis by Means of Mutual Information*

43

12

9

10

32

50

In this subsection we take the magnitude of the events as random variable and adopt the mutual information as a measurement of similarities between regions *i* and *j* (*i*, *j* = 1, …, 50). To avoid the systematic bias that occurs when estimating the mutual information from finite data samples we use the expression [62]:

$$I(X,Y) = I\_{hist}(X,Y) + \frac{B\_x + B\_y - B\_{xy} - 1}{2m \ln(2)}\tag{19}$$

$$I\_{hist}(X,Y) = \sum\_{r=1}^{N\_x} \sum\_{s=1}^{N\_y} D\_{xy}(r,s) \cdot \log\_2 \left[ \frac{D\_{xy}(r,s)}{D\_x(r) \cdot D\_y(s)} \right] \tag{20}$$

33

where *m* **N** is the number of data samples, (*Nx*, *Ny*) represent number of bins, [*Dx*(*r*), *Dy*(*s*)] denote the ratios of points belonging to the (*r* th, *s* th) bins and *Dxy*(*r*, *s*) is the ratio of points in the intersection of the (*r* th, *s* th) bins of the random variables. This means that probability density functions *p*(*x*), *p*(*y*) and *p*(*x*, *y*) are estimated via a histogram method, where *p*(*x*) = *Dx*(*r*)*x*(*r*) 1, *p*(*y*) = *Dy*(*s*)*y*(*s*) 1, *p*(*x*, *y*) = *Dxy*(*r*, *s*)*x*(*r*) 1*y*(*s*) 1, and [*x*(*r*), *y*(*s*)] represent the size of the (*r* th, *s* th) bins. Parameters (*Bx*, *By*) represent the number of bins, where [*Dx*(*r*) ; 0, *Dy*(*s*) ; 0] and *Bxy* is the number of bins where *Dxy*(*r*, *s*) ; 0. In this study we adopt *Nx* = *Ny* = 94.

Based on the mutual information, a 50 × 50 symmetric matrix, **I***XY*, is computed and hierarchical clustering analysis is adopted to reveal the relationships between the F-E regions under analysis.

Figure 4a depicts the mutual information as a contour map. As can be seen, the mutual information between F-E regions #35, #49 and #50 and the rest is remarkable higher, hiding the

11

45

38

relationships among most regions. We removed F-E regions #35, #49 and #50 and plotted the corresponding mutual information contour map in Figure 4b.

**Figure 4.** Mutual information represented as a contour map. (**a**) all F-E regions are considered; (**b**) F-E regions #35, #49 and #50 were deleted. The time period of analysis is 1962–2011.

As the graphs in Figure 4 are difficult to analyze, a hierarchical clustering algorithm is adopted for comparing results (Section 2.4.). We used the phylogenetic analysis open source software PHYLIP [63].

The corresponding circular phylograms are generated by successive (agglomerative) clustering and represented in Figure 5a (for all F-E regions) and 5b (for all F-E regions except #35, #49 and #50). The leaves of the phylograms represent F-E regions. An average-linkage method was used to generate the trees.

**Figure 5.** Circular phylogram, based on mutual information, used to compare F-E regions. (**a**) all F-E regions are considered. (**b**) F-E regions #35, #49 and #50 were deleted. The time period of analysis is 1962–2011.

Regarding Figure 5a, cluster {35, 49, 50} is clearly different from the rest, as expected. Moreover, clusters {9, 34, 36, 38}, {11, 28, 42}, {26, 39, 47} and {2, 4, 7, 45} can be identified. A larger cluster contains all the rest. Additionally, in Figure 5b, the clusters {3, 27, 29, 31, 40} and {8, 12, 13, 14, 15, 30}, for example, are easily noted, as well as the main larger cluster composed by the remaining F-E regions. Comparing the results coming from the analysis by means of G-R law parameters and mutual information, namely Figure 1 and Figure 5, we can see that the latter is easier to interpret. However, deciding for one or another approach necessitates a more detailed analysis based on specific evidences and practical knowledge in the field. In conclusion, the proposed analysis, based in seismic data catalogues, can help in understanding the overall complex dynamics of earthquakes.

#### **4. Analysis of Rectangular Grid-Based Regions**

In this section, instead of F-E regions, an alternative seismic regionalization is considered. The mathematical tools presented in Section 3 are also adopted. We propose dividing Earth into 14 F 14 rectangular cells and, as previously, analyzing data in a per region basis. Events with magnitude *M* \$ 4.5 and time period 1962–2011 are considered. The G-R law parameters (*a*, *b*) are computed for each region and the results are depicted in Figures 6 and 7, respectively.

**Figure 6.** Regional variation of G-R parameter *a*. A 14 F 14 rectangular grid is adopted and events with magnitude *M* \$ 4.5 are considered. The time period of analysis is 1962–2011.

**Figure 7.** Regional variation of G-R parameter *b*. A 14 F 14 rectangular grid is adopted and events with magnitude *M* \$ 4.5 are considered. The time period of analysis is 1962–2011.

It can be seen that the activity level parameter, *a*, assumes larger values in areas of larger seismicity that develop closer to tectonic faults. The scaling exponent, *b*, reveals identical behavior, being remarkable higher in Scandinavia, Northern Atlantic, Arabic Peninsula, Russian Far East, Brazilian Northeast and Fiji/Tonga/Samoa region. Alternatively, the mutual information is computed and a phylogram is generated to facilitate visualization for the 14 F 14 grid (Figures 8 and 9).

**Figure 8.** Contour plot representing the mutual information. A 14 F 14 rectangular grid is adopted and events with magnitude *M* \$ 4.5 are considered. The time period of analysis is 1962–2011.

**Figure 9.** Circular phylogram based on mutual information. A 14 F 14 rectangular grid is adopted and events with magnitude *M* \$ 4.5 are considered. The time period of analysis is 1962–2011.

We observe that the analysis based on the Cartesian grid leads to a more comprehensive visualization of the information than the Flinn-Engdahl regions. Therefore, this approach should be considered as an important alternative to classical definitions of geographical layouts for studying the mutual influence of earthquake and geological data.

#### **5. Conclusions**

Based on the magnitudes of the seismic events available in the ISC global catalogue, two schemes were proposed to compare the seismic activity between Earth's regions. A first method consisted in approximating the data by R-G law and analyzing the parameters that define the distributions shape. The second method used the mutual information as a measure of similarity between regions. In both cases clustering analysis was adopted to visualize the relationships between the data. Different measures lead to distinct results. The mutual information based measure gives results easier to interpret. Both measures can help in understanding the overall complex dynamics of earthquake phenomena.

#### **Conflicts of Interest**

The authors declare no conflict of interest.

#### **References**



Reprinted from *Entropy*. Cite as: Zhou, S.; Cao, J.; Chen, Y. Genetic Algorithm-Based Identification of Fractional-Order Systems. *Entropy* **2013**, *15*, 1624-1642.

## *Article*
