# Trajectory Data Mining via Cluster Analyses for Tropical Cyclones That Affect the South China Sea

## 1. Introduction

## 2. Research Methods

#### 2.1. Research Data

#### 2.2. Equal Division of TC Trajectory Method

#### 2.3. Mass Moment of the TC Trajectory Method

#### 2.4. Mixed Regression Model Method

$n_i \times (p + 1)$ Vandermonde matrix; and $\beta$ is the $(p + 1) \times 2$ matrix of regression coefficients. The first column of $\beta$ holds the regression coefficients of the x coordinate of the TC trajectory, and the second column holds those of the y coordinate. $\varepsilon_i$ is the $n_i \times 2$ error term, which follows a normal distribution with zero mean, and $n_i$ is the number of points in trajectory $i$, so $n_i$ varies with the number of observation points in each trajectory. Based on a qualitative manual assessment and a quantitative analysis of how well polynomials fit TC trajectories, Gaffney suggested that the quadratic polynomial gives the best fit [42]. Quadratic polynomials are therefore typically used for clustering analyses of TC trajectories. In this paper, we also adopt the quadratic polynomial to perform the clustering analysis of TCs in the SCS.
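As an illustration of the regression component described above, the following minimal sketch fits a quadratic polynomial ($p = 2$) to a single trajectory by ordinary least squares. It is not the full mixture model, which fits several such regressions jointly. The function name `fit_trajectory`, the normalised time index `t`, and the sample track are assumptions made for this example only.

```python
import numpy as np

def fit_trajectory(lon, lat, p=2):
    """Least-squares fit of z = T @ beta + eps for one trajectory.

    T is the n_i x (p + 1) Vandermonde matrix of a time index t, and
    beta is the (p + 1) x 2 coefficient matrix (column 0: x, column 1: y).
    """
    z = np.column_stack([lon, lat])              # n_i x 2 observed positions
    t = np.linspace(0.0, 1.0, len(z))            # assumed normalised time index
    T = np.vander(t, N=p + 1, increasing=True)   # columns: 1, t, t^2
    beta, *_ = np.linalg.lstsq(T, z, rcond=None)
    residuals = z - T @ beta                     # n_i x 2 error term
    return T, beta, residuals

# Hypothetical track that is exactly quadratic in t
t = np.linspace(0.0, 1.0, 6)
lon = 130.0 - 15.0 * t + 2.0 * t**2
lat = 12.0 + 4.0 * t + 3.0 * t**2
T, beta, res = fit_trajectory(lon, lat)
print(T.shape, beta.shape)
```

Because $n_i$ only changes the number of rows of the Vandermonde matrix, trajectories with different numbers of observation points share the same $(p + 1) \times 2$ coefficient matrix shape, which is what makes them comparable in a common model.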

#### 2.5. Selection of the Number of Clusters

## 3. Results

#### 3.1. Clustering Results for the TC Trajectories

#### 3.2. Trajectory Features of Different TC Classes

#### 3.3. Genesis Locations of Different TC Classes

#### 3.4. Landfall Locations for Different TC Classes

#### 3.5. Intensity and Lifetime of Different TC Classes

#### 3.6. Seasonal Distribution of Different TC Classes

## 4. Discussion

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

**Figure 1.** Variation of the coefficients of variation index with the number of clusters. The left side shows the result of the equal division of the trajectory method, and the right side is the result for the mass moment of the trajectory method. (**a**) Equal division of the trajectory method. (**b**) Mass moment of the trajectory method.

**Figure 2.** Clustering centres of the tropical cyclone (TC) trajectory obtained by different clustering algorithms: (**a**) clustering centre for the equal division of the trajectory method, (**b**) clustering centre for the mixed regression model method, and (**c**) clustering centre for the mass moment of the trajectory method.

**Figure 3.** (**a**–**e**) TC trajectory clustering results and corresponding heading frequency distribution of TC movement obtained by the different trajectory clustering methods.

**Figure 4.** (**a**–**e**) Kernel density distribution of genesis locations for the TC trajectories in the different classes according to the different trajectory clustering methods. The density is the number of TC genesis points per square kilometre, and the unit is $10^{-5}$ events/km$^{2}$.

**Figure 5.** (**a**–**e**) Spatial distribution of landfall locations for the TCs in different classes according to the different trajectory clustering methods. Red indicates the frequent landfall locations, and green indicates the infrequent landfall locations. A value of ±3 is statistically significant at a confidence level of 99%, ±2 is statistically significant at a confidence level of 95%, and ±1 is statistically significant at a confidence level of 90%; 0 indicates that the results are not significant.

**Figure 6.** Intensity distribution and lifetime statistics of TCs in different classes obtained by the different trajectory clustering methods. The histograms on the left side of the figure show the intensity distribution of TCs in different classes; the corresponding boxplots on the right side show the lifetime statistics of TCs in the classes. The line and the diamond within each box represent the median and the mean lifetime of TCs in the class, respectively. (**a**) Intensity and lifetime statistics obtained by the equal division of the trajectory method; (**b**) intensity and lifetime statistics obtained by the mixed regression model method; (**c**) intensity and lifetime statistics obtained by the mass moment of the trajectory method.

**Figure 7.** (**a**–**e**) Seasonal distributions of TCs in different classes according to the different trajectory clustering methods.

**Figure 8.** Central point and variance ellipse of the trajectory obtained by the (**a**) original trajectory points and (**b**) equal division trajectory points for Typhoon Emma (November 1959).
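The comparison in Figure 8 can be reproduced in outline with the sketch below, which resamples a track into equal-length segments and then computes the central point (first moment) and variance ellipse (second central moments) of the points. The function names, the planar treatment of longitude and latitude, and the sample track are assumptions for illustration; they are not taken from the paper.

```python
import numpy as np

def equal_division(points, m):
    """Resample a track into m equal-length segments (m + 1 points).

    Assumption: distances are planar in lon/lat degrees, not great-circle.
    """
    pts = np.asarray(points, dtype=float)
    seg = np.hypot(*np.diff(pts, axis=0).T)        # length of each segment
    s = np.concatenate([[0.0], np.cumsum(seg)])    # cumulative arc length
    targets = np.linspace(0.0, s[-1], m + 1)       # equally spaced arc lengths
    return np.column_stack([np.interp(targets, s, pts[:, k]) for k in range(2)])

def variance_ellipse(points):
    """Return the centroid, (major, minor) semi-axes and major-axis angle."""
    pts = np.asarray(points, dtype=float)
    centre = pts.mean(axis=0)                      # first moment
    cov = np.cov(pts, rowvar=False, bias=True)     # 2 x 2 second central moments
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    eigvals = np.clip(eigvals, 0.0, None)          # guard against tiny negatives
    semi_axes = np.sqrt(eigvals[::-1])             # major axis first
    major = eigvecs[:, -1]
    angle = np.arctan2(major[1], major[0])         # orientation in radians
    return centre, semi_axes, angle

# Hypothetical track with unevenly spaced observation points
track = [(130.0, 10.0), (129.0, 10.0), (128.0, 10.0), (128.0, 14.0)]
resampled = equal_division(track, 3)
centre_orig, _, _ = variance_ellipse(track)
centre_eq, semi_axes_eq, _ = variance_ellipse(resampled)
print(centre_orig, centre_eq)
```

In this toy track the original points cluster on the slow-moving first leg, so their centroid is pulled toward it; after equal division the centroid depends only on the shape of the path.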

TC Trajectory Class | Equal Division: Average Length (km) | Equal Division: Standard Deviation (km) | Mixed Regression Model: Average Length (km) | Mixed Regression Model: Standard Deviation (km) | Mass Moment: Average Length (km) | Mass Moment: Standard Deviation (km)
---|---|---|---|---|---|---
Class A | 1804.77 | 824.79 | 1939.48 | 900.28 | 2014.61 | 1007.87
Class B | 3297.24 | 971.83 | 3429.85 | 966.40 | 3318.78 | 1066.28
Class C | 4604.63 | 1297.13 | 5118.53 | 1275.13 | 5239.34 | 1416.27
Class D | 4049.65 | 1846.76 | 5216.37 | 1945.33 | 3812.19 | 1529.89
Class E | 7526.23 | 2210.65 | 4331.05 | 2545.40 | 8489.51 | 1930.25

**Table 2.** Number of TC trajectories in the different classes and the proportion relative to the total TC number.

TC Trajectory Class | Equal Division: Number of Trajectories | Equal Division: Percentage | Mixed Regression Model: Number of Trajectories | Mixed Regression Model: Percentage | Mass Moment: Number of Trajectories | Mass Moment: Percentage
---|---|---|---|---|---|---
Class A | 227 | 24% | 260 | 27% | 261 | 28%
Class B | 266 | 28% | 208 | 22% | 243 | 26%
Class C | 203 | 21% | 119 | 13% | 186 | 20%
Class D | 166 | 18% | 163 | 17% | 204 | 21%
Class E | 84 | 9% | 196 | 21% | 52 | 5%
Overall | 946 | 100% | 946 | 100% | 946 | 100%

**Table 3.** Average longitude and latitude coordinates of the starting TC trajectory points in different classes.

TC Trajectory Class | Equal Division: Longitude | Equal Division: Latitude | Mixed Regression Model: Longitude | Mixed Regression Model: Latitude | Mass Moment: Longitude | Mass Moment: Latitude
---|---|---|---|---|---|---
Class A | 117.38 | 14.914 | 118.99 | 14.47 | 118.90 | 14.72
Class B | 134.01 | 11.81 | 135.94 | 10.79 | 135.09 | 10.28
Class C | 148.93 | 9.78 | 150.84 | 7.87 | 150.04 | 9.62
Class D | 130.67 | 16.06 | 142.38 | 12.43 | 130.66 | 17.00
Class E | 132.75 | 13.75 | 127.75 | 17.05 | 132.96 | 13.89

Clustering Method | Equal Division of the Trajectory Method | Mixed Regression Model Method | Mass Moment of the Trajectory Method
---|---|---|---
Type of method | Transforms the original trajectory, then applies a general clustering model | Clusters the original trajectory data directly with a mathematical model | Transforms the original trajectory, then applies a general clustering model
Model complexity | Simple | Complicated | Relatively complicated
Information contained | Spatial location and shape information of the trajectory | Original complete trajectory information | Spatial location, shape information and some velocity information of the trajectory
Clustering results | Trajectory shape consistency is relatively good | Trajectory spatial consistency is relatively good | Essentially similar to the equal division of the trajectory method
Class centre | Average trajectory in the class | Quadratic curve | Variance ellipse

**Table 5.** Cosine similarity statistics of TC trajectories in different classes obtained by the various trajectory clustering methods.

TC Trajectory Class | Equal Division: Mean | Equal Division: Maximum | Equal Division: Minimum | Mixed Regression Model: Mean | Mixed Regression Model: Maximum | Mixed Regression Model: Minimum | Mass Moment: Mean | Mass Moment: Maximum | Mass Moment: Minimum
---|---|---|---|---|---|---|---|---|---
Class A | 0.991 | 1.000 | 0.914 | 0.990 | 1.000 | 0.924 | 0.991 | 1.000 | 0.914
Class B | 0.993 | 1.000 | 0.944 | 0.994 | 1.000 | 0.963 | 0.992 | 1.000 | 0.928
Class C | 0.992 * | 1.000 | 0.950 | 0.991 * | 1.000 | 0.939 | 0.986 | 1.000 | 0.898
Class D | 0.988 | 1.000 | 0.911 | 0.981 | 1.000 | 0.877 | 0.986 | 1.000 | 0.922
Class E | 0.987 * | 1.000 | 0.940 | 0.982 | 1.000 | 0.885 | 0.987 * | 1.000 | 0.940
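For readers unfamiliar with the statistic in Table 5, the sketch below shows one way to compute cosine similarity between trajectories and to summarise it per class. The text available here does not state which vectors were compared, so the choice to flatten each trajectory into a coordinate vector and compare it with a class reference trajectory is an assumption, as are the function names and the sample data.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two flattened coordinate vectors."""
    u = np.ravel(u).astype(float)
    v = np.ravel(v).astype(float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def class_similarity_stats(trajectories, reference):
    """Mean, maximum and minimum similarity of class members to a reference.

    Assumption: all trajectories have the same number of points as the
    reference, for example after resampling.
    """
    sims = np.array([cosine_similarity(tr, reference) for tr in trajectories])
    return sims.mean(), sims.max(), sims.min()

# Hypothetical class of three short tracks (lon, lat) and its mean track
members = np.array([
    [[130.0, 10.0], [126.0, 12.0], [122.0, 14.0]],
    [[131.0, 11.0], [127.0, 13.0], [123.0, 16.0]],
    [[129.0, 9.0], [125.0, 11.0], [120.0, 13.0]],
])
reference = members.mean(axis=0)
mean_sim, max_sim, min_sim = class_similarity_stats(members, reference)
print(round(mean_sim, 3), round(max_sim, 3), round(min_sim, 3))
```

Under this assumption, coordinate vectors with large positive longitudes and latitudes point in nearly the same direction, so similarities close to 1 are expected even for visibly different tracks.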

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

