Timelines have been used for centuries and have become increasingly common with the growth of social media in recent years. Every day, smartphones and other Internet of Things (IoT) devices generate massive amounts of time-related data, most of which can be managed as timelines. However, storing, querying, and processing big timeline data effectively and efficiently remains a challenge, especially for instant recommendations based on timeline similarity. Most existing studies have focused on indexing spatial and interval datasets rather than timeline datasets, and many of them are designed for centralized systems. A timeline index structure suited to parallel and distributed computation frameworks is therefore urgently needed. In this research, we have defined the timeline similarity query and developed a novel distributed timeline index, called the Distributed Triangle Increment Tree (DTI-Tree), to support it. Built on Apache Spark, the DTI-Tree consists of one T-Tree and one or more TI-Trees based on a triangle increment partition strategy. Furthermore, we have provided an open-source timeline benchmark data generator, named TimelineGenerator, that produces timeline test datasets for a variety of conditions. Experiments on DTI-Tree construction, insertion, deletion, and similarity queries have been executed on a cluster using two benchmark datasets generated by TimelineGenerator. The experimental results show that the DTI-Tree provides an effective and efficient distributed index solution for big timeline data.
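The abstract does not spell out how timeline similarity is defined. Purely as a point of reference, the sketch below assumes that a timeline is a list of `(start, end)` intervals and scores two timelines by the Jaccard ratio of the time they cover (overlap length divided by union length). `top_k_similar` is a brute-force scan of the kind a distributed index such as the DTI-Tree is meant to avoid. The similarity measure and all function names here are hypothetical illustrations, not the paper's definition or implementation.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical representation: a timeline is a list of (start, end) intervals, start <= end.
Interval = Tuple[float, float]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping intervals so covered time is not double-counted."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def total_length(intervals: List[Interval]) -> float:
    """Total time covered by a list of merged (disjoint) intervals."""
    return sum(e - s for s, e in intervals)


def overlap_length(a: List[Interval], b: List[Interval]) -> float:
    """Length of the intersection of two merged, sorted interval lists (two-pointer sweep)."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def similarity(t1: List[Interval], t2: List[Interval]) -> float:
    """Assumed measure: Jaccard ratio of covered time, in [0, 1]."""
    a, b = merge(t1), merge(t2)
    inter = overlap_length(a, b)
    union = total_length(a) + total_length(b) - inter
    return inter / union if union > 0 else 0.0


def top_k_similar(query: List[Interval],
                  timelines: Dict[str, List[Interval]],
                  k: int) -> List[Tuple[float, str]]:
    """Brute-force top-k query: score every stored timeline, keep the k best."""
    scored = ((similarity(query, t), tid) for tid, t in timelines.items())
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    store = {"a": [(0, 10)], "b": [(5, 15)], "c": [(20, 30)]}
    print(top_k_similar([(0, 10)], store, 2))
```

Because this scan computes one similarity per stored timeline for every query, its cost grows linearly with the dataset, which is the motivation for a partitioned, distributed index.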
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.