Track Pairs Collision Detection with Applications to Ship Collision Risk Assessment

The port waterway network plays an important role in the organization and management of port ship traffic. Because ship operations are restricted, conflicts, congestion, and safety issues often arise in port waters. Conflicts between ships can be predicted by collision detection between ships. A novel collision detection algorithm for trajectory pairs is proposed by introducing variable time intervals. In addition, to improve the overall accuracy of trajectory compression and reduce redundant calculation in collision detection, a multi-factor Douglas-Peucker algorithm adapted to ship trajectory compression is proposed that takes speed and turn constraints into account. The maximum speed difference of the algorithm is increased by 1.5–2.5%, and the average speed difference is increased by 2.0–4.5%. Based on these methods, a risk assessment framework for maritime collision is established, and the risk situation of the waters near Ningbo Zhoushan Port is evaluated and analyzed using historical ship track data.


Introduction
As the last nautical mile of ship transportation, navigation safety in port waters directly affects trade between countries [1]. To strengthen the safety of navigation in port waters, the International Maritime Organization (IMO) has established a traffic separation system and a ship reporting system. At the same time, countries have established Vessel Traffic System (VTS) supervision based on the characteristics of their water areas. Figure 1 shows the VTS management area near Ningbo-Zhoushan port and Yang-shan port, which contains several functional areas such as anchorages, fairways, alert areas, and pilot embarking or disembarking areas (points). Ningbo-Zhoushan port and Yang-shan port have both become among the biggest ports in the world, and their vessel traffic flow is very complex, constrained by narrow waterways, islands, bridges, etc. Several managerial and navigational solutions, such as the Traffic Separation Scheme (TSS) and VTS, have been implemented in this area. However, the actual effectiveness of these solutions in improving navigational safety, and the focus of subsequent navigational regulation, still deserve further study.
Current theories and models for ship collision risk evaluation can be classified as macroscopic or microscopic [2]. Macroscopic collision evaluation methods are mostly based on collision data, traffic flow statistics, and relevant hydro-meteorological data, and use statistical analysis to make a comprehensive evaluation of navigational safety in a certain range of waters. They can be subdivided into two categories: methods based on the number of collisions and the collision rate, and methods based on the number of encounters and the encounter rate. This study focuses on the macroscopic collision evaluation of the waters described above. In [3], the authors assessed the collision risk of the narrow waterway in Ningbo-Zhoushan with a collision detection method and the ship domain. That study first aligns and discretizes the trajectories, and then achieves collision detection of trajectory pairs by detecting collisions between isolated trajectory points. Such an approach is inefficient, and its process is complicated.
Regarding conflict prediction methods within the channel, the authors of [4,5] classified conflicts into two categories, connecting segments and intersections, by introducing the concept of a network. Reference [6] provides a literature review on the assessment of individual collision risk in ship navigation. Chai [7] proposed a quantitative risk assessment model that considers loss of human life and environmental pollution. Weng [8] estimated the collision frequency of vessels in the Singapore Strait with real-time vessel movement data. Goerlandt [9] developed a collision alert framework from the perspective of the concept of risk and the intended use of the model. Furthermore, the authors also introduced a method for measuring ship collision risk, which was successfully applied to a case study. Debnath [10] studied the influential factors of collision risks and analyzed the associations between risks and the geometric, traffic, and traffic control regulation characteristics of Singapore port fairways. Based on the ship domain safety areas derived by the VTS operator, Yoo [11] proposed a real-time collision risk assessment support system, whose validity was verified in scenarios at Busan port.
Macro collision risk evaluation is mostly obtained by statistical analysis of historical big data, and the huge amount of data complicates the research. For the compression of trajectory big data, the well-known Douglas-Peucker algorithm was proposed in [12]. Considering the importance of the velocity of a moving object in map-matching [13], mobility prediction, and moving pattern mining [14], Ying et al. [15] proposed an algorithm that preserves velocity when simplifying trajectories.
The key to ship collision risk assessment is to calculate the number of conflicts between ships. At present, in ship conflict detection research, most continuous collision detection is carried out through discrete collision detection. First, the continuous state of the trajectory pair is discretized, and then approximately continuous collision detection is performed by judging whether there is a collision between the finite states of the trajectory pair. Although this approach is simple to understand and implement, its efficiency is very low. Details of real-time collision detection can be found in [16]. Tang [17] presented a novel continuous collision detection algorithm by introducing deforming non-penetration filters.
The purpose of this work is to evaluate the collision risk for vessels in specific waters. The main contribution has two parts. First, our previous research is improved and a new trajectory simplification method is designed. Second, a collision detection method between trajectories is proposed, which can effectively improve the efficiency of collision detection and reduce the difficulty of algorithm implementation. The framework of ship collision risk assessment is shown in Figure 2.

AIS Compression Based on the Improved Douglas-Peucker Algorithm
Ship trajectory data are accumulated continuously through AIS equipment, whose broadcast interval can be as short as 5 s, so analyzing ship collisions from massive AIS data consumes a lot of computing resources. Ships sailing in the waterway network are, moreover, limited by the direction of the waterway. According to statistics, the key points that describe navigation behavior account for only about 8% of a ship trajectory. Therefore, our primary work is to compress the AIS data.
Some work on the compression of AIS data was done in our previous study [18]. Considering that the historical trajectory of a ship contains information such as speed, position, and heading, this section improves the DP (Douglas-Peucker) algorithm, which is based on Euclidean distance, by also considering the influence of ship speed. The improved algorithm compresses and preprocesses the ship AIS data, reduces the number of AIS points, and improves the efficiency of ship collision detection.

Trajectory Preprocessing
We use the trajectory AIS data of Ningbo-Zhoushan, China, in 2018. Because the original data are redundant, and the same ship may make many voyages into the same port, the AIS data should be preprocessed before grouping by voyage. The preprocessing steps are as follows (Algorithm 1).

Algorithm 1. Preprocessing AIS data
Require: AIS database aisdata.db, threshold_time = 30 min
1: Select distinct(MMSI) from aisdata.db into squry_mmsi
2: For each sub_mmsi in squry_mmsi
3:   Select track from aisdata.db where MMSI = sub_mmsi
4:   If the number of points in track is less than 10 then continue
5:   Else For each point in track
6:     Insert point into voyage
7:     If the time difference time_diff between the next timestamp and the present timestamp is bigger than threshold_time then insert voyage into return_voyages, set voyage to NULL
8:     End If
9:   End for
10:   Insert the remaining voyage into return_voyages, set voyage to NULL
11:   End If
12: End for
13: Return return_voyages

The trajectory coordinates are stored as longitude and latitude. To satisfy the accuracy and efficiency requirements of trajectory compression and clustering, they need to be transformed from the geodetic coordinate system to Mercator coordinates. Here $(\varphi, \lambda)$ are the geographical coordinates and $(x, y)$ are the Mercator coordinates, $a$ is the semi-major axis of the Earth ellipsoid, $e$ is the first eccentricity of the Earth ellipsoid, $q$ is the meridional part at latitude $\varphi$, and $r_0$ is the radius of the parallel circle at the standard latitude $\varphi_0$. In its standard ellipsoidal form, the transformation from geographical to Mercator coordinates reads:

$$r_0 = \frac{a\cos\varphi_0}{\sqrt{1-e^2\sin^2\varphi_0}}, \qquad q = \ln\tan\left(\frac{\pi}{4}+\frac{\varphi}{2}\right) + \frac{e}{2}\ln\frac{1-e\sin\varphi}{1+e\sin\varphi}, \qquad x = r_0\,\lambda, \qquad y = r_0\,q$$
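The two preprocessing steps above, voyage splitting and coordinate projection, can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function names, the in-memory tuple layout (instead of the aisdata.db database), and the WGS-84 constants are assumptions, and the projection is the standard ellipsoidal Mercator formula with x taken as easting.

```python
import math
from datetime import datetime, timedelta

# WGS-84 ellipsoid constants (an assumption; the text does not name the datum).
A = 6378137.0            # semi-major axis a, metres
E = 0.0818191908426      # first eccentricity e


def split_voyages(points, threshold=timedelta(minutes=30), min_points=10):
    """Split one ship's time-ordered points into voyages at gaps > threshold.

    `points` is a list of (timestamp, lon, lat) tuples for a single MMSI.
    Tracks with fewer than `min_points` points are skipped, as in Algorithm 1.
    """
    if len(points) < min_points:
        return []
    voyages, voyage = [], [points[0]]
    for prev, cur in zip(points, points[1:]):
        if cur[0] - prev[0] > threshold:
            voyages.append(voyage)   # close the voyage at the time gap
            voyage = []
        voyage.append(cur)
    voyages.append(voyage)           # close the last open voyage
    return voyages


def to_mercator(lon_deg, lat_deg, lat0_deg=0.0):
    """Ellipsoidal Mercator projection; returns (x, y) in metres."""
    lam, phi, phi0 = map(math.radians, (lon_deg, lat_deg, lat0_deg))
    # Radius of the parallel circle at the standard latitude.
    r0 = A * math.cos(phi0) / math.sqrt(1.0 - (E * math.sin(phi0)) ** 2)
    # Meridional part (isometric latitude) at latitude phi.
    q = math.log(math.tan(math.pi / 4.0 + phi / 2.0)) + (E / 2.0) * math.log(
        (1.0 - E * math.sin(phi)) / (1.0 + E * math.sin(phi))
    )
    return r0 * lam, r0 * q
```

Working in projected metres lets the later distance thresholds and safety-domain radii be compared directly with planar distances.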
The DP algorithm [12], one of the most accurate line simplification algorithms, was proposed in 1973. The core concept of the DP compression method is to reduce the original trajectory points by comparing the perpendicular Euclidean distance of each remaining point with a compression threshold. Before simplifying the trajectory data, the compression threshold τ must be specified. The DP algorithm recursively divides the curve composed of line segments. Initially, it is given all the points between the first and the last point. The first and last points are kept, and the point furthest from the line segment joining them is found. If that point is closer than τ to the line segment, then all intermediate points can be discarded; otherwise, that point must be kept. The algorithm then recursively calls itself with the first and the furthest points, and with the furthest and the last points. The Ramer-Douglas-Peucker (RDP) algorithm is also known as the Douglas-Peucker algorithm and the iterative endpoint fit algorithm, so in the following the RDP algorithm and the DP algorithm refer to the same method.
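The recursion described above can be written compactly. The following is a minimal sketch of the classical DP algorithm on planar (x, y) points, assuming projected coordinates; the function names are illustrative.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:                      # a and b coincide
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def douglas_peucker(points, tau):
    """Return the simplified polyline, always keeping the first and last point."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    # Find the interior point furthest from the baseline first-last.
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tau:                        # every interior point can be dropped
        return [first, last]
    left = douglas_peucker(points[: index + 1], tau)
    right = douglas_peucker(points[index:], tau)
    return left[:-1] + right               # the split point appears only once
```

For example, `douglas_peucker([(0, 0), (1, 0.05), (2, 0), (3, 3), (4, 0)], 0.5)` drops only the second point, which lies within 0.5 of the segment joining its retained neighbours.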
Based on the DP algorithm and combined with speed and turning constraints, the Multi-Factor Douglas-Peucker (MFDP) algorithm (Algorithm 2) is proposed. Before introducing the MFDP algorithm, we first describe the formal definitions of its parameters: [...]

Algorithm 2. MFDP algorithm
[...] Find the point with maximum distance related to the baseline as d
3: If d > dmax, split the track point list into two parts
[...] MFDP (D, d, j, e)
6: Else output $P_i$, $P_j$

Error Analysis
The main measurements for the experimental evaluation are the accuracy in turning angle, $SED_C$, and in speed, $SED_V$. The speed and turning angle differences between an AIS track and its compressed version are measured with the Synchronized Euclidean Distance (SED). For any point in the original track, SED is the distance between its actual location and its synchronous position, which is estimated by interpolation between its predecessor and successor points in the compressed track, as shown in Figure 3.
The Synchronous Euclidean Distance errors of turning angle and speed are evaluated at these synchronized points. The algorithm for generating synchronization points is presented in Algorithm 3.

Algorithm 3. Generating synchronization points
Require: Subtrack is a points list of each trajectory in tracks; marklabel is the filtered point label list; id is each filtered point label in Subtrack; Outtrack is the generated synchronization trajectory list; L is the number of points in Outtrack.
1: For each submark in tracks do
2:   If the number of submark is less than 3 then continue
3:   Else
4:     For each id in marklabel do
5:       while L < id do
6:         [...]
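The idea of synchronization points can be illustrated with a short sketch: for every original timestamp, a value is interpolated between the neighbouring retained points, and the error is the gap between the original and the synchronized value. This is an assumed formulation, not a transcription of Algorithm 3; the function names and the (total, average) return pair are illustrative. It applies directly to speed; course values would additionally need wrap-around handling at 0°/360°.

```python
from bisect import bisect_right


def synchronized_values(times, values, kept):
    """Linearly interpolate `values` at every original timestamp using only
    the retained indices `kept` (sorted, containing the first and last index)."""
    kept_times = [times[k] for k in kept]
    out = []
    for t in times:
        # Locate the retained segment [kept[pos], kept[pos + 1]] that contains t.
        pos = min(max(bisect_right(kept_times, t) - 1, 0), len(kept) - 2)
        i, j = kept[pos], kept[pos + 1]
        frac = (t - times[i]) / (times[j] - times[i])
        out.append(values[i] + frac * (values[j] - values[i]))
    return out


def sed_errors(times, values, kept):
    """Return (total, average) absolute error between original and synchronized values."""
    sync = synchronized_values(times, values, kept)
    errors = [abs(v - s) for v, s in zip(values, sync)]
    total = sum(errors)
    return total, total / len(errors)
```

A compression that drops a point where the speed dips will show a non-zero error at that timestamp, while retaining every point gives zero error.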

Collision Risk Assessment
At present, research on ship collision accidents in a specific water area within a specific period mostly draws on results from the road and aviation fields. In the aviation field, collision risk is measured by the number of collisions within a specific period, which is also referred to as the collision frequency $f_{collision}$. It is equal to the product of the number of conflicts in a given time, $N_{conflict}$, and the accident causation coefficient, $P_{causation}$:

$$f_{collision} = N_{conflict} \times P_{causation}$$

Note that the conflicts between ships can be divided into three categories based on the International Regulations for Preventing Collisions at Sea (COLREGS) [7]: head-on, crossing, and overtaking conflicts. All three types of conflict are distinguished through the course differences between ship pairs.
Overtaking conflict means that ship pairs are sailing on almost parallel courses and the course difference should not exceed 10°. Crossing conflict means that the course difference of ship pair falls in the range 10-170° or 190-350°. Head-on conflict means that the course difference of ship pair falls in the range 170-190° [7]. Furthermore, the necessary condition for the occurrence of the above three types of conflicts is that one ship will intrude into the safety domain of the other when both ships sail with their current course and speed.
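Considering the three types of conflicts mentioned above, the collision risk can be assessed as the sum of the expected number of collisions over all conflict types:

$$f_{collision} = N_{ho} P_{ho} + N_{cr} P_{cr} + N_{ot} P_{ot}$$

where $N_{ho}$, $N_{cr}$, and $N_{ot}$ are the numbers of head-on, crossing, and overtaking conflicts, and $P_{ho}$, $P_{cr}$, and $P_{ot}$ are respectively the causation probabilities for the head-on, crossing, and overtaking conflicts.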
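The classification by course difference and the summation over conflict types can be sketched as follows. The function names and dictionary keys are illustrative, and treating course differences of 350-360° as overtaking (near-parallel courses across the 0° wrap) is an assumption, since the text only states the 10° limit.

```python
def classify_conflict(course_i, course_j):
    """Classify an encounter from the two ships' courses (degrees)."""
    diff = abs(course_i - course_j) % 360.0
    if diff <= 10.0 or diff >= 350.0:      # almost parallel courses
        return "overtaking"
    if 170.0 <= diff <= 190.0:             # almost reciprocal courses
        return "head-on"
    return "crossing"                      # 10-170 or 190-350 degrees


def collision_frequency(n_conflicts, p_causation):
    """Sum N_k * P_k over the conflict types in `n_conflicts`."""
    return sum(n_conflicts[k] * p_causation[k] for k in n_conflicts)
```

As a usage example with made-up numbers, 10 head-on, 20 crossing, and 30 overtaking conflicts with placeholder causation probabilities of 1e-4, 2e-4, and 3e-4 give a collision frequency of about 0.014 for the period considered.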
As mentioned, to evaluate the number of collisions we first need to calculate the number of conflicts. Whether there is a collision risk between two ships is judged by whether one ship intrudes into the safety domain of the other under relative motion. For instantaneous collision detection, as mentioned in the first section, the most widely used approach is currently to evaluate the collision risk through the closest point of approach or the ship domain.

Spatial-Temporal Alignment of Trajectory
In the past, collision detection was performed by judging the relative position between ship domains. Since the trajectory of a ship is composed of multiple position points and can be regarded as a discrete object, collision detection between trajectories needs to check the state of the two ships frequently. To avoid unnecessary detection, it is necessary to filter out the states in which a collision is impossible. Traditional methods include hierarchical bounding boxes, space segmentation, and GJK, but most of these methods screen collisions between isolated objects, and their efficiency in trajectory collision detection remains low. In this study, the spatiotemporal collision detection between trajectory pairs is simplified by time intervals.
Unlike the previous studies [3,7,8], we determine a self-adaptive time interval from the trajectory pairs when calculating the number of vessel conflicts, as shown in Figure 4. In those studies, the trajectory was discretized by a fixed time interval: three minutes in [7,8], and 10 s in [3]. However, since AIS transmission follows the principle of time division multiple access, the timestamps of different tracks are not consistent. Collision detection based on fixed time intervals therefore needs to interpolate the tracks to align their timestamps. Furthermore, there is no guarantee that a fixed time interval contains no waypoint when further collision detection is performed on the trajectory pair in that interval through relative motion.
The steps for determining the time interval for collision detection of trajectory pairs are as follows: (1) Determine the time intersection interval of the trajectory pair, if it exists, continue, otherwise exit.
In this study, the radius of the safety domain R varies with vessel length and is set to three times the length of the vessel, following [8].
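A minimal sketch of how self-adaptive time intervals could be built is given below. Only step (1), the time intersection of the two tracks, comes from the text; cutting the common span at every timestamp reported by either ship is an assumption, chosen so that neither ship has a waypoint inside any interval. The function names are illustrative.

```python
def time_intervals(times_i, times_j):
    """Return [(t_start, t_end), ...] covering the common time span of two
    tracks, cut at every timestamp of either ship. Empty if there is no overlap.

    `times_i` and `times_j` are sorted timestamp lists (e.g. seconds)."""
    start = max(times_i[0], times_j[0])
    end = min(times_i[-1], times_j[-1])
    if start >= end:                       # step (1): no time intersection
        return []
    cuts = sorted({t for t in list(times_i) + list(times_j) if start <= t <= end})
    return list(zip(cuts, cuts[1:]))


def safety_radius(length_m, factor=3.0):
    """Radius of the safety domain: three times the vessel length by default."""
    return factor * length_m
```

Because each interval lies between two consecutive reports of both ships, each ship can be treated as moving along a single straight segment within the interval.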

Trajectory Pairs Collision Detection
With the time interval set determined as above, we need to detect whether a conflict occurs in any time interval. Figure 5 shows a conflict example for a trajectory pair, where the ship domain is used to detect a collision between two vessels: a conflict is counted if ship i enters the domain of another ship j during the time interval $\Delta t$. Whether an intrusion event occurs between ships i and j is determined here by the relative distance of movement and the size of the ship's domain, as shown in Figure 6. Thus, the problem of ship collision risk assessment can be transformed into a statistical problem of the number of ship collisions in a specific area in a specific period. As shown in Figure 6, if the relative distance $D_{i,j}^{t+\Delta t}$ is less than $R_i$, then ship j will intrude into the safety domain of ship i in the time interval $\Delta t$.
Assuming that the two ships maintain their current course and speed during the time interval, the relative distance in Formula (14) can be evaluated from the positions and velocities of the two ships at time t. Therefore, whether there is a conflict between a ship pair during a time interval $\Delta t$ can be determined by the Boolean variable $B(i,j,t)$:

$$B(i,j,t) = \begin{cases} 1, & D_{i,j}^{t+\Delta t} < R_i \\ 0, & \text{otherwise} \end{cases}$$

Furthermore, because the researched data span a large time and geographic range, spatial retrieval can be used to avoid extra computation between trajectories that cannot be in conflict. Currently, mature spatial retrieval technologies include the R-tree index, grid index, quad-tree index, and so on. In practical applications, one or more of these technologies can be combined, merging their characteristics to form a new spatial indexing technology.
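Under the constant course and speed assumption, the intrusion test for one time interval has a closed form: the relative position changes linearly, so the minimum distance is attained at the closest point of approach clamped to the interval. The sketch below illustrates this; taking the minimum over the whole interval (rather than the distance at its end) is an interpretation, and the function and parameter names are assumptions.

```python
import math


def min_distance(p_i, v_i, p_j, v_j, dt):
    """Minimum distance between two ships over [0, dt], each moving with
    constant velocity. p_* are (x, y) in metres, v_* are (vx, vy) in m/s."""
    rx, ry = p_j[0] - p_i[0], p_j[1] - p_i[1]      # relative position
    vx, vy = v_j[0] - v_i[0], v_j[1] - v_i[1]      # relative velocity
    speed2 = vx * vx + vy * vy
    # Time of closest approach, clamped to the interval.
    t_star = 0.0 if speed2 == 0.0 else -(rx * vx + ry * vy) / speed2
    t_star = min(max(t_star, 0.0), dt)
    return math.hypot(rx + vx * t_star, ry + vy * t_star)


def conflict(p_i, v_i, p_j, v_j, dt, r_i):
    """Boolean B(i, j, t): True if ship j comes within r_i of ship i during dt."""
    return min_distance(p_i, v_i, p_j, v_j, dt) < r_i
```

For two ships closing head-on at 5 m/s each with a 100 m lateral offset and 1000 m initial separation, the test is negative for a 60 s interval and positive for a 120 s interval when the domain radius is 300 m, which shows why the interval length matters.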

Case Study and Results
In this section, a series of experiments was conducted to evaluate the performance of the proposed MFDP and path generation algorithms on a real trajectory set obtained from Ningbo, China. All the experiments were implemented in Python 3.6 on a machine with an AMD Ryzen 7 5800H with Radeon Graphics (3.20 GHz) and 16.0 GB of memory, running Microsoft Windows 11. The experimental results are discussed below.

Experimental Data Source
In this study, AIS data were obtained from the Transportation and Maritime Safety Administration of China which is responsible for storing and maintaining various ship information along the coast of China.
The experimental data comprised all the Class-A AIS messages collected around the Zhou Shan Islands from 1 January 2018 to 10 January 2018. The data were pre-processed using the method described in [3]. The description of the measurements is shown in Table 1. The number of ships and the number of trajectories of each type are listed in Table 2, and Table 3 gives the analysis of the ratio of ship length to width.

Threshold Analysis
From the description in Section 3.1, compression accuracy depends on the compression threshold: the larger the compression threshold, the lower the compression accuracy. The threshold selection therefore directly determines the quality of the simplified data. However, the optimal threshold value is difficult to determine because it is affected by many factors. For example, Zhao et al., 2018 used the own ship's length in different trajectories as the threshold value for adaptability. Because the data were collected from different types of ships, the characteristics of the trajectories are affected by a number of parameters, such as the length, the width, and the ratio of length to width. For example, when ships of the same length sail in the same waters, their tracks differ depending on the width of the ships.
Ships also differ in maneuverability: the bigger the ship, the more difficult it is to operate, so the track of a big ship has fewer points that deviate from the overall shape of the trajectory than the track of a small ship. Therefore, if the threshold is global and suited to big ships, important track points of small ships will be discarded. Conversely, if the threshold is suited to small ships, many redundant points will be retained, which leads to a low compression rate.
Consequently, the ratio of ship length to width was considered in the selection of threshold values, and the threshold of each trajectory differs according to this ratio. In the following experiments, the threshold equals 10 times or 100 times the ratio of ship length to width. Furthermore, the sailing waters can also be considered when setting the threshold value; for example, the multiple in harbor waters can be set smaller than the multiple in open waters. Our practical experience shows that the appropriate multiple in harbor waters usually ranges from 30 times to 80 times. The threshold value varies with the purpose of different users, and this variability should be considered in the application of the DP algorithm. Under certain conditions, the threshold value can be fixed differently as needed.
The left panel of Figure 7 shows a strong correlation between the retention rate of characteristic points and the course threshold when the thresholds of the Euclidean distance and the speed difference remained constant. The right panel of Figure 7 shows a strong correlation between the retention rate of characteristic points and the speed threshold when the thresholds of the Euclidean distance and the course difference remained constant. This shows that the MFDP method can retain the characteristic points that we want, and more features can be extracted.

Figure 7. The relationship between retaining ratio and speed constraint or course constraint. (a) The relationship between the retaining ratio and course constraint when the speed constraint is 87 and the threshold of the Euclidean distance is 100 times; (b) the relationship between the retaining ratio and speed threshold when the course threshold is 27 and the threshold of the Euclidean distance is 100 times.

Figures 8a and 9a show a strong correlation between the course difference, course threshold, and speed threshold when the threshold of the Euclidean distance remained constant, in standard error and mean error, respectively. Figures 8b and 9b show a strong correlation between the speed difference, speed threshold, and course difference when the threshold of the Euclidean distance remained constant, in standard error and mean error, respectively. This analysis again indicates that the MFDP method can retain the desired characteristic points and extract more features.

Experimental Comparison
To further verify the quality improvement, the samples were tested with the proposed method. The experimental results are shown in Tables 2-4. Table 2 presents the number of critical points incorrectly deleted by RDP, where RDP was tested with the threshold of Euclidean distance, and MFDP was tested with the threshold of Euclidean distance, speed constraint, and course constraint. From this, we can find that the proposed method retains more characteristic points than RDP. Tables 3 and 4 give the speed loss and course loss of RDP and MFDP on the dataset with different thresholds. Compared with RDP, MFDP improved the average course synchronization error of trajectories by 1.5-2.5% and the average speed synchronization error of trajectories by 2-4.5%. From the comparison shown in Figure 10, it is apparent that the trajectory simplified with MFDP is more similar to the original trajectory than the one simplified with RDP in terms of course difference and speed difference.

Table 1. Measurement, unit, and description of the measurement.

Figure 11 shows the Synchronized Euclidean Distance of MFDP and RDP. In this comparison, the Synchronous Euclidean Distance errors of speed and course of RDP are greater than those of MFDP; in other words, the trends in both speed and course simplified with the MFDP algorithm are more accurate than those of the RDP algorithm as measured by SEDtotv, SEDavgv, SEDtotc, and SEDavgc. The reason is that the RDP algorithm considers only the spatial and temporal information and not the speed and course status. The MFDP method handles multiple constraints, e.g., velocity, course, and position, to improve the accuracy of the trajectory simplification. The Synchronous Euclidean Distance error of both methods increases with an increasing threshold value.

Figure 11. Comparison between Synchronous Euclidean Distance error of speed and course of RDP and MFDP. (a-d) show the total SED error of speed, total SED error of course, average SED error of speed, and average SED error of course, respectively.

Figure 12 compares the number of retained critical points at different multiples of the ratio of ship length to width. The number of retained critical points of both methods decreases with increasing threshold, and MFDP retains many more critical points than RDP. To verify the advantages of MFDP, one trajectory was used to conduct parallel experiments. In this experiment, the threshold value was set to 50 times the ratio of ship length to width. In Figure 13, the blue and red lines represent the original algorithm and the improved algorithm, respectively. The improved algorithm better preserves the original properties at each peak and valley. The results reveal that the MFDP method preserves more attributes than the RDP method in terms of speed and course. Figure 14 shows the compression results of a vessel trajectory in the parallel experiment. This figure shows that more critical points are retained, as denoted with an arrow in Figure 14. Therefore, the MFDP method is superior to the RDP method in retaining the attribute characteristics of a trajectory, including shape, speed, and course.

Collision Risk Analysis
In this section, collision risk was assessed for different meeting situations: head-on, crossing, and overtaking. The causation probability values for head-on, crossing, and overtaking conflicts were taken from existing studies [2,3,7].

Table 5 presents the results of the collision risk assessment for the study area, where the collision frequency during the first officer's watch is higher than during the others, and the collision frequency of the other two duty periods is the same. In addition, there is a positive correlation between the on-duty stage and the frequency of conflict.

Table 6 shows the comparison of collision frequency in different encounter situations using analysis of variance (ANOVA) [3,8]. The F value and P-value in ANOVA are used to analyze the differences between and within groups. Each F value corresponds to a P-value, and the larger the F value, the smaller the P-value. If the F value is equal to 1, the difference between groups is consistent with the difference within groups. The F values of the three conflict types differ greatly and the P values are small; therefore, the differences between groups are large, and it can be concluded that the collision frequency is affected by the officer's experience or duty period.

Figure 15 reveals the difference between the relative speed distributions of ships involved in different conflict types. The dominant relative speed in head-on, crossing, and overtaking conflicts is in the range 11-12, 15-18, and 6-8 knots, respectively. Figure 16 shows the length distribution of vessels involved in different types of conflict. From the three distributions, it can be found that most vessels in head-on and crossing conflicts are around 100 m long. In overtaking conflicts, most vessel lengths range from 100 to 150 m.

Figures 17-19 illustrate the spatio-temporal distribution of head-on, crossing, and overtaking conflicts, respectively, in the studied waters. The red, yellow, and brown colors represent conflicts at different watch stages. Red represents the duty stage of the first officer, from 00:00 to 04:00 and 12:00 to 16:00. Yellow represents the duty stage of the second officer, from 04:00 to 08:00 and 16:00 to 20:00. Brown represents the duty stage of the third officer, from 08:00 to 12:00 and 20:00 to 24:00.
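The watch-stage grouping and the F statistic used in the comparison can be sketched in a few lines. This is a textbook one-way ANOVA computed by hand, offered as an illustration rather than the authors' analysis script; the function names are assumptions, and any input samples would be the per-group collision frequencies.

```python
def watch_stage(hour):
    """Map an hour of day (0-23) to the officer on watch:
    first 00-04 and 12-16, second 04-08 and 16-20, third 08-12 and 20-24."""
    return ("first", "second", "third")[(hour % 12) // 4]


def one_way_f(groups):
    """F statistic of a one-way ANOVA: between-group mean square divided by
    within-group mean square. `groups` is a list of lists of observations."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

An F value close to 1 means the spread between the watch stages is comparable to the spread within them, while a large F value indicates that the groups differ.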

Discussion
Combining the water layout map in Figure 1 with the spatio-temporal trajectory distribution maps, it can be found that the intersection at the Jin-Tang Bridge and the north side of the Xi Hou-men Bridge are both high-incidence areas for all three types of conflict. Furthermore, there are a small number of conflicts in the Yang-shan harbor of Shang-hai port. These conflicts are probably caused by the narrow waters there, where the encounter distance between ship pairs is smaller than the safety domain applied in this study. The width of the channel there is about 400 m, so any encounter situation in the channel can easily form a conflict. In addition, during the maintenance and dredging of this channel, some engineering vessels operate continuously in the channel, and other vessels may easily form conflicts with them.
From the three spatio-temporal results, it can be found that conflicts occur easily in the traffic separation system (TSS) of Ningbo Zhoushan Port, especially at the intersection and in the southern waterway. There may be two reasons for this phenomenon: on the one hand, the ship traffic flow is large; on the other hand, the size of the safety domain applied in this study may differ from the one actually used in the TSS.
Because the ship trajectory data were only simply pre-processed in the early stage, the calculated collision frequency values will be smaller than the actual situation. The collision correlation between different ship types and ship sizes can be analyzed at a later stage to further the risk evaluation of the waters.

Conclusions
The purpose of this study is to provide a framework for the design of a ship collision risk assessment system in port waters. In this paper, based on ship data in port waters, we proposed a novel track pairs collision detection algorithm for evaluating the risk of ship collision in port waters. By considering the influence of ship speed on trajectory compression, the DP algorithm is improved, which not only ensures the compression ratio but also retains more key behavior points. The framework is flexible and can realize collision risk assessment in different waters, which helps to identify high-risk areas in port waters and to improve ship management.
In the future, we can study the collision probability of these waters by collecting collision accident data, and determine a more suitable ship safety domain through comparative experiments.

Data Availability Statement:
Some or all data, models, or code generated or used during the study are available from the corresponding author by request.

Conflicts of Interest:
The authors declare no conflict of interest.