Identification of Hotspot Segments with a Risk of Heavy-Vehicle Accidents Based on Spatial Analysis at Controlled-Access Highway

Significant risk factors that influence the occurrence of heavy vehicle accidents have been explored in numerous studies in order to lower injury severity in traffic accidents. It is imperative to explore road sections with a high risk of heavy vehicle accident occurrence by considering the significant consequences of such accidents for road users, despite the low number of heavy vehicles in traffic flow. To address this, this study proposes a method to predict clustering hotspots for heavy vehicle accidents on the basis of three different criteria, namely, heavy vehicle accident cases, the number of heavy vehicles involved, and accident severity index values. Moran’s I spatial autocorrelation was employed to identify the clustering for each criterion, and the Getis–Ord Gi* statistic was applied to estimate the likelihood of risk along the network. This study considers the features of hotspot points at significance levels from 0.10 to 0.01 with a 1355 m buffer radius to create segments for each criterion. The three criteria for hotspots were considered within the overlapped buffer zone. A total of 22 heavy vehicle risk segments (HVRSs) were identified and then ranked by crash rate. Overall, this study demonstrates the application of different criteria to identify accident hotspots involving a specific vehicle type, which could help in prioritizing segments with a high risk of heavy vehicle accidents, as well as providing information for HVRSs for the purpose of developing appropriate countermeasures for the identified accident hotspots.


Introduction
Due to their typical physical structures and operating characteristics, heavy vehicles affect the surrounding traffic flow. Heavy vehicles typically constitute a small percentage of the traffic stream, but their impact is prominent because they are overrepresented in the number of accidents involving fatalities and property damage [1,2]. Several studies have illustrated this subject; for example, in the United States, although large trucks represent 3% of registered vehicles and 7% of vehicle miles traveled, they were found to account for one-ninth of all traffic fatalities over a period of five years [3]. This demonstrates that, despite trucks representing a small percentage of road vehicles, truck accidents tend to cause more extensive damage than other crashes. The United States National Highway Traffic Safety Administration (NHTSA) statistics reveal that the fatalities caused by large trucks increased by as much as 9% to a total of 4761 in 2017, representing an additional 392 lives lost compared to the previous year. Approximately 18% of the fatalities were occupants of large trucks, whereas 72% were occupants of other vehicles involved in the collision, and the remaining 10% were nonoccupants of motor vehicles [4]. From 2016 to 2017, there was a 6.8% increase in truck- and bus-related road deaths per 100 million vehicle miles traveled by all motor vehicles [5].
In another example, heavy vehicles represent only 3% of the total registered vehicles and 8% of total vehicle kilometers in Australia, but are involved in 18% of fatal accidents and serious injuries [6]. In European countries, heavy vehicles accounted for 18% of deadly accidents in 2013, 4021 (15%) of which were fatalities involving heavy goods vehicles and 652 (3%) were fatalities involving buses [7]. Similarly, in the United Arab Emirates (UAE), approximately 5% of accidents include heavy trucks; however, they are involved in 6% of severe collisions and account for 16% of traffic accident casualties [8]. The large physical structure and weight of a heavy vehicle contribute to the large impact on other vehicles in terms of accident severity. A study conducted in Sweden [9] presented the impact of a head-on collision between a passenger car and a 65 t truck, both traveling at a speed of 70 km/h; accordingly, the car was subjected to a theoretical change in velocity of 137 km/h, equivalent to a fall from 70 m, which is almost impossible to survive. Moreover, the study discovered that the total annual mortality of passenger car occupants in collisions with other vehicles was 3.3 per 100,000 inhabitants, 6.9 per 100,000 passenger cars, or 4.8 per 10^9 passenger car km. Of the 293 victims analyzed, 49% were killed in collisions with heavy vehicles, despite the fact that trucks and buses constituted only 9% of the vehicles. The authors of [3] discovered that, on average, 84% of fatalities related to heavy vehicle accidents are nonoccupants of heavy vehicles. Similarly, a study conducted in the United Kingdom [10] revealed that buses/coaches and heavy goods vehicles were most often implicated in aggressive collisions with motorcyclists at junctions, with consequent severe injuries.
In Malaysia, the total number of accidents in 2018, as revealed by the Malaysian Ministry of Transport, was 548,598, with a total of 6284 deaths [11,12]. Heavy vehicles represent 4.40% of total registered vehicles, whereby 4.20% are heavy goods vehicles, and the remaining 0.20% are buses [13,14]; however, there is a lack of information on the total number of cases involving heavy vehicles in Malaysia. Statistics from the Road Safety Department of Malaysia revealed that fatalities involving heavy vehicle drivers and occupants were low and contributed to 3.67% of the total road deaths [13]. Nevertheless, according to [15], heavy vehicles cause 1000 deaths per year. The numbers suggest that heavy goods vehicles (HGVs) are involved in accidents resulting in more than 80% of fatalities affecting other vehicles. Thus, this indicates that HGV accidents have a significant impact on the safety of other road users. In fact, in 2014, 1866 trucks caused fatal collisions, 575 trucks caused severe injuries, and 1047 trucks caused minor injuries toward drivers or passengers [16]. Furthermore, studies have also revealed that HGVs are most likely to be involved in accidents and in fatalities on expressways compared to other road hierarchies [15,17].
These serious issues show the importance of intervention in order to mitigate their impacts, since heavy vehicles are vital in freight logistics for the economic well-being of a country. One of the strategies to improve road safety is to identify safety-deficient areas on a road network so that preventive measures can be implemented at these high-risk locations. One such method involves the investigation of accident hotspots, which is very important in safety management because errors in determining high-risk segments result in the inefficient allocation of funds. Numerous studies [18][19][20][21][22] have addressed the issue of accurate hotspot detection on road segments; however, studies that have determined the accident hotspots for specific types of vehicles and assessed the risks associated with those types are scarce, especially for heavy vehicles. Instead, many studies of heavy vehicles have emphasized the estimation of injury probability and the linked severity [23,24], driver behavior [25,26], operational characteristics on different types of roads, and parameters related to infrastructure [27][28][29]. According to [30], the most important variable associated with accident severity is the type of vehicle. Thus, it is vital to identify the small sections of a road network (referred to as "segments" in this paper) that are overrepresented in accidents involving specific types of vehicle to further implement applicable preventive measures, especially for vehicles with a higher probability of being involved in severe accidents.
Since the 1990s, accident information systems with a geographic information system (GIS) have been widely used. These systems aim to identify high spatial concentrations of crashes, geocode accident locations, develop pin maps of accidents, and perform database queries [31][32][33]. Recently, many studies have utilized spatial methods for hotspot detection, such as kernel density estimation (KDE), local Moran's I, and Getis-Ord Gi*, where these methods are used with a GIS-based system to detect high-risk clusters for traffic accidents [34][35][36][37]. Getis-Ord Gi* is a statistic that identifies the specific locations of statistically significant point clusters at high data point densities in the given vicinity of a point. A feature with a high value is not necessarily a statistically significant hotspot; a hotspot is identified only when a feature with a high value is surrounded by other features with high values as well [38]. One study [39] showed that local Getis-Ord index hotspot analysis produced better results than the central feature and point density function for determining road accident-prone locations. This is because the point density function provides an overall view of recurring accident locations without any statistical significance. The KDE method, which is commonly used to determine the accident density in a neighborhood, also fails to provide statistical significance [40]. In contrast, hotspot analysis via a local Getis-Ord index can detect areas where accidents frequently occur with statistical significance; however, it has also been discovered that the implementation of the Getis-Ord Gi* statistic via inverse distance and inverse distance squared conceptualization of a spatial relationship produces an accurate and similar result to KDE when compared to a fixed distance [41]. On the other hand, in [42], with specific parameter choices, the KDE and local Moran's methods led to similar results.
The spatial autocorrelation technique via a local Moran's index (linear clustering) has been found to produce the best results for determining clusters on rural roads where traffic flows are clearly identified along certain routes. Contrarily, the KDE method (a two-dimensional clustering method) notably provides better results in urban areas [43]. Meanwhile, [44] discovered that a local Getis-Ord Gi* method determines hotspots and hot zones with greater frequency and size compared to a local Moran's I method.
One study has shown that different techniques for hotspot detection produce different results [45]. Kernel density function methods and Getis-Ord Gi* statistic methods differ fundamentally: a KDE-based method aims to detect clusters with high values, whereas Gi* statistic-based methods both detect spatial clusters and deepen the understanding of them [46]. The literature highlights the various methods used to identify crash hotspots, and the most common methods are the Moran's I and Getis-Ord statistics [47]. Each method has its own characteristics that are suited to different problems [48]. The local Moran's I index, also referred to as the local indicators of spatial association (LISA) method, measures the extent of an observation's value similarity or difference from its neighboring observations [42], which generalizes the idea underlying the Getis-Ord Gi and Getis-Ord Gi* statistics as indicators of local pockets of nonstationary areas or hotspots [49]. Additionally, KDE is instrumental in visually highlighting clusters of crashes on a roadway network. Nevertheless, KDE fails to test the statistical significance of the identified hotspots, which is a key drawback of the method [48].
The main intention of this study is to propose a detection method for overrepresented accidents for a specific type of vehicle in a road network. This study extensively applies the global Moran's I measure and Getis-Ord Gi* function with the aim of analyzing heavy vehicle accident hotspots. Moreover, this study develops a further understanding of the spatial patterns of accidents for a specific type of vehicle through three different criteria, namely, accident frequency, the number of heavy vehicles involved, and the accident severity index in the clustering segment. The criteria were evaluated separately and different maps were produced. In this research, problematic areas are ranked by combining the criteria map and ranking the segments of overrepresented heavy vehicle accidents by using the crash rates for segments in the study area.

Study Area
The Malaysian North-South Expressway (NSE) was selected as a research area for this study. This interurban toll expressway is the longest controlled-access expressway in Malaysia, with a total length of 783 km in 2018, connecting major cities and towns in seven states along the West Coast of Peninsular Malaysia. The expressway is divided into two main routes, namely, the northern routes (E1) and the southern routes (E2). Additionally, the total number of heavy vehicles (vehicles with two axles and six wheels, vehicles with more than three axles, and buses) that use this expressway is approximately 9% of the total amount of NSE users. The NSE was constructed according to the Public Works Department of Malaysia R6 design standards with a designed speed limit of 120 km/h and lane width of 3.5 m; however, the maximum speed limit is 110 km/h, whereas for heavy vehicles it is 80-90 km/h [50].

Road Network and Accident Data
The crash data provided by the Malaysian Highway Authority (MHA) were used in this study. The MHA is a statutory body that monitors and administers the entire Malaysian expressway. The data were received in a computer-ready form, comprised of coded information on reported accidents at the scope of the study in a period of three years from June 2016 until the end of May 2019. Each accident's coded information included important features. The information provided included the locations of accidents per 100 m, accident severity values, report numbers, vehicle types, highway types, highway boundaries, and the year of the accident. Origin and destination data were also provided by the MHA. The crash data cover 458.9 km along the northern route (E1) and 310 km for the southern route (E2). The total number of crash reports along the expressway in the period of the provided data was 47,359, consisting of 29,891 unique cases. Stone road markers denote every kilometer along the NSE. The data used in this study include crash data for 100 m-long segments, and the information provided by the stone road markers is identical to the highway layer in ArcGIS.

Research Methodology
The research objective of this study is to design a procedure to detect spatial and temporal segments of local statistical distribution characteristics for heavy vehicle accidents along the NSE. Moreover, the study aims to find the heavy vehicle accident hotspots through three different criteria, namely, accident frequency, the number of heavy vehicles involved, and the severity index of an accident at the location. Hotspots are then ranked according to the crash rates per segment. The method of this study is therefore structured into five parts. First, the spatial location of each traffic accident was determined: the traffic accident coordinate data were imported into ArcGIS to describe accident points for each criterion, and the accident severity index was calculated based on the number of fatalities, severe injuries, slight injuries, and property damage cases. Second, global spatial autocorrelation characteristics were studied for each criterion. Third, a hotspot map of accidents was developed to find typical hotspots in the study area for each criterion. Fourth, the spatial distributions of heavy vehicle hotspots for the three different criteria were buffered and overlain to determine points of hotspot intersection. Finally, the crash rates of the intersecting segments were calculated and ranked to determine the locations with high heavy vehicle accident risk. The ArcMap 10.3 platform was used for this research.

Spatial Location Displaying
Both the network structure and crash data used in this study are accurately described and localized in their real contexts. The collected crash data for 100 m-long segments were geocoded using the kilometer markers provided by the MHA. The traffic accident dataset underwent a curation process, including the removal of records with missing values, because the hotspot analysis tool cannot readily handle missing data or null values. The data were prepared as three subsets based on the three aforementioned criteria.
The traffic accident data were then projected from the World Geodetic System 1984 (WGS84) (in geographic coordinates) to the rectified skewed orthomorphic (RSO) projection (in meters), because chordal distances are poor estimates of geodesic distances beyond 30 degrees; the projection produced a point pattern that remains precisely on the network. Then, the dataset was integrated with a 0.001 m tolerance for the coordinate data. Integration was used to maintain the integrity of shared feature boundaries. The x and y tolerances should be fairly small to minimize any undesired movement of vertices. Next, the data were converted to weighted point data prior to the next analysis step, except for hotspots using accident severity, which were analyzed via the accumulated severity index for every 100 m-long segment.

Severity Index
In this research, the severity index values were determined by accumulated weights of severity at accident points for every 100 m-long segment. The weightage system from the Highway Planning Unit of Malaysia was employed, which was adapted from a guide for accident blackspot identification and road safety countermeasures (Road Engineering Association of Malaysia). The system is based on severity weightage, which is also used by the Malaysian Institute of Road Safety Research (MIROS) to compute site priority values for determining accident-prone areas. In the system, a fatal accident that involves at least one fatality is given 6.0 points, whereas serious injury, minor injury, and property damage are given 4.0, 2.0, and 1.0 points, respectively [51]. The weightage system can be shown by Equation (1):
$$\mathrm{SI} = 6.0\,X_1 + 4.0\,X_2 + 2.0\,X_3 + 1.0\,X_4 \qquad (1)$$
where $X_1$ is the total number of fatal accidents, $X_2$ is the total number of serious injury accidents, $X_3$ is the total number of minor injury accidents, and $X_4$ is the total number of non-injury or property damage accidents.
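As an illustration of the weighting described above, the following minimal Python sketch accumulates the severity index per 100 m segment. The record layout (a chainage in meters paired with a severity label), the label names, and the function name are assumptions made for this example and are not taken from the MHA dataset.

```python
from collections import defaultdict

# Weights stated in the text: fatal 6.0, serious 4.0, minor 2.0, property damage 1.0.
# The label strings themselves are hypothetical.
SEVERITY_WEIGHTS = {"fatal": 6.0, "serious": 4.0, "minor": 2.0, "damage": 1.0}


def severity_index_by_segment(accidents, segment_length_m=100):
    """Accumulate severity weights for each fixed-length segment.

    accidents: iterable of (chainage_m, severity_label) pairs (assumed layout).
    Returns a dict mapping the segment start chainage (m) to its severity index.
    """
    totals = defaultdict(float)
    for chainage_m, severity in accidents:
        # Snap the accident to the start of the segment that contains it.
        segment_start = int(chainage_m // segment_length_m) * segment_length_m
        totals[segment_start] += SEVERITY_WEIGHTS[severity]
    return dict(totals)


# Illustrative records only, not real crash data.
sample = [(12030, "fatal"), (12080, "damage"), (12150, "serious"), (12199, "minor")]
si = severity_index_by_segment(sample)
print(si)
```

In this toy input, the segment starting at 12,000 m accumulates one fatal and one property damage accident, and the segment starting at 12,100 m accumulates one serious and one minor injury accident.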

Spatial Autocorrelation
The next step included examining spatial patterns and concentrations. Spatial autocorrelation is a term used to describe the presence of systematic spatial variation in a variable. Positive spatial autocorrelation, which is most frequently found in practical situations, is the tendency for areas or sites that are close together to have similar values [52]. Spatial autocorrelation can measure global clustering tendencies for collisions and local clustering tendencies for collisions within a road section [44]. Global autocorrelation relates to the entire map pattern and provides only one set of values that represent the extent of spatial autocorrelation across the entire study area. Local autocorrelation, on the other hand, explores within the global pattern and captures the many local spatial variations and dependencies [53].
For this study, Moran's I statistic was used as a measure of global autocorrelation. Moran's I is a common global statistic for calculating spatial autocorrelation by translating a non-spatial correlation to a spatial context [45]. Moran's I measures the correlation among neighboring observations in a pattern [54]. Given a set of features and an associated attribute, it evaluates whether the pattern expressed is random, clustered, or dispersed. Moran's I ranges from −1 to +1. A value of −1 indicates perfect dispersion and +1 indicates perfect clustering, whereas 0 shows perfect randomness; however, for a statistical hypothesis test, Moran's I is interpreted by a z-score and p-value to define the statistical significance. The z-score and p-value indicate whether or not to reject the null hypothesis. The null hypothesis (H0) is usually defined as "no cluster exists" [55]. The null hypothesis can be rejected when the p-value is very small, which represents a small probability that the observed pattern is the result of random chance.
Very high or very low (negative) z-scores are associated with very small p-values. In this case, a positive z-score means that similar values are spatially clustered, whereas a negative z-score means that similar values are spatially dispersed. For a statistically significant positive z-score, high values are found close to high values and low values are found close to low values. On the other hand, for a statistically significant negative z-score, high values are found distant from other high values, and low values are found distant from other low values, and this dispersion is more pronounced than we would expect from an underlying random spatial process [53]. The Moran's I statistic is expressed as per Equation (2):
$$I = \frac{n}{S_0}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n}\omega_{ij}\,z_i z_j}{\sum_{i=1}^{n} z_i^2} \qquad (2)$$
where $n$ is the total number of features, $\omega_{ij}$ is the spatial weight between location $i$ and neighboring location $j$, $z_i$ is the deviation of the attribute for feature $i$ from its mean $(x_i - \bar{x})$, and $S_0$ is the aggregate of all the spatial weights, as expressed by Equation (3):
$$S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n}\omega_{ij} \qquad (3)$$
The global Moran's I statistic can be standardized to z-score values and mathematically represented as in Equation (4) [56]:
$$z_I = \frac{I - E[I]}{\sqrt{V[I]}} \qquad (4)$$
where $E[I] = -1/(n-1)$ and $V[I] = E[I^2] - E[I]^2$.
The spatial weights in this study are distance-based and feature a fixed distance band with a distance threshold. Each feature is examined inside the context of neighboring features. A weight of one is applied to neighboring features within the threshold distance, which therefore exert an influence on computations, whereas a weight of zero is applied to features outside the critical distance, which have no influence on a target feature's calculations.
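To make the fixed-distance-band weighting concrete, the following is a minimal pure-Python sketch of the global Moran's I index with binary weights (one within the threshold, zero otherwise). It is an illustration under stated assumptions, not the ArcGIS implementation used in the study: the function name and toy data are hypothetical, and the variance term needed for the z-score is deliberately omitted.

```python
import math


def global_morans_i(coords, values, threshold):
    """Global Moran's I with binary fixed-distance-band weights.

    coords: list of (x, y) positions in meters; values: one attribute per feature.
    Returns (I, E[I]); the variance and z-score are not computed in this sketch.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    s0 = 0.0     # sum of all spatial weights
    cross = 0.0  # sum of w_ij * z_i * z_j
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a feature is not its own neighbor in Moran's I
            if math.dist(coords[i], coords[j]) <= threshold:
                s0 += 1.0
                cross += z[i] * z[j]
    denom = sum(zi * zi for zi in z)
    if s0 == 0 or denom == 0:
        raise ValueError("no neighbors within threshold, or attribute is constant")
    return (n / s0) * (cross / denom), -1.0 / (n - 1)


# Toy data: four points 1000 m apart on a line, high values at one end.
coords = [(0, 0), (1000, 0), (2000, 0), (3000, 0)]
values = [10, 9, 1, 2]
moran_i, expected_i = global_morans_i(coords, values, threshold=1355)
print(moran_i, expected_i)
```

With a 1355 m band, only adjacent points are neighbors in this toy layout, and the resulting index is positive and above its expected value, which is consistent with similar values lying next to each other.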

Hotspot Analysis Getis-Ord Gi*
Global spatial autocorrelation is useful for testing the tendency of global clustering in road collisions. Nonetheless, it fails to detect specific locations for road crashes aggregated throughout the study area. With the aim of identifying statistically significant spatial patterns, the Getis-Ord Gi* statistic and Anselin's local Moran's I statistic were developed as local versions of global spatial autocorrelation techniques [38,49]. Local spatial autocorrelation measures clusters of high crash frequency neighborhoods throughout a study area within a distance threshold.
In this study, the Getis-Ord Gi* statistic was used to detect local accident hotspots and evaluate statistically significant crash clusters. A positive z-score indicates a hotspot, whereas a negative z-score indicates a cold spot. Higher positive z-score values show more intense clustering of high values (i.e., stronger hotspots). On the other hand, a lower negative z-score indicates a greater clustering of low values (i.e., cold spots). The Getis-Ord local statistic is shown as per Equation (5):
$$G_i^* = \frac{\sum_{j=1}^{n}\omega_{i,j}\,x_j - \bar{x}\sum_{j=1}^{n}\omega_{i,j}}{S\sqrt{\dfrac{n\sum_{j=1}^{n}\omega_{i,j}^2 - \left(\sum_{j=1}^{n}\omega_{i,j}\right)^2}{n-1}}} \qquad (5)$$
where
$$\bar{x} = \frac{\sum_{j=1}^{n} x_j}{n} \quad \text{and} \quad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - (\bar{x})^2}.$$
$G_i^*$ is the Getis-Ord Gi* z-score value, which includes the value at segment $i$; $x_j$ is the attribute value for feature $j$; $\omega_{i,j}$ is the spatial weight between segment $i$ and each segment $j$ within distance $d$; $d$ is the fixed band radius around segment $i$; and $n$ is the number of weighted points [57]. The Gi* statistic is a z-score, and thus no further calculation is required.
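The Gi* computation can be sketched in a few lines of pure Python. The example below assumes binary fixed-distance-band weights in which each feature counts as its own neighbor, as the statistic requires; the function name and toy data are hypothetical, and the sketch is not the ArcGIS tool used in the study.

```python
import math


def getis_ord_gi_star(coords, values, threshold):
    """Gi* z-score for every feature, using binary fixed-distance-band weights.

    coords: list of (x, y) positions in meters; values: one attribute per feature.
    Each feature is included in its own neighborhood (distance 0 <= threshold).
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    if s == 0:
        raise ValueError("attribute is constant; Gi* is undefined")
    scores = []
    for i in range(n):
        w = [1.0 if math.dist(coords[i], coords[j]) <= threshold else 0.0
             for j in range(n)]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # If every feature is a neighbor the denominator is zero; report NaN.
        scores.append(numerator / denominator if denominator > 0 else math.nan)
    return scores


# Toy data: six points 1000 m apart; high values on the left, low on the right.
coords = [(k * 1000, 0) for k in range(6)]
values = [8, 9, 7, 1, 0, 1]
gi = getis_ord_gi_star(coords, values, threshold=1355)
print([round(g, 3) for g in gi])
```

In this toy layout, the second point sits in the middle of the high-value run and receives a positive z-score above 1.645, while the fifth point sits in the low-value run and receives the mirror-image negative score.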

Ranking the Heavy Vehicle Risk Segment
The crash rate (CR) is used to rank heavy vehicle crash risk for segments. It is important to rank hazardous segments to prioritize accident locations to be treated in accordance with their safety development potential and to use limited funds as effectively as possible. Crash rates denote the number of crashes as compared to the exposure to crashes. The exposure in this study is the heavy vehicle volume in the heavy vehicle risk segment. The exposure and crash rates are estimated by Equations (6) and (7):
$$CR = \frac{C \times 10^{6}}{365 \times ADT \times n \times L} \qquad (6)$$
$$CR = \frac{Hvc_f \times 10^{6}}{Hvv \times L_h} \qquad (7)$$
where $CR$ denotes the crash rate of the segment (heavy vehicle crashes per million vehicle kilometers), $C$ is the total number of crashes in the study period, $ADT$ is the average daily traffic volume, $n$ is the number of years of data, $L$ is the length of the segment in km, $Hvc_f$ is the total number of heavy vehicle crashes in the hotspot segment in the study period, $Hvv$ is the heavy vehicle volume for the study period at the hotspot segment, and $L_h$ is the length of the heavy vehicle hotspot segment.
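As a hedged illustration of ranking by exposure-based crash rate, the sketch below assumes the common form of crashes per million vehicle kilometers, that is, crashes multiplied by one million and divided by the product of heavy vehicle volume over the study period and segment length. The function name, the dictionary keys, and the two segments are invented for this example and do not reproduce the study's data.

```python
def crash_rate(crashes, heavy_vehicle_volume, length_km):
    """Crashes per million heavy-vehicle kilometers (assumed form of the rate).

    heavy_vehicle_volume: heavy vehicles passing the segment in the study period.
    """
    if heavy_vehicle_volume <= 0 or length_km <= 0:
        raise ValueError("volume and length must be positive")
    return crashes * 1_000_000 / (heavy_vehicle_volume * length_km)


# Hypothetical segments; the numbers are illustrative only.
segments = [
    {"name": "A", "crashes": 30, "volume": 2_000_000, "length_km": 3.0},
    {"name": "B", "crashes": 45, "volume": 9_000_000, "length_km": 2.5},
]
for seg in segments:
    seg["rate"] = crash_rate(seg["crashes"], seg["volume"], seg["length_km"])

# Rank from highest to lowest crash rate.
ranked = sorted(segments, key=lambda seg: seg["rate"], reverse=True)
print([(seg["name"], seg["rate"]) for seg in ranked])
```

In this toy case, segment A has fewer crashes than segment B but ranks first, because its heavy vehicle exposure is much lower.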

Result and Discussion
After the curation process, heavy vehicles were found to be involved in 7276 accident cases from the total of 29,477 accident cases available in the dataset, which is equivalent to 24.7% of the total accidents. A total of 8641 heavy vehicles were involved from the total of 46,455 vehicles, representing 18.6% of the total vehicles involved. Furthermore, the number of fatalities in accidents involving heavy vehicles was 380, which is 42.9% of all fatalities (886). Of these, 90.3% of fatalities involved heavy goods vehicles such as trucks, trailers, low loaders, and tankers, whereas buses/coaches accounted for 8.5% of fatalities and crane trucks represented the remaining 1.1%. The dataset was further evaluated in this study.
The methodology applied in this research evaluated the global spatial autocorrelation via the global Moran's I statistic to assess the overall clustering and determine the validity of the null hypothesis. The data points were analyzed with reference to their neighbors via a distance threshold. Therefore, it was crucial to determine an appropriate distance threshold. The distance threshold was determined through the peak distance of incremental spatial autocorrelation analysis. The peaks indicate the distances at which the spatial processes promoting clustering are most conspicuous. Two peak distances were obtained for each criterion. The first peak for each criterion was 1354.3 m; however, the maximum peak was different for the severity index criterion. The maximum peak distances for the frequency of accident cases and the number of heavy vehicles involved were the same (4278.8 m), whereas the maximum peak distance for the severity index criterion was 2626 m. In order to use the same clustering distance threshold for all criteria, the first peak distance, rounded up to 1355 m, was selected as the distance threshold in this study. Moran's index was calculated together with p-values and z-scores for the three investigated criteria in this study and the results are summarized in Table 1. With a 1355 m threshold radius, the spatial distribution of crashes for all criteria exhibited strong clustering, with z-scores that exceeded 2.58. The results show that the calculated z-score for the spatial autocorrelation for the severity index exhibited the highest value, which was 11.5154, followed by spatial autocorrelation for the number of heavy vehicles involved (11.2995) and heavy vehicle accident frequency (9.2754). A z-score value that exceeds 2.58 signifies 99% confidence that the dataset was clustered, with less than a 1% likelihood that the observed clustering is the result of random chance. As was mentioned in the previous section, very high z-scores are associated with small p-values.
The p-values for the three criteria were very small and near to zero and the observed patterns also exhibited very high positive z-scores. Thus, in this case, the null hypothesis of "no cluster exists" can be rejected. Global spatial autocorrelation failed to identify a characteristic pattern, i.e., a specific location of crashes aggregate, and thus Getis-Ord Gi* values were computed for the datasets. Hotspot analysis is shown in Figure 1, considering a 1355 m threshold radius for the maps for each criterion. The z-score and p-value of each hotspot can be identified through this analysis. The resulting z-scores and p-values define features that have high or low values in terms of spatial clustering. This is carried out by looking at each feature within the context of neighboring features. A high-value feature surrounded by other high-value features is identified as a statistically significant hotspot.
In this research, features with z-scores exceeding 1.645 were considered hotspots. A z-score value of 1.645 signified a confidence level (CL) of 90% in terms of clustering and less than 10% for random occurrence. In this paper, a 90-99% CL was considered due to some of the features with a 90% CL being close to 95% CL and 99% CL points. Therefore, it was more accurate to gather these clustered features to form a linear segment for analysis. Past studies [44,57] also used a z-score value of 1.645 in their analysis, whereas [38] considered an 87.5% CL, which represents a z-score value of 1.15, in their research on sudden infant death syndrome by county in North Carolina.
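The relationship between these z-score cut-offs and confidence levels can be checked with the Python standard library. The sketch below derives the two-tailed critical z-values from the standard normal distribution and maps a Gi* z-score to a hotspot confidence bin; the 1.96 cut-off for 95% is the standard value and is not stated explicitly in the text, and both function names are illustrative.

```python
from statistics import NormalDist


def critical_z(alpha):
    """Two-tailed critical z-value for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def hotspot_confidence(z_score):
    """Return 99, 95, or 90 for a hotspot z-score, or None if not significant.

    Only positive z-scores (hotspots) are classified in this sketch.
    """
    if z_score >= 2.58:
        return 99
    if z_score >= 1.96:
        return 95
    if z_score >= 1.645:
        return 90
    return None


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(critical_z(alpha), 3))

# The maximum z-scores reported for the three criteria all fall in the 99% bin.
print([hotspot_confidence(z) for z in (10.1337, 9.2766, 5.9480)])
```

A feature with a z-score between 1.645 and 1.96 would be retained as a hotspot at the 90% level under this scheme, which matches the 0.10 to 0.01 significance range considered in the study.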
Moreover, traffic accident involvements do not result in a singular event at any one point. For this reason, it would be wrong to focus on only one point when identifying hotspots. Linear clustering is a new technique that has been applied to determine hazardous segments to limit traffic accidents and take appropriate measures on highways [57].
From the maps, it was observed that some of the hotspots of different criteria appeared at different points, albeit on the same routes. Hotspots categorized by the number of heavy vehicles involved exhibited the highest number of hotspots, with 512 points and a maximum z-score of 9.2766. On the other hand, hotspots for heavy vehicles by accident frequency exhibited the highest maximum z-score of 10.1337, with 472 hotspots detected, whereas the hotspots determined by the severity indices indicated a maximum z-score of 5.9480 and 475 hotspots. Different appearances for hotspots with different criteria maps have also been reported in a previous study [57].
Although the hotspots on the three maps were not located at exactly the same points (they materialized at different points along the same route), numerous features overlapped with one another. The next step was to determine the overlap among the different criteria with the aim of prioritizing the segments that need more attention for preventive measures. The study by [57] also identified intersecting segments for the criteria in their study and highlighted that these areas are important and should be considered priority areas. Prior to the overlap process, the hotspots for each criterion were buffered within a radius of 1355 m. The threshold distance was selected as the buffer size in this study to ensure the inclusion of all data within the threshold distance during the overlap process. Buffering techniques are widely used in accident analysis when using GIS vectors [58][59][60]. A buffering process considers a buffer zone around target features within a predefined distance and merges or includes overlapping features [60]. Overlaps for all three criteria were identified via the intersect tool in ArcGIS and are marked in purple in Figure 2. Each combined segment was named a heavy vehicle risk segment (HVRS). Twenty-two HVRSs were discovered as potential high-risk accident areas, accounting for 22.3% of total heavy vehicle accidents and approximately 12.8% of the total length of the road segment, as shown in Figure 2. It is worth mentioning that the 10 locations that scored a severity index above 20 fell within the HVRSs and are marked in red. Each segment was also evaluated, and the details of the evaluation are shown in Table 2. As presented in the table, the lengths of the segments varied due to the varying distances between overlaps. The longest HVRS was 6.5 km long, located at KM 228.0-234.5 E1, whereas the shortest HVRS was 2 km long, located at KM 81.0-83.0 E2.
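The buffer-and-overlap step can be sketched in simplified form. The sketch below is not the ArcGIS workflow used in the study: it assumes that each hotspot is reduced to a chainage (distance in metres along a single route), so that a 1355 m buffer becomes a one-dimensional interval and the intersect operation becomes an interval intersection. All function names and the sample chainages are hypothetical.

```python
BUFFER_M = 1355.0  # buffer radius reported in the study


def buffer_and_merge(chainages_m, radius_m=BUFFER_M):
    """Buffer each hotspot chainage and merge overlapping buffers."""
    intervals = sorted((c - radius_m, c + radius_m) for c in chainages_m)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous buffer: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a, b):
    """Intersect two sorted lists of disjoint intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def risk_segments(frequency_pts, vehicles_pts, severity_pts):
    """Stretches of route covered by the buffers of all three criteria."""
    freq, veh, sev = (buffer_and_merge(p)
                      for p in (frequency_pts, vehicles_pts, severity_pts))
    return intersect(intersect(freq, veh), sev)


# Hypothetical hotspot chainages (m) for the three criteria.
print(risk_segments([10000, 11000, 30000], [10500, 50000], [11500, 30500]))
# → [(10145.0, 11855.0)]
```

In this toy example, only the stretch where the buffers of all three criteria coincide is returned, which mirrors how a segment qualified as an HVRS; hotspots supported by one or two criteria alone are dropped.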
The detection of road network segments with overrepresented heavy vehicle accidents should be followed by the implementation of precautionary measures that attempt to improve the safety of these segments. For this reason, the identified HVRSs were ranked based on the crash rate of each segment to prioritize the high-risk segments and to support decision-making in terms of optimizing the cost benefit of implementing countermeasures. The highest ranked HVRS was located at KM 256.1-259.0 E1, with a very high crash rate of 86.6, as shown in Table 2. The lowest ranked HVRS was located at KM 454.3-458.9 E1, with a crash rate of 4.8. Based on the crash rate per million heavy vehicle kilometers, a hotspot location with a lower heavy vehicle volume indicated more accidents relative to exposure than a hotspot location with a higher volume of heavy vehicles. Similarly, in terms of the total severity per heavy vehicle volume per kilometer, the hotspot locations with lower heavy vehicle volumes showed higher crash rates than the high-volume areas.
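The effect of volume on the ranking can be illustrated with a small sketch. It assumes the conventional segment crash-rate definition, in which the accident count is divided by the exposure in million vehicle kilometers; the exact formula, the input values, and the function name below are assumptions for illustration and are not taken from Table 2.

```python
def crash_rate_per_million_hv_km(accidents, hv_per_day, length_km, years=1.0):
    """Accidents per million heavy vehicle kilometers on one segment.

    Exposure is assumed to be: daily heavy vehicle volume x 365 days
    x number of years x segment length in kilometers.
    """
    exposure_hv_km = hv_per_day * 365.0 * years * length_km
    return accidents * 1_000_000.0 / exposure_hv_km


# Two hypothetical 2 km segments with the same accident count
# but different heavy vehicle volumes.
high_volume = crash_rate_per_million_hv_km(73, hv_per_day=1000, length_km=2.0)
low_volume = crash_rate_per_million_hv_km(73, hv_per_day=500, length_km=2.0)
print(high_volume, low_volume)
```

With identical accident counts, halving the heavy vehicle volume doubles the rate, which is consistent with the observation that lower-volume hotspot locations ranked higher.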
As this study determined segments of high risk instead of points, it can assist in describing the characteristics and vicinity of a hotspot. For vicinity overviews, horizontal and vertical alignments at various HVRSs were produced by using a cell-based digital elevation model (DEM) with a pixel size of 90 × 90 and map information, as shown in Figure 3. Horizontal and vertical curves existed at the majority of the HVRSs. The data for the horizontal curves were derived from the ROad Curvature Analyst (ROCA) software package [61], whereas the data for the vertical curves were produced via Google Earth. Considering a maximum curve radius of 3000 m, the results indicate horizontal curve radii ranging from 215 to 2998 m at 18 of the 22 HVRSs, as shown in Appendix A. The lengths of the curves varied from 100 to 2005 m. The maximum ascending slope values for the HVRSs were all more than 4%, except for one location at KM 381.9-385.5 E1 (1.6%). The maximum descending slope values were between 0 and −40.7%. The maximum elevation gain and loss were 168 m and −208 m, respectively. In a risk assessment of heavy vehicles on ramps, [62] found that ramps with a longitudinal slope higher than 3.2% and a length longer than 1000 m present a risk for heavy vehicles. Moreover, [63] found that continuous, long, steep, and descending gradients result in higher accident rates compared to steep gradients alone. Combinations of difficult vertical and horizontal elements accompanied by risky roadside environments such as cliffs and embankments make driving in these areas more demanding [64]. In addition, elevation changes could also affect heavy vehicle accident occurrence. The findings from this study could provide a better understanding of the vicinity of high-risk segments for heavy vehicles and help in determining prevention measures for these areas.

Conclusions
Heavy vehicle accident analysis is crucial in transport safety, especially for roads that are shared by various types of vehicles, due to the high impacts of heavy vehicles on other vehicles in accidents. Although the proportion of heavy vehicles on the NSE is low (approximately 9%), heavy vehicles contributed to approximately 42.9% of the fatalities for all accidents. This finding is alarming since accidents cause serious casualties and property losses and result in serious social impacts. Therefore, it is imperative to mitigate this. One mitigation method is determining high-risk locations that are overrepresented by accidents and then employing countermeasures in these areas.
One of the techniques used here is hotspot mapping with the help of spatial statistics. This technique was used to predict spatial patterns for overrepresented accident locations. The utilization of this mapping technique helps in identifying accident patterns in depth through the visualization of high-risk locations and the area in the vicinity. Other than that, instead of focusing on one criterion for determining hotspot locations, investigation with different criteria produces better results, especially when prioritizing high-risk locations for accidents. In this research, the considered criteria included the heavy vehicle accident frequency, the number of heavy vehicles involved in accidents, and severity index values for accidents at the location.
Global autocorrelation analysis with Moran's I was employed to calculate a crash pattern's clustering tendency in space and time in this paper. With a distance threshold of 1355 m, the results show that the spatial patterns for the three criteria considered in this study were clustered, showing strong spatial dependence across the study area. The local spatial autocorrelation Getis-Ord Gi* statistic was then utilized to identify exact aggregated locations for road crashes across the study area. The results for different criteria appeared at different locations along the same route. With a fixed distance threshold, the Getis-Ord Gi* statistic adequately detected clusters with high values. If a point with a high value was not surrounded by other high-value points, the area was not considered a hotspot. This helped to predict segments with high-value features and prioritize these segments.
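The behavior of the Getis-Ord Gi* statistic under a fixed distance threshold can be shown with a minimal sketch. It assumes binary weights (1 for every feature within the threshold, including the feature itself, and 0 otherwise) and, as a simplification, measures distance along a single route by chainage rather than in the plane. The function name and the sample data are hypothetical, and this is not the ArcGIS implementation used in the study.

```python
import math


def getis_ord_gi_star(positions_m, values, threshold_m=1355.0):
    """Gi* z-score for each feature, using a fixed-distance binary weight."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    z_scores = []
    for i in range(n):
        # Weight 1 for features within the threshold (feature i included).
        w = [1.0 if abs(positions_m[i] - positions_m[j]) <= threshold_m else 0.0
             for j in range(n)]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        local_sum = sum(wj * xj for wj, xj in zip(w, values))
        denom = std * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append((local_sum - mean * sum_w) / denom if denom > 0 else 0.0)
    return z_scores


# Hypothetical route: 20 points spaced 1000 m apart, low values everywhere
# except a cluster of three high values and one isolated high value.
positions = [i * 1000.0 for i in range(20)]
values = [1.0] * 20
values[9] = values[10] = values[11] = 10.0  # clustered high values
values[2] = 10.0                            # isolated high value
z = getis_ord_gi_star(positions, values)
print(round(z[10], 3), round(z[2], 3))
```

In this example, the centre of the cluster exceeds the 1.645 threshold, whereas the isolated high value does not, which reflects the property described above: a high value is flagged only when its neighbors are also high.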
It was also observed that numerous hotspots from the three criteria overlapped with each other. To ensure all the criteria were considered, an overlap method was applied to identify high-risk heavy vehicle accident segments and to help rank these locations. Buffering techniques were applied to the hotspots for each criterion before the overlap process. Overall, 22 HVRSs were identified and then ranked by their crash rate value.
At present, the majority of studies focus on the analysis of accident hotspots for all types of vehicles instead of focusing on specific types of vehicles. The analysis of hotspots for each type of vehicle can improve the understanding of the nature of accidents for each vehicle type. The findings of this study could help in predicting and prioritizing high-risk segments for heavy vehicles by clustering accidents. In addition, the use of ArcGIS to explore accident locations in detail is recommended.

Limitation of the Study and Further Direction of the Study
There are several limitations to this study that reduced the efficiency and accuracy of the work presented here. First, the data do not pinpoint the exact locations of accidents because the records are based on accident occurrence within a general 100 m length. Second, several vital variables are lacking, such as information regarding horizontal and vertical curves, curve change rates, and speed profiles. As an alternative, this study relied on ROCA to detect horizontal curves on the polyline provided by MHA and to determine curve radii. Google Earth was used to obtain the elevation and slope information. The aim of this study was to investigate a method to predict high-risk HVRSs by clustering three different criteria and considering the nearby vicinity of each HVRS; however, this study has not investigated the causal factors of heavy vehicle accidents for the entire road. It is recommended to extensively study the various factors of heavy vehicle accidents at geometric conditions similar to those in this study. It is also recommended to study other factors that contribute to heavy vehicle accidents in the hotspot area.
For future analysis, the use of multiple criteria in accident analysis is suggested to determine hotspot segments, as well as to aid with ranking. Such a refinement could improve the accuracy and reliability of the prioritization of potential high-risk locations for heavy vehicles. It is also important to analyze the sensitivity of the results to the distance threshold and buffer size in order to find appropriate values. The utilization of maps and clustering for accidents provides a better understanding of accident locations when compared to solely focusing on accident points without clustering. If an exact location and surrounding area can be determined, then a suitable remedial action can be employed. The use of different spatial weights is also recommended. It is also suggested to use the same techniques to study accidents for different vehicle types.

Institutional Review Board Statement:
Ethical review and approval were waived for this study, due to this study not involving biological human experimentation or patient data.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: All relevant data are included within the paper.

Acknowledgments:
The authors appreciatively acknowledge the Malaysian Highway Authority for providing the accident data used in this research.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

ArcGIS
Geographic