Improved Strategies for the Maritime Industry to Target Vessels for Inspection and to Select Inspection Priority Areas

Inspection authorities such as the Port State Control Memoranda of Understanding use different policies and targeting methods to select vessels for inspections and rely primarily on past inspection outcomes. One of the main goals of inspections is to improve the safety quality of vessels and to reduce the probability of future incidents. This study shows there is room for improvement in targeting vessels for inspections and in determining vessel-specific inspection priority areas (e.g., bridge management versus machinery related items). For the year 2018, sixty percent of vessels that experienced very serious or serious (VSS) incidents were not selected for inspection up to three months prior to the incident, and forty percent of the vessels that were inspected still had incidents, of which only four percent were detained. Furthermore, one can observe a very low correlation (−0.04) between the probabilities of detention and incidents (VSS) for the year 2018. The proposed approach treats detention and incident types as separate risk dimensions and evaluates seven targeting methods against random selection of vessels using empirical data for 2018. The analysis is based on three comprehensive data sets that cover the world fleet and shows potential gains (reduction of false negative events) of 14–27 percent compared to random selection. This can be further improved by adding eight inspection priority risk areas that help inspectors to focus inspections by providing insight into the individual risk profile of vessels. Policy makers can further customize the approach by classifying the risk of vessels into categories and by selecting inspection targets and benchmark samples. A small application example is provided to demonstrate the feasibility of the proposed approach for policy makers and inspection authorities.


Introduction
This manuscript evaluates and challenges the status quo practice of the maritime industry to primarily use past Port State Control inspection outcomes to target future risky vessels for inspections, as highlighted by Knapp [1]. The primary goal is to develop a method and possibly a practical tool that can be used by maritime administrations or integrated into their systems with the aim to enhance the detection of risky vessels and to reduce false negative events, in short, to have a more proactive approach to inspection selection. The development of the method builds on work of Knapp and Franses [2], Heij et al. [3] and Heij and Knapp [4] to select vessels for inspections with the highest benefit (reduction in risk) and to treat detention and incident risk as separate risk dimensions in order to reduce false negative events. A false negative event is an event where the targeting method classifies a vessel as low risk but, in reality, the vessel is a risky vessel. If a vessel is classified as low risk based on detention risk only, the vessel does not get targeted but can experience an incident with (very) serious consequences.

Step 1: Estimate Risk Formulas Using Data from 2010 to 2014
The first step is to create the data matrix to estimate risk formulas. The formulas are estimated using logit models, and the underlying matrix covers the world fleet for the years 2010 to 2014. For more details on the underlying statistical models used, including a list of variables that are evaluated, we refer to Knapp and Franses [20] and Knapp [21]. Separate risk formulas are estimated for each incident type of interest and for detention. The formulas can be used for a maximum of three to five years and will then need to be re-estimated.

Step 2: Estimate Probabilities Using Data Feeds from 2018
The second step is to use the derived risk formulas of Step 1 and, using input data feeds from 2018, to estimate probabilities at ship level for each of the four quarters of 2018. This is necessary because the main interest is to estimate the probabilities valid at a specific point in time. Since quarterly input data feeds for the year 2018 were available, those were used to estimate the probabilities using custom-made software that can process the input data feeds, calculate the input parameters needed to apply the risk formulas and estimate the probabilities. Ten risk formulas are selected that cover targeting and vessel inspection priority areas resulting in a total of 2.9 million probabilities.

Step 3: Calculate Percentile Ranks
The probabilities of Step 2 form the basis to calculate percentile ranks of vessels. Percentile ranks are useful to rank vessels according to their risk and to construct classes of vessels with roughly comparable risk and especially classes where such risk is relatively high. An advantage of using percentiles instead of estimated (logit) risk probability levels is that these percentiles are robust against size distortions of the logit probabilities as percentile ranks are not affected by size distortions. As a benchmark sample, the global fleet is used as the basis and the percentile ranks are calculated of the five methods that are evaluated combining detention and incident (VSS) percentile ranks. The percentile ranks of the inspection priority areas are also calculated but not combined with detention. In addition, percentile ranks are also used to classify vessels using five risk categories (1 = very high to 5 = very low risk). The output of this step is 4.4 million percentile ranks and form the basis for evaluation and validation. Different benchmark samples could be used such as for instance benchmarking only vessels that arrive in a specific geographic location in order to zoom into those vessels of interest if the global fleet is not adequate. The proposed method is very flexible and can easily be adjusted to zoom into particular geographic locations.
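As a minimal sketch of this step, percentile ranks can be derived from estimated probabilities as follows; the probability values are illustrative. Only the ordering of vessels matters, which is why the ranks are robust against size distortions of the logit probabilities:

```python
import numpy as np

def percentile_ranks(probs):
    """Convert estimated probabilities to percentile ranks (0-100).
    Only the ordering matters, so the ranks are robust against
    size distortions of the underlying logit probabilities."""
    probs = np.asarray(probs, dtype=float)
    ranks = probs.argsort().argsort()        # 0-based rank of each vessel
    return 100.0 * (ranks + 1) / len(probs)

# Illustrative detention probabilities for five vessels
ranks = percentile_ranks([0.02, 0.15, 0.07, 0.30, 0.01])
# The vessel with probability 0.30 receives the highest rank (100.0)
```

In a production setting the benchmark sample (e.g., the global fleet or a regional fleet) would simply determine which vessels enter the array before ranking.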

Step 4: Validate Targeting Methods Using Empirical Data from 2018
The last step is to validate the methods from Step 3 against empirical data from 2018 using three validation variables (detention only, incidents only, and detention and incidents combined). Table 1 lists the data sources, time frames, and number of observations for the databases and data feeds that form the basis for the four steps mentioned above. Appendix A (Table A1) provides descriptive statistics by ship type for all databases and data feeds used in this analysis. For Step 1, historical data is used to estimate risk formulas. For Steps 2 to 4, we use input data feeds with ship particulars and various histories (changes in ship particulars as well as inspection and incident type histories) at a particular time to estimate probabilities at a specific time. The global inspection data comprises data from over seventy countries from eight Port State Control Memoranda of Understanding (MoUs). Incident data had to be combined and manually reclassified using IMO definitions [18] since different data providers use different definitions of seriousness. In addition, for each incident the first event of the chain of events was identified, since that is needed for the incident type models for the eight vessel inspection priority risk areas. In addition, the various types of consequences are recorded for each incident. To ensure that results are not biased due to underreporting of less serious incidents and near misses [19], this study concentrates on very serious (including total loss) and serious incidents (VSS). Ship particular data contains standard information such as ship type, age, size, flag, company (e.g., beneficial owner, class society, safety management company), construction (engine information, shipyard country), previous incidents and inspection outcomes. Tugs and fishing vessels are excluded, and ship types are grouped into six main groups: general cargo, dry bulk, container, tanker, passenger, and other types.
To estimate probabilities at the ship level, the data feed contains quarterly data feeds for 2018 from the same sources shown in Table 1 for 73,905 vessels, which are out-of-sample data compared to the data for 2010-2014 used to estimate the risk formulas.
The risk formulas used in this analysis are based on logit models following the methodology of the selection of variables from Knapp [1,21] and Knapp and Franses [2,5]. The logit model estimates the probability (p) of an event of interest such as detention or incidents (VSS) by means of p = exp(xb)/(1 + exp(xb)), where 'exp' denotes the exponential function and 'x' is the set of vessel-specific variables (e.g., ship type, size, age, flag, classification society, beneficial owner, engine designer, shipyard country). Over 500 variables (including dummies for categorical variables) are considered initially. The database to estimate the incident type models has one observation per vessel per year, whereas the database to estimate the detention model has multiple observations per vessel per year since vessels can be inspected several times per year. The models are specified by backward elimination, removing insignificant factors (at the 5% significance level). The largest of the resulting models for incidents (VSS) contains 172 variables while the smallest contains 16 (for fire and explosion). All models are estimated by quasi-maximum likelihood as in Greene [22] to allow for possible misspecification of the assumed underlying distribution function for logit models. The employed logit models are described in more detail in Knapp [21] and Heij and Knapp [4]. Table 2 lists the resulting risk models that form the basis for Step 2 of the methodology. The incident type models serve as proxies for inspection related focus areas, where separate models are used for collisions, powered groundings, main engine failures, and drift groundings. Since the effect of risk factors changes over time as they proxy how industry responds to market conditions and legislative changes, the risk formulas need to be updated every three to five years. Based on the data from 2010 to 2014, the effects of vessel age and size for VSS incident risk are opposite to those for detention risk.
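The probability calculation p = exp(xb)/(1 + exp(xb)) described above can be sketched as follows; the coefficients and vessel features below are illustrative assumptions, not the estimated risk formulas:

```python
import numpy as np

def logit_probability(x, b):
    """p = exp(xb) / (1 + exp(xb)) for feature vector x and coefficients b."""
    xb = float(np.dot(x, b))
    return np.exp(xb) / (1.0 + np.exp(xb))

# Illustrative (hypothetical) coefficients: intercept, age effect, flag dummy
b = np.array([-4.0, 0.03, 0.5])
# Illustrative vessel: constant term, age 20 years, flag dummy = 1
x = np.array([1.0, 20.0, 1.0])
p = logit_probability(x, b)   # a small probability between 0 and 1
```

In the actual models, x contains the backward-eliminated set of significant variables (up to 172 for the VSS incident model), and b holds the quasi-maximum-likelihood estimates.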
Since the detention model reflects actual Port State Control Memoranda of Understanding (MoU) decisions in practice, this opposite pattern indicates that no incident information is part of the targeting routine. This finding relates to one of the main messages of this paper: past incident information is relevant for targeting vessels for inspection in order to reduce future incidents (VSS). It also demonstrates that the inspection data is biased since it reflects the various targeting policies of coastal states.
Step 2 involves the estimation of ship-specific probabilities using quarterly input data feeds for 2018 and using the risk formulas obtained under Step 1. The input data feeds cover the global fleet of 73,905 individual vessels and cover the following six ship types: general cargo (23.8%), dry bulk carriers (15.7%), container vessels (7%), tankers (22.1%), passenger vessels (9.7%) and all other ship types (excluding fishing vessels, tugs and commercial yachts) (21.7%).
The calculation of the probabilities was performed by means of a customized software program resulting in 2.9 million probabilities that form the basis for the development and testing of the targeting methods shown in Table 3. The software processes the data feeds, calculates the input parameters to be applied for the risk formulas and estimates the probabilities. The probabilities are converted into percentile ranks using as benchmark the global fleet in each of the four quarters.
Percentile ranks provide a useful way for policy makers to understand where a particular vessel stands with respect to all other vessels in the benchmark sample, which could be adjusted to regional preferences (e.g., all vessels that arrived in a particular region over the last three years) rather than using the global fleet as benchmark. Table 3 shows the five combined targeting methods (A, B, C, D and E) and the two stand-alone methods (detention, incidents) that were evaluated based on 4.4 million percentile ranks.

Table 3. Targeting methods evaluated.

Targeting Methods: Description
Detention (only): Vessels are ranked by percentile ranks from detention probabilities only.
Incidents (only): Vessels are ranked by percentile ranks from VSS incident type probabilities (TLVSS = total loss, very serious and serious).

Combined methods (combining percentile ranks of detention and VSS):
Method A (max): Vessels are ranked by the highest of the two base percentile ranks.
Method B (min): Vessels are ranked by the lowest of the two base percentile ranks.
Method C (weight): Vessels are ranked by a 50/50 weighting of incident to detention.
Method D (weight): Vessels are ranked by a 75/25 weighting of incident to detention.
Method E (weight): Vessels are ranked by a 25/75 weighting of incident to detention.

In order to classify vessels based on their percentile ranks, five risk categories are chosen as shown in Table 4, along with a color coding to help visualize the risk categories. The suggested target inspection coverage is flexible and can be set by policy makers considering the country's (or, in the case of a PSC MoU, the group of countries') risk appetite, regional priorities and arrival profiles, inspection policies (e.g., to inspect vessels every six months, to inspect all passenger vessels, etc.) and resources. To test feasibility, the suggested yearly target inspection coverage was applied to ship arrival data for 2018 of one country containing 34 thousand arrivals in port (6065 unique IMOs), with an average daily arrival rate of 95 vessels and an average daily inspection rate of eight vessels. Applying the above target inspection coverage using Method B, the same average yearly figures were obtained, that is a daily average inspection rate of eight vessels, making the suggested coverage feasible. In addition, it is recommended to add some random selection to the inspection coverage for the lower risk coverage areas. At the global level and based on unique IMO numbers, the quarterly average inspection rate is 21.2% while the yearly average inspection rate is 41.6%. The determination of the target inspection coverage will depend on regional inspection capacities and trade flows, and it is recommended to custom tailor those to the respective country or region.
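Under the stated weightings, the five combined methods of Table 3 can be sketched as follows, assuming the two base percentile ranks have already been computed:

```python
def combine_ranks(det_rank, vss_rank, method):
    """Combine detention and incident (VSS) percentile ranks as in Table 3."""
    if method == "A":                    # max of the two base ranks
        return max(det_rank, vss_rank)
    if method == "B":                    # min of the two base ranks
        return min(det_rank, vss_rank)
    incident_weight = {"C": 0.50, "D": 0.75, "E": 0.25}[method]
    return incident_weight * vss_rank + (1 - incident_weight) * det_rank

# Vessel with a low detention rank but a high incident rank
combined = combine_ranks(20.0, 90.0, "D")   # 0.75*90 + 0.25*20 = 72.5
```

Method A is the most conservative (a vessel is treated as risky if either dimension is high), while Method B requires both dimensions to be high.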
Table 4 is only provided as an example of what would be feasible given the inspection capacity of one country. Furthermore, the inspection coverage should be re-evaluated yearly to adjust to changes in trade flows, which determine the type of vessels that trade in a region.
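Classification into the five risk categories can be sketched as follows; the percentile cut-offs below are illustrative assumptions, since the actual ranges of Table 4 are set by policy makers:

```python
def risk_category(pct_rank, cutoffs=(90.0, 80.0, 70.0, 40.0)):
    """Map a percentile rank to risk category 1 (very high) .. 5 (very low).
    The cut-offs are illustrative assumptions; the actual percentile
    ranges per category are taken from Table 4."""
    for category, cutoff in enumerate(cutoffs, start=1):
        if pct_rank >= cutoff:
            return category
    return 5

category = risk_category(95.0)   # a vessel in the top 10% falls into RC1
```

With these illustrative cut-offs, the top 30% of the benchmark sample corresponds to categories RC1 to RC3, matching the top-30% rule used in the evaluation below.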

Evaluation of Targeting Methods
To evaluate the various proposed targeting methods, three evaluation variables were considered: incidents (VSS); detentions; and incidents and detentions combined (the vessel was either detained or had an incident within the relevant time period). The evaluation time periods are restricted accordingly, since estimated probabilities are only valid up to a maximum of three months. One way to visualize how well the targeting methods perform compared to random selection of vessels is via ROC (receiver operating characteristic) curves that plot the true positive rate (TPR) on the Y-axis against the false positive rate (FPR) on the X-axis. Figures 1-3 provide ROC curves for the three evaluation variables, zooming into the top 30% of all vessels, which represents the top three risk categories (RC1 to RC3). Any curve above the diagonal line (random selection) constitutes an improvement. Appendix B (Figures A1-A3) provides the complete ROC curves.
One can observe that all methods perform better than random selection except for the detention method (using detention only) for the evaluation variable incidents (VSS). This is understandable given the small correlation (-0.046) between the two which also confirms that vessels with a high probability of detention do not necessarily have a high probability of incident (VSS) and that the two need to be treated as different risk dimensions.
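A minimal sketch of how such ROC points can be computed from percentile ranks (or probabilities) and observed outcomes; the input values are illustrative:

```python
import numpy as np

def roc_points(scores, outcomes):
    """Trace (FPR, TPR) points over all cut-offs when targeting
    the highest-scored vessels first (a minimal ROC sketch)."""
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=int)
    order = np.argsort(-scores)                   # highest risk first
    tp = np.cumsum(outcomes[order])               # true positives so far
    fp = np.cumsum(1 - outcomes[order])           # false positives so far
    tpr = tp / outcomes.sum()                     # true positive rate (Y-axis)
    fpr = fp / (len(outcomes) - outcomes.sum())   # false positive rate (X-axis)
    return fpr, tpr

# Illustrative: percentile ranks and whether a VSS incident occurred
fpr, tpr = roc_points([90.0, 80.0, 30.0, 10.0], [1, 0, 1, 0])
```

A method whose curve lies above the diagonal (where TPR equals FPR) outperforms random selection at that inspection rate.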
Note: Y-axis: true positive rate, X-axis: false positive rate.

At the global level, 60.2% of all incidents (VSS) were not selected for inspection up to three months prior to the incident. Of the 39.8% of vessels that were inspected, 4.4% were detained, indicating that vessel inspection priority risk areas could be improved in order to focus inspection efforts and to reduce incidents from happening. Restricting this to very serious incidents (VS) only, 75.4% were not selected for inspection and 3.3% were detained. After excluding cases with heavy or severe weather conditions, only 42.1% (VSS) and 34.3% (VS) of vessels with incidents were selected for inspection up to three months prior to the incident.
Method D, on the other hand, would have selected 44.8% of all vessels with VSS incidents and 39.3% of all vessels with VS incidents in the top three risk categories (RC1 to RC3). Taking different inspection rates into account and setting random selection to factor 1 for comparison, the 44.8% classification rate of Method D translates to 1.49 compared to random selection (44.8% divided by 30%). Besides visualization by ROC curves, the improvement over random selection of vessels in terms of reduction of the false negative rate is quantified in Table 5. Note that the false negative rate is the complement of the true positive rate (in the sense that both rates add up to 100 percent). To test the significance of differences in success rates across methods, the Satterthwaite-Welch t-test is performed (Appendix C, Table A2, shows detailed results for some methods).
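The Satterthwaite-Welch t-statistic, which allows for unequal variances between the two compared samples, can be sketched as follows (a minimal implementation, not the exact test routine used in the study):

```python
import numpy as np

def welch_t(a, b):
    """Satterthwaite-Welch t-statistic and degrees of freedom for two
    samples with possibly unequal variances (a minimal sketch)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)                  # variance of the mean of a
    vb = b.var(ddof=1) / len(b)                  # variance of the mean of b
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

In practice, a and b would hold the success indicators of two targeting methods over many vessels, and the statistic is evaluated against a t-distribution with df degrees of freedom.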
The results confirm that all methods are significantly better than random selection except for the method DET (using detention only) for the evaluation variable VSS. For the evaluation variable incidents (VSS), Method D performs best at the top 30% level, which combines the first three risk categories (RC1 to RC3), while method VSS performs best at the top 10% level (RC1), followed closely by Method D. The Satterthwaite-Welch t-test, however, confirms no significant difference between method VSS and Method D at the top 10% or top 30% level, but confirms that Method B and using detention only differ significantly from method VSS or Method D. Method D gives more weight to incidents but also accounts for detention risk to capture vessels that have low percentile ranks for VSS but high ones for detention. For the evaluation variable detention and for detention combined with incidents, Method B (min) performs best at the top 10% and top 30% level. This is also confirmed by the Satterthwaite-Welch t-test, where Method B differs significantly from method DET, method VSS and Method D at the top 10% and top 30% level.

Figure 4 provides the mean deficiency rate and detention rate of inspected ships and the mean incident rate of all vessels for each of the suggested risk categories, which should be higher for higher risk categories. Note: mean incident rate = sum of incidents/total number of unique vessels by RC, mean detention rate = sum of detentions/sum of inspections by RC, mean number of deficiencies = sum of deficiencies/total number of unique inspected vessels.

The final part of the analysis compares observed incident types with the eight vessel inspection priority risk areas. The 817 incidents (including vessels that had more than one incident per quarter, hence more than the 756 incidents mentioned in Table 5) are manually checked to identify the first event of what is normally a chain of events.
Since an incident can have multiple events and consequences, this leads to 886 outcomes (84 for VS) linked to the inspection priority risk areas. Vessels with high risk (RC1 to RC3) are identified and Table 6 shows the percentage of these vessels to the total of relevant incidents for each category. Some incident types such as grounding, stranding and loss of life, have few observations and it is not possible to distinguish between powered and drift grounding with the available data, hence the comparison is made with grounding/stranding for this type of category using the same count. In the future, vessel inspection priority risk areas can be extended by adding for instance models related to the maritime labor convention (MLC) and by producing MLC type deficiency probabilities. Another possible improvement is related to occupational safety type incidents and human error. The empirical data showed 45 such cases (21 for VS) that cannot be easily matched against any risk inspection priority areas at this stage but could be in the future if there is a separate risk model for occupational safety related incidents.
The inspection priority risk areas can also be used to further improve targeting vessels for inspection in addition to using combined Methods A-E. This idea was tested using the inspection priority risk areas of the 817 vessels that had incidents, with results shown in Table 7. Table 7 shows the improvement compared to random selection when using the various methods alone and when combining them with inspection priorities that have high risk ratings (RC1 to RC3). For instance, for Method B, improvement over random selection is 0.24 (1.24 minus 1) when using Method B alone for targeting. Using at least one of the inspection priorities with a high risk ranking in addition to Method B, overall improvement is 1.63 (2.63 minus 1) over random selection, or 1.39 (2.63 minus 1.24) compared to using Method B alone. Based on Table 7, using four or more inspection priorities with higher risk rankings in addition to a base targeting method (e.g., Method B) seems to provide a good balance between improvement and the number of added vessels that would be selected for inspection, as shown in the last row of Table 7. Note: The relative hit rate corrects for different inspection rates and is calculated as % correctly classified/% of vessels inspected, which is 30% for RC1 to RC3.
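The relative hit rate defined in the note above can be sketched with the figures reported earlier for Method D:

```python
def relative_hit_rate(share_correct, share_inspected):
    """Relative hit rate = % correctly classified / % of vessels inspected.
    A value of 1.0 corresponds to random selection."""
    return share_correct / share_inspected

# Method D captured 44.8% of VSS vessels at a 30% inspection coverage
factor = relative_hit_rate(0.448, 0.30)   # about 1.49 times random selection
```

Dividing by the inspection share makes hit rates comparable across cut-offs (e.g., top 10% versus top 30%) and across countries with different inspection capacities.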
The selection of how many of the risk priorities need to show a higher risk rating (e.g., 1 to 8) to be considered for inspection is up to policy makers and available resources. The inspection priority areas are more refined models (all restricted to VSS incidents) than the base VSS incident model and are worth considering if more than three or four show a high risk rating. They are also correlated (please refer to Appendix D, Table A3), indicating that if a vessel shows higher risk in one or two areas, it will most likely also show higher risk rankings in other areas.

Application Example for Inspectors
The approach presented in the previous section provides a data-driven or quantitative approach to assist selecting vessels for inspection and focusing inspection efforts with the aim to reduce false negative events. The data-driven part can be combined with other intelligence and expert knowledge of inspectors to finalize inspection selection and execution. The procedure can be split up into the following three main steps, where the first two steps can be fully automated and the final step allows addition of qualitative knowledge and other relevant intelligence.

• Step 1: Use risk formulas that are updated every three to five years to estimate ship-specific probabilities based on up-to-date data feeds that are received daily or weekly.

• Step 2: Calculate percentile ranks relative to the relevant benchmark sample (e.g., the global fleet or vessels that visited the relevant region during the last three or five years) and classify vessels into risk categories. Consider inspection priority risk areas to focus inspection activities and to possibly further improve the selection of vessels for inspection.

• Step 3: Combine the outcomes of Steps 1 and 2 with expected arrival data in a particular port or wider area of interest and plan the inspection visits based on priorities and capacities, taking the data-driven outcome as guidance. To finalize the inspection planning, use other available intelligence and expert knowledge, for instance, knowledge about specific companies or vessels known to inspectors or the region, market economic conditions, or new legislative requirements.
Risk dimensions can be shown graphically as visual assistance to inspectors. Figure 5 provides an example of 11 vessels that all had incidents, of which only one (container Vessel 5) has been selected for inspection (with a resulting detention). Such graphs can be generated automatically, showing all vessels in port liable for inspection for a specific day or time period. The graph shows instantly where each vessel stands with respect to the others. In this particular example, the targeting method using a top 30% rule for detention only would have missed Vessels 3, 4, 5, 6, 7 and 10. If only incidents (VSS) are used for targeting, the top 30% rule would have missed Vessels 2, 5, 6 and 9. When using combined methods with the addition of inspection priority areas, all vessels could be considered for inspection. This type of visualization could be complemented by a table (refer to Table 8 for an example) that lists the percentile ranks of selected methods and the percentile ranks for the vessel inspection priority risk areas. Note that in this example, all methods are shown; under real operations, one or two would be chosen, such as Methods B and D or VSS and detention, to cover all priorities and focus areas. With respect to the inspection priority areas, the following is of interest, given that all vessels experienced incidents but only one was selected for inspection and detained.

• One vessel arrived in port twice in August, the last time just two weeks before the incident, but was not selected for inspection.

• Vessel 10 ran aground late in November 2018, was re-floated and departed. It has very high percentile ranks for drift grounding (97.41) and hull related failures (95.32). It arrived in port twice before the incident and was not inspected.

• Vessel 11 experienced engine problems late in October 2018. While its percentile rank for engine failure is low, the one for drift grounding is medium (78.15).
Risk categories follow Table 4, which gives the percentile rank range for each risk category (RC1 to RC5).

Discussion and Conclusions
This study considers and evaluates the status quo assumption of maritime inspections, such as PSC inspections and industry inspections, to primarily use past inspection outcomes (in particular past detentions and deficiencies) to target vessels for inspections and to identify future risky vessels. One of the main goals of inspections is to improve the safety quality of vessels and to reduce the probability of future incidents. The empirical analysis shows that for the year 2018, 60% of all vessels with a VSS incident were not inspected in the three months prior to the incident, and of the 40% that were inspected only 4% were detained. Furthermore, there is a low correlation (−0.04) between the probabilities of detention and incident (VSS) at ship level, which confirms that these two dimensions measure different risk aspects and that targeting can be improved by combining these two risk dimensions.
The results further indicate that inspection efforts and inspection risk priorities can be better focused, since inspectors could be guided by the risk inspection priorities once a vessel is selected for inspection. One way to improve the effort is to treat detentions and incidents as two risk dimensions, as suggested in this approach, and to use methods that combine the two risk dimensions to target vessels for inspection and to guide the selection of the inspection priorities. In terms of targeting efficiency, the reduction of false negative events is the focus in order to minimize the probability that risky vessels are missed, as incidents can be very costly.
Five targeting methods that combine the two risk dimensions (detention and incident) using percentile ranks are developed and tested against random selection of vessels using empirical data for 2018. The results show a potential gain (reduction of false negative events) of 14% to 27% compared to random selection which can be further improved if adding vessel inspection priority risk areas to the targeting routine. The study demonstrates that combined methods have the potential to reduce false negative events as the chance to catch risky vessels is improved. In the future, different weights for incident type risk and detention could be tested, especially if a longer time period for testing becomes available.
The use of percentile ranks makes it possible to combine two risk dimensions and allows customized benchmarking of vessels, for instance, by considering vessels for a specific country or region of interest rather than the global fleet used here. Since the percentile ranks are based on a benchmark sample, which could be the global fleet or all vessels that arrived in a specific region over a time period (e.g., three or five years), they can be customized and are dynamic in nature: they automatically correct for improvements of the fleet, since vessels are always compared against each other when the benchmarking routine is run daily or weekly. This approach to classifying vessels is more dynamic than, for instance, using an average detention rate that remains unchanged for many years.
The study presents risk categories and suggests associated inspection target coverage, which are tested for feasibility against arrival data of one country. However, these categories and target coverages are flexible and can be set by policy makers depending on their regional preferences and inspection capacities, along with the selection of the benchmark sample. It is recommended to customize the inspection coverage based on regional trade flows, inspection capacities and policies and to evaluate them yearly or every three years, along with the size of the benchmark sample (e.g., all vessels arriving during the last three, five, or six years). The study confirms that incident data still has many quality issues, and it remains difficult to use these data to estimate risk formulas, to validate targeting methods, or to determine the incident types related to inspection priority risk areas. This was partly overcome by restricting the data to VSS incidents and by using raw data from at least three different sources that could be manually classified. In addition, it is impossible to measure the number of vessels that did not have incidents due to inspections, or that would not have had an incident had they been inspected, which is the most desirable outcome, since inspections should improve the safety quality of vessels and reduce the likelihood of an incident later on.
The eight considered vessel inspection priority risk areas provide the means to help inspectors in focusing their efforts, as these areas provide insight into the individual vessel risk profile. In the future, these areas could be extended further, for example, by adding Maritime Labor Convention deficiency type probabilities (e.g., fatigue and working, living and labor conditions) or by revising the incident type models to include incidents related to equipment failures or occupational incidents. Other combinations are possible given regional priorities and subject to availability of data.
The data-driven part to assist with targeting and to decide how to focus inspection priorities can be fully automated. It is simpler to visualize risk dimensions and risk priorities at the level of vessels that are expected to arrive in a port or in a specific regional area. The visualization can act as one component in the more complex process to select vessels for inspections and can be combined with qualitative aspects, including other available intelligence and the inspector's expert knowledge. These could, for instance, be mandatory or voluntary incident alerts that are available to some maritime administrations that are also involved in search and rescue operations. The percentile ranks could also be used to enhance domain awareness, where risk profiles are attached to Automatic Identification System (AIS) position records and vessels are monitored via automatic alerts (e.g., a high-risk vessel enters an area that is difficult to navigate and starts drifting).
It should be acknowledged that the evaluation of new methods as presented here has two important restrictions. First, the inspection and detention data are the product of current inspection strategies across the globe and are therefore biased towards current targeting regimes. This means that potentially risky vessels that would have been detained if they would have been selected for inspection are not observed and are therefore not part of the empirical dataset used for validation here. In addition, if an incident was prevented due to an inspection, the desirable outcome cannot be observed. For this reason, evaluation is made against a random selection benchmark. Second, the incident data has several limitations. A true test of alternative inspection targeting approaches can only be obtained by implementing such approaches for some time to guide the inspection decisions and by recording some observable outcomes.
Author Contributions: S.K. identified the research area and worked on the data preparations, the logit models and the software to estimate probabilities. C.H. developed the overall methodology to combine the two risk dimensions. Both worked on the manuscript text collaboratively. All authors have read and agreed to the published version of the manuscript.