On the K-Means Clustering Model for Performance Enhancement of Port State Control

Zeyu Hou; Ran Yan; Shuaian Wang

doi:10.3390/jmse10111608

Department of Logistic and Maritime Studies, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2022, 10(11), 1608; https://doi.org/10.3390/jmse10111608

This article belongs to the Special Issue Sustainable Operations in Maritime Industry

Abstract

Port state control (PSC) is widely viewed as a safety net that safeguards maritime security, protects the marine environment, and ensures decent working and living conditions for seafarers on board. A ship can be detained for further checking if significant deficiencies are discovered during a port state control inspection. Although there is much research on this topic, few studies have examined the relationship between ship deficiencies and ship detention decisions using unsupervised machine learning techniques. Previous methods and models are feasible for ship detention decisions, but they all have shortcomings to some extent: training errors can be large because the class labels in the dataset are imbalanced, and the trained model cannot comprehensively consider all factors influencing the detention decision because the problem is complex and diverse. Unsupervised algorithms do not need all data to be labeled in advance. Fields of port state control inspection data that can be collected can be incorporated into the model so that the computer automatically classifies ships into different risk levels according to relevant criteria, e.g., those of the Tokyo memorandum of understanding. This may yield more objective results by reducing the influence of subjective domain knowledge, and it may also provide more comprehensive coverage and more information for port state control inspection and decision models. Therefore, this research explores and develops an unsupervised algorithm based on k-means to improve port state control inspection decision-making models using six years of inspection data from the Tokyo memorandum of understanding. The results show that the accuracy rate is around 50%.

Keywords:

port state control; ship detention; machine learning in maritime transportation; unsupervised learning

1. Introduction

Although maritime transport is relatively safe, accidents and casualties involving marine vessels can bring about great losses to the shipping industry and society as a whole [1]. The International Maritime Organization (IMO) proposes and implements a number of international regulations and conventions to ensure maritime safety and safeguard the marine environment. In this context, port state control (PSC) inspection is regarded as a safety net that guards maritime safety and protects the marine environment by verifying that foreign visiting ships are manned and operated in compliance with international rules.

The widely used ship risk profile (SRP) analysis considers ship flag, recognized organization (RO), and company since they are crucial to ship management, operation, and maintenance [2]. In return, a vessel’s performance in PSC inspections affects the reputation of its flag state, RO, and company, as well as their performance in PSC memorandum of understanding (MoU) assessments [3,4]. It is therefore reasonable to conclude that, all other things being equal, if the performance of the flag state, RO, or company gets worse, the ship should be expected to perform worse in the PSC inspection (e.g., more deficiencies and a higher detention probability) [5].

Before ships arrive at the port state, the PSC officers (PSCOs) first select the higher-risk ships for inspection. The results of an inspection mainly consist of identified deficiencies and ship detention [1]. A ship deficiency is a condition found not to be in compliance with the requirements of the relevant convention, whereas ship detention is an intervention action taken by the port state when the ship is unseaworthy [2]. To improve the accuracy and efficiency of PSC inspection, this paper proposes a ship detention prediction model based on K-means clustering, a typical unsupervised machine learning model, to serve as a decision support tool for ship selection, detention, and inspection for port states. To predict ship detention probability, the model takes into account general factors (i.e., ship age, gross tonnage, type, depth, length, beam, flag performance, recognized organization performance, and company performance), ship dynamic factors (i.e., the number of flag changes and casualties in the last five years), and ship inspection historical factors (i.e., total previous detentions, last inspection time, last deficiency number, and PSC follow-up inspection rate). It addresses the problem that detained ships are unevenly distributed among all ships entering the port, and it may offer new perspectives for decisions on PSC inspection.

2. Literature Review

As a complement to flag state control, PSC inspection, an inspection regime under which ports inspect foreign visiting ships, was first implemented in 1982. Since then, it has been viewed as the second line of defense against substandard vessels (the first line of defense being ship flag states). Despite widespread industry and academic acceptance of the effectiveness of PSC inspections in raising the level of maritime transport safety, port state authorities continue to confront significant obstacles [6]. One of the biggest challenges is that methods for classifying visiting ships by risk level are not sufficiently efficient, owing to limitations such as missing and imbalanced PSC inspection data. Research on maritime transportation has attracted wide attention in recent years [7,8,9], and in particular there has been an increasing number of studies on PSC inspection. Before a PSC inspection is conducted, deciding which of the incoming ships should be selected for inspection is one of the critical issues faced by port state officers, since limited time and resources need to be allocated to the ships in worse condition so as to increase inspection efficiency. The literature on PSC is typically divided into four primary categories [10]: PSC inspection impacts, suggestions for MoU management, factors impacting PSC inspection results, and ship selection methods in PSC. In this research, we focus on the literature related to ship selection schemes for PSC inspection.

The outcome of a PSC inspection mostly consists of the identified deficiencies and the detention decision. Several related studies agree that ship age, ship flag, and ship type are the main determinants of ship deficiencies and detention [7,8,9,10,11,12]. More specifically, some studies have also identified the extent to which the target factors contribute to deficiencies and detention [7]. Based on the target factors, various innovative ship selection schemes for PSC inspection have been proposed. Zhou and Sun [12] proposed an automatically optimized and self-evolutional ship targeting system based on the target factors using the generalized additive modeling (GAM) approach. Yang, Yang, and Yin [13] created Bayesian networks to forecast the likelihood that bulk carriers will be detained in seven significant European nations. The number of deficiencies, the type of inspection, the recognized organization, and the vessel’s age were the main risk factors affecting PSC inspections.

To increase the effectiveness of the onboard inspection, some academics have suggested association rule mining techniques. Association rule mining techniques were developed by Tsou et al. [14] and used to determine the relationships between the defects of the detained ships and the external causes, as well as the relationships between the deficiencies. Chung et al. [15] examined the correlations between ship features and flaws found during inspections (e.g., ship type, flag, and classification society). Additionally, Osman et al. [16] used association rule mining techniques to examine PSC patterns in Malaysian ports. The location of the inspection, the flag state, the number of violations, the outcome of the detention, and the risk profile of the ship were all taken into account.

To increase SRP effectiveness, increasingly sophisticated and precise ship selection models have been created. A Bayesian network (BN) technique was used by Yang et al. [7] to forecast ship detention. Afterward, Yang et al. [8] combined the Bayesian network model with the game model between PSC port authorities and ship owners to present an optimal PSC inspection scheme. A BN model was also used by Wang et al. [9] to forecast how many problems will be found during a PSC inspection. Dinis et al. [10] developed a BN-based ship risk assessment and maritime traffic monitoring model based on the static risk factors adopted by the new inspection regime (NIR) and the SRP. Some scholars have suggested many new types of models for ship selection for PSC inspection in addition to the well-known BNs. For instance, a balanced random forest-based model was put forth by Yan et al. [11] to forecast the likelihood of ship detention. The SRP and the suggested detention prediction model were contrasted.

Based on the above review, few previous studies analyze and improve PSC efficiency using unsupervised machine learning methods. Unsupervised algorithms do not need all data to be labeled in advance. PSC inspection data that can be collected can be incorporated into the model so that the computer automatically classifies ships into different risk levels according to relevant criteria such as those of the Tokyo MoU. Therefore, this research explores and develops unsupervised algorithms to improve PSC inspection decision-making models.

3. Methodology

This research aims to classify foreign visiting ships using K-means, a clustering algorithm based on unsupervised learning. The specific method is as follows. Ship inspection records (age, gross tonnage, length, depth, beam, type, flag performance, RO performance, date of last initial inspection, total detentions, last deficiency number, the number of flag changes, casualties in the last 5 years, and company) obtained over a period of time (six years of Tokyo MoU data, 2015–2020) are used as input. The K-means algorithm clusters the ships into different risk levels, so that the large number of ships coming to the port can be pre-divided into three categories (high-risk ships, standard-risk ships, and low-risk ships, as in the SRP). After the clustering, the common deficiencies and detention conditions of the ships in each group are extracted. Then, newly arrived ships are matched to the corresponding groups, and the groups can be further divided into subgroups. The characteristics of the matched group, such as deficiencies and detentions, are reported to the inspectors to guide them in a targeted inspection of the newly arrived ship.
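The clustering, profiling, and matching steps described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the array names (`X_hist`, `deficiencies`, `detained`, `x_new`), the random data, and the choice of scikit-learn's `KMeans` with `n_init=10` are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for preprocessed historical inspection records
# (rows = ships, columns = scaled features); hypothetical data.
X_hist = rng.normal(size=(300, 5))
deficiencies = rng.poisson(lam=4, size=300)   # deficiencies found per inspection
detained = rng.integers(0, 2, size=300)       # 1 if the ship was detained

# Step 1: cluster historical ships into K groups (K = 3, as in the SRP).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_hist)

# Step 2: summarize the inspection outcomes observed in each cluster.
profiles = {}
for c in range(kmeans.n_clusters):
    members = kmeans.labels_ == c
    profiles[c] = {
        "n_ships": int(members.sum()),
        "avg_deficiencies": float(deficiencies[members].mean()),
        "detention_rate": float(detained[members].mean()),
    }

# Step 3: match a newly arrived ship to its nearest cluster and report
# that cluster's profile to the inspector.
x_new = rng.normal(size=(1, 5))
cluster_new = int(kmeans.predict(x_new)[0])
print(cluster_new, profiles[cluster_new])
```

In practice the feature matrix would come from the preprocessed Tokyo MoU records, and the profile could also include the distribution of deficiency codes per cluster.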

Model Evaluation Method

Cluster quality

Clustering quality is generally determined by how well the classes are separated. The smaller the intra-class distance (the tighter each class) and the larger the inter-class distance, the higher the quality. We use the Silhouette Coefficient and the Calinski-Harabasz Index in sklearn to evaluate cluster quality. Specifically, this model looks at the corresponding scores of these two indicators.
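As a rough illustration of how these two indicators can be obtained, the sketch below computes both scores with scikit-learn's `silhouette_score` and `calinski_harabasz_score`. The three well-separated synthetic blobs are an assumption chosen so that the scores are easy to interpret; they are not the inspection data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, silhouette_score

rng = np.random.default_rng(0)

# Three well-separated synthetic blobs (hypothetical data, 50 points each).
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Silhouette lies in [-1, 1]; higher means tighter, better separated clusters.
sil = silhouette_score(X, labels)
# Calinski-Harabasz is a ratio of between- to within-cluster dispersion; higher is better.
ch = calinski_harabasz_score(X, labels)
print(f"silhouette = {sil:.3f}, Calinski-Harabasz = {ch:.1f}")
```

A silhouette close to or below zero, as reported later for the inspection data, indicates heavily overlapping clusters.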

Comparison with training results with labeled data

Because the Tokyo MoU dataset is labeled, it is possible to directly summarize the ship characteristics in each cluster to obtain the characteristics of the ships in the cluster (deficiency and detention) and then recommend them to newly visited ships (that is, those in the test set). By comparing the real deficiencies and detention conditions of newly visited ships with the recommended ones, we can know the quality of the clustering algorithm.
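The source does not spell out how the accuracy of a cluster-based recommendation is computed. One plausible reading, sketched below, is to assign each cluster the most frequent risk label of its training members and then count how often that label matches the true label of test ships. The helper names and the toy labels (HRS, SRS, LRS for high-, standard-, and low-risk ships) are illustrative assumptions.

```python
from collections import Counter, defaultdict

def cluster_label_map(cluster_ids, true_labels):
    """Map each cluster to the most frequent true label among its members."""
    counts = defaultdict(Counter)
    for c, y in zip(cluster_ids, true_labels):
        counts[c][y] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in counts.items()}

def accuracy(cluster_ids, true_labels, mapping):
    """Share of ships whose cluster's recommended label equals the true label."""
    hits = sum(mapping[c] == y for c, y in zip(cluster_ids, true_labels))
    return hits / len(true_labels)

# Toy training data: cluster assignment and known risk label per ship.
train_clusters = [0, 0, 0, 1, 1, 1, 2, 2]
train_labels = ["HRS", "HRS", "SRS", "SRS", "SRS", "LRS", "LRS", "LRS"]
mapping = cluster_label_map(train_clusters, train_labels)

# Toy test data: newly visiting ships already matched to clusters.
test_clusters = [0, 1, 2, 2]
test_labels = ["HRS", "LRS", "LRS", "SRS"]
test_acc = accuracy(test_clusters, test_labels, mapping)
print(mapping, test_acc)
```

The same pattern extends to numeric targets: recommend the cluster's mean deficiency count or detention rate and score it with the MSE or Brier score.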

Specifically, based on the six years (2015–2020) of Tokyo MoU ship inspection records, the characteristics of the matched cluster, such as deficiencies and detentions, are reported to the inspector to guide a targeted inspection of the newly arrived ship. The whole process can then be turned into online learning: every time a new ship arrives, after the above operations are performed on it, it is added to the training database. For example, suppose the original database contains n records divided into k groups. After a new ship arrives, the n + 1 records are re-divided into k groups. The dataset thus continues to grow, and the division reflects the characteristics of ships more and more accurately.
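The online update described above, in which n + 1 records are re-divided into k groups, might look roughly like the sketch below. The function name `add_ship_and_recluster` and the synthetic data are hypothetical, and warm-starting the refit from the previous centroids is an added assumption to keep each update cheap; the source only states that the data are re-divided.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
k = 3
X_db = rng.normal(size=(200, 4))        # n existing records (synthetic stand-in)
model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_db)

def add_ship_and_recluster(X_db, model, x_new):
    """Assign the new ship, append it, then re-divide the n + 1 records into k groups."""
    assigned = int(model.predict(x_new.reshape(1, -1))[0])
    X_db = np.vstack([X_db, x_new])
    # Warm start from the previous centroids (an assumption, not from the source).
    model = KMeans(n_clusters=k, init=model.cluster_centers_, n_init=1).fit(X_db)
    return X_db, model, assigned

x_new = rng.normal(size=4)              # feature vector of a newly arrived ship
X_db, model, assigned = add_ship_and_recluster(X_db, model, x_new)
print(X_db.shape, assigned)
```

For large databases, refitting after every arrival may be unnecessary; batching updates is a possible alternative.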

4. Prediction Model

4.1. Remove Useless Features and Extract New Features

There are a total of 35 features in the dataset. First, seven features irrelevant to the task (‘Call_Sign’, ‘Inspection_Type’, ‘Flag’, ‘IMO number’, ‘MMSI’, ‘No’, and ‘Draft’) are deleted. The feature ‘Liquid’ is also deleted because it has too many missing values and carries little meaning. The features ‘Tonnage’ and ‘deadweight’ are highly correlated, so only ‘deadweight’ is kept. The remaining 26 features include ‘Dead weight, Flag performance, RO performance, Company performance, deficiency no, detention, last deficiency no, total detention, casualty in 5 years, flag changing times, length, beam, depth, speed, last_36_months_avg_def_no (the average number of deficiencies detected in the past 3 years), last_36_months_all_det_no (the total number of detentions in the past 3 years), last_inspection_state (whether the last initial inspection was held), Ship Type_PSC’. Table 1 introduces the meaning of each feature and its processing method in the entire dataset.
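To make the processing in Table 1 concrete, the sketch below fills "not listed" with the training-set mode and applies the label encoding for flag performance. The codes come from Table 1; the helper names, the toy values, and the decision to exclude "not listed" when computing the mode are assumptions.

```python
from collections import Counter

# Ordinal codes taken from Table 1.
FLAG_CODES = {"white": 1, "grey": 2, "black": 3}
PERF_CODES = {"high": 1, "medium": 2, "low": 3, "very low": 4}  # RO and company

def training_mode(values, missing="not listed"):
    """Most frequent listed category in the training set."""
    return Counter(v for v in values if v != missing).most_common(1)[0][0]

def encode(values, codes, fill, missing="not listed"):
    """Replace 'not listed' by the fill value, then apply label encoding."""
    return [codes[fill if v == missing else v] for v in values]

train_flags = ["white", "white", "grey", "not listed", "black"]
flag_mode = training_mode(train_flags)       # computed on the training set only

new_flags = ["not listed", "black"]          # e.g. validation or test records
print(encode(train_flags, FLAG_CODES, flag_mode))
print(encode(new_flags, FLAG_CODES, flag_mode))
```

RO and company performance can be handled the same way with `PERF_CODES`.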

Table 1. Feature explanation and processing method.

After dividing the dataset, we fill in missing values, encode categorical features, and scale the features. There are 3672 records in the dataset, of which 60% form the training set, 20% the validation set, and 20% the test set, i.e., 2202 records in the training set, 735 in the validation set, and 735 in the test set. After processing the data, we build a baseline model with K = 3 using the training data, because ships are traditionally classified into three groups (high risk, standard risk, low risk) under the ship risk profile. Next, the test data are used to examine the performance of the model. Table 2 shows the results and performance of the three clusters using the test set.
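The split and preprocessing order described here can be sketched as follows. The 3672-row matrix is synthetic, and the use of `train_test_split`, mean imputation via `SimpleImputer`, and `StandardScaler` are assumptions; the source does not name the scaling method or the tooling.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(3672, 6))                 # synthetic stand-in for the 3672 records
X[rng.random(X.shape) < 0.05] = np.nan         # inject missing values

# 60/20/20 split: hold out 20% for testing, then 25% of the remainder for validation.
X_rest, X_test = train_test_split(X, test_size=0.2, random_state=0)
X_train, X_val = train_test_split(X_rest, test_size=0.25, random_state=0)

# Fit the imputer and scaler on the training set only, then apply them everywhere.
imputer = SimpleImputer(strategy="mean").fit(X_train)
scaler = StandardScaler().fit(imputer.transform(X_train))

def prepare(A):
    """Impute missing values, then scale, using training-set statistics."""
    return scaler.transform(imputer.transform(A))

X_train_p, X_val_p, X_test_p = prepare(X_train), prepare(X_val), prepare(X_test)
print(len(X_train_p), len(X_val_p), len(X_test_p))
```

Fitting the imputer and scaler on the training set alone keeps validation and test statistics out of the model, which matches the "mode of the training set" wording in Table 1.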

Table 2. The performance evaluation of the model with K = 3.

4.2. Model Evaluation and Results

Once the K-means model was established, we used the validation set to tune K and find the best value for our problem. The silhouette score is used to evaluate the quality of different values of K. Candidate values of K from 2 to 14 are tried, and the resulting silhouette scores are shown in Table 3. The silhouette score first increases and then decreases, peaking at K = 9. K = 9 was therefore selected for the final model, which was built using the training and validation sets. Table 4 compares the performance of K = 3 and K = 9. Although the model with K = 3 performs better on deficiency_no_mse, detention_mse, and some of the code_label MSEs, its coverage and evaluation ranges are narrower, and the differences between K = 3 and K = 9 on these measures are small. Hence, the K-means model with K = 9 is preferred. However, this model can be improved and refined by inputting as much qualified data as possible.
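A sweep over candidate K values of the kind described above could be written along these lines. The synthetic four-blob data and the protocol (fit on training data, score the predicted labels on validation data) are assumptions; the source only states that the validation set and the silhouette score were used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Four well-separated synthetic blobs (hypothetical data, not the inspection records).
centers = np.array([[0.0, 0.0, 0.0], [6.0, 0.0, 0.0], [0.0, 6.0, 0.0], [0.0, 0.0, 6.0]])

def sample(n_per_blob):
    return np.vstack([c + rng.normal(scale=0.5, size=(n_per_blob, 3)) for c in centers])

X_train, X_val = sample(60), sample(20)

scores = {}
for k in range(2, 15):                          # candidate K from 2 to 14
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)
    val_labels = model.predict(X_val)           # assign validation points to clusters
    scores[k] = silhouette_score(X_val, val_labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 4))
print("best K:", best_k)
```

When the scores are nearly flat across K, as in Table 3, the maximizer is weakly determined and the choice deserves a sanity check against domain knowledge.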

Table 3. The silhouette score of the model with different K.

Table 4. K-means models performance comparison of K = 3 and K = 9.

Very few unsupervised learning methods have been applied to PSC inspection because the data are scarce and the available data are very likely to be imbalanced. One existing study used principal component analysis (PCA) [17], which transforms the data so that the first principal component has the largest possible variance and each succeeding component, in turn, has the highest variance possible under the constraint that it is orthogonal to the preceding components. In comparison, our proposed method considers every factor equally and is not affected by subjective factors or by domain knowledge and experience. In addition, the authors consider K = 9 reasonable because the status of individual ships can differ greatly, although this choice departs from the common three-level classification used in the domain.

5. Conclusions and Future Work

PSC inspection is viewed as an effective way to enhance maritime safety and security and to prevent marine pollution. Due to limited time and human resources, not every deficiency item listed by the Tokyo MoU can be inspected, and not even every ship can be selected for inspection. Therefore, it is worth developing new algorithms and methods that can guide PSCOs in improving inspection efficiency.

In this research, we have designed and developed a k-means clustering method, based on unsupervised learning, for the efficient classification of ships coming to the port. Numerical experiments show that the k-means model performs better than random guessing (accuracy rate: around 50%) and covers a wide range of ship factors. Most importantly, the classification has been refined to nine groups, which gives more insight into ship risk prediction and analysis for PSC inspection. As for limitations, the silhouette score ranges from −1 to +1, where a high value indicates that an object is well matched to its own cluster and poorly matched to neighboring clusters. However, the highest score in this paper is −0.0836, which means the model has considerable room for improvement. In future research, the model can be refined by adding more features extracted from qualified data and by feeding as much real data as possible into the model.

Author Contributions

Conceptualization, Z.H., R.Y., S.W.; methodology, Z.H.; software, Z.H.; validation, Z.H., R.Y.; formal analysis, Z.H.; investigation, Z.H.; resources, R.Y., S.W.; data curation, R.Y.; writing—original draft preparation, Z.H.; writing—review and editing, Z.H., R.Y., S.W.; visualization, Z.H., R.Y.; supervision, R.Y., S.W.; project administration, S.W.; funding acquisition, R.Y., S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research is funded by GuangDong Basic and Applied Basic Research Foundation (grant number 2019A1515011297). The APC was funded by Start-up Fund for RAPs under the Strategic Hiring Scheme of PolyU (grant number 1-BD5D).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

The study did not involve humans.

Data Availability Statement

Not applicable.

Acknowledgments

This research is supported by GuangDong Basic and Applied Basic Research Foundation (grant number 2019A1515011297).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yan, R.; Wang, S.; Peng, C. An Artificial Intelligence Model Considering Data Imbalance for Ship Selection in Port State Control Based on Detention Probabilities. J. Comput. Sci. 2021, 48, 201257.
2. Yan, R. Data Analytics for Improving Shipping Efficiency: Models, Methods, and Applications. Ph.D. Thesis, The Hong Kong Polytechnic University, Hong Kong, China, 2022.
3. Balamurugan, K.S.; Chakrabarti, P.; Chakrabarti, T.; Gupta, A.; Elngar, A.A.; Nami, M.; Akbar, M.A. Improving the Performance of Diagnosing Chronic Obstructive Lung Disease Using Outlier Detection with Decision Tree Algorithm. 2022. Available online: https://assets.researchsquare.com/files/rs-2072803/v1/b9e70da5-9278-4bad-b918-e32dfdc1e8ce.pdf?c=1666880078 (accessed on 28 October 2022).
4. Sriraman, R.; Younis, J.A.; Lim, C.P.; Hammachukiattikul, P.; Rajchakit, G.; Boonsatit, N. A Sampling Load Frequency Control Scheme for Power Systems with Time Delays. Complexity 2022, 2022, 3878321.
5. Visakamoorthi, B.; Muthukumar, P.; Rajchakit, G.; Boonsatit, N.; Hammachukiattikul, P. Stabilization of Fuzzy Hydraulic Turbine Governing System With Parametric Uncertainty and Membership Function Dependent H∞ Performance. IEEE Access 2022, 10, 23063–23073.
6. Rajchakit, G.; Sriraman, R.; Boonsatit, N.; Hammachukiattikul, P.; Lim, C.P.; Agarwal, P. Exponential stability in the Lagrange sense for Clifford-valued recurrent neural networks with time delays. Adv. Differ. Equ. 2021, 2021, 256.
7. Yan, R.; Wang, S.; Peng, C. Ship selection in port state control: Status and perspectives. Marit. Policy Manag. 2022, 49, 600–615.
8. Yan, R.; Wang, S. Ship Inspection by Port State Control—Review of Current Research; Springer: Singapore, 2019.
9. Yan, R.; Wang, S.; Cao, J.; Sun, D. Shipping Domain Knowledge Informed Prediction and Optimization in Port State Control. Transp. Res. Part B 2021, 149, 52–78.
10. Wang, S.; Yan, R.; Qu, X. Development of a non-parametric classifier: Effective identification, algorithm, and applications in port state control for maritime transportation. Transp. Res. Part B 2019, 128, 129–157.
11. Yan, R.; Zhuge, D.; Wang, S. Development of Two Highly-Efficient and Innovative Inspection Schemes for PSC Inspection. Asia-Pac. J. Oper. Res. 2021, 38, 2040013.
12. Chi, Z.; Jun, S. Automatically optimized and self-evolutional Ship Targeting system for Port State Control. In Proceedings of the 2010 IEEE International Conference on Systems, Man and Cybernetics, Istanbul, Turkey, 10–13 October 2010; pp. 791–795.
13. Yang, Z.; Yang, Z.; Yin, J. Realising Advanced Risk-based Port State Control Inspection Using Data-driven Bayesian Networks. Transp. Res. Part A Policy Pract. 2018, 110, 38–56.
14. Tsou, M. Big Data Analysis of Port State Control Ship Detention Database. J. Mar. Eng. Technol. 2019, 18, 113–121.
15. Chung, W.; Kao, S.; Chang, C.; Yuan, C. Association Rule Learning to Improve Deficiency Inspection in Port State Control. Marit. Policy Manag. 2020, 47, 332–351.
16. Osman, M.T.; Yuli, C.; Li, T.; Senin, S.F. Association Rule Mining for Identification of Port State Control Patterns in Malaysian Ports. Marit. Policy Manag. 2020, 48, 1082–1095.
17. Zhang, L.F.; Gang, L.H.; Liu, Z.J. Analyzing Inspection Results of Port State Control by using PCA. Appl. Mech. Mater. 2014, 686, 730–735.

Table 1. Feature explanation and processing method.

Feature Name	Feature Meaning	Missing Value	Processing Method	Encoding Method
‘Dead Weight’	Deadweight tonnage is a measure of how much weight a boat can carry	Yes	Mean fill	No encoding.
‘flag performance’	White, grey, black, not listed	No	“not listed” is filled with the mode of the feature in the training set	Label encoding: white → 1; grey → 2; black → 3.
‘RO performance’	High, medium, low, very low, not listed	Yes	“not listed” is filled with the mode	Label encoding: high → 1; medium → 2; low → 3; very low → 4.
‘Company performance’	The performance of shipping businesses is determined using the company performance matrix from the Tokyo MoU	No	“not listed” is filled with the mode	Label encoding: high → 1; medium → 2; low → 3; very low → 4.
‘deficiency no’	The number of defects in this inspection	No		No encoding.
‘detention’	Whether this inspection is detained	No		Label encoding.
‘last deficiency no’	Here is the number of defects from the last initial inspection	Yes	filled with the mode of the training set	No encoding.
‘total detentions’	total number of detentions	No		No encoding.
‘casualty in 5 years’	A binary variable indicating whether a ship has had a casualty accident in the past five years.	No		Binary encoding: 1 if a casualty has occurred in the previous 5 years, 0 otherwise.
‘flag changing times’	Number of ship flag state changes	No		No encoding.
‘Length’(meter)	The ship’s overall maximum length	Yes	filled with the mean	No encoding.
‘Beam’(meter)	Hull width	Yes	filled with the mean	No encoding.
‘Depth’(meter)	vertical distance between the side upper deck and the underside of the keel	Yes	filled with the mean	No encoding.
‘Speed’	the speed of the boat	Yes	filled with the mean	No encoding.
‘last_36_months_avg_def_no’	Average number of defects in initial inspection in the past 36 months	No		No encoding.
‘last_36_months_all_det_no’	Total number of detentions at initial inspection in the past 36 months	No		No encoding.
‘last_inspection_state’	Binary variable indicating whether the last initial inspection was held: 1 if it was held, 0 otherwise.	No		No encoding.
‘Classification Society’	NGO that creates and upholds technical guidelines for the design, manufacture, and use of ships and offshore structures.	No		One-hot encoding
‘Ship Type_PSC’	Bulk carriers, container ships, general/multipurpose ships, passenger ships, oil tankers, and other ship categories are included in the collection.	No		One-hot encoding: is bulk carrier: 1 for bulk carriers, 0 otherwise; is container ship: 1 for container ships, 0 otherwise; is general cargo/multipurpose: 1 for such ships; is passenger ship: 1 for such ships; is tanker: 1 for such vessels; is other: 1 for other ship categories, 0 otherwise.

Note: Flag performance, Recognized Organization (RO) performance and company performance are calculated based on the flag black and white list, RO performance list and company performance list provided by the Tokyo Memorandum of Understanding, respectively. Whitelisted flags perform better than greylisted flags and much better than blacklisted flags. For ROs and companies, performance deteriorates in the order of “high”, “medium”, “low” and “very low”. If the RO and Company’s performance is not listed, the performance status is recorded as “Not Listed”.

Table 2. The performance evaluation of the model with K = 3.

Cluster/Features	1	2	3
No. of ships	252	1680	1005
No. of ships in HRS (rate)	HRS:225 (0.89)	HRS:547 (0.33)	HRS:494 (0.12)
No. of ships in SRS (rate)	SRS:26 (0.10)	SRS:809 (0.48)	SRS:389 (0.49)
No. of ships in LRS (rate)	LRS:1 (0.0039)	LRS:324 (0.19)	LRS:122 (0.39)
Clustering performance on training set (accuracy)	0.89	0.33	0.39
Clustering performance on test set (accuracy)	0.86	0.35	0.40
Average no. of deficiencies	11.83	3.96	2.30
Prediction performance on training set (MSE)	79.25	15.86	8.50
Prediction performance on test set (MSE)	75.16	16.00	8.16
Total number of detentions	71	42	3
Average detention rate	0.28	0.03	0.0030
Prediction performance on training set (Brier score)	0.20	0.024	0.0030
Prediction performance on test set (Brier score)	0.23	0.022	0.0043
Distribution of deficiency code	{‘01’: 234, ‘02’: 54, ‘03’: 254, ‘04’: 184, ‘05’: 176, ‘06’: 12, ‘07’: 631, ‘08’: 20, ‘09’: 228, ‘10’: 513, ‘11’: 339, ‘12’: 3, ‘13’: 69, ‘14’: 144, ‘15’: 83, ‘18’: 24, ‘99’: 14}	{‘01’: 395, ‘02’: 62, ‘03’: 515, ‘04’: 452, ‘05’: 286, ‘06’: 23, ‘07’: 1450, ‘08’: 97, ‘09’: 662, ‘10’: 1026, ‘11’: 866, ‘12’: 1, ‘13’: 199, ‘14’: 356, ‘15’: 88, ‘18’: 96, ‘99’: 76}	{‘01’: 119, ‘02’: 18, ‘03’: 167, ‘04’: 163, ‘05’: 125, ‘06’: 5, ‘07’: 481, ‘08’: 37, ‘09’: 272, ‘10’: 330, ‘11’: 235, ‘12’: 4, ‘13’: 94, ‘14’: 125, ‘15’: 28, ‘18’: 59, ‘99’: 45}
Prediction performance on training set (MSE) code_label	1.25	0.34	0.19
Prediction performance on test set (MSE) code_label	1.41	0.34	0.22
Distribution of detainable code	{‘01’: 36, ‘02’: 24, ‘03’: 41, ‘04’: 28, ‘05’: 23, ‘06’: 2, ‘07’: 72, ‘08’: 2, ‘09’: 2, ‘10’: 51, ‘11’: 33, ‘13’: 2, ‘14’: 24, ‘15’: 54, ‘18’: 1}	{‘01’: 8, ‘02’: 3, ‘03’: 8, ‘04’: 12, ‘05’: 9, ‘06’: 2, ‘07’: 31, ‘09’: 1, ‘10’: 13, ‘11’: 19, ‘13’: 1, ‘14’: 11, ‘15’: 22}	{‘07’: 2, ‘10’: 2, ‘14’: 2, ‘15’: 2}
Prediction performance on training set (MSE) detainable_code_label	0.19	0.0098	0.0020
Prediction performance on test set (MSE) detainable_code_label	0.27	0.0091	0.0043

Table 3. The silhouette score of the model with different K.

Number of Clusters	Silhouette score
2	−0.0916
3	−0.0883
4	−0.0883
5	−0.0883
6	−0.0837
7	−0.0837
8	−0.0837
9	−0.0836
10	−0.0842
11	−0.0842
12	−0.0840
13	−0.0842
14	−0.0856

Table 4. K-means models performance comparison of K = 3 and K = 9.

	K = 3	K = 9
accuracy rate	0.41	0.47
deficiency_no_mse	18.73	19.86
detention_mse	0.54	0.61
detainable_code_label0_MSE	0.27	0.06
detainable_code_label1_MSE	0.01	0.10
detainable_code_label2_MSE	0.0043	0.26
detainable_code_label3_MSE	\	0.00
detainable_code_label4_MSE	\	0.0053
detainable_code_label5_MSE	\	0.0073
detainable_code_label6_MSE	\	0.54
detainable_code_label7_MSE	\	0.051
detainable_code_label8_MSE	\	0.032
code_label0_MSE	1.41	0.62
code_label1_MSE	0.34	0.91
code_label2_MSE	0.22	1.02
code_label3_MSE	\	0.18
code_label4_MSE	\	0.22
code_label5_MSE	\	0.31
code_label6_MSE	\	2.22
code_label7_MSE	\	0.43
code_label8_MSE	\	0.45

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
