Article

Applying Big Data for Maritime Accident Risk Assessment: Insights, Predictive Insights and Challenges

by Vicky Zampeta 1,*, Gregory Chondrokoukis 1 and Dimosthenis Kyriazis 2
1 Department of Industrial Management & Technology, University of Piraeus, 185 34 Piraeus, Greece
2 Department of Digital Systems, University of Piraeus, 185 34 Piraeus, Greece
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2025, 9(5), 135; https://doi.org/10.3390/bdcc9050135
Submission received: 17 March 2025 / Revised: 15 April 2025 / Accepted: 24 April 2025 / Published: 19 May 2025

Abstract

Maritime safety is a critical concern for the transport sector and remains a key challenge for the international shipping industry. Recognizing that maritime accidents pose significant risks to both safety and operational efficiency, this study explores the application of big data analysis techniques to understand the factors influencing maritime transport accidents (MTA). Specifically, using extensive datasets derived from vessel performance measurements, environmental conditions, and accident reports, it seeks to identify the key intrinsic and extrinsic factors contributing to maritime accidents. The research examines more than 90,000 observations for the period 2014–2022. Using big data analytics and statistical techniques, the study identifies patterns linking vessel size, speed, and specific environmental factors to accident occurrence. Furthermore, the study highlights the potential of big data analytics in enhancing predictive modeling, real-time risk assessment, and decision-making processes for maritime traffic management. The integration of big data with intelligent transportation systems (ITSs) can optimize safety strategies, improve accident prevention mechanisms, and enhance the resilience of ocean-going transportation systems. By bridging the gap between big data applications and maritime safety research, this work contributes to the literature by emphasizing the importance of examining both intrinsic and extrinsic factors in predicting maritime accident risks. Additionally, it underscores the transformative role of big data in shaping safer and more efficient waterway transportation systems.

1. Introduction

In recent decades, maritime transport has attracted much attention due to its critical role in the global supply chain and the persistent safety challenges it faces. Despite technological advances in shipping and improved safety protocols, maritime accidents continue to cause significant economic, environmental, and human costs. These incidents, which usually result from a mixture of operational failures and adverse environmental conditions, require an advanced approach to risk analysis and accident prevention [1,2].
Operational factors, such as vessel size, deadweight, and speed, are determinants of safety at sea. Larger vessels, which inherently involve more complex operations and greater momentum, are more susceptible to serious accidents [3,4]. Research highlights the importance of vessel characteristics, including size and deadweight, for assessing operational risks and predicting accident probabilities [5]. In addition, vessel speed is emerging as an important factor in predicting accident probability, especially under adverse weather conditions [6,7].
It is also a common finding that environmental conditions have a significant impact on maritime safety: strong winds, high waves, and limited visibility increase the risk of maritime accidents. Previous studies have highlighted the impact of these environmental factors on maritime operations [8,9,10]. In addition, a number of other studies show that stronger winds and higher significant wave heights are associated with higher accident rates, and they highlight the need for real-time monitoring and adaptive navigation strategies to mitigate these risks [11,12,13,14].
A more recent approach in maritime safety research is the integration of big data analytics, which has introduced new ways to understand and mitigate accident risks. Big data techniques enable the processing and analysis of extensive datasets from multiple sources, revealing patterns and correlations that traditional methods may overlook. This integrated approach incorporates operational variables, environmental conditions, and vessel characteristics to provide a holistic view of the factors affecting maritime safety [4,15,16,17,18,19,20].
Recent research has demonstrated the effectiveness of big data in enhancing forecasting models and improving decision-making processes. For example, the use of Automatic Identification System (AIS) data to analyze ship movements and environmental conditions has identified critical predictors of grounding and collision risks [7,21]. In addition, machine learning algorithms have significantly improved the accuracy of accident risk estimates, supporting more effective preventive measures [1,22].
While significant progress has been made in identifying the impact of operational, environmental, and vessel-related factors on maritime safety, a comprehensive model combining all these variables is still lacking. Although the application of big data analytics has shown promise in improving maritime safety, its full potential in integrating real-time data streams and improving forecasting models has not yet been adequately explored. Many existing studies focus on single factors or limited datasets, without a holistic approach to marine accident prediction. This research aims to fill these gaps by integrating multiple variables, including environmental, operational, and vessel characteristics, into a unified prediction framework.
Using big data techniques and real-time data, this study seeks to provide a more dynamic and accurate understanding of the factors influencing marine accidents, ultimately contributing to the development of more effective preventive measures. The objectives of this paper are: (i) to investigate the contribution of big data analysis to maritime safety and accident prediction, and (ii) to examine the effect of environmental conditions, operational factors, and vessel characteristics on the probability of maritime accidents, identifying the most important variables.
To achieve these goals, we analyze an extensive dataset spanning nine years (2014–2022) and over 90,000 observations. Using advanced big data techniques combined with statistical methods, we explore the complex relationships between ship characteristics, environmental conditions, and operational factors to uncover patterns and identify key variables that influence maritime accident risks. The findings show that while the associations between some variables and accidents are weak, patterns emerge, such as the increased risk associated with larger vessels, stronger winds, and higher speeds in adverse weather conditions. These insights highlight the value of big data in understanding complex maritime safety dynamics and emphasize the importance of integrating multiple factors into risk prediction and management models.
The rest of the paper is organized as follows. Section 2 reviews the literature, and Section 3 describes the methodology, data, and techniques used in the study. Section 4 presents the findings, and Section 5 discusses the results. Section 6 concludes the study.

2. Literature Review

Maritime safety continues to be a critical area of study because of its integral role in global trade and the ongoing challenges posed by maritime accidents. Several scholars have focused on identifying and predicting the operational, environmental, and technological factors that affect maritime safety, and relatively recently, there has been an emphasis on the transformative potential of big data analytics in advancing safety frameworks and risk assessment models.
More specifically, operational factors, in particular the size, deadweight, and speed of the ship, are consistently recognized as determinants of safety at sea. Analyses of maritime accidents and numerous studies over time have concluded that larger ships, which involve more complex operations and have greater momentum, are associated with greater accident severity. A typical example is the 2006 study by Akten [3], who examined the multifaceted nature of shipping accidents, focusing on their economic, environmental, and operational consequences. His study concludes that the main contributors to shipping accidents are physical factors, including adverse weather and currents, as well as technical failures, human error, and cargo-related hazards.
With regard to ship size in particular, he pointed out that, although the increasing size of ships undoubtedly optimizes transport costs, it nevertheless appears to exacerbate the risk of accidents at sea because of reduced flexibility, while also leading to worse consequences in the event of an accident. Two years later, in 2008, Talley, Jin, and Kite-Powell [4] examined the factors affecting the severity of accidents, particularly on cruise ships, and concluded that the size of the ship, its age, weather conditions, and the location of the accident make the greatest contribution. Analyzing accident data, they found that larger cruise ships and adverse weather conditions increase risk and tend to lead to more serious accidents. These findings are also consistent with the more recent study by Chang and Park in 2019 [5], who highlight the importance of ship characteristics, including size and deadweight, in assessing operational risks and predicting accident probabilities.
Reviewing the existing literature, it becomes clear that adverse weather conditions not only hinder navigation but also increase the likelihood of accidents, including serious ones. In recent years, several scholars have continued to provide empirical evidence that environmental conditions, such as wind strength, sea state, and visibility, play a key role in maritime accidents. One of these studies, by Heij and Knapp (2015) [9], examines the effect of wind intensity and wave height on the risk of maritime accidents, considering regional trends and seasonality. A recurring recommendation in the literature is to incorporate real-time environmental data into forecasting models. Bye and Aalberg, in a 2018 article [13], presented the results of statistical analyses of maritime accident data and AIS data from Norwegian waters, with the aim of identifying conditions associated with navigation accidents, such as strandings and collisions, which could be used as risk indicators.
In particular, the authors analyzed data from the Norwegian Maritime Administration's accident database and historical AIS records, converting information on ship behavior before the accident, technical and organizational conditions, and characteristics of the accident area into variables. Through correspondence analysis and multivariate logistic models, they found that specific vessel types, shorter vessel length, poor visibility, and flags of convenience increase the likelihood of navigation accidents. In addition, a year earlier, Ventikos, Stavrou, and Andritsopoulos [14] examined the characteristics and statistical patterns of maritime accidents in specific areas of the Aegean Sea. Drawing on decades of accident data, the authors analyzed the characteristics of the ships involved, such as their type and size, and reported detailed results on the accidents under consideration.
They also developed two models for risk assessment in Aegean shipping: a stochastic Poisson model to calculate the probability of accident occurrence in three sea zones and a model inspired by seismology, using the concept of energy release to predict maritime accidents. These models provide a systematic overview of the risk profile of shipping in the Aegean Sea, offering valuable insights for improving maritime safety. Özdemir and Güneroğlu (2015) [11] also analyzed the importance of the human factor in maritime accidents, underlining that the majority of them are caused by human error. The authors applied a hybrid multi-criteria decision-making methodology, combining the DEMATEL and ANP methods, to quantitatively assess the importance of different human factors in maritime mishaps. The results showed that the three most important factors, in descending order, were competence, skills, and knowledge (8.94%); physical condition (8.77%); and weather/sea conditions (8.21%), while the least important factor appeared to be cargo characteristics (2.21%).
Indeed, they noted that quantitative assessment of human errors in maritime operations can significantly improve the decision-making process and reduce potential risks. Another very recent study, that of Pilatis et al. (2024) [12], presents a comprehensive analysis of 213 maritime accidents that occurred between 1990 and 2020. The study focuses on collisions, groundings, and hull failures, looking at parameters such as the type of ship, its main characteristics, and the causes of the accidents.
The results underline the importance of understanding the factors contributing to such accidents, providing valuable information for improving maritime safety and developing preventive measures. More broadly, Endrina et al. (2019) [2] and Wu et al. (2023) [20] highlight the need for sophisticated risk analysis and accident prevention strategies to address these multifaceted challenges. Vessel speed also emerges as an important variable, especially under adverse environmental conditions. Among other studies, recent research by Bao et al. (2020) [7] and Al-Behadili et al. (2023) [6] demonstrates that higher speeds exacerbate accident risks, especially when strong winds or rough seas are present. These insights highlight the importance of adaptive navigation strategies that incorporate both vessel characteristics and real-time environmental data. A further common conclusion in this literature is the need for enhanced safety measures and crew training to mitigate the risk and reduce the severity of maritime accidents.
The advent of big data analytics has transformed maritime safety research, enabling the analysis of huge datasets to reveal patterns and improve forecast accuracy. Big data analysis is a recent trend in quantitative methods that deals with extensive and intricate datasets that cannot be efficiently handled, processed, or analyzed with conventional data processing tools. A big dataset typically comprises thousands of records across several variables, forming an N × M matrix (N records by M variables) that is complex to handle [23]. In many applications, data is produced in real time or near real time, requiring rapid processing to extract meaningful insights [24].
Big data variety refers to the diversity of data types and sources. Big data encompasses a wide range of data formats, structures, and sources such as structured data, unstructured data, and semi-structured data. Specifically, in the field of maritime safety and maritime accidents, examining the existing literature, we find that the early contributions of Talley et al. (2008) [4] and Toffoli et al. (2005) [15] laid the foundation for data-based risk assessments.
More recent developments by Sadaharu (2015) [16], Jovic and Edvard (2019) [17], and Liu et al. (2023) [18] highlight the increasing importance of big data techniques in accident risk identification. In addition, the use of AIS (Automatic Identification System) data has proven to be particularly effective in maritime safety research. Indicatively, Bao et al. (2020) [7] and Feng (2019) [21] demonstrate its usefulness in collision and grounding prediction, while Xu et al. (2023) [1] and Maceiras et al. (2024) [22] show how machine learning algorithms enhance the accuracy of risk assessment and decision-making capabilities. These findings highlight the potential of integrating operational, environmental, and technological variables into unified safety frameworks.
However, despite the remarkable progress made in this field, the relevant literature has certain limitations. Many studies focus on individual factors or specific datasets, limiting their applicability to comprehensive accident prediction models. While big data analysis has introduced innovative approaches, its full potential, in particular regarding real-time data integration and advanced AI applications, remains untapped. Variables such as ship maintenance and crew performance also remain underexplored. The research of Dominguez-Péry et al. (2023) [25] and the more recent research by Bogalecka (2024) [26] emphasize the important role of these factors, noting that inadequate maintenance and insufficient training significantly increase the risk of accidents.
In addition to the factors that we have already extensively mentioned, there are several scholars who focus on others, such as the human factor. For example, the study by Acejo et al. (2018) [8] provides an in-depth analysis of maritime accident reports from 2002 to 2016, examining their causes and contributing factors. The study highlights the role of human error in maritime accidents, despite advances in technology and regulatory improvements, concluding that the most common direct causes include inadequate surveillance, poor judgment, and communication failures.
Existing studies therefore often focus on single factors or limited datasets and fail to provide a holistic approach to predicting maritime accidents. This research aims to address these gaps by integrating multiple variables into a unified predictive framework, leveraging big data techniques and real-time data to offer a more dynamic and accurate understanding of the factors influencing maritime accidents. In particular, big data analysis is used in this article to identify and evaluate the key factors that affect safety at sea and contribute to maritime accidents.
Through the analysis conducted, the research hypotheses were confirmed, showing that both internal and external factors are statistically significant in relation to maritime transport accidents (MTA), in agreement with a number of studies, including the recent study by Stojanovic et al. (2019) [27]. Ship data include extensive information about ships and marine vehicles that is essential for effective maritime management, while integrated weather data, including real-time conditions, forecasts, and historical patterns, serve as a critical tool for maritime operations. In particular, accurate weather data is essential for optimizing routes, avoiding adverse conditions, and ensuring the safety of both crew and cargo. Overall, the use of this data helps optimize operational efficiency while prioritizing the safety and well-being of those at sea. Ultimately, this approach seeks to contribute to the development of more effective preventive measures and enhance maritime safety.

3. Materials and Methods

This study uses a big data analysis approach to investigate factors influencing maritime accidents. The methodology integrates diverse data sources, allowing the identification of correlations and patterns related to maritime accident occurrence. Specifically, we focus on ship characteristics, environmental conditions, and operational variables, combining big data processing with classical statistical analysis to draw conclusions.

3.1. Data Collection

The data used in this study were sourced from publicly available databases, official accident reports, and environmental datasets, specifically:
  • Accident Data: Maritime accident data were sourced from the Global Integrated Shipping Information System, a comprehensive database providing detailed records of accidents, including fatalities, vessel damage, and causes of incidents.
  • Environmental Data: Variables such as average ambient temperature, wind force, sea state, and swell force were obtained from the National Oceanic and Atmospheric Administration and the European Centre for Medium-Range Weather Forecasts.
  • Vessel Characteristics: Data on vessel characteristics, including deadweight, sizing, cargo type, and vessel state, were retrieved from maritime registries and performance tracking systems.

3.2. Data Description and Analysis

The variables examined in this study include:
  • Accident: Binary variable indicating the occurrence or non-occurrence of an accident.
  • Sizing: Categorical variable representing vessel size categories (Aframax, Handymax, Panamax, Suezmax) as described in Appendix B.
  • Deadweight: Continuous variable measuring the vessel's deadweight tonnage, i.e., its carrying capacity in tons.
  • Average Ambient Temperature: Continuous variable representing the temperature in degrees Celsius.
  • Wind Force (BFT): This categorical variable takes values for wind force based on the Beaufort scale.
  • Swell Force (DSS): Categorical variable measuring swell on the Douglas sea scale.
  • Sea State (DSS): Categorical variable representing sea conditions on the Douglas sea scale.
  • Average Speed (Knots): This is a continuous variable that measures the speed of the vessel in knots.
  • Cargo: Binary variable indicating whether the vessel is laden or in ballast.
  • Vessel State: This variable is categorical and takes values related to the operational status of the vessel.
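The variable list above can be made concrete as a typed record. The sketch below is a minimal illustration, not the study's actual data model: all field names are hypothetical, the mapping of Sizing codes 2 to 4 is assumed (Section 4.1 only states that type 1 is Aframax), and the range checks simply encode the Beaufort (0 to 12) and Douglas (0 to 9) scales.

```python
from dataclasses import dataclass
from typing import Optional

# Type 1 = Aframax is stated in Section 4.1; codes 2-4 are an ASSUMED
# (alphabetical) ordering of the remaining categories listed in Appendix B.
SIZING_CODES = {1: "Aframax", 2: "Handymax", 3: "Panamax", 4: "Suezmax"}


@dataclass(frozen=True)
class VesselRecord:
    """One operational observation; all field names are hypothetical."""

    accident: int                    # binary accident indicator (0/1)
    sizing: int                      # key into SIZING_CODES
    deadweight_t: float              # deadweight tonnage (carrying capacity), tons
    ambient_temp_c: Optional[float]  # degrees Celsius; None when missing
    wind_bft: int                    # Beaufort scale, 0-12
    swell_dss: int                   # Douglas scale (swell), 0-9
    sea_state_dss: int               # Douglas scale (sea), 0-9
    avg_speed_kn: float              # knots
    laden: bool                      # True = laden, False = in ballast
    vessel_state: str                # operational status label

    def __post_init__(self) -> None:
        # Reject codes that fall outside the coding scheme or the scale ranges.
        if self.accident not in (0, 1):
            raise ValueError("accident must be 0 or 1")
        if self.sizing not in SIZING_CODES:
            raise ValueError(f"unknown sizing code: {self.sizing}")
        if not 0 <= self.wind_bft <= 12:
            raise ValueError("wind_bft must be in 0..12 (Beaufort scale)")
        for name in ("swell_dss", "sea_state_dss"):
            if not 0 <= getattr(self, name) <= 9:
                raise ValueError(f"{name} must be in 0..9 (Douglas scale)")
```

Validating at construction time rejects out-of-range scale codes before they reach any cross-tabulation; continuous fields such as speed are left unchecked here because their plausible range is a judgment call.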

3.3. Data Analysis and Statistical Techniques

In this study, we use a variety of analytical techniques to examine the factors that influence marine accidents. Each method provides unique insights into the data, helping to identify significant patterns and relationships. The techniques are listed below:
  • Descriptive Statistics: Descriptive statistics were employed as the first step in the analysis to summarize the distribution and central tendencies of the key variables in the dataset. This includes measures such as mean, median, and standard deviation, which help characterize the data landscape and identify potential irregularities. The rationale for selecting descriptive statistics lies in their essential role in exploratory data analysis (EDA), particularly in large-scale datasets such as this one. These measures provided the foundation for subsequent analysis by revealing patterns in vessel size, deadweight, average speed, ambient temperature, wind, and swell forces. For instance, the detection of extreme values (e.g., 490 knots for speed, or 255 °C for temperature) alerted us to likely data anomalies, which were accounted for in the analysis. Descriptive statistics also helped define meaningful bins or categories (e.g., for deadweight or temperature), which supported later cross-tabulations and variance analyses.
  • Cross-Tabulation and Chi-Square Tests: Cross-tabulation was used to explore the distribution of accident occurrences across different categories of variables such as vessel size, deadweight class, speed groups, and wind force levels. These cross-classifications enabled a comparative view of accident versus non-accident frequencies within each group. To statistically assess these relationships, chi-square tests of independence were conducted. This method was selected because it is particularly suitable for testing associations between two categorical variables—precisely the structure that arises when accident occurrence (binary) is compared across grouped explanatory variables. This approach is scientifically justified by the goal of the study: to determine whether specific categories (e.g., larger vessels, higher wind force) are statistically associated with a higher incidence of accidents. The results, however, showed no statistically significant associations. This suggests that while some categories may appear more accident prone descriptively, the differences are not large enough—given the rarity of accidents—to be statistically meaningful under a chi-square framework.
  • Analysis of Variance (ANOVA): ANOVA was employed to test whether the mean values of continuous variables (e.g., average speed, deadweight, ambient temperature, wind force, swell force) differ between accident and non-accident observations. This method was selected because it is well suited to comparing the means of a continuous variable across the levels of a categorical grouping variable, here accident status. In this context, ANOVA serves as a logical extension of the chi-square test, allowing us to examine whether vessels involved in accidents systematically differ in average values of certain characteristics compared to non-accident vessels. The results of the ANOVA tests indicated that none of the variables showed statistically significant differences in their means across accident status. This finding, while negative in terms of statistical significance, is informative: it suggests that accident occurrence cannot be explained by univariate differences in key operational or environmental variables alone. Such a result aligns with the broader maritime safety literature, which points to the multifactorial and often stochastic nature of accidents.
  • Correlation Analysis (Pearson and Spearman): To further investigate potential relationships between accident occurrence and continuous variables, Pearson’s correlation coefficient (for linear relationships) and Spearman’s rank correlation (for monotonic relationships) were computed. These correlation measures help determine whether there is a consistent directional relationship between two variables. In this study, correlation analysis is appropriate because it allows for a different type of association assessment than chi-square or ANOVA: rather than testing for group differences, it evaluates whether, for example, accident occurrence increases with vessel speed or decreases with deadweight. However, as reported in the results, both Pearson and Spearman correlations were very weak and statistically insignificant for all variables tested. These results suggest that accident occurrence behaves independently of any single operational or environmental variable when considered in isolation. This insight supports the conclusion that accidents are likely driven by complex interactions or unmeasured factors not captured in the available dataset (e.g., human error, real-time decision making, or equipment failure).
  • Big Data Analytics: While the individual statistical techniques applied in this study are classical in nature, the scale and diversity of the dataset necessitated a big data approach for data integration, preprocessing, and computational efficiency. The dataset consisted of multiple sources, including vessel telemetry, environmental readings, and structured incident records, which required harmonization and cleaning before statistical testing.
Big data techniques were essential in enabling:
  • Automated anomaly detection (e.g., identification of implausible values);
  • Categorization of continuous variables into analytically meaningful groups;
  • Filtering and aggregation of over 90,000 observations in real-time environments.
This methodological infrastructure allowed the study to manage data complexity and volume without compromising analytical rigor. Although more advanced machine learning techniques were not employed in this phase, the foundation laid here supports their application in future predictive modeling.
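The preprocessing steps listed above (categorizing continuous variables and aggregating counts) and the chi-square test of independence described earlier can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the authors' pipeline: the deadweight cut points used in the check are hypothetical, since the study's bin edges are not given here; the table is built in a single streaming pass; and the p-value uses the closed-form chi-square tail for integer degrees of freedom.

```python
import bisect
import math
from collections import Counter
from typing import Iterable, Optional, Sequence


def bin_index(value: float, cut_points: Sequence[float]) -> int:
    """Index of the bin containing value; cut_points are sorted inner edges."""
    return bisect.bisect_right(cut_points, value)


def accident_crosstab(
    rows: Iterable[tuple[Optional[float], int]], cut_points: Sequence[float]
) -> list[list[int]]:
    """Single pass over (value, accident) pairs -> 2 x k table of counts.

    Only the counts are kept in memory, so rows can be streamed from disk.
    Missing values (None or NaN) are skipped.
    """
    counts: Counter = Counter()
    for value, accident in rows:
        if value is None or math.isnan(value):
            continue
        counts[(accident, bin_index(value, cut_points))] += 1
    k = len(cut_points) + 1
    return [[counts[(a, b)] for b in range(k)] for a in (0, 1)]


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i < df/2} h^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + sum_{i=1}^{(df-1)/2} exp(-h) h^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df + 1) // 2):
        total += math.exp((i - 0.5) * math.log(h) - h - math.lgamma(i + 0.5))
    return total


def chi2_independence(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence (no continuity correction).

    Rows or columns whose totals are zero should be dropped beforehand.
    """
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = sum(
        (obs - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i, r in enumerate(table)
        for j, obs in enumerate(r)
    )
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf(stat, df)
```

For instance, streaming (deadweight, accident) pairs through `accident_crosstab` with hypothetical cut points of 60,000, 100,000, and 140,000 tons yields a 2 × 4 table that `chi2_independence` can test directly. SciPy's `chi2_contingency` offers an equivalent, more thoroughly tested routine; note that it applies Yates' continuity correction to 2 × 2 tables by default, so its p-values will differ slightly from this uncorrected version in that case.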

3.4. Research Design and Objectives

This study employs a quantitative, empirical research design underpinned by big data analytics, aimed at identifying the intrinsic and extrinsic factors contributing to maritime transport accidents (MTA). More specifically, it investigates the statistical association between various operational and environmental variables and the occurrence of MTA. The analysis is based on a comprehensive dataset of over 90,000 maritime operational records spanning 2014 to 2022. Each record is annotated with vessel characteristics, environmental conditions, and accident occurrence. The primary objective is to evaluate whether measurable operational or environmental variables significantly influence the likelihood of maritime accidents.
The central hypothesis tested is:
H1: There is no statistically significant association between accident occurrence and the selected vessel/environmental variables.
A series of classical statistical techniques was used to evaluate this hypothesis, moving from descriptive statistics to inferential testing through cross-tabulation, variance analysis, and correlation testing.
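For a binary accident indicator and one continuous variable, the variance and correlation tests described in Section 3.3 reduce to a few closed-form computations. The sketch below is an illustrative standard-library version, not the authors' code; it assumes exactly two groups (accident vs. non-accident) and uses a normal approximation for p-values, which is reasonable only because the dataset has tens of thousands of records.

```python
import math
from statistics import fmean
from typing import Sequence


def anova_two_groups(a: Sequence[float], b: Sequence[float]) -> tuple[float, float]:
    """One-way ANOVA F statistic for two groups, with a large-sample p-value."""
    ma, mb = fmean(a), fmean(b)
    n = len(a) + len(b)
    grand = (len(a) * ma + len(b) * mb) / n
    ss_between = len(a) * (ma - grand) ** 2 + len(b) * (mb - grand) ** 2
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    f = ss_between / (ss_within / (n - 2))  # between-groups df is 1
    # With two groups F = t^2 on n - 2 df; for tens of thousands of records the
    # t distribution is practically normal, so the normal tail is used here.
    return f, math.erfc(math.sqrt(f / 2))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = fmean(x), fmean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as Pearson's r computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_p_value(r: float, n: int) -> float:
    """Two-tailed p-value via t = r * sqrt((n - 2) / (1 - r^2)), normal approx."""
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))
```

With a binary accident indicator, Pearson's r is the point-biserial correlation, and its significance test is algebraically equivalent to the two-group ANOVA, since F = r²(n − 2)/(1 − r²). It is therefore expected that the ANOVA and Pearson results reported for a given variable agree.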
Given the structure and nature of the data—where accidents constitute an extremely small proportion (0.024%) of all observations—a traditional inferential approach was applied, supported by big data processing capabilities. The methods chosen are well established in empirical transportation research and were specifically selected to suit the objectives, data types, and structure of the study.
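The effect of the 0.024% accident share can be made concrete with simple arithmetic. The sketch below assumes exactly 90,000 records (the text reports over 90,000) and, hypothetically, equally frequent categories; it computes the expected number of accidents per cell of a cross-tabulation, which can be compared with the common rule of thumb that chi-square approximations need expected counts of at least 5.

```python
def expected_accident_counts(
    n_records: int, accident_share: float, n_categories: int
) -> tuple[float, float]:
    """Expected total accidents and expected accidents per category.

    Assumes, hypothetically, that all categories are equally frequent, so under
    independence each column of the 2 x k table expects total / k accidents.
    """
    total = n_records * accident_share
    return total, total / n_categories


if __name__ == "__main__":
    for label, k in (("Sizing (4 classes)", 4), ("Beaufort force 0-12", 13)):
        total, per_cell = expected_accident_counts(90_000, 0.024 / 100, k)
        print(f"{label}: {total:.1f} accidents, {per_cell:.1f} expected per cell, "
              f"rule of thumb (>= 5) met: {per_cell >= 5}")
```

Under these assumptions the dataset holds roughly 22 accidents; split across four size classes this leaves about 5.4 expected accidents per class, barely above the threshold, while the 13 Beaufort levels leave about 1.7 per level. Unequal category frequencies would push the rarest cells lower still, which is one concrete reason why conventional significance tests have limited power here.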

3.4.1. Significance Criteria and Methodological Considerations

All statistical tests were conducted using a conventional two-tailed significance level of α = 0.05. Although none of the applied tests yielded statistically significant results, the methodological integrity of the analysis remains intact. The lack of significance is informative in itself: it indicates that the hypothesized predictors, though intuitively plausible, do not show detectable effects in this dataset, a result that must be read in light of the limited statistical power discussed below.
Several methodological constraints help explain these results:
  • The extreme class imbalance (only 0.024% of records involve accidents) limits the power of conventional significance testing.
  • Important behavioral and procedural factors (e.g., crew actions, navigational decisions) are absent from the dataset.
  • Measurement errors or inconsistencies (e.g., from automatic sensors) may obscure underlying patterns.
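The measurement-error concern, together with the extreme values noted under Descriptive Statistics (490 knots, 255 °C), suggests a simple screening step before any test is run. In the sketch below the temperature series is made-up toy data, and the plausibility ranges are hypothetical illustrations rather than the thresholds used in the study.

```python
from statistics import fmean, median, stdev
from typing import Optional, Sequence

# HYPOTHETICAL plausibility bounds; the study does not publish its own cut-offs.
PLAUSIBLE_RANGES = {
    "avg_speed_kn": (0.0, 30.0),
    "ambient_temp_c": (-60.0, 60.0),
}


def describe(values: Sequence[Optional[float]]) -> dict[str, float]:
    """Summary statistics over non-missing values, plus a missing-value count."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": fmean(present),
        "median": median(present),
        "sd": stdev(present),
        "min": min(present),
        "max": max(present),
    }


def flag_implausible(
    values: Sequence[Optional[float]], low: float, high: float
) -> list[int]:
    """Indices of non-missing values outside the closed range [low, high]."""
    return [i for i, v in enumerate(values) if v is not None and not low <= v <= high]


# Toy series (made-up values) containing one missing reading and one outlier.
temps = [22.0, 25.5, None, 255.0, -13.0, 18.0]
print(describe(temps))
print(flag_implausible(temps, *PLAUSIBLE_RANGES["ambient_temp_c"]))
```

In this toy series the single 255 °C reading pulls the mean to 61.5 while the median stays at 22.0, which is why medians and explicit outlier flags are useful companions to the mean when sensor errors are suspected.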

3.4.2. Methodological Validity and Future Directions

The methodology employed in this study is both appropriate and scientifically sound. It aligns with the research objective, respects the data structure, and applies the correct statistical tools for each type of variable and analysis objective.
Importantly, the insights gained from this analysis underscore the complexity of maritime accidents, which cannot be adequately explained by isolated variables. Future research should focus on:
  • Expanding the dataset with richer behavioral and technical information;
  • Exploring interaction effects and multivariate modeling;
  • Applying time-series and sequential modeling approaches;
  • Integrating machine learning classifiers, once more balanced or enriched datasets are available.
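One common way to prepare classifiers for the extreme imbalance described above is to weight classes inversely to their frequency. The sketch below assumes about 22 accident records out of 90,000 (0.024%, rounded) and uses the same formula as scikit-learn's `class_weight="balanced"` option, namely n_samples / (n_classes × class count). It illustrates a possible future step, not something the study applied.

```python
def balanced_class_weights(n_non_accident: int, n_accident: int) -> dict[int, float]:
    """Inverse-frequency weights w_c = N / (K * n_c) for K = 2 classes.

    Label 1 = accident and 0 = non-accident in this sketch (an ASSUMPTION for
    illustration; Section 4.1 describes its own 0/1 coding).
    """
    n = n_non_accident + n_accident
    return {0: n / (2 * n_non_accident), 1: n / (2 * n_accident)}


weights = balanced_class_weights(n_non_accident=89_978, n_accident=22)
print(weights)
```

The accident class receives a weight of about 2,045 and the non-accident class about 0.5, so each class contributes the same total weight (45,000 here) and a classifier can no longer reach near-perfect accuracy simply by predicting "no accident" for every record.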
By establishing a methodologically transparent and statistically grounded baseline, this study contributes to a more nuanced understanding of accident dynamics and sets the stage for future predictive frameworks. The conceptual model of the research is given in Chart 1.

4. Findings and Empirical Analysis

According to the methodology developed above, in addition to big data analysis, we used different statistical and econometric techniques to analyze maritime transport accidents (MTA). In the following, we present the findings from the tested methods in a comparative approach. In addition to the main conclusions drawn for the variables and methods tested, we also list the factors that lead to estimation weaknesses and need further investigation and improvement.

4.1. Preliminary Analysis and Descriptive Statistics

The table below (Table 1) provides summary statistics for several variables related to maritime transportation accidents (MTA), including accident, sizing, deadweight, average ambient temperature, wind force (BFT), swell force (DSS), and average speed in knots.
A first basic interpretation of the main statistics follows:
  • Accident: Binary coding (0 and 1) is used, representing the occurrence or non-occurrence of an accident, respectively. The median of 1, together with a mean that rounds to 1, indicates that the vast majority of observations do not involve an accident.
  • Sizing: This coded categorical variable has a mean value of 2.41 and a median close to the mean. The standard deviation of 1.22 indicates moderate spread across the size categories, with category 1 (i.e., Aframax) being the most common value.
  • Deadweight: This variable has a relatively high mean and median (between 105,102 and 112,949 tons), which places the centre of the distribution in this range. The large standard deviation indicates significant variability in deadweight, with some vessels being much heavier than others.
  • Average Ambient Temperature: The mean of this variable is 22.22 °C, but the values vary considerably, with extremes of −13 °C and 255 °C. While −13 °C is a plausible reading, a maximum of 255 °C is physically impossible for ambient air and most likely reflects a recording or sensor error. The large standard deviation and range therefore suggest that some data points are erroneous or reflect extreme environmental conditions. The variable also has a significant amount of missing data.
  • Wind Force (BFT) and Swell Force (DSS): As above, there is a significant amount of missing data for these two variables. The average wind force (measured on the Beaufort scale, BFT) is about 4.50, with a standard deviation of about 1.36, indicating that the values cluster relatively closely around the mean. The average swell force (measured on the Douglas sea scale, DSS) is about 3.19, with a higher standard deviation of about 1.73, indicating a wider spread. The maximum value of 9 reflects rare extreme sea conditions. Because the two scales differ in range (0–12 for Beaufort, 0–9 for Douglas), the lower mean for swell force does not by itself imply weaker conditions. However, the higher standard deviation does indicate greater variability in the swell measurements.
  • Average Speed, in Knots: The large standard deviation and range indicate that, although the typical average speed is around 3.5 knots, there are outliers with much higher speeds. A maximum of 490 knots is physically impossible for a merchant vessel and must stem from erroneous records retained in the sample.
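Implausible maxima such as 255 °C and 490 knots can be screened out before any inferential test is applied. The following is a minimal sketch of a rule-based plausibility filter. The bounds (−40 °C to 60 °C for ambient temperature, 0 to 30 knots for average speed) and the `Record` structure are hypothetical choices made for illustration; they are not the cleaning rules used in this study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """HYPOTHETICAL minimal record; None marks a missing reading."""
    ambient_temp_c: Optional[float]
    avg_speed_kn: Optional[float]


# HYPOTHETICAL plausibility bounds (inclusive), for illustration only
BOUNDS = {
    "ambient_temp_c": (-40.0, 60.0),
    "avg_speed_kn": (0.0, 30.0),
}


def implausible_fields(rec: Record) -> List[str]:
    """Names of fields whose non-missing value falls outside BOUNDS."""
    flagged = []
    for name, (low, high) in BOUNDS.items():
        value = getattr(rec, name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged


# Example values echo the extremes reported in Table 1 (255 deg C, -13 deg C, 490 knots)
records = [Record(22.2, 11.8), Record(255.0, 3.5), Record(-13.0, 490.0), Record(None, 0.0)]
for rec in records:
    print(rec, "->", implausible_fields(rec) or "ok")
```

Flagged values can be set to missing rather than deleting the whole record. This keeps the remaining variables from the same observation usable.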
In summary, Table 1 supports some first basic conclusions. The variables sizing, deadweight, and wind force (BFT) show moderate variation, and their values usually fall within the expected ranges for maritime transport. By contrast, average ambient temperature and average speed show extreme variability, with implausible maxima that point to outliers or data inconsistencies. Finally, the variable “accident” shows the expected pattern for rare-event binary data, with almost all values indicating no accident. Despite these data limitations, the descriptive statistics point to certain tendencies that are examined further in the following sections.

Frequency Analysis

This subsection presents the frequency analysis for the individual variables examined in our sample.
Table 2 and Figure 1 below present the results of the frequency analysis for the dependent variable “accident”.
In this dataset, the probability of an accident is extremely low: accidents constitute only 0.024% of all observations. This very low percentage suggests that the shipping operations recorded in the dataset are generally safe. Consequently, any safety analysis or further investigation could focus on these few accident cases to identify patterns or causes and further improve safety measures.
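Because so few accidents are observed, the rate itself carries noticeable sampling uncertainty. The sketch below computes a Wilson score confidence interval for the accident proportion. It assumes 22 accident records among 90,215 observations. This count is inferred from the reported 0.024% and from the cross-tabulation percentages; it is not reported directly.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# ASSUMPTION: 22 accident records among 90,215 observations
events, n = 22, 90_215
low, high = wilson_interval(events, n)
print(f"rate = {events / n:.4%}, 95% CI = [{low:.4%}, {high:.4%}]")
```

Under that assumption, the point estimate is about 0.024%, and the 95% interval runs from roughly 0.016% to 0.037%.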
The frequency analysis for the variable “sizing” follows.
As shown in Table 3 and Figure 2, the dataset is diverse in vessel sizes, with Aframax being the most prevalent (35%), followed by Suezmax (27.4%), Panamax (21.6%), and Handymax (16%). There is a relatively even distribution among the different vessel sizes, but Aframax vessels form a clear plurality.
The frequency analysis for the variable “deadweight” follows.
As shown in Table 4 and Figure 3, the largest share of vessels falls into the “up to 75,000” deadweight group (37.7%), indicating a high prevalence of smaller vessels in the fleet. The 75,001–150,000 deadweight category is also prominent (35.0%), with slightly fewer vessels than the smallest category. Although vessels above 150,000 deadweight are the least common, they still represent a substantial 27.4% of the fleet, highlighting the importance of larger vessels. For the deadweight in groups, the dataset gives the results as presented in Table 5 and Figure 4.
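The three deadweight groups can be reproduced with a simple binning rule. The sketch below uses the group boundaries stated in the text and computes the percentage share of each group. The function names and the sample deadweights are hypothetical.

```python
from collections import Counter


def deadweight_group(dwt_tons: float) -> str:
    """Map a deadweight value to the three groups used in the text."""
    if dwt_tons <= 75_000:
        return "up to 75,000"
    if dwt_tons <= 150_000:
        return "75,001–150,000"
    return "above 150,000"


def group_shares(dwt_values):
    """Percentage of vessels falling into each deadweight group."""
    counts = Counter(deadweight_group(v) for v in dwt_values)
    n = len(dwt_values)
    return {group: 100 * c / n for group, c in counts.items()}


# HYPOTHETICAL deadweights, for illustration only
sample = [47_000, 73_500, 105_000, 112_900, 158_000, 160_500, 74_999, 150_000]
shares = group_shares(sample)
print(shares, "total =", sum(shares.values()))
```

Because the shares of a complete grouping must sum to 100%, this kind of check also catches transcription slips in frequency tables.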
The frequency analysis for the variable “average ambient temperature” follows.
Table 6 and Figure 5 illustrate the distribution of ambient temperature data across three temperature groups: up to 20 °C, 20.1 °C to 28 °C, and above 28.01 °C. Of the 90,215 observations, temperatures up to 20 °C account for 11.5%, temperatures from 20.1 °C to 28 °C for 12.3%, and temperatures above 28.01 °C for 7.6%. Together, these groups cover 31.4% of all observations; the remaining records have no temperature value.
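The three temperature percentages sum to only 31.4%, so they are shares of all 90,215 records rather than of the valid readings. The sketch below converts them into valid percentages using the 28,359 valid temperature observations reported in Table A1. Because the inputs are already rounded, the results are approximate.

```python
# Figures from Table 6 and Table A1
total_n = 90_215          # all records
valid_n = 28_359          # records with an ambient temperature value
percent_of_total = {"up to 20 °C": 11.5, "20.1–28 °C": 12.3, "above 28.01 °C": 7.6}

valid_share = 100 * valid_n / total_n
print(f"valid: {valid_share:.1f}% of all records, missing: {100 - valid_share:.1f}%")
print(f"sum of group percentages: {sum(percent_of_total.values()):.1f}%")

# Re-express each group as a share of the valid readings (approximate: inputs are rounded)
percent_of_valid = {g: 100 * pct / valid_share for g, pct in percent_of_total.items()}
for group, pct in percent_of_valid.items():
    print(f"{group:15s} {percent_of_total[group]:4.1f}% of total -> {pct:4.1f}% of valid")
```

On this basis, roughly 37%, 39%, and 24% of the valid readings fall into the three groups, and about 68.6% of all records lack a temperature value.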
The following is the frequency analysis for the variable “wind force”.
Table 7 and Figure 6 present the distribution of wind force data categorized according to the Beaufort wind force scale. Across the 90,215 observations, the valid records are spread over several wind force categories. Notably, gentle to strong breezes (force 3 to force 6) dominate the observations, collectively representing over 92% of the valid data.
The frequency analysis for the variable “swell force” follows.
Table 8 and Figure 7 present an analysis of sea state data categorized by the Douglas sea scale (DSS). Among the 90,215 observations, the distribution of sea states varies across categories. Notably, sea states from “slight” to “rough” dominate the observations, collectively representing approximately 89% of the valid data.
The frequency analysis for the variable “average speed” follows.
Table 9 and Figure 8 show the distribution of average speeds, measured in knots, across four groups. Among the 90,215 observations, frequencies vary considerably across the speed ranges. Notably, most observations fall into the two extreme categories, “zero speed” and “more than 12.21 knots”, which together represent approximately 63.2% of the valid data.
The frequency analysis for the variable “cargo distribution” follows.
Table 10 and Figure 9 present an analysis of cargo distribution (loading condition), categorized into two groups: “laden” and “ballast”. Of the 90,215 observations, laden voyages account for 59.6% and ballast voyages for the remaining 40.4%. This breakdown highlights the prevalence of laden voyages in the dataset, which is relevant for understanding vessel operations and trade dynamics.
Finally, the results of the frequency analysis for the variable “vessel state” are presented.
Table 11 and Figure 10 present an analysis of vessel states, categorized into five main groups: “at port”, “dry dock”, “maneuvering”, “operation”, and “sea passage”. Among a total of 90,215 observations, the analysis reveals varying frequencies across different vessel states. Notably, most observations fall within “at port” and “sea passage” categories, collectively representing 70.4% of the valid data.

4.2. Empirical Results on Statistical Significance Tests: Cross-Tabulation, Chi-Square Tests

All statistical analyses that were conducted failed to find any statistically significant association between accident occurrence and other variables (e.g., sizing, deadweight, average ambient temperature, wind force, swell force, average speed, in knots, etc.). The lack of strong evidence suggesting that the examined variables significantly influence the likelihood of accidents could be due to several factors, including:
  • Sample size: The number of recorded accidents is very small. Although the dataset as a whole is large, so few events make it difficult to detect significant relationships or patterns.
  • Statistical tests: The chi-square tests yield p-values above 0.05, so the results are not statistically significant. Any observed differences in accident counts across the examined variables are therefore consistent with random variation rather than a true association.
In summary, while there may be observed differences in accident counts across the examined variables (e.g., sizing, deadweight, average ambient temperature, wind force, swell force, average speed in knots, etc.), the statistical tests indicate that these differences are not statistically significant. Therefore, it is challenging to conclude that some factors significantly influence the likelihood of accidents based on the available data. Further research with a larger sample of accidents and consideration of additional factors would be needed to better understand the relationship between external factors and accidents. Three indicative analyses follow.
The crosstab analysis in Table 12 presents the relationship between accident occurrence and vessel sizing categories (Aframax, Handymax, Panamax, and Suezmax). Within the “accident” category, the highest count of accidents is observed in the Suezmax category, comprising 40.9% of all accidents. It is followed by Aframax (31.8%), Handymax (13.6%), and Panamax (13.6%). By comparison, the four categories account for 35.0% (Aframax), 16.0% (Handymax), 21.6% (Panamax), and 27.4% (Suezmax) of all observations. Relative to their share of the sample, accidents are therefore somewhat over-represented among Suezmax vessels and under-represented among Panamax vessels.
The chi-square tests in Table 13 assess the significance of the relationship between accident occurrence and vessel sizing categories, indicating no statistically significant association between the variables (p > 0.05 for all tests).
In summary, while there are variations in the counts of accidents across different vessel sizing categories, these differences are not statistically significant. Therefore, there is no strong evidence to suggest that vessel sizing significantly influences the likelihood of accidents.
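The size-category test can be approximated in a few lines. Accidents are a tiny fraction of every category, so Pearson's chi-square statistic for the 2 × 4 table is driven almost entirely by the accident row. A goodness-of-fit test of the accident counts against the category shares therefore gives practically the same statistic, with the same 3 degrees of freedom. The sketch assumes accident counts of 7, 3, 3, and 9, back-calculated from the percentages in Table 12 and an assumed total of 22 accidents. It is an illustration only, not a reproduction of the software output behind Table 13.

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer dof."""
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(dof-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    for i in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_goodness_of_fit(observed, shares):
    """Pearson chi-square of observed counts against expected counts from category shares."""
    n = sum(observed)
    total_share = sum(shares)
    expected = [n * s / total_share for s in shares]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1
    return stat, dof, chi2_sf(stat, dof)


# ASSUMPTION: accident counts back-calculated from Table 12 (Aframax, Handymax, Panamax, Suezmax)
observed = [7, 3, 3, 9]
shares = [0.350, 0.160, 0.216, 0.274]   # size shares from Table 3
stat, dof, p_value = chi2_goodness_of_fit(observed, shares)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.3f}")
```

With these inputs, the statistic is about 2.25 with a p-value of about 0.52, in line with the non-significant result reported above. Given the small expected counts, the asymptotic p-value is itself only approximate.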

4.3. Crosstabulation Accident * Deadweight (Groups)

The crosstabulation analysis (Table 14) illustrates the relationship between accident occurrence and vessel deadweight groups categorized as up to 75,000, 75,001–150,000, and above 150,000. Within the “accident” category, the highest number of accidents is observed in vessels with deadweight above 150,000, comprising 40.9% of all accidents. This is followed by vessels with deadweight between 75,001 and 150,000 (31.8%), and vessels with deadweight up to 75,000 (27.3%).
By comparison, the three deadweight groups account for approximately 37.7%, 35.0%, and 27.4% of all observations, respectively. Relative to their share of the sample, vessels above 150,000 deadweight are therefore over-represented among accidents, while vessels up to 75,000 deadweight are under-represented.
The chi-square tests (Table 15) assess the significance of the relationship between accident occurrence and vessel deadweight groups, indicating no statistically significant association between the variables (p > 0.05 for all tests).
In summary, while there are differences in accident counts across different vessel deadweight groups, these differences are not statistically significant. Therefore, there is no strong evidence to suggest that vessel deadweight significantly influences the likelihood of accidents.
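When expected counts are this small, an exact test avoids relying on the chi-square approximation. The sketch below enumerates every possible split of the accidents across the three deadweight groups. It then sums the multinomial probabilities of all splits that are at least as extreme as the observed one, as measured by Pearson's statistic. The sketch assumes 6, 7, and 9 accidents in the three groups, back-calculated from the 27.3%, 31.8%, and 40.9% shares of an assumed 22 accidents. It uses group shares of 37.7%, 35.0%, and 27.4%, normalized to sum to one. The function names are illustrative.

```python
import math


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts`, computed on the log scale for stability."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)
    for c, p in zip(counts, probs):
        log_p += c * math.log(p) - math.lgamma(c + 1)
    return math.exp(log_p)


def pearson_stat(counts, expected):
    """Pearson chi-square statistic of counts against expected values."""
    return sum((c - e) ** 2 / e for c, e in zip(counts, expected))


def exact_multinomial_test_3(observed, shares):
    """Exact test for three categories: total probability of all splits whose
    Pearson statistic is at least the observed one."""
    if len(observed) != 3 or len(shares) != 3:
        raise ValueError("this sketch handles exactly three categories")
    n = sum(observed)
    probs = [s / sum(shares) for s in shares]
    expected = [n * p for p in probs]
    observed_stat = pearson_stat(observed, expected)
    p_value = 0.0
    for a in range(n + 1):
        for b in range(n - a + 1):
            counts = (a, b, n - a - b)
            if pearson_stat(counts, expected) >= observed_stat - 1e-9:
                p_value += multinomial_pmf(counts, probs)
    return observed_stat, p_value


# ASSUMPTION: 6, 7, 9 accidents (up to 75,000 / 75,001-150,000 / above 150,000)
stat, p_value = exact_multinomial_test_3([6, 7, 9], [0.377, 0.350, 0.274])
print(f"Pearson statistic = {stat:.3f}, exact p = {p_value:.3f}")
```

With these inputs, the exact p-value is well above 0.05, consistent with the result in Table 15.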

4.4. Crosstabulation Accident Occurrence and Average Temperature (In Groups)

The crosstab (Table 16) illustrates the relationship between accident occurrence and average ambient temperature groups categorized as up to 20 °C, 20.1–28 °C, and above 28.01 °C. Within the “accident” category, the highest count of accidents is observed when the average ambient temperature is up to 20 °C, comprising 46.2% of all accidents. This is followed by temperatures ranging from 20.1 to 28 °C (38.5%) and temperatures above 28.01 °C (15.4%).
Relative to each group's share of the observations with valid temperature data, accidents are somewhat over-represented in the coolest group and under-represented in the warmest. However, the chi-square tests (Table 17), which assess the significance of the relationship between accident occurrence and average ambient temperature groups, indicate no statistically significant association between the variables (p > 0.05 for all tests).
In summary, while there are variations in accident counts across different average ambient temperature groups, these differences are not statistically significant. Therefore, there is no strong evidence to suggest that average ambient temperature significantly influences the likelihood of accidents.

4.5. Empirical Results on Statistical Significance Tests: Analysis of Variance (ANOVA)

ANOVA (analysis of variance) tests are standard statistical tools used to assess whether the means of a variable differ significantly between two or more groups. In this analysis, we focus on ten variables: sizing, deadweight, deadweight (groups), average ambient temperature, average ambient temperature (groups), wind force (BFT), swell force (DSS), sea state (DSS), average speed in knots, and average speed in knots (groups). The aim is to determine whether the mean scores of these variables differ significantly between observations with and without an accident. For instance, in the context of sizing, the ANOVA test assesses whether there are significant differences in vessel sizes between groups with and without accidents.
The analysis found no statistically significant relationships between the occurrence of accidents and the ten variables examined. This lack of significance suggests that, based on the available data, there is no strong evidence to support the idea that accidents are influenced by variations in these operational factors. While accidents undoubtedly have multiple contributing factors, including human error, mechanical failure, and environmental conditions, this analysis did not identify any clear associations between accidents and the variables examined. Therefore, the takeaway from this analysis is that accidents in maritime settings may be influenced by a complex interplay of factors beyond the scope of the variables investigated here.
Further research and analysis may be necessary to uncover additional factors or to explore different methodologies to better understand the dynamics of accidents in maritime environments (see Table 18 descriptives and Table 19 ANOVA analysis).
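Because accident occurrence has only two levels, each one-way ANOVA here compares two group means. In that case, the F statistic equals the square of the pooled two-sample t statistic. The sketch below shows this on hypothetical toy values; the function names are illustrative.

```python
import math
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic and degrees of freedom of a one-way ANOVA over k independent groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = len(groups) - 1, len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


# HYPOTHETICAL toy values of one variable in the two groups
no_accident = [1, 2, 3]
accident = [4, 5, 6]
f_stat, df1, df2 = one_way_anova_f(no_accident, accident)
t_stat = pooled_t(no_accident, accident)
print(f"F({df1}, {df2}) = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
```

Under the equal-variance assumption, the ANOVA results in Table 19 therefore carry the same information as two-sample t tests on each variable.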

4.6. Empirical Results on Regression—Correlation: Pearson and Spearman Correlation Coefficients

Table A1 and Table A2 in Appendix A present Pearson and Spearman correlation coefficients and their corresponding significance levels (two-tailed) for various variables in relation to the occurrence of accidents. Here are the main conclusions:
  • Weak Correlation with Accident Occurrence: Across the board, the correlation coefficients between accident occurrence and the examined variables (sizing, deadweight, deadweight (groups), average ambient temperature, average ambient temperature (groups), wind force (BFT), swell force (DSS), sea state (DSS), average speed in knots, average speed in knots (groups), cargo, and vessel state) are all very close to zero, ranging from −0.005 to 0.005 in both tables. These coefficients indicate a negligible linear (Pearson) or monotonic (Spearman) relationship between accident occurrence and the variables.
  • Insignificant Correlations: The significance levels (Sig.) associated with these coefficients all exceed 0.05; the smallest is 0.161 for Pearson and 0.126 for Spearman. The correlations are therefore not statistically significant, and the observed coefficients are consistent with chance variation rather than with true associations.
  • Variable Independence: The lack of statistically significant correlations suggests that accident occurrence may be relatively independent of the variables examined in this analysis. This implies that factors other than those measured in this dataset may have a more substantial influence on accident occurrence in maritime settings.
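Part of the reason why the coefficients in Tables A1 and A2 are close to zero is arithmetic rather than substantive: with a very rare binary outcome, even a perfect threshold relationship yields a small Pearson coefficient. The sketch below computes the largest correlation that any binary outcome with a given prevalence can have with a normally distributed predictor. This maximum is attained when the outcome equals 1 exactly above a cut-off. The normality of the predictor is an assumption made for illustration.

```python
import math
from statistics import NormalDist


def max_point_biserial(prevalence: float) -> float:
    """Largest Pearson correlation between a standard-normal predictor X and any binary
    outcome D with P(D = 1) = prevalence; it is attained by D = 1 exactly when X > c."""
    std_normal = NormalDist()                   # mean 0, standard deviation 1
    c = std_normal.inv_cdf(1 - prevalence)      # cut-off leaving `prevalence` in the upper tail
    # Cov(X, D) = E[X; X > c] = pdf(c), Var(D) = p(1 - p), Var(X) = 1
    return std_normal.pdf(c) / math.sqrt(prevalence * (1 - prevalence))


for prevalence in (0.5, 0.1, 0.01, 22 / 90_215):   # last value: ASSUMED 22 accidents
    print(f"prevalence {prevalence:.5f}: max |r| = {max_point_biserial(prevalence):.3f}")
```

At the prevalence implied by an assumed 22 accidents in 90,215 records, the ceiling is only about 0.06, compared with about 0.80 for a balanced outcome. In this setting, the magnitude of a coefficient alone therefore says little about the strength of an underlying relationship.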

5. Discussion of the Results

The current research investigates the use of big data analysis in the study of factors that appear to influence maritime accidents, examining ship characteristics, environmental conditions, and operational factors. The study period of the sample runs for nine years (2014–2022) and includes over 90,000 observations. To analyze our data and draw conclusions, in addition to big data analysis, tools from statistical science, in particular, descriptive statistics, statistical significance tests, analysis of variance, and regression-correlation, are applied.
Starting the discussion around the main results of our study, it should be mentioned that our findings confirm the first hypothesis that we put forward, i.e., that the analysis of big data contributes to the understanding of the factors that lead to safe berthing and correspondingly to the risk of maritime transportation accidents (MTA), which is the main aim of this paper. This initial finding is in line with relevant research studies, including recent studies of Sadaharu (2015) [16], Jović and Edvard (2019) [17], Liu et al. (2023) [18], Zhang et al. (2017) [28] and Ma et al. (2024) [19].
It is obvious that without big data analysis, it would have been impossible to examine this huge volume of data involving 90,215 trip cases (including the maritime accidents that occurred) and observations on a number of related variables that were examined: sizing, deadweight, average ambient temperature, wind force, swell force, and average speed. Table 1 summarizes these data.
Analyzing the data, it is evident that the number of cases involving a maritime accident is very small relative to the total. This could be seen as weakening our findings. However, because different methods were applied and their results converged, the overall picture does not depend on a single technique. For this reason, it is particularly important to test our second hypothesis, which suggested that there are specific variables per category of factor we examined (endogenous, exogenous) that influence the probability of maritime accidents.
Specifically, to test the second hypothesis, we first applied cross-tabulation analysis and chi-square tests to examine the relationship between accident occurrence and vessel size, vessel deadweight, and average ambient temperature, respectively. An analysis of variance was then performed to determine whether the mean scores of selected variables (sizing, deadweight, deadweight (groups), average ambient temperature, average ambient temperature (groups), wind force (BFT), swell force (DSS), sea state (DSS), average speed in knots, and average speed in knots (groups)) differ between observations with and without an accident. This was followed by a correlation analysis with Pearson and Spearman coefficients. The cross-tabulations showed differences in the distribution of accidents across vessel size categories, deadweight groups, and average ambient temperature groups, although none of these differences was statistically significant (Tables 13, 15 and 17).
As for environmental conditions, the correlations of accident occurrence with wind force, swell force, and sea state are negligible (Pearson r between 0.003 and 0.005, p > 0.35; Table A1). This dataset therefore does not reproduce the association between stronger winds or sea disturbances and accident probability reported in the existing literature (see inter alia, Pilatis et al., 2024 [12]; Bye and Aalberg, 2018 [13]; Ventikos, Stavrou and Andritsopoulos, 2017 [14]). In addition, although deadweight and vessel size show only a weak correlation with accidents, these factors cannot be completely ignored, as the results are also affected by the small number of accident observations. Likewise, the correlation between average speed (in knots) and accident occurrence is close to zero (Pearson r = −0.002, p = 0.522), so the risks associated with high-speed navigation, particularly in adverse weather conditions, cannot be confirmed from these data. The analysis of variance likewise suggested that maritime accidents may be influenced by a complex interaction of factors beyond those examined here.
This finding and the fact that the big data analysis, in our case, did not identify clear correlations between accidents and the variables examined lead us to the conclusion that other factors that may influence maritime accidents need to be investigated and examined by other methods or even in combination.
Furthermore, the correlation analysis did not reveal a meaningful correlation between ship size, namely deadweight, and accident occurrence (Spearman’s rho = −0.005, p = 0.126; Table A2). A number of studies, such as those of Akten (2006) [3] and Talley, Jin, and Kite-Powell (2008) [4], have examined accident severity in relation to ship size and found that larger ships often experience more serious accidents. Because the accident variable in this dataset is binary (occurrence only), this relationship could not be tested directly. The Spearman coefficient of 0.104 in Table A2 relates deadweight to vessel state rather than to accidents, and the correlation of deadweight groups with accident occurrence is likewise negligible (Spearman’s rho = −0.005, p = 0.167). Similarly, the findings of Heij and Knapp (2015) [9] and Brandt et al. (2024) [10] on the influence of environmental conditions, in particular wind force and sea state, on accident risk are not reproduced here. The correlation between wind force and accident occurrence is close to zero (Spearman’s rho = 0.004, p = 0.526), whereas the coefficient of −0.226 describes the relationship between wind force and average speed.
Moreover, average ambient temperature also showed a negligible association with accidents (Spearman’s rho = 0.005, p = 0.400), which contrasts with studies suggesting that extreme temperatures can exacerbate accidents. However, in our case, the minimal correlation may reflect the geographical distribution of the data, the large share of missing temperature values, or the limited influence of temperature on the selected dataset. Vessel state likewise showed no meaningful correlation with accident occurrence (Spearman’s rho = −0.004, p = 0.213). Although this variable is less frequently studied in the existing literature, other researchers [26,28] have emphasized that poorly maintained or malfunctioning vessels are more likely to experience accidents. Therefore, the maintenance and technical readiness of vessels should remain a priority to minimize accidents.
From this analysis, we conclude that big data analysis made it possible to examine complex patterns in a volume of marine accident data that would be difficult to handle with traditional tools alone. The Spearman and Pearson correlations, however, indicated that none of the examined environmental or operational variables is, on its own, a strong predictor of accident occurrence.

6. Conclusions

This study utilized big data analysis techniques in combination with statistical methods to examine the factors that influence maritime accidents, contributing to a deeper understanding of the relationship between ship characteristics, operational variables, and environmental conditions. The main objectives of our research include evaluating whether big data analysis can be used to predict maritime transportation accidents and investigating the main factors that cause them through the identification of specific variables. For our analysis, we gathered and studied more than ninety thousand records over a nine-year period. The findings of this study contribute to the literature by providing important information on the factors that appear to cause marine casualties, thus providing opportunities to develop more effective risk management strategies.
The main findings of our study indicate that ship characteristics, in particular deadweight (DWT), environmental conditions such as wind force (BFT) and sea state (DSS), and operational factors such as average speed are the candidate factors most relevant to maritime accident risk. In this dataset, however, none of them showed a statistically significant individual association with accident occurrence. The results of this analysis highlight the importance of a multi-factor approach to maritime safety, where both vessel characteristics and environmental conditions play an important role in determining accident risk.
Regarding the processing and analysis of the data, we found that big data analytics is a useful analytic process that, beyond analysis, can support better accident prediction and mitigation by integrating various data sources such as weather forecasting systems, vessel monitoring systems, and sensor data. Although this study demonstrated the value of big data analysis for accident risk assessment, it also faced some important limitations, such as the small number of accident records and geographical constraints. Furthermore, the large share of missing values for certain variables, such as average ambient temperature (available for about 31% of records), reduced the statistical power of the tests.
For these reasons, we believe that future research should incorporate other factors in combination with other tests to create a more comprehensive accident prediction model. Future studies could also explore the application of deep learning algorithms and more complex artificial intelligence models to enhance the predictive power of the analysis. Furthermore, incorporating real-time data streams into accident prediction models can also provide valuable insights and guidance.

Author Contributions

Conceptualization, V.Z.; methodology, V.Z., G.C., D.K.; software, V.Z., G.C., D.K.; validation, V.Z.; formal analysis, V.Z., G.C., D.K.; investigation, V.Z., G.C., D.K.; resources, V.Z., G.C., D.K.; data curation, V.Z., G.C., D.K.; writing—original draft preparation, V.Z.; writing—review and editing, V.Z., G.C., D.K.; visualization, V.Z., G.C., D.K.; supervision, G.C.; project administration, V.Z., G.C., D.K.; funding acquisition, D.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partly supported by the University of Piraeus Research Center (ELKE), Research Committee of University of Piraeus.

Data Availability Statement

The detailed dataset supporting the results presented in this study is available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Correlation analysis: Pearson test.
Correlations
| Variable | Statistic | Accident | Sizing | Deadweight | Deadweight (Groups) | Average Ambient Temperature | Average Ambient Temperature (Groups) | Wind Force (BFT) | Swell Force (DSS) | Sea State (DSS) | Average Speed, in Knots | Average Speed, in Knots (Groups) | Cargo | Vessel State |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accident | Pearson Correlation | 1 | −0.003 | −0.004 | −0.005 | 0.005 | 0.005 | 0.005 | 0.003 | 0.003 | −0.002 | −0.003 | −0.003 | −0.004 |
| | Sig. (2-tailed) | | 0.393 | 0.205 | 0.161 | 0.375 | 0.390 | 0.357 | 0.573 | 0.587 | 0.522 | 0.456 | 0.356 | 0.276 |
| | N | 90,215 | 90,215 | 90,215 | 90,215 | 28,359 | 28,359 | 28,177 | 28,156 | 28,096 | 70,598 | 70,598 | 90,215 | 83,178 |
| Sizing | Pearson Correlation | −0.003 | 1 | 0.417 ** | 0.383 ** | 0.191 ** | 0.153 ** | −0.073 ** | 0.004 | −0.128 ** | −0.003 | 0.012 ** | −0.023 ** | 0.015 ** |
| | Sig. (2-tailed) | 0.393 | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.461 | 0.000 | 0.415 | 0.001 | 0.000 | 0.000 |
| | N | 90,215 | 90,215 | 90,215 | 90,215 | 28,359 | 28,359 | 28,177 | 28,156 | 28,096 | 70,598 | 70,598 | 90,215 | 83,178 |
| Deadweight | Pearson Correlation | −0.004 | 0.417 ** | 1 | 0.968 ** | 0.034 ** | 0.024 ** | 0.096 ** | 0.018 ** | 0.057 ** | 0.062 ** | 0.078 ** | 0.024 ** | 0.093 ** |
| | Sig. (2-tailed) | 0.205 | 0.000 | | 0.000 | 0.000 | 0.000 | 0.000 | 0.003 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | N | 90,215 | 90,215 | 90,215 | 90,215 | 28,359 | 28,359 | 28,177 | 28,156 | 28,096 | 70,598 | 70,598 | 90,215 | 83,178 |
| Deadweight (Groups) | Pearson Correlation | −0.005 | 0.383 ** | 0.968 ** | 1 | 0.013 * | 0.001 | 0.088 ** | 0.000 | 0.050 ** | 0.053 ** | 0.070 ** | 0.027 ** | 0.084 ** |
| | Sig. (2-tailed) | 0.161 | 0.000 | 0.000 | | 0.023 | 0.816 | 0.000 | 0.950 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | N | 90,215 | 90,215 | 90,215 | 90,215 | 28,359 | 28,359 | 28,177 | 28,156 | 28,096 | 70,598 | 70,598 | 90,215 | 83,178 |
| Average Ambient Temperature | Pearson Correlation | 0.005 | 0.191 ** | 0.034 ** | 0.013 * | 1 | 0.840 ** | −0.227 ** | −0.115 ** | −0.193 ** | 0.088 ** | 0.110 ** | −0.010 | c |
| | Sig. (2-tailed) | 0.375 | 0.000 | 0.000 | 0.023 | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.105 | 0.000 |
| | N | 28,359 | 28,359 | 28,359 | 28,359 | 28,359 | 28,359 | 28,177 | 28,156 | 28,096 | 27,824 | 27,824 | 28,359 | 28,359 |
| Average Ambient Temperature (Groups) | Pearson Correlation | 0.005 | 0.153 ** | 0.024 ** | 0.001 | 0.840 ** | 1 | −0.229 ** | −0.118 ** | −0.196 ** | 0.071 ** | 0.090 ** | −0.003 | c |
| | Sig. (2-tailed) | 0.390 | 0.000 | 0.000 | 0.816 | 0.000 | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.563 | 0.000 |
| | N | 28,359 | 28,359 | 28,359 | 28,359 | 28,359 | 28,359 | 28,177 | 28,156 | 28,096 | 27,824 | 27,824 | 28,359 | 28,359 |
| Wind Force (BFT) | Pearson Correlation | 0.005 | −0.073 ** | 0.096 ** | 0.088 ** | −0.227 ** | −0.229 ** | 1 | 0.626 ** | 0.867 ** | −0.105 ** | −0.199 ** | −0.117 ** | c |
| | Sig. (2-tailed) | 0.357 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | N | 28,177 | 28,177 | 28,177 | 28,177 | 28,177 | 28,177 | 28,177 | 28,142 | 28,081 | 27,644 | 27,644 | 28,177 | 28,177 |
| Swell Force (DSS) | Pearson Correlation | 0.003 | 0.004 | 0.018 ** | 0.000 | −0.115 ** | −0.118 ** | 0.626 ** | 1 | 0.660 ** | −0.094 ** | −0.170 ** | −0.076 ** | c |
| | Sig. (2-tailed) | 0.573 | 0.461 | 0.003 | 0.950 | 0.000 | 0.000 | 0.000 | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | N | 28,156 | 28,156 | 28,156 | 28,156 | 28,156 | 28,156 | 28,142 | 28,156 | 28,075 | 27,626 | 27,626 | 28,156 | 28,156 |
| Sea State (DSS) | Pearson Correlation | 0.003 | −0.128 ** | 0.057 ** | 0.050 ** | −0.193 ** | −0.196 ** | 0.867 ** | 0.660 ** | 1 | −0.103 ** | −0.199 ** | −0.108 ** | c |
| | Sig. (2-tailed) | 0.587 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | 0.000 | 0.000 | 0.000 | 0.000 |
| | N | 28,096 | 28,096 | 28,096 | 28,096 | 28,096 | 28,096 | 28,081 | 28,075 | 28,096 | 27,567 | 27,567 | 28,096 | 28,096 |
| Average Speed, in Knots | Pearson Correlation | −0.002 | −0.003 | 0.062 ** | 0.053 ** | 0.088 ** | 0.071 ** | −0.105 ** | −0.094 ** | −0.103 ** | 1 | 0.857 ** | −0.044 ** | 0.700 ** |
| | Sig. (2-tailed) | 0.522 | 0.415 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | 0.000 | 0.000 | 0.000 |
| | N | 70,598 | 70,598 | 70,598 | 70,598 | 27,824 | 27,824 | 27,644 | 27,626 | 27,567 | 70,598 | 70,598 | 70,598 | 64,191 |
| Average Speed, in Knots (Groups) | Pearson Correlation | −0.003 | 0.012 ** | 0.078 ** | 0.070 ** | 0.110 ** | 0.090 ** | −0.199 ** | −0.170 ** | −0.199 ** | 0.857 ** | 1 | −0.048 ** | 0.791 ** |
| | Sig. (2-tailed) | 0.456 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | 0.000 | 0.000 |
| | N | 70,598 | 70,598 | 70,598 | 70,598 | 27,824 | 27,824 | 27,644 | 27,626 | 27,567 | 70,598 | 70,598 | 70,598 | 64,191 |
| Cargo | Pearson Correlation | −0.003 | −0.023 ** | 0.024 ** | 0.027 ** | −0.010 | −0.003 | −0.117 ** | −0.076 ** | −0.108 ** | −0.044 ** | −0.048 ** | 1 | −0.043 ** |
| | Sig. (2-tailed) | 0.356 | 0.000 | 0.000 | 0.000 | 0.105 | 0.563 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | | 0.000 |
| | N | 90,215 | 90,215 | 90,215 | 90,215 | 28,359 | 28,359 | 28,177 | 28,156 | 28,096 | 70,598 | 70,598 | 90,215 | 83,178 |
| Vessel State | Pearson Correlation | −0.004 | 0.015 ** | 0.093 ** | 0.084 ** | c | c | c | c | c | 0.700 ** | 0.791 ** | −0.043 ** | 1 |
| | Sig. (2-tailed) | 0.276 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
| | N | 83,178 | 83,178 | 83,178 | 83,178 | 28,359 | 28,359 | 28,177 | 28,156 | 28,096 | 64,191 | 64,191 | 83,178 | 83,178 |
Source: own study. ** Correlation is significant at the 0.01 level (two-tailed). * Correlation is significant at the 0.05 level (two-tailed). c Cannot be computed because at least one of the variables is constant.
Table A2. Correlation analysis: Spearman test.
Correlations
AccidentSizingDeadweightDeadweight (Groups)Average Ambient TemperatureAverage Ambient Temperature (Groups)Wind Force (BFT)Swell Force (DSS)Sea State (DSS)Average Speed, in KnotsAverage Speed, in Knots (Groups)CargoVessel State
Spearman’s rhoAccidentCorrelation Coefficient1.000−0.003−0.005−0.0050.0050.0050.0040.0040.004−0.002−0.003−0.003−0.004
Sig. (2-tailed).0.3820.1260.1670.4000.3930.5260.5220.5230.6130.4550.3560.213
N90,21590,21590,21590,21528,35928,35928,17728,15628,09670,59870,59890,21583,178
SizingCorrelation Coefficient−0.0031.0000.363 **0.343 **0.163 **0.154 **−0.065 **0.002−0.115 **0.021 **0.017 **−0.022 **0.014 **
Sig. (2-tailed)0.382.0.0000.0000.0000.0000.0000.7410.0000.0000.0000.0000.000
N90,21590,21590,21590,21528,35928,35928,17728,15628,09670,59870,59890,21583,178
DeadweightCorrelation Coefficient−0.0050.363 **1.0000.941 **−0.005−0.0030.108 **0.015 *0.079 **0.087 **0.089 **0.031 **0.104 **
Sig. (2-tailed)0.1260.000.0.0000.4240.6610.0000.0140.0000.0000.0000.0000.000
N90,21590,21590,21590,21528,35928,35928,17728,15628,09670,59870,59890,21583,178
Deadweight (Groups)Correlation Coefficient−0.0050.343 **0.941 **1.000−0.0010.0030.094 **0.0000.059 **0.070 **0.072 **0.028 **0.086 **
Sig. (2-tailed)0.1670.0000.000.0.8290.5800.0000.9610.0000.0000.0000.0000.000
N90,21590,21590,21590,21528,35928,35928,17728,15628,09670,59870,59890,21583,178
Average Ambient TemperatureCorrelation Coefficient0.0050.163 **−0.005−0.0011.0000.937 **−0.243 **−0.126 **−0.211 **0.099 **0.097 **−0.007.
Sig. (2-tailed)0.4000.0000.4240.829.0.0000.0000.0000.0000.0000.0000.222.
N28,35928,35928,35928,35928,35928,35928,17728,15628,09627,82427,82428,35928,359
Average Ambient Temperature (Groups)Correlation Coefficient0.0050.154 **−0.0030.0030.937 **1.000−0.228 **−0.118 **−0.197 **0.094 **0.090 **−0.003.
Sig. (2-tailed)0.3930.0000.6610.5800.000.0.0000.0000.0000.0000.0000.614.
N28,35928,35928,35928,35928,35928,35928,17728,15628,09627,82427,82428,35928,359
Wind Force (BFT)Correlation Coefficient0.004−0.065 **0.108 **0.094 **−0.243 **−0.228 **1.0000.616 **0.868 **−0.226 **−0.201 **−0.124 **.
Sig. (2-tailed)0.5260.0000.0000.0000.0000.000.0.0000.0000.0000.0000.000.
N28,17728,17728,17728,17728,17728,17728,17728,14228,08127,64427,64428,17728,177
Swell Force (DSS)Correlation Coefficient0.0040.0020.015 *0.000−0.126 **−0.118 **0.616 **1.0000.656 **−0.187 **−0.175 **−0.079 **.
Sig. (2-tailed)0.5220.7410.0140.9610.0000.0000.000.0.0000.0000.0000.000.
N28,15628,15628,15628,15628,15628,15628,14228,15628,07527,62627,62628,15628,156
Sea State (DSS)Correlation Coefficient0.004−0.115 **0.079 **0.059 **−0.211 **−0.197 **0.868 **0.656 **1.000−0.228 **−0.207 **−0.118 **.
Sig. (2-tailed)0.5230.0000.0000.0000.0000.0000.0000.000.0.0000.0000.000.
N28,09628,09628,09628,09628,09628,09628,08128,07528,09627,56727,56728,09628,096
Average Speed, in KnotsCorrelation Coefficient−0.0020.021 **0.087 **0.070 **0.099 **0.094 **−0.226 **−0.187 **−0.228 **1.0000.982 **−0.040 **0.808 **
Sig. (2-tailed)0.6130.0000.0000.0000.0000.0000.0000.0000.000.0.0000.0000.000
N70,59870,59870,59870,59827,82427,82427,64427,62627,56770,59870,59870,59864,191
Average Speed, in Knots (Groups)Correlation Coefficient−0.0030.017 **0.089 **0.072 **0.097 **0.090 **−0.201 **−0.175 **−0.207 **0.982 **1.000−0.046 **0.796 **
Sig. (2-tailed)0.4550.0000.0000.0000.0000.0000.0000.0000.0000.000.0.0000.000
N70,59870,59870,59870,59827,82427,82427,64427,62627,56770,59870,59870,59864,191
CargoCorrelation Coefficient−0.003−0.022 **0.031 **0.028 **−0.007−0.003−0.124 **−0.079 **−0.118 **−0.040 **−0.046 **1.000−0.043 **
Sig. (2-tailed)0.3560.0000.0000.0000.2220.6140.0000.0000.0000.0000.000.0.000
N90,21590,21590,21590,21528,35928,35928,17728,15628,09670,59870,59890,21583,178
Vessel StateCorrelation Coefficient−0.0040.014 **0.104 **0.086 **.....0.808 **0.796 **−0.043 **1.000
Sig. (2-tailed)0.2130.0000.0000.000.....0.0000.0000.000.
N83,17883,17883,17883,17828,35928,35928,17728,15628,09664,19164,19183,17883,178
Source: own study. ** Correlation is significant at the 0.01 level (two-tailed). * Correlation is significant at the 0.05 level (two-tailed).
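Spearman's rho is the Pearson correlation computed on ranks, with tied values given the average of the ranks they occupy, which suits ordinal codes such as the Beaufort and Douglas scales and the grouped variables in this table. The following self-contained Python sketch shows the calculation; `average_ranks` and `spearman_rho` are illustrative helpers, not the software used to produce Table A2.

```python
import math
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # sorted positions start..end -> ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return None  # a constant variable has no ranking
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ordinal codes with ties, e.g. Beaufort force vs. Douglas sea state.
beaufort = [1, 2, 2, 3]
douglas = [1, 3, 2, 4]
print(average_ranks(beaufort))          # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho(beaufort, douglas))  # 3 / sqrt(10), about 0.9487
```

Average ranking matters because ordinal weather codes repeat heavily (Table 7 lists 9348 records at Beaufort force 4), so it keeps the coefficient independent of the order in which tied records happen to be stored.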

Appendix B

The following table compares the tanker and bulk carrier classes used in the sizing variable:

| Vessel Type | Deadweight Tonnage (DWT) | Length (m) | Beam (m) | Draft (m) | Common Cargo | Main Trade Routes |
|---|---|---|---|---|---|---|
| Aframax | 80,000–120,000 | 230–250 | ~44 | 13–15 | Crude oil | Short/medium-haul crude transport (e.g., North Sea, Mediterranean, SE Asia, U.S. Gulf) |
| Suezmax | 120,000–200,000 | 275–285 | ~48 | 15–17 | Crude oil | Routes passing through the Suez Canal (e.g., West Africa–Europe, Black Sea–Mediterranean) |
| Panamax | 50,000–80,000 | 294–330 | ~32 | 12–13 | Crude oil, bulk cargo, containers | Global trade routes through the Panama Canal |
| Handymax | 40,000–50,000 | 150–200 | ~32 | 10–12 | Dry bulk (grain, coal, steel, cement) | Short/medium-haul bulk transport (e.g., regional ports with size restrictions) |

References

  1. Xu, M.; Ma, X.; Zhao, Y.; Qiao, W. A Systematic Literature Review of Maritime Transportation Safety Management. J. Mar. Sci. Eng. 2023, 11, 2311.
  2. Endrina, N.; Konovessis, D.; Sourina, O.; Krishnan, G. Influence of ship design and operational factors on human performance and evaluation of effects and sensitivity using risk models. Ocean. Eng. 2019, 184, 143–158.
  3. Akten, N. Shipping accidents: A serious threat for marine environment. J. Black Sea/Mediterr. Environ. 2006, 12, 269–304.
  4. Talley, W.K.; Jin, D.; Kite-Powell, H. Determinants of the severity of cruise vessel accidents. Transp. Res. Part D Transp. Environ. 2008, 13, 86–94.
  5. Chang, Y.; Park, H. The impact of vessel speed reduction on port accidents. Accid. Anal. Prev. 2019, 123, 422–432.
  6. Al-Behadili, A.; Al-Taai, O.; Al-Muhyi, A. Analysis of ship accident resulting from bad weather conditions in the port of Khor Al-Zubair, Iraqi crane accident Aba Thar: A case study. Rev. Bionatura 2023, 8, 1–16.
  7. Bao, D.; Shang, R.; Wang, R.; Ma, R. AIS big data framework for maritime safety supervision. In Proceedings of the 2020 International Conference on Robots & Intelligent System (ICRIS), Sanya, China, 7–8 November 2020; pp. 150–153.
  8. Acejo, I.; Sampson, H.; Turgo, N.; Ellis, N.; Tang, L. The Causes of Maritime Accidents in the Period 2002–2016. 2018. Available online: https://researchportal.plymouth.ac.uk/en/publications/the-causes-of-maritime-accidents-in-the-period-2002-2016 (accessed on 30 November 2018).
  9. Heij, C.; Knapp, S. Effects of wind strength and wave height on ship incident risk: Regional trends and seasonality. Transp. Res. Part D Transp. Environ. 2015, 37, 29–39.
  10. Brandt, P.; Munim, Z.H.; Chaal, M.; Kang, H.S. Maritime accident risk prediction integrating weather data using machine learning. Transp. Res. Part D Transp. Environ. 2024, 136, 104388.
  11. Özdemir, Ü.; Güneroğlu, A. Strategic Approach Model for Investigating the Cause of Maritime Accidents. Sci. J. Traffic Transp. Res. 2015, 27, 113–123.
  12. Pilatis, A.; Pagonis, D.N.; Serris, M.; Peppa, S.; Kaltsas, G. A Statistical Analysis of Ship Accidents (1990–2020) Focusing on Collision, Grounding, Hull Failure, and Resulting Hull Damage. J. Mar. Sci. Eng. 2024, 12, 122.
  13. Bye, R.; Aalberg, A. Maritime navigation accidents and risk indicators: An exploratory statistical analysis using AIS data and accident reports. Reliab. Eng. Syst. Saf. 2018, 176, 174–186.
  14. Ventikos, N.; Stavrou, D.; Andritsopoulos, A. Studying the marine accidents of the Aegean Sea: Critical review, analysis and results. J. Mar. Eng. Technol. 2016, 16, 103–113.
  15. Toffoli, A.; Lefevre, J.M.; Bitner-Gregersen, E.; Monbaliu, J. Towards the identification of warning criteria: Analysis of a ship accident database. Appl. Ocean Res. 2005, 27, 281–291.
  16. Sadaharu, K. Major Challenges and Solutions for Utilizing Big Data in the Maritime Industry. Master’s Thesis, World Maritime University, Malmö, Sweden, 2015.
  17. Jović, M.; Tijan, E. Big Data Management in Maritime Transport. Pomor. Zb. 2019, 57, 123–141.
  18. Liu, Z.; Zhang, B.; Zhang, M.; Wang, H.; Fu, X. A quantitative method for the analysis of ship collision risk using AIS data. Ocean. Eng. 2023, 272, 113906.
  19. Ma, Q.; Tang, H.; Liu, C.; Zhang, M.; Zhang, D.; Liu, Z.; Zhang, L. A big data analytics method for the evaluation of maritime traffic safety using automatic identification system data. Ocean. Coast. Manag. 2024, 251, 107077.
  20. Wu, J.; Thorne-Large, J.; Zhang, P. Safety first: The risk of over-reliance on technology in navigation. J. Transp. Saf. Secur. 2021, 14, 1220–1246.
  21. Feng, H. Analysis and enlightenment of AIS content in the accident report of M/T SANCHI. Navig. Technol. 2019, 2, 72–73.
  22. Maceiras, C.; Cao-Feijóo, G.; Pérez-Canosa, J.M.; Orosa, J.A. Application of Machine Learning in the Identification and Prediction of Maritime Accident Factors. Appl. Sci. 2024, 14, 7239.
  23. O’Reilly, K. Ethnographic Methods, 2nd ed.; Routledge: London, UK, 2012.
  24. Frisk, E.; Krysander, M.; Jung, D. A Toolbox for Analysis and Design of Model Based Diagnosis Systems for Large Scale Models. IFAC-PapersOnLine 2017, 50, 3287–3293.
  25. Dominguez-Péry, C.; Tassabehji, R.; Corset, F.; Chreim, Z. A holistic view of maritime navigation accidents and risk indicators: Examining IMO reports from 2011 to 2021. J. Shipp. Trade 2023, 8, 11.
  26. Bogalecka, M. Collision and Contact–Analysis of Accidents at Sea. Int. J. Mar. Navig. Saf. Sea Transp. 2024, 11, 75–85.
  27. Stojanovic, J.; Koc, Y.; Wang, K. Zampeta analytics for maritime safety. In Proceedings of the International Conference on e-Business Engineering (ICEBE), Shanghai, China, 12–13 October 2019; pp. 1–8.
  28. Zhang, W.; Kopca, C.; Tang, J.; Ma, D. A Systematic Approach for Collision Risk Analysis based on AIS Data. J. Navig. 2017, 70, 1–16.
Chart 1. Methodological flow chart. Source: authors’ compilation.
Figure 1. Big data analysis: frequency table—accidents. Source: own study.
Figure 2. Big data analysis: dataset in sizing. Source: own study.
Figure 3. Big data analysis: deadweight. Source: own study.
Figure 4. Big data analysis—deadweight in groups. Source: own study.
Figure 5. Big data analysis: average ambient temperature (groups). Source: own study.
Figure 6. Big data analysis: a comprehensive analysis of wind force data categorized according to the Beaufort wind force scale. Source: own study.
Figure 7. Big data analysis: swell force data categorized by the DSS (Douglas sea scale). Source: own study.
Figure 8. Big data analysis: average speed. Source: own study.
Figure 9. Big data analysis: cargo distribution. Source: own study.
Figure 10. Big data analysis: vessel state. Source: own study.
Table 1. Big data analysis: descriptive statistics.
| Variable | N (Valid) | N (Missing) | Mean | Median | Mode | Std. Deviation | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| Accident | 90,215 | 0 | 1.00 | 1.00 | 1 | 0.016 | 0 | 1 |
| Sizing | 90,215 | 0 | 2.41 | 2.00 | 1 | 1.220 | 1 | 4 |
| Deadweight | 90,215 | 0 | 105,102.29 | 112,949.00 | 163,216 | 41,213.871 | 39,378 | 164,565 |
| Average Ambient Temperature | 28,359 | 61,856 | 22.2185 | 24.0000 | 28.00 | 8.38562 | −13.00 | 255.00 |
| Wind Force (BFT) | 28,177 | 62,038 | 4.5013 | 4.0000 | 4.00 | 1.36081 | 0.00 | 10.00 |
| Swell Force (DSS) | 28,156 | 62,059 | 3.1945 | 3.0000 | 4.00 | 1.73445 | 0.00 | 9.00 |
| Average Speed, in Knots | 70,598 | 19,617 | 5.9592 | 3.5000 | 0.00 | 6.66689 | 0.00 | 490.00 |
Source: own study.
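Table 1 separates valid from missing cases, and the two always sum to the 90,215 records (e.g., 28,359 + 61,856 for Average Ambient Temperature). The sketch below shows one way to compute the same summary columns in Python, assuming missing readings are stored as `None`; `describe` is a hypothetical helper, and the sample standard deviation (n − 1 denominator) and reporting the smallest value when several modes tie are assumed conventions rather than details stated in the source.

```python
import statistics
from typing import Dict, Optional, Sequence


def describe(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summary columns of Table 1; None entries count as missing."""
    valid = [v for v in values if v is not None]
    if len(valid) < 2:
        raise ValueError("need at least two valid values")
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "mean": statistics.fmean(valid),
        "median": statistics.median(valid),
        # Assumed convention: report the smallest value when several modes tie.
        "mode": min(statistics.multimode(valid)),
        # Sample standard deviation (n - 1 denominator), an assumed convention.
        "std_dev": statistics.stdev(valid),
        "minimum": min(valid),
        "maximum": max(valid),
    }


# Hypothetical wind-force readings (Beaufort), two of them missing.
readings = [4, 5, None, 4, 6, None, 3]
print(describe(readings))
```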
Table 2. Big data analysis: frequency table—accidents.
| Accident | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Yes | 22 | 0.024 | 0.024 | 0.0 |
| No | 90,193 | 99.975 | 100.0 | 100.0 |
| Total | 90,215 | 100.0 | 100.0 | |
Source: own study.
Table 3. Big data analysis: dataset in sizing.
| Sizing | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Aframax | 31,546 | 35.0 | 35.0 | 35.0 |
| Handymax | 14,467 | 16.0 | 16.0 | 51.0 |
| Panamax | 19,502 | 21.6 | 21.6 | 72.6 |
| Suezmax | 24,700 | 27.4 | 27.4 | 100.0 |
| Total | 90,215 | 100.0 | 100.0 | |
Source: own study.
Table 4. Big data analysis: deadweight.
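| Deadweight | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| 39,378 | 6092 | 6.8 | 6.8 | 6.8 |
| 39,589 | 5867 | 6.5 | 6.5 | 13.3 |
| 53,107 | 2508 | 2.8 | 2.8 | 16.0 |
| 68,439 | 2592 | 2.9 | 2.9 | 18.9 |
| 74,039 | 2225 | 2.5 | 2.5 | 21.4 |
| 74,043 | 2430 | 2.7 | 2.7 | 24.1 |
| 74,251 | 2378 | 2.6 | 2.6 | 26.7 |
| 74,296 | 4963 | 5.5 | 5.5 | 32.2 |
| 74,327 | 2504 | 2.8 | 2.8 | 35.0 |
| 74,329 | 2410 | 2.7 | 2.7 | 37.7 |
| 105,344 | 2731 | 3.0 | 3.0 | 40.7 |
| 105,365 | 2354 | 2.6 | 2.6 | 43.3 |
| 105,374 | 2362 | 2.6 | 2.6 | 45.9 |
| 105,392 | 3048 | 3.4 | 3.4 | 49.3 |
| 112,949 | 3107 | 3.4 | 3.4 | 52.7 |
| 113,004 | 2436 | 2.7 | 2.7 | 55.4 |
| 113,039 | 2875 | 3.2 | 3.2 | 58.6 |
| 113,554 | 2438 | 2.7 | 2.7 | 61.3 |
| 113,612 | 2459 | 2.7 | 2.7 | 64.0 |
| 113,651 | 2511 | 2.8 | 2.8 | 66.8 |
| 113,737 | 2778 | 3.1 | 3.1 | 69.9 |
| 117,055 | 2447 | 2.7 | 2.7 | 72.6 |
| 155,721 | 4960 | 5.5 | 5.5 | 78.1 |
| 155,723 | 2484 | 2.8 | 2.8 | 80.9 |
| 157,539 | 2152 | 2.4 | 2.4 | 83.3 |
| 157,648 | 2145 | 2.4 | 2.4 | 85.6 |
| 157,740 | 2142 | 2.4 | 2.4 | 88.0 |
| 163,216 | 6537 | 7.2 | 7.2 | 95.3 |
| 163,250 | 2068 | 2.3 | 2.3 | 97.5 |
| 164,565 | 2212 | 2.5 | 2.5 | 100.0 |
| Total | 90,215 | 100.0 | 100.0 | |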
Source: own study.
Table 5. Big data analysis: deadweight in groups.
| Deadweight (Groups) | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Up to 75,000 | 33,969 | 37.7 | 37.7 | 37.7 |
| 75,001–150,000 | 31,546 | 35.0 | 35.0 | 72.6 |
| Above 150,000 | 24,700 | 27.4 | 27.4 | 100.0 |
| Total | 90,215 | 100.0 | 100.0 | |
Source: own study.
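Because Table 4 lists every distinct deadweight value with its frequency, Table 5 can be reproduced directly from it. The sketch below assumes the group boundaries are inclusive upper limits (at most 75,000 and at most 150,000), which is our reading of the labels, and `deadweight_group` is a hypothetical helper. It also makes visible that the three group totals coincide with the sizing classes in Table 3: Handymax plus Panamax (14,467 + 19,502 = 33,969), Aframax (31,546) and Suezmax (24,700).

```python
from collections import Counter

# (deadweight, frequency) pairs transcribed from Table 4.
DEADWEIGHT_FREQUENCIES = [
    (39_378, 6092), (39_589, 5867), (53_107, 2508), (68_439, 2592),
    (74_039, 2225), (74_043, 2430), (74_251, 2378), (74_296, 4963),
    (74_327, 2504), (74_329, 2410), (105_344, 2731), (105_365, 2354),
    (105_374, 2362), (105_392, 3048), (112_949, 3107), (113_004, 2436),
    (113_039, 2875), (113_554, 2438), (113_612, 2459), (113_651, 2511),
    (113_737, 2778), (117_055, 2447), (155_721, 4960), (155_723, 2484),
    (157_539, 2152), (157_648, 2145), (157_740, 2142), (163_216, 6537),
    (163_250, 2068), (164_565, 2212),
]


def deadweight_group(dwt: int) -> str:
    """Hypothetical helper; boundaries read as inclusive upper limits."""
    if dwt <= 75_000:
        return "Up to 75,000"
    if dwt <= 150_000:
        return "75,001-150,000"
    return "Above 150,000"


# Sum the Table 4 frequencies into the three Table 5 groups.
group_counts: Counter = Counter()
for dwt, freq in DEADWEIGHT_FREQUENCIES:
    group_counts[deadweight_group(dwt)] += freq

total = sum(group_counts.values())
for label in ("Up to 75,000", "75,001-150,000", "Above 150,000"):
    print(label, group_counts[label], round(100 * group_counts[label] / total, 1))
```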
Table 6. Big data analysis: average ambient temperature (groups).
| | Average Ambient Temperature (Groups) | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Up to 20 | 10,393 | 11.5 | 36.6 | 36.6 |
| | 20.1–28 | 11,096 | 12.3 | 39.1 | 75.8 |
| | Above 28.01 | 6870 | 7.6 | 24.2 | 100.0 |
| | Total | 28,359 | 31.4 | 100.0 | |
| Missing | System | 61,856 | 68.6 | | |
| Total | | 90,215 | 100.0 | | |
Source: own study.
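In the frequency tables with missing data, Percent is taken over all 90,215 records while Valid Percent and Cumulative Percent use only the non-missing ones, which is why 10,393 readings are 11.5% of all records but 36.6% of the valid temperatures. A short Python sketch that reproduces Table 6 from its counts; `frequency_table` is an illustrative helper, and the category order is assumed to be the insertion order of the dictionary.

```python
from typing import Dict, List, Tuple


def frequency_table(counts: Dict[str, int], n_missing: int) -> List[Tuple[str, int, float, float, float]]:
    """Rows of (category, frequency, percent, valid percent, cumulative percent)."""
    n_valid = sum(counts.values())
    n_total = n_valid + n_missing
    rows = []
    cumulative = 0.0
    for label, freq in counts.items():  # dict order = category order
        valid_pct = 100 * freq / n_valid
        cumulative += valid_pct
        rows.append((label, freq, round(100 * freq / n_total, 1),
                     round(valid_pct, 1), round(cumulative, 1)))
    return rows


# Counts and missing total from Table 6 (average ambient temperature groups).
temperature_groups = {"Up to 20": 10_393, "20.1-28": 11_096, "Above 28.01": 6_870}
for row in frequency_table(temperature_groups, n_missing=61_856):
    print(row)
```

Accumulating the unrounded valid percentages and rounding only for display avoids cumulative columns that drift from 100.0 through repeated rounding.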
Table 7. Big data analysis: a comprehensive analysis of wind force data categorized according to the Beaufort wind force scale.
| | Wind Force (BFT) | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 0.00 | 41 | 0.0 | 0.1 | 0.1 |
| | 1.00 | 139 | 0.2 | 0.5 | 0.6 |
| | 2.00 | 749 | 0.8 | 2.7 | 3.3 |
| | 3.00 | 5602 | 6.2 | 19.9 | 23.2 |
| | 4.00 | 9348 | 10.4 | 33.2 | 56.4 |
| | 5.00 | 5722 | 6.3 | 20.3 | 76.7 |
| | 6.00 | 4474 | 5.0 | 15.9 | 92.5 |
| | 7.00 | 1440 | 1.6 | 5.1 | 97.7 |
| | 8.00 | 524 | 0.6 | 1.9 | 99.5 |
| | 9.00 | 107 | 0.1 | 0.4 | 99.9 |
| | 10.00 | 31 | 0.0 | 0.1 | 100.0 |
| | Total | 28,177 | 31.2 | 100.0 | |
| Missing | System | 62,038 | 68.8 | | |
| Total | | 90,215 | 100.0 | | |
Source: own study.
Table 8. Big data analysis: swell force data categorized by the DSS (Douglas sea scale).
| | Sea State (DSS) | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Calm (glassy) | 99 | 0.1 | 0.4 | 0.4 |
| | Calm (rippled) | 486 | 0.5 | 1.7 | 2.1 |
| | Smooth | 1957 | 2.2 | 7.0 | 9.0 |
| | Slight | 7582 | 8.4 | 27.0 | 36.0 |
| | Moderate | 9901 | 11.0 | 35.2 | 71.3 |
| | Rough | 5038 | 5.6 | 17.9 | 89.2 |
| | Very rough | 2192 | 2.4 | 7.8 | 97.0 |
| | High | 620 | 0.7 | 2.2 | 99.2 |
| | Very high | 198 | 0.2 | 0.7 | 99.9 |
| | Phenomenal | 23 | 0.0 | 0.1 | 100.0 |
| | Total | 28,096 | 31.1 | 100.0 | |
| Missing | System | 62,119 | 68.9 | | |
| Total | | 90,215 | 100.0 | | |
Source: own study.
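The categories in Table 8 are the ten steps of the Douglas sea scale, from 0 (calm, glassy) to 9 (phenomenal); keeping the numeric codes in this order is what allows the rank-based correlations in Table A2 to treat sea state as an ordered variable. The sketch below assumes sea state is stored as integer codes 0 to 9 with `None` for missing readings; the mapping name and the `sea_state_frequencies` helper are illustrative, not taken from the study.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Douglas sea scale codes and the labels used in Table 8.
DOUGLAS_SEA_SCALE = {
    0: "Calm (glassy)",
    1: "Calm (rippled)",
    2: "Smooth",
    3: "Slight",
    4: "Moderate",
    5: "Rough",
    6: "Very rough",
    7: "High",
    8: "Very high",
    9: "Phenomenal",
}


def sea_state_frequencies(codes: Iterable[Optional[int]]) -> List[Tuple[str, int]]:
    """Tally DSS codes in scale order; None (missing) is skipped."""
    counts = Counter(code for code in codes if code is not None)
    unknown = set(counts) - set(DOUGLAS_SEA_SCALE)
    if unknown:
        raise ValueError(f"codes outside the 0-9 Douglas scale: {sorted(unknown)}")
    # Counter returns 0 for codes that never occur, so every category is listed.
    return [(DOUGLAS_SEA_SCALE[code], counts[code]) for code in sorted(DOUGLAS_SEA_SCALE)]


print(sea_state_frequencies([4, 3, 4, None, 9]))
```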
Table 9. Big data analysis: average speed.
| | Average Speed, in Knots (Groups) | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Zero speed | 26,749 | 29.7 | 37.9 | 37.9 |
| | 0.01–3.5 | 8652 | 9.6 | 12.3 | 50.1 |
| | 3.51–12.2 | 17,366 | 19.2 | 24.6 | 74.7 |
| | More than 12.21 | 17,831 | 19.8 | 25.3 | 100.0 |
| | Total | 70,598 | 78.3 | 100.0 | |
| Missing | System | 19,617 | 21.7 | | |
| Total | | 90,215 | 100.0 | | |
Source: own study.
Table 10. Big data analysis: cargo distribution.
| Cargo | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Laden | 53,809 | 59.6 | 59.6 | 59.6 |
| Ballast | 36,406 | 40.4 | 40.4 | 100.0 |
| Total | 90,215 | 100.0 | 100.0 | |
Source: own study.
Table 11. Big data analysis: vessel state.
| | Vessel State | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | At port | 30,163 | 33.4 | 36.3 | 36.3 |
| | Dry dock | 644 | 0.7 | 0.8 | 37.0 |
| | Maneuvering | 18,575 | 20.6 | 22.3 | 59.4 |
| | Operation | 5437 | 6.0 | 6.5 | 65.9 |
| | Sea passage | 28,359 | 31.4 | 34.1 | 100.0 |
| | Total | 83,178 | 92.2 | 100.0 | |
| Missing | System | 7037 | 7.8 | | |
| Total | | 90,215 | 100.0 | | |
Source: own study.
Table 12. Crosstabulation analysis: accident * sizing.
| Accident | Statistic | Aframax | Handymax | Panamax | Suezmax | Total |
|---|---|---|---|---|---|---|
| Yes | Count | 7 | 3 | 3 | 9 | 22 |
| | % within accident | 31.8% | 13.6% | 13.6% | 40.9% | 100.0% |
| | % within sizing | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| | % of total | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| No | Count | 31,539 | 14,464 | 19,499 | 24,691 | 90,193 |
| | % within accident | 35.0% | 16.0% | 21.6% | 27.4% | 100.0% |
| | % within sizing | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| | % of total | 35.0% | 16.0% | 21.6% | 27.4% | 100.0% |
| Total | Count | 31,546 | 14,467 | 19,502 | 24,700 | 90,215 |
| | % within accident | 35.0% | 16.0% | 21.6% | 27.4% | 100.0% |
| | % within sizing | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| | % of total | 35.0% | 16.0% | 21.6% | 27.4% | 100.0% |
Source: own study.
Table 13. Crosstabulation analysis: chi-square tests.
| Test | Value | df | Asymptotic Significance (2-Sided) |
|---|---|---|---|
| Pearson Chi-Square | 2.261 a | 3 | 0.520 |
| Likelihood Ratio | 2.170 | 3 | 0.538 |
| Linear-by-Linear Association | 0.730 | 1 | 0.393 |
| N of Valid Cases | 90,215 | | |
Source: own study. a Two cells (25.0%) have expected count less than 5. The minimum expected count is 3.53.
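The Pearson and likelihood-ratio statistics in Table 13 can be recomputed from the counts in Table 12. The sketch below does this in plain Python (`chi_square_tests` is a hypothetical helper, not the software behind the table); it also reports the expected counts behind footnote a, where only 22 accidents spread over four classes leave two cells below 5.

```python
import math
from typing import Dict, List


def chi_square_tests(table: List[List[int]]) -> Dict[str, float]:
    """Pearson chi-square, likelihood ratio (G^2), df and expected-count checks for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    g_squared = 0.0
    expected_counts = []
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            expected_counts.append(expected)
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G^2
                g_squared += 2 * observed * math.log(observed / expected)
    return {
        "pearson": pearson,
        "likelihood_ratio": g_squared,
        "df": (len(table) - 1) * (len(col_totals) - 1),
        "min_expected": min(expected_counts),
        "cells_below_5": sum(1 for e in expected_counts if e < 5),
    }


# Table 12 counts: rows Yes/No, columns Aframax, Handymax, Panamax, Suezmax.
ACCIDENT_BY_SIZING = [
    [7, 3, 3, 9],
    [31_539, 14_464, 19_499, 24_691],
]
print(chi_square_tests(ACCIDENT_BY_SIZING))
```

With these counts the sketch gives a Pearson statistic of about 2.261 and a likelihood ratio of about 2.170 on 3 df, with a minimum expected count of about 3.53, in line with Table 13. Converting the statistics into p-values needs the chi-square distribution (for example `scipy.stats.chi2.sf`), which is left out here to stay within the standard library.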
Table 14. Crosstabulation analysis: accident * deadweight (groups).
| Accident | Statistic | Up to 75,000 | 75,001–150,000 | Above 150,000 | Total |
|---|---|---|---|---|---|
| Yes | Count | 6 | 7 | 9 | 22 |
| | % within accident | 27.3% | 31.8% | 40.9% | 100.0% |
| | % within deadweight (groups) | 0.0% | 0.0% | 0.0% | 0.0% |
| | % of total | 0.0% | 0.0% | 0.0% | 0.0% |
| No | Count | 33,963 | 31,539 | 24,691 | 90,193 |
| | % within accident | 37.7% | 35.0% | 27.4% | 100.0% |
| | % within deadweight (groups) | 100.0% | 100.0% | 100.0% | 100.0% |
| | % of total | 37.6% | 35.0% | 27.4% | 100.0% |
| Total | Count | 33,969 | 31,546 | 24,700 | 90,215 |
| | % within accident | 37.7% | 35.0% | 27.4% | 100.0% |
| | % within deadweight (groups) | 100.0% | 100.0% | 100.0% | 100.0% |
| | % of total | 37.7% | 35.0% | 27.4% | 100.0% |
Source: own study.
Table 15. Crosstabulation analysis: chi-square tests.
| Test | Value | df | Asymptotic Significance (2-Sided) |
|---|---|---|---|
| Pearson Chi-Square | 2.164 a | 2 | 0.339 |
| Likelihood Ratio | 2.037 | 2 | 0.361 |
| Linear-by-Linear Association | 1.966 | 1 | 0.161 |
| N of Valid Cases | 90,215 | | |
Source: own study. a Zero cells (0.0%) have expected count less than 5. The minimum expected count is 6.02.
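The linear-by-linear association in Table 15 treats both variables as ordered scores and equals (N − 1)r², where r is the Pearson correlation between the row and column scores. A minimal sketch, assuming equally spaced scores 1, 2, 3 for the deadweight groups and 1, 2 for Yes/No (the scores actually used are not stated in the source; any linear rescaling gives the same result); `linear_by_linear` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def linear_by_linear(table: List[List[int]], row_scores: Sequence[float],
                     col_scores: Sequence[float]) -> float:
    """Mantel-Haenszel linear-by-linear association M^2 = (N - 1) * r^2."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for u, row in zip(row_scores, table):
        for v, count in zip(col_scores, row):
            # Weighted sums over cells: each cell counts `count` records with scores (u, v).
            n += count
            sx += count * u
            sy += count * v
            sxx += count * u * u
            syy += count * v * v
            sxy += count * u * v
    cov = sxy - sx * sy / n
    var_x = sxx - sx * sx / n
    var_y = syy - sy * sy / n
    r = cov / math.sqrt(var_x * var_y)
    return (n - 1) * r * r


# Table 14 counts: rows Yes/No, columns up to 75,000 / 75,001-150,000 / above 150,000.
ACCIDENT_BY_DWT_GROUP = [
    [6, 7, 9],
    [33_963, 31_539, 24_691],
]
# Assumed scores: 1, 2 for Yes/No and 1, 2, 3 for the ordered deadweight groups.
m2 = linear_by_linear(ACCIDENT_BY_DWT_GROUP, row_scores=[1, 2], col_scores=[1, 2, 3])
print(round(m2, 3))
```

The result, about 1.966, also matches the F statistic for Deadweight (Groups) in Table 19: with a two-level factor, F = (N − 2)r²/(1 − r²), which is practically identical to (N − 1)r² when r is this small.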
Table 16. Crosstabulation: accident * average ambient temperature (groups).
| Accident | Statistic | Up to 20 | 20.1–28 | Above 28.01 | Total |
|---|---|---|---|---|---|
| Yes | Count | 6 | 5 | 2 | 13 |
| | % within accident | 46.2% | 38.5% | 15.4% | 100.0% |
| | % within average ambient temperature (groups) | 0.1% | 0.0% | 0.0% | 0.0% |
| | % of total | 0.0% | 0.0% | 0.0% | 0.0% |
| No | Count | 10,387 | 11,091 | 6868 | 28,346 |
| | % within accident | 36.6% | 39.1% | 24.2% | 100.0% |
| | % within average ambient temperature (groups) | 99.9% | 100.0% | 100.0% | 100.0% |
| | % of total | 36.6% | 39.1% | 24.2% | 100.0% |
| Total | Count | 10,393 | 11,096 | 6870 | 28,359 |
| | % within accident | 36.6% | 39.1% | 24.2% | 100.0% |
| | % within average ambient temperature (groups) | 100.0% | 100.0% | 100.0% | 100.0% |
| | % of total | 36.6% | 39.1% | 24.2% | 100.0% |
Source: own study.
Table 17. Crosstabulation: accident * average ambient temperature (groups) chi-square tests.
| Test | Value | df | Asymptotic Significance (2-Sided) |
|---|---|---|---|
| Pearson Chi-Square | 0.742 a | 2 | 0.690 |
| Likelihood Ratio | 0.780 | 2 | 0.677 |
| Linear-by-Linear Association | 0.738 | 1 | 0.390 |
| N of Valid Cases | 28,359 | | |
Source: own study. a Two cells (33.3%) have expected count less than 5. The minimum expected count is 3.15.
Table 18. Big data analysis: ANOVA, descriptives.
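| Variable | Accident | N | Mean | Std. Deviation | Std. Error | 95% CI for Mean: Lower Bound | 95% CI for Mean: Upper Bound | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| Sizing | Yes | 22 | 2.64 | 1.329 | 0.283 | 2.05 | 3.23 | 1 | 4 |
| | No | 90,193 | 2.41 | 1.220 | 0.004 | 2.41 | 2.42 | 1 | 4 |
| | Total | 90,215 | 2.41 | 1.220 | 0.004 | 2.41 | 2.42 | 1 | 4 |
| Deadweight | Yes | 22 | 116,240.20 | 43,832.271 | 9345.072 | 96,806.06 | 135,674.34 | 39,378 | 163,250 |
| | No | 90,193 | 105,099.58 | 41,213.103 | 137.230 | 104,830.61 | 105,368.55 | 39,378 | 164,565 |
| | Total | 90,215 | 105,102.29 | 41,213.871 | 137.216 | 104,833.35 | 105,371.24 | 39,378 | 164,565 |
| Deadweight (Groups) | Yes | 22 | 2.14 | 0.834 | 0.178 | 1.77 | 2.51 | 1 | 3 |
| | No | 90,193 | 1.90 | 0.800 | 0.003 | 1.89 | 1.90 | 1 | 3 |
| | Total | 90,215 | 1.90 | 0.800 | 0.003 | 1.89 | 1.90 | 1 | 3 |
| Average Ambient Temperature | Yes | 13 | 20.1538 | 8.84916 | 2.45432 | 14.8064 | 25.5013 | 1.00 | 30.00 |
| | No | 28,346 | 22.2194 | 8.38545 | 0.04981 | 22.1218 | 22.3170 | −13.00 | 255.00 |
| | Total | 28,359 | 22.2185 | 8.38562 | 0.04980 | 22.1209 | 22.3161 | −13.00 | 255.00 |
| Average Ambient Temperature (Groups) | Yes | 13 | 1.69 | 0.751 | 0.208 | 1.24 | 2.15 | 1 | 3 |
| | No | 28,346 | 1.88 | 0.770 | 0.005 | 1.87 | 1.88 | 1 | 3 |
| | Total | 28,359 | 1.88 | 0.770 | 0.005 | 1.87 | 1.88 | 1 | 3 |
| Wind Force (BFT) | Yes | 13 | 4.1538 | 1.46322 | 0.40583 | 3.2696 | 5.0381 | 1.00 | 6.00 |
| | No | 28,164 | 4.5015 | 1.36077 | 0.00811 | 4.4856 | 4.5174 | 0.00 | 10.00 |
| | Total | 28,177 | 4.5013 | 1.36081 | 0.00811 | 4.4854 | 4.5172 | 0.00 | 10.00 |
| Swell Force (DSS) | Yes | 13 | 2.9231 | 1.49786 | 0.41543 | 2.0179 | 3.8282 | 0.00 | 5.00 |
| | No | 28,143 | 3.1946 | 1.73456 | 0.01034 | 3.1743 | 3.2149 | 0.00 | 9.00 |
| | Total | 28,156 | 3.1945 | 1.73445 | 0.01034 | 3.1742 | 3.2147 | 0.00 | 9.00 |
| Sea State (DSS) | Yes | 13 | 3.7692 | 1.48064 | 0.41066 | 2.8745 | 4.6640 | 1.00 | 6.00 |
| | No | 28,083 | 3.9588 | 1.25773 | 0.00751 | 3.9441 | 3.9735 | 0.00 | 9.00 |
| | Total | 28,096 | 3.9587 | 1.25781 | 0.00750 | 3.9440 | 3.9734 | 0.00 | 9.00 |
| Average Speed, in Knots | Yes | 21 | 6.8895 | 6.20515 | 1.35408 | 4.0650 | 9.7141 | 0.00 | 13.84 |
| | No | 70,577 | 5.9590 | 6.66704 | 0.02510 | 5.9098 | 6.0082 | 0.00 | 490.00 |
| | Total | 70,598 | 5.9592 | 6.66689 | 0.02509 | 5.9101 | 6.0084 | 0.00 | 490.00 |
| Average Speed, in Knots (Groups) | Yes | 21 | 2.57 | 1.363 | 0.297 | 1.95 | 3.19 | 1 | 4 |
| | No | 70,577 | 2.37 | 1.223 | 0.005 | 2.36 | 2.38 | 1 | 4 |
| | Total | 70,598 | 2.37 | 1.223 | 0.005 | 2.36 | 2.38 | 1 | 4 |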
Source: own study.
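The Std. Error and confidence-interval columns of Table 18 follow from N, Mean and Std. Deviation: SE = SD/√N, and the 95% interval is Mean ± t × SE, with t the two-sided 95% critical value of Student's t on N − 1 degrees of freedom. The sketch below checks the Deadweight row for the 22 accident records; the critical value 2.079614 for 21 degrees of freedom is taken from standard t tables and passed in by hand (an assumption of the sketch), since Python's standard library has no t distribution, and `mean_confidence_interval` is a hypothetical helper.

```python
import math
from typing import Tuple


def mean_confidence_interval(mean: float, sd: float, n: int, t_crit: float) -> Tuple[float, float, float]:
    """Return (standard error, lower bound, upper bound) for mean +/- t_crit * SE."""
    se = sd / math.sqrt(n)
    margin = t_crit * se
    return se, mean - margin, mean + margin


# Deadweight, accident records (Table 18): N = 22, so df = 21.
# 2.079614 is the two-sided 95% critical value of t with 21 df (from standard tables).
se, lower, upper = mean_confidence_interval(116_240.20, 43_832.271, 22, t_crit=2.079614)
print(f"SE = {se:.3f}, 95% CI = [{lower:,.2f}, {upper:,.2f}]")
```

For the 90,193 non-accident records the critical value is practically 1.96, and calling the same function with `t_crit=1.96` reproduces that row (104,830.61 to 105,368.55).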
Table 19. Big data analysis: ANOVA.
| Variable | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| Sizing | Between Groups | 1.087 | 1 | 1.087 | 0.730 | 0.393 |
| | Within Groups | 134,378.634 | 90,213 | 1.490 | | |
| | Total | 134,379.722 | 90,214 | | | |
| Deadweight | Between Groups | 2,729,832,761.146 | 1 | 2,729,832,761.146 | 1.607 | 0.205 |
| | Within Groups | 153,233,252,987,388.120 | 90,213 | 1,698,571,746.726 | | |
| | Total | 153,235,982,820,149.280 | 90,214 | | | |
| Deadweight (Groups) | Between Groups | 1.258 | 1 | 1.258 | 1.966 | 0.161 |
| | Within Groups | 57,715.413 | 90,213 | 0.640 | | |
| | Total | 57,716.671 | 90,214 | | | |
| Average Ambient Temperature | Between Groups | 55.440 | 1 | 55.440 | 0.788 | 0.375 |
| | Within Groups | 1,994,040.973 | 28,357 | 70.319 | | |
| | Total | 1,994,096.413 | 28,358 | | | |
| Average Ambient Temperature (Groups) | Between Groups | 0.438 | 1 | 0.438 | 0.738 | 0.390 |
| | Within Groups | 16,824.905 | 28,357 | 0.593 | | |
| | Total | 16,825.343 | 28,358 | | | |
| Wind Force (BFT) | Between Groups | 1.570 | 1 | 1.570 | 0.848 | 0.357 |
| | Within Groups | 52,174.630 | 28,175 | 1.852 | | |
| | Total | 52,176.200 | 28,176 | | | |
| Swell Force (DSS) | Between Groups | 0.958 | 1 | 0.958 | 0.318 | 0.573 |
| | Within Groups | 84,698.026 | 28,154 | 3.008 | | |
| | Total | 84,698.985 | 28,155 | | | |
| Sea State (DSS) | Between Groups | 0.467 | 1 | 0.467 | 0.295 | 0.587 |
| | Within Groups | 44,448.558 | 28,094 | 1.582 | | |
| | Total | 44,449.024 | 28,095 | | | |
| Average Speed, in Knots | Between Groups | 18.179 | 1 | 18.179 | 0.409 | 0.522 |
| | Within Groups | 3,137,833.996 | 70,596 | 44.448 | | |
| | Total | 3,137,852.175 | 70,597 | | | |
| Average Speed, in Knots (Groups) | Between Groups | 0.833 | 1 | 0.833 | 0.557 | 0.456 |
| | Within Groups | 105,656.220 | 70,596 | 1.497 | | |
| | Total | 105,657.054 | 70,597 | | | |
Source: own study.
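Each F in Table 19 compares accident and non-accident records, so it can be rebuilt from the group summaries in Table 18. The sketch below does this for Average Ambient Temperature using the published, rounded means and standard deviations, so its sums of squares differ from Table 19 in the last digits. For the p-value it relies on an assumption of its own, not the method behind the table: with one numerator degree of freedom F = t², and with 28,357 within-group degrees of freedom t is very close to standard normal. Both helper names are hypothetical.

```python
import math
from typing import List, Tuple


def anova_from_summaries(groups: List[Tuple[int, float, float]]) -> Tuple[float, float, float, int, int]:
    """One-way ANOVA from (n, mean, sd) per group: (SS between, SS within, F, df1, df2)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat, df_between, df_within


def approx_p_value_one_df(f_stat: float) -> float:
    """Normal approximation: with df1 = 1, F = t^2, and t is near N(0, 1) for very large df2."""
    return math.erfc(math.sqrt(f_stat / 2))  # two-sided P(|Z| > sqrt(F))


# Average Ambient Temperature, Table 18: (N, mean, SD) for accident = Yes and No.
temperature = [(13, 20.1538, 8.84916), (28_346, 22.2194, 8.38545)]
ss_b, ss_w, f_stat, df1, df2 = anova_from_summaries(temperature)
print(f"SS between = {ss_b:.3f}, SS within = {ss_w:,.3f}, F({df1}, {df2}) = {f_stat:.3f}, "
      f"p = {approx_p_value_one_df(f_stat):.3f}")
```

The sketch returns F close to 0.788 and p close to 0.375, in line with the table; an exact p-value for small samples would need the F distribution itself (for example `scipy.stats.f.sf`).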
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zampeta, V.; Chondrokoukis, G.; Kyriazis, D. Applying Big Data for Maritime Accident Risk Assessment: Insights, Predictive Insights and Challenges. Big Data Cogn. Comput. 2025, 9, 135. https://doi.org/10.3390/bdcc9050135

AMA Style

Zampeta V, Chondrokoukis G, Kyriazis D. Applying Big Data for Maritime Accident Risk Assessment: Insights, Predictive Insights and Challenges. Big Data and Cognitive Computing. 2025; 9(5):135. https://doi.org/10.3390/bdcc9050135

Chicago/Turabian Style

Zampeta, Vicky, Gregory Chondrokoukis, and Dimosthenis Kyriazis. 2025. "Applying Big Data for Maritime Accident Risk Assessment: Insights, Predictive Insights and Challenges" Big Data and Cognitive Computing 9, no. 5: 135. https://doi.org/10.3390/bdcc9050135

APA Style

Zampeta, V., Chondrokoukis, G., & Kyriazis, D. (2025). Applying Big Data for Maritime Accident Risk Assessment: Insights, Predictive Insights and Challenges. Big Data and Cognitive Computing, 9(5), 135. https://doi.org/10.3390/bdcc9050135
