Statistical Modeling of Bilge Water Discharge from Ships During Normal Operation

: This study estimated the amount of bilge water spilled from ships during normal operation and identiﬁed the contributing factors of the discharge by building a statistical model. To build the statistical model, we collected as much information as possible about the amount of bilge water in ships, then used the collected information to formulate a probability distribution of the discharge amount according to Bayesian statistics, and determined the parameters included in the model by applying a Markov Chain Monte Carlo method. The analysis of those parameters shows that the integrated bilge treatment system (IBTS) primarily contributes to reductions in the discharge amount, and that the container-type ship is involved with especially large discharge amounts.


Introduction
Normal maritime operations are one of the sources of hydrocarbon release into the ocean. Liquid, including hydrocarbons, is discharged through the washing of ballast and oil tanks. Bilge water (a mixture of water and oil accumulated in the bottom plate of a vessel, which includes a high concentration of organic hydrocarbons) is discharged into the marine environment for a number of reasons. Such discharge from vessels during normal operations is predicted to be a major source of hydrocarbon release into the ocean, along with runoff from rivers and sewage [1]. Bilge water shares a similarity with ballast water, as both are discharged from ships, then unfavorably influence the marine environment. A recent status of treating ballast water was reported by the authors of [2]. This paper presents a method for quantifying the amount of bilge water discharge in order to mitigate marine pollutions due to the normal ship operations.
Oil spills due to maritime accidents devastate the marine environment, especially if the spilled oil drifts on the surface of the sea and eventually arrives at coasts adjacent to the accident site. This marine environmental pollution attracts much attention, because the amount of oil spilled per unit time is extremely large. Meanwhile, the amount of discharge during normal operations per unit time is small. However, unlike oil spills due to maritime accidents, bilge water discharge occurs constantly from numerous vessels [3].
Recently, the United States Environmental Protection Agency (EPA) strengthened requirements for polluted liquid discharge on all commercial vessels greater than 79 feet in length under normal operations [4]. These vessels need to meet several requirements of the vessel general permit (VGP) when they enter within three nautical miles of the United States coastline or inland waters. This enhancement of the regulation suggests that more attention ought to be paid to chronic discharge, such as bilge water discharge, as well as to oil spillage due to maritime accidents. In order to identify which vessel specifications can be factors in bilge water discharge, this study attempted to build a statistical model that appropriately expresses the relationship of the vessel specification with discharge amount. The statistical model developed in this study enables us to quantify the bilge water discharge from various ship types. The establishment of such a statistical model provides useful information for refining the structural design of vessels that tend to discharge a relatively large amount of hydrocarbon, and for updating maritime regulations to protect the marine environment from pollution due to hydrocarbon discharge. In particular, the quantification of bilge water and identification of discharge factor can offer optimized usages of the IBTS and efficient eliminations of discharge factor.
In this study, data were collected through a search of the literature on the amount of bilge water discharged from ships during normal operations, together with information about ship type and construction year. The collected data was then used to build a Bayesian statistical model. Bayesian statistics is a powerful methodology for elucidating a relationship between two variables, even if there is only a small amount of information available. The authors of [5] constructed a receptor model to identify air pollution sources based on the Bayesian method. The authors of [6] used the Bayesian approach to assess the ecological risk in Tokyo surface water due to several toxic substances.
The IBTS mentioned above is expected to contribute to reduction in bilge water. Nevertheless, other factors that potentially result in the discharge of bilge water must be considered. The main purpose of this study was to identify other factors through examining the statistical properties of the parameters used in the model. This study hypothesized that a Bayesian statistical model that involves multiple parameters would provide us with a way to identify the other factors. This paper describes how such a model was built and discusses the factors influencing bilge water discharge by interpreting properties of that model.
In the subsequent sections, two statistical models are described: constant and variable discharge probability models. Through a comparison between those two models, this paper demonstrates the importance of considering uncertainties in the data and parameters used for constructing the model, thereby proving the usefulness of the statistical modeling.

Data Collection
Published papers and reports [7][8][9][10] were referred to for the data on ship type, ship size, amount of bilge water, and ship construction year (Table 1). In this table, bilge water amount is written in the unit of m 3 /day, which was converted from unit of m 3 per half year used by the authors of [8], and from m 3 per month used by the authors of [9].
One may claim that factors of the bilge water discharge can be revealed just by reading the values listed in Table 1. Such a simple analysis may be enough in order to qualitatively evaluate discharge amounts. However, it cannot serve as a reliable way to address the seawater pollution due to two problems. First, bilge water amount data are generally contaminated by errors through measurement and analysis processes. Second, a conclusion from the simple analysis is inapplicable to a prediction of the amount of discharge from many vessels other than the ones listed in Table 1. Statistical modeling is one of the rational methods for resolving these two problems and for maximizing the utility of collected data.

Constant Discharge Probability Model
The binomial distribution function was adopted to express the relationship of the response variable, that is, the amount of bilge water, with the explanatory variables, that is, factors contributing to bilge water discharge. The binomial distribution includes a single parameter which determines the occurrence probability of an independent trial, and is thus suitable to simply represent that relationship by regarding the parameter as the occurrence probability of a bilge water discharge event, hereafter referred to as the discharge probability.
The application of the binomial distribution to the present model has the disadvantage of representing the number of event occurrences, which is an integer. Thus, an amount of bilge water, which is originally a continuous value, needs to be converted into a discrete value. In this study, the amount of bilge water ranging from 0 m 3 /day to 0.5 m 3 /day was converted to 0 m 3 /day, that ranging from 0.5 m 3 /day to 1.0 m 3 /day was converted to 1 m 3 /day, and so forth. In this way, the next larger integer was assigned to the amount of bilge water as the amount increased by 0.5 m 3 /day. With this treatment, a larger amount of discharge was thought of as a larger number of times the trial, that is, the discharge event, occurs. It may be possible to apply the normal distribution function to the present model. The normal distribution function involves two parameters, i.e., mean and variance, both of which have to be specified at the same time for a statistical calculation. By emphasizing the simplicity of the mathematical expression of the binomial distribution function, which makes numerical computations correspondingly simpler, this study applied the binomial distribution. Denoting the discharge probability mentioned above as q, the likelihood, which measures the extent to which a specification of the parameter reproduces the observation, is expressed as, where N d k denotes the number of combinations that result from choosing k elements from a set of N d elements, N indicates the total number of the vessels in Table 1, and N d represents the maximum discrete value of discharge. The maximum likelihood method was applied to determine q, which, in this model, is constant. Searching for the value of q that maximizes the log-likelihood (Figure 1), q = 0.2218 was found to provide the best fit of the probability distribution function to the observation. Denoting the discharge probability mentioned above as q, the likelihood, which measures the extent to which a specification of the parameter reproduces the observation, is expressed as, where denotes the number of combinations that result from choosing elements from a set of elements, indicates the total number of the vessels in Table 1, and represents the maximum discrete value of discharge.
The maximum likelihood method was applied to determine , which, in this model, is constant. Searching for the value of that maximizes the log-likelihood (Figure 1), = 0.2218 was found to provide the best fit of the probability distribution function to the observation.
Comparing the means and variances computed using the observation (Table 1) and applying the maximum likelihood method revealed that, though the mean of the model agrees with the mean of the observation, the model underestimated the variance ( Table 2). The expected number of vessels that spill an amount can be calculated using the formula of the binomial distribution. Comparison of the calculated numbers with the observed one ( Figure 2) showed that this model failed to reproduce the dispersion exhibited by observation, i.e., the observation was overdispersed, demanding us to conduct a more sophisticated modeling, which is presented in the next section.   Comparing the means and variances computed using the observation (Table 1) and applying the maximum likelihood method revealed that, though the mean of the model agrees with the mean of the observation, the model underestimated the variance ( Table 2). The expected number of vessels that spill an amount k can be calculated using the formula of the binomial distribution. Comparison of the calculated numbers with the observed one ( Figure 2) showed that this model failed to reproduce the dispersion exhibited by observation, i.e., the observation was overdispersed, demanding us to conduct a more sophisticated modeling, which is presented in the next section. Denoting the discharge probability mentioned above as q, the likelihood, which measures the extent to which a specification of the parameter reproduces the observation, is expressed as, where denotes the number of combinations that result from choosing elements from a set of elements, indicates the total number of the vessels in Table 1, and represents the maximum discrete value of discharge.
The maximum likelihood method was applied to determine , which, in this model, is constant. Searching for the value of that maximizes the log-likelihood ( Figure 1), = 0.2218 was found to provide the best fit of the probability distribution function to the observation.
Comparing the means and variances computed using the observation (Table 1) and applying the maximum likelihood method revealed that, though the mean of the model agrees with the mean of the observation, the model underestimated the variance ( Table 2). The expected number of vessels that spill an amount can be calculated using the formula of the binomial distribution. Comparison of the calculated numbers with the observed one ( Figure 2) showed that this model failed to reproduce the dispersion exhibited by observation, i.e., the observation was overdispersed, demanding us to conduct a more sophisticated modeling, which is presented in the next section.

Variable Discharge Probability Model
Examination of the constant discharge probability model demonstrates that it is unable to incorporate the dispersion in the actual discharge amount data into its mathematical expression. This arises from treating the discharge probability as a constant parameter. A resolution of this issue necessitates an improved model involving individual differences in the discharge probability among the vessels.

Formulation
In the variable discharge probability model, the discharge probability q i is regarded as a variable that depends on i, the index number of vessels. It is assumed to have a relation to parameters β and γ i as follows: log which is generally referred to as the logit model, and the righthand side is referred to as linear predictor [11]. β is the parameter of discharge amount that all the vessels have in common, hereafter referred to as the common parameter; γ i (i = 1, · · · , N) is the parameter of discharge amount that varies for each vessel, hereafter referred to as the individual parameter. Rearranging Equation (2) gives the discharge probability q i as which gives q i monotonically increasing against β + γ i (Figure 3).

Variable Discharge Probability Model
Examination of the constant discharge probability model demonstrates that it is unable to incorporate the dispersion in the actual discharge amount data into its mathematical expression. This arises from treating the discharge probability as a constant parameter. A resolution of this issue necessitates an improved model involving individual differences in the discharge probability among the vessels.

Formulation
In the variable discharge probability model, the discharge probability is regarded as a variable that depends on , the index number of vessels. It is assumed to have a relation to parameters and as follows: which is generally referred to as the logit model, and the righthand side is referred to as linear predictor [11]. is the parameter of discharge amount that all the vessels have in common, hereafter referred to as the common parameter; ( = 1, ⋯ , ) is the parameter of discharge amount that varies for each vessel, hereafter referred to as the individual parameter. Rearranging Equation (2) gives the discharge probability as which gives monotonically increasing against ( Figure 3). The parameters , , and vary stochastically. This aspect is in contrast to the constant discharge probability model. Distributions of and are described in terms of other parameters, which are referred to as hyperparameters in the framework of the hierarchical Bayesian model [12].
The prior probability of the parameter is assumed to follow the normal distribution, expressed as The parameters q i , β, and γ i vary stochastically. This aspect is in contrast to the constant discharge probability model. Distributions of β and γ i are described in terms of other parameters, which are referred to as hyperparameters in the framework of the hierarchical Bayesian model [12].
The prior probability of the parameter β is assumed to follow the normal distribution, expressed as where σ β is a hyperparameter that quantifies the variance of the parameter β and is assumed to be a constant of 10.0, which is an uninformative prior.
The prior probability of the parameter γ i follows the normal distribution as where σ is also a hyperparameter that characterizes the Gaussian curve expressed by Equation (5) and is assumed to follow a uniform distribution. Multiplying Equation (4) by Equation (5) yields the final form of the prior distribution as The formulation of the prior distribution is followed by the formulation of a likelihood which conveys the information from observations into a model. In a manner similar to the preceding section, the probability of bilge water discharge from the i-th vessel (i = 1, · · · , N) follows the binomial distribution as follows: where k i denotes the discrete value of discharge amount from the i-th ship. Using Equations (6) and (7) and according to Bayes' theorem, the posterior probability distribution is written as, where p(k 1 , · · · , k N ) denotes the probability that the data k 1 , · · · , k N occur, which is independent on the parameters β, σ, r 1 , · · · · · · , r N , and is thus treated as a constant.

Numerical Method for Parameter Specification
This study applied the Metropolis-Hastings algorithm, one of the Markov Chain Monte Carlo (MCMC) methods [13], to the calculation of the posterior probability. In the MCMC method, numerous samples of a stochastic variable are randomly taken according to a probability density, and as the samples increase, the discrete set of samples gradually better approximates the continuous distribution of the probability.
Let θ be a parameter to which we hope to make inference. The Metropolis-Hastings algorithm generates random samples by consecutively proposing a new value θ * (Figure 4). If this proposed value is accepted according to a criterion mentioned below, it is set to be the next value θ j+1 , where j is a sampling sequence. If the proposed value is rejected, the previous value is retained, i.e., θ j+1 = θ j .
The criterion of whether a proposition is accepted or rejected computes a probability density p( θ * ), which is in proportion to the posterior probability and, accordingly, also in proportion to the likelihood. Then, the acceptance probability, defined as r ≡ p( θ * )/p θ j , is computed. With a probability of min(1, r), the proposition is accepted. Otherwise, it is rejected with a probability of 1 − min(1, r). When the number of samples grows enough, most of the disturbances involved in the initial guess of the parameters are removed and, consequently, a distribution of the samples with a density proportional to the posterior probability is determined.

Results and Discussion of Variable Discharge Probability Model
Trace plots of some parameters ( Figure 5) generated by MCMC sampling had increasing and decreasing trends, which resulted from the disturbances due to the initial guess of the parameters. To prevent the trends from affecting analyses of the posterior distribution, the first 3000 samples were discarded as burn-in, the remaining samples were used for the analyses. To see the relationship of numbers of vessels with the bilge water discharge amount, products of and (expected discharge amounts of each vessel) were computed for = 1, ⋯ , . Then, numbers of the vessels were counted for each discrete discharge amount. The numbers obtained in this way were compared with the number of vessels based on the observation ( Figure 6). The model was well consistent with the observation, and the variance calculated by the model was markedly improved compared to the variance obtained by the constant discharge probability model (Table 2).

Results and Discussion of Variable Discharge Probability Model
Trace plots of some parameters ( Figure 5) generated by MCMC sampling had increasing and decreasing trends, which resulted from the disturbances due to the initial guess of the parameters. To prevent the trends from affecting analyses of the posterior distribution, the first 3000 samples were discarded as burn-in, the remaining samples were used for the analyses.

Results and Discussion of Variable Discharge Probability Model
Trace plots of some parameters ( Figure 5) generated by MCMC sampling had increasing and decreasing trends, which resulted from the disturbances due to the initial guess of the parameters. To prevent the trends from affecting analyses of the posterior distribution, the first 3000 samples were discarded as burn-in, the remaining samples were used for the analyses. To see the relationship of numbers of vessels with the bilge water discharge amount, products of and (expected discharge amounts of each vessel) were computed for = 1, ⋯ , . Then, numbers of the vessels were counted for each discrete discharge amount. The numbers obtained in this way were compared with the number of vessels based on the observation ( Figure 6). The model was well consistent with the observation, and the variance calculated by the model was markedly improved compared to the variance obtained by the constant discharge probability model (Table 2). To see the relationship of numbers of vessels with the bilge water discharge amount, products of q i and N d (expected discharge amounts of each vessel) were computed for i = 1, · · · , N. Then, numbers of the vessels were counted for each discrete discharge amount. The numbers obtained in this way were compared with the number of vessels based on the observation ( Figure 6). The model was well consistent with the observation, and the variance calculated by the model was markedly improved compared to the variance obtained by the constant discharge probability model (Table 2).    The MCMC method generated the samples that approximate the posterior distributions of and . They were used to analyze the statistical properties of these parameters. Figures 7 and 8 show examples of the histograms of those samples and of the posterior distributions constructed using the histograms. Averages and 90% confidence intervals of and were calculated from the posterior distributions.   The MCMC method generated the samples that approximate the posterior distributions of and . They were used to analyze the statistical properties of these parameters. Figures 7 and 8 show examples of the histograms of those samples and of the posterior distributions constructed using the histograms. Averages and 90% confidence intervals of and were calculated from the posterior distributions.   The distributions of β and γ i determine the distributions of the discharge probability q i (Figure 8). The differences in q i among the vessels arise from the individual parameter γ i . Factors causing the discharge are discussed by examining γ i . As expected, the computed individual parameter γ i was larger for a larger discharge amount (Figure 9). A closer examination of the relationship between γ i and the discharge amount provides insight into the effect of the IBTS. The averages of γ i were computed separately for the vessels with and without the IBTS (Table 3). The comparison of γ i for the vessels with and without the IBTS shows that the IBTS certainly reduced the bilge water discharge amount. This was also demonstrated by analyzing the relationship of γ i to the year of construction ( Figure 10). Relatively new vessels constructed in the 1990s and 2000s were equipped with the IBTS and exhibited low γ i values. Under the MARPOL 73/78, the oil content of discharged effluent must not exceed 15 ppm. Both this regulation and the IBTS installation unquestionably suppressed the amount of discharge. The distributions of and determine the distributions of the discharge probability ( Figure 8). The differences in among the vessels arise from the individual parameter . Factors causing the discharge are discussed by examining . As expected, the computed individual parameter was larger for a larger discharge amount (Figure 9). A closer examination of the relationship between and the discharge amount provides insight into the effect of the IBTS. The averages of were computed separately for the vessels with and without the IBTS (Table 3). The comparison of for the vessels with and without the IBTS shows that the IBTS certainly reduced the bilge water discharge amount. This was also demonstrated by analyzing the relationship of to the year of construction ( Figure 10). Relatively new vessels constructed in the 1990s and 2000s were equipped with the IBTS and exhibited low values. Under the MARPOL 73/78, the oil content of discharged effluent must not exceed 15 ppm. Both this regulation and the IBTS installation unquestionably suppressed the amount of discharge.   Next, by examining in terms of the types of vessel, the question of which type of vessel has an especially large amount of discharge can be answered. Figure 11 depicts the individual parameter versus ship type. Whereas the vessels with the IBTS overall had smaller values than the vessels without the IBTS, special attention should be paid to the fact that of the vessels without the IBTS varied depending on the ship type. Container ships had a larger than the other types. The values of for two of the three container ships listed in Table 1 exceeded or approached the limit of the The distributions of and determine the distributions of the discharge probability ( Figure 8). The differences in among the vessels arise from the individual parameter . Factors causing the discharge are discussed by examining . As expected, the computed individual parameter was larger for a larger discharge amount (Figure 9). A closer examination of the relationship between and the discharge amount provides insight into the effect of the IBTS. The averages of were computed separately for the vessels with and without the IBTS ( Table 3). The comparison of for the vessels with and without the IBTS shows that the IBTS certainly reduced the bilge water discharge amount. This was also demonstrated by analyzing the relationship of to the year of construction ( Figure 10). Relatively new vessels constructed in the 1990s and 2000s were equipped with the IBTS and exhibited low values. Under the MARPOL 73/78, the oil content of discharged effluent must not exceed 15 ppm. Both this regulation and the IBTS installation unquestionably suppressed the amount of discharge.   Next, by examining in terms of the types of vessel, the question of which type of vessel has an especially large amount of discharge can be answered. Figure 11 depicts the individual parameter versus ship type. Whereas the vessels with the IBTS overall had smaller values than the vessels without the IBTS, special attention should be paid to the fact that of the vessels without the IBTS varied depending on the ship type. Container ships had a larger than the other types. The values of for two of the three container ships listed in Table 1 exceeded or approached the limit of the Next, by examining γ i in terms of the types of vessel, the question of which type of vessel has an especially large amount of discharge can be answered. Figure 11 depicts the individual parameter γ i versus ship type. Whereas the vessels with the IBTS overall had smaller γ i values than the vessels without the IBTS, special attention should be paid to the fact that γ i of the vessels without the IBTS varied depending on the ship type. Container ships had a larger γ i than the other types. The values of γ i for two of the three container ships listed in Table 1 exceeded or approached the limit of the 90% confidence interval. To more clearly bring confirmation of the dependence of γ i on the ship type, the averages of γ i were computed for each ship type ( Table 4). The container ships and unknown ships had greater averages of γ i than the other ship types.
Those results indicate that the container-type ship is somehow linked with the discharge of larger amounts of bilge water, and can be regarded as a secondary factor, while the absence of the IBTS constitutes the primary factor. Plotting the individual parameter γ i against ship size (gross tonnage (GT)) shows that the vessels of around 40000 GT had extremely large discharge amounts (Figure 12), which exceeded the 90% confidence interval.
It follows that the container-type ship and/or the vessel size of about 40000 GT were linked to a very large amount of bilge water discharge. It is unlikely, however, that ship size alone is related to the increase in the discharge. Technological problems might exist in the structural design of container ships without the IBTS, and/or there might be careless handling of the bilge water in those vessels.
The statistic model enables us to consider the uncertainties involved in the original data and, accordingly, to use results of the model to predict the bilge water discharge amount for a variety of vessels and propose an adequate strategy to improve ship designs toward smaller amounts of bilge water discharge. 90% confidence interval. To more clearly bring confirmation of the dependence of on the ship type, the averages of were computed for each ship type ( Table 4). The container ships and unknown ships had greater averages of than the other ship types. Those results indicate that the container-type ship is somehow linked with the discharge of larger amounts of bilge water, and can be regarded as a secondary factor, while the absence of the IBTS constitutes the primary factor. Plotting the individual parameter against ship size (gross tonnage (GT)) shows that the vessels of around 40000 GT had extremely large discharge amounts (Figure 12), which exceeded the 90% confidence interval.
It follows that the container-type ship and/or the vessel size of about 40000 GT were linked to a very large amount of bilge water discharge. It is unlikely, however, that ship size alone is related to the increase in the discharge. Technological problems might exist in the structural design of container ships without the IBTS, and/or there might be careless handling of the bilge water in those vessels.
The statistic model enables us to consider the uncertainties involved in the original data and, accordingly, to use results of the model to predict the bilge water discharge amount for a variety of vessels and propose an adequate strategy to improve ship designs toward smaller amounts of bilge water discharge.    unknown ships had greater averages of than the other ship types. Those results indicate that the container-type ship is somehow linked with the discharge of larger amounts of bilge water, and can be regarded as a secondary factor, while the absence of the IBTS constitutes the primary factor. Plotting the individual parameter against ship size (gross tonnage (GT)) shows that the vessels of around 40000 GT had extremely large discharge amounts (Figure 12), which exceeded the 90% confidence interval.
It follows that the container-type ship and/or the vessel size of about 40000 GT were linked to a very large amount of bilge water discharge. It is unlikely, however, that ship size alone is related to the increase in the discharge. Technological problems might exist in the structural design of container ships without the IBTS, and/or there might be careless handling of the bilge water in those vessels.
The statistic model enables us to consider the uncertainties involved in the original data and, accordingly, to use results of the model to predict the bilge water discharge amount for a variety of vessels and propose an adequate strategy to improve ship designs toward smaller amounts of bilge water discharge.  Individual parameter Individual parameter Figure 12. Relationship between ship size and individual parameters. Opened and closed circles are plots for vessels with IBTS and without IBTS, respectively. Cross symbols are plots for vessels for which information on the install of IBTS was unavailable. Solid line is mean of individual parameters, and dashed lines are upper and lower limits of 90% confidence interval.

Conclusions
For the purpose of identifying the major contributing factors of bilge water discharge from ships into the surrounding marine environment during normal operations, this study developed a statistical model that expresses the relationship of the amount of bilge water discharge with a few explanatory variables. This study collected as much data on bilge water as possible and used them to build the statistical model according to Bayesian statistics. Parameters included in the model were determined by applying the Metropolis-Hastings algorithm, one of the MCMC methods. The examination of the model results confirmed that the model well represents the expected values and variances of the probability distribution of the discharge amount. Interpretations of those parameters showed that the container-type ship is related to notably large amounts of bilge water discharge, while this large amount of discharge arises principally from the absence of the IBTS in the vessels.

Conflicts of Interest:
The authors declare no conflict of interest.