A Comparative Study of Statistical Techniques for Prediction of Meteorological and Oceanographic Conditions: An Application in Sea Spray Icing
Abstract
1. Introduction
2. Methods
2.1. Bayesian Inference
Gaussian Data-Generating Process
 $\mu $: mean of the data-generating process;
 ${\sigma}_{*}^{2}$: known variance of the data-generating process;
 $\left({\mu}_{H},{\sigma}_{H}^{2}\right)$: hyperparameters of Gaussian prior distribution;
 $\overline{x}$: sample mean;
 $\left({\mu}_{H}^{\prime},{\sigma}_{H}^{2\prime}\right)$: hyperparameters of Gaussian posterior distribution;
 $\left({\mu}_{+},{\sigma}_{+}^{2}\right)$: parameters of Gaussian predictive distribution.
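For reference, the quantities listed above are linked by the standard conjugate update for a Gaussian data-generating process with known variance. The sketch below implements that textbook result; it is not necessarily the exact formulation used in this study, and the function name `gaussian_update` and the example numbers are illustrative assumptions.

```python
from statistics import fmean


def gaussian_update(data, sigma2_star, mu_h, sigma2_h):
    """Conjugate Gaussian update with known data variance (textbook form).

    Returns (posterior mean, posterior variance,
             predictive mean, predictive variance).
    """
    n = len(data)
    x_bar = fmean(data)
    # Posterior precision = prior precision + n * data precision.
    post_var = 1.0 / (1.0 / sigma2_h + n / sigma2_star)
    # Posterior mean is a precision-weighted average of the prior mean
    # and the sample mean.
    post_mean = post_var * (mu_h / sigma2_h + n * x_bar / sigma2_star)
    # Predictive distribution for one future observation: same mean,
    # posterior variance plus the known data variance.
    return post_mean, post_var, post_mean, post_var + sigma2_star


# Illustrative numbers only (not taken from the study).
mu_post, var_post, mu_pred, var_pred = gaussian_update(
    [1.0, 1.0, 1.0, 1.0], sigma2_star=1.0, mu_h=0.0, sigma2_h=1.0
)
```

In this form the posterior mean moves from the prior mean toward the sample mean as the sample size grows, and the predictive variance is never smaller than the known data variance.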
2.2. Sequential Importance Sampling
2.2.1. Sequential Importance Sampling for Markov Processes
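As a reminder of the standard result behind this subsection, the weights defined in the List of Symbols can be updated recursively when both the target and the proposal are Markov. The block below is a sketch in the usual textbook form; writing the conditional densities as $f^{t}(\cdot \mid \cdot)$ and $g^{t}(\cdot \mid \cdot)$ is a notational assumption, and the study's own equations may be arranged differently.

```latex
% One-step weight for the transition x^{t-1} -> x^{t}
u^{t} = \frac{f^{t}\left(x^{t} \mid x^{t-1}\right)}{g^{t}\left(x^{t} \mid x^{t-1}\right)}

% Recursive weight for the whole path x^{1:t}
w^{t} = w^{t-1}\, u^{t}

% Self-normalized estimate from M sampled paths, indexed by j
E_{f^{t}}\left[h\left(X^{1:t}\right)\right] \approx
  \frac{\sum_{j=1}^{M} w_{j}^{t}\, h\left(x_{j}^{1:t}\right)}{\sum_{j=1}^{M} w_{j}^{t}}
```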
2.3. Markov Chain Monte Carlo
2.3.1. The Metropolis–Hastings Algorithm
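To make the accept/reject step concrete, the following is a minimal random-walk Metropolis–Hastings sketch. The Gaussian target, the step size, the burn-in length, and all function names are illustrative assumptions and are not taken from the study; the names `candidate` and `alpha` correspond to ${\theta}^{\prime}$ and $\alpha $ in the List of Symbols.

```python
import math
import random


def metropolis_hastings(log_target, theta0, n_iter, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    theta = theta0
    chain = []
    for _ in range(n_iter):
        # Draw a candidate theta' around the current state.
        candidate = theta + rng.gauss(0.0, step)
        # Acceptance probability alpha = min(1, f(theta') / f(theta));
        # the proposal ratio cancels because the proposal is symmetric.
        log_ratio = log_target(candidate) - log_target(theta)
        alpha = math.exp(min(0.0, log_ratio))
        if rng.random() < alpha:
            theta = candidate  # accept theta'
        chain.append(theta)    # on rejection the previous state is repeated
    return chain


def log_normal_density(x, mu=2.0, sigma=1.0):
    # Unnormalized log-density of the illustrative N(2, 1) target.
    return -0.5 * ((x - mu) / sigma) ** 2


rng = random.Random(0)
chain = metropolis_hastings(
    log_normal_density, theta0=0.0, n_iter=5000, step=1.0, rng=rng
)
kept = chain[500:]  # discard an arbitrary burn-in period
estimate = sum(kept) / len(kept)
```

Working with log-densities avoids numerical underflow when the target takes very small values, which is why the ratio is formed as a difference before exponentiating.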
2.3.2. Convergence Diagnostic
2.4. Proposed Models
2.4.1. Proposed Bayesian Approach
2.4.2. Proposed Sequential Importance Sampling Algorithm
Algorithm 1 Proposed sequential importance sampling (SIS) for prediction of meteorological and oceanographic conditions. 

2.4.3. Proposed Markov Chain Monte Carlo Algorithm
Algorithm 2 Proposed Markov chain Monte Carlo (MCMC) for prediction of meteorological and oceanographic conditions. 

3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
List of Acronyms.  
Acronym  Meaning 
AAD  Average Absolute Deviation 
ACO  Ant Colony Optimization 
CV  Coefficient of Variation 
DOE  Design of Experiments 
IS  Importance Sampling 
MCDM  Multi-Criteria Decision-Making 
MCMC  Markov Chain Monte Carlo 
MCMC200  Markov Chain Monte Carlo with 200 iterations 
MCMC500  Markov Chain Monte Carlo with 500 iterations 
MCS  Monte Carlo Simulation 
MINCOG  Marine-Icing model for the Norwegian COast Guard 
MLE  Maximum Likelihood Estimation 
NORA10  NOrwegian ReAnalysis 10 km 
NSR  Northern Sea Route 
RAMS  Reliability, Availability, Maintainability, and Safety 
PDF  Probability Density Function 
SIR  Sampling Importance Resampling 
SIS  Sequential Importance Sampling 
SIS200  Sequential Importance Sampling with 200 iterations 
SIS500  Sequential Importance Sampling with 500 iterations 
SMC  Sequential Monte Carlo 
List of Symbols.  
Symbol  Meaning 
$\left(a,b\right)$  The parameters of the Weibull distribution 
$A$  A positive constant added to shift the data in Weibull estimation, since the Weibull distribution does not support non-positive values. 
$C{V}_{m}$  CV for the $m$ last drawn samples in the MCMC algorithm iterations 
$C{V}^{T}$  A threshold for CV 
$D$  The number of days in a year, which adopts the values 365 and 366 for normal and leap years, respectively. 
$D{M}^{t,y}\left(\theta \right)$  The daily mean of the parameter $\theta $ at time $t$ in year $y$. Here, ‘time’ is referring to ‘day’. 
$D{V}^{t,y}\left(\theta \right)$  Deviation of $D{M}^{t,y}\left(\theta \right)$ from its value at time ‘$t-1$’ in year $y$. Here, ‘time’ is referring to ‘day’. 
$E\left(X\right)$  The expected value of $X$ 
${E}_{{f}^{t}}\left[h\left({X}^{1:t}\right)\right]$  The expected value of a quantity of interest, $h\left({X}^{1:t}\right)$, with respect to ${f}^{t}$ 
$f$  A target density 
$f\left(\theta \right)$  Prior distribution of the parameter $\theta $ 
$f\left(\theta \mid x\right)$  Posterior distribution of the parameter $\theta $ given the data $x$ 
$f\left(x\mid \theta \right)$  The likelihood function of the data in hand, given the parameter $\theta $ 
${f}^{t}$  Target density of a discrete-time sequential random variable at time $t$ 
$g$  Proposal density or envelope for $f$ 
${g}^{t}$  Proposal density or envelope for ${f}^{t}$ 
$h$  An arbitrary function 
$H$  Indices for hyperparameters 
H0  The null hypothesis in the Anderson–Darling test of hypothesis 
H1  The alternative hypothesis in the Anderson–Darling test of hypothesis 
$i$  Subscript index for samples; $i=1,2,\dots ,n$ 
$I{S}_{j}\left(\theta \right)$  The weighted average of all drawn samples until iteration $j$ for $\theta $, using IS weights 
$j$  Subscript index as iteration counter of algorithms; $j=1,2,\dots ,M$ 
${k}_{\left(z\right)}$  Center of the ${z}^{th}$ bin in the kernel density estimation 
$M$  Number of iterations of an algorithm 
$MCM{C}^{t}\left(\theta \right)$  MCMC estimation for $\theta $ at time $t$ 
$n$  Sample size 
p  The parameter of Binomial distribution 
${s}^{t,y}\left(\theta \right)$  Possible values (i.e. state space) for the parameter $\theta $ in the SIS algorithm at time $t$ in year $y$. Here, ‘time’ is referring to ‘day’. The values are based on the historical deviations from the daily mean of the parameter in the previous day. 
$S\left(X\right)$  Sample standard deviation of $X$ 
${S}^{t}\left(\theta \right)$  Set of ${s}^{t,y}\left(\theta \right)$ for all years; ${S}^{t}\left(\theta \right)=\left\{{s}^{t,1}\left(\theta \right),\dots ,{s}^{t,Y}\left(\theta \right)\right\}$ 
$SI{S}^{t}\left(\theta \right)$  SIS estimation for $\theta $ at time $t$ 
$t$  Superscript index for the time in a discretetime sequential process. Without loss of generality, ‘time’ is referring to ‘day’ in this study. 
${u}^{t}$  IS weight for $\left({x}^{t}\mid {x}^{t-1}\right)$ in a Markov process 
${u}_{j}^{t}$  IS weight for $\left({x}^{t}\mid {x}^{t-1}\right)$ in a Markov process for a drawn sample in iteration $j$ 
${w}_{j}$  IS weight for a drawn sample in iteration $j$ 
${w}^{t}$  IS weight for ${x}^{1:t}$ 
${w}_{j}^{t}$  IS weight for ${x}^{1:t}$ in iteration $j$ 
${W}^{t}$  Set of ${w}_{j}^{t}$ from iterations of SIS algorithm; ${W}^{t}=\left\{{w}_{1}^{t},\dots ,{w}_{M}^{t}\right\}$ 
$x$  The available data on the dataset 
${x}_{i}$  The ith sample of $x$ 
${x}^{+}$  Unobserved data of the random variable $X$ in the future 
${x}^{t}$  A sample for ${X}^{t}$ 
$\overline{x}$  Sample mean 
$X$  A random variable 
${X}^{t}$  A discrete-time sequential random variable at time $t$ 
${X}^{1:t}=\left({X}^{1},\dots ,{X}^{t}\right)$  A discrete-time stochastic process representing the entire history of the sequence of a random variable 
${x}^{1:t}$  A sample for ${X}^{1:t}$ 
${x}_{i}^{1:t}$  The ith sample for ${X}^{1:t}$ 
$y$  Superscript index for years; $y=1,\dots ,Y$ 
$Y$  Number of years from the dataset that are used for estimation 
$z$  Subscript index for bins in the kernel density estimation 
$\alpha $  Acceptance probability in the Metropolis–Hastings algorithm 
$\theta $  Generic parameter that is supposed to be estimated 
${\theta}^{\prime}$  A drawn sample for parameter $\theta $, which might be accepted or rejected 
${\theta}_{j}$  An accepted sample for parameter $\theta $ in iteration $j$ 
$\lambda $  The parameter of Poisson distribution 
$\mu $  Mean of the data-generating process 
$\left({\mu}_{H},{\sigma}_{H}^{2}\right)$  Hyperparameters of Gaussian prior distribution 
$\left({\mu}_{H}^{\prime},{\sigma}_{H}^{2\prime}\right)$  Hyperparameters of Gaussian posterior distribution 
$\left({\mu}_{+},{\sigma}_{+}^{2}\right)$  Parameters of Gaussian predictive distribution 
$\left({\mu}_{re},{\sigma}_{re}^{2}\right)$  Parameters of reanalysis values in 2012 
${\sigma}_{*}^{2}$  The known variance of the data-generating process 
Parameter  Number of Days in the Year for Which H0 Cannot Be Rejected  Percentage of Days in the Year for Which H0 Cannot Be Rejected 

Wave height  245  67% 
Wind speed  330  90% 
Temperature  257  70% 
Relative humidity  284  78% 
Atmospheric pressure  346  95% 
Wave period  180  49% 
Parameter  Value 

${\sigma}_{*}^{2}$  1.12 
$\left({\mu}_{H},{\sigma}_{H}^{2}\right)$  (−3.49, 10.29) 
$\overline{x}$  (−5.20, 8.52) 
$\left({\mu}_{H}^{\prime},{\sigma}_{H}^{2\prime}\right)$  (−5.16, 0.25) 
$\left({\mu}_{+},{\sigma}_{+}^{2}\right)$  (−5.16, 1.50) 
$\left({\mu}_{re},{\sigma}_{re}^{2}\right)$  (−4.44, 0.95) 
Month  Bayesian  SIS200  SIS500  MCMC200  MCMC500 

Jan  1.00  1.03  0.94  0.99  0.97 
Feb  0.97  1.19  1.25  1.00  0.97 
Mar  0.89  0.96  1.12  0.97  0.84 
Apr  0.65  0.74  0.63  0.67  0.70 
May  0.98  1.10  1.04  0.95  0.95 
Jun  0.54  0.62  0.52  0.66  0.53 
Jul  0.42  0.38  0.46  0.47  0.52 
Aug  0.54  0.56  0.59  0.56  0.74 
Sep  0.82  0.83  1.02  0.89  1.07 
Oct  0.99  0.94  1.17  1.15  1.22 
Nov  0.66  0.84  0.75  0.78  0.83 
Dec  1.03  1.48  1.37  1.15  1.09 
Month  Bayesian  SIS200  SIS500  MCMC200  MCMC500 

Jan  3.39  3.37  3.31  3.76  3.52 
Feb  2.39  2.23  2.31  2.29  2.38 
Mar  2.69  3.14  2.92  2.77  2.89 
Apr  1.99  2.07  2.65  1.92  2.18 
May  2.77  2.50  2.84  2.60  2.54 
Jun  2.33  2.49  2.49  2.22  2.14 
Jul  1.56  1.86  1.90  1.65  1.53 
Aug  2.48  2.51  2.60  2.35  2.62 
Sep  2.90  3.62  3.23  3.01  3.00 
Oct  2.85  3.18  4.10  3.22  3.07 
Nov  2.63  3.13  3.48  2.43  2.59 
Dec  2.83  3.61  3.64  2.94  2.91 
Month  Bayesian  SIS200  SIS500  MCMC200  MCMC500 

Jan  3.13  4.36  4.29  5.95  5.99 
Feb  4.25  4.88  5.16  7.25  6.94 
Mar  2.76  3.19  3.16  4.49  5.57 
Apr  1.83  2.58  2.26  2.40  2.18 
May  1.68  1.85  2.17  1.64  1.98 
Jun  0.63  0.64  0.76  0.67  0.57 
Jul  0.75  0.84  0.71  0.80  0.76 
Aug  0.85  0.83  0.97  0.89  0.78 
Sep  1.33  1.83  1.25  1.43  1.42 
Oct  1.41  2.44  2.46  2.07  2.12 
Nov  2.34  2.75  3.02  3.38  3.04 
Dec  2.12  3.31  3.44  3.55  3.49 
Month  Bayesian  SIS200  SIS500  MCMC200  MCMC500 

Jan  5.31  5.59  5.20  5.82  6.02 
Feb  9.54  9.34  9.33  8.21  8.25 
Mar  5.94  7.53  7.49  6.28  5.16 
Apr  9.72  10.39  9.59  9.94  9.65 
May  8.22  8.82  8.57  8.60  8.54 
Jun  5.54  5.41  5.25  4.96  4.88 
Jul  6.48  5.98  7.34  6.36  6.36 
Aug  6.55  6.14  7.52  5.95  5.99 
Sep  8.81  10.47  9.22  9.63  9.75 
Oct  6.46  8.41  9.76  6.19  6.21 
Nov  9.98  13.56  11.86  11.59  11.29 
Dec  6.97  9.38  8.16  8.97  7.67 
Month  Bayesian  SIS200  SIS500  MCMC200  MCMC500 

Jan  14.20  14.09  15.64  13.07  14.57 
Feb  17.13  15.96  16.92  16.46  16.86 
Mar  10.58  12.53  12.60  11.55  12.91 
Apr  9.35  12.01  9.41  8.40  10.38 
May  9.80  11.05  9.92  10.46  9.95 
Jun  4.40  5.05  4.20  4.84  4.88 
Jul  7.07  8.74  7.66  7.45  7.76 
Aug  7.62  8.29  6.44  7.46  6.84 
Sep  8.79  9.35  10.88  10.40  9.69 
Oct  8.46  8.41  12.09  8.68  7.70 
Nov  12.85  12.20  14.82  12.36  12.79 
Dec  16.90  17.16  18.81  17.05  16.12 
Month  Bayesian  SIS200  SIS500  MCMC200  MCMC500 

Jan  1.06  1.14  1.08  2.00  1.82 
Feb  1.14  1.59  1.38  2.33  3.38 
Mar  0.99  1.40  1.20  1.52  2.16 
Apr  0.78  0.81  0.80  1.34  1.61 
May  1.03  1.36  1.29  1.47  1.56 
Jun  0.63  1.03  0.77  0.70  0.67 
Jul  0.70  0.97  0.79  0.84  0.71 
Aug  0.60  0.62  0.70  0.71  0.96 
Sep  0.64  0.63  0.88  0.89  0.61 
Oct  0.98  1.22  1.06  1.13  1.01 
Nov  0.63  0.98  1.13  0.83  0.73 
Dec  0.89  1.01  1.25  1.28  1.47 
Month  Bayesian  SIS200  SIS500  MCMC200  MCMC500 

Jan  0.08  0.19  0.20  0.34  0.35 
Feb  0.13  0.19  0.22  0.41  0.39 
Mar  0.08  0.12  0.12  0.22  0.29 
Apr  0.07  0.12  0.10  0.10  0.09 
May  0.00  0.00  0.02  0.01  0.02 
Jun  0.00  0.00  0.00  0.00  0.00 
Jul  0.00  0.00  0.00  0.00  0.00 
Aug  0.00  0.00  0.00  0.00  0.00 
Sep  0.00  0.00  0.00  0.00  0.00 
Oct  0.00  0.01  0.01  0.00  0.00 
Nov  0.03  0.04  0.04  0.07  0.04 
Dec  0.06  0.11  0.16  0.14  0.14 
Location  Bayesian  SIS200  SIS500  MCMC200  MCMC500 

Coordinates (74.07° N, 35.81° E)  00:00:01  00:00:05  00:00:22  00:00:12  00:00:19 
Entire area  00:04:41  00:44:29  02:00:03  01:28:28  02:22:14 
t-test Parameter  30 Years  32 Years 

Mean  2.68  2.61 
Variance  4.32  3.80 
Observations  365  365 
df  725   
t Stat  0.45   
p-value  0.65   
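The reported 725 degrees of freedom are consistent with a two-sample t-test that does not assume equal variances (Welch's test). That reading is an inference from the table, not something the table states. The sketch below recomputes the test from the rounded summary statistics printed above, so its output can differ slightly from the reported values; the variable names are illustrative.

```python
import math

# Summary statistics as printed in the table (rounded to two decimals).
mean_30, var_30, n_30 = 2.68, 4.32, 365
mean_32, var_32, n_32 = 2.61, 3.80, 365

# Squared standard errors of the two sample means.
se2_30 = var_30 / n_30
se2_32 = var_32 / n_32

# Welch's t statistic (unequal variances assumed).
t_stat = (mean_30 - mean_32) / math.sqrt(se2_30 + se2_32)

# Welch-Satterthwaite approximation to the degrees of freedom.
df = (se2_30 + se2_32) ** 2 / (
    se2_30 ** 2 / (n_30 - 1) + se2_32 ** 2 / (n_32 - 1)
)

# With df this large the t distribution is close to standard normal,
# so a normal approximation to the two-sided p-value is used here.
p_approx = math.erfc(abs(t_stat) / math.sqrt(2.0))

print(f"t = {t_stat:.2f}, df = {df:.0f}, p ~ {p_approx:.2f}")
```

The degrees of freedom round to 725, matching the table. The statistic and p-value computed from the rounded inputs land close to, but not exactly on, the reported 0.45 and 0.65.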
