Open Access Article

Parameter Estimation with Data-Driven Nonparametric Likelihood Functions

by Shixiao W. Jiang 1,† and John Harlim 1,2,3,*,†
1 Department of Mathematics, the Pennsylvania State University, 109 McAllister Building, University Park, PA 16802-6400, USA
2 Department of Meteorology and Atmospheric Science, the Pennsylvania State University, 503 Walker Building, University Park, PA 16802-5013, USA
3 Institute for CyberScience, the Pennsylvania State University, 224B Computer Building, University Park, PA 16802, USA
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2019, 21(6), 559;
Received: 24 April 2019 / Revised: 30 May 2019 / Accepted: 1 June 2019 / Published: 3 June 2019
(This article belongs to the Special Issue Information Theory and Stochastics for Multiscale Nonlinear Systems)
In this paper, we consider a surrogate modeling approach that uses a data-driven nonparametric likelihood function constructed on the manifold on which the data lie (or to which they are close). The proposed method represents the likelihood function with a spectral expansion known as the kernel embedding of the conditional distribution. To respect the geometry of the data, we build this expansion from a set of data-driven basis functions obtained with the diffusion maps algorithm. The theoretical error estimate suggests that the error bound of the approximate data-driven likelihood function is independent of the variance of the basis functions, which allows us to determine the amount of training data needed for accurate likelihood function estimation. Numerical results demonstrating the robustness of the data-driven likelihood functions for parameter estimation are given on instructive examples involving stochastic and deterministic differential equations. When the dimension of the data manifold is strictly less than the dimension of the ambient space, we found that the proposed approach (which does not require knowledge of the data manifold) is superior to likelihood functions constructed using standard parametric basis functions defined on the ambient coordinates. In an example where the data manifold is unknown and not smooth, the proposed method is more robust than an existing polynomial chaos surrogate model that assumes a parametric likelihood, the non-intrusive spectral projection. In fact, whenever direct MCMC can be performed, the estimation accuracy is comparable to that of direct MCMC estimates while requiring only eight likelihood function evaluations, which can be done offline, as opposed to 4000 sequential function evaluations. Robust and accurate estimates are also obtained using a likelihood function trained on statistical averages of the chaotic 40-dimensional Lorenz-96 model over a wide parameter domain.
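
The two ingredients named in the abstract (a diffusion-maps basis and a spectral expansion of the likelihood) can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a fixed-bandwidth Gaussian kernel with the α = 1/2 normalization, under which the leading eigenvectors approximate functions φ_k that are (approximately) orthonormal in L²(q), where q is the sampling density of the pooled training data. With such a basis one can write p(y | θ) ≈ q(y) Σ_k c_k(θ) φ_k(y), and orthonormality gives c_k(θ) = E[φ_k(y) | θ], which is estimated here by a sample average. The function names `diffusion_basis` and `likelihood_ratio`, the bandwidth `eps`, the number of modes, and the toy data y | θ ~ N(θ, 1) are hypothetical choices for illustration only; the ratio is evaluated only at training points, since new points would need an out-of-sample extension that is not shown.

```python
import numpy as np


def diffusion_basis(Y, eps, n_modes):
    """Hypothetical helper: diffusion-maps basis with alpha = 1/2 normalization.

    Y has shape (N, d). Returns an (N, n_modes) array whose columns approximate
    eigenfunctions that are approximately orthonormal in L^2(q), where q is the
    density the rows of Y were sampled from. Dense, fixed bandwidth: toy only.
    """
    sq = np.sum(Y**2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    K = np.exp(-dist2 / eps)                  # Gaussian kernel matrix
    d = K.sum(axis=1)                         # unnormalized kernel density estimate
    K_alpha = K / np.sqrt(np.outer(d, d))     # alpha = 1/2 density normalization
    D = K_alpha.sum(axis=1)
    S = K_alpha / np.sqrt(np.outer(D, D))     # symmetric conjugate of D^{-1} K_alpha
    _, V = np.linalg.eigh(S)                  # eigenvalues in ascending order
    V = V[:, ::-1][:, :n_modes]               # leading eigenvectors first
    Phi = V / np.sqrt(D)[:, None]             # right eigenvectors of D^{-1} K_alpha
    Phi /= np.sqrt(np.mean(Phi**2, axis=0))   # unit empirical second moment
    Phi[:, 0] *= np.sign(Phi[0, 0])           # first mode is constant; make it +1
    return Phi


def likelihood_ratio(Phi, groups, m):
    """Estimate p(y_i | theta_m) / q(y_i) at every training point y_i."""
    c = Phi[groups == m].mean(axis=0)         # c_k(theta_m) ~ E[phi_k(y) | theta_m]
    return Phi @ c                            # truncated sum_k c_k phi_k(y_i)


# Toy training data: 200 samples of y | theta ~ N(theta, 1) for three thetas.
rng = np.random.default_rng(0)
thetas = np.array([-2.0, 0.0, 2.0])
n = 200
Y = np.concatenate([rng.normal(t, 1.0, size=n) for t in thetas])[:, None]
groups = np.repeat(np.arange(len(thetas)), n)

Phi = diffusion_basis(Y, eps=0.5, n_modes=10)
r = likelihood_ratio(Phi, groups, m=0)        # ratio for theta = -2
i_near = np.argmin(np.abs(Y[:, 0] + 2.0))     # training point closest to y = -2
i_far = np.argmin(np.abs(Y[:, 0] - 2.0))      # training point closest to y = +2
print(r[i_near], r[i_far])
```

For θ = −2 one would expect the estimated ratio to be well above 1 near y = −2 and close to 0 near y = 2. Turning this into a surrogate likelihood for MCMC would additionally require interpolating the coefficients c_k(θ) between the training parameter values, which this sketch leaves out.
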
Keywords: Bayesian inference; MCMC; diffusion maps; nonparametric likelihood function; surrogate modeling; reproducing kernel Hilbert space; kernel embedding of the conditional distribution

MDPI and ACS Style

Jiang, S.W.; Harlim, J. Parameter Estimation with Data-Driven Nonparametric Likelihood Functions. Entropy 2019, 21, 559.
