In this work, we study the problem of inferring a discrete probability distribution from both expert knowledge and empirical data. This problem matters in many applications where data scarcity rules out a purely empirical approach. In this setting, it is common to start from a prior built on initial domain knowledge and then to acquire data online. We are particularly interested in the intermediate regime, where there is not enough data to dispense with the experts' prior, but enough to correct it if necessary. We present a novel method that provides an objective way to choose the weight given to the experts relative to the data. We show, both empirically and theoretically, that the proposed estimator always performs at least as well as the better of the two models (expert or data), up to a constant.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.