Entropy 2014, 16(3), 1376-1395; doi:10.3390/e16031376 (doi registration under processing) - published online 7 March 2014
Abstract: Conditional independence tests have recently received special attention in the machine learning and computational intelligence literature as an important indicator of the relationships among the variables used by their models. In the field of probabilistic graphical models, which includes Bayesian network models, conditional independence tests are especially important for the task of learning the structure of a probabilistic graphical model from data. In this paper, we propose the full Bayesian significance test (FBST) for testing conditional independence in discrete datasets. The FBST is a powerful Bayesian test for precise hypotheses and an alternative to frequentist significance tests, which are characterized by the calculation of a p-value.
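To make the FBST e-value concrete for the hypothesis "X is independent of Y given Z", here is a minimal Monte Carlo sketch under stated assumptions: a uniform Dirichlet(1) prior over all joint cells of a discrete table, the posterior density itself as the surprise function (flat reference density), and a hypothetical function name `fbst_ci_evalue`. It is not the paper's implementation. The supremum of the posterior density over the null set is attained at theta(x,y,z) = n(x,z) n(y,z) / (n(z) N), and the e-value in support of H is one minus the posterior probability of the tangential set {theta : f(theta) > f*}.

```python
import math
import random

def fbst_ci_evalue(counts, n_samples=20000, seed=0):
    """Monte Carlo FBST e-value for H: X independent of Y given Z.

    counts[x][y][z] is the observed frequency of cell (x, y, z).
    Assumption: uniform Dirichlet(1) prior over all cells, so the posterior
    is Dirichlet(counts + 1) with density proportional to prod(theta_c ** n_c).
    """
    nx, ny, nz = len(counts), len(counts[0]), len(counts[0][0])
    cells = [(x, y, z) for x in range(nx) for y in range(ny) for z in range(nz)]
    n = {c: counts[c[0]][c[1]][c[2]] for c in cells}
    total = sum(n.values())

    # Marginal counts n(x,z), n(y,z) and n(z) that define the constrained maximum.
    n_xz = {(x, z): sum(n[(x, y, z)] for y in range(ny)) for x in range(nx) for z in range(nz)}
    n_yz = {(y, z): sum(n[(x, y, z)] for x in range(nx)) for y in range(ny) for z in range(nz)}
    n_z = {z: sum(n_xz[(x, z)] for x in range(nx)) for z in range(nz)}

    def log_density(theta):
        # Unnormalised log posterior density; empty cells contribute nothing.
        return sum(n[c] * math.log(theta[c]) for c in cells if n[c] > 0)

    # Point of maximum posterior density inside the null set.
    theta_star = {}
    for (x, y, z) in cells:
        if n_z[z] > 0:
            theta_star[(x, y, z)] = n_xz[(x, z)] * n_yz[(y, z)] / (n_z[z] * total)
        else:
            theta_star[(x, y, z)] = 0.0
    log_f_star = log_density(theta_star)

    rng = random.Random(seed)
    outside_tangential = 0
    for _ in range(n_samples):
        # Dirichlet draw via normalised Gamma(n_c + 1, 1) variates.
        g = {c: rng.gammavariate(n[c] + 1, 1.0) for c in cells}
        s = sum(g.values())
        theta = {c: g[c] / s for c in cells}
        # Draws with f(theta) <= f* lie outside the tangential set.
        if log_density(theta) <= log_f_star:
            outside_tangential += 1
    return outside_tangential / n_samples
```

Small returned values indicate evidence against conditional independence; with a single level of Z the test reduces to ordinary independence of X and Y.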
Entropy 2014, 16(3), 1365-1375; doi:10.3390/e16031365 - published online 4 March 2014
Abstract: Information theory provides, among other things, methods to quantify the amount of information contained in a single random variable and the amount of information contained in and shared among two or more variables. Although these concepts have been successfully applied in hydrology and other fields, the evaluation of these quantities is sensitive to the assumptions made when estimating probabilities. One example is the histogram bin size used to estimate probabilities when information-theoretic quantities are calculated via frequency methods. The present research introduces a method that takes the uncertainty arising from these parameters into account in the evaluation of the North Sea’s water level network. The main idea is that the entropy of a random variable can be represented as a probability distribution of possible values instead of a deterministic value. The method consists of solving multiple scenarios of a multi-objective optimization problem in which information content is maximized and redundancy is minimized. Results include a probabilistic analysis of the chosen parameters on the resulting family of Pareto fronts, which provides additional criteria for selecting the final set of monitoring points.
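The sensitivity of frequency-based entropy to bin size can be seen with a short sketch: the same series discretised with different bin widths gives different plug-in entropy estimates, and summarising those estimates is one simple way to treat entropy as a spread of values rather than a single number. The function names and the idea of supplying an explicit list of candidate widths are illustrative assumptions, not the paper's scenario design.

```python
import math
import statistics
from collections import Counter

def histogram_entropy(values, bin_width):
    """Plug-in Shannon entropy (bits) of values discretised into fixed-width bins.

    Bins start at min(values); bin_width must be positive.
    """
    lo = min(values)
    counts = Counter(math.floor((v - lo) / bin_width) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_over_bin_widths(values, bin_widths):
    """Entropy estimate for each candidate bin width, keyed by width."""
    return {w: histogram_entropy(values, w) for w in bin_widths}

def entropy_summary(values, bin_widths):
    """Mean and population standard deviation of the entropy estimates."""
    estimates = list(entropy_over_bin_widths(values, bin_widths).values())
    return statistics.mean(estimates), statistics.pstdev(estimates)
```

For the series [0, 1, 2, 3], widths 1, 2 and 4 give 2, 1 and 0 bits respectively, which shows how strongly the estimate depends on this single choice.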
Entropy 2014, 16(3), 1349-1364; doi:10.3390/e16031349 - published online 3 March 2014
Abstract: This paper demonstrates a robust maximum entropy approach to estimating flexible-form, farm-level, multi-input/multi-output production functions using minimally specified disaggregated data. Since our goal is to address policy questions, we emphasize the model’s ability to reproduce characteristics of the existing production system and to predict outcomes of policy changes at a disaggregate level. Measuring the distributional impacts of policy changes requires farm-level models estimated across a wide spectrum of farm sizes and types, which is often difficult with traditional econometric methods due to data limitations. We use a two-stage approach to generate observation-specific shadow values for incompletely priced inputs. We then use the shadow values and nominal input prices to estimate crop-specific production functions using generalized maximum entropy (GME), capturing individual heterogeneity of the production environment while replicating the observed production inputs and outputs. The two-stage GME approach can be implemented with small data sets. We demonstrate this methodology in an empirical application to a small cross-section data set for Northern Rio Bravo, Mexico, and estimate production functions for small family farms and moderate commercial farms. The estimates show considerable distributional differences resulting from policies that change water subsidies in the region or shift price supports to direct payments.
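In generalized maximum entropy, each unknown parameter is commonly written as the expectation of a small set of support points, and the probabilities on those points are chosen to maximise entropy subject to the data constraints. The toy sketch below shows only that core step for a single parameter with one moment constraint; the function name `maxent_weights` is hypothetical, and the paper's two-stage estimator with production-function constraints and error supports is not reproduced. The maximum-entropy solution has the exponential form p_k proportional to exp(lam * z_k), and lam is found by bisection because the implied mean increases with lam.

```python
import math

def maxent_weights(support, target_mean, iterations=200):
    """Maximum-entropy probabilities on `support` whose mean equals `target_mean`."""
    if not min(support) < target_mean < max(support):
        raise ValueError("target_mean must lie strictly inside the support range")

    def weights(lam):
        # Shift exponents by their maximum to avoid overflow.
        m = max(lam * z for z in support)
        w = [math.exp(lam * z - m) for z in support]
        s = sum(w)
        return [x / s for x in w]

    def mean(lam):
        return sum(p * z for p, z in zip(weights(lam), support))

    # Bracket the multiplier, then bisect a fixed number of times.
    lo, hi = -1.0, 1.0
    while mean(lo) > target_mean:
        lo *= 2
    while mean(hi) < target_mean:
        hi *= 2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))
```

With a symmetric support and a target at its centre the weights are uniform, which reflects the "least informative" default that GME falls back on when the data add no information.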
Entropy 2014, 16(3), 1331-1348; doi:10.3390/e16031331 - published online 3 March 2014
Abstract: RNA is usually classified as either structured or unstructured; however, neither category adequately describes the diversity of secondary structures expected in biological systems. We describe this diversity within the ensemble of structures by using two different metrics: the average Shannon entropy and the ensemble defect. The average Shannon entropy is a measure of structural diversity calculated from the base pair probability matrix. The ensemble defect, a tool for identifying optimal sequences for a given structure, measures the average number of structural differences between a target structure and all the structures that make up the ensemble, scaled to the length of the sequence. In this paper, we show examples and discuss various uses of these metrics in both structured and unstructured RNA. By exploring how these two metrics describe RNA as an ensemble of different structures, as would be found in biological systems, this work will push the field beyond the standard “structured” and “unstructured” categorization.
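Both metrics can be computed from the same base pair probability matrix once the diagonal is filled with each nucleotide's unpaired probability. The sketch below assumes natural logarithms for the entropy and pair probabilities supplied as a dictionary keyed by (i, j) with 0-based indices; the paper may use a different log base or input format, and the helper names are hypothetical.

```python
import math

def pair_table(dot_bracket):
    """Map each paired index to its partner for a dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner

def full_matrix(n, pair_probs):
    """Symmetric n x n matrix: P[i][j] pair probabilities, P[i][i] unpaired probability."""
    P = [[0.0] * n for _ in range(n)]
    for (i, j), p in pair_probs.items():
        P[i][j] = P[j][i] = p
    for i in range(n):
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    return P

def average_shannon_entropy(P):
    """Mean over nucleotides of -sum_j P_ij ln P_ij (diagonal = unpaired)."""
    n = len(P)
    return sum(-p * math.log(p) for row in P for p in row if p > 0) / n

def normalized_ensemble_defect(P, target):
    """Expected fraction of nucleotides whose pairing state differs from `target`."""
    n = len(P)
    partner = pair_table(target)
    # Probability that nucleotide i is in its target state (paired to its
    # target partner, or unpaired when the target leaves it unpaired).
    correct = sum(P[i][partner.get(i, i)] for i in range(n))
    return (n - correct) / n
```

An ensemble that always folds into the target gives zero for both metrics, while a pair formed with probability 0.5 raises both the entropy and the defect.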
Entropy 2014, 16(3), 1315-1330; doi:10.3390/e16031315 - published online 28 February 2014
Abstract: The nonverbal transmission of information between social animals is a primary driving force behind their actions and, therefore, an important quantity to measure in animal behavior studies. Despite its key role in social behavior, the flow of information has only been inferred by correlating the actions of individuals under a simplifying assumption of linearity. In this paper, we leverage information-theoretic tools to relax this assumption. To demonstrate the feasibility of our approach, we focus on a robotics-based experimental paradigm, which affords consistent and controllable delivery of visual stimuli to zebrafish. Specifically, we use a robotic arm to maneuver a life-sized replica of a zebrafish along a predetermined trajectory as it interacts with a focal subject in a test tank. We track the fish and the replica through time and use the resulting trajectory data to measure the transfer entropy between the replica and the focal subject, which, in turn, is used to quantify one-directional information flow from the robot to the fish. In agreement with our expectations, we find that the information flow from the replica to the zebrafish is significantly greater than in the opposite direction. Notably, this information is specifically related to the response of the fish to the replica: the information flow is significantly reduced if the motion of the replica is randomly delayed in a surrogate dataset. In addition, comparison with a control experiment, in which the replica is replaced by a conspecific, shows that the information flow toward the focal fish is significantly greater for a robotic than for a live stimulus. These findings support the reliability of using transfer entropy as a measure of information flow, while providing indirect evidence for the efficacy of a robotics-based platform in animal behavioral studies.
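Transfer entropy from a source X to a target Y measures how much the past of X reduces uncertainty about the next value of Y beyond what Y's own past already explains. A minimal plug-in estimator with history length 1 is sketched below, assuming both trajectories have already been discretised into symbols; the embedding, binning and surrogate procedure used in the paper are not reproduced, and the function name is hypothetical.

```python
import math
from collections import Counter

def transfer_entropy(source, target):
    """Plug-in transfer entropy (bits) from `source` to `target`, history length 1.

    TE = sum p(y1, y0, x0) * log2[ p(y1 | y0, x0) / p(y1 | y0) ],
    with y1 = target[t+1], y0 = target[t], x0 = source[t].
    """
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    m = len(triples)
    c_abc = Counter(triples)
    c_bc = Counter((b, c) for _, b, c in triples)
    c_ab = Counter((a, b) for a, b, _ in triples)
    c_b = Counter(b for _, b, _ in triples)
    te = 0.0
    for (a, b, c), n_abc in c_abc.items():
        # Ratio of conditional probabilities rewritten in terms of counts.
        te += (n_abc / m) * math.log2(n_abc * c_b[b] / (c_bc[(b, c)] * c_ab[(a, b)]))
    return te
```

If `y` simply copies `x` with a one-step lag, `transfer_entropy(x, y)` is close to one bit for a balanced binary `x`, while `transfer_entropy(y, x)` is close to zero; this asymmetry is what makes the measure directional, unlike correlation.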
Entropy 2014, 16(3), 1287-1314; doi:10.3390/e16031287 - published online 27 February 2014
Abstract: Some known results from statistical thermophysics and from hydrology are revisited from a different perspective, in an attempt (a) to unify the notion of entropy in thermodynamic and statistical/stochastic approaches to complex hydrological systems and (b) to show the power of entropy and the principle of maximum entropy in inference, both deductive and inductive. The capability for deductive reasoning is illustrated by deriving the law of phase change transition of water (Clausius-Clapeyron) from scratch by maximizing entropy in a formal probabilistic framework. However, such deductive reasoning cannot work in more complex hydrological systems with diverse elements; there, the entropy maximization framework can still help in inductive inference, which is necessarily based on data. Several examples of this type are provided in an attempt to link statistical thermophysics with hydrology through a unifying view of entropy.
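For orientation, the Clausius-Clapeyron law mentioned above is often used in its integrated form, assuming constant latent heat, ideal-gas water vapour and negligible liquid volume: d ln e_s / dT = L / (R_v T^2), so e_s(T) = e_0 exp[(L / R_v)(1/T_0 - 1/T)]. The sketch below only evaluates that textbook form with approximate constants (assumed values); it does not reproduce the paper's maximum-entropy derivation.

```python
import math

# Approximate constants for water (assumed values, SI units).
LATENT_HEAT_VAPORIZATION = 2.5e6   # J/kg, treated as constant with temperature
GAS_CONSTANT_VAPOR = 461.5         # J/(kg K), specific gas constant of water vapour
E0, T0 = 611.0, 273.15             # reference saturation vapour pressure (Pa) at T0 (K)

def saturation_vapor_pressure(T):
    """Saturation vapour pressure (Pa) from the integrated Clausius-Clapeyron relation."""
    exponent = LATENT_HEAT_VAPORIZATION / GAS_CONSTANT_VAPOR * (1.0 / T0 - 1.0 / T)
    return E0 * math.exp(exponent)
```

At 20 °C (293.15 K) this gives roughly 2.4 kPa, close to tabulated values; the exponential growth with temperature is the feature the law captures.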