
An Entropy-Based Approach to Path Analysis of Structural Generalized Linear Models: A Basic Idea

by 1,*,
1 Department of Biostatistics, Faculty of Medicine, Oita University, Oita 879-5593, Japan
2 Department of Mathematical Sciences, Graduate School of Engineering, Osaka Prefecture University, Osaka 599-8531, Japan
3 Department of Statistics and Quantitative Methods, University of Milano Bicocca, via Bicocca degli Arcimboldi 8, Milano 20126, Italy
4 Division of Mathematical Science, Graduate School of Engineering Science, Osaka University, Osaka 560-8531, Japan
* Author to whom correspondence should be addressed.
Entropy 2015, 17(7), 5117-5132; https://doi.org/10.3390/e17075117
Received: 11 May 2015 / Revised: 7 July 2015 / Accepted: 17 July 2015 / Published: 22 July 2015
(This article belongs to the Special Issue Entropy, Utility, and Logical Reasoning)

Abstract

A path analysis method for causal systems based on generalized linear models is proposed by using entropy. A practical example is introduced, and a brief explanation of the entropy coefficient of determination is given. Direct and indirect effects of explanatory variables are discussed as log odds ratios, i.e., relative information, and a method for summarizing the effects is proposed. The example dataset is re-analyzed by using the method.

1. Introduction

Path analysis [1] is often applied to causal systems of continuous variables through the linear structural equations model (LISREL) [2,3]. In the LISREL approach, causal relationships among variables are described by a path diagram and translated into linear equations of the variables. Causal effects can then be calculated from the regression and correlation coefficients obtained for the linear equations. In contrast, path analysis of categorical variables is more complex than that of continuous variables because the causal system under consideration cannot be described by linear regression equations. Goodman [4–7] considered path analysis of binary variables by using logit models and discussed the effects of explanatory variables, though without discussing direct and indirect effects. Hagenaars [8] discussed path analysis of categorical variables by using a log-linear model, but again without discussing direct and indirect effects. Eshima et al. [9] proposed a path analysis method for categorical variables in logit models. Kuha and Goldthorpe [10] gave a two-stage path analysis method for generalized linear models (GLMs) that uses log odds ratios. In their approach, the total, direct and indirect effects are first defined for mean differences of response variables, and the method is then applied to measuring the effects on the basis of log odds ratios. However, the additive decomposition of the total effect into the direct and indirect effects holds only approximately, and assessing effects in categorical (polytomous) variable systems becomes more complicated as the number of variable categories increases [10]. Albert and Nelson [11] proposed a path analysis method to calculate pathway effects for causal systems on the basis of GLMs, but not all pathway effects are identifiable.
As in the two-stage cases, when factors, intermediate variables, and response variables are categorical, pathway effects become very complicated because the variable effects are defined for mean differences of response variables.
In various fields of study, there are many examples of response variables in practical data that are not normally distributed. There is a need for a method of path analysis with responses that are not normally distributed, especially categorical responses, and it is useful to discuss a path analysis approach for causal systems of GLMs [12,13]. When causal systems of the variables are described by GLMs, regression parameters or coefficients are related to log odds ratios [14–16], and so it is natural to consider the effects of factors (explanatory variables) according to odds or log odds ratios. However, results become more complicated as the number of categories of the variables increases. In such cases, it is suitable to summarize the effects of factors in GLMs. For this purpose, we use the entropy coefficient of determination (ECD), one of the entropy-based measures of predictive power for GLMs [15,16].
The remainder of this paper is organized as follows: Section 2 presents a practical example of causal systems—British mobility data [10]—and re-analyzes them by a new method of path analysis. Section 3 considers the relation between the log odds ratio and entropy, and ECD is briefly reviewed. Section 4 introduces a path analysis method for causal systems described by GLMs, and in Section 5 a method for testing effects of variables is given. The British mobility data are re-analyzed by the proposed approach in Section 6. Finally, Section 7 provides some discussion and conclusions for the present approach.

2. Practical Example

British mobility data describe the effects of education on social class mobility [10]. There are three variables, which are causally ordered as shown in Figure 1: parents’ social class, X; individual social class, Y; and education, Z, which intermediates between X and Y. The three variables are discrete. Social classes X and Y have three categories, “salariat and employers”, “middle class”, and “working class”; education Z has seven levels. While the effects of X and Z on Y can be discussed through log odds ratios, the results are complicated because the number of variable categories is large. It is important to summarize causal effects measured with log odds ratios, especially in such practical examples, to assess the intermediate effect of education on social class mobility.

3. Log Odds Ratio and Information

Let $X$ and $Y$ be a $p \times 1$ explanatory-variable vector and a response variable, respectively, and let $f(y \mid x)$ be the conditional probability or probability density function of $Y$ given that $X = x$. The function $f(y \mid x)$ is assumed to belong to the following exponential family of distributions:
$f(y \mid x) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}, \qquad (1)$
where $\theta$ and $\phi$ are parameters, and $a(\phi)$ ($> 0$), $b(\theta)$, and $c(y, \phi)$ are known functions. Let $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$. The parameter $\theta$ is a function of $\eta = \beta^T x$ through a link function $h(u)$.
Remark 1. In general, the systematic component can be extended to be a function of explanatory variable vector $x$. Then, the model is referred to as a generalized nonlinear model. For the sake of simplicity, the function is denoted by $\theta = \theta(x)$. The discussion below is applicable to this case.
Let $\mu_X$ and $\mu_Y$ be the means of $X$ and $Y$, respectively. Then, let us consider the following log odds ratio:
$\log \frac{f(y \mid x) \, f(\mu_Y \mid \mu_X)}{f(\mu_Y \mid x) \, f(y \mid \mu_X)} = \left( \log f(y \mid x) - \log f(\mu_Y \mid x) \right) - \left( \log f(y \mid \mu_X) - \log f(\mu_Y \mid \mu_X) \right) = \frac{(y - \mu_Y)(\theta(x) - \theta(\mu_X))}{a(\phi)}.$
The first and second terms of the middle expression in the above equation are the relative information with respect to response variable $Y$, so the log odds ratio is the change of the relative information in explanatory variable vector $X$. In GLMs, taking the expectation of the above log odds ratio with respect to $X$ and $Y$ reduces it to $\mathrm{cov}(\theta, Y)/a(\phi)$. The quantity $\mathrm{cov}(\theta, Y)/a(\phi)$ can be expressed as a symmetric type of the Kullback–Leibler (KL) information between a GLM based on (1) and the null model with $\beta = 0$ [15]; thus, we denote $\mathrm{cov}(\theta, Y)/a(\phi)$ by $\mathrm{KL}(X, Y)$ in this paper. Let $f(y)$ be the density or probability function for the null model $\beta = 0$ and let $g(x)$ be that of $X$. Then:
$\mathrm{KL}(X, Y) = \iint f(y \mid x) \, g(x) \log \frac{f(y \mid x)}{f(y)} \, dx \, dy + \iint f(y) \, g(x) \log \frac{f(y)}{f(y \mid x)} \, dx \, dy.$
If the variables in the above are discrete, the related integrals are replaced by summations. The ECD is then defined as:
$\mathrm{ECD}(X, Y) = \frac{\mathrm{cov}(Y, \theta)}{\mathrm{cov}(Y, \theta) + a(\phi)}.$
Then, ECD can also be expressed as:
$\mathrm{ECD}(X, Y) = \frac{\mathrm{cov}(Y, \theta)/a(\phi)}{\mathrm{cov}(Y, \theta)/a(\phi) + 1} = \frac{\mathrm{KL}(X, Y)}{\mathrm{KL}(X, Y) + 1}.$
The measure is interpreted as the proportion of the variation in entropy of Y that is explained by $X$ [15,16]. As shown above, GLMs explain the entropy of response variables, so it is suitable to measure the effects of explanatory variables based on entropy.
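The equality between the covariance form and the symmetric KL form of $\mathrm{KL}(X, Y)$ can be checked numerically. The following is a minimal sketch, assuming a hypothetical logistic model with a three-level explanatory variable. The distribution `g`, the coefficients `b0` and `b1`, and all function names are illustrative and are not taken from the paper; the reference density $f(y)$ is assumed here to be the marginal distribution of $Y$, and $a(\phi) = 1$ for the Bernoulli model.

```python
import math

# Hypothetical distribution of X and logistic coefficients (illustrative only).
g = {0: 0.5, 1: 0.3, 2: 0.2}
b0, b1 = -1.0, 0.8

def theta(x):
    """Canonical parameter theta(x) of the Bernoulli model (the logit)."""
    return b0 + b1 * x

def f(y, x):
    """Conditional probability f(y | x) for y in {0, 1}."""
    p1 = 1.0 / (1.0 + math.exp(-theta(x)))
    return p1 if y == 1 else 1.0 - p1

# Marginal distribution of Y, assumed here as the reference density f(y).
f_marg = {y: sum(g[x] * f(y, x) for x in g) for y in (0, 1)}

# Covariance form: cov(theta, Y) / a(phi), with a(phi) = 1.
mean_theta = sum(g[x] * theta(x) for x in g)
cov_theta_y = sum(g[x] * theta(x) * f(1, x) for x in g) - mean_theta * f_marg[1]

# Symmetric KL form: the two sums combine into
# sum_x g(x) sum_y (f(y|x) - f(y)) * log(f(y|x) / f(y)).
kl = sum(g[x] * (f(y, x) - f_marg[y]) * math.log(f(y, x) / f_marg[y])
         for x in g for y in (0, 1))

ecd = kl / (kl + 1.0)
print(cov_theta_y, kl, ecd)
```

Under these assumptions the two forms should agree up to floating-point error, and ECD lies strictly between 0 and 1.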
Remark 2. Applying ECD to the linear regression model, ECD is the usual coefficient of determination $R^2$.
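Remark 2 can be illustrated with a small simulation. The sketch below assumes a simple linear regression fitted by ordinary least squares to simulated data; the sample size, true coefficients, and variable names are hypothetical. For the normal model, $\theta = \mu$ and $a(\phi) = \sigma^2$, and the ML estimate of $\sigma^2$ (residual sum of squares divided by $n$) is plugged in.

```python
import random

random.seed(1)
n = 200
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [1.0 + 2.0 * x + random.gauss(0.0, 1.5) for x in xs]  # hypothetical model

# Ordinary least squares for y = b0 + b1 * x.
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = my - b1 * mx
fitted = [b0 + b1 * x for x in xs]          # estimated theta = mu

sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
sst = sum((y - my) ** 2 for y in ys)
r2 = 1.0 - sse / sst                         # usual coefficient of determination

# ECD = cov(Y, theta) / (cov(Y, theta) + a(phi)), with a(phi) = sigma^2.
cov_y_theta = sum((y - my) * (f - my) for y, f in zip(ys, fitted)) / n
a_phi = sse / n                              # ML estimate of sigma^2
ecd = cov_y_theta / (cov_y_theta + a_phi)
print(r2, ecd)
```

With an intercept in the model, the sample covariance between the response and the fitted values equals the explained variance, which is why the two quantities coincide in this sketch.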

4. Measuring the Total, Direct, and Indirect Effects in Recursive GLM Systems

For simplicity, in the recursive case with $X_i$ $(i = 1, 2, 3)$, where $X_i$ precedes $X_{i+1}$ $(i = 1, 2)$, we discuss the effects of $X_1$ and $X_2$ on $X_3$. Let $\mu_i$ be the expectation of $X_i$ $(i = 1, 2, 3)$. Then, for a GLM with the conditional density or probability function of $X_3$ when $(X_1, X_2) = (x_1, x_2)$ given by (1), the total effect of $(X_1, X_2) = (x_1, x_2)$ on $X_3 = x_3$ can be defined by using the following log odds ratio:
$\log \frac{f(x_3 \mid x_1, x_2) \, f(\mu_3 \mid \mu_1, \mu_2)}{f(\mu_3 \mid x_1, x_2) \, f(x_3 \mid \mu_1, \mu_2)} = \frac{(x_3 - \mu_3)(\theta(x_1, x_2) - \theta(\mu_1, \mu_2))}{a(\phi)}.$
Taking the expectation of the above effect with respect to $X_1$, $X_2$ and $X_3$, we have:
$\frac{\mathrm{cov}(\theta(X_1, X_2), X_3)}{a(\phi)} = \mathrm{KL}((X_1, X_2), X_3).$
The above KL information is the (summary) total effect of explanatory variables $(X_1, X_2)$ on response variable $X_3$. Let $\mu_2(x_1)$ and $\mu_3(x_1)$ be the conditional expectations of $X_2$ and $X_3$, respectively, given that $X_1 = x_1$. The log odds ratio with respect to $x_2$ and $x_3$ for a given $x_1$ is:
$\log \frac{f(x_3 \mid x_1, x_2) \, f(\mu_3(x_1) \mid x_1, \mu_2(x_1))}{f(\mu_3(x_1) \mid x_1, x_2) \, f(x_3 \mid x_1, \mu_2(x_1))} = \frac{(x_3 - \mu_3(x_1))(\theta(x_1, x_2) - \theta(x_1, \mu_2(x_1)))}{a(\phi)}.$
The total effect of $X_2 = x_2$ on $X_3 = x_3$ when $X_1 = x_1$ is defined by the above log odds ratio because it expresses the effect of $X_2 = x_2$ on $X_3 = x_3$ when the effect of the preceding variable $X_1 = x_1$ is excluded. From this, the total effect of $X_1 = x_1$ on $X_3 = x_3$ is defined by
$(\text{the total effect of } x_1 \text{ and } x_2 \text{ on } x_3) - (\text{the total effect of } x_2 \text{ on } x_3 \text{ when } X_1 = x_1) = \log \frac{f(x_3 \mid x_1, x_2) \, f(\mu_3 \mid \mu_1, \mu_2)}{f(\mu_3 \mid x_1, x_2) \, f(x_3 \mid \mu_1, \mu_2)} - \log \frac{f(x_3 \mid x_1, x_2) \, f(\mu_3(x_1) \mid x_1, \mu_2(x_1))}{f(\mu_3(x_1) \mid x_1, x_2) \, f(x_3 \mid x_1, \mu_2(x_1))} = \frac{(x_3 - \mu_3)(\theta(x_1, x_2) - \theta(\mu_1, \mu_2))}{a(\phi)} - \frac{(x_3 - \mu_3(x_1))(\theta(x_1, x_2) - \theta(x_1, \mu_2(x_1)))}{a(\phi)}.$
By taking the expectation of the above information with respect to $X_1$, $X_2$ and $X_3$, the (summary) total effect of $X_1$ on $X_3$ is given by
$\frac{\mathrm{cov}(\theta(X_1, X_2), X_3)}{a(\phi)} - \frac{\mathrm{cov}(\theta(X_1, X_2), X_3 \mid X_1)}{a(\phi)} = \mathrm{KL}((X_1, X_2), X_3) - \mathrm{KL}(X_2, X_3 \mid X_1),$
where $\mathrm{KL}(X_2, X_3 \mid X_1)$ is given by
$\mathrm{KL}(X_2, X_3 \mid X_1) = \iiint f(x_3 \mid x_1, x_2) \, g(x_1, x_2) \log \frac{f(x_3 \mid x_1, x_2)}{f(x_3 \mid x_1)} \, dx_1 \, dx_2 \, dx_3 + \iiint f(x_3 \mid x_1) \, g(x_1, x_2) \log \frac{f(x_3 \mid x_1)}{f(x_3 \mid x_1, x_2)} \, dx_1 \, dx_2 \, dx_3. \qquad (2)$
The second term of the summary total effect, $\mathrm{KL}(X_2, X_3 \mid X_1)$, represents the effect of $X_2$ by itself, that is, the effect of $X_2$ on $X_3$ when the effect of $X_1$ is excluded, and is defined as the (summary) total effect of $X_2$ on $X_3$. The direct effect of $X_1 = x_1$ on $X_3 = x_3$ can be understood according to the following log odds ratio:
$\log \frac{f(x_3 \mid x_1, x_2) \, f(\mu_3(x_2) \mid \mu_1(x_2), x_2)}{f(\mu_3(x_2) \mid x_1, x_2) \, f(x_3 \mid \mu_1(x_2), x_2)} = \frac{(x_3 - \mu_3(x_2))(\theta(x_1, x_2) - \theta(\mu_1(x_2), x_2))}{a(\phi)}.$
The above effect is derived by excluding the effect of $X_2 = x_2$, so it is defined as the direct effect of $X_1 = x_1$ on $X_3 = x_3$. Taking the expectation of the above effect, we have the (summary) direct effect of $X_1$ on $X_3$, expressed as follows:
$\frac{\mathrm{cov}(\theta(X_1, X_2), X_3 \mid X_2)}{a(\phi)} = \mathrm{KL}(X_1, X_3 \mid X_2),$
where $\mathrm{KL}(X_1, X_3 \mid X_2)$ is defined as in (2). The above quantity is the amount of entropy of $X_3$ explained by $X_1$ alone, that is, excluding the effect of $X_2$. By subtracting the direct effect of $X_1 = x_1$ on $X_3 = x_3$ from the total effect, we have the indirect effect of $X_1 = x_1$ on $X_3 = x_3$:
$\log \frac{f(x_3 \mid x_1, x_2) \, f(\mu_3 \mid \mu_1, \mu_2)}{f(\mu_3 \mid x_1, x_2) \, f(x_3 \mid \mu_1, \mu_2)} - \log \frac{f(x_3 \mid x_1, x_2) \, f(\mu_3(x_1) \mid x_1, \mu_2(x_1))}{f(\mu_3(x_1) \mid x_1, x_2) \, f(x_3 \mid x_1, \mu_2(x_1))} - \log \frac{f(x_3 \mid x_1, x_2) \, f(\mu_3(x_2) \mid \mu_1(x_2), x_2)}{f(\mu_3(x_2) \mid x_1, x_2) \, f(x_3 \mid \mu_1(x_2), x_2)} = \frac{(x_3 - \mu_3)(\theta(x_1, x_2) - \theta(\mu_1, \mu_2))}{a(\phi)} - \frac{(x_3 - \mu_3(x_1))(\theta(x_1, x_2) - \theta(x_1, \mu_2(x_1)))}{a(\phi)} - \frac{(x_3 - \mu_3(x_2))(\theta(x_1, x_2) - \theta(\mu_1(x_2), x_2))}{a(\phi)}.$
Taking the expectation of the above effect, the (summary) indirect effect is given by
$\mathrm{KL}((X_1, X_2), X_3) - \mathrm{KL}(X_2, X_3 \mid X_1) - \mathrm{KL}(X_1, X_3 \mid X_2).$
As in the previous section, the above effects are standardized by ECD. The standardized total, direct, and indirect effects of $X_1$ and $X_2$ on $X_3$ are defined as follows:
The total effect of $X_1$ and $X_2$ on $X_3$:
$e_T((X_1, X_2) \to X_3) = \frac{\mathrm{cov}(\theta(X_1, X_2), X_3)}{\mathrm{cov}(\theta(X_1, X_2), X_3) + a(\phi)} = \frac{\mathrm{KL}((X_1, X_2), X_3)}{\mathrm{KL}((X_1, X_2), X_3) + 1}.$
The total effect of $X_1$ on $X_3$:
$e_T(X_1 \to X_3) = \frac{\mathrm{cov}(\theta(X_1, X_2), X_3) - \mathrm{cov}(\theta(X_1, X_2), X_3 \mid X_1)}{\mathrm{cov}(\theta(X_1, X_2), X_3) + a(\phi)} = \frac{\mathrm{KL}((X_1, X_2), X_3) - \mathrm{KL}(X_2, X_3 \mid X_1)}{\mathrm{KL}((X_1, X_2), X_3) + 1}.$
The direct effect of $X_1$ on $X_3$:
$e_D(X_1 \to X_3) = \frac{\mathrm{cov}(\theta(X_1, X_2), X_3 \mid X_2)}{\mathrm{cov}(\theta(X_1, X_2), X_3) + a(\phi)} = \frac{\mathrm{KL}(X_1, X_3 \mid X_2)}{\mathrm{KL}((X_1, X_2), X_3) + 1}.$
The indirect effect of $X_1$ on $X_3$:
$e_I(X_1 \to X_3) = e_T(X_1 \to X_3) - e_D(X_1 \to X_3).$
The total (direct) effect of $X_2$ on $X_3$:
$e_T(X_2 \to X_3) = e_D(X_2 \to X_3) = \frac{\mathrm{cov}(\theta(X_1, X_2), X_3 \mid X_1)}{\mathrm{cov}(\theta(X_1, X_2), X_3) + a(\phi)} = \frac{\mathrm{KL}(X_2, X_3 \mid X_1)}{\mathrm{KL}((X_1, X_2), X_3) + 1}.$
In this case:
$e_T((X_1, X_2) \to X_3) = e_T(X_1 \to X_3) + e_T(X_2 \to X_3).$
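The covariance forms of the three-variable effects can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the system is a hypothetical binary chain $X_1 \to X_2 \to X_3$ with a logit model for $X_3$ (so $a(\phi) = 1$), the conditional covariances are read as expectations over the conditioning variable, and all probabilities, coefficients, and names are made up for the example.

```python
import math

# Hypothetical binary recursive system X1 -> X2 -> X3 (all numbers illustrative).
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.35, 1: 0.65}}
b0, b1, b2 = -1.0, 0.5, 1.2   # logit of X3: theta = b0 + b1*x1 + b2*x2

# Joint distribution g(x1, x2) of the explanatory variables.
g = {(x1, x2): p_x1[x1] * p_x2_given_x1[x1][x2]
     for x1 in (0, 1) for x2 in (0, 1)}

def expected_cond_cov(key):
    """E[cov(theta, X3 | key(X1, X2))]; a constant key gives cov(theta, X3)."""
    groups = {}
    for (x1, x2), p in g.items():
        th = b0 + b1 * x1 + b2 * x2
        mu3 = 1.0 / (1.0 + math.exp(-th))      # E[X3 | x1, x2]
        s = groups.setdefault(key(x1, x2), [0.0, 0.0, 0.0, 0.0])
        s[0] += p                               # P(group)
        s[1] += p * th                          # sum of p * theta
        s[2] += p * mu3                         # sum of p * E[X3 | x1, x2]
        s[3] += p * th * mu3                    # sum of p * theta * E[X3 | x1, x2]
    # P(group) * cov within group = s3 - s1 * s2 / s0, summed over groups.
    return sum(s3 - s1 * s2 / s0 for s0, s1, s2, s3 in groups.values())

a_phi = 1.0
kl_total = expected_cond_cov(lambda x1, x2: None) / a_phi    # KL((X1,X2), X3)
kl_2_given_1 = expected_cond_cov(lambda x1, x2: x1) / a_phi  # KL(X2, X3 | X1)
kl_1_given_2 = expected_cond_cov(lambda x1, x2: x2) / a_phi  # KL(X1, X3 | X2)

denom = kl_total + 1.0
e_t_12 = kl_total / denom                    # total effect of (X1, X2)
e_t_1 = (kl_total - kl_2_given_1) / denom    # total effect of X1
e_d_1 = kl_1_given_2 / denom                 # direct effect of X1
e_i_1 = e_t_1 - e_d_1                        # indirect effect of X1
e_t_2 = kl_2_given_1 / denom                 # total (= direct) effect of X2
print(e_t_12, e_t_1, e_d_1, e_i_1, e_t_2)
```

By construction, the total effect of $(X_1, X_2)$ equals the sum of the total effects of $X_1$ and $X_2$, which mirrors the last identity above.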
A general approach based on the above discussion is given below. Let $X_i$ $(i = 1, 2, \ldots, K)$ be variables such that the parents of $X_k$ are $X_{pa(k)} = (X_1, X_2, \ldots, X_{k-1})$ $(k = 2, 3, \ldots, K)$, that is, $X_i$ precedes $X_{i+1}$ $(i = 1, 2, \ldots, K - 1)$. Let $f(x_K \mid x_1, x_2, \ldots, x_{K-1})$ be the conditional density or probability of $X_K$ given $X_{pa(K)} = (X_1, X_2, \ldots, X_{K-1})$ such that:
$f(x_K \mid x_1, x_2, \ldots, x_{K-1}) = \exp\left\{ \frac{x_K \theta - b(\theta)}{a(\phi)} + c(x_K, \phi) \right\}.$
Explaining response variable $X_K$ in a GLM framework by explanatory variables $X_{pa(K)}$, the effects of the explanatory variables on the response variable can be treated in terms of entropy as discussed above. From this, the standardized (summary) total effect of $X_1$ on $X_K$ is defined by:
$e_T(X_1 \to X_K) = \frac{\mathrm{KL}(X_{pa(K)}, X_K) - \mathrm{KL}(X_{pa(K)}, X_K \mid X_1)}{\mathrm{KL}(X_{pa(K)}, X_K) + 1}.$
Second, the total effect of $X_2$ is defined as:
$e_T(X_2 \to X_K) = \frac{\mathrm{KL}(X_{pa(K)}, X_K \mid X_1) - \mathrm{KL}(X_{pa(K)}, X_K \mid X_1, X_2)}{\mathrm{KL}(X_{pa(K)}, X_K) + 1}.$
Then, we can find the total effects of $X_i$ by induction, which yields:
$e_T(X_i \to X_K) = \frac{\mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i)}) - \mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i+1)})}{\mathrm{KL}(X_{pa(K)}, X_K) + 1}, \quad (i = 1, 2, \ldots, K - 1), \qquad (5)$
where $\mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i)})$ and $\mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i+1)})$ can be defined as in (2). In the above formulae, we have:
$e_T(X_i \to X_K) \geq 0 \quad (i = 1, 2, \ldots, K - 1),$
and:
$e_T(X_{pa(K)} \to X_K) = \sum_{i=1}^{K-1} e_T(X_i \to X_K). \qquad (6)$
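The telescoping structure behind the additive decomposition can be illustrated numerically. The sketch below assumes $K = 4$ binary variables with a randomly generated joint distribution (a saturated model, in the spirit of Remark 1); the helper `sym_kl` and the random weights are hypothetical and only evaluate the discrete analogue of (2).

```python
import itertools
import math
import random

K = 4
random.seed(0)
cells = list(itertools.product((0, 1), repeat=K))
weights = [0.1 + random.random() for _ in cells]   # strictly positive cells
total_w = sum(weights)
joint = {xs: w / total_w for xs, w in zip(cells, weights)}

def sym_kl(n_given):
    """Symmetric KL between f(x_K | x_1..x_{K-1}) and f(x_K | x_1..x_{n_given}),
    averaged over the distribution of the parents (discrete analogue of (2))."""
    g, ref_num, ref_den = {}, {}, {}
    for xs, p in joint.items():
        parents, head, x_k = xs[:-1], xs[:n_given], xs[-1]
        g[parents] = g.get(parents, 0.0) + p
        ref_num[head, x_k] = ref_num.get((head, x_k), 0.0) + p
        ref_den[head] = ref_den.get(head, 0.0) + p
    total = 0.0
    for xs, p in joint.items():
        parents, head, x_k = xs[:-1], xs[:n_given], xs[-1]
        f = p / g[parents]
        f_ref = ref_num[head, x_k] / ref_den[head]
        total += g[parents] * (f - f_ref) * math.log(f / f_ref)
    return total

# kl[m] conditions on X_1..X_m, i.e. on X_pa(m+1); kl[0] is KL(X_pa(K), X_K).
kl = [sym_kl(m) for m in range(K)]
denom = kl[0] + 1.0
e_t = [(kl[i - 1] - kl[i]) / denom for i in range(1, K)]  # e_T(X_i -> X_K)
print(e_t, sum(e_t), kl[0] / denom)
```

Because the last conditional term, which conditions on all parents, is zero, the sum of the individual total effects reduces to $\mathrm{KL}(X_{pa(K)}, X_K)/(\mathrm{KL}(X_{pa(K)}, X_K) + 1)$.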
Remark 3. The total effect of $X_i = x_i$ on $X_K = x_K$ is given by:
$\log \frac{f(x_K \mid x_{pa(i)}, x_i, \ldots, x_{K-1}) \, f(\mu_K(x_{pa(i)}) \mid x_{pa(i)}, \mu_i(x_{pa(i)}), \ldots, \mu_{K-1}(x_{pa(i)}))}{f(\mu_K(x_{pa(i)}) \mid x_{pa(i)}, x_i, \ldots, x_{K-1}) \, f(x_K \mid x_{pa(i)}, \mu_i(x_{pa(i)}), \ldots, \mu_{K-1}(x_{pa(i)}))} - \log \frac{f(x_K \mid x_{pa(i+1)}, x_{i+1}, \ldots, x_{K-1}) \, f(\mu_K(x_{pa(i+1)}) \mid x_{pa(i+1)}, \mu_{i+1}(x_{pa(i+1)}), \ldots, \mu_{K-1}(x_{pa(i+1)}))}{f(\mu_K(x_{pa(i+1)}) \mid x_{pa(i+1)}, x_{i+1}, \ldots, x_{K-1}) \, f(x_K \mid x_{pa(i+1)}, \mu_{i+1}(x_{pa(i+1)}), \ldots, \mu_{K-1}(x_{pa(i+1)}))} = \frac{(x_K - \mu_K(x_{pa(i)}))(\theta(x_{pa(i)}, x_i, \ldots, x_{K-1}) - \theta(x_{pa(i)}, \mu_i(x_{pa(i)}), \ldots, \mu_{K-1}(x_{pa(i)})))}{a(\phi)} - \frac{(x_K - \mu_K(x_{pa(i+1)}))(\theta(x_{pa(i+1)}, x_{i+1}, \ldots, x_{K-1}) - \theta(x_{pa(i+1)}, \mu_{i+1}(x_{pa(i+1)}), \ldots, \mu_{K-1}(x_{pa(i+1)})))}{a(\phi)},$
where $\mu_k(x_{pa(i)})$ and $\mu_k(x_{pa(i+1)})$ are the conditional expectations of $X_k$ given $X_{pa(i)} = x_{pa(i)}$ and $X_{pa(i+1)} = x_{pa(i+1)}$, respectively.
Let $X_{pa(K) \setminus i} = (X_1, X_2, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{K-1})$ be the parent variables of $X_K$ excluding $X_i$. The direct effect of $X_i$ on $X_K$ is defined by:
$e_D(X_i \to X_K) = \frac{\mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(K) \setminus i})}{\mathrm{KL}(X_{pa(K)}, X_K) + 1} \quad (i = 1, 2, \ldots, K - 1).$
From this, we have the indirect effect of $X_i$:
$e_I(X_i \to X_K) = e_T(X_i \to X_K) - e_D(X_i \to X_K) = \frac{\mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i)}) - \mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i+1)}) - \mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(K) \setminus i})}{\mathrm{KL}(X_{pa(K)}, X_K) + 1} \quad (i = 1, 2, \ldots, K - 2). \qquad (7)$
Remark 4. The direct effect of $X_i = x_i$ on $X_K = x_K$ is given by
$\log \frac{f(x_K \mid x_{pa(i)}, x_i, x_{i+1}, \ldots, x_{K-1}) \, f(\mu_K(x_{pa(K) \setminus i}) \mid x_{pa(i)}, \mu_i(x_{pa(K) \setminus i}), x_{i+1}, \ldots, x_{K-1})}{f(\mu_K(x_{pa(K) \setminus i}) \mid x_{pa(i)}, x_i, x_{i+1}, \ldots, x_{K-1}) \, f(x_K \mid x_{pa(i)}, \mu_i(x_{pa(K) \setminus i}), x_{i+1}, \ldots, x_{K-1})} = \frac{(x_K - \mu_K(x_{pa(K) \setminus i}))(\theta(x_{pa(i)}, x_i, x_{i+1}, \ldots, x_{K-1}) - \theta(x_{pa(i)}, \mu_i(x_{pa(K) \setminus i}), x_{i+1}, \ldots, x_{K-1}))}{a(\phi)},$
where $\mu_i(x_{pa(K) \setminus i})$ is the conditional expectation of $X_i$ given $X_{pa(K) \setminus i} = x_{pa(K) \setminus i}$.
When $\theta$ is a linear function of the explanatory variables, that is,
$\theta = \sum_{i=1}^{K-1} \beta_i x_i,$
we have:
$\mathrm{cov}(\theta, X_K \mid X_{pa(i)}) = \mathrm{cov}\left( \sum_{j=1}^{K-1} \beta_j X_j, X_K \,\Big|\, X_{pa(i)} \right) = \sum_{j=i}^{K-1} \beta_j \, \mathrm{cov}(X_j, X_K \mid X_{pa(i)})$
and:
$\mathrm{cov}(\theta, X_K \mid X_{pa(i+1)}) = \sum_{j=i+1}^{K-1} \beta_j \, \mathrm{cov}(X_j, X_K \mid X_{pa(i+1)}).$
From (5) we have:
$e_T(X_i \to X_K) = \frac{\mathrm{cov}(\theta, X_K \mid X_{pa(i)}) - \mathrm{cov}(\theta, X_K \mid X_{pa(i+1)})}{a(\phi)(\mathrm{KL}(X_{pa(K)}, X_K) + 1)} = \frac{\beta_i \, \mathrm{cov}(X_i, X_K \mid X_{pa(i)}) + \sum_{j=i+1}^{K-1} \beta_j \left( \mathrm{cov}(X_j, X_K \mid X_{pa(i)}) - \mathrm{cov}(X_j, X_K \mid X_{pa(i+1)}) \right)}{a(\phi)(\mathrm{KL}(X_{pa(K)}, X_K) + 1)} \quad (i = 1, 2, \ldots, K - 2). \qquad (8)$
The direct effect of $X_i$ on $X_K$ is given by:
$e_D(X_i \to X_K) = \frac{\beta_i \, \mathrm{cov}(X_i, X_K \mid X_{pa(K) \setminus i})}{a(\phi)(\mathrm{KL}(X_{pa(K)}, X_K) + 1)}, \qquad (9)$
and the indirect effect is obtained as (8) minus (9).
The present approach is different from the usual approach for linear equation models and from the approach in [10], because it is based on the log odds ratio and entropy by using all the variables concerned.
Remark 5. The total effects of variables in Kuha and Goldthorpe [10] are defined with the marginal distributions of response variables and explanatory variables. In contrast, the present approach defines the total effects of explanatory variables on the basis of a recursive structure of all the variables concerned, and (6) holds.
Remark 6. Indirect effects are defined as the total effects minus the direct effects, as in (3), (4) and (7); nevertheless, they can be interpreted in terms of entropy. On the other hand, direct and indirect effects are also defined in the approach of [10], but their sum does not equal the total effect.
Remark 7. Assessing the model identification and testing the goodness-of-fit of the model are based on the discussion of GLMs.

5. Statistical Test for Effects

Let $\widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(i)})$, $\widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(i+1)})$, and $\widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(K) \setminus i})$ be the ML estimators of $\mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i)})$, $\mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i+1)})$, and $\mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(K) \setminus i})$, respectively. A result similar to that presented in Eshima and Tabata [16] can be used to show that:
$n \widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(i)}) - n \widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(i+1)}) \qquad (10)$
is asymptotically distributed according to a chi-squared distribution with degrees of freedom equal to the number of parameters in the conditional independence model with $X_{pa(i)}$ minus that with $X_{pa(i+1)}$. By using statistic (10), the total effects can be tested. Similarly, the statistic:
$n \widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(K) \setminus i}) = n \widehat{\mathrm{KL}}(X_i, X_K \mid X_{pa(K) \setminus i})$
is asymptotically distributed according to a chi-squared distribution with degrees of freedom equal to the number of regression coefficients (parameters) related to variable $X_i$.
The statistic $\chi_T^2$ defined below is asymptotically distributed according to a non-central chi-squared distribution with non-centrality parameter:
$\lambda = n \, \mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i)}) - n \, \mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i+1)})$
and $\nu$ degrees of freedom, where $\nu$ is the number of parameters in the conditional independence model with $X_{pa(i)}$ minus that with $X_{pa(i+1)}$. Let:
$\chi_T^2 = n \widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(i)}) - n \widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(i+1)})$
and let $c = 1 + \frac{\lambda}{\nu + \lambda}$ and $\nu' = \nu + \frac{\lambda^2}{\nu + 2\lambda}$. The statistic $\chi_T^2 / c$ is asymptotically distributed according to the chi-squared distribution with $\nu'$ degrees of freedom. As $\nu'$ becomes large, the chi-squared distribution tends to a normal distribution with mean $\nu'$ and variance $2\nu'$. From this, for sufficiently large sample sizes $n$, the statistic:
$\frac{\chi_T^2}{n} = \widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(i)}) - \widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(i+1)})$
is asymptotically normally distributed with mean $\frac{c\nu'}{n}$ and variance $\frac{2c^2\nu'}{n^2}$ [17]. For sufficiently large $n$, we have that:
$\frac{c\nu'}{n} \approx \mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i)}) - \mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i+1)}),$
$\frac{2c^2\nu'}{n^2} \approx \frac{4}{n} \left( \mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i)}) - \mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i+1)}) \right) \approx \frac{4}{n^2} \chi_T^2.$
From this, the asymptotic standard error (ASE) of $\frac{\chi_T^2}{n}$ is $\frac{2\sqrt{\chi_T^2}}{n}$. Similarly, the asymptotic standard error of:
$\frac{\chi_D^2}{n} = \widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(K) \setminus i})$
is $\frac{2\sqrt{\chi_D^2}}{n}$. Moreover:
$\frac{\chi_T^2}{n} - \frac{\chi_D^2}{n} = \widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(i)}) - \widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(i+1)}) - \widehat{\mathrm{KL}}(X_{pa(K)}, X_K \mid X_{pa(K) \setminus i})$
is asymptotically normally distributed with mean:
$\mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i)}) - \mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(i+1)}) - \mathrm{KL}(X_{pa(K)}, X_K \mid X_{pa(K) \setminus i})$
and variance $\frac{4}{n^2} (\chi_T^2 - \chi_D^2)$. By using the above results, ASEs of the estimates of the summary total and direct effects can be calculated.
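The constants of the chi-squared approximation [17] and the resulting ASE can be sketched in a few lines. The sample size, degrees of freedom, and estimated KL difference below are hypothetical values chosen only for illustration, and the function names are not from the paper.

```python
import math

def patnaik_constants(nu, lam):
    """Scale c and degrees of freedom nu' such that a non-central chi-squared
    variable with (nu, lam) is approximated by c times a chi-squared(nu')."""
    c = 1.0 + lam / (nu + lam)
    nu_prime = nu + lam ** 2 / (nu + 2.0 * lam)
    return c, nu_prime

def ase_summary_effect(n, kl_hat):
    """Large-sample ASE of chi2 / n, where chi2 = n * kl_hat."""
    chi2 = n * kl_hat
    return 2.0 * math.sqrt(chi2) / n

# Hypothetical inputs (not taken from the British mobility data).
n, nu = 2000, 2
kl_diff_hat = 0.05
lam = n * kl_diff_hat                 # plug-in non-centrality parameter

c, nu_prime = patnaik_constants(nu, lam)
mean_approx = c * nu_prime / n                       # approximate mean of chi2 / n
sd_from_constants = math.sqrt(2.0 * c ** 2 * nu_prime) / n
sd_simplified = ase_summary_effect(n, kl_diff_hat)   # 2 * sqrt(chi2) / n
print(mean_approx, sd_from_constants, sd_simplified)
```

Since $c\nu' = \nu + \lambda$ and $2c^2\nu' = 2(\nu + 2\lambda)$, the two standard deviations differ only through $\nu$, which becomes negligible relative to $\lambda$ as $n$ grows.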

6. Path Analysis of the British Mobility Data

The British mobility data described in Section 2 were analyzed in detail by using log odds ratios [10]. Here, the proposed path analysis method is applied to summarize the effects of parental class $X$ and education $Z$ on destination class $Y$, measured by log odds ratios as in the previous section, and to give a simple interpretation from the summary effects of $X$ and $Z$ on $Y$. The three variables are random, and the GLM system can be composed of logit models. In this example, the employed logistic model can be expressed as follows. Let $X$ be a categorical factor with levels {1,2,3} and $Z$ a score with levels {1,2,…,7}, and let $Y$ be a categorical response variable with levels {1,2,3}. Let:
$X_i = \begin{cases} 1 & (X = i) \\ 0 & (X \neq i) \end{cases} \quad \text{and} \quad Y_j = \begin{cases} 1 & (Y = j) \\ 0 & (Y \neq j) \end{cases}$
Then, dummy variable vectors $X = (X_1, X_2, X_3)^T$ and $Y = (Y_1, Y_2, Y_3)^T$ are identified with categorical factor $X$ and response $Y$, respectively. From this, the systematic component of the above model can be expressed as follows:
$\theta = \alpha + B_{(1)} X + \beta_{(2)} Z,$
where:
$\alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix}, \quad B_{(1)} = \begin{pmatrix} \beta_{(1)11} & \beta_{(1)12} & \beta_{(1)13} \\ \beta_{(1)21} & \beta_{(1)22} & \beta_{(1)23} \\ \beta_{(1)31} & \beta_{(1)32} & \beta_{(1)33} \end{pmatrix}, \quad \text{and} \quad \beta_{(2)} = \begin{pmatrix} \beta_{(2)1} \\ \beta_{(2)2} \\ \beta_{(2)3} \end{pmatrix}$
Then, the logit model is described as:
$\Pr(Y = y \mid x, z) = \frac{\exp(y^T \theta)}{\sum_u \exp(u^T \theta)} = \frac{\exp(y^T \alpha + y^T B_{(1)} x + y^T \beta_{(2)} z)}{\sum_u \exp(u^T \alpha + u^T B_{(1)} x + u^T \beta_{(2)} z)}$
where $\sum_u$ denotes the summation over all $u$. Then, from Table 4 in [10], the estimated regression parameters for men are calculated as follows:
$\hat{\alpha} = \begin{pmatrix} 0.60 \\ 1.20 \\ 1.79 \end{pmatrix}, \quad \hat{B}_{(1)} = \begin{pmatrix} 0.35 & -0.10 & -0.25 \\ -0.05 & 0.16 & -0.11 \\ -0.30 & -0.06 & 0.36 \end{pmatrix}, \quad \text{and} \quad \hat{\beta}_{(2)} = \begin{pmatrix} 0.32 \\ -0.07 \\ -0.25 \end{pmatrix}$
Similarly, we have the estimated parameters for women as follows:
$\hat{\alpha} = \begin{pmatrix} 1.04 \\ 2.20 \\ 3.01 \end{pmatrix}, \quad \hat{B}_{(1)} = \begin{pmatrix} 0.18 & 0.03 & -0.21 \\ 0.04 & 0.00 & -0.03 \\ -0.22 & -0.03 & 0.24 \end{pmatrix}, \quad \text{and} \quad \hat{\beta}_{(2)} = \begin{pmatrix} 0.41 \\ -0.03 \\ -0.38 \end{pmatrix}$
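The logit model can be evaluated directly from the estimated parameters. The following sketch uses the rounded estimates for men given above; it assumes that the rows of the coefficient matrix index the destination class and the columns index the parental class, both in the order S, I, W, which is an assumption about the layout rather than something stated in the text. The function and constant names are illustrative.

```python
import math

# Rounded estimates for men as printed above (category order S, I, W assumed).
ALPHA = [0.60, 1.20, 1.79]
B1 = [[0.35, -0.10, -0.25],
      [-0.05, 0.16, -0.11],
      [-0.30, -0.06, 0.36]]   # B1[j][i]: destination class j, parental class i
B2 = [0.32, -0.07, -0.25]
LEVELS = ("S", "I", "W")

def class_probabilities(x, z):
    """Pr(Y = j | X = x, Z = z) for j = 0, 1, 2, with x in 0..2 and z in 1..7."""
    theta = [ALPHA[j] + B1[j][x] + B2[j] * z for j in range(3)]
    top = max(theta)                          # subtract the max for stability
    w = [math.exp(t - top) for t in theta]
    s = sum(w)
    return [v / s for v in w]

# Destination-class probabilities for parental class S and education level 5.
probs = class_probabilities(0, 5)
for level, p in zip(LEVELS, probs):
    print(level, round(p, 3))
```

Because the education coefficient is largest for the first destination class, the probability of that class increases with $Z$ for any fixed parental class under these estimates.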
From Tables 1 and 5 in [10], the joint distributions of parental class $X$ and education $Z$ for men and women are calculated, respectively, in Table 1.
On the basis of the estimated parameters shown above and the estimated joint distribution of $X$ and $Z$ in Table 1, the joint distributions of $X$, $Y$, and $Z$ by sex can be estimated. The effects of $X$ and $Z$ on $Y$ for men are shown in Tables 2–4. For example, the effects of $X = S$ and $Z = 5$ on $Y = S$ illustrated in Table 2 are as follows:
• the total effect of $X = S$ and $Z = 5$ on $Y = S$ is 0.51;
• the total effect of $Z = 5$ is 0.04;
• the total effect of $X = S$ is 0.47 when $Z = 5$;
• the direct effect of $X = S$ is 0.16 when $Z = 5$;
• the indirect effect of $X = S$ is 0.31 when $Z = 5$.
Similarly, the effects of X and Z on Y for women can be calculated. The results are omitted to avoid redundancy of the discussion.
The standardized summary effects are shown in Table 5. For men, the total effect of X and Z on Y is 0.276, and so 27.6% of the variation of Y’s entropy is explained by X and Z. The indirect effect of X is about twice the direct effect, and the total (direct) effect of Z on Y is about 1.5-fold that of X. Therefore, the effect of education Z on the destination class Y is large. For women, the total effect of X and Z on Y is 0.289, meaning that 28.9% of the variation of Y’s entropy is explained by X and Z. The indirect effect of X on Y is about 6-fold that of the direct effect, and the direct effect is small. The total effect of Z on Y is about 2.7-fold that of X. The effect of Z on Y is more pronounced for women than for men.
In a comparison of men and women, the effect of Z on Y for women is about 1.3-fold the effect for men, and, contrarily, the effect of X on Y for men is about 1.4-fold the effect for women. For both men and women, the direct effects of X on Y are mostly very small, and this decomposition of effects shows that education plays an important role in determining social class as an adult.
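The computation of standardized summary effects can be sketched from the quantities printed in this section. The code below is an illustration under stated assumptions: it uses the rounded parameter estimates for men and the Table 1 joint distribution for men, assumes the category order S, I, W for rows and columns, renormalizes the rounded table so that it sums to one, and takes $a(\phi) = 1$ for the logit model. Because of rounding, it is not guaranteed to reproduce the values reported in Table 5; the helper `sym_kl` is hypothetical.

```python
import math

ALPHA = [0.60, 1.20, 1.79]
B1 = [[0.35, -0.10, -0.25],
      [-0.05, 0.16, -0.11],
      [-0.30, -0.06, 0.36]]   # B1[j][i]: destination class j, parental class i
B2 = [0.32, -0.07, -0.25]
# Table 1, men: rows X = S, I, W; columns Z = 1, ..., 7.
G_MEN = [[0.038, 0.015, 0.009, 0.053, 0.050, 0.047, 0.082],
         [0.069, 0.017, 0.015, 0.051, 0.032, 0.037, 0.025],
         [0.189, 0.018, 0.037, 0.088, 0.055, 0.051, 0.028]]

def f_y(x, z):
    """Pr(Y = j | X = x, Z = z) under the logit model."""
    theta = [ALPHA[j] + B1[j][x] + B2[j] * z for j in range(3)]
    top = max(theta)
    w = [math.exp(t - top) for t in theta]
    s = sum(w)
    return [v / s for v in w]

g_sum = sum(map(sum, G_MEN))  # rounded entries do not sum exactly to 1
g = {(x, z): G_MEN[x][z - 1] / g_sum for x in range(3) for z in range(1, 8)}
joint = {(x, z, y): g[x, z] * f_y(x, z)[y] for (x, z) in g for y in range(3)}

def sym_kl(key):
    """Symmetric KL between f(y | x, z) and f(y | key(x, z)), averaged over g."""
    num, den = {}, {}
    for (x, z, y), p in joint.items():
        k = key(x, z)
        num[k, y] = num.get((k, y), 0.0) + p
        den[k] = den.get(k, 0.0) + p
    total = 0.0
    for (x, z, y), p in joint.items():
        k = key(x, z)
        f = p / g[x, z]
        f_ref = num[k, y] / den[k]
        total += g[x, z] * (f - f_ref) * math.log(f / f_ref)
    return total

kl_total = sym_kl(lambda x, z: None)   # KL((X, Z), Y)
kl_z_given_x = sym_kl(lambda x, z: x)  # KL(Z, Y | X)
kl_x_given_z = sym_kl(lambda x, z: z)  # KL(X, Y | Z)

denom = kl_total + 1.0
e_total = kl_total / denom                    # total effect of (X, Z) on Y
e_t_z = kl_z_given_x / denom                  # total (= direct) effect of Z
e_t_x = (kl_total - kl_z_given_x) / denom     # total effect of X
e_d_x = kl_x_given_z / denom                  # direct effect of X
e_i_x = e_t_x - e_d_x                         # indirect effect of X
print(e_total, e_t_x, e_d_x, e_i_x, e_t_z)
```

The same steps apply to the data for women after swapping in the corresponding parameter estimates and the lower half of Table 1.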

7. Discussion

In the usual path analysis of continuous variable systems, use of the regression coefficients allows straightforward calculation of total, direct and indirect effects, and the total effect can be expressed as the sum of the direct and indirect effects. However, such techniques cannot be applied to structural GLMs with categorical variables or variables that are not normally distributed. Moreover, multiple variable categories make the problem more complicated in comparison with linear equation models for continuous variables. In the present paper, a path analysis approach for structural GLMs was proposed, and calculation of the direct and indirect effects was discussed. Although the effects of explanatory variables on response variables can be discussed in detail by using log odds ratios, and the effects can be interpreted as changes of relative information, the results are generally quite complicated, as demonstrated in Tables 2–4. The present path analysis summarizes the effects, as measured by log odds ratios, and the standardized summary total, direct, and indirect effects are interpreted in the framework of entropy. The present path analysis approach has potential for wide application in practical data analyses of causal systems represented as GLMs, and is particularly well suited to categorical data analysis. The present study has provided a basic idea for path analysis of recursive systems with GLMs, where all the variables concerned are causally ordered, and further studies are needed for performing path analysis of more complicated recursive GLM systems and assessing spurious effects.

Acknowledgments

The authors would like to thank the two referees and the editor for their useful comments and suggestions for improving the first version of this paper. This research was supported by Grant-in-aid for Scientific Research 26330045, Ministry of Education, Culture, Sports, Science and Technology of Japan.

Author Contributions

Nobuoki Eshima and Claudio Giovanni Borroni wrote the manuscript. Nobuoki Eshima and Minoru Tabata designed the research. Nobuoki Eshima, Claudio Giovanni Borroni and Yutaka Kano carried out the present path analysis for a real dataset. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Asher, H.B. Causal Modelling; Sage Publications: Beverly Hills, CA, USA, 1976. [Google Scholar]
2. Bentler, P.M.; Weeks, D.B. Linear structural equations with latent variables. Psychometrika 1980, 45, 289–308. [Google Scholar]
3. Jöreskog, K.G.; Sörbom, D. LISREL8: User’s Reference Guide, 2nd ed; Scientific Software International: Chicago, IL, USA, 1996. [Google Scholar]
4. Goodman, L.A. Causal analysis of data from panel studies and other kinds of surveys. Am. J. Sociol. 1973, 78, 1135–1191. [Google Scholar]
5. Goodman, L.A. The analysis of multidimensional contingency tables when some variables are posterior to others: A modified path analysis approach. Biometrika 1973, 60, 179–192. [Google Scholar]
6. Goodman, L.A. The analysis of systems of qualitative variables when some of the variables are unidentifiable: Part I. A modified latent structure approach. Am. J. Sociol. 1974, 79, 1179–1259. [Google Scholar]
7. Goodman, L.A. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 1974, 61, 215–231. [Google Scholar]
8. Hagenaars, J.A. Categorical causal modeling: Latent class analysis and directed loglinear models with latent variables. Sociol. Methods Res. 1998, 26, 436–489. [Google Scholar]
9. Eshima, N.; Tabata, M.; Geng, Z. Path analysis with logistic regression models: Effect analysis of fully recursive causal systems of categorical variables. J. Jpn. Stat. Soc. 2001, 31, 1–14. [Google Scholar]
10. Kuha, J.; Goldthorpe, J.H. Path analysis for discrete variables: The role of education in social mobility. J. R. Stat. Soc. A 2010, 173, 351–369. [Google Scholar]
11. Albert, J.M.; Nelson, S. Generalized causal mediation analysis. Biometrics 2011, 67, 1028–1038. [Google Scholar]
12. Nelder, J.A.; Wedderburn, R.W.M. Generalized linear model. J. R. Stat. Soc. A 1972, 135, 370–384. [Google Scholar]
13. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: London, UK, 1989.
14. Eshima, N.; Tabata, M. Entropy correlation coefficient for measuring predictive power of generalized linear models. Stat. Probab. Lett. 2007, 77, 588–593.
15. Eshima, N.; Tabata, M. Entropy coefficient of determination for generalized linear models. Comput. Stat. Data Anal. 2010, 54, 1381–1389.
16. Eshima, N.; Tabata, M. Three predictive power measures for generalized linear models: Entropy coefficient of determination, entropy correlation coefficient and regression correlation coefficient. Comput. Stat. Data Anal. 2011, 55, 3049–3058.
17. Patnaik, P.B. The non-central χ2 and F-distributions and their applications. Biometrika 1949, 36, 202–232.
Figure 1. Path diagram of social class mobility.
Table 1. The estimated joint distributions of parental class X and education level Z.

| Sex | Parental Class X | Z = 1 | Z = 2 | Z = 3 | Z = 4 | Z = 5 | Z = 6 | Z = 7 |
|---|---|---|---|---|---|---|---|---|
| Men | S | 0.038 | 0.015 | 0.009 | 0.053 | 0.050 | 0.047 | 0.082 |
| Men | I | 0.069 | 0.017 | 0.015 | 0.051 | 0.032 | 0.037 | 0.025 |
| Men | W | 0.189 | 0.018 | 0.037 | 0.088 | 0.055 | 0.051 | 0.028 |
| Women | S | 0.040 | 0.014 | 0.020 | 0.079 | 0.034 | 0.048 | 0.048 |
| Women | I | 0.072 | 0.013 | 0.023 | 0.075 | 0.021 | 0.034 | 0.021 |
| Women | W | 0.216 | 0.018 | 0.046 | 0.110 | 0.023 | 0.032 | 0.009 |
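Table 2. The effects of X and Z on Y = S.

| X | Z | (X,Z) (Total) | Z (Total) | X (Total) | X (Direct) | X (Indirect) |
|---|---|---|---|---|---|---|
| S | 1 | −0.79 | −0.89 | 0.10 | 0.22 | −0.12 |
| S | 2 | −0.49 | −0.65 | 0.19 | 0.23 | −0.04 |
| S | 3 | −0.14 | −0.42 | 0.29 | 0.24 | 0.04 |
| S | 4 | 0.19 | −0.19 | 0.38 | 0.17 | 0.21 |
| S | 5 | 0.51 | 0.04 | 0.47 | 0.16 | 0.31 |
| S | 6 | 0.84 | 0.28 | 0.56 | 0.16 | 0.40 |
| S | 7 | 1.16 | 0.51 | 0.65 | 0.11 | 0.55 |
| I | 1 | −0.93 | −0.89 | −0.04 | −0.04 | −0.01 |
| I | 2 | −0.93 | −0.57 | −0.04 | −0.05 | 0.01 |
| I | 3 | −0.28 | −0.25 | −0.03 | −0.05 | 0.02 |
| I | 4 | 0.04 | 0.07 | −0.03 | −0.04 | 0.02 |
| I | 5 | 0.37 | 0.39 | −0.02 | −0.05 | 0.03 |
| I | 6 | 0.69 | 0.71 | −0.01 | −0.05 | 0.03 |
| I | 7 | 1.02 | 1.02 | −0.01 | −0.05 | 0.04 |
| W | 1 | −0.99 | −0.84 | −0.15 | 0.17 | −0.32 |
| W | 2 | −0.67 | −0.45 | −0.21 | 0.01 | −0.23 |
| W | 3 | −0.34 | −0.07 | −0.23 | 0.03 | −0.30 |
| W | 4 | −0.02 | 0.32 | −0.34 | 0.06 | −0.39 |
| W | 5 | 0.31 | 0.71 | −0.40 | 0.03 | −0.43 |
| W | 6 | 0.63 | 1.09 | −0.46 | 0.03 | −0.49 |
| W | 7 | 0.96 | 1.48 | −0.52 | 0.01 | −0.53 |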
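Table 3. The effects of X and Z on Y = I.

| X | Z | (X,Z) (Total) | Z (Total) | X (Total) | X (Direct) | X (Indirect) |
|---|---|---|---|---|---|---|
| S | 1 | 0.07 | 0.60 | −0.52 | −0.07 | −0.45 |
| S | 2 | 0.01 | 0.44 | −0.43 | −0.09 | −0.34 |
| S | 3 | −0.06 | 0.28 | −0.34 | −0.09 | −0.25 |
| S | 4 | −0.12 | 0.13 | −0.25 | −0.10 | −0.15 |
| S | 5 | −0.19 | −0.03 | −0.16 | −0.11 | −0.04 |
| S | 6 | −0.25 | −0.19 | −0.06 | −0.12 | 0.06 |
| S | 7 | −0.32 | −0.34 | 0.03 | −0.12 | 0.15 |
| I | 1 | 0.15 | 0.24 | −0.09 | 0.06 | −0.16 |
| I | 2 | 0.08 | 0.17 | −0.09 | 0.10 | −0.18 |
| I | 3 | 0.02 | 0.10 | −0.08 | 0.10 | −0.18 |
| I | 4 | −0.05 | 0.03 | −0.07 | 0.07 | −0.15 |
| I | 5 | −0.11 | −0.05 | −0.07 | 0.09 | −0.15 |
| I | 6 | −0.18 | −0.12 | −0.06 | 0.08 | −0.14 |
| I | 7 | −0.24 | −0.19 | −0.05 | 0.09 | −0.15 |
| W | 1 | 0.32 | 0.01 | 0.31 | 0.07 | 0.24 |
| W | 2 | 0.25 | 0.00 | 0.25 | 0.00 | 0.25 |
| W | 3 | 0.19 | 0.00 | 0.19 | 0.01 | 0.18 |
| W | 4 | 0.12 | 0.00 | 0.13 | 0.01 | 0.12 |
| W | 5 | 0.06 | −0.01 | 0.07 | 0.00 | 0.06 |
| W | 6 | −0.01 | −0.01 | 0.00 | 0.00 | 0.00 |
| W | 7 | −0.07 | −0.01 | −0.06 | 0.00 | −0.05 |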
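Table 4. The effects of X and Z on Y = W.

| X | Z | (X,Z) (Total) | Z (Total) | X (Total) | X (Direct) | X (Indirect) |
|---|---|---|---|---|---|---|
| S | 1 | 0.67 | 1.28 | −0.62 | −0.05 | −0.57 |
| S | 2 | 0.42 | 0.95 | −0.52 | −0.09 | −0.44 |
| S | 3 | 0.18 | 0.61 | −0.43 | −0.09 | −0.34 |
| S | 4 | −0.07 | 0.27 | −0.34 | −0.07 | −0.27 |
| S | 5 | −0.31 | −0.06 | −0.25 | −0.09 | −0.16 |
| S | 6 | −0.56 | −0.40 | −0.16 | −0.10 | −0.06 |
| S | 7 | −0.80 | −0.74 | −0.06 | −0.08 | 0.01 |
| I | 1 | 0.74 | 0.65 | 0.09 | −0.02 | 0.11 |
| I | 2 | 0.49 | 0.39 | 0.10 | −0.04 | 0.14 |
| I | 3 | 0.25 | 0.14 | 0.11 | −0.05 | 0.15 |
| I | 4 | 0.00 | −0.11 | 0.11 | −0.03 | 0.14 |
| I | 5 | −0.24 | −0.36 | 0.12 | −0.04 | 0.16 |
| I | 6 | −0.49 | −0.61 | 0.12 | −0.04 | 0.16 |
| I | 7 | −0.73 | −0.86 | 0.13 | −0.04 | 0.17 |
| W | 1 | 0.64 | 0.40 | 0.24 | −0.10 | 0.34 |
| W | 2 | 0.39 | 0.21 | 0.18 | −0.01 | 0.19 |
| W | 3 | 0.15 | 0.03 | 0.12 | −0.03 | 0.14 |
| W | 4 | −0.10 | −0.15 | 0.06 | −0.07 | 0.12 |
| W | 5 | −0.34 | −0.34 | −0.01 | −0.05 | 0.04 |
| W | 6 | −0.59 | −0.52 | −0.07 | −0.05 | −0.02 |
| W | 7 | −0.83 | −0.70 | −0.13 | −0.03 | −0.10 |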
Table 5. Summary Direct, Indirect, and Total Effects of X and Z on Y.

| Sex | Explanatory Variables | Direct Effect | Indirect Effect | Total Effect |
|---|---|---|---|---|
| Men | Parental Class X | 0.033 (0.004)* | 0.076 (0.07) | 0.109 (0.008) |
| Men | Education Z | 0.168 (0.010) | | 0.168 (0.010) |
| Men | (X,Z) | | | 0.276 (0.013) |
| Women | Parental Class X | 0.011 (0.002) | 0.068 (0.06) | 0.079 (0.007) |
| Women | Education Z | 0.210 (0.011) | | 0.210 (0.011) |
| Women | (X,Z) | | | 0.289 (0.012) |

*The numbers in parentheses are the standard errors.
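
The additive structure of the summary effects in Table 5 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the point estimates are transcribed from Table 5 with standard errors omitted, the dictionary keys and the function name `decomposition_gaps` are hypothetical, and the second relation (total effect of (X,Z) equals the sum of the total effects of X and Z) is an assumption read off the reported numbers rather than a formula quoted from the text.

```python
# Point estimates transcribed from Table 5 (standard errors omitted).
# Key names are illustrative, not taken from the paper.
table5 = {
    "Men": {
        "x_direct": 0.033, "x_indirect": 0.076, "x_total": 0.109,
        "z_total": 0.168, "xz_total": 0.276,
    },
    "Women": {
        "x_direct": 0.011, "x_indirect": 0.068, "x_total": 0.079,
        "z_total": 0.210, "xz_total": 0.289,
    },
}


def decomposition_gaps(row):
    """Return the two residuals of the additive decomposition.

    gap_x  : total effect of X minus (direct + indirect effect of X)
    gap_xz : total effect of (X,Z) minus (total of X + total of Z);
             this second relation is assumed from the table's numbers.
    """
    gap_x = row["x_total"] - (row["x_direct"] + row["x_indirect"])
    gap_xz = row["xz_total"] - (row["x_total"] + row["z_total"])
    return gap_x, gap_xz


for sex, row in table5.items():
    gap_x, gap_xz = decomposition_gaps(row)
    # Residuals within about 0.001 are consistent with three-decimal rounding.
    print(f"{sex}: gap_x={gap_x:+.3f}, gap_xz={gap_xz:+.3f}")
```

Residuals of about 0.001 or less in absolute value are what one would expect from rounding the reported estimates to three decimals.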
