On the folded normal distribution

The characteristic function of the folded normal distribution and, thus, its moment generating function are derived. The entropy of the folded normal distribution and its Kullback-Leibler divergence from the normal and half normal distributions are approximated using Taylor series. The accuracy of the results is also assessed using different criteria. The maximum likelihood estimates and confidence intervals for the parameters are obtained using asymptotic theory and the bootstrap method. The coverage of the confidence intervals is also examined.


Introduction
Mainly studied in the 1960s, the folded normal distribution is a special case of the Gaussian distribution occurring when the sign of the variable is always positive. In 1961, a method of estimating the parameters based upon the estimating equations of the moments was discussed in [1], where the authors also gave some examples of its applications in the industrial sector. The folded normal distribution was used to study the magnitude of deviation of an automobile strut alignment [2]. The properties of the multivariate folded normal distribution, with its possible applications, were studied in [3]; in addition, tables with probabilities for a range of values of the parameter vector were provided, and an application of the model to real data was illustrated. An alternative estimation method using the second and fourth moments of the distribution was proposed in [4], whilst [5] performed maximum likelihood estimation and calculated the asymptotic information matrix. Thereafter, the sequential probability ratio test for the null hypothesis of the location parameter being zero against a specific alternative was evaluated in [6], with the idea of illustrating the use of cumulative sum control charts for multiple observations. In [7], the author dealt with the hypothesis testing of a zero location parameter, regardless of whether the variance is known or not. The distribution formed by the ratio of two folded normal variables was studied and illustrated with a few applications in [8]. The folded normal distribution has been applied to many practical problems; for instance, [9] introduced an economic model to determine the process specification limits for folded normally distributed data.
Throughout this paper, we will examine the folded normal distribution from a different perspective. In the process, we will consider some of its properties, namely the characteristic and moment generating functions, the Laplace and Fourier transformations and the mean residual life of this distribution. The entropy of this distribution and its Kullback-Leibler divergence from the normal and half normal distributions will be approximated via Taylor series. The accuracy of the approximations is assessed using numerical examples.
Also reviewed here are the maximum likelihood estimates (for an introduction, see [1]), with examples from simulated data given for illustration purposes. Simulation studies will be performed to assess the validity of the estimates, with and without bootstrap calibration, in small-sample cases. Numerical optimization of the log-likelihood will be carried out using the simplex method [10].

The Folded Normal
The folded normal distribution with parameters (µ, σ²) stems from taking the absolute value of a normally distributed variable with the same parameters. The density of Y ∼ N(µ, σ²) is given by:

f_Y(y) = (1/(σ√(2π))) e^(−(y−µ)²/(2σ²))    (1)

Thus, X = |Y|, denoted by X ∼ FN(µ, σ²), has the following density:

f(x) = (1/(σ√(2π))) [e^(−(x−µ)²/(2σ²)) + e^(−(x+µ)²/(2σ²))],  x ≥ 0    (2)

The density can be written in a more attractive form [5]:

f(x) = √(2/(πσ²)) e^(−(x²+µ²)/(2σ²)) cosh(µx/σ²),  x ≥ 0    (3)

and by expanding the cosh via a Taylor series, we can also write the density as:

f(x) = √(2/(πσ²)) e^(−(x²+µ²)/(2σ²)) Σ_{n=0}^∞ (µx/σ²)^(2n)/(2n)!,  x ≥ 0    (4)

We can see that the folded normal distribution is not a member of the exponential family. The cumulative distribution function can be written as:

F(x) = ½ [erf((x+µ)/(σ√2)) + erf((x−µ)/(σ√2))],  x ≥ 0    (5)

where erf is the error function:

erf(x) = (2/√π) ∫₀ˣ e^(−t²) dt    (6)

The mean and the variance of Equation (2) are calculated by direct evaluation of the integrals [1]:

E(X) = σ√(2/π) e^(−µ²/(2σ²)) + µ [1 − 2Φ(−µ/σ)]    (7)

Var(X) = µ² + σ² − [E(X)]²    (8)

where Φ(·) is the cumulative distribution function of the standard normal distribution. The third and fourth moments about the origin are calculated in [4]. We develop the calculations further by providing the characteristic function and the moment generating function of Equation (2). Figure 1 shows the densities of the folded normal for some parameter values.
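As a quick numerical sanity check of the two-exponential density and the closed-form mean, both can be coded directly. The sketch below is in Python rather than the R used elsewhere in the paper, and the function names are illustrative:

```python
import numpy as np
from scipy.stats import norm

def dfoldnorm(x, mu, sigma):
    """Density of FN(mu, sigma^2): the sum of the two normal exponentials."""
    c = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return np.where(x >= 0,
                    c * (np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
                         + np.exp(-(x + mu) ** 2 / (2 * sigma ** 2))),
                    0.0)

def foldnorm_mean(mu, sigma):
    """Closed-form mean: sigma*sqrt(2/pi)*exp(-mu^2/(2 sigma^2)) + mu*(1 - 2*Phi(-mu/sigma))."""
    return (sigma * np.sqrt(2.0 / np.pi) * np.exp(-mu ** 2 / (2 * sigma ** 2))
            + mu * (1.0 - 2.0 * norm.cdf(-mu / sigma)))

# sanity check against a Monte Carlo sample of |N(2, 9)|
rng = np.random.default_rng(1)
y = np.abs(rng.normal(2.0, 3.0, size=500_000))
print(foldnorm_mean(2.0, 3.0), y.mean())  # the two numbers should agree closely
```

The Monte Carlo mean and the closed form typically agree to two or three decimal places at this sample size.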

Relations to Other Distributions
The distribution of Z = X/σ is a non-central χ distribution with one degree of freedom and non-centrality parameter equal to (µ/σ)² [11]. It is clear that when µ = 0, a central χ₁ distribution is obtained. The half normal distribution is a special case of Equation (2) with µ = 0, for which [12] showed that it is the limiting form of the folded (central) t distribution as the degrees of freedom of the latter go to infinity. Both distributions are further developed in the bivariate case in [13].
The folded normal distribution can also be seen as the limit of the folded non-standardized t distribution as the degrees of freedom go to infinity. The folded non-standardized t distribution is the distribution of the absolute value of a non-standardized t variable with v degrees of freedom:

f(x) = Γ((v+1)/2) / (Γ(v/2) √(vπσ²)) { [1 + (x−µ)²/(vσ²)]^(−(v+1)/2) + [1 + (x+µ)²/(vσ²)]^(−(v+1)/2) },  x ≥ 0

Mode of the Folded Normal Distribution
The mode of the distribution is the value of x for which the density is maximised. In order to find this value, we take the first derivative of the density with respect to x and set it equal to zero. Unfortunately, there is no closed form solution. Setting the derivative of the logarithm of the density Equation (3) equal to zero, however, leads to the non-linear equation:

x = µ tanh (µx/σ²)
We saw from numerical investigation that when µ < σ, the maximum is attained at x = 0. When µ ≥ σ, the maximum is attained at some x > 0, and when µ becomes greater than 3σ, the maximum approaches µ. This is of course to be expected, since, in this case, the folded normal converges to the normal distribution.
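The mode can be computed numerically by fixed-point iteration on the stationarity condition x = µ tanh(µx/σ²), which follows from setting the derivative of the log-density of the cosh form to zero. A minimal Python sketch (function name illustrative); the µ ≤ σ branch encodes the numerical observation above that the mode then sits at zero:

```python
import math

def foldnorm_mode(mu, sigma):
    """Mode of FN(mu, sigma^2) via fixed-point iteration on x = mu*tanh(mu*x/sigma^2).

    x = 0 is always a stationary point; a positive root of the fixed-point
    equation exists only when mu > sigma (the slope mu^2/sigma^2 at zero exceeds 1).
    """
    if mu <= sigma:
        return 0.0
    x = mu  # start from the mode of the underlying normal
    for _ in range(500):
        x_new = mu * math.tanh(mu * x / sigma ** 2)
        if abs(x_new - x) < 1e-12:
            break
        x = x_new
    return x

print(foldnorm_mode(1.0, 2.0))   # mu < sigma: mode at zero
print(foldnorm_mode(10.0, 2.0))  # mu > 3*sigma: mode approaches mu
```

The iteration converges because the slope of µ tanh(µx/σ²) at the positive fixed point is below one.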

Characteristic Function and Other Related Functions of the Folded Normal Distribution
Forms for the higher moments of the distribution, for odd and even orders, are provided in [4]. Here, we derive its characteristic function and, thus, the moment generating function.
We will now work with the two exponents of Equation (15), denoted by A and B. Completing the square in the first exponent, A, with a = iσ²t + µ, the first part of Equation (15) becomes:

e^(iµt − σ²t²/2) Φ(µ/σ + iσt)

The second exponent, B, using similar calculations, yields the second part of Equation (15):

e^(−iµt − σ²t²/2) Φ(−µ/σ + iσt)

Finally, the characteristic function becomes:

φ_X(t) = e^(iµt − σ²t²/2) Φ(µ/σ + iσt) + e^(−iµt − σ²t²/2) Φ(−µ/σ + iσt)    (22)

Below, we list some more functions that involve expectations.

1. The moment generating function of Equation (2) exists and is equal to:

M_X(t) = e^(σ²t²/2) [e^(µt) Φ(µ/σ + σt) + e^(−µt) Φ(−µ/σ + σt)]    (23)
We can see that the moment generating function can be differentiated infinitely many times, since every derivative contains the density of the normal distribution and, thus, always contains some exponential terms. The folded normal distribution is not a stable distribution; that is, the sum of independent folded normal random variables does not follow a folded normal distribution. We can see this from the characteristic function, Equation (22), or the moment generating function, Equation (23).
2. The cumulant generating function is simply the logarithm of the moment generating function.

3. The Laplace transformation can easily be derived from the moment generating function and is equal to E(e^(−tX)) = M_X(−t).

4. The Fourier transformation is E(e^(−2πitX)). However, this is closely related to the characteristic function, since E(e^(−2πitX)) = φ_X(−2πt).

5. The mean residual life is given by e(t) = E(X − t | X > t), where t ∈ R⁺. The denominator of this conditional expectation is 1 − F(t), with F the cumulative distribution function given earlier. The numerator, ∫ₜ^∞ x f(x) dx, could equivalently be written in terms of 1 − F(x), but we do not use that form; instead, it is calculated in the same way as the mean, by completing the square in each exponent:

∫ₜ^∞ x f(x) dx = σ [φ((t−µ)/σ) + φ((t+µ)/σ)] + µ [Φ((t+µ)/σ) − Φ((t−µ)/σ)]

where φ(·) denotes the standard normal density. Finally, the mean residual life can be written as:

e(t) = ( σ [φ((t−µ)/σ) + φ((t+µ)/σ)] + µ [Φ((t+µ)/σ) − Φ((t−µ)/σ)] ) / (1 − F(t)) − t
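The closed form of the moment generating function is easy to check by simulation. A Python sketch (the expression coded below is the folded normal MGF stated above; names and the chosen test point are illustrative):

```python
import numpy as np
from scipy.stats import norm

def foldnorm_mgf(t, mu, sigma):
    """Closed-form moment generating function of X = |N(mu, sigma^2)|."""
    return np.exp(sigma ** 2 * t ** 2 / 2) * (
        np.exp(mu * t) * norm.cdf(mu / sigma + sigma * t)
        + np.exp(-mu * t) * norm.cdf(-mu / sigma + sigma * t))

# Monte Carlo check of E[e^{tX}] for X = |N(2, 9)|
rng = np.random.default_rng(0)
x = np.abs(rng.normal(2.0, 3.0, size=1_000_000))
t = 0.3
print(foldnorm_mgf(t, 2.0, 3.0), np.exp(t * x).mean())
```

The Monte Carlo estimate of E(e^(tX)) and the closed form agree to within sampling error.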

Entropy and Kullback-Leibler Divergence
When studying a distribution, the entropy and the Kullback-Leibler divergence from some other distributions are two measures that have to be calculated. In this case, we tried to approximate both of these quantities using a Taylor series. Numerical examples are displayed to show the performance of the approximations.

Entropy
The entropy is defined as the expectation of − log f (X).
Let us now take the second term of Equation (35); it can be evaluated directly, since the first moment is given in Equation (7). Finally, the third term of Equation (35) is handled by making use of the Taylor expansion of log (1 + x) around zero, with e^(−2µx/σ²) in place of x. Thus, we have managed to "break" the second integral of the entropy Equation (35) down into smaller pieces, by interchanging the order of the summation and the integration and completing the square in the same way as for the characteristic function, with a_n = −2nµ/σ². The final form of the entropy is given in Equation (40). Figure 2 shows the true value of Equation (40) when σ = 5 and µ ranges from zero to 25, thus for values of θ = µ/σ from zero to five. The true value was calculated using numerical integration; R provides this option with the command integrate. The second and third order approximations (using the first two and three terms of the infinite sums in Equation (40)) are also displayed for comparison. We can see that the second order approximation is not as good as the third order one, especially for small values of θ. The Taylor approximation in Equation (40) is valid when the expansion variable is close to zero; as with any expansion of the logarithm around zero, the approximation loses accuracy further away from that point. When the values of θ are small, the term e^(−2µx/σ²) inside log (1 + e^(−2µx/σ²)) is far from zero. As θ increases, and thus the exponential term decreases, the Taylor series approximates the true value better. This is why we see a small discrepancy of the approximations on the left of Figure 2, which becomes negligible later on.
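The "true value" of the entropy, computed above in R with integrate, can be reproduced with any numerical quadrature. A Python sketch using SciPy, cross-checked against SciPy's built-in folded normal (whose shape parameter is c = µ/σ with scale σ); the function name is illustrative:

```python
import numpy as np
from scipy import integrate
from scipy.stats import foldnorm

def entropy_foldnorm(mu, sigma):
    """Entropy of FN(mu, sigma^2) by numerical integration of -f log f."""
    c = 1.0 / (sigma * np.sqrt(2 * np.pi))
    def f(x):
        return c * (np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
                    + np.exp(-(x + mu) ** 2 / (2 * sigma ** 2)))
    def integrand(x):
        fx = f(x)
        # guard against f underflowing to zero far in the tail
        return -fx * np.log(fx) if fx > 0 else 0.0
    val, _ = integrate.quad(integrand, 0, np.inf)
    return val

# cross-check against SciPy's folded normal (shape c = mu/sigma, scale = sigma)
mu, sigma = 10.0, 5.0
print(entropy_foldnorm(mu, sigma), foldnorm(mu / sigma, scale=sigma).entropy())
```

At µ = 0 this reduces to the half normal entropy, log(σ√(πe/2)).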

Kullback-Leibler Divergence from the Normal Distribution
The Kullback-Leibler divergence [14] of one distribution from another is defined, in general, as the expectation of the logarithm of the ratio of the two densities with respect to the first one. The divergence of the folded normal distribution from the normal distribution is the same as the second integral of Equation (35); thus, we can approximate this divergence by the same Taylor series. Figure 3 presents two cases of the Kullback-Leibler divergence (true value, second and third order approximations), for illustration purposes, using the first two and three terms of the infinite sum. In the first graph, the standard deviation is equal to one, and in the second case, it is equal to five. The divergence appears independent of the variance; the change occurs as a result of the value of θ. It becomes clear that as the ratio of the mean to the standard deviation increases, the folded normal converges to the normal distribution.
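Since the density ratio of the folded normal to the normal on the positive axis is 1 + e^(−2µx/σ²), the divergence is easy to evaluate by quadrature. A Python sketch (illustrative, not the paper's R code), confirming that the divergence depends on θ = µ/σ only and vanishes as θ grows:

```python
import numpy as np
from scipy import integrate

def kl_folded_from_normal(mu, sigma):
    """KL divergence of FN(mu, sigma^2) from N(mu, sigma^2).

    On x >= 0 the density ratio f_FN/f_N equals 1 + exp(-2*mu*x/sigma^2).
    """
    c = 1.0 / (sigma * np.sqrt(2 * np.pi))
    def f(x):  # folded normal density
        return c * (np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
                    + np.exp(-(x + mu) ** 2 / (2 * sigma ** 2)))
    integrand = lambda x: f(x) * np.log1p(np.exp(-2 * mu * x / sigma ** 2))
    return integrate.quad(integrand, 0, np.inf)[0]

for theta in (0.5, 1.0, 2.0, 4.0):
    print(theta, kl_folded_from_normal(5.0 * theta, 5.0))
```

The printed values decrease towards zero as θ increases, matching Figure 3.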

Kullback-Leibler Divergence from the Half Normal Distribution
As mentioned in Section 2.1, the half normal distribution is a special case of the folded normal distribution with µ = 0. The Kullback-Leibler divergence of the folded normal from the half normal distribution is equal to:

KL = (µ µ_f)/σ² − µ²/(2σ²) − log 2 + E [ log (1 + e^(−2µX/σ²)) ]

where f(x; µ, σ²) stands for the folded normal Equation (2) and µ_f is the expected value given in Equation (7). Figure 4 shows the approximations to the true value when σ = 1 and σ = 5. This time, we used the third and fifth order approximations, but even then, for small values of θ, the approximations were not satisfactory. This result does not lead to a strict inequality between the divergences from the two other distributions. When µ > σ, the divergence from the half normal will usually be greater than the divergence from the normal, and when µ < σ, the opposite usually holds; however, neither inequality is strict, so this should be treated as a rule of thumb only.
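Using the cosh form of the density, the log ratio of the folded normal to the half normal on x ≥ 0 is log cosh(µx/σ²) − µ²/(2σ²), so this divergence can also be evaluated by quadrature. A Python sketch (illustrative names, overflow-safe log cosh):

```python
import numpy as np
from scipy import integrate

def kl_folded_from_halfnormal(mu, sigma):
    """KL divergence of FN(mu, sigma^2) from the half normal HN(sigma^2).

    The log density ratio on x >= 0 is log cosh(mu*x/sigma^2) - mu^2/(2*sigma^2).
    """
    c = 1.0 / (sigma * np.sqrt(2 * np.pi))
    def f(x):  # folded normal density
        return c * (np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
                    + np.exp(-(x + mu) ** 2 / (2 * sigma ** 2)))
    def logcosh(z):  # log cosh z = |z| + log1p(e^{-2|z|}) - log 2, overflow-safe
        z = abs(z)
        return z + np.log1p(np.exp(-2 * z)) - np.log(2.0)
    val, _ = integrate.quad(lambda x: f(x) * logcosh(mu * x / sigma ** 2), 0, np.inf)
    return val - mu ** 2 / (2 * sigma ** 2)

for theta in (0.0, 0.5, 1.0, 2.0):
    print(theta, kl_folded_from_halfnormal(5.0 * theta, 5.0))
```

At θ = 0 the two distributions coincide and the divergence is zero; it grows with θ, mirroring Figure 4.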

Parameter Estimation
We will show two ways of estimating the parameters. The first can be found in [1], but we review it and add some more details. Both are essentially maximum likelihood estimation; in the first case, we perform a maximization, whereas in the second case, we seek the root of an equation.
The log-likelihood of Equation (2) can be written in the following way: where n is the sample size of the x_i values. By equating the partial derivative of Equation (41) with respect to µ to zero, we obtain a nice relationship: Note that Equation (42) has three solutions, one at zero and two more with opposite signs. The example in Section 4.1 will show the three solutions graphically. By substituting Equation (42) into the derivative of the log-likelihood w.r.t. σ² and equating it to zero, we get the following expression for the variance: The above relationships, Equations (42) and (43), can be used to obtain the maximum likelihood estimates in an efficient recursive way. We start with an initial value for σ² and find the positive root of Equation (42). Then, we insert this value of µ into Equation (43) and get an updated value of σ². The procedure is repeated until the change in the log-likelihood value is negligible. Another, easier and more efficient, way is to perform a search algorithm. Let us write Equation (42) in a more elegant way.
where σ² is defined in Equation (43). It becomes clear that the optimization of the log-likelihood Equation (41) with respect to the two parameters has turned into a root search for a function of one parameter only. We also tried to perform the maximization via the E-M algorithm, treating the sign of the underlying normal variable as the missing information, but it did not perform well in this case.
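Direct numerical maximization of the log-likelihood with the Nelder-Mead simplex (the approach used in the simulation studies, via optim in R) can be sketched in Python as follows; the log-variance parametrization and function names are our own choices, not the paper's:

```python
import numpy as np
from scipy.optimize import minimize

def foldnorm_loglik(theta, x):
    """Folded normal log-likelihood; the variance stays positive by
    optimizing over log(sigma^2)."""
    mu, log_s2 = theta
    s2 = np.exp(log_s2)
    return np.sum(-0.5 * np.log(2 * np.pi * s2)
                  + np.logaddexp(-(x - mu) ** 2 / (2 * s2),
                                 -(x + mu) ** 2 / (2 * s2)))

def foldnorm_mle(x):
    """Maximum likelihood estimates via the Nelder-Mead simplex."""
    start = np.array([x.mean(), np.log(x.var())])
    res = minimize(lambda th: -foldnorm_loglik(th, x), start, method="Nelder-Mead")
    return abs(res.x[0]), np.exp(res.x[1])  # mu is identified only up to sign

rng = np.random.default_rng(42)
x = np.abs(rng.normal(2.0, 3.0, size=100))
mu_hat, s2_hat = foldnorm_mle(x)
print(mu_hat, s2_hat)
```

The absolute value on the returned µ̂ reflects the sign symmetry of the likelihood noted above: the two maxima sit at ±µ̂.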

An Example with Simulated Data
We generated 100 random values from the FN(2, 9) distribution in order to illustrate the maximum likelihood estimation procedure. The estimated parameter values were (µ̂ = 2.183, σ̂² = 8.065). The corresponding 95% confidence intervals for µ and σ² were (0.782, 3.585) and (2.022, 14.108), respectively. Figure 5 shows graphically the existence of the three extrema of the log-likelihood Equation (41): one minimum (always at zero) and two maxima at the maximum likelihood estimates of µ.

Simulation Studies
Simulation studies were implemented to examine the accuracy of the estimates using numerical optimization based on the simplex method [10]. Numerical optimization was performed in R [15] using the optim function. The term accuracy refers to interval estimation rather than point estimation, since the interest was in constructing confidence intervals for the parameters. The number of simulations was set to R = 1,000. The sample sizes ranged from 20 to 100 for a range of values of the parameter vector. The R-package VGAM [16] offers algorithms for obtaining maximum likelihood estimates of the folded normal, but we have not used it here.
For every simulation, we calculated 95% confidence intervals using the normal approximation, where the variance was estimated from the inverse of the observed information matrix. The maximum likelihood estimates are asymptotically normal, with variance equal to the inverse of the Fisher information. The sample estimate of this information is given by the negative of the second derivative (Hessian matrix) of the log-likelihood with respect to the parameters. This is an asymptotic confidence interval.
Bootstrap confidence intervals were also calculated using the percentile method [17]. For every simulation, we produced the bootstrap distribution of the data with B = 1000 bootstrap repetitions. Thus, we calculated the 2.5% lower and upper quantiles for each of the parameters. In addition, we calculated the correlations for every pair of the parameters.
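The percentile bootstrap described above can be sketched as follows (in Python rather than R, with an illustrative B = 200 instead of the B = 1,000 used in the study, to keep the example fast):

```python
import numpy as np
from scipy.optimize import minimize

def fit_foldnorm(x):
    """ML fit of FN(mu, sigma^2) by Nelder-Mead on the log-likelihood."""
    def nll(th):
        mu, log_s2 = th
        s2 = np.exp(log_s2)
        return -np.sum(-0.5 * np.log(2 * np.pi * s2)
                       + np.logaddexp(-(x - mu) ** 2 / (2 * s2),
                                      -(x + mu) ** 2 / (2 * s2)))
    res = minimize(nll, [x.mean(), np.log(x.var())], method="Nelder-Mead")
    return abs(res.x[0]), np.exp(res.x[1])

def percentile_ci(x, B=200, alpha=0.05, seed=0):
    """Percentile bootstrap confidence intervals for (mu, sigma^2):
    refit on each resample, then take the alpha/2 and 1-alpha/2 quantiles."""
    rng = np.random.default_rng(seed)
    boots = np.array([fit_foldnorm(rng.choice(x, size=x.size, replace=True))
                      for _ in range(B)])
    lo = np.quantile(boots, alpha / 2, axis=0)
    hi = np.quantile(boots, 1 - alpha / 2, axis=0)
    return (lo[0], hi[0]), (lo[1], hi[1])

rng = np.random.default_rng(3)
x = np.abs(rng.normal(2.0, 3.0, size=100))
ci_mu, ci_s2 = percentile_ci(x)
print(ci_mu, ci_s2)
```

Refitting on each resample and reading off the empirical quantiles is exactly the percentile method; bias-corrected variants exist but are not considered here.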
Tables 1 to 4 present the coverage of the 95% confidence intervals for the two parameters at different pairs of sample size and mean. The rows correspond to the sample size, whereas the columns correspond to the ratio θ = µ/σ, with σ = 5 fixed. Table 1. Estimated coverage probability of the 95% confidence intervals for the mean parameter, µ, using the observed information matrix. What can be seen from Tables 1 and 2 is that, whilst the sample size is important, the value of θ, the mean to standard deviation ratio, is more important. As this ratio increases, the coverage probability increases as well and reaches the desired nominal 95%. This is also true for the bootstrap confidence intervals, but their coverage is in general higher and increases faster with the sample size than that of the asymptotic confidence intervals. What is more, when the value of θ is less than one, the bootstrap confidence interval is to be preferred. When the value of θ becomes equal to or greater than one, both the bootstrap and the asymptotic confidence intervals produce similar coverages.

The results regarding the variance are presented in Tables 3 and 4. When the value of θ is small, both ways of obtaining confidence intervals for this parameter are rather conservative. The bootstrap intervals tend to perform better, but not up to expectation. Even when the value of θ is large, if the sample size is not large enough, the nominal coverage of 95% is not attained. Table 3. Estimated coverage probability of the 95% confidence intervals for the variance parameter, σ², using the observed information matrix. The correlation between the two parameters was also estimated for every simulation from the observed information matrix. The results are displayed in Table 5. The correlation between the two parameters is always negative, irrespective of the sample size or the value of θ, except for the case when θ = 4, where the correlation becomes zero, as expected. As the value of θ grows larger, the probability mass of the normal distribution lying on the negative axis becomes smaller until it is negligible. In this case, the distribution equals the classical normal distribution, for which the two parameters are known to be orthogonal. Table 5. Estimated correlations between the two parameters obtained from the observed information matrix. Table 6 shows the probability of a normal random variable being less than zero when σ = 5, for the same values of θ as in the simulation studies. When the ratio of the mean to the standard deviation is small, the area of the normal distribution on the negative side is large; as this ratio increases, the probability decreases until it becomes zero. In that case, the folded normal is the normal distribution, since there are no negative values to fold onto the positive side. This is, of course, in accordance with all the previous observations and results.

Application to Body Mass Index Data
We fitted the folded normal distribution to real data: observations of the body mass index of 700 New Zealand adults, accessible via the R package VGAM [16]. These measurements are a random sample from the Fletcher Challenge/Auckland Heart and Health survey conducted in the early 1990s [18]. Figure 6 contains a histogram of the data, along with the parametric (folded normal) and the non-parametric (kernel) density estimates. It should be noted that the fitted folded normal here converges in distribution to the normal. The estimated parameters (using the optim command in R) were µ̂ = 26.685 (0.175) and σ̂² = 21.324 (1.140), with the standard errors appearing inside the parentheses. Since the sample size is very large, there is little real need for standard errors and, consequently, 95% confidence intervals, even though the ratio of the two estimates is only 1.251. Their estimated correlation coefficient was very close to zero (2 × 10⁻⁴), and the estimated probability that a normal variable with these parameters lies below zero is equal to zero.

Discussion
We derived the characteristic function of this distribution and, thus, its moment generating function. The cumulant generating function is simply the logarithm of the moment generating function, and therefore, it is easy to calculate. The importance of these two functions is that they allow us to calculate all the moments of the distribution. In addition, we calculated the Laplace and Fourier transformations and the mean residual life.
The entropy of the folded normal distribution and the Kullback-Leibler divergence of this distribution from the normal and half normal distributions were approximated using the Taylor series. The results were numerically evaluated against the true values and were as expected.
We reviewed the maximum likelihood estimates, simplified their calculation and examined some of their properties. Confidence intervals for the parameters were obtained using asymptotic theory and the bootstrap methodology under the umbrella of simulation studies.