
Technical Note

# Sampling the Multivariate Standard Normal Distribution under a Weighted Sum Constraint

Louvain Finance Center & CORE, Université catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
Risks 2018, 6(3), 64; https://doi.org/10.3390/risks6030064
Received: 30 May 2018 / Revised: 21 June 2018 / Accepted: 22 June 2018 / Published: 25 June 2018

## Abstract

Statistical modeling techniques, and factor models in particular, are extensively used in practice, especially in the insurance and finance industry, where many risks have to be accounted for. In risk management applications, it can be important to analyze the situation obtained by fixing the value of a weighted sum of factors, for example to a given quantile. In this work, we derive the $(n-1)$-dimensional distribution corresponding to an $n$-dimensional i.i.d. standard Normal vector $Z = (Z_1, Z_2, \dots, Z_n)'$ subject to the weighted sum constraint $w'Z = c$, where $w = (w_1, w_2, \dots, w_n)'$ and $w_i \neq 0$. This law is proven to be Normal, and its mean vector $\mu$ and covariance matrix $\Sigma$ are derived explicitly as functions of $(w, c)$. The derivation of the density relies on the analytical inversion of a positive definite matrix with a very specific structure. We show that the resulting law does not correspond to the naive sampling schemes one might think of first. The result is then used to design algorithms for sampling $Z$ under the constraint $w'Z = c$ or $w'Z \leq c$, and is illustrated with two applications dealing with Value-at-Risk and Expected Shortfall.

## 1. Introduction

Factor models are extensively used in statistical modeling. In banking and finance, for instance, it is standard procedure to introduce a dependence structure among loans (see, e.g., Li's model Li (2016), but also Andersen and Sidenius (2004); Hull and White (2004); Laurent et al. (2016); Vrins (2009), to name a few). In the popular one-factor Gaussian copula model, the default of the $i$-th entity is jointly driven by a common risk factor, say $Y$, and an idiosyncratic risk, all factors being independent and Normally distributed. In multi-factor models, the common factor is replaced by a weighted sum of Normal risk factors $\tilde Z := (\tilde Z_1, \tilde Z_2, \dots, \tilde Z_n)'$ representing different aspects of the global economy (region, sector, etc.). The random vector $\tilde Z$ can typically be expressed (via a Cholesky decomposition) as a weighted sum of $n$ (or fewer) i.i.d. standard Normal factors $Z := (Z_1, Z_2, \dots, Z_n)'$. Interestingly, the asymptotic law of such portfolios (as the number of loans tends to infinity) can be derived analytically when only one systematic factor ($n = 1$) is considered (see Vasicek (1991) for homogeneous pools and Gordy (2003) for an extension to heterogeneous pools). The case $n > 1$, however, requires numerical methods, typically Monte Carlo simulations. This raises the following question, whose practical interest will be illustrated with concrete examples: given a value $c$ of a common factor $Y = w'Z$, where $w = (w_1, w_2, \dots, w_n)'$ is a vector of non-zero weights, what is the distribution of $Z$? In other words, how can we sample $Z$ conditional upon $Y = c$? It is of course straightforward to sample a vector $Z$ of $n$ Normal variables whose weighted sum equals $c$. One possibility is to sample an $(n-1)$-dimensional i.i.d. standard Normal vector $(Z_1, Z_2, \dots, Z_{n-1})$ and then set $Z_n = \left(c - \sum_{i=1}^{n-1} w_i Z_i\right)/w_n$ (Method 1).
Another possibility is to draw a sample of an $n$-dimensional i.i.d. standard Normal vector $Q = (Q_1, Q_2, \dots, Q_n)'$ and set $Z_i = Q_i + (c - w'Q)/(n w_i)$ (Method 2). Alternatively, one could simply rescale such a vector and take $Z = \frac{c}{w'Q}\,Q$ (Method 3). However, as discussed in Section 3, none of these approaches yields the correct answer.
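For concreteness, the three naive constructions can be sketched in a few lines of Python (the function names are ours and purely illustrative); each one does satisfy the constraint $w'Z = c$ by construction, even though, as discussed in Section 3, none reproduces the correct conditional law.

```python
# Illustrative sketch of Methods 1-3 (names ours, not from the paper).
import numpy as np

def method1(w, c, rng):
    """Sample Z_1..Z_{n-1} i.i.d. N(0,1), then force Z_n to close the constraint."""
    n = len(w)
    z = np.empty(n)
    z[:n - 1] = rng.standard_normal(n - 1)
    z[n - 1] = (c - w[:n - 1] @ z[:n - 1]) / w[n - 1]
    return z

def method2(w, c, rng):
    """Shift an i.i.d. N(0,1) vector Q so that the weighted sum equals c."""
    q = rng.standard_normal(len(w))
    return q + (c - w @ q) / (len(w) * w)

def method3(w, c, rng):
    """Rescale an i.i.d. N(0,1) vector Q so that the weighted sum equals c."""
    q = rng.standard_normal(len(w))
    return c / (w @ q) * q

rng = np.random.default_rng(0)
w, c = np.array([0.5, -1.0, 2.0]), 1.5
for method in (method1, method2, method3):
    z = method(w, c, rng)
    assert np.isclose(w @ z, c)  # the constraint holds for every method
```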
Conditional sampling methods have already been studied in the literature. One example is, of course, sampling random variables between two bounds, a trivial problem solved by the inverse transform; see, e.g., the reference books Glasserman (2010) or Rubinstein and Kroese (2017). In the context of data acquisition, Pham (2000) introduces a clever algorithm to draw $m$ samples from the $n$-dimensional Normal distribution $N(\mu, \Sigma)$ subject to the constraint that the empirical (i.e., sample) mean vector $\hat\mu^{(m)}$ and covariance matrix $\hat\Sigma^{(m)}$ of these $m$ vectors agree with their theoretical counterparts $\mu$ and $\Sigma$, respectively. Very recently, Meyer (2018) introduced a method to draw $n$-dimensional samples from a given distribution conditional upon the fact that $q < n$ entries take known values. These works, however, do not provide an answer to the question we are dealing with here: we put no constraint on the average of the sampled vectors, nor force some of their entries to take specific values, but instead work with a constraint on the weighted sum of the $n$ entries of each of these vectors.
In this paper, we derive the $(n-1)$-dimensional conditional distribution associated with the $(w'Z = c)$-slice of the $n$-dimensional standard Normal density when $w_i \neq 0$ for all $i\in\{1,2,\dots,n\}$, $n \geq 2$. More specifically, we restrict ourselves to deriving the joint distribution of $X = (X_1, X_2, \dots, X_n)'$, where $X_i := w_i Z_i$ (the distribution of $Z$ can be obtained by simple rescaling of that of $X$, since $X = DZ$ with $D$ an invertible diagonal matrix satisfying $D_{i,i} = w_i$). The result derives from the analytical properties of a square positive definite matrix having a very specific form. We conclude the paper with two sampling algorithms and two illustrative examples.

## 2. Derivation of the Conditional Density

The conditional distribution is derived by first noting that the conditional density takes the general form
$$f_{X_1,\dots,X_n}\left(x_1,\dots,x_n \,\middle|\, \sum_{i=1}^{n} X_i = c\right) = \frac{f_{X_1,\dots,X_{n-1},X_n}\left(x_1,\dots,x_{n-1},\, c - \sum_{i=1}^{n-1} x_i\right)}{f_{\sum_{i=1}^{n} X_i}(c)}\,\delta\left(\sum_{i=1}^{n} x_i - c\right), \tag{1}$$
where $\delta$ is the Dirac measure centered at 0. While this expression is likely familiar to most readers, some technical details are provided in Appendix A.
We shall show that the random vector $X = (X_1,\dots,X_n)'$ given $\sum_{i=1}^{n} X_i = c$ is distributed as $\left(\tilde X_1,\dots,\tilde X_{n-1},\, c - \sum_{i=1}^{n-1}\tilde X_i\right)'$, where the density of $\tilde X = (\tilde X_1,\dots,\tilde X_{n-1})'$ is a multivariate Normal with mean vector $\mu(c,w)$ and covariance matrix $\Sigma(w)$ respectively given by
$$\mu_i(c,w) := c\,\frac{w_i^2}{\|w\|^2}, \tag{2}$$
$$\Sigma_{i,j}(w) := \frac{w_i^2}{\|w\|^2}\left[\delta_{ij}\left(\|w\|^2 - w_i^2\right) + (\delta_{ij} - 1)\,w_j^2\right]. \tag{3}$$
Note that, in these expressions, the indices $i , j$ belong to ${ 1 , 2 , … , n − 1 }$ and $δ i j$ is the Kronecker symbol.
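As a sanity check, (2) and (3) can be compared with the textbook formula for conditioning a joint Gaussian vector on a linear statistic, namely $E[X \mid S = c] = \mathrm{Cov}(X,S)\,c/\mathrm{Var}(S)$ and $\mathrm{Cov}[X \mid S] = \mathrm{Cov}(X) - \mathrm{Cov}(X,S)\,\mathrm{Cov}(S,X)/\mathrm{Var}(S)$ with $S = \sum_i X_i$. The following Python sketch (parameter values are illustrative) verifies the agreement numerically:

```python
import numpy as np

def mu_sigma(w, c):
    """Mean (2) and covariance (3) of (X_1,...,X_{n-1}) given w'Z = c."""
    w2 = w ** 2
    nw2 = w2.sum()                          # ||w||^2
    mu = c * w2[:-1] / nw2                  # Equation (2)
    # Equation (3) simplifies to Sigma_ij = delta_ij w_i^2 - w_i^2 w_j^2 / ||w||^2
    sigma = np.diag(w2[:-1]) - np.outer(w2[:-1], w2[:-1]) / nw2
    return mu, sigma

# Cross-check against conditioning the joint Normal (X, S) on S = c,
# keeping only the non-degenerate first n-1 coordinates.
w = np.array([0.5, -1.0, 2.0, 0.7])
c = 1.2
w2 = w ** 2
cov_x = np.diag(w2)                         # Cov(X), with X_i = w_i Z_i
cov_xs = w2                                 # Cov(X_i, S) = w_i^2
var_s = w2.sum()                            # Var(S) = ||w||^2
mu_full = cov_xs * c / var_s
sigma_full = cov_x - np.outer(cov_xs, cov_xs) / var_s
mu, sigma = mu_sigma(w, c)
assert np.allclose(mu, mu_full[:-1])
assert np.allclose(sigma, sigma_full[:-1, :-1])
```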
The denominator of Equation (1) collapses to the univariate centered Normal density with standard deviation $\|w\|$ evaluated at $c$, denoted $\phi(c; 0, \|w\|)$. Similarly, when $\sum_{i=1}^{n} x_i = c$, the numerator is just a product of univariate Normal densities. Using $x_n = c - \sum_{i=1}^{n-1} x_i$,
$$\prod_{i=1}^{n}\phi(x_i; 0, w_i) = \phi\left(c - \sum_{i=1}^{n-1}x_i;\, 0,\, w_n\right)\prod_{i=1}^{n-1}\phi(x_i; 0, w_i).$$
Hence, when the vector $x = (x_1,\dots,x_n)'$ meets the constraint, (1) takes the form of an $(n-1)$-dimensional Normal pdf:
$$\frac{\prod_{i=1}^{n}\phi(x_i; 0, w_i)}{\phi(c; 0, \|w\|)} = k(w)\,e^{\frac{c^2}{2\|w\|^2}}\exp\left(-\frac{1}{2}\sum_{i=1}^{n-1}\left[\left(\frac{1}{w_i^2}+\frac{1}{w_n^2}\right)x_i^2 + \frac{x_i}{w_n^2}\sum_{j=1,\,j\neq i}^{n-1}x_j - \frac{2c}{w_n^2}\,x_i\right] - \frac{c^2}{2w_n^2}\right), \tag{4}$$
where
$$k(w) := \frac{1}{\sqrt{(2\pi)^{n-1}}}\,\frac{\|w\|}{\prod_{i=1}^{n} w_i}.$$
On the other hand, the general expression of the Normal density of dimension $n-1$ with mean vector $\mu = (\mu_1,\dots,\mu_{n-1})'$ and covariance matrix $\Sigma$, whose inverse has entries denoted $\alpha_{i,j} = (\Sigma^{-1})_{i,j}$, can be obtained by expanding the matrix form of the multivariate Normal:
$$\phi(x;\mu,\Sigma) = K\,\exp\left(-\frac{1}{2}\sum_{i=1}^{n-1}\left[\alpha_{i,i}\,x_i^2 + x_i\sum_{j=1,\,j\neq i}^{n-1}\alpha_{i,j}\,x_j - x_i\sum_{j=1}^{n-1}(\alpha_{j,i}+\alpha_{i,j})\,\mu_j + \mu_i\sum_{j=1}^{n-1}\alpha_{i,j}\,\mu_j\right]\right), \tag{5}$$
where $K := 1/\sqrt{(2\pi)^{n-1}|\Sigma|}$. To determine the covariance matrix and mean vector of the conditional density (4) (assuming it is indeed Normal), it remains to identify the entries of $\mu$ and $\Sigma^{-1}$ by inspection, comparing the conditional density (4) with the multivariate Normal (5), and then to obtain $\Sigma$ by analytical inversion.
Leaving only $k(w)$ as a factor in front of the exponential in (4), the independent term (i.e., the term that does not appear as a factor of any $x_i$) reads, without loss of generality, as
$$\frac{c^2}{2\|w\|^2} - \frac{c^2}{2w_n^2} = -\frac{c^2}{2w_n^2}\sum_{i=1}^{n-1}\frac{w_i^2}{\|w\|^2} = -\frac{c^2}{2w_n^2\|w\|^2}\sum_{i=1}^{n-1}\gamma_i\,w_i^2$$
for any $(\gamma_1, \gamma_2, \dots, \gamma_{n-1})$ satisfying $\sum_{i=1}^{n-1}\gamma_i w_i^2 = \sum_{i=1}^{n-1} w_i^2$ (the constant choice $\gamma_i = 1$ is a candidate solution, but this is not guaranteed at this stage).
Comparing (4) and (5), it follows that the expression
$$\left(\frac{1}{w_i^2}+\frac{1}{w_n^2}\right)x_i^2 + \frac{x_i}{w_n^2}\sum_{j=1,\,j\neq i}^{n-1}x_j - \frac{2c}{w_n^2}\,x_i + \frac{c^2\gamma_i\,w_i^2}{w_n^2\|w\|^2} \tag{6}$$
must agree with
$$\alpha_{i,i}\,x_i^2 + x_i\sum_{j=1,\,j\neq i}^{n-1}\alpha_{i,j}\,x_j - x_i\sum_{j=1}^{n-1}(\alpha_{i,j}+\alpha_{j,i})\,\mu_j + \mu_i\sum_{j=1}^{n-1}\alpha_{i,j}\,\mu_j \tag{7}$$
for all $x_1, x_2, \dots, x_{n-1}$. Equating the $x_i x_j$ terms in (6) and (7) uniquely determines the components of $\Sigma^{-1}$:
$$\alpha_{i,i} = (\Sigma^{-1})_{i,i} = \frac{1}{w_i^2}+\frac{1}{w_n^2} \qquad\text{and}\qquad \alpha_{i,j\neq i} = (\Sigma^{-1})_{i,j\neq i} = \frac{1}{w_n^2}. \tag{8}$$
It remains to show that $k(w) = K$, to find the expressions of the $\mu_i$'s from the $x_i$ terms, to provide the expression of $\Sigma$ by inverting $\Sigma^{-1}$ and, finally, to check that the independent terms in (6) and (7) agree, i.e., that the implied $\gamma_i$'s comply with $\sum_{i=1}^{n-1}\gamma_i w_i^2 = \sum_{i=1}^{n-1} w_i^2$. To that end, we rely on the following lemma.
Lemma 1.
Let $A(m)$ denote the matrix with $(i,j)$ elements $A_{i,j}(m) = a_i\,\delta_{ij} + a_0$, where $a_k > 0$ for all $k\in\{0,1,\dots,m\}$. Define $\pi(m) := \prod_{k=0}^{m} a_k$ and $s(m) := \sum_{k=0}^{m} 1/a_k$. Then:
(i) $A(m)$ is positive definite;
(ii) its determinant is given by
$$|A(m)| = \sum_{k=0}^{m}\;\prod_{j=0,\,j\neq k}^{m} a_j = \pi(m)\,s(m);$$
(iii) the $(i,j)$-element of the inverse $B(m) := (A(m))^{-1}$ is given by
$$B_{i,j}(m) = \frac{1}{a_i\,s(m)}\left[\delta_{ij}\,\frac{a_i\,s(m)-1}{a_i} + \frac{\delta_{ij}-1}{a_j}\right].$$
The proof is given in Appendix B.
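Before using the lemma, it can be checked numerically on a small example; the sketch below (with arbitrary positive $a_k$'s of our choosing) verifies claims (i) to (iii), using the algebraically equivalent form $B_{i,j}(m) = \delta_{ij}/a_i - 1/(a_i a_j s(m))$ of the inverse:

```python
import numpy as np

a0 = 0.7
a = np.array([1.2, 0.5, 2.0, 3.1])      # a_1..a_m, with m = 4
A = np.diag(a) + a0                     # A_ij = a_i delta_ij + a_0
pi_m = a0 * np.prod(a)                  # pi(m) = prod_{k=0}^m a_k
s_m = 1.0 / a0 + np.sum(1.0 / a)        # s(m) = sum_{k=0}^m 1/a_k

# (i) positive definiteness
assert np.all(np.linalg.eigvalsh(A) > 0)

# (ii) determinant |A(m)| = pi(m) s(m)
assert np.isclose(np.linalg.det(A), pi_m * s_m)

# (iii) closed-form inverse, B_ij = delta_ij / a_i - 1 / (a_i a_j s(m))
B = np.diag(1.0 / a) - np.outer(1.0 / a, 1.0 / a) / s_m
assert np.allclose(B, np.linalg.inv(A))
```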
Observe now that $\Sigma^{-1}$ takes the form $A(n-1)$ with $a_0 \leftarrow 1/w_n^2$ and $a_i \leftarrow 1/w_i^2$ for $i\in\{1,2,\dots,n-1\}$. We can invoke Lemma 1 (i) to show that $\Sigma^{-1}$ is symmetric and positive definite, proving that $\Sigma$ is a valid covariance matrix satisfying $|\Sigma| > 0$ (notice, however, that in $A(n-1)$ the summation and product indices agree with those of the $a_i$'s, i.e., range from 0 to $n-1$, whereas the index of the $w_i$'s ranges from 1 to $n$). From Lemma 1 (ii), $k(w) = K$, since
$$|\Sigma^{-1}| = \left(\prod_{j=1}^{n}\frac{1}{w_j^2}\right)\|w\|^2 = \frac{\|w\|^2}{\prod_{j=1}^{n}w_j^2} \quad\Rightarrow\quad \frac{1}{\sqrt{|\Sigma|}} = \sqrt{|\Sigma^{-1}|} = \frac{\|w\|}{\prod_{k=1}^{n}w_k}.$$
We can then use Lemma 1 (iii) to determine $\beta_{i,j} := B_{i,j}(n-1)$, the elements of $\Sigma$:
$$\beta_{i,j} = \frac{w_i^2}{\|w\|^2}\left[\delta_{ij}\left(\|w\|^2 - w_i^2\right) + (\delta_{ij} - 1)\,w_j^2\right].$$
This expression agrees with the right-hand side of (3). Finally, the mean vector is obtained by equating the $x_i$ terms in (6) and (7). Using the fact that $\Sigma^{-1}$ is symmetric, we observe that for all $i\in\{1,2,\dots,n-1\}$:
$$\frac{2c}{w_n^2} = 2\sum_{j=1}^{n-1}\alpha_{i,j}\,\mu_j \quad\Rightarrow\quad \sum_{j=1}^{n-1}\alpha_{i,j}\,\mu_j = \frac{c}{w_n^2}.$$
Hence, $\Sigma^{-1}\mu = \frac{c}{w_n^2}\,\mathbf{1}_{n-1}$, where $\mathbf{1}_m$ is the $m$-dimensional column vector with all entries set to 1, so that $\mu_i = \frac{c}{w_n^2}\sum_{j=1}^{n-1}\beta_{i,j} = c\,\frac{w_i^2}{\|w\|^2}$. It remains to check that these expressions for $\mu$ and $\Sigma$ also comply with the independent term. Equating the independent terms of (6) and (7) and calling (8) yields
$$\frac{c^2\gamma_i\,w_i^2}{w_n^2\|w\|^2} = \mu_i\,\frac{c}{w_n^2} \quad\Rightarrow\quad \mu_i = \frac{c\,\gamma_i\,w_i^2}{\|w\|^2},$$
which holds true provided that we take $\gamma_i = 1$. These $\gamma_i$'s trivially comply with the constraint $\sum_{i=1}^{n-1}\gamma_i w_i^2 = \sum_{i=1}^{n-1} w_i^2 = \|w\|^2 - w_n^2$, which concludes the derivation of the conditional law. The resulting expression for $\mu_i$ corresponds to the right-hand side of (2).
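The derivation can be double-checked numerically: inverting the covariance matrix (3) must recover the entries (8), and the mean vector (2) must solve $\Sigma^{-1}\mu = (c/w_n^2)\,\mathbf{1}_{n-1}$. A short Python sketch with illustrative numbers:

```python
import numpy as np

w = np.array([0.8, -1.3, 0.4, 2.0])
c = 0.9
w2 = w ** 2
nw2 = w2.sum()
# Equation (3), in the equivalent form delta_ij w_i^2 - w_i^2 w_j^2 / ||w||^2
sigma = np.diag(w2[:-1]) - np.outer(w2[:-1], w2[:-1]) / nw2
mu = c * w2[:-1] / nw2                              # Equation (2)

alpha = np.linalg.inv(sigma)
expected = np.diag(1.0 / w2[:-1]) + 1.0 / w2[-1]    # Equation (8)
assert np.allclose(alpha, expected)
assert np.allclose(alpha @ mu, c / w2[-1])          # Sigma^{-1} mu = (c/w_n^2) 1
```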

## 3. Discussion

It is clear from (2) that the construction scheme based on the conditional distribution derived above (referred to as Method 4) is incompatible with the three other construction schemes discussed in the Introduction. Indeed, Method 1 yields $E(X_i) = c\,\delta_{in}$, whereas Method 2 leads to $E(X_i) = c/n$. Finally, Method 3 amounts to taking for $X_i$ the ratio of two jointly Normal variables, whose moments do not exist (for instance, the ratio of two independent zero-mean Normal variables follows a Cauchy distribution; see, e.g., Fieller (1932) or, more recently, Cedilnik et al. (2004)). This is illustrated in Figure 1, which shows 250 samples of $X$ for these four methods in the bivariate case ($n = 2$). The same mismatch occurs for the covariance, as shown in Figure 2 for the $n = 3$ case: Method 1 yields $\mathrm{Cov}(X_1, X_2) = 0$, Method 2 leads to $\mathrm{Cov}(X_1, X_2) = (\|w\|/3)^2 - (w_1^2 + w_2^2)/3$, the covariance is undefined for Method 3 and, from (3), Method 4 gives $\mathrm{Cov}(X_1, X_2) = -w_1^2 w_2^2/\|w\|^2$.
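These claims are easy to reproduce empirically; the vectorized sketch below (illustrative parameters, ours) checks the stated means of Methods 1 and 2 and shows that they differ from the Method 4 mean $c\,w_i^2/\|w\|^2$ (Method 3 is omitted since its moments do not exist, and the tolerances are statistical):

```python
import numpy as np

rng = np.random.default_rng(42)
w = np.array([0.6, -0.8, 1.1])
c = 2.0
n, N = len(w), 200_000
nw2 = (w ** 2).sum()

# Method 1: free Z_1..Z_{n-1}, Z_n forced by the constraint
z1 = rng.standard_normal((N, n))
z1[:, -1] = (c - z1[:, :-1] @ w[:-1]) / w[-1]

# Method 2: additive shift of an i.i.d. vector
q = rng.standard_normal((N, n))
z2 = q + (c - q @ w)[:, None] / (n * w)

# Method 4: the correct conditional law has E(X_i) = c w_i^2 / ||w||^2
m4 = c * w ** 2 / nw2

x1, x2 = z1 * w, z2 * w
assert np.allclose(x1.mean(axis=0), [0, 0, c], atol=0.02)   # E(X_i) = c delta_in
assert np.allclose(x2.mean(axis=0), c / n, atol=0.02)       # E(X_i) = c / n
assert not np.allclose(m4, c / n, atol=1e-6)                # Method 4 differs
```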

## 4. Sampling Algorithms

From the above result, it is easy to design sampling algorithms. We first derive an algorithm to sample the multivariate Normal distribution under a weighted sum constraint according to the density derived above (Algorithm 1). Next, we show that this algorithm is easily extended to sample the multivariate Normal distribution under an upper-bound constraint on the weighted sum (Algorithm 2).
**Algorithm 1.** Sampling of $Z$ given $w'Z = c$.
1. From the vector of weights $w$ and the constraint $c$, compute the $(n-1)$-dimensional mean vector $\mu(c,w)$ and symmetric matrix $\Sigma(w)$ from (2) and (3);
2. Compute the eigen decomposition of the covariance matrix, $\Sigma(w) = V\Lambda V'$;
3. Sample $n-1$ i.i.d. standard Normal variates $\tilde z = (\tilde z_1, \dots, \tilde z_{n-1})'$;
4. Transform these variates using the mean vector and covariance matrix: $\tilde x \leftarrow \mu(c,w) + V\Lambda^{1/2}\tilde z$;
5. Enlarge the $(n-1)$-dimensional vector $\tilde x$ with the $n$-th component to get $x$: $x \leftarrow \left(\tilde x',\; c - \sum_{i=1}^{n-1}\tilde x_i\right)'$;
6. Return $Z$, where $z_i \leftarrow x_i/w_i$ for $i\in\{1,2,\dots,n\}$.
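A direct transcription of Algorithm 1 in Python might look as follows (a sketch under our own naming; the eigenvalue clipping only guards against round-off):

```python
import numpy as np

def sample_constrained(w, c, n_samples, rng):
    """Draw samples of Z ~ N(0, I_n) conditional on w'Z = c (all w_i != 0)."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    w2 = w ** 2
    nw2 = w2.sum()
    # Steps 1-2: mean (2), covariance (3), eigen decomposition
    mu = c * w2[:-1] / nw2
    sigma = np.diag(w2[:-1]) - np.outer(w2[:-1], w2[:-1]) / nw2
    lam, v = np.linalg.eigh(sigma)
    lam = np.clip(lam, 0.0, None)            # guard tiny negative eigenvalues
    # Steps 3-4: transform i.i.d. standard Normal variates
    z_tilde = rng.standard_normal((n_samples, n - 1))
    x_tilde = mu + z_tilde @ (v * np.sqrt(lam)).T
    # Step 5: append the n-th component closing the constraint
    x = np.column_stack([x_tilde, c - x_tilde.sum(axis=1)])
    # Step 6: rescale back to Z
    return x / w

rng = np.random.default_rng(7)
w = np.array([0.4, -1.0, 0.8])
z = sample_constrained(w, 1.5, 10_000, rng)
assert np.allclose(z @ w, 1.5)                       # constraint holds exactly
assert np.allclose(z.mean(axis=0), 1.5 * w / (w ** 2).sum(), atol=0.05)
```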
The above algorithm can be extended to sample an $n$-dimensional vector $Z$ given $w'Z \leq c$. To that end, it suffices to first draw a sample from the conditional distribution of $w'Z$. Clearly, $w'Z$ is a centered Normal with standard deviation $\|w\|$, so that
$$F_{w'Z;c}(x) := P\left(w'Z \leq x \mid w'Z \leq c\right) = \begin{cases} \dfrac{\Phi\left(x/\|w\|\right)}{\Phi\left(c/\|w\|\right)}, & \text{if } x < c,\\[1mm] 1, & \text{otherwise}. \end{cases}$$
From the inverse transform, the random variable
$$Y := F_{w'Z;c}^{-1}(U)$$
has cumulative distribution function $F w ′ Z ; c$ whenever U is a Uniform-$[ 0 , 1 ]$ random variable. This leads to the following sampling procedure.
**Algorithm 2.** Sampling of $Z$ given $w'Z \leq c$.
1. Draw a sample $u$ from a Uniform-$[0,1]$ distribution;
2. Draw a sample $\tilde c$ from the conditional law of $w'Z$ given $w'Z \leq c$: $\tilde c \leftarrow \|w\|\,\Phi^{-1}\left(u\,\Phi\left(c/\|w\|\right)\right)$;
3. Apply Algorithm 1 using $\tilde c$ as the constraint (i.e., $c \leftarrow \tilde c$);
4. Return $Z$.
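Algorithm 2 can be sketched on top of a condensed version of Algorithm 1; the standard Normal cdf and its inverse are taken from the Python standard library (function names are ours):

```python
import numpy as np
from statistics import NormalDist

_phi = NormalDist()   # standard Normal cdf / inverse cdf from the stdlib

def sample_equality(w, c, rng):
    """One draw of Z ~ N(0, I_n) given w'Z = c (Algorithm 1, condensed)."""
    w = np.asarray(w, dtype=float)
    n, w2 = len(w), np.asarray(w, dtype=float) ** 2
    nw2 = w2.sum()
    mu = c * w2[:-1] / nw2
    sigma = np.diag(w2[:-1]) - np.outer(w2[:-1], w2[:-1]) / nw2
    lam, v = np.linalg.eigh(sigma)
    x = mu + v @ (np.sqrt(np.clip(lam, 0, None)) * rng.standard_normal(n - 1))
    x = np.append(x, c - x.sum())
    return x / w

def sample_upper_bounded(w, c, rng):
    """One draw of Z ~ N(0, I_n) given w'Z <= c (Algorithm 2)."""
    u = 1.0 - rng.uniform()                       # u in (0, 1]
    nw = float(np.sqrt((np.asarray(w) ** 2).sum()))
    c_tilde = nw * _phi.inv_cdf(u * _phi.cdf(c / nw))  # conditional draw of w'Z
    return sample_equality(w, c_tilde, rng)

rng = np.random.default_rng(11)
w = [0.4, -1.0, 0.8]
draws = np.array([sample_upper_bounded(w, 0.5, rng) for _ in range(200)])
assert np.all(draws @ np.array(w) <= 0.5 + 1e-9)  # bound holds for every draw
```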
Observe that it is very easy to adjust Algorithm 2 to deal with the alternative constraint $w'Z \geq c$: it suffices to replace $u\,\Phi\left(c/\|w\|\right)$ by $(1-u) + u\,\Phi\left(c/\|w\|\right)$ in Step 2.

## 5. Applications

In this section, we provide two applications that are deliberately kept simple for the sake of illustration.

### 5.1. Conditional Portfolio Distribution

The application consists of computing the distribution of a portfolio conditional upon one stock reaching an extreme level, i.e., $S_1 = q_\alpha(S_1)$, where $q_\alpha(X)$ denotes the $\alpha$-quantile of the random variable $X$. We consider $K$ stocks with Normal dollar-returns $S_i \sim N(m_i, \sigma_i^2)$ and denote by $J$ the dollar-return of a portfolio composed of these stocks with weights $\pi = (\pi_1, \pi_2, \dots, \pi_K)'$. We postulate the following $n$-factor dependence structure:
$$S_i = m_i + \sigma_i\left(\sum_{j=1}^{n} W_{i,j}Z_j + W_{i,n+1}\epsilon_i\right), \qquad \sum_{j=1}^{n+1} W_{i,j}^2 = 1, \qquad i\in\{1,2,\dots,K\},$$
where $W$ is the $K$-by-$(n+1)$ matrix of loadings, $Z$ is the vector of the $n$ i.i.d. systematic standard Normal risk factors and the $\epsilon_i$'s, $i\in\{1,2,\dots,K\}$, are the i.i.d. idiosyncratic standard Normal risk factors, independent from the $Z_j$'s. Hence,
$$J \sim \sum_{i=1}^{K}\pi_i\left(m_i + \sigma_i\left(\sum_{j=1}^{n} W_{i,j}Z_j + W_{i,n+1}\epsilon_i\right)\right).$$
Set $D = \mathrm{diag}(\sigma_1,\dots,\sigma_K)$, $m = (m_1,\dots,m_K)'$ and let $M$ be the $K$-by-$n$ matrix made of the first $n$ columns of $W$. The portfolio return satisfies $J \sim N(\tilde\mu, \tilde\sigma^2)$ with
$$\tilde\mu = \sum_{i=1}^{K}\pi_i m_i = \pi'm, \qquad \tilde\sigma^2 = \sum_{j=1}^{n}\left(\sum_{i=1}^{K}\pi_i\sigma_i W_{i,j}\right)^2 + \sum_{i=1}^{K}\left(\pi_i\sigma_i W_{i,n+1}\right)^2 = \pi'D\,MM'D\,\pi + \sum_{i=1}^{K}\left(\pi_i\sigma_i W_{i,n+1}\right)^2,$$
where the first term in the right-hand side of $\tilde\sigma^2$ is the contribution of the $n$ systematic factors and the second results from the $K$ independent idiosyncratic components.
Let us now compute the conditional distribution of the portfolio given that $S_1$ equals its $\alpha$-percentile, i.e., when
$$S_1 = m_1 + \sigma_1\,\Phi^{-1}(\alpha).$$
Setting $c = \Phi^{-1}(\alpha)$, we conclude that
$$\sum_{j=1}^{n} W_{1,j}Z_j + W_{1,n+1}\epsilon_1 = c. \tag{9}$$
Applying the above result with $w = w_1 = (W_{1,1},\dots,W_{1,n+1})'$, the law of the random vector $(W_{1,1}Z_1,\dots,W_{1,n}Z_n, W_{1,n+1}\epsilon_1)$ conditional upon (9) is jointly Normal with mean vector $\mu$ and covariance matrix $\Sigma$ given by
$$\mu_i := \mu_i(c, w_1) = c\,W_{1,i}^2, \qquad \Sigma_{i,j} := \Sigma_{i,j}(w_1) = W_{1,i}^2\left[\delta_{ij}\left(1 - W_{1,i}^2\right) + (\delta_{ij} - 1)\,W_{1,j}^2\right] = W_{1,i}^2\left(\delta_{ij} - W_{1,j}^2\right),$$
where $i, j \in \{1,2,\dots,n+1\}$ (we used $\|w_1\| = 1$). In order to find the joint distribution of $Z$, we need to correct for the scaling coefficients and disregard $\epsilon_1$. Correcting for the scaling coefficients simply requires rescaling the entries of the mean vector and covariance matrix by $1/W_{1,i}$ and $1/(W_{1,i}W_{1,j})$, respectively. The conditional distribution of $Z$ is then an $n$-dimensional Normal whose mean vector and covariance matrix are obtained by keeping the first $n$ entries of the above mean vector and the $n$-by-$n$ upper-left block of the above covariance matrix. This leads to
$$\mu_i = c\,W_{1,i}, \qquad \Sigma_{i,j} = \frac{W_{1,i}}{W_{1,j}}\left(\delta_{ij} - W_{1,j}^2\right) = \delta_{ij} - W_{1,i}W_{1,j}, \qquad i, j\in\{1,2,\dots,n\}.$$
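In code, the conditional law of $Z$ above is straightforward to set up; the sketch below uses illustrative loadings (not taken from the paper) and checks that the resulting covariance matrix is a valid one, with variances $1 - W_{1,i}^2$:

```python
import numpy as np
from statistics import NormalDist

alpha = 0.01
c = NormalDist().inv_cdf(alpha)                    # c = Phi^{-1}(alpha)
w1 = np.array([0.5, 0.4, 0.3])                     # systematic loadings W_{1,1..n}
w1 = np.append(w1, np.sqrt(1.0 - (w1 ** 2).sum())) # idiosyncratic loading: row of squares sums to 1

n = len(w1) - 1
mu_z = c * w1[:n]                                  # conditional mean of Z
sigma_z = np.eye(n) - np.outer(w1[:n], w1[:n])     # conditional covariance of Z

# Sanity checks: positive semi-definite, variances 1 - W_{1,i}^2 < 1
assert np.all(np.linalg.eigvalsh(sigma_z) > -1e-12)
assert np.allclose(np.diag(sigma_z), 1.0 - w1[:n] ** 2)
```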
Denoting the eigen decomposition of this covariance matrix by $\Sigma = V\Lambda V'$, we have $Z \sim \mu + V\Lambda^{1/2}\tilde Z$, where $\tilde Z$ is an $n$-variate vector of independent standard Normal variables.
Let us denote by $\pi_j$ the vector $\pi$ whose $j$-th entry is set to 0, i.e., $\pi_j = \pi - \pi_j e_j$, where $e_j$ is the $j$-th basis vector. Letting $\hat Z$ be a standard Normal random variable independent from $\tilde Z$,
$$J \sim \pi'm + \pi_1\sigma_1 c + \pi_1'DM\mu + \pi_1'DMV\Lambda^{1/2}\tilde Z + \sqrt{\sum_{i=2}^{K}\left(\pi_i\sigma_i W_{i,n+1}\right)^2}\;\hat Z,$$
so that $J \sim N(\tilde\mu, \tilde\sigma^2)$ where
$$\tilde\mu = \pi'm + \pi_1\sigma_1 c + \pi_1'DM\mu, \qquad \tilde\sigma^2 = \pi_1'DM\,\Sigma\,M'D\,\pi_1 + \sum_{i=2}^{K}\left(\pi_i\sigma_i W_{i,n+1}\right)^2.$$
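The closed-form conditional moments $\tilde\mu$ and $\tilde\sigma^2$ can be validated against a brute-force simulation under the conditional law of $Z$; all parameter values in the sketch below are illustrative, not taken from the paper:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(12)
K, n = 3, 2
m = np.array([0.10, 0.05, 0.00])
sig = np.array([0.20, 0.30, 0.25])
piv = np.array([0.5, 0.3, 0.2])                    # portfolio weights pi
W = np.array([[0.5, 0.4], [0.3, 0.6], [0.2, 0.2]])  # systematic loadings M
idio = np.sqrt(1.0 - (W ** 2).sum(axis=1))          # W_{i,n+1}
alpha = 0.05
c = NormalDist().inv_cdf(alpha)

# Conditional law of Z given S_1 at its alpha-quantile
mu_z = c * W[0]
sigma_z = np.eye(n) - np.outer(W[0], W[0])

# Closed-form conditional moments of J
pi1 = piv.copy(); pi1[0] = 0.0                      # pi with the first entry zeroed
D = np.diag(sig)
mu_t = piv @ m + piv[0] * sig[0] * c + pi1 @ D @ W @ mu_z
var_t = pi1 @ D @ W @ sigma_z @ W.T @ D @ pi1 + ((pi1 * sig * idio) ** 2).sum()

# Brute-force Monte Carlo under the conditional law
N = 200_000
z = rng.multivariate_normal(mu_z, sigma_z, size=N)
eps = rng.standard_normal((N, K))
s = m + sig * (z @ W.T + idio * eps)
s[:, 0] = m[0] + sig[0] * c                         # S_1 is fixed by the conditioning
j = s @ piv
assert np.isclose(j.mean(), mu_t, atol=0.005)
assert np.isclose(j.var(), var_t, rtol=0.05)
```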

### 5.2. Expected Shortfall of a Defaultable Portfolio

The next example consists of approximating the Conditional Value-at-Risk (a.k.a. Expected Shortfall) of a portfolio of $m$ defaultable assets in a multi-factor model. The total loss $L$ on such a portfolio up to the time horizon $T$ is the sum of the individual losses, $L = \sum_{i=1}^{m} L_i$, where the loss contribution of the $i$-th asset takes the form $L_i = w_i\,\mathbb{1}_{\{\tau_i \leq T\}}$, with $w_i$ the weight of the asset in the portfolio and $\tau_i$ the default time of the $i$-th obligor. Whereas the expected loss does not depend on the possible correlation across defaults, this correlation is a key driver of the Value-at-Risk, and hence of the economic and regulatory capital. Most credit risk models introduce such a dependency by relying on latent variables, via $\mathbb{1}_{\{\tau_i \leq T\}} \Leftrightarrow \mathbb{1}_{\{\xi_i \leq F^{-1}(\pi_i)\}}$, where the $\xi_i$'s are correlated random variables with cumulative distribution function $F$ and $\pi_i$ is the marginal probability that $\tau_i \leq T$ under the chosen measure. The most popular (although debatable) choice is to rely on multi-factor Gaussian models, i.e., to consider $\xi_i = w_i'Z + \sqrt{1 - \|w_i\|^2}\,\epsilon_i$, where $w_i$ is here an $n$-dimensional vector of weights with norm smaller than 1, $Z$ is the vector of $n$ i.i.d. standard Normal systematic factors and the $\epsilon_i$'s are i.i.d. standard Normal random variables, independent from the $Z_j$'s, representing the idiosyncratic risks. Computing the Expected Shortfall in a multi-factor framework is very time-consuming, as there is no closed-form solution and many simulations are required. A possible alternative to the plain Monte Carlo estimator is to rely on the ASRF$^*$ model of Pykhtin, which can be seen as the single-factor model that "best" approximates the multi-factor model in the left tail, in some sense (see Pykhtin (2004) for details). The ASRF$^*$ model thus deals with a loss variable $L^*$ relying on a single factor $Y$, but such that $q_\alpha(L^*) \approx q_\alpha(L)$, where $L$ is the loss variable in the multi-factor model.
By the law of large numbers, the idiosyncratic risks are diversified away for $m$ large enough, so that, conditional upon $Y = x$, the portfolio loss in the ASRF$^*$ model converges almost surely, as $m \to \infty$, to $L^*(x)$ with $L^*(Y) := E[L^* \mid Y]$. Moreover, $L^*(x)$ is a monotonically decreasing function of $x$. Consequently, the Value-at-Risk of the ASRF$^*$ model satisfies $q_\alpha(L^*) \approx L^*(\Phi^{-1}(1-\alpha))$ for $m$ large enough. The asymptotic analytical expression $L^*(\Phi^{-1}(1-\alpha))$ is known as the large pool approximation (see, e.g., Gordy (2003)). In the derivation of the ASRF$^*$ analytical formula, Pykhtin implicitly models $Y$ as a linear combination of the factors $Z_j$ appearing in the multi-factor model, i.e., $Y = b'Z$ with $\|b\| = 1$. One can thus draw samples of the $Z_j$'s by using Algorithm 2 as follows: (i) draw a value of the standard Normal factor $Y$ conditional upon $Y \leq \Phi^{-1}(1-\alpha)$, i.e., set $Y = \Phi^{-1}(U(1-\alpha))$ with $U$ a Uniform-$[0,1]$ random variable, so that $L^*(Y) \geq L^*(\Phi^{-1}(1-\alpha))$; then (ii) sample $Z$ conditional upon $b'Z = Y$ from the joint density derived in this paper. We therefore use $L^*(Y) \geq L^*(\Phi^{-1}(1-\alpha))$ as a proxy for the condition $L \geq q_\alpha(L)$ involved in the Expected Shortfall definition (observe that both $L$ and $L^*$ depend on the same random vector $Z$), but use the actual (multi-factor) loss to estimate the expected loss under this condition. In other words, we effectively compute $\hat E[L \mid L^*(b'Z) \geq L^*(\Phi^{-1}(1-\alpha))]$ as a proxy of the genuine Expected Shortfall, defined as $E[L \mid L \geq q_\alpha(L)]$, leading to a drastic reduction of the computational cost.
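A compact end-to-end sketch of this Expected Shortfall proxy is given below. All inputs are illustrative: the portfolio is homogeneous, and the single-factor direction $b$ is fixed arbitrarily rather than obtained from Pykhtin's optimization, so the numbers only demonstrate the sampling mechanics:

```python
import numpy as np
from statistics import NormalDist

phi = NormalDist()
rng = np.random.default_rng(3)

def sample_given_sum(w, c):
    """One draw of Z ~ N(0, I_n) given w'Z = c (Algorithm 1, condensed)."""
    w = np.asarray(w, dtype=float)
    w2 = w ** 2
    nw2 = w2.sum()
    n = len(w)
    mu = c * w2[:-1] / nw2
    sigma = np.diag(w2[:-1]) - np.outer(w2[:-1], w2[:-1]) / nw2
    lam, v = np.linalg.eigh(sigma)
    x = mu + v @ (np.sqrt(np.clip(lam, 0, None)) * rng.standard_normal(n - 1))
    x = np.append(x, c - x.sum())
    return x / w

alpha = 0.99
m_assets = 400
b = np.array([0.8, 0.6])                        # Y = b'Z, with ||b|| = 1
W = np.tile([[0.3, 0.2]], (m_assets, 1))        # factor loadings, norm < 1
idio = np.sqrt(1.0 - (W ** 2).sum(axis=1))
thresh = phi.inv_cdf(0.02)                      # marginal default probability 2%
weight = 1.0 / m_assets
q = phi.inv_cdf(1.0 - alpha)

losses = []
for _ in range(1000):
    u = 1.0 - rng.uniform()                     # u in (0, 1]
    y = phi.inv_cdf(u * phi.cdf(q))             # Y | Y <= Phi^{-1}(1 - alpha)
    z = sample_given_sum(b, y)                  # Z | b'Z = y
    eps = rng.standard_normal(m_assets)
    xi = W @ z + idio * eps
    losses.append(weight * np.count_nonzero(xi <= thresh))

es_proxy = float(np.mean(losses))               # estimate of E[L | Y in the tail]
assert es_proxy > 0.02                          # tail losses exceed the expected loss
```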

## 6. Conclusions

Many practical applications in the area of risk management deal with factor models, and these factors are often taken to be Gaussian. In various cases, it can be interesting to analyse the picture under some constraint on the weighted sum of these factors. This might be the case, for instance, when performing scenario analyses in adverse circumstances, computing conditional risk measures or speeding up simulations (in the same vein as importance sampling). In this paper, we derived the density of an $n$-dimensional Normal vector with independent components subject to the constraint that the weighted sum takes a given value. It is proven to be an $(n-1)$-dimensional multivariate Normal whose mean vector and covariance matrix can be computed in closed form by relying on the specific structure of the (inverse) covariance matrix. This result naturally leads to various sampling algorithms, e.g., to draw samples with weighted sum equal to, below or above a given threshold. Interestingly, the proposed scheme is shown to differ from various "standard rescaling" procedures applied to independent samples. Indeed, the latter fail to comply with the actual conditional distribution, both in terms of expectation and covariance.

## Funding

This research has received no external funding.

## Acknowledgments

The author is grateful to Damiano Brigo for interesting discussions around this question and to Monique Jeanblanc for suggestions about Appendix A.

## Conflicts of Interest

The author declares no conflict of interest.

## Appendix A. General Expression of the Conditional Density

Define $Y := \sum_{i=1}^{n} X_i$, where $(X_1, X_2, \dots, X_n)$ has joint density $f$, and let $f_Y$ denote the density of $Y$. Let us write $\Phi$ for the conditional expectation of $h(X_1,\dots,X_n)$ given $Y$, for some function $h$:
$$\Phi(Y) := E\left[h(X_1,\dots,X_n) \mid Y\right].$$
The conditional density we are looking for is the function $g ( x 1 , … , x n ; c )$ satisfying
$$\Phi(c) = \int_{x_1}\cdots\int_{x_n} h(x_1,\dots,x_n)\,g(x_1,\dots,x_n;c)\,dx_1\cdots dx_n,$$
for all c and every function h. By the law of iterated expectations, $Φ ( u )$ is defined as the function satisfying, for any function $ψ$,
$$E[\psi(Y)h(X_1,\dots,X_n)] = \underbrace{E\left[\psi\left(\sum_{i=1}^{n}X_i\right)h(X_1,\dots,X_n)\right]}_{:=\,I_1} = E\big[\psi(Y)\,E[h(X_1,\dots,X_n)\mid Y]\big] = \underbrace{E[\psi(Y)\,\Phi(Y)]}_{:=\,I_2}.$$
The change of variable $x_n = u - \sum_{i=1}^{n-1}x_i$ yields
$$I_1 = \int_{x_1}\cdots\int_{x_{n-1}}\int_u h\left(x_1,\dots,x_{n-1},\,u-\textstyle\sum_{i=1}^{n-1}x_i\right)\psi(u)\,f\left(x_1,\dots,x_{n-1},\,u-\textstyle\sum_{i=1}^{n-1}x_i\right)dx_1\cdots dx_{n-1}\,du$$
$$= \int_u \psi(u)\left[\int_{x_1}\cdots\int_{x_{n-1}} h\left(x_1,\dots,x_{n-1},\,u-\textstyle\sum_{i=1}^{n-1}x_i\right) f\left(x_1,\dots,x_{n-1},\,u-\textstyle\sum_{i=1}^{n-1}x_i\right)dx_1\cdots dx_{n-1}\right]du,$$
$$I_2 = \int_{x_1}\cdots\int_{x_{n-1}}\int_u \Phi(u)\,\psi(u)\,f\left(x_1,\dots,x_{n-1},\,u-\textstyle\sum_{i=1}^{n-1}x_i\right)dx_1\cdots dx_{n-1}\,du$$
$$= \int_u \psi(u)\,\Phi(u)\underbrace{\int_{x_1}\cdots\int_{x_{n-1}} f\left(x_1,\dots,x_{n-1},\,u-\textstyle\sum_{i=1}^{n-1}x_i\right)dx_1\cdots dx_{n-1}}_{=\,f_Y(u)}\,du.$$
Since $I_1 = I_2$ must hold for every $\psi$, the integrands over $u$ coincide, so that
$$\Phi(c) = \frac{\int_{x_1}\cdots\int_{x_{n-1}} h\left(x_1,\dots,x_{n-1},\,c-\textstyle\sum_{i=1}^{n-1}x_i\right) f\left(x_1,\dots,x_{n-1},\,c-\textstyle\sum_{i=1}^{n-1}x_i\right)dx_1\cdots dx_{n-1}}{f_Y(c)} = \int_{x_1}\cdots\int_{x_{n-1}} h\left(x_1,\dots,x_{n-1},\,c-\textstyle\sum_{i=1}^{n-1}x_i\right)\frac{f\left(x_1,\dots,x_{n-1},\,c-\textstyle\sum_{i=1}^{n-1}x_i\right)}{f_Y(c)}\,dx_1\cdots dx_{n-1}. \tag{A1}$$
From (A1), the conditional density appears as
$$\Phi(c) = \int_{x_1}\cdots\int_{x_n} h(x_1,\dots,x_n)\underbrace{\frac{f(x_1,\dots,x_n)\,\delta\left(\sum_{i=1}^{n}x_i - c\right)}{f_Y(c)}}_{g(x_1,\dots,x_n;c)}\,dx_1\cdots dx_n,$$
or equivalently $g(x_1,\dots,x_n;c) = f\left(x_1,\dots,x_{n-1},\,c-\sum_{i=1}^{n-1}x_i\right)\dfrac{\delta\left(\sum_{i=1}^{n}x_i - c\right)}{f_Y(c)}$.

## Appendix B. Proof of Lemma 1.

The matrix $A(m)$ is the sum of a diagonal matrix with strictly positive entries $a_1,\dots,a_m$ (which is positive definite) and of the constant matrix with all entries equal to $a_0 > 0$ (which is positive semi-definite, being $a_0\,\mathbf{1}\mathbf{1}'$). Hence, $A(m)$ is positive definite, showing (i).
Let us now compute the determinant of $A(m)$. We proceed by induction, showing that the formula holds for $m+1$ whenever it holds for $m \geq 2$; it is easy to check that it holds for $m = 2$. The key point is to notice that it is enough to establish the following recursion rule:
$$|A(m+1)| = \pi(m+1)\,s(m+1) = \sum_{k=0}^{m}\frac{\pi(m+1)}{a_k} + \frac{\pi(m+1)}{a_{m+1}} = a_{m+1}\,|A(m)| + \pi(m).$$
We now apply the standard Laplace expansion of the determinant along the last row, taking the product of each element of the last row of $A(m+1)$ with the corresponding signed minor and summing. Recall that the cofactor matrix $A_{i,j}(m+1)$ associated with the $(i,j)$ element of $A(m+1)$ is the submatrix obtained by deleting the $i$-th row and $j$-th column of $A(m+1)$ (Gentle 2007). This yields
$$|A(m+1)| = a_0\sum_{i=1}^{m}(-1)^{m+1+i}\,|A_{m+1,i}(m+1)| + (a_{m+1}+a_0)\,|A_{m+1,m+1}(m+1)|,$$
where $|A_{i,j}(m+1)|$ is the minor associated with the $(i,j)$ element of $A(m+1)$, i.e., the determinant of the cofactor matrix $A_{i,j}(m+1)$. Interestingly, the cofactor matrices $A_{i,j}(m+1)$ take a form similar to $A(m)$. For instance, $A_{m+1,m+1}(m+1) = A(m)$, and $A_{m+1,m}(m+1)$ is just $A(m)$ with $a_m \leftarrow 0$. Similarly, $A_{m+1,1}(m+1)$ is the same as $A(m)$ with $a_1 \leftarrow 0$, provided that we shift all columns to the left and put the last column back in first place (potentially changing the sign of the corresponding determinant), etc. More generally, for $i\in\{1,2,\dots,m\}$, the determinant $|A_{i,j}(m+1)|$ is exactly that of $A(m)$ with $a_i \leftarrow a_{m+1}$ if $i = j$, or that of $A(m)$ with $a_i \leftarrow 0$ and $a_j \leftarrow a_{m+1}$ when $j \neq i$, up to some permutations of rows and columns. In fact:
$$|A_{i,i}(m+1)| = \sum_{k=0,\,k\neq i}^{m+1}\;\prod_{p=0,\,p\notin\{i,k\}}^{m+1} a_p = \frac{\pi(m+1)}{a_i}\sum_{k=0,\,k\neq i}^{m+1}\frac{1}{a_k}, \tag{A2}$$
$$|A_{i,j\neq i}(m+1)| = -(-1)^{i+j}\prod_{p=0,\,p\notin\{i,j\}}^{m+1} a_p = -(-1)^{i+j}\,\frac{\pi(m+1)}{a_i\,a_j}. \tag{A3}$$
The minor $|A_{m+1,i}(m+1)|$ with $i \neq m+1$ can be obtained from the expression of $|A(m)|$, provided that we adjust the sign and replace $a_i$ by 0:
$$|A_{m+1,i}(m+1)| = -(-1)^{i+m+1}\,\frac{\pi(m)}{a_i}, \qquad i\in\{1,2,\dots,m\}$$
(recall that $A(m+1)$ is symmetric, so that $A_{m+1,i}(m+1) = A_{i,m+1}(m+1)$). Therefore,
$$|A(m+1)| = (a_{m+1}+a_0)\,|A(m)| + a_0\sum_{i=1}^{m}(-1)^{m+1+i}\,|A_{m+1,i}(m+1)| = a_{m+1}\,|A(m)| + a_0\,|A(m)| - a_0\sum_{i=1}^{m}\frac{\pi(m)}{a_i}$$
$$= a_{m+1}\,|A(m)| + a_0\left[\frac{\pi(m)}{a_0} + \sum_{i=1}^{m}\frac{\pi(m)}{a_i}\right] - a_0\sum_{i=1}^{m}\frac{\pi(m)}{a_i} = a_{m+1}\,|A(m)| + \pi(m),$$
and this recursion is equivalent to $( i i )$.
Eventually, $B(m) := (A(m))^{-1}$ is given by $1/|A(m)|$ times the adjugate of $A(m)$, i.e., the (symmetric) cofactor matrix $C(m)$ whose elements are $C_{i,j}(m) = (-1)^{i+j}\,|A_{i,j}(m)|$. Using the minor expressions (A2) and (A3) derived above, with $m+1$ replaced by $m$, yields:
$$B_{i,i}(m) = \frac{|A_{i,i}(m)|}{|A(m)|} = \frac{\frac{\pi(m)}{a_i}\sum_{k=0,\,k\neq i}^{m}\frac{1}{a_k}}{\pi(m)\,s(m)} = \frac{s(m) - 1/a_i}{a_i\,s(m)} = \frac{a_i\,s(m) - 1}{a_i^2\,s(m)},$$
$$B_{i,j\neq i}(m) = \frac{(-1)^{i+j}\,|A_{i,j\neq i}(m)|}{|A(m)|} = \frac{-\frac{\pi(m)}{a_i a_j}}{\pi(m)\,s(m)} = -\frac{1}{a_i\,a_j\,s(m)}.$$
This concludes the proof.

## References

1. Andersen, Leif B. G., and Jakob Sidenius. 2004. Extensions to the Gaussian copula: Random recovery and random factor loadings. Journal of Credit Risk 1: 29–70.
2. Cedilnik, Anton, Katarina Kosmelj, and Andrej Blejec. 2004. The distribution of the ratio of jointly Normal variables. Metodološki Zvezki 1: 99–108.
3. Fieller, Edgar C. 1932. The distribution of the index in a normal bivariate population. Biometrika 23: 428–40.
4. Gentle, James E. 2007. Matrix Algebra: Theory, Computations, and Applications in Statistics. Berlin: Springer.
5. Glasserman, Paul. 2010. Monte Carlo Methods in Financial Engineering. Berlin: Springer.
6. Gordy, Michael B. 2003. A risk-factor model foundation for ratings-based bank capital rules. Journal of Financial Intermediation 12: 199–232.
7. Hull, John, and Alan White. 2004. Valuation of a CDO and an nth-to-default CDS without Monte Carlo simulation. Journal of Derivatives 12: 8–23.
8. Laurent, Jean-Paul, Michael Sestier, and Stéphane Thomas. 2016. Trading book and credit risk: How fundamental is the Basel review? Journal of Banking and Finance 73: 211–23.
9. Li, David X. 2016. On Default Correlation: A Copula Function Approach. Technical Report. Amsterdam: Elsevier.
10. Meyer, Daniel W. 2018. (Un)conditional sample generation based on distribution element trees. Journal of Computational and Graphical Statistics.
11. Pham, Dinh Tuan. 2000. Stochastic methods for sequential data assimilation in strongly nonlinear systems. Monthly Weather Review 129: 1194–207.
12. Pykhtin, Michael. 2004. Multifactor adjustment. Risk Magazine 17: 85–90.
13. Rubinstein, Reuven Y., and Dirk Kroese. 2017. Simulation and the Monte Carlo Method. Hoboken: Wiley.
14. Vasicek, Oldrich A. 1991. Limiting loan loss probability distribution. Finance, Economics and Mathematics.
15. Vrins, Frederic D. 2009. Double-t copula pricing of structured credit products: Practical aspects of a trustworthy implementation. Journal of Credit Risk 5: 91–109.
Figure 1. Scatter plot of 250 samples $(x_1, x_2)$ drawn for the four methods with $n = 2$, $w_1^2 = 0.4$, $w_2^2 = 0.6$ and $c = 1$. Method 4 yields the correct answer. The vertical and horizontal dashed lines show the empirical means of $x_1$ and $x_2$ for that specific run, respectively. The diagonal (red) solid line is $x_2 = c - x_1$. The horizontal and vertical widths of the gray rectangles show the confidence intervals ($\hat\mu_{x_i} \pm 1.96\,\hat\sigma_{x_i}$) for both $X_1$ and $X_2$, based on 100 runs of 250 pairs each.
Figure 2. Scatter plot of 250 samples $(x_1, x_2)$ drawn for the four methods with $n = 3$, $w_1^2 = w_2^2 = 0.4$, $w_3^2 = 0.2$ and $c = 4$. Method 4 yields the correct answer. Only the first two components of the vector $(x_1, x_2, x_3)$ are shown to enhance readability; the third component is set to $x_3 = c - (x_1 + x_2)$. The vertical and horizontal dashed lines show the empirical means of $x_1$ and $x_2$ for that specific run, respectively. The horizontal and vertical widths of the gray rectangles show the confidence intervals ($\hat\mu_{x_i} \pm 1.96\,\hat\sigma_{x_i}$) for both $X_1$ and $X_2$, based on 100 runs of 250 pairs each.