Implementing the Rearrangement Algorithm: An Example from Computational Risk Management

After a brief overview of aspects of computational risk management, the implementation of the rearrangement algorithm in R is considered as an example from computational risk management practice. This algorithm is used to compute the largest quantile (worst value-at-risk) of the sum of the components of a random vector with specified marginal distributions. It is demonstrated how a basic implementation of the rearrangement algorithm can gradually be improved to provide a fast and reliable computational solution to the problem of computing worst value-at-risk. Besides a running example, an example based on real-life data is considered. Bootstrap confidence intervals for the worst value-at-risk as well as a basic worst value-at-risk allocation principle are introduced. The paper concludes with selected lessons learned from this experience.


Introduction
Computational risk management is a comparably new and exciting field of research at the intersection of statistics, computer science and data science. It is concerned with computational problems of quantitative risk management, such as algorithms, their implementation, computability, numerical robustness, parallel computing, pitfalls in simulations, run-time optimization, software development and maintenance. At the end of the (business) day, solutions in quantitative risk management are to be run on a computer of some sort and thus concern computational risk management.
There are various factors that can play a role when developing and implementing a solution to a problem from computational risk management in practice, for example: (1) Theoretical hurdles. Some solutions cannot be treated analytically anymore. Suppose X = (X_1, ..., X_d) is a d-dimensional random vector of risk factor changes, for example negative log-returns, with marginal distribution functions F_1, ..., F_d, and we are interested in computing the probability P(X_j ∈ (F_j^←(p_j), F_j^←(q_j)] for all j) of each risk factor change X_j taking on values between its p_j- and its q_j-quantile. Even if we knew the joint distribution function H of X, computing such a probability analytically involves evaluating H at 2^d different values. For d ≈ 32, this can already be time-consuming (not to speak of numerical inaccuracies appearing, which render the formula as good as useless) and for d ≈ 260 the number of points at which to evaluate H is roughly equal to the estimated number of atoms in the universe.
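To see where the 2^d evaluations come from, the probability of a hyperrectangle can be written as an inclusion-exclusion sum of H over the rectangle's 2^d corners. The following minimal sketch evaluates this sum for small d; the helper rect_prob() is hypothetical and for illustration only, and we assume the independence copula with standard normal margins so that the result can be checked against the product of the marginal probabilities.

```r
## Probability P(a < X <= b) via inclusion-exclusion over the 2^d corners of the
## hyperrectangle; rect_prob() is a hypothetical helper for illustration only
rect_prob <- function(a, b, H) {
    d <- length(a)
    corners <- as.matrix(expand.grid(rep(list(0:1), d))) # 2^d corner indicators
    s <- 0
    for (i in seq_len(nrow(corners))) {
        k <- corners[i, ] # 1 => upper endpoint b_j, 0 => lower endpoint a_j
        x <- ifelse(k == 1, b, a)
        s <- s + (-1)^(d - sum(k)) * H(x) # sign: (-1)^(number of lower endpoints)
    }
    s
}
H.indep <- function(x) prod(pnorm(x)) # assumed joint df: independent N(0,1) margins
p <- rect_prob(a = rep(-1, 3), b = rep(1, 3), H = H.indep)
all.equal(p, (pnorm(1) - pnorm(-1))^3) # matches the product of marginal probabilities
```

For d = 3 the loop visits 8 corners; doubling d squares the number of corners, which is the combinatorial explosion described above.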
(2) Model risk. The risk of using the wrong model or a model in which assumptions are not fulfilled; every solution we implement is affected by model risk to some degree. In the aforementioned example, not knowing the exact H leads to model risk when computing the given probability by Monte Carlo integration. (3) The choice of software. The risk of using the wrong software; in the above example, software not suitable for simulating from joint distribution functions H. Another example is using a programming language too low-level for the problem at hand, requiring one to implement standard tasks (sampling, optimization, etc.) manually and thus bearing the risk of obtaining unreliable results because of the many possible pitfalls one might encounter when developing the tools needed to implement the full-blown solution. This aspect has become more important nowadays, since efficient solutions to different problems are often only available in one software package each, but companies need to combine these solutions, and thus the corresponding software, in order to solve the problems at hand. (4) Syntax errors. These are compilation errors, or execution errors raised by an interpreter, caused by a violation of the syntax of the programming language under consideration. Syntax errors are typically easy to detect as the program simply stops running. Also, many programming languages provide debugging tools to find the culprits. (5) Run-time errors. These are errors that appear while a program runs. Quite often, run-time errors are numerical errors, that is errors that appear, for example, because of the floating-point representation of numbers in double precision. Run-time errors can sometimes be challenging to find, for example, when they only happen occasionally in a large-scale simulation study and introduce a non-suspicious bias in the results. Run-time errors can sometimes also be hard to reproduce. (6) Semantic errors.
These are errors where the code is syntactically correct and the program runs, but it does not compute what is intended (for example, due to a flipped logical statement).
If results do not look suspicious or pass all tests conducted, sometimes measuring run time can be the only way to find such problems (for example, if a program finishes much earlier than expected). (7) Scaling errors. These are errors of otherwise perfectly fine running code that fails when run at large scale. This could be due to a lack of numerical robustness, the sheer run time of the code, an exploding number of parameters involved, etc. (8) User errors. These are errors caused by users of the software solution, for example, when calling functions with wrong arguments because of a misinterpretation of their meaning. The associated risk of wrongly using software can, by definition, often be viewed as part of operational risk. (9) Warnings. Warnings are important signs of what to watch out for; for example, a warning could indicate that a numerical optimization routine has not converged after a predetermined number of steps and only the current best value is returned, which might be far off a local or the global optimum. Unfortunately, especially in large-scale simulation studies, users often suppress warnings instead of collecting and considering them. (10) Run time. The risk of using or propagating method A over method B because the run time of A beats that of B by a fraction of a second, not realizing that run time depends on factors such as the hardware used, the current workload, the algorithm implemented, the programming language used, the implementation style, compiler flags, whether garbage collection was used, etc. There is not even a unique concept of time (system vs user vs elapsed time). (11) Development and maintenance. It is significantly more challenging to provide a robust, well developed, maintained and documented class of solutions as part of a bigger, coherent software package than a single script with hard-coded values that solves a special case of the same problem.
Although stand-alone scripts get specific tasks done by 4 p.m., having a software package available is typically beneficial mid- to long-term. It can significantly reduce the risk of the aforementioned errors through the reuse of code that is well tested and applied by the users of the package, and it can avoid the temptation of introducing errors long after the code was developed just because a piece of it suddenly looks suspicious although it is actually correct.
At the core of all solutions to problems from computational risk management lies an algorithm, a well-defined (unambiguous) finite set of instructions (steps) for solving a problem. An implementation of an algorithm in a programming language allows one to see how a problem is actually solved, unlike a formulation in pseudo-code, which is often vague and thus opens the door for misinterpretation. A classical example is pseudo-code that says "Minimize the function . . . " without mentioning how initial values or intervals can be obtained. Another example is "Choose a tolerance ε > 0 . . . ", where one is left in the dark concerning what suitable tolerances ε are for the problem at hand; they often depend on the unknown output one is interested in computing in the first place, see Step (1) of Algorithm 1 later. Oftentimes, these are the hardest parts of the whole problem to solve, and it is important for research and scientific journals to accept corresponding contributions as important instead of brushing them off as "not innovative" or representing "no new contribution".
A fundamental principle when developing an implementation is that of a minimal working example. A minimal working example is source code that is a working example in the sense that it allows someone else to reproduce a problem (sufficiency) and it is minimal in the sense that it is as small and as simple as possible, without non-relevant code, data or dependencies (necessity). Minimal working examples are often constructed from existing larger chunks of code by divide and conquer, that is by breaking down the problem into sub-problems and commenting out non-relevant parts until the code becomes simple enough to show the problem (which is then easier to grasp and solve or can be sent to an expert in the field to ask for help).
In this paper, we consider the problem of implementing the rearrangement algorithm (RA) of Embrechts et al. (2013) as an example from computational risk management. The RA allows one to compute, for example, the worst value-at-risk of the sum of d risks of which the marginal distributions are known but the dependence is unknown; it can also be used to compute the best value-at-risk or expected shortfall, but we focus on the worst value-at-risk. Section 2 contains the necessary details about the RA. Section 3 addresses how the major workhorse underlying this algorithm can be implemented in a straightforward way in R. This turns out to be inefficient and Section 4 presents ways to improve the implementation. Section 5 utilizes the implementations in our R package qrmtools for tracing and to motivate the chosen default tolerances, and Section 6 considers a real data example, presents a bootstrap confidence interval for worst value-at-risk and introduces a basic capital allocation principle based on worst value-at-risk. The lessons learned throughout the paper are summarized in Section 7.

The Rearrangement Algorithm in a Nutshell
Let L_1 ~ F_1, ..., L_d ~ F_d be continuously distributed loss random variables over a predetermined period and let L^+ = ∑_{j=1}^d L_j be the total loss over that time period. From a risk management perspective we are interested in computing value-at-risk (VaR_α(L^+), VaR_α or VaR) at confidence level α ∈ (0, 1), that is the α-quantile F_{L^+}^←(α) = inf{x ∈ R : F_{L^+}(x) ≥ α} of the distribution function F_{L^+} of L^+; in typical applications, α ∈ [0.99, 1). If we had enough joint observations, so realizations of the random vector L = (L_1, ..., L_d), we could estimate VaR_α(L^+) empirically. A major problem is that one often only has realizations of each of the L_j individually, non-synchronized. This typically allows one to pin down F_1, ..., F_d but not the joint distribution function H of L. By Sklar's Theorem, such H can be written as H(x_1, ..., x_d) = C(F_1(x_1), ..., F_d(x_d)) for a unique copula C. In other words, although we often know or can estimate F_1, ..., F_d, we typically neither know the copula C nor have enough joint realizations to be able to estimate it. However, the copula C determines the dependence between L_1, ..., L_d and thus the distribution of L^+, as the following example shows.
Note that we did not utilize the stochastic representation (L_1, L_2) = (F_1^←(U), F_2^←(U)), U ~ U(0, 1), for sampling L = (L_1, L_2). Instead, we sorted the realizations of L_2 so that their ranks are aligned with those of L_1, in other words, so that the ith-smallest realization of L_2 lies next to the ith-smallest realization of L_1. The rows of this sample thus consist of (L_{(i)1}, L_{(i)2}), where L_{(i)j} denotes the ith order statistic of the n realizations L_{1j}, ..., L_{nj} of L_j. Such a rearrangement mimics comonotonicity between the realizations of L_1 and L_2. Note that this did not change the realizations of L_1 or L_2 individually, so it did not change the marginal realizations, only the joint ones.
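This rearrangement step can be sketched in a few lines; the exponential and log-normal margins below are assumptions standing in for the example's data. Sorting each column aligns the ranks and mimics comonotonicity without changing the marginal realizations.

```r
set.seed(271)
n <- 1000
L1 <- rexp(n)    # realizations of L_1 (assumed margin)
L2 <- rlnorm(n)  # realizations of L_2 (assumed margin), non-synchronized
L.comon <- cbind(sort(L1), sort(L2)) # i-th smallest of L_2 next to i-th smallest of L_1
## The marginal realizations are unchanged, only the joint ordering changed
stopifnot(identical(sort(L.comon[, 1]), sort(L1)),
          identical(sort(L.comon[, 2]), sort(L2)))
## Empirical 99% quantile of the row sums under mimicked comonotonicity
quantile(rowSums(L.comon), probs = 0.99)
```

The same marginal samples under a different row arrangement would give a different empirical quantile of the sum, which is exactly the effect exploited below.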
As we learned from Example 1, by rearranging the marginal loss realizations, we can mimic different dependence structures between the losses and thus influence VaR_α(L^+). In practice, the worst VaR_α(L^+) is of interest, that is the largest VaR_α(L^+) for given margins L_1 ~ F_1, ..., L_d ~ F_d. The following remark motivates an objective when rearranging marginal loss realizations in order to increase VaR_α(L^+) and thus approximate the worst VaR_α(L^+).

Remark 1 (Objective of rearrangements).
(1) By Example 1, the question of which copula C maximizes VaR_α(L^+) is thus, at least approximately, the question of which reordering of the realizations of each of L_1, ..., L_d leads to the worst VaR_α(L^+). For any C, the probability of exceeding VaR_α(L^+) is (at most, but for simplicity let us assume exactly) 1 − α. How that probability mass is distributed beyond VaR_α(L^+) depends on C. If we find a C such that L^+ has a rather small variance var(L^+), more probability mass will be concentrated around a single point, which can help to pull VaR_α(L^+) further into the right tail and thus increase VaR_α(L^+); Figure 2 provides a sketch in terms of a skewed t_3 density. If that single point exists and if var(L^+) = 0, then it must be the expected shortfall ES_α(L^+) = (1/(1 − α)) ∫_α^1 VaR_u(L^+) du, which provides an upper bound to the worst VaR_α(L^+).
Figure 2. Skewed t_3 density of L^+ with VaR_α(L^+) and the probability mass of (at most) 1 − α exceeding it.
(2) It becomes apparent from Figure 2 that the distribution of L^+ below its α-quantile is irrelevant for the location of the worst VaR_α(L^+). It thus suffices to consider the losses L_1, ..., L_d beyond their marginal α-quantiles F_j^←(α), j = 1, ..., d, and their copula C_α, called the worst VaR copula. Note that since α is typically rather close to 1, the distribution of L_j | L_j > F_j^←(α) (the tail of L_j) is typically modeled by a continuous distribution; this can be justified theoretically by the Pickands-Balkema-de-Haan Theorem, see (McNeil et al. 2015, Section 5.2.1).
(3) In general, if var(L^+) = 0 cannot be attained, the smallest point of the support of L^+ given that L_j > F_j^←(α), j = 1, ..., d, is taken as an approximation to the worst VaR_α(L^+).
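The expected shortfall bound in Part (1) can be illustrated numerically. As an assumed example (not from the text), take a single Exp(1) loss, for which ES_α has the closed form 1 − log(1 − α), and evaluate the defining integral directly:

```r
alpha <- 0.99 # confidence level
## ES_alpha(L) = (1/(1-alpha)) * integral of VaR_u(L) du over (alpha, 1);
## for L ~ Exp(1), VaR_u(L) = qexp(u) = -log(1-u)
ES <- integrate(qexp, lower = alpha, upper = 1)$value / (1 - alpha)
all.equal(ES, 1 - log(1 - alpha), tolerance = 1e-4) # closed form for Exp(1)
```

Averaging VaR_u over u ∈ (α, 1) always yields a value at least as large as VaR_α, which is why ES_α(L^+) bounds the worst VaR_α(L^+) from above.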
The RA aims to compute VaR α (L + ) by rearranging realizations of L j | L j > F ← j (α), j = 1, . . . , d, such that their sum becomes more and more constant, in order to approximately obtain the smallest possible variance and thus the largest VaR α (L + ). There are still fundamental open mathematical questions concerning the convergence of this algorithm, but this intuition is sufficient for us to understand the RA and to study its implementation. To this end, we now address the remaining underlying concepts we need.
For each margin j = 1, ..., d, the RA uses two discretizations of F_j^← beyond the probability α as realizations of L_j | L_j > F_j^←(α) to be rearranged. In practice, one would typically utilize the available data from L_j to estimate its distribution function F_j and then use the implied quantile function F_j^← of the fitted F_j to obtain such discretizations; see Section 6. Alternatively, F_j^← could be specified by expert opinion. The RA uses two discretizations to obtain upper and lower bounds to the worst VaR_α(L^+), which, when sufficiently close, can be used to obtain an estimate of the worst VaR_α(L^+), for example, by taking the midpoint. The two discretizations are stored in the matrices

X^α = (F_j^←(α + (1 − α)(i − 1)/N))_{i=1,...,N; j=1,...,d} and X̄^α = (F_j^←(α + (1 − α)i/N))_{i=1,...,N; j=1,...,d}. (1)

These matrices are the central objects the RA works with. For simplicity of the argument, we write X^α = (X^α_ij) if the argument applies to X^α or X̄^α. The RA iterates over the columns of X^α and successively rearranges each column in a way such that the row sums (∑_{j=1}^d X^α_{ij})_{i=1,...,N} of the rearranged matrix have a smaller variance; compare with Remark 1.
In each rearrangement step, the RA rearranges the jth column X^α_j of X^α oppositely to the vector of row sums of all other columns, (∑_{k≠j} X^α_{ik})_{i=1,...,N}. Two vectors a and b are oppositely ordered if the smallest component of a lies next to the largest component of b, the second-smallest next to the second-largest, and so on, which helps decrease the variance of their componentwise sum. To illustrate this, consider the following example.

Example 2 (Motivation for oppositely ordering columns).
We start by writing an auxiliary function quantile_matrix() to compute a matrix X^α as in Equation (1), with the modification that, for X̄^α, the 1-quantile is not evaluated, to avoid possible infinities; we come back to this point later. It is useful to start writing functions at this point, as we can reuse them later and rely on them for computing partial results, inputs needed, etc. In R, the function stopifnot() can be used to check the inputs of functions. If any of the provided conditions fails, this produces an error. For more elaborate error messages, one can work with stop(). Input checks are extremely important when functions are exported from a package (so visible to a user of the package) or even made available to users by simply sharing R scripts. Code is often run by users who are not experts in the underlying theory or mechanisms, and good input checks can prevent them from wrongly using the shared software; see Section 1, Point (8). Also, documenting the function (here: Roxygen-style documentation) is important, providing easy-to-understand information about what the function computes, the meaning of the input parameters, the return value and who wrote the function.
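Since the listing itself is not reproduced here, the following is a minimal sketch of such a helper. Its signature, the choice of margins and the handling of the last probability are assumptions for illustration, not the actual quantile_matrix() of the paper.

```r
## Hypothetical sketch of a quantile_matrix()-like helper; the real function may
## differ in signature and in how it avoids evaluating the 1-quantile
quantile_matrix <- function(qFs, alpha, N, upper = FALSE) {
    i <- if (upper) 1:N else 0:(N - 1)
    p <- alpha + (1 - alpha) * i / N
    if (upper) p[N] <- alpha + (1 - alpha) * (N - 0.5) / N # avoid F^{-1}(1) = Inf
    sapply(qFs, function(qF) qF(p)) # (N, d)-matrix of marginal quantiles
}
qFs <- list(qexp, qlnorm) # two assumed margins: Exp(1), LN(0, 1)
X.low <- quantile_matrix(qFs, alpha = 0.99, N = 8)               # discretizes X^alpha
X.up  <- quantile_matrix(qFs, alpha = 0.99, N = 8, upper = TRUE) # discretizes X-bar^alpha
stopifnot(dim(X.low) == c(8, 2), all(X.up >= X.low), all(is.finite(X.up)))
```

The stopifnot() call at the end doubles as a basic sanity check of the kind advocated above: the upper discretization dominates the lower one elementwise and contains no infinities.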
The columns in X^α are sorted in increasing order and thus mimic comonotonicity in the joint tail, which leads to the following variance estimate of the row sums.

> var(rowSums(X)) # variance of the row sums under mimicked comonotonicity
[1] 3519.706

If we randomly shuffle the first column, so mimicking independence in the joint tail, we obtain the following variance estimate.

> X. <- X # copy of X
> X.[,1] <- sample(X.[,1]) # randomly shuffling the first column
> var(rowSums(X.)) # estimate the variance of the row sums

Now if we oppositely order the first column with respect to the sum of all others, here the second column, we can see that the variance of the resulting row sums decreases; note that we mimic countermonotonicity in the joint tail in this case.
The two marginal distributions largely differ in their heavy-tailedness, which is why the variance of their sum, even when oppositely reordered, is still rather large. For θ_1 = θ_2 = 2, one obtains 138.6592, for two standard normal margins 0.0808 and for two standard uniform margins 0.
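The opposite ordering step itself can be sketched as follows; this is a minimal example with assumed standard uniform columns, using sort() and rank() for the rearrangement.

```r
set.seed(271)
N <- 100
a <- runif(N) # column to be rearranged (assumed data)
b <- runif(N) # row sums of all other columns (here: a single assumed column)
## Oppositely order a with respect to b: the i-th largest b gets the i-th smallest a
a.opp <- sort(a)[rank(-b, ties.method = "first")]
stopifnot(sort(a.opp) == sort(a)) # a rearrangement only changes the order
## Compare with the comonotone arrangement (both columns sorted increasingly)
v.comon <- var(sort(a) + sort(b))
v.opp <- var(a.opp + b)
stopifnot(v.opp <= v.comon) # opposite ordering decreases the variance of the row sums
```

By the rearrangement inequality, opposite ordering minimizes the covariance between the two columns, so the variance of the componentwise sums cannot exceed the comonotone one.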
As a last underlying concept, the RA utilizes the minimal row sum operator s(X^α) = min_{1≤i≤N} ∑_{j=1}^d X^α_{ij}, which is motivated in Remark 1 Part (3). We are now ready to provide the RA.
The RA as introduced in Embrechts et al. (2013) states that s_N ≤ s̄_N and in practice s_N ≈ s̄_N ≈ worst VaR_α(L^+). Furthermore, the initial randomizations in Steps (2.2) and (3.2) are introduced to avoid convergence problems of s̄_N − s_N → 0.

A First Implementation of the Basic Rearrangement Step
When implementing an algorithm such as the RA, it is typically a good idea to divide and conquer, which means breaking down the problem into sub-problems and addressing those first. Having implemented solutions to the sub-problems (write functions!), one can then use them as black boxes to implement the whole algorithm. In our case, we can already rely on the function quantile_matrix() for Steps (2.1) and (3.1) of the RA and we have already seen how to oppositely order two columns in Example 2. We also see that apart from which matrix is rearranged, Steps (2) and (3) are equal. We thus focus on this step in what follows.

Example 3 (Basic rearrangements).
We start with a basic implementation for computing one of the bounds s_N, s̄_N of the RA.
##' @title Basic Rearrangements
##' @param X (N, d)-matrix of marginal quantiles beyond the confidence level
##' @param tol convergence tolerance determining when to stop rearranging columns;
##'        can also be NULL in which case the algorithm runs until there was no
##'        change in the matrix of rearranged columns after one iteration over
##'        all columns
##' @return list with the minimal row sums after the algorithm terminates and the
##'         corresponding rearranged matrix X
##' @author Marius Hofert
basic_rearrange <- function(X, tol)
{
    ## Random column permutations and basic setup
    X <- apply(X, 2, function(x) x[sample(seq_along(x))]) # randomly permute each column

In comparison to Algorithm 1, our basic implementation already uses a relative rather than an absolute convergence tolerance ε (more intuitively named tol in our implementation); the former is more suitable here as we do not know the worst VaR_α(L^+) and therefore cannot judge whether an absolute tolerance of, say, 0.001 is reasonable. Also, we terminate if the attained relative tolerance is less than or equal to tol, since including equality allows us to choose tol = 0; see Steps (2.4) and (3.4) of Algorithm 1. Furthermore, we allow the tolerance to be NULL, in which case the columns are rearranged until all of them are oppositely ordered with respect to the sum of all others. The choice tol = NULL is good for testing, but typically results in much larger run times and rarely has an advantage over tol = 0.
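Since the full listing is not reproduced above, the following self-contained sketch shows the idea; it is a simplified assumption, not the paper's verbatim basic_rearrange(), and it implements the tol = NULL behavior of rearranging until a full pass over all columns leaves the matrix unchanged. The toy input matrix with Exp, LN and N margins is also an assumption.

```r
## Simplified sketch of a basic rearrangement function (an assumption, not the
## paper's verbatim code); implements the tol = NULL stopping behavior
basic_rearrange_sketch <- function(X) {
    X <- apply(X, 2, function(x) x[sample(seq_along(x))]) # randomly permute each column
    d <- ncol(X)
    repeat {
        X.old <- X
        for (j in 1:d) { # oppositely order column j w.r.t. the sum of all others
            rs.mj <- rowSums(X[, -j, drop = FALSE])
            X[, j] <- sort(X[, j])[rank(-rs.mj, ties.method = "first")]
        }
        if (identical(X, X.old)) break # no column changed in a full pass
    }
    list(bound = min(rowSums(X)), X.rearranged = X)
}
p <- 0.99 + 0.01 * (0:31) / 32 # probabilities beyond alpha = 0.99, N = 32
X <- cbind(qexp(p), qlnorm(p), qnorm(p)) # assumed toy (N, 3) input matrix
set.seed(271)
res <- basic_rearrange_sketch(X)
## Any rearrangement's minimal row sum is at least the comonotone one
stopifnot(res$bound >= min(rowSums(X)))
```

Note that each column of the returned matrix is a permutation of the corresponding input column, so the marginal discretizations are preserved; only the joint arrangement changes.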
[Algorithm 1, excerpt: (2.2) Randomly permute the elements in each column of X^α. (2.3) Set Y^α = X^α; for 1 ≤ j ≤ d, rearrange the jth column of the matrix Y^α so that it becomes oppositely ordered to the sum of all other columns. (3.2) Randomly permute the elements in each column of X̄^α.]
The opposite ordering step in our basic implementation passes the argument ties.method = "first" to rank(), which specifies how ties are handled, namely ties with smaller index are assigned the smaller rank. Although ties were not a problem in Example 2, when N is large and d > 2, ties can quickly arise numerically. It can be important to use a sorting algorithm with deterministic behavior in this case, as otherwise the RA might not terminate, or not as quickly as it could; see the vignette VaR_bounds of the R package qrmtools for more details.
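A small example (our own, for illustration) shows the difference: the default ties.method averages ranks and so returns non-integer values, whereas "first" breaks ties deterministically by index.

```r
x <- c(2, 1, 1, 3) # two tied values
rank(x) # default "average": tied values share the average rank (non-integer)
stopifnot(isTRUE(all.equal(rank(x), c(3, 1.5, 1.5, 4))))
rank(x, ties.method = "first") # deterministic: smaller index gets the smaller rank
stopifnot(identical(rank(x, ties.method = "first"), c(3L, 1L, 2L, 4L)))
```

The deterministic tie-breaking is what makes repeated opposite-ordering passes reproducible and lets the algorithm detect that a full pass no longer changes the matrix.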
Finally, we can call basic_rearrange() on both matrices in Equation (1) to obtain the lower and upper bounds s_N, s̄_N and thus implement the RA. We would thus reuse the same code for Steps (2) and (3) of the RA, which means less code to check, improve or maintain.

Example 4 (Using the basic implementation).
To call basic_rearrange() in a running example, we use the following parameters and build the input matrix X^α.
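The concrete parameter values of the running example are not reproduced here; a hypothetical stand-in with Pareto margins, whose quantile function is F^←(p) = (1 − p)^(−1/θ) − 1, could look as follows. All values below are assumptions, not the paper's.

```r
## Assumed stand-in parameters (not the paper's actual running-example values)
N <- 2^7      # number of discretization points
d <- 5        # dimension
theta <- 2    # Pareto parameter of each margin
alpha <- 0.99 # confidence level
eps <- 0.001  # relative convergence tolerance
qPar <- function(p, theta) (1 - p)^(-1/theta) - 1 # Pareto quantile function
p <- alpha + (1 - alpha) * (0:(N - 1)) / N # probabilities for the lower-bound matrix
X <- sapply(rep(theta, d), function(th) qPar(p, th)) # (N, d) input matrix X^alpha
stopifnot(dim(X) == c(N, d), all(is.finite(X)), !is.unsorted(X[, 1]))
```

A matrix of this form, with sorted columns and no infinities, is exactly what the rearrangement functions below expect as input.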

Improvements
It is always good to have a first working version of an algorithm, for example, to compare against in case one tries to improve the code and it becomes harder to read or check. However, as we saw in Example 4, our basic implementation of Example 3 is rather slow, even though we chose a rather small N in this running example. Our goal is thus to present ideas for improving basic_rearrange() as a building block of the RA.

Profiling
As a first step, we profile the last call to see where most of the run time is spent.

> Rprof(profiling <- tempfile()) # enable profiling
> res.basic.prof <- basic_rearrange(X, tol = eps) # call
> Rprof(NULL) # disable profiling
> prof <- summaryRprof(profiling)

Profiling writes out the call stack at fixed short time intervals, that is, it repeatedly checks where the execution currently is. This allows one to measure the time spent in each function call. Note that some functions do not create stack frame records and thus do not show up. Nevertheless, profiling the code is often helpful. It is typically easiest to consider the measurement in percentages (see the second and fourth columns in the above output) and often in total, so the run time spent in that particular function or the functions it calls (see the fourth column). As this column reveals, two large contributors to run time are the computation of row sums and the sorting.

Avoiding Summations
First consider the computation of row sums. In each rearrangement step, basic_rearrange() computes the row sums of all but the current column. As we explained, this rearrangement step is mathematically intuitive, but it is computationally expensive. An important observation is that the row sums of all but the current column are simply the row sums of all columns minus the current column; in short, ∑_{k≠j} X^α_k = ∑_{k=1}^d X^α_k − X^α_j, rowwise. Therefore, if we, after randomly permuting each column, compute the total row sums once, we can subtract the jth column to obtain the row sums of all but the jth column. After having rearranged the jth column, we then add the rearranged jth column to the row sums of all other columns in order to obtain the updated total row sums.
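In code, the update reads as follows; this is a minimal sketch with an assumed random example matrix.

```r
set.seed(271)
X <- matrix(rnorm(5 * 4), ncol = 4) # assumed (N, d) example matrix
rs <- rowSums(X) # total row sums, computed once
j <- 2
rs.mj <- rs - X[, j] # row sums of all but column j: one subtraction, not d-1 sums
stopifnot(all.equal(rs.mj, rowSums(X[, -j])))
## After rearranging column j (here simply sorted decreasingly, as a stand-in),
## add it back to update the total row sums
X[, j] <- sort(X[, j], decreasing = TRUE)
rs <- rs.mj + X[, j] # updated total row sums
stopifnot(all.equal(rs, rowSums(X)))
```

Each column rearrangement thus costs O(N) additions for the row-sum bookkeeping instead of O(N(d − 1)), which is the source of the speed-up quantified below.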
The following improved version of basic_rearrange() incorporates said idea; to save space, we omit the function header.
## Random column permutations and basic setup
X <- apply(X, 2, function(x) x[sample(seq_along(x))]) # randomly permute each column

Run time is substantially improved, by about 85%. This should be especially helpful for large d, so let us investigate the percentage improvement; here only done for rather small d to show the effect. Figure 3 shows the relative speed-up in percentage of improved_rearrange() over basic_rearrange() as a function of d for the running example introduced in Example 4.

> d <- round(2^seq(3, 7, by = 0.5)) # dimensions d considered here
> num.d <- length(d) # number of dimensions
> time <- matrix(, nrow = num.d, ncol = 2) # (num.d, 2)-matrix (initialized with NAs)
> colnames(time) <- c("basic", "improved")
> ## Loop over the dimensions and measure elapsed time
> for(j in seq_along(d))

We can infer that the percentage improvement of improved_rearrange() in comparison to basic_rearrange() converges to 100% for large d. This is not a surprise since determining the row sums of all but a fixed column requires basic_rearrange() to compute N(d − 1)-many sums whereas improved_rearrange() requires only N-many. This is an improvement by O(d), which especially starts to matter for large d.
A quick note on run-time measurement is in order here. In a more serious setup, one would take repeated measurements of run time (at least for smaller d), determine the average run time for each d and perhaps empirical confidence intervals for the true run time based on the repeated measurements. Also, we use the same seed for each method, which guarantees the same shuffling of columns and thus allows for a fair comparison. Not doing so can result in quite different run times and would thus indeed require repeated measurements to get more reliable results.
Finally, the trick of subtracting a single column from the total row sums in order to obtain the row sums of all other columns comes at a price. Although the RA as in Algorithm 1 works well if some of the 1-quantiles appearing in the last row of X α are infinity, this does not apply to improved_rearrange() anymore since if the last entry in the jth column is infinity, the last entry in the vector of total row sums is infinity and thus the last entry in the vector of row sums of all but the jth column would not be defined (NaN in R). This is why quantile_matrix() avoids computing 1-quantiles; see Example 2.

Avoiding Accessing Matrix Columns and Improved Sorting
There are two further improvements we can consider. The first concerns the opposite ordering of columns. So far it involved a call to sort() and one to rank(). It turns out that these two calls can be replaced by a nested order() call, which is slightly faster than rank() alone. We start by demonstrating that we can replicate rank() with ties.method = "first" by a nested order() call based on standard uniform data.
Providing rank() with a ties method is important here, as the default assigns average ranks to ties and can thus produce non-integer numbers. One might not expect to see ties in such a small set of standard uniform random numbers, but they do occur; see Hofert (2020) for details.
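The demonstration itself can be sketched as follows (our own minimal version with assumed standard uniform data): order(order(x)) computes the same ranks as rank(x, ties.method = "first"), because order() breaks ties by index.

```r
set.seed(271)
u <- runif(100) # standard uniform data
stopifnot(identical(order(order(u)), rank(u, ties.method = "first")))
## Oppositely ordering a column x with respect to row sums rs can then be done via
x <- runif(100); rs <- runif(100)
x.opp1 <- sort(x)[rank(-rs, ties.method = "first")]
x.opp2 <- sort(x)[order(order(-rs))]
stopifnot(identical(x.opp1, x.opp2))
```

Both variants are deterministic under ties, so they are interchangeable in the rearrangement step; the nested order() call is simply the faster of the two.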
The following example incorporates these two further improvements and some more.
The following implementation saves column-access time by working with lists rather than matrices. We also use the slightly faster .rowSums() (as we know the dimensions of the input matrix) instead of rowSums() for computing the initial row sums once in the beginning, and we incorporate the idea of a nested order() call instead of rank(). Furthermore, before shuffling the columns, we also store their original sorted versions, which allows us to omit the call of sort() in improved_rearrange().

advanced_rearrange <- function(X, tol)
{
    ## Setup
    X.lst <- split(X, col(X)) # split 'matrix' X into columns
    X.lst.sorted <- X.lst # X is assumed to be sorted
    X.lst <- lapply(X.lst, sample) # randomly permute each 'column' (element of X.lst)

Although the gain in run time is not as dramatic as before, we still see an improvement of about 62% in comparison to improved_rearrange().
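The two ingredients can be checked in isolation; the small matrix below is our own example, and .rowSums() takes the matrix dimensions explicitly.

```r
X <- matrix(1:6, ncol = 3) # small example matrix
X.lst <- split(X, col(X)) # list of columns; element access avoids matrix indexing
stopifnot(identical(X.lst[[2]], X[, 2]))
rs <- .rowSums(X, m = nrow(X), n = ncol(X)) # dimensions passed explicitly
stopifnot(identical(rs, rowSums(X)))
```

Working with a list of columns avoids repeatedly computing column offsets into the matrix, and .rowSums() skips the dimension checks that rowSums() performs on every call.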

Comparison with the Actual Implementation
The computational improvements so far have already led to a percentage improvement of 94% in comparison to basic_rearrange() in our running example. The following example compares our fastest function so far, advanced_rearrange(), with our implementation rearrange() in the R package qrmtools, whose development was motivated by a risk management team at a Swiss bank.

Example 7 (Comparison with rearrange()).
To save even more run time, the function rearrange() from the R package qrmtools splits the input matrix into its columns at C level. It also uses an early stopping criterion in the sense that after all columns have been rearranged once, stopping is checked after every column rearrangement. Because of the latter, the result is not quite comparable to our previous versions.
> library(qrmtools)
> set.seed(271) # for reproducibility
> res.rearr <- rearrange(X, tol = eps)
> all.equal(res.rearr[["X.rearranged"]], res.adv[["X.rearranged"]], # comparison
+           check.attributes = FALSE)
[1] "Mean relative difference: 5.611011e-05"

We did not measure run time here since, similar to Heisenberg's uncertainty principle, the difference in run time becomes more and more random; this is in line with the point we made about run-time measurement in the introduction. In other words, the two implementations become closer in run time. This can be demonstrated by repeated measurements.
> B <- 20 # number of replications
> res <- matrix(, nrow = B, ncol = 2,
+               dimnames = list(Replication = 1:B, Method = c("advanced", "rearrange")))
> set.seed(271) # for reproducibility
> res[,"advanced"] <-
+     replicate(B, expr = system.time(advanced_rearrange(X, tol = eps))[["elapsed"]])
> set.seed(271) # for reproducibility
> res[,"rearrange"] <-
+     replicate(B, expr = system.time(rearrange(X, tol = eps))[["elapsed"]])
> tab <- as.data.frame.table(res, responseName = "Elapsed") # all results as a table
> boxplot(Elapsed ~ Method, data = tab, xlab = "", ylab = "Elapsed time in seconds",
+         names = c("advanced_rearrange()", "rearrange()"), boxwex = rep(0.35, 2))

As we can see from the box plot in Figure 5, on average, rearrange() is slightly faster than advanced_rearrange(). Although rearrange() could be made even faster, it also computes and returns more information about the rearrangements than advanced_rearrange(), for example, the computed minimal row sums after each column rearrangement and the row of the final rearranged matrix corresponding to the minimal row sum in our case here. Given that this is a function applied by users who are not necessarily familiar with the rearrangement algorithm, it is more important to provide such information. The function rearrange() is the workhorse underlying the function RA() that implements the rearrangement algorithm in the R package qrmtools. The adaptive rearrangement algorithm (ARA) introduced in Hofert et al. (2017) improves the RA by introducing a joint tolerance, that is a tolerance between s_N and s̄_N; the tolerance used so far is called the individual tolerance. The ARA applies column rearrangements until the individual tolerance is met for each of the two bounds and until the joint tolerance is met or a maximal number of column rearrangements has been reached.
If the latter is the case, N is increased and the procedure is repeated. The ARA implementation ARA() in qrmtools also returns useful information about how the algorithm proceeded. Overall, ARA() requires the user to choose neither a tolerance nor the number of discretization points, which makes the algorithm straightforward to apply in practice, where the choice of such tuning parameters is often unclear. Finding good tuning parameters (adaptively or not) is often a significant problem in newly suggested procedures and is not rarely an open question for future research in itself.
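To make the adaptive idea concrete, the following is a minimal, self-contained sketch of an ARA-type loop in base R. All names here (greedy_min_row_sum(), ara_sketch(), joint.tol, N0, max.doublings) are our own, and the actual ARA() in qrmtools is considerably more refined (individual tolerances, random column orders, handling of infinite quantiles, etc.); this is a sketch of the control flow only.

```r
## greedy_min_row_sum(): oppositely order each column with respect to the row
## sums of all other columns until no column changes; return the minimal row sum.
greedy_min_row_sum <- function(X, maxiter = 100) {
    for (iter in seq_len(maxiter)) {
        changed <- FALSE
        for (j in seq_len(ncol(X))) {
            b <- rowSums(X[, -j, drop = FALSE]) # row sums of all other columns
            Yj <- sort(X[, j], decreasing = TRUE)[rank(b, ties.method = "first")]
            if (!identical(Yj, X[, j])) { X[, j] <- Yj; changed <- TRUE }
        }
        if (!changed) break # all columns oppositely ordered
    }
    min(rowSums(X))
}
## ara_sketch(): double the number N of discretization points until the
## relative gap between the lower and upper bound meets a joint tolerance.
ara_sketch <- function(qF, alpha, joint.tol = 1e-3, N0 = 64, max.doublings = 12) {
    N <- N0
    for (k in seq_len(max.doublings)) {
        p.low <- alpha + (0:(N - 1)) / N * (1 - alpha) # probabilities, lower bound
        p.up  <- alpha + (1:N) / N * (1 - alpha)       # probabilities, upper bound
        s.low <- greedy_min_row_sum(sapply(qF, function(q) q(p.low)))
        s.up  <- greedy_min_row_sum(sapply(qF, function(q) q(p.up)))
        if ((s.up - s.low) / s.up <= joint.tol) break # joint tolerance met
        N <- 2 * N # otherwise refine the discretization
    }
    c(lower = s.low, upper = s.up, N = N)
}
## Example with three standard uniform margins (bounded support, so q(1) is finite)
res <- ara_sketch(rep(list(qunif), 3), alpha = 0.99)
```

For uniform margins the initial (comonotone) lower-bound matrix already has minimal row sum 3 · 0.99 = 2.97 and opposite ordering never decreases the minimal row sum, so the returned bounds bracket the worst VaR estimate between 2.97 and 3.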

The Functions rearrange() and ARA()
In this section we utilize the implementations rearrange() and ARA() of the rearrangement step and the adaptive rearrangement algorithm.

Example 8 (Tracing rearrange()).
The implementation rearrange() has a tracing feature. It can produce a lot of output, but we here consider the rather minimal example of rearranging the matrix

X =
    1 1 1
    2 2 2
    3 3 3
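Before tracing the algorithm, it is instructive to compare two rearrangements of this matrix by hand (both arrangements below are our own illustrations). Each column remains a permutation of (1, 2, 3), yet the minimal row sums differ:

```r
## Oppositely ordering only the first column yields minimal row sum 5
X1 <- cbind(c(3, 2, 1), c(1, 2, 3), c(1, 2, 3))
rowSums(X1) # row sums 5, 6, 7
## A rearrangement with all row sums equal attains the maximal minimal row sum 6
X2 <- cbind(c(1, 2, 3), c(2, 3, 1), c(3, 1, 2))
rowSums(X2) # row sums 6, 6, 6
```

Since the total sum of all entries is 18 over 3 rows, no rearrangement can push the minimal row sum above 6, so X2 is optimal.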
The following code enables tracing (trace = TRUE) to see how the column rearrangements proceed; to this end we rearrange until all columns are oppositely ordered to the sum of all other columns (tol = NULL), disable the random permutation of columns (sample = FALSE) and provide the information that the columns are already sorted here (is.sorted = TRUE).
> trace <- rearrange(matrix(rep(1:3, times = 3), ncol = 3), tol = NULL,
+                    sample = FALSE, is.sorted = TRUE, trace = TRUE)
A "|" or "=" symbol over the column just worked on indicates whether this column was changed ("|") or not ("=") in this rearrangement step. The last two printed columns provide the row sums of all other columns as well as the updated total row sums after the rearrangement. As we can see from this example, the rearranged matrix has minimal row sum 5, whereas a rearrangement with all row sums equal (each equal to 6) would have led to the larger minimal row sum 6; this rather constructed example of how the greedy column rearrangements of the RA can fail to attain the maximal minimal row sum was provided by Haus (2015).

Example 9 (Calling ARA()).
We now call ARA() in the case of our running example in this paper. As we can see, the implementation returns a list with the computed lower and upper bounds s_N and s̄_N, the relative rearrangement gap (s̄_N − s_N)/s̄_N, the two individual tolerances and the joint tolerance reached on stopping, a corresponding vector of logicals indicating whether the required tolerances have been met, the number N of discretization points used in the final step, the number of column rearrangements considered for each bound, vectors of the computed optimal (for worst VaR this means "minimal") row sums after each column rearrangement considered, a list with the two input matrices X_α and X̄_α, a list with the corresponding final rearranged matrices Y_α and Ȳ_α, and the rows corresponding to their optimal row sums. An estimate of VaR_0.99(L+) in our running example can then be computed from these bounds. Figure 6 shows the minimal row sum as a function of the number of rearranged columns for each of the two bounds s_N and s̄_N on VaR_0.99(L+) in our running example. We can see that, already after all columns have been rearranged once, the value of the two bounds does not change much anymore in this case, yet the algorithm still proceeds until the required tolerances are met.

Applications
We now consider a real data example and introduce bootstrap confidence intervals for worst VaR_0.99 as well as a basic worst VaR allocation principle.
Example 10.
> library(qrmdata)
> data("SP500_const") # load the constituents of the S&P 500
> stocks <- c("GOOGL", "AAPL", "MSFT") # stocks we consider (Google, Apple, Microsoft)
> d <- length(stocks) # dimension
> time <- c("2006-01-03", "2010-12-31") # time period considered
> S <- SP500_const[paste0(time, collapse = "/"), stocks] # pick out data
> stopifnot(all(!is.na(S))) # check that there is no missing data
> X <- -returns(S) # -log-returns
We treat the negative log-returns as (roughly) stationary here and consider mean excess plots in Figure 7 to determine suitable thresholds for applying the peaks-over-threshold method. We then fit generalized Pareto distributions (GPDs) to the excess losses over these thresholds. Given the additional information returned by ARA(), we can also visualize a (pseudo-)sample from the worst VaR copula C_α, see Figure 8. Such a sample is obtained by computing the pseudo-observations, that is, the componentwise ranks scaled to (0, 1), of the rearranged matrix Ȳ_α corresponding to the upper bound s̄_N; we could have also considered Y_α here. Furthermore, we can investigate how much the VaR_0.99(L+) bounds s_N and s̄_N change with the individual relative tolerance. Figure 9 motivates the use of 0 as individual relative tolerance; this is often quite a bit faster than NULL and provides similar VaR_0.99(L+) bounds.
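The pseudo-observations mentioned above are straightforward to compute by hand; the following minimal base R version (in practice one could use pobs() from the copula package) ranks each column and scales the ranks to (0, 1):

```r
## Componentwise ranks scaled to (0,1) via division by n + 1
pseudo_obs <- function(Y) apply(Y, 2, rank) / (nrow(Y) + 1)
Y <- cbind(c(2.3, 0.1, 1.7), c(5.0, 4.2, 4.8)) # toy stand-in for a rearranged matrix
U <- pseudo_obs(Y) # each column of U lies strictly in (0, 1)
```

Dividing by n + 1 rather than n keeps all pseudo-observations strictly inside the unit interval, which matters when they are fed into copula density evaluations.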

Example 11 (Bootstrap confidence interval for worst VaR).
As in Example 10 and as mentioned in Section 2, the marginal quantile functions used as input for the rearrangement algorithm may need to be estimated from data. The corresponding estimation error translates into a confidence interval for worst VaR_0.99(L+), which we can obtain from a bootstrap. To this end, we start by building B bootstrap samples from the negative log-returns; we choose a rather small B = 20 here to demonstrate the idea.
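The resampling step can be sketched as follows in base R. To keep the sketch self-contained we use simulated placeholder data, empirical marginal quantiles and a basic greedy rearrangement instead of the fitted GPD margins and ARA() used in the text; worst_var_lower() and all other names are our own.

```r
set.seed(271) # for reproducibility
X <- matrix(rnorm(500 * 3), ncol = 3) # placeholder for the -log-return data
B <- 20; alpha <- 0.99; N <- 64
## Lower bound on worst VaR via greedy column rearrangements of a matrix of
## empirical marginal quantiles (a simple stand-in for rearrange()/ARA())
worst_var_lower <- function(x, alpha, N) {
    p <- alpha + (0:(N - 1)) / N * (1 - alpha)
    M <- sapply(seq_len(ncol(x)),
                function(j) quantile(x[, j], probs = p, names = FALSE))
    for (iter in 1:100) { # rearrange until no column changes
        changed <- FALSE
        for (j in seq_len(ncol(M))) {
            b <- rowSums(M[, -j, drop = FALSE])
            Mj <- sort(M[, j], decreasing = TRUE)[rank(b, ties.method = "first")]
            if (!identical(Mj, M[, j])) { M[, j] <- Mj; changed <- TRUE }
        }
        if (!changed) break
    }
    min(rowSums(M))
}
## B bootstrap replications: resample rows with replacement, recompute the bound
boot <- replicate(B, {
    x.star <- X[sample(nrow(X), replace = TRUE), ]
    worst_var_lower(x.star, alpha, N)
})
ci <- quantile(boot, probs = c(0.025, 0.975)) # 95% bootstrap confidence interval
```

Resampling rows (rather than entries) preserves the cross-sectional dependence in each bootstrap sample; the iid assumption behind this is the same stationarity assumption made for the real data.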

Example 12 (Basic worst VaR allocation).
Capital allocation concerns the allocation of a total capital K to d business lines so that K = AC_1 + · · · + AC_d (the full allocation property), where AC_j denotes the capital allocated to business line j. As the RA uses the minimal row sum as the estimate of VaR_α(L+), the rows in the final rearranged matrices Y_α and Ȳ_α provide an allocation of K = VaR_α(L+) that one can call a worst VaR allocation. There could be multiple rows in Y_α or Ȳ_α leading to the minimal row sum, in which case the componentwise average of these rows is considered. The resulting two rows are returned by (rearrange() and) ARA() in qrmtools version 0.0.13. By componentwise averaging of these two rows, we obtain an approximate worst VaR allocation. Given that we already have a bootstrap sample, we can average the B computed allocations componentwise to obtain the bootstrap mean of the worst VaR allocation.
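The averaging step can be sketched in base R on a toy matrix (the matrix below is our own illustration, not output of ARA()):

```r
## Toy final rearranged matrix; column j contains losses of business line j
Y <- rbind(c(1, 2, 2),
           c(2, 1, 2),
           c(3, 3, 1),
           c(2, 2, 3))
s <- rowSums(Y)                        # row sums; their minimum estimates worst VaR
idx <- which(s == min(s))              # possibly several rows attain the minimum
AC <- colMeans(Y[idx, , drop = FALSE]) # componentwise average -> allocation
AC                                     # 1.5 1.5 2.0
```

Here two rows attain the minimal row sum 5, so their componentwise average is taken; by construction sum(AC) equals min(s), that is, the full allocation property holds.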
For other applications of the RA, see, for example, Embrechts and Jakobsons (2016), Bernard et al. (2017), Bernard et al. (2018) and, most notably, Ramsey and Goodwin (2019) where the RA is applied in the context of crop insurance.

Selected Lessons Learned
After using R to motivate column rearrangements and opposite ordering for finding the worst VaR for given marginal distributions in Section 2, we considered a basic implementation of such rearrangement in Section 3. We profiled the code in Section 4 and improved the implementation step-by-step by avoiding unnecessary summations, avoiding accessing matrix columns and improving the sorting step. In Section 5 we then used the implementations rearrange() and ARA() in our R package qrmtools to trace the rearrangement steps and to motivate the use of the default tolerances used by ARA(). A real data example was considered throughout Section 6 where we computed the worst VaR with ARA(), visualized a sample from the corresponding worst VaR copula and assessed the sensitivity of the computed worst VaR with respect to the individual convergence tolerance. To incorporate the uncertainty due to the estimation error of the marginal distributions, we used a bootstrap idea to obtain a confidence interval for the true worst VaR. We also introduced worst VaR allocation as a capital allocation principle and computed bootstrap confidence intervals for the allocated capitals.
Our experiments reveal lessons that frequently appear in problems from the realm of computational risk management: We can and should use software to learn about new problems and communicate their solutions. Implementing an algorithm often helps to fully comprehend a problem, experiment with it and get new ideas for practical solutions. It also allows one to communicate a solution, since an implementation is unambiguous even if a theoretical presentation or pseudo-code of an algorithm leaves questions unanswered (such as how to choose tuning parameters); the latter is often the case when algorithms are provided in academic research papers. Both for understanding a problem and for communicating its solution, it is paramount to use visualizations. Creating meaningful graphs is unfortunately still an exception rather than the rule in academic publications, where tables are frequently used instead. Tables mainly allow one to see single numbers at a time, whereas graphs allow one to see the "bigger picture", that is, how results are connected and how they behave as functions of the investigated inputs. Given an algorithm, start with a basic, straightforward implementation and make sure it is valid, ideally by testing against known solutions in special cases; we omitted that part here, but note that there are semi-analytical solutions available for computing worst VaR in the case of homogeneous margins. Learn from the basic implementation, experiment with it and improve it, for example with respect to numerical stability, run time or maintainability. If you run into problems along the way, use minimal working examples to solve them; they are typically constructed by divide and conquer. As implementations get more involved, compare them against the basic implementation or previously checked improved versions. When improving code, be aware of the issues mentioned in the introduction and exploit the fact that mathematically unique solutions often allow for different computational solutions.
Also, computational problems (ties, parts of domains close to limits, etc.) can arise where there are none mathematically. These are often the hardest problems to solve, which is why it is important to be aware of computational risk management problems and their solutions. Implementing a computationally tractable solution is often one "dimension" more complicated than finding a mathematical solution; also literally so, since a random variable (or random vector) is often replaced by a vector (or matrix) of realizations in a computational solution.
Results in academic research papers are often based on an implementation of a newly presented algorithm that works only under specific conditions, with parameters found by experimentation (and often hard-coded) for the examples considered. A publicly available implementation in a software package needs to go well beyond this point, for example because users typically do not have expert knowledge of the problems the implementation intends to solve and often come from different backgrounds altogether, in the hope of finding solutions to challenging problems in their field of application. On the one hand, the amount of work spent on the development and maintenance of publicly available software solutions is largely underestimated. On the other hand, one can sometimes benefit from getting to know different areas of application, or receive valuable input for further theoretical or computational investigation, through user feedback. Also, one's own implementation often helps one to explore potential further improvements or entirely new solutions. Optimizing run time so that solutions can be applied in practical situations can be important, but it is not all there is. Providing a readable, comprehensible and numerically stable solution is equally important. Code optimized solely for run time typically becomes less transparent and harder to maintain, and is thus harder to adapt to future conceptual improvements and more prone to semantic errors. Also, a good solution typically provides not only a final answer but also useful by-products and intermediate results computed along the way, just like a good solution in a mathematical exam requires intermediate steps to be presented in order to obtain full marks (which also simplifies finding the culprit if the final answer is erroneous).

Acknowledgments:
The author would like to thank Kurt Hornik (Wirtschaftsuniversität Wien) and Takaaki Koike (University of Waterloo) for valuable feedback and inspiring discussions.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:
ARA   adaptive rearrangement algorithm
RA    rearrangement algorithm
VaR   value-at-risk