Deep Learning-Based Method for Computing Initial Margin †

Abstract: Following the guidelines of the Basel III agreement (2013), large financial institutions are forced to incorporate additional collateral, known as Initial Margin, in their transactions in OTC markets. Currently, the computation of such collateral is performed following the Standard Initial Margin Model (SIMM) methodology. Focusing on a portfolio consisting of an interest rate swap, we propose the use of Artificial Neural Networks (ANN) to approximate the Initial Margin value of the portfolio over its lifetime. The goal is to find an optimal configuration of structural hyperparameters, as well as to analyze the robustness of the network to variations in the model parameters and swap features.


Introduction
Following the 2008 financial crisis, the G20 promoted stricter regulation of the over-the-counter (OTC) derivatives market, especially to reduce counterparty credit risk. Among the mandated measures is the progressive implementation of an additional type of collateral, known as Initial Margin (IM), which acts as a "cushion" against pronounced changes in the value of the portfolio contracts.
For the IM calculation, it is standard market practice to follow the Standard Initial Margin Model (SIMM) methodology [1], promoted by the International Swaps and Derivatives Association (ISDA), which only requires the sensitivities of the portfolio as input data. When the goal is to know this amount over the whole life of the portfolio, the SIMM simulation becomes challenging due to the heavy computational burden coming from nested Monte Carlo simulations and the high-dimensional nature of the problem [2].
Among the existing alternatives to brute-force simulation, there are approaches based on Deep Learning algorithms, such as [2]. We aim to implement a supervised neural network for computing the IM over the life of the considered portfolio, with special attention to the design of the network structure. In this regard, we limit our work to portfolios consisting of a single product, a vanilla interest rate swap.

Materials and Methods
As a Deep Learning model for the task of computing the IM, we propose to use a self-normalizing neural network (SNN) [3], adding a single-unit output layer (since the IM is a scalar quantity) with a ReLU activation function and the He normal kernel initialization strategy [4]. We require all hidden layers to have the same number of units; the depth and width of the network are fixed by the experiments reported in the Results section.
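The architecture described above can be sketched as a plain NumPy forward pass. This is a minimal illustration, not the authors' implementation: it assumes SELU activations with LeCun normal weights in the hidden layers (the usual SNN recipe from [3]), uses untruncated normal draws, and the names `init_snn` and `forward` are hypothetical. The sizes in the usage note (13 features, 4 hidden layers of 48 units) are the values reported later in the paper.

```python
import numpy as np

# SELU constants from the self-normalizing networks paper [3].
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled exponential linear unit, applied elementwise."""
    negative_part = SELU_ALPHA * np.expm1(np.minimum(x, 0.0))
    return SELU_SCALE * np.where(x > 0.0, x, negative_part)


def init_snn(n_inputs, n_hidden_layers, n_units, rng):
    """Hypothetical helper: equal-width hidden layers plus one output unit.

    Hidden kernels: LeCun normal, std = sqrt(1 / fan_in)  (assumption).
    Output kernel:  He normal,    std = sqrt(2 / fan_in)  (as in the text).
    """
    sizes = [n_inputs] + [n_units] * n_hidden_layers
    hidden = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        kernel = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))
        hidden.append((kernel, np.zeros(fan_out)))
    out_kernel = rng.normal(0.0, np.sqrt(2.0 / n_units), size=(n_units, 1))
    return hidden, (out_kernel, np.zeros(1))


def forward(params, x):
    """Map features of shape (n_rows, n_inputs) to IM estimates (n_rows, 1)."""
    hidden, (out_kernel, out_bias) = params
    h = x
    for kernel, bias in hidden:
        h = selu(h @ kernel + bias)
    # The ReLU output keeps the predicted IM non-negative.
    return np.maximum(h @ out_kernel + out_bias, 0.0)
```

For example, `forward(init_snn(13, 4, 48, np.random.default_rng(0)), x)` with `x` of shape `(199, 13)` returns one non-negative value per time step of a scenario.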
Supervised training is carried out. In the usual methodology, the features associated with each tuple of scenario $\omega$ and time step $j$ are treated as a single input sample, $x_{\omega,j}$, with the corresponding target $y_{\omega,j}$. We propose instead to use the entire scenario as a single input sample, $x_{\omega}$, with the corresponding target vector $y_{\omega}$. We believe that this incorporates additional information into the training, allowing the network to learn intrinsic features of the scenario that can improve the approximation.
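The difference between the two sample layouts can be illustrated with array shapes. The sketch below uses placeholder random data and small hypothetical sizes (8 scenarios, 5 time steps); the 13 features per time step correspond to the input quantities listed in the dataset description.

```python
import numpy as np

# Hypothetical sizes, chosen small for illustration only.
n_scenarios, n_steps, n_features = 8, 5, 13

rng = np.random.default_rng(0)
features = rng.normal(size=(n_scenarios, n_steps, n_features))  # placeholder inputs
targets = np.abs(rng.normal(size=(n_scenarios, n_steps)))       # placeholder IM values

# Usual layout: one training sample per (scenario, time step) tuple.
x_pointwise = features.reshape(-1, n_features)  # shape (40, 13)
y_pointwise = targets.reshape(-1)               # shape (40,)

# Layout proposed in the text: one training sample per scenario.
x_scenario = features                           # shape (8, 5, 13)
y_scenario = targets                            # shape (8, 5)
```

In the pointwise layout, shuffling mixes time steps from different scenarios; in the per-scenario layout, each sample keeps the full path together.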
The dataset for the interest rate swap portfolio is produced synthetically, on the fly, by simulating several interest rate scenarios under the Hull-White dynamics [5]. The following quantities must be known throughout the life of the portfolio: the swap value; the 2 week, 1 month, 3 month and 6 month cash rates; and the swap par rates for the following vertices: 1 year, 2 years, 3 years, 5 years, 10 years, 15 years, 20 years, and 30 years (all of these are input features of the model); and the IM value (the model's target), for which it is necessary to know the swap sensitivities with respect to the rates mentioned above.
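A scenario generator of this kind can be sketched with an Euler discretization of the Hull-White short rate, $dr_t = (\theta(t) - a r_t)\,dt + \sigma\,dW_t$. The sketch below is only an illustration: it assumes a flat initial forward curve $f(0,t) = f_0$ (the paper uses a market curve), for which $\theta(t) = a f_0 + \frac{\sigma^2}{2a}(1 - e^{-2at})$, and the function name `simulate_hull_white` is hypothetical.

```python
import numpy as np


def simulate_hull_white(f0, a, sigma, horizon, n_steps, n_scenarios, seed=0):
    """Euler scheme for Hull-White short-rate paths.

    Assumes a flat initial forward curve f(0, t) = f0, so that the drift
    function is theta(t) = a*f0 + sigma^2/(2a) * (1 - exp(-2*a*t)).
    Returns the time grid (n_steps + 1,) and the paths
    (n_scenarios, n_steps + 1).
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    t = np.linspace(0.0, horizon, n_steps + 1)
    theta = a * f0 + sigma**2 / (2.0 * a) * (1.0 - np.exp(-2.0 * a * t))

    r = np.empty((n_scenarios, n_steps + 1))
    r[:, 0] = f0
    for j in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_scenarios)  # Brownian increments
        r[:, j + 1] = r[:, j] + (theta[j] - a * r[:, j]) * dt + sigma * dw
    return t, r


# Example with the parameter values quoted in the experiments (a = 0.1,
# sigma = 0.5%) and a hypothetical flat forward rate of 1%.
t, r = simulate_hull_white(f0=0.01, a=0.1, sigma=0.005,
                           horizon=10.0, n_steps=199, n_scenarios=100)
```

From each simulated short-rate path, the cash rates, swap par rates and swap value would then be derived from the model's bond-price formula; that step is omitted here.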
The methodology recommended by ISDA, termed PV01, is chosen for the production of swap sensitivities. It consists of calculating the impact on the swap value of small changes in the swap rates used to construct the zero curve.
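The bump-and-revalue idea behind PV01 can be sketched as follows. This is a simplified illustration, not the procedure used in the paper: the toy pricer discounts fixed cash flows directly with the bumped rates, whereas the actual method bumps the par swap rates and rebuilds the zero curve before repricing. The names `pv01`, `toy_price`, `TENORS` and `CASH_FLOWS` are hypothetical.

```python
import math


def pv01(price_fn, rates, bump=1e-4):
    """Change in price for a 1 basis point bump of each input rate in turn."""
    base = price_fn(rates)
    sensitivities = []
    for k in range(len(rates)):
        bumped = list(rates)
        bumped[k] += bump  # bump only the k-th rate
        sensitivities.append(price_fn(bumped) - base)
    return sensitivities


# Toy pricer (assumption): fixed cash flows discounted with continuously
# compounded zero rates, one rate per cash-flow date.
TENORS = [1.0, 2.0, 5.0]
CASH_FLOWS = [10.0, 10.0, 1010.0]


def toy_price(rates):
    return sum(c * math.exp(-r * t) for c, r, t in zip(CASH_FLOWS, rates, TENORS))


sens = pv01(toy_price, [0.010, 0.012, 0.015])
```

Each entry of `sens` is negative here, since raising a discount rate lowers the present value of a received cash flow.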
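The SIMM methodology [6] is followed for the production of the IM. Under the assumptions of working in a single currency and exclusively with a portfolio consisting of one swap, the SIMM reduces to the following formula:

$$\mathrm{IM} = \sqrt{\sum_{k} WS_k^2 + \sum_{k} \sum_{l \neq k} \rho_{k,l}\, WS_k\, WS_l}, \qquad WS_k = RW_k\, s_k\, CR, \qquad CR = \max\left(1, \sqrt{\frac{\left|\sum_{k} s_k\right|}{CT}}\right),$$

where $s_k$ and $RW_k$ are the net sensitivity and the risk weight for the rate tenor $k$, $WS_k$ is the corresponding weighted sensitivity, $CR$ is the concentration risk factor, $\rho_{k,l}$ is the tenor correlation, and $CT$ is the concentration threshold for the given currency. $RW_k$, $\rho_{k,l}$ and $CT$ are parameters given by ISDA.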
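As a worked illustration, the single-currency computation can be written in a few lines of NumPy. The sketch assumes the standard ISDA SIMM interest-rate delta margin structure (weighted sensitivities scaled by a concentration factor, then aggregated with the tenor correlations); the function name `simm_ir_delta` is hypothetical, and the numerical inputs in the example are placeholders, not ISDA's calibrated parameters.

```python
import numpy as np


def simm_ir_delta(s, rw, rho, ct):
    """Single-currency interest-rate delta margin (sketch).

    s   : net sensitivities per tenor, shape (n,)
    rw  : risk weights per tenor, shape (n,)
    rho : tenor correlation matrix with unit diagonal, shape (n, n)
    ct  : concentration threshold for the currency (scalar)
    """
    s = np.asarray(s, dtype=float)
    rw = np.asarray(rw, dtype=float)
    rho = np.asarray(rho, dtype=float)

    # Concentration risk factor: CR = max(1, sqrt(|sum_k s_k| / CT)).
    cr = max(1.0, float(np.sqrt(abs(s.sum()) / ct)))
    # Weighted sensitivities: WS_k = RW_k * s_k * CR.
    ws = rw * s * cr
    # With a unit diagonal, ws @ rho @ ws equals
    # sum_k WS_k^2 + sum_k sum_{l != k} rho_kl * WS_k * WS_l.
    return float(np.sqrt(max(ws @ rho @ ws, 0.0)))


# Placeholder inputs (NOT ISDA values): two tenors, uncorrelated.
im = simm_ir_delta(s=[3.0, 4.0], rw=[1.0, 1.0], rho=np.eye(2), ct=1e9)
```

With uncorrelated tenors and no concentration adjustment, the result is the Euclidean norm of the weighted sensitivities; with perfectly correlated tenors it becomes their absolute sum.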

Results
First, we study the optimal choice of the structural hyperparameters of our proposed neural network (depth and width). Then, we present some experiments on training robustness as a function of the Hull-White simulation parameters and the swap features. Only a summary of the results is presented here; the extended version can be found in [7].

Numerical Experiments to Set Structural Hyperparameters
For the tests in this subsection, a 1-year fixed, 6-month floating at-the-money swap with 10-year maturity is considered. We set the theoretical values a = 0.1 and σ = 0.5% for the Hull-White parameters, and we choose the market forward rate, f(0, t), obtained from all Eurozone government bonds on 28 January 2021 (Source: European Central Bank (ECB)). A dataset with 5000 scenarios and 199 time steps is produced. In all tests, we use 4000 scenarios for training and 1000 for validation.
For the neural network, we use the stochastic gradient descent optimizer with the following training hyperparameters: a batch size of 256, a learning rate of 0.001, and 1000 epochs.
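The split and the resulting number of optimizer updates follow from simple arithmetic, sketched below. The shuffling seed and the assumption that one training sample is one scenario with the final partial batch kept are illustrative choices, not details stated in the paper.

```python
import math
import random

n_scenarios, n_train = 5000, 4000
batch_size, n_epochs = 256, 1000

# Shuffle scenario indices once, then split 4000 / 1000 (seed is arbitrary).
indices = list(range(n_scenarios))
random.Random(0).shuffle(indices)
train_idx, val_idx = indices[:n_train], indices[n_train:]

# Assumption: the last, partially filled batch of each epoch is kept.
steps_per_epoch = math.ceil(len(train_idx) / batch_size)
total_updates = steps_per_epoch * n_epochs

print(len(train_idx), len(val_idx), steps_per_epoch, total_updates)
```

Under these assumptions each epoch consists of 16 gradient steps (15 full batches and one of 160 scenarios), for 16,000 updates over the whole training run.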

The Depth Test
We set the total number of units to 512, distributed by means of integer division over the following numbers of hidden layers: 1, 2, 3, 4, 6, 8, 10, 12, and 16. Because of the stochasticity of the optimization algorithm, we present the results from 10 training trials.
We can observe in Figure 1 that a moderate number of hidden layers (between 3 and 6) tends to offer better performance than the model with two hidden layers, which is theoretically the one with the highest capacity. We set the number of hidden layers in our network to 4, since this configuration presents the best performance over the trials considered, with a shorter execution time than its direct competitors.
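The unit distribution and the capacity remark can be checked with a short parameter count. The sketch below assumes fully connected layers with biases, 13 input features per time step, and a single output unit; under a different input layout (for example, a flattened scenario) the counts would change. The helper names are hypothetical.

```python
def units_per_layer(total_units, n_layers):
    """Integer division of the unit budget over the hidden layers."""
    return total_units // n_layers


def n_parameters(n_inputs, n_layers, width, n_outputs=1):
    """Trainable weights and biases of a dense network with equal-width layers."""
    first = (n_inputs + 1) * width              # input -> first hidden layer
    inner = (n_layers - 1) * (width + 1) * width  # hidden -> hidden
    last = (width + 1) * n_outputs              # last hidden -> output
    return first + inner + last


DEPTHS = [1, 2, 3, 4, 6, 8, 10, 12, 16]
for depth in DEPTHS:
    width = units_per_layer(512, depth)
    print(depth, width, n_parameters(13, depth, width))
```

Under these assumptions the widths are 512, 256, 170, 128, 85, 64, 51, 42 and 32, and the two-layer network has the largest parameter count (69,633, against 7,681 for one layer and 51,457 for four layers).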

The Width Test
We set the number of hidden layers to 4 and consider the following numbers of units per hidden layer: 1, 2, 4, 8, 16, 24, 32, 48, 64, 96, and 128. All other specifications remain unchanged.
The test shows that, as the number of neurons per layer increases, both the network performance and the required execution time increase. To balance network efficiency and training time, we select 48 units per hidden layer.

Numerical Experiments on Network Robustness
In this subsection, we use the Adam optimizer with a learning rate of $10^{-4}$. On the one hand, we test how the model training responds to market situations different from the reference configuration. In general, similar results are obtained, although in situations of stressed volatility the so-called zero-inflated data problem appears. On the other hand, we analyze the influence of the swap features. Roughly speaking, a trained model is needed for each maturity considered, but it is feasible to use a model trained for a given payment frequency on swaps with different payment frequencies.

Conclusions and Future Research
We have found that the proposed Deep Learning model provides good approximations of IM trajectories for the simplified portfolio considered. It performs very well on our main study dataset, and this performance is maintained in higher-volatility environments. We also conclude that it is feasible to use the same model as an IM computation engine for swaps with different payment structures. However, this is not possible for different maturities: a separate model is needed for each maturity.
Future research related to this work should focus on the scalability of the model to other interest rate products; on building a model for the IM computation of other ISDA product classes, such as equity or commodity; and on developing a similar neural network-based methodology to compute the IM for a real portfolio, consisting of many contracts from different product classes and driven by multiple risk factors.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Figure 1. Results obtained for the depth test. (a) Convergence of the training-set MSE with respect to the number of hidden layers; (b) execution time according to the number of hidden layers.