Market Graph Clustering Via QUBO and Digital Annealing

Our goal is to find representative nodes of a market graph that best replicate the returns of a broader market graph (index), a common task in the financial industry. We model our reference index as a market graph and express the index-tracking problem in a quadratic K-medoid form. To do so, we combine three separate areas of the literature: market graph models, K-medoid clustering and quadratic binary optimization modeling. We take advantage of purpose-built hardware, the Fujitsu Digital Annealer, to circumvent the NP-hard nature of the problem and solve our formulation efficiently. Our initial results show that we accurately replicate the returns of a broad market index using only a small subset of its constituent assets.


Introduction
Our work is an empirical implementation of the K-medoid clustering technique expressed as a quadratic unconstrained binary optimization (QUBO) model and applied to market graphs. It is inspired by the seminal work of Boginski et al. [6,8], Cornuéjols et al. [13] and Bauckhage et al. [4]. We combine these complementary but disjoint pieces of work to formulate the index-tracking problem as a QUBO K-medoid clustering problem on a broader market graph.
Graph clustering is an unsupervised learning task that consists of assigning common labels to vertices deemed similar. It has found applications in many areas, including chemistry, biology, social networks and finance. However, while there are many competing techniques, the graph clustering problem is NP-hard, which limits its scope of application.
QUBO formulations of many mathematical problems have recently attracted growing interest. This is due, in no small part, to recent advances in computer hardware and to the availability of purpose-built hardware for solving them, which circumvents the NP-hard nature of the problem. Examples of this novel hardware are Fujitsu's Digital Annealer (DA) and D-Wave's Quantum Annealer.
Graphs have recently been introduced as models of the stock market. In addition, clustering of stock market data is a longstanding focus of interest for both practitioners and academics. It has been used for various purposes, such as risk management and portfolio diversification. Index-tracking is another longstanding interest in finance. It consists of building tracking portfolios whose returns follow those of a broader index, but which hold only a subset of its stocks. Some authors in the field have used clustering for the purpose of index-tracking. Their methods identify exemplars of subsets of an index and construct tracking portfolios consisting of only those exemplars.
Our initial results are very encouraging. Our tests show we accurately replicate the returns of a broad market index, using only a small subset of its constituent assets. Moreover, our QUBO formulation allows us to take advantage of recent hardware advances, to overcome the NP-hard nature of the problem.

Previous Work
Our work lies at the intersection of graph models of the stock markets, clustering, combinatorial optimization (QUBO) and index tracking. In this section, we briefly review these four areas of research. Our goal is not to provide the reader with a detailed review of the state of the art in these very broad fields, but rather to focus specifically on their relevance to the work in this article, in order to put it in context.
The use of graphs as models of the stock market was first introduced in the literature by the very extensive work of Boginski et al. [6,7,8,9,10]. While different methods have been suggested for determining edge weights, the idea is to model stocks as vertices and to assign edge weights based on the correlations of their returns.
Other authors have also followed up on and expanded this work by studying graph dynamics over time [3,20] and examined methods for building the graph [5,21,19]. In fact, to this day, the topic of graphs as a model for equity markets remains a subject of discussion in the literature [1,23].
Graph clustering is the process of assigning common labels to vertices deemed similar. It has a long history in the literature. A thorough review of the graph clustering literature is beyond the scope of this article. For a very comprehensive view of the field, we refer the reader to the foundational work of Schaeffer [26], Fortunato [14] and the recent contribution by Fortunato and Hric [15].
The link between clustering and portfolio construction is of particular relevance to the work in our article [13,12,11,18,27,25]. Although not focused specifically on graph clustering, Cornuéjols et al. present a K-medoid formulation for index-tracking [13]. These authors use the standard K-medoid technique [17] to find 'K' representative stocks that compose a portfolio that replicates a broader index.

Methods
We begin with a market graph consisting of n = 453 stocks that have been constituents of the Standard & Poor's 500 index (SP500) for every year since 2014. We apply a K-medoid index-tracking technique to find k = 10 exemplars that will form our tracking portfolio. Finally, to take advantage of fast purpose-built computer hardware, the Fujitsu DA, we express the K-medoid problem as a QUBO.

Market Graph
We represent our universe of n = 453 stocks as a complete weighted graph, where edge weights represent the association between stocks i and j.
These weights are defined using the Pearson correlation coefficient of the daily log returns. Edge weights are recomputed annually, using the previous year's daily returns. The resulting model of our investment universe is a complete weighted graph with no self-loops (since $\rho_{ii} = 1$).
To be consistent with the QUBO formulation of Bauckhage et al. [4], we convert our adjacency (distance) matrix into a more robust matrix $\Delta = [\delta_{ij}]$, with elements $\delta_{ij} = 1 - \exp(-\frac{1}{2} d_{ij})$. We note that this formulation requires that all pairwise distances $d_{ij}$ be known, which is why we use a complete graph representation.
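The construction of $\Delta$ described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not give the mapping from correlation to distance, so the sketch assumes the common correlation distance $d_{ij} = \sqrt{2(1 - \rho_{ij})}$. The function name `market_graph_delta` and the synthetic price data are hypothetical.

```python
import numpy as np

def market_graph_delta(prices):
    """Build the matrix Delta from a (T+1, n) array of daily prices."""
    log_returns = np.diff(np.log(prices), axis=0)     # (T, n) daily log returns
    rho = np.corrcoef(log_returns, rowvar=False)      # (n, n) Pearson correlations
    rho = np.clip(rho, -1.0, 1.0)                     # guard against rounding
    # ASSUMPTION: correlation distance; the article does not state this mapping.
    d = np.sqrt(2.0 * (1.0 - rho))
    delta = 1.0 - np.exp(-0.5 * d)                    # transformation from the text
    np.fill_diagonal(delta, 0.0)                      # no self-loops
    return delta

# Synthetic example: 5 hypothetical stocks, roughly one year of trading days.
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=(253, 5)), axis=0))
delta = market_graph_delta(prices)
print(delta.round(3))
```

In the setting of the article, `prices` would hold the previous year's daily prices for the n = 453 constituents, and the matrix would be rebuilt once per year.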

QUBO Model
Putting it all together, we formulate our K-medoid problem of finding a portfolio of k = 10 exemplars to replicate the returns of the n = 453 constituents of the SP500 as the QUBO model (1), where $\mathbf{1}$ denotes a vector of ones of appropriate dimension. Our model (1) consists of n binary decision variables, with $z_i = 1$ if node i is an exemplar node and $z_i = 0$ otherwise. We follow the example of Bauckhage et al. [4] and set the parameters $\alpha = 1/k$, $\beta = 1/n$ and $\gamma = 2$. (For more details on this QUBO formulation, we refer the reader to the original work of Bauckhage et al. [4].)
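To make the structure of such a model concrete, the following Python sketch builds a QUBO matrix and solves a toy instance by exhaustive search. It assumes the objective has the general shape $z^\top(\gamma \mathbf{1}\mathbf{1}^\top - \alpha \Delta) z + z^\top(\beta \Delta \mathbf{1} - 2\gamma k \mathbf{1})$, that is, a term rewarding exemplars that are far apart, a term rewarding central exemplars, and a penalty enforcing exactly k exemplars. The exact scaling of each term is an assumption and should be checked against [4]. The function names and the six-point toy data are hypothetical.

```python
import itertools
import numpy as np

def kmedoid_qubo(delta, k, gamma=2.0):
    """Return Q such that the energy of a binary vector z is z @ Q @ z."""
    n = delta.shape[0]
    alpha, beta = 1.0 / k, 1.0 / n                     # parameter choices from the text
    ones = np.ones(n)
    # ASSUMED objective shape, see the lead-in paragraph.
    quadratic = gamma * np.outer(ones, ones) - alpha * delta
    linear = beta * (delta @ ones) - 2.0 * gamma * k * ones
    # z_i is binary, so z_i**2 == z_i and linear terms can sit on the diagonal.
    return quadratic + np.diag(linear)

def brute_force_minimum(Q):
    """Enumerate all 2**n binary vectors; only viable for very small n."""
    best_z, best_e = None, np.inf
    for bits in itertools.product((0, 1), repeat=Q.shape[0]):
        z = np.array(bits, dtype=float)
        e = z @ Q @ z
        if e < best_e:
            best_z, best_e = z.astype(int), e
    return best_z, best_e

# Toy data: six points on a line forming two tight groups of three.
positions = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
d = np.abs(positions[:, None] - positions[None, :])
delta = 1.0 - np.exp(-0.5 * d)

Q = kmedoid_qubo(delta, k=2)
best_z, best_e = brute_force_minimum(Q)
print(best_z, best_e)
```

Exhaustive search is used here only because the toy instance has 64 candidate solutions. With n = 453 the search space has $2^{453}$ elements, which is what motivates a dedicated solver.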

The Fujitsu DA: Purpose-Built Hardware
To circumvent the NP-hard nature of the clustering problem, we use a purpose-built architecture, the Fujitsu DA. The DA provides fast computation and is designed specifically for combinatorial optimization problems expressed in QUBO form [2,24].
All our computations for the minimization of the model described in Section 3.2 were done using this architecture. More specifically, these computations were done using hardware built exclusively for the University of Toronto's research environment.
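The DA is dedicated hardware, and nothing in this sketch reflects its interface or its internal algorithm. For readers without access to it, a plain software simulated annealing loop, a different and much slower technique, can serve as a rough stand-in for minimizing small QUBO instances. The function name `anneal_qubo`, the temperature schedule and the toy matrix below are all hypothetical choices.

```python
import numpy as np

def anneal_qubo(Q, n_sweeps=500, t_start=5.0, t_end=0.01, seed=0):
    """Minimize z @ Q @ z over binary z using single-bit-flip simulated annealing."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    S = Q + Q.T                        # symmetrized couplings
    z = np.zeros(n)                    # start from the empty selection
    energy = 0.0
    best_z, best_e = z.copy(), energy
    for t in np.geomspace(t_start, t_end, n_sweeps):
        for i in rng.permutation(n):
            flip = 1.0 - 2.0 * z[i]    # +1 turns bit i on, -1 turns it off
            # Energy change of the flip, using z_i**2 == z_i for binary z.
            delta_e = flip * (Q[i, i] + S[i] @ z - S[i, i] * z[i])
            if delta_e <= 0.0 or rng.random() < np.exp(-delta_e / t):
                z[i] += flip
                energy += delta_e
                if energy < best_e:
                    best_z, best_e = z.copy(), energy
    return best_z.astype(int), float(best_z @ Q @ best_z)

# Toy QUBO containing only a cardinality penalty gamma * (sum(z) - k)**2,
# up to a constant: any selection of exactly k bits is optimal.
n, k, gamma = 8, 3, 2.0
Q_toy = gamma * np.ones((n, n)) - 2.0 * gamma * k * np.eye(n)
z_best, e_best = anneal_qubo(Q_toy)
print(z_best, e_best)
```

A heuristic of this kind returns the best solution it has visited and offers no guarantee of optimality.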

Numerical Experiments
We use our K-medoid technique to construct four index-tracking portfolios, one for each year in our sample. For each year, we use the previous year's returns to compute stock-to-stock distances and to build a new market graph and the corresponding matrix $\Delta$. We then optimize the QUBO model, using the DA, to obtain a tracking portfolio.
To assess tracking accuracy, we use tracking-error and "beta to the index", both measured with respect to the full SP500 in each year, as per industry practice. For each year, we compute the differences between the daily log returns of the SP500 benchmark and those of the tracking portfolio. We calculate the standard deviation of these differences to obtain the annual tracking-error. We also regress the tracking portfolio's returns on the index returns to obtain the "beta" of the tracking portfolio, which is the slope of the regression line.

Performance Measure: Tracking-Error
Tracking-error is the standard deviation of the differences between the index return and the tracking-portfolio return observed at each time point (daily in this case).
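As a minimal sketch of this measure, the Python function below takes two aligned series of daily log returns and returns the standard deviation of their differences. The use of the sample standard deviation (`ddof=1`) and the absence of any annualization factor are assumptions, since the text does not specify them. The function name and the four-day example data are hypothetical.

```python
import numpy as np

def tracking_error(index_returns, portfolio_returns):
    """Standard deviation of the daily return differences."""
    diff = np.asarray(index_returns, dtype=float) - np.asarray(portfolio_returns, dtype=float)
    # ASSUMPTION: sample standard deviation, not annualized.
    return diff.std(ddof=1)

# Hypothetical daily log returns over four days.
index_r = [0.010, -0.020, 0.005, 0.000]
portfolio_r = [0.012, -0.018, 0.004, -0.001]
te = tracking_error(index_r, portfolio_r)
print(te)
```

A portfolio whose daily returns differ from the index by a constant amount has a tracking-error of zero under this definition, because only the variability of the differences is measured.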

Performance Measure: "beta"
The "beta" of the portfolio is the slope coefficient of the regression of its returns on market returns. A portfolio that tracks the index perfectly has a "beta" of one. The model we fit to obtain the "beta" is a simple linear regression of the tracking portfolio's daily returns on the daily returns of the index.
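A regression of this kind can be sketched with ordinary least squares in Python. The inclusion of an intercept, the function name `portfolio_beta` and the simulated returns are assumptions made for illustration. The standard error is returned so that a t-statistic can be formed against whichever null hypothesis is of interest.

```python
import numpy as np

def portfolio_beta(index_returns, portfolio_returns):
    """OLS fit of portfolio returns on index returns: slope, intercept, slope std. error."""
    x = np.asarray(index_returns, dtype=float)
    y = np.asarray(portfolio_returns, dtype=float)
    x_c = x - x.mean()
    beta = (x_c @ (y - y.mean())) / (x_c @ x_c)       # slope of the regression line
    intercept = y.mean() - beta * x.mean()
    resid = y - (intercept + beta * x)
    # Standard error of the slope with n - 2 degrees of freedom.
    stderr = np.sqrt((resid @ resid) / (len(x) - 2) / (x_c @ x_c))
    return beta, intercept, stderr

# Simulated example: a hypothetical portfolio that follows the index with noise.
rng = np.random.default_rng(1)
index_r = rng.normal(0.0, 0.01, size=250)
portfolio_r = 0.95 * index_r + rng.normal(0.0, 0.002, size=250)
beta, intercept, stderr = portfolio_beta(index_r, portfolio_r)
print(beta, intercept, stderr)
```

For example, `beta / stderr` tests the null hypothesis of a zero slope, while `(beta - 1.0) / stderr` tests whether the slope differs from that of a perfect tracker.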

Empirical Results
Tracking error, the "beta" and associated t-statistic are reported for each year, in Table 1. Daily log-returns for the replicating portfolio (solid blue line) and SP500 (dotted red line) are shown in Figure 1.

Conclusion and Future Work
Our results show that a QUBO formulation of the K-medoid problem can be successfully used to replicate a broad market index using just a few assets. Using a subset of only ten assets, we are able to track the SP500 daily returns with a tracking error of less than 1%. On the empirical side, future work will focus on alternate techniques for building the market graph and on determining the optimal cardinality of the tracking subset. From a mathematical and computational point of view, we also intend to investigate alternate problem formulations and larger-scale optimization.