Option Portfolio Selection with Generalized Entropic Portfolio Optimization

In this third and final paper of our series on the topic of portfolio optimization, we introduce a further generalized portfolio selection method called generalized entropic portfolio optimization (GEPO). GEPO extends discrete entropic portfolio optimization (DEPO) to include intervals of continuous returns, with direct application to a wide range of option strategies. This lays the groundwork for an adaptable optimization framework that can accommodate a wealth of option portfolios, including popular strategies such as covered calls, married puts, credit spreads, straddles, strangles, butterfly spreads, and even iron condors. These option strategies exhibit mixed returns: a combination of discrete and continuous returns with performance best measured by portfolio growth rate, making entropic portfolio optimization an ideal method for option portfolio selection. GEPO provides the mathematical tools to select efficient option portfolios based on their growth rate and relative entropy. We provide an example of GEPO applied to real market option portfolio selection and demonstrate how GEPO outperforms traditional Kelly criterion strategies.


Introduction
Our series on the topic of portfolio optimization has introduced novel entropy-based optimization problems that facilitate the selection of a variety of efficient portfolios. The return-entropy portfolio optimization (REPO) [1] allocates capital to an equity portfolio based on the expected return and Shannon entropy of portfolio returns. REPO was adapted to form the discrete entropic portfolio optimization (DEPO) [2] for application to portfolios composed of assets with discretely distributed returns, like exotic instruments such as binary or digital options, or fixed-return options (FROs). In this paper we further extend DEPO to accommodate mixed returns with generalized entropic portfolio optimization (GEPO). GEPO can handle a combination of discrete and continuous returns, such as those exhibited by option strategies. Option strategies have expected return payoff functions that contain intervals of discrete returns and intervals of continuous returns.
The novel optimization methods introduced throughout our series on entropic portfolio optimization can be summarized as follows: (1) Return-entropy portfolio optimization (REPO) [1]: for the selection of portfolios comprising continuous return assets, such as equities, indices, mutual funds, or other non-derivative financial products, (2) Discrete entropic portfolio optimization (DEPO) [2]: for the selection of portfolios comprising strictly discrete return assets, such as binary options, digital options, fixed-return options (FRO), sports bets, or other betting wagers, and (3) Generalized entropic portfolio optimization (GEPO): for the selection of portfolios comprising assets with mixed returns (returns that can be both continuous and discrete), like option strategies such as covered calls, married puts, credit spreads, straddles, long strangles, butterfly spreads, iron condors, and more. GEPO can handle portfolios that contain various types of options that are much more general than binary options. As seen in Section 2, these option strategies have a more generalized payoff function that includes both continuous parts and discrete parts, and thus exist outside the scope of REPO or DEPO alone. GEPO can be thought of as combining the capabilities of both REPO and DEPO.
The applications of these new optimization methods can be illustrated by the following investor return payoff function diagrams: Figure 1 represents the payoff function for an investor with a regular long position in a stock. For every dollar increase or decrease in the underlying stock price, a proportional increase or decrease is generated in investor return. This payoff function is continuous, since the underlying stock price and thus investor returns fall on the (non-negative) real line.
Figure 2 illustrates the payoff function for a binary option. This type of option generates a fixed positive return W if the underlying asset price lands above a certain threshold, in this case a strike price of $60, and generates a fixed negative return L otherwise. This payoff function is strictly discrete, since the only possible return states are +W and −L.
Figure 3, the focal topic of this paper, is an example of a payoff function for a more complex option strategy called a bull put credit spread. This strategy involves buying and selling put options (options to sell assets at an agreed-upon price and date) in such a manner as to limit one's potential risk. The resulting payoff function covers a "spread" outside of which the option behaves just like a binary option: it generates a fixed positive return W if the underlying asset price lands above the upper bound, and a fixed negative return L if it lands below the lower bound. If the price lands between the boundaries of the spread, the investor receives a partial return, proportional to the distance between the midpoint and the underlying price, as seen in the payoff function. These returns can be both discrete and continuous: +W, −L, and anything in between. These kinds of option strategies are the main motivation behind this paper.
Traditional portfolio optimization methods, such as Markowitz mean-variance portfolio optimization (MVPO) [3], are not suited to handle the discretely distributed nature of option returns, as these distributions cannot be described by mean and variance alone. So, continuing with the use of relative entropy as the proxy for portfolio risk (as used in DEPO), GEPO is an optimization method whose objective function simultaneously maximizes the portfolio growth rate and minimizes the relative entropy of the portfolio with respect to the uniform distribution. In the case of option portfolios, GEPO selects a collection of options from a set of possible choices, for example a portfolio of credit spreads, in order to maximize the expected growth rate for a given level of relative entropy risk.
By using the combinatorial generating functions to empirically calculate entropy, as done in REPO and DEPO, we are able to calculate the relative entropy risk of an option portfolio in GEPO by extending the scope to include both discrete and continuous returns. GEPO calculates the relative entropy over multiple return probability states, including return states with continuous returns. In terms of the expected return, the Kelly criterion provides valuable insights into maximizing the portfolio growth rate of an option portfolio. To that effect, we extend the Kelly criterion growth rate to include instances of both discrete and continuous returns. GEPO empowers investors to quantitatively select a portfolio of options based on their risk-reward tolerance.
The remainder of this paper is organized as follows. The following Section 1.1 gives a literature review of research on the topic of option portfolio optimization. Section 2 explains the technical details behind maximizing the portfolio growth rate by extending the Kelly criterion to generalized option strategies, with specific examples for several popular option strategies. Section 3 provides a brief review of information theory, Shannon entropy and Kullback-Leibler divergence, the foundation of entropic portfolio optimization. As the main feature of this paper, the GEPO problem is presented in Section 4. Finally, Section 5 demonstrates an example of GEPO selecting a portfolio of equity credit spreads chosen from the S&P100 composite index, and shows how GEPO outperformed the Kelly criterion and alternative Kelly criterion methods.

Literature Review
Option portfolio optimization is a relatively new topic of research, with the earliest work we found dating only from this millennium. For a constant relative risk aversion investor, Liu (2003) [4] modeled the stochastic volatility to optimize a portfolio comprising one equity, a put option, and cash, deriving an analytic solution for the optimal allocation. Unfortunately, this method requires a parametric model to be specified, and the risk is mostly concentrated on just one option. Jones (2006) [5] exploited apparent mispricing of put options to derive an optimal portfolio of options using a general nonlinear latent factor model, but this model is overburdened by numerous required parameters. Eraker (2007) [6] modeled stochastic volatility parametrically, and then used the traditional mean-variance framework to optimize allocation between straddles, puts, and calls, yielding a closed-form solution for portfolio weights. Haugh (2007) [7] used duality and approximate dynamic programming (ADP) methods to facilitate high-dimensional American option pricing and portfolio optimization. Zymler (2011) [8] used robust portfolio optimization, aimed at maximizing the worst-case portfolio return, to design portfolios that include European-style options. This model trades off weak and strong guarantees on the worst-case portfolio return.
A moderate amount of research into option portfolio optimization has been conducted with respect to the use of value-at-risk (VaR) or conditional value-at-risk (CVaR) as a risk measure. See Alexander (2003, 2006) [9,10], Zymler (2013) [11], and Maasar (2016) [12]. As illustrated by Alexander, the VaR and CVaR minimization problems for derivative portfolios are typically ill-posed in the sense that there are many portfolios that have similar CVaR/VaR values to that of the optimal portfolio, and a slight perturbation of the data can lead to significantly different optimal solutions. Zymler notes that optimization problems involving VaR are often computationally intractable and require complete information about the return distribution of the portfolio assets, which is rarely available in practice. For these reasons, we conclude that VaR and CVaR methods are not ideal approaches to option strategy portfolio optimization.
Driessen (2013) [13] used the generalized method of moments to maximize expected returns for a portfolio comprising a stock, an option strategy (puts and straddles), and cash. Constantinides (2013) [14] constructed leverage-adjusted portfolios of either calls or puts using the options' elasticity, also known as omega or lambda. Fadugba (2014) [15] used a binomial model as a performance measure to price American and European options, and exploited mispricings to derive optimal portfolios. In a continuous time regime-switching market, Fu (2014) [16] introduced a functional operator to maximize the expected utility of the terminal wealth of a portfolio that contains an option, an underlying stock and a risk-free bond. Fatyanova (2017) [17] developed a constrained optimization problem for constructing an option portfolio that maximizes a certain payoff function. Faias (2017) [18] also noted that traditional portfolio optimization methods like mean-variance optimization are not suitable for option portfolios due to non-normality and the difficulty of estimating the distribution of returns, then introduced a short-term view objective function used to optimize portfolios of European options mainly by exploiting mispricing between options. Zhao (2018) [19] used first- and second-order moments to model option returns and extended the Markowitz mean-variance framework to include option selection with much lower computational time than off-the-shelf solvers. Zeng (2018) [20] introduced a progressive hedging algorithm using reinforcement learning (Q-learning) for option portfolio optimization that was the first of its kind to consider the exercise timings of an American option.
Almost all these alternative methods require some kind of parametric model assumption and are limited in the number of options composing the portfolio. GEPO is non-parametric and delivers well-diversified portfolios with no limit on the number of options. Additionally, all these methods misguidedly aim to maximize the portfolio expected returns, and besides the use of VaR and CVaR there has not yet been any suggestion for measuring or managing the risk of these option portfolios. GEPO not only minimizes the relative entropy risk of the portfolio, but also maximizes the proper metric for measuring the future performance of option returns: exponential growth rate.

The Kelly Criterion for Multiple Wagers
As previously presented in Mercurio et al. (2020) [2], we use the extension of the Kelly criterion (Kelly, 1956) [21] to $n$ wagers. For events $Y_1, \ldots, Y_n$ with success probabilities $p_1, \ldots, p_n$, let $w$ represent the total percentage of bankroll to be wagered. For a portfolio that allocates the wager equally among the events, the growth rate coefficient $G$ would be
$$G(w) = \frac{1}{n}\sum_{i=1}^{n}\left[p_i \log(1+w) + (1-p_i)\log(1-w)\right].$$
Therefore, by denoting $p = \frac{1}{n}\sum_{i=1}^{n} p_i$ (this can be thought of as a blended probability of success), the Kelly criterion can be used here to identify the optimal size wager for maximum growth rate as $w^* = 2p - 1$.
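The blended-probability rule can be checked numerically. The following is a minimal sketch, assuming even-money wagers and natural logarithms; the function names and the three sample probabilities are illustrative and not taken from the paper.

```python
import math

def blended_kelly(probs):
    """Return (p_bar, w_star) for equally weighted even-money wagers."""
    p_bar = sum(probs) / len(probs)
    # w* = 2p - 1; floor at zero because no wager is placed without an edge
    w_star = max(0.0, 2.0 * p_bar - 1.0)
    return p_bar, w_star

def growth_rate(p_bar, w):
    """G(w) = p log(1 + w) + (1 - p) log(1 - w), natural log."""
    return p_bar * math.log(1.0 + w) + (1.0 - p_bar) * math.log(1.0 - w)

probs = [0.55, 0.60, 0.52]  # hypothetical success probabilities
p_bar, w_star = blended_kelly(probs)
```

Evaluating `growth_rate` slightly above and below `w_star` gives smaller values, which is the numerical counterpart of setting the derivative of $G$ to zero.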

Extension of the Kelly Criterion to Option Strategies
This section is a brief review of several of the most popular option strategies, with a summary of the details and motivation behind each. Additionally, we extend the Kelly criterion to find the optimal growth rate of each option strategy. Further details and strategies can be found in the TMX Group Montreal Exchange guides and strategies documentation (2020) [22].

Covered Call
A covered call option strategy involves selling a call that is covered by an equivalent long stock position, providing a hedge on the stock. In exchange for upside potential, an investor can earn income on the premium. The motivation behind a covered call is to earn premium income. The maximum loss is limited to the initial stock purchase price slightly reduced by the premium income from sale of the call option. The maximum gain is limited, equal to the strike price less the initial stock purchase price, plus the premium income received. A sample expected payoff function for a covered call strategy is illustrated below in Figure 4.
For the price of underlying security $S$ and option return $R$, the growth rate for a covered call strategy becomes
$$G(w) = p\log(1+w) + \varrho\log(1+\alpha w),$$
where $\varrho$ represents the probability $P(S \in I_1) = P(R < 1)$, $p$ represents the probability $P(S \in I_2) = P(R = 1)$ with $p + \varrho = 1$, and $\alpha$ is the expected value of returns conditional on the underlying price falling in interval $I_1$, $\alpha = E(R \mid S \in I_1)$. Then $G$ is maximized by differentiating with respect to $w$ and setting equal to zero,
$$w^* = -\frac{p + \varrho\alpha}{\alpha},$$
which implies a positive $w^*$ exists if and only if $p > \alpha/(\alpha - 1)$, for $\alpha \neq 0$.
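The covered call condition can be illustrated with a short sketch. It assumes the two-state growth rate $G(w) = p\log(1+w) + (1-p)\log(1+\alpha w)$ implied by the definitions above, natural logarithms, and $\alpha < 0$ (a loss on interval $I_1$); the function names and input values are hypothetical.

```python
import math

def covered_call_growth(w, p, alpha):
    """G(w) = p log(1 + w) + (1 - p) log(1 + alpha w), natural log."""
    return p * math.log(1.0 + w) + (1.0 - p) * math.log(1.0 + alpha * w)

def covered_call_kelly(p, alpha):
    """Stationary point of G; 0.0 when p <= alpha / (alpha - 1)."""
    if alpha >= 0.0:
        raise ValueError("sketch assumes alpha < 0 (a loss on interval I_1)")
    if p <= alpha / (alpha - 1.0):
        return 0.0  # no edge, so no position
    return -(p + (1.0 - p) * alpha) / alpha

# Hypothetical inputs: 60% chance of the capped gain, mean return -0.8 otherwise
w_cc = covered_call_kelly(p=0.6, alpha=-0.8)
```

With these inputs the threshold $\alpha/(\alpha-1)$ is about 0.444, so $p = 0.6$ clears it and the sketch returns a positive wager.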

Married Put
A married put, or protective put option strategy involves adding a long-put position to a long stock position, forming a lower bound for the stock value. The investor profits as the stock price keeps rising. The motivation for a married put is to hedge against a temporary decrease in the stock price. The maximum loss is the stock purchase price less the strike price of the put plus the premium paid for the option. There is unlimited potential gain for a married put. A sample expected payoff function for a married put strategy is illustrated below in Figure 5.
For the price of underlying security $S$ and option return $R$, the growth rate for a married put strategy becomes
$$G(w) = q\log(1-w) + \varrho\log(1+\alpha w),$$
where $q$ represents the probability $P(S \in I_1) = P(R = -1)$, $\varrho$ represents the probability $P(S \in I_2) = P(R > -1)$ with $q + \varrho = 1$, and $\alpha$ is the expected value of returns conditional on the underlying price landing in interval $I_2$, $\alpha = E(R \mid S \in I_2)$. Then $G$ is maximized by differentiating with respect to $w$ and setting equal to zero,
$$w^* = \frac{\varrho\alpha - q}{\alpha},$$
which implies a positive $w^*$ exists if and only if $q < \alpha/(\alpha + 1)$, for $\alpha \neq 0$.
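The intermediate steps behind the married put condition can be written out as follows. This is a worked sketch assuming the two-state growth rate $G(w) = q\log(1-w) + \varrho\log(1+\alpha w)$ with $q + \varrho = 1$ and $\alpha > 0$; the numerical values in the last line are hypothetical.

```latex
\begin{aligned}
G'(w) &= -\frac{q}{1-w} + \frac{\varrho\alpha}{1+\alpha w} = 0
  \;\Longrightarrow\; \varrho\alpha(1-w) = q(1+\alpha w), \\
w^* &= \frac{\varrho\alpha - q}{\alpha(\varrho + q)}
     = \frac{(1-q)\alpha - q}{\alpha}, \\
w^* > 0 &\iff (1-q)\alpha > q \iff q < \frac{\alpha}{\alpha+1}, \\
\text{e.g. } q = 0.4,\ \alpha = 1.5 &: \quad
  \frac{\alpha}{\alpha+1} = 0.6 > q, \qquad
  w^* = \frac{0.6 \cdot 1.5 - 0.4}{1.5} = \tfrac{1}{3}.
\end{aligned}
```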

Credit Spread
A put credit spread, or bull put spread option strategy consists of a short-put option at a certain strike price and a long-put option at a lower strike price. The investor profits with a rise in the underlying stock price. The motivations for a credit spread include earning income with limited risk and profiting moderately from a rise in the stock price. The maximum loss is limited, equal to the net difference between the higher and lower strike prices less the net premium received. The maximum gain is also limited, equal to the net premium received when putting on the position. A sample expected payoff function for a put credit spread strategy is illustrated below in Figure 6.
Alternatively, a call credit spread, or bear call spread option strategy consists of a short call option at a certain strike price and a long call option at a higher strike price. This way the investor profits with a decrease in the underlying stock price. The maximum loss is limited, equal to the net difference between the higher and lower strike prices less the net premium received. The maximum gain is also limited, equal to the net premium received when putting on the position. A sample expected payoff function for a call credit spread strategy is illustrated below in Figure 7.
For the price of underlying security $S$ and option return $R$, the growth rate for a credit spread strategy becomes
$$G(w) = p\log(1+w) + \varrho\log(1+\alpha w) + q\log(1-w),$$
where $p$ is the probability $P(R = 1)$, $\varrho$ represents the probability $P(S \in I_2) = P(-1 < R < 1)$, $q$ represents the probability $P(R = -1)$, with $p + q + \varrho = 1$, and $\alpha$ is the expected value of returns conditional on the underlying price landing in interval $I_2$, $\alpha = E(R \mid S \in I_2)$. Then $G$ is maximized by differentiating with respect to $w$ and setting equal to zero, which gives the quadratic equation
$$\alpha w^2 + \left[p(1-\alpha) + q(1+\alpha)\right]w - (p - q + \varrho\alpha) = 0,$$
so that
$$w^* = \frac{-\left[p(1-\alpha) + q(1+\alpha)\right] + \sqrt{\left[p(1-\alpha) + q(1+\alpha)\right]^2 + 4\alpha(p - q + \varrho\alpha)}}{2\alpha},$$
by the quadratic formula, for $\alpha \neq 0$.
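The credit spread sizing can be sketched in a few lines. The sketch assumes the three-state growth rate $G(w) = p\log(1+w) + \varrho\log(1+\alpha w) + q\log(1-w)$ and solves the quadratic obtained from $G'(w) = 0$; function names are illustrative. The inputs are the IBM bull put spread probabilities and the value $\alpha = -0.5$ quoted in the worked example of Section 5.

```python
import math

def credit_spread_growth(w, p, q, rho, alpha):
    """G(w) = p log(1+w) + rho log(1+alpha w) + q log(1-w), natural log."""
    return (p * math.log(1.0 + w) + rho * math.log(1.0 + alpha * w)
            + q * math.log(1.0 - w))

def credit_spread_kelly(p, q, rho, alpha):
    """Admissible root of alpha*w^2 + b*w - c = 0 (condition G'(w) = 0)."""
    b = p * (1.0 - alpha) + q * (1.0 + alpha)
    c = p - q + rho * alpha  # equals G'(0); no edge when c <= 0
    if c <= 0.0:
        return 0.0
    if alpha == 0.0:
        return c / b  # the condition is linear when alpha = 0
    disc = b * b + 4.0 * alpha * c
    return (-b + math.sqrt(disc)) / (2.0 * alpha)

# IBM [150, 152.5] bull put spread: 50.5% win, 31.2% loss, 18.3% partial
w_ibm = credit_spread_kelly(p=0.505, q=0.312, rho=0.183, alpha=-0.5)
```

Under these assumptions `w_ibm` rounds to 0.12, in line with the 12% bet size reported for that week.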

Straddle
A straddle option strategy involves buying a call and buying a put with equal strike price and expiration date. The investor profits when the underlying stock price experiences a big move up or down. The motivation behind a straddle is to capitalize on correctly predicting a big price move or high volatility in the near future. The maximum loss for a straddle is limited to the premium paid for the call and put options. The potential gain is unlimited. A sample expected payoff function for a straddle strategy is illustrated below in Figure 8.
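For the price of underlying security $S$ and option return $R$, the growth rate for a straddle strategy becomes
$$G(w) = \varsigma\log(1-\beta w) + \varrho\log(1+\alpha w),$$
where $\varsigma$ represents the probability $P(S \in I_1)$, $\varrho$ represents the probability $P(S \in I_2)$, with expected values $\beta = -E(R \mid S \in I_1)$ and $\alpha = E(R \mid S \in I_2)$, while $\varrho + \varsigma = 1$. Then $G$ is maximized by differentiating with respect to $w$ and setting equal to zero,
$$w^* = \frac{\varrho\alpha - \varsigma\beta}{\alpha\beta},$$
which implies a positive $w^*$ exists if and only if the odds ratio $\varrho/(1-\varrho) > \beta/\alpha$, for $\alpha, \beta \neq 0$.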
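A minimal sketch of the straddle rule follows, assuming $G(w) = \varsigma\log(1-\beta w) + \varrho\log(1+\alpha w)$ with $\alpha, \beta > 0$; the function names and the sample inputs are hypothetical.

```python
def straddle_slope(w, rho, alpha, beta):
    """G'(w) for G(w) = (1-rho) log(1 - beta w) + rho log(1 + alpha w)."""
    sigma = 1.0 - rho
    return rho * alpha / (1.0 + alpha * w) - sigma * beta / (1.0 - beta * w)

def straddle_kelly(rho, alpha, beta):
    """w* = (rho*alpha - sigma*beta) / (alpha*beta), or 0.0 without an edge."""
    sigma = 1.0 - rho
    if rho * alpha <= sigma * beta:  # odds ratio rho/(1-rho) <= beta/alpha
        return 0.0
    return (rho * alpha - sigma * beta) / (alpha * beta)

# Hypothetical inputs: 40% chance of a gain averaging 2.0, else a loss of 1.0
w_straddle = straddle_kelly(rho=0.4, alpha=2.0, beta=1.0)
```

Here the odds ratio 0.4/0.6 exceeds $\beta/\alpha = 0.5$, so the wager is positive and `straddle_slope` vanishes at `w_straddle`.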

Long Strangle
A long strangle option strategy involves buying an out-of-the-money call option and an out-of-the-money put option with the same expiration date. A strangle is similar to a straddle, except that a straddle has equal strike prices whereas a strangle has a call option with a higher strike price than the put option. The investor profits when there is a very big move up or down in the stock price. The motivation behind a long strangle is to capture a big move in the stock price over the term of the option. The maximum loss for a long strangle is limited to the net premium paid for the call and put options. A sample expected payoff function for a long strangle strategy is illustrated below in Figure 9.
For the price of underlying security $S$ and option return $R$, the growth rate for a long strangle strategy becomes
$$G(w) = \varsigma\log(1-\beta w) + q\log(1-w) + \varrho\log(1+\alpha w),$$
where $\varsigma$ represents the probability $P(S \in I_1)$, $q$ represents the probability $P(S \in I_2) = P(R = -1)$, and $\varrho$ represents the probability $P(S \in I_3)$, with expected values $\beta = -E(R \mid S \in I_1)$ and $\alpha = E(R \mid S \in I_3)$, while $q + \varrho + \varsigma = 1$. Then $G$ is maximized by differentiating with respect to $w$ and setting equal to zero, which gives the quadratic equation
$$\alpha\beta w^2 - \left[\varsigma\beta(\alpha-1) + q(\alpha-\beta) + \varrho\alpha(1+\beta)\right]w + (\varrho\alpha - q - \varsigma\beta) = 0,$$
solved for $w^*$ by the quadratic formula, for $\alpha, \beta \neq 0$.
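The three-state case can be sketched numerically. The code assumes $G(w) = q\log(1-w) + \varrho\log(1+\alpha w) + \varsigma\log(1-\beta w)$, expands $G'(w) = 0$ into a quadratic, and keeps the root in $(0, 1)$; the function names and sample inputs are hypothetical, and the sketch does not check that every logarithm stays defined for arbitrary inputs.

```python
import math

def three_state_slope(w, q, rho, sigma, alpha, beta):
    """G'(w) for G = q log(1-w) + rho log(1+alpha w) + sigma log(1-beta w)."""
    return (rho * alpha / (1.0 + alpha * w) - q / (1.0 - w)
            - sigma * beta / (1.0 - beta * w))

def three_state_kelly(q, rho, sigma, alpha, beta):
    """Root in (0, 1) of a*w^2 + b*w + c = 0, assuming alpha*beta != 0."""
    a = alpha * beta
    b = -(sigma * beta * (alpha - 1.0) + q * (alpha - beta)
          + rho * alpha * (1.0 + beta))
    c = rho * alpha - q - sigma * beta  # equals G'(0)
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return 0.0
    roots = [(-b + s * math.sqrt(disc)) / (2.0 * a) for s in (1.0, -1.0)]
    admissible = [w for w in roots if 0.0 < w < 1.0]
    return min(admissible) if admissible else 0.0

# Hypothetical strangle: 50% total loss, 25% on each profitable tail (mean +2)
w_strangle = three_state_kelly(q=0.5, rho=0.25, sigma=0.25, alpha=2.0, beta=-2.0)
```

Because $\beta = -E(R \mid S \in I_1)$, a profitable lower tail enters with a negative $\beta$, as in the sample call.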

Butterfly Spread
A butterfly spread, or long call butterfly option strategy consists of two short calls at a middle strike price and two long calls, one at the lower strike and one at the higher strike price, all with the same expiration date. The investor profits by correctly predicting the underlying stock price at expiration. The motivation behind a butterfly spread is to capitalize on predicting a target stock price at the options' expiry date. The maximum loss for a butterfly spread is limited to the net premium paid. The maximum gain is also limited, equal to the short call strike price less the lower long call strike price, less the net premium paid. A sample expected payoff function for a butterfly spread strategy is illustrated below in Figure 10.
For the price of underlying security $S$ and option return $R$, the growth rate for a butterfly spread strategy becomes
$$G(w) = q\log(1-w) + \varrho\log(1+\alpha w) + \varsigma\log(1-\beta w),$$
where $q$ represents the probability $P(S \in I_1 \cup I_4) = P(R = -1)$, $\varrho$ represents the probability $P(S \in I_2)$, $\varsigma$ represents the probability $P(S \in I_3)$, with expected values $\alpha = E(R \mid S \in I_2)$ and $\beta = -E(R \mid S \in I_3)$, while $q + \varrho + \varsigma = 1$. Then $G$ is maximized by differentiating with respect to $w$ and setting equal to zero, which gives the quadratic equation
$$\alpha\beta w^2 - \left[\varsigma\beta(\alpha-1) + q(\alpha-\beta) + \varrho\alpha(1+\beta)\right]w + (\varrho\alpha - q - \varsigma\beta) = 0,$$
solved for $w^*$ by the quadratic formula, for $\alpha, \beta \neq 0$.

Iron Condor
An iron condor, or short condor option strategy involves selling one call and buying another call with a higher strike price, plus selling one put and buying another put with a lower strike price, with the current underlying price falling between the call and put strikes. The investor profits if the underlying stock price is between the call and put strikes at option expiration. The motivation behind an iron condor arises when the investor foresees the stock trading in a narrow range over the life of the options. The maximum loss of an iron condor is the greater of the difference between the high and low call strikes and the difference between the high and low put strikes, less the net premium received. A sample expected payoff function for an iron condor strategy is illustrated below in Figure 11.
For the price of underlying security $S$ and option return $R$, the growth rate for an iron condor strategy becomes
$$G(w) = p\log(1+w) + q\log(1-w) + \varrho\log(1+\alpha w) + \varsigma\log(1-\beta w),$$
where $q$ represents the probability $P(S \in I_1 \cup I_5) = P(R = -1)$, $\varrho$ represents the probability $P(S \in I_2)$, $p$ represents the probability $P(S \in I_3) = P(R = 1)$, $\varsigma$ represents the probability $P(S \in I_4)$, with expected values $\alpha = E(R \mid S \in I_2)$ and $\beta = -E(R \mid S \in I_4)$, while $p + q + \varrho + \varsigma = 1$. Then $G$ is maximized by differentiating with respect to $w$ and setting equal to zero,
$$\frac{p}{1+w} - \frac{q}{1-w} + \frac{\varrho\alpha}{1+\alpha w} - \frac{\varsigma\beta}{1-\beta w} = 0,$$
which can be solved for $w^*$ by the quadratic formula if and only if $\alpha = -\beta \neq 0$ (reflective symmetry for expected values).
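When the symmetry condition does not hold, the optimal wager can still be found numerically. The following sketch uses bisection on $G'(w)$, which is valid because $G$ is a sum of logarithms of affine functions and hence concave. It assumes $|\alpha|, |\beta| < 1$ so that every logarithm is defined on $[0, 1)$, and $q > 0$ so that $G'$ turns negative near $w = 1$; names and inputs are hypothetical.

```python
def condor_slope(w, p, q, rho, sigma, alpha, beta):
    """G'(w) for the four-state iron condor growth rate."""
    return (p / (1.0 + w) - q / (1.0 - w)
            + rho * alpha / (1.0 + alpha * w)
            - sigma * beta / (1.0 - beta * w))

def condor_kelly(p, q, rho, sigma, alpha, beta, tol=1e-10):
    """Bisection on the decreasing function G'(w) over [0, 1)."""
    lo, hi = 0.0, 1.0 - 1e-9
    if condor_slope(lo, p, q, rho, sigma, alpha, beta) <= 0.0:
        return 0.0  # no edge at w = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if condor_slope(mid, p, q, rho, sigma, alpha, beta) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical symmetric condor (alpha = -beta), so the quadratic case applies
w_condor = condor_kelly(p=0.5, q=0.2, rho=0.15, sigma=0.15, alpha=-0.5, beta=0.5)
```

In the symmetric sample the two sloped legs merge into a single state with probability 0.3 and mean $-0.5$, and the credit spread quadratic gives the same answer, $w^* = 0.2$.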

Minimum Relative Entropy
For the purposes of GEPO, the risk of an option portfolio is defined here as the relative entropy of portfolio returns, with respect to the uniform distribution. In order to calculate the relative entropy, we first must calculate the Shannon entropy.

Shannon Entropy
As a quick review of information theory [23,24], the Shannon entropy of a random variable represents the amount of randomness inherent to that variable. For a discrete random variable $X$ with probability mass function $P(\cdot)$ that can take on possible values $x_1, \ldots, x_n$, the Shannon entropy $H$ is the average amount of information produced by $X$, defined as
$$H(X) = -\sum_{i=1}^{n} P(x_i)\log P(x_i).$$
For $n$ discrete random variables $X_1, \ldots, X_n$ respectively with $m_1, \ldots, m_n$ states, the joint entropy of $\mathbf{X} = (X_1, \ldots, X_n)$ is given by
$$H(\mathbf{X}) = -\sum_{x_1}\cdots\sum_{x_n} P(x_1, \ldots, x_n)\log P(x_1, \ldots, x_n).$$
This can be calculated empirically given a set of historical data by using the method introduced in REPO [1]. For the case of option returns, many strategies have more than just two (binary) states. Generally there can be up to four total states: positive discrete returns (+1), negative discrete returns (−1), continuous returns positively correlated with the underlying price, and continuous returns negatively correlated with the underlying price.
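Both entropies can be estimated from observed frequencies. The sketch below is a plain plug-in estimate, not necessarily the exact procedure of REPO [1]; the function names and the toy outcome rows are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(outcomes, base=2.0):
    """Plug-in entropy of a sequence of hashable outcomes."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return -sum((c / total) * math.log(c / total, base)
                for c in counts.values())

def joint_entropy(rows, base=2.0):
    """Joint entropy: each row of outcomes is treated as a single state."""
    return shannon_entropy([tuple(row) for row in rows], base)

single = [1, 1, -1, -1]                      # one asset, two equally likely states
rows = [(1, -1), (1, 1), (-1, -1), (-1, 1)]  # two assets, four distinct rows
h_single = shannon_entropy(single)
h_joint = joint_entropy(rows)
```

With base 2 the single asset carries one bit and the two-asset rows carry two bits, since all four joint outcomes are equally frequent.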

Kullback-Leibler Divergence
Kullback and Leibler (1951) [25,26] introduced the Kullback-Leibler divergence, which measures the directed divergence between two probability distributions. For discrete probability distributions $P$ and $Q$, the Kullback-Leibler divergence between $P$ and $Q$, also known as the relative entropy of $P$ with respect to $Q$, is given by
$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x)\log\frac{P(x)}{Q(x)}.$$
As shown in our previous paper (2020) [2], relative entropy qualifies as a convex risk measure based on the relative entropy principle. We once again use this quantity as the discriminatory risk measure for option portfolio optimization. For $m$ total possible states, we will use the $m$-state discrete uniform distribution $U_m$ as the reference distribution and measure from there the distance to the distribution of portfolio returns. Thus, for Shannon entropy $H(\cdot)$, the risk of an option strategy portfolio $R_Q$ is measured by the relative entropy of $R_Q$ with respect to the uniform distribution $U_m$,
$$D_{KL}(R_Q \,\|\, U_m) = \log m - H(R_Q).$$
Using the same combinatorial technique employed for REPO [1] and DEPO [2], the Shannon entropy of option portfolio returns $R_Q$ can be estimated empirically via probability generating functions. For a collection of $n$ discrete return assets over time period $j = 1, \ldots, T$, let $r_j = (r_{1j}, \ldots, r_{nj})$ denote the cross-sectional $n$-dimensional vector of outcomes across one observational row of data, and let them be uniquely represented by the collection of $u_k$'s such that $u_k = \{r_j \mid r_j \neq u_l, \text{ for some } j, \text{ and any } l \neq k\}$. Then the empirical Shannon entropy of option portfolio returns $R_Q$ can be expressed as
$$\hat{H}(R_Q) = -\sum_{k=1}^{m}\frac{g^{(k)}(0)}{k!}\log\frac{g^{(k)}(0)}{k!},$$
for the $k$th derivative at $x = 0$ of the generating function $g(x; w_1, \ldots, w_n)$. Therefore, the risk of an option portfolio is given by the relative entropy of portfolio returns $R_Q$, estimated empirically as
$$\hat{D}_{KL}(R_Q \,\|\, U_m) = \log m + \sum_{k=1}^{m}\frac{g^{(k)}(0)}{k!}\log\frac{g^{(k)}(0)}{k!},$$
for the $m$-state discrete uniform distribution $U_m$.
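The identity between the divergence from a uniform reference and $\log m - H$ can be verified directly. This is a minimal sketch with hypothetical function names and a toy four-state distribution.

```python
import math

def kl_divergence(p, q, base=2.0):
    """D_KL(P || Q) = sum of P(x) log(P(x) / Q(x)), skipping zero-mass states."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)

def relative_entropy_to_uniform(p, base=2.0):
    """log(m) - H(P), the divergence of P from the m-state uniform law."""
    m = len(p)
    entropy = -sum(pi * math.log(pi, base) for pi in p if pi > 0)
    return math.log(m, base) - entropy

p_states = [0.5, 0.25, 0.25, 0.0]   # toy distribution over m = 4 states
uniform = [0.25] * 4
d_direct = kl_divergence(p_states, uniform)
d_identity = relative_entropy_to_uniform(p_states)
```

Both routes give 0.5 bits here: the entropy of the toy distribution is 1.5 bits, against a maximum of 2 bits for four states.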

Generalized Entropic Portfolio Optimization (GEPO)
The new generalized entropic portfolio optimization (GEPO) problem uses a multi-objective function that minimizes estimated relative entropy and maximizes expected growth rate. Using this optimization, investors can make portfolio selections based on a chosen risk tolerance. The highest risk portfolio solely maximizes the expected portfolio growth rate, equivalent to the Kelly criterion method. The lowest risk portfolio minimizes the portfolio relative entropy; it is the most diversified portfolio, allocating capital to all $n$ options equally. Somewhere in between lies a user's optimal portfolio of choice.
For the case of option strategies the returns can be both discrete and continuous in nature. For example, credit spread returns can be either +100% for a success, −100% for a failure, or somewhere in between −100% and +100% on a continuous scale for partial returns, which we will denote by 0. Thus, we would have discrete return outcomes $u$ such that $u \in \{-1, 0, +1\}$. This leads to the generalized entropic portfolio optimization problem.
Consider $n$ potential option strategy contracts. Let $\lambda$ be the number of probability states in the single-asset option strategy. For the generalized option strategy, returns can exhibit at most four general unique probability states: +100%, −100%, some continuous return on a positively sloped leg (with mean $\alpha$), and some continuous return on a negatively sloped leg (with mean $\beta$), therefore $1 < \lambda \leq 4$. Let the up-to-four probability states be represented respectively by $p_i, q_i, \varrho_i, \varsigma_i$ for event $i \in \{1, \ldots, n\}$, and let $w_i$ represent the percentage of portfolio funds to be allocated to option $i$, with the total allocation summing to $\omega = w_1 + \cdots + w_n$. Let $r_j = (r_{1j}, \ldots, r_{nj})$ denote the cross-sectional $n$-dimensional vector of outcomes across one observational row of data. Over $T$ data points this leads to $m$ historical unique vectors $u_k = \{r_j \mid r_j \neq u_l, \text{ for some } j, \text{ and any } l \neq k\}$ for $k = 1, \ldots, m$ such that each $u_k$ is unique, with $m$ bounded by either $T$ or the maximum number of possible combinations $\lambda^n$, so $m = \min(T, \lambda^n)$. Basically, the collection of $u_k$'s is a unique representation of the $r_j$'s with no duplicates. Let us also denote $\eta = \sum_{i=1}^{n} I(w_i) \leq n$ as the number of chosen options in the portfolio, where $I(w_i)$ is the indicator function for the event $\{w_i > 0\}$.
Then the GEPO problem is defined as the following optimization program, generalized for different kinds of option strategies: maximize the expected portfolio growth rate $G(\omega)$ and minimize the estimated relative entropy
$$\hat{D}_{KL}(R_Q \,\|\, U_m) = \log m + \sum_{k=1}^{m}\frac{g^{(k)}(0)}{k!}\log\frac{g^{(k)}(0)}{k!},$$
subject to
$$\omega = w_1 + \cdots + w_n \leq 1, \qquad w_i \in \{0,\ \eta^{-1}\omega\}, \quad i = 1, \ldots, n,$$
for the $m$-state uniform distribution $U_m$ and the $k$th derivative at $x = 0$ of the probability generating function
$$g(x; w_1, \ldots, w_n) = \frac{1}{T}\sum_{j=1}^{T} x^{\{k \,:\, (I(w_1)r_{1j}, \ldots, I(w_n)r_{nj}) = u_k\}}. \tag{24}$$
The last constraint in the optimization problem stems from the fact that joint entropy measures randomness strictly based on the inclusion or exclusion of a random variable. The joint entropy value does not change upon changes to non-zero percentages of asset allocation, so any non-zero weight $w_i$ contributes the corresponding marginal entropy from asset $i$, regardless of the magnitude of $w_i$. For this reason every asset included in the portfolio is assigned an equal weighting of $\eta^{-1}\omega$.
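The risk side of the program can be sketched as follows. The coefficient $g^{(k)}(0)/k!$ of the generating function is simply the empirical frequency of the masked outcome row $u_k$, so the estimate reduces to counting rows. The function names, the optional `m` argument, and the four toy rows are hypothetical; the growth-rate objective is not modeled here.

```python
import math
from collections import Counter

def masked_rows(rows, included):
    """Apply the indicators I(w_i): outcomes of excluded assets become 0."""
    return [tuple(r if keep else 0 for r, keep in zip(row, included))
            for row in rows]

def estimated_relative_entropy(rows, included, base=3.0, m=None):
    """log(m) - H_hat for the chosen assets, with m defaulting to T."""
    T = len(rows)
    m = T if m is None else m
    counts = Counter(masked_rows(rows, included))
    # counts[u_k] / T plays the role of g^(k)(0) / k!
    entropy = -sum((c / T) * math.log(c / T, base) for c in counts.values())
    return math.log(m, base) - entropy

rows = [(1, -1), (1, 0), (-1, 1), (1, -1)]   # T = 4 toy outcome rows
d_one = estimated_relative_entropy(rows, included=(True, False))
d_two = estimated_relative_entropy(rows, included=(True, True))
```

Including the second asset raises the joint entropy and so lowers the estimated relative entropy, which is the diversification effect the program trades off against growth. The result depends only on which assets are included, not on the size of their weights.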

Risk-Adjusted Performance
Here we will use the risk-adjusted ratio for comparing growth rates of gambling portfolios introduced in our previous paper (Mercurio et al., 2020) [2], called the Growth Rate Over UNiform Divergence (GROUND) ratio. This ratio measures the expected growth rate of a portfolio, adjusted by its risk level: relative entropy with respect to the uniform distribution. Let $U_m$ be the $m$-state discrete uniform distribution. Then for chosen portfolio $R_a$ and minimum risk portfolio $R_b$ existing in the $m$-state event space, the GROUND ratio $\Gamma_m$ is defined as
$$\Gamma_m = \frac{G_a(\omega_a) - G_b(\omega_b)}{D_{KL}(R_a \,\|\, U_m) - D_{KL}(R_b \,\|\, U_m)} = \frac{G_a(\omega_a) - G_b(\omega_b)}{H(R_b) - H(R_a)},$$
where $G_a(\omega_a)$ is the growth rate of the chosen portfolio with weighting $\omega_a$, $G_b(\omega_b)$ is the growth rate of the minimum risk portfolio with weighting $\omega_b$, $D_{KL}(R_a \,\|\, U_m)$ is the relative entropy of the chosen portfolio with respect to $U_m$, $D_{KL}(R_b \,\|\, U_m)$ is the relative entropy of the minimum risk portfolio with respect to $U_m$, and $H(\cdot)$ is the Shannon entropy.
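A small helper makes the ratio concrete. It assumes the GROUND ratio is the excess growth rate over the minimum risk portfolio divided by the excess relative entropy; the function name is illustrative, and the minimum risk figures in the sample call are hypothetical placeholders rather than results from the paper.

```python
def ground_ratio(growth_a, growth_b, rel_entropy_a, rel_entropy_b):
    """Excess growth rate per unit of excess relative entropy."""
    excess_risk = rel_entropy_a - rel_entropy_b
    if excess_risk == 0.0:
        raise ValueError("chosen and minimum risk portfolios have equal risk")
    return (growth_a - growth_b) / excess_risk

# Chosen portfolio: the week 2 Kelly point (relative entropy 4.5206, growth 0.0056).
# Minimum risk portfolio: hypothetical values, for illustration only.
gamma = ground_ratio(growth_a=0.0056, growth_b=0.0030,
                     rel_entropy_a=4.5206, rel_entropy_b=3.0)
```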

Data
In this example, actual put and call option data are presented for 20 randomly selected equities from the S&P100 composite index, obtained from Wharton Research Data Services (WRDS) of the Wharton School of the University of Pennsylvania, found at wrds-www.wharton.upenn.edu. Prices, volumes, expiration dates, and other essential option data are compiled from June 2012 to January 2018 (June 2012 is the earliest month available that contains data for all listed securities). Included in this data are the Greek parameters for options, which are described in detail in the Equity Options Reference Manual from the TMX Group Montreal Exchange guides and strategies documentation (2020) [22]. The delta of each option effectively represents an estimated probability of landing in-the-money at expiry, and this parameter serves as the estimated success rate for the portfolio optimization here. For the purposes of this paper, we are only concerned with deltas closest to, but not less than, 50%, to ensure every possible option yields a slight edge. Using these data archives, we are able to build historical weekly bull put spread and bear call spread options by selecting the buy-sell pairs that most closely resemble a 1:1 odds wager for each expiration date, creating 287 unique data points for each equity. Weekly credit spread outcomes versus historical strike prices are recorded and computed as follows,
$$I(E_P > S_P) - I(E_P \leq B_P) + I(B_P < E_P \leq S_P)\,\frac{E_P - M_P}{S_P - M_P}, \quad \text{for put spreads, and}$$
$$I(E_P > S_P) - I(E_P \leq B_P) + I(S_P < E_P \leq B_P)\,\frac{E_P - M_P}{B_P - M_P}, \quad \text{for call spreads,}$$
where $E_P$ is the expiration price for the underlying equity in question, $S_P$ is the strike price for the sold option, $B_P$ is the strike price for the bought option, and $M_P$ is the midpoint of the two strike prices.
The result is a value between −1 and +1 inclusive, on the continuous line, where +1 is awarded for an expiration price above the upper strike, −1 for an expiration price below the lower strike, and a partial continuous return between −1 and +1 for spreads expiring in the middle interval between the strike prices, as in the credit spread payoff function diagrams in Figures 6 and 7. Using this historical data, we are able to empirically calculate the estimated relative entropy of each option. The selected credit spread options and their respective outcomes against strikes over June 2012 to January 2018 are presented below in Table 1. Over the following calendar year 2018, there are 52 weekly equity option expiration periods, and estimated success probabilities for each option are defined as follows. For put options, the negated delta (−delta) represents the probability of landing in-the-money for bought puts, and (1 + delta) represents the probability of landing out-of-the-money for sold puts. For call options, the delta represents the probability of landing in-the-money for bought calls, and (1 − delta) represents the probability of landing out-of-the-money for sold calls. The historical results summarized in Table 1 are used to evaluate the estimated relative entropy risk of each option, as well as the combined estimated relative entropy of the composite portfolio. The emulation here shows how GEPO performs against leading Kelly criterion methods for picking a portfolio of options for each week throughout 2018.
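The outcome coding and the delta-based probabilities can be sketched together. The code assumes the partial return applies only strictly between the strikes, and it expresses both spread types through the lower and upper strike, since the fraction $(E_P - M_P)/(\text{upper} - M_P)$ is the same in both formulas. Function names are illustrative, and the two deltas are hypothetical values chosen to reproduce the IBM probabilities quoted in the worked example.

```python
def spread_outcome(expiry, lower, upper):
    """Outcome in [-1, 1] of a credit spread against its two strikes."""
    if expiry > upper:
        return 1.0
    if expiry <= lower:
        return -1.0
    mid = 0.5 * (lower + upper)
    return (expiry - mid) / (upper - mid)

def put_spread_probabilities(delta_sold, delta_bought):
    """(p, q, rho) from put deltas: sold put out-of-the-money, bought put in-the-money."""
    p = 1.0 + delta_sold          # sold put expires out-of-the-money
    q = -delta_bought             # bought put expires in-the-money
    return p, q, 1.0 - p - q      # remainder is the partial-return state

# IBM [150, 152.5] strikes from the text; deltas are hypothetical
outcomes = [spread_outcome(e, 150.0, 152.5) for e in (149.0, 151.25, 152.0, 153.0)]
p, q, rho = put_spread_probabilities(delta_sold=-0.495, delta_bought=-0.312)
```

An expiry at the midpoint 151.25 is coded as 0, and the implied probabilities are 50.5%, 31.2%, and 18.3%.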
For illustrative purposes, let us examine this method applied to the second week, with option expiration date 12 January 2018. Table 2 lists the details of the selected group of equity credit spreads built from the appropriate option pairs. GEPO determines which collection of credit spreads to select and what percentage of portfolio funds to allocate to each, in order to build the optimal risk-reward credit spread portfolio.
Each potential portfolio has an expected growth rate, given the delta projections, and an estimated relative entropy with respect to the uniform distribution. The historical data contains $T = 287$ data points for each option, so the maximum joint entropy that can possibly be exhibited is $\log_3(T) = 5.1515$, i.e., the uniform distribution with $m = 287$ possible probability states. Therefore, a portfolio $R_Q$ has an estimated relative entropy of $\log_3(T) - H(R_Q) = 5.1515 - H(R_Q)$ for joint entropy $H(R_Q)$.
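As a minimal sketch of this calculation, the following Python computes the empirical joint entropy of a sequence of historical joint outcomes and its relative entropy with respect to the uniform distribution over the same number of data points, using logarithm base 3. It relies on the standard identity that the relative entropy to a uniform distribution on T states equals log(T) minus the entropy. The function names are hypothetical, and each joint outcome is assumed to be given as a hashable tuple.

```python
import math
from collections import Counter


def joint_entropy(outcomes, base=3):
    """Empirical entropy of a sequence of hashable joint outcomes."""
    total = len(outcomes)
    counts = Counter(outcomes)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


def relative_entropy_to_uniform(outcomes, base=3):
    """Relative entropy of the empirical distribution to Uniform(T), T = len(outcomes)."""
    return math.log(len(outcomes), base) - joint_entropy(outcomes, base)


# Maximum joint entropy for T = 287 data points in base 3.
max_entropy = math.log(287, 3)
```

Under this sketch, a history in which all 287 joint outcomes are distinct has relative entropy close to 0, while a history with a single repeated outcome reaches the maximum of about 5.1515.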

Efficient Frontier and Portfolio Selection
In the portfolio selection problem, the efficient frontier refers to the set of optimal portfolios that yield the greatest expected return for a defined level of risk or, equivalently, the least risk for a defined level of expected return (the dual problem). The efficient frontier illustrates the risk-return trade-off for a given set of optimal portfolios. Here we show the analogous efficient frontier for portfolios with discrete returns, comparing the expected growth rates and risk levels (estimated relative entropy) of each efficient portfolio. For the same week of 12 January 2018, Figure 12 below plots the potential portfolios and their respective expected growth rates against their inherent risk profile, namely the estimated relative entropy of the historical joint outcomes of the portfolio with respect to the uniform distribution Uniform(T).
For the current emulation, the Kelly criterion strategy chooses the portfolio $R_K$ that maximizes the expected growth rate. For week 2 this leads to the top right-most data point K = (4.5206, 0.0056), with estimated relative entropy of 4.5206 and expected growth rate of 0.0056. This portfolio consists of just one equity credit spread from Table 2: a bull put spread on IBM, consisting of buying a put with strike 150 and selling a put with strike 152.5, betting on IBM to expire above 152.5 with a 50.5% probability of success, 31.2% probability of loss, and 18.3% probability of a partial return. According to our extended Kelly criterion conditions from Section 2.2, the optimal bet size here is 12%, and thus the chosen portfolio for week 2 is 12% of portfolio funds on [150, 152.5]<IBM, shown below in Table 3. This strategy disregards any concept of risk associated with the expected portfolio growth rate of $G(\omega) = 0.0056$. Alternatively, GEPO chooses the optimal portfolio based on the risk-reward trade-off. For each of the weeks 1 to 52, portfolio selection is performed according to the GEPO problem with $n = 20$, $\alpha = -0.5$ and $T = 287$ (using logarithm base 3 here since we are dealing with three outcome states: 1, 0, and −1), for the Uniform(T) distribution as the target distribution, and for the $k$th derivative at $x = 0$ of the probability generating function with $T$ data points, $g(x; w_1, \ldots, w_n) = \frac{1}{T} \sum_{j=1}^{T} x^{\{k \,:\, (I(w_1) r_{1j}, \ldots, I(w_n) r_{nj}) = u_k\}}$.
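To make the role of the generating function concrete: the coefficient of x to the power k in g is the fraction of the T historical data points whose masked joint outcome equals state u_k, which is also what the kth derivative at x = 0 recovers after dividing by k!. The Python sketch below computes these empirical state probabilities directly by counting. The function name `state_probabilities` is hypothetical, and the indicator I(w_i) is assumed here to be 1 when the allocation w_i is positive and 0 otherwise.

```python
from collections import Counter


def state_probabilities(returns, weights):
    """Empirical probabilities of the joint outcome states of a portfolio.

    returns: list of n sequences, each holding T historical outcomes r_ij
    weights: list of n allocations w_i
    Assumption: an asset contributes to the state only if its weight is positive.
    """
    n_points = len(returns[0])
    included = [w > 0 for w in weights]
    states = [
        tuple(r[j] if keep else 0.0 for keep, r in zip(included, returns))
        for j in range(n_points)
    ]
    counts = Counter(states)
    return {state: count / n_points for state, count in counts.items()}


# Two assets, four historical data points, only the first asset selected.
example = state_probabilities(
    returns=[[1.0, -1.0, 1.0, 1.0], [1.0, 1.0, -1.0, 1.0]],
    weights=[0.1, 0.0],
)
```

In this small example only two states occur, because the unselected second asset is masked to zero at every data point.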
Although the Kelly criterion places the entire wager on the option (or options) with the greatest expected growth rate, GEPO diversifies the portfolio by distributing the percent allocation across multiple options according to the appropriate risk profile. For week 2, GEPO selects data point D = (1.5845, 0.0035), with estimated relative entropy of 1.5845 and expected growth rate of 0.0035. This corresponds to the optimal portfolio of the six credit spreads listed below in Table 4, with a total portfolio allocation of 9% (compared to the 12% allocated by the Kelly criterion). Looking at the portfolio efficiency via the risk-adjusted GROUND ratio, the GEPO portfolio has a GROUND ratio of $\Gamma_m = (0.0035 - 0.0021)/(1.5845 - 0.05) = 0.213\%$, more than twice as efficient as the Kelly criterion portfolio at $\Gamma_m = (0.0056 - 0.0021)/(4.5206 - 0.05) = 0.078\%$.
The actual expiration prices that follow for week 2 are IBM at 163.14 (win), AIG at 60.97 (94% partial loss), MCD at 173.57 (win), ORCL at 49.51 (win), FB at 179.37 (win), and MRK at 58.66 (loss), for a total GEPO portfolio return of 3.2% in week 2. Therefore, the Kelly criterion strategy, holding only the IBM spread, experiences a gain of 12% of portfolio balance in week 2, while the competing GEPO strategy gains 3.2%.

Comparison to the Kelly Criterion Over Time
We demonstrate here the performance of GEPO versus the Kelly and Kelly variant strategies over the entire 2018 calendar year, executing option strategies at weekly intervals. The methods of the previous Section 5.2 are repeated week by week over the course of 52 weeks. The Kelly criterion strategy allocates the optimal investment size each week to the option (or options) that yields the greatest expected growth rate. Half Kelly is the same strategy but uses the fractional Kelly variant, allocating only half the Kelly criterion weighting to the same options. The GEPO optimal risk strategy employs the GEPO algorithm each week to select the portfolio with the greatest growth rate subject to the main constraint that the portfolio has an estimated relative entropy no greater than 2. Each strategy begins the year with $10,000 and the total results are shown below in Figure 13. With consistent, sustainable returns, GEPO ultimately outperforms both the Kelly and half Kelly methods over the 52-week period, and more than doubles the initial investment by the end of the year. While the Kelly strategies experience gross variability with see-saw pattern returns, the diversification strategy of GEPO holds strong and consistently returns profits month after month. The main purpose of GEPO is to mitigate the risk of inaccurate predictions, and this goal is well accomplished. In the end, GEPO finishes the year at a profit of $10,062, more than 100% ROI, while the Kelly criterion gives up most gains and only retains $2645 (26.45%) profit, with half Kelly finishing up $1770 (17.7%).
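The weekly compounding behind such a comparison can be sketched as follows. This is a simplified illustration, not the authors' backtest: the function name `compound` is hypothetical, each week is assumed to hold a single option with a 1:1 payoff, and the week 2 figures (12% Kelly allocation, winning outcome) are applied to the starting balance purely for demonstration.

```python
def compound(start_balance, weekly_bets, kelly_fraction=1.0):
    """Compound a balance over a sequence of weekly bets.

    weekly_bets: list of (kelly_weight, outcome) pairs, outcome in [-1, +1]
    kelly_fraction: 1.0 for full Kelly, 0.5 for half Kelly
    Assumption: one option per week with a 1:1 payoff.
    """
    balance = start_balance
    for weight, outcome in weekly_bets:
        balance *= 1.0 + kelly_fraction * weight * outcome
    return balance


# Week 2 from the text: 12% Kelly allocation on a winning spread.
week2 = [(0.12, 1.0)]
full_kelly = compound(10_000, week2)
half_kelly = compound(10_000, week2, kelly_fraction=0.5)
```

Under these assumptions the full Kelly balance rises by 12% and the half Kelly balance by 6%; a losing week with the same allocation would reduce them by the same proportions, which is the source of the see-saw pattern when a single option carries the whole wager.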

Conclusions
Presented here is a new entropy-based combinatorial approach to option strategy portfolio selection called generalized entropic portfolio optimization (GEPO). GEPO is the most general method of the entropic portfolio optimizations introduced in our research series. We extend the well-known Kelly criterion to accommodate multiple assets and mixed returns, with direct application to option strategies. Using the convex risk measure relative entropy, GEPO presents a robust method for evaluating the risk of option strategy portfolios and gives the mathematical tools to make data-driven portfolio selection decisions to mitigate risk. GEPO is robust, non-parametric, and indifferent to non-normality, asymmetry and small sample size data, making it an ideal approach to the option strategy portfolio selection problem. We show how GEPO comfortably outperforms leading Kelly criterion strategies in choosing optimal portfolios of equity credit spreads over 2018, both absolutely and in terms of risk-adjusted performance via the GROUND ratio. GEPO has a wide range of applications including option strategies such as covered calls, married puts, credit spreads, straddles, strangles, butterfly spreads, iron condors, and more.

Materials and Methods
Equity option data were sourced from OptionMetrics (www.optionmetrics.com), provided by Wharton Research Data Services (WRDS) from the Wharton School of the University of Pennsylvania, found at wrds-www.wharton.upenn.edu.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: