1. Introduction and Brief Background
Portfolio construction is as much a timing problem as a financial instrument selection problem. Deciding what to invest in may be based on the historical rate of return of the assets, which is impacted by the interest the market is showing in the particular asset and, as a corollary, the liquidity that the assets will enjoy or not during the window of time one happens to have an interest in the given asset. This window of time is intimately connected to the acquisition and possession of the portfolio.
By way of introduction, we survey the broad range of approaches that have been investigated for portfolio volatility estimation and for the discovery of portfolio components, not only from an entropic perspective, in order to emphasise the pertinence and originality of our aim.
The connection between entropy and excess market returns was investigated by Maasoumi and Racine (2002). They found significant evidence of small nonlinear unconditional serial dependence within the returns, but no conclusive evidence of superior profit opportunity when using market-switching versus buy-and-hold strategies.
Dionisio et al. (
2007), questioning if the standard deviation is a good measure of risk and uncertainty, argued that entropy could present some advantages as a measure of uncertainty and simultaneously verify some basic assumptions of the portfolio management theory, namely the effect of diversification.
Liu and Chen (2012) applied type-2 fuzzy theory to the portfolio selection problem, in which the security returns are unknown and are characterised by type-2 fuzzy variables. Because the expectation and entropy of a type-2 fuzzy variable are not well defined, they first reduced the type-2 fuzzy variables and then proposed a mean-entropy model with the reduced variables for the portfolio selection problem. The intention was to transform the established mean-entropy model of reduced variables into equivalent parametric programming.
Ausloos (2000) argued that in classical thermodynamics, the entropy is necessarily coupled to a temperature and that the temperature is known to mimic the inverse of a relaxation time. Similarly, he proposed that it is reasonable to assume that financial market actors may consider relaxation times on the market and that the relaxation times differ in reality, reflecting the different perspectives that actors have regarding the market evolution. Nevertheless, the interaction to be considered is a self-interaction of the information asymmetry level of the share price (stock value), which itself is in some sense due to different appreciation conditions of the market by actors (Dhesi and Ausloos 2016).
To study portfolio diversity, Song and Chan (2020) proposed an adaptive entropy model, which incorporates entropy measurement and adaptability into the conventional Markowitz mean-variance model.
Mercurio et al. (2020) introduced a new family of portfolio optimization problems called return-entropy portfolio optimization (REPO) that simplifies the computation of portfolio entropy using a combinatorial approach.
Novais et al. (2022) proposed a portfolio-optimisation model that uses entropy and mutual information as risk measurements instead of variance and covariance. They experimented by comparing models that rely on mean-variance with counterparts based on mean-entropy using a stochastic entropy estimation. Their results showed that when increasing return constraints on portfolio optimisation, the mean-entropy models were more stable overall, exhibiting dampened responses in cumulative returns and Sharpe ratio in comparison to mean-variance methods.
According to the Capital Asset Pricing Model (CAPM), the only risk that investors should be compensated for is the risk that cannot be diversified away (Sharpe 1964; Ross 1976). Only systematic risk will command a risk premium. CAPM is calculated according to the following formula:
${R}_{x}={R}_{rf}+{\beta}_{x}\left({R}_{m}-{R}_{rf}\right)$
where:
${R}_{x}$: the expected return on security x
${R}_{rf}$: the risk-free rate
${\beta}_{x}$: the beta of security x
${R}_{m}$: the expected return of the market.
The difference $\left({R}_{m}-{R}_{rf}\right)$ represents the risk premium. While the risk-free rate ${R}_{rf}$ is pegged and readily available for a considered time interval, the expected return of the market ${R}_{m}$ is an estimate that may differ depending on the model used. There are also considerations related to the computation of returns on a stock mix, when short sales are permitted or not permitted (Lintner 1965), and then the risk-free rate ${R}_{rf}$ can play a relevant decisional role.
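As an illustration of the CAPM relation described above, the following minimal sketch computes the expected return of a security from the risk-free rate, the beta of the security, and the expected market return. The function name and the numerical inputs are hypothetical and are not taken from the study.

```python
def capm_expected_return(risk_free_rate: float, beta: float, market_return: float) -> float:
    """Expected return R_x = R_rf + beta_x * (R_m - R_rf)."""
    risk_premium = market_return - risk_free_rate
    return risk_free_rate + beta * risk_premium


# Illustrative inputs only (assumed values, not data from the paper).
expected = capm_expected_return(risk_free_rate=0.03, beta=1.2, market_return=0.08)
print(f"Expected return: {expected:.4f}")
```

With a beta of 0 the expected return collapses to the risk-free rate, and with a beta of 1 it equals the expected return of the market, which is the sense in which only systematic risk is rewarded.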
A market index, such as the S&P500, is not the entire market. The total market includes thousands of other traded stocks and, in a broader sense, bonds, real estate, commodities, options, and many other assets of all sorts, including one of the most important assets any of us has: the human capital built up by education, work, and life experience (Black and Litterman 1992; Malkiel [1975] 2020). There is also evidence that, when estimating the volatility of the stock market, a more inclusive instrument, in terms of the number of symbols taken into account for what constitutes the stock market, provides safer grounds for non-manipulative interpretations (Bhowmik and Wang 2020). For example, Saha et al. (2019) found that the movements in the daily levels of the VIX index are explained by market fundamentals and not by manipulation. They show that the VIX closing values and VIX futures settlement prices from 2008 are consistent with normal market forces and are not artificial.
Moreira and Muir (2017) proposed a volatility-managed portfolio strategy consisting of constructing portfolios that adjust monthly returns by the inverse of their previous month’s realised variance, thus decreasing risk exposure when variance was recently high and vice versa. They documented that this trading strategy earns large alphas across a wide range of asset pricing factors, suggesting that investors can benefit from volatility timing.
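The scaling idea behind such a volatility-managed strategy can be sketched as follows. This is only an illustration under stated assumptions: the function name, the constant `scale`, and the sample numbers are hypothetical, and the sketch is not the implementation used by the cited authors.

```python
def volatility_managed_returns(monthly_returns, realized_variances, scale=1.0):
    """Scale each month's return by scale / (previous month's realised variance).

    The first month is dropped because it has no preceding variance.
    """
    if len(monthly_returns) != len(realized_variances):
        raise ValueError("inputs must have the same length")
    managed = []
    for t in range(1, len(monthly_returns)):
        previous_variance = realized_variances[t - 1]
        managed.append(scale / previous_variance * monthly_returns[t])
    return managed


# Assumed sample data: exposure is halved after the high-variance month.
returns = [0.02, 0.01, -0.03]
variances = [0.002, 0.004, 0.001]
managed = volatility_managed_returns(returns, variances, scale=0.002)
print(managed)
```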
Volatility estimates of a portfolio of stocks need to be considered contextually. Volatility estimated through the standard deviation or variance of asset prices over time may indicate the dispersion of prices around the mean calculated for a given time interval, but it is the performance relative to other assets on the market, or against the market as a whole, that matters more when it comes to portfolio selection. For example, Carr and Wu (2009) used a large option data set to synthesise variance swap rates and investigate the historical behaviour of variance risk premiums on five stock indexes and 35 individual stocks.
Cejnek and Mair (2021) implemented timing regressions and related returns of a volatility-managed portfolio to discount rate, cash flow, and expected volatility, providing evidence that volatility management outperforms by levering up good times without increasing downside exposure to fundamental risk drivers.
Fama and French (
1992) divided all traded stocks into deciles according to their beta measures over the 1963–1990 period. They concluded that there is no relationship between beta and return. Additionally, small firms tended to outperform larger firms with the same beta levels. Therefore, size is a risk factor that deserves to be compensated for with additional return.
Castellano and Cerqueti (2014) analysed a mean-variance optimal portfolio selection problem in the presence of risky assets characterized by low-frequency trading and therefore low liquidity. These attributes most often describe small market capitalization companies, which are not well known by investors and whose stocks, consequently, are not traded regularly and/or not in significant volumes. On the other hand, the Fama and French three-factor model (Fama and French 1992, 1993), which takes into account the beta relative to the market index, the capitalization of the company (size), and the market price versus book value as a ratio, shows that the smaller firms are relatively risky. There is evidence (Fama and French 1995, 1996) that returns are higher for stocks with lower price-to-book ratios and smaller sizes (market capitalization).
Previous work shows that average returns on common stocks are related to firm characteristics such as size, earnings/price, cash flow/price, book-to-market equity, past sales growth, long-term past return, and short-term past returns. Additional factors have been proposed to extend the Fama–French three-factor model (Malkiel [1975] 2020):
 (a) A momentum factor to capture the tendency for rising or falling stocks to continue moving in the same direction.
 (b) A liquidity factor to reflect that investors need to be paid a return premium as an incentive to hold illiquid assets.
 (c) Quality of the company, as measured by such indicators as the stability of its earnings, sales growth, and its low amount of debt.
Furthermore, stock returns can be sensitive to general market swings, changes in interest and inflation rates, changes in national income, exchange rates, and other economic factors. Investigating the aggregate volatility risk factor, Barinov (2012) proposed the hypothesis that small growth firms and equity issuers are used by portfolio managers to hedge against aggregate volatility risk.
The ideas that Markowitz (1959) laid down in his seminal monograph on portfolio selection sparked an entire wave of emulation in academia and among practitioners, and they have been continuously refined in concrete implementations, re-evaluated, and extended.
Wang and Xia (
2002) discussed the Markowitz model and its modifications, as well as the related models based on different criteria for risk and return, but which share the same feature as the Markowitz model, namely that there is an underlying probability distribution for changes in the stock market. They considered models in which a decision does not rely on probability distributions on stock movement, though such information may still be used.
There have been attempts to estimate the volatility of the portfolio without using an estimation of a volatility matrix (the volatilities of the individual assets in the portfolio and their correlations), although the approach estimates stochastic volatility and its volatility (
Alghalith 2016).
Depending on how one measures the market, different beta measures can be obtained. One may search for low-beta stocks with returns as attractive as those of the market as a whole but with much less risk, or collect high-return stocks with a beta on par with the market index. Traditional betas refer to the index of the stock market, as broad as it can be, and the beta of the market is defined as having a value of 1. In the case of CSIE market volatility estimates, a market index such as the S&P500 can have a beta against the entire market between 0.1 and 0.5, which is significantly lower than 1, the beta of the entire stock market.
Malkiel ([1975] 2020) refers to smart betas as indicators that are intended to identify the possibility of gaining excess returns (greater than the market) by using a variety of relatively passive rules-based investment strategies that involve no more risk than would be assumed by investing in a low-cost total stock market index fund.
Betas for individual stocks are not stable over time and are very sensitive to the market proxy against which they are measured. Tracking them against the volatility of market indices can lead to different results than using the CSIE volatility estimates of the entire market.
González-Urteaga and Rubio (2016) investigated the determinants of the cross-sectional variation of the average volatility risk premia for a representative set of portfolios sorted by volatility risk premium beta, explaining why the volatility risk premia are different across assets.
Price and quantity have been the two fundamental components of any human trade activity since the beginning of time. One buys or sells a certain quantity of a given good at a certain price, in the belief that this is the right deal under the given market conditions. If one considers the price to be unjustified, one does not enter the trade, and history does not record anything. Alternatively, one enters the deal at a certain price level and for a certain quantity because one believes that this is the adequate quantity to trade at that price level. In other words, if one is not entirely convinced or satisfied with the price, then the traded quantity reflects the level of trust in the considered price level.
This paper contributes to portfolio volatility estimation with an additional quantitative instrument to assist portfolio selection, based on asset volatility relative to market indices and the volatility of the market as a whole. Our study on intrinsic entropy does not necessarily aim to identify a sole means to assist portfolio selection, but rather to:
 (a) Make use of a comprehensive cross-sectional volatility estimator, constructed taking into account all the symbols listed and traded on a given market;
 (b) Identify a subset (portfolio) of symbols built based on the rate of returns and the betas relative to the volatility of the market as a whole for various time frames and intervals of historical data.
To the best of our knowledge, cross-sectional intrinsic entropy (CSIE) is the only cross-sectional volatility estimator that:
 – takes into account all the listed and traded symbols of a given market;
 – includes in the model not only the daily OHLC prices but also the traded volume.
The intrinsic entropy (IE) volatility estimator possesses two peculiar features, compared to the variance-based volatility estimators [1]:
 i. It takes into account the traded volume, in addition to the price data, bringing in additional insight regarding the market inclination.
 ii. It is a signed volatility estimator:
 (a) high positive values of IE are associated with a preponderant market buy;
 (b) while high negative values of IE are associated with a preponderant market sale.
Since market indices started to be traded on the exchanges as regular securities, they have become one of the most, if not the most, sought-after assets in portfolios by individual and institutional investors alike, investment funds, pension funds, etc. The attractiveness of stock market indices is rightly justified by their relatively broad base of constituents and, together with this, a historically proven lower exposure to risk compared to the market as a whole (Vințe and Ausloos 2022). It is as if portfolio selection is already solved by owning a single asset that offers exposure to many stocks. There is still a significant drawback to not owning the actual stocks, namely not benefiting from the dividends the issuing companies may pay annually, but the lower risk associated with the market indices can be an attractive enough compensation for many investors. Additionally, diversification is always desirable, since no single asset, not even an exchange-traded stock market index, can offer full coverage concerning market volatility.
In such a framework, the research questions of our study are the following.
 i. For any given interval of time, can at least two symbols, traded on the market, be identified that have a combined risk equal to or lower than that provided by the volatility estimates of the market index, and with a higher rate of return?
 ii. If multiple symbols satisfy these constraints, can we algorithmically discover all of them?
The remainder of the article is organised as follows.
Section 2, Materials and Methods, presents the portfolio volatility estimation based on the cross-sectional intrinsic entropy (CSIE) as the volatility estimator of a set of stocks and of the stock market as a whole, along with the intrinsic entropy (IE) as the volatility estimator of market indices based on longitudinal data. The methodology for computing both CSIE and IE is intimately related to the format in which the market data are available and to how they are preprocessed and structurally reorganised to allow efficient computation. Therefore, the input data and the way they are organised are also presented in Section 2, together with the algorithm for the calculation of the conditional betas. Section 3, Results, presents the results obtained, which set the premises for Section 4, Discussions, concerning the traits of the stocks that exhibit lower risk than the market index and, at the same time, move in the same direction as the index, having a positive conditional beta relative to the entire stock market. There, we also discuss the limitations of our study and the directions for future research. Section 5, Conclusions, summarises the outcome of our present investigation.
2. Materials and Methods
According to Markowitz (1952, 1959), calculating the volatility estimate of a given portfolio S of m assets $\left\{{x}_{1},{x}_{2},\dots ,{x}_{m}\right\}$ takes into account the weight of each constituent in the overall value of the portfolio and the covariances between any pair of assets. The volatility estimate is provided by the rate-of-return variance of the portfolio constituents over a given time frame, say, n days. It is worth noting that, while the mean-variance formulation by Markowitz offers the basis for modern portfolio selection analysis in a single period, an analytical optimal solution to the mean-variance formulation in multi-period portfolio selection has been investigated as well (Li and Ng 2001).
where w is the vector of weights, that is, how much of the total value of the portfolio is allocated to each asset.
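If we denote by ${p}_{ij}$ the price of asset ${x}_{j},j\in \left[1,m\right]$ on day $i\in \left[1,n\right]$, then the matrix of prices for all the assets considered in the portfolio over the interval of n days is the following.
$\left(\begin{array}{cccc}{p}_{11}& {p}_{12}& \dots & {p}_{1m}\\ {p}_{21}& {p}_{22}& \dots & {p}_{2m}\\ \vdots & \vdots & \ddots & \vdots \\ {p}_{n1}& {p}_{n2}& \dots & {p}_{nm}\end{array}\right)$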
The price ${p}_{ij}$ of asset ${x}_{j},j\in \left[1,m\right]$ on day $i\in \left[1,n\right]$ is usually taken to be the closing price of the day. The vector of price averages in the interval is considered for all components of the portfolio.
Then the covariance matrix $Cov\left(S\right)$ of the portfolio S is calculated as follows.
If we calculate the differences between daily prices and the interval average for each asset, matrix-wise, we obtain the following.
The constituent covariances and volatility of the portfolio become:
where
n is the number of days in the considered time interval.
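The computation outlined in this passage (interval averages, deviations from the average, covariance matrix, and weighted portfolio variance) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function names and the sample matrix are hypothetical, and the divisor n − 1 (sample covariance) is an assumption, since the normalisation used in the original relations is not reproduced here.

```python
from math import sqrt


def covariance_matrix(observations):
    """Covariance of the m columns of an n x m matrix (n days, m assets).

    Assumption: sample covariance with divisor n - 1.
    """
    n, m = len(observations), len(observations[0])
    means = [sum(row[j] for row in observations) / n for j in range(m)]
    # Differences between daily values and the interval average, per asset.
    deviations = [[row[j] - means[j] for j in range(m)] for row in observations]
    return [
        [sum(deviations[i][a] * deviations[i][b] for i in range(n)) / (n - 1)
         for b in range(m)]
        for a in range(m)
    ]


def portfolio_variance(weights, cov):
    """Quadratic form w' Cov w over all pairs of assets."""
    m = len(weights)
    return sum(weights[a] * cov[a][b] * weights[b]
               for a in range(m) for b in range(m))


# Assumed sample data: 3 days, 2 assets, equal weights.
prices = [[10.0, 20.0], [11.0, 19.0], [12.0, 21.0]]
cov = covariance_matrix(prices)
variance = portfolio_variance([0.5, 0.5], cov)
print(cov, variance, sqrt(variance))
```

In practice the same function would be applied to a matrix of rates of return rather than raw prices when the rate-of-return variance is the quantity of interest.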
In the context of the cross-sectional intrinsic entropy (CSIE) model, we consider end-of-day (EOD) data containing the daily open, high, low and close (OHLC) prices along with the traded quantity (volume) of each market-listed symbol that may be selected in a set as a portfolio constituent.
Historical EOD data are sourced from https://www.eoddata.com/ (accessed on 5 February 2023) and consist of daily files, each containing the OHLC prices and the traded volume for every stock listed on the market and traded on the given day. The collections of over 5500 files for each of the markets considered in the present study, the NYSE and the NASDAQ, are processed to obtain a multidimensional array in memory that allows longitudinal access to the time series and cross-sectional access to the daily EOD data of the entire market. Therefore, making use of historical daily OHLC prices and volume for a period of more than 21 years, from 1 January 2001 to 28 October 2022, the data are organised in a multidimensional array, having as entry point a matrix $X$ of 5647 rows, the number of days of daily data, and 3321 columns, the listed symbols as of 28 October 2022, for the NYSE. Correspondingly, the matrix $X$ for the NASDAQ has 5643 rows, the number of days of daily data, and 4937 columns, the listed symbols as of 28 October 2022.
For each symbol $j,j\in \left[1,m\right]$ listed and traded on the day $i,i\in \left[1,n\right]$ we have available a 5-tuple ${x}_{ij}$ of values that provide a daily informational depth.
${x}_{ij}=\left({x}_{ij}^{O},{x}_{ij}^{H},{x}_{ij}^{L},{x}_{ij}^{C},{x}_{ij}^{V}\right)$
where:
 – ${x}_{ij}^{O}$: the open price (O) of symbol j on day i;
 – ${x}_{ij}^{H}$: the high price (H) of symbol j on day i;
 – ${x}_{ij}^{L}$: the low price (L) of symbol j on day i;
 – ${x}_{ij}^{C}$: the close price (C) of symbol j on day i;
 – ${x}_{ij}^{V}$: the traded volume (V) of symbol j on day i.
The values $\left\{{m}_{1},{m}_{2},{m}_{3},\dots ,{m}_{i},\dots ,{m}_{n}\right\}$ are the numbers of symbols listed and traded on the market on the corresponding day $i\in \left[1,n\right]$ and $m=\mathrm{max}\left({m}_{i}\right),\forall i\in \left[1,n\right].$
We point out that, since the number of listed symbols on both the NYSE and the NASDAQ markets has changed over time, generally exhibiting an ascending trend, the matrix $X$, which contains all the listed symbols on a given market, is fairly sparse. To give a rough order of magnitude, the sparsity of $X$ can exceed 50%, while $X$ has over 18.75 million cells for the NYSE market and more than 27.85 million cells for the NASDAQ. Additionally, each cell ${x}_{ij}$ in matrix $X$ stores a tuple of 5 values, see relation (14). It is worth noting that research motivated by arbitrage pricing theory in finance has been conducted to reduce dimensionality and to estimate the covariance matrix through a multifactor model (Fan et al. 2008, 2016; Fan and Kim 2018) or to estimate the large integrated volatility matrix without using co-volatilities of illiquid assets (Fan and Kim 2019).
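One possible in-memory layout for such a sparse matrix of 5-tuples is sketched below. It is only an illustration: the class `EodRecord`, the helper functions, and the sample values are hypothetical, and the use of `None` for days on which a symbol is not listed or traded is an assumption about how empty cells might be represented.

```python
from typing import List, NamedTuple, Optional


class EodRecord(NamedTuple):
    """Hypothetical container for one cell x_ij: daily OHLC prices and volume."""
    open: float
    high: float
    low: float
    close: float
    volume: float


# Rows are trading days, columns are symbols; None marks a symbol that was
# not listed or not traded on that day (the source of the sparsity).
Matrix = List[List[Optional[EodRecord]]]


def cross_section(matrix: Matrix, day: int) -> List[EodRecord]:
    """All records available on one day (the daily view of the whole market)."""
    return [cell for cell in matrix[day] if cell is not None]


def time_series(matrix: Matrix, symbol: int) -> List[Optional[EodRecord]]:
    """One symbol across all days (the longitudinal view)."""
    return [row[symbol] for row in matrix]


def sparsity(matrix: Matrix) -> float:
    """Share of empty cells in the matrix."""
    total = sum(len(row) for row in matrix)
    empty = sum(1 for row in matrix for cell in row if cell is None)
    return empty / total


# Assumed sample data: 2 days, 3 symbols, 2 empty cells.
X: Matrix = [
    [EodRecord(10.0, 10.5, 9.8, 10.2, 1000.0), None, EodRecord(5.0, 5.1, 4.9, 5.0, 200.0)],
    [EodRecord(10.2, 10.4, 10.0, 10.1, 800.0), EodRecord(20.0, 21.0, 19.5, 20.5, 50.0), None],
]
print(len(cross_section(X, 0)), sparsity(X))
```

At the scale reported above, a dense list of lists would waste memory on empty cells, so a production implementation would more plausibly rely on a dedicated array or sparse structure.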
The daily total traded value, considered at the end of each trading day i, is given by the following relation:
Therefore, the daily ratio of individual symbols in the overall traded value ${S}_{i}$ is defined by:
where ${m}_{i}$ is the number of symbols listed and traded on the market on the corresponding day i. If we denote ${\lambda}_{ij}={x}_{ij}^{C}{x}_{ij}^{V},$ then
Such ratios ${\psi}_{ij}$ denote the portion of the traded value ${\lambda}_{ij}$ corresponding to the symbol j on day i in the overall value traded on day i, or the total amount of money ${\lambda}_{i}$ exchanged on the market for the day $i\in \left[1,n\right]$.
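The daily traded value and the per-symbol ratios described above can be sketched as follows, taking the traded value of a symbol as its close price multiplied by its traded volume. The function name and the sample data are hypothetical.

```python
def traded_value_shares(day_records):
    """Return the total traded value of a day and each symbol's share of it.

    day_records: list of (close_price, volume) pairs, one per symbol that was
    listed and traded on that day.
    """
    values = [close * volume for close, volume in day_records]
    total = sum(values)
    if total == 0:
        raise ValueError("no traded value on this day")
    return total, [value / total for value in values]


# Assumed sample day with three traded symbols.
total, shares = traded_value_shares([(10.0, 100), (20.0, 50), (5.0, 400)])
print(total, shares)
```

By construction the shares of a day are non-negative and sum to 1, which is what allows them to be treated as a discrete distribution in an entropy-based estimator.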
With the above notation, the cross-sectional intrinsic entropy (CSIE) (Vințe and Ausloos 2022) of a set of symbols on a given day is:
The components ${H}_{i}^{OC}$ and ${H}_{i}^{OLHC}$ are defined as follows:
The value of ${f}_{i}$ from Equation (21) is consistent with the determination by Yang and Zhang (2000). In their influential paper on drift-independent volatility estimation using OHLC prices, they searched for an equivalent value of ${f}_{i}$, see Equation (18), for which the variance of the volatility estimator reaches the minimum. Based on the work of Rogers and Satchell (1991) and Rogers et al. (1994), who showed that $\alpha \le 2$ by using the triangle inequality, Yang and Zhang calculated that $\alpha \le 1.5$ for all drifts. To optimize their volatility estimator for situations exhibiting a small drift, Yang and Zhang suggested setting $\alpha =1.34$ in practice. Since the significance of the terms ${H}_{i}^{OC}$ and ${H}_{i}^{OLHC}$ is similar to that of ${V}_{OC}$ and ${V}_{RS}$ from the Yang–Zhang volatility estimator, we followed the same rationale for using $\alpha =1.34$ to calculate the weight ${f}_{i}$.
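For orientation, the weight used in the Yang–Zhang estimator for a window of n observations is k = (α − 1) / (α + (n + 1)/(n − 1)). The sketch below evaluates this expression with α = 1.34; it illustrates the Yang–Zhang weight only, and the exact indexing of the weight ${f}_{i}$ in the CSIE model may differ. The function name is hypothetical.

```python
def yang_zhang_weight(n_observations: int, alpha: float = 1.34) -> float:
    """Yang-Zhang weight k = (alpha - 1) / (alpha + (n + 1) / (n - 1))."""
    if n_observations < 2:
        raise ValueError("at least two observations are required")
    ratio = (n_observations + 1) / (n_observations - 1)
    return (alpha - 1.0) / (alpha + ratio)


# The weight grows with the window length but stays below (alpha - 1) / (alpha + 1).
for n in (5, 10, 20, 250):
    print(n, round(yang_zhang_weight(n), 4))
```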
If we select a portfolio S of symbols from the entire market X and hold it for a given time span, then S is a subset of X in terms of both the number of symbols and the number of days for which the volatility of such a portfolio can be estimated. CSIE provides a daily volatility estimate for the entire market. To estimate the volatility of the market for a t-day interval, we calculate the moving averages of the CSIE over windows of w days.
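The smoothing step can be sketched as a simple moving average over a daily series. The function name and the sample series are hypothetical, and a plain (unweighted) average is assumed.

```python
def moving_average(series, window):
    """Simple moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    running_sum = sum(series[:window])
    averages = [running_sum / window]
    for k in range(window, len(series)):
        # Slide the window: add the newest value, drop the oldest one.
        running_sum += series[k] - series[k - window]
        averages.append(running_sum / window)
    return averages


# Assumed daily estimates for a 5-day interval, smoothed with a 3-day window.
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
print(smoothed)
```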
The contract for difference (CFD) that is offered by most online brokers to retail customers, as a means to buy and sell stocks, excludes from the stock return equation the stock dividend since the buyer of such a contract does not own the stock bearing the dividend.
Therefore, we define ${V}_{CSIE,t,w}^{market}$ as the volatility estimate of the entire market based on cross-sectional intrinsic entropy (CSIE), computed for t-day time intervals using rolling windows of w days.
Similarly, ${V}_{CSIE,t,w}^{S}$ is the volatility estimate of portfolio S, based on the CSIE of the portfolio, calculated for the t-day time interval using moving-average windows of w days.
The market index volatility estimate ${V}_{IE,t,w}^{index}$ is computed using intrinsic entropy (IE) based on time series data, using the same rolling windows of w days, within the time interval of t days (Vințe et al. 2021).
Additionally, the volatility estimate of an individual symbol, ${V}_{IE,t,w}^{{x}_{j}}$, is computed based on IE, using rolling windows of the same w days, within the t-day time interval.
With these notations, we introduce the following betas.
Algorithm 1 for computing the betas and selecting the symbols according to the dynamically imposed criteria is described as follows.
Algorithm 1. Compute the conditional betas and select the symbols according to the dynamically imposed criteria.
 1: Initialize the time interval for both CSIE and IE $\leftarrow $ t-day
 2: Initialize the CSIE moving averages $\leftarrow $ w-day
 3: Initialize the IE rolling windows $\leftarrow $ w-day
 4: Initialize an empty portfolio of stocks $S\leftarrow \left\{\right\}$
 5: Compute the return rate of the index ${R}_{index}$ in the t-day interval
 6: Compute CSIE market volatility estimates ${V}_{CSIE,t,w}^{market}$
 7: Compute market index IE volatility estimates ${V}_{IE,t,w}^{index}$
 8: Compute index ${\beta}_{t,w}^{index}=\frac{Cov\left({V}_{IE,t,w}^{index},{V}_{CSIE,t,w}^{market}\right)}{Var\left({V}_{CSIE,t,w}^{market}\right)}$
 9: for each stock ${x}_{j}$ traded on the market in the t-day interval do
 10:     Compute stock return rate ${R}_{{x}_{j}}$ in the t-day interval
 11:     Compute stock IE volatility estimates ${V}_{IE,t,w}^{{x}_{j}}$
 12:     Compute ${\beta}_{t,w}^{{x}_{j}}=\frac{Cov\left({V}_{IE,t,w}^{{x}_{j}},{V}_{CSIE,t,w}^{market}\right)}{Var\left({V}_{CSIE,t,w}^{market}\right)}$
 13:     if $\left({\beta}_{t,w}^{{x}_{j}}\le {\beta}_{t,w}^{index}\right)$ and $\left({R}_{{x}_{j}}\ge {R}_{index}\right)$ then
 14:         Add the stock to the portfolio $S\stackrel{+}{\leftarrow}\left\{{x}_{j}\right\}$
 15:     end if
 16: end for
 17: Compute portfolio ${V}_{CSIE,t,w}^{S}$
 18: Compute portfolio ${\beta}_{t,w}^{S}=\frac{Cov\left({V}_{CSIE,t,w}^{S},{V}_{CSIE,t,w}^{market}\right)}{Var\left({V}_{CSIE,t,w}^{market}\right)}$
 19: return ${\beta}_{t,w}^{S}$ of the discovered portfolio
Portfolio S will be populated with symbols that, in the given t-day time interval, have a risk lower than or equal to that of the market index and, at the same time, a rate of return equal to or higher than the one realized by the index.
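The selection loop of Algorithm 1 (steps 8 to 16) can be sketched as follows, assuming that the volatility series of the market, the index, and each stock have already been computed and aligned in time. All function names and sample values are hypothetical, the covariance uses the divisor n − 1 as an assumption, and steps 17 to 19 are omitted because they require the CSIE of the selected portfolio.

```python
def covariance(a, b):
    """Sample covariance of two equally long series (divisor n - 1, assumed)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def beta(volatility_series, market_volatility):
    """Beta of a volatility series relative to the market volatility series."""
    return (covariance(volatility_series, market_volatility)
            / covariance(market_volatility, market_volatility))


def select_symbols(stock_vols, stock_returns, index_vol, index_return, market_vol):
    """Keep the stocks with beta <= index beta and return >= index return."""
    beta_index = beta(index_vol, market_vol)
    selected = []
    for symbol, vols in stock_vols.items():
        if beta(vols, market_vol) <= beta_index and stock_returns[symbol] >= index_return:
            selected.append(symbol)
    return beta_index, selected


# Assumed toy inputs: one market series, one index series, three stocks.
market_vol = [1.0, 2.0, 3.0, 4.0]
index_vol = [0.5, 1.0, 1.5, 2.0]
stock_vols = {
    "A": [0.2, 0.4, 0.6, 0.8],   # low beta, return above the index
    "B": [1.0, 2.0, 3.0, 4.0],   # beta above the index beta
    "C": [0.1, 0.2, 0.3, 0.4],   # low beta, return below the index
}
stock_returns = {"A": 0.10, "B": 0.20, "C": -0.05}
beta_index, selected = select_symbols(stock_vols, stock_returns, index_vol, 0.05, market_vol)
print(beta_index, selected)
```

In this toy example only stock A satisfies both constraints, mirroring the way portfolio S is populated.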
3. Results
To answer the posed research questions, we study various time intervals, in particular tumultuous periods, characterised by intense market volatility and downturns. Since the aim of our current study is not to optimise portfolio allocation, the weights of all constituents of the portfolio are considered equal. The strategy of buying and holding for a medium to long period is consistent with the type of trading in the market indices.
We first present the results obtained for three intervals of time with different time spans as follows.
 – 125-day trading interval, from 1 March 2022 to 26 August 2022;
 – 250-day trading interval, from 5 October 2020 to 20 December 2021;
 – 950-day trading interval, from 2 April 2018 to 26 August 2022.
The intention is to investigate the most recent developments in the stock market: the downturn that started in the spring of 2022, the entire period of lockdowns and uncertainties in the labour market from the fall of 2020 through the whole of 2021, and the broader perspective provided by the period of the last 4 years, which covers the SARS-CoV-2 pandemic. Based on the preliminary observations drawn from these three intervals of time, we proceed to study the market on an annual basis from 2001 to 2021. We note that even the year-based study is not designed to be exhaustive but rather to showcase a series of evenly divided time intervals that can be easily followed and associated with events that marked the financial-economic evolution over the overall time span. For concrete forecasting purposes, exhaustive combinations of t-day intervals and moving averages of the CSIE over various windows of w days should be considered further. To avoid cluttering the graphical representation, we limit the number of stock symbols to the 15 least risky ones whenever more than 15 companies have stocks that satisfy the constraints.
Figure 1 shows the 15 least risky stock symbols discovered in an interval of 125 trading days, from 1 March 2022 to 26 August 2022, that have a return rate higher than the NYSE S&P500 index (the vertical line for the return rate −5.77%) and a beta, relative to the entire NYSE market, lower than the one exhibited by the S&P500 index (the horizontal line for beta 0.0642). In total, 314 stocks were identified that satisfy the constraints and have a positive beta. A rolling window of 10 days has been used. In other words, a set of symbols can be identified, more than one, that have a higher rate of return than the S&P500 market index at a maximum level of risk provided as a threshold represented here by the beta of the index relative to the entire NYSE market. We point out that while the index was down by 5.77% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 7.5%.
Figure 2 shows the 15 least risky stock symbols discovered in an interval of 250 trading days, from 5 October 2020 to 20 December 2021, to have a rate of return higher than the NYSE S&P500 index (the vertical line for the return rate of 23.79%) and a beta, relative to the entire NYSE market, lower than the one exhibited by the S&P500 index (the horizontal line for beta 0.0175). In total, 55 stocks were identified that meet the constraints and have a positive beta. A rolling window of 20 days has been used. We point out that while the index was up by 23.79% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 40%.
Figure 3 shows the 15 least risky stock symbols discovered in an interval of 950 trading days, from 2 April 2018 to 26 August 2022, to have a rate of return higher than the NYSE S&P500 index (the vertical line for the return rate of 48.29%) and a beta, relative to the entire NYSE market, lower than the one exhibited by the S&P500 index (the horizontal line for beta 0.0369). In total, 114 stocks were identified that meet the constraints and have a positive beta. A rolling window of 20 days has been used. It is worth noting that while the index was up by 48.29% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 75%.
Figure 4 shows the 15 least risky stock symbols discovered in an interval of 125 trading days, from 1 March 2022 to 26 August 2022, to have a rate of return higher than the NASDAQ Composite index (−10.28%) and a beta, relative to the entire NASDAQ market, lower than the one exhibited by the NASDAQ Composite index (0.0894). In total, 481 stocks were identified that meet the constraints and have a positive beta. A rolling window of 10 days has been used. In other words, a set of symbols, more than one, can be identified that have a higher rate of return than the NASDAQ Composite index at a maximum level of risk provided as a threshold, represented here by the beta of the index relative to the entire NASDAQ market. We point out that, while the index was down by 10.28% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 5.5%.
Figure 5 shows the 15 least risky stock symbols discovered in an interval of 250 trading days, from 5 October 2020 to 20 December 2021, to have a rate of return higher than the NASDAQ Composite index (17.30%) and a beta, relative to the entire NASDAQ market, lower than the one exhibited by the NASDAQ Composite index (0.0319). In total, 228 stocks were identified that meet the constraints and have a positive beta. A rolling window of 20 days has been used. We point out that while the index was up by 17.30% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 30%.
Figure 6 shows the 15 least risky stock symbols discovered in an interval of 950 trading days, from 2 April 2018 to 26 August 2022, to have a rate of return higher than the NASDAQ Composite index (67.52%) and a beta, relative to the entire NASDAQ market, lower than the one exhibited by the NASDAQ Composite index (0.0456). In total, 104 stocks were identified that meet the constraints and have a positive beta. A rolling window of 20 days has been used. It should be noted that, while the index increased by 57.52% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return well above 110%.
It should be noted that in the results previously presented, we only considered the stocks with a positive beta. There are two reasons for this:
 – First, since the betas of both market indices, the S&P500 and the NASDAQ Composite, were positive in the periods considered, we wanted to answer our research questions using stocks that follow the trend of the market: stocks that exhibit a lower beta than the corresponding index but still have a positive beta.
 – Second, the number of stocks with a lower beta than the corresponding index is even higher when stocks with a negative beta are included, and the graphical representation would lack clarity.
We point out that the stocks with the highest return rates in the investigated time intervals were those with a negative beta, that is, those that were inversely correlated with the market.
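The screening described for Figures 2 to 6 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the names `StockStats` and `screen_stocks` are hypothetical, the sample values are made up, and "least risky" is read here as "smallest beta first", which is an assumption.

```python
from typing import NamedTuple


class StockStats(NamedTuple):
    symbol: str
    ror: float   # rate of return over the interval, in percent
    beta: float  # beta relative to the market-wide volatility estimate


def screen_stocks(stocks, index_ror, index_beta, top_n=15, positive_beta_only=True):
    """Return up to top_n stocks that beat the index on return with a lower beta."""
    selected = [
        s for s in stocks
        if s.ror > index_ror
        and s.beta < index_beta
        and (s.beta > 0 or not positive_beta_only)
    ]
    # Assumption: "least risky" means the smallest beta comes first.
    selected.sort(key=lambda s: s.beta)
    return selected[:top_n]


# Made-up numbers, for illustration only.
universe = [
    StockStats("AAA", 45.0, 0.010),
    StockStats("BBB", 12.0, 0.005),   # return below the index: rejected
    StockStats("CCC", 60.0, 0.030),   # beta above the index: rejected
    StockStats("DDD", 80.0, -0.020),  # negative beta: kept only if allowed
    StockStats("EEE", 30.0, 0.002),
]
picked = screen_stocks(universe, index_ror=23.79, index_beta=0.0175)
print([s.symbol for s in picked])
```

Setting `positive_beta_only=False` admits the negative-beta stocks as well, which corresponds to the wider selection discussed above.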
To further study the possibility of diversification from market indices, we test the strategy of systematically buying and holding for one calendar year, for 21 years, between 2001 and 2021. It is a simple strategy that offers equidistant periods. The intervals are equally spaced and able to capture reasonably self-explanatory phenomena in terms of the economic forces at work. Within each calendar year, the performance of the S&P500 and NASDAQ Composite indices (the rate of return, and the beta relative to the CSIE of the NYSE and the NASDAQ, respectively) is computed and used as a reference for selecting the stocks that meet the constraints of having a risk lower than or equal to that of the index and a return rate higher than or at least equal to that provided by the market index.
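As a rough illustration of the calendar-year buy-and-hold reference, the sketch below computes the yearly rate of return from a series of daily closes. The function name `yearly_returns` is hypothetical, the prices are made up, and buying at the first close of the year and selling at the last is an assumption about how the holding period is delimited.

```python
from datetime import date


def yearly_returns(closes):
    """Map each calendar year to its buy-and-hold return in percent.

    closes: iterable of (date, closing_price) pairs, in any order.
    Assumption: buy at the first close of the year, sell at the last.
    """
    by_year = {}
    for day, price in sorted(closes):
        # Keep the first price seen for the year and update the last one.
        first, _ = by_year.get(day.year, (price, price))
        by_year[day.year] = (first, price)
    return {
        year: (last / first - 1.0) * 100.0
        for year, (first, last) in by_year.items()
    }


# Made-up closing prices, for illustration only.
prices = [
    (date(2020, 1, 2), 100.0),
    (date(2020, 12, 31), 110.0),
    (date(2021, 1, 4), 110.0),
    (date(2021, 12, 31), 99.0),
]
print(yearly_returns(prices))
```

The same function can be applied to an index and to each stock, so that the yearly index return serves as the threshold for the selection.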
The synthetic results obtained for the NYSE market are organised in Table 1 and those for the NASDAQ in Table 2.
Table 1 presents the sets of NYSE stocks, selected annually, based on the rate of return of the S&P500 index and its beta relative to the CSIE of the market as a whole.
Table 2 presents the sets of NASDAQ stocks, selected annually, based on the rate of return of the NASDAQ Composite index and its beta relative to the CSIE of the market as a whole.
We point out that, to extract the best performers from the market, we also considered for portfolio selection the stocks that, while exhibiting a lower risk than the market index, had a negative beta. Additionally, the number of stocks with negative and positive beta is emphasised for each period.
The results presented in Table 1 and Table 2 show that the stock market reliably offers the resources needed for portfolio diversification. The best performers, in terms of rate of return (RoR), have consistently had a negative beta. Even the portfolio beta is consistently negative for each year in the 21-year study period. This signals that the stocks that performed better than the market index, in terms of return rate, had an inverse correlation with the market as a whole with respect to volatility.
4. Discussion
It has to be observed that, except for the year 2003 for the NASDAQ (the end of, and recovery after, the dot-com bubble burst), and the years 2017 and 2021 for the NYSE, the corresponding market index beta relative to the CSIE was consistently positive. This confirms that, in general, the S&P500 and NASDAQ Composite indices are representative of their corresponding markets.
On the other hand, the RoR provided by the best-performing stocks, or even by the selected portfolio in its entirety, along with the negative beta exhibited by the majority of the portfolio constituents, supports the hypothesis that higher returns can be obtained by investing in stocks that do not follow the market trend in terms of volatility.
The finding that a large number of stocks have a beta lower than that of the market index, relative to the CSIE market volatility estimates, and in negative territory may suggest a higher-risk proposition for portfolio selection. Thus, in order to answer our research question in comparable terms, we adjusted the selection algorithm by requiring the stock beta to be strictly positive as well:
for $j\in \left[1,m\right]$, where the beta of the stock ${x}_{j}$ is taken relative to the entire market volatility. The values $\left\{{m}_{1},{m}_{2},{m}_{3},\dots ,{m}_{i},\dots ,{m}_{t}\right\}$ are the numbers of symbols listed and traded on the market on the corresponding day $i$, and $m=\mathrm{max}\left({m}_{i}\right),\forall i\in $ the $t$-day time interval.
With this additional constraint, the number of constituent stocks in the selected portfolio is considerably reduced (see the values in the table columns "No. of symbols with positive beta").
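Combining the return and risk constraints stated earlier with the strict positivity requirement, the adjusted selection rule can be summarised compactly. The symbols ${\beta }_{j}$, ${\beta }_{index}$, $Ro{R}_{j}$ and $Ro{R}_{index}$ are notation assumed here for illustration only and may differ from the notation used elsewhere in the paper:

$$0<{\beta }_{j}\le {\beta }_{index}\quad \text{and}\quad Ro{R}_{j}\ge Ro{R}_{index},\qquad j\in \left[1,m\right]$$

When ${\beta }_{index}$ itself is not positive, no ${\beta }_{j}$ can satisfy the left-hand condition, so the selected set is empty for that period.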
Table 3 presents the sets of NYSE symbols, selected annually, based on the rate of return of the S&P500 index and its beta relative to the CSIE of the market as a whole. Only sets of stocks with a positive beta relative to the market CSIE are included.
Table 4 presents the sets of NASDAQ symbols, selected annually, based on the rate of return of the NASDAQ Composite index and its beta relative to the CSIE of the market as a whole. Only sets of stocks with a positive beta relative to the market CSIE are included. For years in which the beta of the market index was negative, the composite constraint made it impossible to find a stock with a beta lower than the beta of the index but still positive.
It should be noted that, even when portfolio selection is restricted to stocks that follow the volatility trend of the index and of the market as a whole, stocks can be identified that carry a lower risk relative to the market and provide significantly higher returns than the market index.
In most years, the average RoR of the portfolio of stocks identified through the proposed methodology is positive and a few times higher than the RoR provided by the market index. We underline the fact that, by construction of the constraint, all the stocks selected for the portfolio have an RoR higher than or at least equal to that realised by the market index.
We point out that our study does not consider more in-depth information about the sector in which the companies operate, their broader financial performance, experience in the business, workforce, etc. We perceive this as a potential limitation that would have to be addressed when selecting actual portfolios. Furthermore, given the relatively high number of stocks that could satisfy the selection criteria, we consider that an additional optimisation process would be required.
A volatility-managed portfolio that typically applies volatility-timing strategies to the stock market has been studied (Liu et al. 2019), only to discover that these strategies suffer from look-ahead bias, despite existing evidence on the success of the strategies at the stock level. The results of Liu et al. (2019) show that one cannot easily beat the market by timing the market alone. However, their study was grounded on variance-based volatility estimates. Volatility estimation based on the intrinsic entropy (IE) model for longitudinal data (Vințe et al. 2021) and the cross-sectional intrinsic entropy (CSIE) as volatility estimator for the market as a whole (Vințe and Ausloos 2022) benefit from the additional information provided by the traded volume, understood as the traction that the market gives to a certain price level, along with the signed volatility estimates: negative values indicate an inclination on the market to sell, and positive values suggest a predominant tendency to buy.
Additionally, it should be observed that the denominator of these betas represents the variance of the volatility estimates of the entire market ${V}_{CSIE,t,w}^{market}$, based on the cross-sectional intrinsic entropy (CSIE), and is therefore a volatility of volatility (VoV). The VoV is a fundamental quantity that describes the dynamics of volatility processes. However, Li et al. (2022) argue that it is far less well understood, and construct a nonparametric estimator of the VoV based on noisy high-frequency data with price jumps. The perspective that high-frequency data offer to intraday trading, for dynamically estimating portfolio volatility through the CSIE and its volatility, could represent a pertinent path for further research.
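To make the role of the denominator concrete, the sketch below computes a rolling beta of a stock's volatility estimates against the market-wide volatility estimates. It is an illustration under stated assumptions: the text above specifies only that the denominator is the variance of the market volatility estimates, so the covariance numerator (the usual form of a beta) is assumed here, the function names are hypothetical, and the input series are made up rather than produced by the IE or CSIE estimators.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def sample_cov(xs, ys):
    """Sample covariance of two equally long sequences (n - 1 denominator)."""
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def rolling_beta(stock_vol, market_vol, window):
    """Beta of stock volatility estimates relative to market volatility estimates.

    Assumed form: cov(stock, market) / var(market) over each rolling window.
    The denominator is the variance of the market volatility estimates,
    i.e. a volatility of volatility.
    """
    if len(stock_vol) != len(market_vol):
        raise ValueError("series must have the same length")
    if window < 2:
        raise ValueError("window must be at least 2")
    betas = []
    for end in range(window, len(market_vol) + 1):
        s = stock_vol[end - window:end]
        m = market_vol[end - window:end]
        var_m = sample_cov(m, m)
        betas.append(sample_cov(s, m) / var_m if var_m > 0 else float("nan"))
    return betas


# Made-up signed volatility estimates, for illustration only.
market = [0.10, -0.20, 0.30, 0.00, -0.10, 0.25]
follower = [2.0 * v for v in market]     # moves with the market, amplified
contrarian = [-1.0 * v for v in market]  # moves against the market
print(rolling_beta(follower, market, window=4))
print(rolling_beta(contrarian, market, window=4))
```

A series that moves against the market estimates yields a negative beta in every window, which is the situation described for the best performers in the tables.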