Portfolio Volatility Estimation Relative to Stock Market Cross-Sectional Intrinsic Entropy

Selecting stock portfolios and assessing their relative volatility risk compared to the market as a whole, market indices, or other portfolios is of great importance to professional fund managers and individual investors alike. Our research uses the cross-sectional intrinsic entropy (CSIE) model to estimate the cross-sectional volatility of the stock groups that can be considered together as portfolio constituents. In our study, we benchmark portfolio volatility risks against the volatility of the entire market provided by the CSIE and the volatility of market indices computed using longitudinal data. This article introduces CSIE-based betas to characterise the relative volatility risk of the portfolio against market indices and the market as a whole. We empirically prove that, through CSIE-based betas, multiple sets of symbols that outperform the market indices in terms of rate of return while maintaining the same level of risk or even lower than the one exhibited by the market index can be discovered, for any given time interval. These sets of symbols can be used as constituent stock portfolios and, in connection with the perspective provided by the CSIE volatility estimates, to hierarchically assess their relative volatility risk within the broader context of the overall volatility of the stock market.


Introduction and Brief Background
Portfolio construction is as much a timing problem as a financial instrument selection problem. Deciding what to invest in may be based on the historical rate of return of the assets, which is impacted by the interest the market is showing in the particular asset and, as a corollary, the liquidity that the assets will enjoy or not during the window of time one happens to have an interest in the given asset. This window of time is intimately connected to the acquisition and possession of the portfolio.
As an introduction, it seems fair to present the broad range of proposals that have been historically investigated to tackle the problem of portfolio volatility estimation and portfolio components discovery, not only from an entropic perspective, in order to emphasise the pertinence of our aim, and its "originality".
The connection between entropy and excess market returns was investigated by Maasoumi and Racine (2002). They found significant evidence of small nonlinear unconditional serial dependence within the returns, but not conclusive evidence of superior profit opportunity when using market-switching versus buy-and-hold strategies. Dionisio et al. (2007), questioning if the standard deviation is a good measure of risk and uncertainty, argued that entropy could present some advantages as a measure of uncertainty and simultaneously verify some basic assumptions of the portfolio management theory, namely the effect of diversification. Liu and Chen (2012) applied type-2 fuzzy theory to the portfolio selection problem, in which the security returns are unknown and are characterised by type-2 fuzzy variables. Not having the expectation and entropy of type-2 fuzzy variable well defined, they opted to reduce first the type-2 fuzzy variable and then propose a mean-entropy model with reduced variables to be applied to the portfolio selection problem. The intention was to transform the established mean-entropy model of reduced variables into equivalent parametric programming.
Ausloos (2000) argued that in classical thermodynamics, the entropy is necessarily coupled to a temperature and that the temperature is known to mimic the inverse of a relaxation time. Similarly, he proposes that it is reasonable to assume that financial market actors may consider relaxation times on the market and that the relaxation times differ in reality, reflecting different perspectives that actors have regarding the market evolution. Nevertheless, the interaction which will be considered will be a self-interaction of the information asymmetry level of the share price (stock value), which itself is in some sense due to different appreciation conditions of the market by actors (Dhesi and Ausloos 2016).
To study portfolio diversity, Song and Chan (2020) proposed an adaptive entropy model, which incorporates entropy measurement and adaptability into the conventional Markowitz mean-variance model. Mercurio et al. (2020) introduced a new family of portfolio optimization problems called return-entropy portfolio optimization (REPO) that simplifies the computation of portfolio entropy using a combinatorial approach. Novais et al. (2022) proposed a portfolio-optimisation model that uses entropy and mutual information as risk measurements instead of variance and covariance. They experimented by comparing models that rely on mean-variance with counterparts based on mean-entropy using a stochastic entropy estimation. Their results showed that when increasing return constraints on portfolio optimisation, the mean-entropy models were more stable overall, exhibiting dampened responses in cumulative returns and Sharpe ratio in comparison to mean-variance methods.
According to the Capital Asset Pricing Model (CAPM), the only risk that investors should be compensated for is the risk that cannot be diversified away (Sharpe 1964; Ross 1976). Only systematic risk will command a risk premium. CAPM is calculated according to the following formula:
$$E(R_x) = R_f + \beta_x \left( E(R_m) - R_f \right)$$
where:
− $E(R_x)$ is the expected return on security x;
− $R_f$ is the risk-free rate;
− $\beta_x$ is the beta of the security x;
− $E(R_m)$ is the expected return of the market.
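As a small numeric illustration, the CAPM relation (expected return equals the risk-free rate plus beta times the difference between the expected market return and the risk-free rate) can be sketched as follows. The function name and the input values are hypothetical and chosen only for the example.

```python
def capm_expected_return(risk_free: float, beta: float, market_return: float) -> float:
    """Expected return of a security under CAPM: R_f + beta * (E(R_m) - R_f)."""
    risk_premium = market_return - risk_free
    return risk_free + beta * risk_premium


# Hypothetical inputs: 2% risk-free rate, 8% expected market return, beta of 1.5.
expected = capm_expected_return(risk_free=0.02, beta=1.5, market_return=0.08)
print(f"expected return: {expected:.4f}")
```

A beta of 1 reproduces the expected market return, and a beta of 0 reproduces the risk-free rate.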
The difference $E(R_m) - R_f$ between the expected return of the market and the risk-free rate represents the risk premium. While the risk-free rate is pegged and readily available for a considered time interval, the expected return of the market is an estimate that may differ depending on the model used. There are also considerations related to the computation of returns on a stock mix, when short sales are permitted or not permitted (Lintner 1965), in which case the risk-free rate can play a relevant decisional role. A market index, such as the S&P500, is not the entire market. The total market includes thousands of other traded stocks and, in a broader sense, bonds, real estate, commodities, options, and many other assets of all sorts, including one of the most important assets any of us has: the human capital built up by education, work, and life experience (Black and Litterman 1992; Malkiel [1975] 2020). There is also evidence that, when estimating the volatility of the stock market, a more inclusive instrument, in terms of the number of symbols taken into account for what constitutes the stock market, provides safer grounds for non-manipulative interpretations (Bhowmik and Wang 2020). For example, Saha et al. (2019) found that the movements in the daily levels of the VIX index are explained by market fundamentals and not by manipulation. They show that the VIX closing values and VIX futures settlement prices from 2008 are consistent with normal market forces and are not artificial.
Moreira and Muir (2017) proposed a volatility-managed portfolio strategy consisting of constructing portfolios that adjust monthly returns by the inverse of their previous month's realised variance, thus decreasing risk exposure when variance was recently high and vice versa. They documented that this trading strategy earns large alphas across a wide range of asset pricing factors, suggesting that investors can benefit from volatility timing.
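The scaling idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the constant `c` that controls the average exposure, and the sample numbers are assumptions made for the example.

```python
def volatility_managed_returns(monthly_returns, realised_variances, c=1.0):
    """Scale the return of month t by c / (realised variance of month t-1).

    Exposure shrinks after a high-variance month and grows after a calm one.
    The first month is dropped because it has no preceding variance.
    """
    if len(monthly_returns) != len(realised_variances):
        raise ValueError("inputs must have the same length")
    managed = []
    for t in range(1, len(monthly_returns)):
        weight = c / realised_variances[t - 1]
        managed.append(weight * monthly_returns[t])
    return managed


# Hypothetical monthly returns and realised variances.
returns = [0.01, 0.02, -0.01]
variances = [0.0004, 0.0016, 0.0009]
print(volatility_managed_returns(returns, variances, c=0.0004))
```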
Volatility estimates of a portfolio of stocks need to be considered contextually. Volatility estimated through standard deviation or variance of asset prices over time may offer an indication of the dispersion of prices around the mean calculated for a given time interval, but it is the performance relative to other assets on the market, or against the market as a whole, that matters more when it comes to portfolio selection. For example, Carr and Wu (2009) used a large option data set to synthesise variance swap rates and investigate the historical behaviour of variance risk premiums on five stock indexes and 35 individual stocks. Cejnek and Mair (2021) implemented timing regressions and related returns of a volatility-managed portfolio to discount rate, cash flow, and expected volatility, providing evidence that volatility management outperforms by levering up good times without increasing downside exposure to fundamental risk drivers. Fama and French (1992) divided all traded stocks into deciles according to their beta measures over the 1963-1990 period. They concluded that there is no relationship between beta and return. Additionally, small firms tended to outperform larger firms with the same beta levels. Therefore, size is a risk factor that deserves to be compensated for with additional return. Castellano and Cerqueti (2014) analysed a mean-variance optimal portfolio selection problem in the presence of risky assets characterized by low-frequency trading and therefore low liquidity. These attributes most often describe small market capitalization companies, which are not well known by investors and whose stocks, consequently, are not traded regularly and/or not in significant volumes.
On the other hand, the Fama and French three-factor model (Fama and French 1992, 1993), which takes into account the beta relative to the market index, the capitalization of the company (size), and the market price versus book value as a ratio, shows that the smaller firms are relatively risky. There is evidence (Fama and French 1995, 1996) that returns are higher for stocks with lower price-to-book ratios and smaller sizes (market capitalization).
Previous work shows that average returns on common stocks are related to firm characteristics such as size, earnings/price, cash flow/price, book-to-market equity, past sales growth, long-term past return, and short-term past returns. Additional factors to the Fama-French three-factor model have also been proposed (Malkiel [1975] 2020): (a) a momentum factor to capture the tendency for rising or falling stocks to continue moving in the same direction; (b) a liquidity factor to reflect that investors need to be paid a return premium as an incentive to hold illiquid assets; (c) the quality of the company, as measured by such indicators as the stability of its earnings, sales growth, and its low amount of debt.
Furthermore, stock returns can be sensitive to general market swings, changes in interest and inflation rates, changes in national income, exchange rates, and other economic factors. Investigating the aggregate volatility risk factor, Barinov (2012) proposes the hypothesis that small growth firms and equity issuers are used by portfolio managers to hedge against aggregate volatility risk. Markowitz (1959) ideas laid down in the seminal monograph on portfolio selection sparked an entire wave of emulation in academia and among practitioners, being continuously perfected in concrete implementations, re-evaluated, and extended. Wang and Xia (2002) discussed the Markowitz model and its modifications, as well as the related models based on different criteria for risk and return, but which share the same feature as the Markowitz model, namely that there is an underlying probability distribution for changes in the stock market. They considered models in which a decision does not rely on probability distributions on stock movement, though such information may still be used.
There have been attempts to estimate the volatility of the portfolio without using an estimation of a volatility matrix (the volatilities of the individual assets in the portfolio and their correlations), although the approach estimates stochastic volatility and its volatility (Alghalith 2016).
Depending on how one measures the market, different beta measures can be obtained. One may search for low-beta stocks with returns as attractive as those of the market as a whole but with much less risk, or collect high-return stocks with a beta on par with that of the market index. Traditional betas refer to the index of the stock market, as broad as it can be, and the beta of the market is defined as having a value of 1. In the case of CSIE market volatility estimates, a market index such as S&P500 can have a beta against the entire market between 0.1 and 0.5, which is significantly lower than 1, which would be the beta of the entire stock market. Malkiel ([1975] 2020) refers to smart betas as indicators that are intended to identify the possibility of gaining excess returns (greater than the market) by using a variety of relatively passive rules-based investment strategies that involve no more risk than would be assumed by investing in a low-cost total stock market index fund.
Betas for individual stocks are not stable over time and are very sensitive to the market proxy against which they are measured. Tracking them against the volatility of market indices can lead to different results than using the CSIE volatility estimates of the entire market. González-Urteaga and Rubio (2016) investigated the determinants of the cross-sectional variation of the average volatility risk premia for a representative set of portfolios sorted by volatility risk premium beta, explaining why the volatility risk premia are different across assets.
Price and quantity have been the two fundamental components of any human trade activity since the beginning of time. One buys or sells a certain quantity of a given good at a certain price, based on the credence that that is the right deal under the given market conditions. One does not enter the trade if one considers the price to be unjustified and the history does not record anything. Alternatively, one enters the deal at a certain price level and for a certain quantity because one believes that that is the adequate quantity that one would be willing to trade at that price level. In other words, if one is not entirely convinced or satisfied with the price, then the traded quantity reflects the level of trust in the considered price level.
This paper contributes to portfolio volatility estimation with an additional quantitative instrument to assist portfolio selection, based on asset volatility relative to market indices and the volatility of the market as a whole. Our study on intrinsic entropy does not necessarily aim to identify a sole means to assist portfolio selection, but rather (a) Make use of a comprehensive cross-sectional volatility estimator, constructed taking into account all the symbols listed and traded on a given market; (b) Identify a subset (portfolio) of symbols built based on the rate of returns and the betas relative to the volatility of the market as a whole for various time frames and intervals of historical data.
To our best knowledge, cross-sectional intrinsic entropy (CSIE) is the only cross-sectional volatility estimator that: − takes into account all the listed and traded symbols of a given market; − includes in the model not only the daily OHLC prices but also the traded volume.
The intrinsic entropy (IE) volatility estimator possesses two peculiar features, compared to the variance-based volatility estimators [1]: 1. It takes into account the traded volume, in addition to the price data, bringing in additional insight regarding the market inclination. 2. It is a signed volatility estimator: (a) high positive values of IE are associated with a preponderant market buy, while (b) high negative values of IE are associated with a preponderant market sale.
Since market indices started to be traded on the exchanges as regular securities, they became one of the most, if not the most, sought-after assets in portfolios by individual and institutional investors alike, investment funds, pension funds, etc. The attractiveness of stock market indices is rightly justified by their relatively broad base of constituents and, corroborated with this, a historically proven lower exposure to risk compared to the market as a whole (Vințe and Ausloos 2022). It is as if portfolio selection is already solved by owning a single asset that offers exposure to many stocks. There is still a significant drawback to not owning the actual stocks, namely not benefiting from the dividends the issuing companies may pay annually, but the lower risk associated with the market indices can be an attractive enough compensation for many investors. Additionally, diversification is always desirable, since not a single asset, not even an exchange-traded stock market index, can offer full coverage concerning market volatility.
In such a framework, the research questions of our study are the following.
i. For any given interval of time, can at least two symbols, traded on the market, be identified that have a combined risk equal to or lower than that provided by the volatility estimates of the market index, and with a higher rate of return?
ii. If multiple symbols satisfy these constraints, can we algorithmically discover all of them?
The remainder of the article is organised as follows. Section 2, Materials and Methods, presents the portfolio volatility estimation based on the cross-sectional intrinsic entropy (CSIE) as the volatility estimator of a set of stocks and for the stock market as a whole, along with the intrinsic entropy (IE) as the volatility estimator of market indices based on longitudinal data. The methodology for computing both CSIE and IE is intimately related to the format in which the market data are available, how it is preprocessed, and structurally reorganised to allow efficient computation. Therefore, the input data and the way they are organised are presented in this Section 2, together with the algorithm for the calculation of the conditional betas. Section 3, Results, introduces the results obtained that will contour the premises in view of Section 4, Discussions, concerning the traits of the stocks that exhibit lower risk than the market index and, at the same time, move in the same direction as the index, having a positive conditional beta relative to the entire stock market. Here, we also discuss the limitations of our study and the delineation concerning future research. Section 5, Conclusions, summarises the outcome of our present investigation.

Materials and Methods
According to Markowitz (1952, 1959), calculating the volatility estimate of a given portfolio S of m assets $\{a_1, a_2, \dots, a_m\}$ takes into account the weight of each constituent in the overall value of the portfolio and the covariances between any pair of assets. The volatility estimate is provided by the rate-of-return variance of the portfolio constituents over a given time frame, say, n days. It is worth noting that, while the mean-variance formulation by Markowitz offers the basis for modern portfolio selection analysis in a single period, an analytical optimal solution to the mean-variance formulation in multiperiod portfolio selection has been investigated as well (Li and Ng 2001).
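$$\sigma_S^2 = w^{T}\, \mathrm{Cov}(S)\, w$$
where w is the vector of weights, or how much of the total value of the portfolio is allocated to each asset,
$$w = (w_1, w_2, \dots, w_m)^{T}, \qquad \sum_{j=1}^{m} w_j = 1.$$
If we notate with $p_{ij}$ the price of the asset $a_j$, $j \in \overline{1,m}$, on day $i \in \overline{1,n}$, then the matrix of prices for all the assets considered in the portfolio in the interval of n days is the following.
$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nm} \end{pmatrix}$$
The price $p_{ij}$ of asset $a_j$, $j \in \overline{1,m}$, on day $i \in \overline{1,n}$ is usually considered as being the closing price of the day. The vector of price averages in the interval is considered for all components of the portfolio.
$$\bar{p} = (\bar{p}_1, \bar{p}_2, \dots, \bar{p}_m), \qquad \bar{p}_j = \frac{1}{n} \sum_{i=1}^{n} p_{ij}$$
Then the covariance matrix $\mathrm{Cov}(S)$ of the portfolio S is calculated as follows.
$$\mathrm{Cov}(S) = \left( \sigma_{jk} \right)_{j,k \in \overline{1,m}}, \qquad \sigma_{jk} = \frac{1}{n} \sum_{i=1}^{n} (p_{ij} - \bar{p}_j)(p_{ik} - \bar{p}_k)$$
If we calculate the differences between daily prices and the interval average for each asset, matrix-wise, we obtain the following.
$$D = \left( p_{ij} - \bar{p}_j \right)_{i \in \overline{1,n},\; j \in \overline{1,m}}$$
and its transpose $D^{T}$ (9). The constituent covariances and volatility of the portfolio become:
$$\mathrm{Cov}(S) = \frac{1}{n} D^{T} D, \qquad \sigma_S^2 = \frac{1}{n}\, w^{T} D^{T} D\, w$$
where n is the number of days in the considered time interval.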
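The computation described above (price matrix, per-asset averages, deviations from the average, covariance matrix, and the weighted quadratic form) can be sketched in plain Python. This is an illustrative sketch only: the function names and the sample prices are hypothetical, and dividing by n rather than n - 1 is an assumption about the normalisation.

```python
def covariance_matrix(prices):
    """prices: n rows (days) by m columns (assets). Returns the m x m covariance matrix.

    Assumption: the divisor is n (population form); n - 1 is the usual sample alternative.
    """
    n, m = len(prices), len(prices[0])
    means = [sum(row[j] for row in prices) / n for j in range(m)]
    # Deviations of the daily prices from the interval average of each asset.
    dev = [[row[j] - means[j] for j in range(m)] for row in prices]
    return [
        [sum(dev[i][a] * dev[i][b] for i in range(n)) / n for b in range(m)]
        for a in range(m)
    ]


def portfolio_variance(weights, cov):
    """Quadratic form w^T Cov w for a weight vector that sums to 1."""
    m = len(weights)
    return sum(weights[a] * cov[a][b] * weights[b] for a in range(m) for b in range(m))


# Hypothetical closing prices: 3 days, 2 assets, equal weights.
prices = [[10.0, 20.0], [12.0, 18.0], [14.0, 22.0]]
cov = covariance_matrix(prices)
print(cov, portfolio_variance([0.5, 0.5], cov))
```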
In the context of the cross-sectional intrinsic entropy model (CSIE), we consider end-of-day (EOD) data containing daily open, high, low and close (OHLC) prices along with the traded quantity (volume) of each market-listed symbol that may be selected in a set as a portfolio constituent.
Historical EOD data are sourced from https://www.eoddata.com/ (accessed on 5 February 2023) and consist of a daily file containing OHLC prices and the traded volume for each stock listed on the market and traded on the given day. The collections of over 5500 files for each of the markets considered in the present study, the NYSE and the NASDAQ, are processed in such a way as to obtain a multidimensional array in memory that allows access longitudinally to the time series, and in cross-section to the daily EOD data of the entire market. Therefore, making use of historical daily OHLC prices and volume for a period of more than 21 years, from 1 January 2001 to 28 October 2022, the data are organised in a multidimensional array, having as entry point a matrix of over 5647 rows, as the number of days of daily data, and 3321 columns, listed symbols, as of 28 October 2022, for the NYSE. Correspondingly, the matrix for the NASDAQ has 5643 rows, as the number of days of daily data, and 4937 columns, listed symbols, as of 28 October 2022.
For each symbol $s_j$, $j \in \overline{1,m}$, listed and traded on the day $d_i$, $i \in \overline{1,n}$, we have available a 5-tuple of values that provide a daily informational depth.
$$x_{ij} = (o_{ij}, h_{ij}, l_{ij}, c_{ij}, v_{ij}) \qquad (14)$$
where $o_{ij}$, $h_{ij}$, $l_{ij}$ and $c_{ij}$ are the open, high, low and close prices, and $v_{ij}$ is the traded volume of symbol j on day i.
We point out that since the number of listed symbols on both the NYSE and the NASDAQ markets has changed over time, generally exhibiting an ascending trend, the matrix X which contains all the listed symbols on a given market is a fairly sparse matrix. The sparsity of matrix X can exceed 50%, to provide a rough magnitude level, in the context in which matrix X has over 18.75 million cells for the NYSE market and more than 27.85 million cells for the NASDAQ. Additionally, each cell in matrix X stores a tuple of 5 values, see relation (14). It is worth noting that research motivated by arbitrage pricing theory in finance has been conducted to reduce dimensionality and to estimate the covariance matrix through a multifactor model (Fan et al. 2008, 2016; Fan and Kim 2018) or to estimate the large integrated volatility matrix without using covolatilities of illiquid assets (Fan and Kim 2019).
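A minimal sketch of such a data layout is given below, assuming each daily file has already been parsed into a mapping from symbol to an (open, high, low, close, volume) tuple. The names `Bar`, `build_market_matrix` and `sparsity` are hypothetical, and missing cells are represented by `None`; the actual in-memory structure used in the study may differ.

```python
from typing import Dict, List, Optional, Tuple

Bar = Tuple[float, float, float, float, float]  # (open, high, low, close, volume)


def build_market_matrix(
    daily_files: List[Dict[str, Bar]],
) -> Tuple[List[str], List[List[Optional[Bar]]]]:
    """Rows are trading days, columns are symbols; None marks a symbol not traded that day."""
    symbols = sorted({symbol for day in daily_files for symbol in day})
    matrix = [[day.get(symbol) for symbol in symbols] for day in daily_files]
    return symbols, matrix


def sparsity(matrix: List[List[Optional[Bar]]]) -> float:
    """Fraction of empty cells in the days-by-symbols matrix."""
    cells = sum(len(row) for row in matrix)
    empty = sum(cell is None for row in matrix for cell in row)
    return empty / cells


# Hypothetical data: symbol BBB is listed only from the second day.
bar = (10.0, 11.0, 9.5, 10.5, 1000.0)
symbols, X = build_market_matrix([{"AAA": bar}, {"AAA": bar, "BBB": bar}])
cross_section = X[1]                    # all symbols on one day
time_series = [row[0] for row in X]     # one symbol across all days
print(symbols, sparsity(X))
```

A row gives the cross-section of the market on one day, and a column gives the longitudinal series of one symbol.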
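The daily total traded value, considered at the end of each trading day i, is given by the following relation:
$$TV_i = \sum_{j=1}^{m_i} c_{ij}\, v_{ij}$$
Therefore, the daily ratio of individual symbols in the overall traded value is defined by:
$$\rho_{ij} = \frac{c_{ij}\, v_{ij}}{\sum_{k=1}^{m_i} c_{ik}\, v_{ik}}$$
where $c_{ij}$ and $v_{ij}$ are the closing price and the traded volume of symbol j on day i, and $m_i$ is the number of the symbols listed and traded on the market on the corresponding day i. If we notate $tv_{ij} = c_{ij}\, v_{ij}$, then $\rho_{ij} = tv_{ij} / TV_i$, for $i \in \overline{1,n}$, $j \in \overline{1,m_i}$.
Such ratios denote the portion of the traded value corresponding to the symbol j on day i in the overall value traded on day i, or the total amount of money exchanged on the market for the day $i \in \overline{1,n}$.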
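The ratios of individual symbols in the daily traded value can be illustrated with a short sketch. It assumes that the traded value of a symbol is its closing price multiplied by its traded volume; the function name and the sample figures are hypothetical.

```python
def traded_value_ratios(closes, volumes):
    """Share of each symbol in the total value traded on one day.

    Assumption: traded value of a symbol = closing price * traded volume.
    Only symbols actually traded on that day should be passed in.
    """
    values = [close * volume for close, volume in zip(closes, volumes)]
    total = sum(values)
    if total <= 0:
        raise ValueError("total traded value must be positive")
    return [value / total for value in values]


# Hypothetical cross-section of three symbols on one trading day.
ratios = traded_value_ratios(closes=[10.0, 20.0, 5.0], volumes=[100, 50, 200])
print(ratios, sum(ratios))
```

By construction, the ratios of one day are non-negative and sum to 1, which is what allows them to play the role of a probability-like weight in an entropy expression.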
With the above notation, the cross-sectional intrinsic entropy (CSIE) (Vințe and Ausloos 2022) of a set of symbols on a given day is: The components and are defined as follows: The value of $k$ from Equation (21) is consistent with the determination by Yang and Zhang (2000). In their influential paper on drift-independent volatility estimation using OHLC prices, they searched for an equivalent value of $k$, see Equation (18), for which the variance of the volatility estimator reaches the minimum. Based on the work of Rogers and Satchell (1991) and Rogers et al. (1994), who showed that $\alpha \le 2$ by using the triangle inequality, Yang and Zhang calculated that $\alpha \le 1.5$ for all drifts. To optimize their volatility estimator for situations exhibiting a small drift, Yang and Zhang suggested setting $\alpha = 1.34$ in practice. Since the significance of the CSIE component terms is similar to that of the corresponding terms of the Yang-Zhang volatility estimator, we followed the same rationale for using $\alpha = 1.34$ to calculate the weight $k$.
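For orientation, the weight proposed by Yang and Zhang (2000) for their estimator has the closed form k = (α − 1) / (α + (n + 1)/(n − 1)), where n is the number of periods. The sketch below evaluates it for α = 1.34. Whether the CSIE weight uses exactly this expression, and what plays the role of n there, is an assumption of this example.

```python
def yang_zhang_k(n: int, alpha: float = 1.34) -> float:
    """Yang-Zhang weight k = (alpha - 1) / (alpha + (n + 1) / (n - 1)).

    n is the number of periods in the estimation window (must be at least 2).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return (alpha - 1.0) / (alpha + (n + 1.0) / (n - 1.0))


# Weight for hypothetical windows of 10 and 20 days.
print(yang_zhang_k(10), yang_zhang_k(20))
```

As n grows, (n + 1)/(n − 1) approaches 1 and the weight approaches 0.34/2.34 from below.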
If we select a portfolio S of symbols from the entire market X and hold it for a given time span, that means that S is a subset of X in terms of the number of symbols and the number of days for which the volatility of such portfolio can be estimated. CSIE provides a daily volatility estimate for the entire market. To estimate the volatility of the market for a t-day interval, we calculate the moving averages of the CSIE for appropriate windows of w-day.
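A moving average over w-day windows of a daily estimate series can be sketched as below. The function name and the sample values are hypothetical; the series may contain negative values, since the entropy-based estimates are signed.

```python
def moving_average(values, window):
    """Simple moving average: one value per complete window of `window` consecutive days."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one day: add the new value, drop the oldest one.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Hypothetical daily estimates over a 5-day interval with a 2-day window.
print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=2))
```

A t-day interval with a w-day window yields t − w + 1 averaged values.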
The contract for difference (CFD) that is offered by most online brokers to retail customers, as a means to buy and sell stocks, excludes from the stock return equation the stock dividend since the buyer of such a contract does not own the stock bearing the dividend.
Therefore, we define $V_{X,t,w}$ as being the volatility estimates of the entire market X, based on cross-sectional intrinsic entropy (CSIE) and computed for t-day time intervals based on rolling windows of w days.
Similarly, $V_{S,t,w}$ is the volatility estimate of portfolio S, based on the CSIE of the portfolio, calculated for the t-day time interval and moving-average windows of w days.
The market index volatility estimate $V_{I,t,w}$ is computed using intrinsic entropy (IE) based on time series data, using the same rolling windows of w days, within the time interval of t days (Vințe et al. 2021).
Additionally, the volatility estimate $V_{s,t,w}$ of an individual symbol s is computed based on IE, using rolling windows of the same w days, within the t-day time interval.
With these notations, we introduce the following betas.
; beta of the portfolio S, relative to the entire market volatility.
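The beta definitions are not reproduced in full in the text above. As a hedged illustration only, the sketch below uses the conventional form of a beta, the covariance between two series divided by the variance of the reference series, applied to windowed volatility estimates of an asset (a symbol, an index, or a portfolio) and of the entire market. That this conventional form is the one used for the CSIE-based betas is an assumption; the function name and sample series are hypothetical.

```python
def volatility_beta(asset_estimates, market_estimates):
    """Conventional beta: Cov(asset, market) / Var(market) over aligned estimate series."""
    n = len(market_estimates)
    if n != len(asset_estimates) or n < 2:
        raise ValueError("series must be aligned and contain at least two values")
    mean_asset = sum(asset_estimates) / n
    mean_market = sum(market_estimates) / n
    cov = sum(
        (a - mean_asset) * (m - mean_market)
        for a, m in zip(asset_estimates, market_estimates)
    ) / n
    var = sum((m - mean_market) ** 2 for m in market_estimates) / n
    if var == 0:
        raise ValueError("market series has zero variance")
    return cov / var


# Hypothetical series: the asset moves half as much as the market.
market = [1.0, 2.0, 3.0, 4.0]
asset = [0.5 * m + 1.0 for m in market]
print(volatility_beta(asset, market))
```

Under this form, the market has a beta of 1 against itself, and a series that moves against the market has a negative beta.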
Algorithm 1 for computing the betas and selecting the symbols according to the dynamically imposed criteria is described as follows.
The algorithm ends by returning the characteristics of the discovered portfolio. Portfolio S will be populated with symbols that, in the given t-day time interval, have a risk lower than or equal to that of the market index and, at the same time, a rate of return equal to or higher than the one realised by the index.
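The selection criteria stated above can be sketched as a simple filter. This is not a reproduction of Algorithm 1: it assumes that a beta and a rate of return have already been computed for each candidate symbol, and the function name, parameters, and candidate values are hypothetical. The index thresholds in the usage example are the beta (0.0642) and rate of return (−5.77%) quoted for the S&P500 in the 125-day interval discussed in the Results section.

```python
def select_portfolio(candidates, index_beta, index_return,
                     positive_beta_only=True, max_symbols=None):
    """Keep symbols with beta <= index beta and rate of return >= index return.

    candidates: dict mapping symbol -> (beta, rate_of_return).
    The result is sorted by ascending beta and optionally truncated.
    """
    selected = [
        (symbol, beta, rate)
        for symbol, (beta, rate) in candidates.items()
        if beta <= index_beta
        and rate >= index_return
        and (beta > 0 or not positive_beta_only)
    ]
    selected.sort(key=lambda item: item[1])
    return selected[:max_symbols]  # slicing with None keeps the whole list


# Hypothetical candidates: symbol -> (beta, rate of return).
candidates = {
    "AAA": (0.03, 0.08),
    "BBB": (0.10, 0.20),    # rejected: riskier than the index
    "CCC": (0.05, -0.10),   # rejected: return below the index
    "DDD": (-0.02, 0.30),   # rejected only when positive betas are required
    "EEE": (0.01, 0.075),
}
print(select_portfolio(candidates, index_beta=0.0642, index_return=-0.0577))
```

The `max_symbols` argument mirrors the choice of showing at most 15 symbols per figure.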

Results
To answer the posed research questions, we study various time intervals, in particular tumultuous periods, characterised by intense market volatility and downturns. Since the aim of our current study is not to optimise portfolio allocation, the weights of all constituents of the portfolio are considered equal. The strategy of buying and holding for a medium to long period is consistent with the type of trading in the market indices.
We first present the results obtained for three intervals of time with different time spans as follows. The intention is to investigate the most recent developments in the stock market, cover the downturn that started in the spring of 2022, including the entire period of lockdowns and uncertainties in the labour market from the fall of 2020 and the whole year 2021, along with a broader perspective provided by the period of the last 4 years, which covers the SARS-CoV-2 pandemic. Based on the preliminary observations drawn from these three intervals of time, we proceed to study the market on an annual basis from 2001 to 2021. We comment that even the year-based study is not designed to be an exhaustive one but rather to showcase a series of evenly divided time intervals that can be easily followed and associated with events that marked the financial-economic evolution in the overall time span. For concrete forecasting purposes, exhaustive combinations of t-day intervals and moving averages of the CSIE for various windows of w days should be considered further. To not clutter the graphical representation excessively, we limit the number of stock symbols to the least risky 15 symbols, if there are more than 15 companies whose stocks satisfy the constraints. Figure 1 shows the 15 least risky stock symbols discovered in an interval of 125 trading days, from 1 March 2022 to 26 August 2022, that have a return rate higher than the NYSE S&P500 index (the vertical line for the return rate −5.77%) and a beta, relative to the entire NYSE market, lower than the one exhibited by the S&P500 index (the horizontal line for beta 0.0642). In total, 314 stocks were identified that satisfy the constraints and have a positive beta. A rolling window of 10 days has been used.
In other words, a set of symbols, more than one, can be identified that have a higher rate of return than the S&P500 market index at a maximum level of risk provided as a threshold represented here by the beta of the index relative to the entire NYSE market. We point out that while the index was down by 5.77% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 7.5%.
Figure 1. A set of 15 NYSE symbols that outperformed the S&P500 index in terms of rate of return without being exposed to higher risk (125-day trading interval, from 1 March 2022 to 26 August 2022).
Figure 2 shows the 15 least risky stock symbols discovered in an interval of 250 trading days, from 5 October 2020 to 20 December 2021, to have a rate of return higher than the NYSE S&P500 index (the vertical line for the return rate of 23.79%) and a beta, relative to the entire NYSE market, lower than the one exhibited by the S&P500 index (the horizontal line for beta 0.0175). In total, 55 stocks were identified that meet the constraints and have a positive beta. A rolling window of 20 days has been used. We point out that while the index was up by 23.79% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 40%.
Figure 2. A set of 15 NYSE symbols that outperformed the S&P500 index in terms of rate of return without being exposed to higher risk (250-day trading interval, from 5 October 2020 to 20 December 2021).
Figure 3 shows the 15 least risky stock symbols discovered in an interval of 950 trading days, from 2 April 2018 to 26 August 2022, to have a rate of return higher than the NYSE S&P500 index (the vertical line for the return rate of 48.29%) and a beta, relative to the entire NYSE market, lower than the one exhibited by the S&P500 index (the horizontal line for beta 0.0369). In total, 114 stocks were identified that meet the constraints and have a positive beta. A rolling window of 20 days has been used. It is worth noting that while the index was up by 48.29% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 75%.
Figure 3. A set of 15 NYSE symbols that outperformed the S&P500 index in terms of rate of return without being exposed to higher risk (950-day trading interval, from 2 April 2018 to 26 August 2022).
Figure 4 shows the 15 least risky stock symbols discovered in an interval of 125 trading days, from 1 March 2022 to 26 August 2022, to have a rate of return higher than the NASDAQ Composite index (−10.28%) and a beta, relative to the entire NASDAQ market, lower than the one exhibited by the NASDAQ Composite index (0.0894). In total, 481 stocks were identified that meet the constraints and have a positive beta. A rolling window of 10 days has been used. In other words, a set of symbols, more than one, can be identified that have a higher rate of return than the NASDAQ Composite market index at a maximum level of risk provided as a threshold represented here by the beta of the index relative to the entire NASDAQ market. We point out that, while the index was down by 10.28% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 5.5%.
Figure 4. A set of 15 NASDAQ symbols that outperformed the NASDAQ Composite index in terms of rate of return without being exposed to higher risk (125-day trading interval, from 1 March 2022 to 26 August 2022).
Figure 5 shows the 15 least risky stock symbols discovered in an interval of 250 trading days, from 5 October 2020 to 20 December 2021, to have a rate of return higher than the NASDAQ Composite index (17.30%) and a beta, relative to the entire NASDAQ market, lower than the one exhibited by the NASDAQ Composite index (0.0319). In total, 228 stocks were identified that meet the constraints and have a positive beta. A rolling window of 20 days has been used. We point out that while the index was up by 17.30% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 30%.
Figure 5. A set of 15 NASDAQ symbols that outperformed the NASDAQ Composite index in terms of rate of return without being exposed to higher risk (250-day trading interval, from 5 October 2020 to 20 December 2021).
Figure 6 shows the 15 least risky stock symbols discovered in an interval of 950 trading days, from 2 April 2018 to 26 August 2022, to have a rate of return higher than the NASDAQ Composite index (67.52%) and a beta, relative to the entire NASDAQ market, lower than the one exhibited by the NASDAQ Composite index (0.0456). In total, 104 stocks were identified that meet the constraints and have a positive beta. A rolling window of 20 days has been used. It should be noted that, while the index increased by 57.52% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return well above 110%.
Figure 6. A set of 15 NASDAQ symbols that outperformed the NASDAQ Composite index in terms of rate of return without being exposed to a higher risk (950-day trading interval, from 2 April 2018 to 26 August 2022).
In Appendix A, Figures A1-A6 show the portfolios of NYSE and NASDAQ stocks discovered over the same periods when the DJIA index and the Russell 2000 index, respectively, are used as performance benchmarks.
It should be noted that in the results previously presented, we only considered the stocks with a positive beta. There are two reasons for this:
− First, since the betas of both market indices, S&P500 and NASDAQ Composite, were positive in these periods, we wanted to answer our research questions using stocks that follow the trend of the market, that is, stocks that exhibit a lower beta than the corresponding index but still have a positive beta.
− Second, the number of stocks with a lower beta than the corresponding index is even higher when stocks with a negative beta are included, and the graphical representation would lack clarity.
We point out that the stocks with the highest return rates in the investigated time intervals were those with a negative beta, that is, those inversely correlated with the market.
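The screening behind the figures discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `StockStats` record, the `screen` function, and the sample universe are hypothetical, and "least risky" is read here as "smallest beta first". Only the index values (17.30% and 0.0319) are taken from the text.

```python
from typing import NamedTuple


class StockStats(NamedTuple):
    symbol: str
    ror: float   # rate of return over the interval, in percent
    beta: float  # beta relative to the market CSIE volatility estimates


def screen(stocks, index_ror, index_beta, top_n=15, positive_beta_only=True):
    """Return up to top_n stocks that beat the index on return at a lower beta."""
    selected = [
        s for s in stocks
        if s.ror > index_ror
        and s.beta < index_beta
        and (s.beta > 0 or not positive_beta_only)
    ]
    # Assumption: "least risky" means the smallest beta comes first.
    return sorted(selected, key=lambda s: s.beta)[:top_n]


# Hypothetical universe; symbols and values are invented for illustration.
universe = [
    StockStats("AAA", 32.0, 0.0100),
    StockStats("BBB", 45.0, -0.0200),
    StockStats("CCC", 12.0, 0.0050),
    StockStats("DDD", 40.0, 0.0500),
    StockStats("EEE", 31.0, 0.0030),
]

picks = screen(universe, index_ror=17.30, index_beta=0.0319)
print([s.symbol for s in picks])

with_negative = screen(universe, 17.30, 0.0319, positive_beta_only=False)
print([s.symbol for s in with_negative])
```

Setting `positive_beta_only=False` admits the inversely correlated stocks as well, which is the case where the candidate set grows and the highest returns appear.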
To further study the possibility of diversification from market indices, we test the strategy of systematically buying and holding for one calendar year, for each of the 21 years between 2001 and 2021. It is a simple strategy that offers equidistant periods. The intervals are equally spaced and capture phenomena that are reasonably self-explanatory in terms of the economic forces at work. Within each calendar year, the performance of the S&P500 and NASDAQ Composite indices (the rate of return and the beta relative to the CSIE of the NYSE and the NASDAQ, respectively) is computed and used as the reference for selecting the stocks that have a risk lower than or equal to that of the index and a rate of return higher than or at least equal to that provided by the market index.
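The calendar-year buy-and-hold return used as a reference can be illustrated with a short sketch. It assumes that the rate of return of a year is measured from the first to the last available close of that year, which the text does not state explicitly; the function name `annual_ror` and the sample prices are hypothetical.

```python
from datetime import date


def annual_ror(closes):
    """Map each calendar year to its buy-and-hold rate of return in percent.

    closes: iterable of (date, close) pairs, in any order.
    Assumption: the position is opened at the first close of the year
    and valued at the last close of the same year.
    """
    by_year = {}
    for day, close in sorted(closes):
        first, _ = by_year.get(day.year, (close, close))
        by_year[day.year] = (first, close)
    return {
        year: (last / first - 1.0) * 100.0
        for year, (first, last) in by_year.items()
    }


# Hypothetical closing prices for two calendar years.
prices = [
    (date(2001, 1, 2), 100.0),
    (date(2001, 12, 31), 110.0),
    (date(2002, 1, 2), 110.0),
    (date(2002, 6, 28), 99.0),
    (date(2002, 12, 31), 121.0),
]

for year, ror in annual_ror(prices).items():
    print(year, round(ror, 2))
```

The same function can be applied to an index series and to each stock series, so that the yearly comparison uses identical holding periods.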
The synthetic results obtained for the NYSE market are organised in Table 1 and those for the NASDAQ in Table 2. Table 1 presents the sets of NYSE stocks, selected annually, based on the rate of return of the S&P500 index and its beta relative to the CSIE of the market as a whole. Table 2 presents the sets of NASDAQ stocks, selected annually, based on the rate of return of the NASDAQ Composite index and its beta relative to the CSIE of the market as a whole. We point out that, to extract the best performers from the market, we also considered for portfolio selection the stocks that, while exhibiting a lower risk than the market index, had a negative beta. Additionally, the number of stocks with positive and with negative beta is reported for each period.
The results presented in Tables 1 and 2 show that the stock market reliably offers resources for portfolio diversification. The best performers, in terms of rate of return (RoR), consistently had a negative beta. Even the portfolio beta is negative for each year in the 21-year study period. This signals that the stocks that performed better than the market index, in terms of return rate, were inversely correlated with the market as a whole with respect to volatility.

Discussion
It has to be observed that, except for the year 2003 for the NASDAQ (end of, and recovery after the dot-com bubble burst), and the years 2017 and 2021 for the NYSE, the corresponding market index beta relative to CSIE was consistently positive. This comes as confirmation that in general, the S&P500 and NASDAQ Composite indices are representative of their corresponding markets.
On the other hand, the RoR provided by the best-performing stocks, or even by the selected portfolio in its entirety, together with the negative beta exhibited by the majority of the portfolio constituents, supports the hypothesis that higher returns can be obtained by investing in stocks that do not follow the market trend in terms of volatility.
Finding that a large number of stocks have a beta lower than that of the market index, relative to the CSIE market volatility estimates, and in negative territory may suggest a higher-risk proposition for portfolio selection. Thus, in order to answer our research question in comparable terms, we adjusted the selection algorithm by imposing the stock beta to be strictly positive as well: $\beta_j > 0$ for $j \in \overline{1, N}$, where $\beta_j$ is the beta of the stock $j$ relative to the entire market volatility. The values $\{n_1, n_2, n_3, \ldots, n_i, \ldots, n_t\}$ are the numbers of symbols listed and traded on the market on the corresponding day $i$, and $N = \max(n_i)$, $\forall i \in$ t-day time interval.
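Since the number of symbols listed and traded can differ from day to day, the size of the candidate set is bounded by the largest daily count over the interval. A minimal sketch of that bookkeeping, assuming a hypothetical mapping from trading day to the set of symbols traded on that day (the function name and sample data are illustrative only):

```python
def universe_size(listings_by_day):
    """Return the per-day symbol counts and their maximum over the interval."""
    if not listings_by_day:
        raise ValueError("at least one trading day is required")
    counts = {day: len(symbols) for day, symbols in listings_by_day.items()}
    return counts, max(counts.values())


# Hypothetical three-day interval with a changing set of traded symbols.
listings = {
    "2022-03-01": {"AAA", "BBB", "CCC"},
    "2022-03-02": {"AAA", "BBB", "CCC", "DDD"},
    "2022-03-03": {"AAA", "CCC"},
}

daily_counts, max_count = universe_size(listings)
print(daily_counts)
print(max_count)
```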
With this additional constraint, the number of constituent stocks in the selected portfolio is considerably reduced (see the values in the table columns "No. of symbols with positive beta"). Table 3 presents the sets of NYSE symbols, selected annually, based on the rate of return of the S&P500 index and its beta relative to the CSIE of the market as a whole, restricted to stocks with a positive beta relative to the market CSIE.

Table 3.
Sets of NYSE symbols, selected annually, based on the rate of return (RoR) of the S&P500 index and its beta relative to the CSIE of the market as a whole. Only sets of stocks with positive beta relative to the market CSIE.
Table 4 presents the sets of NASDAQ symbols, selected annually, based on the rate of return of the NASDAQ Composite index and its beta relative to the CSIE of the market as a whole, restricted to stocks with a positive beta relative to the market CSIE. For years in which the beta of the market index was negative, the composite constraint makes it impossible to find a stock with a beta lower than the beta of the index but still positive. It should be noted that, even when portfolio selection is restricted to stocks that follow the volatility trend of the index and of the market as a whole, stocks can be identified that carry a lower risk relative to the market and provide significantly higher returns than the market index.

In most years, the average RoR of the portfolio of stocks identified through the proposed methodology is consistently positive and a few times higher than the RoR provided by the market index. We underline that, by construction of the constraint, all the stocks selected for the portfolio have an RoR higher than or at least equal to that realised by the market index.
We point out that our study does not consider more in-depth information about the companies, such as the sector in which they operate, their broader financial performance, experience in the business, workforce, etc. We perceive this aspect as a potential limitation that would have to be addressed in the process of selecting actual portfolios. Furthermore, given the relatively high number of stocks that could satisfy the selection criteria, we consider that an additional optimization process would be required.
Volatility-managed portfolios, which typically apply volatility-timing strategies to the stock market, have been studied (Liu et al. 2019) only to discover that these strategies suffer from look-ahead bias, despite existing evidence on the success of the strategies at the stock level. The results of Liu et al. (2019) show that one cannot easily beat the market by timing the market alone. However, their study was grounded on variance-based volatility estimates. Volatility estimation based on the intrinsic entropy (IE) model for longitudinal data (Vințe et al. 2021) and the cross-sectional intrinsic entropy (CSIE) as a volatility estimator for the market as a whole (Vințe and Ausloos 2022) benefit from the additional information provided by the traded volume, understood as the traction that the market gives to a certain price level, along with the signed volatility estimates: negative values indicate an inclination of the market to sell, and positive values suggest a predominantly buying tendency.
Additionally, it should be observed that the denominator of these betas represents the variance of the volatility estimates of the entire market, based on the cross-sectional intrinsic entropy (CSIE), and is therefore a volatility of volatility (VoV). The volatility of volatility is a fundamental quantity that describes the dynamics of volatility processes. However, Li et al. (2022) argue that it is far less well understood, and they construct a nonparametric estimator of the VoV based on noisy high-frequency data with price jumps. The perspective that high-frequency data provide to intraday trading, by dynamically estimating portfolio volatility through the CSIE and its volatility, could represent a pertinent path to follow for further research.
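The role of the denominator can be made concrete with a small sketch. It assumes that the beta takes the usual covariance-over-variance form, with the asset's volatility estimates in the numerator's covariance; the text above confirms only that the denominator is the variance of the market CSIE estimates. The function name `csie_beta` and the sample series are hypothetical.

```python
from statistics import mean


def csie_beta(asset_vol, market_csie):
    """Beta of an asset's volatility estimates relative to market CSIE estimates.

    Assumed form: sample covariance(asset, market) / sample variance(market).
    The denominator is the variance of volatility estimates, i.e. a
    volatility of volatility of the market.
    """
    n = len(market_csie)
    if n != len(asset_vol) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_asset = mean(asset_vol)
    mean_market = mean(market_csie)
    cov = sum(
        (a - mean_asset) * (m - mean_market)
        for a, m in zip(asset_vol, market_csie)
    ) / (n - 1)
    var = sum((m - mean_market) ** 2 for m in market_csie) / (n - 1)
    if var == 0:
        raise ValueError("market estimates have zero variance")
    return cov / var


# Hypothetical signed daily volatility estimates for the market as a whole.
market = [0.02, -0.01, 0.03, -0.02, 0.01]
follower = [0.5 * m for m in market]   # moves with the market, half as much
contrarian = [-m for m in market]      # moves against the market

print(round(csie_beta(follower, market), 4))
print(round(csie_beta(contrarian, market), 4))
```

A series that moves against the market estimates yields a negative beta, which corresponds to the inversely correlated stocks discussed in this section.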

Conclusions
In the context of the cross-sectional intrinsic entropy (CSIE) model for estimating the volatility of the stock market as a whole, we consider EOD data containing daily OHLC prices along with the traded quantity (volume) of each market-listed symbol that may be selected in a set as a portfolio constituent.
The research presented in this article makes use of historical daily open, high, low, and close prices and volume for a period of over 21 years, from 1 January 2001 to 28 October 2022.
In our study, we benchmark portfolio volatility risks against the volatility of the entire market provided by the CSIE and the volatility of market indices computed using longitudinal data. We introduce CSIE-based betas to characterise the relative volatility risk of the portfolio against market indices and the market as a whole.
The results we obtained empirically answer the research questions we established.
i. For any given interval of time, at least two symbols, traded on the market, that have a combined risk equal to or lower than that provided by the volatility estimates of the market index and with a higher rate of return, can be identified.
ii. Algorithmically, we discover all symbols that satisfy these constraints.
Thus, we empirically prove that, through CSIE-based betas, multiple sets of symbols that outperform the market indices in terms of rate of return while maintaining the same level of risk or even lower than the one exhibited by the market index can be discovered, for any given time interval.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available because the original software components necessary to obtain them were developed for ongoing research purposes.

Acknowledgments:
The authors thank and extend their gratitude to the three anonymous reviewers for their careful reading of the manuscript and their insightful comments and suggestions, which no doubt greatly helped to improve the report.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Figure A1 shows the 15 least risky stock symbols discovered in an interval of 125 trading days, from 1 March 2022 to 26 August 2022, which have a return rate higher than the NYSE DJIA index (−3.04%) and a beta, relative to the entire NYSE market, lower than the one exhibited by the DJIA index (0.0459). In total, 220 stocks were identified that satisfy the constraints and have a positive beta. In other words, a set of symbols can be identified, more than one, that have a higher rate of return than the DJIA market index at a maximum level of risk provided as a threshold represented here by the beta of the index relative to the entire NYSE market. It is worth noting that while the index was down by 3.04% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 10%. A rolling window of 10 days has been used.

Figure A1.
A set of 15 NYSE symbols that outperformed the DJIA index in terms of rate of return without being exposed to higher risk (125-day trading interval, from 1 March 2022 to 26 August 2022).
Figure A2 shows the 15 least risky stock symbols discovered in an interval of 250 trading days, from 5 October 2020 to 20 December 2021, which have a rate of return higher than the NYSE DJIA index (15.94%) and a beta, relative to the entire NYSE market, lower than the one exhibited by the DJIA index (0.0339). In total, 151 stocks were identified that satisfy the constraints and have a positive beta. A rolling window of 20 days has been used. We point out that, while the index was up by 15.94% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 30%.

Figure A2.
A set of 15 NYSE symbols that outperformed the DJIA index in terms of rate of return without being exposed to higher risk (250-day trading interval, from 5 October 2020 to 20 December 2021).
Figure A3 shows the 15 least risky stock symbols discovered in an interval of 950 trading days, from 2 April 2018 to 26 August 2022, which have a rate of return higher than the NYSE DJIA index (27.03%) and a beta, relative to the entire NYSE market, lower than the one exhibited by the DJIA index (0.0298). In total, 136 stocks were identified that meet the constraints and have a positive beta. A rolling window of 20 days has been used. We point out that, while the index was up by 27.03% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 50%.

Figure A3.
A set of 15 NYSE symbols that outperformed the DJIA index in terms of rate of return without being exposed to higher risk (950-day trading interval, from 2 April 2018 to 26 August 2022).
Figure A4 shows the 15 least risky stock symbols discovered in an interval of 125 trading days, from 1 March 2022 to 26 August 2022, to have a rate of return higher than the NASDAQ Russell 2000 index (−10.28%) and a beta, relative to the entire NASDAQ market, lower than the one exhibited by the Russell 2000 index (0.0894). In total, 109 stocks were identified that satisfy the constraints and have a positive beta. In other words, a set of symbols can be identified, more than one, that have a higher rate of return than the Russell 2000 market index at a maximum level of risk provided as a threshold represented here by the beta of the index relative to the entire NASDAQ market. It is worth noting that while the index was down by 10.28% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of about 11%. A rolling window of 10 days has been used.

Figure A4.
A set of 15 NASDAQ symbols that outperformed the Russell 2000 index in terms of rate of return without being exposed to higher risk (125-day trading interval, from 1 March 2022 to 26 August 2022).
Figure A5 shows the 15 least risky stock symbols discovered in an interval of 250 trading days, from 5 October 2020 to 20 December 2021, that have a rate of return higher than the NASDAQ Russell 2000 index (8.60%) and a beta, relative to the entire NASDAQ market, lower than the one exhibited by the Russell 2000 index (0.0118). In total, 124 stocks were identified that satisfy the constraints and have a positive beta. A rolling window of 20 days has been used. We point out that while the index was up by 8.60% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of above 35%.

Figure A5.
A set of 15 NASDAQ symbols that outperformed the Russell 2000 index in terms of rate of return without being exposed to higher risk (250-day trading interval, from 5 October 2020 to 20 December 2021).
Figure A6 shows the only 15 symbols discovered in an interval of 950 trading days, from 2 April 2018 to 26 August 2022, to have a rate of return higher than the NASDAQ Russell 2000 index (25.42%) and a beta, relative to the entire NASDAQ market, lower than the one exhibited by the Russell 2000 index (0.0051). A rolling window of 20 days has been used. We point out that while the index was up by 25.42% in the considered time interval, most of the symbols identified in the set based on the imposed restrictions were concentrated around a positive rate of return of well above 65%.

Figure A6.
A set of 15 NASDAQ symbols that outperformed the Russell 2000 index in terms of rate of return without being exposed to higher risk (950-day trading interval, from 2 April 2018 to 26 August 2022).