Exploitation of Information as a Trading Characteristic: A Causality-Based Analysis of Simulated and Financial Data

In financial markets, information constitutes a crucial factor contributing to the evolution of the system, while the presence of heterogeneous investors ensures its flow among financial products. When nonlinear trading strategies prevail, the diffusion mechanism reacts accordingly. Under these conditions, information encapsulates behavioral traces of traders' decisions and represents their actions. The resulting endogenization of information leads to the revision of traders' positions and affects connectivity among assets. To investigate the computational dimensions of this effect, we first simulate multivariate systems under several scenarios of noise terms, and then apply direct causality tests to analyze the information flow among their variables. Finally, empirical evidence is provided using real financial data.


Introduction
During the last few decades, research in finance has pointed to biases and sub-optimal decision-making processes as the drivers of investors' behavior. Many alternatives have been proposed to rethink the strong assumptions of rationality and perfect information. According to Lo and MacKinlay [1] and Lo [2], economic agents' actions can be influenced by various behavioral biases. The fact that investors appear so different in terms of expectations, since they do not share the same analytical skills, capital to invest, or even profit-maximizing goals, complicates the identification of patterns in real data. The role of heterogeneity in expectations is crucial. As demonstrated by Assenza et al. [3], in the event of a negative shock, even a small fraction of pessimistic forces can coordinate and, due to positive feedback mechanisms, destroy confidence, amplifying the market collapse. Hommes [4] presents heterogeneity and heuristics switching as determinants of market dynamics. When positive feedback activates, market prices fluctuate strongly. Frijns et al. [5] relate the stylized facts observed in financial markets to an individual investor's portfolio selection process, which is significantly driven by their risk perception, behavioral characteristics, and socio-demographic factors. As Peiro [6] shows, heterogeneity in the investment horizon may affect the significance of skewness in the portfolio construction process. In the short term, both fat tails and asymmetry contribute to the non-normality of the return distribution, while over broader horizons kurtosis seems to better explain deviations from normality. This agrees with Prakash et al. [7], who suggest that the shape of a stock return distribution changes with the investment horizon.
Information constitutes a crucial factor contributing to the evolution of financial systems, while heterogeneous investors ensure its flow. The trading decision, strongly affected by the linear or nonlinear underlying trading strategies and the filtering of news, can give birth to new information that enters the diffusion mechanism of the market and spreads rapidly. In this process, information encapsulates behavioral traces of traders' decisions and represents their actions. The endogenization of information creates risk that significantly affects traders' positions, capital flows, connectivity among assets, and, finally, asset allocation.
The observed non-normal distribution of asset returns is often attributed to the dominance of irrational investors following active trading strategies. Investors whose actions are affected by anchoring and disposition effects participate in the buildup of a trend-forcing price evolution. Thurner et al. [8] show that trading strategies characterized by leverage lead to nonlinear positive feedback mechanisms and the amplification of price movements that generate fat tails and volatility clustering. Along with the presence of fat tails, Daniel and Moskowitz [9] and Barroso and Santa-Clara [10] relate momentum returns with the presence of negative skewness. Jacobs et al. [11] show that overweighting left-skewed stock return distributions and underweighting right-skewed ones can lead to profitable momentum strategies. Ekholm and Pasternack [12] suggest that the manner of releasing positive and negative information induces skewness in the return distribution. Through simulation experiments, Wen et al. [13] provide evidence that biases, such as overconfidence and regret aversion, determine the reaction of investors to nonlinearly received information and lead to skewed and leptokurtic returns. According to Xu [14], the existence of skewness in stock returns may be the result of the investor's reaction to the returns themselves. As Ruttiens [15] points out, a rational investor will favor stocks presenting the highest odd moments (expected value and skewness) and the lowest even moments (variance and kurtosis). Other trading characteristics, such as trading volume and heterogeneity, seem to justify the appearance of asymmetry in return series (Hutson et al. [16]; Albuquerque [17]). Finally, the inability to implement appropriate and effective corporate governance can be the source of positive skewness in data (Bae et al. [18]).
Despite the simplistic character of the assumption of normality, the conventional approach to build a diversified portfolio has appeal due to the ease of implementation. In this framework, practitioners need only to consider for each asset class its mean, variance, and covariances, with the latter introducing the additional restrictive hypothesis of linear relationship for each pair of asset classes. Nevertheless, in dynamically unstable markets where information canalizes traders' characteristics into prices, diversifying asset portfolios can become a complicated procedure. To this end, research on nonlinear analysis suggests alternative techniques to the standard mean-variance framework of Markowitz [19]. Boginski et al. [20] formulate the portfolio selection problem as the maximum weight s-plex problem in the market graph. Fernandez and Gomez [21] generalize the mean-variance model by using artificial neural networks to calculate the efficient frontier. Huang [22] provides a new definition of risk by taking into consideration investors' perceptions of the severity level of the potential loss and redefines the portfolio selection problem under this new definition. In line with the empirical evidence about the presence of heavy-tail distributions, Kraft and Steffensen [23] and Diesinger et al. [24] show that assets can be modelled as jump-diffusion processes.
It follows that trading-based non-normality is linked to high skewness and kurtosis in asset returns and can drastically modify the portfolio (basket of variables) structure. To investigate the computational dimensions of this effect, we use multivariate systems of variables where several scenarios of disturbances are considered. A set of direct causality measures is then employed to analyze the information flow among the variables. In this simulation exercise, the goal is to demonstrate that non-Gaussianity in a system is able to destabilize the fundamentally defined linkages. The impact becomes more pronounced when the initial connectivity is nonlinear, which technically may be interpreted in terms of trading activity. The empirical validation of the effect of trading information on the variables' connectivity is provided through an application to a concentrated and a mixed five-stock portfolio.

Simulation Experiment Design
In an attempt to concretize the effect of informational signals, in the sense of random disturbances, on connectivity, we use three stochastic systems. Their residual terms are defined in different ways, so that the simulated time series exhibit irregular characteristics.
The two main stochastic systems with Gaussian noise terms, often used in the literature for the evaluation of causality measures, are (i) a linear vector autoregressive (VAR) model of order 4 in five variables (Schelter et al. [25]) and (ii) a nonlinear VAR of order 3 in five variables with linear (X1 → X3, X4 ↔ X5) and nonlinear couplings (X1 → X2, X1 → X4) (Montalto et al. [26]) (hereafter, S1 and S2, respectively). In an effort to include alternative forms of nonlinearity in the construction of variables, we build a third system on the basis of S2, in which X1 is described by a noisy Mackey-Glass process (Kyrtsou and Terraza [27]). To consider the effect of data length, in all cases four different sample sizes are selected, i.e., 512, 1024, 2048, and 4096 observations.
The presence of several lags in the simulated systems, as well as the diversity in the nature of the relationships, helps establish a direct connection with trading practice in financial markets. The systems evolve through the combination of value signals (in technical analysis, indicator signals are usually expressed as an inequality in terms of past values, hence "value signal"), since the current state of each variable depends on past information from the same or another variable. The delay in the system equations measures the speed at which imperfectly reflected information is incorporated into X_t. Lagged information can also be transferred non-proportionally (through nonlinear lagged X terms) into X_t, revealing that each variable at time t either over- or under-reacts to the past. Both conditions determine the spreading of information flow and feedback within the system.
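To make the data-generating step concrete, the sketch below simulates a small VAR with a configurable noise source and the four sample lengths used in the text. The coefficient matrix is a deliberately simple placeholder, not the actual parametrization of S1 by Schelter et al. [25]:

```python
import numpy as np

def simulate_var(coeffs, n, burn=200, noise=None, seed=0):
    """Simulate a K-variable VAR(p) driven by an arbitrary noise source.
    coeffs[l] is the K x K coefficient matrix acting on X_{t-l-1};
    noise is a callable returning a length-K innovation draw (Gaussian
    if None)."""
    rng = np.random.default_rng(seed)
    p, K, _ = coeffs.shape
    draw = noise if noise is not None else (lambda: rng.standard_normal(K))
    x = np.zeros((n + burn, K))
    for t in range(p, n + burn):
        x[t] = sum(coeffs[l] @ x[t - l - 1] for l in range(p)) + draw()
    return x[burn:]                 # discard the burn-in transient

# Placeholder 5-variable VAR(4) skeleton: the coefficients below are
# illustrative only, NOT those of the original S1 system.
K, p = 5, 4
A = np.zeros((p, K, K))
A[0] += 0.4 * np.eye(K)   # own-lag persistence at lag 1
A[0, 2, 0] = 0.5          # a linear coupling X1 -> X3
for n in (512, 1024, 2048, 4096):
    X = simulate_var(A, n)
```

Swapping the `noise` callable then reproduces the different disturbance scenarios discussed in the next subsections.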

System S1 by Schelter et al. (2006)
The system S1 is represented by the following set of equations.
Based on S1, and by changing the distribution of the noise terms ε_{i,t}, i = 1, . . . , 5, we formulate five additional simulation systems. Their connectivity network remains intact, as shown in Figure 1. In the initial system S1, all the variables present mesokurtic and symmetric behavior.

S1t: S1 with noise terms ε_{i,t}, i = 1, . . . , 5 from a Student-t distribution with df = 2 degrees of freedom. The generated series exhibit leptokurtic behavior that increases with the sample size. Among all the variables, X5, participating in more couplings than the rest, seems to be more sensitive to residual irregularity, having the highest kurtosis and varying skewness values.

S1n: S1 with noise terms ε_{i,t}, i = 1, . . . , 5 from the GARCH(1,1) model ε_t = w_t √h_t, h_t = a_0 + a_1 ε²_{t−1} + b_1 h_{t−1}, where w_t is a Gaussian white noise process and a_0 = 0.2, a_1 = 0.2, b_1 = 0.75. The resulting time series exhibit leptokurtic and asymmetric behavior. Again, for X5, which is more affected, we obtain the highest kurtosis and positive skewness.

S1b: S1 with noise terms ε_{i,t}, i = 1, . . . , 5 from a beta distribution with parameters a = 20, b = 2. All the variables present abnormal negative skewness.

S1g: S1 with noise terms ε_{i,t}, i = 1, . . . , 5 from the GARCH(1,1) model defined for S1n, where w_t follows the gamma distribution with parameters a = 16, b = 1/4. As in S1n, the simulated time series are leptokurtic and asymmetric; X5 again has the highest kurtosis and positive asymmetry.

S1f: S1 with noise terms ε_{i,t}, i = 1, . . . , 5 resulting from a FIGARCH(1,d,1) model, where ε_t is a Gaussian white noise process and ω = 0.2, a = 0.2, b = 0.7, d = 0.6. Contrary to the previous versions of S1, where fat tails are assumed in the error term, the obtained kurtosis is slightly above three, even for the largest sample.
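The noise scenarios above can be sketched as follows. The GARCH(1,1) recursion uses the parameters stated in the text; centring the gamma innovations at zero (mean = 16 × 1/4 = 4) is our assumption, made so that they can serve as GARCH innovations:

```python
import numpy as np

rng = np.random.default_rng(1)

def garch11_noise(n, a0=0.2, a1=0.2, b1=0.75, innov=None):
    """GARCH(1,1) noise: e_t = w_t * sqrt(h_t), with
    h_t = a0 + a1 * e_{t-1}**2 + b1 * h_{t-1} (the standard recursion,
    assumed here for S1n/S1g). innov draws the i.i.d. innovation w_t."""
    draw = innov if innov is not None else rng.standard_normal
    e = np.empty(n)
    h = a0 / (1.0 - a1 - b1)          # start at the unconditional variance
    for t in range(n):
        e[t] = draw() * np.sqrt(h)
        h = a0 + a1 * e[t] ** 2 + b1 * h
    return e

n = 2048
eps_t = rng.standard_t(df=2, size=n)     # S1t: Student-t, df = 2
eps_n = garch11_noise(n)                 # S1n: Gaussian GARCH(1,1)
eps_b = rng.beta(20, 2, size=n)          # S1b: beta(20, 2), left-skewed
# S1g: gamma(16, 1/4) innovations, centred at zero (centring is our assumption)
eps_g = garch11_noise(n, innov=lambda: rng.gamma(16.0, 0.25) - 4.0)
```

The FIGARCH variant (S1f) is omitted here, as its fractional-differencing filter requires a longer implementation.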

Systems S2 by Montalto et al. (2014) and S3
The system S2 is represented by the following set of equations.
To complicate the structure of the driving variable, in S3 the variable X1 is modelled as a noisy Mackey-Glass process with c = 10 and τ = 2. The remaining equations are identical to those of S2.
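A minimal sketch of such a driver is given below, using the discrete noisy Mackey-Glass form common in the Kyrtsou-Terraza line of work. Only c = 10 and τ = 2 come from the text; the coefficients `alpha`, `delta`, and the noise scale are illustrative placeholders:

```python
import numpy as np

def noisy_mackey_glass(n, c=10, tau=2, alpha=3.8, delta=0.05,
                       noise_sd=0.05, burn=200, seed=0):
    """Discrete noisy Mackey-Glass map:
        X_t = alpha * X_{t-tau} / (1 + X_{t-tau}**c) - delta * X_{t-1} + eps_t.
    c = 10 and tau = 2 follow the text; alpha, delta and noise_sd are
    assumed values for illustration."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n + burn)
    x[:tau] = rng.uniform(0.1, 0.5, tau)   # small positive initial history
    for t in range(tau, n + burn):
        x[t] = (alpha * x[t - tau] / (1.0 + x[t - tau] ** c)
                - delta * x[t - 1] + noise_sd * rng.standard_normal())
    return x[burn:]

x1 = noisy_mackey_glass(1024)
```

The generated `x1` would then replace the X1 equation of S2, leaving the other four equations untouched.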
Although the system S2 is disturbed by normally distributed errors, as Table 1 reports, the variables X2, X4, and X5 are leptokurtic and skewed. The resulting non-normality can be explained by the amplification of the information flow towards X2 and X4, as well as by the indirect transmission to X5 via X4. In system S3, similar conclusions can be drawn only for X2 and X4; the moment statistics of X5 clearly converge to their normal distribution values. The appearance of fat tails in systems whose nonlinear skeletons are perturbed by Gaussian noise has been studied under the term endogenous heteroskedasticity in Kyrtsou [28] and Ashley [29].
Based on S2 and S3, and by modifying the distribution of the noise terms ε_{i,t}, i = 1, . . . , 5, as for S1, we define five additional simulation systems. Their path diagram is represented in Figure 2. In the initial parametrization of S2, even though the residuals are white noise, three (X2, X4, and X5) out of five variables exhibit non-normal characteristics, such as high kurtosis (around 10) and skewness (around 2, either positive or negative). For S3, two (X2 and X4) out of five variables deviate from normality, with lower values of kurtosis (around 8) and skewness (around 1, either positive or negative).

S2t and S3t: S2 and S3 with noise terms ε_{i,t}, i = 1, . . . , 5 from a Student-t distribution with df = 2 degrees of freedom. The inclusion of Student-t disturbances aggravates the non-normality. More specifically, the kurtosis and skewness of variables X2, X4, and X5 double compared with their respective behavior in the original system S2. In S3, the amplification gives birth to extreme fat-tailed and asymmetric behavior for all variables. It is worth noticing that, even for small sample sizes, the kurtosis approaches 250, while the skewness is about 15. This finding reinforces the view that the nature of the shock matters a lot in the propagation mechanism within a system.

S2n and S3n: S2 and S3 with noise terms ε_{i,t}, i = 1, . . . , 5 from the corresponding GARCH(1,1) model, as defined for S1n. Although all the variables are skewed and leptokurtic, the kurtosis and skewness of X2, X4, and X5 reach high values. Again, the amplification of irregularity is more pronounced for S3.
S2b and S3b: S2 and S3 with noise terms ε_{i,t}, i = 1, . . . , 5 from a beta distribution with parameters a = 20, b = 2, producing negative skewness. The simulated results show that the beta distribution of the noise terms imposes itself on the mean structure of the system, destroying the excess kurtosis detected for the nonlinearly connected variables X2, X4, and X5 of S2. In S3, the kurtosis even drops to platykurtic values.
S2g and S3g: S2 and S3 with noise terms ε_{i,t}, i = 1, . . . , 5 from a GARCH(1,1) model, as for S1g. The distributional characteristics of the system variables are similar to those of the system S2n. It is worth mentioning the steadily negative asymmetry of variable X4 when the residuals follow a GARCH-type process. Comparing the strength of non-Gaussianity between S2 and S3, we conclude that the specific nonlinearity in the skeleton of the third system favors the detection of higher 3rd and 4th moment statistics via interaction.
To provide a schematic description of the methodological part, we present the simulation experiment in four steps. First, we simulate the systems S1, S2, and S3. In the second step, we introduce irregularity in the noise terms, as described above. Then, we identify couplings using direct causality methods to capture the information flow. In the last step, performance metrics are applied to verify the consistency of the obtained results.

Connectivity Measures and Performance Metrics
After describing the systems, including both linear and nonlinear couplings, together with the irregular characteristics of the residual terms able to give rise to abnormal values of skewness and kurtosis, we apply three multivariate (direct) measures of causality, instead of bivariate ones, to better capture the information flow. More specifically, we intend to identify the impact of introducing non-Gaussianity into the stochastic systems S1, S2, and S3 on the connectivity among their variables. Let us consider a multivariate system with K variables, where X is the driving variable (source), Y is the response variable (target), and there are also K−2 confounding variables Z = {Z_1, . . . , Z_{K−2}}. The multivariate causality measures capture the direct causal influence from X to Y, conditioning on the remaining variables (X → Y|Z).
The Restricted Conditional Granger Causality Index (RCGCI) is an extension of the standard Conditional Granger Causality Index (Geweke [30]) that incorporates dimension reduction so that the curse of dimensionality can be effectively addressed (Siggiridou and Kugiumtzis [31]). Computationally, a modified backward-in-time selection method is employed to restrict the VAR model. The choice of the appropriate subset of lagged terms is based on a time series property, namely that the dependence structure is closely related to the temporal order of the variables. The unrestricted VAR is then estimated on the selected lagged variables. The restricted model is constructed similarly, except that the lagged terms of the driving variable are eliminated. The RCGCI is then calculated as the logarithm of the ratio of the residual variances of the restricted (s²_R) and unrestricted (s²_U) models: RCGCI = ln(s²_R / s²_U).
The statistical significance of the RCGCI is assessed by a parametric significance test (F-statistic) on the coefficients of the lagged terms of the driving variable in the unrestricted model:

F = [(SSE^R − SSE^U)/p_i] / [SSE^U/(N − c − P_j)],

where SSE is the sum of squared errors, the superscripts U and R denote the unrestricted and restricted models, respectively, p_i is the number of lagged components of X in the U-model for Y, c is the largest lag in the U-model, N is the data size, and P_j is the total number of U-model coefficients.
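A simplified stand-in for this computation is sketched below. It estimates a full-lag conditional Granger causality index via restricted and unrestricted least squares; the actual RCGCI additionally selects lags with the modified backward-in-time scheme, which is omitted here:

```python
import numpy as np

def cgci(data, i, j, p=2):
    """Conditional Granger causality index X_i -> X_j | rest, using the
    full lag set up to order p (a simplified sketch: the real RCGCI first
    restricts the lags). Returns (index, F-statistic)."""
    n, K = data.shape
    y = data[p:, j]
    # design matrix with p lags of every variable, columns grouped by variable
    lags = np.column_stack([data[p - l:n - l, k]
                            for k in range(K) for l in range(1, p + 1)])
    keep = [k * p + l for k in range(K) if k != i for l in range(p)]
    XU = np.column_stack([np.ones(len(y)), lags])           # unrestricted
    XR = np.column_stack([np.ones(len(y)), lags[:, keep]])  # driver's lags removed
    sse_u = np.sum((y - XU @ np.linalg.lstsq(XU, y, rcond=None)[0]) ** 2)
    sse_r = np.sum((y - XR @ np.linalg.lstsq(XR, y, rcond=None)[0]) ** 2)
    idx = np.log(sse_r / sse_u)                 # ln(s2_R / s2_U)
    F = ((sse_r - sse_u) / p) / (sse_u / (len(y) - XU.shape[1]))
    return idx, F

# illustrative check on a system with a single true coupling X1 -> X2
rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 3))
for t in range(1, 2000):
    x[t, 1] += 0.6 * x[t - 1, 0]
idx_true, F_true = cgci(x, 0, 1)    # true coupling
idx_null, F_null = cgci(x, 2, 1)    # uncoupled pair
```

The true coupling yields a clearly larger index and F-statistic than the uncoupled pair, mirroring the acceptance/rejection logic of the parametric test.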
In the Partial Mutual Information on Mixed Embedding (PMIME), the dimension reduction is effectuated via a non-uniform embedding scheme (Kugiumtzis [32]). The mixed embedding vector w_t = [w^X_t, w^Y_t, w^Z_t] collects the selected lagged terms of the driving, response, and confounding variables. The PMIME test is then expressed in terms of conditional mutual information as PMIME = I(y_t; w^X_t | w^Y_t, w^Z_t) / I(y_t; w_t), where I(X; Y | Z) stands for the conditional mutual information of X and Y, conditioning on the Z variables. To obtain the probability densities in the estimation of the (conditional) mutual information terms, the nearest neighbors estimator (Kraskov et al. [33]) is employed. PMIME becomes zero in the case of no causality; otherwise, it is positive.
Respectively, the Partial Transfer Entropy on Non-Uniform Embedding (PTENUE) is introduced using the non-uniform embedding scheme (Montalto et al. [26]). Although its estimation procedure is identical to that of PMIME, an alternative nearest neighbors estimator of Kraskov et al. [33] is employed for computing the probability densities. The PTENUE measure is the non-normalized conditional mutual information term PTENUE = I(y_t; w^X_t | w^Y_t, w^Z_t), computed on the non-uniformly selected embedding vector. The measure equals zero if causality does not exist; otherwise, it is positive. The nonlinear causality measures PMIME and PTENUE do not require a significance test. Surrogates, though, are incorporated within the estimation algorithm of the measures to form the stopping criterion for the mixed embedding vector. Papana et al. [34], Papana [35], and Siggiridou et al. [36,37] have shown that RCGCI, PMIME, and PTENUE outperform a large range of linear and nonlinear, bivariate and multivariate causality measures.
In the fourth step, binary classification metrics, namely the sensitivity, specificity, and Matthews correlation coefficient (Tharwat [38]), are employed to evaluate the performance of the three direct causality measures. In the simulated systems, the causality measures are estimated on the K(K − 1) possible ordered pairs of variables for a system of K variables.
The sensitivity metric-i.e., the true positive rate (TPR)-quantifies the true positives (TP) against the number of real positives (P) in the data.
The terms true and false refer to correct and incorrect (spurious) coupling detections, while positive and negative denote the acceptance and rejection of couplings, respectively. A sensitivity approaching 100% means that nearly all true causal links are detected.
The specificity metric-i.e., the true negative rate (TNR)-checks the true negatives (TN) against the number of real negatives (N) in the data.
Thus, the specificity provides the percentage of rejection of spurious links over the total number of detected uncoupled cases. The percentage at which the specificity value deviates from 100% denotes the accepted spurious couplings.
Finally, the Matthews correlation coefficient (MCC) (Matthews [39]) is a measure of overall performance, merging information from sensitivity and specificity by considering all the possible correct and spurious couplings, causal or non-causal. If it equals 100%, there is a perfect identification of the pairs of true and no causality.
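The three metrics follow directly from the confusion counts of detected couplings, as the sketch below illustrates (the example counts are hypothetical):

```python
from math import sqrt

def binary_metrics(tp, fp, tn, fn):
    """Sensitivity (TPR), specificity (TNR) and Matthews correlation
    coefficient computed from the confusion counts of coupling detections."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, mcc

# A K = 5 system has K(K - 1) = 20 ordered pairs; suppose all 4 true
# couplings are found and 1 of the 16 uncoupled pairs is falsely accepted.
sens, spec, mcc = binary_metrics(tp=4, fp=1, tn=15, fn=0)
```

Note that the MCC penalizes both missed true couplings and accepted spurious ones, which is why it serves as the overall ranking criterion in the tables.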

Simulated Series Results
As reported in Table 2, the RCGCI correctly identifies the connectivity network of S1 for all the samples. The performance slightly improves as the time series length increases, owing to fewer spurious detections. The PMIME also correctly indicates the connectivity, with similar results for all the time series lengths. However, the percentage of detecting significant causality is larger than the nominal level (5%) for the uncoupled pairs of variables. The true connectivity network of S1 is obtained with PTENUE, independent of the sample size, and fewer spurious cases are captured as well. On the basis of the performance metrics, the PTENUE outperforms the other two measures, achieving the highest mean MCC score (96.66%) over the RCGCI (94.53%) and the PMIME (85.2%). Their mean sensitivities are all very high; the differences in performance across measures are driven by specificity, i.e., the true negative rate.
When a noise term from the t-distribution is added to S1 (S1t), the performance of the RCGCI is similar to that for S1. It finds the causal links perfectly well, while the percentages of significant detections for the uncoupled pairs of variables do not exceed 5%. The performance of the PMIME does not deteriorate for S1t compared to S1: the true links are detected, but the percentage of significant detections for the uncoupled pairs of variables varies from 9.85% to 12.85%. The PTENUE performs for S1t as for S1; the true links are found, and a few spurious acceptances arise. In total, the PTENUE has the best mean performance for S1t, with the RCGCI coming second with an MCC value very close to that of the PTENUE.
Including the GARCH residuals in S1 (S1n) worsens the metrics of the RCGCI. The measure captures the true causal linkages, but for the uncoupled cases the percentage of significant RCGCI values varies from 8.36% to 10.38%. The PMIME gives fewer acceptances of spurious causalities for S1n. The best performance for the system S1n is achieved by PTENUE.
When beta-distributed errors are considered (S1b), the RCGCI finds the true connectivity network for all n (100%). As for S1, a few spurious links are also obtained. The PMIME perfectly identifies the true connections (100%) for all n, while uncoupled links are falsely indicated with percentages varying from 9.85% to 12.85%. The PTENUE again indicates the true connections, and only a small number of spurious cases appear.
In the case of system S1g, the RCGCI captures the true causalities, but for the uncoupled cases the percentage of significant RCGCI values remains high (from 7.77% to 9.23%). The PMIME performs similarly to the RCGCI. The PTENUE continues to be the best measure. When the FIGARCH errors are considered, the PMIME puts forward more false couplings.
Tables 3 and 4 report the results of the application to the simulated system of Montalto et al. [26] and the new system S3. As one can see, the RCGCI correctly identifies the linear relationships. On the contrary, the nonlinear links are detected with very low percentages, and spurious links are also indicated. The PMIME, on the other hand, captures more couplings, and its performance increases with the sample length. However, in terms of spurious detection the PMIME gives similar results to the RCGCI for S2, while it finds further false couplings for S3. The PTENUE performs close to the PMIME in terms of the mean binary classification metrics for the system S2, but it is clearly better than the PMIME for S3. The RCGCI achieves a rather low mean MCC score, mainly due to its low sensitivity.
In the case of the t-distributed errors in S2, the RCGCI correctly detects the true couplings, with the percentage increasing to 90% for the large sample. Spurious links are also revealed, an effect that is more significant in the sample of 4096 observations. The PTENUE shows more true causal links, while fewer spurious relationships are found than with the RCGCI and the PMIME. In terms of performance, the PTENUE outperforms the PMIME. In the third system, the PMIME seems to be more sensitive to the nonlinearity, suggesting an increasing number of spurious couplings.
For the systems S2n, S3n and S2g, S3g, considering the GARCH residual terms, the measures produce almost identical results. The RCGCI indicates the true couplings, giving percentages of acceptance comparable to S2t. The spurious detections are high, approaching 30%. The PMIME captures the true connections effectively, but at the same time it shows spurious ones. According to the sensitivity metric, the PTENUE performs as well as the PMIME but gives fewer wrong causalities, a performance reflected in the highest mean MCC.
Regarding S2b, the RCGCI efficiently captures both the linear and nonlinear linkages, and the percentage of spurious cases is lower than in the previously analyzed versions of the system S2. Both the PMIME and the PTENUE find the true connectivity; nevertheless, the latter indicates a smaller number of spurious couplings. The high MCC values show that the PTENUE stands out. The main difference in the system S3b is the poor performance of the RCGCI, whose number of false detections rises to 37.73%.
Finally, when the FIGARCH error term is considered in S2 and S3, the linear measure largely fails to indicate the correct causal relationships and suggests equally high false ones. The nonlinear tools perform better, giving a lower rate of false acceptances than the RCGCI for S2, except for the PMIME in S3, which reaches 23.93% spurious couplings.

Comparing the rate of acceptance of false causalities (specificity) between the systems S1, S2, and S3, we can conclude that more spurious couplings emerge in the nonlinear systems S2 and S3 due to the common source of driving (X1→X2, X3, X4) and the transitive indirect paths (X1→X4↔X5). It is also worth emphasizing that, among the spurious causalities, the bidirectional coupling between X2 and X4 (by definition uncoupled) is steadily detected by all measures in all samples. Therefore, we believe that the high rate of acceptance of couplings not derived from the initial formulation of the systems S2 and S3 indicates the creation of new structures through propagation. To further deploy our rationale, we calculate the mutual information among the variables of all versions of S2 and S3. As can be seen in Table 5, when the error term exhibits irregular properties that differ in intensity and nature, the dependence rises. In fact, the highest mutual information coefficient is obtained for the initially uncoupled pair X2 and X4. It follows that S2 behaves as a nonlinearly self-exciting process. According to Ocker et al. [40], in nonlinearly self-exciting processes nonlinearities impose bidirectional couplings and the structure expands. The detected spuriousness varies depending on the disturbance term, which obviously affects the amplification within the nonlinear skeleton of the model. More specifically, when the noise term presents a more volatile or asymmetric profile, it imposes its own structure on the endogenous part; the amplification is then damped, and the number of spurious couplings decreases.
On the contrary, in Table 6, where the driving variable X1 is generated by a noisy Mackey-Glass process, a higher level of mutual information is achieved among the variables of the system S3b. Apparently, the interaction of the beta distribution with the skeleton of S3 spreads out rapidly. In this specific model, the linear RCGCI gives around 40% spurious couplings, in contrast to the nonlinear measures, which indicate only around 10% false linkages.
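The dependence comparisons in Tables 5 and 6 rest on pairwise mutual information. A rough histogram-based (plug-in) estimator, standing in for the kNN estimator of Kraskov et al. [33] actually used, can be sketched as:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in mutual information (in nats) from a 2-D histogram; a crude
    stand-in for the nearest-neighbors estimator used in the text."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                       # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)         # marginal of x
    py = pxy.sum(axis=0, keepdims=True)         # marginal of y
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# illustrative pairs: one strongly dependent, one independent
rng = np.random.default_rng(0)
z = rng.standard_normal(4096)
dep = z + 0.3 * rng.standard_normal(4096)
indep = rng.standard_normal(4096)
```

A dependent pair yields a clearly larger coefficient than an independent one, which is the pattern the tables exploit to reveal the emergent X2-X4 link.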
Regardless of the sample size of the nonlinear systems under study, we observe that a significant divergence in performance between the linear and the nonlinear causality measures is associated with nonlinear relationships among the variables. In this line, for all the causality forms and disturbance terms the simulation results show that the PTENUE decodes correctly the true linkages and gives lower rates of false couplings.

Application to Real Financial Data
With the aim to provide empirical evidence about the impact of heterogeneity in information on the connectivity of variables, we build two different stock portfolios. Following the simulated systems' construction, the first portfolio, A, is composed of five stocks from the French stock exchange index CAC40 so as to reproduce the properties of a concentrated structure. The respective listed companies are Total (FP) from the energy sector, Sanofi (SAN) from the healthcare sector, L'Oreal (OR) and Danone (BN) from the consumer defensive sector, and BNP Paribas (BNP) from the financial services sector. The required heterogeneity is achieved by considering three large-capitalization stocks (FP, SAN, OR) together with two lower-capitalization stocks (BN, BNP). In the second portfolio, named B, the goal is to combine two different (preferably independent) concentrated structures so as to emphasize the contribution of their dynamics to the overall behavior of the portfolio. For this reason, we replace the lower-capitalization stocks BN and BNP with Cipla Limited (Cipla) from the healthcare sector and Britannia Industries Limited from the consumer defensive sector of the national stock exchange index of India, NIFTY. The absence of a lead-lag relationship between NIFTY and European stock indexes is pointed out by Choudhary and Singhal [41].
With a focus on considering heterogeneous investment time horizons, we select four distinct samples for each portfolio, i.e., 500, 1000, 2000, and 4000 data points, starting from the most recent observation of the dataset (i.e., 30/04/2020). Tables 7 and 8 report the 3rd and 4th moment statistics of all the stock return series. The results show that the variables are highly skewed and leptokurtic. However, the maximum values of skewness and kurtosis (marked in red) are obtained for different sample lengths, confirming that the incorporation of historical information reveals heterogeneous aspects of investors' activity and, eventually, variations in market conditions. Non-normality in the short-run subsample of 500 observations represents the more speculative dimensions of trading. On the contrary, when deviations from Gaussianity become evident in longer samples, one should look at the volatile reaction of long-term-oriented investors. Deviations depend on the stock nature and do not rise or fall proportionally with the sample size.
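The nested-sample design behind Tables 7 and 8 can be sketched as follows; the return series below is synthetic (Gaussian noise with a few deterministic crash-like shocks), used only to show how the moments shift across horizons:

```python
import numpy as np

def skew_kurt(r):
    """Sample skewness and excess kurtosis (normal = 0)."""
    d = np.asarray(r, float) - np.mean(r)
    m2 = np.mean(d ** 2)
    return np.mean(d ** 3) / m2 ** 1.5, np.mean(d ** 4) / m2 ** 2 - 3.0

def nested_moments(returns, windows=(500, 1000, 2000, 4000)):
    """3rd/4th moment statistics over nested samples that all end at the
    most recent observation, mirroring the design of Tables 7 and 8."""
    return {w: skew_kurt(returns[-w:]) for w in windows}

# illustrative returns: Gaussian noise plus six recent negative shocks
rng = np.random.default_rng(0)
r = 0.01 * rng.standard_normal(4000)
r[-60::10] -= 0.05              # crash-like observations near the sample end
stats = nested_moments(r)
```

Because the shocks sit near the end of the series, the short 500-point window shows strong negative skewness and excess kurtosis, while the longer windows dilute them, echoing the horizon dependence discussed above.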
The application of the causality measures to the portfolios A and B helps to further illustrate the above heterogeneity. The resulting path diagrams per measure and sample are presented in Figures 3 and 4. The complex interdependence between stocks is clearly indicated by the sparse, or even absent, causal structure captured by the linear RCGCI measure, whereas the implementation of PMIME and PTENUE reveals rich linkages among the variables. At first glance, it seems that the strong causal forms detected by both nonlinear tools are consistent in the shorter and longer samples (i.e., 500 and 4000 points). This diversity in patterns can eventually reflect the fact that investors may be willing to take on more risk over longer periods (Andries et al. [42]) or when the risk-free rate turns negative (Baars et al. [43]).
Although the stock systems comprise only five variables, the intensity of spreading among the variables of portfolio A in the small and large samples differs slightly between PMIME and PTENUE. It turns out that the combination of domestic stocks generates a nonlinearly interconnected portfolio; introducing more variables would, of course, allow richer dynamics to unfold. In the small sample (500 obs.) of the mixed portfolio B, by contrast, mixing two different sets of stocks affects the consistency of couplings and interrupts the complex pattern of portfolio A. The decrease in the standard deviation from 2.6% (portfolio A) to 1.6% (portfolio B), along with the increase in mean return from −0.106% to 0.033%, is an appealing effect of the changes in connectivity over the short-term period, which includes the first months of the coronavirus pandemic. Likewise, the attenuation of skewness from −1.73 (portfolio A) to −0.10 (portfolio B) illustrates the tight link between the nature of stock connectivity and portfolio asymmetry, which reflects the trading activity (Horwitz [44]).
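As a rough sketch of how the portfolio-level moments quoted above are obtained from component returns, the following assumes equal weights and synthetic Student-t returns in place of the actual stock data:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)
# Five synthetic daily return series standing in for the portfolio
# components; 500 observations match the short sample in the text.
R = rng.standard_t(df=5, size=(500, 5)) * 0.015
w = np.full(5, 0.2)          # equal weights (an assumption, not the paper's)
port = R @ w                 # portfolio daily returns

print(f"mean {port.mean():.4%}  std {port.std(ddof=1):.4%}  skew {skew(port):+.2f}")
```

With dependent components, as in the real portfolios, the cross-moments of the stocks feed directly into the portfolio standard deviation and skewness, which is why changes in connectivity show up in these summary statistics.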
Additionally, the fact that the visual representation of both portfolios is time-varying brings out the role of heterogeneity in financial markets in terms of trading horizons and the subsequent complexity of information signals. Evidence of horizon-dependent behavior has been provided by Prat [45] for the equity premium in US stock market data over the period 1871-2008. Conclusions regarding the horizon-dependent causality between the US and China ETF markets, which increases in the long term, have also been drawn by Nie et al. [46]. Looking closer at how dependence in portfolios A and B evolves as information accumulates and the sample size increases sheds light on the beneficial side of the resulting dynamics. To do so, we calculate the mutual information coefficient among the stock returns. The results for the concentrated portfolio A, reported in Table 9, show that the dependence clearly intensifies as the number of observations increases.
Thereby, holding stocks for a short period of time, focusing on the recent performance of the respective firms, could potentially turn into a beneficial decision under favorable market conditions. When conditions deteriorate and volatility bursts, it is possible to take advantage of the nonlinear connectivity within concentrated structures by appropriately fusing different pools of assets in an effort to address trading heterogeneity, as in the case of portfolio B.
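The mutual information between two return series can be approximated with a simple plug-in histogram estimator; the exact estimator behind the coefficient in Table 9 may differ, and the series below are synthetic stand-ins for the stock returns:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x (column vector)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y (row vector)
    nz = pxy > 0                          # avoid log(0) on empty cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(2)
# Two dependent pseudo-return series sharing a common factor, plus an
# independent series for comparison.
common = rng.normal(size=4000)
r1 = common + 0.5 * rng.normal(size=4000)
r2 = common + 0.5 * rng.normal(size=4000)
ind = rng.normal(size=4000)

print("dependent pair:  ", mutual_information(r1, r2))
print("independent pair:", mutual_information(r1, ind))
```

Unlike correlation, this estimate also responds to nonlinear dependence, which is why it is a natural companion to the nonlinear causality measures used above; note that the plug-in estimator carries a small positive bias that grows with the number of bins and shrinks with the sample size.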

Implications
The endogenization of information and its subsequent amplification within the system have a significant effect on the connectedness of variables. The increasing nonlinearity and the presence of endogenous heteroskedasticity in the simulated systems S2 and S3, together with the appearance of new couplings detected as spurious by the specific statistical measures, can have several exploitable implications for trading practice and portfolio construction. The property of a nonlinearly self-exciting process applies to the set of real variables as well, where connectivity relies upon the underlying dynamics of the data time horizon. Although, in the long run, the concentrated portfolio A and the mixed portfolio B are both characterized by rich nonlinear association among their components, in the short run mixing two different pools of stocks in portfolio B affects the connectivity and the risk measurement. This finding emphasizes the importance of investors' risk profiles and time horizons in the asset allocation process.
From a broader perspective, the presence of escalating nonlinear dependences among stocks justifies the need to deal with the curse of dimensionality in financial portfolios. In complex financial markets, where the number of variables influencing asset prices can be huge, selecting a subset able to capture market risk is a daunting challenge. Green and Hollifield [47] show that estimation errors arising from the optimization over many assets lead to poorly diversified portfolios. Nonlinear interactions among stock returns can also affect the performance of standard asset pricing models. According to Chicheportiche and Bouchaud [48], portfolios generated by nonlinear approaches outperform the Markowitz mean-variance model, while Laloux et al. [49] show that, due to the high level of noise and the instability of the dependence structure over time, the use of the covariance matrix underestimates portfolio risk.
Under conditions of strong nonlinear association, the diffusion of biased information among assets can modify a portfolio's characteristics and impact its performance. Remedies for this effect include dynamically revising the portfolio structure or updating the asset allocation. On the other hand, exploiting informational evolution by investing in concentrated portfolios, possibly combined with style investing, is an additional alternative. Although risky, a steady preference of individual investors for concentrated portfolios and active trading has been recorded in the literature, likely attributable to behavioral biases: individual investors overestimate either the quality of their private information or their ability to interpret it (Odean [50]; Barber and Odean [51]). However, trading aggressively could also reflect attempts to exploit superior private signals (Kyle [52]). As shown in Ivkovic et al. [53], investments made by concentrated investors can perform significantly better than those made by investors diversifying across many stocks; moreover, the concentration is more pronounced for stocks with greater information asymmetries. When concentration increases, the risk increases nonlinearly (Horwitz [44]). Although concentrated portfolios frequently present substantial tracking errors relative to the benchmark, investors' information processing is capable of transforming a theoretically suboptimal decision into a beneficial investment strategy delivering high abnormal returns (Choi et al. [54]).
Future research on the impact of connectivity among financial assets will include high-dimensional simulated systems, as well as the evaluation of real stock portfolios built under different statistical scenarios.
Author Contributions: Methodology, analysis, presentation, and writing, C.K., C.M. and A.P.; data curation, C.M. and A.P.; supervision, C.K. All authors have read and agreed to the published version of the manuscript.