A Refinement of Recurrence Analysis to Determine the Time Delay of Causality in Presence of External Perturbations

This article describes a refinement of recurrence analysis to determine the delay in the causal influence between a driver and a target, in the presence of additional perturbations affecting the time series of the response observable. The methodology is based on the definition of a new type of recurrence plot, the Conditional Joint Recurrence Plot. The potential of the proposed approach resides in the great flexibility of recurrence plots themselves, which allows extending the technique to more than three quantities. Autoregressive time series, both linear and nonlinear, with different couplings and percentages of additive Gaussian noise, have been investigated in detail, with and without outliers. The approach has also been applied to the case of synthetic periodic signals, representing realistic situations of synchronization experiments in thermonuclear fusion. The results obtained have been very positive: the proposed Conditional Joint Recurrence Plots have always managed to identify the right interval of the causal influences and are very competitive with alternative techniques such as the Conditional Transfer Entropy.


Multiple Causality between Time Series
Most specialists agree in tracing the concept of synchronization back to Huygens and his study of what we define today as antiphase synchronized pendula. Presently, the study of synchronization between dynamical systems is generally focused on determining the nature of the coupling among the observables involved. As different forms of mutual influence exist, different synchronization typologies can be defined as well. Phase synchronization, generalized synchronization, and lag synchronization are just some examples [1]. Several valuable theoretical approaches have been proposed for their assessment, see, e.g., [2][3][4]. In practical applications, the evolution of the generalized and phase synchronizations between two quantities is among the most widely studied, while lag synchronization is often aimed at inferring the delay of the maximal influence between two observables.
As dynamical systems can be observed in all of natural science, from medicine and biology to engineering and physics, the interest in their couplings, dynamics, and relations is still a very active topic of research. In biology, for example, Pavlova et al. [5] applied detrended fluctuation analysis (DFA) to characterize complex interactions between neurophysiological signals, avoiding the destruction of the long-range correlations in the original data by prefiltering techniques.
The Joint Recurrence Plot (JRP) is defined as the Hadamard product of two RPs, allowing the investigation of possible simultaneous occurrences between two quantities or two dynamical systems. Mathematically, in the case of two time series [15],

JRP^{xy}_{ij} = Θ(ε_x − ‖x_i − x_j‖) · Θ(ε_y − ‖y_i − y_j‖),  x_i, x_j ∈ R^n, y_i, y_j ∈ R^m, i, j ∈ [0, N] (2)

with an analogous meaning for the symbols used in (1). From Equation (2), it emerges clearly that it is easy to include further quantities in the analysis [15], by simply performing the Hadamard product ("∘") of the JRP with an RP:

JRP^{xyz} = JRP^{xy} ∘ RP^z (3)

Various variants of the RP and JRP have been introduced [15]. Here, we propose the new Conditional Joint Recurrence Plot,

CJRP^{xy|z} = JRP^{xy} ∘ (1 − JRP^{zy}) (4)

While (3) allows estimating the simultaneous occurrences in phase space among three quantities, without distinguishing the actual synchronization between them, (4) has been conceived to filter out the influence of "z" on "y", so as to better investigate the influence of "x" on "y".
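Equations (2)-(4) can be sketched directly as matrix operations. The snippet below is a minimal illustration assuming already-embedded trajectories and the max-norm; the function names are illustrative, not those of any particular toolbox.

```python
import numpy as np

def recurrence_plot(traj, eps):
    """RP_ij = Theta(eps - ||s_i - s_j||): 0/1 matrix, max-norm on embedded states."""
    d = np.max(np.abs(traj[:, None, :] - traj[None, :, :]), axis=-1)
    return (d <= eps).astype(int)

def joint_rp(rp_a, rp_b):
    """JRP as the Hadamard (element-wise) product of two RPs, Eqs. (2)/(3)."""
    return rp_a * rp_b

def conditional_jrp(rp_x, rp_y, rp_zy):
    """CJRP^{xy|z} = JRP^{xy} o (1 - JRP^{zy}), Eq. (4):
    keeps x-y co-recurrences that are NOT z-y co-recurrences."""
    return joint_rp(rp_x, rp_y) * (1 - rp_zy)
```

Note that, since all matrices are binary, the Hadamard product acts as a logical AND and (1 − JRP^{zy}) as a logical NOT, which is what allows Eq. (4) to sift out the z-driven occurrences.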
The idea behind the methodology developed for the application of (4) is as follows. A key parameter of an RP and of its extensions is the threshold ε in (1). The higher its value, the more elements of the matrix equal 1; the lower its value, the more difficult it is to obtain a "1". This reflects the fact that, when the threshold is reduced, the points on the attractors have to be closer in order to count as recurrences. As the aim is to determine the delay between a driver "x" and a response observable "y" in the presence of an external perturbation "z", and to exclude the effect of unwanted occurrences, one can proceed by fixing the threshold ε_xy of JRP^{xy} around its optimal value and iteratively reducing the threshold ε_zy of JRP^{zy}. Then, for each value of ε_zy, a scan in the delay between the two signals is performed, to determine the time of maximum interaction between the two systems.
Progressively reducing ε_zy in Equation (4), starting from a value ε_zy ∼ ε_xy, the number of occurrences between z and y decreases. Only the points on the attractor of z that are close to those of y cause the Heaviside function to output a "1" in JRP^{zy} and consequently a "0" in the matrix of CJRP^{xy|z}. In this way, the occurrences between x and y are preserved and the ones between z and y are progressively sifted out. Scanning the value of ε_zy, the CJRP reaches a stable region, in which it is possible to extract the desired information reliably with specific indicators (see later). On the contrary, decreasing the threshold of RP^z in Equation (3), the resulting JRP provides the occurrences where both x and z strongly influence y, but the relative importance of the two cannot be determined. To conclude, the direct use of JRP^{xy} would not be able to single out spurious occurrences induced by external perturbations on the response system.
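One step of the procedure just described can be sketched as follows: for a fixed pair of thresholds, the x → y delay is scanned and an indicator of the CJRP is recorded for each delay. For brevity this sketch uses the recurrence rate of the CJRP as the indicator (the article uses <L>, <DET>, and <RR>), assumes already-embedded series, and uses illustrative function names.

```python
import numpy as np

def _rp(traj, eps):
    """RP via max-norm on already-embedded states (illustrative helper)."""
    d = np.max(np.abs(traj[:, None, :] - traj[None, :, :]), axis=-1)
    return (d <= eps).astype(int)

def cjrp_delay_scan(x, y, z, eps_xy, eps_zy, max_delay):
    """Recurrence rate of CJRP^{xy|z} as a function of the x -> y delay,
    for one (eps_xy, eps_zy) pair of the iterative threshold reduction."""
    rr = np.zeros(max_delay + 1)
    for d in range(max_delay + 1):
        n = len(y) - d
        xs, ys, zs = x[:n], y[d:], z[d:]          # y assumed to lag x by d samples
        jrp_xy = _rp(xs, eps_xy) * _rp(ys, eps_xy)
        jrp_zy = _rp(zs, eps_zy) * _rp(ys, eps_zy)
        rr[d] = (jrp_xy * (1 - jrp_zy)).mean()    # Eq. (4)
    return rr

# Outer iteration: keep eps_xy fixed and shrink eps_zy, e.g.
#   for eps_zy in eps_xy * np.linspace(1.0, 0.1, 10):
#       curve = cjrp_delay_scan(x_emb, y_emb, z_emb, eps_xy, eps_zy, 20)
```

The delay at which the indicator peaks, once it stops moving while ε_zy is reduced (the "stable region"), is the estimate of the causal delay.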
Besides allowing a synthetic visualization of the recurrences of complex processes, RPs, JRPs, and CJRPs may also be used to derive important properties of a system's phase space. It has to be considered, in fact, that single isolated points correspond to infrequent states with a short persistence, while the existence of vertical and horizontal lines accounts for states relatively stable in time. The most interesting features are the diagonal lines, corresponding to trajectories in the phase space visiting the same region at different times. Short diagonals are characteristic of weakly correlated, stochastic, or chaotic processes, while long diagonals occur for deterministic processes [15]. Thus, recurrence quantification analysis [17] provides a number of useful estimators, which are derived from the visual features occurring in RPs. In the following, three main estimators are considered [15,17]: the mean diagonal length (L), the determinism (DET), and the recurrence rate (RR). L and DET are related to the periodic behavior of the system analyzed, while RR estimates the density of recurrences. For coupled systems, the length of a diagonal is proportional to the duration of the local evolution of the trajectories; the determinism, defined as the percentage of recurrence points forming the diagonal lines and consequently related to the length of the diagonal lines, allows measuring the predictability (or determinism) of the system. Finally, the recurrence rate represents an indicator for correlated recurrences.
A detailed description of these estimators can be found in the literature [15,17] and is beyond the scope of this article. In Equations (5)-(7), the mathematical expressions for L, DET, and RR are provided according to [15,17]:

L = Σ_{l=l_min}^{N} l·P(l) / Σ_{l=l_min}^{N} P(l) (5)

DET = Σ_{l=l_min}^{N} l·P(l) / Σ_{i,j=1}^{N} RP_ij (6)

RR = (1/N²) Σ_{i,j=1}^{N} RP_ij (7)

where N is the number of samples, "l" stands for a specific diagonal length, and P(l) represents the number of diagonal lines with the specified length "l". The minimum value l_min in (5) and (6) has been set to l_min = 2 [15,18].
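The three estimators can be computed directly from the histogram P(l) of diagonal line lengths of an RP. The following is a minimal sketch with illustrative function names; note that, for simplicity, the line of identity is kept in the counts here, whereas RQA tools often exclude it.

```python
import numpy as np

def diagonal_hist(rp):
    """P(l): number of diagonal lines of exactly length l (LOI included here)."""
    n = rp.shape[0]
    counts = {}
    for k in range(-(n - 1), n):
        run = 0
        for v in np.append(np.diagonal(rp, k), 0):   # sentinel closes a trailing run
            if v:
                run += 1
            elif run:
                counts[run] = counts.get(run, 0) + 1
                run = 0
    return counts

def rqa(rp, lmin=2):
    """Mean diagonal length L, determinism DET, recurrence rate RR."""
    n = rp.shape[0]
    p = diagonal_hist(rp)
    num = sum(l * c for l, c in p.items() if l >= lmin)
    nlines = sum(c for l, c in p.items() if l >= lmin)
    rr = rp.sum() / n ** 2                    # density of recurrence points
    det = num / rp.sum() if rp.sum() else 0.0 # fraction of points on diagonals
    L = num / nlines if nlines else 0.0       # average diagonal line length
    return L, det, rr
```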
To conclude the section, it has to be mentioned that in this article a well-established CRP tool [18] has been used to evaluate the RPs, their extensions, and the estimators RR, DET, and L.

Transfer Entropy and Conditional Transfer Entropy
The aforementioned methodology has been compared with the application of the Transfer Entropy (TE) and of the Conditional Transfer Entropy (CTE) [16] on the same set of data.
From the publication of Schreiber [19] onward, the Transfer Entropy has been widely applied in the natural sciences to infer the information flow from a driver to a response system and consequently to infer Granger causality relationships [20].
Considering two physical observables evolving in time, and their representation as two time series "x" and "y", using the formalism of the (discrete) Markov process of order "m", it is possible to introduce the TE in a natural way. The underlying assumption is that the probability of the occurrence, for each observable, of its value at the time instance "n + 1" depends only on a certain number of previously assumed states of the quantity itself. In other words, x or y has a memory of order (k) or (l), respectively. The second assumption is that, if x has a causal influence on y, the past values of x allow minimizing the uncertainty in the prediction of y at the time "n + 1" [16].
Considering the previous assumptions, the TE quantifies the importance of the knowledge of x_n^{(k)}, of a process x^{(k)}, for predicting the occurrence of the state y_{n+1} of a process y^{(l)}, also taking into consideration the contribution from the memory of y itself. What has just been said can be expressed mathematically in terms of the conditional mutual information between x^{(k)} and y^{(l)}, or between x^{(k)}, y^{(l)}, and z^{(m)} when a further quantity is considered relevant for the information flow [16]:

TE_{x→y} = I(y_{n+1}; x_n^{(k)} | y_n^{(l)}) (8)

CTE_{x→y|z} = I(y_{n+1}; x_n^{(k)} | y_n^{(l)}, z_n^{(m)}) (9)

Equation (8) for discrete variables can be written as [19]

TE_{x→y} = Σ p(y_{n+1}, y_n^{(l)}, x_n^{(k)}) log [ p(y_{n+1} | y_n^{(l)}, x_n^{(k)}) / p(y_{n+1} | y_n^{(l)}) ] (10)

Equation (10) can also easily be interpreted as follows. If the knowledge of x_n^{(k)} does not improve the prediction of y_{n+1}, i.e., p(y_{n+1} | x_n^{(k)}, y_n^{(l)}) = p(y_{n+1} | y_n^{(l)}), then TE = 0. Considering continuous quantities, the definition changes slightly and gets more complicated, but the idea is the same. Similar ideas and assumptions are at the basis of the CTE as well.
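For discrete data, the TE of Equation (10) can be estimated with a simple plug-in (histogram) estimator. The sketch below assumes memories k = l = 1 and coarse binning, for illustration only; it is not the KSG estimator actually used later in the article.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, bins=2):
    """Plug-in estimate of TE_{x->y} in bits, Eq. (10) with k = l = 1."""
    xd = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    trip = Counter(zip(yd[1:], yd[:-1], xd[:-1]))   # (y_{n+1}, y_n, x_n)
    pair_yx = Counter(zip(yd[:-1], xd[:-1]))        # (y_n, x_n)
    pair_yy = Counter(zip(yd[1:], yd[:-1]))         # (y_{n+1}, y_n)
    single = Counter(yd[:-1])                       # y_n
    n = len(yd) - 1
    te = 0.0
    for (y1, y0, x0), c in trip.items():
        # p(y1|y0,x0) / p(y1|y0) = c * single[y0] / (pair_yx * pair_yy)
        te += (c / n) * np.log2(c * single[y0] / (pair_yx[(y0, x0)] * pair_yy[(y1, y0)]))
    return te
```

With a driver x and a response that simply copies it one step later, the estimate approaches one bit in the x → y direction and vanishes in the reverse direction, which is the asymmetry that makes the TE an indicator of directed information flow.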
It is worth noting in this introduction that the main issue with the TE lies in the estimation of the entropies in Equations (8) and (9), and consequently of the probability density functions (pdfs). Methods have been established to tackle this issue, like the Kraskov, Stögbauer, and Grassberger (KSG) estimator [21], which extends the Kozachenko-Leonenko estimator [22]. Basically, in the jointly embedded space, a specific norm (usually the max-norm) assesses the distance between kth nearest neighbors, and the entropies are expressed as functions of this distance and of the digamma function. Further details can be found in the literature and are beyond the scope of this article [16,21].
To conclude the section, it has to be mentioned that in this work the well-established JIDT tool presented in [23] has been used to evaluate the TE and the CTE.

Numerical Tests: Autoregressive Models
The first group of tests has been performed using synthetic data generated with the following family of autoregressive models, constructed on the basis of the work in [24]:

x_n = c_1 x_{n−1} + c_2 x_{n−2} + …,  y_n = f(x^α_{n−4}, z^β_{n−2}; c_5, c_6, c_7) (11)

In (11), the function f(x^α_{n−4}, z^β_{n−2}; c_5, c_6, c_7) varies depending on the test performed, as Table 1 reports; r_0 stands for a sample drawn from a standard normal Gaussian distribution, and r(µ = 0, σ = p·x_n) stands for a sample drawn from a Gaussian distribution with σ equal to a percentage "p" of the estimated value of x_n, y_n, or z_n. Quantities in (11) have been evaluated using 504 points, discarding the first four samples. In (11), both the driver ("x") and the perturbation ("z") influence the response quantity ("y"). For each evaluation of Table 1, twelve different realizations of the system (11) have been averaged.
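Since (11) is only partially reproduced above, the following generator is an illustrative stand-in rather than the authors' exact model: the recurrences chosen for x and z and the specific form of f (a nonlinear coupling with a correlation term, in the spirit of the f^{III} family) are assumptions. It only shows the structure of the tests: a driver acting on y with delay 4 and a perturbation acting on y with delay 2.

```python
import numpy as np

def generate_ar(n=504, couplings=(0.5, 0.5, 0.25), alpha=2, beta=2, p=0.0, seed=0):
    """Illustrative autoregressive family in the spirit of (11); coefficients
    and recurrences are assumptions, not the paper's exact model."""
    rng = np.random.default_rng(seed)
    c5, c6, c7 = couplings
    x, y, z = np.zeros(n), np.zeros(n), np.zeros(n)
    for t in range(4, n):
        x[t] = 0.5 * x[t - 1] + 0.25 * x[t - 2] + rng.normal()      # driver
        z[t] = 0.5 * z[t - 1] + rng.normal()                        # perturbation
        y[t] = (c5 * x[t - 4] ** alpha + c6 * z[t - 2] ** beta
                + c7 * x[t - 4] * z[t - 2] + rng.normal())          # response
        y[t] += rng.normal(0.0, abs(p * y[t]))                      # p% additive noise
    return x[4:], y[4:], z[4:]                                      # discard transient
```

Even in this toy version, the lagged dependence is visible: the correlation of y with x² peaks at lag 4, which is the delay the CJRP methodology is asked to recover.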
The methodology, based on the estimator described in Equation (4), performs well on autoregressive data. Figure 2 shows the behavior of the CJRP while varying ε_zy for the case of the f^{III}_{p=0}(x²_{n−4}, z²_{n−2}; 0.5, 0.5, 0.25) function. Black pixels indicate positive recurrences of both x and y. The most challenging tests have been the ones concerning the nonlinear coupling with or without a correlation term, i.e., f^{I}_{p=···}(·) and f^{III}_{p=···}(·) with (α, β) = (2, 2) according to the notation reported in Table 1, while varying the added percentage of noise p and the coupling terms c_{5,6,7}.
The overall procedure is reported graphically in Appendix A for one of the difficult cases studied, i.e., f^{III}(·). Figure 3 reports the behaviors of the averages <L>, <DET>, and <RR>, evaluated at two stable stages of the analysis obtained following the methodology described above.
It has also to be stated that the procedure proved to be robust against small errors in the evaluation of the initial guess of the threshold ε [15]. The methodology in fact allowed detecting the right delay between the driver "x" and the response system "y" even when varying the threshold itself over a reasonable interval, up to 30% of its initial estimate.
Figure 3 compares the estimators drawn from the CJRP (left column) and from the JRP (right column), varying the ε_zy threshold while leaving the ε_xy one at its optimal value; the overall iterative procedure is reported in Figures A1 and A2 in Appendix A.
The embedding dimension has been chosen as the value corresponding to the minimum percentage of false nearest neighbors (FNN). Following the Kennel, Brown, and Abarbanel methodology [25], R_tol = 15 and A_tol = 2.0 have been used to evaluate the FNN percentage. At the same time, the optimal lag for the construction of the embedded and delayed space has been set considering the first minimum of the mutual information [25]. Figure 3a-c shows the analysis performed with the same coupling between x and y (c_6 = c_5) and without noise. Figure 3d-f shows the results on the same formulation with 20% of added Gaussian noise.
The estimators suggest in both cases a delay of d_xy = 4. Increasing the noise, the peak at the actual delay appears less distinct from the overall behavior. However, for practical applications to real data, considering a parabolic fit, the indicators show a clear peak at the expected delay.
Results have been compared with the TE and its extension. Figure 4 reports the average value obtained with the use of the TE.
Negative values of the TE are due to the KSG algorithm used to evaluate the TE itself; they imply that the information flow is less than expected and are a symptom of no relationship [16,23].
Considering the results of the tests shown in Table 1, it has to be mentioned that the TE cannot always detect the actual delay d_xy = 4, but the use of the CTE clarifies the analysis, detecting the correct delay at which z influences y, i.e., d_zy = 2.
To complete the section, another test has been performed to assess the capability of the CJRP to detect the proper delay between z and y instead of between x and y. Figure 5 provides the behaviors of the estimators considered, for the case f^{III}_{p=0}(x²_{n−4}, z²_{n−2}; 0.5, 0.5, 0.25), stopping the aforementioned methodology once a stable behavior of the CJRP has been reached. Appendix A also provides the plots of the scan performed. As can be observed, the CJRP detects the correct delay at d_zy = 2.


Numerical Tests: Causality Horizon
A second typology of test has been performed, aimed at investigating the causality horizon [26][27][28][29], i.e., the maximum time interval within which two physical quantities are synchronized and in which one observable can be thought of as the "drive mechanism" of the second, for periodic and quasiperiodic signals of the same nature as those encountered in real experiments in thermonuclear scenarios [30,31]. In this kind of experiment, particular instabilities of the plasma, with potentially harmful effects on the machine integrity, are paced by triggering them frequently enough that they do not have time to become dangerous for either the performance or the safety of the reactor. The information related both to the evolution of such instabilities and to the occurrence of the pacing can be retrieved by measuring specific physical quantities in the form of time series. Besides practical implementation difficulties, the evaluation of the pacing efficiency is difficult because the instabilities are quasiperiodic, and therefore, after a perturbation induced by the control systems, if enough time is allowed to pass, they are bound to reoccur. Typical pacing techniques are the injection of frozen deuterium pellets for controlling ELM instabilities [31] or ICRH power modulation for the control of sawtooth instabilities [30]. The latter experiment has been considered in this paper, by means of a simple synthetic model. A square function with ν = 3 Hz and 50% duty cycle and two sawtooth functions with different frequencies ([ν_1, ν_2] = [3 Hz, 1.25 Hz]) have been considered. A delay of 50 samples has been added to the faster sawtooth function with respect to the square wave; then the two have been summed, as shown in Figure 6. The resulting time series are meant to simulate the modulation of a fast (paced) sawtooth by a square wave in the presence of a slower (natural) sawtooth behavior [30,31].
For this test, 1500 samples have been generated and the first 100 removed. Normal Gaussian noise with a standard deviation equal to p = [0.01, 0.05, 0.1] of the values assumed by the periodic functions has also been added to the data.
With regard to the nomenclature, the square function (the driver) is referred to as "x", the faster, delayed sawtooth (the response) as "y", and the slower sawtooth function (the external perturbation) as "z".
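The synthetic signals just described can be sketched as follows. The sampling rate (1000 Hz here) and the exact way the sawtooth components are combined are assumptions, since the text fixes only the frequencies, the delay in samples, and the noise levels.

```python
import numpy as np

def make_signals(fs=1000.0, n=1500, delay=50, p=0.0, seed=0):
    """Synthetic pacing experiment: a 3 Hz square driver "x", a 3 Hz sawtooth
    delayed by `delay` samples as the response "y", and a slower 1.25 Hz
    "natural" sawtooth as the perturbation "z" (construction is illustrative)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs
    x = np.where((3.0 * t) % 1.0 < 0.5, 1.0, 0.0)     # 3 Hz square, 50% duty cycle
    z = (1.25 * t) % 1.0                              # slower natural sawtooth
    y = (3.0 * (t - delay / fs)) % 1.0 + z            # delayed fast sawtooth, summed
    for s in (x, y, z):
        s += rng.normal(0.0, 1.0, n) * (p * s)        # noise sd = p of the local value
    return x[100:], y[100:], z[100:]                  # first 100 samples discarded
```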
Results are reported in Figure 7. All the estimators for the noiseless test show the correct delay at d_xy = 50. Adding the noise, from the results in Figure 7d-f, it can be seen that the L estimator is not very informative in this case, while the correct delay can be detected, and more clearly observed, using DET and RR.
Again, the results are compared with those obtained with the TE and the CTE, as can be observed in Figure 8a for the noise-free case and in Figure 8b for the case with 10% of added noise. Figure 8 shows the behaviors for the synthetic signals analyzed and shown in Figure 6 with (a) 0% of noise and (b) 10% of Gaussian noise on the data. Negative values of the TE are due to the KSG algorithm used to evaluate the TE itself; they imply that the information flow is less than expected and are a symptom of no relationship [16,23].
Considering Figure 8a, the TE allows estimating the correct delay between the driver and the response system, but spurious peaks appear in both Figure 8a,b. In real applications, the actual delay of d_xy = 50 samples would not be detected directly. The CTE improves the results, especially for the noiseless data in Figure 8a. However, the behavior of the CTE with noisy data (Figure 8b) requires a deeper and nontrivial analysis that might lead to wrong assessments when analyzing real data with periodic or quasiperiodic behaviors, typical of synchronization experiments in magnetically controlled nuclear fusion. As in the previous tests, it has to be considered that the CTE depends on two delays: one between the driver "x" and the response "y" (d_xy) and another between the secondary driver, i.e., the perturbation "z", and the response quantity itself (d_zy). Fixing a certain value of d_zy, the behavior of the CTE as a function of d_xy can therefore be observed. Scanning the possible delays between the two sawtooth functions ("z" and "y"), i.e., changing the d_zy values and observing the behavior of the CTE, completely different behaviors emerge, in which the maximum of the CTE at a specific d_xy cannot be detected clearly. This is basically due to the periodic character of the signals and of the secondary driver. Consequently, the delays d_zy to be set in order to proceed with the analysis, i.e., in order to detect the d_xy at which the maximal influence of "x" on "y" occurs, have been selected considering the local maxima observed in TE_{x→y}. In this way, it has been noticed that the local peak at d_xy = 50 remained almost unaltered, as can be observed in Figure 8b, being the actual delay between the driver and the response system and consequently the maximum causality horizon for the signals analyzed.
On the other hand, the analysis performed with the CJRP and shown in Figure 7 is actually less vulnerable to misinterpretation. Secondary peaks appear, but the actual one at the expected delay is identified directly by two of the three estimators considered, without any further intervention.

Extension to More than Three Observables
In all natural sciences, it is not uncommon to deal with multiple sources affecting the same observable, i.e., the response system. An important example in nuclear fusion deals with the need to avoid disruptions [32,33]. Disruptions are abrupt and unwanted events occurring in magnetically controlled nuclear fusion tokamaks due to the rapid loss of the plasma confinement, usually on a time scale of a few milliseconds, causing the termination of the pulse and having the potential to cause serious damage to the reactor. While it is known nowadays that many factors concur in the occurrence of this kind of event, neither their relative importance nor the sequence of the triggering factors is fully understood. Consequently, to investigate the possibility of using the CJRP with more than three variables, tests have been conducted and an example is reported in this section. Another observable "v" has been introduced in (11) and coupled linearly with y with a delay of d_vy = 6. The system used to generate the synthetic data is reported in Equation (12), which actually extends (11) and has also been built on the basis of the work in [24]:

…, √2 v_{n−4} + r_0,  y_n = 0.5 x_{n−4} + 0.5 z_{n−2} + 0.5 v_{n−6} (12)

where r_0 stands for a sample drawn from a standard normal Gaussian distribution. For this analysis, 506 samples have been evaluated and the first six discarded. The objective of this test is to confirm that the CJRP approach can identify the proper delay between a driver and a target in the presence of more than one disturbing influence. The methodology described in the previous sections has been applied, and the results are reported here for the stable region, again using twelve repetitions of the system in (12). In this example, the thresholds ε_zy and ε_vy have been reduced iteratively. The CJRP allows estimating quite easily the correct synchronization between x and y at the delay d_xy = 4, as can be derived by simple inspection of Figure 9.
The study is aimed at highlighting the synchronization between x and y, removing the effects of both z and v on y.
Considering (12), the expected delay is d_xy = 4.
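The text does not write out the four-variable CJRP explicitly; a natural extension of Eq. (4), used here as an assumption, conditions JRP^{xy} on both JRP^{zy} and JRP^{vy}, so that co-recurrences of y with either perturbation are removed:

```python
import numpy as np

def _rp(traj, eps):
    """RP via max-norm on already-embedded states (illustrative helper)."""
    d = np.max(np.abs(traj[:, None, :] - traj[None, :, :]), axis=-1)
    return (d <= eps).astype(int)

def cjrp_two_perturbations(x, y, z, v, eps_xy, eps_zy, eps_vy):
    """Assumed four-variable extension of Eq. (4):
    CJRP^{xy|z,v} = JRP^{xy} o (1 - JRP^{zy}) o (1 - JRP^{vy})."""
    jrp_xy = _rp(x, eps_xy) * _rp(y, eps_xy)
    jrp_zy = _rp(z, eps_zy) * _rp(y, eps_zy)
    jrp_vy = _rp(v, eps_vy) * _rp(y, eps_vy)
    return jrp_xy * (1 - jrp_zy) * (1 - jrp_vy)
```

With the thresholds ε_zy and ε_vy reduced iteratively as in the three-variable case, the same stable-region criterion applies; further conditioning factors can be chained in the same way.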

Presence of Outliers
As a final test, the presence of outliers has also been investigated. Here, the results obtained for the f^{III}_{p=0.1}(x²_{n−4}, z²_{n−2}; 0.5, 5, 2.5) relationship described in Table 1 are reported, as this formulation belongs to one of the most challenging cases studied. The test has been performed considering the system described in (11), but adding to each element x_n, y_n, z_n a randomly drawn Gaussian value, i.e., N_{x_n,y_n,z_n}(µ, σ), according to the following rule:

N_{x_n,y_n,z_n}(µ, σ) = N(µ = 0, σ = p·(x_n, y_n, z_n)) if g ∈ U[0, 1] < 0.2
N_{x_n,y_n,z_n}(µ, σ) = N(µ = (x_n, y_n, z_n), σ = p·(x_n, y_n, z_n)) if g ∈ U[0, 1] ≥ 0.2 (13)

where U[0, 1] indicates a uniform distribution between 0 and 1 from which a scalar value g is drawn. Results are reported in Figure 10 for the two stable iterations of the suggested methodology and in Figure 11 for the TE and CTE. RR and DET identify correctly the actual delay between x and y. On the other hand, in this case it can be observed that the TE cannot clearly identify the correct delay, as a spurious peak at d_xy = 19 emerges with almost the same amplitude as the right one; the CTE, on the contrary, provides the appropriate information.
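The contamination rule of Eq. (13) can be sketched as follows, taken at face value (the function name and the `frac` parameter exposing the 0.2 split are illustrative):

```python
import numpy as np

def add_outliers(s, p=0.1, frac=0.2, seed=0):
    """Contamination rule of Eq. (13), as written: draw g ~ U[0,1] per sample;
    if g < frac, add N(0, p*s_n) noise; otherwise add N(s_n, p*s_n),
    i.e. a Gaussian value centred on the sample itself."""
    rng = np.random.default_rng(seed)
    g = rng.uniform(size=len(s))
    mu = np.where(g < frac, 0.0, s)          # outlier branch shifts by the value
    return s + rng.normal(mu, np.abs(p * s)) # sigma = p of the local value
```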

Figure 10. Behaviors of <RR> and <DET> in panels (a,b), respectively, for the case f^{p=0.1}_{III}(x²_{n−4}, z²_{n−2}; 0.5, 5, 2.5) considering outliers in the data. The methodology detects the correct delay at d_xy = 4, with the exception of the <L> estimator, which has not been reported.
Figure 11. Behavior of (a) TE and (b) CTE for the autoregressive system in the presence of outliers as described in Equation (13). The TE detects the delay at d_xy = 4 and a spurious one at d_xy = 19 with almost the same amplitude, while the CTE finds the actual delay d_xy = 4 at d_zy = 2.
To complete the section, another test with different percentages of outliers (10%, 20%, or 30%) has been performed. In this case, either a Normal or a Skew-normal distribution has been used as the pdf to generate the outliers themselves. For coherence, the same relationship f^{p=0.1}_{III}(x²_{n−4}, z²_{n−2}; 0.5, 5, 2.5) reported previously has been used. The time occurrence of an outlier has been chosen randomly; in the Normal case, the outlier has been drawn from N(µ = (x_n, y_n, z_n), σ = 0.1(x_n, y_n, z_n)). For the Skew-normal [34] case, a scalar g has first been picked from a uniform distribution U[0, 1] for each outlier; if g ≤ 0.5, the skewness parameter has been set to 4, otherwise to −4. Results in both cases are coherent with the test reported in the previous paragraph. Already with 10% of outliers, the TE cannot detect the expected delay, and an analysis based on this estimator would suggest a weak or no correlation between the driver and the response quantity. The CTE instead behaves similarly to Figure 11b: setting the right delay of d_zy = 2 between the quantity z and the response one, the maximum of the curve is observed at either d_xy = 3 or d_xy = 4, without a sensible difference between the two values. Increasing the number of outliers lowers the value of the maximum detected by the CTE. While the JRP_xyz does not detect the correct delay between the driver and the response quantity, with the CJRP, as in the previously reported test, the L estimator is not informative, but the RR and the DET correctly detect a delay between the driver and the response system at either d_xy = 4 or d_xy = 5. The higher the number of outliers, the more oscillations are observed. As with the CTE, no particular dependence on the type of pdf used has been observed.
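The Skew-normal outlier generation described above can be sketched with the classic two-normal construction for skew-normal sampling; the function names and the choice of centring the outlier on the sample with σ = 0.1·|value| are illustrative assumptions, not the article's exact code.

```python
import numpy as np

def skewnormal_sample(alpha, loc=0.0, scale=1.0, rng=None):
    """One skew-normal variate with shape `alpha`, via the standard
    construction X = delta*|Z0| + sqrt(1-delta^2)*Z1, where Z0, Z1
    are independent standard normals and delta = alpha/sqrt(1+alpha^2)."""
    rng = np.random.default_rng(rng)
    delta = alpha / np.sqrt(1.0 + alpha**2)
    z0, z1 = rng.standard_normal(2)
    x = delta * np.abs(z0) + np.sqrt(1.0 - delta**2) * z1
    return loc + scale * x

def draw_outlier(value, p=0.1, use_skew=False, rng=None):
    """Outlier centred on the sample with sigma = p*|value| (assumed);
    in the skew-normal case the shape parameter is +4 or -4 with equal
    probability, following the g <= 0.5 rule from the text."""
    rng = np.random.default_rng(rng)
    if not use_skew:
        return rng.normal(value, abs(p * value))
    alpha = 4.0 if rng.uniform() <= 0.5 else -4.0
    return skewnormal_sample(alpha, loc=value, scale=abs(p * value), rng=rng)
```

Replacing 10%, 20%, or 30% of randomly chosen samples with `draw_outlier` values reproduces, under these assumptions, the contamination levels compared in the test.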

Discussion on Lines of Future Investigation
In terms of future applications in thermonuclear fusion, the developed tools could contribute to the development of integrated scenarios for ITER [35] and to the refinement of diagnostics for the magnetic fields [36,37]. Other interesting challenges would be the determination of causal relations between various atmospheric phenomena in the environmental sciences, whose periodic nature renders detrending and interpretation very delicate [38].

Conclusions
In this work, a new type of Recurrence Plot has been defined: the Conditional Joint Recurrence plot, or CJRP. Its properties have been tested using synthetic data generated with autoregressive linear and nonlinear models, both noisy and noise-free, with different couplings, as well as with noisy and noise-free periodic synthetic signals, including outliers. Results have shown the capability of the CJRP to identify the actual time delay of the causal influences, removing the effects of external perturbations up to a relevant level of noise added to the data. The outcomes of the tests have been averaged over twelve repetitions to provide a significant statistical basis for the conclusions. It should be mentioned that, in terms of individual realizations, 92% of cases have been correctly detected overall. The lowest success rate, 75%, has been obtained for the nonlinear case f^{p=0.2}_{III}(x²_{n−4}, z²_{n−2}; 0.5, 0.5, 0.25); this is thought to be due to the typology of the formulation analyzed and to the specific noise added.
Different formulations, including some based on Cross Recurrence Plots, have been tested before selecting the one in (4); the best results have been obtained with the formulation reported.
The great flexibility of the Joint Recurrence Plots, which can include more quantities simply by adding terms in the Hadamard product, allows the CJRP to be easily extended to more than three quantities. Tests have been performed using the autoregressive system studied in this article, and the results confirm this statement.
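The Hadamard-product construction can be sketched as follows. This is a minimal sketch, assuming a fixed-threshold recurrence criterion on scalar series; the function names and the criterion are illustrative assumptions, not the article's exact implementation.

```python
import numpy as np

def recurrence_matrix(x, eps):
    """Binary recurrence matrix: R[i, j] = 1 if |x_i - x_j| <= eps
    (fixed-threshold criterion on a scalar series)."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist <= eps).astype(int)

def joint_recurrence(*series_eps):
    """Joint recurrence plot as the Hadamard (elementwise) product of
    the individual recurrence matrices; passing one more (series, eps)
    pair extends the plot to one more quantity."""
    mats = [recurrence_matrix(x, e) for x, e in series_eps]
    jrp = mats[0]
    for m in mats[1:]:
        jrp = jrp * m   # elementwise product keeps only joint recurrences
    return jrp

def recurrence_rate(rp):
    """RR estimator: density of recurrent points in the plot."""
    return rp.mean()
```

Shifting the driver series by a candidate delay d before building the joint plot, and scanning RR (or DET) as a function of d, is, in spirit, how the delay of maximal influence is inferred in the analyses above.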
In conclusion, for practical purposes, the proposed CJRP can be implemented to complement the transfer entropy and the conditional transfer entropy. Moreover, the CJRP approach presents various advantages: it is simpler both to implement and to interpret, and it can be extended immediately to a higher number of variables, which remains a very serious challenge for the TE and CTE.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In this appendix, the plots regarding the methodology described in the article are reported; the iterative procedure via the estimators used to quantify the CJRP can be observed. Figures A1 and A2 are related to the study reported in Figure 3, while Figure A3 is related to the counterexample reported in Figure 5. All refer to the case f^{p=0.2}_{III}(x²_{n−4}, z²_{n−2}; 0.5, 0.5, 0.25). One set of studies is aimed at inferring the correct delay between x and y at d_xy = 4 while minimizing the influence of z; the other at inferring the correct delay between z and y at d_zy = 2 while minimizing the influence of x. In all figures, vertical black lines have been added at the expected delay between the driver and the response system, and the most informative plots have been put in a red and thicker frame.