Information Theory to probe Intrapartum Fetal Heart Rate Dynamics

Intrapartum fetal heart rate (FHR) monitoring constitutes a reference tool in clinical practice to assess the baby's health status and to detect fetal acidosis. It is usually analyzed by visual inspection grounded on FIGO criteria. Characterizing intrapartum FHR temporal dynamics remains a challenging task that continuously receives academic research efforts. Complexity measures, often implemented with tools referred to as \emph{Approximate Entropy} (ApEn) or \emph{Sample Entropy} (SampEn), have regularly been reported as significant features for intrapartum FHR analysis. We explore how Information Theory, and especially {\em auto mutual information} (AMI), is connected to ApEn and SampEn and can be used to probe FHR dynamics. Applied to a large (1404 subjects) and documented database of FHR data collected in a French academic hospital, it is shown that i) auto mutual information outperforms ApEn and SampEn for acidosis detection in the first stage of labor and continues to yield the best performance in the second stage; ii) Shannon entropy increases as labor progresses and is always much larger in the second stage; iii) babies suffering from fetal acidosis additionally show more structured temporal dynamics than healthy ones, and this progressive structuration can be used for early acidosis detection.


Introduction
Intrapartum fetal heart rate monitoring Because it is likely to provide obstetricians with significant information related to the health status of the fetus during delivery, intrapartum fetal heart rate (FHR) monitoring is a routine procedure in hospitals. Notably, it is expected to permit the detection of fetal acidosis, which may induce severe consequences for both the baby and the mother and thus requires a timely and relevant decision for rapid intervention and operative delivery [1]. In daily clinical practice, FHR is mostly inspected visually, following clinical guidelines formalized by the International Federation of Gynecology and Obstetrics (FIGO) [2,3]. However, it has been well documented that such visual inspection is prone to severe inter-individual variability and even shows a substantial intra-individual variability [4]. This reflects both that FHR temporal dynamics are complex and hard to assess, and that FIGO criteria lead to a demanding evaluation, as they mix several aspects of FHR dynamics (baseline drift, decelerations, accelerations, long- and short-term variabilities). Difficulties in performing an objective assessment of these criteria have led to a substantial number of unnecessary Caesarean sections [5]. This has triggered a large amount of research worldwide aiming both to compute the FIGO criteria in a reproducible and objective way [2], and to devise new signal-processing-inspired features to characterize FHR temporal dynamics (cf. [6,7] for reviews).
Related works After the seminal contribution to the analysis of heart rate variability (HRV) in adults [8], spectrum estimation has been amongst the first signal processing tools considered for computerized analysis of FHR, either constructed on models driven by characteristic time scales [9,10,11] or on scale-free paradigms [12,13,14]. Further, aiming to explore temporal dynamics beyond mere temporal correlations, several variations of nonlinear analysis have been envisaged both for antepartum and intrapartum FHR [15], based, e.g., on multifractal analysis [13], scattering transforms [16], phase-driven synchronous pattern averages [17] or complexity and entropy measures [18,19,20]. Interested readers are referred to, e.g., [6,7] for overviews. There have also been several attempts to combine features of different natures by performing multivariate classification using supervised machine learning strategies (cf., e.g., [21,6,22,23,24]).
Measures from Complexity Theory or Information Theory remain, however, amongst the most used tools to construct HRV characterizations. They are defined either within the framework of (deterministic) dynamical systems or within that of (random) stochastic processes. The former led to standard references, both for adult and for antepartum and intrapartum fetal heart rate analysis: Approximate Entropy (ApEn) [18,25] and Sample Entropy (SampEn) [26], which can be regarded as practical approximations to the Kolmogorov-Sinai or Eckmann-Ruelle complexity measures. The stochastic process framework leads to the definitions of Shannon and Rényi entropies and entropy rates. Both worlds are connected by several relations, cf., e.g., [27,28] for reviews. Implementations of ApEn and SampEn rely on correlation integral (CI) based algorithms [18,29], while that of Shannon entropy rates may instead benefit from k-nearest neighbor (k-NN) algorithms [30], which bring robustness and improved performance to entropy estimation [31,32,33].
Labor stages Automated FHR analysis is complicated by the existence of two distinct stages during labor. The dilatation stage (stage 1) consists of progressive cervical dilatation and regular contractions. The active pushing stage (stage 2) is characterized by a fully dilated cervix and expulsive contractions. The most common approaches in FHR analysis consist either in not distinguishing the stages and performing a global analysis [34,23], or in focusing on stage 1 only, as it is better documented and usually provides data of better quality, cf. e.g. [35,24]. Whether or not the temporal dynamics associated with each stage are different has not been intensively explored yet (see a contrario [36,37]). Recently, however, some contributions have started to conduct systematic comparisons [38,39].
Goals, contributions and outline The present contribution belongs to the category of works aiming to design new efficient features for FHR analysis, here based on advanced information-theoretic concepts. These new tools are applied to a high quality, large (1404 subjects) and documented FHR database collected across years in an academic hospital in France, and described in Section 2. The database is split into two datasets, each associated with one stage of labor, which enables us first to assess and compare the acidosis detection performance achieved by the proposed features independently on each stage, and second to address differences in FHR temporal dynamics between the two stages. Reexamining formally the definitions of entropy rates in Information Theory, Section 3.1 first establishes that they can be split into two components: the Shannon entropy, which quantifies static properties of the data, and the auto mutual information (AMI), which characterizes temporal dynamics in the data, combining both linear and nonlinear (or higher-order) statistics. ApEn and SampEn, defined from Complexity Theory, are then explicitly related to entropy rates, and hence to AMI (cf. Section 3.3). Estimation procedures for the Shannon entropy, entropy rate and AMI, based on k-nearest neighbor (k-NN) algorithms [30], are compared to those of ApEn and SampEn, constructed on correlation integral algorithms [18,40,29]. Acidosis detection performances are reported in Section 4.1. Results are discussed in terms of quality versus analysis window size, of k-NN versus correlation integral based procedures, and of differences between stage 1 and stage 2. Further, a longitudinal study consisting of sliding analyses in overlapping windows across the delivery process shows that the processes characterizing stage 1 and stage 2 are different (Section 4.4).
Datasets: intrapartum fetal heart rate time series and labour stages
Data collection. Intrapartum fetal heart rate data were collected at the academic Femme-Mère-Enfant hospital, in Lyon, France, during daily routine monitoring across the years 2000 to 2010. They were recorded using STAN S21 or S31 devices with internal scalp electrodes at 12-bit resolution and 500 Hz sampling rate (STAN, Neoventa Medical, Sweden). Clinical information was provided by the obstetrician in charge, reporting delivery conditions as well as the health status of the baby, notably the umbilical artery pH after delivery and the decision for intervention due to suspected acidosis [41]. Datasets. For the present study, subjects were included using criteria detailed in [41,24], leading to a total of 1404 tracings, lasting from 30 minutes to several hours. These criteria essentially aim to reject subjects with too low quality recordings (e.g., too many missing data, too large gaps, too short recordings, . . . ). As a result, for subjects in the database, the average fraction of data missing in the last 20 minutes before delivery is less than 5%. The first goal of the present work is to assess the relevance of new information-theoretic measures; their robustness to poor quality data is postponed for future work.
The measurement of pH, performed by blood test immediately after delivery, is systematically documented and used as ground-truth: when pH ≤ 7.05, the newborn is considered as having suffered from acidosis, and referred to as acidotic (A, pH ≤ 7.05). Conversely, when pH > 7.05, the newborn is considered as not having suffered from acidosis during delivery, and termed normal (N, pH > 7.05). In order to have a meaningful pH indication, we retain only subjects for which the time between the end of recording and birth is less than or equal to 10 min.
Following the discussion above on labor stages, subjects are split into two different datasets. Dataset I consists of subjects for which delivery took place after a short stage 2 (less than 15 min) or during stage 1 (stage 2 was absent). It contains 913 normal and 26 acidotic subjects. Dataset II gathers FHR for deliveries that took place after more than 15 min of stage 2. It contains 450 normal and 15 acidotic subjects. Beat-per-Minute time series and preprocessing. For each subject, the collection device provides us with a digitized list of RR-interarrivals ∆k in ms. In reference to common practice in FHR analysis, and for ease of comparison amongst subjects, RR-interarrivals are converted into regularly sampled beat-per-minute (BPM) time series, by linear interpolation of the samples {. . . , 60000/∆k , . . .}. The sampling frequency has been set to fs = 10 Hz, as the FHR does not contain any relevant information above 3 Hz.
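This preprocessing step can be sketched as follows (our own minimal Python illustration, not the code used for the study; the helper name `rr_to_bpm` is ours):

```python
import numpy as np

def rr_to_bpm(rr_ms, fs=10.0):
    """Convert successive RR-interarrival times (ms) into a regularly
    sampled beats-per-minute series by linear interpolation."""
    rr_ms = np.asarray(rr_ms, dtype=float)
    beat_times = np.cumsum(rr_ms) / 1000.0   # beat arrival times in seconds
    bpm_at_beats = 60000.0 / rr_ms           # 60000 ms per minute -> BPM
    t = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    return t, np.interp(t, beat_times, bpm_at_beats)

# A constant RR interval of 500 ms corresponds to a constant 120 BPM:
t, bpm = rr_to_bpm([500.0] * 20)
```

The instantaneous BPM values, defined at the irregular beat arrival times, are simply resampled onto a regular 10 Hz grid.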

Methods
Outline We describe in this section the five features that we use to analyze heart rate signals. We propose to apply Information Theory, as defined by Shannon, to the analysis of cardiac signals. We do so by computing the Shannon entropy, the Shannon entropy rate, and the auto mutual information. The first section below is devoted to the definition of these quantities, which provide three features. The second section reports the definitions of two features rooted in Complexity theory: Approximate Entropy (ApEn) and Sample Entropy (SampEn), which are classics in cardiac signal analysis. Although we use them in practice only as benchmarks, we devote a last section to their relation with the new features we propose.
Information Theory and Complexity Theory only differ in the nature of the objects under study. Information Theory, on the one hand, aims to analyze random processes and defines functionals of probability densities. Complexity Theory, on the other hand, aims to analyze signals produced by dynamical systems and assumes the existence of ergodic probability measures to describe the density of trajectories in phase space, so that they can be manipulated as probability densities. In this spirit, we consider throughout this paper the signals to analyze as random processes, although they indeed originate from a dynamical system.
Assumptions For the sake of simplicity in the description of the features, and for practical use, we assume that signals are monovariate (one-dimensional) and centered (zero mean), because the five features we use are independent of the first moment of the probability density function. We also assume that signals are stationary. Although this may at first seem a very strong assumption, it is very reasonable when examining time windows smaller than the natural time scale of the evolution between stages 1 and 2, as we discuss in section 4.4, and larger than events such as contractions. Finally, we also assume that the signals contain N points, sampled at a constant frequency. All estimates depend on N via finite size effects. In the following, we do not mention this dependence explicitly in the notations, and only compare features computed over the same window size.
Time-embedding Because we are interested in the dynamics of the signal, we use the delay-embedding procedure introduced by Takens [42] in the context of dynamical systems. Its goal is to include time correlations into the statistics. We construct, by sampling the initial process X every τ points in time, an m-dimensional time-embedded process X^(m,τ) as:

x^(m,τ)_t = (x_t, x_{t−τ}, . . . , x_{t−(m−1)τ}).    (1)

In practice, we have a finite number N of points in the time series, so there are N − (m − 1)τ well-defined embedded vectors.
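The embedding construction can be sketched as follows (a minimal illustration; `delay_embed` is a hypothetical helper name, and the coordinates are ordered forward in time, which is immaterial for the entropy-based features):

```python
import numpy as np

def delay_embed(x, m, tau):
    """Takens delay embedding: rows are the m-dimensional vectors
    (x[t], x[t+tau], ..., x[t+(m-1)*tau]); there are N-(m-1)*tau of them."""
    x = np.asarray(x)
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("time series too short for this (m, tau)")
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])

emb = delay_embed(np.arange(10), m=3, tau=2)   # 6 vectors of dimension 3
```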

Information Theory features
We now briefly recall definitions from Information Theory introduced by Shannon [43]. This paradigm aims to describe processes in terms of information, and it can be applied to any experimental signals.

Shannon and Rényi Entropies
Shannon entropy H(X^(m,τ)) of an m-dimensional embedded process X^(m,τ) is a functional of its joint probability density function p(x^(m,τ)_t) [43]:

H(X^(m,τ)) = −∫ p(x^(m,τ)_t) log p(x^(m,τ)_t) dx^(m,τ)_t,    (2)

which does not depend on t, thanks to the stationarity of the signal. Shannon entropy measures the total information contained in the process X^(m,τ). For embedding dimension m = 1, it is independent of the sampling parameter τ and we write in the following H(X^(1,τ)) = H(X).
Rényi q-order entropy Rq(X^(m,τ)) is defined as another functional of the probability density [44]:

Rq(X^(m,τ)) = 1/(1−q) log ∫ p(x^(m,τ)_t)^q dx^(m,τ)_t.    (3)

When q → 1, the Rényi q-order entropy converges to the Shannon entropy.

Entropy Rates
Shannon entropy rate is defined as:

h(X) = lim_{m→∞} (1/m) H(X^(m,τ)),    (4)

which can be shown to be equivalent to:

h(X) = lim_{m→∞} [H(X^(m+1,τ)) − H(X^(m,τ))].    (5)

We define the m-order Shannon entropy rate as measuring how much the Shannon entropy increases between two consecutive time-embedding dimensions m and m + 1:

h^(m,τ)(X) = H(X^(m+1,τ)) − H(X^(m,τ)).    (6)

It quantifies the variation of the total information in the time-embedded process when the embedding dimension m is increased by 1. Interpreting eq.(6), it measures the information in the (m+1)-th coordinate of X^(m+1,τ) that is not already contained in the m other coordinates, which can be expressed as a conditional entropy:

h^(m,τ)(X) = H(X_t | X^(m,τ)_{t−τ}).    (7)

Rényi order-q entropy rates and m-order Rényi order-q entropy rates can be defined in the same way, replacing the Shannon entropy H by the Rényi order-q entropy Rq in eq.(4) and eq.(6), respectively. Nevertheless, it should be emphasized that the Rényi order-q entropy lacks the chain rule of conditional probabilities as soon as q ≠ 1; therefore, eq.(7) does not hold for the Rényi order-q entropy, unless q = 1 (Shannon entropy).

Mutual Information
The Mutual Information (MI) of two processes measures the information they share [43]. MI is the Kullback-Leibler divergence [45] between the joint probability density function and the product of the marginals, which would equal the joint probability density function if the two processes were independent. For time-embedded processes X^(m,τ) and Y^(p,τ), it reads:

I(X^(m,τ)_t, Y^(p,τ)_{t′}) = ∫∫ p(x^(m,τ)_t, y^(p,τ)_{t′}) log [ p(x^(m,τ)_t, y^(p,τ)_{t′}) / ( p(x^(m,τ)_t) p(y^(p,τ)_{t′}) ) ] dx^(m,τ)_t dy^(p,τ)_{t′}.    (8)

Mutual information is symmetrical with respect to its two arguments. If X and Y are stationary processes, the mutual information I(X^(m,τ), Y^(p,τ)) depends only on t′ − t, the time difference between X^(m,τ)_t and Y^(p,τ)_{t′}.
Auto Mutual Information If Y = X and t′ − t = pτ, the MI measures the information shared by two consecutive chunks of the same process X, both sampled at τ. This quantity is sometimes called "information storage" [46,47,33], and we refer to it as the Auto Mutual Information (AMI) of the process X:

I^(m,p,τ)(X) = I(X^(m,τ)_t, X^(p,τ)_{t+pτ}).    (9)

Remarking that the concatenation of x^(p,τ)_{t+pτ} and x^(m,τ)_t is nothing but x^(m+p,τ)_{t+pτ}, and invoking stationarity, the AMI depends on the embedding dimensions (m, p) and the sampling time τ only. The AMI of order (m, p) measures the shared information between consecutive m-point and p-point dynamics, i.e., by how much the uncertainty of the future p-point dynamics is reduced if the previous m-point dynamics is known.
Thanks to the symmetry of the MI with respect to its two arguments, and invoking the stationarity of X, the AMI is invariant when exchanging m and p:

I^(m,p,τ)(X) = I^(p,m,τ)(X).    (10)

We emphasize that the MI, and therefore the AMI, are defined only for the Shannon entropy. The expression of the Rényi order-q mutual information is not unique as soon as q ≠ 1, and we do not consider it here.
Special case p = 1 If p = 1, the AMI is directly related to the Shannon entropy rate of order m:

I^(m,1,τ)(X) = H(X) − h^(m,τ)(X),

or equivalently

h^(m,τ)(X) = H(X) − I^(m,1,τ)(X).    (11)

Interestingly, this splits the entropy rate h^(m,τ)(X) into two contributions. The first one is the total entropy H(X) of the process, which depends only on the one-point statistics, and so does not describe the dynamics of X. The second term is the AMI I^(m,1,τ)(X), which depends on the dynamics of the process X, irrespective of its variance [48].

Special case of a process with Gaussian distribution For illustration, if X is a stationary Gaussian process, hence fully defined by its variance σ² and normalized correlation function c(τ), we have:

H(X^(m,τ)) = (m/2) log(2πe σ²) + (1/2) log |Σ^(m)|,    (12)

where Σ^(m) is the m × m correlation matrix of the process X, with Σ_{i,j} = c(|i − j|τ) and |Σ^(1)| = 1. For the particular case p = 1, we have:

h^(m,τ)(X) = (1/2) log(2πe σ²) − (1/2) log( |Σ^(m)| / |Σ^(m+1)| ),    (13)

where the first term is H(X) and the second is I^(m,1,τ)(X). This clearly illustrates the decomposition of the entropy rate according to eq.(11): the first term H(X) depends only on the static (one-point) statistics (via σ²) and the second term I^(m,1,τ)(X) depends only on the temporal dynamics (in this simple case, via the auto-correlation function c(τ)).
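The Gaussian decomposition can be checked numerically on a simple case. The sketch below (our own illustration, assuming an AR(1)-type correlation c(kτ) = a^k with unit variance, for which the entropy rate is known in closed form) verifies that H(X) minus the AMI reproduces the exact entropy rate:

```python
import numpy as np

# Gaussian process with AR(1)-type correlation c(k*tau) = a^k, Var(X) = 1.
a, sigma2, m = 0.8, 1.0, 6

def corr_matrix(n):
    # n x n correlation matrix Sigma^(n), entries c(|i-j|*tau) = a^|i-j|
    idx = np.arange(n)
    return a ** np.abs(idx[:, None] - idx[None, :])

H1 = 0.5 * np.log(2 * np.pi * np.e * sigma2)     # one-point entropy H(X)
ami = 0.5 * np.log(np.linalg.det(corr_matrix(m))
                   / np.linalg.det(corr_matrix(m + 1)))  # I^(m,1,tau)
h_m = H1 - ami                                   # entropy rate, eq.(11)

# For an AR(1) process, the entropy rate equals the innovation entropy:
h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma2 * (1 - a ** 2))
```

Here the AMI is strictly positive (the determinant ratio is larger than one for any correlated process), and h_m matches h_exact to machine precision.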

Features from Complexity Theory
In the 1960s, Kolmogorov and Sinai adapted Shannon's Information Theory to the study of dynamical systems. The divergence of trajectories starting from different but indistinguishable initial conditions can be pictured as creating uncertainty, and hence as creating information. Kolmogorov Complexity (KC), also known as the Kolmogorov-Sinai entropy and noted hKS(ρ) in the following, measures the mean rate of creation of information by a dynamical system with ergodic probability measure ρ. KC is constructed exactly as the Shannon entropy rate from Information Theory, using eq.(4) and the same functional form as in eq.(2), but using the density ρ of trajectories in phase space instead of the probability density p. In the early 1980s, the Eckmann-Ruelle entropy K2(ρ) [29,49] was introduced following the same steps but using the functional form of the Rényi order-2 entropy (eq.(3)). The interest of K2 lies in its easier and hence faster computation from experimental time series, which was at the time a challenging issue.
Kolmogorov-Sinai and Eckmann-Ruelle entropies The ergodic theory of chaos provides a powerful framework to estimate the density of trajectories in the phase space of a chaotic dynamical system [49].
For an experimental or numerical signal, it amounts to assimilating the phase space average to the time average. Given a distance d(·,·), usually defined with the L2 or the L∞ norm, in the m-dimensional embedded space, the local quantity

C^m_i(ε) = 1/(N − m + 1) Σ_j Θ(ε − d(x^(m,τ)_i, x^(m,τ)_j)),    (18)

with Θ the Heaviside step function, provides, up to a factor depending on ε and m, an estimate of the local density ρ in the m-dimensional phase space around the point x^(m,τ)_i. The following averages:

Φ^m(ε) = 1/(N − m + 1) Σ_i log C^m_i(ε),    (19)

C^m(ε) = 1/(N − m + 1) Σ_i C^m_i(ε),    (20)

are then used to provide the following equivalent definitions of the complexity measures [49]:

hKS = lim_{ε→0} lim_{m→∞} lim_{N→∞} [Φ^m(ε) − Φ^{m+1}(ε)],    (21)

K2 = lim_{ε→0} lim_{m→∞} lim_{N→∞} − log( C^{m+1}(ε) / C^m(ε) ).    (22)

Approximate Entropy
Approximate Entropy (ApEn) was introduced by Pincus in 1991 for the analysis of babies' heart rate [50]. It is obtained by relaxing the definition (21) of hKS and working with a fixed embedding dimension m and a fixed box size ε, often expressed in units of the standard deviation σ of the signal as ε = rσ. ApEn is defined as

ApEn(m, ε) = Φ^m(ε) − Φ^{m+1}(ε).    (23)

On the practical side, and in order to have a well-defined Φ^m(ε) in (19), the counting of neighbors in the definition (18) allows self-matches j = i. This ensures that C^m_i(ε) > 0, which is required by (19). ApEn depends on the number of points N in the time series. Assuming N is large enough, we have lim_{ε→0} lim_{m→∞} ApEn(m, ε) = hKS. We interpret ApEn as an estimate of the m-order Kolmogorov-Sinai entropy hKS at finite resolution ε. The larger N, the better the estimate. More interestingly, the non-vanishing value of ε in its definition makes ApEn insensitive to details at scales smaller than ε. On one hand, this is very interesting when considering an experimental (therefore noisy) signal: choosing ε larger than the rms of the noise (if known) filters out the noise, and ApEn is then expected to measure only the complexity of the underlying dynamics. This was the main motivation of Pincus and explains the success of ApEn. On the other hand, not taking the limits ε → 0 and m → ∞ makes ApEn an ill-defined quantity that has no reason to behave like hKS. In addition, only very few analytical results have been reported on the bias and the variance of ApEn.
Although m should in theory be larger than the dimension of the underlying dynamical system, ApEn is defined and used for any possible value of m, and most applications reported in the literature use small m (1 or 2) without any analytical support, but with great success [50,51].

Sample Entropy
A decade after Pincus' seminal paper, Richman and Moorman pointed out that ApEn contains in its very definition a source of bias and in some cases lacks "relative consistency". They defined Sample Entropy (SampEn) on the same grounds as ApEn:

SampEn(m, ε) = − log( C^{m+1}(ε) / C^m(ε) ),

so that lim_{ε→0} lim_{m→∞} SampEn(m, ε) = K2. On the practical side, the counting of neighbors in (18) does not allow self-matches. C^m_i(ε) may vanish, but when averaging over all points in eq.(20), the correlation integral C^m(ε) > 0. In practice, SampEn is considered to improve on ApEn, as it shows a lower sensitivity to parameter tuning and data sample size than ApEn [52,53].
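A didactic sketch of the correlation-integral computation of SampEn (our own unoptimized Python illustration, not the implementation used in this study) may clarify the construction:

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2, tau=1):
    """Didactic correlation-integral SampEn (L_inf norm, self-matches
    excluded), in the spirit of Richman & Moorman; not optimized."""
    x = np.asarray(x, dtype=float)
    eps = r * x.std()

    def count_pairs(dim):
        # pairs (i, j), i < j, of dim-dimensional templates closer than eps
        n = len(x) - (dim - 1) * tau
        emb = np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])
        c = 0
        for i in range(n - 1):
            d = np.max(np.abs(emb[i + 1:] - emb[i]), axis=1)
            c += int(np.sum(d <= eps))
        return c

    b, a = count_pairs(m), count_pairs(m + 1)
    return -np.log(a / b)   # SampEn = -log( C^{m+1}(eps) / C^m(eps) )

rng = np.random.default_rng(0)
se = sample_entropy(rng.standard_normal(500))
```

Because self-matches are excluded, the (m+1)-dimensional count may vanish on very short or very irregular signals, in which case SampEn is undefined; this is the practical caveat mentioned above.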
We interpret SampEn as an estimate of the m-order Eckmann-Ruelle entropy K2 at finite resolution ε.

Estimation
We note in the following ApEn^(m) and SampEn^(m) the estimated values of ApEn and SampEn using our own Matlab implementation, based on PhysioNet packages. We used the commonly accepted value ε = 0.2σ, with σ the standard deviation of X, and m = 2. For all quantities, we used τ = 5 = fs/fmax with fmax = 2 Hz the cutoff frequency above which FHR time series essentially contain no relevant information [13]; this time delay corresponds to 0.5 s.

Connecting Complexity Theory and Information Theory.
We consider here for clarity only the relation between ApEn and the m-order Shannon entropy rate, although the very same relation holds between SampEn and the m-order Rényi order-2 entropy rate. In Information Theory terms, ApEn appears as a particular estimator of the m-order Shannon entropy rate that computes the probability density by counting, in the m-dimensional embedded space, the number of neighbors in a hypersphere of radius ε, which can be interpreted as a particular kernel estimation of the probability density.

Limit of large datasets and vanishing : exact relation
When the size ε of the spheres tends to 0, the expected value of ApEn for a stochastic signal X with any smooth probability density is related, in the limit N → ∞, to the m-order Shannon entropy rate [28]:

ApEn(m, ε) → h^(m,τ)(X) − log(2ε).

Both terms involve m-point correlations of the process X. This relation allows a quantitative comparison of ApEn with the m-order Shannon entropy rate h^(m,τ). The log(2ε) difference corresponds to the paving, with hyperspheres of radius ε, of the continuous m-dimensional space over which the probability p(x^(m,τ)_t) involved in eq.(2), and thus h^(m,τ), is defined. This paving defines a discrete phase space, over which eqs. (18), (19) and (23) operate to define ApEn [54]. This illustrates that, for a stochastic signal, ApEn diverges logarithmically as the box size ε approaches 0, as expected for hKS. Fortunately, ε is fixed in the definition of ApEn, which allows in practice to compute it for any signal/process.

New features
Once the success of ApEn is recognized, and remembering its relation to h^(m,τ), it seems interesting to probe other m-order Shannon entropy rate estimators. A straightforward improvement would be to consider a smooth, e.g. Gaussian, kernel of width ε instead of the step function used in (18). We prefer to reverse the perspective and use a k-nearest-neighbor (k-NN) estimate. Instead of counting the number of neighbors in a sphere of size ε, this approach searches for the size of the sphere that accommodates k neighbors. In practice, we compute the entropy H with the Kozachenko & Leonenko estimator [30,55], which we note Ĥ. We compute the auto mutual information I^(m,p,τ) with the Kraskov et al. estimator [56], which we note Î^(m,p). We then combine the two according to eq.(11) to get an estimator ĥ^(m) of the m-order Shannon entropy rate. We use k = 5 neighbors and set τ = 5 (see section 3.2.3).
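For reference, the Kozachenko-Leonenko entropy estimator can be sketched as follows (our own brute-force Python illustration with the L∞ norm, for small samples only; the Kraskov et al. MI estimator combines similar neighbor statistics):

```python
import numpy as np

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + H_{n-1} (exact)."""
    return -np.euler_gamma + np.sum(1.0 / np.arange(1, n))

def kl_entropy(points, k=5):
    """Kozachenko-Leonenko k-NN Shannon entropy estimate (in nats),
    Chebyshev (L_inf) norm; brute-force distances for clarity."""
    pts = np.atleast_2d(np.asarray(points, float))
    if pts.shape[0] < pts.shape[1]:
        pts = pts.T
    n, d = pts.shape
    # pairwise L_inf distances; push self-distances to infinity
    dist = np.max(np.abs(pts[:, None, :] - pts[None, :, :]), axis=-1)
    np.fill_diagonal(dist, np.inf)
    eps = np.sort(dist, axis=1)[:, k - 1]    # distance to k-th neighbour
    return (digamma_int(n) - digamma_int(k)
            + d * np.log(2.0) + d * np.mean(np.log(eps)))
```

On a standard Gaussian sample, the estimate converges to the exact value 0.5 log(2πe) ≈ 1.419 nats as the sample grows.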
We report in the next section our results for the five features when setting m = 2, p = 1, and compare their performances in detecting acidosis. The dependence of the m-order entropy rate (and its estimators) on m is expected to give some insight into the dimension of the attractor of the underlying dynamical system, but as we have pointed out, the dynamics is indeed contained in the AMI part of the entropy rate. This is why we further explore the effect of varying the embedding dimensions m and p on the AMI estimator Î^(m,p).

Results

Feature estimates We compute ApEn^(m), SampEn^(m), ĥ^(m), Ĥ and Î^(m,p) for normal and acidotic (abnormal) subjects in datasets I and II using data from the last T = 20mn before delivery, which are the most crucial. We use the classical values m = 2 and p = 1 for the embedding dimensions. To compare performance, we present the box plots of the five normalized (zero-mean, unit-variance) estimates in the left column of Fig. 1. For dataset I, the average of ApEn and SampEn for acidotic subjects is smaller than for normal subjects, while the average of the Shannon entropy rate does not show any tendency. This is surprising, as one might have expected for ĥ^(m) a behavior similar to ApEn and SampEn (see section 3.3). Average values of Ĥ and Î^(2,1) are larger for acidotic subjects. The larger value of the Shannon entropy H indicates that the acidotic FHR signals contain more information. The larger value of the AMI indicates a stronger dependence structure in the dynamics of abnormal subjects. For subjects in dataset II, it is harder to find any tendency by looking at the average values.
Features performance Fetal acidosis detection performance is assessed with the p-value given by the classical Wilcoxon ranksum test. This non-parametric test of the null hypothesis, which corresponds to identical medians of the distributions of estimates in the normal and abnormal classes, is reported in Table 1. We have added one symbol when the p-value is less than 0.05, two when it is less than 0.01. We see that for dataset I, ApEn^(m), SampEn^(m), Ĥ and Î^(m,p) for m = 2 discriminate normal and acidotic subjects, while ĥ^(m) does not. Out of the three estimates (ApEn^(m), SampEn^(m), ĥ^(m)) based on entropy rates, the nearest-neighbor one for the Shannon entropy rate is the poorest, although its decomposition into Shannon entropy (static one-point information) and AMI (which includes dynamic information) leads to two satisfying estimates. Fig. 1 and Table 1 both show that the best performing estimators are SampEn^(m) and Î^(m,p). In dataset II, although all features perform more poorly than in dataset I, SampEn and AMI are again the best ones, with a p-value lower than 0.05. We focus on these two features in the following.

[Table 1: AUC and Wilcoxon p-values of each feature in datasets I and II, for T = 20mn. Recovered entries: ApEn^(2): AUC 0.76, p = 4.08e-06 (dataset I) and AUC 0.61, p = 1.33e-01 (dataset II); SampEn^(2): AUC 0.79 (dataset I); the remaining entries are garbled in the source.]

Receiver Operating Characteristics To compare the two best performing features, SampEn and AMI, we plot receiver operating characteristic (ROC) curves in Fig. 2, for both datasets I and II, using data from the last T = 20mn before delivery. For dataset I, AMI better discriminates acidotic subjects from normal ones. For dataset II, AMI discrimination is only slightly better than that of SampEn. The area under the curve (AUC) of the ROC of each feature is reported in Table 1, with a bold font indicating the estimator with the largest AUC. Performance is much worse in dataset II than in dataset I: the AUC is reduced. Nevertheless, AMI is always the better performing estimator (its AUC reduces from 0.84 to 0.68), followed by SampEn (AUC reducing from 0.79 to 0.67).
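The two detection metrics used throughout (Wilcoxon ranksum p-value and AUC) can be computed as sketched below (our own illustration on synthetic feature values; the AUC is obtained as the normalized Mann-Whitney U statistic, and the p-value via the usual normal approximation):

```python
import math
import numpy as np

def auc_from_samples(neg, pos):
    """AUC as the normalized Mann-Whitney U statistic: probability that a
    random 'pos' value exceeds a random 'neg' value (ties count 1/2)."""
    neg, pos = np.asarray(neg, float), np.asarray(pos, float)
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def ranksum_pvalue(x, y):
    """Two-sided Wilcoxon ranksum test, normal approximation, no tie
    correction (acceptable for continuous-valued features)."""
    n1, n2 = len(x), len(y)
    ranks = np.empty(n1 + n2)
    ranks[np.argsort(np.concatenate([x, y]))] = np.arange(1, n1 + n2 + 1)
    z = (ranks[:n1].sum() - n1 * (n1 + n2 + 1) / 2.0) \
        / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return math.erfc(abs(z) / math.sqrt(2.0))   # = 2 * (1 - Phi(|z|))

# Hypothetical feature values for the normal and acidotic groups:
rng = np.random.default_rng(0)
feat_normal = rng.normal(0.0, 1.0, 300)
feat_acidotic = rng.normal(1.0, 1.0, 50)
auc = auc_from_samples(feat_normal, feat_acidotic)
p_value = ranksum_pvalue(feat_normal, feat_acidotic)
```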

Effect of the window size on the performance
We investigate the robustness of the detection performance when the window size T is reduced, using data from T = 20, 10 and 5 minutes. Results are reported in Fig. 1 and Table 2. p-values and AUC both indicate that Î^(2,1) and SampEn^(2) provide robust discrimination in dataset I even when the observation length is reduced. Again, Î^(2,1) performs better: its AUC is only reduced from 0.84 to 0.83 when T is reduced from 20mn to 5mn, whereas the AUC of SampEn is reduced from 0.79 to 0.72. In dataset II, once again, performance degrades, but AMI is still better at discriminating acidotic from normal subjects. In the following, we focus on AMI estimates only.

Effect of the embedding dimensions on (fetal acidosis detection) performance of AMI
In order to improve the acidosis detection performance of the AMI, especially in dataset II, we increase the embedding dimensions m and p used in computing Î^(m,p). This way, we probe higher-order dependence structures in the dynamics. Because of the symmetry of the AMI (eq. (10)), and aiming at probing the effect of increasing either m or p, we plot the AUC of the ROC as a function of m + p only. The dependence of the AMI on m − p is much smaller and not reported here. These computations have been done with a larger value k = 15 in the k-NN algorithms, in order to accommodate the possibly large embedding dimensions (m + p up to 12). Results are presented in Fig. 3. For a fixed window size, the AUC increases when m + p increases, and reaches a maximum; it then remains constant or decreases slightly. This behaviour is observed in both datasets I and II and for any window size T ∈ [5, 10, 20]mn. Varying T does not seem to change the location of the maximum of the AUC in a given dataset. The optimal embedding dimension is m + p = 6 in dataset I and m + p = 10 in dataset II. This hints at a difference in the dynamics of the FHR in the two datasets. Because both bias and computation time increase with the total dimensionality [57], the maximal embedding is restricted to m = p = 3. A reduction of the AUC is observed when the analysis window is reduced, but this is only significant for dataset II.
We report in Table 2 the AUC and p-values of the AMI for two sets of embedding dimensions, Î^(2,1) and Î^(3,3), for datasets I and II and several window sizes. The best performing estimator is indicated in bold. For all observation windows and for the two datasets, Î^(3,3) achieves the best performance. Its AUC is always larger than the one obtained using SampEn or the AMI with m = 2 and p = 1.

Dynamical Analysis
We now explore how long before delivery the AMI can diagnose fetal acidosis on an FHR signal. To do so, we do not restrict our analysis to the last data points before delivery, but apply it to an ensemble of windows scanning the first and second stages of labor. We examine the dynamics of Î^(3,3), the best performing feature, for both normal and abnormal subjects.

Dataset I: rapid delivery
In this first section, we focus on dataset I and probe stage 1, including early labor, active labor, and transition. Using the time at which stage 1 ends as a reference (setting it at t0 = 0), we compute for each subject Î^(3,3) in a set of time windows [ti − T, ti], 0 ≤ i ≤ 50, of fixed size T ending at ti = t0 − i × 2mn, so separated by 2mn. We perform this analysis for three window sizes T ∈ [20, 10, 5]mn. The value of the AMI computed in the i-th window is then assigned to time ti − T/2, at the center of the interval. By construction of dataset I, delivery occurs less than 15 minutes after pushing started, and stage 2 can be as short as 1mn, so we completely discard data from stage 2. We then average the values of the AMI over the population of normal subjects, and over the population of acidotic subjects, respectively. Results, including p-values, are presented in Fig. 4.
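The sliding-window bookkeeping can be sketched as follows (our own schematic Python illustration; `np.var` stands in for the AMI estimator, and all names are ours):

```python
import numpy as np

def sliding_feature(x, fs, T_min, step_min=2.0, n_windows=51, feature=np.var):
    """Evaluate `feature` in windows [t_i - T, t_i] ending every `step_min`
    minutes before the end of the record (t_0 = 0 at the end of stage 1).
    Returns window-centre times (in minutes, negative = before t_0) and
    feature values, oldest window first."""
    T = int(T_min * 60 * fs)         # window length in samples
    step = int(step_min * 60 * fs)   # shift between windows in samples
    centres, values = [], []
    for i in range(n_windows):
        end = len(x) - i * step
        if end - T < 0:
            break
        centres.append(-(i * step_min + T_min / 2.0))
        values.append(feature(x[end - T:end]))
    return np.array(centres[::-1]), np.array(values[::-1])

# One hour of (flat) signal sampled at 10 Hz, analyzed with T = 20 mn:
t_c, vals = sliding_feature(np.zeros(36000), fs=10.0, T_min=20.0)
```

With one hour of data and T = 20mn, 21 windows fit, the last one centred 10 minutes before the reference time.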
A first observation is that AMI is always larger for acidotic subjects than for normal ones. As labor progresses, AMI increases in both populations, but the increase is stronger for acidotic subjects. As a consequence, the p-value of the test decreases markedly: the feature performs better and better as delivery approaches. Using the AMI feature with T = 20mn, acidosis can be detected in dataset I as early as 80 minutes before entering the second stage. Using shorter windows, T = 10mn or 5mn, detection remains reliable as early as one hour before stage 2. We interpret this reduced forecast horizon in dataset I as a direct consequence of the reduced statistics when the window size T is decreased.
(Fig. 4 caption fragment: Right: corresponding p-value. A single black + symbol in the AMI plot indicates a p-value lower than 0.05; two ++ indicate a p-value lower than 0.01. The gray-shaded region represents the time window used to compute the last value of AMI.)

Dataset II: delivery after pushing for more than 15mn
For dataset II, we performed the same dynamical analysis as in the previous section, using the end of stage 1 as the reference time (t0 = 0). Because there is now enough data in the pushing stage, we also analyze this stage using the delivery time (t0 = D) as the reference. All results are presented in Fig. 5. At the end of stage 1, we observe again that AMI is larger for acidotic subjects than for normal ones, but the difference is not significant in this group (see the corresponding p-value on the right of Fig. 5). The situation is identical at the end of stage 2, although we obtain a lower p-value in some windows. The p-value does not decrease clearly when approaching delivery time, as it did in dataset I (Fig. 4). For subjects in dataset II, early detection of acidosis is therefore very difficult. However, we observe in Fig. 5 that the average AMI is significantly larger at the end of stage 2 than at the end of stage 1, and this increase of AMI is larger for abnormal subjects.
To examine more precisely the dynamical increase of AMI, especially when entering stage 2, we computed Î(3,3) over an ensemble of windows of size T = 20mn continuously spanning a large time interval that includes the end of active labor and the beginning of the pushing stage. Results are reported in Fig. 6. We see a continuous increase of AMI values when evolving from stage 1 to stage 2. The increase is more pronounced for abnormal subjects, which corroborates the findings in Fig. 5. For smaller window sizes, the situation is less clear.
We also studied the dynamical evolution of the Shannon entropy estimate Ĥ, which, together with the AMI, combines into the Shannon entropy rate (see eq. (11)). Changes in the Shannon entropy H indicate changes in the probability density of the signal. Results are presented in Fig. 6, side by side with the AMI. We observe a dramatic rise of the value of Ĥ when subjects evolve from stage 1 to stage 2. This increase is clearly observed for both normal and abnormal subjects; no significant difference between normal and acidotic subjects is observed for this static quantity. The start of pushing thus implies a strong deformation of the probability density of the FHR, indicating strong perturbations of the FHR for both normal and acidotic subjects.
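The estimator Ĥ is not detailed in this excerpt (the paper relies on a k-NN estimator); purely to make the quantity concrete, a simple plug-in alternative is sketched below, together with a check that a wider probability density, as produced by the deformation discussed above, yields a larger entropy. Names and the bin count are our own choices.

```python
import numpy as np

def entropy_histogram(x, bins=16):
    """Plug-in histogram estimate of the differential Shannon entropy
    H (in nats). Illustration only; the paper's H-hat is k-NN based."""
    counts, edges = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    widths = np.diff(edges)
    nz = p > 0
    # H ~ -sum_i p_i log(p_i / w_i): discrete entropy corrected by bin width
    return float(-np.sum(p[nz] * np.log(p[nz] / widths[nz])))

rng = np.random.default_rng(1)
narrow = rng.normal(0.0, 1.0, 50000)
wide = rng.normal(0.0, 2.0, 50000)  # wider PDF -> larger entropy
```

Doubling the standard deviation of a Gaussian raises its differential entropy by log 2 nats, which the histogram estimate recovers.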

Discussion, conclusions and perspectives
We now discuss the interpretation of the Shannon entropy and AMI measurements in the different stages of labor. The fetuses are classified as normal or acidotic based on a post-delivery pH measurement, which provides a diagnosis of acidosis at delivery only; there is no information on the health of the fetuses during labor.
The physiological interpretation of a feature, and especially its relation to specific FHR patterns such as those detailed in [58,2], is a difficult task that is only scarcely reported in the literature [59,60]. In this article, we have averaged our results over large numbers of (normal or acidotic) subjects, which jeopardizes any precise interpretation in terms of a specific FHR pattern that may appear only intermittently.

Acidosis detection in first stage
We can nevertheless suggest that the value of the Shannon entropy H is related to the frequency of decelerations in the FHR signal. Indeed, Shannon entropy strongly depends on the standard deviation of the signal (e.g., see eq. (15)), which in turn depends on the variability in the observation window. A larger number of decelerations in the observation window deforms the PDF of the FHR signal by increasing its lower tail; in particular, this widens the PDF and hence increases the standard deviation and the Shannon entropy. This explains our findings in Fig. 1 (for dataset I).
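Eq. (15) is not reproduced in this excerpt; assuming it is the standard differential entropy of a Gaussian variable of standard deviation $\sigma$, the dependence invoked here reads

```latex
H = \frac{1}{2}\ln\!\left(2\pi e \sigma^{2}\right)
  = \ln\sigma + \frac{1}{2}\ln(2\pi e),
```

so any pattern that widens the PDF, such as decelerations extending its lower tail, increases $\sigma$ and hence $H$ logarithmically.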
When acidosis develops in the first stage of labor, the Shannon auto mutual information estimator Î(m,p) significantly outperforms all other quantities, both in terms of p-value and AUC. The performance of AMI is robust when tuning either the size of the observation window (and hence the number of data points) or the embedding dimensions (m, p). In addition, the performance slightly increases when the total embedding dimension m + p increases, although one then has to beware of the curse of dimensionality.
For abnormal subjects from dataset II, AMI is not able to detect acidosis using data from stage 1. This suggests that acidosis develops later, in the second stage of labor.
For all datasets, AMI computed with τ = 0.5s is always larger for acidotic subjects than for normal subjects. This is in agreement with results obtained with ApEn and SampEn, which are both lower for acidotic subjects. This shows that FHR signals classified as abnormal have a stronger dependence structure at small scales than normal ones. We can relate this increase of the dependence structure of acidotic FHR to the short-term variability and to its coupling with particular large-scale patterns. For example, a sinusoidal FHR pattern [2], especially if its duration is long, should give a larger value of the AMI, because its large-scale dynamics is highly predictable. As another example, we expect variable decelerations (with an asymmetrical V-shape) and late decelerations (with a symmetrical U-shape and/or reduced variability) to impact AMI differently. Of course, the choice of the embedding parameter τ is then crucial, and this is currently under investigation.
(Figure 6 caption: Average behavior of AMI Î(3,3) (left) and Shannon entropy Ĥ (right) for normal (black) and acidotic (red) subjects in dataset II (delivery occurring more than 15mn after pushing started). Quantities are computed in windows of size T = 20mn. The vertical magenta line indicates the beginning of stage 2 (pushing). A different color code is used for windows spanning both stages 1 and 2: blue for normal subjects and magenta for acidotic ones.)
AMI and entropy rates depend on the dynamics, as they operate on time-embedded vectors. AMI focuses on nonlinear temporal dynamics while being insensitive to the dominant static information: it is independent of the standard deviation, which, on the contrary, contributes strongly to the Shannon entropy. This explains why AMI performs better than entropy rate estimates, such as ApEn(m), SampEn(m) and ĥ(m), which also depend on the standard deviation.

Acidosis detection in second stage
The results reported for stage 2 show a severe decrease in the performance of the five estimated quantities. Analyzing stage 2 is far more challenging than analyzing stage 1, which suggests that temporal dynamics in stage 2 differ notably from those of stage 1 [38], or simply that our database does not contain enough acidotic subjects in that case. Î(m,p) achieves the best performance in terms of p-value and AUC; this clearly underlines that the analysis of nonlinear temporal dynamics is critical for fetal acidosis detection in stage 2. As in stage 1, the AMI is always larger in stage 2 for acidotic subjects than for normal subjects.
Although the Shannon entropy computed from the last 20mn of stage 2 before delivery does not show a clear tendency in Fig. 1 for dataset II, Fig. 6 clearly shows that Ĥ increases as labor progresses. This is probably related to the average increase of the number of decelerations, which is expected in both the normal and the acidotic population.
SampEn(2) is also able to perform discrimination in stage 2. From these observations, one may envision the definition of a new estimator that would measure the auto mutual information using the Rényi order-2 entropy, by applying eq. (12). Nevertheless, it should be emphasized that the Rényi order-q entropy lacks the chain rule of conditional probabilities as soon as q ≠ 1, which may jeopardize any practical use of such an estimator.

Probing the dynamics
Increasing the total embedding dimension in AMI improves the detection of acidotic subjects in both the first and the second stage. The best performance is found for a different total embedding dimension in each of the two datasets, which suggests that the FHR dynamics differ between the two datasets, and hence presumably between the two stages of labor.
As seen in eq. (11), the Shannon entropy rate can be split into two contributions: one that depends only on static properties (the Shannon entropy, estimated by Ĥ) and one that involves the signal dynamics (the auto mutual information, estimated by Î(m,1)). By following the time evolution of these two parts, we were able to relate the Shannon entropy Ĥ to the progression of labor, and the AMI not only to the progression of labor but also to possible acidosis. Looking at subjects for whom the pushing phase is longer than 15mn, it clearly appears that all fetuses are affected by the pushing, as evidenced by a large increase of the Shannon entropy Ĥ and a small increase of AMI. Additionally, the increase of AMI is steeper for abnormal subjects than for normal ones, which may indicate different reactions to the pushing and can be related to specific pathological FHR patterns. When the pushing stage is long (dataset II), fetuses reported as acidotic do not show any sign of acidosis until pushing is prolonged; they appear normal until delivery is near.
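Assuming eq. (11) is the standard decomposition of the entropy rate as Shannon entropy minus auto mutual information, it can be checked in closed form on a Gaussian AR(1) process, for which all three quantities are analytic (a toy model, not FHR data; here m = 1):

```python
import numpy as np

def ar1_decomposition(a, s2=1.0):
    """Closed-form Shannon entropy H, entropy rate h and AMI I(1,1)
    (all in nats) for a Gaussian AR(1): x[t] = a*x[t-1] + e[t], Var e = s2.
    Toy check of the decomposition h = H - I invoked in the text."""
    var = s2 / (1.0 - a**2)                    # stationary variance
    H = 0.5 * np.log(2 * np.pi * np.e * var)   # static Shannon entropy
    h = 0.5 * np.log(2 * np.pi * np.e * s2)    # Shannon entropy rate
    I = -0.5 * np.log(1.0 - a**2)              # AMI between x[t-1] and x[t]
    return H, h, I
```

In this model, strengthening the temporal dependence (|a| → 1) raises the AMI while leaving the entropy rate h fixed, which mirrors the point made here that the AMI isolates the dynamical part of the entropy rate.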
When acidosis develops during the first stage of labor, as in dataset I, we clearly observe that while AMI increases steadily until delivery for healthy fetuses, it increases faster for acidotic ones. This suggests that acidotic fetuses in dataset I react to early labor, as early as one hour before pushing starts. This not only indicates that some fetuses are prone to acidosis, but may also pave the way for an early detection of acidosis in this case.