Unsupervised Clustering of Heartbeat Dynamics Allows for Real Time and Personalized Improvement in Cardiovascular Fitness

VO2max index has a significant impact on overall health. Its estimation through wearables notifies the user of his level of fitness but cannot provide a detailed analysis of the time intervals in which heartbeat dynamics are changed and/or fatigue is emerging. Here, we developed a multiple modality biosignal processing method to investigate running sessions to characterize in real time heartbeat dynamics in response to external energy demand. We isolated dynamic regimes whose fraction increases with the VO2max and with the emergence of neuromuscular fatigue. This analysis can be extremely valuable by providing personalized feedback about the user’s fitness level improvement that can be realized by developing personalized exercise plans aimed to target a contextual increase in the dynamic regime fraction related to VO2max increase, at the expense of the dynamic regime fraction related to the emergence of fatigue. These strategies can ultimately result in the reduction in cardiovascular risk.


Introduction
VO 2 max is an index of cardiovascular fitness and aerobic endurance, expressed in mL/kg·min, which refers to the maximum amount of oxygen that an individual can uptake during intense or maximum exercise [1][2][3]. It is directly proportional to the amount of energy that an individual can produce aerobically: the more oxygen consumed, the greater the energy produced [4]. Estimation of VO 2 max is particularly important since it was shown in several studies that cardiovascular fitness has a significant impact on overall health. Mounting evidence has firmly established that low levels of cardio-respiratory fitness are associated with a high risk of cardiovascular disease (CVD) and all-cause mortality, as well as mortality rates attributable to various cancers, especially of the breast and colon/digestive tract [5]. Those findings are supported by additional research which revealed that a 10% increase in VO 2 max could decrease all-cause mortality risk by 15% [6,7]. In this respect, finding a way to easily measure improvement in cardiovascular fitness is especially important to reduce cardiovascular risk. Evidence on cardiovascular risk is biased toward causes rather than prevention techniques, which have yet to be widely reproduced or supplied at a scale that makes them a viable alternative for public health efforts. Change is difficult, time consuming, and resource intensive. In this context, technology can aid in the support and maintenance of healthy behaviors: smart wearable devices can monitor four intervals characterized by peculiar heart rate dynamics. Among these, a dynamic range is characterized by a fraction which highly correlates with the VO 2 max parameter, and another dynamic range presents some features that can be associated with the emergence of fatigue. This ultimately allows users to understand when, and to what extent, cardiovascular response is adapting to improve VO 2 max; thus, providing personalized real-time feedback about the user's fitness level improvement.

Data Acquisition
This study considered city running sessions of comparable duration performed by a male non-professional runner over a year (age = 57 years; BMI = (22.63 ± 1.85) kg/m 2 ) who has shown an improvement in his VO 2 max value (35.94 ± 1.91 mL/kg·min). The runner acquired 21 different time series (duration 1 h) using an Apple Watch (A2292) and, for a subset of the acquisitions the same data were acquired using a smart watch (Garmin Fenix 5x Plus), respectively one on the right wrist and one on the left wrist. Apple Watch provides heart rate (HR) time series (measured in beats per minute, bpm), speed (v) time series (measured in meters per second, m/s), and altitude (z) time series (measured in meters, m) at non equally spaced time intervals (respectively, (5.1 ± 2.6) s, (2.6 ± 1.4) s, (2.0 ± 1.8) s). Garmin provides heart rate time series (measured in beats per minute, HR) at equally spaced time intervals (t = 1 s). The reason why we chose to use both sensors is that an equally spaced signal allowed us to calculate the HR auto-correlation function (ACF), without any imputing and manipulation on raw data, needed to select the optimal time window to split up the time series and extract the features for the clustering analysis. Notice that we obtained a Pearson correlation coefficient r = 0.93 between the time series of apple and that of Garmin, meaning that they are extremely similar (Supplementary Materials, Figure S1).

Data Cleaning
For this analysis we focused on the interval including HR values which are related to intense and maximum physical activity (area of maximum cardiac effort). This interval will include, therefore, only HR values above 90% of the maximum heart rate, calculated according to Tanaka formula [26]: HR time series were further processed by removing outliers due to periodic sensor malfunctions using z-score with a threshold of 3 standard deviations. Speed time series, affected by the highest signal to noise ratio due to GPS, were processed with a low pass Butterworth filter with a cutoff frequency of 0.01 and order 2. The values of these parameters were chosen following a noise signal analysis. After data cleaning the different time series were time aligned. As every running session was performed at different altitudes, we rescaled the altitude timeseries subtracting the relative starting point z 0 .

Time Interval as Statistical Unit
We divided each time series in n overlapping point by point time intervals of width ∆t = 90 s, with n being the total number of points in the time series. ∆t was selected by calculating the ACF cutoff time (t cut ) for the HR time series. ACF defines how data points in a time series are related, on average, to the preceding data points [27]. The red point is the intersection between ACF and the upper border of the confidence interval. The red point thus indicates a threshold lag t cut , suggesting that at times higher than t cut a correlation can be found with a probability less than 5%. Figure 1 shows an example of the ACF function for a single running session. We can see that HR values are correlated with lag times until t cut 80 s. This value is the one corresponding to the point in which the ACF function cuts the upper confidence threshold (red point in Figure 1) [28].
point thus indicates a threshold lag tcut, suggesting that at times higher than tcut a c tion can be found with a probability less than 5%. Figure 1 shows an example of the ACF function for a single running session. see that HR values are correlated with lag times until tcut ≃ 80 s. This value is t corresponding to the point in which the ACF function cuts the upper confidence th (red point in Figure 1) [28]. Figure 1. ACF plot of HR for a single running session. The blue shaded region is the con interval with a value of α = 0.05 calculated with Bartlett's formula [29]. Anything within th represents a value that has no significant correlation with the most recent value for the HR case, we observe significant correlations from 0 to 80 s. The red point is the intersection betw ACF function and the upper confidence threshold. Correlations subsequent to that poin longer significant.
Since raw HR time series data were not stationary, we performed a detrendin the polyfit function in the numpy package [30]. After performing the Dickey-Fu for stationarity [31], we extracted the intersection value between ACF and uppe dence interval for the subgroup of running sessions acquired with Garmin since ready mentioned in the Data Acquisition paragraph, Garmin provides heart rate H series at equally spaced time intervals (Δt = 1 s). We selected 16 running sessions performed a statistical analysis on the intersection values of the ACF with confide tervals, which gave us a time window of width (90 ± 26) s. ACF cutoff lags (τACF) correlated with physical fitness (Supplementary Materials, Figure S2).

Feature Selection for the Clustering Algorithm
For each interval with width tcut = 90 s, we selected two features consideri speed v, and altitude z according to the following criteria. These features are two accounting for HR variation (ΔHR) and external energy demand variation (ΔE). ternal energy demand is described by the term E, resembling mechanical energ cludes a potential energy term ( = ( − )) and a kinetic energy term ( = two energy terms are normalized separately in a range of [0, 1]. In this way, both have comparable weights in the total energy, which can then vary in a range [0,2] Our aim is to cluster time intervals with an unsupervised method to classif according to the relationship between external energy demand variation and he variation. To express the variation in the signal and simultaneously guarantee o clustering performances we defined the following features: is the skewness index and it is a measure asymmetry of the probability distribution of a real-valued random variable abou  [29]. Anything within this range represents a value that has no significant correlation with the most recent value for the HR. In this case, we observe significant correlations from 0 to 80 s. The red point is the intersection between the ACF function and the upper confidence threshold. Correlations subsequent to that point are no longer significant. Since raw HR time series data were not stationary, we performed a detrending using the polyfit function in the numpy package [30]. After performing the Dickey-Fuller test for stationarity [31], we extracted the intersection value between ACF and upper confidence interval for the subgroup of running sessions acquired with Garmin since, as already mentioned in the Data Acquisition paragraph, Garmin provides heart rate HR time series at equally spaced time intervals (∆t = 1 s). We selected 16 running sessions and we performed a statistical analysis on the intersection values of the ACF with confidence intervals, which gave us a time window of width (90 ± 26) s. ACF cutoff lags (τ ACF ) are not correlated with physical fitness (Supplementary Materials, Figure S2).

Feature Selection for the Clustering Algorithm
For each interval with width t cut = 90 s, we selected two features considering HR, speed v, and altitude z according to the following criteria. These features are two terms accounting for HR variation (∆HR) and external energy demand variation (∆E). The external energy demand is described by the term E, resembling mechanical energy: it includes a potential energy term (V = g(z − z 0 )) and a kinetic energy term (K = 1 2 v 2 ). The two energy terms are normalized separately in a range of [0, 1]. In this way, both terms have comparable weights in the total energy, which can then vary in a range [0, 2].
Our aim is to cluster time intervals with an unsupervised method to classify them according to the relationship between external energy demand variation and heart rate variation. To express the variation in the signal and simultaneously guarantee optimal clustering performances we defined the following features: where is the skewness index and it is a measure of the asymmetry of the probability distribution of a real-valued random variable about its average. In these expressions, σ is the standard deviation, x is the mean value of the feature in the time interval, and x 0 is the initial point of the time interval. We used these features since variations are subjected to noise and are not always linear. The first term expresses the position of the mean of the points in the considered time interval with respect to the starting point of the interval, and the second is a reinforcing term expressing the skewness of the distribution. We provide below evidence that these features are able to discriminate variations in the signals on noisy and non-linear data (paragraph Section 3.1). Note that ∆E and ∆HR are mass independent since they are normalized to their standard deviation.

Clustering Analysis Algorithm
Once the values of the clustering features chosen were calculated for each interval of width ∆t, cluster analysis was performed.
Clustering analysis is an unsupervised statistical method for processing data consisting of the organization of data into groups called clusters, which has many applications as market research [32], pattern recognition [33], data analysis [34], and image processing [35,36]. Clustering is measured using intracluster distance, the distance between the data points inside the cluster, and intercluster distance, the distance between data points in different clusters. A good clustering analysis minimizes the intracluster distance meaning that a cluster is more homogeneous and maximizes the intercluster distance. We chose the k-means algorithm [37,38] which tries iteratively to partition the dataset into K predefined distinct non-overlapping subgroups. k-means represents each of the k clusters C j by the mean (or weighted average) c j of its points (centroid). The sum of distances between elements of a set of points and its centroid expressed through an appropriate distance function is used as the objective function. We employed the L2 norm-based objective function, i.e., the sum of the squares of errors between the points and the corresponding centroids, which is equal to the total intracluster variance: The k-means algorithm can be summarized in four main steps: 1. Specify the number of clusters k and initialize centroids C = {c 1 , c 2 . . . c k } by randomly selecting K data points for the centroids without replacement; 2.
For each j ∈ {1, . . . , k}, set the cluster C j to be the set of points in X that are closer to c j than they are to c j for all i = j; 3.
For each j ∈ {1, . . . , k}, set c i to be the center of mass of all points in C j :
This version, known as Forgy's algorithm [38], works with any Lp norm and it does not depend on data ordering. A weakness of the k-means algorithm is that after a certain time it will always converge due to a local minimum, and this is strictly connected to the starting centroid choice. One method to help address this issue is the k-means ++ scheme [39,40] which initializes the centroids to be distant from each other, leading to probably better results than random initialization.

Choosing the Best k Number
The optimal number of clusters was determined using the silhouette method [43]. The silhouette score is a very useful index for the quality of clustering analysis. It is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). Given a cluster A and any object i in the data set, when cluster A contains other objects except i, we can compute the distance of object i with all other points in the same cluster: where n A is the number of points belonging to cluster A and d ij is the mean distance between data points i and j and in the cluster A.
Considering any cluster B which is different from A, we can compute: which gives the smallest mean distance of i to all points in any other cluster of which i is not a member. With these two distances, we can define the silhouette value for the point i: which is set to be 0 when cluster A contains a single object. For every object i, the silhouette value can vary in the range [−1, +1]. A silhouette value near +1 indicates that point i is far away from the neighboring clusters. A value of 0 indicates that point i is on or very close to the decision boundary between two neighboring clusters and negative values indicate that point i might have been assigned to the wrong cluster.
The average of all silhouette values for each point i in the dataset returns the silhouette score:

Statistics
Differences among clusters were determined by conducting an ANOVA and Kruskal-Wallis test for the data that were not normally distributed. Normal data distribution was assessed by visual inspection, variance comparison and Shapiro-Wilk's test. Subsequently, we performed a Mann-Whitney-Wilcoxon post-hoc test for non-normal distributions and Tukey post-hoc test for normal distributions both with Bonferroni adjustment for p-values. Values of p < 0.05 were considered statistically significant.

K-Means Clustering Reveals Four Dynamic Clusters
The aim of our clustering strategy was to find different dynamical regimes during running sessions using different metabolic processes. Clustering parameters are rescaled using as an offset the mean value of the distribution, and as scaling factor α s , where s is the standard deviation of the distribution and α a tunable parameter. If the distributions are almost Gaussians, α > 3 ensures that more than 99% of the values are considered. This was verified by visual inspection and qq-plots. When there are significant deviations from Gaussians the α factor is adjusted to include at least 99% of values by direct calculation. The result is a k-means classification with four clusters, that we will call dynamic clusters, as shown in Figure 2.
Gaussians the α factor is adjusted to include at least 99% of values by direct calculation. The result is a k-means classification with four clusters, that we will call dynamic clusters, as shown in Figure 2.  Figure 2b shows the clustered data. For simplicity, we indicate the clusters with the respective signs of the variations: +/+ cluster (yellow), −/− cluster (black), −/+ cluster (blue), and +/− cluster (green). In clusters +/+ (yellow) and −/− (black), ΔHR and ΔE are directly proportional. The +/+ cluster is characterized by positive ΔHR (mean ± sd = 1.16 ± 0.52) and positive ΔE (mean ± sd = 1.24 ± 0.46). The −/− cluster is characterized by negative ΔHR (mean ± sd = −1.20 ± 0.58) and negative ΔE (mean ± sd = −1.25 ± 0.52). Contrariwise, −/+ (blue) clusters and +/− (green) are linked to inversely proportional regions for ΔHR and ΔE. The −/+ cluster is characterized by positive ΔHR (mean ± sd = 1.13 ± 0.55) and negative ΔE (mean ± sd = −1.29 ± 0.40). The +/− cluster is characterized by negative ΔHR (mean ± sd = −1.06 ± 0.60) and positive ΔE (mean ± sd = 1.30 ± 0.32). We can observe these results in Figure 3a,b. In Figure 3c representative clustered time intervals from a single running session are shown. We can observe that clustering identifies coherently the areas of growth and decreases in these quantities: when ΔHR and ΔE are both positive, HR(t) and E(t) are increasing in the considered time interval; when ΔHR and ΔE are both negative, HR(t) and E(t) are decreasing in the considered time interval. When they have different signs, the positive signal increases whilst the negative decreases. To show that this is the case, we performed a linear regression analysis on the same time windows of the clustering analysis. The signs of the slopes are in good agreement with the signs of the variations in the features identified in those specific time windows, except for a small fraction (between 10% and 20%) in which strong non-linearity affects the goodness of linear fits ( Figure S3). This happens when ΔE and ΔHR are very close to zero and/or when HR(t) and E(t) undergo both increases and decreases in the same time window. However, features reported in Equation (2) can capture the general tendency of these unfitted data to increase or decrease: while is in these cases characterized by a very small value, the skewness is still relevant and, with exception of perfectly symmetrical data distributions, the sign of the variation takes into account the general tendency of the data to be above or below the average value (which is almost zero). Figure S4 shows some representative time windows  Figure 2b shows the clustered data. For simplicity, we indicate the clusters with the respective signs of the variations: +/+ cluster (yellow), −/− cluster (black), −/+ cluster (blue), and +/− cluster (green). In clusters +/+ (yellow) and −/− (black), ∆HR and ∆E are directly proportional. The +/+ cluster is characterized by positive ∆HR (mean ± sd = 1.16 ± 0.52) and positive ∆E (mean ± sd = 1.24 ± 0.46). The −/− cluster is characterized by negative ∆HR (mean ± sd = −1.20 ± 0.58) and negative ∆E (mean ± sd = −1.25 ± 0.52). Contrariwise, −/+ (blue) clusters and +/− (green) are linked to inversely proportional regions for ∆HR and ∆E. The −/+ cluster is characterized by positive ∆HR (mean ± sd = 1.13 ± 0.55) and negative ∆E (mean ± sd = −1.29 ± 0.40). The +/− cluster is characterized by negative ∆HR (mean ± sd = −1.06 ± 0.60) and positive ∆E (mean ± sd = 1.30 ± 0.32). We can observe these results in Figure 3a,b. In Figure 3c representative clustered time intervals from a single running session are shown. We can observe that clustering identifies coherently the areas of growth and decreases in these quantities: when ∆HR and ∆E are both positive, HR(t) and E(t) are increasing in the considered time interval; when ∆HR and ∆E are both negative, HR(t) and E(t) are decreasing in the considered time interval. When they have different signs, the positive signal increases whilst the negative decreases. To show that this is the case, we performed a linear regression analysis on the same time windows of the clustering analysis. The signs of the slopes are in good agreement with the signs of the variations in the features identified in those specific time windows, except for a small fraction (between 10% and 20%) in which strong non-linearity affects the goodness of linear fits ( Figure S3). This happens when ∆E and ∆HR are very close to zero and/or when HR(t) and E(t) undergo both increases and decreases in the same time window. However, features reported in Equation (2) can capture the general tendency of these unfitted data to increase or decrease: while γ 0 is in these cases characterized by a very small value, the skewness γ 1 is still relevant and, with exception of perfectly symmetrical data distributions, the sign of the variation takes into account the general tendency of the data to be above or below the average value (which is almost zero). Figure S4 shows some representative time windows in which the clustering analysis effectively identifies an increase or decrease in the features. in which the clustering analysis effectively identifies an increase or decrease in the features. and energy (down) in running fragments belonging to different clusters. The clustering analysis assigns the belonging of these fragments to different clusters coherently with the sign of both ΔHR and ΔE. p-value annotation legend: ns: 5.00 × 10 −2 < p ≤ 1.00 × 10 0 ; **: 1.00 × 10 −3 < p ≤ 1.00 × 10 −2 ; ***: 1.00 × 10 −4 < p ≤ 1.00 × 10 −3 ; ****: p ≤ 1.00 × 10 −4 . Color legend: yellow: +/+ cluster; black: −/− cluster; blue: −/+ cluster; green: +/− (see Figure 2 in Section 3.1). Table 1 shows general characteristics of statistical quantities in different clusters. We chose a subgroup of clusters relative to non-overlapping windows to perform a statistical analysis on independent points. Clusters do not differ for average HR, average speed, HR standard deviation, speed standard deviation, and altitude standard deviation. The only significant quantities are then the clustering quantities. Table 1. General characteristics of the clustered population. p-value annotation legend: ns: 5.00 × 10 −2 < p ≤ 1.00 × 10 0 ; **: 1.00 × 10 −3 < p ≤ 1.00 × 10 −2 ; ***: 1.00 × 10 −4 < p ≤ 1.00 × 10 −3 ; ****: p ≤ 1.00 × 10 −4 .   (c) heart rate (up) and energy (down) in running fragments belonging to different clusters. The clustering analysis assigns the belonging of these fragments to different clusters coherently with the sign of both ∆HR and ∆E. p-value annotation legend: ns: 5.00 × 10 −2 < p ≤ 1.00 × 10 0 ; **: 1.00 × 10 −3 < p ≤ 1.00 × 10 −2 ; ***: 1.00 × 10 −4 < p ≤ 1.00 × 10 −3 ; ****: p ≤ 1.00 × 10 −4 . Color legend: yellow: +/+ cluster; black: −/− cluster; blue: −/+ cluster; green: +/− (see Figure 2 in Section 3.1). Table 1 shows general characteristics of statistical quantities in different clusters. We chose a subgroup of clusters relative to non-overlapping windows to perform a statistical analysis on independent points. Clusters do not differ for average HR, average speed, HR standard deviation, speed standard deviation, and altitude standard deviation. The only significant quantities are then the clustering quantities. Table 1. General characteristics of the clustered population. p-value annotation legend: ns: 5.00 × 10 −2 < p ≤ 1.00 × 10 0 ; **: 1.00 × 10 −3 < p ≤ 1.00 × 10 −2 ; ***: 1.00 × 10 −4 < p ≤ 1.00 × 10 −3 ; ****: p ≤ 1.00 × 10 −4 .    Table 2.

Temporal Mapping of Clusters on Running Sessions
PEER REVIEW 9 of 17

Fraction of Cluster −/+ Is Positively Correlated with VO2max, While Fraction of Cluster −/− Is Negatively Correlated
In addition to what has been said in the previous paragraph, we can observe in Figure  5 the correlations between the cluster frequencies for each cluster in the different running sessions and the VO2max value estimated with the Apple Watch in the same sessions. Our analysis found a positive correlation between VO2max and cluster −/+ (r = 0.72, Figure 5b) and a negative correlation between VO2max and cluster −/− (r = −0.52, Figure 5a), though the r value is slightly less. Projections of the clusters on representative HR, speed, and altitude time series are shown in Figure 6a: as an example, a running session acquired in June 2021 is reported. The distribution of the clusters along the curve seems isotropic.  In addition to what has been said in the previous paragraph, we can observe in Figure 5 the correlations between the cluster frequencies for each cluster in the different running sessions and the VO 2 max value estimated with the Apple Watch in the same sessions. Our analysis found a positive correlation between VO 2 max and cluster −/+ (r = 0.72, Figure 5b) and a negative correlation between VO 2 max and cluster −/− (r = −0.52, Figure 5a), though the r value is slightly less. Projections of the clusters on representative HR, speed, and altitude time series are shown in Figure 6a: as an example, a running session acquired in June 2021 is reported. The distribution of the clusters along the curve seems isotropic.

Temporal Distribution of the Heartbeat Dynamics and Correlation with Neuromuscular Fatigue
To deeply investigate the temporal distribution, we divided the time series into three equal sections and calculated the percentage concentrations of the clusters in the individual sections. To compare even slightly different runs in overall duration we normalized time between 0 (start time) and 1 (end time). We called the three sections "start" (normalized time interval 0-0.33), "middle" (normalized time interval 0.33-0.66), and "end" (normalized time interval 0.66-1). Characteristic trends can be seen for the various clusters over time. Results of this analysis are shown in Figure 6b-e.
Performing a repeated measure ANOVA with a Tukey HSD post-hoc comparison, we can observe that the cluster frequencies of the −/− cluster in Figure 6b in the "start" section is significantly higher than the "middle" section. A decreasing trend can be observed for cluster −/+ in Figure 6c. The cluster frequencies in every section are all significantly different.
In Figure 6d,e, frequencies for both +/− and +/+ clusters present an increasing trend from the "start" section to the "end" section: cluster frequencies in the "end" section are significantly higher than "start" and "middle" section. Values of the cluster frequencies experience a saturation. We performed RQA analysis on the same sections of the analysis just presented. An important parameter of RQA analysis is determinism (DET) which is an indicator of the regularity or complexity of the system dynamics. In earlier works, a correlation between high values of DET and fatigue during submaximal incremental exercise has been found [49,50]. We found an increasing trend for DET in every running session. Repeated measures ANOVA and post-hoc Tukey with Bonferroni corrections revealed significant differences between the "end"-"middle" section and "end"-"start" section. Figure 7a reports boxplots of determinism in the three different sections. In Figure 7b-d an example of a recurrence plot in the three different sections is shown.

Discussion and Conclusions
Several studies have found out that interval training (IT) produces improvements in VO2max slightly greater than those typically reported with continuous training (CT) [21]. These training approaches are indeed aimed to abruptly change the external energy demand ΔE. In this respect, if during city running abrupt variations in velocity or height due to the alternation of uphill and downhill slopes with flat areas occur, an improvement in Figure 7. (a) RQA determinism boxplots in three different running sections; a representative RP of HR time series during an exemplary running session. In "middle" (b) and "end" (c) sections a larger number of black points (recurrence points) can be seen with respect to the "start" section (d). p-value annotation legend: ns: 5.00 × 10 −2 < p ≤ 1.00 × 10 0 ; *: 1.00 × 10 −2 < p ≤ 5.00 × 10 −2 ; ***: 1.00 × 10 −4 < p ≤ 1.00 × 10 −3 .

Discussion and Conclusions
Several studies have found out that interval training (IT) produces improvements in VO 2 max slightly greater than those typically reported with continuous training (CT) [21]. These training approaches are indeed aimed to abruptly change the external energy demand ∆E. In this respect, if during city running abrupt variations in velocity or height due to the alternation of uphill and downhill slopes with flat areas occur, an improvement in the cardiovascular systems in copying with these stresses can trigger heart rate dynamics related to a VO 2 max increase. To investigate this point, in this manuscript we considered city running sessions of comparable duration performed by a non-professional runner over a year. We calculated ∆E and ∆HR features from the raw data extracted from the Apple Watch in overlapping point by point time intervals of width ∆t = 90 s, because of the statistical analysis on the intersection values of the ACF with confidence intervals on 16 different running sessions (Figure 1). The heart rate ACF was decisive for determining the time window in which heart rate values were still related to previous events. The clustering analysis identified four different cluster dynamics of heartbeat in response to external energy demand (+/+, +/−, −/+, −/−). In directly proportional clusters (+/+ and −/−) an increase (decrease) in the external demand is correlated to an increase (decrease) in the cardiovascular response to adapting to the cardiac output needed to fulfill increased oxygen requirements. While these dynamics of the cardiovascular response are standard and traceable also on longer timeframes, we found two peculiar dynamics grouped in −/+ and +/− clusters which are not characterized by this straightforward relation. In the −/+ cluster, ∆HR increases despite ∆E decreasing, i.e., cardiovascular response increases despite the energetic demand of the environment decreasing (i.e., z and/or v decrease). We observed a positive correlation with VO 2 max and the frequency of the −/+ cluster (r = 0.72, Figure 5a). This increase is at the expense of the −/− cluster whose frequency inversely correlates with VO 2 max (r = −0.52, Figure 5b). Moreover, investigation of the temporal distribution of the clusters by classifying running sessions in three equal time intervals ("start", "middle", and "end") shows that the −/+ cluster percentage decreases from the "start" to the "end". Since the increase in DET measured through RQA analysis (Figure 7), as described in previous publications, indicates the appearance of neuromuscular fatigue [49,50], we hypothesize that −/+ dynamics are especially active when the organism is not experiencing fatigue. Overall, we can hypothesize that the physiological processes connected to a VO 2 max increase, such improvement in cellular metabolism of muscular cells in the mechanisms characterizing oxygen uptake [59], and vascular modifications leading to a better oxygen distribution, could change cellular responses to stimuli ultimately leading to a hyper-stimulation of the sympathetic nervous system. This altered response lasts until the emergence of neuromuscular fatigue occurs. This neuromuscular modification leading to a new regime of sympathetic stimulation can therefore be connected with the increase in frequency of +/− cluster dynamics. The dynamics of the +/− cluster, in which ∆HR decreases despite ∆E increasing, does not present a significant correlation with VO 2 max variation, indicating that it is not related to the physiological improvement. This cluster may instead be related to a delayed physiological response to an increase in the external energy request and fatigue. The +/− fraction increase from "start" to "end" can be therefore related to the fact that a delayed cardiovascular response increases with fatigue.
Overall, the combination of these results can be extremely valuable in providing personalized exercise plans. Indeed, since it is possible to detect characteristic heartbeat dynamics, the possibility to provide personalized feedback about the user's fitness level improvement is opened: improvements in cardiovascular fitness may be realized developing personalized exercise plans aimed at targeting a contextual increase in the −/+ fraction, related to VO 2 max increase, at the expense of the +/− fraction, related to the emergence of fatigue. These strategies can ultimately result in the reduction in cardiovascular risk and in the risk of developing other devastating pathologies such as cancer. This study, by presenting a new method of analysis, is limited to a single subject, as the analysis is conceived as person-centered by extracting features that would be hidden by the variability between individuals. However, this innovative analysis is widely applicable and has implications beyond the specific case. Other subjects, analyzed with the same method, could display similar or differing features according to their medical history, age, sex, and fitness status. Further research will generalize these results to improve the extraction of cardiovascular fitness improvement features from wearable devices and the physiological interpretations of the signals belonging to each cluster.

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/s22113974/s1, Figure S1: Comparison analysis between Heart Rate acquired by Apple Watch and Heart Rate acquired by Garmin. Linear regression highlights a Pearson correlation coefficient R = 0.92 and p value << 0.0001 so we can say that the two signals are highly correlated.; Figure S2: ACF time decay (τACF) values in 16 different running sessions acquired by Garmin. τACF values have been obtained by calculating the intersection values of the ACF with confidence intervals. An important result is the independence of the ACF time decay from training level. The plot in Figure S2 does not show any trend meaning that τACF is not correlated with physical fitness; Figure S3: Slope distributions obtained from the linear regression analysis relative to the four clusters and to the two clustering features ∆E and ∆HR are reported. The red bars indicate the slope values whose sign does not correspond to the sign of the variations in the clustering analysis, ranging from 8% to 23% of the points; Figure S4: Energy plots representing cases in which the sign of the variations in the features chosen in our clustering analysis (∆E and ∆HR, according to Equation (7)) and the sign of the slope calculated with the linear analysis are in agreement. The colors of the plots are equal to the color used to indicate the respective clusters (black for −/− cluster, blue for −/+ cluster, green for +/− cluster, and yellow for +/+ cluster, see Figure 2). In the inserts are instead reported energy plots representing representative cases in which the value of the variations in the features chosen in our clustering analysis and the value of the slope calculated with the linear analysis disagree. For example, in the inset plots in Figure S4a,b the energy variation (∆E) should be negative. Linear regression, on the other hand, identifies a positive slope. Similarly, in Figure S4c,d ∆E should be negative but the slopes are positive. This happens when there are particular configurations in which the variation is very close to zero. In these cases, the slope becomes sensitive to noise. Instead, in our features the value of γ 0 is close to zero and therefore the skewness γ 1 becomes important. Therefore, in these cases, the sign of the variation takes into account the general tendency of the data to be above or below the average value; Table S1: Values of repeated measures ANOVA and Tukey HSD post hoc for both RQA and clustering analysis performed on the three temporal sections. p-value annotation legend: ns: 5.00 × 10 −2 < p ≤ 1.00 × 10 0 ; *: 1.00 × 10 −2 < p ≤ 5.00 × 10 −2 ; **: 1.00 × 10 −3 < p ≤ 1.00 × 10 −2 ; ***: 1.00 × 10 −4 < p ≤ 1.00 × 10 −3 ; ****: p ≤ 1.00 × 10 −4 . Funding: This project was supported in part by a research grant awarded to MDS from Regione Lazio PO FSE 2014-2020 "Intervento per il rafforzamento della ricerca nel Lazio-incentivi per i dottorati di innovazione per le imprese", co-funded by Aenduo, and by a research grant awarded to GM by Università Cattolica del Sacro Cuore-Linea D1, 2021.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Università Cattolica del Sacro Cuore (Protocol Code diab_mf, 16 March 2017).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.