In this work, a graph structure is constructed based on prior knowledge about the ECG structure, and dynamic programming is used to segment the signal into predefined states according to the graph. Three logical flags based on the slope, difference, and shape of the signal are also introduced to better fit waveforms to ECG wave morphology. Adaptive thresholding of the signal slope at R-peaks is implemented to cope with the variation of ECG signal levels across individuals. The details are given below.
3.1. ECGMorphGraph: The Proposed Graph for ECG Wave Morphology
Considering a discrete-time ECG signal of length , , the goal is to label the signal at each time index into different types/parts. Mainly, we aim to label the R-peaks. However, to increase the accuracy of the model, several other components of the ECG wave are incorporated. To capture the relation between each state and correctly label the signal at each time index, we incorporated graph theory.
In graph theory, a graph is defined as a set of vertices $V$ and a set of edges $E$, where $E \subseteq V \times V$. A directed graph is a graph containing edges that have a direction, meaning that an edge $(u, v)$ does not imply the existence of a reverse edge $(v, u)$. A weighted graph is a graph in which each edge is assigned a numerical value, or weight. Mathematically, a weighted directed graph is a set of vertices $V$ and a set of edges $E \subseteq V \times V$, together with a weight function $w : E \to \mathbb{R}$.
In our proposed method, the vertices of the graph are categorized into several types based on each component of the ECG. The proposed graph named “ECGMorphGraph” is created corresponding to the real ECG wave morphology. In the graph, a directed edge between vertices and exists if vertex (or state) might come directly after the vertex (or state) chronologically in an ECG signal. Specifically, each vertex is a representation of a corresponding state or segment within an ECG signal.
For each vertex, the following information is stored:
1. The state it corresponds to, labeled numerically, as well as stored as a string. The states used in our work are as follows: “Start”, “R”, “S”, “PQ”, “Q”, “ST”, “T”, “UNK”, in which each string represents an ECG wave component and “UNK” represents an unknown component. Note that there may be several numerical vertex types corresponding to the same ECG state, as an ECG state can have varying morphologies.
2. A slope constraint to ensure that the shape of the segment suits its state, i.e., R-peaks must be steeply sloped, and isoelectric lines must be relatively flat. Since there are many instances where ECG waves are mirrored about the horizontal axis, we define a logical flag based on the signal slope (gradient) constraint of the vertex, for both upright and upside-down waves, as
for the following cases:
Case 1:
Case 2:
Case 3:
Case 4:
Case 5:
where the minimum and maximum slopes
and
are defined, respectively, as
and are the minimum and maximum values of the ECG signal, respectively, in the segment of interest from to , and is the slope thresholding value of the vertex. Note that corresponds to the time index of the vertex in our selected labeling, or optimal path. In other words, and are the slopes using the lowest and highest points, respectively, or the difference in amplitude between the highest/lowest points and the starting point, added with the highest/lowest points and the ending point, over the length. The idea behind the use of for each type of slope-based constraint is as follows:
- no slope constraint
- normal wave segment
- upside-down wave-like segment
- flat segments
- any wave-like segments, i.e., up or down.
In short, the function is defined for five unique cases. For the no slope constraint case, it always returns true. For the normal wave segment case, it returns true if the maximum slope is not less than the defined slope constant. For the upside-down wave-like segments, it returns true when the minimum slope is not less than the defined constant. For the flat segments, it returns true when both the minimum and maximum slopes are not greater than the defined constant. Lastly, in wave-like segments that can be both normal and upside-down, it returns true when either the minimum or maximum slope is not less than the defined constant. Otherwise, the function will return false.
3. A vector of edges that point to the current vertex.
4. A range of segment lengths, which determines the earlier time indices from which inward edges may reach the vertex.
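The slope-based flag described above can be sketched as follows. This is a minimal illustration, not the original implementation: the names `SlopeKind`, `min_max_slopes`, and `slope_flag` are hypothetical, and the exact form of the minimum and maximum slopes (in particular the sign convention and the use of `b - a` as the segment length) is an assumption inferred from the verbal description.

```python
from enum import Enum


class SlopeKind(Enum):
    """Hypothetical labels for the five slope-constraint cases."""
    NONE = 0       # no slope constraint
    NORMAL = 1     # normal (upright) wave segment
    INVERTED = 2   # upside-down wave-like segment
    FLAT = 3       # flat segment
    ANY = 4        # wave-like segment, up or down


def min_max_slopes(x, a, b):
    """Return (s_min, s_max) for the inclusive segment x[a..b], with b > a.

    Assumed form: the amplitude difference between the extreme value and
    both endpoints, summed and divided by the segment length b - a.
    """
    segment = x[a:b + 1]
    lowest, highest = min(segment), max(segment)
    length = b - a
    s_max = ((highest - x[a]) + (highest - x[b])) / length
    s_min = ((x[a] - lowest) + (x[b] - lowest)) / length
    return s_min, s_max


def slope_flag(kind, x, a, b, theta):
    """Evaluate the slope-based flag for segment x[a..b] with threshold theta."""
    if kind is SlopeKind.NONE:
        return True
    s_min, s_max = min_max_slopes(x, a, b)
    if kind is SlopeKind.NORMAL:
        return s_max >= theta
    if kind is SlopeKind.INVERTED:
        return s_min >= theta
    if kind is SlopeKind.FLAT:
        return s_min <= theta and s_max <= theta
    return s_min >= theta or s_max >= theta  # SlopeKind.ANY
```

With this convention, both slopes are non-negative, so a single positive threshold can serve upright and upside-down waves alike.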
It is important to note that the slope thresholding value of the R-peak vertices was made adaptive across consecutive R-peaks, since ECG wave morphology varies from person to person. In particular, the broadness of the R-peaks and T-peaks differs greatly from one record to another, to the extent that T-peaks in some records are wider or larger than R-peaks. To address this, the updated thresholding value is calculated by averaging half of the current slope value with the previous thresholding value. To make it robust to varied signal acquisition, the thresholding value is bounded based on the sampling rate
. In the experiments, the minimum thresholding value
was set to
. The threshold has to be at least
and cannot go beyond
. This prevents too much change in the consecutive thresholds. This update occurs only at the vertices corresponding to the state of R-peak, i.e., the vertices 1 and 6 in
Table 1. It is worth noting that the adaptive thresholding is calculated between iterations and is updated automatically once a state representing an R-peak is reached, making it adaptive for each R-peak we reach. This means that it operates independently from the other defined graph constraints. Specifically, the new adaptive slope thresholding value is updated according to the following algorithm:
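A minimal sketch of the update described above is given below. It is not the original listing. The function name `update_slope_threshold` and its parameters are hypothetical, and the bounds `lower` and `upper` are left as caller-supplied arguments because their values are not recoverable from the text. The default `coeff=0.5` corresponds to the simple average of half the current slope value and the previous threshold.

```python
def update_slope_threshold(prev_theta, current_slope, lower, upper, coeff=0.5):
    """Return the new R-peak slope threshold (illustrative sketch).

    prev_theta    : threshold used for the previous R-peak
    current_slope : slope measured on the R-peak segment just labeled
    lower, upper  : assumed bounds on the new threshold (placeholders)
    coeff         : weight given to half of the current slope value
    """
    candidate = coeff * (current_slope / 2.0) + (1.0 - coeff) * prev_theta
    # Clamp so that consecutive thresholds cannot change too much.
    return min(max(candidate, lower), upper)
```

Such an update would be called once each time a vertex representing an R-peak is reached, independently of the other graph constraints.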
For each edge, the following information is stored:
1. The source and destination vertices, representing a one-directional connection from the former to the latter.
2. An integer denoting the index of the edge.
3. The minimum difference constraint to ensure that a significant change of amplitude is met. The difference-based constraint logical flag of the edge is defined as
for the following cases:
Case 1:
Case 2:
Case 3:
Case 4:
where
for a signal within the range
to
. To make it clearer,
is the difference between the last value of two consecutive segments, and
is the difference thresholding value of the
edge. Each type of difference-based constraint is used as follows:
- same as the previous segment
- normal wave-like segments
- upside-down wave-like segments
- flat segments
To sum it up, the function is defined for four types of segments. In the case of no difference constraint, or for segments that are identical to their previous segments, the function always returns true. For the normal wave-like segments, the function returns true when the difference is not less than the defined constant. For upside-down wave-like segments, the function returns true when the negative difference is not less than the defined constant. Lastly, for flat segments, the function returns true when the magnitude of the difference is not greater than the defined constant. Otherwise, the function returns false.
4. A fixed bias for the edge, which represents the cost of traversing the edge. This bias prevents changing states too frequently or sub-optimally.
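The four cases of the difference-based flag can be illustrated with the short sketch below. The function name `difference_flag` and the string labels are hypothetical. The sketch also assumes that consecutive segments share an endpoint, so that the previous segment ends at index `a` and the current one ends at index `b`.

```python
def difference_flag(kind, x, a, b, delta):
    """Evaluate the difference-based flag for an edge (illustrative sketch).

    kind  : "none", "normal", "inverted", or "flat" (hypothetical labels)
    x     : ECG samples
    a, b  : last indices of the previous and current segments (assumption)
    delta : difference thresholding value of the edge
    """
    d = x[b] - x[a]  # difference between the last values of the two segments
    if kind == "none":
        return True
    if kind == "normal":
        return d >= delta
    if kind == "inverted":
        return -d >= delta
    if kind == "flat":
        return abs(d) <= delta
    raise ValueError(f"unknown constraint kind: {kind}")
```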
The shape-based constraint logical flag of the vertex is also defined for some of the vertex types. Note that a vertex labeling a given range of samples must satisfy the constraint of its type for that labeling to be considered as a possible path. The constraint is given below.
for the following cases:
Case 1:
Case 2:
Case 3:
Case 4:
Case 5:
Note that all three functions, representing the three constraints, gate the labeling process (traversal through vertices), i.e., the fourth step in
Figure 1. Specifically, a transition is allowed only if all three functions return true. Otherwise, the path cannot be taken because it does not satisfy the required morphology.
Figure 6 displays the structure of the
ECGMorphGraph that we created, particularly for ECG signals. The graph consists of 15 vertices and 22 edges based on ECG signal morphology and structure. In addition, we also added another vertex as a failsafe path to ensure that the labeling still occurs when there are no fitting paths, which is not displayed in the figure. The states of each vertex in the graph and the special vertex (failsafe vertex) will be explained further in
Section 5.
To label the ECG signal as a time series with length
, we will copy the
ECGMorphGraph for each time index, creating
layers of the graph, with each layer corresponding to a time index as shown in
Figure 7. Specifically, each time index will contain all types of vertices with inward edges from previous time indices. As a result, the layered
ECGMorphGraph would contain
vertices, where
is the total number of ECG parts defined (or basically the number of vertices in the
ECGMorphGraph). Each type of vertex will also contain a range
representing the possible lengths of the state in the current vertex. This corresponds to the time indices of the possible previous vertices. That means all vertices with type
will contain inward edges from time indices
to
with vertices of type
for all
, where
is the type of the previous vertex and
is the type of the current vertex. As could be imagined, the actual number of edges pointing to each vertex is extremely large due to many possible edges from many layers corresponding to the past time instances in the range
. The actual layered
ECGMorphGraph would contain an extremely large number of edges, and hence, the layered graph would be unreadable. This is why the simplified version of the layered
ECGMorphGraph is shown in
Figure 7, rather than the actual one.
To label each signal segment, visiting a vertex of type
at time index
represents labeling an ECG segment ending at the time index
as type
. Therefore, a sequence of vertices, or a path, containing the vertices
would represent labeling the signal as follows:
as the type of vertex 1,
as the type of vertex 2,
as the type of vertex 3, …,
as the type of vertex
, where
denotes the time index of the
vertex. Note that
and
, as our goal is to label the entire sequence. For each directed edge, we will then assign a weight,
, representing the cost of labeling the specific segment and the corresponding bias. Mathematically, the
edge is assigned a weight of
, where
is the segment corresponding to the
edge,
is the type of the vertex the edge is pointed to, and
is the bias of an edge connecting the previous vertex type to the current vertex type. In simpler terms, the cost function is defined as the sum of the squares of the differences between every point and the starting point. The basic idea is that
is the cost function to help encourage labeling, as labeling a segment might decrease the overall cost, and the bias
is an additional cost to discourage some types of labeling unless it is clearly optimal. As mentioned earlier, we introduce two logical flags based on signal level constraints to prevent incorrect transitions, i.e., a slope-based flag
which is assigned to vertices, and a difference-based flag
which is assigned to edges. We also introduce another logical flag based on the shape constraint
to make sure that a labeled segment contains the data points we need. Specifically, the constraint makes sure that some labeled segments will contain a local maximum at endpoints of the segment or at the middle to ensure correct R-peak detection. Formally, we can write the labeling problem as
where
denotes the index of the type of the
th vertex, and
denotes the index of the current edge
for the
vertex, and
denotes the time index of the
vertex in the labeled path. To put it simply, our task is to find a path of vertices
, such that the sum of the edge weights between consecutive vertices is minimized. Visually, this path is displayed in the third and fourth steps of
Figure 1, where it will be used to segment the ECG signal and find the R-peak positions.
We can observe that this problem can be reduced to a shortest path problem by representing it with graphs. More formally, we must find a sequence of vertices $v_1, v_2, \dots, v_m$, where $v_1$ represents the start vertex at time index 0 and $v_m$ represents a vertex ending at the last time index, with minimum cost. This can be solved with Dijkstra’s shortest path algorithm in $O((|V| + |E|) \log |V|)$ time with a binary heap, where $|V|$ and $|E|$ are the numbers of vertices and edges, respectively, as defined earlier. However, the structure of our specific graph makes it a directed acyclic graph (DAG), as edges point from vertices of smaller time indices to vertices of larger time indices. This means the shortest path can be found by processing the vertices in topological order in $O(|V| + |E|)$ time and storing the optimal answer with dynamic programming. The shortest path is then taken as the optimal labeling.
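The dynamic programming over the layered graph can be sketched as follows, under simplifying assumptions. The function `segment`, the edge format `(source_type, target_type, bias)`, the `length_range` mapping, and the `allowed` callback (standing in for the three logical flags) are all hypothetical names. Consecutive segments are assumed to share an endpoint, and every minimum length is assumed to be at least 1.

```python
import math


def sq_cost(x, a, b):
    """Sum of squared differences between every point and the starting point."""
    return sum((x[i] - x[a]) ** 2 for i in range(a, b + 1))


def segment(x, edges, length_range, allowed=lambda v, a, b: True,
            cost=sq_cost, start=0):
    """Return (path, total_cost); path is a list of (a, b, vertex_type)."""
    n = len(x)
    types = {start} | {u for u, _, _ in edges} | {v for _, v, _ in edges}
    dp = [{v: math.inf for v in types} for _ in range(n)]
    back = [{v: None for v in types} for _ in range(n)]
    dp[0][start] = 0.0
    for t in range(1, n):  # increasing time is a topological order of the DAG
        for u, v, bias in edges:
            lo, hi = length_range[v]
            for a in range(max(0, t - hi), t - lo + 1):
                if dp[a][u] == math.inf or not allowed(v, a, t):
                    continue
                cand = dp[a][u] + cost(x, a, t) + bias
                if cand < dp[t][v]:
                    dp[t][v] = cand
                    back[t][v] = (a, u)
    # Backtrack from the cheapest state at the last time index.
    t = n - 1
    v = min(dp[t], key=dp[t].get)
    total = dp[t][v]
    if total == math.inf:
        return None, math.inf
    path = []
    while back[t][v] is not None:
        a, u = back[t][v]
        path.append((a, t, v))
        t, v = a, u
    path.reverse()
    return path, total
```

Because every edge points forward in time, one pass over the time indices suffices, and no priority queue is needed.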
Table 1 and
Table 2 show the details of vertices and edges, respectively, used in the proposed
ECGMorphGraph. In total, we have 16 vertices and 30 edges in the graph. Note that each edge corresponds to
actual edges connecting a vertex of one type, with a time index
, to a vertex of another type with time indices
for each time index of the layered
ECGMorphGraph. Each vertex numbered from 0 to 14 represents a specific state of the ECG wave. Additionally, we introduce a special vertex, indexed 15 and labeled as “UNK”, representing the word unknown. This vertex is used as a failsafe to ensure that there is always a path for graph traversal, even if none of the other vertices are currently reachable. Therefore, vertex 15 is connected to all vertices and assigned a cost equivalent to infinity to prevent taking this path unless necessary. This can be easily implemented in this research by setting the bias
to $10^{10}$.
The values in the tables were determined according to the ECG wave morphologies, varied individual conditions, and varied signal acquisition setups. It is worthwhile noting that, in
Table 1, the lower and upper bounds of the interval of interest
and
are set with respect to the sampling rate
of the ECG signal. To be more specific, the values shown in the table were divided by
. This is to make our algorithm robust to different ECG signals acquired by different sampling rates. Meanwhile, other parameters, i.e., the slope thresholding value of the
vertex
and the difference thresholding value of the
edge
were also affected by the sampling rate selection. The values of
and
in the tables were multiplied by
to make our algorithm robust to sampling rate variation.
For reproducibility purposes, it is important to note that the construction of the ECGMorphGraph was done manually through trial and error. Specifically, important parts of the ECG were added first, such as the parts representing the R-peaks and the “Start” vertex. The corresponding thresholds and constraints were also added to ensure correct detection. Then, more minor parts were incorporated into the graph one by one, such as the T-wave, ST segment, and the Q and S segments in the QRS complex. After the rough graph was built, minor changes were made to the structure to increase performance. It is also important to explain that each part of the proposed graph structure was added to combat specific ECG morphologies. For example, vertices 6 through 10 were added to enable detection of upside-down waves, and vertices 12 and 14 were added to allow correct detection of prominent T-waves instead of mislabeling them as R-peaks.
Furthermore, it is important to note that the selection of threshold and constraint values was done systematically and logically. In particular, real-world ECG data, usually from the MIT-BIH dataset, was manually analyzed to create appropriate threshold values by first using an estimate of the logical values each constant should take. Then, each value was roughly optimized to increase performance and detection accuracy. As such, most of the threshold and constraint constants are based on the 360 Hz sampling rate of the MIT-BIH dataset, and were then converted to the values presented in
Table 1 and
Table 2. Lastly, the determination of all values was done simultaneously with the addition of each vertex in the graph to ensure that the current structure works effectively.
To evaluate the detection performance of our algorithm, we used five evaluation metrics: accuracy, sensitivity, positive-predictive value (PPV), F1-score, and detection error rate (DER) [21,22]. The definitions of each metric are stated as follows:
where
is the number of true positives,
is the number of false negatives, and
is the number of false positives.
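For illustration, the five metrics can be computed from the three counts as in the sketch below. The function name `detection_metrics` is hypothetical. The accuracy and DER formulas follow conventions commonly used in R-peak detection, where true negatives are undefined, and are assumptions rather than the exact definitions used in this work. In particular, DER is taken here as the number of errors divided by the number of annotated beats.

```python
def detection_metrics(tp, fn, fp):
    """Compute R-peak detection metrics from TP, FN, and FP counts."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return {
        # Assumed convention: no true negatives in beat detection.
        "accuracy": tp / (tp + fn + fp),
        "sensitivity": sensitivity,
        "ppv": ppv,
        "f1": 2 * sensitivity * ppv / (sensitivity + ppv),
        # Assumed convention: errors relative to annotated beats (TP + FN).
        "der": (fp + fn) / (tp + fn),
    }
```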
Table 3, Table 4, Table 5, Table 6 and Table 7 display the performance of our algorithm for the MIT-BIH dataset with moderate variations to the major constants that are used, while
Table 8 displays the performance when some vertices are removed from the manually defined graph. To be more specific, the removal is done by directing each of the incoming edges from the removed vertex to all of its possible successors. This ensures that there is still a valid path through the graph in the absence of the specific vertex type, and accurately measures the performance of the graph without the usage of the mentioned vertex. It should be noted that the results presented in the abstract as well as
Section 4 (Experimental Results) correspond to the optimal parameters across all databases, meaning that some variations may create a better performance in the MIT-BIH database as presented in
Table 3,
Table 4,
Table 5,
Table 6 and
Table 7. Namely, five of the most significant values were chosen to be tested: the R-peak difference threshold, the S slope threshold, the Q slope threshold, the PQ slope threshold, and the adaptive threshold coefficient. The adaptive threshold coefficient indicates the weight applied to the current slope value relative to the previous thresholding value, so the simple average corresponds to a coefficient of 0.5. Five variations of each constant were chosen, and the results are reported to show the parameter robustness of our proposed method. These variations were chosen to be round numbers surrounding the actual value used, for simplicity and to demonstrate the change in performance when the values are increased or decreased to a certain level. Furthermore, during the variations, the other values are kept as reported in
Table 1 and
Table 2.
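The vertex removal used in the ablation can be pictured as a simple rewiring of the graph. The sketch below is an illustration only: the function name `remove_vertex` and the edge format `(source, target)` are hypothetical. How biases and difference thresholds are assigned to the newly created edges is not specified here and is therefore omitted.

```python
def remove_vertex(edges, removed):
    """Remove a vertex and connect each predecessor to each successor.

    edges   : list of (source, target) pairs of vertex types
    removed : the vertex type to delete
    """
    preds = [u for u, v in edges if v == removed and u != removed]
    succs = [v for u, v in edges if u == removed and v != removed]
    kept = [(u, v) for u, v in edges if removed not in (u, v)]
    for p in preds:
        for s in succs:
            if (p, s) not in kept:  # avoid duplicating an existing edge
                kept.append((p, s))
    return kept
```

This keeps a valid path through the graph whenever one existed through the removed vertex.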
The first observation of the evaluation is that the variation of these five parameters did not significantly affect the overall detection performance achieved by our proposed algorithm on this dataset. The accuracy, sensitivity, PPV, and F1-score were all higher than 99%, and the values of DER were all less than 1% in all scenarios. The parameter that showed the most variation in detection performance was the R-peak difference threshold, where the accuracy ranged from 99.07% to 99.20%, the sensitivity ranged from 99.39% to 99.60%, the PPV ranged from 99.47% to 99.69%, the F1-score ranged from 99.50% to 99.60%, and the DER ranged from 0.80% to 0.93%, as shown in
Table 3. Meanwhile, the detection performance changed just a little over the variations of the S slope threshold, Q slope threshold, PQ slope threshold, and the adaptive threshold coefficient as shown in
Table 4,
Table 5,
Table 6 and
Table 7. This shows that our proposed algorithm does not rely upon specific values of parameters, but is a robust framework capable of detecting R-peaks. Note that the values chosen had a relatively large difference. For example, the 0.10 to 0.50 chosen for the R-peak difference threshold corresponds to a minimum requirement of 0.10 to 0.50 millivolt difference in amplitude for an R-peak to be considered for labeling, which is quite significant. In
Table 8, we see that each vertex plays a role in R-peak detection. The removal of the vertices corresponding to the PQ, S, and T components, along with their upside-down counterparts, resulted in slight decreases in performance, ranging from approximately 0.1% to 0.5% in accuracy. This demonstrates that these vertices contribute modestly to the overall performance. On the other hand, the removal of the vertices corresponding to the Q segment decreases the accuracy significantly, to around 94.35%, showing that it is a crucial part of the proposed graph structure. Overall, the controlled ablation experiments showed that the graph still works to a considerable degree when some of its vertices are removed, but all the vertices are needed to achieve a competitive performance, justifying the complexity of the 16-vertex graph. Note that the value chosen for this specific work might not correspond to the optimal value in the table, as the table only displays the results from the MIT-BIH dataset and not all four datasets. We also note that the parameter optimization was deliberately not done extensively, to prevent overfitting and to show a realistic performance of our algorithm and proposed framework; additional parameter optimization may therefore further improve the performance of our method.
3.3. Computational Complexity
Figure 8 displays the pseudocode of the proposed algorithm. Each iteration of the algorithm can be separated into two main parts: ECG segmentation and peak localization from the segmented signal. The segmentation works by iterating over the signal, then iterating over all graph vertices and their inward edges. For each inward edge in the
ECGMorphGraph, we iterate the signal from
to
to update the dynamic programming array. It should be noted that updating the dynamic programming array can be done in
$O(1)$ time by precomputing prefix sums and maintaining the local minimum and maximum when iterating. Thus, our algorithm works in
where
is the signal length and
is the total number of edges of the layered
ECGMorphGraph. Specifically,
, where
is the total number of edges for all the vertices representing a single time index. Since the graph structure is predefined and static,
and
are constant. Therefore, the segmentation part of the algorithm works in
or more accurately
, which constitutes a relatively high constant factor.
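The constant-time cost evaluation mentioned above can be sketched with two prefix-sum arrays. The class name `SegmentCost` is hypothetical, and the cost is assumed to be the sum of squared differences between every point of a segment and its starting point, independent of the vertex type. Expanding the square gives $\sum_i (x_i - x_a)^2 = \sum_i x_i^2 - 2 x_a \sum_i x_i + n x_a^2$, where $n$ is the number of samples in the segment.

```python
from itertools import accumulate


class SegmentCost:
    """O(1) segment cost after O(N) preprocessing (illustrative sketch)."""

    def __init__(self, x):
        self.x = list(x)
        # s1[k] and s2[k] hold the sums of x[i] and x[i]**2 over i < k.
        self.s1 = [0.0] + list(accumulate(self.x))
        self.s2 = [0.0] + list(accumulate(v * v for v in self.x))

    def __call__(self, a, b):
        """Return the sum over i in [a, b] of (x[i] - x[a])**2."""
        n = b - a + 1
        sum1 = self.s1[b + 1] - self.s1[a]
        sum2 = self.s2[b + 1] - self.s2[a]
        ref = self.x[a]
        return sum2 - 2 * ref * sum1 + n * ref * ref
```

For long records with large amplitudes, floating-point cancellation in the prefix sums may become noticeable, so subtracting the signal mean beforehand is a reasonable precaution.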
The peak localization part of the algorithm works by backtracking through the optimal segmentation obtained in the segmentation part. First, we iterate over all vertex types and find the minimum cost among the states ending at the last time index of the signal. During segmentation, for each state of the graph we store the previous state (its type and time index) that yields the minimum cost, so the optimal sequence can be traversed from back to front. Backtracking therefore starts from the minimum-cost state at the last time index, and in each iteration the current state is replaced by its stored optimal previous state. When a peak segment is reached, we find its local maximum or minimum and assign it as the peak of the segment. For example, for an R-peak segment, we iterate from its first to its last time index and assign the highest or lowest value as the peak, depending on whether the segment is a normal R-peak or an upside-down R-peak.
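The peak assignment step can be sketched as below, assuming the backtracked path is available as a list of `(start, end, vertex_type)` tuples. The function name `locate_r_peaks` and this path format are hypothetical. The sets of upright and upside-down R-peak vertex types are passed in by the caller.

```python
def locate_r_peaks(x, path, upright_types, inverted_types):
    """Return the sample index of the peak in every R-peak segment.

    x              : ECG samples
    path           : list of (start, end, vertex_type), end inclusive
    upright_types  : vertex types labeling normal R-peaks
    inverted_types : vertex types labeling upside-down R-peaks
    """
    peaks = []
    for a, b, v in path:
        indices = range(a, b + 1)
        if v in upright_types:
            peaks.append(max(indices, key=lambda i: x[i]))  # highest sample
        elif v in inverted_types:
            peaks.append(min(indices, key=lambda i: x[i]))  # lowest sample
    return peaks
```

For example, with vertices 1 and 6 as the normal and upside-down R-peak states, the call would be `locate_r_peaks(x, path, {1}, {6})`.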
To remedy the effect of the high constant factor on computation time, particularly when the signal is extremely long, we can observe that many non-optimal points are iterated over. For some specific segments, a precise time index is not needed for the segmentation. Thus, instead of iterating over all elements in a range, we can iterate over only some of them, using a stride of 2, 3, and so on. In this research, the signal lengths are still within our tolerance. Hence, all time indices are considered.
In practice, our method used an average of 122.42 s per record in the MIT-BIH dataset, 29.08 s per record in the QT dataset, 112.82 s per record in the STCHANGE dataset, and 63.59 s per record in the INCART dataset. The experiments were run on a local standalone computer with an AMD Ryzen 7 5800H with Radeon Graphics (3.2 GHz) processor, 16 GB of RAM, and a 64-bit operating system. This suggests that our algorithm could be implemented on smaller or portable hardware devices used in the medical field. Note that the practical running time could be further reduced using the aforementioned technique of not considering every sample in a range during the labeling process, for specific segments such as the “Start” segment that do not require precise labeling.
We would also like to emphasize that our method, which runs in linear time, is theoretically computationally efficient compared with conventional deep learning techniques employed in R-peak detection. Although the complexity is often not explicitly stated in the corresponding papers, deep learning architectures such as convolutional neural networks can be very costly in computational power, typically due to their highly complex and multi-layered structure, which requires several passes over the input data. As a result, our proposed method offers an asymptotically efficient alternative to deep learning-based methods, and its running time can be further reduced in real-life settings using the pruning method mentioned above. It should also be noted that although the execution time of approximately 122 s per record for the MIT-BIH database on a mid-range standalone laptop computer may seem less efficient than some recent lightweight neural networks, it is difficult to directly compare practical runtime across devices or platforms, since hardware and software architectures vary greatly, ranging from low-end standalone edge devices to high-end cloud computing servers.