1. Introduction
Acute chest pain accompanied by suspicion of ST elevation myocardial infarction (STEMI) is common in Europe. The incidence of hospital admissions for STEMI is between 44 and 142 per 100,000 inhabitants per year [
1]. Emergency Medical Services (EMSs) are involved in most cases for diagnosis, initial treatment and transportation [
2]. The 12-lead electrocardiogram (ECG) is the diagnostic cornerstone of STEMI [
3].
According to a recent study, sonification—the scientific method of representing data as sound in an auditory display—has the potential to play an important role as a new supporting tool for ST-segment surveillance during out-of-hospital care for patients suspected of having myocardial infarction [
4].
Sonification converts data, particularly biosignals, into sounds in a systematic and reproducible way. In our context, the ECG is used as a data source [
5]. These sounds are designed to convey information about biosignals to the EMS crew. Other examples of the auditory display of biosignals include pulse oximetry and the QRS tone of an ECG (cf.
Figure 1 for the notation of ECG signal segments), which are established methods in the medical environment. In the case of an ECG, the ST segment (
Figure 1) can also reveal different sounds when it is isoelectric or elevated [
4].
Rhythm disturbances are monitored continuously by one to three leads to enable the immediate treatment of malignant cardiac arrhythmias when they occur. According to the European Society of Cardiology (ESC) clinical practice guidelines, a 12-lead ECG has to be recorded on-scene within ten minutes of arrival but is routinely recorded only once. The evaluation of the ST segments depends on the age and sex of the patient, particularly in leads V2 and V3, according to the current ESC guidelines [
3]. Therefore, good knowledge and training are essential to ensuring correct and prompt diagnoses.
The diagnosis of STEMI can be challenging for EMS in cases of fluctuating symptoms due to transient ECG changes [
6]. We explore the potential of sonification to support EMSs in making the correct diagnosis by distinguishing ST elevation from non-ST elevation during the prehospital surveillance period. A quick and correct classification is necessary because if ST elevation or other signs of coronary occlusion are present in the ECG, immediate revascularization is required. Treatment includes the prompt delivery of medication on-scene and urgent heart catheterization. Therefore, the treating hospital has to offer 24/7 acute revascularization therapy [
3].
Encouraged by the recent findings of a feasibility study [
4], we formulated the following hypothesis for the current study: The sonification of the ECG can provide interpretable information about ST segment elevation and can be used supportively for emergency medical use. To test this hypothesis, we first developed a sonification concept to create an auditory representation of different STEMI severity levels and types. This concept incorporated evidence from corresponding cognitive studies to optimize information gain by reducing unnecessary cognitive load as much as possible, and it subsequently underwent an expert evaluation process. We then conducted a classification study on a descriptive level with a training cohort of students. Of note, the presented classification, showing high accuracy, is only the first module of a bi-modular study. This first part is regarded as a preparatory step toward a confirmatory study addressed in the second part, a randomized controlled trial [
7]. A real-world study is our overriding near-term goal. We have prepared such a real-world study within the framework of ethical feasibility.
2. Related Research
A number of methods have been developed for the sonification of uni- and multivariate biomedical time series, such as the sonification of electroencephalography (EEG), of electromyography (EMG) and of CT/PET scans, to name a few [
8,
9,
10,
11]. With respect to ECG sonification, Kanev et al. have summarized the state-of-the-art in ECG sonification [
12]. There is recent work on novel wearable devices integrating ECG and PCG for cardiac health monitoring [
13,
14]. While data is displayed mostly visually so far, we regard wearables and mobile devices as highly promising for sonification applications. Current work on ECG sonification splits into two research lines. The first group of approaches focuses on sonifying temporal features such as the heart rate and the heart rate variability [
15,
16,
17]. For instance, heart rate sonification supports better self-regulation in sports training intensity [
18]. In contrast, heart rate variability is an early diagnostic indicator of arrhythmia and other heart-related diseases [
17].
The second line of research focuses on the representation of the morphology of ECG signals, supported by the fact that pathological states usually correspond to specific changes in the ECG signal shape—with applications to facilitate and support diagnostic tasks [
19,
20].
In summary, ECG sonification is a recognized field, especially when it comes to monitoring and detecting cardiac pathologies. To our knowledge, however, more elaborate/detailed sonifications are still at the stage of basic research, as complex sonifications are usually not integrated into medical devices or professional procedures.
Previous research introduced ECG sonification, with a focus on the ST segment. Its elevation was represented by various sonic means, such as using a formant synthesizer for
polarity sonification, synthesizing water drop sounds for the so-called
water ambience sonification, or a subtle morphing between timbres in morph sonification [
4,
5]. These approaches provided an overall indication of whether the ST segment was elevated but did not enable perception of the spatial distribution, i.e., in which orientations/projections ST elevations or depressions occur. To make these spatial patterns stand out clearly, the present study introduces an approach similar to what is known as
temporal nesting in the field of parameter-mapping sonification [
21].
Another focus of our previous ECG sonification research was the development of an
auditory magnification loupe, which perceptually magnifies changes in a variable, making subtle value changes highly salient. For instance, subtle rhythmic irregularities that are too small to be perceived as a change in rhythm have been mapped onto pitch deviations, thereby making them highly salient cues [
16]. Although we are taking a different approach here, this method could be a promising option for future projects. For example, it could be used to emphasize relative changes compared to a reference ST elevation profile.
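As an illustration of the general idea (not the implementation used in [16]), a minimal Python sketch could map small relative deviations of the inter-beat interval onto amplified pitch offsets; the reference interval and gain below are arbitrary assumptions:

```python
def loupe_pitch_offset(ibi_s, ibi_ref_s=0.75, gain_semitones_per_pct=2.0):
    """Auditory magnification loupe idea: amplify the small relative deviation of
    the inter-beat interval (IBI) from a reference into a clearly audible pitch
    offset in semitones. Gain and reference values are illustrative only."""
    deviation_pct = 100.0 * (ibi_s - ibi_ref_s) / ibi_ref_s
    return gain_semitones_per_pct * deviation_pct


# A 2% lengthening of the IBI, barely audible as a rhythm change,
# becomes a clearly audible shift of about 4 semitones.
print(loupe_pitch_offset(0.765))
```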
5. Sonification Demonstration and Method Selection for Evaluation
The sonification methods introduced above demonstrate how flexible and versatile sonifications can be crafted, even under the narrow constraint of using just the 12-lead vector of ST elevations and even when narrowing the methodical focus further to PMSon-based designs. Due to the sheer number of possible designs, a comprehensive evaluation of all variants is not feasible. Therefore, design criteria and validation are based on expert reviews involving clinical partners, coauthors, and selected colleagues with different areas of expertise. For that purpose, we developed a real-time sonification player and demonstrator that allows the user to interactively select sonification hyperparameters via drop-down widgets and sliders, as depicted in
Figure 8.
Figure 9 depicts a spectrogram of a 5-state sonification for the ST elevation profile of a severe inferior STEMI, using the tone-onset interval (in ms), stride, and set separation specified at the end of this section.
The key criteria for selecting the design and setting suitable hyperparameters, i.e., for model selection, were as follows.
Comprehensibility: The five pitch levels are few enough to be understood correctly without much training, and the low presentation speed allows one to follow the pitch contour.
Dominant Perceptual Quality: The presentation speed is fast enough that the six channels in each block are perceptually grouped into one cluster, forming a pattern or gestalt, similar to how raindrop sounds fuse into the perception of rain, or how a sequence of knocking sounds merges into the perception of a woodpecker interacting with a tree. This grouping facilitates the association of sound groups with lead groups. Users are thus not challenged to interpret individual leads.
Aesthetics/pleasantness: The sound should be pleasant to hear, e.g., not too frequent, not too harsh or loud. This motivated the selection of the level offsets (in dB) and of musical intervals for the two elevation pitches and the two suppression pitches. The isoelectric pitch is already given by the pitch of the QRS tone.
Memorizability: Memorizability is an important characteristic optimized with design 3, as it involves only two groups and the sequence is topologically ordered (counterclockwise with respect to the Cabrera circle).
Compatibility with environmental sounds: The choice of synthesized sounds instead of real-world sound samples or physical-model-based sounds is useful, as it avoids any confusion with ambient sounds. In addition, the sounds are selected to avoid interference with speech/vocal interaction. The pitches are in a middle/upper range of musical frequencies, where hearing sensitivity is high and low-cost loudspeakers can easily project the sounds over several meters.
Backgrounding and Saliency: The sound streams should be regular (e.g., in rhythmic and temporal organization) to facilitate listeners’ habituation to these sound streams. Habituation is a perceptual skill that allows one to ignore sound streams (in favor of other tasks or sound streams) yet remain sensitive to relevant changes in those background streams. Examples are the sound of the refrigerator in the kitchen or the exhaust of outdated fossil-fuel-driven automobiles, which drivers clearly hear but can ignore until the sound pattern changes due to a malfunction. However, since absolute ST values are mapped to more extreme pitches and increased sharpness, changes can be expected to be salient enough to draw the listeners’ attention.
Universality: Musical sounds are perceived according to the cultural background. The current choice of intervals, however, is largely driven by consonance, which is a more universal feature: the octave is a preferred interval in most musical communities, and the fifth and fourth are generally preferred consonant intervals.
Based on these criteria, we proceeded with design selection and method fine-tuning. We tested multiple variants, including how the sonification would sound at various realistic heart rates.
Concerning the channel model, we decided to play limb leads counterclockwise and to negate (invert) lead aVR so that the series becomes aVL, I, −aVR, II, aVF, III as depicted in the lower panel of
Figure 4 and
Figure 6. The clinical reason for this procedure is that the ST elevations of the limb leads are thereby clustered. The resulting channel model is the sequence of limb leads counterclockwise followed by the precordial leads counterclockwise (viewed from the top downward).
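For illustration, the following minimal Python sketch (not the supplementary implementation; the input lead order and example values are assumptions) applies this reordering and the aVR inversion to a 12-lead vector of ST elevations:

```python
import numpy as np

# Standard 12-lead order assumed for the input vector (an assumption of this sketch):
STANDARD_LEADS = ["I", "II", "III", "aVR", "aVL", "aVF",
                  "V1", "V2", "V3", "V4", "V5", "V6"]

def reorder_channels(st_mV):
    """Reorder a 12-lead ST elevation vector into the channel model described above:
    limb leads counterclockwise with aVR negated (aVL, I, -aVR, II, aVF, III),
    followed by the precordial leads (assumed V1..V6 here)."""
    st = dict(zip(STANDARD_LEADS, st_mV))
    limb = [st["aVL"], st["I"], -st["aVR"], st["II"], st["aVF"], st["III"]]
    precordial = [st[f"V{i}"] for i in range(1, 7)]
    return np.array(limb + precordial)

# Hypothetical ST elevations (mV) illustrating an inferior pattern
example = [0.05, 0.25, 0.30, -0.15, -0.05, 0.25, 0.0, 0.0, 0.0, 0.05, 0.05, 0.05]
print(reorder_channels(example))
```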
Regarding the temporal organization (i.e., set separation and stride), the clinicians rated the chosen tone spacing as a lower limit, since they believed that students otherwise could not properly resolve the channels. The slower version was regarded as being easier to learn and to comprehend, particularly within the scope of a study that leaves little time to familiarize oneself with the method. From the perspective of musical structures, where patterns often span 4, 8 or 16 bars, repeating the sonification every eight heartbeats was subjectively preferred. To obtain a clear separation of the limb and precordial lead groups, an offset of two heartbeats is the minimum, i.e., limb/augmented leads played after the first beat and precordial leads starting with the third beat.
Regarding the pitch mapping, the tristate variant appeared to be ‘too coarse’ and did not provide enough detail. In contrast, continuous mapping and chromatic mapping offered too much variability to be processed with manageable effort. Further, they appeared to have worse aesthetics, with many intervals that did not fit. This directed our attention towards the five-state sonifications, which were preferred by the clinicians, taking into account their expertise with medical alerts, soundscapes in the operating theater, and emergency situations. This solution was regarded as offering the highest discernibility and the lowest risk of alarm fatigue. If non-urgent or even false alarms are repeatedly detected by nurses or other staff, alarm fatigue may result, and staff might not react in time when genuine emergency alarms sound [
29]. The five-state dissonant mapping was regarded as compelling concerning the quality of sound: dissonant sounds would have a stronger call for users’ actions to resolve a precarious situation. However, if a five-minute or even longer journey to the hospital is involved, then the constant need for the emergency crew to listen attentively to dissonant sounds while not being able to do anything about it will probably be pointless, which is why we eventually opted for the more pleasant/consonant five-state mapping.
In summary, for the study described in the following section, our sonification method assumes a given QRS tone frequency $f_{\mathrm{ref}}$ (in Hz) as the reference and a pulse rate of 80 bpm, along with the following parameters. With respect to a tone’s fundamental frequency, we use (i) eight semitones below the reference for a strongly suppressed ST (below −0.2 mV); (ii) five semitones below for a moderately suppressed ST (i.e., in [−0.2 mV, −0.1 mV]); (iii) the reference pitch itself for the non-elevated case (i.e., in [−0.1 mV, 0.1 mV]); (iv) four semitones above for a moderately elevated ST segment (in [0.1 mV, 0.2 mV]); and (v) seven semitones above for a strongly elevated ST segment (above 0.2 mV). Thereby, a tone’s frequency for a given reference frequency $f_{\mathrm{ref}}$ and semitone offset $k$ is computed as $f = f_{\mathrm{ref}} \cdot 2^{k/12}$.
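The following minimal Python sketch makes this mapping concrete (the reference frequency is an illustrative assumption, and the handling of boundary values is a convention of the sketch):

```python
def semitone_offset(st_mV):
    """Map an ST value (mV) to the 5-state semitone offset described above.
    Boundary values are assigned to one class here as an arbitrary convention."""
    if st_mV < -0.2:
        return -8   # strongly suppressed ST
    if st_mV < -0.1:
        return -5   # moderately suppressed ST
    if st_mV <= 0.1:
        return 0    # isoelectric / non-elevated
    if st_mV <= 0.2:
        return 4    # moderately elevated ST
    return 7        # strongly elevated ST

def tone_frequency(st_mV, f_ref=440.0):
    """Tone frequency relative to the QRS reference tone; f_ref = 440 Hz is an
    illustrative placeholder, not the value used in the study."""
    return f_ref * 2 ** (semitone_offset(st_mV) / 12)

print(round(tone_frequency(0.25), 1))  # strongly elevated: 7 semitones above f_ref
```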
The time difference between consecutive ST-tone onsets is fixed (specified in ms). The set separation is 8, i.e., a new sonification starts after every eight heartbeats. The stride is 2, i.e., the limb lead set is anchored to the first beat, whereas the precordial lead set is anchored to the third (= 1 + stride)th heartbeat. The duration $d$ of notes is 50 ms (respectively, 80 ms and 100 ms) for absolute ST values below 0.1 mV (respectively, in [0.1 mV, 0.2 mV] and above 0.2 mV). The tone level $L$ is subject to individual adjustment; however, we apply a 5 dB boost for absolute values in [0.1 mV, 0.2 mV] (respectively, a 15 dB boost for absolute values above 0.2 mV). This offset itself is scaled by a parameter dBscale, so that a tone’s actual level is the global base level plus dBscale times the boost; we chose fixed values for dBscale and the base level (in dB). The tone’s sound signal itself is a sum of harmonics of the fundamental frequency, appropriately phase-shifted to align the peaks, with its exact composition depending on the absolute ST value. The amplitude envelope is a decaying function of the time $t$ over the tone’s duration $d$, with a curve parameter $c$ that controls how the tones fade out. We use a faster-decaying envelope for absolute values below 0.1 mV (and a slower one for absolute values of 0.1 mV and above) to make iso-electric ST tones appear more percussive. We provide an implementation of this sonification in Python as
Supplementary Materials. This contains a function that takes a 12-lead vector of ST elevations as input and generates both a wave file and a visual plot of the sonification for reproducing our stimuli.
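To complement the Supplementary Materials (the sketch below is not that implementation), the following self-contained Python example illustrates the timing and mapping logic of one grouped lead scan; the reference frequency, tone-onset interval, plain sine waveform (instead of phase-aligned harmonics), and decay constants are simplifying assumptions and do not reproduce the exact study stimuli:

```python
import numpy as np
from scipy.io import wavfile

FS = 44100        # sample rate (Hz)
F_REF = 440.0     # illustrative QRS-tone reference frequency (assumption)
BPM = 80          # pulse rate assumed in the study
ONSET_MS = 120    # illustrative ST-tone onset interval (assumption)

def semitone_offset(st):
    if st < -0.2: return -8
    if st < -0.1: return -5
    if st <= 0.1: return 0
    if st <= 0.2: return 4
    return 7

def tone(st):
    """One ST tone: duration and level boost follow the text; the waveform
    (plain sine) and decay constants are simplifications of the actual design."""
    a = abs(st)
    dur_ms = 50 if a < 0.1 else (80 if a <= 0.2 else 100)
    boost_db = 0 if a < 0.1 else (5 if a <= 0.2 else 15)
    t = np.arange(int(FS * dur_ms / 1000)) / FS
    f = F_REF * 2 ** (semitone_offset(st) / 12)
    curve = 8.0 if a < 0.1 else 3.0        # faster decay = more percussive (assumption)
    env = np.exp(-curve * t / (dur_ms / 1000))
    return 10 ** (boost_db / 20) * env * np.sin(2 * np.pi * f * t)

def render_scan(st_reordered, set_separation=8, stride=2):
    """One sonification cycle: limb group anchored to beat 1,
    precordial group anchored to beat 1 + stride."""
    beat = 60.0 / BPM
    out = np.zeros(int(FS * set_separation * beat))
    groups = [(0.0, st_reordered[:6]), (stride * beat, st_reordered[6:])]
    for group_start, group in groups:
        for i, st in enumerate(group):
            start = int(FS * (group_start + i * ONSET_MS / 1000))
            seg = tone(st)[: len(out) - start]
            out[start:start + len(seg)] += seg
    return out / max(1.0, np.max(np.abs(out)))   # simple peak normalization

# Hypothetical profile already in the reordered channel model (limb group, then V1..V6)
profile = np.array([-0.05, 0.05, 0.15, 0.25, 0.25, 0.30,
                    0.0, 0.0, 0.0, 0.05, 0.05, 0.05])
wavfile.write("st_scan_demo.wav", FS, (render_scan(profile) * 32767).astype(np.int16))
```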
For the classification study described in
Section 6, we used the ST elevation profiles depicted in
Figure 10 obtained from the Laerdal simulator, as detailed in
Section 3. The same data are shown as a parallel coordinate plot in
Figure 4. These plots helped subjects understand the data and sonification for the classification study.
6. Classification Study Design
6.1. Overview
The local ethics committee of the Ruhr University Bochum, situated in Bad Oeynhausen, Germany, gave approval (file no.: 2022-1017). The study followed the principles of the Declaration of Helsinki.
The entire study consisted of a perception and STEMI classification study using audibly presented (sonified) ECG sequences, as well as a scenario study in the form of a randomized controlled trial (RCT) comparing a sonification-assisted with a standard visual-based diagnostic process in an emergency situation. Here, we focus on the perception and classification part, whereas the results of the scenario study will be published elsewhere.
The long-term goal of the entire study is to enable and establish sonification in the medical emergency service. Therefore, the most important guiding principle here was to define a patient-relevant endpoint for an RCT. With regard to the emergency case, the correct classification of STEMI is merely an intermediate step, which presumably turns out differently in stressful situations versus a purely stress-free exercise situation. A patient-relevant endpoint is, therefore, the timely admission of a patient to the catheterization laboratory. Here, we refrain from elaborating on this endpoint and instead postpone the presentation of the corresponding RCT to a forthcoming article (for a conference abstract, see [
7]).
Of note, in order to ethically justify a study in an emergency setting, a preliminary study with a positive result and the design of a fallback strategy are absolutely necessary. Integrating this study concept into the emergency training of medical students therefore seemed predestined to create a scenario that is as realistic as possible while remaining outside real emergency operations. At the same time, a teaching concept is being developed. Part of this teaching concept is the introduction to sonification and the estimation of diagnostic accuracy in a first classification step, as presented here. The introduction to sonification and this first classification step thus represent the first module of a bi-modular approach. We refrained from a comparative study in the first module both because the pure classification quality without the temporal aspect is of limited clinical relevance in stressful situations and because the cohort size would not have permitted a further study arm in the first module.
Although no control group was planned for this first module, the study nevertheless serves to address the accuracy of the diagnosis as a function of various factors that are ultimately essential for the further development of the sonification design. The performed regressions therefore introduce a semi-quantitative aspect. As a final note, we would like to point out that we are aiming for sonification to optimize emergency operations; i.e., instead of addressing professional cardiologists, we are rather planning to train emergency paramedics, which should once again make the study setting in the context of emergency training plausible. Finally, cardiology experts also participated in the expert evaluation committee to select the best sonification method.
Further, basic epidemiological parameters were collected, and questionnaire-based surveys were carried out. A first survey regarding score-based self-assessments of previous experience with emergency situations and musical expertise, as well as questions concerning the general attitude toward sonification, took place before the study was carried out. Specifically, we asked for the subjects’ agreement with the following statements on a Likert scale from 1 (not at all) to 5 (completely):
- Q1
I had experience with pre-clinical emergencies.
- Q2
I have participated in more than 3 emergency trainings.
- Q3
I feel confident when handling emergency situations.
- Q4
For me, emergency situations cause negative stress.
After the perception test, we asked for the subjects’ agreement with the following three statements:
- SQ1
Sonification is pleasant to listen to
- SQ2
The sonification is informative, i.e., it enables one to identify ST elevation changes in the ECG
- SQ3
I can imagine listening to these sonifications for a longer period of time
6.2. Participants
After receiving the certificate of nonobjection from the local ethics committee on 20 January 2023, data acquisition began; it ended on 30 June 2023. The study cohort consisted of 44 students from the faculty of medicine of the Ruhr University Bochum (RUB) in Germany (5th academic year, corresponding to the 9th and 10th semesters), who were assigned to a three-day training program in emergency medicine. After being informed about the nature, significance, and scope of the study, as well as the requirements for participants, but prior to the study, participants provided written informed consent by signing a declaration of consent and also consented to the processing of study data. All students received a 10 min review of diagnosing and treating STEMI in prehospital care. Since the sonification of the ST segment was a new concept, they received an introduction to sonification, particularly ECG sonification. For this purpose, all participants watched a 6:15 min introduction video about sonification, its impact on ECG monitoring, and the motivation of our research. As part of the introduction, we presented audio examples of ECGs either in isoelectric mode or with significant elevation of the ST segment to the students.
Before starting the perception and classification study, we asked the participants to answer an initial questionnaire for the collection of basic demographic data (i.e., age, gender, medical education) and experience in medical emergencies. Since it could affect the comprehension of sonification, they were also asked about their pre-existing experience in music (see
Supplementary Materials for the complete questionnaire). Summary statistics depicting the composition of the cohort stratified by gender, including the self-assessment scores derived from the survey and the assignment to either the intervention or the control group of the RCT (to be published elsewhere), i.e., the parameter sonification, are shown in
Table 1.
6.3. Classification Task
The sonification used in this study was designed for the detection of an ST segment elevation in the ECG that fulfills the STEMI criteria according to ESC guidelines (see Section 3.1 for details). A total of three similar blocks of five distinct sonified ECG variants (IE, anterior–moderate, anterior–severe, inferior–moderate, inferior–severe) were presented to all of the participants (n = 44), leading to 3 × 5 × 44 = 660 classification instances. Participants were asked to classify the samples with regard to the localization (anterior or inferior) and severity (IE, moderate or severe) of the ST elevation using the questionnaire mentioned before.
9. Discussion
A first and primary focus of this article was the sonification concept, design, and implementation of the ST elevation sonification method. The step-by-step development process from ‘heartbeat-locked ST elevation arpeggio’ via ‘musical phrase melody over several heartbeats’ to ‘grouped lead scans’ illustrated how sonification design progresses toward a solution that addresses issues such as cognitive load, attention, annoyance, perceptibility and ecological acoustics, i.e., how sonifications integrate into the soundscape. We acknowledge that choices within the design progression are made on the basis of subjective decisions and not on objectively derived evidence. However, as in any other field of design, such as industrial or visual design, fully objective evidence is not feasible. Retrospectively, however, usability or, in the given context, diagnostic value is objectively testable, which is the approach we took here. Further results will be published soon.
The design of the sonification was initially based on the assumption that conveying as much information as possible in a timely manner was best. The ‘heartbeat-locked ST elevation arpeggio’ conveys all 12 lead elevations with each heartbeat. This could be useful if, e.g., medical interventions required precise, low-latency, eyes-free monitoring of these details, for instance, for balloon-catheter interventions or intermittent pre-clinical ST elevation representations in the ECG (e.g., occlusive myocardial infarction). It is noteworthy that the sound level could be considerably lowered in practical applications; i.e., we expect the sound to become part of the
available periphery, as in Mark Weiser’s vision of calm computing [
33], an information stream that is available yet only attracting attention when needed—and thereby reducing distraction.
We realized that for extended monitoring (e.g., over many minutes to hours), the initially chosen amount of detail is overwhelming. Additionally, ST patterns rarely change between beats. Therefore, we explored a complementary approach in design 2 (musical phrase melody over several heartbeats), which reduces information density. However, a specific melody repeated many times over hours could be perceived as intrusive and would also conflict with situations in which music is heard. Design 2 could be of interest for applications in long-term monitoring, particularly if it were modified by fading out when changes are absent and by boosting the sound level upon relevant ST changes over time.
Design 3 (the grouped lead scans) represents a compromise between precision and aesthetics, between low-latency and acceptable delay, in such a way that grouping is supported, thus revealing auditory gestalts (for typical motifs corresponding to clinically relevant parts).
All designs use pitched tones, as these are clearly recognized as artificial, to avoid the risk of confusing the auditory display with environmental sounds. Alternative ideas, such as water drop sounds and footstep sounds, have been dropped. However, it might be worth investigating other pitched real-world sounds as material, as these could lead to higher acceptance and reduced annoyance. We put this on our roadmap for future optimization.
We envision integrating sonification into commercial ECG devices. The computational requirements are rather low for the mapping and sonification rendering. For sound projection, due to the artificial nature of the sound, small loudspeakers would suffice, which could easily be integrated in devices.
It will be important that personnel have direct control to (de)activate sounds and control their sound levels. A more futuristic perspective could be the integration of monitoring sounds into the personal soundscape of medical staff, e.g., via transparent earplugs that do not block external sound but leave patients and others who do not need to be informed in a calm state.
In emergency situations, sonification may lead to cognitive overload, which needs evaluation. In case the two groups of six sound events prove to be too much, the following pathways could reduce the risk of cognitive overload by condensing the information: (i) merging two or three tones into one, which decreases the spatial resolution of the sonification while maintaining the logic of suppression/elevation being represented as musical intervals, or (ii) an automatic saliency control that couples the sound level of these additional elements to the amount of difference from the ST elevation profile of the initially recorded ECG.
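As a rough illustration of option (ii), a hypothetical saliency control could scale the level boost of the additional tones with the deviation from the initially recorded ST profile; the scale and cap values below are arbitrary assumptions:

```python
import numpy as np

def saliency_gain_db(st_now_mV, st_ref_mV, full_scale_mV=0.2, max_boost_db=15.0):
    """Hypothetical saliency control: the larger the deviation of the current
    ST profile from the initially recorded reference, the higher the level boost
    of the additional tones (capped at max_boost_db)."""
    deviation = np.max(np.abs(np.asarray(st_now_mV) - np.asarray(st_ref_mV)))
    return max_boost_db * min(1.0, deviation / full_scale_mV)

# No change -> 0 dB boost; a 0.1 mV change in one lead -> 7.5 dB boost.
ref = [0.0] * 12
print(saliency_gain_db(ref, ref), saliency_gain_db([0.1] + [0.0] * 11, ref))
```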
With the advent of clinical AI-based diagnosis and information systems, it might appear questionable whether the proposed, rather direct/low-level information approach would be superior to an AI method that condenses data and directly communicates the essential information to users. Recent approaches that replace diagnostics completely with AI and deep learning show high potential, as is the case, for instance, for cardiac arrhythmia diagnosis [
34] or abnormality classification of beat features [
35] and also STEMI classification [
36]. However, since we target scenarios where the aim is to remain sensitive to gradual changes in ST elevations, which may emerge, e.g., during an emergency transport, sonification has the benefit of being an analog representation, which is preferred in such situations. This is comparable to the pitch modulation used in a pulse oximeter to detect gradual changes in oxygen saturation. Furthermore, AI-based approaches bear the risk of false positives and go along with an interruption of other ongoing processes (in an emergency situation), while the low-level monitoring continuously supplies information and leaves the evaluation to the human. Human operators usually feel more in control if they perceive continuous changes or the actual state as compared to being informed only of critical changes.
We used surrogate ECG data obtained from the Laerdal Resusci Anne Advanced SkillTrainer®. We observed little variability in the measured data over time and obtained rather stable ST elevation patterns. The next step would be to apply the chosen design to real-world data from emergency procedures. These data will include various artifacts such as shivering, interference from other electronic devices, body movements, loose electrodes, or problems due to incorrect lead placement. Thus, the question becomes how increased variability in the ST elevation sonification would be interpreted and to what extent medical conclusions depend on these factors. For the present study, however, the limitation of using the Laerdal system is outweighed by the advantage that a clear and reproducible gold standard is defined. The classification carried out is therefore significantly less susceptible to bias than would be the case with a gold standard defined by experts for real data.
This study reveals that ST elevation, a pathology of the ECG, can reliably be detected by a defined, minimally trained cohort of medical students. The data were transferred to a novel, specially designed ST elevation sonification of the 12-lead ECG. ST elevation is the most important part of the ECG when assessing patients with chest pain and is therefore essential for diagnosing STEMI [
3]. After a short training period, subjects demonstrated good recognition of the different sounds, suggesting that more precise discrimination could be achieved with longer training. A 6.5-minute video tutorial was sufficient to teach the method to subjects who were naïve to sonification.
Training is known to improve pattern recognition in regular visual ECG learning [
37]. Therefore, the same is to be expected for ECG sonification of the ST segment.
We investigated a group of medically trained individuals. However, within this otherwise quite homogeneous group, we observed structural variance. Demographic covariates include the fact that some of the students had worked in EMS previously and some had musical training. However, the proposed method is designed to be generally suitable for EMS and even for hospital settings, so this pre-trained group of test subjects appears to be representative.
The use of sonification to improve the correct interpretation of abnormal ECG data and/or signals has already been tested in several preliminary studies. In 2017, Kather et al. [
20] showed that the auditory presentation of ECG data can enable the detection of clinically relevant cardiac pathologies (including bigeminy, atrial fibrillation, or STEMI) by non-medically trained volunteers after a short training session. Additionally, this study demonstrated that a cohort of 5th-year medical students was able to correctly identify the severity and type of STEMI with a high degree of sensitivity.
Sonification of the ST segment has been scientifically investigated in previous studies [
4,
5], and the potential for detecting ST elevations has already been demonstrated. ST elevations can be transient and therefore not detectable at all times. Since this ECG pattern in the context of a myocardial infarction results in a different treatment priority, it is crucial to detect it. Sonification is a helpful method for detecting these transient ST elevations in this setting. Additionally, different degrees of ST-segment elevation could be detected. It is to be expected that the just noticeable differences (JNDs) could become smaller with increased training.
Compared to realistic settings, the perception study has the limitation that the participants were not distracted by other impressions, particularly not by impressions of an auditory nature. This limits the generalization power. In the future, users can be taught this method using video tutorials. It is not practical to replace the current practice of printing out a 12-lead ECG or visualizing it in another way. However, ST elevation sonification can be a valuable add-on to current practice, as sonification can draw attention to transient ST elevations.