Automated ECG Interpretation—A Brief History from High Expectations to Deepest Networks

Abstract: This article traces the development of automated electrocardiography from its beginnings in Washington, DC around 1960 through to its current widespread application worldwide. Changes in the methodology of recording ECGs in analogue form using sizeable equipment through to digital recording, even in wearables, are included. Methods of analysis are considered from single lead to three leads to twelve leads. Some of the influential figures are mentioned while work undertaken locally is used to outline the progress of the technique mirrored in other centres. Applications of artificial intelligence are also considered so that the reader can find out how the field has been constantly evolving over the past 50 years.


Introduction
Augustus Waller (1856-1922) was the first person to record a single lead electrocardiogram (ECG), which is a recording of the electrical activity of the human heart, in St Mary's Hospital, London in May 1887 [1]. Ventricular depolarisation and repolarisation were demonstrated using the Lippmann capillary electrometer. Waller had been a medical student in Aberdeen and Edinburgh and was made professor in the University of Aberdeen, Scotland, in 1881. He made further observations on ECGs on his dog Jimmie, who was often used in his lectures.
Around the same time, in Aberdeen, Scotland, John A. MacWilliam, Professor of the Institute of Medicine, introduced the term atrial flutter and the concept of ventricular fibrillation [2]. James Mackenzie, a Scottish Physician, published his own work in 1902 on the study of the pulse [3]. He used a home-made polygraph to record the action of all four chambers of the heart. His contribution to ECG development was acknowledged by Karel Frederik Wenckebach, the much-honoured Dutch physician who researched irregularities of cardiac rhythm.
Shortly thereafter, Willem Einthoven, based in Leiden in the Netherlands, introduced the three standard bipolar limb leads with the use of his own galvanometer, as a result of which he was awarded the Nobel Prize for Medicine in 1924 [4]. The first commercial version of the Einthoven electrocardiograph was produced in 1908 by the Cambridge Instrument Company in England. It recorded Einthoven's three leads, I, II and III and used pails of conducting solution as electrodes.
Thomas Lewis, one of Mackenzie's former junior staff members, published 'The Mechanism and Graphic Registration of the Human Heart'. In its third edition, in 1925, it summarized early work on cardiac arrhythmias based on the use of Einthoven's three limb leads.
It is unthinkable that these pioneers could have projected forwards over 100 years to predict what electrocardiography would be like at the present time. Electronic computers had not been invented and electrical circuitry had certainly not been miniaturised to the extent that it is nowadays.

The Early Days
The use of computers for ECG interpretation was first evaluated using the orthogonal lead ECG and the 12 lead ECG. The first approach to automating analysis of ECGs commenced in 1957 in the laboratory of Dr. Hubert V. Pipberger (Figure 1) using three simultaneously recorded orthogonal leads [11]. The Veterans Administration (VA) Hospital in Washington DC established a special research programme for medical electronic data processing as medical electronics began a period of growth, and Pipberger was appointed director [12].
Pipberger was born in Hamburg in 1920 and studied at the Rheinische Friedrich Wilhelms University in Bonn, Germany. He was an army doctor during World War II and was captured and imprisoned in France. He saved himself by telling stories in French, entertaining his captors [13].
A pioneer in the field of electrocardiology, he trained as a cardiologist and recognised the effectiveness of collaboration with electrical engineers, physicists, mathematicians, statisticians and computer programmers in problem solving and interdisciplinary research [14]. Dr. Pipberger's lab based its early analysis system on the three orthogonal lead ECG [15]. Analogue ECG recordings had to be converted into digital data using rather large equipment [11], which has to be compared with the current possibility of converting an ECG signal into a digital form for analysis within a wearable such as a wristwatch. Each diagnostic output from Pipberger's program had a probability attached on a scale of 0-1 and the sum of all outputs had to total 1. This could be confusing if a new diagnostic output was added, in a later ECG, to an existing abnormality which then had a reduced probability of being present.
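The effect described above, where adding a diagnostic output lowers the probability reported for an existing abnormality, follows directly from forcing the outputs to sum to 1. A minimal sketch in Python illustrates this; the scores, category names and the function `normalise` are hypothetical and are not taken from Pipberger's program.

```python
def normalise(scores):
    """Scale non-negative scores so that the resulting probabilities sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    return {name: value / total for name, value in scores.items()}


# Hypothetical scores for an earlier ECG with two candidate outputs.
earlier = normalise({"normal": 1.0, "LVH": 3.0})

# A later ECG: the evidence for LVH is unchanged, but a new output is added.
later = normalise({"normal": 1.0, "LVH": 3.0, "anterior MI": 4.0})

print(earlier["LVH"])  # 0.75
print(later["LVH"])    # 0.375
```

The evidence for LVH is identical in both calls, yet its reported probability halves simply because a third output now shares the total of 1.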
In contrast, in 1959, Dr. Cesar Caceres (Figure 1) and his team in the National Institutes of Health's Medical Systems Development Laboratory, also in Washington DC, based their approach on the analysis of the 12 lead ECG using conventional clinical ECG criteria, but initially by processing one lead at a time [16]. Caceres coined the term 'clinical engineering', putting engineering into the clinical world of medicine in order that the various disciplines could work hand in hand to improve healthcare in practice. He graduated from Georgetown University and specialised in Internal Medicine at Tufts and Boston Universities in Boston, Massachusetts. He received Cardiology specialisation and research training from George Washington University [17].
This early work led to the expectation of the technique playing a significant role in ECG interpretation.

The Glasgow Contribution
One of the authors (PWM) began work in Glasgow on Computer Assisted Reporting of Electrocardiograms (C.A.R.E) as a student with Professor T.D. Veitch Lawrie in the Department of Medical Cardiology, University of Glasgow, who had anticipated an expansion of the use of computers in ECG analysis. Early work determined that the conventional 12 lead ECG and the three orthogonal lead ECG could be used for computer interpretation of ECGs and so diagnostic criteria were developed for both lead systems [18]. Additional vectorcardiographic measurements were made and incorporated into the diagnostic criteria. To record and analyse the ECGs, a standard 3 channel VCG system was combined with 3 single-channel electrocardiographic amplifiers and a multi-channel analogue tape recorder linked to a small PDP8 digital computer with an analogue-digital converter, allowing ECGs to be replayed from the tape recorder to the computer. An ECG database was accumulated in both analogue and digital form. In the early 1970s, portable ECG recording units were assembled which could be transported to wards and clinics easily on a trolley. The modified axial lead system [19] was used and the analysis time was of the order of one minute. What was thought to be the first hospital-based mini-computer system for routine ECG interpretation was developed and introduced (Figure 2) in Glasgow Royal Infirmary around 1971 [20]. In the mid-70s, a hybrid lead system [21] was designed that combined the 12 lead and the three orthogonal lead ECG with the use of two additional electrodes (V5R and the neck), but there was little clinical acceptance.
Hearts 2021, 2, FOR PEER REVIEW 4
Figure 2. The first automated ECG interpretation system in operation in Glasgow Royal Infirmary around 1971. One technician controlled the tape recorder and listened to the patient details which were also recorded. The second technician monitored the three orthogonal lead ECG on the oscilloscope and started the analogue to digital conversion. The software was stored on the small digital tapes (DECtapes) and retrieved as necessary.
A major technological advance was the advent of the microprocessor and the arrival of automated ECG analysis at the bedside. In Glasgow, analogue to digital conversion at 500 samples per second was undertaken within an electrocardiograph designed and built by Dr. M. P. Watts, who had also introduced techniques for transmitting ECGs between a local hospital and the ECG lab for automated interpretation [22]. The analysis program was rewritten in Fortran and moved to a PDP11 series computer. New diagnostic criteria evolved for the 12 lead ECG including rhythm analysis and serial comparison [23]. Many clinical studies, some of which are described below, led to an enhanced program for automated ECG analysis, which was commercialised in the early 1980s.

Neonatal and Paediatric ECG Analysis
The ECGs of the neonate, infant and child are completely different from the ECG of the adult. For this reason, special considerations apply to automated ECG interpretation of the neonatal and paediatric ECG.
The duration of the neonatal QRS complex is significantly shorter than that of the adult ECG, which implies a higher frequency content. This has often led to claims that the technology for recording the neonatal and paediatric ECG in general should be enhanced compared to that for recording the adult ECG. However, there has not been any study which has shown a clinically significant difference between a higher and a lower sampling rate when converting the ECG from analogue to digital form. Specifically, a major study by Rijnbeek [24] and colleagues from the Netherlands showed that reducing the sampling rate from 1000 samples per second to 500 samples per second had no impact on normal limits which they developed in infants and children.
The ECG of the newborn tends to have a QRS axis which is in the range of 90-180° which, for the adult, would be known as right axis deviation but for the baby is normal. This is simply a function of the path of circulation of blood through the foetus and the major role played by the right side of the heart at that stage in the development of the child. After the baby is born, the circulation changes and there is a gradual shift in emphasis of contraction, with the left ventricle becoming much more dominant than the right ventricle. It is probably not well appreciated that the ECG of the neonate therefore changes significantly, even over the first week of life. This was demonstrated by one of the authors (PWM) and colleagues [25]. Thus, interpretation of the neonatal ECG should ideally be based on a knowledge of the date of birth and date of recording so that changes from one day to another can be considered in an interpretation. Nowadays, new mothers are generally encouraged to leave hospital within 24 or 48 h, and so trying to obtain a database of ECGs of neonates from birth to one week of life is currently extremely difficult.
As the child grows, so does the heart, and hence QRS duration, for example, increases linearly from birth to adolescence. Allowance therefore has to be made for this in an interpretative program. Similarly, heart rate decreases shortly after birth, though not immediately, and again, simple equations can be used to set an upper limit of normal from the first week of life to adolescence [25].
It is suggested that with current advances, no matter how impressive, the use of machine learning will still prove challenging in the area of paediatric ECG interpretation.

Adult Age and Sex Differences in ECGs
There are many differences between adult male and female ECGs [26], and automated interpretation should be able to handle such variations with ease. In broad terms, QRS voltage is higher in younger compared to older persons, particularly in males, but this difference diminishes with increasing age. The same is true for ST amplitude, especially in the precordial leads, though it remains higher in males at all ages [27]. This latter publication eventually led to sex differences in ECG criteria being taken into account when reporting ST elevation myocardial infarction, as now acknowledged in the latest universal definition of myocardial infarction [28].
Mean QRS duration is higher in males than females though it is rare for this to be acknowledged in diagnostic criteria, with an exception being in 'true' left bundle branch block (LBBB) [29].

Racial Differences
It has been established that there is a clear ethnic variation in certain aspects of the ECG that should be acknowledged when making an interpretation. A number of studies have shown differences in normal limits of the ECG between Caucasians, Black people and Asian individuals [30,31] and diagnostic criteria should allow for this.
The availability of digital electrocardiographs and computers which can easily handle vast numbers of ECGs should allow for further work to be done on enhancing race-based diagnostic criteria. For example, it was noted that the mean ST segment amplitude is higher in Black people than in Caucasians and is higher in males than in females [30]. Rautaharju also showed that Black people had higher voltages than Caucasians in one of his studies [31]. From a historical perspective, Simonson pointed out racial differences in his 1961 treatise on the normal electrocardiogram [32]. His comparison was mainly between Caucasian and Japanese individuals, but differences were acknowledged at that time. One of the authors (PWM) also compared Caucasian with Indian, Nigerian and Chinese cohorts [30], showing a variety of differences, so it is important that race be acknowledged in ECG interpretation.

The European Contribution
In 1974, one of the authors (PWM) obtained a scholarship from the British Heart Foundation to spend one month in Europe visiting various centres which, by that time, had commenced work on some aspect of automated ECG interpretation. These included university departments in hospitals in Leuven in Belgium, Rotterdam in the Netherlands, Lyon in France, and Hannover in Germany. A report was compiled summarising the developments in progress and suggestions for collaboration were made. The net effect was that the European Economic Community, as it was known at the time, set up a project to further the technique of automated ECG analysis by supporting a joint project involving all those interested centres in various European countries. In due course, participation was opened to those from overseas, mainly the USA, who also had an interest in the topic. The North American delegates were mostly representatives from commercial companies developing products.
The project was entitled Common Standards for Quantitative Electrocardiography and it quickly became known as the CSE Project. A detailed summary of its main goals can be found elsewhere [33] and it arguably became the best-known project in automated electrocardiography. The project leader was Professor Jos Willems, who had spent time in Pipberger's laboratory in Washington DC in the early stages of the development of the technique. He led the project from 1976 through to the early 1990s, when unfortunately he was found to have a brain tumour and died shortly thereafter [34].
Early in the project, a steering committee was established, consisting of Professor Jos Willems (Chairman), Rosanna Degani (Padua, Italy), Christoph Zywietz (Hannover, Germany), Peter Macfarlane (Glasgow, UK), Jan van Bemmel (Rotterdam, The Netherlands) and Pierre Arnaud (Lyon, France). Later, Paul Rubel from Lyon replaced Pierre Arnaud (Figure 3). The steering committee met almost quarterly for over 10 years and there were biennial meetings of the full working group where multiple individuals from the same team could attend.
One of the biggest outcomes of the project was the establishment of databases, both of ECG waveforms for testing measurements and also of ECG interpretations from 1220 patients whose clinical condition was documented. These databases are still of importance to this day, over 30 years later.
The availability of the waveform database allowed the establishment of standards for ECG wave recognition which are still in use at the present time. The CSE diagnostic database resulted in a landmark publication [35] in 1991, where the accuracies of different diagnostic programs were assessed against both a clinical diagnosis and, separately, against the opinion of a group of eight cardiologists. Even today, companies wishing to submit data on performance of their software very often resort to using analysis based on the CSE diagnostic database. This requirement will continue in the new ISO/IEC standard for automated ECG interpretation which will be entitled '80601, Part 2-86: Particular requirements for the basic safety and essential performance of electrocardiographs, including diagnostic equipment, monitoring equipment, ambulatory equipment, electrodes, cables, and leadwires'. A summary of the expected contents based on a first draft already circulated can be found elsewhere in this issue [36].
The availability of a library of digitized ECGs made it possible for extensive recommendations on signal processing to be published as part of the CSE Study [37]. It should be noted that at that time (1985 and earlier), the majority of electrocardiographs produced recorded three leads simultaneously. Many of the recommendations therefore related to dealing with groups of three leads. Nevertheless, interval measures such as QT were recommended to be based on the longest QT interval measured in any single lead, though V1-V3 were suggested as giving the most accurate result. Other definitions, for example, included the exclusion of an isoelectric segment at the onset or termination of a QRS complex from the QRS duration in the lead in which it occurred. The overall QRS duration, nevertheless, was defined as the time from the earliest onset to the latest offset in the group of leads under consideration.
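The two measurement conventions described above (overall QRS duration from the earliest onset to the latest offset in a lead group, and QT taken as the longest single-lead value) can be sketched as follows. This is an illustrative Python example only: the fiducial times, the dictionary layout and the function names are assumptions and do not come from the CSE recommendations themselves.

```python
def global_qrs_duration(lead_fiducials):
    """Overall QRS duration (ms): earliest onset to latest offset across leads."""
    onsets = [f["qrs_onset"] for f in lead_fiducials.values()]
    offsets = [f["qrs_offset"] for f in lead_fiducials.values()]
    return max(offsets) - min(onsets)


def longest_qt(lead_fiducials):
    """Return (lead, QT in ms) for the lead with the longest QT interval."""
    qt = {lead: f["t_end"] - f["qrs_onset"] for lead, f in lead_fiducials.items()}
    lead = max(qt, key=qt.get)
    return lead, qt[lead]


# Hypothetical fiducial points (ms from start of record) for one group of
# three simultaneously recorded leads. Per-lead onsets and offsets are assumed
# to exclude any isoelectric segment in that lead already.
group = {
    "V1": {"qrs_onset": 200, "qrs_offset": 292, "t_end": 590},
    "V2": {"qrs_onset": 196, "qrs_offset": 288, "t_end": 600},
    "V3": {"qrs_onset": 204, "qrs_offset": 300, "t_end": 596},
}

print(global_qrs_duration(group))  # 104
print(longest_qt(group))           # ('V2', 404)
```

In this made-up example the single-lead QRS durations are 92, 92 and 96 ms, so the overall duration of 104 ms is longer than that seen in any individual lead.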
One of the interesting points to emerge from analysis of the CSE diagnostic database was that there were significant differences in sensitivity and specificity of diagnostic programs when the gold standard was, on the one hand, based on clinical data and, on the other, when it was based on the consensus interpretation of eight cardiologists. Some programs were developed on the basis of using clinical data while others were developed using cardiologist views as the gold standard, and that was reflected in the results of the study [35].
It is worthy of note that several programs developed in academic institutions in Europe were commercialised. These included the Glasgow program developed in the University of Glasgow [38], the HES program developed in the University of Hannover [39], and the MEANS program developed in the University of Rotterdam [40]. Software developed in the USA for commercial use was developed within industry, e.g., the Marquette-General Electric (GE) program, Hewlett Packard-Philips program, and the Mortara program.
It should also be noted that some programs used classical deterministic criteria, e.g., R amp in aVL > 1.5 mV, where others used a more statistical approach, involving probabilities. It was found that those programs which used classical criteria were more closely aligned with the gold standard based on cardiologist interpretations and conversely, those developed using probability theory were more closely aligned with the clinical data.
A typical example of this conundrum would be when the clinical diagnosis was left ventricular hypertrophy (LVH), which was based perhaps on a history of hypertension and an increased cardiothoracic ratio (1980s type of CSE criteria), but where the ECG was essentially within normal limits. The software developed according to a clinician's view would report a normal ECG and that would be in line with the cardiologist interpretation. There is therefore agreement in that case between the computer and the cardiologist. On the other hand, both the computer program and the cardiologist are wrong with respect to the clinical diagnosis of LVH. Conversely, the statistical program would be more likely to report LVH correctly in line with the clinical diagnosis but would be wrong with respect to the cardiologist interpretation. This example explains why different results are obtained with different software and different gold standards.
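The dependence of reported accuracy on the choice of gold standard can be made concrete with a small calculation. The following Python sketch uses entirely hypothetical data for eight patients and two imagined programs, one mimicking a criteria-based approach and one a statistical approach; none of the numbers are taken from the CSE study.

```python
def sensitivity_specificity(program, reference):
    """Return (sensitivity, specificity) of boolean calls against a reference."""
    pairs = list(zip(program, reference))
    tp = sum(p and r for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    return tp / (tp + fn), tn / (tn + fp)


T, F = True, False
# Hypothetical LVH labels: patients 3 and 4 have clinical LVH but an ECG
# that the cardiologists regard as within normal limits.
clinical     = [T, T, T, T, F, F, F, F]
cardiologist = [T, T, F, F, F, F, F, F]

criteria_based = [T, T, F, F, F, F, F, F]  # follows conventional ECG criteria
statistical    = [T, T, T, F, F, F, F, F]  # tuned on clinical data

print(sensitivity_specificity(criteria_based, clinical))      # (0.5, 1.0)
print(sensitivity_specificity(criteria_based, cardiologist))  # (1.0, 1.0)
print(sensitivity_specificity(statistical, clinical))         # (0.75, 1.0)
```

Against the cardiologist reference the same statistical program has a sensitivity of 1.0 but a specificity of 5/6, because its one additional correct clinical call is counted as a false positive.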
This remains a problem to some extent even with the newer techniques of machine learning because, when a large dataset is used for training, the cardiologist over-read of an ECG is often, although not always, used as the gold standard. This point will be considered later.
A by-product of the project was the establishment of a Standard Communications Protocol (SCP) for electrocardiography [41]. This was designed with the aim of providing data from different manufacturers' ECG machines in a similar format which might for example contribute to a database or allow one vendor's system to analyse ECGs recorded on another's equipment. The SCP was strongly supported by Rubel who continued to regard it as of significant value, so much so that a new version has been released within the last year. Details can be found elsewhere in this issue [42].
In summary, the CSE Project has had a very significant influence over the field of automated ECG interpretation and still remains of great value at the present time.

The North American Contribution
In terms of software development for automated ECG analysis, the USA undoubtedly led the way with Caceres and Pipberger taking the lead as previously described. However, while the CSE project had a huge impact on standards for ECG wave recognition, other recommendations had previously been initiated by Pipberger [43]. These concentrated mainly on equipment for ECG and VCG recording, but a recommendation was also made to sample analogue data at 500 samples per second for conversion to digital form. That recommendation is still followed by many systems. After the Caceres program fell out of favour through sampling one lead at a time [16], IBM, with Ray Bonner at the helm, led the way commercially in developing a 12 lead ECG program [44] with three leads recorded simultaneously. Other prominent early players in the field included Telemed and Marquette, with interpretative software predominantly developed by Dr. David Mortara. He later launched his own company, which was purchased many years later by Hill-Rom. Marquette was subsequently taken over by GE. Hewlett-Packard also had a 12 lead ECG analysis program and this arm of the company was taken over by Philips.
From an academic standpoint, there was not the same proliferation of software produced by Universities in North America as had emerged from Universities in Europe. The principal exception was software [45] that was developed in Dalhousie University, Halifax, Nova Scotia, where Dr. Pentti Rautaharju had established the Epicare lab investigating various aspects of electrocardiography, including mathematical modelling, the local development of which he stimulated. He was very influential in the field which he had followed from 1960, when he was one of the authors of the Minnesota Code [46]. He moved from Halifax to Edmonton, Alberta and continued with a variety of studies and, together with his wife, published a book on the ECG in epidemiological studies and clinical trials [47]. His final academic move was to Winston-Salem. He died in 2018 [48].
Other North American contributions should not be overlooked. Early work in automated ECG analysis was undertaken by Dr. Ralph Smith and colleagues at the Mayo Clinic [49]. This was facilitated by an enormous number of ECGs being recorded annually in that establishment in the early 1970s, reportedly over 100,000 [50]. Smith introduced the concept of the ECG Interpretation Technician (EIT). In short, experienced ECG technicians received instruction in reviewing ECGs interpreted by an early IBM program and ultimately worked independently in over-reading the initial computer-based report [50].
Dr. Jim Bailey and colleagues at NIH produced a number of papers in the early 1970s relating to the assessment of automated ECG analysis programs. One very interesting and simple technique which he described was to take odd and even samples separately from data sampled at 1000 samples/s in order to create two ECGs sampled at 500 samples/s. The paired ECGs were then interpreted using two different programs and similar interpretations were found in only 49.8% and 79.7% after initial analogue filtering of the signal [51].
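The odd and even sample technique is simple enough to show directly. The sketch below is a minimal Python illustration with a made-up signal; the function name `split_odd_even` is hypothetical, and no filtering step is included.

```python
def split_odd_even(samples):
    """Split one record into its even-indexed and odd-indexed samples."""
    return samples[0::2], samples[1::2]


# Stand-in for 10 ms of a signal sampled at 1000 samples/s.
signal = list(range(10))

even, odd = split_odd_even(signal)
print(even)  # [0, 2, 4, 6, 8]
print(odd)   # [1, 3, 5, 7, 9]
```

Each half is a record of the same heartbeats at 500 samples/s, offset from the other by 1 ms, so any difference between the two interpretations reflects the sensitivity of the program to small changes in its input rather than any change in the patient.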
Bailey was also involved in preparing standards for ECG signal processing in 1990 [52]. This was followed in 2007 by an expanded set of recommendations by Dr. Paul Kligfield et al. [53] for the standardization and interpretation of the electrocardiogram. This was the first of six such scientific statements which appeared between 2007 and 2009 with the support of the American Heart Association, American College of Cardiology, Heart Rhythm Society and the endorsement of the International Society of Computerized Electrocardiology.
It should also be noted that the International Electrotechnical Commission became involved in creating standards and in 2003 produced guidelines such as IEC 60601-2-51 (subsequently replaced by IEC 60601-2-25) for ECG processing, including specifications on maximum tolerances on measuring ECG wave amplitudes and durations. These guidelines are discussed elsewhere in this issue [36].

The Technology
In parallel with developments in software for automated ECG analysis has been the miniaturisation of equipment. Initially, the early electrocardiographs which offered ECG interpretation on the spot were akin to the size of a washing machine on wheels (Figure 4). One advert proudly proclaimed that ECG interpretation was available within one minute. With hindsight, this was not necessarily the best way to advertise the product because no cardiologist would stand around at the bedside waiting one minute for a second opinion on an ECG interpretation. This has to be compared with the performance of current PCs, where 50 ECG analyses per second are commonplace.
The reduction in size of equipment has continued through the possibility of having ECG acquisition and interpretation on a mobile phone with the actual ECG amplifiers being external to the mobile unit. On the other hand, wrist watches are now widely available which allow a single lead of the ECG to be recorded and displayed, and a limited interpretation of rhythm offered. There are variations on the theme whereby a three electrode device can be used to record six limb leads via a mobile phone with transmission of the signals to a central facility for review if required. A review of some of these devices and techniques can be found elsewhere in this issue [54].
Although clinicians have always favoured the 12 lead ECG, current technology allows for the recording of many more leads, such as in body surface mapping, which is discussed elsewhere in this issue [55]. At the other extreme is a single channel recording for ambulatory monitoring, also discussed elsewhere in this issue [54]. Because of the lack of redundancy, it is extremely important that the single channel ECG is of high quality when analysed. Most smart watches will use strong filtering in order to try to remove unwanted noise on the recording and provide a very stable baseline. This may be at the expense of a small loss of signal amplitude, but ultimately this is not of relevance in the interpretation of cardiac arrhythmias.
The initial internal report using the Apple Watch (versions 1-3), which used photoplethysmography (optical sensor)-based detection of the pulse to determine heart rhythm irregularity, showed extremely good sensitivity and specificity ≥ 98% for detecting atrial fibrillation (AF) [56]. The Apple Heart Study, using the same type of watch, recruited 419,297 volunteers, and reported [57] that 0.52% received a notification of an irregular pulse. A total of 450 patients completed a follow up of long-term monitoring and only 34% were found to have AF. A more recent small study of 50 patients [58] using the Apple Watch 4, where the ECG is recorded using lead I (potential difference between left wrist and a finger on the right hand touching the crown of the watch), found 41% sensitivity and 100% specificity for AF determined from the watch-based interpretation, leading the authors from the Cleveland Clinic to conclude that 'physicians should exercise caution before undertaking action based on electrocardiographic diagnoses generated by this wrist-worn monitor'.
These results reflect the difficulty of providing high quality data to an algorithm for ECG signal processing and interpretation. A discussion on the use of 1 to 12 leads recorded from 10 s to 30 days, particularly for ambulatory monitoring with wearables, can be found elsewhere in this issue [54]. Various aspects of filtering/denoising the ECG in an attempt to provide a clean ECG are also presented in that article.

Machine Learning
The most recent development in the field of automated ECG analysis has been the use of artificial intelligence (AI), including a variety of machine learning techniques to aid interpretation. One of the authors (PWM) was involved in the use of neural networks in the early 1990s [59] but at that time, use of a simple neural network did not prove to be of any great advantage in ECG interpretation compared to the use of more basic, straightforward diagnostic criteria.
More recently, with advances in miniaturization but also very significant developments in software, the use of more advanced neural networks such as 'deep' convolutional neural networks has added greatly to the ability of software to undertake ECG interpretation without the need to develop diagnostic criteria. In addition, the easy availability of machine learning software has led many research groups to investigate the use of AI in this field.
There are essentially two approaches that can be used. In one case, the raw ECG data can be input to the software which 'detects' features, sometimes in an unknown manner, leading to a separation into different diagnostic groups. The other approach is to use actual ECG measurements such as wave amplitudes and durations and allow the software to pick out the features which will also lead to a separation of data. In some approaches, the ECG classification may be input to the training set and this is called supervised learning, whereas in other cases, there is no classification provided during training, i.e., this is unsupervised learning, and the system itself sorts out the different ECGs into a variety of classes.
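The distinction between supervised and unsupervised learning can be illustrated with a minimal sketch based on the second approach, i.e., using ECG measurements rather than the raw signal. Everything in it is an assumption made for illustration: the two features (QRS duration and QT interval in ms), the invented values, the labels 'narrow' and 'wide', and the choice of a nearest-centroid classifier and plain k-means, neither of which is taken from the studies cited here.

```python
from math import dist

# Invented measurements: (QRS duration in ms, QT interval in ms). Not real patient data.
features = [(88, 380), (140, 440), (92, 400), (150, 460),
            (96, 390), (145, 430), (90, 410), (155, 450)]
# Labels are only used by the supervised variant.
labels = ["narrow", "wide", "narrow", "wide", "narrow", "wide", "narrow", "wide"]


def centroid(points):
    """Mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit_supervised(features, labels):
    """Supervised: the classification is part of the training set."""
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(y, []).append(x)
    return {label: centroid(pts) for label, pts in groups.items()}


def predict(centroids, x):
    """Assign a new ECG to the class with the nearest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


def fit_unsupervised(features, k=2, iterations=10):
    """Unsupervised: plain k-means, no labels; starts from the first k records."""
    centres = list(features[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in features:
            nearest = min(range(k), key=lambda j: dist(centres[j], x))
            clusters[nearest].append(x)
        # Keep the old centre if a cluster happens to be empty.
        centres = [centroid(c) if c else centres[j] for j, c in enumerate(clusters)]
    return [min(range(k), key=lambda j: dist(centres[j], x)) for x in features]


model = fit_supervised(features, labels)
print(predict(model, (100, 405)))   # class name learned from the labels
clusters = fit_unsupervised(features)
print(clusters)                     # anonymous cluster numbers only
```

The supervised model returns a named class because names were supplied during training, whereas the unsupervised run only returns cluster numbers, which someone still has to interpret.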
Publications on the use of AI for 12 lead ECG interpretation are already appearing [60,61]. A recent study by Kashou et al. states that their AI-based approach 'outperforms an existing standard automated computer program' and also 'better approximates expert over-read for comprehensive 12 lead ECG interpretation' [62]. Rhythm analysis was included. Other studies dealing only with analysis of cardiac rhythm have been published [63,64].
One of the more interesting applications of machine learning is the use of the ECG for the detection of abnormalities which are not in the ECG itself, e.g., in the contraction of the heart. Several papers have now been published on the detection of left ventricular diastolic dysfunction and reduced ejection fraction [65,66]. A recent meta-analysis of five such studies confirmed the ability of AI to identify heart failure from the 12 lead ECG [67]. This leads to the concept of certain packages based on AI being used in a very specific group of patients. A good example of this is the ability of an AI-based system to detect concealed long QT syndrome, where conventionally this is diagnosed when the QT interval exceeds a fixed threshold such as 450 ms. Now, an AI-based system can report long QT syndrome when the QT interval is less than 450 ms [68], with confirmation being achieved via appropriate genetic testing. However, this approach would appear to be suited to use in a clinic where individuals suspected of having long QT or being screened for familial long QT are involved, but if applied to the general population, might result in a very high percentage of false positive reports of concealed long QT.
AI-based approaches have also been used for prediction, albeit retrospectively, based on large data bases. For example, mortality has been predicted from the 12 lead ECG in a large cohort, even in those with a normal 12 lead ECG [69]. Atrial fibrillation was predicted in patients in sinus rhythm by training a network with ECGs in sinus rhythm from patients who were known to have subsequently had an episode of atrial fibrillation [70]. The sensitivity was 79% and specificity was 79.5%.
A recent intriguing paper reported on the use of the 12 lead ECG for screening for SARS-CoV-2 [71]. The model could be adjusted to give varying sensitivity and specificity depending also on the prevalence of the virus. The sensitivity could be exceptionally high but with such a setting, the corresponding specificity was extremely low.
Critics of artificial intelligence techniques for ECG analysis will point to the fact that they do not give any indication of why a specific diagnosis has been made. However, there is an approach, termed saliency mapping, which purports to give an indication of the parts of the ECG waveform which contribute to reaching a specific interpretation [72]. The authors of this paper also point out that the technique allows users to find problems in their model which could be affecting performance and generalisability.
One of the advantages, or is it disadvantages, of some forms of machine learning is that thresholds can be chosen for deciding on the presence or absence of an abnormality. As yet, as far as is known, there is no commercial system available which allows the user to set such thresholds and hence 'control' the output. Furthermore, there can be a significant imbalance in groups used to test these newer techniques such that while sensitivity and specificity may seem reasonable, positive predictive value is simply not acceptable. For example, if there are 500 patients with an abnormality in a test population of 10,000 and the sensitivity and specificity of the algorithm for detecting the abnormality are each 80%, then the positive predictive value of the test is 17.4%.
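The arithmetic behind this example can be checked with a short sketch. The function name and its parameters are chosen here for illustration only; the figures are those quoted above (500 abnormal cases in 10,000, with sensitivity and specificity each 80%).

```python
def positive_predictive_value(n_total, n_abnormal, sensitivity, specificity):
    """PPV = true positives / (true positives + false positives)."""
    n_normal = n_total - n_abnormal
    true_pos = sensitivity * n_abnormal          # abnormal cases correctly reported
    false_pos = (1.0 - specificity) * n_normal   # normal cases wrongly reported
    return true_pos / (true_pos + false_pos)


# Figures from the text: 400 true positives against 1900 false positives.
ppv = positive_predictive_value(10_000, 500, 0.80, 0.80)
print(f"PPV at 5% prevalence:  {ppv:.1%}")

# Same algorithm in a balanced test set (hypothetical), to show the effect of prevalence.
ppv_balanced = positive_predictive_value(10_000, 5_000, 0.80, 0.80)
print(f"PPV at 50% prevalence: {ppv_balanced:.1%}")
```

With the quoted figures, 400 of the 2300 positive reports are correct, giving 17.4%. The same sensitivity and specificity in a balanced test set would give 80%, which is why group imbalance matters when results are reported.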
It is generally recognised that the larger a training set can be for the development of AI-based techniques for ECG interpretation, the better will be the algorithm developed. This means that ECGs from a hospital database where cardiologists have verified (over-read) every ECG on the system can be used in a model based on supervised learning. Thus, in this situation, the AI-based approach is aimed at trying to perform as well as cardiologists would have done in interpreting an ECG. This comment relates only to those situations where of the order of one million ECGs, for example, may be available for development of the newer techniques and clinical information may often be lacking, e.g., to substantiate an interpretation of left ventricular hypertrophy.
The foregoing suggests that an AI-based system for a complete interpretation of a 12 lead ECG cannot inherently improve on a 12 lead interpretation by cardiologists. Others might disagree with this view based, for example, on one recent report [62]. The same criticism applies to more conventional approaches to automated ECG analysis as shown in the CSE study [35]. Any form of automated ECG interpretation has the advantage of being able to apply the same thorough approach to interpretation 24/7, whereas a cardiologist will undoubtedly give different interpretations of some ECGs when seen several days, if not weeks, apart. The same could not be said of an AI-based system with identical data input to the logic to check on 'repeatability'. This may explain why cardiologists may not appear to give a performance which is equal to that of the AI-based approach [62].
It is becoming very fashionable to report performance in terms of the area under the curve (AUC) in the conventional format of plotting sensitivity versus (1-specificity). In addition, a different curve obtained by plotting precision (positive predictive value) versus recall (sensitivity) can also be obtained and the AUC similarly obtained. While the AUC may give an overall indication of the performance of the model, it still leaves the user with the problem of deciding at which point on the curve a threshold has to be chosen in order to set a desired sensitivity and specificity. Clearly, essentially by definition, the higher the sensitivity, the lower will be the specificity for the conventional AUC curve. Thus, it may be extremely difficult to gauge the potential day to day performance of a methodology from a knowledge of AUC alone, particularly in a site different from where it was developed.
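The point that a single AUC value hides the choice of operating point can be made concrete with a small sketch. The scores and labels below are invented, and the helper names (roc_points, area_under_curve, pick_threshold) are hypothetical. The AUC is computed with the trapezoidal rule on sensitivity versus (1-specificity).

```python
# Invented model outputs (higher = more likely abnormal) and true status (1 = abnormal).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]


def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for each distinct score used as a cut-off.

    An ECG is reported as abnormal when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((thr, tp / n_pos, 1 - fp / n_neg))
    return points


def area_under_curve(points):
    """Trapezoidal area under sensitivity versus (1 - specificity), starting at (0, 0)."""
    xs = [0.0] + [1 - spec for _, _, spec in points]
    ys = [0.0] + [sens for _, sens, _ in points]
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2 for i in range(1, len(xs)))


def pick_threshold(points, min_sensitivity):
    """Highest cut-off whose sensitivity reaches the requested minimum."""
    for thr, sens, spec in points:  # ordered from high to low threshold
        if sens >= min_sensitivity:
            return thr, sens, spec
    raise ValueError("no threshold reaches the requested sensitivity")


points = roc_points(scores, labels)
auc = area_under_curve(points)
print(f"AUC = {auc:.2f}")
for target in (0.8, 1.0):
    thr, sens, spec = pick_threshold(points, target)
    print(f"threshold {thr:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy example, one AUC corresponds to several quite different operating points: asking for complete sensitivity lowers the threshold and costs specificity, which is the decision the AUC alone does not make for the user.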
Electrocardiography has seen many new ideas introduced through the years and some have not stood the test of time, e.g., vectorcardiography, QT dispersion, T wave alternans, late potentials, etc. AI is different in that it aims to improve diagnosis by finding hidden features in the ECG. Care will have to be taken that initial developments in one centre can be translated satisfactorily into routine practice in another. Much remains to be done for this to happen, including the ability of an AI-based approach to innately make use of age, sex, race, etc., as referenced in this review.

Supportive Organisations
There are two organisations which have played a major supporting role in the development of methods for computer analysis of ECGs, namely Computing in Cardiology (CinC) and the International Society for Computerized Electrocardiology (ISCE).
CinC was established in 1974 and has met annually ever since, either in the conventional manner or more recently as a hybrid in person and virtual conference. It is well attended and has a predominance of younger non-clinical researchers who participate in multiple parallel sessions and large poster sessions with the occasional plenary session. There are several competitions organised and, nowadays, conference proceedings are published online.
Of particular relevance is an annual Physionet/CinC Challenge where a problem relating to analysis of physiological signals is set and participants submit their own open source solutions. The challenge was initially organised by the Massachusetts Institute of Technology in Cambridge, MA. Dr Roger Mark and the late George Moody were principally involved in establishing this competition. More recently, Dr Gari Clifford, now at Emory University, Atlanta, has taken over the responsibility for the challenge. Further details can be found elsewhere [73].
A recent example of a challenge was the provision of several thousand single channel ECG recordings of at least 30 s duration obtained using the AliveCor KardiaMobile device. Competitors had to produce software that would determine whether or not atrial fibrillation was present. The interpretation of a large percentage of ECGs was provided for training purposes and the remainder was used for testing. This particular challenge attracted a very large number of participants.
This type of challenge is of current significance given the proliferation of wearables such as smart watches where only a single lead of ECG signal can be recorded. Thus, vendors can benefit perhaps by adopting some of the techniques used by competitors and even enter the competition themselves. Further discussion on analysis of ECGs in wearables can be found elsewhere in this issue [54].
As an aside, one of the most popular features of CinC is the social event where participants are divided into activists and passivists, with the choice of being fully active, as in cycling, or somewhat more sedate, perhaps undertaking a walking tour of the host city.
ISCE also started in the mid-70s, namely in 1975. It was initially organised by the Engineering Foundation based in New York. ISCE adopts a different style of a single session meeting and, traditionally, all delegates aim to attend all sessions. Each 15 min presentation is accompanied by a 15 min discussion period. ISCE advertises itself in the shape of the Einthoven Triangle with the vertices having Academia, Industry and the User as the three interlinked groups of participants. On average, around 15% of delegates are medically qualified and a much higher percentage is from industry. ISCE also organises an annual Young Investigator Award, but is much more of a traditional conference with the major difference being afternoons free for socialising or engaging in business-like discussions or in potential scientific collaborations. Scientific presentations continue in the evening. Further details of the society can be found elsewhere [74].
An example of the value of ISCE can be given from an important study involving the developers of automated ECG analysis programs. Around 2012, it was suggested that there might be a comparative study of different commercially available software in respect of measuring common ECG intervals such as PR, QT and QRS duration. This was agreed and a database of 600 ECGs was assembled by the Cardiac Safety Research Consortium based at Duke University, North Carolina. Dr. Paul Kligfield from New York coordinated the exercise. Four groups, namely GE, Glasgow, Mortara, and Philips, participated.
Representatives from all groups met on the occasion of an ISCE conference in Alabama and the data were provided simultaneously to all groups, who were gathered together in the same room (Figure 5). Each had their software running on a laptop and measurement data were provided, immediately after analyses were completed, to the study statistician, who was in the room. Results were later published [75] showing differences in measurements between all four programs. As there was no gold standard, there were no right or wrong answers. The study suggested that measurement differences between programs could lead to different interpretations of the same ECG, while normal limits developed by one program would differ from those of another. The study was later expanded to include seven programs [76] with the three new participants being AMPS-LLC, University of Rotterdam (MEANS program) and Schiller. A different database of ECGs was used.

Conclusions
Automated electrocardiography has seen phenomenal advances from the 1960s to the 2020s. It is hard to predict where the technique will be even 10 years from now, but it would appear that the future of the 12 lead ECG remains secure, despite the fact that it has been under threat for many years. However, no matter how promising a new approach appears to be, individual cardiologists will still wish to make their own ECG interpretations and frequently claim that a particular automated report is incorrect. This has been true from the start of automated electrocardiography and will always be the case!
Author Contributions: Conceptualisation, P.W.M.; Investigation, J.K. and P.W.M.; Writing-original draft, P.W.M.; Writing-reviewing and editing, P.W.M. and J.K. All authors have read and agreed to the published version of the manuscript.
Funding: There was no financial support for the preparation of this review article.