Machine Learning Techniques with ECG and EEG Data: An Exploratory Study

Abstract: Electrocardiography (ECG) and electroencephalography (EEG) are powerful tools in medicine for the analysis of various diseases. The emergence of affordable ECG and EEG sensors and ubiquitous mobile devices provides an opportunity to make such analysis accessible to everyone. In this paper, we propose a neural network-based method for automatically identifying the relationship between the previously known conditions of older adults and features calculated from the acquired signals. The data were collected using a smartphone and low-cost ECG and EEG sensors during the timed-up and go test. We identified patterns related to the extracted features: heart rate, heart rate variability, average QRS amplitude, average R-R interval, and average R-S interval from the ECG data, and frequency and variability from the EEG data. A combination of these parameters allowed us to identify the presence of certain diseases with some accuracy. The analysis revealed that institutions and ages were identified in most cases, whereas individual diseases and groups of diseases were difficult to recognize because each disease was rare in the considered population. Therefore, the test should be repeated with more participants to achieve better results.


Introduction
The emergence of non-invasive methods for analyzing and detecting diseases is one of the most significant prospects in medicine. At the same time, it poses challenges related to the correct use of technology, the positioning of the sensors, and the constant evolution of the equipment [1,2]. Technological advancements allow for a preliminary diagnosis by machines without intervention from healthcare professionals. This research is part of the development of systems to support ambient assisted living technologies [3][4][5][6]. Cutting-edge approaches in healthcare have helped solve various computer vision-based tasks by analyzing features from different biosignals, including facial features [7,8].
Mobile devices can be connected to other devices to create sophisticated handheld systems for monitoring health states [9][10][11]. They are convenient because they are small and portable, which allows them to be positioned correctly for different measurements [9][10][11]. These devices are equipped with various sensors, and additional sensors can be connected wirelessly [12][13][14][15][16][17][18]. The increasing number of functionalities and available sensors expands the options for creating systems that could assist older adults [15][19][20][21][22]. The potential of these projects lies in the data captured from each individual and its subsequent processing.
This research is part of a project related to the timed-up and go test, in which the individuals were provided with a smartphone equipped with accelerometer and magnetometer sensors. To perform the experiments, we used two BITalino devices (https://bitalino.com/en/), which are affordable do-it-yourself boards that facilitate the collection and analysis of a variety of biomedical signals with inexpensive sensors. First, a BITalino device with a force sensor was positioned on the chair to detect the duration for which the individuals stood. Then, another BITalino device, with ECG and EEG sensors, was placed on the individual, and the sensors were prepared for data acquisition during the test. The mobile device acquires one sample approximately every 10 ms, i.e., at around 100 Hz, and the BITalino device acquires data at 100 Hz. The similar frequencies enable a more accurate comparison of the data.
The main purpose of this study was to implement neural networks that identify the different diseases present in the population considered in the study reported in [23,24]. That study concerned the execution of the timed-up and go test with institutionalized people from the Covilhã and Fundão municipalities. The implementation of the methods started with the identification of persons by institution, age, diseases, and groups of diseases.
The neural networks implemented with the WEKA software [25] showed that individuals can be recognized by institution; only the individuals from Centro Comunitário das Lameiras were not correctly identified. Similar results were obtained by age, where only persons who were 74, 85, and 86 years old were not correctly recognized. The individual diseases were not correctly identified because the sample consisted of a small number of individuals. However, after the illnesses were categorized, cardiac diseases started to be recognized as a group.
Other studies related to the processing of ECG and EEG data are reviewed and summarized in [26,27]. There are two main tracks regarding feature extraction: one based on statistical features from the time and frequency domains, and one based on deep learning. Our approach uses classical feature extraction because of the limited size of our dataset. The results are on par with other approaches with a similar number of participants. What is novel in our approach is the combination of both sensors on an inexpensive board, which indicates that even such affordable devices can provide satisfactory results and serve as an indication of emerging diseases.
The remainder of this paper is organized as follows. Section 2 describes the method implemented for the recognition of persons by institution, age, diseases, and groups of diseases. Section 3 presents the results obtained. Section 4 discusses the results and presents the conclusions.

Methods
Machine learning methods were applied to ECG and EEG data to identify the persons by institution, age, diseases, and groups of diseases. The proposed method comprises several stages: data collection, feature extraction, machine learning, and statistical analysis, as presented in Figure 1. We only considered the persons whose ECG and EEG data were correctly acquired. The signals were not filtered before the features were extracted.

Data Collection
Following the previous studies [23,24], the data, as presented in Table 1, were acquired from 14 institutionalized individuals aged between 71 and 97 years old (83 ± 7.4) with different diseases that fall into the categories presented in Section 3.4. The data were acquired at several institutions: Centro Comunitário das Lameiras, Lar Minas, Lar da Misericórdia, and Lar da Nossa Senhora de Fátima. The data were collected by a mobile application connected by Bluetooth to a BITalino device. Several constraints were encountered during the data collection, but the retained records were reliable for the analysis and correlation of the different diseases found in the population. For this study, only the ECG and EEG data acquired by a BITalino device were considered for the processing of the different disorders. The data acquisition faced some challenges, as presented in [28,29].
The sampling rate differs between devices. The sampling rate of the sensors available in the mobile device is variable because the SensorManager.SENSOR_DELAY_FASTEST setting does not yield the same frequency on all devices. The device used, a XIAOMI MI 6, delivered one sample approximately every 10 ms, i.e., at around 100 Hz. The sampling rate of the BITalino device connected by Bluetooth is 100 Hz. After the data acquisition, the data were processed offline: the features were extracted for further comparison, as explained in Section 2.2, and then analyzed as presented in Section 2.3.
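The relation between the 10 ms sampling period and the 100 Hz rate quoted above, and a way to check the effective rate of a variable-rate sensor from its timestamps, can be sketched as follows. This is a minimal illustration, not the acquisition code used in the study; the function names and the example timestamps are hypothetical.

```python
def period_ms_to_hz(period_ms: float) -> float:
    """Convert a sampling period in milliseconds to a rate in hertz."""
    if period_ms <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ms


def estimate_rate_hz(timestamps_ms: list[float]) -> float:
    """Estimate the effective sampling rate from sample timestamps.

    Uses the mean spacing between the first and last sample, which
    smooths out the jitter of a variable-rate smartphone sensor.
    """
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two timestamps")
    mean_period_ms = (timestamps_ms[-1] - timestamps_ms[0]) / (len(timestamps_ms) - 1)
    return period_ms_to_hz(mean_period_ms)


# Hypothetical timestamps with small jitter around a 10 ms period.
example_timestamps_ms = [0, 10, 20, 31, 40]
print(period_ms_to_hz(10.0))                    # 100.0
print(estimate_rate_hz(example_timestamps_ms))  # 100.0
```

Estimating the rate from recorded timestamps, rather than assuming a nominal value, is one way to confirm that two devices are sampled closely enough to be compared.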

Feature Extraction
The features were extracted from the ECG and EEG signals with the framework [4]. They comprise heart rate, heart rate variability, average QRS amplitude, average R-R interval, and average R-S interval from the ECG data, and the frequency and variability from the EEG data. These features were combined for the identification of the institution, age, diseases, and groups of diseases of the individuals.
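To make the interval-based ECG features concrete, the sketch below derives the average R-R interval, the heart rate, and one heart rate variability measure from a list of R-peak times. It is only an illustration under stated assumptions: the R peaks are assumed to be already detected, the paper does not state which variability measure it uses (SDNN, the standard deviation of the R-R intervals, is chosen here as a common option), and the function names are hypothetical. The QRS amplitude and R-S interval are not covered.

```python
from statistics import mean, stdev


def rr_intervals(r_peak_times_s: list[float]) -> list[float]:
    """Return the intervals (in seconds) between consecutive R peaks."""
    return [b - a for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def ecg_interval_features(r_peak_times_s: list[float]) -> dict[str, float]:
    """Compute interval-based ECG features from R-peak times in seconds."""
    rr = rr_intervals(r_peak_times_s)
    if len(rr) < 2:
        raise ValueError("need at least three R peaks")
    mean_rr = mean(rr)
    return {
        "mean_rr_s": mean_rr,              # average R-R interval
        "heart_rate_bpm": 60.0 / mean_rr,  # beats per minute
        "hrv_sdnn_s": stdev(rr),           # assumed HRV measure (SDNN)
    }


# Hypothetical R-peak times, roughly 0.8 s apart (about 75 beats per minute).
features = ecg_interval_features([0.0, 0.8, 1.62, 2.4, 3.21])
print(features)
```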

Machine Learning
The machine learning method was a neural network, specifically a multilayer perceptron, implemented with the WEKA software [25]. WEKA is a free and open-source application for testing different machine learning methods. From the methods it provides, we chose the Multilayer Perceptron [30,31], which is trained by adjusting the weights of the connections between the input, hidden, and output neurons and then uses those weights to predict the class of new records. WEKA also supports different attribute transformation methods, including ones for handling nominal and numeric data, which is important for medical datasets because they frequently contain mixed data types [32,33].
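The prediction step of a multilayer perceptron can be sketched in a few lines. The study itself used WEKA, so the Python code below is not the authors' implementation; it only illustrates how weighted connections turn a feature vector into a class prediction. Training (adjusting the weights) is omitted, sigmoid activations are an assumption, and the weights in the example are hand-picked and hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs: list[float], weights: list[list[float]], biases: list[float]) -> list[float]:
    """One fully connected layer: weights[j][i] links input i to neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_predict(features, hidden_w, hidden_b, out_w, out_b):
    """Forward pass through one hidden layer; returns (class index, scores)."""
    hidden = dense(features, hidden_w, hidden_b)
    scores = dense(hidden, out_w, out_b)
    return scores.index(max(scores)), scores


# Hypothetical two-feature, two-class network with hand-picked weights.
label, scores = mlp_predict(
    [1.0, 0.0],
    hidden_w=[[10.0, 0.0], [-10.0, 0.0]], hidden_b=[0.0, 0.0],
    out_w=[[-5.0, 5.0], [5.0, -5.0]], out_b=[0.0, 0.0],
)
print(label)  # class index 1 for these hand-picked weights
```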

Statistical Analysis
To validate the implemented method, we counted the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). From these counts, we calculated the accuracy, precision, recall, and F1 score to measure the performance of the method.
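The four measures follow from the counts by their standard definitions, which can be sketched as below. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 score from the counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, not taken from the study.
print(binary_metrics(tp=5, fp=5, tn=85, fn=5))
# accuracy 0.9, precision 0.5, recall 0.5, f1 0.5
```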

Results
Given the constraints encountered during data acquisition, we performed several types of analysis with neural networks. First, we combined the extracted features with the institution (Section 3.1). Second, we combined the features with the ages of the sample (Section 3.2). Third, we combined the same features with the individual diseases (Section 3.3). Finally, we categorized the disorders into groups and combined these groups with the previously extracted features (Section 3.4).

Analysis by Institution
Based on the implementation of the machine learning method described in Section 2.3 with the data separated by institution, Table 2 presents the confusion matrix of the results obtained. We verified that the records from Centro Comunitário das Lameiras were not correctly identified, but the persons from the remaining institutions were correctly identified. The data were selected with the WEKA software as shown in Figure 2, and Figure 3 presents the classification distributed across the institutions: Centro Comunitário das Lameiras, Lar Minas, Lar da Misericórdia, and Lar da Nossa Senhora de Fátima.
Next, the results of the identification of the persons by institution with neural networks, presented in Table 3, showed that the persons from Lar Minas were all correctly identified, whereas the persons from Centro Comunitário das Lameiras were not identified. For each of the remaining institutions, all but one record was correctly identified. Overall, the neural network achieved an accuracy of 93%, a precision of 89%, a recall of 93%, and an F1 score of 91%.
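With more than two classes, a single precision or recall figure has to be aggregated over the classes. The reported recall equals the reported accuracy, which is what support-weighted averaging produces; that the paper's figures are such weighted averages is an assumption, not something the text states. The sketch below computes these averages from a confusion matrix whose values are hypothetical and do not reproduce Table 2.

```python
def weighted_metrics(confusion: list[list[int]]) -> tuple[float, float, float]:
    """Return (accuracy, weighted precision, weighted recall).

    confusion[i][j] is the number of records of true class i that were
    predicted as class j. Each class is weighted by its share of records.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[c][c] for c in range(n)) / total
    precision = 0.0
    recall = 0.0
    for c in range(n):
        support = sum(confusion[c])                         # true records of class c
        predicted = sum(confusion[r][c] for r in range(n))  # records predicted as c
        tp = confusion[c][c]
        weight = support / total
        if predicted:
            precision += weight * tp / predicted
        if support:
            recall += weight * tp / support
    return accuracy, precision, recall


# Hypothetical three-class matrix in which the first class is never predicted.
example = [
    [0, 1, 0],
    [0, 9, 0],
    [0, 0, 10],
]
accuracy, precision, recall = weighted_metrics(example)
print(accuracy, precision, recall)
```

In this example the weighted recall matches the accuracy, because weighting each class recall by the class share sums the correctly classified records over the total.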

Analysis by Age
Because the institutions housed different types of people, we also applied the machine learning method to the data separated by age. Table 4 presents the confusion matrix of the results obtained. We verified that the records related to persons aged 74, 85, and 86 years old were not correctly identified. The data were selected with the WEKA software as presented in Figure 4, and Figure 5 shows the classification distributed across the different ages.
Next, the results of the identification of the persons by age with neural networks, presented in Table 5, showed that the 74-year-old persons were not correctly identified at all. For the 85- and 86-year-olds, only 50% of the cases were correctly identified. Overall, the method reported an accuracy of 95%, a precision of 96%, a recall of 95%, and an F1 score of 95%.

Analysis by Diseases
The subjects of this study had different diseases, and we verified, with the implementation of neural networks, that the individual disorders did not correlate with the acquired data. We confirmed that this was due to the limited number of persons with each disease. However, the negative cases were correctly identified, yielding an accuracy between 89% and 98%, as shown in Table 6. The data were selected with the WEKA software as presented in Figure 6, and Figure 7 shows the classification distributed across the diseases in the following order: arterial hypertension, cardiac arrhythmia, arteriosclerotic coronary disease, heart failure, Parkinson's disease, post-traumatic stress, depression, sequelae of surgery to brain injury, dementia of vascular etiology, and acute myocardial infarction.
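High accuracy alongside unrecognized diseases is typical of rare classes: a classifier that labels every record as negative is already correct for most records. The sketch below illustrates this effect with hypothetical counts that are not taken from Table 6; the function name is also hypothetical.

```python
def always_negative_scores(n_positive: int, n_negative: int) -> tuple[float, float]:
    """Accuracy and recall of a classifier that predicts 'negative' for everyone."""
    tp, fp = 0, 0                       # nothing is ever predicted positive
    tn, fn = n_negative, n_positive     # all negatives right, all positives missed
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical: 2 records with the disease, 98 without it.
accuracy, recall = always_negative_scores(n_positive=2, n_negative=98)
print(accuracy, recall)  # 0.98 0.0
```

For rare diseases, recall and precision on the positive class are therefore more informative than accuracy alone.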

Analysis by Group of Diseases
As previously verified, there was no correlation between the values acquired from the ECG and EEG sensors and the individual diseases. Therefore, we grouped the disorders into categories: osteoarticular diseases, cardiovascular diseases, lung diseases, neurological and balance diseases, psychiatric illnesses, nephro-urological diseases, digestive system and abdominal wall diseases, and metabolic disorders, as presented in Table 7. After the grouping, the neural networks were applied to the records labeled by group of diseases. As shown in Table 8, the results improved. The identification of persons with cardiovascular diseases had an accuracy of 51%. As in the case of the detection of isolated diseases, the negative cases were correctly identified, yielding accuracies between 51% and 98%. Because we only acquired data from ECG and EEG sensors, these results are as expected. Thus, we analysed only the groups of diseases that are related to this type of data, namely cardiovascular diseases, neurological and balance diseases, and psychiatric illnesses, resulting in Table 9. The most recognized conditions are the cardiovascular diseases, with an accuracy of 76%. The data were selected with the WEKA software as presented in Figure 8, and Figure 9 presents the classification distributed across the groups of diseases in the following order: cardiovascular, neurological and balance, and psychiatric.
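Grouping pools several rare diseases into one label, so each group has more positive examples than any single disease. The sketch below shows one way such group labels could be derived per person. The mapping is an illustrative assumption covering only a few diseases named in this paper; it does not reproduce Table 7, and the names `DISEASE_TO_GROUP` and `group_labels` are hypothetical.

```python
# Illustrative mapping only; the study's actual grouping is given in its Table 7.
DISEASE_TO_GROUP = {
    "arterial hypertension": "cardiovascular",
    "cardiac arrhythmia": "cardiovascular",
    "heart failure": "cardiovascular",
    "acute myocardial infarction": "cardiovascular",
    "Parkinson's disease": "neurological and balance",
    "depression": "psychiatric",
    "post-traumatic stress": "psychiatric",
}

GROUPS = ["cardiovascular", "neurological and balance", "psychiatric"]


def group_labels(diseases: list[str], groups: list[str] = GROUPS) -> dict[str, bool]:
    """Return one yes/no label per group for a person's list of diseases."""
    present = {DISEASE_TO_GROUP[d] for d in diseases if d in DISEASE_TO_GROUP}
    return {g: g in present for g in groups}


# Hypothetical person with two diagnoses.
print(group_labels(["arterial hypertension", "depression"]))
```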

Discussion and Conclusions
Machine learning techniques are helpful for the recognition of different diseases involved in the studied population. The application of machine learning techniques made it possible to identify with some accuracy the different patterns related to the extracted features, such as heart rate, heart rate variability, average QRS amplitude, average R-R interval, and average R-S interval from ECG data, and the frequency and variability from the EEG data. A combination of these parameters allowed us to identify, with some accuracy, the presence of certain diseases.
However, this study revealed some limitations related to the data acquisition and its constraints, and some data were excluded for several reasons, including the failure of the sensors. Because the number of valid records is small, the machine learning method needs larger datasets and samples for its results to be reliable.
The obtained results revealed that the individuals were recognized by institution, except for the individuals from Centro Comunitário das Lameiras. The identification results related to age were also accurate, except for persons aged 74, 85, and 86 years old. Regarding the recognition of diseases, and considering that we had a small dataset for the analysis, the isolated disorders were not recognized. However, when the disorders were categorized, some persons with cardiovascular diseases were identified. Thus, the proposed method reported low accuracies for illnesses, but the accuracy was higher for the recognition of persons by age and institution.
In the future, we intend to study a larger number of individuals to increase the size of the acquired dataset. Next, other types of diseases will be analyzed, comparing healthy people with those suffering from certain disorders.
Author Contributions: Conceptualization, methodology, software, validation, formal analysis, investigation, writing-original draft preparation, writing-review, and editing: V.P., I.M.P., F.R.R., N.M.G., M.V.V., E.Z. and P.L. All authors have read and agreed to the published version of the manuscript.
Funding: This work is funded by FCT/MEC through national funds and co-funded by FEDER-PT2020 partnership agreement under the project UIDB/EEA/50008/2020.