Symptom Tracking and Experimentation Platform for Covid-19 or Similar Infections

Remote symptom tracking is critical for the prevention of Covid-19 spread. The qualified medical staff working in the call centers of primary health care units often have to take critical decisions based on vague information about the patient's condition. The congestion and the constantly changing medical protocols often lead to incorrect decisions. The proposed platform allows the remote assessment of symptoms and can be useful for patients, health institutes and researchers. It consists of mobile and desktop applications and medical sensors connected to a cloud infrastructure. The unique features offered by the proposed solution are: (a) dynamic adaptation of Medical Protocols (MP) is supported (for the definition of alert rules, sensor sampling strategy and questionnaire structure), covering different medical cases (pre- or post-hospitalization, vulnerable population, etc.); (b) anonymous medical data can be statistically processed in the context of research about an infection such as Covid-19; (c) reliable diagnosis is supported, since several factors are taken into consideration; (d) the platform can be used to drastically reduce the congestion in various healthcare units. For the demonstration of (b), new classification methods based on similarity metrics have been tested for cough sound classification, with an accuracy in the order of 90%.


Introduction
During the Covid-19 pandemic, primary health care services and call centers have been criticized for their decisions about whether a patient needs to be hospitalized or not. The medical staff working in these call centers have to remotely estimate the condition of the callers based on vague descriptions of their symptoms. Missing information about the patients, the congestion and the constantly changing medical protocols, combined with the subjective opinion of the medical practitioners, often lead to incorrect decisions. Sensitive populations with a weak immune system (due to cancer, kidney failure, heart disease, etc.) also need to be monitored remotely, and the same holds for Covid-19 patients in their rehabilitation phase.
The diagnosis of Covid-19, or of similar infections that will appear in the near future (e.g., the Covid-19 variants that recently appeared in the United Kingdom and South Africa), is a major challenge, since reliable molecular tests have to be performed on a large fraction of the population. Covid-19 is caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) virus [1] and is characterized by high infection and case-fatality rates [2]. There is a large number of asymptomatic carriers and a high infection rate even when there are no symptoms [3]. In [4], a number of molecular tests and immunoassays for the Covid-19 management life cycle are presented. The life cycle of this virus consists of the preventive, preparedness, response and recovery phases. The main molecular test categories are Nucleic Acid Amplification Tests (e.g., Reverse Transcription-Quantitative Polymerase Chain Reaction (RT-qPCR)) [5], immunoassays (such as immune colloidal gold strips) [6] and sequencing.
The developed platform (called Coronario henceforth) aims at the reduction of the traffic in the call centers of primary healthcare units through remote symptom tracking. The functionality offered can be used to support the real-time screening of vulnerable populations (patients with a weak immune system, heart disease, cancer, transplants, etc.). Moreover, it can be used to monitor, in real time, the condition of patients who have recovered from a Covid-19 infection. The research concerning the nature of Covid-19 requires an abundance of information about the pre- and post-hospitalization phases. An eHealth infrastructure, that is, a number of medical sensors on the side of the patient, is employed to support a more reliable diagnosis based on the overall patient condition and his environment [29]. The anonymous medical data used in the Coronario platform can also be exploited in the context of research on Covid-19.
The Coronario platform consists of a user mobile application (UserApp) and a number of medical IoT sensors that are connected to cloud services, preferably through a sensor controller for higher security and efficiency. The information processed by the UserApp includes the description of the symptoms through questionnaires, the location of the user and the results of audio processing for the classification of cough and respiratory patterns. The cloud connection supports data privacy with user authentication, access permissions, encryption and other services. The sensor data (e.g., blood pressure, body temperature, etc.) are uploaded there and can be accessed by the supervisor doctor or authorized researchers. The diagnosis results are certified by authorized medical staff who access the data through the Supervisor application. The diagnosis decisions can potentially be assisted by Artificial Intelligence (AI) tools, such as pre-trained TensorFlow neural network models. Finally, researchers can access anonymous medical data, given after the consent of the patients, through the Scientific application.
Compared to similar symptom tracking applications, Coronario offers a flexible MP file format that allows the definition or modification of alert rules, sensor sampling scenarios and questionnaire structures in real time. In this way, several medical cases of Covid-19 or other infections can be supported. Moreover, the facilities of the Coronario platform can be exploited by various practitioners: physicians, primary healthcare units, hemodialysis centers, oncological clinics and so forth. An extensible platform where different sound classification methods can be tested is also offered, in support of the research on Covid-19. The integration of various components in the Coronario platform offers extra services beyond symptom assessment and statistical processing; specifically, user tracking, localization and social distancing are also supported.
The rest of the paper is organized as follows: The detailed description of the Coronario architecture is presented in Section 2. The dynamic medical protocol configuration is examined in Section 3. In Section 4, cough sound classification is examined for the demonstration of research experiments using the Coronario platform. A discussion on extensions and the security features, follows in Section 5. Section 6 draws conclusions.

Architecture of the Coronario Platform
The general Coronario platform architecture appears in Figure 1. It consists of a mobile (UserApp) and two desktop applications (SupervisorApp and ScientificApp), a local eHealth sensor network and cloud storage. More details about the data structures exchanged between these modules are shown in Figure 2. The use and the features of each one of these modules are described in the following paragraphs.

UserApp
The UserApp is a mobile application used by the patient. It has been developed using Microsoft Visual Studio 2019/Xamarin so that it can be deployed as an Android or iOS application. An overview of the UserApp pages appears in Figure 3. The user selects the language of the interface from the initial page (Figure 3a), signs in (Figure 3b) and updates the MP file (Figure 3c). General information about the age, gender and habits of the user is asked (Figure 3d) and then the default or MP-customized symptom questionnaire appears (Figure 3e,f). The answers can be given either as true/false using checkboxes or in an analog way through sliders (slider to the left: low symptom intensity, slider to the right: high intensity).
The current position of the user is detected through the Global Positioning System (GPS) of the cell phone in Figure 3g and displayed on the map (Figure 3h). In the current version, the country is detected through the GPS coordinates, and additional information, such as the number of Covid-19 cases found so far in this country, can also be retrieved from a web page that publishes such information. Narrower regions can also be used to trace the locations that the user has visited, in case he is found infected by Covid-19.
The user can select a sound file with recorded cough or respiratory sound, as shown in Figure 3i,j. The sound file is then played and sound processing is performed, potentially sending the analysis results and the extracted features to the cloud. In the last page, shown in Figure 3k, several messages are displayed; for example, diagnosis results and alerts can appear. Guidelines are also displayed, showing the user how and when to perform tests with the medical sensors.

SupervisorApp
The SupervisorApp is used by authorized medical practitioners. A supervisor doctor can access the patient data through this application and exchange messages with the patient. He can log in to his account on the cloud, select the interface language and access the sensor data in readable format. The supervisor can also select or create an appropriate MP file, where the conditions that generate alerts and instructions to the user are defined. The sensor sampling scheme that has to be followed, as well as guidelines about how the user will perform the medical tests with the available medical equipment, are also defined in the MP. Moreover, the specific questions that should be included in the customized questionnaire are determined in the MP file. The flexibility in determining these issues is owed to the employed MP file format, described in Section 3.
The SupervisorApp has been developed in Visual Studio 2019 as a Universal Windows Platform (UWP) application. Its main page is shown in Figure 4, where the values of specific sensors can also be viewed.

IoT eHealth Sensor Infrastructure
The IoT eHealth sensor infrastructure is employed to monitor several parameters concerning the condition of the patient's body and his environment. A non-expert should be able to handle the available sensors, and a different combination of these sensors can be employed for each medical case. Ordinary sensors that can be operated by inexperienced patients are the following:
• body position sensor, to monitor how long the patient was standing, walking or lying in bed, or whether he had a fall,
• blood pressure sensor,
• glucose meter,
• digital or analog thermometers, for measuring the temperature of the body or the environment,
• pulse oximeter (SPO2),
• respiratory sensor, to plot the patient's breathing pattern,
• Galvanic Skin Response (GSR) sensor, to monitor sweat and consequently stress,
• environmental temperature/humidity sensors.
Some of the sensors have an analog interface, as shown in Figure 2 (temperature, GSR, body position, respiratory), and should be connected to Analog/Digital Conversion readout circuits. Others offer a digital interface or even network connectivity (e.g., blood pressure, glucose meter, SPO2) and are capable of uploading their measurements directly to a cloud in real time. A local controller (Gateway) may be used to connect both analog and digital sensors through appropriate readout circuits and upload data to the cloud through its wired or wireless network interface. The Gateway can also filter sensor values and upload only the information required for the diagnosis. Filtering can include smoothing through moving-average windows, estimation of missing sensor values through Kalman filters, or data mining that searches for strong patterns through Principal Component Analysis (PCA). Only the important sensor values, which can be isolated, for example, through appropriate thresholds, are uploaded to the cloud for higher data privacy. The experiments conducted here were based on the sensor infrastructure described in [29]. After user consent, some sensor data may be anonymously used by the ScientificApp in the context of research about Covid-19 or similar infections. Although more advanced tests, such as ElectroCardioGrams (ECG) or ElectroMyoGrams (EMG), are also available, their operation is more complicated and the results they produce cannot be interpreted by a non-expert. Moreover, many of these sensors are not medically certified.
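The Gateway-side filtering can be illustrated with a minimal sketch (in Python, for illustration only; the function names and the threshold values are hypothetical, not taken from the platform's implementation):

```python
from collections import deque

def smooth(values, window=5):
    """Moving-average smoothing over a sliding window."""
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

def select_for_upload(samples, low, high):
    """Keep only (timestamp, value) readings outside the normal range,
    so that the Gateway uploads just the diagnostically relevant values."""
    return [(ts, v) for ts, v in samples if v < low or v > high]

# Example: a body-temperature stream (hypothetical values)
raw = [36.6, 36.7, 38.9, 36.8, 39.1, 36.9]
smoothed = smooth(raw, window=3)
alerts = select_for_upload(list(enumerate(raw)), low=35.0, high=38.0)
```

More elaborate filters (Kalman estimation of missing values, PCA-based pattern mining) would replace `smooth` and `select_for_upload` in the same pipeline position.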
The UserApp is also employed to guide the user on how and when to operate an external sensor. The sensor sampling strategy can be defined in the MP file and the UserApp converts it to comprehensible guidelines for the patient.

Cloud Services
There were several options for the implementation of the communication between the Coronario modules. Dedicated links between any pair of modules would not be appropriate, since multiple links would have a higher cost, would be more difficult to synchronize and would require an overall larger buffering scheme to support asynchronous communication. Multiple copies of the same data travelling through different links could also cause incoherence. Private datacenters could be used to store the generated data; they can offer advanced storage security and speed but have a higher initial (capital) and administrative cost. Moreover, they usually support only the customized services developed by their owners. On the contrary, public clouds support advanced services for data sharing, analysis and visualization, and they allow ubiquitous uniform access, while their cost is defined on a pay-as-you-go basis. However, the customers of cloud services do not always trust them as far as the security of their data is concerned.
The information is exchanged between the various modules of the Coronario platform through general cloud services, such as the ones offered by Google Cloud, Microsoft Azure and Amazon Web Services (AWS). These platforms support advanced security features guaranteed by Service Level Agreements (SLAs), data analysis and notification methods, as well as storage of data in raw (blobs) or structured format (e.g., tables, databases). Advanced healthcare services, such as Application Program Interfaces (APIs) and integration, are offered by the Google Cloud Healthcare API. Microsoft Cloud for Healthcare brings various new services for monitoring patients, combining Microsoft 365, Dynamics, Power Platform, Azure cloud and IoT tools. The common data model used in this case allows for easy data sharing and analysis; for example, medical data can be visualized using Microsoft Power BI. Microsoft Azure cloud and IoT services have been used by the Coronario platform, although only the storage facilities have been exploited, in order to make the developed application more portable. Several other, even simpler cloud platforms (Ubidots, ThingSpeak, etc.) could have been employed in the place of Microsoft Azure. The data stored in the cloud can be visualized (e.g., plotted) outside the cloud and, more specifically, in the SupervisorApp.
The following information is stored in the cloud:
• Sensor values, stored in (uid, sid, ts, v) format (see Figure 2). The uid and sid are the user and sensor identification numbers (not names), ts is the timestamp and v the sensor value, represented as a string. This format is also followed for storing the information returned by the UserApp.
• MP files.
• Messages exchanged between the supervisor doctor and the patient.
MP files as well as messages can also be stored in the cloud as (uid, sid, ts, v) for uniform representation. A whole message or even an MP text file can be stored in the string field v. This field is intentionally defined as a string in order to incorporate a combination of one or more numerical values, text and additional meta-data needed besides uid, sid and ts.
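A minimal sketch of how a measurement, a message or a whole MP file can be packed into the uniform (uid, sid, ts, v) record described above (Python for illustration only; the helper name and the uid/sid values are hypothetical):

```python
import json
import time

def make_record(uid, sid, v):
    """Encode a measurement or message as a (uid, sid, ts, v) record.
    v is always a string, so a single field can hold one number,
    several packed values, a message or a whole MP file."""
    return {"uid": uid, "sid": sid, "ts": int(time.time()), "v": str(v)}

# A blood-pressure reading: systolic/diastolic packed into one string field
bp = make_record(uid=17, sid=3, v="120/80")

# A whole (hypothetical) MP file stored through the same record format
mp_text = json.dumps({"Questionnaire": {}, "SensorSampling": {}, "AlertRules": {}})
mp_rec = make_record(uid=17, sid=0, v=mp_text)
```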

ScientificApp
ScientificApp is a variation of the UserApp, compiled as a desktop application for a more ergonomic user interface. The details of this module are given here in order to demonstrate how alternative classification methods and similarity metrics can be tested. Most of the pages are identical to those of the UserApp, except for the page that concerns the sound processor, shown in Figure 5. In this page the researcher can select and play a sound file (e.g., daf9.wav, as shown in Figure 5). The type of the recorded sound (cough or respiratory) as well as the "Training" mode are defined using checkboxes. Based on the employed classification method, the ScientificApp can determine the reference features in the frequency domain of a cough sound class, using a number (e.g., 5-10) of training files. Then, the features extracted from any new sound file that is analyzed are examined for similarity with the reference ones in order to classify it. The Fast Fourier Transform (FFT) is applied to the input sound file in segments of, for example, 1024 samples (defined in the page shown in Figure 5). Subsampling can also be applied in order to cover a longer time interval with a single FFT.
The drop-down menu "Select Analysis Type" shown in the page of Figure 5 can be used to select the similarity method that will be employed, as shown in Figure 6. The supported similarity methods are explained in detail in Section 4. Some similarity analysis methods may need extra parameters to be set; the reserved fields "Param1" and "Param2" can be used for this purpose. The magnitudes of the FFT output values are the features used for similarity checking and are displayed at the bottom of the page in Figure 5 after the analysis of the sound is completed. If training takes place, these values appear after all the training sound files have been analyzed. These values can be exported, for example, to a spreadsheet for statistical processing. Different FFT segments of the same sound file are either (a) averaged or (b) the maximum power that appears among different segments for the same frequency is used as the final bin value.
When a new sound file is analyzed, the FFT output is displayed at the bottom of the page shown in Figure 5 and the classification results can be generated after the user selects the file with the reference features of the supported classes. This set of classes can, of course, be extended if the system is trained with sounds from a new cough or respiratory category. In Figure 7, the name of the "Classify" button has changed to the filename of the specific reference features (classmax_1024_s4_maw7.txt). The field holding the selected analysis type in Figure 5 (Pearson Correlation Similarity) now displays the results of the classification (see Figure 7).
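The training/classification flow outlined above can be sketched as follows (an illustrative Python outline, not the actual application code; a plain DFT stands in for the FFT and the segment size is reduced for brevity):

```python
import cmath
import math

def segment_magnitudes(signal, seg=8):
    """Average magnitude spectrum over fixed-size segments
    (the platform uses e.g. 1024-sample segments; 8 keeps the sketch small).
    A direct DFT is used here instead of an optimized FFT."""
    n_segs = len(signal) // seg
    bins = [0.0] * seg
    for s in range(n_segs):
        chunk = signal[s * seg:(s + 1) * seg]
        for k in range(seg):
            X = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / seg)
                    for n in range(seg))
            bins[k] += abs(X) / n_segs
    return bins

def pearson(a, b):
    """Pearson correlation between two feature vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def classify(features, references):
    """Return the class whose reference features correlate best."""
    return max(references, key=lambda c: pearson(features, references[c]))
```

In the platform, `references` would hold the per-class feature vectors produced during training (e.g., from 5-10 files per class), and other similarity metrics from Section 4 could replace `pearson`.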

Dynamic MP File Configuration
The format of the MP file is studied in detail in this section, in order to demonstrate that any alert rule, sensor sampling scenario or questionnaire structure can be supported. JavaScript Object Notation (JSON) format is employed for the MP file. The configuration sections of an MP file are the following: (a) Questionnaire, (b) Sensor Sampling and (c) Alert Rules. A new MP file can be defined by the supervisor who uploads it to the cloud through the SupervisorApp (see Section 2.2). This file can be downloaded from the cloud by the UserApp. Several fields in the UserApp pages are adapted according to the information stored in the MP file. Updating an MP file is performed in real time without requiring a recompilation of the Coronario modules. Each MP section is described in the following paragraphs.
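A sketch of what an MP file might look like, assuming the section and field names described in this paper (the AlertRules entry shown is hypothetical, since its exact schema is not reproduced here):

```python
import json

# Questionnaire and SensorSampling field names follow the paper's
# description; the AlertRules entry is an illustrative placeholder.
mp = {
    "Questionnaire": {
        "q1": {"SensId": 101, "Question": "Do you have a dry cough?"},
        "q2": {"SensId": 102, "Question": "Rate your shortness of breath"},
    },
    "SensorSampling": {
        "MPressH": {"Type": "Repeat", "Date": "2021-03-01",
                    "Period": 12, "Repeat": 14},
    },
    "AlertRules": {
        "r1": {"SensId": 101, "Threshold": 50},
    },
}

mp_text = json.dumps(mp, indent=2)   # uploaded to the cloud by the SupervisorApp
parsed = json.loads(mp_text)         # downloaded and parsed by the UserApp
```

Because the file is plain JSON, the SupervisorApp can edit and re-upload it at any time and the UserApp simply re-parses it, which is what makes the reconfiguration dynamic.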

Questionnaire Configuration
The aim of the Questionnaire section is to select the appropriate questions about the patient's symptoms that should appear in the corresponding form of the UserApp (see Section 2.1). These questions are selected from a pool that contains all the possible ones related to Covid-19 or other infections. Names such as q1, q2, q3 (see Figure 8) indicate that the definition of a question follows. Each question is defined by two fields: SensId and Question. The value of SensId should be retrieved from the question pool so that it can be recognized by the UserApp. The text in the Question field can be customized (it is not predefined) and appears as a question in the UserApp. The default way of answering a question defined in the MP file is a percentage value, determined by a slider control that appears in the UserApp questionnaire. In this way, the user can respond to a question with an analog indication between No (0%, slider to the left) and Yes (100%, slider to the right). For example, if a symptom appears with mild intensity, the patient may place the slider close to the middle (50%). If the answers to specific symptom questions should be a clear yes or no, checkboxes can be defined through the Questionnaire section of the MP file. For uniform handling of both the sliders and the checkboxes, a value of 100% is assumed when a checkbox is ticked, while 0% is returned when it is not. Figure 8 shows how the questions defined in the MP file appear in the UserApp.
The values set in the questionnaire are uploaded to the cloud in the same format as the sensor values of the eHealth infrastructure (see Section 2.4): (uid, sid, ts, v). Moreover, additional information uploaded by the UserApp, such as the geolocation, is also stored in this format. The type of the parameter v is string, so that multiple values (for example, floating point numbers) or even whole messages/text files can be incorporated in a single field.
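The uniform handling of sliders and checkboxes can be sketched as follows (illustrative Python; the function name is hypothetical):

```python
def answer_value(control, state):
    """Map a questionnaire answer to a uniform 0-100% value:
    sliders report their position directly, while checkboxes
    map ticked -> 100 and unticked -> 0."""
    if control == "checkbox":
        return 100 if state else 0
    if control == "slider":
        # Clamp to the valid percentage range
        return max(0, min(100, int(state)))
    raise ValueError("unknown control type")
```

With this mapping, every answer can be stored and compared as a single percentage, regardless of the control used.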
All of these values, whether retrieved from the IoT eHealth sensor infrastructure or, for example, from the UserApp questionnaire, can be viewed by the supervisor, who can search for specific sensor values based on the sensor identity or other criteria (e.g., specific dates or users).

Sensor Sampling Configuration
The second section in the MP file concerns the sampling strategy that should be followed for the IoT medical sensors. Each entry in this section starts with the declaration of the sensor name (e.g., BPos for body position, MPressH for systolic blood pressure and MPressL for the diastolic one), followed by 4 fields: Type, Date, Period, Repeat. The Type field can take one of the following values: Once, Repeat, Interval. If Once is selected, the medical test should be performed once at the specific Date; the Period and Repeat fields are ignored in this case (they can be 0). If Type = "Repeat", the patient must perform a routine medical test starting at the date indicated by the corresponding field, with a time interval (determined in the Period field) between successive tests. The Repeat field indicates the maximum number of tests that should be performed, although these tests can be terminated earlier by a newer MP file. If Type = "Interval", the interpretation of the other 3 fields is different: a medical test should be performed Repeat times within a Period starting from Date.
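A sketch of how a UserApp could expand one "Sampling" entry into concrete test times. The field semantics follow the text; expressing Period in hours is an illustrative assumption.

```python
from datetime import datetime, timedelta

def schedule_tests(entry):
    """Expand one Sampling entry (Type, Date, Period, Repeat) into concrete
    test datetimes. Period is assumed to be given in hours (illustrative)."""
    start = datetime.fromisoformat(entry["Date"])
    if entry["Type"] == "Once":
        return [start]  # Period and Repeat are ignored
    if entry["Type"] == "Repeat":
        period = timedelta(hours=entry["Period"])
        return [start + i * period for i in range(entry["Repeat"])]
    # Interval: Repeat tests anywhere within the Period; the exact timing
    # is left to the patient, so no fixed schedule can be generated here.
    raise ValueError("Interval entries leave the exact timing to the patient")

entry = {"Type": "Repeat", "Date": "2021-01-10T08:00", "Period": 12, "Repeat": 3}
for t in schedule_tests(entry):
    print(t.isoformat())
# 2021-01-10T08:00:00
# 2021-01-10T20:00:00
# 2021-01-11T08:00:00
```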
Although this MP file section has been defined for the sensors of the eHealth infrastructure, it can also be used for sensor-like information retrieved through the UserApp (questionnaire answers, geolocation coordinates or even sound/image analysis). The sampling scenarios determined in this MP file section should be converted into comprehensible guidelines that help the patient schedule the medical tests at the right time. Figure 9 shows how the UserApp converts the JSON format of the Sampling section into natural language.
The sampling scenario for the screening of a patient with kidney failure is examined to demonstrate the flexibility of the MP file format in determining any usage of the medical sensors. More specifically, it demonstrates how different regular and irregular sampling rates can be supported.
Sleep apnea, a breathing disorder, is highly prevalent in these patients. This disorder results in reduced blood oxygen saturation (hypoxemia), with the oxygen saturation falling below 90%. It is also a cause of high blood pressure and can lead to the cardiac diseases that are frequent in chronic kidney disease and dialysis patients [30]. The sensor sampling scheduling of Figure 10 is implemented by the "Sampling" section of the MP file as shown in Algorithm 1.

Alert Configuration
Local rule checks performed by the UserApp can be defined in the "Alert" section of the MP file. Whenever the condition in a rule is found true, an action is taken; in the current version this action is a message displayed to the user. Any logical condition C is expressed as:

C = ((s0 ≥ s0,min)·(s0 ≤ s0,max))·((s1 ≥ s1,min)·(s1 ≤ s1,max))· . . . , (1)

where s0, s1 are sensor values and s0,min, s0,max, s1,min, s1,max are the allowed limits on these values. The operators "·", "+" and "¬" denote the logical AND, OR and inversion, respectively. If ±∞ can be used in these limits, the inversion of a condition can be expressed as in Equation (2). For example:

¬((s0 ≥ s0,min)·(s0 ≤ s0,max)) = ((s0 > −∞)·(s0 ≤ s0,min)) + ((s0 ≥ s0,max)·(s0 < +∞)). (2)

Since all the conditions in a rule must be true for the defined action to be taken (i.e., they are related with an AND operator), the OR operator (+) in Equation (2) can be implemented by defining the rule twice (with the same action): once for ((s0 > −∞)·(s0 ≤ s0,min)) and once for ((s0 ≥ s0,max)·(s0 < +∞)). As an example showing that any complicated condition can be supported by the format of the Alerts section of the MP file, let us assume that an action A has to be taken if the following condition C is true:

A : C = (¬((s0 ≥ s0,min)·(s0 ≤ s0,max)) + (s1 ≥ s1,max))·(s3 = true). (3)

Equation (3) can be rearranged by splitting the condition C into 3 separate conditions C1, C2, C3, as follows:

A : C1 = (((s0 > −∞)·(s0 ≤ s0,min))·((s3 ≥ true)·(s3 ≤ true))), (4)
A : C2 = (((s0 ≥ s0,max)·(s0 < +∞))·((s3 ≥ true)·(s3 ≤ true))), (5)
A : C3 = (((s1 ≥ s1,max)·(s1 ≤ +∞))·((s3 ≥ true)·(s3 ≤ true))). (6)

This rule can be expressed in the Alerts section as shown in Algorithm 2.
As can be seen from Algorithm 2, "Inf" has been used to denote infinity, and the minimum and maximum sensor values have been declared as s0_min, s0_max, s1_min, s1_max. Conditions C1, C2, C3 have been implemented by the pairs (c1a, c1b), (c2a, c2b) and (c3a, c3b), respectively. Although three different rules have been defined (m1, m2, m3), they all trigger the same action A. Of course, the negation of, for example, s ≥ s_min is s < s_min and not s ≤ s_min as used above in Algorithm 2, but this can easily be handled without modifying the MP file format if a small correction e is used: s ≤ s_min − e.
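The rule evaluation described above can be sketched as follows: each rule is an AND of closed-interval conditions on sensor values, and an OR is obtained by listing two rules with the same action, as in the discussion of Equation (2). The rule and sensor names are illustrative.

```python
import math

# Two rules implementing "s0 out of [80, 120] while s3 is true -> action A",
# following the OR-by-duplicate-rules scheme of the Alerts section.
rules = [
    {"name": "m1", "action": "A", "conds": [("s0", -math.inf, 80.0), ("s3", 1, 1)]},
    {"name": "m2", "action": "A", "conds": [("s0", 120.0, math.inf), ("s3", 1, 1)]},
]

def fired_actions(rules, sensors):
    """Return the actions of all rules whose conditions all hold (logical AND)."""
    return [r["action"] for r in rules
            if all(lo <= sensors[s] <= hi for s, lo, hi in r["conds"])]

print(fired_actions(rules, {"s0": 130.0, "s3": 1}))  # ['A']  (rule m2 fires)
print(fired_actions(rules, {"s0": 100.0, "s3": 1}))  # []     (inside the range)
```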

Materials and Methods
In this section, the sound analysis facilities offered by the ScientificApp are employed as a case study to demonstrate how the developed platform can assist the research on Covid-19 symptoms. More specifically, a number of new classification techniques based on similarity metrics are developed and tested. One of the most representative symptoms of Covid-19 is the dry cough. Recognizing the particular cough sound produced by a person infected by Covid-19 is not an easy task, since it depends on the gender and age of the patient as well as on the progress of the infection. Computer-aided sound analysis that allows the classification of the cough sound may help in this direction.
As already described in Section 2.5, the researcher can use a number of wav files that store recorded cough sounds. The labeled wav files of known origin are used for training. An N-point FFT is applied to the wav file samples y with subsampling 1/S_m (i.e., 1 of S_m consecutive samples is used). Subsampling allows a single FFT frame to span a longer time interval. Each segment of the sound clip may have a different significance in the classification of the sound, but for the sake of simplicity we assume that all the N_f FFT output sets may be averaged to estimate the significance of each frequency bin in the classification process (called henceforth the "Add" combination). Alternatively, we also examine the case where the maximum power that appears for the same frequency among the N_f segments of a single wav file is used as the final bin value (the "Max" combination). If FFT_n() denotes the n-th FFT operation that accepts as input a set of N values x_n = {x_n,0, x_n,1, . . . , x_n,N−1} and outputs the magnitudes X_n = {X_n,0, X_n,1, . . . , X_n,N−1}:

{X_n,0, X_n,1, . . . , X_n,N−1} = FFT_n({x_n,0, x_n,1, . . . , x_n,N−1}), x_i = y_(i·S_m), (7)

then each element R_i of the result R = {R_0, R_1, . . . , R_N−1} of the sound analysis using the "Add" combination is the average of X_n,i over the N_f segments, while in the "Max" combination each R_i is estimated as the maximum of X_n,i over the N_f segments. Additional averaging of the R sets is performed to combine the training results and extract the reference spectrum Rf = {Rf_i}, where each Rf_i results from averaging the R_i's of all the training sounds. Special techniques can be applied to smooth the FFT output spectrum: Moving Average Window (MAW) and Principal Component Analysis (PCA).
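The feature-extraction step can be sketched with numpy. The subsampling, framing and "Add"/"Max" combinations follow the text; the synthetic test tone and parameter defaults are illustrative.

```python
import numpy as np

def spectrum_features(y, N=1024, Sm=4, combine="Add"):
    """Per-file spectrum: subsample by 1/Sm, split into N-sample frames,
    take FFT magnitudes and combine the N_f frames by averaging ("Add")
    or by a per-bin maximum ("Max")."""
    x = y[::Sm]                                # keep 1 of Sm consecutive samples
    n_frames = len(x) // N                     # N_f full frames
    frames = x[:n_frames * N].reshape(n_frames, N)
    X = np.abs(np.fft.fft(frames, axis=1))     # magnitude spectrum per frame
    return X.mean(axis=0) if combine == "Add" else X.max(axis=0)

y = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)  # 1 s synthetic tone
R = spectrum_features(y, N=1024, Sm=4, combine="Add")
print(R.shape)  # (1024,)
```

Averaging the per-file R sets of the training sounds of one class then yields the reference spectrum Rf for that class.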
In MAW, each output X_i is substituted by X'_i = (1/W) Σ_(k=−W/2..W/2) X_(i+k), where W is the window used for the averaging.
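The MAW smoothing above is a moving average and can be implemented as a convolution; handling the edge bins with zero padding (via `mode="same"`) is an implementation choice, not prescribed by the text.

```python
import numpy as np

def maw(X, W=7):
    """Moving Average Window smoothing of an FFT magnitude spectrum:
    each bin becomes the mean of the W bins centered on it."""
    kernel = np.ones(W) / W
    return np.convolve(X, kernel, mode="same")  # zero padding at the edges

X = np.array([0., 0., 10., 0., 0., 0., 0.])
print(maw(X, W=3))  # the spike at bin 2 is spread over bins 1-3
```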
PCA is a useful statistical technique, commonly used for finding strong patterns in high-dimensional data. These patterns do not only allow the interpretation of the data in a more meaningful way; they can also be used to compress the data. Assume a matrix X_(r×n) = [X_c1 X_c2 . . . X_cn] with measurements, where each X_ci is a sub-vector of r elements, which is needed for the calculation of the principal components.
The covariance matrix R_c of the X_ci vectors has to be calculated in order to measure the variability of the data. The Singular Value Decomposition (SVD) of the r×r matrix R_c is R_c = U·S·V^T, where S is a diagonal matrix with the singular values of R_c on its diagonal, V is a matrix whose columns are the right singular vectors of R_c, and U = [U_1 U_2 . . .] is a feature vector (matrix of vectors) whose columns are the left singular vectors of R_c. The first m eigenvectors (ordered by magnitude) of U are preserved in order to create (U_reduced)_(r×m) = [U_1 U_2 . . . U_m], m < r. These elements correspond to the principal components, and m determines the level of smoothing or compression.
The transformed data are obtained by multiplying the transpose of U_reduced with the original data set: Y = (U_reduced)^T·X. The original data are retrieved by X ≈ U_reduced·Y. The original data are recovered with no error if all the eigenvectors are used in the transformation; otherwise, the data are recovered in a lossy way.
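The PCA smoothing described above can be sketched as follows: keep the first m left singular vectors of the covariance matrix, project the data, then reconstruct. The matrix shapes follow the r×n convention of the text; the random data and the value of m are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 100))              # r = 8 dimensions, n = 100 samples
Xc = X - X.mean(axis=1, keepdims=True)     # center each dimension

Rc = np.cov(Xc)                            # r x r covariance matrix
U, S, Vt = np.linalg.svd(Rc)               # singular vectors, ordered by magnitude

m = 2
U_red = U[:, :m]                           # keep the first m principal components
Y = U_red.T @ Xc                           # compressed representation (m x n)
X_rec = U_red @ Y                          # lossy reconstruction (r x n)

print(X_rec.shape)  # (8, 100)
```

With m = r (all eigenvectors kept), X_rec equals Xc exactly; smaller m trades fidelity for smoothing/compression.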
In the experiments conducted, all the combinations between N = 256 or N = 1024, S_m = 1 or S_m = 4, MAW (with W = 7 or 15) or PCA (with m = 2 or 5) or no smoothing filter, and "Add" or "Max" combinations have been tested with the same sound files. The most interesting combinations are the ones presented in Table 1. The classification is performed based on a similarity metric called the Pearson Correlation Coefficient (PCC) [31]. The PCC is estimated between the R set of a new sound file and each Rf_c (the Rf estimated from the training samples of class c):

P_c = Σ_i (R_i − R̄)·(Rf_c,i − R̄f_c) / ( √(Σ_i (R_i − R̄)²) · √(Σ_i (Rf_c,i − R̄f_c)²) ),

where R̄ and R̄f_c are the mean values of the corresponding sets. The P_c coefficient values range from −1 (rising edges of R correspond to falling edges of Rf_c and vice versa) to 1 (perfect match). A P_c equal to 0 implies that there is no linear correlation between the variables.
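The PCC-based classification can be sketched as follows: a new spectrum R is assigned to the class whose reference Rf_c correlates best with it. The class labels and the short example spectra are illustrative.

```python
import numpy as np

def pcc(a, b):
    """Pearson Correlation Coefficient between two spectra (-1 .. 1)."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(R, references):
    """Assign spectrum R to the class whose reference Rf_c maximizes P_c."""
    return max(references, key=lambda c: pcc(R, references[c]))

refs = {"dry":        np.array([1., 3., 5., 3., 1.]),
        "productive": np.array([5., 3., 1., 1., 1.])}
R = np.array([1.2, 2.9, 4.8, 3.1, 0.9])
print(classify(R, refs))  # 'dry'
```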
Let TP, TN, FP and FN be the number of samples recognized as True Positive, True Negative, False Positive and False Negative, respectively, for a specific class. The Sensitivity (or Recall or True Positive Rate), the Specificity (or True Negative Rate), the Precision and the Accuracy are defined as:

Sensitivity = TP / (TP + FN),
Specificity = TN / (TN + FP), (16)
Precision = TP / (TP + FP),
Accuracy = (TP + TN) / (TP + TN + FP + FN).
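The four metrics follow directly from the confusion counts; the counts below are illustrative.

```python
def metrics(TP, TN, FP, FN):
    """Standard classification metrics computed from the confusion counts."""
    return {
        "sensitivity": TP / (TP + FN),             # recall / true positive rate
        "specificity": TN / (TN + FP),             # true negative rate
        "precision":   TP / (TP + FP),
        "accuracy":    (TP + TN) / (TP + TN + FP + FN),
    }

m = metrics(TP=45, TN=230, FP=15, FN=10)
print(m)  # precision = 0.75, accuracy ~ 0.917
```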

Experimental Results
A number of cough sound files (300) were retrieved from SoundSnap (soundsnap.com) and labeled according to the information given on this website, initially in the 5 categories shown in Table 2. The population of each category is not constant and depends on the availability in SoundSnap. The duration of the analyzed sounds ranges between 1 and 5 s. Ten (10) representative sound files of each category were used for training, that is, for the extraction of its reference Rf set. Taking into consideration the overall number of samples per sound class, the training set size ranges between 8% and 33% of each class population. In several machine learning applications, the training size is in the order of 3/4 of the data size; increasing the training set size is expected to improve the achieved classification accuracy. All the files were used as a test set in each one of the methods listed in Table 1 and the classification results were used to estimate the TP, TN, FP, FN parameters. Although in large datasets it would be more appropriate to use separate training and test sets, the small number of training samples used here and the dataset size allow us to apply the test to all the available samples. The Sensitivity, Specificity, Precision and Accuracy achieved by each method are listed in Tables 3-6, respectively. We consider the sensitivity and accuracy as the most representative metrics and display these experimental results in Figures 11 and 12, respectively.
The rationale behind the selection of the specific combinations listed in Table 1 is the following: the unfiltered data were tested initially using 1024_s4 and 256_s1. As can be seen from Figures 11 and 12, 1024_s4 achieves better results in most of the cases; for this reason, N = 1024 and S_m = 4 were used in the rest of the combinations tested. When PCA was applied, better results were achieved with 5 principal components instead of 2. Similarly, when MAW was employed, a window of W = 7 samples achieved better results than W = 15. From these results it can be concluded that the classification results are not favored by heavy smoothing. All of the aforementioned methods concerned the extraction of Rf using the "Add" combination. The "Max" combination was applied only to the methods that achieved the best results with the "Add" combination, that is: 1024_s4, pca128_5 and maw7.
The 5 categories listed in Table 2 were further split into the 9 categories of Table 7. Ten training samples were used again, except for the case of Child-Productive: since only 6 samples were available in this case, all of them were used both for training and test. Figures 13 and 14 display the sensitivity and accuracy results for this case.

Comparison
Cough features have been exploited in the diagnosis of several diseases of the respiratory system. Until recently, patients reported the severity of their cough to the supervising doctor; this approach is of course not reliable and often subjective [32]. The sound signal of the cough can reveal critical information about the patient condition [33]. Continuous audio signal collection provides important information, and the 24 h cough count is a metric used to approve an appropriate treatment. Automated cough recording and counting solutions have recently been presented in the literature [34-36]. In [34], the sound is split into segments of 1 s processed by a Fourier transform with a window size of 25 ms and a window hop of 10 ms, and then a periodic Hann window is used to generate the corresponding spectrum. The output of this analysis is input to a convolutional neural network that classifies the sounds as cough or something else: noise, throat clearing, sneeze and so forth. Despite the fact that the sound is affected by several parameters (noise, gender, age, disease, etc.), its recognition as cough is a binary decision and is definitely an easier task than classifying it into the multiple categories of Table 2 or Table 7.
In a more sophisticated approach [37], a novel feature set called intra-cough shimmer jump and intra-cough Teager energy operator jump is used to discriminate between dry and wet coughs. The frequency, the intensity of coughing and the acoustic properties of the cough sound were analyzed in [38], but no classification results are presented. Table 8 summarizes the results achieved by these referenced approaches. In the first row, the classification into the 5 categories of Table 2 is used and, in the second row, the classification into the 9 categories of Table 7. If the various classification methods examined on the Coronario platform are compared according to their accuracy, MAW7 achieves a slightly better average accuracy for both Tables 2 and 7; consequently, the results of MAW7 are listed in Table 8. The overall accuracy of the tested classification methods is acceptable, although the average sensitivity is not particularly high. In particular, when the classification has to be performed in the categories of Table 7, there are some cough classes that achieve a sensitivity lower than 50%, which is clearly not acceptable. It should be stressed that the sensitivity/accuracy results presented were achieved using only 10 training samples per cough class; the use of a larger training dataset is expected to lead to better accuracy. Moreover, the accuracy results of the other approaches concern binary decisions (cough or noise [34-36], dry or productive cough [37]) and for this reason it is no surprise that this discrimination is accomplished with a higher success rate in some referenced approaches.

Discussion
The already implemented sound processor can be used to classify cough or respiratory sounds related to the infection. The aim of the experiments conducted in the previous section is to demonstrate how the developed platform can host several filtering, signal processing and classification methods in order to exploit the medical information exchanged. Nonetheless, the classification accuracy results obtained are in some cases quite good compared to the referenced approaches listed in Table 8, taking into consideration that the referenced approaches concern binary decisions; it is obvious that the classification of a sound file into one of the multiple supported classes is a much harder problem. Furthermore, the classification accuracy is expected to be much higher if a larger portion of the dataset is used for training. Image processing will also be supported in the next version of the system, to monitor, for example, skin disorders, as already studied in our previous work [39]. Additionally, the image processing methods that will be employed may be used to scan the results of, for example, computed tomography, X-ray or ultrasound imaging and extract useful features for the diagnosis of an infection.
In the current version, the supervisor doctor is responsible for the diagnosis based on the data retrieved from the UserApp and the IoT medical sensors. Nevertheless, AI deep learning tools such as TensorFlow or Caffe can be used to train models that assist the automation and reliability of the diagnosis. Such a neural network model can be trained using the information exchanged between the Coronario modules (sensor values, user information, geolocation data, etc.). The outputs of this model could be discrete decisions about the treatment or suggestions about the next steps the patient has to follow. Such a trained neural network model can easily be deployed on the side of the supervisor doctor, where all the information is available, or on the side of the eHealth sensor infrastructure. The latter option is useful if some alerts have to be generated locally based on the sensor values; this process can complement the alert rules already defined in the MP file, as discussed in Section 3.3.
Several privacy and security issues have been addressed in order to make the system compliant with the General Data Protection Regulation (GDPR). First of all, encryption is employed during the exchange and the storage of data. The information exchanged can be decoded at the edge of the Coronario platform by the users (e.g., patients, supervisor doctors or researchers) based on the certificates they own. The encrypted data are unreadable by anyone who does not own the appropriate keys/certificates (e.g., while they reside in the cloud). Only authenticated users have access to the stored information, with different permissions. The authorized doctors can download information that concerns only their patients. The researchers can retrieve data from a pool of sensor values that are anonymized and accompanied by permissions for research use. The patients can download the MP files and the messages sent to them by their supervisor doctor. The cloud administrators are not able to decrypt and read the stored information, although the Service Level Agreement (SLA) between the cloud provider and the healthcare institutes would in any case prohibit such access.
Only the necessary information is stored. Local processing is encouraged to minimize the risk of data exposure during wireless or even wired communication. This is the reason why a local Sensor Controller (Gateway), such as the one used in [29], is proposed for the sensor infrastructure of the Coronario platform. The alerts defined in the medical protocol can check various parameters locally and perform certain actions, avoiding moving sensitive data to the cloud. Several cloud facilities have been employed to guarantee the protection of the stored data: different access privileges, alerts at the cloud level that warn about attempts by unauthorized personnel to access the data, deletion of the data from the cloud as soon as possible and so forth. For example, the sensor values can be deleted immediately after they are read by the authorized supervisor. The geolocation data show the places visited by the user and can be deleted after a safe interval of, for example, 14 days. Finally, the data used for research, such as audio or image files, are anonymous and are used only after the consent of the patient.

Conclusions
A symptom tracking platform for Covid-19 and other similar infections has been developed. It is suitable for pre- or post-hospitalization monitoring as well as for the monitoring of vulnerable population. Alert rules, sensor sampling scenarios and the questionnaire structure are dynamically determined. An editor is available for sound classification, assisting the research on infection symptoms (e.g., cough or respiratory sounds). The condition of a patient is reliably assessed since information from several sources is taken into account. A number of sound signal filtering and analysis methods have already been employed and tested for the classification of cough sounds with an accuracy in the order of 90%.
Future work will focus on testing the developed platform in real environments, measuring its effectiveness in the protection of public health as well as its financial impact. We will also extend the ScientificApp module to allow experimentation with sound and image classification using larger datasets and different machine learning methods.