Speech-Based Support System to Supervise Chronic Obstructive Pulmonary Disease Patient Status

Featured Application: This work represents a proof of concept for COPD patient supervision, which can lead to a potential application in home monitoring for clinicians working in the respiratory field.

Abstract: Patients with chronic obstructive pulmonary disease (COPD) show voice changes with respect to the healthy population. However, two issues remain to be studied: how long-term speech elements such as prosody are affected; and whether physical effort and medication also affect the speech of patients with COPD and, if so, how these changes can influence an automatic speech-based system for detecting COPD measurements. The aim of the current study is to address both issues. To this end, long read speech from COPD and control groups was recorded, and the following experiments were performed: (a) a statistical analysis over the study and control groups to analyse the effects of physical effort and medication on speech; and (b) an automatic classification experiment to analyse how different recording conditions can affect the performance of a COPD detection system. The results show that speech, especially prosodic features, is affected by physical effort and inhaled medication in both groups, though in opposite ways, and that the recording condition plays a relevant role when designing an automatic COPD detection system. The current work takes a step forward in the understanding of speech in patients with COPD and, in turn, in the research on its automatic detection to help professionals supervise patient status.


Introduction
Acoustic speech features, especially fundamental frequency (f0) and voice quality parameters, are affected in patients suffering from respiratory diseases such as chronic obstructive pulmonary disease (COPD) or asthma [1][2][3]. Some works have reported relevant correlations of f0 and voice quality parameters with the smoking index, the forced expiratory volume in one second (FEV1), and the asthma degree [1,2], as well as increased voice quality parameters or breath sounds in patients with pneumonia and COPD [4][5][6][7], so that relevant differences appear between healthy and COPD subjects in acoustic and perceptual voice parameters [8]. Voice changes are also encountered in respiratory sounds in infants [9], bronchodilator response effects in asthma tracheal sounds [10][11][12], obstructive sleep apnea [13], and voice rise in dyspnea [14].
Most of these studies focus on the analysis of small segments of controlled speech, or even only on sustained vowel sounds. However, natural speech, either read or conversational, provides information not contained in short speech segments, such as prosodic elements [15]. Further studies have dealt with the automatic detection of the breathing signal during conversational speech from low-level spectral features [16], the prediction of exacerbations in high-risk patients using the COPD assessment test (CAT) [17], the identification of clinical factors that modulate the risk of progression to COPD among asthma patients using electronic medical records [18], or the prediction of non-COPD/COPD patients from natural speech by means of pauses and voice quality parameters [19]. All these works represent a step forward in the automatic assessment of COPD measurements, which can greatly benefit the task of clinicians. Nevertheless, a more fine-grained detection is needed, to see to what extent speech features can predict the exacerbations of COPD associated with impairments reflected in parameters such as FEV1, the mMRC (modified Medical Research Council) dyspnea scale [20,21], and the CAT questionnaire [22], and to what extent different recording conditions, such as physical effort or medication intake, can affect the widely reported voice variations and thus the accuracy in the automatic detection of COPD measurements.
The main contributions of this paper are:
1. The analysis of prosodic features in long read speech, apart from low-level acoustic features. Based on the hypothesis that prosodic features are also vulnerable to changes in COPD patients, long speech analysis is needed to extract prosodic parameters apart from mere acoustic features.
2. The analysis of how these speech features are affected by physical effort and after medication intake. Based on the hypothesis that physical effort and inhaled medication can affect oral airflow and air pressure during speech production, the speech parameters will then be modified under these conditions.
3. The analysis of to what extent COPD/non-COPD subjects can be predicted from speech, and how the variations of speech that appear in different recording conditions affect the automatic prediction of COPD measurements. If COPD is manifested through speech, one should be able to predict, to a certain extent, whether an individual suffers from COPD or not. Moreover, if knowledge on COPD measurements can be inferred from speech, and if speech features change due to different recording conditions, then the way in which an automatic detection system is defined from speech recordings is crucial.

Materials and Methods
Forty COPD patients and 19 non-COPD control subjects were recruited in a primary health care centre (CAP) to collect voice samples in different conditions: at rest, after physical effort, and after an inhaled medication intake (only COPD patients). From the initial 40 study and 19 control subjects, valid data were collected from only 35 study (288 recordings) and 13 control individuals (124 recordings), for several reasons: insufficient number of voice samples, recordings performed incorrectly, lack of availability or interest to repeat or complete the sessions, spirometry test missing, or FEV1 results not meeting the inclusion criteria (<0.8). Speech features were then extracted from the recordings and used for two purposes: to perform a statistical analysis of how they varied between different conditions, and to build an automatic detection system that predicts FEV1 values from speech, in order to infer how such variations can affect a classification experiment.

Sampling and Study Design
This is an observational study. Participants were selected from the patient population attending CAP Comte Borrell in Barcelona and were divided into study and control groups. Due to the exploratory character of the pilot study, no formal sample size calculation was performed. The study group sample size was chosen based on experimental criteria, selecting a total of 40 patients. For the control group, a minimum size of 10 patients was chosen. While 50 patients are generally considered a small group, numerous studies in the field of voice analysis have shown statistically significant correlations with a sample of 50 individuals or fewer [1,10,11,13].

Inclusion and Exclusion Criteria
Inclusion criteria for the study group: people >45 years old with a diagnosis of COPD according to FEV1 [23]. Exclusion criteria: patients with a diagnosis of COPD and home-administered oxygen therapy; patients with pathologies that affect speech; patients with pharmaceutical prescriptions known to influence speech; patients with reduced mobility.
Inclusion criteria for the control group: people >45 years old not diagnosed with COPD, with a value of FEV1/FVC > 0.80. Exclusion criteria were applied as in the study group. Potential candidates were selected by convenience sampling and contacted by phone by family doctors in CAP Comte Borrell. Those who agreed to participate in the study gave their written consent. The ethics committee of the Hospital Clínic in Barcelona approved the study design and protocol (HCB/2018/1190).

Study Variables
For the study and control groups, socio-demographic variables (sex, age) and the following clinical variables related to COPD were collected: for the control group, FEV1 before bronchodilator and smoking habits and history; for the study group, FEV1 before/after bronchodilator, smoking habits and history, mMRC scale, CAT questionnaire, COPD exacerbations/year, and time since the last episode. FEV1 is used to assess COPD and monitor its progression. The mMRC scale for COPD ranges from grade 0 to grade 4 based on the severity of dyspnea symptoms [24]. The CAT questionnaire measures the impact of COPD on daily life and consists of eight questions rated from 0 to 5, so that the final sum ranges from 0 to 40 [25].
The variables of the study were gathered through the analysis of the health information system (ECAP database) or while interviewing the patient. To rule out undiagnosed COPD, the control group participants underwent a spirometry test performed by trained personnel without using the bronchodilator. During the first visit, the SpO2 was recorded for each of the subjects before and after a test voice recording.

Voice Recording Conditions
All participants in the study received a voice recorder (Model EVIDA L69), a hard copy user manual, a document with instructions on conditions for recordings, a sample text consisting of two long paragraphs to be used during the recordings, and a recordings table to keep record. Several test recordings were done to ensure that participants understood the complete procedure. The participants made the recordings at home in a completely autonomous way, and they returned the recorder once finished.
Each participant in the study was requested to make ten speech recordings under different conditions and on different days. In the study group, the recording conditions were classified according to the intake of medication and the physical exertion performed. Recordings were requested to be made on three different days. The recording conditions were: before using the inhaler (days 1-2-3), one hour after using the inhaler (days 1-2-3), at rest (days 1-2), and at rest and five minutes after exercise (days 1-2). Participants with no inhaler prescription were requested to perform only the recordings related to rest and exercise conditions. In the control group, the voice recordings were requested to be made on five different days, with two recording conditions for each day: at rest, and five minutes after doing exercise. The physical exercise was also adapted to the fitness of the subject.

Extraction of Speech Features
The recordings were performed with portable recorders that provided files in DVI ADPCM (Digital Video Interactive adaptive differential pulse-code modulation) format, with a sampling rate of 48 kHz and a bit rate of 192 kbps. The original files were converted into PCM WAV (pulse-code modulation/waveform audio file format) for further processing with the Praat speech analysis software [26].
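As a quick consistency check of the figures above, the stated sampling rate and bit rate agree with each other if the recorder stores 4 bits per sample in a single channel. Both of these values are assumptions (they are typical of ADPCM voice recorders but are not stated in the text), and block headers are ignored. The helper name `bit_rate_kbps` is illustrative only.

```python
def bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Raw audio bit rate in kbit/s, ignoring container and block-header overhead."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Assumption: 4-bit ADPCM samples, mono.
adpcm_kbps = bit_rate_kbps(48_000, 4)

# Assumption: 16-bit linear PCM after conversion to WAV, mono.
pcm_kbps = bit_rate_kbps(48_000, 16)
```

Under these assumptions the compressed stream comes out at 192 kbit/s, matching the reported value, and the converted PCM stream is four times larger.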
Following the existing literature on COPD analysis [1,19,27], f0, voice quality parameters such as jitter and shimmer, and pauses were included. Prosodic information was also included, accounting for the variation of f0 (range and slope) and for rhythm features such as speech rate, articulation rate, and syllable duration. For the extraction of f0 and its related features, the auto-correlation method in Praat was used with an interval of 10 ms and a Hanning window of length 40 ms. Duration and speech rate features were extracted by adapting the Praat script found in [28], without relying on transcriptions, and were thus language independent.
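The rhythm features named above can be illustrated with a minimal sketch. The definitions used here are common conventions and are assumptions, not taken from the adapted Praat script: speech rate counts syllables over the whole recording, articulation rate counts them over phonation time only, and pause ratio is the share of the recording spent in pauses. The syllable count and pause durations would come from an upstream detector; the function and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RhythmFeatures:
    speech_rate: float        # syllables per second, pauses included
    articulation_rate: float  # syllables per second, pauses excluded
    syllable_duration: float  # mean seconds per syllable over phonation time
    pause_ratio: float        # fraction of the recording spent in pauses


def rhythm_features(n_syllables: int,
                    total_duration_s: float,
                    pause_durations_s: Sequence[float]) -> RhythmFeatures:
    """Compute rhythm features from a syllable count and detected pauses."""
    if n_syllables <= 0 or total_duration_s <= 0:
        raise ValueError("syllable count and duration must be positive")
    pause_time = sum(pause_durations_s)
    phonation_time = total_duration_s - pause_time
    if phonation_time <= 0:
        raise ValueError("pauses cannot cover the whole recording")
    return RhythmFeatures(
        speech_rate=n_syllables / total_duration_s,
        articulation_rate=n_syllables / phonation_time,
        syllable_duration=phonation_time / n_syllables,
        pause_ratio=pause_time / total_duration_s,
    )


# Illustrative values only: 120 syllables in a 40 s recording with three pauses.
example = rhythm_features(120, 40.0, [2.0, 3.0, 5.0])
```

With these made-up inputs, more or longer pauses lower the speech rate and raise the pause ratio while leaving the articulation rate unchanged, which is why the two rates are reported separately.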

Statistical Analysis
To analyse how speech was modified by physical effort and medication, mean values and standard deviations were obtained for each feature, each subject, and each recording condition. First, a Shapiro-Wilk test was performed to check the normality of the distributions, obtaining p values from 0.166 to 0.999 and W values from 0.771 to 0.999. Then, a one-way paired t test with a confidence level of 95% was performed to obtain p values and check whether the differences in speech features between groups were statistically significant.
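The paired comparison can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it computes the paired t statistic and, for the special case of two degrees of freedom (three paired recordings), uses the closed-form Student t distribution to obtain a two-sided p value. The f0 values in the example are invented for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(before: Sequence[float], after: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(after, before)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def p_two_sided_df2(t: float) -> float:
    """Two-sided p value for df = 2, from F(t) = 1/2 + t / (2 * sqrt(t^2 + 2))."""
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


# Invented mean f0 values (Hz) for one subject on three days.
f0_rest = [100.0, 102.0, 98.0]
f0_effort = [110.0, 113.0, 107.0]
t_stat, df = paired_t(f0_rest, f0_effort)
p_value = p_two_sided_df2(t_stat)
```

A one-sided p value, when the direction of the change is fixed in advance, is half of the two-sided value returned here. For other degrees of freedom a statistics library would be needed, since the closed form above holds only for df = 2.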

Classification Experiments
Machine learning-based classification was used to predict a class (FEV1) given a set of data points (speech features). FEV1 was split into the intervals specified by the Global Initiative for Chronic Obstructive Lung Disease (GOLD) according to the severity degree [29], as shown in Table 1.

Table 1. Intervals of FEV1 values according to their severity degree (GOLD classification). Columns: Class, Degree, Range.

The classification experiments were carried out in two different scenarios:
Scenario 1: To discern between non-COPD/COPD individuals (only in the rest condition, since non-COPD subjects are not taking medication).
Scenario 2: To discern between different COPD degrees in patients. The most severe patients (mMRC = 4 and FEV1 < 30%) were excluded because of their low representation, so only subjects from GOLD classes 1 and 2 were analysed. Here, two different classification experiments were tested: (a) using only in-rest recording conditions; (b) using all recording conditions, in two different subcases: (1) experiments using only speech information, and (2) experiments providing additional information on the recording condition and subject medication.
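For orientation, the mapping from FEV1 to a GOLD class can be sketched as below. The thresholds are the widely published GOLD spirometric grades expressed as FEV1 in percent of the predicted value; they are given here as an assumption about the contents of Table 1, and the function name `gold_grade` is hypothetical.

```python
def gold_grade(fev1_percent_predicted: float) -> int:
    """Map FEV1 (% of predicted) to a GOLD grade from 1 (mild) to 4 (very severe).

    Assumed thresholds: >= 80 -> 1, 50-79 -> 2, 30-49 -> 3, < 30 -> 4.
    """
    if fev1_percent_predicted < 0:
        raise ValueError("FEV1 percentage cannot be negative")
    if fev1_percent_predicted >= 80:
        return 1
    if fev1_percent_predicted >= 50:
        return 2
    if fev1_percent_predicted >= 30:
        return 3
    return 4


def in_scenario_2(fev1_percent_predicted: float) -> bool:
    """Keep only the classes analysed in scenario 2 (GOLD 1 and 2)."""
    return gold_grade(fev1_percent_predicted) in (1, 2)
```

Under these thresholds, a subject with FEV1 below 30% falls in the most severe class, which matches the exclusion described for scenario 2.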
The experiments were performed with the Weka workbench [30]. Among the classification algorithms included in Weka, Random Forest was chosen because it performed best in terms of accuracy. The whole set of instances was used in a 10-fold cross-validation to get the maximum number of instances in the classification task [19], ensuring that subjects used in the training folds were not present in the testing fold, and using the following hyperparameters: number of iterations (-I) 100, bag size percentage (-P) 100, seed (-S) 1, and a maximum tree depth of 0 (unlimited).

Results
Table 2 shows the main characteristics of participants from the study and control groups.

Table 2 (partial):
(row label missing) | 1.1 (0.9) | n/a | n/a
CAT score, average (SD) | 11.3 (7.5) | n/a | n/a
Number of exacerbations/year, average (SD) | 1.3 (1.1) | n/a | n/a

Table 3 shows the statistically significant changes in each of the speech features due to physical effort in both study (S) and control (C) groups, and Table 4 shows the statistically significant changes in each of the speech features due to medication in the study group. Corresponding FEV1, mMRC, and CAT values are provided along with the statistical measures. In both tables, the smoking habits information has been included next to the subject ID: smoker (s), non-smoker (ns, never smoked), and ex-smoker (es). Only p values < 0.05 are shown for simplicity. Positive (+) and negative (−) changes in the mean values between rest and physical effort/medication conditions are shown after the p values. In the control and study groups, the significant differences after doing physical effort showed absolute t values ranging from t(2) = 2.13 to t(2) = 3.96, and from t(2) = 2.38 to t(2) = 24.10, respectively. In the study group, the significant differences after taking medication showed t values ranging from t(2) = 3.10 to t(2) = 24.51. The percentage of subjects in which each feature exhibited a statistically significant modification is also shown.
Values with strongest statistical significance (p < 0.010) are marked in bold.
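The subject-wise 10-fold cross-validation used in the classification experiments, in which no subject contributes recordings to both the training and the testing folds, can be sketched as follows. This is an illustration of the idea in plain Python rather than the Weka configuration actually used; the function name and the subject identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def subject_wise_folds(subject_ids: Sequence[str],
                       k: int = 10,
                       seed: int = 1) -> List[Tuple[List[int], List[int]]]:
    """Split recording indices into k (train, test) pairs without subject overlap.

    subject_ids[i] is the subject who produced recording i.
    """
    subjects = sorted(set(subject_ids))
    if k < 2 or k > len(subjects):
        raise ValueError("k must be between 2 and the number of subjects")
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Assign whole subjects to folds in round-robin order after shuffling.
    fold_of = {subject: i % k for i, subject in enumerate(subjects)}
    folds = []
    for fold in range(k):
        test = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
        train = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
        folds.append((train, test))
    return folds


# Hypothetical layout: 35 subjects with 8 recordings each.
ids = [f"S{n:02d}" for n in range(35) for _ in range(8)]
folds = subject_wise_folds(ids, k=10, seed=1)
```

Assigning whole subjects to folds, rather than individual recordings, prevents the classifier from being evaluated on a voice it has already seen during training.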

Voice Samples and Statistical Analysis
To avoid data imbalance in scenario 1, the non-COPD instances were oversampled so that both groups had the same size, giving a baseline accuracy of 50%. A total of 314 instances (157 for each class) were used to classify COPD/non-COPD classes from the set of eleven speech features. An automatic classification using a random forest (RF) with 10-fold cross-validation was then performed using only the instances recorded in rest conditions, achieving a COPD detection accuracy of 72.0% over the 50% baseline. Sensitivity and specificity were 70.0% and 74.5%, respectively.

In scenario 2, after excluding the most severe cases, the average values of FEV1 before the bronchodilator (preFEV1) and after the bronchodilator (postFEV1) were computed and used to set a threshold that defines the classes in the classification experiments, ensuring the maximum balance of instances per class. For preFEV1, class1 was assigned to instances with preFEV1 < 75%, and class2 to instances with preFEV1 ≥ 75%. Similarly, the threshold for postFEV1 was fixed at 77%. In total, 222 instances were used in case (a), rest conditions, and 288 instances in case (b), all recording conditions, to classify two classes for preFEV1 and two classes for postFEV1 from the set of speech features. In a third experiment, the classifier was fed with additional information on which recording condition corresponds to each speech sample, and whether the patient is taking medication or not. As in scenario 1, the groups are balanced and the baseline is 50% (Table 5). Considering class1 as the negative condition and class2 as the positive condition, the following measurements are shown: accuracy (percentage of correct instances), sensitivity (true positive rate), and specificity (true negative rate).

Table 5. Accuracy (Acc), sensitivity (Sens) and specificity (Spec) values for the automatic classification of FEV1 given a reference threshold.
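The class definition by threshold and the three reported measurements can be made concrete with a short sketch. It assumes, as stated above, that class2 is the positive condition; the FEV1 values and predictions in the example are invented, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def label_by_threshold(fev1_percent: float, threshold: float) -> str:
    """class1 below the threshold, class2 at or above it."""
    return "class2" if fev1_percent >= threshold else "class1"


def binary_metrics(y_true: Sequence[str],
                   y_pred: Sequence[str],
                   positive: str = "class2") -> Dict[str, float]:
    """Accuracy, sensitivity (TP rate) and specificity (TN rate)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Invented preFEV1 values (%), labelled with the 75% threshold.
pre_fev1 = [60.0, 70.0, 74.9, 75.0, 80.0, 90.0]
truth = [label_by_threshold(v, 75.0) for v in pre_fev1]
predicted = ["class1", "class2", "class1", "class2", "class2", "class1"]
scores = binary_metrics(truth, predicted)
```

With balanced classes, as in the experiments described here, a classifier that always predicts the same class reaches 50% accuracy, which is why that value serves as the baseline.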

Discussion
From the results in Tables 3 and 4, and based on the experimental dataset, it can be inferred that all speech features exhibit statistically significant differences between the rest condition and the physical effort/medication conditions, with the percentage of affected subjects ranging from 4% to 38% (except for speaking rate) depending on the speech feature and the group of analysis. No significant correlation is observed, however, between the number of relevant modifications in speech features and the COPD variables (FEV1, mMRC and CAT). The significant modification in f0 values after taking the medication was further validated by computing the correlation of such changes (i.e., f0 after medication minus f0 before medication) with four different COPD indicators: the FEV1 value without bronchodilator, the FEV1 value with bronchodilator, the modified Medical Research Council (mMRC) scale, and the COPD assessment test (CAT). The results showed that high FEV1 values and low mMRC and CAT scores, which are related to healthier subjects, were clearly associated with smaller f0 modifications. This means that f0 modifications after medication are larger in subjects with more severe COPD.
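The correlation check described above can be sketched with a plain Pearson coefficient between the f0 change and a COPD indicator. The text does not state which correlation coefficient was used, so Pearson is an assumption, and the numbers below are invented solely to show the direction of the reported tendency (higher FEV1, smaller f0 change).

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally sized samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("samples must not be constant")
    return cov / math.sqrt(var_x * var_y)


# Invented values: FEV1 (%) per subject and the size of the f0 change (Hz),
# i.e. |f0 after medication - f0 before medication|.
fev1 = [45.0, 55.0, 65.0, 75.0, 85.0]
f0_change = [12.0, 10.0, 7.0, 5.0, 2.0]
r = pearson_r(fev1, f0_change)
```

A strongly negative coefficient against FEV1, together with positive coefficients against mMRC and CAT, would correspond to the tendency reported in the text.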
Regarding the effect of physical effort, in the study group the least significant differences are observed in the prosodic features related to rhythm, and the most significant differences are in f0 and shimmer. Effort tends to be more relevant in the control group, with voice quality, f0, f0 range, and pause ratio being the most affected speech features. Regarding the effect of medication, the greatest effect is observed on the speaking rate.
The effect of physical effort differs significantly between the study and control groups. While a larger effect would have been expected in the COPD group due to their pulmonary and respiratory impairments, the results show the opposite. This could be explained by the fact that COPD patients might not be able to make as large an effort as the healthy group, so the resulting physical effort is lighter and so is its effect; or perhaps their voice parameters are always impaired, so no differences are perceived before and after exercise. A detailed look feature by feature reveals that physical effort in general causes higher f0 values, more pauses, longer syllables, a more flattened intonation (lower values of f0 range and slope), lower voice quality features (probably due to a lower air pressure), and lower speech rates. The results also suggest that, unlike physical effort, medication produces a less flattened intonation (f0 range and slope) and faster rhythm characteristics: higher speech rates, fewer pauses, and shorter segment durations. Contrasting the results with the smoking habits of the subjects (current smoker (s), never smoked (ns), and ex-smoker (es)) shows no direct correspondence with the number and the degree of changes in speech features, in either the physical effort or the medication conditions. Even the two smokers of the control group behave similarly to each other and do not differ from the non-smoking subjects.
The differences encountered can be relevant when designing a COPD prediction experiment, and the following conclusions can be inferred from Table 5. First, preFEV1 can be detected more accurately than postFEV1 when only the speech information extracted from the rest condition is used. This makes sense, since the in-rest samples were recorded without any kind of medication, while postFEV1 is obtained after use of the bronchodilator, which can cause changes not reflected in the corresponding speech samples. Second, when all three conditions are used, the detection of both FEV1 values worsens, since the aim is to predict the same value (preFEV1 or postFEV1) from speech samples that show a degree of variability due to the different recording conditions. In this case, however, the difference between the prediction of preFEV1 and that of postFEV1 is not significant. Third, when speech samples from different conditions are used and the classifier is also given information about the recording condition of each speech sample and whether the patient is usually taking medication, the accuracy increases markedly. This can be explained by the fact that the system is provided with information about the source of speech variation, which it can therefore learn. Furthermore, patients taking medication are usually associated with lower FEV1 values, which also helps the system to discern between different COPD degrees. Last, sensitivity and specificity remain similar for each set of experiments, except for the last case (all conditions plus additional information), in which the rate of true positives detected (sensitivity) clearly outperforms the rate of true negatives (specificity).

Conclusions
The presented work explores the effect of physical exertion and medication on speech features in subjects with COPD. The effects of physical effort have been compared to those produced in a control group. The results have shown that several speech features, ranging from f0 and voice quality to prosodic features based on intonation and rhythm, might undergo significant changes under both physical effort and inhaled medication, and that the differences between the physical effort and medication effects are consistent in intonation and rhythm prosodic features. While effort causes a more flattened intonation and slower rhythm characteristics, medication seems to help patients to achieve more expressive speech and a faster rhythm, reflected as higher speech rates, a smaller number of pauses, and shorter segment durations.
The current study also explored the performance of a classification system to discern between COPD/non-COPD subjects, and to predict FEV1. Very few studies deal with the classification of COPD variables. The results reached a classification accuracy of up to 72% over a baseline of 50%, which is comparable to, and even outperforms, the results obtained in previous studies such as [19], where COPD and non-COPD individuals were found differentiable with an accuracy of 68%. Moreover, beyond the specific accuracy values obtained, it was shown that modifications in speech feature values due to different conditions can affect the performance of an automatic COPD detection system, so that its design regarding the recording conditions is important for successful performance.
This work represents a proof of concept for COPD patient supervision using machine learning algorithms and takes a step forward in the understanding of COPD and in the automatic detection of COPD variables from speech, which could be useful for clinicians working in the respiratory field.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy and ethical issues.