Low-Power Wearable Respiratory Sound Sensing

Building upon the findings from the field of automated recognition of respiratory sound patterns, we propose a wearable wireless sensor implementing on-board respiratory sound acquisition and classification, to enable continuous monitoring of symptoms, such as asthmatic wheezing. Low power consumption of such a sensor is required in order to achieve long autonomy. Considering that the power consumption of its radio is kept minimal if transmitting only upon (rare) occurrences of wheezing, we focus on optimizing the power consumption of the digital signal processor (DSP). Based on a comprehensive review of asthmatic wheeze detection algorithms, we analyze the computational complexity of common features drawn from short-time Fourier transform (STFT) and decision tree classification. Four algorithms were implemented on a low-power TMS320C5505 DSP. Their classification accuracies were evaluated on a dataset of prerecorded respiratory sounds in two operating scenarios of different detection fidelities. The execution times of all algorithms were measured. The best classification accuracy of over 92%, while occupying only 2.6% of the DSP's processing time, is obtained for the algorithm featuring the time-frequency tracking of shapes of crests originating from wheezing, with spectral features modeled using energy.


Introduction
Asthma is one of the most common chronic diseases, affecting more than 300 million patients worldwide. Long-term disease management is required in order to maintain the life quality of asthmatic patients and to prevent the progression of the disease. Management mainly consists of adherence to a prescribed medication plan and avoidance of asthmatic attack triggers. The occurrence of symptoms, such as "asthmatic wheezing" in respiratory sound, indicates a low level of control over the chronic disease [1].
Medical devices for the quantification of wheezing have recently appeared on the market [2]. The devices, operating on demand in handheld form or overnight in Holter form, were found to be useful in clinical trials for the diagnosis of asthma in children during bronchial challenge tests [3], for monitoring the response to therapy [4] and for the diagnosis of nocturnal asthma [5]. Nevertheless, the current practice of long-term asthma management still lacks a low-cost, wearable sensing system that empowers patients and caregivers to continuously track the intensity of symptoms on their own. The advancement of low-power electronic technologies and the advent of smartphones have enabled the design of sensing systems consisting of unobtrusive wearable sensors, which measure physiological signals, and a smartphone, which serves as a gateway and an interface for feedback to the patient [6][7][8][9].
The concept of such a sensing system for the detection of asthmatic wheezing is shown in Figure 1. The battery-powered sensor node is worn on the skin surface. It consists of an acoustic sensor (microphone or accelerometer), a signal conditioning circuit, an analog-to-digital converter (ADC), a digital signal processor (DSP) and a radio module communicating with the smartphone. A respiratory sound analysis algorithm performing real-time detection of wheezing is executed on the DSP on board the sensor node. Low power consumption of the wearable, size-constrained wheeze detection sensor node is required in order to achieve long autonomy. In [10], the power consumption of such a sensor node was profiled, identifying the DSP and the radio module as the main consumers. The context of the medical application and the use of a smartphone as a peer device narrow the choice of radio modules to those compliant with IEEE 802.15.1 (Bluetooth) and IEEE 802.15.4 (ZigBee), setting the boundaries of the radio power consumption [11]. Considering that the radio power consumption is kept minimal if transmitting only upon (rare) occurrences of wheezing, we focus on optimizing the power consumption of the DSP.
The main hardware prerequisites for low DSP power consumption are: (a) architectural features enabling efficient code execution (the number of instructions per clock cycle); (b) a high ratio of the processing speed to the power consumed in the active state (millions of instructions per milliwatt); and (c) low power consumption in non-active states (standby, sleep, etc.) [10]. Following these guidelines, a Texas Instruments TMS320C5505 [12] 16-bit fixed-point audio/speech processor, featuring a fast Fourier transform (FFT) unit, was chosen for this study.
In software, the DSP power consumption can be lowered by shortening the wheeze detection algorithm's execution time, thereby minimizing the portion of time spent in the active state. Wheeze detection is performed on features obtained from time-frequency decompositions of respiratory sound: short-time Fourier transform (STFT), cepstral analysis, wavelets or linear prediction [13][14][15][16]. Most numerous are the algorithms using the computationally fast STFT [17][18][19][20][21][22][23]. However, to the best of our knowledge, no work has been done on their mutual comparison in terms of the relation between their classification accuracies and execution speeds.
Thus, the contributions of this article are: (a) the review of STFT-based wheeze detection algorithms; (b) the analysis of the a priori computational complexity of the representative algorithms and their DSP implementation; (c) the test environment for the automated assessment of the classification accuracies of the algorithms running on the DSP; and (d) the analysis of the relation between the accuracies and execution times, for two scenarios of different detection fidelities: (1) the detection of the occurrence (event) of wheezing; and (2) the tracking of the wheeze duration.
The outline of the article is as follows: Section 2 describes the properties and the acquisition of the respiratory sound signal. Section 3 reviews the previous work on the detection of wheezing, focusing on STFT-based algorithms. Section 4 describes the implemented algorithms. Section 5 describes the evaluation methodology: the hardware platform, test signals, testing procedures and metrics. The results are listed in Section 6 and discussed in Section 7, and conclusions are drawn in Section 8.

Acquisition of Respiratory Sounds
Air streaming through airways produces mechanical vibrations, which are conducted through body tissues to the skin surface [24]. The human body's transfer characteristic is of a low-pass type, with the parameters varying with the local tissue. On the skin surface, vibrations are sensed by a transducer, most commonly an electret-condenser microphone. The microphone is coupled to the skin surface through an air cavity formed by a shallow conical or bell-shaped enclosure attached to the skin. As an alternative, accelerometers can be attached directly to the skin surface [25]. Both the frequency characteristic and dynamics of the signal acquired at the output of the transducer are patient dependent and affected by: the measurement location [25], body posture [26], the geometry of the transducer coupling [27] and the transducer design [28].
Usually, the transducer output signal contains heart sounds concentrated below 60 Hz superimposed on the respiratory sound signal [29]. Thus, an analog bandpass filter is commonly used to isolate the respiratory sound signal band. An amplifier with a gain of 40-60 dB is required to adjust the dynamics of the microphone output (order of magnitude: 1-10 mV) to the input range of an analog to digital converter (ADC). Usually, the signal is digitized to 16-bit resolution, with a sampling frequency higher than 5,000 Hz [30].

Time-Frequency Properties of Respiratory Sounds
Normal respiratory sounds are cyclostationary, exhibiting the repetition of respiratory cycles. Each respiratory cycle can be divided into the inspiratory phase, the expiratory phase and the inter-respiratory pause. The respiratory sounds of the inspiratory phase usually exhibit a higher amplitude and are of a longer duration than the sounds during the expiratory phase [31]. Normal respiratory sounds' frequency spectrum is similar to a band-limited colored noise. The majority of the energy of the respiratory sounds acquired over lungs is typically grouped into the 100 to 250 Hz band, while tracheal sounds have a wider frequency band, with components extending to about 1,000 Hz.
Asthmatic wheezing is a time-continuous, tonal adventitious sound occurring during a fraction of the respiratory phase (either inspirium, expirium or both). It can last from tens of milliseconds to several seconds. Wheezing can be modeled as a single- or multi-component harmonic signal superimposed on the frequency spectrum of a normal respiratory sound. The harmonic components originating from wheezing typically appear in the frequency range between 100 and 1,500 Hz [31]. Both the amplitudes and instantaneous frequencies of the harmonic components of wheezing gradually change throughout its duration. In the rest of the text, we assume that the signal is divided into segments short enough to be considered stationary segment-wise, which allows us to track the temporally evolving frequency content of respiratory sounds by STFT.
A comparison of STFT time-frequency decompositions of normal respiratory sounds and respiratory sounds containing wheezing is shown in Figure 2. The harmonic components originating from wheezing appear as continuous frequency peaks elevated against the noise of normal respiration. The peaks of wheezing are localized along the frequency axis and spread in the direction of the time axis.

Review of the STFT-Based Wheeze Detection Algorithms
This section provides an overview of the wheeze detection algorithms based on STFT decomposition. The employed preprocessing steps, feature extraction and classification methods are discussed. Table 1 summarizes the publications reviewed.

Signal Decomposition by STFT
The first step of a wheeze detection algorithm is the time-frequency decomposition of the respiratory sound signal in order to obtain its time-varying frequency content. Discrete STFT is used because of its fast execution, despite its known limitations regarding time-frequency uncertainty.
Discrete STFT X[m, k], defined in Equation (1), calculates the N-point discrete Fourier transform of the signal, x, multiplied by a discrete-time window, w, of length N, where m is the index of the window position as it slides over the signal and k is the frequency bin index. A non-rectangular window function is used to reduce spectral leakage due to the finite window length, N. Commonly, cosine window functions, such as the Hann or Hamming windows, are used. Furthermore, overlap between successive windows may be used to reveal short-duration transients in the signal [15].
Most of the wheeze detection algorithms operate on the power spectrum P[m, k] (Equation (2)) or the amplitude spectrum A[m, k] (Equation (3)). In comparison to the amplitude spectrum, the power spectrum attenuates lower-magnitude frequency components relative to the dominant ones, potentially including the high-frequency harmonics of wheezing, because the square root is omitted. Information from the phase spectrum Φ[m, k] (Equation (4)) may also be used.
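The decomposition steps described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the hop size and the naive O(N^2) DFT are choices made here for readability (a practical implementation would use an FFT), and the Hamming coefficients are the textbook ones rather than anything taken from the referenced implementations.

```python
import cmath
import math


def hamming(n_len):
    """Hamming window of length n_len (one common cosine window)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_len - 1))
            for n in range(n_len)]


def stft_segment(x, m, n_len, hop, window):
    """DFT of the m-th windowed segment; naive DFT, for clarity only."""
    start = m * hop
    seg = [x[start + n] * window[n] for n in range(n_len)]
    return [sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_len)
                for n in range(n_len))
            for k in range(n_len)]


def spectra(X, n_bins):
    """Power, amplitude and phase spectra on the first n_bins bins."""
    power = [abs(X[k]) ** 2 for k in range(n_bins)]
    amplitude = [math.sqrt(p) for p in power]  # square root of the power
    phase = [cmath.phase(X[k]) for k in range(n_bins)]
    return power, amplitude, phase


# Usage: a 1 kHz tone sampled at 8 kHz falls into bin 1000 * 64 / 8000 = 8.
fs, n_len, hop = 8000, 64, 32
x = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(4 * n_len)]
X = stft_segment(x, 1, n_len, hop, hamming(n_len))
power, amplitude, phase = spectra(X, n_len // 2)
peak_bin = max(range(len(power)), key=power.__getitem__)
```

Because the power spectrum is the square of the amplitude spectrum, a component ten times weaker than the dominant peak in amplitude is one hundred times weaker in power, which is the attenuation effect mentioned above.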

Preprocessing of the Spectrum
Preprocessing may include the following steps: equalization of the amplitude (or power) spectrum, spectral denoising and enhancement of the frequency resolution.
Equalization of the spectrum is performed for the compensation of individual patient and measurement site variations. The equalization step is implemented by detrending the spectrum of normal respiratory sound, thus leaving only high-magnitude spectral peaks standing out. Depending on later processing steps, it may be accompanied by the normalization of the spectrum in order to make it independent of respiratory flow. Early wheeze detection algorithms implemented equalization by subtracting the mean value from the power spectrum and, afterwards, normalization by dividing the spectrum by the standard deviation [32]. The equalization step was refined in [17] by dividing the spectrum into equidistant bands and performing band-wise detrending by the mean, followed by normalization using the band-wise standard deviation. In [19], the authors implemented equalization by point-wise detrending using a moving average filter.
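As an illustration of the band-wise variant described above, the following sketch detrends each band by its mean and normalizes it by its standard deviation. The number of bands, the use of the population standard deviation and the handling of a flat band are assumptions of this sketch and are not taken from [17].

```python
import statistics


def equalize_bandwise(power, n_bands):
    """Subtract the band mean and divide by the band standard deviation."""
    band_len = len(power) // n_bands
    out = []
    for b in range(n_bands):
        start = b * band_len
        # The last band absorbs any remainder bins.
        stop = len(power) if b == n_bands - 1 else start + band_len
        band = power[start:stop]
        mean = statistics.fmean(band)
        std = statistics.pstdev(band)
        # A flat band carries no peaks; mapping it to zero is an assumption.
        out.extend((p - mean) / std if std > 0.0 else 0.0 for p in band)
    return out


# Usage: a falling spectral trend with one added peak at bin 5.
power = [10.0 - 0.5 * k for k in range(16)]
power[5] += 20.0
equalized = equalize_bandwise(power, 2)
k_max = max(range(len(equalized)), key=equalized.__getitem__)
```

After equalization, the added peak remains the largest value, while each band has zero mean regardless of the original trend.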
Spectral denoising is used in order to reduce the number of isolated transient peaks (potentially producing false positives), but preserving spectral crests originating from wheezing. Wavelet denoising was proposed for this task by [20]. Some authors applied 2D image processing tools, such as bilateral edge preserving filtering [21] and Laplacian edge enhancing filtering [22], in order to enhance wheezes against the background noise in the spectrograms.
Enhancement of the frequency resolution of the STFT improves the frequency-localization of the spectral peaks originating from wheezing. Zero padding is the most straight-forward approach to increasing the frequency resolution [19]. Spectrogram reassignment and temporal-spectral dominance techniques of enhancement of the STFT time-frequency resolution were compared in [23].
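The effect of zero padding can be illustrated with a short sketch. The tone frequency, segment length and padding factor below are arbitrary values chosen for the illustration, and the naive DFT is used only for clarity. Zero padding interpolates the spectrum on a denser frequency grid; it does not add information to the signal.

```python
import cmath
import math


def dft_magnitudes(segment, n_fft):
    """Magnitudes of the n_fft-point DFT of a segment, lower half of bins.

    Zero padding is implicit: samples beyond len(segment) are zero and
    contribute nothing to the sum. Naive DFT, for clarity only.
    """
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, s in enumerate(segment)))
            for k in range(n_fft // 2)]


def peak_frequency(segment, n_fft, fs):
    """Frequency (Hz) of the largest-magnitude bin."""
    mags = dft_magnitudes(segment, n_fft)
    k_peak = max(range(len(mags)), key=mags.__getitem__)
    return k_peak * fs / n_fft


# Usage: a Hamming-windowed tone located midway between two 125 Hz bins.
fs, n_len, f_tone = 8000, 64, 1062.5
segment = [(0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_len - 1)))
           * math.sin(2.0 * math.pi * f_tone * n / fs) for n in range(n_len)]
f_plain = peak_frequency(segment, n_len, fs)       # bin spacing 125 Hz
f_padded = peak_frequency(segment, 4 * n_len, fs)  # bin spacing 31.25 Hz
```

Without padding, the peak can only be reported on the 125 Hz grid, so the estimate is off by at least 62.5 Hz for this tone; with fourfold padding, the grid is fine enough to localize the peak within one 31.25 Hz bin.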

Feature Extraction
Wheezing is discriminated from normal respiratory sound using spectral and temporal features extracted from STFT. The most commonly used are the features describing the shapes of the wheezing peaks in the time-frequency plane. Most algorithms using such features operate segment-wise, iterating two steps: (a) the extraction of spectral features (frequencies, the number of wheezing peaks, etc.) from the current signal segment; followed by (b) the tracking of temporal features (continuity, duration, etc.) using information from prior segments. In order to reduce the number of temporal features processed in Step (b), several approaches are proposed in Step (a) for discriminating the spectral shapes originating from wheezing from the isolated peaks of the noisy respiratory spectrum.
Due to signal windowing, the discrete frequencies of wheezing are smeared across a band occupying several frequency bins in the amplitude (or power) spectrum, appearing as flattened "spectral crests", rather than isolated discrete spectral peaks. A common approach of modeling the shape of such spectral crests is by low order statistical moments: the mean and variance (or standard deviation). This approach was first introduced in [32] by posing a set of relations between the mean value of different subsets of neighbor frequency bins surrounding each spectral maximum and the standard deviation of the whole spectrum. It was further refined by [17,20]. Both authors noticed that, if the spectrum has already been normalized (by the standard deviation) in the preprocessing step, the independence of the classification results from respiratory flow can be achieved by excluding the standard deviation from the spectral crest model. A different means of achieving flow independence was shown by [19]. There, due to the absence of the spectrum normalization step from preprocessing, the features describing spectral crests included both the mean and standard deviation, calculated locally around spectral maximums.
An alternative model of spectral crests was proposed in [18] with the aim of detecting only the audible sounds of wheezing. The audibility of a tonal signal masked in the noise of normal respiration was modeled by the ratio of the energy contained in the spectral crest to the energy contained in the noise of the normal respiratory sound. The bandwidth of such a wheezing crest was considered frequency-dependent, as analytically described by the psychoacoustic model.
The extraction of the spectral and temporal features describing wheezing crest shapes can also be performed simultaneously in time and frequency (on 2D spectrograms) by using image processing techniques. In [34], the detection of time-frequency plane crests was performed by gradient filtering. In [21], features describing centroid frequencies and the duration of spectral crests were calculated using edge detection Prewitt filtering, image closing and opening steps.
Apart from the features related to wheezing crest shapes, a variety of alternative STFT features were proposed in recent publications. One of the commonly used features is entropy, measuring the degree of grouping (clustering) of spectral components. Several variations are proposed. The difference and ratio between Shannon's entropy of probability mass functions of power spectrum maxima in successive time-windows were evaluated for single-feature classification in [37,40]. The mean distortion among sub-band histograms and the mean histograms of the sample entropy was evaluated in [36]. Rényi entropy was proposed in [38] as a measure of the time-domain signal's distribution uniformity. In addition, [38] evaluates the statistical parameters of kurtosis and the f50/f90 ratio as spectral features. These features were later compared in [39] to the spectral features describing signal tonality: spectral flatness and tonal index. This work has been extended in [41] in the direction of selecting the most discriminating feature set for wheeze detection by applying the minimal redundancy, maximal relevance technique, affirming the potency of spectral tonality. Of the other features, the cross-correlation index of successive spectra was proposed in [35]. Furthermore, an integral of time-varying power spectral content was used as a feature in [22].

Classification
Decision tree classifiers have most commonly been used in algorithms using features describing (tracking) the shapes of wheezing crests [17][18][19][20]. The tree structure is designed to track features describing spectral crests originating from wheezing in the time and frequency plane.
Taking a more formal approach, [23] used a linear support vector machine (SVM) classifier with wheezing crest shape features. An SVM was also utilized in [39] with spectral features describing tonality, spectral flatness, f50/f90, kurtosis and entropy.
Some authors used features derived from STFT as an input to a neural network (NN). The initial study of [33] investigated the usage of all STFT amplitude spectrum samples directly as NN input coefficients, identifying the need for input vector dimensionality reduction. This was addressed in [22] by using the projection of the spectrogram onto the frequency axis as the NN input features. In a comprehensive study [15], neural networks were compared to vector quantization (VQ) and Gaussian mixture model (GMM) classification systems, with the average magnitudes of power spectral bands as features.
Table 1 summarizes the review of wheeze detection algorithms. Representative algorithms fall into two groups. The first group comprises algorithms using features describing the shapes of wheezing crests, and the second group contains algorithms performing classification on alternative features. Several difficulties arise when comparing the results reported by different authors. First, it is unclear whether features other than those directly describing wheezing crest shapes can provide sufficient information for accurate classification. Secondly, a variety of datasets is used among the authors, as no publicly available standard dataset containing normal and pathological respiratory sounds exists. Thirdly, classification accuracy testing methodologies and the associated accuracy reporting metrics vary. Nevertheless, two operating scenarios are commonly referred to: (1) the detection of the occurrence of sequences of wheezing; and (2) the quantification of the duration of wheezing sequences. Finally, the execution speed of the proposed algorithms is seldom analyzed and reported.

Analysis of Implemented Algorithms
Following the presented review, we compare wheeze detection implemented using four algorithms, offering different levels of detection fidelity. The first two are the spectral crest shape tracking algorithms. The assumption is that such algorithms may provide the highest fidelity of wheeze classification, including estimation of the durations, number and frequency of the individual harmonic components composing the sound of wheezing. The algorithms differ by their spectral features: the first algorithm models the spectral crests using low-order statistical moments (mean and variance), building upon [17,19,20], and the second using energy (inspired by [18]).
The third algorithm also enables the estimation of the duration of wheezing, but does not enable distinguishing between individual frequency components. We implement an algorithm that tracks the duration of tonal intervals within the respiratory signal, using the tonality feature recently proposed in [39].
The lowest fidelity algorithm is aimed solely at the detection of the occurrence of wheezing, without any prospect of estimating the duration of wheezing. We implemented the most representative of such algorithms, the one using Shannon's entropy of spectral peaks (as in [37]) to detect uniformity in the spectrum.
The complete set of features used in our work is shown in Figure 3. Features denoted as spectral are related to individual signal segments, while those denoted as temporal describe wheezing along the temporal axis in the time-frequency plane. The following sections describe the implementation of each program block and analyze their a priori computational complexity.
The analysis of computational complexity is performed by estimating the worst-case number of multiplications and additions, including multiplicative constants (additive constants are omitted from the analysis). No assumptions are made regarding any architectural specifics of the target DSP. Common elementary mathematical functions, listed in Table 2, are assumed to be implemented using the approximation methods listed in the column "Implementation". Approximation methods are chosen to match the ones used in the experimental DSP implementation [42]. Their computational complexity, described by the associated variables, defining their numerical precisions, is used throughout the analysis.

Function | Implementation | Multiplications | Additions

Table 3. The computational complexity of the signal decomposition program blocks.

Program Block | Comment | Multiplications | Additions
Windowing and STFT, Equation (1) | calculated for the signal segment of length
Power spectrum, Equation (2) | calculated for N_b < N bins corresponding to the bandwidth of respiration
Amplitude spectrum, Equation (3) | calculated on N_b bins, the square root is implemented as in Table 2 | 2N
Phase spectrum, Equation (4) | calculated on N_b bins, division and arctg implemented as in Table 2 | N

STFT Decomposition and Preprocessing of the Spectrum
First, signal segments are windowed using the Hamming window function, and the STFT is calculated according to Equation (1). Depending on the features to be extracted, the STFT is followed by one or several of the following preprocessing steps. The power spectrum of the signal segment is calculated as in Equation (2). From the power spectrum, the amplitude spectrum (modulus) of the current signal segment is derived according to Equation (3). The phase spectrum is calculated according to Equation (4). The estimates of the a priori computational complexity of the signal decomposition and preprocessing program blocks are shown in Table 3.

Signal Segment Energy
The energy of current signal segment E[m], defined in Equation (5), is used as the feature for the identification of respiratory pauses. The energy is calculated by the summation of the power spectrum components of the current segment.
The minimal and maximal energies, E_min and E_max, given in Equation (6), are used as thresholds. They are obtained from the stored history of the previous segments' energies. The number of stored segment energies, M_E, is chosen to cover the time interval of at least one respiratory cycle.
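The energy feature and its thresholds can be sketched as follows. The summation and the minimum/maximum over a history of M_E segments follow the description above. The class name, the behavior on the very first segment and, in particular, the pause rule with the hypothetical constant c_pause are assumptions of this sketch, since the exact decision rule is not stated here.

```python
from collections import deque


class EnergyTracker:
    """Keeps the energies of the last m_e segments."""

    def __init__(self, m_e):
        self.history = deque(maxlen=m_e)

    def update(self, power):
        """Return (E, E_min, E_max) for the current power spectrum."""
        energy = sum(power)
        # Thresholds come from previous segments only; on the very first
        # segment the current energy is used instead (an assumption).
        past = self.history or [energy]
        e_min, e_max = min(past), max(past)
        self.history.append(energy)
        return energy, e_min, e_max


def is_respiratory_pause(energy, e_min, e_max, c_pause=0.1):
    """Hypothetical rule: energy lies near the bottom of the recent range."""
    return energy < e_min + c_pause * (e_max - e_min)


# Usage with three toy power spectra.
tracker = EnergyTracker(m_e=4)
results = [tracker.update(p) for p in ([1.0, 1.0], [4.0, 4.0], [0.1, 0.1])]
energy, e_min, e_max = results[-1]
pause = is_respiratory_pause(energy, e_min, e_max)
```

Bounding the history with a fixed-length queue keeps the memory and the cost of the minimum/maximum search proportional to M_E.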

Spectral Tonality
Spectral tonality is a feature describing the presence of harmonic content within each signal segment. It is calculated as proposed in [40]. First, the amplitude and phase spectra, extracted as defined in Equations (3) and (4), are stored for the two preceding signal segments (at time instants m − 1 and m − 2). Based on this history, the estimates of the current signal segment's amplitude spectrum, Â[m, k], and phase spectrum, φ̂[m, k], are calculated, as shown in Equation (7). The amplitude and phase spectrum estimates are used for the calculation of the weight coefficients, W[m, k], defined in Equation (8). W[m, k] is proportional to the estimation error of each frequency component, k, in the current signal segment, m.
W[m, k] is then used to calculate the weighted segment energy, E_w[m], shown in Equation (9). Finally, by comparing the weighted and unweighted segment energies, the tonal index, T[m], of the current signal segment is defined in Equation (10). Based on this, the temporal feature, δm_tonal, describing the duration of tonal sections, is extracted.
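Since Equations (7) to (10) are not reproduced here, the following sketch only illustrates the idea under explicit assumptions: the estimates are formed by linear extrapolation from the two preceding segments, the weight is the distance between the actual and the estimated complex component normalized by the sum of their magnitudes, and the tonal index is taken as one minus the ratio of the weighted to the unweighted energy. These forms are chosen for illustration and may differ from the referenced definitions.

```python
import cmath
import math


def tonal_index(cur, prev1, prev2):
    """Tonal index from complex spectra at segments m, m-1 and m-2.

    All formulas below are assumptions of this sketch (see lead-in).
    """
    e_plain = 0.0
    e_weighted = 0.0
    for x0, x1, x2 in zip(cur, prev1, prev2):
        # Assumed estimate: linear extrapolation of amplitude and phase.
        a_hat = 2.0 * abs(x1) - abs(x2)
        phi_hat = 2.0 * cmath.phase(x1) - cmath.phase(x2)
        predicted = cmath.rect(a_hat, phi_hat)
        # Assumed weight: normalized estimation error, between 0 and 1.
        denom = abs(x0) + abs(a_hat)
        w = abs(x0 - predicted) / denom if denom > 0.0 else 0.0
        p = abs(x0) ** 2
        e_plain += p
        e_weighted += w * p
    # Predictable (tonal) content gives small weights, hence T close to 1.
    return 1.0 - e_weighted / e_plain if e_plain > 0.0 else 0.0


# Usage: one bin with steady amplitude and steadily advancing phase ...
tone = [[cmath.rect(1.0, ph)] for ph in (0.0, 0.5, 1.0)]
t_tone = tonal_index(tone[2], tone[1], tone[0])
# ... versus the same history followed by an unpredictable phase jump.
jump = [cmath.rect(1.0, 1.0 + math.pi)]
t_jump = tonal_index(jump, tone[1], tone[0])
```

With this convention, a segment would be treated as tonal when its index exceeds a trained threshold, which matches the direction of the comparison used later in the tonality tracking algorithm.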

Entropy of Power Spectrum Peaks
Because it expresses signal complexity, we evaluate Shannon's entropy as a detector of grouping in the spectrum, which indicates the occurrence of wheezing. We calculate it similarly to the approach proposed in [37].
First, the extracted power spectrum peaks, P_peak[m, p], are rescaled according to Equation (13) to produce the normalized spectral peaks, P_norm,peak[m, p]. Then, the signal segment's entropy, En[m], is expressed as in Equation (14). The most noticeable changes in entropy are expected upon the transition between signal segments of normal respiratory sound and segments containing wheezing. Thus, a temporal feature, En_ratio[m], defined in Equation (15), describing the ratio of the entropies of two successive signal segments, is extracted.
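The entropy feature can be sketched as follows. It is assumed here that the rescaling makes the peaks sum to one, so that they form a probability mass function, that the entropy is measured in bits, and that the ratio is taken as the current over the previous segment. The three-point peak picking is likewise only an illustrative choice.

```python
import math


def spectral_peaks(power):
    """Local maxima of the power spectrum (simple three-point test)."""
    return [power[k] for k in range(1, len(power) - 1)
            if power[k - 1] < power[k] >= power[k + 1]]


def peak_entropy(peaks):
    """Shannon entropy (bits) of the peaks normalized to unit sum."""
    total = sum(peaks)
    if total <= 0.0:
        return 0.0
    probs = [p / total for p in peaks if p > 0.0]
    return -sum(q * math.log2(q) for q in probs)


def entropy_ratio(en_cur, en_prev):
    """Ratio of entropies of two successive segments (assumed direction)."""
    return en_cur / en_prev if en_prev > 0.0 else 1.0


# Usage: evenly spread peaks (noise-like) versus one dominant peak.
en_normal = peak_entropy([1.0, 1.0, 1.0, 1.0])
en_wheeze = peak_entropy([97.0, 1.0, 1.0, 1.0])
en_ratio = entropy_ratio(en_wheeze, en_normal)
```

Four equal peaks give the maximal entropy of 2 bits, while a single dominant peak lowers the entropy sharply, so the ratio drops well below one at the onset of the tonal segment.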

Spectral Crests Modeled by Low-Order Statistical Moments
The first approach to spectral crest modeling is based on the first- and second-order statistical moments (mean, standard deviation) describing the distribution of the magnitudes of the subset of power spectrum components, P_band[m, p], forming a band around the central frequency, k_peak[m, p], of each of the N_p power spectrum peaks (see Equation (16)). The bandwidth, B_crest (see Figure 4, left), is chosen during algorithm training.
The mean value and the standard deviation of all power spectrum components within each of the bands, P_band[m, p], are calculated. Those peaks, P_peak[m, p], the magnitudes of which satisfy the condition defined in Equation (17), are declared to be the peaks of the spectral crests, P_crest[m, c], potentially originating from wheezing. The constants, C_m and C_s, are obtained during the training phase. The crest-peak frequencies, k_crest[m, c], and the number of crests, N_c[m], are also extracted, as shown in Equations (17) and (18).
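A possible form of this test is sketched below. Because Equation (17) is not reproduced here, the condition used (the peak exceeds C_m times the band mean plus C_s times the band standard deviation) is an assumption, as are the function name, the expression of B_crest in bins and the example constants.

```python
import statistics


def crests_by_moments(power, peak_bins, b_crest, c_m, c_s):
    """Return the bins of peaks accepted as spectral crests.

    Assumed condition: P_peak > c_m * mean(band) + c_s * std(band),
    where the band spans b_crest bins centered on the peak.
    """
    half = b_crest // 2
    crests = []
    for k in peak_bins:
        band = power[max(0, k - half):k + half + 1]
        mean = statistics.fmean(band)
        std = statistics.pstdev(band)
        if power[k] > c_m * mean + c_s * std:
            crests.append(k)
    return crests


# Usage: a sharp peak at bin 10 and a broad, low bump around bin 22.
power = [1.0] * 32
power[10] = 50.0
power[20:25] = [2.0, 3.0, 4.0, 3.0, 2.0]
k_crest = crests_by_moments(power, [10, 22], b_crest=9, c_m=1.0, c_s=2.0)
n_c = len(k_crest)
```

In this toy spectrum, the sharp peak stands well above its local statistics and is accepted, while the broad bump barely exceeds its own band mean and is rejected.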

Spectral Crests Modeled by Energy
An alternative approach to spectral crest modeling is to measure the distribution of energy localized around each identified spectral peak. This model is a modified version of the work presented in [18], with the omission of psychoacoustic auditory modeling.
For each of the identified peaks, P_peak[m, p], three bands are defined, concentrically spanning around the peak frequency, k_peak[m, p]: B_crest < B_narrow < B_wide (see Figure 4, right). B_crest is the bandwidth containing the main lobe of a single harmonic, as determined by the signal window used (e.g., Hamming) and the time-frequency resolution of the STFT. B_narrow and B_wide define the surroundings of each spectral peak and are empirically set to 80 and 120 Hz, respectively. Those spectral peaks for which Equation (19) is satisfied are declared to be the peaks of the spectral crests.
Two temporal features of spectral crests are derived in order to discriminate longer spectral crests originating from wheezing from the short, isolated transients in the time-frequency plane.
The first feature is the continuity of the spectral crests in the time-frequency plane. Continuity is described by extracting the deviations of each crest's peak frequency, k_crest[m, c], along the temporal axis, as shown in Figure
The computational complexity of each feature extraction program block is listed in Table 4.
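The energy-based crest model described in this section can be illustrated with the following sketch. Equation (19) is not reproduced here, so the condition below is an assumption: the mean energy per bin inside B_crest is compared with the mean energy per bin in the ring between B_narrow and B_wide, scaled by a hypothetical constant c_e. The value used for B_crest is also only illustrative; the 80 and 120 Hz values are those quoted above.

```python
def band_energy(power, k_peak, bandwidth_hz, fs, n_fft):
    """Energy and bin count of a band centered on k_peak."""
    half = int(round(0.5 * bandwidth_hz * n_fft / fs))
    lo = max(0, k_peak - half)
    hi = min(len(power), k_peak + half + 1)
    return sum(power[lo:hi]), hi - lo


def is_energy_crest(power, k_peak, fs, n_fft,
                    b_crest_hz=50.0, b_narrow_hz=80.0, b_wide_hz=120.0,
                    c_e=4.0):
    """Assumed test: crest energy per bin exceeds c_e times ring noise."""
    e_crest, n_crest = band_energy(power, k_peak, b_crest_hz, fs, n_fft)
    e_narrow, n_narrow = band_energy(power, k_peak, b_narrow_hz, fs, n_fft)
    e_wide, n_wide = band_energy(power, k_peak, b_wide_hz, fs, n_fft)
    n_ring = n_wide - n_narrow
    if n_ring <= 0:
        return False  # ring undefined at this resolution or spectrum edge
    noise_per_bin = (e_wide - e_narrow) / n_ring
    return e_crest / n_crest > c_e * noise_per_bin


# Usage with N = 512 and fs = 8,000 Hz: bin 64 corresponds to 1,000 Hz.
flat = [1.0] * 257
tonal = list(flat)
tonal[64] = 100.0
crest_found = is_energy_crest(tonal, 64, 8000, 512)
crest_in_noise = is_energy_crest(flat, 64, 8000, 512)
```

Measuring the noise in a ring that excludes the crest itself keeps the window's main lobe from inflating the noise estimate.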

Decision Tree Classification
A total of four wheeze detection algorithms are developed, by organizing subsets of features from Section 4.2 into decision trees, shown in Figure 6: two crest tracking algorithms sharing the analogous decision trees (labeled Algorithms 1 and 2), a tonality tracking algorithm (Algorithm 3) and an entropy change detector (Algorithm 4). All decision trees share the same root, evaluating the segment energy, in order to decide whether the segment is part of a respiratory cycle or an inter-respiratory pause, enabling early termination. The remaining branches are algorithm-specific. The classification operates segment-wise, assigning each signal segment to one of two classes: "non-wheezing", or "wheezing".

Algorithms 1 and 2: Crest Tracking
First, the existence of spectral crests is determined by modeling the surroundings of the power spectrum peaks, either using statistical moments, as in Algorithm 1 (see Equation (17)), or using energy, as in Algorithm 2 (see Equation (19)). Extracted crests are counted in order to check that the feature N_c satisfies 1 < N_c < C_crests. Next, the temporal features of crests are evaluated.
First, the continuity-describing features, δk_crest[1, c]
Finally, the duration, δm_crest[c], of each spectral crest is checked to lie between the minimal and maximal durations, M_dur,min and M_dur,max, respectively. M_dur,min is set equal to the duration defining continuity, M_dur,min = M_cont. M_dur,max is chosen to reflect the maximal expected uninterrupted duration of wheezing, typically the duration of a respiratory cycle.
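The temporal branch of the crest tracking algorithms can be sketched as follows. The linking of crests between successive segments by the nearest peak frequency and the limit max_dev on the allowed frequency deviation are assumptions of this sketch; only the duration bounds M_dur,min and M_dur,max are taken from the description above.

```python
def track_crests(crests_per_segment, max_dev, m_dur_min, m_dur_max):
    """Return one wheezing flag per segment.

    crests_per_segment: list of lists of crest peak bins, one per segment.
    A crest continues a track if its bin lies within max_dev bins of a
    crest in the previous segment (assumed continuity criterion).
    """
    tracks = {}  # last peak bin of each open track -> duration in segments
    flags = []
    for crests in crests_per_segment:
        updated = {}
        for k in crests:
            near = [j for j in tracks if abs(k - j) <= max_dev]
            if near:
                j = min(near, key=lambda b: abs(k - b))
                updated[k] = tracks[j] + 1
            else:
                updated[k] = 1  # a new track starts here
        tracks = updated
        flags.append(any(m_dur_min <= d <= m_dur_max
                         for d in tracks.values()))
    return flags


# Usage: a slowly drifting crest lasting four segments, plus two transients.
crests = [[40], [41], [41, 90], [42], [], [70]]
flags = track_crests(crests, max_dev=2, m_dur_min=3, m_dur_max=50)
```

In this causal form, a segment is flagged only once a track has reached the minimal duration, so the first segments of a wheeze are not flagged retroactively; the isolated crests at bins 90 and 70 never reach the minimal duration.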

Algorithm 3: Tonality Tracking
The tonality tracking algorithm calculates the tonality of each signal segment according to Equations (7)-(10). Segments satisfying T[m] > C_T are considered tonal. The constant, C_T, is acquired through training. In the final decision tree branch, the duration of the successive signal segments marked as tonal, δm_tonal, is compared against the constants, M_dur,min and M_dur,max.
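The final branch of the tonality tracking algorithm amounts to a run-length test, which can be sketched as follows. The function name and the example threshold values are illustrative assumptions; trained values are not given here.

```python
def tonal_wheeze_flags(tonal_indices, c_t, m_dur_min, m_dur_max):
    """Flag segments whose current run of tonal segments has a valid length."""
    flags = []
    run = 0  # number of successive tonal segments ending at this one
    for t in tonal_indices:
        run = run + 1 if t > c_t else 0
        flags.append(m_dur_min <= run <= m_dur_max)
    return flags


# Usage: a three-segment tonal interval followed by an isolated tonal segment.
t_values = [0.1, 0.8, 0.9, 0.85, 0.2, 0.9]
tonal_flags = tonal_wheeze_flags(t_values, c_t=0.5, m_dur_min=2, m_dur_max=10)
```

Only one counter has to be stored between segments, which keeps this branch inexpensive in both memory and operations.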

Algorithm 4: Entropy Change Detection
The algorithm is designed to detect transitions between intervals of normal respiration and intervals containing wheezing. It compares the ratio of entropies, En_ratio[m] (see Equations (13)-(15)), against a threshold, C_ent. The threshold is acquired during training.

The total computational complexity estimates of all algorithms are shown in Table 5. They are obtained by summing the complexities of those program blocks from Tables 3 and 4 that participate in each algorithm according to Figure 6.

Table 5. The total computational complexity of each implemented algorithm. For definitions of the variables, see Table 2.

Algorithm | Multiplications | Additions

Hardware Platform and Implementation
The algorithms described in Section 4.3 were first implemented in MATLAB and afterwards ported to the DSP. An EZDSP-C5505-USB development board (Texas Instruments) [43] was used for prototyping the wheeze detection sensor node. The board features an analog audio input/output interface, a TLV320AIC3204 analog-to-digital converter (ADC), a TMS320C5505 DSP core, a universal asynchronous receiver/transmitter (UART) and a debugging interface XDS-1000. The signal was digitized at the ADC's sampling frequency of f_s = 8,000 Hz. The inter-integrated circuit sound bus (I2S) was used for the signal transport from the ADC to the DSP. The interrupts of the direct memory access (DMA) unit were used for the synchronization of the main processing tasks, shown in Figure 7: (a) the signal acquisition task; and (b) the classification task.
The classification task operated on fixed-size signal segments of N = 512 samples, corresponding to 64 ms. The task assigned each segment to either the "wheezing" or the "normal" class. The result was output via the UART. To compensate for the signal attenuation around the cosine window edges, segments were overlapped by 50%, resulting in a total of 32 ms available for the processing of each signal segment. With the DSP core operating at a 100 MHz clock, this allows a maximum of N_cycl,tot = 3.2 × 10^6 single-cycle instructions for the processing of each segment, at a power consumption of approximately 22 mW. For the remainder of the cycle, the DSP is kept in the standby state while the DMA peripheral performs the acquisition task, with a power consumption of only 0.4 mW. The DSP is woken up by the DMA interrupt.
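The timing budget quoted above follows from simple arithmetic, reproduced in the sketch below. The two-state average power model is an assumption of this sketch: it ignores wake-up transitions and peripherals and simply weights the quoted active (22 mW) and standby (0.4 mW) figures by the fraction of time spent processing. The 2.6% occupancy is the figure reported in the abstract for the best-performing algorithm.

```python
fs_hz = 8000             # sampling frequency
n_samples = 512          # segment length N
overlap = 0.5            # 50% overlap between successive segments
clock_hz = 100_000_000   # DSP core clock

segment_ms = 1000.0 * n_samples / fs_hz       # duration of one segment
hop_ms = segment_ms * (1.0 - overlap)         # time available per segment
cycles_per_hop = clock_hz * hop_ms / 1000.0   # N_cycl,tot


def average_power_mw(duty, p_active_mw=22.0, p_standby_mw=0.4):
    """Two-state duty-cycle model (an assumption, see lead-in)."""
    return duty * p_active_mw + (1.0 - duty) * p_standby_mw


duty = 0.026                          # fraction of processing time occupied
busy_cycles = duty * cycles_per_hop   # cycles spent per segment
p_avg_mw = average_power_mw(duty)     # estimated average DSP power
```

Under these assumptions, the segment lasts 64 ms, the budget is 32 ms or 3.2 × 10^6 cycles per segment, and an algorithm occupying 2.6% of the processing time would bring the estimated average DSP power to slightly below 1 mW.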
Texas Instruments "DSPlib" library functions [42] were used for the implementation of the common signal processing functions, such as algebraic operations on vectors, trigonometric, logarithmic functions, statistical functions, FFT, etc., in 16-bit fixed-point arithmetic. This ensures the reproducibility of the results and optimizes the execution performance by exploiting C5505's architectural features, such as two multiply-and-accumulate (MAC) units and the FFT coprocessor.

Test Signals
The wheeze detection algorithms were tested on a database of prerecorded respiratory sounds. Our database consisted of a total of 26 recordings. Thirteen of them were of normal breathing (N01...N13), and each of the other 13 audio recordings, labeled W01...W13, contained more than one uninterrupted interval of wheezing. The number of recordings used in our study corresponds to the dataset sizes used throughout the literature (see Table 1, column "Dataset"). Due to the lack of a single standard respiratory sound database, the recordings used in our study were drawn from multiple commonly referenced Internet sources [44][45][46][47][48], and some were recorded in the course of previous research [49]. Table 6 provides the details of each recording. "Dur." is the duration of the recording in seconds. "Seg." refers to the number of 50%-overlapped 64-ms signal segments. "Resp. phases" is the total count of inspiratory and expiratory phases. "Seg." and "Resp. phases" define the number of samples used in the statistical evaluation of results. "Wheeze intervals" is the number of intervals of wheezing within each recording. "Sample rate" is the frequency at which the recording was originally digitized.
Table 6. Database of respiratory signals. In the labels column, N stands for normal breathing, while W stands for wheezing.

Testing Environment
An environment for signal annotation, algorithm training and testing was designed in MATLAB (see Figure 8). The annotation of the referent classification results was performed by an expert's audio-visual inspection of the signals' waveforms and spectrograms. Intervals containing normal respiratory sounds were annotated as negative (N) and intervals containing wheezing as positive (P). The number of annotated intervals of wheezing is provided for each signal, W01...W13, in column "Wheeze intervals" of Table 6. The temporal resolution of the annotations is adjusted to the segment size upon which the wheeze detection algorithm was running on the DSP (determined by the signal segment size, the overlap and the development board's ADC sampling frequency).

Testing of Classification Accuracy
Classification accuracy was tested in two operating scenarios:
1. Wheeze duration tracking scenario. In this scenario, the dataset used for statistical evaluation consisted of a total of 4,422 segments of normal respiratory sounds (N01...N13) and 5,452 segments containing wheezing (W01...W13), each segment corresponding to 64 ms of sound. For details, please refer to column "Seg." in Table 6. N_TP, N_FP, N_TN and N_FN were calculated segment-wise.
2. Detection of the occurrence of wheezing in a respiratory phase. For this scenario, the annotations of the test signals were readjusted so that the classification results could be evaluated respiratory phase-wise. Whole respiratory phases containing more than one interval of wheezing were annotated as referent positives, and the phases without the occurrence of wheezing as referent negatives. Thus, the dataset consisted of a total of 65 positives (found throughout W01...W13) and 148 negatives (66 of them in W01...W13 and 82 in N01...N13), as seen from column "Resp. phases" in Table 6. N_TP, N_FP, N_TN and N_FN were calculated based on the classification results obtained for each respiratory phase. Because the DSP still operates segment-wise, the following mapping was introduced: a respiratory phase containing wheezing (annotated positive) was counted as TP if it contained at least one positively classified signal segment. A respiratory phase without wheezing (annotated negative) was counted as FP if it contained at least one positively classified signal segment. TN and FN were defined analogously.
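The mapping from segment-wise classifier outputs to phase-wise outcomes described in the second scenario can be sketched as follows. This is an illustrative reading of the rule stated above, not the authors' evaluation code. The data layout (a boolean annotation plus a list of per-segment decisions for each phase) is an assumption.

```python
from typing import Dict, Iterable, List, Tuple


def phase_outcome(annotated_positive: bool, segment_predictions: List[bool]) -> str:
    """Map the segment-wise decisions of one respiratory phase to TP/FP/TN/FN.

    A phase counts as predicted positive if at least one of its segments
    was classified as wheezing.
    """
    predicted_positive = any(segment_predictions)
    if annotated_positive:
        return "TP" if predicted_positive else "FN"
    return "FP" if predicted_positive else "TN"


def count_outcomes(phases: Iterable[Tuple[bool, List[bool]]]) -> Dict[str, int]:
    """Accumulate phase-wise outcome counts over a set of respiratory phases."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for annotated_positive, segment_predictions in phases:
        counts[phase_outcome(annotated_positive, segment_predictions)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical phases: (expert annotation, per-segment classifier output).
    example = [
        (True, [False, True, False]),   # wheeze phase, one segment detected
        (True, [False, False]),         # wheeze phase, nothing detected
        (False, [False, False]),        # normal phase, nothing detected
        (False, [True, False]),         # normal phase, spurious detection
    ]
    print(count_outcomes(example))
```

One consequence of this rule is that isolated false negative segments inside a wheezing phase do not affect the phase-wise result, whereas a single false positive segment in a normal phase does.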
From N_TP, N_TN, N_FP and N_FN, sensitivity SE, specificity SP and accuracy AC were calculated as defined in Equation (22). Sensitivity measures the fraction of correctly classified samples of wheezing (from the subset of test samples composed only of positives). Specificity, on the other hand, measures the fraction of correctly classified samples of normal respiration (in a signal containing exclusively negatives), while accuracy measures the overall performance.
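As a worked illustration, the three measures can be computed from the four counts using their standard definitions, which are assumed here to match Equation (22). The counts in the usage example are hypothetical and are not results from the study.

```python
from typing import Dict, Optional


def classification_rates(n_tp: int, n_fp: int, n_tn: int, n_fn: int) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity and accuracy from outcome counts.

    Standard definitions (assumed to correspond to Equation (22)):
        SE = N_TP / (N_TP + N_FN)
        SP = N_TN / (N_TN + N_FP)
        AC = (N_TP + N_TN) / (N_TP + N_TN + N_FP + N_FN)
    A measure is returned as None when its denominator is zero, e.g. SE on
    a recording of normal breathing that contains no positives.
    """
    positives = n_tp + n_fn
    negatives = n_tn + n_fp
    total = positives + negatives
    return {
        "SE": n_tp / positives if positives else None,
        "SP": n_tn / negatives if negatives else None,
        "AC": (n_tp + n_tn) / total if total else None,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    rates = classification_rates(n_tp=60, n_fp=8, n_tn=140, n_fn=5)
    for name, value in rates.items():
        print(f"{name} = {100 * value:.2f}%")
```

Returning None for an undefined measure keeps the per-recording evaluation explicit: sensitivity cannot be computed on a signal with no annotated wheezing, and specificity cannot be computed on one with no negatives.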
For both operating scenarios, the leave-one-out method was used for training and testing, due to the limited size of the test signal database. The method tested each of N = 26 signals from the database, using the classification thresholds obtained through training on the remaining 25 signals. The training of the algorithm thresholds was performed by a grid-search hyper-parameter optimization procedure in which the goal function, shown in Equation (23), was chosen similarly to [15], as the maximum of the area under the curve, AUC_max, of the receiver operating characteristic (ROC), comparing the true positive rate (TPR = SE) against the false positive rate (FPR = 1 − SP).
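The building blocks of such a procedure can be sketched as follows: a leave-one-out splitter, an ROC obtained by sweeping a decision threshold over a scalar feature, and the area under that ROC computed by the trapezoidal rule. This is a generic sketch, not the authors' training code. The scalar "score" per sample, the threshold grid and the way AUC would be compared across grid points are assumptions, since Equation (23) is not reproduced in this section.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_out(items: Sequence) -> Iterator[Tuple[object, list]]:
    """Yield (held-out item, remaining items) for every item in turn."""
    for i in range(len(items)):
        yield items[i], list(items[:i]) + list(items[i + 1:])


def roc_points(scores: Sequence[float], labels: Sequence[bool],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by classifying score >= threshold as positive.

    Assumes that both classes are present in `labels`.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = []
    for thr in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= thr)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC by the trapezoidal rule, anchored at (0,0) and (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical, perfectly separable scores for illustration only.
    scores = [0.1, 0.2, 0.8, 0.9]
    labels = [False, False, True, True]
    pts = roc_points(scores, labels, thresholds=[0.0, 0.5, 1.0])
    print(pts, auc_trapezoid(pts))
    folds = list(leave_one_out([f"S{i:02d}" for i in range(1, 27)]))
    print(len(folds), len(folds[0][1]))
```

In a grid search, each candidate combination of the remaining classification parameters would produce one such ROC on the 25 training signals, and the combination with the largest area would be retained for testing on the held-out signal.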
After completing the leave-one-out procedure on all test signals, SE, SP and AC were calculated separately for the test signals containing intervals of wheezing (W01...W13), for the normal signals (N01...N13) and for the whole database, for each of the four algorithms. Training and testing were repeated analogously for both the wheeze duration tracking and the wheeze occurrence detection operating scenarios, resulting in SE_dur, SP_dur, AC_dur and SE_event, SP_event, AC_event, respectively.

Execution Duration
Verification of the execution duration was performed using the code profiling tools of the Code Composer Studio development environment (Texas Instruments). The algorithms ran on the DSP in debug mode. The time intervals of interest were measured between manually set breakpoints, in ticks of the DSP core clock running at 100 MHz. A common, representative segment chosen from an interval of wheezing contained in test signal W08 was used throughout all execution duration measurements, yielding the worst-case execution time for all algorithms. All measurements were repeated 10 times and averaged.
Using this setup, the execution duration of each program block from Figure 3 was measured. Furthermore, the total time required for the execution of the classification task over a single signal segment, N_cycles,total, was measured.

Code Execution Efficiency
In order to evaluate the suitability of the implemented algorithms for long-term wheeze monitoring using a low-power wearable sensor, we assessed their execution efficiency. To this end, we propose the metrics µ_SE, µ_SP and µ_AC, defined in Equation (24), which compare, respectively, the overall classification sensitivity SE, specificity SP and accuracy AC against the processing duty cycle, D_exec. The processing duty cycle D_exec is defined as the ratio between the average number of DSP instructions required for the execution of the classification task over a single signal segment, N_cycl,exec, and the total number of clock cycles between two successive signal segments (e.g., N_cycl,tot = 3.2 × 10^6 when the DSP core is running at 100 MHz and the time between successive signal segments equals 32 ms). D_exec is directly related to the portion of time the DSP has to spend in the active state. The efficiency is measured for each of the two operating scenarios: wheeze duration tracking (labeled as µ_SE,dur, µ_SP,dur and µ_AC,dur) and wheeze occurrence detection (labeled as µ_SE,event, µ_SP,event and µ_AC,event).
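The duty cycle and an efficiency metric of this kind can be illustrated with a short sketch. Equation (24) is not reproduced in this section, so the form used here, a classification measure divided by D_exec, is an assumption made for illustration. The cycle count and accuracy in the usage example are hypothetical.

```python
N_CYCL_TOT = 3_200_000  # cycles between successive segments (100 MHz, 32 ms)


def duty_cycle(n_cycl_exec: float, n_cycl_tot: float = N_CYCL_TOT) -> float:
    """Processing duty cycle D_exec = N_cycl,exec / N_cycl,tot."""
    return n_cycl_exec / n_cycl_tot


def efficiency(measure: float, d_exec: float) -> float:
    """Assumed efficiency form: classification measure per unit duty cycle.

    `measure` is SE, SP or AC expressed as a fraction in [0, 1].
    """
    return measure / d_exec


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    d_exec = duty_cycle(64_000)
    mu_ac = efficiency(0.90, d_exec)
    print(f"D_exec = {100 * d_exec:.2f}%, mu_AC = {mu_ac:.1f}")
```

Under this assumed form, an algorithm can reach a high efficiency either through high accuracy or through a short execution time, which is why a fast but moderately accurate detector may rank above a slower and more accurate one.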

Accuracy of Classification
The receiver operating curves averaged over all N = 26 iterations of the leave-one-out training of each of the four algorithms are compared in Figure 9. The maximal areas under the curves, AUC_max, and the associated sets of trained classification parameters with which they were obtained are shown on each graph. Figure 10 shows examples of the classification results overlaid onto signal spectrograms. Gray markings represent referent intervals of wheezing annotated by an expert (referent positives), while black markings are signal segments classified as wheezing (classified as positive). Figure 10e shows an example of a less successful classification by Algorithm 3, containing a high number of false negative signal segments. Similarly, Figure 10f shows an example containing a high number of false positive signal segments obtained on a normal respiratory signal by Algorithm 4.
The overall SE_dur, SP_dur and AC_dur, obtained in the wheeze duration tracking operating scenario, are shown in Table 7. The values listed in column "Thresholds" refer to the trained threshold values of the classification parameters from Table 8. "W" denotes the results obtained only on W01...W13 and "N" those obtained on N01...N13. Event detection accuracies are compared in Table 9, listing only the overall results for brevity. The best results are highlighted in green, and the worst are colored red.

Execution Duration and Efficiency
Execution duration estimates, obtained by calculating expressions from Tables 4 and 5, are shown in Figure 11. Figure 11a compares the execution duration estimates feature-by-feature, while the total number of operations per wheeze detection algorithm is given in Figure 11b. Figure 12 shows the experimental results of the DSP execution time profiling of each implemented program block, enabling the identification of bottlenecks. Arrows show the execution order and the inclusion of particular program blocks into each of the four implemented algorithms. The values express the average number of DSP cycles required for a single execution of the corresponding program block. The total number of DSP clock cycles required for the worst-case execution of the classification task, N_clk,exec, and the associated processing duty cycle, based on 32 ms between the processing of successive segments, are shown in Table 10. The associated code execution efficiencies are compared in Table 11.

Discussion

Accuracy of Wheeze Duration Tracking
The receiver operating curves of both crest tracking algorithms (Algorithms 1 and 2) exhibit the highest maximal area under the curve (AUC_max). Additionally, by featuring a clear inflection point, they enable the unambiguous setting of the classification parameter thresholds that yield the highest true positive rate (highest sensitivity) at the lowest false positive rate (highest specificity). Good wheeze duration tracking capability can be observed in the examples of the test results in Figure 10a,b and is supported by the highest overall sensitivities, specificities and accuracies. Both algorithms feature, on average, 3%-6% higher specificity than sensitivity (tracking normal signals slightly better than wheezing). Of the two versions, Algorithm 2, featuring the energy-based crest model, shows a 1.21% advantage in sensitivity, 3.92% in specificity and 3.52% in accuracy over Algorithm 1, which models spectral crests by low-order statistical moments.
Even though Algorithm 4 (the entropy change detector) also features a receiver operating curve with a clear inflection point, its AUC_max is approximately 15% lower than those of Algorithms 1 and 2. Its maximal sensitivity is limited to 85%, and the specificity converges to less than 90%. Compared to the crest tracking algorithms, Algorithm 4 achieves lower overall SE_dur, SP_dur and AC_dur, all around 83% in the wheeze duration tracking scenario.
Algorithm 3 (tonality tracking) features the most shallow receiver operating curve without a clear inflection point. Thus, the algorithm can be adjusted either for high sensitivity at the cost of low specificity (e.g., efficient tracking of wheezing, but a high number of additional false positives in signal segments of normal respiration), or on the other hand, it may be set for high specificity, at the cost of a high count of false negatives during the occurrence of wheezing (a weaker wheeze duration tracking performance, as seen in Figure 10c). When the classification threshold is set in-between, in the ROC's "ramp" region, the results contain a significant amount of both false positives and negatives, keeping the overall accuracy around 70%.

Accuracy of Event Detection
Because the event detection metrics are invariant to individual signal segments classified as false negative within intervals of wheezing (see Section 5.3 and Figure 10e), the most successful event detection is expected from the algorithms whose receiver operating curves feature the highest specificity.
Thus, Algorithms 1 and 2 provide the best overall results in the wheeze-event detection scenario. According to Table 9, Algorithm 1 features the highest sensitivity (SE_event = 98.46%). Algorithm 2 shows the highest event detection specificity (SP_event = 91.21%) and accuracy (AC_event = 96.92%). Generally, crest tracking algorithms feature greater sensitivity than specificity of event detection (they are better at identifying respiratory cycles containing wheezing). Tonality tracking (Algorithm 3) offers specificity and accuracy comparable to the crest tracking algorithms, but lacks sensitivity, meaning that it performs better at identifying respiratory cycles containing only normal breathing. Furthermore, tonality tracking showed 9.4% better accuracy in event detection than the worst-performing, entropy-based Algorithm 4.

Execution Duration and Efficiency
The results of experimental DSP implementation shown in Table 10 and the a priori analysis of the computational complexity shown in Figure 11b agree on the relative relations between the total execution durations of all the algorithms. The differences in the results obtained in the per-feature profiling (Figures 11a and 12) clearly indicate the benefits of the exploitation of DSP's architectural features, which accelerate numerically intensive operations (the FFT coprocessor and dual MAC unit).
According to the experimental results, Algorithm 4 (peak entropy) features the shortest overall execution duration, with the power spectrum peak detection program block being its bottleneck. Algorithms 1 and 2 are slower in execution than Algorithm 4, by 21% and 28%, respectively. They differ only in the crest modeling blocks (labeled as "Crest freq." in Figure 12), with the model based on crest energy being about 7% slower. Tonality tracking is the slowest (a 65% longer execution than Algorithm 4). Its main bottleneck is the numerically intensive calculation of tonality. Furthermore, additional preprocessing blocks (the calculation of the amplitude and phase spectrum) contribute to its total execution time.
According to Table 10, the algorithms implemented on the TMS320C5505 DSP occupy between 1.87% and 3.09% of the processing time (D_exec) for a clock set to 100 MHz. Thus, the remaining 96.91%-98.13% of the time may be spent in a low-power state, minimizing the DSP core consumption. According to Figure 12, on average, 38,573 clock cycles are spent on common signal preprocessing tasks: signal segment windowing, FFT, energy and power spectrum calculation. The rest is algorithm specific.
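To give a rough sense of what these duty cycles mean for the DSP core, the sketch below combines them with the active (about 22 mW) and standby (0.4 mW) power figures quoted for the hardware platform. The two-state weighted average is an assumption made here for illustration. It ignores wake-up overhead and every component other than the DSP core, so it is an estimate and not a measured result.

```python
P_ACTIVE_MW = 22.0    # approximate active power at 100 MHz (from the text)
P_STANDBY_MW = 0.4    # standby power during DMA-driven acquisition (from the text)
N_CYCL_TOT = 3_200_000


def cycles_from_duty_cycle(d_exec: float, n_cycl_tot: int = N_CYCL_TOT) -> float:
    """Clock cycles per segment implied by a given duty cycle."""
    return d_exec * n_cycl_tot


def average_power_mw(d_exec: float,
                     p_active_mw: float = P_ACTIVE_MW,
                     p_standby_mw: float = P_STANDBY_MW) -> float:
    """Two-state estimate: active for D_exec of the time, standby otherwise."""
    return d_exec * p_active_mw + (1.0 - d_exec) * p_standby_mw


if __name__ == "__main__":
    for d_exec in (0.0187, 0.0309):  # range of D_exec reported in Table 10
        print(f"D_exec = {100 * d_exec:.2f}%: "
              f"{cycles_from_duty_cycle(d_exec):.0f} cycles/segment, "
              f"~{average_power_mw(d_exec):.2f} mW average")
```

Under this simplified model, the average core power stays close to the standby figure across the reported range of duty cycles, which is consistent with the motivation for minimizing the active time.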
In spite of its medium classification accuracy, Algorithm 4 (the spectral peaks entropy change detector) turns out to be the most efficient in both operating scenarios, thanks to its very short execution time (see Table 11). The crest tracking algorithms feature similar execution efficiencies. Compared to Algorithm 4, their efficiency is only about 9% lower in the wheeze duration tracking scenario (see µ_AC,dur in Table 11), and µ_AC,event is only 5% lower in the event detection scenario. On the other hand, the absolute accuracies of Algorithms 1 and 2 are significantly higher than those of Algorithm 4 (see the associated AC_dur in Table 7 and AC_event in Table 9), making them suitable if higher classification accuracy is required. Algorithm 3 (tonality tracking) is the least efficient, about 50% less efficient than Algorithm 4, due to its low accuracy and long execution time.

Conclusions
In this article, we evaluated the computational complexity, the execution time and the accuracy of the wheeze detection algorithms for optimizing the active time of the DSP of the wearable sensor for real-time asthmatic wheeze detection. Efficiency metrics were introduced comparing the experimentally obtained accuracies and execution durations of four representative algorithms in wheeze occurrence detection and duration tracking scenarios.
The higher classification accuracies of the crest tracking algorithms, obtained in both operating scenarios, demonstrate their advantage over the tonality- and entropy-based ones. Although it was the least accurate in the wheeze duration tracking scenario, tonality tracking proved more accurate than the entropy-based algorithm, and comparable to the tracking of spectral crests modeled using statistical moments, in the event detection scenario.
The implementation of each algorithm required the DSP to be active for only about 3% of the time or less (1.87%-3.09%, see Table 10) for real-time operation. The highest execution speed was obtained for the entropy-based algorithm and the lowest for tonality tracking (a 65% longer execution time).
While a general purpose DSP proved valuable for the comparison of different algorithms, it does not define the absolute boundaries of the energy consumption cost of wheeze detection. Nevertheless, such an analysis provides the information necessary for the optimization of the architectural requirements of the DSP unit in future work.