Automatic Bowel Motility Evaluation Technique for Noncontact Sound Recordings

Information on bowel motility can be obtained via magnetic resonance imaging (MRI) and X-ray imaging. However, these approaches require expensive medical instruments and are unsuitable for frequent monitoring. Bowel sounds (BS) can be conveniently obtained using electronic stethoscopes and have recently been employed for the evaluation of bowel motility. More recently, our group proposed a novel method to evaluate bowel motility on the basis of BS acquired using a noncontact microphone. However, that method required BS to be detected manually in the sound recordings, and manual segmentation is inconvenient and time-consuming. To address this issue, we propose a new method to automatically evaluate bowel motility from noncontact sound recordings. In experiments on sound recordings obtained from 20 human participants, we showed that the proposed method achieves an accuracy of approximately 90% in automatic BS detection when power-normalized cepstral coefficients (PNCC) are used as the acoustic features input to an artificial neural network. Furthermore, we showed that bowel motility can be evaluated on the basis of three time-domain acoustic features extracted by our method: BS per minute, signal-to-noise ratio, and sound-to-sound interval. The proposed method has the potential to contribute to the development of noncontact methods for evaluating bowel motility.


Introduction
The decrease or loss of bowel motility is a problem that seriously affects patients' quality of life (QOL) and daily eating habits; examples include functional gastrointestinal disorders (FGID), in which patients experience bloating and pain when bowel motility is impaired by stress or other factors. Such bowel disorders are diagnosed by evaluating bowel motility. Bowel motility is currently measured using X-ray imaging or endoscopy; however, these methods require complex testing equipment and place considerable mental, physical, and financial burdens on patients, which makes them unsuitable for repeated monitoring.
In recent years, acoustic features obtained from bowel sounds (BS) have been used to evaluate bowel motility. BS are produced when gas and digestive contents are transported through the digestive tract by peristalsis [1]. BS can be easily recorded by placing an electronic stethoscope on the body surface, and methods have been developed for evaluating bowel motility by automatically extracting BS from audio data recorded with electronic stethoscopes [2][3][4][5][6][7]. In quiet conditions, BS can be heard at a short distance without an electronic stethoscope. Accordingly, our recent research demonstrated that bowel motility can be evaluated from BS recorded with a noncontact microphone in the same manner as from BS recorded with an electronic stethoscope [8]. However, in that study, BS had to be extracted manually from the noncontact recordings, and carefully labeling the sounds took a large amount of time. The sound pressure of BS recorded with noncontact microphones was lower than that of BS recorded with electronic stethoscopes placed directly on the body surface. Furthermore, compared with electronic stethoscope recordings, noncontact recordings may contain sounds other than BS at relatively higher volumes. A BS extraction system that is robust against extraneous noise is therefore needed to reduce the time- and labor-intensive work of BS labeling.
To resolve these issues, this study proposes a new system for evaluating bowel motility on the basis of BS automatically extracted from audio data recorded with a noncontact microphone. The proposed method consists of four main steps: (1) segment detection using the short-term energy (STE) method; (2) automatic extraction of two acoustic features, mel-frequency cepstral coefficients (MFCC) [9,10] and power-normalized cepstral coefficients (PNCC) [11][12][13][14], from the segments; (3) automatic classification of segments as BS or non-BS using an artificial neural network (ANN); and (4) evaluation of bowel motility on the basis of the time-domain acoustic features of the automatically extracted BS. Using audio data recorded from 20 human participants before and after they consumed carbonated water, we verified (i) the validity of automatic BS extraction by the proposed method and (ii) the validity of bowel motility evaluation based on time-domain acoustic features.

Subject Database
This study was conducted with the approval of the research ethics committee of the Institute of Technology and Science at Tokushima University in Japan. A carbonated water tolerance test was performed with 20 male participants (age: 22.9 ± 3.4 years; body mass index (BMI): 22.7 ± 3.8 kg/m²), all of whom had consented to the research content and their participation. The test was conducted after the participants had fasted for at least 12 h and lasted 25 min, consisting of a 10-min rest period before and a 15-min rest period after consuming carbonated water. During the test, sound data were recorded using a noncontact microphone (NT55, RODE), an electronic stethoscope (E-Scope2, Cardionics), and a multitrack recorder (R16, ZOOM). The primary frequency components of BS have generally been reported to lie between 100 Hz and 500 Hz [15]. On the basis of these reports, sound data were stored at a sampling frequency of 4000 Hz with a digital resolution of 16 bits. The sound data were then filtered with a third-order Butterworth bandpass filter with cutoff frequencies of 100 Hz and 1500 Hz. The participants lay in a supine position during testing, with the electronic stethoscope positioned 9 cm to the right of the navel and the noncontact microphone 20 cm above the navel [8].
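The preprocessing filter described above can be sketched with SciPy as follows. This is a minimal illustration under stated assumptions: the text does not say which software was used or whether the filter was applied causally or with zero phase, so the use of `scipy.signal.butter` with second-order sections and causal `sosfilt` is an assumption, and `bandpass_bs` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 4000                        # sampling frequency (Hz) stated in the text
LOW_HZ, HIGH_HZ = 100.0, 1500.0  # band edges stated in the text


def bandpass_bs(x, fs=FS, order=3):
    """Third-order Butterworth band-pass (100-1500 Hz), applied causally.

    Assumption: designed as second-order sections for numerical stability.
    A band-pass design of order 3 yields a 6th-order filter overall.
    """
    sos = butter(order, [LOW_HZ, HIGH_HZ], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos, x)


# Usage: a 300 Hz tone (inside the BS band) passes almost unchanged,
# whereas a 50 Hz tone (e.g. mains hum) is strongly attenuated.
t = np.arange(2 * FS) / FS
passed = bandpass_bs(np.sin(2 * np.pi * 300 * t))
```

If phase distortion matters for later timing measurements, `scipy.signal.sosfiltfilt` would give a zero-phase alternative; which variant the authors used is not stated.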
BS present in the sound data obtained using the noncontact microphone were also present in the sound data obtained using the electronic stethoscope. Based on this, as in our previous studies, we used audio playback software to listen carefully to both types of sound recordings, and classified as a BS episode any episode that was 20 ms or more in duration and could be distinguished by the ear at the same time position in both recordings [7].
For the analysis, we divided the sound data into sub-segments using a window length of 256 samples and a shift of 64 samples. The STE method was used to calculate the power in each window, making it possible to detect sub-segments whose SNR exceeds a threshold. In this study, SNR is defined as follows:

$$\mathrm{SNR} = 10\log_{10}\frac{P_S}{P_N}\ \text{[dB]}$$

Here, $P_S$ represents the signal power and $P_N$ represents the noise power. $P_N$ is a time-averaged value calculated over a one-second interval of silence identified through the listening process described above. Consecutive sub-segments detected by the STE method are treated as a single segment (also called a sound episode (SE)). If a detected segment corresponds to a BS episode, it is defined as a BS segment; otherwise, it is defined as a non-BS segment.
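The STE segment detection described above can be sketched in NumPy as follows, assuming the SNR is computed as 10 log10(P_S/P_N) per window and that "consecutive" means adjacent windows in the 64-sample hop grid. The helper names are hypothetical, and the rule in `label_segments` (a segment is BS if it overlaps any labeled BS episode) is an assumption, since the text only says a segment "corresponds to" a BS episode.

```python
import numpy as np

WIN, HOP = 256, 64  # window length and shift in samples, as stated in the text


def window_powers(x, win=WIN, hop=HOP):
    """Short-term energy: mean power of each overlapping window."""
    n = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n)[:, None]
    return np.mean(x[idx] ** 2, axis=1)


def snr_db(p_signal, p_noise):
    """SNR = 10 log10(P_S / P_N) in dB."""
    return 10.0 * np.log10(p_signal / p_noise)


def detect_segments(x, silence, thr_db=0.0, win=WIN, hop=HOP):
    """Merge runs of consecutive windows whose SNR exceeds thr_db into segments.

    silence: samples from a silent interval; its time-averaged power is P_N.
    Returns a list of (start, end) sample indices, end exclusive.
    """
    p_noise = np.mean(silence ** 2)
    active = snr_db(window_powers(x, win, hop), p_noise) > thr_db
    segments, first = [], None
    for i, on in enumerate(active):
        if on and first is None:
            first = i                                   # a new run of windows starts
        elif not on and first is not None:
            segments.append((first * hop, (i - 1) * hop + win))
            first = None
    if first is not None:                               # run reaching the last window
        segments.append((first * hop, (len(active) - 1) * hop + win))
    return segments


def label_segments(segments, bs_episodes):
    """Hypothetical rule: 1 (BS) if a segment overlaps any labeled BS episode, else 0."""
    return [int(any(s < ep_end and ep_start < e for ep_start, ep_end in bs_episodes))
            for s, e in segments]
```

Lowering `thr_db` toward 0 dB admits windows whose power is close to the noise floor, so runs merge and segments lengthen, which is consistent with the trend in segment length reported in the Results.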

Automatic BS Extraction on the Basis of Acoustic Features
The acoustic feature presented to the ANN is either MFCC or PNCC. MFCC is widely used in fields such as speech recognition and the analysis of biological sounds such as lung or heart sounds [9,[16][17][18]. MFCC is calculated by applying a discrete cosine transform to the logarithm of the outputs of triangular filter banks evenly spaced on the mel scale, a quasi-logarithmic frequency scale that approximates human auditory frequency resolution. PNCC is a feature developed to improve the robustness of speech recognition systems in noisy environments [11][12][13][14]. PNCC modifies the MFCC calculation process to better reflect certain aspects of human auditory physiology; because BS captured using noncontact microphones are generally low in volume and have a degraded SNR, PNCC can be expected to be effective. PNCC differs from MFCC primarily in the following three ways. First, instead of the triangular filter banks used in MFCC, PNCC uses gammatone filter banks based on the equivalent rectangular bandwidth to imitate the workings of the cochlea. Second, it applies bias subtraction based on the ratio of the arithmetic mean to the geometric mean (AM-to-GM ratio) of the intermediate power representation, a step absent from the MFCC calculation. Third, it replaces the logarithmic nonlinearity used in MFCC with a power-law nonlinearity. Owing to these differences, PNCC is expected to be highly resistant to noise. For BS extraction in this work, each SE is divided into frames with a frame size of 200 samples and a shift of 100 samples. Following common practice in speech recognition, we use 13-dimensional MFCC and PNCC obtained from 24-channel filter banks, averaged over all frames in each episode.
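The MFCC branch of this step can be sketched in NumPy using the parameters given above (200-sample frames, 100-sample shift, 24 triangular filters, 13 coefficients, averaged over frames). The Hamming window, the HTK-style mel formula, the natural logarithm, and the orthonormal DCT-II are assumptions, and all function names are hypothetical. PNCC (gammatone filters, AM-to-GM bias subtraction, power-law nonlinearity) is more involved and is not reproduced here.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)   # assumption: HTK-style mel formula


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    fb = np.zeros((n_filters, freqs.size))
    for k in range(n_filters):
        lo, c, hi = hz_pts[k], hz_pts[k + 1], hz_pts[k + 2]
        rising = (freqs - lo) / (c - lo)
        falling = (hi - freqs) / (hi - c)
        fb[k] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis; row k yields cepstral coefficient k."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    d = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    d[0] /= np.sqrt(2.0)
    return d


def episode_mfcc(x, fs=4000, frame=200, hop=100, n_filters=24, n_ceps=13):
    """13 MFCCs per frame, averaged over all frames of one sound episode.

    Requires len(x) >= frame (200 samples = 50 ms at 4000 Hz).
    """
    n_frames = 1 + (len(x) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame)                  # assumption: Hamming window
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # power spectrum per frame
    mel_energy = power @ mel_filterbank(n_filters, frame, fs).T
    log_mel = np.log(mel_energy + 1e-12)                 # small floor avoids log(0)
    ceps = log_mel @ dct_matrix(n_ceps, n_filters).T
    return ceps.mean(axis=0)
```

Because segments from the STE step are at least one 256-sample window long, every episode contains at least one full 200-sample frame.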
On the basis of these acoustic features, an ANN is used as a classifier to categorize the segments detected by the STE method into BS segments and non-BS segments. The ANN is a three-layer hierarchical neural network consisting of input, intermediate, and output layers, with 13, 25, and 1 units, respectively. The output function of the intermediate-layer units is the hyperbolic tangent function, and the transfer function of the output-layer unit is a linear function. The target signal is 1 for segments containing BS and 0 for non-BS segments. The ANN is trained by error back-propagation based on the Levenberg-Marquardt method [19,20]. A receiver operating characteristic (ROC) curve is then drawn from the sensitivity and specificity obtained from the outputs of the trained ANN. Through analysis of the ROC curve, an optimum threshold ($T_h$) is estimated for use when classifying the testing data sets. This optimum threshold is the one whose point on the ROC curve has the shortest Euclidean distance to the point at which sensitivity = 1 and specificity = 1 [21]. Applying this threshold to the ANN outputs on the test data, the classification performance is calculated in terms of sensitivity (Sen), specificity (Spe), positive predictive value (PPV), negative predictive value (NPV), and accuracy (Acc).
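A minimal sketch of the 13-25-1 network described above (tanh hidden units, one linear output unit, targets 1 for BS and 0 for non-BS) follows. The text does not name the software used, so training here uses SciPy's `least_squares(method="lm")`, which wraps MINPACK's Levenberg-Marquardt solver, on the sum of squared output errors; the Jacobian is computed analytically by back-propagating through the tanh layer. The initialization scale, iteration cap, and helper names are assumptions. Note that `method="lm"` needs at least as many training samples as parameters (376 here).

```python
import numpy as np
from scipy.optimize import least_squares

N_IN, N_HID = 13, 25                     # layer sizes from the text; one output unit
N_PARAMS = N_IN * N_HID + 2 * N_HID + 1  # W1, b1, w2, b2 -> 376 parameters


def unpack(theta):
    """Split the flat parameter vector into weights and biases."""
    i = N_IN * N_HID
    w1 = theta[:i].reshape(N_IN, N_HID)
    b1 = theta[i:i + N_HID]
    w2 = theta[i + N_HID:i + 2 * N_HID]
    b2 = theta[i + 2 * N_HID]
    return w1, b1, w2, b2


def forward(theta, x):
    """Tanh intermediate layer followed by a single linear output unit."""
    w1, b1, w2, b2 = unpack(theta)
    h = np.tanh(x @ w1 + b1)
    return h @ w2 + b2, h


def residuals(theta, x, t):
    return forward(theta, x)[0] - t


def jacobian(theta, x, t):
    """d(output)/d(theta) per sample, ordered like unpack(): W1, b1, w2, b2."""
    w2 = unpack(theta)[2]
    h = forward(theta, x)[1]
    dh = (1.0 - h ** 2) * w2                 # gradient back-propagated through tanh
    d_w1 = x[:, :, None] * dh[:, None, :]    # shape (n, N_IN, N_HID)
    return np.hstack([d_w1.reshape(len(x), -1), dh, h, np.ones((len(x), 1))])


def train_lm(x, t, seed=0, max_nfev=50):
    """Levenberg-Marquardt fit from small random initial connection weights."""
    rng = np.random.default_rng(seed)
    theta0 = rng.normal(scale=0.1, size=N_PARAMS)
    res = least_squares(residuals, theta0, jac=jacobian, method="lm",
                        args=(x, t), max_nfev=max_nfev)
    return res.x
```

Changing `seed` changes the initial connection weights, which corresponds to one of the two sources of randomness averaged over in the trials described in this section.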
As shown in Figure 1, the automatic BS extraction performance of this ANN-based method is evaluated by dividing the BS and non-BS segments obtained from the 20-person sound database at a ratio of 3:1 and using them as training and testing data, respectively. The average classification accuracy was calculated over multiple trials of ANN training and testing, in which (1) the initial connection weights were randomly assigned or (2) the test data were randomly selected.
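The evaluation protocol described above (3:1 split, ROC-based threshold, and the five performance measures) can be sketched in NumPy as follows. Splitting each class separately at 3:1 is an assumption about how "dividing the BS and non-BS segments" was done, and the helper names are hypothetical. The threshold is chosen on the training outputs as the ROC point nearest, in Euclidean distance, to sensitivity = specificity = 1, and is then applied to the test outputs.

```python
import numpy as np


def confusion_metrics(scores, labels, th):
    """Sen, Spe, PPV, NPV, and Acc when scores >= th are classified as BS (label 1)."""
    pred = scores >= th
    pos = labels == 1
    tp, tn = np.sum(pred & pos), np.sum(~pred & ~pos)
    fp, fn = np.sum(pred & ~pos), np.sum(~pred & pos)
    return {
        "Sen": tp / (tp + fn),
        "Spe": tn / (tn + fp),
        "PPV": tp / (tp + fp) if tp + fp else float("nan"),
        "NPV": tn / (tn + fn) if tn + fn else float("nan"),
        "Acc": (tp + tn) / len(labels),
    }


def optimal_threshold(scores, labels):
    """Threshold whose ROC point is closest to (Sen, Spe) = (1, 1)."""
    best_th, best_d = None, np.inf
    for th in np.unique(scores):
        m = confusion_metrics(scores, labels, th)
        d = np.hypot(1.0 - m["Sen"], 1.0 - m["Spe"])
        if d < best_d:
            best_th, best_d = th, d
    return best_th


def split_3_to_1(labels, rng):
    """Assumed per-class random 3:1 split into training and testing indices."""
    train, test = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        cut = (3 * len(idx)) // 4
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return np.array(train), np.array(test)
```

Repeating the split, training, threshold selection, and scoring with different random seeds and averaging the metric dictionaries would reproduce the structure, not the numbers, of the repeated trials reported in the Results.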

Evaluation of Bowel Motility Based on Automatically Extracted BS
Our past research demonstrated significant differences between before and after consumption of carbonated water in the following time-domain acoustic features: BS detected per minute, SNR, BS length, and the interval between BS (sound-to-sound (SS) interval). These differences suggest that bowel motility can be evaluated on the basis of these acoustic features [8]. This study therefore examines whether bowel motility can be evaluated automatically on the basis of the same acoustic features. To evaluate bowel motility from each participant's data, the time-domain acoustic features were computed from the BS automatically extracted by the proposed method under leave-one-out cross validation. As in past studies, the differences in these acoustic features before and after the participants consumed carbonated water were evaluated using a Wilcoxon signed-rank test. The block diagram in Figure 2 shows the process leading up to the evaluation of bowel motility.
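The feature computation and statistical comparison described above can be sketched as follows. Several details are assumptions not stated in the text: each recording is summarized by the mean of each feature over its BS segments, segments are sorted by onset, and the SS interval is measured from the end of one BS to the onset of the next. The helper names are hypothetical; `scipy.stats.wilcoxon` performs the paired signed-rank test across participants.

```python
import numpy as np
from scipy.stats import wilcoxon


def time_domain_features(segments, seg_snr_db, duration_s, fs=4000):
    """Summarize the automatically extracted BS of one recording.

    segments: onset-sorted (start, end) sample indices of segments classified as BS.
    seg_snr_db: SNR (dB) of each of those segments.
    duration_s: length of the recording in seconds.
    """
    starts = np.array([s for s, _ in segments], dtype=float)
    ends = np.array([e for _, e in segments], dtype=float)
    gaps = (starts[1:] - ends[:-1]) / fs   # assumption: end-to-onset SS interval
    return {
        "bs_per_min": len(segments) / (duration_s / 60.0),
        "snr_db": float(np.mean(seg_snr_db)),
        "length_s": float(np.mean((ends - starts) / fs)),
        "ss_interval_s": float(np.mean(gaps)) if gaps.size else float("nan"),
    }


def compare_before_after(before, after):
    """Paired Wilcoxon signed-rank test across participants; returns the p-value."""
    return wilcoxon(before, after).pvalue
```

In use, `time_domain_features` would be called once per participant for the pre- and post-consumption recordings, and `compare_before_after` applied to the 20 paired values of each feature.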

Results
To investigate how the SNR threshold used in the STE method affects both the automatic BS extraction performance and the evaluation of bowel motility, experiments were performed with SNR thresholds of 0, 0.5, 1, and 2 dB. Table 1 lists the number and length of BS and non-BS segments obtained at each SNR threshold. Table 1 reveals the following pattern both before and after consumption of carbonated water: as the SNR threshold decreases, the numbers of both BS and non-BS segments increase down to a certain threshold, below which they decrease. The values in the table also confirm that the lengths of both types of segments increase as the SNR threshold decreases. Both the number and the length of segments were larger after consumption of carbonated water than before, and BS segments were longer than non-BS segments.

Automatic Bowel Sound Detection
To evaluate the automatic extraction performance of the proposed method, the BS and non-BS segments were each divided into training and testing data at a ratio of 3:1. Tables 2 and 3 present the average classification accuracies over 100 trials of the ANN-based approach using MFCC and PNCC, respectively, as acoustic features. Table 2 shows that, with MFCC, accuracy before consumption of carbonated water degraded slightly as the SNR threshold decreased, whereas accuracy after consumption increased as the SNR threshold decreased. Table 3 demonstrates that, with PNCC, classification accuracy increases as the SNR threshold decreases both before and after consumption of carbonated water, with the highest accuracy obtained at an SNR threshold of 0 dB. Figure 3 compares the extraction accuracy obtained with MFCC and PNCC before and after consumption of carbonated water. This comparison shows that PNCC is more accurate than MFCC at all SNR thresholds, and at an SNR threshold of 0 dB before consumption of carbonated water, the average accuracy with PNCC is considerably higher than that with MFCC. Because BS before consumption of carbonated water generally have lower sound pressure than those after consumption, this suggests that PNCC is effective for classifying such low-pressure sounds. On the basis of these observations, the subsequent automatic evaluation of bowel motility was conducted using the ANN-based approach with PNCC.
Table 3. Results of automatic BS extraction using an ANN-based approach based on PNCC (using performance evaluation through random sampling).

Bowel Motility Evaluation
In this study, leave-one-out cross validation was performed for each participant to verify the classification accuracy of the ANN-based approach using PNCC. Leave-one-out cross validation was repeated 50 times, and Table 4 presents the average classification accuracies obtained by taking, for each participant, the run with the highest accuracy.

Table 4. Results of automatic BS extraction using an ANN-based approach based on PNCC (using performance evaluation through leave-one-out cross validation).
As noted in a prior study [8], Table 5 shows that the acoustic features BS detected per minute, SNR, and SS interval capture the differences in bowel motility before and after a participant consumes carbonated water, even as the SNR threshold is lowered to about 0 dB. These results depend on the accuracy of automatic BS extraction. However, unlike in the prior study [8], no significant difference in BS length was found between before and after consumption of carbonated water. This suggests that, even when the SNR threshold is lowered to 0 dB, BS detected per minute, SNR, and SS interval can be used to evaluate bowel motility without being affected by the lower threshold.

Discussion and Conclusions
This study proposes a system that automatically evaluates bowel motility on the basis of time-domain acoustic features of BS automatically extracted from sound data recorded using a noncontact microphone. Although bowel motility has previously been studied using BS [2][3][4][5][6][7], those studies used electronic stethoscopes applied to the body surface. Our recent research demonstrated that bowel motility can be evaluated from sound data recorded using a noncontact microphone in the same way as from data recorded with a stethoscope [8]. However, BS extraction in that study relied on manual labeling. The sound pressure of BS recorded using noncontact microphones is lower than that of BS recorded using electronic stethoscopes applied to the body surface, and fewer BS are perceptible. Sound data recorded without contact therefore require an automatic BS extraction method that is resistant to extraneous noise. The results suggest that the proposed system, which uses the noise-robust PNCC, can automatically extract BS with approximately 90% accuracy when the SNR threshold is 0 dB. Furthermore, even when the SNR threshold is lowered to 0 dB, the results suggest that bowel motility can be evaluated using time-domain acoustic features other than BS length, namely BS detected per minute, SNR, and SS interval.
Decreasing the SNR threshold used in the STE method allows the proposed method to capture more sound and lengthens the detected segments, which increases the information available to the system for BS/non-BS discrimination. We believe this is why the performance of automatic BS extraction improved. However, the same lengthening of BS segments caused by decreasing the SNR threshold also suggests that the true BS length cannot be obtained under these conditions.
Compared with the performance evaluation based on random sampling, the evaluation based on leave-one-out cross validation tended to show a larger standard deviation and lower sensitivity for the proposed method, particularly before the participants consumed carbonated water. This was likely due to the small number of participants, which meant that insufficient BS segments were available for use in leave-one-out cross validation. We therefore expect performance to improve as the number of subjects increases. To further improve system performance, a combination of the following two measures would likely be useful: (1) replacing the STE method with another method for detecting segments containing sound; and (2) selecting acoustic features with excellent resistance to extraneous noise.
In this study, we have provided new knowledge for noncontact automatic evaluation of bowel motility. It is hoped that the foundations of the system developed in this study can assist in the further development of the evaluation of bowel motility using noncontact microphones and research related to diagnostic support for bowel disorders.
Author Contributions: T.E., R.S., and Y.G. conceived and designed the experiments; R.S. and Y.G. performed the experiments; R.S. analyzed the data; R.S. and M.A. contributed materials/analysis tools; T.E. and R.S. wrote the paper.