Deep Learning-Based Nystagmus Detection for BPPV Diagnosis

Sae Byeol Mun; Young Jae Kim; Ju Hyoung Lee; Gyu Cheol Han; Sung Ho Cho; Seok Jin; Kwang Gi Kim

doi:10.3390/s24113417

,

and

¹

Department of Health Sciences and Technology, Gachon Advanced Institute for Health Sciences and Technology, Gachon University, Incheon 21999, Republic of Korea

²

Gachon Biomedical & Convergence Institute, Gachon University Gil Medical Center, Incheon 21565, Republic of Korea

³

Department of Otolaryngology Head & Neck Surgery, College of Medicine, Gachon University, Incheon 21565, Republic of Korea

⁴

AMJ Co., Ltd., Ansan-si 15610, Republic of Korea

Sensors2024, 24(11), 3417;https://doi.org/10.3390/s24113417

This article belongs to the Special Issue Deep Learning for Computer Vision and Image Processing Sensors

Version Notes

Order Reprints

Review Reports

Abstract

In this study, we propose a deep learning-based nystagmus detection algorithm using video oculography (VOG) data to diagnose benign paroxysmal positional vertigo (BPPV). Various deep learning architectures were utilized to develop and evaluate nystagmus detection models. Among the four deep learning architectures used in this study, the CNN1D model proposed as a nystagmus detection model demonstrated the best performance, exhibiting a sensitivity of 94.06 ± 0.78%, specificity of 86.39 ± 1.31%, precision of 91.34 ± 0.84%, accuracy of 91.02 ± 0.66%, and an F₁-score of 92.68 ± 0.55%. These results indicate the high accuracy and generalizability of the proposed nystagmus diagnosis algorithm. In conclusion, this study validates the practicality of deep learning in diagnosing BPPV and offers avenues for numerous potential applications of deep learning in the medical diagnostic sector. The findings of this research underscore its importance in enhancing diagnostic accuracy and efficiency in healthcare.

Keywords:

horizontal nystagmus; convolutional neural network; nystagmus detection; benign paroxysmal positional vertigo; pupil tracking

1. Introduction

Nystagmus refers to the occurrence of fine eye tremors resulting from challenges in eye movements, often closely associated with peripheral vertigo conditions such as benign paroxysmal positional vertigo (BPPV) [1,2]. This phenomenon significantly impacts visual stability and movement control, with BPPV causing inconvenience and safety issues in daily life due to the eyes’ inability to maintain a fixed position [3,4]. A diagnosis of nystagmus relies on the subjective experiences of the patient, hence introducing variability into its symptoms. This can present challenges, especially in identifying conditions like BPPV, which requires specialized knowledge. Consequently, accurate diagnosis of BPPV can be time-consuming and error-prone [5,6,7].

Recent efforts have been focused on the research of computer-aided diagnosis (CAD) for nystagmus [8]. CAD utilizes image analysis and pattern recognition technologies to swiftly analyze information related to specific patterns of nystagmus. This enables clinicians to make more accurate and prompt diagnoses, thereby improving patient safety and the efficacy of treatments.

In 2021, Zhang et al. developed a deep learning-based model for detecting nystagmus in video recordings of VOG tests. Training and validation were conducted using 8000 data entries, and the proposed torsion-aware bi-stream identification network (TBSIN) model achieved an F₁-score of 0.81 [9]. In 2019, Lim et al. proposed a model for diagnosing benign paroxysmal positional vertigo (BPPV) by detecting the presence of nystagmus in infrared video recordings of VOG tests. Using data from 1005 patients and employing 10-fold cross validation, their LSTM-based fusion model reached an F₁-score of 0.89 [10]. In 2022, Trung Xuan Pham et al. developed a deep learning-based model for detecting nystagmus in video recordings of eye movement tests taken using VOG. The study used 746 samples to diagnose nystagmus in six categories, achieving an F₁-score of 0.90 [11]. In 2023, Haibo Li et al. developed a deep learning-based nystagmus recognition model for vertigo diagnosis using infrared videos obtained from an eye movement recorder. They utilized a two-dimensional convolutional neural network (CNN) combined with Bi-Directional Long Short-Term Memory (BiLSTM), termed CNN-BiLSTM, for training and validation on a dataset comprising 24,521 videos obtained from 1236 patients. They achieved a sensitivity of 91.20% [12]. In 2024, Hang Lu et al. developed a deep learning-based model for diagnosing benign paroxysmal positional vertigo (BPPV) using data from 518 patients who visited the hospital for infrared nystagmus diagnostic tests. They trained and validated their model using the BKTDN (Big Kernel Temporal Difference Network) model, which enhances the TDN (Temporal Difference Module) model with a big kernel long-term module. On average, their model demonstrated an accuracy of 81.7% [13].

However, previous studies still faced several limitations and issues. Firstly, while prior research has investigated the automatic detection of eye oscillations, a comprehensive AI-based BPPV diagnosis system has not yet been implemented. This limitation indicates that manual intervention by medical professionals is still required throughout the process of AI-based nystagmus diagnosis due to the complexity of nystagmus diagnosis. Medical practitioners need to manually analyze vast amounts of data to accurately identify the cause of nystagmus and develop an appropriate treatment plan based on that, leading to an inefficient and time-consuming process. Furthermore, this suggests that the research to date has not succeeded in implementing an optimized process for nystagmus diagnosis, and automatically delivering accurate diagnoses considering the various causes and complex symptoms of nystagmus remains a challenging task.

In this paper, we propose a nystagmus detection model for diagnosing subjects’ BPPV based on deep learning. To build a reliable and robust AI diagnosis system, we developed a nystagmus detection model utilizing the horizontal movement of the pupils extracted through a pupil movement analysis algorithm. The proposed model structure is as follows: The pupils were segmented for movement tracking from video nystagmography data, recorded for BPPV diagnosis. Pupil movement was quantitatively extracted from the segmented pupils using a pupil movement analysis algorithm. Nystagmus was detected using the extracted pupil movement data. We conclude this paper by presenting the experimental results and subsequent analysis of the nystagmus detection model.

2. Materials and Methods

2.1. Development Environment

The deep learning training system described in this study was equipped with four NVIDIA Tesla V100 graphics processing units (GPUs) (NVIDIA, Santa Clara, CA, USA), 1.2 TB of RAM, and two twenty-core Intel (R) Xeon (R) Gold 5218R processors(Intel, Santa Clara, CA, USA) operating at 2.10 GHz. The training was conducted on an Ubuntu 18.04.5 operating system using Python (version 3.6) and the TensorFlow (version 2.2.0) framework. Additionally, the OpenCV library (version 4.5.1) was used for data loading and preprocessing.

2.2. Data Collection

A total of 3363 BPPV test videos were collected from 828 patients who visited Gachon University Gil Hospital for a video nystagmus test between 2 November 2021 and 7 September 2022. These videos were recorded using an ICS Chartr 200 device (Otometrics, a division of Natus, Copenhagen, Denmark) at 30 fps with a resolution of 320 × 120. The lengths of the collected videos varied by up to 5225 frames and were recorded in an infrared environment. All data were collected with the approval of the Gachon University Gil Hospital Institutional Review Board (GBIRB2023-313).

2.3. Data Labeling

In this study, our team developed in-house software capable of independently extracting two types of labeling for pupil tracking and nystagmus detection models. An ophthalmologist was trained on the developed software to manually perform the labeling for ground truth (GT) data.

2.3.1. GT Data for Pupil Tracking Model Development

The labeling data used for the development of the pupil tracking model mentioned in this paper consisted of binary images of the pupil area. In the study, 2160 frames were extracted from VOG, and in each frame, the area of the pupils of both eyes was labeled with dimensions of 320 × 120.

2.3.2. GT Data for Nystagmus Detection Model Development

Nystagmus refers to the phenomenon of continuous and repetitive rapid eye movements that originate from slow eye movements and swiftly return to the original position [14]. In the collected VOG data, videos displaying symptoms of nystagmus often do not exhibit these symptoms in the initial frames, but they become more pronounced in the later parts of the video. Therefore, labeling entire videos would be impractical. In this study, frames presenting nystagmus were distinguished on a frame-by-frame basis for all 3363 raw video data. Videos showing continuous and rapid eye movements oscillating from the origin point were labeled as positive.

2.4. Data Preprocessing

2.4.1. Data Preprocessing for Pupil Tracking Model Development

To construct the training data, the left and right areas of the eyes shown in the video images were separated and extracted. The extracted images were resampled to dimensions of 320 × 240 and, given the nature of infrared images, were converted into grayscale. Data augmentation was performed on the collected images via horizontal flipping to create mirrored images, while random rotations (≤15°) and horizontal shifts (5%) were used to create a more robust model.

2.4.2. Data Preprocessing for Nystagmus-Detection Model Development

Given the variation in length of up to 5225 frames, all the videos were divided into segments of 1000 frames each. Since videos displaying symptoms of nystagmus often do not exhibit these symptoms in the initial frames, which become more pronounced in the later parts of the video, the frames were deleted in reverse in segments of 1000 frames. Videos with a remaining length shorter than 1000 frames were discarded. The optimal frame length was determined through experimentation to optimize the model’s ability to recognize patterns and generalize. As a result of this preprocessing, the 3363 nystagmus video clips were divided into a total of 27,452 clips.

For the segmentation GT data, each of the 1000 frames of a segment labeled as nystagmus-positive was labeled. For the detection GT data, if more than 1% of the frames out of 1000 contained symptoms of nystagmus, the segment was classified as positive; otherwise, it was marked as negative.

2.5. Pupil Tracking and Quantification of Movement

For pupil movement tracking, a CNN-based 2D U-Net model architecture was utilized for pupil segmentation [15]. To train and validate the deep learning model, the data were randomly divided into training, validation, and test sets at an 8:1:1 ratio, each consisting of 6912, 216, and 216 samples, respectively. For each sample in the training set, there were 3 corresponding data points generated through augmentation. Model training was conducted with a learning rate of 1 × 10⁻⁴, 200 epochs, and a batch size of 16. The outermost coordinates of the segmented pupil area were extracted to determine the center coordinates of the pupil. Using these extracted coordinates, the best-fitting ellipse was calculated using the least-squares fitting algorithm [16,17]. The extracted elliptical data were depicted as shown in Figure 1.

Figure 1. Pupil detection results using a least-squares fitting algorithm. The red circle represents the edge of the pupil, while the blue dot indicates the center of the pupil.

To quantify the pupil movement using the extracted center coordinates, the horizontal movements of the pupil center were converted into 1D time-series data (Figure 2). However, there were cases in which the pupil center was not detected or falsely detected due to closed eyes or inaccurate pupil segmentation. In such cases, the data from the frames before and after the false detection were also likely to be inaccurate. Therefore, the coordinates at which inaccurate detections occurred, along with certain sections around those frames, were designated as inaccurately detected segments. To minimize errors in the extraction algorithm, segments suspected of containing inaccurate detections were treated as missing data.

Figure 2. Pupil tracking and eye movement quantification results.

2.6. Segmented Data Correction through a Bridging Algorithm

In patients with nystagmus, the typical eye movement pattern starts with slow eye movements, followed by a return to the original position with quick, saccadic movements, resembling a sawtooth pattern [14]. Given these characteristics, the use of linear interpolation as a bridging algorithm can inadvertently generate nystagmus-like movements, even in the absence of the condition. To address this issue, this study adopts an alternative approach to correcting missing data. This method fills in missing values with the position of the pupil detected immediately before the gap, denoted as NA, thereby avoiding the inadvertent generation of nystagmus-like patterns. This approach is illustrated in Figure 3.

Figure 3. Correcting for missing values (NA) in horizontal pupil movement. (A) Serrated eye movements in patients with nystagmus, (B) eye movement data with missing values, (C) data with linear interpolation, (D) data calibrated to the last detected position.

2.7. Nystagmus Detection Algorithm

In this study, various network architectures were utilized to train models for video-based nystagmus detection, including a CNN-based GoogLeNet, a Residual Neural Network (ResNet), a custom-developed 1D network model (CNN1D), and a neural network combining CNN1D with a long short-term memory (LSTM) network (CNN-LSTM1D). This led to the development of four different network structures as candidates for the nystagmus diagnosis model. GoogLeNet is known for its deep and complex structure and uses inception modules for efficient feature extraction [18]. ResNet introduces residual blocks to address the vanishing gradient problem encountered while training deep neural networks, allowing for the training of even deeper networks [19].

To detect nystagmus, this study proposes two custom-developed network structures: CNN1D and CNN-LSTM1D. The proposed CNN1D model adopts a shallow CNN-based network structure to prevent data loss and overfitting. The network consists of a total of 14 layers. The input layer has dimensions of 1000 by 1, corresponding to the x-axis movement features of the pupil. Subsequently, two 1D convolution layers with a kernel size of three were used to extract the basic features of the data. A max pooling layer was used to reduce the dimensionality of the feature map and extract strong features, and this process was repeated once more. Two additional 1D convolution layers followed, and Global Average Pooling (GAP) was used before the final output to reduce the number of features. Finally, two Fully Connected (FC) layers were used as the classifier, and batch normalization was performed before the final FC layer. The final output layer followed, with the total number of trainable parameters reaching 92,305.

The CNN-LSTM1D network structure combines CNN1D with an LSTM block to leverage both the local and spatial features of signals, as well as their temporal characteristics [20]. The network comprised 14 layers. Resembling the CNN1D structure closely, the Global Average Pooling (GAP) layer was replaced with a maximum pooling layer. This was followed by an LSTM layer and two FC layers. The final output layer completes the architecture, with the total number of trainable parameters at 117,781.

As previously mentioned, for the training and validation of the deep learning model, the training, validation, and test sets were randomly split at an 8:1:1 ratio based on the number of patients, resulting in 2656, 347, and 360 videos, respectively. These videos were further split into 21,856, 2840, and 2756 clips. The Adam optimizer was used to train each network with a learning rate of 0.00001 and a batch size of 32 [21]. All the networks were trained for over 200 epochs, with the best model extracted for optimization and early stopping applied to prevent overfitting. The mean absolute error (MAE) was selected as the loss function, and accuracy was chosen as the performance metric. All the networks employed a 1D convolutional architecture to process the input data, representing the horizontal movement of the pupils over time. The proposed structures of the two models are depicted in Figure 4.

Figure 4. The deep learning model architecture for nystagmus detection. (A) The details of the first proposed CNN1D model, (B) the details of the second proposed CNN-LSTM1D model.

3. Results

3.1. Performance Evaluation of the Pupil Segmentation Models for Pupil Tracking

To assess the performance of the trained pupil segmentation model in pupil tracking, its accuracy was evaluated using a randomly selected test set (N = 216). The performance evaluation was conducted using a confusion matrix to calculate the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values. Metrics including sensitivity, specificity, precision, accuracy, and the dice similarity coefficient score (DSC) were measured using the derived confusion matrix [22,23,24,25]. The performance evaluation results for the nystagmus segmentation model exhibited a recall of 97.51 ± 4.40%, precision of 97.98 ± 2.93%, accuracy of 99.92 ± 0.07%, and a DSC of 97.65 ± 3.09% (Table 1).

Table 1. Performance evaluation results of the pupil segmentation model.

3.2. Performance Evaluation of the Nystagmus Detection Models

The nystagmus detection models were trained using a training set (N = 2656) and a validation set (N = 347), and their performance was evaluated using a randomly selected test set (N = 360). To assess the performance of the nystagmus detection models, metrics including sensitivity, specificity, precision, accuracy, and F₁-score were measured using a confusion matrix. The results exhibited metrics of the highest performance with sensitivity at 95.68 ± 0.79%, specificity at 86.94 ± 1.42%, precision at 91.53 ± 0.96%, accuracy at 91.02 ± 0.66%, and an F₁-score of 92.68 ± 0.55% (Table 2).

Table 2. Performance evaluation results of the nystagmus detection models.

A comparison of the receiver operating characteristic (ROC) curves of the nystagmus detection models can be seen in Figure 5 [23]. The four models achieved high area under the curve (AUC) ROC values ranging between 0.83 and 0.90. Analysis of the ROC curves indicated that all the classifiers performed adequately, with statistically significant differences in the AUC performance among the four models. Notably, the CNN1D model was identified as the most suitable model for nystagmus detection, exhibiting the largest AUC of 0.902. This indicates that the CNN1D model can achieve higher accuracy in detecting nystagmus compared to the other models.

Figure 5. Comparison of ROC curves among the four nystagmus detection models.

4. Discussion

In this study, we propose a deep learning-based nystagmus detection algorithm for the diagnosis of BPPV using VOG data. To predict horizontal nystagmus, 1D pupil movement time-series data were extracted from the VOG data using U-Net2D. Based on the extracted data, four types of architectures were referenced to train the nystagmus detection models: GoogLeNet1D, ResNet1D, and two custom-developed models, CNN1D and CNN-LSTM1D.

Table 3 compares the performance of the models developed in this study with those from other studies that utilized machine learning- and deep learning-based nystagmus detection. All the models were designed to diagnose nystagmus from infrared-based nystagmus diagnostic video footage, with the methods or techniques listed by study. The results in each study were compared using sensitivity, precision, and F₁-score. Since the reported ranges for these metrics in each study varied from 0 to 1 or from 0 to 100, all the values were adjusted to fall within the range of 0 to 100.

Table 3. Comparison of nystagmus detection achieved in this study with related researchers’ work. (DIM: dimensions, SEN: sensitivity, PRE: precision, F₁: F₁-score).

The algorithm proposed in this study for the diagnosis of BPPV demonstrated higher sensitivity and F₁-score values compared to other studies, despite the N value being not as high as other studies, indicating that the 1D model structure contributed to an overall improvement in performance. The results demonstrate that the proposed algorithm is an effective tool for diagnosing BPPV. Moreover, comparisons to other studies emphasize the generalizability and reliability of the methods used in this study for diagnosing BPPV. Considering the complexity of the models used in other studies, the CNN1D model developed in this study provided higher accuracy and sensitivity despite its simplicity, indicating that simplicity can sometimes lead to better performance in specific diagnostic applications.

A comparison of the nystagmus diagnostic performance of the models trained using the four architectures revealed that the CNN1D model exhibited the best performance. This can be attributed to the fact that the training data primarily consisted of 1D time-series data on the horizontal movement of pupils, limiting the amount of information conveyed, likely leading to overfitting in more complex models. By employing a shallower CNN model, it was possible to extract more accurate features from limited information, which helped reduce the complexity of the model while preventing overfitting and improving generalization. These findings underscore the importance of innovation of the algorithm selection and model structure in nystagmus diagnosis, suggesting that future research should explore the application of various data types and model architectures to further enhance nystagmus diagnosis model performance.

However, there are limitations to this study. Analysis of the data that each model failed to correctly diagnose revealed that many instances involved transitions from normal eye responses to nystagmus-like movements. This suggests that while the clear sawtooth pattern of nystagmus did not emerge in the 1D time-series data representing pupil movement, precursor signs of nystagmus were present in the actual VOG data. This phenomenon indicates that detecting nystagmus based solely on 1D time-series pupil movement data presents challenges, particularly in segments where transitions from normal eye movements to nystagmus occur. This difficulty can be attributed to the limitations of single-variable models, which may not accurately capture specific changes in the data. Additionally, there is a need for additional data collection and construction. Although the model’s learning process and results were excellent even at low N values, there is a need to collect and build additional data to reflect this in various clinical environments.

Therefore, to overcome these limitations, future research should explore methods that incorporate not only information on horizontal pupil movement but also other variables, approaching the problem with multivariate data. Integrating a range of variables into training models is expected to enable more accurate differentiation of nystagmus. This multivariate approach is anticipated to be more effective than using short segments in predicting nystagmus, potentially overcoming the limitations of existing models and enhancing the accuracy of nystagmus diagnosis. This suggests the need for further exploration of how different types of data inputs and model architectures can improve the detection and diagnosis of nystagmus, indicating a promising direction for future research in this field. Additionally, there is a need to collect and build additional data so that they can be applied in various environments. Data obtained from a variety of situations and conditions will help improve the accuracy of the model and better detect subtle differences between normal eye movements and nystagmus. This will allow the model to perform more reliably in a variety of clinical scenarios.

5. Conclusions

In this paper, we propose a reliable method for nystagmus detection and a new approach to the diagnosis of benign paroxysmal positional vertigo (BPPV) through the integration of deep learning into pupillary analysis. The experimental results demonstrate that by utilizing a pupillary tracking model to detect the pupil area in video frames and classifying the acquired pupillary movement time-series data, nystagmus can be effectively detected. The proposed method has a significantly improved diagnostic accuracy compared to existing methods. The application of deep learning techniques to BPPV diagnosis is crucial for identifying and analyzing complex nystagmus patterns, while pupillary analysis aids in tracking eye movements to yield precise results. This approach expands the scope of deep learning applications in medical diagnostics, enabling faster and more accurate diagnoses, thereby reducing patients’ treatment time, and enhancing the overall quality of medical services. We anticipate meaningful contributions to the field of medicine through these advancements.

Author Contributions

Methodology, S.B.M.; software, S.B.M.; validation, S.B.M.; conceptualization, Y.J.K.; formal analysis, Y.J.K.; investigation, J.H.L.; data curation, J.H.L. and G.C.H.; writing—original draft preparation, S.B.M.; writing—review and editing, Y.J.K.; visualization, S.B.M.; supervision, J.H.L., G.C.H., S.H.C., S.J. and K.G.K.; project administration, S.H.C., S.J. and K.G.K.; funding acquisition, K.G.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Technology development Program (S3141998) funded by the Ministry of SMEs and Startups (MSS, Republic of Korea), and by the Technology Innovation Program (or Industrial Strategic Technology Development Program (K_G012001185601, Building Data Sets for Artificial Intelligence Learning) funded By the Ministry of Trade Industry & Energy (MOTIE, Republic of Korea), and by the Gachon University research fund of 2022 (GCU-202300500001).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, approved by the Institutional Review Board (IRB) of Gachon University Gil medical center (GBIRB2023-313), and was conducted in accordance with the relevant guidelines and ethical regulations. Considering that this study would impose no more than minimal risk on the patients included, a waiver of consent was obtained.

Informed Consent Statement

Given the retrospective nature of this investigation, the Institutional Review Board (IRB) at Gachon University Gil Medical Center issued a waiver concerning the necessity of the patients’ informed consent. Analyzing pre-existing data, inherent in the retrospective approach, carries minimal risk to participants. This waiver was implemented to uphold ethical standards in the study, ensuring due regard for the rights and privacy of all the individuals concerned.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

Author Sung Ho Cho was employed by the company AMJ Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Eggers, S.D.Z.; Bisdorff, A.; Von Brevern, M.; Zee, D.S.; Kim, J.-S.; Perez-Fernandez, N.; Welgampola, M.S.; Della Santina, C.C.; Newman-Toker, D.E. Classification of Vestibular Signs and Examination Techniques: Nystagmus and Nystagmus-Like Movements. J. Vestib. Res. 2019, 29, 57–87. [Google Scholar] [CrossRef] [PubMed]
Lemos, J.; Strupp, M. Central Positional Nystagmus: An Update. J. Neurol. 2022, 269, 1851–1860. [Google Scholar] [CrossRef] [PubMed]
Buttner, U.; Helmchen, C.; Brandt, T. Diagnostic Criteria for Central Versus Peripheral Positioning Nystagmus and Vertigo: A Review. Acta Otolaryngol. 1999, 119, 1–5. [Google Scholar] [PubMed]
Lopez-Escamez, J.A.; Gamiz, M.J.; Fernandez-Perez, A.; Gomez-Fiñana, M. Long-Term Outcome and Health-Related Quality of Life in Benign Paroxysmal Positional Vertigo. Eur. Arch. Oto-Rhino-Laryngol. Head Neck 2005, 262, 507–511. [Google Scholar] [CrossRef] [PubMed]
Alvarenga, G.A.; Barbosa, M.A.; Porto, C.C. Benign Paroxysmal Positional Vertigo without Nystagmus: Diagnosis and Treatment. Braz. J. Otorhinolaryngol. 2011, 77, 799–804. [Google Scholar] [CrossRef] [PubMed]
Edlow, J.A.; Newman-Toker, D. Using the Physical Examination to Diagnose Patients with Acute Dizziness and Vertigo. J. Emerg. Med. 2016, 50, 617–628. [Google Scholar] [CrossRef] [PubMed]
Power, L.; Murray, K.; Szmulewicz, D.J. Characteristics of Assessment and Treatment in Benign Paroxysmal Positional Vertigo (Bppv). J. Vestib. Res. 2020, 30, 55–62. [Google Scholar] [CrossRef] [PubMed]
Ben Slama, A.; Sahli, H.; Mouelhi, A.; Marrakchi, J.; Boukriba, S.; Trabelsi, H.; Sayadi, M. Hybrid Clustering System Using Nystagmus Parameters Discrimination for Vestibular Disorder Diagnosis. J. X-Ray Sci. Technol. 2020, 28, 923–938. [Google Scholar] [CrossRef] [PubMed]
Zhang, W.; Wu, H.; Liu, Y.; Zheng, S.; Liu, Z.; Li, Y.; Zhao, Y.; Zhu, Z. Deep Learning Based Torsional Nystagmus Detection for Dizziness and Vertigo Diagnosis. Biomed. Signal Process. Control 2021, 68, 102616. [Google Scholar] [CrossRef]
Lim, E.-C.; Park, J.H.; Jeon, H.J.; Kim, H.-J.; Lee, H.-J.; Song, C.-G.; Hong, S.K. Developing a Diagnostic Decision Support System for Benign Paroxysmal Positional Vertigo Using a Deep-Learning Model. J. Clin. Med. 2019, 8, 633. [Google Scholar] [CrossRef] [PubMed]
Pham, T.X.; Choi, J.W.; Mina, R.J.L.; Nguyen, T.X.; Madjid, S.R.; Yoo, C.D. Lad: A Hybrid Deep Learning System for Benign Paroxysmal Positional Vertigo Disorders Diagnostic. IEEE Access 2022, 10, 113995–114007. [Google Scholar] [CrossRef]
Li, H.; Yang, Z. Torsional Nystagmus Recognition Based on Deep Learning for Vertigo Diagnosis. Front. Neurosci. 2023, 17, 1160904. [Google Scholar] [CrossRef] [PubMed]
Lu, H.; Mao, Y.; Li, J.; Zhu, L. Multimodal Deep Learning-Based Diagnostic Model for Bppv. BMC Med. Inform. Decis. Mak. 2024, 24, 82. [Google Scholar] [CrossRef] [PubMed]
Abadi, R.V. Mechanisms Underlying Nystagmus. J. R. Soc. Med. 2002, 95, 231–234. [Google Scholar] [CrossRef] [PubMed]
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
Halır, R.; Flusser, J. Numerically Stable Direct Least Squares Fitting of Ellipses. In Proceedings of the 6th International Conference in Central Europe on Computer Graphics and Visualization. WSCG, Plzen, Czech Republic, 9–13 February 1998. [Google Scholar]
Hammel, B.; Sullivan-Molina, N. Bdhammel/Least-Squares-Ellipse-Fitting: v2.0.0. 2020. Available online: https://zenodo.org/records/3723294 (accessed on 19 May 2022).
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
Mehmood, F.; Ahmad, S.; Whangbo, T.K. An Efficient Optimization Technique for Training Deep Neural Networks. Mathematics 2023, 11, 1360. [Google Scholar] [CrossRef]
Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to Roc, Informedness, Markedness and Correlation. arXiv 2020, arXiv:2010.16061. [Google Scholar]
Altman, D.; Machin, D.; Bryant, T.; Gardner, M. Statistics with Confidence: Confidence Intervals and Statistical Guidelines; John Wiley & Sons: Hoboken, NJ, USA, 2013. [Google Scholar]
Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond Accuracy, F-Score and Roc: A Family of Discriminant Measures for Performance Evaluation. In Proceedings of the Australasian Joint Conference on Artificial Intelligence, Hobart, Australia, 4–8 December 2006. [Google Scholar]
Simpson, T. A Method of Establishing Group of Equal Amplitude in Plant Society Based on Similarity of Species Content; Det Kongelige Danske Videnskabernes Selskab. Biologiske Skrifter; Munksgaard in Komm.: Copenhagen, Denmark, 1948; Volume 5, pp. 1–34. [Google Scholar]
Anh, D.T.; Takakura, H.; Asai, M.; Ueda, N.; Shojaku, H. Application of Machine Learning in the Diagnosis of Vestibular Disease. Sci. Rep. 2022, 12, 20805. [Google Scholar] [CrossRef] [PubMed]
Zhou, H.J.; Zhao, X.L.; Gao, Y.B.; Li, H.B.; Cheng, R.R. Reaserch on Nystagmus Video Classification Algorithm Based on Attentation Mechanism. Laser Optoelectron. Prog. 2021, 59, 380–389. [Google Scholar]

Figure 1. Pupil detection results using a least-squares fitting algorithm. The red circle represents the edge of the pupil, while the blue dot indicates the center of the pupil.

Figure 2. Pupil tracking and eye movement quantification results.

Figure 3. Correcting for missing values (NA) in horizontal pupil movement. (A) Serrated eye movements in patients with nystagmus, (B) eye movement data with missing values, (C) data with linear interpolation, (D) data calibrated to the last detected position.

Figure 4. The deep learning model architecture for nystagmus detection. (A) The details of the first proposed CNN1D model, (B) the details of the second proposed CNN-LSTM1D model.

Figure 5. Comparison of ROC curves among the four nystagmus detection models.

Table 1. Performance evaluation results of the pupil segmentation model.

	Sensitivity (%)	Specificity (%)	Precision (%)	Accuracy (%)	DSC (%)
U-Net	95.67 ± 9.60	99.96 ± 0.03	97.18 ± 8.12	99.91 ± 0.05	96.29 ± 8.44

Table 2. Performance evaluation results of the nystagmus detection models.

	Sensitivity (%)	Specificity (%)	Precision (%)	Accuracy (%)	F₁-Score (%)
CNN1D	94.06 ± 0.78	86.39 ± 1.31	91.34 ± 0.84	91.02 ± 0.66	92.68 ± 0.55
CNN-LSTM1D	95.68 ± 0.79	82.70 ± 1.41	89.40 ± 0.95	90.54 ± 0.76	92.43 ± 0.63
GoogLeNet1D	95.04 ± 0.77	71.76 ± 1.84	83.69 ± 1.20	85.82 ± 0.88	89.00 ± 0.72
ResNet1D	92.56 ± 0.92	86.94 ± 1.42	91.53 ± 0.96	90.33 ± 0.82	92.04 ± 0.69

Table 3. Comparison of nystagmus detection achieved in this study with related researchers’ work. (DIM: dimensions, SEN: sensitivity, PRE: precision, F₁: F₁-score).

	Method		DIM	SEN (%)	PRE (%)	F₁ (%)	N
Adopted Method	CNN1D	DL	1D	94.06	91.34	92.68	360
Anh et al. (2022) [26]	SVM	ML	-	79.00	78.00	78.00	253
Lim et al. (2019) [10]	CNN	DL	2D	80.80	79.80	79.40	1005
Zhang et al. (2021) [9]	TBSIN	DL	2D	78.92	81.88	81.00	1600
Rham et al. (2022) [27]	SVM	ML	1D	90.00	91.00	90.00	746
Li et al. (2023) [12]	CNN-BiLSTM	DL	2D	91.20	94.30	-	4904

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Deep Learning-Based Nystagmus Detection for BPPV Diagnosis

Abstract

1. Introduction

2. Materials and Methods

2.1. Development Environment

2.2. Data Collection

2.3. Data Labeling

2.3.1. GT Data for Pupil Tracking Model Development

2.3.2. GT Data for Nystagmus Detection Model Development

2.4. Data Preprocessing

2.4.1. Data Preprocessing for Pupil Tracking Model Development

2.4.2. Data Preprocessing for Nystagmus-Detection Model Development

2.5. Pupil Tracking and Quantification of Movement

2.6. Segmented Data Correction through a Bridging Algorithm

2.7. Nystagmus Detection Algorithm

3. Results

3.1. Performance Evaluation of the Pupil Segmentation Models for Pupil Tracking

3.2. Performance Evaluation of the Nystagmus Detection Models

4. Discussion

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics