A Systematic Review on Machine Learning Techniques for Early Detection of Mental, Neurological and Laryngeal Disorders Using Patient’s Speech

There is a substantial unmet need to diagnose speech-related disorders effectively. Machine learning (ML), as an area of artificial intelligence (AI), enables researchers, physicians, and patients to solve these issues. The purpose of this study was to categorize and compare machine learning methods in the diagnosis of speech-based diseases. In this systematic review, a comprehensive search for publications was conducted on the Scopus, Web of Science, PubMed, IEEE and Cochrane databases from 2002–2022. From 533 search results, 48 articles were selected based on the eligibility criteria. Our findings suggest that the diagnosis of speech-based diseases using speech signals depends on culture, language and content of speech, gender, age, accent and many other factors. The use of machine-learning models on speech sounds is a promising pathway towards improving speech-based disease diagnosis and treatments in line with preventive and personalized medicine.


Introduction
In the United States, 25% of adults, 18% of adolescents and 13% of children have a mental disorder. Although these disorders have larger economic impacts than others, governments still spend less on them [1].
Major depression is one of the most common mental disorders, affecting over 300 million people [2]. The global prevalence of depression was estimated at 4.4% in 2015, affecting more women than men over a two-year period [3]. Early depressive symptoms, such as psychomotor retardation and cognitive impairment, are usually associated with language impairment. Thus, depressive speech has been described by clinicians as monotonous, uninteresting, and lacking energy. These differences may allow depression to be detected by analyzing the acoustics of depressed people's voices [3,4].
Alzheimer's disease is a type of neurological disorder that gradually reduces a patient's mental abilities. The main symptoms of this disease include memory loss, difficulties in making decisions, and the incorrect choice of words. Therefore, speech signal processing in this disease has attracted the attention of many researchers over the past decade. A diagnosis of Alzheimer's disease using audio signals depends on culture, language, language content, gender, age, accent, and many other factors. Parkinson's disease (PD) is the second most common neurodegenerative disease after Alzheimer's disease (AD). PD is reported to have a prevalence of 0.3% in the general population in developed countries, whereas its prevalence in the elderly population (60 years and older) is 1%. Voice impairment has been reported to be an early biomarker for this disease. Moreover, an intelligent system based on voice analysis can be used as a means of prodromal diagnosis [5,6].
Since medical decision support systems are being developed in various fields and lead to early diagnosis, much research has been conducted to create intelligent disease diagnosis models using a patient's speech. This study's objective was to perform an updated systematic review of the available literature to appraise the machine learning models used in diagnosing mental, neurological and laryngeal diseases based on a patient's speech/voice.

Related Work
Diseases related to the human vocal system, such as laryngeal cancer and polyps, usually have a serious impact on the patient's health and social life. Fortunately, most of these diseases can be cured if detected early. Because laryngeal syndrome causes voice abnormalities in the patient (such as wheezing and hoarseness, two major symptoms of voice system dysfunction), some professionals can detect the problem simply by listening to the patient and then decide to prescribe a test such as a laryngoscopy. However, these tests are very expensive and time consuming. Additionally, they are invasive and cause patient discomfort. Therefore, some preliminary investigation is worthwhile. A major drawback of perceptual studies is their inherent subjectivity, which makes them unreliable and difficult to quantify. To overcome these problems, researchers have sought reliable methods to distinguish between healthy and abnormal voices, usually based on speech signal processing techniques [4,7].
To distinguish healthy from abnormal voices, distinctive features must first be identified. The speech is then classified as healthy or diseased using classification methods such as support vector machines (SVM) and neural networks (NN) [4,8].
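The two-step procedure described above (feature extraction followed by classification) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the reviewed studies: jitter and shimmer are used as example acoustic features, the pitch periods and amplitudes are invented toy values, and a nearest-centroid rule stands in for the SVM or neural network classifiers, purely to keep the example free of external libraries.

```python
from statistics import mean

def jitter(periods):
    """Mean absolute difference of consecutive pitch periods, relative to the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)

def shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes, relative to the mean amplitude."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return mean(diffs) / mean(amplitudes)

def features(periods, amplitudes):
    """Step 1: turn one recording into a feature vector."""
    return (jitter(periods), shimmer(amplitudes))

def train(samples):
    """Step 2a: compute one centroid per label (stand-in for fitting an SVM or NN)."""
    by_label = {}
    for vector, label in samples:
        by_label.setdefault(label, []).append(vector)
    return {label: tuple(mean(col) for col in zip(*rows)) for label, rows in by_label.items()}

def predict(model, vector):
    """Step 2b: assign the label of the closest centroid (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(vector, model[label])))

# Hypothetical toy recordings: pitch periods in ms and peak amplitudes (invented values).
TRAIN = [
    (features([8.0, 8.1, 8.0, 7.9, 8.0], [1.00, 1.02, 0.99, 1.01, 1.00]), "healthy"),
    (features([7.0, 7.0, 7.1, 7.0, 6.9], [0.90, 0.91, 0.90, 0.89, 0.90]), "healthy"),
    (features([8.0, 9.0, 7.5, 8.8, 7.2], [1.00, 0.70, 1.20, 0.80, 1.10]), "pathological"),
    (features([6.5, 7.6, 6.0, 7.4, 6.2], [0.90, 0.60, 1.00, 0.65, 0.95]), "pathological"),
]
model = train(TRAIN)

steady = features([8.0, 8.05, 8.0, 7.95, 8.0], [1.00, 1.01, 1.00, 0.99, 1.00])
irregular = features([8.0, 9.2, 7.4, 8.9, 7.0], [1.00, 0.60, 1.10, 0.70, 1.20])
print(predict(model, steady), predict(model, irregular))
```

In practice the feature vectors would be much larger and the classifier would be validated on held-out speakers; the sketch only shows how the two stages connect.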
In modern language technology systems, many methods and algorithms have emerged that rely on the interdisciplinary research fields of signal processing and artificial intelligence. Machine learning helps to build models from real-world features that perform practical tasks. The new techniques developed in the machine learning paradigm have brought enormous improvements to language technology. The main concept of machine learning is learning from a given dataset to analyze, recognize or complete a given task. Moreover, various mathematicians, psychologists, engineers, medical scientists, computer scientists and many others have invented, and sometimes rediscovered, methods of solving problems. Therefore, different methods applicable to emotion prediction in speech recognition were presented in comparative frameworks [2,9].
Machine learning is an intensive research field with successful applications to various problems in health sciences, such as neurological diseases, laryngeal disorders, stress, depression, autism and Parkinson's disease [6,10–14]. For example, one notable use of machine learning is in solving problems for Spanish-speaking Alzheimer's patients, with features such as the number of verbs, nouns and conjunctions and a radial basis function kernel [15]. More than 60 percent of speech is voiced, and this part is very important for developing intelligent systems based on speech data [16].
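To make the notion of the voiced part of speech concrete, the following sketch estimates the fraction of voiced frames in a signal using short-time energy and zero-crossing rate, a common textbook heuristic. It is not the method of any reviewed study; the sampling rate, frame length, thresholds and the synthetic signal (a 150 Hz tone for voiced frames, low-level noise for unvoiced frames) are all assumptions chosen for illustration.

```python
import math
import random

SAMPLE_RATE = 8000   # Hz (assumed)
FRAME_LEN = 240      # 30 ms frames at 8 kHz (assumed)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)

def is_voiced(frame, energy_min=0.01, zcr_max=0.25):
    """Voiced frames tend to have high energy and few zero crossings (thresholds are assumptions)."""
    return short_time_energy(frame) > energy_min and zero_crossing_rate(frame) < zcr_max

def voiced_fraction(signal):
    """Split the signal into non-overlapping frames and return the share classified as voiced."""
    starts = range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN)
    frames = [signal[i:i + FRAME_LEN] for i in starts]
    return sum(is_voiced(f) for f in frames) / len(frames)

# Synthetic test signal: three frames of a 150 Hz tone, then two frames of weak noise.
rng = random.Random(0)
tone = [0.8 * math.sin(2 * math.pi * 150 * n / SAMPLE_RATE) for n in range(3 * FRAME_LEN)]
noise = [rng.uniform(-0.05, 0.05) for _ in range(2 * FRAME_LEN)]
fraction = voiced_fraction(tone + noise)
print(fraction)  # 3 of the 5 frames are voiced
```

Real recordings would need thresholds tuned to the microphone and speaker, and many systems use pitch-based voicing detectors instead.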

Materials and Methods
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed (Figure 1). The search aimed at identifying articles that include machine learning methods for the diagnosis of mental, neurological and laryngeal disorders based on a patient's speech. A comprehensive search of publications from 2002–2022 was carried out in the Scopus, Web of Science (WOS), PubMed, IEEE and Cochrane databases.
Based on the inclusion and exclusion criteria of this study (the time period and the requirement that articles be written in English), articles were added to this study, their information was extracted into a checklist, and the collected data were analyzed using descriptive statistics. Included and excluded articles were finally reported in a PRISMA flowchart. JBI's critical appraisal tools were used for assessing trustworthiness.
Because the selected databases use different terminology for indexing papers, and in an attempt to include all relevant articles, thesauruses and the databases' systematic records of subject headings were used. To organize the search systematically, search terms were grouped around four expressions: Term1 (Machine Learning), Term2 (Speech), Term3 (Neurological, Mental or Laryngeal disorders) and Term4 (Diagnosis and Screening). Further elaboration of the search terms used for eligible articles in the four expressions can be seen in Table 1. The entries within each term were a mix of medical subject heading (MeSH) terms and synonyms. The AND operator was applied between terms, and the OR operator was applied between the MeSH terms and synonyms within each term. Only a few restrictions were applied in the search criteria: studies written in languages other than English, literature published before 2002 and studies that focused on treatment and follow-up were excluded. The following studies were also excluded: (a) those conducted on animals and (b) those that did not use machine learning methods. Finally, JBI's critical appraisal tool was used for assessing the trustworthiness, relevance and results of published papers [17].
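The Boolean structure of the search strategy and the exclusion criteria can be sketched as follows. The synonym lists and the record fields (`year`, `language`, `uses_ml`, `animal_study`, `treatment_or_followup`) are hypothetical placeholders, since the actual entries of Table 1 and the screening form are not reproduced here; only the AND/OR structure and the exclusion rules follow the description above.

```python
# Hypothetical entries; the real MeSH terms and synonyms are those listed in Table 1.
SEARCH_TERMS = {
    "Term1": ["Machine Learning", "Deep Learning"],
    "Term2": ["Speech", "Voice"],
    "Term3": ["Neurological Disorders", "Mental Disorders", "Laryngeal Disorders"],
    "Term4": ["Diagnosis", "Screening"],
}

def or_group(entries):
    """Join the MeSH terms and synonyms of one term with OR."""
    return "(" + " OR ".join(f'"{entry}"' for entry in entries) + ")"

def build_query(terms):
    """Join the four OR-groups with AND."""
    return " AND ".join(or_group(entries) for entries in terms.values())

def is_eligible(record):
    """Apply the exclusion criteria to one screened record (field names are assumptions)."""
    return (
        record["language"] == "English"
        and 2002 <= record["year"] <= 2022
        and record["uses_ml"]
        and not record["animal_study"]
        and not record["treatment_or_followup"]
    )

query = build_query(SEARCH_TERMS)
print(query)

example = {"language": "English", "year": 2019, "uses_ml": True,
           "animal_study": False, "treatment_or_followup": False}
print(is_eligible(example))
```

Each database has its own query syntax and field tags, so a string built this way would still need to be adapted per database.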

Results
A general description of the search results is shown in Table 2. The articles can be divided into three general categories: neurological disorders, laryngeal disorders and mental disorders. Table 2 gives the frequency of each category and shows that most of the articles that used voice features for diagnosis are related to mental disorders (MDD, anxiety and depression, schizophrenia and bipolar disorder), which account for 62.5% (30) of the included articles.
Table 3 shows the papers included in this systematic review, along with the information extracted from each paper. Each of the included papers focused on one of the diseases targeted by this review, and the category of that disease is given next to its name. The table also shows which machine learning algorithm each article used to diagnose its target disease. For a better and faster comparison, the most important evaluation metric of each study is stated next to the name of the machine learning algorithm. The table also reports which category of speech/sound features was used by each of the included papers.
An overview of Table 3 shows that SVM was the most used among all machine learning algorithms. Regardless of the algorithm used, only acoustic features were used to diagnose laryngeal diseases, and mainly prosodic features were used for neurological and mental diseases.
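The category frequencies summarized in Table 2 can be recomputed from the article counts stated in the Discussion (thirty mental, twelve neurological and six laryngeal studies out of 48). The short sketch below is only an arithmetic check; the variable names are illustrative.

```python
# Article counts per category, as stated in the Discussion section.
counts = {
    "Mental disorders": 30,
    "Neurological disorders": 12,
    "Laryngeal disorders": 6,
}

total = sum(counts.values())  # 48 included articles
percentages = {name: 100 * n / total for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name}: {percentages[name]:.1f}% ({n})")
```

The value for mental disorders, 62.5% (30), matches the figure reported in Table 2.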

Discussion
To our knowledge, there was no precise classification of the types of machine learning algorithms in speech-based disease diagnosis that would specify which machine learning algorithms can be used for this purpose. We did not find any systematic study that examined which speech-based disorders, with what origin and symptoms, can be taken into consideration, what the characteristics of each of the algorithms used are, or what evaluation metrics have been used. Therefore, this systematic study was planned around the relationship among three important matters: the basic symptoms of the patient population, speech disorders, and the machine learning methods for the development of an early detection model.
In this systematic review, four main issues were considered. First was the name of the disease related to speech; second was the features of speech that were affected by the given disease and therefore used in modeling; third was the machine learning algorithms that were used for modeling; and finally, the evaluation metrics for machine learning models used to detect the speech-based disorders.
Considering the three different categories of diseases that have been investigated in this systematic review, the results of this study are discussed independently for each category.
Neurological diseases: Twelve of the included studies investigated machine learning tools for diagnosing neurological diseases based on patient speech [5,6,10,14,18–25]. Nine papers presented a machine learning model for the early detection of Parkinson's disease [5,6,14,18–23]. Five articles in this category reported better evaluation metrics than the others [14,19–21]. Deep learning was the most used algorithm [14,19,21], and the best accuracy was reported by Zahid et al. in 2020 in Pakistan, who applied deep learning to a Spanish-language dataset and obtained an accuracy of 99.7% [21]. Only one study, by Hammer et al. from France in 2022, used prosodic features to diagnose Parkinson's disease. In this study, speech fluency and speech rhythm were used with an SVM to implement a diagnostic model on a French-language dataset, and the accuracy of this model was 89% [20]. The rest of the studies used acoustic features for analysis and modeling. Three of the included articles deal with the diagnosis of autism using machine learning [10,24,25]. Two studies in this category presented better results, both of which used deep learning [24,25]. A quantitative study by Eni and colleagues, conducted in 2020 in Israel, determined the severity of autism in patients using the prosodic features of speech, with RMSE = 4.65 and mean correlation = 0.72 [24]. In the qualitative study of Lin et al. in 2020, the presence of autism in patients was diagnosed using the acoustic characteristics of the patient's voice with an accuracy of 66.8 [25].
Laryngeal diseases: Six of the included articles presented machine learning models for the diagnosis of laryngeal diseases [4,26–30]. All these studies used acoustic features for modeling, and five articles presented the results well [4,26–28,30]. All the studies in this category used the SVM algorithm for modeling, except the study by Mahmoud et al. in 2021, which implemented a diagnostic model with deep learning (ACC = 99.2) [26]. The best performance was reported by Ali et al. in 2016, whose model reached an accuracy of 100% [30].
Mental disease: Thirty of the reviewed articles dealt with the diagnosis of mental illnesses using speech [1–3,7–9,11,13,…]. In fourteen studies, machine learning algorithms were used to diagnose Major Depressive Disorder (MDD) [1,3,8,31–34,36–42]. Five studies presented better results than the others [3,33,34,36,37]. In all these studies, prosodic features of speech and artificial neural networks were used for disease diagnosis. The article by Bedi et al. is a quantitative study, conducted in 2020 in China on a Chinese-language dataset, that calculated MDD severity. In this study, RMSE = 5.51 and MAE = 4.2 were obtained [34]. Among the qualitative studies, the article by Rezaii et al., which was conducted in 2019 in the United States, obtained the highest accuracy with artificial neural networks (ACC = 93%) [36]. This level of accuracy is much better than that of the article by Gavrilescu et al. [37] in 2019 in Romania, which used artificial neural networks with an accuracy of 80.75%. The article by Espinola et al. [3] in 2021 in Brazil and the article by Bedi et al. [33] in 2014 in the United States used the SVM algorithm for modeling, and the accuracy of their models was 89.14% and 88%, respectively.
Eight of the included studies presented machine learning models for diagnosing anxiety and minor depression [2,7,9,11,43–46]. Among these articles, five obtained better results, in all of which acoustic sound vignettes were used to diagnose the disease [2,11,43,45,46]. The article by He et al. is a quantitative study to detect disease severity using the SVM algorithm, which was conducted in China in 2018. In this article, RMSE = 10.44 and MAE = 8.60 [45]. The best accuracy was reported in the study by Jenei et al., which was conducted in Hungary in 2021 and obtained an accuracy of 85.2% using CNN [46]. The article by McGinnis et al. in 2019 in India [43] and the article by Sumali et al. in 2020 in Japan [11] used the SVM algorithm, and their accuracy was reported as 80% and 67.2%, respectively. The most recent paper in this category belongs to Shin et al. in 2021 in Korea, which obtained an accuracy of 65.9% using MLP.
Five articles have examined the diagnosis of schizophrenia using machine learning [35,47–50]. Among them, two studies presented better results [47,49]. Huang et al. from China in 2020, using deep learning, presented a disease diagnosis model with 84% accuracy [47]; Fisher et al. from Canada in 2008, with MMN (mismatch negativity), reported performance varying between 82% and 99%. Both studies used prosodic features.
Two of the included articles are related to the diagnosis of bipolar disorder using machine learning [13,51]. In the study by Weintraub et al., conducted in 2021 in the United States, a diagnostic model with an accuracy of 81.8% was implemented using a decision tree and the prosodic features of speech [13]. Arevian and colleagues also presented a diagnostic model based on the prosodic features of speech, with an AUC of 81% [51].
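The studies discussed above are compared on accuracy for qualitative (classification) models and on RMSE and MAE for quantitative (severity) models. As a reminder of how these metrics are defined, a minimal sketch follows; the labels and severity scores are invented toy values and do not come from any reviewed study.

```python
import math

def accuracy(y_true, y_pred):
    """Share of cases whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted severity scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted severity scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy classification example: 1 = patient, 0 = control (invented values).
labels_true = [1, 0, 1, 1, 0]
labels_pred = [1, 0, 0, 1, 0]
print(accuracy(labels_true, labels_pred))  # 4 of 5 correct -> 0.8

# Toy severity example on an arbitrary clinical scale (invented values).
scores_true = [10, 20, 30]
scores_pred = [12, 18, 33]
print(rmse(scores_true, scores_pred), mae(scores_true, scores_pred))
```

Because RMSE and MAE depend on the range of the clinical scale used, values from studies with different scales are not directly comparable.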
Although deep learning was the approach most often used for diagnosing neurological diseases from speech in the reviewed studies, the findings show that, in general, SVM is the most used algorithm for the diagnosis of speech-related diseases. All the studies that dealt with the diagnosis of laryngeal diseases used acoustic features, whereas the diagnosis of neurological and mental diseases relied more on the features of spoken speech. This means that, in order to diagnose mental and neurological diseases, it is necessary for the patient to speak, while for laryngeal diseases, the sound produced by the larynx is usually sufficient.
One of the problems that make it difficult to compare studies is that prosodic features differ from one language to another, and therefore one cannot expect an algorithm to provide the same results in two languages. On the other hand, studies that have been conducted in the same language usually use different features.
This study had some limitations. First, unpublished and non-English studies were not included in this review. Second, the quality of the included studies was not assessed. Also, heterogeneity between studies prevented a meta-analysis from being conducted. Finally, there was no standard for feature selection or datasets in the reviewed studies.
Despite these limitations, the current review's findings provide crucial recommendations for further research.
First, future studies should focus on implementing a standard dataset and a set of indicators for each language, based on the acoustic and prosodic features of patient speech for each of the disease categories mentioned. Second, research is needed on how to generalize the results of these studies, considering that in the studies conducted, the patient is evaluated under special conditions and by saying pre-determined sentences.

Conclusions
The present systematic review provides evidence of how the prosodic or acoustic features of a patient's speech can be affected by mental, neurological or laryngeal diseases, as well as a comparison of machine learning methods in diagnosing these diseases.
According to our study, the results demonstrate a classification of machine learning tools for detecting and screening severe disorders in a cost-effective manner. New mechanisms are needed to enhance the medical diagnosis of disease. Our findings also show the relevant feature group for each disease category, which is an important step in implementing a machine learning model. This approach has considerable potential to address a critical gap in the diagnosis of some severe diseases using patient speech, and the findings from these primary studies support further research on implementing machine learning models as clinical decision support systems based on speech/voice.

Figure 1. PRISMA flow diagram of study inclusion and exclusion criteria for the systematic review.


Table 1. The terms below show the search strategy used in this research. Each term consists of MeSH terms and synonyms.
A screening of the papers was undertaken to determine the most related ones. After removing duplicates, only 533 remained. Finally, 48 articles were selected against the eligibility criteria. Screening was performed by all authors by reading the title and abstract. From each article, the following features were synthesized if available: disorders, sample size, presence of control group, age, clinically assessed or self-assessed, clinical scales used for diagnosis, tasks to obtain speech, predictive model, highest performance or statistical significance, type of validation or test set, and other relevant findings (especially if it was stated which features were predictive).

Table 2. Summary of systematic review results.

Table 3. Classification of voice features and detection algorithms based on speech disorders.
et al. from China in 2020, using deep learning, presented a disease diagnosis model with 84% accuracy [47]; Fisher et al. from Canada in 2008, with MMN (Mismatch Negative), provided a variable between 82% and 99%.Both studies used prosodic features.